Category Archives: dataviz

Visual Communication in Python: Pie Charts with Matplotlib

A personal story about how I started using Python for my research: when I was a post-doc at Microsoft, I was embarrassed to ask them to buy me Matlab. But I knew how to plot things in Matlab and I didn’t have time to learn how to make a graphic look nice with Excel or whatever the preferred Microsoft tool was at the time. Matplotlib to the rescue. It was free, it looked *better* than Matlab, and then it was done.

As readers of this blog may know, I have come to use Python extensively in my research by now. But one thing that I have not changed in the 10 years since that post-doc experience is using matplotlib like it was Matlab. It might be time to change.

I recently read a short blog on the modern approach to using Matplotlib, http://pbpython.com/effective-matplotlib.html, and it seems worth a try. Do you remember a talk on data visualization I gave last fall? https://github.com/aflaxman/iths-communicating-results-visually-2

I’m going to try remaking the plots I spoke on with my old school mpl and the modern approach. Here is the first, a pie chart.

My old-fashioned way is in a notebook from my talk, and looks like this:

import matplotlib.pyplot as plt

# `colors` is defined earlier in the notebook
plt.figure(figsize=(9,8))
plt.subplots_adjust(hspace=.3, right=.8, left=.1)
plt.pie([2,98], labels=['Survived\n2%', 'Died\n98%'], colors=colors, startangle=0)

The new way is built up in this notebook here, and ends up being comparable:

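# `s` is built earlier in the notebook (presumably a pandas Series of the same two values)
fig, ax = plt.subplots(figsize=(9,8))
s.plot(kind='pie', colors=colors, startangle=0, ax=ax)  # draw on the axes we just created
fig.subplots_adjust(hspace=.3, right=.8, left=.1)
ax.set_ylabel('')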
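
For anyone who wants to run both versions without opening the notebooks, here is a self-contained sketch. The two colors and the way the Series `s` is constructed are my assumptions for illustration, not what the notebooks necessarily contain, and the function names are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Assumed stand-ins for values defined in the notebooks
colors = ['C0', 'C3']
s = pd.Series([2, 98], index=['Survived\n2%', 'Died\n98%'])

def pie_pyplot_style():
    """Old-school approach: everything goes through the implicit pyplot state."""
    fig = plt.figure(figsize=(9, 8))
    plt.subplots_adjust(right=.8, left=.1)
    plt.pie(s.values, labels=list(s.index), colors=colors, startangle=0)
    return fig

def pie_object_style():
    """Modern approach: hold on to fig and ax, and let pandas draw on ax."""
    fig, ax = plt.subplots(figsize=(9, 8))
    s.plot(kind='pie', colors=colors, startangle=0, ax=ax)
    fig.subplots_adjust(right=.8, left=.1)
    ax.set_ylabel('')  # pandas labels the y-axis with the Series name; clear it
    return fig, ax
```

Calling `pie_pyplot_style().savefig('old.png')` and `pie_object_style()[0].savefig('new.png')` should give two nearly identical charts to compare.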

Is that cooler? I’m not convinced, but I’ll keep trying.

Filed under dataviz

Infographics in Python: Plot a Noun Project Icon on a Matplotlib Chart

I had to put an icon on a chart in Python last week, and I couldn’t find a good brief blog about how to do it. Here is what I cobbled together:

1. Find a free, appropriate image from The Noun Project.
2. Load it into Python with plt.imread
3. Draw it in the proper place on a figure with plt.imshow and some cryptic, hacky options.
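
The steps above can be sketched roughly as follows. This is not the code from my gist: to keep it runnable without downloading anything, a generated RGBA array stands in for the result of `plt.imread` on a real icon file, and the function names, the sample curve, and the extent values are all made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

def make_placeholder_icon(size=64):
    """Stand-in for plt.imread('icon.png'): a black disc on a transparent background."""
    yy, xx = np.mgrid[0:size, 0:size]
    inside = (xx - size / 2) ** 2 + (yy - size / 2) ** 2 <= (size / 2 - 2) ** 2
    icon = np.zeros((size, size, 4))  # RGBA, all black and fully transparent
    icon[inside, 3] = 1.0             # make the disc opaque
    return icon

def plot_with_icon(icon, extent=(7, 9, 60, 90)):
    """Draw a simple curve, then place the icon at `extent` in data coordinates."""
    fig, ax = plt.subplots(figsize=(6, 4))
    x = np.arange(10)
    ax.plot(x, x ** 2, color='grey')
    # extent is (left, right, bottom, top); aspect='auto' keeps imshow
    # from forcing equal scaling on the axes
    ax.imshow(icon, extent=extent, aspect='auto', zorder=10)
    # imshow can reset the axis limits to the image extent, so set them again
    ax.set_xlim(0, 10)
    ax.set_ylim(0, 100)
    return fig, ax
```

With a real icon, `plot_with_icon(plt.imread('icon.png'))` would be the equivalent call, where `'icon.png'` is whatever file you saved from The Noun Project.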

Looks good, right?

See this all in action here: https://gist.github.com/aflaxman/c171050384471636e8f23f322ba7e9c5

Visualizing Uncertainty

Click to access paper_BELIV_evaluating_uncertainty_vis.pdf

Click to access Harrower.pdf

Click to access HoekstraEtAlPBR.pdf

Click to access uncertain-bus-chi2016.pdf

Click to access MacEachren_Visualizing_98.pdf

Click to access MacEachren_IEEE_TVCG_PrePub_2012_reduced_res.pdf

http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6654171

New words of wisdom from S Few

The Visual Perception of Variation in Data Displays

Click to access the_visual_perception_of_variation.pdf

(well, it was new when I started this post)

Big Data Science resources

• The Oregon Health & Science University (OHSU) Department of Medical Informatics & Clinical Epidemiology (DMICE) and Library are pleased to announce the release of open educational resources (OERs) in the area of Biomedical Big Data Science. Funded by a grant from the National Institutes of Health (NIH) Big Data to Knowledge (BD2K) Program, OERs have been produced that can be downloaded, used, and repurposed for a variety of educational audiences by both learners and educators. Development of the OERs is an ongoing process, but they have reached the point where a critical mass of the content is being made available for use and to obtain feedback. The OERs are intended to be flexible and customizable and their use or repurpose is encouraged. They can be used as “out of the box” courses for students or as materials for educators to use in courses, training programs, and other learning activities. The goal is to create 32 module topics. Currently, 20 of the modules are available for download and use. For additional information, contact Bill Hersh at: hersh@ohsu.edu.

Also all on GitHub: https://github.com/OHSUBD2K/

I want to see this one: BDK32 Displaying Confidence and Uncertainty

It doesn’t exist yet, so I have to remember to check back when it does.

Ideas that did not make it into my recent Data Viz talk

D3js in any substantial way
Steve Few email list, and his example with isotype and patient risk charts
538.com viz stuff

OHSU BD2K material on data visualization

https://github.com/OHSUBD2K/BDK18-Data-Visualization

S Few on expressing proportions

Always makes me think: http://www.perceptualedge.com/articles/visual_business_intelligence/expressing_proportions.pdf

Not sure I agree so much with this edition, but I like that he is taking on those goofy unit charts (and I missed the Unit Charts are for Kids essay the first time around http://www.perceptualedge.com/articles/visual_business_intelligence/unit_charts_are_for_kids.pdf )

To read: EnsembleMatrix paper

EnsembleMatrix: Interactive Visualization to Support Machine Learning with Multiple Classifiers http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

I want one

Filed under dataviz, machine learning

Beyond the Sum-Difference Plot

Bland–Altman plot
From Wikipedia, the free encyclopedia

A Bland–Altman plot (Difference plot) in analytical chemistry and biostatistics is a method of data plotting used in analyzing the agreement between two different assays. It is identical to a Tukey mean-difference plot, the name by which it is known in other fields, but was popularised in medical statistics by J. Martin Bland and Douglas G. Altman.[1][2]
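
For reference, a minimal matplotlib sketch of such a plot: the difference between two paired measurements on the y-axis against their mean on the x-axis. The horizontal lines at the mean difference and at plus or minus 1.96 standard deviations (the usual "limits of agreement") are the conventional additions, not something taken from the excerpt above, and the function name is made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

def bland_altman(a, b):
    """Plot difference vs. mean for paired measurements a and b.

    Returns the figure, the axes, the mean difference (bias),
    and the (lower, upper) limits of agreement.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean = (a + b) / 2
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(mean, diff)
    ax.axhline(bias, color='k')                    # mean difference
    ax.axhline(lower, color='k', linestyle='--')   # lower limit of agreement
    ax.axhline(upper, color='k', linestyle='--')   # upper limit of agreement
    ax.set_xlabel('Mean of the two measurements')
    ax.set_ylabel('Difference (a - b)')
    return fig, ax, bias, (lower, upper)
```

Usage is just `bland_altman(assay_1, assay_2)` with two equal-length sequences of paired values.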
