MCMC in Python: sampling in parallel with PyMC

Question and answer on Stack Overflow.

Filed under software engineering

IDV in Python: adding text callouts to a scatter plot interactively with mpld3

I’ve been pretty interested in the potential of interactive data visualization recently, especially since I saw the reaction that the Global Burden of Disease 2010 visualization tool, GBD Compare, received last year. One promising technology for making this stuff routine is mpld3, a mashup of the Python plotting library matplotlib and the JavaScript visualization kernel d3. Have I mentioned this before?

The thing about interactive data visualization is that it’s not always clear what seems useful only because it excites my reptile brain, and what is useful for more logical reasons. But I was asking a colleague to add some callouts to a (non-interactive) figure recently when I realized that this is a chance for interactivity to be _obviously_ useful. These finishing touches on a graphic often take me tons of time, and using a command-line plotting program just can’t be the right way to do it. How about an mpld3 plugin that lets me add text callouts interactively? And when I’m done, it can “save” the callouts by creating the Python script needed to generate them again. Here it is, in a notebook.
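To make the “save” idea concrete, here is a minimal sketch of turning a list of callouts into Python source that redraws them. It is not the plugin from the notebook: the function name `callouts_to_script` and the dictionary keys (`text`, `x`, `y`, `text_x`, `text_y`) are hypothetical, and it assumes the generated script will run against a matplotlib axes object called `ax`. The only real API it relies on is matplotlib’s `Axes.annotate`, and it only writes that call out as text.

```python
def callouts_to_script(callouts, ax_name="ax"):
    """Return Python source that redraws the given callouts with matplotlib.

    Each callout is a dict with hypothetical keys: the label text, the
    data point it refers to (x, y), and where the label sits (text_x, text_y).
    """
    lines = []
    for c in callouts:
        # repr() keeps strings quoted and numbers exact in the generated code
        lines.append(
            "{ax}.annotate({text!r}, xy=({x!r}, {y!r}), "
            "xytext=({tx!r}, {ty!r}), "
            "arrowprops=dict(arrowstyle='->'))".format(
                ax=ax_name, text=c["text"], x=c["x"], y=c["y"],
                tx=c["text_x"], ty=c["text_y"],
            )
        )
    return "\n".join(lines)


# Made-up callout, standing in for whatever was placed interactively
callouts = [
    {"text": "outlier", "x": 1.5, "y": 2.0, "text_x": 2.0, "text_y": 2.5},
]
script = callouts_to_script(callouts)
print(script)
```

Pasting the printed lines into the script that made the original figure should reproduce the callouts without any more clicking around.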

Filed under dataviz

MCMC in Python: Estimating failure rates from observed data

A question and answer on CrossValidated, which made me reflect on the danger of knowing enough statistics to be dangerous.

Filed under statistics

MCMC in Python: How to make a custom sampler in PyMC

The PyMC documentation is a little slim on the topic of defining a custom sampler, and I had to figure it out for some DisMod work over the years. Here is a minimal example of how I did it, in answer to a CrossValidated question.

Filed under MCMC

MCMC in Python: How to set a custom prior with joint distribution on two parameters in PyMC

Question and answer on Stack Overflow, motivated by a question and answer on CrossValidated about modeling incidence rates.

Filed under Uncategorized

Change Seminar: Measuring Mortality in Iraq

I gave a seminar last week for the UW computer scientists interested in doing good with technology. It’s a fun crowd. Here are the slides, requested by a regular attendee who couldn’t be there.

Filed under global health

Fact checking with GBD Compare

I’ve been developing a habit of comparing health statistics I hear in the media with the results in GBD Compare. It is nice when they agree, as in a recent ScienceMag focus on chronic kidney disease, corroborated here: http://ihmeuw.org/1v7i . It would be even better if the cause were known and the burden could be removed.

Filed under global health

IHME Seminar: Overdiagnosed

I have fallen way behind in noting the IHME weekly seminars, but I was just thinking of this wonderful one from last semester, and I couldn’t wait any longer to link to it: Overdiagnosed: Making people sick in the pursuit of health by H. Gilbert Welch.

Filed under global health

Smoking trends by US counties

I’m away from work for some really exciting family stuff, but while I wait on that, our paper on trends in smoking prevalence has just come out, along with a fun interactive data visualization of the results, and some media coverage that I think tells the story quite well.

What makes this work methodologically challenging is that the data come from telephone surveys, but people who smoke gave up their landlines at a higher rate than people who don’t smoke:

[Figure: smoking_brfss_nhis]
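A bit of arithmetic shows why this matters. The sketch below is not the method from the paper. It only illustrates the direction of the bias, and every number in it is made up for illustration: if smokers are less likely to be reachable by landline, a landline-only survey will understate smoking prevalence.

```python
def landline_only_estimate(true_prevalence, landline_smokers, landline_nonsmokers):
    """Smoking prevalence as seen by a survey that reaches only landline users.

    landline_smokers and landline_nonsmokers are the fractions of each
    group that still have a landline.
    """
    reached_smokers = true_prevalence * landline_smokers
    reached_nonsmokers = (1 - true_prevalence) * landline_nonsmokers
    return reached_smokers / (reached_smokers + reached_nonsmokers)


# Hypothetical values: 20% of people smoke, 50% of smokers and
# 75% of non-smokers still have a landline.
true_prevalence = 0.20
estimate = landline_only_estimate(true_prevalence, 0.50, 0.75)
print(round(estimate, 3))
```

With these invented inputs the survey would see about 14% smoking instead of 20%. When both groups keep their landlines at the same rate, the estimate matches the true prevalence, so the bias comes from the gap between the two rates.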

Filed under global health

IDV in Python: Interactive heatmap with Pandas and mpld3

I’ve been having a good time following the development of the mpld3 package, and I think it has a lot of potential for making interactive data visualization part of my regular workflow instead of that special something extra. A few weeks ago, an mpld3 user showed up with an interesting challenge, and solved their own problem quite well.

I finally got a chance to look at it today, and with a little spit-and-polish this could be something really useful for me.

[Figure: ihm]

Filed under dataviz, software engineering