Speaking of graphics…

I got this email the other day:

the open data kit team is in dire need of a logo and we need your help. if you know a designer (or are a designer) looking to contribute to a growing open source project, now is your chance!

before you get started, we have three rules.

  • go big! although the core team works on healthcare in africa, odk is a much broader project. try to stay away from global development or health themes, but feel free to play off the words and ideas in the project name.
  • think unified! we build a lot of tools (collect, aggregate, build, voice, clinic, etc), so a brand that could be used across all of them would be great. for example, most of the adobe creative suite applications use the same basic theme. see them at
  • be inspired! if you’ve never designed a logo, has some great ideas to help you get started. that site also has links to an amazing array of examples.

our goal is to have a couple of logos that we can pick from, so spread the word and send in a few! the deadline for the competition is november 1 at midnight. send your attempts to

    thanks for helping make odk a more visually pleasing project,


Election Season Infographics

I’ve seen a lot of visual display of quantitative information in the news lately, and I like that. But I’ve seen a lot of ink used for more style than substance, and that bugs me, especially when the point is stronger with more substance.

In Exhibit A, I draw your attention to the graphic from last week’s NYTimes front page article Top Corporations Aid U.S. Chamber of Commerce Campaign. The graphic on the web differs slightly from the one in print, but both obscure the point: conservative groups are drastically outspending their rivals in the current election cycle.

A casual observer might miss this though, because of the stylish way the data is represented as red and blue squares, each standing on edge. The artfully arbitrary spacing between the overlapping squares makes it even harder to interpret.

Here’s my remix:

With a pro designer to work this over, the NYTimes could have a sexy front page infographic that’s meaningful, too. Look at that: among the top ten organizations, conservative spending is twice liberal spending. And if you pull out the “party spenders”, i.e. NRCC, NRSC, DCCC, DSCC, then conservatives are spending five times more. A picture is worth a large number of words, but we should still make them mean something.

I have another remix to share… actually, this is the one that got me to make some graphs of my own. Seeing misleading areas in print once a week, that I can stand, but when I was reading up on Washington State’s “Tax the Ultra-Rich” ballot initiative and I saw it again this morning… well, you’re reading the results.

Behold Exhibit B:

In this case, there is no pretense that the pyramid slabs mean something about the number of returns that they represent. They’re not even separate slabs; take a look at the top. This pyramid is metaphorical, and it does have a nice color scheme.

But why not make an actual plot? Again, if you get a professional designer to work it over, it can have nice fonts and margins and all, but doesn’t my remix below get the point across better?

Here’s some code if you want to remix my remix.


Global Congress on Verbal Autopsy in 2011 open for abstract submission

Have you heard me say that Verbal Autopsy is an exemplary machine learning challenge? I think I say it about once a week.

Now there’s going to be a great forum for saying it. Read more here.

MCMC in Python: How to stick a statistical model on a system dynamics model in PyMC

A recent question on the PyMC mailing list inspired me.  How can you estimate transition parameters in a compartmental model?  I did a lit search for just this when I started up my generic disease modeling project two years ago.  Much information, I did not find.  I turned up one paper which said basically that using a Bayesian approach was a great idea and someone should try it (and I can’t even find that now!).

Part of the problem was language.  I’ve since learned that micro-simulators call it “calibration” when you estimate parameter values, and there is a whole community of researchers working on “black-box modeling plug-and-play inference” that do something similar as well.  These magic phrases are incantations to the search engines that help find some relevant prior work.

But I started blazing my own path before I learned any of the right words; with PyMC, it is relatively simple.  Consider the classic SIR model from mathematical epidemiology.  It’s a great place to start, and it’s what Jason Andrews started with on the PyMC list.  I’ll show you how to formulate it for Bayesian parameter estimation in PyMC, and how to make sure your MCMC has run for long enough.
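
To make the idea concrete without leaning on any particular PyMC version, here is a rough sketch in plain Python. It is not the PyMC formulation from the full post. It swaps in a hand-rolled random-walk Metropolis sampler, a forward-Euler discretization of the SIR equations with a time step of one, flat priors on (0, 1) for both rates, and a Gaussian observation error on the infected count. All of the function names, the priors, the noise model, and the synthetic data are my assumptions for illustration only.

```python
import math
import random


def simulate_sir(beta, gamma, s0, i0, r0, n_steps):
    """Forward-Euler SIR with unit time step; returns a list of (S, I, R)."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    traj = [(s, i, r)]
    for _ in range(n_steps):
        new_inf = beta * s * i / n   # S -> I flow during this step
        new_rec = gamma * i          # I -> R flow during this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        traj.append((s, i, r))
    return traj


def log_posterior(beta, gamma, observed_i, s0, i0, r0, sigma):
    """Flat priors on (0, 1) plus a Gaussian likelihood on infected counts.

    The result is correct up to an additive constant, which is all that
    Metropolis needs. The priors and noise model are assumptions.
    """
    if not (0.0 < beta < 1.0 and 0.0 < gamma < 1.0):
        return -math.inf
    traj = simulate_sir(beta, gamma, s0, i0, r0, len(observed_i) - 1)
    return sum(-0.5 * ((obs - i) / sigma) ** 2
               for obs, (_, i, _) in zip(observed_i, traj))


def metropolis(observed_i, s0, i0, r0, sigma, n_iter=5000, step=0.02, seed=0):
    """Random-walk Metropolis over (beta, gamma); returns every visited state."""
    rng = random.Random(seed)
    beta, gamma = 0.5, 0.5  # arbitrary starting point inside the prior support
    lp = log_posterior(beta, gamma, observed_i, s0, i0, r0, sigma)
    samples = []
    for _ in range(n_iter):
        b_new = beta + rng.gauss(0.0, step)
        g_new = gamma + rng.gauss(0.0, step)
        lp_new = log_posterior(b_new, g_new, observed_i, s0, i0, r0, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            beta, gamma, lp = b_new, g_new, lp_new
        samples.append((beta, gamma))
    return samples


if __name__ == "__main__":
    # Synthetic data: made-up "true" rates plus Gaussian noise.
    truth = simulate_sir(0.4, 0.1, 990, 10, 0, 30)
    noise = random.Random(1)
    observed = [i + noise.gauss(0.0, 20.0) for _, i, _ in truth]
    samples = metropolis(observed, 990, 10, 0, sigma=20.0)
    kept = samples[len(samples) // 2:]  # discard the first half as burn-in
    print("posterior mean beta: ", sum(b for b, _ in kept) / len(kept))
    print("posterior mean gamma:", sum(g for _, g in kept) / len(kept))
```

The structure is the part that carries over: a deterministic compartmental model produces a trajectory, a statistical model scores that trajectory against data, and the sampler proposes new rate parameters. Discarding the first half of the chain is a crude stand-in for a real check that the chain has run long enough.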


IHME and a Gates Foundation Critique

I was forwarded a recent article about the Gates Foundation and how it has partnered with news organizations like ABC News and The Guardian. And guess what? IHME makes an appearance in the second half of the second page! I wouldn’t say that it’s positive about my work, but I am delighted to see the technical appendix mentioned in print.

During my recent education in medicine, I’ve learned that an appendix is something that people think you don’t need. Also, if something goes wrong with it, it can kill you. And it’s true that the “webpendix” is 219 pages, but the bulk of that is pictures. The first 19 pages are a pretty decent stats paper about how we used Gaussian Processes to model really noisy time-series data.


Yearly percentage decline in mortality in children younger than 5 years between 1990 and 2010



Network Stats Continue

A couple of new papers on networks crossed my desk this week.   Well, more than a couple, since I’m PC-ing for the Web Algorithms Workshop (WAW) right now.  But a couple crossed my desk that I’m not reviewing, which means I can write about them.

Brendan Nyhan writes:

Just came across your blog post on Christakis/Fowler and the various critiques – thought this paper I just posted to arXiv with Hans Noel might also be of interest: The “Unfriending” Problem: The Consequences of Homophily in Friendship Retention for Causal Estimates of Social Influence

Unfriending is interesting, and an area that seems understudied.  In online social networks, there is often no cost to keep a tie in place.  The XBox Live friend network is not such a case: an XBox gamer sees frequent updates about their friends’ activities. That’s why I thought it made sense when I learned that the XBox Live social network does not exhibit the heavy-tailed degree distribution that has been widely reported in real-world networks. Someone should talk Microsoft into releasing an anonymized edition of this graph (if such an anonymization is possible…).

Meanwhile, Anton Westveld and Peter Hoff’s paper on modeling longitudinal network data caught my eye on arXiv: A Mixed Effects Model for Longitudinal Relational and Network Data, with Applications to International Trade and Conflict.

All the things I’d like to read… I could write a book about it. Before I even had time to finish writing this post, I saw another one: On the Existence of the MLE for a Directed Random Graph Network Model with Reciprocation.

Open Source for Voting gets the goods

Here’s a great summary of how an evaluation of Washington D.C.’s open-source voting system found and fixed security flaws just the way the open-source lovers said it would: Hacking the D.C. Internet Voting Pilot.

