I recently read The Girl With the Dragon Tattoo series, which was extremely engrossing. The first book has a bit of a health metrics theme, with each section prefaced with a shocking statistic about violence against women in Sweden. The second book has a bit of a math theme, with each section prefaced by a correct, if inane, algebraic equation.
Also in the second book, the tattooed girl spends some time reading a strangely titled math book, Dimensions in Mathematics, and I liked the story enough to google the book, since it was presented with author and publisher. It turned out that this just revealed more mystery.
Besides the marvelous upgrade to ipython, there were some other things I saw at SciPy 2011 that I want to remember to remember.
I think I’ll have a lot more to say about Dexy soon, because I really need something like that. A tool to make documentation sexy. If only the tool itself had more documentation!
Speaking of SciPy 2011 (as I was in my last post), the coolest, most jaw-dropping-est demo I saw there was hands-down for the new ipython. The most cutting edge stuff is available on the web. I want it.
I just returned from the SciPy 2011 conference in Austin. Definitely a different experience than a theory conference, and definitely different than the mega-conferences I’ve found myself at lately. I think I like it. My goal was to evangelize for PyMC a little bit, and I think that went successfully. I even got to meet PyMC founder Chris Fonnesbeck in person (about 30 seconds before we presented a 4 hour tutorial together).
For the tutorial, I put together a set of PyMC-by-Example slides and code to dig into that silly relationship between Human Development Index and Total Fertility Rate that foiled my best attempts at Bayesian model selection so long ago.
I’m not sure the slides stand on their own, but together with the code samples they should reproduce my portion of the talk pretty well. I even started writing it up for people who want to read it in paper form, but then I ran out of momentum. Patches welcome.
20 seconds, 20 minutes, or 20 hours. At one time or another, a computational method I’ve been working on has taken each of these amounts of time to finish processing. They each lead to a very different experience for the model developer, and probably in the end for the model, too. Twenty seconds is definitely what I prefer.
Just like last summer, many of the Post-Bachelors Fellows of IHME are away now to learn where global health metrics come from. Spencer James has a great photoblog from his work in Zambia. Are there other PBFs that I can follow from afar?
I’m excited to report that my first contribution back to the PyMC codebase was accepted. 🙂
It is a slight reworking of the pymc.Matplot.plot function that makes it include autocorrelation plots of the trace, as well as histograms and timeseries. I also made the histogram look nicer (in my humble opinion).
In this example, I can tell that MCMC hasn’t converged from the trace of beta_2 without my changes, but it is dead obvious from the autocorrelation plot of beta_2 in the new version.
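To make concrete what an autocorrelation plot of a trace is showing, here is a minimal pure-Python sketch. It is not the code inside pymc.Matplot.plot; the function name `autocorrelation` and the two toy traces are made up for illustration. A well-mixed chain should have autocorrelation that drops to near zero after a few lags, while a chain that is still wandering stays close to 1.

```python
import random


def autocorrelation(trace, max_lag=50):
    """Sample autocorrelation of a 1-d trace at lags 0..max_lag (hypothetical helper)."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("trace is constant; autocorrelation is undefined")
    max_lag = min(max_lag, n - 1)
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / (n * variance)
        for k in range(max_lag + 1)
    ]


random.seed(0)

# toy stand-in for a converged chain: independent draws
well_mixed = [random.gauss(0, 1) for _ in range(2000)]

# toy stand-in for a chain that has not converged: a random walk
wandering = []
level = 0.0
for _ in range(2000):
    level += random.gauss(0, 1)
    wandering.append(level)

acf_mixed = autocorrelation(well_mixed, max_lag=5)
acf_wandering = autocorrelation(wandering, max_lag=5)
print("well mixed:", [round(r, 2) for r in acf_mixed])
print("wandering: ", [round(r, 2) for r in acf_wandering])
```

Plotting these values as bars against the lag gives the kind of picture where a stuck chain is hard to miss.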
The process of making changes to the pymc source code is something that has intimidated me for a while. Here are the steps in my workflow, in case it helps you get started doing this, too.
# first fork a copy of pymc from https://github.com/pymc-devs/pymc.git on github,
# then clone it (substitute the URL of your own fork here)
git clone https://github.com/pymc-devs/pymc.git
# then use virtualenv to make an isolated environment, so you can install pymc
# without being root and make changes without breaking everything else
# (the environment name env_pymc_dev is only a placeholder)
virtualenv env_pymc_dev
source env_pymc_dev/bin/activate
cd pymc
python setup.py install
cd ..
# to test that it is working, start python from outside the source directory
python
>>> import pymc
>>> pymc.test()
# then make changes to pymc...
# to test the changes, and make sure that all of the tests that used to pass still do,
# repeat the install and test steps above
cd pymc
python setup.py install
cd ..
python
>>> import pymc
>>> pymc.test()
# once everything is perfect, push it to a public git repo and send a "pull request" to the pymc developers.
Is there an easier way? Let me know in the comments.
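One small variation, offered as a sketch and not as anything the PyMC developers recommend: newer Pythons ship a `venv` module in the standard library that can stand in for virtualenv, so the isolation step can be scripted. The helper name `make_dev_env` and the directory name `env_pymc_dev` are placeholders of mine.

```python
import os
import tempfile
import venv


def make_dev_env(env_dir):
    """Create an isolated environment in env_dir and return the path to its python."""
    # with_pip=False keeps this quick and offline; use with_pip=True to bootstrap pip too
    venv.create(env_dir, with_pip=False)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


# build a throwaway environment in a scratch directory
scratch = tempfile.mkdtemp()
dev_python = make_dev_env(os.path.join(scratch, "env_pymc_dev"))
print(dev_python)
```

Running `setup.py install` with the interpreter at that path would then install into the isolated environment and leave the system Python alone.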
Tom Paulson, the global health journalist behind the NPR blog Humanosphere, has been taking on some very non-transparent (opaque?) rules from the Pacific Health Summit here in Seattle. Fortunately, he took a break to laud the transparency with which the institute I’m working at operates. Maybe he thinks we can be an example for the summitteers, or at least a counterbalance.
Paulson didn’t mention the aspect of IHME’s work which, as an ivory-tower inhabiting academic, I find most radically transparent, however. The journal Population Health Metrics, of which IHME director Chris Murray is co-editor-in-chief and a big booster, has a scarily open review process. It’s not just open publishing where everyone can read the papers, it’s so open that everyone can read the referee reports, and the responses to referees, and the whole chain of revisions that a paper goes through before being stamped peer-reviewed.
This is great for authors. As a referee, it makes me much more responsible for my actions, which takes longer, but is probably a good thing overall. I even put some PyMC code in a review once, to tell the authors how to do something the easy way. But now I’m not sure I want to go look at this correspondence after all.
Speaking of IE, which usually stands for Industrial Engineering, but at Cornell now stands for Information Engineering (or OR/MS, which stands for Operations Research/Management Sciences), there is a subdiscipline of global health that is going through a familiar search for the perfect name. It is actually somewhat related to OR/MS, and it would definitely fit in at an INFORMS meeting. For a while it was going to be called “Operational Research”, which I find confusing, since this is the old European name for Operations Research. But now it seems like they too want to have “science” in the name. The new contenders are “Implementation Science” and “Program Science”.
Any thoughts from veterans of the “OR/MS” naming process that I should share with my colleagues in a similar situation?
I adapted my “Theoretical Computer Science Challenges in Global Health” talk for the UW Industrial Engineering department a couple of weeks ago. Instead of 10 minutes on noisy sorting for disability weight estimation, now it has 10 minutes on stochastic optimization for disease control priorities. I consider it still a work in progress, but I do have a nice recording of the lecture if anyone wants some relaxing viewing: