GBD 2010

The massive project I’ve been working on since moving from math to global health has been published!

The Global Burden of Disease Study 2010 (GBD 2010) is the largest ever systematic effort to describe the global distribution and causes of a wide array of major diseases, injuries, and health risk factors. The results show that infectious diseases, maternal and child illness, and malnutrition now cause fewer deaths and less illness than they did twenty years ago. As a result, fewer children are dying every year, but more young and middle-aged adults are dying and suffering from disease and injury, as non-communicable diseases, such as cancer and heart disease, become the dominant causes of death and disability worldwide. Since 1970, men and women worldwide have gained slightly more than ten years of life expectancy overall, but they spend more years living with injury and illness.

GBD 2010 consists of seven Articles, each containing a wealth of data on different aspects of the study (including data for different countries and world regions, men and women, and different age groups), while accompanying Comments include reactions to the study’s publication from WHO Director-General Margaret Chan and World Bank President Jim Yong Kim. The study is described by Lancet Editor-in-Chief Dr Richard Horton as “a critical contribution to our understanding of present and future health priorities for countries and the global community.”

Now I have to get my book about the methods out the door as well…

Automated Quality Assurance for Mobile Data Collection

I’m excited to call your attention to a paper that my co-author Ben Birnbaum is presenting next week at the ACM DEV conference:

This research is about… well, the title says it pretty clearly. I’m interested in using our approach to detect surprises in data quality in all kinds of settings. Ben did the heavy lifting for this paper, so he deserves most of the credit for its receiving the best paper award from the DEV 2012 program committee.

Congratulations, Ben!

Flock of VA papers

I’m afraid that Healthy Algorithms will be pretty quiet over the next month; I’ve got some other major writing commitments to attend to, and I need to ration my keystrokes if I’m going to make the deadline.

But here is something I’m happy to leave at the top of the page while I’m busy: the special issue of Population Health Metrics devoted to the Verbal Autopsy is provisionally available.

This includes the paper on using random forests for computer coding verbal autopsies that I’ve mentioned before, a paper describing the massive efforts that went into collecting a verbal autopsy validation dataset, and a paper on our take on the metrics of prediction quality that we recommend for any approach to verbal autopsy.

Bonus, a commentary that quotes Foucault to put random forests in context.

Random Forest Verbal Autopsy Debut

I just got back from a very fun conference, which was the culmination of some very hard work, all on the Verbal Autopsy (which I’ve mentioned often here in the past).

In the end, we managed to produce machine learning methods that rival the ability of physicians. Forget Jeopardy; this is a meaningful victory for computers. Now Verbal Autopsy can scale up without pulling human doctors away from their work.

Oh, and the conference was in Bali, Indonesia. Yay global health!

I do have a Machine Learning question that has come out of this work; maybe one of you can help me. The thing that makes VA most different from the machine learning applications I have seen in the past is the large set of values the labels can take. For neonatal deaths, for which the set is smallest, we were hoping to make predictions out of 11 different causes, and we ended up thinking that maybe 5 causes is the most we could do. For adult deaths, we had 55 causes on our initial list. There are two standard approaches that I know of for converting binary classifiers to multiclass classifiers, and I tried both. Random Forest can produce multiclass predictions directly, and I tried this, too. But the biggest single improvement to all of the methods I tried came from a post-processing step that I have not seen in the literature, and I hope someone can tell me what it is called, or at least what it reminds them of.
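
To make the standard reductions concrete, here is a minimal sketch on synthetic data; scikit-learn is just one convenient way to write it down, and none of this is the code we actually used. The two usual reductions are one-vs-rest and one-vs-one, and a random forest can also take the multiclass labels directly:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

    # Synthetic stand-in for VA data: 1000 "deaths", 40 symptom indicators, 11 causes
    X, y = make_classification(n_samples=1000, n_features=40, n_informative=20,
                               n_classes=11, n_clusters_per_class=1, random_state=0)

    # Two standard reductions from binary to multiclass classification
    ovr = OneVsRestClassifier(RandomForestClassifier(n_estimators=100)).fit(X, y)
    ovo = OneVsOneClassifier(RandomForestClassifier(n_estimators=100)).fit(X, y)

    # Random forest handles the multiclass labels directly, too
    rf = RandomForestClassifier(n_estimators=100).fit(X, y)

    for name, clf in [('one-vs-rest', ovr), ('one-vs-one', ovo), ('direct', rf)]:
        print(name, (clf.predict(X) == y).mean())  # in-sample accuracy, just to show the API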

For any method that produces a score for each cause, what we ended up doing is generating a big table with scores for a collection of deaths (one row for each death) for all the causes on our cause list (one column for each cause). Then we calculated the rank of the scores down each column, i.e., was it the largest score seen for this cause in the dataset, the second largest, etc. To predict the cause of a particular death, we then looked across the row corresponding to that death and found the column with the best rank. This can be interpreted as a non-parametric transformation from scores into probabilities, but saying it that way doesn’t make it any clearer why it is a good idea. It is a good idea, though! I have verified that empirically.
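
In code, that post-processing step is only a few lines. Here is a small NumPy sketch of the idea on a made-up score matrix (not our actual implementation): rank each column’s scores across the dataset, then assign each death the cause where its score ranks best.

    import numpy as np

    def rank_based_predictions(scores):
        """scores: array of shape (n_deaths, n_causes), higher = more likely.

        Rank each column so that 0 means "the largest score seen for this
        cause in the dataset", then predict, for each row, the cause where
        that death's rank is best (smallest).
        """
        order = np.argsort(-scores, axis=0)     # rows in descending score order, per column
        ranks = np.empty_like(order)
        for j in range(scores.shape[1]):
            ranks[order[:, j], j] = np.arange(scores.shape[0])
        return ranks.argmin(axis=1)             # best rank across each row

    # toy example: 4 deaths, 3 causes
    scores = np.array([[0.9, 0.8, 0.1],
                       [0.2, 0.7, 0.3],
                       [0.1, 0.6, 0.9],
                       [0.3, 0.95, 0.2]])
    print(rank_based_predictions(scores))       # -> [0 2 2 1]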

So what have we been doing here?

World Malaria Report and MCMC

OMG I have got busy. I went to NIPS and the weekend disappeared and now it’s post-doc interview season again, already! So much to say, but I plan to pace myself. For this short post, an exciting announcement: my model of the insecticide-treated mosquito net distribution supply chain was used in the WHO 2010 World Malaria Report, which just came out. Since it is a Bayesian statistical model that draws samples from a posterior distribution with MCMC, it’s really nice that the report includes some of the uncertainty intervals around the coverage estimates. Guess what? There is a lot of uncertainty. But nets are getting to households and getting used. Pages 19 and 20 in Chapter 4 have the results of our hard work.

Applied Approximate Counting: Malaria

My first first-authored global health paper came out today (I consider it my first “first-authored” paper ever, since the mathematicians I’ve worked with deviantly list authorship in alphabetical order regardless of seniority and contribution). It’s a bit of a mouthful by title: Rapid Scaling Up of Insecticide-Treated Bed Net Coverage in Africa and Its Relationship with Development Assistance for Health: A Systematic Synthesis of Supply, Distribution, and Household Survey Data.

What I find really pleasing about this research paper is the way it continues research I worked on in graduate school, but in a completely different and unexpected direction. Approximate counting is something that my advisor specialized in, and he won a big award for the random polynomial time algorithm for approximating the volume of convex bodies. I followed in his footsteps when I was a student, and I’m still doing approximate counting, it’s just that now, instead of approximating the amount of high-dimensional sand that will fit in an oddly shaped high-dimensional box, I’ve been approximating the number of insecticide-treated bednets that have made it from manufacturers through the distribution supply-chain and into the households of malaria-endemic regions of the world. I’m even using the same technique, Markov-chain Monte Carlo.
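
As a teaser for those computational details, here is a toy PyMC 2-style sketch of the kind of calculation involved; the model structure and all of the numbers are invented for illustration and are nothing like the full model in the paper. A latent count of nets in households is bounded by reported supply, noisily observed by a survey, and summarized with a posterior uncertainty interval from MCMC samples:

    import numpy as np
    import pymc as mc

    supply_reported = 5.0   # nets shipped to one country-year (millions), assumed known
    survey_coverage = 3.2   # nets found in households by a survey (millions), measured with error

    # Latent quantity: nets that actually reached households, at most what was supplied
    nets_in_households = mc.Uniform('nets_in_households', lower=0, upper=supply_reported)

    # Likelihood: the survey measures the latent quantity with Gaussian noise (sd = 0.5)
    survey_obs = mc.Normal('survey_obs', mu=nets_in_households, tau=1.0 / 0.5**2,
                           value=survey_coverage, observed=True)

    # Draw posterior samples with MCMC and summarize the uncertainty
    m = mc.MCMC([nets_in_households, survey_obs])
    m.sample(iter=20000, burn=5000, thin=5)
    trace = m.trace('nets_in_households')[:]
    print(trace.mean(), np.percentile(trace, [2.5, 97.5]))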

I’ve been itching to write about the computational details of this research for a while, and now that the paper’s out, I will have my chance. But for today, I refer you to the PLoS Med paper, and the technical appendix, and the PyMC code on github.

Child Mortality Paper

Check it out, my first published research in global health: Neonatal, postneonatal, childhood, and under-5 mortality for 187 countries, 1970–2010: a systematic analysis of progress towards Millennium Development Goal 4. I’m the ‘t’ in et al., and my contribution was talking them into using the really fun Gaussian Process in their model (and helping do it).

I’ve long wanted to write a how-to style tutorial about using Gaussian Processes in PyMC, but time continues to be on someone else’s side. Instead of waiting for that day, you can enjoy the GP Users Guide now.
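
Until that tutorial exists, here is the ten-line version of what makes GPs fun; this is a plain NumPy sketch of drawing smooth random functions from a GP prior with a squared-exponential covariance, not the PyMC gp module itself:

    import numpy as np

    x = np.linspace(1970, 2010, 81)           # e.g. a grid of years
    amp, scale = 1.0, 10.0                    # amplitude and length-scale of the covariance

    # Squared-exponential covariance matrix, plus a little jitter for numerical stability
    C = amp**2 * np.exp(-((x[:, None] - x[None, :]) / scale)**2 / 2.0)
    C += 1e-8 * np.eye(len(x))

    # Three draws from the zero-mean GP prior N(0, C); plot each column against x to see them
    L = np.linalg.cholesky(C)
    samples = L.dot(np.random.randn(len(x), 3))
    print(samples.shape)                      # (81, 3)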

Teleportation Measurements

I’m not attending WWW this week, but I am promoting a paper that I helped with, Tracking the random surfer: Empirically measured teleportation parameters in PageRank. My main contribution was connecting the people with the idea to the people with the data, but I’m happy with the results.

Incidentally, this sort of measurement has a great application in health metrics. But I’m going to keep it secret for a little while, to get my thoughts in order.

Verbal Autopsy Challenge from AI-D

I was down in Palo Alto last week to attend the AAAI session on Artificial Intelligence for Development. The proceedings should be available online soon.

I was there to connect with other theoretical computer scientists and find out how they have been applying machine learning to “development”. It turned out that, in this crowd, development mostly means applications to health, education, and agriculture.

I was also there to share a very concrete challenge problem that I’ve been dabbling in here at IHME, on which my colleague Sean Green presented our short paper: the Verbal Autopsy.

Instead of recapping the problem in detail here, I’ll point you to our paper, and try to say just enough to get you interested.

A useful metaphor for explaining MCMC

I work in an interdisciplinary institute, and you should see the fun when mathematicians, statisticians, and physicists try to discuss models and methods for health metrics, each using the dialect of their own field. And then throw doctors and epidemiologists into the mix, with the understanding that doctors secretly think scientists might not be smart enough to be doctors and vice versa.

It’s here where I think this metaphor my officemate and I were just trying out will be really useful. Markov Chain Monte Carlo (MCMC) is this foundational technique in my work lately, the central algorithm I have been using for sampling from the posterior distribution of all of my models. But “how does it work?”, my non-MCMC colleagues sometimes dare to ask me. (Or more frequently lately, “why doesn’t it work?”)

To explain by way of analogy, imagine that the posterior probability density of the model is a mountain, with higher-probability parameters corresponding to points of higher elevation. Our goal is to summarize the topography of the mountain. Many of my colleagues are familiar with “hill-climbing algorithms”, wherein the algorithm looks for the mountain peak by taking the steepest path up from wherever it currently stands. (Familiar because they have been using algorithms that do this, and often, since this is the Pacific Northwest, because they spend their weekends doing this themselves on actual mountains.)

MCMC is an approach that explores the mountain with a “drunken walk”, one carefully designed to stand at points of a given elevation for an amount of time proportional to the elevation. I love the visual: drunken mountain climbing.
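
If you would rather see the drunken walk without any mountains, here is a tiny random-walk Metropolis sketch in NumPy; it is a generic illustration, not one of my actual models. The walker staggers to a nearby point, always accepts uphill moves, and accepts downhill moves with probability equal to the ratio of the elevations, so in the long run it stands at each point for time proportional to its elevation:

    import numpy as np

    def elevation(x):
        """The "mountain": an unnormalized posterior density with two peaks."""
        return np.exp(-(x - 1.0)**2) + 0.5 * np.exp(-(x + 2.0)**2 / 0.5)

    def drunken_walk(n_steps=50000, step_size=1.0):
        """Random-walk Metropolis: stand at each x for time proportional to elevation(x)."""
        x = 0.0
        trace = np.empty(n_steps)
        for i in range(n_steps):
            x_new = x + step_size * np.random.randn()            # stagger to a nearby point
            if np.random.rand() < elevation(x_new) / elevation(x):
                x = x_new                                        # uphill: always; downhill: sometimes
            trace[i] = x
        return trace

    trace = drunken_walk()
    print(trace.mean(), np.percentile(trace, [2.5, 97.5]))       # summarize the topography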

Then, as Nate and I were just discussing, the “why does/doesn’t it work” question has an analogical answer summarized by these pictures:


Which mountain are you trying to climb drunk?
