Category Archives: global health

Life Expectancy by County in US

This recent study by my colleagues has been making headlines a lot last week, but I’m just getting to write about it now.  While I was busy, stories about it appeared in high-profile outlets like NPR and the Statistical Modeling, Causal Inference, and Social Science blog.

As I’ve been thinking for two years (according to the ancient post I pushed out the door yesterday), life expectancy is a weird statistic. Life expectancy at birth is not, as the name might imply, a prediction on the average length of the life of a baby born this year. It is something more complicated to describe, but easier to predict. I like to think of it as the length of life if you froze the world exactly the way it is right now, and the baby today was exposed to the mortality risk of today’s one-year-olds next year, today’s two-year-olds in two years, etc. Although, as a friend pointed out two weeks ago, this is not a really good way to look at things either, if you push the analogy too hard. Currently Wikipedia isn’t really helpful on this matter, but maybe it will be better in the future.

There is another interesting thing in this paper, which is the validation approach the authors used. Unfortunately, it’s full development is in a paper still in press. Here is what they have to say about it so far:

We validated the performance of the model by creating small counties whose “true” underlying death rates were known. We did this by treating counties with large populations (> 750,000) as those where death rates have little sampling uncertainty. We then repeatedly sampled residents and deaths from these counties (by year and sex) to construct simulated small-county populations. We used the above model to predict mortality for these small, sampled-down counties, which were then compared with the mortality of the original large county.

I believe that this is fully developed in the paper which they cite at the beginning of the modeling section, Srebotnjak T, Mokdad AH, Murray CJL: A novel framework for validating and applying standardized small area measurement strategies, submitted. From what I’ve heard about it, I like it.

Comments Off on Life Expectancy by County in US

Filed under global health

A simple optimization problem I don’t know how to solve (from DCP)

Inspired by the recent 8F workshop, I’m trying to write up theory challenges arising from global health. And I’m trying to do it with less background research, because avoiding foolishness is a recipe for silence.

This is the what I called the “simplest open problem in DCP optimization” in a recent post about DCP (Disease Control Priorities), but with more reflection, I should temper that claim. I’m not sure it is the simplest. I’m not sure it is an open problem. And I’m pretty sure that if we solve it, the DCP optimizers will come back with something more complicated.

But it is a nice, clean problem to start with. I’m calling it “Fully Stochastic Knapsack”. It looks just like the plain, old knapsack problem:
\max \bigg\{ \sum_{i=1}^n v_ix_i \qquad s.t. \quad \sum_{i=1}^n w_ix_i \leq W, \quad x_i \in \{0,1\} \bigg\}
The fully stochastic part is that everything that usually would be input data is now a probability distribution, and the parameters of the distribution are the input data.

This makes even deciding what to maximize a challenge. I was visiting the UW Industrial Engineering Dept yesterday, and Zelda Zabinsky pointed me to this nice INFORMS tutorial by Terry Rockafeller on “coherent approaches” to this.

7 Comments

Filed under combinatorial optimization, global health

DCP MIP

(This is a post that I started a long time ago (Aug 10, 2010), but hung on to because it is only half-baked. But based on the philosophy that avoiding foolishness is a recipe for silence, I’m sending it out into the world. Coming next: the simplest useful open problem in DCP optimization.)

Acronyms are like candy in Global Health, and Operations Research is full of them, too. This post is a place for me to collect information about software available for solving the Mixed Integer Programming (MIP) problem that the Disease Control Priorities (DCP) project is going to put together.

DCP is a massive effort to quantify the cost-effectiveness of hundreds of health interventions across tens of “health platforms” (big hospitals, small clinics, pharmacies, etc). The output of this large, coordinated effort will be (as far as I’m concerned) a collection of giant matrices, that say, for different subsets of interventions across different platforms, (1) the cost of setting up the interventions, (2) the cost of operating the interventions, (3) the health gain from the interventions. Undoubtedly, there will be some quantification of the uncertainty in each of these quantities. Also, there is something called platform improvement, which can be thought of as a special type of intervention that makes a bunch of other interventions more effective on a certain platform. And there are a number of side-constraints; some interventions come together on certain platforms, some interventions are mutually exclusive, etc.

Some unknowns: (how) will the uncertainty in the entries of these matrices be quantified? Are the intervention choices all “yes/no” or do you choose how much you want of some of them, i.e a non-negative continuous variable?

This is the optimization that mixed integer programming was born for (except for the uncertainty, which takes us into less charted waters). So how we are going to do it, in theory, is just the sort of thing my OR classes focused on when I was in grad school. We didn’t talk much about how to do it in practice. Some of the hard-working students who sat in the business school and actually solved large integer programs would mutter about CPLEX once in a while at parties, but I didn’t pay it much heed.

Heed I must now pay. So I am collecting up the available MIP solvers here, and (eventually) evaluating them for my DCP optimization task. Got suggestions? I would love to hear them.

  • Gurobi – 1 2 3
  • ILOG/CPLEX – 1 2
  • GAMS – 1
  • AMPL – 1
  • NEOS – 1

5 Comments

Filed under combinatorial optimization, global health

Algorithms in the Field Workshop

Last week I attended the workshop Algorithms in the Field or “8F”, as its puzzling acronym turns out to be. David Eppstein published comprehensive notes on the talks on his blog:

I like to think that I’m doing “8F” in my global health job, and also some algorithms in the forest, algorithms in the desert, etc. But this workshop gave me a chance to think about the connection to Algorithms with a capitol “A”, the kind going on in the theory group of the ivory towered computer science department. I’ve been pretty successful at coming up with little challenges in global health where algorithmic thinking is useful (e.g. Doctors = Noise Machines), and I’m going to try to use the blog to throw more of these puzzles over the fence in the next few weeks. My barrier to doing this in the past has been the amount of background research I need to do to avoid sounding foolish. But my writing style was never suited to caution. Let’s see how it goes. I hope that even half-explained connections between Algorithms and global health will inspire some algorithmitician to fill in the details.

What I challenge myself to do more of is to go beyond the little puzzles, and synthesize something bigger to ask from algorithmists. What algorithmic innovation would really change how we’re doing things in Global Health? This is a domain where avoiding foolishness is even more of a recipe for silence. But I will try.

On my mind now: the computation time necessary to fit a model. 20 seconds, 20 minutes, or 20 hours is a really big difference. More on that thought to come.

Question to readers from Global Health Departments, what do you think capitol-A algorithms researchers can offer us?

Comments Off on Algorithms in the Field Workshop

Filed under global health, TCS

In the media

Jake called my attention to a recent public interest story in the NYTimes, about using the results of an on-going telephone survey on predict the demographic profile of happiest man in America. It seems like they’re making fun of simple models, a pursuit which I approve of:

The New York Times asked Gallup to come up with a statistical composite for the happiest person in America, based on the characteristics that most closely correlated with happiness in 2010. Men, for example, tend to be happier than women, older people are happier than middle-aged people, and so on.

Gallup’s answer: he’s a tall, Asian-American, observant Jew who is at least 65 and married, has children, lives in Hawaii, runs his own business and has a household income of more than $120,000 a year.

But the joke is that they made a few phone calls and found the guy… he looks pretty happy.

1 Comment

Filed under global health, statistics

In the media

Jake Marcus, a student I’ve been working with since I came to IHME, has an essay in The New Republic! It’s about political movements and fighting disease, and it’s his answer to the question Why isn’t there a global movement to combat noncommunicable diseases? The answer: it’s complicated.

Jake says credit for his article should also go to another of our coworkers, Steve Lim, who helped him understand some of the complications.

Comments Off on In the media

Filed under global health

Backcalculations on the Price of Life

Peter Singer has a short article on the way our society implicitly values human lives. Very clear and very quantitative. His calculations conclude with this:

The US regards the life of an American as equivalent to the lives of 144 Afghans.

Comments Off on Backcalculations on the Price of Life

Filed under global health

Videos of the GHME

The Global Health Metrics and Evaluation conference that I attended two weeks ago was scrupulously videotaped, and most sessions are now online. I had a really good time at the session Responsible data sharing and strengthening country capacity for analysis, which started with this unusual framing by moderator Elizabeth Pisani:

1 Comment

Filed under global health, science policy

World Malaria Report and MCMC

OMG I have got busy. I went to NIPS and the weekend disappeared and now it’s post-doc interview season again, already! So much to say, but I plan to pace myself. For this short post, an exciting announcement that my model of the insecticide treated mosquito net distribution supply chain was used in the WHO 2010 World Malaria Report, which just came out. Since it is a Bayesian statistical model that draws samples from a posterior distribution with MCMC, it’s really nice that the report includes some of the uncertainty intervals around the coverage estimates. Guess what? There is a lot of uncertainty. But nets are getting to households and getting used. Pages 19 and 20 in Chapter 4 have the results of our hard work.

3 Comments

Filed under global health, MCMC

MCMC in Python: Statistical model stuck on a stochastic system dynamics model in PyMC

My recent tutorial on how to stick a statistical model on a systems dynamics model in PyMC generated a good amount of reader interest, as well as an interesting comment from Anand Patil, who writes:

Something that might interest you is that, in continuous-time stochastic differential equation models, handling the unobserved sample path between observations is really tricky. Roberts and Stramer’s On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm explains why. This difficulty can apply to discrete-time models with loads of missing data as well. Alexandros Beskos has produced several really cool solutions.

This body of research is quite far from the vocabulary I am familiar with, so I’m not sure how serious a problem this could be for me. It did get me interested in sticking my statistical model to a systems model with stochastic dynamics, though, something which took only a few additional lines… thanks PyMC!

## Stochastic SI model

from pymc import *
from numpy import *

#observed data
T = 10
susceptible_data = array([999,997,996,994,993,992,990,989,986,984])
infected_data = array([1,2,5,6,7,18,19,21,23,25])

# stochastic priors
beta = Uniform('beta', 0., 1., value=.05)
gamma = Uniform('gamma', 0., 1., value=.001)
tau = Normal('tau', mu=.01, tau=100., value=.01)

# stochastic compartmental model
S = {}
I = {}

## uninformative initial conditions
S[0] = Uninformative('S_0', value=999.)
I[0] = Uninformative('I_0', value=1.)

## stochastic difference equations for later times
for i in range(1,T):
    @deterministic(name='E[S_%d]'%i)
    def E_S_i(S=S[i-1], I=I[i-1], beta=beta):
        return S - beta * S * I / (S + I)
    S[i] = Normal('S_%d'%i, mu=E_S_i, tau=tau, value=E_S_i.value)

    @deterministic(name='E[I_%d]'%i)
    def E_I_i(S=S[i-1], I=I[i-1], beta=beta, gamma=gamma):
        return I + beta * S * I / (S + I) - gamma * I
    I[i] = Normal('I_%d'%i, mu=E_I_i, tau=tau, value=E_I_i.value)

# data likelihood
A = Poisson('A', mu=[S[i] for i in range(T)], value=susceptible_data, observed=True)
B = Poisson('B', mu=[I[i] for i in range(T)], value=infected_data, observed=True)

This ends up taking a total of 6 lines more than the deterministic version, and all the substantial changes are from lines 24-34. So, question one is for Anand, do I have to worry about unobserved sample paths here? If I’ve understood Roberts and Stramer’s introduction, I should be ok. Question two returns to a blog topic from one year ago, that I’ve continued to try to educate myself about: how do I decide if and when this more complicated model should be used?

Comments Off on MCMC in Python: Statistical model stuck on a stochastic system dynamics model in PyMC

Filed under global health, MCMC, statistics