If you’ve read some of my previous posts, you might be wondering, what does Health Metrics have to do with sampling independent sets in graphs? (What is Health Metrics? you might also be wondering.)
In my new job, I’m not that interested in sampling independent sets. I’m mostly interested in sampling from a weird distribution that comes out of a Bayesian denoising problem.
Let me set the stage: a huge project that IHME (the Institute for Health Metrics and Evaluation) is part of is the Global Burden of Disease study, which will (among other things) rank 200 diseases and injuries according to how severely they impact humanity. How you could possibly, ethically make this list is the topic of many books, and I won’t try to get into it now. IHME director Chris Murray seems to favor measuring impact in “disability-adjusted life years” (DALYs), which is a fairly individualistic, fairly egalitarian approach.
Most any approach for measuring disease impact that you can dream up is going to require some serious data as input. One thing that seems essential is an idea of how many people get the disease (prevalence) and how long they have it for (duration).
This information is recorded, but it is sporadic, systematically biased, and very, very noisy. It is sometimes available from multiple independent sources, and often the different sources do not agree very well.
But if you think you know a little bit about how diseases behave, then there are some consistency conditions that the measurements should satisfy. I’m not talking about deep knowledge of epidemiology here. I mean things like this: the number of people sick at the beginning of last year, plus the number of people who got sick during the course of last year, should equal the number of people sick at the beginning of this year (minus the number who got better or died… guess it’s not that simple).
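To make that bookkeeping concrete, here is a minimal sketch in Python. The function names and the numbers in the usage note are made up for illustration; this is just the arithmetic from the paragraph above, not anything taken from DisMod.

```python
def sick_at_start_of_next_year(sick_at_start, new_cases, recovered, died):
    """People enter the sick pool by getting sick and leave it by recovering or dying."""
    return sick_at_start + new_cases - recovered - died


def inconsistency(reported_now, reported_next, new_cases, recovered, died):
    """How far the reported count for next year is from what the other reports imply.

    Zero means the reports add up; anything else means at least one of them is off.
    """
    implied_next = sick_at_start_of_next_year(reported_now, new_cases, recovered, died)
    return reported_next - implied_next
```

For example, with 1000 people reported sick at the start of the year, 200 new cases, 150 recoveries, and 30 deaths, the implied count a year later is 1020. If another source reports 1100, the two disagree by 80 people, and the question is which numbers to believe.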
So, if these things don’t add up, what do you do? Well, if you are predisposed towards Bayesianism, you do what DisMod III is going to do: you try to find the posterior distribution of the true values, given the data and your prior beliefs about what the true values should look like.
And how are we going to get a picture of the marginal posterior distributions? Well, MCMC is the only game I know for that. Okay, I know that message passing approaches like belief propagation might be able to do it, too. I’ll have to look into that.
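Here is a toy sketch of what that looks like, using a plain random-walk Metropolis sampler. Everything in it is an assumption made for illustration: the four "reports" are made-up numbers, the noise is taken to be Gaussian with known standard deviations, the prior is flat over nonnegative values, and people who recover or die are lumped into a single `out` count. It is not the DisMod III model, just the smallest example I could think of where the consistency condition does some work.

```python
import math
import random

# Hypothetical noisy reports (made-up numbers, not real data).
observed = {"p0": 1000.0, "inc": 200.0, "out": 180.0, "p1": 1100.0}
# Assumed measurement noise (standard deviation) for each report.
noise_sd = {"p0": 50.0, "inc": 30.0, "out": 30.0, "p1": 50.0}


def log_posterior(p0, inc, out):
    """Unnormalized log posterior of the true values given the reports."""
    # Consistency condition, imposed by construction:
    # sick next year = sick now + newly sick - (recovered or died).
    p1 = p0 + inc - out
    # Flat prior over nonnegative counts; negative counts are impossible.
    if min(p0, inc, out, p1) < 0:
        return -math.inf
    truth = {"p0": p0, "inc": inc, "out": out, "p1": p1}
    # Gaussian log likelihood, dropping constants that do not depend on the truth.
    return sum(-0.5 * ((observed[k] - truth[k]) / noise_sd[k]) ** 2 for k in observed)


def metropolis(n_steps=20000, step_sd=20.0, seed=0):
    """Random-walk Metropolis over (p0, inc, out); returns the list of visited states."""
    rng = random.Random(seed)
    state = (observed["p0"], observed["inc"], observed["out"])
    logp = log_posterior(*state)
    samples = []
    for _ in range(n_steps):
        proposal = tuple(x + rng.gauss(0.0, step_sd) for x in state)
        logp_new = log_posterior(*proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            state, logp = proposal, logp_new
        samples.append(state)
    return samples


samples = metropolis()
# Marginal posterior of next year's prevalence, summarized by its mean.
p1_draws = [p0 + inc - out for (p0, inc, out) in samples]
p1_mean = sum(p1_draws) / len(p1_draws)
```

In this toy setup the first three reports imply 1020 people sick next year, while the direct report says 1100. The posterior for `p1` should settle somewhere in between, pulled toward whichever side has less noise. A histogram of `p1_draws` is the kind of picture of a marginal posterior I have in mind.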
Update: Slides from a talk on how this project is advancing, from JSM 2010