Monthly Archives: July 2010

9 Hours to Numeracy

I’m helping to plan an Introduction to Statistics for the post-bachelors fellows arriving next month. Because these recent college graduates come from a wide range of backgrounds, I’m approaching it as a short course on numeracy, focused on statistics (we’ve got about 9 hours of lecture time scheduled for it). This will be complemented by a very hands-on dose of Stata, but I’m going to try not to think about that.

My favorite numeracy-in-stats book is a dusty classic, and it would have survived on its name alone: How to Lie with Statistics. I wonder if that title is too cheeky for global health applications when the numbers really matter…

Do you know this book, and do you like it? Or is there a more modern book or article that I should consider instead? What would you pack into 9 hours of stats numeracy training? Tell me.


Filed under education

More PBFs out of Seattle

For those of you interested in hearing more about the summer travels of the IHME post-bachelors fellows, I alert you to the existence of these blogs:


Filed under education

Guest post from “the field”

One cool program here at IHME is the field placement for our Post-Bachelors Fellows: a roughly six-week stint during the summer of their second year, when they travel from Seattle to some distant place to see where the numbers we’ve been crunching come from. Kyle Foreman is in Sri Lanka doing this now, and here is a guest post he’s written about an ICT4D challenge he’s run into there and wants your ideas on:

I’m spending this summer in Sri Lanka working with the Ministry of Health and the community health department of the University of Peradeniya, observing how Sri Lanka’s medical record keeping and vital statistics system works. They’d like me to make some suggestions on how it could be improved, so I was hoping to get some feedback on how to make this work.

Here’s the problem: keeping track of something as simple as the number of people who die each year is very difficult here. Patient records are kept at each hospital, then they are tabulated and sent to a regional office, then tabulated at a district office, ad nauseam, until they finally reach the national level. It takes literally years (they just finished the 2006 returns), is full of errors (because they do it all by hand), and is very incomplete (because every step along the way there’s further tabulation which strips away valuable data). They thus have difficulty identifying problems (especially outbreaks), targeting resources, and assessing the outcomes of their efforts.


Filed under global health

Big Week for Healthy Algorithms Posts

I’ve been in meetings literally all day, but I’ve got so much to say that I’m still here… IHME put out a newsletter yesterday, and it’s got me in the front cover photo. How can I pass up announcing that? It’s mostly for my mom, but PyMC fans might also appreciate the shoutout I managed to give Anand Patil, who authored the PyMC Gaussian Process package that I’ve been urging people to use lately.


Filed under general

The GBD 2010 Health Measurement Survey is here

Have you got 15 minutes for science? Take this strange survey that I’ve been excited about for the last two years. I’ve been calling it the Disability Weights Survey, but now that it’s all professionally implemented and communications-department approved, it is officially the GBD 2010 Health Measurement Survey.

The survey is part of the Global Burden of Diseases, Injuries, and Risk Factors Study 2010 led by the Institute for Health Metrics and Evaluation (IHME) at the University of Washington, in collaboration with four other leading institutions: Harvard University, Johns Hopkins University, the University of Queensland, and the World Health Organization.

Our goal is to collect responses from at least 50,000 people worldwide. Please consider sharing this and encouraging participation within your organization. In addition, we would ask you to consider forwarding information about this survey to colleagues and contacts outside your organization who might be interested in participating.

The survey takes about 15 minutes to complete. Participation is completely voluntary and anonymous. In the near future, we hope to translate the survey into additional languages.


Filed under global health

Salt: Bad Dialogue and Worse

I had a break yesterday to see one of those “summer blockbusters”, a spy flick starring Angelina Jolie called Salt. It had some good explosions and good action, but overall it was so outrageously terrible that I will reveal the entire cloak-and-dagger twist to complain. (spoiler ahead)


Filed under education, videos

What is this “effects” business?

I’ve got to figure out what people mean when they say “fixed effect” and “random effect”, because I’ve been confused about it for a year and I’ve been hearing it all the time lately.

Bayesian Data Analysis is my starting guide, which includes a footnote on page 391:

The terms ‘fixed’ and ‘random’ come from the non-Bayesian statistical tradition and are somewhat confusing in a Bayesian context where all unknown parameters are treated as ‘random’. The non-Bayesian view considers fixed effects to be fixed unknown quantities, but the standard procedures proposed to estimate these parameters, based on specified repeated-sampling properties, happen to be equivalent to the Bayesian posterior inferences under a noninformative (uniform) prior distribution.

That doesn’t totally resolve my confusion, though, because my doctor-economist colleagues are often asking for the posterior mean of the random effects, or similarly non-non-Bayesian sounding quantities.
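To make the footnote concrete, here is a minimal sketch under textbook assumptions: the normal hierarchical model, with the population mean mu and population standard deviation tau treated as known (in practice they are estimated too). The function names and all numbers are hypothetical. A “fixed effect” estimate of a group effect is the posterior mean under a flat prior, which is just that group’s observed mean; the posterior mean of a “random effect” instead shrinks the group mean toward mu, which is plausibly the kind of quantity a request for “the posterior mean of the random effects” is after.

```python
def fixed_effect_estimates(y):
    # flat (noninformative) prior on each group effect theta_j:
    # the posterior mean of theta_j is just the observed group mean y_j
    return [float(yj) for yj in y]

def random_effect_posterior_means(y, sigma, mu, tau):
    # assumed model: theta_j ~ N(mu, tau^2) and y_j | theta_j ~ N(theta_j, sigma_j^2)
    # with mu and tau known, the posterior mean of theta_j is a
    # precision-weighted average of the group mean y_j and the population mean mu
    means = []
    for yj, sj in zip(y, sigma):
        w_data, w_prior = 1.0 / sj**2, 1.0 / tau**2
        means.append((w_data * yj + w_prior * mu) / (w_data + w_prior))
    return means

# hypothetical numbers: two groups whose data and prior precisions are equal
print(fixed_effect_estimates([10.0, 0.0]))                                # → [10.0, 0.0]
print(random_effect_posterior_means([10.0, 0.0], [1.0, 1.0], 5.0, 1.0))  # → [7.5, 2.5]
```

As tau grows, the random-effect posterior means approach the fixed-effect estimates; as tau shrinks toward zero, they collapse onto mu.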

I was about to formulate a working definition and see how long I could stick to it, but then I was volunteered to teach a seminar on this very topic! So instead of doing the work now, I turn to you, wise internet, to tell me how I can understand this thing.


Filed under statistics

MCMC in Python: Sudoku is a strange game

I was on a “vacation” for the last week, which included a real vacation to visit my parents and also a scare-quotes vacation to Detroit, which is a totally different subject. But the side effect of this vacation that I want to write about is the strange game of Sudoku, which is popular with travelers.

Jessi was with me and she seems to like this Sudoku business (although she denies it), so I thought about playing, too. But I always get interested in the wrong parts of these games. Solving Sudoku puzzles seems more like a game for computers than humans, and writing a program to let computers have their fun is the way I would prefer to waste time while en route. There really isn’t much to it:

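import numpy as np

def possibilities(T):
    # sketch of a helper the post calls but does not show: map each empty
    # cell (i,j) (marked -1) to the digits not yet used in its row, column, or box
    pos = {}
    for i, j in zip(*np.where(T <= 0)):
        bi, bj = 3*(i//3), 3*(j//3)
        used = set(T[i, :]) | set(T[:, j]) | set(T[bi:bi+3, bj:bj+3].flat)
        pos[(i, j)] = [v for v in range(1, 10) if v not in used]
    return pos

def most_constrained(T, pos):
    # sketch of a helper: the empty cell with the fewest remaining candidates
    return min(pos, key=lambda ij: len(pos[ij]))

def solve(T):
    # if all cells are filled in, we win
    if (T > 0).all():
        return 'success'

    # solve T recursively, by trying all values for the most constrained var
    pos = possibilities(T)
    i, j = most_constrained(T, pos)

    for val in pos[(i, j)]:
        T[i, j] = val
        if solve(T) == 'success':
            return 'success'

    # if this point is reached, this branch is unsatisfiable
    T[i, j] = -1
    return 'failure'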
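As a quick sanity check on whatever a solver hands back, a small validator helps. This is my own addition, not from the original post; it assumes the board is a 9x9 NumPy integer array like T above, and the name is_valid_solution is hypothetical.

```python
import numpy as np

def is_valid_solution(T):
    # a filled 9x9 grid is a legal Sudoku solution when every row, column,
    # and 3x3 box contains each digit 1..9 exactly once
    digits = set(range(1, 10))
    rows = all(set(T[i, :]) == digits for i in range(9))
    cols = all(set(T[:, j]) == digits for j in range(9))
    boxes = all(set(T[bi:bi+3, bj:bj+3].flat) == digits
                for bi in range(0, 9, 3) for bj in range(0, 9, 3))
    return rows and cols and boxes
```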

Is this ruining anyone’s homework problems? Just in case it is, everything I said at the beginning of my post on minimum spanning trees still applies.

With plenty of time left in my flight, this was just the opening move, though. The real question is: what makes this fun for the humans who find this kind of thing fun? One hypothesis, the kind that comes naturally when you have watched the statistical physicists in action, is that fun is some sort of phase transition, and that random Sudoku puzzles with a certain number of entries filled in will maximize fun. Half-baked, I know, but at least as interesting as the movies they show on planes. And it raises an extremely important, and possibly unsolved, challenge: how do you sample uniformly at random from Sudoku puzzles with n cells filled in?

In my travel-addled state, I thought a good way to do it might be to start with some complete configuration, knock out a large fraction of the cells at random, permute the remaining cells, and then solve the resulting configuration. Repeating this has got to be an ergodic Markov chain, right? I’m not sure… and even then, do you think the stationary distribution is uniform?

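import numpy as np

def select_random_cells(T, n):
    # hypothetical helper: keep n cells chosen at random, mark the rest empty (-1)
    keep = np.random.choice(T.size, n, replace=False)
    mask = np.ones(T.shape, dtype=bool)
    mask[np.unravel_index(keep, T.shape)] = False
    T[mask] = -1

def rand(n, T=None):
    # start with an empty board (-1 marks an empty cell)
    if T is None:
        T = -1*np.ones([9, 9], dtype=int)

    # solve it to generate an initial solution (solve as defined above)
    solve(T)

    # do random shuffles to approximate a uniformly random solution:
    # keep 20 cells, permute the digits 1-9 at random, and re-solve
    for k in range(5):
        select_random_cells(T, 20)
        filled = T > 0
        T[filled] = np.random.permutation(9)[T[filled] - 1] + 1
        solve(T)

    # remove appropriate amount of labels, leaving n cells filled in
    select_random_cells(T, n)

    return T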
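One way to poke at the ergodicity and uniformity questions empirically is to shrink the board. The sketch below assumes the 4x4 variant with 2x2 boxes, which has only 288 complete grids; it reads “permute the remaining cells” as randomly relabeling the digits, and re-solves with a deterministic smallest-digit-first backtracker. N, B, step, visit_counts, and the choice k=4 are illustrative assumptions, not anything from the original code. If the chain’s stationary distribution were uniform, each of the 288 grids would be visited roughly steps/288 times in a long run; a lopsided tally would be evidence against uniformity, while a flat one is not a proof.

```python
import random
from collections import Counter

N, B = 4, 2  # assumption: 4x4 grids with 2x2 boxes stand in for 9x9 Sudoku

def candidates(grid, r, c):
    # digits 1..N not yet used in row r, column c, or the box containing (r, c)
    used = set(grid[r]) | {grid[i][c] for i in range(N)}
    br, bc = B * (r // B), B * (c // B)
    used |= {grid[i][j] for i in range(br, br + B) for j in range(bc, bc + B)}
    return [v for v in range(1, N + 1) if v not in used]

def first_empty(grid):
    # row-major position of the first empty cell (0), or None if the grid is full
    for r in range(N):
        for c in range(N):
            if grid[r][c] == 0:
                return r, c
    return None

def solve(grid):
    # deterministic backtracking, smallest legal digit first; True when complete
    cell = first_empty(grid)
    if cell is None:
        return True
    r, c = cell
    for v in candidates(grid, r, c):
        grid[r][c] = v
        if solve(grid):
            return True
    grid[r][c] = 0
    return False

def all_solutions(grid, out):
    # depth-first enumeration of every completion of grid, appended to out
    cell = first_empty(grid)
    if cell is None:
        out.append(tuple(map(tuple, grid)))
        return
    r, c = cell
    for v in candidates(grid, r, c):
        grid[r][c] = v
        all_solutions(grid, out)
    grid[r][c] = 0

def step(state, k, rng):
    # one move of the proposed chain: keep k random cells, relabel the digits
    # with a random permutation, then re-solve the partial grid deterministically
    keep = set(rng.sample(range(N * N), k))
    perm = rng.sample(range(1, N + 1), N)
    grid = [[perm[state[r][c] - 1] if r * N + c in keep else 0
             for c in range(N)] for r in range(N)]
    solve(grid)  # always succeeds: the relabeled old grid is one completion
    return tuple(map(tuple, grid))

def visit_counts(steps, k=4, seed=0):
    # run the chain from the solver's completion of the empty grid; tally visits
    rng = random.Random(seed)
    grid = [[0] * N for _ in range(N)]
    solve(grid)
    state, counts = tuple(map(tuple, grid)), Counter()
    for _ in range(steps):
        state = step(state, k, rng)
        counts[state] += 1
    return counts
```

For a rough read on uniformity, run visit_counts(100000) and compare the smallest and largest counts across the 288 grids collected by all_solutions.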

Now, when Jessi found out that I was helping computers take her Sudoku-solving job away, she thought I was a geek, but when she found out I was generating puzzles with more than one solution, she was outraged. Sudoku puzzles have a unique solution! So maybe what is really fun is a random puzzle with a unique solution and the right number of cells filled in, where the right number is smaller for people who like harder puzzles. Fortunately/unfortunately, my travel ended before I finished investigating this important issue.
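Checking Jessi’s uniqueness requirement only needs a solver that keeps counting after the first solution and stops at two. Here is a minimal sketch; it assumes 0 marks an empty cell (rather than the -1 used above) and that the filled-in cells are mutually consistent, and count_solutions and has_unique_solution are hypothetical names.

```python
def candidates(grid, r, c):
    # digits 1..9 not already used in row r, column c, or the 3x3 box of (r, c)
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]

def count_solutions(grid, limit=2):
    # count completions of grid (0 = empty), stopping early once `limit` is hit;
    # branch on the empty cell with the fewest candidates
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cands = candidates(grid, r, c)
                if best is None or len(cands) < len(best[2]):
                    best = (r, c, cands)
    if best is None:
        return 1  # no empty cells: the (assumed consistent) grid is its own completion
    r, c, cands = best
    total = 0
    for v in cands:
        grid[r][c] = v
        total += count_solutions(grid, limit - total)
        if total >= limit:
            break
    grid[r][c] = 0  # restore the cell so the caller's grid is unchanged
    return total

def has_unique_solution(grid):
    # a proper Sudoku puzzle has exactly one completion
    return count_solutions([row[:] for row in grid], limit=2) == 1
```

Stopping at two keeps the search cheap: a puzzle only needs to be shown to have a second solution, not to have all of its solutions counted.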

I doubt it is related, but I got pretty sick the next day.


Filed under MCMC