Monthly Archives: May 2011

MCMC in Python: A simple random effects model and its extensions

A nice example of using PyMC for multilevel (aka “Random Effects”) modeling came through on the PyMC mailing list a couple of weeks ago, and I’ve put it into a git repo so that I can play around with it a little, and collect up the feedback that the list generates.

Here is the situation:

Hi All,

New to this group and to PyMC (and mostly new to Python). In any case, I’m writing to ask for help in specifying a multilevel model (mixed model) for what I call partially clustered designs. An example of a partially clustered design is a 2-condition randomized psychotherapy study where subjects are randomly assigned to conditions. In condition 1, subjects are treated in small groups of say 8 people apiece. In condition 2, usually a control, subjects are only assessed on the outcome (they don’t receive an intervention). Thus you have clustering in condition 1 but not condition 2.

The model for a 2-condition study looks like (just a single time point to keep things simpler):
y_{ij} = b_0 + b_1 Tx + u_j Tx + e_{ij},
where y_{ij} is the outcome for the ith person in cluster j (in most multilevel modeling software and in PROC MCMC in SAS, subjects in the unclustered condition are all in clusters of just 1 person), b_0 is the overall intercept, b_1 is the treatment effect, Tx is a dummy-coded variable equal to 1 for the clustered condition and 0 for the unclustered condition, u_j is the random effect for cluster j, and e_{ij} is the residual error. The variance among clusters is \sigma^2_u and the residual variance is \sigma^2_e (ideally you would estimate a unique residual variance by condition).

Because u_j interacts with Tx, the random effect only contributes to the clustered condition.

In my PyMC model, I expressed the model in matrix form – I find it easier to work with, especially for the cluster effects. Namely:
Y = XB + ZU + R,
where X is an n x 2 design matrix for the overall intercept and intervention effect, B is a 2 x 1 matrix of regression coefficients, Z is an n x c design matrix for the cluster effects (where c is the number of clusters in the clustered condition), and U is a c x 1 matrix of cluster effects. The way I’ve written the model below, the residual covariance is an n x n diagonal matrix with \sigma^2_e on the diagonal and 0’s on the off-diagonal.

All priors below are based on a model fit in PROC MCMC in SAS. I’m trying to replicate the analyses in PyMC so I don’t want to change the priors.

The data consist of 200 total subjects: 100 in the clustered condition and 100 in the unclustered. In the clustered condition there are 10 clusters of 10 people each. There is a single outcome variable.

I have 3 specific questions about the model:

  1. Given the description of the model, have I successfully specified the model? The results are quite similar to PROC MCMC, though the cluster variance (\sigma^2_u) differs by more than I would expect from Monte Carlo error alone. The differences make me wonder if I haven’t got it quite right in PyMC.
  2. Is there a better (or more efficient) way to set up the model? The model runs quickly but I am trying to learn Python and to get better at optimizing how to set up models (especially multilevel models).
  3. How can I change my specification so that I can estimate unique residual variances for the clustered and unclustered conditions? Right now I’ve estimated just a single residual variance, but I typically want separate estimates of the residual variance for each intervention condition.
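To make the setup concrete before getting to the responses, here is a minimal sketch of how a model with this structure could be written in PyMC. The data are simulated placeholders and the priors are generic vague priors, not the ones matched to PROC MCMC, and this is not the code from the repo:

    import numpy as np
    import pymc as mc

    # placeholder data layout: 100 clustered subjects (10 clusters of 10), then 100 unclustered
    n, c = 200, 10
    Tx = np.r_[np.ones(100), np.zeros(100)]            # 1 = clustered condition, 0 = control
    X = np.c_[np.ones(n), Tx]                          # n x 2 design matrix [intercept, Tx]
    Z = np.zeros((n, c))                               # n x c cluster design matrix
    for j in range(c):
        Z[10*j:10*(j+1), j] = 1.                       # only the clustered rows belong to a cluster
    y = np.random.normal(size=n)                       # stand-in for the real outcomes

    B = mc.Normal('B', mu=np.zeros(2), tau=1.e-6, value=np.zeros(2))   # fixed effects, vague prior
    tau_u = mc.Gamma('tau_u', alpha=.001, beta=.001)                   # precision of the cluster effects
    tau_e = mc.Gamma('tau_e', alpha=.001, beta=.001)                   # residual precision
    U = mc.Normal('U', mu=np.zeros(c), tau=tau_u, value=np.zeros(c))   # cluster random effects

    @mc.deterministic
    def mu(B=B, U=U):
        return np.dot(X, B) + np.dot(Z, U)             # XB + ZU

    Y = mc.Normal('Y', mu=mu, tau=tau_e, value=y, observed=True)

    m = mc.MCMC([B, tau_u, tau_e, U, mu, Y])
    m.sample(iter=20000, burn=10000)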

Here are my first responses:

1. This code worked for me without any modification. 🙂 Although when I tried to run it twice in the same IPython session, it gave me strange complaints (for pymc version 2.1alpha, wall time 78s). For the newest version in the git repo (pymc version 2.2grad, commit ca77b7aa28c75f6d0e8172dd1f1c3f2cf7358135, wall time 75s) it didn’t complain.

2. I find the data wrangling section of the model quite opaque. If there is a difference between the PROC MCMC and PyMC results, this is the first place I would look. I’ve been searching for the most transparent ways to deal with data in Python, so I can share some of my findings as applied to this block of code.

3. The model could probably be faster. This is the time for me to recall the two cardinal rules of program optimization: 1) Don’t Optimize, and 2) (for experts only) Don’t Optimize Yet.

That said, the biggest lever on the time PyMC takes to run is the choice of step methods. But adjusting them is a delicate operation. More on this to follow.

4. Changing the specification is the true power of the PyMC approach, and why it is worth the extra effort, since a random effects model like yours is one line of STATA. So I’d like to write out in detail how to change it; a rough sketch of one possible change (per-condition residual variances) follows this list, and more on this to follow.

5. Indentation should be 4 spaces. Diverging from this inane detail will make Python people itchy.
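Here is the rough sketch promised in item 4: one way to give each condition its own residual variance, continuing the placeholder model sketch above (again, illustrative names and priors only, not the code from the repo):

    # continuing the placeholder sketch above: Tx, mu, and y are defined there, and this
    # tau_e / Y pair would replace the scalar tau_e and the observed Y in that sketch
    tau_e = mc.Gamma('tau_e', alpha=.001, beta=.001, size=2)  # one residual precision per condition

    @mc.deterministic
    def tau_obs(tau_e=tau_e):
        return tau_e[Tx.astype(int)]   # each subject gets the precision for its condition

    Y = mc.Normal('Y', mu=mu, tau=tau_obs, value=y, observed=True)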

Have a look in the revision history and the git repo README for more.

Good times. Here is my final note from the time I spent messing around:
Django and Rails have gotten a lot of mileage out of emphasizing _convention_ in frequently performed tasks, and I think that PyMC models could also benefit from this approach. I’m sure I can’t develop our conventions myself, but I have changed all the file names to move towards what I think we might want them to look like. Commit linked here.

My analyses often have these basic parts: data, model, fitting code, graphics code. Maybe yours do, too.

p.s. Scott and I have started making an automatic model translator, to generate models like this from the kind of concise specification that users of R and STATA are familiar with. More news on that in a future post.

1 Comment

Filed under MCMC, statistics

DCP MIP

(This is a post that I started a long time ago (Aug 10, 2010), but hung on to because it is only half-baked. But based on the philosophy that avoiding foolishness is a recipe for silence, I’m sending it out into the world. Coming next: the simplest useful open problem in DCP optimization.)

Acronyms are like candy in Global Health, and Operations Research is full of them, too. This post is a place for me to collect information about software available for solving the Mixed Integer Programming (MIP) problem that the Disease Control Priorities (DCP) project is going to put together.

DCP is a massive effort to quantify the cost-effectiveness of hundreds of health interventions across tens of “health platforms” (big hospitals, small clinics, pharmacies, etc). The output of this large, coordinated effort will be (as far as I’m concerned) a collection of giant matrices that say, for different subsets of interventions across different platforms: (1) the cost of setting up the interventions, (2) the cost of operating the interventions, and (3) the health gain from the interventions. Undoubtedly, there will be some quantification of the uncertainty in each of these quantities. Also, there is something called platform improvement, which can be thought of as a special type of intervention that makes a bunch of other interventions more effective on a certain platform. And there are a number of side constraints: some interventions come together on certain platforms, some interventions are mutually exclusive, etc.
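To make that structure concrete, here is a toy sketch of the core selection problem written as a MIP with the open-source PuLP modeling library in Python. PuLP is not one of the solvers collected below, all of the data, the budget constraint, and the mutual-exclusion side constraint are invented for illustration, and uncertainty and platform improvement are left out entirely:

    import itertools
    import pulp

    # invented toy data: a few interventions and platforms with made-up costs and gains
    interventions = ['bednets', 'vaccination', 'screening']
    platforms = ['clinic', 'pharmacy']
    setup, operating, gain = {}, {}, {}
    for k, (i, p) in enumerate(itertools.product(interventions, platforms)):
        setup[i, p] = 1. + k % 3       # setup cost
        operating[i, p] = 2. + k % 2   # operating cost
        gain[i, p] = 3. + k            # health gain
    budget = 15.

    mip = pulp.LpProblem('dcp_toy', pulp.LpMaximize)

    # x[i, p] = 1 if intervention i is delivered on platform p
    x = pulp.LpVariable.dicts('x', list(itertools.product(interventions, platforms)), cat='Binary')

    # objective: total health gain
    mip += pulp.lpSum(gain[i, p] * x[i, p] for i in interventions for p in platforms)

    # setup plus operating cost must fit in the budget
    mip += pulp.lpSum((setup[i, p] + operating[i, p]) * x[i, p]
                      for i in interventions for p in platforms) <= budget

    # example side constraint: these two are mutually exclusive on the clinic platform
    mip += x['bednets', 'clinic'] + x['screening', 'clinic'] <= 1

    mip.solve()
    print([k for k, v in x.items() if v.value() == 1])

The real problem will of course be much bigger and messier than this, but the selection-subject-to-budget-and-side-constraints skeleton is the part that MIP solvers are built for.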

Some unknowns: (how) will the uncertainty in the entries of these matrices be quantified? Are the intervention choices all “yes/no”, or do you choose how much you want of some of them, i.e., a non-negative continuous variable?

This is the optimization that mixed integer programming was born for (except for the uncertainty, which takes us into less charted waters). So how we are going to do it, in theory, is just the sort of thing my OR classes focused on when I was in grad school. We didn’t talk much about how to do it in practice. Some of the hard-working students who sat in the business school and actually solved large integer programs would mutter about CPLEX once in a while at parties, but I didn’t pay it much heed.

Heed I must now pay. So I am collecting up the available MIP solvers here, and (eventually) evaluating them for my DCP optimization task. Got suggestions? I would love to hear them.

  • Gurobi – 1 2 3
  • ILOG/CPLEX – 1 2
  • GAMS – 1
  • AMPL – 1
  • NEOS – 1

5 Comments

Filed under combinatorial optimization, global health

Interesting Book about Open-Source Software

I learned about an interesting book today, The Architecture of Open Source Applications, edited by Amy Brown and Greg Wilson. The introduction caught my attention:

Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

This book’s goal is to change that. In it, the authors of twenty-five open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to this book provide unique insights into how they think.

If you are a junior developer, and want to learn how your more experienced colleagues think, this book is the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, this book can help you too.

There are chapters on several software packages that I’ve enjoyed using, and chapters on several scientific/data analysis tools, but nothing on the tools I’m using day to day. Still interesting. Let me know if you come across a great chapter, since I’m going to be too busy to read the whole thing in the near future.

Comments Off on Interesting Book about Open-Source Software

Filed under software engineering

Long day, but there are longer

I had a long day today, but it’s nothing compared to my brother-in-law’s. He just went to Greenland. The sun doesn’t set there for months! He’s got some great pictures from his trip. It’s not all pretty pictures, though. There’s a new volcanic ash cloud floating his way, which means serious delays on his lost luggage. He has a great internet connection there, so he’ll be able to see how many page views his blog is getting. Go take a look; maybe a lot of web traffic will distract him from the fact that he has only one pair of underwear in a 1,000-mile radius.

Comments Off on Long day, but there are longer

Filed under general

Journal Culture

All fields have their quirks in publication style. Today I’m thinking about statistics, because I’ve been asked to explain something about survey weights to our post-bachelor’s fellows. There is a nice paper on the matter by Andrew Gelman, which starts strong, with the first sentence “Survey weighting is a mess.” Start like that, and you’re sure to get a response from survey statisticians, who (at least I imagine) think of themselves as about as tidy as it comes.

The quirk in stats publications that I’m thinking of today is the Comment/Rejoinder format, wherein an article is published together with comments from several statisticians who don’t all agree with it, followed by a rejoinder from the article’s authors. This is cool.

Unfortunately, Google Scholar hasn’t kept up with this format, and searching for the paper title Struggles with Survey Weighting and Regression Modeling found me just one of the five comments. Project Euclid hasn’t kept up either, with only a tiny link from the article to the table of contents of the journal issue it appeared in. And thus I was forced to follow the obscure links in the pdf of the article to find the comprehensive list, which I’m putting here in case I need to find them all again sometime.

Statistical Science, Vol. 22, No. 2: Article/Comments/Rejoinder on Survey Weights

Comments Off on Journal Culture

Filed under statistics

Algorithms in the Field Workshop

Last week I attended the workshop Algorithms in the Field, or “8F”, as its puzzling acronym turns out to be. David Eppstein published comprehensive notes on the talks on his blog.

I like to think that I’m doing “8F” in my global health job, and also some algorithms in the forest, algorithms in the desert, etc. But this workshop gave me a chance to think about the connection to Algorithms with a capital “A”, the kind going on in the theory group of the ivory-towered computer science department. I’ve been pretty successful at coming up with little challenges in global health where algorithmic thinking is useful (e.g. Doctors = Noise Machines), and I’m going to try to use the blog to throw more of these puzzles over the fence in the next few weeks. My barrier to doing this in the past has been the amount of background research I need to do to avoid sounding foolish. But my writing style was never suited to caution. Let’s see how it goes. I hope that even half-explained connections between Algorithms and global health will inspire some algorithmitician to fill in the details.

What I challenge myself to do more of is to go beyond the little puzzles, and synthesize something bigger to ask from algorithmists. What algorithmic innovation would really change how we’re doing things in Global Health? This is a domain where avoiding foolishness is even more of a recipe for silence. But I will try.

On my mind now: the computation time necessary to fit a model. 20 seconds, 20 minutes, or 20 hours is a really big difference. More on that thought to come.

A question for readers from Global Health departments: what do you think capital-A algorithms researchers can offer us?

Comments Off on Algorithms in the Field Workshop

Filed under global health, TCS

The most vital lesson from your work

Seed Magazine had a short article that captivated my imagination for the last few days:

If you only had a single statement to pass on to others summarizing the most vital lesson to be drawn from your work, what would it be? Seed asked eleven scientists this question.

I was pretty disappointed with the answers, but upon re-reading the question, I realized that it’s probably due to differences of interpretation. The only answer that I liked was from Steve Strogatz, who said:

You can make sense of anything that changes smoothly in space or time, no matter how wild and complicated it may appear, by reimagining it as an infinite series of infinitesimal changes, each proceeding at a constant (and hence much simpler) rate, and then adding all those simple little changes back together to reconstitute the original whole.

This is a pretty loose understanding of what “from” means in “drawn from your work”. I don’t think Strogatz claims he invented infinitesimal calculus, just that in his work this has been a vital idea (the most vital? total orderings are tough sometimes).

What’s the most vital lesson to be drawn from your work?

2 Comments

Filed under Uncategorized

In the media

Jake called my attention to a recent public interest story in the NYTimes, about using the results of an on-going telephone survey to predict the demographic profile of the happiest man in America. It seems like they’re making fun of simple models, a pursuit which I approve of:

The New York Times asked Gallup to come up with a statistical composite for the happiest person in America, based on the characteristics that most closely correlated with happiness in 2010. Men, for example, tend to be happier than women, older people are happier than middle-aged people, and so on.

Gallup’s answer: he’s a tall, Asian-American, observant Jew who is at least 65 and married, has children, lives in Hawaii, runs his own business and has a household income of more than $120,000 a year.

But the joke is that they made a few phone calls and found the guy… he looks pretty happy.

1 Comment

Filed under global health, statistics

MCMC in Python: Part IIb of PyMC Step Methods and their pitfalls

Two weeks ago, I slapped together a post about the banana distribution and some new movies of sampling from it. Today I’m going to explore how MCMC with the Adaptive Metropolis step method performs as the banana’s dimension grows and it becomes more curvy.

To start things off, have a look at the banana model, as implemented for PyMC:

    import pylab as pl
    import pymc as mc

    dim = 2     # dimension of the banana; the examples below use 2, 4, and 8 (example value)
    b = 0.03    # curvature parameter; larger means curvier (placeholder value)

    C_1 = pl.ones(dim)
    C_1[0] = 100.
    X = mc.Uninformative('X', value=pl.zeros(dim))

    def banana_like(X, tau, b):
        phi_X = pl.copy(X)
        phi_X *= 30.  # rescale X to match the scale of the other models
        phi_X[1] = phi_X[1] + b*phi_X[0]**2 - 100*b

        return mc.normal_like(phi_X, 0., tau)

    @mc.potential
    def banana(X=X, tau=C_1**-1, b=b):
        return banana_like(X, tau, b)

Pretty simple stuff, right? This is a pattern that I find myself using a lot: an uninformative stochastic variable that has the intended distribution implemented as a potential. If I understand correctly, this is something that you can’t do easily in BUGS.

Adaptive Metropolis is (or was) the default step method that PyMC uses to fit a model like this, and you can ensure that it is used (and fiddle with the AM parameters) with the use_step_method() function:

mod.use_step_method(mc.AdaptiveMetropolis, mod.X)

Important Tip: If you want to experiment with different step methods, you need to call mod.use_step_method() before doing any sampling. When you call mod.sample(), any stochastics without step methods assigned get whatever method PyMC deems best, and once those are assigned, a later call to use_step_method() adds an additional method but does not remove the automatically assigned one. (See this discussion from the PyMC mailing list.)
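For example, the safe pattern looks like this (a minimal sketch reusing the X and banana objects defined above; the run lengths are arbitrary, just for illustration):

    mod = mc.MCMC([X, banana])
    mod.use_step_method(mc.AdaptiveMetropolis, X)   # assign AM *before* any call to sample()
    mod.sample(iter=200000, burn=100000, thin=100)  # arbitrary iteration counts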

So let’s see how this Adaptive Metropolis performs on a flattened banana as the dimension increases:

You can tell what dimension the banana lives in by counting the panels in the autocorrelation plots in the upper right of each video: 2, 4, or 8. AM is definitely up to the challenge of sampling from an 8-dimensional “flattened banana”, but this is nothing more than an uncorrelated normal distribution that has been stretched along one axis, so the results shouldn’t impress anyone.
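(If you want similar diagnostics for your own runs, PyMC’s built-in plotting helper will produce trace, histogram, and autocorrelation panels; a minimal sketch, assuming the mod object from above has already been sampled:)

    mc.Matplot.plot(mod)   # writes trace, histogram, and autocorrelation panels for the model's variables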

Here is what happens in a more curvy banana:

Now you can see the performance start to degrade. AM does just fine in a 2-d banana, but it’s going to need more time to sample appropriately from a 4-d one, and even more for the 8-d. This is similar to the way a cross-diagonal region breaks AM, but a lot more like something that could come up in a routine statistical application.

For a serious stress test, I can curve the banana even more:

Now AM has trouble even with the two dimensional version.

I’ve had an idea to take this qualitative exploration and quantify it, to do a real “experimental analysis of algorithms”-style analysis. Have you seen something like this already? I do have enough to do without making additional work for myself.

Comments Off on MCMC in Python: Part IIb of PyMC Step Methods and their pitfalls

Filed under statistics, TCS