I re-read a short paper of Andrew Gelman’s yesterday about multilevel modeling, and thought “That would make a nice example for PyMC”. The paper is “Multilevel (hierarchical) modeling: what it can and cannot do”, and R code for it is on his website.
To make things even easier for a casual blogger like myself, the example from the paper is extended in the “ARM book”, and Whit Armstrong has already implemented several variants from this book in PyMC.
There is really nothing left for me to do but collect up the links and find a nice graphic. I guess I’ll go the extra distance and condense the code to as few lines as possible and paste it in here, too. Here’s my github, if you want to get my reformatted csv files and follow along at home.
import csv
from pymc import Uniform, stochastic, normal_like, MCMC

household_data = [d for d in csv.DictReader(open('household.csv'))]
county_data = [d for d in csv.DictReader(open('county.csv'))]

# hyper-priors
g = Uniform('gamma', [0,0], [100,100])
s_a = Uniform('sigma_a', 0, 100)

# priors: one varying intercept per county, regressed on county-level u
a = {}
for d in county_data:
    @stochastic(name='a_%s'%d['county'])
    def a_j(value=0., g=g, u_j=float(d['u']), s_a=s_a):
        return normal_like(value, g[0] + g[1]*u_j, s_a**-2.)
    a[d['county']] = a_j

b = Uniform('beta', 0, 100)
s_y = Uniform('sigma_y', 0, 100)

# likelihood: one observed stochastic per household measurement
y = {}
for d in household_data:
    @stochastic(observed=True, name='y_%s'%d['household'])
    def y_i(value=float(d['y']), a_j=a[d['county']], b=b, x_ij=float(d['x']), s_y=s_y):
        return normal_like(value, a_j + b*x_ij, s_y**-2.)
    y[d['household']] = y_i

mc = MCMC([g, s_a, a, b, s_y, y])
mc.sample(10000, 2500, verbose=1)
Now who’s going to race this against WinBugs?
Thanks for sharing the code. I think it would also be interesting to show some performance tests of the different implementations in different programs. I’ve committed to start doing that myself.
Anyway, thanks.
Great, Xavier! Let me know how it goes.
This took about 75 min to run on my laptop, but I bet when the pymc-users crowd gives it a look, there will be suggestions for major speedups.
Hey Xavier,
Thanks for doing this; it does make for a nice example. I would suggest a couple of fundamental structural changes from a PyMC implementation standpoint. As much as possible, you want to avoid loops and stick with vector-valued stochastics. Not only is the code more concise, but it is far faster. Here is my version of the multilevel_radon model; notice the use of indexing for the county intercepts.
Sorry, my embedded code snippet did not show up. Here is a link: http://pastie.org/726147
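Roughly, the vectorized model looks like this (a sketch built on the csv files from the post, with illustrative names; not necessarily identical to the code at the pastie link):

import csv
import numpy as np
from pymc import Uniform, Normal, deterministic, MCMC

household_data = [d for d in csv.DictReader(open('household.csv'))]
county_data = [d for d in csv.DictReader(open('county.csv'))]

# pack the data into numpy arrays, with an integer index for each county
county_index = dict((d['county'], i) for i, d in enumerate(county_data))
u = np.array([float(d['u']) for d in county_data])
x = np.array([float(d['x']) for d in household_data])
y_obs = np.array([float(d['y']) for d in household_data])
county = np.array([county_index[d['county']] for d in household_data])
J = len(county_data)

# hyper-priors and priors (same as the looped version in the post)
g = Uniform('gamma', [0, 0], [100, 100])
s_a = Uniform('sigma_a', 0, 100)
b = Uniform('beta', 0, 100)
s_y = Uniform('sigma_y', 0, 100)

@deterministic
def mu_a(g=g):
    # county-level regression of the intercepts on u
    return g[0] + g[1] * u

@deterministic
def tau_a(s=s_a):
    return s ** -2.

# every county intercept in a single vector-valued stochastic -- no loop
a = Normal('alpha', mu=mu_a, tau=tau_a, value=np.zeros(J))

@deterministic
def y_hat(a=a, b=b):
    # pick out each household's county intercept by fancy indexing
    return a[county] + b * x

@deterministic
def tau_y(s=s_y):
    return s ** -2.

# one observed, vector-valued stochastic for all households at once
y = Normal('y', mu=y_hat, tau=tau_y, value=y_obs, observed=True)

mc = MCMC([g, s_a, a, b, s_y, y])
mc.sample(20000, 10000)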
Thanks, Chris. I’ll try a laptop-melting speed test tonight, and report back.
Chris wins! 200x speed up…
Sorry — realized that I said “Hey Xavier” when I should have said “Hey Abraham”.
Not that Xavier doesn’t deserve a “hey”, but you know …
🙂
Hi Abraham,
Thanks very much for your post and sharing your code!
I’m having trouble reproducing your figure using Chris’s fast version. What I get when I run run_multilevel_radon_fast.py is shown in the following figure:
http://drp.ly/5HcbD
I’m using a recent SVN version of PyMC with NumPy v1.4.0.dev7805 on a Mac with Python 2.6. Have you verified that Chris’s fast version actually yields the right numbers?
Running your original version yields a figure identical to the one in your blog post (and identical to the results from running Gelman’s model with JAGS).
Thanks,
Todd
Hi Todd,
I think there is a small bug in Chris’s code, and I tried fixing it here:
http://github.com/aflaxman/pymc_radon/commit/b8f7c7b10cc769cf2840fa3e549d317076b24fa0#L1L16
I haven’t had time to verify it carefully, though, so it would be great if you can.
Hi Abraham,
I think I actually used the latest versions (12/10/2009, better priors, floats not ints) of multilevel_radon_fast and run_multilevel_radon_fast.py.
I’m going to spend some time this morning trying to figure out what’s going on, and I’ll let you know if I have any luck.
Todd
Oh, my other thought is to make sure you are waiting long enough for the “burn-in” period. The AdaptiveMetropolis might need 10-20K samples to learn about the shape of the probability space.
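For example, something along these lines (a sketch; the stochastic names are illustrative and should match whatever the fast script actually defines):

from pymc import MCMC, AdaptiveMetropolis

# assuming the model variables (g, s_a, a, b, s_y, y) from the fast script
# are already defined in this namespace
mc = MCMC([g, s_a, a, b, s_y, y])

# let AdaptiveMetropolis propose the county intercepts and the slope jointly
mc.use_step_method(AdaptiveMetropolis, [a, b])

# long run with a generous burn-in so the adaptive proposal has time to tune
mc.sample(iter=120000, burn=20000, thin=10)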
Well, it took me a little more time than I imagined to figure things out, or at least to *think* that I’ve figured things out.
It all seems to come down to initialization. The original version initializes all of the “a” (or “alpha”) values to 0, while the fast version leaves the initialization up to PyMC. After even 100 million iterations, the fast version without “thoughtful” initialization does not converge (although it’s clearly moving in the right direction).
If I initialize the stochastics following Gelman’s prescription (i.e., normal variables are initialized with random draws from a N(0, 1) distribution and uniform variables are initialized with draws from a uniform(0, 1) distribution; p. 350 of Gelman and Hill), then the fast version does converge after a still lengthy ~200,000 iterations.
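Concretely, I set starting values along these lines before sampling (a sketch; the names should match whatever the fast script defines, with a being the vector-valued normal stochastic of county intercepts):

import numpy as np

# Gelman & Hill style starting values: N(0, 1) draws for the normal
# stochastics, uniform(0, 1) draws for the uniform ones
a.value = np.random.normal(0., 1., size=len(a.value))
g.value = np.random.uniform(0., 1., size=2)
b.value = np.random.uniform(0., 1.)
s_a.value = np.random.uniform(0., 1.)
s_y.value = np.random.uniform(0., 1.)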
It’s worth noting that JAGS, even without sensible initialization, converges in a mere few thousand iterations.
Thanks again for sharing your code!
Todd
Abraham,
Thanks for posting this example up! The model variation that gave me the most trouble is actually the inverse-Wishart version.
The version that I put together was miserably slow, and if I recall correctly, I wasn’t able to get output identical to Gelman’s.
I’d be obliged if someone would have a look:
http://github.com/armstrtw/pymc_radon/blob/master/radon_inv_wishart.py
-Whit
Hi Whit,
I had some down time this morning, so I took a look. There were indeed some possible speedups, and the big one comes from vectorizing the MvNormal stochastic. That makes things about 15x faster.
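The gist: instead of building one MvNormal per county in a loop, use a single MvNormal whose value is a (J, 2) array of county (intercept, slope) rows. A rough sketch of just that piece (placeholder values throughout, and I’m assuming pymc’s mv_normal_like accepts a stack of row vectors with a single mu; in the real model the precision matrix is the Wishart-distributed stochastic from your script):

import numpy as np
from pymc import MvNormal, MCMC

J = 85                           # number of counties in the radon data
mu_ab = np.array([1.5, -0.7])    # placeholder group-level means
T = np.linalg.inv(np.array([[1.0, 0.1],
                            [0.1, 1.0]]))  # placeholder 2x2 precision matrix

# one MvNormal holding every county's (intercept, slope) row at once,
# replacing the per-county MvNormal stochastics built in a Python loop
ab = MvNormal('ab', mu=mu_ab, tau=T, value=np.zeros((J, 2)))

mc = MCMC([ab])
mc.sample(1000)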
Can you make sure that you still like the estimates coming out? As Todd discovered above, a computation completing quickly is only part of the challenge.