Monthly Archives: August 2010

MCMC in Python: Global Temperature Reconstruction with PyMC

A short note on the PyMC mailing list alerted me that the Apeescape, the author of mind of a Markov chain blog, was thinking of using PyMC for replicating some controversial climate data analysis, but was having problems with it. Since I’m a sucker for controversial data, I decided to see if I could do the replication exercise in PyMC myself.

I didn’t dig in to what the climate-hockey-stick fuss is about, that’s something I’ll leave for my copious spare time. What I did do is find the data pretty easily available on the original author’s website, and make a translation of the R/bugs model into pymc/python. My work is all in a github repository if you want to try it yourself, here.

Based on Apeescape’s bugs model, I want to have \textnormal{temp}_t = N(\mu_t, \sigma^2) where \mu_t = \beta_0 + \beta_1\textnormal{temp}_{t-1} + \beta_2\textnormal{temp}_{t-2} + \sum_{i=3}^{12} \beta_i(\textnormal{PC})_{t,i}, with priors \vec{\beta} \sim N(\vec{0}, 1000 I) and \sigma \sim \textnormal{Uniform}(0,100).

I implemented this in a satisfyingly concise 21 lines of code, that also generate posterior predictive values for model validation:

# load data                                                                                                                                                                      
data = csv2rec('BUGS_data.txt', delimiter='\t')


# define priors                                                                                                                                                                  
beta = Normal('beta', mu=zeros(13), tau=.001, value=zeros(13))
sigma = Uniform('sigma', lower=0., upper=100., value=1.)


# define predictions                                                                                                                                                             
pc = array([data['pc%d'%(ii+1)] for ii in range(10)]) # copy pc data into an array for speed & convenience                                                                       
@deterministic
def mu(beta=beta, temp1=data.lagy1, temp2=data.lagy2, pc=pc):
    return beta[0] + beta[1]*temp1 + beta[2]*temp2 + dot(beta[3:], pc)

@deterministic
def predicted(mu=mu, sigma=sigma):
    return rnormal(mu, sigma**-2.)

# define likelihood                                                                                                                                                              
@observed
def y(value=data.y, mu=mu, sigma=sigma):
    return normal_like(value, mu, sigma**-2.)

Making an image out of this to match the r version got me stuck for a little bit, because the author snuck in a call to “Friedman’s SuperSmoother” in the plot generation code. That seems unnecessarily sneaky to me, especially after going through all the work of setting up a model with fully bayesian priors. Don’t you want to see the model output before running it through some highly complicated smoothing function? (The super-smoother supsmu is a “running lines smoother which chooses between three spans for the lines”, whatever that is.) In case you do, here it is, together with an alternative smoother I hacked together, since python has no super-smoother that I know of.

Since I have the posterior predictions handy, I plotted the median residuals against the median predicted temperature values. I think this shows that the error model is fitting the data pretty well:

5 Comments

Filed under MCMC, statistics

Applied Approximate Counting: Malaria

My first first-authored global health paper came out today (I consider it my first “first-authored” paper ever, since the mathematicians I’ve worked with deviantly list authorship in alphabetical order regardless of seniority and contribution). It’s a bit of a mouthful by title: Rapid Scaling Up of Insecticide-Treated Bed Net Coverage in Africa and Its Relationship with Development Assistance for Health: A Systematic Synthesis of Supply, Distribution, and Household Survey Data.

What I find really pleasing about this research paper is the way it continues research I worked on in graduate school, but in a completely different and unexpected direction. Approximate counting is something that my advisor specialized in, and he won a big award for the random polynomial time algorithm for approximating the volume of convex bodies. I followed in his footsteps when I was a student, and I’m still doing approximate counting, it’s just that now, instead of approximating the amount of high-dimensional sand that will fit in an oddly shaped high-dimensional box, I’ve been approximating the number of insecticide-treated bednets that have made it from manufacturers through the distribution supply-chain and into the households of malaria-endemic regions of the world. I’m even using the same technique, Markov-chain Monte Carlo.

I’ve been itching to write about the computational details of this research for a while, and now that the paper’s out, I will have my chance. But for today, I refer you to the PLoS Med paper, and the technical appendix, and the PyMC code on github.

4 Comments

Filed under global health, MCMC

Coding tips for grad students, by grad students

Kyle writes from Sri Lanka with his stats programming tips for the new PBFs. It’s all things that old PBFs and even old young professors can benefit from:

• It’s taken me 2 years to jump on the version control bandwagon (~18 months after your PToW on git….), so I certainly can’t claim to be an exemplar myself. But I think the main themes would be:

• Location, location, location! Can you find your code? Can others find your code? Do both the directory and the filename make sense?

• Replicability – even of the mistakes. If you do something right, you want to be able to do it again.
o But often, even if it’s wrong, you want to do it again. Chris will say, let’s go back to the broken version from 2 months ago, I liked that better. So if you change your code, keep a record of the old parts (and maybe even why you ditched them).

• If others can’t look at your code and figure out quickly what each “chunk” of code does, it’s not well documented enough. If you can’t even tell within 30 seconds what a particular piece does (and you wrote it!), that’s a problem.

• On the other hand: Yes, a few PBFs were lit majors, but that doesn’t mean your code should be in novella format. Concise, readable code is often more understandable than a few sentences of explanation.

• Whitespace! Headers! Tab and Enter are your close and personal friends: “Without negative space how would we appreciate the positive in our art and in our lives?” – Dyan Law (some artist I’ve never heard of)

• Good exercises: Take someone else’s raw code, figure out what it does, and comment it. Read through a program you wrote and haven’t used in months – how long does it take you to figure out? Have someone else comment your own raw code; did they explicate things you left implied? did they misinterpret anything?

All good advice, and I often regret it when I don’t follow it. Anything else that should be on this list?

Comments Off on Coding tips for grad students, by grad students

Filed under software engineering

Aaronson’s non-bet of non-confidence in P≠NP

As you have undoubtedly heard by now, there is a paper that claims to prove P!=NP, and there is a serious effort to understand the proof.

It has been fun to watch the experts set to work on this, and it has brought a lot of attention to random k-SAT, a problem that was near and dear to me when I was a grad student. And I get to learn interesting things from them without having to struggle through Deolalikar’s opus myself.

One interesting thing is the way Scott Aaronson reacted, saying:

If Vinay Deolalikar is awarded the $1,000,000 Clay Millennium Prize for his proof of P≠NP, then I, Scott Aaronson, will personally supplement his prize by the amount of $200,000.

When I first read about Aaronson’s offer to add $200K to the prize money, reported 2nd hand in a roundup of what the #pnp blogs were saying, it came off like the young professor is really hoping to have people work on this thing. But once my trusty rss feeder fed me his post, I realized his offer is not about how profs at private universities have disposable income that public schools don’t provide. It’s his way of quantifing his confidence in the accuracy of the proof.

If Aaronson had framed this in terms of a bet, it would be a textbook example of his level of certainty that the proof will have a flaw (a textbook in decision theory, anyway). But offering the sum without any possibility of receiving a return in the alternative scenario breaks expected utility theory. How certain is Scott? It all depends on what amount of money means nothing to him.

1 Comment

Filed under probability, TCS

P, NP, and more

There was a lot of buzz this weekend about a paper claiming to resolve the “P vs NP” conjecture.  I’ve seen plenty of papers claiming to do this over the years, so I didn’t rush to track it down.  But as the tweeting continued, I decided to have a quick look for myself, if only to produce a crotchety blog post on the matter.

Part of the drama around this paper comes from the way it appeared.  Author Vinay Deolalikar‘s web page explains:

Manuscript sent on 6th August to several leading researchers in various areas. Confirmations began arriving 8th August early morning.  The preliminary version made it to the web without my knowledge.  I have made minor updates, here. Please note that the  final version of the paper is under preparation, and is to be posted here very shortly.

I’m a global health researcher now, so I’m not going to be the one who tries to verify this 104 page paper, but I would love to learn from a careful review that it is true.  The result seems to go further than separating P and NP, since it is a statement about a natural distribution of instances being hard (from p. 78):

If LFP were able to compute solutions to the d1RSB phase of random k-SAT, then the distribution of the entire space of solutions would have a substantially simpler parametrization than we know it does.

Ryan Williams thinks this is suspicious, and expressed his concern in a series of tweets:

  1. This P vs NP claim is getting tons of press. I am really doubtful that it works. Hard to convey my skepticism in a tweet, but here goes…
  2. The author tries to use the fact that for certain distributions of random k-SAT, the solution space has a “hard structure”. Two problems:
  3. (1) Polytime solvable problems (such as perfect matching on random graphs) can also have complicated solution distributions.

  4. (2) There is a randomized reduction from SAT to formulas with at most ONE satisfying assignment (Valiant-Vazirani). A simple solution space
  5. So either Valiant-Vazirani can’t be derandomized or RP=NP (seems very unlikely!) or the proof must break. That’s my intuition.

Time will tell. I find proof of the existence of hard-on-average distributions much more exciting than plain old P vs NP, and Deolalikar’s paper might have something for everybody.

1 Comment

Filed under TCS

UW Exchange email on an Android phone

This is a sort-of public service message, and I expect that it will at least be of service to my future self. Since I am now an absent-minded professor, I lost my cell phone when I was at JSM. So I got a new one today (thus following the advice my department head gave me when he saw the cracked screen on my old phone a month ago). And since it’s new and shiny, I started playing around and decided to make it check my university email, calendar, etc.

This is all easy, provided you know the magic settings, which are not recorded anywhere (yet).  I’ll write down precisely what it should be for someone with UW username “your_username”.  Just replace that with “abie” if you’re me, or something appropriate if you’re not.

  • Email address: your_username@u.washington.edu
  • Server address: exchange.washington.edu
  • Domain: netid
  • Username: your_username
  • Password: *********
  • This server requires an encrypted SSL connection: Yes

Search engines, please index this so I can find it next time I’m looking for “UW email on android”, thanks.

6 Comments

Filed under education

Speaking of Networks

I don’t get to do much about network research these days, except feel guilty about a few unfinished projects that I have no time for, but at least I can spread the word about conferences. The Web Algorithms Workshop 2010, for which I am on the PC, has a call-for-papers out, and the deadline is in 1.5 months.

The SAMSI focus year on Complex Networks had a preview session at JSM today, and it sounds like a truly interdisciplinary crowd. According to Eric Kolaczyk, the opening workshop (Aug 31 – Sept 1) is nearly full, so act fast if you want to act.

Comments Off on Speaking of Networks

Filed under Uncategorized