Tag Archives: python

Powell’s Method for Maximization in PyMC

I have been using “Powell’s Method” to find maximum likelihood (ahem, maximum a posteriori) estimates in my PyMC models for years now. It is something that I arrived at by trying all the options once, and it seemed to work most reliably. But what does it do? I never bothered to figure out, until today.

It does something very reasonable. That is to optimize a multidimensional function along one dimension at a time. And it does something very tricky, which is to update the basis for one-dimensional optimization periodically, so that it quickly finds a “good” set of dimensions to optimize over. Now that sounds familiar, doesn’t it? It is definitely the same kind of trick that makes the Adaptive Metropolis step method a winner in MCMC.

The 48-year-old paper introducing the approach, M. J. D. Powell, An efficient method for finding the minimum of a function of several variables without calculating derivatives, is quite readable today. If you want to see it in action, I added an ipython notebook to my pymc examples repository on github: [ipynb] [py] [pdf].

Leave a Comment

Filed under Uncategorized

PyMC+Pandas: Poisson Regression Example

When I was gushing about the python data package pandas, commenter Rafael S. Calsaverini asked about combining it with PyMC, the python MCMC package that I usually gush about. I had a few minutes free and gave it a try. And just for fun I gave it a try in the new ipython notebook. It works, but it could work even better. See attached:

[pdf] [ipynb]

3 Comments

Filed under MCMC, software engineering

My First Contribution to PyMC

I’m excited to report that my first contribution back to the PyMC codebase was accepted. :)

It is a slight reworking of the pymc.Matplot.plot function that make it include autocorrelation plots of the trace, as well as histograms and timeseries.  I also made the histogram look nicer (in my humble opinion).

Before:

After:

In this example, I can tell that MCMC hasn’t converged from the trace of beta_2 without my changes, but it is dead obvious from the autocorrelation plot of beta_2 in the new version.

The process of making changes to the pymc sourcecode is something that has intimidated me for a while.  Here are the steps in my workflow, in case it helps you get started doing this, too.

# first fork a copy of pymc from https://github.com/pymc-devs/pymc.git on github
git clone https://github.com/pymc-devs/pymc.git

# then use virtualenv to install it and make sure the tests work
virtualenv env_pymc_dev
source env_pymc_dev/bin/activate

# then you can install pymc without being root
cd pymc
python setup.py install
# so you can make changes to it without breaking everything else

# to test that it is working
cd ..
python
>>> import pymc
>>> pymc.test()
# then make changes to pymc...
# to test changes, and make sure that all of the tests that use to pass still do repeat the process above
cd pymc
python setup.py install
cd ..
python
>>> import pymc
>>> pymc.test()

# once everything is perfect, push it to a public git repo and send a "pull request" to the pymc developers.

Is there an easier way?  Let me know in the comments.

1 Comment

Filed under MCMC, software engineering

MCMC in Python: A simple random effects model and its extensions

A nice example of using PyMC for multilevel (aka “Random Effects”) modeling came through on the PyMC mailing list a couple of weeks ago, and I’ve put it into a git repo so that I can play around with it a little, and collect up the feedback that the list generates.

Here is the situation:

Hi All,

New to this group and to PyMC (and mostly new to Python). In any case, I’m writing to ask for help in specifying a multilevel model (mixed model) for what I call partially clustered designs. An example of a partially clustered design is a 2-condition randomized psychotherapy study where subjects are randomly assigned to conditions. In condition 1, subjects are treated in small groups of say 8 people a piece. In the condition 2, usually a control, subjects are only assessed on the outcome (they don’t receive an intervention). Thus you have clustering in condition 1 but not condition 2.

The model for a 2-condition study looks like (just a single time point to keep things simpler):
y_{ij} = b_0 + b_1 Tx + u_j Tx + e_{ij},
where y_ij is the outcome for the ith person in cluster j (in most multilevel modeling software and in PROC MCMC in SAS, subjects in the unclustered condition are all in clusters of just 1 person), b_0 is the overall intercept, b_1 is the treatment effect, Tx is a dummy coded variable coded as 1 for the clustered condition and 0 for the unclustered condition, u_j is the random effect for cluster j, and e_ij is the residual error. The variance among clusters is \sigma^2_u and the residual term is \sigma^2_e (ideally you would estimate a unique residual by condition).

Because u_j interacts with Tx, the random effect is only contributes to the clustered condition.

In my PyMC model, I expressed the model in matrix form – I find it easier to deal with especially for dealing with the cluster effects. Namely:
Y = XB + ZU + R,
where X is an n x 2 design matrix for the overall intercept and intervention effect, B is a 1 x 2 matrix of regression coefficients, Z is an n x c design matrix for the cluster effects (where c is equal to the number of clusters in the clustered condition), and U is a c x 1 matrix of cluster effects. The way I’ve written the model below, I have R as an n x n diagonal matrix with \sigma^2_e on the diagonal and 0′s on the off-diagonal.

All priors below are based on a model fit in PROC MCMC in SAS. I’m trying to replicate the analyses in PyMC so I don’t want to change the priors.

The data consist of 200 total subjects. 100 in the clustered condition and 100 in the unclustered. In the clustered condition there are 10 clusters of 10 people each. There is a single outcome variable.

I have 3 specific questions about the model:

  1. Given the description of the model, have I successfully specified the model? The results are quite similar to PROC MCMC, though the cluster variance (\sigma^2_u) differs more than I would expect due to Monte Carlo error. The differences make me wonder if I haven’t got it quite right in PyMC.
  2. Is there a better (or more efficient) way to set up the model? The model runs quickly but I am trying to learn Python and to get better at optimizing how to set up models (especially multilevel models).
  3. How can change my specification so that I can estimate unique residual variances for clustered and unclustered conditions? Right now I’ve estimated just a single residual variance. But I typically want separate estimates for the residual variances per intervention condition.

Here are my first responses:

1. This code worked for me without any modification. :) Although when
I tried to run it twice in the same ipython session, it gave me
strange complaints. (for pymc version 2.1alpha, wall time 78s).
For the newest version in the git repo (pymc version 2.2grad,
commit ca77b7aa28c75f6d0e8172dd1f1c3f2cf7358135, wall time 75s) it
didn’t complain.

2. I find the data wrangling section of the model quite opaque. If
there is a difference between the PROC MCMC and PyMC results, this
is the first place I would look. I’ve been searching for the most
transparent ways to deal with data in Python, so I can share some
of my findings as applied to this block of code.

3. The model could probably be faster. This is the time for me to
recall the two cardinal rules of program optimization: 1) Don’t
Optimize, and 2) (for experts only) Don’t Optimize Yet.

That said, the biggest change to the time PyMC takes to run is in
the step methods. But adjusting this is a delicate operation.
More on this to follow.

4. Changing the specification is the true power of the PyMC approach,
and why this is worth the extra effort, since a random effects
model like yours is one line of STATA. So I’d like to write out
in detail how to change it. More on this to follow.

5. Indentation should be 4 spaces. Diverging from this inane detail
will make python people itchy.

Have a look in the revision history and the git repo README for more.

Good times. Here is my final note from the time I spent messing around:
Django and Rails have gotten a lot of mileage out of emphasizing
_convention_ in frequently performed tasks, and I think that PyMC
models could also benefit from this approach. I’m sure I can’t
develop our conventions myself, but I have changed all the file names
to move towards what I think we might want them to look like. Commit
linked here
.

My analyses often have these basic parts: data, model, fitting code,
graphics code. Maybe your do, too.

p.s. Scott and I have started making an automatic model translator, to generate models like this from the kind of concise specification that users of R and STATA are familiar with. More news on that in a future post.

1 Comment

Filed under MCMC, statistics

Data Wrangling in R, Stata and Python

It was nearly a year ago when I was accosted by students I had urged to try Python and their complaints that the data manipulation capabilities they found so convenient in R and Stata were nowhere to be found. At the time, I did some digging and told them to try la.larry (or pandas, as mentioned by readers of that post). With some more experience, these recommendations have come up again, and in hindsight it seems like la.larry is too heavy a hammer for our most common tasks.

I’m hoping to put together a translation guide for R, Stata, and Python (there is already an extensive one… ours will be much more specialized, to just a few data wrangling commands), and until then, here are Kyle’s top two:

The easiest way to build record arrays (aside from csv2rec) IMO:

import numpy as np
a = ['USA','USA','CAN']
b = [1,6,4]
c = [1990.1,2005.,1995.]
d = ['x','y','z']
some_recarray = np.core.records.fromarrays([a,b,c,d], names=['country','age','year','data'])

The fromarrays method is especially nice because it automatically deals with datatypes.

To subset a particular country-year-age:

some_recarray[(some_recarray.country=='USA')
                & (some_recarray.age==6)
                & (some_recarray.year==2005)]

I’ve also found that caching each of the indices independently vastly speeds things up if you’re going to be iterating.

Love the recarray, hate the documentation.

6 Comments

Filed under software engineering

Introducing the H-RAM

I haven’t had time to say much recently, due to some travel for work, but I did have a chance to prototype the Hit-and-Run/Adaptive Metropolis approach to MCMC that I mentioned in my last post.  Was that really three weeks ago?  How time flies.

Anyway, thanks to the tip Aram pointed out in the comments, Hit-and-Run can take steps in the correct direction without explicitly computing an approximation of the covariance matrix, just by taking a randomly weighted sum of random points.  It’s nearly magic, although Aram says that it actually makes plenty of sense. The code for this “H-RAM” looks pretty similar to my original Hit-and-Run step method, and it’s short enough that I’ll just show it to you:

class HRAM(Gibbs):
    def __init__(self, stochastic, proposal_sd=None, verbose=None):
        Metropolis.__init__(self, stochastic, proposal_sd=proposal_sd,
                            verbose=verbose, tally=False)
        self.proposal_tau = self.proposal_sd**-2.
        self.n = 0
        self.N = 11
        self.value = rnormal(self.stochastic.value, self.proposal_tau, size=tuple([self.N] + list(self.stochastic.value.shape)))

    def step(self):
        x0 = self.value[self.n]
        u = rnormal(zeros(self.N), 1.)
        dx = dot(u, self.value)

        self.stochastic.value = x0
        logp = [self.logp_plus_loglike]
        x_prime = [x0]

        for direction in [-1, 1]:
            for i in xrange(25):
                delta = direction*exp(.1*i)*dx
                try:
                    self.stochastic.value = x0 + delta
                    logp.append(self.logp_plus_loglike)
                    x_prime.append(x0 + delta)
                except ZeroProbability:
                    self.stochastic.value = x0

        i = rcategorical(exp(array(logp) - flib.logsum(logp)))
        self.value[self.n] = x_prime[i]
        self.stochastic.value = x_prime[i]

        if i == 0:
            self.rejected += 1
        else:
            self.accepted += 1

        self.n += 1
        if self.n == self.N:
            self.n = 0

Compare the results to the plain old Hit-and-Run step method:

9 Comments

Filed under MCMC

MCMC in Python: Custom StepMethods and bounded-depth spanning tree distraction

I was looking for a distraction earlier this week, which led me to the world of stackexchange sites. The stack overflow has been on my radar for a while now, because web-search for coding questions often leads there and the answers are often good. And I knew that math overflow, tcs overflow, and even stats overflow existed, but I’d never really explored these things.

Well, diversion found! I got enamored with an MCMC question on the tcs site, about how to find random bounded-depth spanning trees. Bounded-depth spanning trees are something that I worked on with David Wilson and Riccardo Zechinna in my waning days of my MSR post-doc, and we came up with some nice results, but the theoretical ones are just theory, and the practical ones are based on message passing algorithms that still seem magical to me, even after hours of patient explanation from my collaborators.

So let’s do it in PyMC… this amounts to an exercise in writing custom step methods, something that’s been on my mind for a while anyways. And, as a bonus, I got to make an animation of the chain in action which I find incredibly soothing to watch on repeat:

Continue reading

3 Comments

Filed under MCMC, TCS

Beautiful Networks

I’ve had a secret project running in the background this week two weeks ago (how time flies!), a continuation of my work on bias reduction for traceroute sampling. It would be nice if this had applications to global health, but unfortunately (and uncharacteristically) I can’t think of any. It is a great opportunity for visualizing networks, though, a topic worthy of a quick post.

The bowl-of-spaghetti network visualization has been a staple of complex networks research for the last decade. I’m not convinced that there is anything interesting to learn from real world networks by drawing them in 2 or 3 dimensions, but the graphics a seriously eye catching. And I’m not convinced that there isn’t anything to learn from them, either. I invite you to convince me in the comments.

Interesting?


What my side project has reminded me of, however, is the value of drawing networks in 2 dimensions for illustrating the principles of network algorithms and network statistics. And if the topic of study is complex or real-world or random networks, than a little bit of spaghetti in the graphic seems appropriate.

There are a lot of nice tools for doing this now, and just collecting the things I should consider in one place makes this post worthwhile for me. I learn towards a Pythonic solution of networkx driving graphviz, but there are some javascript options out there now that seem promising (jit, protovis, possibly more from a stackoverflow question) in. And for those looking for a less command-line-based solution, the Pajek system seems like a good route.

As for what to graph, here are my thoughts. The Erdos-Renyi graph doesn’t look good, and the Preferential Attachment graph doesn’t look good. Use them for your theorems and for your simulations, but when it comes time to draw something, consider a random geometric graph. And since these can be a little dense, you might want an “edge-percolated random geometric graph”.

I did have a little trouble with this approach, too, when I was drawing minimum spanning trees, because the random geometric points end up being placed really close together occasionally. So maybe the absolutely best random graph for illustrations would be a geometric graph with vertices from a “hard core” model, which is to say random conditioned on being a minimum distance apart. Unfortunately, it is an open question how to efficiently generate hard-core points. But it’s not hard to fake:

More informative?


Want some of your own? Here’s the code.

Leave a Comment

Filed under probability

Nice post on Cython

Python/Sage guru William Stein has a nice article about Cython and how it can speed up routine calculations at the expense of mathematical elegance (and why that is a good thing). Could Sage be my interactive shell for PyMC modeling?

Leave a Comment

Filed under software engineering

MCMC in Python: Global Temperature Reconstruction with PyMC

A short note on the PyMC mailing list alerted me that the Apeescape, the author of mind of a Markov chain blog, was thinking of using PyMC for replicating some controversial climate data analysis, but was having problems with it. Since I’m a sucker for controversial data, I decided to see if I could do the replication exercise in PyMC myself.

I didn’t dig in to what the climate-hockey-stick fuss is about, that’s something I’ll leave for my copious spare time. What I did do is find the data pretty easily available on the original author’s website, and make a translation of the R/bugs model into pymc/python. My work is all in a github repository if you want to try it yourself, here.

Based on Apeescape’s bugs model, I want to have \textnormal{temp}_t = N(\mu_t, \sigma^2) where \mu_t = \beta_0 + \beta_1\textnormal{temp}_{t-1} + \beta_2\textnormal{temp}_{t-2} + \sum_{i=3}^{12} \beta_i(\textnormal{PC})_{t,i}, with priors \vec{\beta} \sim N(\vec{0}, 1000 I) and \sigma \sim \textnormal{Uniform}(0,100).

I implemented this in a satisfyingly concise 21 lines of code, that also generate posterior predictive values for model validation:

# load data                                                                                                                                                                      
data = csv2rec('BUGS_data.txt', delimiter='\t')


# define priors                                                                                                                                                                  
beta = Normal('beta', mu=zeros(13), tau=.001, value=zeros(13))
sigma = Uniform('sigma', lower=0., upper=100., value=1.)


# define predictions                                                                                                                                                             
pc = array([data['pc%d'%(ii+1)] for ii in range(10)]) # copy pc data into an array for speed & convenience                                                                       
@deterministic
def mu(beta=beta, temp1=data.lagy1, temp2=data.lagy2, pc=pc):
    return beta[0] + beta[1]*temp1 + beta[2]*temp2 + dot(beta[3:], pc)

@deterministic
def predicted(mu=mu, sigma=sigma):
    return rnormal(mu, sigma**-2.)

# define likelihood                                                                                                                                                              
@observed
def y(value=data.y, mu=mu, sigma=sigma):
    return normal_like(value, mu, sigma**-2.)

Making an image out of this to match the r version got me stuck for a little bit, because the author snuck in a call to “Friedman’s SuperSmoother” in the plot generation code. That seems unnecessarily sneaky to me, especially after going through all the work of setting up a model with fully bayesian priors. Don’t you want to see the model output before running it through some highly complicated smoothing function? (The super-smoother supsmu is a “running lines smoother which chooses between three spans for the lines”, whatever that is.) In case you do, here it is, together with an alternative smoother I hacked together, since python has no super-smoother that I know of.

Since I have the posterior predictions handy, I plotted the median residuals against the median predicted temperature values. I think this shows that the error model is fitting the data pretty well:

5 Comments

Filed under MCMC, statistics