Tag Archives: gaussian processes

Non-parametric regression in Python: Gaussian Processes in sklearn (with a little PyMC)

I’ve got a fun class going this quarter, on “artificial intelligence for health metricians”, and the course content mixed with some of the student interest has got me looking at the options for doing Gaussian process regression in Python. `PyMC2` has some nice stuff, but the `sklearn` version fits with the rest of my course examples more naturally, so I’m using that instead.

But `sklearn` doesn’t have the fanciest of fancy covariance functions implemented, and at IHME we have been down the road of the Matern covariance function for over five years now. It’s in `PyMC`, so I took a crack at a mash-up. (Took a mash at a mash-up?) There is some room for improvement, but it is a start. If you need to do non-parametric regression for something that is differentiable more than once, but less than infinitely many times, you could try starting here: http://nbviewer.ipython.org/gist/aflaxman/af7bdb56987c50f3812b
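If you are on a recent scikit-learn (0.18 or later), a Matern kernel ships with the library. The sketch below is not the code from the notebook. It is a minimal example, assuming made-up data (a noisy sine curve), of `GaussianProcessRegressor` with a `Matern` kernel. It uses `nu=2.5` because Matern sample paths with that setting are twice (mean-square) differentiable, which is the "more than once, less than infinitely many times" case.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

# Made-up training data: a noisy sine curve on [0, 10]
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 10.0, 30))[:, None]  # sklearn expects a 2-D feature array
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, X.shape[0])

# amplitude * Matern(nu=2.5) + white noise; amplitude, length scale and
# noise level are tuned by maximizing the log marginal likelihood in .fit()
kernel = 1.0 * Matern(length_scale=1.0, nu=2.5) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)

# Posterior mean and pointwise standard deviation on a fine grid
X_new = np.linspace(0.0, 10.0, 200)[:, None]
mean, std = gpr.predict(X_new, return_std=True)
print(gpr.kernel_)  # fitted hyperparameters
```

Note that `nu` is held fixed by scikit-learn rather than optimized, so the smoothness is a modeling choice you make up front. Only the amplitude, length scale, and noise level are fit.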

p.s. Chris Fonnesbeck has some great notes on doing stuff like this and much more here: http://nbviewer.ipython.org/github/fonnesbeck/Bios366/blob/master/notebooks/Section5_1-Gaussian-Processes.ipynb


Filed under statistics

MCMC in Python: (approximate) derivative-constrained Gaussian Processes with PyMC.gp

I’ve always enjoyed the Gaussian Process part of the PyMC package, and a question on the mailing list yesterday reminded me of a project I worked on with it that never came to fruition: how to implement constraints on the derivatives of the GP.

The best answer I could come up with is to use “potential” nodes and do it approximately. That is to say, instead of constraining the derivative, I settle for constraining a secant that approximates the derivative. And instead of constraining it at every point in an interval, I settle for constraining it at a discrete subset of points.
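Here is a minimal sketch of the secant idea in plain NumPy rather than PyMC. The function name `secant_log_potential`, the step `h`, and the bound `lower` are hypothetical choices for illustration. The function returns a log-probability of 0 when every secant on the grid satisfies the constraint and -inf otherwise, which is the kind of term a potential node adds to a model's log-probability. In PyMC2 a function like this would be wrapped as a potential (for example with the `@pymc.potential` decorator) and evaluated on the GP realization. That wiring is assumed and not shown here.

```python
import numpy as np


def secant_log_potential(f, grid, h=1e-2, lower=0.0):
    """Approximate log-potential for the constraint f'(x) >= lower.

    Instead of the derivative, use forward secants (f(x + h) - f(x)) / h,
    and instead of every x in the interval, check only the points in `grid`.
    Returns 0.0 (no penalty) if every secant meets the bound, else -inf
    (the configuration is ruled out).
    """
    x = np.asarray(grid, dtype=float)
    secants = (f(x + h) - f(x)) / h
    return 0.0 if np.all(secants >= lower) else -np.inf


grid = np.linspace(0.0, 3.0, 11)
print(secant_log_potential(np.exp, grid))  # increasing everywhere → 0.0
print(secant_log_potential(np.cos, grid))  # decreasing on (0, pi) → -inf
```

A hard -inf wall can leave a sampler stuck if it starts in a violating state. One option is a soft version that returns a large negative penalty proportional to how far the secants fall below `lower`.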

Here is an IPython notebook example: [ipynb] [py]


Filed under MCMC

Gaussian Processes and Jigsaw Puzzles with PyMC.gp

I was thinking about cutting something up into little pieces the other day; let’s not get into the details. The point is, I turned my destructive urge into creative energy when I started thinking about jigsaw puzzles. You might remember that a few months ago my hobby was making mazes with randomly generated bounded-depth spanning trees. It turns out that jigsaw puzzles are just as fun.

The secret ingredient to my jigsaw puzzle design is the Gaussian process with a Matern covariance function. (Maybe you knew that was coming.) GPs are an elegant way to make the little nubs that hold the puzzle together. It’s best to use two of them together to make the nub, one for the x-coordinate and one for the y-coordinate of its outline, like this:

Doing this is not hard at all, once you sort out the intricacies of the PyMC.gp package, and takes only a few lines of Python code:

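import pylab as pl
from pymc import gp

# uninformative_prior_gp() and the data object (puzzle_t, puzzle_x,
# puzzle_y, puzzle_V) are defined in the full code linked below.

def gp_puzzle_nub(diff_degree=2., amp=1., scale=1.5, steps=100):
    """ Generate a puzzle nub: X, Y coordinates of a curve whose x- and
    y-coordinates are Matern GPs conditioned on the observed points in data"""
    t = pl.arange(0., 1.0001, 1. / steps)  # steps+1 evenly spaced values in [0, 1]

    # x-coordinate: a Matern GP conditioned on the observed x values
    M, C = uninformative_prior_gp(0., diff_degree, amp, scale)
    gp.observe(M, C, data.puzzle_t, data.puzzle_x, data.puzzle_V)
    GPx = gp.GPSubmodel('GPx', M, C, pl.arange(1))
    X = GPx.value.f(t)

    # y-coordinate: an independent GP conditioned on the observed y values
    M, C = uninformative_prior_gp(0., diff_degree, amp, scale)
    gp.observe(M, C, data.puzzle_t, data.puzzle_y, data.puzzle_V)
    GPy = gp.GPSubmodel('GPy', M, C, pl.arange(1))
    Y = GPy.value.f(t)
    return X, Y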

(full code here)
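For readers without PyMC2 at hand, here is a hedged NumPy/SciPy-only sketch of the same idea. It is not the code from the full version. It uses a Matern covariance written in the standard Rasmussen and Williams form (PyMC.gp's `matern.euclidean` may scale distance differently), Gaussian conditioning on a few observed points as a stand-in for `gp.observe`, and one random draw per coordinate. The helper names `matern` and `conditioned_draw`, the control points `t_obs`, `x_obs`, `y_obs`, and the observation variance `1e-4` are all hypothetical. The real values live in the `data` object of the full code. The defaults `diff_degree=2.0`, `amp=1.0`, `scale=1.5` mirror `gp_puzzle_nub`.

```python
import numpy as np
from scipy.special import gamma, kv


def matern(d, diff_degree=2.0, amp=1.0, scale=1.5):
    """Matern covariance at distance(s) d, standard (Rasmussen & Williams) form."""
    r = np.sqrt(2.0 * diff_degree) * np.abs(np.atleast_1d(np.asarray(d, dtype=float))) / scale
    out = np.full(r.shape, amp**2)  # limiting value at r = 0
    pos = r > 0
    out[pos] = (amp**2 * 2.0**(1.0 - diff_degree) / gamma(diff_degree)
                * r[pos]**diff_degree * kv(diff_degree, r[pos]))
    return out


def conditioned_draw(t_obs, v_obs, obs_var, t_new, seed=0, **cov_params):
    """One draw at t_new from a zero-mean Matern GP conditioned on noisy
    observations v_obs at t_obs (a stand-in for gp.observe + a realization)."""
    t_obs, v_obs, t_new = (np.asarray(a, dtype=float) for a in (t_obs, v_obs, t_new))
    noise = np.diag(np.broadcast_to(np.asarray(obs_var, dtype=float), t_obs.shape))
    K_oo = matern(t_obs[:, None] - t_obs[None, :], **cov_params) + noise
    K_no = matern(t_new[:, None] - t_obs[None, :], **cov_params)
    K_nn = matern(t_new[:, None] - t_new[None, :], **cov_params)
    # Gaussian conditioning: posterior mean and covariance at t_new
    mean = K_no @ np.linalg.solve(K_oo, v_obs)
    cov = K_nn - K_no @ np.linalg.solve(K_oo, K_no.T)
    # eigh + clipping tolerates tiny negative eigenvalues left by round-off
    w, V = np.linalg.eigh(0.5 * (cov + cov.T))
    z = np.random.default_rng(seed).standard_normal(len(t_new))
    return mean + V @ (np.sqrt(np.clip(w, 0.0, None)) * z)


# Hypothetical control points for one nub (the real ones are in the full code)
t_obs = [0.0, 0.25, 0.5, 0.75, 1.0]
x_obs = [0.0, 0.35, 0.5, 0.65, 1.0]
y_obs = [0.0, 0.1, 0.4, 0.1, 0.0]
t = np.linspace(0.0, 1.0, 101)
X = conditioned_draw(t_obs, x_obs, 1e-4, t, seed=1)
Y = conditioned_draw(t_obs, y_obs, 1e-4, t, seed=2)
```

Plotting `Y` against `X` traces one candidate nub outline. Changing `diff_degree`, `amp`, or `scale` in the `**cov_params` shows how each Matern parameter changes the shape.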

I was inspired by something Andrew Gelman blogged about the utility of writing both a paper and a blog post about this or that. So I tried it out. It didn’t work for me, though. There isn’t a paper’s worth of ideas here, and now I’ve depleted my energy before finishing the post. Here it is: an attempted paper to accompany this post. Patches welcome.

In addition to an aesthetically pleasing diversion, I also got something potentially useful out of this: a diagram of how misspecifying any one of the parameters of the Matern covariance function can lead to similarly strange-looking results. This is my evidence that, from a single bad fit, you can’t tell whether your amplitude is too small or your scale is too large:


Filed under statistics

Gaussian Processes in Theory

James Lee has a new post on his tcsmath blog about Gaussian Processes, a topic I’ve been enamored with for the last while. I love the graphics he includes in his posts… they look like they take a lot of work to produce.

James (and Talagrand) are interested in bounding the expected supremum of a GP, which I can imagine being a very useful tool for studying random graphs and average-case analysis of algorithms. I’m interested in finding rapidly mixing Markov chains over GPs, which seem to be useful for disease modeling. These are seemingly very different directions of research, but I’ll be watching tcsmath for the next installment on majorizing measures.
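Here is a small worked example of what "the supremum of a GP" means. For standard Brownian motion on [0, 1], the reflection principle says sup B_t has the same distribution as |N(0, 1)|, so E[sup B_t] = sqrt(2/π) ≈ 0.798. The Monte Carlo sketch below checks this by simulating discretized paths. The function name `mc_expected_sup_brownian` and its defaults are hypothetical, and nothing here implements the majorizing-measures machinery itself. The discrete maximum slightly underestimates the continuous supremum, so expect an estimate a little below the exact value.

```python
import numpy as np


def mc_expected_sup_brownian(n_steps=500, n_paths=4000, seed=0):
    """Monte Carlo estimate of E[sup_{0 <= t <= 1} B_t] for standard Brownian motion."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    # each row is one discretized path: cumulative sums of N(0, dt) increments
    paths = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
    # B_0 = 0 is part of the path, so the supremum is at least 0
    sups = np.maximum(paths.max(axis=1), 0.0)
    return sups.mean()


estimate = mc_expected_sup_brownian()
exact = np.sqrt(2.0 / np.pi)  # reflection principle: sup B_t ~ |N(0, 1)|
print(estimate, exact)
```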


Filed under probability

Holiday Viewing

It’s been snowing in Seattle for a week now, and that never happens. Things were already getting quiet around here for the holidays, but now there are almost no cars on the roads and it’s been really quiet. I’ve been watching healthy algorithm videos to pass the nice, quiet time:

Gaussian Process Basics, by David MacKay

GP Covariance Functions, by Carl Edward Rasmussen

Unnatural Causes, by California Newsreel

The Trap, by Adam Curtis


Filed under global health, probability, videos