Monthly Archives: November 2010

MCMC in Python: Statistical model stuck on a stochastic system dynamics model in PyMC

My recent tutorial on how to stick a statistical model on a systems dynamics model in PyMC generated a good amount of reader interest, as well as an interesting comment from Anand Patil, who writes:

Something that might interest you is that, in continuous-time stochastic differential equation models, handling the unobserved sample path between observations is really tricky. Roberts and Stramer’s On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm explains why. This difficulty can apply to discrete-time models with loads of missing data as well. Alexandros Beskos has produced several really cool solutions.

This body of research is quite far from the vocabulary I am familiar with, so I’m not sure how serious a problem this could be for me. It did get me interested in sticking my statistical model to a systems model with stochastic dynamics, though, something which took only a few additional lines… thanks PyMC!

## Stochastic SI model

from pymc import *
from numpy import *

#observed data
T = 10
susceptible_data = array([999,997,996,994,993,992,990,989,986,984])
infected_data = array([1,2,5,6,7,18,19,21,23,25])

# stochastic priors
beta = Uniform('beta', 0., 1., value=.05)
gamma = Uniform('gamma', 0., 1., value=.001)
tau = Normal('tau', mu=.01, tau=100., value=.01)

# stochastic compartmental model
S = {}
I = {}

## uninformative initial conditions
S[0] = Uninformative('S_0', value=999.)
I[0] = Uninformative('I_0', value=1.)

## stochastic difference equations for later times
for i in range(1,T):
    def E_S_i(S=S[i-1], I=I[i-1], beta=beta):
        return S - beta * S * I / (S + I)
    S[i] = Normal('S_%d'%i, mu=E_S_i, tau=tau, value=E_S_i.value)

    def E_I_i(S=S[i-1], I=I[i-1], beta=beta, gamma=gamma):
        return I + beta * S * I / (S + I) - gamma * I
    I[i] = Normal('I_%d'%i, mu=E_I_i, tau=tau, value=E_I_i.value)

# data likelihood
A = Poisson('A', mu=[S[i] for i in range(T)], value=susceptible_data, observed=True)
B = Poisson('B', mu=[I[i] for i in range(T)], value=infected_data, observed=True)

This ends up taking a total of 6 lines more than the deterministic version, and all the substantial changes are from lines 24-34. So, question one is for Anand, do I have to worry about unobserved sample paths here? If I’ve understood Roberts and Stramer’s introduction, I should be ok. Question two returns to a blog topic from one year ago, that I’ve continued to try to educate myself about: how do I decide if and when this more complicated model should be used?

Comments Off on MCMC in Python: Statistical model stuck on a stochastic system dynamics model in PyMC

Filed under global health, MCMC, statistics

Snowed in Seattle

Since Seattle rarely sees snow the city gets quite shut down when it does happen. We have a few inches on the ground today, so despite the beautiful sun, UW has suspended operations. Fine for me, I like the change of scenery. Somehow, these snowy days have got me watching a lot of online videos. First of all, a movie recommendation: In the loop, a political satire, which is the genre that Jessi and I agree on the most.

But for those of you looking for a shorter diversion and/or something healthyalgorithms related, I call your attention first to Google Refine Screencasts and second to a list of the 20 must-see CS TED talks from the mastersincomputerscience blog.

On the 20 must-see list, Torsten Reil’s talk caught my eye. He has built these human models with really lifelike kinetics using serious physics and biology, and then has 15 minutes of his talk devoted to showing what happens when he pushes them around. My caricature of a hard-AI researcher is a mad scientist building a sentient computer to be their only friend, so the way Reil has created these lifelike virtual people just to beat the up and make fun of their dance moves, that exceeds my expectations.

1 Comment

Filed under videos

Beautiful Networks

I’ve had a secret project running in the background this week two weeks ago (how time flies!), a continuation of my work on bias reduction for traceroute sampling. It would be nice if this had applications to global health, but unfortunately (and uncharacteristically) I can’t think of any. It is a great opportunity for visualizing networks, though, a topic worthy of a quick post.

The bowl-of-spaghetti network visualization has been a staple of complex networks research for the last decade. I’m not convinced that there is anything interesting to learn from real world networks by drawing them in 2 or 3 dimensions, but the graphics a seriously eye catching. And I’m not convinced that there isn’t anything to learn from them, either. I invite you to convince me in the comments.


What my side project has reminded me of, however, is the value of drawing networks in 2 dimensions for illustrating the principles of network algorithms and network statistics. And if the topic of study is complex or real-world or random networks, than a little bit of spaghetti in the graphic seems appropriate.

There are a lot of nice tools for doing this now, and just collecting the things I should consider in one place makes this post worthwhile for me. I learn towards a Pythonic solution of networkx driving graphviz, but there are some javascript options out there now that seem promising (jit, protovis, possibly more from a stackoverflow question) in. And for those looking for a less command-line-based solution, the Pajek system seems like a good route.

As for what to graph, here are my thoughts. The Erdos-Renyi graph doesn’t look good, and the Preferential Attachment graph doesn’t look good. Use them for your theorems and for your simulations, but when it comes time to draw something, consider a random geometric graph. And since these can be a little dense, you might want an “edge-percolated random geometric graph”.

I did have a little trouble with this approach, too, when I was drawing minimum spanning trees, because the random geometric points end up being placed really close together occasionally. So maybe the absolutely best random graph for illustrations would be a geometric graph with vertices from a “hard core” model, which is to say random conditioned on being a minimum distance apart. Unfortunately, it is an open question how to efficiently generate hard-core points. But it’s not hard to fake:

More informative?

Want some of your own? Here’s the code.

Comments Off on Beautiful Networks

Filed under probability

OR jobs (oh, are they?)

Jason Hartlines writes to publicize the Northwestern University Industrial Engineering department’s search to fill two faculty positions. Individuals with interest in logistics and health-care operations (as well as a range of other interests) are especially encouraged to apply.

Comments Off on OR jobs (oh, are they?)

Filed under Uncategorized

Nice post on Cython

Python/Sage guru William Stein has a nice article about Cython and how it can speed up routine calculations at the expense of mathematical elegance (and why that is a good thing). Could Sage be my interactive shell for PyMC modeling?

Comments Off on Nice post on Cython

Filed under software engineering

Interesting Talks

Here are a slew of interesting things for computer scientists interested in global health, from the UW CSE Industrial Affiliates Program. This came my from the UW Change mailing list.

Comments Off on Interesting Talks

Filed under global health

Global Health Metrics and Evaluation (GHME) Conference 2011

The Global Health Metrics and Evaluation (GHME) Conference 2011 is something that might interest you. It’s in Seattle, from March 14-16, 2011. More from the conference website:

The Global Health Metrics & Evaluation (GHME) conference aims to bring together all the different disciplines involved in global health measurement and evaluation under one roof to share innovative tools and methods to get a better understanding of what the possibilities are in approaching population health measurement.

Who should come? People who are interested in cutting edge science, who want to learn from others in their fields and in other fields. Specifically, researchers, academic leaders, students, policymakers, non-governmental organizations, foundations, country offices of health statistics, and national and multi-national health organizations.

Comments Off on Global Health Metrics and Evaluation (GHME) Conference 2011

Filed under global health