My recent tutorial on how to stick a statistical model on a systems dynamics model in PyMC generated a good amount of reader interest, as well as an interesting comment from Anand Patil, who writes:

Something that might interest you is that, in continuous-time stochastic differential equation models, handling the unobserved sample path between observations is really tricky. Roberts and Stramer’s On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm explains why. This difficulty can apply to discrete-time models with loads of missing data as well. Alexandros Beskos has produced several really cool solutions.

This body of research is quite far from the vocabulary I am familiar with, so I’m not sure how serious a problem this could be for me. It did get me interested in sticking my statistical model to a systems model with stochastic dynamics, though, something which took only a few additional lines… thanks PyMC!

## Stochastic SI model from pymc import * from numpy import * #observed data T = 10 susceptible_data = array([999,997,996,994,993,992,990,989,986,984]) infected_data = array([1,2,5,6,7,18,19,21,23,25]) # stochastic priors beta = Uniform('beta', 0., 1., value=.05) gamma = Uniform('gamma', 0., 1., value=.001) tau = Normal('tau', mu=.01, tau=100., value=.01) # stochastic compartmental model S = {} I = {} ## uninformative initial conditions S[0] = Uninformative('S_0', value=999.) I[0] = Uninformative('I_0', value=1.) ## stochastic difference equations for later times for i in range(1,T): @deterministic(name='E[S_%d]'%i) def E_S_i(S=S[i-1], I=I[i-1], beta=beta): return S - beta * S * I / (S + I) S[i] = Normal('S_%d'%i, mu=E_S_i, tau=tau, value=E_S_i.value) @deterministic(name='E[I_%d]'%i) def E_I_i(S=S[i-1], I=I[i-1], beta=beta, gamma=gamma): return I + beta * S * I / (S + I) - gamma * I I[i] = Normal('I_%d'%i, mu=E_I_i, tau=tau, value=E_I_i.value) # data likelihood A = Poisson('A', mu=[S[i] for i in range(T)], value=susceptible_data, observed=True) B = Poisson('B', mu=[I[i] for i in range(T)], value=infected_data, observed=True)

This ends up taking a total of 6 lines more than the deterministic version, and all the substantial changes are from lines 24-34. So, question one is for Anand, do I have to worry about unobserved sample paths here? If I’ve understood Roberts and Stramer’s introduction, I should be ok. Question two returns to a blog topic from one year ago, that I’ve continued to try to educate myself about: how do I decide if and when this more complicated model should be used?