Tag Archives: Bayesian

Jake’s exposition of Bayesianism

Still a gem:

  1. http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/
  2. http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/
  3. http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/
  4. http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/
  5. http://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/

(A collection of gems?)

And, TIL, (editors note, actually a few months ago…) there there is a paper version of it, too: https://arxiv.org/pdf/1411.5018.pdf

Comments Off on Jake’s exposition of Bayesianism

Filed under Uncategorized

MCMC in Python: Gaussian mixture model in PyMC3

PyMC3 is really coming along. I tried it out on a Gaussian mixture model that was the subject of some discussion on GitHub: https://github.com/pymc-devs/pymc3/issues/443#issuecomment-109813012 http://nbviewer.ipython.org/gist/aflaxman/64f22d07256f67396d3a

1 Comment

Filed under MCMC, software engineering, statistics

Laplace approximation in Python: another cool trick with PyMC3

I admit that I’ve been skeptical of the complete rewrite of PyMC that underlies version 3. It seemed to me motivated by an interest in using unproven new step methods that require knowing the derivative of the posterior distribution. But, it is really coming together, and regardless of whether or not the Hamiltonian Monte Carlo stuff pays off, there are some cool tricks you can do when you can get derivatives without a hassle.

Exhibit 1: A Laplace approximation approach to fitting mixed effect models (as described in http://www.seanet.com/~bradbell/tmb.htm)


Comments Off on Laplace approximation in Python: another cool trick with PyMC3

Filed under software engineering, statistics

MCMC in Python: Estimating failure rates from observed data

A question and answer on CrossValidated, which make me reflect on the danger of knowing enough statistics to be dangerous.

Comments Off on MCMC in Python: Estimating failure rates from observed data

Filed under statistics

MCMC in Python: Statistical model stuck on a stochastic system dynamics model in PyMC

My recent tutorial on how to stick a statistical model on a systems dynamics model in PyMC generated a good amount of reader interest, as well as an interesting comment from Anand Patil, who writes:

Something that might interest you is that, in continuous-time stochastic differential equation models, handling the unobserved sample path between observations is really tricky. Roberts and Stramer’s On inference for partially observed nonlinear diffusion models using the Metropolis-Hastings algorithm explains why. This difficulty can apply to discrete-time models with loads of missing data as well. Alexandros Beskos has produced several really cool solutions.

This body of research is quite far from the vocabulary I am familiar with, so I’m not sure how serious a problem this could be for me. It did get me interested in sticking my statistical model to a systems model with stochastic dynamics, though, something which took only a few additional lines… thanks PyMC!

## Stochastic SI model

from pymc import *
from numpy import *

#observed data
T = 10
susceptible_data = array([999,997,996,994,993,992,990,989,986,984])
infected_data = array([1,2,5,6,7,18,19,21,23,25])

# stochastic priors
beta = Uniform('beta', 0., 1., value=.05)
gamma = Uniform('gamma', 0., 1., value=.001)
tau = Normal('tau', mu=.01, tau=100., value=.01)

# stochastic compartmental model
S = {}
I = {}

## uninformative initial conditions
S[0] = Uninformative('S_0', value=999.)
I[0] = Uninformative('I_0', value=1.)

## stochastic difference equations for later times
for i in range(1,T):
    def E_S_i(S=S[i-1], I=I[i-1], beta=beta):
        return S - beta * S * I / (S + I)
    S[i] = Normal('S_%d'%i, mu=E_S_i, tau=tau, value=E_S_i.value)

    def E_I_i(S=S[i-1], I=I[i-1], beta=beta, gamma=gamma):
        return I + beta * S * I / (S + I) - gamma * I
    I[i] = Normal('I_%d'%i, mu=E_I_i, tau=tau, value=E_I_i.value)

# data likelihood
A = Poisson('A', mu=[S[i] for i in range(T)], value=susceptible_data, observed=True)
B = Poisson('B', mu=[I[i] for i in range(T)], value=infected_data, observed=True)

This ends up taking a total of 6 lines more than the deterministic version, and all the substantial changes are from lines 24-34. So, question one is for Anand, do I have to worry about unobserved sample paths here? If I’ve understood Roberts and Stramer’s introduction, I should be ok. Question two returns to a blog topic from one year ago, that I’ve continued to try to educate myself about: how do I decide if and when this more complicated model should be used?

Comments Off on MCMC in Python: Statistical model stuck on a stochastic system dynamics model in PyMC

Filed under global health, MCMC, statistics

MCMC in Python: How to stick a statistical model on a system dynamics model in PyMC

A recent question on the PyMC mailing list inspired me.  How can you estimate transition parameters in a compartmental model?  I did a lit search for just this when I started up my generic disease modeling project two years ago.  Much information, I did not find.  I turned up one paper which said basically that using a Bayesian approach was a great idea and someone should try it (and I can’t even find that now!).

Part of the problem was language.  I’ve since learned that micro-simulators call it “calibration” when you estimate parameter values, and there is a whole community of researchers working on “black-box modeling plug-and-play inference” that do something similar as well.  These magic phrases are incantations to the search engines that help find some relevant prior work.

But I started blazing my own path before I learned any of the right words; with PyMC, it is relatively simple.  Consider the classic SIR model from mathematical epidemology.  It’s a great place to start, and it’s what Jason Andrews started with on the PyMC list.  I’ll show you how to formulate it for Bayesian parameter estimation in PyMC, and how to make sure your MCMC has run for long enough. Continue reading


Filed under global health, MCMC, statistics

Child Mortality Paper

Check it out, my first published research in global health: Neonatal, postneonatal, childhood, and under-5 mortality for 187 countries, 1970—2010: a systematic analysis of progress towards Millennium Development Goal 4. I’m the ‘t’ in et al, and my contribution was talking them into using the really fun Gaussian Process in their model (and helping do it).

I’ve long wanted to write a how-to style tutorial about using Gaussian Processes in PyMC, but time continues to be on someone else’s side. Instead of waiting for that day, you can enjoy the GP Users Guide now.


Filed under global health

Multilevel (hierarchical) modeling: what it can and cannot do in Python

I re-read a short paper of Andrew Gelman’s yesterday about multilevel modeling, and thought “That would make a nice example for PyMC”.  The paper is “Multilevel (hierarchical) modeling: what it can and cannot do, and R code for it is on his website.

To make things even easier for a casual blogger like myself, the example from the paper is extended in the “ARM book”, and Whit Armstrong has already implemented several variants from this book in PyMC. Continue reading


Filed under MCMC, statistics

MCMC in Python: PyMC for Bayesian Model Selection

(Updated 9/2/2009, but still unfinished; see other’s work on this that I’ve collected)

I never took a statistics class, so I only know the kind of statistics you learn on the street. But now that I’m in global health research, I’ve been doing a lot of on-the-job learning. This post is about something I’ve been reading about recently, how to decide if a simple statistical model is sufficient or if the data demands a more complicated one. To keep the matter concrete (and controversial) I’ll focus on a claim from a recent paper in Nature that my colleague, Haidong Wang, choose for our IHME journal club last week: Advances in development reverse fertility declines. The title of this short letter boldly claims a causal link between total fertility rate (an instantaneous measure of how many babies a population is making) and the human development index (a composite measure of how “developed” a country is, on a scale of 0 to 1). Exhibit A in their case is the following figure:

An astute observer of this chart might ask, “what’s up with the scales on those axes?” But this post is not about the visual display of quantitative information. It is about deciding if the data has a piecewise linear relationship that Myrskyla et al claim, and doing it in a Bayesian framework with Python and PyMC. But let’s start with a figure where the axes have a familiar linear scale! Continue reading


Filed under MCMC, statistics

MCMC in Python: PyMC for Bayesian Probability

I’ve got an urge to write another introductory tutorial for the Python MCMC package PyMC.  This time, I say enough to the comfortable realm of Markov Chains for their own sake.  In this tutorial, I’ll test the waters of Bayesian probability.

Darwin's other Bulldog

Tom Bayes

Now, what better problem to stick my toe in than the one that inspired Reverend Thomas in the first place? Let’s talk about sex ratio. This is also convenient, because I can crib from Bayesian Data Analysis, that book Rif recommended me a month ago.

Bayes started this enterprise off with a question that has inspired many an evolutionary biologist: are girl children as likely as boy children? Or are they more likely or less likely? Laplace wondered this also, and in his time and place (from 1745 to 1770 in Paris) there were birth records of 241,945 girls and 251,527 boys. In the USA in 2005, the vital registration system recorded 2,118,982 male and 2,019,367 female live births [1]. I’ll set up a Bayesian model of this, and ask PyMC if the sex ratio could really be 1.0.

Continue reading


Filed under MCMC, probability