# Causal Modeling in Python: Bayesian Networks in PyMC

While I was off being really busy, an interesting project to learn PyMC was discussed on their mailing list, beginning thusly:

I am trying to learn PyMC and I decided to start from the very simple discrete Sprinkler model.
I have found plenty of examples for continuous models, but I am not sure how should I proceed with conditional tables, especially when the condition is over more than a variable. How would you model a CPT with PyMC, and in particular the Sprinkler model?

I spend all my time on continuous models these days, and I miss my old topics from back in grad school. Not that this is one of them, exactly. But when I found myself with a long wait while running some validation code, I thought I’d give it a try. The model turned out to be simple, although using MCMC to fit it is probably not the best idea.

The model:

A simple causal network

The model is a simple causal network, which says that two things cause the grass to be wet, the rain and the sprinkler. And the absence of rain causes the sprinkler to be used. But all of these things happen with conditional probabilities, which are known.

The Question:
Questions about causality gets me very confused, and are a topic of philosophy that I would like to learn more about one day. But questions about the probability of events in discrete probability spaces are (were?) my specialty. So I will ignore the direction of the arrows in the figure above, and basically ignore the figure, and focus on the conditional probabilities. We can find the probability that the grass is wet if it is raining by summing over the possibilities for the sprinkler system. But what is the probability that it is raining if the grass is wet?

That question is a good homework exercise on Bayes Rule, and the wikipedia link above work out the full details in by hand. But isn’t there some way to have my computer do the work?

Yes, of course. And it doesn’t take much code:

```# sprinkler.py
import pylab as pl
import pymc as mc

G_obs = [1.]
N = len(G_obs)

R = mc.Bernoulli('R', .2, value=pl.ones(N))

p_S = mc.Lambda('p_S', lambda R=R: pl.where(R, .01, .4),
doc='Pr[S|R]')
S = mc.Bernoulli('S', p_S, value=pl.ones(N))

p_G = mc.Lambda('p_G', lambda S=S, R=R:
pl.where(S, pl.where(R, .99, .9), pl.where(R, .8, 0.)),
doc='Pr[G|S,R]')
G = mc.Bernoulli('G', p_G, value=G_obs, observed=True)
```

This is all it takes to implement the sprinkler model in PyMC. I call your attention to a few details:

• Initial values: the `value=...` parameters in the unobserved stochastic nodes are not required, and if they are left out, PyMC will initialize them randomly from the prior distribution. But there is a state that is impossible in this model (wet grass without sprinkler or rain), and there will be annoying errors when this is chosen as the random initial values.
• Names and docs: I have a great regret from a model I made with nice descriptive names for all the stochastic nodes. Ever since then I’ve used names that correspond exactly to Python variable names. If I want to be descriptive about what the node is for, I’m going to use the `doc` parameters. I hope that goes well.
• `Lambda` deterministic nodes: a lot of recent work has gone into making deterministic nodes automatically in PyMC, so there might be a cooler way to implement these conditional probabilities. This approach is clean, and I like it for keeping my example simple, but it feels like I’m missing something cleaner and simpler and ready to scale to a not-so-simple example.

Fitting this model with MCMC is silly. It is easy to work out by hand, and wikipedia does, so you could just look it up. And if you have a version where you don’t know the answer, then belief propagation is a much more efficient way to find it (and is correct as long are there are no loops in the causal network). But it is also easy. Maybe it is worthwhile then, since it only takes two lines of code:

```import sprinkler
m = mc.MCMC(sprinkler)
m.sample(100000)
```

To make sure it is worthwhile, though, you need to convince yourself that the MCMC mixes, and that you have enough samples to an estimate with sufficient accuracy. Since the exact answer is easy to work out, this is a fun relationship to explore experimentally. Here is the relationship between the number of MCMC iterations and the median absolute error in the estimated probability of rain given wet grass over 10 replications:

The estimate can be pretty accurate if you wait long enough.

Homework:
The practical extension of this is to figure out a nice way to collect up the conditional probability distributions in the general case. And just figuring out what they are is a worthy topic of many Ph.D. dissertations.

The theoretical question that this left me with is when does MCMC work for this, and how long does it take? With some assumptions on the structure of the Bayesian network and the conditional probability distributions, you might be able to prove something interesting.

And the big question is how do you come up with these networks in the first place, and how do you know if you’re right (or right enough)? You probably need to be much more of a statistician and much more of a philosopher than I to sort that one out.