
Classic EM in Python: Warm-up problem of 197 animals in PyMC

The classic paper on the EM algorithm (Dempster, Laird, and Rubin 1977) begins with a little application in multinomial modeling:

Rao (1965, pp. 368-369) presents data in which 197 animals are distributed multinomially into four categories, so that the observed data consist of y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34).

A genetic model for the population specifies cell probabilities \left(\frac12 + \frac14\pi, \frac14(1-\pi), \frac14(1-\pi), \frac14\pi\right) for some \pi with 0\leq \pi \leq 1.
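To make the estimation problem explicit, the log-likelihood of the observed counts is

\ell(\pi) = 125\log\left(\frac12 + \frac14\pi\right) + 38\log\left(\frac14(1-\pi)\right) + 34\log\left(\frac14\pi\right),

and setting its derivative to zero and clearing denominators leaves the quadratic 197\pi^2 - 15\pi - 68 = 0.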

Solving this analytically sounds very much like a 1960s pastime to me (answer: \pi = \frac{15 + \sqrt{53809}}{394}), and a modern computer with PyMC can make short work of numerically approximating this maximum likelihood estimate. It takes no more than this:

import pymc as mc, numpy as np

pi = mc.Uniform('pi', 0, 1)  # flat prior, so the MAP estimate below is the MLE
y = mc.Multinomial('y', n=197,  # cell probabilities from the genetic model
                   p=[.5 + .25*pi, .25*(1-pi), .25*(1-pi), .25*pi],
                   value=[125, 18, 20, 34], observed=True)

mc.MAP([pi,y]).fit()  # maximize over the one free stochastic, pi
print pi.value

But the point is to warm up, not to innovate. The EM way is to introduce a latent variable x = (x_1, x_2, x_3, x_4, x_5), set y_1 = x_1+x_2, y_2=x_3, y_3=x_4, y_4=x_5, and work with a multinomial distribution for x that induces the multinomial distribution above on y. The art is in choosing a good latent variable, mapping, and distribution, where “good” is measured in terms of the computation it yields.
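Concretely, the augmentation splits the first observed cell in two, so x is multinomial with cell probabilities

\left(\frac12, \frac14\pi, \frac14(1-\pi), \frac14(1-\pi), \frac14\pi\right),

and adding the first two cells recovers the probability \frac12 + \frac14\pi for y_1. In PyMC the augmented model looks like this: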

import pymc as mc, numpy as np

pi = mc.Uniform('pi', 0, 1)

# latent five-cell multinomial; the value is rounded because the E-step
# will set x to a (generally non-integer) conditional expectation
@mc.stochastic
def x(n=197, p=[.5, .25*pi, .25*(1-pi), .25*(1-pi), .25*pi], value=[125.,0.,18.,20.,34.]):
    return mc.multinomial_like(np.round_(value), n, p)

# the observed counts are a deterministic collapsing of x:
# log-probability 0 when x is consistent with y, -inf otherwise
@mc.observed
def y(x=x, value=[125, 18, 20, 34]):
    if np.allclose([x[0]+x[1], x[2], x[3], x[4]], value):
        return 0
    else:
        return -np.inf

It is no longer possible to get a good fit from an mc.MAP object for this model (why?), but EM does not need one. The EM approach is to alternate between two steps, both of which have closed forms for this model (spelled out after the list):

  • Update x (E-step, because it is set to its conditional expectation based on the current value of \pi)
  • Update \pi (M-step, because it is chosen to maximize the likelihood for the current value of x)
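In formulas, the E-step splits the 125 animals in the first observed cell in proportion to the current cell probabilities,

x_2 = 125 \cdot \frac{\frac14\pi}{\frac12 + \frac14\pi}, \qquad x_1 = 125 - x_2,

with x_3, x_4, x_5 fixed at 18, 20, 34, and the M-step sets

\pi = \frac{x_2 + x_5}{x_2 + x_3 + x_4 + x_5} = \frac{x_2 + 34}{x_2 + 18 + 20 + 34}.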

This is simple to implement in PyMC and quick to get the goods:

def E_step():
    # set x to its conditional expectation given y and the current pi
    x.value = [125 * .5 / (.5 + .25*pi.value), 125 * .25*pi.value / (.5 + .25*pi.value), 18, 20, 34]

def M_step():
    # set pi to the complete-data MLE given the current x
    pi.value = (x.value[1] + 34.) / (x.value[1] + 34. + 18. + 20.)

for i in range(5):
    print 'iter %2d: pi=%0.4f, X=%s' % (i, pi.value, x.value)
    E_step()
    M_step()

It is not fair to compare speeds without making both run to the same tolerance, but when that is added, EM is 2.5 times faster. That could be important sometimes, I suppose.
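For reference, a tolerance-based version of the EM loop needs only a small change (the 1e-8 threshold here is an arbitrary choice for illustration):

# run EM until pi stops moving, rather than for a fixed number of iterations
pi_old = np.inf
while abs(pi.value - pi_old) > 1e-8:
    pi_old = float(pi.value)
    E_step()
    M_step()
print pi.value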
