# Category Archives: Uncategorized

## IHME Seminar: Ambient Air Pollution and Cardiovascular Disease

The semester is starting up again, and that means that weekly IHME seminars are starting up again. This week, we heard from Joel Kaufman, a doctor and UW professor who knows quite a lot about how air pollution is bad for the heart. He had some great historical photos of air pollution from the Great Smog of London, which I had not heard of before. Searching later led me to this collection in the Guardian. Dr. Kaufman also had some recent photos of extreme air pollution, which looked sort of like this one.

I remember when I started this blog, I had a goal to draw connections between the issues in global health and the methods and results of theoretical computer science. What does the air-pollution/cardio-health talk inspire along these lines? Well, there are two “big data” sources going on here: continuously updated measurements of air quality from a number of geographically dispersed sensors, and regularly conducted CT scans of participants in a large longitudinal study. It was only an hour long talk, so I’m not sure what problems arise when you put these things together, but I bet you can’t store it all in memory, even on a pretty large computer. And that’s one definition of big…

Filed under Uncategorized

## Thoughtful blog about using IPython Notebook in scientific workflow

I’ve been doing this for about a year now, and it is working super-well. What I wish for is a way to paste images directly into the notebook. I think it would be pretty easy to add but I haven’t figured out how to do it yet.

Filed under Uncategorized

## DSP in Python: Active Noise Reduction with PyAudio

I had a fun little project a while back, to deal with some night noise that was getting in the way of my sleep. Active noise reduction, hacked together in Python. It really works (for me)! There is tons of room for improvement, and at least one interested party. I’m finally pushing it out into the world, so maybe someone will improve it.

Filed under Uncategorized

## Graph Analytics Challenge

I was playing around with SPARQL queries and the semantic web earlier this year, inspired in part by a contest I entered. Well, the good news came out that my project was second runner-up! Of course, I would like to be first place, but the projects that beat mine were both really cool. Information Week did a nice story on all of us.

Filed under Uncategorized

## Math and Dance

I almost didn’t share these HarleMCMC videos, but how long could I resist, really?

We’ll see how this holds up to repeated viewing…

Here is a math/dance video for the ages:

Filed under MCMC, Uncategorized

## Better typography for IPython notebooks, now

I came across a long blog about how to make ipython notebooks more aesthetically pleasing recently, and I think there is a lot to be learned there.

The good news is that you can try this out on a notebook-by-notebook basis with a little trick. Just drop this into a cell in any IPython notebook:

import IPython.core.display

IPython.core.display.HTML("""

div.input {
width: 105ex; /* about 80 chars + buffer */
}

div.text_cell {
width: 105ex /* instead of 100%, */
}

div.text_cell_render {
/*font-family: "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;*/
font-family: "Charis SIL", serif; /* Make non-code text serif. */
line-height: 145%; /* added for some line spacing of text. */
width: 105ex; /* instead of 'inherit' for shorter lines */
}

/* Set the size of the headers */
div.text_cell_render h1 {
font-size: 18pt;
}

div.text_cell_render h2 {
font-size: 14pt;
}

.CodeMirror {
font-family: Consolas, monospace;
}

""")


Filed under Uncategorized

Filed under Uncategorized

## Classic EM in Python: Multinomial sampling

In the classic paper on the EM algorithm, the extensive example section begins with a multinomial modeling example that is theoretically very similar to the warm-up problem on 197 animals:

We can think of the complete data as an $n \times p$ matrix $x$ whose $(i,j)$ element is unity if the $i$-th unit belongs in the $j$-th of $p$ possible cells, and is zero otherwise. The $i$-th row of $x$ contains $p-1$ zeros and one unity, but if the $i$-th unit has incomplete data, some of the indicators in the $i$-th row of $x$ are observed to be zero, while the others are missing and we know only that one of them must be unity. The E-step then assigns to the missing indicators fractions that sum to unity within each unit, the assigned values being expectations given the current estimate of $\phi$. The M-step then becomes the usual estimation of $\phi$ from the observed and assigned values of the indicators summed over the units.

In practice, it is convenient to collect together those units with the same pattern of missing indicators, since the filled in fractional counts will be the same for each; hence one may think of the procedure as filling in estimated counts for each of the missing cells within each group of units having the same pattern of missing data.

When I first made some data to try this out, it looked like this:

import pymc as mc, numpy as np, pandas as pd, random

n = 100000
p = 5

pi_true = mc.rdirichlet(np.ones(p))
pi_true = np.hstack([pi_true, 1-pi_true.sum()])
x_true = mc.rmultinomial(1, pi_true, size=n)

x_obs = array(x_true, dtype=float)
for i in range(n):
for j in random.sample(range(p), 3):
x_obs[i,j] = np.nan


At first, I was pretty pleased with myself when I managed to make a PyMC model and an E-step and M-step that converged to something like the true value of $\pi$. The model is not super slick:

pi = mc.Uninformative('pi', value=np.ones(p)/p)

x_missing = np.isnan(x_obs)
x_initial = x_obs.copy()
x_initial[x_missing] = 0.
for i in range(n):
if x_initial[i].sum() == 0:
j = np.where(x_missing[i])[0][0]
x_initial[i,j] = 1.
@mc.stochastic
def x(pi=pi, value=x_initial):
return mc.multinomial_like(value, 1, pi)

@mc.observed
def y(x=x, value=x_obs):
if np.allclose(x[~x_missing], value[~x_missing]):
return 0
else:
return -np.inf


And the E-step/M-step parts are pretty simple:

def E_step():
x_new = array(x_obs, dtype=float)
for i in range(n):
if x_new[i, ~x_missing[i]].sum() == 0:
conditional_pi_sum = pi.value[x_missing[i]].sum()
for j in np.where(x_missing[i])[0]:
x_new[i,j] = pi.value[j] / conditional_pi_sum
else:
x_new[i, x_missing[i]] = 0.
x.value = x_new

def M_step():
counts = x.value.sum(axis=0)
pi.value = (counts / counts.sum())


But the way the values converge does look nice:

The thing that made me feel silly was comparing this fancy-pants approach to the result of averaging all of the non-empty cells of x_obs:

ests = pd.DataFrame(dict(pr=pi_true, true=x_true.mean(0),
naive=pd.DataFrame(x_obs).mean(), em=pi.value),
columns=['pr', 'true', 'naive', 'em']).sort('true')
print np.round_(ests, 3)
pr   true  naive     em
2  0.101  0.101  0.100  0.101
0  0.106  0.106  0.108  0.108
3  0.211  0.208  0.209  0.208
1  0.269  0.271  0.272  0.271
4  0.313  0.313  0.314  0.313


Simple averages are just as good as EM, for the simplest distribution I could think of based on the example, anyways.

To see why this EM business is worth the effort requires a more elaborate model of missingness. I made one, but it is a little bit messy. Can you make one that is nice and neat?

Filed under statistics, Uncategorized

## R in ipython

This is going to be great. Now I’ve got to figure out how to make Rpy work, though!

There is a funny little discussion on twitter, where I found out about it.

Filed under Uncategorized

## Turing Centennial Series from In Theory blog

I have greatly enjoyed the series of posts that Luca Trevisan brought together in honor of Alan Turing’s 100th birthday.  He introduced the series as follows:

Within the Turing festivities, I think it would be interesting to talk about how things have changed (or not) since Turing’s time for people who do academic work in cryptography and in the theory of computing and who are gay or lesbian.

So I have invited a number of gay and lesbian colleagues to write guest posts talking about how things have been for them, and so far half a dozen have tentatively accepted. Their posts will appear next month which, besides being Turing’s centennial month, also happens to be the anniversary of the Stonewall riots.

The 8 contributed posts (starting from post 0) are collected here, but I recommend that you start from Post 0 and work your way up: