Monthly Archives: November 2008

MCMC in Python: PyMC for Bayesian Probability

I’ve got an urge to write another introductory tutorial for the Python MCMC package PyMC.  This time, I say “enough” to the comfortable realm of Markov chains for their own sake; in this tutorial, I’ll test the waters of Bayesian probability.

[Image: Darwin's other Bulldog]

[Image: Tom Bayes]

Now, what better problem to stick my toe in than the one that inspired Reverend Thomas in the first place? Let’s talk about sex ratio. This is also convenient, because I can crib from Bayesian Data Analysis, that book Rif recommended to me a month ago.

Bayes started this enterprise off with a question that has inspired many an evolutionary biologist: are girl children as likely as boy children? Or are they more likely or less likely? Laplace wondered this also, and in his time and place (from 1745 to 1770 in Paris) there were birth records of 241,945 girls and 251,527 boys. In the USA in 2005, the vital registration system recorded 2,118,982 male and 2,019,367 female live births [1]. I’ll set up a Bayesian model of this, and ask PyMC if the sex ratio could really be 1.0.
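To make this concrete, here is a minimal sketch of the sort of model I have in mind, written in PyMC 2-style syntax.  The uniform prior on the probability of a female birth and the sampler settings are illustrative choices for this excerpt, not necessarily what appears in the full post:

import pymc as mc

# Live births in the USA in 2005 [1]
males, females = 2118982, 2019367

# theta = probability that a live birth is a girl; a flat prior is one simple choice
theta = mc.Uniform('theta', lower=0., upper=1.)

# binomial likelihood for the observed number of girls out of all births
obs = mc.Binomial('obs', n=males + females, p=theta,
                  value=females, observed=True)

model = mc.MCMC([theta, obs])
model.sample(iter=20000, burn=5000, thin=10)

samples = theta.trace()
print(samples.mean())           # posterior mean of theta
print((samples >= 0.5).mean())  # posterior probability that girls are at least as likely as boys

A sex ratio of exactly 1.0 corresponds to theta = 0.5, so the last line is a quick-and-dirty check of whether the data leave any room for that possibility.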

Continue reading

8 Comments

Filed under MCMC, probability

Google Flu

Have yinz already seen Google Flu? It’s a project by google.org, in collaboration with the Centers for Disease Control and Prevention (CDC). It’s been getting healthy press coverage for the last two weeks or so. And, if you want to dig deeper, a draft manuscript on their approach is also available.

The headline result of this approach to tracking flu outbreaks is that it is fast: google.org can observe flu trends two weeks before the CDC. And it is accurate enough, with correlations of 0.85-0.98 between the search-result-based estimate and the gold-standard rates produced by the CDC.

I’ll tell you what’s wrong with it, but first let me praise it.

Continue reading

2 Comments

Filed under global health

Learning to Rank

A lovely stats paper appeared on the arXiv recently: Learning to rank with combinatorial Hodge theory, by Jiang, Lim, Yao, and Ye.

I admit it, the title is more than a little scary. But it may be the case that no more readable paper has so intimidating a title, and no more intimidatingly titled paper is more readable.

Continue reading

Comments Off on Learning to Rank

Filed under combinatorial optimization, probability

Grad Students: NSF Funding for Research Abroad

NSF recently began accepting applications for its annual EAPSI program (due date: Dec. 9). The “East Asia and Pacific Summer Institutes” are an opportunity for science and tech grad students who are U.S. citizens or permanent residents to do some research in an Asian or Pacific country of their choice.

[Image: This could be your view]

Continue reading

Comments Off on Grad Students: NSF Funding for Research Abroad

Filed under science policy

MCMC in Python: PyMC to sample uniformly from a convex body

This post is a little tutorial on how to use PyMC to sample points uniformly at random from a convex body.  The computational challenge is this: if you have a magic box which will tell you yes or no when you ask, “Is this point (in n dimensions) in the convex set S?”, can you come up with a random point that is nearly uniformly distributed over S?
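As a preview, here is a simplified sketch of one way to set this up in PyMC 2-style syntax (it is not the code from the full post): a potential that returns 0 inside S and -inf outside plays the role of the magic box, and the Metropolis random walk is then confined to S.  The unit ball stands in for S here, and the dimension is kept tiny so the toy example mixes quickly.

import numpy as np
import pymc as mc

n = 2  # dimension; small enough that this toy walk mixes quickly

def in_S(x):
    # the magic box: answers yes/no membership in S (here, the unit ball)
    return np.dot(x, x) <= 1.

# start the walk at a point known to be inside S;
# the uniform prior over the bounding box [-1, 1]^n is flat, so nothing inside S is favored
X = mc.Uniform('X', lower=-np.ones(n), upper=np.ones(n), value=np.zeros(n))

@mc.potential
def S_indicator(X=X):
    # log-potential: 0 inside S, -inf outside, so proposals that step out of S are rejected
    if in_S(X):
        return 0.
    return -np.inf

model = mc.MCMC([X, S_indicator])
model.sample(iter=50000, burn=10000, thin=10)

samples = X.trace()  # approximately uniform points in S, once the chain has mixed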

MCMC has been the main approach to solving this problem, and it has been a great success for the polynomial-time dogma, starting with the work of Dyer, Frieze, and Kannan, which established a running-time upper bound of \mathcal{O}\left(n^{23}(\log n)^5\right).  The idea is this: you start with some point in S and try to move to a new, nearby point randomly.  “Randomly how?”, you wonder.  That is the art.

Continue reading

9 Comments

Filed under MCMC, probability