Monthly Archives: November 2008

MCMC in Python: PyMC for Bayesian Probability

I’ve got an urge to write another introductory tutorial for the Python MCMC package PyMC.  This time, I say “enough” to the comfortable realm of Markov chains for their own sake.  In this tutorial, I’ll test the waters of Bayesian probability.

Tom Bayes

Now, what better problem to stick my toe in than the one that inspired Reverend Thomas in the first place? Let’s talk about sex ratio. This is also convenient, because I can crib from Bayesian Data Analysis, the book Rif recommended to me a month ago.

Bayes started this enterprise off with a question that has inspired many an evolutionary biologist: are girl children as likely as boy children? Or are they more likely or less likely? Laplace wondered this also, and in his time and place (from 1745 to 1770 in Paris) there were birth records of 241,945 girls and 251,527 boys. In the USA in 2005, the vital registration system recorded 2,118,982 male and 2,019,367 female live births [1]. I’ll set up a Bayesian model of this, and ask PyMC if the sex ratio could really be 1.0.
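The PyMC model itself isn’t shown in this excerpt, so as a sketch of what it would compute, here is a hand-rolled Metropolis sampler in plain Python for the same question: with a uniform prior on θ = Pr(girl birth) and a binomial likelihood for Laplace’s Paris counts, sample the posterior and ask how much mass sits at or above θ = 1/2 (a sex ratio of 1.0). The step size and iteration counts are illustrative choices, not anything from the post.

```python
import math
import random

# Laplace's Paris birth records, 1745-1770 (from the post above)
girls, boys = 241945, 251527

def log_post(theta):
    # Log-posterior of theta = Pr(girl) under a uniform prior:
    # the binomial likelihood, up to an additive constant.
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return girls * math.log(theta) + boys * math.log(1.0 - theta)

rng = random.Random(1)
theta, samples = 0.5, []
for i in range(20000):
    proposal = theta + rng.gauss(0.0, 0.001)   # random-walk proposal
    # Metropolis rule: always accept uphill moves, sometimes downhill ones
    if math.log(rng.random()) < log_post(proposal) - log_post(theta):
        theta = proposal
    if i >= 5000:                              # discard burn-in
        samples.append(theta)

post_mean = sum(samples) / len(samples)
pr_half_or_more = sum(s >= 0.5 for s in samples) / len(samples)
```

With these data the posterior mean lands near 241945/493472 ≈ 0.490, and essentially no posterior mass sits at θ ≥ 0.5, which is the same verdict Laplace reached: girl births are slightly but decisively less likely than boy births.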

Filed under MCMC, probability

Have yinz already seen Google Flu? It’s a project by google.org, in collaboration with the Centers for Disease Control and Prevention (CDC). It’s been getting healthy press coverage for the last two weeks or so. And, if you want to dig deeper, a draft manuscript on their approach is also available.

The headline result of this approach to tracking flu outbreaks is that it is fast: google.org can observe flu trends two weeks before the CDC. And it is accurate enough, with correlations of 0.85-0.98 between the search-result-based estimate and the gold-standard rates produced by the CDC.
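Those 0.85–0.98 figures are Pearson correlation coefficients between the two time series. For concreteness, here is what that comparison looks like on made-up weekly flu rates (the numbers below are hypothetical, not Google’s or the CDC’s data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical weekly influenza-like-illness rates:
# CDC gold standard vs. a search-query-based estimate
cdc    = [1.1, 1.4, 2.0, 3.1, 4.2, 3.5, 2.2]
search = [1.0, 1.5, 2.2, 3.0, 4.0, 3.6, 2.0]
r = pearson_r(cdc, search)
```

A value of r near 1 means the search-based estimate rises and falls in lockstep with the gold standard, even if its absolute level is off.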

I’ll tell you what’s wrong with it, but first let me praise it.

Filed under global health

Learning to Rank

A lovely stats paper appeared on the arxiv recently. Learning to rank with combinatorial Hodge theory, by Jiang, Lim, Yao, and Ye.

I admit it, the title is more than a little scary. But it may be the case that no more readable paper has so intimidating a title, and no more intimidatingly titled a paper is more readable.


Filed under combinatorial optimization, probability

NSF recently began accepting applications for their annual EAPSI program (due date: Dec. 9). The “East Asia and Pacific Summer Institutes” are an opportunity for science and tech grad students who are U.S. citizens or permanent residents to do some research in an Asian or Pacific country of their choice.

MCMC has been the main approach to solving this problem, and it has been a great success for the polynomial-time dogma, starting with the work of Dyer, Frieze, and Kannan which established the running-time upper bound of $\mathcal{O}\left(n^{23}(\log n)^5\right)$.  The idea is this: you start with some point in S and try to move to a new, nearby point randomly.  “Randomly how?”, you wonder.  That is the art.
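The set S isn’t defined in this excerpt, but the “move to a nearby point randomly” step can be sketched with a toy membership-oracle walk in the style of the ball walk. Everything here is illustrative: a cube-shaped proposal stands in for the uniform-in-a-ball proposal that the actual analyses use, and the unit square stands in for a general convex body.

```python
import random

def ball_walk(in_S, x0, delta, steps, seed=0):
    """Random walk in a set S, given only a membership oracle in_S:
    from the current point, propose a nearby random point; move there
    iff it stays inside S, otherwise stay put.  (A uniform proposal in
    an axis-aligned cube of half-width delta is used here for
    simplicity, in place of a true uniform-in-ball proposal.)"""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        y = [c + rng.uniform(-delta, delta) for c in x]
        if in_S(y):
            x = y   # accepted move
    return x

# toy body: the unit square [0, 1]^2
in_square = lambda p: all(0.0 <= c <= 1.0 for c in p)
x = ball_walk(in_square, [0.5, 0.5], 0.1, 2000)
```

The art that the post alludes to is choosing the proposal distribution and step size delta so that the walk mixes quickly, i.e., reaches a nearly uniform point over S in polynomially many steps.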