I’ve got an urge to write another introductory tutorial for the Python MCMC package PyMC. This time, I say enough to the comfortable realm of Markov chains for their own sake. In this tutorial, I’ll test the waters of Bayesian probability.
Now, what better problem to stick my toe into than the one that inspired the Reverend Thomas Bayes in the first place? Let’s talk about the sex ratio. This is also convenient, because I can crib from Bayesian Data Analysis, the book Rif recommended to me a month ago.
Bayes started this enterprise off with a question that has inspired many an evolutionary biologist: are girl children as likely as boy children? Or are they more likely or less likely? Laplace wondered this also, and in his time and place (from 1745 to 1770 in Paris) there were birth records of 241,945 girls and 251,527 boys. In the USA in 2005, the vital registration system recorded 2,118,982 male and 2,019,367 female live births. I’ll set up a Bayesian model of this, and ask PyMC if the sex ratio could really be 1.0.
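Before handing the model to PyMC, it’s worth noting it has a closed-form answer we can check against: with a uniform Beta(1, 1) prior on p = Pr(male birth) and a binomial likelihood, conjugacy gives a Beta(male + 1, female + 1) posterior. Here is a minimal NumPy sketch of that calculation (the variable names and sample count are my own, not from the post):

```python
import numpy as np

# USA 2005 live births, from the vital registration counts above
male, female = 2_118_982, 2_019_367

# Uniform prior + binomial likelihood => Beta(male + 1, female + 1)
# posterior on p, by conjugacy; sample from it directly.
rng = np.random.default_rng(12345)
p_samples = rng.beta(male + 1, female + 1, size=100_000)

posterior_mean = p_samples.mean()               # near 2118982 / 4138349
pr_half_or_less = (p_samples <= 0.5).mean()     # Pr(p <= 0.5 | data)
```

With four million births, the posterior is so tightly concentrated around 0.512 that essentially no posterior mass falls at or below 0.5 — the same verdict PyMC’s sampler should return for the “could it really be 1.0?” question.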
A lovely stats paper appeared on the arxiv recently. Learning to rank with combinatorial Hodge theory, by Jiang, Lim, Yao, and Ye.
I admit it, the title is more than a little scary. But it may be the case that no more readable paper has so intimidating a title, and no more intimidatingly titled a paper is more readable.
This post is a little tutorial on how to use PyMC to sample points uniformly at random from a convex body. The computational challenge is this: if you have a magic box which will tell you yes/no when you ask, “Is this point (in n dimensions) in the convex set S?”, can you come up with a random point which is nearly uniformly distributed over S?
MCMC has been the main approach to solving this problem, and it has been a great success for the polynomial-time dogma, starting with the work of Dyer, Frieze, and Kannan, which established a polynomial upper bound on the running time. The idea is this: you start with some point in S and try to move to a new, nearby point randomly. “Randomly how?”, you wonder. That is the art.
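The simplest instance of that idea is a plain ball walk. Here is a minimal NumPy sketch of it (the function name, step size, and toy disk oracle are my own illustration of the general scheme, not the Dyer–Frieze–Kannan algorithm itself):

```python
import numpy as np

def ball_walk(in_S, x0, steps=20_000, delta=0.2, seed=0):
    """Ball walk with a membership oracle: from x, propose a uniform
    point y in the radius-delta ball around x; move there if the
    oracle says y is in S, otherwise stay put."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    samples = []
    for _ in range(steps):
        u = rng.normal(size=d)
        u *= delta * rng.random() ** (1 / d) / np.linalg.norm(u)  # uniform in ball
        y = x + u
        if in_S(y):          # ask the magic box
            x = y
        samples.append(x.copy())
    return np.array(samples)

# Example oracle: the unit disk in 2 dimensions
disk = lambda p: p @ p <= 1.0
pts = ball_walk(disk, x0=[0.0, 0.0])
```

Every visited point is guaranteed to lie in S; the hard part, and the subject of all those running-time theorems, is how many steps it takes before the points are close to uniform.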
I’ve got a new paper up on the arxiv. David Wilson recently posted this joint work that was one of the last things I did during my post-doc at Microsoft. It hasn’t been applied to health metrics yet, but maybe it will be. Let me tell you the story:
A spanning tree is just what it sounds like, if you know that a tree is a graph with no cycles and make a good guess that a spanning tree is a subgraph that is a tree and “spans” all the vertices in the graph. Minimum-cost spanning trees come up in some very practical optimization problems, like when you want to build an electricity network without wasting wire. It was in exactly this context, designing an electricity network for Moravia, that the first algorithm for finding a minimum spanning tree was developed.
A Minimum Spanning Tree, from Wikipedia
The great thing about looking for a minimum spanning tree is that you don’t have to be sneaky to find it. If you repeatedly find the cheapest edge which does not create a cycle, and add that to your subgraph, then this greedy approach never goes wrong. When you can’t add any more edges, you have a minimum spanning tree. Set systems with this property have been so much fun for people to study that they have an intimidating name: matroids. But don’t be intimidated; you can go a long way in matroids by doing a search-and-replace.
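The greedy rule above is exactly Kruskal’s algorithm. Here is a short self-contained sketch of it, with cycle detection done by a union-find structure (the toy four-node network at the bottom is a made-up example, not from the post):

```python
def kruskal(n, edges):
    """Greedy minimum spanning tree: consider edges cheapest-first and
    keep each one that does not create a cycle, tracked via union-find."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    tree, cost = [], 0
    for w, u, v in sorted(edges):          # cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                       # adding (u, v) makes no cycle
            parent[ru] = rv
            tree.append((u, v))
            cost += w
    return tree, cost

# Toy network: edges are (cost, u, v) on 4 nodes
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
tree, cost = kruskal(4, edges)
# tree == [(0, 1), (1, 2), (2, 3)], cost == 6
```

The greedy choice never needs to be undone, which is precisely the matroid property the post alludes to.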
My new job is in a den of Bayesians! This sort of philosophical trouble is something I avoided for years when I worked on random graphs. In combinatorial probability, I just said “assume the axioms of probability” and got to look for all the interesting facts that follow logically. People want these probability calculations to say something about the “real world”? That’s not my thing; it’s up to them to go from math to science. Well, now it is my problem.