I was looking for a distraction earlier this week, which led me to the world of Stack Exchange sites. Stack Overflow has been on my radar for a while now, because web searches for coding questions often lead there and the answers are often good. And I knew that math overflow, tcs overflow, and even stats overflow existed, but I’d never really explored these things.
Well, diversion found! I got enamored with an MCMC question on the tcs site, about how to find random bounded-depth spanning trees. Bounded-depth spanning trees are something that I worked on with David Wilson and Riccardo Zecchina in the waning days of my MSR post-doc, and we came up with some nice results, but the theoretical ones are just theory, and the practical ones are based on message passing algorithms that still seem magical to me, even after hours of patient explanation from my collaborators.
So let’s do it in PyMC… this amounts to an exercise in writing custom step methods, something that’s been on my mind for a while anyways. And, as a bonus, I got to make an animation of the chain in action which I find incredibly soothing to watch on repeat:
Sometimes, instead of working, I like to see what search terms are bringing readers to my blog. The most common search for which healthyalgorithms has been useless is “minimum spanning tree python”. Today, I’ll remedy that.
But first, dear searchers, consider this: why are you searching for minimum spanning tree code in python? Is it because you have a programming assignment due soon? High-school CS class is voluntary. All college is optional, and many of you are paying to attend. You know what I’m talking about? Perhaps the short motivational comic Time Management for Anarchists is better than some Python code.
Still want to know how to do it? Ok, but I warned you.
I’ve got a new paper up on the arXiv. David Wilson recently posted this joint work, which was one of the last things I did during my post-doc at Microsoft. It hasn’t been applied to health metrics yet, but maybe it will be. Let me tell you the story:
A spanning tree is just what it sounds like, if you know that a tree is a graph with no cycles, and make a good guess that a spanning tree is a subgraph that is a tree and “spans” all the vertices in the graph. Minimum cost spanning trees come up in some very practical optimization problems, like if you want to build an electricity network without wasting wires. It was in this exact context, designing an electricity network for Moravia, that the first algorithm for finding a minimum spanning tree was developed.
A Minimum Spanning Tree, from Wikipedia
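To make the definition concrete, here is a minimal sketch in plain Python that checks whether a set of edges is a spanning tree of a given vertex set. The function name `is_spanning_tree` and the edge-list representation are my own choices for illustration, not something from a library. It relies on the fact that a cycle-free set of exactly n-1 edges on n vertices must connect all of them.

```python
def is_spanning_tree(vertices, tree_edges):
    """Return True if tree_edges has no cycles and spans all of vertices.

    vertices: iterable of hashable vertex labels (hypothetical input format)
    tree_edges: list of (u, v) pairs
    """
    # Union-find: each vertex starts out as its own component.
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the component root, halving paths as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in tree_edges:
        if u not in parent or v not in parent:
            return False  # edge uses a vertex that is not in the graph
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v were already connected, so this edge closes a cycle
        parent[root_u] = root_v  # merge the two components

    # No cycles and exactly n-1 edges means everything is in one component.
    return len(tree_edges) == len(parent) - 1


if __name__ == "__main__":
    square = "abcd"
    print(is_spanning_tree(square, [("a", "b"), ("b", "c"), ("c", "d")]))
    print(is_spanning_tree(square, [("a", "b"), ("b", "c"), ("c", "a")]))
```

The first call is a path through all four vertices, so it passes; the second contains a triangle and misses `d`, so it fails. Picking the cheapest such tree is a separate question, and this check says nothing about edge costs.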
The great thing about looking for a minimum spanning tree is that you don’t have to be sneaky to find it. If you repeatedly find the cheapest edge which does not create a cycle, and add that to your subgraph, then this greedy approach never goes wrong. When you can’t add any more edges, you have a minimum spanning tree. Set systems with this property have been so fun for people to study that they have an intimidating name, matroids. But don’t be intimidated, you can go a long way in matroids by doing a search-and-replace