A couple of weeks ago, I mentioned the exciting experiment in online math collaboration, where Tim Gowers invited the world to set out and develop a combinatorial proof of the density Hales-Jewitt theorem (DHJ). Big congratulations to them, because the problem is solved, probably. Summarizing why he spent his time on this particular problem, Terry Tao wrote:
I guess DHJ is known to experts in the field to be an interesting question, partly because it implies a number of other deep theorems (e.g. Szemeredi’s theorem, which was for instance a key tool in my result with Ben that the primes contain arbitrarily long arithmetic progressions), but also because it (until very recently) was one of the most prominent density Ramsey theorems that could only be proven by ergodic theoretic techniques. I myself am a big believer in exploiting more systematically the connections between ergodic theory, combinatorics, and Fourier analysis, and so this project was certainly very appealing to me. Besides, historically every new proof of Szemeredi’s theorem has led to a substantial amount of progress and activity in at least one subfield of mathematics; now that we have yet another proof (the fifth genuinely new proof of Szemeredi, by my count), one can hope that the tools developed here will have some applicability elsewhere.
Now, are there any applications of DHJ or Ramsey theory to Health Metrics? I wouldn’t say they are leaping out at me, but I wouldn’t rule it out either. When noisy data has unavoidable structure, some of the noise could be removed.
Sometimes, instead of working, I like to see what search terms are bringing readers to my blog. The most common search that healthyalgorithms has been most useless for is “minimum spanning tree python”. Today, I’ll remedy that.
But first, dear searchers, consider this: why are you searching for minimum spanning tree code in python? Is it because you have a programming assignment due soon? High-school CS class is voluntary. All college is optional, and many you are paying to attend. You know what I’m talking about? Perhaps the short motivational comic Time Management for Anarchists is better than some Python code.
Still want to know how to do it? Ok, but I warned you.
A few posts ago, when I told you how amazingly simple it turned out to be to sample independent sets with PyMC. Remember when I said that it was working a little differently than I expected, though? I sent an email to the pymc-users mailing list, and, in just a few days, one of the developers, Anand Patil, replied to say that there was a little typo in their code which was making the chain reject with the probability it was supposed to accept with. (I’m realizing that it is hard to make a story about debugging python code sound exciting, so let’s skip the build up and cut to the thrilling conclusion.) Anand fixed the bug, which required changing one word, but also required finding that one word in the right 1200-line file.
Some of the folks I corresponded with from the PyMC list didn’t know what I was talking about with this sampling independent sets stuff, so I thought I’d expand a little bit on it now, as a attempt at gratitude.