I learned about an interesting book today, The Architecture of Open Source Applications, edited by Amy Brown and Greg Wilson. The introduction caught my attention:
Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.
This book’s goal is to change that. In it, the authors of twenty-five open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to this book provide unique insights into how they think.
If you are a junior developer, and want to learn how your more experienced colleagues think, this book is the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, this book can help you too.
There are chapters on several software packages that I’ve enjoyed using, and chapters on several scientific/data analysis tools, but nothing on the tools I’m using day to day. Still interesting. Let me know if there is a great chapter you come across, since I’m going to be too busy to read the whole thing in the near future.
To take my mind off my meetings, I spent a little time modifying the Spatial Preferred Attachment model from Aiello, Bonato, Cooper, Janssen, and Prałat’s paper A Spatial Web Graph Model with Local Influence Regions so that it changes over time.
I’m supposed to be doing the final edits on the journal version of this old paper of mine (jointly with Greg Sorkin and David Gamarnik), but I’ve thought of a way to procrastinate.
Instead of checking the proofs that the length of the shortest path in my weighted width-2 strip is , I’ll make a quick blog post about verifying this claim numerically (in Python with NetworkX). This also gives me a chance to try out the new NetworkX, which is currently version 1.0rc1. I think it has changed a bit since we last met.
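from pylab import *
from networkx import *

# Copy the edges of a 100 x 2 grid into a new graph with random 0/1 weights
G = Graph()
for u, v in grid_2d_graph(100, 2).edges():
    # rand() < .5 is True (weight 1) or False (weight 0) with equal probability
    G.add_edge(u, v, weight=rand() < .5)

# wt is the weight of the cheapest corner-to-corner path, p is the path itself
wt, p = bidirectional_dijkstra(G, (0,0), (99,1))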
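A single random strip says little about how the path length grows, so one natural next step is to average over several random strips and a few values of n. The following is only a sketch of that idea: the helper names `strip_shortest_path` and `mean_shortest_path`, the number of trials, and the values of n are all illustrative assumptions, and it uses a recent NetworkX with Python's `random` module rather than pylab.

```python
import random

import networkx as nx


def strip_shortest_path(n, rng):
    """Weight of the cheapest corner-to-corner path in an n x 2 grid
    whose edges get weight 0 or 1 with probability 1/2 each."""
    G = nx.Graph()
    for u, v in nx.grid_2d_graph(n, 2).edges():
        G.add_edge(u, v, weight=int(rng.random() < 0.5))
    wt, path = nx.bidirectional_dijkstra(G, (0, 0), (n - 1, 1))
    return wt


def mean_shortest_path(n, trials=20, seed=0):
    """Average the shortest-path weight over several independent random strips."""
    rng = random.Random(seed)
    return sum(strip_shortest_path(n, rng) for _ in range(trials)) / trials


# Print the average path weight per unit of strip length for a few sizes
for n in [50, 100, 200]:
    print(n, mean_shortest_path(n) / n)
```

If the ratio in the second column settles down as n grows, that is at least consistent with path length growing linearly in n. It is a sanity check, not a proof.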
I haven’t had time to write anything this week because I am up to my neck in this Seven-Samurai-style software engineering project. You know, where a bunch of untrained villagers (that’s me) need to defend themselves against marauding bandits (that’s the Global Burden of Disease 2005 Study), so they have to learn everything about being a samurai (that’s writing an actual application that people other than this one villager can use) as quickly as possible.
I guess this analogy is stretching so thin that you could chop it with Toshirō Mifune’s wooden sword. But, if anyone knows how a mild-mannered theoretical computer scientist can get a web-app built in two weeks, holler. If you prefer to explain in terms of wild-west gunslingers, that is fine.
Here’s my game plan so far: I’m going to make the lightest of light-weight Python/Django apps to hold all the Global Disease Data, and then try to get my epidemiologist doctors to interact with it on the command line via an interactive Python session.
The rest of this post is basically a repeat of the Django tutorial, but specialized for building a data server for global population data. As far as interesting theoretical math stuff, hidden somewhere towards the end, I’ll do some interpolation with PyMC’s Gaussian Processes using the exotic (to me) Matérn covariance function.
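For readers who have not met the Matérn covariance before, here is a small sketch of Gaussian process interpolation with it. This is not the PyMC code from the full post. It is a plain NumPy illustration that assumes the smoothness parameter is 3/2, where the Matérn function has a simple closed form. The function names `matern32` and `gp_interpolate`, the parameter values, and the toy data points are all made up for illustration.

```python
import numpy as np


def matern32(x1, x2, amp=1.0, scale=1.0):
    """Matern covariance with smoothness nu = 3/2 between two 1-d arrays of points."""
    r = np.abs(np.subtract.outer(x1, x2)) / scale
    return amp**2 * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def gp_interpolate(x_obs, y_obs, x_new, amp=1.0, scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean Gaussian process at x_new given (x_obs, y_obs)."""
    K = matern32(x_obs, x_obs, amp, scale) + noise * np.eye(len(x_obs))
    K_new = matern32(x_new, x_obs, amp, scale)
    return K_new @ np.linalg.solve(K, y_obs)


# Toy, made-up observations (not real population data)
x_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.array([1.0, 2.0, 1.5, 3.0])

# Interpolate on a finer grid; subtract and re-add the mean since the GP is zero-mean
x_new = np.linspace(0.0, 3.0, 13)
y_new = y_obs.mean() + gp_interpolate(x_obs, y_obs - y_obs.mean(), x_new, scale=2.0)
print(np.round(y_new, 3))
```

The small `noise` term keeps the linear solve stable. With it set this low, the interpolated curve passes almost exactly through the observed points.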
Since 1995, presidential decree has designated the first full week of April to be National Public Health Week in the United States. The American Public Health Association is kicking things off with an online “viral video” campaign. Public health has much more experience trying to stop the spread of viruses, so this campaign has some underdog appeal. It’s also got nice motion graphics, but definitely not my first choice for inspirational music.
(Hey, this soundtrack would be so easy to remix, if only it had an appropriate Creative Commons license. APHA could probably get a bit of notice from folks who wouldn’t otherwise see a public health video by changing the license today and sending CC and friends a nice press release. Hint hint.)
This is the final item in my series on Matching Algorithms and Reproductive Health, and it brings the story full circle, returning to the algorithms side of the show. Today I’ll demonstrate how to actually find minimum-weight perfect matchings in Python, and toss in a little story about .
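As a taste of what finding a minimum-weight perfect matching can look like, here is a minimal sketch, which is not necessarily the approach taken in the full post. It assumes NetworkX 2.1 or later, where `max_weight_matching` returns a set of edges, and it turns the minimization into a maximization by flipping the weights. The helper name `min_weight_perfect_matching` and the toy graph are illustrative.

```python
import networkx as nx


def min_weight_perfect_matching(G):
    """Return a minimum-weight perfect matching of G as a set of edges.

    Assumes G has a perfect matching and every edge has a numeric 'weight'.
    """
    # Flip the weights so that cheap edges become valuable ones
    big = 1 + max(w for u, v, w in G.edges(data="weight"))
    H = nx.Graph()
    for u, v, w in G.edges(data="weight"):
        H.add_edge(u, v, weight=big - w)
    # Among maximum-cardinality (here: perfect) matchings, take the heaviest in H
    return nx.max_weight_matching(H, maxcardinality=True)


# Toy complete graph on four nodes with made-up weights
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 1), ("c", "d", 1),
    ("a", "c", 5), ("b", "d", 5),
    ("a", "d", 2), ("b", "c", 2),
])
M = min_weight_perfect_matching(G)
total = sum(G[u][v]["weight"] for u, v in M)
print(sorted(tuple(sorted(e)) for e in M), total)
```

Every perfect matching has the same number of edges, so maximizing the flipped weights over perfect matchings is the same as minimizing the original ones.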
MIT faculty makes scholarly articles freely and openly available to the entire world.
Google Summer of Code returns, and suggested Python projects. (A nice way for students to spend the summer, especially during an “economic downturn”).
And for those of you who are looking for NSF grants to apply to: Foundations of Data and Visual Analytics.
A couple of weeks ago, I mentioned the exciting experiment in online math collaboration, where Tim Gowers invited the world to set out and develop a combinatorial proof of the density Hales-Jewett theorem (DHJ). Big congratulations to them, because the problem is solved, probably. Summarizing why he spent his time on this particular problem, Terry Tao wrote:
I guess DHJ is known to experts in the field to be an interesting question, partly because it implies a number of other deep theorems (e.g. Szemeredi’s theorem, which was for instance a key tool in my result with Ben that the primes contain arbitrarily long arithmetic progressions), but also because it (until very recently) was one of the most prominent density Ramsey theorems that could only be proven by ergodic theoretic techniques. I myself am a big believer in exploiting more systematically the connections between ergodic theory, combinatorics, and Fourier analysis, and so this project was certainly very appealing to me. Besides, historically every new proof of Szemeredi’s theorem has led to a substantial amount of progress and activity in at least one subfield of mathematics; now that we have yet another proof (the fifth genuinely new proof of Szemeredi, by my count), one can hope that the tools developed here will have some applicability elsewhere.
Now, are there any applications of DHJ or Ramsey theory to Health Metrics? I wouldn’t say they are leaping out at me, but I wouldn’t rule it out either. When noisy data has unavoidable structure, some of the noise could be removed.
Sometimes, instead of working, I like to see what search terms are bringing readers to my blog. The most common search for which healthyalgorithms has been useless is “minimum spanning tree python”. Today, I’ll remedy that.
But first, dear searchers, consider this: why are you searching for minimum spanning tree code in Python? Is it because you have a programming assignment due soon? High-school CS class is voluntary. All college is optional, and many of you are paying to attend. You know what I’m talking about? Perhaps the short motivational comic Time Management for Anarchists is better than some Python code.
Still want to know how to do it? Ok, but I warned you.
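Here is one way it can be done: a minimal sketch of Kruskal's algorithm in plain Python, not necessarily the code from the full post. It assumes the nodes are numbered 0 to n-1 and the edges arrive as (weight, u, v) tuples. The function name and the toy graph are illustrative.

```python
def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm. Nodes are 0..n-1; edges are (weight, u, v) tuples."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two different components, so it cannot close a cycle
            parent[root_u] = root_v
            tree.append((w, u, v))
    return tree


# Toy graph with made-up weights
edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (7, 1, 3), (3, 2, 3)]
mst = minimum_spanning_tree(4, edges)
print(mst, sum(w for w, u, v in mst))
```

If the graph is already a NetworkX graph, `networkx.minimum_spanning_tree(G)` does the same job in one call.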
A few posts ago, I told you how amazingly simple it turned out to be to sample independent sets with PyMC. Remember when I said that it was working a little differently than I expected, though? I sent an email to the pymc-users mailing list, and, in just a few days, one of the developers, Anand Patil, replied to say that there was a little typo in their code which was making the chain reject with the probability it was supposed to accept with. (I’m realizing that it is hard to make a story about debugging Python code sound exciting, so let’s skip the build up and cut to the thrilling conclusion.) Anand fixed the bug, which required changing one word, but also required finding that one word in the right 1200-line file.
Some of the folks I corresponded with from the PyMC list didn’t know what I was talking about with this sampling independent sets stuff, so I thought I’d expand a little bit on it now, as an attempt at gratitude.
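To make the idea concrete without relying on PyMC internals, here is a plain-Python sketch of a simple Markov chain whose stationary distribution is uniform over the independent sets of a graph. It is not the PyMC model from the earlier post. The function name, the adjacency-dictionary input format, and the step count are illustrative assumptions.

```python
import random


def sample_independent_set(adj, steps=10000, seed=None):
    """Run a lazy add/remove chain on the independent sets of a graph.

    adj maps each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    current = set()  # the empty set is always independent
    for _ in range(steps):
        if rng.random() < 0.5:
            continue  # lazy step: stay put, which keeps the chain aperiodic
        v = rng.choice(nodes)
        if v in current:
            current.remove(v)
        elif not any(u in current for u in adj[v]):
            current.add(v)  # only add v if none of its neighbours is in the set
    return current


# Toy example: a cycle on four nodes
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
S = sample_independent_set(adj, steps=1000, seed=1)
print(S)
```

Adding a node and removing it again are proposed with the same probability, so the chain is symmetric and its stationary distribution is uniform. How many steps are enough for a given graph is a separate and much harder question.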