Tag Archives: python

ACO in Python: Shortest Paths in NetworkX

I’m supposed to be doing the final edits on the journal version of this old paper of mine (jointly with Greg Sorkin and David Gamarnik), but I’ve thought of a way to procrastinate.

Instead of checking the proofs that the length of the shortest path in my weigthed width-2 strip is \frac{p^2(1+p)^2}{(3p^2+1)} n, I’ll make a quick blog post about verifying this claim numerically (in python with networkx). This also gives me a chance to try out the new networkx, which is currently version 1.0rc1. I think it has changed a bit since we last met.

from pylab import *
from networkx import *

G = Graph()
for u, v in grid_2d_graph(100, 2).edges():
    G.add_edge(u, v, weight=rand() < .5)

wt, p = bidirectional_dijkstra(G, (0,0), (99,1))

Continue reading

2 Comments

Filed under combinatorial optimization

Multilevel (hierarchical) modeling: what it can and cannot do in Python


I re-read a short paper of Andrew Gelman’s yesterday about multilevel modeling, and thought “That would make a nice example for PyMC”.  The paper is “Multilevel (hierarchical) modeling: what it can and cannot do, and R code for it is on his website.

To make things even easier for a casual blogger like myself, the example from the paper is extended in the “ARM book”, and Whit Armstrong has already implemented several variants from this book in PyMC. Continue reading

17 Comments

Filed under MCMC, statistics

Dense-Subset Break-the-Bank Challenge

I’m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I’ll like, and I want to mention it briefly before I fly.

Computational Complexity and Information Asymmetry in Financial Products is 27 pages of serious TCS, but it is so obviously applicable that people outside of our particular ivory tower, and even outside of academia entirely are blogging and twittering about it, and even reading it!

Freedom to Tinker has a nice summary of this paper, if you want to know what it’s about in a hurry.

Mike Trick makes the salient observation that NP-hard doesn’t mean computers can’t do it. But the assumption that this paper is based on is not about worst-case complexity; it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.

As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem. My plan is to repeat the problem here, and share some Python code for generating instances of it. Then, you, me, and everyone, can have a handy instance to try optimizing. I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than for cracking the P versus NP nut. Continue reading

4 Comments

Filed under combinatorial optimization, cryptography, TCS

MCMC in Python: PyMC for Bayesian Model Selection

(Updated 9/2/2009, but still unfinished; see other’s work on this that I’ve collected)

I never took a statistics class, so I only know the kind of statistics you learn on the street. But now that I’m in global health research, I’ve been doing a lot of on-the-job learning. This post is about something I’ve been reading about recently, how to decide if a simple statistical model is sufficient or if the data demands a more complicated one. To keep the matter concrete (and controversial) I’ll focus on a claim from a recent paper in Nature that my colleague, Haidong Wang, choose for our IHME journal club last week: Advances in development reverse fertility declines. The title of this short letter boldly claims a causal link between total fertility rate (an instantaneous measure of how many babies a population is making) and the human development index (a composite measure of how “developed” a country is, on a scale of 0 to 1). Exhibit A in their case is the following figure:

An astute observer of this chart might ask, “what’s up with the scales on those axes?” But this post is not about the visual display of quantitative information. It is about deciding if the data has a piecewise linear relationship that Myrskyla et al claim, and doing it in a Bayesian framework with Python and PyMC. But let’s start with a figure where the axes have a familiar linear scale! Continue reading

30 Comments

Filed under MCMC, statistics

August is Too-Many-Projects Month

(Tap… tap… tap… is this thing on? Good.)

July was vacation month, where I went on a glorious bike tour of the Oregon/California coast, and learned definitively that I don’t like biking on the side of a highway all day. Don’t worry, I escaped in Coos Bay and took trains and buses between Eugene, Santa Cruz, Berkeley, and SF for a vacation more my speed.

But now that I’m back, August is turning out to be project month. I have 3 great TCS applications to global health in the pipeline, and I have big plans to tell you about them soon. But one mixed blessing about these applications is that people actually want to see the results, like, yesterday! So first I have to deal with the results, and then I can write papers and blogs about the techniques.

Since Project Month is a little over-booked with projects, I’m going to have to triage one today. You’ve heard of the NetFlix Challenge, right? Well, github.com is running a smaller scale recommendation contest, and I was messing around with personal page rank, which seems like a fine approach for recommending code repositories to hackers. I haven’t got it working very well (best results, 15% of holdout set recovered), but I was having fun with it. Maybe someone else will take it up, let me know if you get it to work; networkx + data = good times.

    f = open('download/data.txt')
    for l in f:
        u_id, r_id = l.strip().split(':')
        G.add_edge(user(u_id), repo(r_id))

[get the code]

2 Comments

Filed under combinatorial optimization, software engineering, TCS

Anatomy of a Django-driven Data Server

I haven’t had time to write anything this week because I am up to my neck in this Seven-Samurai-style software engineering project. You know, where a bunch of untrained villagers (that’s me) need to defend themselves against marauding bandits (that’s the Global Burden of Disease 2005 Study), so they have to learn everything about being a samurai (that’s writing an actual application that people other than this one villager can use) as quickly as possible.

I guess this analogy is stretching so thin that you could chop it with Toshirō Mifune’s wooden sword. But, if anyone knows how a mild-mannered theoretical computer scientist can get a web-app built in two weeks, holler. If you prefer to explain in terms of wild-west gunslingers, that is fine.

Here’s my game plan so far: I’m going to make the lightest of light-weight Python/Django apps to hold all the Global Disease Data, and then try to get my epidemologist doctors to interact with it on the command-line via an interactive python session.

The rest of this post is basically a repeat of the Django tutorial, but specialized for building a data server for global population data. As far as interesting theoretical math stuff, hidden somewhere towards the end, I’ll do some interpolation with PyMC’s Gaussian Processes using the exotic (to me) Matérn covariance function. Continue reading

2 Comments

Filed under global health, software engineering

ACO in Python: Minimum Weight Perfect Matchings (a.k.a. Matching Algorithms and Reproductive Health: Part 4)

This is the final item in my series on Matching Algorithms and Reproductive Health, and it brings the story full circle, returning to the algorithms side of the show. Today I’ll demonstrate how to actually find minimum-weight perfect matchings in Python, and toss in a little story about \pi^2/6. Continue reading

4 Comments

Filed under combinatorial optimization

Items of Interest

MIT faculty makes scholarly articles freely and openly available to the entire world.

Google Summer of Code returns, and suggested Python projects. (A nice way for students to spend the summer, especially during an “economic downturn”).

And for those of you that are looking for NSF grants to apply to: Foundations of Data and Visual Analytics.

Comments Off on Items of Interest

Filed under science policy

ACO in Python: PADS for Minimum Spanning Trees

Sometimes, instead of working, I like to see what search terms are bringing readers to my blog. The most common search that healthyalgorithms has been most useless for is “minimum spanning tree python”. Today, I’ll remedy that.

But first, dear searchers, consider this: why are you searching for minimum spanning tree code in python? Is it because you have a programming assignment due soon? High-school CS class is voluntary. All college is optional, and many you are paying to attend. You know what I’m talking about? Perhaps the short motivational comic Time Management for Anarchists is better than some Python code.

Still want to know how to do it? Ok, but I warned you.

Continue reading

10 Comments

Filed under combinatorial optimization

Hurrah for Free/Open Software like PyMC

A few posts ago, when I told you how amazingly simple it turned out to be to sample independent sets with PyMC.  Remember when I said that it was working a little differently than I expected, though?  I sent an email to the pymc-users mailing list, and, in just a few days, one of the developers, Anand Patil, replied to say that there was a little typo in their code which was making the chain reject with the probability it was supposed to accept with.  (I’m realizing that it is hard to make a story about debugging python code sound exciting, so let’s skip the build up and cut to the thrilling conclusion.)  Anand fixed the bug, which required changing one word, but also required finding that one word in the right 1200-line file.

Some of the folks I corresponded with from the PyMC list didn’t know what I was talking about with this sampling independent sets stuff, so I thought I’d expand a little bit on it now, as a attempt at gratitude.

Continue reading

Comments Off on Hurrah for Free/Open Software like PyMC

Filed under MCMC