I’m updating my CV, and that reminded me that I meant to promote this cool clustering technique that I was a little bit involved in, Clustering With Shallow Trees.
This goes way back to about half-way through my post-doc at MSR, when statistical physicist Riccardo Zecchina was visiting for a semester, and was teaching me about all of the “intractable” optimization problems that he can solve using his panoply of propagation algorithms. In particular, he was working on algorithms for certain types of steiner tree optimization, and he had discovered that adding an extra constraint on the depth of the tree didn’t make the problem harder. (All variants of the problem he considers are NP-hard, but some are NP-harder than others.) Continue reading
I’m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I’ll like, and I want to mention it briefly before I fly.
Computational Complexity and Information Asymmetry in Financial Products is 27 pages of serious TCS, but it is so obviously applicable that people outside of our particular ivory tower, and even outside of academia entirely are blogging and twittering about it, and even reading it!
Freedom to Tinker has a nice summary of this paper, if you want to know what it’s about in a hurry.
Mike Trick makes the salient observation that NP-hard doesn’t mean computers can’t do it. But the assumption that this paper is based on is not about worst-case complexity; it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.
As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem. My plan is to repeat the problem here, and share some Python code for generating instances of it. Then, you, me, and everyone, can have a handy instance to try optimizing. I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than for cracking the P versus NP nut. Continue reading
(Tap… tap… tap… is this thing on? Good.)
July was vacation month, where I went on a glorious bike tour of the Oregon/California coast, and learned definitively that I don’t like biking on the side of a highway all day. Don’t worry, I escaped in Coos Bay and took trains and buses between Eugene, Santa Cruz, Berkeley, and SF for a vacation more my speed.
But now that I’m back, August is turning out to be project month. I have 3 great TCS applications to global health in the pipeline, and I have big plans to tell you about them soon. But one mixed blessing about these applications is that people actually want to see the results, like, yesterday! So first I have to deal with the results, and then I can write papers and blogs about the techniques.
Since Project Month is a little over-booked with projects, I’m going to have to triage one today. You’ve heard of the NetFlix Challenge, right? Well, github.com is running a smaller scale recommendation contest, and I was messing around with personal page rank, which seems like a fine approach for recommending code repositories to hackers. I haven’t got it working very well (best results, 15% of holdout set recovered), but I was having fun with it. Maybe someone else will take it up, let me know if you get it to work; networkx + data = good times.
f = open('download/data.txt')
for l in f:
u_id, r_id = l.strip().split(':')
[get the code]
This is the final item in my series on Matching Algorithms and Reproductive Health, and it brings the story full circle, returning to the algorithms side of the show. Today I’ll demonstrate how to actually find minimum-weight perfect matchings in Python, and toss in a little story about . Continue reading
It’s been three weeks and one IHME retreat since I wrote about matching algorithms and virginity pledges, and I think I now understand what’s going on in Patient Teenagers well enough to describe it. I’ll try to give a stylized example of how the minimum-weight perfect matching algorithm makes itself useful in reproductive health research.
I think it’s helpful to focus on a concrete research question about the virginity pledge and its effects on reproductive health. Here’s one: “does taking the pledge reduce the chances that an individual contracts trichomoniasis?” If the answer is yes, or if the answer is no, people can still argue about the value of the virginity pledge programs, but this seems like relevant information for decision making. Continue reading
I might have been a little over-ambitious with this series. I wrote a little bit about the how matching theory emerged from the social sciences two weeks ago. But then I got really busy! And that was the part I actually knew something about ahead of time. The promised connection between matching algorithms and reproductive health (and more generally, how matching is being used in quasi-experiment design) is the part that I have to do some reading on before I can write knowledgeably about.
However, I have a plan: I’d like to “crowd-source” my library research. Continue reading
Earlier this week, I was inspired by current events to launch a bold, crazy-sounding series about matching theory and its application to reproductive health. This first installment is a quick social history of the development of matching theory, largely influenced by (and fact-checked against) Lex Scrijver’s encyclopediac Combinatorial Optimization: Polyhedra and Efficiency. His paper “On the history of combinatorial optimization (till 1960)” contains similar information in an easy-to-download form.
On to the story: how social science applications drove the development of matching theory. Continue reading