Learning to Rank

A lovely stats paper appeared on the arxiv recently. Learning to rank with combinatorial Hodge theory, by Jiang, Lim, Yao, and Ye.

I admit it, the title is more than a little scary. But it may be the case that no more readable paper has so intimidating a title, and no more intimidatingly titled a paper is more readable.

Continue reading

Comments Off on Learning to Rank

Filed under combinatorial optimization, probability

Grad Students: NSF Funding for Research Abroad

NSF recently began accepting applications for their annual EAPSI program (due date: Dec. 9). The “East Asia and Pacific Summer Institutes” are an opportunity for science and tech grad students who are U.S. citizens or permanent residents to do some research in an Asian or Pacific country of their choice.

This could be your view

Continue reading

Comments Off on Grad Students: NSF Funding for Research Abroad

Filed under science policy

MCMC in Python: PyMC to sample uniformly from a convex body

This post is a little tutorial on how to use PyMC to sample points uniformly at random from a convex body.  This computational challenge says: if you have a magic box which will tell you yes/no when you ask, “Is this point (in n-dimensions) in the convex set S”, can you come up with a random point which is nearly uniformly distributed over S?

MCMC has been the main approach to solving this problem, and it has been a great success for the polynomial-time dogma, starting with the work of Dyer, Frieze, and Kannan which established the running-time upper bound of \mathcal{O}\left(n^{23}(\log n)^5\right).  The idea is this: you start with some point in S and try to move to a new, nearby point randomly.  “Randomly how?”, you wonder.  That is the art. Continue reading

9 Comments

Filed under MCMC, probability

Health metrics a gold-mine for applications of theory

Is there a metaphor which doesn’t glorify mineral extraction?

Gold Mine Hadal Awatib

Gold Mine Hadal Awatib

1 Comment

Filed under general

Minimum Spanning Trees of Bounded Depth (Random)

I’ve got a new paper up on the arxiv.  David Wilson recently posted this joint work that was one of the last things I did during my post-doc at Microsoft.  It hasn’t been applied to health metrics yet, but maybe it will be.  Let me tell you the story:

A spanning tree is just what it sounds like, if you know that a tree is a graph with no cycles, and make a good guess that the spanning tree is a subgraph that is a tree “spans” all the vertices in the graph.  Minimum cost spanning trees come up in some very practical optimization problems, like if you want to build a electricity network without wasting wires.  It was in this exact context, designing an electricity network for Moravia, that the first algorithm for finding a minimum spanning tree was developed.

Public Domain image of a Minimum Spanning Tree, from Wikipedia

A Minimum Spanning Tree, from Wikipedia

The great thing about looking for a minimum spanning tree is that you don’t have to be sneaky to find it.  If you repeatedly find the cheapest edge which does not create a cycle, and add that to your subgraph, then this greedy approach never goes wrong.  When you can’t add any more edges, you have a minimum spanning tree.  Set systems with this property have been so fun for people to study that they have an intimidating name, matroids.  But don’t be intimidated, you can go a long way in matroids by doing a search-and-replace s/matroid/spanning tree/.

Continue reading

3 Comments

Filed under combinatorial optimization, probability

Mortality research got you down?

It’s important not to get depressed when you measure the burden of disease all day.  Animal videos help.

Comments Off on Mortality research got you down?

Filed under global health

Applied Privacy

A multitude of events in the last week or so have made me want to blog about (and learn more about) the cryptographic theory of privacy. Journalist James Bamford’s new book about the NSA came out, the third in his trilogy. Bamford described his findings on Democracy Now last Tuesday, including how government contractors were hired to eavesdrop on US soldiers in Iraq:

Not only were they eavesdropping on a lot of these conversations, some of which were very intimate, but they would have sort of locker room chats about what they were hearing, and they would post—or they would notify their co-workers that you should listen to this, what they call “cut,” their conversations. You should listen to this conversation or that conversation. They’d laugh about it.

Also last week, (or maybe two weeks ago) the National Academy Press published a new report called Protecting Individual Privacy in the Struggle Against Terrorists. The report’s primary recommendation, that “Programs Should be Evaluated for Effectiveness, Privacy”, is not too revolutionary, but the report contains some interesting summaries of technology and public opinion.

And kicking off this season of privacy discussion, there were demonstrations across the EU on Oct 11 in a world-wide protest against surveillance entitled Freedom not fear.

Or, almost kicking it off… just a few weeks before this tsunami of privacy, Adam Smith posted an interesting sounding paper to the arxiv, Efficient, Differentially Private Point Estimators. This sort of cryptographic approach to privacy is where I’m going with this post. But let me first mention why I’m going there.

Continue reading

2 Comments

Filed under cryptography, global health

Through the algorithmic lens, doctors = noise machines

A few years ago, some concerned citizens of TCS, like Sanjeev Arora and Bernard Chazelle, came up with this idea to promote the applications of algorithmic ideas more widely.  Chazelle’s essay The Algorithm: Idiom of Modern Science is an example of this (with lots of nice pictures).  The name for this world view seems to be “The Algorithmic Lens”.

I like the way that sounds, but it makes me imagine how kids will scorch ants with a magnifying glass.  Maybe that is not the best mental imagery to frame interdisciplinary research.

This post is about an application of algorithmic thinking in health metrics. I hope I don’t burn the public health docs with my highly focused beam of algorithms. (That reminds me, if you can’t understand what I’m talking about, and want to, you can leave a comment with questions. I’ll answer. It’s likely that no one else understands what I’m talking about either.)

Continue reading

2 Comments

Filed under global health

Science Funding 2009 (not good news)

There is this paradox: federal budgets, and, particularly, what is allocated for science, is something so important day-to-day for researchers, yet reading about budgets is so boring that I can hardly bring myself to do it.

It is important, though, so we should try. The folks at ScienceNOW have done a nice summary of the effects of the “continuing resolution” which congress passed last weekend and Bush signed on Tuesday. What this means in dollars is that most all budget items stay the same as last year, except that there is also inflation, so, in real dollars the amount spent on all science decreases.

“I think the next Administration will be very leery of more spending given the current state of the economy,” speculates Samuel Rankin III, a lobbyist for the American Mathematical Society and head of the Coalition for National Science Funding.

For some science agencies, the CR actually puts them below the amounts spent this year. That’s because the legislators excluded the $400 million divvied up among NSF, DOE, NASA, and the National Institutes of Health (NIH) under a supplemental 2008 spending bill passed in June

Let’s not be all doom and gloom, though; Michael Mitzenmacher reports that enrollment in beginning CS at Harvard is up, up, up.

2 years ago — 132
1 year ago — 282
this year — 341

Comments Off on Science Funding 2009 (not good news)

Filed under science policy

Hurrah for Free/Open Software like PyMC

A few posts ago, when I told you how amazingly simple it turned out to be to sample independent sets with PyMC.  Remember when I said that it was working a little differently than I expected, though?  I sent an email to the pymc-users mailing list, and, in just a few days, one of the developers, Anand Patil, replied to say that there was a little typo in their code which was making the chain reject with the probability it was supposed to accept with.  (I’m realizing that it is hard to make a story about debugging python code sound exciting, so let’s skip the build up and cut to the thrilling conclusion.)  Anand fixed the bug, which required changing one word, but also required finding that one word in the right 1200-line file.

Some of the folks I corresponded with from the PyMC list didn’t know what I was talking about with this sampling independent sets stuff, so I thought I’d expand a little bit on it now, as a attempt at gratitude.

Continue reading

Comments Off on Hurrah for Free/Open Software like PyMC

Filed under MCMC