Applied Approximate Counting: Malaria

My first first-authored global health paper came out today (I consider it my first “first-authored” paper ever, since the mathematicians I’ve worked with deviantly list authorship in alphabetical order regardless of seniority and contribution). It’s a bit of a mouthful by title: Rapid Scaling Up of Insecticide-Treated Bed Net Coverage in Africa and Its Relationship with Development Assistance for Health: A Systematic Synthesis of Supply, Distribution, and Household Survey Data.

What I find really pleasing about this research paper is the way it continues research I worked on in graduate school, but in a completely different and unexpected direction. Approximate counting is something that my advisor specialized in, and he won a big award for the random polynomial time algorithm for approximating the volume of convex bodies. I followed in his footsteps when I was a student, and I’m still doing approximate counting, it’s just that now, instead of approximating the amount of high-dimensional sand that will fit in an oddly shaped high-dimensional box, I’ve been approximating the number of insecticide-treated bednets that have made it from manufacturers through the distribution supply-chain and into the households of malaria-endemic regions of the world. I’m even using the same technique, Markov-chain Monte Carlo.

I’ve been itching to write about the computational details of this research for a while, and now that the paper’s out, I will have my chance. But for today, I refer you to the PLoS Med paper, and the technical appendix, and the PyMC code on github.

4 Comments

Filed under global health, MCMC

Coding tips for grad students, by grad students

Kyle writes from Sri Lanka with his stats programming tips for the new PBFs. It’s all things that old PBFs and even old young professors can benefit from:

• It’s taken me 2 years to jump on the version control bandwagon (~18 months after your PToW on git….), so I certainly can’t claim to be an exemplar myself. But I think the main themes would be:

• Location, location, location! Can you find your code? Can others find your code? Do both the directory and the filename make sense?

• Replicability – even of the mistakes. If you do something right, you want to be able to do it again.
o But often, even if it’s wrong, you want to do it again. Chris will say, let’s go back to the broken version from 2 months ago, I liked that better. So if you change your code, keep a record of the old parts (and maybe even why you ditched them).

• If others can’t look at your code and figure out quickly what each “chunk” of code does, it’s not well documented enough. If you can’t even tell within 30 seconds what a particular piece does (and you wrote it!), that’s a problem.

• On the other hand: Yes, a few PBFs were lit majors, but that doesn’t mean your code should be in novella format. Concise, readable code is often more understandable than a few sentences of explanation.

• Whitespace! Headers! Tab and Enter are your close and personal friends: “Without negative space how would we appreciate the positive in our art and in our lives?” – Dyan Law (some artist I’ve never heard of)

• Good exercises: Take someone else’s raw code, figure out what it does, and comment it. Read through a program you wrote and haven’t used in months – how long does it take you to figure out? Have someone else comment your own raw code; did they explicate things you left implied? did they misinterpret anything?

All good advice, and I often regret it when I don’t follow it. Anything else that should be on this list?

Comments Off on Coding tips for grad students, by grad students

Filed under software engineering

Aaronson’s non-bet of non-confidence in P≠NP

As you have undoubtedly heard by now, there is a paper that claims to prove P!=NP, and there is a serious effort to understand the proof.

It has been fun to watch the experts set to work on this, and it has brought a lot of attention to random k-SAT, a problem that was near and dear to me when I was a grad student. And I get to learn interesting things from them without having to struggle through Deolalikar’s opus myself.

One interesting thing is the way Scott Aaronson reacted, saying:

If Vinay Deolalikar is awarded the $1,000,000 Clay Millennium Prize for his proof of P≠NP, then I, Scott Aaronson, will personally supplement his prize by the amount of $200,000.

When I first read about Aaronson’s offer to add $200K to the prize money, reported 2nd hand in a roundup of what the #pnp blogs were saying, it came off like the young professor is really hoping to have people work on this thing. But once my trusty rss feeder fed me his post, I realized his offer is not about how profs at private universities have disposable income that public schools don’t provide. It’s his way of quantifing his confidence in the accuracy of the proof.

If Aaronson had framed this in terms of a bet, it would be a textbook example of his level of certainty that the proof will have a flaw (a textbook in decision theory, anyway). But offering the sum without any possibility of receiving a return in the alternative scenario breaks expected utility theory. How certain is Scott? It all depends on what amount of money means nothing to him.

1 Comment

Filed under probability, TCS

P, NP, and more

There was a lot of buzz this weekend about a paper claiming to resolve the “P vs NP” conjecture.  I’ve seen plenty of papers claiming to do this over the years, so I didn’t rush to track it down.  But as the tweeting continued, I decided to have a quick look for myself, if only to produce a crotchety blog post on the matter.

Part of the drama around this paper comes from the way it appeared.  Author Vinay Deolalikar‘s web page explains:

Manuscript sent on 6th August to several leading researchers in various areas. Confirmations began arriving 8th August early morning.  The preliminary version made it to the web without my knowledge.  I have made minor updates, here. Please note that the  final version of the paper is under preparation, and is to be posted here very shortly.

I’m a global health researcher now, so I’m not going to be the one who tries to verify this 104 page paper, but I would love to learn from a careful review that it is true.  The result seems to go further than separating P and NP, since it is a statement about a natural distribution of instances being hard (from p. 78):

If LFP were able to compute solutions to the d1RSB phase of random k-SAT, then the distribution of the entire space of solutions would have a substantially simpler parametrization than we know it does.

Ryan Williams thinks this is suspicious, and expressed his concern in a series of tweets:

  1. This P vs NP claim is getting tons of press. I am really doubtful that it works. Hard to convey my skepticism in a tweet, but here goes…
  2. The author tries to use the fact that for certain distributions of random k-SAT, the solution space has a “hard structure”. Two problems:
  3. (1) Polytime solvable problems (such as perfect matching on random graphs) can also have complicated solution distributions.

  4. (2) There is a randomized reduction from SAT to formulas with at most ONE satisfying assignment (Valiant-Vazirani). A simple solution space
  5. So either Valiant-Vazirani can’t be derandomized or RP=NP (seems very unlikely!) or the proof must break. That’s my intuition.

Time will tell. I find proof of the existence of hard-on-average distributions much more exciting than plain old P vs NP, and Deolalikar’s paper might have something for everybody.

1 Comment

Filed under TCS

UW Exchange email on an Android phone

This is a sort-of public service message, and I expect that it will at least be of service to my future self. Since I am now an absent-minded professor, I lost my cell phone when I was at JSM. So I got a new one today (thus following the advice my department head gave me when he saw the cracked screen on my old phone a month ago). And since it’s new and shiny, I started playing around and decided to make it check my university email, calendar, etc.

This is all easy, provided you know the magic settings, which are not recorded anywhere (yet).  I’ll write down precisely what it should be for someone with UW username “your_username”.  Just replace that with “abie” if you’re me, or something appropriate if you’re not.

  • Email address: your_username@u.washington.edu
  • Server address: exchange.washington.edu
  • Domain: netid
  • Username: your_username
  • Password: *********
  • This server requires an encrypted SSL connection: Yes

Search engines, please index this so I can find it next time I’m looking for “UW email on android”, thanks.

6 Comments

Filed under education

Speaking of Networks

I don’t get to do much about network research these days, except feel guilty about a few unfinished projects that I have no time for, but at least I can spread the word about conferences. The Web Algorithms Workshop 2010, for which I am on the PC, has a call-for-papers out, and the deadline is in 1.5 months.

The SAMSI focus year on Complex Networks had a preview session at JSM today, and it sounds like a truly interdisciplinary crowd. According to Eric Kolaczyk, the opening workshop (Aug 31 – Sept 1) is nearly full, so act fast if you want to act.

Comments Off on Speaking of Networks

Filed under Uncategorized

Societally-useful applications of Networks

Aravind Srinivasan writes:

I am teaching a new grad course on Networks in the Fall, and would like to include some societally-useful applications of networks as possible projects for the students (can be theory, very applied, or anywhere in the middle); these would be among a menu of choices available to the students. Could you please let me know if you have any suggestions? Of course, I will be happy to suggest anything useful that came out of my course, if you are interested.

I am interested, and it’s a fun/important thing to think about the societal value of your work periodically (not that I get to work on networks very much these days). I’ll put my response “below the fold”, in case you want to come up with your own list first. Continue reading

3 Comments

Filed under education

9 Hours to Numeracy

I’m helping to plan an Introduction to Statistics for incoming post-bachelors fellows in the next month, and because of the wide range of backgrounds these recent college graduates will be coming from, I’m approaching it as a short course on numeracy (we’ve got about 9 hours of lecture time scheduled for it), focused on statistics. This will be complemented with a very hands-on dose of STATA, but I’m going to try not to think about that.

My favorite numeracy-in-stats book is a dusty classic, and it would have survived on its name alone: How to Lie with Statistics. I wonder if that title is too cheeky for global health applications when the numbers really matter…

Do you know this book, and do you like it? Or is there a more modern book or article that I should think of instead? What would you pack into 9 hours of stats numeracy training. Tell me.

7 Comments

Filed under education

More PBFs out of Seattle

For those of you interested in hearing more about the summer travels of the IHME post-bachelors fellows, I alert you to the existence of these blogs:

1 Comment

Filed under education

Guest post from “the field”

One cool program here at IHME is the field placements for our Post-Bachelors Fellows. This is a roughly 6 week stint during the summer of their second year here where they travel from Seattle to some distant place, to see where the numbers we’ve been crunching come from. Kyle Foreman is in Sri Lanka doing this now, and here is a guest post he’s written about an ICT4D challenge he’s seeing that he wants your ideas on:

I’m spending this summer in Sri Lanka working with the Ministry of Health and the community health department of the University of Peradeniya, observing how Sri Lanka’s medical record keeping and vital statistics system works. They’d like for me to make some suggestions on how it could be improved, so I was hoping to get some feedback on how to make this work.

Here’s the problem: keeping track of something as simple as the number of people who die each year is very difficult here. Patient records are kept at each hospital, then they are tabulated and sent to a regional office, then tabulated at a district office, ad nauseam, until they finally reach the national level. It takes literally years (they just finished the 2006 returns), is full of errors (because they do it all by hand), and is very incomplete (because every step along the way there’s further tabulation which strips away valuable data). They thus have difficulty identifying problems (especially outbreaks), targeting resources, and assessing the outcomes of their efforts. Continue reading

11 Comments

Filed under global health