Category Archives: TCS

MCMC in Python: Custom StepMethods and bounded-depth spanning tree distraction

I was looking for a distraction earlier this week, which led me to the world of stackexchange sites. The stack overflow has been on my radar for a while now, because web-search for coding questions often leads there and the answers are often good. And I knew that math overflow, tcs overflow, and even stats overflow existed, but I’d never really explored these things.

Well, diversion found! I got enamored with an MCMC question on the tcs site, about how to find random bounded-depth spanning trees. Bounded-depth spanning trees are something that I worked on with David Wilson and Riccardo Zechinna in my waning days of my MSR post-doc, and we came up with some nice results, but the theoretical ones are just theory, and the practical ones are based on message passing algorithms that still seem magical to me, even after hours of patient explanation from my collaborators.

So let’s do it in PyMC… this amounts to an exercise in writing custom step methods, something that’s been on my mind for a while anyways. And, as a bonus, I got to make an animation of the chain in action which I find incredibly soothing to watch on repeat:

Continue reading


Filed under MCMC, TCS

Losing touch with theory?

I’ve been flipping through the titles of SODA acceptances listed on the blogs, and wondering if I’m losing touch with TCS research. It’s a good chance for me to think about what algorithms (discrete or otherwise) have been really big in the health metrics work I’ve been doing recently.

  • Markov Chain Monte Carlo (MCMC):  This is the workhorse algorithm for me when fitting statistical models.  There are a few MCMC-sounding titles in the SODA list;  does anything have an exciting new step method to speed up my life?
  • Mixed Integer Programming (MIP):  This classic formulation of operations research must make an appearance in some of the approximation algorithms or other analysis in SODA.  Is there any work there that’s taking it on directly?
  • Stochastic Programming:  There was a lot of excitement about two-stage stochastic programming a few years ago, but the fad seems to have died down in theory land.  Too bad for me, because two-stage formulations are not really what I need, and my StoPro needs are growing.
  • Random Forests:  I really didn’t get enough education on machine learning in grad school.  What I do know is very much on the theoretical side of the spectrum.  But this Random Forests business has been pretty promising so far, and I just made a bet 10-to-1 that it will out-perform an ad-hoc method for verbal autopsy.  I believe the odds, but I wasn’t paying enough attention to the stakes…
  • Nonlinear optimization:  I love MCMC, but I don’t love waiting around for my chains to mix.  The alternative approach to statistical modeling, where you make due with a maximum likelihood estimate is starting to look pretty appealing.  This is pretty outside of the SODA realm.  I tried to convince Steven Rudich to include Newton’s Method in his course “Great Theoretical Ideas in Computer Science” some years ago, but I didn’t succeed.
  • Automatic Differentiation:  If I’m getting into nonlinear optimization, I will at least be a user of automatic differentiation, since the nonlinear optimizer wants to know the gradient, and I’m sure not going to be computing it if I don’t have to be.

So I guess my research needs are not squarely within the SODA realm.  But they are not disjoint from it either.  I’m still touching theory, if not totally in touch.  Maybe one day soon I’ll even have time to prove something.

Comments Off on Losing touch with theory?

Filed under TCS

Experimental Analysis of Algorithms

It’s been a busy two weeks since I got back in town. The PBFs who went to “the field” for their summer abroad have returned with lots of fun and interesting stories. A new batch of PBFs and PGFs has arrived, bringing IHME to it’s planned capacity of around 100 heads. And I’ve been getting deeply into experimental analysis of a gaussian process regression technique, much like the one we used for estimating child mortality rates.

Maybe I’ll work on it publicly here on healthy algorithms. I’ll see if that seems too boring as I proceed.

For the moment, I’m just looking for reading suggestions. I was very inspired by David Johnson’s papaer A Theoretician’s Guide to the Experimental Analysis of Algorithms when I read it, but that was years ago. I’m going to have to read it again. What else do you recommend like this?


Filed under TCS

Aaronson’s non-bet of non-confidence in P≠NP

As you have undoubtedly heard by now, there is a paper that claims to prove P!=NP, and there is a serious effort to understand the proof.

It has been fun to watch the experts set to work on this, and it has brought a lot of attention to random k-SAT, a problem that was near and dear to me when I was a grad student. And I get to learn interesting things from them without having to struggle through Deolalikar’s opus myself.

One interesting thing is the way Scott Aaronson reacted, saying:

If Vinay Deolalikar is awarded the $1,000,000 Clay Millennium Prize for his proof of P≠NP, then I, Scott Aaronson, will personally supplement his prize by the amount of $200,000.

When I first read about Aaronson’s offer to add $200K to the prize money, reported 2nd hand in a roundup of what the #pnp blogs were saying, it came off like the young professor is really hoping to have people work on this thing. But once my trusty rss feeder fed me his post, I realized his offer is not about how profs at private universities have disposable income that public schools don’t provide. It’s his way of quantifing his confidence in the accuracy of the proof.

If Aaronson had framed this in terms of a bet, it would be a textbook example of his level of certainty that the proof will have a flaw (a textbook in decision theory, anyway). But offering the sum without any possibility of receiving a return in the alternative scenario breaks expected utility theory. How certain is Scott? It all depends on what amount of money means nothing to him.

1 Comment

Filed under probability, TCS

P, NP, and more

There was a lot of buzz this weekend about a paper claiming to resolve the “P vs NP” conjecture.  I’ve seen plenty of papers claiming to do this over the years, so I didn’t rush to track it down.  But as the tweeting continued, I decided to have a quick look for myself, if only to produce a crotchety blog post on the matter.

Part of the drama around this paper comes from the way it appeared.  Author Vinay Deolalikar‘s web page explains:

Manuscript sent on 6th August to several leading researchers in various areas. Confirmations began arriving 8th August early morning.  The preliminary version made it to the web without my knowledge.  I have made minor updates, here. Please note that the  final version of the paper is under preparation, and is to be posted here very shortly.

I’m a global health researcher now, so I’m not going to be the one who tries to verify this 104 page paper, but I would love to learn from a careful review that it is true.  The result seems to go further than separating P and NP, since it is a statement about a natural distribution of instances being hard (from p. 78):

If LFP were able to compute solutions to the d1RSB phase of random k-SAT, then the distribution of the entire space of solutions would have a substantially simpler parametrization than we know it does.

Ryan Williams thinks this is suspicious, and expressed his concern in a series of tweets:

  1. This P vs NP claim is getting tons of press. I am really doubtful that it works. Hard to convey my skepticism in a tweet, but here goes…
  2. The author tries to use the fact that for certain distributions of random k-SAT, the solution space has a “hard structure”. Two problems:
  3. (1) Polytime solvable problems (such as perfect matching on random graphs) can also have complicated solution distributions.

  4. (2) There is a randomized reduction from SAT to formulas with at most ONE satisfying assignment (Valiant-Vazirani). A simple solution space
  5. So either Valiant-Vazirani can’t be derandomized or RP=NP (seems very unlikely!) or the proof must break. That’s my intuition.

Time will tell. I find proof of the existence of hard-on-average distributions much more exciting than plain old P vs NP, and Deolalikar’s paper might have something for everybody.

1 Comment

Filed under TCS

Practical MCMC Advice: When to Stop

I read some good practical advice about when enough is enough in Markov Chain Monte Carlo sampling this morning. In their “Inference from simulations and monitoring convergence” chapter of Handbook of Markov Chain Monte Carlo, Andrew Gelman and Kenneth Shirley say many useful things in a quickly digested format. Continue reading

Comments Off on Practical MCMC Advice: When to Stop

Filed under MCMC, statistics, TCS

Dense-Subset Break-the-Bank Challenge

I’m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I’ll like, and I want to mention it briefly before I fly.

Computational Complexity and Information Asymmetry in Financial Products is 27 pages of serious TCS, but it is so obviously applicable that people outside of our particular ivory tower, and even outside of academia entirely are blogging and twittering about it, and even reading it!

Freedom to Tinker has a nice summary of this paper, if you want to know what it’s about in a hurry.

Mike Trick makes the salient observation that NP-hard doesn’t mean computers can’t do it. But the assumption that this paper is based on is not about worst-case complexity; it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.

As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem. My plan is to repeat the problem here, and share some Python code for generating instances of it. Then, you, me, and everyone, can have a handy instance to try optimizing. I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than for cracking the P versus NP nut. Continue reading


Filed under combinatorial optimization, cryptography, TCS

Conference you should know about

This weekend marks the submission of my first “Global Health” paper. Congratulations to me! And many, many thanks to all the people who have worked with me to make it happen. I’ll go into details sometime in the future, first let me see how things go in the refereeing process.

While I was over-working on that business, I got an interesting Call-for-Papers forwarded from global health/AI researcher Emma Brunskill. The AAAI Spring Symposium on Artificial Intelligence for Development (AI-D) is an effort to build a community of people applying computer science and artificial intelligence in less-developed settings.

TCS people, don’t let the “AI” in their title turn you off. Eric Horvitz says that this is for all of us. Continue reading

Comments Off on Conference you should know about

Filed under global health, TCS

August is Too-Many-Projects Month

(Tap… tap… tap… is this thing on? Good.)

July was vacation month, where I went on a glorious bike tour of the Oregon/California coast, and learned definitively that I don’t like biking on the side of a highway all day. Don’t worry, I escaped in Coos Bay and took trains and buses between Eugene, Santa Cruz, Berkeley, and SF for a vacation more my speed.

But now that I’m back, August is turning out to be project month. I have 3 great TCS applications to global health in the pipeline, and I have big plans to tell you about them soon. But one mixed blessing about these applications is that people actually want to see the results, like, yesterday! So first I have to deal with the results, and then I can write papers and blogs about the techniques.

Since Project Month is a little over-booked with projects, I’m going to have to triage one today. You’ve heard of the NetFlix Challenge, right? Well, is running a smaller scale recommendation contest, and I was messing around with personal page rank, which seems like a fine approach for recommending code repositories to hackers. I haven’t got it working very well (best results, 15% of holdout set recovered), but I was having fun with it. Maybe someone else will take it up, let me know if you get it to work; networkx + data = good times.

    f = open('download/data.txt')
    for l in f:
        u_id, r_id = l.strip().split(':')
        G.add_edge(user(u_id), repo(r_id))

[get the code]


Filed under combinatorial optimization, software engineering, TCS

k-SAT and me

I’ve posted a new paper on random k-SAT on the arxiv.  This is work I did towards the end of my post-doctoral stint at Microsoft Reseach with Danny Vilenchik and Uri Feige.  It is an application of a cool technique that Danny and others came up with to study random instances above the satisfiability threshold that have been selected uniformly at random from satisfiable instances at that density. We use it to derive some bounds on the likely diameter of the set of satisfying solutions under this conditionally random distribution.  Unfortunately, I don’t think that there are too many global health applications for random k-SAT with k large.

That’s too bad, because Amin Coja-Oghlan has also recently posted a paper about k-SAT on the arxiv, which sounds very promising. In A better algorithm for random k-SAT, Amin presents (from the abstract):

a polynomial time algorithm that finds a satisfying assignment of F with high probability for constraint densities m/n<(1-\epsilon_k)2^k\ln(k)/k, where \epsilon_k \rightarrow 0. Previously no efficient algorithm was known to find solutions with non-vanishing probability beyond m/n=1.817.2^k/k.

His algorithm is a combinatorial, local-search type algorithm. I’ll try to find time to read the paper even if I don’t come up with a compelling application of k-SAT to health metrics.

1 Comment

Filed under probability, TCS