I’ve been flipping through the titles of SODA acceptances listed on the blogs, and wondering if I’m losing touch with TCS research. It’s a good chance for me to think about what algorithms (discrete or otherwise) have been really big in the health metrics work I’ve been doing recently.
- Markov Chain Monte Carlo (MCMC): This is the workhorse algorithm for me when fitting statistical models. There are a few MCMC-sounding titles in the SODA list; does anything have an exciting new step method to speed up my life?
- Mixed Integer Programming (MIP): This classic formulation of operations research must make an appearance in some of the approximation algorithms or other analysis in SODA. Is there any work there that’s taking it on directly?
- Stochastic Programming: There was a lot of excitement about two-stage stochastic programming a few years ago, but the fad seems to have died down in theory land. Too bad for me, because two-stage formulations are not really what I need, and my StoPro needs are growing.
- Random Forests: I really didn’t get enough education on machine learning in grad school. What I do know is very much on the theoretical side of the spectrum. But this Random Forests business has been pretty promising so far, and I just made a bet 10-to-1 that it will out-perform an ad-hoc method for verbal autopsy. I believe the odds, but I wasn’t paying enough attention to the stakes…
- Nonlinear optimization: I love MCMC, but I don’t love waiting around for my chains to mix. The alternative approach to statistical modeling, where you make due with a maximum likelihood estimate is starting to look pretty appealing. This is pretty outside of the SODA realm. I tried to convince Steven Rudich to include Newton’s Method in his course “Great Theoretical Ideas in Computer Science” some years ago, but I didn’t succeed.
- Automatic Differentiation: If I’m getting into nonlinear optimization, I will at least be a user of automatic differentiation, since the nonlinear optimizer wants to know the gradient, and I’m sure not going to be computing it if I don’t have to be.
So I guess my research needs are not squarely within the SODA realm. But they are not disjoint from it either. I’m still touching theory, if not totally in touch. Maybe one day soon I’ll even have time to prove something.
I’m not attending WWW this week, but I am promoting a paper that I helped with, Tracking the random surfer: Empirically measured teleportation parameters in PageRank. My main contribution was connecting the people with the idea to the people with the data, but I’m happy with the results.
Incidentally, this sort of measurement has a great application in health metrics. But I’m going to keep it secret for a little while, to get my thoughts in order.
I’m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I’ll like, and I want to mention it briefly before I fly.
Computational Complexity and Information Asymmetry in Financial Products is 27 pages of serious TCS, but it is so obviously applicable that people outside of our particular ivory tower, and even outside of academia entirely are blogging and twittering about it, and even reading it!
Freedom to Tinker has a nice summary of this paper, if you want to know what it’s about in a hurry.
Mike Trick makes the salient observation that NP-hard doesn’t mean computers can’t do it. But the assumption that this paper is based on is not about worst-case complexity; it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.
As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem. My plan is to repeat the problem here, and share some Python code for generating instances of it. Then, you, me, and everyone, can have a handy instance to try optimizing. I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than for cracking the P versus NP nut. Continue reading
This weekend marks the submission of my first “Global Health” paper. Congratulations to me! And many, many thanks to all the people who have worked with me to make it happen. I’ll go into details sometime in the future, first let me see how things go in the refereeing process.
While I was over-working on that business, I got an interesting Call-for-Papers forwarded from global health/AI researcher Emma Brunskill. The AAAI Spring Symposium on Artificial Intelligence for Development (AI-D) is an effort to build a community of people applying computer science and artificial intelligence in less-developed settings.
TCS people, don’t let the “AI” in their title turn you off. Eric Horvitz says that this is for all of us. Continue reading
I don’t feel like having that post about how big things are brewing in US health care reform on the top of my blog anymore, so here is a quick replacement: a ranking paper that caught my eye recently on arxiv, where computer scientists is applied to politics: On Ranking Senators By Their Votes, by my fellow CMU alum, Mugizi Rwebangira (@rweba on twitter).
I haven’t had time to write anything this week because I am up to my neck in this Seven-Samurai-style software engineering project. You know, where a bunch of untrained villagers (that’s me) need to defend themselves against marauding bandits (that’s the Global Burden of Disease 2005 Study), so they have to learn everything about being a samurai (that’s writing an actual application that people other than this one villager can use) as quickly as possible.
I guess this analogy is stretching so thin that you could chop it with Toshirō Mifune’s wooden sword. But, if anyone knows how a mild-mannered theoretical computer scientist can get a web-app built in two weeks, holler. If you prefer to explain in terms of wild-west gunslingers, that is fine.
Here’s my game plan so far: I’m going to make the lightest of light-weight Python/Django apps to hold all the Global Disease Data, and then try to get my epidemologist doctors to interact with it on the command-line via an interactive python session.
The rest of this post is basically a repeat of the Django tutorial, but specialized for building a data server for global population data. As far as interesting theoretical math stuff, hidden somewhere towards the end, I’ll do some interpolation with PyMC’s Gaussian Processes using the exotic (to me) Matérn covariance function. Continue reading
The Chronicle of Higher Ed has a short piece on public-service applications of computer science that are coming out of a class called Computing for Good (C4G) that TCS star Santosh Vempala co-taught at Georgia Tech last spring.
This is an idea that is emerging in several ACO-related disciplines. Manuela Veloso has been running a similar program at CMU called V-Unit, Karen Smilowitz and Michael Johnson held a session at INFORMS 2007 on community-based operations research, and in 2006 student statisticians started a network of volunteer consultancies called Statistics in the Community.
It’s great to see a tradition of “pro bono” work developing in theoretical fields. It’s not just a way for lawyers to assuage their consciences anymore.