Have yinz already seen Google Flu? It’s a project by Google, in collaboration with the Centers for Disease Control and Prevention (CDC). It’s been getting healthy press coverage for the last two weeks or so. And, if you want to dig deeper, a draft manuscript on their approach is also available.

The headline result of this approach to tracking flu outbreaks is that it is fast: it can observe flu trends two weeks before the CDC. And it is accurate enough, with correlations of 0.85-0.98 between the search-based estimate and the gold-standard rates produced by the CDC.
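If you want a feel for what a correlation in that range means, here is a minimal sketch of the Pearson correlation between two weekly series. The numbers are made up for illustration (they are not from the Google or CDC data), and the names `pearson`, `search_estimate`, and `cdc_rate` are my own, not anything from the manuscript.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Hypothetical weekly values, for illustration only (not real data).
search_estimate = [1.2, 1.5, 2.1, 3.0, 4.2, 3.8, 2.9, 2.0]
cdc_rate = [1.0, 1.4, 2.3, 3.1, 4.0, 4.1, 2.7, 1.9]

r = pearson(search_estimate, cdc_rate)
print(f"r = {r:.2f}")
```

To try this on the real thing, you would swap the two lists for the matching weekly columns from the downloadable csv files and the CDC rates.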

I’ll tell you what’s wrong with it, but first let me praise it.

Something that I really appreciate about this project is how the researchers have made the raw data from their findings easily available. Just follow their Download raw data link, and you are at a page with csv files. Good show!

Another thing that I like about this project is that it is getting people thinking about online privacy in a personal way. As I mentioned recently, 67% of survey respondents said that, assuming there was no way that anyone would have access to their identity, they would share health information for researchers to learn about disease treatment, prevention, and other related issues. This is something that Google is working to make clear in their FAQ, especially because Google is trying to get into the electronic-health-records business.

Now, what are the problems?

> That flu stuff is fun. I’ve been trying to come up with what its weak points are, for a curmudgeonly blog post. The main failing (as mentioned by cnn) is that it misses anyone who doesn’t use google and has false positives for folks who are curious, not sick. The former would be major in a global health setting (and probably also in the US), since it is probably poorer people who both google less and die from flu more.
> Another fun point to criticize is the possibility of intentional manipulation. Not a big deal if people only use the system for their curiosity, but if public decision making starts relying on google searches (or if “savvy” investors are following flu trends for trading in pharma companies), it would be an easy weekend project to hire a botnet that makes spikes wherever you like. Would that be illegal? I’ll ask my dad/lawyer.

The paper does address the risk of false positives, and how a news event, like the recall of a popular flu remedy, could lead to a spike in flu-related searches, but it doesn’t mention the risk of intentional manipulation. But who would do that? I’ve got more fun weekend projects!

Regarding the way flu affects lower-income people more, there is a recent study by IHME director Chris Murray and others (that I haven’t read yet) which takes the historical data on the 1918 flu pandemic and models what it would look like if it happened today.  It got good press coverage when it came out, too.

For a future post: Gaetano Borriello, a UW prof who does ubiquitous computing, is on sabbatical at Google-Seattle for the year. He spoke at IHME yesterday about other health-related projects that Google is working on. I’ll tell you about that soon.


  1. Mohammad

    fyi, there’s also a published paper on the subject from yahoo research:

  2. Thanks for the link, Mohammad! Is there a way to compare the google and yahoo results to see who has the sicker searchers (or most average)?

    Is anyone from Microsoft working on this who wants to chime in? Or has anyone tried it with public data, like wikipedia page views?