I was there to connect with other theoretical computer scientists and find out how they have been applying machine learning to “development”. It turned out that, in this crowd, development mostly means applications to health, education, and agriculture.
I was also there to share a very concrete challenge problem that I’ve been dabbling in here at IHME: the Verbal Autopsy, which was the subject of the short paper my colleague Sean Green presented.
Instead of recapping the problem in detail here, I’ll point you to our paper, and try to say just enough to get you interested. In many parts of the world, there are no death certificates, so it’s hard to know what diseases should be public health priorities. To try to get some idea, you can conduct interviews with relatives of recently deceased people, asking them about the signs and symptoms of illness that they observed shortly before death. These interviews are verbal autopsies. What to do with these interview results? Well, the standard practice is to hire local physicians to read them and diagnose the cause of death, and then use aggregate statistics of their findings in priority setting. But there are some problems: physicians are not very accurate in their diagnoses, and, especially in places where there aren’t enough doctors, these physicians could be spending their time on people who are not yet dead.
I think it’s a great place for robots! There has previously been a stumbling block in validating machine learning techniques, however, which is the lack of “labeled examples”. But, just before heading off to AI-D, I got some good news. Sean and I were able to convince the IHME top brass to release some appropriately anonymized verbal autopsy data, together with gold-standard cause-of-death diagnoses. I put it in a GitHub repository, verbal-autopsy-challenge. Maybe when I have some time, I’ll put some sample code in there, too.
I hope the format is self-explanatory, and if it’s not, leave a comment and we can figure out how to describe it better. It looks like this:
"symptom1","symptom2","symptom3","symptom4","symptom5",...,"causeOfDeath"
0,70,1,2,1,0,...,14
...
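In case it helps to see the format in action, here is a minimal Python sketch of how a file shaped like the sample above might be read. It is not the sample code for the repository, just an illustration under some assumptions: every column holds an integer, the label column is named "causeOfDeath" as in the sample header, and the toy rows below are made up for the example rather than taken from the released data. The helper names `load_verbal_autopsies` and `cause_fractions` are hypothetical.

```python
import csv
import io
from collections import Counter


def load_verbal_autopsies(lines, label_column="causeOfDeath"):
    """Parse CSV lines into (features, label) pairs.

    Assumes every value is an integer and that the cause of death
    is stored in `label_column`, as in the sample header.
    """
    records = []
    for row in csv.DictReader(lines):
        label = int(row.pop(label_column))
        features = {name: int(value) for name, value in row.items()}
        records.append((features, label))
    return records


def cause_fractions(records):
    """Return the fraction of deaths attributed to each cause code."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


# Made-up toy rows in the same shape as the sample (not real data).
TOY_CSV = '''"symptom1","symptom2","symptom3","causeOfDeath"
0,70,1,14
1,35,0,3
0,62,1,14
1,8,0,7
'''

records = load_verbal_autopsies(io.StringIO(TOY_CSV))
fractions = cause_fractions(records)

# Always predicting the most common cause is the baseline to beat.
majority_cause = max(fractions, key=fractions.get)
```

To run this on a real file, pass an open file object (opened with `newline=""`) to `load_verbal_autopsies` in place of the `io.StringIO` wrapper. The per-cause fractions are the kind of aggregate statistic used in priority setting, and the majority-cause guess gives a floor that any learned classifier should clear.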
The comment section is also a great place to discuss machine learning approaches to this task. If you use the data in a paper, please cite our AI-D paper, S. T. Green and A. D. Flaxman, Machine Learning Methods for Verbal Autopsy in Developing Countries, 2010.