Here is an interesting method that spans my old life in theory world and my new one in global health metrics: the Random Forest.
This is a technique that grows (get it?) out of research on decision trees, and it is a great example of how combining a few simple ideas can get complicated very quickly.
The task is the following: learn from labeled examples. (Is this yet another baby-related research topic? Not as directly as the last few.) To be specific, I start with a training data set; for the task at hand in global health, this might be the results of verbal autopsy interviews, all digitized and encoded as numeric data, together with the true underlying cause of death (as identified by gold-standard clinical diagnostic criteria) as the labels.
To “learn” in this case means to build a predictor that can take new, unlabeled examples and assign a cause of death to them.
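To make the setup concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the symptom indicators, the cause labels, and the deliberately simple nearest-neighbor predictor bear no resemblance to real verbal autopsy encodings, which run to hundreds of questions.

```python
# Toy "labeled examples": each row encodes answers to three yes/no
# interview questions (hypothetical), paired with a cause-of-death label.
train_X = [
    [1, 0, 1],  # e.g. fever? cough? weight loss?  (made-up encoding)
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
train_y = ["TB", "pneumonia", "pneumonia", "TB"]


def predict_1nn(x):
    """A bare-bones predictor: assign the label of the most similar
    training example, where similarity is the number of matching answers
    (i.e., 1-nearest-neighbor under Hamming distance)."""
    def dist(a, b):
        return sum(ai != bi for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


print(predict_1nn([1, 0, 1]))  # -> 'TB' (exact match in the training set)
```

The point is only the shape of the problem: a predictor is any function from an encoded interview to a cause label, and "learning" means building that function from the labeled rows.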
The first simple idea needed for the random forest is the decision tree, and I found a nice YouTube video that explains it, so I don’t need to write it up myself:
Well, the video is not perfect; if you have not seen decision trees before, you may be left with a few questions.
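Since a video can leave gaps, here is a from-scratch sketch of the idea: a tiny decision tree that greedily picks, at each node, the binary feature whose split most reduces Gini impurity. The toy data, the depth limit, and the binary-features-only restriction are all simplifications for illustration, not how a production tree learner works.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: probability that two random draws disagree."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    # Stop when the node is pure or the depth limit is hit:
    # the leaf predicts the majority label.
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    # Greedily pick the binary feature with the lowest weighted
    # post-split impurity (only splits that actually improve count).
    best_j, best_score = None, gini(y)
    for j in range(len(X[0])):
        left = [y[i] for i in range(len(X)) if X[i][j] == 0]
        right = [y[i] for i in range(len(X)) if X[i][j] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_j, best_score = j, score
    if best_j is None:
        return Counter(y).most_common(1)[0][0]
    zeros = [(x, lab) for x, lab in zip(X, y) if x[best_j] == 0]
    ones = [(x, lab) for x, lab in zip(X, y) if x[best_j] == 1]
    # An internal node is (feature index, subtree if 0, subtree if 1).
    return (best_j,
            build_tree([x for x, _ in zeros], [l for _, l in zeros],
                       depth + 1, max_depth),
            build_tree([x for x, _ in ones], [l for _, l in ones],
                       depth + 1, max_depth))


def predict(tree, x):
    """Walk from the root to a leaf, branching on one feature per node."""
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = left if x[j] == 0 else right
    return tree


# Invented data where feature 0 perfectly determines the label,
# so the learned tree is a single split on that feature.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = ["A", "A", "B", "B"]
tree = build_tree(X, y)
print(predict(tree, [1, 0]))  # -> 'A'
```

Greedy one-feature-at-a-time splitting is exactly the "simple idea" part; the random forest's other ingredients (bootstrap samples and random feature subsets at each split) get layered on top of trees like this one.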