I might have been a little over-ambitious with this series. Two weeks ago I wrote a bit about how matching theory emerged from the social sciences. But then I got really busy! And that was the part I actually knew something about ahead of time. The promised connection between matching algorithms and reproductive health (and more generally, how matching is being used in quasi-experiment design) is the part that I have to do some reading on before I can write knowledgeably about it.

However, I have a plan: I’d like to “crowd-source” my library research. I’ve read a little bit about a grand experiment that is going on right now, wherein Tim Gowers, Terry Tao, and mathematicians well-known and unknown are collaborating online to try to develop a combinatorial proof of the density Hales-Jewett theorem. I’ll try a small-scale version of their amazing project. I don’t expect it to draw in the same heavy-weight-champion mathematicians that Gowers’s polymath project has attracted, but it will not require background knowledge of Szemerédi’s regularity lemma or the triangle-removal lemma (which currently have no applications to Global Public Health, but I’m on the lookout…).

The article that made me think I could title a series of posts “Matching Algorithms and Reproductive Health” came out a month or so ago in the journal of the American Academy of Pediatrics: Janet Elise Rosenbaum, Patient Teenagers? A Comparison of the Sexual Behavior of Virginity Pledgers and Matched Nonpledgers, Pediatrics, Vol. 123 No. 1 Jan 2009. (See? It has “match” in the title.) One nice thing about these medical journals is that they give good summaries of the articles, sort of an abstract-of-the-abstract. Patient Teenagers in two sentences:

What’s Known on This Subject

Two studies have found, by using regression, that virginity pledges delay sex, but regression cannot correct for large preexisting differences between pledgers and nonpledgers.

What This Study Adds

We used a more robust method than regression to compare virginity pledgers with similar nonpledgers and found virtually no difference in sexual behavior or STDs and much less use of condoms.

This is great ammo for the culture wars, and it was headline news for a few days when it came out. In blog time, I’m way late in mentioning it. But my interest is the more robust method that Rosenbaum uses. It is something that statisticians call “matching”, but when they say matching, they don’t mean finding a subgraph with maximum degree one. I have a feeling that what they mean is related, however. Can we figure it out together?
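For anyone who hasn't met the graph-theory meaning before, here is a minimal sketch of it in Python: a set of edges is a matching when no vertex is used by more than one edge. The function name and the sample edges are made up for illustration and have nothing to do with Rosenbaum's data.

```python
from collections import Counter

def is_matching(edges):
    """Return True if every vertex touches at most one of the given edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d <= 1 for d in degree.values())

# Hypothetical example: pledgers p1, p2 paired with nonpledgers n1, n2.
print(is_matching([("p1", "n1"), ("p2", "n2")]))  # each vertex used once
print(is_matching([("p1", "n1"), ("p1", "n2")]))  # p1 is used twice
```

The first call prints True and the second prints False.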

It seems like there is an R package that is popular for this sort of analysis, matchit, and there is an award-winning 38-page paper explaining the ideas behind the code. Have a look and report back. I will too, when I have another 90 minutes free.

I didn’t realize this while I was writing, but now I’m pretty sure that I went to high school with Janet Rosenbaum! Small world, huh?

Hi Abie,

I think this kind of data-set matching includes the combinatorial kind, but also a more generalized pruning of the data set, in which control and treatment look nearly the same in all the pre-treatment dimensions, at least in some aggregate sort of way.

From page 212 of the paper:

I found that the matchit documentation is a quicker way to get to the bottom of this than the matchit paper.

Check out section 3.2.1.4, Optimal Matching. This is one of 7 ways recommended, but, if I’m reading it right, it is exactly a minimum weight perfect matching of the treatment subjects to the control subjects.
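To make that reading concrete, here is a toy sketch in Python of a minimum weight matching of treatment subjects to control subjects. It is not the matchit code. It assumes each subject is summarized by a single made-up integer score, uses the absolute difference in scores as the edge weight, and finds the best assignment by brute force, which is only feasible for a handful of subjects.

```python
from itertools import permutations

def optimal_matching(treated, controls):
    """Pair each treated unit with a distinct control, minimizing total distance.

    treated, controls: dicts mapping a unit id to one numeric score
    (a hypothetical stand-in for whatever distance the real analysis uses).
    Brute force over all assignments, so only suitable for tiny inputs.
    """
    t_ids = list(treated)
    best_cost, best_pairs = float("inf"), None
    # Try every ordered choice of len(t_ids) distinct controls.
    for chosen in permutations(controls, len(t_ids)):
        cost = sum(abs(treated[t] - controls[c]) for t, c in zip(t_ids, chosen))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(zip(t_ids, chosen))
    return best_pairs, best_cost

# Made-up scores for two treated subjects and three candidate controls.
treated = {"t1": 30, "t2": 35}
controls = {"c1": 10, "c2": 32, "c3": 60}
pairs, cost = optimal_matching(treated, controls)
print(pairs, cost)
```

With these numbers the best total distance is 23, pairing t1 with c1 and t2 with c2. Greedily giving t1 its nearest control (c2) first would leave t2 with a control 25 away, for a total of 27, which is why the optimal version is posed as a minimum weight matching rather than a sequence of nearest-neighbor picks.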