How many 3-digit zip codes are there?

There are 929 3-digit ZIP Codes in the USA.

Filed under global health

ML in Python: Decision Trees with Pandas

Doctors love decision trees, computer scientists love recursion, so maybe that’s why decision trees have been coming up so much in the Artificial Intelligence for Health Metricians class I’m teaching this quarter. We’ve been very sklearn-focused in our labs so far, but I thought my students might like to see how to build their own decision tree learner from scratch, so I put together this little notebook for them. Unfortunately, it is a little too complicated to have them do it themselves in a quarter-long class with no programming prerequisites.
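The notebook itself isn’t reproduced here, but to give a rough idea of what “from scratch” can look like, here is a minimal sketch of a recursive learner built on pandas. It assumes categorical features and an ID3-style information-gain split criterion; that choice is mine, not necessarily what the notebook does. The function names (`entropy`, `information_gain`, `build_tree`, `predict`) and the toy `fever`/`cough`/`sick` data are hypothetical.

```python
import math

import pandas as pd


def entropy(labels: pd.Series) -> float:
    """Shannon entropy (in bits) of a column of class labels."""
    probs = labels.value_counts(normalize=True)
    return -sum(p * math.log2(p) for p in probs)


def information_gain(df: pd.DataFrame, feature: str, target: str) -> float:
    """Drop in entropy from splitting df on the categories of `feature`."""
    weighted = sum(
        len(group) / len(df) * entropy(group[target])
        for _, group in df.groupby(feature)
    )
    return entropy(df[target]) - weighted


def build_tree(df: pd.DataFrame, features: list, target: str):
    """Recursively build a tree of nested dicts {feature: {value: subtree}}.

    Leaves are plain class labels (the majority label at that node).
    """
    labels = df[target]
    # Base cases: the node is pure, or there is nothing left to split on.
    if labels.nunique() == 1 or not features:
        return labels.mode()[0]
    best = max(features, key=lambda f: information_gain(df, f, target))
    if information_gain(df, best, target) <= 0:
        return labels.mode()[0]  # no split helps, so stop here
    remaining = [f for f in features if f != best]
    # Recursive step: one subtree per observed category of the best feature.
    return {
        best: {
            value: build_tree(group, remaining, target)
            for value, group in df.groupby(best)
        }
    }


def predict(tree, row, default=None):
    """Walk the nested dicts until reaching a leaf label."""
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        if row[feature] not in branches:
            return default  # category never seen during training
        tree = branches[row[feature]]
    return tree


# Hypothetical toy data: "sick" happens to track "fever" exactly.
data = pd.DataFrame({
    "fever": ["yes", "yes", "no", "no", "yes", "no"],
    "cough": ["yes", "no", "yes", "no", "yes", "no"],
    "sick":  ["yes", "yes", "no", "no", "yes", "no"],
})
tree = build_tree(data, ["fever", "cough"], "sick")
print(tree)
print(predict(tree, {"fever": "yes", "cough": "no"}))
```

On this toy data the learner splits once, on `fever`, because that split removes all of the label entropy while `cough` removes only a little. The recursion is the part students usually find surprising: `build_tree` calls itself on each subgroup until the leaves are pure.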


Filed under machine learning


In a recent post, I confessed my interest in a National Academy Press report on teaching methods. The tough thing for me about using this discipline-based education research (DBER) approach is not the name or the acronym, but identifying the commonly misunderstood concepts in the discipline that students benefit from learning actively. In the report’s examples, these concepts seem to have been articulated by geniuses dedicated to teaching, after years of observing students. I don’t know if I’ll get there one day, but I’m certainly not there now.

But I had a great idea, or at least one that I think is great: see what people are confused by online. I tried this out for my lecture last week on cross-validation, using the stats.stackexchange site:

After reading a ton of these, I decided that if my students know when they need test/train/validation splits and when they can get away with test/train splits, then they’ve really figured things out. Now I can’t find the question that I thought distilled this best, though.
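Roughly, the distinction goes like this: if the model and its settings are fixed in advance, a train/test split is enough, but as soon as held-out data is used to choose among models or hyperparameters, that data no longer gives an honest error estimate, so a separate test set has to be kept aside. Here is a minimal sketch of that workflow in plain Python. The helper names, the 60/20/20 split, the toy data, and the use of 1-D k-nearest-neighbour regression (with k as the hyperparameter) are all illustrative assumptions, not anything from the lecture.

```python
import random


def three_way_split(items, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle `items` and cut them into train / validation / test lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(frac_train * len(shuffled))
    n_val = round(frac_val * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def mse(model, data):
    """Mean squared error of `model` on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_knn(train, k):
    """1-D k-nearest-neighbour regression: average y of the k closest x's."""
    def model(x):
        nearest = sorted(train, key=lambda xy: abs(xy[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return model


# Hypothetical toy data: y = x^2 plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(200)]
data = [(x, x * x + rng.gauss(0, 0.1)) for x in xs]
train, val, test = three_way_split(data)

# Choosing k is model selection, so it is scored on the validation set...
candidates = [1, 3, 5, 10, 25, 50]
best_k = min(candidates, key=lambda k: mse(fit_knn(train, k), val))

# ...and the test set is touched exactly once, to report the chosen model.
test_error = mse(fit_knn(train, best_k), test)
print(best_k, test_error)
```

If `candidates` had only one entry there would be nothing to select, and the validation set could be folded back into training, which is exactly the case where a test/train split is enough.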

Filed under machine learning

Quinlan stuff

To complement that ASA address about what statistics is, which I read last week, here is the abstract of the KDD address about what data mining is:


Does the talk exist somewhere?

Filed under machine learning

Talk: Balasubramanian Sivan / Optimal Crowdsourcing Contests / Wed 1/28

Subject: Talk: Balasubramanian Sivan / Optimal Crowdsourcing Contests / Wed 1/28, 3:30pm / CSE 403

SPEAKER: Balasubramanian Sivan (MSR)
TITLE: Optimal Crowdsourcing Contests

WHEN: Wednesday, 1/28, 3:30pm

We study the design and approximation of optimal crowdsourcing
contests. Crowdsourcing contests can be modeled as all-pay auctions because
entrants must exert effort up-front to enter. Unlike all-pay auctions where a
usual design objective would be to maximize revenue, in crowdsourcing contests,
the principal only benefits from the submission with the highest quality. We
give a theory for optimal crowdsourcing contests that mirrors the theory of
optimal auction design. We also compare crowdsourcing contests with more
conventional means of procurement and show that crowdsourcing contests are
constant factor approximations to conventional methods.

Joint work with Shuchi Chawla and Jason Hartline.
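To make the difference between the two objectives in the abstract concrete, here is a small simulation of the textbook symmetric all-pay auction. This is my own illustration, not the model or results of the paper. It assumes n entrants whose values are drawn i.i.d. uniformly on [0, 1], for which the standard equilibrium effort is b(v) = ((n − 1)/n) vⁿ. The function names are hypothetical. The all-pay “revenue” is the sum of everyone’s effort, while a contest designer only gets the best submission, i.e. the maximum.

```python
import random


def equilibrium_effort(v: float, n: int) -> float:
    """Symmetric equilibrium bid in an all-pay auction with n bidders
    whose values are i.i.d. Uniform[0, 1]: b(v) = (n - 1) / n * v**n."""
    return (n - 1) / n * v ** n


def simulate(n: int, trials: int = 100_000, seed: int = 0):
    """Average total effort (auction revenue) and average best effort
    (what a crowdsourcing principal actually values) over many contests."""
    rng = random.Random(seed)
    total_effort = 0.0
    best_effort = 0.0
    for _ in range(trials):
        efforts = [equilibrium_effort(rng.random(), n) for _ in range(n)]
        total_effort += sum(efforts)  # every entrant pays, so all effort counts
        best_effort += max(efforts)   # the principal keeps only the top entry
    return total_effort / trials, best_effort / trials


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, simulate(n, trials=20_000))
```

Under these assumptions the averages should land near the closed-form values (n − 1)/(n + 1) for total effort and (n − 1)/(2n) for the best submission, e.g. 1/3 and 1/4 when n = 2. As n grows, total effort approaches 1 while the best submission stays below 1/2, which is one way to see why revenue-maximizing auction design and optimal contest design can come apart.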

From: Abraham D. Flaxman

Sorry I missed this. Jason told me about this project a little while back, and it convinced me to enter a contest. It was more fun than writing a grant proposal, and when my entry was rejected they gave me a second-runner-up cash prize…


Filed under auctions

Verbal Autopsy methods earlier

A cool addition to the big verbal autopsy study I worked on a few years ago is out now: “symptomatic diagnosis” takes the verbal autopsy approach and applies it to find out what ails people non-fatally.


Filed under global health

Kish Stuff

A student came by interested in survey statistics, and we got to talking about what an amazing person Leslie Kish must have been. We did some googling on it. Here are a few items we found:

Filed under statistics