What was I up to two years ago?

One fun thing about keeping my lab notebook in digital form with IPython Notebooks is that I can flip through my old work so easily. Did I say fun? I meant scary, and sometimes depressing. But yes, also fun.

For example, two years ago, I was working on some projects that are still not wrapped up today, and I was doing a lot of prep for the first edition of my now re-titled “machine learning for health metricians” class.

Hey, that includes the answer to [a question someone just asked on stats.stackexchange](http://stats.stackexchange.com/q/149801/18291).

Filed under machine learning

Jupyter Notebooks in GitHub

So cool:



I wonder what diffs look like?
Currently, they are not rendered: https://github.com/fonnesbeck/statistical-analysis-python-tutorial/commit/17ca0cd15c1379f9adc4561042c4a31621baeef6

Is that next, GitHub? It will be huge.

Filed under software engineering

Irreproducible science as a communication failure

From: Abraham D. Flaxman
Sent: Thursday, May 7, 2015 4:40 PM
To: reproducible@u.washington.edu
Subject: [Reproducible] licenses and reproducibility: the scholarly communication lens

The recent discussion on reproducibility and licensing inspired me to read something historical about UW and software licensing that has been on my desk for a while. I think others on the list might find it interesting as well, so I scanned a copy for you: https://www.dropbox.com/s/79k92iwm20159of/williams_barnett_digital_ventures_2009.pdf?dl=0

I particularly like the idea that software is communication, and the university is an institution that is good at scholarly communication and at teaching. I think there is some framing here that could be valuable for reproducible research as well. Irreproducible results are, in a sense, a communication failure, and a lot of what we are talking about on this list comes down to different ways to improve our scholarly communication.


Filed under science policy

Why do we call it “ridge” regression?

Asked and answered: http://stats.stackexchange.com/a/151351/18291

With a link to more detail: http://www.itl.nist.gov/div898/handbook/pri/section3/pri336.htm

Filed under machine learning

A post on a talk on the book Epic Measures

I got my high school buddy to write a book about my boss… what could go wrong? They were at Town Hall Seattle a few weeks ago, and I think nothing did: http://townhallseattle.org/event/jeremy-smith-and-christopher-murray/

Is there a recording online somewhere?

Filed under global health

I like the term OneHotEncoder

Dummy variable just sounds demeaning to me. http://stats.stackexchange.com/questions/149122/treating-missing-data-in-voting-pattern-analysis/149572#149572
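
For anyone who has not met the term, here is a minimal pure-Python sketch of what one-hot encoding does. The `one_hot` helper and the `"missing"` label are my own illustrative names, not scikit-learn's `OneHotEncoder` API, and treating a missing vote as its own category is just an assumption for the example, not necessarily what the linked answer recommends.

```python
def one_hot(values, missing_label="missing"):
    """Encode categorical values as rows of 0/1 indicator columns.

    None is treated as its own category (an assumption for this sketch).
    Returns the sorted category names and one indicator row per value.
    """
    cleaned = [missing_label if v is None else v for v in values]
    categories = sorted(set(cleaned))
    rows = [[1 if v == c else 0 for c in categories] for v in cleaned]
    return categories, rows


# Hypothetical voting records: yes, no, and one abstention recorded as None.
votes = ["yes", "no", None, "yes"]
categories, rows = one_hot(votes)
for vote, row in zip(votes, rows):
    print(vote, row)
```

Each row has exactly one "hot" entry, which is where the name comes from.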

Filed under machine learning

How did I end up reading a 30-year-old book on density estimation?

Simple: I wanted to make violin plots for efficiency scores, and they shouldn’t show any density below zero. Here is a sneaky way to coax such a figure out of Python/Seaborn: https://github.com/mwaskom/seaborn/issues/525#issuecomment-97651992

The truncation makes them look more like gyro meat than violins.
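
To make the truncation idea concrete, here is a rough sketch of a Gaussian kernel density estimate that is cut off at zero and renormalized so it still integrates to one. This is not the Seaborn trick from the linked comment, just a pure-Python illustration; `gaussian_kde`, `truncated_kde`, the bandwidth, and the example scores are all made up for the sketch.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density


def truncated_kde(data, bandwidth, lower=0.0):
    """Density estimate that is zero below `lower` and renormalized above it."""
    density = gaussian_kde(data, bandwidth)
    # Share of the untruncated estimate's mass at or above `lower`:
    # each Gaussian kernel contributes its upper-tail probability.
    mass_above = sum(
        0.5 * math.erfc((lower - d) / (bandwidth * math.sqrt(2.0)))
        for d in data
    ) / len(data)

    def truncated(x):
        return density(x) / mass_above if x >= lower else 0.0

    return truncated


# Hypothetical efficiency scores, which cannot be negative.
scores = [0.05, 0.1, 0.2, 0.4, 0.7]
f = truncated_kde(scores, bandwidth=0.1)
print(f(-0.01), f(0.0), f(0.2))
```

The hard edge at zero is exactly what produces the sliced-off, gyro-meat look.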

Filed under dataviz