Reusable Holdout

Cool paper, cool idea, ICYMI:

From: Mabry, Patricia L
Sent: Thursday, January 14, 2016 5:51 AM
Subject: [iuni_systems_sci-l] Article of interest: reusable holdout method

Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. (2015). The reusable holdout: Preserving validity in adaptive data analysis. Science, 349(6248), 636-638.

Misapplication of statistical data analysis is a common cause of spurious discoveries in
scientific research. Existing approaches to ensuring the validity of inferences drawn from data
assume a fixed procedure to be performed, selected before the data are examined. In common
practice, however, data analysis is an intrinsically adaptive process, with new analyses
generated on the basis of data exploration, as well as the results of previous analyses on the
same data. We demonstrate a new approach for addressing the challenges of adaptivity based
on insights from privacy-preserving data analysis. As an application, we show how to safely
reuse a holdout data set many times to validate the results of adaptively chosen analyses.

http://science.sciencemag.org/content/349/6248/636.full-text.pdf+html
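
For a feel of the mechanism, here is a minimal sketch of a Thresholdout-style check, loosely following my reading of the paper. It is a simplification, not the authors' exact procedure: the function name `thresholdout`, the default threshold, and the noise scales are my assumptions, and the paper's query budget and noisy threshold are left out. The idea is to answer from the training set unless training and holdout disagree by more than a noisy threshold, and only then reveal a noised holdout estimate.

```python
import numpy as np

def thresholdout(train_vals, holdout_vals, threshold=0.04, noise_rate=0.01, rng=None):
    """Answer one query, given its per-example values on each data set.

    Simplified sketch: the defaults and the Laplace scales are
    illustrative assumptions, and there is no query budget.
    """
    rng = np.random.default_rng() if rng is None else rng
    train_mean = float(np.mean(train_vals))
    holdout_mean = float(np.mean(holdout_vals))

    # Compare the two estimates against a noisy threshold.
    eta = rng.laplace(0.0, 4 * noise_rate)
    if abs(holdout_mean - train_mean) > threshold + eta:
        # They disagree: report the holdout estimate, with a little noise.
        return holdout_mean + rng.laplace(0.0, noise_rate)
    # They agree: the training estimate reveals nothing new about the holdout.
    return train_mean

# Example: a query whose training estimate is badly overfit.
rng = np.random.default_rng(0)
overfit_train = np.ones(100)           # looks perfect on training data
honest_holdout = rng.integers(0, 2, 100)  # roughly chance on the holdout
estimate = thresholdout(overfit_train, honest_holdout, rng=rng)
```

The noise is what connects this to privacy-preserving data analysis: because each answer leaks only a little about the holdout set, the analyst can keep asking adaptively chosen questions.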

Filed under Uncategorized

This might be better than dropping into the Python Debugger sometimes

http://stackoverflow.com/a/2158266/1935494
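
I believe the linked answer is about the standard library's `code` module, which opens an ordinary Python prompt with access to your variables; treat that as my assumption, since only the URL is given here. A minimal sketch of the idea follows. The function `running_total` and the variable names are made up for illustration, and the console is driven with `push` rather than interactively so that the snippet runs unattended.

```python
import code

def running_total(values):
    """Toy function to inspect (hypothetical, for illustration only)."""
    total = 0
    for v in values:
        total += v
    # To poke around at this point with a normal Python prompt, one could call:
    #     code.interact(local=dict(globals(), **locals()))
    return total

# Non-interactive demonstration of the same machinery:
# an InteractiveConsole runs source lines against a namespace dict.
namespace = {"values": [1, 2, 3], "running_total": running_total}
console = code.InteractiveConsole(locals=namespace)

# Each push() behaves like typing one line at the prompt.
console.push("doubled = [2 * v for v in values]")
console.push("total = running_total(doubled)")
```

Unlike the debugger prompt, everything typed is plain Python, so single-letter variable names never collide with debugger commands.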

Filed under Uncategorized

New Publication: Implementing the PHMRC shortened questionnaire: Survey duration of open and closed questions in three sites

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0178085

Abstract

Background

More countries are using verbal autopsy as a part of routine mortality surveillance. The length of time required to complete a verbal autopsy interview is a key logistical consideration for planning large-scale surveillance.

Methods

We use the PHMRC shortened questionnaire to conduct verbal autopsy interviews at three sites and collect data on the length of time required to complete the interview. This instrument uses a novel checklist of keywords to capture relevant information from the open response. The open response section is timed separately from the section consisting of closed questions.

Results

We found the median time to complete the entire interview was approximately 25 minutes and did not vary substantially by age-specific module. The median time for the open response section was approximately 4 minutes and 60% of interviewees mentioned at least one keyword within the open response section.

Conclusions

The length of time required to complete the interview was short enough for large-scale routine use. The open-response section did not add a substantial amount of time and provided useful information which can be used to increase the accuracy of the predictions of the cause of death. The novel checklist approach further reduces the burden of transcribing and translating a large amount of free text. This makes the PHMRC instrument ideal for national mortality surveillance.

There is also a replication archive on the Global Health Data Exchange (GHDx): http://ghdx.healthdata.org/node/263527

Filed under Uncategorized

A classroom-worthy example of the power of visual analytics

I started reading an “economics of diversity” book recently, and stumbled across a great example of the power of visual analytics (included early in the book to demonstrate the value of diverse representations):

This game is hard, right? I mean, I have to think about it to figure out a good move. But if you think of it visually, the right way, it is not hard. I’ll leave it as a mystery for now, and say that I can imagine a classroom exercise on this the next time I teach interactive data visualization.

Filed under Uncategorized

Testing what you don’t know

Did I mention that I attended a Software Carpentry (SWC) train-the-trainers event recently (editor’s note: not so recently anymore…)? And did I mention that they got me to read a fun book called _Teaching what you don’t know_? It had a number of fun-sounding ideas to encourage students to actively engage with material, in a chapter titled “Thinking in Class”, and I tried one out in a guest lecture.

The super-simple idea is this: at some point, when students were spacing out from hearing too much talking from me, I paused for questions. When there were none, I said, “now I want you to turn to the person next to you, and spend just two minutes and see where your notes differ from theirs. And figure out what makes sense now but might not when you look back at your notes.”

Then I had a little break for two minutes, and people talked to each other. When I brought them back to me, there were questions and there was renewed attention.

I tried it again about 20 minutes later, and it didn’t have the same magic. Maybe it is a once-a-class thing.

Filed under Uncategorized

Professors are WEIRD

I read a book called _Teaching what you don’t know_ in preparation for my SWC training, and it had a great chapter on “Teaching who you don’t know” that I’ve been thinking about. It turns out that a lot of students are not like a lot of professors. Basing our understanding of human psychology on the responses of Psych 101 students is risky, and assuming that our students want to learn the way we want to learn is risky, too. https://healthyalgorithms.com/2015/06/09/weird-view-of-human-nature/

Filed under Uncategorized

Potentially of interest: summary of a workshop on reproducibility

Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results: Summary of a Workshop (2016)

http://www.nap.edu/catalog/21915/statistical-challenges-in-assessing-and-fostering-the-reproducibility-of-scientific-results

Filed under Uncategorized

MCMC in Python: Power-posterior calculation with PyMC3

I’ve got a fun little project that has let me check in on the PyMC project after a long time away. Long-time readers of Healthy Algorithms might remember my obsession with PyMC2 from my DisMod days nearly ten years ago, but for those of you joining us more recently… there is a great way to build Bayesian statistical models with Python, and it is the PyMC package. It was completely rewritten for version 3, which is now available, and using it looks like this:

import numpy as np
import matplotlib.pyplot as plt

# Initialize random number generator
np.random.seed(123)

# True parameter values
alpha, sigma = 1, 1
beta = [1, 2.5]

# Size of dataset
size = 100

# Predictor variable
X1 = np.random.randn(size)
X2 = np.random.randn(size) * 0.2

# Simulate outcome variable
Y = alpha + beta[0]*X1 + beta[1]*X2 + np.random.randn(size)*sigma

from pymc3 import Model, Normal, HalfNormal
basic_model = Model()

with basic_model:

    # Priors for unknown model parameters
    alpha = Normal('alpha', mu=0, sd=10)
    beta = Normal('beta', mu=0, sd=10, shape=2)
    sigma = HalfNormal('sigma', sd=1)

    # Expected value of outcome
    mu = alpha + beta[0]*X1 + beta[1]*X2

    # Likelihood (sampling distribution) of observations
    Y_obs = Normal('Y_obs', mu=mu, sd=sigma, observed=Y)

basic_model.logp({'alpha':1,
                 'beta':[1,1],
                 'sigma_log_':0})

What I need is a way to compute the “power-posterior” of a model, which is to say $p(\theta \mid y, t) = \frac{p(y \mid \theta)^t \, p(\theta)}{\mathcal{Z}_t(y)}$.
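
Taking logs makes it clearer what has to be computed. The integral form of the normalizer below is the standard definition, spelled out here as an assumption since it is not written above; it does not depend on $\theta$, so for sampling purposes only the first two terms matter:

```latex
\log p(\theta \mid y, t)
  = t \, \log p(y \mid \theta) + \log p(\theta) - \log \mathcal{Z}_t(y),
\qquad
\mathcal{Z}_t(y) = \int p(y \mid \theta)^t \, p(\theta) \, d\theta .
```

At $t = 0$ this is just the prior, and at $t = 1$ it is the usual posterior.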

I’ve figured out a pretty cool way to compute the prior and the likelihood of the model:

import theano.tensor as tt

def logpriort(self):
    """Theano scalar of log-prior of the model"""
    factors = [var.logpt for var in self.free_RVs]
    return tt.add(*map(tt.sum, factors))
def logprior(self):
    """Compiled log probability density function"""
    return self.model.fn(logpriort(self))
def logliket(self):
    """Theano scalar of log-likelihood of the model"""
    factors = [var.logpt for var in self.observed_RVs] + self.potentials
    return tt.add(*map(tt.sum, factors))
def loglike(self):
    """Compiled log probability density function"""
    return self.model.fn(logliket(self))

Now I just need to put them together, to get the log of the power-posterior. There must be an easy way to do it, but this is not it:

def logpowerpt(self, t):
    return t*loglike(self) + logprior(self)
def logpowerp(self, t):
    return self.model.fn(logpowerpt(self, t))

vals = {'alpha':1,
        'beta':[1,1],
        'sigma_log_':0}
logpowerp(basic_model, 0)(vals)

Any suggestions?
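
One guess, untested here: `loglike(self)` and `logprior(self)` return compiled functions, so multiplying one of them by `t` cannot work. Combining the symbolic scalars instead, as in `t*logliket(self) + logpriort(self)`, and compiling that once with `self.model.fn(...)` seems like the natural thing to try.

Whatever the PyMC3 answer turns out to be, a plain NumPy version is handy for checking it. The sketch below hand-codes the same regression model; all function names are mine, not PyMC3's. It works with sigma directly rather than the log-transformed `sigma_log_`, so I would expect its values to differ from PyMC3's by the Jacobian term of that transform (an assumption about PyMC3's internals).

```python
import numpy as np

def normal_logpdf(x, mu, sd):
    """Log density of Normal(mu, sd), elementwise."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mu) / sd) ** 2

def halfnormal_logpdf(x, sd):
    """Log density of HalfNormal(sd) for x >= 0: twice the Normal(0, sd) density."""
    return np.log(2.0) + normal_logpdf(x, 0.0, sd)

def log_prior(alpha, beta, sigma):
    """Sum of the log-priors: alpha, beta ~ Normal(0, 10), sigma ~ HalfNormal(1)."""
    return (normal_logpdf(alpha, 0.0, 10.0)
            + np.sum(normal_logpdf(np.asarray(beta), 0.0, 10.0))
            + halfnormal_logpdf(sigma, 1.0))

def log_like(alpha, beta, sigma, X1, X2, Y):
    """Log-likelihood of Y ~ Normal(alpha + beta[0]*X1 + beta[1]*X2, sigma)."""
    mu = alpha + beta[0] * X1 + beta[1] * X2
    return np.sum(normal_logpdf(Y, mu, sigma))

def log_power_posterior(t, alpha, beta, sigma, X1, X2, Y):
    """Unnormalized log power-posterior: t * log-likelihood + log-prior."""
    return t * log_like(alpha, beta, sigma, X1, X2, Y) + log_prior(alpha, beta, sigma)

# Simulated data in the same spirit as the example above
# (different random generator, so not the same numbers).
rng = np.random.default_rng(123)
size = 100
X1 = rng.standard_normal(size)
X2 = rng.standard_normal(size) * 0.2
Y = 1 + 1 * X1 + 2.5 * X2 + rng.standard_normal(size)

# t = 0 should reproduce the log-prior; t = 1 the full unnormalized log-posterior.
at_prior = log_power_posterior(0.0, 1.0, [1.0, 1.0], 1.0, X1, X2, Y)
at_posterior = log_power_posterior(1.0, 1.0, [1.0, 1.0], 1.0, X1, X2, Y)
```

The two endpoints make a quick sanity check for any Theano version: at `t=0` the data should have no effect at all.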

Filed under MCMC

AAUP Forum: The shooting at UW on inauguration day

I moderated a panel discussion of this recent challenging event, and a recording of it is now online:

On January 20, Inauguration Day, self-described anti-fascist activist Josh Dukes was shot and critically injured at a rally at the University of Washington. Milo Yiannopoulos, a high-profile “alt-right” (white supremacist) speaker, had been invited by the College Republicans to speak at Kane Hall on that evening. Hundreds of people opposed to the speaker and his message had rallied outside Kane Hall, where those attending the talk were queued up to enter the hall. Both the UW and the Seattle police were there in force.

In the aftermath of the shooting, information about the incident and the shooter was scarce, and often contradictory. It was reported that someone had turned himself in, but this person was released. Rumors abounded about whether the shooter was a UW student, about whether police had purposefully corralled people from opposing sides in order to create an incident to facilitate arrests, and about whether the shooters were affiliated with individuals who had placed threatening neo-Nazi-style posters on the campus. No arrests were made, and no explanation for the lack of arrests was given. Police and UW administration declined to comment, citing the ongoing investigation as the reason.

Filed under Uncategorized

Diversity Club: Medical Education and the Minority Tax

Last week the IHME diversity club discussed a recent JAMA viewpoint on “Medical Education and the Minority Tax”. I think this is a good way to frame an important issue:

http://jamanetwork.com/journals/jama/fullarticle/2625322

A Piece of My Mind

May 9, 2017

Medical Education and the Minority Tax

Kali D. Cyrus, MD, MPH

JAMA. 2017;317(18):1833-1834. doi:10.1001/jama.2017.0196

I sat down at the large conference room table surrounded by the other medical students, some of whom I recognized from earlier stops on the residency interview trail. As they continued their conversations, I looked around, realizing I was once again the only interviewee who is black. I kept gazing around the room, only to find more faces staring back that did not look like me. Hanging grandly from the walls were faces, painted in watercolor, framed in bronze, and undoubtedly of really important men … really important white men.

Filed under Uncategorized