Journal Club: Meat, fish, and esophageal cancer risk: a systematic review and dose-response meta-analysis

For our final journal club paper of last quarter, we read Meat, fish, and esophageal cancer risk: a systematic review and dose-response meta-analysis. I am a vegetarian.

That brings me up to date for fall quarter, and the winter quarter is just finishing its first week. Good!

1 Comment

Filed under statistics

IHME Seminars from Last Quarter

It is time for seminars to start up again at IHME, and yesterday we had one already! Here are the remaining ones from last quarter that I did not get a chance to mention individually:

Paying for Health Results in Developing Countries: Current and Future – Rena Eichler

A country perspective on the Global Burden of Disease (GBD) project – the case of Norway – Stein Emil Vollset

Improving the quality of siblings’ survival histories: results from a randomized controlled trial in Niakhar (Senegal) and next steps – Stéphane Helleringer

A Statistician’s Challenges with HIV and AIDS – Nicholas P. Jewell

Comments Off on IHME Seminars from Last Quarter

Filed under global health

Journal Club: Investigating health system performance: An application of data envelopment analysis to Zambian hospitals

I’m almost caught up on recording last quarter’s journal club papers, with this second to last topic: Investigating health system performance: An application of data envelopment analysis to Zambian hospitals by Felix Masiye.

Data envelopment analysis (DEA) is an approach I’ve been hearing a lot about recently, and it seems to view health systems through quite an operations-research lens. I hope I’ll be looking into it more in the near future.

Comments Off on Journal Club: Investigating health system performance: An application of data envelopment analysis to Zambian hospitals

Filed under global health

Journal Club: Wireless Substitution: State-level Estimates From the National Health Interview Survey, January 2007–June 2010

This National Health Statistics Report that we read toward the end of last quarter’s journal club has one of the driest names we’ve seen. But the topic is a fascinating glimpse into the limits of our knowledge about society. How many households in the USA have given up their landline phone entirely and only have a cell phone? Well, we answer most questions like that with a telephone survey. Uh-oh. Fortunately, the National Health Interview Survey (in my experience, pronounced most commonly as “en-hiss”) is a health survey where enumerators visit households in person. Even though it is about population health, it can also answer this pressing question about technology use, and about the potential invalidity of all the surveys that do not visit households in person but just call on the phone.

Comments Off on Journal Club: Wireless Substitution: State-level Estimates From the National Health Interview Survey, January 2007–June 2010

Filed under global health

I used the IPython Notebook for my lab book for a year. How did it go?

It was exactly a year ago when I firmed up a workflow wherein the IPython Notebook was the center of my daily scientific research. All notes end up in a .ipynb file, and my code, plots, and equations all live together there. Looking back on 2013, how did it go and what should I change for 2014?

I am very happy with it overall. I have 641 .ipynb files, with names like 2013_01_01_EM_4_1_2.ipynb and 2013_12_22a_dm_pde_for_pop_prediction.ipynb. This includes notes for two courses I taught and plan to teach again, for several papers that we published, and for a large number of projects that didn’t pan out. I’ll definitely use the course notes again the next time I teach; I’ve already had to look up the calculations from some of those papers for responses to reviewers and clarifications after publication; and maybe I can come back to the projects that didn’t pan out with some new insight.

What could go better? I couldn’t decide if my lab book should capture everything, like I was taught in science class, or be a curated collection of my work including only the parts I would need in the future. Probably some blend is best, and since it is hard to know the right balance ahead of time, I tried to keep everything in a git repo, so that I could curate and edit, but recover anything that I realized I still wanted after cutting. I only ended up with 59 git commits, though. If that approach were working, I would expect more commits than notebooks.

I sometimes lost things in my stack of notebooks. The .ipynb format is not easy to search, so I kept a .py copy of everything and grepped through them looking for the notebooks about a specific technique or project. Since I organized my notebooks chronologically, I ended up doing this a lot more than if I had organized them thematically, but even if I already had all of my congenital heart disease notes in one place, I would still find myself saying, “I know I did some data munging like this for a different project recently, how does the pandas.melt function work again?”, or whatever.
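Keeping a parallel .py copy of everything just for grep is one workaround; another is to search the .ipynb JSON directly with nothing but the standard library. Here is a minimal sketch of that idea (the function name and the "melt" example are mine, not part of any tool; it assumes the notebook JSON keeps code cells either at the top level under "cells" or nested under "worksheets", the two layouts I have seen):

```python
import glob
import json

def search_notebooks(pattern, paths):
    """Return the notebook paths whose code cells mention `pattern`."""
    hits = []
    for path in paths:
        with open(path) as f:
            nb = json.load(f)
        # newer notebooks keep cells at the top level; older ones nest
        # them inside a list of worksheets
        cells = nb.get("cells") or [
            c for ws in nb.get("worksheets", []) for c in ws.get("cells", [])
        ]
        for cell in cells:
            if cell.get("cell_type") != "code":
                continue
            # code text lives under "source" or, in older files, "input"
            source = cell.get("source") or cell.get("input") or []
            if isinstance(source, list):
                source = "".join(source)
            if pattern in source:
                hits.append(path)
                break
    return hits

# e.g. find every notebook that ever called melt:
# search_notebooks("melt", glob.glob("*.ipynb"))
```

This skips markdown cells, so it finds the notebooks where I actually *used* a function rather than just mentioned it.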

The feature I would like the most is a way to paste images into my notebook. I wrote some notes about it in a github issue page about IPython Notebook feature requests. I want the digital equivalent of stapling a copy into my lab book, and I want it to be easy.

Collaboration worked pretty well. I have a lot of colleagues who don’t want to see Python code, no matter how much easier it would make their lives. I’ve had good success sending them PDF versions of notebooks, or sticking my research notes in a github gist and sending them a link to nbviewer. I think there is room for improvement in this, too, though.

2 Comments

Filed under global health

The four pillars of probabilistic programming systems

I’ve been organizing my thoughts on probabilistic programming and Bayesian computation. Try this out: there are four things a probabilistic programming system needs, depending on who is using it for what:

  • Expressive language for formulating models
  • Efficient computation of objective functions
  • Flexible inference algorithms
  • Appropriate data analysis workflow

Maybe I can come up with better names for these pieces, and maybe they are not all different. And maybe I am missing something. This is sort of preliminary. But let me elaborate on how it works in the case of PyMC.

Expressive language for formulating models: this is what drew me to PyMC when I started doing applied work five-ish years ago. Just write Python. For simple things, it reads as easily as equations in a stats paper, and for complex things it can have subroutines, data structures, and all of the nice things I expect from a modern programming language.

Efficient computation of objective functions: PyMC2 has a strange confection of Python and Fortran under the hood, which works well enough for the stuff I’ve been doing. But (if I understand correctly) PyMC3 pushes everything off into Theano, which does a more sophisticated translation/compilation of the code.

Flexible inference algorithms: I think that a lot of the inspiration for PyMC3 is the possibility of using Hamiltonian Monte Carlo methods for generating MCMC steps, which requires quickly computing the gradient of the objective function. PyMC2 has relied heavily on the Adaptive Metropolis step method. In the past, I’ve had a lot of fun experimenting with alternative approaches.
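To make the second and third pillars concrete, here is a hand-rolled sketch (plain numpy, not PyMC’s actual implementation) of the simplest version of the idea: an objective function, and a random-walk Metropolis sampler that only ever asks that function for log-density values. The toy standard-normal target and all names here are mine, for illustration:

```python
import numpy as np

def log_post(theta):
    # toy objective function: standard normal log-density, up to a constant
    return -0.5 * theta ** 2

def metropolis(log_post, n_steps=10000, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a jump, accept with prob min(1, ratio)."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    trace = np.empty(n_steps)
    for i in range(n_steps):
        proposal = theta + step_size * rng.normal()
        # compare on the log scale to avoid overflow in the density ratio
        if np.log(rng.random()) < log_post(proposal) - log_post(theta):
            theta = proposal
        trace[i] = theta
    return trace

trace = metropolis(log_post)
```

Adaptive Metropolis tunes `step_size` (really, a proposal covariance) from the history of the chain, and Hamiltonian Monte Carlo replaces the random-walk proposal with one guided by the gradient of `log_post` — which is why fast gradient computation becomes the bottleneck.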

Appropriate data analysis workflow: I’ve had a few long discussions with other researchers who are using these methods about the barriers for their work and their colleagues, and this is the part that seems most important. How do you get the data all in place to evaluate objective functions and run flexible inference algorithms? This is not really a core part of PyMC, but rather something to be done with general Python, which suits me just fine.

I’d love to workshop this a little bit with you, dear reader, so I’m going to try turning on comments again. I hope I don’t get spammed into oblivion.

1 Comment

Filed under software engineering

What qualifies as probabilistic programming?

I just went through the classic paper on WinBUGS, which might or might not be called probabilistic programming. It is listed on the probabilistic programming resource page, and it is certainly interesting. The WinBUGS “hello, world” is a linear regression model:

[Image: regression_hello_world]
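For a sense of what such a hello-world model amounts to underneath the declarative syntax: a normal-likelihood linear regression with vague priors boils down to a single unnormalized log-posterior. Here is a plain-Python rendering of that kind of model (my own sketch, not the BUGS code from the figure; the Normal(0, 100) priors are an assumption for illustration):

```python
import numpy as np

def regression_log_post(alpha, beta, log_sigma, x, y):
    """Unnormalized log-posterior for y ~ Normal(alpha + beta*x, sigma),
    with vague Normal(0, 100) priors on alpha and beta and sigma
    parameterized on the log scale so it stays positive."""
    sigma = np.exp(log_sigma)
    mu = alpha + beta * x
    log_lik = np.sum(-np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2)
    log_prior = -0.5 * (alpha / 100) ** 2 - 0.5 * (beta / 100) ** 2
    return log_lik + log_prior
```

What BUGS-style systems add on top of this is that you never write the log-posterior yourself: you declare the likelihood and priors, and the system derives the objective and samples from it.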

Comments Off on What qualifies as probabilistic programming?

Filed under software engineering, statistics

Probabilistic Programming Examples

I’ve been reading up on probabilistic programming; it is so close to PyMC, but so different. The coolest example so far comes from a talk on Microsoft’s offering, infer.net:

[Image: hello_uncertain_world]

Comments Off on Probabilistic Programming Examples

Filed under probability, software engineering

Occupation codes in the NHIS

This is my new obsession: does anyone know what I should know about self-reported occupation data in NHIS surveys?

[Image: nhis_occupation]

Comments Off on Occupation codes in the NHIS

Filed under Mysteries

IHME Seminar: Caleb Banta-Green on Drugs in Toilet Water

I’m catching up on all the happenings around IHME while I was busy last quarter, and here is the one where information technology served me the best. The Wednesday seminar from Oct 30 was a particularly cool approach to finding out about “hidden health behaviors” from waste water monitoring, such as whether there is more psychostimulant use in urban or in rural settings.

It is the one where information technology served me the best because I was traveling when this seminar happened, and I watched it in a live broadcast online when I couldn’t fall asleep in Geneva. Yay technology. You can watch it now, too, in archived form. Yay, again.

Comments Off on IHME Seminar: Caleb Banta-Green on Drugs in Toilet Water

Filed under global health