Category Archives: software engineering

Laplace approximation in Python: another cool trick with PyMC3

I admit that I’ve been skeptical of the complete rewrite of PyMC that underlies version 3. It seemed to me motivated by an interest in using unproven new step methods that require knowing the derivative of the posterior distribution. But, it is really coming together, and regardless of whether or not the Hamiltonian Monte Carlo stuff pays off, there are some cool tricks you can do when you can get derivatives without a hassle.

Exhibit 1: A Laplace approximation approach to fitting mixed effect models (as described in http://www.seanet.com/~bradbell/tmb.htm)

http://nbviewer.ipython.org/gist/aflaxman/9dab52248d159e02b2ae
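
For anyone who wants the idea without opening the notebook, here is a minimal one-dimensional sketch of a Laplace approximation. It is not the PyMC3 or TMB machinery: the toy model (7 successes and 3 failures with a flat prior), the function names, and the finite-difference derivative are all my own assumptions for illustration. The approximation replaces the posterior with a normal distribution centered at the mode, with variance equal to minus one over the second derivative of the log posterior there.

```python
import math


def log_post(theta, successes=7, failures=3):
    """Unnormalized log posterior: binomial likelihood with a flat prior (toy example)."""
    return successes * math.log(theta) + failures * math.log(1.0 - theta)


def find_mode(f, lo, hi, tol=1e-8):
    """Ternary search for the maximum of a unimodal function on (lo, hi)."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Laplace approximation: normal with mean at the mode and
# variance equal to -1 / (second derivative of the log posterior at the mode).
mode = find_mode(log_post, 1e-6, 1.0 - 1e-6)
curvature = second_derivative(log_post, mode)
approx_sd = math.sqrt(-1.0 / curvature)

print("mode:", round(mode, 4), "approximate sd:", round(approx_sd, 4))
```

With automatic derivatives, the finite-difference step would be replaced by an exact gradient and Hessian, which is what makes the trick practical for models with many random effects.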

Filed under software engineering, statistics

Some thoughts on code review

We’ve been doing a massive code review exercise at IHME this quarter. I think I mentioned it in passing. I have adapted the approach that Philip Guo wrote up in Communications of the ACM, and forced everyone to use git and comment on pull requests (in Stash, Atlassian’s GitHub alternative, which we host internally).

It has been a good experience for some, but it is time to find out how it worked for all, with a more traditional course evaluation. Fingers crossed.

Filed under software engineering

Can I use unittest.mock in Python?

A question I would like to answer. Perhaps this is the place to start: https://docs.python.org/3/library/unittest.mock.html
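
As a first step toward an answer, here is a minimal sketch of two common uses, assuming Python 3.3 or later, where unittest.mock is part of the standard library. The function fetch_row_count and the database-like object it takes are hypothetical, made up just to have something to test.

```python
import random
from unittest import mock


def fetch_row_count(db):
    """Return the number of rows from a database-like object (hypothetical helper)."""
    rows = db.query("SELECT * FROM deaths")
    return len(rows)


# 1. Stand in for an expensive dependency with a Mock object.
fake_db = mock.Mock()
fake_db.query.return_value = [("a",), ("b",), ("c",)]
n_rows = fetch_row_count(fake_db)

# The mock records how it was called, so the test can check the interaction.
fake_db.query.assert_called_once_with("SELECT * FROM deaths")

# 2. Temporarily replace a real function with mock.patch.
with mock.patch("random.random", return_value=0.5):
    patched_value = random.random()

print(n_rows, patched_value)
```

Outside the with block, random.random is restored to the real function.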

Filed under software engineering

SciLint

I’m considering sneaking syntax checking into the scientific code review process I’ve been running this quarter. Here are some resources:

PyLint: http://www.pylint.org/
RLint: https://code.google.com/p/google-rlint/ or http://cran.r-project.org/web/packages/lint/index.html
Stata Lint: http://www.stata.com/statalist/archive/2009-08/msg01048.html http://www.stata.com/help.cgi?syntax http://www.stata.com/help.cgi?language

Filed under software engineering

Two ways to do categorical data in Python

Pandas has it: http://pandas-docs.github.io/pandas-docs-travis/categorical.html
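
A minimal sketch of the pandas version, assuming pandas is installed; the severity labels are made up for illustration. An ordered categorical stores each value as a small integer code and knows the order of its categories.

```python
import pandas as pd

# Ordered categorical: values are stored as integer codes into `categories`.
severity = pd.Series(
    pd.Categorical(
        ["mild", "severe", "moderate", "mild"],
        categories=["mild", "moderate", "severe"],
        ordered=True,
    )
)

codes = severity.cat.codes.tolist()

# Because the categories are ordered, comparisons and max() are meaningful.
at_least_moderate = (severity >= "moderate").tolist()
worst = severity.max()

print(codes, at_least_moderate, worst)
```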

Python has it, too: https://docs.python.org/3/library/enum.html
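
A minimal sketch of the standard-library version, assuming Python 3.4 or later; the Sex class and its values are made up for illustration. The main payoff is that a typo in a category label fails loudly instead of silently creating a new category.

```python
from enum import Enum


class Sex(Enum):
    """Hypothetical categorical variable with a fixed set of allowed values."""
    MALE = "male"
    FEMALE = "female"


# Look up a member by its value, for example when reading a data file.
parsed = Sex("male")

# An unknown label raises ValueError rather than passing through unnoticed.
try:
    Sex("mlae")
    typo_caught = False
except ValueError:
    typo_caught = True

labels = [member.value for member in Sex]
print(parsed, typo_caught, labels)
```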

Will this make my life easier?

Filed under software engineering

What style should I use for my docstrings?

The numpy docstring style should be just fine: http://sphinxcontrib-napoleon.readthedocs.org/en/latest/example_numpy.html
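
For reference, here is a minimal sketch of what that style looks like on a small function. The function prevalence_per_1000 is hypothetical, made up to show the section headings.

```python
def prevalence_per_1000(cases, population):
    """Compute prevalence per 1,000 people.

    Parameters
    ----------
    cases : int
        Number of people with the condition.
    population : int
        Total number of people; must be positive.

    Returns
    -------
    float
        Cases per 1,000 population.

    Raises
    ------
    ValueError
        If `population` is not positive.

    Examples
    --------
    >>> prevalence_per_1000(5, 1000)
    5.0
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * cases / population


print(prevalence_per_1000(5, 1000))
```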

Filed under software engineering

Overheard: Mozilla Study Groups in Seattle

Hey all,
Lately, I’ve been thinking about a way to help people keep learning & practicing their coding skills long term; my new project, Mozilla Study Groups, are what I came up with, and I wanted to ping the Seattle community to see if people would be interested in trying this out locally.
The idea is to have a casual meetup, maybe 1-2 hours anywhere from weekly to monthly, where people can come and share skills in some guided demos of the code and packages they use in their research, ask each other questions, find out what each other are working on, and just generally have a place to come talk and learn about coding for research.
I’ve made a few assets to support this (all still works in progress, feedback very welcome):
– Study Group Handbook, a how-to guide for organizing these meetup groups: http://mozillascience.github.io/studyGroupHandbook/
– Study Group Website Kit, a forkable, quick to set up website to list events for your group: https://github.com/mozillascience/studyGroup
– Study Group Lessons, a collection of short lessons from past meetups, intended for recirculation: https://github.com/mozillascience/studyGroupLessons
The pilot in Vancouver is a big hit, check out their website: http://minisciencegirl.github.io/studyGroup/ .

Sound interesting? These Study Groups work best when the community gets together to organize; if you’re interested in giving this a go, I’d be happy to help out and maybe scoot down from Vancouver to assist in getting started; let me know!


Best Regards,
Bill Mills
Community Manager
Mozilla Science Lab

Filed under software engineering

Jupyter Notebooks in GitHub

So cool:
https://github.com/blog/1995-github-jupyter-notebooks-3

https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks

https://github.com/fonnesbeck/statistical-analysis-python-tutorial/blob/master/4.%20Statistical%20Data%20Modeling.ipynb

I wonder what diffs look like?
Currently, not shown: https://github.com/fonnesbeck/statistical-analysis-python-tutorial/commit/17ca0cd15c1379f9adc4561042c4a31621baeef6
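
In the meantime, a notebook file is just JSON, so a rough do-it-yourself diff is possible. The sketch below is one way it might go, assuming the version 4 notebook format, where the file has a top-level "cells" list and each cell has a "cell_type" and a "source". The two tiny notebooks are built in memory for illustration, and the helper names are my own.

```python
import difflib
import json


def code_lines(nb_json):
    """Return the source lines of all code cells in a notebook JSON string."""
    nb = json.loads(nb_json)
    lines = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        # The format allows source to be a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        lines.extend(source.splitlines())
    return lines


def make_notebook(sources):
    """Build a minimal notebook JSON string from code strings (illustration only)."""
    cells = [
        {"cell_type": "code", "source": src, "metadata": {},
         "outputs": [], "execution_count": None}
        for src in sources
    ]
    return json.dumps({"cells": cells, "metadata": {},
                       "nbformat": 4, "nbformat_minor": 0})


old = make_notebook(["x = 1\n", "print(x)\n"])
new = make_notebook(["x = 2\n", "print(x)\n"])

# Diff only the code, ignoring outputs and metadata.
diff = list(difflib.unified_diff(code_lines(old), code_lines(new),
                                 "old.ipynb", "new.ipynb", lineterm=""))
print("\n".join(diff))
```

Ignoring outputs keeps the diff readable, at the cost of hiding changes in results.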

Is that next for GitHub? It will be huge.

Filed under software engineering

Writing papers in the IPython Notebook

From: Spencer James
Sent: Monday, April 20, 2015 6:33 AM
To: Abraham D. Flaxman
Subject: writing manuscript in Notebook?

Hi Abie,

Hope things are well back in Seattle! I’m about halfway through the Epic Measures book, which has been a great read.. sure is interesting to understand more of the background story.

I have another question for you. Do you ever use IPython Notebook to write manuscripts or abstracts? I’ve been trying to figure out a way to work on manuscripts/abstracts such that I have the results/statistics directly embedded into the paper so that I don’t have to continually copy/paste numbers into Word as we go through edits/reviews, ie if I’m reporting an odds ratio, it’d be nice if I could just populate that value in the text itself straight from the logistic regression rather than copy/paste values. I found a few publicly available notebooks where researchers had done something along these lines, but they tended to not be results/data heavy, so I thought I’d see if you had anything like this or know of any better examples.

At any rate, hope all is well — it would be great to hear about what you’re working on these days as well.

Spencer

On Mon, Apr 20, 2015 at 1:59 PM, Abraham D. Flaxman wrote:
Hi Spencer,

It is always a pleasure to hear from you, and it sounds like you are doing a ton of interesting stuff!

Regarding IPython Notebooks, there is a crowd of hardcore IPython users who want to do exactly what you describe, and they even have a collection of successful examples: https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#reproducible-academic-publications

When I last considered doing this, I thought it was promising, but not quite ready. A particular challenge is working with colleagues who like to write collaboratively in MS Word using Track Changes. As a hybrid solution, what I’ve been doing is using Python to insert values from calculations into blocks of text and then copy-pasting those into Word documents. This saves some of the pain of updating figures after new data comes in, and it is less burdensome on the people I’m working with who do not prioritize exploring the cutting edge of technology (bleeding edge?).

I’ll make this into a blog post, if you don’t mind. Maybe others will be interested, and/or have better ideas.

Here is an example of what my approach might look like:
import jinja2

tmpl = """
Results.
The birth prevalence of moderate and severe CHD in 2010 for males was
{{birth_prev_m}}
per 1,000
(95% Credible Interval [CrI] {{birth_prev_m_lb}}-{{birth_prev_m_ub}}),
and for females was
{{birth_prev_f}}
per 1,000
(95% CrI {{birth_prev_f_lb}}-{{birth_prev_f_ub}}).
From 1968 to 2010, mortality with heart defects (all ages)
declined {{all_mx_decline_pct}}%,
from
{{mx_per_100000_in_1968}}
to
{{mx_per_100000_in_2010}}
per 100 000 person-years (PY);
among zero to 51-week olds,
the decline was
{{u1_mx_decline_pct}}%,
from
{{u1_mx_per_100000_in_1968}}
to
{{u1_mx_per_100000_in_2010}}
per 100 000 PY. The estimated number of adults (age 20 to 65 years)
with moderate to severe CHD in 1968
was
{{adult_cases_1968}} (95% CrI {{adult_cases_1968_lb}}-{{adult_cases_1968_ub}}),
with
{{repro_cases_1968}} (95% CrI {{repro_cases_1968_lb}}-{{repro_cases_1968_ub}})
reproductive age females (age 15-49 years).
In 2010, it was
{{adult_cases_2010}} (95% CrI {{adult_cases_2010_lb}}-{{adult_cases_2010_ub}}),
an increase by a factor of
{{case_increase}} (95% CrI {{case_increase_lb}}-{{case_increase_ub}}),
with
{{repro_cases_2010}} (95% CrI {{repro_cases_2010_lb}}-{{repro_cases_2010_ub}})
reproductive age females.
"""

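# vals is a dict that maps each template variable name to its formatted value,
# filled in from the analysis results
t = jinja2.Template(tmpl)
print(t.render(vals))

# save the rendered text so it can be pasted into the Word document
with open('achd/achd_abstract_results.txt', 'w') as f:
    f.write(t.render(vals))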
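
To show where the numbers in vals could come from, here is a minimal sketch for the odds ratio case mentioned above. It makes several assumptions: the 2x2 counts are made up, the interval is a simple Wald interval on the log scale, and it uses str.format in place of jinja2 so that it runs with only the standard library. The same dict could be passed to t.render in the template approach.

```python
import math

# Hypothetical 2x2 table (made-up counts, for illustration only):
#               outcome   no outcome
# exposed          20         80
# unexposed        10         90
a, b, c, d = 20, 80, 10, 90

# Odds ratio with a Wald 95% confidence interval on the log scale.
odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96  # normal quantile for a 95% interval
lb = math.exp(math.log(odds_ratio) - z * se_log_or)
ub = math.exp(math.log(odds_ratio) + z * se_log_or)

# Format once, in one place, so the text never needs hand-edited numbers.
vals = {
    "odds_ratio": "{:.2f}".format(odds_ratio),
    "or_lb": "{:.2f}".format(lb),
    "or_ub": "{:.2f}".format(ub),
}

sentence = ("The odds ratio was {odds_ratio} "
            "(95% CI {or_lb}-{or_ub}).").format(**vals)
print(sentence)
```

When the data change, rerunning the script regenerates the sentence with the updated values.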

From: Spencer James

Those are some great examples — the biomed-related ones are especially helpful. Some of my more recent projects involve publicly-available datasets like SEER, and I’d like to start also making that effort to have my analytic code available/reproducible alongside the manuscript itself. The collaboration issue with MS Word is a headache in medicine, too — I like the use of Python to write the text itself as a happy medium, though!

If I can write this next SEER manuscript or abstract using a Notebook, I’ll send it your way, too.

Good hearing from you and I’ll follow up on your blog to see if anyone has some other ideas,

Spencer

Filed under software engineering

Interesting stuff at Mozilla Science Lab

http://www.mozillascience.org/training

http://mozillascience.github.io/codeReview/intro.html

Filed under software engineering