Tag Archives: reproducible research

Catalog of Reproducible Products

The Reproducibility and Open Science (ROS) Working Group recently finished up a form to begin to gather information on Reproducible Products from the community.

Please take a few minutes to submit information on any product (peer-reviewed manuscript, preprint, or other product). It is only about 20 questions, many of them multiple choice.

The Google form can be accessed at
http://goo.gl/hHcFlK

Please feel free to let us know if you have any questions or comments.
Thanks
Steven

Steven Roberts
faculty.washington.edu/sr320

Filed under science policy

Jupyter Notebooks in GitHub

So cool:
https://github.com/blog/1995-github-jupyter-notebooks-3

https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks

https://github.com/fonnesbeck/statistical-analysis-python-tutorial/blob/master/4.%20Statistical%20Data%20Modeling.ipynb

I wonder what diffs look like?
Currently, not shown: https://github.com/fonnesbeck/statistical-analysis-python-tutorial/commit/17ca0cd15c1379f9adc4561042c4a31621baeef6

Is that next, GitHub? It will be huge.
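
For a rough idea of what a readable notebook diff could look like, here is a minimal sketch. It relies on the fact that a notebook file is JSON with a list of cells, each with a "cell_type" and a "source", and it compares only the cell sources while ignoring outputs. The function names and the two tiny example notebooks are made up for illustration; this is not how GitHub renders anything.

```python
import difflib


def notebook_source_lines(nb):
    """Flatten a notebook dict into lines of cell source, ignoring outputs."""
    lines = []
    for i, cell in enumerate(nb.get("cells", [])):
        # A marker line so the diff shows which cell a change belongs to.
        lines.append("# cell {} ({})".format(i, cell.get("cell_type", "unknown")))
        source = cell.get("source", [])
        if isinstance(source, list):  # source may be a list of strings or one string
            source = "".join(source)
        lines.extend(source.splitlines())
    return lines


def diff_notebooks(old_nb, new_nb):
    """Return a unified diff of the cell sources of two notebook dicts."""
    return list(difflib.unified_diff(
        notebook_source_lines(old_nb),
        notebook_source_lines(new_nb),
        fromfile="old.ipynb",
        tofile="new.ipynb",
        lineterm="",
    ))


# Hypothetical notebooks; real ones would come from json.load() on .ipynb files.
old_nb = {"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)"], "outputs": []}]}
new_nb = {"cells": [{"cell_type": "code", "source": ["x = 2\n", "print(x)"], "outputs": []}]}

for line in diff_notebooks(old_nb, new_nb):
    print(line)
```

Dropping the outputs is a design choice: embedded images and execution counts are what make raw notebook diffs so noisy.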

Filed under software engineering

Irreproducible science as a communication failure

From: Abraham D. Flaxman
Sent: Thursday, May 7, 2015 4:40 PM
To: reproducible@u.washington.edu
Subject: [Reproducible] licenses and reproducibility: the scholarly communication lens

The recent discussion on reproducibility and licensing inspired me to read something historical about UW and software licensing that has been on my desk for a while. I think others on the list might find it interesting as well, so I scanned a copy for you: https://www.dropbox.com/s/79k92iwm20159of/williams_barnett_digital_ventures_2009.pdf?dl=0

I particularly like the idea that software is communication, and the university is an institute that is good at scholarly communication and at teaching. I think there is some framing here that could be valuable for reproducible research as well. Irreproducible results are, in a sense, a communication failure, and a lot of what we are talking about on this list are different ways to improve our scholarly communication.

–Abie

Filed under science policy

Writing papers in the IPython Notebook

From: Spencer James
Sent: Monday, April 20, 2015 6:33 AM
To: Abraham D. Flaxman
Subject: writing manuscript in Notebook?

Hi Abie,

Hope things are well back in Seattle! I’m about halfway through the Epic Measures book, which has been a great read. It sure is interesting to understand more of the background story.

I have another question for you. Do you ever use IPython Notebook to write manuscripts or abstracts? I’ve been trying to figure out a way to work on manuscripts/abstracts such that I have the results/statistics directly embedded into the paper, so that I don’t have to continually copy/paste numbers into Word as we go through edits/reviews. For example, if I’m reporting an odds ratio, it’d be nice if I could just populate that value in the text itself straight from the logistic regression rather than copy/paste values. I found a few publicly available notebooks where researchers had done something along these lines, but they tended to not be results/data heavy, so I thought I’d see if you had anything like this or knew of any better examples.

At any rate, hope all is well — it would be great to hear about what you’re working on these days as well.

Spencer

On Mon, Apr 20, 2015 at 1:59 PM, Abraham D. Flaxman wrote:
Hi Spencer,

It is always a pleasure to hear from you, and it sounds like you are doing a ton of interesting stuff!

Regarding IPython Notebooks, there is a crowd of hardcore IPython users who want to do exactly what you describe, and they even have a collection of successful examples: https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#reproducible-academic-publications

When I last considered doing this, I thought it was promising, but not quite ready. A particular challenge is working with colleagues who like to write collaboratively in MS Word using Track Changes. As a hybrid solution, what I’ve been doing is using Python to insert values from calculations into blocks of text and then copy-pasting those into Word documents. This saves some of the pain of updating figures after new data comes in, etc., but is less burdensome on people I’m working with who do not prioritize exploring the cutting edge of technology (bleeding edge?).

I’ll make this into a blog, if you don’t mind. Maybe others will be interested, and/or have better ideas.

Here is an example of what my approach might look like:
import jinja2

tmpl = """
Results.
The birth prevalence of moderate and severe CHD in 2010 for males was
{{birth_prev_m}}
per 1,000
(95% Credible Interval [CrI] {{birth_prev_m_lb}}-{{birth_prev_m_ub}}),
and for females was
{{birth_prev_f}}
per 1,000
(95% CrI {{birth_prev_f_lb}}-{{birth_prev_f_ub}}).
From 1968 to 2010, mortality with heart defects (all ages)
declined {{all_mx_decline_pct}}%,
from
{{mx_per_100000_in_1968}}
to
{{mx_per_100000_in_2010}}
per 100 000 person-years (PY);
among zero to 51-week olds,
the decline was
{{u1_mx_decline_pct}}%,
from
{{u1_mx_per_100000_in_1968}}
to
{{u1_mx_per_100000_in_2010}}
per 100 000 PY. The estimated number of adults (age 20 to 65 years)
with moderate to severe CHD in 1968
was
{{adult_cases_1968}} (95% CrI {{adult_cases_1968_lb}}-{{adult_cases_1968_ub}}),
with
{{repro_cases_1968}} (95% CrI {{repro_cases_1968_lb}}-{{repro_cases_1968_ub}})
reproductive age females (age 15-49 years).
In 2010, it was
{{adult_cases_2010}} (95% CrI {{adult_cases_2010_lb}}-{{adult_cases_2010_ub}}),
an increase by a factor of
{{case_increase}} (95% CrI {{case_increase_lb}}-{{case_increase_ub}}),
with
{{repro_cases_2010}} (95% CrI {{repro_cases_2010_lb}}-{{repro_cases_2010_ub}})
reproductive age females.
"""

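# vals is a dict of calculated values, keyed by the variable names used in tmpl
t = jinja2.Template(tmpl)
print(t.render(vals))

# save the rendered text for pasting into the Word document
with open('achd/achd_abstract_results.txt', 'w') as f:
    f.write(t.render(vals))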
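
For the odds ratio case Spencer asks about, a dictionary of values to drop into text might be built along these lines. This is only a sketch under loud assumptions: the 2x2 counts are hypothetical, the odds ratio comes from a simple 2x2 table with a Wald-type interval rather than from a logistic regression, and it uses plain str.format instead of jinja2 to stay dependency-free. The same kind of dictionary could just as well be passed to a jinja2 template's render method.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald-type confidence interval from a 2x2 table.

    a, b: exposed cases and exposed non-cases
    c, d: unexposed cases and unexposed non-cases
    """
    odds_ratio = float(a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


# Hypothetical counts, for illustration only.
or_est, or_lb, or_ub = odds_ratio_with_ci(20, 80, 10, 90)

# Format once, so every mention in the text uses the same rounding.
vals = {
    "odds_ratio": "{:.2f}".format(or_est),
    "odds_ratio_lb": "{:.2f}".format(or_lb),
    "odds_ratio_ub": "{:.2f}".format(or_ub),
}

sentence = ("The odds ratio was {odds_ratio} "
            "(95% CI {odds_ratio_lb}-{odds_ratio_ub}).").format(**vals)
print(sentence)
```

When the data change, rerunning the script regenerates the sentence, so no number has to be retyped by hand.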

From: Spencer James

Those are some great examples — the biomed-related ones are especially helpful. Some of my more recent projects involve publicly-available datasets like SEER, and I’d like to start also making that effort to have my analytic code available/reproducible alongside the manuscript itself. The collaboration issue with MS Word is a headache in medicine, too — I like the use of Python to write the text itself as a happy medium, though!

If I can write this next SEER manuscript or abstract using a Notebook, I’ll send it your way, too.

Good hearing from you and I’ll follow up on your blog to see if anyone has some other ideas,

Spencer

Filed under software engineering

That Docker thing sounds promising

I missed this presentation, but I am going to figure out how to use Docker for reproducible research soon! http://benmarwick.github.io/UW-eScience-docker-for-reproducible-research/#1

Filed under software engineering

Open and Reproducible Research: Goals, Obstacles, and Solutions

A set of slides from a talk by Matthew Salganik crossed my inbox recently, titled “Open and Reproducible Research: Goals, Obstacles, and Solutions”. Good stuff! I liked the *bonus points* in the Data-is-available section:

bonus points for releasing extra variables that are not needed to reproduce specific analysis.

This gets at what I think is really the point of reproducible research: to make it faster and easier to make new knowledge.

Filed under science policy

Reproducible Computational Research by UW Folks

This interesting thing crossed my inbox during the quiet time between quarters:

Inspired by Dave and Randy’s presentations earlier in the quarter, our lab happened to publish two preprints today, both with supplemental GitHub repositories.

As mentioned several times, the reproducible part is hard. I would appreciate any feedback on our attempts to provide data and code, and how they might be improved. Of course you are welcome to comment on preprints if you wish.

1) Heare JE, Blake B, Davis JP, Vadopalas B, Roberts SB. (2014) Evidence of Ostrea lurida (Carpenter 1894) population structure in Puget Sound, WA. PeerJ PrePrints 2:e704v1 http://dx.doi.org/10.7287/peerj.preprints.704v1

GitHub Repo (Data and R scripts): https://github.com/jheare/OluridaSurvey2014

2) Indication of family-specific DNA methylation patterns in developing oysters
Claire E. Olson, Steven B. Roberts
bioRxiv doi: http://dx.doi.org/10.1101/012831

GitHub Repo (IPython notebook): https://github.com/che625/olson-ms-nb/tree/1.0

Any feedback on how we might improve our Repositories is certainly welcome.

Very daring. I hope it was ok to share on my blog. I find this level of transparency inspiring.

The discussion that ensued indicates that there is still room for better tools to archive the computational environment where these analyses are being performed. I’ve always dreamed of doing my whole project in a virtual machine and then freezing it for posterity when I’m done. It would be the digital version of keeping a laptop on my shelf for each analysis. Easier said than done, however.
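
Short of freezing a whole virtual machine, a much lighter first step is to record what the environment looked like when the analysis ran. Here is a minimal sketch using only the Python standard library (Python 3.8 or later, for importlib.metadata). The function names and the output file name are made up for illustration, and a record like this does not capture system libraries or data, so it is a note about the environment and not an archive of it.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect the Python version, platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            packages[name] = dist.version
    return {
        "python_version": sys.version,
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def write_snapshot(path):
    """Write the snapshot as JSON so it can be committed next to the analysis."""
    snap = snapshot_environment()
    with open(path, "w") as f:
        json.dump(snap, f, indent=2, sort_keys=True)
    return snap


snap = snapshot_environment()
print(snap["python_version"])
print("{} packages recorded".format(len(snap["packages"])))
# To save it alongside the results: write_snapshot("environment.json")
```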

The discussion also resulted in a new wiki listing code products that accompany UW research projects: https://github.com/uwescience/reproducible/wiki/Code-Products

Filed under software engineering