From: Spencer James
Sent: Monday, April 20, 2015 6:33 AM
To: Abraham D. Flaxman
Subject: writing manuscript in Notebook?
Hi Abie,
Hope things are well back in Seattle! I’m about halfway through the Epic Measures book, which has been a great read. It sure is interesting to understand more of the background story.
…
I have another question for you. Do you ever use IPython Notebook to write manuscripts or abstracts? I’ve been trying to figure out a way to work on manuscripts/abstracts such that I have the results/statistics directly embedded into the paper, so that I don’t have to continually copy/paste numbers into Word as we go through edits/reviews. For example, if I’m reporting an odds ratio, it’d be nice if I could populate that value in the text straight from the logistic regression rather than copy/paste it. I found a few publicly available notebooks where researchers had done something along these lines, but they tended not to be results/data heavy, so I thought I’d see if you had anything like this or knew of any better examples.
At any rate, hope all is well — it would be great to hear about what you’re working on these days as well.
Spencer
On Mon, Apr 20, 2015 at 1:59 PM, Abraham D. Flaxman wrote:
Hi Spencer,
It is always a pleasure to hear from you, and it sounds like you are doing a ton of interesting stuff!
Regarding IPython Notebooks, there is a crowd of hardcore IPython users who want to do exactly what you describe, and they even have a collection of successful examples: https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#reproducible-academic-publications
When I last considered doing this, I thought it was promising, but not quite ready. A particular challenge is working with colleagues who like to write collaboratively in MS Word using Track Changes. As a hybrid solution, what I’ve been doing is using Python to insert values from calculations into blocks of text and then copy-pasting those into Word documents. This saves some of the pain of updating figures after new data comes in, etc., but is less burdensome on people I’m working with who do not prioritize exploring the cutting edge of technology (bleeding edge?).
I’ll make this into a blog post, if you don’t mind. Maybe others will be interested, and/or have better ideas.
Here is an example of what my approach might look like:
import jinja2
tmpl = """
Results.
The birth prevalence of moderate and severe CHD in 2010 for males was
{{birth_prev_m}}
per 1,000
(95% Credible Interval [CrI] {{birth_prev_m_lb}}-{{birth_prev_m_ub}}),
and for females was
{{birth_prev_f}}
per 1,000
(95% CrI {{birth_prev_f_lb}}-{{birth_prev_f_ub}}).
From 1968 to 2010, mortality with heart defects (all ages)
declined {{all_mx_decline_pct}}%,
from
{{mx_per_100000_in_1968}}
to
{{mx_per_100000_in_2010}}
per 100 000 person-years (PY);
among zero to 51-week olds,
the decline was
{{u1_mx_decline_pct}}%,
from
{{u1_mx_per_100000_in_1968}}
to
{{u1_mx_per_100000_in_2010}}
per 100 000 PY. The estimated number of adults (age 20 to 65 years)
with moderate to severe CHD in 1968
was
{{adult_cases_1968}} (95% CrI {{adult_cases_1968_lb}}-{{adult_cases_1968_ub}}),
with
{{repro_cases_1968}} (95% CrI {{repro_cases_1968_lb}}-{{repro_cases_1968_ub}})
reproductive age females (age 15-49 years).
In 2010, it was
{{adult_cases_2010}} (95% CrI {{adult_cases_2010_lb}}-{{adult_cases_2010_ub}}),
an increase by a factor of
{{case_increase}} (95% CrI {{case_increase_lb}}-{{case_increase_ub}}),
with
{{repro_cases_2010}} (95% CrI {{repro_cases_2010_lb}}-{{repro_cases_2010_ub}})
reproductive age females.
"""
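t = jinja2.Template(tmpl)
# vals is a dict mapping each template variable name to its formatted value
print(t.render(vals))
# save the rendered text so it can be pasted into the Word document
with open('achd/achd_abstract_results.txt', 'w') as f:
    f.write(t.render(vals))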
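The snippet above renders a dict called vals that is not shown in the email. A minimal sketch of how such a dict might be assembled from calculated numbers follows. The helper names (fmt_rate, fmt_count, add_estimate) are hypothetical, and every number is a made-up placeholder rather than a result from the CHD analysis; only the key names come from the template.

```python
# Hypothetical helpers and made-up placeholder numbers, for illustration only.

def fmt_rate(x, digits=1):
    """Format a rate with a fixed number of decimal places."""
    return '{:.{d}f}'.format(x, d=digits)

def fmt_count(x):
    """Format a count rounded to an integer, with thousands separators."""
    return '{:,.0f}'.format(x)

def add_estimate(vals, name, mean, lb, ub, fmt):
    """Store an estimate and its interval bounds as name, name_lb, name_ub."""
    vals[name] = fmt(mean)
    vals[name + '_lb'] = fmt(lb)
    vals[name + '_ub'] = fmt(ub)

vals = {}

# Placeholder estimate with interval bounds (not real results)
add_estimate(vals, 'birth_prev_m', 1.234, 1.111, 1.377, fmt_rate)
add_estimate(vals, 'adult_cases_2010', 123456.7, 100000.2, 150000.9, fmt_count)

# Percent decline computed from two placeholder rates
mx_1968, mx_2010 = 2.0, 1.5
vals['mx_per_100000_in_1968'] = fmt_rate(mx_1968)
vals['mx_per_100000_in_2010'] = fmt_rate(mx_2010)
vals['all_mx_decline_pct'] = '{:.0f}'.format(100 * (mx_1968 - mx_2010) / mx_1968)
```

Doing the rounding and formatting while building the dict keeps the template itself free of number-formatting logic, so the text block stays easy to read and edit. In a real analysis the placeholder numbers would be replaced by values pulled from the model output.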
From: Spencer James
Those are some great examples — the biomed-related ones are especially helpful. Some of my more recent projects involve publicly-available datasets like SEER, and I’d like to start also making that effort to have my analytic code available/reproducible alongside the manuscript itself. The collaboration issue with MS Word is a headache in medicine, too — I like the use of Python to write the text itself as a happy medium, though!
If I can write this next SEER manuscript or abstract using a Notebook, I’ll send it your way, too.
Good hearing from you and I’ll follow up on your blog to see if anyone has some other ideas,
Spencer
Hi,
I am exploring a similar question myself and documenting my findings in my blog: http://robclewley.github.io
I am hoping that new collaborative authoring tools such as Authorea will eat away at this annoying insistence that so many science folk still have on MS Word. Word is absolutely not meant to have anything to do with technical writing involving math or code! There are sadly not so many replacements for tracked changes, though, unless you’re good at reading `diff` output. But your copy-paste solution still gives me the shivers…
To summarize my findings with IPython so far: it’s *sometimes* useful and appropriate. Specifically, more involved computational examples tend to require more code modularity (if you’re “doing it properly”), implying multiple modules. These modules can’t all be presented on the same page (by definition!), so the notebooks are great for scenarios that can be written out entirely as single, self-contained scripts. That *is* fine for many examples, including some manuscripts, but sometimes it encourages bad programming style to fit the situation to the tool, rather than the correct (other) way around.
In my computational science, my “manuscripts” often present new software tools and applications as part of the topic of the manuscript. This code requires more structure and commentary than I can force into a single notebook format. For instance, I can’t add metadata / markup *in the middle* of a code block: IPython code blocks have to be syntactically complete blocks. That doesn’t help me when I’m writing about my work.
Lastly, marking up references and citations with notebooks is also not there yet, although I believe it is being actively worked on. Jupyter is rapidly evolving and there are more and more cool things that it can do every month or two, it seems. For instance, GitHub *just* announced today that ipynb (Jupyter, to be more precise and general) notebooks are now natively rendered on GitHub pages, like markdown syntax already is. This is an awesome development!
Also, there are many plugins and extensions for Jupyter that give greater functionality for doing “smart” things between code and markup, or making the notebooks more compatible with other tools and workflows, so I have hope that within a year or two the answer might evolve to be a wholehearted “yes”. I know folk who are major contributors to the Jupyter ecosystem and they seem incredibly open and enthusiastic towards implementing whatever the scientists need to make it more usable.
Cheers,
Rob
Thanks for posting, it seems like you’ve got some really interesting stuff along these lines! Sorry about the nasty copy-and-paste solution… I’m looking for something better, too!
Also, O’Reilly have just announced first-class authoring using Jupyter on Friday, with an expectation that notebooks will convert to markdown before publication: https://beta.oreilly.com/ideas/jupyter-at-oreilly
oh, that is so cool!