Healthy Algorithms

May 12, 2015 · 8:00 am

By no means unhelpful

Good advice from Density Estimation for Statistics and Data Analysis by Bernard W. Silverman:

https://books.google.com/books?id=e-xsrjsL7WkC&lpg=PP1&dq=density%20estimation%20silverman&pg=PA45#v=onepage&q=statistician%20scientist&f=false

Filed under statistics

Tagged as books

May 11, 2015 · 8:00 am

Updated systematic review guidelines

Preferred Reporting Items for a Systematic Review and Meta-analysis of Individual Participant Data: The PRISMA-IPD Statement
Lesley A. Stewart, PhD, Mike Clarke, DPhil, Maroeska Rovers, PhD, et al.
JAMA. 2015;313(16):1657 doi:10.1001/jama.2015.3656
http://jama.jamanetwork.com/article.aspx?articleid=2279718

Editorial: Researchers, Readers, and Reporting Guidelines; Robert M. Golub, MD; Phil B. Fontanarosa, MD
http://jama.jamanetwork.com/article.aspx?articleid=2279693

Filed under global health

Tagged as systematic review

May 8, 2015 · 8:00 am

Enhancing Team Science

Enhancing the Effectiveness of Team Science

The past half-century has witnessed a dramatic increase in the scale and complexity of scientific research. The growing scale of science has been accompanied by a shift toward collaborative research, referred to as “team science.” Scientific research is increasingly conducted by small teams and larger groups rather than individual investigators, but the challenges of collaboration can slow these teams’ progress in achieving their scientific goals. How does a team-based approach work, and how can universities and research institutions support teams? Enhancing the Effectiveness of Team Science synthesizes and integrates the available research to provide guidance on assembling the science team; leadership, education and professional development for science teams and groups. It also examines institutional and organizational structures and policies to support science teams and identifies areas where further research is needed to help science teams and groups achieve their scientific and translational goals. This report offers major public policy recommendations for science research agencies and policymakers, as well as recommendations for individual scientists, disciplinary associations, and research universities. Enhancing the Effectiveness of Team Science will be of interest to university research administrators, team science leaders, science faculty, and graduate and postdoctoral students.

http://www.nap.edu/catalog/19007/enhancing-the-effectiveness-of-team-science

Filed under science policy

Tagged as team science

May 7, 2015 · 8:00 am

Writing papers in the IPython Notebook

From: Spencer James
Sent: Monday, April 20, 2015 6:33 AM
To: Abraham D. Flaxman
Subject: writing manuscript in Notebook?

Hi Abie,

Hope things are well back in Seattle! I’m about halfway through the Epic Measures book, which has been a great read.. sure is interesting to understand more of the background story.

…

I have another question for you. Do you ever use IPython Notebook to write manuscripts or abstracts? I’ve been trying to figure out a way to work on manuscripts/abstracts such that I have the results/statistics directly embedded into the paper so that I don’t have to continually copy/paste numbers into Word as we go through edits/reviews, ie if I’m reporting an odds ratio, it’d be nice if I could just populate that value in the text itself straight from the logistic regression rather than copy/paste values. I found a few publicly available notebooks where researchers had done something along these lines, but they tended to not be results/data heavy, so I thought I’d see if you had anything like this or know of any better examples.

At any rate, hope all is well — it would be great to hear about what you’re working on these days as well.

Spencer

On Mon, Apr 20, 2015 at 1:59 PM, Abraham D. Flaxman wrote:
Hi Spencer,

It is always a pleasure to hear from you, and it sounds like you are doing a ton of interesting stuff!

Regarding IPython Notebooks, there is a crowd of hardcore IPython users who want to do exactly what you describe, and they even have a collection of successful examples: https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#reproducible-academic-publications

When I last considered doing this, I thought it was promising, but not quite ready. A particular challenge is working with colleagues who like to write collaboratively in MS Word using Track Changes. As a hybrid solution, what I’ve been doing is using Python to insert values from calculations into blocks of text and then copy-pasting those into Word documents. This saves some of the pain of updating figures after new data comes in, etc., but is less burdensome on people I’m working with who do not prioritize exploring the cutting edge of technology (bleeding edge?).

I’ll make this into a blog, if you don’t mind. Maybe others will be interested, and/or have better ideas.

Here is an example of what my approach might look like:
import jinja2


tmpl = """

Results.

The birth prevalence of moderate and severe CHD in 2010 for males was

{{birth_prev_m}}

per 1,000

(95% Credible Interval [CrI] {{birth_prev_m_lb}}-{{birth_prev_m_ub}}),

and for females was

{{birth_prev_f}}

per 1,000

(95% CrI {{birth_prev_f_lb}}-{{birth_prev_f_ub}}).

From 1968 to 2010, mortality with heart defects (all ages)

declined {{all_mx_decline_pct}}%,

from

{{mx_per_100000_in_1968}}

to

{{mx_per_100000_in_2010}}

per 100 000 person-years (PY);

among zero to 51-week olds,

the decline was

{{u1_mx_decline_pct}}%,

from

{{u1_mx_per_100000_in_1968}}

to

{{u1_mx_per_100000_in_2010}}

per 100 000 PY. The estimated number of adults (age 20 to 65 years)

with moderate to severe CHD in 1968

was

{{adult_cases_1968}} (95% CrI {{adult_cases_1968_lb}}-{{adult_cases_1968_ub}}),

with

{{repro_cases_1968}} (95% CrI {{repro_cases_1968_lb}}-{{repro_cases_1968_ub}})

reproductive age females (age 15-49 years).

In 2010, it was

{{adult_cases_2010}} (95% CrI {{adult_cases_2010_lb}}-{{adult_cases_2010_ub}}),

an increase by a factor of

{{case_increase}}  (95% CrI {{case_increase_lb}}-{{case_increase_ub}}),

with

{{repro_cases_2010}} (95% CrI {{repro_cases_2010_lb}}-{{repro_cases_2010_ub}})

reproductive age females.

"""
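t = jinja2.Template(tmpl)

# vals is a dict mapping each template variable name to its formatted value
print(t.render(vals))

# open() replaces the Python 2-only file() builtin
with open('achd/achd_abstract_results.txt', 'w') as f:
    f.write(t.render(vals))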
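
The template needs a `vals` dictionary of already-formatted numbers. One way such a dictionary might be assembled is sketched here, assuming each quantity is available as a list of posterior draws. This is a guess at the workflow, not the code behind the abstract: the `percentile` and `summarize` helpers are hypothetical, and the draws are made up for illustration.

```python
def percentile(sorted_xs, q):
    """Linear-interpolation percentile of an already-sorted list, q in [0, 100]."""
    pos = (len(sorted_xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac


def summarize(name, draws, fmt='{:.1f}'):
    """Return formatted mean and 95% interval bounds keyed as name, name_lb, name_ub."""
    xs = sorted(draws)
    mean = sum(xs) / len(xs)
    return {
        name: fmt.format(mean),
        name + '_lb': fmt.format(percentile(xs, 2.5)),
        name + '_ub': fmt.format(percentile(xs, 97.5)),
    }


# Made-up draws for illustration only; not results from any analysis.
fake_draws = [5.0 + 0.01 * i for i in range(101)]

vals = {}
vals.update(summarize('birth_prev_m', fake_draws))
print(vals)
```

Repeating the `summarize` call for each quantity fills in the remaining template variables, so the formatting rule (one decimal place here) lives in a single place.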

From: Spencer James

Those are some great examples — the biomed-related ones are especially helpful. Some of my more recent projects involve publicly-available datasets like SEER, and I’d like to start also making that effort to have my analytic code available/reproducible alongside the manuscript itself. The collaboration issue with MS Word is a headache in medicine, too — I like the use of Python to write the text itself as a happy medium, though!

If I can write this next SEER manuscript or abstract using a Notebook, I’ll send it your way, too.

Good hearing from you and I’ll follow up on your blog to see if anyone has some other ideas,

Spencer

Filed under software engineering

Tagged as reproducible research

May 6, 2015 · 8:00 am

Interesting stuff at Mozilla Science Lab

http://www.mozillascience.org/training

http://mozillascience.github.io/codeReview/intro.html

Filed under software engineering

May 5, 2015 · 8:00 am

Will this motivate my colleagues?

I’m not sure if the framing is quite right, but it should be inspirational: In Praise of Bad Programmers
http://cacm.acm.org/magazines/2010/1/55757-in-praise-of-bad-programmers/abstract

Filed under software engineering

May 4, 2015 · 8:00 am

A problem I know

A key problem in supporting research software development is that funding agencies in many countries do not view software development as an intellectual exercise worthy of a research grant.

From http://www.nature.com/nphys/journal/v11/n5/full/nphys3313.html

Filed under software engineering

March 10, 2015 · 12:00 pm

Interesting Q/A: autocorrelation for categorical var in MCMC

From http://stats.stackexchange.com/questions/10798/measures-of-autocorrelation-in-categorical-values-of-a-markov-chain, a question I run into from time to time:
> Are there any measures of auto-correlation for a sequence of observations of an (unordered) categorical variable?

An (accepted) answer that got me thinking:
> [L]ook directly at the convergence rate for the Markov chain.

My interpretation, in PyMC2 terms: run chain, calculate empirical transition probabilities for categorical variable, examine spectral gap.
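
A minimal sketch of that interpretation, using NumPy only and not the PyMC2 API: the trace is assumed to be a sequence of integer state labels, and a simulated chain with a known transition matrix stands in for real sampler output. The function names are hypothetical. Treating the categorical variable as a Markov chain on its own is itself an approximation, since in a larger model only the full state is Markov.

```python
import numpy as np


def empirical_transition_matrix(trace, n_states):
    """Row-normalized counts of the one-step transitions observed in the trace."""
    trace = np.asarray(trace, dtype=int)
    counts = np.zeros((n_states, n_states))
    for a, b in zip(trace[:-1], trace[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # a state with no outgoing transitions keeps an all-zero row
    return counts / row_sums


def spectral_gap(P):
    """One minus the second-largest eigenvalue modulus of P."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - moduli[1]


# Simulated stand-in for a sampler trace: a "sticky" 3-state chain.
# Its eigenvalues are 1, 0.85, 0.85, so the true spectral gap is 0.15.
true_P = np.array([[0.90, 0.05, 0.05],
                   [0.05, 0.90, 0.05],
                   [0.05, 0.05, 0.90]])
rng = np.random.RandomState(12345)
trace = [0]
for _ in range(20000):
    trace.append(rng.choice(3, p=true_P[trace[-1]]))

P_hat = empirical_transition_matrix(trace, n_states=3)
gap = spectral_gap(P_hat)
print(P_hat.round(3))
print('estimated spectral gap: %.3f' % gap)
```

A gap near 0 suggests the variable changes state slowly (or flips periodically), which plays the role of high autocorrelation; a gap near 1 suggests successive values are close to independent.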

Experimental notebook tk.

Filed under MCMC

March 2, 2015 · 8:00 am

Interesting Q/A: some good questions about data transformation

I’m continuing my class-prep practice of searching through Cross-Validated questions with tags corresponding to upcoming class topics, and here are some interesting ones I found about data transformations:

http://stats.stackexchange.com/questions/46418/why-is-the-square-root-transformation-recommended-for-count-data
http://stats.stackexchange.com/questions/1444/how-should-i-transform-non-negative-data-including-zeros
http://stats.stackexchange.com/questions/27951/when-are-log-scales-appropriate
http://stats.stackexchange.com/questions/90149/pitfalls-to-avoid-when-transforming-data
http://stats.stackexchange.com/questions/60777/what-are-the-assumptions-of-negative-binomial-regression

The last one isn’t really about data transformations, but is still interesting.

Filed under machine learning

Tagged as ai4hm

February 23, 2015 · 12:00 pm

Tables of Stacked Bars in mpl (but not mpld3)

Here is a little feature in Matplotlib that I never saw before: stacked bar plots with tables attached. Perhaps too ugly for my Iraq Mortality stacked bar charts, but definitely handy for exploratory work.

I learned about it because it doesn’t work in `mpld3`… just one more benefit of being part of an open-source project. It would be so cool to have an `mpld3` version with some interactivity included, since interactivity can address one pitfall of the stacked bar chart: the challenge of comparing lengths with different baselines.
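
For reference, a small sketch of the Matplotlib feature, using `Axes.bar` with the `bottom` argument for stacking and `Axes.table` to attach the table below the plot. The group and category names and the counts are invented for illustration, and the layout numbers are rough guesses that may need tuning.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Invented data: one row per stacked category, one column per bar.
columns = ['Group A', 'Group B', 'Group C']
rows = ['Category 1', 'Category 2']
data = np.array([[3, 5, 2],
                 [4, 1, 6]])

fig, ax = plt.subplots()
x = np.arange(len(columns))
bottom = np.zeros(len(columns))
for label, heights in zip(rows, data):
    # Each series is drawn on top of the running total of the previous ones.
    ax.bar(x, heights, width=0.6, bottom=bottom, label=label)
    bottom += heights

# Attach a table of the same numbers under the axes; its columns stand in for tick labels.
table = ax.table(cellText=[[str(v) for v in row] for row in data],
                 rowLabels=rows, colLabels=columns, loc='bottom')
ax.set_xticks([])
ax.set_xlim(-0.5, len(columns) - 0.5)
fig.subplots_adjust(left=0.25, bottom=0.25)  # leave room for the row labels and table
```

Calling `fig.savefig` with a filename, or `plt.show()` under an interactive backend, displays the result.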

Filed under dataviz

Tagged as IDV4GH, mpld3
