This interesting thing crossed my inbox during the quiet time between quarters:
Inspired by Dave and Randy’s presentations earlier in the quarter, our lab happened to publish two preprints today, both with supplemental GitHub repositories.
As mentioned several times, the reproducible part is hard. I would appreciate any feedback on our attempts to provide data and code, and how they might be improved. Of course you are welcome to comment on preprints if you wish.
1) Heare JE, Blake B, Davis JP, Vadopalas B, Roberts SB. (2014) Evidence of Ostrea lurida (Carpenter 1894) population structure in Puget Sound, WA. PeerJ PrePrints 2:e704v1 http://dx.doi.org/10.7287/peerj.preprints.704v1
GitHub Repo (Data and R scripts): https://github.com/jheare/OluridaSurvey2014
2) Indication of family-specific DNA methylation patterns in developing oysters
Claire E. Olson, Steven B. Roberts
bioRxiv doi: http://dx.doi.org/10.1101/012831
GitHub Repo (IPython notebook): https://github.com/che625/olson-ms-nb/tree/1.0
Any feedback on how we might improve our repositories is certainly welcome.
Very daring. I hope it was ok to share on my blog. I find this level of transparency inspiring.
The discussion that ensued indicates that there is still room for better tools to archive the computational environment where these analyses are being performed. I’ve always dreamed of doing my whole project in a virtual machine and then freezing it for posterity when I’m done. It would be the digital version of keeping a laptop on my shelf for each analysis. Easier said than done, however.
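Short of freezing a whole virtual machine, one much lighter-weight step is to save a record of the environment alongside the results. The sketch below is only an illustration of that idea, not something from the discussion: it assumes a Python-based analysis on Python 3.8 or later, and the function names and the output filename are made up for the example. It records the interpreter version, the operating system, and the installed package versions as JSON. It does not capture system libraries, R packages, or data, so it is a complement to a frozen VM rather than a replacement.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Return a dict describing the interpreter, OS, and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        # Sort by lowercased name so the file diffs cleanly between runs.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def write_snapshot(path):
    """Write the snapshot as JSON (hypothetical helper) and return it."""
    snapshot = snapshot_environment()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle, indent=2)
    return snapshot


if __name__ == "__main__":
    # The filename is an arbitrary choice for this example.
    write_snapshot("environment_snapshot.json")
```

Committing the resulting file to the same repository as the analysis code at least tells a later reader which versions produced the published results.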
The discussion also resulted in a new wiki listing code products that accompany UW research projects: https://github.com/uwescience/reproducible/wiki/Code-Products
I’ve recently begun working with Vagrant and virtual machines. With Vagrant you can define the exact OS, packages, Python libs, R libs, etc. For example, if I have a specific environment, I can package it up, share it in the cloud, and someone else can download and start my exact environment. It’s very interesting. An example is here: http://datasciencetoolbox.org/
@joelotz2014: Yes! Vagrant sounds so promising, but I have not had time to figure out if it will work for me. Does the dst site have good advice about doing the packaging/sharing, or is it mostly about getting a specific shared environment running locally?