Category Archives: software engineering

Ben Marwick on ‘The Conversation’: How computers broke science – and what we can do to fix it

-----Original Message-----
From: Reproducible On Behalf Of Ben Marwick
Sent: Monday, November 9, 2015 5:58 AM
Subject: [Reproducible] My article on ‘The Conversation’: How computers broke science – and what we can do to fix it

I wrote a short essay on reproducible research and how researchers use computers for a popular media outlet (citing UW eScience):

https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938

Please leave a comment at the bottom to help demonstrate to other readers that there really is a movement toward this way of working, and I’m not making it up!

Ben

Filed under software engineering

Using the sklearn grid_search tools

Scikit-learn has a really nice grid search module. It will soon be called model_selection, because it has grown beyond simple grid search. But here is the spirit of it:

import sklearn.svm, sklearn.grid_search, sklearn.datasets.samples_generator
parameters = {'kernel':('poly', 'rbf'), 'C':[.01, .1, 1, 10, 100]}
clf = sklearn.grid_search.GridSearchCV(
    sklearn.svm.SVC(probability=True),
    parameters,
    n_jobs=64)
X, y = sklearn.datasets.samples_generator.make_classification(n_samples=200, n_features=5, random_state=12345)
clf.fit(X, y)
clf.best_params_

And say you want to take a careful look at the results? They are all in there, too. http://nbviewer.ipython.org/gist/aflaxman/cb0660e602d361d06599
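For readers on a scikit-learn release where the rename to model_selection has already happened, here is a rough sketch of the same search, plus one way to take that careful look at the results. It assumes `GridSearchCV` is imported from `sklearn.model_selection`, that `make_classification` is imported directly from `sklearn.datasets`, and that the per-setting scores live in `cv_results_`; `n_jobs` is left at its default so it runs on a laptop. The column selection is my own choice for illustration, not what the linked notebook does.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Same parameter grid and toy data as in the post
parameters = {'kernel': ('poly', 'rbf'), 'C': [.01, .1, 1, 10, 100]}
X, y = make_classification(n_samples=200, n_features=5, random_state=12345)

# n_jobs is omitted here (assumption: default is fine for a small example)
clf = GridSearchCV(SVC(probability=True), parameters)
clf.fit(X, y)

# cv_results_ is a dict of arrays, one entry per parameter combination;
# a DataFrame makes it easy to sort and inspect
results = pd.DataFrame(clf.cv_results_)
columns = ['param_kernel', 'param_C', 'mean_test_score',
           'std_test_score', 'rank_test_score']
summary = results[columns].sort_values('rank_test_score')
print(summary)
```

Each row of `summary` is one kernel/C combination, sorted so the best-ranked setting comes first.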

Filed under machine learning, software engineering

My top 20 numpy calls

One fun thing about using the IPython Notebook as my lab book for all my research is that I can do “me”-search in my copious spare time, for example to see the top 20 `numpy` calls I’ve used this year:

In [1]:
import glob

In [2]:
lines = ''
for fname in glob.glob('*.py'):
    with open(fname) as f:
        lines += f.read()
        lines += '\n'

In [3]:
import re

# Find top np.* calls

In [9]:
np_calls = re.findall(r'np\.[\w\.]+', lines)
np_calls[:5]
Out[9]:
['np.linspace',
 'np.random.random',
 'np.random.normal',
 'np.sqrt',
 'np.random.normal']

In [10]:
import pandas as pd

In [12]:
pd.Series(np_calls).value_counts().head(20)
Out[12]:
np.array            219
np.random.normal    170
np.mean             130
np.random.seed      126
np.round            124
np.log              119
np.exp              114
np.linspace          96
np.random.choice     84
np.where             84
np.zeros             78
np.ones              65
np.dot               62
np.empty             62
np.sum               52
np.absolute          49
np.nan               47
np.arange            45
np.inf               38
np.sqrt              37

Number one thing: `np.array`! I wonder why I use that.
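If you want to try this kind of me-search without pandas, here is a minimal sketch of the same counting idea using only the standard library. The helper name `count_np_calls` and the `sample` string are made up for illustration; in practice you would pass in the text of your own files.

```python
import re
from collections import Counter


def count_np_calls(source):
    """Count each dotted name starting with `np.` in a string of source code."""
    return Counter(re.findall(r'np\.[\w.]+', source))


# Hypothetical stand-in for the contents of a few scripts
sample = '''
x = np.array([1, 2, 3])
y = np.random.normal(size=3)
z = np.array(x) + np.sqrt(y ** 2)
'''

counts = count_np_calls(sample)
print(counts.most_common(2))
```

Note that the pattern matches any dotted name after `np.`, so constants such as `np.nan` and `np.inf` get counted alongside real function calls, which is why they show up in the list above.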

Filed under software engineering

Smudge variant 2

Speaking of smudge variants, here is a multitouch drawing game that reminds me of Lite-Brite, if you know what I’m talking about. The 3-year-old calls it “candy smudge” or “game smudge” depending on mood: http://bl.ocks.org/aflaxman/raw/a18d97fd21ae6a0ac171/

Filed under software engineering

Smudge variant 1

Do you remember a year ago when I made a multitouch drawing thingie with d3js? https://healthyalgorithms.com/2014/12/12/introducing-smudge/

My 3-year-old and I modded it, and may we now share “castle smudge” with you: http://bl.ocks.org/aflaxman/raw/26fc5aa0e14b01754b0f/

Filed under software engineering

I asked for an improvement to mpld3, and somebody did it!

So cool! http://nbviewer.ipython.org/gist/aflaxman/153dc591c6b63578d9ec

If you don’t ask, how would somebody know that you want it?
https://github.com/jakevdp/mpld3/issues/312

Filed under dataviz, software engineering

I missed SciPy 2015, what did I miss?

Hours of video available online: http://www.analyticsvidhya.com/blog/2015/07/data-science-videos-scipy-2015/

Filed under software engineering

Porting PyMC2 models to PyMC3

It is time for me to start doing this. Here is a StackOverflow question and answer about it, if it is time for you, too: http://stackoverflow.com/questions/30798447/porting-pymc2-code-to-pymc3-hierarchical-model-for-sports-analytics/30853077#30853077

Filed under software engineering

MCMC in Python: Gaussian mixture model in PyMC3

PyMC3 is really coming along. I tried it out on a Gaussian mixture model that was the subject of some discussion on GitHub: https://github.com/pymc-devs/pymc3/issues/443#issuecomment-109813012

Here is my notebook: http://nbviewer.ipython.org/gist/aflaxman/64f22d07256f67396d3a

Filed under MCMC, software engineering, statistics

IDV in Python: Interactive Legend in MPLD3

We recently got a nice pull request for mpld3 that adds interactive legends. Another chance for me to use my new GIF animation recorder: https://github.com/jakevdp/mpld3/pull/299#issuecomment-110434953

Filed under dataviz, software engineering