Tag Archives: python

Can I use unittest.mock in python?

A question I would like to answer. Perhaps this is the place to start: https://docs.python.org/3/library/unittest.mock.html
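Short answer, as far as I can tell: yes, it ships with the Python 3 standard library. Here is a minimal sketch of the kind of thing it is for. The `fetch_status` function and the `/health` path are made up for illustration; only the `mock.Mock` calls come from the library.

```python
from unittest import mock


def fetch_status(client):
    """Return 'ok' if the (hypothetical) client reports HTTP 200, else 'down'."""
    response = client.get("/health")
    return "ok" if response.status_code == 200 else "down"


# A Mock stands in for the real client; attributes are created on first access.
fake_client = mock.Mock()
fake_client.get.return_value.status_code = 200

status = fetch_status(fake_client)

# The mock records how it was called, so the interaction can be checked too.
fake_client.get.assert_called_once_with("/health")
```

The nice part is that no network or real client object is needed to exercise the logic in `fetch_status`.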

Filed under software engineering

Two ways to do categorical data in Python

Pandas has it: http://pandas-docs.github.io/pandas-docs-travis/categorical.html

Python has it, too: https://docs.python.org/3/library/enum.html

Will this make my life easier?
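To get a feel for the difference, here is a small sketch of both approaches side by side. The size labels and the `Size` class are invented for illustration; the sketch assumes a pandas recent enough to have `pd.CategoricalDtype`.

```python
from enum import Enum

import pandas as pd

# Way 1: a pandas categorical column with an explicit ordering.
sizes = pd.Series(["small", "large", "medium", "small"])
size_dtype = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
cat = sizes.astype(size_dtype)

codes = cat.cat.codes.tolist()  # integer code for each label
largest = cat.max()             # ordering makes min/max meaningful


# Way 2: a plain-Python enumeration of the same categories.
class Size(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


parsed = Size["LARGE"]  # look up a member by name
```

Roughly, the pandas version suits whole columns of data, while the enum version suits individual values passed around in ordinary code.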

Filed under software engineering

Scientific Constants

A fun pointer I picked up at the eScience institute recently is where to find constants in python: scipy.constants http://docs.scipy.org/doc/scipy/reference/constants.html

or, if you fancy units, astropy.constants http://docs.astropy.org/en/stable/constants/index.html
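A quick sketch of what using `scipy.constants` looks like (this uses only scipy, not astropy; the 500 nm wavelength is just an example value I picked):

```python
from scipy import constants

c = constants.c  # speed of light in vacuum, in m/s
h = constants.h  # Planck constant, in J s

# The physical_constants dict maps a name to (value, unit, uncertainty).
k_value, k_unit, k_uncertainty = constants.physical_constants["Boltzmann constant"]

# Example use: energy in joules of a photon with a 500 nm wavelength.
wavelength = 500e-9
photon_energy = h * c / wavelength
```

The values in `scipy.constants` are plain floats in SI units, so keeping track of units is up to you.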

Filed under disease modeling

What style should I use for my docstrings?

The numpy docstring style should be just fine: http://sphinxcontrib-napoleon.readthedocs.org/en/latest/example_numpy.html
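For reference, here is roughly what that style looks like on a small function. The function itself is a made-up example, not from any library.

```python
def rate_per_100k(cases, population):
    """Compute a crude rate per 100,000 people.

    Parameters
    ----------
    cases : int
        Number of observed cases.
    population : int
        Size of the population at risk; must be positive.

    Returns
    -------
    float
        Cases per 100,000 people.

    Raises
    ------
    ValueError
        If `population` is not positive.

    Examples
    --------
    >>> rate_per_100k(5, 200000)
    2.5
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return 1e5 * cases / population
```

The underlined section headers (Parameters, Returns, Raises, Examples) are what tools like napoleon look for when rendering the docs.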

Filed under software engineering

SciPy2014 Plotting Contest

I helped judge a plotting contest for the Scientific Python conference last summer. Who won? I don’t know, and a short web-searching binge didn’t turn up the answer. A lovely plot took 3rd place, and every entry is here (with source code). Good stuff for seeing how different groups do different tricks, and for checking what still doesn’t work in mpld3.

Filed under dataviz

Styling Excel with Pandas

I had a bunch of stylish tables to make once long ago, and I thought, “why don’t I do that automatically?” It would take longer the first time, but it would be faster in future iterations. Unfortunately, there never were any future iterations, but fortunately, it was more fun to research automatic generation of stylish tables than to do what I needed to get done.

The seeds I planted have started to sprout a little bit, though, and the latest pandas now supports openpyxl 2, which supports a lot of styling. So here is a start on the stylish table writing feature.

Filed under software engineering

Python Pandas Intros

I’m going to give a Python Pandas guest lecture in the Python Science class next week, and I thought I’d take a look at the Pandas intros that are out there. There are a lot now! Here are some that I flipped through:

http://pandas.pydata.org/pandas-docs/stable/10min.html
http://nbviewer.ipython.org/gist/fonnesbeck/5850375
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/
http://www.gregreda.com/2013/10/26/using-pandas-on-the-movielens-dataset/
http://synesthesiam.com/posts/an-introduction-to-pandas.html
https://www.youtube.com/watch?v=p8hle-ni-DM
http://www.datarobot.com/blog/introduction-to-python-for-statistical-learning/

http://blog.kaggle.com/2013/01/17/getting-started-with-pandas-predicting-sat-scores-for-new-york-city-schools/

It’s fun being a teacher in the age of information.

Filed under education

MCMC in Python: sim and fit with same model

Here is a github issue and solution that I saw the other day. I think it’s a nice pattern.

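import pymc  # this snippet uses the PyMC 2.x API

true_param = 0.0  # placeholder value; not defined in the original snippet
tau = 1.0         # placeholder precision; not defined in the original snippet

def generate_model(values={'mu': true_param, 'm': None}):
    # a node is held fixed (observed) whenever a value is supplied for it,
    # so the same function can simulate data or fit to it

    # prior
    mu = pymc.Uniform("mu", lower=-10, upper=10, value=values['mu'],
        observed=(values['mu'] is not None))

    # likelihood function
    m = pymc.Normal("m", mu=mu, tau=tau, value=values['m'],
        observed=(values['m'] is not None))

    return locals()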

Filed under statistics

Tabular Data in Python: Getting just the columns I want from pandas.DataFrame.describe

The Python Pandas DataFrame object has become the mainstay of my data manipulation work over the last two years. One thing that I like about it is the `.describe()` method, which computes lots of interesting things about the columns of a table. I often want those results stratified, and `.groupby(col)` + `.describe()` is a powerful combination for doing that.

*But* today, and many days, I don’t want all of the things that `.describe()` describes. And the ones that I do want, I want as columns. Here is the recipe for that:

import pandas as pd

df = pd.DataFrame({'A': [0,0,0,0,1,1],
                   'B': [1,2,3,4,5,6],
                   'C': [8,9,10,11,12,13]})

df.groupby('A').describe().unstack()\
    .loc[:, (slice(None), ['count', 'mean'])]

and out comes just what I wanted:

       B            C
   count  mean  count  mean
A
0      4   2.5      4   9.5
1      2   5.5      2  12.5

It took me a while to figure this out, and these docs helped:
http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-stacking-and-unstacking
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-xs

Here it is as an IPython notebook.

(Note: this requires Pandas version at least 0.14.)
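A shorter route that may be worth trying, and that does not depend on how `.describe()` lays out its output: ask `.agg()` for just the statistics wanted. This is a sketch using the same toy table; `summary` is just a name I picked.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0, 1, 1],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [8, 9, 10, 11, 12, 13]})

# Passing a list of statistic names to .agg() yields two-level columns:
# the original column on top, the statistic underneath.
summary = df.groupby('A').agg(['count', 'mean'])
```

The result has one row per value of `A` and a (column, statistic) pair for each column heading, much like the table above.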

Filed under software engineering

MCMC in Python: a bake-off

While I’m on a microblogging spree, I’ve been meaning to link to this informative comparison of pymc, emcee, and pystan: http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/

Filed under statistics