Tag Archives: python

Python Pandas Intros

I’m going to give a Python Pandas guest lecture in the Python Science class next week, and I thought I’d take a look at the Pandas intros that are out there. There are a lot now! Here are some that I flipped through:

http://pandas.pydata.org/pandas-docs/stable/10min.html

http://nbviewer.ipython.org/gist/fonnesbeck/5850375

http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/

http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/

http://www.gregreda.com/2013/10/26/using-pandas-on-the-movielens-dataset/

http://synesthesiam.com/posts/an-introduction-to-pandas.html

http://www.datarobot.com/blog/introduction-to-python-for-statistical-learning/

http://www.kevinsheppard.com/images/0/09/Python_introduction.pdf

http://blog.kaggle.com/2013/01/17/getting-started-with-pandas-predicting-sat-scores-for-new-york-city-schools/

It’s fun being a teacher in the age of information.

Filed under education

MCMC in Python: sim and fit with same model

Here is a GitHub issue and solution that I saw the other day. I think it’s a nice pattern.

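import pymc  # written against the PyMC 2 API

# Assumed values for illustration only: the snippet uses these two names
# without defining them.
true_param = 0.0
tau = 1.0

def generate_model(values={'mu': true_param, 'm': None}):
    # A node that is given a value is marked observed (held fixed);
    # a node that is given None is left free.

    # prior
    mu = pymc.Uniform("mu", lower=-10, upper=10, value=values['mu'],
        observed=(values['mu'] is not None))

    # likelihood function
    m = pymc.Normal("m", mu=mu, tau=tau, value=values['m'],
        observed=(values['m'] is not None))

    return locals()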

Filed under statistics

Tabular Data in Python: Getting just the columns I want from pandas.DataFrame.describe

The Python Pandas DataFrame object has become the mainstay of my data manipulation work over the last two years. One thing that I like about it is the `.describe()` method, which computes summary statistics (count, mean, standard deviation, quantiles, and so on) for the columns of a table. I often want those results stratified, and `.groupby(col)` + `.describe()` is a powerful combination for doing that.

*But* today, and many days, I don’t want all of the things that `.describe()` describes. And the ones that I do want, I want as columns. Here is the recipe for that:

import pandas as pd

df = pd.DataFrame({'A': [0,0,0,0,1,1],
                   'B': [1,2,3,4,5,6],
                   'C': [8,9,10,11,12,13]})

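# Move the describe() statistics from the row index into the columns,
# then keep only the 'count' and 'mean' entries for every column.
df.groupby('A').describe().unstack()\
    .loc[:,(slice(None),['count','mean'])]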

and out comes just what I wanted:

       B            C
   count  mean  count  mean
A
0      4   2.5      4   9.5
1      2   5.5      2  12.5

It took me a while to figure this out, and these docs helped:

http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-stacking-and-unstacking

http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-xs

Here it is as an IPython notebook.

(Note: this requires Pandas version at least 0.14.)
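In newer Pandas releases (0.20 and later, as far as I know), `.groupby(col).describe()` already puts the statistics in the columns, so the `.unstack()` step in the recipe no longer does what it did here. A minimal sketch of one alternative, which is not the recipe from this post, is to ask for only the wanted statistics with `.agg`:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 0, 1, 1],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [8, 9, 10, 11, 12, 13]})

# Passing a list of statistic names to agg() gives one
# (column, statistic) pair per output column, grouped by 'A'.
summary = df.groupby('A').agg(['count', 'mean'])
print(summary)
```

The result has the same layout as the table above: one row per value of `A`, with `count` and `mean` columns under each of `B` and `C`. One difference to keep in mind is that `count` comes back as an integer here.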

Filed under software engineering

MCMC in Python: a bake-off

While I’m on a microblogging spree, I’ve been meaning to link to this informative comparison of pymc, emcee, and pystan: http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/

Filed under statistics

MCMC in Python: Estimating failure rates from observed data

A question and answer on CrossValidated, which made me reflect on the danger of knowing enough statistics to be dangerous.

Filed under statistics

MCMC in Python: How to make a custom sampler in PyMC

The PyMC documentation is a little slim on the topic of defining a custom sampler, and I had to figure it out for some DisMod work over the years. Here is a minimal example of how I did it, in answer to a CrossValidated question.

Filed under MCMC

IDV in Python: Interactive heatmap with Pandas and mpld3

I’ve been having a good time following the development of the mpld3 package, and I think it has a lot of potential for making interactive data visualization part of my regular workflow instead of that special something extra. A few weeks ago, an mpld3 user showed up with an interesting challenge, and solved their own problem quite well.

I finally got a chance to look at it today, and with a little spit-and-polish this could be something really useful for me.

Filed under dataviz, software engineering