The Python Pandas DataFrame object has become the mainstay of my data manipulation work over the last two years. One thing that I like about it is the `.describe()` method, that computes lots of interesting things about columns of a table. I often want those results stratified, and `.groupby(col)` + `.describe()` is a powerful combination for doing that.
*But* today, and many days, I don’t want all of the things that `.describe()` describes. And the ones that I do want, I want as columns. Here is the recipe for that:
import pandas as pd df = pd.DataFrame({'A': [0,0,0,0,1,1], 'B': [1,2,3,4,5,6], 'C': [8,9,10,11,12,13]}) df.groupby('A').describe().unstack()\ .loc[:,(slice(None),['count','mean']),]
and out comes just what I wanted:
B C count mean count mean A 0 4 2.5 4 9.5 1 2 5.5 2 12.5
It took me a while to figure this out, and these docs helped:
http://pandas.pydata.org/pandas-docs/stable/reshaping.html#reshaping-by-stacking-and-unstacking
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-xs
Here it is as a ipython notebook.
(Note: this requires Pandas version at least 0.14.)