My new favorite for pythonic data wrangling

I’ve written before about my search for the way to deal with data in python. It’s time to write again, though, because I have a new favorite: pandas, the panel data package.

There is copious and growing documentation for pandas, but it assumes a level of familiarity with python and numpy. I thought I’d write up some little example calculations that I’ve done with pandas recently, to complement the real docs with some “recipes”. You don’t really need to know python to use these, let alone numpy.

To begin, here are the creation and subset routines in pandas that do the same work that my last foray into this subject accomplished with the rec_array:

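import pandas
# one list per column; position i in each list belongs to row i
a = ['USA','USA','CAN']
b = [1,6,4]
c = [1990.1,2005.,1995.]
d = ['x','y','z']
# the dictionary keys become the column names
df = pandas.DataFrame({'country': a, 'age': b, 'year': c, 'data': d})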

This is cooler than a rec_array because you don’t have to dig in the docs for the constructor, and you can use a dictionary to name each column.
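To check what the dictionary constructor produced, here is a small sketch that rebuilds the same table and looks at its columns and types. Passing `columns=` to fix the column order is my addition, not something the recipe above requires; without it the order depends on your pandas version.

```python
import pandas

df = pandas.DataFrame(
    {'country': ['USA', 'USA', 'CAN'],
     'age': [1, 6, 4],
     'year': [1990.1, 2005., 1995.],
     'data': ['x', 'y', 'z']},
    columns=['country', 'age', 'year', 'data'],  # explicit column order (optional)
)

print(df.columns.tolist())  # the column names, in the order requested
print(df.dtypes)            # one dtype per column
print(len(df))              # number of rows
```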

You can select the subset of data relevant to a particular country-year-age thusly:

df[(df['country']=='USA') & (df['age']==6) & (df['year']==2005)]
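The one-liner above is easier to read when the mask is built one condition at a time. This is just a sketch of the same selection; the variable names are mine. The parentheses in the one-liner matter because `&` binds more tightly than `==` in python. The last line assumes a pandas recent enough to have `DataFrame.query`, which may not be true of older installs.

```python
import pandas

df = pandas.DataFrame({'country': ['USA', 'USA', 'CAN'],
                       'age': [1, 6, 4],
                       'year': [1990.1, 2005., 1995.],
                       'data': ['x', 'y', 'z']})

# each comparison gives a boolean Series with one entry per row
is_usa = df['country'] == 'USA'
is_age_6 = df['age'] == 6
is_2005 = df['year'] == 2005

# & combines the conditions row by row
mask = is_usa & is_age_6 & is_2005
subset = df[mask]
print(subset)

# Assumption: DataFrame.query exists in the installed pandas
same_subset = df.query("country == 'USA' and age == 6 and year == 2005")
```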

~~This is not as cool as a rec_array, because writing df['age'] has more characters than df.age, but I feel churlish to complain about it.~~
It’s good that I complained about my uncool df['age'] business, because I learned that df.age works, too, as long as you are using an up-to-date pandas.
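Here is a quick sketch of the two spellings side by side. The `count` column is a made-up example to show one catch I am sure of: when a column name collides with an existing DataFrame method, the attribute form gives you the method, so the bracket form is the safe one.

```python
import pandas

# 'count' is a hypothetical column, chosen because DataFrame.count is a method
df = pandas.DataFrame({'age': [1, 6, 4], 'count': [10, 20, 30]})

# attribute access and bracket access return the same column
same = (df.age == df['age']).all()
print(same)

# here the attribute is the method, not the column
print(callable(df.count))

# brackets always reach the column
print(df['count'].tolist())
```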

More substantial recipes to come. Is there already a cookbook out there?

5 responses to “My new favorite for pythonic data wrangling”

Ben

January 9, 2012 at 5:19 pm

Pandas can do data.age! When you add columns via a dictionary, their names get added to the data frame as attributes.
Abraham Flaxman

January 9, 2012 at 6:41 pm

Awesome, you’re right! I thought it didn’t because I was using an old version.
Rafael S. Calsaverini

January 17, 2012 at 9:46 pm

Hi!
Have you tried using pandas with pymc? Do these two packages talk well to one another?
Can I have a pymc stochastic that is a DataFrame or a Series?
Abraham Flaxman

January 17, 2012 at 9:50 pm

Good question… I expect that you can, so give it a try. Maybe I will too, in a future blog post.
Pingback: PyMC+Pandas: Poisson Regression Example | Healthy Algorithms

