Here is something that Google did not help with as quickly as I would have expected: how do I convert start and stop times into the time between events in seconds (or minutes)?
Or, for the busy searcher, "how do I convert a Pandas Timedelta to seconds"?
The classy answer is:
start_time = df.interviewstarttime.map(pd.Timestamp)
end_time = df.interviewendtime.map(pd.Timestamp)
((end_time - start_time) / pd.Timedelta(minutes=1)).describe()
I found it hidden away here: http://www.datasciencebytes.com/bytes/2015/05/16/pandas-timedelta-histograms-unit-conversion-and-overflow-danger/
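For the busy searcher who actually wants seconds: the same division trick works with pd.Timedelta(seconds=1), and a timedelta series also has a .dt.total_seconds() accessor that does the same thing. A minimal self-contained sketch (the data frame here is made up for illustration):

import pandas as pd

# Made-up stand-in for the real interview start/end columns
df = pd.DataFrame({
    'interviewstarttime': ['2015/05/16 09:00', '2015/05/16 09:30'],
    'interviewendtime': ['2015/05/16 09:45', '2015/05/16 10:05'],
})

start_time = df.interviewstarttime.map(pd.Timestamp)
end_time = df.interviewendtime.map(pd.Timestamp)
elapsed = end_time - start_time

# Two equivalent ways to get the gap in seconds
print(elapsed / pd.Timedelta(seconds=1))
print(elapsed.dt.total_seconds())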
As a minor FYI,
df.foo.map(pd.Timestamp)
is pretty slow; it's analogous to calling map on a numpy array instead of using a numpy elementwise function. The preferred/faster version is:

start_time = pd.to_datetime(df.interviewstarttime)
Which is even quicker if you specify the format string rather than making it infer it:
start_time = pd.to_datetime(df.interviewstarttime, format='%Y/%m/%d %H:%M')
Cool, thanks! For my dataset, this doesn’t speed things up noticeably, but I’ll keep this tip in my back pocket in case I need it. How much data do you think I need before I see a difference?
%timeit start_time = df.interviewstarttime.map(pd.Timestamp)
10 loops, best of 3: 29.6 ms per loop
%timeit start_time = pd.to_datetime(df.interviewstarttime)
10 loops, best of 3: 27.9 ms per loop
My table has a couple million rows in it, definitely noticeable at that point 🙂
🙂
The format string is really the key, since it lets pandas skip trying to infer the format for each individual string:
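Something along these lines, with a made-up column of distinct timestamps (recent pandas versions cache repeated values in to_datetime, so identical strings would make the comparison misleading); exact numbers will vary by machine and data:

import pandas as pd

dates = pd.Series(
    pd.date_range('2015-01-01', periods=2_000_000, freq='min')
      .strftime('%Y/%m/%d %H:%M'))

%timeit dates.map(pd.Timestamp)
%timeit pd.to_datetime(dates)
%timeit pd.to_datetime(dates, format='%Y/%m/%d %H:%M')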
Nice demonstration. Now that you’ve got me curious, I wonder what happens to this 30x speed-up as a function of N. Seems like it shines even more:
N         map          to_dt        to_dt_w_fmt  speed_up
10000000  3243.196088  2030.612348    89.009663  36.436450
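For anyone who wants to reproduce a table like this, here is a rough sketch of the benchmark loop. The function and the synthetic data are mine, not from the original; times come from a single timeit run per size:

import timeit
import pandas as pd

def bench(n):
    # Distinct timestamp strings so to_datetime's value cache
    # can't short-circuit the parsing work
    dates = pd.Series(
        pd.date_range('2015-01-01', periods=n, freq='min')
          .strftime('%Y/%m/%d %H:%M'))
    t_map = timeit.timeit(lambda: dates.map(pd.Timestamp), number=1)
    t_to_dt = timeit.timeit(lambda: pd.to_datetime(dates), number=1)
    t_fmt = timeit.timeit(
        lambda: pd.to_datetime(dates, format='%Y/%m/%d %H:%M'), number=1)
    print(n, t_map, t_to_dt, t_fmt, t_map / t_fmt)

for n in (10_000, 100_000, 1_000_000):
    bench(n)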