Delta Time in Python: Simple calendar times with Pandas

Here is something that Google did not help with as quickly as I would have expected: how do I convert start and stop times into the time between events in seconds (or minutes)?

Or for the busy searcher “how do I convert Pandas Timedelta to seconds”?

The classy answer is:

start_time = df.interviewstarttime.map(pd.Timestamp)
end_time = df.interviewendtime.map(pd.Timestamp)

((end_time-start_time) / pd.Timedelta(minutes=1)).describe()

I found it hidden away here: http://www.datasciencebytes.com/bytes/2015/05/16/pandas-timedelta-histograms-unit-conversion-and-overflow-danger/

6 Comments

Filed under statistics

6 responses to “Delta Time in Python: Simple calendar times with Pandas

  1. Anonymous

    As a minor FYI, df.foo.map(pd.Timestamp) is pretty slow, its analogous to calling map on a numpy array instead of using a numpy elementwise function. The preferred/faster version is:

    start_time = pd.to_datetime(df.interviewstartime)

    Which is even quicker if you specify the format string rather than making it infer it:

    start_time = pd.to_datetime(df.interviewstartime, format='%Y/%m/%d %H:%M)

  2. Cool, thanks! For my dataset, this doesn’t speed things up noticeably, but I’ll keep this tip in my back pocket in case I need it. How much data do you think I need before I see a difference?


    %timeit start_time = df.interviewstarttime.map(pd.Timestamp)
    10 loops, best of 3: 29.6 ms per loop

    %timeit start_time = pd.to_datetime(df.interviewstarttime)
    10 loops, best of 3: 27.9 ms per loop

  3. Anonymous

    My table has a couple million rows in it, definitely noticable at that point ๐Ÿ™‚

  4. Anonymous

    The format string is key really, since it lets it skip trying to infer the format for each individual string:

    In [17]: N = 10000
    
    In [18]: dt = pd.date_range(end=pd.Timestamp.now(), periods=N, freq='min').to_series().reset_index(drop=True)
    
    In [19]: ss = dt.dt.strftime('%Y/%m/%d %H:%M')
    
    In [20]: %timeit ss.map(pd.Timestamp)
    1 loop, best of 3: 884 ms per loop
    
    In [21]: %timeit pd.to_datetime(ss)
    1 loop, best of 3: 860 ms per loop
    
    In [22]: %timeit pd.to_datetime(ss, format='%Y/%m/%d %H:%M')
    10 loops, best of 3: 28.4 ms per loop
    
  5. Nice demonstration. Now that you’ve got me curious, I wonder what happens to this 30x speed-up as a function of N. Seems like it shines even more:
    N map to_dt to_dt_w_fmt speed_up
    10000000 3243.196088 2030.612348 89.009663 36.436450