Here is something that Google did not help with as quickly as I would have expected: how do I convert start and stop times into the time between events in seconds (or minutes)?
Or, for the busy searcher, "how do I convert a Pandas Timedelta to seconds"?
The classy answer is:
start_time = df.interviewstarttime.map(pd.Timestamp)
end_time = df.interviewendtime.map(pd.Timestamp)
((end_time - start_time) / pd.Timedelta(minutes=1)).describe()
I found it hidden away here: http://www.datasciencebytes.com/bytes/2015/05/16/pandas-timedelta-histograms-unit-conversion-and-overflow-danger/
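For the busy searcher who actually wants seconds: the same division trick works with pd.Timedelta(seconds=1), and a timedelta series also has a .dt.total_seconds() accessor that does the same thing. A minimal self-contained sketch (the data frame here is made up for illustration):

import pandas as pd

# Made-up stand-in for the real interview start/end columns
df = pd.DataFrame({
    'interviewstarttime': ['2015/05/16 09:00', '2015/05/16 09:30'],
    'interviewendtime': ['2015/05/16 09:45', '2015/05/16 10:05'],
})

start_time = df.interviewstarttime.map(pd.Timestamp)
end_time = df.interviewendtime.map(pd.Timestamp)
elapsed = end_time - start_time

# Two equivalent ways to get the gap in seconds
print(elapsed / pd.Timedelta(seconds=1))
print(elapsed.dt.total_seconds())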
As a minor FYI,
df.foo.map(pd.Timestamp)
is pretty slow; it's analogous to calling map on a numpy array instead of using a numpy elementwise function. The preferred/faster version is:

start_time = pd.to_datetime(df.interviewstarttime)
Which is even quicker if you specify the format string rather than making it infer it:
start_time = pd.to_datetime(df.interviewstarttime, format='%Y/%m/%d %H:%M')
Cool, thanks! For my dataset, this doesn’t speed things up noticeably, but I’ll keep this tip in my back pocket in case I need it. How much data do you think I need before I see a difference?
%timeit start_time = df.interviewstarttime.map(pd.Timestamp)
10 loops, best of 3: 29.6 ms per loop
%timeit start_time = pd.to_datetime(df.interviewstarttime)
10 loops, best of 3: 27.9 ms per loop
My table has a couple million rows in it, definitely noticeable at that point 🙂
🙂
The format string is really the key, since it lets pandas skip trying to infer the format for each individual string:
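Something along these lines, with a made-up column of distinct timestamps (recent pandas versions cache repeated values in to_datetime, so identical strings would make the comparison misleading); exact numbers will vary by machine and data:

import pandas as pd

dates = pd.Series(
    pd.date_range('2015-01-01', periods=2_000_000, freq='min')
      .strftime('%Y/%m/%d %H:%M'))

%timeit dates.map(pd.Timestamp)
%timeit pd.to_datetime(dates)
%timeit pd.to_datetime(dates, format='%Y/%m/%d %H:%M')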
Nice demonstration. Now that you’ve got me curious, I wonder what happens to this 30x speed-up as a function of N. Seems like it shines even more:
N         map          to_dt        to_dt_w_fmt  speed_up
10000000  3243.196088  2030.612348    89.009663  36.436450
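For anyone who wants to reproduce a table like this, here is a rough sketch of the benchmark loop. The function and the synthetic data are mine, not from the original; times come from a single timeit run per size:

import timeit
import pandas as pd

def bench(n):
    # Distinct timestamp strings so to_datetime's value cache
    # can't short-circuit the parsing work
    dates = pd.Series(
        pd.date_range('2015-01-01', periods=n, freq='min')
          .strftime('%Y/%m/%d %H:%M'))
    t_map = timeit.timeit(lambda: dates.map(pd.Timestamp), number=1)
    t_to_dt = timeit.timeit(lambda: pd.to_datetime(dates), number=1)
    t_fmt = timeit.timeit(
        lambda: pd.to_datetime(dates, format='%Y/%m/%d %H:%M'), number=1)
    print(n, t_map, t_to_dt, t_fmt, t_map / t_fmt)

for n in (10_000, 100_000, 1_000_000):
    bench(n)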