Software Carpentry on software testing

Greg Wilson has sparked an interesting discussion recently about writing automatic tests for scientific code. Here is his blog post about it, which ends with a request for input on how you would unit test this physics simulation benchmark.

I’ve been thinking about testing recently myself, so this discussion was well timed. For me, the answer is that it is too late… you need to think about, and maybe even write, your tests _before_ you write your n-body simulation, or whatever. It is also too removed from context. The point of automatic tests is that you can run them again and again. But why would you run them again? It all depends on what you are going to change. If I’m reading this right, the reason Debian developers are interested in reference implementations of the n-body problem is to compare the speed of this algorithm when implemented in different programming languages. So the most important test is really a “regression test”: does the output generated match the output expected?
In fact, precisely this test is recommended:

ndiff -abserr 1.0e-8 program output N = 1000 with this output file to check your program is correct before contributing.
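A regression check of this kind can be roughed out in a few lines of Python. This is only a sketch of the idea, not a reimplementation of ndiff: the function name `outputs_match` and the regular expression are my own, and it compares only the numbers in the two texts, ignoring everything else.

```python
import re

# Matches integers, decimals, and scientific notation (e.g. -0.169, 1.0e-8).
_NUMBER = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')

def outputs_match(actual, expected, abserr=1.0e-8):
    """Return True if both texts hold the same count of numbers and each
    pair differs by at most abserr (hypothetical helper, not ndiff)."""
    a = [float(s) for s in _NUMBER.findall(actual)]
    e = [float(s) for s in _NUMBER.findall(expected)]
    if len(a) != len(e):
        return False
    return all(abs(x - y) <= abserr for x, y in zip(a, e))
```

Reading the program output and the reference file into strings and passing them to `outputs_match` would give a pass/fail answer that can be rerun after every change.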

Some of the things I want to test over and over and over again are: Is the input data formatted correctly? Does it look reasonable? Did I convert dates correctly? Did I make a change that breaks something which I will not see for hours (or days) when running on my full dataset?
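Checks like these are cheap to write down. Here is a minimal sketch using only the standard library, assuming the input is a list of (start, end) date strings in month/day/year format; the function name `check_rows` and the two problem labels are made up for illustration.

```python
from datetime import datetime

def check_rows(rows, date_format='%m/%d/%Y %H:%M'):
    """Return a list of (row index, problem) pairs for rows of
    (start, end) date strings. Hypothetical helper for illustration."""
    problems = []
    for i, (start, end) in enumerate(rows):
        try:
            t0 = datetime.strptime(start, date_format)
            t1 = datetime.strptime(end, date_format)
        except ValueError:
            # the string is not formatted as a date at all
            problems.append((i, 'unparseable date'))
            continue
        if t1 < t0:
            # parsed fine, but does not look reasonable
            problems.append((i, 'end before start'))
    return problems
```

Running something like this on a small slice of the data takes seconds, which beats finding the problem hours into a run on the full dataset.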

Filed under software engineering

Dates and Times in Python: average of two dates with Pandas

I spent a little longer than expected figuring out how to find the midpoint of two dates for a little table of data recently. Here is a code snippet in case I (or you) have to do this again:

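import pandas as pd

# midpoint of two date columns
df = pd.DataFrame({'a': ['5/1/2012 0:00', '4/1/2014 0:00'],
                   'b': ['4/1/2014 0:00', 'unknown']})

# make time data into Timestamp format
def try_totime(t):
    try:
        return pd.Timestamp(t)
    except ValueError:
        # unparseable entries such as 'unknown' become missing values
        return pd.NaT
# assumed: column 'a' holds the start times and column 'b' the end times
df['start'] = df['a'].map(try_totime)
df['end'] = df['b'].map(try_totime)

# generate midpoint time
# harder than it would seem...
df['time'] = df.start + (df.end - df.start)/2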
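Another way to get the same midpoint, sketched here as an alternative rather than what I originally did, is to let `pd.to_datetime` do the parsing with `errors='coerce'`, which turns unparseable strings into `NaT`. The explicit `format` string is an assumption that the dates are month/day/year.

```python
import pandas as pd

df = pd.DataFrame({'a': ['5/1/2012 0:00', '4/1/2014 0:00'],
                   'b': ['4/1/2014 0:00', 'unknown']})

# Parse both columns; anything that does not fit the format becomes NaT.
fmt = '%m/%d/%Y %H:%M'  # assumed month/day/year ordering
start = pd.to_datetime(df['a'], format=fmt, errors='coerce')
end = pd.to_datetime(df['b'], format=fmt, errors='coerce')

# Timestamps cannot be added together, but a Timedelta can be halved
# and added back to the start.
midpoint = start + (end - start) / 2
```

Rows where either date is missing come out as `NaT` in `midpoint`, so the bad rows stay visible instead of raising an error.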



Filed under software engineering

What I’m Reading: The Design and Implementation of Probabilistic Programming Languages

A new online book crossed my screen recently: The Design and Implementation of Probabilistic Programming Languages. Looks good so far.

Filed under Uncategorized

Research Culture Questionnaire

Interesting questionnaire on research culture from CACM (article | questions). Would be fun to have a school of public health version…

Filed under science policy

My Coursera Obsession: Visual Perception and the Brain

Did I already mention this MOOC-watching habit I developed over the summer? I got sucked into watching lectures online from all sorts of classes. It is sort of like being in college again, but when I fall asleep during lecture, I can rewind when I wake up (if I want to).

One of the classes that I devoured video lectures from is Visual Perception and the Brain, taught by Duke neuroscience prof Dale Purves. It’s got a little bit of that evolutionary-psychologist-explains-everything flavor, and a lot of visual illusions to use-not-abuse in data visualizations.

I remembered it when watching animal videos with my two year old today (his choice). Here is something that 75 million years of primate evolution can do, and it needs quite the visual system to do so:


Filed under education

IDV in Python: Retrieve Data From Dynamic mpld3 plot in python

Mpld3 questions show up on Stack Overflow from time to time, too, and they can get really informative answers if they pull in the JavaScript experts. This one got a comprehensive answer that was perhaps too expert, so this follow-up was an opportunity to show off my interactive plot call-out plugin yet again.

Filed under software engineering

MCMC in Python: observed data for a sum of random variables in PyMC

I like answering PyMC questions on Stack Overflow, but sometimes I give an answer and end up the one with the question. For example, what would you model as the sum of a Poisson and a Negative Binomial?

Filed under statistics