Greg Wilson has sparked an interesting discussion in the last little while about writing automated tests for scientific code. Here is his blog post about it, which ends with a request for input on how you would unit test this physics simulation benchmark.
I’ve been thinking about testing recently myself, so this discussion was well timed. For me, the answer is that it is too late… you need to think about, and maybe even write, your tests _before_ you write your n-body simulation, or whatever. The question is also too removed from context: the point of automatic tests is that you can run them again and again, but why would you run them again? It all depends on what you are going to change. If I’m reading this right, the reason Debian developers are interested in reference implementations of the n-body problem is to compare the speed of the same algorithm implemented in different programming languages. So the most important test is really a “regression test”: does the generated output match the expected output?
In fact, exactly this test is what is recommended:
> `ndiff -abserr 1.0e-8` program output N = 1000 with this output file to check your program is correct before contributing.
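A minimal Python sketch of that kind of check might look like the following; the file names are hypothetical, and the real `ndiff` tool handles cases this sketch ignores (mismatched line counts, relative error, and so on):

```python
# Sketch of an ndiff-style regression test: compare each whitespace-
# separated field of the program's output against a stored reference,
# allowing an absolute error of 1.0e-8 on numeric fields.
# The file names here are hypothetical.
ABS_ERR = 1.0e-8

def outputs_match(actual_path, expected_path, abserr=ABS_ERR):
    with open(actual_path) as actual, open(expected_path) as expected:
        for line_a, line_e in zip(actual, expected):
            fields_a, fields_e = line_a.split(), line_e.split()
            if len(fields_a) != len(fields_e):
                return False
            for tok_a, tok_e in zip(fields_a, fields_e):
                try:
                    # Numeric fields must agree to within abserr...
                    if abs(float(tok_a) - float(tok_e)) > abserr:
                        return False
                except ValueError:
                    # ...and non-numeric fields must match exactly.
                    if tok_a != tok_e:
                        return False
    return True

assert outputs_match("nbody_output_n1000.txt", "expected_n1000.txt")
```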
Some of the things I want to test over and over and over again are:

- Is the input data formatted correctly?
- Does it look reasonable?
- Did I convert dates correctly?
- Did I make a change that breaks something which I will not see for hours (or days) when running on my full dataset?
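If I were writing those as automated checks in Python, they might look something like this pytest-style sketch; the file name, column layout, temperature bounds, and date format are all assumptions for the sake of illustration:

```python
# Hypothetical pytest-style checks for the recurring questions above;
# the input file, its columns, and the sanity bounds are all made up.
import csv
import datetime

def load_rows(path="input.csv"):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def test_input_is_formatted_correctly():
    # The file should be non-empty and have exactly the expected columns.
    rows = load_rows()
    assert rows and set(rows[0]) == {"date", "station", "temperature"}

def test_values_look_reasonable():
    # Values outside this range almost certainly signal bad input.
    assert all(-90.0 <= float(r["temperature"]) <= 60.0 for r in load_rows())

def test_dates_convert_correctly():
    # Pin the date conversion with a known fixed point.
    parsed = datetime.datetime.strptime("2011-03-15", "%Y-%m-%d").date()
    assert parsed == datetime.date(2011, 3, 15)
```

Checks like these are cheap to run on every change, which is exactly what you want when a mistake would otherwise only surface hours into a run on the full dataset.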