Validating Statistical Models

I’ve been thinking a lot about validating statistical models. My disease models are complicated, there are many places to make a little mistake. And people care about the numbers, so they will care if I make mistakes. My concern is grounded in experience; when I was re-implementing my disease modeling system, I realized that I mis-parameterized a bit of the model, giving undue influence to observations with small sample size. Good thing I caught it before ~~anything was published based on the results~~I published anything based on the results!

How do I avoid this trouble going forwards? A well-timed blog post from Statistical Modeling, Causal Inference, and Social Science highlights one way, described in a paper linked there. I like this and I partially replicated in PyMC. But I’m concerned about something, which the authors mention in their conclusion:

To help ensure that errors, when present, are apparent from the simulation results, we caution against using “nice” numbers for fixed inputs or “balanced” dimensions in these simulations. For example, consider a generic hyperprior scale parameter s. If software were incorrectly written to use s^2 instead of s, the software could still appear to work correctly if tested with the fixed value of s set to 1 (or very close to 1), but would not work correctly for other values of s.

How do I avoid nice numbers in practice? I have an idea, but I’m not sure I like it. Does anyone else have ideas?

Also, my replication only works part of the time for my simple example, I guess because one of my errors is not enough of an error:

Comments Off on Validating Statistical Models

Filed under MCMC, software engineering

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

Validating Statistical Models

Posts

Theory Blogs

some rights reserved

Pages

Archives

Meta

Validating Statistical Models

Share this:

Related

Posts

Theory Blogs

some rights reserved

Pages

Archives

Meta