I almost didn’t share these HarleMCMC videos, but how long could I resist, really?

We’ll see how this holds up to repeated viewing…

Here is a math/dance video for the ages:

Comments Off on Math and Dance

Filed under MCMC, Uncategorized

I’ve always enjoyed the Gaussian Process part of the PyMC package, and a question on the mailing list yesterday reminded me of a project I worked on with it that never came to fruition: how to implement constraints on the derivatives of the GP.

The best answer I could come up with is to use “potential” nodes and do it approximately. That is to say, instead of constraining the derivative, I content myself with constraining a secant that approximates the derivative. And instead of constraining it at every point in an interval, I content myself with constraining it at a discrete subset of points.
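Here is a minimal sketch of the secant idea in plain Python rather than PyMC. Given the values `fs` of one GP realization on a grid `xs` (the discrete subset of points), the log-potential is zero when every secant slope satisfies the constraint and minus infinity otherwise. The function names and the specific constraint (slope at least `min_slope`, i.e. approximately monotone) are hypothetical choices for illustration; in PyMC 2 a function like this would be attached to the model as a potential node, which I don't reproduce here.

```python
import math


def secant_slopes(xs, fs):
    """Secant (finite-difference) slopes between consecutive grid points."""
    return [(fs[i + 1] - fs[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


def monotone_log_potential(xs, fs, min_slope=0.0):
    """Log-potential for an approximate derivative constraint (hypothetical helper).

    Returns 0.0 when every secant slope is at least min_slope and -inf
    otherwise, so adding it to a log-posterior rejects any GP realization
    whose values on the grid xs violate the secant-approximated constraint.
    """
    if all(slope >= min_slope for slope in secant_slopes(xs, fs)):
        return 0.0
    return -math.inf


# Hypothetical usage: values of one GP realization evaluated on a coarse grid.
grid = [0.0, 0.5, 1.0, 1.5]
print(monotone_log_potential(grid, [0.0, 0.25, 1.0, 2.25]))  # 0.0 (increasing)
print(monotone_log_potential(grid, [0.0, 0.3, 0.2, 0.5]))    # -inf (dips, rejected)
```

The approximation is visible in the code: a realization can wiggle between grid points without being penalized, so a finer grid tightens the constraint at the cost of more evaluations.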

Comments Off on MCMC in Python: (approximate) derivative-constrained Gaussian Processes with PyMC.gp

Filed under MCMC

I have had this idea for a while: go through the examples from the OpenBUGS webpage and port them to PyMC, so that I can be sure I’m not going much slower than I could be, and so that people can compare MCMC samplers “apples-to-apples”. But it’s easy to have ideas; acting on them takes more time.

So I’m happy that I finally found a little time to sit with Kyle Foreman and get started. We ported one example over, the “seeds” random effects logistic regression. It is a nice little example, and it also gave me a chance to put something in the IPython notebook, which I continue to think is a great way to share code.
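For readers who don’t know the example: as I understand the OpenBUGS version, each plate i has r[i] germinated seeds out of n[i], a logit-linear predictor in two covariates x1, x2 and their interaction, and a normal plate-level random effect b[i] with precision tau. Below is a plain-Python sketch of log p(r, b | alpha, tau), the part of the posterior a sampler evaluates over and over. It is not our PyMC port; the function names are hypothetical, and the priors on alpha and tau are left out.

```python
import math


def log_sigmoid(z):
    """log(1 / (1 + exp(-z))), computed without overflow."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_binom_logit(r, n, z):
    """log Binomial(r | n, p) with p given on the logit scale, logit(p) = z."""
    log_choose = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    # log(1 - p) equals log_sigmoid(-z)
    return log_choose + r * log_sigmoid(z) + (n - r) * log_sigmoid(-z)


def seeds_log_density(alpha, b, tau, r, n, x1, x2):
    """log p(r, b | alpha, tau) for a seeds-style random-effects logistic model.

    alpha = (alpha0, alpha1, alpha2, alpha12); for each plate i,
      logit(p[i]) = alpha0 + alpha1*x1[i] + alpha2*x2[i] + alpha12*x1[i]*x2[i] + b[i]
      r[i] ~ Binomial(n[i], p[i]),  b[i] ~ Normal(mean 0, precision tau).
    Priors on alpha and tau are deliberately omitted from this sketch.
    """
    a0, a1, a2, a12 = alpha
    total = 0.0
    for ri, ni, u, v, bi in zip(r, n, x1, x2, b):
        z = a0 + a1 * u + a2 * v + a12 * u * v + bi
        total += log_binom_logit(ri, ni, z)
        # Normal log-density of the random effect, precision tau (variance 1/tau).
        total += 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * bi * bi
    return total
```

Working on the logit scale throughout avoids computing p and then log(p), which breaks down when p rounds to exactly 0 or 1.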

Filed under MCMC, software engineering

When I was gushing about the Python data package pandas, commenter Rafael S. Calsaverini asked about combining it with PyMC, the Python MCMC package that I usually gush about. I had a few minutes free, so I gave it a try, and just for fun I did it in the new IPython notebook. It works, but it could work even better. See attached:

Filed under MCMC, software engineering

I learned last week about a Python package for doing MCMC estimation, called PyMCMC. It sounds sort of like something I’m always writing about, doesn’t it?

From my quick look, it appears that PyMCMC has some advanced sampling methods (like slice sampling) that are not yet implemented in PyMC. On the other hand, PyMC seems to have a more flexible modeling language, which permits formulating complex models without writing out likelihood functions explicitly.
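For anyone who hasn’t met slice sampling: each step draws a height uniformly under the density at the current point, then draws a new point uniformly from the “slice” of x values where the density exceeds that height. Below is a generic univariate sketch using the stepping-out and shrinkage procedures from Radford Neal’s 2003 paper on slice sampling. I haven’t checked how PyMCMC implements it, and the function name and the defaults for `w` and `max_steps` are my own choices.

```python
import math
import random


def slice_sample(log_f, x0, n_samples, w=1.0, max_steps=50, rng=None):
    """Univariate slice sampler with stepping-out and shrinkage.

    log_f: log of an (unnormalized) density; x0: starting point, which must
    have log_f(x0) > -inf; w: initial interval width; max_steps: cap on
    how many times the interval may be widened.
    """
    rng = rng or random.Random()
    x = x0
    samples = []
    for _ in range(n_samples):
        # 1. Slice height in log space: log y = log f(x) + log U = log f(x) - Exp(1).
        log_y = log_f(x) - rng.expovariate(1.0)
        # 2. Step out: place a width-w interval randomly around x, then widen
        #    each end until it falls outside the slice (or the cap is reached).
        left = x - w * rng.random()
        right = left + w
        j = int(max_steps * rng.random())
        k = max_steps - 1 - j
        while j > 0 and log_f(left) > log_y:
            left -= w
            j -= 1
        while k > 0 and log_f(right) > log_y:
            right += w
            k -= 1
        # 3. Shrink: propose uniformly in [left, right]; on rejection, move the
        #    endpoint on the proposal's side in to the proposal and try again.
        while True:
            x_new = rng.uniform(left, right)
            if log_f(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return samples


# Usage: draws from a standard normal, given only its log density up to a constant.
draws = slice_sample(lambda x: -0.5 * x * x, 0.0, 5000, rng=random.Random(0))
```

Part of the appeal is that there is no proposal scale that has to be tuned precisely: a poor choice of `w` costs extra density evaluations, but the chain still targets the right distribution.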

Has anyone used PyMCMC? How did it go for you?

Comments Off on PyMC and PyMCMC

Filed under MCMC

I’ve been thinking a lot about validating statistical models. My disease models are complicated, and there are many places to make a little mistake. People care about the numbers, so they will care if I make mistakes. My concern is grounded in experience: when I was re-implementing my disease modeling system, I realized that I had mis-parameterized part of the model, giving undue influence to observations with small sample sizes. Good thing I caught it before ~~anything was published based on the results~~I published anything based on the results!

How do I avoid this trouble going forward? A well-timed blog post from Statistical Modeling, Causal Inference, and Social Science highlights one way, described in a paper linked there. I like this approach, and I partially replicated it in PyMC. But I’m concerned about something the authors mention in their conclusion:

To help ensure that errors, when present, are apparent from the simulation results, we caution against using “nice” numbers for fixed inputs or “balanced” dimensions in these simulations. For example, consider a generic hyperprior scale parameter s. If software were incorrectly written to use s^2 instead of s, the software could still appear to work correctly if tested with the fixed value of s set to 1 (or very close to 1), but would not work correctly for other values of s.
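To make the concern concrete, here is a small plain-Python sketch of a posterior-quantile check, as I understand the approach in the linked paper: draw the parameter from its prior, simulate data, sample the posterior, and record where the true value falls among the posterior draws. If the software is correct, those quantiles are Uniform(0, 1) across replications. Everything specific here is my assumption for illustration: the toy model (theta ~ N(0, s^2), one observation y ~ N(theta, sigma^2), exact posterior draws), the `buggy` flag that uses s^2 where s belongs, the (count + 0.5)/(L + 1) quantile, and the normal approximation to the chi-square statistic.

```python
import math
import random
from statistics import NormalDist


def posterior_draws(y, s, sigma, n_draws, rng, buggy=False):
    """Exact posterior draws for theta ~ N(0, s^2), y | theta ~ N(theta, sigma^2).

    With buggy=True the prior scale is (wrongly) taken to be s**2 instead of s,
    mimicking the error described in the quoted passage.
    """
    prior_sd = s ** 2 if buggy else s
    post_prec = 1.0 / prior_sd ** 2 + 1.0 / sigma ** 2
    post_mean = (y / sigma ** 2) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]


def validation_z(s, sigma=5.0, n_reps=500, n_draws=1000, buggy=False, seed=0):
    """Posterior-quantile check; returns an approximately N(0, 1) statistic.

    Each replication draws theta from the prior, simulates y, samples the
    (possibly buggy) posterior, and records the quantile q of the true theta.
    For correct software the normal scores inv_cdf(q) are roughly N(0, 1),
    so the sum of their squares is roughly chi-square with n_reps dof.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    chi2 = 0.0
    for _ in range(n_reps):
        theta = rng.gauss(0.0, s)
        y = rng.gauss(theta, sigma)
        draws = posterior_draws(y, s, sigma, n_draws, rng, buggy)
        below = sum(d < theta for d in draws)
        q = (below + 0.5) / (n_draws + 1)  # keeps q strictly inside (0, 1)
        chi2 += std_normal.inv_cdf(q) ** 2
    # Normal approximation to chi-square(n_reps): mean n_reps, variance 2*n_reps.
    return (chi2 - n_reps) / math.sqrt(2 * n_reps)
```

With `s=1.0`, `validation_z(1.0)` and `validation_z(1.0, buggy=True)` return exactly the same number, because `1.0 ** 2 == 1.0`, so the bug is invisible. With `s=2.5` the buggy posterior is too wide, the true values bunch up near the middle quantiles, and the statistic should come out strongly negative while the correct version stays near zero.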

How do I avoid nice numbers in practice? I have an idea, but I’m not sure I like it. Does anyone else have ideas?

Also, my replication only works part of the time for my simple example, I guess because one of my errors is not enough of an error:

Comments Off on Validating Statistical Models

Filed under MCMC, software engineering