The four pillars of probabilistic programming systems

I’ve been organizing my thoughts on probabilistic programming and Bayesian computation. Try this out: there are four things a probabilistic programming system needs, with their relative importance depending on who is using it and for what:

  • Expressive language for formulating models
  • Efficient computation of objective functions
  • Flexible inference algorithms
  • Appropriate data analysis workflow

Maybe I can come up with better names for these pieces, and maybe they are not all distinct. And maybe I am missing something. This is sort of preliminary. But let me elaborate on how each piece plays out in the case of PyMC.

Expressive language for formulating models: this is what drew me to PyMC when I started doing applied work five-ish years ago. Just write Python. For simple things, it reads as easily as equations in a stats paper, and for complex things it can have subroutines, data structures, and all of the nice things I expect from a modern programming language.
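To make the "just write Python" point concrete, here is a minimal, hypothetical sketch of a model written as an ordinary Python function: an unnormalized log-posterior for a simple linear regression, with a helper subroutine for the normal log-density. It does not use PyMC's API; the regression model, the priors, and all function names are assumptions chosen for illustration.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def log_posterior(alpha, beta, sigma, xs, ys):
    """Unnormalized log posterior for y_i ~ Normal(alpha + beta * x_i, sigma).

    Assumed priors (illustration only): alpha, beta ~ Normal(0, 10) and
    sigma ~ HalfNormal(10), the latter written up to an additive constant.
    """
    if sigma <= 0:
        return -math.inf  # sigma outside its support
    lp = normal_logpdf(alpha, 0.0, 10.0) + normal_logpdf(beta, 0.0, 10.0)
    lp += normal_logpdf(sigma, 0.0, 10.0)  # half-normal prior, up to a constant
    # Likelihood: one normal term per observation
    for x, y in zip(xs, ys):
        lp += normal_logpdf(y, alpha + beta * x, sigma)
    return lp


# Usage with a small made-up dataset
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 7.0]
print(log_posterior(1.0, 2.0, 0.5, xs, ys))
```

Because the model is just a function, it can call helpers, loop over data structures, and be tested like any other Python code.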

Efficient computation of objective functions: PyMC2 has a strange confection of Python and Fortran under the hood, which works well enough for the stuff I’ve been doing. But (if I understand correctly) PyMC3 pushes everything off into Theano, which represents the model as a symbolic computation graph, optimizes it, and compiles it: a more sophisticated translation of the code that also makes derivatives available automatically.
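The key idea behind a system like Theano is that the objective function is built as a graph of operations rather than executed directly, so a derivative can be derived mechanically from the same graph. The toy sketch below illustrates only that idea; Theano's real machinery (graph optimization, C code generation) is far more sophisticated, and every class here (`Expr`, `Var`, `Const`, `Add`, `Mul`) is hypothetical.

```python
class Expr:
    """Base class for nodes in a tiny symbolic expression graph."""

    def __add__(self, other):
        return Add(self, as_expr(other))

    def __radd__(self, other):
        return Add(as_expr(other), self)

    def __mul__(self, other):
        return Mul(self, as_expr(other))

    def __rmul__(self, other):
        return Mul(as_expr(other), self)

    def __sub__(self, other):
        return Add(self, Mul(Const(-1.0), as_expr(other)))


def as_expr(value):
    """Wrap plain numbers as constant nodes."""
    return value if isinstance(value, Expr) else Const(float(value))


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value

    def grad(self, name):
        return Const(0.0)


class Var(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]

    def grad(self, name):
        return Const(1.0 if name == self.name else 0.0)


class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, env):
        return self.a.evaluate(env) + self.b.evaluate(env)

    def grad(self, name):
        # Sum rule: (a + b)' = a' + b'
        return Add(self.a.grad(name), self.b.grad(name))


class Mul(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b

    def evaluate(self, env):
        return self.a.evaluate(env) * self.b.evaluate(env)

    def grad(self, name):
        # Product rule: (a * b)' = a' * b + a * b'
        return Add(Mul(self.a.grad(name), self.b), Mul(self.a, self.b.grad(name)))


# Usage: a normal log-density kernel centered at 3, and its derivative graph
x = Var("x")
f = -0.5 * (x - 3) * (x - 3)
df = f.grad("x")  # a new graph, built symbolically from f
print(f.evaluate({"x": 1.0}), df.evaluate({"x": 1.0}))
```

The derivative is itself a graph, so the same evaluation (or, in a real system, compilation) machinery serves both the objective and its gradient.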

Flexible inference algorithms: I think that a lot of the inspiration for PyMC3 is the possibility of using Hamiltonian Monte Carlo methods for generating MCMC steps, which requires quickly computing the gradient of the objective function (the log-posterior) with respect to every parameter. PyMC2, by contrast, has relied heavily on the Adaptive Metropolis step method, which needs no derivatives. In the past, I’ve had a lot of fun experimenting with alternative approaches.
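To show where the derivative enters, here is a minimal sketch of Hamiltonian Monte Carlo with a leapfrog integrator, targeting a one-dimensional standard normal chosen purely for illustration. It is not PyMC3's implementation: the step size and number of leapfrog steps are fixed by hand here, whereas practical samplers tune them automatically. Function names are hypothetical.

```python
import math
import random


def log_p(q):
    """Log density of a standard normal target, up to a constant."""
    return -0.5 * q * q


def grad_log_p(q):
    """Derivative of log_p; HMC calls this at every leapfrog step."""
    return -q


def hmc_step(q, rng, step_size=0.1, n_leapfrog=20):
    """One HMC transition from position q."""
    p = rng.gauss(0.0, 1.0)  # resample an auxiliary momentum
    q_new, p_new = q, p
    # Leapfrog integration: half momentum step, alternating full steps, half momentum step
    p_new += 0.5 * step_size * grad_log_p(q_new)
    for i in range(n_leapfrog):
        q_new += step_size * p_new
        if i != n_leapfrog - 1:
            p_new += step_size * grad_log_p(q_new)
    p_new += 0.5 * step_size * grad_log_p(q_new)
    # Metropolis correction on the total energy H(q, p) = -log_p(q) + p^2 / 2
    current_h = -log_p(q) + 0.5 * p * p
    proposed_h = -log_p(q_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
        return q_new
    return q


def sample(n, seed=0):
    """Draw n correlated samples with a fixed random seed."""
    rng = random.Random(seed)
    q, draws = 0.0, []
    for _ in range(n):
        q = hmc_step(q, rng)
        draws.append(q)
    return draws


draws = sample(5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # should be close to 0 and 1
```

Each proposal follows the gradient for many small steps, which is why a cheap, automatic derivative of the log-posterior matters so much for this family of methods.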

Appropriate data analysis workflow: I’ve had a few long discussions with other researchers who use these methods about the barriers that they and their colleagues face, and this is the part that seems most important. How do you get the data all in place to evaluate objective functions and run flexible inference algorithms? This is not really a core part of PyMC, but rather something to be done with general Python, which suits me just fine.
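As a small example of the "general Python" side of the workflow, here is a hypothetical helper that reads CSV text with the standard library and returns numeric columns ready to pass to an objective function. The CSV layout, the column names, and the choice to drop incomplete rows (rather than impute them) are all assumptions for illustration.

```python
import csv
import io


def load_columns(csv_text, columns):
    """Return {column: [floats]} from CSV text, skipping rows with missing values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    data = {name: [] for name in columns}
    for row in reader:
        # Short rows yield None from DictReader, so normalize to "" before stripping
        values = [(row.get(name) or "").strip() for name in columns]
        if any(v == "" for v in values):
            continue  # drop incomplete records rather than guess at them
        for name, v in zip(columns, values):
            data[name].append(float(v))
    return data


raw = "x,y,site\n0,1.1,a\n1,,b\n2,5.2,a\n3,7.0,c\n"
data = load_columns(raw, ["x", "y"])
print(data)
```

In practice this step often grows into merging sources, recoding variables, and checking for outliers, which is exactly the part that tends to dominate the effort.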

I’d love to workshop this a little bit with you, dear reader, so I’m going to try turning on comments again. I hope I don’t get spammed into oblivion.
