Category Archives: software engineering

Stylish tooltips in mpld3

I have added some stylish HTML tooltips to mpld3, so go make something pretty with them. Demonstration here.

Filed under dataviz, software engineering

Code review in the sciences

Software Carpentry has been doing a very interesting project on incorporating code review into the scientific process. The results of the first attempt are here, and the announcement of the second round is here. Maybe you will participate.

Filed under software engineering

Matplotlib and d3js, together at last

There is an exciting new project in pythonic interactive data visualization that I have my eye on: mpld3. It plays well with matplotlib-based pretty plotting packages, and has the beginnings of a plugin framework for adding custom interactivity.

I used it to mock up a Cartesian fisheye distortion plot, something I’ve wanted for DisMod-MR ever since I learned about the technique. (Sometimes the interactivity doesn’t work in that notebook and requires reloading everything; cutting-edge software has some rough edges.)
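For readers who have not seen one, a Cartesian fisheye magnifies the region around a focus point separately along each axis, squeezing everything else while keeping the plot edges fixed. The function below is a minimal sketch modeled on the distortion formula used by the d3 fisheye plugin, as I understand it; the names `fisheye` and `fisheye_xy`, the unit-square coordinates, and the default `distortion=3.0` are illustrative assumptions, not mpld3 or DisMod-MR code.

```python
def fisheye(x, focus, distortion=3.0, lo=0.0, hi=1.0):
    """Distort coordinate x in [lo, hi], magnifying the region near focus.

    The endpoints lo and hi stay fixed, points very close to the focus are
    spread apart by roughly a factor of (distortion + 1), and points far
    from the focus are squeezed together.
    """
    if x == focus:
        return focus
    left = x < focus
    # Distance from the focus to the edge on the same side as x.
    m = (focus - lo) if left else (hi - focus)
    if m == 0:
        m = hi - lo
    sign = -1.0 if left else 1.0
    return focus + sign * m * (distortion + 1) / (distortion + m / abs(x - focus))


def fisheye_xy(points, focus, distortion=3.0):
    """Cartesian fisheye on the unit square: distort x and y independently."""
    fx, fy = focus
    return [(fisheye(x, fx, distortion), fisheye(y, fy, distortion))
            for x, y in points]


if __name__ == "__main__":
    xs = [i / 10 for i in range(11)]
    print([round(fisheye(x, focus=0.5), 3) for x in xs])
```

Applying the distortion to each axis on its own is what makes it “Cartesian”; a radial fisheye would instead distort by distance from the focus point.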

Filed under dataviz, software engineering

The four pillars of probabilistic programming systems

I’ve been organizing my thoughts on probabilistic programming and Bayesian computation. Try this out. Depending on who is using it and for what, a probabilistic programming system needs four things:

  • Expressive language for formulating models
  • Efficient computation of objective functions
  • Flexible inference algorithms
  • Appropriate data analysis workflow

Maybe I can come up with better names for these pieces, and maybe they are not all different. And maybe I am missing something. This is sort of preliminary. But let me elaborate on how it works in the case of PyMC.

Expressive language for formulating models: this is what drew me to PyMC when I started doing applied work five-ish years ago. Just write Python. For simple things, it reads as easily as equations in a stats paper, and for complex things it can have subroutines, data structures, and all of the nice things I expect from a modern programming language.
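To make the “just write Python” point concrete without leaning on any particular PyMC version’s syntax, here is a hypothetical model written as plain Python: observations `y_i ~ Normal(mu, sigma)` with priors `mu ~ Normal(0, 10)` and `sigma ~ Uniform(0, 100)`. The priors, the function names, and the data are assumptions for illustration only. The `normal_logpdf` helper plays the role of a subroutine, and the returned log posterior is the kind of objective function the next pillar is about.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def uniform_logpdf(x, lo, hi):
    """Log density of Uniform(lo, hi) at x (-inf outside the support)."""
    return -math.log(hi - lo) if lo < x < hi else -math.inf


def log_posterior(mu, sigma, y):
    """Unnormalized log posterior; each line mirrors a line of the model."""
    lp = normal_logpdf(mu, 0.0, 10.0)         # mu ~ Normal(0, 10)
    lp += uniform_logpdf(sigma, 0.0, 100.0)   # sigma ~ Uniform(0, 100)
    if lp == -math.inf:
        return lp                             # outside prior support; skip likelihood
    for y_i in y:                             # y_i ~ Normal(mu, sigma)
        lp += normal_logpdf(y_i, mu, sigma)
    return lp


if __name__ == "__main__":
    data = [1.2, 0.8, 1.9, 1.4]  # made-up observations for illustration
    print(log_posterior(1.0, 0.5, data))
```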

Efficient computation of objective functions: PyMC2 has a strange confection of Python and Fortran under the hood, which works well enough for the stuff I’ve been doing. But (if I understand correctly) PyMC3 pushes everything off into Theano, which does a more sophisticated translation/compilation of the code.

Flexible inference algorithms: I think that a lot of the inspiration for PyMC3 is the possibility of using Hamiltonian Monte Carlo methods for generating MCMC steps, which requires quickly computing the derivative of the objective function. PyMC2 has relied heavily on the Adaptive Metropolis step method. In the past, I’ve had a lot of fun experimenting with alternative approaches.
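The difference between these step methods is easiest to see in code. Below is a minimal sketch, not PyMC’s implementation: a plain random-walk Metropolis sampler (without the proposal tuning that gives Adaptive Metropolis its name) and a basic Hamiltonian Monte Carlo sampler with a leapfrog integrator, both targeting a one-dimensional standard normal chosen purely for illustration. Metropolis only ever evaluates `log_p`; HMC also calls `grad_log_p` at every leapfrog step, which is why fast derivatives matter. The step sizes and trajectory length are arbitrary assumptions.

```python
import math
import random


def log_p(x):
    """Illustrative target: standard normal log density, up to a constant."""
    return -0.5 * x * x


def grad_log_p(x):
    """Derivative of log_p; needed by HMC but not by Metropolis."""
    return -x


def metropolis(log_p, x0, n, step=1.0, rng=None):
    """Random-walk Metropolis with a Gaussian proposal of scale `step`."""
    rng = rng if rng is not None else random.Random()
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n):
        y = x + rng.gauss(0.0, step)
        lp_y = log_p(y)
        # Accept with probability min(1, p(y) / p(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_y - lp:
            x, lp = y, lp_y
        samples.append(x)
    return samples


def hmc(log_p, grad_log_p, x0, n, eps=0.2, n_leapfrog=10, rng=None):
    """Hamiltonian Monte Carlo with unit mass and a leapfrog integrator."""
    rng = rng if rng is not None else random.Random()
    x = x0
    samples = []
    for _ in range(n):
        p0 = rng.gauss(0.0, 1.0)            # fresh momentum each iteration
        x_new, p = x, p0
        p += 0.5 * eps * grad_log_p(x_new)  # half step for momentum
        for i in range(n_leapfrog):
            x_new += eps * p                # full step for position
            if i < n_leapfrog - 1:
                p += eps * grad_log_p(x_new)
        p += 0.5 * eps * grad_log_p(x_new)  # final half step for momentum
        # Accept or reject using the total energy H = -log_p(x) + p^2 / 2.
        h_old = -log_p(x) + 0.5 * p0 * p0
        h_new = -log_p(x_new) + 0.5 * p * p
        if math.log(1.0 - rng.random()) < h_old - h_new:
            x = x_new
        samples.append(x)
    return samples


if __name__ == "__main__":
    rng = random.Random(42)
    runs = [("metropolis", metropolis(log_p, 0.0, 5000, rng=rng)),
            ("hmc", hmc(log_p, grad_log_p, 0.0, 5000, rng=rng))]
    for name, draws in runs:
        mean = sum(draws) / len(draws)
        var = sum((d - mean) ** 2 for d in draws) / len(draws)
        print(name, round(mean, 2), round(var, 2))
```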

Appropriate data analysis workflow: I’ve had a few long discussions with other researchers who use these methods about the barriers that they and their colleagues face, and this is the part that seems most important. How do you get the data all in place to evaluate objective functions and run flexible inference algorithms? This is not really a core part of PyMC, but rather something to be done with general Python, which suits me just fine.

I’d love to workshop this a little bit with you, dear reader, so I’m going to try turning on comments again. I hope I don’t get spammed into oblivion.

Filed under software engineering

What qualifies as probabilistic programming?

I just went through the classic paper on WinBUGS, which might or might not be called probabilistic programming. It is listed on the probabilistic programming resource page, and it is certainly interesting. The WinBUGS “hello, world” is a linear regression model:

[Figure: the WinBUGS linear regression “hello, world” model (regression_hello_world)]
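The figure itself is not reproduced here. As a hypothetical stand-in, here is a BUGS-style linear regression of the usual form, `Y[i] ~ dnorm(mu[i], tau)` with `mu[i] <- alpha + beta * x[i]`, written as a plain Python simulation plus its log-likelihood. Note that BUGS parameterizes `dnorm` by the precision `tau = 1 / sigma^2`, not the standard deviation. The function names and parameter values are illustrative assumptions. Reading the model as a program that generates data is one way to see why BUGS might count as probabilistic programming.

```python
import math
import random


def simulate_regression(x, alpha, beta, tau, rng=None):
    """Draw Y[i] ~ Normal(alpha + beta * x[i], precision tau) for each x[i]."""
    rng = rng if rng is not None else random.Random()
    sigma = 1.0 / math.sqrt(tau)  # BUGS-style precision -> standard deviation
    return [rng.gauss(alpha + beta * x_i, sigma) for x_i in x]


def log_likelihood(y, x, alpha, beta, tau):
    """Sum of Normal log densities with precision tau, as BUGS's dnorm uses."""
    return sum(0.5 * math.log(tau / (2 * math.pi))
               - 0.5 * tau * (y_i - (alpha + beta * x_i)) ** 2
               for x_i, y_i in zip(x, y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = simulate_regression(x, alpha=0.5, beta=2.0, tau=4.0, rng=random.Random(0))
    print(y)
    print(log_likelihood(y, x, alpha=0.5, beta=2.0, tau=4.0))
```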

Filed under software engineering, statistics

Probabilistic Programming Examples

I’ve been reading up on probabilistic programming; it is so close to PyMC, but so different. The coolest example so far comes from a talk on Microsoft’s offering, Infer.NET:

[Figure: the “hello, uncertain world” example from the Infer.NET talk (hello_uncertain_world)]

Filed under probability, software engineering

Code review for science

A cool experiment from the Mozilla Science Lab: Code Review for Scientists. I look forward to the results.

Anyone who wants to review the code behind my science can comment line-by-line on my GitHub repos.

Filed under software engineering

An IPython notebook to diff IPython notebooks

Here is something I needed recently that other people have been tweeting about needing, too: http://nbviewer.ipython.org/5649571

This could also be a place to collect other ways to do it.
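In that spirit, here is one more way, sketched with only the standard library and not taken from the linked notebook: load both notebooks as JSON, flatten each cell’s source into lines, and run `difflib.unified_diff` over the result. It assumes the current nbformat 4 layout (a top-level `cells` list whose entries have `cell_type` and `source`) and falls back to the older layout, as I understand it, with `worksheets` and an `input` field on code cells. The function names are hypothetical, and cell outputs are ignored on purpose, since they are usually the noisiest part of a notebook diff.

```python
import difflib
import json


def cell_sources(nb):
    """Return a flat list of source lines, one cell after another."""
    if "cells" in nb:                       # nbformat 4
        cells = nb["cells"]
    else:                                   # older layout with worksheets
        cells = [c for ws in nb.get("worksheets", []) for c in ws.get("cells", [])]
    lines = []
    for i, cell in enumerate(cells):
        src = cell.get("source", cell.get("input", ""))
        if isinstance(src, list):           # source may be stored as a list of strings
            src = "".join(src)
        lines.append(f"# --- cell {i} ({cell.get('cell_type', '?')}) ---\n")
        lines.extend(line + "\n" for line in src.splitlines())
    return lines


def diff_notebooks(path_a, path_b):
    """Unified diff of the cell sources of two .ipynb files."""
    with open(path_a, encoding="utf-8") as f:
        a = cell_sources(json.load(f))
    with open(path_b, encoding="utf-8") as f:
        b = cell_sources(json.load(f))
    return "".join(difflib.unified_diff(a, b, fromfile=path_a, tofile=path_b))
```

Usage would look like `print(diff_notebooks("old.ipynb", "new.ipynb"))`, where the file names are placeholders.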

Filed under software engineering

hello, world of statistical graphics in IPython notebook

A few months ago, I had great success invoking the internet to come up with the “hello, world” of statistical graphics.

There are some exciting new developments in JavaScript-based plotting, and this graphic is just the thing for comparing them. D3js has conquered the world in recent years, and my colleagues are starting to think they need to know it. Meanwhile, one of the d3js instigators has unveiled the next in his series of revolutions in data visualization, Vega. It is still in development, but may be more appropriate than d3js for routine plots. And very soon after the Vega specification and runtime appeared, a Python package for it was released as well.

Here is an IPython notebook comparing all of these options. The notebook doesn’t save its JavaScript output in a way that redisplays, but if you load it into your own notebook server and execute all the cells, you should see something like this:

[Figure: the rendered comparison plot (vincent_vega)]

P.S. Google “Vincent Vega” to learn the pop-culture joke behind this strangely named Python package.

Filed under software engineering

Happy Pi Day

I recently came across a Stack Overflow post that is just perfect for Pi Day. The path to knowledge is asking many questions, and it is a strange feature of the days in which we live how steep this path can be: a question that starts “How to determine whether my calculation of pi is accurate? I was trying various methods to implement a program that gives the digits of pi sequentially…” eventually receives an answer that starts “Since I’m the current world record holder for the most digits of pi, I’ll add my two cents…”

All here.
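For the curious, here is one way to produce the digits of pi sequentially: Jeremy Gibbons’ unbounded spigot algorithm, which uses only integer arithmetic and emits each digit once later terms can no longer change it. This is not the method from the Stack Overflow thread, just a well-known alternative. The helper `check_digits` and the hard-coded reference string are illustrative additions; comparing against a published expansion is one simple way to answer “is my calculation accurate?” for modest numbers of digits.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time (Gibbons' streaming spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and shift the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: absorb one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def check_digits(digits, reference):
    """Return the index of the first mismatch, or None if all digits agree."""
    for i, (d, ref) in enumerate(zip(digits, reference)):
        if str(d) != ref:
            return i
    return None


if __name__ == "__main__":
    # First 31 digits of pi, for comparison.
    reference = "3141592653589793238462643383279"
    digits = list(islice(pi_digits(), len(reference)))
    print("".join(map(str, digits)))
    print("first mismatch:", check_digits(digits, reference))
```

The integers grow without bound as more digits are produced, so this is a neat demonstration rather than a record-setting method.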

Filed under software engineering