Good advice from Density Estimation for Statistics and Data Analysis by Bernard W. Silverman:

# Category Archives: statistics

## Kish Stuff

A student came by interested in survey statistics, and we got to talking about what an amazing person Leslie Kish must have been. We did some googling on him. Here are a few items we found:

http://projecteuclid.org/download/pdf_1/euclid.ss/1032209665

http://www.amstat.org/about/statisticiansinhistory/index.cfm?fuseaction=biosinfo&BioID=9

https://asapresidentialpapers.info/documents/Kish_Leslie_1977_edit_(wla_092809).pdf

Filed under statistics

## Non-parametric regression in Python: Gaussian Processes in sklearn (with a little PyMC)

I’ve got a fun class going this quarter, on “artificial intelligence for health metricians”, and the course content mixed with some of the student interest has got me looking at the options for doing Gaussian process regression in Python. `PyMC2` has some nice stuff, but the `sklearn` version fits with the rest of my course examples more naturally, so I’m using that instead.

But `sklearn` doesn’t have the fanciest of fancy covariance functions implemented, and at IHME we have been down the road of the Matern covariance function for over five years now. It’s in `PyMC`, so I took a crack at a mash-up. (Took a mash at a mash-up?) There is some room for improvement, but it is a start. If you need to do non-parametric regression for something that is differentiable more than once, but fewer than infinitely many times, you could try starting here: http://nbviewer.ipython.org/gist/aflaxman/af7bdb56987c50f3812b
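For a sense of what the Matern covariance is doing, here is a minimal NumPy-only sketch. It is not the code from the notebook linked above. It assumes a zero-mean prior, one-dimensional inputs, and smoothness fixed at ν = 5/2 (the case with a simple closed form, giving twice-differentiable sample paths). The function names and the `amp`, `scale`, and `noise_sd` parameters are made up for illustration.

```python
import numpy as np

def matern52(x1, x2, amp=1.0, scale=1.0):
    """Matern covariance with nu = 5/2 between two 1-d arrays of points."""
    r = np.abs(np.subtract.outer(np.asarray(x1, float), np.asarray(x2, float))) / scale
    s = np.sqrt(5.0) * r
    # closed form for nu = 5/2: amp^2 * (1 + s + s^2/3) * exp(-s)
    return amp**2 * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior_mean(x_train, y_train, x_new, noise_sd=0.1, **kernel_args):
    """Posterior mean of a zero-mean GP at x_new (assumed Gaussian noise)."""
    K = matern52(x_train, x_train, **kernel_args)
    K[np.diag_indices_from(K)] += noise_sd**2   # observation noise on the diagonal
    K_star = matern52(x_new, x_train, **kernel_args)
    return K_star @ np.linalg.solve(K, np.asarray(y_train, float))

# hypothetical usage: smooth a few noisy points
x_obs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_obs = np.sin(x_obs)
x_grid = np.linspace(0.0, 4.0, 9)
y_hat = gp_posterior_mean(x_obs, y_obs, x_grid, noise_sd=0.05, scale=1.5)
```

Solving the linear system directly is fine for a handful of points. For anything bigger, a Cholesky factorization is the usual choice.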

p.s. Chris Fonnesbeck has some great notes on doing stuff like this and much more here: http://nbviewer.ipython.org/github/fonnesbeck/Bios366/blob/master/notebooks/Section5_1-Gaussian-Processes.ipynb

## Bayesian Correlation in PyMC

Here is a StackOverflow question with a nice figure:

Is there a nice, simple reference for just what exactly these graphical model figures mean? I want more of them.

## Statistics in Python: Calculating R^2

I wanted to include some old-fashioned statistics in a paper recently, and did some websearching on how to calculate R^2 in Python. It’s all very touchy, it seems. Here’s what I found:

http://stats.stackexchange.com/questions/36064/calculating-r-squared-coefficient-of-determination-with-centered-vs-un-center

http://stackoverflow.com/questions/893657/how-do-i-calculate-r-squared-using-python-and-numpy

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html

http://forums.udacity.com/questions/100154896/why-is-r-squared-from-formula-different-than-scipy-functions-one

I eventually went with this:

    %load_ext rmagic
    x = np.array(1/df.J)
    y = np.array(df.conc_rand)
    %Rpush x y
    %R print(summary(lm(y ~ x + 0)))
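Part of the touchiness is that `y ~ x + 0` is a fit through the origin, and for a model with no intercept R's `summary` measures R^2 against the uncentered total sum of squares (the sum of `y**2`) instead of the usual centered one. Here is a rough NumPy sketch of both versions for comparison. The function name is made up, and it assumes a single predictor with no intercept.

```python
import numpy as np

def r_squared_through_origin(x, y):
    """Least-squares fit of y = slope * x, with two flavors of R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope = np.dot(x, y) / np.dot(x, x)      # closed-form no-intercept slope
    resid = y - slope * x
    ss_res = np.dot(resid, resid)
    # uncentered: total sum of squares taken around zero
    r2_uncentered = 1.0 - ss_res / np.dot(y, y)
    # centered: total sum of squares taken around the mean of y
    r2_centered = 1.0 - ss_res / np.sum((y - y.mean())**2)
    return slope, r2_uncentered, r2_centered
```

The uncentered value is never smaller than the centered one, which is one reason the numbers from different tools disagree.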

## CrossValidated on interesting and well-written papers in applied stats

I should read some of these, and stash a few for the PGF journal club:

## MCMC in Python: observed data for a sum of random variables in PyMC

I like answering PyMC questions on Stack Overflow, but sometimes I give an answer and end up the one with the question. Like what would you model as the sum of a Poisson and a Negative Binomial?
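One way to think about observed data for a sum like this: the likelihood of the total is a convolution of the two distributions, so you sum over every way the total could split between them. Here is a small standard-library sketch of that idea. It is not PyMC code and not the Stack Overflow answer. The negative binomial parameterization (number of failures before `r` successes, each with probability `p`) and all the function names are assumptions for illustration.

```python
import math

def poisson_logpmf(k, mu):
    """Log-probability of k under Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def negbinom_logpmf(k, r, p):
    """Log-probability of k failures before r successes (success prob p)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def sum_logpmf(z, mu, r, p):
    """Log-probability that Poisson(mu) + NegBinom(r, p) equals z."""
    # marginalize over the Poisson part x = 0..z; the rest is z - x
    terms = [poisson_logpmf(x, mu) + negbinom_logpmf(z - x, r, p)
             for x in range(z + 1)]
    m = max(terms)                      # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))
```

A function like `sum_logpmf` could serve as a custom log-likelihood for the observed total, with `mu`, `r`, and `p` as the unknowns.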
