Category Archives: statistics

Statistics in Python: Calculating R^2

I wanted to include some old-fashioned statistics in a paper recently, and did some web searching on how to calculate R^2 in Python. It’s all very touchy, it seems. Here’s what I found:

http://stats.stackexchange.com/questions/36064/calculating-r-squared-coefficient-of-determination-with-centered-vs-un-center

http://stackoverflow.com/questions/893657/how-do-i-calculate-r-squared-using-python-and-numpy

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html

http://forums.udacity.com/questions/100154896/why-is-r-squared-from-formula-different-than-scipy-functions-one

I eventually went with this:

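# Run in an IPython session; df is assumed to be a pandas DataFrame
# with columns J and conc_rand.
import numpy as np

# Newer setups ship this extension with rpy2: %load_ext rpy2.ipython
%load_ext rmagic

x = np.array(1/df.J)
y = np.array(df.conc_rand)
%Rpush x y
# "+ 0" drops the intercept, i.e. regression through the origin
%R print(summary(lm(y ~ x + 0)))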
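
For staying in Python, here is a minimal NumPy sketch of what that R call reports for a regression through the origin. The function name r_squared_through_origin is made up for illustration, and it assumes x and y are 1-D numeric arrays of equal length. For a no-intercept model, R's summary.lm uses the uncentered total sum of squares (the sum of y^2), which seems to be the main reason the number differs from formulas that subtract the mean of y.

```python
import numpy as np


def r_squared_through_origin(x, y):
    """Least-squares fit of y = slope * x (no intercept), with two R^2 variants.

    Hypothetical helper for illustration; not part of NumPy or SciPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Closed-form least-squares slope when there is no intercept term
    slope = np.dot(x, y) / np.dot(x, x)

    # Residual sum of squares of the fitted line
    ss_res = np.sum((y - slope * x) ** 2)

    # Uncentered: total sum of squares measured around zero
    r2_uncentered = 1.0 - ss_res / np.sum(y ** 2)

    # Centered: total sum of squares measured around the mean of y
    r2_centered = 1.0 - ss_res / np.sum((y - y.mean()) ** 2)

    return slope, r2_uncentered, r2_centered


slope, r2_uncentered, r2_centered = r_squared_through_origin([1, 2, 3], [1, 2, 2])
print(slope, r2_uncentered, r2_centered)
```

Because the sum of y^2 is never smaller than the centered sum of squares, the uncentered value is always at least as large as the centered one, so the two should not be compared across models with and without an intercept.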


Filed under statistics

CrossValidated on interesting and well-written papers in applied stats

I should read some of these, and stash a few for the PGF journal club:

http://stats.stackexchange.com/questions/9365/what-are-some-interesting-and-well-written-applied-statistics-papers

http://www.jstor.org/stable/2347679


MCMC in Python: observed data for a sum of random variables in PyMC

I like answering PyMC questions on Stack Overflow, but sometimes I give an answer and end up being the one with a question. For example, what would you model as the sum of a Poisson and a Negative Binomial?


MCMC in Python: sim and fit with same model

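Here is a GitHub issue and solution that I saw the other day. I think it’s a nice pattern: one function builds the model, and any entry of values that is not None is treated as observed, so the same code can both simulate data and fit it.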

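import pymc  # PyMC 2.x API

# true_param and tau are assumed to be defined before this point
def generate_model(values={'mu': true_param, 'm': None}):
    # A value of None leaves that variable free; anything else is held as observed

    # prior
    mu = pymc.Uniform("mu", lower=-10, upper=10, value=values['mu'],
        observed=(values['mu'] is not None))

    # likelihood function
    m = pymc.Normal("m", mu=mu, tau=tau, value=values['m'],
        observed=(values['m'] is not None))

    return locals()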


MCMC in Python: Fit a non-linear function with PyMC

Here is a recent Q&A on Stack Overflow that I answered and liked.


The one before that

Jake VanderPlas’s comparison of Python MCMC modules was preceded by a Bayesian polemic. In general, I find the stats philosophy war old-timey and distracting, but his comparison of confidence intervals and credible intervals is something I need to understand better.

http://jakevdp.github.io/blog/2014/06/12/frequentism-and-bayesianism-3-confidence-credibility/


MCMC in Python: a bake-off

While I’m on a microblogging spree, I’ve been meaning to link to this informative comparison of pymc, emcee, and pystan: http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/
