# Tag Archives: stats

## Statistics in Python: Calculating R^2

I wanted to include some old-fashioned statistics in a paper recently, and did some websearching on how to calculate R^2 in Python. It’s all very touchy, it seems. Here’s what I found:

I eventually went with this:

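```
%load_ext rmagic
# On newer IPython the R magics live in rpy2: %load_ext rpy2.ipython

import numpy as np

# df is assumed to be a pandas DataFrame with columns J and conc_rand
x = np.array(1/df.J)
y = np.array(df.conc_rand)
%Rpush x y
# Regress y on x through the origin ("+ 0" drops the intercept); the summary includes R^2
%R print(summary(lm(y ~ x + 0)))
```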
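One possible source of the touchiness: for a fit through the origin, R's `summary(lm(...))` reports R^2 against the uncentered total sum of squares, sum(y^2), not the usual sum((y - mean(y))^2), so the same fit can yield two different numbers depending on the convention. Here is a minimal NumPy sketch of both. The function name and the toy data are my own, for illustration only, and are not from the paper.

```python
import numpy as np

def fit_through_origin(x, y):
    """Least-squares fit of y = slope * x (no intercept).

    Returns (slope, r2_uncentered, r2_centered).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Closed-form least-squares slope for a no-intercept model
    slope = np.dot(x, y) / np.dot(x, x)

    # Residual sum of squares
    ss_res = np.sum((y - slope * x) ** 2)

    # Uncentered total sum of squares: the convention R uses when
    # the intercept is dropped
    r2_uncentered = 1.0 - ss_res / np.sum(y ** 2)

    # Centered total sum of squares: the usual textbook definition
    r2_centered = 1.0 - ss_res / np.sum((y - y.mean()) ** 2)

    return slope, r2_uncentered, r2_centered

# Toy data, invented for illustration
slope, r2_u, r2_c = fit_through_origin([1, 2, 3], [1, 2, 4])
print(f"slope={slope:.4f}  uncentered R^2={r2_u:.4f}  centered R^2={r2_c:.4f}")
```

The uncentered value is the one to compare against the R output above; the centered one can be noticeably lower, and can even go negative for a poor fit.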

Filed under statistics

## Computation time and model development

20 seconds, 20 minutes, or 20 hours. These are all amounts of time that a computational method I’ve been working on has, at some point, taken to complete processing. Each leads to a very different experience for the model developer, and probably in the end for the model, too. Twenty seconds is definitely what I prefer.

Filed under statistics, TCS

## Age-heaping and Hedgehogs

I heard an interesting talk a few weeks ago about “age-heaping” in survey responses, the phenomenon where people remember ages imprecisely and say that their siblings are ages that are divisible by 5 much more often than expected.  There are some nice theory challenges here, with a big dose of stats modeling, but I’ll have to share some more thoughts on that later.

In the talk, the age-heaping pattern was also referred to as a hedgehog or porcupine plot, because of the spiky histogram that the data produces. I was looking for a nice picture of one, or some additional background reading, and when I searched for “hedgehog statistical plots”, all Google would give me was a bunch of pages about stats on actual hedgehogs. Cute!
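In lieu of a picture, here is a rough sketch of what the spikes look like and one crude way to quantify them: the share of reported ages that are multiples of 5, which would be roughly one in five if ages were spread evenly. The ages below are made up for illustration (not survey data), and the function names are my own.

```python
from collections import Counter

def share_divisible_by_5(ages):
    """Fraction of reported ages that are multiples of 5."""
    ages = list(ages)
    if not ages:
        raise ValueError("need at least one age")
    return sum(1 for a in ages if a % 5 == 0) / len(ages)

def text_histogram(ages):
    """One line per age from min to max, with one '#' per report."""
    counts = Counter(ages)
    return "\n".join(
        f"{age:3d} {'#' * counts[age]}"
        for age in range(min(counts), max(counts) + 1)
    )

# Invented reported ages with heaping at 20, 25, 30, 35, 40
reported = [18, 20, 20, 20, 21, 23, 25, 25, 25, 25,
            27, 30, 30, 30, 31, 35, 35, 40]

print(text_histogram(reported))
print(f"share divisible by 5: {share_divisible_by_5(reported):.2f}")
```

Turn the printed histogram on its side and the hedgehog is not hard to see.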

Filed under TCS

Kyle writes from Sri Lanka with his stats programming tips for the new PBFs. It’s all things that old PBFs and even old young professors can benefit from:

• It’s taken me 2 years to jump on the version control bandwagon (~18 months after your PToW on git….), so I certainly can’t claim to be an exemplar myself. But I think the main themes would be:

• Location, location, location! Can you find your code? Can others find your code? Do both the directory and the filename make sense?

• Replicability – even of the mistakes. If you do something right, you want to be able to do it again.
  ◦ But often, even if it’s wrong, you want to do it again. Chris will say, let’s go back to the broken version from 2 months ago, I liked that better. So if you change your code, keep a record of the old parts (and maybe even why you ditched them).

• If others can’t look at your code and figure out quickly what each “chunk” of code does, it’s not well documented enough. If you can’t even tell within 30 seconds what a particular piece does (and you wrote it!), that’s a problem.

• On the other hand: Yes, a few PBFs were lit majors, but that doesn’t mean your code should be in novella format. Concise, readable code is often more understandable than a few sentences of explanation.

• Whitespace! Headers! Tab and Enter are your close and personal friends: “Without negative space how would we appreciate the positive in our art and in our lives?” – Dyan Law (some artist I’ve never heard of)

• Good exercises: Take someone else’s raw code, figure out what it does, and comment it. Read through a program you wrote and haven’t used in months: how long does it take you to figure it out? Have someone else comment your own raw code. Did they explicate things you left implied? Did they misinterpret anything?
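To make a few of these concrete, here is a small sketch of what "chunked" code with headers, whitespace, and short comments might look like. Everything in it (the file name in the docstring, the column names, the data) is hypothetical and invented for illustration; it is not Kyle's code.

```python
"""mean_conc_by_site.py: average concentration per site (hypothetical example)."""
import csv
import io
from collections import defaultdict

# ---- Inputs ----------------------------------------------------
# Inline stand-in for a data file; the values are invented.
RAW_CSV = """site,conc
A,1.0
A,3.0
B,10.0
"""


# ---- Load ------------------------------------------------------
def load_rows(text):
    """Parse CSV text into a list of (site, concentration) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["site"], float(row["conc"])) for row in reader]


# ---- Summarize -------------------------------------------------
def mean_by_site(rows):
    """Return {site: mean concentration}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for site, conc in rows:
        totals[site] += conc
        counts[site] += 1
    return {site: totals[site] / counts[site] for site in totals}


# ---- Main ------------------------------------------------------
if __name__ == "__main__":
    print(mean_by_site(load_rows(RAW_CSV)))
```

Each chunk has a one-line header and a one-line docstring, which is about as much prose as the code needs: enough to pass the 30-second test without turning into a novella.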

All good advice, and I often regret it when I don’t follow it. Anything else that should be on this list?