> Are there any measures of auto-correlation for a sequence of observations of an (unordered) categorical variable?

An (accepted) answer that got me thinking:

> [L]ook directly at the convergence rate for the Markov chain.

My interpretation, in PyMC2 terms: run the chain, calculate empirical transition probabilities for the categorical variable, and examine the spectral gap.

Experimental notebook tk.
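Roughly what I have in mind, as a sketch. The function name `categorical_mixing` and the assumption that samples are integer-coded categories are mine, not anything from PyMC2:

```python
import numpy as np

def categorical_mixing(samples, n_categories):
    """Estimate the empirical transition matrix of a categorical chain
    and return its spectral gap, 1 - |second-largest eigenvalue|.
    A gap near 0 suggests slow mixing (high autocorrelation);
    a gap near 1 suggests the draws are close to independent."""
    counts = np.zeros((n_categories, n_categories))
    for a, b in zip(samples[:-1], samples[1:]):
        counts[a, b] += 1
    # Row-normalize to get transition probabilities; fall back to a
    # uniform row for any category never visited.
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums,
                  out=np.full_like(counts, 1.0 / n_categories),
                  where=row_sums > 0)
    eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - eigvals[1]
```

A sticky chain that rarely changes state should give a gap near zero, while i.i.d. draws should give a gap near one.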


http://stats.stackexchange.com/questions/27951/when-are-log-scales-appropriate

http://stats.stackexchange.com/questions/90149/pitfalls-to-avoid-when-transforming-data

The last one isn’t really about data transformations, but is still interesting.


I learned about it because it doesn’t work in `mpld3`… just one more benefit of being part of an open-source project. It would be so cool to have an `mpld3` version with some interactivity included, since interactivity can address one pitfall of the stacked bar chart: the challenge of comparing lengths with different baselines.


```python
import sklearn.tree

def print_tree(t, root=0, depth=1):
    """Print an sklearn tree (e.g. clf.tree_) as equivalent Python code."""
    if depth == 1:
        print('def predict(X_i):')
    indent = ' ' * depth
    print(indent + '# node %s: impurity = %.2f' % (str(root), t.impurity[root]))
    left_child = t.children_left[root]
    right_child = t.children_right[root]
    if left_child == sklearn.tree._tree.TREE_LEAF:
        print(indent + 'return %s  # (node %d)' % (str(t.value[root]), root))
    else:
        print(indent + 'if X_i[%d] < %.2f:  # (node %d)' % (t.feature[root], t.threshold[root], root))
        print_tree(t, root=left_child, depth=depth + 1)
        print(indent + 'else:')
        print_tree(t, root=right_child, depth=depth + 1)
```

Did I do this for MILK a few years ago? I’m becoming an absent-minded professor ahead of my time.


Good, simple ideas are our most precious intellectual commodity.


http://www.carrierroutes.com/ZIPCodes.html


But I had a great idea, or at least one that I think is great: see what people are confused by online. I tried this out for my lecture last week on cross-validation, using the stats.stackexchange site: http://stats.stackexchange.com/questions/tagged/cross-validation?sort=votes&pageSize=50

After reading a ton of these, I decided that if my students know when they need test/train/validation splits and when they can get away with test/train splits, then they’ve really figured things out. Now I can’t find the question that I thought distilled this best, though.
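A concrete illustration of the distinction, using scikit-learn on a toy dataset (the data, the kNN model, and the candidate `k` values are all made up for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy data: label depends on the first feature plus a little noise
rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = (X[:, 0] + 0.1 * rng.randn(300) > 0).astype(int)

# Choosing among hyperparameters means you need three splits:
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

best_k, best_acc = None, -1.0
for k in (1, 5, 15):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)  # the validation split picks k
    if acc > best_acc:
        best_k, best_acc = k, acc

# Refit on train+validation, then report ONCE on the untouched test set
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_trainval, y_trainval)
print('chose k=%d, test accuracy: %.2f' % (best_k, final.score(X_test, y_test)))
```

If there were nothing to tune, the validation split would be unnecessary and a test/train split alone would suffice; reusing the test set to pick `k` is exactly the leak the three-way split prevents.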
