From http://stats.stackexchange.com/questions/10798/measures-of-autocorrelation-in-categorical-values-of-a-markov-chain, a question I run into from time to time:
> Are there any measures of auto-correlation for a sequence of observations of an (unordered) categorical variable?
An (accepted) answer that got me thinking:
> [L]ook directly at the convergence rate for the Markov chain.
My interpretation, in PyMC2 terms: run chain, calculate empirical transition probabilities for categorical variable, examine spectral gap.
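A minimal sketch of that recipe, assuming the trace has already been pulled out of the sampler as a sequence of integer codes `0..n_states-1`. The helper names `empirical_transition_matrix` and `spectral_gap` are made up for illustration, and "spectral gap" here means one minus the second-largest eigenvalue modulus, which is one common convention rather than the only one:

```python
import numpy as np

def empirical_transition_matrix(chain, n_states):
    """Row-stochastic matrix of observed one-step transition frequencies."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(chain[:-1], chain[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that are never left get a uniform row (an arbitrary choice for this sketch).
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)

def spectral_gap(P):
    """One minus the second-largest eigenvalue modulus of P."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return 1.0 - moduli[1]

# Two synthetic traces standing in for sampler output (not real PyMC2 traces).
rng = np.random.RandomState(12345)
iid_chain = rng.randint(3, size=10000)                   # independent draws
sticky_chain = np.repeat(rng.randint(3, size=1000), 10)  # each value held for 10 steps

gap_iid = spectral_gap(empirical_transition_matrix(iid_chain, 3))
gap_sticky = spectral_gap(empirical_transition_matrix(sticky_chain, 3))
print(gap_iid, gap_sticky)
```

A gap near 1 suggests the categorical variable is mixing about as well as independent draws would; a gap near 0 suggests the chain is sticky.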
Experimental notebook tk.
Here is a little feature in Matplotlib that I never saw before: stacked bar plots with tables attached. Perhaps too ugly for my Iraq Mortality stacked bar charts, but definitely handy for exploratory work.
I learned about it because it doesn’t work in `mpld3`… just one more benefit of being part of an open-source project. It would be so cool to have an `mpld3` version with some interactivity included, since interactivity can address one of the pitfalls of the stacked bar chart: the challenge of comparing lengths with different baselines.
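For reference, a small sketch of the stacked-bars-plus-table pattern, using made-up counts and labels (nothing here comes from the Iraq Mortality data). It relies on `Axes.table` with `loc="bottom"`, and the margins passed to `subplots_adjust` are a guess that may need tuning:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up counts, purely illustrative: one row per group, one column per year.
data = np.array([[5, 3, 4, 7],
                 [2, 6, 1, 3],
                 [4, 2, 5, 1]])
row_labels = ["group A", "group B", "group C"]
col_labels = ["2011", "2012", "2013", "2014"]

fig, ax = plt.subplots()
x = np.arange(data.shape[1])
bottom = np.zeros(data.shape[1])
for row, label in zip(data, row_labels):
    ax.bar(x, row, bottom=bottom, label=label)  # stack each group on the running total
    bottom += row

ax.set_xticks([])  # the table's column labels stand in for tick labels
table = ax.table(cellText=[[str(v) for v in row] for row in data],
                 rowLabels=row_labels, colLabels=col_labels, loc="bottom")
fig.subplots_adjust(left=0.2, bottom=0.25)  # leave room for the table
fig.canvas.draw()  # render; use fig.savefig(...) or plt.show() to look at it
```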
I helped my students understand the decision tree classifier in sklearn recently. Maybe they think I helped too much. But I think it was good for them. We did an interesting little exercise, too, writing a program that writes a program that represents a decision tree. Maybe it will be useful to someone else as well:
    import sklearn.tree

    def print_tree(t, root=0, depth=1):
        if depth == 1:
            print('def predict(X_i):')
        indent = '    ' * depth
        print(indent + '# node %s: impurity = %.2f' % (str(root), t.impurity[root]))
        left_child = t.children_left[root]
        right_child = t.children_right[root]
        if left_child == sklearn.tree._tree.TREE_LEAF:
            # leaf: emit the stored value (as a list, so the output is valid Python) and stop
            print(indent + 'return %s # (node %d)' % (str(t.value[root].tolist()), root))
        else:
            # sklearn sends a sample to the left child when feature <= threshold
            print(indent + 'if X_i[%d] <= %.2f: # (node %d)' % (t.feature[root], t.threshold[root], root))
            print_tree(t, root=left_child, depth=depth+1)
            print(indent + 'else:')
            print_tree(t, root=right_child, depth=depth+1)
See it in action here.
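One way to check that the generated program really matches the fitted tree is to build the source as a string, `exec` it, and compare predictions. This is a sketch of that idea, not the exercise as we ran it in class: `tree_to_source` is a hypothetical variant that returns the source instead of printing it, returns the majority class index at each leaf, and writes thresholds with full precision so rounding cannot flip a split. It assumes a classifier fit on iris, where class indices and labels coincide.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def tree_to_source(t, node=0, depth=1):
    """Return Python source for a predict(X_i) function mirroring fitted tree t."""
    indent = '    ' * depth
    lines = ['def predict(X_i):'] if depth == 1 else []
    if t.children_left[node] == -1:  # -1 marks a leaf in sklearn's tree arrays
        lines.append('%sreturn %d  # node %d' % (indent, np.argmax(t.value[node][0]), node))
    else:
        # sklearn sends a sample to the left child when feature <= threshold
        lines.append('%sif X_i[%d] <= %r:  # node %d'
                     % (indent, t.feature[node], float(t.threshold[node]), node))
        lines.append(tree_to_source(t, t.children_left[node], depth + 1))
        lines.append('%selse:' % indent)
        lines.append(tree_to_source(t, t.children_right[node], depth + 1))
    return '\n'.join(lines)

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

source = tree_to_source(clf.tree_)
print(source)

namespace = {}
exec(source, namespace)  # define the generated predict() in a scratch namespace
predict = namespace['predict']

generated = np.array([predict(x) for x in iris.data])
matches = bool(np.all(generated == clf.predict(iris.data)))
print(matches)
```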
Did I do this for MILK a few years ago? I’m becoming an absent-minded professor ahead of my time.
These seminars that eScience and company are putting on are great. I have to go to the IHME seminars scheduled at a competing time once in a while, so could someone else attend and tell me about this one: http://data.uw.edu/seminar/2015/mullainathan/
A new edition of the Visual Business Intelligence Newsletter crossed my inbox recently, on how to display timeseries with missing and incomplete values: http://www.perceptualedge.com/articles/visual_business_intelligence/missing_values_and_incomplete_periods_in_time_series.pdf
Good, simple ideas are our most precious intellectual commodity.
I missed this presentation, but I am going to figure out how to use Docker for reproducible research soon! http://benmarwick.github.io/UW-eScience-docker-for-reproducible-research/#1