February 23, 2015 · 12:00 pm
Here is a little feature in Matplotlib that I never saw before: stacked bar plots with tables attached. Perhaps too ugly for my Iraq Mortality stacked bar charts, but definitely handy for exploratory work.
I learned about it because it doesn’t work in `mpld3`… just one more benefit of being part of an open-source project. It would be so cool to have an `mpld3` version with some interactivity included, since interactivity can address one of the pitfalls of the stacked bar chart: the challenge of comparing lengths that start from different baselines.
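For anyone who has not seen the feature either, here is a minimal sketch of a stacked bar plot with a table attached, using `ax.bar` with its `bottom` argument and `ax.table`. The helper name `stacked_bars_with_table` and all of the numbers and labels are made up for illustration; this is not the code behind the plot mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def stacked_bars_with_table(values, row_labels, col_labels):
    """Draw stacked bars and attach a table of the same numbers below the axes.

    values[i][j] is the height of series i in column j (hypothetical layout).
    """
    fig, ax = plt.subplots()
    x = range(len(col_labels))
    bottom = [0.0] * len(col_labels)
    for row, label in zip(values, row_labels):
        # each series starts where the previous ones stopped
        ax.bar(x, row, bottom=bottom, label=label)
        bottom = [b + v for b, v in zip(bottom, row)]
    ax.set_xticks([])  # the table's column labels stand in for tick labels
    table = ax.table(
        cellText=[[str(v) for v in row] for row in values],
        rowLabels=row_labels,
        colLabels=col_labels,
        loc="bottom",
    )
    ax.legend()
    fig.subplots_adjust(left=0.2, bottom=0.2)  # leave room for the table
    return fig, ax, table


# Made-up numbers, for illustration only
fig, ax, table = stacked_bars_with_table(
    values=[[3, 5, 2], [4, 1, 6]],
    row_labels=["group A", "group B"],
    col_labels=["2012", "2013", "2014"],
)
```

Call `fig.savefig(...)` or `plt.show()` to look at the result. The table puts the exact values next to the bars, which helps a little with the different-baselines problem even without interactivity.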
Filed under dataviz
Tagged as IDV4GH, mpld3
February 19, 2015 · 12:00 pm
I helped my students understand the decision tree classifier in sklearn recently. Maybe they think I helped too much. But I think it was good for them. We did an interesting little exercise, too, writing a program that writes a program that represents a decision tree. Maybe it will be useful to someone else as well:
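import sklearn.tree

def print_tree(t, root=0, depth=1):
    # t is the low-level tree of a fitted estimator, e.g. clf.tree_
    if depth == 1:
        print('def predict(X_i):')
    indent = '    '*depth
    print(indent + '# node %s: impurity = %.2f' % (str(root), t.impurity[root]))
    left_child = t.children_left[root]
    right_child = t.children_right[root]

    if left_child == sklearn.tree._tree.TREE_LEAF:
        # leaf: emit the value stored at this node as a plain list
        print(indent + 'return %s # (node %d)' % (t.value[root].tolist(), root))
    else:
        # internal node: sklearn sends samples with feature <= threshold to the left
        print(indent + 'if X_i[%d] <= %.2f: # (node %d)' % (t.feature[root], t.threshold[root], root))
        print_tree(t, root=left_child, depth=depth+1)
        print(indent + 'else:')
        print_tree(t, root=right_child, depth=depth+1)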
See it in action here.
Did I do this for MILK a few years ago? I’m becoming an absent-minded professor ahead of my time.
February 18, 2015 · 12:00 pm
These seminars that eScience and company are putting on are great. I have to go to the IHME seminars scheduled at competing times once in a while, so someone else should attend and tell me about this one: http://data.uw.edu/seminar/2015/mullainathan/
February 17, 2015 · 12:00 pm
A new edition of the Visual Business Intelligence Newsletter crossed my inbox recently, on how to display time series with missing and incomplete values: http://www.perceptualedge.com/articles/visual_business_intelligence/missing_values_and_incomplete_periods_in_time_series.pdf
Good, simple ideas are our most precious intellectual commodity.
February 14, 2015 · 12:00 pm
I missed this presentation, but I am going to figure out how to use Docker for reproducible research soon! http://benmarwick.github.io/UW-eScience-docker-for-reproducible-research/#1
February 13, 2015 · 12:00 pm
There are 929 3-digit ZIP Codes in the country (USA).
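If you want to check a count like this against your own data, a minimal sketch is to take the first three digits of each 5-digit ZIP Code and count the distinct prefixes. The function name `count_zip3` and the sample codes are made up for illustration, and the result depends entirely on the ZIP list you feed in, so it will not necessarily match the figure above.

```python
def count_zip3(zip_codes):
    """Count distinct 3-digit prefixes among 5-digit ZIP Codes.

    Codes stored as integers lose their leading zeros, so each one is
    padded back to five characters before the prefix is taken.
    """
    return len({str(z).zfill(5)[:3] for z in zip_codes})


# Hypothetical sample: three Seattle-area codes and one Massachusetts code
# that appears twice, once as a string and once as an integer
sample = ["98101", "98105", "98195", "02139", 2139]
n_prefixes = count_zip3(sample)  # prefixes are "981" and "021"
```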
Filed under global health
Tagged as non-faq
February 6, 2015 · 12:00 pm
Doctors love decision trees, computer scientists love recursion, so maybe that’s why decision trees have been coming up so much in the Artificial Intelligence for Health Metricians class I’m teaching this quarter. We’ve been very sklearn-focused in our labs so far, but I thought my students might like to see how to build their own decision tree learner from scratch. So I put together this little notebook for them. Unfortunately, it is a little too complicated to make them do it themselves in a quarter-long class with no prerequisites on programming.
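To give a flavor of what "from scratch" involves, here is a minimal sketch of a greedy, recursive decision tree learner in plain Python 3. It is not the notebook mentioned above; the Gini criterion, the nested-tuple tree representation, and every function name here are my assumptions for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini impurity,
    or None if no split improves on the unsplit node."""
    best, best_score = None, gini(y)
    n = len(y)
    for j in range(len(X[0])):
        for thresh in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(n) if X[i][j] <= thresh]
            right = [y[i] for i in range(n) if X[i][j] > thresh]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, thresh), score
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Recursively grow a tree. Leaves are class labels (assumed not to be
    tuples); internal nodes are (feature, threshold, left, right) tuples."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    j, thresh = split
    left_idx = [i for i in range(len(y)) if X[i][j] <= thresh]
    right_idx = [i for i in range(len(y)) if X[i][j] > thresh]
    return (
        j,
        thresh,
        build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth),
        build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth),
    )


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        j, thresh, left, right = tree
        tree = left if x[j] <= thresh else right
    return tree


# Tiny made-up dataset with one feature and two classes
toy_tree = build_tree([[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"])
```

Even this stripped-down version needs recursion, nested data structures, and list comprehensions all at once, which is a lot to ask of students with no programming prerequisites.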
February 4, 2015 · 12:00 pm
In a recent post, I confessed my interest in a recent National Academy Press report on teaching methods. The tough thing for me about using this discipline-based education research (DBER) approach is not the name or the acronym, but coming up with the misunderstood concepts from the discipline that students benefit from learning actively. In the report examples, it seems like they are articulated by geniuses dedicated to teaching after years of student observation. I don’t know if I’ll get there one day, but I’m certainly not there now.
But I had a great idea, or at least one that I think is great: see what people are confused by online. I tried this out for my lecture last week on cross-validation, using the stats.stackexchange site: http://stats.stackexchange.com/questions/tagged/cross-validation?sort=votes&pageSize=50
After reading a ton of these, I decided that if my students know when they need test/train/validation splits and when they can get away with test/train splits, then they’ve really figured things out. Now I can’t find the question that I thought distilled this best, though.
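One common way to draw that line, which may or may not be what the lost question said: if you use held-out data to choose anything (a model, a hyperparameter, a feature set), you need a validation set for the choosing and a separate test set for the final estimate; if there is nothing to choose, a plain test/train split is enough. Here is a minimal sketch of the three-way split; the function name, the default fractions, and the seed are assumptions for illustration.

```python
import random


def three_way_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and cut them into disjoint train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded, so the split is reproducible
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Fit candidate models on train, pick the winner on val,
# and touch test only once, to report the final error.
train, val, test = three_way_split(range(100))
```

Setting `val_frac=0` gives the plain test/train split for the case where nothing is being tuned.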
Filed under machine learning
Tagged as dber
February 2, 2015 · 12:00 pm
To complement that ASA address on what statistics is, which I read last week, here is the abstract of a KDD address on what data mining is: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=754923
Does the talk exist somewhere?
Filed under machine learning
Tagged as ai4hm