From the sklearn docs: “We call our estimator instance `clf`, as it is a classifier.” http://scikit-learn.org/stable/tutorial/basic/tutorial.html#learning-and-predicting
This could be a useful guide: http://terrytangyuan.github.io/2016/03/14/scikit-flow-intro/
Darcy AM, Louie AK, Roberts L. Machine Learning and the Profession of Medicine. JAMA. 2016;315(6):551-552. doi:10.1001/jama.2015.18421.
> Must a physician be human? …
I have been having great success with scikit-learn's `CountVectorizer` transformations. Here are some notes on how I like to use it:

    import sklearn.feature_extraction.text

    ngram_range = (1, 2)  # unigrams and bigrams

    clf = sklearn.feature_extraction.text.CountVectorizer(
        ngram_range=ngram_range,
        min_df=10,  # minimum number of docs that must contain n-gram to include as a column
        # tokenizer=lambda x: [x_i.strip() for x_i in x.split()],  # keep '*' characters as tokens
    )
There is also a `stop_words` parameter that is sometimes useful.
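
To see what these parameters actually do, here is a minimal sketch on a toy corpus. The three documents, the small `min_df=2`, and the hand-written stop list are my own assumptions for illustration, not settings from any real project (passing `stop_words="english"` would use the built-in list instead).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
]

vectorizer = CountVectorizer(
    ngram_range=(1, 2),              # unigrams and bigrams
    min_df=2,                        # keep n-grams found in at least 2 docs
    stop_words=["the", "on", "and"], # hypothetical custom stop list
)

# Rows are documents, columns are the surviving n-grams.
X = vectorizer.fit_transform(docs)

# Column names in column order (vocabulary_ maps n-gram -> column index).
columns = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(columns)
print(X.toarray())
```

Every bigram here shows up in only one document, so `min_df=2` drops them all and only the shared unigrams remain as columns. On a real corpus a larger `min_df` does the same job of pruning rare n-grams that would otherwise bloat the matrix.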
EnsembleMatrix: Interactive Visualization to Support Machine Learning with Multiple Classifiers http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf
I want one