Tag Archives: sklearn
Why do I call that variable `clf`?
From the sklearn docs: “We call our estimator instance `clf`, as it is a classifier.” http://scikit-learn.org/stable/tutorial/basic/tutorial.html#learning-and-predicting
Filed under machine learning, software engineering
Using the sklearn text.CountVectorizer
I have been having great success with the scikit-learn CountVectorizer transformations. Here are some notes on how I like to use it:
    import sklearn.feature_extraction

    ngram_range = (1, 2)

    clf = sklearn.feature_extraction.text.CountVectorizer(
        ngram_range=ngram_range,
        min_df=10,  # minimum number of docs that must contain n-gram to include as a column
        #tokenizer=lambda x: [x_i.strip() for x_i in x.split()]  # keep '*' characters as tokens
    )
There is also a `stop_words` parameter that is sometimes useful.
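To see what these settings do, here is a minimal sketch on a toy corpus. The three documents are invented for illustration, and `min_df` is lowered to 2 so that anything survives on such a small example; `stop_words="english"` uses scikit-learn's built-in English list.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, made up for illustration only.
docs = [
    "the cat sat on the mat",
    "the cat sat on the log",
    "the dog chased the cat",
]

clf = CountVectorizer(
    ngram_range=(1, 2),    # unigrams and bigrams
    min_df=2,              # keep n-grams that appear in at least 2 documents
    stop_words="english",  # drop common English words before forming n-grams
)
X = clf.fit_transform(docs)  # sparse matrix: one row per doc, one column per n-gram

vocab = sorted(clf.vocabulary_)  # the n-grams that became columns
print(vocab)
print(X.toarray())
```

Because stop words are removed before the n-grams are built, a bigram like "cat sat" can still become a column even though "the" and "on" never do.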
Filed under machine learning
Using the sklearn grid_search tools
Scikit-learn has a really nice grid search module. It will soon be called model_selection, because it has grown beyond simple grid search. But here is the spirit of it:
    import sklearn.svm, sklearn.grid_search, sklearn.datasets.samples_generator

    parameters = {'kernel': ('poly', 'rbf'), 'C': [.01, .1, 1, 10, 100]}
    clf = sklearn.grid_search.GridSearchCV(
        sklearn.svm.SVC(probability=True), parameters, n_jobs=64)

    X, y = sklearn.datasets.samples_generator.make_classification(
        n_samples=200, n_features=5, random_state=12345)
    clf.fit(X, y)
    clf.best_params_
And say you want to take a careful look at the results? They are all in there, too. http://nbviewer.ipython.org/gist/aflaxman/cb0660e602d361d06599
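In more recent scikit-learn releases the same tools live in `sklearn.model_selection`, and `make_classification` is imported directly from `sklearn.datasets`. Here is a rough sketch of the search above written that way, with a look at the per-setting results through the `cv_results_` attribute. It is not the code from the linked notebook. It also assumes a single job and leaves out `probability=True`, since neither is needed just to rank the settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

parameters = {"kernel": ("poly", "rbf"), "C": [0.01, 0.1, 1, 10, 100]}
X, y = make_classification(n_samples=200, n_features=5, random_state=12345)

# n_jobs=1 is an assumption for a small example; raise it on a bigger machine.
clf = GridSearchCV(SVC(), parameters, n_jobs=1)
clf.fit(X, y)

# cv_results_ holds one entry per parameter combination.
rows = sorted(
    zip(
        clf.cv_results_["rank_test_score"],
        clf.cv_results_["mean_test_score"],
        clf.cv_results_["std_test_score"],
        clf.cv_results_["params"],
    ),
    key=lambda row: row[0],  # best-ranked combination first
)
for rank, mean, std, params in rows:
    print(f"rank {rank:2d}  score {mean:.3f} +/- {std:.3f}  {params}")

print(clf.best_params_)
```

The first row printed should agree with `clf.best_params_`, and the spread in the `std` column gives a sense of how much to trust small differences between settings.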
Filed under machine learning, software engineering
I like the term OneHotEncoder
Dummy variable just sounds demeaning to me. http://stats.stackexchange.com/questions/149122/treating-missing-data-in-voting-pattern-analysis/149572#149572
Filed under machine learning