6.13.1. scikits.learn.grid_search.GridSearchCV

class scikits.learn.grid_search.GridSearchCV(estimator, param_grid, loss_func=None, score_func=None, fit_params={}, n_jobs=1, iid=True, refit=True, cv=None)

Grid search on the parameters of a classifier

Important members are fit, predict.

GridSearchCV implements a “fit” method and a “predict” method like any classifier, except that the parameters of the classifier used to predict are optimized by cross-validation.

Parameters:

estimator : object type that implements the “fit” and “predict” methods

An object of that type is instantiated for each grid point.

param_grid : dict

A dictionary of parameters that are used to generate the grid.

loss_func : callable, optional

A function that takes two arguments and compares them in order to evaluate the performance of the prediction (small is good). If None is passed, the score of the estimator is maximized.

score_func : callable, optional

A function that takes two arguments and compares them in order to evaluate the performance of the prediction (big is good). If None is passed, the score of the estimator is maximized.

fit_params : dict, optional

Parameters to pass to the fit method.

n_jobs : int, optional

Number of jobs to run in parallel (default 1).

iid : boolean, optional

If True, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, not the mean loss across the folds.

cv : cross-validation generator, optional

See the scikits.learn.cross_val module; an explicit generator can be passed as shown in the sketch after this list.

refit : boolean

Refit the best estimator with the entire dataset.
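As an illustration (this sketch is not part of the original reference), a parameter grid can be combined with an explicit cross-validation generator from scikits.learn.cross_val. The StratifiedKFold signature used here (targets, number of folds) is assumed for this release:

>>> from scikits.learn import svm, grid_search, cross_val, datasets
>>> iris = datasets.load_iris()
>>> # each key names an estimator parameter, each value lists the settings to try
>>> param_grid = {'kernel': ('linear', 'rbf'), 'C': [1, 10, 100]}
>>> # 5 stratified folds over the iris targets (signature assumed for this release)
>>> cv = cross_val.StratifiedKFold(iris.target, 5)
>>> clf = grid_search.GridSearchCV(svm.SVC(), param_grid, cv=cv)
>>> _ = clf.fit(iris.data, iris.target)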

Notes

The parameters selected are those that maximize the score of the left-out data, unless an explicit score_func is passed, in which case it is used instead. If a loss function loss_func is passed, it overrides the score functions and is minimized.
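For illustration only (not from the original reference): a hand-written score_func. It is assumed here that score_func receives the held-out targets and the corresponding predictions; the accuracy function below is symmetric, so the argument order does not matter in this sketch.

>>> import numpy as np
>>> from scikits.learn import svm, grid_search
>>> def accuracy(y_true, y_pred):
...     # fraction of matching labels; larger is better
...     return np.mean(y_true == y_pred)
...
>>> clf = grid_search.GridSearchCV(svm.SVC(), {'C': [1, 10]}, score_func=accuracy)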

Examples

>>> from scikits.learn import svm, grid_search, datasets
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svr = svm.SVR()
>>> clf = grid_search.GridSearchCV(svr, parameters)
>>> clf.fit(iris.data, iris.target) 
GridSearchCV(n_jobs=1, fit_params={}, loss_func=None, refit=True, cv=None,
       iid=True,
       estimator=SVR(kernel='rbf', C=1.0, probability=False, ...
       ...
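Since refit defaults to True, the fitted grid-search object can itself be used for prediction, like any other estimator. Continuing the example above:

>>> predicted = clf.predict(iris.data)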

Methods

fit(X[, y])  Run fit with all sets of parameters
score(X[, y])

__init__(estimator, param_grid, loss_func=None, score_func=None, fit_params={}, n_jobs=1, iid=True, refit=True, cv=None)

fit(X, y=None, **params)

Run fit with all sets of parameters.

Returns the best classifier.

Parameters:

X : array, [n_samples, n_features]

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array, [n_samples] or None

Target vector relative to X; None for unsupervised problems.