8.3.2. sklearn.cross_validation.KFold

class sklearn.cross_validation.KFold(n, n_folds=3, indices=True, shuffle=False, random_state=None, k=None)

K-Folds cross validation iterator.

Provides train/test indices to split data into train and test sets. Splits the dataset into k consecutive folds (without shuffling by default).

Each fold is then used as a validation set once, while the k - 1 remaining folds form the training set.
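The splitting scheme described above can be sketched in plain NumPy, without relying on the class itself. The helper name `consecutive_folds` is hypothetical and not part of scikit-learn. The sketch assumes no shuffling and assumes that any leftover elements go to the last fold.

```python
import numpy as np


def consecutive_folds(n, n_folds):
    """Yield (train_index, test_index) pairs for consecutive folds.

    Hypothetical helper for illustration only, not the library's code.
    """
    indices = np.arange(n)
    fold_size = n // n_folds
    for i in range(n_folds):
        start = i * fold_size
        # Assumption: the last fold absorbs any leftover elements.
        stop = (i + 1) * fold_size if i < n_folds - 1 else n
        test_index = indices[start:stop]
        train_index = np.concatenate([indices[:start], indices[stop:]])
        yield train_index, test_index


folds = list(consecutive_folds(6, 3))
for train_index, test_index in folds:
    print("TRAIN:", train_index, "TEST:", test_index)
```

Each index lands in exactly one test fold, so every element is used for validation once and for training k - 1 times.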

Parameters:

n : int

Total number of elements.

n_folds : int, default=3

Number of folds.

indices : boolean, optional (default True)

Return train/test split as arrays of indices, rather than a boolean mask array. Integer indices are required when dealing with sparse matrices, since those cannot be indexed by boolean masks.

shuffle : boolean, optional (default False)

Whether to shuffle the data before splitting into batches.

random_state : int or RandomState, optional (default None)

Pseudo-random number generator state used for random sampling. Only relevant when shuffle is True.
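The optional behaviours described in the parameter list can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the class's implementation: the array `X`, the fold chosen, and the seed value 0 are arbitrary, and the shuffle is modelled as a permutation of the element order that is then cut into folds.

```python
import numpy as np

n = 6
X = np.arange(12).reshape(n, 2)

# Integer indices for one test fold, the form returned when indices=True.
test_index = np.array([2, 3])

# Equivalent boolean mask, the form described for indices=False.
test_mask = np.zeros(n, dtype=bool)
test_mask[test_index] = True

# On a dense array both forms select the same rows.
assert np.array_equal(X[test_index], X[test_mask])

# Shuffling (assumed model): permute the element order with a seeded
# generator, then take consecutive slices of the permuted order.
rng = np.random.RandomState(0)
shuffled = rng.permutation(n)
first_test_fold = shuffled[:2]

# Re-seeding with the same value reproduces the same permutation,
# which is what makes a fixed random_state useful for repeatable splits.
assert np.array_equal(np.random.RandomState(0).permutation(n), shuffled)
```

Passing an integer seed rather than relying on the global generator keeps the folds identical from one run to the next.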

See also

StratifiedKFold
takes label information into account to avoid building folds with imbalanced class distributions (for classification tasks).

Notes

All folds have size trunc(n_samples / n_folds), except the last one, which contains the remaining samples. Here n_samples is the value passed as n.
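The arithmetic behind this note can be worked through with a small pure-Python sketch. The function name `fold_sizes` is hypothetical and only restates the rule given above.

```python
def fold_sizes(n, n_folds):
    """Return the test-fold sizes implied by the note above.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    base = n // n_folds  # equals trunc(n / n_folds) for non-negative integers
    sizes = [base] * n_folds
    # The last fold takes whatever is left after the first n_folds - 1 folds.
    sizes[-1] = n - base * (n_folds - 1)
    return sizes


print(fold_sizes(4, 2))   # [2, 2]
print(fold_sizes(10, 3))  # [3, 3, 4]
print(fold_sizes(11, 4))  # [2, 2, 2, 5]
```

When n is not a multiple of n_folds, the last fold can be noticeably larger than the others, as the third call shows.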

Examples

>>> import numpy as np
>>> from sklearn import cross_validation
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = cross_validation.KFold(4, n_folds=2)
>>> len(kf)
2
>>> print(kf)
sklearn.cross_validation.KFold(n=4, n_folds=2)
>>> for train_index, test_index in kf:
...    print("TRAIN:", train_index, "TEST:", test_index)
...    X_train, X_test = X[train_index], X[test_index]
...    y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]

__init__(n, n_folds=3, indices=True, shuffle=False, random_state=None, k=None)