8.7.2. sklearn.ensemble.RandomTreesEmbedding

class sklearn.ensemble.RandomTreesEmbedding(n_estimators=10, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_density=0.1, n_jobs=1, random_state=None, verbose=0)

An ensemble of totally random trees.

An unsupervised transformation of a dataset to a high-dimensional sparse representation. A datapoint is coded according to which leaf of each tree it is sorted into. Using a one-hot encoding of the leaves, this leads to a binary coding with as many ones as trees in the forest.

The dimensionality of the resulting representation is at most n_estimators * 2 ** max_depth, because a tree of depth max_depth has at most 2 ** max_depth leaves.
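The coding described above can be checked on a small example. The following is a minimal sketch, assuming an installed scikit-learn whose constructor accepts the n_estimators, max_depth and random_state arguments; the toy data and variable names are illustrative and not part of the library.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data: 200 samples with 4 features (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(200, 4)

n_estimators, max_depth = 10, 3
embedder = RandomTreesEmbedding(
    n_estimators=n_estimators, max_depth=max_depth, random_state=0
)
X_sparse = embedder.fit_transform(X)

# Each sample lands in exactly one leaf per tree, so every row of the
# one-hot coding contains n_estimators ones.
ones_per_row = np.asarray(X_sparse.sum(axis=1)).ravel()

# A tree of depth max_depth has at most 2 ** max_depth leaves, which
# bounds the number of output columns.
n_out = X_sparse.shape[1]
upper_bound = n_estimators * 2 ** max_depth

print(X_sparse.shape)
print(ones_per_row.min(), ones_per_row.max())
print(n_out <= upper_bound)
```

The number of output columns depends on how many leaves the random trees actually grow, so it can be smaller than the bound.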

Parameters:

n_estimators : int, optional (default=10)

Number of trees in the forest.

max_depth : int, optional (default=5)

Maximum depth of each tree.

min_samples_split : integer, optional (default=2)

The minimum number of samples required to split an internal node. Note: this parameter is tree-specific.

min_samples_leaf : integer, optional (default=1)

The minimum number of samples in newly created leaves. A split is discarded if, after the split, one of the leaves would contain fewer than min_samples_leaf samples. Note: this parameter is tree-specific.

min_density : float, optional (default=0.1)

This parameter controls a trade-off in an optimization heuristic: it sets the minimum density of the sample_mask (i.e. the fraction of samples in the mask). If the density falls below this threshold, the mask is recomputed and the input data is packed, which results in data copying. If min_density equals one, the partitions are always represented as copies of the original data. Otherwise, partitions are represented as bit masks (also known as sample masks).

n_jobs : integer, optional (default=1)

The number of jobs to run in parallel. If -1, then the number of jobs is set to the number of cores.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random.
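As an illustration of the integer case, the sketch below fits two separate estimators with the same seed and compares their output. It assumes an installed scikit-learn; the data and the names first, second, coding_a and coding_b are made up for this example.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data (illustrative only).
X = np.random.RandomState(42).rand(100, 3)

# Two independent estimators seeded with the same integer.
first = RandomTreesEmbedding(n_estimators=5, max_depth=3, random_state=0)
second = RandomTreesEmbedding(n_estimators=5, max_depth=3, random_state=0)

# Convert the sparse codings to dense arrays so they can be compared.
coding_a = first.fit_transform(X).toarray()
coding_b = second.fit_transform(X).toarray()

same_coding = np.array_equal(coding_a, coding_b)
print(same_coding)
```

With random_state=None the trees are drawn from the global np.random state, so repeated runs would generally produce different codings.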

verbose : int, optional (default=0)

Controls the verbosity of the tree building process.

References

[R76]

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.

[R77]

F. Moosmann, B. Triggs, and F. Jurie, “Fast discriminative visual codebooks using randomized clustering forests”, NIPS 2007.

Attributes

estimators_ : list of ExtraTreeRegressor

The collection of fitted sub-estimators.

Methods

`apply`(X)	Apply trees in the forest to X, return leaf indices.
`fit`(X[, y])	Fit estimator.
`fit_transform`(X[, y])	Fit estimator and transform dataset.
`get_params`([deep])	Get parameters for the estimator.
`set_params`(**params)	Set the parameters of the estimator.
`transform`(X)	Transform dataset.

__init__(n_estimators=10, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_density=0.1, n_jobs=1, random_state=None, verbose=0)

apply(X)

Apply trees in the forest to X, return leaf indices.

Parameters:

X : array-like, shape = [n_samples, n_features]

Input data.

Returns:

X_leaves : array-like, shape = [n_samples, n_estimators]

For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in.
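A short sketch of calling apply on a fitted estimator follows. It assumes an installed scikit-learn; the data and the helper name leaves_per_tree are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data: 50 samples with 2 features (illustrative only).
X = np.random.RandomState(0).rand(50, 2)

embedder = RandomTreesEmbedding(n_estimators=4, max_depth=2, random_state=0)
embedder.fit(X)

# One row per sample, one column per tree; each entry is a leaf index.
X_leaves = embedder.apply(X)
print(X_leaves.shape)

# Number of distinct leaves reached in each tree.
leaves_per_tree = [
    len(np.unique(X_leaves[:, j])) for j in range(X_leaves.shape[1])
]
print(leaves_per_tree)
```

The leaf indices are node identifiers within each tree, so they are only meaningful relative to the tree in the same column.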

fit(X, y=None)

Fit estimator.

Parameters:

X : array-like, shape=(n_samples, n_features)

Input data used to build forests.

fit_transform(X, y=None)

Fit estimator and transform dataset.

Parameters:

X : array-like, shape=(n_samples, n_features)

Input data used to build forests.

Returns:

X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.

get_params(deep=True)

Get parameters for the estimator.

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

set_params(**params)

Set the parameters of the estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object.

Returns:

self : estimator instance
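The two parameter forms can be sketched as follows, assuming an installed scikit-learn. The step names "embed" and "clf" and the choice of BernoulliNB as the second step are hypothetical choices made for this example.

```python
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Simple estimator: parameters are addressed by their plain names.
embedder = RandomTreesEmbedding(n_estimators=10, max_depth=3)
returned = embedder.set_params(n_estimators=20)

# Nested object: "embed" and "clf" are hypothetical step names chosen here.
pipeline = Pipeline([("embed", embedder), ("clf", BernoulliNB())])

# <component>__<parameter> reaches into the named step of the pipeline.
pipeline.set_params(embed__max_depth=4)

params = embedder.get_params()
print(params["n_estimators"], params["max_depth"])
```

Because set_params returns the estimator itself, calls can be chained with other methods such as fit.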

transform(X)

Transform dataset.

Parameters:

X : array-like, shape=(n_samples, n_features)

Input data to be transformed.

Returns:

X_transformed : sparse matrix, shape=(n_samples, n_out)

Transformed dataset.
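A minimal sketch of transforming previously unseen samples with an already fitted estimator is given below. It assumes an installed scikit-learn; the arrays X_train and X_new are toy data invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy training data and new samples (illustrative only).
rng = np.random.RandomState(0)
X_train = rng.rand(150, 3)
X_new = rng.rand(20, 3)

embedder = RandomTreesEmbedding(n_estimators=8, max_depth=3, random_state=0)
X_train_coded = embedder.fit_transform(X_train)

# Unseen samples are routed through the already fitted trees.
X_new_coded = embedder.transform(X_new)

# Both codings share the same number of columns (n_out).
print(X_train_coded.shape, X_new_coded.shape)
```

The column count n_out is fixed when the estimator is fitted, so codings of new data can be fed to a model trained on the coded training set.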
