8.1.1.5. sklearn.cluster.MeanShift
- class sklearn.cluster.MeanShift(bandwidth=None, seeds=None, bin_seeding=False, cluster_all=True)
MeanShift clustering
Parameters: bandwidth : float, optional
Bandwidth used in the flat kernel. If not set, the bandwidth is estimated. See clustering.estimate_bandwidth.
seeds : array [n_samples, n_features], optional
Seeds used to initialize kernels. If not set, the seeds are calculated by clustering.get_bin_seeds with bandwidth as the grid size and default values for other parameters.
cluster_all : boolean, default True
If true, then all points are clustered, even those orphans that are not within any kernel. Orphans are assigned to the nearest kernel. If false, then orphans are given cluster label -1.
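The effect of cluster_all can be illustrated with a minimal sketch. The data, the explicit seeds, and the bandwidth value below are assumptions chosen for illustration. Seeds are passed explicitly because, when every sample is used as a seed, an isolated point would simply become its own kernel rather than an orphan.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Two tight groups plus one far-away point (illustrative data).
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1],
    [30.0, 30.0],  # orphan: farther than the bandwidth from every kernel
])

# Explicit seeds, so that no kernel starts on the far-away point.
seeds = np.array([[0.0, 0.0], [10.0, 10.0]])

strict = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=False).fit(X)
greedy = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=True).fit(X)

labels_strict = strict.labels_  # the orphan keeps the label -1
labels_all = greedy.labels_     # the orphan joins the nearest kernel

print(labels_strict)
print(labels_all)
```

With cluster_all=False the last point is left unassigned, which can be useful when outliers should be filtered rather than forced into a cluster.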
Notes
Scalability:
Because this implementation uses a flat kernel and a Ball Tree to look up members of each kernel, the complexity will tend towards O(T*n*log(n)) in lower dimensions, with n the number of samples and T the number of points. In higher dimensions the complexity will tend towards O(T*n^2).
Scalability can be boosted by using fewer seeds, for example by using a higher value of min_bin_freq in the get_bin_seeds function.
Note that the estimate_bandwidth function is much less scalable than the mean shift algorithm and will be the bottleneck if it is used.
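The two scalability notes above can be sketched as follows. This is an illustration under assumptions, not a benchmark: the synthetic data, the quantile, the subsample size, and the min_bin_freq threshold are arbitrary, and the n_samples argument of estimate_bandwidth is assumed to be available in the installed version.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth, get_bin_seeds
from sklearn.datasets import make_blobs

# Synthetic data: three well-separated blobs (illustrative only).
X, _ = make_blobs(
    n_samples=2000,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=0.7,
    random_state=0,
)

# Estimate the bandwidth on a subsample to limit the cost of this step.
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=300, random_state=0)

# Bin the points on a grid of size `bandwidth`; a higher min_bin_freq
# keeps only the well-populated bins and therefore yields fewer seeds.
seeds_many = get_bin_seeds(X, bin_size=bandwidth, min_bin_freq=1)
seeds_few = get_bin_seeds(X, bin_size=bandwidth, min_bin_freq=20)

ms = MeanShift(bandwidth=bandwidth, seeds=seeds_few).fit(X)

print(len(X), len(seeds_many), len(seeds_few))
print(ms.cluster_centers_.shape)
```

Each seed starts one kernel, so reducing the number of seeds directly reduces the number of hill-climbing runs performed during fit.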
References
Dorin Comaniciu and Peter Meer, “Mean Shift: A robust approach toward feature space analysis”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002. pp. 603-619.
Attributes
cluster_centers_ : array, [n_clusters, n_features]
Coordinates of cluster centers.
labels_ :
Labels of each point.
Methods
fit(X) Compute MeanShift
fit_predict(X[, y]) Performs clustering on X and returns cluster labels.
get_params([deep]) Get parameters for the estimator
set_params(**params) Set the parameters of the estimator.
- __init__(bandwidth=None, seeds=None, bin_seeding=False, cluster_all=True)
- fit(X)
Compute MeanShift
Parameters: X : array-like, shape=[n_samples, n_features]
Input points.
- fit_predict(X, y=None)
Performs clustering on X and returns cluster labels.
Parameters: X : ndarray, shape (n_samples, n_features)
Input data.
Returns: y : ndarray, shape (n_samples,)
cluster labels
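As a minimal sketch of fit_predict, the snippet below clusters six hand-picked points and compares the result with calling fit and reading labels_. The data and the bandwidth value are assumptions chosen so that two groups are clearly separated.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Six points forming two visually distinct groups (illustrative data).
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])

# fit_predict fits the model and returns one label per sample.
labels = MeanShift(bandwidth=2).fit_predict(X)

# The same labels are available from a fitted estimator via labels_.
fitted = MeanShift(bandwidth=2).fit(X)

print(labels)
print(fitted.cluster_centers_)
```

The numeric value of each label is arbitrary; only the grouping it induces is meaningful.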
- get_params(deep=True)
Get parameters for the estimator
Parameters: deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- set_params(**params)
Set the parameters of the estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: self :
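The following sketch shows get_params and set_params on a plain estimator and on a nested object. The pipeline, its step names ("scale", "cluster"), and the parameter values are assumptions introduced for illustration only.

```python
from sklearn.cluster import MeanShift
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are addressed by their plain names.
ms = MeanShift(bandwidth=2.0)
params = ms.get_params()  # dict of constructor parameters
returned = ms.set_params(bandwidth=3.0, cluster_all=False)  # returns ms itself

# Nested object: parameters use the <component>__<parameter> form.
# The step names "scale" and "cluster" are chosen for this example.
pipe = Pipeline([("scale", StandardScaler()), ("cluster", MeanShift())])
pipe.set_params(cluster__bandwidth=1.5)
nested_bandwidth = pipe.get_params()["cluster__bandwidth"]

print(params["bandwidth"], ms.bandwidth, nested_bandwidth)
```

Because set_params returns the estimator, calls can be chained, for example ms.set_params(bandwidth=3.0).fit(X).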