scikits.learn.cluster.KMeans
- class scikits.learn.cluster.KMeans(k=8, init='random', n_init=10, max_iter=300)
K-Means clustering
Parameters
data : ndarray
A M by N array of M observations in N dimensions or a length M array of M one-dimensional observations.
k : int or ndarray
The number of clusters to form as well as the number of centroids to generate. If init is ‘matrix’, or if an ndarray is given, it is instead interpreted as the initial cluster centers to use.
n_iter : int
Number of iterations of the k-means algorithm to run. Note that this differs in meaning from the iters parameter to the kmeans function.
init : {‘k-means++’, ‘random’, ‘points’, ‘matrix’}
Method for initialization. The class signature above sets init='random' as the default:
‘k-means++’: selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See the Notes section in k_init for more details.
‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data.
‘points’: choose k observations (rows) at random from data for the initial centroids.
‘matrix’: interpret the k parameter as a k by N (or length k array for one-dimensional data) array of initial centroids.
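The initialization strategies listed above can be sketched in plain Python. This is an illustrative sketch only, not the library's code: the function names init_points, init_random and init_kmeans_pp are hypothetical, data is a list of rows rather than an ndarray, and the ‘k-means++’ variant shown is the textbook seeding rule (each new center is sampled with probability proportional to its squared distance from the nearest center already chosen), which may differ from what k_init actually does.

```python
import random


def init_points(data, k, rng):
    """'points': pick k distinct observations (rows) at random."""
    return [list(row) for row in rng.sample(data, k)]


def init_random(data, k, rng):
    """'random': draw k centroids from a per-feature Gaussian fitted to data."""
    n = len(data)
    columns = list(zip(*data))  # one tuple per feature
    means = [sum(col) / n for col in columns]
    stds = [(sum((v - m) ** 2 for v in col) / n) ** 0.5
            for col, m in zip(columns, means)]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(k)]


def init_kmeans_pp(data, k, rng):
    """Textbook k-means++ seeding (assumed variant): favour far-away points."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(row, c)) for c in centers)
              for row in data]
        # Sample the next center with probability proportional to d2.
        centers.append(list(rng.choices(data, weights=d2, k=1)[0]))
    return centers


rng = random.Random(0)
data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [9.0, 0.1], [9.1, 0.0]]
for init in (init_points, init_random, init_kmeans_pp):
    print(init.__name__, init(data, 3, rng))
```

Note that ‘points’ and the k-means++ sketch always return actual observations, whereas ‘random’ can place centroids anywhere the fitted Gaussian allows, including far from every observation.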
Notes
The k-means problem is solved using the Lloyd algorithm.
The average complexity is O(k n T), where n is the number of samples and T is the number of iterations.
The worst case complexity is given by O(n^(k+2/p)) with n = n_samples, p = n_features. (D. Arthur and S. Vassilvitskii, ‘How slow is the k-means method?’ SoCG2006)
In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it can converge to a local minimum, so it can be useful to restart it several times.
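To make the notes above concrete, here is a minimal plain-Python sketch of Lloyd's algorithm with restarts. It is not the library's implementation: the names sq_dist, lloyd and kmeans_restarts are hypothetical, the data is a list of rows instead of an ndarray, and the restart loop assumes a ‘points’-style initialization. Each iteration performs k times n distance evaluations, which is where the O(k n T) average cost comes from.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(data, centers, max_iter=300):
    """One run of Lloyd's algorithm from the given starting centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: k * n distance evaluations per iteration.
        new_labels = [min(range(len(centers)),
                          key=lambda j: sq_dist(row, centers[j]))
                      for row in data]
        if new_labels == labels:
            break  # assignments are stable: a (possibly local) minimum
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances from each point to its center.
    inertia = sum(sq_dist(row, centers[lab]) for row, lab in zip(data, labels))
    return centers, labels, inertia


def kmeans_restarts(data, k, n_init=10, max_iter=300, seed=0):
    """Run Lloyd's algorithm n_init times and keep the lowest-inertia run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        start = rng.sample(data, k)  # assumed 'points'-style initialization
        result = lloyd(data, start, max_iter)
        if best is None or result[2] < best[2]:
            best = result
    return best


data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
        [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
        [9.0, 0.0], [9.2, 0.1], [8.9, 0.3]]
centers, labels, inertia = kmeans_restarts(data, k=3)
print(labels, round(inertia, 3))
```

Starting lloyd from two centers inside the same group, for example [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]], converges to a partition that merges the other two groups and has a much higher inertia. That is the local-minimum behaviour the restarts are meant to guard against.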
Attributes
cluster_centers_ : array, [n_clusters, n_features]
Coordinates of cluster centers.
labels_
Labels of each point.
inertia_ : float
The value of the inertia criterion associated with the chosen partition.
Methods
fit(X): Compute K-Means clustering
- __init__(k=8, init='random', n_init=10, max_iter=300)
- fit(X, **params)
Compute k-means