6.6.1. scikits.learn.mixture.GMM¶
- class scikits.learn.mixture.GMM(n_states=1, cvtype='diag')¶
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters : n_states : int
Number of mixture components.
cvtype : string (read-only)
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
Examples
>>> import numpy as np
>>> from scikits.learn import mixture
>>> g = mixture.GMM(n_states=2)

>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> np.random.seed(0)
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(cvtype='diag', n_states=2)
>>> g.weights
array([ 0.25, 0.75])
>>> g.means
array([[ 0.05980802], [ 9.94199467]])
>>> g.covars
[array([[ 1.01682662]]), array([[ 0.96080513]])]
>>> np.round(g.weights, 2)
array([ 0.25, 0.75])
>>> np.round(g.means, 2)
array([[ 0.06], [ 9.94]])
>>> np.round(g.covars, 2)
array([[[ 1.02]], [[ 0.96]]])
>>> g.predict([[0], [2], [9], [10]])
array([0, 0, 1, 1])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.32, -4.16, -1.65, -1.19])

>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(cvtype='diag', n_states=2)
>>> np.round(g.weights, 2)
array([ 0.5, 0.5])
Attributes
cvtype            Covariance type of the model.
n_states          Number of mixture components in the model.
weights           Mixing weights for each mixture component.
means             Mean parameters for each mixture component.
covars            Return covars as a full matrix.
n_features (int)  Dimensionality of the Gaussians.
Methods
decode(X)   Find most likely mixture components for each point in X.
eval(X)     Compute the log likelihood of X under the model and the posterior distribution over mixture components.
fit(X)      Estimate model parameters from X using the EM algorithm.
predict(X)  Like decode, find most likely mixture components for each observation in X.
rvs(n=1)    Generate n samples from the model.
score(X)    Compute the log likelihood of X under the model.
- __init__(n_states=1, cvtype='diag')¶
- covars¶
Return covars as a full matrix.
- cvtype¶
Covariance type of the model.
Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
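The four covariance types differ in how many parameters each component keeps. As a rough illustration (not the library's code), the sketch below expands each type into full per-component matrices, which is what the covars attribute is described as returning. The helper name to_full_covars and the storage shapes noted in the comments are assumptions made for this example.

```python
import numpy as np


def to_full_covars(covars, cvtype, n_states, n_features):
    """Hypothetical helper: expand covariance parameters to shape
    (n_states, n_features, n_features)."""
    covars = np.asarray(covars, dtype=float)
    eye = np.eye(n_features)
    if cvtype == "spherical":
        # Assumed storage: one variance per component, shape (n_states,).
        return covars[:, None, None] * eye
    if cvtype == "diag":
        # Assumed storage: per-feature variances, shape (n_states, n_features).
        return covars[:, :, None] * eye
    if cvtype == "tied":
        # Assumed storage: one matrix shared by every component.
        return np.tile(covars, (n_states, 1, 1))
    if cvtype == "full":
        # Assumed storage: one full matrix per component.
        return covars.reshape(n_states, n_features, n_features)
    raise ValueError("cvtype must be 'spherical', 'tied', 'diag' or 'full'")


# Two components in two dimensions with diagonal covariances.
full = to_full_covars([[1.0, 2.0], [3.0, 4.0]], "diag", 2, 2)
print(full.shape)
print(full[1])
```

Under these assumptions, 'spherical' stores the fewest parameters and 'full' the most, so the choice trades model flexibility against the amount of data needed to estimate the covariances reliably.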
- decode(obs)¶
Find most likely mixture components for each point in obs.
Parameters : obs : array_like, shape (n, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprobs : array_like, shape (n,)
Log probability of each point in obs under the model.
components : array_like, shape (n,)
Index of the most likely mixture component for each observation.
- eval(obs)¶
Evaluate the model on data
Compute the log probability of obs under the model and return the posterior distribution (responsibilities) of each mixture component for each element of obs.
Parameters : obs : array_like, shape (n, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprob : array_like, shape (n,)
Log probabilities of each data point in obs
posteriors : array_like, shape (n, n_states)
Posterior probabilities of each mixture component for each observation
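To make the two return values concrete, here is a minimal NumPy sketch of the computation for a diagonal-covariance mixture. It is an illustration under stated assumptions, not the library's implementation: the function name gmm_eval is hypothetical, and the parameter values are copied from the fitted model in the Examples section.

```python
import numpy as np


def gmm_eval(obs, weights, means, variances):
    """Hypothetical sketch: per-point log probability and posteriors
    for a mixture of Gaussians with diagonal covariances."""
    obs = np.asarray(obs, dtype=float)              # (n, n_features)
    weights = np.asarray(weights, dtype=float)      # (n_states,)
    means = np.asarray(means, dtype=float)          # (n_states, n_features)
    variances = np.asarray(variances, dtype=float)  # (n_states, n_features)

    # Log density of every point under every component, shape (n, n_states).
    diff = obs[:, None, :] - means[None, :, :]
    log_dens = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                       + (diff ** 2 / variances).sum(axis=2))

    # Add the log mixing weights, then marginalize with a stable log-sum-exp.
    joint = np.log(weights) + log_dens
    top = joint.max(axis=1, keepdims=True)
    logprob = top[:, 0] + np.log(np.exp(joint - top).sum(axis=1))

    # Posterior (responsibility) of each component for each point.
    posteriors = np.exp(joint - logprob[:, None])
    return logprob, posteriors


# Parameters taken from the fitted model in the Examples section.
weights = [0.25, 0.75]
means = [[0.05980802], [9.94199467]]
variances = [[1.01682662], [0.96080513]]

logprob, posteriors = gmm_eval([[0], [2], [9], [10]], weights, means, variances)
print(np.round(logprob, 2))
print(posteriors.argmax(axis=1))
```

With these parameters the rounded log probabilities agree with the score output in the Examples section, and taking the argmax of the posteriors gives the same component indices as predict there.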
- fit(X, n_iter=10, min_covar=0.001, thresh=0.01, params='wmc', init_params='wmc')¶
Estimate model parameters with the expectation-maximization algorithm.
An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ‘’. Likewise, to perform only the initialization, call this method with n_iter=0.
Parameters : X : array_like, shape (n, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
n_iter : int, optional
Number of EM iterations to perform.
min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
thresh : float, optional
Convergence threshold.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
init_params : string, optional
Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
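The interplay of n_iter, min_covar, thresh and params can be sketched with a small EM loop for diagonal covariances. This is a simplified illustration, not the library's code: the name em_fit_diag, the initialization scheme, and the stopping rule (gain in total log-likelihood below thresh) are all assumptions made for the example.

```python
import numpy as np


def _log_joint(X, weights, means, variances):
    """log(weight_k) + log N(x | mean_k, diag(variance_k)), shape (n, n_states)."""
    diff = X[:, None, :] - means[None, :, :]
    log_dens = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)
                       + (diff ** 2 / variances).sum(axis=2))
    return np.log(weights) + log_dens


def em_fit_diag(X, n_states=2, n_iter=10, min_covar=1e-3, thresh=1e-2,
                params="wmc"):
    """Hypothetical EM loop for a diagonal-covariance mixture."""
    X = np.asarray(X, dtype=float)
    # Assumed initialization: uniform weights, means spread between the
    # per-feature minimum and maximum, variances set to the data variance.
    weights = np.full(n_states, 1.0 / n_states)
    means = np.linspace(X.min(axis=0), X.max(axis=0), n_states)
    variances = np.tile(X.var(axis=0) + min_covar, (n_states, 1))

    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities via a numerically stable log-sum-exp.
        joint = _log_joint(X, weights, means, variances)
        top = joint.max(axis=1, keepdims=True)
        logprob = top[:, 0] + np.log(np.exp(joint - top).sum(axis=1))
        resp = np.exp(joint - logprob[:, None])

        # Assumed stopping rule: total log-likelihood gain below thresh.
        loglik = logprob.sum()
        if loglik - prev_loglik < thresh:
            break
        prev_loglik = loglik

        # M-step: update only the parameter groups named in params.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty components
        if "w" in params:
            weights = nk / nk.sum()
        if "m" in params:
            means = resp.T @ X / nk[:, None]
        if "c" in params:
            diff = X[:, None, :] - means[None, :, :]
            variances = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None]
            variances = np.maximum(variances, min_covar)  # covariance floor
    return weights, means, variances


# Same two-mode data as in the Examples section.
rng = np.random.RandomState(0)
obs = np.concatenate((rng.randn(100, 1), 10 + rng.randn(300, 1)))
weights, means, variances = em_fit_diag(obs, n_states=2)
print(np.round(weights, 2), np.round(means.ravel(), 2))
```

On this well-separated data the sketch should land close to the weights and means shown in the Examples section. Passing n_iter=0 returns the initial parameters untouched, and passing params='m', for instance, leaves the weights and variances at their initial values while the means are re-estimated.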
- means¶
Mean parameters for each mixture component.
- n_states¶
Number of mixture components in the model.
- predict(X)¶
Predict label for data.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : C : array, shape = [n_samples]
- predict_proba(X)¶
Predict posterior probability of data under each Gaussian in the model.
Parameters : X : array-like, shape = [n_samples, n_features]
Returns : T : array-like, shape = [n_samples, n_states]
Returns the probability of the sample for each Gaussian (state) in the model.
- rvs(n=1)¶
Generate random samples from the model.
Parameters : n : int
Number of samples to generate.
Returns : obs : array_like, shape (n, n_features)
List of samples
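Sampling from a mixture is usually done in two stages: pick a component according to the mixing weights, then draw from that component's Gaussian. The sketch below shows this for diagonal covariances. The function name sample_gmm and its seed argument are hypothetical and not part of the documented API.

```python
import numpy as np


def sample_gmm(weights, means, variances, n=1, seed=None):
    """Hypothetical sketch: draw n samples from a diagonal-covariance mixture."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)      # (n_states,)
    means = np.asarray(means, dtype=float)          # (n_states, n_features)
    variances = np.asarray(variances, dtype=float)  # (n_states, n_features)

    # Stage 1: choose a component index for every sample.
    comps = rng.choice(len(weights), size=n, p=weights)
    # Stage 2: scale and shift standard normal noise per chosen component.
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + np.sqrt(variances[comps]) * noise


samples = sample_gmm([0.25, 0.75], [[0.0], [10.0]], [[1.0], [1.0]],
                     n=2000, seed=0)
print(samples.shape)
print((samples[:, 0] > 5).mean())  # fraction drawn from the mode near 10
```

The returned array has one row per sample, matching the documented shape (n, n_features), and the fraction of points near each mode should track the mixing weights as n grows.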
- score(obs)¶
Compute the log probability under the model.
Parameters : obs : array_like, shape (n, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
Returns : logprob : array_like, shape (n,)
Log probabilities of each data point in obs
- weights¶
Mixing weights for each mixture component.