scikits.learn.gmm.GMM
- class scikits.learn.gmm.GMM(n_states=1, n_dim=1, cvtype='diag', weights=None, means=None, covars=None)
Gaussian Mixture Model
Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.
Examples
>>> import numpy as np
>>> from scikits.learn.gmm import GMM
>>> g = GMM(n_states=2, n_dim=1)
>>> # The initial parameters are fixed.
>>> np.round(g.weights, 2)
array([ 0.5,  0.5])
>>> np.round(g.means, 2)
array([[ 0.],
       [ 0.]])
>>> np.round(g.covars, 2)
array([[[ 1.]],
<BLANKLINE>
       [[ 1.]]])
>>> # Generate random observations with two modes centered on 0
>>> # and 10 to use for training.
>>> np.random.seed(0)
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g.fit(obs)
GMM(n_dim=1, cvtype='diag',
    means=array([[ ...],
                 [ ...]]),
    covars=[array([[ ...]]), array([[ ...]])],
    n_states=2, weights=array([ 0.75,  0.25]))
>>> np.round(g.weights, 2)
array([ 0.75,  0.25])
>>> np.round(g.means, 2)
array([[ 9.94],
       [ 0.06]])
>>> np.round(g.covars, 2)
array([[[ 0.96]],
<BLANKLINE>
       [[ 1.02]]])
>>> g.predict([[0], [2], [9], [10]])
array([1, 1, 0, 0])
>>> np.round(g.score([[0], [2], [9], [10]]), 2)
array([-2.32, -4.16, -1.65, -1.19])
>>> # Refit the model on new data (initial parameters remain the
>>> # same), this time with an even split between the two modes.
>>> g.fit(20 * [[0]] + 20 * [[10]])
GMM(n_dim=1, cvtype='diag',
    means=array([[ 10.],
                 [  0.]]),
    covars=[array([[ 0.001]]), array([[ 0.001]])],
    n_states=2, weights=array([ 0.5,  0.5]))
>>> np.round(g.weights, 2)
array([ 0.5,  0.5])
Attributes
cvtype
n_states
weights
means
covars
n_dim : int
    Dimensionality of the Gaussians.
labels : list, len n_states
    Optional labels for each mixture component.

Methods
decode(X)     Find the most likely mixture component for each point in X.
eval(X)       Compute the log likelihood of X under the model and the
              posterior distribution over mixture components.
fit(X)        Estimate model parameters from X using the EM algorithm.
predict(X)    Like decode, find the most likely mixture component for each
              observation in X.
rvs(n=1)      Generate n samples from the model.
score(X)      Compute the log likelihood of X under the model.

- __init__(n_states=1, n_dim=1, cvtype='diag', weights=None, means=None, covars=None)
Create a Gaussian mixture model
Initializes parameters such that every mixture component has zero mean and identity covariance.
Parameters:
n_states : int
Number of mixture components.
n_dim : int
Dimensionality of the mixture components.
cvtype : string (read-only)
String describing the type of covariance parameters to use. Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’. Defaults to ‘diag’.
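As a minimal construction sketch (using only the constructor arguments documented above), the following builds a two-component model over three-dimensional data with one full covariance matrix per component:

>>> from scikits.learn.gmm import GMM
>>> g = GMM(n_states=2, n_dim=3, cvtype='full')
>>> g.n_states
2
>>> g.cvtype
'full'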
- covars
Return covars as a full matrix.
- cvtype
Covariance type of the model.
Must be one of ‘spherical’, ‘tied’, ‘diag’, ‘full’.
- decode(obs)
Find most likely mixture components for each point in obs.
Parameters:
obs : array_like, shape (n, n_dim)
List of n_dim-dimensional data points. Each row corresponds to a single data point.
Returns:
logprobs : array_like, shape (n,)
Log probability of each point in obs under the model.
components : array_like, shape (n,)
Index of the most likely mixture component for each observation.
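A short decode sketch, refitting a model on the same two-mode data as the Examples section above; the returned components are hard assignments, one per observation (exact log-probability values depend on the fit and are elided):

>>> import numpy as np
>>> from scikits.learn.gmm import GMM
>>> np.random.seed(0)
>>> obs = np.concatenate((np.random.randn(100, 1),
...                       10 + np.random.randn(300, 1)))
>>> g = GMM(n_states=2, n_dim=1)
>>> _ = g.fit(obs)  # fit returns the model itself; assign to suppress the repr
>>> logprobs, components = g.decode([[0], [2], [9], [10]])
>>> logprobs.shape    # one log probability per observation
(4,)
>>> components.shape  # index of the most likely component per observation
(4,)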
- eval(obs)
Evaluate the model on data
Compute the log probability of obs under the model and return the posterior distribution (responsibilities) of each mixture component for each element of obs.
Parameters:
obs : array_like, shape (n, n_dim)
List of n_dim-dimensional data points. Each row corresponds to a single data point.
Returns:
logprob : array_like, shape (n,)
Log probabilities of each data point in obs
posteriors : array_like, shape (n, n_states)
Posterior probabilities of each mixture component for each observation
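Whereas decode returns hard assignments, eval exposes the soft assignments: the posterior responsibilities of each component for each point. A sketch, reusing the fitted model g from the decode example above; because posteriors are probabilities over components, each row sums to 1:

>>> logprob, posteriors = g.eval([[0], [2], [9], [10]])
>>> posteriors.shape                     # (n, n_states)
(4, 2)
>>> np.round(posteriors.sum(axis=1), 2)  # responsibilities sum to 1 per point
array([ 1.,  1.,  1.,  1.])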
- fit(X, n_iter=10, min_covar=0.001, thresh=0.01, params='wmc', init_params='wmc', **kwargs)
Estimate model parameters with the expectation-maximization algorithm.
An initialization step is performed before entering the EM algorithm. If you want to avoid this step, set the keyword argument init_params to the empty string ‘’. Likewise, if you would like to perform only the initialization, call this method with n_iter=0.
Parameters:
X : array_like, shape (n, n_dim)
List of n_dim-dimensional data points. Each row corresponds to a single data point.
n_iter : int, optional
Number of EM iterations to perform.
min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
thresh : float, optional
Convergence threshold.
params : string, optional
Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
init_params : string, optional
Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars. Defaults to ‘wmc’.
kwargs : keyword, optional
Keyword arguments passed to scipy.cluster.vq.kmeans2
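A sketch of the initialization controls described above, reusing obs from the decode example; each call uses only the documented n_iter, init_params, and params arguments:

>>> g = GMM(n_states=2, n_dim=1)
>>> _ = g.fit(obs, n_iter=0)        # initialization only, no EM iterations
>>> _ = g.fit(obs, init_params='')  # resume EM from the current parameters
>>> _ = g.fit(obs, params='wm')     # update weights and means; covars stay fixed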
- means
Mean parameters for each mixture component.
- n_states
Number of mixture components in the model.
- predict(X)
Predict label for data.
Parameters:
X : array-like, shape = [n_samples, n_features]

Returns:
C : array, shape = [n_samples]
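predict returns the same hard assignments as the components output of decode; a brief sketch with the fitted model g from above:

>>> labels = g.predict([[0], [2], [9], [10]])
>>> labels.shape   # one component label per sample
(4,)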
- rvs(n=1)
Generate random samples from the model.
Parameters:
n : int
Number of samples to generate.
Returns:
obs : array_like, shape (n, n_dim)
List of samples
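A sampling sketch with the fitted model g from above; each draw is an n_dim-dimensional observation from the mixture:

>>> samples = g.rvs(n=1000)
>>> samples.shape   # (n, n_dim)
(1000, 1)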
- score(obs)
Compute the log probability under the model.
Parameters:
obs : array_like, shape (n, n_dim)
List of n_dim-dimensional data points. Each row corresponds to a single data point.
Returns:
logprob : array_like, shape (n,)
Log probabilities of each data point in obs
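score returns one log likelihood per observation; summing them gives the total log likelihood of a dataset under the model, a common quantity when comparing fitted models. A sketch with g and obs from above:

>>> per_sample = g.score(obs)
>>> per_sample.shape          # one value per observation
(400,)
>>> total = per_sample.sum()  # total log likelihood of obs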
- weights
Mixing weights for each mixture component.