.. _example_applications_plot_model_complexity_influence.py:

==========================
Model Complexity Influence
==========================

Demonstrate how model complexity influences both prediction accuracy and
computational performance.

The dataset is the Boston Housing dataset for regression and the
20 Newsgroups dataset for classification.

For each class of models we make the model complexity vary through the
choice of relevant model parameters and measure the influence on both
computational performance (latency) and predictive power (MSE or
Hamming Loss).

.. rst-class:: horizontal

    *

      .. image:: images/plot_model_complexity_influence_001.png
            :scale: 47

    *

      .. image:: images/plot_model_complexity_influence_002.png
            :scale: 47

    *

      .. image:: images/plot_model_complexity_influence_003.png
            :scale: 47

**Script output**::

  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.25, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 4516 | Hamming Loss (Misclassification Ratio): 0.2562 | Pred. Time: 0.036248s

  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.5, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 1668 | Hamming Loss (Misclassification Ratio): 0.2939 | Pred. Time: 0.026719s

  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.75, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 890 | Hamming Loss (Misclassification Ratio): 0.3194 | Pred. Time: 0.021321s

  Benchmarking SGDClassifier(alpha=0.001, class_weight=None, epsilon=0.1, eta0=0.0,
         fit_intercept=True, l1_ratio=0.9, learning_rate='optimal',
         loss='modified_huber', n_iter=5, n_jobs=1, penalty='elasticnet',
         power_t=0.5, random_state=None, shuffle=False, verbose=0,
         warm_start=False)
  Complexity: 669 | Hamming Loss (Misclassification Ratio): 0.3347 | Pred. Time: 0.019336s

  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3,
        gamma=3.0517578125e-05, kernel='rbf', max_iter=-1, nu=0.1,
        probability=False, random_state=None, shrinking=True, tol=0.001,
        verbose=False)
  Complexity: 69 | MSE: 31.8133 | Pred. Time: 0.000538s

  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3,
        gamma=3.0517578125e-05, kernel='rbf', max_iter=-1, nu=0.25,
        probability=False, random_state=None, shrinking=True, tol=0.001,
        verbose=False)
  Complexity: 136 | MSE: 25.6140 | Pred. Time: 0.000998s

  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3,
        gamma=3.0517578125e-05, kernel='rbf', max_iter=-1, nu=0.5,
        probability=False, random_state=None, shrinking=True, tol=0.001,
        verbose=False)
  Complexity: 243 | MSE: 22.3315 | Pred. Time: 0.001719s

  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3,
        gamma=3.0517578125e-05, kernel='rbf', max_iter=-1, nu=0.75,
        probability=False, random_state=None, shrinking=True, tol=0.001,
        verbose=False)
  Complexity: 350 | MSE: 21.3679 | Pred. Time: 0.002448s

  Benchmarking NuSVR(C=1000.0, cache_size=200, coef0=0.0, degree=3,
        gamma=3.0517578125e-05, kernel='rbf', max_iter=-1, nu=0.9,
        probability=False, random_state=None, shrinking=True, tol=0.001,
        verbose=False)
  Complexity: 404 | MSE: 21.0915 | Pred. Time: 0.002822s

  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1,
               loss='ls', max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=10,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 10 | MSE: 28.4402 | Pred. Time: 0.000073s

  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1,
               loss='ls', max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=50,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 50 | MSE: 7.8822 | Pred. Time: 0.000173s

  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1,
               loss='ls', max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=100,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 100 | MSE: 6.6961 | Pred. Time: 0.000280s

  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1,
               loss='ls', max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=200,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 200 | MSE: 5.8514 | Pred. Time: 0.000488s

  Benchmarking GradientBoostingRegressor(alpha=0.9, init=None, learning_rate=0.1,
               loss='ls', max_depth=3, max_features=None, max_leaf_nodes=None,
               min_samples_leaf=1, min_samples_split=2, n_estimators=500,
               random_state=None, subsample=1.0, verbose=0, warm_start=False)
  Complexity: 500 | MSE: 6.0121 | Pred. Time: 0.001166s

**Python source code:** :download:`plot_model_complexity_influence.py`

.. literalinclude:: plot_model_complexity_influence.py
    :lines: 16-

**Total running time of the example:** 29.36 seconds
( 0 minutes 29.36 seconds)
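
The full example source is included above. As a minimal, self-contained
sketch of the same measurement loop (this is not the example's actual
code: it substitutes ``sklearn.datasets.make_regression`` for the Boston
Housing data and benchmarks only the ``GradientBoostingRegressor`` case,
so it runs on any recent scikit-learn version), one could write::

    import time

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the Boston Housing data (13 features).
    X, y = make_regression(n_samples=506, n_features=13, noise=10.0,
                           random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # Model complexity is controlled here by n_estimators; for each
    # setting, record the test-set MSE and the wall-clock prediction
    # latency, mirroring the "Complexity | MSE | Pred. Time" lines above.
    for n_estimators in (10, 50, 100, 200, 500):
        model = GradientBoostingRegressor(n_estimators=n_estimators,
                                          random_state=0)
        model.fit(X_train, y_train)

        start = time.time()
        y_pred = model.predict(X_test)
        latency = time.time() - start

        print("Complexity: %d | MSE: %.4f | Pred. Time: %.6fs"
              % (n_estimators, mean_squared_error(y_test, y_pred),
                 latency))

As in the script output above, the error typically drops steeply for the
first increases in complexity and then flattens out (or even worsens),
while prediction latency grows roughly linearly with the number of
estimators.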