Lesson 33 - Grid Search

The following topics are discussed in this notebook:

  • Using GridSearchCV for hyperparameter selection.

Grid search is a method for performing hyperparameter tuning for a model. This technique involves identifying one or more hyperparameters that you would like to tune, and then selecting some number of candidate values to consider for each hyperparameter. We then evaluate each possible combination of these candidate values by performing some type of validation. Typically, this involves performing cross-validation to generate an out-of-sample performance estimate for each combination, and we then select the combination that produces the highest cross-validation score.
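Conceptually, grid search is just a loop over candidate hyperparameter values, with each candidate scored by cross-validation. The snippet below is a minimal sketch of that idea, using a small throwaway dataset and a handful of illustrative values of C for a logistic regression model; GridSearchCV, introduced later in this lesson, automates exactly this kind of loop.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import numpy as np

# Small synthetic problem, used only for this sketch.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

candidate_C = [0.01, 0.1, 1, 10]   # illustrative candidate values
best_score, best_C = -np.inf, None

for C in candidate_C:
    model = LogisticRegression(C=C, max_iter=1000)
    # Mean 5-fold cross-validation accuracy for this candidate value.
    score = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy').mean()
    if score > best_score:
        best_score, best_C = score, C

print(best_C, best_score)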

Import Packages

We will illustrate how to perform grid search in Scikit-Learn in this lesson. We begin by importing a few packages and tools that are not directly related to grid search.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.datasets import make_classification

Generate Data

In this lesson, we will work with a synthetic dataset created for a classification problem. The dataset contains 400 observations, each of which has 6 features and is assigned one of 10 possible classes. The features are stored in an array named X, while the labels are stored in an array named y. We start by viewing the contents of X in a DataFrame format.

In [2]:
np.random.seed(1)
X, y = make_classification(n_samples=400, n_features=6, n_informative=6, 
                           n_redundant=0, n_classes=10, class_sep=2)

pd.DataFrame(X)   
Out[2]:
0 1 2 3 4 5
0 3.219743 -0.855040 1.057843 -2.180787 -1.946133 -0.744607
1 -0.038669 2.188514 -0.217455 -0.164903 -1.419136 1.940659
2 4.105727 0.484263 3.613972 -0.691252 2.483372 -2.685359
3 1.780591 1.294739 -2.106925 -1.441586 -1.973862 2.725750
4 -3.989205 -1.455436 1.104988 -1.800713 -2.176374 1.209061
... ... ... ... ... ... ...
395 2.500843 3.062626 -2.145771 2.160640 2.514152 -2.183479
396 2.573652 2.614137 1.166412 3.494776 -2.391065 0.161473
397 -0.894672 0.993666 0.036938 -0.096246 -3.065324 -1.059657
398 -3.439928 2.070202 -0.051047 -3.818329 -1.081785 1.702601
399 1.353171 -3.241138 2.331002 -2.089317 5.452808 2.473093

400 rows × 6 columns

Let's now view the first few elements of y.

In [3]:
print(y[:20])
[7 9 8 9 0 8 5 4 1 0 0 6 2 1 3 8 6 0 5 4]

Grid Search in Scikit-Learn

Grid search is performed in Scikit-Learn using the GridSearchCV class. We will import this class in the cell below.

In [4]:
from sklearn.model_selection import GridSearchCV

Grid Search with Logistic Regression

We will illustrate the usage of GridSearchCV by first performing hyperparameter tuning to select the optimal value of the regularization parameter C in a logistic regression model.

We start by defining a parameter grid. This is a list containing one or more dictionaries, with each dictionary containing keys for the hyperparameters we wish to tune over. The value associated with each key should be a list or array of candidate values to consider for that hyperparameter.

In [5]:
param_grid = [
    {'C': 10**np.linspace(-3,3,20)}
]

We then create an instance of the estimator that we wish to tune over. In this case, that is the LogisticRegression class. Note that we do not fit the model to the training data yet.

In [6]:
lin_reg = LogisticRegression(solver='lbfgs', multi_class='multinomial', max_iter=1000)

We then create an instance of the GridSearchCV class. When creating this instance, we must provide an estimator, a parameter grid, the number of folds to use in cross-validation, and an evaluation metric to use when scoring each fold. If we specify refit=True (which is the default value), then GridSearchCV will automatically fit the best model found to the entire data set. We will discuss this more later.

After creating an instance of GridSearchCV, we train it using the fit method.

A trained GridSearchCV object has many attributes and methods that we might be interested in. We will explore these in more detail later, but for now, the most important attributes are best_score_ and best_params_. The best_score_ attribute will contain the cross-validation score for the best model found, while best_params_ will be a dictionary of the hyperparameter values that generated the optimal cross-validation score.

In [7]:
lr_gridsearch = GridSearchCV(lin_reg, param_grid, cv=10, scoring='accuracy', 
                             refit=True, iid=False)
lr_gridsearch.fit(X, y)

print(lr_gridsearch.best_score_)
print(lr_gridsearch.best_params_)
0.6270556816033276
{'C': 0.1623776739188721}

We see that the highest cross-validation score obtained for any of the values of C considered was approximately 62.7%. This score was obtained by using C = 0.16237767.
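To get a sense of how sensitive the cross-validation score is to C, we can plot the mean score for each candidate value. The cell below is an optional sketch that reads the scores from the cv_results_ attribute of lr_gridsearch, an attribute we will look at more closely at the end of this lesson.

c_values = param_grid[0]['C']
cv_scores = lr_gridsearch.cv_results_['mean_test_score']

plt.figure(figsize=[6, 4])
plt.plot(c_values, cv_scores, marker='o')
plt.xscale('log')   # the candidate values of C were generated on a log scale
plt.xlabel('C')
plt.ylabel('Mean CV Accuracy')
plt.show()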

Obtaining the Best Model

When trained with refit=True, the GridSearchCV object will automatically refit a final model to the full training set using the optimal hyperparameter values found. This model is stored in the attribute best_estimator_.

In the cell below, we extract the best model from our GridSearchCV object and use it to calculate the training accuracy for this model.

In [8]:
lr_model = lr_gridsearch.best_estimator_
print('Training Score:', lr_model.score(X, y))
Training Score: 0.6775
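Because we set refit=True, the trained GridSearchCV object can also be used directly as a fitted estimator; its score and predict methods delegate to best_estimator_. The quick check below should reproduce the training score computed in the previous cell.

# With refit=True, score and predict are delegated to best_estimator_.
print('Training Score:', lr_gridsearch.score(X, y))
print(lr_gridsearch.predict(X[:5]))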

Grid Search with Decision Trees

We will now illustrate how to use GridSearchCV to perform hyperparameter tuning for a decision tree. We will tune over two hyperparameters: max_depth and min_samples_leaf.

In [9]:
param_grid = [{
    'max_depth': [2, 4, 8, 16, 32, 64], 
    'min_samples_leaf': [2, 4, 8, 16]
}]

tree = DecisionTreeClassifier()

np.random.seed(1)
dt_gridsearch = GridSearchCV(tree, param_grid, cv=10, scoring='accuracy', 
                             refit=True, iid=False)
dt_gridsearch.fit(X, y)

print(dt_gridsearch.best_score_)
print(dt_gridsearch.best_params_)
0.6941086846914981
{'max_depth': 32, 'min_samples_leaf': 8}

The decision tree with the highest cross-validation score had a max_depth of 32 and a min_samples_leaf of 8. Notice that this model outperforms the best logistic regression model that we found above. In the cell below, we extract the best model from the GridSearchCV object, and calculate its score on the training set.

In [10]:
dt_model = dt_gridsearch.best_estimator_
print('Training Score:', dt_model.score(X, y))
Training Score: 0.8225

Grid Search with Random Forests

We will now illustrate how to use GridSearchCV to perform hyperparameter tuning for a random forest. We will tune over two hyperparameters: max_depth and min_samples_leaf. We will set the n_estimators hyperparameter to 200.

In [11]:
param_grid = [{
    'max_depth':[2, 4, 8, 16, 32, 64], 
    'min_samples_leaf':[2, 4, 8, 16]
}]

forest = RandomForestClassifier(n_estimators=200)

np.random.seed(1)
rf_gridsearch = GridSearchCV(forest, param_grid, cv=10, scoring='accuracy',
                             refit=True, iid=False)
rf_gridsearch.fit(X, y)

print(rf_gridsearch.best_score_)
print(rf_gridsearch.best_params_)
0.8217596268985945
{'max_depth': 8, 'min_samples_leaf': 4}

The random forest with the highest cross-validation score had a max_depth of 8 and a min_samples_leaf of 4. This model outperforms either of our previous two models. In the cell below, we extract the best model from the GridSearchCV object, and calculate its score on the training set.

In [12]:
rf_model = rf_gridsearch.best_estimator_
print('Training Score:', rf_model.score(X, y))
Training Score: 0.955

Exploring Grid Search Results

If we would like to see more detailed results from the grid search process, we can look at the cv_results_ attribute of a trained instance of the GridSearchCV class. This attribute contains a dictionary with several pieces of information pertaining to the results of the cross-validation steps. We will start by looking at the keys of the items stored in this dictionary.

In [13]:
cv_res = rf_gridsearch.cv_results_
print(cv_res.keys())
dict_keys(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time', 'param_max_depth', 'param_min_samples_leaf', 'params', 'split0_test_score', 'split1_test_score', 'split2_test_score', 'split3_test_score', 'split4_test_score', 'split5_test_score', 'split6_test_score', 'split7_test_score', 'split8_test_score', 'split9_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score'])

The items split0_test_score through split9_test_score each contain the validation scores of every model considered on one particular fold. The average validation score for each individual model can be found in the mean_test_score item.

In [14]:
print(cv_res['mean_test_score'])
[0.54909498 0.55449471 0.55295157 0.53729265 0.77979139 0.77207459
 0.78416478 0.76469654 0.80936299 0.82175963 0.79683919 0.77479139
 0.81936299 0.81675963 0.79694255 0.76457459 0.81459838 0.81380743
 0.78677822 0.76918337 0.81964076 0.8120374  0.78190017 0.7727012 ]

In the cell below, we print the average test scores along with the hyperparameter values for the models that generated them.

In [15]:
for score, params in zip(cv_res['mean_test_score'], cv_res['params']):
    print(score, params)
0.5490949769962816 {'max_depth': 2, 'min_samples_leaf': 2}
0.5544947059935716 {'max_depth': 2, 'min_samples_leaf': 4}
0.5529515661435684 {'max_depth': 2, 'min_samples_leaf': 8}
0.5372926514148862 {'max_depth': 2, 'min_samples_leaf': 16}
0.7797913909371651 {'max_depth': 4, 'min_samples_leaf': 2}
0.7720745887691435 {'max_depth': 4, 'min_samples_leaf': 4}
0.784164775950085 {'max_depth': 4, 'min_samples_leaf': 8}
0.7646965399886556 {'max_depth': 4, 'min_samples_leaf': 16}
0.8093629860717211 {'max_depth': 8, 'min_samples_leaf': 2}
0.8217596268985945 {'max_depth': 8, 'min_samples_leaf': 4}
0.7968391945547362 {'max_depth': 8, 'min_samples_leaf': 8}
0.7747913909371652 {'max_depth': 8, 'min_samples_leaf': 16}
0.8193629860717211 {'max_depth': 16, 'min_samples_leaf': 2}
0.8167596268985946 {'max_depth': 16, 'min_samples_leaf': 4}
0.7969425537278628 {'max_depth': 16, 'min_samples_leaf': 8}
0.7645745887691435 {'max_depth': 16, 'min_samples_leaf': 16}
0.8145983802861284 {'max_depth': 32, 'min_samples_leaf': 2}
0.8138074305161656 {'max_depth': 32, 'min_samples_leaf': 4}
0.7867782189449801 {'max_depth': 32, 'min_samples_leaf': 8}
0.7691833679964707 {'max_depth': 32, 'min_samples_leaf': 16}
0.8196407638494991 {'max_depth': 64, 'min_samples_leaf': 2}
0.8120374046763723 {'max_depth': 64, 'min_samples_leaf': 4}
0.7819001701644923 {'max_depth': 64, 'min_samples_leaf': 8}
0.7727012037562236 {'max_depth': 64, 'min_samples_leaf': 16}

We see that although the max_depth=8, min_samples_leaf=4 model performed the best, there were a few other models that had very similar results.
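To compare all 24 combinations at a glance, one option (a sketch using pandas and the cv_res dictionary above) is to pivot the mean test scores into a table with one row per max_depth value and one column per min_samples_leaf value.

results_df = pd.DataFrame({
    'max_depth': cv_res['param_max_depth'],
    'min_samples_leaf': cv_res['param_min_samples_leaf'],
    'mean_test_score': cv_res['mean_test_score']
})

# One row per max_depth, one column per min_samples_leaf.
print(results_df.pivot(index='max_depth', columns='min_samples_leaf',
                       values='mean_test_score').round(3))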