import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
For some supervised learning models, it is important for all of the numerical features to be on roughly the same scale. As it turns out, this does not apply to linear regression, but it does apply to certain variants of linear regression that we will consider soon.
In this lesson, we will learn about two types of feature scaling (normalization and standardization), and will see how to use tools from Scikit-Learn to perform these scaling methods. We will illustrate these concepts using a subset of the Boston Housing dataset.
In the next few cells, we will load the dataset and will select a subset of the features to work with.
df = pd.read_csv('data/BostonHousingV2.txt', sep='\t')
df = df.iloc[:, [5, 6, 11, 12, 16]]
df.head(n=10)
X = df.iloc[:, 1:].values  # features: every retained column after the first
y = df.iloc[:, 0].values   # target: the first retained column
print(X.shape)
print(y.shape)
# Hold out 40% of the data, then split the holdout in half,
# giving a 60% / 20% / 20% train / validation / test split.
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_holdout, y_holdout, test_size=0.5, random_state=1)
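As a quick check, we can print the shapes of the three feature arrays; roughly 60%, 20%, and 20% of the observations should land in the training, validation, and test sets, respectively.
# Confirm the sizes of the three splits
print(X_train.shape, X_val.shape, X_test.shape)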
np.set_printoptions(suppress=True)  # print floats in plain (non-scientific) notation

# Column-wise summary statistics for the (unscaled) training features
X_min = X_train.min(axis=0)
X_max = X_train.max(axis=0)
X_avg = X_train.mean(axis=0)
X_std = X_train.std(axis=0)
print(np.vstack([X_min, X_max, X_avg, X_std]))  # rows: min, max, mean, std
Let $X$ refer to a feature in a dataset. Let $x_{min}$ be the smallest value of $X$ for observations in the training set, and let $x_{max}$ be the largest value of $X$ for observations in the training set.
If $x_0$ is a value of $X$ for an observation (from any set), its normalized value is given by: $\large z_0 = \frac{x_0 - x_{min}}{x_{max} - x_{min}}$
When using normalization to scale features, each feature in the scaled training set will have values ranging between 0 and 1.
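Before turning to Scikit-Learn, we can sketch the formula directly in NumPy using the X_min and X_max arrays computed above. In this hand-rolled version (the name Xn_manual is used only for illustration), every column of the result should range from 0 to 1.
# Min-max normalization applied by hand, using the training-set
# minimums and maximums computed earlier.
Xn_manual = (X_train - X_min) / (X_max - X_min)
print(Xn_manual.min(axis=0))  # should be all zeros
print(Xn_manual.max(axis=0))  # should be all ones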
We can perform normalization using the MinMaxScaler class from sklearn.preprocessing.
from sklearn.preprocessing import MinMaxScaler
n_scaler = MinMaxScaler()
Xn_train = n_scaler.fit_transform(X_train)  # fit to the training data, then transform it
# Equivalent two-step version:
# n_scaler.fit(X_train)
# Xn_train = n_scaler.transform(X_train)
# Summary statistics for the normalized training features (rows: min, max, mean, std)
print(np.vstack([
    Xn_train.min(axis=0),
    Xn_train.max(axis=0),
    Xn_train.mean(axis=0),
    Xn_train.std(axis=0)
]))
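A fitted MinMaxScaler can also undo the transformation with its inverse_transform method. As a quick sanity check, recovering the original training features should succeed up to floating-point error.
# Sanity check: inverse_transform should recover the original features.
print(np.allclose(n_scaler.inverse_transform(Xn_train), X_train))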
We can use the scaler that was fit to the training set to also scale the features in the validation and test sets. Note that since the scaling parameters come from the training set, values in the scaled validation and test sets are not guaranteed to lie within [0, 1].
Xn_val = n_scaler.transform(X_val)
Xn_test = n_scaler.transform(X_test)
# Statistics for the normalized validation set (rows: min, max, mean, std)
print(np.vstack([
    Xn_val.min(axis=0),
    Xn_val.max(axis=0),
    Xn_val.mean(axis=0),
    Xn_val.std(axis=0)
]), '\n')

# Statistics for the normalized test set (rows: min, max, mean, std)
print(np.vstack([
    Xn_test.min(axis=0),
    Xn_test.max(axis=0),
    Xn_test.mean(axis=0),
    Xn_test.std(axis=0)
]))
Let $X$ refer to a feature in a dataset. Let $\bar x$ be the mean of $X$ values for observations in the training set, and let $s_X$ be the standard deviation of $X$ for observations in the training set.
If $x_0$ is a value of $X$ for an observation (from any set), its standardized value is given by: $\large z_0 = \frac{x_0 - \bar x}{s_X}$
When using standardization to scale features, each feature in the scaled training set will have a mean of 0 and a standard deviation of 1.
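As with normalization, we can apply the standardization formula by hand using the X_avg and X_std arrays computed above (Xs_manual is an illustrative name); each column of the result should have mean 0 and standard deviation 1.
# Standardization applied by hand, using the training-set
# means and standard deviations computed earlier.
Xs_manual = (X_train - X_avg) / X_std
print(Xs_manual.mean(axis=0))  # should be (approximately) all zeros
print(Xs_manual.std(axis=0))   # should be all ones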
We can perform standardization using the StandardScaler class from sklearn.preprocessing.
from sklearn.preprocessing import StandardScaler
s_scaler = StandardScaler()
Xs_train = s_scaler.fit_transform(X_train)  # fit to the training data, then transform it
# Equivalent two-step version:
# s_scaler.fit(X_train)
# Xs_train = s_scaler.transform(X_train)
# Summary statistics for the standardized training features (rows: min, max, mean, std)
print(np.vstack([
    Xs_train.min(axis=0),
    Xs_train.max(axis=0),
    Xs_train.mean(axis=0),
    Xs_train.std(axis=0)
]))
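The fitted scaler stores the training-set statistics it uses in its mean_ and scale_ attributes; these should match the X_avg and X_std arrays we computed earlier.
# Training-set statistics stored by the fitted scaler
print(s_scaler.mean_)   # should match X_avg
print(s_scaler.scale_)  # should match X_std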
We can use the scaler that was fit to the training set to also scale the features in the validation and test sets. Since the means and standard deviations come from the training set, the columns of the scaled validation and test sets will have means close to, but not exactly, 0 and standard deviations close to, but not exactly, 1.
Xs_val = s_scaler.transform(X_val)
Xs_test = s_scaler.transform(X_test)
# Statistics for the standardized validation set (rows: min, max, mean, std)
print(np.vstack([
    Xs_val.min(axis=0),
    Xs_val.max(axis=0),
    Xs_val.mean(axis=0),
    Xs_val.std(axis=0)
]), '\n')

# Statistics for the standardized test set (rows: min, max, mean, std)
print(np.vstack([
    Xs_test.min(axis=0),
    Xs_test.max(axis=0),
    Xs_test.mean(axis=0),
    Xs_test.std(axis=0)
]))
Although scaling is critical for the effectiveness of certain learning algorithms, basic linear regression is not one of them. To illustrate this fact, we will train three models: one on the unscaled data, one on the normalized data, and one on the standardized data. We will see that feature scaling does not affect model performance for linear regression.
# Model 1: trained on the unscaled features
mod1 = LinearRegression()
mod1.fit(X_train, y_train)
print('Training r-Squared: ', mod1.score(X_train, y_train))
print('Validation r-Squared: ', mod1.score(X_val, y_val))
# Model 2: trained on the normalized features
mod2 = LinearRegression()
mod2.fit(Xn_train, y_train)
print('Training r-Squared: ', mod2.score(Xn_train, y_train))
print('Validation r-Squared: ', mod2.score(Xn_val, y_val))
# Model 3: trained on the standardized features
mod3 = LinearRegression()
mod3.fit(Xs_train, y_train)
print('Training r-Squared: ', mod3.score(Xs_train, y_train))
print('Validation r-Squared: ', mod3.score(Xs_val, y_val))
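As a final check, the three models should produce identical predictions up to floating-point error: normalization and standardization are invertible linear transformations of the features, and linear regression absorbs any such transformation into its coefficients and intercept.
# All three models should agree on every validation-set prediction.
print(np.allclose(mod1.predict(X_val), mod2.predict(Xn_val)))
print(np.allclose(mod1.predict(X_val), mod3.predict(Xs_val)))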