Lesson 07 - Regression and Regularization

The following topics are discussed in this notebook:

  • Using a neural network to perform regression.
  • Feature scaling.
  • Applying regularization in a neural network.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import random

import keras
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import set_random_seed

from sklearn.model_selection import train_test_split
Using TensorFlow backend.

Boston Housing Data

In this example, we will be working with the "Boston Housing" dataset. This dataset contains data for 506 census tracts of Boston from the 1970 census.

The dataset contains the following 19 variables for each census tract:

  • town - name of town
  • tract - census tract
  • lon - longitude of census tract
  • lat - latitude of census tract
  • medv - median value of owner-occupied homes in USD 1000's
  • cmedv - corrected median value of owner-occupied homes in USD 1000's
  • crim - per capita crime rate by town
  • zn - proportion of residential land zoned for lots over 25,000 sq.ft
  • indus - proportion of non-retail business acres per town
  • chas - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  • nox - nitric oxides concentration (parts per 10 million)
  • rm - average number of rooms per dwelling
  • age - proportion of owner-occupied units built prior to 1940
  • dis - weighted distances to five Boston employment centres
  • rad - index of accessibility to radial highways
  • tax - full-value property-tax rate per USD 10,000
  • ptratio - pupil-teacher ratio by town
  • b - 1000(B - 0.63)^2, where B is the proportion of Black residents by town
  • lstat - percentage of lower status of the population

We will start by importing the dataset from a text file, and then viewing the first 10 rows.

In [2]:
df = pd.read_csv('data/BostonHousingV2.txt', sep='\t')
df.head(n=10)
Out[2]:
town tract lon lat medv cmedv crim zn indus chas nox rm age dis rad tax ptratio b lstat
0 Nahant 2011 -70.9550 42.2550 24.0 24.0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
1 Swampscott 2021 -70.9500 42.2875 21.6 21.6 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
2 Swampscott 2022 -70.9360 42.2830 34.7 34.7 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
3 Marblehead 2031 -70.9280 42.2930 33.4 33.4 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
4 Marblehead 2032 -70.9220 42.2980 36.2 36.2 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
5 Marblehead 2033 -70.9165 42.3040 28.7 28.7 0.02985 0.0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
6 Salem 2041 -70.9360 42.2970 22.9 22.9 0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.60 12.43
7 Salem 2042 -70.9375 42.3100 27.1 22.1 0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.90 19.15
8 Salem 2043 -70.9330 42.3120 16.5 16.5 0.21124 12.5 7.87 0 0.524 5.631 100.0 6.0821 5 311 15.2 386.63 29.93
9 Salem 2044 -70.9290 42.3160 18.9 18.9 0.17004 12.5 7.87 0 0.524 6.004 85.9 6.5921 5 311 15.2 386.71 17.10
In [3]:
print(df.shape)
(506, 19)
In [4]:
X = df.iloc[:,6:].values
y = df.iloc[:,5].values

print(X.shape)
print(y.shape)
(506, 13)
(506,)
In [5]:
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_holdout, y_holdout, test_size=0.5, random_state=1)

print(X_train.shape)
print(X_val.shape)
print(X_test.shape)
(303, 13)
(101, 13)
(102, 13)

Fit Linear Model

We will start by using Scikit-Learn to fit a linear regression model to serve as a baseline to compare our later models against.

In [6]:
from sklearn.linear_model import LinearRegression
In [7]:
lin_mod = LinearRegression()
lin_mod.fit(X_train, y_train)

print('Training r-Squared:  ', lin_mod.score(X_train, y_train))
print('Validation r-Squared:', lin_mod.score(X_val, y_val))
Training r-Squared:   0.7506185989057699
Validation r-Squared: 0.705775930720328

Scale Data

We will use StandardScaler from sklearn to scale our features. StandardScaler standardizes each feature by subtracting the training-set mean and dividing by the training-set standard deviation. Note that the scaler is fit on the training data only and then applied to the validation and test sets.

In [8]:
#from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
In [9]:
#scaler = MinMaxScaler()
scaler = StandardScaler()

scaler.fit(X_train)

Xs_train = scaler.transform(X_train)
Xs_val = scaler.transform(X_val)
Xs_test = scaler.transform(X_test)
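
For reference, the transform applied above is equivalent to the following manual computation (a minimal sketch; mu, sigma, and Xs_train_manual are illustrative names not used elsewhere in this notebook):

mu = X_train.mean(axis=0)                 # per-feature mean of the training data
sigma = X_train.std(axis=0)               # per-feature standard deviation (ddof=0, matching StandardScaler)
Xs_train_manual = (X_train - mu) / sigma  # same values as scaler.transform(X_train)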

Define Custom Metric for r-Squared

Unfortunately, Keras does not come equipped with a metric for reporting r-squared values in a regression problem. We can, however, create our own custom metrics to use with Keras.

In [10]:
import keras.backend as K

def r2(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))          # residual sum of squares
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))  # total sum of squares
    return 1 - SS_res / (SS_tot + K.epsilon())          # K.epsilon() guards against division by zero
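
As a quick sanity check (not part of the original notebook, and assuming the TensorFlow backend), we can confirm that this metric agrees with sklearn's r2_score on a small toy example:

from sklearn.metrics import r2_score

y_true_toy = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_toy = np.array([2.5, 0.0, 2.0, 8.0])

print(r2_score(y_true_toy, y_pred_toy))                            # reference value from sklearn
print(K.eval(r2(K.constant(y_true_toy), K.constant(y_pred_toy))))  # custom Keras metric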

Train a Neural Network

In [11]:
#####################################################
# Design and Train Model
#####################################################

np.random.seed(1)
set_random_seed(1)

model_1 = Sequential()
model_1.add(Dense(32, input_shape=(13,), activation='relu'))
model_1.add(Dense(16, activation='relu'))
model_1.add(Dense(1, activation='linear'))

opt = keras.optimizers.Adam(lr=0.01)
model_1.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_1.fit(Xs_train, y_train, batch_size=500, epochs=2000, validation_data = [Xs_val, y_val], verbose=0)

#####################################################
# Visualize Training Progress
#####################################################

m = 50
rng = range(m, len(h.history['loss']))

plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
plt.plot(rng, h.history['loss'][m:], label='Training')
plt.plot(rng, h.history['val_loss'][m:], label='Validation')
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
    
plt.subplot(1,2,2)
plt.plot(rng, h.history['r2'][m:], label='Training')
plt.plot(rng, h.history['val_r2'][m:], label='Validation')
plt.title('r2')
plt.ylabel('r2')
plt.xlabel('Epoch')
plt.legend()
plt.show()

#####################################################
# Find Minimum Validation Loss
#####################################################

print('-- Minimum Validation Loss --')
print('Loss: ', min(h.history['val_loss']))
print('Epoch:', np.argmin(h.history['val_loss']))
print()

print('-- Maximum Validation r2 --')
print('r2:   ', max(h.history['val_r2']))
print('Epoch:', np.argmax(h.history['val_r2']))
-- Minimum Validation Loss --
Loss:  12.294610977172852
Epoch: 209

-- Maximum Validation r2 --
r2:    0.8791505694389343
Epoch: 209

Retrain Model with Fewer Epochs

The validation loss above reached its minimum around epoch 209, so we retrain the same architecture for roughly that many epochs to avoid the overfitting that sets in with longer training.

In [12]:
np.random.seed(1)
set_random_seed(1)

model_2 = Sequential()
model_2.add(Dense(32, input_shape=(13,), activation='relu'))
model_2.add(Dense(16, activation='relu'))
model_2.add(Dense(1, activation='linear'))

opt = keras.optimizers.Adam(lr=0.01)
model_2.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_2.fit(Xs_train, y_train, batch_size=500, epochs=215, validation_data = [Xs_val, y_val], verbose=0)

tr_results = model_2.evaluate(Xs_train, y_train, verbose=0)
va_results = model_2.evaluate(Xs_val, y_val, verbose=0)

print('Training Loss:  ', tr_results[0])
print('Training r2:    ', tr_results[1], '\n')

print('Validation Loss:', va_results[0])
print('Validation r2:  ', va_results[1])
Training Loss:   7.33555284349045
Training r2:     0.8938835619306407 

Validation Loss: 12.319281096505646
Validation r2:   0.8616513769225319

Apply L2 Regularization

We will now train the same model, this time adding an L2 regularization term to the loss. In this scenario, our learning algorithm will try to minimize the following function:

$$\Large Loss = \frac{1}{n} SSE + \lambda \sum \left[w^{(d)}_{i,j} \right]^2$$

where $w^{(d)}_{i,j}$ is the weight leading from node $i$ in layer $d-1$ to node $j$ in layer $d$. The sum in the penalty term is taken over the non-bias (kernel) weights only.

The constant $\lambda$ is a model hyperparameter that we are free to choose.
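
To connect the formula to the code below: passing kernel_regularizer=regularizers.l2(0.2) to a Keras layer adds 0.2 times the sum of that layer's squared kernel (non-bias) weights to the loss, so the reported training loss includes the penalty. A minimal NumPy sketch of the penalized loss (the names here are illustrative and not used elsewhere in this notebook):

def penalized_loss(y_true, y_pred, weight_matrices, lam=0.2):
    mse = np.mean((y_true - y_pred) ** 2)                            # (1/n) * SSE
    l2_penalty = lam * sum(np.sum(W ** 2) for W in weight_matrices)  # sum of squared non-bias weights
    return mse + l2_penalty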

In [13]:
from keras import regularizers
In [14]:
#####################################################
# Design and Train Model
#####################################################

np.random.seed(1)
set_random_seed(1)

model_3 = Sequential()
model_3.add(Dense(32, input_shape=(13,), activation='relu', kernel_regularizer=regularizers.l2(0.2)))
model_3.add(Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.2)))
model_3.add(Dense(1, activation='linear', kernel_regularizer=regularizers.l2(0.2)))

opt = keras.optimizers.Adam(lr=0.01)
model_3.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_3.fit(Xs_train, y_train, batch_size=500, epochs=2000, validation_data = [Xs_val, y_val], verbose=0)

#####################################################
# Visualize Training Progress
#####################################################

m = 50
rng = range(m, len(h.history['loss']))

plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
plt.plot(rng, h.history['loss'][m:], label='Training')
plt.plot(rng, h.history['val_loss'][m:], label='Validation')
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
    
plt.subplot(1,2,2)
plt.plot(rng, h.history['r2'][m:], label='Training')
plt.plot(rng, h.history['val_r2'][m:], label='Validation')
plt.title('r2')
plt.ylabel('r2')
plt.xlabel('Epoch')
plt.legend()
plt.show()

#####################################################
# Find Minimum Validation Loss
#####################################################

print('-- Minimum Validation Loss --')
print('Loss: ', min(h.history['val_loss']))
print('Epoch:', np.argmin(h.history['val_loss']))
print()

print('-- Maximum Validation r2 --')
print('r2:   ', max(h.history['val_r2']))
print('Epoch:', np.argmax(h.history['val_r2']))
-- Minimum Validation Loss --
Loss:  16.652097702026367
Epoch: 1896

-- Maximum Validation r2 --
r2:    0.8931909203529358
Epoch: 1896

Apply Dropout Regularization

Dropout is an alternative form of regularization. During training, a Dropout layer randomly zeroes a fraction of the previous layer's outputs on each update (10% here), which discourages the network from relying too heavily on any single unit; at inference time the layer passes its inputs through unchanged.

In [15]:
from keras.layers import Dropout
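
A minimal NumPy sketch of the training-time behavior of the Dropout layers used below (Keras applies "inverted" dropout, rescaling the kept units by 1/(1 - rate); the names and rate here are illustrative):

rate = 0.1
activations = np.random.rand(1, 32)                # outputs of the previous layer
mask = np.random.rand(*activations.shape) >= rate  # keep each unit with probability 1 - rate
dropped = activations * mask / (1.0 - rate)        # rescale so the expected activation is unchanged
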
In [16]:
#####################################################
# Design and Train Model
#####################################################

np.random.seed(1)
set_random_seed(1)

model_4 = Sequential()
model_4.add(Dense(32, input_shape=(13,), activation='relu'))
model_4.add(Dropout(0.1))
model_4.add(Dense(16, activation='relu'))
model_4.add(Dropout(0.1))
model_4.add(Dense(1, activation='linear'))

opt = keras.optimizers.Adam(lr=0.001)
model_4.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_4.fit(Xs_train, y_train, batch_size=500, epochs=2000, validation_data = [Xs_val, y_val], verbose=0)

#####################################################
# Visualize Training Progress
#####################################################

m = 200
rng = range(m, len(h.history['loss']))

plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
plt.plot(rng, h.history['loss'][m:], label='Training')
plt.plot(rng, h.history['val_loss'][m:], label='Validation')
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
    
plt.subplot(1,2,2)
plt.plot(rng, h.history['r2'][m:], label='Training')
plt.plot(rng, h.history['val_r2'][m:], label='Validation')
plt.title('r2')
plt.ylabel('r2')
plt.xlabel('Epoch')
plt.legend()
plt.show()

#####################################################
# Find Minimum Validation Loss
#####################################################

print('-- Minimum Validation Loss --')
print('Loss: ', min(h.history['val_loss']))
print('Epoch:', np.argmin(h.history['val_loss']))
print()

print('-- Maximum Validation r2 --')
print('r2:   ', max(h.history['val_r2']))
print('Epoch:', np.argmax(h.history['val_r2']))
-- Minimum Validation Loss --
Loss:  12.617439270019531
Epoch: 1992

-- Maximum Validation r2 --
r2:    0.8759773969650269
Epoch: 1992

Evaluate Testing Performance

The L2-regularized model (model_3) achieved the highest validation r-squared, so we evaluate it on the held-out test set.

In [17]:
testing_results = model_3.evaluate(Xs_test, y_test, verbose=0)

print('Testing Loss: ', testing_results[0])
print('Testing r2:   ', testing_results[1])
Testing Loss:  19.094673867319145
Testing r2:    0.8409638159415301