Lesson 07 - Regression and Regularization

The following topics are discussed in this notebook:

  • Using a neural network to perform regression.
  • Feature scaling.
  • Applying regularization in a neural network.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import random

import keras
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import set_random_seed

from sklearn.model_selection import train_test_split
Using TensorFlow backend.

Boston Housing Data

In this example, we will be working with the "Boston Housing" dataset. This dataset contains data for 506 census tracts of Boston from the 1970 census.

The dataset contains the following 19 variables for each census tract:

  • town - name of town
  • tract - census tract
  • lon - longitude of census tract
  • lat - latitude of census tract
  • medv - median value of owner-occupied homes in USD 1000's
  • cmedv - corrected median value of owner-occupied homes in USD 1000's
  • crim - per capita crime rate by town
  • zn - proportion of residential land zoned for lots over 25,000 sq.ft
  • indus - proportion of non-retail business acres per town
  • chas - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  • nox - nitric oxides concentration (parts per 10 million)
  • rm - average number of rooms per dwelling
  • age - proportion of owner-occupied units built prior to 1940
  • dis - weighted distances to five Boston employment centres
  • rad - index of accessibility to radial highways
  • tax - full-value property-tax rate per USD 10,000
  • ptratio - pupil-teacher ratio by town
  • b - 1000(B - 0.63)^2, where B is the proportion of Black residents by town
  • lstat - percentage of lower status of the population

We will start by importing the dataset from a text file, and then viewing the first 10 rows.

In [2]:
df = pd.read_csv('data/BostonHousingV2.txt', sep='\t')
df.head(n=10)
Out[2]:
town tract lon lat medv cmedv crim zn indus chas nox rm age dis rad tax ptratio b lstat
0 Nahant 2011 -70.9550 42.2550 24.0 24.0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
1 Swampscott 2021 -70.9500 42.2875 21.6 21.6 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
2 Swampscott 2022 -70.9360 42.2830 34.7 34.7 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
3 Marblehead 2031 -70.9280 42.2930 33.4 33.4 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
4 Marblehead 2032 -70.9220 42.2980 36.2 36.2 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
5 Marblehead 2033 -70.9165 42.3040 28.7 28.7 0.02985 0.0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
6 Salem 2041 -70.9360 42.2970 22.9 22.9 0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.60 12.43
7 Salem 2042 -70.9375 42.3100 27.1 22.1 0.14455 12.5 7.87 0 0.524 6.172 96.1 5.9505 5 311 15.2 396.90 19.15
8 Salem 2043 -70.9330 42.3120 16.5 16.5 0.21124 12.5 7.87 0 0.524 5.631 100.0 6.0821 5 311 15.2 386.63 29.93
9 Salem 2044 -70.9290 42.3160 18.9 18.9 0.17004 12.5 7.87 0 0.524 6.004 85.9 6.5921 5 311 15.2 386.71 17.10
In [3]:
print(df.shape)
(506, 19)
In [4]:
X = df.iloc[:,6:].values
y = df.iloc[:,5].values

print(X.shape)
print(y.shape)
(506, 13)
(506,)
In [5]:
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_holdout, y_holdout, test_size=0.5, random_state=1)

print(X_train.shape)
print(X_val.shape)
print(X_test.shape)
(303, 13)
(101, 13)
(102, 13)

Fit Linear Model

We will start by using Scikit-Learn to fit a linear regression model to serve as a baseline to compare our later models against.

In [6]:
from sklearn.linear_model import LinearRegression
In [7]:
lin_mod = LinearRegression()
lin_mod.fit(X_train, y_train)

print('Training r-Squared:  ', lin_mod.score(X_train, y_train))
print('Validation r-Squared:', lin_mod.score(X_val, y_val))
Training r-Squared:   0.7506185989057699
Validation r-Squared: 0.705775930720328

Scale Data

We will use StandardScaler from sklearn to scale our features. StandardScaler standardizes each feature by subtracting the training-set mean and dividing by the training-set standard deviation. Note that the scaler is fit on the training data only and then applied to the validation and test sets.

In [8]:
#from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
In [9]:
#scaler = MinMaxScaler()
scaler = StandardScaler()

scaler.fit(X_train)

Xs_train = scaler.transform(X_train)
Xs_val = scaler.transform(X_val)
Xs_test = scaler.transform(X_test)
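
For reference, the transform applied above is equivalent to the following manual computation (a minimal sketch; mu, sigma, and Xs_train_manual are illustrative names not used elsewhere in this notebook):

mu = X_train.mean(axis=0)                 # per-feature mean of the training data
sigma = X_train.std(axis=0)               # per-feature standard deviation (ddof=0, matching StandardScaler)
Xs_train_manual = (X_train - mu) / sigma  # same values as scaler.transform(X_train)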

Define Custom Metric for r-Squared

Unfortunately, Keras does not come equipped with a metric for reporting r-squared values in a regression problem. We can, however, create our own custom metrics to use with Keras.

In [10]:
import keras.backend as K

def r2(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))          # residual sum of squares
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))  # total sum of squares
    return 1 - SS_res / (SS_tot + K.epsilon())          # K.epsilon() guards against division by zero
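
As a quick sanity check (not part of the original notebook, and assuming the TensorFlow backend), we can confirm that this metric agrees with sklearn's r2_score on a small toy example:

from sklearn.metrics import r2_score

y_true_toy = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_toy = np.array([2.5, 0.0, 2.0, 8.0])

print(r2_score(y_true_toy, y_pred_toy))                            # reference value from sklearn
print(K.eval(r2(K.constant(y_true_toy), K.constant(y_pred_toy))))  # custom Keras metric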

Train a Neural Network

In [11]:
#####################################################
# Design and Train Model
#####################################################

np.random.seed(1)
set_random_seed(1)

model_1 = Sequential()
model_1.add(Dense(32, input_shape=(13,), activation='relu'))
model_1.add(Dense(16, activation='relu'))
model_1.add(Dense(1, activation='linear'))

opt = keras.optimizers.Adam(lr=0.01)
model_1.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_1.fit(Xs_train, y_train, batch_size=500, epochs=2000, validation_data = [Xs_val, y_val], verbose=0)

#####################################################
# Visualize Training Progress
#####################################################

m = 50
rng = range(m, len(h.history['loss']))

plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
plt.plot(rng, h.history['loss'][m:], label='Training')
plt.plot(rng, h.history['val_loss'][m:], label='Validation')
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
    
plt.subplot(1,2,2)
plt.plot(rng, h.history['r2'][m:], label='Training')
plt.plot(rng, h.history['val_r2'][m:], label='Validation')
plt.title('r2')
plt.ylabel('r2')
plt.xlabel('Epoch')
plt.legend()
plt.show()

#####################################################
# Find Minimum Validation Loss
#####################################################

print('-- Minimum Validation Loss --')
print('Loss: ', min(h.history['val_loss']))
print('Epoch:', np.argmin(h.history['val_loss']))
print()

print('-- Maximum Validation r2 --')
print('r2:   ', max(h.history['val_r2']))
print('Epoch:', np.argmax(h.history['val_r2']))
-- Minimum Validation Loss --
Loss:  12.294610977172852
Epoch: 209

-- Maximum Validation r2 --
r2:    0.8791505694389343
Epoch: 209

Retrain Model with Fewer Epochs

The validation loss above reached its minimum around epoch 209, so we retrain the same architecture for roughly that many epochs to avoid the overfitting that sets in with longer training.

In [12]:
np.random.seed(1)
set_random_seed(1)

model_2 = Sequential()
model_2.add(Dense(32, input_shape=(13,), activation='relu'))
model_2.add(Dense(16, activation='relu'))
model_2.add(Dense(1, activation='linear'))

opt = keras.optimizers.Adam(lr=0.01)
model_2.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_2.fit(Xs_train, y_train, batch_size=500, epochs=215, validation_data = [Xs_val, y_val], verbose=0)

tr_results = model_2.evaluate(Xs_train, y_train, verbose=0)
va_results = model_2.evaluate(Xs_val, y_val, verbose=0)

print('Training Loss:  ', tr_results[0])
print('Training r2:    ', tr_results[1], '\n')

print('Validation Loss:', va_results[0])
print('Validation r2:  ', va_results[1])
Training Loss:   7.33555284349045
Training r2:     0.8938835619306407 

Validation Loss: 12.319281096505646
Validation r2:   0.8616513769225319

Apply L2 Regularization

We will now train the same model, this time adding an L2 regularization term to the loss. In this scenario, our learning algorithm will try to minimize the following function:

$$\Large Loss = \frac{1}{n} SSE + \lambda \sum \left[w^{(d)}_{i,j} \right]^2$$

where $w^{(d)}_{i,j}$ is the weight leading from node $i$ in layer $d-1$ to node $j$ in layer $d$. The sum in the penalty term is taken over the non-bias (kernel) weights only.

The constant $\lambda$ is a model hyperparameter that we are free to choose.
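
To connect the formula to the code below: passing kernel_regularizer=regularizers.l2(0.2) to a Keras layer adds 0.2 times the sum of that layer's squared kernel (non-bias) weights to the loss, so the reported training loss includes the penalty. A minimal NumPy sketch of the penalized loss (the names here are illustrative and not used elsewhere in this notebook):

def penalized_loss(y_true, y_pred, weight_matrices, lam=0.2):
    mse = np.mean((y_true - y_pred) ** 2)                            # (1/n) * SSE
    l2_penalty = lam * sum(np.sum(W ** 2) for W in weight_matrices)  # sum of squared non-bias weights
    return mse + l2_penalty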

In [13]:
from keras import regularizers
In [14]:
#####################################################
# Design and Train Model
#####################################################

np.random.seed(1)
set_random_seed(1)

model_3 = Sequential()
model_3.add(Dense(32, input_shape=(13,), activation='relu', kernel_regularizer=regularizers.l2(0.2)))
model_3.add(Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.2)))
model_3.add(Dense(1, activation='linear', kernel_regularizer=regularizers.l2(0.2)))

opt = keras.optimizers.Adam(lr=0.01)
model_3.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_3.fit(Xs_train, y_train, batch_size=500, epochs=2000, validation_data = [Xs_val, y_val], verbose=0)

#####################################################
# Visualize Training Progress
#####################################################

m = 50
rng = range(m, len(h.history['loss']))

plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
plt.plot(rng, h.history['loss'][m:], label='Training')
plt.plot(rng, h.history['val_loss'][m:], label='Validation')
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
    
plt.subplot(1,2,2)
plt.plot(rng, h.history['r2'][m:], label='Training')
plt.plot(rng, h.history['val_r2'][m:], label='Validation')
plt.title('r2')
plt.ylabel('r2')
plt.xlabel('Epoch')
plt.legend()
plt.show()

#####################################################
# Find Minimum Validation Loss
#####################################################

print('-- Minimum Validation Loss --')
print('Loss: ', min(h.history['val_loss']))
print('Epoch:', np.argmin(h.history['val_loss']))
print()

print('-- Maximum Validation r2 --')
print('r2:   ', max(h.history['val_r2']))
print('Epoch:', np.argmax(h.history['val_r2']))
-- Minimum Validation Loss --
Loss:  16.652097702026367
Epoch: 1896

-- Maximum Validation r2 --
r2:    0.8931909203529358
Epoch: 1896

Apply Dropout Regularization

Dropout is an alternative form of regularization. During training, a Dropout layer randomly zeroes a fraction of the previous layer's outputs on each update (10% here), which discourages the network from relying too heavily on any single unit; at inference time the layer passes its inputs through unchanged.

In [15]:
from keras.layers import Dropout
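
A minimal NumPy sketch of the training-time behavior of the Dropout layers used below (Keras applies "inverted" dropout, rescaling the kept units by 1/(1 - rate); the names and rate here are illustrative):

rate = 0.1
activations = np.random.rand(1, 32)                # outputs of the previous layer
mask = np.random.rand(*activations.shape) >= rate  # keep each unit with probability 1 - rate
dropped = activations * mask / (1.0 - rate)        # rescale so the expected activation is unchanged
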
In [16]:
#####################################################
# Design and Train Model
#####################################################

np.random.seed(1)
set_random_seed(1)

model_4 = Sequential()
model_4.add(Dense(32, input_shape=(13,), activation='relu'))
model_4.add(Dropout(0.1))
model_4.add(Dense(16, activation='relu'))
model_4.add(Dropout(0.1))
model_4.add(Dense(1, activation='linear'))

opt = keras.optimizers.Adam(lr=0.001)
model_4.compile(loss='mean_squared_error', optimizer=opt, metrics=[r2])

h = model_4.fit(Xs_train, y_train, batch_size=500, epochs=2000, validation_data = [Xs_val, y_val], verbose=0)

#####################################################
# Visualize Training Progress
#####################################################

m = 200
rng = range(m, len(h.history['loss']))

plt.figure(figsize=[10,4])
plt.subplot(1,2,1)
plt.plot(rng, h.history['loss'][m:], label='Training')
plt.plot(rng, h.history['val_loss'][m:], label='Validation')
plt.title('Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
    
plt.subplot(1,2,2)
plt.plot(rng, h.history['r2'][m:], label='Training')
plt.plot(rng, h.history['val_r2'][m:], label='Validation')
plt.title('r2')
plt.ylabel('r2')
plt.xlabel('Epoch')
plt.legend()
plt.show()

#####################################################
# Find Minimum Validation Loss
#####################################################

print('-- Minimum Validation Loss --')
print('Loss: ', min(h.history['val_loss']))
print('Epoch:', np.argmin(h.history['val_loss']))
print()

print('-- Maximum Validation r2 --')
print('r2:   ', max(h.history['val_r2']))
print('Epoch:', np.argmax(h.history['val_r2']))
-- Minimum Validation Loss --
Loss:  12.617439270019531
Epoch: 1992

-- Maximum Validation r2 --
r2:    0.8759773969650269
Epoch: 1992

Evaluate Testing Performance

The L2-regularized model (model_3) achieved the highest validation r-squared, so we evaluate it on the held-out test set.

In [17]:
testing_results = model_3.evaluate(Xs_test, y_test, verbose=0)

print('Testing Loss: ', testing_results[0])
print('Testing r2:   ', testing_results[1])
Testing Loss:  19.094673867319145
Testing r2:    0.8409638159415301