Lesson 04 - Testing LinReg Class

This notebook tests our implementation of the linear regression algorithm, the LinReg class, on synthetic data and compares its results with Scikit-Learn's LinearRegression.

In [1]:
import numpy as np
import pandas as pd

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

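# LinReg is our own implementation, imported from LinearRegression.py.
from LinearRegression import LinReg
# Alternative: run the script directly in the notebook's namespace.
#%run -i LinearRegression.py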

Generate Data

In [2]:
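# Fix the seed so the generated data is reproducible.
np.random.seed(1)

# Three uniformly distributed features on different scales.
n = 500
x1 = np.random.uniform(0, 10, n)
x2 = np.random.uniform(0, 50, n)
x3 = np.random.uniform(100, 200, n)
# Linear response with Gaussian noise (standard deviation 6).
y = 3 + 4.2 * x1 + 0.7 * x2 + 0.2 * x3 + np.random.normal(0, 6, n)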

df = pd.DataFrame({'x1':x1, 'x2':x2, 'x3':x3, 'y':y})

df.head()
Out[2]:
         x1         x2          x3          y
0  4.170220   4.374110  132.580997  52.273088
1  7.203245  11.365487  188.982734  73.127695
2  0.001144  15.718831  175.170772  44.389317
3  3.023326   8.738294  176.263210  68.452480
4  1.467559  30.354708  146.947903  59.731218
In [3]:
# Stack the three 1-D feature arrays as the columns of an (n, 3) matrix.
X = np.column_stack([x1, x2, x3])
print(X.shape)
(500, 3)
In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(400, 3)
(100, 3)
(400,)
(100,)

Testing our Implementation

In [5]:
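# Fit our implementation on the training set.
mod = LinReg(X_train, y_train)

# The first entry of mod.coefficients appears to be the intercept;
# the remaining three are the slopes for x1, x2, and x3.
print('Model coefficients:', mod.coefficients)
print('Training r-Squared:', mod.r_squared)
print('Testing r-Squared: ', mod.score(X_test, y_test))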
Model coefficients: [0.72734427 4.29466891 0.73180363 0.20863976]
Training r-Squared: 0.8854550708351295
Testing r-Squared:  0.8777033249070975
In [6]:
mod.summary()
+-----------------------------+
|  Linear Regression Summary  |
+-----------------------------+
Number of training observations: 400
Coefficient Estimates:   [0.72734427 4.29466891 0.73180363 0.20863976]
Residual Standard Error: 6.4003411505276455
r-Squared: 0.8854550708351295
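
The quantities in the summary can also be checked without the LinReg class. The cell below is a minimal sketch that refits the model with np.linalg.lstsq and computes r-squared and the residual standard error by hand. Two assumptions apply: it fits on all 500 observations rather than the 400-row training split, so its numbers will differ slightly from those above, and it takes the residual standard error to be sqrt(SSE / (n - p - 1)), since the definition used inside LinReg is not shown in this notebook.

```python
import numpy as np

# Recreate the synthetic data from the "Generate Data" cell.
np.random.seed(1)
n = 500
x1 = np.random.uniform(0, 10, n)
x2 = np.random.uniform(0, 50, n)
x3 = np.random.uniform(100, 200, n)
y = 3 + 4.2 * x1 + 0.7 * x2 + 0.2 * x3 + np.random.normal(0, 6, n)
X = np.column_stack([x1, x2, x3])

# Prepend a column of ones so the first coefficient is the intercept.
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# r-squared = 1 - SSE / SST.
residuals = y - X_design @ beta
sse = np.sum(residuals ** 2)
sst = np.sum((y - y.mean()) ** 2)
r_squared = 1 - sse / sst

# Assumed definition: divide SSE by n - p - 1, where p is the number of predictors.
p = X.shape[1]
rse = np.sqrt(sse / (n - p - 1))

print('Coefficients (intercept first):', beta)
print('r-Squared:', r_squared)
print('Residual standard error:', rse)
```

Because the noise was generated with a standard deviation of 6, a residual standard error close to 6 is what a correct fit should produce.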

Compare with Scikit-Learn Results

In [7]:
sk_mod = LinearRegression()
sk_mod.fit(X_train, y_train)

print('Model intercept:   ', sk_mod.intercept_)
print('Model coefficients:', sk_mod.coef_)
print('Training r-Squared:', sk_mod.score(X_train,y_train))
print('Testing r-Squared: ', sk_mod.score(X_test,y_test))
Model intercept:    0.7273095326780492
Model coefficients: [4.29466885 0.73180367 0.20863999]
Training r-Squared: 0.8854550708352479
Testing r-Squared:  0.8777032659161139
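
Rather than comparing the two sets of results by eye, the agreement can be checked numerically. The cell below is a small sketch that copies the values printed above into arrays and compares them. The tolerance of 1e-4 is an assumption, chosen loosely enough to allow for the small difference in the intercept estimates.

```python
import numpy as np

# Values copied from the printed output of the two models above.
linreg_coefs = np.array([0.72734427, 4.29466891, 0.73180363, 0.20863976])
sk_intercept = 0.7273095326780492
sk_coefs = np.array([4.29466885, 0.73180367, 0.20863999])

# Put the Scikit-Learn results in the same layout: intercept first, then slopes.
sk_all = np.concatenate([[sk_intercept], sk_coefs])

# Largest absolute gap between corresponding estimates.
max_diff = np.max(np.abs(linreg_coefs - sk_all))
coefs_agree = np.allclose(linreg_coefs, sk_all, rtol=0, atol=1e-4)

# Same check for the training and testing r-squared values.
linreg_r2 = np.array([0.8854550708351295, 0.8777033249070975])
sk_r2 = np.array([0.8854550708352479, 0.8777032659161139])
r2_agree = np.allclose(linreg_r2, sk_r2, rtol=0, atol=1e-4)

print('Largest coefficient difference:', max_diff)
print('Coefficients agree:', coefs_agree)
print('r-Squared values agree:', r2_agree)
```

The slopes match to about six decimal places, while the intercepts differ in the fifth decimal place, so the two models are the same for practical purposes.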