Hypothetical and Fitted Models

We begin this lesson by recalling the concepts and notation that we introduced in the previous lesson.

When performing simple linear regression to describe a relationship between two variables \(X\) and \(Y\), we begin by assuming that the relationship is determined by a hypothetical model of the form:
\[Y = \beta_0 + \beta_1 X + e\]

We collect a sample of paired observations \((x_i, y_i)\) which we use to create a fitted model. The fitted model is an approximation of the hypothetical model, and can be written in either of the following (equivalent) forms:

\[\hat Y = \hat \beta_0 + \hat \beta_1 X\] \[Y = \hat \beta_0 + \hat \beta_1 X + \hat e\]
Given a paired observation \(\left(x_i, y_i \right)\), the fitted value of \(y\) at \(x_i\) is:
\[\hat y_i = \hat\beta_0 + \hat\beta_1 x_i\]

The residual associated with the observation is given by:

\[\hat e_i = y_i - \hat y_i\]
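The definitions of fitted values and residuals can be sketched in a few lines of Python. The sample values and the proposed estimates `b0` and `b1` below are made up for illustration and are not taken from the lesson.

```python
# Hypothetical sample of paired observations (x_i, y_i); values are made up.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Proposed (not necessarily optimal) parameter estimates, chosen arbitrarily.
b0, b1 = 1.0, 1.0

# Fitted value for each observation: y_hat_i = b0 + b1 * x_i
fitted = [b0 + b1 * xi for xi in x]

# Residual for each observation: e_hat_i = y_i - y_hat_i
residuals = [yi - yhat for yi, yhat in zip(y, fitted)]

print(fitted)     # [2.0, 3.0, 4.0, 5.0, 6.0]
print(residuals)  # [0.0, 1.0, 1.0, -1.0, -1.0]
```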

Least Squares Regression

We will now discuss how to find the fitted model \(\hat Y = \hat \beta_0 + \hat \beta_1 X\). Assume that such a model has been proposed. In other words, assume that parameter estimates \(\hat\beta_0\) and \(\hat\beta_1\) have been suggested. We can use this model to calculate fitted values and residuals for each observation in our sample, as described above.

Intuitively, it makes sense that we would want to select parameter estimates \(\hat\beta_0\) and \(\hat\beta_1\) so that the total size of the residuals is small. This suggests that we might attempt to select parameter estimates so as to minimize the following quantity:

\[\sum\limits_{i=1}^n \hat e_i\]

However, we need to note that some of the residuals could be positive, and some could be negative. For the proposed scoring method above, we could encounter a situation in which large positive residuals cancel out with large negative residuals, falsely indicating that there is a small amount of error overall. To remedy this issue, we will square the residuals before summing them. Thus, our goal is to select parameter estimates so as to minimize the quantity:

\[SSE = \sum\limits_{i=1}^n \hat e_i^2\]

To emphasize that this is a function of the proposed parameter estimates \(\hat\beta_0\) and \(\hat\beta_1\), we sometimes write:

\[SSE\left ( \hat\beta_0, \hat\beta_1 \right) = \sum\limits_{i=1}^n \hat e_i^2 = \sum\limits_{i=1}^n \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) ^2\]
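To illustrate why the raw sum of residuals is a poor score, the sketch below evaluates two proposed lines on a small made-up sample. The data and the helper names `residual_sum` and `sse` are assumptions introduced for this example only.

```python
# Hypothetical sample; the values are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def residual_sum(b0, b1, x, y):
    """Raw sum of residuals for the proposed line y = b0 + b1 * x."""
    return sum(yi - b0 - b1 * xi for xi, yi in zip(x, y))

def sse(b0, b1, x, y):
    """Sum of squared residuals for the proposed line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Two proposals: an increasing line and a decreasing line.
for b0, b1 in [(1.0, 1.0), (7.0, -1.0)]:
    print(b0, b1, residual_sum(b0, b1, x, y), sse(b0, b1, x, y))
```

Both proposals have a residual sum of exactly zero on this sample, so the raw sum cannot tell them apart. The squared score does: \(SSE\) is 4 for the first line and 28 for the second.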

Preliminary Derivations

Before we derive the optimal values for our parameter estimates, we need to establish two preliminary results. Establishing these results now will make our derivation of \(\hat\beta_0\) and \(\hat\beta_1\) go more smoothly. Assume that we have a sample of \(n\) paired observations \((x_i, y_i)\). We define the quantities \(SXX\) and \(SXY\) as follows:

\[SXX = \sum\limits_{i=1}^n \left(x_i - \bar x \right)^2\] \[SXY = \sum\limits_{i=1}^n \left(x_i - \bar x \right)\left(y_i - \bar y \right)\] Notice that we can rewrite the expression for \(SXY\) as follows:

\[SXY = \sum\limits_{i=1}^n \left(x_i - \bar x \right)\left(y_i - \bar y \right) = \sum\limits_{i=1}^n \left(x_i y_i - \bar x y_i - \bar y x_i + \bar x \bar y \right) \] \[= \sum\limits_{i=1}^n x_i y_i - \bar x \sum\limits_{i=1}^n y_i - \bar y \sum\limits_{i=1}^n x_i + \sum\limits_{i=1}^n\bar x \bar y\] \[ = \sum\limits_{i=1}^n x_i y_i - n\bar x \bar y - n\bar y \bar x + n\bar x \bar y = \sum\limits_{i=1}^n x_i y_i - n\bar x \bar y \]

In summary, we have shown that: \[SXY = \sum\limits_{i=1}^n \left(x_i - \bar x \right)\left(y_i - \bar y \right) = \sum\limits_{i=1}^n x_i y_i - n\bar x \bar y\] A very similar argument shows that: \[SXX = \sum\limits_{i=1}^n \left(x_i - \bar x \right)^2 = \sum\limits_{i=1}^n x_i^2 - n\bar x^2\] We will need both of these expressions in the near future.
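The two identities can be checked numerically. The sketch below uses a made-up sample and compares the centered form of each quantity with its expanded form; the variable names are chosen for this example.

```python
import math

# Hypothetical sample; the values are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered (definitional) forms.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Expanded forms: sum of products minus n times the product of means.
sxx_alt = sum(xi ** 2 for xi in x) - n * x_bar ** 2
sxy_alt = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

# Compare with a tolerance, since floating-point sums may differ slightly.
print(math.isclose(sxx, sxx_alt), math.isclose(sxy, sxy_alt))
```

The two forms agree algebraically, but in floating-point arithmetic the expanded form can lose precision when the means are large relative to the spread of the data, so the centered form is usually the safer one to compute.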

Parameter Estimates

We will now derive formulas for \(\hat\beta_0\) and \(\hat\beta_1\) that will minimize \(SSE\left ( \hat\beta_0, \hat\beta_1 \right)\). To do this, we will need to differentiate this function with respect to both \(\hat\beta_0\) and \(\hat\beta_1\), and then set the resulting expressions to 0.

\[\frac{\partial}{\partial\hat\beta_0} SSE\left ( \hat\beta_0, \hat\beta_1 \right) = -2 \sum\limits_{i=1}^n \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)\] \[\frac{\partial}{\partial\hat\beta_1} SSE\left ( \hat\beta_0, \hat\beta_1 \right) = -2 \sum\limits_{i=1}^n x_i\left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)\]
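Each of these partial derivatives comes from applying the chain rule to a single squared term and then summing over \(i\). As a sketch of that intermediate step:

\[\frac{\partial}{\partial\hat\beta_0} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2 = 2\left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)(-1)\] \[\frac{\partial}{\partial\hat\beta_1} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2 = 2\left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)(-x_i)\]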

Setting these two expressions to zero and dividing both sides by -2, we get:

\[\sum\limits_{i=1}^n \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0\] \[\sum\limits_{i=1}^n x_i\left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0\]

The two equations above are referred to as the normal equations. We need to solve these equations for \(\hat\beta_0\) and \(\hat\beta_1\).

Notice that we can rewrite the first normal equation as follows:

\[\sum\limits_{i=1}^n y_i - \sum\limits_{i=1}^n\hat\beta_0 - \sum\limits_{i=1}^n\hat\beta_1 x_i = 0\]

\[n \bar y - n\hat\beta_0 -n \hat\beta_1 \bar x = 0\] \[\hat\beta_0 = \bar y - \hat\beta_1 \bar x\]

We will now work on the second normal equation:

\[\sum\limits_{i=1}^n \left(x_i y_i - \hat\beta_0 x_i - \hat\beta_1 x_i^2 \right) = 0\] \[ \sum\limits_{i=1}^n x_i y_i - \sum\limits_{i=1}^n \hat\beta_0 x_i - \sum\limits_{i=1}^n \hat\beta_1 x_i^2 = 0\] \[ \sum\limits_{i=1}^n x_i y_i - n\bar x \hat\beta_0 - \hat\beta_1\sum\limits_{i=1}^n x_i^2 = 0\] \[ n \bar x \hat\beta_0 + \hat\beta_1\sum\limits_{i=1}^n x_i^2 = \sum\limits_{i=1}^n x_i y_i\]

When working with the first normal equation above, we showed that \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). We use this expression to substitute out \(\hat\beta_0\) in the last expression we derived from the second normal equation.

\[ n \bar x \left(\bar y - \hat\beta_1 \bar x \right) + \hat\beta_1\sum\limits_{i=1}^n x_i^2 = \sum\limits_{i=1}^n x_i y_i\] \[ n \bar x \bar y - n \hat\beta_1 \bar x^2 + \hat\beta_1\sum\limits_{i=1}^n x_i^2 = \sum\limits_{i=1}^n x_i y_i\]

Collecting the \(\hat\beta_1\) terms together gives:

\[ \hat\beta_1 \left( \sum\limits_{i=1}^n x_i^2 - n \bar x^2 \right) = \sum\limits_{i=1}^n x_i y_i - n \bar x \bar y\] Solving for \(\hat\beta_1\) yields:

\[ \hat\beta_1 = \frac{\sum\limits_{i=1}^n x_i y_i - n \bar x \bar y}{\sum\limits_{i=1}^n x_i^2 - n \bar x^2}\] Applying the results from our “preliminaries” section above, we conclude that:

\[\hat\beta_1= \frac{SXY}{SXX} \hspace{20px} \mathrm{and} \hspace{20px}\hat\beta_0 = \bar y - \hat\beta_1 \bar x\]

This provides us with formulas for the parameter estimates that minimize \(SSE\). Note that the formula for \(\hat\beta_1\) requires \(SXX > 0\), which holds whenever the \(x_i\) are not all equal.
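The formulas above translate directly into code. The following is a minimal sketch on a made-up sample; the function names `least_squares` and `sse` and the check for equal \(x\) values are assumptions added for this example.

```python
# Hypothetical sample; the values are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def least_squares(x, y):
    """Return (b0, b1) computed as b1 = SXY / SXX and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        # Added safeguard: the slope formula divides by SXX.
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(b0, b1, x, y):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

b0, b1 = least_squares(x, y)
best = sse(b0, b1, x, y)

# Nudging either estimate away from the solution should increase SSE.
nudges = [(d0, d1) for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.1, 0.0, 0.1)
          if (d0, d1) != (0.0, 0.0)]
all_worse = all(sse(b0 + d0, b1 + d1, x, y) > best for d0, d1 in nudges)

print(b0, b1, best, all_worse)
```

On this sample the estimates are approximately \(\hat\beta_0 = 2.2\) and \(\hat\beta_1 = 0.6\), with \(SSE\) approximately 2.4. The nudge test is only a spot check of a few nearby points, not a proof that the solution is a minimum.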

Additional Comments

Alternate form for \(\hat\beta_1\)

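There are many different ways to write the formula for \(\hat\beta_1\). One commonly encountered formula is \(\hat\beta_1 = \frac{cov[X,Y]}{s_X^2}\), where \(cov[X,Y]\) is the sample covariance of \(X\) and \(Y\) and \(s_X^2\) is the sample variance of \(X\). This formula can be derived from our previous formula for \(\hat\beta_1\) by multiplying the top and bottom of the expression by \(1/(n-1)\), since \(cov[X,Y] = SXY/(n-1)\) and \(s_X^2 = SXX/(n-1)\).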
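The equivalence of the two slope formulas can be checked numerically. The sketch below assumes the sample covariance and sample variance use the \(n - 1\) divisor, which is the convention of Python's `statistics.variance`. The sample is made up, and the covariance is computed by hand.

```python
import math
import statistics

# Hypothetical sample; the values are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample covariance (computed by hand) and sample variance, n - 1 divisor.
cov_xy = sxy / (n - 1)
var_x = statistics.variance(x)

slope_from_sums = sxy / sxx
slope_from_cov = cov_xy / var_x

# Any common divisor cancels in the ratio, so dividing by n gives the same slope.
slope_from_n = (sxy / n) / (sxx / n)

print(math.isclose(slope_from_sums, slope_from_cov),
      math.isclose(slope_from_sums, slope_from_n))
```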

Normal Equations

Notice that the normal equations:

\[\sum\limits_{i=1}^n \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0 \hspace{10px} \mathrm{and} \hspace{10px}\sum\limits_{i=1}^n x_i\left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right) = 0\]

can be written as:

\[\sum\limits_{i=1}^n \hat e_i = 0\hspace{10px} \mathrm{and} \hspace{10px}\sum\limits_{i=1}^n x_i \hat e_i = 0\]

These versions of the normal equations will be useful when deriving certain results in the future.
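As a numerical illustration of the residual form of the normal equations, the sketch below fits the least squares line to a made-up sample and checks both sums. Because of floating-point rounding, the sums are compared with zero using a small tolerance rather than exact equality.

```python
# Hypothetical sample; the values are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals of the fitted line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_e = sum(residuals)
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))

# Both sums should be zero up to rounding error.
print(abs(sum_e) < 1e-9, abs(sum_xe) < 1e-9)
```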

Sample Means

Recall that our formula for the estimate of the intercept was given by \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). We can rewrite this equation as \(\bar y = \hat\beta_0 + \hat\beta_1 \bar x\). This demonstrates that the point \((\bar x, \bar y)\) lies on the least squares regression line.
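As a quick check of this claim, we can evaluate the fitted line at \(x = \bar x\) and substitute the formula for \(\hat\beta_0\):

\[\hat\beta_0 + \hat\beta_1 \bar x = \left(\bar y - \hat\beta_1 \bar x\right) + \hat\beta_1 \bar x = \bar y\]

The fitted value at \(\bar x\) is therefore \(\bar y\).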

Summary

Given a sample of \(n\) paired observations \((x_i, y_i)\), we define \(SXX\) and \(SXY\) as follows: \[SXX = \sum\limits_{i=1}^n \left(x_i - \bar x \right)^2\] \[SXY = \sum\limits_{i=1}^n \left(x_i - \bar x \right)\left(y_i - \bar y \right)\]

The least squares regression line \(\hat y = \hat\beta_0 + \hat\beta_1 x\) is obtained by using the following parameter estimates:

\[\hat\beta_1= \frac{SXY}{SXX} = \frac{cov[X,Y]}{s_X^2} \hspace{20px} \mathrm{and} \hspace{20px}\hat\beta_0 = \bar y - \hat\beta_1 \bar x\]

The least squares regression line satisfies the following properties:

  1. \(SSE = \sum\limits_{i=1}^n \hat e_i^2\) is minimized.

  2. \(\sum\limits_{i=1}^n \hat e_i = 0\hspace{5px}\) and \(\hspace{5px}\sum\limits_{i=1}^n x_i \hat e_i = 0\)

  3. The line passes through the point \((\bar x, \bar y)\).

