Linear Regression Assumptions (Gauss-Markov Assumptions)

In the last lesson, we introduced a set of assumptions that we will commonly make regarding the error term in our hypothetical model \(Y = \beta_0 + \beta_1 X + e\). These assumptions are as follows:

  1. The relationship between X and Y is determined by a linear model of the form \(Y = \beta_0 + \beta_1 X + e\).

  2. Separate observations of \(e\) are independent of one another.

  3. The error term \(e\) is normally distributed with a mean of 0 and standard deviation \(\sigma\).

  4. The error term \(e\) is independent of \(X\). In particular, the variance \(\mathrm{Var}[e] = \sigma^2\) does not depend on \(X\) (homoskedasticity).

Note that these assumptions are not required in order to find or even apply the OLS regression line. They are, however, necessary if we wish to use the model for inferential purposes.

Distribution of Parameter Estimates

Assume that we fix several \(X\) values, \(x_1, ..., x_n\) and that we use these \(X\) values to generate \(Y\) values \(y_1, ..., y_n\) according to a hypothetical model \(Y = \beta_0 + \beta_1 X + e\). Suppose we then calculate parameter estimates \(\hat\beta_0\) and \(\hat\beta_1\).

If we repeat this process, we will get different values of \(y_1, ..., y_n\) each time, since the error terms \(e_i\) are assumed to be random. Since our \(Y\) values will vary each time, so too will our parameter estimates \(\hat\beta_0\) and \(\hat\beta_1\). In this way, our parameter estimates are random variables whose distribution depends on that of \(e\).
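
The repeated-sampling idea described above can be sketched as a small simulation. This is only an illustration under assumed values: the "true" parameters, the sample size, the number of repetitions, and the helper name `one_fit` are all hypothetical choices, not part of the lesson's data.

```r
set.seed(1)                          # arbitrary seed, used only for reproducibility

# Hypothetical "true" model, chosen purely for illustration
beta0 <- 7; beta1 <- 1.3; sigma <- 2

n <- 50
x <- seq(0, 10, length.out = n)      # the X values stay fixed across repetitions

# One repetition: generate new Y values, refit, and return both estimates
one_fit <- function() {
  y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)
  coef(lm(y ~ x))
}

# Each column of `estimates` holds (intercept, slope) from one repetition
estimates <- replicate(2000, one_fit())
b0_hat <- estimates[1, ]
b1_hat <- estimates[2, ]

# Centre and spread of the simulated estimates
c(mean_b0 = mean(b0_hat), mean_b1 = mean(b1_hat))
c(sd_b0 = sd(b0_hat), sd_b1 = sd(b1_hat))

# Shape of the simulated slope estimates, with the true slope marked
hist(b1_hat, breaks = 40, col = "cyan", main = "Simulated slope estimates",
     xlab = "Estimated slope")
abline(v = beta1, col = "darkred", lwd = 2)
```

Because the errors are redrawn in every repetition, the estimates differ from run to run, and their averages should land close to the values of `beta0` and `beta1` used to generate the data.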


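It can be shown that if the Gauss-Markov assumptions hold, then \(\hat\beta_0\) and \(\hat\beta_1\) are both normally distributed, with the means and variances given below. Here \(SXX = \sum_{i=1}^n (x_i - \bar x)^2\).

\(\hspace{30pt} \mathrm{E}[\hat\beta_0] = \beta_0, \qquad \mathrm{Var}[\hat\beta_0] = \sigma^2 \left[\frac{1}{n} + \frac{\bar x^2}{SXX} \right]\)

\(\hspace{30pt} \mathrm{E}[\hat\beta_1] = \beta_1, \qquad \mathrm{Var}[\hat\beta_1] = \frac{\sigma^2}{SXX}\)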

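Unfortunately, we don't typically know \(\mathrm{Var}[e] = \sigma^2\). When \(\sigma^2\) is unknown, we estimate it using \(s^2 = \frac{SSE}{n-2}\). Replacing \(\sigma^2\) with \(s^2\) in the formulas for \(\mathrm{Var}[\hat\beta_0]\) and \(\mathrm{Var}[\hat\beta_1]\) and then taking square roots gives approximations for \(\mathrm{SD}[\hat\beta_0]\) and \(\mathrm{SD}[\hat\beta_1]\). These approximations are called the standard errors of the parameter estimates:

\(\hspace{30pt} \mathrm{SD}[\hat\beta_0] \approx \mathrm{SE}[\hat\beta_0] = s \sqrt{\frac{1}{n} + \frac{\bar x^2}{SXX}}\)

\(\hspace{30pt} \mathrm{SD}[\hat\beta_1] \approx \mathrm{SE}[\hat\beta_1] = \frac{s}{\sqrt{SXX}}\)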
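
As a rough check on the standard errors, the sketch below computes \(s\), \(SXX\), and the two standard errors by hand and compares them with the values R reports. The data are simulated with hypothetical parameter values, and the names `fit`, `se_b0`, and `se_b1` are illustrative. It uses \(\mathrm{SE}[\hat\beta_0] = s\sqrt{1/n + \bar x^2/SXX}\) and \(\mathrm{SE}[\hat\beta_1] = s/\sqrt{SXX}\), with \(SXX = \sum (x_i - \bar x)^2\).

```r
set.seed(2)                          # arbitrary seed, used only for reproducibility

# Simulated data from a hypothetical model (values chosen for illustration)
n <- 40
x <- runif(n, 0, 10)
y <- 3 + 0.8 * x + rnorm(n, 0, 2)
fit <- lm(y ~ x)

# Estimate of the error standard deviation: s = sqrt(SSE / (n - 2))
SSE <- sum(residuals(fit)^2)
s <- sqrt(SSE / (n - 2))

# Sum of squared deviations of x about its mean
SXX <- sum((x - mean(x))^2)

# Standard errors computed directly from the formulas
se_b0 <- s * sqrt(1 / n + mean(x)^2 / SXX)
se_b1 <- s / sqrt(SXX)

# Standard errors reported in the coefficient table of summary()
se_lm <- coef(summary(fit))[, "Std. Error"]

rbind(by_hand = c(se_b0, se_b1), from_lm = se_lm)
```

The two rows should agree, and `s` should match the "Residual standard error" that `summary(fit)` prints.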

Hypothesis Tests for Parameters

It is often useful to be able to test the hypothesis that a particular parameter is equal to a specific number. In other words, we want to be able to conduct hypothesis tests of the following form:

\(\hspace{30pt} H_0: \beta_i = k\)

\(\hspace{30pt} H_A: \beta_i \neq k\)

We can perform such a test using the following test statistic:

\(\large\hspace{30pt} t = \frac{\hat\beta_{i} - k}{\mathrm{SE}[\hat\beta_i]}\)

If the null hypothesis is true and the assumptions above hold, this test statistic follows a t distribution with \(n-2\) degrees of freedom.

Although we can test either regression parameter for any value of \(k\), the test that we will be most interested in is the following:

\(\hspace{30pt} H_0: \beta_1 = 0\)

\(\hspace{30pt} H_A: \beta_1 \neq 0\)

If we are unable to reject the null hypothesis in this test, then we cannot be confident that the parameter \(\beta_1\) is non-zero. If \(\beta_1 = 0\), then our hypothetical model reduces to \(Y = \beta_0 + e\), and there is no linear relationship between \(X\) and \(Y\).
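
The test statistic and its two-sided p-value can be computed by hand, as in the sketch below. It assumes simulated data with hypothetical parameter values, and `t_test` is an illustrative helper, not a built-in R function. The p-value comes from the t distribution with \(n-2\) degrees of freedom via `pt()`.

```r
set.seed(3)                          # arbitrary seed, used only for reproducibility

# Simulated data from a hypothetical model (values chosen for illustration)
n <- 40
x <- runif(n, 0, 10)
y <- 3 + 0.8 * x + rnorm(n, 0, 2)
fit <- lm(y ~ x)

# Slope estimate and its standard error from the coefficient table
coefs <- coef(summary(fit))
b1_hat <- coefs["x", "Estimate"]
se_b1 <- coefs["x", "Std. Error"]

# Two-sided t test of H0: beta = k (hypothetical helper)
t_test <- function(estimate, se, k, df) {
  t_stat <- (estimate - k) / se
  p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  c(t = t_stat, p = p_value)
}

# Test of H0: beta1 = 0, next to the values reported by summary()
t_test(b1_hat, se_b1, k = 0, df = n - 2)
coefs["x", c("t value", "Pr(>|t|)")]

# The same helper handles other null values, e.g. H0: beta1 = 1
t_test(b1_hat, se_b1, k = 1, df = n - 2)
```

For \(k = 0\) the hand computation should reproduce the `t value` and `Pr(>|t|)` columns of the summary table. Tests against other values of \(k\) are not printed by `summary()` and have to be computed this way.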

Hypothesis Tests in R

Fortunately, R tests whether each regression parameter is zero whenever you call summary() on a model fit with the lm() function. The results of these tests appear in the Coefficients table of the summary output. To demonstrate this, we will return to the Pearson dataset.

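# Read the tab-delimited Pearson father/son height data
myData <- read.table("father_son.txt", sep="\t", header=TRUE)
# Regress son's height on father's height
mod <- lm(sheight ~ fheight, data=myData)
summary(mod)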

Call:
lm(formula = sheight ~ fheight, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.8772 -1.5144 -0.0079  1.6285  8.9685 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.88660    1.83235   18.49   <2e-16 ***
fheight      0.51409    0.02705   19.01   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.437 on 1076 degrees of freedom
Multiple R-squared:  0.2513,    Adjusted R-squared:  0.2506 
F-statistic: 361.2 on 1 and 1076 DF,  p-value: < 2.2e-16

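Before putting too much stock in the results of these hypothesis tests, we should analyze the residuals to check the assumptions. The residual plot should show no trend and no change in spread across father heights, and the points in the normal Q-Q plot should fall close to a straight line.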

plot(myData$sheight ~ myData$fheight, pch=21, bg="cyan", col="black", cex=1,
     main="Scatter Plot", xlab="Father Height", ylab="Son Height")
abline(mod$coefficients, col="darkred", lwd=2)

plot(mod$residuals ~ myData$fheight, pch=21, bg="cyan", col="black", cex=1,
     main="Residual Plot", xlab="Father Height", ylab="Residuals")
abline(h=0, col="darkred", lwd=2)

qqnorm(mod$residuals)
qqline(mod$residuals)

Simulated Examples

The following simulation generates 100 observations from the known model \(Y = 7 + 1.3 X + e\) with \(\sigma = 15\), fits a regression line, and plots the fitted line (solid red) against the true line (dashed blue). In the run shown, the test of \(H_0: \beta_1 = 0\) has a p-value of 0.1689, so we fail to reject the null hypothesis even though the true slope is 1.3.

n <- 100                              # sample size
x <- runif(n, 0, 10)                  # X values drawn uniformly from [0, 10]
y <- 7 + 1.3 * x + rnorm(n, 0, 15)    # true line plus normal errors with SD 15
m <- lm(y ~ x)
summary(m)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-31.020  -9.456  -1.969  11.941  32.565 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  11.4469     2.9146   3.927  0.00016 ***
x             0.7086     0.5112   1.386  0.16890    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 15.83 on 98 degrees of freedom
Multiple R-squared:  0.01922,   Adjusted R-squared:  0.009217 
F-statistic: 1.921 on 1 and 98 DF,  p-value: 0.1689

plot(y ~ x, pch=21, bg="cyan", col="black", cex=1)
abline(m$coefficients, col="darkred", lwd=2)        # fitted regression line
abline(7, 1.3, col="darkblue", lwd=2, lty=2)        # true line used to generate the data
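
A single simulated data set shows only one possible outcome of the slope test. As a rough extension, the sketch below repeats the same simulation many times and records how often the test of \(H_0: \beta_1 = 0\) rejects at the 5% level. The helper name `slope_p_value`, the number of repetitions, and the 5% cutoff are assumptions made for illustration.

```r
set.seed(4)                          # arbitrary seed, used only for reproducibility

# p-value for the slope test from one data set drawn from the model
# Y = 7 + 1.3 X + e, with e normal with mean 0 and SD 15
slope_p_value <- function(n = 100) {
  x <- runif(n, 0, 10)
  y <- 7 + 1.3 * x + rnorm(n, 0, 15)
  coef(summary(lm(y ~ x)))["x", "Pr(>|t|)"]
}

# Repeat the simulation and collect the slope p-values
p_values <- replicate(1000, slope_p_value())

# Proportion of simulated data sets in which H0: beta1 = 0 is rejected
reject_rate <- mean(p_values < 0.05)
reject_rate
```

A rejection rate well below 1 means that, with this much noise, the test often fails to detect a slope that is in fact non-zero. Failing to reject \(H_0\) is therefore not evidence that \(\beta_1 = 0\).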
