Load Packages

require(ggplot2)
require(MASS)
require(RColorBrewer)
require(dplyr)
require(reshape)

Create Sets of Predictors

We begin by creating several six different sets of predictors. Each set contains two predictors, X1 and X2, with varying degrees of correlation.

Statistics for our Predictor Sets

Generating Fitted Models

Assume we have a variable \(Y\) that is related to \(X1\) and \(X2\) according to the following hypothetical mode:

\[Y = 4 + 2 \cdot X1 + 3 \cdot X2 + e, \hspace{2 em} e \sim N(0, 2.5)\]

For each set of predictors, we will do the following:

  1. Create 100 sets of observed \(Y\) values according to the model above.
  2. For each set of observed \(Y\) values, we will create a fitted model.
  3. For each fitted model, we will store the coefficient estimates and the residual standard error for the model.

All told, we will create 6 models; 100 for each set of predictors. A summary of the statistics from our fitted models is shown below, grouped by predictor set.

Distribution of Parameter Estimates

We see that on average, the estimates of the coefficients are approximately equal to the values, regardless of the strength of the correlation in our predictors. However, the mean of the estimates only tells us part of the story. We should look at how these parameter estimates are distributed.

Relationship between Parameter Estimates

We can see from the plot above, the estimates for \(\beta_0\) and \(\beta_1\) can vary wildly when the predictors are strongly correlated. However, there is some order in the way in which these parameter estimates vary.

Residual Standard Errors

We saw above that the mean of the residual standard error resulting from the fitted models was very close to the true value of \(\sigma\), regardless of the size of \(corr(X1, X2)\). As with the parameter estimates, we should see how the distribution of \(s\) changes with respect to \(corr(X1, X2)\).

Distribution of Predictions

Recall that our hypothetical model was given by \(Y = 3 + 2 \cdot X1 + 2 \cdot X2 + e\), where \(e \sim N(0,2)\). If we plug \(X1 = 1.5, X2 = 1.5\) into this model, we get \(Y = 9 + e\).

We will now take each of the 600 fitted models we created previously, and will use those models to generate predictions for \(X1 = 1.5, X2 = 1.5\). For each prediction, we will also produce a 95% prediction interval, noting the margin of error of the resulting interval.

The table below shows the mean of the predicted values, as well as the mean margin of error for the prediction interval, group by predictor set.

We see that, on average, the predictions and the margin of error are about what they should be, regardless of the value of \(corr(X1, X2)\). As before, we should see how these values are distributed for each set of predictors.

We will start with the predicted values.

We now consider the margin of error for the prediction intervals.

Comparing with a SLR Model

We know that our data was generated using a hypothetical model of the form \(Y = 3 + 2 \cdot X1 + 2 \cdot X2 + e\), where \(e \sim N(0,2)\). However, we will now consider what our results would look like if we considered a simple linear regression model of the form \(Y =\hat\beta_0 + \hat\beta_1 \cdot X1\). More importantly, we will study how the results from that model vary for different values of \(corr(X1, X2)\).

Predictions from SLR Model

As we did with the MRL models, we will now use each of the 600 fitted SLR models to generate predictions for \(X1 = 1.5\). For each prediction, we will also produce a 95% prediction interval, noting the margin of error of the resulting interval.

The table below shows the mean of the predicted values, as well as the mean margin of error for the prediction interval, group by predictor set.

We will plot the distributions of the predictions for each group of predictors.

We will also plot the distributions of the prediction interval margin of errors for each group of predictors.

---
title: "Lesson 22 - Effects of Multicolinearity"
author: "Robbie Beane"
output:
  html_notebook:
    theme: flatly
    toc: yes
    toc_depth: 2
---

# Load Packages

```{r, message=FALSE}
require(ggplot2)
require(MASS)
require(RColorBrewer)
require(dplyr)
require(reshape)
```

# Create Sets of Predictors

We begin by creating several six different sets of predictors. Each set contains two predictors, `X1` and `X2`, with varying degrees of correlation. 


```{r, echo=FALSE}
set.seed(1)

n <- 100

r <- c(0, 0.2, 0.4, 0.6, 0.8, 0.99)

df <- mvrnorm(n, mu=c(0,0), Sigma = matrix(c(1,r[1],r[1],1), nrow=2), empirical = TRUE)
df <- data.frame(df)
df['corr'] = toString(r[1])
df['G'] = 1

for(i in 2:6){
  temp <- mvrnorm(n, mu=c(0,0), Sigma = matrix(c(1,r[i],r[i],1), nrow=2), empirical = TRUE)
  temp <- data.frame(temp)
  temp['corr'] = toString(r[i])
  temp['G'] = i
  df <- rbind(temp, df)
}

myPal <- rev(brewer.pal(10, "RdYlBu"))
myPal <- myPal[c(1:3,7:9)]

ggplot(df, aes(x=X1, y=X2, col=corr)) + geom_point() + facet_wrap(~corr) +
  scale_color_manual(values = myPal, guide=FALSE)
```

# Statistics for our Predictor Sets


```{r, echo=FALSE}
df %>% 
  group_by(corr) %>%
  summarise(meanX1 = round(mean(X1),2), meanX2 = round(mean(X2),2), sdX1 = sd(X1), sdX2 = sd(X2))
```

# Generating Fitted Models

Assume we have a variable $Y$ that is related to $X1$ and $X2$ according to the following hypothetical mode:

$$Y = 4 + 2 \cdot X1 + 3 \cdot X2 + e, \hspace{2 em} e \sim N(0, 2.5)$$

For each set of predictors, we will do the following:

1. Create 100 sets of observed $Y$ values according to the model above.
2. For each set of observed $Y$ values, we will create a fitted model. 
3. For each fitted model, we will store the coefficient estimates and the residual standard error for the model.

All told, we will create 6 models; 100 for each set of predictors. A summary of the statistics from our fitted models is shown below, grouped by predictor set. 

```{r, warning=FALSE, echo=FALSE}
iter <- 1000
s <- 2.5
nd <- data.frame(X1 = 1, X2 = 1)

b0_est <- c()
b1_est <- c()
b2_est <- c()
cor_list <- c()
r_list <- c()

b0s_est <- c()
b1s_est <- c()

pred_list <- c()
margin_list <- c()
rse_list <- c()

pred2_list <- c()
margin2_list <- c()
rse2_list <- c()

for(i in 1:iter){
  set.seed(i)
  e <- rnorm(n, 0, s)
  
  for(j in 1:6){
    sel_df <- filter(df, corr == r[j])
    X1 <- sel_df[,'X1']
    X2 <- sel_df[,'X2']
    y <- 4 + 2 * X1 + 3 * X2 + e
    mod <- lm(y ~ X1 + X2)
    mod2 <- lm(y ~ X1)
  
    cor_list <- c(cor_list, toString(r[j]))
    r_list <- c(r_list, r[j]) 
      
    # Full Model
    
    b0_est <- c(b0_est, mod$coefficients[1])
    b1_est <- c(b1_est, mod$coefficients[2])
    b2_est <- c(b2_est, mod$coefficients[3])
    
    rse_list <- c(rse_list, summary(mod)$sigma)
    pred <- predict(mod, newdata=nd, interval="prediction", level=0.95)
    pred_list <- c(pred_list, pred[1])
    margin_list <- c(margin_list, pred[3] - pred[1])
    
    # SLR Model
    
    b0s_est <- c(b0s_est, mod2$coefficients[1])
    b1s_est <- c(b1s_est, mod2$coefficients[2])
    
    rse2_list <- c(rse2_list, summary(mod2)$sigma)
    pred2 <- predict(mod2, newdata=nd, interval="prediction", level=0.95)
    pred2_list <- c(pred2_list, pred2[1])
    margin2_list <- c(margin2_list, pred2[3] - pred2[1])
  }
  
}

estimates <- data.frame(corr=cor_list, r=r_list, b0=b0_est, b1=b1_est, b2=b2_est,
                        rse=rse_list, pred=pred_list, margin=margin_list,
                        b0s=b0s_est, b1s=b1s_est,
                        rse2=rse2_list, pred2=pred2_list, margin2=margin2_list)

estimates %>%
  group_by(corr) %>%
  summarise(mean_b0 = mean(b0), mean_b1 = mean(b1), mean_b2 = mean(b2), mean_rse = mean(rse))
```

# Distribution of Parameter Estimates

We see that on average, the estimates of the coefficients are approximately equal to the values, regardless of the strength of the correlation in our predictors. However, the mean of the estimates only tells us part of the story. We should look at how these parameter estimates are distributed. 


```{r, fig.width=10, echo = FALSE}
melted <- estimates %>% melt(id=c("corr")) %>% filter(variable %in% c('b0', 'b1', 'b2'))
ggplot(melted, aes(value, fill=corr, col=corr)) + geom_density(alpha=1) + 
  scale_fill_manual(values = myPal) + scale_color_manual(values = myPal) + facet_grid(corr~variable)
```

# Relationship between Parameter Estimates

We can see from the plot above, the estimates for $\beta_0$ and $\beta_1$ can vary wildly when the predictors are strongly correlated. However, there is some order in the way in which these parameter estimates vary. 

```{r, echo=FALSE}
ggplot(estimates, aes(x=b1, y=b2, col=corr)) + geom_point() + facet_wrap(~corr) + scale_color_manual(values = myPal)
```

# Residual Standard Errors

We saw above that the mean of the residual standard error resulting from the fitted models was very close to the true value of $\sigma$, regardless of the size of $corr(X1, X2)$. As with the parameter estimates, we should see how the distribution of $s$ changes with respect to $corr(X1, X2)$.

```{r, echo=FALSE}
ggplot(estimates, aes(rse, col=corr, fill=corr)) + geom_density(alpha=0.8) + 
  scale_fill_manual(values = myPal) + scale_color_manual(values = myPal, guide=FALSE) + 
  facet_wrap(~corr)
```

# Distribution of Predictions

Recall that our hypothetical model was given by $Y = 3 + 2 \cdot X1 + 2 \cdot X2 + e$, where $e \sim N(0,2)$. If we plug $X1 = 1.5, X2 = 1.5$ into this model, we get $Y = 9 + e$. 

We will now take each of the 600 fitted models we created previously, and will use those models to generate predictions for $X1 = 1.5, X2 = 1.5$. For each prediction, we will also produce a 95% prediction interval, noting the margin of error of the resulting interval. 

The table below shows the mean of the predicted values, as well as the mean margin of error for the prediction interval, group by predictor set. 

```{r, echo = FALSE}
estimates %>%
  group_by(corr) %>%
  summarise(mean_pred = mean(pred), mean_margin = mean(margin))
```

We see that, on average, the predictions and the margin of error are about what they should be, regardless of the value of $corr(X1, X2)$. As before, we should see how these values are distributed for each set of predictors. 

We will start with the predicted values. 

```{r, echo=FALSE}
ggplot(estimates, aes(pred, col=corr, fill=corr)) + geom_density(alpha=0.8) + 
  scale_fill_manual(values = myPal) + scale_color_manual(values = myPal, guide=FALSE) + 
  facet_wrap(~corr)
```

We now consider the margin of error for the prediction intervals. 

```{r, echo=FALSE}
ggplot(estimates, aes(margin, col=corr, fill=corr)) + geom_density(alpha=0.8) + 
  scale_fill_manual(values = myPal) + scale_color_manual(values = myPal, guide=FALSE) + 
  facet_wrap(~corr)
```


# Comparing with a SLR Model

We know that our data was generated using a hypothetical model of the form $Y = 3 + 2 \cdot X1 + 2 \cdot X2 + e$, where $e \sim N(0,2)$. However, we will now consider what our results would look like if we considered a simple linear regression model of the form $Y =\hat\beta_0 + \hat\beta_1 \cdot X1$. More importantly, we will study how the results from that model vary for different values of $corr(X1, X2)$.  

```{r, echo = FALSE}
estimates %>%
  group_by(corr) %>%
  summarise(mean_b0s = mean(b0s), mean_b1s = mean(b1s), mean_rse_s = mean(rse2))
```

```{r, fig.width=10, echo = FALSE}
melted <- estimates %>% melt(id=c("corr")) %>% filter(variable %in% c('b0s', 'b1s', 'rse2'))
ggplot(melted, aes(value, fill=corr, col=corr)) + geom_density(alpha=1) + 
  scale_fill_manual(values = myPal) + scale_color_manual(values = myPal) + facet_grid(corr ~ variable)
```

# Predictions from SLR Model 

As we did with the MRL models, we will now use each of the 600 fitted SLR models to generate predictions for $X1 = 1.5$. For each prediction, we will also produce a 95% prediction interval, noting the margin of error of the resulting interval. 

The table below shows the mean of the predicted values, as well as the mean margin of error for the prediction interval, group by predictor set. 

```{r, echo = FALSE}
estimates %>%
  group_by(corr) %>%
  summarise(mean_pred_s = mean(pred2), mean_margin_s = mean(margin2))
```

We will plot the distributions of the predictions for each group of predictors. 

```{r, echo=FALSE}
ggplot(estimates, aes(pred2, col=corr, fill=corr)) + geom_density(alpha=0.8) + 
  scale_fill_manual(values = myPal) + scale_color_manual(values = myPal, guide=FALSE) + 
  facet_wrap(~corr)
```

We will also plot the distributions of the prediction interval margin of errors for each group of predictors. 

```{r, echo=FALSE}
ggplot(estimates, aes(margin2, col=corr, fill=corr)) + geom_density(alpha=0.8) + 
  scale_fill_manual(values = myPal) + scale_color_manual(values = myPal, guide=FALSE) + 
  facet_wrap(~corr)
```
