---
title: "Lesson 02 - Mean, Variance, Standard Deviation"
author: "Robbie Beane"
output:
  html_notebook:
    theme: flatly
    toc: true
    toc_depth: 2
    #toc_float: 
    #  collapsed: false
---

# Population Parameters

We begin this lesson by recalling the definitions of the population mean, variance, and standard deviation.

-----

**Definition.** Assume that $X$ is a random variable defined for a population of size $N$. Let the values of $X$ for individuals in the population be denoted by $x_1, x_2, ..., x_N$.

* The **population mean**, denoted by $\mu$, is defined by $\mu = \mathrm{E}[X] = \frac{1}{N} \sum\limits_{i=1}^N x_i$.

* The **population variance** is denoted by $\sigma^2$ and is defined by $\sigma^2 = \mathrm{Var}[X] = \frac{1}{N} \sum\limits_{i=1}^N (x_i - \mu)^2$.

* The **population standard deviation** is denoted by $\sigma$ and is defined by $\sigma = \mathrm{SD}[X] = \sqrt{\frac{1}{N} \sum\limits_{i=1}^N (x_i - \mu)^2}$.

-----

Note that $\sigma^2$ is measured in the square of the units of $X$, while $\sigma$ is measured in the same units as $X$. For example, if $X$ is a height in centimeters, then $\sigma^2$ is in square centimeters and $\sigma$ is in centimeters.
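As a small illustration, the chunk below applies these definitions to a hypothetical population of $N = 8$ values (the numbers are made up for this example). R's built-in `var()` and `sd()` compute the *sample* versions introduced below, which divide by $N - 1$, so the population formulas are written out directly here.

```{r}
# A hypothetical population of N = 8 values, chosen only for illustration
pop <- c(2, 4, 4, 4, 5, 5, 7, 9)
N <- length(pop)

mu <- sum(pop) / N                  # population mean: divide by N
sigma2 <- sum((pop - mu)^2) / N     # population variance: divide by N
sigma <- sqrt(sigma2)               # population standard deviation

c(mu = mu, sigma2 = sigma2, sigma = sigma)
```

For these values, $\mu = 40/8 = 5$, $\sigma^2 = 32/8 = 4$, and $\sigma = 2$.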


# Sample Statistics

One of the primary goals of Statistics is to learn about a population by studying a sample drawn at random from it. Suppose that we are interested in a variable $X$ defined for a population, and that we do not know the values of the population mean, variance, or standard deviation. We could draw a sample and use the information it provides to produce estimates of $\mu$, $\sigma^2$, and $\sigma$. These estimates are referred to as **sample statistics**.

-----

**Definition.** Consider a sample $x_1, x_2, ..., x_n$ of values of the variable $X$ drawn from a population.

* The **sample mean** is denoted by $\bar x$ and is defined by $\bar x = \frac{1}{n} \sum\limits_{i=1}^n x_i$.

* The **sample variance** is denoted by $s^2$ and is defined by $s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (x_i - \bar x)^2$.

* The **sample standard deviation** is denoted by $s$ and is defined by $s = \sqrt{ \frac{1}{n-1} \sum\limits_{i=1}^n (x_i - \bar x)^2}$.

-----
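A minimal sketch of these definitions as an R function follows. The name `sample_stats` is a hypothetical choice made for this example, not part of R. The last line checks the hand-coded formulas against R's built-in `mean()`, `var()`, and `sd()` on a small made-up sample.

```{r}
# Hypothetical helper: sample mean, variance, and SD computed from their definitions
sample_stats <- function(x) {
  n <- length(x)
  xbar <- sum(x) / n                     # sample mean
  s2 <- sum((x - xbar)^2) / (n - 1)      # sample variance: divide by n - 1
  c(mean = xbar, var = s2, sd = sqrt(s2))
}

x_demo <- c(2, 4, 4, 4, 5, 5, 7, 9)      # a made-up sample of size n = 8
demo_stats <- sample_stats(x_demo)
demo_stats

# Compare against R's built-in functions (should be TRUE)
all.equal(unname(demo_stats), c(mean(x_demo), var(x_demo), sd(x_demo)))
```

Here $\bar x = 5$ and $s^2 = 32/7 \approx 4.571$, since the sum of squared deviations (32) is divided by $n - 1 = 7$.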



# Using R to Calculate $s$ and $s^2$

We will now illustrate how to use R to calculate the sample variance and standard deviation. We start by generating a random sample of ten observations drawn from a normal distribution with a mean of 8 and a standard deviation of 2. That is, we will sample 10 observations of the random variable $X \sim N(\mu = 8, \sigma = 2)$.


```{r}
x <- rnorm(n = 10, mean = 8, sd = 2)
x
```

Next, we calculate the mean of our sample.

```{r}
n <- length(x)
xbar <- sum(x) / n
xbar
```

In the next code chunk, we create a vector called `errors` that stores the error $x_i - \bar x$ (the deviation from the sample mean) for each observation. We then square the entries of `errors` and sum the squared errors. This sum is called the Sum of Squared Errors, or SSE. Finally, we divide SSE by `n-1` to calculate the sample variance.

```{r}
errors <- x - xbar
sq_errors <- errors^2
SSE <- sum(sq_errors)
s2 <- SSE / (n-1)
s2
```

We now take the square root of the sample variance to calculate the sample standard deviation. 

```{r}
s <- sqrt(s2)
s
```

As it turns out, R has built-in functions for directly calculating the mean, variance, and standard deviation of a sample.

```{r}
mean(x)
```

```{r}
var(x)
```

```{r}
sd(x)
```
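Because `var()` always divides by $n - 1$, a divide-by-$n$ version (the population formula applied to the data) can be recovered by rescaling, as in the sketch below. The data vector is made up for illustration.

```{r}
y <- c(3.1, 4.7, 5.0, 6.2, 8.0)               # a made-up data vector
n_y <- length(y)

var_n_minus_1 <- var(y)                       # R's sample variance: divides by n - 1
var_n <- var_n_minus_1 * (n_y - 1) / n_y      # rescaled to divide by n instead

# The same quantity computed directly from the divide-by-n formula
var_n_direct <- sum((y - mean(y))^2) / n_y
all.equal(var_n, var_n_direct)
```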

# Comments on the Definition of the Sample Variance

Recall that the variance of a population is defined by $\sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2$. Given this definition, it might seem strange that we divide the SSE by `n-1` rather than `n` when calculating the sample variance; dividing by the sample size `n` would seem more natural.

To help explain why we define sample variance as $s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2$, let $t^2= \frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2$ denote the "intuitive" definition of sample variance. 

The primary purpose of calculating the sample variance is to use it as an estimate of $\sigma^2$. As it turns out, $t^2$ tends to underestimate the true value of $\sigma^2$: averaged over many samples, its values fall systematically below $\sigma^2$. For this reason, $t^2$ is said to be a **biased** estimator of $\sigma^2$. Dividing by $n-1$ instead of $n$ makes the estimate just large enough to remove this bias, so $s^2$ is an **unbiased** estimator of $\sigma^2$.

We will not offer a proof that $t^2$ is a biased estimator of $\sigma^2$ and $s^2$ is unbiased, but we will provide simulation evidence supporting these facts.

## Biasedness of $t^2$

To illustrate that $t^2$ is a biased estimator of $\sigma^2$, we will draw 10,000 samples, each one containing $n=5$ observations of the variable $X \sim N(\mu = 100, \sigma = 5)$. We will calculate $t^2$ for each sample. We will then calculate the mean of the $t^2$ values over all samples. 

```{r}
t2_vector <- numeric(10000)   # preallocate storage for the 10,000 t^2 values

for (i in 1:10000) {
  x <- rnorm(n = 5, mean = 100, sd = 5)
  errors <- x - mean(x)
  SSE <- sum(errors^2)
  t2_vector[i] <- SSE / 5     # t^2 divides the SSE by n = 5
}

mean(t2_vector)
```

Notice that on average, the value of $t^2$ is around 20, and thus under-estimates the population variance $\sigma^2 = 25$.
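The value 20 is not a coincidence. A standard result, stated here without proof, is that for a random sample of $n$ independent observations the expected value of $t^2$ is $\frac{n-1}{n}\sigma^2$. With $n = 5$ and $\sigma^2 = 25$:

$$
\mathrm{E}[t^2] = \frac{n-1}{n}\,\sigma^2 = \frac{4}{5} \cdot 25 = 20
$$

Since $s^2 = \frac{n}{n-1}\, t^2$, the same result gives $\mathrm{E}[s^2] = \frac{n}{n-1} \cdot \frac{n-1}{n}\,\sigma^2 = \sigma^2$.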


## Unbiasedness of $s^2$

We will now provide evidence that $s^2$ is an unbiased estimator of $\sigma^2$, using the same method we used to show that $t^2$ is biased. We draw 10,000 samples of $n=5$ observations of the variable $X \sim N(\mu = 100, \sigma = 5)$, calculating $s^2$ for each sample. We then calculate the mean of the resulting $s^2$ values.

```{r}
s2_vector <- numeric(10000)   # preallocate storage for the 10,000 s^2 values

for (i in 1:10000) {
  x <- rnorm(n = 5, mean = 100, sd = 5)
  errors <- x - mean(x)
  SSE <- sum(errors^2)
  s2_vector[i] <- SSE / 4     # s^2 divides the SSE by n - 1 = 4
}

mean(s2_vector)
```

On average, the value of $s^2$ is around 25, which is the true value of $\sigma^2$.
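The two loops above can also be written more compactly with `replicate()`, which evaluates an expression a given number of times and collects the results. The sketch below reuses the same setup ($n = 5$, $\mu = 100$, $\sigma = 5$, 10,000 samples), computes $s^2$ with `var()`, and obtains $t^2$ by rescaling. The seed value is an arbitrary choice that only makes the run reproducible.

```{r}
set.seed(1)                            # arbitrary seed, for reproducibility only
sample_size <- 5
n_samples <- 10000

# Each evaluation draws one sample of size 5 and returns its sample variance s^2
s2_values <- replicate(n_samples, var(rnorm(n = sample_size, mean = 100, sd = 5)))

# t^2 divides the same SSE by n instead of n - 1
t2_values <- s2_values * (sample_size - 1) / sample_size

c(mean_s2 = mean(s2_values), mean_t2 = mean(t2_values))
```

As in the loops, the average of the $s^2$ values should land near 25 and the average of the $t^2$ values near 20.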


# Properties of Mean, Variance, and Standard Deviation

We conclude this lesson by stating some important properties of the mean, variance, and standard deviation. Each property is stated here for population parameters, but the same properties also hold for sample statistics.

**Theorem.** Let $X$ be a random variable, and let $a$ and $k$ be constants. Then: 

1. $\mathrm{E}[a X] = a\cdot \mathrm{E}[X]$

2. $\mathrm{E}[X + k] = \mathrm{E}[X] + k$

3. $\mathrm{Var}[a X] = a^2 \cdot \mathrm{Var}[X]$

4. $\mathrm{SD}[a X] = |a| \cdot \mathrm{SD}[X]$

5. $\mathrm{Var}[X + k] = \mathrm{Var}[X]$
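These properties can be checked numerically. The sketch below uses a made-up data vector and the arbitrary constants $a = -3$ and $k = 10$ with R's sample functions `mean()`, `var()`, and `sd()`. A negative $a$ is used on purpose: the standard deviation scales by $|a|$, because a standard deviation is never negative.

```{r}
x_prop <- c(1.5, 2.0, 3.5, 7.0, 9.5)   # made-up data
a <- -3                                 # arbitrary scale constant (negative on purpose)
k <- 10                                 # arbitrary shift constant

# Each entry is TRUE when the two sides agree up to floating-point tolerance
checks <- c(
  prop1 = isTRUE(all.equal(mean(a * x_prop), a * mean(x_prop))),
  prop2 = isTRUE(all.equal(mean(x_prop + k), mean(x_prop) + k)),
  prop3 = isTRUE(all.equal(var(a * x_prop), a^2 * var(x_prop))),
  prop4 = isTRUE(all.equal(sd(a * x_prop), abs(a) * sd(x_prop))),
  prop5 = isTRUE(all.equal(var(x_prop + k), var(x_prop)))
)
checks
```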


We will provide proofs of Property 1 and Property 3 in the case where $X$ is a random variable defined on a population of size $N$. 

-----

**Proof of Property 1.** Let $x_1, x_2, ..., x_N$ denote the values of $X$ for individuals within the population. Then: 

<center>
$$\mathrm{E}[a X] = 
\frac{1}{N} \sum_{i=1}^N (a x_i) =  
\frac{1}{N} \cdot a \cdot \sum_{i=1}^N x_i =  
a \left(\frac{1}{N} \sum_{i=1}^N  x_i \right) = 
a \cdot \mathrm{E}[X]$$
</center>


-----

**Proof of Property 3.** Let $x_1, x_2, ..., x_N$ denote the values of $X$ for individuals within the population, and let $\mu = \mathrm{E}[X]$. By Property 1, $\mathrm{E}[a X] = a \mu$. Then:

<center>
$$\mathrm{Var}[a X] = 
\frac{1}{N} \sum_{i=1}^N (a x_i - \mathrm{E}[a X])^2 =
\frac{1}{N} \sum_{i=1}^N (a x_i - a \mu)^2 =  
\frac{1}{N} \sum_{i=1}^N \left[a(x_i - \mu)\right]^2  
$$
$$
= \frac{1}{N} \sum_{i=1}^N a^2(x_i - \mu)^2 =  
a^2 \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2 =  
a^2 \cdot \mathrm{Var}[X]
$$
</center>
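
-----

Property 4 is a direct consequence of Property 3: taking square roots, and using $\sqrt{a^2} = |a|$, gives

<center>
$$\mathrm{SD}[a X] = \sqrt{\mathrm{Var}[a X]} = \sqrt{a^2 \cdot \mathrm{Var}[X]} = |a| \cdot \sqrt{\mathrm{Var}[X]} = |a| \cdot \mathrm{SD}[X]$$
</center>

The absolute value matters only when $a$ is negative.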