Population Parameters
We begin this lesson by recalling the definitions of the population mean, variance, and standard deviation.
Definition. Assume that \(X\) is a random variable defined for a population of size \(N\). Let the values of \(X\) for individuals in the population be denoted by \(x_1, x_2, ..., x_N\).
The population mean, denoted by \(\mu\), is defined by \(\mu = E[X] = \frac{1}{N} \sum\limits_{i=1}^N x_i\).
The population variance is denoted by \(\sigma^2\) and is defined by \(\sigma^2 = Var[X] = \frac{1}{N} \sum\limits_{i=1}^N (x_i - \mu)^2\).
The population standard deviation is denoted by \(\sigma\) and is defined by \(\sigma = SD[X] = \sqrt{\frac{1}{N} \sum\limits_{i=1}^N (x_i - \mu)^2}\).
Notice that the units for \(\sigma^2\) are the square of the units in which \(X\) was measured. The units for \(\sigma\), however, are the same as those of \(X\).
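The definitions above can be sketched directly in R. This is a minimal illustration, assuming a small made-up population of \(N = 8\) values; the vector pop and the names mu, sigma2, and sigma are hypothetical and chosen only for this example.

```r
# Hypothetical population of N = 8 values (illustrative only).
pop <- c(2, 4, 4, 4, 5, 5, 7, 9)
N <- length(pop)

mu <- sum(pop) / N                 # population mean
sigma2 <- sum((pop - mu)^2) / N    # population variance (divide by N)
sigma <- sqrt(sigma2)              # population standard deviation

c(mu = mu, sigma2 = sigma2, sigma = sigma)
```

For these values the results are \(\mu = 5\), \(\sigma^2 = 4\), and \(\sigma = 2\). The formulas are written out by hand because R's built-in var() divides by \(N - 1\) rather than \(N\); for this vector it would return \(32/7 \approx 4.571\) instead of 4.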
Sample Statistics
One of the primary goals of the field of Statistics is to learn about a population by studying a sample drawn at random from that population. Assume that we are interested in a variable \(X\) defined for a population, and that we do not know the values of the population mean, variance, or standard deviation. We could draw a sample and use information gained from that sample to produce estimates of \(\mu\), \(\sigma^2\), and \(\sigma\). These estimates will be referred to as sample statistics.
Definition. Consider a sample \(x_1, x_2, ..., x_n\) from a population denoted by \(X\).
The sample mean is denoted by \(\bar x\) and is defined by \(\bar x = \frac{1}{n} \sum\limits_{i=1}^n x_i\).
The sample variance is denoted by \(s^2\) and is defined by \(s^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (x_i - \bar x)^2\).
The sample standard deviation is denoted by \(s\) and is defined by \(s = \sqrt{ \frac{1}{n-1} \sum\limits_{i=1}^n (x_i - \bar x)^2}\).
Using R to Calculate \(s\) and \(s^2\)
We will now illustrate how to use R to calculate the sample variance and standard deviation. We start by generating a random sample of ten observations drawn from a normal distribution with a mean of 8 and a standard deviation of 2. That is, we will sample 10 observations of the random variable \(X \sim N(\mu = 8, \sigma = 2)\).
x <- rnorm(n = 10, mean = 8, sd = 2)
x
[1] 4.692689 8.221005 5.920189 12.080893 6.224123 10.099972 9.510903 7.407727 10.388348
[10] 9.853547
We will now calculate the sample mean for our sample.
n <- length(x)
xbar <- sum(x) / n
xbar
[1] 8.43994
In the next code chunk, we will create a vector called errors, which stores the error (the deviation from the sample mean) for each observation. We will then square the entries in the error vector and sum the entries in the vector of squared errors. This sum is called the Sum of Squared Errors, or SSE. Finally, we divide SSE by \(n-1\) to calculate the sample variance.
errors <- x - xbar
sq_errors <- errors^2
SSE <- sum(sq_errors)
s2 <- SSE / (n-1)
s2
[1] 5.485341
We now take the square root of the sample variance to calculate the sample standard deviation.
s <- sqrt(s2)
s
[1] 2.34208
As it turns out, R has built-in functions for directly calculating the mean, variance, and standard deviation of a sample.
mean(x)
[1] 8.43994
var(x)
[1] 5.485341
sd(x)
[1] 2.34208
Properties of Mean, Variance, and Standard Deviation
We will conclude this lesson by stating some important properties of the mean, variance, and standard deviation. Each of these properties is stated for the population parameters, but also holds for the corresponding sample statistics.
Theorem. Let \(X\) be a random variable, and let \(a\) and \(k\) be constants. Then:
1. \(\mathrm{E}[a X] = a\cdot \mathrm{E}[X]\)
2. \(\mathrm{E}[X + k] = \mathrm{E}[X] + k\)
3. \(\mathrm{Var}[a X] = a^2 \cdot \mathrm{Var}[X]\)
4. \(\mathrm{SD}[a X] = |a| \cdot \mathrm{SD}[X]\)
5. \(\mathrm{Var}[X + k] = \mathrm{Var}[X]\)
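The five properties can be checked numerically for sample statistics. The following is a minimal sketch, assuming a small made-up sample and arbitrary constants; the vector x and the values of a and k are hypothetical. The constant a is negative on purpose, and the check of the fourth property uses abs(a), since a standard deviation cannot be negative.

```r
# Hypothetical sample and constants (illustrative only).
x <- c(3, 7, 8, 12, 15)
a <- -2
k <- 5

# Each entry compares the two sides of one property, in the order listed.
# all.equal() is used instead of == to allow for floating-point rounding.
checks <- c(
  prop1 = isTRUE(all.equal(mean(a * x), a * mean(x))),
  prop2 = isTRUE(all.equal(mean(x + k), mean(x) + k)),
  prop3 = isTRUE(all.equal(var(a * x), a^2 * var(x))),
  prop4 = isTRUE(all.equal(sd(a * x), abs(a) * sd(x))),
  prop5 = isTRUE(all.equal(var(x + k), var(x)))
)
checks
```

Every entry of checks should be TRUE. A numerical check on one sample is only evidence, not a proof.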
We will provide proofs of Property 1 and Property 3 in the case where \(X\) is a random variable defined on a population of size \(N\).
Proof of Property 1. Let \(x_1, x_2, ..., x_N\) denote the values of \(X\) for individuals within the population. Then:
\[\mathrm{E}[a X] =
\frac{1}{N} \sum_{i=1}^N (a x_i) =
\frac{1}{N} \cdot a \cdot \sum_{i=1}^N x_i =
a \left(\frac{1}{N} \sum_{i=1}^N x_i \right) =
a \cdot \mathrm{E}[X]\]
Proof of Property 3. Let \(x_1, x_2, ..., x_N\) denote the values of \(X\) for individuals within the population. Then:
\[\mathrm{Var}[a X] =
\frac{1}{N} \sum_{i=1}^N (a x_i - E[a X])^2 =
\frac{1}{N} \sum_{i=1}^N (a x_i - a \mu)^2 =
\frac{1}{N} \sum_{i=1}^N \left[a(x_i - \mu)\right]^2
\] \[
= \frac{1}{N} \sum_{i=1}^N a^2(x_i - \mu)^2 =
a^2 \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2 =
a^2 \cdot \mathrm{Var}[X]
\]
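The rule for the standard deviation follows from Property 3 by taking square roots. A short derivation, using only the definition \(\mathrm{SD}[X] = \sqrt{\mathrm{Var}[X]}\) and the fact that \(\sqrt{a^2} = |a|\):

\[\mathrm{SD}[a X] =
\sqrt{\mathrm{Var}[a X]} =
\sqrt{a^2 \cdot \mathrm{Var}[X]} =
|a| \cdot \sqrt{\mathrm{Var}[X]} =
|a| \cdot \mathrm{SD}[X]\]

The scale factor equals \(a\) itself only when \(a \geq 0\).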
Comments on the Definition of the Sample Variance
Recall that the variance of a population is defined by \(\sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_i - \mu)^2\). Given the definition of population variance, it might seem strange that we divide the SSE by \(n-1\) rather than \(n\) when calculating sample variance. It would certainly make intuitive sense to divide by the sample size \(n\).
To help explain why we define sample variance as \(s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar x)^2\), let \(t^2= \frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2\) denote the “intuitive” definition of sample variance.
The primary purpose of calculating the sample variance is to use it as an estimate of \(\sigma^2\). As it turns out, \(s^2\) provides a slightly better estimate of \(\sigma^2\) than \(t^2\) does. The estimate \(t^2\) tends to underestimate the true value of \(\sigma^2\), and as such is said to be a biased estimator.
We will not offer a proof that \(t^2\) is a biased estimator of \(\sigma^2\) and that \(s^2\) is unbiased, but we will provide some evidence to attempt to convince you of these facts.
Biasedness of \(t^2\)
To illustrate that \(t^2\) is a biased estimator of \(\sigma^2\), we will draw 10,000 samples, each one containing \(n=5\) observations of the variable \(X \sim N(\mu = 100, \sigma = 5)\). We will calculate \(t^2\) for each sample. We will then calculate the mean of the \(t^2\) values over all samples.
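The simulation described above can be sketched as follows. This is a minimal version, assuming a fixed seed for reproducibility; the call to set.seed(), the helper names n_samples and t2_vector, and the use of replicate() are choices made for this sketch.

```r
set.seed(1)  # assumption: fixed seed so that the run is reproducible

n_samples <- 10000  # number of simulated samples
n <- 5              # observations per sample

# For each simulated sample, compute the SSE and divide by n (not n - 1).
t2_vector <- replicate(n_samples, {
  x <- rnorm(n = n, mean = 100, sd = 5)
  errors <- x - mean(x)
  sum(errors^2) / n
})

mean(t2_vector)
```

The exact value printed depends on the seed, so no specific output is shown here.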
Notice that, on average, the value of \(t^2\) is around 20, and thus underestimates the population variance \(\sigma^2 = 25\).
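A value near 20 is consistent with a standard result, stated here without proof: for \(n\) independent observations from a population with variance \(\sigma^2\), the expected value of \(t^2\) is \(\frac{n-1}{n}\sigma^2\). With \(n = 5\) and \(\sigma^2 = 25\), this gives:

\[\mathrm{E}[t^2] =
\frac{n-1}{n} \cdot \sigma^2 =
\frac{4}{5} \cdot 25 =
20\]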
Unbiasedness of \(s^2\)
We will now illustrate that \(s^2\) is an unbiased estimator of \(\sigma^2\) using the same method we used to show that \(t^2\) is biased. We will draw 10,000 samples of \(n=5\) observations of the variable \(X \sim N(\mu = 100, \sigma = 5)\), calculating \(s^2\) for each sample. We will then calculate the mean of the resulting \(s^2\) values.
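A minimal sketch of this second simulation is given below. As before, the fixed seed, the helper names n_samples and s2_vector, and the use of replicate() are assumptions made for this sketch. The only change to the calculation is the divisor, which is now \(n - 1\).

```r
set.seed(1)  # assumption: fixed seed so that the run is reproducible

n_samples <- 10000  # number of simulated samples
n <- 5              # observations per sample

# For each simulated sample, compute the SSE and divide by n - 1.
s2_vector <- replicate(n_samples, {
  x <- rnorm(n = n, mean = 100, sd = 5)
  errors <- x - mean(x)
  sum(errors^2) / (n - 1)
})

mean(s2_vector)
```

Inside the loop, sum(errors^2) / (n - 1) could be replaced by var(x), which computes the same quantity.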
On average, the value of \(s^2\) is around 25, which is the true value of \(\sigma^2\).