Deterministic vs. Stochastic Models
Let \(X\) and \(Y\) be two variables that are related in some way. Assume that we wish to build a model that allows us to make predictions about the value of \(Y\) based on a supplied value for \(X\).
A deterministic model is one in which there is no randomness. Given a specific value for the input variable \(X\), a deterministic model will return a specific value for \(Y\). When working with deterministic models, we tend to use the following terms to refer to the variables:
- The input variable \(X\) is called the independent variable.
- The output variable \(Y\) is called the dependent variable.
A stochastic model is one that incorporates a random element into its predictions. Given a specific value of \(X\), a stochastic model provides a probability distribution for the variable \(Y\). The model does not tell us the exact value of \(Y\), but it allows us to make probabilistic statements concerning its possible values. When working with stochastic models, we tend to use the following terms to refer to the variables:
- The input variable \(X\) is called the predictor variable.
- The output variable \(Y\) is called the response variable.
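The difference between the two kinds of model can be sketched in a few lines of Python. The function names, the coefficients 4.0 and 0.3, and the noise standard deviation 0.2 are all hypothetical values chosen for illustration only.

```python
import random

def deterministic_model(x):
    # No randomness: the same x always produces the same y.
    return 4.0 + 0.3 * x

def stochastic_model(x, rng):
    # The output is a random draw from a distribution that depends on x
    # (here, a normal distribution centered at the deterministic value).
    return 4.0 + 0.3 * x + rng.gauss(0.0, 0.2)

rng = random.Random(3)  # seeded so the run is reproducible
x = 10.0

det_values = [deterministic_model(x) for _ in range(5)]
sto_values = [stochastic_model(x, rng) for _ in range(5)]

print("deterministic:", det_values)
print("stochastic:   ", sto_values)
```

Calling the deterministic model repeatedly with the same input returns one value, while the stochastic model returns a different draw each time.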
Hypothetical Model
In simple linear regression, we assume that the predictor variable \(X\) and the response variable \(Y\) are related by a linear stochastic relationship of the following form:
\[Y = \beta_0 + \beta_1 X + e\]
where \(e\) is a random variable. We call the equation above the hypothetical model or population model for linear regression.
The random variable \(e\) is called the error term or the noise term.
The error term \(e\) is intended to account for the (hopefully small) effects that have not been explained by the deterministic portion of our model.
Given a specific paired observation \(\left(x_i, y_i \right)\), we write \(y_i = \beta_0 + \beta_1 x_i+ e_i\).
When applying this model, we will often assume that \(e\) follows a normal distribution with a mean of 0 and some standard deviation \(\sigma\). The size of \(\sigma\) reflects the amount of uncertainty in our model.
Note that if \(E[e|X]=0\) (for example, when \(e\) has mean 0 and is independent of \(X\)), then \(E[Y|X] = \beta_0 + \beta_1 X\).
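As a rough check of the last two points, we can simulate many draws of \(Y\) at a fixed value of \(X\) and compare the sample mean with \(\beta_0 + \beta_1 X\). This is only a sketch: the parameter values below are hypothetical, and the error term is assumed to be normal with mean 0 and independent of \(X\).

```python
import random
import statistics

# Hypothetical population parameters (assumed for illustration only).
beta0, beta1, sigma = 4.0, 0.3, 0.2

def draw_y(x, rng):
    # One draw from the hypothetical model Y = beta0 + beta1 * X + e,
    # with e ~ Normal(0, sigma) drawn independently of x.
    e = rng.gauss(0.0, sigma)
    return beta0 + beta1 * x + e

rng = random.Random(0)
x = 10.0
ys = [draw_y(x, rng) for _ in range(20000)]

sample_mean = statistics.fmean(ys)
sample_sd = statistics.stdev(ys)
conditional_mean = beta0 + beta1 * x

print("sample mean of Y at x:", sample_mean)
print("beta0 + beta1 * x:    ", conditional_mean)
print("sample sd of Y at x:  ", sample_sd)
```

With a large number of draws, the sample mean should land close to \(\beta_0 + \beta_1 x\) and the sample standard deviation close to \(\sigma\).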
Fitted Model
We generally begin a linear regression problem by assuming that a hypothetical model of the form \(Y = \beta_0 + \beta_1 X + e\) exists and describes the relationship between our variables. We do not generally have direct access to such a model, however. Our goal in a regression problem is to approximate the hypothetical model by collecting a sample of paired observations of the form \(\left(x_i, y_i \right)\), and then using those to attempt to reconstruct our model. An approximate model generated from sample data is referred to as a fitted model.
For simple linear regression, creating a fitted model involves estimating the values of the parameters \(\beta_0\) and \(\beta_1\), as well as proposing a reasonable distribution for \(e\).
We denote the estimates of \(\beta_0\) and \(\beta_1\) by \(\hat\beta_0\) and \(\hat\beta_1\) respectively.
Given parameter estimates \(\hat\beta_0\) and \(\hat\beta_1\), we define a new variable \(\hat Y = \hat \beta_0 + \hat\beta_1 X\).
The variable \(\hat Y\) is called the fitted value of \(Y\). It is the point estimate that the model generates for \(Y\) given the value of \(X\).
For a given paired observation \(\left(x_i, y_i \right)\), we write \(\hat y_i = \hat \beta_0 + \hat \beta_1 x_i\).
For a given paired observation \(\left(x_i, y_i \right)\), we have that \(\hat e_i = y_i - \left(\hat \beta_0 + \hat \beta_1 x_i \right)\), or in other words \(\hat e_i = y_i - \hat y_i\).
The values \(\hat e_i\) calculated from the paired observations in our sample are called residuals. We can use the residuals to approximate the distribution of the error term \(e\).
We can write our fitted model as either \(\hat Y = \hat \beta_0 + \hat\beta_1 X\) or \(Y = \hat \beta_0 + \hat\beta_1 X + \hat e\).
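The fitted values and residuals defined above can be computed directly once parameter estimates are in hand. The sketch below uses a small made-up sample and made-up estimates; neither comes from real data, and no claim is made that these estimates are good ones.

```python
# Hypothetical sample of paired observations (x_i, y_i).
xs = [6.0, 8.0, 10.0, 12.0, 14.0]
ys = [5.9, 6.3, 7.1, 7.5, 8.3]

# Hypothetical parameter estimates (assumed, not derived from the sample).
beta0_hat, beta1_hat = 4.0, 0.3

# Fitted values: y_hat_i = beta0_hat + beta1_hat * x_i
fitted = [beta0_hat + beta1_hat * x for x in xs]

# Residuals: e_hat_i = y_i - y_hat_i
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

for x, y, y_hat, e_hat in zip(xs, ys, fitted, residuals):
    print(f"x={x:5.1f}  y={y:4.1f}  fitted={y_hat:5.2f}  residual={e_hat:+.2f}")
```

Each observed \(y_i\) equals its fitted value plus its residual, which is the second form of the fitted model written out observation by observation.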
Summary
We assume that the relationship between our variables is determined by a hypothetical model of the form:
\[Y = \beta_0 + \beta_1 X + e\]
The fitted model is an approximation of the hypothetical model, and can be written in either of the following forms:
\[\hat Y = \hat \beta_0 + \hat \beta_1 X\]
\[Y = \hat \beta_0 + \hat \beta_1 X + \hat e\]
Given a paired observation \(\left(x_i, y_i \right)\), the fitted value of \(y\) given \(x_i\) is given by:
\[\hat y_i = \hat\beta_0 + \hat\beta_1 x_i\]
The residual associated with the observation is given by:
\[\hat e_i = y_i - \hat y_i\]
Finding the Fitted Model
We have not yet discussed how to find the fitted model \(\hat Y = \hat \beta_0 + \hat \beta_1 X\). Technically, any proposed values of the parameter estimates \(\hat\beta_0\) and \(\hat\beta_1\) would result in a fitted model. However, some values of these estimates will obviously result in a better estimate of the hypothetical model than others. In the next lecture, we will discuss how to find the best possible parameter estimates (at least for one common interpretation of what the word “best” means in this context).
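To see that some estimates are better than others, we can compare two candidate pairs on the same sample. The text above does not say what "best" means, so the sketch below assumes one possible yardstick, the sum of squared residuals, purely for illustration. The sample, the candidate values, and the helper name `sum_squared_residuals` are all hypothetical.

```python
# Hypothetical sample of paired observations (x_i, y_i).
xs = [6.0, 8.0, 10.0, 12.0, 14.0]
ys = [5.9, 6.3, 7.1, 7.5, 8.3]

def sum_squared_residuals(beta0_hat, beta1_hat):
    # Assumed yardstick: add up the squared residuals e_hat_i = y_i - y_hat_i.
    return sum((y - (beta0_hat + beta1_hat * x)) ** 2 for x, y in zip(xs, ys))

# Two hypothetical candidates for (beta0_hat, beta1_hat).
candidate_a = (4.0, 0.3)
candidate_b = (2.0, 0.6)

ssr_a = sum_squared_residuals(*candidate_a)
ssr_b = sum_squared_residuals(*candidate_b)

print("candidate A:", candidate_a, "sum of squared residuals:", ssr_a)
print("candidate B:", candidate_b, "sum of squared residuals:", ssr_b)
```

Both candidates define a fitted model, but under this assumed yardstick candidate A tracks the sample far more closely than candidate B.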