
Sample Exam Questions:
True or False?
1. If the sample means of x and y are zero, then the estimated
y-intercept is zero.
2. The slope of the simple regression model indicates how the
actual value of y changes as x changes.
3. If the sample covariance between x and y is zero, then the
slope of the LS regression line is zero.
Keys to Success:
1. Read the assigned readings before you come to class
2. Attend classes, and ask questions if you do not understand
3. Do your homework independently
IV. SLM: Properties of Least Squares Estimators
4.1 Unbiased LS Estimators
4.2 Best LS Estimators
- variances and covariance
- the Gauss-Markov Theorem (BLUE)
4.3 The Probability Distribution of the LS Estimators
4.4 Estimating the Variance of the Error Term
4.5 The Coefficient of Determination
R2, Adj. R2, AIC, SC
4.6 LS predictor and its Variance
(Reminder!)
Assumptions of the Simple Linear Regression Model
SR1. yt = β1 + β2 xt + et
SR2. E(et) = 0 ⇔ E(yt) = β1 + β2 xt
SR3. var(et) = σ² = var(yt)
SR4. cov(ei, ej) = cov(yi, yj) = 0
SR5. {xt, t = 1, ..., T} is a set of fixed values that
takes at least two different values
4.1. The Unbiased Estimators – sampling property
Quantity = β1 + β2 Price + ε
Sample Number      b1         b2
1                  70.2034    -0.0657
2                  69.0453    -0.0557
3                  73.2357    -0.0478
4                  71.3232    -0.1098
5                  65.3365    -0.0987
6                  68.6789    -0.0655
7                  69.0037    -0.0789
8                  34.3387    -0.0599
9                  57.0098    -0.0768
10                 64.4455    -0.0699
• E(b) = β
• The OLS procedure (the estimator) is unbiased, but we cannot say that an individual
estimate obtained from a particular sample is unbiased.
Estimator: when the formulas for b1 and b2 are taken to be rules
that are used whatever the sample data turn out to be, b1 and b2
are random variables. In this context we call b1 and b2
the least squares estimators.
Estimate: when actual sample values (numbers) are substituted
into the formulas, we obtain numbers that are realized values of these
random variables. In this context we call b1 and b2 the least squares estimates.
• Proof: E(b) = β
b2 = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)²
E(b2) = Σ(xt − x̄) E(yt − ȳ) / Σ(xt − x̄)² = β2
b1 = ȳ − b2 x̄
yt = β1 + β2 xt + et   ⇒   E(yt) = β1 + β2 xt
ȳ = β1 + β2 x̄ + ē     ⇒   E(ȳ) = β1 + β2 x̄
⇒   E(yt − ȳ) = β2 (xt − x̄)
E(b1) = E(ȳ) − E(b2) x̄ = β1 + β2 x̄ − β2 x̄ = β1
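To see this sampling property numerically, here is a minimal Monte Carlo sketch (not from the lecture); the parameter values β1 = 70, β2 = −0.07, σ = 5 and the fixed x grid are hypothetical, chosen to mimic the quantity-price table above. Averaging the estimates over many samples gives values close to the true β1 and β2.

    # Monte Carlo sketch: draw many samples from an assumed true model and
    # average the least squares estimates; the averages approximate beta1, beta2.
    import numpy as np

    rng = np.random.default_rng(0)
    beta1, beta2, sigma, T = 70.0, -0.07, 5.0, 40   # assumed "true" values
    x = np.linspace(100, 500, T)                    # fixed regressor values (SR5)

    b1_draws, b2_draws = [], []
    for _ in range(10_000):
        y = beta1 + beta2 * x + rng.normal(0, sigma, T)   # errors satisfy SR2-SR4
        b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b1 = y.mean() - b2 * x.mean()
        b1_draws.append(b1)
        b2_draws.append(b2)

    print(np.mean(b1_draws), np.mean(b2_draws))   # close to 70 and -0.07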
4.2 The Best Estimators
• If the regression model assumptions SR1-SR5 are correct
(SR6 is not required), then the variances and covariance of b1 and
b2 are derived from:
Var(b1) = E[b1 − E(b1)]²
Cov(b1, b2) = E{[b1 − E(b1)][b2 − E(b2)]}

var(b1) = σ² [ Σ xt² / (T Σ(xt − x̄)²) ]
var(b2) = σ² / Σ(xt − x̄)²
cov(b1, b2) = σ² [ −x̄ / Σ(xt − x̄)² ]
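As a quick numerical illustration (not from the lecture), the three formulas can be evaluated directly for an assumed set of fixed x values and an assumed σ²:

    # Evaluate var(b1), var(b2) and cov(b1, b2) for assumed x values and sigma^2.
    import numpy as np

    x = np.linspace(100, 500, 40)   # assumed fixed regressor values
    sigma2 = 25.0                   # assumed error variance
    T = x.size
    Sxx = np.sum((x - x.mean()) ** 2)

    var_b1 = sigma2 * np.sum(x ** 2) / (T * Sxx)
    var_b2 = sigma2 / Sxx
    cov_b1_b2 = sigma2 * (-x.mean()) / Sxx
    print(var_b1, var_b2, cov_b1_b2)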
Derivations
• var(b2) = E[b2 − E(b2)]² = E[b2 − β2]²
• b2 = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)²
     = Σ(xt − x̄)[β2 (xt − x̄) + (et − ē)] / Σ(xt − x̄)²
     = β2 + Σ(xt − x̄) et / Σ(xt − x̄)² − ē Σ(xt − x̄) / Σ(xt − x̄)²
     = β2 + Σ(xt − x̄) et / Σ(xt − x̄)²
• b2 − β2 = Σ(xt − x̄) et / Σ(xt − x̄)²
  ⇒ (b2 − β2)² = [Σ(xt − x̄) et]² / [Σ(xt − x̄)²]²
• E(b2 − β2)² = E[Σ(xt − x̄) et]² / [Σ(xt − x̄)²]²
  = E[{(x1 − x̄)e1 + (x2 − x̄)e2 + … + (xT − x̄)eT}{(x1 − x̄)e1 + (x2 − x̄)e2 + … + (xT − x̄)eT}] / [Σ(xt − x̄)²]²
  = E[Σ(xt − x̄)² et²] / [Σ(xt − x̄)²]²        (the cross-product terms vanish by SR4)
  = Σ(xt − x̄)² E(et²) / [Σ(xt − x̄)²]²
  = σ² Σ(xt − x̄)² / [Σ(xt − x̄)²]²
  = σ² / Σ(xt − x̄)²
“Need to prove that this is the best”
The Gauss-Markov Theorem
Gauss-Markov Theorem: Under the assumptions SR1-SR5 of
the linear regression model, the estimators b1 and b2 have the
smallest variance of all linear and unbiased estimators of β1 and
β2. They are the Best Linear Unbiased Estimators (BLUE) of β1
and β2.
1. The estimators b1 and b2 are “best” when compared to
similar estimators, those that are linear and unbiased.
Note that the Theorem does not say that b1 and b2 are the
best of all possible estimators.
2. The estimators b1 and b2 are best within their class
because they have the minimum variance.
3. In order for the Gauss-Markov Theorem to hold,
the assumptions (SR1-SR5) must be true.
If any of the assumptions 1-5 are not true, then b1 and b2
are not the best linear unbiased estimators of β1 and β2.
4. The Gauss-Markov Theorem does not depend
on the assumption of normality
5. The Gauss-Markov theorem applies to the least squares
estimators.
It does not apply to the least squares estimates
from a single sample.
Proof of the Gauss-Markov Theorem:
Step 1. Define a generalized linear unbiased estimator form that
includes our OLS estimator.
Step 2. Derive the variance of the generalized linear unbiased
estimator.
Step 3. Show that the generalized estimator becomes the OLS estimator
when its variance is the smallest.
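The following sketch is only an illustration of the theorem's content, not the formal proof: it compares var(b2) with the variance of another linear unbiased estimator of β2, a hypothetical "two endpoints" estimator b* = (yT − y1)/(xT − x1), whose variance 2σ²/(xT − x1)² is larger, as Gauss-Markov guarantees. The x values and σ² are assumptions.

    # Compare the OLS slope variance with that of another linear unbiased
    # estimator (the hypothetical "two endpoints" estimator b*).
    import numpy as np

    x = np.linspace(100, 500, 40)   # assumed fixed regressor values
    sigma2 = 25.0                   # assumed error variance

    var_ols = sigma2 / np.sum((x - x.mean()) ** 2)        # var(b2)
    var_two_point = 2 * sigma2 / (x[-1] - x[0]) ** 2      # var(b*)
    print(var_ols, var_two_point)   # var_ols is the smaller of the two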
4.3 The Probability Distribution of the LSEs
• If we make the normality assumption, assumption SR6 about the error term,
then the least squares estimators (a linear combination of the error term) are
normally distributed.
b1 ~ N( β1 ,  σ² Σ xt² / (T Σ(xt − x̄)²) )
b2 ~ N( β2 ,  σ² / Σ(xt − x̄)² )
• If assumptions SR1-SR5 hold, and if the sample size T is sufficiently large,
then the least squares estimators have a distribution that approximates
the normal distributions shown above.
4.4 Estimating the Variance of the Error Term
The variance of the random variable et is
var(et) = σ² = E[et − E(et)]² = E(et²)
if the assumption E(et) = 0 is correct.
Since the “expectation” is an average value, we might consider
estimating σ² as the average of the squared errors,
σ̂² = Σ et² / T
• Recall that the random errors are
et = yt − β1 − β2 xt
• The least squares residuals are obtained by replacing the unknown
parameters by their least squares estimators,
êt = yt − b1 − b2 xt     and     σ̂² = Σ êt² / T
• There is a simple modification that produces an unbiased estimator,
and that is
σ̂² = Σ êt² / (T − 2),     with E(σ̂²) = σ²
• Need to show that σ̂² = Σ êt² / (T − 2) is “Unbiased” while Σ êt² / T is “Biased”:
E[ (1/T) Σ êt² ] = (1/T) E(Σ êt²) ≠ σ²                  “Biased”
E[ (1/(T − 2)) Σ êt² ] = (1/(T − 2)) E(Σ êt²) = σ²      “Unbiased”
• Evaluate E(Σ êt²)
êt = yt − b1 − b2 xt = β1 + β2 xt + et − (b1 + b2 xt)
   = et + (β1 − b1) + (β2 − b2) xt
   = (et − ē) − (b2 − β2)(xt − x̄)
because β1 = ȳ − β2 x̄ − ē and b1 = ȳ − b2 x̄.
From êt = (et − ē) − (b2 − β2)(xt − x̄):
Σ êt² = Σ(et − ē)² + (b2 − β2)² Σ(xt − x̄)² − 2 (b2 − β2) Σ(et − ē)(xt − x̄)
E(Σ êt²) — remember, we are looking for this guy!
(1) E{Σ(et − ē)²} = E{Σ et² + T ē² − 2 ē Σ et}
    = Σ E(et²) + T E(ē²) − (2/T) E[(Σ et)²]
    = T σ² + σ² − 2σ² = (T − 1) σ²
(2) E{(b2 − β2)² Σ(xt − x̄)²} = [σ² / Σ(xt − x̄)²] · Σ(xt − x̄)² = σ²
(3) E{−2(b2 − β2) Σ et (xt − x̄)}
    (note that Σ(et − ē)(xt − x̄) = Σ et (xt − x̄), since Σ(xt − x̄) = 0)
  = −2 E{ [Σ(xt − x̄)(yt − ȳ) − β2 Σ(xt − x̄)²] / Σ(xt − x̄)² · Σ et (xt − x̄) }
  = −2 E{ [Σ(xt − x̄){β2 (xt − x̄) + et} − β2 Σ(xt − x̄)²] / Σ(xt − x̄)² · Σ et (xt − x̄) }
  = −2 E{ [β2 Σ(xt − x̄)² − β2 Σ(xt − x̄)² + Σ et (xt − x̄)] / Σ(xt − x̄)² · Σ et (xt − x̄) }
  = −2 E{ [Σ et (xt − x̄)]² / Σ(xt − x̄)² }
  = −2 Σ(xt − x̄)² E(et²) / Σ(xt − x̄)²
  = −2 E(et²) = −2σ²
E(Σ êt²) = (1) + (2) + (3) = (T − 1)σ² + σ² − 2σ² = (T − 2)σ²
• E[ (1/T) Σ êt² ] = (1/T) E(Σ êt²) = [(T − 2)/T] σ² ≠ σ²
• E[ (1/(T − 2)) Σ êt² ] = (1/(T − 2)) E(Σ êt²) = (1/(T − 2)) (T − 2) σ² = σ²
⇒ σ̂² = Σ êt² / (T − 2),    E(σ̂²) = σ²
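A small simulation sketch (assumed parameter values, not from the lecture) makes the bias visible: averaged over many samples, Σêt²/T is centered below σ² by the factor (T − 2)/T, while Σêt²/(T − 2) is centered on σ².

    # Compare sum(e_hat^2)/T (biased) with sum(e_hat^2)/(T-2) (unbiased).
    import numpy as np

    rng = np.random.default_rng(1)
    beta1, beta2, sigma2, T = 70.0, -0.07, 25.0, 20   # assumed values
    x = np.linspace(100, 500, T)

    naive, corrected = [], []
    for _ in range(20_000):
        y = beta1 + beta2 * x + rng.normal(0, np.sqrt(sigma2), T)
        b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b1 = y.mean() - b2 * x.mean()
        ehat = y - b1 - b2 * x
        naive.append(np.sum(ehat ** 2) / T)
        corrected.append(np.sum(ehat ** 2) / (T - 2))

    print(np.mean(naive), np.mean(corrected), sigma2)   # ~22.5 vs ~25.0 vs 25.0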
Estimating the Variances and Covariances of the Least Squares
Estimators
• Replace the unknown error variance σ² in the earlier formulas with
σ̂² for the variance and covariance estimators:
var̂(b1) = σ̂² [ Σ xt² / (T Σ(xt − x̄)²) ]
var̂(b2) = σ̂² / Σ(xt − x̄)²
cov̂(b1, b2) = σ̂² [ −x̄ / Σ(xt − x̄)² ]
se(b1) = √var̂(b1)
se(b2) = √var̂(b2)
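A minimal sketch with hypothetical data, showing how σ̂² feeds into the estimated variances and the standard errors se(b1) and se(b2):

    # Estimate sigma^2 and plug it into the variance/covariance formulas.
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(100, 500, 40)
    y = 70.0 - 0.07 * x + rng.normal(0, 5.0, x.size)   # hypothetical data

    T = x.size
    Sxx = np.sum((x - x.mean()) ** 2)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b1 = y.mean() - b2 * x.mean()
    ehat = y - b1 - b2 * x
    sigma2_hat = np.sum(ehat ** 2) / (T - 2)

    var_b1_hat = sigma2_hat * np.sum(x ** 2) / (T * Sxx)
    var_b2_hat = sigma2_hat / Sxx
    cov_hat = sigma2_hat * (-x.mean()) / Sxx
    se_b1, se_b2 = np.sqrt(var_b1_hat), np.sqrt(var_b2_hat)
    print(b1, b2, sigma2_hat, se_b1, se_b2, cov_hat)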
4.5 The Coefficient of Determination, R²
Two major reasons for analyzing the model yt = β1 + β 2 xt + et
are
1. Estimation: to explain how the dependent variable (yt) changes
as the independent variable (xt) changes, and
2. Prediction: to predict y0 given an x0.
• For the “prediction” purpose, we introduce the “explanatory”
variable xt in the hope that its variation will “explain” the variation
in yt.
How well do the explanatory variables explain the variation in yt?
How to compute the coefficient of determination, R2 ?
1. To develop a measure of the variation in yt that is explained
by the model, we begin by separating yt into its explainable
and unexplainable components.
yt = E(yt) + et
• E(yt) = β1 + β2 xt is the explainable, “systematic” component of yt,
• et is the random, unsystematic, unexplainable noise component of yt.
2. We can estimate the unknown parameters β1 and β2 and
decompose the value of yt into
yt = E(yt) + et   ⇒   yt = b1 + b2 xt + êt = ŷt + êt
ȳ = b1 + b2 x̄ + ê̄
yt − ȳ = b2 (xt − x̄) + (êt − ê̄) = b2 (xt − x̄) + êt = (ŷt − ȳ) + êt
(the mean residual ê̄ is zero, because the LS residuals sum to zero when the model contains an intercept)
3. SST = SSR + SSE:
Σ(yt − ȳ)² = b2² Σ(xt − x̄)² + Σ êt² = Σ(ŷt − ȳ)² + Σ êt²
Σ(yt − ȳ)²  = Sum of Squares for Total Variation (SST)
Σ(ŷt − ȳ)²  = Sum of Squares from Regression (SSR)
Σ êt²       = Sum of Squares from Error (SSE)
[Figure: scatter of the observations (xt, yt) around the fitted line ŷt = b1 + b2 xt, showing the decomposition of each deviation from the mean: (yt − ŷt) ⇒ Σ êt² = SSE, (ŷt − ȳ) ⇒ Σ(ŷt − ȳ)² = SSR, and (yt − ȳ) ⇒ Σ(yt − ȳ)² = SST.]
R² = Σ(ŷt − ȳ)² / Σ(yt − ȳ)² = SSR / SST
4. R², a measure of the proportion of variation
in y explained by x within the regression model:
R² = SSR / SST = 1 − SSE / SST = b2² Σ(xt − x̄)² / Σ(yt − ȳ)² = 1 − Σ êt² / Σ(yt − ȳ)²
• R² is called the coefficient of determination.
• The closer it is to one, the better the job we have done in
explaining the variation in yt with ŷt = b1 + b2 xt, and the greater
the predictive ability of our model over all the sample
observations.
• R² = 1 when SSE = 0; R² = 0 when SSR = 0; otherwise 0 < R² < 1.
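A short sketch (continuing the same hypothetical data) that computes SST, SSR and SSE and checks that SST = SSR + SSE and that the two expressions for R² agree:

    # Decompose the total variation and compute R^2 two ways.
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(100, 500, 40)
    y = 70.0 - 0.07 * x + rng.normal(0, 5.0, x.size)   # hypothetical data

    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    yhat = b1 + b2 * x
    ehat = y - yhat

    SST = np.sum((y - y.mean()) ** 2)
    SSR = np.sum((yhat - y.mean()) ** 2)
    SSE = np.sum(ehat ** 2)
    print(SST, SSR + SSE)            # equal (up to rounding)
    print(SSR / SST, 1 - SSE / SST)  # same R^2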
Uncentered vs. centered R²
1. Centered:
Σ(yt − ȳ)² = Σ(ŷt − ȳ)² + Σ êt²   ⇒   R² = Σ(ŷt − ȳ)² / Σ(yt − ȳ)²
2. Uncentered:
Σ yt² = Σ ŷt² + Σ êt²   ⇒   R² = Σ ŷt² / Σ yt²
[Figure: the same scatter around ŷt = b1 + b2 xt, but with deviations measured from zero rather than from ȳ: (yt − ŷt) ⇒ Σ êt² = SSE, (ŷt − 0) ⇒ Σ ŷt² = SSR, and SST = Σ yt²; R² = Σ ŷt² / Σ yt².]
1. Uncentered R²
y = Xb + ê = ŷ + ê
y′y = (ŷ + ê)′(ŷ + ê) = ŷ′ŷ + ê′ê          ⇐ ŷ′ê = b′X′ê = 0
    = (b′X′Xb) + ê′ê
    = {(X′X)⁻¹X′y}′ X′X {(X′X)⁻¹X′y} + ê′ê
    = y′X(X′X)⁻¹ X′X (X′X)⁻¹X′y + ê′ê
    = y′X(X′X)⁻¹X′y + ê′ê
R² = y′X(X′X)⁻¹X′y / y′y
Σ yt² = Σ ŷt² + Σ êt²   ⇒   R² = Σ ŷt² / Σ yt²
2. Centered R²
Σ(yt − ȳ)² = Σ(ŷt − ȳ)² + Σ êt²   ⇒   R² = Σ(ŷt − ȳ)² / Σ(yt − ȳ)²
Σ(yt − ȳ)² = Σ(yt² + ȳ² − 2 yt ȳ) = Σ yt² + T (Σ yt / T)² − 2 (Σ yt)(Σ yt) / T
           = Σ yt² − (Σ yt)² / T = y′y − (i′y)² / T
R² = Σ(ŷt − ȳ)² / Σ(yt − ȳ)² = [y′X(X′X)⁻¹X′y − (i′y)²/T] / [y′y − (i′y)²/T] = b′X′AXb / y′Ay
where A is the T×T matrix A = I_T − (1/T) i i′
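A sketch (hypothetical data) evaluating both the uncentered and the centered R² from the matrix expressions above; X, i and A are built exactly as defined there:

    # Uncentered and centered R^2 from the matrix expressions.
    import numpy as np

    rng = np.random.default_rng(3)
    T = 40
    x = np.linspace(100, 500, T)
    y = 70.0 - 0.07 * x + rng.normal(0, 5.0, T)     # hypothetical data
    X = np.column_stack([np.ones(T), x])            # design matrix with intercept

    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                           # OLS coefficients
    i = np.ones(T)
    A = np.eye(T) - np.outer(i, i) / T              # A = I - (1/T) i i'

    R2_uncentered = (y @ X @ XtX_inv @ X.T @ y) / (y @ y)
    R2_centered = (b @ X.T @ A @ X @ b) / (y @ A @ y)
    print(R2_uncentered, R2_centered)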
R-square is a descriptive measure.
By itself it does not measure the quality of the regression model.
It is not the objective of regression analysis to find the model
with the highest R².
Following a regression strategy focused solely on maximizing R² is not
a good idea.
Why is it not the objective of regression?
Conceptually…
R² has to do with predictability only; it measures the linear relationship between y and E(y).
Empirically…
The more explanatory variables are included in the regression, the higher R² is.
5. Adjusted R², the Coefficient of Determination Adjusted for Degrees of Freedom
R̄² = 1 − [SSE / (T − 2)] / [SST / (T − 1)] = 1 − σ̂² / [Σ(yt − ȳ)² / (T − 1)]
R̄² < R²
6. Akaike Information Criterion
AIC = ln(ê′ê / T) + 2k / T
7. Schwarz Criterion
SC = ln(ê′ê / T) + (k / T) ln T
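A sketch computing R̄², AIC and SC from the definitions above for the same hypothetical sample (note that software packages may use likelihood-based versions of AIC/SC that differ from these by an additive constant):

    # Adjusted R^2, AIC and SC from the definitions above (k = 2 coefficients).
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(100, 500, 40)
    y = 70.0 - 0.07 * x + rng.normal(0, 5.0, x.size)   # hypothetical data
    T, k = x.size, 2

    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    ehat = y - b1 - b2 * x
    SSE = np.sum(ehat ** 2)
    SST = np.sum((y - y.mean()) ** 2)

    R2_adj = 1 - (SSE / (T - k)) / (SST / (T - 1))
    AIC = np.log(SSE / T) + 2 * k / T
    SC = np.log(SSE / T) + (k / T) * np.log(T)
    print(R2_adj, AIC, SC)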
The computer output usually contains the Analysis of Variance.
For a simple regression analysis with 40 observations it is:

Analysis of Variance Table
Source         DF           Sum of Squares
Explained       1  (K−1)      25221.2229
Unexplained    38  (T−k)      54311.3314
Total          39  (T−1)      79532.5544
R-square: 0.3171
Sample Computer Output

Dependent Variable: FOODEXP
Method: Least Squares
Sample: 1 40
Included observations: 40

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            40.76756      22.13865     1.841465      0.0734
INCOME       0.128289      0.030539     4.200777      0.0002

R-squared             0.317118    Mean dependent var      130.3130
Adjusted R-squared    0.299148    S.D. dependent var      45.15857
S.E. of regression    37.80536    Akaike info criterion   10.15149
Sum squared error     54311.33    Schwarz criterion       10.23593
Log likelihood       -201.0297    F-statistic             17.64653
Durbin-Watson stat    2.370373    Prob(F-statistic)       0.000155
4.6 The Least Squares Predictor
We want to predict, for a given value x0 of the explanatory variable,
the value of the dependent variable y0, which is given by
y0 = β1 + β2 x0 + e0
where e0 is a random error. This random error has mean E(e0) = 0
and variance var(e0) = σ². We also assume that cov(e0, et) = 0.
The least squares predictor of y0 is
ŷ0 = b1 + b2 x0
The forecast error is
f = ŷ0 − y0 = b1 + b2 x0 − (β1 + β2 x0 + e0) = (b1 − β1) + (b2 − β2) x0 − e0
The expected value of f is:
E(f) = E(ŷ0 − y0) = E(b1 − β1) + E(b2 − β2) x0 − E(e0) = 0 + 0 − 0 = 0
ŷ0 is an unbiased linear predictor of y0.
Variance of forecast error
• f = ŷ0 − y0 = b1 + b2 x0 − (β1 + β2 x0 + e0) = (b1 − β1) + (b2 − β2) x0 − e0
• var(f) = var(b1) + x0² var(b2) + var(e0) + 2 x0 cov(b1, b2)
  = σ² Σ xt² / (T Σ(xt − x̄)²) + x0² σ² / Σ(xt − x̄)² + σ² − 2 x̄ x0 σ² / Σ(xt − x̄)²
  = σ² [ x̄² / Σ(xt − x̄)² + 1/T + x0² / Σ(xt − x̄)² + 1 − 2 x̄ x0 / Σ(xt − x̄)² ]
    (using Σ xt² = Σ(xt − x̄)² + T x̄²)
  = σ² [ 1 + 1/T + (x0 − x̄)² / Σ(xt − x̄)² ]
• cov(b1, b2) = E[(b1 − E(b1))(b2 − E(b2))]
  = E[(ȳ − b2 x̄ − β1)(b2 − β2)]
  = E[(β1 + β2 x̄ + ē − b2 x̄ − β1)(b2 − β2)]
  = E{−x̄ (b2 − β2)(b2 − β2)}        (the term involving ē has expectation zero)
  = −x̄ E(b2 − β2)²
  = −x̄ σ² / Σ(xt − x̄)²
• var(ŷ0) = var(b1 + b2 x0) = σ² [ 1/T + (x0 − x̄)² / Σ(xt − x̄)² ]
“Variance of Predicted Value”
Estimated variance of forecast error
var(f) = var(ŷ0 − y0) = σ² [ 1 + 1/T + (x0 − x̄)² / Σ(xt − x̄)² ]
The forecast error variance is estimated by replacing σ² by its
estimator, σ̂²:
var̂(f) = σ̂² [ 1 + 1/T + (x0 − x̄)² / Σ(xt − x̄)² ]
The square root of the estimated variance is the standard error of
the forecast,
se(f) = √var̂(f)
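A closing sketch (same hypothetical data; x0 = 300 is an assumed value) computing the point prediction ŷ0 and the standard error of the forecast se(f):

    # Point prediction at x0 and the standard error of the forecast.
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(100, 500, 40)
    y = 70.0 - 0.07 * x + rng.normal(0, 5.0, x.size)   # hypothetical data
    T = x.size
    Sxx = np.sum((x - x.mean()) ** 2)

    b2 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b1 = y.mean() - b2 * x.mean()
    sigma2_hat = np.sum((y - b1 - b2 * x) ** 2) / (T - 2)

    x0 = 300.0                       # assumed value of the regressor
    y0_hat = b1 + b2 * x0
    var_f_hat = sigma2_hat * (1 + 1 / T + (x0 - x.mean()) ** 2 / Sxx)
    se_f = np.sqrt(var_f_hat)
    print(y0_hat, se_f)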