8. The Simple Linear Regression Model

8.1 The “True” Model
8.2 Understanding the SLR Model
8.3 Predictive Intervals Using the SLR Model
8.4 Sampling Distributions of the Regression Model and Confidence Intervals
8.5 Hypothesis Testing

8.1 The “True” Model
• If we take a different sample of size 15, we would clearly get a different best-fit regression line. That is, the line that minimizes the sum of in-sample squared errors depends on the particular sample we got.
• Here is a new set of data: 20 houses from the same neighborhood.
[Figure: Regression plot of the new set of 20 houses from the same neighborhood (Price, 50 to 150, against Size, 1 to 3). Fitted line: Y = 24.7587 + 40.7068X, R-Sq = 0.769. The regression line from the old data is overlaid for comparison.]
The true model
• We now describe the “true” relationship between X and Y, or the “true regression model”.
• In the last couple of sections of the notes, we considered the example of estimating the Normal(μ, σ²) model with parameters μ and σ².
• We viewed x̄ and sx² as estimates of μ and σ².
• We now view b0 and b1 as estimates of the parameters of a “true model”. So what is the true regression model?
Our model is:

Y = β0 + β1X + ε

• β0 is the true intercept and β1 is the true slope; β0 + β1X is the true line describing the linear pattern.
• ε captures how far off the line the points tend to be (everything about Y not captured by X).

We assume that ε has a normal distribution:

ε ~ N(0, σ²)

Saying that ε is a normally distributed random variable is a statement about the possible values that ε could turn out to be.

• The mean of ε is zero: sometimes Y is above the line, sometimes Y is below the line, and on average Y is on the line.
• The variance of ε is σ². If σ is small, ε tends to be small (close to zero). If σ is large, ε tends to be large (far from zero).
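To make the roles of β0, β1, and σ concrete, here is a minimal simulation sketch of the model in Python (numpy assumed; the parameter values are the illustrative ones used later in Section 8.3):

```python
import numpy as np

# Simulate the model Y = beta0 + beta1*X + eps, eps ~ N(0, sigma^2).
rng = np.random.default_rng(0)

beta0, beta1, sigma = 40.0, 35.0, 15.0  # true intercept, true slope, error sd
n = 20

x = rng.uniform(1.0, 3.5, size=n)       # house sizes (thousands of sqft)
eps = rng.normal(0.0, sigma, size=n)    # everything about Y not captured by X
y = beta0 + beta1 * x + eps             # simulated prices (thousands)

# Each fresh sample gives a different best-fit line:
b1, b0 = np.polyfit(x, y, 1)            # least-squares slope and intercept
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # estimates of beta0 and beta1
```

Re-running the simulation draws a new sample, and b0 and b1 bounce around the true values 40 and 35; this is exactly the sample-to-sample variation illustrated by the two housing datasets above.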
A picture of the model:

[Figure: The true line Y = β0 + β1X drawn in the (X, Y) plane. At X = x1 the response is Y1 = β0 + β1x1 + ε1 with ε1 ~ N(0, σ²); at X = x2 likewise ε2 ~ N(0, σ²). Given X = x1, we expect Y1 to fall in a band around the line.]
You can see the role played by σ:

[Figure: Two scatter plots of Price (50 to 200) against Size (1.0 to 3.5). Left panel, “If σ is large…”: the points scatter widely around the true line. Right panel, “If σ is small…”: the points cluster tightly around the true line.]
Y = β0 + β1X + ε,  ε ~ N(0, σ²)

The model has three unknown parameters:

• β0 and β1 determine the “true line”.
• σ describes how big the part of Y unrelated to X can be.

We estimate these parameters from the data.
In fact, this is what we have …

[Figure: A cloud of data points in the (X, Y) plane, with no line drawn.]

From the data we have to estimate the true line (β0 and β1) and σ.
8.2 Understanding the SLR Model
Regression reveals information about the
conditional distribution of Y given X.
That is, what do we know about Y for a
particular value of the X variable.
The marginal distribution tells us about
a random variable, without conditioning
on the value of another random variable.
• Recall our discussion of discrete random variables.
• We could talk about the unconditional distribution (i.e., whether bond prices go up or down), or we could talk about conditional distributions (i.e., whether bonds go up or down given that stocks went down).
Joint distribution of bonds (B) and stocks (S), where 0 = down and 1 = up:

            S = 0   S = 1   Pr(B)
  B = 0     .30     .15     .45
  B = 1     .20     .35     .55

The right-hand column, (.45, .55), is the marginal or “unconditional” distribution of bonds.

The conditional distribution of bonds given stocks go down, Pr(B | S = 0):

  Pr(B = 0 | S = 0) = .3/.5 = 3/5
  Pr(B = 1 | S = 0) = .2/.5 = 2/5
Interpretation: On days that stocks fell, there was a 3/5 chance that bonds also fell and a 2/5 chance that bonds went up.

Back to the regression problem. We can use the regression model to form the conditional expected value and variance of Y given X:

μ_{Y|X} = E(Y | X) = E(β0 + β1X + ε | X) = β0 + β1X

σ²_{Y|X} = Var(Y | X) = Var(β0 + β1X + ε | X) = σ²

The conditional distribution of Y given X:

Y | X ~ N(β0 + β1X, σ²)
Example
• Suppose that the distribution of sales prices is Normal with a mean of 125 (thousand) and a standard deviation of 33.
[Figure: The density f(y) of the N(125, 33²) distribution, plotted over prices from 50 to 190.]
• Suppose that the true value of β0 is 40 and the true value of β1 is 35.
• Then the average sales price of homes with 3000 sqft is β0 + β1(3) = 40 + 35(3) = 145.
[Figure: The density f(y) of the N(145, 15²) distribution, plotted over prices from 50 to 190.]
Let’s graph the probability (density) of the sales price both unconditionally and conditional on size = 3000 sqft.

[Figure: Both densities f(Y) plotted over Price (Y) from 50 to 200. The conditional density given Size = 3000 is taller and narrower than the unconditional density.]
8.3 Predictive Intervals Using the SLR Model
For the housing problem, suppose we knew the true parameter values:

β0 = 40, β1 = 35, σ = 15

What can be said about a house with X = 3.0?

From the model, Y | X = 3 ~ N(145, 15²).

So there is a 95% chance that a house with 3000 sqft sells for somewhere between 145 − 2(15) and 145 + 2(15), that is, between 115,000 and 175,000.
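A quick numerical check of this interval (a sketch assuming scipy is available):

```python
from scipy import stats

# If Y | X = 3 ~ N(145, 15^2), the interval (115, 175) should cover ~95%.
lo, hi = 145 - 2 * 15, 145 + 2 * 15
p = stats.norm.cdf(hi, loc=145, scale=15) - stats.norm.cdf(lo, loc=145, scale=15)
print(lo, hi, round(p, 4))  # 115 175 0.9545
```

The exact coverage of the mean ± 2 standard deviations is about 95.45%, which the notes round to 95%.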
In general, given X and the model parameters, we are 95% sure that Y is in the interval

(β0 + β1X − 2σ, β0 + β1X + 2σ)

In real life, we don’t know the true parameters of the model (β0, β1, and σ). We estimate them from the data, and we use the estimated values rather than the true, unknown, values.
The best guess for σ is (basically) the sample standard deviation of the residuals:

s² = (1/(n − 2)) Σ_{i=1}^n e_i²   (we divide by n − 2 for technical reasons)

It is found in the regression output as “StErr of Est”:

Results of simple regression for Price

Summary measures
  Multiple R     0.9092
  R-Square       0.8267
  StErr of Est   14.1384

ANOVA table
  Source        df   SS           MS           F         p-value
  Explained      1   12393.1077   12393.1077   61.9983   0.0000
  Unexplained   13    2598.6256     199.8943

Regression coefficients
            Coefficient   Std Err   t-value   p-value   Lower limit   Upper limit
  Constant      38.8847    9.0939    4.2759    0.0009       19.2385       58.5309
  Size          35.3860    4.4941    7.8739    0.0000       25.6771       45.0948
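Here is a sketch of how that number is computed (numpy assumed; the arrays y and yhat are hypothetical outputs of a fitted regression):

```python
import numpy as np

def stderr_of_est(y, yhat):
    """Sample standard deviation of the residuals, using an n-2 divisor."""
    e = y - yhat                   # residuals
    n = len(e)
    s2 = np.sum(e ** 2) / (n - 2)  # divide by n-2, not n
    return np.sqrt(s2)
```

Applied to the housing data, this computation would reproduce the 14.1384 reported as “StErr of Est” above.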
8.4 Sampling Distributions of the Regression Model and Confidence Intervals

• It turns out that both b0 and b1 are unbiased estimates of β0 and β1, so that E(b0) = β0 and E(b1) = β1.
• The next question is how wrong they might be. We need to know the standard deviation (or standard error) of b0 and b1.
Back to the housing regression:

Results of simple regression for Price

Summary measures
  Multiple R     0.9092
  R-Square       0.8267
  StErr of Est   14.1384

ANOVA table
  Source        df   SS          MS          F         p-value
  Explained      1   12393.107   12393.107   61.9983   0.0000
  Unexplained   13    2598.6256    199.8943

Regression coefficients
            Coef.     Std Err   t-value   p-value   Lower limit   Upper limit
  Constant  38.8847    9.0939    4.2759    0.0009       19.2385       58.5309
  Size      35.3860    4.4941    7.8739    0.0000       25.6771       45.0948

The Std Err column gives the standard errors of the estimates: s_{b0} = 9.0939 and s_{b1} = 4.4941.
It turns out that the standardized distance that bj lies away from the true parameter βj follows the t_{n−2} distribution (“t distribution with n − 2 degrees of freedom”):

(bj − βj) / s_{bj} ~ t_{n−2}

The confidence intervals and hypothesis tests are based on this key result.

Thus, a 95% confidence interval for βj is

(bj − t_{n−2,.025} s_{bj}, bj + t_{n−2,.025} s_{bj})

• This is the same notation as before; specifically, we would use =tinv(.05, n−2) in Excel to obtain t_{n−2,.025}.
Back to the housing regression:

Regression coefficients
            Coef.     Std Err   t-value   p-value   Lower limit   Upper limit
  Constant  38.8847    9.0939    4.2759    0.0009       19.2385       58.5309
  Size      35.3860    4.4941    7.8739    0.0000       25.6771       45.0948

Goal: Form a 95% confidence interval for β1.

b1 = 35.4, s_{b1} = 4.5, t_{n−2,.025} = t_{13,.025} = 2.16

Calculation: 35.4 ± (2.16)(4.5) = 35.4 ± 9.7

95% confidence interval for β1: (25.7, 45.1)
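The same calculation in scipy (a sketch; stats.t.ppf plays the role of Excel’s =tinv):

```python
from scipy import stats

b1, sb1, n = 35.3860, 4.4941, 15           # from the regression output above
tcrit = stats.t.ppf(0.975, df=n - 2)       # t_{13,.025}; analog of =tinv(.05, 13)
print(round(tcrit, 2))                     # 2.16
print(b1 - tcrit * sb1, b1 + tcrit * sb1)  # about (25.68, 45.09)
```

These endpoints match the “Lower limit” and “Upper limit” columns in the output.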
8.5 Hypothesis Testing
Suppose you are interested in testing whether the true slope parameter, β1, is equal to a particular value.

Ex: Is the “beta coefficient” from the market model equal to 1?

If you want to test whether X affects Y at all in a simple linear regression model, you would test whether β1 is equal to zero.
Example

In finance, a well-known model for pricing assets regresses stock returns against the returns of some market index, such as the S&P 500. The slope of the regression line is referred to as the asset’s “beta”. In your finance course you will learn that assets with high betas tend to be viewed as riskier than assets with low betas.

Facts:
• The market portfolio (S&P 500) has a beta of 1.
• A portfolio with a beta larger than 1 should have a higher average return than the market (S&P 500).
• A portfolio with a beta less than 1 should have a smaller average return than the market (S&P 500).
Form the following null hypothesis:

H0: β1 = β1⁰

The alternative hypothesis is defined as the opposite of the null hypothesis:

Ha: β1 ≠ β1⁰

For the market model example, the null and alternative hypotheses are:

H0: β1 = 1 and Ha: β1 ≠ 1
To test H0: β1 = β1⁰, form the t-statistic:

t = (b1 − β1⁰) / s_{b1}

The intuition is exactly as before:
• The top is just the difference between the estimate and the claimed value.
• The bottom is the standard error that accounts for the accuracy of our estimate.

If the null hypothesis is true, the t-statistic should be small (in absolute value). If the null hypothesis is false, the t-statistic should be large (in absolute value).

The t-test: reject H0 at the 5% level if

|t| = |b1 − β1⁰| / s_{b1} > t_{n−2,.025}
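As a sketch, the whole decision rule fits in a few lines (scipy assumed; this helper function is ours, introduced purely for illustration):

```python
from scipy import stats

def t_test_slope(b1, sb1, n, b1_null=0.0, alpha=0.05):
    """Two-sided t-test of H0: beta1 = b1_null in simple linear regression."""
    t = (b1 - b1_null) / sb1                      # standardized distance
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # cutoff for |t|
    return t, abs(t) > tcrit                      # (t-statistic, reject H0?)
```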
Beta estimate for IBM (n = 240):

Regression coefficients
            Coefficient   Std Err   t-value   p-value   Lower limit   Upper limit
  Constant      0.0031     0.0046    0.6786    0.4980       −0.0059        0.0122
  Market        0.8889     0.1011    8.7891    0.0000        0.6896        1.0881

Here b1 = 0.8889 and s_{b1} = 0.1011.

Test #1: Test the null hypothesis that there is no relationship between the return on IBM and the market return.

H0: β1 = 0 and Ha: β1 ≠ 0

t-statistic:

t = (b1 − 0) / s_{b1} = .889 / .101 = 8.78

This is just the t-value reported by Excel! The sample is large, so the t cutoff is the same as for the Normal(0,1), namely about 2.

Reject H0 at the 5% level.
Test #2: Test the null hypothesis that IBM has the same risk as the market portfolio (i.e., test that the beta is 1).

H0: β1 = 1 and Ha: β1 ≠ 1

t-statistic:

t = (b1 − 1) / s_{b1} = −.1111 / .1011 = −1.0989

Since |t| < 2, we fail to reject H0 at the 5% level.
P-values are calculated just as before:

p-value = Pr(|t_{n−2}| > |t|)

where t_{n−2} is a random variable from the t_{n−2} distribution and t is the calculated t-statistic.
The p-value for testing β1 = 0 is part of Excel’s regression output (under the p-value column).

Compute the p-value for testing β1 = 1 from the IBM regression:

p-value = Pr(|t_{238}| > |−1.0989|) = tdist(1.0989, 238, 2) = .273

So, the p-value is 27.3%.
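The same number in scipy (a sketch; t.sf gives the upper-tail probability, so doubling it reproduces the two-tailed =tdist value):

```python
from scipy import stats

p = 2 * stats.t.sf(1.0989, df=238)  # Pr(|t_238| > 1.0989)
print(round(p, 3))                  # about 0.273
```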
RECALL

Loosely speaking: a small p-value means we reject!

More precisely: if the p-value < .05, then we reject at level .05!

In general: if the p-value < α, then we reject at level α!
Adding more X’s to the prediction problem: the multiple regression model
• Clearly there is more information that is useful in predicting the sales price of a house than just the size.
• We need to expand our model to allow for this possibility.
• This will be called the multiple regression model.
9. The Multiple Regression Model

9.1 The multiple regression model
9.2 Fitted values and residuals again
9.3 Prediction in the multiple regression model
9.4 Confidence intervals and hypothesis testing
9.1 The Multiple Regression Model
• So far we have considered modeling and prediction when we have one Y variable and one X.
• Often we might think that there is more than one variable that would be useful in predicting (the size of the house is surely not the only variable that carries information about the expected sales price!).
• More information should lead to more precise prediction.
The “true” model with k independent (X) variables:

Y = β0 + β1X1 + β2X2 + … + βkXk + ε,  ε ~ N(0, σ²) i.i.d.

The conditional mean,

E(Y | X1, X2, …, Xk) = β0 + β1X1 + β2X2 + … + βkXk,

is the part of Y related to the X’s: the “true plane” (in a (k+1)-dimensional space!).

The residual, ε, is independent of all the X variables!
Once there is more than one X variable,
it’s difficult to plot the data and/or the
regression “line.”
One could use a 3-D plot for the case of
two X variables, but anything more than
two variables is basically hopeless…
Interpreting the coefficients:

βj = ∂E(Y | X1, …, Xk) / ∂Xj

is the average (or expected) change in Y for a one-unit change in Xj, holding all of the other independent variables fixed.
9.2 Fitted Values and Residuals

Parameters to estimate: β0, β1, …, βk, σ

Notation for the estimates: b0, b1, …, bk, s

Fitted values (in sample):

Ŷ_i = b0 + b1X1,i + b2X2,i + … + bkXk,i

Residuals:

e_i = Y_i − Ŷ_i
Choose b0, b1, …, bk to make the sum of squared residuals as small as possible:

minimize Σ_{i=1}^n e_i² = Σ_{i=1}^n (Y_i − b0 − b1X1,i − … − bkXk,i)²

In the simple linear regression case (k = 1), we gave explicit formulae for the least-squares estimates. To do so for k > 1 requires extensive use of matrix algebra. We pass (though see the numerical sketch below). However:
R-squared

• Rearranging the terms in the definition of the residual: Y_i = Ŷ_i + e_i and Var(Y) = Var(Ŷ) + Var(e).
• The R-squared has the same interpretation as before and is defined as:

R² = Var(Ŷ) / Var(Y)

• It answers the question of “what fraction of the variation in the Y’s is explained by the X’s”.
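Although we skip the matrix algebra, the least-squares fit and the R-squared are easy to compute numerically. A minimal sketch (numpy assumed; X is a hypothetical (n, k) array of regressors and y a length-n response):

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least-squares fit of y on a constant plus the columns of X."""
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # b = (b0, b1, ..., bk)
    yhat = Xd @ b                               # fitted values
    r2 = np.var(yhat) / np.var(y)               # R^2 = Var(Yhat) / Var(Y)
    return b, yhat, r2
```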
Given b0, b1, …, bk, the estimate of the variance of the true residual (σ²) is

s² = (1/(n − k − 1)) Σ_{i=1}^n e_i²

and the estimate of the standard deviation (σ) is

s = sqrt( (1/(n − k − 1)) Σ_{i=1}^n e_i² )

Fact: E(s²) = σ² (unbiased).
Regression output of son’s height (SHGT) on mom’s height (MHGT) and dad’s height (FHGT):

Results of multiple regression for SHGT

Summary measures
  Multiple R      0.5083
  R-Square        0.2583   ← the R-squared
  Adj R-Square    0.2380
  StErr of Est    2.5216   ← s

ANOVA Table
  Source        df   SS         MS        F         p-value
  Explained      2   161.6780   80.8390   12.7133   0.0000
  Unexplained   73   464.1773    6.3586

Regression coefficients
            Coefficient   Std Err   t-value   p-value   Lower limit   Upper limit
  Constant      29.0036    8.4198    3.4447    0.0010       12.2230       45.7842   ← b0
  MHGT           0.3601    0.1092    3.2966    0.0015        0.1424        0.5778   ← b1
  FHGT           0.2726    0.1063    2.5641    0.0124        0.0607        0.4846   ← b2
9.3 Prediction in the Multiple Regression Model

Predict Y given Xj,f (j = 1, …, k), where the prediction is obtained by plugging in the X’s:

Ypred = b0 + b1X1,f + b2X2,f + … + bkXk,f

The 95% prediction interval is:

(Ypred − 2s, Ypred + 2s)

where s is the standard error of the prediction error.
Prediction from the multiple regression: predict the son’s height when mom’s height is 64 inches and dad’s height is 70 inches.

Summary measures
  Multiple R      0.5083
  R-Square        0.2583
  Adj R-Square    0.2380
  StErr of Est    2.5216

Regression coefficients
            Coefficient   Std Err   t-value   p-value   Lower limit   Upper limit
  Constant      29.0036    8.4198    3.4447    0.0010       12.2230       45.7842
  MHGT           0.3601    0.1092    3.2966    0.0015        0.1424        0.5778
  FHGT           0.2726    0.1063    2.5641    0.0124        0.0607        0.4846

Ypred = 29.0036 + .3601(64) + .2726(70) ≈ 71.13

95% prediction interval = 71.13 ± 2(2.5216), i.e., about (66.09, 76.17)
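The same plug-in prediction as a quick sketch (numpy assumed; the coefficients are taken from the output above):

```python
import numpy as np

b = np.array([29.0036, 0.3601, 0.2726])  # (Constant, MHGT, FHGT)
x_f = np.array([1.0, 64.0, 70.0])        # 1 for the intercept, then mom's, dad's
s = 2.5216                               # StErr of Est

y_pred = b @ x_f                         # plug-in prediction
print(round(y_pred, 2))                  # 71.13
print(y_pred - 2 * s, y_pred + 2 * s)    # 95% prediction interval
```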
9.4 Distribution of bj in the Multiple Regression

Fact: E(b0) = β0, E(b1) = β1, …, E(bk) = βk.

Fact:

(bj − βj) / s_{bj} ~ t_{n−k−1}

The n − k − 1 degrees of freedom account for the fact that (k+1) parameters are being estimated.

This fact can be used to construct confidence intervals and perform t-tests.

(1) Confidence Intervals

The 95% confidence interval is:

(bj − t_{n−k−1,.025} s_{bj}, bj + t_{n−k−1,.025} s_{bj})
(2) t-Tests and p-values

Test H0: βj = βj⁰ vs. Ha: βj ≠ βj⁰.

Reject H0 at the 5% level if

|t| = |bj − βj⁰| / s_{bj} > t_{n−k−1,.025}

p-value = Pr(|t_{n−k−1}| > |t|)

where t_{n−k−1} is a random variable from the t_{n−k−1} distribution and t is the calculated t-statistic.
Example: Regression of male MBA student height on parents’ heights (n = 76).

Regression coefficients
            Coefficient   Std Err   t-value   p-value   Lower limit   Upper limit
  Constant      29.0036    8.4198    3.4447    0.0010       12.2230       45.7842
  MHGT           0.3601    0.1092    3.2966    0.0015        0.1424        0.5778
  FHGT           0.2726    0.1063    2.5641    0.0124        0.0607        0.4846

Both “partial” effects are significantly different from zero.

95% confidence interval for the MomHgt estimate:

(0.36 − t_{73,.025}(0.109), 0.36 + t_{73,.025}(0.109)) = (0.14, 0.58)

95% confidence interval for the DadHgt estimate:

(0.273 − t_{73,.025}(0.106), 0.273 + t_{73,.025}(0.106)) = (0.061, 0.485)