EEP/IAS 118 - Introductory Applied Econometrics
Spring 2015
Sylvan Herskowitz
Section Handout 8
Pop Quiz: This quiz is worth 85% of your final grade.
You have 2 minutes. Go.
Imagine you are estimating a model: $cdsales = \beta_0 + \beta_1 radioplay + \beta_2 price + \beta_3 genre + u$. You are
interested in knowing how getting lots of radio play impacts CD sales.
1. What omitted variables may be biasing an estimation of this model?
2. What assumption would this lead us to violate?
3. In which direction do you think omitting this variable would bias the estimate of $\beta_1$?
Imagine you want to improve this model and introduce a new variable, quality, which (somehow)
measures the objective quality of a given album.
1. What is likely to happen to the SSR of the new model relative to the original?
2. What is likely to happen to the $R^2$ of the new model relative to the original?
3. What TWO effects might this have on the standard error of $\hat{\beta}_1$?
4. What is the formula for $Var(\hat{\beta}_1)$?
Warm-Up: Interpreting βˆ
Write down the two-sentence size interpretation for the underlined βˆ in each of the following
regressions:
$\widehat{\log(wage)} = 1.056 - .254\,female + .117\,educ$
$\widehat{error} = 2.670 + .279\,cm + .001\,avgcons$
Quick note on p-values and significance levels:
These four concepts—significance level, p-value, t-statistic, and critical value—are easy to mix up
and are intimately related. Maybe a picture helps.
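The relationship between the four concepts can also be sketched numerically. The snippet below is a minimal illustration, not part of the original handout: it uses the standard normal distribution (a reasonable stand-in for the t distribution only when the sample is large), and the test statistic `z = 2.10` is a hypothetical value chosen for the example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_sided_critical_value(alpha):
    """Critical value c such that P(|Z| > c) = alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)


def two_sided_p_value(z):
    """Probability of a statistic at least as extreme as z when H0 is true."""
    return 2 * (1 - std_normal.cdf(abs(z)))


alpha = 0.05  # significance level, chosen before looking at the data
z = 2.10      # HYPOTHETICAL test statistic, for illustration only

c = two_sided_critical_value(alpha)  # critical value implied by alpha
p = two_sided_p_value(z)             # p-value implied by the test statistic

# The two decision rules are equivalent ways of stating the same comparison.
reject_by_critical_value = abs(z) > c
reject_by_p_value = p < alpha

print(f"c = {c:.3f}, p-value = {p:.4f}")
print(reject_by_critical_value, reject_by_p_value)
```

The significance level determines the critical value, the test statistic determines the p-value, and comparing the statistic with the critical value always gives the same decision as comparing the p-value with the significance level.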
Hypothesis Testing and Confidence Interval Review
Just to help keep all of these tests straight:
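Test type: Population mean, e.g. $H_0: \mu = \mu_0$
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{\sqrt{s^2/n}}$
Distribution: $t \sim t_{n-1}$

Test type: Difference in population means, e.g. $H_0: \mu_1 - \mu_2 = \mu_0$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
Distribution: $t \sim t_{n_1 + n_2 - 2}$

Test type: Population proportion, e.g. $H_0: p = p_0$
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Distribution: $z \sim N(0, 1)$

Test type: Difference in population proportions, e.g. $H_0: p_1 - p_2 = p_0$
Test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - p_0}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}}$
Distribution: $z \sim N(0, 1)$

Test type: True regression parameter (k other vars), e.g. $H_0: \beta = \beta_0$
Test statistic: $t = \dfrac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$
Distribution: $t \sim t_{n-k-1}$

Test type: Multiple restrictions in regression (q restrictions, k total variables in UR model)
Test statistic: $F = \dfrac{(R^2_{UR} - R^2_{R})/q}{(1 - R^2_{UR})/(n - k - 1)}$
Distribution: $F \sim F_{q,\,n-k-1}$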
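As a quick illustration of the last row, the F statistic can be computed directly from the two R-squared values. This is only a sketch: the function name `f_statistic` and all of the input numbers are hypothetical and do not come from the handout.

```python
def f_statistic(r2_ur, r2_r, q, n, k):
    """R-squared form of the F statistic for q restrictions.

    r2_ur: R-squared of the unrestricted model
    r2_r:  R-squared of the restricted model
    q:     number of restrictions being tested
    n:     number of observations
    k:     number of regressors in the unrestricted model
    """
    numerator = (r2_ur - r2_r) / q
    denominator = (1 - r2_ur) / (n - k - 1)
    return numerator / denominator


# HYPOTHETICAL inputs, for illustration only.
F = f_statistic(r2_ur=0.30, r2_r=0.25, q=2, n=141, k=3)
print(f"F = {F:.3f} with (2, 137) degrees of freedom")
```

The resulting value would then be compared with the critical value from an F table with q and n − k − 1 degrees of freedom.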
And one for confidence intervals:
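Confidence interval for: Population mean (non-binary)
Interval: $\left[\bar{x} - c\,SE(\bar{x}),\ \bar{x} + c\,SE(\bar{x})\right]$
Standard error: $\sqrt{\dfrac{s^2}{n}}$
Distribution of c: $t_{n-1}$

Confidence interval for: Difference in population means (non-binary)
Interval: $\left[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})\right]$
Standard error: $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Distribution of c: $t_{n_1 + n_2 - 2}$

Confidence interval for: Population mean/proportion (binary)
Interval: $\left[\hat{p} - c\,SE(\hat{p}),\ \hat{p} + c\,SE(\hat{p})\right]$
Standard error: $\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Distribution of c: $t_{n-1}$

Confidence interval for: Difference in population proportions (binary)
Interval: $\left[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})\right]$
Standard error: $\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
Distribution of c: $t_{n_1 + n_2 - 2}$

Confidence interval for: Regression population parameter
Interval: $\left[\hat{\beta} - c\,SE(\hat{\beta}),\ \hat{\beta} + c\,SE(\hat{\beta})\right]$
Standard error: Stata output
Distribution of c: $t_{n-k-1}$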
CI and HT Practice with Regressions
Consider the equation:
$colGPA = \beta_0 + \beta_1 hsGPA + \beta_2 ACT + \beta_3 skipped + u$
where colGPA is cumulative college grade point average, hsGPA is high school GPA, ACT is the ACT score, and
skipped is the average number of lectures skipped per week. What are your expectations for the coefficients
in this equation?
1. Estimate the equation and report the results. Assume that n = 141. Test the hypothesis
$\beta_3 = 0$.
$\widehat{colGPA} = 1.3896 + .4118\,hsGPA + .0147\,ACT - .0831\,skipped$
(standard errors: intercept 0.332, hsGPA 0.094, ACT 0.011, skipped 0.026)
• Step 1: State the hypotheses:
H0 :
H1 :
• Step 2: Compute the test statistic:
t=
• Step 3: Choose significance level and critical value:
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:
2. Construct a 90% confidence interval for β 3 . Interpret your results.
(a) Confidence level:
(b) $\hat{\beta}_3$ & $SE(\hat{\beta}_3)$:
(c) Find $c_{90}$:
(d) Compute interval:
(e) Interpret:
3. Test the hypothesis $\beta_1 = .4$ against the two-sided alternative at the 5% significance level.
• Step 1: State the hypotheses:
H0 :
H1 :
• Step 2: Compute the test statistic:
t=
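• Step 3: Choose significance level and critical value: Using the t-table for a two-sided test at the
0.05 significance level with 141 − 3 − 1 = 137 degrees of freedom, c ≈ 1.96 (the large-sample value; the exact critical value for 137 degrees of freedom is about 1.98).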
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:
4. Test the hypothesis $\beta_1 = 1$ against $\beta_1 < 1$ at the 10% significance level.
• Step 1: State the hypotheses:
H0 :
H1 :
• Step 2: Compute the test statistic:
t=
• Step 3: Choose significance level and critical value:
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:
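One way to check your hand calculations for the four exercises above is a short script like the following. It is a sketch under stated assumptions rather than an official solution: the critical values are read from the 120-degrees-of-freedom row of a standard t-table (the nearest row to 137), the 5% level for the first test is my choice because the exercise leaves it open, and the helper names `t_stat` and `conf_int` are made up for this example. Using the large-sample critical values instead leads to the same decisions.

```python
# Estimates and standard errors copied from the fitted colGPA equation.
coef = {"hsGPA": 0.4118, "ACT": 0.0147, "skipped": -0.0831}
se = {"hsGPA": 0.094, "ACT": 0.011, "skipped": 0.026}


def t_stat(name, null_value):
    """t = (estimate - hypothesized value) / standard error."""
    return (coef[name] - null_value) / se[name]


def conf_int(name, c):
    """[estimate - c*SE, estimate + c*SE] for a given critical value c."""
    half_width = c * se[name]
    return coef[name] - half_width, coef[name] + half_width


# ASSUMPTION: critical values from the 120-df row of a t-table,
# the nearest tabulated row to 141 - 3 - 1 = 137 degrees of freedom.
C_TWO_SIDED_05 = 1.980  # two-sided test at the 5% level
C_TWO_SIDED_10 = 1.658  # two-sided test at the 10% level, also the 90% CI
C_ONE_SIDED_10 = 1.289  # one-sided test at the 10% level

# 1. H0: beta_3 = 0 against a two-sided alternative (5% level assumed).
t_beta3 = t_stat("skipped", 0)
reject_beta3 = abs(t_beta3) > C_TWO_SIDED_05

# 2. 90% confidence interval for beta_3.
ci_beta3 = conf_int("skipped", C_TWO_SIDED_10)

# 3. H0: beta_1 = .4 against a two-sided alternative at the 5% level.
t_beta1_04 = t_stat("hsGPA", 0.4)
reject_beta1_04 = abs(t_beta1_04) > C_TWO_SIDED_05

# 4. H0: beta_1 = 1 against H1: beta_1 < 1 at the 10% level.
#    Left-tailed test: reject when t falls below the negative critical value.
t_beta1_1 = t_stat("hsGPA", 1)
reject_beta1_1 = t_beta1_1 < -C_ONE_SIDED_10

print(f"1. t = {t_beta3:.3f}, reject: {reject_beta3}")
print(f"2. 90% CI = [{ci_beta3[0]:.4f}, {ci_beta3[1]:.4f}]")
print(f"3. t = {t_beta1_04:.3f}, reject: {reject_beta1_04}")
print(f"4. t = {t_beta1_1:.3f}, reject: {reject_beta1_1}")
```

Note that the one-sided test in exercise 4 compares the signed t statistic with a negative critical value, while the two-sided tests compare its absolute value with a positive one.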
Hypothesis Testing with Two Proportions
Example. Now let's consider an example from actual data for a poverty alleviation program in Mexico. In
1997, 24,059 households in rural Mexico were randomly allocated between treatment and control groups for
a conditional cash transfer program called Oportunidades to keep kids in school. When analyzing the results of a randomized experiment, the first step is to verify that the control group is, on average, very much
like the treatment group in terms of characteristics that we observe and have data for. For example, data
were collected on household assets. Your data reveal that 14.47% of the 14,846 treatment households
have a refrigerator, while 16.53% of the 9,213 control households have one. In order to check whether about the
same proportion of households in each group have a refrigerator, we need to perform a hypothesis test.
Call the sample proportion of households with a refrigerator in the treatment group $\hat{p}_t$, the true treatment proportion $p_t$, the sample proportion of households with a refrigerator in the control group $\hat{p}_c$, and
the true control proportion $p_c$. Also, call the whole sample proportion of households (in either treatment or
control) with a refrigerator $\hat{p}$.
Step 1.
$H_0: p_t - p_c = D = 0$
$H_1: p_t - p_c = D \neq 0$
Step 2. How do we compute this test statistic? We know that the null hypothesis specifies $E[\hat{p}_t - \hat{p}_c] = 0$,
so what's left is the standard deviation. Whenever we're testing a difference of means from two independent samples, remember
the formula: $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$.
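So applying the formula, we have that:
$Var(\hat{p}_t - \hat{p}_c) = Var(\hat{p}_t) + Var(\hat{p}_c)$
$Var(\hat{p}_t) = \dfrac{\hat{p}(1 - \hat{p})}{n_t}$
$Var(\hat{p}_c) = \dfrac{\hat{p}(1 - \hat{p})}{n_c}$
Which means $SD(\hat{p}_t - \hat{p}_c) = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_t} + \dfrac{\hat{p}(1 - \hat{p})}{n_c}}$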
The trickiest part here is keeping track of what your null hypothesis is! Now we're ready to calculate our
z-statistic:
$\hat{D} = \hat{p}_t - \hat{p}_c = -.0206$
$\hat{p} = \dfrac{14846}{24059}(.1447) + \dfrac{9213}{24059}(.1653) = .1526$
$SD(\hat{D}) = \sqrt{\dfrac{.1526(1 - .1526)}{14846} + \dfrac{.1526(1 - .1526)}{9213}} = .00477$
$\Rightarrow z = \dfrac{-.0206 - 0}{.00477} = -4.32$
Step 3. Given the alternative hypothesis we chose, we're doing a two-sided test. Let's choose the 5% significance level, as this is the level economists most commonly use. Check the normal table to find that
c = 1.96
Step 4. Reject or fail to reject? Since $|z| = 4.32 > 1.96 = c$, we reject the null.
Step 5. Interpret: At the 5% significance level, there is statistical evidence that the proportion of
households with a refrigerator in the control group is not the same as the proportion of households with a
refrigerator in the treatment group. What does this mean for the study? Probably not much. In randomized
experiments such as this one, many household characteristics are checked for “balance” across treatment and control.
Statistically, we expect that some of our hypothesis tests will reject the null simply because a 5% significance level
indicates that 5% of the time we will reject the null even though it’s true.
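The five steps above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are my own, and the p-value line is an addition that the handout does not compute.

```python
from math import sqrt
from statistics import NormalDist

# Sample sizes and sample proportions from the refrigerator example.
n_t, p_hat_t = 14846, 0.1447  # treatment households
n_c, p_hat_c = 9213, 0.1653   # control households

# Estimated difference in proportions.
d_hat = p_hat_t - p_hat_c

# Under H0 the two groups share one proportion, so pool the samples.
p_pooled = (n_t * p_hat_t + n_c * p_hat_c) / (n_t + n_c)

# Standard deviation of the difference under H0 (pooled proportion in both terms).
sd_d = sqrt(p_pooled * (1 - p_pooled) / n_t + p_pooled * (1 - p_pooled) / n_c)

# z statistic for H0: p_t - p_c = 0.
z = (d_hat - 0) / sd_d

# Two-sided p-value from the standard normal (not computed in the handout).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Decision at the 5% level with the critical value c = 1.96 from the text.
reject = abs(z) > 1.96

print(f"D = {d_hat:.4f}, pooled p = {p_pooled:.4f}, SD = {sd_d:.5f}")
print(f"z = {z:.2f}, p-value = {p_value:.6f}, reject H0: {reject}")
```

The key design point is that the pooled proportion appears in both variance terms, because the standard deviation is computed as if the null hypothesis were true.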
Confidence Interval
With this same example, how would we compute a confidence interval?
The KEY difference here is that now, instead of assuming a null hypothesis to be true, we are just
taking our estimated variance from what we observe in our sample(s). Therefore, instead of constructing
a pˆ that represents the mean of all observations in our sample, we allow for the means of the two samples
to be different. In fact, the confidence interval is centered on our estimated difference from the sample. We
then use standard errors from these sub-samples to construct the standard error of their difference:
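$\hat{D} = \hat{p}_t - \hat{p}_c$
We can use the formula $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$ to find $\widehat{Var}(\hat{D}) = \widehat{Var}(\hat{p}_t - \hat{p}_c)$:
$\widehat{Var}(\hat{D}) = \widehat{Var}(\hat{p}_t) + \widehat{Var}(\hat{p}_c)$
$\widehat{Var}(\hat{p}_t) = \dfrac{s_t^2}{n_t}$
$\widehat{Var}(\hat{p}_c) = \dfrac{s_c^2}{n_c}$
where, for a binary variable, $s_t^2 = \hat{p}_t(1 - \hat{p}_t)$ and $s_c^2 = \hat{p}_c(1 - \hat{p}_c)$.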
You can then take the square root of this estimated variance to get an estimate for the estimator’s standard
error. Then, we can plug in our values in order to construct the confidence interval.
$CI_W = \left[\bar{x} - c_W \dfrac{s}{\sqrt{n}},\ \bar{x} + c_W \dfrac{s}{\sqrt{n}}\right]$
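Plugging in the refrigerator numbers might look like the sketch below. The 95% confidence level is my choice for illustration (the handout does not fix one), and the critical value comes from the standard normal, which is assumed to be a close approximation to the t distribution with this many observations. Unlike the hypothesis test, each group's variance uses its own sample proportion.

```python
from math import sqrt
from statistics import NormalDist

# Sample sizes and sample proportions from the refrigerator example.
n_t, p_hat_t = 14846, 0.1447  # treatment households
n_c, p_hat_c = 9213, 0.1653   # control households

# The interval is centered on the estimated difference.
d_hat = p_hat_t - p_hat_c

# Each group's variance uses its OWN proportion (no pooling, no null imposed).
var_t = p_hat_t * (1 - p_hat_t) / n_t
var_c = p_hat_c * (1 - p_hat_c) / n_c
se_d = sqrt(var_t + var_c)

# ASSUMPTION: 95% confidence level, normal critical value (about 1.96).
c = NormalDist().inv_cdf(0.975)

ci = (d_hat - c * se_d, d_hat + c * se_d)

print(f"SE = {se_d:.5f}")
print(f"95% CI for p_t - p_c: [{ci[0]:.4f}, {ci[1]:.4f}]")
```

Because the whole interval lies below zero, it tells the same story as the hypothesis test: a difference of zero is not among the plausible values at this confidence level.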