EEP/IAS 118 - Introductory Applied Econometrics, Spring 2015
Sylvan Herskowitz, Section Handout 8

Pop Quiz:

This quiz is worth 85% of your final grade. You have 2 minutes. Go.

Imagine you are estimating a model:

$\text{cdsales} = \beta_0 + \beta_1\,\text{radioplay} + \beta_2\,\text{price} + \beta_3\,\text{genre} + u$

You are interested in knowing how getting lots of radio play impacts CD sales.

1. What omitted variables may be biasing an estimation of this model?
2. What assumption would this lead us to violate?
3. In what direction do you think omitting this variable biases the estimate?

Imagine you want to improve this model and introduce a new variable, quality, which (somehow) measures the objective quality of a given album.

1. What is likely to happen to the SSR of the new model relative to the original?
2. What is likely to happen to the $R^2$ of the new model relative to the original?
3. What TWO effects might this have on the standard error of $\hat{\beta}_1$?
4. What is the equation for $Var(\hat{\beta}_1)$?

Warm-Up: Interpreting $\hat{\beta}$

Write down the two-sentence size interpretation for the underlined $\hat{\beta}$ in each of the following regressions:

$\widehat{\log(\text{wage})} = 1.056 - .254\,\text{female} + .117\,\text{educ}$

$\widehat{\text{error}} = 2.670 + .279\,\text{cm} + .001\,\text{avgcons}$

Quick note on p-values and significance levels: These four concepts (significance level, p-value, t-statistic, and critical value) are easy to mix up and are intimately related. Maybe a picture helps.

Hypothesis Testing and Confidence Interval Review

Just to help keep all of these tests straight:

Test Type: Population mean, e.g. $H_0: \mu = \mu_0$
Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{\sqrt{s^2/n}}$
Distribution: $t \sim t_{n-1}$

Test Type: Difference in population means, e.g. $H_0: \mu_1 - \mu_2 = \mu_0$
Test Statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Distribution: $t \sim t_{n_1+n_2-2}$

Test Type: Population proportion, e.g. $H_0: p = p_0$
Test Statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
Distribution: $z \sim N(0,1)$

Test Type: Difference in population proportions, e.g. $H_0: p_1 - p_2 = p_0$
Test Statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - p_0}{\sqrt{\hat{p}(1-\hat{p})/n_1 + \hat{p}(1-\hat{p})/n_2}}$
Distribution: $z \sim N(0,1)$

Test Type: True regression parameter ($k$ other vars), e.g. $H_0: \beta = \beta_0$
Test Statistic: $t = \dfrac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$
Distribution: $t \sim t_{n-k-1}$

Test Type: Multiple restrictions in regression ($q$ restrictions, $k$ total variables in UR model)
Test Statistic: $F = \dfrac{(R^2_{UR} - R^2_{R})/q}{(1 - R^2_{UR})/(n-k-1)}$
Distribution: $F \sim F_{q,\,n-k-1}$

And one for confidence intervals:

Confidence Interval for: Population mean (non-binary)
Interval: $\left[\bar{x} - c\,SE(\bar{x}),\ \bar{x} + c\,SE(\bar{x})\right]$
Standard Error: $\sqrt{s^2/n}$
Distribution of c: $t_{n-1}$

Confidence Interval for: Difference in population means (non-binary), with $\hat{D} = \bar{x}_1 - \bar{x}_2$
Interval: $\left[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})\right]$
Standard Error: $\sqrt{s_1^2/n_1 + s_2^2/n_2}$
Distribution of c: $t_{n_1+n_2-2}$

Confidence Interval for: Population mean/proportion (binary)
Interval: $\left[\hat{p} - c\,SE(\hat{p}),\ \hat{p} + c\,SE(\hat{p})\right]$
Standard Error: $\sqrt{\hat{p}(1-\hat{p})/n}$
Distribution of c: $t_{n-1}$

Confidence Interval for: Difference in population proportions (binary), with $\hat{D} = \hat{p}_1 - \hat{p}_2$
Interval: $\left[\hat{D} - c\,SE(\hat{D}),\ \hat{D} + c\,SE(\hat{D})\right]$
Standard Error: $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
Distribution of c: $t_{n_1+n_2-2}$

Confidence Interval for: Regression population parameter
Interval: $\left[\hat{\beta} - c\,SE(\hat{\beta}),\ \hat{\beta} + c\,SE(\hat{\beta})\right]$
Standard Error: Stata Output
Distribution of c: $t_{n-k-1}$

CI and HT Practice with Regressions

Consider the equation:

$\text{colGPA} = \beta_0 + \beta_1\,\text{hsGPA} + \beta_2\,\text{ACT} + \beta_3\,\text{skipped} + u$

where colGPA is cumulative college grade point average, hsGPA is high school GPA, and skipped is the average number of lectures skipped per week. What are your expectations for the coefficients in this equation?

1. Estimate the equation and report the results. Assume that n = 141. Test for the hypothesis $\beta_3 = 0$.

$\widehat{\text{colGPA}} = \underset{(0.332)}{1.3896} + \underset{(0.094)}{.4118}\,\text{hsGPA} + \underset{(0.011)}{.0147}\,\text{ACT} - \underset{(0.026)}{.0831}\,\text{skipped}$

• Step 1: State the hypotheses:
  $H_0$:
  $H_1$:
• Step 2: Compute the test statistic: t =
• Step 3: Choose significance level and critical value:
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:

2. Construct a 90% confidence interval for $\beta_3$. Interpret your results.
(a) Confidence level:
(b) $\hat{\beta}_3$ & $SE(\hat{\beta}_3)$:
(c) Find $c_{90}$:
(d) Compute interval:
(e) Interpret:

3. Test for the hypothesis $\beta_1 = .4$ against the two-sided alternative at the 5% significance level.

• Step 1: State the hypotheses:
  $H_0$:
  $H_1$:
• Step 2: Compute the test statistic: t =
• Step 3: Choose significance level and critical value: Using the t-table, for a two-sided test at the 0.05 significance level, with 141 − 3 − 1 = 137 degrees of freedom, c ≈ 1.960.
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:

4. Test for the hypothesis $\beta_1 = 1$ against $\beta_1 < 1$ at the 10% significance level.

• Step 1: State the hypotheses:
  $H_0$:
  $H_1$:
• Step 2: Compute the test statistic: t =
• Step 3: Choose significance level and critical value:
• Step 4: Reject the null hypothesis or fail to reject the null
• Step 5: Interpret:

Hypothesis Testing with Two Proportions Example

Now let’s consider an example from actual data for a poverty alleviation program in Mexico. In 1997, 24,059 households in rural Mexico were randomly allocated between treatment and control groups for a conditional cash transfer program called Oportunidades to keep kids in school.

When analyzing the results of a randomized experiment, the first step is to verify that the control group is, on average, very much like the treatment group in terms of characteristics that we observe and have data for. For example, data was collected on household assets. Your data reveal that 14.47% of the 14,846 treatment households have a refrigerator, while 16.53% of the 9,213 control households have one. To check whether about the same proportion of households in each group have a refrigerator, we need to perform a hypothesis test.
Call the sample proportion of households with a refrigerator in the treatment group $\hat{p}_t$, the true treatment proportion $p_t$, the sample proportion of households with a refrigerator in the control group $\hat{p}_c$, and the true control proportion $p_c$. Also, call the whole-sample proportion of households (in either treatment or control) with a refrigerator $\hat{p}$.

Step 1.
$H_0: p_t - p_c = D = 0$
$H_1: p_t - p_c = D \neq 0$

Step 2. How do we compute this test statistic? We know that the null hypothesis specifies $E[\hat{p}_t - \hat{p}_c] = 0$, so what’s left is the standard deviation. Whenever we’re testing a difference of means from independent samples, remember the formula: $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$. Under the null, both groups share a single true proportion, so we estimate each variance with the pooled $\hat{p}$. Applying the formula, we have that:

$Var(\hat{p}_t - \hat{p}_c) = Var(\hat{p}_t) + Var(\hat{p}_c)$

$Var(\hat{p}_t) = \dfrac{\hat{p}(1-\hat{p})}{n_t}$

$Var(\hat{p}_c) = \dfrac{\hat{p}(1-\hat{p})}{n_c}$

Which means

$SD(\hat{p}_t - \hat{p}_c) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n_t} + \dfrac{\hat{p}(1-\hat{p})}{n_c}}$

The trickiest part here is keeping track of what your null hypothesis is! Now we’re ready to calculate our z-statistic:

$\hat{D} = \hat{p}_t - \hat{p}_c = .1447 - .1653 = -.0206$

$\hat{p} = \dfrac{14846}{24059}(.1447) + \dfrac{9213}{24059}(.1653) = .1526$

$SD(\hat{D}) = \sqrt{\dfrac{.1526(1-.1526)}{14846} + \dfrac{.1526(1-.1526)}{9213}} = .00477$

$\Rightarrow z = \dfrac{-.0206 - 0}{.00477} = -4.32$

Step 3. Given the hypotheses we chose, we’re doing a two-sided test. Let’s choose the 5% significance level, as this is the level economists most commonly use. Check the normal table to find that c = 1.96.

Step 4. Reject / Fail to reject

Step 5. Interpret: At the 5% significance level, there is statistical evidence that the proportion of households with a refrigerator in the control group is not the same as the proportion of households with a refrigerator in the treatment group.

What does this mean for the study? Probably not much. In randomized experiments such as this one, many household characteristics are checked for “balance” across treatment and control.
Statistically, we expect that some of our hypothesis tests will reject the null simply because a 5% significance level means that 5% of the time we will reject the null even though it’s true.

Confidence Interval

With this same example, how would we compute a confidence interval? The KEY difference here is that now, instead of assuming a null hypothesis to be true, we are just taking our estimated variance from what we observe in our sample(s). Therefore, instead of constructing a $\hat{p}$ that represents the mean of all observations in our sample, we allow for the means of the two samples to be different. In fact, the confidence interval is centered on our estimated difference from the sample. We then use standard errors from these sub-samples to construct the standard error of their difference:

$\hat{D} = \hat{p}_t - \hat{p}_c$

We can use the formula $Var(\bar{x} - \bar{y}) = Var(\bar{x}) + Var(\bar{y})$ to find $\widehat{Var}(\hat{D}) = \widehat{Var}(\hat{p}_t - \hat{p}_c)$:

$\widehat{Var}(\hat{D}) = \widehat{Var}(\hat{p}_t) + \widehat{Var}(\hat{p}_c)$

$\widehat{Var}(\hat{p}_t) = \dfrac{s_t^2}{n_t}$

$\widehat{Var}(\hat{p}_c) = \dfrac{s_c^2}{n_c}$

For a binary variable, $s_t^2 = \hat{p}_t(1-\hat{p}_t)$ and $s_c^2 = \hat{p}_c(1-\hat{p}_c)$. You can then take the square root of this estimated variance to get an estimate of the estimator’s standard error. Then, we can plug in our values in order to construct the confidence interval, using the general form

$CI_W = \left[\bar{x} - c_W \dfrac{s}{\sqrt{n}},\ \bar{x} + c_W \dfrac{s}{\sqrt{n}}\right]$

with $\hat{D}$ in place of $\bar{x}$ and $SE(\hat{D})$ in place of $s/\sqrt{n}$.
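To make the contrast between the two standard errors concrete, here is a minimal Python sketch using the refrigerator figures quoted above. The function names `pooled_z` and `unpooled_ci` are illustrative, not part of the handout, and the 95% confidence level is an assumption, since the handout does not fix a level for this interval.

```python
import math

# Figures quoted in the handout's refrigerator example.
n_t, p_t = 14846, 0.1447   # treatment households, share with a refrigerator
n_c, p_c = 9213, 0.1653    # control households, share with a refrigerator

# ASSUMPTION: a 95% interval; 1.96 is the two-sided 5% normal critical value.
c_95 = 1.96


def pooled_z(p1, n1, p2, n2):
    """z-statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def unpooled_ci(p1, n1, p2, n2, c):
    """Confidence interval for p1 - p2, using each group's own variance."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - c * se, d + c * se


z = pooled_z(p_t, n_t, p_c, n_c)
lower, upper = unpooled_ci(p_t, n_t, p_c, n_c, c_95)
print(f"z = {z:.2f}")
print(f"95% CI for p_t - p_c: [{lower:.4f}, {upper:.4f}]")
```

The z-statistic should match the −4.32 computed in Step 2, and the interval comes out at roughly [−.030, −.011]. It excludes zero, which agrees with the conclusion of the two-sided test at the 5% level.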