Final sample exam

Multiple-Choice Questions
Choose the one alternative that best completes the statement or answers the question.
1) A possible solution to errors-in-variables bias is to
A) mitigate the problem through instrumental variables regression.
B) use log-log specifications.
C) use the square root of that variable since the error becomes smaller.
D) choose different functional forms.
2) All of the following equations belong to the class of linear regression models except:
A) Yi = β0 + β1Xi + β2Xi² + ui.
B) ln Yi = β0 + β1X1i + ui.
C) Yi = ln(β0 + β1Xi + ui).
D) Yi = ln(β0 + β1Xi) + ui.
3) The interpretation of the slope coefficient in the model Yi = β 0 + β1 ln ( X i ) + ui is: a
A) 1% change in X is associated with a β1% change in Y.
B) change in X by one unit is associated with a 100 β1% change in Y.
C) 1% change in X is associated with a change in Y of 0.01β1.
D) change in X by one unit is associated with a β1 change in Y.
4) To test whether the population regression function is linear rather than a polynomial of order r,
A) look at the pattern of the coefficients: if they change from positive to negative to positive,
etc., then the polynomial regression should be used.
B) use the test of (r-1) restrictions using the F-statistic.
C) compare the TSS from both regressions.
D) check whether the regression R2 for the polynomial regression is higher than that of the
linear regression.
5) Including an interaction term between two independent variables, X1 and X2, allows for the
following, except that: the interaction term
A) lets the effect on Y of a change in X2 depend on the value of X1.
B) lets the effect on Y of a change in X1 depend on the value of X2.
C) coefficient is the effect of a unit increase in (X1 × X2).
D) coefficient is the effect of a unit increase in X1 and X2 above and beyond the sum of
the individual effects of a unit increase in the two variables alone.
6) The ADL(p, q) model is represented by the following equation
A) Yt = β0 + β1Yt−1 + β2Yt−2 + ⋯ + βpYt−p + δ0 + δ1Xt−1 + ut−q.
B) Yt = β0 + β1Yt−1 + β2Yt−2 + ⋯ + βpYt−p + δ1Xt−1 + δ2Xt−2 + ⋯ + δqXt−q + ut.
C) Yt = β0 + β1Yt−1 + β2Yt−2 + ⋯ + βpYt−p + δq ut−q.
D) Yt = β0 + βpYt−p + δqXt−q + ut.
7) In the log-log model, the slope coefficient indicates
A) the elasticity of Y with respect to X.
B) (ΔY/ΔX) × (Y/X).
C) ΔY/ΔX.
D) the effect that a unit change in X has on Y.
8) Simultaneous causality
A) means that a third variable affects both Y and X.
B) leads to correlation between the regressor and the error term.
C) cannot be established since regression analysis only detects correlation between
variables.
D) means you must run a second regression of X on Y.
9) Sample selection bias
A) results in the OLS estimator being biased, although it is still consistent.
B) is more important for nonlinear least squares estimation than for OLS.
C) is only important for finite sample results.
D) occurs when a selection process influences the availability of data and that process is
related to the dependent variable.
10) Possible solutions to omitted variable bias, when the omitted variable is not observed,
include the following with the exception of
A) use of instrumental variables regressions.
B) panel data estimation.
C) use of randomized controlled experiments.
D) nonlinear least squares estimation.
11) The Granger causality test
A) uses the F-statistic to test the hypothesis that certain regressors have no predictive content
for the dependent variable beyond that contained in the other regressors.
B) is a special case of the augmented Dickey-Fuller test.
C) establishes the direction of causality (as used in common parlance) between X and Y in
addition to correlation.
D) is a rather complicated test for statistical independence.
12) The root mean squared forecast error (RMSFE) is defined as
A) E[Yt − Ŷt|t−1].
B) (Yt − Ŷt|t−1)².
C) √(E[(Yt − Ŷt|t−1)²]).
D) E[(Yt − Ŷt|t−1)²].
13) In order to make reliable forecasts with time series data, all of the following conditions are
needed with the exception of
A) the presence of omitted variable bias.
B) the regression having high explanatory power.
C) coefficients having been estimated precisely.
D) the regression being stable.
14) The first difference of the logarithm of Yt equals
A) the difference between the lead and the lag of Y.
B) the growth rate of Y exactly.
C) approximately the growth rate of Y when the growth rate is small.
D) the first difference of Y.
15) Stationarity means that the
A) error terms are not correlated.
B) forecasts remain within 1.96 standard deviations outside the sample period.
C) time series has a unit root.
D) probability distribution of the time series variable does not change over time.
16) Negative autocorrelation in the change of a variable implies that
A) the data are negatively trended.
B) the variable contains only negative values.
C) the series is not stable.
D) an increase in the variable in one period is, on average, associated with a decrease in
the next.
17) The AR(p) model
A) is defined as Yt = β 0 + β pYt − p + ut .
B) can be written as Yt = β 0 + β1Yt −1 + ut − p .
C) represents Yt as a linear function of p of its lagged values.
D) can be represented as follows: Yt = β 0 + β1 X t + β pYt − p + ut .
18) To choose the number of lags in either an autoregression or a time series regression model
with multiple predictors, you can use any of the following test statistics with the exception of
A) Bayes information criterion.
B) augmented Dickey-Fuller test.
C) Akaike information criterion.
D) F-statistic.
19) A possible solution to errors-in-variables bias is to
A) mitigate the problem through instrumental variables regression.
B) use log-log specifications.
C) use the square root of that variable since the error becomes smaller.
D) choose different functional forms.
20) Pseudo out-of-sample forecasting can be used for the following reasons with the exception of
A) analyzing whether or not a time series contains a unit root.
B) estimating the RMSFE.
C) evaluating the relative forecasting performance of two or more forecasting models.
D) giving the forecaster a sense of how well the model forecasts at the end of the sample.
Essay Questions.
The size of the test is 5% if not specified in the question.
1. Discuss the five threats to the internal validity of regression studies. (20)
Regressing Beef Demand (B) on a constant (C), the price of beef (P), and per capita
disposable income (YD) yields:
Dependent Variable: B
Method: Least Squares
Date: 12/12/07   Time: 15:54
Sample: 1960 1987
Included observations: 28

Variable            Coefficient   Std. Error   t-Statistic   Prob.
C                   37.53605      10.04020     3.738575      0.0010
P                   -0.882623     0.164730     -5.357981     0.0000
YD                  11.89115      1.762162     6.748045      0.0000

R-squared           0.658030      Mean dependent var      106.6500
Adjusted R-squared  0.630672      S.D. dependent var      10.00561
S.E. of regression  6.080646      Akaike info criterion   6.549056
Sum squared resid   924.3564      Schwarz criterion       6.691792
F-statistic         24.05287      Prob(F-statistic)       0.000001
Log likelihood      -88.68678     Durbin-Watson stat      0.292597
a) Omitted Variable Bias
b) Wrong Functional Form
c) Errors-in-Variables Bias
d) Sample Selection Bias
e) Simultaneous Causality Bias
Answers:
a) We know that if the regressor (the price of beef) is correlated with a variable that has been
omitted from the analysis but that determines, in part, the dependent variable (beef demand),
then the OLS estimator suffers from omitted variable bias. Omitted variable bias means that
the first least squares assumption, E(ui | Xi) = 0, fails, and β̂1 is then an inconsistent
estimator of β1.
Pork, mutton, and other meats are substitutes for beef, so changes in their prices influence the
demand for beef. If we exclude the price of pork or mutton from the regressors, the model
suffers from omitted variable bias.
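The mechanics described above can be checked in a minimal simulation (illustrative only, not part of the exam; all numbers and variable names are hypothetical). The true model includes a substitute's price that is correlated with the price of beef; omitting it loads part of its effect onto the included regressor:

```python
import numpy as np

# Hypothetical simulation of omitted-variable bias. True model:
# demand = 50 - 1.0*p_beef + 0.5*p_pork + noise, with the two prices
# positively correlated. Regressing demand on p_beef alone attributes
# part of p_pork's effect to p_beef's coefficient.
rng = np.random.default_rng(0)
n = 100_000
p_beef = rng.normal(10, 2, n)                    # var = 4
p_pork = 0.8 * p_beef + rng.normal(0, 1, n)      # correlated substitute price
demand = 50 - 1.0 * p_beef + 0.5 * p_pork + rng.normal(0, 1, n)

# Short regression (p_pork omitted): slope via the cov/var formula.
b_short = np.cov(p_beef, demand)[0, 1] / np.var(p_beef, ddof=1)

# Omitted-variable bias formula: beta_pork * cov(p_beef, p_pork)/var(p_beef).
bias = 0.5 * np.cov(p_beef, p_pork)[0, 1] / np.var(p_beef, ddof=1)
print(round(b_short, 2))       # ≈ -1.0 + 0.5*0.8 = -0.6
print(round(-1.0 + bias, 2))   # ≈ -0.6
```

The short-regression slope (about −0.6) differs from the true coefficient (−1.0) by exactly the textbook bias term.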
b) The regressors are the price of beef (P) and per capita disposable income (YD). The model
assumes that the relationship between beef demand and disposable income is linear, but in
fact it may be nonlinear: YD² may influence the demand for beef significantly. In that case
the functional form is wrong.
c) Errors-in-variables bias in the OLS estimator arises when an independent variable is measured
imprecisely; β̂1 is then biased toward zero, even in large samples. The price of beef, YD, and
the demand for beef are all dynamic, and the data we obtain may be imprecise, so there can be
measurement error in these variables.
d) Sample selection bias arises when a selection process influences the availability of data and
that process is related to the dependent variable. Sample selection induces correlation between
one or more regressors and the error term, leading to bias and inconsistency of the OLS
estimator.
For the above model, how we choose the sample is very important. For example, if we collect
the data from different areas, and people in one area dislike beef for some reason while people
in another area never eat pork, the coefficients will be very different.
e) Simultaneous causality bias arises in a regression of Y on X when, in addition to the causal
link of interest from X to Y, there is a causal link from Y to X. This reverse causality makes X
correlated with the error term in the population regression of interest.
For this question, the price of beef influences the demand for beef; but if the demand for beef
increases, then by the theory of supply and demand the price of beef increases too. Thus there
is simultaneous causality bias.
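The simultaneity problem can also be sketched numerically (a hypothetical market, not estimates from the beef regression above). When price and quantity are determined jointly, the OLS slope recovers neither the demand curve nor the supply curve:

```python
import numpy as np

# Hypothetical simultaneity simulation. True demand: Q = 100 - 1.0*P + u_d;
# true supply: Q = 1.0*P + u_s. P and Q are set jointly in equilibrium,
# so P is correlated with the demand shock u_d.
rng = np.random.default_rng(7)
n = 200_000
u_d = rng.normal(0, 1, n)     # demand shocks, var = 1
u_s = rng.normal(0, 2, n)     # supply shocks, var = 4

P = (100 + u_d - u_s) / 2     # equilibrium price (demand = supply)
Q = P + u_s                   # equilibrium quantity (on the supply curve)

slope = np.cov(P, Q)[0, 1] / np.var(P, ddof=1)
# plim of the OLS slope: (var_d - var_s)/(var_d + var_s) = (1-4)/5 = -0.6,
# which is neither the demand slope (-1.0) nor the supply slope (+1.0).
print(round(slope, 2))
```

The regression of Q on P converges to a mixture of the two curves, which is exactly the inconsistency that simultaneous causality induces.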
2. Time Series Analysis of US Inflation Rates (20)
Define DINF = INF – INF(-1), which is the first difference of inflation rate.
Before running the autoregressive models, you performed an ADF test on the inflation rate.
ADF Test Statistic   -2.546901     1% Critical Value*    -3.4725
                                   5% Critical Value     -2.8797
                                   10% Critical Value    -2.5763
*MacKinnon critical values for rejection of hypothesis of a unit root.
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(INF)
Method: Least Squares
Date: 12/15/07   Time: 18:50
Sample(adjusted): 1960:2 1999:4
Included observations: 159 after adjusting endpoints

Variable            Coefficient   Std. Error   t-Statistic   Prob.
INF(-1)             -0.105769     0.041528     -2.546901     0.0118
D(INF(-1))          -0.189017     0.082751     -2.284149     0.0237
D(INF(-2))          -0.236062     0.079496     -2.969496     0.0035
D(INF(-3))          0.207785      0.078366     2.651488      0.0089
C                   0.480850      0.217344     2.212391      0.0284

R-squared           0.231259      Mean dependent var      0.017699
Adjusted R-squared  0.211291      S.D. dependent var      1.698389
S.E. of regression  1.508327      Akaike info criterion   3.690821
Sum squared resid   350.3579      Schwarz criterion       3.787327
F-statistic         11.58186      Prob(F-statistic)       0.000000
Log likelihood      -288.4203     Durbin-Watson stat      1.994166
Regressing the inflation rate on its first lag gives the following result:
Dependent Variable: INF
Method: Least Squares
Date: 12/15/07   Time: 18:26
Sample: 1960:1 1999:4
Included observations: 160

Variable            Coefficient   Std. Error   t-Statistic   Prob.
C                   0.660881      0.223931     2.951273      0.0036
INF(-1)             0.849624      0.041839     20.30703      0.0000

R-squared           0.722990      Mean dependent var      4.367900
Adjusted R-squared  0.721236      S.D. dependent var      3.107202
S.E. of regression  1.640543      Akaike info criterion   3.840352
Sum squared resid   425.2381      Schwarz criterion       3.878792
F-statistic         412.3756      Prob(F-statistic)       0.000000
Log likelihood      -305.2282     Durbin-Watson stat      2.296115
Furthermore, you run the AR(3) model and get
Dependent Variable: INF
Method: Least Squares
Date: 12/15/07   Time: 18:30
Sample: 1960:1 1999:4
Included observations: 160

Variable            Coefficient   Std. Error   t-Statistic   Prob.
C                   0.383577      0.218769     1.753345      0.0815
INF(-1)             0.637163      0.075867     8.398401      0.0000
INF(-2)             -0.040623     0.091351     -0.444690     0.6572
INF(-3)             0.317906      0.075544     4.208218      0.0000

R-squared           0.759449      Mean dependent var      4.367900
Adjusted R-squared  0.754823      S.D. dependent var      3.107202
S.E. of regression  1.538542      Akaike info criterion   3.724229
Sum squared resid   369.2693      Schwarz criterion       3.801109
F-statistic         164.1704      Prob(F-statistic)       0.000000
Log likelihood      -293.9383     Durbin-Watson stat      1.856161
Then you run AR(1) and AR(3) models for DINF, respectively.
Dependent Variable: DINF
Method: Least Squares
Date: 12/15/07   Time: 18:35
Sample: 1960:1 1999:4
Included observations: 160

Variable            Coefficient   Std. Error   t-Statistic   Prob.
C                   0.005898      0.130855     0.045070      0.9641
DINF(-1)            -0.242908     0.077174     -3.147511     0.0020

R-squared           0.059002      Mean dependent var      0.004774
Adjusted R-squared  0.053046      S.D. dependent var      1.700916
S.E. of regression  1.655187      Akaike info criterion   3.858127
Sum squared resid   432.8640      Schwarz criterion       3.896566
F-statistic         9.906826      Prob(F-statistic)       0.001969
Log likelihood      -306.6501     Durbin-Watson stat      2.155980
Dependent Variable: DINF
Method: Least Squares
Date: 12/15/07   Time: 18:38
Sample(adjusted): 1960:2 1999:4
Included observations: 159 after adjusting endpoints

Variable            Coefficient   Std. Error   t-Statistic   Prob.
C                   0.018678      0.121719     0.153452      0.8782
DINF(-1)            -0.262771     0.078879     -3.331310     0.0011
DINF(-2)            -0.288200     0.078162     -3.687201     0.0003
DINF(-3)            0.177317      0.078806     2.250044      0.0259

R-squared           0.198878      Mean dependent var      0.017699
Adjusted R-squared  0.183373      S.D. dependent var      1.698389
S.E. of regression  1.534791      Akaike info criterion   3.719501
Sum squared resid   365.1154      Schwarz criterion       3.796706
F-statistic         12.82622      Prob(F-statistic)       0.000000
Log likelihood      -291.7003     Durbin-Watson stat      1.975932
a) Explain the meaning and purpose of the ADF test in time series analysis.
b) Interpret the result of the ADF test. Why is the dependent variable D(INF) in the regression?
c) After you get the above regression results, you decide to use one of the four models
to forecast the next-period inflation rate. Explain your decision.
d) Given the quarterly inflation rates in 1999, what is your forecast for 2000:I?
1999:I    1999:II   1999:III   1999:IV
1.62      2.82      2.80       3.18
Answers:
a) The ADF test for a unit autoregressive root tests the null hypothesis H0: δ = 0
against the one-sided alternative H1: δ < 0 in the regression
ΔYt = β0 + δYt−1 + γ1ΔYt−1 + γ2ΔYt−2 + ⋯ + γpΔYt−p + ut
Under the null hypothesis, Yt has a stochastic trend; under the alternative hypothesis,
Yt is stationary. The ADF statistic is the OLS t-statistic testing δ = 0 in this equation.
b) The estimated ADF regression is
D(INF) = 0.481 − 0.106 INF(−1) − 0.189 D(INF(−1)) − 0.236 D(INF(−2)) + 0.208 D(INF(−3))
The ADF statistic is the t-statistic testing the hypothesis that the coefficient on
INF(−1) is zero; here t = −2.546901. The 5% critical value is −2.8797. Because
the ADF statistic is less negative than −2.8797, we cannot reject the null hypothesis at
the 5% significance level.
So at the 5% significance level we cannot reject the hypothesis that inflation has a unit
autoregressive root, i.e. that inflation contains a stochastic trend, against the alternative that
it is stationary. The dependent variable is D(INF) because the ADF regression is specified in
first differences: under the null of a unit root, D(INF) is stationary, and the test asks whether
the level INF(−1) helps explain the change D(INF).
c) The fourth model (the AR(3) for DINF) is best. According to the ADF test, at the 5%
significance level we cannot reject the null hypothesis that inflation contains a stochastic
trend against the alternative that it is stationary, so we should use lags of D(INF) as
regressors rather than levels.
Comparing the AR(1) and AR(3) models for DINF: the R² values are 0.059002 and 0.198878
respectively, so the AR(3) fits better; the Akaike information criterion is 3.858127 versus
3.719501 (lower is better), favoring the AR(3); and the Schwarz criterion is 3.896566 versus
3.796706, again favoring the AR(3).
By these criteria we should choose the fourth model.
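The lag-length comparison can be reproduced with the textbook-style information criteria, using the SSR and sample sizes reported in the output above. (EViews reports a likelihood-based AIC with a different normalization, so the numbers below differ from the tables, but the model ranking agrees.)

```python
import math

# Textbook information criteria for lag selection:
# AIC(p) = ln(SSR/T) + k*2/T,  BIC(p) = ln(SSR/T) + k*ln(T)/T,
# where k = number of estimated coefficients (constant + lags).
def aic(ssr, T, k):
    return math.log(ssr / T) + k * 2 / T

def bic(ssr, T, k):
    return math.log(ssr / T) + k * math.log(T) / T

# (SSR, T, k) taken from the regression output above.
models = {
    "AR(1) of DINF": (432.8640, 160, 2),
    "AR(3) of DINF": (365.1154, 159, 4),
}
scores = {name: (aic(*m), bic(*m)) for name, m in models.items()}
for name, (a, b) in scores.items():
    print(f"{name}: AIC = {a:.3f}, BIC = {b:.3f}")
# Both criteria are lower for the AR(3), matching the conclusion in c).
```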
d) According to the fourth model, the one-step-ahead forecast is
DINFt|t−1 = 0.02 − 0.26 DINFt−1 − 0.29 DINFt−2 + 0.18 DINFt−3.
From the 1999 data we get
DINF1999:II = 2.82 − 1.62 = 1.20,
DINF1999:III = 2.80 − 2.82 = −0.02,
DINF1999:IV = 3.18 − 2.80 = 0.38.
Thus
DINF2000:I|1999:IV = 0.02 − 0.26 × 0.38 − 0.29 × (−0.02) + 0.18 × 1.20 = 0.143.
Then INF2000:I = INF1999:IV + DINF2000:I = 3.18 + 0.143 ≈ 3.32.
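The forecast arithmetic in part d) can be verified directly, using the rounded AR(3) coefficients from the answer:

```python
# Arithmetic check of the 2000:I forecast, using the rounded coefficients
# (0.02, -0.26, -0.29, 0.18) from the AR(3)-in-differences model above.
inf = {"1999:I": 1.62, "1999:II": 2.82, "1999:III": 2.80, "1999:IV": 3.18}
dinf = {
    "1999:II": inf["1999:II"] - inf["1999:I"],     # 1.20
    "1999:III": inf["1999:III"] - inf["1999:II"],  # -0.02
    "1999:IV": inf["1999:IV"] - inf["1999:III"],   # 0.38
}
dinf_hat = (0.02
            - 0.26 * dinf["1999:IV"]
            - 0.29 * dinf["1999:III"]
            + 0.18 * dinf["1999:II"])
inf_hat = inf["1999:IV"] + dinf_hat    # forecast of the inflation level
print(round(dinf_hat, 3), round(inf_hat, 2))   # 0.143 3.32
```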
3. Measurement Errors in Variables (20)
Assume there exists an exact linear relationship between true weights and true heights:
Wi = β0 + β1Hi. However, weights and heights are measured with errors as follows:
Yi = Wi + wi and Xi = Hi + vi,
where wi and vi are uncorrelated with Wi and Hi respectively. To figure out the relationship
between weights and heights, suppose you run the following regression:
Yi = β0 + β1Xi + ui for i = 1, 2, …, n.
a) Show that the OLS estimator of β1 is biased toward zero.
b) Under which conditions is the OLS estimator of β1 unbiased?
Answers:
a) Substituting Wi = Yi − wi and Hi = Xi − vi into Wi = β0 + β1Hi gives
Yi = β0 + β1Xi + (wi − β1vi),
so the regression error ui = wi − β1vi is correlated with the regressor Xi through the
measurement error vi. With wi and vi uncorrelated with Hi (and with each other), the OLS
estimator converges in probability to
β1 · σH² / (σH² + σv²),
and since σH² / (σH² + σv²) < 1 whenever σv² > 0, the OLS estimator is biased toward zero.
b) Since
β̂1 converges in probability to β1 · σH² / (σH² + σv²),
the OLS estimator of β1 is unbiased when there is no measurement error in the regressor:
if σv² = 0, then β̂1 converges in probability to β1.
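The attenuation result can be checked with a quick simulation (illustrative; the true parameters and noise levels are hypothetical, chosen only to make the bias visible):

```python
import numpy as np

# Hypothetical simulation of attenuation bias. True relation W = b0 + b1*H;
# we observe Y = W + w and X = H + v, with w, v independent of H.
rng = np.random.default_rng(42)
n = 200_000
beta0, beta1 = 5.0, 0.9
H = rng.normal(170, 10, n)      # true heights, var_H = 100
W = beta0 + beta1 * H           # exact linear relation, as in the question
Y = W + rng.normal(0, 3, n)     # measured weight
X = H + rng.normal(0, 5, n)     # measured height, var_v = 25

b1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
attenuation = 100 / (100 + 25)  # sigma_H^2 / (sigma_H^2 + sigma_v^2) = 0.8
print(round(b1_hat, 2), round(beta1 * attenuation, 2))   # both ≈ 0.72
```

The estimated slope lands near β1 · σH²/(σH² + σv²) = 0.9 × 0.8 = 0.72, below the true β1 = 0.9: biased toward zero, exactly as derived in a). Setting the measurement-error variance to zero recovers β1, as stated in b).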
Multiple-choice answer key:
1) A   2) D   3) C   4) B   5) C
6) B   7) A   8) B   9) D   10) D
11) A  12) C  13) A  14) C  15) D
16) D  17) C  18) B  19) A  20) A