Final sample exam

Multiple-Choice Questions. Choose the one alternative that best completes the statement or answers the question.

1) A possible solution to errors-in-variables bias is to
A) mitigate the problem through instrumental variables regression.
B) use log-log specifications.
C) use the square root of that variable since the error becomes smaller.
D) choose different functional forms.

2) The following equations belong to the class of linear regression models except:
A) Yi = β0 + β1Xi + β2Xi^2 + ui.
B) ln(Yi) = β0 + β1X1i + ui.
C) Yi = ln(β0 + β1Xi + ui).
D) Yi = ln(β0 + β1Xi) + ui.

3) The interpretation of the slope coefficient in the model Yi = β0 + β1 ln(Xi) + ui is that a:
A) 1% change in X is associated with a β1% change in Y.
B) change in X by one unit is associated with a 100β1% change in Y.
C) 1% change in X is associated with a change in Y of 0.01β1.
D) change in X by one unit is associated with a β1 change in Y.

4) To test whether the population regression function is linear rather than a polynomial of order r,
A) look at the pattern of the coefficients: if they change from positive to negative to positive, etc., then the polynomial regression should be used.
B) use the test of (r-1) restrictions using the F-statistic.
C) compare the TSS from both regressions.
D) check whether the regression R^2 for the polynomial regression is higher than that of the linear regression.

5) Including an interaction term between two independent variables, X1 and X2, allows for the following, except that the interaction term
A) lets the effect on Y of a change in X2 depend on the value of X1.
B) lets the effect on Y of a change in X1 depend on the value of X2.
C) coefficient is the effect of a unit increase in (X1 × X2).
D) coefficient is the effect of a unit increase in X1 and X2 above and beyond the sum of the individual effects of a unit increase in the two variables alone.
6) The ADL(p, q) model is represented by the following equation:
A) Yt = β0 + β1Yt-1 + β2Yt-2 + … + βpYt-p + δ0 + δ1Xt-1 + ut-q.
B) Yt = β0 + β1Yt-1 + β2Yt-2 + … + βpYt-p + δ1Xt-1 + δ2Xt-2 + … + δqXt-q + ut.
C) Yt = β0 + β1Yt-1 + β2Yt-2 + … + βpYt-p + δq ut-q.
D) Yt = β0 + βpYt-p + δqXt-q + ut.

7) In the log-log model, the slope coefficient indicates
A) the elasticity of Y with respect to X.
B) (ΔY/ΔX) × (Y/X).
C) ΔY/ΔX.
D) the effect that a unit change in X has on Y.

8) Simultaneous causality
A) means that a third variable affects both Y and X.
B) leads to correlation between the regressor and the error term.
C) cannot be established, since regression analysis only detects correlation between variables.
D) means you must run a second regression of X on Y.

9) Sample selection bias
A) results in the OLS estimator being biased, although it is still consistent.
B) is more important for nonlinear least squares estimation than for OLS.
C) is only important for finite sample results.
D) occurs when a selection process influences the availability of data and that process is related to the dependent variable.

10) Possible solutions to omitted variable bias, when the omitted variable is not observed, include the following with the exception of
A) use of instrumental variables regression.
B) panel data estimation.
C) use of randomized controlled experiments.
D) nonlinear least squares estimation.

11) The Granger causality test
A) uses the F-statistic to test the hypothesis that certain regressors have no predictive content for the dependent variable beyond that contained in the other regressors.
B) is a special case of the augmented Dickey-Fuller test.
C) establishes the direction of causality (as used in common parlance) between X and Y in addition to correlation.
D) is a rather complicated test for statistical independence.

12) The root mean squared forecast error (RMSFE) is defined as
A) E[Yt − Ŷt|t−1].
B) (Yt − Ŷt|t−1)^2.
C) √(E[(Yt − Ŷt|t−1)^2]).
D) √(E[Yt − Ŷt|t−1]).

13) In order to make reliable forecasts with time series data, all of the following conditions are needed with the exception of
A) the presence of omitted variable bias.
B) the regression having high explanatory power.
C) coefficients having been estimated precisely.
D) the regression being stable.

14) The first difference of the logarithm of Yt equals
A) the difference between the lead and the lag of Y.
B) the growth rate of Y exactly.
C) approximately the growth rate of Y when the growth rate is small.
D) the first difference of Y.

15) Stationarity means that the
A) error terms are not correlated.
B) forecasts remain within 1.96 standard deviations outside the sample period.
C) time series has a unit root.
D) probability distribution of the time series variable does not change over time.

16) Negative autocorrelation in the change of a variable implies that
A) the data are negatively trended.
B) the variable contains only negative values.
C) the series is not stable.
D) an increase in the variable in one period is, on average, associated with a decrease in the next.

17) The AR(p) model
A) is defined as Yt = β0 + βpYt-p + ut.
B) can be written as Yt = β0 + β1Yt-1 + ut-p.
C) represents Yt as a linear function of p of its lagged values.
D) can be represented as follows: Yt = β0 + β1Xt + βpYt-p + ut.

18) To choose the number of lags in either an autoregression or a time series regression model with multiple predictors, you can use any of the following test statistics with the exception of the
A) Bayes information criterion.
B) augmented Dickey-Fuller test.
C) Akaike information criterion.
D) F-statistic.

19) A possible solution to errors-in-variables bias is to
A) mitigate the problem through instrumental variables regression.
B) use log-log specifications.
C) use the square root of that variable since the error becomes smaller.
D) choose different functional forms.

20) Pseudo out-of-sample forecasting can be used for the following reasons with the exception of
A) analyzing whether or not a time series contains a unit root.
B) estimating the RMSFE.
C) evaluating the relative forecasting performance of two or more forecasting models.
D) giving the forecaster a sense of how well the model forecasts at the end of the sample.

Essay Questions. Unless specified in the question, tests use a 5% significance level.

1. Discuss the five threats to the internal validity of regression studies. (20)
Regressing Beef Demand (B) on a constant (C), the price of beef (P), and per capita disposable income (YD) yields:

Dependent Variable: B
Method: Least Squares
Date: 12/12/07  Time: 15:54
Sample: 1960 1987
Included observations: 28

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         37.53605     10.04020    3.738575     0.0010
P         -0.882623    0.164730    -5.357981    0.0000
YD        11.89115     1.762162    6.748045     0.0000

R-squared           0.658030   Mean dependent var     106.6500
Adjusted R-squared  0.630672   S.D. dependent var     10.00561
S.E. of regression  6.080646   Akaike info criterion  6.549056
Sum squared resid   924.3564   Schwarz criterion      6.691792
F-statistic         24.05287   Prob(F-statistic)      0.000001
Log likelihood      -88.68678  Durbin-Watson stat     0.292597

a) Omitted Variable Bias
b) Wrong Functional Form
c) Errors-in-Variables Bias
d) Sample Selection Bias
e) Simultaneous Causality Bias

Answers:
a) If a regressor (here, the price of beef) is correlated with a variable that has been omitted from the analysis but that determines, in part, the dependent variable (beef demand), then the OLS estimator suffers from omitted variable bias. Omitted variable bias means that the first least squares assumption, E(ui | Xi) = 0, fails, so β̂1 is an inconsistent estimator of β1. Pork, mutton, and other meats are substitutes for beef, so changes in their prices influence the demand for beef.
If we exclude the price of pork or mutton as a regressor, the model will therefore suffer from omitted variable bias.

b) The regressors are the price of beef (P) and per capita disposable income (YD). The specification assumes that the relation between beef demand and per capita disposable income is linear, but in fact it may not be: perhaps YD^2 influences the demand for beef significantly. In that case the functional form is wrong.

c) Errors-in-variables bias in the OLS estimator arises when an independent variable is measured imprecisely; β̂1 is then biased toward zero, even in large samples. The price of beef, YD, and the demand for beef all change over time, so the data we collect may be imprecise, and there may be measurement error in these variables.

d) Sample selection bias arises when a selection process influences the availability of data and that process is related to the dependent variable. Sample selection induces correlation between one or more regressors and the error term, leading to bias and inconsistency of the OLS estimator. For the above model, how the sample is chosen is therefore very important. For example, if we collect data from different areas, and people in one area dislike beef for some reason while people in another area never eat pork, the estimated coefficients will differ greatly.

e) Simultaneous causality bias arises in a regression of Y on X when, in addition to the causal link of interest from X to Y, there is a causal link from Y to X. This reverse causality makes X correlated with the error term in the population regression of interest. Here, the price of beef influences the demand for beef; but if the demand for beef increases, then by the theory of supply and demand the price of beef increases too. Thus there will be simultaneous causality bias.

2. Time Series Analysis of US Inflation Rates (20)
Define DINF = INF − INF(−1), the first difference of the inflation rate.
Before running the autoregressive models, you perform an ADF test on the inflation rate.

ADF Test Statistic  -2.546901    1% Critical Value*   -3.4725
                                 5% Critical Value    -2.8797
                                 10% Critical Value   -2.5763
*MacKinnon critical values for rejection of the hypothesis of a unit root.

Augmented Dickey-Fuller Test Equation
Dependent Variable: D(INF)
Method: Least Squares
Date: 12/15/07  Time: 18:50
Sample(adjusted): 1960:2 1999:4
Included observations: 159 after adjusting endpoints

Variable    Coefficient  Std. Error  t-Statistic  Prob.
INF(-1)     -0.105769    0.041528    -2.546901    0.0118
D(INF(-1))  -0.189017    0.082751    -2.284149    0.0237
D(INF(-2))  -0.236062    0.079496    -2.969496    0.0035
D(INF(-3))  0.207785     0.078366    2.651488     0.0089
C           0.480850     0.217344    2.212391     0.0284

R-squared           0.231259   Mean dependent var     0.017699
Adjusted R-squared  0.211291   S.D. dependent var     1.698389
S.E. of regression  1.508327   Akaike info criterion  3.690821
Sum squared resid   350.3579   Schwarz criterion      3.787327
F-statistic         11.58186   Prob(F-statistic)      0.000000
Log likelihood      -288.4203  Durbin-Watson stat     1.994166

Regressing the inflation rate on its first lag gives:

Dependent Variable: INF
Method: Least Squares
Date: 12/15/07  Time: 18:26
Sample: 1960:1 1999:4
Included observations: 160

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         0.660881     0.223931    2.951273     0.0036
INF(-1)   0.849624     0.041839    20.30703     0.0000

R-squared           0.722990   Mean dependent var     4.367900
Adjusted R-squared  0.721236   S.D. dependent var     3.107202
S.E. of regression  1.640543   Akaike info criterion  3.840352
Sum squared resid   425.2381   Schwarz criterion      3.878792
F-statistic         412.3756   Prob(F-statistic)      0.000000
Log likelihood      -305.2282  Durbin-Watson stat     2.296115

Furthermore, you run the AR(3) model and get:

Dependent Variable: INF
Method: Least Squares
Date: 12/15/07  Time: 18:30
Sample: 1960:1 1999:4
Included observations: 160

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         0.383577     0.218769    1.753345     0.0815
INF(-1)   0.637163     0.075867    8.398401     0.0000
INF(-2)   -0.040623    0.091351    -0.444690    0.6572
INF(-3)   0.317906     0.075544    4.208218     0.0000

R-squared           0.759449   Mean dependent var     4.367900
Adjusted R-squared  0.754823   S.D. dependent var     3.107202
S.E. of regression  1.538542   Akaike info criterion  3.724229
Sum squared resid   369.2693   Schwarz criterion      3.801109
F-statistic         164.1704   Prob(F-statistic)      0.000000
Log likelihood      -293.9383  Durbin-Watson stat     1.856161

Then you run AR(1) and AR(3) models of DINF, respectively.

Dependent Variable: DINF
Method: Least Squares
Date: 12/15/07  Time: 18:35
Sample: 1960:1 1999:4
Included observations: 160

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         0.005898     0.130855    0.045070     0.9641
DINF(-1)  -0.242908    0.077174    -3.147511    0.0020

R-squared           0.059002   Mean dependent var     0.004774
Adjusted R-squared  0.053046   S.D. dependent var     1.700916
S.E. of regression  1.655187   Akaike info criterion  3.858127
Sum squared resid   432.8640   Schwarz criterion      3.896566
F-statistic         9.906826   Prob(F-statistic)      0.001969
Log likelihood      -306.6501  Durbin-Watson stat     2.155980

Dependent Variable: DINF
Method: Least Squares
Date: 12/15/07  Time: 18:38
Sample(adjusted): 1960:2 1999:4
Included observations: 159 after adjusting endpoints

Variable  Coefficient  Std. Error  t-Statistic  Prob.
C         0.018678     0.121719    0.153452     0.8782
DINF(-1)  -0.262771    0.078879    -3.331310    0.0011
DINF(-2)  -0.288200    0.078162    -3.687201    0.0003
DINF(-3)  0.177317     0.078806    2.250044     0.0259

R-squared           0.198878   Mean dependent var     0.017699
Adjusted R-squared  0.183373   S.D. dependent var     1.698389
S.E. of regression  1.534791   Akaike info criterion  3.719501
Sum squared resid   365.1154   Schwarz criterion      3.796706
F-statistic         12.82622   Prob(F-statistic)      0.000000
Log likelihood      -291.7003  Durbin-Watson stat     1.975932

a) Explain the meaning and purpose of the ADF test in time series analysis.
b) Interpret the result of the ADF test. Why is the dependent variable DINF in the regression?
c) Given the above regression results, you decide to use one of the four models to forecast the next-period inflation rate. Explain your decision.
d) Given the quarterly inflation rates in 1999 below, what is your forecast for 2000:I?

1999:I  1999:II  1999:III  1999:IV
1.62    2.82     2.80      3.18

Answers:
a) The ADF test for a unit autoregressive root tests the null hypothesis H0: δ = 0 against the one-sided alternative H1: δ < 0 in the regression
ΔYt = β0 + δYt-1 + γ1ΔYt-1 + γ2ΔYt-2 + … + γpΔYt-p + ut.
Under the null hypothesis, Yt has a stochastic trend; under the alternative hypothesis, Yt is stationary. The ADF statistic is the OLS t-statistic testing δ = 0 in this equation.

b) The estimated test equation is
D(INF) = 0.481 − 0.106 INF(−1) − 0.189 D(INF(−1)) − 0.236 D(INF(−2)) + 0.208 D(INF(−3)).
The ADF statistic is the t-statistic testing the hypothesis that the coefficient on INF(−1) is zero; that is, t = −2.546901. The 5% critical value is −2.8797. Because the ADF statistic is less negative than −2.8797, we cannot reject the null hypothesis at the 5% significance level. That is, at the 5% level we cannot reject the hypothesis that inflation has a unit autoregressive root (a stochastic trend) against the alternative that it is stationary. The dependent variable is DINF because the ADF regression explains the first difference of the series with its lagged level and lagged differences.

c) The fourth model is the best. Because the ADF test cannot reject, at the 5% significance level, the null hypothesis that inflation contains a stochastic trend against the alternative that it is stationary, we should model the differenced series and use lags of D(INF) as regressors. Comparing the AR(1) and AR(3) models of DINF: the R-squared values are 0.059002 and 0.198878 respectively, so AR(3) fits better; the Akaike information criterion values are 3.858127 and 3.719501, favoring AR(3); and the Schwarz criterion values are 3.896566 and 3.796706, again favoring AR(3). Accordingly, we should choose the fourth model.
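The mechanics of the ADF regression in answers a) and b) can be illustrated with a small simulation. This is a minimal pure-Python sketch with invented series (not the exam data): regressing ΔYt on Yt−1 yields a clearly negative estimate of δ for a stationary AR(1) process, but an estimate near zero for a random walk with a unit root.

```python
import random

def df_delta(y):
    """OLS slope delta in the Dickey-Fuller regression dy_t = b0 + delta * y_{t-1} + error."""
    x = y[:-1]                                        # y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # first differences
    mx = sum(x) / len(x)
    md = sum(dy) / len(dy)
    num = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

random.seed(1)
n = 2000

# Stationary AR(1): y_t = 0.5 * y_{t-1} + e_t, so delta = 0.5 - 1 = -0.5.
y_stat = [0.0]
for _ in range(n - 1):
    y_stat.append(0.5 * y_stat[-1] + random.gauss(0, 1))

# Random walk (unit root): y_t = y_{t-1} + e_t, so delta = 0.
y_rw = [0.0]
for _ in range(n - 1):
    y_rw.append(y_rw[-1] + random.gauss(0, 1))

print(df_delta(y_stat))  # clearly negative, near -0.5
print(df_delta(y_rw))    # close to zero
```

In practice the t-statistic on δ would be compared with MacKinnon critical values, as in the EViews output above; this sketch only shows the sign and magnitude of δ itself.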
d) According to the fourth model,
D̂INFt = 0.02 − 0.26 DINFt-1 − 0.29 DINFt-2 + 0.18 DINFt-3.
From the 1999 data:
DINF1999:II = 2.82 − 1.62 = 1.20,
DINF1999:III = 2.80 − 2.82 = −0.02,
DINF1999:IV = 3.18 − 2.80 = 0.38.
Thus
D̂INF2000:I = 0.02 − 0.26 DINF1999:IV − 0.29 DINF1999:III + 0.18 DINF1999:II = 0.02 − 0.26 × 0.38 + 0.29 × 0.02 + 0.18 × 1.20 = 0.143.
Then INF2000:I = INF1999:IV + D̂INF2000:I = 3.18 + 0.143 ≈ 3.32.

3. Measurement Errors in Variables (20)
Assume there exists an exact linear relationship between true weights and true heights: Wi = β0 + β1Hi. However, weights and heights are measured with errors, as follows: Yi = Wi + wi and Xi = Hi + vi, where wi and vi are uncorrelated with Wi and Hi, respectively. To figure out the relationship between weights and heights, suppose you run the regression
Yi = β0 + β1Xi + ui, for i = 1, 2, …, n.
a) Show that the OLS estimator of β1 is biased toward zero.
b) Under which conditions is the OLS estimator of β1 unbiased?

Answers:
a) The OLS slope estimator converges in probability to
β̂1 →p cov(Xi, Yi) / var(Xi) = cov(Hi + vi, β0 + β1Hi + wi) / var(Hi + vi) = β1 σH² / (σH² + σv²),
using the fact that vi and wi are uncorrelated with Hi and Wi. Since σH² / (σH² + σv²) < 1 whenever σv² > 0, the OLS estimator is biased toward zero.

b) Since β̂1 →p β1 σH² / (σH² + σv²), if there is no measurement error in X, then σv² = 0 and β̂1 →p β1.

Multiple-choice answer key, questions 1-20: A D C B C B A B D D A C A C D D C B A A
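The forecast arithmetic in answer 2 d) can be checked with a few lines of code, using the rounded coefficients quoted in the answer:

```python
# Rounded coefficients from the DINF AR(3) model, as used in answer 2 d).
b0, b1, b2, b3 = 0.02, -0.26, -0.29, 0.18

# Quarterly inflation rates in 1999 from the question.
inf_1999 = {"I": 1.62, "II": 2.82, "III": 2.80, "IV": 3.18}

dinf_ii = inf_1999["II"] - inf_1999["I"]     # 1.20
dinf_iii = inf_1999["III"] - inf_1999["II"]  # -0.02
dinf_iv = inf_1999["IV"] - inf_1999["III"]   # 0.38

# One-step-ahead forecast of DINF for 2000:I, then add back the 1999:IV level.
dinf_2000_i = b0 + b1 * dinf_iv + b2 * dinf_iii + b3 * dinf_ii
inf_2000_i = inf_1999["IV"] + dinf_2000_i

print(round(dinf_2000_i, 3))  # 0.143
print(round(inf_2000_i, 2))   # 3.32
```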
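The attenuation result in question 3 can also be seen in a short simulation. This is a pure-Python sketch with invented height/weight parameters: with var(H) = var(v), the attenuation factor σH²/(σH² + σv²) is 0.5, so the OLS slope lands near half the true β1.

```python
import random

random.seed(0)
beta0, beta1 = 2.0, 5.0
n = 10_000

H = [random.gauss(170, 10) for _ in range(n)]  # true heights, var(H) = 100
W = [beta0 + beta1 * h for h in H]             # exact relation W = beta0 + beta1 * H
X = [h + random.gauss(0, 10) for h in H]       # heights measured with error, var(v) = 100
Y = W                                          # weights taken as measured exactly here

def ols_slope(x, y):
    """OLS slope from a regression of y on x with a constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

slope = ols_slope(X, Y)
# Attenuation factor var(H) / (var(H) + var(v)) = 100 / 200 = 0.5,
# so the estimated slope should be near 0.5 * beta1 = 2.5, well below 5.
print(slope)
```

Note that the measurement error in Y (wi) is set to zero here for clarity; consistent with the answer above, error in the dependent variable alone would not bias the slope.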
© Copyright 2024