Sample Exams Utrecht University School of Economics (USE)

On this page you will find two sample exams to test your required knowledge of quantitative methods:
- Econometrics, general exam
- Econometrics, Stata
These sample exams give you an indication of the level you are expected to have in Econometrics
before you start one of the Master’s programmes offered by Utrecht University School of
Economics:
- International Economics and Business
- Economics of Public Policy and Management
- Economics and Law
- Multidisciplinary Economics (Research Programme)
Please bear in mind that these tests are meant for self-assessment. Please do not send them in; we will
not assess your test.
If, after completing the exams, you feel that you need to brush up your knowledge of Econometrics
before the start of your programme, we strongly recommend that you take the Utrecht University
Summer School course Introductory Econometrics.
Sample exam 1: Econometrics, general exam
QUESTION 1
a. Consider the multiple regression model with two independent variables. We have a random
sample of N observations. The regression equation is
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
for which we assume that the error term $\varepsilon$ is independent of the explanatory variables $X_1$ and $X_2$.
Under some circumstances, the researcher is not able to calculate the Ordinary Least Squares
(OLS) estimator of the regression parameters. Could you please explain when the OLS-estimator
cannot be calculated?
b. After applying OLS, the estimated regression equation may be rewritten as
$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + e$
where $e$ is the residual. The researcher claims that the residual $e$ may be correlated with the
explanatory variables $X_1$ and $X_2$ if the regression equation excludes relevant explanatory
variables (that is, if there are omitted variables). Is the researcher right? Please explain your answer.
c. Which of the following outcome(s) can cause the t-value of the estimated parameters not to be
t-distributed? Do they lead to unbiased estimates of the regression parameters? Please motivate your
answer briefly.
- Heteroskedasticity of the error term.
- A correlation coefficient of 0.80 between two explanatory variables of the regression equation.
- One of the explanatory variables is correlated with the error term of the regression equation.
d. Next, we assume that all of the classical assumptions of the regression model are valid. The
regression equation becomes:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 X_2 + \beta_3 X_2 + \varepsilon$
Please determine the effect of $X_1$ on $Y$.
e. We assume that the random variables $X_1$ and $X_2$ are independent. Please rewrite:
- $E(X_1 + 5X_2 \mid X_1 = x_1)$
- $\mathrm{Cov}(5X_1, X_1 + X_2)$
Next, we assume that the random variables $X_1$ and $X_2$ are not independent. Please rewrite:
- $\mathrm{Var}(2X_1 + 3X_2)$
QUESTION 2
The following information is available:
tothrs   : total hours of training
avgsal   : annual salary (in $)
lavgsal  : logarithm of avgsal
sales    : annual sales (in $)
lsales   : logarithm of sales
dy1      : dummy variable; 1 for 1987
dy2      : dummy variable; 1 for 1988
dy3      : dummy variable; 1 for 1989
large    : dummy variable; 1 if large firm
sum tothrs avgsal lavgsal sales lsales dy1 dy2 dy3 large

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+---------------------------------------------------------
      tothrs |       304    33.39474    52.42831           0        320
      avgsal |       304    18723.52    6967.911        4237      42583
     lavgsal |       304    9.773076    .3590829    8.351611   10.65921
       sales |       304     6413918     7899873      110000   4.90e+07
      lsales |       304    15.07133    1.126338    11.60824   17.70733
-------------+---------------------------------------------------------
         dy1 |       304    .3190789    .4668882           0          1
         dy2 |       304    .3322368     .471792           0          1
         dy3 |       304    .3486842    .4773396           0          1
       large |       304    .2138158    .4106743           0          1

A researcher regresses the variable tothrs (total hours of training) on the logarithm of avgsal and
the logarithm of sales, using a random sample of 304 observations, which were sampled in 1987, 1988,
and 1989. See the regression output below.
. reg tothrs lavgsal lsales

      Source |       SS       df       MS              Number of obs =     304
-------------+------------------------------           F(  2,   301) =   10.14
       Model |  52584.9232     2  26292.4616           Prob > F      =  0.0001
    Residual |  780279.708   301  2592.29139           R-squared     =  0.0631
-------------+------------------------------           Adj R-squared =  0.0569
       Total |  832864.632   303  2748.72816           Root MSE      =  50.915

------------------------------------------------------------------------------
      tothrs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lavgsal |   17.08952   8.314689     2.06   0.041     .7272393     33.4518
      lsales |   -11.5002    2.65077    -4.34   0.000    -16.71659   -6.283811
       _cons |   39.70086   83.09774     0.48   0.633    -123.8252     203.227
------------------------------------------------------------------------------
a) Please give a precise economic interpretation of the estimated parameter on lavgsal.
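As a hedged illustration of how a coefficient in a level-log specification is usually read, the sketch below converts the printed coefficient on lavgsal into the change in training hours associated with a 1% higher salary. The variable names are chosen for this example only, and the "divide by 100" rule is the standard approximation rather than something stated in the exam.

```python
import math

# Level-log model: tothrs = b0 + b1 * log(avgsal) + ...
# A 1% rise in avgsal changes tothrs by roughly b1 / 100 hours,
# holding lsales constant; the exact change is b1 * log(1.01).
b_lavgsal = 17.08952  # coefficient on lavgsal, taken from the output above

approx_effect = b_lavgsal / 100
exact_effect = b_lavgsal * math.log(1.01)

print(f"approximate effect of a 1% higher salary: {approx_effect:.4f} hours")
print(f"exact effect of a 1% higher salary:       {exact_effect:.4f} hours")
```

The two numbers differ only in the fourth decimal, which is why the approximation is normally used in a written interpretation.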
Next, we add two dummy variables dy2 and dy3 to the equation. See the Stata output below.
. reg tothrs lavgsal lsales dy2 dy3

      Source |       SS       df       MS              Number of obs =     304
-------------+------------------------------           F(  4,   299) =    5.46
       Model |  56680.5896     4  14170.1474           Prob > F      =  0.0003
    Residual |  776184.042   299  2595.93325           R-squared     =  0.0681
-------------+------------------------------           Adj R-squared =  0.0556
       Total |  832864.632   303  2748.72816           Root MSE      =   50.95

------------------------------------------------------------------------------
      tothrs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lavgsal |   15.59959   8.405549     1.86   0.064    -.9419353    32.14112
      lsales |  -11.66302   2.655821    -4.39   0.000    -16.88949   -6.436551
         dy2 |   4.221359   7.271943     0.58   0.562    -10.08931    18.53203
         dy3 |   9.090815   7.254422     1.25   0.211    -5.185378    23.36701
       _cons |   52.14363   83.74939     0.62   0.534    -112.6693    216.9565
------------------------------------------------------------------------------
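The two regressions above differ only in the year dummies dy2 and dy3, so their residual sums of squares are enough to compute the F-statistic for multiple linear restrictions given on the formula card. The following is a minimal sketch, not a prescribed solution: the function name is made up for this example, and the 5% critical value is an approximate table value entered by hand.

```python
def f_statistic(rss_restricted, rss_unrestricted, num_restrictions, df_resid):
    """F = ((RSS_R - RSS) / M) / (RSS / (N - (K + 1)))."""
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / df_resid
    return numerator / denominator

rss_restricted = 780279.708    # Residual SS of the regression without dy2, dy3
rss_unrestricted = 776184.042  # Residual SS of the regression with dy2, dy3
num_restrictions = 2           # H0: coefficients on dy2 and dy3 are both zero
df_resid = 299                 # N - (K + 1) = 304 - 5

f_year = f_statistic(rss_restricted, rss_unrestricted, num_restrictions, df_resid)

# Assumption: approximate tabulated 5% critical value of F(2, 299).
F_CRIT_5PCT = 3.03

print(f"F = {f_year:.3f}, reject H0 at 5%: {f_year > F_CRIT_5PCT}")
```

With these inputs the statistic is roughly 0.79, well below the critical value, so the sketch does not reject the hypothesis that the year dummies are jointly zero.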
b) Please apply a statistical testing procedure to test whether year has a statistically significant
effect on the total hours of training. Use a significance level $\alpha = 0.05$.
c) Please provide an economic interpretation of the parameter estimate on dy3 (for which you may
ignore that the estimated parameter is statistically insignificant).
d) Is there any indication of heteroskedasticity? See the regression output below. Please apply a
statistical testing procedure, using a significance level $\alpha = 0.05$.
predict uhat, resid
gen uhat2 = uhat^2
reg uhat lavgsal lsales dy2 dy3

      Source |       SS       df       MS              Number of obs =     304
-------------+------------------------------           F(  4,   299) =    0.00
       Model |  1.1642e-10     4  2.9104e-11           Prob > F      =  1.0000
    Residual |  776184.046   299  2595.93326           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0134
       Total |  776184.046   303  2561.66352           Root MSE      =   50.95

------------------------------------------------------------------------------
        uhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lavgsal |   1.92e-07   8.405549     0.00   1.000    -16.54153    16.54153
      lsales |  -6.72e-08   2.655821    -0.00   1.000    -5.226469    5.226469
         dy2 |  -2.04e-07   7.271943    -0.00   1.000    -14.31067    14.31067
         dy3 |  -1.75e-07   7.254422    -0.00   1.000    -14.27619    14.27619
       _cons |  -7.52e-07   83.74939    -0.00   1.000    -164.8129    164.8129
------------------------------------------------------------------------------

. reg uhat2 lavgsal lsales dy2 dy3

      Source |       SS       df       MS              Number of obs =     304
-------------+------------------------------           F(  4,   299) =    3.82
       Model |  1.0166e+09     4   254151538           R-squared     =  0.0486
    Residual |  1.9889e+10   299  66516778.3           Adj R-squared =  0.0359
-------------+------------------------------           Root MSE      =  8155.8
       Total |  2.0905e+10   303  68993804.9

------------------------------------------------------------------------------
       uhat2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lavgsal |   232.6349   1345.504     0.17   0.863    -2415.222    2880.492
      lsales |  -1631.846    425.126    -3.84   0.000    -2468.464   -795.2283
         dy2 |  -143.6411   1164.044    -0.12   0.902    -2434.397    2147.115
         dy3 |     534.56   1161.239     0.46   0.646    -1750.677    2819.797
       _cons |    24735.1   13406.04     1.85   0.066    -1647.041    51117.25
------------------------------------------------------------------------------
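The regression of the squared residuals uhat2 on the explanatory variables is the auxiliary regression of a Breusch-Pagan type test. One common way to use it, sketched below under stated assumptions, is the LM statistic N times R-squared compared with a chi-squared critical value; the overall F-statistic of the same regression gives an equivalent check. Both critical values are approximate table values typed in by hand, and the variable names are illustrative.

```python
# Breusch-Pagan style check based on the auxiliary regression of uhat2.
n_obs = 304            # number of observations in the auxiliary regression
r2_aux = 0.0486        # R-squared of the regression of uhat2
num_regressors = 4     # lavgsal, lsales, dy2, dy3

# LM version: N * R^2 is asymptotically chi-squared with 4 degrees of freedom
# under the null hypothesis of homoskedasticity.
lm_stat = n_obs * r2_aux
CHI2_CRIT_5PCT_DF4 = 9.488   # assumption: tabulated 5% critical value, 4 df

# F version: the overall F-statistic printed for the auxiliary regression.
f_aux = 3.82
F_CRIT_5PCT = 2.40           # assumption: approximate 5% value of F(4, 299)

print(f"LM = {lm_stat:.2f}, reject homoskedasticity: {lm_stat > CHI2_CRIT_5PCT_DF4}")
print(f"F  = {f_aux:.2f}, reject homoskedasticity: {f_aux > F_CRIT_5PCT}")
```

Both versions point the same way here, which is what one would expect given that only lsales has a sizeable t-value in the auxiliary regression.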
e) The dummy variable large is introduced, which equals 1 for a large firm. Is the regression
equation of sub-question a) different between large firms and small firms? Use a significance level
$\alpha = 0.05$.
gen large_lavgsal = large*lavgsal
gen large_lsales = large*lsales
reg tothrs lavgsal lsales large large_lavgsal large_lsales

       Source |       SS       df       MS              Number of obs =     304
--------------+------------------------------           F(  5,   298) =    5.08
        Model |  65423.7882     5  13084.7576           Prob > F      =  0.0002
     Residual |  767440.843   298  2575.30484           R-squared     =  0.0786
--------------+------------------------------           Adj R-squared =  0.0631
        Total |  832864.632   303  2748.72816           Root MSE      =  50.747

-------------------------------------------------------------------------------
       tothrs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      lavgsal |   22.45324   8.665497     2.59   0.010     5.399921    39.50656
       lsales |  -9.971127   3.131076    -3.18   0.002    -16.13295   -3.809305
        large |   582.7495   273.2314     2.13   0.034     45.04194    1120.457
large_lavgsal |  -59.82844   30.77553    -1.94   0.053    -120.3933    .7364569
 large_lsales |  -.0325676   9.070745    -0.00   0.997     -17.8834    17.81826
        _cons |  -34.48482   90.38981    -0.38   0.703     -212.368    143.3984
-------------------------------------------------------------------------------
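Whether the equation differs between large and small firms can be framed as a joint test on large, large_lavgsal and large_lsales, with the regression of tothrs on lavgsal and lsales as the restricted model. The sketch below plugs the printed residual sums of squares into the F-formula from the formula card. It is one possible way to set up the test; the function name and the hand-entered critical value are assumptions of this example.

```python
def f_statistic(rss_restricted, rss_unrestricted, num_restrictions, df_resid):
    """F = ((RSS_R - RSS) / M) / (RSS / (N - (K + 1)))."""
    numerator = (rss_restricted - rss_unrestricted) / num_restrictions
    denominator = rss_unrestricted / df_resid
    return numerator / denominator

rss_restricted = 780279.708    # Residual SS of: reg tothrs lavgsal lsales
rss_unrestricted = 767440.843  # Residual SS of the regression with large terms
num_restrictions = 3           # large, large_lavgsal, large_lsales all zero
df_resid = 298                 # N - (K + 1) = 304 - 6

f_chow = f_statistic(rss_restricted, rss_unrestricted, num_restrictions, df_resid)

# Assumption: approximate tabulated 5% critical value of F(3, 298).
F_CRIT_5PCT = 2.63

print(f"F = {f_chow:.3f}, reject equal equations at 5%: {f_chow > F_CRIT_5PCT}")
```

The statistic comes out at about 1.66, so on this reading the joint hypothesis of identical equations is not rejected at the 5% level, even though two of the three added terms have individual t-values near 2.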
f) The next regression is an equation that includes an interaction term between large and the
logarithm of the average salary: large_lavgsal. Please give a precise economic interpretation of the
effect of salary on hours of training.
reg tothrs lavgsal lsales large large_lavgsal

       Source |       SS       df       MS              Number of obs =     304
--------------+------------------------------           F(  4,   299) =    6.37
        Model |   65423.755     4  16355.9388           Prob > F      =  0.0001
     Residual |  767440.877   299  2566.69189           R-squared     =  0.0786
--------------+------------------------------           Adj R-squared =  0.0662
        Total |  832864.632   303  2748.72816           Root MSE      =  50.663

-------------------------------------------------------------------------------
       tothrs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      lavgsal |   22.45481   8.640065     2.60   0.010     5.451766    39.45785
       lsales |  -9.975008   2.933707    -3.40   0.001    -15.74834    -4.20168
        large |   582.6985   272.4064     2.14   0.033     46.62183    1118.775
large_lavgsal |   -59.8758   27.75878    -2.16   0.032    -114.5031   -5.248465
        _cons |  -34.44257    89.4705    -0.38   0.701    -210.5142    141.6291
-------------------------------------------------------------------------------
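With an interaction term, the slope on lavgsal depends on the value of large. The following sketch computes the implied effect of a 1% higher salary on training hours for small and large firms from the printed coefficients. The helper name is hypothetical, and the division by 100 is the usual level-log approximation.

```python
b_lavgsal = 22.45481         # coefficient on lavgsal
b_large_lavgsal = -59.8758   # coefficient on the interaction large_lavgsal

def salary_effect_per_percent(large):
    """Change in tothrs for a 1% higher avgsal, holding lsales constant."""
    slope = b_lavgsal + b_large_lavgsal * large
    return slope / 100

effect_small = salary_effect_per_percent(0)
effect_large = salary_effect_per_percent(1)

print(f"small firms: {effect_small:+.4f} hours per 1% higher salary")
print(f"large firms: {effect_large:+.4f} hours per 1% higher salary")
```

The sign flips between the two groups: the estimated relation is positive for small firms and negative for large firms.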
g) Given the estimation results, is it necessary to include an intercept in the equation of sub-question f)?
QUESTION 3
A researcher wants to investigate the effect of the percentage of unemployment (unem) on the real
wage (wage), using annual data. She specifies the following equation:
(1) $wage_t = \beta_0 + \beta_1 wage_{t-1} + \beta_2 unem_t + \beta_3 unem_{t-1} + \varepsilon_t$
a) Using the parameters of equation (1), please give a careful economic interpretation of $\beta_2$.
b) Using the parameters of equation (1), please calculate the long-run effect of unemployment on
the real wage.
c) When would it be necessary to include a time trend in equation (1)?
d) The researcher investigates first-order autocorrelation of the error term. Please formulate
carefully an equation (in which you also explain the notation) that specifies a first-order
autocorrelation process of the error term. You do not need to provide a testing procedure on
autocorrelation.
e) Two Dickey-Fuller tests are performed: one for the real wage and one for the unemployment
rate. The null hypothesis of the test is rejected both for the real wage and for the percentage
of unemployment. What are the consequences for the OLS estimator of the regression parameters
of equation (1)?
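For an equation with one lag of the dependent variable and a current and lagged regressor, as in equation (1), the long-run effect is conventionally obtained by setting all variables to their steady-state values. The sketch below implements that textbook formula, (β2 + β3) / (1 − β1). The function name and the numerical values are purely hypothetical, since the exam gives no estimates for equation (1).

```python
def long_run_effect(beta1, beta2, beta3):
    """Long-run effect of unem on wage in
    wage_t = b0 + b1*wage_{t-1} + b2*unem_t + b3*unem_{t-1} + error.
    Requires |beta1| < 1 so that the dynamics are stable."""
    if abs(beta1) >= 1:
        raise ValueError("long-run effect is only defined for |beta1| < 1")
    return (beta2 + beta3) / (1 - beta1)

# Hypothetical parameter values, for illustration only.
example = long_run_effect(beta1=0.5, beta2=-0.2, beta3=-0.1)
print(f"long-run effect: {example:.2f}")
```

In this made-up example the immediate effect is -0.2, while the long-run effect is twice the sum of the two unemployment coefficients because half of each year's wage carries over to the next.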
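Formula card

1. OLS estimator

Simple regression:

(2.4) $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$

(2.5) $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$

(slides) $\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$

(slides) $\mathrm{Var}(\hat{\beta}_0) = \sigma^2 \cdot \dfrac{\frac{1}{N} \sum_{i=1}^{N} X_i^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$

(2.35) $RSS = \sum_{i=1}^{N} e_i^2$

(slides) $\hat{\sigma}^2 = \dfrac{RSS}{N - 2} = \dfrac{1}{N - 2} \sum_{i=1}^{N} e_i^2$

Multiple regression:

(slides) $\mathrm{Var}(\hat{\beta}_j) = \dfrac{\sigma^2}{SST_j (1 - R_j^2)}$, where $SST_j = \sum_{i=1}^{N} (X_{ij} - \bar{X}_j)^2$

(slides) $\hat{\sigma}^2 = \dfrac{RSS}{N - (K + 1)}$

(slides) $se(\hat{\beta}_j) = \dfrac{\hat{\sigma}}{\sqrt{TSS_j (1 - R_j^2)}}$, where $TSS_j = \sum_{i=1}^{N} (X_{ij} - \bar{X}_j)^2$

2. Summary statistics

Total sample variation of Y (Total Sum of Squares):

(2.12) $TSS = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$

R squared:

(2.14) $R^2 = 1 - \dfrac{RSS}{TSS}$

Adjusted R squared:

$\bar{R}^2 = 1 - \dfrac{\hat{\sigma}^2}{TSS / (N - 1)} = 1 - (1 - R^2) \cdot \dfrac{N - 1}{N - (K + 1)}$

3. Test statistics

F-statistic for multiple linear restrictions:

(5.11) $F = \dfrac{RSS_R - RSS}{RSS} \cdot \dfrac{N - (K + 1)}{M} \sim F_{M,\, N - (K + 1)}$

4. Time series

AR(1)-process:

(12.3) $Y_t = \rho_1 Y_{t-1} + u_t, \quad t = 1, 2, \ldots$

Durbin-Watson (DW) statistic on autocorrelation:

(9.10) $d = \dfrac{\sum_{t=2}^{N} (e_t - e_{t-1})^2}{\sum_{t=1}^{N} e_t^2}$

$d \approx 2(1 - \hat{\rho})$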
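To make the Durbin-Watson formula on the formula card concrete, here is a minimal sketch that computes d from a list of residuals and backs out the implied first-order autocorrelation via d ≈ 2(1 − ρ̂). The residual values are invented for illustration and the function name is not taken from any library.

```python
def durbin_watson(residuals):
    """d = sum_{t=2}^{N} (e_t - e_{t-1})^2 / sum_{t=1}^{N} e_t^2."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Invented residuals that alternate in sign (negative autocorrelation).
example_residuals = [1.0, -1.0, 1.0, -1.0]

d = durbin_watson(example_residuals)
rho_hat_approx = 1 - d / 2   # from d ≈ 2 * (1 - rho_hat)

print(f"d = {d:.2f}, implied rho_hat ≈ {rho_hat_approx:.2f}")
```

Values of d near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4, as in this alternating example, suggest negative autocorrelation.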
Sample exam 2: Econometrics, Stata
Please note that the required data sets are not provided. This sample exam is meant to give
you a realistic impression of the exam and of the knowledge that we assume you have.
QUESTION 1 data set: exam_apple.dta
Please give the Stata commands and state the conclusions very briefly. In all sub-questions of
this exam, assume a significance level of 0.05 unless stated otherwise.
a) Using Ordinary Least Squares, regress the quantity of ecolabeled apples on the price of
regular apples, the price of ecolabeled apples, the logarithm of family income (lfaminc), and
years of schooling. Show that the residual is uncorrelated with all of the explanatory
variables.
b) Re-estimate the equation of sub-question a) for households that purchased ecolabeled
apples.
c) Reconsider the equation of sub-question a). Test whether the price of regular apples and
the price of ecolabeled apples have a joint effect on the quantity of apples. What do you
conclude?
d) Test whether the coefficient on the price of regular apples is the negative of the coefficient
on the price of ecolabeled apples. What do you conclude?
e) Re-specify and re-estimate the equation of sub-question a) in such a way that you are able
to measure the constant income elasticity of the quantity of apples.
f) Please reconsider the regression equation of sub-question a). Test for heteroskedasticity.
What do you conclude?
g) Reconsider the regression equation of sub-question a). Create a dummy variable (dumh1)
which is one for one-person households. Create an additional dummy variable (dumh2)
which is one for two-person households. Is there any joint effect of one-person and
two-person households on the dependent variable? What do you conclude?
h) Please reconsider the regression equation of sub-question a). Is the effect of the price of
regular apples different between males and females, keeping the effects of the price of
ecolabeled apples, logarithm of family income, education, age, and gender constant? What do
you conclude?
QUESTION 2 data set: exam_housing.dta (35 points; a: 8 points; b-d: 9 points each)
a) Regress the real housing investment on the housing price index and a time trend. Is there
any indication of first-order autocorrelation?
b) How would you re-estimate equation a) given your outcome of equation a)?
c) Using the Dickey-Fuller test, please test for stationarity (i.e. no unit root) of the two
variables real housing investment and the housing price, using the tables of critical values
below. What do you conclude?

Table of critical values of DF-test with time trend
Signif. level      1%      2.5%    5%      10%
Critical value     -3.96   -3.66   -3.41   -3.12

Table of critical values of DF-test without time trend
Signif. level      1%      2.5%    5%      10%
Critical value     -3.43   -3.12   -2.86   -2.57

d) Let’s assume there is no co-integration. How would you re-estimate the equation of
sub-question a) given the outcome of the Dickey-Fuller test?