Econ 590A WEAK INSTRUMENTS I

October 10, 2012
Econ 590A
WEAK INSTRUMENTS I
Large Sample Properties of 2SLS Estimators, t-Tests, and Anderson-Rubin Tests
Model
Consider the following regression model with a single endogenous regressor and a number of exogenous
regressors:
y1 = y2 γ + Z2 β + u,
where y1 is the n-vector of observations on the dependent variable, y2 is the n-vector of observations on the
endogenous regressor, Z2 is the n × l2 matrix of l2 exogenous regressors, u is the n-vector of residuals, γ ∈ R,
and β ∈ Rl2 . We have the reduced form equation for y2 :
y2 = Z1 π1 + Z2 π2 + v,
where Z1 is the n × l1 matrix of l1 instruments for y2 , πj ∈ Rlj for j = 1, 2, and v is the n-vector of
observations on the reduced form residuals.
If π1 = 0, Z1 is not a valid instrument for y2 and the model is not identified. Here we consider the case
when the model is identified, however, the relationship between Z1 and y2 (for a given Z2 ) is weak, i.e. Z1
is the matrix of weak instruments. Weak instruments are defined by the following assumption.
Assumption 1 π1 = π1 (n) = n−1/2 C, where C ∈ Rl1 and fixed.
Assumption 1 defines weak, but different from zero relationship between the endogenous regressor y2
and its instruments Z1 (after controlling for the effect of Z2 ). We will rely on large n approximation for
the distribution of the estimators, and, therefore, weakness of the relationship between y2 and Z1 has to be
modelled in terms of the sample size n. This is because any fixed (independent of n) π1 , will be "large" when
n → ∞ as long as it is different from zero. Therefore, we assume that π1 = π1 (n) → 0 as n → ∞. The rate
of convergence is chosen in a such way so that small correlations captured by non-zero C’s will appear in
the limit.
In addition, we make the following assumptions.
Assumption
2
(a) {(y1i , y2i , Z1i , Z2i ) : i = 1, . . . , n} are iid.
Z1i
ui vi = 0.
(b) E
Z2i
! 0
ui
ui
σu2 σuv
= Σ, a finite and positive definite matrix.
(c) E
|Z1i , Z2i =
σuv σv2
vi
vi
0 Z1i
Z1i
Q11 Q12
(d) E
=
= Q, a finite and positive definite matrix.
Z2i
Z2i
Q012 Q22
Part (b) in the above assumption says that the instruments are exogenous. Part (c) says that the errors
u and v are homoskedastic. It also introduces a partition on their second moment matrix. Part (d) says that
the instrument have finite second moments. Not that part (d) also implies that Q11 and Q22 are positive
definite.
Define
Z = Z1 Z2 .
1
It follows from Assumption 2(a),(d) and the WLLN that
n
−1
Z10 Z1
n
Z20 Z1
n
P
0
ZZ=
Z10 Z2
n
Z20 Z2
n
!
0
Z1i Z1i
n
P
0
i=1 Z2i Z1i
n
0
Z1i Z2i
n
P
0
i=1 Z2i Z2i
n
P
i=1
=
→p
Q11
Q012
i=1
!
Q12
Q22
= Q.
Further, the CLT and Assumption 2 imply that

n
−1/2
Z 0u
Z 0v



=


=n
Z10 u
√
n
Z20 u
√
n
Z10 v
√
n
Z20 v
√
n
−1/2







n X
ui
Z1i
⊗
vi
Z2i
i=1
→d N (0, Σ ⊗ Q)


Φ1
 Φ2 

=
 Ψ1  .
Ψ2
(1)
√
Thus, for example, the asymptotic distribution of Z10 u/ n is given by the distribution of Φ1 ∼ N 0, σu2 Q11 .
It is important for
√ further results
√ that convergence in distribution is joint. For example, the asymptotic covariance of Z10 u/ n and Z20 v/ n is given by the covariance of Φ1 and Ψ2 , which equals σuv Q12 .
Define
y2 Z2 ,
X =
P
−1
= Z (Z 0 Z)
Z 0.
Since the errors are homoskedastic, we can focus on the 2SLS estimator:
γ
b
−1
= (X 0 P X) X 0 P y1 .
βb
2
A simple regression example: no exogenous regressors and single IV
Suppose that β = π2 = 0 and l1 = 1. This is a just identified case, and 2SLS estimator of γ reduces to the
IV estimator:
γ
b=
Z10 y1
Z10 y2
Z10 u
Z10 (Z1 π1 + v)
Z10 u
√
=γ+ 0
Z1 (Z1 C/ n + v)
√
Z10 u/ n
√
=γ+ 0
Z1 Z1 /nC + Z10 v/ n
Φ1
→d γ +
Q11 C + Ψ1
= γ + ∆,
=γ+
(2)
where the distribution of the random variable ∆ is given by
∆=
Φ1
,
Q11 C + Ψ1
and (2) follows from the joint convergence in (1) and by the Continuous Mapping Theorem (CMT).
This simple example illustrates the problem with IV estimation in presence of weak instruments. First,
the 2SLS (IV) estimator is inconsistent. Second, instead of usual converge in probability, γ
b converges in
distribution to a random variable. The asymptotic distribution of γ
b is non-standard and depends on the
ratio of two normal random variables. The "bias" term ∆ has an inverse relationship with C.
Suppose that the econometrician ignores the fact that Z1 is a weak instrument, and for testing H0 : γ = γ0 ,
he uses the usual t statistic based on γ
b:
t= q
γ
b − γ0
,
\ar (b
AsyV
γ ) /n
\ar (b
where AsyV
γ ) is an estimator of the usual asymptotic variance of the IV estimator when the instruments
are strong:
\ar (b
AsyV
γ)
σ
bu2
= σ
bu2
Z10 Z1 /n
2,
(Z10 y2 /n)
2
= ky1 − y2 γ
bk /n,
2
where kck = c0 c is the Euclidean norm. Under H0 , we have
Z0 u
t= p 1 0 .
σ
bu2 Z1 Z1
In order to derive the asymptotic distribution of t under H0 , first, note that
√
Z10 u/ n
Φ1
p
.
→d √
0
Q11
Z1 Z1 /n
3
(3)
(4)
Next,
2
σ
bu2 = ky2 (b
γ − γ) − uk /n
2
√
= Z1 C/ n + v (b
γ − γ) − u /n
0
v0 v
Z10 u
v0 u
Z10 v
u0 u
2
2 Z1 Z1
√
√
= (b
γ − γ) C
+
−
2
(b
γ
−
γ)
C
+
+
2C
+
n2
n
n
n
n n
n n
→d σv2 ∆2 − 2σvu ∆ + σu2 ,
(5)
Combining the results in (3)-(5), and using the CMT,
√
Φ1 / Q11
t →d p
σv2 ∆2 − 2σvu ∆ + σu2
p
Φ1 / σu2 Q11
.
=q 2
σv 2
σvu
σ2 ∆ − 2 σ2 ∆ + 1
u
u
p
Note that while Φ1 / σu2 Q11 ∼ N (0, 1) , the denominator in the above expression is a random variable,
which depends on the numerator through Φ1 . Furthermore, the limiting distribution of t depends on the
unknown nuisance parameter C, which cannot be estimated consistently. Thus, under the null, the t statistic
does not have the usual standard normal distribution. Consequently, a test that rejects the null hypothesis
in favor of the alternative, H1 : γ 6= γ0 , when |t| > z1−α/2 is invalid, since its asymptotic size is different from
α. Similarly, the usual confidence intervals based on γ
b are invalid, since they are constructed by inverting
the t test. Let
q
q
\
\
CIα = γ
b − z1−α/2 AsyV ar (b
γ ) /n, γ
b + z1−α/2 AsyV ar (b
γ ) /n ,
and let γ0 be the true value of γ. Then,
lim P (γ0 ∈ CIa )
n→∞
=
=
lim P |t| < z1−α/2
n→∞


p
2Q
Φ
/
σ
1
u 11
< z1−α/2  .
P  q 2
σv2 ∆2 − 2 σvu
2 ∆ + 1
σu
σu
Large sample properties of the 2SLS estimator of γ: a model with
exogenous regressors and multiple IVs
The 2SLS estimator can be written as
γ
b
βb
−1
b 0X
b
b 0 y1 ,
= X
X
where
yb2
Z2
b
X
=
yb2
= Z1 π
b1 + Z2 π
b2 ,
,
and π
b1 and π
b2 are the LS coefficients from the regression of y2 against Z1 and Z2 . Define
M2
P2
= I n − P2 ,
−1
= Z2 (Z20 Z2 )
4
Z20 .
Then we can write
γ
b
=
=
=
yb20 M2 y1
yb20 M2 yb2
yb0 M2 u
γ + 02
yb2 M2 yb2
π
b 0 Z 0 M2 u
.
γ + 0 10 1
π
b1 Z1 M2 Z1 π
b1
(6)
Next,
π
b1 = (Z10 M2 Z1 )
−1
Z10 M2 y2 .
Combining the last two equations,
γ
b−γ =
y20 M2 Z1
√
n
Z10 M2 Z1
n
−1
Z10 M2 y2
√
n
!−1
y20 M2 Z1
√
n
Z10 M2 Z1
n
−1
Z10 M2 u
√
.
n
(7)
We have
Z10 M2 Z1 /n =
−1 0
Z20 Z2
Z2 Z1
n
n
0
−1
Q22 Q12
Z10 Z1
Z 0 Z2
− 1
n
n
→p Q11 − Q12
= Q1·2 .
(8)
√
√
Z10 M2 y2 / n = Z10 M2 (Z1 π1 + v) / n
√
√
= Z10 M2 Z1 C/ n + v / n
Z10 M2 Z1
C
n
−1 0
Z10 v Z10 Z2 Z20 Z2
Z2 v
√
√
+
−
n
n
n
n
=
→d Q1·2 C + Ψ1 − Q12 Q−1
22 Ψ2
= Q1·2 C + Ψ1·2 ,
(9)
where
Ψ1·2 = Ψ1 − Q12 Q−1
22 Ψ2 .
Note that
Ψ1·2 ∼ N 0, σv2 Q1·2 .
Lastly,
√
Z 0 u Z 0 Z2
Z10 M2 u/ n = √1 − 1
n
n
Z20 Z2
n
−1
Z20 u
√
n
→d Φ1 − Q12 Q−1
22 Φ2
= Φ1·2 ,
(10)
where
2
Φ1·2 = Φ1 − Q12 Q−1
22 Φ2 ∼ N 0, σu Q1·2 .
Combining (7)-(10), we obtain
0
γ
b →d γ +
(Q1·2 C + Ψ1·2 ) Q−1
1·2 Φ1·2
0 −1
(Q1·2 C + Ψ1·2 ) Q1·2 (Q1·2 C + Ψ1·2 )
= γ + ∆,
5
(11)
where
0
(Q1·2 C + Ψ1·2 ) Q−1
1·2 Φ1·2
.
0
(Q1·2 C + Ψ1·2 ) Q−1
(Q
1·2 C + Ψ1·2 )
1·2
We conclude that the 2SLS estimator of γ is inconsistent. Note that the denominator in the expression for
∆, after devision by σv2 , has a noncentral χ2 distribution with l1 degrees of freedom and the noncentrality
parameter given by C 0 Q1·2 C. Similarly to the simple example discussed in the previous section, the usual
tests and confidence intervals for γ are invalid due to the weak IVs problem.
∆=
Large sample properties of the 2SLS estimator of β
The 2SLS estimator of β can be written as
βb =
−1
(Z20 Z2 )
Z20 (y1 − yb2 γ
b)
= β + (Z20 Z2 )
−1
−1
Z20 (y2 γ − yb2 γ
b) + (Z20 Z2 )
Z20 u.
From (6),
y2 γ − yb2 γ
b
(y2 − yb2 ) γ
b − y2 (b
γ − γ)
−1
=
In − Z (Z 0 Z) Z 0 vb
γ − y2 (b
γ − γ)
=
= M vb
γ − y2 (b
γ − γ) ,
where
M
= In − Z (Z 0 Z)
−1
Z0
(12)
= In − P,
a projection matrix onto the space orthogonal to Z’s. Note that Z20 M = 0. We have
βb − β
−1
=
(Z20 Z2 )
=
− (Z20 Z2 )
=
=
Z20 (M vb
γ − y2 (b
γ − γ)) + (Z20 Z2 )
−1
−1
Z20 u
−1
Z20 y2 (b
γ − γ) + (Z20 Z2 ) Z20 u
√
−1
−1
γ − γ) + (Z20 Z2 ) Z20 u
− (Z20 Z2 ) Z20 Z1 C/ n + Z2 π2 + v (b
0 −1 0
0 −1 0
0 −1 0
Z2 Z2
Z2 Z1
Z2 Z2
Z2 v
Z2 Z2
Z2 u
√ C (b
γ − γ) − π2 (b
γ − γ) −
(b
γ − γ) +
.
−
n
n
n
n
n
n n
Using the results of the previous sections,
βb − β →d −π2 ∆.
Thus, βb is inconsistent, if π2 is fixed and different from zero. In particular, the 2SLS estimator of the
coefficients on exogenous regressors is consistent if π2 = 0, i.e. the exogenous regressors Z2 are uncorrelated
with the endogenous regressor y2 after controlling for the instruments Z1 .
Let’s make the following assumption.
Assumption 3 π2 = π2 (n) = n−1/2 D, where D ∈ Rl2 and fixed.
Using the last assumption, we have
0 −1 0
0 −1 0
0 −1 0
√ Z2 Z2
Z2 Z1
Z2 Z2
Z2 v
Z2 Z2
Z2 u
√ (b
√
n βb − β = −
C (b
γ − γ) − D (b
γ − γ) −
γ − γ) +
n
n
n
n
n
n
0
→d −Q−1
22 (Q12 C∆ + Ψ2 ∆ − Φ2 ) − D∆.
In the case of weak correlation between Z2 and y2 , the 2SLS estimator of β is consistent. However, its
asymptotic distribution is nonstandard and depends on unknown nuisance parameters C and D which
cannot be estimated consistently.
6
Inference on γ in simple regressions: no exogenous regressors
Suppose we are interested in testing H0 : γ = γ0 against H1 : γ 6= γ0 . Consider first a model with no
exogenous regressors: β = π2 = 0, and l1 ≥ 1 IVs:
y1
= γy2 + u,
y2
= Z 1 π1 + v
The null restricted residuals are given by y1 − y2 γ0 . Under the null,
√
√
Z10 (y1 − y2 γ0 ) / n = Z10 u/ n
→d Φ1
= N 0, σu2 Q11 .
(13)
The variance term σu2 Q11 can be estimated consistently by σ
eu2 Z10 Z1 /n, where the estimator σ
eu2 is based on
the null restricted residuals:
0
σ
eu2 = (y1 − y2 γ0 ) (y1 − y2 γ0 ) /n.
Under the null,
σ
eu2 →p σu2 .
Let
−1
P1 = Z1 (Z10 Z1 )
(14)
Z10 ,
and consider the following statistic:
g n (γ0 )
AR
=
=
0
(y1 − y2 γ0 ) P1 (y1 − y2 γ0 ) /e
σu2
√
√
−1
0
(y1 − y2 γ0 ) Z1 / n (Z10 Z1 /n) [Z10 (y1 − y2 γ0 ) / n]
.
σ
eu2
(15)
When the null hypothesis is true, (13) and (14) imply that
g n (γ0 ) →d χ2 ,
AR
l1
g n (γ0 ) >
which holds regardless of the strength of the instruments. Hence, a test that rejects the null when AR
2
2
2
χl1 ,1−α , where χl1 ,1−α is the 1 − α quantile of the χl1 , has the asymptotic size α.
Suppose that l1 = 1. Then, under H0 , the statistic in (15) can be written as
g n (γ0 ) =
AR
Z0 u
p 1 0
σ
eu2 Z1 Z1
!2
.
g n (γ0 ) is a null-restricted version of the
By comparing the above equation with (3), we observe that AR
t-statistic (squared): we replace the usual estimator of σ 2 with its null-restricted version. Thus, in this case
we can solve the problem of weak identification by using a null-restricted t-statistic.
The statistic in (15) was originally suggested by Anderson and Rubin (1949) in the following form. Let
M1 = In − P1 .
The Anderson-Rubin statistic (AR) is given by
0
AR (γ0 ) =
(y1 − y2 γ0 ) P1 (y1 − y2 γ0 ) /l1
.
0
(y1 − y2 γ0 ) M1 (y1 − y2 γ0 ) / (n − l1 )
7
Note that under the null,
0
(y1 − y2 γ0 ) M1 (y1 − y2 γ0 )
n − l1
−1 0
0
uu
u0 Z1 Z10 Z1
Z1 u n
=
−
n − l1
n
n
n n − l1
→p σu2 .
Therefore, under the null,
AR (γ0 ) l1 →d χ2l1 .
Thus, the two versions have the same asymptotic distribution under the null. However, under the null, when
the disturbances are normally
distributed, AR (γ0 ) has the Fl1 ,n−l1 distribution in finite samples. Assuming
that u|Z1 ∼ N 0, σu2 In ,
u0 P1 u/σu2 |Z1
∼ χ2rank(P1 )
= χ2l1 .
u0 M1 u/σu2 |Z1
∼ χ2rank(M1 )
= χ2n−l1 .
Further, numerator and denominator are independent due to normality and orthogonality of P1 and M1 . It
follows that, under the null, AR (γ0 ) ∼ Fl1 ,n−l1 .
While the test based on the AR statistic has correct size regardless of the power of the instruments, its
power depends on the strength of the IVs. Consider a fixed alternative H1 : γ = γ0 + δ, where δ is a fixed
unknown constant. Assume that IVs are weak as in Assumption 1. In this case,
M1 (y1 − y2 γ0 )
= M1 (y2 δ + u)
= M1 ((Z1 π1 + v) δ + u)
= M1 (vδ + u) .
Hence,
0
(y1 − y2 γ0 ) M1 (y1 − y2 γ0 ) / (n − l1 ) →p σv2 δ 2 + σu2 + 2σuv δ.
Further, the numerator of the AR statistic can be written as
√
√
−1
0
0
(y1 − y2 γ0 ) P1 (y1 − y2 γ0 ) = (y1 − y2 γ0 ) Z1 / n (Z10 Z1 /n) Z10 (y1 − y2 γ0 ) / n.
Next,
√
√
Z1 C/ n + v δ + u / n
√
Z10 (y1 − y2 γ0 ) / n = Z10
→d Q11 Cδ + Ψ1 δ + Φ1 ,
and
0
AR (γ0 ) l1
→
=
(Q11 Cδ + Ψ1 δ + Φ1 ) Q−1
11 (Q11 Cδ + Ψ1 δ + Φ1 )
d
2
2
2
σv δ + σu + 2σuv δ
2
1/2
−1/2
Q11 Cδ
Q11 (Ψ1 δ + Φ1 ) +p
p 2 2
.
σv δ + σu2 + 2σuv δ
σv2 δ 2 + σu2 + 2σuv δ Note that
Ψ1 δ + Φ1 ∼ N 0, σv2 δ 2 + σu2 + 2σuv δ Q11 .
8
(16)
Hence,
−1/2
Q
(Ψ1 δ + Φ1 )
p 11
∼ N (0, Il1 ) ,
σv2 δ 2 + σu2 + 2σuv δ
or
−1/2
1/2
Q11 Cδ
Q
(Ψ1 δ + Φ1 )
p
+ p 11
∼N
2
2
2
2
2
σv δ + σu + 2σuv δ
σv δ + σu2 + 2σuv δ
1/2
Q11 Cδ
p
, Il1
2
2
σv δ + σu2 + 2σuv δ
!
,
and therefore, in the case of fixed alternatives and weak IVs, AR (γ0 ) has asymptotic non-central χ2l1 distribution with the noncentarlity parameter
δ2
C 0 Q11 C
.
σv2 δ 2 + σu2 + 2σuv δ
The econometrician should reject H0 when AR (γ0 ) l1 > χ2l1 ,1−α . The size of the test (probability of rejecting
H0 when it is true (δ = 0)) is always α (asymptotically) regardless of the value of C: even when IVs are
irrelevant (C = 0) the AR test has correct size. We say that the AR test is robust against weak identification,
as weakness of IVs does not lead to size distortions.
The power of the test depends on the strength of the IVs as captured by C. The test has only trivial
power if C = 0: in that case, the probability of rejection is always α regardless of the value of δ (it is the
same under H0 and H1 ). Note also that the AR test can “waste” degrees of freedom: while we have only one
restriction under H0 (γ is a scalar), the number of degrees of freedom l1 > 1 when the model is overidentified
(note that having more degrees of freedom corresponds to larger critical values).
Since the AR test is valid whether IVs are weak or strong, it is also important to study its power properties
when IVs are strong. For the remainder of this section, assume that π1 = fixed (unlike in Assumption 1).
First, we consider fixed alternatives of the form H1 : γ = γ0 +δ with fixed δ. The behavior of the denominator
is the same as in (16) (this is because the denominator does not depend on Z’s and π 0 s as they are eliminated
by M ). For the numerator:
0
(y1 − y2 γ0 ) P1 (y1 − y2 γ0 ) = (y2 δ + u)0 P1 (y2 δ + u)
= (Z1 π1 δ + vδ + u)0 Z1 (Z10 Z1 )−1 Z1 (Z1 π1 δ + vδ + u)
0 0 −1 0
0
Z1 Z1
Z10 v
Z10 u
Z1 Z1
Z10 v
Z10 u
Z1 Z1
√ π1 δ + √ δ + √
√ π1 δ + √ δ + √
,
=
n
n
n
n
n
n
n
or
0
n−1 (y1 − y2 γ0 ) P1 (y1 − y2 γ0 ) =
Z10 Z1
Z0 v
Z0 u
π1 δ + 1 δ + 1
n
n
n
0 Z10 Z1
n
−1 Z0 v
Z0 u
Z10 Z1
π1 δ + 1 δ + 1
n
n
n
→p δ 2 π10 Q11 π1 > 0
Hence, the numerator of the AR statistic and consequently the AR statistic itself diverge to +∞ with
probability approaching in the sense that for any constant c, P (AR (γ0 ) l1 > c) → 1 under fixed alternatives
and strong IVs. We conclude that the AR test is consistent against fixed alternatives when IVs are strong.
Next, we consider the power
properties of the AR test with strong IVs in the case of local alternatives of
√
the form H1 :γ = γ0 + δ/ n. In this case, the denominator of the AR statistic converges in probability to
σu2 as is shown below:
√
√
0
0
(y1 − y2 γ0 ) M1 (y1 − y2 γ0 )
(y2 δ/ n + u) M1 (y2 δ/ n + u)
=
n − l1
n − l1
√
√
0
((Z1 π1 + v)δ/ n + u) M1 ((Z1 π1 + v)δ/ n + u)
=
n − l1
0
u0 M1 u
v
M
v
v 0 M1 u
1
=
+ δ2
+ 2δ √
n − l1
n(n − l1 )
n(n − l1 )
→p σu2 .
9
For the numerator of the AR statistic, we now have:
√
√
0
(y1 − y2 γ0 ) P1 (y1 − y2 γ0 ) = (y2 δ/ n + u)0 P1 (y2 δ/ n + u)
√
√
√
√
= (Z1 π1 δ/ n + vδ/ n + u)0 Z1 (Z10 Z1 )−1 Z1 (Z1 π1 δ/ n + vδ/ n + u)
0
0 0 −1 0
Z1 Z1
Z10 v
Z10 u
Z1 Z1
Z1 Z1
Z10 v
Z10 u
=
π1 δ +
δ+ √
π1 δ +
δ+ √
n
n
n
n
n
n
n
0
→d (Q11 π1 δ + Φ1 ) Q−1
11 (Q11 π1 δ + Φ1 ) .
Note that:
−1/2
Q11
1/2
(Q11 π1 δ + Φ1 ) /σu ∼ N Q11 π1 δ/σu , Il1 .
Therefore, under local alternatives, the AR statistic has noncentral χ2l1 distribution with the noncentrality
parameter
π 0 Q11 π1
δ2 1 2 .
σu
Thus, under strong IVs the AR test has nontrivial power against local alternatives. Since larger values of
the noncentrality parameter correspond to higher rejection probabilities, the power of the test increases with
the distance from H0 (δ 2 ), stronger correlation between the endogenous regressor and IVs (π1 of higher
magnitude), and more variation in the IVs (Q11 ). More noise in the structural equation (larger σu2 ) will
adversely affect the power of the test.
Inference on γ in regressions with exogenous regressors and multiple
IVs
The joint hypotheses on γ and β can be tested in exactly the same way as in the simple regression case.
However, often econometricians are interested in testing hypotheses on γ while leaving β unrestricted. This
can be done by projecting the IVs Z1 onto the space orthogonal to Z2 . In this case, the AR statistic takes
the following form. Define
Ze1
=
Pe1
=
M2 Z 1 ,
−1
e1 Ze0 Ze1
Z
Ze10 .
1
The AR statistic is given by
0
AR (γ0 ) =
(y1 − y2 γ0 ) Pe1 (y1 − y2 γ0 ) /l1
.
0
(y1 − y2 γ0 ) M (y1 − y2 γ0 ) / (n − l1 − l2 )
Consider again the fixed alternatives H1 : γ = γ0 + δ (δ is fixed) and weak IVs. We have
−1
0
0
(y1 − y2 γ0 ) Pe1 (y1 − y2 γ0 ) = (y1 − y2 γ0 ) M2 Z1 (Z10 M2 Z1 ) Z10 M2 (y1 − y2 γ0 ) ,
√
Z10 M2 (y1 − y2 γ0 ) = Z10 M2 Z1 C/ n + v δ + u , and
√
Z10 M2 ((Z1 C/ n + v) δ + u)
√
→d Q1·2 Cδ + Ψ1·2 δ + Φ1·2 .
n
Next,
0
0
(y1 − y2 γ0 ) M (y1 − y2 γ0 )
(vδ + u) M (vδ + u)
=
n
n
→p σv2 δ 2 + σu2 + 2σuv δ.
10
(17)
Combining the above results,
0
AR (γ0 ) l1 →d
(Q1·2 Cδ + Ψ1·2 δ + Φ1·2 ) Q−1
1·2 (Q1·2 Cδ + Ψ1·2 δ + Φ1·2 )
.
σv2 δ 2 + σu2 + 2σuv δ
Further,
Ψ1·2 δ + Φ1·2 ∼ N 0, σv2 δ 2 + σu2 + 2σuv δ Q1·2 ,
and, therefore,
−1/2
Q1·2
(Q Cδ + Ψ1·2 δ + Φ1·2 )
p 1·2
∼N
σv2 δ 2 + σu2 + 2σuv δ
1/20
Q1·2 Cδ
p
, Il1
2
2
σv δ + σu2 + 2σuv δ
!
.
Hence, asymptotically AR (γ0 ) l1 has the noncentral χ2l1 distribution with the noncentrality parameter given
by
C 0 Q1·2 C
δ2 2 2
.
σv δ + σu2 + 2σuv δ
Suppose that one rejects the null if AR (γ0 ) l1 > χ2l1 ,1−α . When the null hypothesis is true (δ = 0) or when the
instruments are unrelated to the endogenous regressor (C = 0), AR (γ0 ) l1 has the central χ2l1 distribution,
and
P AR (γ0 ) l1 > χ2l1 ,1−α → α.
However, when the instruments are weak, and the null is false, the asymptotic power of the AR test exceeds
α. The probability to reject the null increases with the strength of IVs as captured by C.
√
Next, we consider the behavior of the AR test under local alternatives H1 : γ = γ0 + δ/ n and strong
IVs (π1 is fixed). In that case,
√
Z10 M2 (y1 − y2 γ0 ) = Z10 M2 (Z1 π1 + v) δ/ n + u .
Substituting this into (17), we obtain
0
(y1 − y2 γ0 ) Pe1 (y1 − y2 γ0 )
0
−1 0
0 0
Z 1 M2 Z 1
Z10 M2 v
Z10 M2 u
Z1 M2 Z1
Z10 M2 v
Z10 M2 u
Z1 M2 Z1
√
√
=
π1 δ +
δ+
π1 δ +
δ+
n
n
n
n
n
n
n
0
→d (Q1·2 π1 δ + Φ1·2 ) Q−1
1·2 (Q1·2 π1 δ + Φ1·2 ) .
Similarly to the case without exogenous regressors, under local alternatives
0
(y1 − y2 γ0 ) M (y1 − y2 γ0 ) / (n − l1 − l2 ) → σu2 .
Now, it follows by (11) that the AR statistic converges to a noncentral χ2l1 distribution with noncentrality
parameter given by
π 0 Q1·2 π1
δ2 1 2
.
σu
Confidence sets for γ
Since the usual t- and Wald tests based on the 2SLS estimator of γ can exhibit size distortions when IVs
are weak, confidence intervals constructed as estimator ± z1−α/2 × std.err can be invalid: their asymptotic
coverage can be substantially smaller than 1−α. Asymptotically valid confidence sets for γ can be constructed
by inverting a test which is robust to the weak IVs problem. For example, using the AR statistic, robust to
weak IVs confidence sets with asymptotic coverage probability 1 − α are constructed as:
CS1−α = γ0 : AR (γ0 ) l1 ≤ χ2l1 ,1−α .
(18)
11
Thus, one collects all values of γ for which the null hypotheses H0 : γ = γ0 cannot be rejected. It is easy to
see that CS1−α has correct asymptotic coverage regardless of the strength of IVs. The probability that the
true γ is inside the confidence set is
P (γ ∈ CS1−α ) = P AR (γ) l1 ≤ χ2l1 ,1−α
→ 1 − α,
where the result in the last line follows from the properties of the AR test.
While their coverage probability is always correct, the length of robust confidence sets depends on the
strength of IVs. For example, if IVs and the endogenous variable are unrelated, the AR test always accepts
the null with an asymptotic probability of 1 − α regardless of whether it is true or false, and we should expect
very wide and uninformative confidence sets.
The form of robust confidence sets can differ drastically from that of usual confidence intervals: CS1−α
can be an interval, a union of two half-lines (−∞, a] ∪ [b, ∞) for a < b, or even the entire real line. We can
illustrate this using the example of a simple IV regression with a single endogenous regressor, single IV, and
no exogenous regressors. For simplicity, consider again the version of the AR statistic given in equation (15).
In this case, γ0 is included in the confidence set if
(y1 − y2 γ0 )0 Z1 (Z10 Z1 )−1 Z10 (y1 − y2 γ0 )
≤ χ21,1−α
0
(y1 − y2 γ0 ) (y1 − y2 γ0 ) /n.
or, equivalently, when
2
(Z10 (y1 − y2 γ0 )) − χ21,1−α
Z10 Z1
n
0
(y1 − y2 γ0 ) (y1 − y2 γ0 ) ≤ 0.
By expanding and rearranging the terms, we obtain the following condition for γ0 to be included in the
robust confidence set:
y 0 y2
γ02 (Z10 y2 )2 − χ21,1−α (Z10 Z1 ) ( 2 ) −
n
0 2 #
0 "
y1 y1
y1 y2
2
0
2
0
0
0
2
0
+ (Z1 y1 ) − χ1,1−α (Z1 Z1 )
≤ 0.
− 2γ0 (Z1 y2 ) (Z1 y1 ) − χ1,1−α (Z1 Z1 )
n
n
One can show that there is always a value γ0 satisfying the above condition, and therefore the confidence
set is never empty. Next, since we have a quadratic equation on the left-hand side, for the confidence set to
be an interval we need that
y 0 y2
(Z10 y2 )2 − χ21,1−α (Z10 Z1 ) ( 2 ) > 0.
(19)
n
2
Suppose that the IV is strong (π1 is fixed). In that case (Z10 y2 )2 = [(Z10 Z1 )π1 + Z10 v] , which behaves
asymptotically as n2 × constant2 . At the same time, χ21,1−α (Z10 Z1 ) (y20 y2 /n) is asymptotically n × constant2 .
Thus, for all n sufficiently large, the above inequality holds with a probability approaching one. We conclude
that, when IVs are strong, the robust confidence √
set is an interval.
√ 2
2
When the IV is weak, (Z10 y2 )2 = [(Z10 Z1 )C/ n + Z10 v] = n [(Z10 Z1 /n)C + Z10 v/ n] = Op (n). Since
χ21,1−α (Z10 Z1 ) (y20 y2 /n) = Op (n), the condition in (19) can fail with a positive probability, in which case
CS1−α will be unbounded (the entire real line or a union of two half-lines). Note that Dufour (1997,
“Some Impossibility Theorems in Econometrics,” Econometrica) shows that in the case of local identification
failures, a valid confidence set must be unbounded with a positive probability.
Multiple endogenous regressors
In the case of a vector of endogenous regressors (Y2 ∈ Rn×k , γ ∈ Rk ), the AR statistic can be constructed in
a similar manner to the single endogenous regressor case by considering the sample covariance between IVs
12
and the null restricted residuals:
AR (γ0 ) =
0
(y1 − Y2 γ0 ) Pe1 (y1 − Y2 γ0 ) /l1
.
0
(y1 − Y2 γ0 ) M (y1 − Y2 γ0 ) / (n − l1 − l2 )
One should reject the null when AR (γ0 ) l1 > χ2l1 ,1−α .
Again, confidence sets can be constructed by inverting the AR test as in (18). A major drawback of this
approach is that obtaining confidence sets for subvectors of γ is difficult. One approach that is commonly
1,2
used for that purpose is the projection method. Suppose that k = 2, i.e. γ = (γ1 , γ2 ) and let CS1−α
be a
joint confidence set for γ1 and γ2 :
1,2
CS1−α
= {(γ10 , γ20 ) : AR(γ10 , γ20 ) ≤ χ2l1 ,1−α /2}.
A projection-type confidence set for γ1 is given by
1,2
1
CS1−α
for some γ20 ∈ R}.
= {γ10 : (γ10 , γ20 ) ∈ CS1−α
Projection-type confidence sets are typically conservative: their asymptotic coverage exceeds 1 − α:
n
o
1,2
1
P γ1 ∈ CS1−α
=P
(γ1 , γ20 ) ∈ CS1−α
for some γ20 ∈ R
n
o
1,2
≥P
(γ1 , γ2 ) ∈ CS1−α
= 1 − α.
Hence, comparing to ideal confidence sets, projection confidence sets will be too large and, therefore, less
informative.
Confidence sets for β
Here we discuss construction of confidence sets for the coefficients on the exogenous regressors. Write
y1 = y2 γ + Z2 β + u
= y2 γ + Z21 β1 + Z22 β2 + u,
where β1 is an l21 -vector, β2 is an l22 -vector, l21 + l22 = l2 , and matrix Z2 is partitioned accordingly into Z21
and Z22 . Suppose that the econometrician is interested in constructing a weak-IVs-robust confidence set for
β1 .
We consider first an unfeasible approach, which assumes that γ is known. Later we will use the unfeasible
confidence set for β1 with the information about γ contained in its robust confidence set. When γ is known,
we can estimate β1 as
−1 0
0
βˆ1 (γ) = (Z21
M22 Z21 ) Z21
M22 (y1 − y2 γ) ,
0
M22 = In − Z22 (Z22
Z22 )
−1
0
Z22
.
Privided that the correct value of γ is used in its construction, the estimator βˆ1 (γ) is consistent and asymptotically normal. Under our assumptions, its asymptotic variance can be estimated as
−1
0
ˆ ar βˆ1 (γ) = σ
AsyV
ˆu2 (γ) (Z21
M22 Z21 /n) ,
0
σ
ˆu2 (γ) = (y1 − y2 γ) M (y1 − y2 γ) ,
where M is the projection matrix onto the space orthogonal to the span of Z1 and Z2 as defined in (12).
Hypotheses about β1 can be tested using the Wald statistic:
0
0
W (b1 , γ) = βˆ1 (γ) − b1 (Z21
M22 Z21 ) βˆ1 (γ) − b1 /ˆ
σu2 (γ).
13
When the correct value of γ is used,
W (β1 , γ) →d χ2l21 .
(20)
Hence, an asymptotic 1 − τ1 confidence set for β1 can be constructed as
β1
(γ) = b1 ∈ Rl21 : W (b1 , γ) ≤ χ2l21 ,1−τ1 .
CS1−τ
1
β1
(γ) is asymptotically valid (has the coverage probability of 1 − τ1 )
Again, we need to emphasize that CS1−τ
1
only if the correct value of γ was used in its construction.
β1
(γ) with the possible values of γ from a weakSince in practice γ is unknown, we can supplement CS1−τ
1
γ
IVs-robust confidence set for γ, such as the one constructed using the AR statistic in (18). Let CS1−τ
2
denote a weak-IVs-robust confidence set for γ with an asymptotic coverage probability of 1 − τ2 . Consider
the following confidence set for β1 :
o
n
γ
β1
(c)
for
some
c
∈
CS
B = b1 ∈ Rl21 : b21 ∈ CS1−τ
1−τ2
1
γ
2
l21
.
= b1 ∈ R : W (b1 , c) ≤ χl21 ,1−τ1 for some c ∈ CS1−τ
2
What is the asymptotic coverage probability of B for given τ1 and τ2 ? We have
γ
P (β1 ∈
/ B) = P W (β1 , c) > χ2l21 ,1−τ1 for all c ∈ CS1−τ
2
γ
γ
= P W (β1 , c) > χ2l21 ,1−τ1 for all c ∈ CS1−τ
, γ ∈ CS1−τ
2
2
+ P W (β1 , c) >
χ2l21 ,1−τ1
for all c ∈
γ
CS1−τ
,γ
2
∈
/
(21)
γ
CS1−τ
.
2
The second probability can be bounded asymptotically by τ2 :
γ
γ
γ
≤P γ∈
/ CS1−τ
,γ ∈
/ CS1−τ
P W (β1 , c) > χ2l21 ,1−τ1 for all c ∈ CS1−τ
2
2
2
→ τ2 ,
γ
where the last line follows because CS1−τ
has the asymptotic coverage of 1 − τ2 . The probability in (21)
2
can be bounded asymptotically by τ1 as follows:
γ
γ
, γ ∈ CS1−τ
P W (β1 , c) > χ2l21 ,1−τ1 for all c ∈ CS1−τ
2
2
γ
γ
γ ∈ CS γ
= P W (β1 , c) > χ2l21 ,1−τ1 for all c ∈ CS1−τ
1−τ2 P γ ∈ CS1−τ2
2
γ
≤ P W (β1 , γ) > χ2l21 ,1−τ1 P γ ∈ CS1−τ
2
≤ P W (β1 , γ) > χ2l21 ,1−τ1
→ τ1 ,
where the inequality in the third line holds because W (β1 , γ) > χ2l21 ,1−τ1 is more likely to hold than
γ
γ
W (β1 , c) > χ2l21 ,1−τ1 for all c ∈ CS1−τ
conditional on γ ∈ CS1−τ
, the inequality in the fourth line holds
2
2
γ
because 0 ≤ P γ ∈ CS1−τ2 ≤ 1, and the inequality in the fifth line holds by (20). Hence,
P (β1 ∈ B) → 1 − τ, where
τ ≤ τ1 + τ2 .
Thus, a weak-IVs-robust confidence set for β1 with asymptotic coverage of at least 1 − α can be constructed
as follows. Pick α1 and α2 between zero and one such that α1 + α2 = α. The confidence set for β1 is given
by
β1
γ
CS1−α
= b1 ∈ Rl21 : W (b1 , c) ≤ χ2l21 ,1−α1 for some c ∈ CS1−α
2
γ
= ∪c∈CS1−α
b1 ∈ Rl21 : W (b1 , c) ≤ χ2l21 ,1−α1 .
2
Note that
β1
CS1−α
is conservative.
14