
Preliminary and Incomplete
A Framework for Eliciting, Incorporating, and
Disciplining Identification Beliefs in Linear Models
Francis J. DiTraglia
Camilo Garcia-Jimeno
University of Pennsylvania
March 20, 2015
Abstract
We consider the problem of estimating a causal effect from observational
data in a simple linear model that may be subject to classical measurement
error, an endogenous regressor and an invalid instrument. After characterizing
the identified set for this problem, we propose a Bayesian tool for inference
that is more general and informative than the usual frequentist approach to
partial identification and show how it can be used to help applied researchers
reason coherently about their identification beliefs. We conclude with two simple
examples illustrating the usefulness of our method.
1 Introduction
To identify causal effects from observational data, even the staunchest frequentist
econometrician must augment the data with her beliefs. In an instrumental variable
(IV) regression, for example, the exclusion restriction represents the belief that the
instrument has no direct effect on the outcome of interest after controlling for the regressors.
While this is a strong belief, it is explicit, and its meaning is well-understood. Although the exclusion restriction can never be directly tested, applied researchers know
how to think about it and how to debate it. Indeed, in specific problems we often
have a reasonable idea of the kinds of factors that make up the regression error term:
in a wage regression, for example, the key unobservable is ability. This allows us to
consider whether the assumption that these are uncorrelated with the instrument is
truly plausible.
The exclusion restriction is what we call a “formal identification belief” – something that is directly stated and whose role in achieving identification is clear. In
addition to imposing formal beliefs to achieve identification, researchers often state a
number of other “informal beliefs” in applied work. We use this term to refer to beliefs
that are not imposed in estimation, but which may be used, among other things, to
interpret results, or reconcile conflicting estimates from different specifications. For
example, papers that report the results of IV regressions almost invariably state the
authors’ belief about the sign of the correlation between the endogenous regressor and
the error term but fail to exploit this information. Referring to the more than 60 papers published in the top three empirical journals between 2002 and 2005 that reported
the results of IV regressions, for example, Moon and Schorfheide (2009) pointed out
that “in almost all of the papers the authors explicitly stated their beliefs about the
sign of the correlation between the endogenous regressor and the error term; yet none
of the authors exploited the resulting inequality moment condition in their estimation.” Another common informal belief involves measurement error. When empirical
researchers uncover an OLS estimate that is substantially smaller than but has the
same sign as its IV counterpart, classical measurement error, with its attendant “least
squares attenuation bias,” often takes the blame.
While measurement error, endogenous regressors and invalid instruments have all
generated voluminous literatures, we know of no paper that considers the effects of all
three problems at once. In a certain sense this is unsurprising: a partial identification
analysis based on a model that suffers from so many serious problems seems unlikely to
produce particularly informative bounds. Nevertheless, applied researchers have beliefs
about all three of these quantities, and at present lack a tool for testing whether these
beliefs cohere and, if they do, imposing them in estimation.
In this paper we consider a simple linear model in which the goal is to estimate
the causal effect β of a regressor x that may be measured with error and is potentially endogenous. Although an instrumental variable z is available, it may not satisfy
the exclusion restriction. For the moment we abstract from covariates and limit our
attention to classical measurement error: extensions to address both of these shortcomings are currently in progress. After characterizing the identified set for this model,
we propose a Bayesian tool to allow applied researchers, who may not be Bayesians
themselves, to reason coherently about their identification beliefs. Specifically, we propose a procedure for sampling uniformly from the identified set for the non-identified
parameters of the model conditional on the identified parameters. By imposing sign
and interval restrictions, we can add prior information to the problem while remaining
uniform on the regions of the identified set that remain. In some cases the result can
be quite informative even if the beliefs imposed are somewhat weak. Unlike the usual
frequentist partial identification analysis, we take seriously the possibility that there
may be more points on the identified set that are compatible with a particular value
of β than with another. While the uniformity of the prior on the identified set need not be
taken literally, it provides a good starting point for moving beyond the more common
analysis based on “worst-case” bounds.
This paper relates to a vast literature on partial identification, measurement error,
and invalid instruments. Two recent papers with a similar flavor to this one are Conley
et al. (2012), who propose a Bayesian procedure for examining the importance of
violations of the exclusion restriction in IV regressions, and Nevo and Rosen (2012)
who derive bounds in a setting where the instrument may be invalid, but is less correlated with the error term than the endogenous regressor it is used to instrument. Our paper also relates to the literature
on the Bayesian analysis of non-identified models, particularly Poirier (1998) and Moon
and Schorfheide (2012).
The remainder of this paper is organized as follows. Section 2 derives the model,
and Section 3 explains our preferred parameterization of the identified set. Section
4 solves for the identified set and Section 5 describes our inferential procedure and
how we sample from the identified set. We conclude in Section 6 with two examples
illustrating the usefulness of our method: one that examines the effect of institutions
on development and another that revisits the returns to schooling.
2 The Model
We observe x, y and z from the following linear structural model
\[ y = \beta x^* + u \tag{1} \]
\[ x^* = \pi z + v \tag{2} \]
\[ x = x^* + w \tag{3} \]
where we assume, without loss of generality, that all random variables in the system
are mean zero or have been de-meaned. Our goal is to learn the parameter β, the
causal effect of x∗ . Unfortunately x∗ is unobserved: we only observe a noisy measure
x that has been polluted by classical measurement error w. We call (u, v, w, z) the
“primitives” of the system. Their covariance matrix is as follows:



\[
\Omega = \operatorname{Var}\begin{pmatrix} u \\ v \\ w \\ z \end{pmatrix} = \begin{pmatrix} \sigma_u^2 & \sigma_{uv} & 0 & \sigma_{uz} \\ \sigma_{uv} & \sigma_v^2 & 0 & 0 \\ 0 & 0 & \sigma_w^2 & 0 \\ \sigma_{uz} & 0 & 0 & \sigma_z^2 \end{pmatrix} \tag{4}
\]
Because w represents classical measurement error, it is uncorrelated with u, v, and z
as well as x∗ . The parameter σuz controls the endogeneity of the instrument z: unless
σuz = 0, z is an invalid instrument. Both σuv and σuz are sources of endogeneity for
the unobserved regressor x∗ . In particular,
\[ \sigma_{x^*u} = \sigma_{uv} + \pi\sigma_{uz} \tag{5} \]
which we can derive, along with the rest of the covariance matrix for (y, x, x∗, z), from the fact that
\[
\begin{pmatrix} y \\ x \\ x^* \\ z \end{pmatrix} = \begin{pmatrix} 1 & \beta & 0 & \beta\pi \\ 0 & 1 & 1 & \pi \\ 0 & 1 & 0 & \pi \\ 0 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} u \\ v \\ w \\ z \end{pmatrix} \tag{6}
\]
along with the assumptions underlying the covariance matrix Ω of (u, v, w, z).
The system we have just finished describing is not identified: without further restrictions we cannot learn the value of β from any amount of data. In particular,
neither the OLS nor the IV estimator converges in probability to β; instead, they approach
\[ \beta_{OLS} = \frac{\sigma_{xy}}{\sigma_x^2} = \beta\,\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_w^2} + \frac{\sigma_{x^*u}}{\sigma_{x^*}^2 + \sigma_w^2} \tag{7} \]
and
\[ \beta_{IV} = \frac{\sigma_{zy}}{\sigma_{xz}} = \beta + \frac{\sigma_{uz}}{\sigma_{xz}} \tag{8} \]
where σx2∗ denotes the variance of the unobserved true regressor x∗ , which equals σx2 −σw2 .
Some quantities in the system, however, are identified. Since we observe (x, y, z),
we can learn the entries of the covariance matrix Σ of the observables, defined as

\[
\Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} & \sigma_{xz} \\ \sigma_{xy} & \sigma_y^2 & \sigma_{yz} \\ \sigma_{xz} & \sigma_{yz} & \sigma_z^2 \end{pmatrix} \tag{9}
\]
and, as a consequence, the value of the first stage coefficient π since
\[ \pi = \frac{\sigma_{x^*z}}{\sigma_z^2} = \frac{\sigma_{xz}}{\sigma_z^2} \tag{10} \]
where the fact that σx∗ z = σxz follows from Equations 4 and 6.
Although β is unidentified, the observable covariance matrix Σ, along with constraints on the unobserved covariance matrix Ω of the primitives, does impose restrictions on the unobservables. Combined with even relatively weak subject-specific prior
knowledge, these restrictions can sometimes prove surprisingly informative, as we show
below. Before we can do this, however, we need to derive the identified set. To aid
in this derivation, we first provide a re-parameterization of the problem that will not
only simplify the expressions for the identified set, but express it in terms of quantities
that are empirically meaningful and thus practical for eliciting beliefs.
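To fix ideas, the following minimal sketch (ours, not the authors' code; all parameter values are purely illustrative) simulates data satisfying Equations 1-3 with the covariance structure of Equation 4 and computes the identified quantities discussed above: the observable covariance matrix Σ of Equation 9, the first-stage coefficient π of Equation 10, and sample analogues of the OLS and IV probability limits in Equations 7 and 8.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
beta, pi = 1.0, 0.5                                # illustrative "true" values
s2_u, s2_v, s2_w, s2_z = 1.0, 1.0, 0.25, 1.0       # variances of the primitives
s_uv, s_uz = 0.3, 0.1                              # endogeneity of x* and of z

# Draw (u, v, w, z) with the covariance structure of Equation 4.
Omega = np.array([[s2_u, s_uv, 0.0,  s_uz],
                  [s_uv, s2_v, 0.0,  0.0],
                  [0.0,  0.0,  s2_w, 0.0],
                  [s_uz, 0.0,  0.0,  s2_z]])
u, v, w, z = rng.multivariate_normal(np.zeros(4), Omega, size=n).T

x_star = pi * z + v            # Equation 2
x = x_star + w                 # Equation 3: classical measurement error
y = beta * x_star + u          # Equation 1

Sigma = np.cov(np.vstack([x, y, z]))      # observable covariance matrix (Equation 9)
pi_hat = Sigma[0, 2] / Sigma[2, 2]        # Equation 10: sigma_xz / sigma_z^2
beta_ols = Sigma[0, 1] / Sigma[0, 0]      # sample analogue of Equation 7
beta_iv = Sigma[1, 2] / Sigma[0, 2]       # sample analogue of Equation 8
print(pi_hat, beta_ols, beta_iv)          # neither OLS nor IV recovers beta
```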
3 A Convenient Parameterization
The model introduced in the preceding section contains five non-identified parameters:
β, σuv , σuz , σv2 , and σw2 . In spite of this, as we will show below, there are only two degrees
of freedom: knowledge of any two of the five is sufficient to identify the remaining three.
As such we have a choice of how to represent the identified set. Because our ultimate
goal is to elicit and incorporate researchers' beliefs, we adopt three criteria for choosing a parameterization:
1. The parameters should be scale-free.
2. The parameter space should be compact.
3. The parameters should be meaningful in real applications.
Based on these considerations, we define the identified set in terms of the following
quantities:
\[ \rho_{zu} = \operatorname{Cor}(z, u) \tag{11} \]
\[ \rho_{x^*u} = \operatorname{Cor}(x^*, u) \tag{12} \]
\[ \kappa = \frac{\sigma_{x^*}^2}{\sigma_x^2} = \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_w^2} \tag{13} \]
Note that these parameters are not independent of one another. For example, ρx∗ u
depends on both κ and ρzu . This is precisely the point of our analysis: these three
quantities are bound together by the assumptions of the model, which allows us to
derive the identified set. The first quantity ρzu is the correlation between the instrument
and the main equation error term u. This measures the endogeneity of the instrument:
the exclusion restriction in IV estimation, for example, corresponds to the belief that ρzu = 0. When critiquing an instrument, researchers often state a belief about the likely sign of this quantity. The second quantity, ρx∗u, is the correlation between the unobserved regressor x∗ and the main equation error term. This measures the overall endogeneity of x∗, taking into account the effects of both σuv and σuz. As pointed out
by Moon and Schorfheide (2009), researchers almost invariably state their belief about
the sign of this quantity before undertaking an IV estimation exercise.
The third quantity, κ, is somewhat less familiar. In the simple setting we consider
here, with no covariates, κ measures the degree of attenuation bias present in the OLS
estimator. In other words, if ρx∗u = 0 then the OLS probability limit is κβ. Equivalently, since σx∗y = σxy,
\[ \kappa = \frac{\sigma_{x^*}^2}{\sigma_x^2} = \frac{\sigma_{yx}^2/(\sigma_x^2\sigma_y^2)}{\sigma_{yx^*}^2/(\sigma_{x^*}^2\sigma_y^2)} = \frac{\rho_{yx}^2}{\rho_{yx^*}^2} \tag{14} \]
so another way to interpret κ is as the ratio of the observed R2 of the main equation
and the unobserved R2 that we would obtain if our regressor had not been polluted
by measurement error. A third and more general way to think about κ is in terms of
signal and noise. If κ = 1/2, for example, this means that half of the variation in the
observed regressor x is “signal,” x∗ , and the remainder is noise, w. While the other
two interpretations we have provided are specific to the case of no covariates, this third
interpretation is not.
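The three interpretations of κ can be checked numerically. The short sketch below is ours (the variances are purely illustrative, and ρx∗u is set to zero so that attenuation is the only source of bias); it verifies that the signal share of x, the ratio βOLS/β, and the ratio of the two R²'s all coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x_star = rng.normal(scale=2.0, size=n)          # true regressor ("signal")
w = rng.normal(scale=1.0, size=n)               # classical measurement error ("noise")
x = x_star + w
u = rng.normal(size=n)                          # here rho_{x*u} = 0 by construction
beta = 1.5
y = beta * x_star + u

kappa = np.var(x_star) / np.var(x)              # Equation 13: signal share of x
beta_ols = np.cov(x, y)[0, 1] / np.var(x)       # OLS slope, attenuated toward zero
r2_ratio = np.corrcoef(x, y)[0, 1]**2 / np.corrcoef(x_star, y)[0, 1]**2
print(kappa, beta_ols / beta, r2_ratio)         # all three are (approximately) equal
```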
There are several advantages to parameterizing measurement error in terms of κ
rather than the measurement error variance σw2 . First, κ has compact support: it takes
a value in (0, 1]. When κ = 1, σw2 = 0 so there is no measurement error, while the limit as κ approaches zero corresponds to taking σw2 to infinity. Second, writing expressions
in terms of κ greatly simplifies our calculations. Indeed, as we will see in the next
section, the sample data provide simple and informative bounds for κ. Third, and
most importantly, we consider it much easier to elicit beliefs about κ than σw2 . We will
consider this point in some detail in the empirical examples that we present below.
In the section that follows we will solve for ρzu in terms of ρx∗ u , κ and the observable
covariance matrix Σ. First, however, we will derive bounds on these three quantities.
4 The Identified Set

4.1 Bounds on the Non-Identified Parameters
Our compact parameterization from the preceding section gives us several obvious
bounds: ρx∗ u , ρzu ∈ [−1, 1] and κ ∈ (0, 1]. Yet there are other, less obvious bounds that
come from the two covariance matrices: Σ and Ω. To state these additional bounds,
we need an expression for σv2 , the variation in x∗ not attributable to the instrument z,
in terms of κ and observables only. To this end, note that the R2 of the IV first stage,
ρ2xz , can be expressed as
\[ \rho_{zx}^2 = \frac{(\pi\sigma_z^2)^2}{\sigma_x^2\sigma_z^2} = \frac{\pi^2\sigma_z^2}{\sigma_x^2} \]
Combining this with the fact that σx2 = σv2 + σw2 + π2σz2, we have
\[ 1 = \frac{\sigma_v^2 + \sigma_w^2}{\sigma_x^2} + \rho_{xz}^2 \]
Rearranging and simplifying we find that ρ2xz = κ − σv2/σx2 and hence
\[ \sigma_v^2 = \sigma_x^2\left(\kappa - \rho_{xz}^2\right) \tag{15} \]
We now proceed to construct an additional bound for κ in terms of the elements of
Σ. To begin, since we can express κ as ρ2xy /ρ2x∗ y and squared correlations are necessarily
less than or equal to one, it follows that κ > ρ2xy . Although typically stated somewhat
differently, this bound is well known: in fact it corresponds to the familiar “reverse
regression bound” for β which goes back at least to Frisch (1934).1 As it happens,
however, Σ provides an additional bound that may be tighter than κ > ρ2xy . Since σv2
and σx2 are both strictly positive, Equation 15 immediately implies that κ ≥ ρ2xz. In other words, the R2 of the IV first stage provides a lower bound for κ, and hence an upper bound on the possible amount of measurement error. Given its simplicity, we doubt that we are the first to notice this additional bound. Nevertheless, to the best of our knowledge, it has not appeared in the literature. Taking the best of these two bounds, we have
\[ \max\left\{\rho_{zx}^2, \rho_{yx}^2\right\} \le \kappa \le 1 \tag{16} \]
Recall that κ is inversely related to the measurement error variance σw2: larger values of κ correspond to less measurement error. We see from the bound in Equation 16 that larger values of
either the first-stage or OLS R-squared leave less room for measurement error. This
is important because applied econometricians often argue that their data is subject to
large measurement error to explain a large discrepancy between OLS and IV estimates,
but we are unaware of any cases in which this belief is confronted with these restrictions.
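Computing the bound in Equation 16 requires nothing more than two squared sample correlations. The helper below is a minimal sketch of ours (the function name and the simulated data are purely illustrative, not from the paper) showing how a researcher might check how much measurement error the data allow before appealing to attenuation bias.

```python
import numpy as np

def kappa_lower_bound(x, y, z):
    """Return max(rho_zx^2, rho_yx^2), the lower bound for kappa in Equation 16."""
    rho_zx = np.corrcoef(z, x)[0, 1]
    rho_yx = np.corrcoef(y, x)[0, 1]
    return max(rho_zx**2, rho_yx**2)

# Example with simulated data (illustrative values): if the bound is, say, 0.5,
# then at most 50% of the variance of the observed regressor can be noise.
rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)
print(kappa_lower_bound(x, y, z))
```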
Before proceeding to solve for the identified set, we derive one further bound from
the requirement that Ω – the covariance matrix of the model primitives (u, v, w, z) –
be positive definite. At first glance it might appear that this restriction merely ensures
that variances are positive and correlations bounded above by one in absolute value.
Recall, however, that Equation 4 imposes a considerable degree of structure on Ω. In
particular, many of its elements are assumed to equal zero. Consider the restriction
|Ω| > 0. This implies
\[ \sigma_w^2\left[\left(\sigma_u^2\sigma_v^2 - \sigma_{uv}^2\right)\sigma_z^2 - \sigma_v^2\sigma_{uz}^2\right] > 0 \]
but since σw2 > 0, this is equivalent to
\[ \sigma_z^2\left(\sigma_u^2\sigma_v^2 - \sigma_{uv}^2\right) > \sigma_v^2\sigma_{uz}^2 \]
¹To see this, suppose that ρx∗u = 0 and, without loss of generality, that β is positive. Then Equation 7 gives βOLS = κβ < β. Multiplying both sides of κ > ρ2xy by β and rearranging gives β < βOLS/ρ2xy, and hence βOLS < β < βOLS/ρ2xy.
Dividing both sides through by σu2σz2σv2 and rearranging, we find that
\[ \rho_{uv}^2 + \rho_{uz}^2 < 1 \tag{17} \]
In other words (ρuz , ρuv ) must lie within the unit circle: if one of the correlations is
very large in absolute value, the other cannot be. To understand the intuition behind
this constraint, recall that since v is the residual from the projection of x∗ onto z, it
is uncorrelated with z by construction. Now suppose that ρuz = 1. If ρuv were also
equal to one, we would have a contradiction: z and v would be perfectly correlated.
The constraint given in Inequality 17 rules this out.
As explained above, we will characterize the identified set in terms of ρx∗ u , ρzu and
κ, eliminating ρuv from the system. Thus, we need to restate Inequality 17 so that it
no longer involves ρuv . To accomplish this, first write
\[ \rho_{x^*u} = \frac{\sigma_v}{\sigma_{x^*}}\,\rho_{uv} + \frac{\pi\sigma_z}{\sigma_{x^*}}\,\rho_{uz} \]
and then note that σv/σx∗ = √(1 − ρ2xz/κ) and πσz/σx∗ = √(ρ2xz/κ) using Equation 15 and the definition of κ. Combining,
\[ \rho_{x^*u} = \sqrt{1 - \rho_{xz}^2/\kappa}\;\rho_{uv} + \sqrt{\rho_{xz}^2/\kappa}\;\rho_{uz} \tag{18} \]
and solving for ρuv,
\[ \rho_{uv} = \frac{\rho_{x^*u}\sqrt{\kappa} - \rho_{uz}\rho_{xz}}{\sqrt{\kappa - \rho_{xz}^2}} \tag{19} \]
so we can re-express the constraint from Inequality 17 as
\[ \left(\frac{\rho_{x^*u}\sqrt{\kappa} - \rho_{uz}\rho_{xz}}{\sqrt{\kappa - \rho_{xz}^2}}\right)^2 + \rho_{uz}^2 < 1 \tag{20} \]

4.2 Solving for the Identified Set
We now provide a characterization of the identified set by solving for ρuz in terms of
ρx∗u, κ and the observables contained in Σ. Rewriting Equation 8, we have
\[ \beta = \frac{\sigma_{zy} - \sigma_{uz}}{\sigma_{zx}} \tag{21} \]
and proceeding similarly for Equation 7,
\[ \beta = \frac{\sigma_{xy} - \sigma_{x^*u}}{\kappa\sigma_x^2} \tag{22} \]
Combining Equations 21 and 22, we have
\[ \frac{\sigma_{zy} - \sigma_{uz}}{\sigma_{zx}} = \frac{\sigma_{xy} - \sigma_{x^*u}}{\kappa\sigma_x^2} \tag{23} \]
Now, using Equations 4 and 6, the variance of y can be expressed as
\[ \sigma_y^2 = \sigma_u^2 + \beta\left(2\sigma_{x^*u} + \beta\kappa\sigma_x^2\right) \]
Substituting Equation 21 for β and Equation 22 for βκσx2, and rearranging,
\[ \sigma_u^2 - \sigma_y^2 + \left(\frac{\sigma_{zy} - \sigma_{uz}}{\sigma_{zx}}\right)\left(\sigma_{x^*u} + \sigma_{xy}\right) = 0 \tag{24} \]
The next step is to eliminate σu from our system of equations. First we substitute
\[ \sigma_{x^*u} = \sqrt{\kappa}\,\sigma_x\sigma_u\rho_{x^*u}, \qquad \sigma_{uz} = \sigma_u\sigma_z\rho_{uz} \]
into Equations 23 and 24, yielding
\[ \sigma_u^2 - \sigma_y^2 + \left(\frac{\sigma_{zy} - \sigma_u\sigma_z\rho_{uz}}{\sigma_{zx}}\right)\left(\sqrt{\kappa}\,\sigma_x\sigma_u\rho_{x^*u} + \sigma_{xy}\right) = 0 \tag{25} \]
and
\[ \frac{\sigma_{zy} - \sigma_u\sigma_z\rho_{uz}}{\sigma_{zx}} = \frac{\sigma_{xy} - \sqrt{\kappa}\,\sigma_x\sigma_u\rho_{x^*u}}{\kappa\sigma_x^2} \tag{26} \]
Rearranging Equation 26 and solving for σu , we find that
\[ \sigma_u = \frac{\sigma_{zx}\sigma_{xy} - \kappa\sigma_x^2\sigma_{zy}}{\sigma_x\left(\sqrt{\kappa}\,\sigma_{xz}\rho_{x^*u} - \kappa\sigma_x\sigma_z\rho_{uz}\right)} \tag{27} \]
Since we have stated the problem in terms of scale-free structural parameters,
namely (ρzu , ρx∗ u , κ), we may assume without loss of generality that σx = σy = σz .
Even if the raw data do not satisfy this assumption, the identified set for the structural
parameters is unchanged. Imposing this normalization, the equation for the identified
set becomes
\[ \tilde\sigma_u^2 - 1 + \left(\frac{\rho_{zy} - \tilde\sigma_u\rho_{uz}}{\rho_{zx}}\right)\left(\sqrt{\kappa}\,\tilde\sigma_u\rho_{x^*u} + \rho_{xy}\right) = 0 \tag{28} \]
where
\[ \tilde\sigma_u = \frac{\rho_{xz}\rho_{xy} - \kappa\rho_{zy}}{\sqrt{\kappa}\,\rho_{xz}\rho_{x^*u} - \kappa\rho_{uz}} \tag{29} \]
We use the notation σ̃u to indicate that normalizing y to have unit variance does change the scale of σu. Specifically, σ̃u = σu/σy. This does not introduce any complications because we eliminate σ̃u from the system by substituting Equation 29 into Equation 28. Note, however, that when √κ ρuz = ρx∗u ρxz, Equation 27 has a singularity.
After eliminating σ̃u, Equation 28 becomes a quadratic in ρzu with coefficients that depend on the structural parameters (ρx∗u, κ) and the reduced form correlations (ρxy, ρxz, ρzy). Solving, we find that
\[ \left(\rho_{uz}^{+},\, \rho_{uz}^{-}\right) = \frac{\rho_{x^*u}\rho_{xz}}{\sqrt{\kappa}} \pm \left(\rho_{xy}\rho_{xz} - \kappa\rho_{zy}\right)\sqrt{\frac{1 - \rho_{x^*u}^2}{\kappa\left(\kappa - \rho_{xy}^2\right)}} \tag{30} \]
Notice that the fraction under the square root is always positive, so both solutions are
always real. This follows because ρ2x∗ u must be between zero and one and, as we showed
above, κ > ρ2xy . Although the preceding expression always yields two real solutions,
one of these is extraneous as it implies a negative value for σ̃u. To see why this is the case, substitute each solution into the reciprocal of Equation 29. We have
\[
\begin{aligned}
\tilde\sigma_u^{-1} &= \frac{\sqrt{\kappa}\,\rho_{xz}\rho_{x^*u}}{\rho_{xy}\rho_{xz} - \kappa\rho_{zy}} - \frac{\kappa}{\rho_{xy}\rho_{xz} - \kappa\rho_{zy}}\left[\frac{\rho_{x^*u}\rho_{xz}}{\sqrt{\kappa}} \pm \left(\rho_{xy}\rho_{xz} - \kappa\rho_{zy}\right)\sqrt{\frac{1 - \rho_{x^*u}^2}{\kappa\left(\kappa - \rho_{xy}^2\right)}}\,\right] \\
&= \left[\frac{\sqrt{\kappa}\,\rho_{xz}\rho_{x^*u}}{\rho_{xy}\rho_{xz} - \kappa\rho_{zy}} - \frac{\sqrt{\kappa}\,\rho_{xz}\rho_{x^*u}}{\rho_{xy}\rho_{xz} - \kappa\rho_{zy}}\right] \mp \sqrt{\frac{\kappa\left(1 - \rho_{x^*u}^2\right)}{\kappa - \rho_{xy}^2}} \\
&= \mp\sqrt{\frac{\kappa\left(1 - \rho_{x^*u}^2\right)}{\kappa - \rho_{xy}^2}}
\end{aligned}
\]
Since the quantity inside the square root is necessarily positive given the constraints
on correlations and κ, we see that ρ⁺uz is always extraneous. Thus, the only admissible solution is
\[ \rho_{uz} = \frac{\rho_{x^*u}\rho_{xz}}{\sqrt{\kappa}} - \left(\rho_{xy}\rho_{xz} - \kappa\rho_{zy}\right)\sqrt{\frac{1 - \rho_{x^*u}^2}{\kappa\left(\kappa - \rho_{xy}^2\right)}} \tag{31} \]
Along with Inequalities 16 and 20, and the requirement that correlations be less than
one in absolute value, Equation 31 gives a complete characterization of the identified
set. Given a triple (ρzu, ρx∗u, κ) and values for the elements (σx, σy, σz, ρxy, ρxz, ρyz) of the observable covariance matrix Σ, we can solve for the implied value of β using Equation 21. Specifically,
\[ \beta = \frac{\sigma_y}{\sigma_x}\left(\frac{\rho_{yz} - \tilde\sigma_u\rho_{zu}}{\rho_{xz}}\right) \tag{32} \]
using the fact that σ̃u = σu/σy, where σ̃u is the standard deviation of the main equation error term from the normalized system, as given in Equation 29, and σu is the standard deviation of the main equation error term from the original system. Notice that ρx∗u and κ enter Equation 32 through σ̃u. This fact highlights the central point of our
analysis: even though exact knowledge of σuz alone would be sufficient to correct the
IV estimator, yielding a consistent estimator of β, stating beliefs about this quantity
alone does not provide a satisfactory solution to the identification problem. For one,
because it depends on the scaling of both z and u, it may be difficult to elicit beliefs
about σuz . Although we can learn σz from the data, σu can only be estimated if we have
resolved the identification problem. In contrast, ρzu , our preferred parameterization, is
scale-free. More importantly, however, the form of the identified set makes it clear that
our beliefs about ρuz are constrained by any beliefs we may have about ρx∗ u and κ. This
observation has two important consequences. First, it provides us with the opportunity
to incorporate our beliefs about measurement error and the endogeneity of the regressor
to improve our estimates. Failing to use this information is like leaving money on the
table. Second, it disciplines our beliefs to prevent us from reasoning to a contradiction.
Without knowledge of the form of the identified set, applied researchers could easily
state beliefs that are mutually incompatible without realizing it. Our analysis provides
a tool for them to realize this and adjust their beliefs accordingly. While we have thus
far discussed only beliefs about ρzu , ρx∗ u and κ, one could also work backwards from
beliefs about β to see how they constrain the identified set. We explore this possibility
in one of our examples below.
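The mapping from a point of the identified set to the implied causal effect is straightforward to implement. The sketch below (ours, not the authors' code; the function names and the illustrative correlations are our own) evaluates Equation 31 for ρuz, Equation 29 for σ̃u, and Equation 32 for β. As a sanity check, setting ρx∗u = 0 and κ = 1, so that there is neither endogeneity of x∗ nor measurement error, returns the OLS slope.

```python
import numpy as np

def rho_uz_from(rho_xsu, kappa, rho_xy, rho_xz, rho_zy):
    """Equation 31: the admissible solution for rho_uz."""
    return (rho_xsu * rho_xz / np.sqrt(kappa)
            - (rho_xy * rho_xz - kappa * rho_zy)
            * np.sqrt((1 - rho_xsu**2) / (kappa * (kappa - rho_xy**2))))

def beta_from(rho_xsu, kappa, rho_xy, rho_xz, rho_zy, s_x, s_y):
    """Equations 29 and 32: the implied causal effect beta."""
    rho_uz = rho_uz_from(rho_xsu, kappa, rho_xy, rho_xz, rho_zy)
    sigma_u_tilde = ((rho_xz * rho_xy - kappa * rho_zy)
                     / (np.sqrt(kappa) * rho_xz * rho_xsu - kappa * rho_uz))
    return (s_y / s_x) * (rho_zy - sigma_u_tilde * rho_uz) / rho_xz

# Purely illustrative correlations: with rho_x*u = 0 and kappa = 1 (no
# endogeneity of x*, no measurement error) the implied beta equals the OLS
# slope, which is rho_xy = 0.4 when sigma_x = sigma_y = 1.
print(beta_from(rho_xsu=0.0, kappa=1.0, rho_xy=0.4, rho_xz=0.3, rho_zy=0.2,
                s_x=1.0, s_y=1.0))
```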
5 Bayesian Inference for the Identified Set
Having characterized the identified set, the usual frequentist approach would be to use it to derive bounds for β, possibly after imposing sign or interval restrictions on ρzu, ρx∗u and κ. In its broad strokes, we agree with this approach: it makes sense to report the full range of possible values for β, and the prior beliefs that researchers commonly state often take the form of sign restrictions. But bounds on β tell only part of the story. The identified set is a two-dimensional surface of which the usual partial identification bounds consider only the two worst-case points. While it may well be difficult to specify an informative prior on the identified set for (ρzu, ρx∗u, κ), it is
surely relevant to consider what fraction of the points in this set lead to a particular
value for β. Given that the partial identification bounds for β could easily map back
to extremely atypical values for (ρzu , ρx∗ u , κ), it would seem odd not to find some way
of averaging over the information contained in the entire identified set. Accordingly,
we adopt a suggestion from Moon and Schorfheide (2012) and place a uniform prior
on the identified set conditional on the observable covariance matrix Σ.
Choosing a prior to represent “ignorance” is always somewhat contentious as a prior
that is flat in one parameterization can be highly informative in another. As explained
above, we believe that there are compelling reasons to parameterize the problem in
terms of ρzu , ρx∗ u and κ: they are scale-free, empirically meaningful quantities about
which researchers are naturally inclined to state beliefs. In most situations, however,
these beliefs will be fairly vague. And indeed, specifying an informative prior on the
identified set may be challenging. An advantage of our proposed conditionally uniform
prior is that it remains uniform after imposing interval or sign restrictions by “cutting
off” sections of the identified set. In this way, we can allow researchers to impose beliefs
on the problem without the need to specify a density supported on a complicated
two-dimensional region embedded in three-dimensional space. Moreover, there is no
need to take the uniform prior literally in this context. Instead, one can view it as a
starting point. For example, one can pose the question of what kind of deviation from
uniformity would be necessary to encode particular beliefs about β. We will consider this
possibility in one of our empirical examples below.
The analysis of the preceding section took Σ as known, but in practice it must be
estimated from sample data. As such there is not a single identified set but an identified
set for each possible Σ. Thus, having stated a conditional prior for ρzu , ρx∗ u , κ, it
remains to decide how to bring sampling uncertainty in the observable covariance matrix Σ
into the problem. As our aim is to appeal to applied researchers who may not typically
rely on Bayesian methods, the ideal would be a minimally informative, default prior
that closely approximates the usual frequentist inference for the identified parameters.
We are currently exploring various possibilities to achieve this goal. In the interim, and for the purposes of this draft, we specify a multivariate normal likelihood for (x, y, z) and a Jeffreys prior for Σ. Specifically, for i = 1, . . . , n we suppose


\[ \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix} \overset{iid}{\sim} N_3(\mu, \Sigma) \tag{33} \]
\[ \pi(\mu, \Sigma) \propto |\Sigma|^{-2} \tag{34} \]
leading to the marginal posterior
\[ \Sigma \mid x, y, z \sim \text{Inverse-Wishart}(n - 1, S) \tag{35} \]
where
\[ S = \sum_{i=1}^{n} \begin{pmatrix} x_i - \bar{x} \\ y_i - \bar{y} \\ z_i - \bar{z} \end{pmatrix}\begin{pmatrix} x_i - \bar{x} & y_i - \bar{y} & z_i - \bar{z} \end{pmatrix} \tag{36} \]
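In practice the posterior draws for Σ can be generated with standard software. The sketch below is ours and assumes that scipy's Inverse-Wishart parameterization matches the one intended in Equation 35; the simulated data are purely illustrative.

```python
import numpy as np
from scipy.stats import invwishart

def draw_sigma_posterior(x, y, z, n_draws=1000, seed=0):
    """Draw from Sigma | x, y, z ~ Inverse-Wishart(n - 1, S), Equations 33-36."""
    data = np.vstack([x, y, z])                    # 3 x n array of observables
    n = data.shape[1]
    centered = data - data.mean(axis=1, keepdims=True)
    S = centered @ centered.T                      # Equation 36
    return invwishart.rvs(df=n - 1, scale=S, size=n_draws, random_state=seed)

# Illustrative usage with simulated data:
rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y = 1.0 * x + rng.normal(size=n)
draws = draw_sigma_posterior(x, y, z, n_draws=1000)
print(draws.shape)        # (1000, 3, 3): one covariance matrix per posterior draw
```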
To generate uniform draws on the identified set conditional on a given posterior draw Σ(ℓ), we employ a two-stage accept-reject algorithm. We begin the first step by drawing κj ∼ Uniform(κL, κU) independently of ρjx∗u ∼ Uniform(ρLx∗u, ρUx∗u). Absent any prior restrictions that further restrict the support of κ or ρx∗u, we take κL = max{ρ2zx, ρ2xy} (computed from Σ(ℓ)), κU = 1, ρLx∗u = −1 and ρUx∗u = 1. We then solve for ρjzu via Equation 31 and check whether it lies in the interval [ρLzu, ρUzu]. Absent any prior restrictions on ρzu, we take this interval to be [−1, 1]. If ρjzu lies in this region and if the triple (ρjzu, ρjx∗u, κj) satisfies Inequality 20, we accept draw j; otherwise we reject it. We repeat this process until we have J draws on the identified set. While these draws are uniform when projected onto the (κ, ρx∗u) plane, they are not uniform on the identified set itself. To make them uniform, we need to re-weight each draw based on the local surface area of the identified set at that point. By “local surface area” we refer to the quantity
\[ M(\rho_{x^*u}, \kappa) = \sqrt{1 + \left(\frac{\partial\rho_{uz}}{\partial\rho_{x^*u}}\right)^2 + \left(\frac{\partial\rho_{uz}}{\partial\kappa}\right)^2} \tag{37} \]
which Apostol (1969) calls the “local magnification factor” of a parametric surface.
The derivatives required to evaluate the function M are
\[ \frac{\partial\rho_{uz}}{\partial\rho_{x^*u}} = \frac{\rho_{xz}}{\sqrt{\kappa}} + \frac{\rho_{x^*u}\left(\rho_{xy}\rho_{xz} - \kappa\rho_{zy}\right)}{\sqrt{\kappa\left(\kappa - \rho_{xy}^2\right)\left(1 - \rho_{x^*u}^2\right)}} \tag{38} \]
and
\[ \frac{\partial\rho_{uz}}{\partial\kappa} = -\frac{\rho_{x^*u}\rho_{xz}}{2\kappa^{3/2}} + \sqrt{\frac{1 - \rho_{x^*u}^2}{\kappa\left(\kappa - \rho_{xy}^2\right)}}\left[\rho_{zy} + \frac{1}{2}\left(\rho_{xy}\rho_{xz} - \kappa\rho_{zy}\right)\left(\frac{1}{\kappa} + \frac{1}{\kappa - \rho_{xy}^2}\right)\right] \tag{39} \]
To accomplish the re-weighting, we first evaluate Mj = M(ρjx∗u, κj) at each draw j that was accepted in the first step. We then calculate Mmax = maxj=1,...,J Mj and resample the draws (ρjzu, ρjx∗u, κj) with probability pj = Mj/Mmax.
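The sketch below (ours, not the authors' code; helper names and the illustrative correlations are our own) implements the two-stage scheme just described for a single draw of Σ: uniform proposals for (κ, ρx∗u), Equation 31 for ρuz, the checks based on Equation 16 and Inequality 20, and re-weighting by the local magnification factor of Equations 37-39. Sign or interval restrictions on ρzu, ρx∗u or κ amount to shrinking the corresponding proposal ranges or acceptance intervals.

```python
import numpy as np

def sample_identified_set(rho_xy, rho_xz, rho_zy, n_keep=5000, seed=0):
    rng = np.random.default_rng(seed)
    kappa_lo = max(rho_xz**2, rho_xy**2)           # Equation 16
    draws = []
    while len(draws) < n_keep:
        kappa = rng.uniform(kappa_lo, 1.0)
        rho_xsu = rng.uniform(-1.0, 1.0)
        # Equation 31
        root = np.sqrt((1 - rho_xsu**2) / (kappa * (kappa - rho_xy**2)))
        rho_uz = rho_xsu * rho_xz / np.sqrt(kappa) - (rho_xy * rho_xz - kappa * rho_zy) * root
        if not -1.0 < rho_uz < 1.0:
            continue
        # Inequality 20 (via Equation 19)
        rho_uv = (rho_xsu * np.sqrt(kappa) - rho_uz * rho_xz) / np.sqrt(kappa - rho_xz**2)
        if rho_uv**2 + rho_uz**2 >= 1.0:
            continue
        # Local magnification factor, Equations 37-39
        d_rho = rho_xz / np.sqrt(kappa) + rho_xsu * (rho_xy * rho_xz - kappa * rho_zy) / np.sqrt(
            kappa * (kappa - rho_xy**2) * (1 - rho_xsu**2))
        d_kap = (-rho_xsu * rho_xz / (2 * kappa**1.5)
                 + root * (rho_zy + 0.5 * (rho_xy * rho_xz - kappa * rho_zy)
                           * (1 / kappa + 1 / (kappa - rho_xy**2))))
        M = np.sqrt(1 + d_rho**2 + d_kap**2)
        draws.append((rho_uz, rho_xsu, kappa, M))
    draws = np.array(draws)
    # Re-weight so the retained draws are uniform on the surface itself.
    keep = rng.uniform(size=len(draws)) < draws[:, 3] / draws[:, 3].max()
    return draws[keep, :3]

points = sample_identified_set(rho_xy=0.4, rho_xz=0.3, rho_zy=0.2)
print(points.shape)       # rows are (rho_zu, rho_x*u, kappa) on the identified set
```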
6 Empirical Examples
We now consider two simple empirical examples illustrating the methods proposed
above: the first considers the effect of institutions on income per capita, and the
second considers the returns to schooling.
6.1 The Colonial Origins of Comparative Development
We begin by considering the main specification of Acemoglu et al. (2001), who use
early settler mortality as an instrument to study the effect of institutions on GDP per
capita based on cross-country data for a sample of 64 countries. The main equation is
log GDP/capita = constant + β (Institutions) + u
and the first stage is
Institutions = constant + π (log Settler Mortality) + v
This specification yields an OLS estimate of β̂OLS = 0.52 and an IV estimate that is nearly twice as large (β̂IV = 0.94), a difference which the authors attribute to measurement error:

This estimate is highly significant . . . and in fact larger than the OLS estimates reported in Table 2. This suggests that measurement error in the institutions variables that creates attenuation bias is likely to be more important than reverse causality and omitted variables biases.
But can measurement error really explain this disparity, or is something else to blame?
Figure 1 presents two views of the identified set evaluated at the maximum likelihood
estimate for Σ, imposing no prior information on the problem. Figure 2 depicts each
two-dimensional projection of the same set. The points in red correspond to values of κ that are less than 0.6.
Without prior restrictions, the identified set is not particularly informative although
it does rule out especially large amounts of measurement error: the minimum value
of κ consistent with the data (at the MLE) is around 0.5. Figure 3 maps the points
on the identified set to the corresponding values of β. The panel at left is based on
the identified set at the MLE, while the panel at right averages over 1000 identified
sets corresponding to the Inverse-Wishart draws depicted in Figure 4. The posterior
mean value of β in this case is quite close to the IV estimate and far above the OLS
estimate. Moreover, the posterior is heavily concentrated around positive values of β.
Even if you do not believe our uniform conditional prior, it would be difficult to obtain a posterior that assigned substantial probability to negative values of β in this case.
In their paper, however, Acemoglu et al. (2001) state a number of beliefs that are
relevant for this exercise. First, they claim that there is likely a positive correlation
between “true” institutions and the main equation error term u. Second, by way of
a footnote that uses a second measure of institutions as an instrument for the first,
they argue that measurement error could be substantial enough to yield a value of κ as
small as 0.6. This would correspond to 40% of the variation in the observed measure
of institutions being noise. Accordingly, Figures 5 and 6 restrict the identified set to
impose these constraints.
Even after imposing these relatively weak beliefs, the picture changes dramatically.
From the rightmost panel of Figure 5, we see that Settler Mortality cannot be a valid
instrument: if we believe that ρx∗u is positive and that κ is at least 0.6, then ρzu must
be negative. Turning our attention to Figure 6, the posterior for β is now concentrated
around the OLS estimate. Indeed, the IV estimate is at the edge of being infeasible
given the data. At the very least it is likely to be a substantial overestimate. Nevertheless, the main result of Acemoglu et al. (2001) continues to hold: in spite of the
fact that Settler Mortality is negatively correlated with u, from this exercise it appears
that the effect of institutions on income per capita is almost certainly positive.
6.2 The Returns to Schooling
Our second example uses a subset of data from Blackburn and Neumark (1992) to
study the returns to schooling based on a sample of 935 US males. The main equation
is
log Wage = constant + β(Education) + u
and the first stage is
Education = constant + π(Siblings) + v
The variable Education measures an individual's years of schooling, and Siblings measures the number of brothers and sisters that he has. The estimated first stage coefficient in this example is π̂ = −0.23 while the OLS and IV estimates are β̂OLS = 0.06 and β̂IV = 0.12. As in the Colonial Origins example, the IV estimate is much larger than
the OLS estimate: a 12% increase in wages per additional year of schooling compared
to a 6% increase. Could measurement error be to blame?
Figure 7 presents two views of the identified set, evaluated at the MLE for Σ,
imposing only the requirement that κ > 0.1 to avoid numerical problems. (Since this lower bound allows up to 90% of the observed variation in years of schooling to be noise, it may be considered a fairly innocuous restriction.) Note how different the identified set appears in this example compared to the Colonial Origins example. Here, the data do not rule out any values of κ, and the restriction κ > 0.1 binds. Figure 8
gives the corresponding posterior for β: the panel at left ignores sampling variability,
considering the identified set at the MLE for Σ, while the panel at right averages over
the 1000 Inverse-Wishart draws depicted in Figure 9. With nearly 1000 observations
and only six quantities to estimate, sampling variability has no appreciable impact in
this example, unlike the Colonial Origins example from above. But more importantly,
the identified set in this example is almost completely uninformative: wage differences of anywhere from −300% to +300% per additional year of schooling appear to be consistent with the
data. Indeed, on this scale, the differences between the OLS and IV estimates are
trivial. Perhaps imposing prior beliefs can help.
The key unobservable that makes up u is almost certainly ability, which we would
suspect is positively correlated with years of schooling. Because of mis-reporting, it is
likely that years of schooling is measured with error but it seems extreme to entertain
a value of κ below 0.5, as this would correspond to more than half of the observed
variation in years of schooling being noise. But what about the instrument, Siblings?
There is certainly reason to suspect that it could be correlated with ability, u. For
example, parents with more children likely have less time to spend with each of them
and this may cause a negative correlation between Siblings and u. Alternatively, one
could imagine that older siblings supplement parental attention and thereby increase
the ability of their younger siblings. This story would result in a positive correlation
between Siblings and ability. Based on this reasoning, we now consider imposing the
restrictions κ > 0.5 and ρx∗ u > 0. Because it is unclear what sign to expect for
ρzu , we leave this parameter unconstrained. Figure 10 gives the posterior for β after
restricting the identified set so that κ > 0.5 and ρx∗u > 0. Surprisingly, the effect of this restriction is not to rule out extremely large negative effects of schooling on wages, but rather to rule out nearly all positive effects: wage declines of 100 or even 200% still appear
to be consistent with the data.
Surely something must be amiss: we have a very strong prior belief that the returns
to education should not be negative. To understand what is happening here, we plot
both the restricted and unrestricted identified sets using the color red to denote a point
that maps into a positive value of β: Figure 11 presents the three-dimensional version
while Figure 12 presents the two-dimensional projections. From the Figures we see
that, while a majority of the unrestricted identified set maps into positive values for
β, nearly all of these points correspond to extremely small values of κ and negative
values for ρx∗ u . After imposing the restrictions κ > 0.5 and ρx∗ u > 0, hardly any of the
red points remain.
In this example the results are essentially negative: we do not learn anything meaningful about the returns to education. Nevertheless, we still uncovered something valuable: a contradiction in our beliefs. The belief that ρx∗ u is positive and κ isn’t too small
is effectively incompatible with the belief that the returns to education are positive in
this example. Something is clearly wrong: either with our beliefs or with our maintained assumptions – for example the model specification and the assumption that the
measurement error is classical – but this was not obvious until after we examined the
identified set and posterior for β.
References
Acemoglu, D., Johnson, S., Robinson, J. A., 2001. The colonial origins of comparative
development: An empirical investigation. The American Economic Review 91 (5),
1369–1401.
Apostol, T. M., 1969. Calculus, 2nd Edition. Vol. II. John Wiley and Sons, New York.
Blackburn, M., Neumark, D., 1992. Unobserved ability, efficiency wages, and interindustry wage differentials. The Quarterly Journal of Economics 107 (4), 1421–
1436.
Conley, T. G., Hansen, C. B., Rossi, P. E., 2012. Plausibly exogenous. The Review of
Economics and Statistics 94 (1), 260–272.
Moon, H. R., Schorfheide, F., 2009. Estimation with overidentifying inequality moment
conditions. Journal of Econometrics 153, 136–154.
Moon, H. R., Schorfheide, F., 2012. Bayesian and frequentist inference in partially
identified models. Econometrica 80 (2), 755–782.
Nevo, A., Rosen, A. M., 2012. Identification with imperfect instruments. The Review
of Economics and Statistics 94 (3), 659–671.
Poirier, D. J., 1998. Revising beliefs in nonidentified models. Econometric Theory 14,
483–509.
Figure 1: Identified set for Colonial Origins Example: No prior constraints on
ρzu , ρx∗ u , κ.
Figure 2: Identified Set for Colonial Origins Example: No prior constraints on
ρzu , ρx∗ u , κ, values of κ less than 0.6 in red.
Figure 3: Posterior Draws for β in the Colonial Origins Example: no prior constraints on ρzu, ρx∗u, κ. The left panel (“Conditional Prior (at MLE)”) ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel (“Full Posterior”) averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 4. The OLS and IV estimates are marked in each panel.
Figure 4: Posterior Draws for Σ in the Colonial Origins Example, based on 1000 posterior draws. Panels show Var(x), Cov(y,x), Cov(z,x), Var(y), Cov(z,y) and Var(z); the red vertical line in each panel marks the MLE.
Figure 5: Identified Set for Colonial Origins Example: κ constrained to be greater than
0.6 and ρx∗ u constrained to be positive.
Figure 6: Posterior Draws for β in the Colonial Origins Example: κ constrained to be greater than 0.6 and ρx∗u constrained to be positive. The left panel (“Conditional Prior (at MLE)”) ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel (“Full Posterior”) averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 4. The OLS and IV estimates are marked in each panel.
Figure 7: Identified Set for Returns to Schooling Example: No prior constraints on
ρzu , ρx∗ u , κ.
Figure 8: Posterior Draws for β in the Returns to Schooling Example: no prior constraints on ρzu, ρx∗u, κ. The left panel (“Conditional Prior (at MLE)”) ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel (“Full Posterior”) averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 9. The OLS and IV estimates are marked in each panel.
Figure 9: Posterior Draws for Σ in the Returns to Schooling Example, based on 1000 posterior draws. Panels show Var(x), Cov(y,x), Cov(z,x), Var(y), Cov(z,y) and Var(z); the red vertical line in each panel marks the MLE.
Figure 10: Posterior Draws for β in the Returns to Schooling Example: κ constrained to be greater than 0.5 and ρx∗u constrained to be positive. The left panel (“Conditional Prior (at MLE)”) ignores uncertainty in Σ and evaluates the identified set at the MLE Σ̂. The right panel (“Full Posterior”) averages over 1000 identified sets corresponding to the posterior draws for Σ illustrated in Figure 9. The OLS and IV estimates are marked in each panel.
Figure 11: Identified Set for Returns to Schooling Example: red points correspond to positive values of β. The top panel imposes no constraints on κ, ρzu, ρx∗u while the bottom panel constrains κ to be greater than 0.5 and ρx∗u to be positive.
Figure 12: Identified Set for Returns to Schooling Example: red points correspond to positive values of β. The top panel imposes no constraints on κ, ρzu, ρx∗u while the bottom panel constrains κ to be greater than 0.5 and ρx∗u to be positive.