
Estimation of Sample Selection Models with Spatial
Dependence∗
Alfonso Flores-Lagunes†
Kurt Erik Schnier‡
Preliminary and Incomplete Draft
August 13, 2004
Do not quote or circulate without permission of the authors
Abstract
We consider the estimation of sample selection (type II Tobit) models that exhibit
spatial dependence. Two types of spatial dependence are considered: spatial error
dependence (or spatial autoregressive error model, SAE) and spatial lag dependence
(or spatial autoregressive model, SAL).
The method considered is motivated by a two-step strategy analogous to the popular
heckit model. The first step of estimation is based on a spatial probit model following a
methodology proposed by Pinkse and Slade (1998) that yields consistent, although not
fully eﬃcient, estimates. The consistent estimates of the selection equation are used to
estimate the inverse Mills ratio (IMR) to be included as a regressor in the estimation of
the outcome equation. Since the appropriate IMR turns out to depend on a parameter
from the second step under SAE, the models are estimated jointly, which we propose
to do in a GMM context.
We explore the large sample properties of the proposed estimator and undertake a
Monte Carlo experiment to assess its performance in finite samples. Finally, we discuss
the importance of the spatial sample selection model in applied work and illustrate
its application with a censored sample of fishermen in the Eastern Bering Sea used in
Schnier (2003).
Key words and phrases: sample selection, spatial dependence, generalized method
of moments.
JEL classification: C13, C15, C24, C49
∗ We want to thank Dan Ackerberg, Carlos Flores and Kei Hirano for useful discussions, and seminar
participants at the department of Agricultural & Resource Economics at the University of Arizona and the
2003 Midwest Econometrics Group meetings. All errors are our own.
† Department of Economics, University of Arizona. [email protected]
‡ Department of Environmental and Natural Resource Economics, University of Rhode Island.
[email protected]
1 Introduction
Econometric models that take into account spatial interactions among economic units have
been increasingly used by economists over the last several years.1 The diﬀerent approaches
for undertaking estimation and inference in linear regression models with spatial eﬀects are
well known and have been summarized in a number of papers by Anselin (1988, 2001),
Anselin and Bera (1998), and others.
The estimation of nonlinear models that include spatial interactions, in particular limited
dependent variable models, is not as well developed as that for linear models. In fact, only
recently have methods for estimating and conducting statistical inference in spatial models
with limited dependent variables been proposed. This literature has concentrated mainly
on the probit model with spatial effects, as in Case (1992), Beron and Vijverberg (1999),
Fleming (2002), LeSage (2000), McMillen (1992), and Pinkse and Slade (1998).
In this paper we add to the literature on the estimation of limited dependent variable
models with spatial eﬀects by introducing a sample selection model with spatial dependence
and proposing a method for its estimation. The type of sample selection model considered is
the widely used heckit model (Heckman, 1976, 1979), also known as the Tobit type II model,
in the terminology of Amemiya (1985).2 As for spatial dependence, we consider two separate
structures: the spatial autoregressive error (SAE) model and the spatial autoregressive lag
(SAL) model.
Our estimation strategy can be thought of as a two-step procedure analogous to the
popular heckit model that is estimated jointly as a "pseudo" sequential estimator using GMM
(Newey, 1984). The first step of estimation is based on a spatial probit model following a
methodology by Pinkse and Slade (1998), which yields consistent, although not fully eﬃcient,
estimates of the selection equation. As in the heckit procedure, the consistent estimates of
the selection equation are used to estimate the inverse Mills ratio (IMR) to be included in the
estimation of the outcome equation. Since in the presence of spatial error dependence the
IMR depends upon unknown parameters from the outcome equation, the model is estimated
jointly.
The estimation of a probit model with spatial dependence introduces a non-spherical
variance-covariance matrix that renders the simple probit estimator inconsistent and
inefficient. In turn, to obtain consistent and fully efficient estimates, one has to deal with
multidimensional integrals. For instance, LeSage (2000) and Beron and Vijverberg (2003)
employ simulation methods to approximate such multidimensional integrals and obtain
consistent and efficient parameter estimates of the spatial probit model. Unfortunately, the
simulation of multidimensional integrals is computationally intensive, which restricts
estimation to moderate sample sizes. This same limitation would apply to the use of simulation
methods to approximate the multidimensional integrals in the likelihood functions of sample
selection models with spatial dependence.3
1 Some examples are Case (1991), Fishback, Horrace and Kantor (2002), and Topa (2001), among many others.
2 For a review of these methods see Maddala (1983) and Vella (1998).
In an attempt to avoid approximating multidimensional integrals but achieve consistency of the probit estimates at the expense of eﬃciency, some authors have addressed the
heteroskedasticity induced into the probit model by the spatial dependence, ignoring the
oﬀ-diagonal elements of the variance-covariance matrix. These authors include Case (1992),
McMillen (1992) and Pinkse and Slade (1998). Although it is not as efficient as the estimators
that approximate multidimensional integrals, we use Pinkse and Slade’s estimator in the
first step of the sample selection model for at least three reasons. First, it yields consistent
estimates of the selection equation that are necessary to obtain consistent estimates of the
parameters in the outcome equation (second step). Second, it is computationally simpler
than both of the other estimators that approximate multidimensional integrals. Third, it
has been developed in the framework of GMM, the same framework we employ in the joint
estimation of our sample selection models.
The consistent estimates obtained in the first step are used to construct the IMR, which
is used in the second step to correct for selectivity bias (Heckman, 1979). Since the outcome
equation is a linear equation, any estimation method can be used to estimate the second step
by including its corresponding moment conditions. However, since it is assumed that the
outcome equation also exhibits spatial dependence and since generally the spatial parameters
will be diﬀerent in each equation, the IMR under SAE turns out to be a function of the
unknown spatial parameter in the outcome equation. As a result, the sample selection
model under SAE cannot be estimated in two separate steps and an appropriate method that
estimates both steps simultaneously has to be used. We build on the sequential estimation
framework proposed by Newey (1984) to jointly estimate the sample selection model with
SAE. The sample selection model with SAL dependence can be estimated in two steps, as
the IMR depends only upon parameters from the selection equation. However, we propose
the estimation of both models using the sequential estimation GMM framework since an
added advantage is that under this framework the appropriate variance-covariance matrix
of the estimator is obtained directly. Finally, building on Pinkse and Slade’s (1998) results
and standard GMM theory, the asymptotic properties of our estimators can be derived.
3 For examples of likelihood functions for sample selection models under independent observations see
Heckman (1979) and Maddala (1983).
An early attempt to specify and estimate a sample selection model with spatial eﬀects
is McMillen (1995). His model builds on the spatial expansion model of Casetti (1992) that
is used in geography. However, such a model is not explicitly spatial, and the consistency of
the estimator depends on a functional-form assumption of the underlying heteroskedasticity
induced by the spatial dependence.4
Our estimators have two potential disadvantages. The first is their lower efficiency compared
to the computationally intensive simulation methods. However, given that the dimension of
the integrals to be approximated is of the order of the number of observations, and given the
large amounts of data available to researchers, we believe the computational simplicity of our
method is preferable. The second is that our estimators require the availability of at
least one instrumental variable in addition to the common "exclusion restriction" desirable
in the heckit model. To this end, however, the instrumental variables proposed by Kelejian
and Prucha (1998) can be used, which are always available since they are constructed from
polynomials of available variables. Using simulations, we find that the Kelejian and Prucha
instruments work well in practice.
The paper is organized as follows. Section 2 outlines the sample selection model with
spatial eﬀects, considering separately the spatial autoregressive error (SAE) and the spatial
autoregressive lag (SAL) models. Section 3 introduces our spatial sample selection model
estimators and states their large-sample properties. Section 4 describes and presents the
results of a Monte Carlo experiment, while Section 5 discusses the practical importance of
the spatial sample selection model and presents two empirical applications of our method
that correspond to the SAE and SAL models, employing a censored sample of fishermen in
the Eastern Bering Sea used in Schnier (2003). Concluding remarks are provided in the last
section of the paper. Proofs are presented in Appendix C.
4 McMillen (1995) also shows that using the EM algorithm (as in McMillen (1992)’s spatial probit
estimator) in a spatial sample selection model is impractical either because of computational issues or because
it requires knowledge of the exact spatial process generating the underlying data.
2 The Sample Selection Models with Spatial Dependence
We consider estimation of two sample selection (Tobit type II) models that diﬀer in the form
of spatial dependence embodied in them. We specify the sample selection models to include
spatial eﬀects in both the selection and the outcome equations.
The first model is the spatial autoregressive error (SAE) model, which specifies that the
disturbances are spatially autocorrelated:
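$$y_{1i}^* = \alpha_0 + x_{1i}'\alpha_1 + u_{1i}, \qquad u_{1i} = \delta \sum_{j \neq i} c_{ij} u_{1j} + \varepsilon_{1i} \tag{1}$$
$$y_{2i}^* = \beta_0 + x_{2i}'\beta_1 + u_{2i}, \qquad u_{2i} = \gamma \sum_{j \neq i} c_{ij} u_{2j} + \varepsilon_{2i} \tag{2}$$
where $y_{1i}^*$ and $y_{2i}^*$ are latent variables with the following relationship with the observed
variables: $y_{1i} = 1$ if $y_{1i}^* > 0$ and $y_{1i} = 0$ otherwise, and $y_{2i} = y_{2i}^* \cdot y_{1i}$. Therefore, (1) is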
the selection equation while (2) is the outcome equation. Note that each of these equations
exhibits spatial dependence, as u1i and u2i depend on other u1j and u2j through their location
in space, as given by the spatial weights cij and the spatial autoregressive parameters δ and
γ. Typically, the spatial weights are specified by the econometrician based on some function
of contiguity or (economic) distance (Anselin, 1988; Anselin and Bera, 1998). Note also that,
in general, one will specify diﬀerent spatial autoregressive parameters for the selection and
outcome equations. It is assumed that:
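Assumption A The errors $\varepsilon_{1i}$ and $\varepsilon_{2i}$, $i = 1, \ldots, N$, are iid $N(0, \Sigma)$ with
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}.$$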
The model in (1)-(2) can also be presented in a reduced form:
$$y_{1i}^* = \alpha_0 + x_{1i}'\alpha_1 + \sum_j \omega_{1ij}\,\varepsilon_{1j} \tag{3}$$
$$y_{2i}^* = \beta_0 + x_{2i}'\beta_1 + \sum_j \omega_{2ij}\,\varepsilon_{2j} \tag{4}$$
where the weights $\omega_{1ij}$ and $\omega_{2ij}$ are the $(i, j)$ elements of the inverse matrices $(I - \delta C)^{-1}$ and
$(I - \gamma C)^{-1}$, respectively, with $C$ the matrix of spatial weights $c_{ij}$. Note that both sets of
weights, $\omega_{1ij}$ and $\omega_{2ij}$, depend upon the unknown parameters $\delta$ and $\gamma$, respectively.
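The mapping from the structural SAE process to its reduced form can be checked numerically. The following Python sketch is an illustration only and is not part of the estimation code used in the paper; the four-unit weights matrix, the value of δ, and the function name `reduced_form_weights` are assumptions made for the example.

```python
import numpy as np

def reduced_form_weights(C, rho):
    """Return (I - rho*C)^{-1}; its (i, j) element is the weight omega_ij."""
    n = C.shape[0]
    return np.linalg.inv(np.eye(n) - rho * C)

# Hypothetical example: four units on a line, neighbors are adjacent units.
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
C = C / C.sum(axis=1, keepdims=True)      # row-standardize the weights

delta = 0.5                               # illustrative spatial parameter
omega1 = reduced_form_weights(C, delta)

rng = np.random.default_rng(0)
eps1 = rng.standard_normal(4)
u1 = omega1 @ eps1                        # reduced-form disturbance, as in (3)

# The reduced-form disturbance satisfies the structural SAE process in (1).
print(np.allclose(u1, delta * C @ u1 + eps1))
```

Because the weights are elements of a matrix inverse, every candidate value of the spatial parameter requires a new inversion, which is one source of the computational cost discussed in the introduction.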
The second form of spatial dependence considered is the spatial autoregressive lag (SAL)
model, which specifies that the dependent variable of each unit is spatially dependent on the
value of the dependent variable in all (or some) of the other units:
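$$y_{1i}^* = \alpha_0 + x_{1i}'\alpha_1 + \delta \sum_{j \neq i} c_{ij}\, y_{1j}^* + \varepsilon_{1i} \tag{5}$$
$$y_{2i}^* = \beta_0 + x_{2i}'\beta_1 + \gamma \sum_{j \neq i} c_{ij}\, y_{2j}^* + \varepsilon_{2i} \tag{6}$$
where $y_{1i}^*$ and $y_{2i}^*$ are latent variables with the following relationship with the observed
variables: $y_{1i} = 1$ if $y_{1i}^* > 0$ and $y_{1i} = 0$ otherwise, and $y_{2i} = y_{2i}^* \cdot y_{1i}$. Assumption A is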
applied here as well to the error terms ε1i and ε2i . The spatial dependence implied by the
SAL model is given by the spatial weights cij and the spatial autoregressive lag parameters
δ and γ. A reduced form of the SAL model similar to (3)-(4) is:
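$$y_{1i}^* = \alpha_0 \sum_j \omega_{1ij} + \sum_j \omega_{1ij}\, x_{1j}'\alpha_1 + \sum_j \omega_{1ij}\,\varepsilon_{1j} \tag{7}$$
$$y_{2i}^* = \beta_0 \sum_j \omega_{2ij} + \sum_j \omega_{2ij}\, x_{2j}'\beta_1 + \sum_j \omega_{2ij}\,\varepsilon_{2j} \tag{8}$$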
where the weights ω 1ij and ω 2ij are defined as in the reduced form of the SAE model, and
we note again that these weights are functions of the unknown spatial parameters δ and γ,
respectively, in both the SAL and SAE models.
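As a hedged illustration of the SAL reduced form, the Python sketch below generates the latent selection variable from (7) and verifies that it satisfies the structural equation (5). The ring-shaped weights matrix, the parameter values, and the function name `simulate_sal_selection` are assumptions chosen for the example, not taken from the paper.

```python
import numpy as np

def simulate_sal_selection(X1, alpha0, alpha1, delta, C, eps1):
    """Latent y1* from the SAL reduced form (7) and the observed indicator y1."""
    n = C.shape[0]
    omega1 = np.linalg.inv(np.eye(n) - delta * C)
    y1_star = omega1 @ (alpha0 + X1 @ alpha1 + eps1)
    y1 = (y1_star > 0).astype(int)
    return y1_star, y1

# Hypothetical example: five units on a ring, each with two equally weighted neighbors.
n = 5
C = np.zeros((n, n))
for i in range(n):
    C[i, (i - 1) % n] = 0.5
    C[i, (i + 1) % n] = 0.5

rng = np.random.default_rng(1)
X1 = rng.uniform(0, 1, size=(n, 2))
eps1 = rng.standard_normal(n)
alpha0, alpha1, delta = -0.3, np.array([1.0, 1.0]), 0.25   # illustrative values

y1_star, y1 = simulate_sal_selection(X1, alpha0, alpha1, delta, C, eps1)

# The reduced form solves the structural SAL equation (5).
print(np.allclose(y1_star, alpha0 + X1 @ alpha1 + delta * C @ y1_star + eps1))
```

In contrast to the SAE case, the inverse here multiplies the regressors as well as the innovations, so the spatial parameter affects the conditional mean and not only the error variance.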
3 Estimators for the Sample Selection Models with Spatial Dependence
We now describe our proposed estimation method for the sample selection models with
spatial dependence presented in the previous section. As previously described, we follow
a two-step procedure in the spirit of Heckman (1976, 1979) that is estimated jointly in a
GMM context. The selection equation is estimated using Pinkse and Slade’s (1998) GMM
estimator for the spatial probit model, while the outcome equation can be estimated with
any of the well-developed spatial methods for linear models. An estimate of the inverse Mills
ratio is included in the outcome equation to correct for selectivity bias. To estimate these
two parts simultaneously, the corresponding moment conditions are stacked, and a GMM
criterion function is minimized with respect to all parameters in the model. In what follows,
we consider again each type of spatial dependence separately.
3.1 Estimation of the SAE sample selection model
To motivate the estimation of the SAE model in (1)-(2), we start with the following calculations:
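$$\mathrm{var}(u_{1i}) = \sigma_1^2 \sum_j (\omega_{1ij})^2 \tag{9}$$
$$\mathrm{var}(u_{2i}) = \sigma_2^2 \sum_j (\omega_{2ij})^2 \tag{10}$$
$$E(u_{1i}, u_{2i}) = \sigma_{12} \sum_j \omega_{1ij}\,\omega_{2ij}. \tag{11}$$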
In the typical heckit model, in the first step a probit model is estimated for the probability
of each observation being included in the observed sample. In the presence of SAE errors,
this probit is complicated by the induced heteroskedasticity of the error terms in (9). Pinkse
and Slade (1998) propose to estimate this spatial probit model by taking into account the
known form of the induced heteroskedasticity in a GMM context.
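The induced heteroskedasticity can be computed directly from the spatial weights. The Python sketch below is an illustration under assumed values (a four-unit line, δ = 0.5, γ = 0.25, unit variances and σ12 = 0.5); the function name `sae_moments` is hypothetical. It evaluates the expressions in (9)-(11) unit by unit.

```python
import numpy as np

def sae_moments(C, delta, gamma, sigma1_sq=1.0, sigma2_sq=1.0, sigma12=0.5):
    """Unit-specific variances and covariance of (u1, u2) under the SAE model."""
    n = C.shape[0]
    omega1 = np.linalg.inv(np.eye(n) - delta * C)
    omega2 = np.linalg.inv(np.eye(n) - gamma * C)
    var_u1 = sigma1_sq * (omega1 ** 2).sum(axis=1)       # equation (9)
    var_u2 = sigma2_sq * (omega2 ** 2).sum(axis=1)       # equation (10)
    cov_u1u2 = sigma12 * (omega1 * omega2).sum(axis=1)   # equation (11)
    return var_u1, var_u2, cov_u1u2

# Hypothetical row-standardized weights for four units on a line.
C = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])

var_u1, var_u2, cov_u1u2 = sae_moments(C, delta=0.5, gamma=0.25)
print(var_u1)   # the variances differ across units unless delta = 0
```

Only the diagonal of the implied covariance matrix is used here, which corresponds to the approach of ignoring the off-diagonal elements described in the introduction.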
Define $\theta_1 = \{\alpha_0, \alpha_1', \delta\}$ as the parameters to be estimated in the spatial probit model,
and $\psi_i(\theta_1) = (\alpha_0 + x_{1i}'\alpha_1)/\sqrt{\mathrm{var}(u_{1i})}$ the index function of the probit model weighted by the standard
deviation of the residual. The corresponding generalized residuals of this model are:
$$\tilde{u}_{1i}(\theta_1) = \{y_{1i} - \Phi[\psi_i(\theta_1)]\} \cdot \frac{\phi[\psi_i(\theta_1)]}{\Phi[\psi_i(\theta_1)]\,\{1 - \Phi[\psi_i(\theta_1)]\}}. \tag{12}$$
Then, the GMM estimates for $\theta_1$ can be obtained as follows:
$$\hat{\theta}_{1,GMM} = \arg\min_{\theta_1 \in \Theta_1} S_N(\theta_1)'\, M_N\, S_N(\theta_1) \tag{13}$$
where $S_N(\theta_1) = \frac{1}{N} z_N' \tilde{u}_{1N}(\theta_1)$, $z_N$ is a matrix of regressors plus at least one instrument (to identify
the extra parameter $\delta$), and $M_N$ is a positive definite matrix such that $M_N \xrightarrow{p} M$. Pinkse
and Slade (1998) show that this estimator is consistent and asymptotically normal.
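A minimal sketch of the first-step criterion follows, written in Python for illustration. It assumes the usual probit normalization σ1 = 1, an identity weighting matrix by default, and a toy three-unit data set; the function names and the data are hypothetical and do not reproduce Pinkse and Slade's (1998) implementation.

```python
import math
import numpy as np

def norm_pdf(x):
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def generalized_residuals(theta1, y1, X1, C):
    """Generalized residuals (12); theta1 = (alpha0, alpha1..., delta), sigma_1 = 1 assumed."""
    alpha0, alpha1, delta = theta1[0], theta1[1:-1], theta1[-1]
    n = C.shape[0]
    omega1 = np.linalg.inv(np.eye(n) - delta * C)
    sd_u1 = np.sqrt((omega1 ** 2).sum(axis=1))       # square root of (9)
    psi = (alpha0 + X1 @ alpha1) / sd_u1
    Phi, phi = norm_cdf(psi), norm_pdf(psi)
    return (y1 - Phi) * phi / (Phi * (1.0 - Phi))

def gmm_objective(theta1, y1, X1, Z, C, M=None):
    """Criterion S_N' M S_N of (13), with S_N = Z'u/N; M defaults to the identity."""
    u = generalized_residuals(theta1, y1, X1, C)
    S = Z.T @ u / len(y1)
    M = np.eye(len(S)) if M is None else M
    return float(S @ M @ S)

# Toy data (hypothetical): three units, one regressor, one spatially lagged instrument.
C = np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]])
y1 = np.array([1, 0, 1])
X1 = np.array([[0.2], [0.5], [0.9]])
Z = np.column_stack([np.ones(3), X1, C @ X1])
value = gmm_objective(np.array([-0.3, 1.0, 0.25]), y1, X1, Z, C)
```

In practice the criterion would be passed to a numerical optimizer over θ1; the sketch stops at evaluating it because the choice of optimizer and starting values is not specified in the text.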
The consistent estimates of θ1 will be used in the construction of the inverse Mills ratio
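(IMR) to correct for sample selection bias. Note that the conditional regression function for
the outcome equation (2) has the following form:
$$
\begin{aligned}
E[y_{2i} \mid y_{1i} > 0] &= \beta_0 + x_{2i}'\beta_1 + E[u_{2i} \mid u_{1i} > -(\alpha_0 + x_{1i}'\alpha_1)] \\
&= \beta_0 + x_{2i}'\beta_1 + \frac{E(u_{1i}, u_{2i})}{\sqrt{\mathrm{var}(u_{1i})}} \cdot \frac{\phi[-\psi_i(\theta_1)]}{1 - \Phi[-\psi_i(\theta_1)]} \\
&= \beta_0 + x_{2i}'\beta_1 + \frac{\sigma_{12} \sum_j \omega_{1ij}\,\omega_{2ij}}{\sqrt{\sigma_1^2 \sum_j (\omega_{1ij})^2}} \cdot \frac{\phi[-\psi_i(\theta_1)]}{1 - \Phi[-\psi_i(\theta_1)]} \\
&= \beta_0 + x_{2i}'\beta_1 + \frac{\sigma_{12}}{\sigma_1} \cdot \frac{\sum_j \omega_{1ij}\,\omega_{2ij}}{\sqrt{\sum_j (\omega_{1ij})^2}} \cdot \frac{\phi[-\psi_i(\theta_1)]}{1 - \Phi[-\psi_i(\theta_1)]}
\end{aligned}
$$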
Therefore, the selectivity correction implies the following "adjusted" IMR:
$$\lambda_i \equiv \frac{\sum_j \omega_{1ij}\,\omega_{2ij}}{\sqrt{\sum_j [\omega_{1ij}]^2}} \cdot \frac{\phi[-\psi_i(\theta_1)]}{1 - \Phi[-\psi_i(\theta_1)]}. \tag{14}$$
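The "adjusted" IMR in (14) can be computed for given values of the two spatial parameters. The Python sketch below is an illustration under the assumption σ1 = 1; the function name `adjusted_imr`, the weights matrix, and the index values are hypothetical.

```python
import math
import numpy as np

def norm_pdf(x):
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def adjusted_imr(index, C, delta, gamma):
    """'Adjusted' IMR of (14); index = alpha0 + X1 @ alpha1, with sigma_1 = 1 assumed."""
    n = C.shape[0]
    omega1 = np.linalg.inv(np.eye(n) - delta * C)
    omega2 = np.linalg.inv(np.eye(n) - gamma * C)
    norm1 = np.sqrt((omega1 ** 2).sum(axis=1))
    psi = index / norm1
    spatial_factor = (omega1 * omega2).sum(axis=1) / norm1
    # phi(-psi) / {1 - Phi(-psi)} equals phi(psi) / Phi(psi) by symmetry of the normal.
    return spatial_factor * norm_pdf(psi) / norm_cdf(psi)

# Hypothetical three-unit example.
C = np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]])
index = np.array([0.0, 0.4, -0.2])
lam = adjusted_imr(index, C, delta=0.5, gamma=0.25)
lam_no_spatial = adjusted_imr(index, C, delta=0.0, gamma=0.0)   # usual IMR
```

Setting both spatial parameters to zero recovers the usual IMR, while any nonzero γ enters through the cross-product of the two sets of weights.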
Once estimated, the "adjusted" IMR would be included as an additional regressor in
the outcome equation, which in turn could be estimated with any of the spatial methods
developed for this linear equation.5 We illustrate our estimator employing feasible generalized
least squares (FGLS) to estimate the outcome equation: $y_{2i} = \beta_0 + x_{2i}'\beta_1 + \mu\hat{\lambda}_i + v_{2i}$.6
However, this method cannot be applied in two separate steps since the "adjusted" IMR
in (14) depends on a parameter that is not estimated in the first step: γ, which is included
in the weights ω 2ij . To overcome this diﬃculty, we use GMM to estimate simultaneously all
parameters of the sample selection model. Rewrite the sample selection model as a sequential
estimator (Newey, 1984), composed of the Pinkse and Slade (1998) and FGLS estimators,
by stacking their corresponding moment conditions:
$$g(z, \theta) = [s(z_{1N}, \theta)', m(z_{2N}, \theta)']', \qquad \theta = \{\alpha_0, \alpha_1', \delta, \beta_0, \beta_1', \mu, \gamma\}$$
with
$$s(z_{1N}, \theta) = z_{1N}'\tilde{u}_{1N}(\theta), \qquad \tilde{u}_{1N}(\theta) \text{ as in (12)},$$
$$m(z_{2N}, \theta) = [y_{1N} \cdot z_{2N}]'\tilde{u}_{2N}(\theta), \qquad \tilde{u}_{2N}(\theta) = y_{2N} - \beta_0 - x_{2N}'\beta_1 - \mu\hat{\lambda}(\delta, \gamma),$$
where $z_{1N}$ includes the regressors of the selection equation plus at least one instrument, and
$z_{2N}$ includes the regressors of the outcome equation, the estimated "adjusted" IMR, and at
least one instrument, which could be in principle the same instrument contained in $z_1$. We
note that the instruments proposed by Kelejian and Prucha (1998) can be used if no other
instruments are available.7
5 The linear SAE model could be estimated, for instance, with FGLS, the methods by Kelejian and Prucha
(1999), Lee (2001a), or maximum likelihood.
6 Kelejian and Prucha (1999) propose a set of three moment conditions for the estimation of the spatial
autoregressive parameter in a SAE model. These moments will be added to our estimators in the Monte
Carlo experiment and empirical application, to increase their efficiency.
Define $z_N' = (z_{1N}', [y_{1N} \cdot z_{2N}]')'$ and $\tilde{u}_N(\theta) = (\tilde{u}_{1N}'(\theta), \tilde{u}_{2N}'(\theta))'$. Then, all parameters of
the sample selection model can be estimated as:
$$\hat{\theta}_{GMM} = \arg\min_{\theta \in \Theta} g_N(\theta)'\, M_N\, g_N(\theta) \tag{15}$$
where $g_N = N^{-1} z_N'\tilde{u}_N(\theta)$, for a positive definite $M_N$ such that $M_N \xrightarrow{p} M$.
We show in appendix C that, under conditions similar to those in Pinkse and Slade (1998),
$\hat{\theta}_{GMM}$ is consistent and asymptotically normal. We call it the "spatial heckit" estimator for
the SAE model.
Proposition 1 Under assumptions A1 to A8 (stated in appendix C), $\hat{\theta}_{GMM} \xrightarrow{p} \theta_0$.
The following two propositions concern the asymptotic distribution and the estimation
of the variance-covariance matrix of the spatial heckit estimator for the SAE model.
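Proposition 2 Under assumptions A1 to A11 (stated in appendix C),
$$\sqrt{N}(\hat{\theta}_{GMM} - \theta_0) \xrightarrow{d} N\big(0, [\Psi_2(\theta_0)]^{-1} [\partial g'(\theta_0)/\partial\theta]\, M\, \Psi_1(\theta_0)\, M\, [\partial g(\theta_0)/\partial\theta'] [\Psi_2(\theta_0)]^{-1}\big)$$
where $\Psi_1(\theta_0) = \lim_{N \to \infty} E\{N g_N(\theta_0) g_N(\theta_0)'\}$, and $\Psi_2(\theta_0) = [\partial g'(\theta_0)/\partial\theta]\, M\, [\partial g(\theta_0)/\partial\theta']$.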
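Proposition 3 Under assumptions A1 to A14 (stated in appendix C),
$\Psi_{1N}(\hat{\theta}_{GMM}) \xrightarrow{p} \Psi_1(\theta_0)$ and $\Psi_{2N}(\hat{\theta}_{GMM}) \xrightarrow{p} \Psi_2(\theta_0)$, where
$\Psi_{1N}(\hat{\theta}_{GMM}) = N E\{g_N(\hat{\theta}_{GMM}) g_N(\hat{\theta}_{GMM})'\}$ and
$\Psi_{2N}(\hat{\theta}_{GMM}) = [\partial g_N'(\hat{\theta}_{GMM})/\partial\theta]\, M_N\, [\partial g_N(\hat{\theta}_{GMM})/\partial\theta']$.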
Finally, the following corollary gives the (eﬃcient) asymptotic variance of our estimator
when the optimal weighting matrix is used.
7 In the context of a linear SAL model, Kelejian and Prucha (1998) show that the optimal set of instruments
is approximated by the linearly independent columns of $[x, Wx, W^2x, \ldots]$. These instruments can also be used
in this context, as they are still exogenous.
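Corollary 1 If $M_N = [\Psi_{1N}(\hat{\theta}_{GMM})]^{-1}$ is chosen, such that $M_N \xrightarrow{p} [\Psi_1(\theta_0)]^{-1}$, then the
asymptotic distribution of $\hat{\theta}_{GMM}$ simplifies to:
$$\sqrt{N}(\hat{\theta}_{GMM} - \theta_0) \xrightarrow{d} N(0, [\Psi_2(\theta_0)]^{-1}).$$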
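The simplification in Corollary 1 can be verified numerically. In the Python sketch below, G stands for the Jacobian ∂g(θ0)/∂θ' (moments by parameters); the matrices are random placeholders and the function name `gmm_asymptotic_variance` is hypothetical, so the example illustrates the algebra only and not any estimate from the paper.

```python
import numpy as np

def gmm_asymptotic_variance(G, M, Psi1):
    """Sandwich variance of Proposition 2 with Psi2 = G' M G."""
    Psi2 = G.T @ M @ G
    Psi2_inv = np.linalg.inv(Psi2)
    return Psi2_inv @ G.T @ M @ Psi1 @ M @ G @ Psi2_inv

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 3))            # 5 moments, 3 parameters (placeholder)
A = rng.standard_normal((5, 5))
Psi1 = A @ A.T + 5.0 * np.eye(5)           # positive definite moment variance (placeholder)

V_identity = gmm_asymptotic_variance(G, np.eye(5), Psi1)
V_optimal = gmm_asymptotic_variance(G, np.linalg.inv(Psi1), Psi1)

# With M = Psi1^{-1} the sandwich collapses to [G' Psi1^{-1} G]^{-1}, as in Corollary 1.
print(np.allclose(V_optimal, np.linalg.inv(G.T @ np.linalg.inv(Psi1) @ G)))
```

The difference between the two variance matrices is positive semidefinite, which is the sense in which the optimal weighting matrix is efficient.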
3.2 Estimation of the SAL sample selection model
We start by noting that in the sample selection model with SAL dependence the IMR does
not depend on parameters from the outcome equation (second step) and therefore could in
principle be estimated in two separate steps. To see this, note that from (5) and (6):
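$$\mathrm{var}(\varepsilon_{2i}) = \sigma_2^2, \qquad \mathrm{var}(\varepsilon_{1i}) = 1, \qquad \text{and} \qquad E(\varepsilon_{1i}, \varepsilon_{2i}) = \sigma_{12}, \tag{16}$$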
and the conditional regression function for the outcome equation (6) becomes:
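$$
\begin{aligned}
E[y_{2i} \mid y_{1i} > 0] &= \beta_0 + x_{2i}'\beta_1 + \gamma \sum_{j \neq i} c_{ij}\, y_{2j} + E\Big[\varepsilon_{2i} \,\Big|\, \varepsilon_{1i} > -\Big(\alpha_0 + x_{1i}'\alpha_1 + \delta \sum_{j \neq i} c_{ij}\, y_{1j}\Big)\Big] \\
&= \beta_0 + x_{2i}'\beta_1 + \gamma \sum_{j \neq i} c_{ij}\, y_{2j} + \frac{E(\varepsilon_{1i}, \varepsilon_{2i})}{\sqrt{\mathrm{var}(\varepsilon_{1i})}} \cdot \frac{\phi[-\varphi_i(\theta_1)]}{1 - \Phi[-\varphi_i(\theta_1)]} \\
&= \beta_0 + x_{2i}'\beta_1 + \gamma \sum_{j \neq i} c_{ij}\, y_{2j} + \frac{\sigma_{12}}{\sigma_1} \cdot \frac{\phi[-\varphi_i(\theta_1)]}{1 - \Phi[-\varphi_i(\theta_1)]}
\end{aligned}
$$
where $\varphi_i(\theta_1) \equiv \alpha_0 + x_{1i}'\alpha_1 + \delta \sum_{j \neq i} c_{ij}\, y_{1j}$ and in which the second multiplicand in the last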
term is the usual IMR that only depends on parameters from the selection equation, implying
that the model can be estimated in two separate steps. Nevertheless, we employ the same
sequential estimator framework as with the SAE model to directly obtain the appropriate
variance-covariance matrix of the estimated parameters (Newey, 1984). We note that under
the assumptions maintained for the SAL model and regularity conditions on the spatial
dependence process (such as A6 and A9 in appendix C), standard GMM asymptotic results
apply, and thus we do not provide formal proofs of consistency and asymptotic normality of
our spatial heckit estimator for the SAL model.8
We now outline the moment conditions upon which the spatial heckit estimator for the
SAL model is based. First consider the selection equation of the spatial autoregressive lag
(SAL) model in (5), which implies a binary choice model for the observed variable y1i , and
under assumption A can be rewritten as:
$$P(y_{1i} = 1) = \Phi\Big(\alpha_0 + x_{1i}'\alpha_1 + \delta \sum_{j \neq i} c_{ij}\, y_{1j}\Big) = \Phi(\varphi_i). \tag{17}$$
8 For a review of the asymptotic properties of GMM sequential estimators see Newey (1984).
Note that this probit model contains both endogenous and exogenous variables, and thus
a nonlinear two-stage least-squares (NL2SLS) procedure (Amemiya, 1974) can be used to
estimate the parameters of interest. Let $z_{1i}$ be a vector of variables that contains a constant,
$x_{1i}'$, plus at least one instrument. Substituting $\varphi_i$ for $\psi_i$ in the definition of the generalized
residuals in (12), the NL2SLS estimator can be obtained by solving the following moment
conditions:
$$s(z_1, \theta_1) = z_{1N}'\tilde{u}_{1N}(\theta_1) = 0$$
where $\theta_1 = \{\alpha_0, \alpha_1', \delta\}$.
The outcome equation, now expanded by the inclusion of the IMR, can be estimated with
any of the spatial methods developed for this linear equation.9 We illustrate our estimator
employing 2SLS. Define
$$m(z_{2N}, \theta) = [y_{1N} \cdot z_{2N}]'\tilde{u}_{2N}(\theta), \qquad \tilde{u}_{2N}(\theta) = y_{2N} - \beta_0 - x_{2N}'\beta_1 - \mu\hat{\lambda}(\theta_1) - \gamma C y_{2N}$$
where $z_{2N}$ includes the regressors of the outcome equation, the estimated IMR, and at
least one instrument, which could be in principle the same instrument(s) contained in $z_1$.10
Define $z_N' = (z_{1N}', [y_{1N} \cdot z_{2N}]')'$ and $\tilde{u}_N(\theta) = (\tilde{u}_{1N}'(\theta_1), \tilde{u}_{2N}'(\theta))'$. Then, all parameters of
the sample selection model with SAL dependence can be estimated as:
$$\hat{\theta}_{GMM} = \arg\min_{\theta \in \Theta} g_N(\theta)'\, M_N\, g_N(\theta)$$
where $g_N = N^{-1} z_N'\tilde{u}_N(\theta)$, for a positive definite $M_N$ such that $M_N \xrightarrow{p} M$.
4 Monte Carlo Experiment
The goal of the Monte Carlo experiment is to explore the finite-sample performance of the
spatial heckit estimators for the spatial sample selection models proposed in Section 3. In
addition, our estimators are compared to three other types of estimators: those that ignore
sample selection but account for spatial dependence, such as that of Kelejian and Prucha
(1998) for estimating the SAE model and 2SLS for the SAL model; the simple heckit
estimator that accounts for sample selection but ignores spatial dependence; and finally
the ordinary least squares (OLS) estimator that ignores both sample selection and spatial
dependence. We compare these estimators in terms of their finite-sample bias and
root-mean-squared error.
9 The linear SAL model could be estimated, for instance, with 2SLS, the methods by Kelejian and Prucha
(1998), Lee (2001b), or maximum likelihood.
10 In particular, the same instruments proposed by Kelejian and Prucha (1998) can be used. See footnote
7.
Given that we combine features of a sample selection model with a spatial econometric
specification, we pay close attention to previous simulation studies in specifying each of
the two features of our models, such as Cosslett (1991) and Leung and Yu (1996) for the
sample selection model, and Beron and Vijverberg (1999) and Kelejian and Prucha (1999)
for spatially dependent models.
Our data generating process (DGP) can be summarized in the following general specification that encompasses both SAL and SAE models, although we analyze them separately:
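$$y_{1i}^* = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \delta_{SAL} \sum_{j \neq i} c_{ij}\, y_{1j}^* + u_{1i}, \qquad u_{1i} = \delta_{SAE} \sum_{j \neq i} c_{ij}\, u_{1j} + \varepsilon_{1i} \tag{18}$$
$$y_{2i}^* = \beta_0 + \beta_1 x_{3i} + \beta_2 x_{1i} + \gamma_{SAL} \sum_{j \neq i} c_{ij}\, y_{2j}^* + u_{2i}, \qquad u_{2i} = \gamma_{SAE} \sum_{j \neq i} c_{ij}\, u_{2j} + \varepsilon_{2i}. \tag{19}$$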
Clearly, the SAE model is obtained by setting δ SAL = γ SAL = 0, while the SAL model is
simulated by setting δ SAE = γ SAE = 0. Each of our models consists of three uncorrelated
exogenous variables, one of which is common to both equations, as in Cosslett’s (1991)
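experimental design. These exogenous variables are generated as $x_k \sim \mathrm{Uniform}(0, 1)$, $k = 1, 2, 3$.
The innovations $\varepsilon_{1i}$ and $\varepsilon_{2i}$ are generated as bivariate normal as follows:
$$\begin{bmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right) \tag{20}$$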
where we set ρ = 0.5 for the correlation between the innovations in each of the two equations.
The parameters of the model that are not related to the spatial dependence feature are set
as α1 = α2 = β 1 = β 2 = 1 and β 0 = 0. The parameter α0 is used to control the amount
of sample selection, for which we consider two cases: 25% censoring (α0 = −0.3) and 50%
censoring (α0 = −1.1). We consider in the experiment two diﬀerent sample sizes: N = 225
and 400 observations.
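As a rough check on the censoring rates implied by the two values of α0, the following Python sketch draws from the selection equation with all spatial parameters set to zero. This is a simplifying assumption made for the illustration (the rates under spatial dependence will differ somewhat), and the function names, seed, and number of draws are hypothetical.

```python
import numpy as np

def draw_innovations(n, rho, rng):
    """Bivariate normal innovations with unit variances and correlation rho, as in (20)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n)

def censoring_rate(alpha0, n=200_000, rho=0.5, seed=0):
    """Share of units with y1* <= 0 when all spatial parameters are zero."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 1.0, n)
    x2 = rng.uniform(0.0, 1.0, n)
    eps = draw_innovations(n, rho, rng)
    y1_star = alpha0 + x1 + x2 + eps[:, 0]     # alpha1 = alpha2 = 1
    return float(np.mean(y1_star <= 0))

rate_low = censoring_rate(-0.3)    # intended to be near 25% censoring
rate_high = censoring_rate(-1.1)   # intended to be near 50% censoring
print(rate_low, rate_high)
```

The large number of draws is used only to make the simulated shares stable; the experiment itself uses N = 225 and N = 400.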
For the models with N = 225, we consider three diﬀerent values for the spatial parameter
(for either SAE or SAL): 0.25, 0.5, and 0.75. For the models with N = 400, due to
computational constraints, we consider only two values: 0.25 and 0.5. Finally, we consider
two diﬀerent types of instruments in our simulations to gauge their relative performance: the
Kelejian and Prucha (KP) instruments, which are always available to the researcher; and two
artificially generated exogenous instruments that mirror a situation in which the researcher
has available additional instruments. In the simulations, the generated instruments are
specified to have a correlation of about 0.2 with the exogenous included variables in the
model. Due to computational constraints, 100 replications are undertaken for the models
with N = 225, whereas 50 replications are undertaken for the models with N = 400.
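The Kelejian and Prucha instruments can be built from the exogenous regressors and the weights matrix alone. The Python sketch below is one possible construction, assuming powers of the weights matrix up to order two and a simple rank-based rule for dropping linearly dependent columns; the function name `kp_instruments`, the ring-shaped weights matrix, and the data are hypothetical.

```python
import numpy as np

def kp_instruments(X, C, order=2):
    """Linearly independent columns of [X, CX, C^2 X, ...] up to the given order."""
    blocks, current = [X], X
    for _ in range(order):
        current = C @ current
        blocks.append(current)
    H = np.hstack(blocks)
    keep = []
    for j in range(H.shape[1]):
        # Keep column j only if it raises the rank of the columns kept so far.
        if np.linalg.matrix_rank(H[:, keep + [j]]) > len(keep):
            keep.append(j)
    return H[:, keep]

# Hypothetical example: six units on a ring, a constant and one uniform regressor.
n = 6
C = np.zeros((n, n))
for i in range(n):
    C[i, (i - 1) % n] = 0.5
    C[i, (i + 1) % n] = 0.5

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
H = kp_instruments(X, C)
print(H.shape)
```

With a row-standardized weights matrix the spatial lag of the constant is again a constant, so those columns are dropped by the rank check.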
The matrix of spatial weights has to be specified. For this, we create a 15 by 15 grid
for the 225 observations and a 20 by 20 grid for the 400 observations. Each grid cell
is assigned X and Y coordinates centered on the cell, such that the bottom left cell
of the grid has coordinates (0.5, 0.5). We use the program Stata with these grids to create
a weighting matrix that is based on the square of the inverse Euclidean distance between
any two points. After creating the location-specific weights for each grid, the matrix is row
standardized, so that the diagonal elements of the weighting matrix are all zeros and the sum
of any one row is equal to 1. Finally, a band is used to determine the number of observations
that may influence a centered observation. This band is set with a lower bound of 0 and
an upper bound equal to the maximum distance between any two points. Therefore, there
exists spatial dependence for all the observations, and, on average, each observation has 4
neighbors.11 This way of specifying the weighting matrix is standard in the literature; see, for
instance, Schnier (2003).
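A weights matrix of this kind can be reproduced outside Stata. The Python sketch below is an assumed equivalent construction using NumPy, with cell centers at (0.5, 0.5), (1.5, 0.5), and so on, inverse squared Euclidean distance weights, a zero diagonal, and row standardization; the function name `grid_inverse_distance_weights` is hypothetical, and no distance band is applied because the band described above spans all distances.

```python
import numpy as np

def grid_inverse_distance_weights(side):
    """Row-standardized inverse squared distance weights for a side-by-side grid."""
    coords = np.array([(col + 0.5, row + 0.5)
                       for row in range(side) for col in range(side)])
    diff = coords[:, None, :] - coords[None, :, :]
    dist_sq = (diff ** 2).sum(axis=2)              # squared Euclidean distances
    W = np.zeros_like(dist_sq)
    off_diagonal = ~np.eye(side * side, dtype=bool)
    W[off_diagonal] = 1.0 / dist_sq[off_diagonal]  # diagonal stays at zero
    return W / W.sum(axis=1, keepdims=True)        # each row sums to one

C225 = grid_inverse_distance_weights(15)           # 225 observations
print(C225.shape)
```

The same function with `side=20` gives the matrix for the 400-observation design.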
Tables 1 and 2 present some preliminary simulation results for the outcome equation
of the SAE model for samples of size 225 and 400, respectively. In addition to presenting
simulation results for OLS, Heckit and Kelejian and Prucha’s (1998) estimator for the SAE
model (KP-SAE), Table 1 also shows simulation results for two diﬀerent versions of our
spatial heckit estimator (Spheck): one in which the generated instruments discussed above
are used (Spheck Gen) and one in which the Kelejian and Prucha instruments are employed
(Spheck KP). Results are presented for a number of models that diﬀer on the extent of
sample selection and spatial dependence which are listed in the first column of the table.
For each of these models, up to four parameters of interest are reported, which are listed
in the second column of the table. The following columns in the table are arranged in two
blocks that correspond to the average bias (BIAS) and the root-mean squared error (RMSE)
of the diﬀerent estimators, respectively.
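The two summary measures reported in the tables can be computed from the replication estimates as in the following Python sketch. The numbers used are made up for illustration and are not results from the experiment; the function name `bias_and_rmse` is hypothetical.

```python
import numpy as np

def bias_and_rmse(estimates, true_value):
    """Average bias and root-mean-squared error across Monte Carlo replications."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    return bias, rmse

# Made-up estimates of a parameter whose true value is 1.
bias, rmse = bias_and_rmse([0.9, 1.1, 1.3], true_value=1.0)
print(bias, rmse)
```

Since the true value of the slope coefficients in the design is 1, their bias can be read directly as a proportion of the true value.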
The first estimator reported is OLS, which ignores both features of the data: sample
selection and spatial dependence. As a result, OLS has large bias and RMSE that increase
as the amount of sample selection or spatial dependence increases. In general, though, OLS
is able to estimate β 1 (the coefficient on the variable that does not appear in the selection
equation) with relatively small bias compared to the other coefficients. This is due to the
fact that the variables x1 and x3 are generated independently, and thus there is little effect
of the sample selection on the coefficient on x3 (β 1 ).
11 We note that Kelejian and Prucha (1999) find that controlling or not for the number of neighbors per
unit when specifying a weighting matrix does not lead to significantly different results in their simulation
study.
The second estimator reported is Heckit, which accounts for sample selection but ignores
spatial dependence. The consequence of spatial dependence on this estimator is inconsistency, as explained above, since the probit model that is estimated in the first step is heteroskedastic. Therefore, we might expect that the bias and RMSE of the Heckit will increase
as the amount of sample selection increases. This is exactly what is found in Table 1, except
for the bias when moving from spatial dependence of 0.5 to 0.75 in the models with 50%
censoring, and the bias of the parameter β 1 . Compared to OLS, the Heckit estimator shows
a great improvement, even though it is also inconsistent. While the average bias and RMSE
of β 1 is very small and comparable to that of OLS, the other two coeﬃcients (β 0 and β 2 )
have larger bias: in the case of β 2 , the bias has the interpretation of percentage and varies
from 7.2% to 14.4%, and in the case of β 0 the bias ranges from -0.35 to -0.15. Interestingly,
the bias of these two coeﬃcients is of opposite sign compared to the bias in OLS. In contrast,
the RMSE of the Heckit estimator for β 0 and β 2 is about twice as large as that of OLS β 1 .
The third estimator reported is KP-SAE, which accounts for spatial dependence but ignores
the sample selection feature of the data, which would result in inconsistent parameter estimates. In agreement with this notion, the bias and RMSE increase as the amount of sample
selection increases, but also typically increase as the amount of spatial dependence increases.
Compared to the previous two estimators, the KP-SAE shows typically a higher bias: the
average bias of β 1 is higher than that of the other estimators (which are practically unbiased), and is up to 11.4%; while the bias of β 0 and β 2 is of about the same magnitude as
that of the OLS estimator and thus significantly larger than that of the Heckit estimator. In
the same way, the RMSE of the KP-SAE estimator is of about the same magnitude as that
of OLS and thus smaller than that of Heckit. KP-SAE is the first estimator that estimates
the SAE parameter γ, which shows bias and RMSE that increase with both the amount of
sample selection and spatial dependence. The bias of γ ranges from -0.77 to -0.257 (18% to
66% of the true γ), underestimating the true value in all cases, while the RMSE ranges from
13
0.152 to 0.289.
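The percentages quoted for γ can be reproduced from the KP-SAE column of Table 1. The short sketch below divides each reported bias by the true value of γ, assuming (as the model labels in Table 1 suggest) that the true γ equals the stated amount of spatial dependence in each design. The variable names are illustrative and do not come from the original simulation code.

```python
# KP-SAE bias of gamma as reported in Table 1 (N = 225), one entry per model.
# Assumption: the true gamma equals the spatial dependence level of the design.
true_gamma = [0.25, 0.50, 0.75, 0.25, 0.50, 0.75]
bias_gamma = [-0.077, -0.127, -0.137, -0.167, -0.242, -0.257]

# Relative bias in percent of the true parameter value.
relative_bias = [100 * abs(b) / g for b, g in zip(bias_gamma, true_gamma)]

for g, b, r in zip(true_gamma, bias_gamma, relative_bias):
    print(f"gamma = {g:.2f}  bias = {b:+.3f}  relative bias = {r:.1f}%")

smallest, largest = min(relative_bias), max(relative_bias)
```

The smallest and largest relative biases come out at roughly 18% and just under 67%, in line with the range stated in the text.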
The last two estimators reported are the two versions of our estimator, Spheck Gen
and Spheck KP, which are both consistent for all parameters in the model. We note that
simulation results for two models (sample selection of 50% and spatial dependence of 0.25
and 0.5) are not included in this preliminary set of results (see footnote 12). Both Spheck
estimators for β0 and β2 typically have smaller bias than the other three estimators, except
in the model with 50% selection, in which the bias of the Heckit estimator is smaller. On the
other hand, the Spheck estimators' bias of β1 is typically higher than that of the other
estimators, especially OLS and Heckit. This bias is small (between 1.3% and 5.1%) for the
models with 25% sample selection, but reaches up to 21% in the model with 50% sample
selection. In terms of bias of γ, the Spheck estimators have significantly smaller bias than
the KP-SAE estimator, except for the one specification with 25% sample selection and 0.25
spatial dependence. This bias ranges in absolute value between 0 and 0.098, or 0% to 39% of
the true γ.
In terms of RMSE of the three β coefficients, the Spheck estimators typically show values
that lie between the high RMSE of the Heckit estimator and the relatively low RMSE of OLS,
although two exceptions arise. First, the Spheck estimators show the smallest RMSE for the β1
parameter (except for the 50% selection specification). Second, the Spheck estimators have
the highest RMSE of β0 and β2 for the two models with 25% selection and spatial dependence
of 0.25 and 0.5. Looking at the RMSE of the SAE parameter γ, the Spheck estimators show
a considerable improvement over the KP-SAE estimator, and the improvement grows as the
model considered has more selection and higher spatial dependence.
Comparing the two sets of instruments used in the Spheck, they produce very similar
results in terms of both bias and RMSE in all four parameters, with those results using
generated instruments marginally better in several cases. This finding is important, for in
practice researchers might find it hard to come up with good instruments to use, whereas
the KP instruments are always available, and according to our results not much is lost by
employing them.
Table 2 presents the partial simulation results for N = 400 for the two specifications
with 25% censoring. Similar patterns as with N = 225 arise, although now the bias of the
Spheck estimator (using KP instruments) is considerably smaller than that of the other three
estimators in the two models reported. The only exception is the estimate of β1, for which
the other estimators have slightly smaller bias. The RMSE of the three β coefficients is
comparable to the RMSE of the other estimators (although slightly larger), which is notable
when compared to the simulation results from the smaller sample size, and suggests that
the precision of the Spheck estimator is acceptable for samples as small as 300 observations
(about 100 observations are censored).
12 The main problem with these models is non-convergence of the optimization routine in
several of the replications.
In summary, we regard this set of preliminary results as encouraging with respect to the
properties of the Spheck estimator. In particular, we are optimistic because the advantages
of our estimator already show up in these relatively small samples. Indeed, one of the
reasons the non-convergence issues mentioned above arise is the small sample size when 50%
censoring is applied: on average, estimation is undertaken with only 112 observations.
A number of tasks are still in progress. First, the completion of the simulations might shed
new light on other performance features of our estimator for the SAE model. Second, the
SAL estimator will also be simulated and compared to other estimators that do not account
for both sample selection and spatial dependence.
5 The Sample Selection Models with Spatial Dependence in Practice
In this section we discuss the empirical importance of taking into account sample selection
bias when estimating models that exhibit spatial dependence. Furthermore, we illustrate
the application of the two-step methods proposed in Section 3 using a rich data set from a
fishery, which is censored due to confidentiality issues (see Schnier, 2003).
McMillen (1995) motivates the pervasiveness of sample selection problems in spatial data,
in particular in urban economics and regional science. His main example deals with data on
land use and values in the city of Chicago during the 1920s (see references in McMillen, 1995).
In this case, unobserved variables that make a parcel more likely to receive residential zoning
may increase the value of residential land (McMillen, 1995). Other applications of sample
selection models with spatial data discussed in McMillen (1995) include models of housing
prices, rent and tenure choice (Goodman, 1988), oﬃce rents and lease provisions (Benjamin
et al., 1992), and home improvement choice (Montgomery, 1992) in urban economics; and the
choice between central city and suburban employment (McMillen, 1993) and the analysis of
earnings and migration (Borjas et al., 1992) in labor economics. In fact, the increasing
availability of geo-coded data makes methods that deal with sample selection when spatial
dependence is present even more relevant.
Our application in this section is yet another application of a sample selection model with
spatial dependence, in this case in the area of environmental economics, in particular in the
economics of fisheries. We estimate two diﬀerent simple models to illustrate the performance
of our spatial heckit estimators for the SAE and SAL models.
The application of the SAL model employs data on the locational choice of fishermen in
the Eastern Bering Sea to assess the factors that influence fishermen’s selection of locations
within a fishery (Schnier, 2003). Schnier’s study is the first to incorporate spatial dependence
in such choices and investigate the presence of herding behavior among fishermen. We provide
only a brief description of the data below. A more thorough description can be found in
Schnier (2003).
Fishermen in the Eastern Bering Sea are required to document their harvest according
to the statistical reporting areas used by the Alaska Department of Fish and Game and
the National Marine Fisheries Service (NMFS), who monitor the fisheries. These statistical
reporting areas divide the Eastern Bering Sea into grids that are one-half degree latitude
by one degree longitude. These grids define the micro-regions that parse the dataset. In
addition, macro-region variables, according to the six major regions into which the NMFS
divides the Eastern Bering Sea, are used as control variables.
There are two major classifications of vessels that participate in the Pacific cod fishery, the
catcher-vessel and the catcher-processor sectors. The catcher-vessel sector represents those
vessels that travel to and from their port after harvesting, whereas the catcher-processor
sector represents those vessels that can process their catch at sea following their harvest and
are thus able to stay at sea for a longer duration than the catcher-vessel sector.
The National Oceanic and Atmospheric Administration (NOAA) provides the data used
in the analysis. However, the dataset provided by NOAA has been screened for confidentiality.
Effort levels for locations that have had four or more vessels of similar characteristics in a
statistical reporting area are observed, whereas locations with fewer than four vessels are
not observed (see footnote 13). In the data used to estimate the spatial SAL model, which
corresponds to the year 2000, the amount of censoring is in the order of 45% of the 361
observations. For comparison, the model is estimated using the same four estimators
described in the previous section.
13 The different characteristics that are used to differentiate the vessels are whether or
not they are within the catcher-vessel or catcher-processor sector, the length class (small,
medium or large) and the type of gear used by the vessel.
Biomass estimates that are not a direct result of the harvest information are incorporated into the econometric model. These observations were obtained from the NMFS, which
conducts an annual biomass trawl survey in the early summer of each year to determine the
estimated densities of the fish populations within the Eastern Bering Sea. Other data used
in the analysis, such as measurements of depth within the statistical reporting area, were
obtained from nautical maps with bathymetric contours. Appendix A indicates the name of
the variables chosen and the source of the data used in the econometric model.
The use of a spatial weighting matrix is premised on the belief that fishermen's locational
choice is determined not only by the expected biomass in a location but also by the observed
effort exerted by other fishermen and the expected biomass in the surrounding locations.
In order to determine such a spatial weighting matrix, one must construct a belief about
which surrounding locations affect a specific locational observation. We use the following
common specification to assign spatial weights among locations:
$$c_{ij} = \frac{1}{d_{ij}^{2}},$$
where $c_{ij}$ is the spatial weight assigned to the distance between location i and
location j, and $d_{ij}$ is the Euclidean distance between locations i and j. The spatial
weights, $c_{ij}$, are row standardized such that the diagonal elements of the spatial
weighting matrix are all zero and the sum of any one row is one.
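As an illustration of the weighting scheme just described, the following minimal sketch builds an inverse-squared-distance matrix from a set of coordinates and row-standardizes it. The function name and the example coordinates are hypothetical, and the sketch assumes that no two observations share exactly the same location (otherwise the off-diagonal distance would be zero and the weight undefined).

```python
import numpy as np

def inverse_squared_distance_weights(coords):
    """Row-standardized spatial weights with c_ij = 1 / d_ij^2 and zero diagonal.

    Assumes all locations are distinct, so every off-diagonal distance is positive.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist_sq = np.sum(diff ** 2, axis=-1)      # squared Euclidean distances
    np.fill_diagonal(dist_sq, np.inf)         # 1 / inf = 0, so the diagonal is zero
    weights = 1.0 / dist_sq
    return weights / weights.sum(axis=1, keepdims=True)  # each row sums to one

# Hypothetical grid-cell centroids (x, y), for illustration only.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
W = inverse_squared_distance_weights(coords)
print(np.round(W, 3))
```

Because of the squared distance in the denominator, nearby locations dominate each row: in the example, the neighbor at distance 1 from the first location receives a much larger weight than the neighbors at distances 2 and about 3.16.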
The SAL model we estimate is the one in (5)-(6), specifying hauls as the dependent
variable in the outcome equation. The variables used in both the selection equation and
the outcome equation are listed in Appendix B. For identification purposes, the variable
DumCVM_i (which indicates catcher-vessel (CV_i) and medium class vessel (M_i)) is
suppressed from the estimation of the main equation. This variable was created because a
predominant number of vessel groups that possessed this classification did not possess any
haul information, y_1i = 0 (were censored), and could therefore be used to explain the
selection process. We implicitly assume that DumCVM_i does not enter the outcome equation,
which is plausible since CV_i and M_i are already included in this equation.
Results of the empirical application for the SAL model are in process...
The second empirical application illustrates the sample selection SAE model in the
estimation of the efficiency of the production process within a fishery, using the
catch-per-unit-effort (CPUE) as a measure of efficiency.
The CPUE is defined as the quantity of a target species caught per haul executed, where a
haul is one use of a trawl device, pot vessel, hook-and-line, jig, etc. Therefore, a higher
CPUE indicates
a higher degree of eﬃciency, given the inputs utilized. However, it is plausible to assume that
for a fixed level of technological inputs, the CPUE may vary over the spatial distribution
of the fishery and be correlated with the CPUE in surrounding locations. As a result, it is
desirable to estimate the CPUE controlling for the spatial relationships within the fishery.
To estimate this model, we use data from NMFS and NOAA for the Pacific cod fishery
within the Eastern Bering Sea for the year 1997, which has been screened for confidentiality.
This feature of the data makes it necessary to employ an estimation method that would
remove the bias arising from sample selection while at the same time accounting for spatial
dependence, such as our spatial heckit model with SAE dependence.
The model estimated is of the form (1)-(2) with the observation matrix containing data
for an aggregation of vessels of similar description and the CPUE as the dependent variable
in the outcome equation. CPUE is defined as the catch for the aggregate within location i
divided by the number of "unique hauls" executed in location i by the aggregate. Unique
hauls reflect the amount of effort that the aggregate exerted within the given location.
Variables within the observation matrix are vessel group (catcher-processors vs.
catcher-vessels), size of vessels (small, medium or large) and gear utilized (non-pelagic
trawl, bottom trawl, pelagic trawl, hook-and-line, pot vessel or jig gear). To augment this
data set, bathymetric measurements are added indicating maximum and minimum depth in the
area fished, as well as biomass estimates obtained from the annual biomass trawl survey
conducted by NMFS. This data set consists of 320 observations on 90 spatially distinct
regions within the Eastern Bering Sea, with a censoring rate of 35%.
Results of the empirical application for the SAE model are in process...
6 Conclusion
This paper proposes a method of estimation for two sample selection models with spatial
dependence that diﬀer in terms of the type of spatial correlation present. The first, a model
with spatial autoregressive errors (SAE), is shown to be analytically more challenging than
the second, a model with spatial autoregressive lags (SAL).
The method of estimation for both models is analogous to the popular heckit model
(and thus we call our estimator "spatial heckit"), in which consistent estimates of the
probability of observing a particular unit (selection equation) are first obtained using a
probit model. Then, the inverse Mills ratio is calculated for each unit and used as an
additional regressor that controls for the selection bias in the equation of interest
(outcome equation). Importantly, the estimator we propose for the model with SAE cannot
be computed with the popular two-step procedure of the heckit model, since the appropriate
inverse Mills ratio depends on the SAE parameter of the outcome equation. Instead, we
propose to estimate the whole model jointly by nesting the two equations into a "pseudo"
sequential GMM framework (Newey, 1984). For the SAL model this complication does not
arise, and thus the estimator can in principle be computed in two steps. However, an
advantage of using the joint GMM framework is that it yields the correct standard errors for
the parameters directly.
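For readers less familiar with the classical procedure that the spatial heckit generalizes, the sketch below implements the standard (non-spatial) two-step heckit on simulated data. It is not the estimator proposed in this paper: it ignores spatial dependence altogether, and the data generating process, parameter values, and function names are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_probit(Z, d):
    """Step 1: probit maximum likelihood for the selection equation."""
    sign = 2 * d - 1  # +1 for observed units, -1 for censored units

    def neg_loglik(alpha):
        return -np.sum(norm.logcdf(sign * (Z @ alpha)))

    return minimize(neg_loglik, np.zeros(Z.shape[1]), method="BFGS").x

def heckit_two_step(y, X, Z, d):
    """Step 2: OLS of y on X and the inverse Mills ratio, observed units only."""
    index = Z @ fit_probit(Z, d)
    imr = norm.pdf(index) / norm.cdf(index)   # classical inverse Mills ratio
    obs = d == 1
    X_aug = np.column_stack([X[obs], imr[obs]])
    coef, *_ = np.linalg.lstsq(X_aug, y[obs], rcond=None)
    return coef  # outcome coefficients followed by the IMR coefficient

# Hypothetical design: errors correlated across equations (rho = 0.7),
# with w excluded from the outcome equation for identification.
rng = np.random.default_rng(42)
n = 5000
x, w = rng.normal(size=n), rng.normal(size=n)
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)
Z = np.column_stack([np.ones(n), x, w])
X = np.column_stack([np.ones(n), x])
d = (Z @ np.array([0.5, 1.0, 1.0]) + errors[:, 0] > 0).astype(int)
y = X @ np.array([1.0, 1.0]) + errors[:, 1]

obs = d == 1
ols_coef, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
heckit_coef = heckit_two_step(y, X, Z, d)
print("OLS on selected sample:", np.round(ols_coef, 3))
print("Two-step heckit:       ", np.round(heckit_coef, 3))
```

In this design, OLS on the selected sample overstates the intercept because the outcome error has a positive conditional mean among observed units, while the heckit correction absorbs that term through the inverse Mills ratio. In the SAE case discussed above, the ratio would additionally depend on the spatial parameter, which is why the two steps cannot be separated.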
We explore the properties of our estimators by deriving their asymptotic properties,
conducting simulations, and applying them to actual data. The estimators are consistent
and asymptotically normally distributed. The simulations show the potential biases incurred
by other estimators that ignore sample selection, spatial dependence, or both, and also show
that our estimator is valuable when the data exhibits both of these characteristics. Finally,
the empirical application section argues that sample selection is a common occurrence in
spatial data sets typically available to researchers, and shows that our estimators are feasible
to use in practice.
While our proposed estimator is among the first to account for sample selection and
spatial dependence simultaneously, some shortcomings are worth mentioning. First, our
estimator relies on a distributional assumption (joint normality) of the error terms in the
selection and outcome equations, just as the heckit model does. Second, our estimator is
computationally more intensive than the available methods for spatial models without sample
selection, although it still compares favorably in this respect with estimation methods that
require approximation of multidimensional integrals. These shortcomings of our estimators
indicate areas for future research.
References
[1] Amemiya, T. 1974. The Nonlinear Two-Stage Least-Squares Estimator. Journal of
Econometrics, 2: 105-10.
[2] Amemiya, T. 1985. Advanced Econometrics. Cambridge, Mass.: Harvard University
Press.
[3] Andrews, D.W.K. 1992. Generic Uniform Convergence. Econometric Theory, 8: 241-57.
[4] Anselin, L. 1988. Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.
[5] Anselin, L. 2001. Spatial Econometrics. In B. H. Baltagi (ed.), A Companion to Theoretical Econometrics, Oxford: Blackwell.
[6] Anselin, L. and A. Bera. 1998. Spatial Dependence in Linear Regression Models with
an Introduction to Spatial Econometrics. In A. Ullah and D.E.A. Giles (eds.), Handbook
of Applied Economic Statistics. New York: Marcel Dekker.
[7] Benjamin, J.D., J. Sa-Aadu and J.D. Shilling. 1992. Influence of Rent Diﬀerentials on
the Choice Between Oﬃce Rent Contracts with and without Relocation Provisions.
Journal of American Real Estate and Urban Economics Association. 20: 289-302.
[8] Bernstein, S. 1927. Sur l’Extension du Theoreme du Calcul des Probabilites aux Sommes
de Quantities Dependantes. Mathematische Annalen. 97, 1-59.
[9] Beron, K. and W. Vijverberg. 1999. Probit in a Spatial Context: A Monte Carlo Analysis. Forthcoming in L. Anselin and R. Florax (eds.), Advances in Spatial Econometrics.
New York: Springer.
[10] Borjas, G.J., S.G. Bronars, and S. Trejo. 1992. Self-Selection and Internal Migration in
the United States. Journal of Urban Economics. 32: 159-85.
[11] Case, A.C. 1991. Spatial Patterns in Household Demand. Econometrica. 59: 953-965.
[12] Case, A.C. 1992. Neighborhood Influence and Technology Change. Regional Science
and Urban Economics. 22: 491-508.
[13] Casetti, E. 1972. Generating Models by the Expansion Method: Applications to Geographical Research. Geographical Analysis. 4: 81-91.
[14] Cosslett, S.R. 1991. Semiparametric Estimation of a Regression Model with Sample Selectivity. In W.A. Barnett, J. Powell and G. Tauchen (eds.), Nonparametric and Semiparametric Methods in Econometrics and Statistics. Cambridge: Cambridge University
Press.
[15] Davidson, J. 1994. Stochastic Limit Theory. Oxford University Press: Oxford.
[16] Fishback, P.V., W.C. Horrace and S. Kantor. 2002. Do Federal Programs Aﬀect Internal Migration? The Impact of New Deal Expenditures on Mobility During the Great
Depression. Working Paper Series, NBER, Number 8283.
[17] Fleming, Mark. 2002. Techniques for Estimating Spatially Dependent Discrete Choice
Models. Forthcoming in L. Anselin and R. Florax (eds.), Advances in Spatial Econometrics. New York: Springer.
[18] Goodman, A.C. 1988. An Econometric Model of Housing Price, Permanent Income,
Tenure Choice, and Housing Demand. Journal of Urban Economics. 23: 327-53.
[19] Heckman, J. 1976. The Common Structure of Statistical Models of Truncation, Sample
Selection, and Limited Dependent Variables and a Simple Estimator for such Models.
The Annals of Economic and Social Measurement. 5: 475-492.
[20] Heckman, J. 1979. Sample Selection Bias as a Specification Error. Econometrica. 47:
153-61.
[21] Kelejian, Harry H. and Ingmar R. Prucha. 1998. A Generalized Spatial Two-Stage Least
Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive
Disturbances. Journal of Real Estate Finance and Economics, 17: 99-121.
[22] Kelejian, Harry H. and Ingmar R. Prucha. 1999. A Generalized Moments Estimator
for the Autoregressive Parameter in a Spatial Model. International Economic Review,
40: 509-33.
[23] Kelejian, H.H. and I.R. Prucha. 2001. On the Asymptotic Distribution of the Moran I
Test Statistic with Applications. Journal of Econometrics. 104: 219-57.
[24] Lee, L.F. 2001a. Generalized Method of Moments Estimation of Spatial Autoregressive
Processes. Working paper, Ohio State University.
[25] Lee, L.F. 2001b. GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models. Working paper, Ohio State University.
[26] LeSage, J.P. 2000. Bayesian Estimation of Limited Dependent Variable Spatial Autoregressive Models. Geographical Analysis. 32: 19-35.
[27] Leung, S.F and S. Yu. 1996. On the Choice Between Sample Selection and Two-Part
Models. Journal of Econometrics. 72: 197-229.
[28] Maddala, G.S. 1983. Limited Dependent and Qualitative Variables in Econometrics.
Cambridge: Cambridge University Press.
[29] McLeish, D.L. 1974. Dependent Central-Limit Theorems and Invariance Principles. Annals of Probability. 2: 620-628.
[30] McMillen, D.P. 1992. Probit with Spatial Autocorrelation. Journal of Regional Science.
32: 335-48.
[31] McMillen, D.P. 1993. Can Blacks Earn More in the Suburbs? Racial Diﬀerences in
Intrametropolitan Earnings Variation. Journal of Urban Economics. 33: 167-88.
[32] McMillen, D.P. 1995. Selection Bias in Spatial Econometric Models. Journal of Regional
Science. 35: 417-36.
[33] Montgomery, C. 1992. Explaining Home Improvement in the Context of Household
Investment in Residential Housing. Journal of Urban Economics. 32: 326-50.
[34] Newey, W. 1984. A Method of Moments Interpretation of Sequential Estimators. Economics Letters. 14: 201-206.
[35] Pinkse, J. and M.E. Slade. 1998. Contracting in Space. An Application of Spatial Statistics to Discrete-Choice Models. Journal of Econometrics. 85: 125-54.
[36] Schnier, K. 2003. Because That’s Where the Fish Are? Spatial Analysis of Locational
Choice in the Eastern Bering Sea. Working Paper, University of Arizona.
[37] Topa, G. 2001. Social Interactions, Local Spillovers, and Unemployment. Review of
Economic Studies. 2: 133-138.
[38] Vella, F. 1998. Estimating Models with Sample Selection Bias: A Survey. Journal of
Human Resources. 33: 127-69.
Table 1. Simulation Results for N=225 - Outcome Equation

BIAS
(sel, γ)       Param.     OLS      Heckit    KP-SAE    Spheck Gen   Spheck KP
(25%, 0.25)    β0        0.305    -0.153     0.295      -0.109       -0.196
               β1       -0.001     0.001     0.009      -0.031       -0.051
               β2       -0.208     0.110    -0.200       0.074        0.133
               γ                            -0.077       0.098        0.091
(25%, 0.5)     β0        0.320    -0.199     0.315      -0.152       -0.229
               β1        0.008     0.010     0.033      -0.013       -0.032
               β2       -0.219     0.129    -0.229       0.113        0.143
               γ                            -0.127       0.000       -0.002
(25%, 0.75)    β0        0.397    -0.272     0.405      -0.203       -0.265
               β1       -0.004    -0.003     0.114      -0.025       -0.050
               β2       -0.243     0.134    -0.353       0.141        0.139
               γ                            -0.137      -0.072       -0.074
(50%, 0.25)    β0        0.561    -0.220     0.556
               β1       -0.011     0.000    -0.012
               β2       -0.302     0.109    -0.294
               γ                            -0.167
(50%, 0.5)     β0        0.579    -0.349     0.576
               β1       -0.004     0.006     0.011
               β2       -0.309     0.144    -0.302
               γ                            -0.242
(50%, 0.75)    β0        0.656    -0.221     0.685       0.404        0.407
               β1        0.002     0.013     0.047      -0.175       -0.216
               β2       -0.326     0.072    -0.362      -0.184       -0.151
               γ                            -0.257      -0.029       -0.037

RMSE
(sel, γ)       Param.     OLS      Heckit    KP-SAE    Spheck Gen   Spheck KP
(25%, 0.25)    β0        0.380     0.757     0.379       0.790        0.934
               β1        0.268     0.268     0.279       0.264        0.278
               β2        0.336     0.615     0.336       0.647        0.785
               γ                             0.152       0.136        0.133
(25%, 0.5)     β0        0.408     0.886     0.438       0.923        1.042
               β1        0.276     0.277     0.372       0.277        0.287
               β2        0.348     0.669     0.398       0.694        0.791
               γ                             0.171       0.087        0.083
(25%, 0.75)    β0        0.538     1.808     0.849       1.251        1.193
               β1        0.335     0.336     0.949       0.325        0.329
               β2        0.403     1.041     0.972       0.784        0.750
               γ                             0.167       0.101        0.104
(50%, 0.25)    β0        0.639     1.190     0.631
               β1        0.344     0.350     0.339
               β2        0.452     0.780     0.452
               γ                             0.221
(50%, 0.5)     β0        0.669     1.560     0.685
               β1        0.359     0.361     0.395
               β2        0.476     0.852     0.519
               γ                             0.276
(50%, 0.75)    β0        0.773     1.943     0.946       1.678        1.839
               β1        0.376     0.382     0.822       0.415        0.425
               β2        0.507     0.943     0.906       0.799        0.898
               γ                             0.289       0.067        0.072

Note: Simulation results are based on 100 replications.
Table 2. Simulation Results for N=400 - Outcome Equation

BIAS
(sel, γ)       Param.     OLS      Heckit    KP-SAE    Spheck KP
(25%, 0.25)    β0        0.329    -0.066     0.330      -0.065
               β1       -0.009    -0.006    -0.003      -0.064
               β2       -0.242     0.028    -0.255       0.018
               γ                            -0.086       0.089
(25%, 0.5)     β0        0.334    -0.069     0.357      -0.053
               β1       -0.009    -0.012    -0.019      -0.043
               β2       -0.234     0.036    -0.263       0.012
               γ                            -0.125      -0.001
(50%, 0.25)    β0
               β1
               β2
               γ
(50%, 0.5)     β0
               β1
               β2
               γ

RMSE
(sel, γ)       Param.     OLS      Heckit    KP-SAE    Spheck KP
(25%, 0.25)    β0        0.361     0.435     0.363       0.451
               β1        0.188     0.191     0.208       0.213
               β2        0.304     0.315     0.319       0.310
               γ                             0.139       0.108
(25%, 0.5)     β0        0.374     0.484     0.405       0.490
               β1        0.205     0.203     0.287       0.211
               β2        0.301     0.329     0.352       0.321
               γ                             0.157       0.050
(50%, 0.25)    β0
               β1
               β2
               γ
(50%, 0.5)     β0
               β1
               β2
               γ

Note: Simulation results are based on 50 replications.
7 Appendix A
The following provides a description of the variables used in the estimation procedure and
the sources from which they were obtained.
HAULS_i: The aggregate hauls for a given vessel group classification. (NOAA)
CP_i: A dummy variable for a catcher-processor vessel group. (NOAA)
CV_i: A dummy variable for a catcher-vessel vessel group. (NOAA)
L_i: A dummy variable for a large class vessel group (≥ 125 feet). (NOAA)
M_i: A dummy variable for a medium class vessel group (99-124 feet). (NOAA)
S_i: A dummy variable for a small class vessel group (< 99 feet). (NOAA)
DumCVM_i: A dummy variable created by crossing CV_ij with M_ij.
NPT_i: A dummy variable for whether or not the vessel group used non-pelagic trawl gear. (NOAA)
BTR_i: A dummy variable for whether or not a vessel group used a bottom trawl. (NOAA)
PTR_i: A dummy variable for whether or not a vessel group used pelagic trawl gear. (NOAA)
HAL_i: A dummy variable for whether or not a vessel group used hook and line gear. (NOAA)
POT_i: A dummy variable for whether or not a vessel group used pot vessels for gear. (NOAA)
JIG_i: A dummy variable for whether or not a vessel group used jigs as gear. (NOAA)
REGION1: A dummy variable for the region 1 classification of the Eastern Bering Sea. (NMFS)
REGION2: A dummy variable for the region 2 classification of the Eastern Bering Sea. (NMFS)
REGION3: A dummy variable for the region 3 classification of the Eastern Bering Sea. (NMFS)
REGION4: A dummy variable for the region 4 classification of the Eastern Bering Sea. (NMFS)
REGION5: A dummy variable for the region 5 classification of the Eastern Bering Sea. (NMFS)
REGION6: A dummy variable for the region 6 classification of the Eastern Bering Sea. (NMFS)
XCOORD_i: The X-coordinate at which observation ij was observed. (Nautical Maps and ADFG Stat Areas)
YCOORD_i: The Y-coordinate at which observation ij was observed. (Nautical Maps and ADFG Stat Areas)
MAXDEPTH_i: The maximum depth in location i. (Nautical Maps with Bathymetric Measurements)
MINDEPTH_i: The minimum depth in location i. (Nautical Maps with Bathymetric Measurements)
BIOMASS_i: The biomass estimate for observation ij. (NMFS)
LAGBIOMASS_i: The lagged biomass estimate for observation ij. (NMFS)
IMR_i: The inverse Mills ratio constructed from the selection equation.
8 Appendix B
The following is a list of the variables included in the empirical application of equations
(??)-(??).
Pacific cod:
x_1i: CP_i, L_i, DumCVM_i, NPT_i, BTR_i, HAL_i, POT_i,
REGION3, REGION4, REGION5, REGION6, Dum1995, Dum1996, Dum1997,
Dum1998, Dum1999, BIOMASS_i, LAGBIOMASS_i, MAXDEPTH_i, MINDEPTH_i
z_1i: CP_i, L_i, NPT_i, BTR_i, HAL_i, POT_i, REGION3,
REGION4, REGION5, REGION6, Dum1995, Dum1996, Dum1997, Dum1998,
Dum1999, BIOMASS_i, LAGBIOMASS_i, MAXDEPTH_i, MINDEPTH_i, IMR_i
9 Appendix C
The parameter estimate of the sample selection model with SAE dependence is obtained
from the solution to (15), where $g_N(\theta) = \frac{1}{N} z_N' \tilde{u}_N(\theta)$. Denote
$g(\theta) \equiv \lim_{N\to\infty} E[g_N(\theta)]$. Then, the unknown parameter vector
$\theta_0$ satisfies $\lim_{N\to\infty} E[g_N(\theta_0)] = 0$. Further, define the objective
function as $Q_N = g_N(\theta)' M_N g_N(\theta)$, where $M_N \overset{p}{\to} M$, and
$Q = g'(\theta) M g(\theta)$.
The proofs of all three propositions follow closely Pinkse and Slade (1998). The main
difference is that some extra conditions have to be verified for the additional moments
stacked in $g_N(\theta)$ and the estimated inverse Mills ratio (IMR).
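To make the structure of the objective function concrete, the sketch below minimizes a quadratic form of the same type, $g_N(\theta)' M_N g_N(\theta)$, but with simple linear instrumental-variable moments standing in for the moments of the spatial model. The moment function, the simulated data, and all names are assumptions made for illustration; only the quadratic-form structure mirrors the definition above.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, y, X, Z, M):
    """Q_N(theta) = g_N(theta)' M g_N(theta), with g_N = Z'u(theta) / N."""
    residual = y - X @ theta          # stand-in for the residual vector u_N(theta)
    g = Z.T @ residual / len(y)       # sample moment vector
    return g @ M @ g

# Hypothetical linear model with one endogenous regressor and two instruments.
rng = np.random.default_rng(0)
n = 2000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
v = rng.normal(size=n)
x = Z[:, 1] + 0.5 * Z[:, 2] + v
u = 0.8 * v + rng.normal(size=n)      # error correlated with x through v
X = np.column_stack([np.ones(n), x])
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + u

M = np.linalg.inv(Z.T @ Z / n)        # weighting matrix M_N
theta_hat = minimize(gmm_objective, np.zeros(2), args=(y, X, Z, M), method="BFGS").x

# With linear moments the minimizer has a closed form, used here as a check.
A = X.T @ Z @ M @ Z.T @ X
b = X.T @ Z @ M @ Z.T @ y
theta_closed_form = np.linalg.solve(A, b)
print(np.round(theta_hat, 4), np.round(theta_closed_form, 4))
```

In the spatial sample selection model the moments are nonlinear in θ, so no closed form is available and the numerical minimization is the only route. The linear case is used here because it lets the numerical solution be checked against a known answer.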
Assumptions
A1 $\theta_0$ is in the interior of the parameter space $\Theta$, which is a compact set.
A2 $Q$ is uniquely minimized at $\theta_0$.
A3 The vector valued function $g(\theta)$ is continuous.
A4 The density of observations in any region whose area exceeds a fixed minimum is
bounded.
A5 The elements of $z_N$ are uniformly bounded (see footnote 14).
A6 Let $d_{ij}$ denote the distance between locations i and j; then
$$\sup_{N,ij} |\mathrm{cov}(y_{1i}, y_{1j})| \le \alpha(d_{ij}), \qquad
\sup_{N,ij} |\mathrm{cov}(y_{2i}, y_{2j})| \le \alpha(d_{ij}),$$
and $\alpha(d) \to 0$ as $d \to \infty$.
A7 The moments $\mathrm{var}(u_{1i}) = \sigma_1^2 \sum_j [\omega^1_{ij}]^2$,
$\mathrm{var}(u_{2i}) = \sigma_2^2 \sum_j [\omega^2_{ij}]^2$, and
$E[u_{1i} u_{2i}] = \sigma_{12} \sum_j \omega^1_{ij} \omega^2_{ij}$
are uniformly bounded, bounded away from zero, and boundedly differentiable (with
respect to $\theta$).
A8 As $N \to \infty$, $M_N \overset{p}{\to} M$ for some positive definite matrix $M$.
A9 As $d \to \infty$, $d^2 \alpha(d d^*)/\alpha(d^*) \to 0$, for all fixed $d^* > 0$.
A10 The area in which the observations are located grows at a rate of $\sqrt{N}$ in both
directions.
A11 $\Psi_1(\theta_0) = \lim_{N\to\infty} E\{N g_N(\theta_0) g_N'(\theta_0)\}$ and
$\Psi_2(\theta_0) = [\partial g'(\theta_0)/\partial\theta] M [\partial g(\theta_0)/\partial\theta']$
are positive definite matrices.
A12 Let $\rho_{Nij}(\theta)$ be the covariance between $u_{Ni}/\sqrt{\mathrm{var}(u_{Ni})}$ and
$u_{Nj}/\sqrt{\mathrm{var}(u_{Nj})}$. For some fixed $N^* > 0$, $\rho_{Nij}(\theta)$ is
boundedly differentiable, uniformly in $\theta \in \Theta$, $N > N^*$, and $i \ne j$.
A13 $|\rho_{Nij}(\theta)|$ is bounded away from one from below, uniformly in
$\theta \in H(\theta_0)$, $N > N^*$, and $i \ne j$, with $H(\theta_0)$ some neighborhood of
$\theta_0$.
A14 As $N \to \infty$, $\frac{1}{N} \sum_{ij} |\rho_{Nij}(\theta)|$ is uniformly bounded in
$\theta \in H(\theta_0)$.
A discussion of the implications of these assumptions can be found in Pinkse and Slade
(1998).
14 $\lambda(\delta, \gamma)$ is an element of $z_N$ which will be shown below to be uniformly
bounded.
Proof of Proposition 1
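By assumption A2, $Q$ is uniquely minimized at $\theta_0$. Thus, we only need to establish
that $Q_N$ converges uniformly to $Q$ over the parameter space $\Theta$.
To show that $Q_N$ converges uniformly to $Q$ over $\Theta$, it suffices to show that
(a) $Q_N \overset{p}{\to} Q$ at all $\theta \in \Theta$ $\iff$ $g_N(\theta) \overset{p}{\to} g(\theta)$
at all $\theta \in \Theta$.
(b) $Q_N$ is stochastically equicontinuous and $Q$ is continuous on $\Theta$ $\iff$
$g_N(\theta)$ is stochastically equicontinuous and $g(\theta)$ is continuous on $\Theta$.
For (a), note that $g(\theta) \equiv \lim_{N\to\infty} E[g_N(\theta)]$, that is,
$E[g_N(\theta)] \to g(\theta)$. If $g_N(\theta) \overset{p}{\to} E[g_N(\theta)]$, then
$g_N(\theta) \overset{p}{\to} g(\theta)$, so we show the former.
Define the functions
$$\tau_{1i}(\psi_i(\theta)) \equiv \frac{\phi(\psi_i(\theta))}{\Phi(\psi_i(\theta))\,[1 - \Phi(\psi_i(\theta))]}$$
such that $\tilde{u}_{1i}(\theta) = \tau_{1i}(\psi_i(\theta))\,(y_i - \Phi(\psi_i(\theta)))$.
Let $\tau_i(\psi_i(\theta)) \equiv (\tau_{1i}(\psi_i(\theta))', 1)'$, where 1 is a conformable
vector of ones, such that
$$\tilde{u}_N(\theta) = \big( (\tau_1(\psi(\theta))\,(y - \Phi(\psi(\theta))))',\; 1\,u_{2N}(\theta)' \big)'.$$
Then,
$$\lim_{N\to\infty} E\,|g_N(\theta) - E[g_N(\theta)]|^2
= \lim_{N\to\infty} \frac{1}{N^2} \sum_{ij} z_i' \tau_{Ni} \tau_{Nj} z_j\, \mathrm{cov}[y_i, y_j]
\le \lim_{N\to\infty} \frac{1}{N^2} \sum_{ij} C\, \alpha(d_{ij}) = 0,$$
where the inequality follows from assumptions A5-A7 and the limit is zero since
$\alpha(d_{ij}) \to 0$ as $d \to \infty$ by assumption A6. Therefore, since
$g_N(\theta) \overset{p}{\to} E[g_N(\theta)]$, then $g_N(\theta) \overset{p}{\to} g(\theta)$.
For (b), given that by assumption A3 $g(\theta)$ is continuous, we need to show that
$g_N(\theta)$ is stochastically equicontinuous.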
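Using the mean value theorem for $\theta^*$ between $\theta$ and $\tilde\theta$ we rewrite
$$g_N(\theta) - g_N(\tilde\theta) = \frac{1}{N} z_N' \{\tilde{u}_N(\theta) - \tilde{u}_N(\tilde\theta)\}
= \frac{1}{N} \sum_i z_i' \frac{\partial \tilde{u}_{Ni}(\theta^*)}{\partial\theta'} (\theta - \tilde\theta).$$
Following Andrews (1992), stochastic equicontinuity is implied by
$$\sup_{\theta\in\Theta} \left| \frac{1}{N} \sum_i z_i' \frac{\partial \tilde{u}_{Ni}(\theta)}{\partial\theta'} \right| = O_p(1).$$
Recall that $\tau_i(\psi(\theta)) \equiv (\tau_{1i}(\psi(\theta))', 1)'$ with
$\tau_{1i}(\psi_i(\theta)) \equiv \phi[\psi_i(\theta)] / \big(\Phi[\psi_i(\theta)]\{1 - \Phi(\psi_i(\theta))\}\big)$.
Similarly, define $h_i(\theta) \equiv ((y_i - \Phi(\psi_i(\theta)))', \tilde{u}_{2i}(\theta)')'$,
such that $u_{Ni}(\theta) = \tau_i(\psi_i(\theta))\, h_i(\theta)$ and the derivative becomes:
$$\frac{\partial \tilde{u}_{Ni}(\theta)}{\partial\theta'} =
\left[ \frac{\partial\tau_i(\psi_i(\theta))}{\partial\theta} h_i(\theta)
- \frac{\partial h_i(\theta)}{\partial\theta} \tau_i(\psi_i(\theta)) \right]
\frac{\partial\psi_i(\theta)}{\partial\theta}$$
where
$$\frac{\partial\tau_i(\psi_i(\theta))}{\partial\theta}
= \frac{\partial}{\partial\theta} \begin{bmatrix} \tau_{1i}(\psi_i(\theta))' \\ 1 \end{bmatrix}
= \begin{bmatrix} \dfrac{\partial}{\partial\theta} \left[ \dfrac{\phi[\psi_i(\theta)]}{\Phi[\psi_i(\theta)]\{1 - \Phi(\psi_i(\theta))\}} \right]' \\ 0 \end{bmatrix},$$
$$\frac{\partial h_i(\theta)}{\partial\theta}
= \frac{\partial}{\partial\theta} \begin{bmatrix} [y_1 - \Phi[\psi_i(\theta)]]' \\ [y_2 - \beta_0 - x_{2i}'\beta_1 - \mu\lambda_i(\delta,\gamma)]' \end{bmatrix}
= \begin{bmatrix} [-\phi[\psi_i(\theta)]]' \\ \left[ -1 - x_{2i}' - \mu \dfrac{\partial\lambda_i(\delta,\gamma)}{\partial\theta} - \lambda_i(\delta,\gamma) \right]' \end{bmatrix},$$
$$\frac{\partial\psi_i(\theta)}{\partial\theta}
= \frac{\partial}{\partial\theta} \left[ \frac{\alpha_0 + x_{1i}'\alpha_1}{\sqrt{\sum_j [\omega^1_{ij}(\theta)]^2}} \right]
= \left[ \sqrt{\sum_j [\omega^1_{ij}(\theta)]^2}\,(1 + x_{1i}')
- (\alpha_0 + x_{1i}'\alpha_1) \frac{\partial \sqrt{\sum_j [\omega^1_{ij}(\theta)]^2}}{\partial\theta} \right]
\cdot \left( \sum_j [\omega^1_{ij}(\theta)]^2 \right)^{-1}.$$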
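Then the task is to show that each of the parts of
$\partial \tilde{u}_{Ni}(\theta)/\partial\theta'$ is bounded uniformly. Following
Pinkse and Slade (1998), we first establish
$$\sup_{y_1 \in \{0,1\};\; t \in \mathbb{R},\; y_2 \in \mathbb{R}}
\left| \frac{\partial\tau(t)}{\partial t} h(t) - \frac{\partial h(t)}{\partial t} \tau(t) \right| < \infty$$
for which it is enough to show that (i) $\partial\tau(\theta)/\partial\theta$ and (ii)
$[\partial h(t)/\partial\theta]\,\tau(t)$ are bounded uniformly in $t$.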
1 (t)
For (i), given that the only non-zero components of ∂τ∂θ(θ) are those of ∂τ∂θ
, we concentrate
on those. These components are the same as in Pinkse and Slade (1998) setup and thus their
same arguments apply. Note that:
·
½
¾¸
1
φ (t)
φ (t)
φ2 (t)
∂τ 1 (θ)
=
−t − 2
∂θ
Φ (t) 1 − Φ (t) 1 − Φ (t)
Φ (t) (1 − Φ (t))
27

and the only places it can be unbounded are at ±∞. Since the expression is an even function,
φ(t)
and rewrite the above expression as:
it suﬃces to check t → ∞. Define Υ (t) = 1−Φ(t)
∂τ 1 (t)
(Υ (t) − t)2 + t(Υ (t) − t) − φ (t) (Υ (t) − t)/Φ (t) − tφ (t) /Φ (t)
=
∂t
Φ (t)
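and noting that as $t \to \infty$, $\Phi(t) \to 1$ and $t\phi(t) \to 0$, and that

$$1-\Phi(w) = \int_w^\infty \phi(t)\,dt = \int_w^\infty \frac{t\phi(t)}{t}\,dt = \frac{\phi(w)}{w}\left\{1 - \frac{1}{w^2} + O\!\left(w^{-4}\right)\right\}$$

as $w \to \infty$, the remaining term is

$$\Upsilon(t) - t = \frac{t}{1 - 1/t^2 + O(t^{-4})} - t = \frac{1}{t} + O(t^{-3})$$

as $t \to \infty$. Thus, analyzing

$$\frac{\partial\tau_1(t)}{\partial t} = \frac{\left[\frac{1}{t} + O(t^{-3})\right]^2 + t\left[\frac{1}{t} + O(t^{-3})\right] - \phi(t)\left[\frac{1}{t} + O(t^{-3})\right]/\Phi(t) - t\phi(t)/\Phi(t)}{\Phi(t)} \to 1$$

and hence $\partial\tau_1(t)/\partial t$ is bounded.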
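The boundedness of $\partial\tau_1(t)/\partial t$ and its limit as $t \to \infty$ can be checked numerically. The sketch below is only an illustration and is not part of the original argument: the function names (`phi`, `Phi`, `upsilon`, `tau1`, `dtau1`, `mills_relative_error`) are hypothetical, and `dtau1` uses the factored form $\tau_1(t)\,[\Upsilon(t) - t - \phi(t)/\Phi(t)]$, which is algebraically equal to the expression written in terms of $\Upsilon(t) - t$.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal CDF via erfc, which stays accurate in both tails."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def upsilon(t):
    """Upsilon(t) = phi(t) / (1 - Phi(t)), using 1 - Phi(t) = Phi(-t)."""
    return phi(t) / Phi(-t)

def tau1(t):
    """tau_1(t) = phi(t) / (Phi(t) * (1 - Phi(t))) = Upsilon(t) / Phi(t)."""
    return upsilon(t) / Phi(t)

def dtau1(t):
    """Derivative of tau_1 in the factored form tau_1 * (Upsilon - t - phi/Phi)."""
    return tau1(t) * (upsilon(t) - t - phi(t) / Phi(t))

def mills_relative_error(w):
    """Relative error of the tail approximation phi(w)/w * (1 - 1/w^2)."""
    approx = phi(w) / w * (1.0 - 1.0 / w ** 2)
    return abs(approx / Phi(-w) - 1.0)

if __name__ == "__main__":
    # The derivative should approach 1 and Upsilon(t) - t should behave like 1/t.
    for t in (0.0, 1.0, 5.0, 10.0, 30.0):
        print(t, dtau1(t), upsilon(t) - t)
```

The grid is kept within $|t| \le 30$ because the tail probability underflows in double precision for much larger arguments; that restriction is a numerical convenience, not a feature of the proof.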
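For (ii), that is, $\frac{\partial h(t)}{\partial t}\tau(t)$, recall that

$$\frac{\partial h(t)}{\partial t} = \begin{bmatrix}[-\phi(t)]' \\ -\left[1 - x_2' - \mu\dfrac{\partial\lambda(\delta,\gamma)}{\partial\theta} - \lambda(\delta,\gamma)\right]'\end{bmatrix} \quad\text{and}\quad \tau(t) = \begin{bmatrix}\tau_1(t)' \\ 1\end{bmatrix}$$

and note it can be written as

$$\frac{\partial h(t)}{\partial t}\tau(t) = \begin{bmatrix}-\phi(t)\,\tau_1(t) \\ -\left[1 - x_2' - \mu\dfrac{\partial\lambda(t)}{\partial t} - \lambda(t)\right]\end{bmatrix} = \begin{bmatrix}-\phi(t)\,\dfrac{1}{\Phi(t)}\,[\Upsilon(t) - t + t] \\ -\left[1 - x_2' - \mu\dfrac{\partial\lambda(t)}{\partial t} - \lambda(t)\right]\end{bmatrix}.$$

The first component, $\phi(t)\frac{1}{\Phi(t)}[\Upsilon(t) - t + t]$, is bounded by the same arguments above: $\Phi(t) \to 1$, $(\Upsilon(t) - t) = t^{-1} + O(t^{-3})$ and $\phi(t) \to 0$. So it remains to check that the second component is bounded: $\left[1 - x_2' - \mu\frac{\partial\lambda(t)}{\partial t} - \lambda(t)\right]$.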
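Take

$$\lambda(t) = \frac{\sum_j \omega^1_{ij}\,\omega^2_{ij}}{\sum_j[\omega^1_{ij}]^2}\cdot\frac{\phi(t)}{\{1-\Phi(t)\}} = \frac{\sum_j \omega^1_{ij}\,\omega^2_{ij}}{\sum_j[\omega^1_{ij}]^2}\,\Upsilon(t).$$

Since $\Upsilon(t) = \dfrac{1}{1/t - 1/t^3 + (1/t)\,O(t^{-4})}$ as $t \to \infty$ and by assumption A7, $\lambda(t)$ is bounded.

Taking $\dfrac{\partial\lambda(t)}{\partial t} = \dfrac{\sum_j \omega^1_{ij}\,\omega^2_{ij}}{\sum_j[\omega^1_{ij}]^2}\cdot\dfrac{\partial\Upsilon(t)}{\partial t}$, by assumption A7 the first term is bounded, while using results above, the second term:

$$\frac{\partial}{\partial t}\left[\frac{\phi(t)}{1-\Phi(t)}\right] = \frac{\{1-\Phi(t)\}\,\phi(t)(-t) - \phi(t)(-\phi(t))}{[\{1-\Phi(t)\}]^2} = \frac{\phi^2(t) - \phi(t)\,t\,(1-\Phi(t))}{[\{1-\Phi(t)\}]^2}$$

$$= \frac{\phi^2(t)}{\phi^2(t)\,[1/t - 1/t^3 + (1/t)\,O(t^{-4})]^2} - \frac{[1 - 1/t^2 + O(t^{-4})]}{[1/t - 1/t^3 + (1/t)\,O(t^{-4})]^2}$$

and thus $\frac{\partial}{\partial t}\Upsilon(t)$ is bounded and therefore $\frac{\partial\lambda(t)}{\partial t}$ is bounded and the expression $\frac{\partial h(t)}{\partial t}$ is also bounded.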
Note also that the term

$$h(t) = \begin{bmatrix} y_1 - \Phi(t) \\ y_2 - \beta_0 - x_2'\beta_1 - \mu\lambda(t)\end{bmatrix}$$

is also bounded as $\Phi(t) \to 1$ and $\lambda(t)$ is bounded.
By assumption A5 the elements of zN are bounded and thus for some constant C:
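$$\sup_{\theta\in\Theta}\left|\frac{1}{N}\sum_{i} z_i'\,\frac{\partial\tilde u_{Ni}(\theta)}{\partial\theta'}\right| \le C\,\sup_{\theta\in\Theta}\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{\partial\psi_i(\theta)}{\partial\theta}\right\|.$$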
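Finally, checking $\frac{\partial\psi_i(\theta)}{\partial\theta}$ with previous results,

$$\frac{\partial\psi_i(\theta)}{\partial\theta} = \left[\sqrt{\sum_j[\omega^1_{ij}(\theta)]^2}\,(1 + x_{1i}') - (\alpha_0 + x_{1i}'\alpha_1)\frac{\partial\sqrt{\sum_j[\omega^1_{ij}(\theta)]^2}}{\partial\theta}\right]\cdot\left[\sum_j[\omega^1_{ij}(\theta)]^2\right]^{-1}.$$

Since $x_{1i}$ is bounded by A5 and the terms in the sums are also bounded by A7, then $\sup_{\theta\in\Theta}\frac{1}{N}\sum_{i=1}^{N}\left\|\frac{\partial\psi_i(\theta)}{\partial\theta}\right\|$ is bounded by A6 and A1. QED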
Proof of Proposition 2
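The first order conditions based on the objective function $Q_N = g_N'(\theta)\,M_N\,g_N(\theta)$ are given by $\frac{\partial Q_N(\hat\theta)}{\partial\theta} = 0$. Using the mean value theorem for $\theta^*$ between $\hat\theta$ and $\theta_0$ we can write:

$$\left(\hat\theta_{GMM} - \theta_0\right) = -\left[\frac{\partial^2 Q_N(\theta^*)}{\partial\theta\,\partial\theta'}\right]^{-1}\frac{\partial Q_N(\theta_0)}{\partial\theta} \qquad (21)$$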
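The mean value expansion can be illustrated with a case where it holds exactly. The sketch below is not the estimator studied here: it assumes a scalar parameter, two linear moment conditions $g_k(\theta) = \frac{1}{N}\sum_i z_{ik}(y_i - x_i\theta)$, and a diagonal weighting matrix, so that $Q_N$ is quadratic and its second derivative is constant. The function name `linear_gmm_expansion` and all simulated data are hypothetical.

```python
import random

def linear_gmm_expansion(n=200, theta0=1.5, seed=0):
    """Return (theta_hat, theta0 - H^{-1} dQ(theta0)) for a linear GMM toy model."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Two instruments per observation, both correlated with x (assumed design).
    z = [(xi + rng.gauss(0.0, 1.0), 0.5 * xi + rng.gauss(0.0, 1.0)) for xi in x]
    y = [theta0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    m = (1.0, 2.0)  # diagonal of the weighting matrix M_N (assumed)

    # g_k(theta) = b_k - a_k * theta, with a_k and b_k the sample cross moments.
    a = [sum(zi[k] * xi for zi, xi in zip(z, x)) / n for k in range(2)]
    b = [sum(zi[k] * yi for zi, yi in zip(z, y)) / n for k in range(2)]

    g0 = [b[k] - a[k] * theta0 for k in range(2)]               # g_N(theta0)
    grad0 = -2.0 * sum(m[k] * a[k] * g0[k] for k in range(2))   # dQ_N/dtheta at theta0
    hess = 2.0 * sum(m[k] * a[k] ** 2 for k in range(2))        # d2Q_N/dtheta2

    # Closed-form minimizer of the quadratic objective.
    theta_hat = (sum(m[k] * a[k] * b[k] for k in range(2))
                 / sum(m[k] * a[k] ** 2 for k in range(2)))
    return theta_hat, theta0 - grad0 / hess

if __name__ == "__main__":
    print(linear_gmm_expansion())
```

In this toy model the two returned numbers coincide up to rounding, which is the scalar version of the expansion with the second derivative evaluated at any intermediate point.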
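The expression that corresponds to the second derivative of the objective function $Q_N$ can be written as:

$$\frac{\partial^2 Q_N(\theta)}{\partial\theta\,\partial\theta'} = \frac{2}{N^2}\left[\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\partial^2\tilde u_{Ni}(\theta)}{\partial\theta\,\partial\theta'}\,z_i' M_N z_j\,\tilde u_{Nj}(\theta) + \sum_{i=1}^{N}\sum_{j=1}^{N}\frac{\partial\tilde u_{Ni}(\theta)}{\partial\theta}\,z_i' M_N z_j\,\frac{\partial\tilde u_{Nj}(\theta)}{\partial\theta'}\right].$$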
We start by analyzing the convergence properties of this second derivative of the objective
function. The following lemma will be useful.
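Lemma 1 (Pinkse and Slade (1998) Lemma A3)
For any $\tilde\theta$ consistent for $\theta_0$, i.e. $\tilde\theta \overset{p}{\longrightarrow} \theta_0$:

(a) $\dfrac{\partial g_N(\tilde\theta)}{\partial\theta'} \overset{p}{\longrightarrow} \dfrac{\partial g(\theta_0)}{\partial\theta'}$

(b) $g_N(\tilde\theta) \overset{p}{\longrightarrow} g_N(\theta_0)$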
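Proof:
(a) Need to show that for $\omega$ such that $\|\omega\| = 1$,

$$\omega'\left[\frac{\partial g_N(\tilde\theta)}{\partial\theta'} - \frac{\partial g_N(\theta_0)}{\partial\theta'}\right] \overset{p}{\longrightarrow} 0$$

as $\lim_{N\to\infty}\frac{\partial g_N(\theta_0)}{\partial\theta'} = \frac{\partial g(\theta_0)}{\partial\theta'}$ follows from $g_N(\theta_0) \overset{p}{\longrightarrow} g(\theta_0)$ and A3.

Setting $\bar z_i = \omega' z_i$ and using the mean value theorem, the above expression can be written as

$$\frac{1}{N}\sum_{i=1}^{N}\bar z_i\left[\frac{\partial\tilde u_{Ni}(\tilde\theta)}{\partial\theta'} - \frac{\partial\tilde u_{Ni}(\theta_0)}{\partial\theta'}\right] = \left(\tilde\theta - \theta_0\right)'\,\frac{1}{N}\sum_{i=1}^{N}\bar z_i\,\frac{\partial^2\tilde u_{Ni}(\theta^*)}{\partial\theta\,\partial\theta'}$$

where $\theta^*$ is between $\tilde\theta$ and $\theta_0$.

Analyzing the last expression, $\frac{1}{N}\sum_{i=1}^{N}\bar z_i\frac{\partial^2\tilde u_{Ni}(\theta^*)}{\partial\theta\,\partial\theta'}$ is bounded since by A5 $\bar z_i$ is uniformly bounded (including $\lambda(\delta,\gamma)$, which was shown in the proof of proposition 1) and $\frac{\partial^2\tilde u_{Ni}(\theta^*)}{\partial\theta\,\partial\theta'}$ is also bounded. Therefore, since $\tilde\theta \overset{p}{\longrightarrow} \theta_0$, $(\tilde\theta - \theta_0) \overset{p}{\longrightarrow} 0$ and thus

$$\omega'\left[\frac{\partial g_N(\tilde\theta)}{\partial\theta'} - \frac{\partial g_N(\theta_0)}{\partial\theta'}\right] \overset{p}{\longrightarrow} 0 \qquad \blacksquare$$

(b) The proof is analogous to part (a).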
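Now returning to analyzing the second derivative of $Q_N(\theta)$, note that $\sum_{j=1}^{N} z_j\,\tilde u_{Nj}(\theta) = o_p(N)$ from (a) in the proof of proposition 1. Furthermore, by lemma 1 $g_N(\theta^*) \overset{p}{\longrightarrow} g_N(\theta_0)$ and also $\frac{1}{N}\sum_{i=1}^{N}\frac{\partial^2\tilde u_{Ni}(\theta^*)}{\partial\theta\,\partial\theta'}\,z_i'\omega$ is bounded in probability $\forall\,\|\omega\| = 1$. Then, the first term inside the square brackets will vanish asymptotically.

Considering the remaining term of $\frac{\partial^2 Q_N(\theta)}{\partial\theta\,\partial\theta'}$, looking at $\frac{1}{N}\sum_{i=1}^{N} z_i\frac{\partial\tilde u_{Ni}(\tilde\theta)}{\partial\theta'}$, it will converge in probability to $\frac{\partial g(\theta_0)}{\partial\theta'}$ by lemma 1. Finally, since $M_N \overset{p}{\to} M$ by A8,

$$\frac{\partial^2 Q_N(\theta^*)}{\partial\theta\,\partial\theta'} \overset{p}{\to} \left[\frac{\partial g'(\theta_0)}{\partial\theta}\right] M \left[\frac{\partial g(\theta_0)}{\partial\theta'}\right] \equiv \Psi_2(\theta_0).$$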
Now we turn to the term $\frac{\partial Q_N(\theta_0)}{\partial\theta}$ in equation (21), which is equal to $\frac{\partial Q_N(\theta_0)}{\partial\theta} = 2\,\frac{\partial g_N'(\theta_0)}{\partial\theta}\,M_N\,g_N(\theta_0)$ and it follows from previous results that $\frac{\partial g_N'(\theta_0)}{\partial\theta} \overset{p}{\longrightarrow} \frac{\partial g'(\theta_0)}{\partial\theta}$.

The remaining task is to show that $g_N(\theta_0) \to N(0, \Psi_1(\theta_0))$. To do this, we again closely follow the strategy of Pinkse and Slade (1998) and employ Bernstein's (1927) blocking method using McLeish's (1974) central limit theorem for dependent processes in Davidson (1994, chapter 24).

Start by defining $Y_{0N} = \omega'\{E[N g_N(\theta_0)\,g_N'(\theta_0)]\}^{-\frac{1}{2}}\sqrt{N}\,g_N(\theta_0) = \frac{1}{\sqrt{N}}\sum_{t=1}^{N} A_{Nt}$ for implicitly defined $A_{Nt}$ and $\forall\,\|\omega\| = 1$, and thus the task is to show $Y_{0N} \overset{d}{\to} N(0, 1)$.
Following Davidson (1994, chapter 24), split the region in which the observations are located into $a_N$ areas of size $c_1\sqrt{b_N} \times c_2\sqrt{b_N}$ each, where $a_N$ and $b_N$ are integers such that $a_N b_N = N$.

Without loss of generality, set $c_1 = c_2 = 1$ and let $a_N$ and $b_N$ be such that $\alpha(b_N)\,a_N \to 0$ and $N^{l-\frac{1}{2}}\,b_N < 1$ uniformly in $N$ for some fixed $0 < l < \frac{1}{2}$.

Define the set of indices $\Lambda_{Nj}$ that correspond to observations in area $j$. By assumption, a number $C > 0$ exists such that $\max_j |\Lambda_{Nj}| < C\,b_N$, where $|\cdot|$ applied to sets denotes the cardinality of that set. Define $D_{Nj} = \frac{1}{\sqrt{N}}\sum_{t\in\Lambda_{Nj}} A_{Nt}$ such that $Y_{0N} = \sum_{j=1}^{a_N} D_{Nj}$.
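The blocking construction can be sketched in a few lines. The example below is an illustration under simplifying assumptions that are not in the text: observations sit on a square lattice, the blocks are squares of side `block_side` (so $b_N$ = `block_side**2`), and the $A_{Nt}$ are replaced by independent bounded draws purely as placeholders. The function name `block_sums` and its parameters are hypothetical.

```python
import math
import random

def block_sums(side=12, block_side=3, seed=0):
    """Partition a side x side lattice into square blocks and return (Y_0N, [D_Nj])."""
    if side % block_side != 0:
        raise ValueError("side must be a multiple of block_side")
    rng = random.Random(seed)
    n = side * side
    # Placeholder for the A_Nt: one bounded draw per lattice site (assumption).
    a = {(r, c): rng.uniform(-1.0, 1.0) for r in range(side) for c in range(side)}

    blocks_per_side = side // block_side
    d = []
    for br in range(blocks_per_side):
        for bc in range(blocks_per_side):
            # Index set Lambda_Nj: all sites that fall in block (br, bc).
            sites = [(r, c)
                     for r in range(br * block_side, (br + 1) * block_side)
                     for c in range(bc * block_side, (bc + 1) * block_side)]
            d.append(sum(a[s] for s in sites) / math.sqrt(n))

    y0n = sum(a.values()) / math.sqrt(n)
    return y0n, d

if __name__ == "__main__":
    y0n, d = block_sums()
    print(len(d), y0n, sum(d))
```

With the default arguments there are $a_N = 16$ blocks of $b_N = 9$ sites each, and the block sums add up to $Y_{0N}$ because the blocks partition the index set.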
Following Davidson's (1994) theorem 24.1, McLeish's (1974) CLT requires that the following conditions hold.

(a) $T_{N a_N} = \prod_{j=1}^{a_N}(1 + i\lambda D_{Nj})$ is uniformly integrable in $N > N^*$ for some fixed $N^*$ and $\lambda > 0$.

(b) $E[T_{N a_N}] - 1 \to 0$

(c) $\sum_{j=1}^{a_N} D_{Nj}^2 - 1 \overset{p}{\to} 0$

(d) $\max_j |D_{Nj}| \overset{p}{\to} 0$
Before establishing each of these conditions, we note that for sufficiently large $N$, $A_{Nt}$ is bounded, since $\Psi_1(\theta_0)$ is positive definite (p.d.) and thus $E[N g_N(\theta_0)\,g_N'(\theta_0)]$ is p.d. and its inverse is bounded. We establish each of the above conditions in turn.
(a) It needs to be shown that for some fixed $N^*$,

$$\sup_{N>N^*} E\left|T_{N a_N}\,I\{|T_{N a_N}| > K\}\right| \to 0 \quad\text{as } K \to \infty$$

where $I$ is an indicator function.

Following Pinkse and Slade (1998), we begin by showing that $P\left[\sup_{N>N^*}|T_{N a_N}| > K\right] = 0$ for some $K > 0$, which will imply the above condition. Note that

$$\begin{aligned}
P\left[\sup_{N>N^*}|T_{N a_N}| > K\right] &= P\left[\sup_{N>N^*}\left|\prod_{j=1}^{a_N}(1 + i\lambda D_{Nj})\right| > K\right] \\
&\le P\left[\sup_{N>N^*}\prod_{j=1}^{a_N}\left(1 + \lambda^2 D_{Nj}^2\right)^{\frac{1}{2}} > K\right] \\
&= P\left[\sup_{N>N^*}\prod_{j=1}^{a_N}\left(1 + \lambda^2 D_{Nj}^2\right)^{\frac{1}{2}} > K \,\Big|\, \sup_{N>N^*,\,j} N^l|D_{Nj}| \le C\right]\cdot P\left[\sup_{N>N^*,\,j} N^l|D_{Nj}| \le C\right] \\
&\quad + P\left[\sup_{N>N^*}\prod_{j=1}^{a_N}\left(1 + \lambda^2 D_{Nj}^2\right)^{\frac{1}{2}} > K \,\Big|\, \sup_{N>N^*,\,j} N^l|D_{Nj}| > C\right]\cdot P\left[\sup_{N>N^*,\,j} N^l|D_{Nj}| > C\right] \\
&\le P\left[\sup_{N>N^*}\prod_{j=1}^{a_N}\left(1 + \lambda^2 D_{Nj}^2\right)^{\frac{1}{2}} > K \,\Big|\, \sup_{N>N^*,\,j} N^l|D_{Nj}| \le C\right] + P\left[\sup_{N>N^*,\,j} N^l|D_{Nj}| > C\right]
\end{aligned}$$

with $C$ a uniform upper bound to the $A_{Nt}$'s. Note that the first summand is bounded by $\sup_{N>N^*} I\left\{\left(1 + \lambda^2 C^2 N^{-2l}\right)^{a_N/2} > K\right\}$, which is zero for sufficiently large $K$, and that the second summand is bounded by $P\left[\sup_{N>N^*,\,t} N^{l-\frac{1}{2}}\,b_N\,|A_{Nt}| > C\right] = 0$ since $N^{l-\frac{1}{2}}\,b_N < 1$ by construction. Therefore, $P\left[\sup_{N>N^*}|T_{N a_N}| > K\right] = 0$ for some $K > 0$, which in turn implies $\sup_{N>N^*} E\left|T_{N a_N}\,I\{|T_{N a_N}| > K\}\right| \to 0$ as $K \to \infty$ and thus condition (a) holds.
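The step from the complex product to the product of square roots uses $|1 + i\lambda D_{Nj}| = (1 + \lambda^2 D_{Nj}^2)^{1/2}$ for real $D_{Nj}$, so that step in fact holds with equality. The sketch below checks this with Python's built-in complex type. The function names and the sample values of $D_{Nj}$ are hypothetical.

```python
import math

def t_product(d, lam):
    """T = prod_j (1 + i * lam * D_j), computed with complex arithmetic."""
    t = complex(1.0, 0.0)
    for dj in d:
        t *= complex(1.0, lam * dj)
    return t

def modulus_product(d, lam):
    """prod_j (1 + lam^2 * D_j^2)^(1/2), the modulus of the product above."""
    p = 1.0
    for dj in d:
        p *= math.sqrt(1.0 + lam * lam * dj * dj)
    return p

def uniform_bound(d, lam):
    """(1 + lam^2 * c^2)^(a/2) with c = max_j |D_j| and a = number of blocks."""
    c = max(abs(dj) for dj in d)
    return (1.0 + lam * lam * c * c) ** (len(d) / 2.0)

if __name__ == "__main__":
    d = [0.10, -0.25, 0.05, 0.30, -0.15]
    print(abs(t_product(d, 0.7)), modulus_product(d, 0.7), uniform_bound(d, 0.7))
```

The last function mirrors the indicator bound in the text, with the largest $|D_{Nj}|$ in the sample playing the role of $C N^{-l}$.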
(b) From Davidson (1994) we can write $T_{N a_N} = 1 + (i\lambda)\sum_{j=1}^{a_N} D_{Nj}\,T_{N,j-1}$. Then, it is enough to show that $\max_j |E[D_{Nj}\,T_{N,j-1}]| = o\left(a_N^{-1}\right)$.

Writing

$$T_{N,j-1} = \prod_{k\in\Xi_{Nj1}}(1 + i\lambda D_{Nk})\cdot\prod_{k\notin\Xi_{Nj1}}(1 + i\lambda D_{Nk}) = \prod_{k\in\Xi_{Nj1}}(1 + i\lambda D_{Nk})\,T_{RNj}$$

where $T_{RNj}$ is implicitly defined and $\Xi_{Nj1}$ is the set of blocks adjacent to block $j$.

Thus, $T_{N,j-1} = \sum_{\gamma\in\Gamma_{Nj}}\left(\prod_{k\in\Xi_{Nj1}}(i\lambda D_{Nk})^{\gamma_k}\right)T_{RNj}$ with $\Gamma_{Nj}$ the set of vectors of size equal to $|\Xi_{Nj1}|$ whose elements are all either zero or one. Because the number of elements in $\Gamma_{Nj}$ is finite, it is enough to show that $\max_{j,\gamma}\left|E\left[D_{Nj}\,T_{RNj}\prod_{k\in\Xi_{Nj1}} D_{Nk}^{\gamma_k}\right]\right| = o\left(a_N^{-1}\right)$. To show this, we show that (i) $\max_j |E[D_{Nj}\,T_{RNj}]| = o\left(a_N^{-1}\right)$ and (ii) $\max_{j\ne k} |E[D_{Nj}\,D_{Nk}\,T_{RNj}]| = o\left(a_N^{-1}\right)$.

For (i), note that the observations in $T_{RNj}$ are located at least a distance $b_N^{1/2}$ away from those in $D_{Nj}$ by construction. Therefore, $\max_j |E[D_{Nj}\,T_{RNj}]|$ is bounded by $C_1\max_j E|D_{Nj}\,T_{RNj}|\,\alpha\left(\sqrt{b_N}\right)$ for large $C_1 > 0$ given the conditions on the covariances and that $E[D_{Nj}] = 0$. Using the properties of the constructed $a_N$, $b_N$ and the properties of $\alpha$ in A6:

$$C_1\max_j E|D_{Nj}\,T_{RNj}|\,\alpha\left(\sqrt{b_N}\right) = o\left(N^{-\frac{1}{2}}\,b_N\,\alpha\left(\sqrt{b_N}\right)\right) = o\left(a_N^{-\frac{1}{2}}\,b_N^{\frac{1}{2}}\,\alpha\left(\sqrt{b_N}\right)\right) = o\left(a_N^{-1}\right)$$

for large $N$, as required.

For (ii), $\max_{j\ne k}|E[D_{Nj}\,D_{Nk}\,T_{RNj}]| \le \max_{j\ne k} N^{-\frac{1}{2}}\,b_N\,|E[D_{Nj}\,T_{RNj}]|$ by the boundedness property of the $A_{Nt}$'s. The maximum is thus of order $O\left(N^{-\frac{1}{2}}\,b_N\,a_N^{-1}\right)$, which is $o\left(a_N^{-1}\right)$.

Repeating the argument for the remaining elements in $\Gamma_{Nj}$ completes the proof. Therefore, condition (b) $E[T_{N a_N}] \to 1$ is demonstrated.
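Condition (b) rests on the telescoping identity $T_{N a_N} = 1 + i\lambda\sum_{j} D_{Nj}\,T_{N,j-1}$ with $T_{N,0} = 1$, which follows from $T_{N,j} - T_{N,j-1} = i\lambda D_{Nj}\,T_{N,j-1}$. The short sketch below verifies it numerically. The function name `telescoping_check` and the sample values are hypothetical.

```python
def telescoping_check(d, lam):
    """Return (T, 1 + i*lam*sum_j D_j*T_{j-1}) for T = prod_j (1 + i*lam*D_j)."""
    t_prev = complex(1.0, 0.0)   # T_{N,0} = 1 (empty product)
    running = complex(0.0, 0.0)  # accumulates sum_j D_j * T_{N,j-1}
    for dj in d:
        running += dj * t_prev
        t_prev *= complex(1.0, lam * dj)  # update to T_{N,j}
    return t_prev, 1.0 + 1j * lam * running

if __name__ == "__main__":
    product, telescoped = telescoping_check([0.2, -0.1, 0.4, 0.05], 1.3)
    print(product, telescoped)
```

Taking expectations of both sides shows why bounding $\max_j |E[D_{Nj}\,T_{N,j-1}]|$ by $o(a_N^{-1})$ is enough: the sum has $a_N$ terms.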
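(c) We start by showing that

$$\sum_{j=1}^{a_N}\left(D_{Nj}^2 - E\left[D_{Nj}^2\right]\right) \overset{p}{\to} 0.$$

Take

$$\sum_{i,j=1}^{a_N} E\left\{\left(D_{Ni}^2 - E\left[D_{Ni}^2\right]\right)\left(D_{Nj}^2 - E\left[D_{Nj}^2\right]\right)\right\} \le C_2\sum_{l=0}^{C_1\sqrt{a_N}}(l+1)\,\alpha\left(\sqrt{b_N}\,l\right)\max_i E\left[D_{Ni}^4\right]$$

where $C_1, C_2 > 0$ are sufficiently large constants, and the inequality is a result of the conditions on the covariances and locations in assumptions A6 and A9. The right hand side of the inequality is of order $O\left(N^{-2}\,b_N^3\,a_N\right)$ since it follows from the conditions in A6 and A9 that $\max_i E\left[D_{Ni}^4\right]$ is bounded by

$$\begin{aligned}
\frac{1}{N^2}\max_i\sum_{t_1,t_2,t_3,t_4\in\Lambda_{Ni}}\left|E\left[A_{Nt_1}A_{Nt_2}A_{Nt_3}A_{Nt_4}\right]\right| &\le C_1\frac{1}{N^2}\max_i\sum_{t_1,t_2,t_3,t_4\in\Lambda_{Ni}}\left\{\alpha(d_{t_1,t_2}) + \ldots + \alpha(d_{t_3,t_4})\right\} \\
&\le C_2\frac{1}{N^2}\,b_N^2\max_i\sum_{t_1,t_2\in\Lambda_{Ni}}\alpha(d_{t_1,t_2}) \\
&\le C_4\frac{1}{N^2}\,b_N^2\max_i\sum_{t_1\in\Lambda_{Ni}}\sum_{l=0}^{C_3 b_N^{1/2}} l\,\alpha(l)
\end{aligned}$$

for some $C_1, C_2, C_3, C_4 > 0$, and since $\sum_{l=0}^{\infty} l\,\alpha(l)$ is bounded by A6, the last term is of order $O\left(N^{-2}\,b_N^3\,a_N\right)$.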
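Finally, $C_2\sum_{l=0}^{C_1\sqrt{a_N}}(l+1)\,\alpha\left(\sqrt{b_N}\,l\right)\max_i E\left[D_{Ni}^4\right] = O\left(N^{-2}\,b_N^3\,a_N\right) = o(1)$ as $N \to \infty$ by the properties of $a_N$ and $b_N$. Thus we have shown that $\sum_{j=1}^{a_N}\left(D_{Nj}^2 - E\left[D_{Nj}^2\right]\right) \overset{p}{\to} 0$ and condition (c) can be written as:

$$\sum_{j=1}^{a_N} D_{Nj}^2 - 1 = \sum_{j=1}^{a_N} E\left[D_{Nj}^2\right] - 1 + o_p(1)$$

which can be further rewritten as

$$\sum_{j=1}^{a_N} E\left[D_{Nj}^2\right] - 1 + o_p(1) = E\left[Y_{0N}^2\right] - 1 - \sum_{i\ne j} E\left[D_{Ni}\,D_{Nj}\right] + o_p(1).$$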
To check the order of convergence we need only to analyze the term $\sum_{i\ne j} E[D_{Ni}\,D_{Nj}]$. For this it is enough to analyze $\max_i\sum_{j\ne i}|E[D_{Ni}\,D_{Nj}]|$ since each of the summations over $i$ and $j$ contains $a_N$ or $a_N - 1$ terms. Take $\Xi_{Nil}$ as previously defined, then, for some $C_1 > 0$, $\max_i\sum_{j\ne i}|E[D_{Ni}\,D_{Nj}]|$ can be bounded by

$$\max_i\sum_{l=1}^{C_1\sqrt{a_N}}\sum_{j\in\Xi_{Nil}}|E[D_{Ni}\,D_{Nj}]| \le \max_i\sum_{j\in\Xi_{Ni1}}|E[D_{Ni}\,D_{Nj}]| + \max_i\sum_{l=2}^{C_1\sqrt{a_N}}\sum_{j\in\Xi_{Nil}}|E[D_{Ni}\,D_{Nj}]| \qquad (22)$$

and we analyze each of the terms in the right-hand side.

For the first term, note that

$$\max_{i\ne j}|E[D_{Ni}\,D_{Nj}]| = \max_{i\ne j}\left|\frac{1}{N}\sum_{t\in\Lambda_{Ni},\,s\in\Lambda_{Nj}} E[A_{Nt}\,A_{Ns}]\right| \le \max_{i\ne j} C_1\frac{1}{N}\sum_{t\in\Lambda_{Ni},\,s\in\Lambda_{Nj}}\alpha(d_{ts})$$

for some large $C_1 > 0$, by the boundedness of the $A_{Nt}$'s and A6.
Consider adjacent blocks for which dependence will be typically stronger, then, by A9 and A6, the number of $(t, s)$ combinations within distance $d$ is bounded by $C_2\,b_N^{1/2}\,d^2$ for some $C_2 > 0$. Letting $C_3 = C_2 C_1$ the expression is bounded by $C_3\max\frac{1}{N}\,b_N^{1/2}\sum_{d=0}^{C_4\sqrt{b_N}} d^2\alpha(d)$ for some $C_4 > 0$ and since by A9 $d^2\alpha(d) \to 0$ the expression is $o\left(\frac{1}{N}\,b_N\right)$ and thus $\max_{i\ne j}|E[D_{Ni}\,D_{Nj}]| = o\left(\frac{1}{N}\,b_N\right)$ or $o\left(a_N^{-1}\right)$ (since $N = a_N b_N$), and so is the first term in (22).
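For the second term, first note that by the boundedness of $A_{Nt}$ and $E[A_{Nt}] = 0$: $\max_{j\in\Xi_{Nil}}\max_{t\in\Lambda_{Ni}}\max_{s\in\Lambda_{Nj}}|E[A_{Nt}\,A_{Ns}]| = O\left(\alpha\left(\sqrt{b_N}\,(l-1)\right)\right)$ uniformly in $l$. Therefore, the second term is bounded by

$$C_2\max_i\frac{1}{N}\sum_{l=2}^{C_1\sqrt{a_N}}|\Xi_{Nil}|\,|\Lambda_{Ni}|\,|\Lambda_{Nj}|\,\alpha\left(\sqrt{b_N}\,(l-1)\right) \le C_3\frac{1}{N}\,b_N^2\sum_{l=1}^{C_1\sqrt{a_N}} l\,\alpha\left(\sqrt{b_N}\,l\right) = o\left(\frac{1}{N}\,b_N\sum_{l=1}^{C_1\sqrt{a_N}} l\,\alpha(l)\right) = o\left(a_N^{-1}\right).$$

The second-to-last equality is due to $\frac{\alpha(ts)}{\alpha(s)} = o\left(t^{-2}\right)$ as $t \to \infty$ and the last one due to the boundedness of the sum $\sum_{l=1}^{C_1\sqrt{a_N}} l\,\alpha(l)$.

Finally, since $\max_i\sum_{j\ne i}|E[D_{Ni}\,D_{Nj}]|$ is $o\left(a_N^{-1}\right)$ and thus $\sum_{i\ne j} E[D_{Ni}\,D_{Nj}]$ is $o(1)$, condition (c) is demonstrated.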
(d) Using the boundedness of $A_{Nt}$,

$$\max_j |D_{Nj}| = \max_j\left|\frac{1}{\sqrt{N}}\sum_{t\in\Lambda_{Nj}} A_{Nt}\right| \le \frac{1}{\sqrt{N}}\,|\Lambda_{Nj}|\,\max_t |A_{Nt}| = O_p\left(\frac{1}{\sqrt{N}}\,b_N\right) = o_p(1)$$

by the construction of $b_N$.

Since conditions (a)-(d) are satisfied under the current assumptions, $Y_{0N} \overset{d}{\to} N(0, 1) \iff g_N(\theta_0) \overset{d}{\to} N(0, \Psi_1(\theta_0))$, which concludes the proof of proposition 2. QED
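To see that block sizes satisfying the rate conditions exist, consider the purely illustrative choice $N = k^4$, $b_N = k$, $a_N = k^3$ and $l = 0.2$. This choice is an assumption made here for the example only and does not appear in the proof. The sketch reports $N^{l-1/2}\,b_N$, which must stay below one, and $b_N/\sqrt{N}$, which drives condition (d). The remaining requirement $\alpha(b_N)\,a_N \to 0$ depends on the decay of $\alpha$ and is not checked.

```python
import math

def block_rates(k, l=0.2):
    """Rates for the hypothetical choice N = k^4, b_N = k, a_N = k^3."""
    n = k ** 4
    b_n = k
    a_n = n // b_n
    return {
        "N": n,
        "a_N": a_n,
        "b_N": b_n,
        "uniform_bound": n ** (l - 0.5) * b_n,  # N^(l - 1/2) * b_N, needs to be < 1
        "max_D_rate": b_n / math.sqrt(n),       # b_N / sqrt(N), needs to vanish
    }

if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(block_rates(k))
```

Under this choice both quantities shrink as $k$ grows, since $N^{l-1/2}\,b_N = N^{-0.05}$ and $b_N/\sqrt{N} = N^{-1/4}$.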
Proof of Proposition 3
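$\Psi_{2N}\left(\hat\theta_{GMM}\right) \overset{p}{\to} \Psi_2(\theta_0)$ follows from the fact that $\frac{\partial g_N(\hat\theta_{GMM})}{\partial\theta'} \overset{p}{\to} \frac{\partial g(\theta_0)}{\partial\theta'}$ by part (a) of Lemma 1, by the consistency of $\hat\theta_{GMM} \overset{p}{\to} \theta_0$; and by A8 which assumes $M_N \overset{p}{\to} M$.

To show that $\Psi_{1N}\left(\hat\theta_{GMM}\right) \overset{p}{\to} \Psi_1(\theta_0)$ we can show that $\Psi_{1N}\left(\hat\theta_{GMM}\right) \overset{p}{\to} \Psi_{1N}(\theta_0)$ as we did in Lemma 1.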
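Note that

$$\Psi_{1N}(\theta_0) = \frac{1}{N}\sum_{ij}\tau_{Ni}(\theta_0)\,\tau_{Nj}(\theta_0)\,z_i z_j'\left\{\Phi_2\left(\psi_i(\theta_0), \psi_j(\theta_0), \rho_{Nij}(\theta_0)\right) - \Phi_2\left(\psi_i(\theta_0), \psi_j(\theta_0), 0\right)\right\}$$

$$= \frac{1}{N}\sum_{ij}\tau_{Ni}(\theta_0)\,\tau_{Nj}(\theta_0)\,z_i z_j'\,\rho_{Nij}(\theta_0)\,\frac{\partial\Phi_2\left(\psi_i(\theta_0), \psi_j(\theta_0), \rho^*_{Nij}(\theta_0)\right)}{\partial\rho} \qquad (23)$$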
where τ (·) is as defined in the proof of proposition 1, Φ2 stands for the bivariate normal
distribution, and the second equality follows from the mean value theorem for θ∗ between
ˆθGMM and θ0 .
First consider the partial derivative term and show that it is bounded. Take the following
bivariate normal distribution.
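$$\Phi_2(a, b, \rho) = \int_{-\infty}^{b}\int_{-\infty}^{a}\frac{1}{2\pi\,(1-\rho^2)^{1/2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left(t^2 - 2\rho t s + s^2\right)\right\}dt\,ds$$

and make the following change of variable from $t$ to $u = (t - \rho s)(1-\rho^2)^{-1/2}$. Integrating over $u$ and differentiating with respect to $\rho$ yields:

$$\frac{\partial\Phi_2(a, b, \rho)}{\partial\rho} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{b} e^{-s^2/2}\,\phi\!\left(\frac{a - \rho s}{(1-\rho^2)^{1/2}}\right)\left\{-s\,(1-\rho^2)^{-1/2} + (a - \rho s)\,\rho\,(1-\rho^2)^{-3/2}\right\}ds$$

which, by the conditions on $\rho$ assumed in A12 - A14, is bounded.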
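A numerical check of this derivative is sketched below under stated assumptions: the bivariate normal CDF is computed from the single-integral form $\int_{-\infty}^{b}\phi(s)\,\Phi\big((a-\rho s)/\sqrt{1-\rho^2}\big)\,ds$ with the lower limit truncated at $-10$, and a composite Simpson rule is used for all integrals. The comparison value is the standard bivariate normal density $\phi_2(a, b, \rho)$, which is known to equal $\partial\Phi_2/\partial\rho$ and is bounded by $1/(2\pi\sqrt{1-\rho^2})$ whenever $|\rho|$ stays away from one. All function names are hypothetical.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

def Phi2(a, b, rho, lo=-10.0):
    """Bivariate normal CDF after integrating out u; lower limit truncated at lo."""
    c = math.sqrt(1.0 - rho * rho)
    return simpson(lambda s: phi(s) * Phi((a - rho * s) / c), lo, b)

def dPhi2_drho(a, b, rho, lo=-10.0):
    """Derivative of Phi2 with respect to rho, taken under the integral sign."""
    c2 = 1.0 - rho * rho
    def integrand(s):
        u = (a - rho * s) / math.sqrt(c2)
        du = -s * c2 ** -0.5 + (a - rho * s) * rho * c2 ** -1.5
        return phi(s) * phi(u) * du
    return simpson(integrand, lo, b)

def phi2(a, b, rho):
    """Bivariate standard normal density with correlation rho."""
    c2 = 1.0 - rho * rho
    q = (a * a - 2.0 * rho * a * b + b * b) / (2.0 * c2)
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(c2))

if __name__ == "__main__":
    a, b, rho = 0.3, -0.4, 0.5
    print(dPhi2_drho(a, b, rho), phi2(a, b, rho))
```

The integral form and the closed-form density should agree up to quadrature error, and a central finite difference of `Phi2` in `rho` gives a third, independent check.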
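Repeating the application of the mean value theorem with respect to $\theta$ in (23), using the boundedness of the above partial derivative and the fact that $\hat\theta_{GMM} \overset{p}{\to} \theta_0$ yields the result that $\Psi_{1N}\left(\hat\theta_{GMM}\right) \overset{p}{\to} \Psi_{1N}(\theta_0)$ and thus $\Psi_{1N}\left(\hat\theta_{GMM}\right) \overset{p}{\to} \Psi_1(\theta_0)$. QED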