Estimation of Sample Selection Models with Spatial Dependence∗ Alfonso Flores-Lagunes† Kurt Erik Schnier‡ Preliminary and Incomplete Draft August 13, 2004 Do not quote or circulate without permission of the authors Abstract We consider the estimation of sample selection (type II Tobit) models that exhibit spatial dependence. Two types of spatial dependence are considered: spatial error dependence (or spatial autoregressive error model, SAE) and spatial lag dependence (or spatial autoregressive model, SAL). The method considered is motivated by a two-step strategy analogous to the popular heckit model. The first step of estimation is based on a spatial probit model following a methodology proposed by Pinkse and Slade (1998) that yields consistent, although not fully efficient, estimates. The consistent estimates of the selection equation are used to estimate the inverse mills ratio (IMR) to be included as a regressor in the estimation of the outcome equation. Since the appropriate IMR turns out to depend on a parameter from the second step under SAE, the models are estimated jointly, which we propose to do in a GMM context. We explore the large sample properties of the proposed estimator and undertake a Monte Carlo experiment to assess its performance in finite samples. Finally, we discuss the importance of the spatial sample selection model in applied work and illustrate its application with a censored sample of fishermen in the Eastern Bering Sea used in Schnier (2003). Key words and phrases: sample selection, spatial dependence, generalized method of moments. JEL classification: C13, C15, C24, C49 ∗ We want to thank Dan Ackerberg, Carlos Flores and Kei Hirano for useful discussions, and seminar participants at the department of Agricultural & Resource Economics at the University of Arizona and the 2003 Midwest Econometrics Group meetings. All errors are our own. † Department of Economics, University of Arizona. [email protected] ‡ Department of Environmental and Natural Resource Economics, University of Rhode Island. [email protected] 1 Introduction Econometric models that take into account spatial interactions among economic units have been increasingly used by economists over the last several years.1 The different approaches for undertaking estimation and inference in linear regression models with spatial effects are well known and have been summarized in a number of papers by Anselin (1988, 2001), Anselin and Bera (1998), and others. The estimation of nonlinear models that include spatial interactions, in particular limited dependent variable models, is not as well developed as that for linear models. In fact, only recently have methods for estimating and conducting statistical inference in spatial models with limited dependent variables been proposed. This literature has concentrated mainly in the probit model with spatial effects, as in Case (1992), Beron and Vijverberg (1999), Fleming (2002), LeSage (2000), McMillen (1992), and Pinkse and Slade (1998). In this paper we add to the literature on the estimation of limited dependent variable models with spatial effects by introducing a sample selection model with spatial dependence and proposing a method for its estimation. The type of sample selection model considered is the widely used heckit model (Heckman, 1976, 1979), also known as the Tobit type II model, in the terminology of Amemiya (1985).2 As for spatial dependence, we consider two separate structures: the spatial autoregressive error (SAE) model and the spatial autoregressive lag (SAL) model. Our estimation strategy can be thought of as a two-step procedure analogous to the popular heckit model that is estimated jointly as a "pseudo" sequential estimator using GMM (Newey, 1984). The first step of estimation is based on a spatial probit model following a methodology by Pinkse and Slade (1998), which yields consistent, although not fully efficient, estimates of the selection equation. As in the heckit procedure, the consistent estimates of the selection equation are used to estimate the inverse Mills ratio (IMR) to be included in the estimation of the outcome equation. Since in the presence of spatial error dependence the IMR depends upon unknown parameters from the outcome equation, the model is estimated jointly. The estimation of a probit model with spatial dependence introduces a non-spherical variance-covariance matrix that renders the simple probit estimator inconsistent and inef1 2 Some examples are Case (1991), Fishback, Horrace and Kantor (2002), Topa (2001), among many others. For a review of these methods see Maddala (1983) and Vella (1998). 1 ficient. In turn, to obtain consistent and fully efficient estimates, one has to deal with multidimensional integrals. For instance, LeSage (2000) and Beron and Vijverberg (2003) employ simulation methods to approximate such multidimensional integrals and obtain consistent and efficient parameter estimates of the spatial probit model. Unfortunately, the simulation of multidimensional integrals is computationally intensive, which restricts estimation to moderate sample sizes. This same limitation would apply to the use of simulation methods to approximate the multidimensional integrals in the likelihood functions of sample selection models with spatial dependence.3 In an attempt to avoid approximating multidimensional integrals but achieve consistency of the probit estimates at the expense of efficiency, some authors have addressed the heteroskedasticity induced into the probit model by the spatial dependence, ignoring the off-diagonal elements of the variance-covariance matrix. These authors include Case (1992), McMillen (1992) and Pinkse and Slade (1998). Although not fully efficient as the estimators that approximate multidimensional integrals, we use Pinkse and Slade’s estimator in the first step of the sample selection model for at least three reasons. First, it yields consistent estimates of the selection equation that are necessary to obtain consistent estimates of the parameters in the outcome equation (second step). Second, it is computationally simpler than both of the other estimators that approximate multidimensional integrals. Third, it has been developed in the framework of GMM, the same framework we employ in the joint estimation of our sample selection models. The consistent estimates obtained in the first step are used to construct the IMR, which is used in the second step to correct for selectivity bias (Heckman, 1979). Since the outcome equation is a linear equation, any estimation method can be used to estimate the second step by including its corresponding moment conditions. However, since it is assumed that the outcome equation also exhibits spatial dependence and since generally the spatial parameters will be different in each equation, the IMR under SAE turns out to be a function of the unknown spatial parameter in the outcome equation. As a result, the sample selection model under SAE cannot be estimated in two separate steps and an appropriate method that estimates both steps simultaneously has to be used. We build on the sequential estimation framework proposed by Newey (1984) to jointly estimate the sample selection model with SAE. The sample selection model with SAL dependence can be estimated in two steps, as 3 For examples of likelihood functions for sample selections models under independent observations see Heckman (1979) and Maddala (1983). 2 the IMR depends only upon parameters from the selection equation. However, we propose the estimation of both models using the sequential estimation GMM framework since an added advantage is that under this framework the appropriate variance-covariance matrix of the estimator is obtained directly. Finally, building on Pinkse and Slade’s (1998) results and standard GMM theory, the asymptotic properties of our estimators can be derived. An early attempt to specify and estimate a sample selection model with spatial effects is McMillen (1995). His model builds on the spatial expansion model of Casetti (1992) that is used in geography. However, such model is not explicitly spatial, and the consistency of the estimator depends on a functional-form assumption of the underlying heteroskedasticity induced by the spatial dependence.4 Our estimators have two potential disadvantages. First, is their lower efficiency as compared to the computationally intensive simulation methods, however, given the fact that the multidimensional integration is in the order of the number of observations and the large amounts of data available to researchers, we believe the computational simplicity of our method is preferable. Second, is the fact that our estimators require the availability of at least one instrumental variable in addition to the common "exclusion restriction" desirable in the heckit model. To this end, however, the instrumental variables proposed by Kelejian and Prucha (1998) can be used, which are always available since are constructed from polynomials of available variables. Using simulations, we find that the Kelejian and Prucha instruments work well in practice. The paper is organized as follows. Section 2 outlines the sample selection model with spatial effects, considering separately the spatial autoregressive error (SAE) and the spatial autoregressive lag (SAL) models. Section 3 introduces our spatial sample selection model estimators and states their large-sample properties. Section 4 describes and presents the results of a Monte Carlo experiment, while Section 5 discusses the practical importance of the spatial sample selection model and presents two empirical applications of our method that correspond to the SAE and SAL models, employing a censored sample of fishermen in the Eastern Bering Sea used in Schnier (2003). Concluding remarks are provided in the last section of the paper. Proofs are presented in Appendix C. 4 McMillen (1995) also shows that using the EM algorithm (as in McMillen (1992)’s spatial probit estimator) in a spatial sample selection model is impractical either because of computational issues or because it requires knowledge of the exact spatial process generating the underlying data. 3 2 The Sample Selection Models with Spatial Dependence We consider estimation of two sample selection (Tobit type II) models that differ in the form of spatial dependence embodied on them. We specify the sample selection models to include spatial effects in both the selection and the outcome equations. The first model is the spatial autoregressive error (SAE) model, which specifies that the disturbances are spatially autocorrelated: ∗ y1i = α0 + x01i α1 + u1i , u1i = δ X cij u1j + ε1i (1) cij u2j + ε2i (2) j6=i ∗ y2i = β0 + x02i β 1 + u2i , u2i = γ X j6=i ∗ ∗ where y1i and y2i are latent variables with the following relationship with the observed ∗ ∗ variables y1i = 1 if y1i > 0 and y1i = 0 otherwise, and y2i = y2i ∗ y1i . Therefore, (1) is the selection equation while (2) is the outcome equation. Note that each of these equations exhibit spatial dependence, as u1i and u2i depend on other u1j and u2j through their location in space, as given by the spatial weights cij and the spatial autoregressive parameters δ and γ. Typically, the spatial weights are specified by the econometrician based on some function of contiguity or (economic) distance (Anselin, 1988; Anselin and Bera, 1998). Note also that, in general, one will specify different spatial autoregressive parameters for the selection and outcome equations. It is assumed that: Assumption A The errors ε1i and ε2i , i = 1, ...N , are iid N(0, Σ) with ¸ · 2 σ1 σ 12 . Σ= σ 12 σ 22 The model in (1)-(2) can also be presented in a reduced form: ∗ y1i = α0 + x01i α1 + X ω1ij ε1j (3) ω 2ij ε2j (4) j ∗ = β 0 + x02i β 1 + y2i X j where the weights ω1ij and ω 2ij are the (i, j) elements of the inverse matrices (I − δC)−1 and (I − γC)−1 , respectively, with C the matrix of spatial weights cij . Note that both sets of weights, ω1ij and ω2ij , depend upon the unknown parameters δ and γ, respectively. 4 The second form of spatial dependence considered is the spatial autoregressive lag (SAL) model, which specifies that the dependent variable of each unit is spatially dependent on the value of the dependent variable in all (or some) of the other units: ∗ y1i = α0 + x01i α1 + δ X ∗ cij y1j + ε1i (5) ∗ cij y2j + ε2i (6) j6=i ∗ = β 0 + x02i β 1 + γ y2i X j6=i ∗ ∗ and y2i are latent variables with the following relationship with the observed where y1i ∗ ∗ variables y1i = 1 if y1i > 0 and y1i = 0 otherwise, and y2i = y2i ∗ y1i . Assumption A is applied here as well to the error terms ε1i and ε2i . The spatial dependence implied by the SAL model is given by the spatial weights cij and the spatial autoregressive lag parameters δ and γ. A reduced form of the SAL model similar to (3)-(4) is: ∗ y1i = α0 X ω1ij + α1 j ∗ = β0 y2i X X ω 1ij x01j + j ω2ij + β 1 j X j X ω1ij ε1j (7) ω2ij ε2j (8) j ω 2ij x02j + X j where the weights ω 1ij and ω 2ij are defined as in the reduced form of the SAE model, and we note again that these weights are functions of the unknown spatial parameters δ and γ, respectively, in both the SAL and SAE models. 3 Estimators for the Sample Selection Models with Spatial Dependence We now describe our proposed estimation method for the sample selection models with spatial dependence presented in the previous section. As previously described, we follow a two-step procedure in the spirit of Heckman (1976, 1979) that is estimated jointly in a GMM context. The selection equation is estimated using Pinkse and Slade’s (1998) GMM estimator for the spatial probit model, while the outcome equation can be estimated with any of the well-developed spatial methods for linear models. An estimate of the inverse Mills ratio is included in the outcome equation to correct for selectivity bias. To estimate these two parts simultaneously, the corresponding moment conditions are stacked, and a GMM criterion function is minimized with respect to all parameters in the model. In what follows, we consider again each type of spatial dependence separately. 5 3.1 Estimation of the SAE sample selection model To motivate the estimation of the SAE model in (1)-(2), we start with the following calculations: var(u1i ) = σ 21 X (ω1ij )2 (9) j var(u2i ) = σ 22 X (ω2ij )2 (10) j E(u1i , u2i ) = σ 12 X ω1ij ω 2ij . (11) j In the typical heckit model, in the first step a probit model is estimated for the probability of each observation being included in the observed sample. In the presence of SAE errors, this probit is complicated by the induced heteroskedasticity of the error terms in (9). Pinkse and Slade (1998) propose to estimate this spatial probit model by taking into account the known form of the induced heteroskedasticity in a GMM context. Define θ1 = {α0 , α01 , δ} as the parameters to be estimated in the spatial probit model, 0 α +x α and ψi (θ1 ) = √0 1i 1 the index function of the probit model weighted by the standard var(u1i ) deviation of the residual. The corresponding generalized residuals of this model are: u˜1i (θ1 ) = {y1i − Φ [ψi (θ1 )]} · φ [ψi (θ1 )] . Φ [ψi (θ1 )] {1 − Φ [ψi (θ1 )]} (12) Then, the GMM estimates for θ1 can be obtained as follows: ˆθ1,GMM = arg min SN (θ1 )0 MN SN (θ1 ) (13) θ1 ∈Θ1 where SN = 1 0 z u˜ (θ ), N N 1N 1 zN is a matrix of regressors plus at least one instrument (to identify p the extra parameter δ), and MN is a positive definite matrix such that MN → M. Pinkse and Slade (1998) show that this estimator is consistent and asymptotically normal. The consistent estimates of θ1 will be used in the construction of the inverse Mills ratio (IMR) to correct for sample selection bias. Note that the conditional regression function for 6 the outcome equation (2) has the following form: E[y2i |y1i > 0] = β 0 + x02i β 1 + E[u2i |u1i > −(α0 + x01i α1 )] φ [−ψi (θ1 )] E(u1i , u2i ) · = β 0 + x02i β 1 + p var(u1i ) {1 − Φ [−ψi (θ1 )]} P σ 12 ω 1ij ω 2ij φ [−ψi (θ1 )] j = β 0 + x02i β 1 + r P · σ 21 (ω1ij )2 {1 − Φ [−ψi (θ1 )]} j P ω 1ij ω 2ij σ φ [−ψi (θ1 )] j 12 · rP · = β 0 + x02i β 1 + σ1 (ω1 )2 {1 − Φ [−ψi (θ1 )]} j ij Therefore, the selectivity correction implies the following "adjusted" IMR: P ω 1ij ω 2ij φ [−ψi (θ1 )] . · λi ≡ rP [ω 1ij ]2 {1 − Φ [−ψi (θ1 )]} j (14) j Once estimated, the "adjusted" IMR would be included as an additional regressor in the outcome equation, which in turn could be estimated with any of the spatial methods developed for this linear equation.5 We illustrate our estimator employing feasible generalized bi + v2i .6 least squares (FGLS) to estimate the outcome equation: y2i = β 0 + x0 β 1 + µλ 2i However, this method cannot be applied in two separate steps since the "adjusted" IMR in (14) depends on a parameter that is not estimated in the first step: γ, which is included in the weights ω 2ij . To overcome this difficulty, we use GMM to estimate simultaneously all parameters of the sample selection model. Rewrite the sample selection model as a sequential estimator (Newey, 1984), composed of the Pinkse and Slade (1998) and FGLS estimators, by stacking their corresponding moment conditions: 0 g(z, θ) = [s(z1N , θ)0 , m(z2N , θ)0 ] , θ = {α0 , α01 , δ, β 0 , β 01 , µ, γ} with 0 s(z1N , θ) = z1N u˜1N (θ), 5 u˜1N (θ) as in (12), b γ) m(z2N , θ) = [y1N · z2N ]0 u˜2N (θ), u˜2N (θ) = y2N − β 0 − x02N β 1 − µλ(δ, The linear SAE model could be estimated, for instance, with FGLS, the methods by Kelejian and Prucha (1999), Lee (2001a), or maximum likelihood. 6 Kelejian and Prucha (1999) propose a set of three moment conditions for the estimation of the spatial autoregressive parameter in a SAE model. These moments will be added to our estimators in the Monte Carlo experiment and empirical application, to increase their efficiency. 7 and z1N includes the regressors of the selection equation plus at least one instrument, and z2N includes the regressors of the outcome equation, the estimated "adjusted" IMR, and least one instrument, which could be in principle the same instrument contained in z1 . We note that the instruments proposed by Kelejian and Prucha (1998) can be used if no other instruments are available.7 0 0 Define zN = (z1N , [y1N · z2N ]0 )0 and u˜N (θ) = (˜ u01N (θ), u˜02N (θ))0 . Then, all parameters of the sample selection model can be estimated as: ˆθGMM = arg min gN (θ)0 MN gN (θ) (15) θ∈Θ p 0 u˜N (θ), for a positive definite MN such that MN → M. where gN = N −1 zN We show in appendix C that, under conditions similar to those in Pinkse and Slade (1998), ˆθGMM is consistent and asymptotically normal. We call it the "spatial heckit" estimator for the SAE model. p Proposition 1 Under assumptions A1 to A8 (stated in appendix C), ˆθGMM → θ0 . The following two propositions concern the asymptotic distribution and the estimation of the variance-covariance matrix of the spatial heckit estimator for the SAE model. Proposition 2 Under assumptions A1 to A11 (stated in appendix C), √ d N(ˆθGMM − θ0 ) → N(0, [Ψ2 (θ0 )]−1 [∂g0 (θ0 )/∂θ]MΨ1 (θ0 )M[∂g(θ0 )/∂θ0 ][Ψ2 (θ0 )]−1 ) where Ψ1 (θ0 ) = lim E{NgN (θ0 )gN (θ0 )0 }, and Ψ2 (θ0 ) = [∂g0 (θ0 )/∂θ]M[∂g(θ0 )/∂θ0 ]. N→∞ Proposition 3 Under assumptions A1 to A14 (stated in appendix C), then p p Ψ1N (ˆθGMM ) → Ψ1 (θ0 ) and Ψ2N (ˆθGMM ) → Ψ2 (θ0 ), where Ψ1N (ˆθGMM ) = N E{gN (ˆθGMM )gN (ˆθGMM )0 } and 0 ˆ Ψ2N (ˆθGMM ) = [∂gN (θGMM )/∂θ]MN [∂gN (ˆθGMM )/∂θ0 ]. Finally, the following corollary gives the (efficient) asymptotic variance of our estimator when the optimal weighting matrix is used. 7 In the context of a linear SAL model, Kelejian and Prucha (1998) show that the optimal set of instruments is approximated by the linearly independent colums of [x, W x, W 2 x, ...]. These instruments can also be used in this context, as they are still exogenous. 8 p Corollary 1 If MN = [Ψ1N (ˆθGMM )]−1 is chosen, such that MN → [Ψ1 (θ0 )]−1 , then the asymptotic distribution of ˆθGMM simplifies to: √ d N(ˆθGMM − θ0 ) → N(0, [Ψ2 (θ0 )]−1 ). 3.2 Estimation of the SAL sample selection model We start by noting that in the sample selection model with SAL dependence the IMR does not depend on parameters from the outcome equation (second step) and therefore could in principle be estimated in two separate steps. To see this, note that from (5) and (6): var(ε2i ) = σ 22 , var(ε1i ) = 1, and E(ε1i , ε2i ) = σ 12 . (16) and the conditional regression function for the outcome equation (6) becomes: X X cij y2j + E[ε2i |ε1i > −(α0 + x01i α1 + δ cij y1j )] E[y2i |y1i > 0] = β 0 + x02i β 1 + γ j6=i = β 0 + x02i β 1 + γ X j6=i = β 0 + x02i β 1 + γ X E(ε1i , ε2i ) φ [−ϕi (θ1 )] cij y2j + p · var(ε1i ) {1 − Φ [−ϕi (θ1 )]} cij y2j + j6=i where ϕi (θ1 ) ≡ α0 + x01i α1 +δ j6=i P j6=i cij y1j σ 12 φ [−ϕi (θ1 )] · σ 1 {1 − Φ [−ϕi (θ1 )]} and in which the second multiplicand in the last term is the usual IMR that only depends on parameters from the selection equation, implying that the model can be estimated in two separate steps. Nevertheless, we employ the same sequential estimator framework as with the SAE model to directly obtain the appropriate variance-covariance matrix of the estimated parameters (Newey, 1984). We note that under the assumptions maintained for the SAL model and regularity conditions on the spatial dependence process (such as A6 and A9 in appendix C), standard GMM asymptotic results apply, and thus we do not provide formal proofs of consistency and asymptotic normality of our spatial heckit estimator for the SAL model.8 We now outline the moment conditions upon which the spatial heckit estimator for the SAL model is based. First consider the selection equation of the spatial autoregressive lag (SAL) model in (5), which implies a binary choice model for the observed variable y1i , and under assumption A can be rewritten as: P (y1i = 1) = Φ(α0 + x01i α1 + δ X cij y1j ) = Φ(ϕi ). j6=i 8 For a review of the asymptotic properties of GMM sequential estimators see Newey (1984). 9 (17) Note that this probit model contains both endogenous and exogenous variables, and thus a nonlinear two-stage least-squares (NL2SLS) procedure (Amemiya, 1974) can be used to estimate the parameters of interest. Let z1i be a vector of variables that contains a constant, x01i , plus at least one instrument. Replacing ϕi for ψi in the definition of the generalized residuals in (12), the NL2SLS estimator can be obtained by solving the following moment conditions: 0 s(z1 , θ1 ) = z1N u˜1N (θ1 ) = 0 where θ1 = {α0 , α01 , δ}. The outcome equation, now expanded by the inclusion of the IMR, can be estimated with any of the spatial methods developed for this linear equation.9 We illustrate our estimator employing 2SLS. Define b 1 ) − γCy2N m(z2N , θ) = [y1N · z2N ]0 u˜2N (θ), u˜2N (θ) = y2N − β 0 − x02N β 1 − µλ(θ where z2N includes the regressors of the outcome equation, the estimated IMR, and at least one instrument, which could be in principle the same instrument(s) contained in z1 .10 0 0 Defining zN = (z1N , [y1N · z2N ]0 )0 and u˜N (θ) = (˜ u01N (θ1 ), u˜02N (θ))0 . Then, all parameters of the sample selection model with SAL dependence can be estimated as: ˆθGMM = arg min gN (θ)0 MN gN (θ) θ∈Θ p 0 where gN = N −1 zN u˜N (θ), for a positive definite MN such that MN → M. 4 Monte Carlo Experiment The goal of the Monte Carlo experiment is to explore the finite-sample performance of the spatial heckit estimators for the spatial sample selection models proposed in Section 3. In addition, our estimators are compared to three other types of estimators: those that ignore sample selection but account for spatial dependence, such as that of Kelejian and Prucha (1998) for estimating the SAE models and 2SLS for the SAL model; the simple heckit estimator that accounts for sample selection but ignores spatial dependence, and finally 9 The linear SAL model could be estimated, for instance, with 2SLS, the methods by Kelejian and Prucha (1998), Lee (2001b), or maximum likelihood. 10 In particular, the same instruments proposed by Kelejian and Prucha (1998) can be used. See footnote 7. 10 the ordinary least squares (OLS) estimator that ignores both sample selection and spatial dependence. We compare these estimators in terms of their finite sample bias and root-mean square error. Given that we combine features of a sample selection model with a spatial econometric specification, we pay close attention to previous simulation studies in specifying each of the two features of our models, such as Cosslett (1991) and Leung and Yu (1996) for the sample selection model, and Beron and Vijverberg (1999) and Kelejian and Prucha (1999) for spatially dependent models. Our data generating process (DGP) can be summarized in the following general specification that encompasses both SAL and SAE models, although we analyze them separately: ∗ y1i = α0 + α1 x1i + α2 x2i + δ SAL X ∗ cij y1j + u1i , u1i = δ SAE j6=i ∗ = β 0 + β 1 x3i + β 2 x1i + γ SAL y2i X X cij u1j + ε1i (18) cij u2j + ε2i . (19) j6=i ∗ cij y1j + u2i , u2i = γ SAE j6=i X j6=i Clearly, the SAE model is obtained by setting δ SAL = γ SAL = 0, while the SAL model is simulated by setting δ SAE = γ SAE = 0. Each of our models consists of three uncorrelated exogenous variables, one of which is common to both equations, as in Cosslett’s (1991) experimental design. These exogenous variables are generated as xk ∼ Unif orm(0, 1), k = 1, 2, 3. The innovations ε1i and ε2i are generated bivariate normal as follows: ¸¶ · ¸ µ· ¸ · 1 ρ ε1i 0 , vN ρ 1 0 ε2i (20) where we set ρ = 0.5 for the correlation between the innovations in each of the two equations. The parameters of the model that are not related to the spatial dependence feature are set as α1 = α2 = β 1 = β 2 = 1 and β 0 = 0. The parameter α0 is used to control the amount of sample selection, for which we consider two cases: 25% censoring (α0 = −0.3) and 50% censoring (α0 = −1.1). We consider in the experiment two different sample sizes: N = 225 and 400 observations. For the models with N = 225, we consider three different values for the spatial parameter (for either SAE or SAL): 0.25, 0.5, and 0.75. While for the models with N = 400, due to computational constraints, we consider only two values: 0.25 and 0.5. Finally, we consider two different types of instruments in our simulations to gauge their relative performance: the Kelejian and Prucha (KP) instruments, which are always available to the researcher; and two artificially generated exogenous instruments that mirror a situation in which the researcher 11 has available additional instruments. In the simulations, the generated instruments are specified to have a correlation of about 0.2 with the exogenous included variables in the model. Due to computational constraints, 100 replications are undertaken for the models with N = 225, whereas 50 replications are undertaken for the models with N = 400. The matrix of spatial weights has to be specified. For this, we create a 15 by 15 grid for the 225 observations and a 20 by 20 grid for the 400 observation matrix. Each grid is assigned an X and Y coordinate centered on the grid such that the bottom left corner of the grid had a value of (0.5, 0.5). We use the program Stata with these grids to create a weighting matrix that is based on the square of the inverse Euclidean distance between any two points. After creating the location specific weights for each grid, the matrix is row standardized so that the diagonal elements of the weighting matrix are all zeros and the sum of any one row is equal to 1. Finally a band is used to determine the number of observations that may influence a centered observation. Such band is set with a lower bound of 0 and an upper bound equal to the maximum distance between any two points. Therefore, there exists spatial dependence for all the observations, and, on average, each observation has 4 neighbors.11 This way of specifying weighting matrix is standard in the literature, see, for instance, Schnier (2003). Tables 1 and 2 present some preliminary simulation results for the outcome equation of the SAE model for samples of size 225 and 400, respectively. In addition to presenting simulation results for OLS, Heckit and Kelejian and Prucha’s (1998) estimator for the SAE model (KP-SAE), Table 1 also shows simulation results for two different versions of our spatial heckit estimator (Spheck): one in which the generated instruments discussed above are used (Spheck Gen) and one in which the Kelejian and Prucha instruments are employed (Spheck KP). Results are presented for a number of models that differ on the extent of sample selection and spatial dependence which are listed in the first column of the table. For each of these models, up to four parameters of interest are reported, which are listed in the second column of the table. The following columns in the table are arranged in two blocks that correspond to the average bias (BIAS) and the root-mean squared error (RMSE) of the different estimators, respectively. The first estimator reported is OLS, which ignores both features of the data: sample 11 We note that Kelejian and Prucha (1999) find that controlling or not for the number of neighbors per unit when specifying a weighting matrix does not lead to significantly different results in their simulation study. 12 selection and spatial dependence. As a result, OLS has large bias and RMSE that increase as the amount of sample selection or spatial dependence increases. In general, though, OLS is able to estimate β 1 (the coefficient on the variable that does not appear in the selection equation) with relatively small bias compared to the other coefficients. This is due to the fact that the variables x1 and x3 are generated independently, and thus there is little effect of the sample selection on the coefficient on x3 (β 1 ). The second estimator reported is Heckit, which accounts for sample selection but ignores spatial dependence. The consequence of spatial dependence on this estimator is inconsistency, as explained above, since the probit model that is estimated in the first step is heteroskedastic. Therefore, we might expect that the bias and RMSE of the Heckit will increase as the amount of sample selection increases. This is exactly what is found in Table 1, except for the bias when moving from spatial dependence of 0.5 to 0.75 in the models with 50% censoring, and the bias of the parameter β 1 . Compared to OLS, the Heckit estimator shows a great improvement, even though it is also inconsistent. While the average bias and RMSE of β 1 is very small and comparable to that of OLS, the other two coefficients (β 0 and β 2 ) have larger bias: in the case of β 2 , the bias has the interpretation of percentage and varies from 7.2% to 14.4%, and in the case of β 0 the bias ranges from -0.35 to -0.15. Interestingly, the bias of these two coefficients is of opposite sign compared to the bias in OLS. In contrast, the RMSE of the Heckit estimator for β 0 and β 2 is about twice as large as that of OLS β 1 . The third estimator reported is KP-SAE that accounts for spatial dependence but ignores the sample selection feature of the data, which would result in inconsistent parameter estimates. In agreement with this notion, the bias and RMSE increase as the amount of sample selection increases, but also typically increase as the amount of spatial dependence increases. Compared to the previous two estimators, the KP-SAE shows typically a higher bias: the average bias of β 1 is higher than that of the other estimators (which are practically unbiased), and is up to 11.4%; while the bias of β 0 and β 2 is of about the same magnitude as that of the OLS estimator and thus significantly larger than that of the Heckit estimator. In the same way, the RMSE of the KP-SAE estimator is of about the same magnitude as that of OLS and thus smaller than that of Heckit. KP-SAE is the first estimator that estimates the SAE parameter γ, which shows bias and RMSE that increase with both the amount of sample selection and spatial dependence. The bias of γ ranges from -0.77 to -0.257 (18% to 66% of the true γ), underestimating the true value in all cases, while the RMSE ranges from 13 0.152 to 0.289. The last two estimators reported are the two versions of our estimator, Spheck Gen and Spheck KP, which are both consistent for all parameters in the model. We note that simulation results for two models (sample selection of 50% and spatial dependence of 0.25 and 0.5) are not included in this preliminary set of results.12 Both Spheck estimators for β 0 and β 2 have typically smaller bias than other three estimators, except in the model with 50% selection, in which the bias of the Heckit estimator is smaller. On the other hand, the Spheck estimators’ bias of β 1 is typically higher than that of the other estimators, especially the OLS and Heckit. However, this bias is small (between 1.3% and 5.1%) for the models with 25% sample selection, but up to 21% in the model with 50% sample selection. In terms of bias of γ, the Spheck estimators have significantly smaller bias than the KP-SAE estimator, except for the one specification with 25% sample selection and 0.25 spatial dependence. This bias ranges between 0 and 0.98, or 0% to 39% of the true γ. In terms of RMSE of the three β coefficients, the Spheck estimators typically show values that lie between the high RMSE of the Heckit estimator and the relatively low RMSE of OLS, although two exceptions arise. First, the Spheck estimators show smallest RMSE for the β 1 parameter (except for the 50% selection specification). Second, the Spheck estimators have the highest RMSE of β 0 and β 2 for the two models with 25% selection and spatial dependence of 0.25 and 0.5. Looking at the RMSE of the SAE parameter γ, the Spheck estimators show a considerable improvement over the KP-SAE estimator, which is more true as the model considered has more selection and higher spatial dependence. Comparing the two sets of instruments used in the Spheck, they produce very similar results in terms of both bias and RMSE in all four parameters, with those results using generated instruments marginally better in several cases. This finding is important, for in practice researchers might find it hard to come up with good instruments to use, whereas the KP instruments are always available, and according to our results not much is lost by employing them. Table 2 presents the simulation partial results for N = 400 for the two specifications with 25% censoring. Similar patterns as with N = 225 arise, although now the bias of the Spheck estimator (using KP instruments) is considerable smaller than that of the other three estimators in the two models reported. The only exception is the estimate of β 1 , for which 12 The main problem with these models is non-convergence of the optimization routine in several of the replications. 14 the other estimators have slightly smaller bias. The RMSE of the three β coefficients is comparable to the RMSE of the other estimators (although slightly larger), which is notable when compared to the simulation results from the smaller sample size, and suggest that the precision of the Spheck estimator is acceptable for samples as small as 300 observations (about 100 observations are censored). In summary, we regard this set of preliminary results as encouraging with respect to the properties of the Spheck estimator. In particular, the fact that using these relatively small samples the advantages of our estimator show up makes us optimistic. Indeed, one of the reasons the non-convergence issues mentioned above arise is due to the small sample size when 50% is applied: on average estimation is undertaken with only 112 observations. A number of issues are still ongoing. First, the completion of the simulations might shed new light on other performance features of our estimator for the SAE model. Second, the SAL estimator will also be simulated and compared to other estimators that do not account for both sample selection and spatial dependence. 5 The Sample Selection Models with Spatial Dependence in Practice In this section we discuss the empirical importance of taking into account sample selection bias when estimating models that exhibit spatial dependence. Furthermore, we illustrate the application of the two-step methods proposed in Section 3 using a rich data set from a fishery, which is censored due to confidentiality issues (see Schnier, 2003). McMillen (1995) motivates the pervasiveness of sample selection problems in spatial data, in particular in urban economics and regional science. His main example deals with data on land use and values in the city of Chicago during the 1920s (see references in McMillen, 1995). In this case, unobserved variables that make a parcel more likely to receive residential zoning may increase the value of residential land (McMillen, 1995). Other applications of sample selection models with spatial data discussed in McMillen (1995) include models of housing prices, rent and tenure choice (Goodman, 1988), office rents and lease provisions (Benjamin et al., 1992), and home improvement choice (Montgomery, 1992) in urban economics; the choice between central city and suburban employment (McMillen, 1993), analysis of earnings and migration (Borjas et al., 1992) in labor economics. In fact, the increasing availability of geo-coded data makes even more relevant the availability of methods to deal with sample 15 selection when spatial dependence is present. Our application in this section is yet another application of a sample selection model with spatial dependence, in this case in the area of environmental economics, in particular in the economics of fisheries. We estimate two different simple models to illustrate the performance of our spatial heckit estimators for the SAE and SAL models. The application of the SAL model employs data on the locational choice of fishermen in the Eastern Bering Sea to assess the factors that influence fishermen’s selection of locations within a fishery (Schnier, 2003). Schnier’s study is the first to incorporate spatial dependence in such choices and investigate the presence of herding behavior among fishermen. We provide only a brief description of the data below. A more thorough description can be found in Schnier (2003). Fishermen in the Eastern Bering Sea are required to document their harvest according to the statistical reporting areas used by the Alaska Department of Fish and Game and the National Marine Fisheries Service (NMFS), who monitor the fisheries. These statistical reporting areas divide the Eastern Bering Sea into grids that are one-half degree latitude by one degree longitude. These grids define the micro-regions that parse the dataset. In addition, macro-region variables, according to the six major regions into which the NMFS divides the Eastern Bering Sea, are used as control variables. There are two major classifications of vessels that participate in the Pacific cod fishery, the catcher-vessel and the catcher-processor sectors. The catcher-vessel sector represents those vessels that travel to and from their port after harvesting, whereas the catcher-processor sector represents those vessels that can process their catch at sea following their harvest and are thus able to stay at sea for a longer duration than the catcher-vessel sector. The National Oceanic and Atmospheric Administration (NOAA) provides the data used in the analysis. However, the dataset provided by NOAA has been screened for confidentiality. Effort levels for locations that have had four or more vessels of similar characteristics in a statistical reporting area are observed whereas locations with less than four vessels are not observed.13 In the data used to estimate the spatial SAL model, which corresponds to the year 2000, the amount of censoring is in the order of 45% of the 361 observations. For comparison, the model is estimated using the same four estimators described in the previous 13 The different characteristics that are used to differentiate the vessels are whether or not they are within the catcher-vessel or catcher-processor sector, the length class (small, medium or large) and the type of gear used by the vessel. 16 section. Biomass estimates that are not a direct result of the harvest information are incorporated into the econometric model. These observations were obtained from the NMFS, which conducts an annual biomass trawl survey in the early summer of each year to determine the estimated densities of the fish populations within the Eastern Bering Sea. Other data used in the analysis, such as measurements of depth within the statistical reporting area, were obtained from nautical maps with bathymetric contours. Appendix A indicates the name of the variables chosen and the source of the data used in the econometric model. The use of a spatial weighting matrix is premised with the belief that fishermen’s locational choice is not only determined by the expected biomass in a location but also by the observed effort exerted by other fisherman and the expected biomass in the surrounding locations. In order to determine such spatial weighting matrix, one must construct a belief about which surrounding locations affect a specific locational observation. We use the following common specification to assign spatial weights among locations: cij = 1 , d2ij where cij is the spatial weight assigned to the distance between location i and location j, and dij is the Euclidian distance between locations i and j. The spatial weights, cij , are row standardized such that the diagonal elements of the spatial weighting matrix are all zero and the sum of any one row is one. The SAL model we estimate is the one in (5)-(6), specifying hauls as the dependent variable in the outcome equation. The variables used in both the selection equation and the outcome equation are illustrated in Appendix B. For identification purposes, the variable DumCV Mi (which indicates catcher-vessel (CVi ) and medium class vessel (Mi )) is suppressed from the estimation of the main equation. This variable was created because a predominant number of vessel groups that possessed this classification did not possess any haul information, y1i = 0 (were censored), and could therefore be used to explain the selection process. We implicitly assume that DumCV Mi does not enter the outcome equation, which is plausible since CVi and Mi are already included in this equation. Results of the empirical application for the SAL model are in process... The second empirical application involves illustrating the sample selection SAE model in the estimation of the efficiency of the production process within a fishery, using the catchper-unit-effort (CPUE) as a measure of efficiency. 17 The CPUE is defined as the quantity of a target species caught per a haul executed, the use of a trawl device, pot vessel, hook-and-line, jig, etc. Therefore, a higher CPUE indicates a higher degree of efficiency, given the inputs utilized. However, it is plausible to assume that for a fixed level of technological inputs, the CPUE may vary over the spatial distribution of the fishery and be correlated with the CPUE in surrounding locations. As a result, it is desirable to estimate the CPUE controlling for the spatial relationships within the fishery. To estimate this model, we use data from NMFS and NOAA for the Pacific cod fishery within the Eastern Bering Sea for the year 1997, which has been screened for confidentiality. This feature of the data makes it necessary to employ an estimation method that would remove the bias arising from sample selection while at the same time accounting for spatial dependence, such as our spatial heckit model with SAE dependence. The model estimated is of the form (1)-(2) with the observation matrix containing data for an aggregation of vessels of similar description and the CPUE as the dependent variable in the outcome equation. CPUE is defined as the catch for the aggregate within location i divided by the number of "unique hauls" executed in location i by the aggregate. Unique hauls reflects the amount of effort that the aggregate exerted within the given location. Variables within the observation matrix are vessel group (catcher-processors vs. catcher-vessels), size of vessels (small, medium or large) and gear utilized (non-pelagic trawl, bottom trawl, pelagic trawl, hook-and-line, pot vessel or jig gear). To augment this data set, bathymetric measurements are added indicating maximum and minimum depth in the area fished as well as biomass estimates obtained from the annual biomass trawl survey conducted by NMFS. This data set consists of 320 observations, with a censoring process of 35% and contains observations on 90 spatially distinct regions within the Eastern Bering Sea. Results of the empirical application for the SAE model are in process.. 6 Conclusion This paper proposes a method of estimation for two sample selection models with spatial dependence that differ in terms of the type of spatial correlation present. The first, a model with spatial autoregressive errors (SAE), is shown to be analytically more challenging than the second, a model with spatial autoregressive lags (SAL). The method of estimation for both models is analogous to the popular heckit model 18 (and thus we call our estimator "spatial heckit"), in which first consistent estimates of the probability of observing a particular unit (selection equation) are estimated using a probit model. Then, the odds of observing each unit are calculated (the inverse Mills ratio) and used as an additional regressor that controls for the selection bias in the equation of interest (outcome equation). Importantly, the method we propose for the model with SAE cannot be computed in the popular two-step procedure for the heckit model since the appropriate inverse Mills ratio depends on the SAE parameter of the outcome equation. Instead, we propose to estimate the whole model jointly by nesting the two equations into a "pseudo" sequential GMM framework (Newey, 1984). For the SAL estimator, this complication does not arise and thus can in principle be estimated in two steps, however, an advantage of using the joint GMM framework is that it yields the correct standard errors for the parameters directly. We explore the properties of our estimators by deriving their asymptotic properties, conducting simulations, and applying them to actual data. The estimators are consistent and asymptotically normally distributed. The simulations show the potential biases incurred by other estimators that ignore sample selection, spatial dependence, or both, and also show that our estimator is valuable when the data exhibits both of these characteristics. Finally, the empirical application section argues that sample selection is a common occurrence in spatial data sets typically available to researchers, and shows that our estimators are feasible to use in practice. While our proposed estimator is among the first to account for sample selection and spatial dependence simultaneously, some shortcomings are worth mentioning. First, our estimator relies on a distributional assumption (joint normality) of the error terms in selection and outcome equations, just as the heckit model does. Second is the computational intensity of our estimator, which is higher than the available methods for spatial models without sample selection, although still compares favorably in this respect with estimation methods that require approximation of multidimensional integrals. These shortcomings of our estimators indicate areas for future research. 19 References [1] Amemiya, T. 1974. The Nonlinear Two-Stage Least-Squares Estimator. Journal of Econometrics, 2: 105-10. [2] Amemiya, T. 1985. Advanced Econometrics. Cambridge, Mass.: Harvard University Press. [3] Andrews, D.W.K. 1992. Generic Uniform Convergence. Econometric Theory, 8: 241-57. [4] Anselin, L. 1988. Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers. [5] Anselin, L. 2001. Spatial Econometrics. In B. H. Baltagi (ed.), A Companion to Theoretical Econometrics, Oxford: Blackwell. [6] Anselin, L. and A. Bera. 1998. Spatial Dependence in Linear Regression Models with an Introduction to Spatial Econometrics. In A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics. New York: Marcel Dekker. [7] Benjamin, J.D., J. Sa-Aadu and J.D. Shilling. 1992. Influence of Rent Differentials on the Choice Between Office Rent Contracts with and without Relocation Provisions. Journal of American Real Estate and Urban Economics Association. 20: 289-302. [8] Bernstein, S. 1927. Sur l’Extension du Theoreme du Calcul des Probabilites aux Sommes de Quantities Dependantes. Mathematische Annalen. 97, 1-59. [9] Beron, K. and W. Vijverberg. 1999. Probit in a Spatial Context: A Monte Carlo Analysis. Forthcoming in L. Anselin and R. Florax (eds.), Advances in Spatial Econometrics. New York: Springer. [10] Borjas, G.J., S.G. Bronars, and S. Trejo. 1992. Self-Selection and Internal Migration in the United States. Journal of Urban Economics. 32: 159—85. [11] Case, A.C. 1991. Spatial Patterns in Household Demand. Econometrica. 59: 953-965. [12] Case, A.C. 1992. Neighborhood Influence and Technology Change. Regional Sciences and Urban Economics. 22: 491-508. 20 [13] Casetti, E. 1972. Generating Models by the Expansion Method: Applications to Geographical Research. Geographical Analysis. 4: 81-91. [14] Cosslett, S.R. 1991. Semiparametric Estimation of a Regression Model with Sample Selectivity. In W.A. Barnett, J. Powell and G. Tauchen (eds.), Nonparametric and Semiparametric Methods in Econometrics and Statistics. Cambridge: Cambridge University Press. [15] Davidson, J. 1994. Stochastic Limit Theory. Oxford University Press: Oxford. [16] Fishback, P.V., W.C. Horrace and S. Kantor. 2002. Do Federal Programs Affect Internal Migration? The Impact of New Deal Expenditures on Mobility During the Great Depression. Working Paper Series, NBER, Number 8283. [17] Fleming, Mark. 2002. Techniques for Estimating Spatially Dependent Discrete Choice Models. Forthcoming in L. Anselin and R. Florax (eds.), Advances in Spatial Econometrics. New York: Springer. [18] Goodman, A.C. 1988. An Econometric Model of Housing Price, Permanent Income, Tenure Choice, and Housing Demand. Journal of Urban Economics. 23: 327-53. [19] Heckman, J. 1976. The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for such Models. The Annals of Economic and Social Measurement. 5: 475-492. [20] Heckman, J. 1979. Sample Selection Bias as a Specification Error. Econometrica. 47: 153-61. [21] Kelejian, Harry H. and Ingmar R. Prucha. 1998. A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances. Journal of Real Estate Finance and Economics, 17: 99-121. [22] Kelejian, Harrry H. and Ingmar R. Prucha. 1999. A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. International Economic Review, 40: 509-33. [23] Kelejian, H.H. and I.R. Prucha. 2001. On the Asymptotic Distribution of the Moran I Test Statistic with Applications. Journal of Econometrics. 104: 219-57. 21 [24] Lee, L.F. 2001a. Generalized Method of Moments Estimation of Spatial Autoregressive Processes. Working paper, Ohio State University. [25] Lee, L.F. 2001b. GMM and 2SLS Estimation of Mixed Regressive, Spatial Autoregressive Models. Working paper, Ohio State University. [26] LeSage, J.P. 2000. Bayesian Estimation of Limited Dependent Variable Spatial Autoregressive Models. Geographical Analysis. 32: 19-35. [27] Leung, S.F and S. Yu. 1996. On the Choice Between Sample Selection and Two-Part Models. Journal of Econometrics. 72: 197-229. [28] Maddala, G.S. 1983. Limited Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press. [29] McLeish, D.L. 1974. Dependent Central-Limit Theorems and Invariance Principals. Annals of Probability. 2, 620-628. [30] McMillen, D.P. 1992. Probit with Spatial Autocorrelation. Journal of Regional Science. 32: 335-48. [31] McMillen, D.P. 1993. Can Blacks Earn More in the Suburbs? Racial Differences in Intrametropolitan Earnings Variation. Journal of Urban Economics. 33: 167-88. [32] McMillen, D.P. 1995. Selection Bias in Spatial Econometric Models. Journal of Regional Science. 35: 417-36. [33] Montgomery, C. 1992. Explaining Home Improvement in the Context of Household Investment in Residential Housing. Journal of Urban Economics. 32: 326-50. [34] Newey, W. 1984. A Method of Moments Interpretation of Sequential Estimators. Economics Letters. 14: 201-206. [35] Pinkse, J. and M.E. Slade. 1998. Contracting in Space. An Application of Spatial Statistics to Discrete-Choice Models. Journal of Econometrics. 85: 125-54. [36] Schnier, K. 2003. Because That’s Where the Fish Are? Spatial Analysis of Locational Choice in the Eastern Bering Sea. Working Paper, University of Arizona. 22 [37] Topa, G. 2001. Social Interactions, Local Spillovers, and Unemployment. Review of Economic Studies. 2: 133-138. [38] Vella, F. 1998. Estimating Models with Sample Selection Bias: A Survey. Journal of Human Resources. 33: 127-69. 23 Table 1. Simulation Results for N=225 - Outcome Equation BIAS (sel, d=?) (25%,0.25) β0 OLS Heckit 0.305 -0.001 -0.208 -0.153 0.001 0.110 KP-SAE 0.295 0.009 β2 -0.200 γ -0.077 β0 (25%,0.5) 0.320 -0.199 0.315 β1 0.008 0.010 0.033 β2 -0.219 0.129 -0.229 γ -0.127 β0 (25%,0.75) 0.397 -0.272 0.405 β1 -0.004 -0.003 0.114 β2 -0.243 0.134 -0.353 γ -0.137 β0 (50%,0.25) 0.561 -0.220 0.556 β1 -0.011 0.000 -0.012 β2 -0.302 0.109 -0.294 γ -0.167 β0 (50%,0.5) 0.579 -0.349 0.576 β1 -0.004 0.006 0.011 β2 -0.309 0.144 -0.302 γ -0.242 β0 (50%,0.75) 0.656 -0.221 0.685 β1 0.002 0.013 0.047 β2 -0.326 0.072 -0.362 γ -0.257 Note: Simulation results are based on 100 replications. β1 RMSE Spheck Gen -0.109 -0.031 0.074 0.098 -0.152 -0.013 0.113 0.000 -0.203 -0.025 0.141 -0.072 0.404 -0.175 -0.184 -0.029 Spheck KP -0.196 -0.051 0.133 0.091 -0.229 -0.032 0.143 -0.002 -0.265 -0.050 0.139 -0.074 0.407 -0.216 -0.151 -0.037 OLS Heckit KP-SAE 0.380 0.268 0.336 0.757 0.268 0.615 0.408 0.276 0.348 0.886 0.277 0.669 0.538 0.335 0.403 1.808 0.336 1.041 0.639 0.344 0.452 1.190 0.350 0.780 0.669 0.359 0.476 1.560 0.361 0.852 0.773 0.376 0.507 1.943 0.382 0.943 0.379 0.279 0.336 0.152 0.438 0.372 0.398 0.171 0.849 0.949 0.972 0.167 0.631 0.339 0.452 0.221 0.685 0.395 0.519 0.276 0.946 0.822 0.906 0.289 Spheck Gen 0.790 0.264 0.647 0.136 0.923 0.277 0.694 0.087 1.251 0.325 0.784 0.101 Spheck KP 0.934 0.278 0.785 0.133 1.042 0.287 0.791 0.083 1.193 0.329 0.750 0.104 1.678 0.415 0.799 0.067 1.839 0.425 0.898 0.072 Table 2. Simulation Results for N=400 - Outcome Equation BIAS (sel, d=?) (25%,0.25) β0 β1 β2 γ (25%,0.5) β0 β1 β2 γ (50%,0.25) OLS 0.329 -0.009 -0.242 Heckit -0.066 -0.006 0.028 0.334 -0.009 -0.234 -0.069 -0.012 0.036 KP-SAE 0.330 -0.003 -0.255 -0.086 0.357 -0.019 -0.263 -0.125 β0 β1 β2 γ (50%,0.5) β0 β1 β2 γ Note: Simulation results are based on 50 replications. RMSE Spheck KP -0.065 -0.064 0.018 0.089 -0.053 -0.043 0.012 -0.001 OLS 0.361 0.188 0.304 Heckit 0.435 0.191 0.315 0.374 0.205 0.301 0.484 0.203 0.329 KP-SAE 0.363 0.208 0.319 0.139 0.405 0.287 0.352 0.157 Spheck KP 0.451 0.213 0.310 0.108 0.490 0.211 0.321 0.050 7 Appendix A The following provides a description of the variables used in the estimation procedure and the sources from which they were obtained. HAULSi : The aggregate hauls for a given vessel group classification.(NOAA) CPi : A dummy variable for a catcher-processor vessel group.(NOAA) CVi : A dummy variable for a catcher-vessel vessel group.(NOAA) Li : A dummy variable for a large class vessel group.(≥ 125 feet)(NOAA) Mi : A dummy variable for a medium class vessel group.(99 − 124 feet)(NOAA) Si : A dummy variable for a small class vessel group.(< 99 feet)(NOAA) DumCV Mi : A dummy variable created by crossing CVij with Mij . NP Ti : A dummy variable for whether or not the vessel group used non-pelagic trawl gear.(NOAA) BT Ri : A dummy variable for whether or not a vessel group used a bottom trawl.(NOAA) P T Ri : A dummy variable for whether or not a vessel group used pelagic trawl gear.(NOAA) HALi : A dummy variable for whether or not a vessel group used hook and line gear.(NOAA) P OTi : A dummy variable for whether or not a vessel group used pot vessels for gear.(NOAA) JIGi : A dummy variable for whether or not a vessel group group used jigs as gear.(NOAA) REGION1: A dummy variable for the region 1 classification of the Eastern Bering Sea(NMFS) REGION2: A dummy variable for the region 2 classification of the Eastern Bering Sea(NMFS) REGION3: A dummy variable for the region 3 classification of the Eastern Bering Sea(NMFS) REGION4: A dummy variable for the region 4 classification of the Eastern Bering Sea(NMFS) REGION5: A dummy variable for the region 5 classification of the Eastern Bering Sea(NMFS) REGION6: A dummy variable for the region 6 classification of the Eastern Bering Sea(NMFS) XCOORDi : The X-Coordinate that observation ij was observed.(Nautical Maps and ADFG Stat Areas) Y COORDi : The Y-Coordinate that observation ij was observed.(Nautical Maps and ADFG Stat Areas) MAXDEP T Hi : The maximum depth in location i.(Nautical Maps with Bathymetric Measurements) MINDEP T Hi : The minimum depth in location i.(Nautical Maps with Bathymetric Measurements) BIOMASSi : The biomass estimate for observation ij. (NMFS) LAGBIOMASSi : The lagged biomass estimate for observation ij. (NMFS) IMRi : The inverse-mills ratio constructed from the selection equation. 8 Appendix B The following is a list of the variables included in the empirical application of equations (??)-(??). Pacific cod: x1i : CPi , Li , DumCV Mi, NP Ti , BT Ri , HALi , P OTi , REGION3, REGION 4, REGION5, REGION6, Dum1995, Dum1996, Dum1997, Dum1998, Dum1999, BIOMASSi , LAGBIOMASSi , MAXDEP T Hi , MINDEP T Hi 24 z1i : CPi , Li , NP Ti , BT Ri , HALi , P OTi , REGION3, REGION4, REGION 5, REGION6, Dum1995, Dum1996, Dum1997, Dum1998, Dum1999, BIOMASSi , LAGBIOMASSi , MAXDEP T Hi , MINDEP T Hi , IMRi 9 Appendix C The parameter estimate of the sample selection model with SAE dependence is obtained 0 ∼ from the solution to (15), where gN (θ) = N1 zN uN (θ). Denote g(θ) ≡ lim E[gn (θ)]. Then, n→∞ the unknown parameter vector θ0 satisfies lim E[gN (θ0 )] = 0. Further, define the objective n→∞ p 0 function as QN = gN (θ)MN gN (θ), where MN → M, and Q = g 0 (θ) Mg (θ). The proof of all three propositions follow closely Pinkse and Slade (1998). The main difference is that some extra conditions have to be verified in for the additional moments stacked in gN (θ) and the estimated inverse-Mills ratio (IMR). Assumptions A1 θ0 is in the interior of the parameter space Θ,which is a compact set. A2 Q is uniquely minimized at θ0 A3 The vector valued function g (θ) is continuous. A4 The density of observations in any region whose area exceeds a fixed minimum is bounded. A5 The elements of zN are uniformly bounded.14 A6 Let dij denote the distance between location i and j, then sup|cov(y1i , y1j )| ≤ α (dij ) , N ij sup|cov(y2i , y2j )| ≤ α (dij ) and N ij α (d) → 0 as d → ∞. A7 The moments var (uij ) = σ 21 P £ 1 ¤2 P £ 2 ¤2 P ωij , var (u2i ) = σ 22 ω ij , and E [u1i, u2i ] = σ 12 ω1ij ω2ij j j j are uniformly bounded , bounded away from zero, and boundedly differentiable (with respect to θ). p A8 As N → ∞, MN → M for some positive definite matrix M. A9 As d → ∞, d2 α (dd∗ ) /α (d∗ ) → 0, for all fixed d∗ > 0. A10 The area in which the observations are located grows at a rate of √ N in both directions. 0 A11 Ψ1 (θ0 ) = lim E{NgN (θ0 )gN (θ0 )} and Ψ2 (θ0 ) = [∂g 0 (θ0 )/∂θ]M[∂g(θ0 )/∂θ0 ] are posiN→∞ tive definite matrices. 14 λ(δ, γ) is an element of zN which will be shown below to be uniformly bounded. 25 uN i var(uN i ) A12 Let ρNij (θ) be the covariance between √ and √ uN j var(uN j ) . For some fixed N ∗ > 0, ρN ij (θ) is boundedly differentiable , uniformly in θ ∈ Θ, N > L, and i 6= j. A13 | ρNij (θ) | is boundedly away from one from below, uniformly in θ ∈ H(θ0 ), N > N ∗ , and i 6= j, with H(θ0 ) some neighborhood of θ0 . P A14 As N → ∞, N1 | ρNij (θ) | is uniformly bounded in θ ∈ H(θ0 ). ij A discussion of the implications of these assumptions can be found in Pinkse and Slade (1998). Proof of Proposition 1 By assumption A2, Q is uniquely minimized at θ0 . Thus, we only need to establish that QN converges uniformly to Q over the parameter space Θ. To show that QN converges uniformly to Q over Θ, it suffices to show that p p (a) QN → Q at all θ ∈ Θ ⇐⇒ gN (θ) → g (θ) at all θ ∈ Θ. p (b) QN is stochastically equicontinuous and Q is continuous on Θ ⇐⇒ gN (θ) → g (θ) is stochastically equicontinuous. p p For (a), note that g (θ) ≡ lim E [gN (θ)], that is , E [gN (θ)] → g (θ) . If gN (θ) → N→∞ p E [gN (θ)], then gN (θ) → g (θ) , so we show the former. Define the functions τ 1i (ψi (θ)) ≡ φ (ψi (θ)) /{Φ (ψi (θ)) [1 − Φ (ψi (θ))]} such that u e1i (θ) = τ 1i (ψi (θ)) (yi − Φ (ψi (θ))). 0 Let τ i (ψi (θ)) ≡ (τ 1i (ψi (θ))) , 1) where 1 is a conformable vector of ones such that 0 0 0 u eN (θ) = ((τ 1 (ψ (θ)) (y − Φ (ψ (θ)))) , 1u2N (θ)) . Then, lim E|gN (θ) − E [gN (θ)] |2 1 X 0 = lim 2 zi τ Ni τ Nj zj cov [yi, yj ] n→∞ N ij 1 X ≤ lim 2 C α (dij ) by assumptions A5-A7 N→∞ N ij n→∞ = 0 p since α (dij ) → 0 as d → ∞ by assumption A6. Therefore, since gN (θ) → E [gN (θ)] then p gN (θ) → g (θ). For (b), given that by assumption A3 g(θ) is continuous, we need to show that gN (θ) is stochastically equicontinuous. 26 Using the mean value theorem for θ∗ between θ and ˜θ we rewrite ³ ´ uNi (θ∗ ) 1 0 0 1 0 ∂e gN (θ) − gN ˜θ = zN {e uN − u eN (˜θ)} = zi (θ − ˜θ). 0 N N ∂θ Following Andrews(1992), stochastic equicontinuity is implied by ¯ ¯ ¯ 1 X 0 ∂e uNi (θ) ¯¯ ¯ sup ¯ zi ¯ = Op (1) . 0 ∂θ ¯ θ∈Θ ¯ N i 0 0 Recall that τ i (ψ (θ)) ≡ (τ 1i (ψ (θ)) , 1) with τ 1i (ψi (θ)) ≡ φ[ψi (θ)]/Φ[ψi (θ)] {1 − Φ (ψi (θ))}. 0 0 0 Similarly, define hi (θ) ≡ ((yi − Φ(ψi (θ))) , u e2i (θ) ) , such that uNi (θ) = τ i (ψi (θ)) hi (θ) and the derivative becomes: · ¸ ∂e uNi (θ) ∂τ i (ψi (θ)) ∂hi (θ) ∂ψi (θ) = hi (θ) − τ i (ψi (θ)) 0 ∂θ ∂θ ∂θ ∂θ where ∂τ i (ψi (θ)) ∂ = ∂θ ∂θ ∂ ∂hi (θ) = ∂θ ∂θ · 0 τ 1i (ψi (θ)) 1 " ¸ = " ∂ ∂θ h Φ[ψ i (θ)] Φ[ψ i (θ)]{1−Φ(ψ i (θ))} 0 0 [y1 − Φ[ψi (θ)]] ¤0 £ 0 y2 − β 0 − x2i β 1 − µλi (δ, γ) # = i0 # 0 [−Φ[ψi (θ)]] h i0 0 ∂λi (δ,γ) − 1 − x2i − µ ∂θ − λi (δ, γ) ∂ ∂ψi (θ) rαo + x1i α1 = ∂θ ∂θ P £ω 1 (θ)¤2 0 j ij rP £ ¤2 1 ∂ (θ) ω · ¸ sX ij ´ ¤2 ³ ¤2 £ 1 £ 1 j 0 · Σ ωij (θ) = 1 + x1i − (α0 + x1i α1 ) ωij (θ) j ∂θ j N i (θ) is bounded uniformly. Following Then the task is to show that each of the parts of ∂hu∂θ 0 Pinkse and Slade (1998), we first establish ¯ ¯ ¯ ∂τ (t) ¯ ∂h (t) ¯ ¯<∞ h (t) − τ (t) sup ¯ ∂t ¯ ∂t yi ∈0,1;t∈R,y2 ∈R for which it is enough to show (i) ∂τ∂θ(θ) and (ii) ∂h(t) τ (t) are bounded uniformly in t. ∂θ 1 (t) For (i), given that the only non-zero components of ∂τ∂θ(θ) are those of ∂τ∂θ , we concentrate on those. These components are the same as in Pinkse and Slade (1998) setup and thus their same arguments apply. Note that: · ½ ¾¸ 1 φ (t) φ (t) φ2 (t) ∂τ 1 (θ) = −t − 2 ∂θ Φ (t) 1 − Φ (t) 1 − Φ (t) Φ (t) (1 − Φ (t)) 27 and the only places it can be unbounded are at ±∞. Since the expression is an even function, φ(t) and rewrite the above expression as: it suffices to check t → ∞. Define Υ (t) = 1−Φ(t) ∂τ 1 (t) (Υ (t) − t)2 + t(Υ (t) − t) − φ (t) (Υ (t) − t)/Φ (t) − tφ (t) /Φ (t) = ∂t Φ (t) and noting that as t → ∞, Φ (t) → 1 and tφ (t) → 0, and that ½ ¾ Z ∞ Z ∞ ¡ −4 ¢ tφ (t) φ (w) 1 1 − Φ (w) = dt = 1+ 2 +O w φ (t) dt = t w w w w as w → ∞, the remaining term is Υ (t) − t = as t → ∞. Thus, analyzing [ 1 + O(t−3 )]2 + t ∂τ 1 (t) = t ∂t and hence ∂τ 1 (θ) ∂θ £1 t 1+ t 1 − t = + O(t−3 ) −4 + O (t ) t 1/t2 ¤ £ ¤ + O(t−3 ) − φ (t) 1t + O(t−3 ) /Φ (t) − tφ (t) /Φ (t) →1 Φ (t) is bounded. ∂h(t) τ ∂θ For (ii), that is, (t), recall that 0 [−φ (t)] ∂h (t) h i0 and τ (t) = = 0 ∂λ(δ,γ) ∂t − 1 − x2 − µ ∂θ − λ (δ, γ) · 0 τ 1 (t) 1 ¸ and note it can be written as # # " " 1 [Υ (t) − t + t] −φ (t) Φ(t) −φ (t) τ 1 (t) ∂h (t) h i i . h = τ (t) = 0 0 ∂λ(t) − 1 − x2 − µ ∂λ(t) − λ (t) − µ − λ (t) − 1 − x ∂t 2 ∂t ∂t 1 The first component, φ (t) Φ(t) [Υ (t) − t + t] is bounded by the same arguments above: −1 Φ (t) → 1, (Υ(t) − t) = t + Oh(t−3 ) and φ (t) → 0. So i that it remains to check that the 0 ∂λ(t) second component is bounded: 1 − x2 − µ ∂t − λ (t) . Take λ (t) = Σω1ij ω2ij j Σ[ω1ij ]2 j · φ(t) {1−Φ(t)} = Σω1ij ω2ij j Σ[ω 1ij ]2 Υ (t). j 1 as t → ∞ and by assumption A7, λ (t) is bounded. 1/t+1/t3 +(1/t)O(t−4 ) 1 2 Σωij ωij j = · ∂Υ(t) , by assumption A7 the first term is bounded, while using Taking ∂λ(t) ∂t ∂t Σ[ω 1ij ]2 Since Υ (t) = j results above, the second term: · ¸ φ2 (t) − φ (t) t (1 − Φ (t)) ∂ φ (t) {1 − Φ (t)} φ (t) (−t) − φ (t) (−φ (t)) = = ∂t 1 − Φ (t) [{1 − Φ (t)}]2 [{1 − Φ (t)}]2 = φ2 (t) [1/t + 1/t2 + 0 · (t−4 )] − φ2 (t) [1/t + 1/t3 + (1/t)0 · (t−4 )]2 [1/t + 1/t3 + (1/t)0 · (t−4 )]2 28 ∂ and thus ∂t Υ (t) is bounded and therefore ∂λ(t) is bounded and the expression ∂h(t) is also ∂t ∂t bounded. ¸ · y1 − Φ (t) ¤ is also bounded as Φ (t) → Note also that the term h (t) = £ 0 y2 − β o − x2 β 1 − µλ (t) 1 and λ (t) is bounded. By assumption A5 the elements of zN are bounded and thus for some constant C: ¯ ¯ ° N ° ¯ 1 X 0 ∂e ¯ ° ∂ψ u (θ) 1 X° (θ) ¯ ¯ Ni i ° °. sup ¯ zi ¯ ≤ Csup 0 ° ∂θ ° ∂θ ¯ θ∈Θ ¯ N i θ∈Θ N i=1 Finally, checking ∂ψ i (θ) ∂θ with previous results, rP £ ¤2 1 " # ∂ (θ) ω sX ij ´ X£ 1 ¤2 ³ ¤2 £ 1 j ∂ψi (θ) 0 0 · . 1 + x2i − (α0 + x1i α1 ) = ω ij (θ) ω ij (θ) ∂θ ∂θ j j Since x1i is bounded by A5 and the terms in the sums are also bounded by A7, then ° N ° P ° ∂ψi (θ) ° sup N1 ° ∂θ ° is bounded by A6 and A1. QED θ∈Θ i=1 Proof of Proposition 2 The first order ³ ´ conditions based on the objective function QN = gN 0 (θ) MN gN (θ) are ∂QN ˆθ given by = 0. Using the mean value theorem for θ∗ between ˆθ and θ0 we can write: ∂θ ³ ´ · ∂ 2 Q (θ∗ ) ¸−1 ∂Q (θ ) N N 0 ˆθGMM − θ0 = (21) 0 ∂θ ∂θ∂θ The expression that corresponds to the second derivative of the objective function QN can be written as: "N # N X ∂ 2 QN (θ) eNi (θ) 0 ∂e uNi (θ) 0 ∂e uNj (θ) 2 X ∂2u . = 2 z MN zj u eN j (θ) + zi MN zj 0 N i=1 ∂θ∂θ0 i ∂θ ∂θ∂θ0 ∂θ i=1 We start by analyzing the convergence properties of this second derivative of the objective function. The following lemma will be useful. Lemma 1 (Pinkse and Slade (1998) Lemma A3) p For any ˜θ consistent for θ0 , i.e. ˜θ −→ θ0 : ³ ´ ∂gN ˜θ p ∂g (θ 0 ) (a) −→ 0 ∂θ ∂θ0 29 ³ ´ p (b) gN ˜θ −→ gN (θ0 ) Proof: (a) Need to show that for ω such that kωk = 1, ³ ´ ∂gN ˜θ ∂gN (θ0 ) p ω0 − −→ 0 0 ∂θ ∂θ0 ∂gN (θ0 ) ∂g (θ0 ) p = follows from gN (θ0 ) −→ g (θ0 ) and A3. 0 0 N−→∞ ∂θ ∂θ 0 Setting z¯i = ω zi and using the mean value theorem, the above expression can be written as ³ ´ ˜ N N ´0 1 X θ ∂e u ∂2u eNi (θ∗ ) 1 X Ni ∂e uN i (θ0 ) ³˜ = z¯i − z ¯ = θ − θ 0 i N i=1 N i=1 ∂θ0 ∂θ0 ∂θ∂θ0 as lim where θ∗ is between ˜θ and θ0 . N 1P ∂2u eNi (θ∗ ) Analyzing the last expression, z¯i is bounded since by A5 z¯i is uniN i=1 ∂θ∂θ0 formly bounded (including λ (δ, γ), which was shown in the proof of proposition 1) ´ p ³ eNi (θ∗ ) ∂2u p ˜ ˜ is also bounded. Therefore, since θ −→ θ0 , θ − θ0 −→ 0 and thus and 0 ³ ´ ∂θ∂θ ∂gN ˜θ ∂gN (θ0 ) p − −→ 0 ¥ ω0 0 ∂θ ∂θ0 (b) The proof is analogous to part (a). N P zj u eNj (θ) = j=1 p gN (θ∗ ) −→ gN (θ0 ) Now returning to analyzing the second derivative of QN (θ), note that op(N) from (a) in the proof of proposition 1. Furthermore, by lemma 1 N ∂2u P eNi (θ∗ ) 0 zi ω is bounded in probability in probability ∀ kωk = 1. Then, the and also N1 ∂θ∂θ0 i=1 first term inside the square brackets will vanish asymptotically. ³ ´ ˜θ ∂e u N Ni P ∂ 2 QN (θ) 1 , it will converge Considering the remaining term of ∂θ∂θ0 , looking at N zi ∂θ0 i=1 ∂ 2 QN (θ∗ ) p ∂g (θ0 ) p by lemma 1. Finally, since M → M by A8, → in probability to N ∂θ¸0 ∂θ∂θ0 ¸ · · 0 ∂g (θ0 ) ∂g (θ0 ) ≡ Ψ2 (θ0 ). M ∂θ ∂θ0 30 ∂QN (θ0 ) ∂QN (θ0 ) in equation (21), which is equal to = ∂θ ∂θ 0 0 0 ∂g (θ0 ) ∂g (θ0 ) p ∂g (θ0 ) 2 N MN gN (θ0 ) and it follows from previous results that N . −→ ∂θ ∂θ ∂θ The remaining task is to show that gN (θ0 ) → N (0, Ψ1 (θ0 )). To do this, we follow again closely Pinkse and Slade (1998)’s strategy and employ Berenstein (1927) blocking method using McLeish (1974) central limit theorem for dependent processes in Davison’s (1994, chapter 24). N 1 √ P 0 (θ0 )]}− 2 NgN (θ0 ) = √1N AN t for implicStart by defining Y0N = ω 0 {E [NgN (θ0 ) gN Now we turn to the term t=1 d itly defined ANt and ∀ kωk = 1, and thus the task is to show Y0N → N (0, 1). Following Davison (1994, chapter 24), √ √ split the region in which the observations are located into aN areas of size c1 bN × c2 bN each, where aN and bN are integers such that aN bN = N. Without loss of generality, set c1 = c2 = 1 and let aN and bN be such that α (bN ) aN → 0 1 and N l−( 2 ) bN < 1 uniformly in N for some fixed 0 < l < 12 . Define the set of indices ΛNj that correspond to observations in area j. By assumption, a number c > 0 exists such that max |ΛN j | < CbN , where |·| applied to sets denotes the j cardinality of that set. Define DN j = √1 N P ANt such that Y0N = t∈ΛN j aN P DNj . j=1 Following Davison’s (1994) theorem 24.1, McLeish’s (1974) CLT requires that the following conditions hold. aN (a) TNaN = Π (1 + iλDNj ) is uniformly integrable in N > N ∗ for some fixed N ∗ and j=1 λ > 0. (b) E [TNaN ] − 1 → 0 (c) aN P p 2 DNj −1→0 j=1 p (d) max |DN j | → 0 j Before establishing each of these conditions, we note that for sufficiently large N, ANt is 0 bounded, since Ψ1 (θ0 ) is positive definite (p.d.) and thus E [NgN (θ0 ) gN (θ0 )] is p.d. and its inverse is bounded. We establish each of the above conditions in turn. (a) It needs to be shown that for some fixed N ∗ , sup E |TNaN I {|TNaN | > K}| → 0 as K → ∞ N>N ∗ where I is an indicator function. Following Pinkse and Slade (1998), we begin by showing that P for some K > 0 which will imply the above condition. Note that 31 · ¸ sup |TNaN | > K = 0 N>N ∗ ¯ ¯ ¸ ¯ aN ¯ ¯ ¯ P sup |TNaN | > K = P sup ¯ Π (1 + iλDNj )¯ > K N>N ∗ ·N>N ∗ aj=1 ¸ ¢ 12 N ¡ 2 2 ≤ P sup Π 1 + λ DNj > K N>N ∗ j=1 # " · ¸ aN ¡ 1 ¢ 2 2 l l 2 = P sup Π 1 + λ DNj > K | sup N |DNj | ≤ C · P maxN |DN j | ≤ C + j N>N ∗ j=1 N>Nj∗ " # " # aN ¡ 1 ¢ 2 2 P sup Π 1 + λ2 DNj > K | sup N l |DNj | > C · P sup N l |DNj | > C N>N ∗ j=1 N>Nj∗ N>N ∗ " # " j # aN ¡ 1 ¢ 2 2 ≤ P sup Π 1 + λ2 DN > K | sup N l |DN j | ≤ C + P sup N l |DNj | > C j · ¸ · N>N ∗ j=1 N>Nj∗ N >Nj∗ with C©¡ a uniform upper ª the ANt0 s . Nothing that the first summand is bounded ¢ bound to 2 −2l an/2 > K which is zero for sufficiently large K, and that the by sup I 1 + λ CN N>N ∗ · ¸ 1 l−( 12 ) second summand is bounded by P sup N bN |ANt | > C = 0 since N l−( 2 ) bN < 1 by N >N ∗ ,l ¸ · construction. Therefore, P sup |TNaN | > K = 0 for some K > 0, which in turn implies N>N ∗ sup E |TNaN I {|TNaN | > K}| → 0 as K → ∞ and thus condition (a) holds. N>N ∗ aN P (b) From Davison (1994) we can write TNaN = (iλ) DNj TN,j−1 . Then, it is enough to j=1 ¡ −1 ¢ show that the max |E [DNj TN,j−1 ]| = o aN . j Writing TN,j−1 = Π k∈ΞN j1 (1 + iλDNk ) · Π k∈Ξ / N j1 (1 + iλDNk ) = Π k∈ΞN j1 (1 + iλDNk ) TRNj where TRNj is implicitly defined and ΞNj1 is the set of blocks adjacent to block j. µ ¶ P γk Thus, TN,j−1 = (iλ) Π DNk TRNj with ΓNj the set of vectors of size equal to γ∈ΓN j k∈ΞN j1 |ΞNj1 | whose elements are all either zero ¯or one. · Because the number ¸¯ of elements in ΓNj ¯ ¯ ¡ ¢ γk ¯ = o a−1 is finite, it is enough to show that max ¯¯E DNj TRNj Π DNk N . To show ¯ j,γ k∈ΞN j1 ¡ ¢ this, we show that (i) max |E [DNj TRNj ]| = o a−1 and (ii) max |E [DNj DNk TRNj ]| = N j j6=k ¡ −1 ¢ o aN . 1 For (i), note that the observations in TRNj are located at least a distance bN2 away from those in DNj by construction. Therefore, max |E [DNj TRNj ]| is bounded by j ¡√ ¢ C1 maxE |DNj TRNj | α bN for large C1 > 0 given the conditions on the covariances j and that E [DNj ] = 0. Using the properties of the constructed aN , bN and the properties of α in A6: ³ ³ − 1 − 1 ¡√ ¢´ ¡√ ¢ ¡√ ¢´ ¡ ¢ 1 = o a−1 C1 maxE |DNj TRNj | α bN = o N − 2 bN α bN = o aN 2 bN 2 α bN N j for large N, as required. 32 1 For (ii), max |E [DN j DNk TRNj ]| ≤ maxN − 2 bN |E[DNj TRNj ]| by the boundedness propj6=k j6=k ³ ´ ¡ ¢ 1 erty of the ANt0 s . The maximum is thus of order O N − 2 bN a−1 which is o a−1 N N . Repeating the argument for the remaining elements in ΓN j completes the proof. Therefore, condition (b) E [TNaN ] → 1 is demonstrated. (c) We start by showing that aN ¡ £ 2 ¤¢ p P 2 − E DNj →0 DNj j=1 Take aN P √ C1PaN ©¡ 2 £ 2 ¤¢ ¡ 2 £ 2 ¤¢ª ¡√ ¢ 4 E DNj − E DNj (l + 1) α bN l maxE [DNi ] DNj − E DNj ≤ C2 i,j=1 i l=0 where C1 , C2 > 0 are sufficiently large constants, and the inequality is a result of the conditions on the covariances and locations in assumptions A6 and A9. The right hand side of the inequality is of order 0 (N −2 b3N aN ) since it follows from the conditions 4 in A6 and A9 that maxE [DN i ] is bounded by i P 1 max |E N2 i t1 ,t2 ,t3 ,t4 ∈ΛNi P C1 N12 max i t ,t ,t ,t ∈Λ 1 2 3 4 P C4 N12 b2N max i [ANt1 AN t2 AN t3 ANt4 ] | ≤ {α (dt1 ,t2 ) + . . . + α (dt3 ,t4 )} ≤ C2 N12 b2N max i Ni 1/2 C3 bN P P t1 ,t2 ∈ΛNi lα (l) for some C1 , C2 , C3 , C4 > 0, and since t1 ∈ΛN i l=0 α (dt1 ,t2 ) ≤ ∞ P lα (l) is bounded l=0 by A6, the last term is of order O (N −2 b3N aN ). √ C1PaN ¡√ ¢ 4 −2 3 (l + 1) α bN l maxE [DN bN aN ) = o (1) as n → ∞ by the Finally, C2 i ] = O (N i l=0 properties of aN and bN . Thus we have shown that (c) can be written as: as aN P aN P 2 DNj −1 = j=1 aN P £ ¤ aN ¡ £ 2 ¤¢ p P 2 − E DNj → 0 and condition DNj j=1 2 E DN j − 1 + op (1) which can be further rewritten j=1 £ 2 ¤ P 2 − 1 + op (1) = E [Y0N E DNj ] − 1 − E [DNi DNj ] + op (1) . j=1 i6=j P To check the order of convergence we need only to analyze the term E [DNi DNj ]. For i6=j P this it is enough to analyze max |E [DNi DN j ] | since each of the summations over i and i i6=j j contain P terms with aN or aN−1 . Take ΞNil as previously defined, then, for some C1 > 0, max |E [DNi DNj ] | can be bounded by i i6=j √ C1 aN max i X X l=1 j∈ΞN il |E [DNi DNj ]| ≤ max i X j∈ΞN il √ C1 aN |E [DNi DNj ]| + max i X X l=2 j∈ΞN il |E [DNi DNj ]| and we analyze each of the terms in the right-hand side ¯ . ¯ ¯ ¯ P ¯1 ¯ For the first term, note that max |E [DN i DNj ]| = max ¯ N E [ANt ANs ]¯ i6=j i6=j ¯ t∈Λ ,s∈Λ ¯ Ni Ni 33 (22) ≤ maxC1 N1 i6=j P t∈ΛN i ,s∈ΛN i α (dts ) for some large C1 > 0, by the boundedness of AN t0 s and A6. Consider adjacent blocks for which dependence will be typically stronger, then, by A9 and 1/2 A6, the number of (t, s) combinations within distance d is bounded by C2 bN d2 for some √ C4 bN 1/2 P C2 > 0. Letting C3 = C2 C1 the expression is bounded by C3 max N1 bN d2 α (d) for some d=0 ¡ ¢ C4 > 0 and since by A9 d2 α (d) → 0 the expression is o N1 bN and thus max |E [DNi DNj ]| = i6=j ¡ ¢ ¡ ¢ o N1 bN or o a−1 (since n = a b ), and so is the first term in (22). N N N For the second term, first note that by the boundedness of ANt and E [AN t ] = 0: max j∈ΞN il ¡ ¡√ ¢¢ max max |E [ANt AN s ]| = O α bN (l − 1) uniformly in l. Therefore, the second term is t∈ΛN i s∈ΛN j √ C1PaN ¡√ ¢ |ΞN il | |ΛNi | |ΛNj | α bN (l − 1) ≤ i l=2 à ! √ √ C1PaN C1PaN ¡√ ¢ ¡ ¢ lα bN l = o N1 bN lα (l) = o a−1 C3 N1 b2N N . The second-to-last equality bounded by C2 max 1 N l=1 is due to √ C1PaN l=1 α (ts) = o (t2 ) as t → ∞ and the last one due to the boundedness of the sum α (s) lα (l). l=1 Finally, since max i (c) is demonstrated. P i6=j ¡ ¢ P |E [DNi DNj ]| is o a−1 and thus E [DNi DNj ] is o (1), condition N (d) Using the boundedness of ANt , ¯ ¯ ¯ ¯ P ¯ √1 ¯ ANt ¯ ≤ max |DNj | = max ¯ N j j ¯ ¯ t∈ΛN j construction of bN . i6=j √1 N |ΛNj | max |ANt | = Op t ³ √1 bN N ´ = op (1) by the d Since conditions (a)-(d) are satisfied under the current assumptions, Y0N → N (0, 1) ⇐⇒ d gN (θ0 ) → N (0, Ψ1 (θ0 )) which concludes the proof of proposition 2. QED Proof of Proposition 3 ³ ´ ³ ´ p ∂gN ˆθGMM p ∂g (θ ) 0 → by part (a) Ψ2N ˆθGMM → Ψ2 (θ0 ) follows from the fact that 0 0 ∂θ ∂θ p p of Lemma 1, by the consistency ³ ´ pof θGMM → θ0 ; and by A8 which ³ assumes ´ pMN → M To show that Ψ1N ˆθGMM → Ψ1 (θ0 ) we can show that Ψ1N ˆθGMM → Ψ1N (θ0 ) as we did in Lemma 1. Note that ¢ ¢ª © ¡ ¡ P Ψ1N (θ0 ) = N1 τ N i (θ0 ) τ Nj (θ0 ) zi zj0 Φ2 ψi (θ0 ) , ψ j (θ0 ) , ρNij (θ0 ) − Φ2 ψi (θ0 ) , ψ j (θ0 ) , 0 ij ¢ ¡ ∗ X (θ ) , ψ (θ ) , ρ (θ ) ψ ∂Φ 1 2 0 0 i j Nij = τ Ni (θ0 ) τ N j (θ0 ) zi zj0 ρNij (θ0 ) (23) N ij ∂ρ 34 where τ (·) is as defined in the proof of proposition 1, Φ2 stands for the bivariate normal distribution, and the second equality follows from the mean value theorem for θ∗ between ˆθGMM and θ0 . First consider the partial derivative term and show that it is bounded. Take the following bivariate normal distribution. ½ ¾ Zb Za 1 1 2 2 Φ2 (a, b, ρ) = (t − 2ρts + s ) dtds exp − 2 (1 − ρ2 ) 2Π (1 − ρ)1/2 −∞−∞ −1/2 and make the following change of variable from t to u = (t − ρs) (1 − ρ2 ) . Integrating over u and differentiating with respect to ρ yields: Zb n o ∂Φ(a,b,ρ) 1 2 −1/2 2 −3/2 √ −s (1 − ρ which, by the conditions on = ) + (a − ρs) ρ (1 − ρ ) ∂ρ 2Π −∞ ρ assumed in A12 - A14, is bounded. Repeating the application of the mean value theorem with respect to θ in (23), using the p boundedness that ˆθGMM → θ0 yields the result ³ of the ´ pabove partial derivative and ³ the ´fact p that Ψ1N ˆθGMM → Ψ1N (θ0 ) and thus Ψ1N ˆθGMM → Ψ1 (θ0 ). QED 35
© Copyright 2024