Review of Economic Studies (2001) 68, 543–572
© 2001 The Review of Economic Studies Limited
0034-6527/01/00230543$02.00

Estimation of Dynamic Panel Data Sample Selection Models

EKATERINI KYRIAZIDOU
University of California, Los Angeles

First version received December 1997; final version accepted June 2000 (Eds.)

This paper considers the problem of identification and estimation in panel data sample selection models with a binary selection rule, when the latent equations contain strictly exogenous variables, lags of the dependent variables, and unobserved individual effects. We derive a set of conditional moment restrictions which are then exploited to construct two-step GMM-type estimators for the parameters of the main equation. In the first step, the unknown parameters of the selection equation are consistently estimated. In the second step, these estimates are used to construct kernel weights in a manner such that the weight that any two-period individual observation receives in the estimation varies inversely with the relative magnitude of the sample selection effect in the two periods. Under appropriate assumptions, these “kernel-weighted” GMM estimators are consistent and asymptotically normal. The finite sample properties of the proposed estimators are investigated in a small Monte-Carlo study.

1. INTRODUCTION

Panel data are very useful in applied research. Not only do they allow researchers to study the intertemporal behaviour of individuals, they also enable them to control for the presence of unobserved permanent individual heterogeneity. To date there exists a large body of literature on panel data models with unobserved individual effects that enter additively in the (possibly latent) regression model (see for example Hsiao (1986), and Mátyás and Sevestre (1996)).
In recent years, considerable advances in the panel data literature have been made in the direction of dynamic linear models that allow for the presence of lags of the dependent variable and other predetermined variables (see for example Ahn and Schmidt (1995), Arellano and Bover (1995), and Blundell and Bond (1998)). These are reviewed in Arellano and Honoré (1999), who also describe results for dynamic non-linear panel data models (such as discrete choice and sample selection models). Much less is known, however, about models of this type. This paper aims to contribute in that direction. In particular, we consider the problem of identification and estimation in panel data sample selection models with a binary selection rule (Type 2 Tobit models in the terminology of Amemiya (1985)) when the latent equations contain strictly exogenous variables, lags of the dependent variables, and additive unobserved individual effects. The model under consideration has the form

$$y^*_{it} = \rho_0 y^*_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}, \tag{1}$$

$$y_{it} = d_{it}\, y^*_{it}, \tag{2}$$

$$d_{it} = 1\{\phi_0 d_{it-1} + w_{it}\gamma_0 + \eta_i - u_{it} \geq 0\}, \tag{3}$$

where $i = 1, \dots, n$ and $t = 1, \dots, T$. Throughout the paper, $n$ is considered to be large relative to $T$, which is the case most frequently encountered in practice. It is assumed that the sample starts at date $t = 0$, and that $d_{i0}$ and $y_{i0}$ (which may or may not be censored) are observed although the model is not specified for the initial period. As will become clear later, it is not necessary to assume that the other covariates are observed in the initial period of the sample.
In the model given by (1)–(3), $\rho_0, \phi_0 \in \Re$, $\beta_0 \in \Re^k$ and $\gamma_0 \in \Re^q$ are the unknown parameters of interest, $x^*_{it}$ and $w_{it}$ are vectors of strictly exogenous explanatory variables (with possibly common elements),¹ $\varepsilon^*_{it}$ and $u_{it}$ are unobserved disturbances (not necessarily independent of each other), and $\alpha^*_i$ and $\eta_i$ are unobservable time-invariant individual-specific effects that are possibly correlated with each other as well as with the errors, the regressors, and the initial observations. $y^*_{it} \in \Re$ is a latent variable whose observability depends on the outcome of the indicator variable $d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y^*_{it}, x^*_{it})$ is observed only if $d_{it} = 1$.² In other words, the “selection” variable $d_{it}$ determines whether the $it$-th observation in equation (1) is censored or not. Thus, the observed sample consists of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$, where $x_{it} \equiv d_{it} x^*_{it}$, from which we want to estimate $(\rho_0, \phi_0, \beta_0, \gamma_0)$. As there exist several results on identification and estimation of the discrete choice selection equation (3) (see Section 3), in this paper we will focus attention on the parameters $(\rho_0, \beta_0)$ of the continuous outcome equation (1).

A feature of the model that should be pointed out from the outset is that, although $x^*_{it}$ and $w_{it}$ may contain common variables, the two vectors do not coincide, which rules out the censored regression model (the Type 1 Tobit model) as a special case of the model considered in this paper. The reason is that our semiparametric identification scheme of the continuous outcome equation requires that the selection equation contains at least one variable that is not included in the outcome equation.
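To make the data structure in (1)–(3) concrete, the following is a minimal simulation sketch. It is not the author's code: the normal distributions, the correlation between the two errors and between the two individual effects, the scalar regressor, the ad hoc initial period, and the names `simulate_panel` and `w_excl` are all assumptions made for illustration. It does respect the exclusion restriction by giving the selection equation one variable that is absent from the outcome equation.

```python
import numpy as np

def simulate_panel(n, T, rho0, beta0, phi0, gamma0, seed=0):
    """Draw one panel shaped like model (1)-(3) with a scalar x (k = 1, q = 2).

    All distributional choices below are illustrative assumptions,
    not part of the model in the text.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)                      # selection effect eta_i
    alpha = 0.5 * eta + rng.normal(size=n)        # outcome effect, correlated with eta_i
    x = rng.normal(size=(n, T + 1))               # x*_it, also enters the selection index
    w_excl = rng.normal(size=(n, T + 1))          # excluded variable (selection only)
    u = rng.normal(size=(n, T + 1))               # selection error u_it
    eps = 0.5 * u + rng.normal(size=(n, T + 1))   # outcome error, correlated with u_it

    y_star = np.zeros((n, T + 1))
    d = np.zeros((n, T + 1), dtype=int)
    # Period 0: the model is not specified here, so this start is ad hoc.
    y_star[:, 0] = alpha + eps[:, 0]
    d[:, 0] = (eta - u[:, 0] >= 0).astype(int)

    for t in range(1, T + 1):
        # Equation (1): latent outcome with lagged latent dependent variable.
        y_star[:, t] = rho0 * y_star[:, t - 1] + beta0 * x[:, t] + alpha + eps[:, t]
        # Equation (3): selection with state dependence.
        index = phi0 * d[:, t - 1] + gamma0[0] * x[:, t] + gamma0[1] * w_excl[:, t] + eta
        d[:, t] = (index - u[:, t] >= 0).astype(int)

    y = d * y_star                                # equation (2)
    x_obs = d * x                                 # x_it = d_it * x*_it
    w = np.stack([x, w_excl], axis=-1)            # w_it is always observed
    return {"d": d, "w": w, "y": y, "x": x_obs, "y_star": y_star}
```

The returned dictionary mirrors the observed quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$; the latent `y_star` is included only so that the censoring can be checked.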
Such an exclusion restriction is standard in semiparametric Type 2 Tobit models.³

In this paper we derive conditions under which the parameters of the model (1)–(3) are identified and propose estimators that are consistent and asymptotically normal, without placing any restrictions on the parametric form of the distribution of any of the unobservables, or on the statistical relationship of the individual effects with the observed covariates and the initial conditions. In this sense we follow a semiparametric “fixed effects” approach. Arellano et al. (1997), and Bover and Arellano (1997) discuss estimation of certain dynamic panel data limited dependent variable models taking a parametric “random effects” approach.

The model under consideration may be relevant, for example, for estimating intertemporal labour supply responses to wage rate and non-labour income changes. Intertemporal substitution of labour has received a great deal of attention in the literature, having figured prominently in explanations of aggregate business cycle fluctuations (Kydland and Prescott (1982)). Dynamic models of labour supply of the form of (1) may arise when preferences are allowed to be non-separable over time (see, for example, Hotz et al. (1988), and Johnson and Pencavel (1984)), and have been found to yield intertemporal labour supply elasticities of substitution higher than models that assume intertemporal separability. Most of this literature, however, has only considered interior solutions, i.e. behaviour only at the intensive (hours worked) margin. Another strand of the literature (see Heckman (1993) for a recent survey) has stressed the importance of considering behaviour at the extensive (participation) margin as well, especially for women, which gives rise to Tobit-type models of labour supply. When looking at the participation decision alone, models of discrete choice that incorporate state dependence of the form of (3) have been used to account for the presence of human capital accumulation (Heckman (1981)) or search costs (Eckstein and Wolpin (1990), and Hyslop (1999)). Finally, a set of papers (Cogan (1981), Hanoch (1980), Hausman (1980)) has found considerable evidence of fixed costs associated with working, implying that a Type 2 Tobit specification may be more appropriate for analysing labour supply. The model considered in this paper incorporates the salient features of all these strands of literature.

However, it may be difficult to derive the model (1)–(3) directly from a structural dynamic utility maximization problem. The reason is that typically the model would introduce the lagged $y_{it}$ (or $y^*_{it}$) in the selection equation. This would then violate the strict exogeneity assumption on the selection covariates $w_{it}$ required by the proposed identification approach, which conditions on current, past, and future values of the explanatory variables in the selection equation. As we discuss in Section 2, although in principle it is possible to allow for the lagged selection variable in the main equation, its coefficient is not identified.

1. As we discuss in Section 2, although it may be possible to dispense with the strict exogeneity assumption on $x^*_{it}$, this is not possible for $w_{it}$.

2. As we discuss in Section 2.2, the case where some or all of the variables in $x^*_{it}$ are always observed can be easily handled.

3. A recent paper (Chen (1996)) shows that in the cross-sectional version of the model, it is possible to replace the exclusion restriction with a symmetry assumption on the joint distribution of the errors. This result does not seem to carry over to the panel data case.

The paper is organized as follows. Section 2 obtains a set of moment restrictions for the model (1)–(3).
To facilitate comparisons with the existing literature, in that section we discuss the assumptions and the moment conditions used in the estimation of dynamic linear panel data models of the form of equation (1). In that section we also discuss a variation of the model in which the continuous outcome equation contains the lagged censored endogenous variable instead of the lagged latent one. Section 3 presents the proposed estimators for the dynamic panel data sample selection model and discusses practical issues such as estimation of the discrete choice selection equation. Section 4 presents the results of a small Monte Carlo experiment for the case where the main equation follows a pure first-order autoregression with an individual-specific drift and the selection equation contains only exogenous regressors. The Appendix presents conditions under which the proposed estimators are consistent and asymptotically normal, and states the formal results.⁴

2. MOMENT RESTRICTIONS IN DYNAMIC PANEL DATA SAMPLE SELECTION MODELS

The model (1)–(3) is an extension of the model considered in Kyriazidou (1997) which only allowed for strictly exogenous variables in (1) and (3) (i.e. imposed $\rho_0 = \phi_0 = 0$). The idea for identifying $\beta_0$ in that paper relies on the conditional pairwise exchangeability of the error vector $(\varepsilon^*_{it}, u_{it})$ given the entire path of the exogenous covariates and the individual effects. Identification of $\beta_0$ is based on the observation that, for an individual who is observed in two time periods, say $t$ and $t-1$ (i.e. who has $d_{it} = d_{it-1} = 1$), the magnitude of the sample selection bias in the two time periods is the same if $\Delta w_{it}\gamma_0 = 0$.⁵ (Here, and throughout, $\Delta$ denotes first-differences.) This implies that time-differencing the main equation (1) eliminates not only the individual effect but also the effect of sample selection. However, the consistency of the estimator above breaks down in the presence of the lagged dependent variable in (1).
The reason is the same as in linear dynamic panel data models, where standard estimators such as the first-difference and the within estimators are inconsistent because of the non-zero correlation of $y^*_{it-1}$ with the transformed (first-differenced or in deviation from its time mean) error term.

4. All proofs are contained in an additional Appendix available at the author’s web site, http://www.econ.ucla.edu/kyria. At the same site there is also available a preliminary GAUSS code that implements the proposed estimator.

5. The idea of using pairwise comparisons to eliminate the sample selection effect was first proposed by Powell (1987) for the cross-sectional version of the model.

As we review in this section, in the absence of sample selection, estimation of $\rho_0$ and $\beta_0$ in (1) is based on linear and nonlinear moment conditions, implied by assumptions on the serial correlation structure of the time-varying unobservables, on the correlation structure of the unobservables with the observable covariates, and/or on assumptions on the initial conditions. We show that similar moment conditions may be also obtained in the presence of sample selection under appropriate assumptions on the errors of the model. In particular, we will assume here that $(\varepsilon^*_{it}, u_{it})$ is independent and identically distributed over time conditional on the exogenous variables, the individual effects and the initial observations.

Similar to Kyriazidou (1997), identification in the dynamic model (1)–(3) relies on conditioning on the event $\Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0$. Clearly, this condition collapses to $\Delta w_{it}\gamma_0 = 0$ if $d_{it-2} = 1$. Thus, for an individual who is observed for three (say, consecutive) periods, taking time-differences eliminates not only the individual effect but also the effect of sample selection from the main equation, provided that the “selection index”, $w_{it}\gamma_0$, remains constant.
The key intuition is that in this case, all conditional moments of the main equation errors are constant due to the assumed stationarity of the error vector and are therefore eliminated by time-differencing. We proceed by considering successively four cases that build towards the general model (1)–(3).

2.1. Case 1: $\rho_0 \neq 0$, $\beta_0 = 0$, $\phi_0 = 0$

We will first consider the case where all the selection covariates are strictly exogenous with respect to $(\varepsilon^*_{it}, u_{it})$ and the main equation follows a first-order purely autoregressive model. As will become obvious, more than one lag of the dependent variable $y^*_{it}$ may be handled in a straightforward manner. The model has the form

$$y^*_{it} = \rho_0 y^*_{it-1} + \alpha^*_i + \varepsilon^*_{it}, \tag{1′}$$

$$y_{it} = d_{it}\, y^*_{it}, \tag{2}$$

$$d_{it} = 1\{w_{it}\gamma_0 + \eta_i - u_{it} \geq 0\}. \tag{3′}$$

In the absence of sample selection, i.e. when $y_{it} = y^*_{it}$ for all $i$ and for all $t$, estimation of $\rho_0$ in (1′) is typically based on the following assumptions:

SA1: $E(\varepsilon^*_{it} y^*_{i0})$ is the same for all $t$ for each $i$.

SA2: $E(\varepsilon^*_{it} \alpha^*_i)$ is the same for all $t$ for each $i$.

SA3: $E(\varepsilon^*_{it} \varepsilon^*_{is})$ is the same for all $t \neq s$ for each $i$.

Typically, the moments above are assumed to be zero (see for example Ahn and Schmidt (1995) and Blundell and Bond (1998)). Assuming that $y^*_{i0}$ is observed, Assumptions SA1–SA3 imply the following orthogonality conditions

$$E(y^*_{it-j}\, \Delta\varepsilon^*_{it}) = 0, \quad t = 2, \dots, T;\ j = 2, \dots, t, \tag{4}$$

$$E((\alpha^*_i + \varepsilon^*_{iT})\, \Delta\varepsilon^*_{it}) = 0, \quad t = 2, \dots, T-1, \tag{5}$$

(compare to equations (3) and (4) of Ahn and Schmidt (1995)). Note that (4) implies $T(T-1)/2$ zero moment restrictions that are linear in $\rho_0$, while (5) implies $T-2$ nonlinear restrictions. As Ahn and Schmidt show, estimation of $\rho_0$ by GMM that uses the sample analogues of (4) and (5) produces an estimator that is efficient within the class of all estimators that exploit all the moment conditions.
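As a small illustration of how the linear restrictions (4) pin down $\rho_0$ when there is no sample selection, the sketch below uses only the $j = 2$ instrument, pooled over $t$, which yields a just-identified instrumental-variables estimator rather than the efficient GMM estimator discussed above. The data-generating choices (normal errors, stationary start, the value of `sigma_alpha`) and the function names are assumptions for illustration.

```python
import numpy as np

def simulate_ar1(n, T, rho0, sigma_alpha=0.5, seed=0):
    """Uncensored AR(1) panel with individual drift, columns t = 0, ..., T."""
    rng = np.random.default_rng(seed)
    alpha = sigma_alpha * rng.normal(size=n)
    y = np.empty((n, T + 1))
    # Assumed start: stationary distribution of y given alpha.
    y[:, 0] = alpha / (1.0 - rho0) + rng.normal(size=n) / np.sqrt(1.0 - rho0**2)
    for t in range(1, T + 1):
        y[:, t] = rho0 * y[:, t - 1] + alpha + rng.normal(size=n)
    return y

def iv_rho(y):
    """Solve the pooled sample analogue of E[y_{t-2} (dy_t - rho * dy_{t-1})] = 0."""
    dy = np.diff(y, axis=1)             # dy[:, k] = y_{k+1} - y_k
    inst = y[:, :-2]                    # y_{t-2} for t = 2, ..., T
    num = np.sum(inst * dy[:, 1:])      # pairs y_{t-2} with dy_t
    den = np.sum(inst * dy[:, :-1])     # pairs y_{t-2} with dy_{t-1}
    return num / den
```

For example, `iv_rho(simulate_ar1(20000, 4, 0.5))` should be close to 0.5 in large samples, whereas a least-squares regression of $\Delta y_{it}$ on $\Delta y_{it-1}$ would not be, for the reason given in the text.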
In addition to SA1–SA3, it is also often assumed that the time-varying errors in (1′) are homoskedastic over time, in the sense that:

SA4: $E(\varepsilon^{*2}_{it}) = E(\varepsilon^{*2}_{is})$, for all $i$ and for all $t, s$,

which implies the additional $T-1$ nonlinear moment restrictions⁶

$$E((\alpha^*_i + \varepsilon^*_{it})^2 - (\alpha^*_i + \varepsilon^*_{it-1})^2) = 0, \quad t = 2, \dots, T. \tag{6}$$

We now turn to examine whether conditions similar to (4)–(6) hold for the sample selection model (1′)–(3′). We will make the following assumption:

A1: $\{(\varepsilon^*_{it}, u_{it})\}_{t=1}^{T}$ is i.i.d. over time for all $i$ conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$ where $w_i \equiv (w_{i1}, \dots, w_{iT})$.

The strict stationarity assumption on $(\varepsilon^*_{it}, u_{it})$ is stronger than the second moment assumptions SA1–SA4. In fact, it implies SA1–SA4. It is however typical in nonlinear semiparametric panel data models (see for example Manski (1987), and Honoré (1992, 1993)). The conditional serial independence is also a strong assumption, stronger than both the serial uncorrelatedness usually assumed in linear dynamic panel data models and the conditional pairwise exchangeability often assumed in “static” nonlinear panel data models.⁷

For the sample selection model (1′)–(3′), the analogue of the left-hand side of (4) is

$$E(d_{it} d_{it-1} d_{it-2} d_{it-j}\, y_{it-j} (\Delta y_{it} - \rho_0 \Delta y_{it-1})) = E(d_{it} d_{it-1} d_{it-2} d_{it-j}\, y^*_{it-j}\, \Delta\varepsilon^*_{it}),$$

which in general will not be zero due to the contemporaneous correlation of $\varepsilon^*_{it}$ and $u_{it}$. This may be seen from

$$\begin{aligned}
E(y^*_{it-j}\, \Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i)
&= \rho_0^{t-j}\, y^*_{i0}\, E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) \\
&\quad + \sum_{l=0}^{t-j-1} \rho_0^{l}\, \alpha^*_i\, E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) \\
&\quad + \sum_{l=0}^{t-j-1} \rho_0^{l}\, E(\varepsilon^*_{it-j-l}\, \Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i),
\end{aligned}$$

where due to the conditional independence of $(\varepsilon^*_{it}, u_{it})$ over time,

$$E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) = E(\varepsilon^*_{it} \mid d_{it} = 1, \zeta_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \zeta_i),$$

while for all $j = 2, \dots, t$ and for all $l = 0, \dots, t-j-1$,

$$E(\varepsilon^*_{it-j-l}\, \Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) = E(\varepsilon^*_{it-j-l} \mid d_{it-j-l} = 1, \zeta_i)\, [E(\varepsilon^*_{it} \mid d_{it} = 1, \zeta_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \zeta_i)].$$

6. As Ahn and Schmidt discuss, with the addition of SA4, (5) and (6) may be expressed as
$$E(y^*_{it}\, \Delta\varepsilon^*_{it+1} - y^*_{it+1}\, \Delta\varepsilon^*_{it+2}) = 0, \quad t = 1, \dots, T-2,$$
$$E\Big(\Delta\varepsilon^*_{it+1}\, \frac{1}{T} \sum_{s=1}^{T} (\alpha^*_i + \varepsilon^*_{is})\Big) = 0, \quad t = 1, \dots, T-1,$$
so that the number of linear in $\rho_0$ restrictions is maximized.

7. For a discussion of the role of conditional pairwise exchangeability in “static” panel data Tobit-type models, see Honoré and Kyriazidou (2000).

Obviously in general, the “selection correction term” or “sample selection effect”

$$\Lambda_{1it} \equiv E(\varepsilon^*_{it} \mid d_{it} = 1, \zeta_i) = E(\varepsilon^*_{it} \mid u_{it} \leq w_{it}\gamma_0 + \eta_i, \zeta_i),$$

will not be zero even if $E(\varepsilon^*_{it} \mid \zeta_i) = 0$ for all $t$. Furthermore, $\Lambda_{1it} \neq \Lambda_{1it-1}$ since $u_{it}$ and $u_{it-1}$ are truncated at different thresholds, $w_{it}\gamma_0 + \eta_i$ and $w_{it-1}\gamma_0 + \eta_i$, respectively, which vary over time due to the time-variation of the scalar “selection index” $w_{it}\gamma_0$. However, the stationarity of $(\varepsilon^*_{it}, u_{it})$ over time implies that the functional form of $\Lambda_{1it}$ and $\Lambda_{1it-1}$ is time-invariant, i.e. $\Lambda_{1it} = \Lambda_1(w_{it}\gamma_0 + \eta_i, \zeta_i)$, although in principle the functional form of $\Lambda_1$ may vary over individuals. This implies that if for an individual $i$ the selection index $w_{it}\gamma_0$ is constant in periods $t$ and $t-1$, the magnitude of the sample selection effects in the two periods will also be the same, i.e.

$$w_{it}\gamma_0 = w_{it-1}\gamma_0 \;\Rightarrow\; w_{it}\gamma_0 + \eta_i = w_{it-1}\gamma_0 + \eta_i \;\Rightarrow\; \Lambda_{1it} = \Lambda_{1it-1}.$$

In other words,

$$E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \Delta w_{it}\gamma_0 = 0, \zeta_i) = 0.$$

Hence we obtain the following moment restriction for the sample selection model which is analogous to (4)⁸

$$E(d_{it} d_{it-1} d_{it-2} d_{it-j}\, y^*_{it-j}\, \Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad t = 2, \dots, T;\ j = 2, \dots, t. \tag{7}$$

In fact, A1 implies an infinite number of moment restrictions since any measurable function of the lagged $y^*$’s that are not censored may be substituted for $y^*_{it-j}$ in (7).
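To see why a constant selection index removes the selection effect, it may help to work through one parametric special case. This is an illustration only: the paper places no parametric restriction on the errors. Assume, purely for this example, that conditional on $\zeta_i$ the pair $(\varepsilon^*_{it}, u_{it})$ is bivariate normal with zero means, $\mathrm{Var}(u_{it}) = 1$ and $\mathrm{Cov}(\varepsilon^*_{it}, u_{it}) = \sigma_{\varepsilon u}$, and write $\varphi$ and $\Phi$ for the standard normal density and distribution function (the symbol $\varphi$ is used to avoid confusion with the parameter $\phi_0$).

```latex
% Illustrative special case (bivariate normal errors are an assumption of this example only).
\begin{align*}
\Lambda_{1it}
  &= E(\varepsilon^*_{it} \mid u_{it} \le c_{it}, \zeta_i)
   = \sigma_{\varepsilon u}\, E(u_{it} \mid u_{it} \le c_{it})
   = -\sigma_{\varepsilon u}\, \frac{\varphi(c_{it})}{\Phi(c_{it})},
  \qquad c_{it} \equiv w_{it}\gamma_0 + \eta_i, \\[4pt]
E(\Delta\varepsilon^*_{it} \mid d_{it} = d_{it-1} = 1, \zeta_i)
  &= -\sigma_{\varepsilon u}
     \left[ \frac{\varphi(c_{it})}{\Phi(c_{it})}
          - \frac{\varphi(c_{it-1})}{\Phi(c_{it-1})} \right].
\end{align*}
```

In this special case the bracket is zero whenever $c_{it} = c_{it-1}$, that is, whenever $\Delta w_{it}\gamma_0 = 0$, whatever the value of the unobserved $\eta_i$. The argument in the text reaches the same conclusion without the normality assumption, using only the time-invariance of the function $\Lambda_1$.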
Using the same reasoning as above, it is straightforward to obtain the following moment restrictions which are analogous to (5) and (6), respectively⁹

$$E(d_{iT} d_{iT-1} d_{it} d_{it-1} d_{it-2}\, (\alpha^*_i + \varepsilon^*_{iT})\, \Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad t = 2, \dots, T-1, \tag{8}$$

$$E(d_{it} d_{it-1} d_{it-2}\, ((\alpha^*_i + \varepsilon^*_{it})^2 - (\alpha^*_i + \varepsilon^*_{it-1})^2) \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad t = 2, \dots, T. \tag{9}$$

In Appendix A we discuss another assumption suggested by Arellano and Bover (1995) which implies mean stationarity of the $y^*_{it}$ process and which has been proven quite useful for the identification of $\rho_0$ in linear dynamic models in cases where the series shows high persistence. In this Appendix we also discuss possible ways for extending this identification argument in the presence of sample selection.

8. The same reasoning may be used to construct moments that are based on taking time-differences over non-consecutive periods. In this case the moment condition becomes
$$E(d_{it} d_{it-1} d_{is} d_{is-1} d_{is-j}\, y^*_{is-j}\, (\varepsilon^*_{it} - \varepsilon^*_{is}) \mid (w_{it} - w_{is})\gamma_0 = 0) = 0, \quad t = 2, \dots, T;\ s < t;\ j = 2, \dots, s. \tag{7′}$$
The moment restriction (7′) may be useful in practice in situations where for example participation spells are few and far apart. I thank a co-editor for pointing this out.

9. It is not difficult to see that we could in principle exploit the strict stationarity assumption on the errors to construct an infinite number of moment conditions that use higher moments of $\varepsilon^*_{it}$ (provided of course that those exist).

Finally, it is of interest to consider here the case where the main equation contains a possibly time-varying intercept, i.e. it is of the form

$$y^*_{it} = \delta_{0t} + \rho_0 y^*_{it-1} + \alpha^*_i + \varepsilon^*_{it}.$$

By allowing for a time-varying intercept in the model we are allowing for the presence of aggregate shocks that are common to all individuals and hence do not have zero cross-sectional mean. We can now without loss of generality assume that

SA0: $E(\alpha^*_i) = E(\varepsilon^*_{it}) = 0$, for all $i$ and for all $t$.
In the absence of sample selection and under assumptions SA0–SA4, we obtain $T$ additional moment restrictions of the form

$$E(\alpha^*_i + \varepsilon^*_{it}) = 0, \quad t = 1, \dots, T, \tag{10}$$

which identify $\{\delta_{0t}\}_{t=1}^{T}$. It is clear that in the presence of sample selection the analogue of (10) will in general not hold for any $t$. That is, in general,

$$E(d_{it} d_{it-1} (y_{it} - \rho_0 y_{it-1} - \delta_{0t})) = E(d_{it} d_{it-1} (\alpha^*_i + \varepsilon^*_{it})) \neq 0.$$

Thus, our scheme only identifies $\Delta\delta_{0t}$ through (7), (8) and (9). This implies that time-invariant intercepts are not identified by our approach. It also implies that we cannot identify (time-invariant) coefficients of either the current discrete choice, $d_{it}$, or the lagged discrete choice, $d_{it-1}$, if either or both enter the main equation, since (7), (8) and (9) use only observations for which $d_{it} = d_{it-1} = 1$.

2.2. Case 2: $\rho_0 \neq 0$, $\beta_0 \neq 0$, $\phi_0 = 0$

We will now consider the case where the main equation also contains other regressors besides lags of the dependent variable, while the selection covariates are still strictly exogenous. The model has the form

$$y^*_{it} = \rho_0 y^*_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}, \tag{1}$$

$$y_{it} = d_{it}\, y^*_{it}, \tag{2}$$

$$d_{it} = 1\{w_{it}\gamma_0 + \eta_i - u_{it} \geq 0\}. \tag{3′}$$

In the absence of sample selection, i.e. for the linear model of equation (1), it is often assumed that the regressors $x^*_{it}$ are uncorrelated with the time-varying errors $\varepsilon^*_{it}$ at all leads and lags, i.e.¹⁰

SA6: $E(x^*_{is}\varepsilon^*_{it}) = 0$, for all $i$ and for all $t, s$.

This assumption yields an additional $T(T-1)$ moment restrictions which are linear in $\rho_0$ and $\beta_0$

$$E(x^*_{is}\, \Delta\varepsilon^*_{it}) = 0, \quad t = 2, \dots, T;\ s = 1, \dots, T. \tag{11}$$

If instead it is assumed that $x^*_{it}$ is only predetermined with respect to $\varepsilon^*_{it}$, in the sense that:

SA6′: $E(x^*_{is}\varepsilon^*_{it}) = 0$, for all $i$ and for all $s \leq t$,

we only obtain an additional $T(T-1)/2$ moment restrictions

$$E(x^*_{is}\, \Delta\varepsilon^*_{it}) = 0, \quad t = 2, \dots, T;\ s = 1, \dots, t-1. \tag{11′}$$

It is clear that the moment restrictions (11) (or (11′)) will also hold if, for each $i$ and for each $s$, $E(x^*_{is}\varepsilon^*_{it})$ is constant for all $t$ (or for all $t \geq s$), say $\gamma_s$, but not necessarily equal to zero.

10. Note that the presence of aggregate shocks with non-zero cross-sectional mean may be captured by including a time dummy in $x^*_{it}$. This is why we are not including a time-varying intercept in (1). Thus, we may without loss of generality assume that $E(\varepsilon^*_{it}) = 0$ for all $i$ and $t$, so that $\mathrm{Cov}(x^*_{is}, \varepsilon^*_{it}) = E(x^*_{is}\varepsilon^*_{it})$.

For the sample selection model (1)–(3′), we will focus on the case where $x^*_{it}$ is strictly exogenous with respect to $(\varepsilon^*_{it}, u_{it})$ in the sense that:

A1′: $(\varepsilon^*_{it}, u_{it})$ is i.i.d. over time for all $i$ conditional on $\tilde{\zeta}_i \equiv (w_i, x^*_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$ where $w_i \equiv (w_{i1}, \dots, w_{iT})$ and $x^*_i \equiv (x^*_{i1}, \dots, x^*_{iT})$.

Note that A1′ implies that $E(x^*_{is}\varepsilon^*_{it}) = E(x^*_{is} E(\varepsilon^*_{it} \mid x^*_i)) = \gamma_s$, i.e. it is constant for all $t$ which is sufficient, as noted above, for (11) to hold in the linear model. It is not difficult to see that A1′ leads to the following $T(T-1)$ conditions that are analogous to (11)

$$E(d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is}\, \Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad t = 2, \dots, T;\ s = 1, \dots, T. \tag{12}$$

Notice however that A1′ in fact implies an infinite number of moment restrictions, namely any measurable function of any subset of $x^*_i$ that are not censored will satisfy (12).¹¹ It is useful here to point out the similarity of (12) to the moment restriction (7), which uses uncensored lags of the dependent variable as instruments. In the case where $x^*_{it}$ is not subject to censoring, that is when $x_{it} \equiv x^*_{it}$ for all $i$ and $t$, the moment condition becomes

$$E(d_{it} d_{it-1} d_{it-2}\, x^*_{is}\, \Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad t = 2, \dots, T;\ s = 1, \dots, T. \tag{12′}$$

In Appendix B we examine another set of moment restrictions that have been proposed for linear dynamic panel data models and discuss conditions under which it is possible to derive analogues to those in the presence of sample selection.

11. Suppose that instead of A1′ the following assumption holds:

A2: $(\varepsilon^*_{it}, u_{it})$ is independent of $x^*_{is}$ for all $s$ and $t$ and for all $i$ conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$.

Then, it is clear that under A2, (12) (or (12′)) still holds. If A2 is replaced with:

A2′: $(\varepsilon^*_{it}, u_{it})$ is independent of $x^*_{is}$ for all $s \leq t$ and for all $i$ conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$

then we obtain an analogue to (11′):
$$E(d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is}\, \Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad t = 2, \dots, T;\ s = 1, \dots, t-2.$$
One case where Assumption A2′ may be relevant is when $y^*_{it}$ comes from a VAR and $x^*_{it}$ is a lag of another endogenous variable. Suppose for example that $x^*_{it} \equiv z^*_{it-1}$, where $z^*_{it} = \delta z^*_{it-1} + \alpha^*_i + v^*_{it}$, with $v^*_{it}$ potentially correlated with $(\varepsilon^*_{it}, u_{it})$, but independent of $(\varepsilon^*_{is}, u_{is})$ for all $t \neq s$ conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0}, z^*_{i0})$.

2.3. Case 3: $\rho_0 = 0$, $\beta_0 \neq 0$, $\phi_0 \neq 0$

We will now consider the case where the main equation contains only strictly exogenous regressors, while the selection equation has the dependent variable lagged once as an explanatory variable. The model has the form

$$y^*_{it} = x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}, \tag{1″}$$

$$y_{it} = d_{it}\, y^*_{it}, \tag{2}$$

$$d_{it} = 1\{\phi_0 d_{it-1} + w_{it}\gamma_0 + \eta_i - u_{it} \geq 0\}, \tag{3}$$

and we assume that A1′ holds. It is straightforward to see that for all $s = 1, \dots, T$

$$E(\Delta\varepsilon^*_{it} \mid d_{is} d_{it} d_{it-1} = 1, d_{it-2}, \tilde{\zeta}_i) = E(\varepsilon^*_{it} \mid u_{it} \leq w_{it}\gamma_0 + \eta_i + \phi_0, \tilde{\zeta}_i) - E(\varepsilon^*_{it-1} \mid u_{it-1} \leq w_{it-1}\gamma_0 + \eta_i + \phi_0 d_{it-2}, \tilde{\zeta}_i)$$

will be zero conditional on $\Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0$. Hence we obtain the following set of moment restrictions

$$E(d_{is} d_{it} d_{it-1}\, x^*_{is}\, \Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0) = 0, \quad t = 2, \dots, T;\ s = 1, \dots, T. \tag{13}$$

The moment condition above looks more different from the moment condition (12), obtained when there are no dynamics in the selection equation, than it really is. Note that for observations for which $d_{it-2} = 1$, the conditioning set in (13) becomes the same as in (12), namely $\Delta w_{it}\gamma_0 = 0$. Thus, a valid subset of moment restrictions for the model (1″)–(3) is

$$E(d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is}\, \Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad t = 2, \dots, T;\ s = 1, \dots, T,$$

which is the same set of moments as in (12).

2.4. Case 4: $\rho_0 \neq 0$, $\beta_0 \neq 0$, $\phi_0 \neq 0$

As we saw above, in the presence of dynamics only in the selection equation the conditioning set becomes $\Delta w_{it}\gamma_0 = 0$ when $d_{it-2} = 1$. On the other hand, when dynamics are present only in the main equation, the same conditions are imposed (either explicitly in the case of $\Delta w_{it}\gamma_0 = 0$, or implicitly for $d_{it-2} = 1$, by multiplication with $d_{it-2}$) for the moments in (7), (8), (9) and (12) to be non-trivially equal to zero. It is therefore clear that (7), (8), (9) and (12) continue to hold for the general model described by equations (1)–(3) that allows for the presence of dynamics in both the selection and the main equation.

Note that in all four cases the conditioning set could be written as $\Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0$, as in Case 3. In Cases 1 and 2, $\phi_0$ is identically equal to zero, while in Case 4, $d_{it-2} = 1$ is implicitly imposed. In the estimation section below we will focus on the general model (Case 4) and will therefore condition on $\Delta w_{it}\gamma_0 = 0$.

2.5. A variation of the main equation

In this section we consider a variation of the model where the main equation contains a lag of the censored dependent variable instead of the latent one. Specifically, the main equation is given by

$$y^*_{it} = \rho_0 y_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it} = \rho_0 d_{it-1} y^*_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}.$$

The question then is whether the moment conditions (7), (8), (9) and (12) continue to hold in this case as well.
It is clear that the only case where this modification of the main equation may play a role is in the derivation of (7). This moment condition, which uses (uncensored) lags of the dependent variable as instruments for the equation in first differences, was derived by backward substitution of $y^*_{it-j}$ ($j \geq 2$). With the modification of the main equation above, this backward substitution remains valid back to the time period $t-j-s$ ($s = 1, \dots, t-j$) in which the first zero is observed. Let $D_{itjs} \equiv d_{it} d_{it-1} d_{it-2} d_{it-j} d_{it-j-1} \cdots d_{it-j-s+1}$. Note that, for any $s = 1, \dots, t-j$,

$$\begin{aligned}
E(y^*_{it-j}\, \Delta\varepsilon^*_{it} \mid D_{itjs} = 1, d_{it-j-s} = 0, \tilde{\zeta}_i)
&= E\Big(\Big(\rho_0^{s-1} y^*_{it-j-s+1} + \sum_{l=0}^{s-2} \rho_0^{l} (x^*_{it-j-l}\beta_0 + \alpha^*_i + \varepsilon^*_{it-j-l})\Big) \Delta\varepsilon^*_{it} \,\Big|\, D_{itjs} = 1, d_{it-j-s} = 0, \tilde{\zeta}_i\Big) \\
&= E\Big(\Big(\sum_{l=0}^{s-1} \rho_0^{l} (x^*_{it-j-l}\beta_0 + \alpha^*_i + \varepsilon^*_{it-j-l})\Big) \Delta\varepsilon^*_{it} \,\Big|\, D_{itjs} = 1, d_{it-j-s} = 0, \tilde{\zeta}_i\Big),
\end{aligned}$$

since $y^*_{it-j-s+1} = x^*_{it-j-s+1}\beta_0 + \alpha^*_i + \varepsilon^*_{it-j-s+1}$ due to the fact that $d_{it-j-s} = 0$. Following the analysis of Section 2.1,

$$E(\Delta\varepsilon^*_{it} \mid D_{itjs} = 1, d_{it-j-s} = 0, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) = E(\varepsilon^*_{it} \mid d_{it} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) = 0,$$

and for all $l = 0, \dots, s-1$,

$$\begin{aligned}
E(\varepsilon^*_{it-j-l}\, \Delta\varepsilon^*_{it} \mid D_{itjs} = 1, d_{it-j-s} = 0, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i)
&= E(\varepsilon^*_{it-j-l} \mid d_{it-j-l} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) \\
&\quad \times [E(\varepsilon^*_{it} \mid d_{it} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i)] = 0.
\end{aligned}$$

These last two equations, which hold for any $s = 1, \dots, t-j$ ($j \geq 2$), imply that the moment condition (7) will continue to hold in the modified model as well.

3. ESTIMATION OF DYNAMIC PANEL DATA SAMPLE SELECTION MODELS

First we will define some notation. Let $\theta_0 \equiv (\rho_0, \beta_0')'$ denote the true parameter vector that belongs to a subset $\Theta$ of $\Re^{k+1}$, and $z^*_{it} \equiv (y^*_{it-1}, x^*_{it})$. Define the following functions

$$m_{1it,j}(\theta) \equiv d_{it} d_{it-1} d_{it-2} d_{it-j}\, y^*_{it-j}\, (\Delta y^*_{it} - \Delta z^*_{it}\theta), \quad t = 2, \dots, T;\ j = 2, \dots, t,$$

$$m_{2it,\kappa}(\theta) \equiv d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is,\kappa}\, (\Delta y^*_{it} - \Delta z^*_{it}\theta), \quad t = 2, \dots, T;\ s = 1, \dots, T;\ \kappa = 1, \dots, k,$$

$$m_{3it}(\theta) \equiv d_{iT} d_{iT-1} d_{it} d_{it-1} d_{it-2}\, (y^*_{iT} - z^*_{iT}\theta)\, (\Delta y^*_{it} - \Delta z^*_{it}\theta), \quad t = 2, \dots, T-1,$$

$$m_{4it}(\theta) \equiv d_{it} d_{it-1} d_{it-2}\, ((y^*_{it} - z^*_{it}\theta)^2 - (y^*_{it-1} - z^*_{it-1}\theta)^2), \quad t = 2, \dots, T,$$

where $\theta \in \Theta$. Also, define the $((t-1) \times 1)$ vector-valued function $m_{1it}(\theta) \equiv (m_{1it,2}(\theta), \dots, m_{1it,t}(\theta))'$ and the $(k \times 1)$ vector-valued function $m_{2it}(\theta) \equiv (m_{2it,1}(\theta), \dots, m_{2it,k}(\theta))'$. As we have seen in the previous section, Assumption A1′ implies the following orthogonality conditions for the general sample selection model (1)–(3)

$$E(m_{lit}(\theta_0) \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad l = 1, \dots, 4. \tag{14}$$

If $\gamma_0$ is known, and if all variables in $w_{it}$ are discrete and $\Pr(\Delta w_{it}\gamma_0 = 0) > 0$, a natural way of estimating $\theta_0$ is to minimize the distance, according to some metric, of the sample analogues of the moments above from zero, i.e. to apply GMM using

$$\frac{1}{n} \sum_{i=1}^{n} 1\{\Delta w_{it}\gamma_0 = 0\}\, m_{lit}(\theta), \quad l = 1, \dots, 4.$$

Obviously this estimation scheme will fail if one or more of the selection covariates $w_{it}$ are continuously distributed. Furthermore, $\gamma_0$ will generally be unknown. However, if the functions $\Lambda_1$ and $\Lambda_2$ (see Assumption N3 in Appendix C) are sufficiently “smooth”, then a small value of $\Delta w_{it}\gamma$ for some $\gamma \in \Re^q$ will imply that $\Delta\Lambda_{1it} = \Lambda_1(w_{it}\gamma, \tilde{\zeta}_i) - \Lambda_1(w_{it-1}\gamma, \tilde{\zeta}_i)$ and $\Delta\Lambda_{2it} = \Lambda_2(w_{it}\gamma, \tilde{\zeta}_i) - \Lambda_2(w_{it-1}\gamma, \tilde{\zeta}_i)$ are also small, so that the moment conditions above will be satisfied approximately. The idea is then to replace the indicator function in the sample moments above with a weight that depends on the magnitude of $\Delta w_{it}\hat{\gamma}_n$, where $\hat{\gamma}_n$ is an initial consistent estimate of $\gamma_0$. The sample averages will then converge to their population analogues provided that the weights decline to zero for observations with $\Delta w_{it}\gamma_0 \neq 0$ as sample size increases. We choose kernel weights of the form

$$\frac{1}{h_n} K\!\left(\frac{\Delta w_{it}\hat{\gamma}_n}{h_n}\right),$$

where $h_n$ is a bandwidth that shrinks to zero as $n \to \infty$, while $K(\cdot)$ is a kernel “density” function.
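The kernel weighting just described can be sketched in a few lines. This is a minimal illustration, not the author's GAUSS implementation: the Gaussian kernel is an assumed choice (the formal conditions on $K$ are in the Appendix), and the function names and array layout are hypothetical.

```python
import numpy as np

def gaussian_kernel(v):
    """Standard normal density, used here as an assumed choice of K."""
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)

def kernel_weights(dw, gamma_hat, h, kernel=gaussian_kernel):
    """Weights (1/h) K(dw @ gamma_hat / h) for one period pair (t, t-1).

    dw        : (n, q) array of first differences of the selection covariates
    gamma_hat : (q,) first-step estimate of the selection coefficients
    h         : bandwidth
    """
    index_change = dw @ gamma_hat
    return kernel(index_change / h) / h

def weighted_moment(weights, m):
    """Kernel-weighted sample moment (1/n) * sum_i weights_i * m_i."""
    m = np.asarray(m, dtype=float).reshape(len(weights), -1)
    return (weights[:, None] * m).mean(axis=0)
```

Observations whose estimated selection index barely changes between the two periods receive the largest weight, and as `h` shrinks the weights concentrate on those observations, mimicking the indicator $1\{\Delta w_{it}\gamma_0 = 0\}$.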
The proposed estimator then is a kernel-weighted GMM estimator that solves

$$\hat{\theta}_n = \arg\min_{\theta \in \Theta}\ \hat{G}_n(\theta)'\, A_n' A_n\, \hat{G}_n(\theta),$$

where $A_n$ is a stochastic matrix that converges in probability to a finite non-stochastic limit $A_0$, and $\hat{G}_n(\theta)$ is the vector of stacked sample moments with rows of the form

$$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} K\!\left(\frac{\Delta w_{it}\hat{\gamma}_n}{h_n}\right) m_{lit}(\theta).$$

In Appendix C we present conditions under which the proposed estimators are consistent and asymptotically normal. Apart from Assumption A1′, a key condition for consistency is an exclusion restriction between $x^*_{it}$ and $w_{it}$. This is required so that $\beta_0$ be identified from the moment conditions (14) for $l = 1, 2$. Under appropriate smoothness assumptions, the estimators are asymptotically normal and achieve the same rate of convergence as in univariate nonparametric density and regression function estimation. Provided that $\gamma_0$ can be estimated at a sufficiently fast rate (Assumption N14), the asymptotic distribution of $\hat{\theta}_n$ does not depend on the asymptotic distribution of the first-step estimator $\hat{\gamma}_n$. Note that its asymptotic variance is of the standard GMM form, $(A^*_0 D_0)^{-1} A^*_0 V_0 A^{*\prime}_0 (A^*_0 D_0)^{-1}$, where $A^*_0 \equiv D_0' A_0' A_0$. The definitions of $D_0$ and $V_0$ are given in the Appendix, where we also discuss how they can be consistently estimated.

The implementation of the proposed estimator requires that a consistent and sufficiently “fast” estimator of $\gamma_0$ (and $\phi_0$) be available (see Assumptions (C12) and (N14) in Appendix C).¹² In the absence of dynamics in the selection equation, $\gamma_0$ may be estimated at the standard root-$n$ rate under a logistic assumption on the errors of the binary choice model (see, for example, Chamberlain (1984)). If one is not willing to parameterize this distribution, one may use the “smoothed conditional maximum score estimator” (see Kyriazidou (1995), and Charlier et al. (1995)) which modifies Manski’s (1987) estimator by smoothing the score function in the manner suggested by Horowitz (1992).
Under appropriate smoothness assumptions this “smoothed” estimator will converge at a rate sufficiently fast, as required by Assumptions (C12) and (N14). When the selection equation contains one lag of the dependent variable along with other exogenous variables, Honoré and Kyriazidou (2000) show that, in a logistic framework and under appropriate assumptions, $(\phi_0, \gamma_0)$ may also be estimated at a sufficiently fast rate.13 Alternatively, if one is willing to assume that the discrete choice selection equation contains a regressor that is independent of both the error term and the individual effect in that equation, it is possible to estimate $(\phi_0, \gamma_0)$ at the standard root-n rate in the manner suggested by Honoré and Lewbel (1999).

12. As noted for the static panel data sample selection model in Kyriazidou (1997), it is in principle possible to dispense with estimation of the selection equation altogether and condition on the event that $w_{it} = w_{it-1}$, which obviously implies that $w_{it}\gamma_0 = w_{it-1}\gamma_0$. However, in this case, only the coefficients in $\beta_0$ that correspond to the non-overlapping variables between $x^*_{it}$ and $w_{it}$ would be identified. Furthermore, in this case, the second-step estimator of the outcome equation would converge at an even slower rate. However, this approach may be desirable if one wants to avoid estimation of the selection equation.

13. As Honoré and Kyriazidou show, when the logit assumption is relaxed, the dynamic binary choice model (3) may still be estimated consistently. They do not however provide the rate of convergence in this case, although they conjecture that it may be sufficiently fast under appropriate smoothness conditions.

In order to compute $\hat{\theta}_n$ in practice, one needs to choose the kernel function $K$ and to assign a numerical value to the bandwidth.
The results in kernel density and regression function estimation suggest that the specific choice of the kernel function may not be as important as the choice of the bandwidth. For choosing the bandwidth one may follow the “plug-in” approach described in Kyriazidou (1997); see also Horowitz (1992) and Härdle (1990).

We finally turn to the choice of $A_0$, which enters the asymptotic variance term through $A_0^* \equiv D_0' A_0' A_0$. The choice of $A_0$ is therefore important in terms of efficiency of the proposed estimator. As Hansen (1982) shows, for the standard GMM estimator that has an asymptotic normal distribution with zero asymptotic bias and asymptotic covariance matrix $(A_0^* D_0)^{-1} A_0^* V_0 A_0^{*\prime} (A_0^* D_0)^{-1}$, $A_0$ may be chosen to satisfy $A_0' A_0 = V_0^{-1}$. In the Monte Carlo study that follows, we use this weight matrix, which implies that the covariance matrix of the proposed estimator is equal to $(D_0' V_0^{-1} D_0)^{-1}$. Note however that in principle this choice of weight matrix does not necessarily have the optimality properties as in the standard GMM context, since in our case the asymptotic variance of the estimator is also affected by the choice of bandwidth and kernel function.14

4. MONTE CARLO EVIDENCE

In this section we report the results of a small Monte Carlo study that investigates the finite sample properties of the proposed estimator as they compare to those of the “naive” estimator that ignores sample selectivity. We focus on a single specification where the main equation follows a pure first-order autoregression with an individual specific drift. This model has been used in recent papers to study the performance of GMM estimators in linear dynamic panel data models (see e.g. Blundell and Bond (1998) and Wyhowski (1996)). The selection equation follows a discrete choice model with strictly exogenous regressors, individual effects, and logistic errors.
Thus, the experiments are conducted under the favourable environment where sufficiently fast estimation of $\gamma$ is feasible, namely, by conditional logit.

Data for the Monte Carlo experiments are generated according to the model

$$y^*_{it} = \rho_0 y^*_{it-1} + \alpha^*_i + \varepsilon^*_{it}, \quad t = 1, 2, 3, \tag{1}$$

$$y_{it} = d_{it}\, y^*_{it}, \tag{2}$$

$$d_{it} = 1\{w_{1,it}\gamma_{10} + w_{2,it}\gamma_{20} + \eta_i - u_{it} \geq 0\}, \quad t = 0, 1, 2, 3; \; i = 1, \ldots, n. \tag{3}$$

In the selection equation, $w_{1,it}$ and $w_{2,it}$ are distributed as $N(-1, 1)$, $\eta_i = 3$, $u_{it}$ is logistically distributed normalized to have unit variance, and $\gamma_{10} = \gamma_{20} = 1$. In the main equation, $\alpha^*_i = 1 + (w_{2,i1} + w_{2,i2})/2 + \sqrt{2}\,\xi_{1i}$ where $\xi_{1i}$ is a standard normal random variable, and $\varepsilon^*_{it} \equiv u_{it}$. The initial observation $y_{i0}$ is generated as $d_{i0}\, y^*_{i0}$, where $y^*_{i0} = \alpha^*_i/(1 - \rho_0) + \left(1/\sqrt{1 - \rho_0^2}\right)\xi_{2i}$ with $\xi_{2i}$ distributed as $N(0, 1)$. This specification implies that the latent process $\{y^*_{it}\}$ is covariance stationary. Three different values are considered for $\rho_0$: 0.3, 0.5 and 0.8. The variables $w_{1,it}$, $w_{2,it}$, $u_{it}$, $\xi_{1i}$ and $\xi_{2i}$ are all generated independent of each other, and are independent and identically distributed over time and across individuals. Four different sample sizes $n$ are considered: 500, 1000, 4000 and 16,000. This design implies that $\Pr(d_{it} = 1) \approx 0.7$, $\Pr(d_{it} d_{it-1} d_{it-2} = 1) \approx 0.33$, and $\Pr(d_{it} d_{it-1} d_{it-2} d_{it-3} = 1) \approx 0.26$.

14. I thank an anonymous referee for pointing this out.

Both the “naive” and the proposed estimators considered in this section are GMM estimators that exploit (all or a subset of) the following sample moments:

$$\frac{1}{n}\sum_{i=1}^{n} d_{i0} d_{i1} d_{i2}\, y_{i0}\,(\Delta y_{i2} - \rho\,\Delta y_{i1})\,\omega_{i2}, \tag{M1}$$

$$\frac{1}{n}\sum_{i=1}^{n} d_{i0} d_{i1} d_{i2} d_{i3}\, y_{i0}\,(\Delta y_{i3} - \rho\,\Delta y_{i2})\,\omega_{i3}, \tag{M2}$$

$$\frac{1}{n}\sum_{i=1}^{n} d_{i1} d_{i2} d_{i3}\, y_{i1}\,(\Delta y_{i3} - \rho\,\Delta y_{i2})\,\omega_{i3}, \tag{M3}$$

$$\frac{1}{n}\sum_{i=1}^{n} d_{i0} d_{i1} d_{i2}\left[(y_{i2} - \rho y_{i1})^2 - (y_{i1} - \rho y_{i0})^2\right]\omega_{i2}, \tag{M4}$$

$$\frac{1}{n}\sum_{i=1}^{n} d_{i1} d_{i2} d_{i3}\left[(y_{i3} - \rho y_{i2})^2 - (y_{i2} - \rho y_{i1})^2\right]\omega_{i3}, \tag{M5}$$

$$\frac{1}{n}\sum_{i=1}^{n} d_{i0} d_{i1} d_{i2} d_{i3}\,(y_{i3} - \rho y_{i2})(\Delta y_{i2} - \rho\,\Delta y_{i1})\,\omega_{i2}. \tag{M6}$$

For the “naive” estimator, $\omega_{it} = 1$ for all $i$ and $t$, while for the proposed estimator $\omega_{it} = K(\Delta w_{it}\hat{\gamma}/h_n)$, where $K(\cdot)$ is the standard normal density function, $h_n = n^{-1/5}$, and $\hat{\gamma}$ is obtained by conditional logit. In order to investigate how the estimation of $\gamma_0$ affects the results, we also consider the infeasible estimators that use the true $\gamma_0$ in the construction of the kernel weights. The estimators solve the problem

$$\min_{\rho}\; G_n' A_n' A_n G_n,$$

where $G_n = (1/n)\sum g_{ni}(\rho)$ is a column vector that contains all or a subset of the sample moments (M1)–(M6) and $A_n$ is a weighting matrix. We will denote by IV the estimators that exploit only the linear (in $\rho$) moments (M1)–(M3), by GMM1 those that exploit in addition the sample moments (M4) and (M5), and by GMM2 the estimators that exploit all moments (M1)–(M6). The objective is to study how the finite sample properties of the estimators are affected as the non-linear moment restrictions are added to the linear ones. It has been noticed that for the linear dynamic panel data model, IV estimators that exploit only the linear restrictions perform poorly, especially for larger values of the autoregressive parameter, the reason being that the lagged values of the dependent variable are only weak instruments for the equation in first differences.

For either the “naive” or the proposed approach, estimates are computed in a first stage using $A_n' A_n = I_n$, the identity matrix of size n. In a second stage, we compute estimates (denoted with o) using the “optimal” weighting matrix obtained from minimizing the asymptotic variance of the estimator. Tables 1–3 report the results for the Mean Bias, Median Bias, Root Mean Squared Error, and Median Absolute Error of the estimates for the three different specifications of $\rho_0$ across 100 replications.
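The design and the linear moments (M1)–(M3) can be sketched as follows. This is an illustrative reading of the description above rather than the author's code: the names `simulate` and `kernel_iv` are hypothetical, the constants (in particular the $\sqrt{2}$ scale on $\xi_{1i}$) are taken as read from the text, and the kernel weights use the true $\gamma_0 = (1, 1)$, i.e. the infeasible variant, so that no conditional logit step is needed. With identity weighting the linear moments have the form $a - \rho b$, so the minimizer is $\hat{\rho} = a'b / b'b$.

```python
import numpy as np

def simulate(n, rho, rng):
    """Draw one sample (t = 0,...,3) from the Monte Carlo design (assumed reading)."""
    T = 4
    w1 = rng.normal(-1.0, 1.0, size=(n, T))
    w2 = rng.normal(-1.0, 1.0, size=(n, T))
    # Logistic errors rescaled to unit variance (standard logistic has variance pi^2/3).
    u = rng.logistic(size=(n, T)) * np.sqrt(3.0) / np.pi
    eta = 3.0
    d = (w1 + w2 + eta - u >= 0).astype(float)        # gamma_10 = gamma_20 = 1
    # Individual effect; the sqrt(2) scale is an assumption about the garbled source.
    alpha = 1.0 + (w2[:, 1] + w2[:, 2]) / 2.0 + np.sqrt(2.0) * rng.normal(size=n)
    ystar = np.empty((n, T))
    ystar[:, 0] = alpha / (1.0 - rho) + rng.normal(size=n) / np.sqrt(1.0 - rho**2)
    for t in range(1, T):
        ystar[:, t] = rho * ystar[:, t - 1] + alpha + u[:, t]   # eps_it = u_it
    return d * ystar, d, w1, w2

def kernel_iv(y, d, w1, w2, h, gamma=(1.0, 1.0), weighted=True):
    """Identity-weighted estimate of rho from the linear moments (M1)-(M3)."""
    def omega(t):
        if not weighted:                               # "naive" estimator
            return np.ones(len(y))
        dindex = (gamma[0] * (w1[:, t] - w1[:, t - 1])
                  + gamma[1] * (w2[:, t] - w2[:, t - 1]))
        return np.exp(-0.5 * (dindex / h) ** 2) / np.sqrt(2.0 * np.pi)

    dy = np.diff(y, axis=1)                            # dy[:, t-1] = y_t - y_{t-1}
    sel012 = d[:, 0] * d[:, 1] * d[:, 2]
    sel123 = d[:, 1] * d[:, 2] * d[:, 3]
    # (instrument * selection * weight, leading difference, lagged difference)
    terms = [
        (sel012 * y[:, 0] * omega(2), dy[:, 1], dy[:, 0]),             # (M1)
        (sel012 * d[:, 3] * y[:, 0] * omega(3), dy[:, 2], dy[:, 1]),   # (M2)
        (sel123 * y[:, 1] * omega(3), dy[:, 2], dy[:, 1]),             # (M3)
    ]
    a = np.array([np.mean(z * lead) for z, lead, _ in terms])
    b = np.array([np.mean(z * lag) for z, _, lag in terms])
    return float(a @ b / (b @ b))                      # argmin of ||a - rho*b||^2
```

A call such as `kernel_iv(y, d, w1, w2, h=n ** (-0.2))` gives the kernel-weighted version, while `weighted=False` gives the “naive” version that ignores selection; repeating both over many simulated samples is one way to reproduce the kind of comparison reported in the tables.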
For the design under investigation, we note that the IV, GMM1 and GMM2 versions of the ‘‘naive’’ estimator are in general more biased (in n i G1 556 TABLE 1 ρ0 G0·3 Proposed estimator true γ 0 ‘‘Naive’’ estimator Median bias RMSE MAE Mean bias Median bias RMSE MAE Mean bias Median bias RMSE MAE n G500 IV IVo GMM1 GMM1o GMM2 GMM2o 0·0673 0·0727 0·1308 0·0882 0·1165 0·0567 0·0689 0·0750 0·1038 0·0670 0·0913 0·0391 0·1543 0·1526 0·2137 0·1857 0·1908 0·1531 0·1107 0·1130 0·1235 0·0885 0·1061 0·0976 A0·0259 A0·0196 A0·0256 A0·0367 0·0638 0·0594 0·0602 0·0063 0·0656 0·0594 0·0481 0·0061 0·2406 0·2281 0·2651 0·2285 0·2490 0·2266 0·1669 0·1426 0·1534 0·1354 0·1347 0·1227 A0·0326 A0·0128 A0·0370 A0·0245 0·0599 0·0644 0·0667 0·0227 0·0691 0·0480 0·0763 0·0191 0·2720 0·2584 0·3116 0·2749 0·2960 0·2898 0·1651 0·1430 0·1753 0·1572 0·1892 0·1396 n G1000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·0644 0·0594 0·1044 0·0618 0·0925 0·0390 0·0668 0·0590 0·0888 0·0194 0·0724 0·0104 0·1394 0·1315 0·1805 0·1569 0·1622 0·1325 0·0858 0·0866 0·0972 0·0713 0·0888 0·0531 A0·0065 A0·0416 A0·0084 A0·0429 0·0396 0·0131 0·0403 0·0005 0·0432 0·0221 0·0280 A0·0019 0·2187 0·1982 0·2336 0·2098 0·2090 0·1739 0·1494 0·1341 0·1520 0·1017 0·1155 0·0723 A0·0036 A0·0118 0·0342 0·0357 0·0399 0·0288 A0·0296 A0·0561 0·0079 A0·0226 0·0143 A0·0071 0·2625 0·2355 0·2803 0·2544 0·2533 0·2310 0·1834 0·1470 0·1880 0·1292 0·1425 0·0977 n G4000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·0761 0·0720 0·1056 0·0407 0·0880 0·0139 0·0660 0·0640 0·1043 0·0311 0·0884 0·0145 0·0952 0·0901 0·1300 0·0713 0·1119 0·0436 0·0660 0·0640 0·1043 0·0382 0·0884 0·0286 A0·0041 A0·0036 A0·0069 A0·0103 0·0211 0·0096 0·0041 0·0019 0·0202 0·0098 0·0006 0·0029 0·1195 0·1127 0·1144 0·0925 0·1020 0·0689 0·0827 0·0772 0·0731 0·0605 0·0562 0·0472 A0·0136 0·0029 A0·0177 A0·0082 0·0276 0·0102 0·0068 0·0014 0·0269 0·0124 A0·0002 A0·0117 0·1477 0·1366 0·1392 0·1079 0·1270 0·0811 0·0936 0·0895 0·0899 0·0703 0·0747 0·0556 n G16,000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·0809 0·0752 
0·1042 0·0378 0·0873 0·0161 0·0818 0·0761 0·1051 0·0387 0·0856 0·0158 0·0851 0·0795 0·1089 0·0441 0·0915 0·0245 0·0818 0·0761 0·1051 0·0387 0·0856 0·0170 A0·0067 A0·0035 A0·0067 A0·0038 0·0049 0·0009 0·0012 0·0001 0·0060 0·0032 0·0027 0·0037 0·0582 0·0538 0·0515 0·0411 0·0446 0·0347 0·0326 0·0370 0·0288 0·0238 0·0255 0·0200 A0·0099 0·0009 A0·0107 A0·0129 0·0064 0·0006 0·0002 0·0002 0·0084 0·0034 0·0038 0·0019 0·0719 0·0666 0·0652 0·0498 0·0556 0·0425 0·0368 0·0429 0·0387 0·0302 0·0321 0·0224 REVIEW OF ECONOMIC STUDIES Mean bias Proposed estimator estimated γ 0 TABLE 2 ρ0 G0·5 Proposed estimator true γ 0 ‘‘Naive’’ estimator Median bias RMSE MAE Mean bias Median bias RMSE MAE Mean bias Median bias MAE IV IVo GMM1 GMM1o GMM2 GMM2o 0·0972 0·1015 0·1753 0·1505 0·1721 0·1198 0·1047 0·1221 0·1792 0·1090 0·1639 0·0742 0·2331 0·2290 0·2675 0·2736 0·2582 0·2569 0·1547 0·1468 0·1987 0·1558 0·1763 0·1513 A0·0590 A0·0471 A0·0757 A0·0829 0·0697 0·1050 0·0664 0·0172 0·0758 0·1166 0·0623 0·0255 0·3554 0·3418 0·3273 0·2872 0·2995 0·2981 0·2714 0·2324 0·2093 0·2037 0·1930 0·1966 A0·0795 A0·0537 A0·1056 A0·1057 0·0512 0·0921 0·0583 0·0172 0·0582 0·0896 0·0399 0·0201 0·4198 0·4066 0·3482 0·3251 0·3445 0·2981 0·2737 0·2634 0·2354 0·2108 0·2367 0·2118 n G1000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·1133 0·1018 0·1573 0·1230 0·1502 0·1015 0·1020 0·0961 0·1624 0·0708 0·1477 0·0303 0·2222 0·2054 0·2434 0·2364 0·2340 0·2264 0·1353 0·1231 0·1649 0·1057 0·1520 0·0759 A0·0086 A0·0269 0·0519 0·0346 0·0642 0·0576 A0·0901 A0·0701 0·0373 A0·0231 0·0467 A0·0051 0·3638 0·3102 0·2966 0·3155 0·2793 0·2769 0·2381 0·2021 0·2021 0·1237 0·1739 0·0899 A0·0146 A0·0453 A0·0071 0·0225 0·0415 0·0074 A0·0686 A0·1197 0·0024 A0·0459 0·0259 A0·0271 0·4182 0·3620 0·3415 0·2804 0·2987 0·2899 0·2703 0·2488 0·2136 0·1292 0·1929 0·1222 n G4000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·1308 0·1179 0·1684 0·1091 0·1549 0·0567 0·1190 0·1030 0·1607 0·0743 0·1513 0·0322 0·1595 0·1443 0·1990 0·1651 0·1871 0·1232 0·1190 0·1030 0·1607 0·0794 
0·1513 0·0546 A0·0067 A0·0124 A0·0172 A0·0138 0·0404 0·0432 0·0213 0·0054 0·0429 0·0253 0·0233 A0·0020 0·1804 0·1613 0·1627 0·1451 0·1554 0·1387 0·1059 0·1066 0·0966 0·0870 0·0838 0·0701 A0·0213 A0·0261 A0·0363 A0·0302 0·0522 0·0498 0·0393 0·0099 0·0549 0·0254 0·0225 A0·0086 0·2262 0·1988 0·1988 0·1911 0·1912 0·1774 0·1344 0·1328 0·1202 0·1049 0·1101 0·0827 n G16,000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·1391 0·1239 0·1710 0·0899 0·1560 0·0386 0·1401 0·1223 0·1714 0·0824 0·1574 0·0350 0·1447 0·1296 0·1771 0·1028 0·1620 0·0504 0·1401 0·1223 0·1714 0·0824 0·1574 0·0374 A0·0088 A0·0042 A0·0104 A0·0051 0·0128 0·0036 0·0096 A0·0007 0·0146 0·0050 0·0101 0·0031 0·0926 0·0838 0·0866 0·0786 0·0796 0·0795 0·0513 0·0547 0·0441 0·0371 0·0376 0·0273 A0·0136 A0·0062 A0·0173 A0·0140 0·0176 0·0054 0·0121 0·0008 0·0212 0·0039 0·0112 A0·0028 0·1133 0·1028 0·1103 0·1009 0·1029 0·0905 0·0576 0·0623 0·0569 0·0461 0·0511 0·0317 PANEL DATA MODELS RMSE n G500 KYRIAZIDOU Mean bias Proposed estimator estimated γ 0 557 558 TABLE 3 ρ0 G0·8 Proposed estimator true γ 0 ‘‘Naive’’ estimator Median bias MAE Mean bias Median bias RMSE RMSE A0·0323 0·0795 A0·0601 A0·0125 0·0895 0·1673 A0·0044 0·0792 0·1301 0·1780 0·0609 0·1291 0·8188 0·6660 0·3490 0·4221 0·2618 0·3234 0·3519 0·3239 0·2191 0·2546 0·2138 0·2174 A0·2982 A0·3652 A0·0793 A0·0254 A0·0634 A0·0373 A0·3034 A0·3729 0·0197 A0·0112 0·0242 A0·0064 Proposed estimator estimated γ 0 MAE Mean bias Median bias RMSE MAE 0·8982 0·8955 0·4201 0·3340 0·3775 0·3072 0·6056 0·5209 0·1976 0·2080 0·1864 0·2158 A0·3437 A0·3870 A0·0989 A0·0813 A0·0922 A0·1003 A0·3764 A0·3930 0·0051 A0·0457 0·0178 A0·0130 1·3143 1·1134 0·4231 0·3642 0·4071 0·3634 0·5875 0·5447 0·1825 0·1875 0·1675 0·2148 n G500 IV IVo GMM1 GMM1o GMM2 GMM2o n G1000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·2534 0·2330 0·1575 0·0908 0·1673 0·1313 0·1506 0·1484 0·1884 0·1366 0·1903 0·1567 0·7700 0·6340 0·2985 0·3512 0·2617 0·2732 0·3889 0·3313 0·2169 0·2605 0·2132 0·2279 A0·3678 A0·4020 A0·2307 A0·1586 A0·1203 
A0·0723 A0·4410 A0·4656 A0·0886 A0·1676 A0·0321 A0·0571 0·9140 0·8898 0·5246 0·3822 0·3783 0·2833 0·4997 0·5704 0·2611 0·2344 0·1852 0·1854 A0·4834 A0·4740 A0·2335 A0·1642 A0·1851 A0·1419 A0·4485 A0·4680 A0·1011 A0·1346 A0·0687 A0·0850 1·1328 1·0287 0·5319 0·3979 0·4697 0·3585 0·5850 0·5541 0·2036 0·2248 0·1862 0·1948 n G4000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·4195 0·3513 0·2685 0·2734 0·2661 0·2659 0·3283 0·2827 0·2667 0·2857 0·2641 0·2790 0·5480 0·4557 0·2946 0·3014 0·2915 0·3004 0·3283 0·2827 0·2667 0·2857 0·2641 0·2790 A0·0582 A0·1653 0·0053 A0·0680 0·0155 0·0106 A0·1573 A0·2017 0·0084 A0·0756 0·0205 A0·0081 0·5304 0·4395 0·2506 0·3311 0·2347 0·2386 0·3250 0·2729 0·1769 0·1638 0·1619 0·1364 A0·1281 A0·2294 A0·0390 A0·0588 A0·0093 A0·0109 A0·2314 A0·3348 0·0025 A0·0597 0·0091 A0·0156 0·7125 0·6054 0·3068 0·3575 0·2557 0·2729 0·4076 0·4110 0·1740 0·1762 0·1620 0·1482 n G16,000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·4426 0·3732 0·3235 0·3209 0·3227 0·3170 0·4506 0·3695 0·3258 0·3317 0·3258 0·3330 0·4617 0·3908 0·3299 0·3254 0·3291 0·3224 0·4506 0·3695 0·3258 0·3317 0·3258 0·3330 A0·0191 A0·0412 0·0203 0·0073 0·0259 0·0368 A0·0488 A0·0572 A0·0088 A0·0162 A0·0087 0·0177 0·2688 0·2549 0·1856 0·2294 0·1782 0·1730 0·1564 0·1414 0·1104 0·0941 0·1053 0·0783 A0·0418 A0·0785 0·0138 A0·0031 0·0217 0·0172 A0·0679 A0·0988 A0·0255 A0·0442 A0·0205 A0·0075 0·3143 0·2942 0·1992 0·2454 0·1904 0·1965 0·1916 0·1771 0·1200 0·1017 0·1129 0·0941 REVIEW OF ECONOMIC STUDIES Mean bias KYRIAZIDOU PANEL DATA MODELS 559 absolute magnitude) than the respective versions of the proposed estimator (for the chosen bandwidth). An interesting exception occurs for ρ0 G0·8 and for the sample sizes of 500 and 1000 (see Table 3). However, the bias of the ‘‘naive’’ estimators increases substantially in this case for the larger sample sizes. The ‘‘naive’’ estimators are obviously inconsistent, as is demonstrated by the failure of their RMSE to decrease at the appropriate rate as sample size increases. 
In contrast, the RMSE of the proposed estimators decreases with sample size at a rate which is almost equal to $\sqrt{n}$, although it is slower for the high value of the autoregressive parameter. As expected, using the estimated $\gamma_0$ in the kernel weights almost invariably increases the RMSE of the proposed estimators relative to the RMSE of the infeasible estimators that use the true $\gamma_0$. The only exception occurs for $\rho_0 = 0.5$ and $n = 1000$ for the estimator GMM1. The MAE also tends to be smaller for the infeasible estimators than for the feasible ones, although in some cases, in particular for $\rho_0 = 0.8$ and smaller sample sizes, the MAE is slightly larger for the infeasible estimators. With respect to either measure of the bias, the evidence concerning the effect of estimating $\gamma_0$ is mixed. It appears that when the true $\gamma_0$ is used, the finite sample biases of the proposed estimators are smaller for higher values of $\rho_0$ and for larger sample sizes.

Concerning the relative performance of the IV and nonlinear GMM versions of the proposed estimators, we note that the IV estimator is invariably negatively biased on average and that it tends to perform worse both in terms of bias and dispersion as $\rho_0$ increases. This is similar to the findings for IV estimators in linear dynamic panel data models (see e.g. Wyhowski (1996)). Note that in contrast, the “naive” IV estimators are positively biased except for $\rho_0 = 0.8$ and $n = 500$. Adding the nonlinear moments (M4)–(M5) for GMM1 and (M6) for GMM2 in general decreases the dispersion of the estimates and also the bias for the larger values of $\rho_0$. We note small efficiency gains, especially for large sample sizes, when the restrictions (M6) are added.
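The statement about the rate at which the RMSE falls can be checked with simple arithmetic. The sketch below assumes a particular reading of the flattened Table 1 ($\rho_0 = 0.3$, proposed estimator with true $\gamma_0$, RMSE column, $n = 4000$ and $n = 16{,}000$); the column assignment is an assumption, and the helper `implied_rate` is hypothetical. If the RMSE behaves like $n^{-r}$, then $r = \ln(\text{RMSE}_{4000}/\text{RMSE}_{16000}) / \ln 4$.

```python
import math

# RMSE values as read from Table 1 (assumed column assignment): (n = 4000, n = 16000)
rmse = {"IV": (0.1195, 0.0582), "GMM2o": (0.0689, 0.0347)}

def implied_rate(r_small, r_large, n_small=4000, n_large=16000):
    """Exponent r such that RMSE is proportional to n**(-r) between two sample sizes."""
    return math.log(r_small / r_large) / math.log(n_large / n_small)

rates = {name: implied_rate(*pair) for name, pair in rmse.items()}
```

Under this reading both implied exponents are close to one half, in line with the “almost $\sqrt{n}$” description in the text.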
The use of the “optimal” weighting matrix improves in general on the efficiency of the estimates, although it does not always reduce their bias.15 In general, we observe a deterioration of the finite sample properties of all versions of the proposed estimator as the autoregressive parameter increases. This may also be seen from Figures 1 and 2, where we plot the kernel-smoothed density16 of the IVo, GMM1o, and GMM2o estimates for $n = 500$ and $n = 16{,}000$, respectively, against that of a normal with the same mean and standard deviation as the ones obtained for the estimators in 100 replications. The figures also allow us to assess the asymptotic approximation to the sampling distribution of the estimators. We find that the normal approximation is reasonable for the smaller values of the autoregressive parameter, but there is substantial evidence of non-normality for $\rho_0 = 0.8$ even for the sample size of 16,000. For comparison, we plot in Figures 3 and 4 the smoothed density of the IVo, GMM1o and GMM2o estimators in the absence of sample selectivity, i.e. using $d_{it} = 1$ and $\omega_{it} = 1$ for all $i$ and $t$ in the sample moments (M1)–(M6). (Note that these estimators use the entire samples and are infeasible in practice.) The plots suggest that normality may be a poor approximation for the GMM linear panel data estimators for high values of the autoregressive parameter even for very large sample sizes and that it is not a problem specific to the proposed kernel-weighted GMM approach.

Finally, we consider the effect of increasing the length of the panel by drawing observations from (1)–(3) for $t = 0, 1, \ldots, 4$. The number of linear moment restrictions increases to six, of the nonlinear homoskedasticity restrictions to three, and of the restrictions of the type of (M6) to two. The results are reported in Table 4. Due to space considerations we only report results for the proposed estimators that use the estimated $\gamma_0$. We note that almost invariably the RMSE decreases with the inclusion of the new moment restrictions that result from increasing the length of the panel by one period, although biases are sometimes higher, especially for larger $\rho_0$ and smaller $n$. In other words, including the additional moment conditions improves on the efficiency of the estimators, as expected, although it does not necessarily decrease their finite sample bias.

15. In the absence of sample selection, as Wyhowski (1996) reports, both biases and sampling variances are significantly affected by the fact that the optimal weighting matrices are estimated.

16. The smoothing was done using a standard normal kernel and the rule of thumb bandwidth suggested by Silverman (1986) for density estimation.

[FIGURE 1: n = 500]

[FIGURE 2: n = 16,000]

[FIGURE 3: n = 500]

[FIGURE 4: n = 16,000]

5. CONCLUSIONS

In this paper, we considered the problem of identification and estimation in panel data sample selection models with a binary selection rule when the latent equations contain strictly exogenous variables, (own) lags of the dependent variables, and additive unobserved individual effects. Under a stationarity and serial independence assumption on the time-varying error vector, we derived a set of conditional moment restrictions which were used to construct GMM-type estimators that are consistent and asymptotically normal under a set of mild regularity conditions. An advantage of the approach taken in this paper is that it does not require any assumptions on the parametric form of the distribution of the unobservables conditional on the observed covariates and the initial conditions.
We should point out, however, a potentially serious limitation of the model considered in this paper, namely that the selection equation (3) does not contain the lagged continuous endogenous variable or other predetermined variables.

APPENDIX

Part A

The assumptions SA1–SA4 for the linear model (1′) are sometimes complemented by the “stationarity” assumption suggested by Arellano and Bover (1995):

SA5: $E(y^*_{it}\alpha^*_i)$ is the same for all $i$ and for all $t$ (or equivalently $E(\alpha_i^{*2}) = (1 - \rho_0)E(\alpha^*_i y^*_{i0})$).

SA5 implies one additional moment restriction, namely

$$E((\alpha^*_i + \varepsilon^*_{i2})\Delta y^*_{i1}) = 0, \tag{15}$$

(see equation (39) in Arellano and Bover (1995)), while it also allows one to express all moment conditions linearly in $\rho_0$ (see equations (12a) and (12b) in Ahn and Schmidt (1995)). As Blundell and Bond (1998) notice, the restriction in (15) holds if in addition to SA1–SA3 we assume:

SA5′: $y^*_{i0} = \alpha^*_i/(1 - \rho_0) + v^*_{i0}$ and $E(v^*_{i0}\alpha^*_i) = E(v^*_{i0}\varepsilon^*_{i2}) = 0$ for all $i$,

which in conjunction with SA2 implies SA5. Assumption SA5′ will be satisfied when $|\rho_0| < 1$ and the model in (1′) along with assumptions SA2 and SA3 hold for all $t = \ldots, 0, 1, \ldots, T$.17 Then $v^*_{i0} \equiv \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j}$, since in this case $y^*_{i0} = \sum_{j=0}^{\infty} \rho_0^j(\alpha^*_i + \varepsilon^*_{i,0-j}) = \alpha^*_i/(1 - \rho_0) + \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j}$, although as discussed by Blundell and Bond (1998), these assumptions are not necessary for (15) to hold.

We will next examine whether an analogue to (15) holds for the sample selection model (1′)–(3′). As in the linear case, we will need to make additional assumptions about the initial period and possibly about the presample periods as well. An interesting case to examine is when the model (1′)–(3′) holds for all $t = \ldots, -2, -1, 0, 1, 2, \ldots, T$, and $|\rho_0| < 1$. In this case $y^*_{i0} = \alpha^*_i/(1 - \rho_0) + v_{i0}$, where now $v_{i0} = \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j}$.
j G0 ρ 0 ε i* Then the analogue of the right-hand side of (15) in the presence of sample selection is E(di2 di1 di0 (yi2 Aρ0 yi1)∆yi1)GE((di2 di1 di0 (α i* Cε i*2)(( ρ0 A1) ∑ j G0 ρ 0j ε i*0Aj Cε i*1 )). S 17. Note that, under these assumptions, SA1 is implied by SA2 and SA3. TABLE 4 Proposed estimator; estimated γ 0 ; T G5 ρ0 G0·3 ρ0 G0·5 ρ0 G0·8 Median bias RMSE MAE Mean bias Median bias RMSE MAE Mean bias Median bias MAE IV IVo GMM1 GMM1o GMM2 GMM2o A0·0784 A0·0914 0·0352 0·0317 0·0469 0·0228 A0·1124 A0·1427 0·0012 A0·0282 0·0113 A0·0109 0·2493 0·2358 0·2938 0·2620 0·2597 0·2436 0·1716 0·1844 0·1709 0·1612 0·1522 0·1313 A0·1424 A0·1684 0·0161 0·0079 0·0314 0·0222 A0·1953 A0·2377 0·0113 A0·0357 0·0031 A0·0351 0·3640 0·3436 0·3181 0·3278 0·2872 0·2827 0·2550 0·2646 0·1977 0·2227 0·1778 0·1856 A0·5416 A0·6235 A0·1884 A0·1475 A0·1330 A0·1160 A0·5491 A0·5923 A0·0603 A0·0937 A0·0286 A0·0595 0·7980 0·7488 0·4522 0·3458 0·3806 0·3265 0·5518 0·6004 0·1901 0·2201 0·1815 0·1751 n G1000 IV IVo GMM1 GMM1o GMM2 GMM2o A0·0331 A0·0607 0·0617 0·0227 0·0626 0·0123 A0·0582 A0·0764 0·0297 A0·0046 0·0345 A0·0188 0·1770 0·1604 0·2242 0·1817 0·2071 0·1633 0·1256 0·1262 0·1392 0·1048 0·1299 0·1050 A0·0765 A0·1210 0·0581 0·0148 0·0769 0·0352 A0·1169 A0·1587 0·0521 A0·0007 0·0744 A0·0062 0·2652 0·2423 0·2614 0·2646 0·2449 0·2377 0·1911 0·1810 0·1803 0·1549 0·1760 0·1450 A0·3521 A0·4995 A0·1723 A0·1524 A0·1214 A0·0953 A0·4061 A0·4745 A0·0876 A0·1189 A0·0580 A0·0630 0·6428 0·6305 0·4074 0·3656 0·3533 0·3146 0·4336 0·4786 0·1892 0·2048 0·1652 0·1474 n G4000 IV IVo GMM1 GMM1o GMM2 GMM2o A0·0180 A0·0306 A0·0293 A0·0417 0·0196 0·0228 A0·0018 0·0005 0·0216 0·0283 A0·0026 A0·0074 0·1028 0·0914 0·1070 0·0681 0·0978 0·0625 0·0614 0·0643 0·0737 0·0452 0·0640 0·0456 A0·0287 A0·0548 0·0327 0·0102 0·0374 0·0058 A0·0415 A0·0740 0·0306 A0·0042 0·0308 A0·0091 0·1629 0·1431 0·1483 0·1099 0·1396 0·0983 0·1019 0·1064 0·0955 0·0636 0·0852 0·0580 A0·1028 A0·2256 A0·0132 A0·0146 A0·0030 0·0149 A0·1582 A0·2460 
A0·0148 A0·0372 A0·0041 A0·0020 0·4201 0·4026 0·2279 0·2111 0·2106 0·1678 0·2340 0·3035 0·1286 0·1355 0·1179 0·1118 n G16,000 IV IVo GMM1 GMM1o GMM2 GMM2o 0·0021 A0·0008 0·0124 A0·0011 0·0123 A0·0007 0·0603 0·0571 0·0605 0·0461 0·0557 0·0383 0·0421 0·0426 0·0432 0·0304 0·0409 0·0227 A0·0005 A0·0085 0·0201 0·0002 0·0217 A0·0018 A0·0169 A0·0204 0·0058 A0·0077 0·0110 A0·0102 0·0923 0·0839 0·0893 0·0650 0·0843 0·0491 0·0711 0·0619 0·0639 0·0397 0·0611 0·0292 A0·0218 A0·0448 A0·0948 A0·1113 0·0192 0·0175 0·0327 0·0046 0·0253 0·0165 0·0312 0·0223 0·2321 0·2185 0·1763 0·1696 0·1692 0·1545 0·1493 0·1514 0·1214 0·0977 0·1133 0·0795 A0·0057 A0·0099 0·0021 A0·0061 0·0089 A0·0056 PANEL DATA MODELS RMSE n G500 KYRIAZIDOU Mean bias 565 566 REVIEW OF ECONOMIC STUDIES Assuming that {(ε i*t , uit )}tTG0 is i.i.d. over time for all i conditional on ζr i ≡ ({wit }tTG0 , α i* , η i), which is a natural extension of Assumption A1, a sufficient condition for the expectation above to be zero is (1Aρ0) ∑ j G0 ρ 0j E(ε i*0Aj 兩di0 G1, ζr i)GE (ε i*1 兩di1 G1, ζr i). S ¯ 1 (wi0 γ 0 , ζr i) ≡ E(ε i*0 兩di0 G1, ζr i)G This last equality will be satisfied if wi1 γ 0 Gwi0 γ 0 , which implies that Λ ¯ 1 (wi1 γ 0 , ζ¯ i), and in addition E(ε i*0Aj 兩di0 G1, ζr i)GE(ε i*0 兩di0 G1, ζr i)GΛ ¯ 1 (wi1 γ 0 , ζr i) for all jH0. E(ε i*1 兩di1 G1, ζr i) ≡ Λ This last condition, however, that the effect of sample selection in the initial sample period tG0 on the presample errors is constant, does not seem tenable. It will not be satisfied in general if we extend, for example, the conditional independence over time assumption on the error vector (ε i*t , uit) to all periods tG . . . , −2, −1, 0, 1, 2, . . . , T. We will therefore not pursue this restriction further. 
Part B

For the linear model (1), assumptions SA6 and SA6′ are sometimes complemented by the “stationarity” assumption (compare to SA5):

SA7: $E(x^*_{it}\alpha^*_i)$ is the same for all $t$ for each $i$,

(see Bhargava and Sargan (1983) and Breusch, Mizon and Schmidt (1989) for the case where $x^*_{it}$ is strictly exogenous, and Arellano and Bover (1995) when it is only predetermined). Under SA1–SA5, SA6 and SA7 we obtain the following $T(T-1) + 2(T-1)$ moment conditions in addition to (4), (5), (6), (10) and (12):18

$$E(\Delta x^*_{it}(\alpha^*_i + \varepsilon^*_{is})) = 0, \quad t = 2, \ldots, T; \; s = 1, \ldots, T, \tag{16}$$

$$E(x^*_{is}(\alpha^*_i + \varepsilon^*_{it}) - x^*_{it}(\alpha^*_i + \varepsilon^*_{is})) = 0, \quad \forall t \neq s, \tag{17}$$

$$E(x^*_{it}(\alpha^*_i + \varepsilon^*_{it}) - x^*_{is}(\alpha^*_i + \varepsilon^*_{is})) = 0, \quad \forall t \neq s. \tag{18}$$

If SA6′ holds, then we only obtain the following $T - 1$ restrictions in addition to (4), (5), (6), (15) and (11′) (see Arellano and Bover (1995)):

$$E(\Delta x^*_{it}(\alpha^*_i + \varepsilon^*_{it})) = 0, \quad t = 2, \ldots, T. \tag{16′}$$

18. As Arellano and Bover (1995, page 45) observe, some of the moment conditions (16)–(18) will be redundant given those in (12).

In this appendix we examine whether restrictions similar to (16)–(18) may be obtained for the sample selection model. First, consider the analogue of the right-hand side of (16):

$$E(d_{it} d_{it-1} d_{is} d_{is-1}\,\Delta x_{it}\,(y_{is} - \rho_0 y_{is-1} - x_{is}\beta_0)) = E(d_{it} d_{it-1} d_{is} d_{is-1}\,\Delta x^*_{it}\,(\alpha^*_i + \varepsilon^*_{is})). \tag{19}$$

It is clear that Assumption A1′ will in general not suffice for the expectation above to be zero, even if $\alpha^*_i$ is identically zero for all $i$. The reason is that without any further restrictions on the time series properties of $x^*_{it}$ as well as on the manner by which the $x^*_{it}$ process is affected by sample selection, the presence of the latter potentially destroys any assumed stationarity in the correlation structure between $x^*_{it}$ and $\varepsilon^*_{is}$ (and between $x^*_{it}$ and $\alpha^*_i$). In other words, A1′ in general will not imply that either $E(\alpha^*_i \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1) = 0$, or $E(\varepsilon^*_{is} \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1) = 0$. Suppose however that, in addition to A1′, the following assumption holds:

B1: $(x^*_{it}, \varepsilon^*_{it}, u_{it})$ is i.i.d. over time for all $i$ conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$.

Note that B1 implies that $x^*_{it}$ is i.i.d. over time for all $i$ conditional on $\zeta_i$, which in turn implies SA7. We will next demonstrate how to derive an analogue to (16) in the presence of sample selection. Consider

$$E(\alpha^*_i \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1) = E\big(\alpha^*_i\, E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i) \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1\big).$$

Given that $(x^*_{it}, u_{it})$ is independent over time conditional on $\zeta_i$ by B1, we have

$$E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i) = E(x^*_{it} \mid d_{it} = 1, \zeta_i) - E(x^*_{it-1} \mid d_{it-1} = 1, \zeta_i),$$

and by the stationarity of $(x^*_{it}, u_{it})$ conditional on $\zeta_i$, we obtain

$$E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \Delta w_{it}\gamma_0 = 0, \zeta_i) = 0,$$

and hence

$$E(d_{it} d_{it-1} d_{is} d_{is-1}\,\alpha^*_i \Delta x^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \tag{20}$$

if $x^*_{it}$ and $u_{it}$ are not assumed independent conditional on $\zeta_i$, or if they are independent,

$$E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i) = E(\Delta x^*_{it} \mid \zeta_i) = 0,$$

and hence

$$E(d_{it} d_{it-1} d_{is} d_{is-1}\,\alpha^*_i \Delta x^*_{it}) = 0. \tag{20′}$$

For the second term in (19), we have

$$E(\varepsilon^*_{is} \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i) = \left[E(x^*_{it} \mid d_{it} = 1, \zeta_i) - E(x^*_{it-1} \mid d_{it-1} = 1, \zeta_i)\right] E(\varepsilon^*_{is} \mid d_{is} = 1, \zeta_i),$$

for $t, t-1 \neq s$, and hence

$$E(d_{it} d_{it-1} d_{is} d_{is-1}\,\varepsilon^*_{is} \Delta x^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad \forall t, t-1 \neq s, \tag{21}$$

if $x^*_{it}$ and $u_{it}$ are not assumed independent conditional on $\zeta_i$, or if they are independent, we obtain

$$E(d_{it} d_{it-1} d_{is} d_{is-1}\,\varepsilon^*_{is} \Delta x^*_{it}) = 0, \quad \forall t, t-1 \neq s. \tag{21′}$$

Note that (21) and (21′) will hold for all $s$ and $t$, similar to the linear case, if in addition we assume that for all $t$, $\varepsilon^*_{it}$ and $x^*_{it}$ are independent conditional on $d_{it}$ and $\zeta_i$. Now, combining (20) and (21) (or (20′) and (21′)) we obtain the following moment restriction that is an analogue to (16):

$$E(d_{it} d_{it-1} d_{is} d_{is-1}\,\Delta x^*_{it}\,\varepsilon^*_{is} \mid \Delta w_{it}\gamma_0 = 0) = 0, \quad \forall t, t-1 \neq s, \tag{22}$$

or

$$E(d_{it} d_{it-1} d_{is} d_{is-1}\,\Delta x^*_{it}\,\varepsilon^*_{is}) = 0, \quad \forall t, t-1 \neq s, \tag{22′}$$

depending on whether $x^*_{it}$ and $u_{it}$ are assumed correlated or independent conditional on $\zeta_i$.
It is easy to verify that, when $x_{it}^*$ is not subject to censoring and under Assumption B1, we obtain

$$E(d_{is} d_{is-1}\, \Delta x_{it}^* \varepsilon_{is}^*) = 0, \quad \forall\, t,\ t-1 \neq s. \tag{22''}$$

Using similar arguments we can construct analogues to (17) and (18) when sample selection is present, provided that B1 holds in addition to A1′. The form of these moment restrictions, similar to the analogues (22) or (22′) or (22″) of (16), will depend on whether $x_{it}^*$ is censored or not, on whether the sample selection process is endogenous or exogenous to the $x_{it}^*$ process conditional on $\zeta_i$, as well as on whether $x_{it}^*$ is independent of $\varepsilon_{it}^*$ given the selection.

Whether the variables in $x_{it}^*$ are censored or not is an empirical matter that is application-specific. Furthermore, the plausibility of Assumption B1 and of any other additional assumptions concerning the relationship between each one of the variables in $x_{it}^*$ with $(u_{it}, \varepsilon_{it}^*)$ conditional on $\zeta_i$ also depends on the particular application at hand. We will therefore not investigate the implied moment restrictions any further. Their inclusion, however, in the estimation procedure described in Section 3 should be straightforward. Another reason for not pursuing these moment restrictions any further is that, although their inclusion should improve the asymptotic efficiency of the estimators, it is not clear how the inclusion of a large number of additional moments would affect the finite sample bias and precision of the estimators.

Part C

In this appendix we discuss the asymptotic properties of the proposed estimators. We will first present sufficient conditions for consistency of the infeasible estimator that uses the true $\gamma_0$ in the construction of the kernel weights.

Assumption C1. $\{(x_{it}^*, w_{it}, \varepsilon_{it}^*, u_{it}, \alpha_i^*, \eta_i);\ t = 1, \ldots, T\}_{i=1}^n$ is an i.i.d. sample of $n$ draws from a distribution that satisfies (1)–(3). In addition to $\{(x_{it}, w_{it}, y_{it}, d_{it});\ t = 1, \ldots, T\}_{i=1}^n$, we also observe a random sample $\{(y_{i0}, d_{i0})\}_{i=1}^n$ from a distribution that satisfies $y_{i0} \equiv d_{i0} \cdot y_{i0}^*$, where $d_{i0} \in \{0, 1\}$ and $y_{i0}^*$ takes values on a subset of the real line.

Given the random sampling assumption above, we will from now on drop the subscripts $i$ that denote the individuals' identity.

Assumption C2. $(\varepsilon_t^*, u_t)$ is i.i.d. over time conditional on $\tilde\zeta \equiv (w, x^*, \alpha^*, \eta, y_0^*, d_0)$, where $w \equiv (w_1, \ldots, w_T)$ and $x^* \equiv (x_1^*, \ldots, x_T^*)$.

Assumption C3. For all $l, t$, $E(m_{lt}(\theta) \mid W_{0t} = 0)$ takes a unique zero at $\theta = \theta_0 \equiv (\varphi_0, \beta_0')'$.

Assumption C4. $W_{0t} \equiv \Delta w_t \gamma_0$ is absolutely continuously distributed for all $t$ with density $f_t(\cdot)$ that is bounded from above on its support and strictly positive and continuous in a neighbourhood of zero.

Assumption C5. $\theta_0 \in \Theta$, a compact subset of $\Re^{k+1}$.

Assumption C6. For some $p > 1$, $E|y_t^*|^{2p}$ and $E|x_t^*|^{2p}$ are finite for all $t$. Furthermore, $E(|y_t^*|^2 \mid W_{0t'} = \cdot)$ and $E(|x_t^*|^2 \mid W_{0t'} = \cdot)$ are bounded on their support for all $t, t'$.

Assumption C7. For all $l, t$ ($l = 1, \ldots, 4$; $t = 1, \ldots, T$), $E(m_{lt}(\theta) \mid W_{0t} = \cdot)$ is continuous in a neighbourhood of zero for all $\theta \in \Theta$. (Note that by Assumption (C6), $E(m_{lt}(\theta) \mid W_{0t} = \cdot)$ is bounded on its support.)

Assumption C8. $K: \Re \to \Re$ is a function of bounded variation that satisfies: (i) $\sup_{\nu \in \Re} |K(\nu)| < \infty$, (ii) $\int |K(\nu)|\, d\nu < \infty$, and (iii) $\int K(\nu)\, d\nu = 1$.

Assumption C9. $h_n$ is a sequence of finite positive numbers that satisfies: (i) $h_n \to 0$ as $n \to \infty$, and (ii) $n^{1-1/p} h_n / \ln n \to \infty$ as $n \to \infty$, where $p$ is as in Assumption (C6).

The strict positiveness of $f_t$ in Assumption (C4) and the condition that $E(m_{lt}(\theta) \mid W_{0t} = 0)$ has a unique zero at $\theta = \theta_0$ (Assumption (C3)) are required for identification of $\theta_0$. Note that for $l = 1, 2$, the latter condition is satisfied provided that $x_t^*$ has full rank conditional on $W_{0t} = 0$, which implies that there exists at least one variable in $w_t$ that is not contained in $x_t^*$.
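As a rough illustration of the kernel and bandwidth conditions in Assumptions (C8) and (C9), the following Python sketch (not part of the paper) uses a Gaussian kernel, which is bounded, absolutely integrable, and integrates to one, and forms a kernel-weighted sample average of moment functions with weights $(1/h_n) K(W_i / h_n)$. The function names, the simulated index differences and moment values, and the bandwidth rule $h_n = n^{-1/5}$ are all assumptions made for illustration; that rule is compatible with (C9)(ii) only when $p > 5/4$.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density: bounded, integrable, integrates to one (cf. C8)."""
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)


def kernel_weights(index_diff, h):
    """Weights (1/h) K(index_diff / h); largest where the index difference is near zero."""
    return gaussian_kernel(index_diff / h) / h


def weighted_moment(index_diff, moments, h):
    """Sample average (1/n) sum_i (1/h) K(index_diff_i / h) * moments_i, column by column."""
    w = kernel_weights(index_diff, h)
    return (w[:, None] * moments).mean(axis=0)


# Hypothetical data: index_diff stands in for the differenced selection index,
# moments for n evaluations of three moment functions at some parameter value.
rng = np.random.default_rng(0)
n = 2000
index_diff = rng.normal(size=n)
moments = rng.normal(size=(n, 3))

h = n ** (-1.0 / 5.0)  # illustrative bandwidth; shrinks to zero as n grows (cf. C9(i))
g = weighted_moment(index_diff, moments, h)

# Numerical check of C8(iii) on a fine grid (Riemann sum).
grid = np.linspace(-10.0, 10.0, 20001)
kernel_integral = gaussian_kernel(grid).sum() * (grid[1] - grid[0])
```

Observations whose index difference is far from zero relative to `h` receive weights close to zero, so shrinking the bandwidth concentrates the average on observations for which the selection effect is approximately the same in the two periods.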
The rest of the assumptions are regularity conditions that permit the application of a uniform law of large numbers to show convergence of the objective function to its population analogue, a condition required in all consistency proofs of extremum estimators. In some cases the assumption that $y_t^*$ and $x_t^*$ have bounded second moments conditional on $W_{0t}$ (see Assumption (C6)) may be restrictive. However, this assumption may be relaxed. Specifically, all that is required is that $E(m_{lt}(\theta) \mid W_{0t} = \cdot)\, f_t(\cdot)$ is bounded on its support. The same comment applies to other conditional expectations used in the theorems that follow. Finally, Assumptions (C8) and (C9) are standard in kernel estimation of conditional expectations.

Theorem 1. (Consistency of Infeasible Estimator). Let Assumptions (C1)–(C9) hold. Define

$$\theta_n = \arg\min_{\theta \in \Theta}\ G_n(\theta)' A_n' A_n G_n(\theta), \tag{$\ast$}$$

where $A_n$ is a stochastic matrix that converges in probability to a finite non-stochastic limit $A_0$, and $G_n(\theta)$ is an $R \times 1$ vector with rows of the form

$$\frac{1}{n} \sum_{i=1}^n \frac{1}{h_n} K\!\left(\frac{\Delta w_{it}\gamma_0}{h_n}\right) m_{lit}(\theta).$$

Then, $\theta_n \xrightarrow{p} \theta_0$.

We next turn to examine the feasible two-step estimator that uses a consistent estimator $\hat\gamma_n$ in the construction of the kernel weights. Under some additional assumptions, this estimator is also consistent. These conditions involve a strengthening of the moment conditions in (C5), additional smoothness on the kernel, and a restriction on the rate of convergence of $h_n$ to zero given the rate of convergence of the first-step estimator $\hat\gamma_n$.

Assumption C10. $E|y_t^*|^4$, $E|x_t^*|^4$ and $E|w_t|^2$ are finite for all $t$.

Assumption C11. $K$ is continuously differentiable with derivative that satisfies: $\sup_{\nu \in \Re} |K'(\nu)| < K_1 < \infty$.

Assumption C12. $h_n$ satisfies $h_n^{-2}(\hat\gamma_n - \gamma_0) = o_p(1)$, where $\hat\gamma_n$ is a consistent estimator of $\gamma_0$.

Theorem 2. (Consistency of Two-Step Estimator). Let Assumptions (C1)–(C12) hold.
Define

$$\hat\theta_n = \arg\min_{\theta \in \Theta}\ \hat G_n(\theta)' A_n' A_n \hat G_n(\theta), \tag{$\ast\ast$}$$

where $A_n$ is as in Theorem 1, and $\hat G_n(\theta)$ is an $R \times 1$ vector with rows of the form

$$\frac{1}{n} \sum_{i=1}^n \frac{1}{h_n} K\!\left(\frac{\Delta w_{it}\hat\gamma_n}{h_n}\right) m_{lit}(\theta).$$

Then, $\hat\theta_n \xrightarrow{p} \theta_0$.

We next present conditions that are sufficient for asymptotic normality of the proposed estimators. Apart from the usual strengthening of regularity conditions on the existence and finiteness of moments higher than those required for consistency, additional smoothness is imposed on the model which allows convergence at a faster rate.

Assumption N1. $\theta_0 \in \mathrm{int}(\Theta)$.

Assumption N2. For all $t$, $f_t(\cdot)$ is $s$ ($s \geq 1$) times continuously differentiable on its support and has uniformly bounded derivatives. Also, for all $t, t'$, $(W_{0t}, W_{0t'})$ has density $f(\cdot, \cdot)$ that is uniformly bounded on its support.

Assumption N3. For all $t$, the functions $\tilde\Lambda_{1t} \equiv \Lambda_1(w_t\gamma_0, \tilde\zeta) \equiv E(\varepsilon_t^* \mid u_t \leq w_t\gamma_0 + \eta + \varphi_0, \tilde\zeta)$ and $\tilde\Lambda_{2t} \equiv \Lambda_2(w_t\gamma_0, \tilde\zeta) \equiv E(\varepsilon_t^{*2} \mid u_t \leq w_t\gamma_0 + \eta + \varphi_0, \tilde\zeta)$ satisfy

$$\Delta\tilde\Lambda_{jt} = \tilde\Lambda_{jt} - \tilde\Lambda_{jt-1} = \Lambda_j(w_t\gamma_0, \tilde\zeta) - \Lambda_j(w_{t-1}\gamma_0, \tilde\zeta) = \Lambda_{jt}^* W_{0t},$$

where $j = 1, 2$, and $\Lambda_{jt}^* \equiv \Lambda_j^*(w_t\gamma_0, w_{t-1}\gamma_0, \tilde\zeta)$ is bounded on its support. (For example, this condition will hold if $\Lambda_1(w_t\gamma_0, \tilde\zeta)$ and $\Lambda_2(w_t\gamma_0, \tilde\zeta)$ are continuously differentiable with respect to their first argument with bounded derivatives.)

Assumption N4. For all $l$ and $t$, and for all $\theta \in \Theta$, $E(m_{lt}^{(1)}(\theta) \mid W_{0t} = \cdot)$ is continuous in a neighbourhood of zero. (Here, $m_{lt}^{(1)}(\theta)$ is the Jacobian of first-order derivatives of $m_{lt}(\theta)$ with respect to $\theta'$.)

Assumption N5. $E(m_{lt}(\theta_0) m_{l't}(\theta_0)' \mid W_{0t} = \cdot)$ is continuous in a neighbourhood of zero as a function of $W_{0t}$ for all $l, l', t$.

Assumption N6. $E|y_t^*|^4$ and $E|x_t^*|^4$ are finite for all $t$.

Assumption N7. $E(|y_t^*|^{4+2\delta} \mid W_{0t'} = \cdot)$, $E(|x_t^*|^{4+2\delta} \mid W_{0t'} = \cdot)$, $E(|y_t^*|^4 \mid W_{0t'} = \cdot, W_{0t''} = \cdot)$, and $E(|x_t^*|^4 \mid W_{0t'} = \cdot, W_{0t''} = \cdot)$ are bounded on their support for some $\delta \in (0, 1)$, and for all $t, t', t''$.

Assumption N8.
For all $l$ and $t$, $E(m_{lt}^*(\theta_0) \mid W_{0t} = \cdot)$ is $s$ times continuously differentiable on its support, and has bounded derivatives. (The functions $m_{lt}^*(\theta_0)$ are defined below.)

Assumption N9. $K$ is an $(s+1)$-th order bias-reducing kernel that satisfies: $\int |\nu|^j |K(\nu)|\, d\nu < \infty$ for $j = 0$ and $j \leq s+1$, and

$$\int \nu^j K(\nu)\, d\nu = \begin{cases} 1, & \text{if } j = 0, \\ 0, & \text{if } 0 < j < s+1. \end{cases}$$

The functions $m_{lt}^*(\theta_0)$ in Assumption (N8) are defined as follows:

$$m_{1t,j}^*(\theta_0) = \Pr(D_{1t,j} = 1 \mid \tilde\zeta) \left( \rho_0^{t-j} y_0^* + \sum_{l=0}^{t-j-1} \rho_0^l \big(\alpha^* + x_{t-j-l}^* \beta_0 + E(\varepsilon_{t-j-l}^* \mid D_{1t,j} = 1, \tilde\zeta)\big) \right) \Lambda_{1t}^*,$$

$$m_{2t,\kappa}^*(\theta_0) = \Pr(D_{2t} = 1 \mid \tilde\zeta)\, x_{s,\kappa}^*\, \Lambda_{1t}^*,$$

$$m_{3t}^*(\theta_0) = \Pr(D_{3t} = 1 \mid \tilde\zeta)\, \big(\alpha^* + E(\varepsilon_T^* \mid D_{3t} = 1, \tilde\zeta)\big)\, \Lambda_{1t}^*,$$

$$m_{4t}^*(\theta_0) = \Pr(D_{4t} = 1 \mid \tilde\zeta)\, (2\alpha^* \Lambda_{1t}^* + \Lambda_{2t}^*).$$

Note that by the boundedness of $\Lambda_{jt}^*$ ($j = 1, 2$) in Assumption (N3), and of $E(|y_t^*|^{4+2\delta} \mid W_{0t'})$ and $E(|x_t^*|^{4+2\delta} \mid W_{0t'})$ in Assumption (N7), $E(|m_{lt}^*(\theta_0)|^2 \mid W_{0t} = \cdot)$ is bounded on its support for all $l, t$.

Now, with this notation we can write for example

$$\begin{aligned}
E(m_{1t,j}(\theta_0)) &= E\big(E(m_{1t,j}(\theta_0) \mid \tilde\zeta)\big) \\
&= E\big(E(D_{1t,j}\, y_{t-j}^* \Delta\varepsilon_t^* \mid \tilde\zeta)\big) \\
&= E\big(\Pr(D_{1t,j} = 1 \mid \tilde\zeta)\, E(y_{t-j}^* \Delta\varepsilon_t^* \mid D_{1t,j} = 1, \tilde\zeta)\big) \\
&= E\left(\Pr(D_{1t,j} = 1 \mid \tilde\zeta) \left( \rho_0^{t-j} y_0^* + \sum_{l=0}^{t-j-1} \rho_0^l \big(\alpha^* + x_{t-j-l}^* \beta_0 + E(\varepsilon_{t-j-l}^* \mid D_{1t,j} = 1, \tilde\zeta)\big) \right) \Lambda_{1t}^* W_{0t}\right) \\
&= E(m_{1t,j}^*(\theta_0) W_{0t}),
\end{aligned}$$

where $j = 2, \ldots, t$. Similarly, we can write for $l = 2, 3, 4$

$$E(m_{lt}(\theta_0)) = E(m_{lt}^*(\theta_0) W_{0t}).$$

These expressions are a consequence of the smoothness condition on the functions $\Lambda_j$ in Assumption (N3) alluded to earlier. With this additional smoothness, the bias of the estimator, which is due to the fact that $E\big((1/h_n) K(W_{0t}/h_n)\, m_{lt}(\theta_0)\big) \neq 0$ for any finite $n$, and which, similarly to univariate kernel density and regression function estimation, would be of order $O(h_n^s)$ given a degree of smoothness $s$ on $E(m_{lt}(\theta_0) \mid W_{0t} = \cdot)$ and $f_t(\cdot)$ (compare to Assumptions (N2) and (N8)), is now of smaller order $O(h_n^{s+1})$. As a result, the estimator converges in distribution at a faster rate, namely at rate $n^{-(s+1)/[2(s+1)+1]}$ compared to the one typically obtained in kernel estimation, namely $n^{-s/(2s+1)}$.

Theorem 3.
(Asymptotic Normality of Infeasible Estimator). Let Assumptions (C1)–(C9) and (N1)–(N9) hold and $\theta_n$ be a solution to (∗).

(i) Let $\sqrt{n h_n}\, h_n^{s+1} \to h$, with $0 \leq h < \infty$. Then

$$\sqrt{n h_n}\, (\theta_n - \theta_0) \xrightarrow{d} N\big({-h}\,(A_0^* D_0)^{-1} A_0^* B_0,\ (A_0^* D_0)^{-1} A_0^* V_0 A_0^{*\prime} (A_0^* D_0)^{-1}\big),$$

where $B_0$ is an $(R \times 1)$ vector with elements of the form

$$B_{0lt} \equiv B_{lt}(\theta_0) \equiv \frac{1}{s!} \cdot \frac{\partial^{(s)}}{\partial W_{0t}^s} \big\{ E(m_{lt}^*(\theta_0) \mid W_{0t})\, f_t(W_{0t}) \big\}\Big|_{W_{0t} = 0} \cdot \int \nu^{s+1} K(\nu)\, d\nu,$$

$A_0^* \equiv D_0' A_0' A_0$, with $D_0$ an $(R \times (k+1))$ matrix with elements of the form

$$D_{0lt} \equiv D_{lt}(\theta_0) \equiv f_t(0) \cdot E(m_{lt}^{(1)}(\theta_0) \mid W_{0t} = 0),$$

and $V_0$ is an $(R \times R)$ matrix with elements of the form

$$V_{0ltl't'} \equiv V_{ltl't'}(\theta_0) \equiv \begin{cases} f_t(0) \cdot E(m_{lt}(\theta_0) m_{l't'}(\theta_0)' \mid W_{0t} = 0) \cdot \int K(\nu)^2\, d\nu, & \text{if } t = t', \\ 0, & \text{if } t \neq t'. \end{cases}$$

(ii) Let $\sqrt{n \tilde h_n}\, \tilde h_n^{s+1} \to \infty$. Then

$$\tilde h_n^{-(s+1)} (\theta_n - \theta_0) \xrightarrow{p} -(A_0^* D_0)^{-1} A_0^* B_0.$$

In order to obtain the limiting distribution for the feasible estimator, $\hat\theta_n$, additional smoothness is required. We next present sufficient conditions for $\hat\theta_n$ to have the same asymptotic distribution as the infeasible estimator, $\theta_n$, of the previous theorem. For the limiting distribution of $\hat\theta_n$ not to depend on the asymptotic distribution of the first-step estimator $\hat\gamma_n$, the bandwidth is required to converge to zero at a rate such that $\hat\gamma_n$ converges in distribution faster than $\theta_n$ (see Assumption (N13)).

Assumption N10. $E|y_t^*|^8$, $E|x_t^*|^8$, and $E|w_t|^8$ are finite for all $t$.

Assumption N11. $E(|y_t^*|^{4+2\delta} \mid W_{0t'} = \cdot)$ and $E(|x_t^*|^{4+2\delta} \mid W_{0t'} = \cdot)$ are bounded on their support for some $\delta \in (0, 1)$, and for all $t, t'$.

Assumption N12. $E(m_{lt}(\theta_0) m_{lt}(\theta_0)' \Delta w_t^{q_1} \Delta w_t^{q_2} \mid W_{0t} = \cdot)$ and $E(m_{lt}(\theta_0) m_{lt}(\theta_0)' \Delta w_t^{q_1} \Delta w_t^{q_2} \Delta w_t^{q_3} \Delta w_t^{q_4} \mid W_{0t} = \cdot)$ are continuous in a neighbourhood of zero for all $l$ and $t$. (Here, $q_j \in \{1, \ldots, q\}$.)

Assumption N13. $K$ is three times continuously differentiable with derivatives that satisfy: (i) $\sup_{\nu \in \Re} |K'(\nu)| < K_1 < \infty$, $\sup_{\nu \in \Re} |K''(\nu)| < K_2 < \infty$, $\sup_{\nu \in \Re} |K'''(\nu)| < K_3 < \infty$, and (ii) $\int |\nu K'(\nu)|\, d\nu < \infty$ and $\int |\nu K''(\nu)|\, d\nu < \infty$.

Assumption N14. $h_n$ satisfies $\sqrt{n h_n}\, (\hat\gamma_n - \gamma_0) = o_p(1)$.

Theorem 4.
(Asymptotic Normality of Two-Step Estimator). Let $\hat\theta_n$ be a solution to (∗∗). In addition to the assumptions of Theorem 2, let Assumptions (N1)–(N14) hold.

(i) Let $\sqrt{n h_n}\, h_n^{s+1} \to h$, with $0 \leq h < \infty$. Then

$$\sqrt{n h_n}\, (\hat\theta_n - \theta_0) \xrightarrow{d} N\big({-h}\,(A_0^* D_0)^{-1} A_0^* B_0,\ (A_0^* D_0)^{-1} A_0^* V_0 A_0^{*\prime} (A_0^* D_0)^{-1}\big),$$

where $A_0^*$, $V_0$, and $D_0$ are defined as in Theorem 3.

(ii) Let $\sqrt{n \tilde h_n}\, \tilde h_n^{s+1} \to \infty$. Then

$$\tilde h_n^{-(s+1)} (\hat\theta_n - \theta_0) \xrightarrow{p} -(A_0^* D_0)^{-1} A_0^* B_0.$$

In order to carry out hypothesis testing and to construct confidence intervals based on the asymptotic distribution of the estimator, one needs consistent estimators of the components of the asymptotic variance. Consider the following functions

$$V_{nltl't'}(\theta) = \frac{1}{n h_n} \sum_{i=1}^n m_{lit}(\theta)\, m_{l'it'}(\theta)'\, K\!\left(\frac{\Delta w_{it}\hat\gamma_n}{h_n}\right)^{2},$$

$$D_{nlt}(\theta) = \frac{1}{n h_n} \sum_{i=1}^n m_{lit}^{(1)}(\theta)\, K\!\left(\frac{\Delta w_{it}\hat\gamma_n}{h_n}\right),$$

where $\hat\theta_n$ is a consistent estimator of $\theta_0$. Under some additional regularity conditions that guarantee that the sample averages above converge uniformly in $\theta$ to their population analogues, and provided that the latter are continuous at $\theta_0$, it is not difficult to show that

$$V_{nltl't'}(\hat\theta_n) \xrightarrow{p} V_{ltl't'}(\theta_0) \equiv V_{0ltl't'}, \qquad D_{nlt}(\hat\theta_n) \xrightarrow{p} D_{lt}(\theta_0) \equiv D_{0lt}.$$

Acknowledgements. I would like to thank Richard Blundell, Xiaohong Chen, Lars Hansen, Jim Heckman, Bo Honoré, Joe Hotz, Guido Imbens, Costas Meghir, seminar participants at various institutions, the journal's managing editors, and two anonymous referees for helpful suggestions and comments. The paper was first presented at the 1997 Econometric Society European meetings, Toulouse, France. This research was supported by the National Science Foundation. All errors are naturally mine.

REFERENCES

AHN, S. C. and SCHMIDT, P. (1995), “Efficient Estimation of Models for Dynamic Panel Data”, Journal of Econometrics, 68, 5–27.
AMEMIYA, T. (1985) Advanced Econometrics (Cambridge: Harvard University Press).
ANDERSON, T. W. and HSIAO, C.
(1981), “Estimation of Dynamic Models with Error Components”, Journal of the American Statistical Association, 76, 598–606.
ARELLANO, M. and BOND, S. R. (1991), “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations”, Review of Economic Studies, 58, 277–297.
ARELLANO, M. and BOVER, O. (1995), “Another Look at the Instrumental Variable Estimation of Error Component Models”, Journal of Econometrics, 68, 29–51.
ARELLANO, M., BOVER, O. and LABEAGA, J. M. (1997), “Autoregressive Models with Sample Selectivity for Panel Data”, in C. Hsiao, K. Lahiri, L.-F. Lee and H. Pesaran (eds.), Analysis of Panels and Limited Dependent Variable Models (Cambridge: Cambridge University Press).
ARELLANO, M. and HONORÉ, B. E. (1999), “Panel Data Models. Some Recent Developments” (Unpublished manuscript prepared for the Handbook of Econometrics, Vol. 5).
BHARGAVA, A. and SARGAN, J. D. (1983), “Estimating Dynamic Random Effects Models from Panel Data Covering Short Time Periods”, Econometrica, 51, 1635–1659.
BLUNDELL, R. and BOND, S. (1998), “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models”, Journal of Econometrics, 87, 115–143.
BOVER, O. and ARELLANO, M. (1997), “Estimating Dynamic Limited Dependent Variable Models from Panel Data”, Investigaciones Economicas, 21, 141–165.
BREUSCH, T. S., MIZON, G. E. and SCHMIDT, P. (1989), “Efficient Estimation Using Panel Data”, Econometrica, 57, 695–700.
CHAMBERLAIN, G. (1984), “Panel Data”, in Z. Griliches and M. Intriligator (eds.), Handbook of Econometrics, Vol. II (Amsterdam: North Holland).
CHARLIER, E., MELENBERG, B. and VAN SOEST, A. (1995), “A Smoothed Maximum Score Estimator for the Binary Choice Panel Data Model and an Application to Labour Force Participation”, Statistica Neerlandica, 49, 324–342.
COGAN, J. F. (1981), “Fixed Costs and Labor Supply”, Econometrica, 49, 945–964.
ECKSTEIN, Z. and WOLPIN, K. I.
(1990), “On the Estimation of Labor Force Participation, Job Search, and Job Matching Models Using Panel Data”, in Y. Weiss and G. Fishelson (eds.), Advances in the Theory and Measurement of Unemployment.
HANOCH, G. (1980), “Hours and Weeks in the Theory of Labor Supply”, in J. P. Smith (ed.), Female Labor Supply: Theory and Estimation (Princeton: Princeton University Press).
HANSEN, L. P. (1982), “Large Sample Properties of Generalized Method of Moments Estimators”, Econometrica, 50, 1029–1054.
HÄRDLE, W. (1990) Applied Nonparametric Regression (Cambridge: Cambridge University Press).
HAUSMAN, J. A. (1980), “The Effects of Wages, Taxes, and Fixed Costs on Women’s Labor Force Participation”, Journal of Public Economics, 14, 161–194.
HECKMAN, J. J. (1981), “Heterogeneity and State Dependence”, in S. Rosen (ed.), Studies of Labor Markets (Chicago: University of Chicago Press).
HECKMAN, J. J. (1993), “Lessons from Empirical Labor Economics: 1972–1992”, American Economic Association Paper and Proceedings, 83, 116–121.
HOLTZ-EAKIN, D., NEWEY, W. and ROSEN, H. S. (1988), “Estimating Vector Autoregression with Panel Data”, Econometrica, 56, 1371–1396.
HONORÉ, B. E. (1992), “Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects”, Econometrica, 60, 533–565.
HONORÉ, B. E. (1993), “Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent Variables”, Journal of Econometrics, 59, 35–61.
HONORÉ, B. E. and KYRIAZIDOU, E. (2000), “Panel Data Discrete Choice Models with Lagged Dependent Variables”, Econometrica, 68, 839–874.
HONORÉ, B. E. and KYRIAZIDOU, E. (2000), “Estimation of Tobit-Type Models with Individual Specific Effects”, Econometric Reviews, 19, 341–366.
HONORÉ, B. E. and LEWBEL, A. (1998), “Semiparametric Binary Choice Panel Data Models without Strictly Exogenous Regressors” (Unpublished manuscript).
HOROWITZ, J. L.
(1992), “A Smoothed Maximum Score Estimator for the Binary Response Model”, Econometrica, 60, 505–531.
HOTZ, V. J., KYDLAND, F. E. and SEDLACEK, G. L. (1988), “Intertemporal Preferences and Labor Supply”, Econometrica, 56, 335–360.
HSIAO, C. (1986) Analysis of Panel Data (Cambridge: Cambridge University Press).
HYSLOP, D. (1999), “State Dependence, Serial Correlation and Heterogeneity in Intertemporal Labor Force Participation of Married Women”, Econometrica, 67, 1255–1294.
JOHNSON, T. R. and PENCAVEL, J. H. (1984), “Dynamic Hours of Work Functions for Husbands, Wives, and Single Females”, Econometrica, 52, 363–389.
KYDLAND, F. E. and PRESCOTT, E. C. (1982), “Time to Build and Aggregate Fluctuations”, Econometrica, 50, 1345–1370.
KYRIAZIDOU, E. (1995) Essays in Estimation and Testing of Econometric Models (Unpublished Ph.D. thesis, Department of Economics, Northwestern University).
KYRIAZIDOU, E. (1997), “Estimation of A Panel Data Sample Selection Model”, Econometrica, 65, 1335–1364.
MANSKI, C. (1987), “Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data”, Econometrica, 55, 357–362.
MÁTYÁS, L. and SEVESTRE, P. (1996) The Econometrics of Panel Data (Kluwer Academic Publishers: Boston).
POWELL, J. L. (1987), “Semiparametric Estimation of Bivariate Latent Variable Models” (Working Paper No 8704, Social Systems Research Institute, University of Wisconsin–Madison).
SILVERMAN, B. W. (1986) Density Estimation for Statistics and Data Analysis (New York: Chapman and Hall).
WYHOWSKI, D. J. (1996), “Monte Carlo Evidence for Dynamic Panel Data Models” (Unpublished manuscript, The Australian National University).