Review of Economic Studies (2001) 68, 543–572
 2001 The Review of Economic Studies Limited
0034-6527/01/00230543$02.00
Estimation of Dynamic Panel
Data Sample Selection Models
EKATERINI KYRIAZIDOU
University of California, Los Angeles
First version received December 1997; final version accepted June 2000 (Eds.)
This paper considers the problem of identification and estimation in panel data sample selection models with a binary selection rule, when the latent equations contain strictly exogenous
variables, lags of the dependent variables, and unobserved individual effects. We derive a set of
conditional moment restrictions which are then exploited to construct two-step GMM-type estimators for the parameters of the main equation. In the first step, the unknown parameters of the
selection equation are consistently estimated. In the second step, these estimates are used to construct kernel weights in a manner such that the weight that any two-period individual observation
receives in the estimation varies inversely with the relative magnitude of the sample selection effect
in the two periods. Under appropriate assumptions, these ‘‘kernel-weighted’’ GMM estimators are
consistent and asymptotically normal. The finite sample properties of the proposed estimators are
investigated in a small Monte-Carlo study.
1. INTRODUCTION
Panel data are very useful in applied research. Not only do they allow researchers to study
the intertemporal behaviour of individuals, they also enable them to control for the presence of unobserved permanent individual heterogeneity. To date there exists a large body
of literature on panel data models with unobserved individual effects that enter additively
in the (possibly latent) regression model (see for example Hsiao (1986), and Mátyás and
Sevestre (1996)). In recent years, considerable advances in the panel data literature have
been made in the direction of dynamic linear models that allow for the presence of lags
of the dependent variable and other predetermined variables (see for example Ahn and
Schmidt (1995), Arellano and Bover (1995), and Blundell and Bond (1998)). These are
reviewed in Arellano and Honoré (1999), who also describe results for dynamic non-linear
panel data models (such as discrete choice and sample selection models). Much less is
known, however, about models of this type. This paper aims to contribute in that direction.
In particular, we consider the problem of identification and estimation in panel data
sample selection models with a binary selection rule (Type 2 Tobit models in the terminology of Amemiya (1985)) when the latent equations contain strictly exogenous variables,
lags of the dependent variables, and additive unobserved individual effects. The model
under consideration has the form
$$y^*_{it} = \rho_0 y^*_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}, \qquad (1)$$
$$y_{it} = d_{it}\, y^*_{it}, \qquad (2)$$
$$d_{it} = 1\{\phi_0 d_{it-1} + w_{it}\gamma_0 + \eta_i - u_{it} \ge 0\}, \qquad (3)$$
where $i = 1, \ldots, n$ and $t = 1, \ldots, T$. Throughout the paper, n is considered to be large
relative to T, which is the case most frequently encountered in practice. It is assumed that
the sample starts at date $t = 0$, and that $d_{i0}$ and $y_{i0}$ (which may or may not be censored)
are observed although the model is not specified for the initial period. As will become
clear later, it is not necessary to assume that the other covariates are observed in the
initial period of the sample. In the model given by (1)–(3), $\rho_0, \phi_0 \in \Re$, $\beta_0 \in \Re^k$ and $\gamma_0 \in \Re^q$
are the unknown parameters of interest, $x^*_{it}$ and $w_{it}$ are vectors of strictly exogenous
explanatory variables (with possibly common elements),¹ $\varepsilon^*_{it}$ and $u_{it}$ are unobserved disturbances (not necessarily independent of each other), and $\alpha^*_i$ and $\eta_i$ are unobservable
time-invariant individual-specific effects that are possibly correlated with each other as
well as with the errors, the regressors, and the initial observations. $y^*_{it} \in \Re$ is a latent
variable whose observability depends on the outcome of the indicator variable
$d_{it} \in \{0, 1\}$. In particular, it is assumed that, while $(d_{it}, w_{it})$ is always observed, $(y^*_{it}, x^*_{it})$ is
observed only if $d_{it} = 1$.² In other words, the ‘‘selection’’ variable $d_{it}$ determines whether
the it-th observation in equation (1) is censored or not. Thus, the observed sample consists
of quadruples $(d_{it}, w_{it}, y_{it}, x_{it})$, where $x_{it} \equiv d_{it} x^*_{it}$, from which we want to estimate $(\rho_0, \phi_0, \beta_0, \gamma_0)$.
As there exist several results on identification and estimation of the discrete choice
selection equation (3) (see Section 3), in this paper we will focus attention on the parameters $(\rho_0, \beta_0)$ of the continuous outcome equation (1).
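To make the data structure concrete, the following is a minimal simulation sketch of the model (1)–(3). It is not taken from the paper: the function name `simulate_panel`, the scalar covariates, the normal distributions, and the way the initial period is drawn are all illustrative assumptions.

```python
import numpy as np

def simulate_panel(n, T, rho0, beta0, phi0, gamma0, seed=0):
    """Draw (d, w, y, x) from model (1)-(3) under illustrative distributions.

    All distributional choices below are assumptions made for this sketch;
    the paper itself leaves them unrestricted.
    """
    rng = np.random.default_rng(seed)
    # Individual effects, allowed to be correlated with each other.
    eta = rng.normal(size=n)
    alpha = 0.5 * eta + rng.normal(size=n)
    # Initial observations (t = 0); the model is not specified for this period.
    d_prev = (eta + rng.normal(size=n) >= 0).astype(float)
    ystar_prev = alpha + rng.normal(size=n)
    d = np.zeros((n, T))
    w = np.zeros((n, T))
    ystar = np.zeros((n, T))
    xstar = np.zeros((n, T))
    for t in range(T):
        w[:, t] = rng.normal(size=n)       # selection covariate, not in outcome equation
        xstar[:, t] = rng.normal(size=n)   # outcome covariate
        u = rng.normal(size=n)
        eps = 0.5 * u + rng.normal(size=n)  # errors correlated within a period
        ystar[:, t] = rho0 * ystar_prev + beta0 * xstar[:, t] + alpha + eps  # eq. (1)
        d[:, t] = phi0 * d_prev + gamma0 * w[:, t] + eta - u >= 0            # eq. (3)
        ystar_prev, d_prev = ystar[:, t], d[:, t]
    y = d * ystar   # eq. (2): outcome observed only when selected
    x = d * xstar   # x_it = d_it * x*_it
    return d, w, y, x
```

The returned arrays hold periods t = 1, ..., T in their columns; the researcher sees only `d`, `w`, `y` and `x`, never the latent `ystar` and `xstar`.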
A feature of the model that should be pointed out from the outset is that, although
$x^*_{it}$ and $w_{it}$ may contain common variables, the two vectors do not coincide, which rules
out the censored regression model (the Type 1 Tobit model) as a special case of the model
considered in this paper. The reason is that our semiparametric identification scheme of
the continuous outcome equation requires that the selection equation contains at least
one variable that is not included in the outcome equation. Such an exclusion restriction
is standard in semiparametric Type 2 Tobit models.³
In this paper we derive conditions under which the parameters of the model (1)–(3)
are identified and propose estimators that are consistent and asymptotically normal, without placing any restrictions on the parametric form of the distribution of any of the
unobservables, or on the statistical relationship of the individual effects with the observed
covariates and the initial conditions. In this sense we follow a semiparametric ‘‘fixed
effects’’ approach. Arellano et al. (1997), and Bover and Arellano (1997) discuss
estimation of certain dynamic panel data limited dependent variable models taking a
parametric ‘‘random effects approach.’’
The model under consideration may be relevant, for example, for estimating intertemporal labour supply responses to wage rate and non-labour income changes. Intertemporal
substitution of labour has received a great deal of attention in the literature, having figured
prominently in explanations of aggregate business cycle fluctuations (Kydland and
Prescott (1982)). Dynamic models of labour supply of the form of (1) may arise when
preferences are allowed to be non-separable over time (see, for example, Hotz et al. (1988),
and Johnson and Pencavel (1984)), and have been found to yield intertemporal labour
supply elasticities of substitution higher than models that assume intertemporal separability. Most of this literature, however, has only considered interior solutions, i.e. behaviour only at the intensive (hours worked) margin. Another strand of the literature (see
Heckman (1993) for a recent survey) has stressed the importance of considering behaviour
at the extensive (participation) margin as well, especially for women, which gives rise to
1. As we discuss in Section 2, although it may be possible to dispense with the strict exogeneity assumption
on $x^*_{it}$, this is not possible for $w_{it}$.
2. As we discuss in Section 2.2, the case where some or all of the variables in $x^*_{it}$ are always observed can
be easily handled.
3. A recent paper (Chen (1996)) shows that in the cross-sectional version of the model, it is possible to
replace the exclusion restriction with a symmetry assumption on the joint distribution of the errors. This result
does not seem to carry over to the panel data case.
Tobit-type models of labour supply. When looking at the participation decision alone,
models of discrete choice that incorporate state dependence of the form of (3) have been
used to account for the presence of human capital accumulation (Heckman (1981)) or
search costs (Eckstein and Wolpin (1990), and Hyslop (1999)). Finally, a set of papers
(Cogan (1981), Hanoch (1980), Hausman (1980)) has found considerable evidence of fixed
costs associated with working, implying that a Type 2 Tobit specification may be more
appropriate for analysing labour supply. The model considered in this paper incorporates
the salient features of all these strands of literature. However, it may be difficult to derive
the model (1)–(3) directly from a structural dynamic utility maximization problem. The
reason is that typically the model would introduce the lagged $y_{it}$ (or $y^*_{it}$) in the selection
equation. This would then violate the strict exogeneity assumption on the selection covariates $w_{it}$ required by the proposed identification approach, which conditions on current,
past, and future values of the explanatory variables in the selection equation. As we discuss
in Section 2, although in principle it is possible to allow for the lagged selection variable
in the main equation, its coefficient is not identified.
The paper is organized as follows. Section 2 obtains a set of moment restrictions for
the model (1)–(3). To facilitate comparisons with the existing literature, in that section we
discuss the assumptions and the moment conditions used in the estimation of dynamic
linear panel data models of the form of equation (1). In that section we also discuss a
variation of the model in which the continuous outcome equation contains the lagged
censored endogenous variable instead of the lagged latent one. Section 3 presents the
proposed estimators for the dynamic panel data sample selection model and discusses
practical issues such as estimation of the discrete choice selection equation. Section 4
presents the results of a small Monte Carlo experiment for the case where the main equation follows a pure first-order autoregression with an individual-specific drift and the
selection equation contains only exogenous regressors. The Appendix presents conditions
under which the proposed estimators are consistent and asymptotically normal, and states
the formal results.⁴
2. MOMENT RESTRICTIONS IN DYNAMIC PANEL DATA SAMPLE
SELECTION MODELS
The model (1)–(3) is an extension of the model considered in Kyriazidou (1997) which
only allowed for strictly exogenous variables in (1) and (3) (i.e. imposed $\rho_0 = \phi_0 = 0$). The
idea for identifying $\beta_0$ in that paper relies on the conditional pairwise exchangeability of
the error vector $(\varepsilon^*_{it}, u_{it})$ given the entire path of the exogenous covariates and the individual effects. Identification of $\beta_0$ is based on the observation that, for an individual who is
observed in two time periods, say $t$ and $t-1$ (i.e. who has $d_{it} = d_{it-1} = 1$), the magnitude
of the sample selection bias in the two time periods is the same if $\Delta w_{it}\gamma_0 = 0$.⁵ (Here, and
throughout, $\Delta$ denotes first-differences.) This implies that time-differencing the main equation (1) eliminates not only the individual effect but also the effect of sample selection.
However, the consistency of the estimator above breaks down in the presence of the lagged
dependent variable in (1). The reason is the same as in linear dynamic panel data models,
4. All proofs are contained in an additional Appendix available at the author’s web site, http://www.econ.ucla.edu/kyria. At the same site there is also available a preliminary GAUSS code that implements the proposed
estimator.
5. The idea of using pairwise comparisons to eliminate the sample selection effect was first proposed by
Powell (1987) for the cross-sectional version of the model.
where standard estimators such as the first-difference and the within estimators are inconsistent because of the non-zero correlation of $y^*_{it-1}$ with the transformed (first-differenced
or in deviation from its time mean) error term. As we review in this section, in the absence
of sample selection, estimation of $\rho_0$ and $\beta_0$ in (1) is based on linear and nonlinear moment
conditions, implied by assumptions on the serial correlation structure of the time-varying
unobservables, on the correlation structure of the unobservables with the observable
covariates, and/or on assumptions on the initial conditions. We show that similar moment
conditions may be also obtained in the presence of sample selection under appropriate
assumptions on the errors of the model. In particular, we will assume here that $(\varepsilon^*_{it}, u_{it})$ is
independent and identically distributed over time conditional on the exogenous variables,
the individual effects and the initial observations. Similar to Kyriazidou (1997), identification in the dynamic model (1)–(3) relies on conditioning on the event
$\Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0$. Clearly, this condition collapses to $\Delta w_{it}\gamma_0 = 0$ if $d_{it-2} = 1$. Thus,
for an individual who is observed for three (say, consecutive) periods, taking time-differences eliminates not only the individual effect but also the effect of sample selection from
the main equation, provided that the ‘‘selection index’’, $w_{it}\gamma_0$, remains constant. The key
intuition is that in this case, all conditional moments of the main equation errors are
constant due to the assumed stationarity of the error vector and are therefore eliminated
by time-differencing. We proceed by considering successively four cases that build towards
the general model (1)–(3).
2.1. Case 1: $\rho_0 \neq 0$, $\beta_0 = 0$, $\phi_0 = 0$
We will first consider the case where all the selection covariates are strictly exogenous
with respect to $(\varepsilon^*_{it}, u_{it})$ and the main equation follows a first-order purely autoregressive
model. As will become obvious, more than one lag of the dependent variable $y^*_{it}$ may be
handled in a straightforward manner. The model has the form
$$y^*_{it} = \rho_0 y^*_{it-1} + \alpha^*_i + \varepsilon^*_{it}, \qquad (1')$$
$$y_{it} = d_{it}\, y^*_{it}, \qquad (2)$$
$$d_{it} = 1\{w_{it}\gamma_0 + \eta_i - u_{it} \ge 0\}. \qquad (3')$$
In the absence of sample selection, i.e. when $y_{it} = y^*_{it}$ for all i and for all t, estimation
of $\rho_0$ in (1′) is typically based on the following assumptions:
SA1: $E(\varepsilon^*_{it}\, y^*_{i0})$ is the same for all t for each i.
SA2: $E(\varepsilon^*_{it}\, \alpha^*_i)$ is the same for all t for each i.
SA3: $E(\varepsilon^*_{it}\, \varepsilon^*_{is})$ is the same for all $t \neq s$ for each i.
Typically, the moments above are assumed to be zero (see for example Ahn and Schmidt
(1995) and Blundell and Bond (1998)). Assuming that $y^*_{i0}$ is observed, Assumptions SA1–
SA3 imply the following orthogonality conditions
$$E(y^*_{it-j}\,\Delta\varepsilon^*_{it}) = 0, \qquad t = 2, \ldots, T;\ j = 2, \ldots, t, \qquad (4)$$
$$E((\alpha^*_i + \varepsilon^*_{iT})\,\Delta\varepsilon^*_{it}) = 0, \qquad t = 2, \ldots, T-1, \qquad (5)$$
(compare to equations (3) and (4) of Ahn and Schmidt (1995)). Note that (4) implies
$T(T-1)/2$ zero moment restrictions that are linear in $\rho_0$, while (5) implies $T-2$ nonlinear
restrictions. As Ahn and Schmidt show, estimation of $\rho_0$ by GMM that uses the sample
analogues of (4) and (5) produces an estimator that is efficient within the class of all
estimators that exploit all the moment conditions.
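As an illustration of how the linear conditions (4) are indexed, the sketch below enumerates the pairs (t, j) and evaluates their sample analogues for the model without sample selection. The function names and the array layout (one column per period, starting at t = 0) are assumptions made here for illustration, not part of the paper.

```python
import numpy as np

def linear_moment_pairs(T):
    """Index pairs (t, j) of condition (4): t = 2..T, j = 2..t."""
    return [(t, j) for t in range(2, T + 1) for j in range(2, t + 1)]

def sample_moments_4(ystar, rho):
    """Sample analogues of E(y*_{i,t-j} (dy*_it - rho * dy*_{i,t-1})).

    ystar has shape (n, T + 1); column s holds y*_is for s = 0..T.
    No sample selection is assumed here, so every y*_is is observed.
    """
    T = ystar.shape[1] - 1
    out = []
    for t, j in linear_moment_pairs(T):
        # First-differenced residual at a trial value rho.
        d_eps = (ystar[:, t] - ystar[:, t - 1]) - rho * (ystar[:, t - 1] - ystar[:, t - 2])
        out.append(np.mean(ystar[:, t - j] * d_eps))
    return np.array(out)
```

The list returned by `linear_moment_pairs(T)` has T(T−1)/2 entries, which matches the count stated in the text.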
In addition to SA1–SA3, it is also often assumed that the time-varying errors in (1′)
are homoskedastic over time, in the sense that:
SA4: $E(\varepsilon^{*2}_{it}) = E(\varepsilon^{*2}_{is})$, for all i and for all t, s,
which implies the additional $T-1$ nonlinear moment restrictions⁶
$$E((\alpha^*_i + \varepsilon^*_{it})^2 - (\alpha^*_i + \varepsilon^*_{it-1})^2) = 0, \qquad t = 2, \ldots, T. \qquad (6)$$
We now turn to examine whether conditions similar to (4)–(6) hold for the sample
selection model (1′)–(3′). We will make the following assumption:
A1: $\{(\varepsilon^*_{it}, u_{it})\}_{t=1}^{T}$ is i.i.d. over time for all i conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$
where $w_i \equiv (w_{i1}, \ldots, w_{iT})$.
The strict stationarity assumption on $(\varepsilon^*_{it}, u_{it})$ is stronger than the second moment
assumptions SA1–SA4. In fact, it implies SA1–SA4. It is however typical in nonlinear
semiparametric panel data models (see for example Manski (1987), and Honoré (1992,
1993)). The conditional serial independence is also a strong assumption, stronger than
both the serial uncorrelatedness usually assumed in linear dynamic panel data models and
the conditional pairwise exchangeability often assumed in ‘‘static’’ nonlinear panel data
models.⁷
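For the sample selection model (1′)–(3′), the analogue of the right-hand-side of (4)
is
$$E(d_{it} d_{it-1} d_{it-2} d_{it-j}\, y_{it-j}(\Delta y_{it} - \rho_0 \Delta y_{it-1})) = E(d_{it} d_{it-1} d_{it-2} d_{it-j}\, y^*_{it-j}\,\Delta\varepsilon^*_{it}),$$
which in general will not be zero due to the contemporaneous correlation of $\varepsilon^*_{it}$ and $u_{it}$.
This may be seen from
$$\begin{aligned}
E(y^*_{it-j}\,\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i)
&= \rho_0^{t-j}\, y^*_{i0}\, E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) \\
&\quad + \sum_{l=0}^{t-j-1} \rho_0^{l}\, \alpha^*_i\, E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) \\
&\quad + \sum_{l=0}^{t-j-1} \rho_0^{l}\, E(\varepsilon^*_{it-j-l}\,\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i),
\end{aligned}$$
where due to the conditional independence of $(\varepsilon^*_{it}, u_{it})$ over time,
$$E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) = E(\varepsilon^*_{it} \mid d_{it} = 1, \zeta_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \zeta_i),$$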
6. As Ahn and Schmidt discuss, with the addition of SA4, (5) and (6) may be expressed as
$$E(y^*_{it}\,\Delta\varepsilon^*_{it+1} - y^*_{it+1}\,\Delta\varepsilon^*_{it+2}) = 0, \qquad t = 1, \ldots, T-2,$$
$$E\Big(\Delta\varepsilon^*_{it+1}\, \frac{1}{T} \sum_{s=1}^{T} (\alpha^*_i + \varepsilon^*_{is})\Big) = 0, \qquad t = 1, \ldots, T-1,$$
so that the number of linear in $\rho_0$ restrictions is maximized.
7. For a discussion of the role of conditional pairwise exchangeability in ‘‘static’’ panel data Tobit-type
models, see Honoré and Kyriazidou (2000).
while for all $j = 2, \ldots, t$ and for all $l = 0, \ldots, t-j-1$,
$$E(\varepsilon^*_{it-j-l}\,\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \zeta_i) = E(\varepsilon^*_{it-j-l} \mid d_{it-j-l} = 1, \zeta_i)\,[E(\varepsilon^*_{it} \mid d_{it} = 1, \zeta_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \zeta_i)].$$
Obviously in general, the ‘‘selection correction term’’ or ‘‘sample selection effect’’
$$\Lambda_{1it} \equiv E(\varepsilon^*_{it} \mid d_{it} = 1, \zeta_i) = E(\varepsilon^*_{it} \mid u_{it} \le w_{it}\gamma_0 + \eta_i, \zeta_i),$$
will not be zero even if $E(\varepsilon^*_{it} \mid \zeta_i) = 0$ for all t. Furthermore, $\Lambda_{1it} \neq \Lambda_{1it-1}$ since $u_{it}$ and $u_{it-1}$
are truncated at different thresholds, $w_{it}\gamma_0 + \eta_i$ and $w_{it-1}\gamma_0 + \eta_i$, respectively, which vary
over time due to the time-variation of the scalar ‘‘selection index’’ $w_{it}\gamma_0$. However, the
stationarity of $(\varepsilon^*_{it}, u_{it})$ over time implies that the functional form of $\Lambda_{1it}$ and $\Lambda_{1it-1}$ is
time-invariant, i.e. $\Lambda_{1it} = \Lambda_1(w_{it}\gamma_0 + \eta_i, \zeta_i)$, although in principle the functional form of $\Lambda_1$
may vary over individuals. This implies that if for an individual i the selection index $w_{it}\gamma_0$
is constant in periods $t$ and $t-1$, the magnitude of the sample selection effects in the two
periods will also be the same, i.e.
$$w_{it}\gamma_0 = w_{it-1}\gamma_0 \;\Rightarrow\; w_{it}\gamma_0 + \eta_i = w_{it-1}\gamma_0 + \eta_i \;\Rightarrow\; \Lambda_{1it} = \Lambda_{1it-1}.$$
In other words, $E(\Delta\varepsilon^*_{it} \mid d_{it} d_{it-1} d_{it-2} d_{it-j} = 1, \Delta w_{it}\gamma_0 = 0, \zeta_i) = 0$. Hence we obtain the
following moment restriction for the sample selection model which is analogous to (4)⁸
$$E(d_{it} d_{it-1} d_{it-2} d_{it-j}\, y^*_{it-j}\,\Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T;\ j = 2, \ldots, t. \qquad (7)$$
In fact, A1 implies an infinite number of moment restrictions since any measurable
function of the lagged $y^*$’s that are not censored may be substituted for $y_{it-j}$ in (7).
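The role of the selection index can be illustrated numerically in a special case that the paper does not impose: suppose, purely for illustration, that the errors are jointly normal, with u standard normal and Cov(ε*, u) = σ. Then E(ε* | u ≤ c) = −σ φ(c)/Φ(c), where c stands for the threshold w γ0 + η. The function name and the normality assumption are hypothetical.

```python
import math

def selection_effect(index, cov_eps_u):
    """E(eps | u <= index) for standard-normal u with Cov(eps, u) = cov_eps_u.

    Joint normality is an assumption of this sketch only; the paper leaves
    the error distribution unspecified.
    """
    pdf = math.exp(-0.5 * index ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))
    return -cov_eps_u * pdf / cdf

# Equal thresholds in two periods give equal selection effects, so their
# difference vanishes; unequal thresholds generally do not.
same = selection_effect(0.3, 0.5) - selection_effect(0.3, 0.5)
diff = selection_effect(1.0, 0.5) - selection_effect(0.0, 0.5)
```

In this parametric example the selection effect is non-zero whenever the covariance is non-zero, yet it drops out of the first difference exactly when the index is unchanged, which is the mechanism behind (7).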
Using the same reasoning as above, it is straightforward to obtain the following
moment restrictions which are analogous to (5) and (6), respectively⁹
$$E(d_{iT} d_{iT-1} d_{it} d_{it-1} d_{it-2}\,(\alpha^*_i + \varepsilon^*_{iT})\,\Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T-1, \qquad (8)$$
$$E(d_{it} d_{it-1} d_{it-2}\,((\alpha^*_i + \varepsilon^*_{it})^2 - (\alpha^*_i + \varepsilon^*_{it-1})^2) \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T. \qquad (9)$$
In Appendix A we discuss another assumption suggested by Arellano and Bover
(1995) which implies mean stationarity of the $y^*_{it}$ process and which has been proven quite
useful for the identification of $\rho_0$ in linear dynamic models in cases where the series shows
high persistence. In this Appendix we also discuss possible ways for extending this identification argument in the presence of sample selection.
Finally, it is of interest to consider here the case where the main equation contains a
possibly time-varying intercept, i.e. it is of the form
$$y^*_{it} = \delta_{0t} + \rho_0 y^*_{it-1} + \alpha^*_i + \varepsilon^*_{it}.$$
By allowing for a time-varying intercept in the model we are allowing for the presence of
aggregate shocks that are common to all individuals and hence do not have zero cross-sectional mean.
8. The same reasoning may be used to construct moments that are based on taking time-differences over
non-consecutive periods. In this case the moment condition becomes
$$E(d_{it} d_{it-1} d_{is} d_{is-1} d_{is-j}\, y^*_{is-j}\,(\varepsilon^*_{it} - \varepsilon^*_{is}) \mid (w_{it} - w_{is})\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T;\ s < t;\ j = 2, \ldots, s. \qquad (7')$$
The moment restriction (7′) may be useful in practice in situations where for example participation spells are
few and far apart. I thank a co-editor for pointing this out.
9. It is not difficult to see that we could in principle exploit the strict stationarity assumption on the errors
to construct an infinite number of moment conditions that use higher moments of $\varepsilon^*_{it}$ (provided of course that
those exist).
We can now without loss of generality assume that
SA0: $E(\alpha^*_i) = E(\varepsilon^*_{it}) = 0$, for all i and for all t.
In the absence of sample selection and under assumptions SA0–SA4, we obtain T
additional moment restrictions of the form
$$E(\alpha^*_i + \varepsilon^*_{it}) = 0, \qquad t = 1, \ldots, T, \qquad (10)$$
which identify $\{\delta_{0t}\}_{t=1}^{T}$. It is clear that in the presence of sample selection the analogue
of (10) will in general not hold for any t. That is, in general,
$$E(d_{it} d_{it-1}\,(y_{it} - \rho_0 y_{it-1} - \delta_{0t})) = E(d_{it} d_{it-1}\,(\alpha^*_i + \varepsilon^*_{it})) \neq 0.$$
Thus, our scheme only identifies $\Delta\delta_{0t}$ through (7), (8) and (9). This implies that time-invariant intercepts are not identified by our approach. It also implies that we cannot
identify (time-invariant) coefficients of either the current discrete choice, $d_{it}$, or the lagged
discrete choice, $d_{it-1}$, if either or both enter the main equation, since (7), (8) and (9) use
only observations for which $d_{it} = d_{it-1} = 1$.
2.2. Case 2: $\rho_0 \neq 0$, $\beta_0 \neq 0$, $\phi_0 = 0$
We will now consider the case where the main equation also contains other regressors
besides lags of the dependent variable, while the selection covariates are still strictly
exogenous. The model has the form
$$y^*_{it} = \rho_0 y^*_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}, \qquad (1)$$
$$y_{it} = d_{it}\, y^*_{it}, \qquad (2)$$
$$d_{it} = 1\{w_{it}\gamma_0 + \eta_i - u_{it} \ge 0\}. \qquad (3')$$
In the absence of sample selection, i.e. for the linear model of equation (1), it is often
assumed that the regressors $x^*_{it}$ are uncorrelated with the time-varying errors $\varepsilon^*_{it}$ at all
leads and lags, i.e.¹⁰
SA6: $E(x^*_{is}\,\varepsilon^*_{it}) = 0$, for all i and for all t, s.
This assumption yields an additional $T(T-1)$ moment restrictions which are linear in $\rho_0$
and $\beta_0$
$$E(x^*_{is}\,\Delta\varepsilon^*_{it}) = 0, \qquad t = 2, \ldots, T;\ s = 1, \ldots, T. \qquad (11)$$
If instead it is assumed that $x^*_{it}$ is only predetermined with respect to $\varepsilon^*_{it}$, in the sense that:
SA6′: $E(x^*_{is}\,\varepsilon^*_{it}) = 0$, for all i and for all $s \le t$,
we only obtain an additional $T(T-1)/2$ moment restrictions
$$E(x^*_{is}\,\Delta\varepsilon^*_{it}) = 0, \qquad t = 2, \ldots, T;\ s = 1, \ldots, t-1. \qquad (11')$$
It is clear that the moment restrictions (11) (or (11′)) will also hold if, for each i and for
each s, $E(x^*_{is}\,\varepsilon^*_{it})$ is constant for all t (or for all $t \ge s$), say $\gamma_s$, but not necessarily equal to
zero.
10. Note that the presence of aggregate shocks with non-zero cross-sectional mean may be captured by
including a time dummy in $x^*_{it}$. This is why we are not including a time-varying intercept in (1). Thus, we may
without loss of generality assume that $E(\varepsilon^*_{it}) = 0$ for all i and t, so that $\mathrm{Cov}(x^*_{is}, \varepsilon^*_{it}) = E(x^*_{is}\,\varepsilon^*_{it})$.
For the sample selection model (1)–(3′), we will focus on the case where $x^*_{it}$ is strictly
exogenous with respect to $(\varepsilon^*_{it}, u_{it})$ in the sense that:
A1′: $(\varepsilon^*_{it}, u_{it})$ is i.i.d. over time for all i conditional on $\tilde{\zeta}_i \equiv (w_i, x^*_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$
where $w_i \equiv (w_{i1}, \ldots, w_{iT})$ and $x^*_i \equiv (x^*_{i1}, \ldots, x^*_{iT})$.
Note that A1′ implies that $E(x^*_{is}\,\varepsilon^*_{it}) = E(x^*_{is}\, E(\varepsilon^*_{it} \mid x^*_i)) = \gamma_s$, i.e. it is constant for all t
which is sufficient, as noted above, for (11) to hold in the linear model. It is not difficult
to see that A1′ leads to the following $T(T-1)$ conditions that are analogous to (11)
$$E(d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is}\,\Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T;\ s = 1, \ldots, T. \qquad (12)$$
Notice however that A1′ in fact implies an infinite number of moment restrictions, namely
any measurable function of any subset of $x^*_i$ that are not censored will satisfy (12).¹¹ It is
useful here to point out the similarity of (12) to the moment restriction (7), which uses
uncensored lags of the dependent variable as instruments.
In the case where $x^*_{it}$ is not subject to censoring, that is when $x_{it} \equiv x^*_{it}$ for all i and t,
the moment condition becomes
$$E(d_{it} d_{it-1} d_{it-2}\, x^*_{is}\,\Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T;\ s = 1, \ldots, T. \qquad (12')$$
In Appendix B we examine another set of moment restrictions that have been
proposed for linear dynamic panel data models and discuss conditions under which it is
possible to derive analogues to those in the presence of sample selection.
2.3. Case 3: $\rho_0 = 0$, $\beta_0 \neq 0$, $\phi_0 \neq 0$
We will now consider the case where the main equation contains only strictly exogenous
regressors, while the selection equation has the dependent variable lagged once as an
explanatory variable. The model has the form
$$y^*_{it} = x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}, \qquad (1'')$$
$$y_{it} = d_{it}\, y^*_{it}, \qquad (2)$$
$$d_{it} = 1\{\phi_0 d_{it-1} + w_{it}\gamma_0 + \eta_i - u_{it} \ge 0\}, \qquad (3)$$
and we assume that A1′ holds. It is straightforward to see that for all $s = 1, \ldots, T$
$$E(\Delta\varepsilon^*_{it} \mid d_{is} d_{it} d_{it-1} = 1, d_{it-2}, \zeta_i) = E(\varepsilon^*_{it} \mid u_{it} \le w_{it}\gamma_0 + \eta_i + \phi_0, \tilde{\zeta}_i) - E(\varepsilon^*_{it-1} \mid u_{it-1} \le w_{it-1}\gamma_0 + \eta_i + \phi_0 d_{it-2}, \tilde{\zeta}_i),$$
11. Suppose that instead of A1′ the following assumption holds:
A2: $(\varepsilon^*_{it}, u_{it})$ is independent of $x^*_{is}$ for all s and t and for all i conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$.
Then, it is clear that under A2, (12) (or (12′)) still holds. If A2 is replaced with:
A2′: $(\varepsilon^*_{it}, u_{it})$ is independent of $x^*_{is}$ for all $s \le t$ and for all i conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$
then we obtain an analogue to (11′):
$$E(d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is}\,\Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T;\ s = 1, \ldots, t-2.$$
One case where Assumption A2′ may be relevant is when $y^*_{it}$ comes from a VAR and $x^*_{it}$ is a lag of another
endogenous variable. Suppose for example that $x^*_{it} \equiv z^*_{it-1}$, where $z^*_{it} = \delta z^*_{it-1} + \alpha^*_i + v^*_{it}$, with $v^*_{it}$ potentially correlated with $(\varepsilon^*_{it}, u_{it})$, but independent of $(\varepsilon^*_{is}, u_{is})$ for all $t \neq s$ conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0}, z^*_{i0})$.
will be zero conditional on $\Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0$. Hence we obtain the following set
of moment restrictions
$$E(d_{is} d_{it} d_{it-1}\, x^*_{is}\,\Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0) = 0, \qquad t = 2, \ldots, T;\ s = 1, \ldots, T. \qquad (13)$$
The moment condition above looks more different from the moment condition (12),
obtained when there are no dynamics in the selection equation, than it really is. Note that
for observations for which $d_{it-2} = 1$, the conditioning set in (13) becomes the same as in
(12), namely $\Delta w_{it}\gamma_0 = 0$. Thus, a valid subset of moment restrictions for the model
(1″)–(3) is
$$E(d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is}\,\Delta\varepsilon^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad t = 2, \ldots, T;\ s = 1, \ldots, T,$$
which is the same set of moments as in (12).
2.4. Case 4: $\rho_0 \neq 0$, $\beta_0 \neq 0$, $\phi_0 \neq 0$
As we saw above, in the presence of dynamics only in the selection equation the conditioning set becomes $\Delta w_{it}\gamma_0 = 0$ when $d_{it-2} = 1$. On the other hand, when dynamics are present
only in the main equation, the same conditions are imposed (either explicitly in the case
of $\Delta w_{it}\gamma_0 = 0$, or implicitly for $d_{it-2} = 1$, by multiplication with $d_{it-2}$) for the moments in
(7), (8), (9) and (12) to be non-trivially equal to zero. It is therefore clear that (7), (8), (9)
and (12) continue to hold for the general model described by equations (1)–(3) that allows
for the presence of dynamics in both the selection and the main equation.
Note that in all four cases the conditioning set could be written as $\Delta w_{it}\gamma_0 + \phi_0(1 - d_{it-2}) = 0$, as in Case 3. In Cases 1 and 2, $\phi_0$ is identically equal to zero, while in
Case 4, $d_{it-2} = 1$ is implicitly imposed. In the estimation section below we will focus on
the general model (Case 4) and will therefore condition on $\Delta w_{it}\gamma_0 = 0$.
2.5. A variation of the main equation
In this section we consider a variation of the model where the main equation contains a
lag of the censored dependent variable instead of the latent one. Specifically, the main
equation is given by
$$y^*_{it} = \rho_0 y_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it} = \rho_0 d_{it-1} y^*_{it-1} + x^*_{it}\beta_0 + \alpha^*_i + \varepsilon^*_{it}.$$
The question then is whether the moment conditions (7), (8), (9) and (12) continue to hold
in this case as well. It is clear that the only case where this modification of the main
equation may play a role is in the derivation of (7). This moment condition, which uses
(uncensored) lags of the dependent variable as instruments for the equation in first differences, was derived by backward substitution of $y^*_{it-j}$ ($j \ge 2$). With the modification of the
main equation above, this backward substitution is still valid up to the time period
$t-j-s$ ($s = 1, \ldots, t-j$) in which the first zero is observed. Define $D_{itjs} \equiv d_{it} d_{it-1} d_{it-2} d_{it-j} d_{it-j-1} \cdots d_{it-j-s+1}$. Note that, for any $s = 1, \ldots, t-j$,
$$\begin{aligned}
E(y^*_{it-j}\,\Delta\varepsilon^*_{it} \mid D_{itjs} = 1, d_{it-j-s} = 0, \tilde{\zeta}_i)
&= E\Big(\Big(\rho_0^{s-1} y^*_{it-j-s+1} + \sum_{l=0}^{s-2} \rho_0^{l}\,(x^*_{it-j-l}\beta_0 + \alpha^*_i + \varepsilon^*_{it-j-l})\Big)\Delta\varepsilon^*_{it} \,\Big|\, D_{itjs} = 1, d_{it-j-s} = 0, \tilde{\zeta}_i\Big) \\
&= E\Big(\Big(\sum_{l=0}^{s-1} \rho_0^{l}\,(x^*_{it-j-l}\beta_0 + \alpha^*_i + \varepsilon^*_{it-j-l})\Big)\Delta\varepsilon^*_{it} \,\Big|\, D_{itjs} = 1, d_{it-j-s} = 0, \tilde{\zeta}_i\Big),
\end{aligned}$$
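since $y^*_{it-j-s+1} = x^*_{it-j-s+1}\beta_0 + \alpha^*_i + \varepsilon^*_{it-j-s+1}$ due to the fact that $d_{it-j-s} = 0$. Following
the analysis of Section 2.1,
$$E(\Delta\varepsilon^*_{it} \mid D_{itjs} = 1, d_{it-j-s} = 0, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) = E(\varepsilon^*_{it} \mid d_{it} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) = 0,$$
and for all $l = 0, \ldots, s-1$,
$$\begin{aligned}
E(\varepsilon^*_{it-j-l}\,\Delta\varepsilon^*_{it} \mid D_{itjs} = 1, d_{it-j-s} = 0, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i)
&= E(\varepsilon^*_{it-j-l} \mid d_{it-j-l} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) \\
&\quad \times [E(\varepsilon^*_{it} \mid d_{it} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i) - E(\varepsilon^*_{it-1} \mid d_{it-1} = 1, \Delta w_{it}\gamma_0 = 0, \tilde{\zeta}_i)] \\
&= 0.
\end{aligned}$$
These last two equations, which hold for any $s = 1, \ldots, t-j$ ($j \ge 2$), imply that the moment
condition (7) will continue to hold in the modified model as well.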
3. ESTIMATION OF DYNAMIC PANEL DATA SAMPLE
SELECTION MODELS
First we will define some notation. Let $\theta_0 \equiv (\rho_0, \beta_0')'$ denote the true parameter vector that
belongs to a subset $\Theta$ of $\Re^{k+1}$, and $z^*_{it} \equiv (y^*_{it-1}, x^*_{it})$. Define the following functions
$$m_{1it,j}(\theta) \equiv d_{it} d_{it-1} d_{it-2} d_{it-j}\, y^*_{it-j}\,(\Delta y^*_{it} - \Delta z^*_{it}\theta), \qquad t = 2, \ldots, T;\ j = 2, \ldots, t,$$
$$m_{2it,\kappa}(\theta) \equiv d_{is} d_{it} d_{it-1} d_{it-2}\, x^*_{is,\kappa}\,(\Delta y^*_{it} - \Delta z^*_{it}\theta), \qquad t = 2, \ldots, T;\ s = 1, \ldots, T;\ \kappa = 1, \ldots, k,$$
$$m_{3it}(\theta) \equiv d_{iT} d_{iT-1} d_{it} d_{it-1} d_{it-2}\,(y^*_{iT} - z^*_{iT}\theta)\,(\Delta y^*_{it} - \Delta z^*_{it}\theta), \qquad t = 2, \ldots, T-1,$$
$$m_{4it}(\theta) \equiv d_{it} d_{it-1} d_{it-2}\,((y^*_{it} - z^*_{it}\theta)^2 - (y^*_{it-1} - z^*_{it-1}\theta)^2), \qquad t = 2, \ldots, T,$$
where $\theta \in \Theta$. Also, define the $((t-1) \times 1)$ vector-valued function $m_{1it}(\theta) \equiv (m_{1it,2}(\theta), \ldots, m_{1it,t}(\theta))'$ and the $(k \times 1)$ vector-valued function $m_{2it}(\theta) \equiv (m_{2it,1}(\theta), \ldots, m_{2it,k}(\theta))'$.
As we have seen in the previous section, Assumption A1′ implies the following
orthogonality conditions for the general sample selection model (1)–(3)
$$E(m_{lit}(\theta_0) \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad l = 1, \ldots, 4. \qquad (14)$$
If $\gamma_0$ is known, and if all variables in $w_{it}$ are discrete and $\Pr(\Delta w_{it}\gamma_0 = 0) > 0$, a natural
way of estimating $\theta_0$ is to minimize the distance, according to some metric, of the sample
analogues of the moments above from zero, i.e. do GMM using
$$\frac{1}{n} \sum_{i=1}^{n} 1\{\Delta w_{it}\gamma_0 = 0\}\, m_{lit}(\theta), \qquad l = 1, \ldots, 4.$$
Obviously this estimation scheme will fail if one or more of the selection covariates $w_{it}$
are continuously distributed. Furthermore, $\gamma_0$ will generally be unknown. However, if the
functions $\Lambda_1$ and $\Lambda_2$ (see Assumption N3 in Appendix C) are sufficiently ‘‘smooth’’, then
a small value of $\Delta w_{it}\gamma$ for some $\gamma \in \Re^q$ will imply that
$\Delta\Lambda_{1it} = \Lambda_1(w_{it}\gamma, \tilde{\zeta}_i) - \Lambda_1(w_{it-1}\gamma, \tilde{\zeta}_i)$ and $\Delta\Lambda_{2it} = \Lambda_2(w_{it}\gamma, \tilde{\zeta}_i) - \Lambda_2(w_{it-1}\gamma, \tilde{\zeta}_i)$
are also small, so that the moment conditions above will be satisfied approximately. The
idea is then to replace the indicator function in the sample moments above with a weight
that depends on the magnitude of ∆wit γˆ n, where γˆ n is an initial consistent estimate of γ 0.
The sample averages will then converge to their population analogues provided that the
weights decline to zero for observations with ∆wit γˆ 0 ≠0 as sample size increases. We
choose kernel weights of the form
1
hn
K
冢
∆wit γˆ n
,
hn
冣
where hn is a bandwidth that shrinks to zero as n → S, while K(·) is a kernel ‘‘density’’
function. The proposed estimator then is a kernel-weighted GMM estimator that solves
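\[
\hat{\theta}_n = \arg\min_{\theta \in \Theta}\; \hat{G}_n(\theta)' A_n' A_n \hat{G}_n(\theta),
\]
where $A_n$ is a stochastic matrix that converges in probability to a finite non-stochastic limit $A_0$, and $\hat{G}_n(\theta)$ is the vector of stacked sample moments with rows of the form
\[
\frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} K\!\left( \frac{\Delta w_{it}\hat{\gamma}_n}{h_n} \right) m_{lit}(\theta).
\]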
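A minimal sketch of one row of the stacked kernel-weighted sample moments may help fix ideas. It assumes the standard normal density for K and the bandwidth n^(-1/5), which are the choices reported for the Monte Carlo study later in the paper; the function names and the list-based inputs are hypothetical and serve illustration only.

```python
import math


def normal_kernel(u):
    """Standard normal density, one admissible choice for K(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def monte_carlo_bandwidth(n):
    """Bandwidth h_n = n^(-1/5), which shrinks to zero as n grows."""
    return n ** (-1.0 / 5.0)


def kernel_weighted_moment(index_diffs, moments, bandwidth, kernel=normal_kernel):
    """Return (1/n) * sum_i (1/h) * K(index_diffs[i] / h) * moments[i].

    index_diffs[i] plays the role of Delta w_it * gamma_hat_n for individual i,
    and moments[i] the role of m_lit(theta) evaluated at a candidate theta.
    """
    n = len(moments)
    total = 0.0
    for diff, m in zip(index_diffs, moments):
        total += kernel(diff / bandwidth) / bandwidth * m
    return total / n
```

Observations whose estimated index difference is far from zero relative to the bandwidth receive a weight close to zero, which is how the kernel replaces the indicator 1{Δw_it γ0 = 0}.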
In Appendix C we present conditions under which the proposed estimators are consistent and asymptotically normal. Apart from Assumption A1′, a key condition for consistency is an exclusion restriction between $x^*_{it}$ and $w_{it}$. This is required so that $\beta_0$ is identified from the moment conditions (14) for $l = 1, 2$. Under appropriate smoothness assumptions, the estimators are asymptotically normal and achieve the same rate of convergence as in univariate nonparametric density and regression function estimation. Provided that $\gamma_0$ can be estimated at a sufficiently fast rate (Assumption N14), the asymptotic distribution of $\hat{\theta}_n$ does not depend on the asymptotic distribution of the first-step estimator $\hat{\gamma}_n$. Note that its asymptotic variance is of the standard GMM form, $(A_0^* D_0)^{-1} A_0^* V_0 A_0^{*\prime} (A_0^* D_0)^{-1}$, where $A_0^* \equiv D_0' A_0' A_0$. The definitions of $D_0$ and $V_0$ are given in the Appendix, where we also discuss how they can be consistently estimated.
The implementation of the proposed estimator requires that a consistent and sufficiently “fast” estimator of $\gamma_0$ (and $\phi_0$) be available (see Assumptions (C12) and (N14) in Appendix C).12 In the absence of dynamics in the selection equation, $\gamma_0$ may be estimated at the standard root-n rate under a logistic assumption on the errors of the binary choice model (see, for example, Chamberlain (1984)). If one is not willing to parameterize this distribution, one may use the “smoothed conditional maximum score estimator” (see Kyriazidou (1995), and Charlier et al. (1995)), which modifies Manski’s (1987) estimator by smoothing the score function in the manner suggested by Horowitz (1992). Under appropriate smoothness assumptions this “smoothed” estimator will converge at a rate sufficiently fast, as required by Assumptions (C12) and (N14). When the selection equation contains one lag of the dependent variable along with other exogenous variables, Honoré and Kyriazidou (2000) show that, in a logistic framework and under appropriate assumptions, $(\phi_0, \gamma_0)$ may also be estimated at a sufficiently fast rate.13 Alternatively, if one is willing to assume that the discrete choice selection equation contains a regressor that is independent of both the error term and the individual effect in that equation, it is possible to estimate $(\phi_0, \gamma_0)$ at the standard root-n rate in the manner suggested by Honoré and Lewbel (1999).

12. As noted for the static panel data sample selection model in Kyriazidou (1997), it is in principle possible to dispense with estimation of the selection equation altogether and condition on the event that $w_{it} = w_{it-1}$, which obviously implies that $w_{it}\gamma_0 = w_{it-1}\gamma_0$. However, in this case, only the coefficients in $\beta_0$ that correspond to the non-overlapping variables between $x^*_{it}$ and $w_{it}$ would be identified. Furthermore, in this case, the second-step estimator of the outcome equation would converge at an even slower rate. However, this approach may be desirable if one wants to avoid estimation of the selection equation.
13. As Honoré and Kyriazidou show, when the logit assumption is relaxed, the dynamic binary choice model (3) may still be estimated consistently. They do not however provide the rate of convergence in this case, although they conjecture that it may be sufficiently fast under appropriate smoothness conditions.
In order to compute $\hat{\theta}_n$ in practice, one needs to choose the kernel function K and to assign a numerical value to the bandwidth. The results in kernel density and regression function estimation suggest that the specific choice of the kernel function may not be as important as the choice of the bandwidth. For choosing the bandwidth one may follow the “plug-in” approach described in Kyriazidou (1997); see also Horowitz (1992) and Härdle (1990).
We finally turn to the choice of $A_0$, which enters the asymptotic variance term through $A_0^* \equiv D_0' A_0' A_0$. The choice of $A_0$ is therefore important in terms of efficiency of the proposed estimator. As Hansen (1982) shows, for the standard GMM estimator that has an asymptotic normal distribution with zero asymptotic bias and asymptotic covariance matrix $(A_0^* D_0)^{-1} A_0^* V_0 A_0^{*\prime} (A_0^* D_0)^{-1}$, $A_0$ may be chosen to satisfy $A_0' A_0 = V_0^{-1}$. In the Monte Carlo study that follows, we use this weight matrix, which implies that the covariance matrix of the proposed estimator is equal to $(D_0' V_0^{-1} D_0)^{-1}$. Note however that in principle this choice of weight matrix does not necessarily have the optimality properties as in the standard GMM context, since in our case the asymptotic variance of the estimator is also affected by the choice of bandwidth and kernel function.14
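The simplification of the covariance matrix under this weight matrix can be checked in a few lines. The following is a sketch of the algebra only, assuming that $V_0$ is symmetric and invertible and that $D_0' V_0^{-1} D_0$ is invertible; it uses no symbols beyond those defined above.

```latex
\begin{align*}
A_0' A_0 = V_0^{-1}
  \;&\Longrightarrow\; A_0^* = D_0' A_0' A_0 = D_0' V_0^{-1}, \\
A_0^* D_0 &= D_0' V_0^{-1} D_0, \\
A_0^* V_0 A_0^{*\prime} &= D_0' V_0^{-1} V_0 V_0^{-1} D_0 = D_0' V_0^{-1} D_0, \\
(A_0^* D_0)^{-1} A_0^* V_0 A_0^{*\prime} (A_0^* D_0)^{-1}
  &= (D_0' V_0^{-1} D_0)^{-1} (D_0' V_0^{-1} D_0) (D_0' V_0^{-1} D_0)^{-1}
   = (D_0' V_0^{-1} D_0)^{-1}.
\end{align*}
```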
4. MONTE CARLO EVIDENCE
In this section we report the results of a small Monte Carlo study that investigates the
finite sample properties of the proposed estimator as they compare to those of the ‘‘naive’’
estimator that ignores sample selectivity. We focus on a single specification where the
main equation follows a pure first-order autoregression with an individual specific drift.
This model has been used in recent papers to study the performance of GMM estimators
in linear dynamic panel data models (see e.g. Blundell and Bond (1998) and Wyhowski
(1996)). The selection equation follows a discrete choice model with strictly exogenous
regressors, individual effects, and logistic errors. Thus, the experiments are conducted
under the favourable environment where sufficiently fast estimation of γ is feasible,
namely, by conditional logit.
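Data for the Monte Carlo experiments are generated according to the model
\[
y^*_{it} = \rho_0 y^*_{it-1} + \alpha^*_i + \varepsilon^*_{it}, \qquad t = 1, 2, 3, \tag{1}
\]
\[
y_{it} = d_{it}\, y^*_{it}, \tag{2}
\]
\[
d_{it} = 1\{ w_{1,it}\gamma_{10} + w_{2,it}\gamma_{20} + \eta_i - u_{it} \geq 0 \}, \qquad t = 0, 1, 2, 3; \quad i = 1, \ldots, n. \tag{3}
\]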
In the selection equation, $w_{1,it}$ and $w_{2,it}$ are distributed as $N(-1, 1)$, $\eta_i = 3$, $u_{it}$ is logistically distributed, normalized to have unit variance, and $\gamma_{10} = \gamma_{20} = 1$. In the main equation, $\alpha_i = 1 + (w_{2,i1} + w_{2,i2})/2 + \sqrt{2}\,\xi_{1i}$, where $\xi_{1i}$ is a standard normal random variable, and $\varepsilon_{it} \equiv u_{it}$. The initial observation $y_{i0}$ is generated as $d_{i0}\, y^*_{i0}$, where $y^*_{i0} = \alpha_i/(1-\rho_0) + \xi_{2i}/\sqrt{1-\rho_0^2}$ with $\xi_{2i}$ distributed as $N(0, 1)$. This specification implies that the latent process $\{y^*_{it}\}$ is covariance stationary. Three different values are considered for ρ0: 0·3, 0·5 and 0·8. The variables $w_{1,it}$, $w_{2,it}$, $u_{it}$, $\xi_{1i}$ and $\xi_{2i}$ are all generated independent of each other, and are independent and identically distributed over time and across individuals. Four different sample sizes n are considered: 500, 1000, 4000 and 16,000. This design implies that $\Pr(d_{it} = 1)$ ≈ 0·7, $\Pr(d_{it} d_{it-1} d_{it-2} = 1)$ ≈ 0·33, and $\Pr(d_{it} d_{it-1} d_{it-2} d_{it-3} = 1)$ ≈ 0·26.

14. I thank an anonymous referee for pointing this out.
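The design can be simulated along the following lines. This is a minimal sketch under our reading of the description above (in particular $\eta_i = 3$ for all i, $\varepsilon_{it} \equiv u_{it}$, and a logistic error rescaled to unit variance); the function names, the seed argument and the tuple layout are hypothetical and are not taken from the paper.

```python
import math
import random


def draw_unit_variance_logistic(rng):
    """Logistic draw rescaled to variance one (a standard logistic has variance pi^2 / 3)."""
    p = rng.random()
    while p == 0.0:  # avoid log(0); random() never returns 1.0
        p = rng.random()
    return math.log(p / (1.0 - p)) * math.sqrt(3.0) / math.pi


def simulate_individual(rho0, rng, gamma10=1.0, gamma20=1.0, eta=3.0):
    """Return (d, y, w1, w2) for t = 0, 1, 2, 3 for one individual."""
    w1 = [rng.gauss(-1.0, 1.0) for _ in range(4)]
    w2 = [rng.gauss(-1.0, 1.0) for _ in range(4)]
    u = [draw_unit_variance_logistic(rng) for _ in range(4)]
    alpha = 1.0 + (w2[1] + w2[2]) / 2.0 + math.sqrt(2.0) * rng.gauss(0.0, 1.0)
    # Initial latent value drawn so that {y*_it} is covariance stationary.
    y_star = [alpha / (1.0 - rho0) + rng.gauss(0.0, 1.0) / math.sqrt(1.0 - rho0 ** 2)]
    for t in range(1, 4):
        y_star.append(rho0 * y_star[t - 1] + alpha + u[t])  # epsilon_it = u_it
    d = [int(gamma10 * w1[t] + gamma20 * w2[t] + eta - u[t] >= 0.0) for t in range(4)]
    y = [d[t] * y_star[t] for t in range(4)]  # y_it = d_it * y*_it
    return d, y, w1, w2


def simulate_panel(n, rho0, seed=0):
    """Draw n independent individuals with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_individual(rho0, rng) for _ in range(n)]


def selection_rate(panel):
    """Share of (i, t) cells with d_it = 1."""
    return sum(sum(d) for d, _, _, _ in panel) / (4.0 * len(panel))
```

With a large simulated panel, `selection_rate` should come out near the marginal selection probability of roughly 0·7 quoted in the text.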
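Both the “naive” and the proposed estimators considered in this section are GMM estimators that exploit (all or a subset of) the following sample moments:
\[
\frac{1}{n} \sum_{i=1}^{n} d_{i0} d_{i1} d_{i2}\, y_{i0} (\Delta y_{i2} - \rho \Delta y_{i1})\, \omega_{i2}, \tag{M1}
\]
\[
\frac{1}{n} \sum_{i=1}^{n} d_{i0} d_{i1} d_{i2} d_{i3}\, y_{i0} (\Delta y_{i3} - \rho \Delta y_{i2})\, \omega_{i3}, \tag{M2}
\]
\[
\frac{1}{n} \sum_{i=1}^{n} d_{i1} d_{i2} d_{i3}\, y_{i1} (\Delta y_{i3} - \rho \Delta y_{i2})\, \omega_{i3}, \tag{M3}
\]
\[
\frac{1}{n} \sum_{i=1}^{n} d_{i0} d_{i1} d_{i2} \left[ (y_{i2} - \rho y_{i1})^2 - (y_{i1} - \rho y_{i0})^2 \right] \omega_{i2}, \tag{M4}
\]
\[
\frac{1}{n} \sum_{i=1}^{n} d_{i1} d_{i2} d_{i3} \left[ (y_{i3} - \rho y_{i2})^2 - (y_{i2} - \rho y_{i1})^2 \right] \omega_{i3}, \tag{M5}
\]
\[
\frac{1}{n} \sum_{i=1}^{n} d_{i0} d_{i1} d_{i2} d_{i3}\, (y_{i3} - \rho y_{i2}) (\Delta y_{i2} - \rho \Delta y_{i1})\, \omega_{i2}. \tag{M6}
\]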
For the “naive” estimator, $\omega_{it} = 1$ for all i and t, while for the proposed estimator $\omega_{it} = K(\Delta w_{it}\hat{\gamma}/h_n)$, where $K(\cdot)$ is the standard normal density function, $h_n = n^{-1/5}$, and $\hat{\gamma}$ is obtained by conditional logit. In order to investigate how the estimation of $\gamma_0$ affects the results, we also consider the infeasible estimators that use the true $\gamma_0$ in the construction of the kernel weights.
The estimators solve the problem
\[
\min_{\rho}\; G_n' A_n' A_n G_n,
\]
where $G_n = (1/n) \sum_{i=1}^{n} g_{ni}(\rho)$ is a column vector that contains all or a subset of the sample moments (M1)–(M6) and $A_n$ is a weighting matrix. We will denote by IV the estimators that exploit only the linear (in ρ) moments (M1)–(M3), by GMM1 those that exploit in addition the sample moments (M4) and (M5), and by GMM2 the estimators that exploit all moments (M1)–(M6). The objective is to study how the finite sample properties of the estimators are affected as the non-linear moment restrictions are added to the linear ones. It has been noticed that for the linear dynamic panel data model, IV estimators that exploit only the linear restrictions perform poorly, especially for larger values of the autoregressive parameter, the reason being that the lagged values of the dependent variable are only weak instruments for the equation in first differences. For either the “naive” or the proposed approach, estimates are computed in a first stage using $A_n' A_n = I$, the identity matrix (conformable with the moment vector $G_n$). In a second stage, we compute estimates (denoted with o) using the “optimal” weighting matrix obtained from minimizing the asymptotic variance of the estimator.
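Because (M1)–(M3) are linear in ρ, the first-stage IV estimate with identity weighting has a closed form. The sketch below writes each moment as a_l − ρ b_l and minimizes the sum of squares; the function name, the tuple layout of the data and the optional weights argument are hypothetical, and the second-stage “optimal” weighting is not covered.

```python
def iv_estimate(panel, weights=None):
    """Identity-weighted estimate of rho from the linear moments (M1)-(M3).

    panel:   list of (d, y) with d = (d0, d1, d2, d3) and y = (y0, y1, y2, y3).
    weights: list of (omega_i2, omega_i3); None means omega = 1 (the "naive" case).
    """
    n = len(panel)
    a = [0.0, 0.0, 0.0]  # parts of the sample moments that do not involve rho
    b = [0.0, 0.0, 0.0]  # coefficients multiplying rho
    for i, (d, y) in enumerate(panel):
        om2, om3 = (1.0, 1.0) if weights is None else weights[i]
        dy1, dy2, dy3 = y[1] - y[0], y[2] - y[1], y[3] - y[2]
        s1 = d[0] * d[1] * d[2]          # selection indicator in (M1)
        s2 = d[0] * d[1] * d[2] * d[3]   # selection indicator in (M2)
        s3 = d[1] * d[2] * d[3]          # selection indicator in (M3)
        a[0] += s1 * y[0] * dy2 * om2 / n
        b[0] += s1 * y[0] * dy1 * om2 / n
        a[1] += s2 * y[0] * dy3 * om3 / n
        b[1] += s2 * y[0] * dy2 * om3 / n
        a[2] += s3 * y[1] * dy3 * om3 / n
        b[2] += s3 * y[1] * dy2 * om3 / n
    # G_n(rho) = a - rho * b, so G_n'G_n is minimized at rho = (b'a) / (b'b).
    return sum(x * z for x, z in zip(a, b)) / sum(z * z for z in b)
```

Passing kernel weights of the form K(Δw_it γ̂/h_n) through `weights` gives the proposed first-stage IV estimate, while `weights=None` gives the “naive” one.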
Tables 1–3 report the results for the Mean Bias, Median Bias, Root Mean Squared
Error, and Median Absolute Error of the estimates for the three different specifications
of ρ0 across 100 replications. For the design under investigation, we note that the IV,
GMM1 and GMM2 versions of the ‘‘naive’’ estimator are in general more biased (in
TABLE 1
ρ0 = 0·3
Proposed estimator true γ 0
‘‘Naive’’ estimator
Median
bias
RMSE
MAE
Mean
bias
Median
bias
RMSE
MAE
Mean
bias
Median
bias
RMSE
MAE
n = 500
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·0673
0·0727
0·1308
0·0882
0·1165
0·0567
0·0689
0·0750
0·1038
0·0670
0·0913
0·0391
0·1543
0·1526
0·2137
0·1857
0·1908
0·1531
0·1107
0·1130
0·1235
0·0885
0·1061
0·0976
A0·0259 A0·0196
A0·0256 A0·0367
0·0638
0·0594
0·0602
0·0063
0·0656
0·0594
0·0481
0·0061
0·2406
0·2281
0·2651
0·2285
0·2490
0·2266
0·1669
0·1426
0·1534
0·1354
0·1347
0·1227
A0·0326 A0·0128
A0·0370 A0·0245
0·0599
0·0644
0·0667
0·0227
0·0691
0·0480
0·0763
0·0191
0·2720
0·2584
0·3116
0·2749
0·2960
0·2898
0·1651
0·1430
0·1753
0·1572
0·1892
0·1396
n = 1000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·0644
0·0594
0·1044
0·0618
0·0925
0·0390
0·0668
0·0590
0·0888
0·0194
0·0724
0·0104
0·1394
0·1315
0·1805
0·1569
0·1622
0·1325
0·0858
0·0866
0·0972
0·0713
0·0888
0·0531
A0·0065 A0·0416
A0·0084 A0·0429
0·0396
0·0131
0·0403
0·0005
0·0432
0·0221
0·0280 A0·0019
0·2187
0·1982
0·2336
0·2098
0·2090
0·1739
0·1494
0·1341
0·1520
0·1017
0·1155
0·0723
A0·0036
A0·0118
0·0342
0·0357
0·0399
0·0288
A0·0296
A0·0561
0·0079
A0·0226
0·0143
A0·0071
0·2625
0·2355
0·2803
0·2544
0·2533
0·2310
0·1834
0·1470
0·1880
0·1292
0·1425
0·0977
n = 4000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·0761
0·0720
0·1056
0·0407
0·0880
0·0139
0·0660
0·0640
0·1043
0·0311
0·0884
0·0145
0·0952
0·0901
0·1300
0·0713
0·1119
0·0436
0·0660
0·0640
0·1043
0·0382
0·0884
0·0286
A0·0041 A0·0036
A0·0069 A0·0103
0·0211
0·0096
0·0041
0·0019
0·0202
0·0098
0·0006
0·0029
0·1195
0·1127
0·1144
0·0925
0·1020
0·0689
0·0827
0·0772
0·0731
0·0605
0·0562
0·0472
A0·0136
0·0029
A0·0177 A0·0082
0·0276
0·0102
0·0068
0·0014
0·0269
0·0124
A0·0002 A0·0117
0·1477
0·1366
0·1392
0·1079
0·1270
0·0811
0·0936
0·0895
0·0899
0·0703
0·0747
0·0556
n = 16,000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·0809
0·0752
0·1042
0·0378
0·0873
0·0161
0·0818
0·0761
0·1051
0·0387
0·0856
0·0158
0·0851
0·0795
0·1089
0·0441
0·0915
0·0245
0·0818
0·0761
0·1051
0·0387
0·0856
0·0170
A0·0067 A0·0035
A0·0067 A0·0038
0·0049
0·0009
0·0012
0·0001
0·0060
0·0032
0·0027
0·0037
0·0582
0·0538
0·0515
0·0411
0·0446
0·0347
0·0326
0·0370
0·0288
0·0238
0·0255
0·0200
A0·0099
0·0009
A0·0107 A0·0129
0·0064
0·0006
0·0002
0·0002
0·0084
0·0034
0·0038
0·0019
0·0719
0·0666
0·0652
0·0498
0·0556
0·0425
0·0368
0·0429
0·0387
0·0302
0·0321
0·0224
Mean
bias
Proposed estimator estimated γ 0
TABLE 2
ρ0 = 0·5
Proposed estimator true γ 0
‘‘Naive’’ estimator
Median
bias
RMSE
MAE
Mean
bias
Median
bias
RMSE
MAE
Mean
bias
Median
bias
MAE
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·0972
0·1015
0·1753
0·1505
0·1721
0·1198
0·1047
0·1221
0·1792
0·1090
0·1639
0·0742
0·2331
0·2290
0·2675
0·2736
0·2582
0·2569
0·1547
0·1468
0·1987
0·1558
0·1763
0·1513
A0·0590 A0·0471
A0·0757 A0·0829
0·0697
0·1050
0·0664
0·0172
0·0758
0·1166
0·0623
0·0255
0·3554
0·3418
0·3273
0·2872
0·2995
0·2981
0·2714
0·2324
0·2093
0·2037
0·1930
0·1966
A0·0795 A0·0537
A0·1056 A0·1057
0·0512
0·0921
0·0583
0·0172
0·0582
0·0896
0·0399
0·0201
0·4198
0·4066
0·3482
0·3251
0·3445
0·2981
0·2737
0·2634
0·2354
0·2108
0·2367
0·2118
n = 1000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·1133
0·1018
0·1573
0·1230
0·1502
0·1015
0·1020
0·0961
0·1624
0·0708
0·1477
0·0303
0·2222
0·2054
0·2434
0·2364
0·2340
0·2264
0·1353
0·1231
0·1649
0·1057
0·1520
0·0759
A0·0086
A0·0269
0·0519
0·0346
0·0642
0·0576
A0·0901
A0·0701
0·0373
A0·0231
0·0467
A0·0051
0·3638
0·3102
0·2966
0·3155
0·2793
0·2769
0·2381
0·2021
0·2021
0·1237
0·1739
0·0899
A0·0146
A0·0453
A0·0071
0·0225
0·0415
0·0074
A0·0686
A0·1197
0·0024
A0·0459
0·0259
A0·0271
0·4182
0·3620
0·3415
0·2804
0·2987
0·2899
0·2703
0·2488
0·2136
0·1292
0·1929
0·1222
n = 4000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·1308
0·1179
0·1684
0·1091
0·1549
0·0567
0·1190
0·1030
0·1607
0·0743
0·1513
0·0322
0·1595
0·1443
0·1990
0·1651
0·1871
0·1232
0·1190
0·1030
0·1607
0·0794
0·1513
0·0546
A0·0067 A0·0124
A0·0172 A0·0138
0·0404
0·0432
0·0213
0·0054
0·0429
0·0253
0·0233 A0·0020
0·1804
0·1613
0·1627
0·1451
0·1554
0·1387
0·1059
0·1066
0·0966
0·0870
0·0838
0·0701
A0·0213 A0·0261
A0·0363 A0·0302
0·0522
0·0498
0·0393
0·0099
0·0549
0·0254
0·0225 A0·0086
0·2262
0·1988
0·1988
0·1911
0·1912
0·1774
0·1344
0·1328
0·1202
0·1049
0·1101
0·0827
n = 16,000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·1391
0·1239
0·1710
0·0899
0·1560
0·0386
0·1401
0·1223
0·1714
0·0824
0·1574
0·0350
0·1447
0·1296
0·1771
0·1028
0·1620
0·0504
0·1401
0·1223
0·1714
0·0824
0·1574
0·0374
A0·0088 A0·0042
A0·0104 A0·0051
0·0128
0·0036
0·0096 A0·0007
0·0146
0·0050
0·0101
0·0031
0·0926
0·0838
0·0866
0·0786
0·0796
0·0795
0·0513
0·0547
0·0441
0·0371
0·0376
0·0273
A0·0136 A0·0062
A0·0173 A0·0140
0·0176
0·0054
0·0121
0·0008
0·0212
0·0039
0·0112 A0·0028
0·1133
0·1028
0·1103
0·1009
0·1029
0·0905
0·0576
0·0623
0·0569
0·0461
0·0511
0·0317
RMSE
n = 500
Mean
bias
Proposed estimator estimated γ 0
TABLE 3
ρ0 = 0·8
Proposed estimator true γ 0
‘‘Naive’’ estimator
Median
bias
MAE
Mean
bias
Median
bias
RMSE
RMSE
A0·0323
0·0795
A0·0601 A0·0125
0·0895
0·1673
A0·0044
0·0792
0·1301
0·1780
0·0609
0·1291
0·8188
0·6660
0·3490
0·4221
0·2618
0·3234
0·3519
0·3239
0·2191
0·2546
0·2138
0·2174
A0·2982
A0·3652
A0·0793
A0·0254
A0·0634
A0·0373
A0·3034
A0·3729
0·0197
A0·0112
0·0242
A0·0064
Proposed estimator estimated γ 0
MAE
Mean
bias
Median
bias
RMSE
MAE
0·8982
0·8955
0·4201
0·3340
0·3775
0·3072
0·6056
0·5209
0·1976
0·2080
0·1864
0·2158
A0·3437
A0·3870
A0·0989
A0·0813
A0·0922
A0·1003
A0·3764
A0·3930
0·0051
A0·0457
0·0178
A0·0130
1·3143
1·1134
0·4231
0·3642
0·4071
0·3634
0·5875
0·5447
0·1825
0·1875
0·1675
0·2148
n = 500
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
n = 1000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·2534
0·2330
0·1575
0·0908
0·1673
0·1313
0·1506
0·1484
0·1884
0·1366
0·1903
0·1567
0·7700
0·6340
0·2985
0·3512
0·2617
0·2732
0·3889
0·3313
0·2169
0·2605
0·2132
0·2279
A0·3678
A0·4020
A0·2307
A0·1586
A0·1203
A0·0723
A0·4410
A0·4656
A0·0886
A0·1676
A0·0321
A0·0571
0·9140
0·8898
0·5246
0·3822
0·3783
0·2833
0·4997
0·5704
0·2611
0·2344
0·1852
0·1854
A0·4834
A0·4740
A0·2335
A0·1642
A0·1851
A0·1419
A0·4485
A0·4680
A0·1011
A0·1346
A0·0687
A0·0850
1·1328
1·0287
0·5319
0·3979
0·4697
0·3585
0·5850
0·5541
0·2036
0·2248
0·1862
0·1948
n = 4000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·4195
0·3513
0·2685
0·2734
0·2661
0·2659
0·3283
0·2827
0·2667
0·2857
0·2641
0·2790
0·5480
0·4557
0·2946
0·3014
0·2915
0·3004
0·3283
0·2827
0·2667
0·2857
0·2641
0·2790
A0·0582
A0·1653
0·0053
A0·0680
0·0155
0·0106
A0·1573
A0·2017
0·0084
A0·0756
0·0205
A0·0081
0·5304
0·4395
0·2506
0·3311
0·2347
0·2386
0·3250
0·2729
0·1769
0·1638
0·1619
0·1364
A0·1281
A0·2294
A0·0390
A0·0588
A0·0093
A0·0109
A0·2314
A0·3348
0·0025
A0·0597
0·0091
A0·0156
0·7125
0·6054
0·3068
0·3575
0·2557
0·2729
0·4076
0·4110
0·1740
0·1762
0·1620
0·1482
n = 16,000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·4426
0·3732
0·3235
0·3209
0·3227
0·3170
0·4506
0·3695
0·3258
0·3317
0·3258
0·3330
0·4617
0·3908
0·3299
0·3254
0·3291
0·3224
0·4506
0·3695
0·3258
0·3317
0·3258
0·3330
A0·0191
A0·0412
0·0203
0·0073
0·0259
0·0368
A0·0488
A0·0572
A0·0088
A0·0162
A0·0087
0·0177
0·2688
0·2549
0·1856
0·2294
0·1782
0·1730
0·1564
0·1414
0·1104
0·0941
0·1053
0·0783
A0·0418
A0·0785
0·0138
A0·0031
0·0217
0·0172
A0·0679
A0·0988
A0·0255
A0·0442
A0·0205
A0·0075
0·3143
0·2942
0·1992
0·2454
0·1904
0·1965
0·1916
0·1771
0·1200
0·1017
0·1129
0·0941
Mean
bias
absolute magnitude) than the respective versions of the proposed estimator (for the chosen bandwidth). An interesting exception occurs for ρ0 = 0·8 and for the sample sizes of 500 and 1000 (see Table 3). However, the bias of the “naive” estimators increases substantially in this case for the larger sample sizes. The “naive” estimators are obviously inconsistent, as is demonstrated by the failure of their RMSE to decrease at the appropriate rate as sample size increases. In contrast, the RMSE of the proposed estimators decreases with sample size at a rate which is almost equal to $\sqrt{n}$, although it is slower for the high value of the autoregressive parameter. As expected, using the estimated $\gamma_0$ in the kernel weights almost invariably increases the RMSE of the proposed estimators relative to the RMSE of the infeasible estimators that use the true $\gamma_0$. The only exception occurs for ρ0 = 0·5 and n = 1000 for the estimator GMM1. The MAE also tends to be smaller for the infeasible estimators than the feasible ones, although in some cases, in particular for ρ0 = 0·8 and smaller sample sizes, the MAE is slightly larger for the infeasible estimators. With respect to either measure of the bias, the evidence concerning the effect of estimating $\gamma_0$ is mixed. It appears that when the true $\gamma_0$ is used, the finite sample biases of the proposed estimators are smaller for higher values of ρ0 and for larger sample sizes.
Concerning the relative performance of the IV and nonlinear GMM versions of the proposed estimators, we note that the IV estimator is invariably negatively biased on average and it tends to perform worse both in terms of bias and dispersion as ρ0 increases. This is similar to the findings for IV estimators in linear dynamic panel data models (see e.g. Wyhowski (1996)). Note that in contrast, the “naive” IV estimators are positively biased except for ρ0 = 0·8 and for n = 500. Adding the nonlinear moments (M4)–(M5) for GMM1 and (M6) for GMM2 in general decreases the dispersion of the estimates and also the bias for the larger values of ρ0. We note small efficiency gains, especially for large sample sizes, when the restrictions (M6) are added. The use of the “optimal” weighting matrix improves in general on the efficiency of the estimates, although it does not always reduce their bias.15
In general, we observe a deterioration of the finite sample properties of all versions of the proposed estimator as the autoregressive parameter increases. This may also be seen from Figures 1 and 2, where we plot the kernel-smoothed density16 of the IVo, GMM1o, and GMM2o estimates for n = 500 and n = 16,000, respectively, against that of a normal with the same mean and standard deviation as the ones obtained for the estimators in 100 replications. The figures also allow us to assess the asymptotic approximation to the sampling distribution of the estimators. We find that the normal approximation is reasonable for the smaller values of the autoregressive parameter, but there is substantial evidence of non-normality for ρ0 = 0·8 even for the sample size of 16,000. For comparison, we plot in Figures 3 and 4 the smoothed density of the IVo, GMM1o and GMM2o estimators in the absence of sample selectivity, i.e. using $d_{it} = 1$ and $\omega_{it} = 1$ for all i and t in the sample moments (M1)–(M6). (Note that these estimators use the entire samples and are infeasible in practice.) The plots suggest that normality may be a poor approximation for the GMM linear panel data estimators for high values of the autoregressive parameter even for very large sample sizes and that it is not a problem specific to the proposed kernel-weighted GMM approach.
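The density comparison described above can be sketched as follows. The paper states only that a standard normal kernel and Silverman's (1986) rule-of-thumb bandwidth were used; the specific variant 1.06 σ n^(-1/5) below is an assumption (Silverman also gives a variant based on the interquartile range), and all function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sigma * n^(-1/5) for a normal kernel (assumed variant)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def kernel_density(sample, x, bandwidth=None):
    """Kernel density estimate at x using the standard normal kernel."""
    h = silverman_bandwidth(sample) if bandwidth is None else bandwidth
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
    return total / (len(sample) * h * math.sqrt(2.0 * math.pi))


def matching_normal_density(sample, x):
    """Density at x of a normal with the sample's mean and standard deviation."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
```

Evaluating both functions on a grid of points, with `sample` holding the estimates from the replications, gives the two curves that such a figure would overlay.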
15. In the absence of sample selection, as Wyhowski (1996) reports, both biases and sampling variances are significantly affected by the fact that the optimal weighting matrices are estimated.
16. The smoothing was done using a standard normal kernel and the rule of thumb bandwidth suggested by Silverman (1986) for density estimation.

FIGURE 1. n = 500
FIGURE 2. n = 16,000
FIGURE 3. n = 500
FIGURE 4. n = 16,000

Finally, we consider the effect of increasing the length of the panel by drawing observations from (1)–(3) for t = 0, 1, ..., 4. The number of linear moment restrictions increases to six, of the nonlinear homoskedasticity restrictions to three, and of the restrictions of the type of (M6) to two. The results are reported in Table 4. Due to space considerations we only report results for the proposed estimators that use the estimated $\gamma_0$. We note that almost invariably the RMSE decreases with the inclusion of the new moment restrictions that result by increasing the length of the panel by one period, although biases are sometimes higher, especially for larger ρ0 and smaller n. In other words, including the additional moment conditions improves on the efficiency of the estimators, as expected, although it does not necessarily decrease their finite sample bias.
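The moment counts quoted for the longer panel can be reproduced with a small counting function. The general formulas below are our inference from the pattern of (M1)–(M6) and from the counts stated in the text (three, two and one for t = 0, ..., 3; six, three and two for t = 0, ..., 4); they are not given in this form in the paper, and the function name is hypothetical.

```python
def moment_counts(last_period):
    """Counts of moment restrictions for a panel observed at t = 0, 1, ..., last_period.

    Returns (linear, homoskedasticity, m6_type).
    """
    # Equation in first differences at period t uses y_0, ..., y_{t-2} as instruments.
    linear = sum(t - 1 for t in range(2, last_period + 1))
    # One restriction of the (M4)/(M5) type per period t = 2, ..., last_period.
    homoskedasticity = last_period - 1
    # One restriction of the (M6) type per period t = 2, ..., last_period - 1.
    m6_type = last_period - 2
    return linear, homoskedasticity, m6_type
```

For example, `moment_counts(3)` corresponds to the baseline design with moments (M1)–(M6), and `moment_counts(4)` to the longer panel.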
5. CONCLUSIONS
In this paper, we considered the problem of identification and estimation in panel data
sample selection models with a binary selection rule when the latent equations contain
strictly exogenous variables, (own) lags of the dependent variables, and additive unobserved individual effects. Under a stationarity and serial independence assumption on the
time-varying error vector, we derived a set of conditional moment restrictions which were
used to construct GMM-type estimators that are consistent and asymptotically normal
under a set of mild regularity conditions. An advantage of the approach taken in this
paper is that it does not require any assumptions on the parametric form of the distribution of the unobservables conditional on the observed covariates and the initial conditions. We should point out, however, a potentially serious limitation of the model
considered in this paper, namely that the selection equation (3) does not contain the lagged
continuous endogenous variable or other predetermined variables.
APPENDIX
Part A
The assumptions SA1–SA4 for the linear model (1′) are sometimes complemented by the “stationarity” assumption suggested by Arellano and Bover (1995):
SA5: $E(y^*_{it}\alpha^*_i)$ is the same for all i and for all t (or equivalently $E(\alpha_i^{*2}) = (1-\rho_0) E(\alpha^*_i y^*_{i0})$).
SA5 implies one additional moment restriction, namely
\[
E\left( (\alpha^*_i + \varepsilon^*_{i2}) \Delta y^*_{i1} \right) = 0, \tag{15}
\]
(see equation (39) in Arellano and Bover (1995)), while it also allows all moment conditions to be expressed linearly in $\rho_0$ (see equations (12a) and (12b) in Ahn and Schmidt (1995)).
As Blundell and Bond (1998) notice, the restriction in (15) holds if in addition to SA1–SA3 we assume:
SA5′: $y^*_{i0} = \alpha^*_i/(1-\rho_0) + v^*_{i0}$ and $E(v^*_{i0}\alpha^*_i) = E(v^*_{i0}\varepsilon^*_{i2}) = 0$ for all i,
which in conjunction with SA2 implies SA5. Assumption SA5′ will be satisfied when $|\rho_0| < 1$ and the model in (1′) along with assumptions SA2 and SA3 hold for all $t = \ldots, 0, 1, \ldots, T$.17 Then $v^*_{i0} \equiv \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j}$, since in this case $y^*_{i0} = \sum_{j=0}^{\infty} \rho_0^j (\alpha^*_i + \varepsilon^*_{i,0-j}) = \alpha^*_i/(1-\rho_0) + \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j}$, although as discussed by Blundell and Bond (1998), these assumptions are not necessary for (15) to hold.
We will next examine whether an analogue to (15) holds for the sample selection model (1′)–(3′). As in the linear case, we will need to make additional assumptions about the initial period and possibly about the presample periods as well. An interesting case to examine is when the model (1′)–(3′) holds for all $t = \ldots, -2, -1, 0, 1, 2, \ldots, T$, and $|\rho_0| < 1$. In this case $y^*_{i0} = \alpha^*_i/(1-\rho_0) + v_{i0}$, where now $v_{i0} = \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j}$. Then the analogue of the left-hand side of (15) in the presence of sample selection is
\[
E\left( d_{i2} d_{i1} d_{i0} (y_{i2} - \rho_0 y_{i1}) \Delta y_{i1} \right)
= E\left( d_{i2} d_{i1} d_{i0} (\alpha^*_i + \varepsilon^*_{i2}) \left( (\rho_0 - 1) \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j} + \varepsilon^*_{i1} \right) \right).
\]
17. Note that, under these assumptions, SA1 is implied by SA2 and SA3.
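The second factor inside the expectation on the right-hand side of the last display follows from a short substitution. The sketch below assumes that (1′) is the pure first-order autoregression $y^*_{it} = \rho_0 y^*_{it-1} + \alpha^*_i + \varepsilon^*_{it}$ and uses the representation $y^*_{i0} = \alpha^*_i/(1-\rho_0) + v_{i0}$ stated above; on the event $d_{i2} d_{i1} d_{i0} = 1$ the observed and latent variables coincide.

```latex
\begin{align*}
y_{i2} - \rho_0 y_{i1} &= \alpha^*_i + \varepsilon^*_{i2}, \\
\Delta y_{i1} = y^*_{i1} - y^*_{i0}
  &= (\rho_0 - 1)\, y^*_{i0} + \alpha^*_i + \varepsilon^*_{i1} \\
  &= (\rho_0 - 1) \left( \frac{\alpha^*_i}{1 - \rho_0} + v_{i0} \right) + \alpha^*_i + \varepsilon^*_{i1} \\
  &= (\rho_0 - 1)\, v_{i0} + \varepsilon^*_{i1}
   = (\rho_0 - 1) \sum_{j=0}^{\infty} \rho_0^j \varepsilon^*_{i,0-j} + \varepsilon^*_{i1}.
\end{align*}
```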
TABLE 4
Proposed estimator; estimated γ0; T = 5
ρ0 = 0·3
ρ0 = 0·5
ρ0 = 0·8
Median
bias
RMSE
MAE
Mean
bias
Median
bias
RMSE
MAE
Mean
bias
Median
bias
MAE
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
A0·0784
A0·0914
0·0352
0·0317
0·0469
0·0228
A0·1124
A0·1427
0·0012
A0·0282
0·0113
A0·0109
0·2493
0·2358
0·2938
0·2620
0·2597
0·2436
0·1716
0·1844
0·1709
0·1612
0·1522
0·1313
A0·1424
A0·1684
0·0161
0·0079
0·0314
0·0222
A0·1953
A0·2377
0·0113
A0·0357
0·0031
A0·0351
0·3640
0·3436
0·3181
0·3278
0·2872
0·2827
0·2550
0·2646
0·1977
0·2227
0·1778
0·1856
A0·5416
A0·6235
A0·1884
A0·1475
A0·1330
A0·1160
A0·5491
A0·5923
A0·0603
A0·0937
A0·0286
A0·0595
0·7980
0·7488
0·4522
0·3458
0·3806
0·3265
0·5518
0·6004
0·1901
0·2201
0·1815
0·1751
n = 1000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
A0·0331
A0·0607
0·0617
0·0227
0·0626
0·0123
A0·0582
A0·0764
0·0297
A0·0046
0·0345
A0·0188
0·1770
0·1604
0·2242
0·1817
0·2071
0·1633
0·1256
0·1262
0·1392
0·1048
0·1299
0·1050
A0·0765
A0·1210
0·0581
0·0148
0·0769
0·0352
A0·1169
A0·1587
0·0521
A0·0007
0·0744
A0·0062
0·2652
0·2423
0·2614
0·2646
0·2449
0·2377
0·1911
0·1810
0·1803
0·1549
0·1760
0·1450
A0·3521
A0·4995
A0·1723
A0·1524
A0·1214
A0·0953
A0·4061
A0·4745
A0·0876
A0·1189
A0·0580
A0·0630
0·6428
0·6305
0·4074
0·3656
0·3533
0·3146
0·4336
0·4786
0·1892
0·2048
0·1652
0·1474
n = 4000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
A0·0180 A0·0306
A0·0293 A0·0417
0·0196
0·0228
A0·0018
0·0005
0·0216
0·0283
A0·0026 A0·0074
0·1028
0·0914
0·1070
0·0681
0·0978
0·0625
0·0614
0·0643
0·0737
0·0452
0·0640
0·0456
A0·0287
A0·0548
0·0327
0·0102
0·0374
0·0058
A0·0415
A0·0740
0·0306
A0·0042
0·0308
A0·0091
0·1629
0·1431
0·1483
0·1099
0·1396
0·0983
0·1019
0·1064
0·0955
0·0636
0·0852
0·0580
A0·1028
A0·2256
A0·0132
A0·0146
A0·0030
0·0149
A0·1582
A0·2460
A0·0148
A0·0372
A0·0041
A0·0020
0·4201
0·4026
0·2279
0·2111
0·2106
0·1678
0·2340
0·3035
0·1286
0·1355
0·1179
0·1118
n = 16,000
IV
IVo
GMM1
GMM1o
GMM2
GMM2o
0·0021
A0·0008
0·0124
A0·0011
0·0123
A0·0007
0·0603
0·0571
0·0605
0·0461
0·0557
0·0383
0·0421
0·0426
0·0432
0·0304
0·0409
0·0227
A0·0005
A0·0085
0·0201
0·0002
0·0217
A0·0018
A0·0169
A0·0204
0·0058
A0·0077
0·0110
A0·0102
0·0923
0·0839
0·0893
0·0650
0·0843
0·0491
0·0711
0·0619
0·0639
0·0397
0·0611
0·0292
A0·0218 A0·0448
A0·0948 A0·1113
0·0192
0·0175
0·0327
0·0046
0·0253
0·0165
0·0312
0·0223
0·2321
0·2185
0·1763
0·1696
0·1692
0·1545
0·1493
0·1514
0·1214
0·0977
0·1133
0·0795
A0·0057
A0·0099
0·0021
A0·0061
0·0089
A0·0056
RMSE
n = 500
Mean
bias
Assuming that $\{(\varepsilon^*_{it}, u_{it})\}_{t=0}^{T}$ is i.i.d. over time for all i conditional on $\bar{\zeta}_i \equiv (\{w_{it}\}_{t=0}^{T}, \alpha^*_i, \eta_i)$, which is a natural extension of Assumption A1, a sufficient condition for the expectation above to be zero is
\[
(1 - \rho_0) \sum_{j=0}^{\infty} \rho_0^j\, E(\varepsilon^*_{i,0-j} \mid d_{i0} = 1, \bar{\zeta}_i) = E(\varepsilon^*_{i1} \mid d_{i1} = 1, \bar{\zeta}_i).
\]
This last equality will be satisfied if $w_{i1}\gamma_0 = w_{i0}\gamma_0$, which implies that $\bar{\Lambda}_1(w_{i0}\gamma_0, \bar{\zeta}_i) \equiv E(\varepsilon^*_{i0} \mid d_{i0} = 1, \bar{\zeta}_i) = E(\varepsilon^*_{i1} \mid d_{i1} = 1, \bar{\zeta}_i) \equiv \bar{\Lambda}_1(w_{i1}\gamma_0, \bar{\zeta}_i)$, and in addition $E(\varepsilon^*_{i,0-j} \mid d_{i0} = 1, \bar{\zeta}_i) = E(\varepsilon^*_{i0} \mid d_{i0} = 1, \bar{\zeta}_i) = \bar{\Lambda}_1(w_{i1}\gamma_0, \bar{\zeta}_i)$ for all $j > 0$. This last condition, however, that the effect of sample selection in the initial sample period t = 0 on the presample errors is constant, does not seem tenable. It will not be satisfied in general if we extend, for example, the conditional independence over time assumption on the error vector $(\varepsilon^*_{it}, u_{it})$ to all periods $t = \ldots, -2, -1, 0, 1, 2, \ldots, T$. We will therefore not pursue this restriction further.
Part B
For the linear model (1), assumptions SA6 and SA6′ are sometimes complemented by the “stationarity” assumption (compare to SA5):
SA7: $E(x^*_{it}\alpha^*_i)$ is the same for all t for each i,
(see Bhargava and Sargan (1983) and Breusch, Mizon and Schmidt (1989) for the case where $x^*_{it}$ is strictly exogenous, and Arellano and Bover (1995) when it is only predetermined). Under SA1–SA5, SA6 and SA7 we obtain the following $T(T-1) + 2(T-1)$ moment conditions in addition to (4), (5), (6), (10) and (12):18
\[
E\left( \Delta x^*_{it} (\alpha^*_i + \varepsilon^*_{is}) \right) = 0, \qquad t = 2, \ldots, T; \quad s = 1, \ldots, T, \tag{16}
\]
\[
E\left( x^*_{is} (\alpha^*_i + \varepsilon^*_{it}) - x^*_{it} (\alpha^*_i + \varepsilon^*_{is}) \right) = 0, \qquad \forall\, t \neq s, \tag{17}
\]
\[
E\left( x^*_{it} (\alpha^*_i + \varepsilon^*_{it}) - x^*_{is} (\alpha^*_i + \varepsilon^*_{is}) \right) = 0, \qquad \forall\, t \neq s. \tag{18}
\]
If SA6′ holds, then we only obtain the following $T-1$ restrictions in addition to (4), (5), (6), (15) and (11′) (see Arellano and Bover (1995)):
\[
E\left( \Delta x^*_{it} (\alpha^*_i + \varepsilon^*_{it}) \right) = 0, \qquad t = 2, \ldots, T. \tag{16′}
\]
In this appendix we examine whether restrictions similar to (16)–(18) may be obtained for the sample selection model. First, consider the analogue of the left-hand side of (16):
\[
E\left( d_{it} d_{it-1} d_{is} d_{is-1}\, \Delta x_{it} (y_{is} - \rho_0 y_{is-1} - x_{is}\beta_0) \right) = E\left( d_{it} d_{it-1} d_{is} d_{is-1}\, \Delta x^*_{it} (\alpha^*_i + \varepsilon^*_{is}) \right). \tag{19}
\]
It is clear that Assumption A1′ will in general not suffice for the expectation above to be zero, even if $\alpha^*_i$ is identically zero for all i. The reason is that without any further restrictions on the time series properties of $x^*_{it}$ as well as on the manner by which the $x^*_{it}$ process is affected by sample selection, the presence of the latter potentially destroys any assumed stationarity in the correlation structure between $x^*_{it}$ and $\varepsilon^*_{is}$ (and between $x^*_{it}$ and $\alpha^*_i$). In other words, A1′ in general will not imply that either
\[
E(\alpha^*_i \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1) = 0,
\]
or
\[
E(\varepsilon^*_{is} \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1) = 0.
\]
Suppose however that, in addition to A1′, the following assumption holds:
B1: $(x^*_{it}, \varepsilon^*_{it}, u_{it})$ is i.i.d. over time for all i conditional on $\zeta_i \equiv (w_i, \alpha^*_i, \eta_i, y^*_{i0}, d_{i0})$.
Note that B1 implies that $x^*_{it}$ is i.i.d. over time for all i conditional on $\zeta_i$, which in turn implies SA7.
We will next demonstrate how to derive an analogue to (16) in the presence of sample selection. Consider
\[
E(\alpha^*_i \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1) = E\left( \alpha^*_i\, E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i) \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1 \right).
\]
Given that $(x^*_{it}, u_{it})$ is independent over time conditional on $\zeta_i$ by B1, we have
\[
E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i) = E(x^*_{it} \mid d_{it} = 1, \zeta_i) - E(x^*_{it-1} \mid d_{it-1} = 1, \zeta_i),
\]
and by the stationarity of $(x^*_{it}, u_{it})$ conditional on $\zeta_i$, we obtain
\[
E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \Delta w_{it}\gamma_0 = 0, \zeta_i) = 0,
\]
and hence
\[
E(d_{it} d_{it-1} d_{is} d_{is-1}\, \alpha^*_i \Delta x^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \tag{20}
\]
if $x^*_{it}$ and $u_{it}$ are not assumed independent conditional on $\zeta_i$, or if they are independent,
\[
E(\Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i) = E(\Delta x^*_{it} \mid \zeta_i) = 0,
\]
and hence
\[
E(d_{it} d_{it-1} d_{is} d_{is-1}\, \alpha^*_i \Delta x^*_{it}) = 0. \tag{20′}
\]
18. As Arellano and Bover (1995, page 45) observe, some of the moment conditions (16)–(18) will be redundant given those in (12).
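For the second term in (19), we have
\[
E(\varepsilon^*_{is} \Delta x^*_{it} \mid d_{it} d_{it-1} d_{is} d_{is-1} = 1, \zeta_i)
= \left[ E(x^*_{it} \mid d_{it} = 1, \zeta_i) - E(x^*_{it-1} \mid d_{it-1} = 1, \zeta_i) \right] E(\varepsilon^*_{is} \mid d_{is} = 1, \zeta_i),
\]
for $t, t-1 \neq s$, and hence
\[
E(d_{it} d_{it-1} d_{is} d_{is-1}\, \varepsilon^*_{is} \Delta x^*_{it} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad \forall\, t, t-1 \neq s, \tag{21}
\]
if $x^*_{it}$ and $u_{it}$ are not assumed independent conditional on $\zeta_i$, or if they are independent, we obtain
\[
E(d_{it} d_{it-1} d_{is} d_{is-1}\, \varepsilon^*_{is} \Delta x^*_{it}) = 0, \qquad \forall\, t, t-1 \neq s. \tag{21′}
\]
Note that (21) and (21′) will hold for all s and t, similar to the linear case, if in addition we assume that for all t, $\varepsilon^*_{it}$ and $x^*_{it}$ are independent conditional on $d_{it}$ and $\zeta_i$. Now, combining (20) and (21) (or (20′) and (21′)) we obtain the following moment restriction that is an analogue to (16):
\[
E(d_{it} d_{it-1} d_{is} d_{is-1}\, \Delta x^*_{it} \varepsilon^*_{is} \mid \Delta w_{it}\gamma_0 = 0) = 0, \qquad \forall\, t, t-1 \neq s, \tag{22}
\]
or
\[
E(d_{it} d_{it-1} d_{is} d_{is-1}\, \Delta x^*_{it} \varepsilon^*_{is}) = 0, \qquad \forall\, t, t-1 \neq s, \tag{22′}
\]
depending on whether $x^*_{it}$ and $u_{it}$ are assumed correlated or independent conditional on $\zeta_i$. It is easy to verify that, when $x^*_{it}$ is not subject to censoring and under assumption B1, we obtain
\[
E(d_{is} d_{is-1}\, \Delta x^*_{it} \varepsilon^*_{is}) = 0, \qquad \forall\, t, t-1 \neq s. \tag{22″}
\]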
Using similar arguments we can construct analogues to (17) and (18) when sample selection is present, provided that B1 holds in addition to A1′. The form of these moment restrictions, similar to the analogues (22), (22′) or (22″) of (16), will depend on whether $x^*_{it}$ is censored or not, on whether the sample selection process is endogenous or exogenous to the $x^*_{it}$ process conditional on $\zeta_i$, as well as on whether $x^*_{it}$ is independent of $\varepsilon^*_{it}$ given the selection. Whether the variables in $x^*_{it}$ are censored or not is an empirical matter that is application-specific. Furthermore, the plausibility of Assumption B1 and of any other additional assumptions concerning the relationship between each one of the variables in $x^*_{it}$ with $(u_{it}, \varepsilon^*_{it})$ conditional on $\zeta_i$ also depends on the particular application at hand. We will therefore not investigate the implied moment restrictions any further. Their inclusion, however, in the estimation procedure described in Section 3 should be straightforward. Another reason for not pursuing these moment restrictions any further is that, although their inclusion should improve the asymptotic efficiency of the estimators, it is not clear how the inclusion of a large number of additional moments would affect the finite sample bias and precision of the estimators.
Part C
In this appendix we discuss the asymptotic properties of the proposed estimators. We will first present sufficient
conditions for consistency of the infeasible estimator that uses the true γ 0 in the construction of the kernel
weights.
Assumption C1. $\{(x^*_{it}, w_{it}, \varepsilon^*_{it}, u_{it}, \alpha^*_i, \eta_i);\ t = 1, \dots, T\}_{i=1}^{n}$ is an i.i.d. sample of $n$ draws from a distribution that satisfies (1)–(3). In addition to $\{(x_{it}, w_{it}, y_{it}, d_{it});\ t = 1, \dots, T\}_{i=1}^{n}$, we also observe a random sample $\{(y_{i0}, d_{i0})\}_{i=1}^{n}$ from a distribution that satisfies $y_{i0} \equiv d_{i0} \cdot y^*_{i0}$, where $d_{i0} \in \{0, 1\}$ and $y^*_{i0}$ takes values on a subset of the real line.
Given the random sampling assumption above, we will from now on drop the subscripts i that denote the
individuals’ identity.
Assumption C2. $(\varepsilon^*_t, u_t)$ is i.i.d. over time conditional on $\tilde\zeta \equiv (w, x^*, \alpha^*, \eta, y^*_0, d_0)$, where $w \equiv (w_1, \dots, w_T)$ and $x^* \equiv (x^*_1, \dots, x^*_T)$.
Assumption C3. For all $l$, $t$, $E(m_{lt}(\theta) \mid W_{0t} = 0)$ takes a unique zero at $\theta = \theta_0 \equiv (\phi_0, \beta_0')'$.
Assumption C4. $W_{0t} \equiv \Delta w_t \gamma_0$ is absolutely continuously distributed for all $t$ with density $f_t(\cdot)$ that is bounded from above on its support and strictly positive and continuous in a neighbourhood of zero.
Assumption C5. $\theta_0 \in \Theta$, a compact subset of $\mathbb{R}^{k+1}$.
Assumption C6. For some $p > 1$, $E|y^*_t|^{2p}$ and $E|x^*_t|^{2p}$ are finite for all $t$. Furthermore, $E(|y^*_t|^2 \mid W_{0t'} = \cdot)$ and $E(|x^*_t|^2 \mid W_{0t'} = \cdot)$ are bounded on their support for all $t$, $t'$.
Assumption C7. For all $l$, $t$ ($l = 1, \dots, 4$; $t = 1, \dots, T$), $E(m_{lt}(\theta) \mid W_{0t} = \cdot)$ is continuous in a neighbourhood of zero for all $\theta \in \Theta$. (Note that by Assumption (C6), $E(m_{lt}(\theta) \mid W_{0t} = \cdot)$ is bounded on its support.)
Assumption C8. $K: \mathbb{R} \to \mathbb{R}$ is a function of bounded variation that satisfies: (i) $\sup_{\nu \in \mathbb{R}} |K(\nu)| < \infty$, (ii) $\int |K(\nu)|\, d\nu < \infty$, and (iii) $\int K(\nu)\, d\nu = 1$.
Assumption C9. $h_n$ is a sequence of finite positive numbers that satisfies: (i) $h_n \to 0$ as $n \to \infty$, and (ii) $n^{1-1/p} h_n / \ln n \to \infty$ as $n \to \infty$, where $p$ is as in Assumption (C6).
The strict positiveness of $f_t$ in Assumption (C4) and the condition that $E(m_{lt}(\theta) \mid W_{0t} = 0)$ has a unique zero at $\theta = \theta_0$ (Assumption (C3)) are required for identification of $\theta_0$. Note that for $l = 1, 2$, the latter condition is satisfied provided that $x^*_t$ has full rank conditional on $W_{0t} = 0$, which implies that there exists at least one variable in $w_t$ that is not contained in $x^*_t$. The rest of the assumptions are regularity conditions that permit the application of a uniform law of large numbers to show convergence of the objective function to its population analogue, a condition required in all consistency proofs of extremum estimators. In some cases the assumption that $y^*_t$ and $x^*_t$ have bounded second moments conditional on $W_{0t}$ (see Assumption (C6)) may be restrictive. However, this assumption may be relaxed. Specifically, all that is required is that $E(m_{lt}(\theta) \mid W_{0t} = \cdot)\, f_t(\cdot)$ is bounded on its support. The same comment applies to other conditional expectations used in the theorems that follow. Finally, Assumptions (C8) and (C9) are standard in kernel estimation of conditional expectations.
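To make Assumptions (C8) and (C9) concrete, the following Python sketch checks them numerically for one candidate kernel and one family of bandwidths. The Gaussian kernel, the bandwidth form $h_n = c\,n^{-\mu}$, the value $p = 2$ and all function names are assumptions chosen for illustration; none of them is prescribed by the text. Under this bandwidth form, condition (ii) of (C9) holds only if $\mu < 1 - 1/p$.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, used here as a candidate K (an assumption)."""
    v = np.asarray(v, dtype=float)
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def check_c8(kernel, lo=-12.0, hi=12.0, step=1e-3):
    """Approximate sup|K|, the integral of |K| and the integral of K on a grid."""
    grid = np.arange(lo, hi + step, step)
    values = kernel(grid)
    return {
        "sup_abs": float(np.max(np.abs(values))),
        "int_abs": float(np.sum(np.abs(values)) * step),
        "int": float(np.sum(values) * step),
    }


def c9_ratio(n, mu, p, c=1.0):
    """n^(1 - 1/p) * h_n / ln(n) for the hypothetical bandwidth h_n = c * n^(-mu)."""
    h_n = c * n ** (-mu)
    return n ** (1.0 - 1.0 / p) * h_n / np.log(n)


c8 = check_c8(gaussian_kernel)

# With p = 2 the ratio in (C9)(ii) can diverge only if mu < 1 - 1/p = 0.5.
sample_sizes = (1e3, 1e6, 1e9)
ratios_ok = [c9_ratio(n, mu=0.2, p=2.0) for n in sample_sizes]
ratios_bad = [c9_ratio(n, mu=0.6, p=2.0) for n in sample_sizes]
```

In this sketch `ratios_ok` grows with the sample size while `ratios_bad` shrinks, which is the behaviour that condition (ii) of (C9) separates.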
Theorem 1. (Consistency of Infeasible Estimator). Let Assumptions (C1)–(C9) hold. Define
$$\theta_n = \arg\min_{\theta \in \Theta}\ G_n(\theta)' A_n' A_n G_n(\theta), \tag{$*$}$$
where $A_n$ is a stochastic matrix that converges in probability to a finite non-stochastic limit $A_0$, and $G_n(\theta)$ is an $R \times 1$ vector with rows of the form
$$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} K\!\left(\frac{\Delta w_{it}\gamma_0}{h_n}\right) m_{lit}(\theta).$$
Then, $\theta_n \xrightarrow{p} \theta_0$.
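As a rough illustration of the objective in (∗), the following Python sketch computes a kernel-weighted GMM estimate in the special case of moments that are linear in $\theta$, $m_i(\theta) = Z_i'(y_i - X_i\theta)$, for which the minimiser has a closed form. The linear moment form, the toy data-generating process, the Gaussian kernel, the bandwidth 0.1 and all function names are assumptions made for illustration; they are not the moment functions $m_{lt}$ of the paper.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, used as the kernel K (an assumption)."""
    v = np.asarray(v, dtype=float)
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def kernel_weighted_linear_gmm(y, X, Z, index, h, A=None):
    """Minimise G_n(theta)' A'A G_n(theta) for m_i(theta) = Z_i' (y_i - X_i theta).

    y: (n,) outcomes; X: (n, k) regressors; Z: (n, R) instruments;
    index: (n,) values of the index difference; h: bandwidth;
    A: (R, R) weighting matrix, identity if omitted.
    """
    n, R = Z.shape
    k_w = gaussian_kernel(index / h) / h            # (1/h) K(index / h)
    g_y = (Z * (k_w * y)[:, None]).mean(axis=0)     # (R,)   weighted mean of Z_i y_i
    g_x = (Z * k_w[:, None]).T @ X / n              # (R, k) weighted mean of Z_i X_i'
    A = np.eye(R) if A is None else A
    W = A.T @ A
    # G_n(theta) = g_y - g_x theta, so the minimiser solves the normal equations.
    return np.linalg.solve(g_x.T @ W @ g_x, g_x.T @ W @ g_y)


# Hypothetical design: the term W0 in y mimics a selection effect that
# vanishes when the index W0 equals zero. The true coefficient is 1.
rng = np.random.default_rng(0)
n = 20_000
W0 = rng.normal(size=n)
x = 1.0 + W0 + rng.normal(size=n)
y = 1.0 * x + W0 + rng.normal(size=n)
X = x[:, None]

theta_kernel = kernel_weighted_linear_gmm(y, X, X, W0, h=0.1)
theta_naive = np.linalg.lstsq(X, y, rcond=None)[0]  # ignores the index
```

In this design the unweighted estimate has a population bias of 1/3, because the regressor is correlated with the index, whereas the kernel weights concentrate on observations with an index close to zero, where the extra term is negligible.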
We next turn to examine the feasible two-step estimator that uses a consistent estimator $\hat\gamma_n$ in the construction of the kernel weights. Under some additional assumptions, this estimator is also consistent. These conditions involve a strengthening of the moment conditions in (C6), additional smoothness on the kernel, and a restriction on the rate of convergence of $h_n$ to zero given the rate of convergence of the first-step estimator $\hat\gamma_n$.
Assumption C10. $E|y^*_t|^4$, $E|x^*_t|^4$ and $E|w_t|^2$ are finite for all $t$.
Assumption C11. $K$ is continuously differentiable with derivative that satisfies: $\sup_{\nu \in \mathbb{R}} |K'(\nu)| < K_1 < \infty$.
Assumption C12. $h_n$ satisfies $h_n^{-2}(\hat\gamma_n - \gamma_0) = o_p(1)$, where $\hat\gamma_n$ is a consistent estimator of $\gamma_0$.
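Theorem 2. (Consistency of Two-Step Estimator). Let Assumptions (C1)–(C12) hold. Define
$$\hat\theta_n = \arg\min_{\theta \in \Theta}\ \hat G_n(\theta)' A_n' A_n \hat G_n(\theta), \tag{$**$}$$
where $A_n$ is as in Theorem 1, and $\hat G_n(\theta)$ is an $R \times 1$ vector with rows of the form
$$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n} K\!\left(\frac{\Delta w_{it}\hat\gamma_n}{h_n}\right) m_{lit}(\theta).$$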
Then, $\hat\theta_n \xrightarrow{p} \theta_0$.
We next present conditions that are sufficient for asymptotic normality of the proposed estimators. Apart
from the usual strengthening of regularity conditions on the existence and finiteness of moments higher than
those required for consistency, additional smoothness is imposed on the model which allows convergence at a
faster rate.
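Assumption N1. $\theta_0 \in \operatorname{int}(\Theta)$.
Assumption N2. For all $t$, $f_t(\cdot)$ is $s$ ($s \geq 1$) times continuously differentiable on its support and has uniformly bounded derivatives. Also, for all $t$, $t'$, $(W_{0t}, W_{0t'})$ has density $f(\cdot, \cdot)$ that is uniformly bounded on its support.
Assumption N3. For all $t$, the functions $\tilde\Lambda_{1t} \equiv \Lambda_1(w_t\gamma_0, \tilde\zeta) \equiv E(\varepsilon^*_t \mid u_t \leq w_t\gamma_0 + \eta + \phi_0, \tilde\zeta)$ and $\tilde\Lambda_{2t} \equiv \Lambda_2(w_t\gamma_0, \tilde\zeta) \equiv E(\varepsilon^{*2}_t \mid u_t \leq w_t\gamma_0 + \eta + \phi_0, \tilde\zeta)$ satisfy
$$\Delta\tilde\Lambda_{jt} = \tilde\Lambda_{jt} - \tilde\Lambda_{jt-1} = \Lambda_j(w_t\gamma_0, \tilde\zeta) - \Lambda_j(w_{t-1}\gamma_0, \tilde\zeta) = \Lambda^*_{jt} W_{0t},$$
where $j = 1, 2$, and $\Lambda^*_{jt} \equiv \Lambda^*_j(w_t\gamma_0, w_{t-1}\gamma_0, \tilde\zeta)$ is bounded on its support. (For example, this condition will hold if $\Lambda_1(w_t\gamma_0, \tilde\zeta)$ and $\Lambda_2(w_t\gamma_0, \tilde\zeta)$ are continuously differentiable with respect to their first argument with bounded derivatives.)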
Assumption N4. For all $l$ and $t$, and for all $\theta \in \Theta$, $E(m^{(1)}_{lt}(\theta) \mid W_{0t} = \cdot)$ is continuous in a neighbourhood of zero. (Here, $m^{(1)}_{lt}(\theta)$ is the Jacobian of first-order derivatives of $m_{lt}(\theta)$ with respect to $\theta'$.)
Assumption N5. $E(m_{lt}(\theta_0) m_{l't}(\theta_0)' \mid W_{0t} = \cdot)$ is continuous in a neighbourhood of zero as a function of $W_{0t}$ for all $l$, $l'$, $t$.
Assumption N6. $E|y^*_t|^4$ and $E|x^*_t|^4$ are finite for all $t$.
Assumption N7. $E(|y^*_t|^{4+2\delta} \mid W_{0t'} = \cdot)$, $E(|x^*_t|^{4+2\delta} \mid W_{0t'} = \cdot)$, $E(|y^*_t|^4 \mid W_{0t'} = \cdot, W_{0t''} = \cdot)$, and $E(|x^*_t|^4 \mid W_{0t'} = \cdot, W_{0t''} = \cdot)$ are bounded on their support for some $\delta \in (0, 1)$, and for all $t$, $t'$, $t''$.
Assumption N8. For all $l$ and $t$, $E(m^*_{lt}(\theta_0) \mid W_{0t} = \cdot)$ is $s$ times continuously differentiable on its support, and has bounded derivatives. (The functions $m^*_{lt}(\theta_0)$ are defined below.)
Assumption N9. $K$ is an $(s+1)$-th order bias-reducing kernel that satisfies: $\int |\nu|^j |K(\nu)|\, d\nu < \infty$ for $j = 0$ and $j \leq s+1$, and
$$\int \nu^j K(\nu)\, d\nu = \begin{cases} 1, & \text{if } j = 0, \\ 0, & \text{if } 0 < j < s+1. \end{cases}$$
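One kernel that meets the moment conditions of Assumption (N9) with $s + 1 = 4$ is the Gaussian-based kernel $K(\nu) = \tfrac{1}{2}(3 - \nu^2)\phi(\nu)$, where $\phi$ is the standard normal density. The Python sketch below checks its moments numerically. The choice of this particular kernel, the integration grid and the function names are assumptions for illustration; the text does not specify which bias-reducing kernel to use.

```python
import numpy as np


def fourth_order_gaussian_kernel(v):
    """K(v) = 0.5 * (3 - v^2) * phi(v), with phi the standard normal density."""
    v = np.asarray(v, dtype=float)
    phi = np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)
    return 0.5 * (3.0 - v**2) * phi


def kernel_moment(kernel, j, lo=-15.0, hi=15.0, step=1e-3):
    """Riemann-sum approximation of the integral of v^j K(v) over [lo, hi]."""
    grid = np.arange(lo, hi + step, step)
    return float(np.sum(grid**j * kernel(grid)) * step)


# Moments of order 0 to 4: analytically these are 1, 0, 0, 0 and -3.
moments = {j: kernel_moment(fourth_order_gaussian_kernel, j) for j in range(5)}
```

The moments of order 1 to 3 vanish, so the kernel is of order $s + 1 = 4$ (that is, $s = 3$), while the non-zero fourth moment is the quantity $\int \nu^{s+1} K(\nu)\, d\nu$ that scales the bias term in the limiting distribution. Such kernels necessarily take negative values for some $\nu$.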
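The functions $m^*_{lt}(\theta_0)$ in Assumption (N8) are defined as follows
$$m^*_{1t,j}(\theta_0) = \Pr(D_{1t,j} = 1 \mid \tilde\zeta)\Big(\rho_0^{t-j} y^*_0 + \sum_{l=0}^{t-j-1} \rho_0^{l}\big(\alpha^* + x^*_{t-j-l}\beta_0 + E(\varepsilon^*_{t-j-l} \mid D_{1t,j} = 1, \tilde\zeta)\big)\Big)\Lambda^*_{1t},$$
$$m^*_{2t,\kappa}(\theta_0) = \Pr(D_{2t} = 1 \mid \tilde\zeta)\, x^*_{s,\kappa}\, \Lambda^*_{1t},$$
$$m^*_{3t}(\theta_0) = \Pr(D_{3t} = 1 \mid \tilde\zeta)\big(\alpha^* + E(\varepsilon^*_T \mid D_{3t} = 1, \tilde\zeta)\big)\Lambda^*_{1t},$$
$$m^*_{4t}(\theta_0) = \Pr(D_{4t} = 1 \mid \tilde\zeta)\big(2\alpha^* \Lambda^*_{1t} + \Lambda^*_{2t}\big).$$
Note that by the boundedness of $\Lambda^*_{jt}$ ($j = 1, 2$) in Assumption (N3), and of $E(|y^*_t|^{4+2\delta} \mid W_{0t'})$, $E(|x^*_t|^{4+2\delta} \mid W_{0t'})$ in Assumption (N7), $E(|m^*_{lt}(\theta_0)|^2 \mid W_{0t} = \cdot)$ is bounded on its support for all $l$, $t$. Now, with this notation we can write for example
$$\begin{aligned}
E(m_{1t,j}(\theta_0)) &= E\big(E(m_{1t,j}(\theta_0) \mid \tilde\zeta)\big) \\
&= E\big(E(D_{1t,j}\, y^*_{t-j}\, \Delta\varepsilon^*_t \mid \tilde\zeta)\big) \\
&= E\big(\Pr(D_{1t,j} = 1 \mid \tilde\zeta)\, E(y^*_{t-j}\, \Delta\varepsilon^*_t \mid D_{1t,j} = 1, \tilde\zeta)\big) \\
&= E\Big(\Pr(D_{1t,j} = 1 \mid \tilde\zeta)\Big(\rho_0^{t-j} y^*_0 + \sum_{l=0}^{t-j-1} \rho_0^{l}\big(\alpha^* + x^*_{t-j-l}\beta_0 + E(\varepsilon^*_{t-j-l} \mid D_{1t,j} = 1, \tilde\zeta)\big)\Big)\Lambda^*_{1t} W_{0t}\Big) \\
&= E\big(m^*_{1t,j}(\theta_0)\, W_{0t}\big),
\end{aligned}$$
where $j = 2, \dots, t$. Similarly, we can write for $l = 2, 3, 4$
$$E(m_{lt}(\theta_0)) = E\big(m^*_{lt}(\theta_0)\, W_{0t}\big).$$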
These expressions are a consequence of the smoothness condition on the functions $\Lambda_j$ in Assumption (N3) alluded to earlier. With this additional smoothness, the bias of the estimator, which is due to the fact that $E\big((1/h_n) K(W_{0t}/h_n)\, m_{lt}(\theta_0)\big) \neq 0$ for any finite $n$, and which, similarly to univariate kernel density and regression function estimation, would be of order $O(h_n^{s})$ given a degree of smoothness $s$ on $E(m_{lt}(\theta_0) \mid W_{0t} = \cdot)$ and $f_t(\cdot)$ (compare to Assumptions (N2) and (N8)), is now of smaller order $O(h_n^{s+1})$. As a result, the estimator converges in distribution at a faster rate, namely at rate $n^{-(s+1)/[2(s+1)+1]}$ compared to the one typically obtained in kernel estimation, namely $n^{-s/(2s+1)}$.
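The rate comparison can be made explicit with a short calculation. The following is a sketch under the assumption, not stated in the text, that the bandwidth takes the form $h_n = c\, n^{-\mu}$ for constants $c > 0$ and $\mu > 0$; it balances the normalisation $\sqrt{n h_n}$ against the bias of order $h_n^{s+1}$.

```latex
\begin{align*}
\sqrt{n h_n}\, h_n^{s+1}
  &= n^{1/2}\, h_n^{\,s+3/2}
   = c^{\,s+3/2}\, n^{\frac{1}{2}\left[1 - \mu(2s+3)\right]}, \\
\text{which has a finite positive limit iff}\quad
\mu &= \frac{1}{2s+3} = \frac{1}{2(s+1)+1}. \\
\text{With this } \mu:\quad
\sqrt{n h_n}
  &= c^{1/2}\, n^{(1-\mu)/2}
   = c^{1/2}\, n^{\frac{s+1}{2(s+1)+1}}. \\
\text{For } s = 1:\quad
n^{\frac{s+1}{2(s+1)+1}} &= n^{2/5}
\quad\text{versus}\quad
n^{\frac{s}{2s+1}} = n^{1/3}.
\end{align*}
```

A larger $\mu$ (a faster shrinking bandwidth) removes the asymptotic bias at the cost of a slower rate, while a smaller $\mu$ lets the bias term dominate.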
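Theorem 3. (Asymptotic Normality of Infeasible Estimator). Let Assumptions (C1)–(C9) and (N1)–(N9) hold and $\theta_n$ be a solution to (∗).
(i) Let $\sqrt{n h_n}\, h_n^{s+1} \to h$, with $0 \leq h < \infty$. Then
$$\sqrt{n h_n}\,(\theta_n - \theta_0) \xrightarrow{d} N\big(-h\,(A^*_0 D_0)^{-1} A^*_0 B_0,\ (A^*_0 D_0)^{-1} A^*_0 V_0 A^{*\prime}_0 (A^*_0 D_0)^{-1}\big),$$
where $B_0$ is an $(R \times 1)$ vector with elements of the form
$$B_{0lt} \equiv B_{lt}(\theta_0) \equiv \frac{1}{s!}\, \frac{\partial^{(s)}}{\partial W_{0t}^{s}} \big\{E(m^*_{lt}(\theta_0) \mid W_{0t})\, f_t(W_{0t})\big\}\Big|_{W_{0t} = 0} \cdot \int \nu^{s+1} K(\nu)\, d\nu,$$
$A^*_0 \equiv D_0' A_0' A_0$, with $D_0$ an $(R \times (k+1))$ matrix with elements of the form
$$D_{0lt} \equiv D_{lt}(\theta_0) \equiv f_t(0) \cdot E(m^{(1)}_{lt}(\theta_0) \mid W_{0t} = 0),$$
and $V_0$ is an $(R \times R)$ matrix with elements of the form
$$V_{0ltl't'} \equiv V_{ltl't'}(\theta_0) \equiv \begin{cases} f_t(0) \cdot E(m_{lt}(\theta_0)\, m_{l't'}(\theta_0)' \mid W_{0t} = 0) \cdot \int K(\nu)^2\, d\nu, & \text{if } t = t', \\ 0, & \text{if } t \neq t'. \end{cases}$$
(ii) Let $\sqrt{n \tilde h_n}\, \tilde h_n^{s+1} \to \infty$. Then
$$\tilde h_n^{-(s+1)}(\theta_n - \theta_0) \xrightarrow{p} -(A^*_0 D_0)^{-1} A^*_0 B_0.$$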
In order to obtain the limiting distribution for the feasible estimator, $\hat\theta_n$, additional smoothness is required. We next present sufficient conditions for $\hat\theta_n$ to have the same asymptotic distribution as the infeasible estimator, $\theta_n$, of the previous theorem. For the limiting distribution of $\hat\theta_n$ not to depend on the asymptotic distribution of the first-step estimator $\hat\gamma_n$, the bandwidth is required to converge to zero at a rate such that $\hat\gamma_n$ converges in distribution faster than $\theta_n$ (see Assumption (N14)).
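Assumption N10. $E|y^*_t|^8$, $E|x^*_t|^8$, and $E|w_t|^8$ are finite for all $t$.
Assumption N11. $E(|y^*_t|^{4+2\delta} \mid W_{0t'} = \cdot)$ and $E(|x^*_t|^{4+2\delta} \mid W_{0t'} = \cdot)$ are bounded on their support for some $\delta \in (0, 1)$, and for all $t$, $t'$.
Assumption N12. $E(m_{lt}(\theta_0) m_{lt}(\theta_0)' \Delta w^{q_1}_t \Delta w^{q_2}_t \mid W_{0t} = \cdot)$ and $E(m_{lt}(\theta_0) m_{lt}(\theta_0)' \Delta w^{q_1}_t \Delta w^{q_2}_t \Delta w^{q_3}_t \Delta w^{q_4}_t \mid W_{0t} = \cdot)$ are continuous in a neighbourhood of zero for all $l$ and $t$. (Here, $q_j \in \{1, \dots, q\}$.)
Assumption N13. $K$ is three times continuously differentiable with derivatives that satisfy:
(i) $\sup_{\nu \in \mathbb{R}} |K'(\nu)| < K_1 < \infty$, $\sup_{\nu \in \mathbb{R}} |K''(\nu)| < K_2 < \infty$, $\sup_{\nu \in \mathbb{R}} |K'''(\nu)| < K_3 < \infty$, and
(ii) $\int |\nu K'(\nu)|\, d\nu < \infty$, and $\int |\nu K''(\nu)|\, d\nu < \infty$.
Assumption N14. $h_n$ satisfies $\sqrt{n h_n}\,(\hat\gamma_n - \gamma_0) = o_p(1)$.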
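Theorem 4. (Asymptotic Normality of Two-Step Estimator). Let $\hat\theta_n$ be a solution to (∗∗). In addition to the assumptions of Theorem 2, let Assumptions (N1)–(N14) hold.
(i) Let $\sqrt{n h_n}\, h_n^{s+1} \to h$, with $0 \leq h < \infty$. Then
$$\sqrt{n h_n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(-h\,(A^*_0 D_0)^{-1} A^*_0 B_0,\ (A^*_0 D_0)^{-1} A^*_0 V_0 A^{*\prime}_0 (A^*_0 D_0)^{-1}\big),$$
where $A^*_0$, $V_0$, and $D_0$ are defined as in Theorem 3.
(ii) Let $\sqrt{n \tilde h_n}\, \tilde h_n^{s+1} \to \infty$. Then
$$\tilde h_n^{-(s+1)}(\hat\theta_n - \theta_0) \xrightarrow{p} -(A^*_0 D_0)^{-1} A^*_0 B_0.$$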
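The bandwidth restrictions in Assumptions (C12) and (N14) can be translated into a range of admissible exponents. The Python sketch below does this under two assumptions that are not part of the statements above: the bandwidth has the form $h_n = c\, n^{-\mu}$, and the first-step estimator satisfies $\hat\gamma_n - \gamma_0 = O_p(n^{-r})$ exactly. Under these assumptions (C12) requires $\mu < r/2$ and (N14) requires $\mu > 1 - 2r$. The function names are hypothetical, and exact fractions are used to avoid rounding at the boundary.

```python
from fractions import Fraction


def admissible_bandwidth_exponents(r):
    """Open interval (lower, upper) of exponents mu for h_n = c * n^(-mu).

    C12: h_n^(-2) (gamma_hat - gamma_0) = o_p(1)      needs 2 * mu - r < 0.
    N14: sqrt(n h_n) (gamma_hat - gamma_0) = o_p(1)   needs (1 - mu) / 2 - r < 0.
    The lower bound is also kept non-negative so that h_n tends to zero.
    """
    r = Fraction(r)
    lower = max(Fraction(0), 1 - 2 * r)
    upper = r / 2
    return lower, upper


def balancing_exponent(s):
    """mu for which sqrt(n h_n) * h_n^(s+1) tends to a finite positive constant."""
    return Fraction(1, 2 * (s + 1) + 1)


def is_admissible(mu, r):
    """True if mu lies strictly inside the interval implied by C12 and N14."""
    lower, upper = admissible_bandwidth_exponents(r)
    return lower < mu < upper


root_n_interval = admissible_bandwidth_exponents(Fraction(1, 2))
ok_root_n = is_admissible(balancing_exponent(1), Fraction(1, 2))
ok_slow_first_step = is_admissible(balancing_exponent(1), Fraction(2, 5))
```

With a root-$n$ consistent first step ($r = 1/2$) the interval is $(0, 1/4)$, which contains the balancing exponent $1/5$ for $s = 1$. With $r = 2/5$ the two bounds coincide and the interval is empty, so under these assumptions the first step has to converge faster than $n^{-2/5}$. Condition (ii) of Assumption (C9) imposes a further upper bound on $\mu$ that this sketch does not check.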
In order to carry out hypothesis testing and to construct confidence intervals based on the asymptotic
distribution of the estimator, one needs consistent estimators of the components of the asymptotic variance.
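Consider the following functions
$$V_{nltl't'}(\theta) = \frac{1}{n h_n} \sum_{i=1}^{n} m_{lit}(\theta)\, m_{l'it'}(\theta)'\, K\!\left(\frac{\Delta w_{it}\hat\gamma_n}{h_n}\right)^{2},$$
$$D_{nlt}(\theta) = \frac{1}{n h_n} \sum_{i=1}^{n} m^{(1)}_{lit}(\theta)\, K\!\left(\frac{\Delta w_{it}\hat\gamma_n}{h_n}\right),$$
where $\hat\theta_n$ is a consistent estimator of $\theta_0$. Under some additional regularity conditions that guarantee that the sample averages above converge uniformly in $\theta$ to their population analogues, and provided that the latter are continuous at $\theta_0$, it is not difficult to show that
$$V_{nltl't'}(\hat\theta_n) \xrightarrow{p} V_{ltl't'}(\theta_0) \equiv V_{0ltl't'}, \qquad D_{nlt}(\hat\theta_n) \xrightarrow{p} D_{lt}(\theta_0) \equiv D_{0lt}.$$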
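The plug-in quantities above can be assembled into standard errors along the following lines. This Python sketch handles a single time period, so that all moments share one kernel weight, and it uses simulated placeholder arrays in place of the moment contributions and Jacobians evaluated at $\hat\theta_n$. The Gaussian kernel, the array layout, the placeholder data and the function names are all assumptions for illustration.

```python
import numpy as np


def gaussian_kernel(v):
    """Standard normal density, used as the kernel K (an assumption)."""
    v = np.asarray(v, dtype=float)
    return np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)


def sandwich_components(m, dm, index, h):
    """Sample analogues V_n and D_n for one time period.

    m: (n, R) moment contributions evaluated at theta_hat
    dm: (n, R, k) Jacobians of the moments with respect to theta
    index: (n,) estimated index differences; h: bandwidth
    """
    n = m.shape[0]
    k_w = gaussian_kernel(index / h)
    V = np.einsum("i,ir,is->rs", k_w**2, m, m) / (n * h)
    D = np.einsum("i,irk->rk", k_w, dm) / (n * h)
    return V, D


def asymptotic_variance(V, D, A):
    """(A* D)^{-1} A* V A*' (A* D)^{-1}' with A* = D' A' A."""
    A_star = D.T @ A.T @ A
    B = np.linalg.solve(A_star @ D, A_star)   # (A* D)^{-1} A*
    return B @ V @ B.T


# Placeholder inputs standing in for quantities computed from real data.
rng = np.random.default_rng(1)
n, R, k, h = 5_000, 2, 2, 0.2
index = rng.normal(size=n)
m = rng.normal(size=(n, R))
dm = -np.eye(R)[None, :, :] + 0.1 * rng.normal(size=(n, R, k))

V_n, D_n = sandwich_components(m, dm, index, h)
avar = asymptotic_variance(V_n, D_n, np.eye(R))
std_errors = np.sqrt(np.diag(avar) / (n * h))  # scale by the sqrt(n h) rate
```

Because the limiting distribution is stated for $\sqrt{n h_n}(\hat\theta_n - \theta_0)$, the estimated asymptotic variance is divided by $n h_n$ before taking square roots. With several time periods, the off-diagonal blocks with $t \neq t'$ have a zero population limit and would need to be handled block by block.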
Acknowledgements. I would like to thank Richard Blundell, Xiaohong Chen, Lars Hansen, Jim Heckman,
Bo Honoré, Joe Hotz, Guido Imbens, Costas Meghir, seminar participants at various institutions, the journal’s
managing editors, and two anonymous referees for helpful suggestions and comments. The paper was first
presented at the 1997 Econometric Society European meetings, Toulouse, France. This research was supported
by the National Science Foundation. All errors are naturally mine.
REFERENCES
AHN, S. C. and SCHMIDT, P. (1995), ‘‘Efficient Estimation of Models for Dynamic Panel Data’’, Journal of
Econometrics, 68, 5–27.
AMEMIYA, T. (1985) Advanced Econometrics (Cambridge: Harvard University Press).
ANDERSON, T. W. and HSIAO, C. (1981), ‘‘Estimation of Dynamic Models with Error Components’’, Journal
of the American Statistical Association, 76, 598–606.
ARELLANO, M. and BOND, S. R. (1991), ‘‘Some Tests of Specification for Panel Data: Monte Carlo Evidence
and an Application to Employment Equations’’, Review of Economic Studies, 58, 277–297.
ARELLANO, M. and BOVER, O. (1995), ‘‘Another Look at the Instrumental Variable Estimation of Error
Component Models’’, Journal of Econometrics, 68, 29–51.
ARELLANO, M., BOVER, O. and LABEAGA, J. M. (1997), ‘‘Autoregressive Models with Sample Selectivity
for Panel Data’’, in C. Hsiao, K. Lahiri, L.-F. Lee and H. Pesaran (eds.), Analysis of Panels and Limited
Dependent Variable Models (Cambridge: Cambridge University Press).
ARELLANO, M. and HONORÉ, B. E. (1999), ‘‘Panel Data Models. Some Recent Developments’’ (Unpublished manuscript prepared for the Handbook of Econometrics, Vol. 5).
BHARGAVA, A. and SARGAN, J. D. (1983), ‘‘Estimating Dynamic Random Effects Models from Panel Data
Covering Short Time Periods’’, Econometrica, 51, 1635–1659.
BLUNDELL, R. and BOND, S. (1998), ‘‘Initial Conditions and Moment Restrictions in Dynamic Panel Data
Models’’, Journal of Econometrics, 87, 115–143.
BOVER, O. and ARELLANO, M. (1997), ‘‘Estimating Dynamic Limited Dependent Variable Models from
Panel Data’’, Investigaciones Economicas, 21, 141–165.
BREUSCH, T. S., MIZON, G. E. and SCHMIDT, P. (1989), ‘‘Efficient Estimation Using Panel Data’’, Econometrica, 57, 695–700.
CHAMBERLAIN, G. (1984), ‘‘Panel Data’’, in Z. Griliches and M. Intriligator (eds.), Handbook of Econometrics, Vol. II (Amsterdam: North Holland).
CHARLIER, E., MELENBERG, B. and VAN SOEST, A. (1995), ‘‘A Smoothed Maximum Score Estimator
for the Binary Choice Panel Data Model and an Application to Labour Force Participation’’, Statistica
Neerlandica, 49, 324–342.
COGAN, J. F. (1981), ‘‘Fixed Costs and Labor Supply’’, Econometrica, 49, 945–964.
ECKSTEIN, Z. and WOLPIN, K. I. (1990), ‘‘On the Estimation of Labor Force Participation, Job Search, and
Job Matching Models Using Panel Data’’, in Y. Weiss and G. Fishelson (eds.), Advances in the Theory
and Measurement of Unemployment.
HANOCH, G. (1980), ‘‘Hours and Weeks in the Theory of Labor Supply’’, in J. P. Smith (ed.), Female Labor
Supply: Theory and Estimation (Princeton: Princeton University Press).
HANSEN, L. P. (1982), ‘‘Large Sample Properties of Generalized Method of Moments Estimators’’, Econometrica, 50, 1029–1054.
HÄRDLE, W. (1990) Applied Nonparametric Regression (Cambridge: Cambridge University Press).
HAUSMAN, J. A. (1980), ‘‘The Effects of Wages, Taxes, and Fixed Costs on Women’s Labor Force Participation’’,
Journal of Public Economics, 14, 161–194.
HECKMAN, J. J. (1981), ‘‘Heterogeneity and State Dependence’’, in S. Rosen (ed.), Studies of Labor Markets,
(Chicago: University of Chicago Press).
HECKMAN, J. J. (1993), ‘‘Lessons from Empirical Labor Economics: 1972–1992’’, American Economic Association Paper and Proceedings, 83, 116–121.
HOLTZ-EAKIN, D., NEWEY, W. and ROSEN, H. S. (1988), ‘‘Estimating Vector Autoregression with Panel
Data’’, Econometrica, 56, 1371–1396.
HONORÉ, B. E. (1992), ‘‘Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression
Models with Fixed Effects’’, Econometrica, 60, 533–565.
HONORÉ, B. E. (1993), ‘‘Orthogonality Conditions for Tobit Models with Fixed Effects and Lagged Dependent
Variables’’, Journal of Econometrics, 59, 35–61.
HONORÉ, B. E. and KYRIAZIDOU, E. (2000), ‘‘Panel Data Discrete Choice Models with Lagged Dependent
Variables’’, Econometrica, 68, 839–874.
HONORÉ, B. E. and KYRIAZIDOU, E. (2000), ‘‘Estimation of Tobit-Type Models with Individual Specific
Effects’’, Econometric Reviews, 19, 341–366.
HONORÉ, B. E. and LEWBEL, A. (1998), ‘‘Semiparametric Binary Choice Panel Data Models without Strictly
Exogenous Regressors’’ (Unpublished manuscript).
HOROWITZ, J. L. (1992), ‘‘A Smoothed Maximum Score Estimator for the Binary Response Model’’, Econometrica, 60, 505–531.
HOTZ, V. J., KYDLAND, F. E. and SEDLACEK, G. L. (1988), ‘‘Intertemporal Preferences and Labor Supply’’, Econometrica, 56, 335–360.
HSIAO, C. (1986) Analysis of Panel Data (Cambridge: Cambridge University Press).
HYSLOP, D. (1999), ‘‘State Dependence, Serial Correlation and Heterogeneity in Intertemporal Labor Force
Participation of Married Women’’, Econometrica, 67, 1255–1294.
JOHNSON, T. R. and PENCAVEL, J. H. (1984), ‘‘Dynamic Hours of Work Functions for Husbands, Wives,
and Single Females’’, Econometrica, 52, 363–389.
KYDLAND, F. E. and PRESCOTT, E. C. (1982), ‘‘Time to Build and Aggregate Fluctuations’’, Econometrica,
50, 1345–1370.
KYRIAZIDOU, E. (1995) Essays in Estimation and Testing of Econometric Models (Unpublished Ph.D. thesis,
Department of Economics, Northwestern University).
KYRIAZIDOU, E. (1997), ‘‘Estimation of A Panel Data Sample Selection Model’’, Econometrica, 65, 1335–
1364.
MANSKI, C. (1987), ‘‘Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data’’,
Econometrica, 55, 357–362.
MÁTYÁS, L. and SEVESTRE, P. (1996) The Econometrics of Panel Data (Kluwer Academic Publishers:
Boston).
POWELL, J. L. (1987), ‘‘Semiparametric Estimation of Bivariate Latent Variable Models’’ (Working Paper No
8704, Social Systems Research Institute, University of Wisconsin–Madison).
SILVERMAN, B. W. (1986) Density Estimation for Statistics and Data Analysis (New York: Chapman and
Hall).
WYHOWSKI, D. J. (1996), ‘‘Monte Carlo Evidence for Dynamic Panel Data Models’’ (Unpublished manuscript, The Australian National University).