√K-Consistent Semiparametric Estimators of a Dynamic Panel Sample Selection Model∗
George-Levi Gayle
Dept. of Economics,
University of Pittsburgh.
Christelle Viauroux
Dept. of Economics,
University of Pittsburgh.
Previous Draft: December 2002
Present Draft: January 2003
Abstract
This paper considers the problem of identification and estimation in panel data sample selection models with a binary selection rule, when the latent equations contain possibly predetermined variables, lags of the dependent variables, and unobserved individual effects. The selection equation contains lags of the dependent variables from the latent equations and other variables that may be predetermined relative to the latent equations. We derive a set of conditional moment restrictions, which are then exploited to construct two-step GMM sieve estimators for the parameters of the main equation and a nonparametric estimator of the sample selection term. In the first step, the unknown parameters of the selection equation are consistently estimated using a transformation approach in the spirit of Berkson’s minimum chi-square method and a kernel estimator for the selection probability. In the second step, these estimates are used to construct a sieve GMM estimator for the unknown finite-dimensional parameters and the unknown function in the “selection bias” term. The estimators are √K-consistent and asymptotically normal.
JEL Classification: C33, C34
Keywords: Predetermined Variable, Efficiency Bound, Series Estimators,
Kernel Estimator, Single Index Models.
∗ Incomplete. Please do not quote without permission. We would like to thank Whitney Newey and Mehmet Caner for helpful suggestions and comments. All errors are naturally ours.
1 Introduction
Panel data are very useful in applied research. Not only do they allow researchers to
study the intertemporal behavior of individuals, they also enable them to control for
the presence of unobserved permanent individual heterogeneity. At present, there exists a large body of literature on panel data models with unobserved individual effects that enter additively in the regression model (see, for example, Hsiao, 1986 and Matyas and Sevestre, 1996). In recent years, considerable advances in the panel data literature have been made in the direction of linear models that allow for the presence of lags of the dependent variable and other predetermined variables, as well as in the direction of “static” limited dependent variable models that contain only strictly exogenous variables. These are reviewed in Arellano and Honore (1999), who also describe results for dynamic non-linear panel data models, for which much less is known.
Moreover, it is well known that parameter estimates from short panels jointly estimated with individual-specific effects can be seriously biased when the explanatory variables are only predetermined as opposed to strictly exogenous. This situation includes models with lagged dependent variables as well as other models in which the explanatory variables are Granger-caused by the endogenous variables. In linear models with additive effects, the standard response to this problem has been to consider IV estimates that exploit the lack of correlation between future errors in first differences and lagged values of the variables (see e.g. Anderson and Hsiao, 1981, Holtz-Eakin, Newey and Rosen, 1988, or Arellano and Bond, 1991). However, far fewer results are available for limited dependent variable models with predetermined variables, that is, models in which current values of the explanatory variables are influenced by past values of the dependent variables. The economic literature contains many important situations in which this is the case. For example, consider Euler equations for household consumption (Zeldes, 1989, Runkle, 1991, and Keane and Runkle, 1992) or company investment (Bond and Meghir, 1994). In these cases, the explanatory variables include variables in the agents' information sets. However, these variables would be correlated with past shocks and hence with past values of the dependent variables. Another example, which will be of particular interest, is models of life-cycle behavior, for example a model of the labor supply of females. The assumption that all the explanatory variables are exogenous would mean, for instance, that the current number of children does not depend on past labor supply decisions, which is unlikely if theoretical models of life-cycle behavior are to be taken seriously (see e.g. Gayle and Miller, 2002 or Browning, 1992).
Sample selection is a problem frequently encountered in applied research. It
arises as a result of either self-selection by the individuals under investigation, or
sample selection decisions made by data analysts. A classic example, studied in the seminal work of Gronau (1974) and Heckman (1976), is female labor supply, where hours worked are observed only for those women who decide to participate in the labor force. Failure to account for sample selection is well known to lead to inconsistent estimation of the behavioral parameters of interest, as these are confounded with parameters that determine the probability of entry into the sample. In recent years, a vast amount of econometric literature has been devoted to the problem of controlling for sample selectivity. Methods for dealing with sample selectivity are well known in the cross-section case. Recently, this problem was analyzed by Kyriazidou (1997 and 1999) in the “static” and “dynamic” panel data models respectively. However, in both papers it was assumed that the explanatory variables were strictly exogenous. As mentioned above and illustrated by the classical case of female labor supply in Gronau (1974) and Heckman (1976), this may not be a valid assumption in many applications. Her estimators also have a serious limitation, namely that the selection equation does not contain the lagged continuous endogenous variable or other predetermined variables. Moreover, those estimators converge at a rate slower than √K. Chen (1998) proposed an estimator of a panel selection model which is √K-consistent; however, this model is static and does not allow for predetermined variables in either the selection equation or the structural equation.
In this paper we consider the problem of estimating a more general dynamic panel data sample selection model, in which the latent equations and the binary selection equation each include an additive unobservable individual-specific effect and explanatory variables which may be predetermined with respect to the endogenous variable in the latent equations. The latent equations are dynamic in that they depend on the lagged endogenous variable of the latent equation. The binary selection equation also depends on lags of the observed endogenous variable of the latent equation.
To estimate the binary selection equation, we adopt a transformation approach in the spirit of Berkson’s minimum chi-square method for cross-section binary choice models with grouped data, which has recently been used by Chen (1998) to estimate a panel sample selection model, and modify it to fit our case. As a result, we obtain an artificial panel data partial linear regression model in which the parameters of interest appear linearly, the nonlinear component is an additive function of the selection propensity scores of unknown form, and the dependent variable is the observed endogenous variable of the latent equation lagged one period. Consequently, we use a least-squares-type approach for the resulting partial linear model, as in Andrews (1991), Chen (1998), Donald (1995) and Newey (1988). The nonlinear component is estimated nonparametrically using a series expansion, with the selection probabilities replaced by nonparametric kernel estimates. The number of basis functions in the series approximation increases, while the bandwidth in the kernel estimation decreases, as the number of cross-section units in the sample increases, allowing the approximation to become arbitrarily close. In addition, as in Chen (1998), the additive structure of the nonlinear component is specifically taken into account in the series approximation, leading to a weaker identification condition as well as a likely efficiency gain.
In the latent equations, we adopt the classical view of a “selection bias” term as an unknown function of the selection propensity score or of parameters summarizing the selection process. Consequently, the outcome equation can be expressed as a panel data partial linear regression for the particular subsample in which the outcome variable is observed for at least two consecutive periods. In order to account for the lagged dependent variable and other predetermined variables, we derive moment conditions following Arellano and Bover (1995). We then use a conditional moment sieve estimator to estimate the structural parameters simultaneously with the “selection bias” term. In this approach the nonlinear sample selection component is approximated by a series expansion, and the propensity score is replaced by its double index representation, with the first-stage estimates inserted in place of the true parameters.
The paper is organized as follows. The next section describes the model, some empirical examples, identification and the derivation of the moment conditions. Section 3 describes the estimation procedure in more detail. Large-sample properties of the selection equation estimators are investigated in Section 4. Section 5 investigates the large-sample properties of the second-stage estimator. Section 6 concludes with some discussion. All proofs are collected in a mathematical appendix.
2 Model
The most typical concern in empirical work using panel data has been the presence
of unobserved heterogeneity. Heterogeneity across economic agents may arise for
example as a result of different preferences, endowments or attributes. These permanent individual characteristics are commonly unobservable, or may simply not be measurable due to their qualitative nature. Failure to account for such individual-specific effects may result in biased and inconsistent estimates of the parameters of interest. The simultaneous presence of sample selection and unobserved heterogeneity has been noted in empirical work (see, for example, Hausman and Wise (1979), Nijman and Verbeek (1992), Rosholm and Smith (1994), Altug and Miller (1998) and Gayle and Miller (2002)). In recent years, considerable attention has been devoted in the panel data literature to dynamic linear models that allow for the presence of lags of the dependent variable and other predetermined variables (see, for example, Ahn and Schmidt (1995), Arellano and Bover (1995), and Blundell and Bond (1998)).
As noted in Arellano and Honore (1999), there is an abundance of results for the dynamic linear model, but much less is known for the dynamic non-linear panel model. A seminal contribution in that regard is Kyriazidou (2001), who considered the problem of identification and estimation in panel data sample selection models with a binary selection rule (Type 2 Tobit models in the terminology of Amemiya (1985)) when the latent equations contain strictly exogenous variables, lags of the dependent variables and additive unobserved individual effects. Her model does not allow for the possibility of predetermined variables in either the selection equation or the latent equations. However, in many empirical applications one might think that some variables in both the selection equation and the latent equations are predetermined relative to the latent dependent variable. For example, as pointed out in Arellano and Honore (1999), the assumption that current values of the regressors are not influenced by past values of the dependent variable and the error term is often unrealistic. Here we say that a regressor is predetermined in the model if the current error is uncorrelated with past values of the dependent variable and with current and past values of the regressors, but feedback effects from lagged dependent variables (or lagged errors) to current and future values of the explanatory variables are not ruled out. Empirical examples of these situations include Euler equations for household consumption (Zeldes, 1989, Runkle, 1991, Keane and Runkle, 1992) or for company investment (Bond and Meghir, 1994), in which variables in the agents’ information sets are uncorrelated with current and future idiosyncratic shocks but not with past shocks, together with the assumption that the empirical model’s errors are given by such shocks. An example which is more
directly related to our present model is the effect of children on female labor supply and labor force participation decisions. In this context, assuming that children are strictly exogenous is much stronger than the assumption of predeterminedness, since it would require us to maintain that labor supply plans have no effect on fertility decisions at any point in the life cycle (see, for example, Gayle and Miller, 2002). Kyriazidou (2001) also rules out the possibility of feedback effects from the latent equation to the selection equation. Continuing with the previous example on female labor supply and labor force participation, this assumption rules out the possibility that the decision to participate in the labor force today depends on the number of hours worked in the past. Such dependence has been noted in empirical work; for example, Gayle and Miller (2002) found that with time-nonseparable preferences over leisure there will be persistence in the labor force participation decisions of females. This paper seeks to contribute to the state of the art in panel data models by studying the identification and estimation of a model which accounts for the above issues.
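The model under consideration has the form
$$v^*_{fq} = \phi_0 v^*_{fq-1} + u^*_{fq}\beta_0 + \alpha^*_f + \varepsilon^*_{fq} \tag{1}$$
$$v_{fq} = a_{fq}\, v^*_{fq} \tag{2}$$
$$a_{fq} = 1\{\gamma_0 a_{fq-1} + \gamma_1 v_{fq-1} + t_{fq}\delta + \eta_f + \nu_{fq} \ge 0\} \tag{3}$$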
where $a_{fq} \in \{0, 1\}$, $f = 1, \dots, K$ and $q = 1, \dots, Q$. Throughout the paper, K is considered to be large relative to Q. It is assumed that the sample starts at date q = 0 and that $a_{f0}$ and $v_{f0}$ are observed, although the model is not specified for the initial period.
In the model given by (1)-(3), $\phi_0, \gamma_0, \gamma_1 \in R$, $\beta_0 \in R^h$, $\delta \in R^n$, and $u^*_{fq}$ and $t_{fq}$ are vectors of explanatory variables (with possibly common elements), while $\alpha^*_f$ and $\eta_f$ are unobservable time-invariant individual-specific effects that are possibly correlated with each other as well as with the errors and the regressors. The dependent variable $v^*_{fq}$ and the regressors $u^*_{fq}$ are latent variables whose observability depends on the outcome of the indicator variable $a_{fq}$. It is assumed that $(a_{fq}, t_{fq})$ is always observed, while $(v^*_{fq}, u^*_{fq})$ is observed only if $a_{fq} = 1$. In other words, the “selection” variable $a_{fq}$ determines whether the (f, q)-th observation in equation (1) is censored or not. Thus, the observed sample consists of quadruples $(a_{fq}, t_{fq}, v_{fq}, u_{fq})$, where $u_{fq} = a_{fq} u^*_{fq}$.
An important feature of this model, as with the model in Kyriazidou (2001), is that, although $u^*_{fq}$ and $t_{fq}$ may contain common variables, the two vectors do not coincide, which rules out the censored regression model (the Type 1 Tobit model) as a special case of the model considered in this paper. The reason is that our semiparametric identification scheme for the continuous outcome equation requires that the selection equation contain at least one variable that is not included in the outcome equation. This is the standard exclusion restriction in the literature on semiparametric identification of Type 2 Tobit models.
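To make the data structure of equations (1)-(3) concrete, the following is a minimal simulation sketch. Everything numerical in it is an assumption made for illustration only: the function name `simulate_panel`, the parameter values, the normal distributions, the way the two individual effects and the two errors are correlated, and the convention that every individual is selected in the unmodelled initial period q = 0. It uses a single outcome regressor and a single excluded selection regressor.

```python
import numpy as np

def simulate_panel(K=500, Q=6, phi0=0.5, beta0=1.0, gamma0=0.3,
                   gamma1=0.2, delta=1.0, seed=0):
    """Simulate (a, v, u, t) from a toy version of equations (1)-(3).

    All distributions and parameter values are illustrative assumptions.
    Arrays have shape (K, Q + 1); column 0 is the initial period q = 0.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=K)               # outcome individual effect
    eta = 0.5 * alpha + rng.normal(size=K)   # selection effect, correlated with alpha
    u_star = rng.normal(size=(K, Q + 1))     # latent outcome regressor
    t = rng.normal(size=(K, Q + 1))          # selection regressor (always observed)
    v_star = np.zeros((K, Q + 1))
    v = np.zeros((K, Q + 1))
    a = np.zeros((K, Q + 1), dtype=int)
    # Initial period: observed but not modelled (assumption: everyone selected).
    v_star[:, 0] = alpha + rng.normal(size=K)
    a[:, 0] = 1
    v[:, 0] = v_star[:, 0]
    for q in range(1, Q + 1):
        eps = rng.normal(size=K)                 # outcome error
        nu = 0.5 * eps + rng.normal(size=K)      # selection error, correlated with eps
        # Equation (1): latent dynamic outcome.
        v_star[:, q] = phi0 * v_star[:, q - 1] + beta0 * u_star[:, q] + alpha + eps
        # Equation (3): selection depends on lagged selection and lagged observed outcome.
        index = gamma0 * a[:, q - 1] + gamma1 * v[:, q - 1] + delta * t[:, q] + eta + nu
        a[:, q] = (index >= 0).astype(int)
        # Equation (2): outcome observed only when selected.
        v[:, q] = a[:, q] * v_star[:, q]
    u = a * u_star                               # regressors observed only when selected
    return a, v, u, t
```

Because the selection error is built to be correlated with the outcome error, a regression on the selected subsample alone would be subject to the selection bias discussed in the text.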
The model under consideration may be relevant, for example, for estimating intertemporal labor supply responses to changes in wage rates and non-labor income, or the joint effect of female labor supply and fertility behavior. Intertemporal substitution of labor has been studied in many empirical applications, as it pertains to aggregate fluctuations and human capital accumulation. Dynamic models of labor supply of the form of (1) can be found in Hotz et al. (1988), Altug and Miller (1998), and Gayle and Miller (2002). These studies have found that such dynamic models yield intertemporal labor supply elasticities of substitution higher than models that assume intertemporal separability. Most studies consider only interior solutions; however, when looking at female labor supply, one clearly has to consider the relevance of corner solutions. See Heckman (1993) for a survey of this literature. If one is interested in the participation decision alone, models of discrete choice that incorporate state dependence of the form of (3) have been used to account for the presence of human capital accumulation (Heckman (1981), Altug and Miller (1998) and Gayle and Miller (2002)). These models are also used to analyze search costs (Eckstein and Wolpin (1990) and Hyslop (1999)). A number of papers (Cogan (1981), Hanoch (1980), Hausman (1980), Altug and Miller (1998) and Gayle and Miller (2002)) have also found considerable evidence of fixed costs associated with working, implying that a Type 2 Tobit specification may be more appropriate for analyzing labor supply. While the model studied in Kyriazidou (2001) incorporates most of the salient features of all these strands of literature, it would be difficult to derive her model directly from a structural dynamic utility maximization problem. The reason is that such a problem would typically introduce the lagged $v_{fq}$ (or $v^*_{fq}$) among the covariates of the selection equation. Our model overcomes this limitation and hence could be derived directly from a structural dynamic utility maximization problem.
2.1 Identification
In order to identify our model, we make the following assumptions.
A1: $\{\varepsilon^*_{fq}\}_{q=1}^{Q}$, conditional on $v^*_{fq-1}$ and $a_{fq}$, is independent of $u^*_{fq}$, is stationary over time, is i.i.d. across individuals, and is independent of $\alpha^*_f$.
Although the strict stationarity assumption on $\{\varepsilon^*_{fq}\}_{q=1}^{Q}$ is stronger than the second-moment restrictions found in the standard literature on dynamic linear panel data models, it is standard in nonlinear semiparametric panel data models (see, for example, Manski (1987) and Honore (1992, 1993)). It is, however, less restrictive than the similar assumption in Kyriazidou (2001).
A2: $\eta_f = \psi'\bar t^{\,b}_f + \xi_f$, where the $\xi_f$ are i.i.d. and $t^b_{fq}$ are the strictly exogenous components of $t_{fq}$, with
$$\bar t^{\,b}_f = \frac{1}{Q}\sum_{q=1}^{Q} t^b_{fq} \tag{4}$$
An assumption of this type is frequently found in panel data models and is usually called the Mundlak specification. It is very flexible in that it allows for the possibility of either a fixed or a random effect. This is a cost we have to pay relative to Kyriazidou (2001), since she does not make any assumption in this regard. We think, however, that this is a small cost to pay for the fact that we obtain a √K-consistent estimator.
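As a small illustration of the Mundlak device in A2, the sketch below computes the individual time averages of equation (4) and appends them to the period-level regressors. The function names `mundlak_means` and `add_mundlak_terms` and the array layout (individuals, periods, variables) are assumptions of this sketch, not notation from the paper.

```python
import numpy as np

def mundlak_means(t_b):
    """Individual time averages as in equation (4).

    t_b: array of shape (K, Q, n_b) holding the strictly exogenous
    selection regressors. Returns an array of shape (K, n_b).
    """
    return t_b.mean(axis=1)

def add_mundlak_terms(t_b):
    """Append each individual's time average to every period's regressors."""
    K, Q, _ = t_b.shape
    tbar = mundlak_means(t_b)
    # Repeat the (K, n_b) averages across the Q periods.
    tbar_rep = np.repeat(tbar[:, None, :], Q, axis=1)
    return np.concatenate([t_b, tbar_rep], axis=2)
```

The augmented array can then be used wherever the selection index requires both the period-level regressors and their individual means.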
Let us define an additional indicator, $A_{fq} = a_{fq}\, a_{fq-1}$. Then, in the spirit of the classical literature on sample selection, we reformulate the model defined by equations (1)-(3) as
$$v_{fq} = \phi_0 v_{fq-1} + u_{fq}\beta_0 + \alpha_f + \lambda_{fq} + \tilde\varepsilon_{fq} \tag{5}$$
where
$$\lambda_{fq} = B[\varepsilon^*_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] \tag{6}$$
and, by construction, $B[\tilde\varepsilon_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] = 0$. The term $\lambda_{fq}$ is analogous to the selection term or inverse Mills ratio in the classical sample selection literature, but is defined conditional on $A_{fq}$ to account for the dynamics (the lagged dependent variables).
Indeed,
$$B[v^*_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] = \phi_0 v_{fq-1} + u_{fq}\beta_0 + \alpha_f + B[\varepsilon^*_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] \tag{7}$$
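When $A_{fq} = 1$, we have both $\gamma_0 a_{fq-1} + \gamma_1 v_{fq-1} + t_{fq}\delta + \eta_f + \nu_{fq} \ge 0$ and $\gamma_0 a_{fq-2} + \gamma_1 v_{fq-2} + t_{fq-1}\delta + \eta_f + \nu_{fq-1} \ge 0$. Noting this and assumption A1, we can write the selection bias term as
$$B[\varepsilon^*_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] = B\left[\varepsilon^*_{fq} \,\middle|\, \begin{array}{l} \gamma_0 a_{fq-1} + \gamma_1 v_{fq-1} + t_{fq}\delta + \eta_f + \nu_{fq} \ge 0, \\ \gamma_0 a_{fq-2} + \gamma_1 v_{fq-2} + t_{fq-1}\delta + \eta_f + \nu_{fq-1} \ge 0 \end{array}\right] \tag{8}$$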
Then, using assumptions A1 and A2, we obtain
$$B[\varepsilon^*_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] = \lambda\big(\gamma_0 a_{fq-1} + \gamma_1 v_{fq-1} + t_{fq}\delta + \psi'\bar t^{\,b}_f,\; \gamma_0 a_{fq-2} + \gamma_1 v_{fq-2} + t_{fq-1}\delta + \psi'\bar t^{\,b}_f\big) \tag{9}$$
This implies that, under the standard restrictions for semiparametric conditional moment models, we can now identify and estimate the remaining parameters of the model.
2.2 Moment Conditions
In order to allow for predetermined variables alongside exogenous ones, we now assume that the vector of right-hand-side variables $u_{fq}$, $f = 1, \dots, K$, $q = 1, \dots, Q$, may include time-invariant variables $w_f$ plus other strictly exogenous and predetermined variables.
Accordingly, let $u_{fq} \equiv (w_f, u^b_{fq}, u^m_{fq})$, where the components refer to time-invariant, strictly exogenous and predetermined variables respectively. For each category, we introduce the partitions $w_f \equiv (w_{1f}', w_{2f}')'$, $u^b_{fq} \equiv (u^b_{1fq}, u^b_{2fq})$ and $u^m_{fq} \equiv (u^m_{1fq}, u^m_{2fq})$, with the first subsets denoting the variables that are uncorrelated with $\alpha_f$.
Following Arellano and Bover (1995), these assumptions on the latent model lead to the following moment restrictions:
$$B[\varepsilon^*_{fq} \mid w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq}, v^*_{f1}, \dots, v^*_{fq-1}] = 0, \quad q \le Q \tag{10}$$
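Note that these conditions imply a lack of serial dependence. Let us see whether the equivalent condition holds in the transformed model:
$$\begin{aligned} & B[\tilde\varepsilon_{fq} \mid w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1, \alpha^*_f] \\ &\quad = B[\varepsilon^*_{fq} - \lambda_{fq} \mid w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1, \alpha^*_f] \end{aligned} \tag{11}$$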
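Using the law of iterated expectations, we have that
$$\begin{aligned} & B[\varepsilon^*_{fq} \mid w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1, \alpha^*_f] \\ &\quad = B\big[\, B[\varepsilon^*_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] \;\big|\; w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fq-1}, u^{*b}_{fq+1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq-1}, v^*_{f1}, \dots, v^*_{fq-2} \big] \\ &\quad = B[\lambda_{fq} \mid w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fq-1}, u^{*b}_{fq+1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq-1}, v^*_{f1}, \dots, v^*_{fq-2}] \end{aligned} \tag{12}$$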
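Then, substituting the definition of $\lambda_{fq}$ into the second term of equation (11), we obtain
$$\begin{aligned} & B[\lambda_{fq} \mid w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1, \alpha^*_f] \\ &\quad = B\big[\, B[\varepsilon^*_{fq} \mid A_{fq} = 1, v^*_{fq-1}, u^*_{fq}, \alpha^*_f] \;\big|\; w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fq-1}, u^{*b}_{fq+1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq-1}, v^*_{f1}, \dots, v^*_{fq-2} \big] \end{aligned} \tag{13}$$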
It is then clear that equation (11) is equal to equation (12) minus equation (13), which is zero. Finally, we have the following moment conditions, which relate to the transformed model:
$$B[\tilde\varepsilon_{fq} \mid w^*_f, u^{*b}_{f1}, \dots, u^{*b}_{fQ}, u^{*m}_{f1}, \dots, u^{*m}_{fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1, \alpha^*_f] = 0 \tag{14}$$
Then, as in the standard literature, we assume that the individual-specific effect is independent of at least a part of the independent variables. This can be formalized as
$$B[\alpha^*_f \mid w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fQ}, v^*_{f1}, \dots, v^*_{fq-1}] = 0 \tag{15}$$
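Combining equation (10) and equation (15) gives
$$B[\alpha^*_f + \varepsilon^*_{fq} \mid w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fq}, v^*_{f1}, \dots, v^*_{fq-1}] = 0 \tag{16}$$
or equivalently
$$B[v^*_{fq} - \phi_0 v^*_{fq-1} - u^*_{fq}\beta_0 \mid w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fq}, v^*_{f1}, \dots, v^*_{fq-1}] = 0 \tag{17}$$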
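As before, these moment conditions are not usable in their present form, since this is a latent model and the selection equation determines whether or not the variables are observed. Conditioning on selection, we have
$$\begin{aligned} & B[\alpha_f \mid w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fQ}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1] \\ &\quad = B\big[\, B[\alpha^*_f \mid w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fQ}, v^*_{f1}, \dots, v^*_{fq-1}] \;\big|\; A_{fq} = 1 \big] \\ &\quad = B[0 \mid A_{fq} = 1] = 0 \end{aligned} \tag{18}$$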
Combining equation (14) and equation (18), we obtain the feasible moment condition
$$B[\alpha_f + \tilde\varepsilon_{fq} \mid w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1] = 0 \tag{19}$$
or equivalently
$$B[v_{fq} - \phi_0 v_{fq-1} - u_{fq}\beta_0 - \lambda_{fq} \mid w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1] = 0 \tag{20}$$
There are other moment conditions that one could derive by placing other restrictions on the variance over time or on the possible pattern of serial correlation, but we choose not to pursue them here.
3 Estimation
As in the cross-section case, semiparametric estimation can involve relaxing different assumptions, to different degrees, relative to a fully parametric model. The aim of this paper is to make just enough (reasonable) assumptions to secure √K-consistency, although the resulting estimates are not as efficient as those of a fully specified parametric model. We use a two-step procedure, which is prevalent in the standard semiparametric literature.
The two-step procedure starts by specifying the parametric parts of the model and by ordering the whole sample of QK observations such that we have pairs of individual data with two consecutive periods of observed outcomes, i.e. $A_{fq} = 1$. Then, using a step-one estimator over the full sample of QK observations, we estimate the selection equation. Once estimates of the selection equation are obtained, a step-two estimator applied to the appropriate participation subsample supplies the estimates of the structural (outcome) equation. As is well known, many step-two estimators are able to identify only the slope coefficients of the structural equation. Therefore, an additional intercept estimator has to be applied to the participation subsample if an estimate of the intercept term of the structural equation is required. We do not pursue that estimate here and leave it for future work. As the step-one and step-two estimators are separate building blocks, they may be combined in different ways. However, as the properties of the estimates of the structural parameters depend on the attributes of both estimators and their combination, a sensible combination should be chosen. Thus, instead of specifying properties that the first-step estimator should possess, we propose one that has those properties.
3.1 Estimation of the Selection Equation
The first step in estimating the selectivity model is to analyze the selection assignment mechanism, through which non-random samples emerge and which introduces the selectivity bias into the structural equations. This connection between some explanatory variables and discrete assignment to one of a finite number of categories, here two, is captured by the binary selection equation (3) over the full sample of observations. The two versions of the model can be linked through the propensity score $M_{fq} \equiv B[a_{fq} \mid a_{fq-1}, v_{fq-1}, t_{fq}, \bar t^{\,b}_f] = M[a_{fq} = 1 \mid a_{fq-1}, v_{fq-1}, t_{fq}, \bar t^{\,b}_f]$.
In its most general form, the propensity score implied by equation (3) could be estimated nonparametrically by a multivariate kernel estimator as
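$$\hat B[a_{fq} \mid a_{fq-1}, v_{fq-1}, t_{fq}, \bar t^{\,b}_f] = \frac{\displaystyle\sum_{p=1,\,p \ne q}^{Q}\ \sum_{g=1,\,g \ne f}^{K} a_{gp}\, \frac{1}{e}\, H\!\left(\frac{T^c_{1fq} - T^c_{1gp}}{e}\right) 1\{T^a_{1fq} = T^a_{1gp}\}}{\displaystyle\sum_{p=1,\,p \ne q}^{Q}\ \sum_{g=1,\,g \ne f}^{K} \frac{1}{e}\, H\!\left(\frac{T^c_{1fq} - T^c_{1gp}}{e}\right) 1\{T^a_{1fq} = T^a_{1gp}\}} \tag{21}$$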
where $T_{1fq} \equiv (T^c_{1fq}, T^a_{1fq})$, $T^c_{1fq} \equiv (v_{fq-1}, t^c_{fq}, \bar t^{\,bc}_f)$ and $T^a_{1fq} \equiv (a_{fq-1}, t^a_{fq}, \bar t^{\,ba}_f)$, and where the superscript c (resp. a) denotes continuous (resp. discrete) variables. The parameter e is an appropriately chosen bandwidth (smoothing parameter). This estimator is consistent under only weak smoothness assumptions on $C_{r|a_{q-1}, v_{q-1}, t}$. Yet, as expected, this estimator’s generality does not come without shortcomings. The nonparametric estimates are susceptible to the curse of dimensionality, since the selection equation is likely to contain many regressors $T_{1fq}$, leading to imprecision that is particularly serious inasmuch as the succeeding estimation steps build on those estimates. Besides the low precision, the nonparametric approach is only able to identify $B[a_{fq} \mid a_{fq-1}, v_{fq-1}, t_{fq}]$ and not the distribution $C_{r|a_{q-1}, v_{q-1}, t}$, which contains useful information about the selection process. This latter shortcoming, however, may be acceptable if it is not the selection process itself that is of interest but only its impact on the structural equation. In that case the propensity score is sufficient for identification of the structural relationships; this is similar to the cross-section estimators of Ahn and Powell (1993) and Robinson (), which are based on nonparametric estimates of the selection equation.
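The kernel estimator in equation (21) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the array layout, the choice of a product Epanechnikov kernel (picked only because it is symmetric with bounded support), and the convention of returning NaN when no comparison observation falls in the same discrete cell within the bandwidth are all choices of the sketch. As in (21), observations from the same individual and from the same period are left out of both sums.

```python
import numpy as np

def epanechnikov(z):
    """Product Epanechnikov kernel over the last axis (an illustrative choice)."""
    k = 0.75 * (1.0 - z**2) * (np.abs(z) <= 1.0)
    return k.prod(axis=-1)

def propensity_kernel(a, Tc, Ta, e):
    """Kernel estimate of the selection propensity in the spirit of (21).

    a  : (K, Q) array of selection indicators
    Tc : (K, Q, dc) array of continuous conditioning variables
    Ta : (K, Q, da) array of discrete conditioning variables
    e  : bandwidth (scalar)
    Returns a (K, Q) array; entries are NaN where the denominator is zero.
    """
    K, Q = a.shape
    M_hat = np.full((K, Q), np.nan)
    for f in range(K):
        for q in range(Q):
            # Kernel weights on the continuous variables.
            w = epanechnikov((Tc[f, q] - Tc) / e) / e
            # Exact matching on the discrete variables.
            w = w * np.all(Ta == Ta[f, q], axis=-1)
            # Leave out the same individual and the same period, as in (21).
            w[f, :] = 0.0
            w[:, q] = 0.0
            denom = w.sum()
            if denom > 0:
                M_hat[f, q] = (w * a).sum() / denom
    return M_hat
```

The double loop costs on the order of (KQ)^2 operations, which is acceptable for a sketch; the curse of dimensionality mentioned in the text shows up here as a growing share of NaN or very noisy entries when `Tc` has many columns.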
The selection equation (3) can instead be estimated by leaving the conditional distribution $C_{r|a_{q-1}, v_{q-1}, t}$ unspecified and identifying the selection equation via some additional conditions on $\eta_f$ and $r_{fq}$. This class of estimators has as a common feature the single index assumption on $\eta_f$ and $r_{fq}$. As is the case in all binary choice models, for this model to be identified it must contain at least one continuous variable; here this is satisfied by construction, with $v_{fq-1}$ fulfilling that role. It also needs a scale normalization; here we set $\gamma_1 = 1$. With A2, the model becomes
$$a_{fq} = 1\{\gamma_0 a_{fq-1} + v_{fq-1} + t_{fq}\delta + \psi'\bar t^{\,b}_f + \xi_f + \nu_{fq} \ge 0\} \tag{22}$$
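Then, the selection propensity score $M_{fq} \equiv B[a_{fq} \mid a_{fq-1}, v_{fq-1}, t_{fq}, \bar t^{\,b}_f]$ becomes
$$M_{fq} = M[-(\xi_f + \nu_{fq}) \le \gamma_0 a_{fq-1} + v_{fq-1} + t_{fq}\delta + \psi'\bar t^{\,b}_f] \tag{23}$$
$$\phantom{M_{fq}} = C_q(\gamma_0 a_{fq-1} + v_{fq-1} + t_{fq}\delta + \psi'\bar t^{\,b}_f) \tag{24}$$
where $C_q(\cdot)$ is the cumulative distribution function of $-(\xi_f + \nu_{fq})$, assumed to be strictly increasing. Then, by inverting the distribution function, we have
$$\gamma_0 a_{fq-1} + \gamma_1 v_{fq-1} + t_{fq}\delta + \psi'\bar t^{\,b}_f = C_q^{-1}(M_{fq}) \tag{25}$$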
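The inversion step can be illustrated with a toy calculation. In the paper the distribution function is unknown; purely for illustration, the sketch below assumes it is logistic and treats the propensity score as known, so that applying the inverse distribution function to the propensity score returns the linear index exactly and a least-squares fit recovers the index coefficients. The parameter values, variable names and the logistic choice are all assumptions of the sketch.

```python
import numpy as np

def logistic_cdf(x):
    return 1.0 / (1.0 + np.exp(-x))

def logistic_quantile(p):
    # Inverse of the logistic distribution function.
    return np.log(p / (1.0 - p))

rng = np.random.default_rng(1)
n = 200
a_lag = rng.integers(0, 2, size=n).astype(float)   # lagged selection indicator
v_lag = rng.normal(size=n)                          # lagged observed outcome
t = rng.normal(size=n)                              # selection regressor

gamma0, gamma1, delta = 0.4, 1.0, -0.7              # illustrative values only
index = gamma0 * a_lag + gamma1 * v_lag + delta * t
M = logistic_cdf(index)                             # propensity score, treated as known

# Inverting the distribution function turns the binary choice problem
# into a linear relation between the index and the transformed score.
X = np.column_stack([a_lag, v_lag, t])
coef, *_ = np.linalg.lstsq(X, logistic_quantile(M), rcond=None)
```

With an unknown distribution function the transformed score is itself an unknown function of the propensity score, which is what motivates treating it as a nonparametric component.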
Making the standard identification restrictions from the semiparametric model and normalizing $\gamma_1$ to one, we obtain the following partial linear pseudo-regression:
$$v_{fq-1} = \tilde\gamma_0 a_{fq-1} + t_{fq}\tilde\delta + \tilde\psi'\bar t^{\,b}_f + C_q^{-1}(M_{fq}) \tag{26}$$
where $\tilde\delta = -\delta$, $\tilde\gamma_0 = -\gamma_0$ and $\tilde\psi = -\psi$. Let us also denote $\pi \equiv (\tilde\gamma_0, \tilde\delta', \tilde\psi')'$, $T_{fq} \equiv (a_{fq-1}, t_{fq}, \bar t^{\,b}_f)$ and $\mu_{fq} \equiv C_q^{-1}(M_{fq})$.
Assume first that $M_{fq}$ is known, and let $n^I(M_{fq}) \equiv (n_{1I}(M_{fq}), \dots, n_{II}(M_{fq}))'$ be a vector of approximating basis functions. Let us denote $n_{fq} \equiv n^I(M_{fq})$, $N^* \equiv (n_{11}, \dots, n_{1Q-2}, \dots, n_{K1}, \dots, n_{KQ-2})'$, $T \equiv (T_{11}, \dots, T_{1Q-2}, \dots, T_{K1}, \dots, T_{KQ-2})'$ and $\tilde v \equiv (v_{11}, \dots, v_{1Q-2}, \dots, v_{K1}, \dots, v_{KQ-2})'$. Then, using the Frisch-Waugh theorem, we obtain an infeasible estimator of $\pi$, denoted $\hat\pi^*$:
$$\hat\pi^* \equiv [T' J^*_m T]^{-1} [T' J^*_m \tilde v] \tag{27}$$
where $J^*_m \equiv F - N^*(N^{*\prime} N^*)^{-1} N^{*\prime}$ and $F$ denotes the identity matrix. Substituting the nonparametric estimator of $M_{fq}$ from equation (21) into equation (27) then turns the infeasible estimator into a feasible one.
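The partialling-out logic behind equation (27) can be sketched in a few lines. The sketch assumes a plain power series in the propensity score as the basis (the paper leaves the basis generic), takes the propensity score as given, and uses hypothetical function names. Rather than forming the residual-maker matrix explicitly, it residualizes both the regressors and the dependent variable on the basis, which yields the same estimator by the Frisch-Waugh theorem.

```python
import numpy as np

def power_basis(M, n_terms):
    """Columns 1, M, M^2, ..., M^(n_terms - 1); an illustrative basis choice."""
    return np.vander(M, N=n_terms, increasing=True)

def partial_linear_fit(v_lag, T, M, n_terms=4):
    """Estimate the linear part of v_lag = T @ pi + g(M) by partialling out a series in M.

    v_lag : (n,) dependent variable of the pseudo-regression
    T     : (n, k) regressors entering linearly
    M     : (n,) propensity scores (known, or first-step estimates)
    """
    N = power_basis(M, n_terms)

    def resid(Z):
        # Residual from projecting Z on the basis: (I - N (N'N)^{-1} N') Z.
        coef, *_ = np.linalg.lstsq(N, Z, rcond=None)
        return Z - N @ coef

    T_res = resid(T)
    v_res = resid(v_lag)
    pi_hat, *_ = np.linalg.lstsq(T_res, v_res, rcond=None)
    return pi_hat
```

In a feasible version, `M` would hold first-step kernel estimates and `n_terms` would grow slowly with the number of cross-section units.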
3.2 Estimation of the Outcome Equation
We present a general GMM framework for the estimation of the outcome equation. The estimation proceeds from equation (20). First, we denote $\lambda_{fq} = \lambda(\tilde M_{fq})$, where $\tilde M_{fq} \equiv (M_{fq}, M_{fq-1})$. Suppose for the moment that the $M_{fq}$'s were known. Then, following the literature on series estimators (see Andrews (1991) and Newey (1989)), we first approximate $\lambda_{fq}$ by a series of basis functions. That is, we let
$$\lambda_{fq} \equiv \sum_{p=1}^{\kappa_K} e_p(\tilde M_{fq})\, \varrho_p \tag{28}$$
where $\{e_p(\cdot);\ p = 1, 2, \dots\}$ is a set of basis functions and $\varrho^{(K)} \equiv (\varrho_1, \dots, \varrho_{\kappa_K})'$ are unknown parameters. This transforms equation (20) into
$$B\Big[v_{fq} - \phi_0 v_{fq-1} - u_{fq}\beta_0 - \sum_{p=1}^{\kappa_K} e_p(\tilde M_{fq})\, \varrho_p \ \Big|\ w^*_f, u^{*b}_{1f1}, \dots, u^{*b}_{1fQ}, u^{*m}_{1f1}, \dots, u^{*m}_{1fq}, v^*_{f1}, \dots, v^*_{fq-1}, A_{fq} = 1, t_{fq}, t_{fq-1}\Big] \approx 0 \tag{29}$$
for $\kappa_K$ large enough. Based on this, we can define a GMM estimator.
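A series approximation of the kind in equation (28) needs basis functions of two arguments, the current and the lagged propensity score. The sketch below uses bivariate power terms up to a given total degree. This basis, the function names, and the decision to leave out the constant term (on the assumption that a constant is handled separately from the selection term) are choices of the sketch, not of the paper.

```python
import numpy as np

def bivariate_power_basis(M_cur, M_lag, degree):
    """All products M_cur^i * M_lag^j with 1 <= i + j <= degree.

    M_cur, M_lag : (n,) arrays of current and lagged propensity scores
    Returns an (n, n_terms) array with n_terms = (degree+1)(degree+2)/2 - 1.
    """
    cols = [M_cur**i * M_lag**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)
            if i + j >= 1]
    return np.column_stack(cols)

def lambda_series(M_cur, M_lag, coef, degree):
    """Series approximation of the selection term for given coefficients."""
    return bivariate_power_basis(M_cur, M_lag, degree) @ coef
```

Raising `degree` as the number of cross-section units grows plays the role of letting the number of series terms increase with the sample size.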
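Let us now define some notation that will simplify the subsequent presentation. Let $\theta_0 \equiv (\phi_0, \beta_0')'$ denote the true parameter vector, which belongs to a compact subset $\Gamma$ of $R^{h+1}$, and let $U_{fq} \equiv (v_{fq-1}, u_{fq})'$. Define the following moment functions:
$$m_{1fqg}(\theta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq-g}\, v^*_{fq-g}\, \big(v_{fq} - U_{fq}'\theta - \lambda^{(K)}(\hat\pi' T_{fq}, \hat\pi' T_{fq-1})\big), \quad q = 2, \dots, Q,\ g = 1, \dots, q-1 \tag{30}$$
$$m_{2fqg}(\theta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq-g}\, u^{*b}_{1fq-g}\, \big(v_{fq} - U_{fq}'\theta - \lambda^{(K)}(\hat\pi' T_{fq}, \hat\pi' T_{fq-1})\big), \quad q = 2, \dots, Q,\ g = 1, \dots, q-1 \tag{31}$$
$$m_{3fqg}(\theta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq+g}\, u^{*b}_{1fq+g}\, \big(v_{fq} - U_{fq}'\theta - \lambda^{(K)}(\hat\pi' T_{fq}, \hat\pi' T_{fq-1})\big), \quad q = 2, \dots, Q,\ g = 1, \dots, Q-q \tag{32}$$
$$m_{4fqg}(\theta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq-g}\, u^{*m}_{1fq-g}\, \big(v_{fq} - U_{fq}'\theta - \lambda^{(K)}(\hat\pi' T_{fq}, \hat\pi' T_{fq-1})\big), \quad q = 2, \dots, Q,\ g = 1, \dots, q-1 \tag{33}$$
$$m_{5fq}(\theta, \lambda^{(K)}) \equiv A_{fq}\, a_{fq}\, w^*_f\, \big(v_{fq} - U_{fq}'\theta - \lambda^{(K)}(\hat\pi' T_{fq}, \hat\pi' T_{fq-1})\big), \quad q = 2, \dots, Q \tag{34}$$
These together form the vector of moment conditions.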
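Let us define $w_{fq}$ as a $1 \times n_q$ vector of instruments for period q. Note that, because of the nature of panel data models, the number of instruments differs across periods. For ease of notation, let us also write the moment conditions as
$$m_{fq}(\theta_0, \lambda(\cdot)) \equiv w_{fq}'\big(v_{fq} - \phi_0 v_{fq-1} - u_{fq}\beta_0 - \lambda(\hat\pi' T_{fq}, \hat\pi' T_{fq-1})\big) \tag{35}$$
For convenience, let $n = \max\{n_2, \dots, n_Q\}$ and define a $(Q-1) \times n$ matrix of instruments $W_f$ as
$$W_f = \begin{pmatrix} W_{f2} & 0 & \cdots & 0 \\ 0 & W_{f3} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{fQ} \end{pmatrix} \tag{36}$$
a $(Q-1) \times 1$ vector of dependent variables
$$V_f = (v_{f2}, \dots, v_{fQ})' \tag{37}$$
a $(Q-1) \times 1$ vector of lagged dependent variables
$$V_{f,-1} = (v_{f1}, \dots, v_{fQ-1})' \tag{38}$$
a $(Q-1) \times m$ matrix of independent variables
$$U_f = (u_{f2}', \dots, u_{fQ}')' \tag{39}$$
and a $(Q-1) \times 1$ vector of selection terms
$$\lambda(\hat\pi' T_f) = \big(\lambda(\hat\pi' T_{f2}, \hat\pi' T_{f1}), \dots, \lambda(\hat\pi' T_{fQ}, \hat\pi' T_{fQ-1})\big)'$$
Then, we have the following orthogonality conditions for estimating the outcome equation:
$$m_f(\theta, \lambda(\cdot)) \equiv W_f'\big(V_f - \phi V_{f,-1} - U_f\beta - \lambda(\hat\pi' T_f)\big), \quad \forall f = 1, \dots, K$$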
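The stacking of period-specific instruments and the resulting moment vector for one individual can be sketched as below. The helper names are hypothetical, and the sketch lets each period contribute its own block of columns, so the instrument matrix has n_2 + ... + n_Q columns; that layout is an assumption of the sketch. The selection terms are passed in as a precomputed vector.

```python
import numpy as np

def block_diag_instruments(w_rows):
    """Stack period-specific instrument vectors into a block-diagonal matrix.

    w_rows : list of 1-D arrays, one per period q = 2, ..., Q, possibly of
             different lengths because more lags are available later on.
    """
    n_total = sum(len(w) for w in w_rows)
    W = np.zeros((len(w_rows), n_total))
    col = 0
    for r, w in enumerate(w_rows):
        W[r, col:col + len(w)] = w
        col += len(w)
    return W

def moment_vector(W, V, V_lag, U, theta, lam):
    """Moment vector W'(V - phi * V_lag - U @ beta - lam) for one individual.

    theta stacks the lag coefficient first and the slope coefficients after it.
    """
    phi, beta = theta[0], theta[1:]
    resid = V - phi * V_lag - U @ beta - lam
    return W.T @ resid
```

For individuals with periods in which the outcome is not observed in two consecutive periods, the corresponding rows would be zeroed out, mirroring the factor A_fq in the moment functions.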
Let’s formally assume () ∈ Λ then the natural estimator is the GMM estimator
of the form:
inf_{∈Γ, ∈Λ} [ (1/K) Σ_{f=1}^{K} jf (' (T̂f )) ]′ Σ(U)K^{−1} [ (1/K) Σ_{f=1}^{K} jf (' (T̂f )) ] (40)
where Σ(U)K^{−1} is the weighting matrix, chosen such that plim_{K→∞} Σ(U)K^{−1} = Σ(U)^{−1}, a
nonstochastic matrix. Just as we did in the first stage, we want to replace this with
an approximation by a series estimator. To do so formally, we first define a set of
approximating functions ΛK, called a sieve, constructed so that it is dense in the
underlying parameter space, i.e. Λ ∈ limK →∞ ΛK. Then our estimator becomes
inf_{∈Γ, ∈ΛK} [ (1/K) Σ_{f=1}^{K} jf (' (T̂f )) ]′ Σ(U)K^{−1} [ (1/K) Σ_{f=1}^{K} jf (' (T̂f )) ] (41)
This is now a standard sieve estimator, as in Chen and Shen (1998), and is somewhat
similar to that of Ai and Chen (2001). However, it differs in several respects. First, unlike
Ai and Chen (2001), we do not have to estimate the conditional expectation
of jf (), since we have it by definition of our model.
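The criterion in (40)-(41) can be illustrated with a minimal numerical sketch. It assumes a linear outcome equation in which the selection term is replaced by a finite series expansion, so that the sieve GMM problem has a closed-form solution. The function names, the argument names (`y` for the dependent variable, `y_lag` for its lag, `x` for the regressors, `basis` for the series terms) and the use of NumPy are illustrative assumptions, not part of the paper.

```python
import numpy as np

def gmm_objective(params, y, y_lag, x, basis, instruments, weight):
    """Quadratic-form GMM criterion g_bar' W g_bar (hypothetical helper).

    params stacks the lag coefficient, the slope vector and the
    coefficients on the series (sieve) terms, in that order.
    """
    k_x = x.shape[1]
    gamma = params[0]
    beta = params[1:1 + k_x]
    pi = params[1 + k_x:]
    # Residual of the outcome equation with the series correction term.
    resid = y - gamma * y_lag - x @ beta - basis @ pi
    # Sample average of the moment functions (instruments times residual).
    g_bar = instruments.T @ resid / len(y)
    return float(g_bar @ weight @ g_bar)

def gmm_linear_solution(y, y_lag, x, basis, instruments, weight):
    """Closed-form minimizer of the criterion when the residual is linear."""
    regressors = np.column_stack([y_lag, x, basis])
    zr = instruments.T @ regressors / len(y)
    zy = instruments.T @ y / len(y)
    return np.linalg.solve(zr.T @ weight @ zr, zr.T @ weight @ zy)
```

In this sketch the weighting matrix plays the role of Σ(U)K^{−1}, and enlarging the number of columns of `basis` with the sample size corresponds to moving along the sieve ΛK.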
4 Asymptotic Properties of Selection Estimates
In this section, we derive the large sample properties of the estimators defined in the
previous section. Let ‖>‖ = [tr(>′>)]^{1/2} for a matrix >. Let Vf = (vf1, ..., vfQ )′,
Uf = (uf1, ..., ufQ )′, tf = (tf1, ..., tfQ )′, af = (af1, ..., afQ ) and Tf = (tf, af ). We
make the following assumptions:
Assumption 4.1: The vectors (af, Vf, Uf, Tf ) satisfying (??)-(3) are independently and identically distributed across i, with finite fourth-order moments for
each component. The cumulative distribution functions Cq (), q = 1, ..., Q, are strictly
monotonic.
Let %(T1fq | T1fq^a ) be the conditional density function of T1 given T1^a and
%0 (T1^a ) the probability density for T1^a.
Assumption 4.2: For each T1 = (T1, T1^a ) ∈ T1:
(i) %(T1 | T1^a ) is bounded away from zero.
(ii) %(T1 | T1^a ), 0 (T1, T1^a ) and Mq (T1, T1^a ) are continuously differentiable to
order p in T1 ∈ T1 for q = 1, ..., Q.
(iii) The number of points of the support of T1^a ∈ T1^a is finite.
Assumption 4.3: The kernel function h(r) has bounded support, is symmetric
and continuously differentiable, and is of order p:
$$\int h(r)\,dr = 1, \qquad \int r^{i} h(r)\,dr = 0 \ \text{ if } 0 < |i| < p, \qquad \int r^{i} h(r)\,dr \neq 0 \ \text{ if } |i| = p,$$
where $r^{i} = r_1^{i_1} r_2^{i_2} \cdots r_j^{i_j}$ for $r = (r_1, r_2, \ldots, r_j)$ and $|i| = i_1 + i_2 + \cdots + i_j$ for
$i = (i_1, i_2, \ldots, i_j)$, a vector of nonnegative integers.
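To make the order condition in Assumption 4.3 concrete, the following sketch checks the moments of one familiar fourth-order polynomial kernel in the univariate case (j = 1). The particular kernel, the function names and the use of Gauss-Legendre quadrature are illustrative assumptions; the paper does not state which kernel is used.

```python
import numpy as np

def kernel4(u):
    """Fourth-order polynomial kernel supported on [-1, 1] (illustrative choice)."""
    u = np.asarray(u, dtype=float)
    inside = np.abs(u) <= 1.0
    return np.where(inside, (15.0 / 32.0) * (3.0 - 10.0 * u**2 + 7.0 * u**4), 0.0)

def kernel_moment(kernel, power, n_nodes=20):
    """Integral of u**power * kernel(u) over [-1, 1] by Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    return float(np.sum(weights * nodes**power * kernel(nodes)))

# Moments of order 0 to 4: the kernel integrates to one, the moments of
# order 1 to 3 vanish, and the fourth moment is the first nonzero one.
moments = {power: kernel_moment(kernel4, power) for power in range(5)}
```

The quadrature is exact here because the integrand is a polynomial of low degree on the support of the kernel.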
Assumption 4.4: The bandwidth sequence $e_K$ is chosen such that $K^{1/2} e_K^{j} / \ln K \to \infty$ and $K e_K^{2p} \to 0$ as $K \to \infty$.
Let
PT T = B(T − B ∗ (T | M ))B(T − B ∗ (T | M ))0
(42)
Assumption 4.5:
The matrix PT T is nonsingular.
Let MT be a compact interval such that Mq (T ) ∈ MT for T ∈ T q = 1 Q
Assumption 4.6:
(i) Mq (T ) ∈ MT is continuously distributed with density bounded away from
zero for all T ∈ T q = 1 Q
(ii) (M ) is continuously differentiable of order fq max{p +1 p+1} and B ∗ (T |
M ) is continuously differentiable of order ju{p p}
Assumption 4.7: $K I^{-2p} \to 0$ and $K^{1/2} I^{5} [(K e_K^{j})^{-1} \ln K + e_K^{2p}] \to 0$ as $K \to \infty$.
Assumption 4.1 describes the model and the data. Assumption 4.2 contains some
smoothness and boundedness conditions on the distribution of the regressors in the
selection equation. Assumption 4.3 states that the kernel function used in the estimation of the selection propensity score is of order p; this is standard
in the nonparametric and semiparametric literature. This assumption, along with
Assumption 4.1 and the requirement on the bandwidth sequence in Assumption 4.4,
ensures a fast rate of uniform convergence of the selection propensity scores M̂fq
and the existence of an asymptotic linear representation of certain weighted averages
of M̂fq − Mfq. Assumption 4.5, along with the monotonicity and stationarity conditions
of Assumption 4.1, gives the identification conditions for and (0 ) respectively. Assumption 4.5 (i) rules out any deterministic relationship between (Mfq ) and Tfq. The
boundedness and smoothness conditions in Assumption 4.6 are useful for controlling the bias of the series estimators. Assumption 4.7 restricts the rate of growth
of the number of terms I in the series approximation, taking the first step kernel
estimation into account.
Although we could estimate the model without invoking the single index assumption, we derive the asymptotic properties of the model only under the single
index assumption. In reality this is a three step procedure, but the first two steps
are treated as one step for convenience.
Theorem 4.1: Under Assumptions 4.1-4.7:
(i) ‖bK − 0 ‖ = lm(1)
(ii) √K (bK − 0 ) ⇒ K(0, S )
where
(a) S ≡ PTT^{−1} S1 PTT^{−1}
(b) S1 = B[8 1f 8′1f ]
(c) 8 1f ≡ Σ_{q=2}^{Q} (Tfq − B∗ (Tfq | Mfq ))(Ψa1fq 9 fq )′ fq
(d) 9 fq ≡ afq − Mfq
(e) Ψa1fq = //Mfqfq
Proof. See Technical Appendix.
5 Asymptotic Properties of the Structural Estimates
Recall
jf (' (Tf 0 ) Vf Uf ) ≡ W′f (Vf − Vf−1 − Uf − ( Tf Tf−1 )) (43)
B [jf ('0 0 (Tf 0 ) Vf Uf )] = 0 (44)
There exists bK such that bK → 0 at a rate faster than √K.
Then B[jf (' (Tf bK ) Vf Vf−1 Uf Wf )] = 0, where ≡ (' ) ∈ Υ ≡
Γ ⊗ Λ, Γ is a finite dimensional compact subset of R^{H+1} and Λ is an
infinite dimensional space. Let ΥK ≡ Γ ⊗ ΛK be a sequence of approximating spaces
such that {ΥK } is dense in Υ as K → ∞; that is, for any ∈ Υ there exists ΠK ∈
ΥK such that a ( ΠK ) → 0 as K → ∞, where a is a pseudo distance. Next we
formally define the second stage estimator. We first introduce some definitions, then
present a set of sufficient conditions for consistency and asymptotic normality. Let
≡ ( ) and 0 ≡ (0 00 0 ). We define the (pathwise) directional derivative
of jf (' ) with respect to , evaluated at 0 in the direction [ − 0 ], as
j0 [ − 0 ] = −W′f Vf−1 [ − 0 ] − W′f Uf [ − 0 ] − W′f [ − 0 ] (45)
and for any 1, 2, we define a Fisher-like metric ‖1 − 2 ‖ as
‖1 − 2 ‖ = √( B{j0 [ − 0 ]}′ Σ(U)^{−1} B{j0 [ − 0 ]} ) (46)
Here our pseudo distance a will be the distance induced by the Fisher-like metric.
Definition 1: b is √;K-consistent (under the Fisher-like metric) for 0 if
√;K ‖b − 0 ‖ → 0 in probability, denoted as ‖b − 0 ‖ = lm(1/√;K ).
Assumption 5.1:. (Uf Tf ) ∈ a compact set with non-empty interior.
Assumption 5.2: B[jf (' )] = 0 holds iff = 0 a sufficient condition for
this to hold is
1- B [Wf0 ([Vf−1 Uf ] − B [(Vf−1 Uf ) |Tf Tf−1 ])] is nonsingular, i.e. for any < 6=
0 there is no measurable function c(Tf Tf−1 ) such that c(Tf 0 Tf−1 0 ) =
[Vf−1 Uf ]0 < This is similar to the condition imposed by Newey (1999) and is the
selection instrumental variable version of Robinson's (1988) identification condition
for additive semiparametric regression.
Assumption 5.3:(i) Σ(U)K = Σ(U) + lm(; K ) uniformly over (Uf Tf ) ∈
(ii) there exist some positive constant 1 and 2 such that 1 ≤ jfj (Σ(U)) ≤
max (Σ(U)) ≤ 2 for all (Uf Tf ) ∈ (iii) there exist some positive constant
1 and 2 such that 1 ≤ jfj (Σl (U)) ≤ max (Σ(U)) ≤ 2 for all (Uf Tf ) ∈ (iv)
for some positive value max (S o{j0 [ΠK >∗ ] ≤ Assumption 5.4: (i) B[Wf0 Uf ] 3 ∞ ,B[Wf0 Vf−1 ] B[Wf ] 3 ∞ for all , 2and
sup
so [j (
0)
{∈ΥK :k 0 −k≤}
− j (0 0 )]
≤ ∂1 2
(47)
for all small , 0
Assumption 5.5: (i) For any ∈ ΥK , there exists ΠK ∈ ΥK such that
k − ΠK k = l(; K ) as K → ∞
p
Denote zK ≡ {j[]0 Σ(U)−1 j[] : ∈ ΥK , k − 0 k ≤ ; K and κK ∈ (0 1]
as a measure of the size of the sieve space:
½
¾
Z κp
√
1
log K(zK B)aB ≤ L( K)
(48)
κK = inf κ ∈ (0 1] : 2
κ
κ κ2
where K(zK B) is the minimum balls of radius B required to cover zK under the
Fisher-like metric. (ii) κK = l(; K )
Here we are using the I2 metric entropy with bracketing to measure the size
of a space. Denote kk2 as the I2 −norm on z for an element in z. Let L2 be
T ,
the completion of z under kk2 For any given
°
° 0, if there exists P (T K) =
ª
© i r
° r
°
i
r
i
e1 e1 eK eK ⊂ L2 such that max °eg − eg ° ≤ T , and if for any e ∈ V
there exists a g ∈ {1 K} with
eig
1≤g≤K
≤e≤
K (T V ) = log (min {K : P (T K)})
erg
2
a e M , then
(49)
is defined as the bracketing I2 −metric entropy of the space z
Assumption 5.1 restricts the conditioning regressors (Uf Tf ) to be bounded. This
is not necessary for the results that follow, since one can always trim large values,
as was done in the first stage. Assumption 5.2 is a global identification condition;
it can be stated in terms of more primitive conditions, which we are working
on at the moment. Assumption 5.3(i) requires that the estimator of the weighting
matrix converge to the weighting matrix uniformly over the regressors at a rate
faster than ; K. This assumption is not very restrictive and is satisfied by
many estimators; for example, the identity weighting matrix satisfies this condition. Assumption 5.3(ii) requires the weighting matrix to be bounded above and
below. These are standard assumptions in the econometric literature; for example,
they are commonly found in the weighted least squares literature. Assumption
5.4 imposes bounded moment conditions, which are standard in econometric models.
Assumption 5.5 contains conditions related to the sieve approximation of 0 ∈ Λ by
ΠK 0 ∈ ΛK. Assumption 5.5(i) requires that the sieve approximation error
shrink to zero at a rate faster than ; K, while Assumption 5.5(ii) requires that the
size of the sieve space ΛK not be too large. See, for example, Fenton and
Gallant (1996) for Hermite polynomials and Newey (1997) for splines and
power series.
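The trade-off behind Assumption 5.5 can be seen in a small sketch: a power-series sieve with more terms approximates a smooth function more closely, at the cost of a larger sieve space. The target function below is a hypothetical smooth function chosen only for illustration, and the helper names are assumptions rather than anything defined in the paper.

```python
import numpy as np

def power_series_basis(index, n_terms):
    """Columns index**1, ..., index**n_terms of a simple power-series sieve."""
    index = np.asarray(index, dtype=float)
    return np.column_stack([index**p for p in range(1, n_terms + 1)])

def sieve_fit_error(target, index, n_terms):
    """Maximum absolute error of the least-squares fit with n_terms series terms."""
    basis = np.column_stack([np.ones_like(index), power_series_basis(index, n_terms)])
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return float(np.max(np.abs(target - basis @ coef)))

grid = np.linspace(0.05, 0.95, 200)
# Hypothetical smooth "selection correction" function of a scalar index.
target = np.exp(-grid) / (1.0 + grid)
errors = [sieve_fit_error(target, grid, n_terms) for n_terms in (1, 3, 5)]
```

In this example the approximation error falls quickly as terms are added, which is the behaviour that the rate condition on the sieve approximation error presupposes for smooth functions.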
Theorem 5.1 Under Assumptions 4.1 and 4.1-4.5 then
´´
³
³p
; K k0 ΠK 0 k
kb
K − 0 k = Lm max
Proof. See Appendix.
Let Λ ≡ Γ × Λ̃ denote the linear completion of the space Γ × Λ under the Fisher-like
metric ‖·‖, and let ⟨·, ·⟩ denote the inner product induced by the norm ‖·‖ on
Λ. A linear functional c : Λ → R is bounded (i.e. continuous) if and only if
sup_{‖ − 0 ‖>0, ∈Λ} |c() − c(0 )| / ‖ − 0 ‖ < ∞ (50)
Then, following Chen and Shen (1998) among others, we use the Riesz representation
theorem, which states that for any bounded linear functional c : Λ → R there
exists a representor >∗ such that
‖>∗ ‖ ≡ sup_{‖ − 0 ‖>0, ∈Λ} |c() − c(0 )| / ‖ − 0 ‖ < ∞ (51)
Under some weak conditions, for any linear functional c (), one can establish the following link between c(b ) − c(0 ) and the pathwise directional derivatives of
the sample criterion function:
√K (c (b ) − c( 0 )) = −(1/√K) Σ_{f=1}^{K} jf 0 [> ∗ ]′ Σ(Uf )^{−1} jf (0 ) + lm(1) (52)
Then the Lindeberg-Levy central limit theorem implies that √K (c (b ) − c( 0 )) is
asymptotically normally distributed with mean zero and variance S o(c):
S o(c ) = B{jf 0 [>∗ ]′ Σ(Uf )^{−1} Σ0 (U) Σ(Uf )^{−1} jf 0 [>∗ ]} (53)
with Σ0 (U) ≡ S o[jf (0 )]. We then apply this approach to derive the asymptotic
distribution of √K (b' − ' 0 ). For any fixed non-zero ; ∈ R^{h+1}, consider c() = ;′'.
We need to provide sufficient conditions to ensure that (a) c() = ;′'
is bounded, so that we can compute its corresponding representor >∗ = (> ∗ >∗ ) ∈ Λ
≡ Γ × Λ̃ and jf 0 [>∗ ] in the variance formula, and (b) the linkage equation (52) holds.
f 0
f ∗−0 . Note that W
f can be exTo simplify notation let W = ∗ − 0 and W=e
pressed as a linear combination of itself. Hence, the space is the same as the linear
combination of itself, and we can always replace − 0 by −B(' − '0 ) in equation
45. Let AB(U) the matrix valued function
AB(U) = −Wf0 Vf−1 [ − 0 ] − Wf0 Uf [ − 0 ] − Wf0 B[' − ' 0 ]
(54)
Then we have
j0 [ − 0 ] = AB(U)(' − ' 0 )
(55)
and Fisher-like norm
k − 0 k = (' − ' 0 )0 B[AB(U)0 Σ(Uf )−1 AB(U)](' − ' 0 )
(56)
Also note that B[AB(U)0 Σ(Uf )−1 AB(U)] is quadratic in B, therefore there exists a
Q
f such that
B ∗ ≡ (B ∗1 B ∗! ) ∈ !g=1 W
(57)
B[AB(U)0 Σ(Uf )−1 AB(U) − B[A"∗ (U)0 Σ(Uf )−1 A"∗ (U)]]
Q! f
is positive semi-definite for any B ≡ (B 1 B ! ) ∈ g=1 W Then for c() = ; 0 '
we have
k>∗ k ≡
|c() − c(0 )|
= ; 0 B[A"∗ (U)0 Σ(Uf )−1 A" ∗ (U)]−1 ;
k
−
k
k− 0 k,0∈Λ
0
sup
(58)
Thus, c() = ;′' is bounded if and only if B[A"∗ (U)′ Σ(Uf )^{−1} A"∗ (U)] is finite
positive-definite. Given this, we can find the Riesz representor >∗ = (> ∗ > ∗ ) ∈ Λ
≡ Γ × Λ̃ for the bounded linear functional c () = ;′' as:
> ∗ = (B[A"∗ (U)′ Σ(Uf )^{−1} A"∗ (U)])^{−1} ; (59)
> ∗ = −B ∗ () (B[A"∗ (U)′ Σ(Uf )^{−1} A"∗ (U)])^{−1} ; (60)
Hence
jf 0 [> ∗ ] = A"∗ (U) (B[A"∗ (U)′ Σ(Uf )^{−1} A"∗ (U)])^{−1} ; (61)
Assuming that the linkage equation (52) holds, and substituting (61) into the general
variance formula, we obtain that √K (;′b' − ;′'0 ) is asymptotically normally distributed with
mean zero and variance ;′S^{−1} ;, where
S^{−1} ≡ (B[A"∗ (U)′ Σ(Uf )^{−1} A"∗ (U)])^{−1} × (B{A"∗ (U)′ Σ(U)^{−1} Σl (U) Σ(U)^{−1} A"∗ (U)}) × (B[A"∗ (U)′ Σ(Uf )^{−1} A"∗ (U)])^{−1} (62)
Since ; is arbitrary, √K (b' − ' 0 ) is asymptotically normally distributed
with variance S^{−1}. Note that Σl (U) is the variance of the moment condition, which
takes into consideration the correction for the first stage.
Theorem 5.2: Under Assumptions 5.1 and 6.1-6.5, √K (b' − '0 ) ⇒ K(0, S^{−1}).
Proof. See Appendix.
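The variance in (62) has the usual sandwich form, which the following sketch computes for given matrices. The argument names (`jacobian` for the derivative term, `weight` for the inverse weighting matrix Σ(U)^{−1}, `moment_cov` for the variance of the moment condition) are illustrative assumptions; in practice each would be replaced by a sample analogue.

```python
import numpy as np

def sandwich_variance(jacobian, weight, moment_cov):
    """Sandwich variance (G'WG)^{-1} G'W Omega W G (G'WG)^{-1} (hypothetical helper)."""
    bread = np.linalg.inv(jacobian.T @ weight @ jacobian)
    meat = jacobian.T @ weight @ moment_cov @ weight @ jacobian
    return bread @ meat @ bread
```

When the weighting matrix equals the inverse of the moment variance, the three factors collapse to a single inverse, which is the familiar efficient-GMM simplification.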
6 Conclusion
In this paper, we consider the problem of identification and estimation in panel
data sample selection models with a binary selection rule when the latent equations may
contain predetermined variables, lags of the dependent variables and additive unobserved individual effects. The selection equation contains lags of its own dependent
variable and lags of the outcome dependent variables, along with individual effects.
Under the assumptions of stationarity and strict monotonicity of the selection equation
error distribution function, we derive a set of conditional moment restrictions,
which are used to construct semiparametric GMM estimators that are
√K-consistent and asymptotically normal under a set of mild regularity conditions. An advantage of this approach is that it does not require any assumptions on
the parametric form of the distribution of the unobservables conditional on the observed covariates and the initial conditions. It also estimates a more general model
than those currently available in the panel data literature, one which
has many uses.
Finally, unlike previously proposed estimation procedures, it is √K-consistent. At
present we are designing Monte Carlo experiments, which will be used to study the
finite sample performance of our estimator.
7 Technical Appendix
7.1 Proof of 1st Stage Estimator
Let denote a generic constant, which may take on different values in different
situations. Let
° q
°
° # I
°
; q (I) = sup °/ nq (M )° 0 q (Tq )
|#q |=
where: £
¤
I (M ) 0
nqI (M ) = n1I (M1q ) nK
Kq
I (M ) where n I (M ) = >n I (M ) where > is nonsingular since a
and nqI (M ) = nq∗
q∗
q
nonsingular transformation would not affect nonparametric estimates
Mq = (M1q MK q )
Eq = (Eq Eq−1 ) and Eq and Eq−1 are two nonnegative integers and
q
q
/ # nqI (M ) =
/ |# | nqI (M )
/Mq#q
Also, for a measurable function c (Mq ) and a nonnegative integer r, let
¯ q
¯
¯
¯
|c (Mq )|r = max ¯/ # c (Mq )¯ 0 q (T )
q
|# |≤r
Mq ¯ ¯
q
and |c (Mq )|r equals to infinity if / # c (Mq ) does not exist for some ¯Eq ¯ ≤ r and
Let min (F) and max (F) denote the minimum and the maximum eigenvalues
of a symmetric matrix F
Let
0 f = 0 f1 0 f2 0 fQ −2
Lemma 1 Under Assumptions 1, 2, 3, 4, we have
¯
¯
³
´
¯
¯
sup ¯Mbfq − Mfq ¯ 0 f = Lm (ln K)1.2 (Kej )−1.2 + ep
f
and
´
³
Mbfq − Mfq 0 f =
1
(K −1)ej
K
Q P
P
q6=g g=f
(agq − Mfq ) H
³T
1fq −T1gp
eK
´
Q
¡ ¢ ¡
¢
P
a % Ta
% T1fq
|T1fq
0
1fq
q6=g
uniformly in f for q = 1 Q
1{T a =T a }
1fq
1gq
³
´
+Lm K 1.2
Proof. The details are omitted here. The interested reader is referred to Newey
(1994) or Chen (1998)
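For intuition about the first-step object in Lemma 1, the sketch below computes a leave-one-out kernel estimate of the selection probability, smoothing over one continuous regressor and matching exactly on a discrete one. It uses a second-order Epanechnikov kernel purely for simplicity, whereas the paper requires a kernel of order p; the function names and arguments are illustrative assumptions.

```python
import numpy as np

def epanechnikov(u):
    """Second-order Epanechnikov kernel on [-1, 1] (illustrative choice)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def loo_kernel_probability(selected, cont, disc, bandwidth):
    """Leave-one-out kernel estimate of P(selected = 1 | cont, disc)."""
    selected = np.asarray(selected, dtype=float)
    cont = np.asarray(cont, dtype=float)
    disc = np.asarray(disc)
    # Kernel weights in the continuous regressor, restricted to observations
    # that share the same value of the discrete regressor.
    diff = (cont[:, None] - cont[None, :]) / bandwidth
    w = epanechnikov(diff) * (disc[:, None] == disc[None, :])
    np.fill_diagonal(w, 0.0)  # leave the own observation out
    denom = w.sum(axis=1)
    numer = (w * selected[None, :]).sum(axis=1)
    # Observations with no neighbours inside the bandwidth get NaN.
    return np.divide(numer, denom, out=np.full_like(numer, np.nan), where=denom > 0)
```

Shrinking `bandwidth` reduces the smoothing bias but leaves fewer neighbours per observation, which is the tension that the bandwidth conditions are meant to balance.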
Lemma 2 Under Assumptions 1 to 4, and supposing that (T ) is continuously differentiable in T1fq of order p , then
K
K
´
³
1 X
1 X
√
(Tfq ) Mbfq − Mfq 0 f = √
(Tfq ) 0 f >fq + lm (1)
K f=1
K f=1
Proof. The details are omitted here. The reader is referred to Chen (1998).
The following useful results are adapted from Newey (1993). If Assumption
7 holds, then for the polynomial series-based approximating functions n I (M ) and
eh (M ) specified above, we have:
a- For each I and h, there are nonsingular matrices > and F such that n∗I (M ) =
>nI (M ) and eh∗ (M ) = Feh (M ), and the smallest eigenvalues of B[n∗I (M ) nI (M )′ 0 (T )]
and B[eh∗ (M ) eh (M )′ 0 (u)] are bounded away from zero uniformly in I and H respectively.
b- For each I and H respectively, there are ( T ) and ( u ) such that:
¯
¯
¯
¯
¯ (M ) − eh∗ (M ) 0 ¯ 0 (T ) 3 h−p
0
¯
¯
¯ (M ) − eh∗ (M )
and
¯
¯ (M ) − n∗0I (M )
¯
¯ (M ) − n∗0I (M )
¯
¯
−p+1
0 ¯ 0 (T ) 3 h
1
¯
¯ 3 I−p
0
¯
¯ 3 I−p+1
1
°
°
c- For each nonnegative a, max °/ r n∗I (M )° 0 (T ) ≤ I1+2a
|r|=a
Proof (of Theorem 4.1). Following Newey, Powell and Vella (2002), we can write
n I (M ) = n∗I (M ), since a nonsingular transformation does not affect nonparametric estimates based on a power series expansion.
Let P = B [0 f nf nf ], where nf is the individual element of nI (M ); it then suffices
to prove our results with P = F.
Recall that
¢ ¡
¢
¡
b = T 0 Jm T −1 T 0 Jm ve
¢
¡
b = T 0 Jm T −1 T 0 Jm (T + )
where
=(
1 0 1 K0K)
= C −1 (M )
¢
¡
b = + T 0 Jm T −1 T 0 Jm
¢
¡
b − = T 0 Jm T −1 T 0 Jm
and
°
°
° °¡
¢−1 0
°
°b
° °
0
J
T
T
J
−
=
T
°
° °
m
m °
Since ‖·‖ is a continuous operator, we can use the continuous mapping theorem,
and it is sufficient to show that (T′Jm T )^{−1} T′Jm → 0 in probability. By the Slutsky
theorem, it suffices that (i) ((1/K) T′Jm T )^{−1} → PTT^{−1} in probability and (ii) (1/K) T′Jm → 0 in probability.
(i) Following Chen (1989), we can show that
1
1 0
0
∗ ∗−1 ∗0
T Jm T = T 0 Jm∗ T − (Π1 Π−1
2 Π1 − Π1 Π2 Π1 )
K
K
where
Π1 = T N.K, Π∗1 = T N∗ .K, Π2 = N0 N.K, Π∗2 = N∗0 N∗ .K with N∗ =
∗
∗ 0 ) and J ∗ = (F − N∗ (N∗0 N∗ )N∗0 )
(n1 0 1 n2∗ 0 2 nK
K
m
∗ = Π∗ Π∗−1
Also, Π3 = Π1 Π−1
and
Π
3
1 2
2
From the Cauchy inequality and Lemma 3,
°
°
K
°1 X
°
°
°
Tf nf0 0 f °
kΠ1 k = °
°K
°
f=1
!1.2 Ã
!1.2
Ã
K
K
X
1
1 X
kTf k2 0 f
knf k2 0 f
≤
K
K
f=1
f=1
³
´
= Lm I1.2
since
1
K
K
P
f=1
kTf k2 0 f = Lm (1) and from Lemma 3,
1
K
K
P
f=1
knf k2 0 f = Lm (I) In the same way, from Lemma 1 and 3,
°
°
K
°1 X
°
°
°
Tf (nf − nf∗ )0 0 f °
kΠ1 − Π∗1 k = °
°K
°
f=1
!1.2 Ã
!1.2
Ã
K
K
1 X
1 X
2
∗ 2
kTf k 0 f
knf − nf k 0 f
≤
K
K
f=1
f=1
= Lm (G 1 )
¶
µ
with G 1 = max ; q1 G 0 and G 0 = (ln K)1.2 (Kej )−1.2 + ep
1≤q≤Q
Moreover,
kΠ1 − Π∗1 k ≥ kΠ1 k − kΠ∗1 k
hence
³
´
kΠ1 k ≤ kΠ∗1 k + Lm (G 1 ) = Lm I1.2
Similarly, we have that
K
1 X
∗
kΠ2 − Π2 k ≤
knf − nf∗ k2 0 f + K
f=1
Ã
K
1 X
knf − nf∗ k2
K
f=1
kΠ2 − Π∗2 k = Lm (G 2 )
µ
¶2
µ
¶
2
1.2
max ; q1 G 0
with G 2 = max ; q1 G 0 + I
1≤q≤Q
!1.2 Ã
K
1 X ∗ 2
knf k
K
f=1
!1.2
1≤q≤Q
From Lemma 4 and the proof of Lemma 5 in Newey (1993), and the results stated above, one can show that, with probability approaching one:
° −1
°
°
¡
¢ °
°Π − Π∗−1 ° = °Π−1 Π∗−1 − Π−1 Π∗2 °
2
2
2
2
2
¡
¢
¡
¢ ∗
max Π∗−1
kΠ2 − Π2 k
≤ max Π−1
2
2
= Lm (G 2 )
which implies that
1-
2-
°
°
°
¡
¢°
°(Π1 − Π∗ ) Π−1 − Π∗−1 ° ≤ kΠ1 − Π∗ k °Π−1 − Π∗−1 °
1
1
2
2
2
2
= Lm (G 1 G 2 )
°
°
∗ ∗−1 °
kΠ3 − Π∗3 k = °Π1 Π−1
2 − Π1 Π2
¡
¢
¢
¡
+ (Π°1 − Π∗1 ) Π−1
+ Π∗1 Π−1
Π∗−1
Π∗−1
with Π3 − Π∗3 = (Π¡1 − Π∗1¢) Π∗−1
2
2 −
2
2 −
2
°
°
°
∗ k+kΠ∗ k °Π−1 − Π∗−1 °+kΠ − Π∗ k °Π−1 − Π∗−1 °
kΠ
kΠ3 − Π∗3 k ≤ max Π∗−1
−
Π
1
1
1
1
1
2
2
2
2
2
= Lm (G 3 ) with G 3 = G 1 + I1.2 G 2 + G 1 G 2 3°
°
°
kΠ∗3 k = °Π∗1 Π∗−1
2
¡
¢
∗
≤ kΠ1 k max Π∗−1
2
¡
¢
= Lm I1.2
¡
¢
kΠ3 k ≤ kΠ3 − Π∗3 k + kΠ∗3 k = Lm I1.2
Similarly,
∗
∗
∗
∗
∗
∗
kΠ3 Π01 − Π∗3 Π∗0
1 k ≤ k(Π3 − Π3 ) Π1 k + kΠ3 (Π1 − Π1 )k + k(Π3 − Π3 ) (Π1 − Π1 )k
= Lm (G 4 ) = lm (1)
1.2
where G 4 = G 3 I + G 1 I1.2 + G 1 G 3
Therefore,
1 0
1
T Jm T = T 0 Jm∗ T − lm (1)
K
K
By the Law of Large Numbers,
£
¤
1 0 ∗
m
T Jm T → B (T − B ∗ [T |M ])0 (T − B ∗ [T |M ]) = PT T
K
(ii)
°
³ ´°
°
°1 0
° T Jm Mb ° = lm (1)
°
°K
°
°
³ ´°
h ³ ´
i°
°
°
°
°1 0
b
b
° T Jm Mb ° = ° 1 T 0 Jm
M −N °
°
°K
°
°K
°
° °³ ³ ´
´°
° 1 0 °°
b − Nb °
°°
≤ °
M
J
T
°
m
°K
°
°1.2 ° ³ ´
°
°
° °
°
−1.2 ° 1
0
° ° Mb − Nb°
= K
J
T
T
°
m
°
°K
= K −1.2 I−p lm (1) → 0 as K → +∞
Hence,
°
°
°b
°
−
°
° = lm (1)
Next, we show that
¶−1
³ ´
´ µ1
√ ³
1
0
√ T 0 JMb
Mb
K b−
T JMb T
=
K
K
³ ´
Mb converges in distribution to a
So, now it suffices to show that √1K T 0 JM
normal random variable with mean zero. We already have that:
³ ´
³
´
1
1
1
√ T 0 Jm Mb = √ T 0 Jmb (M ) + √ T 0 Jmb a Mb − M Mb − M
K
K
K
³
³ ´´
1
1
√ T 0 Jm (M ) = √ T 0 Jmb
(M ) −
Mb + lm (1)
K
K
³ ´
Mb around M
Let ∗ia be the second term, that is a mean value expansion of
Then,
°
°
°1 0
°
° T Jm (M ) + 1 T 0 Jmb ∗ia ° = lm (1)
°K
°
K
Therefore,
µ
1
1
0
√ T Jmb (M ) = − √ T 0 Jmb
K
K
¡
¢
Note √1K T 0 Jmb − Jm ∗ia Let
∗
ia
¡
¢
1
− √ T 0 Jmb − Jm
K
∗
ia
¶
¡
¢
1
P21 = √ T 0 Jmb − Jm 1bm
K
¡
¢
1
P22 = √ T 0 Jmb − Jm 2bm
K
For P21 , we have:
´
1 ³
c0 − Π∗1 Π−1 N0 1bm
Π1 Π−1
N
P21 = √
2
2
K
·
³
´0
¢
1 ¡
−1 c0
−1 b
Π1 Π−1
N
−
N
= √
−
Π
Π
+
Π
Π
N
1
1
1b
m
2
2
2
K
+ lm (1)
¸
1b
m
° 0
°N
°
K Q
−2
°X
X
°
°
°
= °
nfq
1b
m
°
°
³
´ °
°
b
a1fq Mfq−1 − Mfq−1 0 fq °
°
f=1 q=2


¶
µ
°³
´°
°
°
≤
max ; q0 (I) K (Q − 2)  max ° Mbfq−1 − Mfq−1 ° 0 fq 
1≤q≤Q −2
= Lm
and
°³
´0
°
b−N
° N
°
µµ
¶
¶
max ; q0 (I) KG 0
and
1≤f≤K
1≤q≤Q −2
° K Q −2
°
°
°X X
³
´ °
°
°
°
°
(b
nfq − nfq ) a1fq Mbfq−1 − Mfq−1 0 fq °
1Mb ° = °
°
°
f=1 q=2
¶
µ
°³
´°
°
°
≤
max ; q2 (I) K(Q − 4) max ° Mbfq−1 − Mfq−1 ° 0 fq
2≤q≤Q −2
= Lm
Therefore,
°
°c0
°N
1≤q≤Q −2
µµ
1≤f≤K
¶
¶
2
max ; q1 (I) KG 0
2≤q≤Q −2
°³
°
°
´0
°
°
° ° 0
°
b
°
° + °N b °
N
−
N
≤
°
b
b
°
1M °
1M
1M
µ µµ
¶
µ
¶ ¶¶
max ; q0 (I) G 0 +
max ; q1 (I) G 20
= Lm K
2≤q≤Q −2
2≤q≤Q −2
µµ
¶
µ
¶ ¶ 

1
G K
max ; q0 (I) G 0 +
max ; q1 (I) G 20 
kP21 k = Lm  √  3
2≤q≤Q −2
2≤q≤Q −2
K
+I1.2 K; (I) G 2

q1
0
= lm (1)
Similarly, we can show that P22 = lm (1), hence P2 = lm (1) We now consider P1 Note that
P1 = P11 + P12 + P13
where
1
P11 = √ (T − B ∗ (T |M ))0
K
Mb
1
P12 = √ JM B ∗ (T |M )0 1Mb
K
1
P13 = √ (T − B ∗ (T |M ))0 (F − JM ) Mb
K
For P13 we have
1
kP13 k = √ (T − B ∗ (T |M ))0 (F − JM ) 1Mb
°K
°
° 1
°°
°
0
∗
°
° b°
≤ ° √ (T − B (T |M )) (F − JM )°
° 1M
K


°³
´°2
°
°
≤ Lm K −1.2 I1.2 K (Q − 4) max ° Mbfq−1 − Mfq−1 ° 0 fq 
1≤q≤Q −2
1≤f≤K
³
´
= Lm I1.2 G 0
= lm (1)
For P12 by the result stated from Newey (1993) c-, we have:
°
°
°
°
°° 1
°
√
kP12 k = °JM B ∗ (T |M )0 ° °
° K Mb °
°
°
°° 1
0
∗
°
°
= JM B (T |M ) − N T °
° √K
³
´
≤ Lm K 1.2 I−p G 0 = lm (1)
°
°
°
Mb °
By Lemma 2, we can show that:
P11 =
=
=
=
¢
1 ¡
√
(T − B ∗ (T |M )) 2Mb
K
K Q −2
1 XXh
√
(Tfq − B ∗ (Tfq |Mfq Mfq−1 ))
K f=1 q=2
³
´ i
+ a1fq Mbfq−1 − Mfq−1 0 fq
K Q −2
1 XX
√
(Tfq − B ∗ (Tfq |Mfq Mfq−1 )) (
K f=1 q=2
K
1 X
√
; 1f + lm (1)
K f=1
a1fq
³
´
Mbfq−1 − Mfq−1
a1fq > 1fq
+
a2fq > 2fq ) + lm (1)
Consequently, we obtain
√ ³
K bK −
0
´
K
1 X
−1
√
= PT
; 1f + lm (1)
T
K f=1
By the Lindeberg-Feller central limit theorem, we get the desired result.
7.2 Proof of 2nd Stage Estimator
Proof of Theorem 5.1.
Let IK ( bK ) ≡ [ (1/K) Σ_{f=1}^{K} jf (' (T̂f )) ]′ (−Σ(U)K^{−1}) [ (1/K) Σ_{f=1}^{K} jf (' (T̂f )) ];
it is defined this way so that the minimization problem becomes a maximization
problem, without loss of generality.
By definition of b
K and the fact that ΠK 0 ∈ ΥK M [k0 − b
K k ≥ uK ]
"
= M
∗
= M∗
≤ M∗
sup
{k 0 −k≥K ∈ΥK }
"
"
sup
#
n
³
´
³
´o
³
´
³
´
IK bK − IK 0 bK
≥ IK b
K bK − IK 0 bK
{k 0 −k≥K ∈ΥK }
{IK () − IK (0 ) − IK (0 ) − IK ()}
≥ IK (b
K ) − IK (0 ) − IK (0 ) − IK ()
sup
{k 0 −k≥K ∈ΥK }
{IK () − IK (0 ) − IK (0 ) − IK ()}
≥ IK (ΠK 0 ) − IK (0 ) − IK ( 0 ) − IK ()
≤ M1 + M2 + M3
#
#
with
M1 ≡ M
M3
∗
"
(uK )2
{IK ( 0 ) − IK (0 0 )} ≥ −
sup
3
{k 0 −k≥K ∈VK }
"
#
#
³
´
³
´ (u )2
¡2¢
K
≡ M IK 0 bK − IK ΠK 0 bK ≥
− L K
3
#
"
³
´
³
´ (u )2
K
≡ M IK 0 bK − IK ΠK 0 bK ≥
3
(63)
(64)
(65)
M2


{j ( 0 ) − j (0 0 )}
 {k0 −k≥K ∈ΥK } ³
´ 
≡ M∗ 
2 
1
H (0 ) − 3 (uK )
≥
inf
{k 0 −k≥K ∈ΥK }
"
≤ M∗
sup
1
MK {j ( 0 ) − j (0 0 )} ≥ (uK )2
sup
3
{k 0 −k≥K ∈ΥK }
#
where
H ( 0 ) ≡ K −1
K
X
B (jf ( 0 f=1
= B [j (
0)
− j (
0)
− j (
0 ))
0 )]
Then, by Corollary 1 of Chen and Shen (1998, p. 298), there exist constants
∂, ∂ > 0 such that for any u ≥ 1 and any integer K :
"
1
RK [j ( ) − j ( 0 )] ≥ (uK )2
sup
M∗
3
{k0 −k≥K ∈ΥK }
i
h
≤ n∂ exp −∂K (uK )2
#
Then,
i
h
2
M2 ≤ n∂ exp −∂K (uK )
"
(66)
2
#
{IK ( 0 ) − IK (0 0 )} ≥ − (u3K )
{k 0 −k≥K ∈ΥK }
³
´
³
´i
³
´
h
Next, we bound M1 : Since B IK 0 bK − IK ΠK 0 bK = H 0 ΠK 0 bK ≤
M1 ≡ M ∗
sup
2K and u ≥ 1 :
"
³
´
³
´
h
³
´
³
´i (u )2
K
≤ M IK 0 bK − IK ΠK 0 bK − B IK 0 bK − IK ΠK 0 bK ≥
9
° K h ³

´
³
´i °
° P
° "
#
°
°
b
b
2 −
j
−
j
Π
K
K
K
)
(u
0
0
°
° 
K
≤ B ° f=1 h ³
 K
´
³
´i °
°
°
9
° −B j 0 bK − j ΠK 0 bK
°
M1
Then,
M1 ≤ ∂3 K −1.2 u−1 −1
K
(67)
Next, we bound M3 :
M3 ≤ M
∗
"
³
IK 0 bK
´
³
´
h
− IK ( 0 0 ) − B IK 0 bK − IK ( 0 °
´
i °
K h ³
° P
° "
#−
°
°
b
j
−
j
(
)
K
0
(uK )2
0
0
°
°
≤ B ° f=1 h ³
K
i °
´
°
°
9
b
° −B j 0 K − j (0 0 ) °
i
(uK )2
)
≥
0
9
#
This completes the proof.
1
We will first let K = K − 2 r∗ = ±> ∗ and ∗ = + k r∗ Then − ΠK ∗ =
−K ΠK r∗. The following lemmas will be useful later for proving asymptotic normality and consistency of our estimator. Let denote
oj [ − l ] = j() − jl [ − l ]0
K
1 X
IK 0 [ − l ] ≡
jf 0 [>∗ ]0 Σ(Uf )−1 jf (0 ) = 0
K
f=1
"
#
#0
"
K
K
X
1 X
1
jf (' (bTf )) Σ(U)−1
jf (' (bTf ))
IK [] ≡
K
K
K
f=1
and
f=1
O[ l ] ≡ IK [] − IK [l ] − IK 0 [ − l ] = IK []
#
Lemma 3 Under Assumption 5.1-5.6 and Theorem 1 we have
K
K
1 X
1 X
1
−1
∗ 0
j [ΠK > ] Σ(Uf )K jf (0 ) =
j [>∗ ]0 Σ(Uf )−1 jf (0 ) + lm( √ )
K f=1 f 0
K f=1 f 0
K
Proof. The details are omitted here. The interested reader is referred to Ai and
Chen(1999)
Lemma 4 Under assumption 4.1-4.6 and 5.1-5.6, we have uniformly over ∈ ΛK
¡
¢
IK jl [ − l ]0 Σ(U)−1 jl [ΠK r∗ ] = lm (k )
Proof. The details are omitted here. The interested reader is referred to Ai and
Chen(1999)
Lemma 5 Under assumptions 4.1-4.6, 5.1-5.6 we have
O[ l ] − O[Π ∗ l ] − B{O[ l ] − O[Π ∗ l ]} = lm(2K )
Proof. The details are omitted here. The interested reader is referred to Ai and
Chen(1999)
Lemma 6 Under assumptions 4.1 -5.6, we have uniformly over ∈ ΛK
1
B[IK [] − IK [Π∗ ] = [kΠ ∗ − k2 + k − l k] + lm(2K )
2
Proof. The details are omitted here. The interested reader is referred to Ai and
Chen (1999).
Proof (of Theorem 5.1). For H , 1


³
p ´
≤ M
sup
IK [] ≥ IK [Π]
M kb
− 0 k ≥ H2 ; K
√
kb
− 0 k≥H2

≤ M
' K ∈ΛK
sup
√
kb
− 0 k≥H2
' K ∈ΛK
IK [] − IK [0 ] ≥ IK [Π] − IK [0 ]
To show that this goes to zero, we note that conditions A.1-A.4 in Chen and
Shen (1998) are trivially satisfied given Assumptions 5.1-5.5. Hence this tends to
zero by Theorem 1 in Chen and Shen (1998).

Proof. of Theorem 5.2)Let b
∗ = b
+ K r∗ and
] = IK [ΠK b
∗ ] + IK 0 [b
− ΠK b
∗ ] + O[(b
l ] − O[ΠK b
∗ l ]
IK [b
= IK [ΠK b
∗ ] + IK 0 [b
− ΠK b
∗ ] + O[(b
l ] − O[ΠK b
∗ l ]
+B{IK [b
] − IK [ΠK b
∗ ]} − B{O[(b
l ] − O[ΠK b
∗ l ]}
This implies that
1
1
] = IK [ΠK b
∗ ] + [kΠb
− l k] + IK 0 [b
− ΠK b
∗ ] + lm( )
∗ − 0 k2 + kb
IK [b
2
K
Looking at the second term on the right hand side of the above equation, we
have
∗ − k2 + kb
− l k2 + 2K kr∗ k2
kΠb
∗ − l k2 = kΠb
+2 hΠb
∗ − ∗ b
− l i + 22K hΠb
∗ − ∗ r∗ i
+2K hb
− l r∗ i
Note that
|hΠb
∗ − ∗ b
− l i| = K |hΠr∗ − r∗ b
− l i|
∗
∗
≤ K kΠr − r k × kb
− l k = lm(2K )
and
∗ − ∗ r∗ i| = 2K |hΠr∗ − r∗ r∗ i| = lm(2K )
2K |hΠb
It follows then that
kΠb
∗ − l k2 = lm(2K ) + kb
− l k2 + 2K hb
− l r∗ i
and
] − IK [ΠK b
∗ ] − IK 0 [b
− ΠK b
∗ ] = K hb
− l r∗ i + lm(2K )
IK [b
By definition of b
, the IK [b
] − IK 0 [b
− ΠK b
∗ ] ≥ 0 By Theorem 5.1 we have
1
kb
− l k = lm( √K ) It follows that
0 ≤ IK 0 [b
− ΠK b
∗ ] + K hb
− l r∗ i + lm(2K )
Note that b
− ΠK b
∗ = −K ΠK r∗ Which means that
and
− ΠK b
∗ ] = −K IK 0 [ΠK r∗ ]
IK 0 [b
− l r∗ i + lm(2K )
0 ≤ −K IK 0 [ΠK r∗ ] + K hb
Therefore
− l r∗ i + lm(2K )
0 ≥ K IK 0 [ΠK r∗ ] − K hb
Since this holds for r∗ = ±>∗ we obtain
¯
¯
∗ ¯
¯IK [ΠK r∗ ] − hb
−
r
i
= lm(K )
l
0
This proves that
¯
¯
1
¯IK [ΠK r∗ ] = hb
− l r∗ i¯ + lm( √ )
0
K
Hence for any fixed non-zero ; ∈ RH+1 we have
; 0 (b
' K − '0 ) = −
K
K
1X 1 X
1
jf 0 [>∗ ]0 Σ(Uf )−1 jf (0 ) + √
k
K
K
f=1
f=1
substituting for >∗ we obtain
K
√
¡
¢−1 1 X
√
K(b
' K −' 0 ) = − B[A"∗ (U)0 Σ(Uf )−1 A" ∗ (U)]
A"∗ (U)0 Σ(Uf )−1 jf ( 0 )+lm(1)
K f=1
The theorem now follows from applying a standard CLT for i.i.d. data.
References
[1] Ahn H. and J.L. Powell(1993): Semiparametric estimation of censored Selection
Models with a Nonparametric Selection Mechanism, Journal of Econometrics,
58 3- 29.
[2] Ai, C and X. Chen(1999): Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions, Department of Economics,
NYU.
[3] Altug, S and R. A. Miller (1998): The Effect of Work Experience on Female
Wages and Labour Supply, Review of Economic Studies, 45-85
[4] Anderson, T.W. and C. Hsiao (1981): Estimation of Dynamic Models with
Error Components, Journal of the American Statistical Association, Vol. 76, 598-606.
[5] Andrews, D W. K. (1987): “Consistency in Nonlinear Econometric Models: A
Generic Uniform Law of Large Numbers,” Econometrica, 55, 1465-1472.
[6] –––(1988): “ Laws of Large Numbers for Dependent Non-identically Distributed Random Variables,” Econometric Theory, 4, 458-467.
[7] ______(1991): “Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Regression Models,” Econometrica, Vol. 59, 307-347.
[8] _____(1992): “ Generic Uniform Convergence,” Econometric Theory, 8,
241-257.
[9] ______(1993): “ Tests for Parameter Instability and Structural Change
With Unknown Change Point,” Econometrica, 61, 821-856.
[10] ______(1994a): “ Asymptotics for Semiparametric Econometric Models Via
Stochastic Equicontinuity,” Econometrica, 62, 43-72.
[11] ______(1994b): “ Empirical Process Methods in Econometrics,” Chapter 2
in Handbook of Econometrics, Vol. 4, New York: North Holland.
[12] Arellano, M. and S. Bond (1991): Some Tests of Specification for Panel Data:
Monte-Carlo Evidence and an Application to Employment Equations, Review
of Economic Studies, Vol.58, 277-297.
[13] Arellano, M. and O. Bover(1995): Another Look at the Instrumental Variable
Estimation of Error Component Models, Journal of Econometrics, 68, 29-51.
[14] Arellano, M and R. Carrasco( 2002): Binary Choice Panel Data Models with
Predetermined Variables, CEMFI, Madrid.
[15] Arellano, M. and B. Honore (2001): Panel Data Models. Some Recent Developments. Handbook of Econometrics, in J. Heckman and E. Leamer (eds.),
Handbook of Econometrics, Vol.5, Ch. 53, North Holland.
[16] Billingsley, P. (1995): Probability and Measure. New York: Wiley.
[17] Bond, S. and C. Meghir (1994): Dynamic Investment Models and the Firm’s
Financial Policy, Review of Economic Studies, Vol.61, 197-222.
[18] Browning, M. (1992): Children, Household, Economic Behavior, Journal of
Economic literature, Vol.30, 1434-1476.
[19] Chen, S. (1998): √K-consistent Estimation of a Panel Data Sample Selection
Model, Department of Economics, The Hong Kong University of Science and
Technology.
[20] Chen, S. (1999): Semiparametric Estimation of Heteroscedastic Binary Choice
Sample Selection Models Under Symmetry, The Hong Kong University of Science and Technology.
[21] Chen, X and X. Shen (1998): Sieve Extremum Estimates for Weakly Dependent
Data, Econometrica,66, 289-314.
[22] Davidson, J. (1994): Stochastic Limit Theory : An Introduction for Econometricians. New York : Oxford University Press.
[23] Donald S.G. (1995): Two-Step Estimation of Heteroscedastic Sample Selection
Models, Journal of Econometrics, 65, 347-380.
[24] Fenton, V. and Gallant (1996): Convergence Rate of SNP Density Estimators,
Econometrica 64, 719-727.
[25] Gayle, G-L and Robert A. Miller (2002), Life-cycle Fertility Behavior and Human Capital Accumulation, Working Paper, Department of Economics, University of Pittsburgh.
[26] Gronau, R. (1974): Wage Comparisons - A Selectivity Bias, Journal of Political
Economy, Vol.82, 1119-1144.
[27] Heckman, J. (1976): Common Structures of Statistical Models of Truncations
Sample Selection and Limited Dependent Variables, and a Simple Estimator for
Such Models, Annals of Economic and Social Measurement, Vol.5, 475-492.
[28] Holtz-Eakin, D., Newey and H.S. Rosen (1988): Estimating Vector Autoregressions with Panel Data, Econometrica, Vol 56, 1371-1395.
[29] Hotz, V. Joseph and Robert A. Miller (1988): An Empirical Analysis of Life
Cycle Fertility and Female Labour Supply, Econometrica, Vol. 56. no. 1, 91-118.
[30] Honore, B. and E. Kyriazidou (1998): Panel Data Discrete Choice models With
Lagged Dependent Variables, Princeton University.
[31] Jennrich, R. I. (1969): ” Asymptotic properties of Nonlinear least Squares
Estimators,” Annals of Mathematical Statistics, 40, 633-643.
[32] Keane, M.and D. Runkle (1992): On the Estimation of Panel Data Models
with Serial Correlation when Instruments are Not Strictly Exogenous, Journal
of Business and Economic Statistics, Vol.10, 1-9.
[33] Kyriazidou, E. (1997): Estimation of Panel Data Sample Selection Model,
Econometrica, Vol.65, 1335-1364.
[34] Kyriazidou, E.(1999): Estimation of Dynamic Panel Data Sample Selection
Model, Unpublished Manuscript, Department of Economics, University of
Chicago.
[35] Hsiao, C.(1986): Analysis of Panel Data. Cambridge: Cambridge University
Press.
[36] Matyas, L. and P. Sevestre, editors: Econometrics of Panel Data. Kluwer Academic Public Press.
[37] Newey W. (1988): Two-Step Series Estimation of sample Selection Models,
Princeton University.
[38] Newey, W. (1997): Convergence Rates and Asymptotic Normality for Series Estimators, Journal of Econometrics, Vol. 79, 147-168.
[39] Newey, W., J.L. Powell and F. Vella(2002): Nonparametric estimation of Triangular Simultaneous equations models, Econometrica
[40] Pollard, D. (1984): Convergence of Stochastic Processes. New York: Springer-Verlag.
[41] Powell, J.L., J.H. stock and T.M. Stoker (1989): Semiparametric Estimation
of weighted Average Derivatives, Econometrica 57, 1403-1430.
[42] Powell J.L.( 1994), Estimation of Semiparametric models, in Handbook of
Econometrics, Vol. 4, 2444-2523, eds. R.F. Engle and D.L. McFadden, Amsterdam: North-Holland.
[43] Runkle, D.(1991): Liquidity Constraints and Permanent Income Hypothesis:
Evidence from Panel Data, Journal of Monetary Economics, Vol.97, 73-98.