Qualitative Dependent Variable Models

Except for the LPM, estimation of binary choice models is usually based on maximum likelihood techniques.
Each observation is treated as a single draw from a Bernoulli distribution (JHGLL, p. 44; Greene, p. 1026).
Consider an experiment in which only two outcomes are possible, 0 or 1.
y, the outcome of the experiment, is said to have a Bernoulli distribution: P(y = 1) = p and P(y = 0) = 1 − p, where 0 ≤ p ≤ 1.
The PDF of y can be represented as:
$$f(y \mid p) = \begin{cases} p^{y}(1-p)^{1-y}, & y = 0, 1 \\ 0, & \text{otherwise} \end{cases}$$

Maximum likelihood estimation of the discrete choice model
Define $P_i^*$ as the probability of observing whatever value of y actually occurs for a given observation:
$$P_i^* = \begin{cases} \Pr(y_i = 1 \mid X_i) & \text{if } y_i = 1 \text{ is observed} \\ 1 - \Pr(y_i = 1 \mid X_i) & \text{if } y_i = 0 \text{ is observed} \end{cases}$$
With a general CDF F: Pr(y_i = 1 | X_i) = F(X_iβ) and Pr(y_i = 0 | X_i) = 1 − F(X_iβ).
This is the result obtained from the latent variable interpretation.

The model
From above we represent the success probability by F(Xβ).
Assume we have independent "draws".
The joint probability of a sample of size T (a particular combination of 0's and 1's) is:
$$\Pr(y_1, y_2, \ldots, y_T \mid X) = \prod_{y_i = 0}\left[1 - F(X_i\beta)\right] \prod_{y_i = 1} F(X_i\beta)$$
The first product runs over the observations with y_i = 0, the second over those with y_i = 1.
In total there are T observations, i.e., an observation makes only one contribution to the joint PDF.

Using the Bernoulli result, the total sample likelihood function for T observations with a general CDF is:
$$l(\beta \mid X, y) = \prod_{i=1}^{T} F(X_i\beta)^{y_i}\left[1 - F(X_i\beta)\right]^{1-y_i}$$
Notice what happens when y_i = 1 or 0: the contribution of an observation to the sample likelihood function depends on whether it has an observed 0 or 1 value.
An observation makes only one contribution to the likelihood function: F(X_iβ) or 1 − F(X_iβ).
The total sample log-likelihood function (LLF):
$$L = \sum_{i=1}^{T}\left\{ y_i \ln F(X_i\beta) + (1-y_i)\ln\left[1 - F(X_i\beta)\right]\right\}$$

The FOC for maximization of the total sample LLF are (note what happens when y_i = 0 or 1):
$$\frac{\partial L}{\partial \beta} = \sum_{i=1}^{T}\left[ y_i\frac{f_i}{F_i} - (1-y_i)\frac{f_i}{1-F_i}\right] X_i = 0$$
where $F_i \equiv F(X_i\beta)$ and $f_i \equiv f(X_i\beta)$ is the PDF, $dF/dZ_i$ evaluated at $Z_i = X_i\beta$.
In general, the partial derivative of a CDF with respect to its argument is the PDF evaluated at that argument's value: ∂CDF(Z)/∂Z = PDF(Z).
With $Z_i = X_i\beta$, the chain rule gives ∂CDF(Z_i)/∂β = PDF(Z_i) X_i.

Choice of a particular functional form for F(·) generates the empirical model.
Remember that if the distribution is symmetric (e.g., normal, logistic), then 1 − F(Xβ) = F(−Xβ).
If F is symmetric, define q_i ≡ 2y_i − 1, so that y_i = 0 → q_i = −1 and y_i = 1 → q_i = 1.
You can then simplify the above LLF to:
$$L = \sum_{i=1}^{T} \ln F(q_i X_i\beta)$$

The FOCs are a system of K nonlinear functions of β.
Under most conditions likely to be encountered, this likelihood function is globally concave, which ensures uniqueness of the ML parameter estimates.
Properties of maximum likelihood parameter estimates: consistent, asymptotically efficient, and asymptotically normally distributed around β.

How can we estimate $\Sigma_{\beta}$, the covariance matrix of the ML estimator $\beta_{ML}$?
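The general log-likelihood and its simplified form for a symmetric CDF can be checked against each other numerically. The sketch below is only an illustration: the function names, the tiny data set, and the trial value of β are hypothetical, and the logistic CDF is used merely as one symmetric choice of F.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, Lambda(z) = 1 / (1 + exp(-z)); one symmetric choice of F."""
    return 1.0 / (1.0 + math.exp(-z))

def index(beta, x_row):
    """Linear index X_i * beta for one observation."""
    return sum(b * x for b, x in zip(beta, x_row))

def loglik_general(beta, X, y, cdf):
    """L = sum_i { y_i ln F(X_i b) + (1 - y_i) ln[1 - F(X_i b)] }."""
    total = 0.0
    for x_row, y_i in zip(X, y):
        F = cdf(index(beta, x_row))
        total += y_i * math.log(F) + (1 - y_i) * math.log(1.0 - F)
    return total

def loglik_symmetric(beta, X, y, cdf):
    """L = sum_i ln F(q_i X_i b) with q_i = 2 y_i - 1 (valid for symmetric F)."""
    total = 0.0
    for x_row, y_i in zip(X, y):
        q = 2 * y_i - 1
        total += math.log(cdf(q * index(beta, x_row)))
    return total

# Hypothetical data: first column is the intercept, second a single regressor.
X = [[1.0, -1.5], [1.0, -0.5], [1.0, 0.3], [1.0, 1.2]]
y = [0, 1, 0, 1]
beta = [0.2, 0.8]  # arbitrary trial value, not an estimate

ll_general = loglik_general(beta, X, y, logistic_cdf)
ll_symmetric = loglik_symmetric(beta, X, y, logistic_cdf)
print(ll_general, ll_symmetric)
```

Both functions return the same value because 1 − F(z) = F(−z) for a symmetric distribution; each observation contributes exactly one log-probability term.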
As we derived under the general ML section:
In general:
$$\Sigma_{\beta} = \left[-E\left(\frac{\partial^2 L}{\partial\beta\,\partial\beta'}\right)\right]^{-1} = I(\beta)^{-1}$$
Newton-Raphson (NR):
$$\Sigma_{\beta} = \left[-\frac{\partial^2 L}{\partial\beta\,\partial\beta'}\right]^{-1}_{\beta = \beta_{ML}}$$
BHHH:
$$\Sigma_{\beta} = \left[\sum_{i=1}^{T}\frac{\partial L_i}{\partial\beta}\frac{\partial L_i}{\partial\beta'}\right]^{-1}_{\beta = \beta_{ML}}$$
The NR and BHHH estimators are asymptotically equivalent, but in small samples they often provide different covariance estimates for the same model.

The above shows the relationship between the curvature of the likelihood function (LF) and $\Sigma_{\beta}$.
The size of the variance is inversely related to the 2nd derivative: the smaller the derivative, the larger the variance.
Smaller 2nd derivatives → flatter LF → the harder it is to find a maximum and the less confidence we have in the solution.

$$L(\beta \mid y, X) = \sum_{i=1}^{T}\left\{ y_i \ln F(X_i\beta) + (1-y_i)\ln\left[1 - F(X_i\beta)\right]\right\}$$
The above LLF does not assume a particular CDF functional form.
Standard normal CDF: Probit model.
Logistic CDF: Logit model.

If the RV s is distributed standard normal, s ~ N(0, 1):
$$\text{PDF: } \phi(s) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{s^2}{2}\right), \qquad \text{CDF: } \Phi(s) = \int_{-\infty}^{s}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right)dt$$
If the RV s is distributed logistic, s ~ logistic(0, π²/3):
$$\text{PDF: } \lambda(s) = \frac{\exp(-s)}{\left[1+\exp(-s)\right]^2}, \qquad \text{CDF: } \Lambda(s) = \frac{\exp(s)}{1+\exp(s)} = \frac{1}{1+\exp(-s)}$$

Remember that for both the normal and logistic distributions we have ∂CDF(s)/∂s = PDF(s).
Let's compare the logistic and standard normal distributions.
[Figures: general functional forms of the PDFs and CDFs of the logistic and standard normal distributions]

Probit Model
CDF: Φ(Xβ); standard normal PDF: φ(Xβ).
In general, ∂Φ(s)/∂s = φ(s). With Z ≡ Xβ, the chain rule gives:
$$\frac{d\Phi(X\beta)}{dX_i} = \frac{\partial\Phi(Z)}{\partial Z}\frac{\partial Z}{\partial X_i} = \phi(X\beta)\,\beta_i$$
By symmetry, Φ(−Xβ) = 1 − Φ(Xβ).

Logit Model
$$\text{CDF: } \Lambda(X\beta) = \frac{e^{X\beta}}{1+e^{X\beta}} = \frac{1}{1+e^{-X\beta}}, \qquad \text{PDF: } \lambda(X\beta) = \frac{e^{-X\beta}}{\left(1+e^{-X\beta}\right)^2} = \Lambda(X\beta)\left[1-\Lambda(X\beta)\right]$$
$$\frac{d\Lambda(X\beta)}{dX_i} = \Lambda(X\beta)\left[1-\Lambda(X\beta)\right]\beta_i, \qquad 1-\Lambda(X\beta) = \frac{e^{-X\beta}}{1+e^{-X\beta}}$$

Under the probit model we have: y* = Xβ + ε, ε_i ~ N(0, 1) → Pr(y_i = 1) = Φ(X_iβ).
Probit total sample log-likelihood function, with Φ the standard normal CDF:
$$L(\beta \mid y, X;\ \varepsilon \sim N(0,1)) = \sum_{i=1}^{T}\left\{ y_i \ln\Phi(X_i\beta) + (1-y_i)\ln\left[1-\Phi(X_i\beta)\right]\right\}$$

The FOC for maximizing L(β | y, X) are (Eq. 19.2.19 in JHGLL):
$$\frac{\partial L}{\partial\beta} = \sum_{y_i=0}\frac{-\phi(X_i\beta)}{1-\Phi(X_i\beta)}X_i + \sum_{y_i=1}\frac{\phi(X_i\beta)}{\Phi(X_i\beta)}X_i = \sum_{y_i=0}\lambda_{0i}X_i + \sum_{y_i=1}\lambda_{1i}X_i = 0$$
with $\lambda_{0i} = -\phi(X_i\beta)/[1-\Phi(X_i\beta)]$ and $\lambda_{1i} = \phi(X_i\beta)/\Phi(X_i\beta)$.
In general φ(−Xβ) = φ(Xβ) and 1 − Φ(Xβ) = Φ(−Xβ), so with q_i = 2y_i − 1:
$$\frac{\partial L}{\partial\beta} = \sum_{i=1}^{T}\frac{q_i\,\phi(q_iX_i\beta)}{\Phi(q_iX_i\beta)}X_i = 0$$

We can further simplify the FOCs for the probit model to:
$$\frac{\partial L}{\partial\beta} = \sum_{i=1}^{T}\lambda_i X_i = 0, \qquad \lambda_i \equiv \frac{q_i\,\phi(q_iX_i\beta)}{\Phi(q_iX_i\beta)}, \quad q_i = 2y_i - 1$$

One can show that the Hessian for the probit model is (Eq. 19.2.21 in JHGLL):
$$H = \frac{\partial^2 L}{\partial\beta\,\partial\beta'} = -\sum_{i=1}^{T}\phi(X_i\beta)\left\{ y_i\frac{\phi(X_i\beta) + X_i\beta\,\Phi(X_i\beta)}{\Phi(X_i\beta)^2} + (1-y_i)\frac{\phi(X_i\beta) - X_i\beta\left[1-\Phi(X_i\beta)\right]}{\left[1-\Phi(X_i\beta)\right]^2}\right\}X_i'X_i = -\sum_{i=1}^{T}\lambda_i\left(\lambda_i + X_i\beta\right)X_i'X_i$$
with $\lambda_i$ and $q_i$ as defined above.

One can show that the above Hessian is negative definite for all values of β → the LLF is globally concave.
The asymptotic covariance matrix for the ML estimator of β can be obtained from:
The inverse of the negative Hessian evaluated at β_ML (NR method):
$$\Sigma_{NR} = \left[\sum_{i=1}^{T}\lambda_i\left(\lambda_i + X_i\beta\right)X_i'X_i\right]^{-1}$$
The BHHH estimator:
$$\Sigma_{B} = \left[\sum_{i=1}^{T}\lambda_i^2\,X_i'X_i\right]^{-1}$$
The inverse of the negative expected value of the Hessian evaluated at β_ML:
$$E[H] = E\left[-\sum_{i=1}^{T}\lambda_i\left(\lambda_i + X_i\beta\right)X_i'X_i\right]$$
Note that there are y_i's in the Hessian (through q_i).

Under the logit model we have y* = Xβ + ε with ε_i logistically distributed →
$$\Pr(y_i^* > 0) = \Pr(y_i = 1 \mid X_i) = \frac{e^{X_i\beta}}{1+e^{X_i\beta}} = \Lambda(X_i\beta) \quad \text{(logistic CDF)}$$
Logit sample log-likelihood function (similar to probit):
$$L(\beta \mid y, X;\ \varepsilon \sim \text{logistic}) = \sum_{i=1}^{T}\left\{ y_i\ln\Lambda(X_i\beta) + (1-y_i)\ln\left[1-\Lambda(X_i\beta)\right]\right\}$$
The FOC for maximizing the logistic L are:
$$\frac{\partial L}{\partial\beta} = \sum_{i=1}^{T}\left(y_i - \Lambda_i\right)X_i = 0$$

One can show that the Hessian for the logistic model is:
$$H = \frac{\partial^2 L}{\partial\beta\,\partial\beta'} = -\sum_{i=1}^{T}\Lambda(X_i\beta)\left[1-\Lambda(X_i\beta)\right]X_i'X_i$$
Note that the LLF Hessian does not involve the RV y_i, unlike the Hessian for the standard normal LLF.
One can show that the Hessian is always negative definite → the LLF is globally concave.

Similar to the probit model, the asymptotic covariance matrix for the ML estimator of β can be obtained from our ML-based methods.
Based on the inverse of the negative Hessian evaluated at β_ML:
$$\Sigma_{NR} = \left[\sum_{i=1}^{T}\Lambda(X_i\beta)\left[1-\Lambda(X_i\beta)\right]X_i'X_i\right]^{-1}$$
BHHH estimator:
$$\Sigma_{B} = \left[\sum_{i=1}^{T}\left(y_i - \Lambda(X_i\beta)\right)^2X_i'X_i\right]^{-1}$$
Since y is not in the Hessian, the Hessian equals its expected value, so Σ_NR = Σ_GN.

$$L(\beta \mid y, X) = \sum_{i=1}^{T}\left\{ y_i \ln F(X_i\beta) + (1-y_i)\ln\left[1 - F(X_i\beta)\right]\right\}$$
In summary, probit and logit functional forms are used to implement the above log-likelihood-based models.
In terms of hypothesis testing, the asymptotic distribution of parameter estimates obtained from both logit and probit models is:
$$\hat\beta \overset{a}{\sim} N\left(\beta,\ \mathrm{cov}(\hat\beta)\right), \qquad \mathrm{cov}(\hat\beta) = \left[-\frac{\partial^2 L}{\partial\beta\,\partial\beta'}\right]^{-1}$$
The functional form of the covariance varies and depends on whether one is estimating a logit or probit model.

Statistical significance of a single parameter:
$$z = \frac{\hat\beta_k - \beta_k^0}{\sqrt{V(\hat\beta_k)}} \sim N(0,1)$$
Tests of general hypotheses (J independent hypotheses):
H0: Rβ = r
H1: Rβ ≠ r
where $\hat\beta \sim N(\beta, \mathrm{cov}(\hat\beta))$ and β is the true, unknown value. Remember that the above implies:
$$R\hat\beta - r \sim N\left(R\beta - r,\ R\,\mathrm{cov}(\hat\beta)\,R'\right)$$

Wald test: under H0,
$$\lambda_W = \left(R\hat\beta - r\right)'\left[R\,\mathrm{cov}(\hat\beta)\,R'\right]^{-1}\left(R\hat\beta - r\right) \sim \chi^2_J$$
where R = ∂H0/∂β′ and R cov(β̂) R′ = Cov(Rβ̂).
If λ_W exceeds the χ²_J critical value → reject H0. Alternatively, λ_W/J ≈ F_{J,T−K}.
Likelihood Ratio Test:
$$\lambda_{LR} = 2\left[L(\hat\beta) - L(\beta_R)\right] \sim \chi^2_J$$
where β̂ are the unrestricted coefficients and β_R the restricted coefficients.

Let's test the joint hypothesis that a subset of coefficients (say the last M) are zero:
R = [0 | I_M], an (M × K) matrix whose first K − M columns are zero, and r = 0_M.
$$W = \hat\beta_M'\,\Sigma_{\beta_M}^{-1}\,\hat\beta_M$$
where β̂_M is (M × 1) and $\Sigma_{\beta_M} = R\,\mathrm{cov}(\hat\beta)\,R'$ is the (M × M) submatrix, created first and then inverted.

Joint test of all slope coefficients being 0
Equivalent to the test that the model explains significantly more of the variability of the dependent variable than the naïve model of no exogenous variable impacts.
For both logit and probit models the restricted LLF can be represented as:
L(β_R) = T[P* ln(P*) + (1 − P*) ln(1 − P*)]
where P* is the sample proportion of observations that have y = 1.
There is no need to actually estimate a restricted model.
As Greene notes (p. 704), don't use the LR test to test the probit versus the logit model → there are no parameter restrictions available to go from one to the other.

As noted above:
$$\lambda_{LR} = 2\left[L(\hat\beta) - L(\beta_R)\right] \sim \chi^2_J$$
Similar to the overall-equation F-statistic in the CRM, you should include the χ² test statistic for jointly zero exogenous variable coefficients, since you do not actually need to estimate a 2nd model:
L(β_R) = T[P* ln(P*) + (1 − P*) ln(1 − P*)], with P* the sample proportion with y = 1.

Whatever the assumed error term distribution, the parameters of the discrete choice model are not the marginal effects.
In terms of the observed dependent (0/1) variable with the general CDF F, and the two possible values of y:
E(y | X) = 0·[1 − F(Xβ)] + 1·F(Xβ) = F(Xβ)
With Z ≡ Xβ, in general we have:
$$\frac{\partial E(y \mid X)}{\partial X} = \frac{\partial F(Z)}{\partial Z}\frac{\partial Z}{\partial X} = f(X\beta)\,\beta$$
where f is the PDF.

For the standard normal distribution:
$$\frac{\partial E(y \mid X)}{\partial X} = \phi(X\beta)\,\beta$$
For the logistic distribution (Greene, p. 775):
$$\frac{\partial E(y \mid X)}{\partial X} = \Lambda(X\beta)\left[1-\Lambda(X\beta)\right]\beta$$
Since the PDF > 0, the above implies that the marginal effect:
Has the same shape as the associated PDF
Is inflated or deflated by β_i (and has the same sign as β_i)

One can evaluate the above expressions:
At the sample means of your data
At every observation, and then use the sample average of the individual marginal effects
For small/moderate sized samples you may get different marginal effects depending on the method used.
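To make the estimation steps above concrete, here is a minimal sketch (not the course's MATLAB code) that fits a one-regressor logit by Newton-Raphson, computes the LR statistic for a zero slope from L(β_R) = T[P* ln P* + (1 − P*) ln(1 − P*)], and compares the marginal effect evaluated at the sample mean with the average of the individual marginal effects. The data set and all function names are hypothetical.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for Pr(y=1) = Lambda(b0 + b1*x), intercept plus one regressor."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient: sum (y_i - Lambda_i) X_i
        a00 = a01 = a11 = 0.0  # negative Hessian: sum Lambda_i (1 - Lambda_i) X_i'X_i
        for x_i, y_i in zip(x, y):
            p = logistic_cdf(b0 + b1 * x_i)
            g0 += y_i - p
            g1 += (y_i - p) * x_i
            w = p * (1.0 - p)
            a00 += w
            a01 += w * x_i
            a11 += w * x_i * x_i
        det = a00 * a11 - a01 * a01
        step0 = (a11 * g0 - a01 * g1) / det   # (-H)^{-1} g, 2x2 inverse written out
        step1 = (a00 * g1 - a01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

def loglik(b0, b1, x, y):
    return sum(yi * math.log(logistic_cdf(b0 + b1 * xi))
               + (1 - yi) * math.log(1.0 - logistic_cdf(b0 + b1 * xi))
               for xi, yi in zip(x, y))

# Hypothetical data (not from the slides); the outcomes overlap, so the MLE exists.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
T = len(y)

b0, b1 = fit_logit(x, y)
ll_u = loglik(b0, b1, x, y)

# Restricted LLF from the sample proportion; no second model is estimated.
p_star = sum(y) / T
ll_r = T * (p_star * math.log(p_star) + (1 - p_star) * math.log(1 - p_star))
lr = 2.0 * (ll_u - ll_r)  # compare with a chi-square(1) critical value

# Marginal effect of x: Lambda(1 - Lambda) * b1, evaluated two ways.
def pdf_at(v):
    p = logistic_cdf(b0 + b1 * v)
    return p * (1.0 - p)

me_at_mean = pdf_at(sum(x) / T) * b1
ame = sum(pdf_at(v) for v in x) / T * b1
print(b0, b1, lr, me_at_mean, ame)
```

The two marginal-effect numbers generally differ in a sample this small, which is the point made in the last line above.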
Train (2003) has a good discussion of predictions and the pros/cons of each method.
When one has a dummy variable (D): in general the derivative is with respect to a small continuous change → it is not appropriate to apply it when analyzing the effect of a dummy variable.

For discrete exogenous variables the appropriate marginal effect (ME) is:
$$ME = \Pr\left(y = 1 \mid \bar{X}_d, D = 1\right) - \Pr\left(y = 1 \mid \bar{X}_d, D = 0\right)$$
where $\bar{X}_d$ is the mean of the other exogenous variables and Pr(y = 1) = F(Xβ | X*, D, β).

Under logit and probit specifications of the discrete choice model, predicted probabilities and estimated marginal effects (for either the continuous or the discrete case) are nonlinear functions of the parameters.
To compute standard errors one can use the linear approximation (delta) method (Greene, p. 68).
One can use numerical methods or analytical results when implementing the delta method.

For example, the variance of the predicted probability of undertaking the activity, where X* is a point of evaluation and β̂ the estimated coefficients:
$$\text{Asy.Var}\left[\hat F(X^*\hat\beta)\right] = \left[\frac{\partial\hat F(X^*\hat\beta)}{\partial\hat\beta}\right]'\text{Asy.Var}(\hat\beta)\left[\frac{\partial\hat F(X^*\hat\beta)}{\partial\hat\beta}\right]$$
with, via the chain rule,
$$\frac{\partial\hat F(X^*\hat\beta)}{\partial\hat\beta} = \hat f(X^*\hat\beta)\,X^{*\prime}$$
where $\hat f$ is the predicted PDF. Therefore the (1 × 1) variance is:
$$\text{Asy.Var}\left[\hat F(X^*\hat\beta)\right] = \hat f(X^*\hat\beta)^2\;X^*\,\Sigma_{\beta}\,X^{*\prime}$$
Note that the above variance depends on the point of evaluation, X*.

What are the variances of changes associated with dummy variables?
$$\Delta\hat F = \left(\hat F \mid D = 1\right) - \left(\hat F \mid D = 0\right)$$
$$\text{Asy.Var}\left[\Delta\hat F\right] = \left[\frac{\partial\Delta\hat F}{\partial\hat\beta}\right]'\Sigma_{\beta}\left[\frac{\partial\Delta\hat F}{\partial\hat\beta}\right], \qquad \frac{\partial\Delta\hat F}{\partial\hat\beta} = \hat f_1\left(X^*_{D=1}\hat\beta\right)X^{*\prime}_{D=1} - \hat f_0\left(X^*_{D=0}\hat\beta\right)X^{*\prime}_{D=0}$$
where $X^*_{D=1} = [\bar X_d,\ 1]$, $X^*_{D=0} = [\bar X_d,\ 0]$, and $\bar X_d$ is the mean of the variables except D.

What is the variance of the marginal effects of a change in X on the probability of occurrence where X is continuous? I.e., is the probability of the event related to X?
Let's define ∂F(Xβ)/∂X as γ, the marginal impact of X, so that with f the PDF:
$$\hat\gamma = f(X\hat\beta)\,\hat\beta$$
The variance of the above marginal effects can be obtained from:
$$\text{Asy.Var}(\hat\gamma) = \left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]\Sigma_{\beta}\left[\frac{\partial\hat\gamma}{\partial\hat\beta'}\right]'$$
We are taking the derivative of the marginal effect with respect to the β's. What do these derivatives look like?

With X* a point of evaluation and z ≡ X*β̂ we have, via the chain rule:
$$\frac{\partial\hat\gamma}{\partial\hat\beta'} = f(z)\frac{\partial\hat\beta}{\partial\hat\beta'} + \hat\beta\frac{df(z)}{dz}\frac{\partial z}{\partial\hat\beta'} = f(z)\,I_K + \frac{df(z)}{dz}\,\hat\beta X^*$$
Note that at X*, f(z) is a scalar and β̂ is a (K × 1) vector, so β̂X* is (K × K).
The functional form will vary across error specifications, i.e., probit or logit model.

For the probit model, f = φ (the standard normal PDF) and df/dz = dφ/dz = −zφ(z), so:
$$\frac{\partial\hat\gamma}{\partial\hat\beta'} = \phi(X^*\hat\beta)\left[I_K - (X^*\hat\beta)\,\hat\beta X^*\right]$$
$$\text{Asy.Var}(\hat\gamma) = \phi(X^*\hat\beta)^2\left[I_K - (X^*\hat\beta)\,\hat\beta X^*\right]\Sigma_{\beta}\left[I_K - (X^*\hat\beta)\,\hat\beta X^*\right]'$$
where X*β̂ is a scalar. With K parameters, the above covariance matrix will be (K × K).

For the logit model, with z ≡ X*β̂ and $\hat\Lambda(z) = 1/[1+\exp(-z)]$:
$$\hat f(z) = \hat\Lambda(z)\left[1-\hat\Lambda(z)\right], \qquad \frac{d\hat f}{dz} = \left[1 - 2\hat\Lambda(z)\right]\hat\Lambda(z)\left[1-\hat\Lambda(z)\right]$$
$$\text{Asy.Var}(\hat\gamma) = \left\{\hat\Lambda(z)\left[1-\hat\Lambda(z)\right]\right\}^2\left[I_K + \left(1-2\hat\Lambda(z)\right)\hat\beta X^*\right]\Sigma_{\beta}\left[I_K + \left(1-2\hat\Lambda(z)\right)\hat\beta X^*\right]'$$
where the leading term is f(z)².

Goodness-of-fit measures for discrete choice models
The dependent variable is not continuous, so the use of R² is not appropriate.
One should always report:
The LLF value for the full model
The LLF value with only the intercept, which can be calculated directly from the data:
L(β_R) = T[P ln(P) + (1 − P) ln(1 − P)], where P is the sample proportion with y = 1 (the average value of y)
→ you only have to run one model to obtain both LLF values.

Likelihood Ratio Index (Pseudo R² value)
$$LRI = 1 - \frac{L_U}{L_0}$$
where L_U is the unrestricted LLF value and L_0 is the LLF value when all exogenous variable coefficients are set to 0. L_U is less negative than L_0.
Bounded between 0 and 1.
If all slope coefficients are indeed 0 → L_U = L_0 → LRI = 0.
There is no way to make LRI = 1, although one can come close.
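Returning to the delta-method variance of a predicted probability derived above, the following sketch evaluates Asy.Var[F̂(X*β̂)] = f̂(X*β̂)² X* Σ_β X*′ for a probit model. The coefficient vector, covariance matrix, and evaluation point are made-up numbers chosen only for illustration, and the function names are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predicted_prob_and_variance(x_star, beta_hat, cov_beta):
    """Probit predicted probability at x_star and its delta-method variance."""
    z = sum(x * b for x, b in zip(x_star, beta_hat))
    grad = [norm_pdf(z) * x for x in x_star]  # dF/d(beta) = phi(x*b) x*  (chain rule)
    k = len(grad)
    var = sum(grad[j] * cov_beta[j][m] * grad[m]
              for j in range(k) for m in range(k))  # grad' Sigma grad, a scalar
    return norm_cdf(z), var

# Hypothetical estimates: intercept and one slope, with their covariance matrix.
beta_hat = [-0.1, 0.03]
cov_beta = [[0.09, -0.001],
            [-0.001, 0.0001]]
x_star = [1.0, 10.0]  # point of evaluation (intercept term, regressor value)

prob, var = predicted_prob_and_variance(x_star, beta_hat, cov_beta)
std_err = math.sqrt(var)
print(prob, var, std_err)
```

Changing `x_star` changes both the probability and its variance, which mirrors the remark that the variance depends on the point of evaluation.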
A perfect fit would require the estimated F̂_i(X_iβ) = 1 whenever y_i = 1 and F̂_i(X_iβ) = 0 whenever y_i = 0 → L_U = 0 → LRI = 1.
Again, you only need to estimate the original model to obtain L_U and L_0:
L_0 = T[P ln(P) + (1 − P) ln(1 − P)]

Likelihood Ratio Index (Pseudo R² value)
Although the value lies between 0 and 1, it has no natural interpretation like R².
However, when comparing 2 models estimated with the same data and with the same 0/1 choice (e.g., L_0 the same under both), it is usually valid to say that the model with the higher LRI value is preferred, i.e., a higher value of L(·) is preferred.
→ 2 models estimated using nonidentical samples or with different alternatives can't be compared using the LRI.

Ben-Akiva and Lerman (1985) define a measure of explanatory power based on the probability of correct prediction:
$$R^2_{BL} = \frac{1}{T}\sum_{i=1}^{T}\left[ y_i\hat F_i + (1-y_i)\left(1-\hat F_i\right)\right]$$
where $\hat F_i$ is the predicted probability of occurrence.
One problem is that for unbalanced data, less frequent outcomes are usually predicted poorly, which may not be readily apparent in the above.
Cramer (1999) has suggested a measure that addresses the above shortcoming:
$$\lambda_C = \left(\text{avg. } \hat F \mid y_i = 1\right) - \left(\text{avg. } \hat F \mid y_i = 0\right) = \left(\text{avg. } (1-\hat F) \mid y_i = 0\right) - \left(\text{avg. } (1-\hat F) \mid y_i = 1\right)$$
Built from average predicted probabilities; higher values are better.
Penalizes incorrect prediction.
Being based on conditional means, it is not impacted by disproportionate sample sizes.

Contingency table used to summarize results
Categorizes hits/misses via the following prediction rule, where F* is some critical value:
$$\hat y_i = \begin{cases} 1 & \text{if } \hat F_i > F^* \\ 0 & \text{otherwise} \end{cases}$$
F* is usually set to 0.5.
Note that under the naïve model (everyone has a predicted value of 1) one always predicts 100P percent of observations correctly, where P is the sample proportion with y = 1 → the naïve predictor never has a zero fit.
If the sample is unbalanced (e.g., many more 0's or 1's), the above rule may never predict one of the two outcomes.

Contingency table
                      Observed values
Predicted values      y = 0      y = 1
F̂_i < F*              T_A        T_B
F̂_i ≥ F*              T_C        T_D
T_A + T_B + T_C + T_D = T
T_A and T_D are the correctly "predicted" observations.
You do not need to use 0.5 for F*; you may want to set F* relatively high (e.g., 0.75).

Adjusted Count R² (Long)
With a binary choice (0/1) model you will always have a "good" prediction, as it is possible to predict at least 50% of the cases by choosing the outcome category with the largest % of observed cases (0's or 1's).
For example, if 57% of the sample is in the paid labor force → if your model predicts all individuals are working, you will be correct 57% of the time.
The Adjusted Count R² controls for this.

Adjusted Count R²
$$R^2_{AdjCount} = \frac{\sum_{j=1}^{NC} n_{jj} - \max_r\left(N_{r+}\right)}{T - \max_r\left(N_{r+}\right)}$$
The sum in the numerator is the number of correct guesses.
NC = number of choices
N_{r+} = marginal count of the contingency table for the rth row (i.e., the sum of the rth row)
max(N_{r+}) ≡ maximum N_{r+} value
n_{jj} = count in the jth row and jth column (i.e., a correct prediction)
R²_AdjCount = proportion of correct "guesses" beyond the number that would be correctly guessed by choosing the largest marginal, max(N_{r+}) (e.g., predicting everyone works).

Akaike's Information Criterion (AIC) (ah-kah-ee-kay's)
$$AIC = \frac{-2L_U + 2K}{T}$$
K = number of RHS variables including the intercept.
−2L_U ranges from 0 to +∞, with smaller values indicating a better fitting model.
Remember L_U ≤ 0 (i.e., it is a sum of ln(Prob)):
$$L(\beta \mid y, X) = \sum_{i=1}^{T}\left\{ y_i \ln F(X_i\beta) + (1-y_i)\ln\left[1 - F(X_i\beta)\right]\right\}$$
As the number of parameters ↑, −2L_U becomes smaller (better fit).
2K is added as a penalty for ↑ number of parameters.
Since the number of observations impacts the total sample log-likelihood function value, the numerator is divided by T to obtain a per-observation value.

Akaike's Information Criterion (AIC)
Used to compare models with different specifications.
Used to compare models with different sample sizes (this is why we divide by T).
Used to compare non-nested models that cannot be evaluated with the LR test.
In general, the model with the smaller AIC value is considered to be the better fitting model specification.
Remember, model "fit" is only one criterion in evaluation; others include, e.g., individual marginal effect results.

An example discrete choice problem
Choice of auto or bus for commuting.
Net utility obtained from using an auto to commute is determined by the difference in commute time (TD) between the two modes:
TD ≡ Bus Time − Auto Time
Hypothesize a positive impact of an increase in TD on the probability of choosing to use an auto to commute.
Data for 21 commuters.

Example discrete choice problem
Auto Time   Bus Time   TD    Auto   |   Auto Time   Bus Time   TD    Auto
53          4.4        -49   0      |   19          84         66    1
4.1         29         24    0      |   82          38         -44   1
4.1         87         83    1      |   8.6         1.6        -7    0
56          32         -25   0      |   23          74         52    1
52          20         -32   0      |   51          84         32    1
0.2         91         91    1      |   81          19         -62   0
28          80         52    1      |   51          85         34    1
90          2.2        -88   0      |   62          90         28    1
42          25         -17   0      |   95          22         -73   0
95          44         -52   0      |   42          92         50    1
99          8.4        -91   0      |
TD (the RHS variable) ≡ Bus Time − Auto Time; Auto holds the observed dependent variable values.
Note that the displayed TD values are rounded.

Commuting choice problem summary
y_i* (net utility) = β_1 + β_2 TD_i + ε_i
Assume the errors are normally distributed, homoscedastic, and non-autocorrelated.
Log-likelihood function with ε_i ~ N(0, 1), where Φ is the standard normal CDF:
$$L(\beta) = \sum_{i=1}^{T}\left\{ y_i\ln\Phi(X_i\beta) + (1-y_i)\ln\left[1-\Phi(X_i\beta)\right]\right\}$$
[Figure: given the data, a plot of L(β) over all possible values of β_1 and β_2]

ML estimation of the probit model
[Flowchart: a procedure defining the likelihood function, CRM starting values, the dependent and exogenous variables, analytical gradients, and the analytical Hessian all feed the maximum likelihood procedure (NR), which yields estimates of β and its covariance matrix and the LLF value; these are then used for the likelihood ratio test of β_2 = β_3 = … = β_K = 0 and for the marginal effects]

An example discrete choice problem
Overview of MATLAB code
Summary of probit results
Marginal impacts on the probability of auto use as bus commute time increases. Under the probit model:
$$\frac{d\Phi(X\beta)}{dTD} = \beta_{TD}\,\phi(X\beta)$$
where Φ(Xβ) is the probability of auto use and φ is the standard normal PDF. The marginal impact takes the same shape as the PDF.
The variance of the marginal effect is calculated via the delta method, since these are nonlinear functions of the parameters.

Heteroscedasticity in the binary choice model
As noted by Greene (p. 787-790), unlike the CRM, when we have heteroscedasticity in the binary choice model:
→ ML estimators are inconsistent
→ The traditionally evaluated covariance matrix is inappropriate
Let's look at the example given in Greene (p. 789-790), where he incorporates multiplicative heteroscedasticity.
This framework can be applied to both the logit and probit specifications.
Latent regression: y* = Xβ + ε, where E(ε) = 0 and Var(ε) = [exp(Zγ)]² (note the squared term).

Latent model with e_t ~ N(0, σ_t²). Given this assumption, and given exp(Z_tγ) > 0, we have:
$$P(y_t = 1) = P\left(e_t > -X_t\beta\right) = P\left(\frac{e_t}{\exp(Z_t\gamma)} > \frac{-X_t\beta}{\exp(Z_t\gamma)}\right) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)$$
since $e_t/\exp(Z_t\gamma) \sim N(0,1)$, where exp(Z_tγ) is the standard deviation and Φ the standard normal CDF.
The resulting sample log-likelihood function for the heteroscedastic probit model is:
$$L = \sum_{t=1}^{T}\left\{ y_t\ln\Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right) + (1-y_t)\ln\left[1-\Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)\right]\right\}$$

Given the normality assumption, the gradients of the sample log-likelihood are (Greene, p. 789):
$$\frac{\partial L}{\partial\beta} = \sum_{t=1}^{T}\frac{\phi_t\left(y_t - \Phi_t\right)}{\Phi_t\left(1-\Phi_t\right)}\exp(-Z_t\gamma)\,X_t'$$
$$\frac{\partial L}{\partial\gamma} = \sum_{t=1}^{T}\frac{\phi_t\left(y_t - \Phi_t\right)}{\Phi_t\left(1-\Phi_t\right)}\exp(-Z_t\gamma)\,Z_t'\left(-X_t\beta\right)$$
where φ_t and Φ_t are evaluated at X_tβ / exp(Z_tγ).
The above gradients imply that the log-likelihood could be difficult to maximize.
Greene (p. 789) notes that for identification purposes, in order to estimate all model parameters, the Z matrix cannot have a constant term.
Remember the restricted model has σ² = 1 (i.e., γ_j = 0 for all j).

$$P(y_t = 1) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)$$
Note what happens when variable w_k is in both X and Z; the sign of the effect is unclear:
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = \phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{\beta_k - (X\beta)\,\gamma_k}{\exp(Z\gamma)}$$
Only the 1st term applies if w_k appears in X but not Z; the effect has the same sign as β_k:
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = \phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{\beta_k}{\exp(Z\gamma)}$$
Only the 2nd term applies if w_k appears in Z but not X; note the "−" sign, and the sign of the effect is unclear:
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = -\phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{(X\beta)\,\gamma_k}{\exp(Z\gamma)}$$

Let's look at the labor supply example presented in Greene (p. 789-790).
One can motivate this model using the latent variable approach.
Individuals decide whether to accept an offer of employment depending on whether the wage offer exceeds the reservation wage (offered wage − reservation wage > 0?).
The reservation wage is determined by the VMP in home production: educational attainment, age, presence of children in the household, other sources of income, and marginal tax rates.
We use Mroz's (1987) data on the labor supply of married women from 1975.

Under the heteroscedastic discrete choice model we developed earlier we have:
$$y_t^* = X_t\beta + \varepsilon_t, \quad \varepsilon_t \sim N\left(0, \left[\exp(Z_t\gamma)\right]^2\right), \quad P(y_t = 1) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right), \quad \frac{\varepsilon_t}{\exp(Z_t\gamma)} \sim N(0,1)$$
As we saw earlier, the heteroscedastic probit model sample log-likelihood function is:
$$L = \sum_{t=1}^{T}\left\{ y_t\ln\Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right) + (1-y_t)\ln\left[1-\Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right)\right]\right\}$$

753 observations, of which 428 were participants in the formal labor market.
Prob(LFP = 1) = F(Age, Age², HHIncome, Educ)
LFP ≡ 1 if the woman worked in 1975
Age ≡ Wife's age (years)
HHIncome ≡ Family income in 1975 $
Educ ≡ Wife's educational attainment (years)

Let's review the MATLAB program used to estimate this model.
Initially, a homoscedastic error term.
Estimate 3 models: full sample, kids-present sub-sample, kids-not-present sub-sample.
Evaluate marginal effects → the marginal effect of the age variable using the traditional formula is not correct given the presence of Age².
In the program we use a separate function to estimate marginal impacts given the quadratic age effect.

After we estimate the full homoscedastic model, let's test the following null hypothesis using the above MATLAB code:
H0: No statistical difference in coefficients whether or not children are present
H1: At least one coefficient is different
The restricted model is based on the pooled (full) sample with RHS variables Age, Age², HHIncome, and Educ: LLF_R = −496.87
The unrestricted model is obtained from estimating the probit for two mutually exclusive subgroups: Kids = 1 (524 obs) and Kids = 0 (229 obs), 753 obs. in total.
→ all estimated coefficients could differ depending on kid status.

Unrestricted model
LLF_{Kid=1} = −347.87 (524 obs)
LLF_{Kid=0} = −141.61 (229 obs)
The total sample LLF for the unrestricted model is the sum of the above LLFs, since every observation is accounted for only once:
→ LLF_U = −347.87 − 141.61 = −489.48 (753 obs)
The χ² statistic for testing the 5 restrictions, using the unrestricted LLF and the restricted LLF from the full model, is:
LR = 2[−489.48 − (−496.87)] = 14.77 as reported (the rounded LLF values shown give 14.78)
Critical χ²(.05, 5) = 11.07
Reject the null hypothesis of no statistical difference in estimated coefficients across the child-present sub-groups.

Let's now estimate a probit model of labor supply where we allow for the possibility of heteroscedastic errors in the latent regression equation:
y* = Xβ + ε, where ε_t ~ N(0, σ_t²)
Var(ε_t) = [exp(γ_1 Kid_t + γ_2 HHIncome_t)]², with Kid_t = 1 if # of kids > 0
The X matrix is composed of Intercept, Age, Age², HHIncome, Educ, and Kid.
Note there is no intercept term in the variance function.
γ_1 = γ_2 = 0 → Var(ε_t) = e⁰ = 1 → homoscedastic
I developed some MATLAB code to estimate the heteroscedastic probit models.
Estimate both homoscedastic and heteroscedastic specifications.

Prob(LFP = 1) = F(Age, Age², HHIncome, Educ, Kid)
LFP ≡ 1 if the woman worked in 1975
Age ≡ Wife's age (years)
HHIncome ≡ Family income in 1975 $
Educ ≡ Wife's educational attainment (years)
Kid ≡ 1 if children < 18 years old
Var(ε_t) = [exp(γ_1 Kid_t + γ_2 HHIncome_t)]²

Comparison of results:
                 Homoscedastic           Heteroscedastic
Variable         Coef.      Std Err      Coef.      Std Err
Inter.           -4.1568    1.4021       -6.0299    2.4984
Age              0.1854     0.0660       0.2643     0.1182
Age2             -0.0024    0.0008       -0.0036    0.0014
Income           0.0458     0.0421       0.4244     0.2219
Edu              0.0982     0.0230       0.1402     0.0519
Kid              -0.4490    0.1309       -0.8791    0.3028
Heteroscedastic component
Kid              ---        ---          -0.1407    0.3237
Income           ---        ---          0.3129     0.1228
LLF              -490.847                -487.636
LR tests:
β = 0: χ²(5) = 48.05; critical χ²(.05, 5) = 11.07
γ = 0: χ²(2) = 6.42; critical χ²(.05, 2) = 5.99

Homoscedastic probit: marginal impacts of age on the probability of LFP
Use a separate procedure to estimate marginal impacts given the quadratic function with respect to age.
Negative and significant age impact ("+" on the age variable, "−" on the age² variable).
Use the discrete change effect formulas to evaluate whether or not the presence of KIDS impacts LFP under the homoscedastic probit model (Eq. 23-25, 23-26, p. 781 in Greene).
Negative and significant impact.

Let's test heteroscedasticity versus homoscedasticity:
H0: Homoscedastic probit (i.e., γ_1 = γ_2 = 0)
H1: Heteroscedastic probit
The 95% χ² critical value for 2 restrictions is 5.99.
χ² tests of whether our probit is homoscedastic versus heteroscedastic:
LR value: 6.42
Wald statistic: 6.43

χ² tests of whether our probit is homoscedastic versus heteroscedastic:
LM value: 2.24
LM ≡ g′Σ_β g, where g is the 1st derivative of the sample LLF of the unrestricted (heteroscedastic) model evaluated at the restricted (homoscedastic) parameter values.
I use the BHHH estimate of the parameter variance (evaluated at the restricted values).
As noted by Greene, the covariance estimate based on E(H) may be preferred.
The LM result is inconsistent with the LR and Wald results; our use of the BHHH estimator may be the reason.

A separate procedure is used to calculate marginal effects of variables under the heteroscedastic formulation, using the formulas from p. 788 in Greene:
$$\frac{\partial\,\text{Prob}(y=1 \mid X, Z)}{\partial w_k} = \phi\left(\frac{X\beta}{\exp(Z\gamma)}\right)\frac{\beta_k - (X\beta)\,\gamma_k}{\exp(Z\gamma)}$$
The variables "in_x" and "in_z" are used to determine the correct formula.
Note: the AGE impacts are not correct due to the nonlinear relationship, nor are the KIDS impacts due to the discrete nature of that variable.
Natural extension of the homoscedastic probit results.
The marginal income effect increases from 0.017 (not significant) under the homoscedastic model to 0.038, with a Z-stat of 1.8, under the heteroscedastic model.

These marginal impacts are not really meaningful given, for example, the nature of the dependent variable and the units of income.
One needs to evaluate these impacts in elasticity terms, ξ_w. How would one go about this?
$$P(y_t = 1) = \Phi\left(\frac{X_t\beta}{\exp(Z_t\gamma)}\right), \qquad \frac{e_t}{\exp(Z_t\gamma)} \sim N(0,1)$$
$$\hat\xi_{w_k} = \frac{\partial\,\text{Prob}\left(y=1 \mid X^*, Z^*, \hat\beta, \hat\gamma\right)}{\partial w_k}\cdot\frac{w_k}{\text{Prob}\left(y=1 \mid X^*, Z^*, \hat\beta, \hat\gamma\right)}$$
where the denominator is the predicted value. X*, Z* are usually set at sample mean values but can vary depending on the study being undertaken.
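As one possible answer to the elasticity question, the sketch below computes the heteroscedastic-probit marginal effect φ(Xβ/exp(Zγ))[β_k − (Xβ)γ_k]/exp(Zγ) for a variable that appears in both X and Z, and converts it to an elasticity. It is not the course's MATLAB procedure: the coefficients, the evaluation point, and the function names are hypothetical and are not the estimates reported above.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameters: X = [1, w] (intercept and w), Z = [w] (no constant in Z).
beta = [-0.5, 0.4]
gamma = [0.1]

def prob_at(w):
    """Pr(y = 1) = Phi(X beta / exp(Z gamma)) when w enters both X and Z."""
    xb = beta[0] + beta[1] * w
    zg = gamma[0] * w
    return norm_cdf(xb / math.exp(zg))

def marginal_effect(w):
    """phi(.) * [beta_k - (X beta) gamma_k] / exp(Z gamma) for w in both X and Z."""
    xb = beta[0] + beta[1] * w
    zg = gamma[0] * w
    return norm_pdf(xb / math.exp(zg)) * (beta[1] - xb * gamma[0]) / math.exp(zg)

def elasticity(w):
    """xi_w = (dProb/dw) * w / Prob, evaluated at w."""
    return marginal_effect(w) * w / prob_at(w)

w_star = 2.0  # hypothetical point of evaluation, e.g. a sample mean
me = marginal_effect(w_star)
xi = elasticity(w_star)

# Central finite difference as an independent check of the analytical formula.
h = 1e-6
me_numeric = (prob_at(w_star + h) - prob_at(w_star - h)) / (2.0 * h)
print(me, me_numeric, xi)
```

The elasticity is unit-free, so it can be compared across variables measured in different units; a standard error for it could be obtained with the same delta-method logic used for the marginal effects.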
© Copyright 2025