Empirical Assignment 2 - University of British Columbia

Page 1 of 8
3/30/2015
The University of British Columbia
Department of Economics
Economics 561: Economics of Labour (Industrial Relations)
Professor Nicole M. Fortin
Winter 2015
Empirical Assignment #2
Due Date: April 15th
The purpose of this assignment is to give you practical knowledge of the Oaxaca-Blinder decomposition
methodology for means, the RIF decompositions for quantiles, and of the DFL decomposition
methodology for wage distributions. As always, feel free to work cooperatively and in groups. However,
each student must hand in his/her own problem set using his/her own interpretation of the results. You
are encouraged to use STATA which is available on the computers on the 11th floor. Do not hand in
your entire computer output. You can paste parts of your log files into your assignment but you should
sift through your output to include only the relevant parts. The data for the assignment are available on
the course web site in the zip file: assign2.zip. Some pre-programmed procedures (oaxaca,
fastgini) have to be downloaded from STATA resources (e.g. by issuing the command : findit
oaxaca. The estimations in both papers below were originally performed using the SAS software, so
that you may not be able to get the exact same numbers.
Note that the STATA commands below use ; as delimiter, you need to begin your program with the
command
#delimit ;
*/the delimit command replace the cr as end of line*/
1. In this exercise, you are asked to reproduce some of the results of the paper
O’Neill, J. and D. O’Neill, “What Do Wage Differentials Tell Us about Labor Market
Discrimination?” NBER Working Paper 11240 (April 2005).
The NLSY data used by the O’Neills can be downloaded as file nlsy00.dta. Refer to the article for
explanations of the variables. Here, we will focus on racial and ethnic wage differentials for men, so
drop women and rescale the AFQT scores.
drop if female==1;
replace afqtp89=afqtp89/100.0;
a) We will begin by running the OLS regressions to measure the log wage differentials as partial
regression coefficients of the dummies variable for black in a sample of non Hispanic men. You
should gather your results in a Table with 6 columns, where each column will correspond to a
different specification and where the coefficient of black will be the first entry, a bit like Table
1, but focusing only on the black-white differentials. Note that lropc00 is the variable
representing log wage and sch10_12 is the omitted category for education.
1. First run a regression where the only explanatory variable is black to get the raw log wage
differential and its standard errors. Is it statistically significant?
reg lropc00 black if hispanic==0;
2. Next add some demographic variables:
reg lropc00 black age00 msa ctrlcity north_central south00 if hispanic==0;
Page 2 of 8
3/30/2015
3. Then add AFQT scores:
reg lropc00 black age00 msa ctrlcity north_central south00 afqtp89 if
hispanic==0;
4. Now include schooling variables but remove AFQT scores
reg lropc00 black age00 msa ctrlcity north_central south00 west sch_10
diploma_hs ged_hs smcol bachelor_col master_col doctor_col if
hispanic==0;
5. Then, include variables pertaining to labour market experience
reg lropc00 black age00 msa ctrlcity north_central south00 west sch_10
diploma_hs ged_hs smcol bachelor_col master_col doctor_col age1stb30
wkswk_18 yrsmil78_00 famrspb pcntpt_22 if hispanic==0;
6.
Finally, add the AFQT scores back
reg lropc00 black age00 msa ctrlcity north_central south00 west sch_10
diploma_hs ged_hs smcol bachelor_col master_col doctor_col afqtp89
age1stb30 wkswk_18 yrsmil78_00 famrspb pcntpt_22 if hispanic==0;
Comparing the results of the different columns what can you conclude about the magnitude and
statistical significance of the unexplained log wage differentials between Black and Whites in
this NLSY79 sample? What can you conclude about the effect of AFQT scores in accounting for
the racial wage gap? Why could the results from comparing different columns lead to erroneous
interpretations of the effect of AFQT (Hint: any omitted variables bias)? Explain.
b) Next you are asked to reproduce the decomposition results of Table 5 for the model M1. Begin
by obtaining means of the explanatory variables for the 3 racial/ethnic groups,
for example for whites, run the command
sum lropc00 age00 msa ctrlcity north_central south00 west sch_10 sch10_12
diploma_hs ged_hs smcol bachelor_col master_col doctor_col afqtp89 if
white==1 & hispanic==0;
Can you predict which variables are likely to have more explanatory which respect to
racial/ethnic wage differentials?
c) Use the following commands to perform an Oaxaca-Blinder decomposition (after being sure that
the oaxaca package has been installed on your computer) of the white-black wage differential
among men. Use in turn the white coefficients, the black coefficients and then the pooled
coefficients (the option weight(1) use group 1 coefficients and weight(0) group 2
coefficients as reference coefficients).
oaxaca lropc00 age00 msa ctrlcity north_central south00 west sch_10
diploma_hs ged_hs smcol bachelor_col master_col doctor_col afqtp89 if
hispanic==0, weight(0) by(white) detail(groupage: age00 msa ctrlcity
north_central south00 west, grouped: sch_10 diploma_hs ged_hs smcol
bachelor_col master_col doctor_col);
oaxaca lropc00 age00 msa ctrlcity north_central south00 west sch_10
diploma_hs ged_hs smcol bachelor_col master_col doctor_col afqtp89 if
hispanic==0, weight(1) by(white) detail(groupage: age00 msa ctrlcity
north_central south00 west, grouped: sch_10 diploma_hs ged_hs smcol
bachelor_col master_col doctor_col);
oaxaca lropc00 age00 msa ctrlcity north_central south00 west sch_10
diploma_hs ged_hs smcol bachelor_col master_col doctor_col afqtp89 if
hispanic==0, pooled by(white) detail(groupage: age00 msa ctrlcity
Page 3 of 8
3/30/2015
north_central south00 west, grouped: sch_10 diploma_hs ged_hs smcol
bachelor_col master_col doctor_col);
Does the unexplained part in this last pooled decomposition correspond to the coefficient of the
black dummy estimated in a) column 6. Because the coefficients of different reference groups
are used as reference coefficients in each of the three decompositions, we might expect different
results. Explain. Would you conclude that there remains no significant explained racial wage gap
once racial differences in schooling and AFTQ scores have been controlled for? Are your
predictions in b) about the most important explanatory variables confirmed? Explain. What
portion of the racial wage gap is accounted for by racial differences in AFTQ scores? Do your
conclusions about the importance of AFTQ scores in accounting for racial gap differ from a)?
Which methodology gives you more conclusive results? Explain.
d) Now perform the corresponding estimation for the white-Hispanic wage differential. Would you
conclude that there remains no significant explained white-Hispanic wage gap once ethnic
differences in schooling and AFTQ scores have been controlled for? Is this result sensitive to the
choice of reference group? Explain.
e) One problem with the Oaxaca-Blinder decomposition is that it is not-invariant to the choice of
omitted category in the case of categorical variables. Redo the black-white decomposition above
but this time, choose south00 as the omitted region, you will need to construct the dummy
variable east
gen east=0;
replace east=1 if west==0 & north_central==0 & south00==0;
Does anything change? Explain what is going on.
2. In this exercise, you are asked to reproduce some of the results of the paper
DiNardo, J., N. Fortin, and T. Lemieux, “Labor Market Institutions and the Distribution of
Wages,1973-1992: A Semiparametric Approach," Econometrica, Vol. 64 (Sept. 1996):1001-44.
The CPS data for 1979 and 1988 for male workers used in DFL can be downloaded from on the
course web site as menun7988_cell.dta. Refer to the article for explanations of the variables. Note
that standard errors for these procedures are typically obtained by bootstrap, which you will not be
asked to compute.
a) Begin by displaying the density of male log wages in 1979 and 1988, that is, reproduce Figure
4a.
set mem 50m;
use menun7988_cell;
replace wage=wage*65.8/104.3 if year88==1; /*transform to 1979 dollars*/
gen lwage=log(wage);
gen hweight=eweight*uhrswk/100.0;
/*hours weighted*/
*to simplify the kernel density estimation, estimated it only at 200 values;
quietly sum lwage, detail;
gen xstep=(r(max)-r(min))/200;
*kwage will be the wage at which the density is estimated;
gen kwage=r(min)+(_n-1)*xstep if _n<=200;
kdensity lwage [aweight=hweight] if year88==1 , at(kwage) gauss width(0.065)
generate(w88 fd88) nograph ;
Page 4 of 8
3/30/2015
kdensity lwage [aweight=hweight] if year88==0 , at(kwage) gauss width(0.065)
generate(w79 fd79) nograph ;
label var fd88 "Men 1988";
label var fd79 "Men 1979";
label var kwage "Log(Wage)";
graph twoway (connected fd88 kwage if kwage>=0 & kwage<=3.91, ms(i) lc(blue) )
(connected fd79 kwage if kwage>=0 & kwage<=3.91, ms(i) lp(dash) lc(magenta) ),
xlabel(.69 1.61 2.3 3.22) xline(0.748 1.065,lstyle(grid))) scheme(s1color)
saving(dflfig4a,replace);
graph export dflfig4a.wmf, replace;
*/ “*.wmf” can be inserted in Word doc/*
Where in the wage distribution do the changes over time appear more striking?
Compute the 90-10, 90-50 and 50-10 wage differentials, the variance of log wages, and the Gini
(fastgini needs to be installed). Which measure(s) appear to capture best the differences
shown in the graph of densities?
forvalues t=0/1 { ;
sum lwage [weight=hweight] if year88==`t', detail;
gen d9010_`t'=r(p90)-r(p10); di "d9010= " d9010_`t';
gen d9050_`t'=r(p90)-r(p50); di "d9050= " d9050_`t';
gen d5010_`t'=r(p50)-r(p10); di "d5010= " d5010_`t';
gen std_`t'=r(sd); di std_`t’;
fastgini wage [weight=hweight] if year88==`t';
};
Your results will be different but in the same range as those displayed in Table A1. Begin to built
Table 2 (using the template in the zip file) where each measure of wage inequality will
correspond to a column and the different distributions will correspond to different rows
(transposing Table III) and add a row that displays the differences in the measures between the
two years.
b) Our next exercise asks what would the 1988 distribution of wages have looked like if union
coverage had remained at its 1979 level. For that purpose, we will need to compute the
reweighting function
Pr(𝑢 = 1|𝑡𝑢|𝑥 = 79)
Pr(𝑢 = 0|𝑡𝑢|𝑥 = 79)
𝜓𝑢|𝑥 (𝑢, 𝑥) = 𝑢 ∙
+ [1 − 𝑢] ∙
Pr(𝑢 = 1|𝑡𝑢|𝑥 = 88)
Pr(𝑢 = 0|𝑡𝑢|𝑥 = 88)
First compute the probability of union coverage in each year that depends on the other
covariates.
forvalues t=0/1 { ;
probit covered ee1-ee15 exper exper2 exper3 exper4 edex educ reg1-reg3
ind1-ind10 ind12-ind18 occ1 occ2 nonwhite partt married smsa [pweight=eweight]
if year88==`t';
predict pruncx_`t', p;
};
gen phiux=pruncx_0/pruncx_1 if covered==1 & year88==1;
replace phiux=(1-pruncx_0)/(1-pruncx_1) if covered==0 & year88==1;
replace phiux=phiux*hweight/1000.0 if year88==1;
Then obtain the corresponding kernel density estimate
kdensity lwage [aweight=phiux] if year88==1 , at(kwage) gauss width(0.065)
generate(w88un79 fd88un79) nograph ;
Page 5 of 8
3/30/2015
and add the counterfactual density to your graph in a) and compute the counterfactual measures
of inequality by replacing hweight by phiux in the last set of commands in a). Which part of
the wage distribution is likely most impacted by the decrease in unionization? Will these changes
be reflected in our measures? For which measures are these effects in the range of results
reported in DFL (Table III or Table VI)? Explain (Hint: remember the importance of the order of
the decomposition in question 1a). Is this counterfactual likely to overstate or understate the
effects of union? Explain.
c) In this next exercise, we ask what would the 1988 wage distribution have looked like if all
available covariates, including union coverage, had remained at their 1979 level. To construct
this counterfactual, we will need to compute the reweighting function
Pr(𝑡𝑥 = 79| 𝑥) Pr(𝑡𝑥 = 88)
𝜓𝑥 (𝑥) =
∙
Pr(𝑡𝑥 = 88| 𝑥) Pr(𝑡𝑥 = 79)
***probit for year effect;
forvalues i=1/18 {;
gen ind`i'cov=ind`i'*covered;
};
probit year88 ee1-ee15 exper exper2 exper3 exper4 edex educ reg1-reg3 covered
ind1cov-ind18cov nonwhite partt married smsa [iweight=eweight];
predict py88, p;
summ year88 [weight=eweight] ;
gen pbar=r(mean);
gen phix=((1-py88)/py88)*(pbar/(1-pbar)) if year88==1;
replace phix=phix*hweight if year88==1;
Estimate the corresponding counterfactual density
kdensity lwage [aweight=phix] if year88==1 , at(kwage) gauss width(0.065)
generate(wx79 fdx79) nograph ;
and add it to your graph in a) (in a different color/style) and compute the counterfactual measures
of inequality by replacing hweight by phix in the commands in a). How much of the changes
in these measures are accounted for by this counterfactual? Which part of the wage distribution
is likely most impacted by these compositional effects? How is this different from the impact of
unionization by itself? For which measures are these effects in the range of results reported in
DFL (Table III or Table VI)? Explain. Which important factor is this analysis omitting?
3. (Ph.D. Students) In this third exercise, we are asked to use the RIF-regressions introduced in
Firpo, S., N.M. Fortin and T. Lemieux, "Unconditional Quantile Regressions," Econometrica,
Vol.77 (2009): 953-973.
to revisit some of the results from the previous exercise focusing only on the 90-10, 90-50 and
50-10 log wage differentials.
You will begin by creating a counterfactual time period with the counterfactual weights after the
above computation of the weights phix
save temp01, replace;
keep if year88==1;
replace hweight=phix;
replace year88=2;
save temp2, replace;
append using temp01;
Page 6 of 8
3/30/2015
save temp012, replace;
/*this dataset will include data for 1979, 1988 and 1988rw79 */
Note that the correct standard errors for these procedures below are typically obtained by
bootstrap, which you will not be asked to compute. The programs give approximate standard
errors only.
a) An important limitation of the DFL procedure is that it leaves the union premium at its 1988
level, whereas likely as union coverage has decreased so did the union premium. Check whether
the union premium has indeed changed at the mean and at 10th, 50th , and 90th centiles of the
wage distribution using rifreg regression (ado file included in the *.zip to be copied in your
relevanty your c:\ado\personal\r directory or equivalent)
forvalues t=0/2{ ;
reg lwage covered educ exper exper2 exper3 exper4 nonwhite partt married
smsa reg1-reg3 [weight=hweight] if year88==`t';
forvalues qt = 10(40)90 { ;
rifreg lwage covered educ exper exper2 exper3 exper4 nonwhite partt married
smsa reg1-reg3 [weight=hweight] if year88==`t', q(`qt') retain(rif`t’_`qt’);
};
};
Has the union premium changed between 1988 and 1979 if so, for which percentiles of the log
wage distribution? How is this likely to have impacted changing wage inequality? Discuss.
b) Use the RIF computed in a) to perform a standard Oaxaca-Blinder decomposition at the 10th, 50th
, and 90th centiles of the wage distribution comparing period 1 and period 0
gen rifat=.;
forvalues qt = 10(40)90 { ;
** get decomposition without reweighing;
** Explained: [E(X_1|t=1)- E(X_0|t=0)]B_0 ;
** Unexplained: E(X_1|t=1)*[B_1-B_0] ;
** Oaxaca “weight” option: 1=b1, 0=b2 ;
replace rifat=rif0_`qt' if year88==0;
replace rifat=rif1_`qt' if year88==1;
oaxaca rifat covered educ exper exper2 exper3 exper4 nonwhite partt
married smsa reg1-reg3 [aweight=hweight] if year88==0 | year88==1 ,
by(year88) weight(0) swap
detail(groupun:covered, grouped:educ , groupex:exper exper2 exper3 exper4,
groupdem: nonwhite partt married smsa reg1 reg2 reg3) ;
matrix Rt`qt'=e(b);
};
and then comparing the effects at the three percentiles of interest pairwise to obtain the changes
over time (first D) in the wage differentials (second D). Begin to gather the changes in wage
differentials in Table 3 (available in the *.zip file)
matrix DR9010=Rt90-Rt10; matrix list DR9010;
matrix DR9050=Rt90-Rt50; matrix list DR9050;
matrix DR5010=Rt50-Rt10; matrix list DR5010;
Compare the effects of the changes in union coverage (explained part) on the three log wage
differentials from the RIF-OB decompositions to the ones obtained in Table III and V of DFL,
Page 7 of 8
3/30/2015
and to the results in question 2), are they in the same range? Discuss. What is an advantage of
using RIF-OB decomposition? What is a drawback?
c) Now perform a reweighted decomposition to separate the wage structure effects from the
composition effects. Here, you have to appeal to the reweighted data set year88=2. Because
we have reweighted 1988 distribution to look like the 1979, we want compare the 1988rw79 and
1988 to get the composition effects and the 1988rw79 to the 1979 to get the wage structure
effects. We will perform the two following decomposition OB1) with sample 1988 (1) and
sample 1988rw79 (10) to get the pure composition effect, OB2) with sample 1979 (0) and
sample 1988rw79 (10) to get the pure wage structure effect. Because we are reweighting 1 to be
like 0, the equations are different from the notes (where the reweighting is from 0 to 1). Thus we
are estimating
𝑌̅1 − 𝑌̅0
(𝑌̅1 − 𝑌̅𝑐 )
(𝑌̅𝑐 − 𝑌̅0 )
=
+
=
(𝑋̅1 𝛽̂1 − 𝑋̅𝑐 𝛽̂𝑐 )
+
(𝑋̅𝑐 𝛽̂𝑐 − 𝑋̅0 𝛽̂0 )
= (𝑋̅1 − 𝑋̅𝑐 )𝛽̂1 + 𝑋̅𝑐 (𝛽̂1 − 𝛽̂𝑐 ) + 𝑋̅0 (𝛽̂𝑐 − 𝛽̂0 ) + (𝑋̅𝑐 − 𝑋̅0 )𝛽̂𝑐
Composition Effect + Spec. Error
Wage Structure + Reweighting error
replace rifat=.;
*** get composition effects with reweighting ;
** Explained: [E(X_1|t=1)- E(X_1|t=0)]B_c ;
** Total unexplained is specification error: E(X_1|t=1)](B_1-B_c);
forvalues qt = 10(40)90 { ;
replace rifat=rif1_`qt' if year88==1;
replace rifat=rif2_`qt' if year88==2;
oaxaca rifat covered educ exper exper2 exper3 exper4 nonwhite partt married
smsa reg1-reg3 [aweight=hweight] if year88==1 | year88==2 ,
by(year88) weight(1)
detail(groupun:covered, grouped:educ , groupex:exper exper2 exper3
exper4, groupdem: nonwhite partt married smsa reg1 reg2 reg3) ;
matrix Rx`qt'=e(b);
};
***the
matrix
matrix
matrix
composition effects are the «
Dx9010=Rx90-Rx10; matrix list
Dx9050=Rx90-Rx50; matrix list
Dx5010=Rx50-Rx10; matrix list
explained » effect ;
Dx9010;
Dx9050;
Dx5010;
To get the wage structure effects, we want to compare the 1979 and the 1988rw79
replace rifat=.;
forvalues qt = 10(40)90 { ;
*** get wage structure effects with reweighting;
** Unexplained: E(X_0|t=0)*[B_c-B_0] ;
** Reweighting error is Total explained: [E(X_1|t=0)-E(X_0|t=0)]*B_c;
replace rifat=rif1_`qt' if year88==2;
replace rifat=rif2_`qt' if year88==0;
Page 8 of 8
3/30/2015
oaxaca rifat covered educ exper exper2 exper3 exper4 nonwhite partt married
smsa reg1-reg3 [aweight=hweight] if year88==0 | year88==2 ,
by(year88) weight(1)
detail(groupun:covered, grouped:educ , groupex:exper exper2 exper3
exper4,groupdem: nonwhite partt married smsa reg1 reg2 reg3) ;
matrix Rw`qt'=e(b);
};
***the
matrix
matrix
matrix
wage structure effects are the « unexplained » effect ;
Dw9010=Rw90-Rw10; matrix list Dw9010;
Dw9050=Rw90-Rw50; matrix list Dw9050;
Dw5010=Rw50-Rw10; matrix list Dw5010;
Compare the results from the classic Oaxaca-Blinder decomposition in b) with the results of this
reweighted decomposition. Discuss the relative role of de-unionization and increasing returns to
education in changes in wage inequality over this period. Contrast their effects on overall wage
inequality (90-10), upper end wage inequality (90-50) and lower end wage inequality (50-10).