Problem Set #2 - Agricultural and Resource Economics

Department of Agricultural and Resource Economics
University of California at Berkeley
Steve Buck and Sylvan Herskowitz
Spring 2015
ENV ECON 118 / IAS 118 – Introductory Applied Econometrics
Assignment 2
Due February 24, at beginning of class
This assignment should be completed using Stata, and you are encouraged to use a .do file to write your code
for the exercise. To write notes in the .do file that Stata will not read as commands, type an “*” at the beginning
of each comment line. This will help you keep track of the purpose of each command,
which question you are trying to answer, etc.
The first thing you should do in your .do file is to change directories so that Stata knows where to find the data
that you downloaded and saved to your computer. To do this, you will use a command that is something like the
following (but your file path will vary):
cd "C:\....\EEP_118\PS2_Liberia"
Then turn on a log file:
log using PS_2.txt, text replace
This time you are opening a Stata-formatted “.dta” file, so you will use the “use” command instead of “insheet”,
which was needed to open a .csv spreadsheet in the last assignment. Your command should look like this:
use "PS2_LiberiaData_S2015.dta", clear
Finally, don’t forget to close your log file at the end of the do-file by including the line “log close”.
Note: For Exercise 1 you need to submit your log file from your Stata work in addition to your written answers.
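Putting the setup steps above together, a do-file might be organized as in the following sketch. The file path is the elided placeholder from this handout and must be replaced with your own. The line "capture log close" is an addition not mentioned above: it closes any log left open by an earlier run, so that rerunning the do-file does not stop with an error.

```stata
* Skeleton do-file for this assignment (a sketch, not a required layout)

* Close any log left open by a previous run (does nothing if none is open)
capture log close

* Move to the folder holding the data (placeholder path: use your own)
cd "C:\....\EEP_118\PS2_Liberia"

* Start the log, then load the Stata-formatted data set
log using PS_2.txt, text replace
use "PS2_LiberiaData_S2015.dta", clear

* Exercise 1, question 1(a)
* ... your commands for each question go here ...

* Close the log at the very end
log close
```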
Exercise 1:
The data for this exercise comes from a research project Sylvan was working on in Monrovia, Liberia. The sample
consists entirely of women who came to apply for a factory job at the beginning of 2014. Women who were eligible
to be hired were then given a survey which covered a wide range of topics including family composition, education
background, household consumption, time use, and earnings among other topics. We have drawn a sample from the
total of 720 women in the survey. The PS2_LiberiaData_S2015.dta file includes the following variables:
age: Age of respondent
secondary: Whether the respondent has completed secondary school (1=yes, 0=no)
hoh: Respondent’s status as head of household (1=household head, 0=otherwise)
hhmbrs: Number of household members
children: Number of children in household under age 14
elderly: Number of elderly in household above the age of 64
exptot: Total monthly household expenditures (in Liberian Dollars)
expclothes: Monthly expenditures spent on clothing (in Liberian Dollars)
expfood: Monthly food expenditures per household (in Liberian Dollars)
expnonfood: Monthly non-food expenditures per household (in Liberian Dollars)
Also, note that at the time of the survey the exchange rate was 80 Liberian Dollars per US Dollar.
1. First, write a short paragraph that describes your data. In particular:
a) How many women are in your data set? What is their average age? What is their full age range? How
many women have completed secondary school? How many are head of their household?
b) Construct a variable exptotpc equal to total expenditures per capita in US$ using the given exchange rate.
Plot a histogram of this constructed variable. Include this histogram in your solutions. What is the range
of household total expenditures per capita? Hint: it may be easier to first generate a variable for
expenditures per capita in Liberian Dollars and then convert it into US dollars.
c) For each household calculate the proportion of household expenditures spent on clothes. What is the
mean proportion of household expenditures on clothes? What is the median? How does this proportion of
household expenditures on clothes differ between the households of women who have completed secondary
school and those who have not?
d) Construct a variable expfoodpc equal to total expenditures on food in US$ per person in the household.
Plot a scatter diagram of this expfoodpc on exptotpc constructed in part (b). Include this in your solutions.
How does this amount spent on food relate to spending overall?
Stata tips: To count observations, use the command count. To create a new variable named var1, use the command
generate var1. Open the “Data Editor (Browse)” to see and check what you have done. To create a scatter plot of
variables y on x, use scatter y x and to create a histogram of a variable x, use the command histogram x. The
command tabulate lists all values a variable takes in the sample and the number of times it takes each value. To
summarize data for a specified subset of the observations, you can use summarize along with an “if” statement.
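The commands in these tips can be tried out on a practice data set before you use them on the survey data. The sketch below uses the "auto" example data set that ships with Stata, so none of its variables (price, mpg, weight, foreign) belong to this assignment. The variable name price_k is made up for illustration.

```stata
* Practice run of the commands from the tips, using Stata's built-in example data
sysuse auto, clear

* Count all observations, then only those meeting a condition
count
count if foreign == 1

* Create a new variable from an existing one (here: price in thousands)
generate price_k = price / 1000

* Mean, min, and max; the detail option adds the median (50th percentile)
summarize price_k, detail

* List each value a variable takes and how often it occurs
tabulate foreign

* Summarize a subset of the observations with an "if" statement
summarize mpg if foreign == 1

* Histogram of one variable, then a scatter plot of y on x
histogram price_k
scatter mpg weight
```

The same pattern applies to the survey variables: replace the variable names and conditions with the ones each question asks about.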
2. Estimate the following model of food and total expenditures:
(1)  $\text{expfood} = \beta_0 + \beta_1\,\text{exptotal} + u$
a) Interpret your $\hat{\beta}_1$, remembering the triplet S(ign), S(ize), and S(ignificance), though you don’t need to
comment on significance in this problem set.
b) How much of the variation in food expenditures is explained by variations in total expenditures?
c) What is the predicted level of food expenditure for a household with total monthly expenditures of
US$175?
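As a reminder of the mechanics behind parts (a) to (c), the sketch below runs a simple regression on Stata's built-in "auto" example data set (not the survey data), reads off the R-squared, and computes a predicted value by hand from the stored coefficients. The value 3000 is an arbitrary illustration.

```stata
* Simple regression mechanics on Stata's built-in example data
sysuse auto, clear

* Regress y on x; the output table reports the coefficients and R-squared
regress price weight

* The R-squared is also stored after estimation
display e(r2)

* Predicted y at a chosen x: intercept + slope * x
* (the x value must be in the same units as the regressor)
display _b[_cons] + _b[weight] * 3000
```

When you apply this to the survey data, check that the value you plug in is expressed in the same currency units as the variable used in the regression.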
3. Now estimate the following models of food and non-food expenditures. Note that you will first need to generate
new, logged versions of the variables in the regression model (Stata hint: this can be done by using generate with the
ln() function):
(2)  $\log(\text{expfood}) = \beta_0 + \beta_1 \log(\text{exptotal}) + u$
(3)  $\log(\text{expnonfood}) = \beta_0 + \beta_1 \log(\text{exptotal}) + u$
a) In this sample, what are the elasticities of food and non-food expenditures with respect to total
expenditures? Comparing the estimates of $\hat{\beta}_1$ in equations (2) and (3), do your results seem reasonable?
(Hint: What does it mean for an elasticity to be greater or less than 1?)
b) Using the results from (2), how would you expect food expenditure to change if total expenditures
decreased by 15%?
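The log-log workflow for question 3 can be sketched on Stata's built-in "auto" example data set (not the survey data). The variable names lprice and lweight are made up for illustration. In a log-log model the slope is read as an elasticity, so a small percentage change in x is associated with approximately slope times that percentage change in y.

```stata
* Log-log regression mechanics on Stata's built-in example data
sysuse auto, clear

* Generate logged versions of the variables with the ln() function
generate lprice  = ln(price)
generate lweight = ln(weight)

* In a log-log model the slope coefficient is an elasticity
regress lprice lweight
display _b[lweight]

* Approximate % change in y associated with a 10% increase in x
display 10 * _b[lweight]
```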
4. We will now explore the role of household size in food consumption:
(4)  $\log(\text{expfood}) = \beta_0 + \beta_1 \log(\text{exptotal}) + \beta_2\,\text{hhmbrs} + u$
a) Estimate equation (4) and interpret your results. Is it a better statistical relationship than equation (2)?
b) How did your estimate of $\hat{\beta}_1$ change between equation (2) and equation (4)? Without performing any
calculations, what information does this give you about the correlation between total expenditures and
household size? (Explain your reasoning in no more than 4 sentences.)
c) Predict the expected value of food expenditures of a household with 4 members and total expenditures of
US$150 per month using your results from (4).
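For a prediction like the one in part (c), the fitted value from a model with a logged dependent variable is a predicted log, which then has to be converted back to levels. The sketch below shows one simple way to do this on Stata's built-in "auto" example data set (not the survey data). The names lprice and lweight and the values 3000 and 20 are arbitrary illustrations. Exponentiating the fitted log is only an approximation to the expected level, and some textbooks apply an adjustment factor, so check which approach the course expects.

```stata
* Prediction from a model with a logged dependent variable (example data)
sysuse auto, clear
generate lprice  = ln(price)
generate lweight = ln(weight)

* One logged regressor and one regressor in levels
regress lprice lweight mpg

* Fitted log of y at chosen values: the logged regressor enters as ln(value)
display _b[_cons] + _b[lweight] * ln(3000) + _b[mpg] * 20

* Simple conversion back to levels by exponentiating the fitted log
display exp(_b[_cons] + _b[lweight] * ln(3000) + _b[mpg] * 20)
```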
5. A country’s dependency ratio is the ratio of old and young dependents (dependents are those not in the labor
force) to the working-age population. A similar measure could be constructed for the household:
(5)  $\text{hhdr} = \dfrac{\text{hh members under 14 or over 64}}{\text{hh members age 15--64}}$
Equation (4) doesn’t quite capture how the composition of a household, i.e. the characteristics of the members, is
associated with expenditure. You suspect that the log of total expenditures is negatively correlated with the log of
the household dependency ratio, controlling for household size.
a) Write an equation you could estimate that would test this hypothesis.
b) Estimate the equation in part (a). Does the evidence from the regression support or contradict the
hypothesis? Why? What might be driving this correlation?
c) How many observations are in the regression estimated in 5(b)? Why is this different from what was
estimated in 4(a)?
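To check how many observations a regression actually used, you can read the count from the regression output or from the stored result e(N). The sketch below uses Stata's built-in "auto" example data set (not the survey data), in which the variable rep78 has some missing values. The variable name lzero is made up for illustration.

```stata
* Checking the estimation sample size (example data)
sysuse auto, clear

* Total observations, and observations with a missing regressor
count
count if missing(rep78)

* regress drops any observation with a missing value in a model variable
regress price rep78
display e(N)

* Stata sets undefined results, such as ln(0) or division by zero, to missing
generate lzero = ln(0)
count if missing(lzero)
```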
Exercise 2
Population growth is a critical factor affecting a country’s growth, development, and ability to manage its natural
resources. A researcher is interested in the relationship between different personal and household characteristics and
women’s fertility outcomes. She has information from a survey of women in a country she is interested in. She has
the following pieces of data in her data set:
children: number of children that the respondent has
income: household income in thousands of dollars per year
ageatmarriage: age of the woman when she first got married
education: number of years of education for the respondent
parentsincome: income of the respondent’s parents
In order to explore this further, the researcher runs a few regressions. She gets the following results from her
regressions:
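(1)  $\text{children} = \underset{(0.33)}{2.45} - \underset{(0.14)}{1.57}\,\text{income} - \underset{(0.04)}{0.76}\,\text{ageatmarriage}$
R^2 = 0.37
(2)  $\text{children} = \underset{(0.28)}{2.42} - \underset{(0.09)}{0.93}\,\text{income} - \underset{(0.02)}{0.52}\,\text{ageatmarriage} - \underset{(0.11)}{0.33}\,\text{education}$
R^2 = 0.53
(3)  $\text{children} = \underset{(1.74)}{2.48} - \underset{(0.58)}{0.65}\,\text{income} - \underset{(0.54)}{0.38}\,\text{ageatmarriage} - \underset{(0.18)}{0.22}\,\text{education} - \underset{(0.18)}{0.53}\,\text{parentsincome}$
R^2 = 0.55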
Remember that the numbers in parentheses beneath the regression equation are the standard errors for the estimated
parameter value. The R^2 is also reported for each regression model.
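As a reminder of how a standard error is used to judge significance, the t-statistic for the null hypothesis that a coefficient is zero is the estimate divided by its standard error. The worked example below uses the income coefficient from regression (1). The rule of thumb that an absolute value above roughly 2 indicates significance at the 5% level is an approximation that assumes a reasonably large sample, and the sample size is not reported here.

```latex
t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)},
\qquad
t_{\text{income}}^{(1)} = \frac{-1.57}{0.14} \approx -11.2
```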
a) Looking at the results in the first regression (1), do the signs of the coefficients on income and
ageatmarriage make sense? Are they statistically significant? Do you trust the estimated magnitude
of these coefficients? Why or why not?
b) The researcher then decides to add the variable education to the regression model and estimates
equation (2). Comparing the results from equation (2) and the original ones from equation (1), what
problem do you think the first model may have had? How can you tell?
c) The researcher then decides to go further and add parentsincome to the other variables and estimates
equation (3). She notes that the R^2 has improved slightly over the R^2 in equation (2). If there are
other notable advantages or disadvantages of equation (3) relative to (2) then point them out. Overall,
do you think model (3) is an improvement over the one in equation (2)? Why or why not?