ECON4950 Problem Set 1 Georgia State University

Questions on Background Material
1. A random sample of 22 business economists was asked to predict the
percentage growth in the consumer price index over the next year. The
forecasts were:
3.6, 3.1, 3.9, 3.7, 3.5, 3.7, 3.4, 3.0, 3.6, 3.4, 3.1, 2.9, 3.0, 4.0, 2.8, 3.8, 4.2,
2.5, 3.1, 3.9, 2.9, 2.6
(a) What are the sample mean, minimum, and maximum?
(b) What are the sample variance and standard deviation?
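A minimal Python sketch of the calculations in parts (a) and (b), using only the standard library. It assumes the sample (n - 1) definitions of variance and standard deviation; the variable names are illustrative.

```python
import statistics

# Forecasts of CPI growth (percent), copied from the problem statement.
forecasts = [3.6, 3.1, 3.9, 3.7, 3.5, 3.7, 3.4, 3.0, 3.6, 3.4, 3.1,
             2.9, 3.0, 4.0, 2.8, 3.8, 4.2, 2.5, 3.1, 3.9, 2.9, 2.6]

sample_mean = statistics.mean(forecasts)
sample_min = min(forecasts)
sample_max = max(forecasts)

# statistics.variance and statistics.stdev divide by n - 1 (sample versions).
sample_var = statistics.variance(forecasts)
sample_sd = statistics.stdev(forecasts)

print(f"n = {len(forecasts)}")
print(f"mean = {sample_mean:.4f}, min = {sample_min}, max = {sample_max}")
print(f"variance = {sample_var:.4f}, std. dev. = {sample_sd:.4f}")
```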
The following table displays data on annual solid waste collection for nine
cities in the U.S. The data includes information on the number of households in the city, the total tons of solid waste (per year), and the revenue
generated by the solid waste haulers (per year).
City  Number of Households  Total Tons  Revenue
A     2200                  3080        118800
B     2500                  3500        200000
C     2700                  3780        250560
D     4000                  5600        201600
E     4000                  5600        308800
F     4000                  5600        268800
G     5500                  7700        452100
H     6000                  8400        277200
I     9000                  12600       358200
(a) Calculate the covariance between revenue and number of households.
(b) Calculate the covariance between revenue and total tons collected.
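One way to organize the covariance calculations is sketched below. It assumes the sample covariance (dividing by n - 1); the function name sample_covariance is illustrative, not part of the assignment.

```python
def sample_covariance(x, y):
    """Sample covariance: sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Data for cities A through I, copied from the table above.
households = [2200, 2500, 2700, 4000, 4000, 4000, 5500, 6000, 9000]
tons = [3080, 3500, 3780, 5600, 5600, 5600, 7700, 8400, 12600]
revenue = [118800, 200000, 250560, 201600, 308800, 268800,
           452100, 277200, 358200]

cov_rev_households = sample_covariance(revenue, households)
cov_rev_tons = sample_covariance(revenue, tons)
print(f"cov(revenue, households) = {cov_rev_households:.1f}")
print(f"cov(revenue, tons)       = {cov_rev_tons:.1f}")
```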
2. A large consumer goods company has been studying the effect of advertising on total profits. As part of this study, data on advertising expenditures
and total sales were collected for a six-month period and are as follows:
(10, 100), (15, 200), (7, 80), (12, 120), (14, 150).
(a) Plot the data and compute the correlation coefficient.
(b) Do these results provide conclusive evidence that advertising has a
positive effect on sales? Explain your reasoning.
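The correlation coefficient in part (a) could be computed along the following lines. This sketch assumes each pair is ordered (advertising, sales), following the order in which the two variables are mentioned; the function name correlation is illustrative.

```python
import math

def correlation(x, y):
    """Sample (Pearson) correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Pairs copied from the problem statement; (advertising, sales) order assumed.
pairs = [(10, 100), (15, 200), (7, 80), (12, 120), (14, 150)]
advertising = [a for a, _ in pairs]
sales = [s for _, s in pairs]

r = correlation(advertising, sales)
print(f"r = {r:.3f}")
```

The n - 1 factors in the covariance and the two standard deviations cancel, which is why the function works directly with sums of deviations.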
Questions on Chapter 2 (Simple Regression) from Wooldridge
Answer the following questions from the end of the chapter of the textbook.
Please show your work, and attach your log file for the computer problems. All
data can be found at http://gsu-econ4950.s3.amazonaws.com.
Problem 2.3
The following table contains the ACT score and the GPA (grade point average)
for eight college students. Grade point average is based on a four-point scale
and has been rounded to one digit after the decimal.
Student  GPA  ACT
1        2.8  21
2        3.4  24
3        3.0  26
4        3.5  27
5        3.6  29
6        3.0  25
7        2.7  25
8        3.7  30
1. Estimate the relationship between GPA and ACT using OLS; that is,
obtain the intercept and slope estimates in the equation
\[ \widehat{GPA} = \hat{\beta}_0 + \hat{\beta}_1 \, ACT \tag{1} \]
Comment on the direction of the relationship. Does the intercept have
a useful interpretation here? Explain. How much higher is the GPA
predicted to be if the ACT score is increased by five points?
2. Compute the fitted values and residuals for each observation, and verify
that the residuals (approximately) sum to zero.
3. What is the predicted value of GPA when ACT = 20?
4. How much of the variation in GPA for these eight students is explained
by ACT? Explain.
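The hand calculations for this problem can be checked with a short script. The sketch below uses the textbook OLS formulas (slope equal to the sample covariance of x and y divided by the sample variance of x, and intercept equal to y-bar minus slope times x-bar); the function name simple_ols is illustrative.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Data copied from the table in the problem.
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

b0, b1 = simple_ols(act, gpa)
fitted = [b0 + b1 * a for a in act]
residuals = [g - f for g, f in zip(gpa, fitted)]

# R-squared = 1 - SSR / SST, where SSR is the sum of squared residuals.
gpa_bar = sum(gpa) / len(gpa)
sst = sum((g - gpa_bar) ** 2 for g in gpa)
ssr = sum(e ** 2 for e in residuals)
r_squared = 1 - ssr / sst

print(f"GPA_hat = {b0:.4f} + {b1:.4f} * ACT")
print(f"sum of residuals = {sum(residuals):.2e}, R-squared = {r_squared:.3f}")
```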
Problem 2.4
The data set bwght.csv contains data on births to women in the United States.
Two variables of interest are the dependent variable, infant birth weight
in ounces (bwght), and an explanatory variable, average number of cigarettes
the mother smoked per day during pregnancy (cigs). The following simple
regression was estimated using data on n = 1,388 births.
\[ \widehat{bwght} = 119.77 - 0.514 \, cigs \tag{2} \]
1. What is the predicted birth weight when cigs = 0? What about when
cigs = 20 (one pack per day)? Comment on the difference.
2. Does this simple regression necessarily capture a causal relationship between the child’s birth weight and the mother’s smoking habits? Explain.
3. To predict a birth weight of 125 ounces, what would cigs have to be?
Comment.
4. The proportion of women in the sample who do not smoke while pregnant
is about 0.85. Does this help reconcile your finding from part 3?
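The prediction questions amount to evaluating, and then inverting, the fitted line in equation (2). A small sketch, with illustrative function names:

```python
# Coefficients copied from equation (2): bwght_hat = 119.77 - 0.514 * cigs.
INTERCEPT = 119.77
SLOPE = -0.514

def predict_bwght(cigs):
    """Predicted birth weight (ounces) for a given number of cigarettes per day."""
    return INTERCEPT + SLOPE * cigs

def cigs_for_bwght(target):
    """Value of cigs at which the fitted line equals the target birth weight."""
    return (target - INTERCEPT) / SLOPE

for cigs in (0, 20):
    print(f"cigs = {cigs:2d}: predicted bwght = {predict_bwght(cigs):.2f}")
print(f"cigs needed for 125 ounces: {cigs_for_bwght(125):.2f}")
```

Inverting the line is purely mechanical; whether the resulting value of cigs makes sense is the point of the question.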
Problem 2.6
Using data from 1988 for houses sold in Andover, Massachusetts, from Kiel
and McClain (1995), the following equation relates housing price (price) to the
distance from a recently built garbage incinerator (dist):
\[ \widehat{\log(price)} = 9.40 + 0.312 \log(dist) \]
\[ n = 135, \quad R^2 = 0.162. \]
1. Interpret the coefficient on log(dist). Is the sign of this estimate what you
expect it to be?
2. Do you think simple regression provides an unbiased estimator of the
ceteris paribus elasticity of price with respect to dist? (Think about the
city’s decision on where to put the incinerator.)
3. What other factors about a house affect its price? Might these be correlated with distance from the incinerator?
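For reference, the usual reason a log-log slope is read as an elasticity can be sketched as follows. The population symbols $\beta_0$, $\beta_1$, and $u$ are notation assumed here (they are not given in the problem), and the approximation holds $u$ fixed and is best for small changes.

```latex
\[
\log(price) = \beta_0 + \beta_1 \log(dist) + u
\quad\Longrightarrow\quad
\beta_1 = \frac{\partial \log(price)}{\partial \log(dist)}
\approx \frac{\%\Delta price}{\%\Delta dist}
\]
```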
Problem 2.9, First Part
1. Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the intercept and slope from the regression of $y_i$ on $x_i$,
using n observations. Let $c_1$ and $c_2$, with $c_2 \neq 0$, be constants. Let $\tilde{\beta}_0$ and
$\tilde{\beta}_1$ be the intercept and slope from the regression of $c_1 y_i$ on $c_2 x_i$. Show
that $\tilde{\beta}_1 = (c_1/c_2)\hat{\beta}_1$ and $\tilde{\beta}_0 = c_1 \hat{\beta}_0$. [Hint: To obtain $\tilde{\beta}_1$, plug the scaled
versions of x and y into the definition of $\hat{\beta}_1$, and then use $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
for $\tilde{\beta}_0$.]
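Before writing the proof, it can help to confirm the claim numerically. The sketch below is a sanity check, not a proof; the data and the constants c1 and c2 are made up for illustration, and simple_ols is an illustrative helper.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Made-up data and constants, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
c1, c2 = 3.0, 0.5

b0_hat, b1_hat = simple_ols(x, y)
b0_tilde, b1_tilde = simple_ols([c2 * xi for xi in x], [c1 * yi for yi in y])

print(f"slope:     {b1_tilde:.6f} vs (c1/c2) * b1_hat = {(c1 / c2) * b1_hat:.6f}")
print(f"intercept: {b0_tilde:.6f} vs c1 * b0_hat = {c1 * b0_hat:.6f}")
```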
Computer Problem 2.1
The data in 401k.csv are a subset of data analyzed by Papke (1995) to study the
relationship between participation in a 401(k) pension plan and the generosity of
the plan. The variable prate is the percentage of eligible workers with an active
account; this is the variable we would like to explain. The measure of generosity
is the plan match rate, mrate. This variable gives the average amount the firm
contributes to each worker’s plan for each $1 contribution by the worker. For
example if mrate = 0.50, then a $1 contribution by the worker is matched by a
50 cent contribution by the firm.
1. Find the average participation rate and the average match rate in the
sample of plans.
2. Now, estimate the simple regression equation:
\[ \widehat{prate} = \hat{\beta}_0 + \hat{\beta}_1 \, mrate \tag{3} \]
and report the results along with the sample size and R-squared.
3. Interpret the intercept in your equation. Interpret the coefficient on
mrate.
4. Find the predicted prate when mrate = 3.5. Is this a reasonable prediction? Explain what is happening here.
5. How much of the variation in prate is explained by mrate? Is this a lot in
your opinion?
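A possible workflow for this problem in plain Python is sketched below. It assumes a local copy of 401k.csv with a header row containing columns named prate and mrate (the file layout is an assumption, not confirmed by the assignment); the function names are illustrative. Any statistics package that produces a log file would serve equally well.

```python
import csv

def simple_ols(x, y):
    """Return (intercept, slope, r_squared) from an OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1 - ssr / sst

def summarize_plans(rows):
    """Sample means and the OLS fit of prate on mrate from dict-like rows."""
    rows = list(rows)
    prate = [float(r["prate"]) for r in rows]
    mrate = [float(r["mrate"]) for r in rows]
    b0, b1, r2 = simple_ols(mrate, prate)
    return {"n": len(rows),
            "mean_prate": sum(prate) / len(prate),
            "mean_mrate": sum(mrate) / len(mrate),
            "intercept": b0, "slope": b1, "r_squared": r2}

def summarize_file(path):
    """Assumes a CSV with a header row that includes 'prate' and 'mrate'."""
    with open(path, newline="") as f:
        return summarize_plans(csv.DictReader(f))

# Usage, once the data file has been downloaded:
# results = summarize_file("401k.csv")
# predicted = results["intercept"] + results["slope"] * 3.5
```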
Computer Problem 2.2
The data set in ceosal2.csv contains information on chief executive officers for
U.S. corporations. The variable salary is annual compensation, in thousands of
dollars, and ceoten is prior number of years as company CEO.
1. Find the average salary and the average tenure in the sample.
2. How many CEOs are in their first year as CEO (that is, ceoten = 0)?
What is the longest tenure as a CEO?
3. Estimate the simple regression model
\[ \log(salary) = \beta_0 + \beta_1 \, ceoten + u \tag{4} \]
and report your results in the usual form. What is the (approximate)
predicted percentage increase in salary given one more year as a CEO?
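As a reminder of how a log-level slope is usually read, the following sketch holds $u$ fixed and would be applied with $\beta_1$ replaced by its estimate. The first expression is the usual approximation, which works well when $\beta_1 \Delta ceoten$ is small; the second is the exact change.

```latex
\[
\%\Delta salary \approx 100 \cdot \beta_1 \cdot \Delta ceoten,
\qquad
\%\Delta salary = 100 \left[ \exp(\beta_1 \Delta ceoten) - 1 \right]
\]
```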
Computer Problem 2.6
Use the data in meap93.csv to explore the relationship between the math pass
rate among tenth graders at a high school (math10) and spending per student
(expend).
1. Do you think each additional dollar spent has the same effect on the pass
rate, or does a diminishing effect seem more appropriate? Explain.
2. In the population model
\[ math10 = \beta_0 + \beta_1 \log(expend) + u \tag{5} \]
argue that $\beta_1/10$ is the percentage point change in math10 given a 10%
increase in expend.
3. Use the data in meap93.csv to estimate the model from part 2. Report
the estimated equation in the usual way, including the sample size and
R-squared.
4. How big is the estimated spending effect? Namely, if spending increases
by 10%, what is the estimated percentage point increase in math10?
5. One might worry that regression analysis can produce fitted values for
math10 that are greater than 100. Why is this not much of a worry in this
data set?
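For the level-log model in equation (5), the effect of a percentage change in expend can be computed either with the usual approximation or exactly. The helper below is a sketch with an illustrative name, and the value of beta1 used in the example is a placeholder, not an estimate from meap93.csv.

```python
import math

def level_log_effect(beta1, pct_change):
    """Change in y when x rises by pct_change percent in y = b0 + b1*log(x) + u.

    Returns (approximate, exact): the approximation replaces the change in
    log(x) with pct_change / 100; the exact value uses log(1 + pct_change / 100).
    """
    approx = beta1 * pct_change / 100
    exact = beta1 * math.log(1 + pct_change / 100)
    return approx, exact

# beta1 = 10.0 is a placeholder value chosen only to illustrate the arithmetic.
approx, exact = level_log_effect(10.0, 10)
print(f"approximate change: {approx:.3f}, exact change: {exact:.3f}")
```

The two numbers differ slightly because log(1.10) is a little less than 0.10.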