
Multiple regression analysis: Further topics
EC 252 Introduction to Econometric Methods
Abhimanyu Gupta
March 2 and 5, 2015
Today's lecture:
- How can we compare different sets of estimates when the measurement units are different?
- How can we accurately calculate the percentage effect in a log model?
- How can we interpret the coefficients when we use quadratic functions?
- The adjusted R-squared measure, and how to select between different models
- How can we use our estimates for residual analysis?
Outline
- Effects of Data Scaling on OLS Statistics
- Functional form
  - More on Logarithmic Functional Forms
  - Percentage effects in log models
  - Models with Quadratics
  - Models with Interaction Terms
- Adjusted R-squared
- Residual Analysis

Reading: Wooldridge, Chapter 6
Effects of Data Scaling on OLS Statistics

We begin by examining the effect of rescaling the dependent or independent variables.
- We will see that the OLS statistics change in ways that preserve all measured effects and testing outcomes.

Many variables do not have a natural scale:
- pounds versus thousands of pounds
- degrees Celsius versus degrees Fahrenheit
- time in months versus years

Application: How can we compare different sets of estimates when the variables have been rescaled? Or, to put it another way: does the choice of measurement units matter?
- In Problem Set 5 (Q.3), we used the definition of the OLS estimator to show how estimates change when we rescale variables.
- Here we take a different approach, and rewrite our estimation model directly.
Example: Salaries of professional basketball players

$\widehat{\log(wage)} = \hat\beta_0 + \hat\beta_1\,minutes + \hat\beta_2\,points + \hat\beta_3\,exper + \hat\beta_4\,expersq, \quad n = 269,\ R^2 = 0.49$

where minutes is minutes played per season and points is average points per game.

Question: Suppose you replaced minutes by hours played per season (hours). How would the coefficient change?

1. The two variables are related by: minutes = 60 × hours.
2. Thus, we can rewrite the model as
   $\widehat{\log(wage)} = \hat\beta_0 + \hat\beta_1\,(60\,hours) + \hat\beta_2\,points + \hat\beta_3\,exper + \hat\beta_4\,expersq$
   $\widehat{\log(wage)} = \hat\beta_0 + (60\hat\beta_1)\,hours + \hat\beta_2\,points + \hat\beta_3\,exper + \hat\beta_4\,expersq$
3. Conclusion: We expect the coefficient on hours to be 60 times as large as the coefficient on minutes.
Example (Regression results for basketball salaries)

Dependent variable: log(wage)

                 (1)            (2)
minutes       0.000159
             (0.0000808)
hours                        0.00954
                            (0.00485)
points        0.0612         0.0612
             (0.0120)       (0.0120)
exper         0.156          0.156
             (0.0384)       (0.0384)
expersq      -0.00612       -0.00612
             (0.00276)      (0.00276)
constant      5.493          5.493
             (0.114)        (0.114)
R²            0.491          0.491
Observations  269            269

Standard errors in parentheses.
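The conclusion is easy to verify numerically: fit the same regression twice, once with the regressor measured in minutes and once in hours. A minimal numpy sketch on simulated data (the numbers below are invented, not the basketball dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 269

# Simulated data: log-wage depends linearly on minutes played
minutes = rng.uniform(500, 3000, n)
logwage = 5.5 + 0.00016 * minutes + rng.normal(0, 0.5, n)

def ols(y, x):
    """OLS with an intercept; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_minutes = ols(logwage, minutes)
b_hours = ols(logwage, minutes / 60)   # hours = minutes / 60

print(b_hours[1] / b_minutes[1])       # ~60: the slope scales by the factor 60
print(b_hours[0] - b_minutes[0])       # ~0: the intercept is unchanged
```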
Effects of Data Scaling on OLS Statistics

Similarly, suppose we rescale the dependent variable. Given the fitted values

$\hat y = \hat\beta_0 + \hat\beta_1 x,$

suppose we multiply y by a constant c, so that $\tilde y := c \times y$. We can rewrite the fitted values:

$c\,\hat y = c\,(\hat\beta_0 + \hat\beta_1 x) = (c\hat\beta_0) + (c\hat\beta_1)\,x$

Conclusion: All the coefficients, intercept and slope alike, are scaled by the factor c.
Effects of Data Scaling on OLS Statistics

What if some variables enter the regression in logarithms?
- If the dependent variable is in logarithmic form, changing its unit of measurement does not affect the slope coefficients, since
  $\log(c_1 y_i) = \log(c_1) + \log(y_i)$
  for any constant $c_1 > 0$.
- So the new intercept will be $\hat\beta_0 + \log(c_1)$.
- Similarly, changing the unit of measurement of any $x_j$, where $\log(x_j)$ appears in the regression, affects only the intercept:
  $\beta_j \log(c\,x_j) = \beta_j \log(c) + \beta_j \log(x_j)$

⟹ This corresponds to what we know about percentage changes and elasticities: they are invariant to the units of measurement of either y or the $x_j$.
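A quick numerical illustration of this invariance, again a sketch with an invented data-generating process: rescaling y before taking logs shifts only the fitted intercept, leaving the slope (the elasticity) untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = np.exp(1.0 + 0.5 * np.log(x) + rng.normal(0, 0.2, 200))

# Regress log(y) and log(100*y) on the same design matrix [1, log(x)]
X = np.column_stack([np.ones_like(x), np.log(x)])
b_orig, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
b_resc, *_ = np.linalg.lstsq(X, np.log(100 * y), rcond=None)

print(b_resc[1] - b_orig[1])               # essentially zero: slope unchanged
print(b_resc[0] - b_orig[0], np.log(100))  # intercept shifts by log(100)
```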
More on Logarithmic Functional Forms

We have already seen that we often write the dependent variable and/or some independent variables in logarithmic form:

$\widehat{\log(y)} = \hat\beta_0 + \hat\beta_1 \log(x_1) + \hat\beta_2 x_2$

Why use logarithms?
1. Coefficients have an appealing interpretation.
   ⟹ We can ignore the units of measurement of variables in log form.
2. When y > 0, models using log(y) as the dependent variable often satisfy the CLM assumptions more closely than models using y.
3. Taking logs usually narrows the range of the variable.
   ⟹ The estimates are less sensitive to outliers.
Some rules of thumb:
- When a variable is a (positive) currency amount, the log is often taken (e.g. wages, salaries, firm sales).
- Variables that are large integer counts also often appear in logarithmic form (e.g. population, total number of employees, number of pupils).
- Variables measured in years usually appear in their original form (e.g. education, experience, age, tenure).
- Variables that are proportions or percentages are usually used in level form (e.g. the unemployment rate).

Limitation: the log cannot be used if a variable takes on zero or negative values.
- We often deal with variables which can in principle take the value 0 (wealth, income).
- In cases where a variable y is nonnegative but can take on the value 0, log(1 + y) is sometimes used.
Interpretation of the coefficient in a log model

Consider the general estimated model:

$\widehat{\log(y)} = \hat\beta_0 + \hat\beta_1 \log(x_1) + \hat\beta_2 x_2$

- Take $x_1$ to be fixed, and change $x_2$ by $\Delta x_2$ ...
- ... or imagine that there are two units with the same value of $x_1$, but $x_2$ values that differ by $\Delta x_2$.
- Then the predicted difference in the dependent variable is
  $\Delta\widehat{\log(y)} = \hat\beta_2\,\Delta x_2.$

So far we have used the approximation $\%\Delta y \approx 100\,\Delta\log(y)$.
- Example: If log(y) goes up by 0.01, this represents an increase in y of approximately 1%.
- However, this approximation becomes less accurate as the change in log(y) becomes larger.
Interpretation of the coefficient in a log model

Fortunately, we can compute the exact percentage change in y as follows:

$\%\Delta\hat y = 100\,[\exp(\hat\beta_2\,\Delta x_2) - 1]$

Example (Computing the exact percentage change for y)
Suppose we obtain the following estimates in a regression:

$\widehat{\log(y)} = 0.988 + 0.503\,x$

Question: What is the percentage effect of increasing x from 2.3 to 4.6?
1. What would be the prediction using our previous approximation?
2. What is the exact percentage change?
3. How far are we off by using the approximation?
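A minimal sketch of the computation for the three questions above, using the estimates from the slide ($\hat\beta_1 = 0.503$, $\Delta x = 4.6 - 2.3 = 2.3$):

```python
import numpy as np

beta1 = 0.503
dx = 4.6 - 2.3   # change in x

approx = 100 * beta1 * dx               # approximation: 100 * d log(y)
exact = 100 * (np.exp(beta1 * dx) - 1)  # exact percentage change

print(approx)          # ~115.7%
print(exact)           # ~218.0%
print(exact - approx)  # the approximation is off by over 100 percentage points
```

With a change in log(y) this large, the approximation is badly misleading, which is exactly why the exact formula matters.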
Models with Quadratic Terms

- Quadratic functions are often used in applied economics to capture decreasing or increasing marginal effects.
- Simplest case:
  $y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$
- The estimated equation is:
  $\hat y = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2$
- Take derivatives to get the partial effect:
  $\frac{d\hat y}{dx} = \hat\beta_1 + 2\hat\beta_2 x$
- So we have the approximation:
  $\Delta\hat y \approx (\hat\beta_1 + 2\hat\beta_2 x)\,\Delta x,$
  which can readily be seen to depend on x itself.
Example (At which experience is log wage expected to be highest?)

- A common empirical case is one of positive, but declining, returns:

  $\widehat{\log(wage)} = 5.493 + 0.00016\,minutes + 0.061\,points + 0.155\,exper - 0.006\,expersq$
  (standard errors: 0.114, 0.00008, 0.012, 0.038, 0.00276)

- Holding other covariates constant:

  $\widehat{\log(wage)} = \text{constant} + 0.155\,exper - 0.006\,exper^2$

- Find the level of experience at which predicted log wage is highest by solving the first-order condition $0.155 - 2(0.006)\,exper = 0$, which gives $exper^* = 12.92$.

[Figure: predicted contribution of experience to log(wage), plotted over 0 to 20 years of experience; the profile rises, peaks at roughly 13 years, then declines.]
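The first-order condition gives the turning point $exper^* = -\hat\beta_1/(2\hat\beta_2)$; a one-line check with the coefficients from the slide:

```python
# Turning point of the estimated quadratic in experience
b1, b2 = 0.155, -0.006   # coefficients on exper and expersq from the slide

# First-order condition: b1 + 2*b2*exper = 0  =>  exper* = -b1 / (2*b2)
exper_star = -b1 / (2 * b2)
print(exper_star)        # 12.916..., i.e. about 12.92 years
```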
Models with Quadratics (ctd.)

- We can combine the use of quadratics along with logarithms...
- ...but extra care is needed to figure out the correct partial effects (Wooldridge, pp. 192–197).
- Finally, other polynomial terms can be included in regression models, such as a cubic term, a quartic term, etc.
- The interpretation would proceed along similar lines as in the quadratic case.
- Polynomials in x are an example of how we can allow for a flexible effect of x on y in a multiple linear regression model.
Models with Interaction Terms

- Sometimes it is natural for the partial effect of an explanatory variable on the dependent variable to depend on the magnitude of another explanatory variable.
- Example: consider the following model of house prices:
  $price = \beta_0 + \beta_1\,sqrft + \beta_2\,bdrms + \beta_3\,(sqrft \times bdrms) + \beta_4\,bthrms + u$
  where sqrft is size in square feet, bdrms the number of bedrooms, and bthrms the number of bathrooms.
  ⟹ There is an interaction effect between square footage and number of bedrooms.
- It is important to be able to understand and use models with interaction effects.
Models with Interaction Terms

- The partial effect of bdrms on price (holding all other variables fixed) in this model is:
  $\frac{\Delta price}{\Delta bdrms} = \beta_2 + \beta_3\,sqrft$
- If $\beta_3 > 0$, an additional bedroom yields a higher increase in housing price for larger houses.
- We say: the effect of an additional bedroom increases with the size of the house.
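In estimation, the interaction regressor is simply the product of the two columns, and the partial effect of bdrms has to be evaluated at a chosen value of sqrft. A sketch with simulated data (the variable names follow the slide, but the coefficient values and data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
sqrft = rng.uniform(800, 4000, n)
bdrms = rng.integers(1, 6, n)
price = 20 + 0.10 * sqrft + 5 * bdrms + 0.01 * sqrft * bdrms + rng.normal(0, 30, n)

# Design matrix includes the interaction column sqrft * bdrms
X = np.column_stack([np.ones(n), sqrft, bdrms, sqrft * bdrms])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2, b3 = beta

# Partial effect of one extra bedroom, evaluated at the average house size
print(b2 + b3 * sqrft.mean())
```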
Goodness-of-Fit and Selection of Regressors

Recall:
- A small $R^2$ implies that the error variance is large relative to the variance of y.
  ⟹ It might be difficult to precisely estimate the $\beta_j$.
- However, a large error variance can be offset by a large sample size.
  ⟹ If we have enough data, we may be able to precisely estimate the $\beta_j$ even though we have not controlled for many unobserved factors.
- Note: poor explanatory power has nothing to do with unbiased estimation of the $\beta_j$!
  - That is determined by the zero conditional mean assumption (MLR.4).
Goodness-of-Fit and Selection of Regressors

Recall the definition of our standard R-squared measure, and rewrite slightly:

$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{SSR/n}{SST/n}$

where $SSR/n$ is an estimate of $\sigma_u^2$ and $SST/n$ is an estimate of $\sigma_y^2$.

- What happens as we include more and more variables?
  - SSR goes down (better fit).
  - SST stays the same (it depends only on the observed data, not on our estimates).
- If we aim for a high $R^2$, we will include more and more variables.
- It may be useful to adjust for the number of regressors we have included, as a "fair way" of assessing how much of the variation our model explains.
Goodness-of-Fit and Selection of Regressors

Replace the estimators for $\sigma_u^2$ and $\sigma_y^2$ with their unbiased counterparts. Then the 'adjusted' goodness-of-fit measure is as follows:

Definition (Adjusted R-squared)

$\bar R^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}$

Intuition:
- Adding variables reduces SSR, which raises $\bar R^2$...
- ...but a larger k shrinks the divisor $n - k - 1$, raising $SSR/(n-k-1)$ and so lowering $\bar R^2$...
- ...making the overall effect ambiguous.
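Both measures are straightforward to compute from SSR and SST; a minimal sketch (the helper function and its name are mine, not from the lecture), where k is the number of slope coefficients:

```python
import numpy as np

def r2_and_adj_r2(y, yhat, k):
    """R-squared and adjusted R-squared for a model with k slope coefficients."""
    n = len(y)
    ssr = np.sum((y - yhat) ** 2)       # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    r2 = 1 - ssr / sst
    adj_r2 = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))
    return r2, adj_r2
```

For the basketball regressions above, for instance, n = 269 and k = 4.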
Goodness-of-Fit and Selection of Regressors

We now understand that $\bar R^2$ imposes a penalty for adding additional independent variables to a model.
- If we add a new independent variable to a regression, $\bar R^2$ increases if, and only if, the t statistic on the new variable is greater than one in absolute value.
- If we add a new group of independent variables to a regression, $\bar R^2$ increases if, and only if, the F statistic for the joint significance of the new variables is greater than unity.
Goodness-of-Fit and Selection of Regressors

We have already seen how we can decide whether we can exclude a particular set of variables from our model:
- a t-test for an individual variable
- an F-test for sets of variables

This is a form of model selection, but it only works for comparing nested models:
- one model (the restricted model) is a special case of the other model (the unrestricted model).

We can use $\bar R^2$ to compare non-nested models.
Goodness-of-Fit and Selection of Regressors

One use of $\bar R^2$ is as an aid to selecting the best functional form for your regression, when non-nested sets of covariates are under consideration.
- For example, consider the following two models:
  $y = \beta_0 + \beta_1 \log(x) + u$
  and
  $y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$
- How would you decide which specification to adopt? No t- or F-test helps here...
- ...and the first model contains one fewer parameter, so $R^2$ would not allow for a fair comparison.
- One option is to adopt $\bar R^2$ as a decision criterion (see the sketch below).
- This approach does not work for deciding which functional form is appropriate for the dependent variable.
  - In Lecture 10, we look at a specific test for functional form.
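A sketch of such a comparison on simulated data (the data-generating process is invented for illustration): estimate both specifications and pick the one with the larger adjusted R-squared.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 300)
y = 2 + 3 * np.log(x) + rng.normal(0, 1.0, 300)

def adj_r2(y, X):
    """Adjusted R-squared from an OLS fit of y on X (X includes the constant)."""
    n, kp1 = X.shape                    # kp1 = k + 1 (slopes plus intercept)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - (ssr / (n - kp1)) / (sst / (n - 1))

X1 = np.column_stack([np.ones_like(x), np.log(x)])  # y = b0 + b1 log(x)
X2 = np.column_stack([np.ones_like(x), x, x ** 2])  # y = b0 + b1 x + b2 x^2
print(adj_r2(y, X1), adj_r2(y, X2))  # adopt the specification with the larger value
```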
Residual Analysis

Sometimes we are interested not in the systematic relationship (summarized by the coefficient estimates), but in the deviations from the expected value for specific observations.
- After estimating the coefficients, we can compute the residual for each observational unit. Recall that
  $\hat u_i = y_i - \hat y_i$
- We can then study whether we have outliers in the data (unusually large individual residuals).
- We can look at the histogram of the residuals to get an idea of their distribution.
- We can study whether particular units have positive or negative residuals, i.e. whether they lie above or below (respectively) their predicted value.
Example
Question: Is a specific individual over- or underpaid relative to his peers?
Suppose you were the manager of a basketball team and wanted to buy a new player on the market. You could use residual analysis to
- study how individual pay relates to performance measures, and
- identify which players might currently be 'underpaid' given their performance.
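A sketch of this exercise on simulated pay and performance data (all names and numbers are invented): fit the wage regression, compute the residuals, and rank players by them. The most negative residuals are the candidates for 'underpaid' players.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 269
minutes = rng.uniform(500, 3000, n)
points = rng.uniform(2, 30, n)
logwage = 5.5 + 0.00016 * minutes + 0.06 * points + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), minutes, points])
beta, *_ = np.linalg.lstsq(X, logwage, rcond=None)

resid = logwage - X @ beta    # u_hat_i = y_i - y_hat_i
order = np.argsort(resid)     # most negative residual first

print("most underpaid (index):", order[0], "residual:", resid[order[0]])
print("most overpaid  (index):", order[-1], "residual:", resid[order[-1]])
```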
Next lecture: Binary variables (Reading: Wooldridge Chapter 7)