Worksheet 4: Forecasting Regression Residuals

Econ 3355 Forecasting Spring 2015
In this worksheet we’ll practice using MINITAB to continue our review of multiple regression.
DIRECTIONS:
1. Download and open the JCPennys file used in Quiz 2.
2. Make a time plot (with time stamp). Notice the mild increasing trend and definite seasonality.
3. Perform Trend-Only Regression (Model 1.)
(a) Store regression fitted values. Change column title to something legible such as Trend Fits.
(b) Store regression residuals. (Change column title to Trend Residuals.) Make a TS plot
of residuals. Notice that seasonality is still apparent since we’ve only removed the trend,
so far.
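For readers who want to see what the trend-only regression computes, here is a minimal Python sketch outside MINITAB. The series is synthetic (it is not the JCPennys data), and the names `sales`, `trend_fits`, and `trend_residuals` are illustrative assumptions.

```python
import random

# Hypothetical quarterly series: NOT the JCPennys data, just a synthetic
# stand-in with a mild upward trend and a fourth-quarter bump.
random.seed(1)
n = 24
t = list(range(1, n + 1))
sales = [100 + 1.5 * ti + (20 if ti % 4 == 0 else 0) + random.gauss(0, 3)
         for ti in t]

# Least-squares slope and intercept for sales = b0 + b1 * t.
t_bar = sum(t) / n
y_bar = sum(sales) / n
b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, sales))
      / sum((ti - t_bar) ** 2 for ti in t))
b0 = y_bar - b1 * t_bar

# Fitted values and residuals, the two columns stored in step 3.
trend_fits = [b0 + b1 * ti for ti in t]
trend_residuals = [yi - fi for yi, fi in zip(sales, trend_fits)]

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

Because only the trend is fitted, the fourth-quarter bump stays in `trend_residuals`, which is the pattern the TS plot of residuals should reveal.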
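4. Add dummy (indicator) variables to the spreadsheet to model the seasonality:
$$Q1 = \begin{cases} 1 & \text{if first quarter} \\ 0 & \text{otherwise} \end{cases}$$
$$Q2 = \begin{cases} 1 & \text{if second quarter} \\ 0 & \text{otherwise} \end{cases}$$
$$Q3 = \begin{cases} 1 & \text{if third quarter} \\ 0 & \text{otherwise} \end{cases}$$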
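The indicator columns can be sketched in Python as follows. This assumes 24 quarterly rows beginning in a first quarter; the `quarter` column is a hypothetical helper, not something in the worksheet file.

```python
# Hypothetical layout: 24 quarterly observations starting in a first quarter.
n = 24
quarter = [(i % 4) + 1 for i in range(n)]   # 1, 2, 3, 4, 1, 2, ...

# One indicator column per quarter, except the fourth (the baseline).
Q1 = [1 if q == 1 else 0 for q in quarter]
Q2 = [1 if q == 2 else 0 for q in quarter]
Q3 = [1 if q == 3 else 0 for q in quarter]

# Show the first five rows: quarter, Q1, Q2, Q3.
for row in list(zip(quarter, Q1, Q2, Q3))[:5]:
    print(row)
```

No Q4 column is created: a fourth-quarter row has all three indicators equal to 0, so the regression intercept absorbs the fourth-quarter level.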
5. Perform Trend-Seasonal Regression.
(a) Store fitted values. Change title to Trend-Seasonal Fits.
(b) Store residuals. Change title to Trend-Seasonal Residuals. Plot the residuals. Notice
that seasonality is now removed from data but we see apparent autocorrelation.
(c) Compare Trend-Only to Trend-Seasonal fits by plotting both fits together with the original
time series in a multiple time plot.
(d) Forecast out 6 quarters to 2nd Quarter 2003.
• Note the indicated 95% lower limit on sales:
• Note the indicated 95% upper limit on sales:
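As a rough illustration of the trend-seasonal regression and the 6-quarter point forecasts, here is a self-contained Python sketch. The data are synthetic (not the JCPennys series), the helpers `solve`, `least_squares`, and `design_row` are hypothetical names, and the 95% limits that MINITAB reports are not reproduced here.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def least_squares(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def design_row(ti):
    """Predictors for period ti: constant, trend, Q1, Q2, Q3 (Q4 is baseline)."""
    q = (ti - 1) % 4 + 1
    return [1.0, float(ti), float(q == 1), float(q == 2), float(q == 3)]

# Synthetic stand-in for the sales series (not the JCPennys data).
random.seed(2)
n = 24
seasonal = {1: -10.0, 2: -5.0, 3: 0.0, 4: 20.0}
sales = [100 + 1.5 * ti + seasonal[(ti - 1) % 4 + 1] + random.gauss(0, 3)
         for ti in range(1, n + 1)]

X = [design_row(ti) for ti in range(1, n + 1)]
coef = least_squares(X, sales)

ts_fits = [sum(b * x for b, x in zip(coef, row)) for row in X]
ts_residuals = [yi - fi for yi, fi in zip(sales, ts_fits)]

# Point forecasts for the next 6 quarters (periods 25 to 30).
forecasts = [sum(b * x for b, x in zip(coef, design_row(ti)))
             for ti in range(n + 1, n + 7)]
print([round(f, 1) for f in forecasts])
```

The forecast rows reuse `design_row`, so each future quarter gets the right trend value and indicator pattern automatically.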
6. Test for autocorrelation in Trend-Seasonal Residuals in several different ways.
(a) Use the Durbin-Watson statistic. Generate the DW stat in MINITAB. Review and discuss the DW
formula on page 267 of the text. Review the DW bell curve on page 304 of the text. Look up the DW table
in the back of the text. What’s the conclusion?
(b) The DW statistic is an “old classic” for testing first-order autocorrelation (correlation
between two consecutive residuals). A more modern and complementary approach is the
Runs Test.
• Apply the MINITAB Runs test:
Stat > Nonparametrics > Runs Test
• Read in the MINITAB Help file what the Runs Test does and what the P-value means.
Does the Runs Test indicate autocorrelation in regression residuals?
• For comparison, use MINITAB to generate an independent-residuals counterpart to the actual regression residuals, with the same mean and standard deviation:
Use
Stat > Basic Statistics > Display Descriptive Statistics
to calculate the mean and standard deviation. Then use
Stat > Random Data > Normal
to generate 24 values and store them in a column named Independent Residuals.
• Plot the new computer-generated residuals. Do they appear to be independent?
• Apply the Runs Test to the new residuals. What’s the conclusion, at 5% significance? (NOTE: Every student’s answer will vary due to different random data!)
(c) Apply Autocorrelation Function to regression residuals:
Stat > Time Series > Autocorrelation
Over how many time lags (quarters) does the plot indicate dependence?
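The two first-order checks in step 6 can be sketched in Python as below. The DW function uses the standard formula (sum of squared successive differences over sum of squared residuals). The runs test counts runs above and below the column mean and uses a normal approximation for the P-value; that this matches MINITAB's defaults exactly is an assumption. Both residual columns are simulated, not worksheet results.

```python
import math
import random

def durbin_watson(e):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den

def runs_test(e):
    """Runs above/below the mean, with a normal-approximation two-sided p-value."""
    mean = sum(e) / len(e)
    above = [x > mean for x in e]
    n1 = sum(above)              # observations above the mean
    n2 = len(e) - n1             # observations at or below the mean
    runs = 1 + sum(above[i] != above[i - 1] for i in range(1, len(e)))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    return runs, expected, p_value

random.seed(3)
n = 24
# Hypothetical positively autocorrelated residuals: e_t = 0.8 e_{t-1} + noise.
correlated = [random.gauss(0, 1)]
for _ in range(n - 1):
    correlated.append(0.8 * correlated[-1] + random.gauss(0, 1))
# Independent counterpart, as in the comparison bullet of part (b).
independent = [random.gauss(0, 1) for _ in range(n)]

for name, e in [("correlated", correlated), ("independent", independent)]:
    runs, expected, p = runs_test(e)
    print(f"{name}: DW = {durbin_watson(e):.2f}, runs = {runs}, "
          f"expected = {expected:.1f}, p = {p:.3f}")
```

A DW value well below 2 and fewer runs than expected both point toward positive first-order autocorrelation; with independent data the DW value should sit nearer 2, though any single random sample can vary.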
7. Formal Description of The Autocorrelation Function (ACF)
• The DW statistic and the Runs Test really check for first-order autocorrelation: the
correlation between consecutive residuals $e_t$ and $e_{t-1}$ (use the Greek letter rho):
$$\rho_1 = \mathrm{cor}(e_t, e_{t-1})$$
• But (since economic conditions tend to change s-l-o-w-l-y) time dependencies can also
exist over longer lengths of time. There may be correlations between variables 2 quarters
apart, 3 quarters apart, etc.
$$\rho_2 = \mathrm{cor}(e_t, e_{t-2}) \qquad \text{(correlation two quarters apart)}$$
$$\rho_3 = \mathrm{cor}(e_t, e_{t-3}) \qquad \text{(correlation three quarters apart)}$$
$$\vdots$$
$$\rho_k = \mathrm{cor}(e_t, e_{t-k}) \qquad \text{(correlation $k$ quarters apart)}$$
$$\vdots$$
• In practice we see only a snapshot or sample of a complete time series. In the sales data
we sample 24 quarters of residuals:
$$e_1, e_2, e_3, \ldots, e_{24}$$
Naturally we estimate each $\rho_k$ by the sample correlation $r_k$ computed from the sample.
• The MINITAB Autocorrelation procedure shows the sample autocorrelation function (ACF)
$r_k$ and a separate hypothesis test to check correlation at each time lag:
$$H_0: \rho_1 = 0 \quad \text{vs.} \quad H_A: \rho_1 \neq 0 \qquad \text{(test for first-order autocorrelation)}$$
$$H_0: \rho_2 = 0 \quad \text{vs.} \quad H_A: \rho_2 \neq 0 \qquad \text{(test for second-order autocorrelation)}$$
$$H_0: \rho_3 = 0 \quad \text{vs.} \quad H_A: \rho_3 \neq 0 \qquad \text{(test for third-order autocorrelation)}$$
$$\vdots$$
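The sample autocorrelations can be sketched in Python using the usual ACF formula (overall mean, full sum of squares in the denominator). The residuals below are simulated, and the cutoff of 2 divided by the square root of n is only a rough large-sample rule; MINITAB's own significance limits may be computed differently.

```python
import math
import random

def acf(e, max_lag):
    """Sample autocorrelations r_1..r_max_lag using the usual ACF formula."""
    n = len(e)
    mean = sum(e) / n
    denom = sum((x - mean) ** 2 for x in e)
    return [sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

random.seed(4)
n = 24
# Hypothetical residuals with positive first-order dependence.
e = [random.gauss(0, 1)]
for _ in range(n - 1):
    e.append(0.7 * e[-1] + random.gauss(0, 1))

r = acf(e, max_lag=6)
bound = 2 / math.sqrt(n)   # rough large-sample cutoff; an approximation
for k, rk in enumerate(r, start=1):
    flag = "*" if abs(rk) > bound else ""
    print(f"lag {k}: r = {rk:+.3f} {flag}")
```

Lags flagged with `*` are the ones where the rough test would reject $H_0: \rho_k = 0$.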
8. Make a one-period time lag for regression residuals in the spreadsheet. Label the column
Lag 1 Residuals.
Stat > Time Series > Lag
9. Apply a lag to the lagged column from the previous step to make a two-period time lag for
regression residuals in the spreadsheet. Label the column Lag 2 Residuals.
10. Make scatterplots of Residuals vs Lag 1 Residuals and Residuals vs Lag 2 Residuals. What relationships
do you see (or not see)? Also calculate correlations for the two plots and compare them to
the ACF values. (There should be slight differences because the ACF applies a slightly
different correlation formula to time series data.)
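To see why the scatterplot correlations and the ACF values differ slightly, the sketch below builds the lag columns and computes both versions. The residual values are made up for illustration, and `lag`, `pearson`, and `acf_at` are hypothetical helper names.

```python
import math

def lag(column, k):
    """Shift a column down k rows; the first k entries are missing (None)."""
    return [None] * k + column[:len(column) - k]

def pearson(x, y):
    """Ordinary correlation over the rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def acf_at(e, k):
    """ACF-style r_k: overall mean, and all n terms in the denominator."""
    mean = sum(e) / len(e)
    num = sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, len(e)))
    return num / sum((x - mean) ** 2 for x in e)

# Small made-up residual column, only to show the mechanics.
residuals = [2.0, 1.0, 1.5, -0.5, -1.0, -2.0, -0.5, 0.5]
lag1 = lag(residuals, 1)
lag2 = lag(lag1, 1)          # a lag of the lag gives the two-period lag

for k, col in [(1, lag1), (2, lag2)]:
    print(f"lag {k}: Pearson = {pearson(residuals, col):.3f}, "
          f"ACF = {acf_at(residuals, k):.3f}")
```

The ordinary correlation recomputes means and variances from only the paired rows, while the ACF formula uses the overall mean and the full sum of squares, so the two numbers are close but not identical.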
11. Now it’s time to predict regression residuals, with two purposes:
(1) Improve JCPennys forecasts (i.e. reduce future forecast MSE) by reducing the chronic
overprediction that we see in regression residuals.
(2) Restore the 95% guarantee for lower and upper limits in sales forecast for 2nd quarter
2003 by eliminating autocorrelation.
12. Formal Description of The Autoregressive Model AR(p):
We can use regression to forecast residuals $e_t$ from other residuals (hence the name auto
regression) from past quarters:
$$\hat{e}_t = \beta_0 + \beta_1 e_{t-1} + \beta_2 e_{t-2} + \cdots + \beta_p e_{t-p} + \text{error}$$
(where p = number of past quarters used in regression)
Example: p = 3
$$\hat{e}_t = \beta_0 + \beta_1 e_{t-1} + \beta_2 e_{t-2} + \beta_3 e_{t-3} + \text{error}$$
Note: The regression slope $\beta_k$ is sometimes called a partial regression coefficient since it
shows the impact of residual $e_{t-k}$ when all other residuals are held constant.
Modifications:
(a) Set the regression intercept $\beta_0 = 0$ (since residuals have mean 0)
(b) Let’s use a different Greek letter (phi) for slopes to distinguish auto regression from
ordinary regression:
$$\hat{e}_t = \phi_1 e_{t-1} + \phi_2 e_{t-2} + \phi_3 e_{t-3} + \text{error}$$
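The no-intercept autoregression can be sketched as an ordinary least-squares fit of each residual on its own lags. This is an illustrative Python version, not MINITAB's estimation routine; the series is simulated (and longer than the 24 worksheet residuals so the slopes are stable), and `solve` and `fit_ar` are hypothetical helper names.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ar(e, p):
    """Least-squares slopes phi_1..phi_p for e_t on e_{t-1}..e_{t-p}, no intercept."""
    rows = [[e[t - j] for j in range(1, p + 1)] for t in range(p, len(e))]
    y = e[p:]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

random.seed(5)
# Hypothetical residual series generated from an AR(1) process with slope 0.7.
e = [random.gauss(0, 1)]
for _ in range(199):
    e.append(0.7 * e[-1] + random.gauss(0, 1))

phi = fit_ar(e, p=3)
# One-step-ahead prediction of the next residual from the last p residuals.
next_residual = sum(ph * e[-j] for j, ph in enumerate(phi, start=1))
print("slopes:", [round(ph, 3) for ph in phi])
print("predicted next residual:", round(next_residual, 3))
```

Since the simulated series is truly AR(1), the second and third slopes should come out small, which is the kind of pattern used to decide how many lags to keep.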
13. Formal Description of Partial Autocorrelation Function (PACF)
• The sample partial autocorrelation function (PACF) shows estimated autoregression
slopes $\hat{\phi}_k$.
• The MINITAB Partial Autocorrelation procedure shows the PACF and tentatively tests
each autoregression slope:
$$H_0: \phi_1 = 0 \qquad \text{(is a single time lag significant, after accounting for all other lags?)}$$
$$H_0: \phi_2 = 0 \qquad \text{(is a second time lag significant, after accounting for all other lags?)}$$
$$H_0: \phi_3 = 0 \qquad \text{(is a third time lag significant, after accounting for all other lags?)}$$
$$\vdots$$
• Use the PACF to tentatively choose p for the AR(p) model graphically.
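One common way to compute a sample PACF is the Durbin-Levinson recursion applied to the sample ACF, sketched below in Python. Whether MINITAB uses exactly this method is an assumption, and the values can differ slightly from least-squares autoregression slopes. The residuals are simulated.

```python
import random

def acf(e, max_lag):
    """Sample autocorrelations r_1..r_max_lag using the usual ACF formula."""
    n = len(e)
    mean = sum(e) / n
    denom = sum((x - mean) ** 2 for x in e)
    return [sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

def pacf(e, max_lag):
    """Partial autocorrelations phi_kk via the Durbin-Levinson recursion."""
    r = [1.0] + acf(e, max_lag)      # r[0] = 1 so that r[k] is the lag-k value
    phi_prev = []                    # coefficients of the fitted AR(k-1) model
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the earlier coefficients, then append the new last one.
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi_new
    return out

random.seed(6)
# Hypothetical residuals from an AR(1) process with slope 0.7.
e = [random.gauss(0, 1)]
for _ in range(23):
    e.append(0.7 * e[-1] + random.gauss(0, 1))

partial = pacf(e, max_lag=5)
for k, value in enumerate(partial, start=1):
    print(f"lag {k}: partial autocorrelation = {value:+.3f}")
```

The lag-1 partial autocorrelation always equals $r_1$; a tentative choice of p is the last lag whose partial autocorrelation stands out from zero.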
14. MINITAB ARIMA Procedure
• Use this procedure to formally (more seriously) test regression coefficients in the AR(p)
model. (Uses P-values.)
• Good news! We can “drop” non-significant time periods the same way we would drop
non-significant predictor variables in ordinary regression!
15. Apply MINITAB PACF and ARIMA to regression residuals. Adjust single-valued forecast
and 95% prediction interval forecasts accordingly!
Stat > Time Series > PACF
and
Stat > Time Series > ARIMA
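The point-forecast adjustment can be sketched as adding a predicted residual to each regression forecast. The Python sketch below assumes an AR(1) model was chosen, so the h-step residual prediction is the slope raised to the power h times the last residual. All numbers are made up for illustration, `adjusted_forecasts` is a hypothetical helper, and the matching adjustment to the 95% prediction limits is not shown.

```python
def adjusted_forecasts(regression_forecasts, last_residual, phi1):
    """Add AR(1) residual predictions phi1**h * e_n to each h-step forecast."""
    return [f + (phi1 ** h) * last_residual
            for h, f in enumerate(regression_forecasts, start=1)]

# Made-up illustrative numbers (not worksheet results).
regression_forecasts = [120.0, 125.0, 130.0, 150.0, 126.0, 131.0]
last_residual = -4.0     # hypothetical final Trend-Seasonal residual e_24
phi1 = 0.6               # hypothetical estimated AR(1) slope

print(adjusted_forecasts(regression_forecasts, last_residual, phi1))
```

With a slope between 0 and 1, the correction shrinks toward zero as the horizon grows, so the adjustment matters most for the nearest quarters.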