Assignment 11

Exercise 10.3.4
March 29, 2015
1 Question
Consider the n = 11 data values in the following table.
Observation     X        Y
     1        -5.00   -10.00
     2        -4.00    -8.83
     3        -3.00    -9.15
     4        -2.00    -4.26
     5        -1.00    -0.30
     6         0.00    -0.04
     7         1.00     3.52
     8         2.00     5.64
     9         3.00     7.28
    10         4.00     7.62
    11         5.00     8.51
Suppose we use the simple linear regression model to describe the relationship between the response Y and the predictor X.
(a) Plot the data in a scatter plot.
(b) Calculate the least-squares line and plot it on the scatter plot from part (a).
(c) Plot the standardized residuals against X.
(d) Produce a normal probability plot of the standardized residuals.
(e) What are your conclusions based on the plots produced in parts (c) and (d)?
(f) If appropriate, calculate 0.95-confidence intervals for the intercept and slope.
(g) Construct the ANOVA table to test whether or not there is a relationship between the response and the predictor. What is your conclusion?
(h) If the model is correct, what proportion of the observed variation in the response is explained by changes in the predictor?
2 Solution
(a) The simple linear regression model is

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,$$

where $\varepsilon_i \sim N(0, \sigma^2)$. Furthermore, $\varepsilon_i$ and $\varepsilon_j$ are independent for $i \neq j$.
[Figure 1: Scatter plot of Y against X.]
The points increase steadily from left to right and lie close to a straight line, so a linear relationship between X and Y seems reasonable.
##########################
## R codes ##
## Data from the table
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
## Scatter plot of Y against X (Figure 1)
plot(X, Y)
##########################
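As a numeric companion to the scatter plot, the sample correlation coefficient measures the strength of the linear trend; a value near 1 indicates a strong increasing linear relationship. A minimal R sketch (the name `r.xy` is illustrative and not part of the original solution):

```r
## Sample correlation between X and Y, a numeric summary of Figure 1
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
r.xy <- cor(X, Y)   # Pearson correlation coefficient
print(r.xy)
```

For these data the correlation is about 0.979; in simple linear regression its square equals the coefficient of determination computed in part (h).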
(b) The estimates of the slope and intercept parameters are $b_1 = 2.1023636$ and $b_0 = -0.0009091$, respectively. Hence, the estimated regression line is

$$\hat{Y}_i = -0.0009091 + 2.1023636\,X_i.$$
The estimated regression line is superimposed on the scatter plot in the following figure.

[Figure 2: Estimated linear regression line superimposed on the scatter plot of Y against X.]
##########################
## R codes ##
reg.fit <- lm(Y ~ X)
summary(reg.fit)
plot(X, Y)
abline(reg.fit)
##########################
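Because the X values are symmetric about zero ($\bar{X} = 0$), the least-squares formulas simplify: $b_1 = S_{xy}/S_{xx}$ with $S_{xx} = \sum_i (X_i - \bar{X})^2 = 110$, and $b_0 = \bar{Y} - b_1\bar{X} = \bar{Y}$. The R sketch below (variable names such as `Sxx` and `Sxy` are illustrative) reproduces the `lm()` estimates from these closed-form expressions:

```r
## Least-squares estimates from the closed-form formulas
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
Sxx <- sum((X - mean(X))^2)                 # 110 for these X values
Sxy <- sum((X - mean(X)) * (Y - mean(Y)))   # centered cross-products
b1 <- Sxy / Sxx                             # slope estimate
b0 <- mean(Y) - b1 * mean(X)                # intercept estimate (equals mean(Y) here)
print(c(b0 = b0, b1 = b1))
## Compare with the coefficients returned by lm()
print(all.equal(unname(coef(lm(Y ~ X))), c(b0, b1)))
```

Here $S_{xy} = 231.26$, so $b_1 = 231.26/110 \approx 2.1023636$, and $b_0 = \bar{Y} = -0.01/11 \approx -0.0009091$.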
(c) The standardized residuals are plotted against X in the following figure.
[Figure 3: Standardized residuals against X.]
##########################
## R codes ##
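n <- length(Y)
e <- residuals(reg.fit)           # raw residuals
SSE <- sum(e^2)
MSE <- SSE/(n - 2)                # estimate of sigma^2
h <- hatvalues(reg.fit)           # leverages h_ii
stdres <- e/sqrt(MSE * (1 - h))   # standardized residuals, same as rstandard(reg.fit)
plot(X, stdres, ylim = c(-3, 3), xlab = "X",
     ylab = "Standardized Residuals")
abline(h = 0, lty = 2)            # reference line at zero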
##########################
(d) The normal probability plot of the standardized residuals is produced in the following figure.
[Figure 4: Normal probability plot of the residuals.]
##########################
## R codes ##
## Standardized residuals e_i / sqrt(MSE * (1 - h_ii))
res.std <- rstandard(reg.fit)
## Default axis limits keep all 11 points in view
## (the normal scores for n = 11 reach about +/-1.69)
qqnorm(res.std, ylab = "Standardized Residuals",
       xlab = "Normal Scores")
qqline(res.std)
##########################
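In a normal probability plot, `qqnorm()` pairs the ordered standardized residuals with standard normal quantiles evaluated at the plotting positions `ppoints(n)`; for n = 11 these normal scores run from about -1.69 to 1.69. The sketch below recomputes the scores directly so the pairing is visible (the names `scores` and `qq` are illustrative):

```r
## Normal scores behind qqnorm(): sorted standardized residuals are
## paired with qnorm(ppoints(n))
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
reg.fit <- lm(Y ~ X)
res.std <- rstandard(reg.fit)
n <- length(res.std)
scores <- qnorm(ppoints(n))               # theoretical quantiles, increasing
qq <- qqnorm(res.std, plot.it = FALSE)    # x = normal score, y = residual
print(all.equal(sort(qq$x), scores))
print(cbind(score = scores, residual = sort(res.std)))
```

If the normality assumption holds, the residual column should increase roughly linearly with the score column.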
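(e) In Figure 3 the standardized residuals show no clear pattern against X and all lie within ±3, and in Figure 4 the points lie roughly along a straight line. Both plots therefore indicate that the normal simple linear regression model is reasonable.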
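(f) Since the diagnostic plots support the model, it is appropriate to compute confidence intervals. The 0.95-confidence intervals for the regression coefficients are

$$-1.053171 \le \beta_0 \le 1.051353$$

and

$$1.769609 \le \beta_1 \le 2.435118.$$

The interval for $\beta_0$ contains 0, while the interval for $\beta_1$ lies well above 0.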
##########################
## R codes ##
confint(reg.fit, level=0.95)
##########################
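Each interval has the form $b_j \pm t_{0.975}(n-2)\,\widehat{\mathrm{SE}}(b_j)$, where $\widehat{\mathrm{SE}}(b_1) = \sqrt{\mathrm{MSE}/S_{xx}}$ and $\widehat{\mathrm{SE}}(b_0) = \sqrt{\mathrm{MSE}\,(1/n + \bar{X}^2/S_{xx})}$. A short base-R sketch that rebuilds the `confint()` output from these formulas (names such as `se.b1` and `t.crit` are illustrative):

```r
## 0.95-confidence intervals from b +/- t * SE(b)
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
reg.fit <- lm(Y ~ X)
n <- length(Y)
MSE <- sum(residuals(reg.fit)^2) / (n - 2)       # estimate of sigma^2
Sxx <- sum((X - mean(X))^2)
se.b1 <- sqrt(MSE / Sxx)                         # standard error of the slope
se.b0 <- sqrt(MSE * (1 / n + mean(X)^2 / Sxx))   # standard error of the intercept
t.crit <- qt(0.975, df = n - 2)                  # t quantile with 9 df
b <- coef(reg.fit)
ci <- rbind(b0 = b[1] + c(-1, 1) * t.crit * se.b0,
            b1 = b[2] + c(-1, 1) * t.crit * se.b1)
print(ci)
## Should agree with confint(reg.fit, level = 0.95)
print(all.equal(unname(ci), unname(confint(reg.fit, level = 0.95))))
```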
(g) The analysis of variance (ANOVA) table is given below.

Source       Df   Sum Sq   Mean Sq   F value   Pr(>F)
Regression    1   486.19    486.19    204.27   0.0000
Residuals     9    21.42      2.38
Total        10   507.61
We want to test the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_A: \beta_1 \neq 0$.
The test statistic is

$$F = \frac{\mathrm{MSR}}{\mathrm{MSE}} = 204.27,$$

which under $H_0$ follows an F distribution with 1 and 9 degrees of freedom. The corresponding P-value is less than 0.0001 (reported as 0.0000 in the table), far below 0.05. Hence, the null hypothesis is rejected at the 0.05 level of significance: there is strong evidence of a linear relationship between X and Y.
##########################
## R codes ##
anova(reg.fit)
##########################
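The entries of the ANOVA table follow from $\mathrm{SST} = \sum_i (Y_i - \bar{Y})^2$, $\mathrm{SSE} = \sum_i (Y_i - \hat{Y}_i)^2$ and $\mathrm{SSR} = \mathrm{SST} - \mathrm{SSE}$, with $F = \mathrm{MSR}/\mathrm{MSE}$ on 1 and $n-2$ degrees of freedom. The R sketch below recomputes them by hand and also shows that, in simple linear regression, F equals the square of the t statistic for the slope reported by `summary()`. Names such as `F.stat` and `t.b1` are illustrative:

```r
## ANOVA quantities for the simple linear regression, computed by hand
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
reg.fit <- lm(Y ~ X)
n <- length(Y)
SST <- sum((Y - mean(Y))^2)              # total sum of squares
SSE <- sum(residuals(reg.fit)^2)         # residual sum of squares
SSR <- SST - SSE                         # regression sum of squares
MSR <- SSR / 1                           # 1 df for the slope
MSE <- SSE / (n - 2)                     # n - 2 = 9 df
F.stat <- MSR / MSE
p.value <- pf(F.stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)
## t statistic for H0: beta1 = 0; its square equals F.stat
t.b1 <- coef(summary(reg.fit))["X", "t value"]
print(round(c(SSR = SSR, SSE = SSE, SST = SST, MSE = MSE, F = F.stat), 2))
print(p.value)
print(all.equal(t.b1^2, F.stat))
```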
(h) The coefficient of determination is computed as

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{486.19}{507.61} = 0.9578.$$

This tells us that approximately 95.78% of the observed variation in Y is explained by changes in X through the fitted regression model.
##########################
## R codes ##
summary(reg.fit)
##########################
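Equivalently, $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$, which is the `Multiple R-squared` value printed by `summary(reg.fit)`. A brief R check on the same data (the name `R2` is illustrative):

```r
## Coefficient of determination from the sums of squares
X <- seq(-5, 5, by = 1)
Y <- c(-10.00, -8.83, -9.15, -4.26, -0.30, -0.04,
       3.52, 5.64, 7.28, 7.62, 8.51)
reg.fit <- lm(Y ~ X)
SST <- sum((Y - mean(Y))^2)        # total sum of squares
SSE <- sum(residuals(reg.fit)^2)   # residual sum of squares
R2 <- 1 - SSE / SST                # equals SSR / SST
print(R2)
print(all.equal(R2, summary(reg.fit)$r.squared))
```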