COR1-GB.1305 SAMPLE FINAL EXAM

COR1-GB.1305
SAMPLE FINAL EXAM
This is the question sheet. There are 10 questions, each worth 10 points. Please write all
answers in the answer book, and justify your answers. Good Luck!
1) The following table presents data collected in the 1960s for 21 countries on
X=Annual Per Capita Cigarette Consumption (“Cigarette”), and
Y=Deaths from Coronary Heart Disease per 100,000 persons of age 35-64
(“Coronary”).
Country
United States
Canada
Australia
New Zealand
United Kingdom
Switzerland
Ireland
Iceland
Finland
West Germany
Netherlands
Greece
Austria
Belgium
Mexico
Italy
Denmark
France
Sweden
Spain
Norway
Cigarette
3900
3350
3220
3220
2790
2780
2770
2290
2160
1890
1810
1800
1770
1700
1680
1510
1500
1410
1270
1200
1090
Coronary
259.9
211.6
238.1
211.8
194.1
124.5
187.3
110.5
233.1
150.3
124.7
41.2
182.1
118.1
31.9
114.3
144.9
144.9
126.9
43.9
136.3
Scatterplot of Coronary vs Cigarette
250
Coronary
200
150
100
50
0
1000
1500
2000
2500
Cigarette
3000
3500
4000
Versus Fits
(response is Coronary)
100
Residual
50
0
-50
-100
100
120
140
160
180
Fitted Value
200
220
240
260
Regression Analysis: Coronary versus Cigarette
The regression equation is
Coronary = 29.5 + 0.0557 Cigarette
Predictor
Constant
Cigarette
Coef
29.45
0.05568
S = 46.5558
SE Coef
29.48
0.01288
R-Sq = 49.6%
T
1.00
4.32
P
0.330
0.000
R-Sq(adj) = 46.9%
Analysis of Variance
Source
Regression
Residual Error
Total
DF
1
19
20
SS
40484
41181
81666
MS
40484
2167
F
18.68
P
0.000
A) Based on the scatterplot of Coronary versus Cigarette, does there appear to be a
linear relationship between cigarette consumption and heart disease? If so, does
the relationship appear to be negative or positive? (1 Point.)
B) What patterns or problems, if any, do you see in the residuals versus fits plot?
Would you feel reasonably comfortable in fitting a simple linear regression model
to this data set? (1 Point.)
C) Write the equation for the fitted model. (2 Points.)
D) Give an interpretation of the fitted slope, βˆ . (3 Points.)
E) How much natural variability is associated with the estimated intercept αˆ ? (3
Points.)
2) For the situation described in Problem 1, answer these questions.
A) Compute the residual for Greece. (2 Points.)
B) Do you think that natural variability alone could account for such a large value of
βˆ as actually found here? Explain. (2 Points.)
C) Using the Minitab output, determine whether sufficient statistical evidence
exists to conclude that there is a positive linear relationship between Cigarette and
Coronary at the 1% level of significance. (2 Points.)
D) Based on R 2 , assess the strength of the linear relationship between Cigarette and
Coronary. (2 Points.)
E) Do the p-value for βˆ and the value of R 2 provide contradictory evidence on the
strength of the linear relationship between smoking and heart disease? Explain. (2
Points.)
Questions 3) and 4) pertain to a data set on quarterly sales (in Millions of Dollars) of
Lowe's Companies, a home improvement retailer, for the time period 1997-2002 (n=24).
We will use a response variable of Y=log Sales. The explanatory variables are Housing
Starts (in millions) and Mortgage Rate. Figure 1 gives a scatterplot for log Sales vs.
Housing Starts. This is followed by the corresponding Minitab Simple Linear Regression
output.
Fig 1: Scatterplot of Log Sales vs. Housing Starts, Lowe's Companies
Quarterly, 1997-2002
3.9
3.8
Log Sales
3.7
3.6
3.5
3.4
3.3
3.0
3.5
4.0
Housing starts
4.5
5.0
Regression Analysis: Log Sales versus Housing starts
The regression equation is
Log Sales = 2.95 + 0.166 Housing starts
Predictor
Constant
Housing starts
S = 0.129419
Coef
2.9459
0.16578
SE Coef
0.2067
0.05147
R-Sq = 32.0%
T
14.25
3.22
P
0.000
0.004
R-Sq(adj) = 29.0%
Analysis of Variance
Source
Regression
Residual Error
Total
DF
1
22
23
SS
0.17374
0.36848
0.54223
MS
0.17374
0.01675
F
10.37
P
0.004
3)
A) Based on the fitted linear regression model, if Housing Starts increase by 1
million, what happens to log Sales? (2 Points).
B) Is there evidence of a positive linear relationship between Housing Starts and log
Sales at the .003 (3 in 1000) level of significance? (3 points).
C) Construct a 95% confidence interval for the true coefficient of Housing Starts (3
Points).
D) Interpret the confidence interval you constructed in Part C) (2 Points).
The table below gives Minitab output for the multiple regression of log Sales vs.
Housing Starts and Mortgage Rate
Regression Analysis: Log Sales versus Housing starts, Mortgage
The regression equation is
Log Sales = 3.43 + 0.152 Housing starts - 0.0580 Mortgage
Predictor
Constant
Housing starts
Mortgage
S = 0.128297
Coef
3.4330
0.15198
-0.05805
SE Coef
0.4617
0.05235
0.04930
R-Sq = 36.3%
T
7.44
2.90
-1.18
P
0.000
0.009
0.252
R-Sq(adj) = 30.2%
Analysis of Variance
Source
Regression
Residual Error
Total
DF
2
21
23
SS
0.19656
0.34566
0.54223
MS
0.09828
0.01646
F
5.97
P
0.009
4)
A) Based on the multiple regression output above, is there evidence at the 5% level
of significance of a negative relationship between the Mortgage Rate and log Sales?
(3 Points.)
B) Based on the simple and multiple regression output, does Mortgage Rate seem to
be an important variable for predicting log Sales, above and beyond what can be
achieved using Housing Starts alone? (2 Points.)
C) Explain how the F-statistic of 5.97 can be obtained from other numbers given in
the Minitab output. (2 Points.)
D) Do the results of the F-test imply that, beyond a reasonable doubt, all of the true
slope coefficients in the model are nonzero? Explain. (3 Points.)
5) If the assumptions for the simple linear regression model are all satisfied and the
sample size is n=6, then what is the probability that half of the data points (that is, exactly
three of the points) will lie above the true regression line?
6) Consider the following general statement: “The larger the p-value, the higher the
probability that the alternative hypothesis is true.” If this statement is correct, explain
why. If this statement is not correct, explain why not and then provide a correct statement
starting with “The larger the p-value …”.
7) One hundred randomly selected milk cows were observed for one week and then given
a genetically engineered drug designed to increase milk production. The increase in milk
production (second week minus first week) averaged to 11 gallons with a sample
standard deviation of 50 gallons.
A) State the appropriate null and alternative hypotheses for this problem, in terms of
μ . (2 Points.)
B) What is the meaning of μ (in terms of cows)? (2 Points.)
C) What do the null and alternative hypotheses imply about the effectiveness of the
drug? (2 Points.)
D) Give all values of the significance level α at which the null hypothesis can be
rejected. (2 Points.)
E) Suppose the drug had no effect. Then out of 1000 random samples of 100 cows,
how many samples would be expected to yield an increase in milk production at
least as large as what was found in our sample? (2 Points.)
8) Consider a new rule for testing H 0 : μ = μ 0 based on a large sample size. First, we
observe the t-statistic. If the t-statistic is positive, then we perform a hypothesis test of
H 0 : μ = μ 0 versus H A : μ > μ 0 at significance level .05. If the t-statistic is negative,
then we perform a hypothesis test of H 0 : μ = μ 0 versus H A : μ < μ 0 at significance
level .05. If the null hypothesis is true and we use this method, what is the probability that
we will find the results to be statistically significant at level .05?
9) True or False: “The least-squares method allows us to compute the slope and intercept
of the true regression line based on sample data.” Explain.
10) For a random sample of 10 observations from a standard normal distribution, what is
the probability that the absolute value of the sample mean will exceed 2.262 SE, where
SE is the estimated standard error of the mean?