Statistics for Strategy Exam 3 Spring 2015
1:30 pm Friday, April 10 100 points
CALCULATION DIRECTIONS: Carry all calculations to at least four decimal places.
1. Bubble Sheet: (Fill in 2 items only)
(a) Write and bubble your Name
(Last name, First name)
(b) Write and bubble your 8-digit Student ID Number
(Begin in the leftmost box.)
2. Exam Booklet: (Fill in 3 items)
Name
Student ID Number
Section Number
(Fill in box, worth 3 points on exam)
3. Sign Tippie Honor Pledge:
“I have neither given nor received assistance on this exam.”
Signature:
• This is a closed-book exam. Use a calculator (but no cell phones) and pencils only. You have
50 minutes to complete the exam.
• There are 25 questions, one table, and a formula sheet on 25 pages.
• The exam format is multiple choice. Circle the single best answer on the exam booklet
and fill the corresponding bubble on the bubble sheet with a No. 2 pencil.
• The base score for built-in partial credit is 22 points. In addition, 3 points are earned for the
correct section number and for each question which is answered correctly on the bubble
sheet. Check ICON to see your score and a list of exam questions which were answered incorrectly.
When finished, please put bubble sheet inside the first page
of the exam booklet and deposit into box.
Questions 1–3.
One reason to invest abroad is that markets in different countries don’t move in step. When
U.S. stocks go down, foreign stocks may go up. So an investor who holds both bears less
risk. That’s the theory of stock diversification.
The headline in a recent business journal article reads:
The correlation between changes in U.S. and European share prices has risen from
0.4 in the year 2000 to 0.8 in the year 2015.
1. Explain to an investor in U.S. stocks why the headline implies reduced protection if
he/she also buys European stocks.
(a) Increased correlation implies that when U.S. stocks go up, European stocks are
more likely to go down.
(b) Increased correlation implies that when U.S. stocks go down, European stocks are
more likely to go down also.
(c) Increased correlation implies that when U.S. stocks go down, European stocks are
more likely to go up.
(d) No explanation is possible.
2. The same article goes on to say, “Crudely, this means that movements on Wall Street
(i.e., in U.S. stocks) can explain 80% of price movements in Europe.” Is this true?
(a) Yes, since 80% of changes in European stock prices is explained by changes in
U.S. stock prices.
(b) No, since 64% of changes in European stock prices is explained by changes in U.S.
stock prices.
(c) No, since 8.94% of changes in European stock prices is explained by changes in
U.S. stock prices.
(d) No, since 20% of changes in European stock prices is explained by changes in U.S.
stock prices.
3. Let the variables U.S. and Europe represent changes in U.S. and European stock prices,
respectively. Which of the following is true?
(a) Regression to predict Europe from U.S. will have a positive slope while regression
to predict U.S. from Europe will have a negative slope.
(b) Regression to predict Europe from U.S. will have a negative slope while regression
to predict U.S. from Europe will have a positive slope.
(c) Regression to predict Europe from U.S. and regression to predict U.S. from Europe
will both have positive slopes.
(d) Regression to predict Europe from U.S. and regression to predict U.S. from Europe
will both have negative slopes.
Questions 4–6.
4. What problem is indicated by a strong curved pattern in regression residuals?
(a) The correlation between variables is measuring nonlinear as well as linear association.
(b) The mathematical assumptions for regression are not satisfied.
(c) The curve represents an extrapolation from the data.
5. In a study of 2013 model cars, a researcher found that 64% of the variation in the price
of cars was explained by the least-squares regression on horsepower. Due to a
spike in gasoline prices in the summer of 2013, cars in the study with less horsepower
tended to have higher prices. The value of the correlation between horsepower and
price is
(a) −0.8000
(b) 0.3600
(c) 0.4096
(d) 0.6400
(e) 0.8000
6. Name one reason why a predictor variable may be included in a multiple regression
model which is used for prediction, even if the variable is not statistically significant.
(a) The sample size may be large enough to include the variable, regardless of significance.
(b) Inclusion of the variable may be supported by economic theory.
(c) Inclusion of the variable may increase R².
(d) The variable may be correlated with other variables in the model.
(e) None of the above
Questions 7–8.
Consider the following scatterplot for two variables x and y. The sample means are x̄ = 35.00
and ȳ = 88.83.
[Figure: Scatterplot of y vs x. The x-axis runs from 10 to 60 and the y-axis from 75 to 100; x = 35 and y = 88.83 are marked on the axes.]
7. How many data points contribute negative terms to the correlation?
(a) 0
(b) 1
(c) 2
(d) 5
(e) 6
8. Predict the value of y when x = 35.
(a) 73.45
(b) 85.64
(c) 88.83
(d) 90.07
(e) The answer cannot be determined based on the available information.
Question 9.
Recall the gender-discrimination class-action lawsuit against Wal-Mart described in the
Notes. Some variables are defined below:
y = Employee Compensation (dollars)
x1 = Work Experience (months)
x2 = Education (years)
x3 = Job Classification (clerk, department head, manager)
x4 = Gender (0 = woman, 1 = man)
Consider further the choice between two regression models to provide statistical evidence in
the lawsuit. Model 1 uses simple regression while Model 2 uses multiple regression.
• (Model 1) y = β0 + β4 x4
• (Model 2) y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4
9. Which model is recommended, and why?
(a) Model 1 since adding more variables is likely to reduce standard deviation s and
increase R².
(b) Model 2 so that possible gender discrimination can be evaluated after adjusting for employees’ different educational backgrounds, work experience, and job
classifications.
(c) Model 1 in order to clearly separate the issue of gender discrimination from other
factors which may affect employee compensation.
(d) Model 2 since x1 and x2 are quantitative variables and regression cannot be run
using categorical predictors only.
Questions 10–17.
At ACME University, a large-lecture math course has many small discussion sections led
by teaching assistants. To improve the quality of teaching, students were asked to fill out
anonymous questionnaires rating the effectiveness of their TAs on the following scale:
5 = godlike
4 = awesome
3 = average
2 = fair
1 = why-do-I-pay-tuition-for-this?
Data for some sections are shown below. In addition to the average rating of each section,
the average final exam score of all students in the corresponding section is also shown.
Section  Avg TA Rating  Avg Final Exam Score
001      3.3            80
002      2.9            74
003      4.1            88
004      3.3            86
005      2.7            78
006      3.4            79
007      2.8            79
008      2.1            60
009      3.7            86
010      3.2            75
011      2.4            72
10. Calculate the regression equation to predict y = Average Final Exam Score
from x = Average TA Rating.
(a) ŷ = 41.22 + 11.905x
(b) ŷ = 11.905 + 41.22x
(c) ŷ = −1.89 + 0.0638x
(d) ŷ = 0.0638 − 1.89x
11. Find a 96% confidence interval for the slope using the partial Minitab output
shown below. (Some values have been replaced by ****.)
Predictor  Coef  SE Coef  T     P
Constant   ****  6.991    ****  0.002
****       ****  2.233    ****  0.000
(a) (−5.90, 6.02)
(b) (6.55, 17.26)
(c) (6.64, 17.18)
(d) (−7.85, 4.07)
(e) (35.85, 46.55)
12. Is Average Final Exam Score linearly related to Average TA Rating, at 1%
significance? Provide the corresponding t-statistic.
(a) Yes, t = 5.33
(b) No, t = 1.70
(c) Yes, t = 13.98
(d) Yes but it’s not possible to determine the value of the t statistic
(e) No but it’s not possible to determine the value of the t statistic
13. Consider the MINITAB printout shown below. With 90% certainty, find the average
final exam score for a single section whose average TA rating is 3.5.
(a) (73.05, 92.73)
(b) (74.91, 90.86)
(c) (79.40, 86.37)
(d) (80.06, 85.71)
(e) The answer cannot be determined based on the available information
Prediction for Final Exam Score
Variable  Setting
Rating    3.5
Fit      SE Fit   90% CI              90% PI
82.8876  1.54143  (80.0620, 85.7132)  (74.9147, 90.8606)
14. With 90% certainty, find the mean of the average final exam scores for all sections
whose average TA rating is 3.5.
(a) (73.05, 92.73)
(b) (74.91, 90.86)
(c) (79.40, 86.37)
(d) (80.06, 85.71)
(e) The answer cannot be determined based on the available information
15. With 99% certainty, find the mean of the average final exam scores for all sections
whose average TA rating is 3.5.
(a) (73.05, 92.73)
(b) (77.88, 87.90)
(c) (78.01, 87.77)
(d) (81.08, 96.61)
(e) The answer cannot be determined based on the available information
16. With 99% certainty, find the average final exam score for a single section whose average
TA rating is 3.5.
(a) (66.25, 99.52)
(b) (68.75, 97.03)
(c) (71.69, 94.08)
(d) (77.88, 87.90)
(e) The answer cannot be determined based on the available information
17. What percent of variation in Average Final Exam Score is explained by Average TA
Rating?
(a) 73.3%
(b) 75.9%
(c) 85.6%
(d) 87.1%
(e) 92.9%
Questions 18–25. A database contains information on 21 firms engaged in commercial
architecture. The goal is to predict
y = firm’s total annual billings (in billions of dollars)
based on one or more of the available predictor variables:
x1 = Number of Architects employed by firm
x2 = Number of Engineers employed by firm
x3 = Number of support Staff employed by firm
x4 = Year that firm was established
Use 5% significance for any tests throughout this set of questions.
Firm       y (Total Billings)  x1 (Architects)  x2 (Engineers)  x3 (Staff)  x4 (Year)
BSA        29.5                39               36              240         1975
CSO        12.1                17               1               66          1961
American   18.1                9                35              168         1966
Schmidt    10.5                17               5               80          1976
Browning   12.2                22               0               70          1967
OdleMcG    5.1                 6                2               47          1916
Ratio      9.6                 16               0               62          1983
Cripe      15.3                7                17              91          1937
InterDes   5.9                 19               6               55          1975
RQAW       6.7                 7                11              72          1954
Gibraltar  7.2                 13               7               66          1996
Fanning    10.6                12               5               64          1983
Schenkel   4.4                 5                0               17          1958
Sebree     2.0                 2                0               12          1973
Woollen    2.4                 8                0               15          1955
Rowland    3.3                 6                0               27          1968
DLZ        7.5                 6                15              58          1978
Gove       6.0                 2                3               16          1985
MECA       1.7                 1                0               13          1989
Axis       1.6                 5                0               10          1996
Partenh    1.6                 3                0               9           1987
Here’s a matrix plot:
[Figure: Matrix Plot of TotalBill, N_Arch, N_Eng, N_Staff, Yr_Estab]
Several regression printouts are shown below and on the next 8 pages.
Questions follow the output. (Use 5% significance for any tests.)
Regression Analysis: TotalBill versus N_Arch
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
4.29419  61.48%  59.46%     47.76%
Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  1.97   1.48     1.34     0.197
N_Arch    0.594  0.108    5.51     0.000    1.00
Regression Equation
TotalBill = 1.97 + 0.594 N_Arch
Prediction for TotalBill
Variable  Setting
N_Arch    20
Fit      SE Fit   95% CI              95% PI
13.8529  1.38284  (10.9586, 16.7473)  (4.41056, 23.2953)
Regression Analysis: TotalBill versus N_Eng
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
3.96580  67.15%  65.42%     51.20%
Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  4.77    1.03     4.63     0.000
N_Eng     0.5119  0.0821   6.23     0.000    1.00
Regression Equation
TotalBill = 4.77 + 0.5119 N_Eng
Prediction for TotalBill
Variable  Setting
N_Eng     10
Fit      SE Fit    95% CI              95% PI
9.88552  0.904215  (7.99297, 11.7781)  (1.37199, 18.3990)
Regression Analysis: TotalBill versus N_Staff
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.96819  91.91%  91.48%     90.35%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  1.322    0.638    2.07     0.052
N_Staff   0.11568  0.00787  14.69    0.000    1.00
Regression Equation
TotalBill = 1.322 + 0.11568 N_Staff
Prediction for TotalBill
Variable  Setting
N_Staff   50
Fit      SE Fit    95% CI              95% PI
7.10656  0.436519  (6.19291, 8.02020)  (2.88698, 11.3261)
Regression Analysis: TotalBill versus Yr_Estab
Model Summary
S        R-sq   R-sq(adj)  R-sq(pred)
6.86691  1.51%  0.00%      0.00%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  93       157      0.59     0.561
Yr_Estab  -0.0429  0.0795   -0.54    0.596    1.00
Regression Equation
TotalBill = 93 - 0.0429 Yr_Estab
Prediction for TotalBill
Variable  Setting
Yr_Estab  1980
Fit      SE Fit   95% CI              95% PI
7.83992  1.68233  (4.31877, 11.3611)  (-6.95772, 22.6376)
Regression Analysis: TotalBill versus N_Arch, N_Eng
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.42478  88.37%  87.07%     85.03%
Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  1.626   0.835    1.95     0.067
N_Arch    0.3923  0.0685   5.73     0.000    1.26
N_Eng     0.3641  0.0565   6.45     0.000    1.26
Regression Equation
TotalBill = 1.626 + 0.3923 N_Arch + 0.3641 N_Eng
Prediction for TotalBill
Variable  Setting
N_Arch    20
N_Eng     10
Fit      SE Fit    95% CI              95% PI
13.1125  0.789238  (11.4544, 14.7706)  (7.75517, 18.4698)
Regression Analysis: TotalBill versus N_Arch, N_Staff
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.92254  92.69%  91.87%     90.03%
Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  0.980   0.670    1.46     0.161
N_Arch    0.1024  0.0740   1.38     0.184    2.35
N_Staff   0.1033  0.0118   8.76     0.000    2.35
Regression Equation
TotalBill = 0.980 + 0.1024 N_Arch + 0.1033 N_Staff
Prediction for TotalBill
Variable  Setting
N_Arch    20
N_Staff   50
Fit      SE Fit    95% CI              95% PI
8.19427  0.894584  (6.31482, 10.0737)  (3.73931, 12.6492)
Regression Analysis: TotalBill versus N_Arch, Yr_Estab
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
4.25515  64.17%  60.19%     44.16%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  114.8    97.2     1.18     0.253
N_Arch    0.600    0.107    5.61     0.000    1.00
Yr_Estab  -0.0573  0.0493   -1.16    0.260    1.00
Regression Equation
TotalBill = 114.8 + 0.600 N_Arch - 0.0573 Yr_Estab
Prediction for TotalBill
Variable  Setting
N_Arch    20
Yr_Estab  1980
Fit      SE Fit   95% CI              95% PI
13.3627  1.43374  (10.3506, 16.3749)  (3.92917, 22.7963)
Regression Analysis: TotalBill versus N_Eng, N_Staff
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.88132  93.00%  92.22%     90.96%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  0.776    0.692    1.12     0.277
N_Eng     -0.1507  0.0902   -1.67    0.112    5.35
N_Staff   0.1419   0.0174   8.15     0.000    5.35
Regression Equation
TotalBill = 0.776 - 0.1507 N_Eng + 0.1419 N_Staff
Prediction for TotalBill
Variable  Setting
N_Eng     10
N_Staff   50
Fit      SE Fit    95% CI              95% PI
6.36560  0.608693  (5.08679, 7.64442)  (2.21137, 10.5198)
Regression Analysis: TotalBill versus N_Eng, Yr_Estab
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
4.06183  67.35%  63.73%     48.68%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  36.0     93.1     0.39     0.704
N_Eng     0.5092   0.0845   6.03     0.000    1.01
Yr_Estab  -0.0158  0.0472   -0.33    0.742    1.01
Regression Equation
TotalBill = 36.0 + 0.5092 N_Eng - 0.0158 Yr_Estab
Prediction for TotalBill
Variable  Setting
N_Eng     10
Yr_Estab  1980
Fit      SE Fit   95% CI              95% PI
9.72475  1.04312  (7.53323, 11.9163)  (0.914244, 18.5353)
Regression Analysis: TotalBill versus N_Staff, Yr_Estab
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.01918  91.93%  91.04%     88.54%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  12.0     46.4     0.26     0.800
N_Staff   0.11548  0.00813  14.20    0.000    1.01
Yr_Estab  -0.0054  0.0235   -0.23    0.821    1.01
Regression Equation
TotalBill = 12.0 + 0.11548 N_Staff - 0.0054 Yr_Estab
Prediction for TotalBill
Variable  Setting
N_Staff   50
Yr_Estab  1980
Fit      SE Fit    95% CI              95% PI
7.05678  0.497744  (6.01106, 8.10251)  (2.68765, 11.4259)
Regression Analysis: TotalBill versus N_Arch, N_Eng, N_Staff
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.93512  93.00%  91.77%     89.62%
Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  0.780   0.713    1.09     0.289
N_Arch    0.014   0.125    0.11     0.910    6.63
N_Eng     -0.136  0.156    -0.88    0.393    15.11
N_Staff   0.1377  0.0410   3.36     0.004    28.10
Regression Equation
TotalBill = 0.780 + 0.014 N_Arch - 0.136 N_Eng + 0.1377 N_Staff
Prediction for TotalBill
Variable  Setting
N_Arch    20
N_Eng     10
N_Staff   50
Fit      SE Fit   95% CI              95% PI
6.58766  2.04378  (2.27566, 10.8997)  (0.649472, 12.5259)
Regression Analysis: TotalBill versus N_Arch, N_Eng, Yr_Estab
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
2.39551  89.28%  87.38%     84.70%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  67.9     55.2     1.23     0.235
N_Arch    0.4011   0.0680   5.90     0.000    1.28
N_Eng     0.3550   0.0563   6.31     0.000    1.29
Yr_Estab  -0.0337  0.0280   -1.20    0.246    1.02
Regression Equation
TotalBill = 67.9 + 0.4011 N_Arch + 0.3550 N_Eng - 0.0337 Yr_Estab
Prediction for TotalBill
Variable  Setting
N_Arch    20
N_Eng     10
Yr_Estab  1980
Fit      SE Fit    95% CI              95% PI
12.8431  0.811340  (11.1313, 14.5548)  (7.50697, 18.1792)
Regression Analysis: TotalBill versus N_Arch, N_Staff, Yr_Estab
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.96184  92.81%  91.54%     88.31%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  25.6     46.1     0.56     0.585
N_Arch    0.1111   0.0773   1.44     0.169    2.46
N_Staff   0.1018   0.0124   8.23     0.000    2.48
Yr_Estab  -0.0125  0.0234   -0.53    0.600    1.06
Regression Equation
TotalBill = 25.6 + 0.1111 N_Arch + 0.1018 N_Staff - 0.0125 Yr_Estab
Prediction for TotalBill
Variable  Setting
N_Arch    20
N_Staff   50
Yr_Estab  1980
Fit      SE Fit    95% CI              95% PI
8.17172  0.913846  (6.24367, 10.0998)  (3.60557, 12.7379)
Regression Analysis: TotalBill versus N_Eng, N_Staff, Yr_Estab
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.93320  93.02%  91.78%     88.29%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  10.4     44.4     0.23     0.818
N_Eng     -0.1504  0.0926   -1.62    0.123    5.35
N_Staff   0.1417   0.0179   7.90     0.000    5.37
Yr_Estab  -0.0049  0.0225   -0.22    0.831    1.01
Regression Equation
TotalBill = 10.4 - 0.1504 N_Eng + 0.1417 N_Staff - 0.0049 Yr_Estab
Prediction for TotalBill
Variable  Setting
N_Eng     10
N_Staff   50
Yr_Estab  1980
Fit      SE Fit    95% CI              95% PI
6.32204  0.657150  (4.93557, 7.70850)  (2.01413, 10.6300)
Regression Analysis: TotalBill versus N_Arch, N_Eng, N_Staff, Yr_Estab
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.99010  93.03%  91.29%     87.04%
Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  14.2     49.4     0.29     0.778
N_Arch    0.028    0.139    0.20     0.840    7.72
N_Eng     -0.122   0.169    -0.72    0.481    16.80
N_Staff   0.1332   0.0453   2.94     0.010    32.44
Yr_Estab  -0.0068  0.0250   -0.27    0.790    1.18
Regression Equation
TotalBill = 14.2 + 0.028 N_Arch - 0.122 N_Eng + 0.1332 N_Staff
- 0.0068 Yr_Estab
Prediction for TotalBill
Variable  Setting
N_Arch    20
N_Eng     10
N_Staff   50
Yr_Estab  1980
Fit      SE Fit   95% CI              95% PI
6.74650  2.18182  (2.12125, 11.3717)  (0.486200, 13.0068)
Cumulative Distribution Function
F distribution with 2 DF in numerator and 16 DF in denominator
x     P( X <= x )
1.26  0.689664
Cumulative Distribution Function
F distribution with 2 DF in numerator and 16 DF in denominator
x     P( X <= x )
3.26  0.935075
Cumulative Distribution Function
F distribution with 2 DF in numerator and 16 DF in denominator
x     P( X <= x )
5.35  0.983371
Cumulative Distribution Function
F distribution with 2 DF in numerator and 16 DF in denominator
x     P( X <= x )
7.26  0.994295
(end of output, questions begin next page)
18. What’s the best single predictor variable to use in simple regression?
(a) x1 = Number of Architects
(b) x2 = Number of Engineers
(c) x3 = Number of Staff
(d) x4 = Year
(e) All four choices are equally good
19. The MINITAB printouts show that
• the slope for x1 is 0.594 in simple regression
• the slope for x1 is 0.392 when both x1 and x2 are used together in multiple
regression.
Why do these slopes for x1 differ? (Tip: Recall Direct and Indirect Effects.)
(a) The predictor x1 acts as a lurking variable in simple regression.
(b) The predictor x1 acts as a lurking variable in multiple regression.
(c) The predictor x2 acts as a lurking variable in simple regression.
(d) The predictor x2 acts as a lurking variable in multiple regression.
20. The Topic 9 Notes recommend 3 different model choices, depending on the goal.
If the goal is to gain the best-possible understanding of how the number of architects
affects total billings, provide the recommended interpretation.
(a) Total billings increase by $28 million for each extra architect on average, when
number of engineers, number of staff and year that firm was established are all
held constant.
(b) Total billings increase by $28 million for each extra architect, on average.
(c) Total billings increase by $594 million for each extra architect, on average.
(d) Total billings increase by $594 million for each extra architect on average, when
number of engineers, number of staff and year that firm was established are all
held constant.
(e) Total billings increase by $392 million for each extra architect on average, when
number of engineers is held constant.
21. Firm A has the following profile:
Variable    Value
Architects  20
Engineers   10
Staff       50
Year        1980
Use the best conservative model to forecast Firm A’s annual billings (in billions of
dollars) with 95% certainty.
(a) (0.486, 13.007)
(b) (2.121, 11.372)
(c) (7.755, 18.470)
(d) (11.454, 14.771)
(e) None of the answers is correct to the third decimal place
22. Refer to Firm A and its profile in the previous question. Suppose that the firm’s owner
insists that you use variable x4 = Year in the regression model for your forecast.
He says “We’ve always used Year in our forecasts, and we’re not about to stop now!”
Use the modified best conservative model to forecast Firm A’s annual billings (in billions of dollars) with 95% certainty.
(a) (0.486, 13.007)
(b) (2.121, 11.372)
(c) (2.688, 11.426)
(d) (7.507, 18.179)
(e) Stop! The answer cannot be provided since the required variable x4 isn’t significant in any model.
23. The only information available for Firm B is that it was established in 1980.
Forecast Firm B’s annual billings (in billions of dollars) with 95% certainty.
(a) (−6.96, 22.64)
(b) (0, 22.64)
(c) (0.486, 13.007)
(d) (4.32, 11.36)
(e) Stop! It’s not recommended to use regression to provide any such forecast.
24. Consider two sub-databases, each of which contains records for two variables:
• (DB1) contains x1 (Architects) and x2 (Engineers)
• (DB2) contains x3 (Staff) and x4 (Year)
If (DB1) will definitely be used to predict Total Billings, is it useful to use (DB2) in
addition?
(a) Yes since P -value = 0.006
(b) Yes since P -value = 0.017
(c) Yes since P -value = 0.065
(d) Yes since P -value = 0.310
(e) No since P -value = 0.310
25. Reconsider (DB1) and (DB2) from the previous question.
If (DB2) will definitely be used to predict Total Billings, is it useful to use (DB1) in
addition?
(a) No since P -value = 0.006
(b) No since P -value = 0.017
(c) No since P -value = 0.065
(d) No since P -value = 0.310
(e) Yes since P -value = 0.006
TABLE D t distribution critical values
Table entry for p and C is the critical value t* with probability p lying to its right and
probability C lying between −t* and t*.
Upper tail probability p
df     .25    .20    .15    .10    .05    .025   .02    .01    .005   .0025  .001   .0005
1      1.000  1.376  1.963  3.078  6.314  12.71  15.89  31.82  63.66  127.3  318.3  636.6
2      0.816  1.061  1.386  1.886  2.920  4.303  4.849  6.965  9.925  14.09  22.33  31.60
3      0.765  0.978  1.250  1.638  2.353  3.182  3.482  4.541  5.841  7.453  10.21  12.92
4      0.741  0.941  1.190  1.533  2.132  2.776  2.999  3.747  4.604  5.598  7.173  8.610
5      0.727  0.920  1.156  1.476  2.015  2.571  2.757  3.365  4.032  4.773  5.893  6.869
6      0.718  0.906  1.134  1.440  1.943  2.447  2.612  3.143  3.707  4.317  5.208  5.959
7      0.711  0.896  1.119  1.415  1.895  2.365  2.517  2.998  3.499  4.029  4.785  5.408
8      0.706  0.889  1.108  1.397  1.860  2.306  2.449  2.896  3.355  3.833  4.501  5.041
9      0.703  0.883  1.100  1.383  1.833  2.262  2.398  2.821  3.250  3.690  4.297  4.781
10     0.700  0.879  1.093  1.372  1.812  2.228  2.359  2.764  3.169  3.581  4.144  4.587
11     0.697  0.876  1.088  1.363  1.796  2.201  2.328  2.718  3.106  3.497  4.025  4.437
12     0.695  0.873  1.083  1.356  1.782  2.179  2.303  2.681  3.055  3.428  3.930  4.318
13     0.694  0.870  1.079  1.350  1.771  2.160  2.282  2.650  3.012  3.372  3.852  4.221
14     0.692  0.868  1.076  1.345  1.761  2.145  2.264  2.624  2.977  3.326  3.787  4.140
15     0.691  0.866  1.074  1.341  1.753  2.131  2.249  2.602  2.947  3.286  3.733  4.073
16     0.690  0.865  1.071  1.337  1.746  2.120  2.235  2.583  2.921  3.252  3.686  4.015
17     0.689  0.863  1.069  1.333  1.740  2.110  2.224  2.567  2.898  3.222  3.646  3.965
18     0.688  0.862  1.067  1.330  1.734  2.101  2.214  2.552  2.878  3.197  3.611  3.922
19     0.688  0.861  1.066  1.328  1.729  2.093  2.205  2.539  2.861  3.174  3.579  3.883
20     0.687  0.860  1.064  1.325  1.725  2.086  2.197  2.528  2.845  3.153  3.552  3.850
21     0.686  0.859  1.063  1.323  1.721  2.080  2.189  2.518  2.831  3.135  3.527  3.819
22     0.686  0.858  1.061  1.321  1.717  2.074  2.183  2.508  2.819  3.119  3.505  3.792
23     0.685  0.858  1.060  1.319  1.714  2.069  2.177  2.500  2.807  3.104  3.485  3.768
24     0.685  0.857  1.059  1.318  1.711  2.064  2.172  2.492  2.797  3.091  3.467  3.745
25     0.684  0.856  1.058  1.316  1.708  2.060  2.167  2.485  2.787  3.078  3.450  3.725
26     0.684  0.856  1.058  1.315  1.706  2.056  2.162  2.479  2.779  3.067  3.435  3.707
27     0.684  0.855  1.057  1.314  1.703  2.052  2.158  2.473  2.771  3.057  3.421  3.690
28     0.683  0.855  1.056  1.313  1.701  2.048  2.154  2.467  2.763  3.047  3.408  3.674
29     0.683  0.854  1.055  1.311  1.699  2.045  2.150  2.462  2.756  3.038  3.396  3.659
30     0.683  0.854  1.055  1.310  1.697  2.042  2.147  2.457  2.750  3.030  3.385  3.646
40     0.681  0.851  1.050  1.303  1.684  2.021  2.123  2.423  2.704  2.971  3.307  3.551
50     0.679  0.849  1.047  1.299  1.676  2.009  2.109  2.403  2.678  2.937  3.261  3.496
60     0.679  0.848  1.045  1.296  1.671  2.000  2.099  2.390  2.660  2.915  3.232  3.460
80     0.678  0.846  1.043  1.292  1.664  1.990  2.088  2.374  2.639  2.887  3.195  3.416
100    0.677  0.845  1.042  1.290  1.660  1.984  2.081  2.364  2.626  2.871  3.174  3.390
1000   0.675  0.842  1.037  1.282  1.646  1.962  2.056  2.330  2.581  2.813  3.098  3.300
z*     0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.575  2.807  3.091  3.291
C      50%    60%    70%    80%    90%    95%    96%    98%    99%    99.5%  99.8%  99.9%
Confidence level C
Exam 3 Formulas
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$
Error df = (n − 2) for simple regression
Error df = (n − p − 1) for multiple regression (where p = # of predictors)
$b_1 = r \, \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
$b_i \pm t^* \, SE_{b_i}$
$t = \frac{b_i}{SE_{b_i}}$
$\hat{y} \pm t^* \, SE_{\hat{\mu}}$
$\hat{y} \pm t^* \, SE_{\hat{y}}$
$\bar{x} \pm t^*_{n-1} \, \frac{s}{\sqrt{n}}$
$F = \frac{MSR}{MSE}$ with p and n − p − 1 degrees of freedom
$R^2 = \frac{SSRegression}{SSTotal}$
$s^2 = MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p - 1}$
Testing reduced models in multiple regression:
$F = \frac{(R_1^2 - R_2^2)/q}{(1 - R_1^2)/(n - p - 1)}$
where
• $R_1^2$ is from full model, $R_2^2$ is from reduced model
• numerator df = q, denominator df = n − p − 1
• p = # variables in full model, q = # variables being tested as a group
Answers
1. b
2. b
3. c
4. b
5. a
6. b
7. d
8. c
From page 32 in the Notes: The point (x̄, ȳ) is always on the regression line.
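The reason the regression line passes through the point of means follows from the intercept formula on the Exam 3 formula sheet. A short derivation, using only that formula and the sample means given in Questions 7–8:

```latex
\hat{y}\,\big|_{x=\bar{x}}
  = b_0 + b_1 \bar{x}
  = (\bar{y} - b_1 \bar{x}) + b_1 \bar{x}
  = \bar{y}
  = 88.83
```

The slope cancels, so the prediction at x = 35 does not require knowing b1.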
9. b
10. a
11. b
12. a
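Answers 10 to 12 (and 17) can be checked numerically. The sketch below is not part of the original exam; it assumes the eleven sections (001 to 011) listed in the data table, takes the slope standard error 2.233 from the Minitab output, and reads t* = 2.398 from Table D (df = 9, 96% confidence). The helper name `least_squares` is only illustrative.

```python
# Section data transcribed from the exam table (sections 001 to 011).
ratings = [3.3, 2.9, 4.1, 3.3, 2.7, 3.4, 2.8, 2.1, 3.7, 3.2, 2.4]
scores = [80, 74, 88, 86, 78, 79, 79, 60, 86, 75, 72]


def least_squares(x, y):
    """Return (b0, b1, r_squared) for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared


b0, b1, r_squared = least_squares(ratings, scores)

# Values read from the exam, not computed here:
se_b1 = 2.233       # SE Coef for the slope in the Minitab output
t_star_96 = 2.398   # Table D, df = n - 2 = 9, upper tail p = .02

t_stat = b1 / se_b1
ci_low = b1 - t_star_96 * se_b1
ci_high = b1 + t_star_96 * se_b1

print(f"y-hat = {b0:.2f} + {b1:.3f} x")
print(f"t = {t_stat:.2f}")
print(f"96% CI for slope: ({ci_low:.2f}, {ci_high:.2f})")
print(f"R-sq = {100 * r_squared:.1f}%")
```

The t statistic is compared with the Table D value 3.250 (df = 9, 99% confidence) for the two-sided test at 1% significance.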
13. b
14. d
15. b
16. e and b
The intended answer is (e) since MINITAB provides only SE_µ̂, not SE_ŷ.
But it is actually possible to calculate SE_ŷ working backward from the 90%
PI, and then use SE_ŷ to calculate answer (b). Therefore exam credit is given
for both answers (e) and (b).
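The backward calculation can be sketched as follows. This is an illustration rather than the official solution; it assumes df = 9 (eleven sections), uses t* values rounded as in Table D, and the variable names are hypothetical. The same numbers also give the 99% confidence interval for Question 15.

```python
# Values read from the exam's Minitab prediction output (rating = 3.5).
fit = 82.8876
se_fit = 1.54143             # standard error of the estimated mean response
pi_90 = (74.9147, 90.8606)   # 90% prediction interval

# t* values from Table D with df = n - 2 = 9.
t_star_90 = 1.833   # upper tail p = .05
t_star_99 = 3.250   # upper tail p = .005

# Question 15: 99% confidence interval for the mean response.
ci_99 = (fit - t_star_99 * se_fit, fit + t_star_99 * se_fit)

# Question 16: recover the prediction standard error from the 90% PI
# half-width, then rescale it to 99% confidence.
half_width_90 = (pi_90[1] - pi_90[0]) / 2
se_pred = half_width_90 / t_star_90
pi_99 = (fit - t_star_99 * se_pred, fit + t_star_99 * se_pred)

print(f"99% CI: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
print(f"SE for prediction: {se_pred:.4f}")
print(f"99% PI: ({pi_99[0]:.2f}, {pi_99[1]:.2f})")
```

Because the table values of t* are rounded to three decimals, the recovered 99% PI agrees with choice (b) only to within about 0.01 at the upper endpoint.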
17. b
18. c
19. c
20. a
21. e
The best conservative (BC) model uses Staff (x3) alone in simple regression (R² = 91.91%).
The 95% PI for Firm A using this model is (2.887, 11.326).
22. c
23. e
24. b
25. d
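Answers 24 and 25 come from the reduced-model F test on the formula sheet, with R² values taken from the printouts (full model 93.03%, Architects and Engineers 88.37%, Staff and Year 91.93%). The sketch below is an illustration, not part of the exam. The exam expects the P-value to be read from the Minitab F(2, 16) output; the closed-form tail probability used here is an added assumption that holds only when the numerator df is 2. Function names are hypothetical.

```python
def partial_f(r2_full, r2_reduced, n, p, q):
    """Reduced-model F statistic from the formula sheet."""
    numerator = (r2_full - r2_reduced) / q
    denominator = (1 - r2_full) / (n - p - 1)
    return numerator / denominator


def f_upper_tail_2df(f_value, denom_df):
    """P(F > f_value) when the numerator df is exactly 2.

    For 2 numerator df the F tail probability has the closed form
    (1 + 2 * f / d2) ** (-d2 / 2); it is not valid for other numerator df.
    """
    return (1 + 2 * f_value / denom_df) ** (-denom_df / 2)


n, p, q = 21, 4, 2      # 21 firms, 4 predictors in full model, 2 tested
r2_full = 0.9303        # N_Arch, N_Eng, N_Staff, Yr_Estab
r2_db1 = 0.8837         # N_Arch, N_Eng only
r2_db2 = 0.9193         # N_Staff, Yr_Estab only
denom_df = n - p - 1    # 16

# Question 24: does (DB2) help when (DB1) is already in the model?
f_q24 = partial_f(r2_full, r2_db1, n, p, q)
p_q24 = f_upper_tail_2df(f_q24, denom_df)

# Question 25: does (DB1) help when (DB2) is already in the model?
f_q25 = partial_f(r2_full, r2_db2, n, p, q)
p_q25 = f_upper_tail_2df(f_q25, denom_df)

print(f"Q24: F = {f_q24:.2f}, P-value = {p_q24:.3f}")
print(f"Q25: F = {f_q25:.2f}, P-value = {p_q25:.3f}")
```

The two F statistics round to 5.35 and 1.26, which are the x values tabulated in the Cumulative Distribution Function output; one minus the tabulated probability gives the same P-values.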