Statistics for Strategy
Exam 3, Spring 2015
1:30 pm, Friday, April 10
100 points

CALCULATION DIRECTIONS: Carry all calculations to at least four decimal places.

1. Bubble Sheet: (Fill in 2 items only)
   (a) Write and bubble your Name (Last name, First name)
   (b) Write and bubble your 8-digit Student ID Number (Begin in the leftmost box.)

2. Exam Booklet: (Fill in 3 items)
   Name
   Student ID Number
   Section Number (Fill in box; worth 3 points on exam)

3. Sign the Tippie Honor Pledge: "I have neither given nor received assistance on this exam."
   Signature:

• This is a closed-book exam. Use a calculator (but no cell phones) and pencils only. You have 50 minutes to complete the exam.
• There are 25 questions, one table, and a formula sheet on 25 pages.
• The exam format is multiple choice. Circle the single best answer in the exam booklet and fill in the corresponding bubble on the bubble sheet with a No. 2 pencil.
• The base score for built-in partial credit is 22 points. In addition, 3 points are earned for the correct section number and for each question answered correctly on the bubble sheet. Check ICON to see your score and a list of exam questions that were answered incorrectly.
• When finished, please put the bubble sheet inside the first page of the exam booklet and deposit it into the box.

Questions 1–3. One reason to invest abroad is that markets in different countries don't move in step. When U.S. stocks go down, foreign stocks may go up, so an investor who holds both bears less risk. That's the theory of stock diversification. The headline in a recent business journal article reads: "The correlation between changes in U.S. and European share prices has risen from 0.4 in the year 2000 to 0.8 in the year 2015."

1. Explain to an investor in U.S. stocks why the headline implies reduced protection if he/she also buys European stocks.
(a) Increased correlation implies that when U.S. stocks go up, European stocks are more likely to go down.
(b) Increased correlation implies that when U.S. stocks go down, European stocks are more likely to go down also.
(c) Increased correlation implies that when U.S. stocks go down, European stocks are more likely to go up.
(d) No explanation is possible.

2. The same article goes on to say, "Crudely, this means that movements on Wall Street (i.e., in U.S. stocks) can explain 80% of price movements in Europe." Is this true?
(a) Yes, since 80% of changes in European stock prices is explained by changes in U.S. stock prices.
(b) No, since 64% of changes in European stock prices is explained by changes in U.S. stock prices.
(c) No, since 8.94% of changes in European stock prices is explained by changes in U.S. stock prices.
(d) No, since 20% of changes in European stock prices is explained by changes in U.S. stock prices.

3. Let the variables U.S. and Europe represent changes in U.S. and European stock prices, respectively. Which of the following is true?
(a) Regression to predict Europe from U.S. will have a positive slope while regression to predict U.S. from Europe will have a negative slope.
(b) Regression to predict Europe from U.S. will have a negative slope while regression to predict U.S. from Europe will have a positive slope.
(c) Regression to predict Europe from U.S. and regression to predict U.S. from Europe will both have positive slopes.
(d) Regression to predict Europe from U.S. and regression to predict U.S. from Europe will both have negative slopes.
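The arithmetic behind Questions 2 and 3 can be checked in a few lines of Python. This is a minimal sketch using made-up return series (the article's data are not given here): the share of variation explained is the square of the correlation, and both least-squares slopes carry the sign of r.

```python
# Minimal sketch with hypothetical (made-up) monthly % changes, not the article's data.
import numpy as np

us     = np.array([1.2, -0.8, 0.5, 2.1, -1.4, 0.9])
europe = np.array([0.9, -1.1, 0.7, 1.6, -0.9, 0.4])

r = np.corrcoef(us, europe)[0, 1]
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")   # a correlation of 0.8 would explain only 0.8^2 = 0.64

# Formula-sheet slopes b1 = r * sy/sx: both regressions inherit the sign of r (Question 3).
print(r * europe.std(ddof=1) / us.std(ddof=1),   # slope predicting Europe from U.S.
      r * us.std(ddof=1) / europe.std(ddof=1))   # slope predicting U.S. from Europe
```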
Questions 4–6.

4. What problem is indicated by a strong curved pattern in regression residuals?
(a) The correlation between variables is measuring nonlinear as well as linear association.
(b) The mathematical assumptions for regression are not satisfied.
(c) The curve represents an extrapolation from the data.

5. In a study of 2013 model cars, a researcher found that 64% of the variation in the price of cars was explained by the least-squares regression on horsepower. Due to a spike in gasoline prices in the summer of 2013, cars in the study with less horsepower tended to have higher prices. The value of the correlation between horsepower and price is
(a) −0.8000   (b) 0.3600   (c) 0.4096   (d) 0.6400   (e) 0.8000

6. Name one reason why a predictor variable may be included in a multiple regression model which is used for prediction, even if the variable is not statistically significant.
(a) The sample size may be large enough to include the variable, regardless of significance.
(b) Inclusion of the variable may be supported by economic theory.
(c) Inclusion of the variable may increase R².
(d) The variable may be correlated with other variables in the model.
(e) None of the above

Questions 7–8. Consider the following scatterplot for two variables x and y. The sample means are x̄ = 35.00 and ȳ = 88.83.

[Scatterplot of y vs x, with reference lines at x̄ = 35 and ȳ = 88.83; the x-axis runs from 10 to 60 and the y-axis from 75 to 100.]

7. How many data points contribute negative terms to the correlation?
(a) 0   (b) 1   (c) 2   (d) 5   (e) 6

8. Predict the value of y when x = 35.
(a) 73.45   (b) 85.64   (c) 88.83   (d) 90.07
(e) The answer cannot be determined based on the available information.

Question 9. Recall the gender-discrimination class-action lawsuit against Wal-Mart described in the Notes. Some variables are defined below:

y  = Employee Compensation (dollars)
x1 = Work Experience (months)
x2 = Education (years)
x3 = Job Classification (clerk, department head, manager)
x4 = Gender (0 = woman, 1 = man)

Consider further the choice between two regression models to provide statistical evidence in the lawsuit. Model 1 uses simple regression while Model 2 uses multiple regression.
• (Model 1) y = β0 + β4 x4
• (Model 2) y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4

9. Which model is recommended, and why?
(a) Model 1, since adding more variables is likely to reduce the standard deviation s and increase R².
(b) Model 2, so that possible gender discrimination can be evaluated after adjusting for employees' different educational backgrounds, work experience, and job classifications.
(c) Model 1, in order to clearly separate the issue of gender discrimination from other factors which may affect employee compensation.
(d) Model 2, since x1 and x2 are quantitative variables and regression cannot be run using categorical predictors only.

Questions 10–17. At ACME University, a large-lecture math course has many small discussion sections led by teaching assistants. To improve the quality of teaching, students were asked to fill out anonymous questionnaires rating the effectiveness of their TAs on the following scale:
5 = godlike
4 = awesome
3 = average
2 = fair
1 = why-do-I-pay-tuition-for-this?

Data for some sections are shown below. In addition to the average rating of each section, the average final exam score of all students in the corresponding section is also shown.

Section   Avg TA Rating   Avg Final Exam Score
001            3.3                 80
002            2.9                 74
003            4.1                 88
004            3.3                 86
005            2.7                 78
006            3.4                 79
007            2.8                 79
008            2.1                 60
009            3.7                 86
010            3.2                 75
011            2.4                 72
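As a check on the questions that follow, the fitted line and R² for these 11 sections can be reproduced from the formula-sheet relations b1 = r·sy/sx and b0 = ȳ − b1·x̄. A minimal sketch in Python:

```python
# Sketch: least-squares fit for the TA-rating data above, using the formula-sheet relations.
import numpy as np

rating = np.array([3.3, 2.9, 4.1, 3.3, 2.7, 3.4, 2.8, 2.1, 3.7, 3.2, 2.4])   # x, n = 11
score  = np.array([ 80,  74,  88,  86,  78,  79,  79,  60,  86,  75,  72])   # y

r  = np.corrcoef(rating, score)[0, 1]
b1 = r * score.std(ddof=1) / rating.std(ddof=1)   # slope, about 11.905
b0 = score.mean() - b1 * rating.mean()            # intercept, about 41.22
print(f"yhat = {b0:.2f} + {b1:.3f} x")
print(f"R-squared = {r**2:.3f}")                  # about 0.759 (see Question 17)
```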
10. Calculate the regression equation to predict y = Average Final Exam Score from x = Average TA Rating.
(a) ŷ = 41.22 + 11.905x
(b) ŷ = 11.905 + 41.22x
(c) ŷ = −1.89 + 0.0638x
(d) ŷ = 0.0638 − 1.89x

11. Find a 96% confidence interval for the slope using the partial Minitab output shown below. (Some values have been replaced by ****.)

Predictor    Coef   SE Coef      T       P
Constant     ****     6.991   ****   0.002
****         ****     2.233   ****   0.000

(a) (−5.90, 6.02)   (b) (6.55, 17.26)   (c) (6.64, 17.18)   (d) (−7.85, 4.07)   (e) (35.85, 46.55)

12. Is Average Final Exam Score linearly related to Average TA Rating, at 1% significance? Provide the corresponding t-statistic.
(a) Yes, t = 5.33
(b) No, t = 1.70
(c) Yes, t = 13.98
(d) Yes, but it's not possible to determine the value of the t-statistic
(e) No, but it's not possible to determine the value of the t-statistic

13. Consider the Minitab printout shown below. With 90% certainty, find the average final exam score for a single section whose average TA rating is 3.5.
(a) (73.05, 92.73)   (b) (74.91, 90.86)   (c) (79.40, 86.37)   (d) (80.06, 85.71)
(e) The answer cannot be determined based on the available information

Prediction for Final Exam Score
Variable  Setting
Rating        3.5
Fit = 82.8876   SE Fit = 1.54143   90% CI (80.0620, 85.7132)   90% PI (74.9147, 90.8606)

14. With 90% certainty, find the mean of the average final exam scores for all sections whose average TA rating is 3.5.
(a) (73.05, 92.73)   (b) (74.91, 90.86)   (c) (79.40, 86.37)   (d) (80.06, 85.71)
(e) The answer cannot be determined based on the available information

15. With 99% certainty, find the mean of the average final exam scores for all sections whose average TA rating is 3.5.
(a) (73.05, 92.73)   (b) (77.88, 87.90)   (c) (78.01, 87.77)   (d) (81.08, 96.61)
(e) The answer cannot be determined based on the available information

16. With 99% certainty, find the average final exam score for a single section whose average TA rating is 3.5.
(a) (66.25, 99.52)   (b) (68.75, 97.03)   (c) (71.69, 94.08)   (d) (77.88, 87.90)
(e) The answer cannot be determined based on the available information

17. What percent of variation in Average Final Exam Score is explained by Average TA Rating?
(a) 73.3%   (b) 75.9%   (c) 85.6%   (d) 87.1%   (e) 92.9%

Questions 18–25. A database contains information on 21 firms engaged in commercial architecture. The goal is to predict y = firm's total annual billings (in billions of dollars) based on one or more of the available predictor variables:
x1 = Number of Architects employed by firm
x2 = Number of Engineers employed by firm
x3 = Number of support Staff employed by firm
x4 = Year that firm was established
Use 5% significance for any tests throughout this set of questions.

Firm        y (Total Billings)   x1 (Architects)   x2 (Engineers)   x3 (Staff)   x4 (Year)
BSA               29.5                  39                36            240          1975
CSO               12.1                  17                 1             66          1961
American          18.1                   9                35            168          1966
Schmidt           10.5                  17                 5             80          1976
Browning          12.2                  22                 0             70          1967
OdleMcG            5.1                   6                 2             47          1916
Ratio              9.6                  16                 0             62          1983
Cripe             15.3                   7                17             91          1937
InterDes           5.9                  19                 6             55          1975
RQAW               6.7                   7                11             72          1954
Gibraltar          7.2                  13                 7             66          1996
Fanning           10.6                  12                 5             64          1983
Schenkel           4.4                   5                 0             17          1958
Sebree             2.0                   2                 0             12          1973
Woollen            2.4                   8                 0             15          1955
Rowland            3.3                   6                 0             27          1968
DLZ                7.5                   6                15             58          1978
Gove               6.0                   2                 3             16          1985
MECA               1.7                   1                 0             13          1989
Axis               1.6                   5                 0             10          1996
Partenh            1.6                   3                 0              9          1987

[Matrix plot of TotalBill, N_Arch, N_Eng, N_Staff, Yr_Estab: pairwise scatterplots of the five variables.]

Several regression printouts are shown below.
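Output of the same kind can be produced outside Minitab. The sketch below is only an illustration: the file name firms.csv, the DataFrame layout, and the choice of the two-predictor model are assumptions, not part of the exam.

```python
# Sketch: fitting one of the candidate models in Python, assuming the 21-firm table
# above has been saved as firms.csv with columns TotalBill, N_Arch, N_Eng, N_Staff, Yr_Estab.
import pandas as pd
import statsmodels.api as sm

firms = pd.read_csv("firms.csv")                 # hypothetical file holding the table above

X = sm.add_constant(firms[["N_Arch", "N_Eng"]])  # e.g., TotalBill versus N_Arch, N_Eng
fit = sm.OLS(firms["TotalBill"], X).fit()
print(fit.summary())                             # coefficients, t, p, R-sq, S

# 95% CI for the mean response and 95% PI for a single new firm with Firm A's profile:
firm_a = pd.DataFrame({"const": [1.0], "N_Arch": [20], "N_Eng": [10]})
print(fit.get_prediction(firm_a).summary_frame(alpha=0.05))
```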
Questions follow the output. (Use 5% significance for any tests.)

Regression Analysis: TotalBill versus N_Arch
Model Summary:  S = 4.29419   R-sq = 61.48%   R-sq(adj) = 59.46%   R-sq(pred) = 47.76%
Coefficients:
  Term       Coef    SE Coef   T-Value   P-Value    VIF
  Constant   1.97      1.48       1.34     0.197
  N_Arch     0.594     0.108      5.51     0.000    1.00
Regression Equation:  TotalBill = 1.97 + 0.594 N_Arch
Prediction for TotalBill at N_Arch = 20:
  Fit = 13.8529   SE Fit = 1.38284   95% CI (10.9586, 16.7473)   95% PI (4.41056, 23.2953)

Regression Analysis: TotalBill versus N_Eng
Model Summary:  S = 3.96580   R-sq = 67.15%   R-sq(adj) = 65.42%   R-sq(pred) = 51.20%
Coefficients:
  Term       Coef     SE Coef   T-Value   P-Value    VIF
  Constant   4.77      1.03        4.63     0.000
  N_Eng      0.5119    0.0821      6.23     0.000    1.00
Regression Equation:  TotalBill = 4.77 + 0.5119 N_Eng
Prediction for TotalBill at N_Eng = 10:
  Fit = 9.88552   SE Fit = 0.904215   95% CI (7.99297, 11.7781)   95% PI (1.37199, 18.3990)

Regression Analysis: TotalBill versus N_Staff
Model Summary:  S = 1.96819   R-sq = 91.91%   R-sq(adj) = 91.48%   R-sq(pred) = 90.35%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   1.322     0.638        2.07     0.052
  N_Staff    0.11568   0.00787     14.69     0.000    1.00
Regression Equation:  TotalBill = 1.322 + 0.11568 N_Staff
Prediction for TotalBill at N_Staff = 50:
  Fit = 7.10656   SE Fit = 0.436519   95% CI (6.19291, 8.02020)   95% PI (2.88698, 11.3261)

Regression Analysis: TotalBill versus Yr_Estab
Model Summary:  S = 6.86691   R-sq = 1.51%   R-sq(adj) = 0.00%   R-sq(pred) = 0.00%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   93        157          0.59     0.561
  Yr_Estab   -0.0429     0.0795    -0.54     0.596    1.00
Regression Equation:  TotalBill = 93 - 0.0429 Yr_Estab
Prediction for TotalBill at Yr_Estab = 1980:
  Fit = 7.83992   SE Fit = 1.68233   95% CI (4.31877, 11.3611)   95% PI (-6.95772, 22.6376)

Regression Analysis: TotalBill versus N_Arch, N_Eng
Model Summary:  S = 2.42478   R-sq = 88.37%   R-sq(adj) = 87.07%   R-sq(pred) = 85.03%
Coefficients:
  Term       Coef     SE Coef   T-Value   P-Value    VIF
  Constant   1.626     0.835       1.95     0.067
  N_Arch     0.3923    0.0685      5.73     0.000    1.26
  N_Eng      0.3641    0.0565      6.45     0.000    1.26
Regression Equation:  TotalBill = 1.626 + 0.3923 N_Arch + 0.3641 N_Eng
Prediction for TotalBill at N_Arch = 20, N_Eng = 10:
  Fit = 13.1125   SE Fit = 0.789238   95% CI (11.4544, 14.7706)   95% PI (7.75517, 18.4698)

Regression Analysis: TotalBill versus N_Arch, N_Staff
Model Summary:  S = 1.92254   R-sq = 92.69%   R-sq(adj) = 91.87%   R-sq(pred) = 90.03%
Coefficients:
  Term       Coef     SE Coef   T-Value   P-Value    VIF
  Constant   0.980     0.670       1.46     0.161
  N_Arch     0.1024    0.0740      1.38     0.184    2.35
  N_Staff    0.1033    0.0118      8.76     0.000    2.35
Regression Equation:  TotalBill = 0.980 + 0.1024 N_Arch + 0.1033 N_Staff
Prediction for TotalBill at N_Arch = 20, N_Staff = 50:
  Fit = 8.19427   SE Fit = 0.894584   95% CI (6.31482, 10.0737)   95% PI (3.73931, 12.6492)

Regression Analysis: TotalBill versus N_Arch, Yr_Estab
Model Summary:  S = 4.25515   R-sq = 64.17%   R-sq(adj) = 60.19%   R-sq(pred) = 44.16%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   114.8      97.2        1.18     0.253
  N_Arch     0.600       0.107      5.61     0.000    1.00
  Yr_Estab   -0.0573     0.0493    -1.16     0.260    1.00
Regression Equation:  TotalBill = 114.8 + 0.600 N_Arch - 0.0573 Yr_Estab
Prediction for TotalBill at N_Arch = 20, Yr_Estab = 1980:
  Fit = 13.3627   SE Fit = 1.43374   95% CI (10.3506, 16.3749)   95% PI (3.92917, 22.7963)

Regression Analysis: TotalBill versus N_Eng, N_Staff
Model Summary:  S = 1.88132   R-sq = 93.00%   R-sq(adj) = 92.22%   R-sq(pred) = 90.96%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   0.776      0.692       1.12     0.277
  N_Eng      -0.1507    0.0902     -1.67     0.112    5.35
  N_Staff    0.1419     0.0174      8.15     0.000    5.35
Regression Equation:
TotalBill = 0.776 - 0.1507 N_Eng + 0.1419 N_Staff
Prediction for TotalBill at N_Eng = 10, N_Staff = 50:
  Fit = 6.36560   SE Fit = 0.608693   95% CI (5.08679, 7.64442)   95% PI (2.21137, 10.5198)

Regression Analysis: TotalBill versus N_Eng, Yr_Estab
Model Summary:  S = 4.06183   R-sq = 67.35%   R-sq(adj) = 63.73%   R-sq(pred) = 48.68%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   36.0       93.1        0.39     0.704
  N_Eng      0.5092     0.0845      6.03     0.000    1.01
  Yr_Estab   -0.0158    0.0472     -0.33     0.742    1.01
Regression Equation:  TotalBill = 36.0 + 0.5092 N_Eng - 0.0158 Yr_Estab
Prediction for TotalBill at N_Eng = 10, Yr_Estab = 1980:
  Fit = 9.72475   SE Fit = 1.04312   95% CI (7.53323, 11.9163)   95% PI (0.914244, 18.5353)

Regression Analysis: TotalBill versus N_Staff, Yr_Estab
Model Summary:  S = 2.01918   R-sq = 91.93%   R-sq(adj) = 91.04%   R-sq(pred) = 88.54%
Coefficients:
  Term       Coef       SE Coef   T-Value   P-Value    VIF
  Constant   12.0       46.4          0.26     0.800
  N_Staff    0.11548    0.00813      14.20     0.000    1.01
  Yr_Estab   -0.0054    0.0235       -0.23     0.821    1.01
Regression Equation:  TotalBill = 12.0 + 0.11548 N_Staff - 0.0054 Yr_Estab
Prediction for TotalBill at N_Staff = 50, Yr_Estab = 1980:
  Fit = 7.05678   SE Fit = 0.497744   95% CI (6.01106, 8.10251)   95% PI (2.68765, 11.4259)

Regression Analysis: TotalBill versus N_Arch, N_Eng, N_Staff
Model Summary:  S = 1.93512   R-sq = 93.00%   R-sq(adj) = 91.77%   R-sq(pred) = 89.62%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value     VIF
  Constant   0.780      0.713       1.09     0.289
  N_Arch     0.014      0.125       0.11     0.910     6.63
  N_Eng      -0.136     0.156      -0.88     0.393    15.11
  N_Staff    0.1377     0.0410      3.36     0.004    28.10
Regression Equation:  TotalBill = 0.780 + 0.014 N_Arch - 0.136 N_Eng + 0.1377 N_Staff
Prediction for TotalBill at N_Arch = 20, N_Eng = 10, N_Staff = 50:
  Fit = 6.58766   SE Fit = 2.04378   95% CI (2.27566, 10.8997)   95% PI (0.649472, 12.5259)

Regression Analysis: TotalBill versus N_Arch, N_Eng, Yr_Estab
Model Summary:  S = 2.39551   R-sq = 89.28%   R-sq(adj) = 87.38%   R-sq(pred) = 84.70%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   67.9       55.2        1.23     0.235
  N_Arch     0.4011     0.0680      5.90     0.000    1.28
  N_Eng      0.3550     0.0563      6.31     0.000    1.29
  Yr_Estab   -0.0337    0.0280     -1.20     0.246    1.02
Regression Equation:  TotalBill = 67.9 + 0.4011 N_Arch + 0.3550 N_Eng - 0.0337 Yr_Estab
Prediction for TotalBill at N_Arch = 20, N_Eng = 10, Yr_Estab = 1980:
  Fit = 12.8431   SE Fit = 0.811340   95% CI (11.1313, 14.5548)   95% PI (7.50697, 18.1792)

Regression Analysis: TotalBill versus N_Arch, N_Staff, Yr_Estab
Model Summary:  S = 1.96184   R-sq = 92.81%   R-sq(adj) = 91.54%   R-sq(pred) = 88.31%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   25.6       46.1        0.56     0.585
  N_Arch     0.1111     0.0773      1.44     0.169    2.46
  N_Staff    0.1018     0.0124      8.23     0.000    2.48
  Yr_Estab   -0.0125    0.0234     -0.53     0.600    1.06
Regression Equation:  TotalBill = 25.6 + 0.1111 N_Arch + 0.1018 N_Staff - 0.0125 Yr_Estab
Prediction for TotalBill at N_Arch = 20, N_Staff = 50, Yr_Estab = 1980:
  Fit = 8.17172   SE Fit = 0.913846   95% CI (6.24367, 10.0998)   95% PI (3.60557, 12.7379)

Regression Analysis: TotalBill versus N_Eng, N_Staff, Yr_Estab
Model Summary:  S = 1.93320   R-sq = 93.02%   R-sq(adj) = 91.78%   R-sq(pred) = 88.29%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value    VIF
  Constant   10.4       44.4        0.23     0.818
  N_Eng      -0.1504    0.0926     -1.62     0.123    5.35
  N_Staff    0.1417     0.0179      7.90     0.000    5.37
  Yr_Estab   -0.0049    0.0225     -0.22     0.831    1.01
Regression Equation:  TotalBill = 10.4 - 0.1504 N_Eng + 0.1417 N_Staff - 0.0049 Yr_Estab
Prediction for TotalBill at N_Eng = 10, N_Staff = 50, Yr_Estab = 1980:
  Fit = 6.32204   SE Fit = 0.657150   95% CI (4.93557, 7.70850)   95% PI (2.01413, 10.6300)
Regression Analysis: TotalBill versus N_Arch, N_Eng, N_Staff, Yr_Estab
Model Summary:  S = 1.99010   R-sq = 93.03%   R-sq(adj) = 91.29%   R-sq(pred) = 87.04%
Coefficients:
  Term       Coef      SE Coef   T-Value   P-Value     VIF
  Constant   14.2       49.4        0.29     0.778
  N_Arch     0.028      0.139       0.20     0.840     7.72
  N_Eng      -0.122     0.169      -0.72     0.481    16.80
  N_Staff    0.1332     0.0453      2.94     0.010    32.44
  Yr_Estab   -0.0068    0.0250     -0.27     0.790     1.18
Regression Equation:  TotalBill = 14.2 + 0.028 N_Arch - 0.122 N_Eng + 0.1332 N_Staff - 0.0068 Yr_Estab
Prediction for TotalBill at N_Arch = 20, N_Eng = 10, N_Staff = 50, Yr_Estab = 1980:
  Fit = 6.74650   SE Fit = 2.18182   95% CI (2.12125, 11.3717)   95% PI (0.486200, 13.0068)

Cumulative Distribution Function: F distribution with 2 DF in numerator and 16 DF in denominator
  x      P(X <= x)
  1.26   0.689664
  3.26   0.935075
  5.35   0.983371
  7.26   0.994295

(end of output)

18. What's the best single predictor variable to use in simple regression?
(a) x1 = Number of Architects
(b) x2 = Number of Engineers
(c) x3 = Number of Staff
(d) x4 = Year
(e) All four choices are equally good

19. The Minitab printouts show that
• the slope for x1 is 0.594 in simple regression,
• the slope for x1 is 0.392 when both x1 and x2 are used together in multiple regression.
Why do these slopes for x1 differ? (Tip: Recall Direct and Indirect Effects.)
(a) The predictor x1 acts as a lurking variable in simple regression.
(b) The predictor x1 acts as a lurking variable in multiple regression.
(c) The predictor x2 acts as a lurking variable in simple regression.
(d) The predictor x2 acts as a lurking variable in multiple regression.

20. The Topic 9 Notes recommend 3 different model choices, depending on the goal. If the goal is to gain the best-possible understanding of how the number of architects affects total billings, provide the recommended interpretation.
(a) Total billings increase by $28 million for each extra architect on average, when number of engineers, number of staff, and year that firm was established are all held constant.
(b) Total billings increase by $28 million for each extra architect, on average.
(c) Total billings increase by $594 million for each extra architect, on average.
(d) Total billings increase by $594 million for each extra architect on average, when number of engineers, number of staff, and year that firm was established are all held constant.
(e) Total billings increase by $392 million for each extra architect on average, when number of engineers is held constant.

21. Firm A has the following profile:
  Variable     Value
  Architects      20
  Engineers       10
  Staff           50
  Year          1980
Use the best conservative model to forecast Firm A's annual billings (in billions of dollars) with 95% certainty.
(a) (0.486, 13.007)
(b) (2.121, 11.372)
(c) (7.755, 18.470)
(d) (11.454, 14.771)
(e) None of the answers is correct to the third decimal place
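The prediction intervals quoted in the printouts can be recovered from Fit, SE Fit, and S. Below is a hedged sketch for the N_Staff-only model used in Question 21, assuming Minitab's SE Fit is the standard error of the estimated mean response, so the standard error for a single new firm is sqrt(S² + SE Fit²):

```python
# Sketch: 95% PI for a single firm with N_Staff = 50, from the N_Staff-only printout
# (n = 21, simple regression, so error df = n - 2 = 19).
import math
from scipy import stats

fit, se_fit, s, df = 7.10656, 0.436519, 1.96819, 19
t_star = stats.t.ppf(0.975, df)                     # about 2.093 (Table D, df = 19)
half_width = t_star * math.sqrt(s**2 + se_fit**2)   # SE for predicting one new observation
print(fit - half_width, fit + half_width)           # about (2.887, 11.326), matching the printout
```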
22. Refer to Firm A and its profile in the previous question. Suppose that the firm's owner insists that you use variable x4 = Year in the regression model for your forecast. He says, "We've always used Year in our forecasts, and we're not about to stop now!" Use the modified best conservative model to forecast Firm A's annual billings (in billions of dollars) with 95% certainty.
(a) (0.486, 13.007)
(b) (2.121, 11.372)
(c) (2.688, 11.426)
(d) (7.507, 18.179)
(e) Stop! The answer cannot be provided since the required variable x4 isn't significant in any model.

23. The only information available for Firm B is that it was established in 1980. Forecast Firm B's annual billings (in billions of dollars) with 95% certainty.
(a) (−6.96, 22.64)
(b) (0, 22.64)
(c) (0.486, 13.007)
(d) (4.32, 11.36)
(e) Stop! It's not recommended to use regression to provide any such forecast.

24. Consider two sub-databases, each of which contains records for two variables:
• (DB1) contains x1 (Architects) and x2 (Engineers)
• (DB2) contains x3 (Staff) and x4 (Year)
If (DB1) will definitely be used to predict Total Billings, is it useful to use (DB2) in addition?
(a) Yes, since P-value = 0.006
(b) Yes, since P-value = 0.017
(c) Yes, since P-value = 0.065
(d) Yes, since P-value = 0.310
(e) No, since P-value = 0.310

25. Reconsider (DB1) and (DB2) from the previous question. If (DB2) will definitely be used to predict Total Billings, is it useful to use (DB1) in addition?
(a) No, since P-value = 0.006
(b) No, since P-value = 0.017
(c) No, since P-value = 0.065
(d) No, since P-value = 0.310
(e) Yes, since P-value = 0.006

TABLE D: t distribution critical values
Table entry for p and C is the critical value t* with probability p lying to its right and probability C lying between −t* and t*.

         Upper tail probability p
df      .25    .20    .15    .10    .05   .025    .02    .01   .005  .0025   .001  .0005
1     1.000  1.376  1.963  3.078  6.314  12.71  15.89  31.82  63.66  127.3  318.3  636.6
2     0.816  1.061  1.386  1.886  2.920  4.303  4.849  6.965  9.925  14.09  22.33  31.60
3     0.765  0.978  1.250  1.638  2.353  3.182  3.482  4.541  5.841  7.453  10.21  12.92
4     0.741  0.941  1.190  1.533  2.132  2.776  2.999  3.747  4.604  5.598  7.173  8.610
5     0.727  0.920  1.156  1.476  2.015  2.571  2.757  3.365  4.032  4.773  5.893  6.869
6     0.718  0.906  1.134  1.440  1.943  2.447  2.612  3.143  3.707  4.317  5.208  5.959
7     0.711  0.896  1.119  1.415  1.895  2.365  2.517  2.998  3.499  4.029  4.785  5.408
8     0.706  0.889  1.108  1.397  1.860  2.306  2.449  2.896  3.355  3.833  4.501  5.041
9     0.703  0.883  1.100  1.383  1.833  2.262  2.398  2.821  3.250  3.690  4.297  4.781
10    0.700  0.879  1.093  1.372  1.812  2.228  2.359  2.764  3.169  3.581  4.144  4.587
11    0.697  0.876  1.088  1.363  1.796  2.201  2.328  2.718  3.106  3.497  4.025  4.437
12    0.695  0.873  1.083  1.356  1.782  2.179  2.303  2.681  3.055  3.428  3.930  4.318
13    0.694  0.870  1.079  1.350  1.771  2.160  2.282  2.650  3.012  3.372  3.852  4.221
14    0.692  0.868  1.076  1.345  1.761  2.145  2.264  2.624  2.977  3.326  3.787  4.140
15    0.691  0.866  1.074  1.341  1.753  2.131  2.249  2.602  2.947  3.286  3.733  4.073
16    0.690  0.865  1.071  1.337  1.746  2.120  2.235  2.583  2.921  3.252  3.686  4.015
17    0.689  0.863  1.069  1.333  1.740  2.110  2.224  2.567  2.898  3.222  3.646  3.965
18    0.688  0.862  1.067  1.330  1.734  2.101  2.214  2.552  2.878  3.197  3.611  3.922
19    0.688  0.861  1.066  1.328  1.729  2.093  2.205  2.539  2.861  3.174  3.579  3.883
20    0.687  0.860  1.064  1.325  1.725  2.086  2.197  2.528  2.845  3.153  3.552  3.850
21    0.686  0.859  1.063  1.323  1.721  2.080  2.189  2.518  2.831  3.135  3.527  3.819
22    0.686  0.858  1.061  1.321  1.717  2.074  2.183  2.508  2.819  3.119  3.505  3.792
23    0.685  0.858  1.060  1.319  1.714  2.069  2.177  2.500  2.807  3.104  3.485  3.768
24    0.685  0.857  1.059  1.318  1.711  2.064  2.172  2.492  2.797  3.091  3.467  3.745
25    0.684  0.856  1.058  1.316  1.708  2.060  2.167  2.485  2.787  3.078  3.450  3.725
26    0.684  0.856  1.058  1.315  1.706  2.056  2.162  2.479  2.779  3.067  3.435  3.707
27    0.684  0.855  1.057  1.314  1.703  2.052  2.158  2.473  2.771  3.057  3.421  3.690
28    0.683  0.855  1.056  1.313  1.701  2.048  2.154  2.467  2.763  3.047  3.408  3.674
29    0.683  0.854  1.055  1.311  1.699  2.045  2.150  2.462  2.756  3.038  3.396  3.659
30    0.683  0.854  1.055  1.310  1.697  2.042  2.147  2.457  2.750  3.030  3.385  3.646
40    0.681  0.851  1.050  1.303  1.684  2.021  2.123  2.423  2.704  2.971  3.307  3.551
50    0.679  0.849  1.047  1.299  1.676  2.009  2.109  2.403  2.678  2.937  3.261  3.496
60    0.679  0.848  1.045  1.296  1.671  2.000  2.099  2.390  2.660  2.915  3.232  3.460
80    0.678  0.846  1.043  1.292  1.664  1.990  2.088  2.374  2.639  2.887  3.195  3.416
100   0.677  0.845  1.042  1.290  1.660  1.984  2.081  2.364  2.626  2.871  3.174  3.390
1000  0.675  0.842  1.037  1.282  1.646  1.962  2.056  2.330  2.581  2.813  3.098  3.300
z*    0.674  0.841  1.036  1.282  1.645  1.960  2.054  2.326  2.575  2.807  3.091  3.291
C     50%    60%    70%    80%    90%    95%    96%    98%    99%   99.5%  99.8%  99.9%
(C = confidence level)

Exam 3 Formulas

$r = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$

Error df = n − 2 for simple regression; Error df = n − p − 1 for multiple regression (where p = # of predictors)

$b_1 = r\,\dfrac{s_y}{s_x}$          $b_0 = \bar{y} - b_1\bar{x}$

$b_i \pm t^{*}\,SE_{b_i}$          $t = \dfrac{b_i}{SE_{b_i}}$

$\hat{y} \pm t^{*}\,SE_{\hat{\mu}}$          $\hat{y} \pm t^{*}\,SE_{\hat{y}}$

$\bar{x} \pm t^{*}_{n-1}\,\dfrac{s}{\sqrt{n}}$

$F = \dfrac{MSR}{MSE}$ with p and n − p − 1 degrees of freedom

$R^{2} = \dfrac{SS_{\text{Regression}}}{SS_{\text{Total}}}$

$s^{2} = MSE = \dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}{n - p - 1}$

Testing reduced models in multiple regression:
$F = \dfrac{(R_1^2 - R_2^2)/q}{(1 - R_1^2)/(n - p - 1)}$
where
• $R_1^2$ is from the full model, $R_2^2$ is from the reduced model
• numerator df = q, denominator df = n − p − 1
• p = # variables in full model, q = # variables being tested as a group

Answers
1. b
2. b
3. c
4. b
5. a
6. b
7. d
8. c   (From page 32 in the Notes: the point (x̄, ȳ) is always on the regression line.)
9. b
10. a
11. b
12. a
13. b
14. d
15. b
16. e and b   (The intended answer is (e), since Minitab provides only SE_μ̂, not SE_ŷ. But it is actually possible to calculate SE_ŷ by working backward from the 90% PI, and then use SE_ŷ to calculate answer (b). Therefore exam credit is given for both answers (e) and (b).)
17. b
18. c
19. c
20. a
21. e   (The best conservative model uses Staff alone in simple regression (R² = 91.91%). The 95% PI for Firm A using this model is (2.887, 11.326).)
22. c
23. e
24. b
25. d
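The P-values cited in answers 24 and 25 follow from the reduced-model F test on the formula sheet. A minimal sketch of that computation, using the R² values read off the printouts (scipy is assumed to be available):

```python
# Sketch: partial-F test for adding a group of q = 2 predictors, with n = 21 firms and
# p = 4 predictors in the full model (denominator df = 16), as in answers 24-25.
from scipy import stats

def partial_f(r2_full, r2_reduced, n, p, q):
    f = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - p - 1))
    return f, stats.f.sf(f, q, n - p - 1)        # sf gives the upper-tail P-value

print(partial_f(0.9303, 0.8837, 21, 4, 2))   # add DB2 to DB1: F about 5.35, P about 0.017
print(partial_f(0.9303, 0.9193, 21, 4, 2))   # add DB1 to DB2: F about 1.26, P about 0.310
```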