Business Statistics: Communicating with Numbers
By Sanjiv Jaggia and Alison Kelly
McGraw-Hill/Irwin. Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 13 Learning Objectives (LOs)
LO 13.1: Provide a conceptual overview of ANOVA.
LO 13.2: Conduct and evaluate hypothesis tests based on one-way ANOVA.
LO 13.3: Use Tukey's HSD method in order to determine which means differ.
LO 13.4: Conduct and evaluate hypothesis tests based on two-way ANOVA with no interaction.
LO 13.5: Conduct and evaluate hypothesis tests based on two-way ANOVA with interaction.

Chapter Case - How Much Does Using Public Transportation Save?
Environmental and economic concerns have led to an upswing in the use of public transportation. Research analyst Sean Cox looked at study results from a Boston Globe article that claimed commuters there topped the nation in cost savings from public transportation. He wants to know whether the average savings differ significantly among these cities.

13.1 One-Way ANOVA
LO 13.1 Provide a conceptual overview of ANOVA.
Analysis of Variance (ANOVA) is used to determine whether there are differences among three or more population means (µ).
One-way ANOVA compares population means based on one categorical variable (City in the chapter case), called the treatment.
We use a completely randomized design, comparing the sample means computed for each treatment to test whether the population means differ.

ANOVA Assumptions
The assumptions are extensions of those we used when comparing just two populations:
1. The populations are normally distributed.
2. The population standard deviations are unknown but assumed equal.
3. Samples are selected independently from each population.
Here we compare a total of c (more than 2) populations, rather than just two.

4 Steps of a Hypothesis Test (HT)
1. State the hypotheses (H0 and HA).
2. Find the test statistic (F in this case).
3. Find the critical value or the p-value (we get them from the Excel output).
4. State the conclusion and interpret it.

The Hypothesis Test
LO 13.2 Conduct and evaluate hypothesis tests based on one-way ANOVA.
Step 1: The competing hypotheses for the one-way ANOVA:
H0: µ1 = µ2 = µ3 = µ4 (the treatments do not have different effects on the population means)
HA: Not all population means are equal (the treatments have different effects on the population means)
This is the only pair of hypotheses we use.

The ANOVA Concept
The competing hypotheses can be displayed graphically. The left graph depicts the null hypothesis, where all sample means are drawn from the same distribution. On the right, the distributions, and the population means, differ.
If the sample means are close enough to one another, we conclude that the population means are the same.

The ANOVA Concept
The rationale is:
1. If the sample means are close enough to one another, we infer that the population means are the same.
2. How close should the sample means be?
3. They should be close enough that the variance of the sample means is small enough.
4. How small should that variance be? This is determined by the F critical value in the F distribution.
5. So we convert the variance of the sample means into an F test statistic.
6. An F value is the ratio of one variance to another variance.
7. So we find another variance (the variance within the groups) and divide the variance of the sample means by it. In the Excel output these two quantities are the Between Groups and Within Groups mean squares (MS).
8. This ratio is the F test statistic.
9. The F test statistic should be small enough (smaller than the critical value) for us to conclude that the population means are the same.
   If F test statistic < F critical value, then we do not reject H0.
   If F test statistic > F critical value, then we reject H0.
10. A small enough F test statistic also means that the p-value, P(F > F test statistic), is greater than the level of significance (α).
   If p-value > α, then we do not reject H0.
   If p-value < α, then we reject H0.
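The rationale above can be sketched in a few lines of Python. This is only an illustration, not part of the course's Excel workflow: the function name `one_way_anova` is made up for this sketch, the per-city counts, means, and variances are copied from the SUMMARY block of the chapter case's Excel output, and the critical value 3.098391 is taken from that same output rather than computed.

```python
# (count, sample mean, sample variance) per city, copied from the
# SUMMARY block of the chapter case's Excel output.
groups = {
    "Boston": (5, 12622.0, 7707.5),
    "NY": (8, 12585.0, 6464.286),
    "SF": (6, 11720.0, 7050.0),
    "Chicago": (5, 10730.0, 8212.5),
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) from per-group summary statistics."""
    c = len(groups)                                   # number of treatments
    n_total = sum(n for n, _, _ in groups.values())   # nT
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

    # Between-groups sum of squares: spread of the sample means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-groups sum of squares: pooled spread inside each sample.
    ss_within = sum((n - 1) * v for n, _, v in groups.values())

    ms_between = ss_between / (c - 1)
    ms_within = ss_within / (n_total - c)             # this is the MSE
    return ms_between / ms_within, c - 1, n_total - c


f_stat, df_between, df_within = one_way_anova(groups)

F_CRIT = 3.098391  # F critical value at alpha = 0.05, read from the Excel output
reject_h0 = f_stat > F_CRIT

print(f"F = {f_stat:.4f} with df = ({df_between}, {df_within})")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The ratio computed here is the Between Groups MS divided by the Within Groups MS, which is the F value Excel reports in its ANOVA table.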
Chapter Case
Steps 2 and 3: Generate the Excel table.

ANOVA
Source of Variation   SS         df   MS        F          P-value    F crit
Between Groups        13204720   3    4401573   610.5664   7.96E-20   3.098391
Within Groups         144180     20   7209
Total                 13348900   23

This ANOVA table shows the F test statistic and the critical value, but we will use the p-value, which is 7.96E-20, that is, 7.96 × 10^-20. It is practically 0.
Please see Public Transportation in Excel Demonstration in D2L to see how to generate this table. The data is posted in Excel Data so you can practice it.

Chapter Case
Step 4: F test statistic (610.5664) > F critical value (3.0984), so we reject H0.
Equivalently, p-value (practically 0) < α (0.05), so we reject H0.
This means that there is a significant difference among the average cost savings in the 4 cities.

13.2 Multiple Comparison Methods
LO 13.3 Use confidence intervals and Tukey's HSD method in order to determine which means differ.
When the one-way ANOVA finds significant differences between the population means (as in the chapter case), it is natural to ask which means differ.
In this section we show two techniques for performing this follow-up analysis:
Fisher's Least Significant Difference (LSD) Method (not very efficient)
Tukey's Honestly Significant Differences (HSD) Method

The Tukey HSD Procedure
Tukey's method generates confidence intervals (CI) of the form:
\[ (\bar{x}_i - \bar{x}_j) \pm q_{\alpha,(c,\,n_T - c)} \sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} \]
where MSE is the Within Groups mean square from the ANOVA table and n_i and n_j are the two sample sizes.
If the CI does not include 0, there is a difference between the 2 population means (positive or negative).
The q-statistic comes from the studentized range distribution with c and (nT - c) degrees of freedom.

The Tukey HSD Procedure
Is there a difference between Boston and Chicago?
Boston sample mean = 12622
Chicago sample mean = 10730
With α = 0.05, c = 4, and nT - c = 24 - 4 = 20, q = 3.96
MSE = 7209
nBoston = 5, nChicago = 5

The Tukey HSD Procedure
The confidence interval (CI) for the difference between Boston and Chicago
= (12622 - 10730) ± 3.96 × sqrt((7209/2) × (1/5 + 1/5))
= 1892 ± 150.3653
= (1741.635, 2042.365)
The CI does not include 0, so there is a difference between Boston and Chicago (Boston > Chicago).
You can repeat this procedure for all pairs of cities.

Excel Output of the Chapter Case
Anova: Single Factor

SUMMARY
Groups    Count   Sum      Average   Variance
Boston    5       63110    12622     7707.5
NY        8       100680   12585     6464.286
SF        6       70320    11720     7050
Chicago   5       53650    10730     8212.5

ANOVA
Source of Variation   SS         df   MS        F          P-value    F crit
Between Groups        13204720   3    4401573   610.5664   7.96E-20   3.098391
Within Groups         144180     20   7209
Total                 13348900   23

In this table, the Between Groups df is c - 1 = 3, the Within Groups df is nT - c = 20, and the Within Groups MS (7209) is the MSE.

Finding q

Homework
1. Problem 8 on page 394. To answer part a, download the "Detergent" data from s:\jslee\QM3620\Data and run Excel to generate the ANOVA table.
2. Problem 14 on page 400.
Answers are on pp. 677-678 in the appendix.

13.3 Two-Way ANOVA with No Interaction - Example 13.4
She interviews four workers in each of three fields and determines their salaries.
This is a one-way ANOVA.
The data are not reliable.

Two-Way ANOVA Example
If the education level of the 12 workers is considered, a different story emerges. The data is posted as "Two_Factor_Income" in Excel Data.

              Educational Services   Financial Services   Medical Services
High School   18                     25                   26
Bachelor's    35                     45                   43
Master's      46                     58                   62
Ph.D.         75                     90                   110

It is clear that education also impacts wage. So we collect more reliable data this way.

The Randomized Block Design
This type of two-way ANOVA is called a randomized block design.
The term "block" refers to a matched set of observations across the treatments.
In the salary example, the treatments are the three fields of employment.
The blocks are the education levels.
Until we account for the blocks, we cannot capture the effects of the employment field.

ANOVA for the Salary Example
LO 13.4 Conduct and evaluate hypothesis tests based on two-way ANOVA with no interaction.
H0: µ1 = µ2 = µ3 (the average salaries in the 3 fields are the same)
HA: Not all the average salaries are the same

ANOVA for the Salary Example

ANOVA
Source of Variation   SS           df   MS         F          P-value    F crit
Rows                  7632.91667   3    2544.306   56.57505   8.6E-05    4.757063
Columns               579.5        2    289.75     6.442866   0.032067   5.143253
Error                 269.833333   6    44.97222
Total                 8482.25      11

Rows are the education levels and Columns are the fields.
See "Two Factor Income" in Excel Demonstration in D2L. You can also find the data in Excel Data.

Interpreting the Results
Steps 2, 3 & 4
The F test statistic for Columns (6.44) > F critical value (5.14). Also, the p-value for Columns is about 0.03 (< 0.05). So we reject H0.
After controlling for educational level, the average salaries differ among the fields.
Salary also differs by educational level, as indicated by the very small p-value for Rows (8.6E-05).

ANOVA for the Salary Example
The hypotheses are always:
H0: Same means (there is no effect of the factor)
HA: Different means (there is an effect of the factor)

Homework
Problem 30 on p. 409. Do parts a, b and c. The data, "House", is posted on S:\jslee\QM3620\Data. You should download this file and run Excel to generate the ANOVA table.
Answers are on p. 679 in the Appendix.

13.4: Two-Way ANOVA with Interaction
LO 13.5 Conduct and evaluate hypothesis tests based on two-way ANOVA with interaction.
Now we will look at data categorized by two factors, but with two or more values observed in each "cell."
We want to know three effects, each with its own p-value:
Effect of the column factor
Effect of the row factor
Interaction effect
(You can use F test statistics and F critical values, but using p-values is simpler.)

What is Interaction?
Interaction means that the effect of one factor depends on the level of the other factor.
For example, perhaps education impacts salaries in the financial sector, but not in professional sports.
The two factors, employment sector and education, interact: the effect of education depends on the sector.

Another Salary Example
Each interior cell entry represents the average salary of 3 employees who fit the categories.
By choosing Data > Data Analysis > Anova: Two-Factor With Replication, we are able to obtain the Excel output.

Two-way ANOVA with Interaction
The data and the demonstration are in Excel Data and Excel Demonstration. They are called "Two_Factor_Income_with_Interaction."

Interpreting the Output
We can do all the steps of the HT in a simple way.
If the p-value is smaller than the level of significance (α), then we conclude that the relevant factor has an effect.

Interpreting the Output
P-value for the row factor (Education level) = 0 < α = 0.05, so we reject H0, meaning there is an effect.
P-value for the column factor (Field) = 0 < α = 0.05, so we reject H0, meaning there is an effect.
P-value for the interaction (Education level * Field) = 0.0017 < α = 0.05, so we reject H0, meaning there is an interaction effect.

Homework
Problem 38 on p. 415. The data, called "Job Satisfaction", is posted on s:\jslee\qm3620\Data.
Answers are on p. 679.
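Going back to the two-way ANOVA without interaction (Section 13.3), the Rows, Columns, and Error lines of the Excel table for the salary example can be reproduced by hand. The Python sketch below is an illustration only, assuming the 4 x 3 salary table from that section; the function name `two_way_anova_no_interaction` is made up for this sketch, and it stops at the F statistics, since the p-values and critical values are read from the Excel output.

```python
# Salary data from the two-way ANOVA example.
# Rows are education levels; columns are Educational, Financial,
# and Medical Services.
salaries = [
    [18, 25, 26],    # High School
    [35, 45, 43],    # Bachelor's
    [46, 58, 62],    # Master's
    [75, 90, 110],   # Ph.D.
]


def two_way_anova_no_interaction(table):
    """Sums of squares and F statistics for one observation per cell."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_error = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "f_rows": (ss_rows / (r - 1)) / ms_error,
        "f_cols": (ss_cols / (c - 1)) / ms_error,
    }


result = two_way_anova_no_interaction(salaries)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Comparing `f_cols` with the Columns critical value (5.143253) and `f_rows` with the Rows critical value (4.757063) gives the same conclusions as the p-values in the Excel table.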