Business Statistics: Communicating with Numbers
By Sanjiv Jaggia and Alison Kelly
McGraw-Hill/Irwin
Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 13 Learning Objectives (LOs)
LO 13.1: Provide a conceptual overview of ANOVA.
LO 13.2: Conduct and evaluate hypothesis tests based on
one-way ANOVA.
LO 13.3: Use Tukey’s HSD method in order to determine
which means differ.
LO 13.4: Conduct and evaluate hypothesis tests based on
two-way ANOVA with no interaction.
LO 13.5: Conduct and evaluate hypothesis tests based on
two-way ANOVA with interaction.
13-2
Chapter Case - How Much Does Using Public
Transportation Save?

Environmental and economic concerns have led to
an upswing in the use of public transportation.

Research analyst Sean Cox looked at study results
from a Boston Globe article that claimed commuters
there topped the nation in cost savings from public
transportation.

He wants to know if the average savings significantly
differ among these cities.
13-3
How Much Does Using Public Transportation Save?
13-4
13.1 One-Way ANOVA
LO 13.1 Provide a conceptual overview of ANOVA.

Analysis of Variance (ANOVA) is used to determine if
there are differences among three or more population
means (µ).

One-way ANOVA compares population means based on
one categorical variable (City in the chapter case), called
the treatment.

We utilize a completely randomized design, comparing
sample means computed for each treatment to test
whether the population means differ.
13-5
LO 13.1
ANOVA Assumptions
The assumptions are extensions of those we
used when comparing just two populations:
1. The populations are normally distributed.
2. The population standard deviations are unknown
but assumed equal.
3. Samples are selected independently from each
population.
Here we compare a total of c (more than 2)
populations, rather than just two.
13-6
4 Steps of Hypothesis Testing (HT)
1. State hypotheses (H0 and HA)
2. Find test statistic (F in this case)
3. Find critical value or p-value
(We get them from the Excel output)
4. Conclusion and interpretation
13-7
The Hypothesis Test
LO 13.2 Conduct and evaluate hypothesis tests based on one-way ANOVA.

Step 1: The competing hypotheses for the
one-way ANOVA:
H0: µ1 = µ2 = µ3 = µ4
(the different treatments have no effect on the population means)
HA: Not all population means are equal
(the different treatments have different effects on the population means)
We have only this pair of hypotheses.
13-8
LO 13.2
The ANOVA Concept

The competing hypotheses are displayed graphically below:

The left graph depicts the null hypothesis, where all sample
means are drawn from the same distribution.
On the right, the distributions, and population means, differ.

Sample means are close enough to one another → population means are the same.
13-9
LO 13.2
The ANOVA Concept
The rationale is
1. If the sample means are close enough to one another, we
infer that the population means are the same.
2. Then how close should the sample means be?
3. They should be close enough that the variance of the
sample means is small enough.
4. How small should the variance be? It is determined by the F
critical value from the F distribution.
5. So we convert the variance of the sample means into an F test
statistic.
6. An F value is the ratio of one variance to another variance.
7. So we find another variance (the variance within the samples)
and divide the variance of the sample means by it.
8. This ratio is the F test statistic.
13-10
LO 13.2
The ANOVA Concept
The rationale is
9. The F test statistic should be small enough (smaller than
the critical value) for us to conclude that the population means
are the same.
If F test statistic < F critical value, then we do not reject H0.
If F test statistic > F critical value, then we reject H0.
10. A small enough F test statistic also means that the p-value,
P(F > F test statistic), is greater than the level of significance (α).
If p-value > α, then we do not reject H0.
If p-value < α, then we reject H0.
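The two decision rules can be sketched in a few lines of Python. This is only an illustration: the function names decide_by_f and decide_by_p are made up here, and the numbers are the ones reported in the chapter-case Excel output (F = 610.5664, F crit = 3.098391, p-value = 7.96E-20).

```python
def decide_by_f(f_stat: float, f_crit: float) -> str:
    """Critical-value rule: reject H0 when the F statistic exceeds F critical."""
    return "reject H0" if f_stat > f_crit else "do not reject H0"


def decide_by_p(p_value: float, alpha: float = 0.05) -> str:
    """p-value rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Values taken from the chapter-case Excel output (not computed here).
f_stat, f_crit, p_value = 610.5664, 3.098391, 7.96e-20

print("F rule:      ", decide_by_f(f_stat, f_crit))
print("p-value rule:", decide_by_p(p_value, alpha=0.05))
```

For the same test, the two rules lead to the same conclusion, so in practice only one of them needs to be checked.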
13-11
Chapter Case

Steps 2 and 3: Generate Excel table
ANOVA
Source of Variation          SS   df        MS          F    P-value     F crit
Between Groups         13204720    3   4401573   610.5664   7.96E-20   3.098391
Within Groups            144180   20      7209
Total                  13348900   23
This ANOVA table shows the F test statistic and the critical value, but we
will use the p-value, which is 7.96E-20 (that is, 7.96 × 10^-20). It is
practically 0.
Please see Public Transportation in Excel Demonstration in D2L to see
how to generate this table. The data is posted in Excel Data so you can
practice it.
13-12
Chapter Case

Step 4:
F test statistic (610.5664) > F critical value
(3.0984), so we reject H0.
Or
p-value (≈ 0) < α (0.05).
So we reject H0.
It means that there is a significant difference
among the average cost savings in the 4 cities.
13-13
13.2 Multiple Comparison Methods
LO 13.3 Use confidence intervals and Tukey’s HSD method in order to
determine which means differ.

When the one-way ANOVA finds significant
differences between the population means (as in
the chapter case), it is natural to ask which
means differ.

In this section we show two techniques for
performing this follow-up analysis:

Fisher’s Least Significant Difference (LSD) Method (not very efficient)
Tukey’s Honestly Significant Differences (HSD) Method
13-14
LO 13.3
The Tukey HSD Procedure

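Tukey’s method generates confidence intervals (CI)
of the form:
(x̄i − x̄j) ± q × √[(MSE/2)(1/ni + 1/nj)]
where x̄i and ni are the sample mean and sample size for population i.

If a CI does not include 0, there is a difference between
the 2 population means (positive or negative).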

The q-statistic comes from the studentized range
distribution with c and (nT – c) degrees of freedom.
13-15
LO 13.3
The Tukey HSD Procedure

Is there a difference between Boston and Chicago?

Boston sample mean = 12622
Chicago sample mean = 10730
With α = 0.05, c = 4, nT – c = 24 – 4 = 20, q = 3.96
MSE = 7209
nBoston = 5, nChicago = 5
13-16
LO 13.3

The Tukey HSD Procedure
Confidence interval (CI) for the difference between
Boston and Chicago
= (12622 − 10730) ± 3.96 × √[(7209/2)(1/5 + 1/5)]
= 1892 ± 150.3653
= (1741.635, 2042.365)
The CI does not include 0, so there is a difference between
Boston and Chicago (Boston > Chicago).
You can repeat this procedure for all pairs of cities.
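Repeating the procedure for all pairs can be sketched in Python as follows. The helper name tukey_ci is hypothetical; the sample means and sample sizes are those in the Excel summary output for the chapter case, and q = 3.96 and MSE = 7209 are the values used above. Treat this as an illustration of the arithmetic, not a replacement for the Excel demonstration.

```python
from itertools import combinations
from math import sqrt


def tukey_ci(mean_i, mean_j, n_i, n_j, mse, q):
    """Return (lower, upper) for (mean_i - mean_j) +/- q*sqrt((MSE/2)(1/n_i + 1/n_j))."""
    diff = mean_i - mean_j
    margin = q * sqrt((mse / 2) * (1 / n_i + 1 / n_j))
    return diff - margin, diff + margin


# (sample mean, sample size) per city, from the chapter-case Excel summary.
cities = {
    "Boston": (12622, 5),
    "NY": (12585, 8),
    "SF": (11720, 6),
    "Chicago": (10730, 5),
}
MSE, Q = 7209, 3.96  # values used on the slides

intervals = {}
for a, b in combinations(cities, 2):
    (mean_a, n_a), (mean_b, n_b) = cities[a], cities[b]
    lower, upper = tukey_ci(mean_a, mean_b, n_a, n_b, MSE, Q)
    intervals[(a, b)] = (lower, upper)
    # The means differ when the interval excludes 0.
    verdict = "differ" if lower > 0 or upper < 0 else "no significant difference"
    print(f"{a} - {b}: ({lower:.3f}, {upper:.3f}) -> {verdict}")
```

The Boston and Chicago interval reproduces the hand calculation above; the other pairs are read the same way.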
13-17
Excel Output of the Chapter Case
Anova: Single Factor
SUMMARY
Groups     Count      Sum   Average   Variance
Boston         5    63110     12622     7707.5
NY             8   100680     12585   6464.286
SF             6    70320     11720       7050
Chicago        5    53650     10730     8212.5
ANOVA
Source of Variation          SS   df        MS          F    P-value     F crit
Between Groups         13204720    3   4401573   610.5664   7.96E-20   3.098391
Within Groups            144180   20      7209
Total                  13348900   23
Notes: df for Between Groups = c-1; df for Within Groups = nT - c;
MS for Within Groups = MSE.
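The ANOVA table can be rebuilt from the SUMMARY block alone, which shows where SS, df, MS and F come from. The sketch below uses plain Python, with variable names chosen here for illustration. It assumes the standard one-way ANOVA identities: SS between = Σ ni(x̄i − grand mean)² and SS within = Σ (ni − 1)si².

```python
# (count, average, variance) per city, from the Excel SUMMARY output.
groups = {
    "Boston": (5, 12622, 7707.5),
    "NY": (8, 12585, 6464.286),
    "SF": (6, 11720, 7050),
    "Chicago": (5, 10730, 8212.5),
}

c = len(groups)                                   # number of treatments
n_total = sum(n for n, _, _ in groups.values())   # nT
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-groups variation: how far each sample mean is from the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within-groups variation: pooled from the sample variances.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between, df_within = c - 1, n_total - c
ms_between = ss_between / df_between
mse = ss_within / df_within
f_stat = ms_between / mse

print(f"SS between = {ss_between:.0f}, df = {df_between}, MS = {ms_between:.0f}")
print(f"SS within  = {ss_within:.0f}, df = {df_within}, MSE = {mse:.0f}")
print(f"F = {f_stat:.4f}")
```

The p-value and F crit still come from the F distribution with (c-1, nT - c) degrees of freedom, which Excel reports directly.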
13-18
Finding q
13-19
Homework
1. Problem 8 on page 394. To answer part a,
download “Detergent” data from
s:\jslee\QM3620\Data and run Excel to
generate the ANOVA table.
2. Problem 14 on page 400.
Answers are on pp. 677-678 in the appendix.
13-20
13.3 Two-Way ANOVA with No
Interaction - Example 13.4

She interviews four workers in three fields and
determines their salaries.

This is a one-way ANOVA.
The data are not reliable, because other factors that affect salary are ignored.

13-21
Two-Way ANOVA Example
If the education level of the 12 workers is
considered, a different story emerges.
The data is posted as “Two_Factor_Income” in
Excel Data.

              Educational Services   Financial Services   Medical Services
High School                     18                   25                 26
Bachelor’s                      35                   45                 43
Master’s                        46                   58                 62
Ph.D.                           75                   90                110
It is clear that education also impacts wage. So we
collect more reliable data this way.
13-22
The Randomized Block Design

This type of two-way ANOVA is called a
randomized block design.

The term “block” refers to a matched set of
observations across the treatments.

In the salary example, the treatments are the
three fields of employment.

The blocks are the education levels. Until we
account for them, we cannot capture the
employment field effects.
13-23
LO 13.4
ANOVA for the Salary Example
H0: µ1 = µ2 = µ3 (average salaries in the 3 fields are the same)
HA: Not all the average salaries are the same
13-24
LO 13.4
ANOVA for the Salary Example
ANOVA
Source of Variation           SS   df         MS          F    P-value     F crit
Rows                  7632.91667    3   2544.306   56.57505    8.6E-05   4.757063
Columns                    579.5    2     289.75   6.442866   0.032067   5.143253
Error                 269.833333    6   44.97222
Total                    8482.25   11
See “Two Factor Income” in Excel Demonstration in D2L. You can
also find the data in Excel Data.
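The Rows, Columns and Error lines of this table can be reproduced from the 12 salaries in the example. The sketch below is plain Python with illustrative variable names. It assumes the usual formulas for two-way ANOVA without replication, with rows as education levels and columns as fields.

```python
# Salaries from the example: rows are education levels, columns are fields
# (Educational Services, Financial Services, Medical Services).
salaries = {
    "High School": [18, 25, 26],
    "Bachelor's": [35, 45, 43],
    "Master's": [46, 58, 62],
    "Ph.D.": [75, 90, 110],
}

rows = list(salaries.values())
r, c = len(rows), len(rows[0])            # 4 blocks (rows), 3 treatments (columns)
grand_mean = sum(map(sum, rows)) / (r * c)

row_means = [sum(row) / c for row in rows]
col_means = [sum(row[j] for row in rows) / r for j in range(c)]

ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
ss_error = ss_total - ss_rows - ss_cols   # what neither factor explains

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols
mse = ss_error / df_error

f_rows = (ss_rows / df_rows) / mse
f_cols = (ss_cols / df_cols) / mse

print(f"Rows:    SS = {ss_rows:.4f}, df = {df_rows}, F = {f_rows:.4f}")
print(f"Columns: SS = {ss_cols:.4f}, df = {df_cols}, F = {f_cols:.4f}")
print(f"Error:   SS = {ss_error:.4f}, df = {df_error}, MS = {mse:.4f}")
```

Removing the row (education) variation from the error term is what makes the column (field) effect detectable in this design.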
13-25
LO 13.4
Interpreting the Results
Steps 2, 3 & 4
The F test statistic for columns (6.44) > F critical value (5.14).
Also, the p-value for columns is about 0.03 (< 0.05). So we
reject H0. After controlling for educational level, the
average salaries differ across the fields.
Salary also differs by educational level (rows), as indicated by
the very small p-value (8.6E-05).
13-26
LO 13.4
ANOVA for the Salary Example
Hypotheses are always:
H0: Same means
(There is no effect of the factor)
HA: Different means
(There is an effect of the factor)
13-27
Homework
Problem 30 on p. 409. Do parts a, b and c.
The data, “House” is posted on
S:\jslee\QM3620\Data. You should download
this file and run Excel to generate ANOVA
table. Answers are on p. 679 in the Appendix.
13-28
13.4: Two-Way ANOVA with Interaction
LO 13.5 Conduct and evaluate hypothesis tests based on two-way ANOVA with
interaction.
Now we will look at data categorized by two factors,
but with two or more values observed in each “cell.”
We want to test three effects, each with its own p-value:
Effect of the column factor
Effect of the row factor
Interaction effect

(You can use F test statistics and F critical values, but using p-values is simpler.)
13-29
LO 13.5
What is Interaction?

Interaction means that the effect of one factor
depends on the level of the other factor.

For example, perhaps education impacts
salaries in the financial sector, but not in
professional sports. The two factors,
employment sector and education, interact:
the effect of education depends on the sector.
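A small numerical illustration may make the idea concrete. All numbers below are HYPOTHETICAL average salaries invented for this sketch (they are not from the textbook or the course data), and the helper education_effect is likewise made up. The code only describes the pattern; whether an interaction is statistically significant is still decided by the interaction p-value in the ANOVA output.

```python
# HYPOTHETICAL cell means (average salary by education and sector).
cell_means = {
    ("Bachelor's", "Finance"): 60,
    ("Master's", "Finance"): 90,
    ("Bachelor's", "Sports"): 70,
    ("Master's", "Sports"): 70,
}


def education_effect(sector: str) -> int:
    """Change in average salary from Bachelor's to Master's within one sector."""
    return cell_means[("Master's", sector)] - cell_means[("Bachelor's", sector)]


effect_finance = education_effect("Finance")
effect_sports = education_effect("Sports")

# With no interaction, the education effect would be the same in both sectors,
# so this difference of differences would be 0.
interaction_contrast = effect_finance - effect_sports

print("Education effect in Finance:", effect_finance)
print("Education effect in Sports: ", effect_sports)
print("Difference of differences:  ", interaction_contrast)
```

A nonzero difference of differences in the sample suggests an interaction, which the two-way ANOVA with interaction then tests formally.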
13-30
LO 13.5
Another Salary Example

Each interior cell entry represents the average salary of 3
employees who fit the categories.

By choosing Data > Data Analysis > ANOVA: Two
Factor With Replication, we are able to obtain Excel
output.
13-31
LO 13.5
Two-way ANOVA with Interaction
Data and demonstration are in Excel Data and Excel Demonstration.
They are called “Two_Factor_Income_with_Interaction.”
13-32
LO 13.5

Interpreting the Output
We can do all the steps of HT in a simple way. If the p-value
is smaller than the level of significance (α), then we conclude
that the relevant factor has an effect.
13-33
LO 13.5



Interpreting the Output
P-value for row factor (Education level) ≈ 0 < α = 0.05, so we reject
H0, meaning there is an effect.
P-value for column factor (Field) ≈ 0 < α = 0.05, so we reject H0,
meaning there is an effect.
P-value for Interaction (Education level * Field) = 0.0017 <
α = 0.05, so we reject H0, meaning there is an interaction effect.
13-34
Homework
Problem 38 on p. 415. Data called “Job
Satisfaction” is posted on s:\jslee\qm3620\Data.
Answers are on p. 679.
13-35