- To find z values, use the "InvN" function and input the "area" (the
confidence level, usually expressed in %), std dev = 1 and mean = 0.
- To find t values, you have two options: (1) the t table (which will be provided
in the quiz/test/exam) or (2) the "Invt" function with inputs "upper area" and "df"
(degrees of freedom = n - 1).
*The t-distribution is a continuous probability distribution that arises when estimating
the mean from a small sample when the population standard deviation is unknown. It is
more spread out than the normal distribution; the larger the degrees of freedom, the
closer it is to the normal distribution.
Margin of error: measures the uncertainty in estimating the population parameter.
Margin of error = (critical value) x (standard error).
Point estimate: using the sample statistic (e.g. x̄ or p) to estimate the corresponding
population parameter.

Example #1 – Confidence Interval
A sample of 20 students was asked to fill in a survey on gender, how many hours
they spent studying for a test, and the grade they earned on that test.
Hours: Male 14, Female 17, Female 3, Female 6, Male 17, Male 3, Male 8, Male 4,
Female 20, Male 15, Female 7, Male 9, Male 0, Male 5, Female 11, Female 15,
Male 18, Female 13, Male 8, Male 4
Construct a 91% confidence interval estimate of the proportion of all female students.
a) What parameter are you estimating here? π
b) State/check the necessary assumptions required to construct the confidence interval:
np > 5? 20(8/20) = 8
n(1 - p) > 5? 20(1 - 8/20) = 12
The conditions are met; therefore, the sample proportion is approximately normally
distributed.
c) Before you construct a 91% confidence interval estimate of the proportion of all
female students, state the critical value: z = ±1.6954
Calc: InvN, Area: 0.91, σ = 1, μ = 0
d) The point estimate for a 91% confidence interval estimate of the proportion of all
female students is: p = 8/20 = 0.40

Example #3
A survey is planned to determine the mean annual family medical expenses of
employees of a large company. The management of the company wishes to be 95%
confident that the sample mean is correct to within ±$50 of the population mean
annual family medical expenses. A previous study indicates that the standard
deviation is approx. $400.
a) How large a sample size is necessary? 245.8 → 246
*In this question, you are estimating μ (the mean).
z = 1.9599, e = 50, σ = 400
n = z²σ² / e²
b) If management wants to be correct to within ±$25, how many employees need to be
selected?

Example #4
What proportion of people hit snags with online transactions? According to a poll,
89% hit snags with online transactions.
a) To conduct a follow-up study that would provide 95% confidence that the point
estimate is correct to within ±0.04 of the population proportion, how large a sample
size is required? 235.0339 → 236 (always round up)
*In this question, you are estimating π (the proportion).
π = 0.89, z = 1.9599, e = 0.04
n = z²π(1 - π) / e²

Hypothesis Testing - Mean (Z-Test)
A new battery has been developed to power laptop computers. It will sell in a
certain price range. It is hoped that the battery can be used for more than 4.00
hours before it needs to be recharged. We will assume that the battery lives are
normally distributed with a standard deviation of 0.25 hours. A random sample of 50
batteries is tested. The sample batteries lasted an average of 4.12 hours before
they required recharging.

Example #2
The director of patient services of a large health maintenance organization wants
to evaluate patient waiting time at a local facility. A random sample of 25 patients
is selected from the appointment book. The waiting time is defined as the time from
when the patient signs in to when he or she is seen by the doctor. The following
data represents the waiting times (in minutes). x is known:
19.5  30.5  45.6  39.8  29.6  41.3  13.8  17.4  10.7
25.4  21.8  28.6  52.0  25.4  39.0  36.6   1.9
45.9  42.5  12.1  26.1   4.9  12.7  31.1  43.1
Construct a 95% confidence interval estimate of the pop. avg. waiting time.
1. State the necessary assumption(s)/condition(s) required to construct a 95%
confidence interval estimate for the population average waiting time.
- The CLT cannot be applied because n = 25 is less than 30.
- Assume x is normally distributed.
2. State the critical value needed to construct the 95% confidence interval estimate
for the population average waiting time.
t critical values = ±2.064 (Calc: Dist., t, Invt, Area: 0.025, df: 24)
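The InvN step and the sample-size formulas used in Examples #1, #3 and #4 can be cross-checked outside the calculator. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not part of the course material. It uses the exact z value rather than the rounded 1.9599, so intermediate figures may differ slightly in the last decimals.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z for a central area equal to `confidence` (like InvN, Tail: Central)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size_mean(confidence, sigma, e):
    """n = z^2 * sigma^2 / e^2, rounded up to the next whole unit."""
    return math.ceil((z_critical(confidence) * sigma / e) ** 2)


def sample_size_proportion(confidence, pi, e):
    """n = z^2 * pi * (1 - pi) / e^2, rounded up to the next whole unit."""
    return math.ceil(z_critical(confidence) ** 2 * pi * (1 - pi) / e ** 2)


def proportion_interval(x, n, confidence):
    """Confidence interval for a proportion: p +/- z * sqrt(p(1 - p)/n)."""
    p = x / n
    margin = z_critical(confidence) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


print(round(z_critical(0.91), 4))               # critical value for Example #1
print(proportion_interval(8, 20, 0.91))         # 91% interval for the proportion of females
print(sample_size_mean(0.95, 400, 50))          # Example #3 a) -> 246
print(sample_size_mean(0.95, 400, 25))          # Example #3 b) -> 984
print(sample_size_proportion(0.95, 0.89, 0.04)) # Example #4 a) -> 236
```

Sample sizes are always rounded up, because rounding down would give a margin of error slightly larger than the one requested.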
SPSS Example
The table below is a random sample of 20 companies whose
stock is traded on the New York Stock Exchange. For each
company, the number of shares traded on May 25, 1999, and
May 26, 1999, is given.
a. At the .05 level of significance, is there evidence that the
average number of shares traded on May 25 is higher than the
average number of shares traded on May 26?
b. Determine the p-value in (a)
Ho: µD ≤ 0
Ha: µD > 0
Assume the distribution of the differences between the
number of shares traded on the two days is at least
approximately normally distributed
Decision Rule: Reject Ho if the p-value is less than the level of
significance (0.05); otherwise fail to reject Ho
p-value = 0.4645
Calc: TEST, t, 1-s, µ: >µ0, µ0: 0, x: 6090, sx: 301997.99, n: 20
Decision: Fail to reject Ho
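The p-value above can be reproduced from the summary statistics entered into the calculator (x̄ = 6090, sx = 301997.99, n = 20). The sketch below is an illustration only: the Python standard library has no t distribution, so the upper-tail area is approximated by numerically integrating the t density with Simpson's rule. The helper names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|]; `steps` must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


# Summary statistics of the paired differences (May 25 minus May 26)
mean_d, sd_d, n = 6090, 301997.99, 20
t_cal = mean_d / (sd_d / math.sqrt(n))
p_value = t_upper_tail(t_cal, n - 1)

print(round(t_cal, 4), round(p_value, 4))
print("Reject Ho" if p_value < 0.05 else "Fail to reject Ho")
```

Because the mean difference is tiny compared with its standard error, the test statistic is close to 0 and the upper-tail p-value is close to 0.5.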
Example #2 (continued)
3. State the point estimate and its value used to construct a 95% confidence
interval estimate for the population average waiting time.
x̄ = 27.892 (Calc: INTR, t, 1-s, List, C-level: 0.95)
4. The 95% confidence interval estimate for the population average waiting time is:
22.1668939 < μ < 33.6171261
Calc: INTR, t, 1-s, List, C-level: 0.95, left and right values
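As a cross-check of the calculator output for Example #2, the interval can be rebuilt as point estimate ± (critical value) x (standard error). This is a minimal sketch, assuming the 25 waiting times as read from the example and taking the critical value t = 2.064 quoted in step 2 instead of computing it, so the limits agree with the calculator only to about three decimals.

```python
import math
from statistics import mean, stdev

# Waiting times in minutes (n = 25), as read from Example #2
waiting_times = [
    19.5, 30.5, 45.6, 39.8, 29.6, 41.3, 13.8, 17.4, 10.7,
    25.4, 21.8, 28.6, 52.0, 25.4, 39.0, 36.6, 1.9,
    45.9, 42.5, 12.1, 26.1, 4.9, 12.7, 31.1, 43.1,
]

T_CRITICAL = 2.064  # from step 2: Invt, Area: 0.025, df: 24

n = len(waiting_times)
x_bar = mean(waiting_times)             # point estimate of the population mean
s = stdev(waiting_times)                # sample standard deviation (n - 1 divisor)
margin = T_CRITICAL * s / math.sqrt(n)  # (critical value) x (standard error)
lower, upper = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar:.3f}, s = {s:.4f}")
print(f"{lower:.4f} < mu < {upper:.4f}")
```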
Hypothesis Testing - Mean (T-Test)
A new battery has been developed to power
laptop computers. It will sell in a certain price
range. It is hoped that the battery can be used for
more than 4.00 hours before it needs to be
recharged. We will assume that the battery lives
are normally distributed. A random sample of 50
batteries is tested. The sample batteries lasted an
average of 4.12 hours with a standard deviation of
0.25 hours before they required recharging. Let
Hypothesis Testing for Z-Test – σ known
A company that makes bolts that are used on an
automotive component uses two machines to make
these bolts. It has been determined by past studies that
the standard deviation of the bolt diameters made by
machine 1 is 0.025 mm. and the standard deviation of
the bolt diameters of machine 2 is 0.022 mm. Both
machines have a dial to set for the desired diameter.
Recently they used both machines to fill a large order.
The customer found that many of the bolts from a
certain package were too large and made a complaint.
It was determined that the package in question was
made by machine 2. The manufacturer decided to take
samples of the bolts from both machines to test to see
whether the mean diameter of the bolts from machine 2
was significantly larger than the mean diameter from
machine 1 when the dial was set to the same diameter
on each machine. The sample of 100 bolts from
machine 1 had a mean diameter of 5.023 mm and a
Hypothesis Testing – One Population
[Flow chart]
Population mean (µ): Is the population normal, i.e. X is or can be assumed normal,
or n ≥ 30 (Central Limit Theorem)?
  NO: Use a distribution-free test or, if appropriate, assume the population is
      normally distributed and proceed through the flow chart.
  YES: Is σ known?
    YES: Z-test, with test statistic
    NO: t-test, with test statistic
Population proportion (π):
  - Z-test, with test statistic
  - Convert to the underlying binomial distribution

- In hypothesis testing we begin with a tentative assumption about a population
parameter: the NULL hypothesis, denoted as Ho
- The null hypothesis is written in terms of the population mean, not
the sample mean
- After specifying Ho, we have to specify the alternative hypothesis, denoted
Ha or H1
Example:
- Ho: The average age of the population is equal to 45 years old
- Ha: The average age of the population is NOT equal to 45 years old
- The alternative hypothesis is also known as the research hypothesis
Important note:
In any situation that involves testing the validity of a claim, the null
hypothesis is based on the assumption that the claim is true. The
alternative hypothesis is formulated so that rejection of Ho will
Independent Samples
The samples chosen at random are not related to each other. We wish to study the
mean incomes of companies X and Y. We select a random sample of 28 employees from
Company X and a sample of 19 employees from Company Y. A person cannot be an
employee in both companies.

Dependent Sample
Dependent samples are characterized by a measurement, then some type of
intervention, followed by another measurement. Paired samples are also dependent
because the same individual or item is a member of both samples.
Examples: 10 participants in a marathon were weighed prior to and after competing
in the race. We wish to study the mean

Hypothesis Testing – Two Population Means
[Flow chart]
Dependent/Paired samples: Paired t-test
Independent samples: Are populations 1 and 2 normal?
  NO: Advanced Statistics
  YES: Are σ1 and σ2 known?
    YES: Z-test, with test statistic
    NO: Is σ1 = σ2?
      YES: t-test, with "Pooled Variance"
      NO: t-test, with "No Pooled Variance"

Statistical Decision:
Example: Ho: µ = 4.5 and Ha: µ ≠ 4.5
1. If you reject Ho, you have statistical proof that the alternative hypothesis is
correct.
2. If you do not reject Ho, then you have failed to prove Ha. The failure to prove
the alternative hypothesis does not mean that you have proven the null hypothesis.
*We can never prove that Ho is true.

Risks in Decision Making Using Hypothesis-Testing Methodology
- Since our statistical evidence is based on sample data and the corresponding
sample variability, there is a risk that we may make the wrong conclusion.
Type I error and Type II error
- Type I error: Reject Ho when Ho is true.
Prob of committing Type I error = α (level of significance)
You control the Type I error by deciding the risk level α that you are willing to
have in rejecting the null hypothesis when it is true.
In the example above, you commit a Type I error if you say that µ ≠ 4.5 when µ = 4.5.
- Type II error: Did not reject Ho when Ho is false.
Prob of committing Type II error = β
It depends on the difference between the hypothesized and actual value of the
population parameter. If the difference between the hypothesized and actual value
of the population parameter is large, then β is small.

Hypothesis Testing for T-Test (equal variance)
A work team has developed a new process to assemble a certain component. They would
like to know if this new process has significantly reduced the time to assemble
the component. They have taken samples of 50 components produced by the existing
process and of 40 components produced by the new process. The mean and standard
deviation of the assembly times for the existing process were 73.2 minutes and 3.6
minutes, respectively. The mean time was 71.4 minutes with a standard deviation of
3.2 minutes for the components assembled by the new process. Assume that the times
for the

Hypothesis Testing – Proportion (Z-Test)
A random sample of 300 retail outlets indicated that 165 outlets included the GST
in their prices while the others did not. Can one conclude at the 5% level of
significance that more retail outlets include the GST in their prices than do not?
What is the corresponding p-value?
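For the GST question (165 of 300 retail outlets), the notes give no worked answer, so the following is only a sketch of how the one-proportion z-test could be carried out. It assumes that "more outlets include the GST than do not" translates to Ho: π ≤ 0.5 against Ha: π > 0.5; that framing and the variable names are assumptions, not part of the original notes.

```python
import math
from statistics import NormalDist

x, n = 165, 300   # outlets that include the GST, sample size
pi_0 = 0.5        # hypothesized proportion under Ho (assumed framing)
alpha = 0.05

# Normal approximation conditions: n*pi0 and n*(1 - pi0) should both be at least 5
assert n * pi_0 >= 5 and n * (1 - pi_0) >= 5

p_hat = x / n
standard_error = math.sqrt(pi_0 * (1 - pi_0) / n)  # uses pi_0, not p_hat
z_cal = (p_hat - pi_0) / standard_error
p_value = 1 - NormalDist().cdf(z_cal)              # upper-tail test

print(f"p-hat = {p_hat:.2f}, z = {z_cal:.4f}, p-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```

Under these assumptions the p-value falls just below 0.05, so the decision is sensitive to the chosen level of significance.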
Regression Analysis (RA) is a statistical forecasting model that is
concerned with describing and evaluating the relationship between a
given variable (usually called the dependent variable, denoted as Y) and
one or more other variables (usually known as the
independent/explanatory variables, denoted as X).
• RA can predict the outcome of a given key business indicator
(dependent variable) based on the interactions of other related business
drivers (independent/explanatory variables)
• The relationship can be described as a linear (straight-line) equation,
called linear regression; with one independent variable it is "simple linear regression"
Dependent Variable (Notation: Y) – The variable you wish to predict
Independent Variable (Notation: X) – Variable used to make the prediction
Simple Linear Regression – A single numerical independent variable X is
used to predict the numerical dependent variable Y
Multiple Regression – Use several independent variables to predict a
numerical dependent variable Y.
*When changes in the variable X lead to predictable changes in the variable
Y, then we say "X can be used to explain Y"
Regression analysis allows you to identify the type of relationship that
exists between a dependent variable (Y) and an independent variable (X).
The simplest relationship is the straight line or linear relationship.

                                      Population          Sample
Regression coefficient: Y intercept   β0                  b0
Regression coefficient: slope         β1                  b1
Regression (least squares) line       Y = β0 + β1x + ε    Y = b0 + b1x + e
Random error (residual)               ε                   e
Forecasting line / prediction line                        Y = b0 + b1x

Rule of Thumb Concepts
Type of relationship       X           Y             X, Y
Direct relationship        Increases   Increases     Move in the same direction
                           Decreases   Decreases
Inverse relationship       Increases   Decreases     Move in the opposite direction
                           Decreases   Increases
No apparent relationship   Increases   Cannot tell   No relationship
                           Decreases   Cannot tell
ASSUMPTIONS
General Assumptions of the Simple Linear Regression Model
<similar to ANOVA> Referring to the residuals
1. Linearity - The mean of the model error terms is 0.
2. Independence - The model error terms are independent.
3. Normality - The regression model errors are normally distributed.
4. Equal variance - The model error terms have a constant variance, σε²,
for all combinations of values of the independent variables.
The Coefficient of Determination
[Figure: the actual data value compared with the predicted data value]
Testing whether there is a linear relationship
Test for significance of the correlation between x and y.
Ho: ρ = 0 (There is no linear relationship) - Ha: ρ ≠ 0 (There is a linear
relationship)
Test for significance of the regression slope coefficient.
Ho: β = 0 (There is no linear relationship) – Ha: β ≠ 0 (There is a linear
relationship)
Note: For a simple linear regression model (one independent
variable), these two are equivalent methods.
Example: Simple Linear Regression
The marketing manager of a large supermarket chain would like to
determine the effect of shelf space on the sales of pet food. A random
sample of 12 equal-sized stores is selected with the following results:
Store   Shelf Space, X (feet)   Weekly Sales, Y ($00)
1       5                       1.6
2       5                       2.2
3       5                       1.4
4       10                      1.9
5       10                      2.4
6       10                      2.6
7       15                      2.3
8       15                      2.7
9       15                      2.8
10      20                      2.6
11      20                      2.9
12      20                      3.1
**First determine which are the independent and dependent variables
a. Assuming a linear relationship, use the least-squares method to find the
regression coefficients b0 and b1, where b0 = intercept and b1 = slope.
(stat, F3, TEST, T, REG, ≠, List # x, List # y, Freq-1) b1 = 0.0740, b0 = 1.450
b. Interpret the meaning of the slope b1 in this problem.
Positive Relationship - For every increase of one foot in shelf space,
there is an expected increase of 0.074 hundred dollars ($7.40) in
weekly sales.
c. Use the regression model developed in (a) to predict the average weekly
sales (in hundreds of dollars) of pet food for stores with 8 feet of shelf space
for pet food
(x = 8): ŷ = 1.45 + 0.074(8) = 2.042; × 100 because sales are in ($00) → $204.20
*Predict the average weekly sales (in hundreds of dollars) of pet
food for store 12, which has 20 feet of shelf space for pet food. What is
the residual error?
(x = 20): ŷ = 1.45 + 0.074(20) = 2.93; × 100 → $293. Residual error:
e = y – ŷ = 3.1 – 2.93 = 0.17
d. Compute the coefficient of determination r², and interpret its meaning in
this problem. NOTE: r² is called the coefficient of determination.
MEANING: The percentage of variation in the dependent variable
explained by its relationship to the independent variable(s) in the
regression model
r² = 0.6839 ~ 68.39% of the variation in y (weekly sales) is explained
by x (shelf space)
e. Compute the coefficient of correlation r. NOTE: r is the sample
correlation coefficient (also called the Pearson coefficient of
correlation). It is used to measure the strength of association
between two variables.
r = 0.82700 (positive relationship)
f. At the 0.05 level of significance, is there evidence of a linear relationship
between shelf space and sales?
Ho: β1 = 0
Ha: β1 ≠ 0
The test statistic (tcal) and the p-value are used to make the statistical
decision: 1. Reject Ho, or 2. Don't reject Ho
tcal = 4.6517273, p-value = 0.00090566 < α = 0.05
T-Test: Reject Ho, since there is evidence of a linear relationship between x and y
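The calculator results for the shelf-space example (b0, b1, r², r, tcal and the two predictions) can be reproduced directly from the least-squares formulas. This is a minimal sketch in plain Python; the function and variable names are illustrative only.

```python
import math

# Data from the 12 stores: shelf space in feet (x), weekly sales in $00 (y)
shelf_space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
weekly_sales = [1.6, 2.2, 1.4, 1.9, 2.4, 2.6, 2.3, 2.7, 2.8, 2.6, 2.9, 3.1]


def simple_linear_regression(x, y):
    """Return (b0, b1, r, t_cal) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # slope
    b0 = y_bar - b1 * x_bar         # intercept
    r = sxy / math.sqrt(sxx * syy)  # sample correlation coefficient
    t_cal = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)  # test of Ho: beta1 = 0
    return b0, b1, r, t_cal


b0, b1, r, t_cal = simple_linear_regression(shelf_space, weekly_sales)
prediction_8 = b0 + b1 * 8    # average weekly sales ($00) for 8 feet
prediction_20 = b0 + b1 * 20  # predicted sales ($00) for 20 feet
residual_12 = weekly_sales[11] - prediction_20  # store 12: actual minus predicted

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")
print(f"r^2 = {r * r:.4f}, r = {r:.4f}, t = {t_cal:.4f}")
print(f"y-hat(8) = {prediction_8:.3f}, y-hat(20) = {prediction_20:.2f}, e = {residual_12:.2f}")
```

For simple linear regression the t statistic for the slope equals the t statistic for the correlation, which is why it can be computed from r alone here.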
Analyzing a Multiple Regression Model
Step 1: Collect sample data. The values of Y, X1, X2, X3, … ,Xk
Step 2: Hypothesize the form of the model. This includes choosing which
independent variables to include in the model. Y = β0 + β1X1 + β2X2 +
β3X3 + … + βkXk + ε (Linear)
Step 3: Use the method of least squares to estimate the unknown
parameters
β0 , β1 , β2…., βk
Step 4: Specify the probability distribution of the random error component
ε and estimate its variance σ²
*σ² = variance of the random error ε
Step 5: Statistically evaluate the utility (or usefulness) of the model.
Step 6: Check that the assumptions on ε are satisfied and make model
modifications, if necessary.
Step 7: Finally, if the model is deemed adequate, use the fitted model to
estimate the mean value of y or to predict a particular value of y for given
values of the independent variables, and to make other inferences.
ASSUMPTIONS about the random error ε
Referring to the probability distribution of the random error component ε
and the estimate of its variance σ².
1. Linearity - The mean of the model error terms is 0 (E(ε) = 0).
2. Independence - The model error terms are independent.
3. Normality - The regression model errors are normally distributed.
4. Equal variance - The model error terms have a constant variance, σ²,
for all combinations of values of the independent variables.
Estimator of σ² for multiple regression with k independent variables

Caution: A rejection of the null hypothesis
Ho: β1 = β2 = … = βk = 0 in the global F-test
leads to the conclusion that the model is statistically useful.
- However, statistically "useful" does not necessarily mean "best".
- Another model may prove even more useful in terms of providing more reliable
estimates and predictions.
- The global F-test is usually regarded as a test the model must pass to merit
further consideration.

Example: Multiple Regression
The following are data on horsepower (x1), time from zero to 60 miles per hour (x2),
top speed (x3), miles per gallon (x4), and price (y) in thousands of dollars for
10 sports cars (Road & Track, October 1994).
Car             X1    X2    X3    X4     Y
BMW M3          240   6.0   120   24.6   38.4
Corvette        300   5.7   170   16.8   41.4
Dodge Viper     400   4.8   160   14.0   54.8
Ford Mustang    240   6.9   140   18.0   25.8
Honda Prelude   190   7.1   139   24.0   25.6
Mitsubishi GT   320   5.7   159   16.3   43.7
Toyota Supra    320   5.3   155   18.8   48.2
Nissan 300ZX    300   6.0   155   18.7   0.8
Alfa Romeo      320   7.6   150   17.5   38.1
Mazda RX7       255   5.5   158   17.0   35.0
a. Develop an estimated regression equation
with horsepower, time from zero to 60 miles
per hour, top speed, and miles per gallon as
the four independent variables to predict
Hypothesis Testing for Dependent Samples
The process improvement team selects 12 cars at random and uses
both procedures on each car. There are two procedures: A and B.
We record the time (in mins) for each procedure to do an oil and filter
change. The results are shown in the table below. At the 1% level of
significance, can we conclude that there is a difference in the
average time for an oil change and filter change? The times are
normally distributed.
Time (mins) for an oil and filter change
Automobile   Procedure A   Procedure B   D = Time A - Time B
1            28.2          25.4           2.8
2            27.1          27.0           0.1
3            26.4          25.5           0.9
4            27.3          27.1           0.2
5            24.8          26.5          -1.7
6            23.4          27.4          -4.0
7            26.8          26.2           0.6
8            27.2          26.8           0.4
9            25.5          28.9          -3.4
10           25.8          26.1          -0.3
11           26.0          24.7           1.3
12           25.4          26.6          -1.2

Hypothesis Two Sample Testing – Proportion:
A human resources director decided to investigate employee
perception of the fairness of two performance evaluation methods.
To test for the differences between the two methods, 160
employees were randomly assigned to be evaluated by one of the
methods: 78 were assigned to method 1, where individuals provide
feedback to supervisory queries as part of the evaluation process; 82
were assigned to method 2, where individuals provided self-assessments
of their work performances. Following the evaluations,
employees were asked whether they considered the performance
evaluation fair or unfair. Of the 78 employees in method 1, there
were 63 fair ratings. Of the 82 employees in method 2, there were 49
fair ratings. Using a 0.05 level of significance, is there evidence of a
significant difference between the two methods in the proportion
of fair ratings?
a) What type of parameter is being tested here? π
b) Define the variable or parameter associated with this test:
π1 = population proportion of fair ratings for method 1
π2 = population proportion of fair ratings for method 2
c) State the hypotheses:
Null Hypothesis: Ho: π1 = π2
Alternative Hypothesis: Ha: π1 ≠ π2
d) State the condition(s) required to be true for this procedure to
be legitimate:
Apply the CLT; x1 and x2 are approximately normally distributed
e) What is the calculator procedure that you used for doing this test?
z, 2-p, Var, ≠
f) What is the critical value for this test? z = ±1.9599
Calc: DIST, Norm, InvN, Tail: Central, Area: 1-0.05, σ = 1, µ = 0
g) What is/are the rejection region(s)? Zcal < -1.9599 or Zcal > 1.9599
h) What is the value of the test statistic? Zcal = 2.8991
i) What is the p-value for the test? p-value = 0.0037413
Calc: TEST, z, 2-p, p1: ≠p2, x1: 63, n1: 78, x2: 49, n2: 82
j) What was the statistical decision made and why?
Reject Ho because the p-value is less than the level of
significance (0.05)
k) Using specific references to the appropriate population
parameters, state the test's conclusion: There is sufficient
sample evidence to indicate there is a significant difference
between the two methods in the proportion of fair ratings
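The test statistic and p-value for the two-proportion example (63 of 78 versus 49 of 82 fair ratings) can be checked by hand with the pooled-proportion formula. This is a minimal standard-library sketch with illustrative variable names; it assumes the calculator's 2-p test pools the two samples under Ho, which is consistent with the reported Zcal.

```python
import math
from statistics import NormalDist

x1, n1 = 63, 78   # fair ratings and sample size, method 1
x2, n2 = 49, 82   # fair ratings and sample size, method 2
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # pooled estimate under Ho: pi1 = pi2
standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z_cal = (p1 - p2) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))  # two-tail test
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

print(f"z = {z_cal:.4f}, critical = +/-{z_critical:.4f}, p-value = {p_value:.6f}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```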
Hypothesis Testing for T-Test (unequal variance)
We wish to determine if there is a difference in the
braking distances for two types of tires. Assume that the braking
distances for each type of tire are normally distributed
with unequal variances. Based on the data for the
samples of tires shown, at the 5% level of significance,
should we conclude that there is a difference in the mean
braking distance?
Braking Distance
(meters)
Tire (A)
Tire (B)
83
75
79
84
82
76
84
83
80
85
81
78
83
Hypothesis Two Sample Testing – Variances:
A carpet manufacturer is studying differences between two of its major
outlet stores. The company is particularly interested in the time it takes
customers to receive carpeting that was ordered from the plant. Data
concerning a sample of delivery times for the most popular type of carpet are
summarized as follows:
      Store A     Store B
X̄     34.3 days   43.7 days
S     2.4 days    3.1 days
n     41          31
At the 0.01 level of significance, is there evidence of a difference in the
variances of the shipping time between the two outlets?
a) What type of parameter is being tested here? σ²
b) Define the variable or parameter associated with this test:
σ1² = population variance in the shipping times for store A
σ2² = population variance in the shipping times for store B
c) State the hypotheses:
Null Hypothesis: Ho: σ1² = σ2²
Alternative Hypothesis: Ha: σ1² ≠ σ2²
d) State the condition(s) required to be true for this procedure to be
legitimate:
Assume that the populations are normally distributed
e) What is the calculator procedure that you used for doing this test? F, Var
f) What is the critical value for this test?
To get Fu → go to InvF, Area: 0.01/2, n: df = 40, d: df = 30
Fu = F(0.01/2; 40, 30) = 2.52
FL = 1/Fu*, where Fu* = F(0.01/2; 30, 40) has the degrees of freedom swapped; FL = 0.416
g) What is/are the rejection region(s)? F > 2.52 or F < 0.416
h) What is the value of the test statistic? Fcal = 0.5993756
i) What is the p-value for the test? p-value = 0.129946737
Calc: TEST, f, Variable, σ1: ≠σ2, sx1: 2.4, n1: 41, sx2: 3.1, n2: 31
j) What was the statistical decision made and why?
Fail to reject Ho because the p-value is greater than the level of
significance (0.01)
k) Using specific references to the appropriate population parameters,
state the test's conclusion: There is not enough evidence to conclude
that the two population variances in the shipping times between
the two major outlet stores are different
*Note: assuming the underlying normality in the 2 populations is
met, based on results above, it is appropriate
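The F statistic in the carpet example is simply the ratio of the two sample variances, and the decision follows from comparing it with the critical values. The sketch below takes the critical values FL = 0.416 and Fu = 2.52 from the worked answer rather than computing them, since the Python standard library has no F distribution; the variable names are illustrative.

```python
# Summary statistics from the carpet delivery example
s1, n1 = 2.4, 41   # store A: sample standard deviation (days), sample size
s2, n2 = 3.1, 31   # store B

# Critical values quoted in the worked answer (alpha = 0.01, two-tail)
F_LOWER, F_UPPER = 0.416, 2.52

f_cal = s1 ** 2 / s2 ** 2          # ratio of sample variances
df_num, df_den = n1 - 1, n2 - 1    # degrees of freedom: 40 and 30

reject = f_cal < F_LOWER or f_cal > F_UPPER

print(f"F = {f_cal:.7f} with df = ({df_num}, {df_den})")
print("Reject Ho" if reject else "Fail to reject Ho")
```

Since Fcal lies between the two critical values, the rejection-region approach agrees with the p-value approach used in part j).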
Confidence Interval Estimate
[Flow chart] What population parameter are you estimating?
- Is X normal, i.e. X is or can be assumed normal, or n ≥ 30 (Central Limit
Theorem)? If not, a non-standard procedure is required.
- Is σ known?
- Is p normal, i.e. np ≥ 5 and n(1 - p) ≥ 5?
- Each branch of the chart ends in the confidence limits for that case; the chart
also covers finding the sample size.
One-Way ANOVA (ANalysis Of VAriance)
- Compare means of more than two groups (p-value approach)
- One-way ANOVA deals with one factor of interest (e.g. performance, salary, etc)
- Analyzing the variation "within groups" and "between groups"
- The c groups represent populations whose values are randomly and independently
selected, follow a normal distribution, and have equal variances
  o Have equal variances? Use Levene's test (SPSS output)
p-value
- The p-value allows you to make direct conclusions

ANOVA Example #1
A snack foods company that supplies stores in a metropolitan area with "healthy"
snack products was interested in improving the shelf life of its tortilla chips
product. Six batches (each batch containing one pound) of the product were made
under each of four different formulations. The batches were then kept under the
same conditions of storage. Product condition was checked each day for freshness.
The shelf life in days until the product was deemed to be lacking in freshness was
as follows: At the .05 level of significance, completely analyze the data to
determine whether there is evidence of a difference in the average shelf life among
the formulations. If appropriate, determine which groups differ in average shelf
life.

Oneway ANOVA – may not be given on test
Calculator Input:
List 1: A, B, C, D
List 2: 111111, 222222, etc
Factor A: List 2
Dependent: List 1
MEAN: X̄a = 95.33, X̄b = 84.833, X̄c = 75.33, X̄d = 81.833

Define the parameters being tested:
μa = population average shelf life for formulation A
μb = population average shelf life for formulation B
μc = population average shelf life for formulation C
μd = population average shelf life for formulation D
State the hypothesis:
Null Hypothesis: Ho: μa = μb = μc = μd
Alternative Hypothesis: Ha: at least one μj is different
What condition(s) are required to be true for this procedure to be legitimate?
- Randomness and independence
- Normality → visual → box whisker plot
- Homogeneity of variance → Ho: σa² = σb² = σc² = σd²; Ha: not all the σj² are equal
Test of Homogeneity of Variances (SPSS output), Life (days)
Levene Statistic   df1   df2   Sig.
.528               3     20    .668
What standardized test statistic is being used by this test?
F = 23.274 (given in the "Oneway ANOVA" output)
What is the p-value for the test? 0.000
This is an upper tail test (always an upper tail test)
To find Fu: Dist, F, InvF, Area: 0.05, n: df: 3, d: df: 20
Fu = 3.098
Reject Ho since the p-value is less than the level of significance
P-value = 0.000, Level of Significance = 0.05

Post Hoc Tests – will be given on test
Tukey-Kramer Procedure: Multiple Comparisons
• To determine which means differ, you can use the Tukey-Kramer procedure.
• SPSS output
*Rejecting the global test means that the means are different
Dependent Variable: Shelf Life
Tukey HSD
(I)      (J)      Mean Difference   Std.    Sig.   Lower    Upper
Method   Method   (I – J)           Error          Bound    Bound
A        B         10.50*           2.44    .002     3.66    17.34
A        C         20.00*           2.44    .000    13.16    26.84
A        D         13.50*           2.44    .000     6.66    20.34
B        A        -10.50*           2.44    .002   -17.34    -3.66
B        C          9.50*           2.44    .005     2.66    16.34
B        D          3.00            2.44    .617    -3.84     9.84
C        A        -20.00*           2.44    .000   -26.84   -13.16
C        B         -9.50*           2.44    .005   -16.34    -2.66
C        D         -6.50            2.44    .066   -13.34      .34
D        A        -13.50*           2.44    .000   -20.34    -6.66
D        B         -3.00            2.44    .617    -9.84     3.84
D        C          6.50            2.44    .066     -.34    13.34
* The mean difference is significant at the 0.05 level.
If appropriate, determine which method differs in average shelf
life? Use 0.05 level of significance.
Since Ho is rejected, it is appropriate to use the Tukey test
Tukey procedure → Post Hoc tests
Using the Tukey output (at the .05 level of significance) formulation A
has a longer shelf life than formulations B, C, and D.
Formulation B has a longer shelf life than formulation C.
At the 5% level of significance formulation A appears to have a
longer shelf life than formulations

Chi Square χ² proportion test
Qualitative Data: "counting" of attribute
What is the test(s) procedure for this set of hypotheses?
a. Z test
b. χ² test
c. T test
d. F test
e. A & B

Chi Square Testing: Example #3
A survey is taken in three different locations in Nassau County in New York to
determine whether there is a relationship between architectural style of houses
and geographic location. The results for a sample of 233 houses are as follows:
                 East Meadow   Farmingdale   Levittown   Total
Cape             31            14            52          97
Expanded ranch   2             1             12          15
Colonial         6             8             9           23
Ranch            16            20            24          60
Split-level      19            17            2           38
Total            74            60            99          233
At the 0.05 level of significance, is there evidence of a relationship between
architectural style and geographic location?
a) What test procedure was used for doing this test? Chi Square
b) State the hypothesis using statistical symbols:
Null Hypothesis: Ho: style is not related to location
Ho: π1 = π2 = π3
Alternative Hypothesis: Ha: style is related to location
Ha: at least one πj is different / Ha: not all πj are equal
c) State the condition(s) and assumption(s) required for this procedure to be
legitimate: All the fe > 5 for all cells
Expected frequencies, fe = (row total × column total) / n:
                 East Meadow   Farmingdale   Levittown
Cape             30.807        24.979        41.215
Expanded ranch   4.764         3.863         6.373
Colonial         7.305         5.923         9.773
Ranch            19.056        15.451        25.494
Split-level      12.069        9.785         16.146
d) Draw a graph of the most appropriate distribution clearly showing the value(s)
for the critical value(s) and the rejection region(s).
How to obtain df? df = (5 - 1) x (3 - 1) = 4 x 2 = 8
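For the chi-square example (architectural style versus location), the expected frequencies and the test statistic follow mechanically from the contingency table. The notes do not report the χ² value or the critical value, so this sketch computes the statistic and compares it with 15.507, the standard table value for α = 0.05 and df = 8; that constant and all names in the code are assumptions added for illustration.

```python
# Observed counts: rows = style, columns = East Meadow, Farmingdale, Levittown
observed = {
    "Cape": [31, 14, 52],
    "Expanded ranch": [2, 1, 12],
    "Colonial": [6, 8, 9],
    "Ranch": [16, 20, 24],
    "Split-level": [19, 17, 2],
}

CHI2_CRITICAL = 15.507  # chi-square table value, alpha = 0.05, df = 8 (assumed)

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
n = sum(row_totals)

# Expected frequency for each cell: fe = (row total * column total) / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Test statistic: sum over all cells of (fo - fe)^2 / fe
chi2 = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(rows, expected)
    for fo, fe in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(col_totals) - 1)

small_cells = sum(fe < 5 for exp_row in expected for fe in exp_row)

print(f"n = {n}, df = {df}, chi-square = {chi2:.3f}")
print(f"cells with fe < 5: {small_cells}")
print("Reject Ho" if chi2 > CHI2_CRITICAL else "Fail to reject Ho")
```

The count of cells with fe below 5 is printed because the condition in part c) asks for every expected frequency to exceed 5, which is worth checking before relying on the result.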