
196
CHAPTER 7
7.1 Two-Tailed Hypothesis Tests (Large Sample, n ≥ 30)
Hypothesis testing is nothing more than a formalized approach to the central limit theorem, incorporating the concepts of accept/reject decision making and Type I error. Let’s see how it works in the following problem.
Suppose the Fiche Company (a manufacturer of telephone cable) receives shipments
of ﬁber optic thread, hair-thin strands of glass capable of transmitting hundreds of
thousands of times more information than a copper wire. The Fiche Company will
ultimately coat the ﬁber-optic threads with steel and plastic and bind several into
cables to be laid on ocean ﬂoors for intercontinental communications. However, it
is important for production purposes that the incoming shipments of hair-thin glass
ﬁber thread maintain an average thickness of .560 mm. Of course the supplier of the
thread claims this is so.
Claim: μ = .560 mm (thickness, that is, the diameter of the thread)
This is a typical situation in business. A supplier ships you goods and makes a claim
with the expectation that you will believe that claim. In this case, the claim is: the
average thickness of ﬁber optic thread in the shipment is .560 mm. In statistical
terms, we call this a hypothesis.
A hypothesis, then, is merely a claim put forth by someone. This hypothesis or claim is denoted by the symbol H0 (H-sub-zero) and referred to formally as the null hypothesis.*
In this case, our claim or null hypothesis would be written
H0: μ = .560 mm
This null hypothesis may or may not be true. The supplier may have documented
evidence for making such a claim, or may simply be guessing. In fact, for all we
know, the supplier may be lying outright, which of course obliges us as prudent
individuals to test their claim.
The alternative hypothesis is the opposite of the null hypothesis. It represents a
negation of the original claim of the null hypothesis.
*Technical note: Actually, the symbol H0 originates from tests involving the comparison of two population means or ratios; however, the symbol H0 has since evolved to represent any hypothesis set up for the purpose of seeing whether it can be rejected.
gib90160_ch07.indd 196
22/12/11 4:57 AM
Hypothesis Testing
197
The alternative hypothesis is denoted by H1 and is the argument that refutes H0.
A test that analyzes H0 as it relates to H1 is referred to as a hypothesis test.
Hypothesis Test
A test designed to prove or disprove some initial claim, your null hypothesis, H0.
When dealing with a hypothesis test, we always begin by assuming the claim or null hypothesis (H0) is true: in this case, that the supplier is correct and the average thickness of these shipments of fiber-optic thread is indeed μ = .560 mm.
We begin a hypothesis test by assuming H0 is true.
Indeed, if we accept H0: μ = .560 mm as true (which we must to begin a hypothesis test), then we know from decades of experience that a certain logic will necessarily follow. Namely, if we were to measure the thickness of all the glass fiber in the shipment and arrange these measurements according to size into a histogram, these measurements would probably cluster about the average value of μ = .560 mm; however, many measurements would be less than .560 mm and many would be more, and the histogram might take on the following shape.
FIGURE 7.1 Population Histogram: Glass Fiber Thickness (μ = .560 mm)
Notice this population is somewhat ragged in shape with a slight skew. Although
in real life we may not actually know the shape of the population prior to sampling,
it would not be unusual for such a ragged skewed shape to appear. Although the
output from one process or machine, properly operating and running uninterrupted,
is often found to be normally or nearly normally distributed, an entire shipment
may very well consist of output from several machines or processes over several
periods of time and, thus, could vary considerably. When the output from various
processes is mixed, a normal distribution may or may not form, depending on a number of factors. However, this should not make a difference in our analysis of μ, since whatever the shape of your population, as long as the sample size is 30 or more, the x̄ distribution will be approximately normally distributed, as follows:
FIGURE 7.2 Emergence of Sampling Distribution: Glass Fiber Thickness (x̄ distribution of several thousand sample averages, centered on μ = .560 mm)
However, we do have another problem.
Noticeably absent from the above histogram is information concerning the standard deviation of this population, σ, which in real-life situations is often not supplied. In fact, more often than not, it is simply unknown. However, without σ we cannot calculate σx̄, the standard deviation of the x̄ distribution.
Remember: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
And without σx̄, we cannot estimate the spread of our x̄ distribution, which tells us where we should expect sample averages (x̄’s) to cluster, and that of course forms the entire basis of our central limit theorem analysis. In other words, we are stuck!
But wait, the problem is not insurmountable. We have learned from prior exercises that when we randomly select 30 or more measurements from a population,
x̄ ≈ μ: the sample average, x̄, is approximately equal to the population average, μ, and
s ≈ σ: the sample standard deviation, s, is approximately equal to the population standard deviation, σ.
If indeed s ≈ σ, that is, the individual measurements in one sample are spread out in
a manner similar to how the measurements in the entire population are spread out,
we may be able to use the standard deviation of one sample, s, as an estimator of the
standard deviation of the entire population, σ. Experience has conﬁrmed that when
your sample size is over 30, indeed the spread of measurements in one sample is a
good estimator of the spread of measurements in the entire population—that is, s is a
good estimator of σ, and this is precisely what is done in industry and research studies.
s is used to estimate σ.
Since the standard deviation of one sample should give us what we want to know,
namely, an approximation of σ, the standard deviation of the population, then the
telephone cable manufacturer is obliged on receiving the shipment to take a random
sample. Although many results are possible, let us say, for the purposes of this
example that the manufacturer randomly samples 36 pieces of ﬁber-optic thread
and calculates the following:
n = 36 measurements
x̄ = .553 mm
s = .030 mm
If this is indeed a properly conducted random sample, the spread (standard deviation) of the 36 measurements should be similar to the spread (standard deviation) of the entire population. That is, if s = .030 mm (note sample results above) and s ≈ σ, then σ must be approximately equal to .030 mm. And we can use this estimate to calculate σx̄, as follows:
$\sigma_{\bar{x}} \approx \dfrac{s}{\sqrt{n}} = \dfrac{.030}{\sqrt{36}} = \dfrac{.030}{6} = .005 \text{ mm}$
Now that we know σx̄ is approximately equal to .005 mm, we can estimate the spread of the x̄ distribution.
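The arithmetic for σx̄ can be checked with a short Python sketch. The helper name `standard_error` is our own, not the text’s; as in the text, it simply substitutes the sample s for the unknown σ.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Estimate sigma_xbar = sigma / sqrt(n), using the sample s in place of sigma."""
    if n < 2:
        raise ValueError("need at least two measurements to compute s")
    return s / math.sqrt(n)

# Fiber-optic thread sample: n = 36 measurements, s = .030 mm
se = standard_error(0.030, 36)
print(f"sigma_xbar is approximately {se:.3f} mm")
```

Keeping the calculation in one small function makes it easy to see how σx̄ shrinks as n grows: quadrupling the sample size halves σx̄.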
FIGURE 7.3 A Look at Spread in Sampling Distribution. Population histogram: millions and millions of individual measurements of glass fiber arranged according to thickness (μ = .560 mm, σ ≈ .030 mm; axis .500 to .620 mm). x̄ distribution: several thousand sample averages (sample size n = 36; μx̄ = .560 mm, σx̄ = .005 mm; axis .550 to .570 mm).
Keep in mind, what we have done so far is a make-believe construction based solely on the assumption that the supplier’s claim μ = .560 mm is true. We really do not know whether μ = .560 mm is true or not. We are merely saying: “if” μ = .560 mm is true, and “if” we were to measure every piece of fiber in the shipment, and “if” we continually took random samples of 36 measurements and calculated the sample average, x̄, for each sample, then the central limit theorem tells us that the x̄’s should form into a normally distributed x̄ distribution, symmetrical about μ = .560 mm and spread out as shown above.
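The make-believe construction above can also be simulated. The Python sketch below builds a hypothetical ragged, skewed shipment (the three machine settings are invented for illustration, not taken from the text), shifts it so that its mean is exactly .560 mm and H0 holds, and then draws 5,000 random samples of 36. The average of the x̄’s should land very close to μ, and their spread close to σ/√n, even though the population itself is not normal.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical skewed "shipment": output mixed from three machines
# (all settings here are made up for illustration, not from the text).
raw = ([random.gauss(0.550, 0.020) for _ in range(60_000)]
       + [random.gauss(0.575, 0.030) for _ in range(30_000)]
       + [0.540 + random.expovariate(1 / 0.040) for _ in range(10_000)])
shift = 0.560 - statistics.fmean(raw)      # recenter so the claim mu = .560 holds
population = [x + shift for x in raw]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 36
xbars = [statistics.fmean(random.sample(population, n)) for _ in range(5_000)]

print(f"population: mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"x-bar distribution: mean = {statistics.fmean(xbars):.4f}, "
      f"sd = {statistics.stdev(xbars):.4f}, sigma/sqrt(n) = {sigma / n ** 0.5:.4f}")
```

A histogram of `xbars` would look close to the bell shape of Figure 7.3, even though `population` is lopsided.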
Okay, now that we know what the x̄ distribution should look like if the supplier’s claim is true, how do we prove (or disprove) μ = .560 mm? Simple. We take a random sample of 36 measurements from our shipment, calculate the sample average, x̄, and observe whether this x̄ reasonably fits into the expected x̄ distribution.*
Wait a minute. We already took a random sample of 36 measurements. True. There’s no point spending time and money on another sample. Let’s use the x̄ we observed from the earlier sample. If you recall, our sample results were as follows (reprinted here for convenience):
n = 36 measurements
x̄ = .553 mm ← (Now we are interested in this measurement)
s = .030 mm
Notice that, now, we are concerned with the x̄ of the sample. In other words, does this x̄ of .553 mm reasonably fit into our expected x̄ distribution? And the answer is, yes. We can look at this sample average of .553 mm and look at the x̄ distribution and see that this x̄ of .553 mm is a reasonably likely occurrence. Observe:
FIGURE 7.4 The Test Statistic (sample average x̄ = .553 mm located on the x̄ distribution)
Since an x̄ of .553 mm would be a reasonably likely occurrence, we conclude that the supplier’s claim (μ = .560 mm) is quite possible. If we choose to make a firm accept H0 or reject H0 decision, then we
Accept H0: μ = .560 mm
In reality, there is not enough evidence to prove μ is precisely .560 mm. The best we can show is that μ = .560 mm is reasonably possible given the evidence of this one sample. The concept of hypothesis testing is much like a jury trial: μ = .560 mm is innocent (accepted) unless proven guilty. Since a sample average of x̄ = .553 mm
*In hypothesis testing, we call the sample x̄ the test statistic because it is the value we check against the expected x̄ distribution to see whether it reasonably fits.
does not prove the supplier’s claim false, we must assume the supplier’s claim is true.
Professionally, this conclusion is written in a number of ways. Two of the most
popular are:
The null hypothesis cannot be rejected
or
Results not signiﬁcant
Both statements say the same thing; that is, if we use the accept H0–reject H0 format, then we must accept the supplier’s claim (μ = .560 mm), since we have no evidence to disprove the claim. My preference is to word the conclusion as follows:
Since the sample average of x̄ = .553 mm reasonably fits into the expected x̄ distribution for μ = .560 mm, we
Accept H0: μ = .560 mm
The words not significant have a very special meaning in statistical testing. They mean the results may reasonably be attributed to “chance fluctuation.” In other words, x̄’s may very well vary, fluctuate by chance, between .550 mm and .570 mm when μ = .560 mm. Since we achieved an x̄ (.553 mm) in this chance fluctuation range, we merely accept H0. In broad terms, when sample results are
Not significant: we accept H0
Significant: we reject H0
Now you might feel a little uncomfortable accepting H0 since your sample average (.553 mm) did not fall precisely on the claimed population value of .560 mm. And at this point you might ask, why don’t we continue sampling to be more certain of our decision? Unfortunately, in most areas of research, further sampling is not practical. It is usually expensive, time-consuming, and in some cases physically impossible (when test circumstances cannot be duplicated). Certainly in this production control experiment, another random sample can be taken with relative ease; however, in most studies in marketing, medicine, sociology, economics, and other fields, we often must rely on the results of one and only one sample. Even in this production control experiment, no one wants to absorb the added time and expense of further sampling unless absolutely necessary. In other words, in statistical studies,
we normally base our decision on one and only one sample.
And we will conform to this practice in this text. So, to sum up our experiment, if our one sample average, x̄, is reasonably close to the claimed μ, we accept H0 as true and therefore accept the shipment of fiber-optic thread as meeting our specification of μ = .560 mm.
However, this raises some questions, such as: at what point do we grow suspicious that our sample x̄ is not reasonably close to μ? For instance, what if our sample average turned out to be .550 mm or .540 mm or .577 mm? Clearly, these values are at or beyond the very fringe of the “expected” sample averages. Observe:
[Figure: x̄ distribution centered on .560 mm, with tick marks from .550 to .570 mm]
In other words, at what value of x̄ do we begin to grow suspicious that maybe the supplier’s claim is false? Fortunately, there are certain industry standards that have proven reliable over decades of use. Although a number of industry standards exist, one of the most popular is the
Level of significance, α = 5% (.05)
Although discussed in the last chapter, a brief review here might be helpful. Essentially, a level of significance sets up the cutoffs, or boundaries, for accepting or rejecting H0. For instance,
For level of significance α = 5% (.05),* establish where the middle 95% of the x̄’s are expected to fall if H0 is true. Then, if the x̄ you calculate from your random sample falls inside (or exactly on the border of) this 95% range, accept H0 as true. If the sample x̄ falls outside, assume H0 is false.
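The α = .05 rule can be written as a small decision function. This Python sketch is our own illustration (the names `middle_range` and `decide` are hypothetical): it gets the z value that bounds the middle 95% from `statistics.NormalDist` in the standard library rather than from a printed table, and it reuses σx̄ ≈ .005 mm from the fiber-optic example. With unrounded cutoffs (about .5502 and .5698 mm), an x̄ of .550 mm falls just outside the range; with cutoffs rounded to .550 and .570 mm, it would sit exactly on the border.

```python
from statistics import NormalDist

def middle_range(mu, se, alpha=0.05):
    """Bounds of the middle (1 - alpha) share of the x-bar distribution under H0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = .05
    return mu - z_crit * se, mu + z_crit * se

def decide(xbar, mu=0.560, se=0.005, alpha=0.05):
    """Accept H0 if xbar is inside, or exactly on the border of, the middle range."""
    lower, upper = middle_range(mu, se, alpha)
    return "accept H0" if lower <= xbar <= upper else "reject H0"

# Candidate sample averages mentioned in the text (mm)
for xbar in (0.553, 0.550, 0.540, 0.577):
    print(f"x-bar = {xbar:.3f} mm -> {decide(xbar)}")
```

Passing a different `alpha` (say .01) widens the acceptance range, which is one way to see that many levels of significance are possible.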
Visually we might present this α = .05 hypothesis test as follows:
FIGURE 7.5 Acceptance/Rejection Regions (α = .05, two-tailed test): accept H0 for the middle 95% of x̄’s around μ; reject H0 in either shaded tail, which together hold 5% of x̄’s.
Two-Tailed Test
This is called a two-tailed hypothesis test since we have two tails of rejection (as shown shaded above). That is, we would reject the null hypothesis for any sample x̄ falling in either of the two shaded tails.
*Actually, many levels of signiﬁcance are possible.
To recap: if your sample x̄ falls inside this 95% range (or on the border), accept H0. If your sample x̄ falls outside this range (that is, in the shaded tails), reject H0. And this is precisely what is done in industry and research. When making decisions concerning these large-sample, two-tailed hypothesis tests, we can choose among three methods: the P-value method and two versions of the classical (traditional) method. Since Method One is the most popular, we go into detail concerning its development and apply its meaning to our example. We then discuss both Methods Two and Three. Here is a problem as it would be worded and solved in practice.
EXAMPLE
A supplier claims the average thickness (diameter) of its ﬁber-optic thread is
.560 mm. You receive a shipment and decide to test their claim at a .05 level of
signiﬁcance by taking a sample of 36 randomly selected measurements, with the
following results:
n = 36 measurements
x̄ = .553 mm
s = .030 mm
What can we conclude?
METHOD ONE: THE P-VALUE METHOD
Perhaps the most popular way that we accept or reject our null hypothesis (H0) is
to use the statistic called the p-value. For hypothesis tests of μ (and other statistical
tests), computer printouts often provide a p-value in the results. We discuss here
the basic notion of the p-value and apply it to help us solve the current problem
at hand.
DEFINITION
A p-value refers to the probability of obtaining a sample average (x̄) at least as extreme as the one found from the sample data, given that the null hypothesis is true. We generally denote it as p or as p-value.
For instance, for a sample x̄ with a p-value
p = .0012
the probability of obtaining a sample average x̄ at least as extreme as this sample x̄, given the null hypothesis is true, is .0012 (.12%). In such situations, we compare this p-value of the sample x̄ (.0012) to the level of significance (α) for the experiment to determine whether we accept or reject the null hypothesis.
For p greater than α, accept H0; otherwise reject.
Notice that in this instance, if α = .05, we would reject H0, since p was less than α (.0012 < .05, or .12% < 5%).
Generally, if the p-value is
a. greater than .05, there is insufficient evidence against H0 (not significant)
b. between .01 and .05, there is ample evidence against H0 (significant)
c. between .001 and .01, there is strong evidence against H0 (highly significant)
d. less than .001, there is very strong evidence against H0 (very highly significant)
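This verbal scale can be encoded as a small lookup function. The Python sketch below is illustrative only (the name `describe_p_value` is ours); it treats the last band as p below .001, and where two bands meet (exactly .05, .01, or .001) it assigns the boundary value to the more significant band, which matches the rule “for p greater than α, accept H0; otherwise reject.”

```python
def describe_p_value(p: float) -> str:
    """Map a p-value to the verbal evidence scale (boundary handling is our choice)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > 0.05:
        return "insufficient evidence against H0 (not significant)"
    if p > 0.01:
        return "ample evidence against H0 (significant)"
    if p > 0.001:
        return "strong evidence against H0 (highly significant)"
    return "very strong evidence against H0 (very highly significant)"

print(describe_p_value(0.0012))   # the p = .0012 example from the text
```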
However, conventionally, those conducting the tests will choose the α-level (normally .05 or .01) and then compare it with the p-value in order to reach a decision.
We follow this testing convention, and apply this p-value method to our earlier
problem about ﬁber optics, in which we use the popular 5% level of signiﬁcance.
P-values For Large-Sample, Two-Tailed Tests
In the case of two-tailed tests when the sample is large (n ≥ 30), we compute the p-value by using knowledge of our test statistic (x̄) and the area or region that lies beyond this value. Of course, before finding the area, we should convert the test statistic to its comparable z value so we can use the standard table (Table A in Appendix) to assist us with area percentages. If the test statistic falls to the right of the mean in the normal curve, we consider the upper percentage area from the test statistic to the nearest tail of the curve. If the test statistic falls to the left of the mean, we consider the lower percentage area from the test statistic to the nearest tail of the curve. Since we have a two-tailed test, once we decide which region (upper or lower) is of interest, we double this percentage. We follow this strategy because the value to which we are comparing it, α, has been divided between two tails, so we need to ensure that the p-value in this case represents the total area in both tails. Because the normal curve is symmetric, we can simply add this one-tail area to itself or, equivalently, multiply it by 2. Figure 7.6 provides a visual interpretation for finding p-values in a large-sample, two-tailed scenario:
FIGURE 7.6 Flowchart for Finding P-Values in Two-Tailed Tests (n ≥ 30)
1. Locate the test statistic (x̄).
2. Convert it into a z score: $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
3. Use the z table to determine the area from z to its closest tail: $\tfrac{1}{2}\,p\text{-value} = 0.50 - (\text{area from mean to } z)$
4. Add this value to itself (or multiply it by 2) to get the overall p-value, which signifies the area in both tail regions: $p\text{-value} = 2 \times [0.50 - (\text{area from mean to } z)]$
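The four steps of Figure 7.6 translate directly into code. In this Python sketch, `statistics.NormalDist().cdf` stands in for Table A, and the function name `two_tailed_p_value` is our own. Because the printed table rounds the area to four places (.4192), a hand calculation gives .1616, while the sketch prints a p-value of about .1615; the difference is only rounding.

```python
import math
from statistics import NormalDist

def two_tailed_p_value(xbar, mu, s, n):
    """Follow Figure 7.6: x-bar -> z -> area from mean to z -> double the tail area."""
    se = s / math.sqrt(n)                              # s stands in for sigma when n >= 30
    z = (xbar - mu) / se                               # step 2: convert to a z score
    area_mean_to_z = NormalDist().cdf(abs(z)) - 0.5    # step 3: what Table A lists
    half_p = 0.5 - area_mean_to_z                      # area from z to its closest tail
    return z, 2 * half_p                               # step 4: both tails

z, p = two_tailed_p_value(xbar=0.553, mu=0.560, s=0.030, n=36)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

Using `abs(z)` mirrors the table habit of looking up only positive z values and relying on symmetry for the left tail.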
Once we arrive at our p-value, we essentially have an area under the normal curve, which can be compared to the area of the significance level in the test (α). If the area outlined by the p-value engulfs the area outlined by α (p-value > α), then we accept our null hypothesis (H0). For this type of test, a comparison of areas between the p-value and α, along with decision results, is illustrated in Figure 7.7.
FIGURE 7.7 Using P-Values to Make Decisions (Two-Tailed). The p-value diagram shades the total p-value area beyond −ztest and +ztest (based on the test statistic); the α diagram shades the total α level.
Case One: If p-value > α, we accept H0. Observe that the total p-value area covers the total α area when both curves are placed on top of each other.
Case Two: If p-value < α, we reject H0. Observe that the total α area covers the total p-value area when both curves are placed on top of each other.
Applying P-values to our Example
In our fiber optics example, recall that we had a test statistic of .553 (x̄ = .553) and wish to see if this statistic is “good enough” to support the supplier’s claim that fiber optics shipments, on average, are .560 mm (μ = .560 mm).
For p-values, we conduct the hypothesis test by following three fundamental
sequences:
SOLUTION
Sequence I. Set up initial conditions: H0, H1, and level of significance.
H0: State the null hypothesis, that is, the claim or assertion you wish to test.
In our example: H0: μ = .560 mm
H1: State the alternative hypothesis. In other words, if H0 proves false, then what must we conclude?
In our example: H1: μ ≠ .560 mm
α: State the level of significance, α, that is, the risk of a Type I error (the risk of rejecting H0 in error).
In our example: α = .05 (5%)
Sequence II. Calculate the p-value (see Figure 7.6).
Convert x̄ → z. Since we are using x̄, we have to remember to consider the size of the sample (n). We also note that we should use s (the standard deviation of the sample) as an estimator of σ (the population standard deviation).
$z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{.553 - .560}{.030/\sqrt{36}} = \dfrac{-.007}{.005} = -1.40$
Determine the area from z to its closest tail to get half of the p-value (two-tailed). Using the z table (Table A), we find the area from z = 0 to the converted z score (by symmetry, we look up 1.40). To get half of the p-value, we subtract this area from .50 (or 50%).
Area in Table A (in decimal form), z = 0 to z = 1.40: .4192
$\tfrac{1}{2}\,p\text{-value} = .50 - .4192 = .0808 \;(8.08\%)$
Determine the total p-value. Add the current value to itself (or multiply by 2), then convert to a percent to get the total p-value.
$p\text{-value} = 2(.0808) = .1616 \;(16.16\%)$
Sequence III. Accept or reject H0 by comparing your p-value and α.
Since p-value (.1616) > α (.05), we accept H0.
[Figure: normal curve with cutoffs at −1.40 and +1.40 on either side of μ; each shaded tail holds 8.08%, for a total p = 16.16% (.16)]
Since we achieved a sample result 1.40 standard deviations from the expected value, μ, we shade all the area that is at least 1.40 standard deviations from μ. Note that in a two-tailed test, we shade both tails. Next, we find the probability of achieving a sample result in this shaded area, which is 16.16% (8.08% in each tail). This is our p-value. This is usually expressed in technical reports and computer software printouts as either p = .16 or p > .05 (meaning the probability of achieving a sample x̄ at least this extreme is greater than the α level of the test).
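The shaded-area reasoning can also be checked by brute force. This Python sketch assumes, purely for illustration, a normal population with σ = .030 mm (the text’s actual population is skewed, a simplification the central limit theorem makes reasonable for n = 36). It repeatedly draws samples of 36 while H0 is true and counts how often x̄ lands at least .007 mm from .560 mm in either direction; that share should come out near the 16% found above.

```python
import random
import statistics

random.seed(7)                      # fixed seed so the run is repeatable
mu, sigma, n = 0.560, 0.030, 36     # H0 mean; sigma assumed equal to s (illustrative)
observed_gap = abs(0.553 - mu)      # our sample landed .007 mm from the claim
trials = 20_000

extreme = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    if abs(xbar - mu) >= observed_gap:   # lands in either shaded tail
        extreme += 1

p_est = extreme / trials
print(f"estimated two-tailed p-value: {p_est:.3f}")
```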
For p > α, accept H0; otherwise reject.
Since in our case, .16 > .05, we accept H0.
ANSWER
The statement of the ﬁnal answer using p-values is rather standard in industry and
academic circles. The ﬁnal answer for this speciﬁc example would be presented as follows:
Our p-value of 16% is the probability that, by chance alone, we would get a sample mean at least as extreme as .553 mm (assuming that μ = .560 mm, σx̄ = .005 mm, and a sample of n = 36). Since this probability is relatively large (as compared to α), we do not have enough evidence to reject our null hypothesis. Hence, random chance may very well be the reason for the difference between μ = .560 mm and x̄ = .553 mm. We conclude that the assumption of μ = .560 mm is reasonable. Essentially, there is insufficient evidence to conclude that the mean thickness of the supplier’s fiber-optic threads differs from .560 mm.
METHODS TWO AND THREE: THE CLASSICAL METHODS
Although Method One is the most popular, several researchers still solve hypothesis
tests using the more classical, traditional approaches. Methods Two and Three share Sequences I and II; they differ only in Sequence III. We use these methods to revisit the previous example about fiber-optic threads.
The hypothesis test using the classical methods consists of three fundamental
sequences as follows.
SOLUTION
Sequence I. Set up initial conditions: H0, H1, and level of significance.
H0: State the null hypothesis, that is, the claim or assertion you wish to test.
In our example: H0: μ = .560 mm
H1: State the alternative hypothesis. In other words, if H0 proves false, then what must we conclude?
In our example: H1: μ ≠ .560 mm
α: State the level of significance, α, that is, the risk of a Type I error (the risk of rejecting H0 in error).
In our example: α = .05 (5%)
Sequence II. Assume H0 true, and use α to establish cutoffs as follows:
Calculate σx̄: We must remember we are dealing with x̄’s and therefore must first calculate σx̄, the standard deviation of the x̄ distribution. Note that in our formula for σx̄, we use s (the standard deviation of the sample) as an estimator of σ (the population standard deviation).
$\sigma_{\bar{x}} \approx \dfrac{s}{\sqrt{n}} = \dfrac{.030}{\sqrt{36}} = \dfrac{.030}{6} = .005 \text{ mm}$
Draw Curves: Using our above calculation, σx̄ ≈ .005 mm, we estimate the spread of the x̄ distribution.
[Figure: x̄ distribution with .550, .555, .560, .565, and .570 mm at z scores −2, −1, 0, 1, and 2]
Establish Cutoffs (using α, the level of significance)
Our level of significance in this case is α = .05 (5%), which in a two-tailed test implies we will accept the middle 95% of the x̄’s as our boundary for accepting H0 as true. We now look up the z scores corresponding to the middle 95% of the x̄’s, which turn out to be z = −1.96 and z = +1.96.
Remember: the normal curve table reads half the normal curve, starting from z = 0 out, so we look up ½ of 95%, or 47½%, which in decimal form is .4750. In the table, .4750 appears in row 1.9, column .06, that is, at z = 1.96.
[Figure: x̄ distribution centered on .560 mm (z = 0), with an area of .4750 on each side of the mean covering the middle 95% of x̄’s; the unknown cutoffs x̄ = ? lie at z = −1.96 and z = +1.96]
Substituting the z scores of −1.96 and +1.96 into our formula, we solve for the x̄ at the cutoffs:
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
$-1.96 = \dfrac{\bar{x} - .560}{.005} \;\Rightarrow\; \bar{x} = .560 - 1.96(.005) = .5502 \approx .550 \text{ mm}$
$+1.96 = \dfrac{\bar{x} - .560}{.005} \;\Rightarrow\; \bar{x} = .560 + 1.96(.005) = .5698 \approx .570 \text{ mm}$
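Both steps, reading .4750 backward out of the normal table and solving the z formula for x̄, can be reproduced in Python. In this sketch, `NormalDist().inv_cdf` from the standard library replaces the printed table; the variable names are our own.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, sd 1: the curve behind the normal table
half_area = 0.95 / 2               # table reads from z = 0 outward: .4750
z_crit = std_normal.inv_cdf(0.5 + half_area)

mu, se = 0.560, 0.005              # claimed mean and sigma_xbar from the example
# Solve z = (xbar - mu) / se for xbar at z = -z_crit and z = +z_crit.
lower = mu - z_crit * se
upper = mu + z_crit * se

print(f"z cutoffs: +/-{z_crit:.2f}")
print(f"x-bar cutoffs: {lower:.4f} mm and {upper:.4f} mm "
      f"(about {lower:.3f} and {upper:.3f} mm)")
```

The unrounded cutoffs (.5502 and .5698 mm) round to the .550 and .570 mm used in the text.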
The completed solution would appear graphically as follows:
[Figure: population and x̄ distribution for sample size n = 36; accept H0 between x̄ = .550 mm (z = −1.96) and x̄ = .570 mm (z = +1.96), with the reject zones shaded; axis .500 to .620 mm]
Note that the reject zones are shaded, that is, the zones where we would reject μ = .560 mm as being true. This is your risk of a Type I error (5%).
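The 5% risk of a Type I error can be seen directly by simulating many samples while H0 is actually true. This Python sketch assumes a normal population with μ = .560 mm and σ = .030 mm (our illustrative choice) and applies the text’s procedure to each sample: estimate σx̄ from that sample’s s, compute z, and reject when z falls beyond ±1.96. The rejection rate should land close to 5%; it may come out slightly above 5%, because s rather than σ is used in each sample.

```python
import math
import random
import statistics

random.seed(42)                   # fixed seed so the run is repeatable
mu, sigma, n = 0.560, 0.030, 36   # H0 is true by construction (assumed normal population)
trials = 10_000

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)              # sample standard deviation
    z = (xbar - mu) / (s / math.sqrt(n))
    if abs(z) > 1.96:                         # lands in a shaded reject zone
        rejections += 1

type_i_rate = rejections / trials
print(f"rejected a true H0 in {type_i_rate:.1%} of samples")
```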
Sequence III. Accept or reject H0 using your sample x̄. For this, two methods are available. Method Two uses the actual value of the sample x̄. Method Three uses the z score of the sample x̄. Since each adds to understanding, we shall employ both.
Method Two
This method uses the actual value of the sample x̄ (.553 mm) in the decision-making process.
Criteria: Accept H0 (μ = .560 mm) if your sample x̄ falls between the established x̄ cutoffs of .550 mm and .570 mm; otherwise reject.
Decision: Since our sample x̄ (.553 mm) fell in the acceptance zone for H0, we accept H0 (μ = .560 mm) as true.
Recall: Our sample results were as follows:
n = 36 measurements
x̄ = .553 mm
[Figure: sample x̄ = .553 mm plotted inside the acceptance zone]
Method Three
This method uses the z score of the sample x̄ in the decision-making process. To use this method, however, we must first calculate the z score of our sample x̄ (.553 mm), as follows:
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{.553 - .560}{.005} = -1.40$
[Figure: acceptance zone for H0 between the cutoffs x̄ = .550 mm and x̄ = .570 mm, centered on μ = .560 mm]
Criteria: Accept H0 (μ = .560 mm) if the z score of your sample x̄ falls between the established z score cutoffs of −1.96 and +1.96; otherwise reject.
Decision: Since the z score of our sample x̄ (−1.40) fell in the acceptance zone for H0, we accept H0 (μ = .560 mm) as true.
[Figure: z score of sample x̄ = −1.40, between the cutoffs z = −1.96 and z = +1.96; z = 0 corresponds to .560 mm]
Whether we use the actual value of the sample x or the z score of the sample x ,
we will always make the same decision. In this case, we accept H0. Generally, the z
score is preferred by those most familiar with this statistical technique since the z
score is a more informative measure. Note that we better understand the position of
the sample x if we say it is 1.40 standard deviations from the claimed μ than if
we merely presented its actual value of .553 mm.*
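The claim that both methods always reach the same decision can be checked with a Python sketch. The function names are our own; to make the two comparisons exactly equivalent, Method Two’s cutoffs are computed from the same z value (1.96) instead of being rounded to .550 and .570 mm.

```python
MU, SE, Z_CRIT = 0.560, 0.005, 1.96   # claimed mean, sigma_xbar, alpha = .05 cutoff

def method_two(xbar):
    """Method Two: compare the sample mean itself with the x-bar cutoffs."""
    lower, upper = MU - Z_CRIT * SE, MU + Z_CRIT * SE   # about .5502 and .5698 mm
    return "accept H0" if lower <= xbar <= upper else "reject H0"

def method_three(xbar):
    """Method Three: compare the z score of the sample mean with +/-1.96."""
    z = (xbar - MU) / SE
    return "accept H0" if -Z_CRIT <= z <= Z_CRIT else "reject H0"

for xbar in (0.553, 0.547, 0.561, 0.577):
    print(f"{xbar:.3f} mm: {method_two(xbar)} / {method_three(xbar)}")
```

The two functions agree because converting x̄ to z is a straight-line rescaling, so “inside the x̄ cutoffs” and “inside the z cutoffs” describe the same interval.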
ANSWER
The ﬁnal answer may be presented in a number of ways, depending on the technical
expertise of those reading the report.
a. If the report is to be presented to individuals unfamiliar with statistical
technique, perhaps the following oﬀers a clear approach:
Since the sample average we obtained from the shipment (.553 mm) falls inside the range (.550 mm to .570 mm) where we would most likely expect sample averages to fall if H0 were true, we accept H0: μ = .560 mm, and therefore accept the shipment.
Accept H0
b. However, this same answer may very well appear in a technical report worded
in terms of z scores, as follows:
Since our sample z of −1.40 is not less than −1.96, the null hypothesis cannot be rejected. The difference between .553 mm and .560 mm is not large enough to provide evidence at the .05 level of significance that the shipment does not meet the supplier’s specification.
Null hypothesis cannot be rejected
c. Then again, many reports simply present the results as z = −1.40 (not significant).
Results not significant*
Believe it or not, all three answers say the same thing. Try to understand the
technical explanations using z scores, since this is typical of how research
reports are presented.
CONTROL CHARTS
In production studies and occasionally in marketing, medical, and other studies, the
same hypothesis test may be repeated a number of times. For instance, what if this telephone cable manufacturer in the prior problem were to accept this shipment of fiber-optic thread and then order additional fiber-optic thread under the same specifications, to be delivered once a month for several months? Each monthly
shipment may very well be tested in an identical manner. When essentially the same
test must be repeated on a periodic basis, a control chart can be set up as follows,
which exploits techniques used in the classical or traditional methods (Methods
Two and Three):
Construction of Control Chart
1. On a graph, establish cutoﬀs for a given hypothesis test. In industrial
production, cutoﬀs are usually referred to as control limits.
2. Rotate the graph ¼ turn counterclockwise, extending the cutoff lines to the right. Shade the rejection zone.
3. Plot each sample x̄ sequentially to the right. Connect each x̄ to the prior result with a line segment.
In a control chart, you may choose to use either actual values or z scores to represent
the readings. For instance, say we use actual values. We would proceed (using our
ﬁber-optic thread example) as shown in Figure 7.8.
Note that one sample x̄ (.547 mm) was marked “significant.” This means, based on this one sample average, we would reject this particular shipment as not meeting specifications. At this point, the production supervisor would likely be called in. After verifying results, the supervisor may very well call the manufacturer of the
*Again, the words not significant have a very special meaning in statistical testing. Essentially, not significant means: the sample results (in this case, x̄ = .553 or z = −1.40) are considered “chance fluctuation.” In other words, we would expect to find x̄’s within ±1.96 standard deviations of the mean if H0 were true. Since the z score (−1.40) of our sample x̄ was in this chance fluctuation range between −1.96 and +1.96 standard deviations, it is deemed not significant and we accept H0.
FIGURE 7.8 A Two-Sided Control Chart (fiber-optic thread example). Cutoffs established, taken from prior example: x̄ = .550 and x̄ = .570, centered on .560. Rotate ¼ turn counterclockwise, extending the cutoff lines to the right and shading the rejection zone. The plotted sample x̄’s (.553, .561, .547, .554, .556) are connected in sequence; .547 falls in the rejection zone (significant).
Now let’s say we receive 5 shipments over several months and calculate the sample x̄ for each as follows:
x̄ = .553 mm
x̄ = .561 mm
x̄ = .547 mm (significant)
x̄ = .554 mm
x̄ = .556 mm
Each sample x̄ is plotted sequentially as the shipment comes in and connected with a line segment to the prior result (as shown above).
ﬁber-optic thread to inform them that their process was not meeting speciﬁcation,
and most likely “out of control.” A process is deemed out of control when sample x̄’s fall outside the control limits for acceptance of H0 and we suspect a possible deterioration of the process.
Note that a control chart provides a clear visual history of this hypothesis test. Often
we learn more about a process by keeping this kind of record. Sometimes we can
spot a trend, a process going out of control before a significant sample x̄ is achieved. Or we may be able to pick up slight shifts in the value of μ, even though sample x̄’s are in control. For a process in control, the sample x̄’s should fluctuate (usually in a ragged pattern) around the value of μ. Notice that the x̄’s we calculated, .553, .561, .547, .554, and .556, seem to fluctuate around a value closer to .555 than to .560. If this trend continues for future shipments, we may very well suspect that the fiber-optic thread shipped actually averages μ = .555 mm. Of
course, whether or not this slight shift makes a diﬀerence in our production would
have to be assessed.
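The bookkeeping behind Figure 7.8 is easy to automate. This Python sketch is our own illustration using the five sample x̄’s and the control limits from the example (the names `LOWER`, `UPPER`, and `shipments` are ours); it flags any reading outside the limits and reports the average of the plotted x̄’s, which is one simple way to notice the drift toward .555 mm described above.

```python
import statistics

LOWER, UPPER = 0.550, 0.570                      # control limits from the worked example
shipments = [0.553, 0.561, 0.547, 0.554, 0.556]  # sample x-bars in arrival order (mm)

for number, xbar in enumerate(shipments, start=1):
    status = "in control" if LOWER <= xbar <= UPPER else "significant (outside limits)"
    print(f"shipment {number}: x-bar = {xbar:.3f} mm  {status}")

center = statistics.fmean(shipments)
print(f"average of plotted x-bars: {center:.4f} mm (claimed mu = .560 mm)")
```

Here the average of the plotted x̄’s is .5542 mm; whether a shift of that size matters for production is, as the text says, a separate judgment.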
A control chart* provides a clear visual history of a repetitive test.
*Historical note: Walter Shewhart first developed control charts in 1924; they were tested and developed within the Bell Telephone System, 1926–1931. For further historical reading on this topic, refer to W. Peters, Counting for Something (New York: Springer-Verlag, 1987), Chapter 16, “Quality Control,” pp. 151–162.