CHAPTER 7

7.1 Two-Tailed Hypothesis Tests (Large Sample, n ≥ 30)

Hypothesis testing is nothing more than a formalized approach to the central limit theorem, incorporating the concepts of accept/reject decision making and Type I error. Let's see how it works in the following problem.

Suppose the Fiche Company (a manufacturer of telephone cable) receives shipments of fiber-optic thread, hair-thin strands of glass capable of transmitting hundreds of thousands of times more information than a copper wire. The Fiche Company will ultimately coat the fiber-optic threads with steel and plastic and bind several into cables to be laid on ocean floors for intercontinental communications. However, it is important for production purposes that the incoming shipments of hair-thin glass fiber thread maintain an average thickness of .560 mm. Of course, the supplier of the thread claims this is so.

Claim: μ = .560 mm (thickness, that is, diameter of thread)

This is a typical situation in business. A supplier ships you goods and makes a claim with the expectation that you will believe that claim. In this case, the claim is: the average thickness of fiber-optic thread in the shipment is .560 mm. In statistical terms, we call this a hypothesis.

A hypothesis, then, is merely a claim put forth by someone. This hypothesis or claim is denoted by the symbol H0 (H-sub-zero) and referred to formally as the null hypothesis.*

In this case, our claim or null hypothesis would be written

H0: μ = .560 mm

This null hypothesis may or may not be true. The supplier may have documented evidence for making such a claim, or may simply be guessing. In fact, for all we know, the supplier may be lying outright, which of course obliges us as prudent individuals to test the claim.

The alternative hypothesis is the opposite of the null hypothesis. It represents a negation of the original claim of the null hypothesis.
*Technical note: Actually, the symbol H0 originates from tests involving the comparison of two population means or ratios; however, the symbol H0 has now evolved to represent any hypothesis set up for the purposes of seeing if it can be rejected.

The alternative hypothesis is denoted by H1 and is the argument that refutes H0. A test that analyzes H0 as it relates to H1 is referred to as a hypothesis test.

Hypothesis Test: A test designed to prove or disprove some initial claim, your null hypothesis, H0.

When dealing with a hypothesis test, we always begin by assuming the claim or null hypothesis (H0) is true, in this case that the supplier is correct, that indeed the average thickness is μ = .560 mm for these shipments of fiber-optic thread.

We begin a hypothesis test by assuming H0 is true.

Indeed, if we accept H0: μ = .560 mm as true (which we must to begin a hypothesis test), then we know from decades of experience that a certain logic will necessarily follow. Namely, if we were to measure the thickness of all the glass fiber in the shipment and arrange these measurements according to size into a histogram, the measurements would probably cluster about the average value of μ = .560 mm. Many measurements would be less than .560 mm and many would be more, and the histogram might take on the following shape.

FIGURE 7.1 Population Histogram: Glass Fiber Thickness (μ = .560 mm)

Notice this population is somewhat ragged in shape with a slight skew. Although in real life we may not actually know the shape of the population prior to sampling, it would not be unusual for such a ragged, skewed shape to appear.
Although the output from one process or machine, properly operating and running uninterrupted, is often found to be normally or nearly normally distributed, an entire shipment may very well consist of output from several machines or processes over several periods of time and, thus, could vary considerably. When the outputs from various processes are mixed, a normal distribution may or may not form, depending on a number of factors. However, this should not make a difference in our analysis of μ, since whatever the shape of your population, as long as the sample size exceeds 30, the x̄ distribution will be approximately normally distributed, as follows:

FIGURE 7.2 Emergence of Sampling Distribution: Glass Fiber Thickness (x̄ distribution: several thousand sample averages which represent the total, centered at μ = .560 mm)

However, we do have another problem. Noticeably absent from the above histogram is information concerning the standard deviation of this population, σ, which in real-life situations is often not supplied. In fact, more often than not, it is simply unknown. However, without σ we cannot calculate σ_x̄. Remember:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

And without σ_x̄, we cannot estimate the spread of our x̄ distribution, which tells us where we should expect sample averages (x̄'s) to cluster, and which of course forms the entire basis of our central limit theorem analysis. In other words, we are stuck!

But wait, the problem is not insurmountable. We have learned from prior exercises that when we randomly select 30 or more measurements from a population,

x̄ ≈ μ: the sample average, x̄, is approximately equal to the population average, μ, and

s ≈ σ: the sample standard deviation, s, is approximately equal to the population standard deviation, σ.
If indeed s ≈ σ, that is, if the individual measurements in one sample are spread out in a manner similar to how the measurements in the entire population are spread out, we may be able to use the standard deviation of one sample, s, as an estimator of the standard deviation of the entire population, σ. Experience has confirmed that when your sample size is over 30, the spread of measurements in one sample is indeed a good estimator of the spread of measurements in the entire population. That is, s is a good estimator of σ, and this is precisely what is done in industry and research studies.

s is used to estimate σ.

Since the standard deviation of one sample should give us what we want to know, namely, an approximation of σ, the standard deviation of the population, the telephone cable manufacturer is obliged on receiving the shipment to take a random sample. Although many results are possible, let us say, for the purposes of this example, that the manufacturer randomly samples 36 pieces of fiber-optic thread and calculates the following:

n = 36 measurements
x̄ = .553 mm
s = .030 mm

If this is indeed a properly conducted random sample, the spread (standard deviation) of the 36 measurements should be similar to the spread (standard deviation) of the entire population. That is, if s = .030 mm (note sample results above) and if s ≈ σ, then σ must be approximately equal to .030 mm. And we can use this estimate to calculate σ_x̄, as follows:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} = \frac{.030}{\sqrt{36}} = \frac{.030}{6} = .005 \text{ mm} \]

Now that we know σ_x̄ is approximately equal to .005 mm, we can estimate the spread of the x̄ distribution.
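The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation, not part of the original text; the function name standard_error is a hypothetical helper chosen for this sketch, and it assumes, as the text does, that s is an acceptable stand-in for σ when the sample is large.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Estimate the standard deviation of the x-bar distribution as s / sqrt(n).

    Assumption (from the text): for a large sample, the sample standard
    deviation s is used as an estimator of the population value sigma.
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return s / math.sqrt(n)

# Fiber-optic sample from the text: n = 36 measurements, s = .030 mm
sigma_x_bar = standard_error(0.030, 36)
print(f"estimated standard deviation of x-bar: {sigma_x_bar:.3f} mm")
```

With the sample values from the text, the result rounds to .005 mm, matching the hand calculation.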
FIGURE 7.3 A Look at Spread in Sampling Distribution. Population histogram: millions and millions of individual measurements of glass fiber arranged according to thickness (μ = .560 mm, σ ≈ .030 mm). x̄ distribution: several thousand sample averages which represent the total (sample size n = 36, μ_x̄ = .560 mm, σ_x̄ = .005 mm). The axis runs from .500 mm to .620 mm, with .550, .555, .565, and .570 marked on the x̄ distribution.

Keep in mind, what we have done so far is a make-believe construction based solely on the assumption that the supplier's claim μ = .560 mm is true. We really do not know whether μ = .560 mm is true or not. We are merely saying: "if" μ = .560 mm is true, and "if" we were to measure every piece of fiber in the shipment, and "if" we continually took random samples of 36 measurements and calculated the sample average, x̄, for each sample, then the central limit theorem tells us that the x̄'s should form into a normally distributed x̄ distribution, symmetrical about μ = .560 mm and spread out as shown above.

Okay, now that we know what the x̄ distribution should look like if the supplier's claim is true, how do we prove (or disprove) μ = .560 mm? Simple. We take a random sample of 36 measurements from our shipment, calculate the sample average, x̄, and observe if this x̄ reasonably fits into the expected x̄ distribution.*

Wait a minute. We already took a random sample of 36 measurements. True. There's no point spending time and money on another sample. Let's use the x̄ we observed from the earlier sample. If you recall, our sample results were as follows (reprinted here for convenience):

n = 36 measurements
x̄ = .553 mm ← (Now we are interested in this measurement)
s = .030 mm

Notice that, now, we are concerned with the x̄ of the sample. In other words, does this x̄ of .553 mm reasonably fit into our expected x̄ distribution? And the answer is, yes.
We can look at this sample average of .553 mm and look at the x̄ distribution and see that this x̄ of .553 mm is a reasonably likely occurrence. Observe:

FIGURE 7.4 The Test Statistic. The sample average x̄ = .553 mm is marked on the x̄ distribution, which is centered at .560 mm with .550, .555, .565, and .570 marked on an axis running from .500 mm to .620 mm.

Since an x̄ of .553 mm would be a reasonably likely occurrence, we conclude that the supplier's claim (μ = .560 mm) is quite possible. If we choose to make a firm accept H0 or reject H0 decision, then we

Accept H0: μ = .560 mm

In reality, there is not enough evidence to prove μ is precisely .560 mm. The best we can show is that μ = .560 mm is reasonably possible given the evidence of this one sample. The concept of hypothesis testing is much like a jury trial: μ = .560 mm is innocent (accepted) unless proven guilty. Since a sample average of x̄ = .553 mm does not prove the supplier's claim false, we must assume the supplier's claim is true.

*In hypothesis testing, we call the sample x̄ the test statistic because we use it when determining where it reasonably fits.

Professionally, this conclusion is written in a number of ways. Two of the most popular are:

The null hypothesis cannot be rejected

or

Results not significant

Both statements say the same thing, that is, if we use the accept H0/reject H0 format, then we must accept the supplier's claim (μ = .560 mm), since we have no evidence to disprove the claim. My preference is to word the conclusion as follows:

Since the sample average of x̄ = .553 mm reasonably fits into the expected x̄ distribution for μ = .560 mm, we Accept H0: μ = .560 mm

The words not significant have a very special meaning in statistical testing. They mean the results may reasonably be attributed to "chance fluctuation." In other words, x̄'s may very well vary, fluctuating by chance, between .550 mm and .570 mm when μ = .560 mm. Since we achieved an x̄ (.553 mm) in this chance fluctuation range, we merely accept H0.
In broad terms, when sample results are

Not significant: we accept H0
Significant: we reject H0

Now you might feel a little uncomfortable accepting H0 since your sample average (.553 mm) did not fall precisely on the claimed population value of .560 mm. And at this point you might say, why don't we continue sampling to be more positive of our decision? Unfortunately, in most areas of research, further sampling is not practical. It is usually expensive, time-consuming, and in some cases physically impossible (when test circumstances cannot be duplicated). Certainly in this production control experiment, another random sample can be taken with relative ease; however, in most studies in marketing, medicine, sociology, economics, and other fields, we often must rely on the results of one and only one sample. Even in this production control experiment, no one wants to absorb the added time and expense of further sampling unless absolutely necessary. In other words, in statistical studies, we normally base our decision on one and only one sample. And we will conform to this practice in this text.

So, to sum up our experiment, if our one sample average, x̄, is reasonably close to the claimed μ, we accept H0 as true and therefore accept the shipment of fiber-optic thread as meeting our specification of μ = .560 mm.

However, this may raise some questions, such as: at what point do we grow suspicious that our sample x̄ is not reasonably close to μ? For instance, what if our sample average turned out to be .550 mm or .540 mm or .577 mm? Clearly, these values are on the very fringe of the "expected" sample averages. Observe:

(Figure: the x̄ distribution, centered at .560 mm, with .550, .555, .565, and .570 marked on an axis running from .500 mm to .620 mm.)

In other words, at what value of x̄ do we begin to grow suspicious that maybe the supplier's claim is false? Fortunately, there are certain industry standards that have proven reliable over decades of use.
Although a number of industry standards exist, one of the most popular is the

Level of significance, α = 5% (.05)

Although discussed in the last chapter, a brief review here might be helpful. Essentially, a level of significance sets up the cutoffs, or boundaries, for accepting or rejecting H0. For instance:

For level of significance α = 5% (.05),* establish where the middle 95% of the x̄'s are expected to fall if H0 is true. Then, if the x̄ you calculate from your random sample falls inside (or exactly on the border) of this 95% range, accept H0 as true. If the sample x̄ falls outside, assume H0 is false.

Visually we might present this α = .05 hypothesis test as follows:

FIGURE 7.5 α = .05 Two-Tailed Test: Acceptance/Rejection Regions. Accept H0 for the middle 95% of x̄'s, centered on μ; reject H0 in the two shaded tails, which together hold 5% of the x̄'s.

Two-Tailed Test: This is called a two-tailed hypothesis test since we have two tails of rejection (as shown shaded above). That is, we would reject the null hypothesis for any sample x̄ falling in either of the two shaded tails.

*Actually, many levels of significance are possible.

To recap: if your sample x̄ falls inside this 95% range (or on the border), accept H0. If your sample x̄ falls outside this range (that is, in the shaded tails), reject H0. And this is precisely what is done in industry and research.

When making decisions concerning these large-sample, two-tailed hypothesis tests, we can choose among three methods: the P-value method and two versions of the classical (traditional) method. Since Method One is the most popular, we go into detail concerning its development and apply its meaning to our example. We then discuss both Methods Two and Three.

Here is a problem as it would be worded and solved in practice.
EXAMPLE

A supplier claims the average thickness (diameter) of its fiber-optic thread is .560 mm. You receive a shipment and decide to test the claim at a .05 level of significance by taking a sample of 36 randomly selected measurements, with the following results:

n = 36 measurements
x̄ = .553 mm
s = .030 mm

What can we conclude?

METHOD ONE: THE P-VALUE METHOD

Perhaps the most popular way to accept or reject our null hypothesis (H0) is to use the statistic called the p-value. For hypothesis tests of μ (and other statistical tests), computer printouts often provide a p-value in the results. We discuss here the basic notion of the p-value and apply it to help us solve the current problem at hand.

DEFINITION

A p-value refers to the probability of obtaining a sample average (x̄) at least as extreme as the one found from the sample data, given that the null hypothesis is true. We generally denote it as p or as p-value.

For instance, for a sample x̄ with a p-value of p = .0012, the probability of obtaining a sample average x̄ at least as extreme as this sample x̄, given the null hypothesis is true, is .0012 (.12%). In such situations, we compare this p-value of the sample x̄ (.0012) to the level of significance (α) for the experiment to determine whether we accept or reject the null hypothesis.

For p greater than α, accept H0; otherwise reject.

Notice that in this instance, if α = .05, we would reject H0, since p was less than α (.0012 < .05, or .12% < 5%).

Generally, if the p-value is

a. greater than .05, there is insufficient evidence against H0 (not significant)
b. between .01 and .05, there is ample evidence against H0 (significant)
c. between .001 and .01, there is strong evidence against H0 (highly significant)
d. less than .001, there is very strong evidence against H0 (very highly significant)

However, conventionally, those conducting the tests will choose the α-level (normally .05 or .01) and then compare it to the p-value in order to reach a decision. We follow this testing convention and apply this p-value method to our earlier problem about fiber optics, in which we use the popular 5% level of significance.

P-values for Large-Sample, Two-Tailed Tests

In the case of two-tailed tests when the sample is large (n ≥ 30), we compute the p-value by using knowledge of our test statistic (x̄) and the area or region that exceeds this value. Of course, before finding the area, we should convert the test statistic to its comparable z value so we can use the standard table (Table A in the Appendix) to assist us with area percentages. If the test statistic falls to the right of the mean in the normal curve, we consider the upper percentage area from the test statistic to the nearest tail of the curve. If the test statistic falls to the left of the mean in the normal curve, we consider the lower percentage area from the test statistic to the nearest tail of the curve.

Since we have a two-tailed test, once we decide which region (upper or lower) is of interest, we double the percentage value found. We follow this strategy because the value to which we are comparing it, α, has been divided into two tails. So we need to ensure that the p-value in this case represents the total area in both tails. Thanks to the symmetry of the normal curve, we can add the one-tail value to itself or, equivalently, multiply it by 2.
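The two-tailed procedure just described can be sketched in Python. This is an illustrative sketch, not the text's own method of working: it replaces the Table A lookup with the error function from Python's standard library, so the result may differ from the table-based figure in the last decimal place. The function names normal_cdf, two_tailed_p_value, and evidence_label are hypothetical helpers, and the treatment of values falling exactly on a boundary in evidence_label is an assumption.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative probability, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_tailed_p_value(x_bar: float, mu_0: float, s: float, n: int) -> tuple[float, float]:
    """Return (z, p) for a large-sample, two-tailed test of H0: mu = mu_0."""
    std_error = s / math.sqrt(n)          # s stands in for sigma (large sample)
    z = (x_bar - mu_0) / std_error        # convert the test statistic to a z score
    one_tail = 1.0 - normal_cdf(abs(z))   # area from z out to its closest tail
    return z, 2.0 * one_tail              # double it to cover both tails

def evidence_label(p: float) -> str:
    """Map a p-value to the evidence categories a through d listed in the text."""
    if p > 0.05:
        return "not significant"
    if p > 0.01:
        return "significant"
    if p > 0.001:
        return "highly significant"
    return "very highly significant"

# Fiber-optic example: x-bar = .553 mm, claimed mu = .560 mm, s = .030 mm, n = 36
z, p = two_tailed_p_value(0.553, 0.560, 0.030, 36)
alpha = 0.05
decision = "accept H0" if p > alpha else "reject H0"
print(f"z = {z:.2f}, p-value = {p:.4f} ({evidence_label(p)}): {decision}")
```

For the fiber-optic sample this gives a z score of about -1.40 and a p-value of roughly .16, well above α = .05, so the sketch reaches the same accept decision as the table-based calculation.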
Figure 7.6 provides a visual interpretation for finding p-values in a large-sample, two-tailed scenario:

FIGURE 7.6 Flowchart for Finding P-Values in Two-Tailed Tests (n ≥ 30)

1. Locate the test statistic (x̄).
2. Convert it into a z score:
\[ z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
3. Use the z table to determine the area from z to its closest tail:
\[ \tfrac{1}{2}\,p\text{-value} = 0.50 - (\text{area from mean to } z) \]
4. Add this value to itself (or multiply the value by 2) to get the overall p-value, which signifies the area in both tail regions:
\[ p\text{-value} = 2 \times [0.50 - (\text{area from mean to } z)] \]

Once we arrive at our p-value, we essentially have an area under the normal curve, which can be compared to the area of the significance level in the test (α). If the area outlined by the p-value engulfs the area outlined by α (p-value > α), then we accept our null hypothesis (H0). For this type of test, a comparison of areas between the p-value and α, along with decision results, is illustrated in Figure 7.7.

FIGURE 7.7 Using P-values to Make Decisions (Two-Tailed). The p-value diagram shades the total p-value area beyond −z_test and +z_test (based on the test statistic); the α diagram shades the total α level. Case One: if p-value > α, we accept H0 (the total p-value area covers the total α area when the two curves are placed on top of each other). Case Two: if p-value < α, we reject H0 (the total α area covers the total p-value area when the two curves are placed on top of each other).

Applying P-values to Our Example

In our fiber optics example, we recall that we had a test statistic of .553 (x̄ = .553) and wish to see if this statistic is "good enough" to support the supplier's claim that fiber optics shipments, on average, are .560 mm (μ = .560 mm).
For p-values, we conduct the hypothesis test by following three fundamental sequences:

SOLUTION

Sequence I. Set up initial conditions: H0, H1, and level of significance.

H0: State the null hypothesis, that is, the claim or assertion you wish to test. In our example: H0: μ = .560 mm
H1: State the alternative hypothesis. In other words, if H0 proves false, then what must we conclude? In our example: H1: μ ≠ .560 mm
α: State the level of significance, α, that is, the risk of a Type I error (the risk of rejecting H0 in error). In our example: α = .05 (5%)

Sequence II. Calculate the p-value (see Figure 7.6).

Convert x̄ → z. Since we are using x̄, we have to remember to consider the size of the sample (n). We also note that we should use s (the standard deviation of the sample) as an estimator of σ (the population standard deviation).

\[ z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{.553 - .560}{.030/\sqrt{36}} = \frac{-.007}{.005} = -1.40 \]

Determine the area from z to its closest tail to get half of the p-value (two-tailed). Using the z table (Table A), we read the percentage from z = 0 to the converted z score. To get half of the p-value, we subtract this table area from .50 (or 50%).

Percentage in Table A (in decimal form), from 0 to z = 1.40: 0.4192

\[ \tfrac{1}{2}\,p\text{-value} = .50 - .4192 = .0808 \quad (8.08\%) \]

Determine the total p-value. Add the current value to itself (or multiply by 2). Convert to percent to get the total p-value.

\[ p\text{-value} = 2(.0808) = .1616 \quad (16.16\%) \]

Sequence III. Accept or reject H0 by comparing your p-value and α.

Since p-value > α (16.16% > 5%), we accept H0.

(Figure: normal curve centered at μ with cutoffs at −1.40 and +1.40; 8.08% is shaded in each tail, for p = 16.16%, or 0.16.)

Since we achieved a sample result 1.40 standard deviations from the expected value, μ, we shade all the area that is at least 1.40 standard deviations from μ. Note that in a two-tailed test, we shade both tails. Next, we found the probability of achieving a sample result in this shaded area, which is 16.16% (8.08% in each tail). This is our p-value.
This is usually expressed in technical reports and computer software printouts as either p = .16 or p > .05 (meaning the probability of achieving a sample x̄ this extreme is greater than the α level of the test).

For p > α, accept H0; otherwise reject.

Since in our case .16 > .05, we accept H0.

ANSWER

The statement of the final answer using p-values is rather standard in industry and academic circles. The final answer for this specific example would be presented as follows:

Our p-value of 16% is the probability by chance that we get a sample mean as extreme as .553 (assuming that μ = .560 mm, σ_x̄ = .005 mm, and with a sample of n = 36). Since this probability is relatively large (as compared to α), we do not have enough evidence to reject our null hypothesis. Hence, random chance may very well be a likely reason for the difference between μ = .560 and x̄ = .553. We conclude that the assumption of μ = .560 mm is probably correct. Essentially, there is insufficient evidence to conclude that the mean thickness of the supplier's fiber-optic threads differs from .560 mm.

METHODS TWO AND THREE: THE CLASSICAL METHODS

Although Method One is the most popular, several researchers still solve hypothesis tests using the more classical, traditional approaches. Methods Two and Three follow the same path through Sequences I and II of the process. They differ in Sequence III. We use these methods to attack the previous example about fiber-optic threads again. The hypothesis test using the classical methods consists of three fundamental sequences, as follows.

SOLUTION

Sequence I. Set up initial conditions: H0, H1, and level of significance.

H0: State the null hypothesis, that is, the claim or assertion you wish to test. In our example: H0: μ = .560 mm
H1: State the alternative hypothesis. In other words, if H0 proves false, then what must we conclude? In our example: H1: μ ≠ .560 mm
α: State the level of significance, α, that is, the risk of a Type I error (the risk of rejecting H0 in error). In our example: α = .05 (5%)

Sequence II. Assume H0 true, and use α to establish cutoffs as follows:

Calculate σ_x̄: We must remember we are dealing with x̄'s and therefore must first calculate σ_x̄, the standard deviation of the x̄ distribution. Note that in our formula for σ_x̄, we used s (the standard deviation of the sample) as an estimator of σ (the population standard deviation).

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} = \frac{.030}{\sqrt{36}} = \frac{.030}{6} = .005 \text{ mm} \]

Draw Curves: Using our above calculation, σ_x̄ ≈ .005 mm, we estimate the spread of the x̄ distribution.

(Figure: x̄ distribution with .550, .555, .560, .565, and .570 mm marked, corresponding to z scores of −2, −1, 0, 1, and 2.)

Establish Cutoffs (using α, the level of significance): Our level of significance in this case is α = .05 (5%), which in a two-tailed test implies we will accept the middle 95% of the x̄'s as our boundary for accepting H0 as true. We now look up the z scores corresponding to the middle 95% of the x̄'s, which turn out to be z = −1.96 and z = +1.96. Remember: the normal curve table reads half the normal curve, starting from z = 0 out, so we look up ½ of 95%, or 47½%, which in decimal form is .4750.

(Figure: x̄ distribution centered at .560 mm (z = 0), with the middle 95% of x̄'s split into two areas of .4750 each, bounded by the unknown cutoffs x̄ = ? at z = −1.96 and x̄ = ? at z = +1.96. Normal Curve Table: the row z = 1.9 and the column .06 meet at .4750.)

Substituting the z scores of −1.96 and +1.96 into our formula, we solve for the x̄ at the cutoffs.

\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]

Lower cutoff:
\[ -1.96 = \frac{\bar{x} - .560}{.005} \quad\Rightarrow\quad \bar{x} = .560 - 1.96(.005) = .5502 \approx .550 \text{ mm} \]

Upper cutoff:
\[ +1.96 = \frac{\bar{x} - .560}{.005} \quad\Rightarrow\quad \bar{x} = .560 + 1.96(.005) = .5698 \approx .570 \text{ mm} \]

The completed solution would appear graphically as follows:

(Figure: the population and, within it, the x̄ distribution for sample size n = 36, on an axis running from .500 mm to .620 mm and centered at .560 mm. Accept H0 between the cutoffs x̄ = .550 mm (z = −1.96) and x̄ = .570 mm (z = +1.96).)

Note that the reject zones are shaded, that is, the zones where we would reject μ = .560 mm as being true. This is your risk of a Type I error (5%).

Sequence III.
Accept or reject H0 using your sample x̄: For this, two methods are available. Method Two uses the actual value of the sample x̄. Method Three uses the z score of the sample x̄. Since each adds to understanding, we shall employ both.

Recall: Our sample results were as follows:

n = 36 measurements
x̄ = .553 mm

Method Two

This method uses the actual value of the sample x̄ (.553) in the decision-making process.

Criteria: Accept H0 (μ = .560 mm) if your sample x̄ falls between the established x̄ cutoffs of .550 mm and .570 mm; otherwise reject.

Decision: Since our sample x̄ (.553 mm) fell in the acceptance zone for H0, we accept H0 (μ = .560 mm) as true.

(Figure: sample x̄ = .553 mm falls in the Accept H0 zone between the cutoffs x̄ = .550 mm and x̄ = .570 mm, centered at μ = .560 mm.)

Method Three

This method uses the z score of the sample x̄ in the decision-making process. To use this method, however, we must first calculate the z score of our sample x̄ (.553 mm), as follows.

\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{.553 - .560}{.005} = -1.40 \]

Criteria: Accept H0 (μ = .560 mm) if the z score of your sample x̄ falls between the established z score cutoffs of −1.96 and +1.96; otherwise reject.

Decision: Since the z score of our sample x̄ (−1.40) fell in the acceptance zone for H0, we accept H0 (μ = .560 mm) as true.

(Figure: the z score of the sample x̄, −1.40, falls in the Accept H0 zone between the cutoffs z = −1.96 and z = +1.96, centered at z = 0 (.560 mm).)

Whether we use the actual value of the sample x̄ or the z score of the sample x̄, we will always make the same decision. In this case, we accept H0. Generally, the z score is preferred by those most familiar with this statistical technique, since the z score is a more informative measure.
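The classical procedure (establish the cutoffs, then decide by actual value or by z score) can be sketched in Python as follows. This is a minimal illustration under the text's assumptions: a large sample, s used in place of σ, and the tabled value 1.96 for α = .05 in a two-tailed test. The function name classical_two_tailed_test and its return layout are hypothetical choices for this sketch.

```python
import math

def classical_two_tailed_test(x_bar, mu_0, s, n, z_cutoff=1.96):
    """Return cutoffs, the sample z score, and both accept/reject decisions.

    z_cutoff = 1.96 is the tabled value for the middle 95% (alpha = .05).
    A sample falling exactly on a cutoff is accepted, as in the text.
    """
    std_error = s / math.sqrt(n)              # estimate of sigma of x-bar
    lower = mu_0 - z_cutoff * std_error       # x-bar at z = -1.96
    upper = mu_0 + z_cutoff * std_error       # x-bar at z = +1.96
    z = (x_bar - mu_0) / std_error            # z score of the sample x-bar
    accept_by_value = lower <= x_bar <= upper     # actual-value criterion
    accept_by_z = -z_cutoff <= z <= z_cutoff      # z-score criterion
    return lower, upper, z, accept_by_value, accept_by_z

lower, upper, z, by_value, by_z = classical_two_tailed_test(0.553, 0.560, 0.030, 36)
print(f"cutoffs: {lower:.3f} mm to {upper:.3f} mm")
print(f"z score of sample: {z:.2f}")
print("actual-value decision:", "accept H0" if by_value else "reject H0")
print("z-score decision:", "accept H0" if by_z else "reject H0")
```

Both criteria are the same inequality written in different units, which is why the two decisions agree for the fiber-optic sample.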
Note that we better understand the position of the sample x̄ if we say it is 1.40 standard deviations from the claimed μ than if we merely presented its actual value of .553 mm.

ANSWER

The final answer may be presented in a number of ways, depending on the technical expertise of those reading the report.

a. If the report is to be presented to individuals unfamiliar with statistical technique, perhaps the following offers a clear approach:

Accept H0: Since the sample average we obtained from the shipment (.553 mm) falls inside the range (.550 to .570) where we would most likely expect sample averages to fall if H0 were true, we accept H0: μ = .560 mm, and therefore accept the shipment.

b. However, this same answer may very well appear in a technical report worded in terms of z scores, as follows:

Null hypothesis cannot be rejected: Since our sample z of −1.40 is not less than −1.96, the null hypothesis cannot be rejected. The difference between .553 mm and .560 mm is not large enough to provide evidence at the .05 level of significance that the shipment does not meet the supplier's specification.

c. Then again, many reports simply present the results as

Results not significant:* z = −1.40 (not significant).

Believe it or not, all three answers say the same thing. Try to understand the technical explanations using z scores, since this is typical of how research reports are presented.

CONTROL CHARTS

In production studies and occasionally in marketing, medical, and other studies, the same hypothesis test may be repeated a number of times. For instance, what if this telephone cable manufacturer in the prior problem were to accept this shipment of fiber-optic thread and then order additional fiber-optic thread under the same specifications, to be delivered once a month for several months? Each monthly shipment may very well be tested in an identical manner.
When essentially the same test must be repeated on a periodic basis, a control chart can be set up as follows, which exploits techniques used in the classical or traditional methods (Methods Two and Three):

Construction of Control Chart

1. On a graph, establish cutoffs for a given hypothesis test. In industrial production, cutoffs are usually referred to as control limits.
2. Rotate the graph ¼ turn counterclockwise, extending the cutoff lines to the right. Shade the rejection zone.
3. Plot each sample x̄ sequentially to the right. Connect each x̄ to the prior result with a line segment.

In a control chart, you may choose to use either actual values or z scores to represent the readings. For instance, say we use actual values. We would proceed (using our fiber-optic thread example) as shown in Figure 7.8.

FIGURE 7.8 A Two-Sided Control Chart (fiber-optic thread example). First panel: cutoffs established, taken from the prior example (x̄ = .550 and x̄ = .570, centered at .560). Second panel: the graph rotated ¼ turn counterclockwise, with the cutoff lines at .550 and .570 extended to the right and the rejection zone shaded. Now let's say we receive 5 shipments over several months and calculate the sample x̄ for each as follows:

x̄ = .553 mm
x̄ = .561 mm
x̄ = .547 mm (significant)
x̄ = .554 mm
x̄ = .556 mm

Each sample x̄ is plotted sequentially as the shipment comes in and connected with a line segment to the prior result.

Note that one sample x̄ (.547 mm) was marked "significant." This means, based on this one sample average, we would reject this particular shipment as not meeting specifications. At this point, the production supervisor would likely be called in. After verifying results, the supervisor may very well call the manufacturer of the fiber-optic thread to inform them that their process was not meeting specification and was most likely "out of control." A process is deemed out of control when sample x̄'s fall outside the control limits for acceptance of H0 and we suspect a possible deterioration of the process.

*Again, the words not significant have a very special meaning in statistical testing. Essentially, not significant means: the sample results (in this case, x̄ = .553 or z = −1.40) are considered "chance fluctuation." In other words, we would expect to find x̄'s within ±1.96 standard deviations of the mean if H0 were true. Since the z score (−1.40) of our sample x̄ was in this chance fluctuation range of ±1.96 standard deviations, it is deemed not significant and we accept H0.

Note that a control chart provides a clear visual history of this hypothesis test. Often we learn more about a process by keeping this kind of record. Sometimes we can spot a trend, a process going out of control, before a significant sample x̄ is achieved. Or we may be able to pick up slight shifts in the value of μ, even though sample x̄'s are in control. For a process in control, the sample x̄'s should fluctuate (usually in a ragged pattern) around the value of μ. Notice that the x̄'s we calculated, .553, .561, .547, .554, and .556, seem to fluctuate more around the value of .555 than the value .560. If this trend continues for future shipments, we may very well suspect that the thickness of the fiber-optic thread shipped may be, on average, μ = .555 mm. Of course, whether or not this slight shift makes a difference in our production would have to be assessed.

A control chart* provides a clear visual history of a repetitive test.

*Historical note: Walter Shewhart first developed control charts in 1924, which were tested and developed within the Bell Telephone System, 1926–1931. For further historical reading on this topic, refer to W. Peters, Counting for Something (New York: Springer-Verlag, 1987), Chapter 16, "Quality Control," pp. 151–162.
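The bookkeeping behind such a control chart can be sketched in Python. This is a hypothetical illustration rather than a charting tool: it only checks each shipment's x̄ against the control limits and reports the running center of the readings, using the five sample averages and the .550 mm and .570 mm limits from the fiber-optic example. The function name control_chart_flags is an assumption made for this sketch.

```python
def control_chart_flags(sample_means, lower_limit, upper_limit):
    """Pair each sample average with True if it falls outside the control limits.

    A reading exactly on a limit is treated as in control, matching the
    accept-on-the-border rule used for the hypothesis test.
    """
    return [(m, not (lower_limit <= m <= upper_limit)) for m in sample_means]

# Five monthly shipments from the fiber-optic example (values in mm)
shipments = [0.553, 0.561, 0.547, 0.554, 0.556]
flags = control_chart_flags(shipments, lower_limit=0.550, upper_limit=0.570)

for month, (x_bar, significant) in enumerate(flags, start=1):
    status = "significant (out of control)" if significant else "not significant"
    print(f"shipment {month}: x-bar = {x_bar:.3f} mm, {status}")

# The average of the readings hints at where the process is actually centered.
center = sum(shipments) / len(shipments)
print(f"average of sample x-bars: {center:.4f} mm")
```

Only the .547 mm shipment is flagged, and the average of the five readings sits a little below .555 mm, in line with the observation that the x̄'s fluctuate around a value lower than the claimed .560 mm.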
© Copyright 2024