Wilcoxon Mann-Whitney test

Supplement 16B: Small Sample Wilcoxon Rank Sum Test
Hypothesis Testing Steps
When the samples have fewer than 10 observations, it may not be appropriate to use the large
sample Wilcoxon Rank Sum (Mann-Whitney) test. However, there are tables of critical values
that allow us to safely perform this test for small samples. As in the large sample case, the
hypotheses are:
H0: Populations are the same
H1: Populations are not the same
If the analyst is willing to assume that the populations differ only in location (i.e., the center of
the distributions is shifted) and are otherwise the same, we can view this as a test of two
medians. For a two-sided test the hypotheses would then be:
H0: M1 = M2 (medians are the same)
H1: M1 ≠ M2 (medians are not the same)
The hypothesis testing procedure is similar to the large-sample case until we get to Step 4.
Step 1: Combine the two samples.
Step 2: Calculate the ranks for the combined samples, sorting from smallest to largest. Be
careful to average the ranks if there are tied data values.
Warning: If you are using Excel, use the Excel 2010 function =RANK.AVG(X,Array,1) to rank
from smallest to largest. Be sure to specify the third argument “1” because the default is to rank
from largest to smallest. That is, if you were to use the function =RANK.AVG(X,Array,0), or if the
third argument is omitted as in =RANK.AVG(X,Array,), your data will be ranked from largest to
smallest (the opposite of the test format shown here). Also, beware of the old Excel 2007
function =RANK(X,Array) and the new 2010 function =RANK.EQ(X,Array), which do not average
the ranks of tied data values.
Step 3: Separate the ranks into the original groups and sum the ranks for each group. Denote the
rank sums T1 and T2.
Step 4: The test statistic is the sum of the ranks from the smaller sample (the sample with fewer
observations). To avoid confusion, it is best to list the smaller sample first, so that the test
statistic can be denoted T1. Table 16.xx shows the critical values for a two-tailed test at α = .05
(upper and lower 2.5% critical values). Reject H0 if T1 ≤ WLower or if T1 ≥ WUpper.
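Steps 1 through 3 can be sketched in Python as follows. This is a minimal illustration rather than part of the original procedure: the function names average_ranks and rank_sums are hypothetical, and the two short data lists are made up only to show how tied values share an averaged rank (the same behavior as =RANK.AVG(X,Array,1) in Excel).

```python
def average_ranks(values):
    """Rank values from smallest to largest, averaging ranks over ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j to the end of the run of values tied with ordered[i].
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # Positions i..j (0-based) hold ranks i+1..j+1; assign their mean.
        rank_of[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return [rank_of[v] for v in values]


def rank_sums(sample1, sample2):
    """Return (T1, T2): rank sums of each sample within the combined data."""
    ranks = average_ranks(list(sample1) + list(sample2))
    t1 = sum(ranks[:len(sample1)])
    t2 = sum(ranks[len(sample1):])
    return t1, t2


# Illustrative values only (not from the text); the value 5 appears three times.
small = [3, 5, 5]        # smaller sample, listed first
large = [1, 5, 7, 9]
t1, t2 = rank_sums(small, large)
n = len(small) + len(large)
print(t1, t2)                       # the two rank sums
print(t1 + t2 == n * (n + 1) / 2)   # all ranks together must total n(n+1)/2
```

Listing the smaller sample first means the first value returned is the test statistic T1 used in Step 4.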
Illustration: Computer Repair Claims
Warranty
Baldr Electronics Emporium is a medium-sized electronics retailer that offers a one-year parts
and labor warranty on laptop computers that it sells. During the month of October, there were 15
claims for warranty repairs for its top two brands of laptops (6 claims for brand A and 9 claims
for brand B). The store noted the number of days the laptops had been owned prior to coming in
for repair. Is there a difference in the days owned prior to repairs? There is doubt about whether
the data are normally distributed, so we will perform the Wilcoxon rank sum test to compare the
medians. The color-coded data are:
Brand A    Brand B
  225         83
   79         52
  225        113
   52         67
   29        165
   98        132
              48
             230
             255
Step 1: Combine the two samples.
Step 2: Calculate the ranks for the combined samples, sorting from smallest to largest. Be
careful to average the ranks if there are tied data values. For example, here the value 52 occurs
twice, as does the value 225. Color coding helps you keep track of data in the combined
samples.
Combined and Sorted        Brand A              Brand B
  Value    Rank         Value    Rank        Value    Rank
     29     1              29     1             48     2
     48     2              52     3.5           52     3.5
     52     3.5            79     6             67     5
     52     3.5            98     8             83     7
     67     5             225    12.5          113     9
     79     6             225    12.5          132    10
     83     7                                  165    11
     98     8                                  230    14
    113     9                                  255    15
    132    10
    165    11
    225    12.5
    225    12.5
    230    14
    255    15

               Brand A       Brand B
Rank sum       T1 = 43.5     T2 = 76.5
Sample size    n1 = 6        n2 = 9
Median         88.5          113.0
Step 3: Separate the ranks into the original groups and sum them for each group. The test
statistic is the sum of the ranks from the smaller sample (the sample with fewer observations). If
you wish, you can check your sums by adding; the ranks must sum to n(n+1)/2 where n = n1 +
n2. In this case, n = n1 + n2 = 6 + 9 = 15, so the ranks must sum to n(n+1)/2 = 15(15+1)/2 = 120.
Checking our rank sums, we get T1 + T2 = 43.5 + 76.5 = 120 which confirms our rank
calculations.
Step 4: Table 16.B1 shows the critical values for a two-tailed test at α = .05 (upper and lower
2.5% critical values). We would reject H0 if T1 ≤ WLower or if T1 ≥ WUpper. For our data, n1 = 6 and
n2 = 9, so the decision rule is:
Reject H0 if T1 ≤ 31 or if T1 ≥ 65
Because our test statistic is T1 = 43.5, we cannot reject H0. Although there is a difference in the
sample medians, it is not great enough to conclude unequal population medians.
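The calculations in Steps 1 through 4 for the warranty data can be checked with a short Python sketch. It is offered as an illustration only: the helper name avg_rank is hypothetical, the assignment of values to Brand A and Brand B follows the data table above, and the critical values 31 and 65 are simply typed in from Table 16.B1. The helper uses the fact that a value's averaged rank equals the number of smaller observations plus (number of tied observations + 1)/2.

```python
brand_a = [225, 79, 225, 52, 29, 98]                  # n1 = 6 (smaller sample)
brand_b = [83, 52, 113, 67, 165, 132, 48, 230, 255]   # n2 = 9
combined = brand_a + brand_b                          # Step 1


def avg_rank(x, data):
    """Average rank of x in data, ranking from smallest to largest."""
    below = sum(1 for v in data if v < x)
    tied = sum(1 for v in data if v == x)
    return below + (tied + 1) / 2


# Steps 2 and 3: rank within the combined data, then sum by group.
t1 = sum(avg_rank(x, combined) for x in brand_a)
t2 = sum(avg_rank(x, combined) for x in brand_b)
n = len(combined)

# Check: all ranks together must total n(n+1)/2 = 120.
assert t1 + t2 == n * (n + 1) / 2

# Step 4: critical values typed in from Table 16.B1 for n1 = 6, n2 = 9.
w_lower, w_upper = 31, 65
reject = t1 <= w_lower or t1 >= w_upper
print(t1, t2, reject)   # 43.5 76.5 False
```

Because T1 = 43.5 lies strictly between 31 and 65, the sketch reaches the same decision as the hand calculation.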
TABLE 16.B1 Lower 2.5% and Upper 2.5% Critical Values for Wilcoxon Rank Sum Test
(each entry is WLower,WUpper)

n2 \ n1     4        5        6        7        8        9        10       11       12
   4      10,26
   5      11,29    17,38
   6      12,32    18,42    26,52
   7      13,35    20,45    27,57    36,69
   8      14,38    21,49    29,61    38,74    49,87
   9      14,42    22,53    31,65    40,79    51,93    62,109
  10      15,45    23,57    32,70    42,84    53,99    65,115   78,132
  11      16,48    24,61    34,74    44,89    55,105   68,121   81,139   96,157
  12      17,51    26,64    35,79    46,94    58,110   71,127   84,146   99,165   115,185
Decision Rule: Reject the null hypothesis if T1 ≤ WLower or if T1 ≥ WUpper where T1 is the rank sum from
the smaller sample. Source: F. Wilcoxon and R.A. Wilcox, Some Rapid Approximate Statistical
Procedures, Lederle Laboratories, 1964. Used with permission of the American Cyanamid Company.
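If you would rather not read the table by eye, Table 16.B1 can be stored as a lookup structure. The Python sketch below is one possible encoding, not something supplied with the table: the names CRITICAL_VALUES, critical_values, and reject_h0 are hypothetical, and the numbers are copied from the table above with rows indexed by n2 and columns by n1.

```python
# Table 16.B1 as {n2: [(W_lower, W_upper) for n1 = 4, 5, ..., n2]}.
CRITICAL_VALUES = {
    4:  [(10, 26)],
    5:  [(11, 29), (17, 38)],
    6:  [(12, 32), (18, 42), (26, 52)],
    7:  [(13, 35), (20, 45), (27, 57), (36, 69)],
    8:  [(14, 38), (21, 49), (29, 61), (38, 74), (49, 87)],
    9:  [(14, 42), (22, 53), (31, 65), (40, 79), (51, 93), (62, 109)],
    10: [(15, 45), (23, 57), (32, 70), (42, 84), (53, 99), (65, 115),
         (78, 132)],
    11: [(16, 48), (24, 61), (34, 74), (44, 89), (55, 105), (68, 121),
         (81, 139), (96, 157)],
    12: [(17, 51), (26, 64), (35, 79), (46, 94), (58, 110), (71, 127),
         (84, 146), (99, 165), (115, 185)],
}


def critical_values(n1, n2):
    """Look up (W_lower, W_upper); n1 must be the smaller sample size."""
    if not 4 <= n1 <= n2 <= 12:
        raise ValueError("Table 16.B1 covers only 4 <= n1 <= n2 <= 12")
    return CRITICAL_VALUES[n2][n1 - 4]


def reject_h0(t1, n1, n2):
    """Two-tailed decision rule at alpha = .05 using the tabled values."""
    w_lower, w_upper = critical_values(n1, n2)
    return t1 <= w_lower or t1 >= w_upper


print(critical_values(6, 9))    # (31, 65)
print(reject_h0(43.5, 6, 9))    # False
```

The range check makes the abbreviated nature of the table explicit: sample sizes outside 4 to 12 raise an error instead of returning a misleading value.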
Step 5: No action is required. However, the retailer may wish to continue accumulating data on
the length of time before each warranty claim for these two top-selling brands. It is possible that
in a larger sample, significant differences might be detected.
Computer Software
There are many reasons to prefer using a computer for this type of test. First, the calculations are
easier. Second, you don’t need tables. Third, tables become awkwardly large for this test when
sample sizes become larger. Table 16.B1, for example, is abbreviated. If you have sample sizes
between 13 and 20, you would need a larger table. Figure 16.B1 shows the output from Minitab,
which confirms our calculations and our decision not to reject H0 at α = .05. Note that Minitab
also provides a confidence interval for the difference of medians as well as a p-value (0.6367)
which shows that the observed difference in medians is within the realm of chance.
FIGURE 16.B1 Minitab Results for Wilcoxon Rank Sum/Mann-Whitney Test
Mann-Whitney Test and CI: Brand A, Brand B
           N   Median
Brand A    6     88.5
Brand B    9    113.0
Point estimate for ETA1-ETA2 is -15.0
96.1 Percent CI for ETA1-ETA2 is (-113.0,112.0)
W = 43.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.6374
The test is significant at 0.6367 (adjusted for ties)
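The two significance levels in the Minitab output can be approximated by hand using the large-sample normal approximation. The Python sketch below is an assumption about how such values arise, not a description of Minitab's internals: it uses the standard mean n1(n+1)/2 and variance n1·n2(n+1)/12 of the rank sum, a 0.5 continuity correction, and the usual tie adjustment to the variance. The function name two_sided_p is hypothetical. Under these assumptions the results come out close to 0.6374 (unadjusted) and 0.6367 (adjusted for ties).

```python
from collections import Counter
from math import erf, sqrt

brand_a = [225, 79, 225, 52, 29, 98]
brand_b = [83, 52, 113, 67, 165, 132, 48, 230, 255]
n1, n2 = len(brand_a), len(brand_b)
n = n1 + n2
t1 = 43.5   # rank sum for Brand A from the worked example


def two_sided_p(t, mean, variance):
    """Two-sided normal p-value with a 0.5 continuity correction."""
    z = max(abs(t - mean) - 0.5, 0) / sqrt(variance)
    return 1 - erf(z / sqrt(2))   # same as 2 * (1 - Phi(z))


mean_t1 = n1 * (n + 1) / 2            # 48.0
var_plain = n1 * n2 * (n + 1) / 12    # 72.0

# Tie adjustment: each group of c tied values contributes c^3 - c.
tie_term = sum(c**3 - c for c in Counter(brand_a + brand_b).values())
var_ties = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

p_plain = two_sided_p(t1, mean_t1, var_plain)
p_ties = two_sided_p(t1, mean_t1, var_ties)
print(round(p_plain, 4), round(p_ties, 4))
```

With samples this small the normal approximation is only a rough guide, which is why the table-based decision rule is the primary method here.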
Section Exercises
Note: * indicates optional exercises based on the large-sample z-test
or on software that may not be available to students.
16B.1 A trucking company wants to compare the number of miles driven by two delivery truck
drivers in one week on different days (n1 = 5 days, n2 = 7 days). Do not assume that
distances driven are normally distributed. (a) Use Table 16.B1 to test the hypothesis of
equal medians at α = .05. Show the steps in your analysis. (b*) If possible, check your
work using Minitab or another computer package. (c*) Perform a large-sample test using
the z-test. Is your conclusion the same?
Delivery
Driver 1:  128  102   78   40   76
Driver 2:   97  158  112  112  216  316  112
16B.2 Below are data for two different regions, showing the number of days that kidney
transplant patients had to wait before a donor was found (n1 = 6 patients, n2 = 8 patients).
Do not assume a normal distribution of waiting times. (a) Use Table 16.B1 to test the
hypothesis of equal medians at α = .05. Show the steps in your analysis. (b*) If possible,
check your work using Minitab or another computer package. (c*) Perform a large-sample test using the z-test. Is your conclusion the same?
Kidneys
East Region
West Region
109
137
248
93
85
52
107
191
28
236
67
205
92
133