Data-Driven Percentile Modified Two-sample Wilcoxon Test

O. Thas^a,* and J.C.W. Rayner^b

^a Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Belgium
^b School of Mathematical and Physical Sciences, University of Newcastle, Callaghan, NSW 2308, Australia

* Corresponding author: Olivier Thas, Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Belgium (e-mail: [email protected])
Summary

The two-sample Wilcoxon rank sum statistic can be derived as the first component of the Pearson chi-squared statistic in a particularly constructed contingency table. For this test a "percentile modification" has been proposed, which is equivalent to splitting the contingency table into two independent subtables and computing the Wilcoxon statistic on one of the subtables. Although this procedure does not use all the data in the sample, it often results in a power increase. The splitting position is determined by an arbitrarily chosen trimming proportion p. We first illustrate that an inappropriate choice of p may result in substantial power loss. To circumvent this problem, we propose a new test statistic based on a data-dependent choice for p, say $\hat{p}$. We show that its asymptotic null distribution is the supremum of a time-transformed Brownian motion. Upon rejection of the null hypothesis, informative conclusions may be obtained by conditioning on $\hat{p}$. A simulation study shows that our solution results in a power advantage for some particular alternatives. Also, instead of using only one subtable, we suggest computing the Wilcoxon statistic on both subtables and taking their sum as a new test statistic, which we regard as a recomposition of statistics rather than a decomposition.

keywords: contingency table, Pearson chi-squared, Brownian motion
1 INTRODUCTION
The Wilcoxon rank sum test (Wilcoxon, 1945) is probably the most common nonparametric test for the two-sample location problem

$$H_0: F_1(x) = F_2(x) \text{ for all } x \qquad (1)$$

versus

$$H_1: F_1(x) = F_2(x - \Delta) \text{ for all } x, \qquad (2)$$

where $\Delta \neq 0$ measures the location shift. This formulation of the two-sample location problem is generally referred to as the location-shift model. Under the restriction that $F_1$ and $F_2$ have the same "shape", the location-shift model implies that the hypotheses are equivalent to $H_0: \Delta = 0$ versus $H_1: \Delta \neq 0$, where $\Delta$ is now the difference in means (medians).
Although most textbooks present the location-shift model to introduce the Wilcoxon rank sum test, this test actually tests the null hypothesis $H_0: P[X < Y] = 1/2$, where $X$ and $Y$ have distribution functions $F_1$ and $F_2$, respectively, without imposing any further restrictions on their shape. Note that $F_1 = F_2$ implies $P[X < Y] = 1/2$, but the reverse is not necessarily true. Some authors have argued (see Hollander & Wolfe, 1999, Chapter 4, for an overview) that a conclusion such as $P[X < Y] > 1/2$ is more informative than a conclusion about differences in means. For example, suppose $X$ and $Y$ are responses under placebo and active treatment, respectively. Then, rejecting $H_0$ in favour of $P[X < Y] > 0.5$ implies that giving a patient the active treatment is more likely to result in a higher response than giving the patient placebo.
Some people prefer conclusions in terms of stochastic orderings of $F_1$ and $F_2$, i.e. they want to test $F_1 = F_2$ against $F_1(x) < F_2(x)$ ($F_1(x) > F_2(x)$) for all $x$. This is referred to as "$X$ is stochastically larger (smaller) than $Y$". Evidently, this further implies $P[X < Y] < 1/2$ ($P[X < Y] > 1/2$), but the reverse is not necessarily true.
As mentioned before, the most common nonparametric test for testing $H_0$ is the Wilcoxon rank sum test (Wilcoxon, 1945). Rayner and Best (2001) have shown how this statistic is obtained as the first component of Pearson's $\chi^2$ statistic computed on a particularly constructed contingency table. This is summarized in Section 2.
Gastwirth (1965) has proposed a percentile modified two-sample Wilcoxon test. Basically, his modification consists of pooling the data of the two samples and performing the Wilcoxon test on only a fraction of the data, where the fraction is determined by percentages $r$ and $p$ of the most extreme small and large data points in the pooled sample, respectively. In this paper we set $r = 0$, i.e. the fraction only refers to a portion of the extreme large observations. This trimming proportion $p$ must be determined a priori. Although in this way part of the data is actually discarded, Gastwirth has shown that on many occasions a power gain is obtained. However, if $p$ is chosen badly, there is a risk of losing power. In Section 3 we show that this modification is basically the same as splitting the contingency table and using only one part for further calculations. We also illustrate the meaning of $p$ and argue that an appropriate value for $p$ may be estimated from the data, say $\hat{p}$, so that the percentile modified Wilcoxon statistic based on $\hat{p}$ is maximized. This $\hat{p}$ also has an interesting interpretation, and the maximized Wilcoxon statistic may be used as a test statistic. In Section 4 we give the asymptotic null distribution. The results of a small simulation study are presented in Section 5.
2 The Contingency Table Approach
Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ denote the sample observations from $F_1$ and $F_2$, respectively, and let $n = n_1 + n_2$ denote the total sample size. The pooled sample observations are denoted by $Z_i$, and the corresponding order statistics by $Z_{(i)}$ ($i = 1, \ldots, n$). A binary variable $S$ is defined as an indicator for the original sample, i.e. $S_i = 1$ if $Z_i$ is an observation from the first sample, and $S_i = 2$ if $Z_i$ is originally from the second sample. Similarly, $S_{(i)}$ refers to $Z_{(i)}$. The two-sample Wilcoxon rank sum statistic is then given by

$$W_n = \sum_{j=1}^{n} j\, I(Z_{(j)}), \qquad (3)$$

where

$$I(Z_{(j)}) = \begin{cases} 1 & \text{if } S_{(j)} = 1 \\ 0 & \text{if } S_{(j)} = 2 \end{cases}. \qquad (4)$$
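As a quick illustration, the following minimal Python sketch computes $W_n$ directly from the definition in (3)-(4); the function and variable names are our own, and the data are assumed continuous (no ties).

```python
import numpy as np

def wilcoxon_rank_sum(x, y):
    """W_n of eq. (3)-(4): sum of the pooled-sample ranks of the first sample."""
    z = np.concatenate([x, y])            # pooled sample Z_i
    order = np.argsort(z)                 # positions of the order statistics Z_(j)
    from_first = order < len(x)           # I(Z_(j)) = 1 iff S_(j) = 1
    ranks = np.arange(1, len(z) + 1)      # ranks j = 1, ..., n
    return int(ranks[from_first].sum())

rng = np.random.default_rng(1)
x, y = rng.normal(0.0, 1.0, 20), rng.normal(1.0, 1.0, 20)
print(wilcoxon_rank_sum(x, y))
```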
A $2 \times n$ contingency table, say $\{N_{ij}\}$, is constructed as follows (this exposition follows Rayner and Best (2001)). Let $N_{1j} = I(Z_{(j)})$ and $N_{2j} = 1 - I(Z_{(j)})$ ($j = 1, \ldots, n$). Note that by construction the $n$ column totals $N_{.j} = N_{1j} + N_{2j}$ ($j = 1, \ldots, n$) are fixed and all equal to one. The two row totals $N_{i.} = \sum_{j=1}^{n} N_{ij} = n_i$ are the numbers of observations in the first ($i = 1$) and the second ($i = 2$) sample.
Under the two-sample null hypothesis, the two column distributions $\{N_{1j}\}_j$ and $\{N_{2j}\}_j$ are equal to the marginal distribution, which is the uniform. Therefore, Pearson's $X^2$ test seems an appropriate choice of test statistic. In this contingency table Pearson's $X^2$ statistic for independence (or homogeneity of column distributions) is given by

$$X_n^2 = \sum_{i=1}^{2} \sum_{j=1}^{n} \frac{(N_{ij} - N_{i.}/n)^2}{N_{i.}/n}.$$
From Rayner and Best (2001), Pearson's $X^2$ statistic satisfies

$$X_n^2 = \sum_{j=1}^{n-1} U_j^2 = \sum_{j=1}^{n-1} V_j^t V_j,$$

where $V_j^t = (V_{1j}, V_{2j})$ and

$$V_{ij} = N_{i.}^{-1/2} \sum_{k=1}^{n} N_{ik}\, g_j(k).$$
Here, the $\{g_j\}$ are orthonormal polynomials on the marginal column distribution, which is, by construction, the discrete uniform distribution $\{n^{-1}, n^{-1}, \ldots, n^{-1}\}$. We take $g_0(k) = 1$ for all $k = 1, \ldots, n$, so that for $s = 1, \ldots, n - 1$, $g_s(\cdot)$ is a polynomial of degree $s$. In particular, the degree one polynomial is

$$g_1(k) = \sqrt{\frac{12}{n^2 - 1}} \left( k - \frac{n+1}{2} \right).$$
As the first component we find

$$U_1^2 = V_1^t V_1 = \frac{\left( W_n - \frac{N_{1.}(n+1)}{2} \right)^2}{\frac{N_{1.} N_{2.} (n+1)}{12}}, \qquad (5)$$

which is exactly the squared standardized Wilcoxon rank sum statistic, and which asymptotically has a $\chi^2_1$ distribution under $H_0$. Moreover, the asymptotic null distribution of $U_1^2$ also follows directly from the general theory given in Rayner and Best (2001).
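The statistic inside the square in (5) can be checked numerically. In the hedged sketch below (names our own, no ties assumed), it should agree with the rank-sum z-statistic reported by scipy.stats.ranksums, which uses the same standardization:

```python
import numpy as np
from scipy import stats

def standardized_wilcoxon(x, y):
    """(W_n - N1.(n+1)/2) / sqrt(N1. N2. (n+1)/12); its square is U_1^2 of eq. (5)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = stats.rankdata(np.concatenate([x, y]))[:n1].sum()  # rank sum of sample 1
    return (w - n1 * (n + 1) / 2) / np.sqrt(n1 * n2 * (n + 1) / 12)

rng = np.random.default_rng(2)
x, y = rng.normal(0.0, 1.0, 30), rng.normal(0.5, 1.0, 30)
print(standardized_wilcoxon(x, y), stats.ranksums(x, y).statistic)
```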
The second component of $X_n^2$ is

$$U_2^2 = V_2^t V_2 = \left( \frac{\sqrt{n}\left[ \sum_{k=1}^{n} N_{1k}\left(k - \frac{n+1}{2}\right)^2 - \frac{n^2-1}{12} N_{1.} \right]}{\sqrt{\frac{1}{180} N_{1.} N_{2.} (n^2-1)(n^2-4)}} \right)^2, \qquad (6)$$

which is exactly the square of the standardized Mood statistic for testing equality of dispersion. The third component $U_3^2 = V_3^t V_3$ of $X_n^2$ is given by

$$U_3^2 = V_3^t V_3 = \left( \frac{\sqrt{n}\left[ \sum_{k=1}^{n} N_{1k}\left(k - \frac{n+1}{2}\right)^3 - \frac{3n^2-7}{20} \sum_{k=1}^{n} N_{1k}\left(k - \frac{n+1}{2}\right) \right]}{\sqrt{\frac{(n^2-1)(n^2-4)(n^2-9) N_{1.} N_{2.}}{2800}}} \right)^2. \qquad (7)$$

They all have an asymptotic $\chi^2_1$ null distribution.
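For reference, here is a direct numerical transcription of (5)-(7) as a hedged Python sketch (our own code; continuous data assumed):

```python
import numpy as np

def first_three_components(x, y):
    """U_1^2, U_2^2, U_3^2 from the explicit formulas (5)-(7)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    order = np.argsort(np.concatenate([x, y]))
    n1k = (order < n1).astype(float)      # N_{1k}: 1 if k-th order statistic is from sample 1
    k = np.arange(1, n + 1, dtype=float)
    c = k - (n + 1) / 2                   # centered ranks k - (n+1)/2
    w = (k * n1k).sum()                   # Wilcoxon rank sum W_n
    u1 = (w - n1 * (n + 1) / 2) ** 2 / (n1 * n2 * (n + 1) / 12)
    m2 = (n1k * c**2).sum()
    u2 = n * (m2 - (n**2 - 1) / 12 * n1) ** 2 \
         / (n1 * n2 * (n**2 - 1) * (n**2 - 4) / 180)
    m1, m3 = (n1k * c).sum(), (n1k * c**3).sum()
    u3 = n * (m3 - (3 * n**2 - 7) / 20 * m1) ** 2 \
         / ((n**2 - 1) * (n**2 - 4) * (n**2 - 9) * n1 * n2 / 2800)
    return u1, u2, u3

rng = np.random.default_rng(3)
print(first_three_components(rng.normal(0, 1, 25), rng.normal(0, 2, 25)))
```

For scale-different samples like those above, $U_2^2$ (the Mood component) will typically dominate $U_1^2$.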
[something about the relation with two-sample tests for dispersion and skewness
with reference to (Eubank, LaRiccia, & Rosenstein, 1987)]
3 The Percentile Modification

3.1 Percentile Modified Statistics
Gastwirth (1965) proposed a percentile modification to increase the power of two-sample tests. He claims that differences between distributions or samples are often masked when all observations are considered; in particular, differences sometimes become more apparent in the tails of the distribution. Therefore, his suggestion is to remove a fraction of the sample observations and to apply the Wilcoxon test to the remaining observations.
The fraction of the observations to be discarded is denoted by $p$, and let $n_p$ denote the largest integer smaller than or equal to $pn$. Given the original order statistics $Z_{(i)}$, we define the lower observations as $Z_{(1)}, \ldots, Z_{(n_p)}$, and the upper observations as $Z_{(n_p+1)}, \ldots, Z_{(n)}$. The construction of the contingency table $\{N_{ij}\}_{i=1,2;\,j=1,\ldots,n}$ immediately implies that the lower observations completely determine the left part of the table, $\{N_{ij}\}_{i=1,2;\,j=1,\ldots,n_p}$, and, similarly, the upper observations determine the right part of the table, $\{N_{ij}\}_{i=1,2;\,j=n_p+1,\ldots,n}$. The row totals are denoted by $N_{i.}^{low} = N_{i.}^{low}(p)$ and $N_{i.}^{up} = N_{i.}^{up}(p)$ ($i = 1, 2$), respectively. For notational comfort, the dependence on $p$ will be omitted unless it is needed.
We will continue with the contingency table corresponding to the upper observations and give these upper observations ranks $1, \ldots, n - n_p$. In this table the theory of Section 2 applies. Hence, the first component of Pearson's $X^2$ statistic is the square of the standardized conditional Wilcoxon rank sum statistic, based on the upper subsample. Conditional refers to the conditioning on the row totals of the selected subtable. In particular, the conditional Wilcoxon statistic is given by

$$W^{up} = W^{up}(p) = \sum_{j=n_p+1}^{n} (j - n_p)\, I(Z_{(j)}), \qquad (8)$$

and the first component is the square of

$$T^{up} = T^{up}(p) = \frac{W^{up} - \frac{1}{2} N_{1.}^{up} (n_q + 1)}{\sqrt{\frac{1}{12} N_{1.}^{up} N_{2.}^{up} (n_q + 1)}}, \qquad (9)$$
where $n_q = n - n_p$ is the number of upper observations in the sample. Note that $W^{up}(0)$ is exactly the Wilcoxon statistic. To apply the general distribution theory of Pearson components (Rayner & Best, 2001), it must be ensured that $n_q \to \infty$ as $n \to \infty$. Therefore, we will assume that $p < 1$, which is an obvious restriction from a data-analysis point of view. Under this condition it follows immediately that $T^{up}$ asymptotically has a standard normal null distribution.
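A hedged sketch of the computation of $T^{up}(p)$ (our own function names; continuous data assumed, and the statistic is left undefined when a row total of the subtable is empty):

```python
import numpy as np

def t_up(x, y, p):
    """Percentile modified statistic T^up(p) of eq. (8)-(9)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    n_p = int(np.floor(p * n))               # size of the trimmed lower part
    order = np.argsort(np.concatenate([x, y]))
    upper = order[n_p:]                      # upper observations, re-ranked 1..n-n_p
    ind = (upper < n1).astype(float)         # I(Z_(j)) on the upper subsample
    w_up = float(np.sum(np.arange(1, n - n_p + 1) * ind))    # eq. (8)
    n1_up = ind.sum()
    n2_up = (n - n_p) - n1_up
    if n1_up == 0 or n2_up == 0:
        return np.nan                        # empty subtable margin: undefined
    nq = n - n_p
    return (w_up - 0.5 * n1_up * (nq + 1)) / np.sqrt(n1_up * n2_up * (nq + 1) / 12)
```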
It may happen that one believes that the difference between $F_1$ and $F_2$ will occur in the lower part. Then the lower contingency table may be used to construct a statistic similar to $T^{up}$. Since the construction is basically the same, we give no further details. Let $T^{low} = T^{low}(p)$ denote this statistic. Note that $T^{low}$ also has an asymptotic standard normal null distribution, provided that $p > 0$. Since $T^{low}$ and $T^{up}$ are calculated on mutually distinct subsamples, it may be informative to combine them into one test statistic,

$$T^{tot}(p) = T^{up}(p)^2 + T^{low}(p)^2. \qquad (10)$$

Equation (10) looks like a decomposition of the $T^{tot}(p)$ statistic, though we prefer to look at it as a recomposition of the statistics $T^{up}(p)$ and $T^{low}(p)$.
3.2 Profiles
The distribution theory of the percentile modified Wilcoxon test only holds when $p$ is fixed, i.e. when $p$ is chosen prior to observing the data. Of course, it may often happen that $p$ is chosen inappropriately, resulting in a test with low power. In this section we propose to compute a profile of the percentile modified Wilcoxon test. A profile is simply $T^{up}(p)$ considered as a function of the trimming proportion $p$. Since each $p$ corresponds to exactly one threshold value $Z_{(n_p)}$, the profile can also be defined as $T^{up}(p(z))$, with $p(z)$ the smallest $p$ such that $z \leq Z_{(n_p)}$. In a similar way $T^{low}(p(z))$ and $T^{tot}(p(z))$ are the profiles of the $T^{low}$ and $T^{tot}$ statistics.
We argue that profile plots are informative exploratory tools, but we will also use them here to illustrate that some choices of the trimming proportion $p$ may result in poor tests. As an example, we consider two samples of 50 observations each. Figure 1 shows the EDF's of the two samples and the three profile plots for $T^{up}$, $T^{low}$ and $T^{tot}$. The EDF's clearly show that the shapes of the two distributions are quite different. The profile plots show for each $z$ (or corresponding $p$) the value of the percentile modified test statistic. For each fixed $p$, the level $\alpha = 0.05$ critical value of $T^{up}(p)$ and $T^{low}(p)$ is 1.96 (two-sided; asymptotic approximation). From the profile plot of $T^{up}$ it is seen that significance would only be obtained with $1 < z < 4$, and, similarly, for $T^{low}$ significance only occurs for $2.5 < z < 4$. For each $p$, $T^{tot}(p)$ is asymptotically $\chi^2_2$ distributed; thus, significance would result from $T^{tot}(p) > 5.99$, which happens when $2 < z < 4$. These plots clearly illustrate that the final results of the percentile modified tests depend heavily on the choice of $p$.

The profile plots also suggest another approach: define a test statistic as the maximum absolute value of the profile over $p \in (0, 1)$. This is further discussed in the next section.
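In the same hedged spirit, a profile can be traced by evaluating $T^{up}(p)$ on the grid of attainable trimming proportions $p = j/n$, reusing the t_up function sketched above; the supremum statistic of the next section is then just the largest absolute profile value:

```python
import numpy as np

def profile_up(x, y):
    """Profile p -> T^up(p) over the attainable grid p = j/n (t_up as above)."""
    n = len(x) + len(y)
    ps = np.arange(0, n - 1) / n              # keep at least two upper observations
    return ps, np.array([t_up(x, y, p) for p in ps])

rng = np.random.default_rng(4)
ps, prof = profile_up(rng.normal(0, 1, 50), rng.normal(0, 2, 50))
s_up = np.nanmax(np.abs(prof))                # sup_p |T^up(p)|, see Section 4
p_hat = ps[np.nanargmax(np.abs(prof))]        # the maximizing trimming proportion
```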
4 The Supremum Statistic

4.1 The Test Statistics and Asymptotic Null Distributions

Define

$$S_n^{up} = \sup_p |T_n^{up}(p)|, \qquad (11)$$
and denote by $\hat{p}$ the value of $p$ at which the supremum is attained. In a similar way $S_n^{low} = \sup_p |T_n^{low}(p)|$ and $S_n^{tot} = \sup_p T_n^{tot}(p)$ are defined. Of course, the asymptotic null distributions of these new test statistics will no longer be standard normal. In Figure 1 the $\hat{p}$ corresponding to the maxima are indicated by vertical lines. It is interesting to note that the positions of these lines more or less coincide with the crossing of the two EDF's, and that they approximately divide the $z$-range into parts with conditional stochastic orderings.

Figure 1: Example data set. Upper left panel: EDF's of the two samples; upper right panel: profile of $T^{up}$; lower left panel: profile of $T^{low}$; lower right panel: profile of $T^{tot}$.

The asymptotic null distribution of $S_n^{up}$ is given in the next theorem. The proof is given in Appendix A.
Theorem 1 Let $0 < \theta < 1$, $\Psi(p) = (1-p)^3$, and let $S(t)$ denote a Brownian motion on $[\Psi(\theta), 1]$. Then, as $n \to \infty$,

(a) $$T^{up} \xrightarrow{w} \frac{S \circ \Psi}{\Psi^{1/2}},$$

(b) $$S_n^{up} = \sup_{0 \leq p \leq \theta} |T^{up}(p)| \xrightarrow{d} \sup_{\Psi(\theta) \leq t \leq 1} \frac{|S(t)|}{t^{1/2}}.$$
Note that the restriction $p \leq \theta < 1$ expresses the condition that there should be observations in the upper sample. A very similar theorem follows immediately for $T^{low}$ and $S_n^{low}$. Once the weak limits of $T^{up}$ and $T^{low}$ are known, the following theorem follows almost immediately (an outline of the proof is given in Appendix B).
Theorem 2 Let $0 < \theta < \frac{1}{2}$, $\Psi^{up}(p) = (1-p)^3$, $\Psi^{low}(p) = p^3$, and let $S(t)$ denote a Brownian motion on $[\Psi^{low}(\theta), \Psi^{up}(\theta)]$. Then, as $n \to \infty$,

(a) $$T^{tot} \xrightarrow{w} \left( \frac{S \circ \Psi^{up}}{(\Psi^{up})^{1/2}} \right)^2 + \left( \frac{S \circ \Psi^{low}}{(\Psi^{low})^{1/2}} \right)^2,$$

(b) $$S_n^{tot} = \sup_{\theta \leq p \leq 1-\theta} T^{tot}(p) \xrightarrow{d} \sup_{\theta \leq p \leq 1-\theta} \left[ \left( \frac{S(\Psi^{up}(p))}{\Psi^{up}(p)^{1/2}} \right)^2 + \left( \frac{S(\Psi^{low}(p))}{\Psi^{low}(p)^{1/2}} \right)^2 \right].$$
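The limiting distribution in Theorem 1(b) can be approximated by simulating a Brownian motion on a fine grid. The sketch below is our own code with an arbitrary choice of $\theta$; for an appropriate $\theta$, the upper quantiles of the returned draws approximate asymptotic critical values like those in the last row of Table 1.

```python
import numpy as np

def limit_sup_draws(theta=0.95, grid=5000, runs=5000, seed=0):
    """Draws of sup |S(t)|/sqrt(t) over [Psi(theta), 1], with Psi(p) = (1-p)^3."""
    rng = np.random.default_rng(seed)
    t = np.linspace((1.0 - theta) ** 3, 1.0, grid)   # time grid on [Psi(theta), 1]
    dt = np.diff(t, prepend=0.0)                     # first step gives S(t_0) ~ N(0, t_0)
    out = np.empty(runs)
    for r in range(runs):
        s = np.cumsum(rng.standard_normal(grid) * np.sqrt(dt))  # Brownian path S(t)
        out[r] = np.max(np.abs(s) / np.sqrt(t))
    return out

draws = limit_sup_draws()
print(np.quantile(draws, [0.90, 0.95, 0.99]))        # approximate critical values
```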
Table 1: Simulated critical values for the $S_n^{up}$ ($S_n^{low}$) and $S_n^{tot}$ statistics

            |     S_n^up (S_n^low)          |          S_n^tot
  n1 = n2   | α = 0.01  α = 0.05  α = 0.10  | α = 0.01  α = 0.05  α = 0.10
  10        |  2.8284    2.4056    2.2478   |  9.9709    8.0404    7.1333
  25        |  3.1720    2.7045    2.5311   | 12.8290    9.8143    8.7879
  50        |  3.3526    2.8999    2.6734   | 14.2166   11.1685    9.7863
  ∞         |  3.7488    3.2341    3.0111   |     —          —          —
Simulations of $S_n^{up}$ under $H_0$ for several sample sizes $n_1$ and $n_2$ have indicated that the convergence to the limiting null distribution is quite slow. Therefore, we propose to work with the Monte Carlo approximation to the exact null distribution. For sample sizes ($n_1 = n_2$) $n_1 = 10$, $n_1 = 25$ and $n_1 = 50$, and levels $\alpha = 0.01$, $\alpha = 0.05$ and $\alpha = 0.10$, the simulated and asymptotic critical values are given in Table 1. All estimates are based on 100,000 simulation runs.
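For completeness, a hedged Monte Carlo sketch of such critical values, reusing the t_up and profile_up functions sketched earlier (under $H_0$ the pooled sample may be drawn from any common continuous distribution):

```python
import numpy as np

def mc_critical_value(n1, n2, alpha=0.05, runs=10000, seed=0):
    """Monte Carlo (1 - alpha) critical value of S_n^up under H_0."""
    rng = np.random.default_rng(seed)
    sups = np.empty(runs)
    for r in range(runs):
        z = rng.standard_normal(n1 + n2)     # null: both samples from one distribution
        _, prof = profile_up(z[:n1], z[n1:])
        sups[r] = np.nanmax(np.abs(prof))
    return float(np.quantile(sups, 1 - alpha))

# mc_critical_value(10, 10) should land near the 2.4056 reported in Table 1,
# up to Monte Carlo error and the exact convention used for the trimming grid.
```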
In the example presented in Figure 1 both samples had 50 observations. The critical value of $S_n^{low}$ and $S_n^{up}$ at $\alpha = 0.05$ is 2.9 (Table 1), and for $S_n^{tot}$ we find 11.2. Using these critical values as thresholds in the profile plots of Figure 1, we find that all three sup-tests result in a rejection of the null hypothesis $H_0: P[X < Y] = 0.5$.
4.2 Interpretation of Alternatives
As mentioned in the introduction, the Wilcoxon rank sum test tests the null hypothesis $H_0: P[X < Y] = \frac{1}{2}$. It is also interesting to note that

$$\hat{\pi} = \frac{1}{n_1 n_2} W - \frac{n_1 + 1}{2 n_2}$$

is an unbiased estimator of $P[X < Y]$. This is a direct consequence of the relationship between the Wilcoxon and the Mann-Whitney statistics. Thus, upon rejection of $H_0$, $\hat{\pi}$ may be directly used to formulate a conclusion.
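A small sketch of this estimator (names our own); by the Mann-Whitney relationship it can be cross-checked against $U/(n_1 n_2)$, where $U = W - n_1(n_1+1)/2$:

```python
import numpy as np
from scipy import stats

def pi_hat(x, y):
    """The estimator pi-hat of the text: W/(n1 n2) - (n1 + 1)/(2 n2)."""
    n1, n2 = len(x), len(y)
    w = stats.rankdata(np.concatenate([x, y]))[:n1].sum()
    return w / (n1 * n2) - (n1 + 1) / (2 * n2)

rng = np.random.default_rng(6)
x, y = rng.normal(0, 1, 40), rng.normal(1, 1, 40)
u = stats.mannwhitneyu(x, y).statistic
print(pi_hat(x, y), u / (len(x) * len(y)))   # the two values should coincide
```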
In many practical situations $P[X < Y]$ is close to 0.5, but a closer look at the data suggests that this is caused by computing $P[X < Y]$ as an integral over the whole support of the distributions $F_1$ and $F_2$. For example, the two empirical distribution functions (EDF) in Figure 2 show samples from two normal distributions, both with mean 2, but the first group has standard deviation 1, whereas the standard deviation in the second population is 0.2.
Figure 2: EDF of two samples from normal distributions with means equal to 2, and standard deviations equal to 1 (group 1) and 0.2 (group 2).
The traditional two-sample Wilcoxon rank sum test gives a p-value of 0.46, so that it is concluded that $P[X < Y] = 0.5$. However, Figure 2 indicates that small values of $x$ are more likely to come from group 1. When $x < 2$, the latter statement is true with probability close to one! Also, when $x > 2$, the response of group 2 is smaller than the response of group 1 with very high probability. This is exactly what the percentile modified Wilcoxon test of Gastwirth (1965) could have detected if $p$ had been chosen such that a fraction $p$ of the pooled sample data is smaller than 2 ($p$ approximately 0.5). With such a $p$, the null hypothesis of the percentile modified Wilcoxon test is $H_0: P[X < Y \mid X, Y > 2] = 0.5$, and for the example data set this test rejects $H_0$ with a p-value of essentially zero, resulting in the conclusion $P[X < Y \mid X, Y > 2] \ll 0.5$. With the same trimming proportion $p$, a percentile modified Wilcoxon test could have been computed on the lower instead of the upper observations. Here too the null hypothesis is strongly rejected, resulting in the conclusion $P[X < Y \mid X, Y < 2] \gg 0.5$.
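These conditional conclusions can be illustrated with a small sketch that simply applies the pi_hat function from above to the subsamples beyond a threshold; the data-generating choices mirror the example (both are our own assumptions):

```python
import numpy as np

# Conditional estimate for pairs with X, Y > c, reusing pi_hat from above.
def conditional_pi_hat(x, y, c):
    return pi_hat(x[x > c], y[y > c])

rng = np.random.default_rng(7)
x = rng.normal(2.0, 1.0, 200)      # group 1: mean 2, sd 1
y = rng.normal(2.0, 0.2, 200)      # group 2: mean 2, sd 0.2
print(pi_hat(x, y))                # close to 0.5 unconditionally
print(conditional_pi_hat(x, y, 2.0))   # far from 0.5, as in the text
```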
5 Simulation Study
We limit the simulation study to some specific alternatives which may be of particular interest, for instance two distributions which show conditional stochastic orderings. As a benchmark comparison we also include some simulations under the location-shift model. Although many tests for the two-sample problem are described in the literature, we only include the Wilcoxon rank sum test and the two-sample t-test. The latter is only considered because it is often (mis)used in practice. All tests are performed at the 5% level of significance and all powers are estimated based on 10,000 simulation runs. Sample sizes of $n_1 = n_2 = 20$ and $n_1 = n_2 = 50$ are considered.
As a first alternative, two normal populations in a location-shift model are considered. The first is standard normal, and the second also has variance 1 and a mean varying from 0 to 1.5 in steps of 0.25. Figure 3 shows the EDF's and profile plots of a random sample of $n_1 = n_2 = 20$ observations. As a second alternative, again two normal distributions are considered. The first is standard normal, and the second has mean 1 and a standard deviation varying from 1 to 3 in steps of 0.5. Figure 4 shows the EDF's and profile plots of a random sample of $n_1 = n_2 = 20$ observations. Finally, two exponential distributions are considered. The first has rate 1, and the second has a rate varying between 1 and 3 in steps of 0.5. Since the rate parameter influences both the mean and the shape in general, the samples from these distributions are corrected for the mean, i.e. the theoretical mean (inverse of the rate) is subtracted from each sample, as in the sketch below. An example is presented in Figure 5.
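A sketch of how one mean-corrected pair of exponential samples can be generated (rate2 ranges over 1, 1.5, ..., 3 in the study):

```python
import numpy as np

rng = np.random.default_rng(8)
n1 = n2 = 20
rate1, rate2 = 1.0, 3.0
# numpy's exponential takes the scale 1/rate; subtract the theoretical mean
# 1/rate so that both corrected samples have mean zero.
x = rng.exponential(1.0 / rate1, n1) - 1.0 / rate1
y = rng.exponential(1.0 / rate2, n2) - 1.0 / rate2
```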
The results of the simulation study are presented in Figures 6, 7 and 8.
For the location-shift alternative, Figure 6 shows that the best powers are obtained with the t-test, very closely followed by the Wilcoxon test. This is as expected, for the simulations were performed under all optimality conditions for the t-test. The powers of the three new tests are smaller, but no power breakdown is observed, and the powers of the new tests are quite similar. Under the second alternative (Figure 7), the powers of the t-test and the Wilcoxon test decrease as the standard deviation of the second normal distribution is increased. The opposite behaviour is observed for the $S_n^{up}$ and $S_n^{tot}$ tests. The power of the $S_n^{low}$ test is generally low. Finally, Figure 8 shows very large powers for the $S_n^{low}$ and $S_n^{tot}$ tests, moderate powers for the $S_n^{up}$ test, low powers for the Wilcoxon test, and virtually no power for the t-test.
Figure 3: Example data set (two normal distributions with variance 1, and means 0 and 1, respectively). Upper left panel: EDF's of the two samples; upper right panel: profile of $T^{up}$; lower left panel: profile of $T^{low}$; lower right panel: profile of $T^{tot}$.
Figure 4: Example data set (two normal distributions with means 0 and 1, and standard deviations 1 and 3, respectively). Upper left panel: EDF's of the two samples; upper right panel: profile of $T^{up}$; lower left panel: profile of $T^{low}$; lower right panel: profile of $T^{tot}$.
Figure 5: Example data set (two exponential distributions with rates 1 and 3, both corrected for the mean). Upper left panel: EDF's of the two samples; upper right panel: profile of $T^{up}$; lower left panel: profile of $T^{low}$; lower right panel: profile of $T^{tot}$.
Figure 6: Estimated powers for the location-shift alternatives for sample sizes n = 20 (left panel) and n = 50 (right panel); power is plotted against the second mean $\mu_2$, with curves for $S^{low}$, $S^{up}$, $S^{tot}$, the t-test and the Wilcoxon test.
Figure 7: Estimated powers for the normal scale alternatives for sample sizes n = 20 (left panel) and n = 50 (right panel); power is plotted against the second standard deviation.
Figure 8: Estimated powers for the exponential alternatives for sample sizes n = 20 (left panel) and n = 50 (right panel); power is plotted against the second rate.
In general, for the alternatives considered, the $S_n^{tot}$ test seems to be a reasonable compromise between the $S_n^{low}$ and $S_n^{up}$ tests.
APPENDIX A: PROOF OF THEOREM 1

First, the weak convergence of $T^{up}(p)$ is shown. Part (b) of the theorem is a direct consequence of part (a) and the continuous mapping theorem.

To prove the weak convergence of $T^{up}(p)$, we first show the weak convergence of the statistic

$$S_n(p) = \sqrt{48}\,\sigma_n(p)\, T^{up}(p) = \frac{\sqrt{48}}{n^{3/2}} \left( W^{up}(p) - \frac{1}{2} N_{1.}^{up}(p)\,(n(1-p)+1) \right),$$

where $\sigma_n(p) = \sqrt{\frac{1}{12 n^3} N_{1.}^{up}(p) N_{2.}^{up}(p)\,(n(1-p)+1)}$. This proof consists of two parts: we need to show (1) finite-dimensional convergence to a multivariate normal distribution, and (2) tightness of $\{S_n\}$. Once weak convergence of $S_n$ is established, we use the Skorohod construction and a strong approximation result for the weighted version $T^{up}$ of $S_n$.

First, the notation is simplified. Let $N_n(p) = N_{1.}^{up}(p)$ and $W_n(p) = W^{up}(p)$.
(1) To prove multivariate normality, it is sufficient to apply the multivariate central limit theorem. It has been shown before that for each $p$, $E\{S_n(p)\} = 0$. Furthermore, as $n \to \infty$, it is easy to show that $\mathrm{Var}\{S_n(p)\} = 48\sigma_n^2(p) \to (1-p)^3$ (we have used $\lim_{n\to\infty} N_n(p)/n = \frac{1}{2}(1-p)$, which is the convergence of a non-random series, for all inference is conditional on $N_n(p)$). We only need to calculate the covariance $\mathrm{Cov}\{S_n(p), S_n(q)\}$. Straightforward, though lengthy, calculations give

$$\lim_{n \to \infty} \mathrm{Cov}\{S_n(p), S_n(q)\} = (1-p)^3 \wedge (1-q)^3, \qquad (12)$$

where $\wedge$ denotes the minimum operator. Equation (12) is exactly the covariance function of a Brownian motion in the transformed time $(1-p)^3$.

(2) Tightness is proven if we succeed in showing that, for all $n \geq 1$ and $p_1 \leq p \leq p_2$,

$$E\left[(S_n(p_1) - S_n(p))^2 (S_n(p) - S_n(p_2))^2\right] \leq (\mu(p) - \mu(p_1))(\mu(p_2) - \mu(p)),$$

where $\mu$ is a continuous measure on $[0, 1]$. This is essentially Theorem 6 in Chapter 2 of Shorack and Wellner (1986), with $a = 1$ and $b = 2$.
By the independence of $S_n(p_1) - S_n(p)$ and $S_n(p) - S_n(p_2)$, we have

$$E\left[(S_n(p_1) - S_n(p))^2 (S_n(p) - S_n(p_2))^2\right] = E\left[(S_n(p_1) - S_n(p))^2\right] E\left[(S_n(p) - S_n(p_2))^2\right].$$

Further, we have

$$E\left[(S_n(p_1) - S_n(p))^2\right] = \mathrm{Var}\{S_n(p_1)\} - 2\,\mathrm{Cov}\{S_n(p_1), S_n(p)\} + \mathrm{Var}\{S_n(p)\}$$
$$\leq \frac{1}{4}(1-p_1)^3 - 2 \cdot \frac{1}{4}(1-p)^3 + \frac{1}{4}(1-p)^3$$
$$\leq (p-1)^3 - (p_1-1)^3 = \mu(p) - \mu(p_1),$$

where $\mu(p) = (p-1)^3$ is a continuous measure on $[0, 1]$. The first inequality sign in the above calculations follows from the observation that the variance of $S_n(p)$ is a decreasing function of $n$. Similarly, $E\left[(S_n(p) - S_n(p_2))^2\right] \leq (p_2-1)^3 - (p-1)^3 = \mu(p_2) - \mu(p)$. Hence,

$$E\left[(S_n(p_1) - S_n(p))^2 (S_n(p) - S_n(p_2))^2\right] \leq (\mu(p) - \mu(p_1))(\mu(p_2) - \mu(p)). \qquad (13)$$

This completes the proof of the weak convergence of $S_n$ to a Brownian motion in the transformed time $(1-p)^3$.
For the Skorohod construction the following strong approximation holds:

$$\sup_{0 \leq p \leq \theta} \left| \frac{S_n(p) - S(\Psi(p))}{\Psi^{1/2}(p)} \right| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty, \qquad (14)$$

provided that $\int_0^\theta \Psi(t)^{-1}\, d\Psi(t) = -\ln(\Psi(\theta)) < \infty$, which holds because $\theta < 1$ is assumed. This strong approximation result may be reformulated as

$$\sup_{0 \leq p \leq \theta} \left| T^{up}(p) - \frac{S((1-p)^3)}{(1-p)^{3/2}} \right| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty, \qquad (15)$$

from which the weak convergence of $T^{up}$ immediately follows.
APPENDIX B: PROOF OF THEOREM 2

From Theorem 1 we know that $T^{up} \xrightarrow{w} \frac{S \circ \Psi^{up}}{(\Psi^{up})^{1/2}}$ and, similarly, $T^{low} \xrightarrow{w} \frac{S \circ \Psi^{low}}{(\Psi^{low})^{1/2}}$, as $n \to \infty$. Both part (a) and part (b) follow immediately, because $T^{tot}$ and $S_n^{tot}$ are continuous functions of $T^{low}$ and $T^{up}$ (continuous mapping theorem).
References

Eubank, R., LaRiccia, V., & Rosenstein, R. (1987). Test statistics derived as components of Pearson's phi-squared distance measure. Journal of the American Statistical Association, 82, 816-825.

Gastwirth, J. (1965). Percentile modification of two sample rank tests. Journal of the American Statistical Association, 60, 1127-1141.

Hollander, M., & Wolfe, D. (1999). Nonparametric statistical methods. New York: Wiley.

Rayner, J., & Best, D. (2001). A contingency table approach to nonparametric testing. New York: Chapman and Hall.

Shorack, G., & Wellner, J. (1986). Empirical processes with applications to statistics. New York: Wiley.

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80-83.