Constructing Confidence Intervals from a Ranked Set Sample Technical Report 783

Christopher J. Sroka, Elizabeth A. Stasny, and Douglas A. Wolfe
Department of Statistics
The Ohio State University
August 5, 2006
Abstract
Ranked set sampling (RSS) is a sampling method whereby auxiliary information is used to rank the units in the population. A unit is included in the
sample based on its rank amongst the other units. The method was first proposed over fifty years ago as a way to increase the precision of estimates of
pasture yields without increasing the sample size. The estimator of the population mean based on balanced RSS has been shown to be unbiased and have
variance no larger than the estimator based on the same number of measured
observations from simple random sampling (SRS). In some cases, RSS can result in more precise estimates than those obtained through stratified simple
random sampling. RSS holds the potential to significantly improve the efficiency of survey sampling.
Despite previous research demonstrating the advantages of RSS, the literature has not addressed how to calculate a confidence interval for the population
mean using RSS. The critical component to computing such a confidence interval is estimating the variances of the judgment order statistics. In this paper,
we develop estimators for these variances and demonstrate some of their desirable properties. Using these properties, we provide a formula for calculating
a large sample confidence interval based on a normal approximation. This
method is extended to the case where RSS is used in place of SRS in stratified sampling. We evaluate the performance of our confidence interval method
using simulated samples from the Medical Expenditure Panel Survey (MEPS).
We find that confidence intervals based on RSS can be over 40% shorter than
competing confidence intervals based on a SRS of equivalent sample size. The
data used for the simulation are heavily skewed, so the true confidence level is
usually less than the nominal confidence level. RSS and SRS perform similarly
with respect to how quickly the levels of their respective confidence intervals
converge to the nominal level.
Key words: Medical Expenditure Panel Survey, ratio estimation, stratified
sampling, variance estimation
1 Introduction
Ranked set sampling (RSS) is a method of data collection whereby the sampler’s
judgment about the relative sizes of the population units determines which units
are selected for measurement. The method was first devised by McIntyre (1952) as
a way to improve estimates of pasture yields. McIntyre wanted to utilize selective
sampling to increase the precision of his estimates without introducing bias. Since
this original paper, RSS (in its basic form) has been shown to be at least as precise
as SRS for estimating the population mean (Dell and Clutter 1972). Kaur, Patil,
Shirk, and Taillie (1996) demonstrated that in some cases, RSS can result in more
precise estimation than stratified simple random sampling (SSRS). Equivalently, fewer
quantified measurements are needed under RSS than under SRS or SSRS to obtain
a specified level of precision. Thus, RSS provides a more cost-effective alternative
to other sampling methods when it is relatively inexpensive to rank units in the
population but costly to quantify the variable of interest. Kaur, et al. (1996), Nahhas,
Wolfe, and Chen (2002), and Wang, Chen, and Liu (2004) developed cost models to
determine when RSS is more efficient than SRS and SSRS.
The most basic form of RSS is balanced RSS. The sampler uses SRS, with
replacement, to select $m^2$ units from the population. These $m^2$ units are divided into
m sets of size m. The units within each set are ranked in increasing order according
to the variable of interest prior to actually measuring this variable. The ranking is
done using some sort of auxiliary information, such as a concomitant variable or visual inspection. The auxiliary information need not be perfectly correlated with the
variable of interest. The next step is to select units from the ranked sets for measurement. A set is randomly chosen without replacement, and from it the smallest ranked
unit is chosen. Another set is selected without replacement, and from it the second
smallest unit is chosen. This process continues until m units have been selected, each
representing a different rank from a different set. The variable of interest is quantified
for this subsample of m units. The units in the sets not included in the subsample are
disregarded, and analysis is carried out using only the measurements on the variable
of interest.
This process yields m measurements on the quantified variable. The process
can be repeated k times, called cycles, resulting in a total sample size of mk. Each
measurement is called a judgment order statistic. We let $X_{[r]i}$ denote the judgment
order statistic representing the $r$th rank in the set taken during the $i$th cycle,
$r = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, k$.
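The sampling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `balanced_rss` and `rss_mean`, and the `rank_key` and `measure` callbacks, are hypothetical and do not come from the report. Because the sets are independent and identically distributed, the sketch simply draws a fresh set for each rank instead of shuffling sets before assigning ranks.

```python
import random

def balanced_rss(population, m, k, rank_key=None, measure=None, rng=None):
    """Draw a balanced ranked set sample (illustrative sketch).

    population : sequence of units
    m, k       : set size and number of cycles
    rank_key   : hypothetical callback giving the auxiliary value used for ranking
    measure    : hypothetical callback that quantifies the variable of interest
    Returns a list of k cycles; entry [i][r] is the measurement on the unit
    judged to have rank r+1 in cycle i+1.
    """
    if rng is None:
        rng = random.Random()
    if rank_key is None:
        rank_key = lambda unit: unit   # perfect ranking when units are numbers
    if measure is None:
        measure = lambda unit: unit
    sample = []
    for _ in range(k):
        cycle = []
        for r in range(m):
            # One fresh set of m units, drawn by SRS with replacement.
            ranked_set = sorted(rng.choices(population, k=m), key=rank_key)
            # Quantify only the unit with judgment rank r+1; discard the rest.
            cycle.append(measure(ranked_set[r]))
        sample.append(cycle)
    return sample

def rss_mean(sample):
    """Simple average of the mk quantified observations."""
    values = [x for cycle in sample for x in cycle]
    return sum(values) / len(values)
```

With imperfect rankings, `population` could hold (auxiliary value, variable of interest) pairs, with `rank_key=lambda u: u[0]` and `measure=lambda u: u[1]`.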
The ranking based on the auxiliary information might not result in the true
ranking that would result if the units had been ranked based on complete knowledge of
the variable of interest. In the case where the two sets of rankings are the same, we say
we have perfect rankings. The term imperfect rankings is used to describe situations
where the rankings based on the auxiliary information differ from the rankings based
on knowledge of the quantified variable. Rankings can be so imperfect as to result in
an ordering that is reversed from the true ranking. If such an imperfect ranking is
applied consistently, it will be no different than taking a RSS under perfect rankings.
The worst case scenario, in terms of the sample obtained, would be if the ranks were
assigned randomly to the units in the sets. In this case, the judgment order statistics
have no more structure to them than a SRS of size mk.
An estimator of the population mean under RSS, $\hat{\mu}$, is the simple average of
the $mk$ quantified observations, namely,
$$
\hat{\mu} = \frac{1}{mk} \sum_{i=1}^{k} \sum_{r=1}^{m} X_{[r]i}. \tag{1}
$$
Even if the rankings are not perfect, Dell and Clutter (1972) show that this estimator
is unbiased if the ranking errors are uncorrelated with the ranking procedure.
RSS has been shown to fit easily into survey sampling settings. Stokes (1977)
showed that concomitant variables, which are frequently available in stratified survey designs, can be used for purposes of ranking. Husby, Stasny, and Wolfe (2005)
demonstrated that estimates of corn production in Ohio can be significantly more
precise under RSS than under SRS. The gains in precision are greater as the rankings
become closer to perfect. Chen, Stasny, and Wolfe (2005) developed a method for
estimating a population proportion via RSS. This method uses logistic regression on a
set of concomitant variables to arrive at the rankings. Sroka, Stasny, and Wolfe (2005)
showed that RSS can be used in place of SRS within strata in stratified sampling to
improve precision.
The RSS literature does not discuss how one would go about computing a
confidence interval for the population mean once a RSS has been obtained. Previous
studies demonstrating the precision of RSS rely on simulations to estimate the large-sample
variance of $\hat{\mu}$. Dell and Clutter (1972) show that the variance of $\hat{\mu}$ is a function
of the variances of the $m$ judgment order statistics. There still remain the dual issues
of how to estimate these variances and how to determine distributional quantiles for
a confidence interval.
In Section 2, we develop an unbiased, consistent estimator for the variance
of the rth judgment order statistic. We use this estimator to derive a large sample
confidence interval based on a balanced RSS. We extend these results in Section 3
to the case where RSS is used in a stratified framework. Section 4 examines the use
of RSS in place of SRS in ratio estimation. In Section 5, we discuss the results of a
simulation study used to evaluate our confidence intervals. We present our conclusions
in Section 6.
2 Confidence Intervals for Balanced RSS
Suppose we collect a balanced RSS from the population of interest using k cycles and
set size m. We do not need to assume that the rankings are perfect. Our first step
in developing a confidence interval for the mean $\mu$ is to determine the variance of the
estimator $\hat{\mu}$. Since only one of our measured quantities is selected from a set, and the
sets are independent simple random samples, all of our observations are independent.
They are not, however, identically distributed, since $X_{[r]i}$ and $X_{[s]i}$, $r \neq s$, represent
different ranks assigned to the units in the sets. Let $f_{[r]}$ denote the distribution of the
$r$th judgment order statistic. Furthermore, let $\mu_{[r]}$ and $\sigma_{[r]}^2$ denote the mean and
variance, respectively, of this distribution. For a fixed $r$, $X_{[r]1}, X_{[r]2}, \ldots, X_{[r]k}$ represent
independent and identically distributed measurements drawn from the distribution
$f_{[r]}$. Thus,
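$$
\operatorname{Var}(\hat{\mu})
= \left(\frac{1}{mk}\right)^{2} \sum_{i=1}^{k} \sum_{r=1}^{m} \operatorname{Var}(X_{[r]i})
= \left(\frac{1}{mk}\right)^{2} \sum_{i=1}^{k} \sum_{r=1}^{m} \sigma_{[r]}^{2}
= \frac{1}{m^{2} k} \sum_{r=1}^{m} \sigma_{[r]}^{2}.
$$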
Let $\bar{X}_{[r]\cdot} = \sum_{i=1}^{k} X_{[r]i}/k$. It is easy to see that this is an unbiased estimator of
$\mu_{[r]}$. We propose the following estimator of $\sigma_{[r]}^2$:
$$
S_{[r]}^{2} = \frac{1}{k-1} \sum_{i=1}^{k} \left(X_{[r]i} - \bar{X}_{[r]\cdot}\right)^{2}. \tag{2}
$$
Note that this is just the usual variance estimator for a SRS from the distribution
f[r] . This is a logical choice given the fact that the judgment order statistics are
independent and their distribution is identical for fixed r.
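As a minimal sketch of Equation 2, the per-rank sample variances can be computed by grouping the measurements by judgment rank. The layout `sample[i][r]` (cycle i, rank r) and the function name `rank_variances` are assumptions made for this illustration only.

```python
from statistics import variance

def rank_variances(sample):
    """Return [S^2_[1], ..., S^2_[m]] for a balanced RSS.

    sample[i][r] is the measurement with judgment rank r+1 in cycle i+1.
    At least two cycles are needed for the k - 1 divisor to be positive.
    """
    k = len(sample)
    if k < 2:
        raise ValueError("need at least two cycles to estimate a variance")
    m = len(sample[0])
    # For fixed r, the k observations are i.i.d., so the usual
    # sample variance (divisor k - 1) applies to each rank separately.
    return [variance([sample[i][r] for i in range(k)]) for r in range(m)]
```

For example, three cycles with set size two, `[[1.0, 10.0], [3.0, 14.0], [5.0, 18.0]]`, give per-rank variances of 4 and 16.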
Since $X_{[r]1}, X_{[r]2}, \ldots, X_{[r]k}$ is a random sample from $f_{[r]}$, it follows immediately
that the properties of $S_{[r]}^2$ are identical to those that hold for $S^2$ in the SRS case.
Thus, $E(S_{[r]}^2) = \sigma_{[r]}^2$ and $S_{[r]}^2 \xrightarrow{p} \sigma_{[r]}^2$ as $k \to \infty$, provided the fourth moment of $f_{[r]}$ is
finite.
The following result allows us to make statements about the convergence of
$S_{[r]}^2$ based solely on the distribution of the population whence our RSS came.
Theorem 1. Let X[1]1 , X[1]2 , . . . , X[m]k be a RSS from a population with distribution
f (·). A sufficient condition for the fourth moment of f[r] to be finite is that the fourth
moment of f (·) is finite.
Proof. Dell and Clutter (1972) noted that
$$
f(x) = \frac{1}{m} \sum_{r=1}^{m} f_{[r]}(x). \tag{3}
$$
The result follows from the Law of Total Probability. The ranking procedure partitions the population into m classes, each with probability of selection 1/m. The
value f[r] (x) is the probability of selecting x given partition r was chosen. Using this
result, we have
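$$
\begin{aligned}
E\left(X_{[r]i}^{4}\right) &< \sum_{r=1}^{m} E\left(X_{[r]i}^{4}\right) \\
&= \sum_{r=1}^{m} \int_{\mathcal{X}} x^{4} f_{[r]}(x)\,dx \\
&= \int_{\mathcal{X}} \sum_{r=1}^{m} x^{4} f_{[r]}(x)\,dx \\
&= \int_{\mathcal{X}} x^{4}\, m f(x)\,dx \\
&= m\,E(X^{4})
\end{aligned}
$$
where $X$ follows the population distribution $f(x)$.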
The last line of the proof holds because the support for the judgment order
statistics is the same as the support for the distribution of the overall population.
This is an important fact that distinguishes judgment order statistics from ordinary
order statistics. Given a ranked set, only one unit is selected from which we obtain
the measured judgment order statistic. Since the sets are independent, there are no
restrictions placed on the values of X[1]i , X[2]i , . . . , X[m]i other than those determined
by the sample space of the population. Conversely, ordinary order statistics are
the ranked observations from a single random sample. Their distribution carries the
constraint $Y_{1:n} \le Y_{2:n} \le \cdots \le Y_{n:n}$, where $Y_{j:n}$ is the $j$th order statistic from a random
sample of size $n$.
We cannot calculate an exact confidence interval for µ without knowledge of
f[r] . We can, however, develop a large sample approximate confidence interval based
on the standard normal distribution. This approach assumes that the number of
cycles, k, goes to infinity while the set size, m, is fixed.
Theorem 2. Let X[1]1 , X[2]1 , . . . , X[m]k be a balanced ranked set sample based on set
size m and conducted for k cycles. Let µ denote the mean of the underlying distribution from which the sample was obtained. As k → ∞, for fixed m,
$$
m\sqrt{k}\,\frac{\hat{\mu} - \mu}{\sqrt{\sum_{r=1}^{m} S_{[r]}^{2}}} \xrightarrow{d} N(0, 1), \tag{4}
$$
where $\hat{\mu}$ and $S_{[r]}^2$ are defined by Equations 1 and 2, respectively.
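Proof. Let $\bar{X}_{[\cdot]i} = \frac{1}{m} \sum_{r=1}^{m} X_{[r]i}$, $i = 1, 2, \ldots, k$. Applying Equation 3, Dell and Clutter
(1972) showed that $\mu = \frac{1}{m} \sum_{r=1}^{m} \mu_{[r]}$. Therefore,
$$
E\left(\bar{X}_{[\cdot]i}\right) = \frac{1}{m} \sum_{r=1}^{m} E\left(X_{[r]i}\right) = \frac{1}{m} \sum_{r=1}^{m} \mu_{[r]} = \mu \tag{5}
$$
and
$$
\operatorname{Var}\left(\bar{X}_{[\cdot]i}\right) = \frac{1}{m^{2}} \sum_{r=1}^{m} \operatorname{Var}\left(X_{[r]i}\right) = \frac{1}{m^{2}} \sum_{r=1}^{m} \sigma_{[r]}^{2}. \tag{6}
$$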
For fixed $m$, the $\bar{X}_{[\cdot]i}$, $i = 1, 2, \ldots, k$, are independent and identically distributed with
mean and variance specified above. Applying the standard Central Limit Theorem,
$$
\sqrt{k}\,\frac{\frac{1}{k} \sum_{i=1}^{k} \bar{X}_{[\cdot]i} - \mu}{\sqrt{\frac{1}{m^{2}} \sum_{r=1}^{m} \sigma_{[r]}^{2}}} \xrightarrow{d} N(0, 1) \tag{7}
$$
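as $k \to \infty$. It follows from the consistency of $S_{[r]}^2$ that
$$
\sum_{r=1}^{m} S_{[r]}^{2} \xrightarrow{p} \sum_{r=1}^{m} \sigma_{[r]}^{2}
\quad \Longrightarrow \quad
\sqrt{\frac{\sum_{r=1}^{m} S_{[r]}^{2}}{\sum_{r=1}^{m} \sigma_{[r]}^{2}}} \xrightarrow{p} 1.
$$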
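We apply Slutsky's Theorem so that
$$
\left(\sqrt{\frac{\sum_{r=1}^{m} S_{[r]}^{2}}{\sum_{r=1}^{m} \sigma_{[r]}^{2}}}\right)^{-1}
\sqrt{k}\,\frac{\frac{1}{k} \sum_{i=1}^{k} \bar{X}_{[\cdot]i} - \mu}{\sqrt{\frac{1}{m^{2}} \sum_{r=1}^{m} \sigma_{[r]}^{2}}}
\xrightarrow{d} N(0, 1). \tag{8}
$$
After simplification, we achieve our result.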
It follows from Theorem 2 that an asymptotic $100(1 - \alpha)\%$ confidence interval
for $\mu$ based on a RSS is
$$
\hat{\mu} \pm z_{\alpha/2}\,\frac{\sqrt{\sum_{r=1}^{m} S_{[r]}^{2}}}{m\sqrt{k}} \tag{9}
$$
where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.
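The interval in Equation 9 is straightforward to compute once the measurements are grouped by rank. The following is a minimal sketch rather than a reference implementation; the function name `rss_confidence_interval` and the `sample[i][r]` layout (cycle i, rank r) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist, variance

def rss_confidence_interval(sample, alpha=0.05):
    """Large-sample interval for the mean from a balanced RSS (Equation 9).

    sample[i][r] is the measurement with judgment rank r+1 in cycle i+1.
    Returns (lower, upper). Requires at least two cycles.
    """
    k = len(sample)
    m = len(sample[0])
    # Equation 1: simple average of all mk measurements.
    mu_hat = sum(sum(cycle) for cycle in sample) / (m * k)
    # Equation 2, applied to each rank separately and then summed.
    s2_sum = sum(variance([sample[i][r] for i in range(k)]) for r in range(m))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(s2_sum) / (m * sqrt(k))
    return mu_hat - half_width, mu_hat + half_width
```

Because the approximation relies on the number of cycles growing with the set size fixed, the interval should be read with caution when k is small.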
3 Extension to Stratified RSS
Sroka, et al. (2005) introduced the concept of stratified ranked set sampling (SRSS).
Under this sampling design, the SRS stage of stratified simple random sampling is
replaced by RSS. This sampling method can result in significant gains in precision of
estimators over stratified simple random sampling (SSRS).
We modify our notation from Section 2 to account for the stratification. Suppose that within each stratum $h = 1, 2, \ldots, H$, we obtain a RSS of set size $m_h$ over $k$
cycles, where $k$ is the same across strata. The sample size from each stratum equals
$m_h k$; the optimal allocation of sample sizes to strata can be achieved by varying $m_h$.
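Let
$$
\begin{aligned}
X_{[r]ih} &= \text{the $r$th judgment order statistic from cycle $i$ in stratum $h$} \\
\sigma_{[r]h}^{2} &= \text{the variance of the $r$th judgment order statistic in stratum $h$} \\
\mu_h &= \text{the mean of stratum $h$} \\
N_h &= \text{the population size of stratum $h$} \\
N &= \sum_{h=1}^{H} N_h = \text{the total population size} \\
\bar{X}_{[\cdot]i\cdot} &= \sum_{h=1}^{H} \frac{N_h}{N} \frac{1}{m_h} \sum_{r=1}^{m_h} X_{[r]ih} = \text{the RSS mean for cycle $i$} \\
\bar{X}_{[r]\cdot h} &= \frac{1}{k} \sum_{i=1}^{k} X_{[r]ih} = \text{the mean of the $r$th judgment order statistic for stratum $h$} \\
\hat{\mu}_h &= \frac{1}{m_h k} \sum_{r=1}^{m_h} \sum_{i=1}^{k} X_{[r]ih} = \text{the RSS mean for stratum $h$} \\
\hat{\mu}_{SRSS} &= \frac{1}{k} \sum_{i=1}^{k} \bar{X}_{[\cdot]i\cdot} = \sum_{h=1}^{H} \frac{N_h}{N} \hat{\mu}_h = \text{the SRSS mean} \\
S_{[r]h}^{2} &= \frac{1}{k-1} \sum_{i=1}^{k} \left(X_{[r]ih} - \bar{X}_{[r]\cdot h}\right)^{2} = \text{the sample variance for the $r$th judgment order statistic for stratum $h$.}
\end{aligned}
$$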
We can develop a large sample confidence interval for µ under SRSS using
a method similar to that used for the ordinary RSS case. For SRSS, we allow k,
the number of cycles of RSS performed in each stratum, to get large, while the
number of strata and the set sizes in each stratum remain fixed. The quantities
$\bar{X}_{[\cdot]i\cdot}$, $i = 1, 2, \ldots, k$, are independent and identically distributed random variables
such that
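$$
\begin{aligned}
E\left(\bar{X}_{[\cdot]i\cdot}\right)
&= \sum_{h=1}^{H} \frac{N_h}{N} \frac{1}{m_h} \sum_{r=1}^{m_h} E\left(X_{[r]ih}\right) \\
&= \sum_{h=1}^{H} \frac{N_h}{N} \frac{1}{m_h} \sum_{r=1}^{m_h} \mu_{[r]h} \\
&= \sum_{h=1}^{H} \frac{N_h}{N} \mu_h \\
&= \mu
\end{aligned}
$$
and
$$
\begin{aligned}
\operatorname{Var}\left(\bar{X}_{[\cdot]i\cdot}\right)
&= \sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \operatorname{Var}\left(X_{[r]ih}\right) \\
&= \sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \sigma_{[r]h}^{2}.
\end{aligned}
$$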
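By the Central Limit Theorem,
$$
\sqrt{k}\,\frac{\frac{1}{k} \sum_{i=1}^{k} \bar{X}_{[\cdot]i\cdot} - \mu}{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \sigma_{[r]h}^{2}}} \xrightarrow{d} N(0, 1) \tag{10}
$$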
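as $k \to \infty$. By the consistency of $S_{[r]h}^2$,
$$
\frac{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} S_{[r]h}^{2}}}{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} \sigma_{[r]h}^{2}}} \xrightarrow{p} 1 \tag{11}
$$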
as k → ∞. After dividing the expression in Equation 10 by the expression in Equation
11, applying Slutsky’s Theorem, and simplifying, we get the pivotal quantity
$$
\sqrt{k}\,\frac{\hat{\mu}_{SRSS} - \mu}{\sqrt{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} S_{[r]h}^{2}}} \tag{12}
$$
which has an approximately standard normal distribution for large $k$. Thus, from a SRSS, the approximate $100(1 - \alpha)\%$ confidence interval for the population mean is given by the
following formula:
$$
\hat{\mu}_{SRSS} \pm z_{\alpha/2} \sqrt{\frac{\sum_{h=1}^{H} \left(\frac{N_h}{N m_h}\right)^{2} \sum_{r=1}^{m_h} S_{[r]h}^{2}}{k}} \tag{13}
$$
where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.
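Equation 13 can be sketched in the same way as the unstratified interval. The input format below, a list of `(N_h, sample_h)` pairs with `sample_h[i][r]` holding cycle i and rank r in stratum h, and the name `srss_confidence_interval` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist, variance

def srss_confidence_interval(strata, alpha=0.05):
    """Large-sample interval for the mean under stratified RSS (Equation 13).

    strata : list of (N_h, sample_h); every stratum must use the same
             number of cycles k (at least two), set sizes m_h may differ.
    """
    N = sum(N_h for N_h, _ in strata)
    k = len(strata[0][1])
    mu_hat = 0.0
    var_sum = 0.0
    for N_h, sample_h in strata:
        if len(sample_h) != k:
            raise ValueError("all strata must share the same number of cycles")
        m_h = len(sample_h[0])
        # RSS mean for stratum h, weighted by its population share.
        mu_h = sum(sum(cycle) for cycle in sample_h) / (m_h * k)
        mu_hat += (N_h / N) * mu_h
        # Sum of per-rank sample variances within stratum h.
        s2_h = sum(variance([sample_h[i][r] for i in range(k)])
                   for r in range(m_h))
        var_sum += (N_h / (N * m_h)) ** 2 * s2_h
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(var_sum / k)
    return mu_hat - half_width, mu_hat + half_width
```

With a single stratum the expression reduces to the interval of Equation 9, which gives a convenient sanity check.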
4 Ratio Estimation
In this section, we discuss a ratio estimator for the population mean when the data
were obtained using RSS. We compare the accuracy of this estimator to the corresponding estimator for SRS. Since ratio estimators are known to be biased, we compare the RSS estimator to the SRS estimator using the mean squared error (MSE).
Let $X$ and $Y$ denote two correlated random variables with respective means
$\mu_X$ and $\mu_Y$. We assume $\mu_X > 0$ and $\mu_Y > 0$, as is typical for most variables of interest
in a survey sample. If we know the value of $\mu_X$, we can use it to estimate $\mu_Y$ (Lohr
1999). Let $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, denote measurements of $X$ and $Y$ from a SRS of $n$
units from the population. The ratio estimator of $\mu_Y$, denoted $\tilde{\mu}_{Y,SRS}$, is
$$
\tilde{\mu}_{Y,SRS} = \frac{\bar{Y}}{\bar{X}}\,\mu_X \tag{14}
$$
where $\bar{Y}$ and $\bar{X}$ are the sample averages of the $Y_i$ and $X_i$ values, respectively. The
MSE of $\tilde{\mu}_{Y,SRS}$ is approximately equal to the following quantity:
$$
MSE(\tilde{\mu}_{Y,SRS}) = \frac{\mu_Y^{2}}{n \mu_X^{2}} \left[ \frac{\sigma_Y^{2}}{\mu_Y^{2}} + \frac{\sigma_X^{2}}{\mu_X^{2}} - \frac{2\operatorname{Cov}(X, Y)}{\mu_X \mu_Y} \right] \tag{15}
$$
where $\sigma_X^2$ and $\sigma_Y^2$ are the variances of $X$ and $Y$, respectively. Note that the MSE
decreases as the covariance between $X$ and $Y$ becomes large and positive.
We consider the case of using ratio estimation when the (X, Y ) pairs are
obtained via RSS. The sampling procedure is similar to that of the univariate case.
The sampler identifies m sets of m units from the population. For now, we assume
the units within each set are ranked based on the sampler’s perceived value of the
Y variable; we will see later, however, that one could instead rank based on the
perceived X value of the units. For the unit selected from a set for measurement,
both the Y variable and the X variable are measured. The process is repeated k
times. Thus, our data from this process are the mk pairs (X[r]i , Y[r]i ), r = 1, 2, . . . , m,
and i = 1, 2, . . . , k.
Although the ranking is done based on perceived Y values, the realizations
of the X variable are also considered judgment order statistics. The observations
on X were indirectly ranked by virtue of their correlation (if any) with the ranking
procedure. The ranking procedure produces a de facto ranking of X based on the
ranker’s perception of Y for each of the $m^2$ units in the initial sample. The rankings of
the X values will become more accurate as the correlation between X and Y increases
and the ranking of Y becomes more accurate.
Both of the judgment order statistics, $X_{[r]i}$ and $Y_{[r]i}$, follow particular distributions based on the ranking procedure. The mean and variance of the $r$th judgment
order statistic related to $X$ are denoted $\mu_{X[r]}$ and $\sigma_{X[r]}^2$, respectively. The corresponding
quantities related to $Y$ are denoted $\mu_{Y[r]}$ and $\sigma_{Y[r]}^2$.
Assume $\mu_X$, the population mean of $X$, is known. Then a ratio estimator of
$\mu_Y$ based on RSS is
$$
\tilde{\mu}_{Y,RSS} = \frac{\bar{Y}_{RSS}}{\bar{X}_{RSS}}\,\mu_X \tag{16}
$$
where $\bar{Y}_{RSS}$ is the sample average of the $mk$ $Y_{[r]i}$ values and $\bar{X}_{RSS}$ is the sample
average of the $mk$ $X_{[r]i}$ values.
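As a small illustration of Equation 16, the ratio estimator only needs the two sample averages and the known mean of X. The function name `ratio_estimate_rss` and the layout of `pairs` (a list of cycles, each a list of `(x, y)` tuples) are hypothetical choices for this sketch.

```python
def ratio_estimate_rss(pairs, mu_x):
    """Ratio estimator of the mean of Y from RSS pairs (Equation 16).

    pairs : list of k cycles, each a list of m (x, y) measurements
    mu_x  : known population mean of X
    """
    xs = [x for cycle in pairs for x, _ in cycle]
    ys = [y for cycle in pairs for _, y in cycle]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    if x_bar == 0:
        raise ZeroDivisionError("sample mean of X is zero; ratio is undefined")
    # Scale the sample ratio by the known population mean of X.
    return y_bar / x_bar * mu_x
```

For instance, if every measured pair satisfies y = 2x and the known mean of X is 5, the estimate is 10 regardless of which units were selected.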
The MSE of $\tilde{\mu}_{Y,RSS}$ depends on the covariance between $\bar{Y}_{RSS}$ and $\bar{X}_{RSS}$. This
covariance can be expressed in terms of the covariance between $X$ and $Y$ for the entire
population.
Theorem 3. Suppose we obtain observations $(X_{[r]i}, Y_{[r]i})$, $r = 1, 2, \ldots, m$, $i = 1, 2, \ldots, k$,
from a population using RSS. Then
$$
\operatorname{Cov}(\bar{X}_{RSS}, \bar{Y}_{RSS}) = \frac{1}{mk} \operatorname{Cov}(X, Y) + \frac{1}{mk} \mu_X \mu_Y - \frac{1}{m^{2} k} \sum_{r=1}^{m} \mu_{X[r]} \mu_{Y[r]}. \tag{17}
$$
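Proof.
$$
\begin{aligned}
\operatorname{Cov}(\bar{X}_{RSS}, \bar{Y}_{RSS})
&= \operatorname{Cov}\left( \frac{1}{mk} \sum_{r=1}^{m} \sum_{i=1}^{k} X_{[r]i},\; \frac{1}{mk} \sum_{r=1}^{m} \sum_{i=1}^{k} Y_{[r]i} \right) \\
&= \left(\frac{1}{mk}\right)^{2} \left[ \sum_{r=1}^{m} \sum_{i=1}^{k} \operatorname{Cov}(X_{[r]i}, Y_{[r]i}) + \mathop{\sum\sum}_{r \neq s \text{ or } i \neq j} \operatorname{Cov}(X_{[r]i}, Y_{[s]j}) \right] \\
&= \left(\frac{1}{mk}\right)^{2} \sum_{r=1}^{m} \sum_{i=1}^{k} \operatorname{Cov}(X_{[r]i}, Y_{[r]i}) \\
&= \left(\frac{1}{mk}\right)^{2} k \sum_{r=1}^{m} \operatorname{Cov}(X_{[r]1}, Y_{[r]1})
\end{aligned} \tag{18}
$$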
The penultimate line holds because the judgment order statistics are obtained from
independent samples. Consequently, $X_{[r]i}$ and $Y_{[s]j}$ are independent whenever $r \neq s$
or $i \neq j$. The last line follows from the fact that $(X_{[r]i}, Y_{[r]i})$ and $(X_{[r]j}, Y_{[r]j})$ are
identically distributed for all $1 \le i, j \le k$, $i \neq j$. The ranking procedure partitions
the joint distribution of $X$ and $Y$, $f(x, y)$, into $m$ classes. Thus, by the Law of Total
Probability,
$$
f(x, y) = \frac{1}{m} \sum_{r=1}^{m} f_{[r,r]}(x, y) \tag{19}
$$
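where $f_{[r,r]}$ is the joint distribution of $X_{[r]1}$ and $Y_{[r]1}$. It follows that
$$
\begin{aligned}
\sum_{r=1}^{m} \operatorname{Cov}(X_{[r]1}, Y_{[r]1})
&= \sum_{r=1}^{m} E(X_{[r]1} Y_{[r]1}) - \sum_{r=1}^{m} \mu_{X[r]} \mu_{Y[r]} \\
&= \sum_{r=1}^{m} \int_{\mathcal{X}} \int_{\mathcal{Y}} xy\, f_{[r,r]}(x, y)\,dy\,dx - \sum_{r=1}^{m} \mu_{X[r]} \mu_{Y[r]} \\
&= m\,E(XY) - \sum_{r=1}^{m} \mu_{X[r]} \mu_{Y[r]}
\end{aligned} \tag{20}
$$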
Our final result is obtained by substituting the expression in Equation 20 into the
expression in Equation 18 and replacing E(XY ) by Cov(X, Y ) + µX µY .
Using the above result, we provide a simplified expression for the MSE of
the ratio estimator under RSS. The MSE for the RSS estimator is a function of the
MSE for the SRS estimator, so the relative magnitudes of the two quantities can be
compared easily.
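Theorem 4. The ratio estimator under RSS (Equation 16) has the following approximate mean squared error:
$$
MSE(\tilde{\mu}_{Y,RSS}) \approx MSE(\tilde{\mu}_{Y,SRS}) - \frac{1}{m^{2} k} \left(\frac{\mu_Y}{\mu_X}\right)^{2} \sum_{r=1}^{m} \left( \frac{\mu_{Y[r]}}{\mu_Y} - \frac{\mu_{X[r]}}{\mu_X} \right)^{2}. \tag{21}
$$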
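Proof. The ratio of $\bar{Y}_{RSS}$ to $\bar{X}_{RSS}$ can be approximated using a Taylor series expansion.
$$
\frac{\bar{Y}_{RSS}}{\bar{X}_{RSS}} \approx \frac{\mu_Y}{\mu_X} + \frac{1}{\mu_X} (\bar{Y}_{RSS} - \mu_Y) - \frac{\mu_Y}{\mu_X^{2}} (\bar{X}_{RSS} - \mu_X) \tag{22}
$$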
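From this approximation, we get
$$
\begin{aligned}
MSE(\tilde{\mu}_{Y,RSS})
&= E\left[ \left( \frac{\bar{Y}_{RSS}}{\bar{X}_{RSS}} - \frac{\mu_Y}{\mu_X} \right)^{2} \right] \\
&\approx E\left[ \left( \frac{1}{\mu_X} (\bar{Y}_{RSS} - \mu_Y) - \frac{\mu_Y}{\mu_X^{2}} (\bar{X}_{RSS} - \mu_X) \right)^{2} \right] \\
&= \left(\frac{\mu_Y}{\mu_X}\right)^{2} E\left[ \frac{(\bar{Y}_{RSS} - \mu_Y)^{2}}{\mu_Y^{2}} + \frac{(\bar{X}_{RSS} - \mu_X)^{2}}{\mu_X^{2}} - \frac{2(\bar{X}_{RSS} - \mu_X)(\bar{Y}_{RSS} - \mu_Y)}{\mu_X \mu_Y} \right] \\
&= \left(\frac{\mu_Y}{\mu_X}\right)^{2} \left[ \frac{\operatorname{Var}(\bar{Y}_{RSS})}{\mu_Y^{2}} + \frac{\operatorname{Var}(\bar{X}_{RSS})}{\mu_X^{2}} - \frac{2\operatorname{Cov}(\bar{X}_{RSS}, \bar{Y}_{RSS})}{\mu_X \mu_Y} \right].
\end{aligned} \tag{23}
$$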
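The covariance of $\bar{X}_{RSS}$ and $\bar{Y}_{RSS}$ was demonstrated in Theorem 3. From Dell and
Clutter (1972), we have
$$
\operatorname{Var}(\bar{X}_{RSS}) = \frac{1}{mk} \sigma_X^{2} - \frac{1}{m^{2} k} \sum_{r=1}^{m} (\mu_{X[r]} - \mu_X)^{2} \tag{24}
$$
$$
\operatorname{Var}(\bar{Y}_{RSS}) = \frac{1}{mk} \sigma_Y^{2} - \frac{1}{m^{2} k} \sum_{r=1}^{m} (\mu_{Y[r]} - \mu_Y)^{2}. \tag{25}
$$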
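Using the expressions in Equations 17, 24, and 25 in Equation 23 and setting aside
the terms that contain the subscript $r$, we obtain
$$
MSE(\tilde{\mu}_{Y,RSS}) \approx MSE(\tilde{\mu}_{Y,SRS}) - \left(\frac{\mu_Y}{\mu_X}\right)^{2} \left[ \frac{\sum_{r=1}^{m} (\mu_{X[r]} - \mu_X)^{2}}{m^{2} k \mu_X^{2}} + \frac{\sum_{r=1}^{m} (\mu_{Y[r]} - \mu_Y)^{2}}{m^{2} k \mu_Y^{2}} + 2\left( \frac{1}{mk} - \frac{\sum_{r=1}^{m} \mu_{X[r]} \mu_{Y[r]}}{m^{2} k \mu_X \mu_Y} \right) \right].
$$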
After expanding the squared terms, simplifying, and factoring into a single squared
term, we obtain the desired result.
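The final algebraic step can be sketched as follows. The shorthand $a_r = \mu_{X[r]}/\mu_X$ and $b_r = \mu_{Y[r]}/\mu_Y$ is introduced only for this sketch and is not notation from the proof. The step uses $\sum_{r} a_r = \sum_{r} b_r = m$, which follows from applying $\mu = \frac{1}{m}\sum_{r} \mu_{[r]}$ to $X$ and to $Y$. Multiplying the RSS-specific terms by $m^2 k$ gives

```latex
\begin{align*}
\sum_{r=1}^{m} (a_r - 1)^2 + \sum_{r=1}^{m} (b_r - 1)^2 + 2m - 2\sum_{r=1}^{m} a_r b_r
  &= \Bigl(\sum_{r=1}^{m} a_r^2 - m\Bigr) + \Bigl(\sum_{r=1}^{m} b_r^2 - m\Bigr)
     + 2m - 2\sum_{r=1}^{m} a_r b_r \\
  &= \sum_{r=1}^{m} a_r^2 + \sum_{r=1}^{m} b_r^2 - 2\sum_{r=1}^{m} a_r b_r \\
  &= \sum_{r=1}^{m} (b_r - a_r)^2 .
\end{align*}
```

Dividing by $m^2 k$ and carrying along the factor $(\mu_Y/\mu_X)^2$ from Equation 23 gives the subtracted term in Equation 21.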
Theorem 4 provides important insight into the accuracy of the ratio estimator under RSS relative to the accuracy of the estimator under SRS. First, the RSS
estimator has MSE that is no greater than the MSE for the SRS estimator. The
MSE under RSS will be smallest relative to the MSE under SRS when the quantity
$\mu_{Y[r]}/\mu_Y$ is considerably different from the quantity $\mu_{X[r]}/\mu_X$ for all $r = 1, 2, \ldots, m$.
This situation would arise, for instance, when the ranking procedure produces a perfect ordering of the Y variable, but it produces an ordering of the X variable that is
exactly opposite the true ordering. Furthermore, the MSE for the RSS estimator is
invariant with respect to which variable formed the basis for the judgment rankings
in RSS. We assumed in our previous discussion that the sampler would rank the sets
based on perceived values of Y . We see from Equation 21 that the MSE is not altered
if the sampler ranked the sets based on judgments about X.
For RSS to achieve a worthwhile reduction in MSE over SRS, some care must
be used when choosing a ranking variable. The above result implies that the ranking
variable should not be highly correlated with both X and Y in the same direction.
Ideally, the ranking variable would be highly correlated with Y (either positively or
negatively) and have an opposite and strong correlation with X. Ratio estimation
exploits the information contained in the relationship between X and Y . For RSS to
be advantageous, the ranking must utilize information about Y that is not already
accounted for in X.
One can compute an approximate confidence interval for the ratio µY /µX
under RSS using our approach in Section 2 and techniques similar to those used
under SRS (Lohr 1999).
5 Simulation
5.1 Study
We conducted a simulation study to evaluate how often our confidence intervals for
RSS and SRSS contain the population mean. We used data from the Medical Expenditure Panel Survey (MEPS). MEPS is a rotating panel survey of households and
medical care providers conducted by the U.S. Agency for Healthcare Research and
Quality. The data collected from the survey are used to estimate how much health
care Americans use and the amounts they pay for it. Our simulation utilizes the 2002
consolidated Household Component. This dataset consists of responses provided by
all households in the various panels for the entire year. Data are collected for every
individual in the household.
We collapsed the MEPS Household Component to give us totals by household.
We also removed records for which a negative income was reported. The resulting
dataset consists of 14,686 records. We treat the dataset as though it were an actual
population from which to sample. Therefore, the exact survey design and survey
weights corresponding to the MEPS records are irrelevant for the simulation. The
14,686 households are the entire population, and the mean of these households on each
variable of interest is the true population mean we wish to estimate. The quantified
variables for our simulation are total prescription drug expenditures, logarithm of the
quantity (total prescription drug expenditures + 1), total health expenditures paid
by insurance on behalf of the household, and total household health expenditures.
The simulation study consisted of two parts. In the first part, we compared
the RSS confidence interval with the SRS confidence interval. We took 10,000 samples from the population using each sampling design. All sampling was done with
replacement. The appropriate formula was applied to each sample to compute a 95%
confidence interval for the population mean. We calculated the average length of the
10,000 intervals and the percentage of intervals that contained the true mean. For
RSS, the simulation was conducted using set sizes of 2, 5, and 10 and enough cycles to
yield sample sizes of 20, 50, 150, 500, and 2,500 for each set size. The SRS simulation
utilized the same sample sizes.
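The structure of this first comparison can be sketched as a small Monte Carlo loop. The code below is an illustration under simplifying assumptions, not the code used for the study: it ranks perfectly (sets are sorted on the measured value itself), it expects any numeric list as the population instead of the MEPS data, and the name `simulate_coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def simulate_coverage(population, m, k, reps=1000, alpha=0.05, seed=0):
    """Estimate coverage and average length of the RSS and SRS intervals.

    Returns {"RSS": (coverage, avg_length), "SRS": (coverage, avg_length)}.
    Requires k >= 2 so that the per-rank variances exist.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu = mean(population)   # treated as the true mean of the finite population
    n = m * k
    hits = {"RSS": 0, "SRS": 0}
    length = {"RSS": 0.0, "SRS": 0.0}
    for _ in range(reps):
        # by_rank[r] holds the k measurements with judgment rank r+1,
        # each taken from its own set of m units drawn with replacement.
        by_rank = [[sorted(rng.choices(population, k=m))[r] for _ in range(k)]
                   for r in range(m)]
        est = sum(sum(col) for col in by_rank) / n
        half = z * sqrt(sum(variance(col) for col in by_rank)) / (m * sqrt(k))
        hits["RSS"] += abs(est - mu) <= half
        length["RSS"] += 2 * half

        # SRS of the same total size, with the usual normal-theory interval.
        srs = rng.choices(population, k=n)
        est = mean(srs)
        half = z * sqrt(variance(srs) / n)
        hits["SRS"] += abs(est - mu) <= half
        length["SRS"] += 2 * half
    return {d: (hits[d] / reps, length[d] / reps) for d in hits}
```

A call such as `simulate_coverage(pop, m=5, k=10)` on a synthetic skewed population (for example, values generated with `random.lognormvariate`) would mimic one cell of the design above, although the numbers would not be comparable to those from MEPS.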
The second part of the simulation study compared SRSS to SSRS. The population was divided into three strata of equal sizes based on the values of the ranking
variable. For the SRSS design, we sampled from each stratum using RSS; the set size
per stratum, mh , was fixed at 5 for all strata. We conducted the simulation when
the number of cycles of RSS used in each stratum was 5, 10, 50, 100, and 250. This
resulted in total sample sizes of 75, 150, 750, 1,500, and 3,750, respectively. For the
SSRS simulation, we sampled equal numbers of observations from each stratum via
SRS such that the total sample sizes would be the same as those for the SRSS simulation. As in the first part of the study, we obtained 10,000 samples and computed a
95% confidence interval for the population mean from each one. We then calculated
the average length of the intervals and the percentage of intervals containing the true
mean.
Both parts of the simulation study were conducted using various combinations
of ranking variables and variables of interest. The accuracy of the rankings in RSS
can be quantified using the Spearman rank correlation coefficient between the quantified and ranking variables. A higher value of the coefficient corresponds to more
accurate rankings. Table 1 indicates the variables used in the simulations and the
rank correlations between them.
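For reference, a Spearman rank correlation can be computed as the Pearson correlation of the ranks. The sketch below assigns average ranks to ties, which matters for expenditure data containing many zeros; the helper names `average_ranks` and `spearman` are assumptions for illustration, and the report does not state how ties were handled.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rank correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)
```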
All of the quantified variables used in the simulation measured some type of
expenditure. The distributions of these variables were skewed heavily to the right,
and a significant number of observations had values of zero. To simulate a case with a
more symmetric distribution, we added 1 to the total drug expenditure variable and
took the natural logarithm.
5.2 Results
The results of our simulation study are presented in Table 2 through Table 9. In
almost all of our simulations comparing RSS to SRS, the RSS interval was, on average,
shorter than the SRS interval from the same size sample. The differences in interval
length were largest for the case where the log of the quantified variable and the
ranking variable were highly correlated (see Table 2). In this case, when the number
of sets used in RSS was ten, the decrease in the length of the interval, as a percent
of the interval length under SRS, ranged from 42% to 44%, depending on the total
sample size. For our simulations comparing SRSS to SSRS, the SRSS intervals were,
on average, always shorter.
In most of our simulations, the percent of intervals containing the population
mean did not come close to 95% for either RSS or SRS (see Tables 3, 4, 5, 7, 8,
and 9). This result is not surprising given the skewed distributions of the quantified
variables. For the simulation where we used the natural logarithm (see Table 2), the
RSS resulted in close to 95% coverage when the number of cycles was as low as 15. It
is important to note that the percent of intervals containing the mean under the RSS
simulations did not differ substantially from the percent containing the mean under
the SRS simulations of equivalent sample size. Thus, the asymmetrical nature of the
distributions of the variables is affecting the RSS procedure to the same degree that
it affects the SRS procedure.
5.3 Discussion and Extensions
In one simulation, the RSS intervals were, on average, longer than the SRS intervals.
This occurred when the quantified variable and the ranking variable had a low rank
correlation coefficient (see Table 5). In fact, 10 out of the 15 average interval lengths
for RSS provided in Table 5 are longer than the corresponding average lengths for SRS.
We conducted a follow-up simulation study for the case where the rank correlation
coefficient between the variables was low. In the follow-up study, we repeated the
study as described above except that 100,000 iterations were used. The results of
the follow-up study are shown in Table 10. Only four of the 15 average interval
lengths for RSS were longer than the corresponding average interval lengths from
SRS. Moreover, the differences were negligible for three of these reported lengths;
the percent difference was less than 0.05%. The largest difference (0.173%) occurred
when the set size was 2 and the total sample size was 20. For small set sizes, and low
correlation between the variables, we would not expect RSS to perform much better
than SRS. We conclude that variability in the simulation process led to the average
lengths under RSS being longer than the average lengths for SRS.
We note that it may be inappropriate to compare the asymptotic behavior of
the RSS approximate confidence interval to that of the SRS approximate interval at
identical sample sizes. The behavior of the RSS interval relies on the number of cycles,
k, increasing to infinity. Conversely, the approximate interval for SRS depends on the
total sample size, n = mk, being large. Thus, at a fixed sample size, the asymptotic
behavior of the SRS interval is more established than the asymptotic behavior of the
RSS interval.
A simulation study involving ratio estimation is not presented here, but our
methods could easily be modified to accommodate one. A possible variable of interest
is the amount of health expenditures paid by private insurance. Total household
health expenditures could be used as the X variable in the ratio estimation, since
it is moderately correlated with the variable of interest (Spearman rank correlation
is 0.522). As a ranking variable, one could use household income; it is moderately
correlated with the variable of interest (Spearman rank correlation is 0.433) but is
weakly correlated with the total health expenditures (Spearman rank correlation is
0.116). The conditions for RSS to improve upon ratio estimation, discussed in Section
4, would be met. The approximate confidence interval for the ratio estimator under
RSS can be calculated by applying the methods from Section 2 in conjunction with
approximations provided by Lohr (1999).
6 Conclusions
In this paper, we introduced an estimator of the variance of the rth judgment order
statistic. We demonstrated desirable properties of this estimator that allow us to
compute approximate confidence intervals for the mean of a population using a RSS.
These results were extended to the case where RSS is conducted within a stratified
sampling framework.
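The interval summarized above can be sketched in a few lines. The sketch rests on assumptions that are not restated in this section: the variance of the rth judgment order statistic is estimated by the sample variance of the k measurements taken at rank r, the variance of the RSS mean is estimated by the sum of these estimates divided by m^2 k, sets are drawn independently, and no finite population correction is applied. The synthetic population, the function names, and the ranking variable are hypothetical and serve only as illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance


def draw_ranked_set_sample(population, m, k, rng):
    """Balanced RSS from a list of (ranking_value, y_value) pairs.

    Returns m lists; list r holds the k measured y values whose units
    were judged to have rank r + 1 within their set of size m.
    """
    by_rank = [[] for _ in range(m)]
    for _ in range(k):
        for r in range(m):
            candidates = rng.sample(population, m)      # one set of m units
            candidates.sort(key=lambda unit: unit[0])   # rank on the auxiliary variable
            by_rank[r].append(candidates[r][1])         # measure only the rth ranked unit
    return by_rank


def rss_confidence_interval(by_rank, level=0.95):
    """Large-sample normal interval for the mean (assumed estimator, see lead-in)."""
    m = len(by_rank)
    k = len(by_rank[0])
    estimate = mean(mean(values) for values in by_rank)
    # Per-rank sample variances across cycles; each needs k >= 2.
    var_hat = sum(variance(values) for values in by_rank) / (m * m * k)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(var_hat)
    return estimate - half_width, estimate + half_width


# Hypothetical skewed population with a noisy ranking variable.
rng = random.Random(783)
population = []
for _ in range(5000):
    y = rng.lognormvariate(0.0, 1.0)
    x = y + rng.gauss(0.0, 0.5)
    population.append((x, y))

by_rank = draw_ranked_set_sample(population, m=5, k=40, rng=rng)
low, high = rss_confidence_interval(by_rank)
print(f"95% interval from n = 200 measured units: ({low:.3f}, {high:.3f})")
```

Because the variance estimate is built from k observations per rank, its quality depends on the number of cycles rather than on the total sample size.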
When the ranking procedure provides reasonable structure to the observations, the confidence interval based on RSS is shorter than the corresponding interval
based on SRS. In our simulation study, we obtained intervals based on RSS that were
sometimes over 40% shorter than their SRS counterparts. The largest gains in precision were achieved when the number of sets used for RSS was relatively high. If the
ranking procedure produces a result similar to a random ordering of the observations,
however, the RSS confidence interval may be longer than the SRS interval. In our simulations, the difference between the average RSS interval length and the average SRS interval length
in such cases was negligible and is likely due to variability in the simulation process.
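The size of the gain can be checked against the reported averages. The sketch below assumes that the "over 40%" figure refers to the average lengths in Table 2, the setting with the strongest ranking variable; the helper name percent_shorter is illustrative.

```python
# Average interval lengths copied from Table 2.
# Outer key: number of sets m; inner key: total sample size n.
rss_length = {
    2: {20: 2.089, 50: 1.332, 150: 0.772, 500: 0.423, 2500: 0.189},
    5: {20: 1.643, 50: 1.054, 150: 0.611, 500: 0.335, 2500: 0.150},
    10: {20: 1.366, 50: 0.888, 150: 0.515, 500: 0.283, 2500: 0.127},
}
srs_length = {20: 2.446, 50: 1.553, 150: 0.899, 500: 0.493, 2500: 0.220}


def percent_shorter(rss_len, srs_len):
    """Percent reduction in average length of the RSS interval relative to SRS."""
    return 100.0 * (1.0 - rss_len / srs_len)


reduction = {
    (m, n): percent_shorter(length, srs_length[n])
    for m, row in rss_length.items()
    for n, length in row.items()
}

for m in sorted(rss_length):
    row = ", ".join(f"n = {n}: {reduction[(m, n)]:.1f}%" for n in sorted(srs_length))
    print(f"m = {m:2d} -> {row}")
```

In these figures the reduction exceeds 40% only for m = 10, which agrees with the observation that the largest gains occur when the number of sets is relatively high.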
As with SRS, asymmetry in the distribution of the variable of interest can
slow convergence to the nominal confidence level. In one severe case, only about 91%
of our simulated 95% confidence intervals contained the true mean, even though the
number of cycles of RSS surpassed 1,200. Simulated confidence intervals using SRS
fared no better, however. Thus, we conclude that RSS does not facilitate or hinder
convergence to the nominal coverage probability.
This paper raises an important question about how one should allocate a fixed
sample size, mk, under RSS. The largest gains in precision over SRS are achieved by
using a large number of sets, m. However, one also wants a large number of cycles, k,
to ensure that the attained confidence level is close to the nominal level. A topic for
future research is whether some method can be developed to ascertain the optimum
choice of cycles and number of sets that accounts for this tradeoff.
References
[1] Agency for Healthcare Research and Quality (2002), “Medical Expenditure Panel Survey,” www.meps.ahrq.gov.
[2] Chen, H., Stasny, E. A., and Wolfe, D. A. (2005), “Ranked Set Sampling for Efficient
Estimation of a Population Proportion,” Statistics in Medicine, 24, 3319-3329.
[3] Dell, T. R. and Clutter, J. L. (1972), “Ranked Set Sampling Theory With Order
Statistics Background,” Biometrics, 28, 545-555.
[4] Husby, C. E., Stasny, E. A., and Wolfe, D. A. (2005), “An Application of Ranked
Set Sampling for Mean and Median Estimation Using USDA Crop Production
Data,” Journal of Agricultural, Biological, and Environmental Statistics, 10, 354-373.
[5] Kaur, A., Patil, G. P., Shirk, S. J., and Taillie, C. (1996), “Environmental Sampling
With a Concomitant Variable: A Comparison Between Ranked Set Sampling and
Stratified Simple Random Sampling,” Journal of Applied Statistics, 23, 231-255.
[6] Lohr, S. L. (1999), Sampling: Design and Analysis, Pacific Grove, CA: Duxbury
Press.
[7] McIntyre, G. A. (1952), “A Method for Unbiased Selective Sampling, Using Ranked
Sets,” Australian Journal of Agricultural Research, 3, 385-390.
[8] Nahhas, R., Wolfe, D. A., and Chen, H. (2002), “Ranked Set Sampling: Cost and
Optimal Set Size,” Biometrics, 58, 964-971.
[9] Sroka, C. J., Stasny, E. A., and Wolfe, D. A. (2005), “Ranked Set Sampling: Where
Are the Samplers?” Technical Report 752, The Ohio State University, Department of Statistics.
[10] Stokes, S. L. (1977), “Ranked Set Sampling With Concomitant Variables,” Communications in Statistics - Theory and Methods, A6, 1207-1211.
[11] Wang, Y., Chen, Z., and Liu, J. (2004), “General Ranked Set Sampling With Cost
Considerations,” Biometrics, 60, 556-561.
Table 1: Sets of Variables Used in Simulation

Quantified variable                     Ranking variable                        Spearman rank correlation
Total drug expenditures                 Drug expenditures paid by household     0.907
Log(total drug expenditures + 1)        Drug expenditures paid by household     0.907
Health expenditures paid by insurance   Health expenditures paid by household   0.436
Total health expenditures               Household income                        0.116

Table 2: Comparison of RSS and SRS. Variable of interest is log(drug expenditures
+ 1), ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                    92.3%     94.0%     95.2%     95.1%     94.9%
RSS     5                    92.2%     94.1%     94.6%     95.0%     94.7%
RSS     10                   90.4%     93.7%     94.8%     95.0%     94.6%
SRS                          93.1%     93.9%     94.8%     94.9%     95.2%

Average length of interval (µ = 4.920)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                    2.089     1.332     0.772     0.423     0.189
RSS     5                    1.643     1.054     0.611     0.335     0.150
RSS     10                   1.366     0.888     0.515     0.283     0.127
SRS                          2.446     1.553     0.899     0.493     0.220

Table 3: Comparison of RSS and SRS. Variable of interest is drug expenditures,
ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                    84.2%     88.2%     91.0%     92.5%     91.3%
RSS     5                    83.6%     87.9%     90.2%     91.0%     90.1%
RSS     10                   80.9%     87.4%     89.6%     90.8%     89.5%
SRS                          84.9%     89.0%     91.8%     92.2%     91.7%

Average length of interval (µ = 1021.24)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                  1391.60    936.44    593.37    356.27    184.12
RSS     5                  1265.68    875.18    547.46    331.02    175.54
RSS     10                 1130.12    820.17    529.00    324.09    172.22
SRS                        1430.43   1007.03    625.60    374.60    192.43

Table 4: Comparison of RSS and SRS. Variable of interest is amount of expenditures
paid by insurance, ranking variable is amount of expenditures paid by household.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                    75.2%     81.9%     88.1%     91.7%     94.4%
RSS     5                    75.1%     81.8%     87.9%     91.9%     94.3%
RSS     10                   74.4%     81.9%     87.8%     92.5%     93.8%
SRS                          75.0%     81.9%     87.4%     91.9%     93.7%

Average length of interval (µ = 1741.93)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                  3142.91   2296.97   1460.20    839.67    383.94
RSS     5                  3190.13   2290.01   1446.90    831.62    379.08
RSS     10                 3086.05   2257.41   1439.66    827.55    377.22
SRS                        3194.25   2309.67   1463.49    848.87    388.34

Table 5: Comparison of RSS and SRS. Variable of interest is total health expenditures,
ranking variable is household income.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                    79.1%     85.1%     89.7%     92.8%     94.3%
RSS     5                    78.3%     85.4%     90.0%     93.2%     94.7%
RSS     10                   78.9%     85.6%     90.4%     93.2%     94.6%
SRS                          79.0%     85.5%     90.0%     92.7%     94.3%

Average length of interval (µ = 5086.90)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                  7411.01   5169.87   3235.09   1847.87    844.81
RSS     5                  7346.86   5190.91   3234.70   1852.73    845.53
RSS     10                 7437.71   5206.06   3236.60   1850.97    845.12
SRS                        7396.06   5207.70   3222.80   1846.51    844.87

Table 6: Comparison of SRSS and SSRS. Variable of interest is log(drug expenditures
+ 1), ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                    94.0%     94.5%     94.7%     94.8%     94.9%
SSRS                         94.2%     94.8%     95.2%     94.8%     95.1%

Average length of interval (µ = 4.920)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                    0.549     0.390     0.175     0.124     0.078
SSRS                         0.653     0.464     0.208     0.147     0.093

Table 7: Comparison of SRSS and SSRS. Variable of interest is drug expenditures,
ranking variable is household drug expenditures.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                    87.0%     88.9%     90.3%     89.8%     88.1%
SSRS                         87.7%     89.8%     91.2%     91.0%     89.3%

Average length of interval (µ = 1021.24)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                   647.71    496.33    262.94    201.89    140.66
SSRS                        708.70    546.45    279.73    213.69    147.30

Table 8: Comparison of SRSS and SSRS. Variable of interest is amount of expenditures paid by insurance, ranking variable is amount of expenditures paid by household.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                    83.7%     87.5%     92.9%     94.1%     94.3%
SSRS                         84.1%     87.3%     93.0%     93.9%     95.0%

Average length of interval (µ = 1741.93)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                  1899.82   1416.60    677.05    482.46    307.26
SSRS                       1917.86   1422.50    679.04    484.59    307.95

Table 9: Comparison of SRSS and SSRS. Variable of interest is total health expenditures, ranking variable is household income.

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                    88.0%     90.1%     93.2%     93.7%     94.5%
SSRS                         87.7%     90.4%     93.8%     94.1%     94.7%

Average length of interval (µ = 5086.90)
                        Sample size (n)
Method  No. of sets (m)         60       150       750      1500      3750
SRSS    5                  4395.30   3240.78   1520.00   1082.93    690.88
SSRS                       4420.99   3257.15   1524.18   1087.92    690.75

Table 10: Comparison of RSS and SRS. Variable of interest is total health expenditures, ranking variable is household income (100,000 iterations)

Percent of intervals containing mean (95% nominal confidence level)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                    79.2%     85.4%     90.2%     92.8%     94.4%
RSS     5                    79.0%     85.4%     90.2%     93.0%     94.3%
RSS     10                   78.0%     85.2%     90.2%     92.9%     94.4%
SRS                          79.3%     85.4%     90.3%     92.8%     94.5%

Average length of interval (µ = 5086.90)
                        Sample size (n)
Method  No. of sets (m)         20        50       150       500      2500
RSS     2                  7427.21   5197.21   3235.41   1846.90    844.97
RSS     5                  7414.47   5211.73   3235.33   1848.53    844.63
RSS     10                 7332.53   5188.80   3233.52   1848.54    844.34
SRS                        7414.35   5216.20   3236.10   1847.95    844.98