Draft revised on 10-11-05
Testing equivalence between two bioanalytical laboratories or two
methods using paired-sample analysis and interval hypothesis testing
Shixia Feng*, Qiwei Liang, Robin D. Kinser, Kirk Newland and Rudolf Guilbaud
a Philip Morris USA, Research Center, 4201 Commerce Road, Richmond, VA 23234, USA; b MDS Pharma Services, 621 Rose Street, Lincoln, NE 68502, USA; c MDS Pharma Services, 2350 Cohen Street, St. Laurent (Montreal), Quebec H4R 2N6, Canada
Corresponding author:
Shixia Feng, Ph.D.
Philip Morris USA
Research Center
4201 Commerce Road
Richmond, VA 23234
Email: [email protected]
Abstract
http://legacy.library.ucsf.edu/tid/puk82i00/pdf
A modified interval hypothesis testing procedure, based on paired-sample analysis, and
its application in testing equivalence between two bioanalytical laboratories or two
methods are described. This testing procedure has the advantage of reducing the risk of
wrongly concluding equivalence while in fact two laboratories or two methods are not
equivalent. The advantage of using paired-sample analysis is that the test is less
confounded by the inter-sample variability than unpaired-sample analysis when incurred
biological samples with a wide range of concentrations are included in the experiments.
Practical aspects including experimental design, sample size calculation and power
estimation are also discussed through examples.
1. Introduction
Bioanalytical methods for quantitative measurement of chemical compounds and
their metabolites in biological samples must be validated to ensure that methods yield
reliable results [1]. To achieve necessary analytical throughput, bioanalytical methods are
often transferred from one laboratory to another, and sometimes, samples from a
clinical or animal study are analyzed using different methods or at different laboratories.
To ensure that comparable results can be achieved between two laboratories or
methods, comparison should be conducted through a process of cross-validation [1]
before a new laboratory or method is used to analyze the biological samples. We
believe that the objectives of the experimental design for bioanalytical cross-validation
should be three-fold: to establish statistical equivalence (no unacceptable bias) between
the two laboratories or methods, to identify the sources of any differences, and to
resolve the differences.
Although statistical procedures exist for inter-laboratory comparisons (e.g., ISO 5725), such experiments often require more than six participating laboratories. For comparing the two means generated by two laboratories or methods, the general t-test (point hypothesis test) is often used. The hypotheses are:

H0: μ2/μ1 = 1 (there is no bias) versus H1: μ2/μ1 ≠ 1 (bias exists)    (1)
If there is insufficient evidence to reject the null hypothesis, the two means are declared equal. It has been shown [2] that the general t-test has the undesirable property of penalizing higher precision. If the sample size is small and there are relatively large variations within the two sets of data in comparison to the relatively small difference between the two means, the t-test may be unable to detect a possible difference, and
sometimes when the precision of each data set is very high, differences of little practical
importance could be considered significant.
Hartmann et al. [3] recently compared the general t-test with the interval hypothesis test for testing the equivalence of two laboratories or two methods and concluded that, for laboratory or method comparison purposes, the latter is preferred because the β-error of the general t-test can be controlled. The interval hypothesis test is based on Schuirmann's [2] two one-sided tests (TOST), which has become the standard approach in bioequivalence studies. This approach requires the pre-determination of an acceptance interval (lower limit θ1 and upper limit θ2) and then tests whether the measured bias falls within the acceptance interval. In statistical terms, the hypotheses to be tested are:

H0: μ2/μ1 ≤ θ1 or μ2/μ1 ≥ θ2 versus H1: θ1 < μ2/μ1 < θ2    (2)

The hypotheses can also be expressed as two one-sided tests:

H01: μ2/μ1 ≤ θ1 versus H11: μ2/μ1 > θ1    (3)

and

H02: μ2/μ1 ≥ θ2 versus H12: μ2/μ1 < θ2    (4)
Compared to the general t-test, the null and alternative hypotheses have been reversed. Consequently, the definitions of the type I (α) and type II (β) errors in the interval hypothesis test are exactly switched with respect to the general t-test. The type I error is now the probability (α) of erroneously accepting equivalence when in fact the two laboratories or methods are not equivalent. The type II error is the probability (β) of erroneously accepting nonequivalence when in fact the two laboratories or methods are equivalent. Acceptance of the null hypotheses leads to the conclusion that the bias is not acceptable, and rejection of the null hypotheses leads to the conclusion that the bias is acceptable. This test has the advantage of limiting, to a very low level, the risk of erroneously accepting a biased new laboratory or method as unbiased.
However, close examination of the algorithm and procedures described by Hartmann et al. [3] led us to believe that they are suitable only for testing two means from two independent sets of samples (unpaired-sample analysis). An unpaired-sample experimental design in a bioanalytical cross-validation study may confound this statistical test because of a possibly large pooled variance that is actually due to inter-sample variability, especially for incurred biological samples obtained from clinical or animal studies. We believe that this problem can be overcome by applying paired-sample analysis.
In this manuscript, we present a modified interval hypothesis testing procedure and
practical experimental design based on paired-sample analysis for testing equivalence
between two bioanalytical laboratories or methods. The example presented is the transfer of an LC-MS/MS method for S-phenylmercapturic acid (S-PMA), a benzene metabolite in human urine and a biomarker for exposure to benzene [4,5]. Both spiked quality control (QC) and incurred human samples were included in this study.
2. Methods

2.1 The interval hypothesis test procedure
The acceptance interval (θ1 and θ2) should be defined before the experiments begin. It is now widely accepted [1] that a validated bioanalytical method should have less than ±15% bias at concentration levels other than the lower limit of quantification (LLOQ), where a maximum of ±20% bias is acceptable; θ1 and θ2 in this study were therefore set at 0.85 and 1.15, respectively (note that although we typically find these limits reasonable, one should define the acceptance interval based on one's own experience with the method). A paired-sample design was employed and the concentration ratio for each sample was calculated. Let R = Conc_Lab2(measured)/Conc_Lab1(measured) for each sample, let R̄_Lab2/Lab1 be the mean ratio of the sample set, and let R0 be the true ratio between the two laboratories (Lab2/Lab1) for the entire sample population. The hypotheses to be tested can be stated as:
H0: R0 ≤ 0.85 or R0 ≥ 1.15 versus H1: 0.85 < R0 < 1.15    (5)

or:

H01: R0 ≤ 0.85 versus H11: R0 > 0.85    (6)

and

H02: R0 ≥ 1.15 versus H12: R0 < 1.15    (7)
The H0 will be rejected if

t_cal(1) = (R̄_Lab2/Lab1 - 0.85)/(SD_R/√n) > t_{α,n-1}    (8)

where t_{α,n-1} is the 1-α quantile of the t distribution with n-1 degrees of freedom and SD_R is the standard deviation of the ratios for the sample set, or, by rearrangement, if

R̄_Lab2/Lab1 - t_{α,n-1}·SD_R/√n > 0.85,    (9)

and if
t_cal(2) = (R̄_Lab2/Lab1 - 1.15)/(SD_R/√n) < -t_{α,n-1}    (10)

or, by rearrangement, if

R̄_Lab2/Lab1 + t_{α,n-1}·SD_R/√n < 1.15.    (11)
Notice that R̄_Lab2/Lab1 ± t_{α,n-1}·SD_R/√n actually represents the confidence interval of the mean ratio at the (1-2α)×100% level. Thus, H0 will be rejected at a significance level of α if the (1-2α)×100% confidence interval is contained entirely between 0.85 and 1.15.
In this study, basic data calculations were performed with Microsoft® Excel 2002. The interval hypothesis test computations and statistical power calculations were performed with SAS® (Statistical Analysis System) software.
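As an illustration, the decision rule of Eqs. 8-11 can be sketched in a few lines of Python (a hypothetical helper, not the SAS program used in this study); the critical value t_{α,n-1} must be supplied for the chosen α and n. Applied to the 12 incurred-sample ratios from Table 2, it reproduces the 95% confidence interval of (0.90, 0.99) reported in Table 3.

```python
import math
import statistics

def tost_paired_ratio(ratios, theta1=0.85, theta2=1.15, t_crit=2.201):
    """Two one-sided tests (Eqs. 8 and 10) on paired concentration ratios.

    t_crit is the 1-alpha quantile of Student's t with n-1 degrees of
    freedom (2.201 for alpha = 0.025, n = 12); supply the value for your
    own alpha and n.
    """
    n = len(ratios)
    mean_r = statistics.fmean(ratios)
    se = statistics.stdev(ratios) / math.sqrt(n)   # SD_R / sqrt(n)
    t1 = (mean_r - theta1) / se                    # Eq. 8
    t2 = (mean_r - theta2) / se                    # Eq. 10
    ci = (mean_r - t_crit * se, mean_r + t_crit * se)
    # Equivalence: both one-sided nulls rejected, i.e. the (1-2*alpha)
    # confidence interval lies entirely inside (theta1, theta2).
    return mean_r, ci, (t1 > t_crit) and (t2 < -t_crit)

# Incurred-sample ratios from Table 2
incurred = [0.95, 0.84, 0.96, 0.84, 1.03, 0.91,
            1.05, 1.03, 0.92, 0.98, 0.88, 0.94]
mean_r, ci, equivalent = tost_paired_ratio(incurred)
```

With these data, `ci` rounds to (0.90, 0.99) and `equivalent` is True, matching the incurred-sample row of Table 3.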
2.2 The LC-MS/MS method
An LC-MS/MS method for S-PMA in human urine was initially developed and validated at Lab1, the originating laboratory. The full details of the method will be published separately. Briefly, the method utilized 1 mL of a human urine sample and a
solid phase extraction procedure. The LC-MS/MS system included a Thermo Hypersil
BioBasic® AX column with a guard column and a PE Sciex API 4000 MS detector using
an electrospray ionization interface. Negative ions were monitored in the multiple
reaction monitoring (MRM) mode.
After the method had been validated and used to analyze approximately 1800 human urine samples at Lab1, it was transferred to Lab2 (the accepting laboratory) to accommodate the need for higher throughput. The method used at Lab2 was identical to that at Lab1 except that the final step of the extraction procedure was slightly modified to allow for differences in available analytical equipment.
2.3 Cross-validation experimental design
The method transfer was conducted according to a method transfer protocol which
outlined the sample types and source and the cross-validation acceptance criteria. Lab2
performed the initial between-site qualification batches to re-establish the lower and
upper limits of quantification, inter-batch accuracy and precision, and the linear range.
The performance characteristics at both laboratories are summarized in Table 1.
(Insert Table 1 here)
To establish equivalence between the two laboratories, a paired-sample design was
employed: paired analyses of both spiked quality control (QC) samples at three fixed
concentrations (low, medium, and high) and human incurred samples. QC samples at
each concentration level were prepared by Lab1 and a portion of each sample was sent
to Lab2. The nominal values for each QC level were 89.0, 2140, and 15100 pg/mL,
respectively. Each laboratory measured QC samples at each concentration level in
twelve replicates in paired fashion. In addition, 12 incurred human urine samples were
collected at Lab1 and a portion of each sample was also sent to Lab2. The ratio of
measured concentration of each sample pair was then calculated. The interval
hypothesis test was performed on each QC set as well as the incurred human sample
set. Equivalence between the two laboratories was declared at the α = 0.025 level if the
95% confidence interval of the mean ratios was contained entirely between 0.85 and
1.15 for all QC levels and incurred human samples.
3. Results and Discussion
3.1 Sample selection
The calibration curve of a bioanalytical method should represent the normal
concentration distribution of the real biological samples. To establish equivalence
between the two laboratories, we believe that at least three concentration levels which
represent the entire range of the calibration curve should be studied: one near the lower
boundary of the curve, one near the center, and one near the upper boundary of the
curve. Real biological samples that represent the normal concentration distribution of
the analyte should also be included. The selection of QC levels and incurred human
samples in this study was based on these considerations. The paired-sample design is
based on the properties of the interval hypothesis test which will be discussed later in
the section on paired versus unpaired sample analysis.
(Insert Table 2 here)
3.2 Type I and II errors, power, and sample size
As mentioned earlier, for the interval hypothesis test the type I error is the probability (α) of erroneously accepting equivalence when in fact the two laboratories or methods are not equivalent. The type II error is the probability (β) of erroneously accepting nonequivalence when in fact the two laboratories or methods are equivalent. The corresponding power of the test, 1-β, is therefore the probability of
correctly declaring equivalence between the two laboratories or two methods when they
are truly equivalent.
Similar to bioequivalence studies [6,7], the primary concern of bioanalytical chemists should be the control of the α-error (equivalent to the β-error in the general t-test), because treating results of clinical samples generated by nonequivalent laboratories or methods as if they were equivalent will cause comparability problems and confusion in data interpretation. The α level should be defined a priori, as should the acceptance interval (θ1 and θ2). In this study, we pre-defined in the cross-validation protocol that the 95% confidence interval of the mean ratios must be contained entirely between 0.85 and 1.15 for all QC levels and incurred human samples. Therefore, the risk of wrongly concluding that the two laboratories are equivalent is limited to no more than 2.5%. The actual probability of this risk is represented by the p-value and can be calculated with SAS® software using the two one-sided tests approach (Eq. 6 and 7). The p-value of the interval hypothesis test (Eq. 5) is the maximum of the p-values of the two one-sided tests. For example, as shown in Table 3, the p-values of the two one-sided tests for the incurred human sample set were 0.0008 and <0.0001, respectively. The p-value of the interval hypothesis test is therefore 0.0008, which is much lower than the pre-defined acceptable limit of 0.025. To reject H0, both p1 and p2 must be less than the α-value. In this example, the risks of declaring equivalence between the two laboratories while the two laboratories are in fact not equivalent were extremely small for all sample sets tested.
For the incurred sample set, we plotted the ratio versus the mean concentration from the two laboratories (not shown here), which showed no correlation. This indicated that the two laboratories were not systematically different in different parts of the concentration range.
(Insert Table 3 here)
A basic question bioanalytical chemists often ask when designing the experiments is how many samples should be analyzed for the results to be statistically meaningful. Sample
size (number of sample pairs in our case) is one of the key factors that affects the
statistical power and should be estimated based on the desired power and estimated
variability. The power of the interval hypothesis test is:

Power(R̄_Lab2/Lab1) = P{t_cal(1) > t_{α,n-1} and t_cal(2) < -t_{α,n-1}}
                  = P{0.85 + t_{α,n-1}·σ_R/√n < R0 < 1.15 - t_{α,n-1}·σ_R/√n}    (12)

where P stands for the probability and σ_R is the standard deviation of the entire sample population.
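Eq. 12 can also be checked numerically. The Monte Carlo sketch below (an illustration on our part, not part of the original SAS computation) draws paired ratios from a normal distribution with the incurred-set estimates (R0 = 0.94, σ_R = 0.07, n = 12) and counts how often the TOST decision rule declares equivalence at α = 0.025; the fraction approximates the power reported later for this sample set.

```python
import math
import random
import statistics

def tost_rejects(ratios, t_crit=2.201, theta1=0.85, theta2=1.15):
    """Decision rule of Eqs. 8 and 10 (t_crit = t for alpha = 0.025, df = 11)."""
    n = len(ratios)
    mean_r = statistics.fmean(ratios)
    se = statistics.stdev(ratios) / math.sqrt(n)
    return (mean_r - theta1) / se > t_crit and (mean_r - theta2) / se < -t_crit

rng = random.Random(42)            # fixed seed for reproducibility
n, r0, sigma = 12, 0.94, 0.07      # incurred-set estimates
trials = 20000
hits = sum(tost_rejects([rng.gauss(r0, sigma) for _ in range(n)])
           for _ in range(trials))
power = hits / trials              # close to the 98% reported for this set
```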
Using the same procedure described by Chow and Liu [8,9], we obtained Eq. 13 and 14 for estimating the minimum number of sample pairs needed to achieve the desired power:

n ≥ (t_{α,n-1} + t_{β,n-1})² [CV·R̄_Lab2/Lab1 / (1.15 - R̄_Lab2/Lab1)]²  for 1.00 < R̄_Lab2/Lab1 < 1.15    (13)

and

n ≥ (t_{α,n-1} + t_{β,n-1})² [CV·R̄_Lab2/Lab1 / (R̄_Lab2/Lab1 - 0.85)]²  for 0.85 < R̄_Lab2/Lab1 ≤ 1.00    (14)

where CV is the coefficient of variation of the sample set and t_{β,n-1} is the 1-β quantile of the t distribution with n-1 degrees of freedom.
Figure 1A-J illustrates the effect of the number of sample pairs, CV, mean ratio, and α-value on power for the paired-sample design (with θ1 and θ2 set at 0.85 and 1.15, respectively). The power increases as the CV decreases or as the number of sample pairs increases.
The relationships shown in Figure 1A-J should be used to guide the planning of the experiments. However, because computing the power requires computer programming, and because the main concern of bioanalytical chemists in cross-validation is the control of the α-error, the post-study calculation of the actual power does not have to be part of the test. If the confidence interval is within the acceptance interval after the experiments, a simple way to verify whether the desired power has been achieved is to compare the calculated number of sample pairs for the desired power with the actual number of sample pairs. If the calculated number is less than the actual number, the desired power has been achieved. It is common in clinical studies to set β to less than 0.2, which corresponds to a statistical power greater than 80%. For example, for the incurred sample set in this study, the estimated number of sample pairs for 80% power is n = (2.20 + 0.88)² × [(0.07)/(0.94 - 0.85)]² ≈ 6. Since the actual number was 12, the power must have exceeded 80%. The actual power for this sample set is 98%.
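The post-study check in the preceding paragraph can be written out as follows (a sketch; the t quantiles must be supplied by the user, here 2.20 = t_{0.025,11} and 0.88 = t_{0.2,11}):

```python
import math

def required_pairs(mean_ratio, sd_ratio, t_alpha, t_beta,
                   theta1=0.85, theta2=1.15):
    """Eqs. 13/14. Since CV * mean_ratio equals the SD of the ratios,
    the SD can be used directly in the numerator."""
    if mean_ratio > 1.0:
        margin = theta2 - mean_ratio   # Eq. 13
    else:
        margin = mean_ratio - theta1   # Eq. 14
    return math.ceil((t_alpha + t_beta) ** 2 * (sd_ratio / margin) ** 2)

# Incurred sample set: mean ratio 0.94, SD 0.07, t quantiles at df = 11
n_needed = required_pairs(0.94, 0.07, t_alpha=2.20, t_beta=0.88)  # 6
```

Since `n_needed` (6) is less than the 12 pairs actually analyzed, the 80% power target was met.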
Each laboratory or method is associated with two types of errors: systematic and
random errors. A ratio that is close to the acceptance boundary is an indication of
systematic bias between the two laboratories or methods. One should first focus on
finding the sources of systematic bias. For example, comparing the calibration
standards, stock solutions, and working solutions in both laboratories often helps
resolve the problem. Sometimes in practice it is necessary to conduct pilot experiments
with a limited number of samples to generate preliminary data in order to guide the full
scale comparison. The CV of the ratio is a function of random errors of the two
laboratories or methods being compared. Assuming there is no variation in sampling,
the CV of the ratio can be estimated using Eq. 15 from the precision data of each
laboratory or method for a given concentration level, which should have been generated
during the method validation by repeated measurements of QC samples at different
concentration levels.
CV_R = √[(σ_Lab1/Conc_Lab1)² + (σ_Lab2/Conc_Lab2)²] = √(CV_Lab1² + CV_Lab2²)    (15)
Note that Eq. 15 does not apply to the incurred human samples, which may vary over a wide concentration range. Because the accuracy and precision at different concentration levels are often different for a laboratory or method, the incurred samples should have an approximately even distribution over the entire concentration range. If the incurred samples are not distributed evenly, both the ratio and the CV may be skewed.
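For a fixed concentration level, Eq. 15 is a simple error-propagation formula. A minimal sketch, using the medium-QC between-run precision values from Table 1 as the illustration:

```python
import math

def cv_of_ratio(cv_lab1_pct, cv_lab2_pct):
    """Eq. 15: expected CV% of the paired ratio from each laboratory's
    precision (CV%), assuming independent random errors and no
    sampling variation."""
    return math.sqrt(cv_lab1_pct ** 2 + cv_lab2_pct ** 2)

# Medium QC between-run precision (Table 1): 1.8% (Lab1), 2.6% (Lab2)
cv_r = cv_of_ratio(1.8, 2.6)   # about 3.2%
```

The predicted ~3.2% is close to the 3.90% ratio CV actually observed for the medium QC set in Table 2.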
3.3 Paired versus unpaired sample analysis
The interval hypothesis test procedures for testing either the ratio or the difference (δ) between two means from two independent random samples (unpaired-sample analysis) have been described by previous authors [3,10,11]. When testing the ratio of two means, the rejection criteria for the null hypotheses for two independent series of data with equal variance (σ1² = σ2²) are:
t_cal(1) = (x̄_Lab2 - θ1·x̄_Lab1) / (s_p·√(1/n2 + θ1²/n1)) > t_{α,n1+n2-2}    (16)

where s_p² is the pooled variance of the two data sets:

s_p² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2)

and

t_cal(2) = (x̄_Lab2 - θ2·x̄_Lab1) / (s_p·√(1/n2 + θ2²/n1)) < -t_{α,n1+n2-2}    (17)
It should be noted that the unpaired-sample design in bioanalytical cross-validation studies may confound this statistical test because of a very large pooled variance that is actually due to inter-sample variability, especially for incurred biological samples. Two actually equivalent laboratories or methods may be determined to be nonequivalent unless the sample size is extremely large. The paired-sample design and testing of the ratios remove such variability and therefore make the test more sensitive. For similar reasons, it is inappropriate to test the mean difference (δ̄) of incurred human samples in paired-sample analysis. For example, for the incurred human sample set in this study, δ̄ = -308.8; SD = 469.1; CV = 151.9%; 95% CI = -606.9 (lower), -10.8 (upper); t1 = 1.83 and t2 = -6.39 (compared to t_{α,(n-1)} = 2.20). Notice that the variability in the mean difference of this sample set was very high (CV = 151.9%). This variability is mainly due to the variable sample concentrations. Therefore, a test performed on the mean difference would lead to the wrong conclusion that the two laboratories were not equivalent, because the test was confounded by the variability of the incurred samples themselves.
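This confounding can be reproduced from the incurred data in Table 2. In the sketch below, the ±15% acceptance limits for the difference are taken as ±0.15 times the Lab1 mean concentration (an assumption on our part, chosen because it reproduces the reported t1 = 1.83 and t2 = -6.39); the lower one-sided statistic then falls short of t_{α,11} = 2.20, so the difference-based test cannot conclude equivalence even though the ratio-based test passes.

```python
import math
import statistics

# Incurred concentrations from Table 2 (pg/mL): (Lab1, Lab2) pairs
pairs = [(391, 372), (432, 362), (4430, 4230), (141, 119), (80.7, 82.7),
         (12800, 11600), (232, 243), (638, 660), (10800, 9930),
         (1120, 1100), (9750, 8620), (3710, 3500)]

diffs = [b - a for a, b in pairs]                 # Lab2 - Lab1
n = len(diffs)
mean_d = statistics.fmean(diffs)                  # -308.8, as in the text
se = statistics.stdev(diffs) / math.sqrt(n)
# Assumed acceptance bound for the difference: +/-15% of the Lab1 mean
limit = 0.15 * statistics.fmean([a for a, _ in pairs])
t1 = (mean_d + limit) / se                        # 1.83 < 2.20: H01 not rejected
t2 = (mean_d - limit) / se                        # -6.39
```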
3.4 Some potential issues
Compared to unpaired-sample analysis, the number of degrees of freedom in paired-sample analysis is lower (n-1 versus n1+n2-2), which seemingly implies lower power. However, with good planning the desired power can be achieved, as discussed above.
Another issue is that the statistical testing procedure described herein assumes that the ratios are normally distributed around the mean, which is true in this study. For non-normal ratio data, if the sample size is sufficiently large and the variability is small, this procedure is still applicable. Log-transformation may also be considered if the data are not normally distributed.
Sometimes, especially when the sample size is small, one may encounter a situation where outliers (ratios larger or smaller than the others in a data set) occur, and the outcome of the analysis can be influenced by how the outliers are treated. Potential outliers can be identified using statistical procedures such as Dixon's test or Grubbs' test [12], or simply as values outside three standard deviations of the mean. We recommend that the root causes be investigated before a decision is made to exclude outliers from the statistical analysis. The exclusion of an outlier may be justified if a review of the documentation clearly indicates an error in the sample processing or analysis. One should keep in mind that the exclusion of outliers will reduce the sample size and therefore affect both types of statistical errors. An alternative is to reanalyze the outlying samples in either or both laboratories or by both methods. However, if the repeat analyses verify the original values, these values should not be excluded from the statistical analysis, and unfavorable conclusions may be reached in such cases. In our opinion, this situation is likely due to large variability between the two laboratories or methods, which is one indication that they are not equivalent.
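The simple three-standard-deviation screen mentioned above can be sketched as follows (only the crude version; Dixon's or Grubbs' test [12] would be preferred for small n):

```python
import statistics

def flag_outliers(values, k=3.0):
    """Return values lying more than k standard deviations from the mean."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [v for v in values if abs(v - m) > k * s]

# The incurred-sample ratios from Table 2 contain no such outliers;
# an artificially injected ratio of 2.0, however, would be flagged.
ratios = [0.95, 0.84, 0.96, 0.84, 1.03, 0.91,
          1.05, 1.03, 0.92, 0.98, 0.88, 0.94]
clean = flag_outliers(ratios)            # []
flagged = flag_outliers(ratios + [2.0])  # [2.0]
```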
Sample stability is another factor that may affect the cross-validation results. The
shipping and storage conditions must be proven to be effective in preserving the sample
integrity. Otherwise, the results are confounded and may be misleading.
4. Conclusion
The interval hypothesis test and paired-sample experimental design described in this manuscript can be used to test equivalence between two bioanalytical laboratories or methods. The premise for using this approach is that the two laboratories or methods being compared have been validated independently prior to the cross-validation. The acceptance interval and the limits for the two types of risks associated with this statistical test (α- and β-errors) should be defined prior to the experiments. Equations for estimating the number of sample pairs based on power were also described. The number of sample pairs should be sufficiently large to achieve the desired power. The advantage of this test and the experimental design based on it is that the risk of wrongly concluding equivalence when in fact the two laboratories or methods are not equivalent can be limited to a low level. In addition, as the number of sample pairs is determined based on power calculations, the risk of wrongly concluding nonequivalence when in fact the two laboratories or methods are equivalent can also be limited to a low level (<20%). Finally, since both the systematic bias and the random errors of the two laboratories or methods are taken into account, this procedure should be suitable for real practice.
Acknowledgement: The authors would like to thank Drs. Hans Roethig and Mohamadi
Sarkar for helpful discussion and suggestions.
References:
1. Shah, V. P.; Midha, K. K.; Findlay, J. W.; Hill, H. M.; Hulse, J. D.; McGilveray, I. J.; McKay, G.; Miller, K. J.; Patnaik, R. N.; Powell, M. L.; Tonelli, A.; Viswanathan, C. T.; Yacobi, A. Pharm. Res. 2000, 17, 1551-57. Also see US DHHS FDA, Guidance for Industry: Bioanalytical Method Validation, http://www.fda.gov/cder/guidance/4252fnl.pdf, 2001, accessed October 11, 2005.
2. Schuirmann, D. J. J. Pharmacokinet. Biopharm. 1987, 15, 657-80.
3. Hartmann, C.; Smeyers-Verbeke, J.; Penninckx, W.; Vander Heyden, Y.; Vankeerberghen, P.; Massart, D. L. Analytical Chemistry 1995, 67, 4491-99.
4. Boogaard, P. J.; van Sittert, N. J. Occup. Environ. Med. 1995, 52, 611-20.
5. Boogaard, P. J.; van Sittert, N. J. Environ. Health Perspect. 1996, 104 Suppl 6, 1151-57.
6. Diletti, E.; Hauschke, D.; Steinijans, V. W. Int. J. Clin. Pharmacol. Ther. Toxicol. 1991, 29, 1-8.
7. Phillips, K. F. J. Pharmacokinet. Biopharm. 1990, 18, 137-44.
8. Chow, S. C.; Liu, J. P. Design and Analysis of Bioavailability and Bioequivalence Studies, 2nd ed.; Marcel Dekker: New York, 1999; Chapter 5.
9. Liu, J. P.; Chow, S. C. J. Pharmacokinet. Biopharm. 1992, 20, 101-04.
10. Locke, C. S. J. Pharmacokinet. Biopharm. 1984, 12, 649-55.
11. Richter, S. J.; Richter, C. Quality Engineering 2002, 14, 375-380.
12. Miller, J. N.; Miller, J. C. Statistics and Chemometrics for Analytical Chemistry, 4th ed.; Prentice Hall: New York, 2000; Chapter 3.
Table 1. Validation summary for the two laboratories

Performance Parameter               Lab1*            Lab2**
Lower Limit of Quantitation         20 pg/mL         20 pg/mL
Linear Range                        20-20000 pg/mL   20-20000 pg/mL
Between-Run Precision (CV%)
  Low QC                            9.7 (n=24)       7.8 (n=17)
  Medium QC                         1.8 (n=24)       2.6 (n=18)
  High QC                           2.8 (n=24)       2.6 (n=18)
Between-Run Accuracy (% Nominal)
  Low QC                            100.0 (n=24)     103.3 (n=17)
  Medium QC                         97.7 (n=24)      96.0 (n=18)
  High QC                           99.3 (n=24)      96.1 (n=18)

*The nominal values for the low, medium, and high QC levels at Lab1 were 142, 2140, and 15100 pg/mL, respectively.
**The nominal values for the low, medium, and high QC levels at the receiving Lab2 were 48.6, 2070, and 15206 pg/mL, respectively.
Table 2. The analytical results from a cross-validation study

        QC1 (Nominal = 89.0 pg/mL)   QC2 (Nominal = 2140 pg/mL)   QC3 (Nominal = 15100 pg/mL)   Incurred
N       Conc1   Conc2   R2/1         Conc1   Conc2   R2/1         Conc1   Conc2   R2/1          Conc1    Conc2    R2/1
1       83.7    83.6    1.00         2090    2070    0.99         14500   14700   1.01          391      372      0.95
2       92.8    91.3    0.98         2150    2140    1.00         14700   14700   1.00          432      362      0.84
3       87.4    93.7    1.07         2100    2110    1.01         14700   14900   1.01          4430     4230     0.96
4       89.4    86.1    0.96         2070    2120    1.02         14600   14900   1.02          141      119      0.84
5       99.0    91.2    0.92         2040    2190    1.07         14500   14300   0.99          80.7     82.7     1.03
6       96.1    85.9    0.89         2230    2190    0.98         15100   14500   0.96          12800    11600    0.91
7       102     93.4    0.92         2110    2130    1.01         14400   14100   0.98          232      243      1.05
8       89.5    94.4    1.06         2000    2050    1.03         14800   14400   0.97          638      660      1.03
9       102     87.4    0.86         2180    2030    0.93         14900   13800   0.93          10800    9930     0.92
10      102     81.3    0.80         2150    2150    1.00         14600   14800   1.01          1120     1100     0.98
11      94.0    95.2    1.01         2170    2020    0.93         14800   14900   1.01          9750     8620     0.88
12      93.0    95.1    1.02         2020    2010    1.00         14500   14300   0.99          3710     3500     0.94
Mean    94.2    89.9    0.96         2109    2101    1.00         14675   14525   0.99          3710     3402     0.94
SD      6.14    4.83    0.083        69.60   63.60   0.039        200.57  354.52  0.028         4727.18  4272.11  0.07
CV (%)  6.51    5.37    8.64         3.30    3.03    3.90         1.37    2.44    2.79          127.40   125.59   7.41
Table 3. The interval hypothesis test results for mean ratios

                        QC1 (n=12)   QC2 (n=12)   QC3 (n=12)   Incurred (n=12)
95% CI (lower, upper)   0.91, 1.01   0.97, 1.02   0.97, 1.01   0.90, 0.99
t1                      4.57         13.13        18.45        4.61
t2                      -8.29        -13.73       -21.09       -10.33
t_{α,(n-1)}             2.20         2.20         2.20         2.20
p1                      <0.0001      <0.0001      <0.0001      0.0008
p2                      <0.0001      <0.0001      <0.0001      <0.0001
p                       <0.0001      <0.0001      <0.0001      0.0008
Power (%)               99           >99          >99          98
Figure caption:
Figure 1A-J. The effect of the number of sample pairs, CV%, and mean ratio on power at α = 0.05 and 0.025 levels.
[Figure 1A-J appears here: power versus sample size at α = 0.05 and α = 0.025 for mean ratios of 1.00, 0.95, 0.90, 1.05, and 1.10, with CV curves ranging from 5% to 15%.]