Nonparametric hypothesis tests
One sample tests (7.2)
Two-sample tests (7.3)
Ordered categories (7.4)
Stratified tests (7.5)
Tests for crossing hazards (7.6)
Tests based on differences in outcome at a fixed point in
time (7.8)
One-sample tests:
censored sample of size n
test hypothesis that population hazard is $h_0(t)$ for all $t \le \tau$,
for hazard function $h_0(t)$ fully specified over interval $[0, \tau]$
typically take $\tau$ to be largest of study times
i.e., $H_0: h(t) = h_0(t)$ for all $t \le \tau$
$H_A: h(t) \ne h_0(t)$ for some $t \le \tau$
could also formulate hypotheses in terms of $S(t)$, $H(t)$, or $f(t)$, since there are 1-1 transformations between these functions
can also formulate in terms of the mean life restricted to $\tau$ (restricted mean life)
form of tests:
compare observed and hypothesized weighted hazards
observed event times $t_1 < t_2 < \dots < t_D$; $d_i$ events at $t_i$
increment in estimated cumulative hazard at $t_i$: $d_i / Y(t_i)$, where $Y(t_i)$ is the number at risk at $t_i$
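so, compare $\sum_{i=1}^{D} W(t_i)\, d_i / Y(t_i)$ (observed) to $\int_0^\tau W(s)\, h_0(s)\, ds$ (expected)
for nonparametric estimators of the hazard, we have $d\hat{H}(t_i) = d_i / Y(t_i)$ at each event time and 0 elsewhere,
and so we have $\int_0^\tau W(s)\, d\hat{H}(s) = \sum_{i=1}^{D} W(t_i)\, d_i / Y(t_i)$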
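we then compare the weighted sums for the observed and expected to obtain
$Z(\tau) = O(\tau) - E(\tau) = \sum_{i=1}^{D} W(t_i)\, \frac{d_i}{Y(t_i)} - \int_0^\tau W(s)\, h_0(s)\, ds$
variance of test statistic under null:
$V[Z(\tau)] = \int_0^\tau W^2(s)\, \frac{h_0(s)}{Y(s)}\, ds$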
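derivation using counting process approach:
$M(t) = N(t) - \int_0^t Y(s)\, h_0(s)\, ds$ is a martingale under $H_0$
$Z(\tau) = \int_0^\tau \frac{W(s)}{Y(s)}\, dM(s)$ is stochastic integral of weight wrt martingale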
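predictable variation process can be found from the original martingale by (3.6):
$\langle \int K\, dM \rangle(t) = \int_0^t K^2(s)\, d\langle M \rangle(s)$
now, $d\langle M \rangle(s) = Y(s)\, h_0(s)\, ds$
and so $V[Z(\tau)] = \int_0^\tau \frac{W^2(s)}{Y^2(s)}\, Y(s)\, h_0(s)\, ds = \int_0^\tau W^2(s)\, \frac{h_0(s)}{Y(s)}\, ds$
for large samples, $Z(\tau)$ is asymptotically normally distributed; use $Z(\tau) / \sqrt{V[Z(\tau)]}$ in testing hypothesis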
most popular weight: $W(t) = Y(t)$ yields log-rank statistic
then, $O(\tau) = \sum_{i=1}^{D} d_i$, the number of observed events prior to $\tau$, and
$E(\tau) = V[Z(\tau)] = \int_0^\tau Y(s)\, h_0(s)\, ds$ is expected # of events
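A minimal sketch of the one-sample log-rank computation, assuming no left truncation, so that the expected count reduces to the sum of the null cumulative hazard at each subject's follow-up time. The data and the exponential null hazard $h_0(t) = 0.1$ are hypothetical, chosen only for illustration.

```python
import math

def one_sample_logrank(times, events, cum_hazard0):
    """One-sample log-rank test with weight W(t) = Y(t), no left truncation.

    times: follow-up times; events: 1 = event, 0 = censored;
    cum_hazard0: function returning the null cumulative hazard H0(t).
    """
    observed = sum(events)                            # O(tau): observed events
    expected = sum(cum_hazard0(t) for t in times)     # E(tau) = sum_j H0(T_j)
    z = (observed - expected) / math.sqrt(expected)   # Var[Z(tau)] = E(tau)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # normal approximation
    return observed, expected, z, p_two_sided

# Hypothetical data; null hazard h0(t) = 0.1, so H0(t) = 0.1 * t.
times = [2.0, 3.0, 5.0, 7.0, 8.0, 10.0, 12.0, 15.0]
events = [1, 1, 0, 1, 1, 0, 1, 1]
obs, exp_count, z, p = one_sample_logrank(times, events, lambda t: 0.1 * t)
print(obs, round(exp_count, 2), round(z, 3), round(p, 3))
```

Because the variance equals the expected count for this weight, the statistic has the familiar (O - E) / sqrt(E) form.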
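generalization: Harrington-Fleming family of weights:
$W_{HF}(t) = Y(t)\, S_0(t)^p\, [1 - S_0(t)]^q$, for $p, q \ge 0$, where $S_0(t) = \exp[-H_0(t)]$ is the survival function under $H_0$
choice of p and q will put different weights on early, late departures from null hypothesis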
disadvantages of 1-sample tests:
neither SAS nor Stata permit one-sample tests
difficult to fully specify a priori reasonable form for $h_0(t)$
nonetheless, valuable didactically: form of tests similar to 2- and K-sample tests
Tests for 2 or more samples
let $h_j(t)$ denote hazard in group j at t, $j = 1, \dots, K$
hypotheses:
$H_0: h_1(t) = h_2(t) = \dots = h_K(t)$ for all $t \le \tau$
$H_A$: at least one of the $h_j(t)$ is different for some $t \le \tau$
global alternative
notation:
$d_{ij} \equiv$ # of failures at the ith failure time in group j
$Y_{ij} \equiv$ # subjects at risk at the ith failure time in group j
$d_i = \sum_j d_{ij}$, $Y_i = \sum_j Y_{ij}$ (pooled totals)
the index i refers to failures in the pooled group (i.e., all groups put together)
will continue to derive tests as comparisons of observed and expected weighted hazards, as before
expected hazards: for 1-sample test, compute directly from prespecified hazard function
for K-sample test, compute hazard expected under hypothesis that there is no association
pool all groups together; derive pooled Nelson-Aalen type estimator of increments in cumulative hazard at each failure time: $d_i / Y_i$
compare stratum-specific increments in the cumulative hazard at the same failure time, $d_{ij} / Y_{ij}$, to the pooled increment $d_i / Y_i$
compute summed statistic
$Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i) \left[ \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right]$, $j = 1, \dots, K$
there are K summed statistics $Z_j(\tau)$
will combine information from each of the $Z_j(\tau)$ to get overall test statistic for global null
in principle can use different weight $W_j(t)$ for each different group j
in practice, restricted to form $W_j(t_i) = Y_{ij}\, W(t_i)$
so obtain
$Z_j(\tau) = \sum_{i=1}^{D} W(t_i) \left[ d_{ij} - Y_{ij}\, \frac{d_i}{Y_i} \right]$
weighted sum of difference between observed and expected deaths in each group under null
under null, hazard increment is $d_i / Y_i$; multiply by # at risk to get expected # of deaths
alternatively, $d_i$ is observed # of deaths in pooled sample at $t_i$; under null, divide deaths among groups in proportion to # at risk, so expect $Y_{ij}\, d_i / Y_i$ deaths in group j
similar to computations in chi-squared goodness of fit statistic
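The observed and expected counts with log-rank weights ($W(t_i) = 1$) can be sketched as follows. The function name and the six-subject data set are hypothetical; the loop simply accumulates $d_{ij}$ and $Y_{ij}\, d_i / Y_i$ over the pooled failure times.

```python
def logrank_observed_expected(times, events, groups):
    """Observed and expected event counts per group under the pooled null."""
    labels = sorted(set(groups))
    observed = {g: 0.0 for g in labels}
    expected = {g: 0.0 for g in labels}
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    for ti in failure_times:
        at_risk = {g: 0 for g in labels}   # Y_ij: at risk just before t_i
        deaths = {g: 0 for g in labels}    # d_ij: failures at t_i
        for t, e, g in zip(times, events, groups):
            if t >= ti:
                at_risk[g] += 1
            if t == ti and e == 1:
                deaths[g] += 1
        d_i = sum(deaths.values())
        y_i = sum(at_risk.values())
        for g in labels:
            observed[g] += deaths[g]
            expected[g] += at_risk[g] * d_i / y_i   # Y_ij * d_i / Y_i
    return observed, expected

# Hypothetical data: two groups, one censored observation (time 3).
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 1]
groups = [1, 2, 1, 2, 1, 2]
obs, exp_counts = logrank_observed_expected(times, events, groups)
z_stats = {g: obs[g] - exp_counts[g] for g in obs}   # Z_j(tau), log-rank weights
print(obs, exp_counts, z_stats)
```

The $Z_j$ sum to zero across groups, which is the linear dependence that later forces the test to use only K-1 of them.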
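Variance of the statistics
$\hat\sigma_{jj} = \sum_{i=1}^{D} W(t_i)^2\, \frac{Y_{ij}}{Y_i} \left( 1 - \frac{Y_{ij}}{Y_i} \right) \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i$
covariance
$\hat\sigma_{jg} = -\sum_{i=1}^{D} W(t_i)^2\, \frac{Y_{ij}}{Y_i}\, \frac{Y_{ig}}{Y_i} \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i$, where $g \ne j$
interpretation of components of formulae:
$(Y_i - d_i) / (Y_i - 1)$: correction for ties; equals 1 if $d_i = 1$ (no ties)
$\frac{Y_{ij}}{Y_i} \left( 1 - \frac{Y_{ij}}{Y_i} \right) d_i$ and $-\frac{Y_{ij}}{Y_i}\, \frac{Y_{ig}}{Y_i}\, d_i$ come from variance/covariance of multinomial random variable with parameters $d_i$, $p_j = Y_{ij} / Y_i$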
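A sketch of the covariance computation, assuming the data have already been reduced to a risk table with one row per distinct failure time. The table layout, the function name, and the `weight` argument (a function of the pooled number at risk, in the Tarone-Ware style) are illustrative assumptions.

```python
def logrank_covariance(risk_table, weight=lambda y_total: 1.0):
    """Covariance matrix of the Z_j under the null.

    risk_table: list of (d_i, [Y_i1, ..., Y_iK]) per distinct failure time.
    weight: W(t_i) as a function of Y_i; the default gives log-rank weights.
    """
    k = len(risk_table[0][1])
    cov = [[0.0] * k for _ in range(k)]
    for d_i, y_groups in risk_table:
        y_i = sum(y_groups)
        if y_i < 2:
            continue                        # a lone subject at risk adds no variance
        w2 = weight(y_i) ** 2
        ties = (y_i - d_i) / (y_i - 1)      # tie correction; 1 when d_i == 1
        for j in range(k):
            p_j = y_groups[j] / y_i
            for g in range(k):
                p_g = y_groups[g] / y_i
                delta = 1.0 if j == g else 0.0
                # multinomial variance (j == g) or covariance (j != g) term
                cov[j][g] += w2 * p_j * (delta - p_g) * ties * d_i
    return cov

# Hypothetical two-group risk table: (deaths, [at risk group 1, at risk group 2]).
table = [(1, [3, 3]), (1, [2, 3]), (1, [1, 2]), (1, [1, 1]), (1, [0, 1])]
cov = logrank_covariance(table)
print(cov)
```

Each row of the matrix sums to zero, so the full K x K matrix is singular.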
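when structured weights $W_j(t_i) = Y_{ij}\, W(t_i)$ used, the components of the vector $\{Z_j(\tau)\}$ are linearly dependent: $\sum_{j=1}^{K} Z_j(\tau) = 0$
so to test, select any K-1 of the $Z_j$s
let $Z = (Z_1(\tau), \dots, Z_{K-1}(\tau))'$ (column vector), and let $\hat\Sigma$ be corresponding covariance matrix
test statistic: $\chi^2 = Z' \hat\Sigma^{-1} Z$
distributed as $\chi^2_{K-1}$ under $H_0$
what additional can be done for K=2 (2 sample test)?
for K=2 (2 sample test), can get Z statistic $Z_1(\tau) / \sqrt{\hat\sigma_{11}}$, compare to normal distribution, test 1-sided alternatives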
weight functions W(t)
most common: W(t) = 1; log rank test
optimal power to detect proportional hazards alternatives to null
$W(t_i) = Y_i$ yields generalization of Mann-Whitney-Wilcoxon test
Tarone and Ware generalize this
$W(t_i) = f(Y_i)$ where f(y) is a known, fixed function
suggest $f(y) = \sqrt{y}$
weights based on $Y_i$ can be misleading, since weights in each stratum can depend on event times and censoring distributions
especially when censoring distributions in different arms/groups differ
alternative version without this problem
let $\tilde S(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{Y_i + 1} \right)$, a survival estimator close to the product limit estimator; use $W(t_i) = \tilde S(t_i)$
Fleming-Harrington weights
$W_{p,q}(t_i) = \hat S(t_{i-1})^p\, [1 - \hat S(t_{i-1})]^q$, for $p, q \ge 0$, where $\hat S$ is the product limit estimator from the pooled sample
use $\hat S(t_{i-1})$ so that weights are known before failure (predictable)
when weights are predictable, derivation works easily with counting process theory
$Z_j(\tau)$ is the integral of a predictable process with respect to a martingale
log-rank: p=q=0
version of Wilcoxon: p=1, q=0
greater weight to early failures/differences
by appropriate choice, can pick any single region to give greatest weights
statistics presented are generalizations to censored data of linear rank statistics
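A small sketch of how the Fleming-Harrington weights move with p and q, assuming a pooled risk table of (deaths, number at risk) pairs at the ordered failure times. The table and function name are hypothetical. The weight at each failure time uses the pooled product limit estimate just before that time, which is what makes the weights predictable.

```python
def fleming_harrington_weights(risk_table, p, q):
    """W(t_i) = S(t_{i-1})^p * (1 - S(t_{i-1}))^q, S = pooled product limit estimate.

    risk_table: list of (d_i, Y_i) at the ordered distinct failure times.
    """
    weights = []
    surv_prev = 1.0                       # S(t_0) = 1 before the first failure
    for d_i, y_i in risk_table:
        weights.append(surv_prev ** p * (1.0 - surv_prev) ** q)
        surv_prev *= 1.0 - d_i / y_i      # update only after the weight is fixed
    return weights

# Hypothetical pooled risk table.
table = [(1, 6), (1, 5), (1, 3), (1, 2), (1, 1)]
w_logrank = fleming_harrington_weights(table, 0, 0)   # all weights equal to 1
w_early = fleming_harrington_weights(table, 1, 0)     # decreasing: early emphasis
w_late = fleming_harrington_weights(table, 0, 1)      # increasing: late emphasis
print(w_logrank, w_early, w_late)
```

With p = q = 0 every weight is 1 (log-rank); p = 1, q = 0 gives weights that decline with the survival estimate; p = 0, q = 1 gives weights that grow over time.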
typically, compute tests with log-rank weights ($W(t_i) = 1$) and Gehan weights ($W(t_i) = Y_i$)
computed in SAS
see SAS code below:
    proc lifetest;
      time t1*d1(0);            /* d1 = 0 marks censored observations */
      strata g(0.5 1.5 2.5);    /* cutpoints separating the groups */
    run;
other option using test statement; book recommends using strata statement
automatically produces these 2 test statistics:
    The LIFETEST Procedure
    Testing Homogeneity of Survival Curves for T1 over Strata

    Rank Statistics
    g           Log-Rank     Wilcoxon
    1              2.680        127.0
    2            -15.767      -1545.0
    >=2.5         13.087       1418.0

    Covariance Matrix for the Log-Rank Statistics
    g                  1            2        >=2.5
    1            15.5407      -9.9934      -5.5473
    2            -9.9934      19.7910      -9.7976
    >=2.5        -5.5473      -9.7976      15.3449
    Covariance Matrix for the Wilcoxon Statistics
    g                  1            2        >=2.5
    1             158038       -96825       -61213
    2             -96825       192190       -95365
    >=2.5         -61213       -95365       156578
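As a check on the formula $\chi^2 = Z' \hat\Sigma^{-1} Z$, the sketch below takes the first two rank statistics and the matching 2 x 2 block of each covariance matrix from this SAS output and recomputes the chi-square values that PROC LIFETEST reports for these data (15.2198 for log-rank, 16.3034 for Wilcoxon). The helper name is hypothetical, and the inputs are the rounded printed values, so agreement is only to about two decimals.

```python
import math

def quadratic_form_2x2(z, cov):
    """Return z' cov^{-1} z for a 2-vector z and a 2 x 2 matrix cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return sum(z[i] * inv[i][j] * z[j] for i in range(2) for j in range(2))

# Groups 1 and 2 only: any K-1 = 2 of the 3 statistics give the same answer.
z_logrank = [2.680, -15.767]
cov_logrank = [[15.5407, -9.9934], [-9.9934, 19.7910]]
z_wilcoxon = [127.0, -1545.0]
cov_wilcoxon = [[158038.0, -96825.0], [-96825.0, 192190.0]]

chi2_logrank = quadratic_form_2x2(z_logrank, cov_logrank)
chi2_wilcoxon = quadratic_form_2x2(z_wilcoxon, cov_wilcoxon)
# For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_logrank = math.exp(-chi2_logrank / 2)
p_wilcoxon = math.exp(-chi2_wilcoxon / 2)
print(round(chi2_logrank, 2), round(p_logrank, 4))
print(round(chi2_wilcoxon, 2), round(p_wilcoxon, 4))
```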
    Test of Equality over Strata
                                          Pr >
    Test         Chi-Square    DF    Chi-Square
    Log-Rank        15.2198     2        0.0005
    Wilcoxon        16.3034     2        0.0003
    -2Log(LR)       20.6325     2        <.0001
other option, using test statement:
    proc lifetest;
      time t1*d1(0);
      test g2 g3;    /* tests groups 2 and 3 separately vs. 1, then combines */
    run;
    The LIFETEST Procedure

    Univariate Chi-Squares for the Wilcoxon Test
                     Test     Standard                        Pr >
    Variable    Statistic    Deviation    Chi-Square    Chi-Square
    g2            11.1941       3.0937       13.0923        0.0003
    g3           -10.2208       3.0123       11.5126        0.0007

    Covariance Matrix for the Wilcoxon Statistics
    Variable           g2           g3
    g2            9.57106     -4.90931
    g3           -4.90931      9.07394
    Forward Stepwise Sequence of Chi-Squares for the Wilcoxon Test
                                          Pr >    Chi-Square         Pr >
    Variable    DF    Chi-Square    Chi-Square     Increment    Increment
    g2           1       13.0923        0.0003       13.0923       0.0003
    g3           2       16.1524        0.0003        3.0601       0.0802
    Univariate Chi-Squares for the Log-Rank Test
                     Test     Standard                        Pr >
    Variable    Statistic    Deviation    Chi-Square    Chi-Square
    g2            15.7673       4.4521       12.5426        0.0004
    g3           -13.0873       3.9205       11.1435        0.0008

    Covariance Matrix for the Log-Rank Statistics
    Variable           g2           g3
    g2            19.8210      -9.8123
    g3            -9.8123      15.3701
    Forward Stepwise Sequence of Chi-Squares for the Log-Rank Test
                                          Pr >    Chi-Square         Pr >
    Variable    DF    Chi-Square    Chi-Square     Increment    Increment
    g2           1       12.5426        0.0004       12.5426       0.0004
    g3           2       15.1963        0.0005        2.6537       0.1033

Overall test shown here (the 2 DF row of the stepwise sequence); somewhat different from what is produced by strata statement
book recommends first version (strata statement), which is what is discussed above
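Each univariate chi-square in the TEST output is the square of a Z statistic, the test statistic divided by its standard deviation. The sketch below recomputes the log-rank line for g2 from the printed values (15.7673 and 4.4521) and also returns a one-sided p-value, which is the extra option available for a single comparison. The function name is hypothetical, and which tail counts as the alternative depends on how the variable is coded, so the upper tail used here is an assumption.

```python
import math

def z_test(statistic, std_dev):
    """Z statistic, its square, and normal-theory p-values."""
    z = statistic / std_dev
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))       # one-sided, alternative Z > 0
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # equals the chi-square(1) p-value
    return z, z * z, p_upper, p_two_sided

z, chi2, p_upper, p_two = z_test(15.7673, 4.4521)
print(round(z, 4), round(chi2, 4), round(p_two, 4))
```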
Stata:
after stsetting the data,
    sts test varlist

    . infile g T1 T2 d1 d2 d3 TA A TC C TP P Z1 Z2 Z3 Z4 Z5 Z6 Z7 z8 z9 z10 using d:\wpfiles\surv_anl\gvt1.txt
    (137 observations read)

    . stset T1 d1
                 id:  --    (meaning each record a unique subject)
         entry time:  --    (meaning all entered at time 0)
          exit time:  T1
     failure/censor:  d1
    . sts test g

       failure time:  T1
     failure/censor:  d1

    Log-rank test for equality of survivor functions

          |   Events
    g     |  observed       expected
    ------+-------------------------
    1     |        24          21.32
    2     |        23          38.77
    3     |        34          20.91
    ------+-------------------------
    Total |        81          81.00

                chi2(2) =      15.22
                Pr>chi2 =     0.0005
    . sts test g, wilcoxon

       failure time:  T1
     failure/censor:  d1

    Wilcoxon (Breslow) test for equality of survivor functions

          |   Events                       Sum of
    g     |  observed       expected        ranks
    ------+--------------------------------------
    1     |        24          21.32          127
    2     |        23          38.77        -1545
    3     |        34          20.91         1418
    ------+--------------------------------------
    Total |        81          81.00            0

                chi2(2) =      16.30
                Pr>chi2 =     0.0003
    . sts test g, logrank

       failure time:  T1
     failure/censor:  d1

    Log-rank test for equality of survivor functions

          |   Events
    g     |  observed       expected
    ------+-------------------------
    1     |        24          21.32
    2     |        23          38.77
    3     |        34          20.91
    ------+-------------------------
    Total |        81          81.00

                chi2(2) =      15.22
                Pr>chi2 =     0.0005
    . sts test g,fh(0,0)

             failure _d:  d1
       analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events         Events        Sum of
    g     |  observed       expected        ranks
    ------+--------------------------------------
    1     |        24          21.32    2.6800072
    2     |        23          38.77   -15.767289
    3     |        34          20.91    13.087282
    ------+--------------------------------------
    Total |        81          81.00            0

                chi2(2) =      15.22
                Pr>chi2 =     0.0005
    . sts test g,fh(1,0)

             failure _d:  d1
       analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events         Events        Sum of
    g     |  observed       expected        ranks
    ------+--------------------------------------
    1     |        24          21.32    1.0428607
    2     |        23          38.77   -11.899202
    3     |        34          20.91    10.856341
    ------+--------------------------------------
    Total |        81          81.00            0

                chi2(2) =      15.92
                Pr>chi2 =     0.0003
    . sts test g,fh(0,1)

             failure _d:  d1
       analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events         Events        Sum of
    g     |  observed       expected        ranks
    ------+--------------------------------------
    1     |        24          21.32    1.6371465
    2     |        23          38.77   -3.8680867
    3     |        34          20.91    2.2309402
    ------+--------------------------------------
    Total |        81          81.00            0

                chi2(2) =       8.45
                Pr>chi2 =     0.0147
    . sts test g,fh(0.5,0.5)

             failure _d:  d1
       analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events         Events        Sum of
    g     |  observed       expected        ranks
    ------+--------------------------------------
    1     |        24          21.32    2.3425108
    2     |        23          38.77   -6.6296218
    3     |        34          20.91    4.2871109
    ------+--------------------------------------
    Total |        81          81.00            0

                chi2(2) =      14.50
                Pr>chi2 =     0.0007
Tests for trend:
what are null, alternative hypotheses?
What is idea behind test for trend?
(Same as in categorical data)
null hypothesis is the same as before
idea is to get more power under ordered alternatives
will use the same stratum-specific statistics $Z_j(\tau)$ and estimated covariance matrix $\hat\Sigma$
strata are presumed meaningfully ordered
assign stratum-specific scores $a_j$
typically, $a_j = j$ chosen, but may choose other values (e.g., if using categories based on continuous variables, might choose mean of underlying continuous variable in each category)
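compute Z-statistic:
$Z = \frac{\sum_{j=1}^{K} a_j\, Z_j(\tau)}{\sqrt{\sum_{j=1}^{K} \sum_{g=1}^{K} a_j\, a_g\, \hat\sigma_{jg}}}$
i.e., numerator is weighted sum of $Z_j(\tau)$ statistics
denominator is standard deviation under null, derived from $\hat\Sigma$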
with SAS, use TEST statement
asthma data
    proc lifetest notable;
      time ttr28*rcsr28(1);    /* rcsr28 = 1 marks censored observations */
      test tseq;
    run;

    Univariate Chi-Squares for the Log-Rank Test
                     Test     Standard                        Pr >
    Variable    Statistic    Deviation    Chi-Square    Chi-Square
    TSEQ         -15.9666      10.0939        2.5021        0.1137
in Stata, use trend option
GVT data
    . sts test g,tr

             failure _d:  d1
       analysis time _t:  T1

    Log-rank test for equality of survivor functions

          |   Events         Events
    g     |  observed       expected
    ------+-------------------------
    1     |        24          21.32
    2     |        23          38.77
    3     |        34          20.91
    ------+-------------------------
    Total |        81          81.00

                chi2(2) =      15.22
                Pr>chi2 =     0.0005

    Test for trend of survivor functions

                chi2(1) =       2.58
                Pr>chi2 =     0.1082
    . sts test g,tr fh(0.5,0.5)

             failure _d:  d1
       analysis time _t:  T1

    Fleming-Harrington test for equality of survivor functions

          |   Events         Events        Sum of
    g     |  observed       expected        ranks
    ------+--------------------------------------
    1     |        24          21.32    2.3425108
    2     |        23          38.77   -6.6296218
    3     |        34          20.91    4.2871109
    ------+--------------------------------------
    Total |        81          81.00            0

                chi2(2) =      14.50
                Pr>chi2 =     0.0007

    Test for trend of survivor functions

                chi2(1) =       0.60
                Pr>chi2 =     0.4385
    . generate g3 = g^2
    . sts test g3,tr

             failure _d:  d1
       analysis time _t:  T1

    Log-rank test for equality of survivor functions

          |   Events         Events
    g3    |  observed       expected
    ------+-------------------------
    1     |        24          21.32
    4     |        23          38.77
    9     |        34          20.91
    ------+-------------------------
    Total |        81          81.00

                chi2(2) =      15.22
                Pr>chi2 =     0.0005

    Test for trend of survivor functions

                chi2(1) =       4.78
                Pr>chi2 =     0.0289

nonlinear transformations of scores yield different tests
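The two Stata trend results for the GVT data can be recomputed from the trend Z-statistic. The sketch below assumes the log-rank rank statistics and covariance matrix printed in the SAS output earlier in these notes apply to the same data (both programs report chi2(2) = 15.22); the function name is hypothetical, and the rounded inputs limit agreement to about two decimals.

```python
import math

def trend_test(scores, z, cov):
    """Trend statistic: sum_j a_j Z_j over the square root of a' Sigma a."""
    k = len(scores)
    numerator = sum(a * zj for a, zj in zip(scores, z))
    variance = sum(scores[j] * scores[g] * cov[j][g]
                   for j in range(k) for g in range(k))
    stat = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(stat) / math.sqrt(2))   # chi-square(1) p-value of stat^2
    return stat, stat * stat, p_value

z_logrank = [2.680, -15.767, 13.087]
cov_logrank = [[15.5407, -9.9934, -5.5473],
               [-9.9934, 19.7910, -9.7976],
               [-5.5473, -9.7976, 15.3449]]

_, chi2_linear, p_linear = trend_test([1, 2, 3], z_logrank, cov_logrank)     # a_j = j
_, chi2_squared, p_squared = trend_test([1, 4, 9], z_logrank, cov_logrank)   # a_j = j^2
print(round(chi2_linear, 2), round(p_linear, 4))
print(round(chi2_squared, 2), round(p_squared, 4))
```

Scores 1, 2, 3 give about 2.58 and scores 1, 4, 9 give about 4.78, matching the two Stata runs: the same group statistics yield different trend tests under different scores.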
Stratified tests
want to test whether, within levels of some covariate(s) X,
hazard is the same/not associated with other covariate A
(test about main effect of A)
what is null hypothesis?
When will tests be of particular use?
Null hypothesis:
$H_0: h(t \mid X, A = a) = h(t \mid X, A = a')$ for all X, a, a'
useful if:
have some covariate $X_1$ associated with hazard; i.e., $h(t \mid X_1)$ varies with $X_1$
if A is treatment of interest and associated with covariate $X_1$, have confounding
unconditional tests biased
similarly, if have censoring which is associated with $X_1$, comparison of crude hazards will be biased
stratified tests:
nonparametric if strata levels naturally defined
can also be done by categorizing continuous variables,
collapsing
what are problems with this?
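no longer nonparametric; makes homogeneity assumption
hypothesis:
$H_0: h_{1s}(t) = h_{2s}(t) = \dots = h_{Ks}(t)$ for all strata s, $t \le \tau$
test statistic:
let $Z_{j\cdot}(\tau) = \sum_s Z_{js}(\tau)$; $\hat\sigma_{jg\cdot} = \sum_s \hat\sigma_{jgs}$
construct chi-squared test statistic as above
limitations on stratified test?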
As with other stratified analyses, when # of strata gets
large, can lose power, strata get small
SAS implementation:
    proc lifetest plots = (s);
      time t1*d1(0);
      strata z3;     /* sex */
      test g2 g3;
    run;
Stata:
    . sts test g, strata(Z3) detail
    /* detail provides stratum-specific as well as overall stratified tests */

       failure time:  T1
     failure/censor:  d1
    Stratified log-rank test for equality of survivor functions

    -> Z3 = 0
          |   Events
    g     |  observed       expected
    ------+-------------------------
    1     |         8           7.21
    2     |        10          18.83
    3     |        18           9.96
    ------+-------------------------
    Total |        36          36.00

                chi2(2) =      11.33
                Pr>chi2 =     0.0035

    -> Z3 = 1
          |   Events
    g     |  observed       expected
    ------+-------------------------
    1     |        16          13.74
    2     |        13          20.35
    3     |        16          10.91
    ------+-------------------------
    Total |        45          45.00

                chi2(2) =       5.45
                Pr>chi2 =     0.0655

    -> Total
          |   Events
    g     |  observed    expected(*)
    ------+-------------------------
    1     |        24          20.95
    2     |        23          39.18
    3     |        34          20.87
    ------+-------------------------
    Total |        81          81.00

    (*) sum over calculations within Z3

                chi2(2) =      15.90
                Pr>chi2 =     0.0004

results fairly similar to unstratified test
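The stratified statistic $Z_{j\cdot}$ is the sum of the within-stratum observed-minus-expected differences. The sketch below adds up the two Z3 strata from the Stata output and recovers the "Total" rows; the dictionary layout is a hypothetical convenience. The chi-square itself also needs the summed within-stratum covariance matrices, which this output does not print, so it is not recomputed here.

```python
# Observed and expected events by group (g = 1, 2, 3) within each Z3 stratum,
# copied from the Stata output for the stratified log-rank test.
strata = {
    "Z3=0": {"observed": [8, 10, 18], "expected": [7.21, 18.83, 9.96]},
    "Z3=1": {"observed": [16, 13, 16], "expected": [13.74, 20.35, 10.91]},
}

def pooled_over_strata(strata):
    """Sum observed, expected, and O - E over strata, group by group."""
    k = len(next(iter(strata.values()))["observed"])
    observed = [sum(s["observed"][j] for s in strata.values()) for j in range(k)]
    expected = [sum(s["expected"][j] for s in strata.values()) for j in range(k)]
    z = [o - e for o, e in zip(observed, expected)]   # Z_j. with log-rank weights
    return observed, expected, z

obs, exp_counts, z_strat = pooled_over_strata(strata)
print(obs, [round(e, 2) for e in exp_counts], [round(v, 2) for v in z_strat])
```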
stratified test for matched pairs
each pair is a stratum with one subject from each group
in any stratum, if both subjects are at risk at the first failure time $t_i$ and the group-1 subject fails, have contribution to $Z_1$ at first failure time of $W(t_i)(1 - 1/2) = W(t_i)/2$; at subsequent failure time (when/if other subject fails) of 0
if other subject fails first, have contribution of $W(t_i)(0 - 1/2) = -W(t_i)/2$, contribution of 0 at other failure time
if no subject fails, no contribution
second failure in stratum contributes nothing
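let $D_1$ be the # of matched pairs in which the individual from group 1 fails first, $D_2$ the # in which the individual from group 2 fails first
for any weight functions discussed, weight at the first failure in a pair is a constant w, and get
$Z_1(\tau) = w\, (D_1 - D_2)/2$, $\hat\sigma_{11} = w^2 (D_1 + D_2)/4$
test statistic: $Z = (D_1 - D_2) / \sqrt{D_1 + D_2}$
what is needed condition for asymptotics to work?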
asymptotically normal (i.e., as # of strata goes to infinity)
strata/pairs in which both subjects fail contribute nothing
McNemar’s test; also used in matched case-control
studies (there, strata always have 1 failure)
no loss of information by excluding strata with 0 failures,
2 failures
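A sketch of the matched-pairs statistic, assuming each pair is recorded as ((time, event) for group 1, (time, event) for group 2) with event = 1 for failure. The pairs are hypothetical, and the tie handling (a failure tied with a censoring is treated as occurring first; two tied failures contribute nothing) is an assumption of this example.

```python
import math

def count_first_failures(pairs):
    """D1, D2: pairs whose first event is an observed failure in group 1 or 2."""
    d1 = d2 = 0
    for (t1, e1), (t2, e2) in pairs:
        if e1 == 1 and (t1 < t2 or (t1 == t2 and e2 == 0)):
            d1 += 1     # group-1 subject fails while partner is still at risk
        elif e2 == 1 and (t2 < t1 or (t1 == t2 and e1 == 0)):
            d2 += 1     # group-2 subject fails while partner is still at risk
    return d1, d2

def matched_pairs_test(d1, d2):
    z = (d1 - d2) / math.sqrt(d1 + d2)
    return z, math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value

# Hypothetical matched pairs.
pairs = [((5, 1), (8, 1)), ((3, 1), (6, 0)), ((7, 0), (9, 1)), ((4, 1), (2, 1)),
         ((6, 1), (6, 1)), ((10, 0), (10, 0)), ((2, 1), (5, 1)), ((9, 1), (12, 0))]
d1, d2 = count_first_failures(pairs)
z, p = matched_pairs_test(d1, d2)
print(d1, d2, round(z, 4), round(p, 4))
```

The third pair shows why censoring matters: its group-1 member is censored before the group-2 member fails, so that failure occurs with only one subject at risk and adds nothing.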
Tests for crossing hazards
Discuss Renyi-type tests
tests with power to detect crossing hazards alternatives
consider 2-sample tests
censored-data analogs of Kolmogorov-Smirnov statistic
find value of test statistic $Z(t_i) = \sum_{t_k \le t_i} W(t_k) \left[ d_{k1} - Y_{k1}\, \frac{d_k}{Y_k} \right]$ at each death time
increments in test statistic at death time $t_i$ have positive expectation if $h_1(t_i) > h_2(t_i)$, negative expectation if reverse
thus, expect $Z(t)$ to increase until hazards cross, decrease thereafter
if hazards cross, statistic $|Z(t)|$ will reach maximum before maximum death time
where hazards do not cross, can base test on test stat at maximum death time, i.e., $Z(\tau)$
since this is expected to increase, this will be reasonably powerful
where hazards cross, poor power, since test stat decreases after hazards cross; early and late parts of test statistic may cancel each other out
base test on maximum value of test statistic
let $Q = \sup\{ |Z(t)| : t \le \tau \} / \sigma(\tau)$ be a test statistic
where $\sigma^2(\tau)$ is, as before, variance of $Z(\tau)$ (i.e., at end of follow-up)
under null Q is approximately distributed as $\sup\{ |B(x)| : 0 \le x \le 1 \}$, where $B$ is a standard Brownian motion process
critical values of Q found in table C.5
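A sketch of the Renyi-type statistic with log-rank weights, assuming a two-sample risk table with one row per distinct death time. The table is hypothetical and built so that group 1 has the early deaths and group 2 the late ones, mimicking crossing hazards. The function returns both Q and the ordinary end-of-follow-up statistic for comparison.

```python
import math

def renyi_logrank(risk_table):
    """Renyi-type statistic with W(t) = 1 for two samples.

    risk_table: list of (d_i1, d_i2, Y_i1, Y_i2) at ordered distinct death times.
    Returns (Q, |Z(tau)| / sigma(tau)).
    """
    z = 0.0          # running value of Z(t)
    var = 0.0        # running value of sigma^2(t)
    sup_abs = 0.0    # largest |Z(t)| seen so far
    for d1, d2, y1, y2 in risk_table:
        d, y = d1 + d2, y1 + y2
        z += d1 - y1 * d / y
        sup_abs = max(sup_abs, abs(z))
        if y > 1:
            var += (y1 / y) * (y2 / y) * ((y - d) / (y - 1)) * d
    sigma = math.sqrt(var)
    return sup_abs / sigma, abs(z) / sigma

# Hypothetical data: three early deaths in group 1, four late deaths in group 2.
table = [(1, 0, 5, 5), (1, 0, 4, 5), (1, 0, 3, 5),
         (0, 1, 2, 5), (0, 1, 2, 4), (0, 1, 2, 3), (0, 1, 2, 2)]
q_stat, end_stat = renyi_logrank(table)
print(round(q_stat, 4), round(end_stat, 4))
```

In this example the running statistic peaks after the third death and then falls back toward zero, so Q is much larger than the end-of-follow-up value, which is the cancellation the slide describes.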
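Tests for equality of survival functions at particular time point
Let $t_0$ be the time point of interest
Let $\hat S_1(t_0)$, $\hat S_2(t_0)$, etc. be the product limit estimators in each group; $\hat S$ denote the (column) vector $(\hat S_1(t_0), \dots, \hat S_K(t_0))'$
$H_0: S_1(t_0) = S_2(t_0) = \dots = S_K(t_0)$
Let C denote matrix of contrasts, e.g., for K = 3, $C = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}$
each row defines separate contrast: $S_1(t_0) - S_2(t_0) = 0$, $S_2(t_0) - S_3(t_0) = 0$
hypothesis: $H_0: C S = 0$
how to test:
$\chi^2 = (C \hat S)' [C \hat V C']^{-1} (C \hat S)$ is test statistic, where $\hat V$ is estimated covariance matrix of $\hat S$
Then, $\chi^2$ is distributed as chi-squared with K-1 degrees of freedom under $H_0$
what is structure of $\hat V$, and how does it lead to simple 2-sample test?
$\hat V$ is diagonal since the $\hat S_j(t_0)$s are independent; leads to Z test: $Z = [\hat S_1(t_0) - \hat S_2(t_0)] / \sqrt{\hat V[\hat S_1(t_0)] + \hat V[\hat S_2(t_0)]}$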
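A sketch of the two-sample Z test at a fixed time, assuming the survival estimates are Kaplan-Meier (product limit) estimates with Greenwood variances. The data, the time point t0 = 7, and the function names are hypothetical.

```python
import math

def km_at(times, events, t0):
    """Product limit estimate and Greenwood variance at time t0."""
    surv, greenwood = 1.0, 0.0
    failure_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= t0})
    for ti in failure_times:
        y = sum(1 for t in times if t >= ti)                             # at risk
        d = sum(1 for t, e in zip(times, events) if t == ti and e == 1)  # deaths
        surv *= 1.0 - d / y
        if y > d:
            greenwood += d / (y * (y - d))
    return surv, surv * surv * greenwood

def fixed_time_z(sample1, sample2, t0):
    """Z = difference in survival at t0 over the square root of the summed variances."""
    s1, v1 = km_at(sample1[0], sample1[1], t0)
    s2, v2 = km_at(sample2[0], sample2[1], t0)
    return s1, s2, (s1 - s2) / math.sqrt(v1 + v2)

# Hypothetical samples given as (times, events).
group1 = ([2, 4, 6, 8, 10], [1, 1, 0, 1, 0])
group2 = ([1, 2, 3, 5, 9], [1, 1, 1, 1, 0])
s1, s2, z = fixed_time_z(group1, group2, t0=7)
print(round(s1, 3), round(s2, 3), round(z, 3))
```

The variances simply add because the two samples are independent, which is the diagonal covariance structure noted above.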
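related: nonparametric estimation of contrasts in survival
relative risk: general term
risk ratio: $RR(t) = F_1(t) / F_0(t)$, where $F_j(t) = 1 - S_j(t)$
estimate as $\widehat{RR}(t) = [1 - \hat S_1(t)] / [1 - \hat S_0(t)]$
how should one estimate confidence intervals?
delta method, log transformation:
not $\widehat{RR} \pm 1.96\, \widehat{SE}(\widehat{RR})$
but $\exp\{ \log \widehat{RR} \pm 1.96\, \widehat{SE}(\log \widehat{RR}) \}$
usual approach
symmetry of confidence intervals when indices of exposed/unexposed permuted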
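A sketch of the log-scale interval, assuming the risk ratio is defined as the ratio of cumulative failure probabilities, $[1 - S_1(t)] / [1 - S_0(t)]$, and that the two survival estimates are independent with known variance estimates. The input values are hypothetical. By the delta method, the variance of the log of each failure probability is approximately its variance divided by its square.

```python
import math

def risk_ratio_ci(s1, var1, s0, var0, z=1.96):
    """Risk ratio (1 - s1) / (1 - s0) with a confidence interval built on the log scale."""
    f1, f0 = 1.0 - s1, 1.0 - s0
    rr = f1 / f0
    # Delta method: Var(log F) is approximately Var(F) / F^2 for each group.
    se_log = math.sqrt(var1 / f1 ** 2 + var0 / f0 ** 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical survival estimates and variances at a common time point.
rr, lower, upper = risk_ratio_ci(0.6, 0.048, 0.2, 0.032)
rr_swapped, lower_swapped, upper_swapped = risk_ratio_ci(0.2, 0.032, 0.6, 0.048)
print(round(rr, 3), round(lower, 3), round(upper, 3))
print(round(rr_swapped, 3), round(lower_swapped, 3), round(upper_swapped, 3))
```

Swapping the exposed and unexposed labels inverts the estimate and both limits exactly, which is the symmetry property that the untransformed interval lacks.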