3.1 Hypothesis Testing
• One sample test
- A censored sample of size $n$ from some population
- We want to test the hypothesis that the population hazard rate is $h_0(t)$ for all $t \le \tau$, i.e.,
$$H_0: h(t) = h_0(t), \quad 0 < t \le \tau$$
- Typically, $\tau$ is taken to be the largest of the observed study times
- Note that we have information only at the failure times. The estimator of the hazard function at failure time $t_i$ is
$$\hat h(t_i) = \frac{d_i}{n_i},$$
where $d_i$ is the number of failures at $t_i$ and $n_i$ is the number at risk just prior to $t_i$.
Therefore, we compare the observed hazard with the hazard under the null hypothesis:
$$\hat h(t_i) - h_0(t_i) = \frac{d_i}{n_i} - h_0(t_i)$$
Since $\hat h(t_i)$ is the maximum likelihood estimator of $h(t_i)$, it is asymptotically unbiased. Under $H_0$ this means
$$E\left(\frac{d_i}{n_i}\right) = h_0(t_i),$$
and its variance under $H_0$ is
$$\operatorname{Var}\left(\frac{d_i}{n_i}\right) = \frac{h_0(t_i)\{1 - h_0(t_i)\}}{n_i}.$$
Therefore, asymptotically under $H_0$,
$$\frac{d_i}{n_i} \sim N\left(h_0(t_i),\ \frac{h_0(t_i)\{1 - h_0(t_i)\}}{n_i}\right).$$
Write $E_i = n_i h_0(t_i)$. When $h_0(t_i)$ is small, $1 - h_0(t_i) \approx 1$, and so
$$d_i - E_i \sim N(0, E_i).$$
- If the data follow the distribution specified by $H_0$, each $d_i - E_i$ should be small. Large differences are therefore evidence against the null hypothesis.
- However, we have $k$ differences, one for each failure time:

  t_j   n_j   d_j   E_j   Diff.
  t_1   n_1   d_1   E_1   d_1 - E_1
  t_2   n_2   d_2   E_2   d_2 - E_2
  ...   ...   ...   ...   ...
  t_k   n_k   d_k   E_k   d_k - E_k
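The table of differences can be built directly from a censored sample. The Python sketch below is illustrative only. The helper name `risk_table` is hypothetical, and it assumes the usual Kaplan-Meier convention that a subject censored at a failure time is still at risk at that time. It tabulates $n_i$, $d_i$, $\hat h(t_i)$, $E_i = n_i h_0(t_i)$ and $d_i - E_i$ for the 21 remission times of this section's example, using $h_0(t) = 0.05$.

```python
from collections import Counter

def risk_table(times, events):
    """Return [(t, n_at_risk, n_failures)] at each distinct failure time.

    times  : observed times (failure or censoring)
    events : 1 for an observed failure, 0 for a censored observation
    Assumption: a subject censored at t is still counted at risk at a failure at t.
    """
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    table = []
    for t in sorted(failures):
        n_at_risk = sum(1 for s in times if s >= t)  # still under observation at t
        table.append((t, n_at_risk, failures[t]))
    return table

# Remission times (weeks); event 0 marks the "+" (censored) observations.
times = [6, 6, 6, 7, 10, 13, 16, 22, 23,
         6, 9, 10, 11, 17, 19, 20, 25, 32, 32, 34, 35]
events = [1] * 9 + [0] * 12

h0 = 0.05  # constant hazard specified by H0
rows = []
for t, n, d in risk_table(times, events):
    h_hat = d / n        # estimated hazard d_i / n_i
    expected = n * h0    # E_i = n_i * h0(t_i)
    rows.append((t, n, d, h_hat, expected, d - expected))
    print(f"t={t:2d} n={n:2d} d={d} h_hat={h_hat:.3f} E={expected:.2f} d-E={d - expected:+.2f}")
```

Each printed row corresponds to one column $t_j$ of the table above.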
To test $H_0$, we sum the differences:
$$Z = \sum_{i=1}^{k} (d_i - E_i), \qquad E(Z) = 0 \text{ under } H_0.$$
- How about the variance of $Z$ under $H_0$? If the $k$ differences are treated as approximately uncorrelated, then $\operatorname{Var}(Z) \approx \sum_{i=1}^{k} E_i$.
- Therefore, under $H_0$,
$$\frac{Z}{\sqrt{\operatorname{Var}(Z)}} = \frac{\sum_{i=1}^{k}(d_i - E_i)}{\sqrt{\sum_{i=1}^{k} E_i}} \sim N(0, 1)$$
or, equivalently,
$$\frac{\left\{\sum_{i=1}^{k}(d_i - E_i)\right\}^2}{\sum_{i=1}^{k} E_i} \sim \chi^2_1$$
- This is called the log-rank test
- Example: A study of the remission times (in weeks) of 21 patients, where + denotes a censored time:
6, 6, 6, 7, 10, 13, 16, 22, 23, 6+, 9+, 10+, 11+, 17+, 19+, 20+, 25+, 32+, 32+, 34+, 35+
Test the null hypothesis $H_0: h_0(t) = 0.05$.

  t_j   6   7   10   13   16   22   23
  n_j
  d_j
  E_j

3.4 Comparisons of Several Survivor Functions
- Suppose we have $r$ groups of lifetimes and want to test whether they are all equal
- Example: Pike (1966), Table 1.1: Days to vaginal cancer mortality in rats (* = censored)

  Group 1: 143, 164, 188, 188, 190, 192, 206, 209, 213, 216, 216*, 220, 227, 230, 234, 244*, 246, 265, 304
  Group 2: 142, 156, 162, 198, 204*, 205, 232, 232, 233, 233, 233, 233, 239, 240, 261, 280, 280, 296, 296, 323, 344*

- Suppose there are $r$ independent censored samples. For sample $i$ we observe
$$t_{i01} \le \cdots \le t_{i0m_{i0}} < t_{i1} \le t_{i11} \le \cdots \le t_{i1m_{i1}} < t_{i2} \le \cdots < t_{ik_i} \le t_{ik_i1} \le \cdots \le t_{ik_im_{ik_i}},$$
where $d_{il}$ failures occur at $t_{il}$ ($l = 1, \dots, k_i$). The values $t_{il1}, \dots, t_{ilm_{il}}$ are the censored observations that lie between $t_{il}$ and the next failure time, and $t_{i01}, \dots, t_{i0m_{i0}}$ are those before the first failure.
- We want to test whether all the survival functions are the same
- Under this null hypothesis, all the samples come from the same distribution
- We can therefore pool all the samples together to form a pooled sample
- Note that the KM estimator of the survival function changes only at the observed failures
- Therefore, we focus only on the observed failures
- Let $t_1 < t_2 < \cdots < t_k$ denote the distinct failure times in the pooled sample
- Suppose $d_j$ failures occur at $t_j$ and $n_j$ study subjects are at risk just prior to $t_j$ ($j = 1, \dots, k$) in the pooled sample
- Let $d_{ij}$ and $n_{ij}$ be the corresponding numbers in sample $i$ ($i = 1, \dots, r$)
- Then, at $t_j$, we can form a $2 \times r$ contingency table:

  Group   failures   survivors     Sub-total
  1       d_1j       n_1j - d_1j   n_1j
  2       d_2j       n_2j - d_2j   n_2j
  ...     ...        ...           ...
  r       d_rj       n_rj - d_rj   n_rj
  Total   d_j        n_j - d_j     n_j
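The one-sample log-rank statistic can be written as a short function. This Python sketch is illustrative: the function name `one_sample_logrank` is hypothetical, and it follows the formulas of this section. It uses $E_i = n_i h_0$, the approximation $\operatorname{Var}(Z) \approx \sum_i E_i$, and a $\chi^2_1$ reference distribution. The risk-set counts are the ones tabulated from the 21 remission times at $t_j = 6, 7, 10, 13, 16, 22, 23$, with a subject censored at a failure time kept in the risk set.

```python
import math

def one_sample_logrank(n_at_risk, deaths, h0):
    """One-sample log-rank test of H0: h(t) = h0 at the observed failure times.

    Uses E_i = n_i * h0 and Var(Z) = sum(E_i).
    Returns (Z, chi2, p_value), with chi2 referred to chi-square on 1 df.
    """
    expected = [n * h0 for n in n_at_risk]
    z = sum(d - e for d, e in zip(deaths, expected))  # Z = sum(d_i - E_i)
    var_z = sum(expected)                             # Var(Z) under H0
    chi2 = z * z / var_z
    # For 1 df, P(chi2_1 > x) = P(|N(0,1)| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return z, chi2, p_value

# Risk sets and failures at t = 6, 7, 10, 13, 16, 22, 23 (remission example).
n_at_risk = [21, 17, 15, 12, 11, 7, 6]
deaths = [3, 1, 1, 1, 1, 1, 1]
z, chi2, p = one_sample_logrank(n_at_risk, deaths, h0=0.05)
print(f"Z = {z:.2f}, chi2 = {chi2:.3f}, p = {p:.4f}")
```

For these data, $\sum E_i = 4.45$ and $Z = 9 - 4.45 = 4.55$, which gives $\chi^2 \approx 4.65$ and a p-value of about 0.03.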
- Under the null hypothesis (no difference among the groups), the conditional distribution of $(d_{1j}, \dots, d_{rj})$ given $d_j$ is the (multivariate) hypergeometric distribution
$$\Pr\{d_{1j} \text{ from group 1}, d_{2j} \text{ from group 2}, \dots, d_{rj} \text{ from group } r \mid d_j\} = \frac{\binom{n_{1j}}{d_{1j}}\binom{n_{2j}}{d_{2j}}\cdots\binom{n_{rj}}{d_{rj}}}{\binom{n_j}{d_j}}$$
- Why? (What is the hypergeometric distribution?)
- The conditional mean of $d_{ij}$ is
$$w_{ij} = \frac{n_{ij} d_j}{n_j}$$
- The conditional variance of $d_{ij}$ is
$$(V_j)_{ii} = \frac{n_{ij}(n_j - n_{ij})\, d_j (n_j - d_j)}{n_j^2 (n_j - 1)}$$
- The conditional covariance of $d_{ij}$ and $d_{lj}$ ($i \ne l$) is
$$(V_j)_{il} = -\frac{n_{ij} n_{lj}\, d_j (n_j - d_j)}{n_j^2 (n_j - 1)}$$
- The vector $v_j = (d_{1j} - w_{1j}, \dots, d_{rj} - w_{rj})^\top$ has (conditional) mean zero and variance matrix $V_j$
- Although $v_j$ has $r$ components, we need only $r - 1$ of them, because $\sum_i d_{ij} = \sum_i w_{ij} = d_j$ and so the components sum to zero. Redefine
$$v_j^* = (d_{1j} - w_{1j}, \dots, d_{(r-1)j} - w_{(r-1)j})^\top$$
with the corresponding covariance matrix $V_j^* = [(V_j)_{il}]_{i,l = 1, \dots, r-1}$
- The log-rank statistic is
$$v = \sum_{j=1}^{k} v_j^*$$
- If the $k$ contingency tables were independent, the variance of the log-rank statistic would be
$$V = V_1^* + V_2^* + \cdots + V_k^*$$
- The log-rank test for the equality of the $r$ survival curves uses
$$v^\top V^{-1} v,$$
which is asymptotically $\chi^2_{r-1}$ distributed
- The same $\chi^2_{r-1}$ value is obtained whichever $r - 1$ of the $r$ groups are used to form $v$ and the corresponding $(r-1) \times (r-1)$ matrix $V$

Pool the two groups together (group label in parentheses, * = censored):

  142(2)  143(1)  156(2)  162(2)  164(1)  188(1)  188(1)  190(1)  192(1)  198(2)
  204(2)* 205(2)  206(1)  209(1)  213(1)  216(1)  216(1)* 220(1)  227(1)  230(1)
  232(2)  232(2)  233(2)  233(2)  233(2)  233(2)  234(1)  239(2)  240(2)  244(1)*
  246(1)  261(2)  265(1)  280(2)  280(2)  296(2)  296(2)  304(1)  323(2)  344(2)*

- In total there are 29 distinct failure times and hence 29 contingency tables
- Just prior to $t_1 = 142$: $n_1 = 40$, $d_1 = 1$, $d_{11} = 0$, $n_{11} = 19$, $d_{21} = 1$, $n_{21} = 21$

  Group      failures   survivors   Subtotal
  1          0          19          19
  2          1          20          21
  Subtotal   1          39          40

- Therefore, $w_{11} = n_{11} d_1 / n_1 = 19 \times 1/40 = 0.475$ and $w_{21} = n_{21} d_1 / n_1 = 21 \times 1/40 = 0.525$, so
$$v_1 = (0 - 0.475,\ 1 - 0.525) = (-0.475,\ 0.475)$$
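The conditional moments of a single $2 \times r$ table at $t_j$ take only a few lines to compute. This Python sketch is illustrative, and the helper name `table_moments` is hypothetical. It returns the full $r$-vector $v_j$ and the $r \times r$ matrix $V_j$ from the formulas above. It then checks them on the first rat-data table at $t_1 = 142$.

```python
def table_moments(n_i, d_i):
    """Conditional moments of (d_1j, ..., d_rj) given d_j for one table.

    n_i : numbers at risk in each group just prior to t_j
    d_i : numbers of failures in each group at t_j
    Returns (v, V): v[i] = d_ij - w_ij, and V is the r x r covariance matrix V_j.
    """
    n = sum(n_i)
    d = sum(d_i)
    w = [ni * d / n for ni in n_i]            # w_ij = n_ij d_j / n_j
    v = [di - wi for di, wi in zip(d_i, w)]
    # Common factor d_j (n_j - d_j) / (n_j^2 (n_j - 1)); it is 0 when n_j = 1.
    c = d * (n - d) / (n * n * (n - 1)) if n > 1 else 0.0
    r = len(n_i)
    V = [[(n_i[a] * (n - n_i[a]) if a == b else -n_i[a] * n_i[b]) * c
          for b in range(r)] for a in range(r)]
    return v, V

# First table of the rat example (t_1 = 142): group 1 has 19 at risk and
# 0 failures, and group 2 has 21 at risk and 1 failure.
v1, V1 = table_moments([19, 21], [0, 1])
print(v1)  # the components sum to zero
print(V1)
```

Dropping the last component of `v` and the last row and column of `V` gives $v_j^*$ and $V_j^*$. Summing these over all $k$ tables gives $v$ and $V$.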
- Just prior to $t_2 = 143$: $n_2 = 39$, $d_2 = 1$, $d_{12} = 1$, $n_{12} = 19$, $d_{22} = 0$, $n_{22} = 20$

  Group      failures   survivors   Subtotal
  1          1          18          19
  2          0          20          20
  Subtotal   1          38          39

- Therefore, $w_{12} = n_{12} d_2 / n_2 = 19 \times 1/39 = 0.487$ and $w_{22} = n_{22} d_2 / n_2 = 20 \times 1/39 = 0.513$, so
$$v_2 = (1 - 0.487,\ 0 - 0.513) = (0.513,\ -0.513)$$
- $\cdots$
- The log-rank statistic (for group 1) is 4.762 and its variance is 7.263. The $\chi^2$ statistic is therefore
$$\frac{4.762 \times 4.762}{7.263} = 3.12,$$
with p-value $= \Pr(\chi^2_1 > 3.12) = 0.0773$

  t_j   d_j   n_j   d_1j   n_1j   d_2j   n_2j
  142   1     40    0      19     1      21
  143   1     39    1      19     0      20
  156   1     38    0      18     1      20
  162   1     37    0      18     1      19
  164   1     36    1      18     0      18
  188   2     35    2      17     0      18
  190   1     33    1      15     0      18
  192   1     32    1      14     0      18
  198   1     31    0      13     1      18
  205   1     29    0      13     1      16
  206   1     28    1      13     0      15
  209   1     27    1      12     0      15
  213   1     26    1      11     0      15
  216   1     25    1      10     0      15
  220   1     23    1      8      0      15
  227   1     22    1      7      0      15
  230   1     21    1      6      0      15
  232   2     20    0      5      2      15
  233   4     18    0      5      4      13
  234   1     14    1      5      0      9
  239   1     13    0      4      1      9
  240   1     12    0      4      1      8
  246   1     10    1      3      0      7
  261   1     9     0      2      1      7
  265   1     8     1      2      0      6
  280   2     7     0      1      2      6
  296   2     5     0      1      2      4
  304   1     3     1      1      0      2
  323   1     2     0      0      1      2

3.12 Wilcoxon Test
- In the log-rank test, all contingency tables are weighted equally
- One may argue that tables with larger risk sets carry more information and should therefore receive more weight
- The proposal is to use
$$\sum_{j=1}^{k} n_j v_j^*$$
- The corresponding variance matrix is
$$V = \sum_{j=1}^{k} n_j^2 V_j^*$$
- Back to the example: the Wilcoxon statistic is 114 and its variance is 4902.22291. The $\chi^2_1$ statistic is therefore $114 \times 114 / 4902.22291 = 2.651$, with p-value $= \Pr(\chi^2_1 > 2.651) = 0.1035$
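The full two-group calculation, for both the log-rank and the Wilcoxon weights, can be reproduced from the raw Pike (1966) data. This Python sketch is illustrative, and the function name `two_sample_test` is hypothetical. It handles only $r = 2$, where $v$ and $V$ are scalars; more groups would need the matrix form above. It scans the pooled failure times in order. At each one it counts the risk set and failures as in the table above, keeping a subject censored at a failure time at risk. It then accumulates the weighted $d_{1j} - w_{1j}$ and its variance, using weight 1 for the log-rank test and $n_j$ for the Wilcoxon test.

```python
import math

def two_sample_test(times, events, groups, weight="logrank"):
    """Weighted two-sample test statistic for group 1 (groups labelled 1 and 2).

    weight="logrank"  : weight 1 for every contingency table
    weight="wilcoxon" : weight n_j (total number at risk just prior to t_j)
    Returns (U, var_U, chi2, p_value), chi2 referred to chi-square on 1 df.
    """
    if weight not in ("logrank", "wilcoxon"):
        raise ValueError("weight must be 'logrank' or 'wilcoxon'")
    fail_times = sorted({t for t, e in zip(times, events) if e == 1})
    u = var_u = 0.0
    for t in fail_times:
        n = n1 = d = d1 = 0
        for s, e, g in zip(times, events, groups):
            if s >= t:                    # at risk just prior to t
                n += 1
                if g == 1:
                    n1 += 1
                if s == t and e == 1:     # failure at t
                    d += 1
                    if g == 1:
                        d1 += 1
        wt = 1.0 if weight == "logrank" else float(n)
        u += wt * (d1 - n1 * d / n)       # weighted d_1j - w_1j
        if n > 1:                         # (V_j)_11 = n1 (n - n1) d (n - d) / (n^2 (n - 1))
            var_u += wt * wt * n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    chi2 = u * u / var_u
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return u, var_u, chi2, p_value

# Pike (1966) rat data: failures and censored (*) times for each group.
g1_fail = [143, 164, 188, 188, 190, 192, 206, 209, 213, 216, 220, 227, 230,
           234, 246, 265, 304]
g1_cens = [216, 244]
g2_fail = [142, 156, 162, 198, 205, 232, 232, 233, 233, 233, 233, 239, 240,
           261, 280, 280, 296, 296, 323]
g2_cens = [204, 344]

times = g1_fail + g1_cens + g2_fail + g2_cens
events = ([1] * len(g1_fail) + [0] * len(g1_cens)
          + [1] * len(g2_fail) + [0] * len(g2_cens))
groups = [1] * (len(g1_fail) + len(g1_cens)) + [2] * (len(g2_fail) + len(g2_cens))

print(two_sample_test(times, events, groups, "logrank"))
print(two_sample_test(times, events, groups, "wilcoxon"))
```

With `"logrank"`, this should give a statistic of about 4.762, a variance of about 7.263 and $\chi^2 \approx 3.12$ (p about 0.077). With `"wilcoxon"`, it should give 114 and 4902.22291, matching the numbers in the example. The only difference between the two tests is the per-table weight.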
© Copyright 2024