Study Design I. Sample Size Consideration

Tuan V. Nguyen
Garvan Institute of Medical Research
Sydney, Australia

The classical hypothesis testing

• Define a null hypothesis:
  – Ho: Xt = Xc
• Define an alternative hypothesis:
  – Ha: Xt > Xc, Xt < Xc, or Xt not equal to Xc.
• Perform a test of significance on the null hypothesis.
  – Assume that the null hypothesis is true.
  – Determine the probability of obtaining the observations found in the data.
• Accept or reject Ho
  – If Ho is rejected, the alternative hypothesis is accepted.
  – But there are many alternative hypotheses!

Diagnosis and statistical reasoning

Diagnosis:
                 Disease status
  Test result    Present                   Absent
  +ve            True +ve (sensitivity)    False +ve
  -ve            False -ve                 True -ve (specificity)

Significance test:
                 Difference is
  Test result    Present (Ho not true)     Absent (Ho is true)
  Reject Ho      No error (1−β)            Type I error (α)
  Accept Ho      Type II error (β)         No error (1−α)

α: significance level
1−β: power

Study Design Issues

• Setting
• Participants: inclusion / exclusion criteria
• Design: cross-sectional, longitudinal
• Measurements: outcome, covariates / risk factors
• Analysis
• Sample size / power issues

Sample size issues

• How many observations / subjects?
  – Practical and statistical issues
  – Ethical issues
• Ethical issues in clinical studies
  – An unnecessarily large number of patients may be deemed unethical.
  – Too small a sample may also be unethical, as the study can't show anything.

Large difference vs Statistical significance

  Status          Group A   Group B
  Improved            9        18
  Not improved       21        12
  Total              30        30
  Chi-square: 5.4; P < 0.05. “Statistically significant”

  Status          Group A   Group B
  Improved            6        12
  Not improved       14         8
  Total              20        20
  Chi-square: 3.3; P > 0.05. “Statistically insignificant”

Effect of sample size: a simulation

                  True mean 100, SD 15      True mean 100, SD 35
  Sample size     Est. mean    Est. SD      Est. mean    Est. SD
  10                 98.0        11.0         108.9        32.2
  50                100.4        13.6          95.3        41.4
  100               101.3        14.4          99.1        35.5
  200                99.9        15.2         100.3        33.2
  500                99.8        15.3          98.9        33.8
  1000               99.5        15.1          99.9        35.0
  2000               99.7        15.0          99.9        34.7
  10000             100.1        15.0          99.9        35.0
  100000            100.0        15.0         100.0        35.0

Specification for sample size determination

• Parameter of major interest
• Magnitude of difference in the parameter
• Variability of the parameter
• Bound of errors (type I and type II error rates)

Parameter of Interest

• Type of measurement of primary interest:
  – Continuous or categorical outcome
• Examples:
  – Mortality: proportion (or probability) of death/survival
  – Blood pressure: difference in BP in mmHg
  – Quality of life: change in QoL scores

Variability of the Parameter of Interest

• If the parameter is a continuous variable:
  – What is the standard deviation (SD)?
• If the parameter is a categorical variable:
  – SD can be estimated from the proportion/probability.

Magnitude of Difference of Interest

• Distinction between clinical and statistical relevance.
• Change from baseline or difference between groups.
• Examples:
  – Probability of survival: 85% vs 80%
  – Blood pressure: difference between groups by 1 SD.
  – Quality of life: difference in the change in QoL between groups by 5%.

[Figure: standard Normal curves. Two-sided (Z2): area 0.95 between −1.96 and 1.96, with 0.025 in each tail. One-sided (Z1): area 0.95 below 1.64, with 0.05 in the upper tail.]

  Prob.              0.80   0.90   0.95   0.99
  Z1 (one-sided)     0.84   1.28   1.64   2.33
  Z2 (two-sided)     1.28   1.64   1.96   2.81

  Alpha              0.20   0.10   0.05   0.01
  Zα (one-sided)     0.84   1.28   1.64   2.33
  Zα/2 (two-sided)   1.28   1.64   1.96   2.81

  Power              0.80   0.90   0.95   0.99
  Z1−β               0.84   1.28   1.64   2.33

Note: the exact two-sided deviate for α = 0.01 is 2.58. The tabulated 2.81 corresponds to a two-sided α of about 0.005, and it is the value used in the two-proportion example and table later in these slides.

• The serum cholesterol levels of Californian children have a mean of 175 mg/100ml and a standard deviation of 30 mg/100ml. The distribution of the cholesterol levels is normal.
• 95% of the children should have cholesterol levels between 175 − (1.96 × 30) = 116 and 175 + (1.96 × 30) = 234 mg/100ml.
• If we let X be the cholesterol level for any child, then X can be converted to a variable with mean = 0 and SD = 1:
  $$ Z = \frac{X - 175}{30} $$

[Figure: Normal curve of cholesterol level (116, 175, 234 mg/100ml) with the matching Z scale (−1.96, 0, 1.96); both tails are marked “Abnormal?”]

Study design and Outcome

• Single population
• Two populations
• Continuous measurement
• Categorical outcome
• Correlation

Single Population

Sample size for estimating a population mean

• How close to the true mean
• Confidence around the sample mean
• Type I error.
• $$ N = \frac{(Z_{\alpha/2})^2 \sigma^2}{d^2} $$
  σ: standard deviation
  d: the accuracy of the estimate (how close to the true mean)
  Zα/2: a Normal deviate that reflects the type I error.
• Example: we want to estimate the average weight in a population, and we want the error of estimation to be less than 2 kg of the true mean, with a probability of 95% (i.e., an error rate of 5%).
• $$ N = \frac{(1.96)^2 \sigma^2}{2^2} $$

  Std Dev (σ)    10    12    14    16    18    20
  Sample size    96   138   188   246   311   384

[Figure: sample size (0 to 450) plotted against standard deviation (0 to 25).]

Sample size for estimating a population proportion

• How close to the true proportion
• Confidence around the sample proportion.
• Type I error.
• $$ N = \frac{(Z_{\alpha/2})^2 \, p(1-p)}{d^2} $$
  p: proportion to be estimated.
  d: the accuracy of the estimate (how close to the true proportion).
  Zα/2: a Normal deviate that reflects the type I error.
• Example: The prevalence of osteoporosis in the general population is around 30%. We want to estimate the prevalence p in a community to within 2 percentage points with 95% confidence.
• $$ N = \frac{(1.96)^2 (0.3)(0.7)}{0.02^2} = 2017 \text{ subjects} $$

[Figure: sample size (0 to 2500) plotted against an x-axis running from 0 to 0.1, labelled “Standard deviation” in the original.]
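The two single-population formulas above can be sketched in Python as follows. This is a minimal illustration rather than part of the original slides: the function names are hypothetical, the Normal deviate comes from the standard library's `statistics.NormalDist`, and the results are left unrounded so that the caller decides how to round.

```python
from statistics import NormalDist


def z_two_sided(alpha: float) -> float:
    """Normal deviate Z_{alpha/2} for a two-sided error rate alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def n_for_mean(sigma: float, d: float, alpha: float = 0.05) -> float:
    """N = (Z_{alpha/2})^2 * sigma^2 / d^2, unrounded."""
    z = z_two_sided(alpha)
    return z ** 2 * sigma ** 2 / d ** 2


def n_for_proportion(p: float, d: float, alpha: float = 0.05) -> float:
    """N = (Z_{alpha/2})^2 * p * (1 - p) / d^2, unrounded."""
    z = z_two_sided(alpha)
    return z ** 2 * p * (1 - p) / d ** 2


if __name__ == "__main__":
    # Weight example: error of estimation within 2 kg, 95% confidence.
    for sigma in (10, 12, 14, 16, 18, 20):
        print(sigma, round(n_for_mean(sigma, d=2)))
    # Osteoporosis example: p about 0.30, within 0.02, 95% confidence.
    print(round(n_for_proportion(0.3, d=0.02)))
```

Rounding to the nearest integer matches the way the slides report their figures; rounding up with `math.ceil` would be the more cautious choice in practice.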
Sample size for estimating a correlation coefficient

• In observational studies that involve estimating a correlation (r) between two variables of interest, say X and Y, a typical hypothesis is of the form:
  – Ho: r = 0 vs H1: r not equal to 0.
• The test statistic is based on Fisher's z transformation, which can be written as:
  $$ t = \frac{1}{2} \log_e\!\left[\frac{1+r}{1-r}\right] \sqrt{n-3} $$
• Where n is the sample size and r is the observed correlation coefficient.
• It can be shown that, under Ho, t is normally distributed with mean 0 and unit variance, and the sample size needed to detect a statistically significant t can be derived as:
  $$ N = \frac{(Z_\alpha + Z_{1-\beta})^2}{\frac{1}{4}\left[\log_e\!\left(\frac{1+r}{1-r}\right)\right]^2} + 3 $$

Sample size for estimating r: example

• Example: According to the literature, the correlation between salt intake and systolic blood pressure is around 0.3. A study is conducted to test the correlation in a population, with a significance level of 1% (one-sided) and power of 90%. The sample size for such a study can be estimated as follows:
  $$ N = \frac{(2.33 + 1.28)^2}{\frac{1}{4}\left[\log_e\!\left(\frac{1+0.3}{1-0.3}\right)\right]^2} + 3 = \frac{13.03}{0.0958} + 3 \approx 139 $$
• A sample size of at least 139 subjects is required for the study.

Sample size for difference between two means

• Hypotheses: Ho: µ1 = µ2 vs. Ha: µ1 = µ2 + d
• Let n1 and n2 be the sample sizes for group 1 and 2, respectively; N = n1 + n2; r = n1 / n2; σ: standard deviation of the variable of interest.
• Then, the total sample size is given by:
  $$ N = \frac{(r+1)^2 \left(Z_\alpha + Z_{1-\beta}\right)^2 \sigma^2}{r\, d^2} $$
  Where Zα and Z1−β are Normal deviates.
• If we let Z = d / σ be the “effect size”, then:
  $$ N = \frac{(r+1)^2 \left(Z_\alpha + Z_{1-\beta}\right)^2}{r\, Z^2} $$
• If n1 = n2 (r = 1), power = 0.90 and alpha = 0.05 (two-sided), then (Zα + Z1−β)² = (1.96 + 1.28)² = 10.5, and the equation reduces to:
  $$ N = \frac{42}{Z^2} $$
  that is, 21 / Z² subjects in each group.

Two Populations

Sample size for two means vs. “effect size”

[Figure: total sample size (N), 0 to 2400, plotted against effect size (d / s), 0 to 2. Caption: for a power of 80%, significance level of 5%.]

Sample size for difference between 2 proportions

• Hypotheses: Ho: π1 = π2 vs. Ha: π1 = π2 + d.
• Let p1 and p2 be the sample proportions (i.e. estimates of π1 and π2) for group 1 and group 2. Then, the sample size to test the hypothesis is:
  $$ n = \frac{\left[ Z_\alpha \sqrt{2p(1-p)} + Z_{1-\beta} \sqrt{p_1(1-p_1) + p_2(1-p_2)} \right]^2}{(p_1 - p_2)^2} $$
  Where: n = sample size for each group; p = (p1 + p2) / 2; Zα and Z1−β are Normal deviates.
• A better (more conservative) suggestion for sample size is:
  $$ n_a = \frac{n}{4}\left[ 1 + \sqrt{1 + \frac{4}{n\,|p_1 - p_2|}} \right]^2 $$

Sample size for difference between 2 prevalences

• For most diseases, the prevalence in the general population is small (e.g. 1 per 1000 subjects). Therefore, a different formulation is required.
• Let p1 and p2 be the prevalence for population 1 and population 2. Then, the sample size to test the hypothesis is:
  $$ n = \frac{(Z_\alpha + Z_{1-\beta})^2}{0.00061 \left( \arcsin\sqrt{p_1} - \arcsin\sqrt{p_2} \right)^2} $$
  Where: n = sample size for each group; Zα and Z1−β are Normal deviates; the arcsine is expressed in degrees (0.00061 = 2(π/180)²).

Sample size for two proportions: example

• Example: In a condition, the remission rate is expected to be 70% for a new treatment and 60% for a conventional treatment. A trial is planned to show the difference at a significance level of 1% and power of 90%.
• The sample size can be calculated as follows:
  – p1 = 0.6; p2 = 0.7; p = (0.6 + 0.7)/2 = 0.65; Zα = 2.81 (the two-sided value tabulated in these slides for the 1% level); Z1−β = 1.28.
  – The sample size required for each group should be:
  $$ n = \frac{\left( 2.81 \sqrt{2 \times 0.65 \times 0.35} + 1.28 \sqrt{0.6 \times 0.4 + 0.7 \times 0.3} \right)^2}{(0.6 - 0.7)^2} \approx 759 $$
• Adjusted / conservative sample size is:
  $$ n_a = \frac{759}{4}\left[ 1 + \sqrt{1 + \frac{4}{759 \times |0.6 - 0.7|}} \right]^2 \approx 779 $$

Sample size for two proportions vs. effect size

          Difference from p1 by:
  P1      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8
  0.1     424    131     67     41     28     19     14     10
  0.2     625    173     82     47     30     20     14      9
  0.3     759    198     89     50     30     19     13      .
  0.4     825    206     89     47     28     17      .      .
  0.5     825    198     82     41     22      .      .      .
  0.6     759    173     67     31      .      .      .      .
  0.7     625    131     45      .      .      .      .      .
  0.8     424     73      .      .      .      .      .      .

Note: these values are “unadjusted” sample sizes per group; “.” marks combinations where p1 plus the difference exceeds 1.

Sample size for estimating an odds ratio

• In a case-control study the data are usually summarized by an odds ratio (OR), rather than a difference between two proportions.
• If p1 and p2 are the proportions of cases and controls, respectively, exposed to a risk factor, then:
  $$ OR = \frac{p_1 (1 - p_2)}{p_2 (1 - p_1)} $$
• If we know the proportion of exposure in the general population (p), the total sample size N for estimating an OR is:
  $$ N = \frac{(1 + r)^2 (Z_\alpha + Z_{1-\beta})^2}{r \, (\ln OR)^2 \, p(1 - p)} $$
• Where r = n1 / n2 is the ratio of sample sizes for group 1 and group 2; p is the prevalence of exposure in the controls; and OR is the hypothetical odds ratio. If n1 = n2 (so that r = 1) then the formula reduces to:
  $$ N = \frac{4 (Z_\alpha + Z_{1-\beta})^2}{(\ln OR)^2 \, p(1 - p)} $$

Sample size for an odds ratio: example

• Example: The prevalence of vertebral fracture in a population is 25%. We are interested in estimating the effect of smoking on fracture, with an odds ratio of 2, at a significance level of 5% (one-sided test) and power of 80%.
• The total sample size for the study can be estimated by:
  $$ N = \frac{4 (1.64 + 0.85)^2}{(\ln 2)^2 \times 0.25 \times 0.75} = 275 $$

Sample size for 2 correlation coefficients

• To detect a relevant difference between two correlation coefficients r1 and r2 obtained from two independent samples of sizes n1 and n2, respectively, we first need to transform these coefficients into z values as follows:
  $$ z_1 = 0.5 \log_e\!\left( \frac{1 + r_1}{1 - r_1} \right), \qquad z_2 = 0.5 \log_e\!\left( \frac{1 + r_2}{1 - r_2} \right) $$
• The total sample size N required to detect the difference between two correlation coefficients r1 and r2, with a significance level of α and power 1−β, can be estimated by:
  $$ N = \frac{4 (Z_\alpha + Z_{1-\beta})^2}{(z_1 - z_2)^2} $$
  Where Zα and Z1−β are Normal deviates.

Sample size for two r's: example

• The sample size required to detect the difference between r1 = 0.8 and r2 = 0.4 with a significance level of 5% (two-tailed) and power of 90% can be solved as follows:
  – z1 = 0.5 ln((1 + 0.8) / (1 − 0.8)) = 1.098
  – z2 = 0.5 ln((1 + 0.4) / (1 − 0.4)) = 0.424
  $$ N = \frac{4 (1.96 + 1.28)^2}{(1.098 - 0.424)^2} = 92 $$
• 46 subjects are needed in each group.

Some Comments

• The formulae presented are theoretical.
• They are all based on the assumption of a Normal distribution.
• The estimator [of sample size] has its own variability.
• The calculated sample size is only an approximation.
• Non-response must be allowed for in the calculation.

Computer Programs

• Software program for sample size and power evaluation
  – PS (Power and Sample size), from Vanderbilt Medical Center. Free; available from the author on request by email.
• On-line calculator:
  – http://ebook.stat.ucla.edu/calculators/powercalc/
• References:
  – Florey CD. Sample size for beginners. BMJ 1993 May 1;306(6886):1181-4.
  – Day SJ, Graham DF. Sample size and power for comparing two or more treatment groups in clinical trials. BMJ 1989 Sep 9;299(6700):663-5.
  – Miller DK, Homan SM. Graphical aid for determining power of clinical trials involving two groups. BMJ 1988 Sep 10;297(6649):672-6.
  – Campbell MJ, Julious SA, Altman DG. Estimating sample sizes for binary, ordered categorical, and continuous outcomes in two group comparisons. BMJ 1995 Oct 28;311(7013):1145-8.
  – Sahai H, Khurshid A. Formulae and tables for the determination of sample sizes and power in clinical trials for testing differences in proportions for the two-sample design: a review. Stat Med 1996 Jan 15;15(1):1-21.
  – Kieser M, Hauschke D. Approximate sample sizes for testing hypotheses about the ratio and difference of two means. J Biopharm Stat 1999 Nov;9(4):641-50.
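
As a companion to the software listed above, the correlation and two-group formulas from these slides can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the PS program or any other tool named here: all function names are hypothetical, the Normal deviates are passed in explicitly so that any tabulated value can be used, and results are returned unrounded. The two-means function is assumed to return the total N = n1 + n2, which is (r + 1) times the size of group 2, and the single-correlation function squares the Fisher z term, as the derivation from the test statistic requires.

```python
from math import atanh, log, sqrt

# atanh(r) equals 0.5 * ln((1 + r) / (1 - r)), i.e. Fisher's z transformation.


def n_correlation(r, z_alpha, z_beta):
    """One correlation tested against zero: (Za + Zb)^2 / z(r)^2 + 3."""
    return (z_alpha + z_beta) ** 2 / atanh(r) ** 2 + 3


def n_two_means_total(effect_size, z_alpha, z_beta, r=1.0):
    """Total N for two means; effect_size = d / sigma, r = n1 / n2."""
    return (r + 1) ** 2 * (z_alpha + z_beta) ** 2 / (r * effect_size ** 2)


def n_two_proportions(p1, p2, z_alpha, z_beta):
    """Per-group n for comparing two proportions (unadjusted)."""
    p = (p1 + p2) / 2
    num = (z_alpha * sqrt(2 * p * (1 - p))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return num ** 2 / (p1 - p2) ** 2


def n_adjusted(n, p1, p2):
    """More conservative per-group n derived from the unadjusted n."""
    return n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2


def n_odds_ratio_total(odds_ratio, p, z_alpha, z_beta, r=1.0):
    """Total N for an odds ratio; p = prevalence of exposure in controls."""
    return ((1 + r) ** 2 * (z_alpha + z_beta) ** 2
            / (r * log(odds_ratio) ** 2 * p * (1 - p)))


def n_two_correlations_total(r1, r2, z_alpha, z_beta):
    """Total N for comparing two independent correlation coefficients."""
    return 4 * (z_alpha + z_beta) ** 2 / (atanh(r1) - atanh(r2)) ** 2


if __name__ == "__main__":
    # Inputs taken from the worked examples in the slides.
    print(n_correlation(0.3, 2.33, 1.28))
    print(n_two_proportions(0.6, 0.7, 2.81, 1.28))
    print(n_adjusted(759, 0.6, 0.7))
    print(n_odds_ratio_total(2, 0.25, 1.64, 0.85))
    print(n_two_correlations_total(0.8, 0.4, 1.96, 1.28))
```

Passing the deviates as arguments, rather than deriving them from α and power inside each function, keeps the sketch neutral about one-sided versus two-sided tests and about which tabulated values are used.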