Estimating Sample Size
Computer Laboratories
Epidemiology and Biostatistics Department
Faculty of Medicine, Universitas Padjadjaran
2013

Reference
• Dahlan MS. Besar Sampel dan Cara Pengambilan Sampel dalam Penelitian Kedokteran dan Kesehatan. Edisi 3. Jakarta: Salemba Medika; 2008.
• Hulley SB, et al. Designing Clinical Research. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2007.

Introduction
Whom? What? Design?
↓
How many subjects to sample?
• If the sample size is too small, the study may fail to answer its research question
• If the sample size is too large, the study is more difficult and costly than necessary
• Goal: to estimate an appropriate number of subjects for a given study design
• Sample size should be estimated early in the design phase, when major changes are still possible, for example when:
– The research design is not feasible
– Different predictor or outcome variables are needed

Reasons for sampling
• Total sampling cannot be performed
• Results from a representative sample (appropriate number of subjects and sampling technique) can be generalized to the population
• Sampling is more efficient and ethical

Generalization
Study subjects → (internal validity) → Intended sample → (external validity I) → Accessible population → (external validity II) → Study/Target population

Internal validity
• The actual sample (study subjects) is representative of the intended sample
– Same characteristics as the intended sample
– Problems: non-response, drop-out, loss to follow-up

External validity I
• The intended sample is representative of the accessible population
– Appropriate sample size
– Probabilistic sampling method

External validity II
• The accessible population is representative of the target/study population

How to get appropriate sample size?
• Appropriate sample size formula
– Can be chosen from the research question/research problem/problem identification
• Correct sample size calculation

Type of research: Specific design
• Diagnostic
– Sensitivity, specificity, PPV, NPV, LR (+), LR (-)
• Prognostic
– Example: What are the prognostic factors of shock in DHF patients?
• Survival analysis
– Example: Is there a difference in mortality rate between HIV patients who start HAART below versus above a CD4 count threshold of 200?

Type of research: Non-specific design
• Descriptive
– To estimate a population proportion
  • What is the prevalence of diarrhea in Kecamatan X?
– To estimate a population mean
  • What is the mean FBG level among adults in Kecamatan X?
• Analytic
– To find a relationship/association between dependent and independent variables
– To find a difference (in proportion or mean) between two or more groups
– To find a correlation between variables

Notes
• One study may use more than one sample size formula because of:
– More than one research question
– Different study designs, e.g. cohort and nested case-control
• State the primary and secondary research questions/hypotheses in advance
• The sample size calculation always focuses on the primary research question/hypothesis

Power of the study (1 – β)
• Results may differ from what was planned
• Power needs to be recalculated when:
– The actual sample (study subjects) differs from the intended sample
– r in a correlation study is different
– The effect size (p1 – p2, x1 – x2) is different
– The sample size is predetermined

Zα and Zβ*
α or β    Zα (descriptive or two-sided)    Zα (one-sided)    Zβ
1%        2.58                             2.33              2.33
5%        1.96                             1.64              1.64
10%       1.64                             1.28              1.28
15%       1.44                             1.04              1.04
20%       1.28                             0.84              0.84
*Adapted from Dahlan, MS, 2008
For a two-tailed hypothesis use Z1–α/2; for a one-tailed hypothesis use Z1–α

Strategies for minimizing sample size and maximizing power
• Use a continuous variable (for the outcome variable)
– Permits a smaller sample size for a given power
– Permits greater power for a given sample size
• Use paired measurements or matching
– Comparing each subject with herself removes the baseline between-subject part of the variability of the outcome variable
– Example:
  • Change in weight on a diet has less variability than the final weight
  • Final weight is highly correlated with initial weight
• Increase the precision
– Standardizing the measurement methods
– Training and certifying observers
– Refining the instrument
– Automating the instrument
– Repeating the measurement
• Use unequal group sizes
– In general, the gain in power when the size of one group is increased to twice the size of the other is considerable
– Tripling or quadrupling one of the groups provides progressively smaller gains
– Example: in a case-control study, 1 case : 2 controls
• Use a more common outcome (with caution!)
– More frequent outcome
– Enroll subjects at greater risk of developing that outcome
– Extend the follow-up period
– Loosen the definition of what constitutes an outcome

Common Errors to Avoid
• Estimating sample size late in the design of the study (the most common error)
• A percentage or rate misinterpreted as a numeric variable
• No planning for dropouts or subjects with missing data
• Equal vs unequal sample sizes
• Planning a two-sided alternative hypothesis or statistical analysis (Z1–α/2) but using a one-sided value (Z1–α) during sample size determination

Literature vs Judgement*
Judgement
• Categorical variable
– Descriptive: probability of type I error = α; precision = d
– Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; p1 – p2
• Numerical variable
– Descriptive: probability of type I error = α; precision = d
– Analytic: probability of type I error = α (one/two-sided); probability of type II error = β; x1 – x2
Literature or pilot study
• Categorical variable
– Descriptive: proportion
– Analytic: proportion in control/non-exposed/standard group = P2
• Numerical variable
– Descriptive: standard deviation
– Analytic: combined standard deviation = S; correlation coefficient = r
*From Dahlan, MS, 2008

Case I
• Students have a variety of reasons for doing research while in medical school. As part of the Jatinangor program you are interested in reproductive health. The aim of your study is to know the prevalence of puberty (defined by menarche or wet dreams) among primary school children in Kecamatan Jatinangor. There is no previous study on the prevalence of puberty in that community.

Answer
a. The most appropriate study design: cross-sectional study
   Outcome variable: prevalence of puberty (history of menarche or wet dreams; yes/no, nominal)
   Predictor variable: -
b. The most appropriate statistical analysis for the study: descriptive statistics
c. The target population: all primary schools in Kecamatan Jatinangor
   The accessible population: primary schools in Kecamatan Jatinangor
   The study unit: students aged 7 – 12 years
d. The appropriate sampling technique for the study: stratified random sampling, cluster sampling
e. Using a 95% confidence interval (α = 0.05) and a precision of 10% (within 10% of the true value), the sample size needed is:
• For α = 0.05, Z0.975 = 1.96; with no previous study, p = q = 0.5; d = 0.10
• n = Z² × p × q / d² = (1.96² × 0.5 × 0.5) / 0.10² = 96.04, rounded up to 97
• Check that npq ≥ 5: 97 × 0.5 × 0.5 = 24.25 ≥ 5
• The researcher will need at least 97 students aged 7 – 12 years

Case II
• Suppose we wish to know the random blood glucose level (mg/dl) among medical students in Faculty of Medicine X

Answer
a. The most appropriate study design: cross-sectional study
   Outcome variable: random blood glucose level (numeric)
   Predictor variable: -
b. The most appropriate statistical analysis for the study: descriptive statistics
c. The target population: all medical students in Faculty of Medicine X
   The accessible population: all medical students in Faculty of Medicine X
   The study unit: medical student
d. The appropriate sampling technique for the study: simple random sampling, stratified random sampling
e. Aspects the researcher can determine from the beginning:
• d (precision)
   Aspects the researcher must obtain from the literature or a pilot study:
• s (standard deviation)
f. A pilot study of ten students provided the random blood glucose levels used to estimate s. Using α = 0.05 and a precision of 2.5 mg/dl, the estimated sample size is:
• For α = 0.05, Z0.975 = 1.96; d = 2.5 mg/dl; s = 13.47 mg/dl
• n = (Z × s / d)² = (1.96 × 13.47 / 2.5)² ≈ 111.52, rounded up to 112
• The researcher will need at least 112 medical students

Case III
• A medical student from the 2010 batch is preparing a study (for his minor thesis) on risk factors of diarrhea. The hypothesis is that babies who were exclusively breastfed (during the first six months of life) will be less dehydrated (mild to moderate vs severe) during diarrhea at 7 to 11 months of age.
The researcher wishes to conduct the study at Hasan Sadikin Hospital, Bandung, for the period January – December 2011.

Answer
a. The most appropriate study design: case-control or cross-sectional study
   Outcome variable: dehydration during diarrhea (mild to moderate or severe, nominal)
   Predictor variable: history of exclusive breastfeeding (yes or no, nominal)
b. The most appropriate statistical analysis for the study: chi-square test (assuming there are no confounding variables)
c. The target population: babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
   The accessible population: babies aged 7 to 11 months diagnosed with diarrhea and treated in the Pediatric Emergency Unit, Hasan Sadikin Hospital, Bandung, January – December 2011
   The study unit: medical record
d. The appropriate sampling technique for the study: simple random sampling

Aspects the researcher can determine from the beginning:
• α
• β
• p1 – p2
Aspects the researcher must obtain from the literature or a pilot study:
• p2 (depends on the study design)

Using α = 0.05, β = 0.2, and a difference in proportions considered clinically significant by the researcher of 0.2, the estimated sample size is as follows.
• For α = 0.05, Z0.95 = 1.64 (one-sided); for β = 0.2, Z0.8 = 0.84; p1 – p2 = 0.2
• Sample size per group: n1 = n2 = [Zα√(2pq) + Zβ√(p1q1 + p2q2)]² / (p1 – p2)²

Cross-sectional study
• p2 = 18/35 = 0.51
• p1 = 0.2 + p2 = 0.2 + 0.51 = 0.71
• q1 = 1 – p1 = 1 – 0.71 = 0.29
• q2 = 1 – p2 = 1 – 0.51 = 0.49
• p = (p1 + p2)/2 = (0.71 + 0.51)/2 = 0.61
• q = 1 – p = 1 – 0.61 = 0.39
• n1 = n2 = [1.64√(2 × 0.61 × 0.39) + 0.84√(0.71 × 0.29 + 0.51 × 0.49)]² / 0.2² ≈ 72.1, rounded up to 73

Case-control study
• p2 = 17/32 = 0.53
• p1 = 0.2 + p2 = 0.2 + 0.53 = 0.73
• q1 = 1 – p1 = 1 – 0.73 = 0.27
• q2 = 1 – p2 = 1 – 0.53 = 0.47
• p = (p1 + p2)/2 = (0.73 + 0.53)/2 = 0.63
• q = 1 – p = 1 – 0.63 = 0.37
• n1 = n2 = [1.64√(2 × 0.63 × 0.37) + 0.84√(0.73 × 0.27 + 0.53 × 0.47)]² / 0.2² ≈ 70.6, rounded up to 71

Result, cross-sectional study
• The researcher will need at least 73 exclusively breastfed babies and 73 non-exclusively breastfed babies diagnosed with diarrhea

Result, case-control study
• For the case group, the researcher will need at least 71 babies diagnosed with diarrhea plus severe dehydration
• For the control group, the researcher will need at least 71 babies diagnosed with diarrhea plus mild to moderate dehydration

Case IV
• The researcher wishes to compare fasting blood glucose level (mg/dl) between medical students of Faculty of Medicine X with and without a family history of DM type II. The subjects were matched according to age and sex.

Answer
a. The most appropriate study design: cross-sectional study
   Outcome variable: fasting blood glucose level (numeric)
   Predictor variable: family history of DM type II (yes or no, nominal)
b. The most appropriate statistical analysis for the study: paired t-test, with the Wilcoxon signed-rank test as an alternative
c. The target population: all medical students in Faculty of Medicine X
   The accessible population: all medical students in Faculty of Medicine X
   The study unit: medical student
d. The appropriate sampling technique for the study:
Matching technique

Aspects the researcher can determine from the beginning:
• α
• β
• x1 – x2
Aspects the researcher must obtain from the literature or a pilot study:
• S (combined standard deviation from two observations)

Based on a pilot study of six matched pairs of students with and without a family history of DM type II, and using α = 0.05, β = 0.2, and a difference in means considered clinically significant by the researcher of 2.5 mg/dl, the estimated sample size is:
• For α = 0.05, Z0.975 = 1.96 (two-sided); for β = 0.2, Z0.8 = 0.84
• x1 – x2 = 2.5 mg/dl; s1 = 4.88 mg/dl, n1 = 6; s2 = 3.74 mg/dl, n2 = 6
• Combined standard deviation: S = √{[s1²(n1 – 1) + s2²(n2 – 1)] / (n1 + n2 – 2)} = √18.90 ≈ 4.35 mg/dl
• n = [(Zα + Zβ) × S / (x1 – x2)]² = [(1.96 + 0.84) × 4.35 / 2.5]² ≈ 23.7, rounded up to 24
• The researcher will need at least 24 medical students with a family history of DM type II and 24 medical students without a family history of DM type II (matched according to age and sex)

Case V
• The investigator wants to conduct a cross-sectional study to know whether DM has a negative effect on the treatment outcome of TB. Data will be collected from a hospital. The register shows that 50 people meet the inclusion criteria of this study. In a previous study, after 6 months of therapy, 9.6% of cultured sputum specimens from non-diabetic patients were still positive for Mycobacterium tuberculosis (RR = 2.65).

Answer
a. Outcome variable: response to treatment (yes/no, nominal)
   Predictor variable: random blood glucose level (numeric)
b. The most appropriate statistical analysis for the study: chi-square test
c. The target population: all TB patients with DM in Hospital X
   The accessible population: adult TB patients aged 20 to 65 years diagnosed with DM and treated in Hospital X
   The study unit: medical record
d. The appropriate sampling technique for the study: simple random sampling
• What is the power of the study if all eligible patients are included (total sampling), using α = 0.05? Substitute the available sample size into the sample size formula and solve for Zβ.

Case VI
• Suppose the researcher hypothesizes that serum 25(OH)-vitamin D level (ng/ml) is positively correlated with bone mineral density, estimated using the quantitative ultrasound index (QUI), among postmenopausal women in Kecamatan Jatinangor

Answer
a. The most appropriate study design: case-control or cross-sectional study
   Variables: serum 25(OH)-vitamin D level (numeric); quantitative ultrasound index (numeric)
b. The most appropriate statistical analysis for the study: correlation methods (Pearson's or Spearman's rho correlation coefficient)
c. The target population: postmenopausal women in Kecamatan Jatinangor
   The accessible population: women who come to Posbindu Lansia in all villages
   The study unit: postmenopausal woman
d. The appropriate sampling technique for the study: consecutive sampling

Aspects the researcher can determine from the beginning:
• α
• β
Aspects the researcher must obtain from the literature or a pilot study:
• r (Pearson's correlation coefficient)

Based on a pilot study with 10 participants:
• For α = 0.05, Z0.95 = 1.64 (one-sided); for β = 0.2, Z0.8 = 0.84
• r = 0.78 (computed using SPSS or Excel)
• n = {(Zα + Zβ) / [0.5 × ln((1 + r)/(1 – r))]}² + 3 = [(1.64 + 0.84) / 1.045]² + 3 ≈ 8.6, rounded up to 9
• The researcher will need at least 9 postmenopausal women

Review
• Study design
– Non-specific or specific?
– Observational (cross-sectional, case-control, cohort) or experimental?
• Variables
– Predictor/independent and outcome/dependent
– Scale of measurement
  • Categorical (nominal or ordinal)
  • Numerical
• Paired vs unpaired observation
• Hypothesis
– Type I and type II error (α, β)
– Power of the study (1 – β)
– One or two-sided alternative hypothesis
• Statistical analysis
• Sampling technique
– Probabilistic sampling technique
– Non-probabilistic sampling technique
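
Worked check of the case results
The sample sizes quoted in Cases I, II, III, IV and VI can be reproduced with a short script. The following is a minimal Python sketch, assuming the usual normal-approximation sample size formulas for each design; the formulas themselves are not spelled out in the slide text, so they are an assumption here, chosen because they reproduce the quoted results. The function names are hypothetical and serve only as illustration. The inputs are the rounded values shown in the cases.

```python
import math


def n_single_proportion(z_alpha, p, d):
    """Descriptive study, categorical outcome: n = Z^2 * p * q / d^2."""
    return z_alpha ** 2 * p * (1 - p) / d ** 2


def n_single_mean(z_alpha, s, d):
    """Descriptive study, numerical outcome: n = (Z * s / d)^2."""
    return (z_alpha * s / d) ** 2


def n_two_proportions(z_alpha, z_beta, p1, p2):
    """Unpaired comparison of two proportions; returns n per group."""
    p = (p1 + p2) / 2  # average of the two proportions
    q = 1 - p
    numerator = (z_alpha * math.sqrt(2 * p * q)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return (numerator / (p1 - p2)) ** 2


def combined_sd(s1, n1, s2, n2):
    """Pooled (combined) standard deviation of two groups."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def n_paired_means(z_alpha, z_beta, s, diff):
    """Paired comparison of two means; returns the number of pairs."""
    return ((z_alpha + z_beta) * s / diff) ** 2


def n_correlation(z_alpha, z_beta, r):
    """Correlation study, based on Fisher's z transformation of r."""
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return ((z_alpha + z_beta) / fisher_z) ** 2 + 3


# Inputs are the rounded values given in the cases; results are rounded up.
sample_sizes = {
    "Case I": math.ceil(n_single_proportion(1.96, 0.5, 0.10)),
    "Case II": math.ceil(n_single_mean(1.96, 13.47, 2.5)),
    "Case III (cross-sectional)": math.ceil(
        n_two_proportions(1.64, 0.84, 0.71, 0.51)),
    "Case III (case-control)": math.ceil(
        n_two_proportions(1.64, 0.84, 0.73, 0.53)),
    "Case IV": math.ceil(
        n_paired_means(1.96, 0.84, combined_sd(4.88, 6, 3.74, 6), 2.5)),
    "Case VI": math.ceil(n_correlation(1.64, 0.84, 0.78)),
}

for case, n in sample_sizes.items():
    print(f"{case}: n = {n}")
```

Every result is rounded up, because a fractional subject cannot be recruited. Carrying the pilot proportions at full precision (for example 18/35 instead of 0.51) can shift the rounded result slightly, so the script deliberately uses the two-decimal values shown in the cases.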