Statistical Thinking and Smart Experimental Design Luc Wouters November 2014 Statistical Thinking and Smart Experimental Design Luc Wouters November 2014 Statistical Thinking and Reasoning in Experimental Research 9h00 – 9h30: 11h00 – 11h15: 12h30 -13h15: 15h00 – 15h15: 16h45 – 17h15: 3 Welcome - Coffee Comments & Question - Coffee break Lunch Comments & Questions – Coffee Break Summary – Concluding Remarks – Comments & Questions Statistical Thinking and Reasoning in Experimental Research ¾ ¾ ¾ ¾ ¾ ¾ ¾ ¾ ¾ ¾ 4 Introduction Smart Research Design by Statistical Thinking Planning the Experiment Principles of Experimental Design Common Designs in Biosciences The Required Number of Replicates - Sample Size The Statistical Analysis y The Study Protocol Interpretation & Reporting Concluding Remarks Statistical Thinking and Smart Experimental Design I. Introduction 5 There is a problem with research in the biosciences - Replicability 6 ¾ Scholl, et al. (Cell, 2009): “Synthetic lethal interaction between oncogenic KRAS dependency p y and STK33 suppression pp in human cancer cells.” ¾ Conclusion: cancer tumors could be destroyed by targeting STK33 protein ¾ Amgen Inc.: 24 researchers, 6 months labwork ¾ Unable to replicate p results of Scholl,, et al. There is a problem with research in the bi biosciences i - Acceptance A t 7 There is a problem with research in the bi biosciences i - Acceptance A t ¾ Séralini, et al. (Food Chem Toxicol, 2012): “Long term toxicity of a roundup herbicide and a rounduptolerant genetically modified maize” ¾ Conclusion: GM maize and low concentrations of Roundup herbicide causes toxic effects (tumors) in rats ¾ Press conference e.g. Natural News: Shock findings in new GMO study: Rats fed lifetime of GM corn grow horrifying tumors, 70% of females die early ¾ Severe impact on general public and on interest of industry y Referendum labeling of GM food in California, y Bans on importation of GMOs in Russia and Kenya 8 There is a problem with research in the bi biosciences i - Acceptance A t The critiques: ¾ Heavily criticized (VIB) ¾ European p Food Safety y Authority y ((2012)) review: • inadequate design, analysis and reporting • wrong strain of rats • number of animals too small ¾ Journal had paper withdrawn late 2013 ¾ Republished 2014 in journal with less impact (Environmental Sciences Europe) 9 There is a problem with research in the bi biosciences i - Efficiency Effi i 10 The problem of doing “good science” S Some critical iti l reviews i Ioannidis, 2005 Kilkenny, 2009 Loskalzo, 2012 11 Things get even worse R t ti rates Retraction t ¾ 21% of retractions were due to error ¾ 67% misconduct, including fraud or suspected t d fraud f d 12 The integrity of our profession is put into question ti 13 The integrity of our profession is put into question ti “The way we do our research (with our animals) is stone age” Ulrich Dirnagl, Dirnagl Charité University Medicine, Berlin Science 2013 14 What’s wrong with research in the bi biosciences? i ? 15 Co rse objectives Course objecti es ¾ Increase quality of research reports by: ¾ Statistical Thinking: t provide to id elements l t off a generic i methodology th d l tto design insightful experiments ¾ Statistical Reasoning: to provide guidance for interpreting and reporting results 16 Thi course prepares you ffor a dialogue This di l 17 Statistical thinking leads to highly productive d ti research h ¾ Set of statistical precepts (rules) accepted by his scientists ¾ Kept research to proceed in orderly and planned fashion ¾ Open mind for the unexpected t d ¾ World record: 77 approved medicines over a p period of 40 years ¾ 200 research papers/year 18 Statistical Thinking and Smart Experimental Design II. Smart Research Design g by y Statistical Thinking g 19 The architecture of experimental research The two types of studies 20 The architecture of experimental research R Research h is i a phased h d process 1. Definition proposal 2. Design protocol 3. Data Collection data set 4. Analysis conclusions 5. Reporting report Phase 21 Deliverable The architecture of experimental research R Research h is i an iterative it ti process 22 Modulating between th concrete the t and d th the abstract b t t 23 Research st styles les ma may vary… ar The ‘novelist’ D D C A R The ‘data salvager’ D D C A R The ‘lab freak’ D 24 D C A R The ‘smart researcher’ D D C A R • Definition • Design g • Data Collection • Analysis • Reporting The smart researcher ¾ time ti spentt planning l i and dd designing i i an experiment i t att the outset saves time and money in the long run ¾ optimizes the lab work and reduces the time spent in the lab ¾ the design of the experiment is most important and governs how h d data t are analyzed l d ¾ minimizes time spent at the data analysis stage ¾ can look l k ahead h d tto th the reporting ti phase h with ith peace off mind since early phases of study are carefully planned and formalized (proposal, protocol) 25 The design of insightful experiments is a process informed i f d by b statistical t ti ti l thi thinking ki 26 The Seven Principles of Statistical Thi ki Thinking 1 Time spent thinking on the conceptualization and design of an 1. experiment is time wisely spent 2. The design of an experiment reflects the contributions from different sources of variability 3. The design of an experiment balances between its internal validity and external validity 4. Good experimental practice provides the clue to bias minimization 5. Good experimental design is the clue to the control of variability 6 Experimental design integrates various disciplines 6. 7. A priori consideration of statistical power is an indispensable pillar of an effective experiment 27 Statistical Thinking and Smart Experimental Design III. Planning g the Experiment p 28 The planning process 29 The planning process 30 The planning process 31 Limit number n mber of research objecti objectives es Number of research objectives should be limited (≤ 3 objectives) Example Séralini study: ¾ 10 treatment groups, females and males = 20 groups ¾ Only O l 10 animals/group i l / ¾ 50 animals/group “golden standard” in toxicology 32 Importance of a auxiliary iliar h hypotheses potheses Reliance on auxiliary hypotheses is the rule rather than the exception in testing scientific hypotheses (Hempel 1966) (Hempel, Example Séralini study: ¾ Use of Sprague-Dawley breed of rats as model for “human” cancer ¾ Known to have higher mortality in 2 year studies ¾ Prone to develop spontaneous tumors 33 T pes of e Types experiments periments 34 Ab se of exploratory Abuse e plorator e experiments periments 35 ¾ Most published in biomedical research exploratory ¾ Too often published as if confirmatory experiments ¾ Do not unequivocally q y answer research q question ¾ Use of same data that generated research hypotheses to prove these hypotheses involves circular reasoning ¾ I f Inferential ti l statistics t ti ti should h ld be b interpreted i t t d and d published bli h d as “for exploratory purposes only” ¾ Séralini studyy was actuallyy conceived as exploratory p y ¾ See also Nature 506: 150-152, 13 February 2014 bit.ly/1tJWs4a Role of the pilot e experiment periment 36 Types of experiments objective bj ti - usage 37 Statistical Thinking and Smart Experimental Design IV. Principles p of Statistical Design g 38 Some terminolog terminology 39 ¾ Factor: the condition that we manipulate in the experiment (e.g. concentration of a drug, temperature) ¾ Factor level: the value of that condition (e.g. 10-5M, 40 40°C) C) ¾ Treatment: combination of factor levels (e.g. 10-5M and 40°C) ¾ Response p or dependent p variable: characteristic that is measured ¾ Experimental unit: the smallest physical entity to which a treatment is independently applied ¾ Observational unit: physical entity on which the response is measured ¾ See Appendix A A, Glossary of Statistical Terms The str structure ct re of the response Assumptions 40 • Additi model Additive d l • Treatment effects constant • Treatment effects in one unit do not affect other units Defining the e experimental perimental unit nit 41 The choice of the e experimental perimental unit nit y Temme et al. ((2001)) y Comparison of two genetic strains of mice, wild-type and connexin32-defficient y Diameters of bile caniliculi in livers y Statistically significant difference between Means ± SEM from 3 livers two strains (P < 0.005 0 005 = 1/200) y Experimental unit = Cell y Is this correct? 42 The choice of the e experimental perimental unit nit 43 ¾ Experimental unit = animal ¾ Results not conclusive at all ¾ Wrong choice made out of ignorance ignorance, or out of convenience? ¾ This was published in peer reviewed journal !!! The choice of the experimental unit Pl t R Plant Research h 44 The choice of the experimental unit Ri Rivenson study t d on N-nitrosamines N it i y Rats housed 3 / cage y Treatment supplied in drinking water y Impossible to treat any 2 rats differently y Rats within a cage not independent y Rat not experimental unit y Experimental unit = cage y Same remarks for Séralini study y Rats housed 2/cage y Rats in same cage are not independent of one another y Number of experimental units is not 10/treatment, but 5/treatment 45 The choice of the experimental unit Th reverse case The y Skin irritation test in volunteers y 2 compounds y Backs of volunteers divided in 8 test areas y Each compound randomly assigned to 4 different test areas y Experimental unit = Test Area, not volunteer 46 The choice of the experimental unit Summary Choice of experimental unit of particular concern when: ¾microscopy studies (the cell is not treated) ¾animals ¾ i l are nott iindividually di id ll caged d and/or d/ independence in question ¾plant research when independence is not guaranteed. Then the plot, tray, or entire growth chamber (environmental factor) is experimental unit (Fernandez 2007). 47 Why engineers don’t care very much about b t statistics t ti ti 48 The basic dilemma: b l balancing i iinternal t l and d external t l validity lidit 49 Failure to minimize bias ruins an experiment i t F il Failure to t minimize i i i bias bi results lt in i llostt experiments i t Failure to control variability can sometimes be remediated after the facts Allocation to treatment is most important source of bias 50 Req irements for a good e Requirements experiment periment 51 Basic strategies for maximizing i i i Si Signal/Noise l/N i ratio ti 52 A control t l is i a standard t d d ttreatment t t condition diti against i t which hi h all others may be compared 53 ¾ Single blinding – the condition ¾ Neutralizes investigator bias under which investigators are uninformed as to the treatment condition of experimental units ¾ Double blinding – the condition under which both experimenter and d observer b are uninformed i f d as to the treatment condition of experimental p units 54 ¾ Neutralizes investigator bias and bias in the observer’s scoring system In toxicological histopathology, both blinded and unblinded evaluation are recommended ¾ Group blinding ¾ Entire treatment groups are coded: A, B, C ¾ Individual blinding ¾ Each experimental unit is coded individually ¾ Codes are kept in randomization list ¾ Involves independent person that maintains list and prepares treatments 55 ¾ Describes D ib practical ti l actions ti ¾ Guidelines for lab technicians ¾ Manipulation of experimental units (animals, etc.) ¾ Materials used ¾ Logistics ¾ Definitions of measurements and scoring methods ¾ Data collection & processing ¾ Personal responsibilities 56 ¾ A detailed protocol is imperative to minimize potential bias Calibration is an operation p that compares p the output p of a measurement device to standards of known value, leading to correction of the values indicated by the measurement device Neutralizes bias in the investigator’s measurement system 57 ¾ Randomization ensures that the effect of uncontrolled sources of variability has equal probability in all treatment groups ¾ Randomization provides a rigorous method to assess the uncertainty (standard error) in treatment comparisons ¾ Formal randomization involves a chance dependent mechanism, e.g. flip of a coin, by computer (Appendix B) ¾ Formal randomization is not haphazard allocation ¾ Moment of randomization such that the randomization covers all substantial sources of variation,, i.e. immediately before treatment administration ¾ Additional randomization may be required at different stages 58 Randomized versus Systematic Allocation ¾ First unit treatment A,, second treatment B, etc. yyielding g sequence q AB,, AB,, AB ¾ Other sequences as AB BA, BA AB, etc. possible . 59 • any systematic arrangement might coincide with a specific pattern in the variability • biased estimate of treatment effect • randomization d i ti is i a necessary condition diti ffor a statistical t ti ti l analysis, without randomization a statistical analysis is not justified Improvements of Randomization Schemes . ¾ Balancing B l i means causes iimbalance b l iin variability i bilit ¾ Invalidates subsequent statistical analysis 60 Haphazard Allocation versus Formal Randomization • • • • Effect of 2 diets on body weight of rats 12 animals arrive in 1 cage g technician takes animals out of cage first 6 animals Îdiet A, remaining 6 Îdiet B Isn’t this Random ? • heavy animals react slower and are easier to catch than smaller animals • first 6 animals weigh more than remaining 6 ¾ Haphazard treatment allocation is NOT the same as formal randomization ¾ Haphazard allocation is subjective and can introduce bias ¾ Proper randomization requires a physical “randomizing randomizing device” device 61 Moment of Randomization • Rat brain cells harvested and seeded on Petri-dishes (1 animal = 1 dish) • Dishes randomly divided in two groups • Two sets of Petri dishes placed in incubator • After 72 hours group 1 treated with drug A and group 2 with drug B ¾ Effect of treatment confounded with systematic errors due to incubation ¾ Randomization should be done as late as possible just before treatment application ¾ Randomization sequence should be maintained throughout the experiment 62 Basic strategies for maximizing i i i Si Signal/Noise l/N i ratio ti 63 ¾ Simple random sample a selection of units from a defined target population such that each unit has an equal chance of being selected A random sample is a stringent requirement that is very difficult to fulfill in biomedical research 64 ¾ Neutralizes sampling bias ¾ Provides the foundation for the population model of inference ¾ Particularly important in genetic research (Nahon & Shoemaker) Switch to randomization model of statistical inference in which inference is restricted to the actual experiment only Reducing the intrinsic variability and bias by: ¾ use of genetically uniform animals ¾ use of phenotypically uniform animals ¾ environmental control ¾ nutritional control ¾ acclimatization ¾ measurement system 65 Beware off B external validity Basic strategies for maximizing i i i Si Signal/Noise l/N i ratio ti 66 Replication can be an effective strategy to control variability (precision) Replication is an expensive strategy to control variability Precision of experiment is quantified by standard error of difference between two treatment means Replication R li ti is i also l needed to estimate SD 67 subsampling: Standard error of difference: ¾ multiple observations over time ¾ duplicate, triplicate measurement ¾ different levels of subsampling ¾ Replication is in most cases only effective at the level of the true experimental unit ¾ Replication at the subsample level (observational unit) makes only sense when the variability at this level is substantial as compared to the variability between the experimental p units 68 A blocking factor is a known and unavoidable source of variability that adds or subtracts an offset to every experimental unit in the block Typical blocking factors • age group • microtiter plate • run of assay • batch of material • subject (animal) • date • operator 69 A covariate is an uncontrollable but measurable attribute of the experimental unit or its environment that is unaffected by the treatments but may have an influence on the measured response Typical covariates Covariates filter out one particular source of variation. Rather than blocking it represents a quantifiable (by coefficient β) attribute of the system studied. • baseline measurement • weight • age • ambient temperature Covariate adjustment requires slopes p to be parallel p 70 Req irements for a good e Requirements experiment periment 71 Simplicit of design Simplicity Protocol adherence? Complex Design Statistical analysis? Simple designs require a simple analysis 72 The calc calculation lation of uncertainty ncertaint (error) “It iis possible ibl and d iindeed d d it iis allll ttoo ffrequent, t ffor an experiment i t tto b be so conducted that no valid estimate of error is available” (Fisher, 1935) Validity of error estimation Respond independently to treatment Experimental units Differ only in random way from experimental units in other treatment groups Without a valid estimate of error, error a valid statistical analysis is impossible 73 Statistical Thinking and Smart Experimental Design V. Common Designs g in Biosciences 74 The jjungle ngle of e experimental perimental designs Split Plot Greco-Latin Square Design Latin Square Design Completely Randomized Design Randomized Complete Block Design Plackett-Burman Design 75 Youden Square Design Factorial Design Experimental design is an integrative i t ti conceptt An experimental design is a synthetic approach to minimize bias and control variability 76 Three aspects of experimental design d t determine i complexity l it and d resources 77 Completel Randomi Completely Randomized ed Design ¾ Studies the effect of a single independent variable (treatment) ¾ One-way treatment design Completely randomized error–control design ¾ Most common design (default design) design), easy to implement Lack of precision in comparisons is major drawback 78 Completely Randomized Design E Example l 1 ¾ Effect of treatment on proliferation of gastric epithelial cells in rats ¾ 2 drugs & 2 controls (vehicle) = 4 experimental groups g p, individual housing g in cages g ¾10 animals/group, ¾ Cages numbered and treatments assigned randomly to cage numbers ¾ Cages sequentially put in rack ¾ Blinding of laboratory staff and of histological evaluation, ¾ Treatment code = cage number (individual blinding) 79 Completely Randomized Design E Example l 2 Bias due to plate location effects in microtiter plates (Burrows (1984) (1984), causes parabolic patterns: Underlying y g causes unkown Solution byy Faessel ((1999): ) 80 • Random allocation of treatments to the wells • Bias of plate location is now introduced as random variation Randomization can be used to transform bias into noise A convenient way to randomize microtiter plates l t y dilutions made in tube rack 1 y tubes are numbered y MS Excel used to make randomization map y map taped on top of rack 2 y tubes from rack 1 are pushed to corresponding number in rack 2 y multipipette used to apply drugs to 96-well plate y results entered in MS Excel and sorted Alternative methods: • design (Burrows et al al. 1984 1984, Schlain et al. al 2001) • statistical model (Straetemans et al. 2005) 81 Factorial Design (full factorial) ¾ Studies the jjoint effect of two or more independent variables ((factors)) ¾ Considers main effects and interactions ¾ Factorial treatment design C Completely l t l R Randomized d i d error-control t l design d i 82 Factorial Design E Example l No interaction, additive effects 83 Interaction present, superadditive effects Factorial Designs I t Interaction ti and d Drug D Synergy S A hypothetical y factorial experiment of a drug combined with itself at 0 and 0.2 mg/l 0 0.2 mg/L 0 0% 1% 0.2 mg/L 1% 22% S Synergy off a drug d with ith itself? it lf? The pharmacological concept of drug synergy requires more than just statistical interaction 84 Factorial Design F th examples Further l ¾ Effects of 2 different culture media and 2 different incubation times on growth of a viral culture ¾ Microarray experiments interaction between mutants and time of observation b ti ¾ For cDNA microarrays specialized procedures required (see Glonek and Solomon Solomon, 2004; Banerjee and Mukerjee, Mukerjee 2007) ¾ The need for a factorial experiment should be recognized at the design phase ¾ The analysis should make use of the proper methods i.e. ANOVA 85 Factorial Design Hi h di Higher dimensional i ld designs i ¾ 3 x 3 x 3 design = 27 treatments , 2 replicates = 54 experimental units is this manageable? ¾ Interpretation p of higher g ((3-way) y) interactions? Neglect higher order interactions Do not test ALL combinations Fractional Factorial Design (e g Process optimization) (e.g. 86 Factorial Design Effi i Efficiency off factorial f t i l design d i Only valid method to study interaction between factors B absent B Present A Absent 18 15 A Present 13 10 Interaction = (18 + 10) – (15 +13) = 0 No important interaction from statistical AND scientific point of view Effect of factor A: [(13 – 18) + (10 -15)]/2 = -5 d.f. = 4 x (n – 1) 87 Effect of factor B: [(15 – 18) + (10 -13)]/2 = -3 d.f. = 4 x (n – 1) Increase external validity Randomi ed Complete Block Design Randomized ¾ Studies the effect of a single g independent p variable ((treatment)) in presence of a single isolated extraneous source of variability (blocks) ¾ One way treatment design One O e way ay b block oc e error-control o co t o des design g • Increase of precision • Eliminates possible imbalance in blocking factor • Assumes treatment effect the same among all blocks • Blocking characteristic must be related to response or else loss of efficiency due to loss in df 88 Randomized Complete Block Design E Example l ¾ Effect of treatment on proliferation of gastric epithelial cells in rats ¾ 2 drugs & 2 controls (vehicle) = 4 experimental groups ¾10 animals/group, individual housing in cages ¾ Cages numbered and treatments assigned randomly to cage numbers ¾ Blinding of laboratory staff Shelf height will probably influence the results ¾ ¾ ¾ ¾ 89 1 shelf = 1 block 8 animals (cages) / shelf, 5 shelves Randomize treatments per shelf 2 animals / shelf / treatment Not restricted to shelf height, other characteristics can also be included (e.g. body weight) Randomized Complete Block Design P i dD Paired Design i ¾ Special case of RCB design with only 2 treatments and 2 EU/block ¾ Example Cardiomyocyte experiment 90 Randomized Complete Block Design P i dD Paired Design i – Example E l Experimental Design Results • Left panel • ignores pairing • groups overlap, • standard error drug-vehicle = 7.83 • Right panel • pairs are connected by lines • consistent increase in drug treated dishes • standard error drug-vehicle drug vehicle = 2.51 2 51 91 The CRD requires 8 times more EU’s then the paired design Latin Sq Square are Design ¾ Studies the effect of a single independent variable (treatment) in presence of two isolated, extraneous sources of variability (blocks) ¾ One-way treatment design Two-way block error-control design • • • 92 In 1 k x k Latin square only k EU’s/treatment More EU’s EU s by stacking Latin squares upon each other or next to each other Randomizing the rows (columns) of the stacked LS design can be advantageous Latin squares to eliminate row-column effects in 96-well microtiter plates, 1 plate = 2 x (5 x 5 LS) also of interest for experiments in green houses and growth chambers Split Plot Design ¾ A split-plot design recognizes two types of experimental units: • • Plots • Subplots Two-way crossed (factorial) treatment design Split plot error-control error control design ¾ Plots randomly assigned to primary factor (color) ¾ Subplots randomly assigned to secondary factor (symbol) 93 Split Plot design E Example l ¾ Effects of 2 diets and vitamin supplement on weight g g gain in rats ¾ Rats housed 2 / cage (independence?) ¾ Cages randomly allocated to diet A or diet B ¾ Individual rats color marked randomly selected to receive vitamin or not ¾ Main plot factor = diet ¾ Subplot factor = vitamin 94 Repeated Meas Measures res Design ¾ Two independent variables ‘time’ time and ‘treatment’ treatment Each is associated with different experimental units ¾ Special case of split plot design 95 Statistical Thinking and Smart Experimental Design VI. The Required q Number of Replicates p – Sample p Size 96 Determining sample size is a risk-cost i k t assessmentt Replicates Cost Uncertainty Confidence 97 Choose sample size just big enough to give confidence that any biologically meaningful effect can be detected Th context The t t off biomedical bi di l experiments i t Sample size depends on Study Obj ti Objectives Statistical Context Point Estimation Interval Estimation Study Design Hypothesis Testing Study Specifications Assumptions 98 The h hypothesis pothesis testing conte contextt Population Model Null Hypothesis Decision made Do not reject null hypothesis Reject null hypothesis Alternative Hypothesis State of Nature Null hypothesis Alternative true hypothesis true Correct decision False negative (1 – α) β False positive α Correct decision (1 – β) • Alternative hypothesis, effect size: • Size of difference • Variability Specify p y allowable false positive & false negative rate S Sample l size i Level of significance 0.01, 0.05, 0.10 99 Power 80% or 90% Number of treatments Number of blocks Major determinants of sample si size e 100 Approaches for determining sample size Formal power analysis approaches Requires input of • significance level • power • alternative hypothesis (i.e. effect size Δ) ¾ R-package pwr (Champely, 2009) ¾ Lehr’s equation: q • • • • smallest difference variability (SD) n per group single comparison (2 groups) significance level 0.05 power 80%, other powers possible Rule-of-thumb: Mead’s Resource Requirement Equation E=N-T-B 101 • • • • E = df for error, 12 ≤ E ≤ 20, blocked exps. ≥ 6 N = df total sample size (no. EUs – 1) T = df for treatment combination B = df for blocks and covariates df = number of occurences – 1 Approaches for determining sample size E Examples l Effect size Δ = difference in means / standard deviation Δ = 0.2 0 2 small; Δ = 0.5 0 5 medium; Δ = 0.8 0 8 large (Cohen) Cardiomyocytes completely randomized design Treated - Control = 10; SD = 12.5 Δ = 10/12.5 = 0.8 standard deviation of either group or pooled SD > require(pwr) > pwr.t.test(d=0.8,power=0.8,sig.level=0.05,type="two.sample", pwr t test(d 0 8 power 0 8 sig level 0 05 type "two sample" + alternative="two.sided") Two-sample t test power calculation n = 25.52457 d = 0.8 sig.level = 0.05 power = 0.8 alte nati e = two.sided alternative t o sided NOTE: n is number in *each* group 102 Lehr’s equation 16/0.64 = 25 Approaches for determining sample size E Examples l ((cont.) t) Rule-of-thumb: Mead’s Resource Requirement Equation E=N-T-B • • • • E = df for error, 12 ≤ E ≤ 20 , ≥ 6 blocked exps. N = df total sample size (no. EUs – 1) T=d df for o treatment ea e co combination b a o B = df for blocks and covariates Cardiomyocytes completely randomized: N=9 9, T = 1 1, B = 0 E = 10, too small (min. = 12) At least 14 E.U.: 13 (N) – 1 (T) = 12 (E) Cardiomyocytes paired experiment: N = 9, T = 1, B = 4 E = 4, too small (min. = 6) At least: 14 EU (7 animals): 13(N) – 1 (T) – 6 (B) 103 Ho man subsamples s bsamples ? How many Standard error of difference: Vertical line corresponds to upper bound: Taking differential costs into consideration: 104 How many subsamples ? E Example l Morphologic study diameter of cardiomyocytes in sheep 7 sheep with intervention 6 sheep control Diameter of 100 epicardial cells/sheep Variance components: Upper bound (neglecting differential costs): Assume cost of 1 animal is equivalent to 100 subsamples then: 1 animal equivalent to 1000 subsamples then 55 cells sufficient Taking a priori 100 subsamples was definitely a waste of time in this case 105 M ltiplicit and sample si Multiplicity size e 106 M ltiplicit and sample si Multiplicity size e • • • • ¾ For single comparison, large sample sizes required for medium (Δ = 0.5) to large (Δ = 0.8) effects 107 ¾ Effects in early research usually extremely large 2 tests: 20% increase RSS 3 tests: 30% increase RSS 4 tests: 40% increase RSS 10 tests: 70% increase RSS Statistical Thinking and Smart Experimental Design VII. The Statistical Analysis y 108 The Statistical Triangle Design, Model, and Analysis are Interdependent 109 Every experimental design is underpinned b a statistical by t ti ti l model d l Statistical analysis: • • 110 Fit model to data Compare treatment effect(s) with error Measurement theory defines allowable operations ti and d statements t t t 111 Nominal ¾ Naming ¾ Ex. E C Color, l gender, d etc. t ¾ Frequency tables Ordinal ¾ Ranking and ordering ¾ Ex. E S Scoring i as none, minor, i moderate,…. d t ¾ Differences between scores do not reflect same distance ¾ Frequency tables, median values, IQR Interval ¾ ¾ ¾ ¾ Ratio ¾ Differences and ratios make sense ¾ True zero point, negative numbers impossible ¾ Ex. Ex Concentrations Concentrations, duration duration, temperature in degrees Kelvin, age, counts,… ¾ Geometric means, coefficient of variation Adding, subtracting (no ratios) Arbitrary zero, 30°C ≠ 2 x 15°C Ex. Temperature in degrees Fahrenheit or Celsius Means standard deviations Means, deviations, etc etc. Significance tests The concept of the p-value is often misunderstood Randomization Model Exp. Data Null hypothesis (H0): no difference in response between treatment groups Consider H0 as true Observed Test Statistic t P-value P value = Probability of obtaining a value for the test statistic that is at least as extreme as t assuming H0 is true Statistical Model Reject H0 112 Yes p≤α No Do not reject H0 Significance tests E Example l Cardiomyocytes C di t Mean diff. = 7% viable MC standard error = 2.51 Assume H0 is true Test statistic t = 7/2.51 = 2.79 Distribution of possible values of t with mean 0 P(t≥2.79) = 0.024 113 Probabilityy of obtaining g an increase of 2.79 or more is 0.024, provided H0 is true Significance tests T Two-sided id d ttests t Mean diff. = 7% viable MC standard error = 2.51 Assume H0 is true Test statistic t = 7/2.51 = 2.79 Distribution of possible values of t with mean 0 R Results lt iin opposite it direction are also of interest P(|t|≥2.79) = 0.048 114 Two-sided tests are the rule! Probabilityy of obtaining ga difference of ±2.79 or more is 0.048, provided H0 is true St ti ti always Statistics l makes k assumptions ti Parametric Methods: Nonparametric Methods: • • • • • • Distribution of error term Randomization Independence Equality of variance Randomization Independence Always verify assumptions • • 115 Planning (historical data) Analysis time before actual analysis M ltiplicit Multiplicity Multiple comparisons Multiplicity Multiple variables Multiple p measurements ((in time) 116 • 20 doses of drug tested against its control each at g level of 0.05 significance • Assume all null hypotheses are true, no dose differs from control • P(correct decision of no difference) = 1 - 0.05 = 0.95 • P(ALL 20 decisions correct) = 0.9520 = 0.36 • P(at least 1 mistake) = P(NOT ALL 20 correct) = 1 – 0.36 = 0.64 Decision made Do not reject null hypothesis Reject null hypothesis State of Nature Null Alternative hypothesis hypothesis true true Correct decision (1 – α) False positive α False negative β Correct decision (1 – β) (1 – • The “Curse of Multiplicity” is of particular importance in microarray data (30 (30,000 000 genes → 1 1,500 500 FP) • Multiplicity and how to deal with it must be recognized at the planning stage Statistical Thinking and Smart Experimental Design VIII. The Studyy Protocol 117 The writing of the study protocol finalizes th research the hd design i phase h Research protocol Technical protocol ¾ Rehearses research logic ¾ Forms the basis for reporting ¾ Describes practical actions ¾ Guidelines for lab technicians • General hypothesis • Working hypothesis • Experimental design rationale (treatment, error control, sampling design) • Measures to minimize bias (randomization, etc.) • Measurement methods • Statistical analysis 118 • • • • • Manipulation of animals Materials Logistics Data collection & processing Personal responsibilities Writing the Study Protocol is already working on the Study Report Statistical Thinking and Smart Experimental Design IX. Interpretation p & Reporting p g 119 P i t to Points t consider id in i the th Methods M th d S Section ti Experimental p design g Weaknesses & Strengths : • • • • • • • 120 Size and number of experimental groups How were experimental units assigned to treatments? Was proper randomization used or not? Was it a blinded study and how was g introduced? blinding Was blocking used and how? How were outcomes assessed? In case of ambiguity, which was the experimental unit and which the observational unit and why? Statistical Methods: • • • • Report and justify statistical methods with enough detail Supply level of significance and when applicable li bl direction di ti off ttestt ((one-sided, id d two-sided) Recognize multiplicity and give justification of strategy to deal with it Reference to software and version Points to consider in the Results Section S Summarizing i i th the D Data t Always report number of experimental units used in the analysis All units it mustt b be accounted t d ffor, no di discrepancies i with ith M Methods th d For - and only for - normally di ib d data distributed d mean and d standard d d deviation (SD) are sufficient statistics • SD is descriptive statistic about spread, i.e. variability • Report as mean (SD) NOT as mean ± SD • SEM is measure of precision of the mean (makes not much sense) Mean – 2 x SD can lead to ridiculous values if data not normally distributed (concentrations, durations, counts, etc.) Not normally distributed Median values and interquartile range Extremely small datasets (n < 6) Raw data Spurious p p precision detracts a p paper’s p readability y and credibility 121 Points to consider in the Results Section G hi l Di Graphical Displays l ¾ Complement tabular presentations ¾ Better suited for identifying patterns: “A picture says more than a thousand words” ¾ Whenever possible display individual data (e.g. use of beeswarm package in R) 20 Occurence es 15 10 5 0 Fa Fb Ma Individual data, median values and 95% confidence fid iintervals l 122 Mb Longitudinal data (measurements over time) Individual subject profiles Points to consider in the Results Section Si ifi Significance Tests T t ¾ Specify method of analysis do not leave ambiguity e.g. “statistical methods included ANOVA, regression analysis as well as tests of significance” in the methods section is not specific enough ¾ Tests of significance should be two-sided • Two group tests allow direction of alternative hypothesis, e.g. treated > control, this is a one-sided alternative • Use of one-sided tests must be justified and the direction of the test must be stated beforehand ((in protocol) p ) • For tests that allow a one-sided alternative it must be stated whether twosided or one-sided was used ¾ Report exact p-values rather than p<0.05 or NS • • • • • 123 0.049 is significant, 0.051 not ? Allows readers to choose their own critical value Avoid reporting p = 0.000; but report p<0.001 instead Report p-values to the third decimal For one-sided tests if result is in “wrong” direction then p’ = 1 - p Points to consider in the Results Section Si ifi Significance Tests T t – Confidence C fid IIntervals t l Provide confidence intervals to interpret size of effect ff t • • • 124 Null hypothesis is never proved Lack of evidence is no evidence for lack of effect ff t Confidence intervals provide a region of plausible values for the treatment effect Points to consider in the Results Section Si ifi Significance Tests T t – Confidence C fid IIntervals t l 125 X is significant, while Y is not F l claims False l i off iinteraction t ti “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05), but not in the control mice (P>0.05)” Invalid reasoning from nonsignificant result (P>0.05) The difference between “significant” g and “not significant” g is not itself significant Test for equalitiy of effect of training in 2 groups Only a factorial experiment with proper method of analysis (ANOVA) and test for interaction can make such a claim 126 Concluding Remarks Statistical thinking is the key to succesful experiments 1. Time spent thinking on the conceptualization and design of an experiment is time wisely spent 2. The design of an experiment reflects the contributions from different sources of variability 3 The design of an experiment balances between its internal validity and 3. external validity 4. Good experimental practice provides the clue to bias minimization 5. Good experimental design is the clue to the control of variability 6. Experimental design integrates various disciplines 7 A priori consideration of statistical power is an indispensable pillar of 7. an effective experiment 127 Concl ding Remarks Concluding R l off St Role Statistician? ti ti i ? 128 • Professional skilled in solving research problems • Team member and collaborator/partner in research project often granted co-authorship • Should be consulted when there is doubt with regard to design, sample size, or statistical analysis • It can be wise to include a statistician from the very beginning of the project Fi h about Fisher b t th the R Role l off th the St Statistician ti ti i “To To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. perhaps p say y what the experiment p He can p died of.” (Presidential Address to the First Indian Statistical Congress, 1938; Fisher, 1938). 129 Statistical Thinking and Smart Experimental Design Slides, Course notes at: http://www.datascope.be Teşekkür ederim 131
© Copyright 2024