SAMPLE SIZE CALCULATIONS WITH R
Instructor: M B Rao
Division of Biostatistics and Epidemiology, Department of Environmental Health, University of Cincinnati
A Two-Day Workshop at CR RAO AIMSCS, January 2-3, 2014

Outline
Chapter 1: Introduction to R and General Ideas on Sample Size Calculations
Chapter 2: Single-sample problems: Quantitative responses
Chapter 3: Two-sample problems: Quantitative responses
Chapter 4: Multi-sample problems: Quantitative responses – Analysis of Variance – One-way classified data
Chapter 5: Correlations
Chapter 6: Proportions
Chapter 7: Contingency tables – multiple proportions – Odds ratio – McNemar test
Chapter 8: Survival analysis – Hazard ratio
Chapter 9: Bioequivalence testing
Chapter 10: Diagnostic tests – ROC curves
Chapter 11: Regressions
Chapter 12: Nonparametric tests
Chapter 13: Analysis of Variance with Repeated Measures
Chapter 14: Comparing variability – Inter- and intra-variability
Chapter 15: Group Sequential Methods

A traditional class on sample size calculations should include all these topics. This program is too ambitious for a two-day workshop. I will strive to cover as many of these topics as possible. The major goal is to empower you to carry out sample size calculations on your own, at least in some of the standard scenarios you come across.

Goals
1. Understand what goes on behind sample size calculations. Why do we need to spell out the specifications demanded?
2. Learn how to use R to get the required output.

Chapter 1: Introduction to R and General Ideas on Sample Size Calculations

Introduction to R
The purpose of the workshop is to show you how to calculate the required sample size, typically in the environment of testing hypotheses. The medium I use is the computational software R. It is free, and it takes less than five minutes to download. A number of statistical software packages, such as SAS, SPSS, BMDP, StatXact, Stata, PASS, and nQuery, also calculate sample sizes. All of these cost money.

This is a genuine hands-on workshop. I strongly encourage you to bring your own laptop to the classroom, copy my moves on the big screen, and see how the output materializes on your laptop. It is fervently hoped that at the end of the program you will achieve a good degree of mastery in calculating sample sizes on your own after grasping the underlying ideas.

For those who are attending the workshop for credit, there are homework problems peppered through the text. You can use the last half-hour of the workshop on the final day to work out the solutions, compile them in a Word file, and email the file to me. You will get a grade for your effort. Tutors will help you if you face problems in executing R commands.

Commercial vendors such as SAS and SPSS produce a number of manuals to help you use their software. R does not do that. There is a committee that oversees the running of R; its members work on a purely voluntary basis. However, anyone in the world can write a book on the usage of R, publish it commercially, and profit from it. Major commercial publishers such as Springer, John Wiley, and Chapman and Hall are big in publishing books on R. You can start your own company helping small and big businesses, for a fee, by attending to their computational needs via R. The company 'Revolution Analytics' is a prime example of such an endeavor.
If you develop a new methodology for analyzing data and your research is funded federally (NIH, NSF, etc.), the federal agencies now demand that you develop R code for the implementation of the methodology. SAS is No. 1 among all commercial software. R is not far behind. There are certain features of SAS, such as spreadsheet handling and simulations, that work better than in R. R is catching up.

Books on R
For the last five years, books on R have been published at the rate of about one per month. Some books are about the general usage of R. Others are specialized: they select a specific area of research and explain how R can be used in analyzing the data pertinent to that area. At the latest count, there are more than 100 books published with R as the major theme. I have more than 50 books in my own collection. Here is a sample.

1. Peter Dalgaard – Introductory Statistics with R, Second Edition – Springer, 2008
2. Brian S Everitt and Torsten Hothorn – A Handbook of Statistical Analyses Using R – Chapman and Hall, 2006
3. Julian J Faraway – Linear Models with R – Chapman and Hall, 2004
4. Simon N Wood – Generalized Additive Models: An Introduction with R – Chapman and Hall, 2006
5. John Fox and Sanford Weisberg – An R Companion to Applied Regression, Second Edition – Sage
6. Andrew Gelman and Jennifer Hill – Data Analysis Using Regression and Multilevel/Hierarchical Models – Cambridge University Press
7. Andreas Foulkes – Applied Statistical Genetics with R – Springer, 2009
8. Thomas Lumley – Complex Surveys: A Guide to Analysis Using R – John Wiley, 2010
9. Ding-Geng Chen and Karl Peace – Clinical Trial Data Analysis Using R – Chapman and Hall, 2011
10. Paul Murrell – R Graphics, Second Edition – Chapman and Hall, 2011
11. Hadley Wickham – ggplot2 – Springer, 2009
12. Deepayan Sarkar – Lattice: Multivariate Data Visualization with R – Springer, 2011
13. Kunio Takezawa – Guidebook to R Graphics Using Microsoft Windows – John Wiley, 2012
14. Nicholas Horton and Ken Kleinman – Using R for Data Management, Statistical Analysis, and Graphics – Chapman and Hall, 2011
15. Andrew Robinson and Jeff Hamann – Forest Analytics with R – Springer, 2011
16. Giovanni Seni and John Elder – Ensemble Methods in Data Mining (Improving Accuracy Through Combining Predictions) – Morgan and Claypool Publishers, 2010
17. Yihui Xie – Dynamic Documents with R and knitr – Chapman and Hall, 2013
18. Ding-Geng Chen and Karl Peace – Applied Meta-Analysis with R – Chapman and Hall, 2013

There are no books addressing sample size calculations using R. I am hoping to bring one out. The following books do deal with sample size calculations. They are essentially cookbooks with pages and pages of tables.

1. MM Desu and D Raghavarao – Sample Size Methodology – Academic Press, New York, 1990
2. Jacob Cohen – Statistical Power Analysis for the Behavioral Sciences, Second Edition – Lawrence Erlbaum Associates, Hillsdale, NJ, 1988
3. Shein-Chung Chow, Jun Shao, and Hansheng Wang – Sample Size Calculations in Clinical Research – Marcel Dekker, New York, 2003
4. Lloyd D Fisher and Gerald van Belle – Biostatistics – John Wiley, New York, 1993

What is R?
R is a system for data manipulation, analysis, and presentation, including graphics. The system is free, and it takes about five minutes to download. It has two components:
1. Base
2. Packages
For most of our needs, it is good enough to download only the base. There are hundreds and hundreds of packages. Each package is tailored for specific needs. During our sojourn in sample size calculations, we will need two or three packages.
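As a preview, fetching a contributed package takes one command, and loading it into a session another. A minimal sketch using the 'pwr' package, which we will rely on later in these notes:

> install.packages("pwr")   # downloads the package from CRAN (one-time step)
> library(pwr)              # loads the package into the current session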
When the need arises, I will explain how to download the package of interest.

What does R give?
In R, a statistical analysis is usually performed in a series of steps, with intermediate results being stored in objects. Systems such as SPSS and SAS provide extensive output, whereas R gives minimal output and stores the other results in objects for subsequent use with other R functions or commands. This means that R can be tailored to produce exactly the analysis and results that you want, rather than produce an analysis designed to fit all situations. One has better control of an analysis in R than in SPSS, BMDP, SAS, Stata, etc.

History
It all started with the introduction of the system S, an elegant, widely accepted, and enduring software system with outstanding conceptual integrity. The originator of the system is Dr. John Chambers, currently with Stanford University. The system S was hailed as the mother of all data analysis systems. In 1998, the Association for Computing Machinery (ACM) presented Dr. Chambers with its Software System Award for 'the S system, which has forever altered the way people analyze, visualize, and manipulate data.' S is not free.

R was inspired by the S environment. R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics, University of Auckland, New Zealand. Subsequently, a large group of individuals contributed to R by sending code and bug reports. The current R is the result of a collaborative effort with contributions from all over the world.

SAS has a single owner. More than 500 people work on the development of SAS code. If a new method comes up in the literature, SAS works on developing code for it and incorporates it into its general system. New versions keep coming out incorporating new methodologies. However, R has a million brains working behind it. New packages are contributed continually. There are several features of data analysis in R that SAS has not caught up with yet.

How to download R
The current version is 3.0.1.
1. Open the Internet.
2. Type R.
3. Click on 'The R project for statistical computing.' Their web page opens up. All resources are available from this page: the R system itself, a collection of add-on packages, manuals, documentation, and more.
4. Click on 'Download CRAN.'
5. A number of mirror sites are available from which one can download R. Go to the nearest site. In our case, I think Ohio – Case Western Reserve University is the nearest. Click on this site.
6. Click on 'Download R for Windows.'
7. Click on 'base.' R has two components: base and packages. For all practical needs 'base' is good enough. There are hundreds of user-contributed packages available. When the need arises we will download certain packages. The 'base' is about 40 MB.
8. Click on 'Download R-3.0.1 for Windows.' (The version number changes as R is updated.)
9. Click 'Run.'
10. Follow the next steps. Always click 'Next.'
11. Click 'Finish.'
12. An R icon will be created on your desktop.
13. When you click on the icon, an R console opens up.
14. I will take charge from this point.

How is R maintained and updated?
There is a core team consisting of the following members:
Doug Bates (Madison)
John Chambers (Stanford)
Peter Dalgaard (Denmark)
Robert Gentleman (USA)
Kurt Hornik
Stefano Iacus
Ross Ihaka (New Zealand)
Friedrich Leisch
Thomas Lumley
Martin Maechler
Duncan Murdoch
Paul Murrell (Australia)
Martyn Plummer
Brian Ripley (U.K.)
Duncan Temple Lang
Luke Tierney (Minneapolis-St.Paul)
Simon Urbanek

Suppose you develop an R code for a new method of analysis.
You submit the code to the core team. They will make it available widely on the internet for comments, improvements, and bug reports. After the code is thoroughly vetted for bugs, it will be made part of the list of packages available on the R website.

Suppose you have a query about R. Put your question on the web. Usually, someone responds.

Suppose you want to use R for your data analysis. You should know what statistical methodology you want to use. For example, suppose you want to perform cluster analysis. You would like to know what R command to use. Type 'cluster analysis R' into a web browser. You will get plenty of help.

R cannot teach you statistics. You ought to have a modicum of knowledge of statistics before you are ready to use R. The normal, t-, chi-squared, and F distributions are at the core of any statistical data analysis. Computations involving these distributions can be done using R. Let us focus on the normal distribution.

Example: An engineer in collaboration with a pharmaceutical analyst sets up a machine for manufacturing tablets of a particular medication. The parameters of the machine are set so that the machine is supposed to produce tablets each with weight 750 mg. It is impossible for the machine to produce tablets each with exactly the same weight. Variation in the weights is normal to expect. In order to understand the variation in the weights of the tablets, they took a random sample of 100,000 tablets and determined the weight of each tablet. This is not hard to do. They built a (relative frequency) histogram of the data and drew the frequency polygon. The mean of the weights worked out to be 750 mg and the standard deviation 60 mg, approximately. They also observed that the shape of the empirical density curve was bell-shaped and symmetric around the mean. Using mathematics, they fitted the following theoretical curve, called the normal density, to the histogram:

f(x) = (1/(60·√(2π))) · exp(−(1/2)·((x − 750)/60)²), x = weight.

This curve approximates the contour of the histogram very well. They summarized the entire data collected in the following way: the weights of the tablets the machine produces are normally distributed with mean µ = 750 mg and standard deviation σ = 60 mg.

X = Weight ~ N(750, 60)

Look at the documentation on the normal distribution.
> ?pnorm
There are four commands associated with the normal distribution:
dnorm (for plotting a normal density curve)
pnorm (for calculating the probability for a given quantile q)
qnorm (for calculating the quantile for a given probability p)
rnorm (for drawing random samples from a specified normal distribution)

Let us draw the normal density function with mean 750 and sd 60. Determine the three-sigma limits.
mean – 3*sd = 570
mean + 3*sd = 930

[Figure: normal density curve titled 'Normal Density with mean 750 SD 60'; x-axis 'Weight in mg' from 600 to 900, y-axis 'Density' from 0.000 to about 0.006]

The relevant R code:
> curve(dnorm(x, mean = 750, sd = 60), from = 570, to = 930, xlab = "Weight in mg",
+ ylab = "Density", main = "Normal Density with mean 750 SD 60", col = "red", lwd = 2)
lwd = line width

Uses of the other commands:
1. What proportion of tablets have weights less than 700 milligrams? The required proportion is the area under the curve to the left of 700. Draw a picture. If X stands as a generic symbol for the weight of a random tablet, in probabilistic jargon we want to find Pr(X < 700). We use the 'pnorm' command of R. We are given the quantile q = 700; we want the probability.
> pnorm(700, mean = 750, sd = 60, lower.tail = T)
[1] 0.2023284
2. What proportion of tablets have weights greater than 820 milligrams? It is the area under the curve to the right of 820. Equivalently, Pr(X > 820) = ?
> pnorm(820, mean = 750, sd = 60, lower.tail = F)
[1] 0.1216725

3. What proportion of tablets have weights between 710 milligrams and 820 milligrams? It is the area under the curve between 710 and 820. Equivalently, Pr(710 < X < 820) = ? Draw a picture.
> pnorm(820, mean = 750, sd = 60, lower.tail = T) - pnorm(710, mean = 750, sd = 60, lower.tail = T)
[1] 0.625835

4. For what quantile q is the probability of finding a random tablet with weight less than or equal to q equal to 60%? In probabilistic jargon, Pr(X ≤ q) = 0.60. Solve for q.
> qnorm(0.60, mean = 750, sd = 60, lower.tail = T)
[1] 765.2008

5. For what quantile q is the probability of finding a random tablet with weight greater than q equal to 20%? In probabilistic jargon, Pr(X > q) = 0.2. Solve for q. Draw a picture.
> qnorm(0.2, mean = 750, sd = 60, lower.tail = F)
[1] 800.4973

How to draw random samples from a normal distribution? Draw a random sample of size 200 from a normal distribution with mean = 2 and standard deviation = 4.
> MB <- rnorm(200, 2, 4)
> MB
  [1]  8.789175699 -1.617459744  0.012799405  2.974297318 -1.320160020
  [6]  4.762438460  0.405252116  9.115464798  4.783125687  1.323385626
[the remaining 190 values are omitted here; your numbers will differ, since the draws are random]
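A quick way to check that the draws behave as advertised (a sketch; the printed values change from run to run):

> mean(MB)   # should be close to the specified mean, 2
> sd(MB)     # should be close to the specified sd, 4
> hist(MB)   # roughly bell-shaped, centered near 2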
Four basic commands on the t-distribution:
dt (density)
pt (probability for a given quantile)
qt (quantile for a given probability)
rt (random draws)

Chi-squared distribution: dchisq, pchisq, qchisq, rchisq
F-distribution: df, pf, qf, rf (R is case-sensitive; these commands are lowercase)

General ideas on sample size calculations
1. The null hypothesis is a hypothesis of skepticism.
2. The alternative is a hypothesis of optimism.
3. Funding agencies put emphasis on the null hypothesis.
4. Researchers put emphasis on the alternative hypothesis.

Example
A targeted population has average systolic blood pressure µ0 = 130 and standard deviation σ = 16. The average is high. A researcher developed a drug that is specifically tailored to the target population. Her contention is that this drug will help bring the mean down.

Design: Take a random sample of n subjects from the target population and administer the drug for a prescribed period of treatment. What should be the value of n?

Null hypothesis: Skepticism: The drug has no effect.
H0: µ = µ0 = 130
Alternative: Optimism: The drug has indeed some effect.
H1: µ < µ0

We do not know which hypothesis is true. No amount of data collected will tell you that, unless you are a Bayesian. We can only check whether or not the data collected are consistent with the null hypothesis. We use a test statistic Z, say, and a test built on the test statistic.

If you reject the null hypothesis when it is true, you are raising false hopes. You are indulging in a false positive. We need to control this type of error (Type I error). The funding agency wants to control the Type I error probability. Typically, the probability is denoted by α. Typical values of α are 0.05, 0.01, 0.10.

If you reject the null hypothesis when it is not true, this is a good thing. The researcher would love this scenario. She wants to control the chances of rejecting the null hypothesis when it is not true. This is power. This is where n plays a role. Power is usually denoted by 1 – β. The entity β has a special meaning. It is the probability of not rejecting the null hypothesis when the alternative is true (Type II error).

Controlling the Type I error is easy. Meeting the power requirement is mathematically hard. In the calculations, we need the distribution of the test statistic Z when the null hypothesis is true. This is usually easy. Using the distribution, we can obtain the critical value c of the test. The test would be of the form: Reject the null hypothesis if Z > c.

We also need the distribution of Z when the alternative is true. This is where problems crop up. Our alternative is hazy. The mean is not clear, except that it should be < µ0. How can we find the distribution of Z when we do not know what the mean is? Be concrete. One reasonable choice here could be µ = 120 (normal blood pressure). You want people to have average normal blood pressure if they use your drug. When µ = 120, we can find the distribution of Z. We define the effect size

Δ = (µ0 − µ1)/σ = (130 − 120)/16 = 0.625.

You might say that you want to see this effect size realized with specified power. Technically, the equation for the sample size calculation would be

Power = Pr(Reject H0 | µ = 120) = Pr(Z > c | Δ = 0.625).

We know the distribution of Z when µ = 120 or, equivalently, Δ = 0.625. This equation will be a function of n; solve it for n.
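To see concretely how power depends on n, here is a minimal sketch in R for this example (one-sided test at α = 0.05, Δ = 0.625; the function name power.fn is mine):

> alpha <- 0.05; Delta <- 0.625
> # under the alternative, the standardized test statistic is N(sqrt(n)*Delta, 1),
> # and we reject when it exceeds the standard normal critical value
> power.fn <- function(n) pnorm(sqrt(n) * Delta - qnorm(1 - alpha))
> round(power.fn(c(10, 20, 30)), 2)
[1] 0.63 0.87 0.96

The power climbs toward 1 as n grows; the sample size question is where it first reaches the level you demand.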
Generally, a lot of computing effort is involved here.

Caution: The sample size depends on the choice of the test statistic too, besides the level of significance α, effect size Δ, and power.

FAQ and pointers
1. How do I come up with the alternative mean? As a researcher, you are responsible. A statistician cannot help you. Conduct a pilot study.
2. How do I come up with the effect size? As a researcher, you are responsible. Conduct a pilot study.
3. Most research proposals require sample size calculations. Typically, pilot studies do not ask for sample size calculations. Sample size is limited by the budget offered.
4. Sample size calculations are recommended if you are conducting a prospective study. Pilot studies, typically, are retrospective. How many records one extracts depends on time and budget.

Signal-to-noise ratio
Suppose we have a single population whose quantitative characteristic is normally distributed with mean μ and standard deviation σ. Engineers call the mean the 'signal' and the standard deviation the 'noise.' For us, 'noise' means 'variation.' Suppose mean = 1 and sd = 100. There is too much noise (too much variation) in the population values, and the signal is weak. Look at the normal distribution curve with this mean and this standard deviation.

mean ± 3*sigma limits: -299 to 301
> curve(dnorm(x, mean = 1, sd = 100), from = -299, to = 301)

Some features of this normal distribution:
1. About 68% of the population values are between mean – sd and mean + sd, i.e., -99 and 101.
2. 95% of the population values are between mean – 2*sd and mean + 2*sd, i.e., -199 and 201.
3. 99% of the population values are between mean – 3*sd and mean + 3*sd, i.e., -299 and 301.
4. The peak of the normal curve is at the 'signal.'

[Figure: the curve dnorm(x, 1, 100) plotted from -300 to 300; the density is extremely flat]

The signal-to-noise ratio is defined by mean/sd, which in our case is 1/100 = 0.01. The signal-to-noise ratio measures how strong the signal is. A value close to zero indicates that the signal is buried deep in the noise.

Suppose we know the noise, sd, but not the signal. We want to extract the signal. We take a sample from the population. The sample mean is a good estimate of the signal. If the signal is weak, we need a very large sample to extract the signal. Let us do a lab experiment. Draw a random sample of size 10 from N(1, 100).
> x <- rnorm(10, 1, 100)
> x
 [1]  -11.155289   -1.114619   24.772015  -96.294882   12.975678  258.847377
 [7]  -33.853011  -24.976018  184.088338  206.766361
> mean(x)
[1] 49.41046
> sd(x)
[1] 120.7289

The sample mean is way off the mark (the signal). The sample standard deviation, 120.7289, gives an inkling that there is too much variation in the data. Suppose we want to build a 95% confidence interval for the signal. Theory says that with 95% probability the random interval [Mean – 2*(Standard Error of the mean), Mean + 2*(Standard Error of the mean)] contains the population mean. Technically, the random interval

[X̄ − 2·(sd/√n), X̄ + 2·(sd/√n)]

contains the population mean with 95% probability. Let us calculate the confidence interval for our data on hand: [-26.95, 125.77]. Sure, this interval contains the population mean. If we had not known the population mean, this interval would be useless, since it is so wide.
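The arithmetic behind that interval, as a one-line sketch in R (using the sample x drawn above):

> round(mean(x) + c(-2, 2) * sd(x) / sqrt(10), 2)
[1] -26.95 125.77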
Let us take a bigger sample.
> y <- rnorm(100, 1, 100)
> y
 [1]   -7.584475  -93.597787  262.402064   57.549877   61.564929  -92.127568
 [7]  -77.434990  -15.951024 -112.203137  -11.165914   88.682188 -165.483593
[the remaining 88 values are omitted here]
> mean(y)
[1] 1.958261
The sample mean is closer to the population mean.
> sd(y)
[1] 100.3590
The 95% confidence interval for the signal is [-18.112, 22.032]. The interval is better but still wide. Let us take a much bigger sample.
> z <- rnorm(1000, 1, 100)
> mean(z)
[1] 3.696012
Let us take a very, very large sample.
> u <- rnorm(10000, 1, 100)
> mean(u)
[1] 1.540427
The moral of the story: if the signal is weak, we need a very, very large sample to extract it.

Let us look at the normal population with mean μ = 1 and σ = 1. The signal-to-noise ratio is mean/sd = 1. The signal is very strong. Look at the normal curve.

Features of the curve:
1. The peak is at the signal.
2. About 68% of the population values are between μ – σ and μ + σ, i.e., 0 to 2.
3. 95% of the population values are between μ – 2σ and μ + 2σ, i.e., -1 to 3.
4. 99% of the population values are between μ – 3σ and μ + 3σ, i.e., -2 to 4.

[Figure: the curve dnorm(x, 1, 1) plotted from -2 to 4; a sharp peak at 1]

Draw a random sample of size 10 from this population. Build a 95% confidence interval. I bet that the interval will be narrow.

Moral: The quality of inference depends on how much noise there is in the population. The same argument works through sample size calculations. The effect size is akin to a signal-to-noise ratio. The smaller the effect size, the larger the sample required to detect it.

Chapter 2: Single-Sample Problems: Quantitative Responses

Testing-of-hypotheses environment
Set-up: The random entity X has a normal distribution with mean µ and standard deviation σ. The null hypothesis is
H0: µ = µ0 (specified)
and the alternative
H1: µ > µ0 (one-sided alternative).
The population standard deviation σ is known.

Potential data: X1, X2, … , Xn

Test statistic: Z-statistic
Z = (X̄ − µ0)/(σ/√n)

Test: Reject the null hypothesis H0 in favor of the alternative if Z > z_{1-α}.

What is z_{1-α}, the critical value of the test? The area under the standard normal curve to the left of z_{1-α} is 1 - α. Some examples of α:

α      Critical value z_{1-α}
0.05   1.645
0.01   2.326
0.10   1.282

What should be the sample size n?

Specifications:
1. Level of significance: α (Type I error probability). (This provides the critical value of the test.)
2. Power: 1 – β (1 – Type II error probability)
3. Effect size: Δ = (µ1 − µ0)/σ (standardized difference), where µ1 is the mean under the alternative hypothesis.
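The critical values tabulated above come straight from qnorm; as a quick check:

> qnorm(c(0.95, 0.99, 0.90))
[1] 1.644854 2.326348 1.281552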
Jargon
What should be the sample size n? With the false positive (Type I error) probability controlled at α, what should be the sample size n so that the test is able to detect the standardized difference Δ with power 1 – β?

How does a mathematical statistician come up with a formula for n? He needs the distribution of the test statistic Z under the null hypothesis:
Z ~ N(0, 1).
He needs the distribution of the test statistic Z under the alternative hypothesis. The alternative hypothesis is hazy. It stipulates only that the unknown mean µ > µ0. One cannot find the distribution of Z. Be concrete. Stipulate the value of the mean µ. Set a value µ = µ1 (> µ0). (Use your judgment.) Under the alternative hypothesis,
Z ~ N(√n·(µ1 − µ0)/σ, 1), i.e., Z ~ N(√n·Δ, 1).
The mean of this normal distribution is a function of the standardized difference Δ and the sample size.

An important lesson: You do not have to come up with a value of µ under the environment of the alternative hypothesis. Spell out the standardized difference Δ.

Formula for sample size
The equation to solve is:
1 – β = Pr(Z > z_{1-α} | Δ), with Z ~ N(√n·Δ, 1).
For what value of n is this equation satisfied? Fortunately, we have an explicit solution:
n = (z_{1-α} + z_{1-β})²/Δ²
Use R to calculate n.

Goal: For each of the levels 0.01, 0.05, and 0.10, power 0.80, 0.90, 0.95, and effect size 0.1 (0.1) 0.9, calculate the required sample size.

Step 1: Calculate the critical values z_{1-α}. Use the loop command in R. The loop command is used when the same formula is applied under different scenarios. Create a vector of zeros. We will replace the zeros by critical values.
> Critical <- c(0, 0, 0)
Create a vector of levels of significance.
> Level <- c(0.01, 0.05, 0.10)
Start the loop command. Replace each entry in the folder 'Critical' by the appropriate critical value: the i-th entry of 'Critical' is calculated from the i-th entry of 'Level.' The command 'qnorm' does the trick.
> for (i in 1:3)
+ {
+ Critical[i] <- qnorm(Level[i], lower.tail = F)
+ }
> Critical
[1] 2.326348 1.644854 1.281552
Round the critical values to three decimal places.
> round(Critical, 3)
[1] 2.326 1.645 1.282

Let us calculate the z_{1-β}s for different choices of power.
> Power <- c(0.80, 0.90, 0.95)
> Powerz <- c(0, 0, 0)
> for (i in 1:3)
+ {
+ Powerz[i] <- qnorm(Power[i])
+ }
> Powerz
[1] 0.8416212 1.2815516 1.6448536

Let us experiment with different effect sizes. Create a folder consisting of the numbers 0.1 through 0.9 in increments of 0.1.
> Delta <- seq(0.1, 0.9, 0.1)
> Delta
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

We are ready to start another loop. Remember that the sample size is a function of significance level, power, and effect size. We have chosen three levels of significance, three choices of power, and nine choices of effect size. Let us create a 3-dimensional array with 9 slices, each slice being a matrix of three rows and three columns, with all entries zero. Create a vector of 81 zeros. (rep = repeat)
> Sizes <- rep(0, 81)
Create the required array.
> dim(Sizes) <- c(3, 3, 9)
Start a loop. We fill the entries of the array using the sample size formula: entry (i, j, k) uses the i-th level, the j-th power, and the k-th effect size.
> for (i in 1:3)
+ {
+ for (j in 1:3)
+ {
+ for (k in 1:9)
+ {
+ Sizes[i, j, k] <- ((Critical[i] + Powerz[j])^2)/Delta[k]^2
+ }
+ }
+ }
Round the numbers to integers.
> round(Sizes)
, , 1
     [,1] [,2] [,3]
[1,] 1004 1302 1577
[2,]  618  856 1082
[3,]  451  657  856
, , 2
     [,1] [,2] [,3]
[1,]  251  325  394
[2,]  155  214  271
[3,]  113  164  214
, , 3
     [,1] [,2] [,3]
[1,]  112  145  175
[2,]   69   95  120
[3,]   50   73   95
, , 4
     [,1] [,2] [,3]
[1,]   63   81   99
[2,]   39   54   68
[3,]   28   41   54
, , 5
     [,1] [,2] [,3]
[1,]   40   52   63
[2,]   25   34   43
[3,]   18   26   34
, , 6
     [,1] [,2] [,3]
[1,]   28   36   44
[2,]   17   24   30
[3,]   13   18   24
, , 7
     [,1] [,2] [,3]
[1,]   20   27   32
[2,]   13   17   22
[3,]    9   13   17
, , 8
     [,1] [,2] [,3]
[1,]   16   20   25
[2,]   10   13   17
[3,]    7   10   13
, , 9
     [,1] [,2] [,3]
[1,]   12   16   19
[2,]    8   11   13
[3,]    6    8   11

Let us name the rows, columns, and slices.
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effect 0.2", "Effect 0.3",
+ "Effect 0.4", "Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8",
+ "Effect 0.9")
> round(Sizes)
, , Effect 0.1
           Power 0.80 Power 0.90 Power 0.95
Level 0.01       1004       1302       1577
Level 0.05        618        856       1082
Level 0.10        451        657        856
, , Effect 0.2
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        251        325        394
Level 0.05        155        214        271
Level 0.10        113        164        214
, , Effect 0.3
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        112        145        175
Level 0.05         69         95        120
Level 0.10         50         73         95
, , Effect 0.4
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         63         81         99
Level 0.05         39         54         68
Level 0.10         28         41         54
, , Effect 0.5
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         40         52         63
Level 0.05         25         34         43
Level 0.10         18         26         34
, , Effect 0.6
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         28         36         44
Level 0.05         17         24         30
Level 0.10         13         18         24
, , Effect 0.7
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         20         27         32
Level 0.05         13         17         22
Level 0.10          9         13         17
, , Effect 0.8
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         16         20         25
Level 0.05         10         13         17
Level 0.10          7         10         13
, , Effect 0.9
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         12         16         19
Level 0.05          8         11         13
Level 0.10          6          8         11

Comments
1. Fix the level and power. As the effect size increases, the sample size needed decreases. Fix α = 0.05 and Power = 0.80.
Δ: 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
n: 618  155   69   39   25   17   13   10    8
2. Fix the level and effect size. As the power increases, the sample size increases. Set α = 0.05 and Δ = 0.5.
Power: 80%  90%  95%
n:      25   34   43
3. Fix the power and effect size. As the level increases, the sample size needed decreases. Set Power = 80% and Δ = 0.5.
Level: 0.01  0.05  0.10
n:       40    25    18

Query: I do not need these comprehensive tables. I know my level, power, and effect size. Calculate the required sample size. This is Homework Problem No. 1.

The other type of one-sided alternative
The null hypothesis is
H0: µ = µ0 (specified)
and the alternative
H1: µ < µ0 (one-sided alternative).
We use the same test statistic. However, the character of the test changes.
Test: Reject the null hypothesis in favor of the alternative if Z < z_α.
The sample size numbers will remain exactly the same.

The case of the two-sided alternative
The null hypothesis is
H0: µ = µ0 (specified)
and the alternative
H1: µ ≠ µ0 (two-sided alternative).
The test statistic Z remains the same.
Test: Reject the null hypothesis in favor of the alternative if |Z| > z_{1-α/2}.
The critical value c = z_{1-α/2} sits on the standard normal curve so that the central area between -c and c is 1 - α, and each tail beyond has area α/2.

[Figure: standard normal curve with the central region marked '1 - Alpha' and the two shaded tails beyond -c and c each marked 'Alpha/2']

All other features remain the same. The formula for the sample size changes slightly.
n = (z_{1-α/2} + z_{1-β})²/Δ²

α      z_{1-α/2}
0.05   1.960
0.01   2.576
0.10   1.645

Let us work out the sample size for a variety of configurations of level, power, and effect size. Use R. The critical values are different now.
> Level <- c(0.01, 0.05, 0.10)
> Critical <- c(0, 0, 0)
> for (i in 1:3)
+ {
+ Critical[i] <- qnorm(Level[i]/2, lower.tail = F)
+ }
> Critical
[1] 2.575829 1.959964 1.644854
The power numbers z_{1-β} remain the same. We will use the folder 'Powerz' defined earlier. We will use the same 'Delta' folder. Let us create an array of dimensions 3x3x9 with every entry zero.
> Sizes <- rep(0, 81)
> dim(Sizes) <- c(3, 3, 9)
> for (i in 1:3)
+ {
+ for (j in 1:3)
+ {
+ for (k in 1:9)
+ {
+ Sizes[i, j, k] <- ((Critical[i] + Powerz[j])^2)/Delta[k]^2
+ }
+ }
+ }
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effect 0.2", "Effect 0.3", "Effect 0.4",
+ "Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8", "Effect 0.9")
The sample sizes are given below.
> round(Sizes)
, , Effect 0.1
           Power 0.80 Power 0.90 Power 0.95
Level 0.01       1168       1488       1781
Level 0.05        785       1051       1299
Level 0.10        618        856       1082
, , Effect 0.2
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        292        372        445
Level 0.05        196        263        325
Level 0.10        155        214        271
, , Effect 0.3
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        130        165        198
Level 0.05         87        117        144
Level 0.10         69         95        120
, , Effect 0.4
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         73         93        111
Level 0.05         49         66         81
Level 0.10         39         54         68
, , Effect 0.5
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         47         60         71
Level 0.05         31         42         52
Level 0.10         25         34         43
, , Effect 0.6
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         32         41         49
Level 0.05         22         29         36
Level 0.10         17         24         30
, , Effect 0.7
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         24         30         36
Level 0.05         16         21         27
Level 0.10         13         17         22
, , Effect 0.8
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         18         23         28
Level 0.05         12         16         20
Level 0.10         10         13         17
, , Effect 0.9
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         14         18         22
Level 0.05         10         13         16
Level 0.10          8         11         13

Comments: Sample sizes are higher. Set α = 0.05 and Power = 80%.
One-sided alternative
Δ: 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
n: 618  155   69   39   25   17   13   10    8
Two-sided alternative
Δ: 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
n: 785  196   87   49   31   22   16   12   10

Maybe we are not comparing apples with apples! The numbers for the two-sided alternative when α = 0.10 and Power = 80% are exactly the same as for the one-sided alternative when α = 0.05 and Power = 80%.

The R code for the standard normal curve picture presented above is given by:
> xv <- seq(-3, 3, 0.01)
> yv <- dnorm(xv)
> plot(xv, yv, type = "l", xlab = "X", ylab = "Density", main = "Standard
+ Normal Curve")
> axis(side = 1, at = c(-1.5, 1.5), labels = c("-c", "c"))
> polygon(c(xv[xv <= -1.5], -1.5), c(yv[xv <= -1.5], yv[xv == -3]), col =
+ "lightgray")
> polygon(c(xv[xv >= 1.5], 1.5), c(yv[xv >= 1.5], yv[xv == 3]), col =
+ "lightgray")
> text(-2, 0.05, "Alpha/2")
> text(2, 0.05, "Alpha/2")
> text(0, 0.2, "1 - Alpha")

The case when the population standard deviation σ is unknown
This scenario is more natural. The other features, such as normality, remain the same. The test statistic used is now different. Let us check where the differences occur.

The case of the one-sided alternative
H0: µ = µ0 (specified)
and the alternative
H1: µ > µ0 (one-sided alternative).
The population standard deviation σ is unknown.
Potential data: X1, X2, … , Xn

Test statistic: Student's t-statistic
t = (X̄ − µ0)/(S/√n),
where S is the sample standard deviation.

Test: Reject the null hypothesis H0 in favor of the alternative if t > t_{n-1,1-α}. The critical value t_{n-1,1-α} comes from the Student's t-distribution: the area under the Student's t-curve on n-1 degrees of freedom to the left of t_{n-1,1-α} is 1 – α.

What should be the sample size n?
Specifications:
1. Level of significance: α (Type I error probability)
2. Power: 1 – β (1 – Type II error probability)
3. Effect size: Δ = (µ1 − µ0)/σ (standardized difference)

Jargon
What should be the sample size n? With the false positive (Type I error) probability controlled at α, what should be the sample size n so that the test is able to detect the standardized difference Δ with power 1 – β?

How does a mathematical statistician come up with a formula for n? He needs the distribution of the test statistic t under the null hypothesis:
t ~ t-distribution with n-1 degrees of freedom.
He needs the distribution of the test statistic t under the alternative hypothesis. The alternative hypothesis is hazy. It stipulates only that the unknown mean µ > µ0. One cannot find the distribution of t. Be concrete. Stipulate the value of the mean µ. Set a value µ = µ1 (> µ0). Under the alternative hypothesis,
t ~ non-central t-distribution with n-1 degrees of freedom and non-centrality parameter √n·Δ.
The probability density function is very complicated and comes in the form of an infinite series. The calculation of the required sample size is much harder, and one has to use quadrature formulas to evaluate the integrals that show up in the calculations. The equation that emerges involves three entities: sample size n, power, and effect size Δ. If one specifies two of these three entities, one can solve the equation for the third. Typically, we specify the power and effect size. The sample size n can then be solved from the equation.

An important lesson: You do not have to come up with a value of µ under the environment of the alternative hypothesis. Spell out the standardized difference Δ.

There is no explicit formula for the sample size. R has a package 'pwr,' which helps us in determining the sample size. Download and activate this package. Let us use this package. Look at the documentation of pwr. The documentation is attached to the notes.
> ?pwr
Look at the documentation of pwr.t.test.
> ?pwr.t.test
Set Level = 0.05, effect size = 0.5, and power = 0.80.
> MB <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "one.sample",
+ alternative = "greater")
> MB
     One-sample t test power calculation
              n = 26.13751
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = greater
> names(MB)
[1] "n"           "d"           "sig.level"   "power"       "alternative"
[6] "note"        "method"
> MB$n
[1] 26.13751

Goal:
Level = 0.01, 0.05, 0.10
Power = 0.80, 0.90, 0.95
Effect size = 0.1 (0.1) 0.9
Calculate the required sample size for each configuration of Level, Power, and Effect Size. Use R.
> Level <- c(0.01, 0.05, 0.10)
> Power <- c(0.80, 0.90, 0.95)
> Delta <- seq(0.1, 0.9, 0.1)
> for (i in 1:3)
+ {
+ for (j in 1:3)
+ {
+ for (k in 1:9)
+ {
+ Sizes[i, j, k] <- pwr.t.test(d = Delta[k], sig.level = Level[i], power =
+ Power[j], type = "one.sample", alternative = "greater")$n
+ }
+ }
+ }
> round(Sizes)
, , 1
     [,1] [,2] [,3]
[1,] 1006 1304 1580
[2,]  620  858 1084
[3,]  452  658  857
, , 2
     [,1] [,2] [,3]
[1,]  254  328  397
[2,]  156  215  272
[3,]  114  165  215
, , 3
     [,1] [,2] [,3]
[1,]  114  147  178
[2,]   70   97  122
[3,]   51   74   96
, , 4
     [,1] [,2] [,3]
[1,]   65   84  101
[2,]   40   55   69
[3,]   29   42   54
, , 5
     [,1] [,2] [,3]
[1,]   43   55   66
[2,]   26   36   45
[3,]   19   27   35
, , 6
     [,1] [,2] [,3]
[1,]   31   39   47
[2,]   19   25   31
[3,]   13   19   25
, , 7
     [,1] [,2] [,3]
[1,]   23   29   35
[2,]   14   19   24
[3,]   10   14   18
, , 8
     [,1] [,2] [,3]
[1,]   18   23   27
[2,]   11   15   18
[3,]    8   11   14
, , 9
     [,1] [,2] [,3]
[1,]   15   19   22
[2,]    9   12   15
[3,]    7    9   11

Let us name the rows, columns, and slices.
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effect 0.2", "Effect 0.3", "Effect 0.4",
+ "Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8", "Effect 0.9")
> round(Sizes)
, , Effect 0.1
           Power 0.80 Power 0.90 Power 0.95
Level 0.01       1006       1304       1580
Level 0.05        620        858       1084
Level 0.10        452        658        857
, , Effect 0.2
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        254        328        397
Level 0.05        156        215        272
Level 0.10        114        165        215
, , Effect 0.3
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        114        147        178
Level 0.05         70         97        122
Level 0.10         51         74         96
, , Effect 0.4
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         65         84        101
Level 0.05         40         55         69
Level 0.10         29         42         54
, , Effect 0.5
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         43         55         66
Level 0.05         26         36         45
Level 0.10         19         27         35
, , Effect 0.6
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         31         39         47
Level 0.05         19         25         31
Level 0.10         13         19         25
, , Effect 0.7
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         23         29         35
Level 0.05         14         19         24
Level 0.10         10         14         18
, , Effect 0.8
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         18         23         27
Level 0.05         11         15         18
Level 0.10          8         11         14
, , Effect 0.9
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         15         19         22
Level 0.05          9         12         15
Level 0.10          7          9         11

Comments
You do not know the population standard deviation σ. You use the sample standard deviation in your test statistic. You are penalized: the sample sizes are higher.

One-sample problem, one-sided alternative, σ known. Set α = 0.05 and Power = 0.80.
Δ: 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
n: 618  155   69   39   25   17   13   10    8
One-sample problem, one-sided alternative, σ unknown. Set α = 0.05 and Power = 0.80.
Δ: 0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
n: 620  156   70   40   26   19   14   11    9
Typically, you need at most four more observations.

Let us now focus on the two-sided alternative. The null hypothesis is
H0: µ = µ0 (specified)
and the alternative
H1: µ ≠ µ0 (two-sided alternative).
The population standard deviation σ is unknown. The test statistic t remains the same. There is a modification to the test.
Test: Reject the null hypothesis if |t| > t_{n-1,1-α/2}.
The area under the Student's t-curve with n-1 degrees of freedom to the left of the critical value t_{n-1,1-α/2} is 1 – α/2.

What should be the sample size n?
Levels: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
Effect Size: 0.1 (0.1) 0.9
Homework Problem No. 3: Calculate the required sample size for each of the 81 configurations.
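As a starting point for this homework, a single configuration can be checked with a call of the following form (a sketch; run it after loading 'pwr', output omitted):

> pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "one.sample",
+ alternative = "two.sided")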
One-sample problem: Estimation environment
Set-up: The observable entity X has a normal distribution with mean µ and standard deviation σ. Suppose σ is known and µ is unknown.
Potential data: X1, X2, … , Xn

Error of interval estimation
Our focus is on the estimation of the unknown population mean µ. A 95% confidence interval is given by
X̄ − 1.96·σ/√n ≤ µ ≤ X̄ + 1.96·σ/√n.
The chances that the random interval covers the population mean are 95%.
Margin of error (of interval estimation) = length of the confidence interval = 2·1.96·σ/√n.
We want to control the margin of error. Specify the margin of error d you are willing to tolerate. Set
d = 2·1.96·σ/√n
and solve for n:
n = (2·1.96·σ/d)²

Specifications
1. Confidence level (here I have taken it to be 95%)
2. Population standard deviation σ
3. Margin of error of interval estimation d

Example
The average miles per gallon (mpg) µ is under focus for a certain brand of cars. The standard deviation σ is 1 mpg. A 95% confidence interval is to be entertained. The margin of error of interval estimation is specified to be d = ½ mpg. What should be the sample size?
n = (2·1.96·1/0.5)² = 61.4656 ~ 62

An alternative approach: error of estimation
A point estimate of the population mean µ is X̄. The error of estimation is defined to be |X̄ − µ|. One specifies the amount d of error one is willing to tolerate with a specified probability, say 95%. Mathematically, this results in an equation:
Pr(|X̄ − µ| ≤ d) = 0.95.
The chances that the sample mean is within d units of the population mean are 95%. This gives a formula for n:
n = (1.96·σ/d)²

Example
The average miles per gallon (mpg) µ is under focus for a certain brand of cars. The standard deviation σ is 1 mpg. A 95% probability is to be entertained. The margin of error of estimation is specified to be d = ½ mpg. What should be the sample size?
n = (1.96·1/0.5)² = 15.3664 ~ 16
With this choice of n, we have
Pr(X̄ − 0.5 ≤ µ ≤ X̄ + 0.5) = 0.95.
The length of the interval is 1, as it should be. With the choice of n = 62,
Pr(X̄ − 1.96·σ/√n ≤ µ ≤ X̄ + 1.96·σ/√n) = Pr(X̄ − 0.25 ≤ µ ≤ X̄ + 0.25) = 0.95.
The length of the interval is 0.5, as it should be. No wonder we needed a bigger sample size.

What happens if the population standard deviation is unknown?
Error of estimation: Stein's two-step procedure
Specifications
1. Margin of error: d
2. Confidence level: 1 – α
Goal: What should be the sample size n so that Pr(|X̄ − µ| ≤ d) = 1 – α?
Step 1: Take an initial sample of size m (> 2). (Whatever your budget permits.) Let S² be the sample variance.
Step 2: Calculate n = max{m, [S²·t²_{m-1,1-α/2}/d²] + 1}. The symbol [x] means the greatest integer ≤ x.
Step 3: Take an additional sample of n – m observations.
Step 4: Let X̄ be the sample mean of all the n observations.
Stein (1945) has shown that the sample mean satisfies the equation Pr(|X̄ − µ| ≤ d) = 1 – α.

Illustration
Let us go back to the gas mileage problem. The population standard deviation σ is unknown. The experimenter has taken m = 5. The sample data: 24.5, 26, 25, 24.9, 24.6 mpg. S² = 0.355.
Specifications
d = ½ mpg
1 – α = 0.95
t_{m-1,1-α/2} = t_{4,0.975} = 2.776
The area under the Student's t-curve with 4 degrees of freedom to the left of 2.776 is 97.5%.
> qt(0.975, 4)
[1] 2.776445
Calculate [S²·t²_{m-1,1-α/2}/d²] = [0.355·(2.776)²/(0.5)²] = [10.94] = 10.
n = max{5, 10 + 1} = 11
Conclusion: Take an additional sample of 6 observations.

How does one choose the initial sample size? A discussion is presented in Moshman (1958) and Seelbinder (1953).
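The two-step arithmetic is easy to script; here is a minimal sketch in R for the illustration above (the object names are mine):

> mpg <- c(24.5, 26, 25, 24.9, 24.6)
> m <- 5; d <- 0.5; alpha <- 0.05
> S2 <- var(mpg)   # sample variance of the initial sample, 0.355
> n <- max(m, floor(S2 * qt(1 - alpha/2, m - 1)^2 / d^2) + 1)
> n
[1] 11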
Chapter 3: Two-Sample Problems: Quantitative Responses

Set-up: Two populations are under focus. The same numerical characteristic of the populations is under scrutiny. (To fix ideas, suppose the numerical characteristic is systolic blood pressure.)
X = the value of the numerical characteristic of a random subject from Population 1.
Y = the value of the numerical characteristic of a random subject from Population 2.
X ~ N(µ1, σ1)
Y ~ N(µ2, σ2)
Null hypothesis: hypothesis of skepticism
H0: µ1 = µ2
Alternative hypothesis: hypothesis of optimism
H1: µ1 > µ2 (one-sided alternative), or
H1: µ1 < µ2 (one-sided alternative), or
H1: µ1 ≠ µ2 (two-sided alternative).
We want to draw a random sample of size m from Population 1 and of size n from Population 2. What should be the values of m and n?

Scenario 1
The standard deviations σ1 and σ2 are known and equal. Let the common value be σ. Look at the testing problem.
H0: µ1 = µ2
H1: µ1 > µ2 (one-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. The absolute difference |µ1 − µ2| or the effect size Δ = (µ1 − µ2)/σ. Spell out the difference between the population means under the alternative hypothesis.
Test statistic: Z = (X̄ − Ȳ)/(σ·√(1/m + 1/n))
Test: Reject the null hypothesis if Z > z_{1-α}.
Sample sizes m = n are recommended. The literature says that power is maximized when m = n.
Sample size per group:
m = n = 2·(z_{1-α} + z_{1-β})²/Δ²
Compare this with the formula in the single-sample case for the one-sided alternative.

Look at the following testing problem.
H0: µ1 = µ2
H1: µ1 ≠ µ2 (two-sided alternative)
The specifications are the same. The test statistic is the same. The test is slightly different.
Test: Reject the null hypothesis if |Z| > z_{1-α/2}.
Sample size formula:
m = n = 2·(z_{1-α/2} + z_{1-β})²/Δ²

Some reflections
1. When it comes to two populations, the sample size required per group is double that of the single-sample case. One needs to take care of the variation present not just in one population but in both populations.
2. More effort and more money are needed when comparing the means of two populations.

Scenario 2
The standard deviations σ1 and σ2 are known but unequal.
One-sided alternative
H0: µ1 = µ2
H1: µ1 > µ2 (one-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. The absolute difference |µ1 − µ2|: the difference between the population means under the alternative hypothesis.
Test statistic: Z = (X̄ − Ȳ)/√(σ1²/m + σ2²/n)
Test: Reject the null hypothesis if Z > z_{1-α}.
Equal sample sizes are not recommended. Take a larger sample from the population which has more variation. One choice: let k be the ratio of the variances, i.e.,
k = σ1²/σ2².
Formula for n, the sample size for Population 2:
n = 2·σ2²·(z_{1-α} + z_{1-β})²/(µ1 − µ2)²
m = k·n
You can choose the number k any way you want. It does not have to be the ratio of the variances. For your own choice of k, the formulas are:
n = (σ1²/k + σ2²)·(z_{1-α} + z_{1-β})²/(µ1 − µ2)²
m = k·n

Scenario 3
The standard deviations σ1 and σ2 are unknown but equal. Let the common value be σ, which is unknown. This is more challenging and interesting. Look at the testing problem.
H0: µ1 = µ2
H1: µ1 > µ2 (one-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. Effect size: Δ = (µ1 − µ2)/σ
Test statistic: t = (X̄ − Ȳ)/(S·√(1/m + 1/n)),
where S is the pooled standard deviation of the data.
Test: Reject the null hypothesis if t > t_{m+n-2,1-α}.
Sample sizes m = n are recommended. The literature says that power is maximized when m = n. The common sample size is hard to compute. The integral equation, which involves a non-central t-distribution, is hard to solve by hand. We need software.
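For Scenarios 1 and 2, though, the closed-form formulas above are one-liners in R. A minimal sketch for Scenario 1 (the helper name n.per.group is mine; the specifications anticipate the illustration that follows):

> # per-group sample size: common known sigma, one-sided alternative
> n.per.group <- function(alpha, power, delta)
+ 2 * (qnorm(1 - alpha) + qnorm(power))^2 / delta^2
> n.per.group(0.05, 0.90, 0.4)   # about 107, the known-sigma figure quoted below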
We will use the 'pwr' package.

Illustration
α = 0.05
Power = 0.90
Effect size = Δ = 0.4
> pwr.t.test(d = 0.4, sig.level = 0.05, power = 0.90, type = "two.sample",
+ alternative = "greater")
     Two-sample t test power calculation
              n = 107.7313
              d = 0.4
      sig.level = 0.05
          power = 0.9
    alternative = greater
NOTE: n is number in *each* group
If the common standard deviation σ is known, the common sample size, for the same specifications, is given by
m = n = 2·(z_{1-α} + z_{1-β})²/Δ² = 107.

Homework Problem No. 4: Two-sample problem. The standard deviations σ1 and σ2 are unknown but equal. Let the common value be σ, which is unknown. Look at the testing problem.
H0: µ1 = µ2
H1: µ1 > µ2 (one-sided alternative)
Level: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
Effect Size: 0.1 (0.1) 0.9
Get an array of the required sample sizes.

Look at the following testing problem.
H0: µ1 = µ2
H1: µ1 ≠ µ2 (two-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. Effect size: Δ = (µ1 − µ2)/σ
Test statistic: t = (X̄ − Ȳ)/(S·√(1/m + 1/n)),
where S is the pooled standard deviation of the data.
Test: Reject the null hypothesis if |t| > t_{m+n-2,1-α/2}.
Sample sizes m = n are recommended. The literature says that power is maximized when m = n. The common sample size is hard to compute. The integral equation, which involves a non-central t-distribution, is hard to solve by hand. We need software. We will use the 'pwr' package.

An illustration
Level = 0.05
Power = 0.90
Effect size = 0.4
> pwr.t.test(d = 0.4, sig.level = 0.05, power = 0.90, type = "two.sample",
+ alternative = "two.sided")
     Two-sample t test power calculation
              n = 132.3105
              d = 0.4
      sig.level = 0.05
          power = 0.9
    alternative = two.sided
NOTE: n is number in *each* group
Sample size per group = 133.

There is another command available in the 'Base.' Look at its documentation. The documentation is attached to the notes.
> ?power.t.test
> power.t.test(delta = 0.4, sig.level = 0.05, power = 0.90, type = "two.sample",
+ alternative = "two.sided")
     Two-sample t test power calculation
              n = 132.3106
          delta = 0.4
             sd = 1
      sig.level = 0.05
          power = 0.9
    alternative = two.sided
NOTE: n is number in *each* group

Scenario 4
The standard deviations σ1 and σ2 are unknown and possibly unequal. The Welch test is used to test the equality of the population means. No satisfactory treatment is available for calculating the required sample sizes.

Estimation perspective
Absolute estimation error: |(X̄ − Ȳ) − (µ1 − µ2)|

Scenario 1
The population standard deviations σ1 and σ2 are known.
Specifications
1. Confidence level: 1 – α
2. Tolerable maximum error: d
Goal: Find the sample sizes m and n so that
Pr(|(X̄ − Ȳ) − (µ1 − µ2)| ≤ d) = 1 – α.
The chances that the difference in the sample means is within d units of the difference between the population means are 1 – α. The above equation cannot be solved uniquely for both m and n. Choose a constant k and take m = k·n.
Formulas
n = [(σ1²/k + σ2²)·(z_{1-α/2}/d)²] + 1
m = k·n

Example
Goal: Estimate the difference in mean pulse rates for men and women to within one beat with probability at least 0.95. The common standard deviation σ is 2 beats. Here, d = 1 and 1 – α = 0.95. Take k = 1. Then m = n = 31.

Scenario 2
The population standard deviations σ1 and σ2 are equal but unknown.
Specifications
1. Confidence level: 1 – α
2. Tolerable maximum error: d
Goal: Find the sample sizes m and n so that
Pr(|(X̄ − Ȳ) − (µ1 − µ2)| ≤ d) = 1 – α.
The chances that the difference in the sample means is within d units of the difference between the population means are 1 – α. We take m = n. No formula is available. A two-stage procedure has been worked out.
Step 1: Take initial samples of the same size s from each population. Calculate the pooled variance S² of the two samples. Here, the pooled variance is the average of the individual sample variances.
Step 2: Calculate c = d²/(2·t²_{2s-2,1-α/2}).
Step 3: Calculate n = max{s, [S²/c] + 1}.
Step 4: Take additional samples of size n – s, if necessary, from each of the populations.
It has been demonstrated that this two-stage procedure ensures the validity of the tolerance equation.

Example
Goal: Estimate the difference in mean pulse rates for men and women to within one beat with probability at least 0.95. The common standard deviation σ is unknown. Here, d = 1 and 1 – α = 0.95. A sample of 5 subjects from each of the men's and women's groups yielded the following data.
Men: 81, 79, 78, 82, 80
Women: 77, 76, 78, 79, 77
Pooled variance = S² = 1.9
> qt(0.975, 8)
[1] 2.306004
c = 1²/(2·(2.306)²) = 0.094
n = max{5, [1.9/0.094] + 1} = max{5, 21} = 21
An additional sample of 16 subjects from each group would do.

Scenario 3
The population standard deviations σ1 and σ2 are unequal and unknown. Hard.

Paired t-test
Set-up
A subject in the population gives a pair (X, Y) of measurements on a numerical characteristic. Typically,
X = measurement before the treatment begins
Y = measurement after the treatment.
Here X and Y are correlated. (In the traditional two-sample t-test, X and Y are independent.) Assume X and Y have a bivariate normal distribution with means µ1 and µ2, standard deviations σ1 and σ2, respectively, and correlation coefficient ρ. Let σ be the standard deviation of the difference D = X – Y.
The hypothesis of skepticism is:
H0: µ1 = µ2
The alternative is whatever you want: one-sided or two-sided.
Structure of the data
(X1, Y1), (X2, Y2), … , (Xn, Yn)
The data come in pairs. What is n?
Let Di = Xi – Yi. Let D̄ be the mean and S the standard deviation of the differences.
Test statistic: t = D̄/(S/√n)
If the alternative is
H1: µ1 ≠ µ2,
the test is given by: Reject the null hypothesis if |t| > t_{n-1,1-α/2}.
Specifications
1. Level: α
2. Power: 1 – β
3. Effect size: Δ = (µ1 − µ2)/σ
Since we are dealing with the difference D, it looks as though we are in the environment of a single-sample problem. Yes, it is indeed true. Let us look at an example.
Level = 0.05
Power = 0.80
Effect size = 0.5
> pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "paired",
+ alternative = "two.sided")
     Paired t test power calculation
              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
NOTE: n is number of *pairs*
> pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "one.sample",
+ alternative = "two.sided")
     One-sample t test power calculation
              n = 33.36713
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Chapter 4: Multi-Sample Problems – Quantitative Responses – Analysis of Variance

Suppose we are testing the equality of k population means.
H0: μ1 = μ2 = … = μk
H1: H0 not true.
Assumptions: Each of the k populations is normally distributed, with a common standard deviation σ.
Specifications: For sample size calculations, specify the level, power, alternative population means, and within-population standard deviation σ.
Between-populations variance: σ²_pop = (1/k)·Σ_{i=1}^{k} (µi − µ̄)², where µ̄ = (1/k)·Σ_{i=1}^{k} µi is the average of the population means.
Effect size: Δ = σ_pop/σ = between-population sd / within-population sd.
In engineering terminology, σ is the noise present in the populations and σ_pop is a measure of the signal. There are so many signals, one per population, that we need to measure how close they are.
If the signals are very close, the alternative and null hypotheses are very close, and you will need a very large sample to detect the difference.

Examples

No. of populations = 4
Within-population standard deviation = σ = 1

Alternative means    σ_pop^2     Effect Size Δ = σ_pop/σ
1, 1, 1, 1.1         0.001875    0.043
1, 1, 1, 2           0.1875      0.433
1, 1, 2, 2           0.25        0.5
1, 2, 3, 4           1.25        1.118

Use R to show how these calculations are done.

Look at Scenario 1.

H0: μ1 = μ2 = μ3 = μ4
H1: μ1 = 1, μ2 = 1, μ3 = 1, μ4 = 1.1

H0 and H1 are almost indistinguishable. If we want to detect H1 with a good amount of probability, we will need a very, very large sample from each population.

An important discovery: The sample size depends only on the effect size.

Test Statistic: F

Under the null hypothesis, the distribution of the F statistic is the F distribution with numerator degrees of freedom k – 1 and denominator degrees of freedom k(n – 1). I am assuming that a sample of the same size n is drawn from each population. Under the alternative hypothesis (with the specified means), the distribution of the F statistic is a non-central F distribution with numerator degrees of freedom k – 1, denominator degrees of freedom k(n – 1), and a non-centrality parameter determined by the effect size (λ = n*k*Δ^2). There is no need to specify the population means under the alternative hypothesis. It is enough to specify the effect size Δ.

Specifications

Level = 0.05
Power = 0.80
No. of populations = 3
within.var = 1
between.var = 0.04
Effect size = sqrt(between.var)/sqrt(within.var) = 0.2

Use R.

> power.anova.test(groups = 3, between.var = 0.04, within.var = 1, sig.level = 0.05, power = 0.80)

     Balanced one-way analysis of variance power calculation

         groups = 3
              n = 121.4378
    between.var = 0.04
     within.var = 1
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

Specifications

Level = 0.05
Power = 0.80
No. of populations = 3
within.var = 4
between.var = 0.16
Effect size = sqrt(between.var)/sqrt(within.var) = 0.2

Effect size is the same as in the previous example.

> power.anova.test(groups = 3, between.var = 0.16, within.var = 4, sig.level
+ = 0.05, power = 0.80)

     Balanced one-way analysis of variance power calculation

         groups = 3
              n = 121.4378
    between.var = 0.16
     within.var = 4
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

We get the same sample size per group. Why?

Specifications

Level = 0.05
Power = 0.80
No. of populations = 3
within.var = 9
between.var = 0.36
Effect size = sqrt(between.var)/sqrt(within.var) = 0.2

> power.anova.test(groups = 3, between.var = 0.36, within.var = 9, sig.level
+ = 0.05, power = 0.80)

     Balanced one-way analysis of variance power calculation

         groups = 3
              n = 121.4378
    between.var = 0.36
     within.var = 9
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

We get the same sample size. Why?

Moral: No matter what the configurations of the alternative means and the within-population standard deviation are, the required sample size remains the same as long as the effect size is the same.

Some guidelines from the social sciences and psychology

Small effect size = 0.10
Medium effect size = 0.25
Large effect size = 0.40

Specifications (fill in your own)

Level = ?
Power = ?
No. of populations = ?
Effect size = ?

The command ‘pwr.anova.test’ also works. Let us get an array of required sample sizes under a number of configurations.

Level: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
Effect Size: 0.1(0.1)0.9
Populations: 3

Let us use R to get the sample sizes.
First set up the grids of specifications and an empty 3 x 3 x 9 array to hold the results.

> Level <- c(0.01, 0.05, 0.10)
> Power <- c(0.80, 0.90, 0.95)
> Delta <- seq(0.1, 0.9, 0.1)
> Sizes <- rep(0, 3*3*9)
> dim(Sizes) <- c(3, 3, 9)
> for (i in 1:3)
+ {
+ for (j in 1:3)
+ {
+ for (k in 1:9)
+ {
+ Sizes[i, j, k] <- pwr.anova.test(k = 3, f = Delta[k], sig.level = Level[i],
+ power = Power[j])$n
+ }
+ }
+ }
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effect 0.2", "Effect 0.3", "Effect 0.4",
+ "Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8", "Effect 0.9")
> round(Sizes)
, , Effect 0.1
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        464        582        690
Level 0.05        322        423        516
Level 0.10        258        349        435
, , Effect 0.2
           Power 0.80 Power 0.90 Power 0.95
Level 0.01        117        147        174
Level 0.05         81        106        130
Level 0.10         65         88        109
, , Effect 0.3
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         53         66         78
Level 0.05         37         48         58
Level 0.10         29         40         49
, , Effect 0.4
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         30         38         45
Level 0.05         21         27         33
Level 0.10         17         23         28
, , Effect 0.5
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         20         25         29
Level 0.05         14         18         22
Level 0.10         11         15         18
, , Effect 0.6
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         14         18         21
Level 0.05         10         13         15
Level 0.10          8         11         13
, , Effect 0.7
           Power 0.80 Power 0.90 Power 0.95
Level 0.01         11         13         16
Level 0.05          8         10         12
Level 0.10          6          8         10
, , Effect 0.8
           Power 0.80 Power 0.90 Power 0.95
Level 0.01          9         11         12
Level 0.05          6          8          9
Level 0.10          5          6          8
, , Effect 0.9
           Power 0.80 Power 0.90 Power 0.95
Level 0.01          7          9         10
Level 0.05          5          6          7
Level 0.10          4          5          6

Homework Problem 4: Get the required sample sizes when the number of populations is 4 under a choice of three levels, three powers, and nine effect sizes.

Chapter 5: Correlations

A subject in the targeted population gives a pair of measurements X and Y on possibly different numerical characteristics. For example, let X = Weight and Y = Height. Or, X = SBP and Y = DBP. Assume bivariate normality of the joint distribution of X and Y. Let ρ be the correlation between X and Y. The entity ρ is a measure of association between X and Y.

The hypothesis of skepticism is:

H0: ρ = 0

X and Y are not associated. Under bivariate normality, X and Y are then statistically independent.

Structure of the data

(X1, Y1), (X2, Y2), … , (Xn, Yn)

What is n? Let r be the sample correlation coefficient.

Test statistic: t = sqrt(n – 2) * r / sqrt(1 – r^2)

The alternative is:

H1: ρ ≠ 0

Test: Reject the null hypothesis if |t| > t_{n-2, 1-α/2}.

Specifications
1. Level
2. Power
3. The value of ρ under the alternative

The distribution of the t-statistic under the null hypothesis is Student’s t with n – 2 degrees of freedom. The distribution of the t-statistic under the alternative hypothesis is a non-central t with n – 2 degrees of freedom and a non-centrality parameter that depends on n and the specified value of ρ.

Sample size calculations are simple.

Specifications: α = 0.05; Power = 0.80; ρ = 0.3

> pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.80, alternative = "two.sided")

     approximate correlation power calculation (arctangh transformation)

              n = 84.74891
              r = 0.3
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Under the two-sided alternative, what happens if ρ = -0.3? We get exactly the same number.

> pwr.r.test(r = -0.3, sig.level = 0.05, power = 0.80, alternative = "two.sided")

     approximate correlation power calculation (arctangh transformation)

              n = 84.74891
              r = 0.3
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
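The same command also runs in reverse: fix n and omit ‘power,’ and it reports the power the study would have. A quick sketch, assuming a study with only 50 pairs:

> pwr.r.test(n = 50, r = 0.3, sig.level = 0.05,
+ alternative = "two.sided")$power    # about 0.57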
We can have the whole array of sample sizes to cover a variety of configurations of the specifications.

Specifications

Level: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
Rho: 0.1(0.1)0.9

The following is the R code.

> Level <- c(0.01, 0.05, 0.10)
> Power <- c(0.80, 0.90, 0.95)
> Rho <- seq(0.1, 0.9, 0.1)
> Sizes <- rep(0, 3*3*9)
> dim(Sizes) <- c(3, 3, 9)
> for (i in 1:3)
+ {
+ for (j in 1:3)
+ {
+ for (k in 1:9)
+ {
+ Sizes[i, j, k] <- pwr.r.test(r = Rho[k], sig.level = Level[i], power = Power[j],
+ alternative = "two.sided")$n
+ }
+ }
+ }
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 80%", "Power 90%", "Power 95%")
> dimnames(Sizes)[[3]] <- c("Rho 0.1", "Rho 0.2", "Rho 0.3", "Rho 0.4", "Rho 0.5",
+ "Rho 0.6", "Rho 0.7", "Rho 0.8", "Rho 0.9")
> round(Sizes)
, , Rho 0.1
           Power 80% Power 90% Power 95%
Level 0.01      1163      1481      1773
Level 0.05       782      1046      1293
Level 0.10       617       853      1077
, , Rho 0.2
           Power 80% Power 90% Power 95%
Level 0.01       287       365       436
Level 0.05       194       258       319
Level 0.10       153       211       266
, , Rho 0.3
           Power 80% Power 90% Power 95%
Level 0.01       125       158       189
Level 0.05        85       112       138
Level 0.10        67        92       115
, , Rho 0.4
           Power 80% Power 90% Power 95%
Level 0.01        68        86       102
Level 0.05        47        61        75
Level 0.10        37        50        63
, , Rho 0.5
           Power 80% Power 90% Power 95%
Level 0.01        42        52        62
Level 0.05        29        38        46
Level 0.10        23        31        38
, , Rho 0.6
           Power 80% Power 90% Power 95%
Level 0.01        28        34        40
Level 0.05        19        25        30
Level 0.10        16        20        25
, , Rho 0.7
           Power 80% Power 90% Power 95%
Level 0.01        19        23        27
Level 0.05        13        17        20
Level 0.10        11        14        17
, , Rho 0.8
           Power 80% Power 90% Power 95%
Level 0.01        13        16        18
Level 0.05         9        12        14
Level 0.10         8        10        12
, , Rho 0.9
           Power 80% Power 90% Power 95%
Level 0.01         9        10        11
Level 0.05         7         8         9
Level 0.10         6         7         8

Reflections?

Chapter 6: Proportions

Sample size calculations in the case of binary response variables

Single proportion

Problem: A drug is under scrutiny. If it is given to a patient, there are only two possible outcomes: either the patient gets better (1 = Success) or fails to get better (0 = Failure). Let p stand as a generic symbol for the probability of success, which is unknown. We postulate a null hypothesis

H0: p = p0 (specified).

The alternatives are of three kinds.

H1: p < p0 (one-sided alternative)
H1: p > p0 (one-sided alternative)
H1: p ≠ p0 (two-sided alternative)

We need data to test H0 versus H1. A random sample of n patients is taken, and each patient is given the drug. Let X be the number of patients who got better on the drug. Note that X is our data. The possible values of X are 0, 1, 2, … , n. The entity X is an obvious statistic on which to build a test. An estimate of p is given by p̂ = X/n, the sample proportion of patients who got better.

For testing H0 versus H1: p < p0, reject H0 in favor of H1 if

(X/n – p0) / sqrt(p0*(1 – p0)/n) < z_α,

where z_α is such that the area under the standard normal curve to the left of z_α is α, and α is your chosen level of significance.

For testing H0 versus H1: p > p0, reject H0 in favor of H1 if

(X/n – p0) / sqrt(p0*(1 – p0)/n) > z_{1-α},

where z_{1-α} is such that the area under the standard normal curve to the left of z_{1-α} is 1 – α.

For testing H0 versus H1: p ≠ p0, reject H0 in favor of H1 if

|X/n – p0| / sqrt(p0*(1 – p0)/n) > z_{1-α/2},

where z_{1-α/2} is such that the area under the standard normal curve to the left of z_{1-α/2} is 1 – α/2.

All these tests are large-sample tests; the normal approximation is employed to calculate the critical values. The normal approximation works very well if the null proportion is in a close neighborhood of 0.5. If not, the arcsine square root transformation of the statistic X/n is recommended. Technically, calculate sin^{-1}(sqrt(X/n)).
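In R the transformation is a one-liner. For example, with X = 30 successes out of n = 100 (numbers chosen only for illustration):

> asin(sqrt(30/100))    # about 0.5796, in radians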
The ‘pwr’ package calculates sample sizes based on the transformed statistic.

Sample size calculations with given power and level

Specifications
1. Level: 0.05
2. Power: 0.80
3. Null proportion: p0 = 0.5
4. Alternative proportion: p1 = 0.4
5. Alternative: Two-sided

Before we execute the ‘pwr.p.test’ command, we need to translate the numbers p0 and p1 into an effect size h that is operational in the environment of the transformed statistic. The technical definition of the effect size h is:

h = 2*sin^{-1}(sqrt(p1)) – 2*sin^{-1}(sqrt(p0))

In the sample size calculations, the absolute value of h is used. The following R command gives the value of h.

> effect <- ES.h(0.5, 0.4)
> effect
[1] 0.2013579

Let us understand how the effect size h varies with varying values of the alternative p1.

> AltProp <- seq(0.1, 0.9, 0.1)
> AltProp
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
> EffectSizes <- ES.h(0.5, AltProp)
> EffectSizes
[1]  0.9272952  0.6435011  0.4115168  0.2013579  0.0000000 -0.2013579 -0.4115168
[8] -0.6435011 -0.9272952

p1:   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
|h|:  0.93  0.64  0.41  0.20  0.00  0.20  0.41  0.64  0.93

Moral: The farther you move away from the null proportion, the larger the effect size h is in absolute value.

Let us get the sample size.

> pwr.p.test(h = effect, sig.level = 0.05, power = 0.80, alternative =
+ "two.sided")

     proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2013579
              n = 193.5839
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

Now the alternative is p1 = 0.6.

> effect1 <- ES.h(0.5, 0.6)
> pwr.p.test(h = effect1, sig.level = 0.05, power = 0.80, alternative =
+ "two.sided")

     proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2013579
              n = 193.5839
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

The case of two proportions

Example 1

H0: p1 = p2
H1: p1 ≠ p2

For the sample size calculation, spell out the alternative values of the proportions: p1 = 0.3 and p2 = 0.4, say.

Test: Two-sided proportion test
Level = 0.05
Power = 0.80

What is the sample size? We need to calculate the effect size in this context. It is defined by

Effect size = h = 2*sin^{-1}(sqrt(p1)) – 2*sin^{-1}(sqrt(p2)) (arcsine transformation)

The command ‘pwr.2p.test’ demands the effect size. R can calculate the effect size; the command is ‘ES.h.’

> h <- ES.h(0.3, 0.4)
> h
[1] -0.2101589
> h1 <- ES.h(0.4, 0.3)
> h1
[1] 0.2101589

We do a two-sided test. It does not matter whether one takes the effect size h or h1 (the negative of h).

> pwr.2p.test(h = h, sig.level = 0.05, power = 0.80)

     Difference of proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2101589
              n = 355.4192
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: same sample sizes

> pwr.2p.test(h = h1, sig.level = 0.05, power = 0.80)

     Difference of proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.2101589
              n = 355.4192
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: same sample sizes

Comment: We need a sample size of 356 subjects from each group in order to detect the alternative proportions 0.3 and 0.4 with probability 0.80.
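These arcsine-based sample sizes also follow from the usual normal-theory closed forms, which makes for a quick sanity check. A sketch (the formulas are the standard ones; they are not quoted from the ‘pwr’ documentation):

> z <- qnorm(1 - 0.05/2) + qnorm(0.80)
> z^2 / ES.h(0.5, 0.4)^2       # one proportion: 193.58, as above
> 2 * z^2 / ES.h(0.3, 0.4)^2   # two proportions: 355.42 per group, as above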
Chapter 7: Multiple Proportions

Social scientists and psychologists provide ballpark numbers as to what constitute small, medium, and large effect sizes. The Guru of these ideas is Jacob Cohen. The package ‘pwr’ provides the relevant numbers: the command ‘cohen.ES’ within the ‘pwr’ package provides all the answers.

The case of proportions

> cohen.ES(test = "p", size = c("small"))

     Conventional effect size from Cohen (1982)

           test = p
           size = small
    effect.size = 0.2

> cohen.ES(test = "p", size = c("medium"))

     Conventional effect size from Cohen (1982)

           test = p
           size = medium
    effect.size = 0.5

> cohen.ES(test = "p", size = c("large"))

     Conventional effect size from Cohen (1982)

           test = p
           size = large
    effect.size = 0.8

Let me explain the test statistic in the case of a single-proportion problem. Let p̂ be the sample proportion based on a random sample of size n from the population. The test statistic is given by

Z = [2*sin^{-1}(sqrt(p̂)) – 2*sin^{-1}(sqrt(p0))] / sqrt(1/n).

It has been shown that Z has approximately a standard normal distribution even in small samples. This test statistic is better than the conventional test described earlier.

The case of the t-test

What are small, medium, and large effect sizes?

> cohen.ES(test = "t", size = "small")

     Conventional effect size from Cohen (1982)

           test = t
           size = small
    effect.size = 0.2

> cohen.ES(test = "t", size = "medium")

     Conventional effect size from Cohen (1982)

           test = t
           size = medium
    effect.size = 0.5

> cohen.ES(test = "t", size = "large")

     Conventional effect size from Cohen (1982)

           test = t
           size = large
    effect.size = 0.8

The case of correlation

Let ρ be the population correlation coefficient between two quantitative variables.

H0: ρ = 0
H1: ρ ≠ 0; ρ = ρ1 = 0.3, say.

What are the small, medium, and large effect sizes? Here the effect size is

ES = 0.5*ln[(1 + ρ1)/(1 – ρ1)].

> cohen.ES(test = "r", size = "small")

     Conventional effect size from Cohen (1982)

           test = r
           size = small
    effect.size = 0.1

> cohen.ES(test = "r", size = "medium")

     Conventional effect size from Cohen (1982)

           test = r
           size = medium
    effect.size = 0.3

> cohen.ES(test = "r", size = "large")

     Conventional effect size from Cohen (1982)

           test = r
           size = large
    effect.size = 0.5

The case of ANOVA: small, medium, and large effect sizes

> cohen.ES(test = "anov", size = "small")

     Conventional effect size from Cohen (1982)

           test = anov
           size = small
    effect.size = 0.1

> cohen.ES(test = "anov", size = "medium")

     Conventional effect size from Cohen (1982)

           test = anov
           size = medium
    effect.size = 0.25

> cohen.ES(test = "anov", size = "large")

     Conventional effect size from Cohen (1982)

           test = anov
           size = large
    effect.size = 0.4

Multi-proportion problem and the chi-squared test

How do we test the equality of several population proportions? Something to mull over: what is the difference between this problem and ANOVA?

A motivating problem

We want to investigate the effect of the duration of pre-natal care, and the place where that care is received, on the survival of newborn infants. There are two factors involved.

1. Duration of care:
   a. Mother received pre-natal care for less than one month
   b. Mother received pre-natal care for at least one month
2. Clinic where the care was given:
   a. Clinic A
   b. Clinic B

The response variable is whether or not the infant died in the first month of life. The response variable is binary. Four experimental conditions are recognized. The following are the data.

Condition   Duration    Clinic   # Infants   # Died
1           < 1 month   A        179         3
2           ≥ 1 month   A        297         4
3           < 1 month   B        214         17
4           ≥ 1 month   B        25          2

Let us calculate the sample proportions of infants who die in the first month of life.

Condition   Proportion
1           3/179 = 0.0168
2           4/297 = 0.0135
3           17/214 = 0.0794
4           2/25 = 0.0800
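These four proportions are quick to reproduce in R (the variable names are mine):

> Died <- c(3, 4, 17, 2)
> Infants <- c(179, 297, 214, 25)
> round(Died/Infants, 4)
[1] 0.0168 0.0135 0.0794 0.0800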
First impressions

1. I see with my naked eye that there are differences in the proportions. Are these differences statistically significant? My null hypothesis is a hypothesis of skepticism: the underlying population proportions do not depend on the condition. In other words, the four population proportions are the same.
2. My hunch is that the null hypothesis will be rejected.
3. The death proportion in Clinic B is higher than the one in Clinic A. Is the difference significant?
4. Does the duration of pre-natal care matter?

Formulation of the problem

p1 = Probability that the infant dies in the first month of life when the mother receives pre-natal care for less than a month in Clinic A
p2 = Probability that the infant dies in the first month of life when the mother receives pre-natal care for at least a month in Clinic A
p3 = Probability that the infant dies in the first month of life when the mother receives pre-natal care for less than a month in Clinic B
p4 = Probability that the infant dies in the first month of life when the mother receives pre-natal care for at least a month in Clinic B

H0: p1 = p2 = p3 = p4
H1: H0 not true

The null hypothesis avers that the chances of death do not depend on the condition. Use the data collected to test H0 versus H1. One uses a chi-squared test. Arrange the data in the form of a contingency table.

Observed frequencies (O)

Mortality   Condition 1   Condition 2   Condition 3   Condition 4   Total
# Died      3             4             17            2             26
# Alive     176           293           197           23            689
Total       179           297           214           25            715

The numbers in the above table are called the observed frequencies (O). Calculate the expected frequencies (E) if the null hypothesis is true.

Expected frequencies (E)

Mortality   Condition 1    Condition 2    Condition 3    Condition 4   Total
# Died      26*179/715     26*297/715     26*214/715     26*25/715     26
# Alive     689*179/715    689*297/715    689*214/715    689*25/715    689
Total       179            297            214            25            715

Expected frequencies (E) simplified

Mortality   Condition 1   Condition 2   Condition 3   Condition 4   Total
# Died      6.51          10.80         7.78          0.91          26
# Alive     172.49        286.20        206.22        24.09         689
Total       179           297           214           25            715

Interpretation: If the null hypothesis were true, the above would be the frequencies expected in the 8 cells. The observed cell frequencies are different from the expected ones. Are the differences significant? In order to answer this question, we calculate the chi-squared test statistic.

χ² = Σ (O – E)²/E
   = (3 – 6.51)²/6.51 + (176 – 172.49)²/172.49
   + (4 – 10.80)²/10.80 + (293 – 286.20)²/286.20
   + (17 – 7.78)²/7.78 + (197 – 206.22)²/206.22
   + (2 – 0.91)²/0.91 + (23 – 24.09)²/24.09
   = 19.10

Theory

If the null hypothesis is true, the chi-squared statistic has a chi-squared distribution with 3 degrees of freedom asymptotically (in large samples). (Degrees of freedom = # groups – 1.) If the null hypothesis is true, we expect the expected frequencies to be close to the observed frequencies. Consequently, we expect the value of the chi-squared statistic to be small. A large observed value of the chi-squared statistic would cast doubt on the validity of the null hypothesis.

Reject the null hypothesis in favor of the alternative if the observed chi-squared value is large, i.e., > c_α, the upper 100α% critical value. The critical value satisfies the equation

α = Pr(χ²_3 > c_α).

Draw a graph. Use R to get the critical value.

> qchisq(0.05, 3, lower.tail = F)
[1] 7.814728

Test: Reject the null hypothesis if the observed chi-squared value > 7.81.

The observed chi-squared value is 19.10 > 7.81. We reject the null hypothesis. As per the data collected, it seems that the death rate of infants does depend on the conditions.
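The notes say ‘Draw a graph.’ A minimal base-R sketch of that picture, marking the critical value and the observed statistic:

> curve(dchisq(x, df = 3), from = 0, to = 25, ylab = "Density",
+ main = "Chi-squared distribution with 3 df")
> abline(v = qchisq(0.95, 3), lty = 2)   # critical value 7.81
> abline(v = 19.10, lty = 3)             # observed chi-squared value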
We can calculate the p-value associated with the data.

p-value = Pr(χ²_3 > 19.10) = the probability of observing a chi-squared value as large as the one we have gotten when the null hypothesis is true

> pchisq(19.10, 3, lower.tail = F)
[1] 0.0002606860

The null hypothesis is rejected very, very strongly. The observed result is very, very significant.

R can calculate the chi-squared statistic and also the p-value. Let us do this. Input the data in matrix form.

> InfantDeaths <- matrix(c(3, 4, 17, 2, 176, 293, 197, 23), nrow = 2, byrow = T)
> InfantDeaths
     [,1] [,2] [,3] [,4]
[1,]    3    4   17    2
[2,]  176  293  197   23

Label the rows and columns.

> rownames(InfantDeaths) <- c("Died", "Alive")
> colnames(InfantDeaths) <- c("A&<1Month", "A&>1Month", "B&<1Month", "B&>1Month")
> InfantDeaths
      A&<1Month A&>1Month B&<1Month B&>1Month
Died          3         4        17         2
Alive       176       293       197        23

The command for the chi-squared test is ‘chisq.test.’

> chisq.test(InfantDeaths)

        Pearson's Chi-squared test

data:  InfantDeaths
X-squared = 19.0964, df = 3, p-value = 0.0002611

Warning message:
In chisq.test(InfantDeaths) : Chi-squared approximation may be incorrect

The output matches our hand calculations. One can ask for the expected frequencies under the null hypothesis.

> chisq.test(InfantDeaths)$expected
       A&<1Month A&>1Month  B&<1Month  B&>1Month
Died    6.509091      10.8   7.781818  0.9090909
Alive 172.490909     286.2 206.218182 24.0909091

One can ask for the observed frequencies.

> chisq.test(InfantDeaths)$observed
      A&<1Month A&>1Month B&<1Month B&>1Month
Died          3         4        17         2
Alive       176       293       197        23

> O <- chisq.test(InfantDeaths)$observed
> E <- chisq.test(InfantDeaths)$expected
> (O-E)^2/E
       A&<1Month A&>1Month B&<1Month  B&>1Month
Died  1.89177247 4.2814815 10.919669 1.30909091
Alive 0.07138764 0.1615653  0.412063 0.04939966

By looking at the table above, one can find out which cell contributes the most to the chi-squared value, that is, in which of the cells the discrepancy between the observed and expected frequencies is large. The hypothesis of homogeneity of proportions is rejected.

The chi-squared test is traditionally used to test the independence of two categorical variables based on the data collected on the variables. Testing the equality of several population proportions is equivalent to testing the statistical independence of the two categorical variables ‘Mortality’ and ‘Condition,’ in our context.

Power calculations for testing the equality of three proportions

A motivating problem from the Digestive Health Center, Children’s Hospital

There are three gene-based methods of detection of colon cancer.

1. RTPCR (Diagnosis) Response: Yes/No
2. Methylation Response: Yes/No
3. Deletion Response: Yes/No

Let
p1 = probability that the response is Yes when RTPCR is used on a patient with cancer
p2 = probability that the response is Yes when Methylation is used on a patient with cancer
p3 = probability that the response is Yes when Deletion is used on a patient with cancer

Only one method can be used on a patient. Which method is more effective? We start with the hypothesis of skepticism.

H0: p1 = p2 = p3
H1: H0 not true

In order to test the null hypothesis, the investigator intends to use either 40 or 70 cancer patients under each method of detection.

Questions
1. What is the effect size in this context?
2. Calculate the power under each proposed sample size for different effect sizes.

Some additional information: the investigator is very familiar with RTPCR. If cancer is present, the chances are 75% that the response under RTPCR is ‘Yes.’
For the power calculations, we need the following.

1. Level: α (0.05?)
2. Power
3. Effect size

Formula for effect size: the general case of k proportions

H0: p1 = p2 = … = pk
H1: H0 not true.

Spell out what the proportions are under the alternative hypothesis (just as in the Analysis of Variance). Let the specified values be p1 = p11, p2 = p12, … , pk = p1k. The effect size w is defined by

w = sqrt[ (p11 – p0)²/p0 + (p12 – p0)²/p0 + … + (p1k – p0)²/p0 ].

The entity p0 is the common value of the proportions under the null hypothesis. How do I get this?

Example (Courtesy: J. Cohen)

A market researcher is seeking to determine the relative preference by consumers among four different package designs for a new product. Let pi be the true proportion of customers who like Design i, i = 1, 2, 3, 4.

H0: p1 = p2 = p3 = p4
H1: H0 not true.

He wants to find the necessary sample size to conduct a survey. A pilot study of 100 consumers gave p̂1 = 0.3750, p̂2 = 0.2083, p̂3 = 0.2083, p̂4 = 0.2083. Take these proportions as the alternative proportions; he wants a confirmation of these proportions. What about the common value of the proportions under the null hypothesis? Take p0 as the average of these proportions, which is 0.25. Now compute w (0.289).

Let us find out what small, medium, and large effect sizes are in this environment.

> cohen.ES(test = "chisq", size = "small")

     Conventional effect size from Cohen (1982)

           test = chisq
           size = small
    effect.size = 0.1

> cohen.ES(test = "chisq", size = "medium")

     Conventional effect size from Cohen (1982)

           test = chisq
           size = medium
    effect.size = 0.3

> cohen.ES(test = "chisq", size = "large")

     Conventional effect size from Cohen (1982)

           test = chisq
           size = large
    effect.size = 0.5

Look at the documentation on the chi-squared test.

> ?pwr.chisq.test

Let us find the sample size for our problem.

> pwr.chisq.test(w = 0.289, df = 3, sig.level = 0.05, power = 0.80)

     Chi squared power calculation

              w = 0.289
              N = 130.5368
             df = 3
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations

We need to survey 131 consumers.
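Incidentally, the hand computation of w can be checked with the ‘pwr’ helper ‘ES.w1’ (as I read its documentation, the null proportions go first and the alternative proportions second):

> P0 <- rep(0.25, 4)
> P1 <- c(0.3750, 0.2083, 0.2083, 0.2083)
> ES.w1(P0, P1)    # about 0.289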
Chapter 8: McNemar Test

This is similar to the paired t-test, but for a pair of binary variables. For every subject in the targeted population, the response is the outcome of a pair (X, Y) of binary variables.

Example from toxicology (Courtesy: Kanishta)

Two medical devices are being compared for detecting the presence of a particular type of bacteria in human beings.

Device 1: Dosimeter
Device 2: Nasal swab

Design: Take a sample of n subjects. On each subject try both devices.

X = Verdict from the dosimeter: Presence of bacteria: yes or no
Y = Verdict from the nasal swab: Presence of bacteria: yes or no

X and Y are correlated because the responses come from the same subject.

Joint distribution of the responses from a random subject

                 Nasal swab
Dosimeter        Yes    No     Marginal
Yes              p11    p12    p1
No               p21    p22    p2
Marginal         q1     q2     1

p1 = Population probability that the dosimeter gives the verdict that bacteria are present
q1 = Population probability that the nasal swab gives the verdict that bacteria are present

Hypothesis of skepticism: No difference

H0: p1 = q1
H1: p1 ≠ q1

Mathematical aside: p1 = q1 if and only if p12 = p21 (because p1 = p11 + p12 and q1 = p11 + p21). This observation has an important bearing on the development of a test for this testing problem.

A sample of n subjects gives the data

(X1, Y1), (X2, Y2), … , (Xn, Yn).

Each Xi is Yes or No. Each Yi is Yes or No. Arrange the data in the form of a 2x2 contingency table.

                 Nasal swab
Dosimeter        Yes    No     Marginal
Yes              n11    n12    n1
No               n21    n22    n2
Marginal         m1     m2     n

Test Statistic: χ² = (n12 – n21)² / (n12 + n21)

If the null hypothesis is true, the chi-squared statistic has a chi-squared distribution with one degree of freedom. This is important for the computation of the critical value or the p-value.

McNemar test: Reject the null hypothesis if χ² > χ²_{1, 1-α}. The area under the chi-squared distribution with one degree of freedom to the left of χ²_{1, 1-α} is 1 – α.

What about the distribution of the chi-squared statistic under the alternative hypothesis? This is important for the sample size calculation. What is the effect size here?

Specifications
1. Level: α
2. Power: 1 – β
3. Values of p12 and p21 (go back to the joint distribution table)

The closer the values of p12 and p21 are, the closer the values of p1 and q1. The closer the values of p12 and p21 are, the closer the alternative and null hypotheses are. The closer the values of p12 and p21 are, the larger the required sample size is.

An example from Agresti (1990)

A sample of 1600 people is asked whether or not they approve of the performance of the President. Six months later, the same sample is asked whether or not they approve of the performance of the President. The question is whether or not there is a significant shift in the approval rate. The following are the data.

                    2nd Survey
1st Survey          Approve   Disapprove
Approve             794       150
Disapprove          86        570

We want to test the null hypothesis

H0: p1 = q1 (No shift)
H1: p1 ≠ q1

Look for the documentation on the McNemar test.

> ?mcnemar.test

Input the data in matrix form.

> Performance <- matrix(c(794, 86, 150, 570), nrow = 2, dimnames = list(
+ "1st Survey" = c("Approve", "Disapprove"), "2nd Survey" = c("Approve",
+ "Disapprove")))
> Performance
            2nd Survey
1st Survey   Approve Disapprove
  Approve        794        150
  Disapprove      86        570

> MB <- mcnemar.test(Performance)
> MB

        McNemar's Chi-squared test with continuity correction

data:  Performance
McNemar's chi-squared = 16.8178, df = 1, p-value = 4.115e-05

Conclusion: There is a significant shift in the approval rate. It got worse: the approval rate fell from 59% (944/1600 in the first survey) to 55% (880/1600 in the second).

The R package ‘TrialSize’ does sample size calculations for the McNemar test. It came into existence on June 03, 2013. Download and then activate the package.

Specifications
1. Level: 0.05
2. Power = 0.80, or β = 0.20
3. p12 = 0.2 and p21 = 0.5

The command needs the ratio of these two numbers (psai) and their sum (paid). Note that p12 and p21 are not close.

> mb <- McNemar.Test(alpha = 0.05, beta = 0.20, psai = 0.2/0.5, paid = 0.7)
> mb
[1] 58.63224

The required sample size is 59.

Change the specifications.

1. Level: 0.05
2. Power = 0.80, or β = 0.20
3. p12 = 0.2 and p21 = 0.3

Note that p12 and p21 are now close.

> mb1 <- McNemar.Test(alpha = 0.05, beta = 0.20, psai = 0.2/0.3, paid = 0.5)
> mb1
[1] 390.0778

The required sample size is 391.
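The two sample sizes above can be reproduced with the standard large-sample formula for the McNemar test. The following is my own sketch of that formula (I have not inspected the ‘TrialSize’ source, but the numbers agree to several decimal places):

> McNemarSize <- function(alpha, beta, p12, p21) {
+   pd <- p12 + p21        # probability of a discordant pair
+   delta <- p21 - p12     # difference of the discordant probabilities
+   (qnorm(1 - alpha/2)*sqrt(pd) + qnorm(1 - beta)*sqrt(pd - delta^2))^2 / delta^2
+ }
> McNemarSize(0.05, 0.20, 0.2, 0.5)   # 58.63224
> McNemarSize(0.05, 0.20, 0.2, 0.3)   # 390.0778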
Chapter 9: Hazard Ratio in Survival Analysis

A motivating example: The target population is patients with retinitis pigmentosa. The goal is the prevention of visual loss. The treatment proposed is a heavy intake of a vitamin A supplement.

Design

Take a sample of m patients. Each one takes 15,000 IU of vitamin A per day. Follow each one of them for six years. This is the experimental group. Take a sample of n patients. Each one takes 75 IU of vitamin A per day. Follow each one of them for six years. This is the control group.

The response variable is the time T at which vision loss occurs. In the survival analysis jargon, this is called the survival time. The time T is measured in years. In studies like this, censoring occurs. Censoring (dropout) within the period of study could occur for several reasons.

1. Death
2. The subject stops taking the medication because of other diseases.
3. The subject stops taking the medication because of side effects.
4. The subject is unwilling to continue taking the medication.
5. No vision loss occurs during the entire period of the study.

Data structure

Subject   times   status   group
1         2       1        E
2         3       0        C

And so on.

Interpretation: The first subject was in the experimental group. Vision loss occurred in Year 2. This is a genuine observation (status = 1) of the variable T. The second subject was in the control group. He dropped out in Year 3. This is a censored observation of T: he took vitamin A for three years and no vision loss occurred; then he was lost.

The goal is to determine whether taking a mega dose of vitamin A postpones vision loss significantly. This question can be answered via a model-building endeavor. We use the Cox Proportional Hazards model. The covariate is ‘group.’

h(t; group) = h0(t)*exp(γ*group), t > 0

Here h(t; group) is the hazard function of the variable T at time t, and h0(t) is the baseline hazard function. The covariate ‘group’ is codified as 1 (Experimental) or 0 (Control).

HR = Hazard Ratio = exp(γ)

Interpretation: HR = 1 if and only if γ = 0, which means the mega treatment is not effective. HR < 1 means γ < 0, which means the mega treatment is more effective than the control treatment.

Test

H0: HR = 1
H1: HR ≠ 1

How many patients should we recruit under the experimental treatment and how many under the control treatment? The answer is complicated by the threat of censoring: we need to know the rates of censoring under both scenarios. We also need to specify the level, the power, and the HR we are shooting for under the alternative.

Specifications
1. Level: 0.05
2. Power: 0.80, or β = 0.2
3. HR = 0.8

We need lots of other information, such as the censoring rates, before we can calculate sample sizes. We need data from a pilot study. The R package ‘powerSurvEpi’ helps in calculating sample sizes in this environment. The R command ‘ssizeCT’ uses the pilot study data thrown at it to calculate sample sizes. Download this package and activate it.

A pilot study

Berson et al. (1993) initiated a study to examine the effectiveness of the mega treatment. Recruitment was carried out over a two-year period, 1984-87. The study was terminated in September 1991. The data are available in the package under the name ‘Oph.’ Let us download the data and understand it.

> data(Oph)
> dim(Oph)
[1] 354   3
> head(Oph)
  times status group
1     1      1     E
2     1      1     E
3     1      1     E
4     1      0     E
5     1      0     E
6     1      0     E
> summary(Oph)
     times           status      group
 Min.   :1.000   Min.   :0.000   C:182
 1st Qu.:4.000   1st Qu.:0.000   E:172
 Median :5.000   Median :0.000
 Mean   :4.401   Mean   :0.435
 3rd Qu.:5.000   3rd Qu.:1.000
 Max.   :6.000   Max.   :1.000
> table(Oph$status)
  0   1
200 154

Overall censoring rate:

> 200/354
[1] 0.5649718

Isolate the experimental group.

> OphE <- subset(Oph, Oph$group == "E")
> head(OphE)
  times status group
1     1      1     E
2     1      1     E
3     1      1     E
4     1      0     E
5     1      0     E
6     1      0     E
> table(OphE$status)
  0   1
107  65

Censoring rate in the experimental group:

> 107/172
[1] 0.622093

Isolate the control group.

> OphC <- subset(Oph, Oph$group == "C")
> head(OphC)
    times status group
173     1      1     C
174     1      1     C
175     1      1     C
176     1      1     C
177     1      1     C
178     1      1     C
> table(OphC$status)
 0  1
93 89

Censoring rate in the control group:

> 93/182
[1] 0.510989
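Before fitting the model, it helps to look at the two Kaplan-Meier curves. A minimal sketch using the ‘survival’ package (which ‘powerSurvEpi’ loads):

> library(survival)
> KM <- survfit(Surv(times, status) ~ group, data = Oph)
> plot(KM, lty = 1:2, xlab = "Years", ylab = "Proportion free of vision loss")
> legend("bottomleft", legend = c("Control", "Experimental"), lty = 1:2)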
Fit the Cox model.

> MB <- coxph(Surv(times, status) ~ group, data = Oph)
> summary(MB)
Call:
coxph(formula = Surv(times, status) ~ group, data = Oph)

  n= 354, number of events= 154

          coef exp(coef) se(coef)      z Pr(>|z|)
groupE -0.3735    0.6883   0.1633 -2.288   0.0222 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

       exp(coef) exp(-coef) lower .95 upper .95
groupE    0.6883      1.453    0.4998    0.9479

Concordance= 0.548  (se = 0.024 )
Rsquare= 0.015   (max possible= 0.991 )
Likelihood ratio test= 5.31  on 1 df,   p=0.02126
Wald test            = 5.23  on 1 df,   p=0.02216
Score (logrank) test = 5.29  on 1 df,   p=0.0214

Use this pilot data to calculate the sample sizes. The stipulation k = 1 means we want equal sample sizes.

> MB1 <- ssizeCT(Surv(times, status) ~ group, dat = Oph, power = 0.8, k = 1,
+ RR = 0.8, alpha = 0.05)
> names(MB1)
[1] "mat.lambda" "mat.event"  "pC"         "pE"         "ssize"
> MB1$ssize
 nE  nC
706 706

We need to recruit 706 patients under each treatment regimen so that we can be sure, with 80% probability, that a hazard ratio of 0.8 is detectable if it is true.

Get the documentation. CT stands for clinical trial.

> ?ssizeCT
starting httpd help server ... done

CHAPTER 10: Multiple Regression

Set-up

Y: Quantitative response variable
Predictors: X1, X2, … , Xp

Multiple Regression Model

Y = β0 + β1*X1 + β2*X2 + … + βp*Xp + ε, with ε ~ N(0, σ)

Assumptions: The vector (Y, X1, X2, … , Xp) has a multivariate normal distribution. This assumption implies that the conditional distribution

Y | X1, X2, … , Xp ~ N(β0 + β1*X1 + β2*X2 + … + βp*Xp, σ).

This also implies that

E(Y | X1, X2, … , Xp) = β0 + β1*X1 + β2*X2 + … + βp*Xp
Var(Y | X1, X2, … , Xp) = σ²

The standard deviation σ does not depend on the covariates X1, X2, … , Xp (homoscedasticity). It is customary to write the multivariate normal distribution in the multiple regression style presented above.

The squared population multiple correlation coefficient between Y and X = (X1, X2, … , Xp) is usually denoted by ρ²_{Y·X}, or simply Ρ². A standard interpretation is that the percentage of variation in Y accounted for by the predictors is 100*Ρ². A mathematical formula is available for Ρ².

We want to test the null hypothesis H0: Ρ² = 0 against the alternative H1: Ρ² > 0.

Interpretation of the null hypothesis: Y and X1, X2, … , Xp are statistically independent; β1 = β2 = … = βp = 0. As a matter of fact, the null hypothesis and β1 = β2 = … = βp = 0 are equivalent. The covariates X1, X2, … , Xp have nothing to say about Y. They are useless predictors.

Let R² be the sample multiple correlation coefficient.

Test Statistic: F = (R²/p) / [(1 – R²)/(n – p – 1)]

If the null hypothesis is true, F has an F distribution with numerator degrees of freedom p and denominator degrees of freedom n – p – 1. The critical value for a given level can be found from the F distribution in R.

Interpretation of the alternative hypothesis: At least one βi ≠ 0. At least one covariate is a useful predictor.

What should the sample size n be? Spell out the alternative value of Ρ². Under this given alternative value of Ρ², F has a non-central F distribution with numerator degrees of freedom p, denominator degrees of freedom n – p – 1, and non-centrality parameter n*Ρ²/(1 – Ρ²). Now we can develop an integral equation to solve for the desired sample size.

The package MBESS has a command which gives the sample size.

Specifications
1. Level
2. Power
3. Number of predictors
4. Alternative value of the population Ρ²

Look at the documentation.

> ?MBESS

Look at the documentation of the following command.
> ?ss.power.R2

Let us calculate the sample size for the following specifications.

Level = 0.05
Power = 0.80
Ρ² = 0.25
Number of predictors = 5

> MB <- ss.power.R2(Population.R2 = 0.25, alpha.level = 0.05, desired.power =
+ 0.80, p = 5)
> MB
$Necessary.Sample.Size
[1] 45

$Actual.Power
[1] 0.8075842

$Noncentral.F.Parm
[1] 15

$Effect.Size
[1] 0.3333333

Lower the value of Ρ².

> MB <- ss.power.R2(Population.R2 = 0.2, alpha.level = 0.05, desired.power =
+ 0.80, p = 5)
> MB
$Necessary.Sample.Size
[1] 58

$Actual.Power
[1] 0.8078919

$Noncentral.F.Parm
[1] 14.5

$Effect.Size
[1] 0.25

Let us calculate the sample size for each of the configurations coming from:

Level: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
Alternative Ρ²: 0.2, 0.5, 0.8
Number of predictors: 3, 4, 5

There are 3*3*3*3 = 81 configurations. We need a four-dimensional array.

> Level <- c(0.01, 0.05, 0.10)
> Power <- c(0.80, 0.90, 0.95)
> RSquared <- c(0.2, 0.5, 0.8)
> Variables <- c(3, 4, 5)

Create a four-dimensional array of zeros.

> Sizes <- rep(0, 81)
> dim(Sizes) <- c(3, 3, 3, 3)

Label the rows, columns, and doubly-indexed slices.

> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 80%", "Power 90%", "Power 95%")
> dimnames(Sizes)[[3]] <- c("R2 = 0.2", "R2 = 0.5", "R2 = 0.8")
> dimnames(Sizes)[[4]] <- c("p = 3", "p = 4", "p = 5")

Start four loops.

> for (i in 1:3)
+ {
+ for (j in 1:3)
+ {
+ for (k in 1:3)
+ {
+ for (l in 1:3)
+ {
+ Sizes[i, j, k, l] <- ss.power.R2(Population.R2 = RSquared[k], alpha.level =
+ Level[i], desired.power = Power[j], p = Variables[l])$Necessary.Sample.Size
+ }
+ }
+ }
+ }
> Sizes
, , R2 = 0.2, p = 3
           Power 80% Power 90% Power 95%
Level 0.01        68        83        97
Level 0.05        48        61        73
Level 0.10        39        51        62
, , R2 = 0.5, p = 3
           Power 80% Power 90% Power 95%
Level 0.01        22        26        29
Level 0.05        16        19        22
Level 0.10        13        16        19
, , R2 = 0.8, p = 3
           Power 80% Power 90% Power 95%
Level 0.01        11        12        13
Level 0.05         8         9        10
Level 0.10         7         8         9
, , R2 = 0.2, p = 4
           Power 80% Power 90% Power 95%
Level 0.01        74        90       105
Level 0.05        53        67        80
Level 0.10        43        56        68
, , R2 = 0.5, p = 4
           Power 80% Power 90% Power 95%
Level 0.01        24        28        32
Level 0.05        18        21        24
Level 0.10        15        18        21
, , R2 = 0.8, p = 4
           Power 80% Power 90% Power 95%
Level 0.01        12        13        14
Level 0.05        10        11        11
Level 0.10         9         9        10
, , R2 = 0.2, p = 5
           Power 80% Power 90% Power 95%
Level 0.01        80        96       111
Level 0.05        58        72        85
Level 0.10        47        61        73
, , R2 = 0.5, p = 5
           Power 80% Power 90% Power 95%
Level 0.01        26        31        34
Level 0.05        20        23        27
Level 0.10        17        20        23
, , R2 = 0.8, p = 5
           Power 80% Power 90% Power 95%
Level 0.01        13        15        16
Level 0.05        11        12        13
Level 0.10        10        10        11

Understanding?
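As a closing check, the very first answer above can be reproduced from first principles with the non-central F distribution. A sketch, assuming (as the $Noncentral.F.Parm output suggests) that the non-centrality parameter is n*Ρ²/(1 – Ρ²):

> n <- 45; p <- 5; R2 <- 0.25
> lambda <- n * R2 / (1 - R2)                 # 15, the $Noncentral.F.Parm above
> crit <- qf(0.95, df1 = p, df2 = n - p - 1)  # 5% critical value of the null F
> pf(crit, df1 = p, df2 = n - p - 1, ncp = lambda,
+ lower.tail = FALSE)                         # about 0.8076, the $Actual.Power above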