SAMPLE SIZE CALCULATIONS WITH R

SAMPLE SIZE CALCULATIONS WITH R
Instructor: M B Rao
Division of Biostatistics and Epidemiology
Department of Environmental Health
University of Cincinnati
A Two-Day Workshop
At CR RAO AIMSCS
January 2-3, 2014
Outline
Chapter 1: Introduction to R and General Ideas on Sample Size calculations
Chapter 2: Single Sample Problems: Quantitative responses
Chapter 3: Two-sample problems: Quantitative responses
1
Chapter 4: Multi-sample problems: Quantitative responses – Analysis of
Variance – One-way classified data
Chapter 5: Correlations
Chapter 6: Proportions
Chapter 7: Contingency tables – multiple proportions – Odds ratio – McNemar
test
Chapter 8: Survival analysis – Hazard ratio
Chapter 9: Bioequivalence testing
Chapter 10: Diagnostic tests – ROC curves
Chapter 11: Regressions
Chapter 12: Nonparametric tests
Chapter 13: Analysis of Variance with Repeated Measures
Chapter 14: Comparing variability – Inter- and Intra- variability
Chapter 15: Group Sequential Methods
A traditional class on sample size calculations should include all these topics. This
program is too ambitious for a two-day workshop. I will strive to cover as many of
these topics as possible. The major goal is to empower you to carry out sample
size calculations on your own at least in some of the standard scenarios you come
across.
Goals
1. Understand what goes behind the sample size calculations. Why do we
need to spell out the specifications demanded?
2. Learn how to use R to get the required output.
2
Chapter 1
Introduction to R and General Ideas on Sample Size Calculations
Introduction to R
The purpose of the workshop is to show you how to calculate required sample
size, typically, in the environment of testing hypotheses. The medium I use is the
computational software R. It is free and takes less than five minutes to download
the software. A number of statistical software such as SAS, SPSS, BMDP,
StatExact, Stata, PASS, nQuery, etc. do calculate sample sizes. All these software
cost money.
This is a genuine hands-on-experience workshop. I encourage you strongly to
bring your own laptop to the classroom, copy my moves on the big screen, and
see how the output materializes on your laptop. It is fervently hoped that at the
end of the program you will achieve a good degree of mastery in calculating
sample sizes on your own after grasping the underlying idea.
For those who are attending the workshop for credit, there are homework
problems peppered in the text. You can use the last half-hour of the workshop on
the final day to work out the solutions, compile them in a word file, and email the
file to me. You will get a grade for your effort. Tutors will help you if you face
problems in executing R commands.
Commercial software such as SAS and SPSS produce a number of manuals to help
you use their software. R does not do that. There is a committee which oversees
the running of R. It works purely voluntary. However, any one in the world can
write a book on the usage of R, publish it commercially, and profit from it. Major
commercial publishers such as Springer, John Wiley, and Chapman and Hall are
big in publishing books on R. You can start your own company helping small and
big businesses, for a fee, by attending to their computational needs via R. The
company ‘Revolution Analytics’ is a prime example of such an endeavor.
3
If you develop a new methodology for analyzing data and your research is funded
federally (NIH, NSF, etc.), the federal organizations now demand that you develop
an R code for the implementation of the methodology.
SAS is No. 1 among all commercial software. R is not far behind. There are certain
features of SAS such as spread sheet and simulations that work better than R. R is
catching up.
Books on R
For the last five years, books on R are being published at the rate of one per
month. Some books are about general usage of R. Others are specialized. Select a
specific area of research and explain how R can be used in analyzing the data
pertinent to the specific area. At the latest count, there are more than 100 books
published with the major theme being R. I have more than 50 books in my own
collection. Here is a sample.
1. Peter Dalgaard – Introductory Statistics with R, Second Edition – Springer,
2008
2. Brian S Everitt and Torsten Hothorn – A Handbook of Statistical Analyses
Using R - Chapman and Hall, 2006
3. Julian J Faraway – Linear Models with R – Chapman and Hall, 2004
4. Simon N Wood – Generalized Additive Models: An Introduction with R –
Chapman and Hall, 2006
5. John Fox and Sanford Weisberg – An R Companion to Applied Regression,
Second Edition – Sage
6. Andrew Gelman and Jennifer Hill – Data Analysis Using Regression and
Multilevel/Hierarchical Models – Cambridge University Press
7. Andreas Foulkes – Applied Statistical Genetics with R – Springer, 2009
8. Thomas Lumley – Complex Surveys; A Guide to Analysis Using R – John
Wiley, 2010
9. Ding-Geng Chen and Karl Peace – Clinical Trial Data Analysis Using R –
Chapman and Hall, 2011
10. Paul Murrell – R Graphics, Second Edition – Chapman and Hall, 2011
11. Hadley Wickham – ggplot2 – Springer, 2009
12. Deepayan Sarkar – Lattice: Multivariate Visualization with R – Springer,
2011
4
13. Kunio Takezawa – Guidebook to R Graphics Using Microsoft Windows –
John Wiley, 2012
14. Nicholas Horton and Ken Kleinman – Using R for Data Management,
Statistical Analysis, and Graphics – Chapman and Hall, 2011
15. Andrew Robinson and Jeff Hamann – Forest Analytics with R – Springer,
2011
16. Giovanni Seni and John Elder – Ensemble Methods in Data Mining
(Improving Accuracy Through Combining Predictions) – Morgan and
Claypool Publishers, 2010
17. Dynamic Documents with R and knitr – Chapman and Hall, 2013
18. Applied Meta-Analysis with R, 2013
There are no books addressed to sample size calculations using R. I am hoping to
bring out one. The following books do deal with sample size calculations. They are
essentially cookbooks with pages and pages of tables.
1. MM Desu and D Raghavarao – Sample Size methodology – Academic Press,
New York, 1990
2. Jacob Cohen – Statistical Power Analysis for the Behavioral Sciences –
Second Edition – Lawrence Erlbaum Associates, Hillsdale, NJ, 1988
3. Shein-Chung Chow, Jun Shao, and Hansheng Wang – Sample Size
Calculations in Clinical Research – Marcel Dekker, New York, 2003
4. Lloyd D Fisher and Gerald van Belle – Biostatistics – John Wiley, New York,
1993
What is R?
R is a system for data manipulation, analysis, and presentation including graphics.
The system is free and it takes about five minutes to download it. It has two
components.
1. Base
2. Packages
For most of our needs, it is good enough to download only the base. There are
hundreds and hundreds of packages. Each package is tailored for specific needs.
During our sojourn in sample size calculations, we will need two or three
5
packages. When the need arises, I will explain how to download the package of
interest.
What does R give?
In R a statistical analysis is usually performed in a series of steps with
intermediate results being stored in objects. Systems such as SPSS and SAS
provide extensive output, where as R gives minimal output and stores other
results for subsequent usage with other R functions or commands. This means
that R can be tailored to produce exactly the analysis and results that you want
rather than produce an analysis designed to fit all situations. One has a better
control of analysis in R than in SPSS, BMDP, SAS, STATA, etc.
History.
It all started with the introduction of the system S, which is an elegant, widely
accepted, and enduring software system with outstanding conceptual integrity.
The originator of the system is Dr. John Chambers, currently with Stanford
University. The system S was hailed as the mother of all data analysis systems. In
1998, the Association of Computing Machinery (ACM) presented Dr. Chambers
with its Software System Award for ‘the S system, which has forever altered the
way people analyze, visualize, and manipulate data.’ S is not free.
R was inspired by the S environment. R was initially written by Ross Ihaka and
Robert Gentleman at the Department of Statistics, University of Auckland, New
Zealand. Subsequently, a large group of individuals contributed to R by sending
code and bug reports. The current R is the result of a collaborative effort with
contributions from all over the world.
SAS has a single owner. More than 500 people work on the development of SAS
codes. If a new method comes up in the literature, SAS works on developing a
code and incorporate it into its general system. New versions keep coming out
incorporating new methodologies. However, R has a million brains working
behind. New packages are installed periodically. There are several features of
data analysis in R that SAS has not caught up yet.
6
How to download R?
The current version is 3.0.1.
1. Open Internet.
2. Type R.
3. Click on ‘The R project for statistical computing.’ Their web page opens up.
All resources are available from this page: the R system itself, a collection
of add-on packages, manuals, documentation, and more.
4. Click on ‘Download CRAN.’
5. A number of sights are available from which one can download R. Go to
the nearest website. In our case, I think Ohio – Case Western Reserve
University is the nearest. Click on this site.
6. Click on ‘Download R for Windows.’
7. Click on ‘base.’ R has two components: base and packages. For all practical
needs ‘base’ is good enough. There are hundreds of user-contributed
packages available. When the need arises we will download certain
packages. The ‘base’ is about 40 MB.
8. Click on ‘Download R-2.15.1 for Windows.’
9. Click ‘Run.’
10. Follow the next steps. Always click ‘next.’
11. Click ‘Finish.’
12. An R-icon will be created on your desktop.
13. When you click on the icon, an R-console opens up.
14. I will take charge from this point.
How R is maintained and updated?
There is a core team consisting of the following members:
Doug Bates (Madison)
John Chambers (Stanford)
Peter Dalgaard (Denmark)
Robert Gentleman (USA)
7
Kurt Hornik
Stefano Iacus
Ross Ihaka (New Zealand)
Friedrich Leisch
Thomas Lumley
Martin Maeschler
Duncan Murdoch
Paul Murrell (Australia)
Martyn Plummer
Brian Ripley (U.K.)
Duncan Temple Lang
Luke Tierney (Minneapolis-St.Paul)
Simon Urbanek
Suppose you develop an R code for a new method of analysis. You submit the
code to the core team. They will make it available widely on the internet for
comments, improvements, and bugs. After the code is thoroughly vetted for bugs,
it will be made part of the list of packages available in the R website.
Suppose you have a query about R. Put your question on the web. Usually,
someone responds.
Suppose you want use R for your data analysis. You should know what statistical
methodology you want to use. For example, you want to perform cluster analysis.
You would like to know what R command to be used. Type ‘cluster analysis R’ on a
web browser. You will get plenty of help.
8
R cannot teach you statistics. You ought to have a modicum of knowledge of
statistics before you are ready to use R.
Normal, t-, chi-squared, and F distributions are at the core of any statistical data
analysis. Computations involving these distributions can be done using R. Let us
focus on the normal distribution.
Example:
An engineer in collaboration with a pharmaceutical analyst sets up a machine for
manufacturing tablets of a particular medication. The parameters of the machine
are so set that the machine is supposed to produce tablets each with weight 750
mg. It is impossible for the machine to produce tablets each with exactly the same
weight. Variation in the weights is normal to expect. In order to understand the
variation in the weights of the tablets, they took a random sample of 100,000
tablets and determined the weight of each tablet. This is not hard to do. They
built a (relative frequency) histogram of the data and drew the frequency
polygon. The mean of the weights worked out to be 750 mg and the standard
deviation 60 mg approximately. They also observed that the shape of the
empirical density curve was bell-shaped and symmetrical around the mean. Using
mathematics, they fitted the following theoretical curve, called the normal
distribution, to the histogram.
1 x −750
f(x) =
2
1 1 − 2 ( 60 )
e
, x = weight
2π 60
This curve approximates the contour of the histogram very well. They summarized
the entire data collected in the following way.
The weights of the tablets the machine produces are normally distributed with
mean µ = 750 mg and standard deviation σ = 60 mg.
X = Weight ~ N(750, 60)
9
Look at the documentation on the normal distribution.
?pnorm
There are four commands associated with the normal distribution.
dnorm (for plotting a normal curve)
pnorm (for calculating probability for a given quantile q)
qnorm (for calculating quantile for a given probability p)
rnorm (for drawing random samples from a specified normal distribution)
Let us draw the Normal density function with mean 750 and sd 60.
Determine the three sigma limits.
mean – 3*sd = 570
mean + 3*sd = 930
0.004
0.003
0.002
0.001
0.000
Density
0.005
0.006
Normal Density with mean 750 SD 60
600
650
700
750
800
850
Weight in mg
10
900
The relevant r code:
> curve(dnorm(x, mean = 750, sd = 60), from = 570, to = 930, xlab =
"Weight in mg", ylab = "Density", main = "Normal Density with mean 750
SD 60", col = "red", lwd = 2)
lwd = line width
Uses of the other commands:
1.
What proportion of tablets have weights less than 700 milligrams? The
required proportion is the area under the curve to the left of 700. Draw
a picture. If X stands as a generic symbol for the weight of a random
tablet, in probabilistic jargon, we want to find
Pr(X < 700).
We use ‘pnorm’ command of R. We are given the quantile q = 700, we
want the probability.
pnorm(700, mean = 750, sd = 60, lower.tail = T)
0.2023284
2.
What proportion of tablets have weights greater than 820 milligrams? It
is the area under the curve to the right of 820. Equivalently, Pr(X >
820) = ?
< pnorm(820, mean = 750, sd = 60, lower.tail = F)
< 0.1216725
3.
What proportion of tablets have weights between 710 grams and 820
grams? It is the area under the curve between 710 and 820.
Equivalently, Pr(710 < X < 820) = ? Draw a picture.
< pnorm(820, mean = 750, sd = 60, lower.tail + T) – pnorm(710, mean =
750, sd = 60, lower.tail = T)
< 0.625835
4.
For what quantile q the probability of finding a random tablet with
weight less than or equal to q is 60%? In probabilistic jargon,
11
Pr(X ≤ q) = 0.60.
Solve for q.
< qnorm(0.60, mean = 750, sd = 60, lower.tail = T)
< 765.2008
5.
For what quantile q the probability of finding a random tablet with
weight greater than q is 20%? In probabilistic jargon,
Pr(X > q) = 0.2.
Solve for q.
Draw a picture.
< qnorm(0.2, mean = 750, sd = 60, lower.tail = F)
< 800.4973
How to draw random samples from a normal distribution?
Draw a random sample of size 200 from a normal distribution with mean =
2 and standard deviation = 4.
> MB <- rnorm(200, 2, 4)
> MB
[1]
8.789175699 -1.617459744
0.012799405
2.974297318 -1.320160020
[6]
4.762438460
0.405252116
9.115464798
4.783125687
1.323385626
[11]
0.649639930
6.525328905
3.902918248
8.490263075
4.562788788
[16]
5.935500975
0.305411208
1.227126658 -1.391149804 -1.874279574
[21]
9.111568534 -1.846544766
0.833858577
5.269729951
[26]
5.549521744 -2.528451681
0.163028167
1.575618571 -3.419861794
1.728848697
5.969369979
[31] -3.305920381
2.837391560
[36]
6.900390074 -0.490227749 -3.003517591
[41]
2.203254380
[46]
0.911479538 -1.130621539
[51]
0.153871416
0.786931249
0.550904911
4.392683431
2.744203927 -2.988515419
7.883843005 12.072546060
2.284069818
3.391368121
6.909114629
5.453713625
5.618779771 -2.073108816 -0.705343870 -1.928645030
12
[56]
0.381198466
2.962436960 -2.093332755 -4.193534320
9.030390948
[61]
3.524519959
3.537574470 -1.171645319
8.747353307
2.470723311
[66] 10.021189802
3.562368160
1.041737847 -0.034606098 -1.729761583
[71]
6.886239664
7.190669322
8.535222726
[76]
0.423825384
9.502248649
5.936880448 -3.061883462 -3.522890904
[81]
2.186850190
5.704365193 -2.128791820
3.698705126
0.220870726
7.785209121 -8.670565016
[86] -3.147239002 -1.656142496
0.778132536
5.049019307
6.349020934
[91]
3.295050757
9.307084422
5.237748023 -1.903117081
1.847576343
[96]
1.068654452
3.102929390 -7.237209984 -0.740945367
2.638578180
6.202710018 -0.004069719
2.798317650
[101] -2.031485363
5.642531206
[106]
6.882103775 -0.956596247
2.366458786
5.493837266 -1.730639713
[111]
4.982947714
4.417225016
4.788445765 -0.969853481
5.592838127
[116] -3.998173646 -1.481809819
1.011862395 -0.537730789
8.177127446
[121]
1.180374828 -2.311751313 -3.245168209
4.196470783
4.701655399
[126]
5.431580484
0.083231102
0.119834242
7.596097513 -1.111347952
[131] -1.533085470
7.364779690
7.634072923
0.412890235 -1.454538846
[136] -2.185427181
7.358822713
0.062120404
1.800029391
2.571732700
[141] -5.245476732 -7.904310359
3.074961843
1.561507399
2.295087145
[146]
7.580388359
5.253521410
3.940190461 -0.046578566 -5.329969438
[151]
3.387270949
1.582039672
2.780308669 -0.375832591
[156]
0.008395951
2.601680487
9.533725094
0.694111290
7.294418297 10.312334848
[161] -1.087296972 12.304844296
2.138258640 -1.855732245
6.175450959
[166] 10.427188687
9.973692202
8.207523479
5.136222523
4.700412906
[171]
1.739724532 -0.077276770
8.390004806
0.900238953
[176] -7.233447913
3.311313702 -2.657145577 -1.275059616
6.746908004
[181] -0.255951403
2.289077158
0.215136088
2.935440199
6.945107462
[186]
1.986374874
6.191729276
0.574863390 11.084035294
8.594470631
[191]
1.502290158
6.231591151
1.115593634
4.245974009
6.585035363
[196] -0.007240504 -4.124090894 -0.312889427
2.368999413
6.725580792
5.514952161
Four basic commands on the t-distribution
dt (density drawing)
13
pt (percentile calculation)
qt (quantile calculation)
rt (random drawings)
Chi-squared distribution
dchisq
pchisq
qchisq
rchisq
F-distribution
dF
pF
qF
rF
General ideas on sample size calculations
1.
2.
3.
4.
The null hypothesis is a hypothesis of skepticism.
The alternative is a hypothesis of optimism.
Funding agencies put emphasis on the null hypothesis.
Researchers put emphasis on the alternative hypothesis.
Example
A targeted population has average systolic blood pressure µ0 = 130 and standard
deviation σ = 16. The average is high. A researcher developed a drug that is
specifically tailored to the target population. Her contention is that this drug will
help bring the mean down.
14
Design: Take a random sample of n subjects from the target population and
administer the drug for a prescribed period of treatment. What should be the
value of n?
Null hypothesis: Skepticism: The drug has no effect.
H 0:
µ = µ0 = 130
Alternative: Optimism: The drug has indeed some effect.
H 1:
µ < µ0
We do not know which hypothesis is true? No amount of data collected will tell
you that, unless you are a Bayesian. We can only check whether or not the data
collected is consistent with the null hypothesis. We use a test statistic Z, say, and
test built on the test statistic.
If you reject the null hypothesis when it is true, you are raising false hopes. You
are indulging in a false positive. We need to control this type of error (Type I
error). The funding agency wants to control Type I error probability. Typically, the
probability is denoted by α. Typical values of α are 0.05, 0.01, 0.10.
If you reject the null hypothesis when it is not true, this is a good thing. The
researcher would love this scenario. She wants to control the chances of rejecting
the null hypothesis when it is not true. This is power. This is where n plays a role.
Power is usually denoted by 1 – β. The entity β has a special meaning. It is the
probability of not rejecting the null hypothesis when the alternative is true (Type
II error).
Controlling Type I error is easy. Meeting the power requirements is
mathematically hard. In the calculations, we need the distribution of the test
statistic Z when the null hypothesis is true. This is usually easy. Using the
distribution, we can obtain the critical value c of the test. The test would be of the
form:
Reject the null hypothesis if Z > c.
15
We also need the distribution of Z when the alternative is true. This is where
problems crop up. Our alternative is hazy. The mean is not clear except that it
should be < µ0. How can we find the distribution of Z when we do not know what
the mean is? Be concrete. One reasonable choice, here, could be µ = 120 (Normal
blood pressure). You want people to have average normal blood pressure if they
use your drug. When µ = 120, we can find the distribution of Z. We define the
effect size
Δ=
=
= 0.625 = You might say that you want to see this effect size realized with specified power.
Technically, the equation for sample size calculation would be
Power = Pr(Reject H0 | µ = 120) = Pr(Z > c | Δ = 0.625).
We know the distribution of Z when µ = 120 or, equivalently, Δ = 0.625. This
equation will be a function of n. Solve this integral equation for n. Generally, a lot
of computing effort is involved here.
Caution: The sample size depends on the choice of the test statistic too, besides
the level of significance α, effect size Δ, and power.
FAQ and pointers
1. How do I come up with the alternative mean? As a researcher, you are
responsible. Statistician cannot help you. Conduct a pilot study.
2. How do I come up with the effect size? As a researcher, you are
responsible. Conduct a pilot study.
3. Most researcher proposals require sample size calculations. Typically, pilot
studies do not ask for sample size calculations. Sample size is limited by the
budget offered.
4. Sample size calculations are recommended if you are conducting a
prospective study. Pilot studies, typically, are retrospective. How many
records one extracts depends on time and budget.
16
Signal-to-noise ratio
Suppose we have a single population whose quantitative characteristic is normally
distributed with mean μ and standard deviation σ. Engineers call the mean as
‘signal’ and standard deviation as ‘noise.’ For us ‘noise’ means ‘variation.’
Suppose mean = 1 and sd = 100. There is too much noise (too much variation) in
the population values and the signal is weak. Look at the normal distribution
curve with this mean and this standard deviation.
mean ± 3*sigma limits: -299 to 301
> curve(dnorm(x, mean = 1, sd = 100), from = -299, to = 301)
Some features of this normal distribution
1. 67% of the population values are in between mean – sd and mean +
sd, i.e., 99 and 101.
2. 95% of the population values are in between mean – 2*sd and mean +
*sd, i.e., 199 and 201.
3. 99% of the population values are in between mean – 3*sd and mean +
*sd, i.e., -299 and 301.
4. The peak of the normal curve is at the ‘signal.’
17
0.004
0.003
0.002
0.000
0.001
dnorm(x, 1, 100)
-300
-200
-100
0
100
200
300
x
The signal-to-noise ratio is defined by (mean/sd), which, in our case, is 1/100 =
0.01. The signal-to-noise ratio measures how strong the signal is. A value close to
zero indicates that the signal is buried deep in the noise.
Suppose we know the noise, sd, but not the signal. We want to extract the signal.
We take a sample from the population. The sample mean is a good estimate of
the signal. If the signal is weak, we need a very large sample to extract the signal.
Let us do a lab experiment.
Draw a random sample of size 10 from N(1, 100).
> x <- rnorm(10, 1, 100)
>x
[1] -11.155289 -1.114619 24.772015 -96.294882 12.975678 258.847377 -33.853011 -24.976018 184.088338
206.766361
18
> mean(x)
[1] 49.41046
> sd(x)
[1] 120.7289
The sample is way off the mark (signal). The sample standard deviation, sd,
120.7289 gives an inkling that there is too much variation in the data.
Suppose we want to build a 95% confidence interval for the signal. Theory says
that with 95% probability the random interval
[Mean – 2*(Standard Error of the mean), Mean + 2*(Standard Error of the mean)]
contains the population mean. Technically, the random interval [
sd
sd
X − 2 * ( ), X + 2 * ( ) ]
n
n
contains the population mean with 95% probability.
Let us calculate the confidence interval for our data on hand: [- 26.95, 125.77].
Sure, this interval contains the population mean. If we had not known the
population mean, this interval is useless since it is so wide.
Let us take a bigger sample.
> y <- rnorm(100, 1, 100)
>y
[1] -7.584475 -93.597787 262.402064 57.549877 61.564929 -92.127568
[7] -77.434990 -15.951024 -112.203137 -11.165914 88.682188 -165.483593
[13] 33.363933 127.303996 -87.495185 -200.956444 76.069743 -22.682649
[19] -5.820246 -77.956223 -15.888139 53.633738 93.628054 -103.369858
19
[25] 214.187692 -196.819549 -64.301984 68.267119 195.122075 15.644491
[31] -111.910557 86.415472 4.717438 251.892708 58.522508 -84.680223
[37] 98.099402 -99.176147 48.997803 114.735963 -147.224747 64.300237
[43] -28.979733 91.250281 31.368825 -68.449577 88.375388 -14.719154
[49] -7.322700 27.721927 -38.416187 -34.498582 -125.094495 65.548993
[55] -32.808806 221.979550 20.426503 189.597507 56.740237 -81.423842
[61] 59.409816 -7.372238 -51.109772 -58.382034 -72.963278 -122.822444
[67] -10.678074 72.276480 128.933643 -83.520740 -119.407379 -3.915459
[73] 115.037092 1.126465 -10.823151 67.628837 8.981584 -128.335287
[79] 78.787096 156.665690 163.054969 -73.958915 -82.860109 23.277641
[85] -172.446178 -84.150425 -20.659058 -18.827225 -8.130253 -188.566125
[91] 25.613749 78.774484 86.161610 -19.991938 -112.910038 -105.682786
[97] 54.909825 -135.003036 88.848900 36.287003
> mean(y)
[1] 1.958261
The sample mean is closer to the population mean.
> sd(y)
[1] 100.3590
The 95% confidence interval for the signal is:
[-18.112, 22.032].
The interval is better but still wide.
Let us take a much bigger sample.
20
> z <- rnorm(1000, 1, 100)
> mean(z)
[1] 3.696012
Let us take a very, very large sample.
> u <- rnorm(10000, 1, 100)
> mean(u)
[1] 1.540427
The moral of the story is if the signal is weak, we need a very, very large sample to
extract it.
Let us look at the normal population with mean μ = 1 and σ = 1. The signal-tonoise ratio is (mean/sd) = 1. The signal is very strong. Look at the normal curve.
Features of the curve
1. The peak is at the signal.
2. 67% of the population values are in between
μ – σ and μ + σ, i.e., 0 to 2.
3. 95% of the population values are in between
μ – 2σ and μ + 2σ, i.e., -1 to 3.
4. 99% of the population values are in between
μ – 3σ and μ + 3σ, i.e.,-2 to 4.
21
0.4
0.3
0.2
0.0
0.1
dnorm(x, 1, 1)
-2
-1
0
1
2
3
4
x
Draw a random sample of size 10 from this population. Build a 95% confidence
interval. I bet that the interval will be narrow.
Moral. Quality of inference depends how much noise there is in the population.
The same argument works through sample size calculations. The effect size is akin
to sigan-to-noise ratio. The smaller the effect size is, the larger the sample
required to detect the effect size.
22
Chapter 2: Single Sample Problems
Quantitative
Testing of Hypotheses environment
Set-up: The random entity X has a normal distribution with mean µ and standard
deviation σ. The null hypothesis is
H0: µ = µ0 (specified)
and the alternative
H1: µ > µ0 (One-sided alternative)
The population standard deviation σ is known.
Potential data: X1, X2, … , Xn
Test Statistic: Z-statistic Z =
/√"
Test: Reject the null hypothesis H0 in favor of the alternative if Z > z1-α
What is z1-α, the critical value of the test? The area under the standard normal
curve to the left of z1-α is 1 - α.
Some examples of α:
α
Critical value (z1-α)
0.05 1.645
0.01 2.326
0.10 1.282
What should be the sample size n?
23
Specifications:
1. Level of significance: α (Type I Error Probability) (This provides the
critical value of the test.)
1 – β (1 – Type II Error Probability)
2. Power:
3. Effect Size:
Δ=
(Standardized difference)
µ1 is the mean under the alternative hypothesis.
Jargon
What should be the sample size n? With false positive (Type I error) probability
controlled at α, what should be the sample size n so that the test is able to detect
the standardized difference Δ with power 1 – β?
How a mathematical statistician does come up with a formula for n?
He needs the distribution of the test statistics Z under the null hypothesis.
Z ~ N(0, 1)
He needs the distribution of the test statistic Z under the alternative hypothesis.
The alternative hypothesis is hazy. It stipulates the unknown mean µ > µ0. One
cannot find the distribution of Z. Be concrete. Stipulate the value of the mean µ.
Set a value µ = µ1 (> µ0). (Use your judgment.) Under the alternative hypothesis,
Z ~ N(√# $
Z ~ N(√#Δ, 1)
%, 1)
The mean of the normal distribution is a function of the standardized difference Δ
and sample size.
An important lesson: You do not have to come up with a value of µ under the
environment of the alternative hypothesis. Spell out the standardized difference
Δ.
24
Formula for sample size
The equation to solve is:
-
1 – β = Pr(Z > z1-α | Δ) = &.
/0
'√#∆, 1"+,
For what value of n, this equation is satisfied. Fortunately, we have an explicit
solution.
n=
3
./0 1./2 "
∆3
Use R to calculate n.
Goal: For each of the levels 0.01, 0.05, and 0.10, power 0.80, 0.90, 0.95, and
effect size 0.1 (0.1) 0.9, calculate the required sample size.
Step 1: Calculate critical values: z1-α.
Use the loop command in R. The loop command is used when the same formula is
used under different scenarios.
Create a vector of zeros. We will replace the zeros by critical values.
> Critical <- c(0, 0, 0)
Create a vector of levels of significance.
Level <- c(0.01, 0.05, 0.10)
Start the loop command. Replace each entry in the folder ‘Critical’ by an
appropriate critical value. Replace the i-th entry of ‘Critical’ by calculating the
critical value using the i-th entry of ‘Level.’ The command ‘qnorm’ does the trick.
> for (i in 1:3)
+{
+ Critical[i] <- qnorm(Level[i], lower.tail = F)
25
+}
> Critical
[1] 2.326348 1.644854 1.281552
Round the critical values to three decimal places.
> round(Critical, 3)
[1] 2.326 1.645 1.282
Let us calculate z1-β s for different choices of power.
> Power <- c(0.80, 0.90, 0.95)
> Powerz <- c(0, 0, 0)
> for (i in 1:3)
+{
+ Powerz[i] <- qnorm(Power[i])
+}
> Powerz
[1] 0.8416212 1.2815516 1.6448536
Let us experiment with different effect sizes. Create a folder consisting of
numbers 0.1 through 0.9 in increments of 0.1.
Delta <- seq(0.1, 0.9, 0.1)
> Delta
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
We are ready to start another loop. Remember that the sample size is a function
of significance level, power, and effect size. We have chosen three levels of
significance, three choices of power, and 9 choices of effect size. Let us create a 326
dimensional array with 9 slices and each slice is a matrix of three rows and three
columns with all entries being zeros. Create a column vector of 81 zeros. (rep =
repeat)
> Sizes <- rep(0, 81)
Create the required array.
> dim(Sizes) <- c(3, 3, 9)
Start a loop. We are trying to fill the entries in the first slice of the loop using the
sample size formula for Δ = 0.1. The first line in the code achieves this objective
for a general slice k.
> for (i in 1:3)
+{
+ for (j in 1:3)
+{
+ for (k in 1:9)
+{
+ Sizes[i, j, k] <- ((Critical[i] + Powerz[j])^2)/Delta[k]^2
+}
+}
+}
Round the numbers to integers.
> round(Sizes)
,,1
[,1] [,2] [,3]
27
[1,] 1004 1302 1577
[2,]
618
856 1082
[3,]
451
657
856
,,2
[,1] [,2] [,3]
[1,]
251
325
394
[2,]
155
214
271
[3,]
113
164
214
,,3
[,1] [,2] [,3]
[1,]
112
145
175
[2,]
69
95
120
[3,]
50
73
95
,,4
[,1] [,2] [,3]
[1,]
63
81
99
[2,]
39
54
68
[3,]
28
41
54
,,5
[,1] [,2] [,3]
[1,]
40
52
63
[2,]
25
34
43
[3,]
18
26
34
,,6
28
[,1] [,2] [,3]
[1,]
28
36
44
[2,]
17
24
30
[3,]
13
18
24
,,7
[,1] [,2] [,3]
[1,]
20
27
32
[2,]
13
17
22
[3,]
9
13
17
,,8
[,1] [,2] [,3]
[1,]
16
20
25
[2,]
10
13
17
[3,]
7
10
13
,,9
[,1] [,2] [,3]
[1,]
12
16
19
[2,]
8
11
13
[3,]
6
8
11
Let us name the rows, columns, and slices.
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effecr 0.2", "Effect 0.3",
+ "Effect 0.4", "Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8",
29
+ "Effect 0.9")
> round(Sizes)
, , Effect 0.1
Power 0.80 Power 0.90 Power 0.95
Level 0.01
1004
1302
1577
Level 0.05
618
856
1082
Level 0.10
451
657
856
, , Effect 0.2
Power 0.80 Power 0.90 Power 0.95
Level 0.01
251
325
394
Level 0.05
155
214
271
Level 0.10
113
164
214
, , Effect 0.3
Power 0.80 Power 0.90 Power 0.95
Level 0.01
112
145
175
Level 0.05
69
95
120
Level 0.10
50
73
95
, , Effect 0.4
Power 0.80 Power 0.90 Power 0.95
Level 0.01
63
81
99
Level 0.05
39
54
68
Level 0.10
28
41
54
, , Effect 0.5
Power 0.80 Power 0.90 Power 0.95
Level 0.01
40
52
30
63
Level 0.05
25
34
43
Level 0.10
18
26
34
, , Effect 0.6
Power 0.80 Power 0.90 Power 0.95
Level 0.01
28
36
44
Level 0.05
17
24
30
Level 0.10
13
18
24
, , Effect 0.7
Power 0.80 Power 0.90 Power 0.95
Level 0.01
20
27
32
Level 0.05
13
17
22
Level 0.10
9
13
17
, , Effect 0.8
Power 0.80 Power 0.90 Power 0.95
Level 0.01
16
20
25
Level 0.05
10
13
17
Level 0.10
7
10
13
, , Effect 0.9
Power 0.80 Power 0.90 Power 0.95
Level 0.01
12
16
19
Level 0.05
8
11
13
Level 0.10
6
8
11
Comments
1. Fix the level and power. As the effect size increases, the sample size needed
decreases. Fix α = 0.05 and Power = 0.80.
31
Δ:
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
n:
618 155 69
39
25
17
13
10
8
2. Fix the level and effect size. As the power increases, the sample size
increases. Set α = 0.05 and Δ = 0.5.
Power:
80% 90% 95%
n:
25
34
43
3. Fix the power and effect size. As the level increases, the sample size needed
decreases. Set Power = 80% and Δ = 0.5
Level:
0.01 0.05 0.10
n:
40
25
18
Query. I do not need these comprehensive tables. I know my level, power, and
effect size. Calculate the required sample size.
This is Home Work Problem 1.
The other type of one-sided alternative
The null hypothesis is
H0: µ = µ0 (specified)
and the alternative
H1: µ < µ0 (One-sided alternative)
We use the same test statistic. However, the character of the test changes.
Test: Reject the null hypothesis in favor of the alternative if
Z < zα
Sample size numbers will remain exactly the same.
The case of two-sided alternative
The null hypothesis is
H0: µ = µ0 (specified)
32
and the alternative
H1: µ ≠ µ0 (Two-sided alternative)
The test statistic Z remains the same.
Test: Reject the null hypothesis in favor of the alternative if |5| > z1-α/2.
The critical value z1-α/2 is located on the following standard normal curve. The
number c (z1-α/2) on the X-axis is the critical value.
0.2
Density
0.3
0.4
Standard
Normal Curve
0.1
1 - Alpha
Alpha/2
0.0
Alpha/2
-3
-2
-c
-1
0
1
c
2
3
X
All other features remain the same. The formula for the sample size changes
slightly.
n=
3
./0/31./2 "
∆3
α
0.05
0.01
0.10
z1-α/2
1.96
2.576
1.645
33
Let us work out the sample size for a variety of configurations of level, power, and
effect size. Use R.
Critical values are different now.
> Level <- c(0.01, 0.05, 0.10)
> Critical <- c(0, 0, 0)
> for (i in 1:3)
+{
+ Critical[i] <- qnorm(Level[i]/2, lower.tail = F)
+}
> Critical
[1] 2.575829 1.959964 1.644854
The power numbers z1-β s remain the same. We will use the folder ‘Powerz’
defined earlier. We will use the same ‘Delta’ folder. Let us create an array of
dimensions 3x3x9 with every entry being zero.
> Sizes <- rep(0, 81)
> dim(Sizes) <- c(3, 3, 9)
> for (i in 1:3)
+{
+ for (j in 1:3)
+{
+ for (k in 1:9)
+{
+ Sizes[i, j, k] <- ((Critical[i] + Powerz[j])^2)/0.1^2
34
+}
+}
+}
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effect 0.2", "Effect 0.3", "Effect 0.4",
"Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8", "Effect 0.9")
The sample sizes are given below.
> round(Sizes)
, , Effect 0.1
Power 0.80 Power 0.90 Power 0.95
Level 0.01
1168
1488
1781
Level 0.05
785
1051
1299
Level 0.10
618
856
1082
, , Effect 0.2
Power 0.80 Power 0.90 Power 0.95
Level 0.01
292
372
445
Level 0.05
196
263
325
Level 0.10
155
214
271
, , Effect 0.3
Power 0.80 Power 0.90 Power 0.95
Level 0.01
130
165
198
Level 0.05
87
117
144
35
Level 0.10
69
95
120
, , Effect 0.4
Power 0.80 Power 0.90 Power 0.95
Level 0.01
73
93
111
Level 0.05
49
66
81
Level 0.10
39
54
68
, , Effect 0.5
Power 0.80 Power 0.90 Power 0.95
Level 0.01
47
60
71
Level 0.05
31
42
52
Level 0.10
25
34
43
, , Effect 0.6
Power 0.80 Power 0.90 Power 0.95
Level 0.01
32
41
49
Level 0.05
22
29
36
Level 0.10
17
24
30
, , Effect 0.7
Power 0.80 Power 0.90 Power 0.95
Level 0.01
24
30
36
Level 0.05
16
21
27
Level 0.10
13
17
22
, , Effect 0.8
Power 0.80 Power 0.90 Power 0.95
Level 0.01
18
23
28
Level 0.05
12
16
20
36
Level 0.10
10
13
17
, , Effect 0.9
Power 0.80 Power 0.90 Power 0.95
Level 0.01
14
18
22
Level 0.05
10
13
16
Level 0.10
8
11
13
Comments:
Sample sizes are higher.
Set α = 0.05 and Power = 80%.
One-sided alternative
Δ:
n:
0.1
618
0.2
155
0.3
69
0.4
39
0.5
25
0.6
17
0.7
13
0.8
10
0.9
8
0.3
87
0.4
49
0.5
31
0.6
22
0.7
16
0.8
12
0.9
10
Two-sided alternative
Δ:
n:
0.1
785
0.2
196
May be, we are not comparing apples with apples! The numbers for the two-sided
alternative when α = 0.10 and Power = 80% are exactly the same for the onesided alternative when α = 0.05 and Power = 80%.
The R code for the standard normal curve picture presented above is given by:
> xv <- seq(-3, 3, 0.01)
> yv <- dnorm(xv)
> plot(xv, yv, type = "l", xlab = "X", ylab = "Density", main = "Standard
+ Normal Curve")
> axis(side = 1, at = c(-1.5, 1.5), labels = c("-c", "c"))
> polygon(c(xv[xv <= -1.5], -1.5), c(yv[xv <= - 1.5], yv[xv == -3]), col =
+ "lightgray")
37
> polygon(c(xv[xv >= 1.5], 1.5), c(yv[xv >= 1.5], yv[xv == 3]), col =
+ "lightgray")
> text(-2, 0.05, "Alpha/2")
> text(2, 0.05, "Alpha/2")
> text(0, 0.2, "1 - Alpha")
The case when the population standard deviation σ is unknown
This scenario is more natural. The other features such as normality remain the
same. The test statistic used is now different. Let us check where the differences
occur.
The case of one-sided alternative
H0: µ = µ0 (specified)
and the alternative
H1: µ > µ0 (One-sided alternative)
The population standard deviation σ is unknown.
Potential data: X1, X2, … , Xn
Test Statistic: Student’s t-statistic t =
/√"
S is the sample standard deviation.
Test: Reject the null hypothesis H0 in favor of the alternative if t > tn-1,1-α.
The critical value tn-1,1-α comes from the Student’s t-distribution. The area under
the Student’s t-curve on n-1 degrees of freedom to the left of tn-1,1-α is 1 – α.
What should be the sample size n?
Specifications:
1. Level of significance: α (Type I Error Probability)
2. Power:
1 – β (1 – Type II Error Probability)
38
3. Effect Size:
Δ=
(Standardized difference)
Jargon
What should be the sample size n? With false positive (Type I error) probability
controlled at α, what should be the sample size n so that the test is able to detect
the standardized difference Δ with power 1 – β?
How a mathematical statistician does come up with a formula for n?
He needs the distribution of the test statistics t under the null hypothesis.
t ~ t-distribution with n-1 degrees of freedom.
He needs the distribution of the test statistic t under the alternative hypothesis.
The alternative hypothesis is hazy. It stipulates the unknown mean µ > µ0. One
cannot find the distribution of t. Be concrete. Stipulate the value of the mean µ.
Set a value µ = µ1 (> µ0). Under the alternative hypothesis,
t ~ non-central t-distribution with n-1 degrees of freedom and non-centrality
parameter Δ.
The probability density function is very complicated and comes in the form of
infinite series. The calculation of the required sample size is much harder and one
has to use quadrature formulas to evaluate integrals that show up in the
calculations. The equation that emerges involves three entities: sample size n;
power; and effect size Δ. If one specifies two of these three entities, one can solve
the equation for the third one. Typically, we specify the power and effect size. The
sample size n can be solved from the equation.
An important lesson: You do not have to come up with a value of µ under the
environment of the alternative hypothesis. Spell out the standardized difference
Δ.
There is no explicit formula for the sample size. R has a package ‘pwr,’ which
helps us in determining the sample size. Download and activate this package.
39
Let us use this package. Look at the documentation of pwr. The documentation is
attached to the notes.
> ?pwr
Look at the documentation of pwr..test.
> ?pwr.t.test
Set Level = 0.05, effect size = 0.5, and power = 0.80.
> MB <- pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "one.sample",
+ alternative = "greater")
> MB
One-sample t test power calculation
n = 26.13751
d = 0.5
sig.level = 0.05
power = 0.8
alternative = greater
> names(MB)
[1] "n"
[6] "note"
"d"
"sig.level" "power"
"method"
> MB$n
[1] 26.13751
Goal:
Level = 0.01, 0.05, 0.10
40
"alternative"
Power = 0.80, 0.90, 0.95
Effect size = 0.1 (0.1) 0.9
Calculate the required sample size for each configuration of Level, Power, and
Effect Size. Use R.
> Level <- c(0.01, 0.05, 0.10)
> Power <- c(0.80, 0.90, 0.95)
> Delta <- seq(0.1, 0.9, 0.1)
> for (i in 1:3)
+{
+ for (j in 1:3)
+{
+ for (k in 1:9)
+{
+ Sizes[i, j, k] <- pwr.t.test(d = Delta[k], sig.level = Level[i], power =
+ Power[j], type = "one.sample", alternative = "greater")$n
+}
+}
+}
> round(Sizes)
, , 1
[,1] [,2] [,3]
41
[1,] 1006 1304 1580
[2,]
620
858 1084
[3,]
452
658
857
, , 2
[,1] [,2] [,3]
[1,]
254
328
397
[2,]
156
215
272
[3,]
114
165
215
, , 3
[,1] [,2] [,3]
[1,]
114
147
178
[2,]
70
97
122
[3,]
51
74
96
, , 4
[,1] [,2] [,3]
[1,]
65
84
101
[2,]
40
55
69
[3,]
29
42
54
, , 5
[,1] [,2] [,3]
[1,]
43
55
66
[2,]
26
36
45
[3,]
19
27
35
, , 6
[,1] [,2] [,3]
42
[1,]
31
39
47
[2,]
19
25
31
[3,]
13
19
25
, , 7
[,1] [,2] [,3]
[1,]
23
29
35
[2,]
14
19
24
[3,]
10
14
18
, , 8
[,1] [,2] [,3]
[1,]
18
23
27
[2,]
11
15
18
[3,]
8
11
14
, , 9
[,1] [,2] [,3]
[1,]
15
19
22
[2,]
9
12
15
[3,]
7
9
11
Let us name the rows, columns, and slices.
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effect 0.2", "Effect 0.3", "Effect 0.4",
"Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8", "Effect 0.9")
> round(Sizes)
43
, , Effect 0.1
Power 0.80 Power 0.90 Power 0.95
Level 0.01
1006
1304
1580
Level 0.05
620
858
1084
Level 0.10
452
658
857
, , Effect 0.2
Power 0.80 Power 0.90 Power 0.95
Level 0.01
254
328
397
Level 0.05
156
215
272
Level 0.10
114
165
215
, , Effect 0.3
Power 0.80 Power 0.90 Power 0.95
Level 0.01
114
147
178
Level 0.05
70
97
122
Level 0.10
51
74
96
, , Effect 0.4
Power 0.80 Power 0.90 Power 0.95
Level 0.01
65
84
101
Level 0.05
40
55
69
Level 0.10
29
42
54
, , Effect 0.5
Power 0.80 Power 0.90 Power 0.95
Level 0.01
43
55
66
Level 0.05
26
36
45
Level 0.10
19
27
35
44
, , Effect 0.6
Power 0.80 Power 0.90 Power 0.95
Level 0.01
31
39
47
Level 0.05
19
25
31
Level 0.10
13
19
25
, , Effect 0.7
Power 0.80 Power 0.90 Power 0.95
Level 0.01
23
29
35
Level 0.05
14
19
24
Level 0.10
10
14
18
, , Effect 0.8
Power 0.80 Power 0.90 Power 0.95
Level 0.01
18
23
27
Level 0.05
11
15
18
Level 0.10
8
11
14
, , Effect 0.9
Power 0.80 Power 0.90 Power 0.95
Level 0.01
15
19
22
Level 0.05
9
12
15
Level 0.10
7
9
11
Comments
You do not know the population standard deviation σ. You use the sample
standard deviation in your test statistic. You are penalized. Sample sizes are
higher.
One-sample problem: One-sided alternative: σ is known
45
Set α = 0.05 and Power = 0.80
Δ:
n:
0.1
618
0.2
155
0.3
69
0.4
39
0.5
25
0.6
17
0.7
13
0.8
10
0.9
8
One-sample problem: One-sided alternative: σ is unknown
Set α = 0.05 and Power = 0.80
Δ:
n:
0.1
620
0.2
156
0.3
70
0.4
40
0.5
26
0.6
19
0.7
14
0.8
11
0.9
9
Typically, you need at the most four more observations.
Let us now focus on the two-sided alternative.
The null hypothesis is
H0: µ = µ0 (specified)
and the alternative
H1: µ ≠ µ0 (Two-sided alternative)
The population standard deviation σ is known.
The test statistic t remains the same. There is a modification to the test.
Test: Reject the null hypothesis if |6| > tn-1,1-α/2.
The area under the Student’s t-curve with n-1 degrees of freedom to the left of
the critical value tn-1,1-α/2 is 1 – α/2.
What should be the sample size n?
Levels: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
Effect Size: 0.1 (0.1) 0.9
46
Home Work Problem No. 3: Calculate the required sample size for each of the 81
configurations.
One-sample problem
Estimation environment
Set-up: The observable entity X has a normal distribution with mean µ and
standard deviation σ. Suppose σ is known and µ is unknown.
Potential data: X1, X2, … , Xn
Error of Interval Estimation
Our focus is on the estimation of the unknown population mean µ. A 95%
confidence interval is give by
7 − 1.96 ∗
√
≤ µ ≤ 7 + 1.96 ∗
√
The chances that the random interval covers the population mean are 95%.
Margin of Error (of interval estimation) = Length of the confidence intervals =
2 ∗ 1.96 ∗
?
√#
We want to control the margin of error. Specify the margin of error d you are
willing to tolerate. Set
d = 2 ∗ 1.96 ∗
√
Solve for n.
n=@
∗.A∗ B
C
Specifications
1. Confidence level (Here I have taken it to be 95%.)
2. Population standard deviation σ
3. Margin of Error of Interval Estimation d
47
Example
The average miles per gallon (mpg) µ are under focus for a certain brand of cars.
The standard deviation σ is 1 mpg. A 95% confidence interval is to be entertained.
The margin of error of interval estimation is specified to be d = ½ mpg. What
should be the sample size?
n=@
∗.A∗ C = 61.4656 ~ 62
.D
An alternative approach
Error of Estimation
A point estimate of the population mean µ is 7. Error of estimation is defined to
be |7 − E|. One specifies how much d one is willing to tolerate the error with a
specified probability 95%, say. Mathematically, this results in an equation.
Pr(|7 − E| ≤ d) = 0.95.
The chances that sample mean is within d units of the population mean are 95%.
This gives a formula for n.
n=@
.A∗ B
C
Example
The average miles per gallon (mpg) µ are under focus for a certain brand of cars.
The standard deviation σ is 1 mpg. A 95% probbaility is to be entertained. The
margin of error of estimation is specified to be d = ½ mpg. What should be the
sample size?
n=@
.A∗ .D
C = 15.3664 ~ 16
With this choice of n, we have Pr(7 − 0.5≤ µ ≤ 7 + 0.5) = 0.95
The length of the interval is 1, as it should be.
48
With the choice of n = 62,
Pr(7 − 1.96 ∗
Pr(7 − 1.96 ∗
√
≤ µ ≤ 7 + 1.96 ∗
√
√
≤ µ ≤ 7 + 1.96 ∗
Pr(7 − 0.25 ≤ µ ≤ 7 + 0.25)
) = 0.95 =
√
)=
The length of the interval is 0.5, as it should be.
No wonder we needed bigger sample size.
What happens if the population standard deviation is unknown?
Error of Estimation
Stein’s two-step procedure
Specificatioins
1. Margin of error: d
2. Confidence level: 1 – α
Goal: What should be the sample size n so that Pr(|7 − E| ≤ d) = 1 – α.
Step 1: Take an initial sample of size m (> 2). (Whatever your budget permits.) Let
S2 be the sample variance.
3
3 ∗I/,/0/3
Step 2: Calculate n = max{m, H
greatest integer ≤ x.
B3
J + 1}. The symbol [x] means the
Step 3: Take an additional sample of n – m observations.
Step 4. Let 7 be the sample mean of all the n observations.
Stein (1945) has shown that the sample mean satisfies the equation
Pr(|7 − E| ≤ d) = 1 – α.
49
Illustration
Let us go back to the gas mileage problem. The population standard deviation σ is
unknown.
The experimenter has taken m = 5. The sample data: 24.5, 26, 25, 24.9, 24.6 mpg.
S2 = 0.355
Specifications
d = ½ mpg
1 – α = 0.95
tm-1,1-α/2 = t4,0.975 = 2.776
Area under the Student’s t-curve with 4 degrees of freedom to the left of 2.776 is
97.5%.
> qt(0.975, 4)
[1] 2.776445
3
3 ∗I/,/0/3
Calculate H
B3
J=@
.DD∗.KK3
.D3
C = [10.94] = 10
n = max{5, 11} = 11
Conclusion: Take an additional sample of 6 observations.
How does one choose the initial sample size?
A discussion is presented in Moshman (1958) and Seelbinder (1953).
50
Chapter 3: Two-sample problems: Quantitative responses
Set-up:
Two populations are under focus. The same numerical characteristic of the
populations is under scrutiny. (To fix ideas, suppose the numerical characteristic
be systolic blood pressure.)
X = The value of the numerical characteristic of a random subject from Population
1.
Y = The value of the numerical characteristic of a random subject from Population
2.
X ~ N(µ1, σ1)
Y ~ N(µ2, σ2)
Null hypothesis: Hypothesis of skepticism
H0: µ1 = µ2
Alternative hypothesis: Hypothesis of optimism
H1: µ1 > µ2 (One-sided alternative)
Or
H1: µ1 < µ2 (One-sided alternative)
Or
H1: µ1 ≠ µ2 (Two-sided alternative)
We want to draw a random sample of size m from Population 1 and of size n from
Population 2. What should be the values of m and n?
Scenario 1
The standard deviations σ1 and σ2 are known and equal. Let the common value be
σ.
51
Look at the testing problem.
H0: µ1 = µ2
H1: µ1 > µ2 (One-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. The absolute difference |E − E | or the effect size Δ = 3
. Spell out
the difference between the population means under the alternative
hypothesis.
Test Statistic: Z =
L
3
N
M ∗
Test: Reject the null hypothesis if Z > z1-α.
Sample sizes m = n are recommended. Literature says that power is maximized
when m = n.
Sample size per group
m=n=
3
∗./0 1./2 "
∆3
Compare the formula in the single sample case for the one-sided alternative.
Look at the following testing problem.
H0: µ1 = µ2
H1: µ1 ≠ µ2 (One-sided alternative)
The specifications are the same. The test statistic is the same. The test is slightly
different.
Test: |5| > z1-α/2
52
Sample size formula
m=n=
3
∗./0/3 1./2 "
∆3
Some reflections
1. When it comes to two populations, sample size required per group is
double of that of the single sample case. Not only one needs to take care of
variation present in the population but also variation present in the other
population.
2. More effort and more money are needed when comparing the means of
two populations.
Scenario 2
The standard deviations σ1 and σ2 are known but unequal.
One-sided alternative
H0: µ1 = µ2
H1: µ1 > µ2 (One-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. The absolute difference |E − E |.The difference between the population
means under the alternative hypothesis.
Test statistic: Z =
L
3
3
MO 1O3
I
N
Test: Reject the null hypothesis if Z > z1-α.
Equal sample sizes are not recommended. Take a larger sample from the
population which has more variation. One choice: let k be the ratio of the
variances, i.e.,
53
k=
3
33
Formula for n, the sample size for Population 2
n=
P
∗33
Q /Q3
T
R/0 SR/2
3
m = k*n
You can choose the number k any way you want. It does not have to be the
ratio of the variances. For your own choice of k, the formulas are:
n=
P
O3
1 3
3
U
Q /Q3
T
R/0 SR/2
3
m = k*n
Scenario 3
The standard deviations σ1 and σ2 are unknown but equal. Let the common value
be σ, which is unknown.
This is more challenging and interesting.
Look at the testing problem.
H0: µ1 = µ2
H1: µ1 > µ2 (One-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. Effect size Δ = Test Statistic: t =
3
L
I
.
N
M$ 1 %∗
54
S is the pooled standard deviation of the data.
Test: Reject the null hypothesis if t > tm+n-2, 1-α.
Sample sizes m = n are recommended. Literature says that power is maximized
when m = n.
Common sample size is hard to compute. The integral equation, which involves
a non-central t-distribution, is hard to solve by hand. We need software. We
will use the ‘pwr’ package.
Illustration
α = 0.05
Power = 0.90
Effect size = Δ = 0.5
> pwr.t.test(d = 0.4, sig.level = 0.05, power = 0.90, type = "two.sample",
+ alternative = "greater")
Two-sample t test power calculation
n = 107.7313
d = 0.4
sig.level = 0.05
power = 0.9
alternative = greater
NOTE: n is number in *each* group
If the common standard deviation σ is known, the common sample size is
given by, for the same specifications,
55
m=n=
3
∗./0 1./2 "
∆3
= 107
Homework Problem No. 4
Two-sample problem
The standard deviations σ1 and σ2 are unknown but equal. Let the common value
be σ, which is unknown.
Look at the testing problem.
H0: µ1 = µ2
H1: µ1 > µ2 (One-sided alternative)
Level: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
Effect Size: 0.1 (0.1) 0.9
Get an array of the required sample sizes.
Look at the following testing problem.
H0: µ1 = µ2
H1: µ1 ≠ µ2 (Two-sided alternative)
Specifications
1. Level: α
2. Power: 1 – β
3. Effect size Δ = Test Statistic: t =
3
L
I
.
N
M$ 1 %∗
S is the pooled standard deviation of the data.
Test: Reject the null hypothesis if t > tm+n-2, 1-α/2.
56
Sample sizes m = n are recommended. Literature says that power is maximized
when m = n.
Common sample size is hard to compute. The integral equation, which involves
a non-central t-distribution, is hard to solve by hand. We need software. We
will use the ‘pwr’ package.
An illustration
Level = 0.05
Power = 0.90
Effect Size = 0.4
> pwr.t.test(d = 0.4, sig.level = 0.05, power = 0.90, type = "two.sample",
+ alternative = "two.sided")
Two-sample t test power calculation
n = 132.3105
d = 0.4
sig.level = 0.05
power = 0.9
alternative = two.sided
NOTE: n is number in *each* group
Sample size per group = 133
There is another command available in the ‘Base.’ Look at its documentation.
The documentation is attached to the notes.
?power.t.test
> power.t.test(delta = 0.4, sig.level = 0.05, power = 0.90, type = "two.sample",
57
+ alternative = "two.sided")
Two-sample t test power calculation
n = 132.3106
delta = 0.4
sd = 1
sig.level = 0.05
power = 0.9
alternative = two.sided
NOTE: n is number in *each* group
Scenario 4
The standard deviations σ1 and σ2 are unknown but possibly unequal.
Welch test is used to test the equality of population means. No satisfactory
treatment is available to calculate the required sample sizes.
Estimation perspective
Absolute estimation error: |V7 − W X − VE − E X|
Scenario 1
The population standard deviations σ1 and σ2 are known.
Specifications
1. Confidence level: 1 – α
2. Tolerable maximum error: d
Goal: Find the sample sizes m and n so that
Pr(|V7 − W X − VE − E X| ≤ d) = 1 – α
58
The chances that the difference in the sample means is within d units of the
difference between the population means are 1 – α.
The above equation cannot be solved. Choose a constant k and take m = k*n.
Formulas
n = H$
3
Y
+ ? % Z
0 J /+
3
m = k*n
Example
Goal: Estimate the difference in mean pulse rates for men and women within one
beat with probability at least 0.95. The common standard deviation σ is 2 beats.
Here, d = 1 and 1 – α = 0.95.
Take k = 1.
m = n = 31
Scenario 2
The population standard deviations σ1 and σ2 are equal but unknown.
Specifications
1. Confidence level: 1 – α
2. Tolerable maximum error: d
Goal: Find the sample sizes m and n so that
Pr(|V7 − W X − VE − E X| ≤ d) = 1 – α
The chances that the difference in the sample means is within d units of the
difference between the population means are 1 – α.
We take m = n.
No formula is available. A two-stage procedure has been worked out.
59
Step 1: Take initial samples of the same size s. Calculate the pooled variance S2 of
both the samples. Here, the pooled variance is the average of the individual
variances.
Step 2: Calculate c = [
B
3\/3,/0/3
]
3
Step 3: Calculate n = max{s, @ C + 1}
^
Step 4: Take additional samples of size n – s, if necessary, from each of the
populations.
It has been demonstrated that this two-stage procedure ensures the validity of
the tolerance equation.
Example
Goal: Estimate the difference in mean pulse rates for men and women within one
beat with probability at least 0.95. The common standard deviation σ is unknown.
Here, d = 1 and 1 – α = 0.95.
A sample of 5 subjects from the men’s and women’s groups yielded the following
data.
Men:
81, 79, 78, 82, 80
Women:
77, 76, 78, 79, 77
Pooled Variance = S2 = 1.9
> qt(0.975, 8)
[1] 2.306004
c=
$
% = 0.094
.
n = max{5, @
.A
.
A_
C + 1} = 21
60
An additional sample of 16 subjects from each group would do.
Scenario 3
The population standard deviations σ1 and σ2 are unequal but unknown.
Hard
Paired t-test
Set-up
A subject in the population gives a pair (X, Y) of measurements on a numerical
characteristic. Typically,
X = Measurement before the treatment begins
Y = Measurement after the treatment
Here X and Y are correlated. (In the traditional two-sample t-test, X and Y are
independent.)
Assume X and Y have a bivariate normal distribution with means µ1, and µ2,
standard deviations σ1 and σ2, respectively, and correlation coefficient ρ. Let σ be
the standard deviation of the difference D = X – Y. The hypothesis of skepticism is:
H0: µ1 = µ2
The alternative is whatever you want: one-sided or two-sided.
Structure of the data
(X1, Y1), (X2, Y2), … , (Xn, Yn)
The data come in pairs. What is n?
c be the mean and S be the standard deviation of the
Let Di = Xi – Yi. Let b
differences.
Test statistic: t =
c
Md
3
N
61
If the alternative is
H1: µ1 ≠ µ2
the test is given by:
Reject the null hypothesis if |6| > tn-1,1-α/2.
Specifications
1. Level: α
2. Power: 1 – β
3. Effect size: Δ = 3
Since we are dealing with the difference D, it looks as though we are in the
environment of a single-sample problem. Yes, it is indeed true. Let us look at an
example.
Level = 0.05
Power = 0.80
Effect Size = 0.5
> pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "paired",
+ alternative = "two.sided")
Paired t test power calculation
n = 33.36713
d = 0.5
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number of *pairs*
62
> pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "one.sample",
+ alternative = "two.sided")
One-sample t test power calculation
n = 33.36713
d = 0.5
sig.level = 0.05
power = 0.8
alternative = two.sided
63
Chapter 4: Multi-sample problem – Quantitative responses – Analysis of Variance
Suppose we are testing the equality of k population means.
H0: μ1 = μ2 = … = μk
H1: H0 not true.
Assumptions: All the k populations each is normally distributed with common
standard deviation.
Specifications
For sample size calculations, specify level, power, alternative population means,
and within population standard deviation σ.
k
∑ ( µi − µ ) 2
Between populations variance σ 2pop = i =1
k
, where
1 k
k i =1
µ = ( ) ∑ µi = average of the population means.
Effect size = Δ =
σ pop
= bet.pop.sd/within.pop.sd
σ
In engineering terminology, σ is the noise present in the populations and σpop is a
measure of the signal. There are so many signals associated with different
populations we need to measure how close they are. The standard deviation σpop
of these signals is a measure how close they are. If they are very close, you will
need a large sample to detect them. If they are very close, the alternative and null
hypotheses are very close.
Examples
No. of populations = 4
Within population standard deviation = σ = 1
64
Alternative means
σpop
Effect Size
1
1
1
1.1
0.001875
0.001875
1
1
1
2
0.1875
0.1875
1
1
2
2
0.25
1
2
3
4
1.25
0.25
1.25
Use R to show how these calculations are done.
Look at Scenario 1.
H0: μ1 = μ2 = μ3 = μ4
H1: μ1 = 1, μ2 = 1, μ3 = 1, μ4 = 1.1
H0 and H1 are almost similar. If we want to detect H1 with a good amount of
probability, we will need a very, very large sample from each population.
An important discovery: The sample size depends only on the effect size.
Test Statistic: F
Under the null hypothesis, the distribution of F statistic is F distribution with
numerator degrees of freedom k-1 and denominator degrees of freedom k(n-1). I
am assuming that the same sample of size n is drawn from each population.
Under the alternative hypothesis (with the specified means), the distribution of
the F statistic is a non-central F distribution with numerator degrees of freedom k1 and denominator degrees of freedom k(n-1) and non-centrality parameter Δ.
There is no need to specify the population means under the alternative
hypothesis. It is enough to specify the effect size Δ.
Specifications
Level = 0.05
Power = 0.80
65
No. of populations = 3
within.var = 1
between.var = 0.04
Effect size
= sqrt(between.var)/sqrt(within.var) = 0.2
Use R.
> power.anova.test(groups = 3, between.var = 0.04, within.var = 1, sig.level =
0.05, power = 0.80)
Balanced one-way analysis of variance power calculation
groups = 3
n = 121.4378
between.var = 0.04
within.var = 1
sig.level = 0.05
power = 0.8
NOTE: n is number in each group
Specifications
Level = 0.05
Power = 0.80
No. of populations = 3
within.var = 4
between.var = 0.16
66
Effect size
= sqrt(between.var)/sqrt(within.var) = 0.2
Effect size is the same as in the previous example.
> power.anova.test(groups = 3, between.var = 0.16, within.var = 4, sig.level
+ = 0.05, power = 0.80)
Balanced one-way analysis of variance power calculation
groups = 3
n = 121.4378
between.var = 0.16
within.var = 4
sig.level = 0.05
power = 0.8
NOTE: n is number in each group
We get the same sample size per group. Why?
Specifications
Level = 0.05
Power = 0.80
No. of populations = 3
within.var = 9
between.var = 0.36
Effect size
= sqrt(between.var)/sqrt(within.var) = 0.2
> power.anova.test(groups = 3, between.var = 0.36, within.var = 9, sig.level
+ = 0.05, power = 0.80)
67
Balanced one-way analysis of variance power calculation
groups = 3
n = 121.4378
between.var = 0.36
within.var = 9
sig.level = 0.05
power = 0.8
NOTE: n is number in each group
We get the same sample size. Why?
Moral. No matter what the configurations of the alternative means and within
population standard deviation are, the sample size required remains the same if
the effect size is the same.
Some guidelines from the social sciences and psychology
Small effect size = 0.10
Medium effect size = 0.25
Large effect size = 0.40
Specifications
Level = ?
Power =
No. of populations = ?
68
Effect size = ?
The command ‘pwr.anova.test’ also works.
Let us get an array of required sample sizes under a number of configurations.
Level:
0.01, 0.05, 0.10
Power:
0.80, 0.90, 0.95
Effect Size: 0.1(0.1)0.9
Populations: 3
Let us use R to get the sample sizes.
> for (i in 1:3)
+{
+ for (j in 1:3)
+{
+ for (k in 1:9)
+{
+ Sizes[i, j, k] <- pwr.anova.test(k = 3, f = Delta[k], sig.level = Level[i],
+ power = Power[j])$n
+}
+}
+}
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 0.80", "Power 0.90", "Power 0.95")
69
> dimnames(Sizes)[[3]] <- c("Effect 0.1", "Effect 0.2", "Effect 0.3", "Effect 0.4",
"Effect 0.5", "Effect 0.6", "Effect 0.7", "Effect 0.8", "Effect 0.9")
> round(Sizes)
, , Effect 0.1
Power 0.80 Power 0.90 Power 0.95
Level 0.01
464
582
690
Level 0.05
322
423
516
Level 0.10
258
349
435
, , Effect 0.2
Power 0.80 Power 0.90 Power 0.95
Level 0.01
117
147
174
Level 0.05
81
106
130
Level 0.10
65
88
109
, , Effect 0.3
Power 0.80 Power 0.90 Power 0.95
Level 0.01
53
66
78
Level 0.05
37
48
58
Level 0.10
29
40
49
, , Effect 0.4
Power 0.80 Power 0.90 Power 0.95
Level 0.01
30
38
45
Level 0.05
21
27
33
70
Level 0.10
17
23
28
, , Effect 0.5
Power 0.80 Power 0.90 Power 0.95
Level 0.01
20
25
29
Level 0.05
14
18
22
Level 0.10
11
15
18
, , Effect 0.6
Power 0.80 Power 0.90 Power 0.95
Level 0.01
14
18
21
Level 0.05
10
13
15
Level 0.10
8
11
13
, , Effect 0.7
Power 0.80 Power 0.90 Power 0.95
Level 0.01
11
13
16
Level 0.05
8
10
12
Level 0.10
6
8
10
, , Effect 0.8
Power 0.80 Power 0.90 Power 0.95
Level 0.01
9
11
12
Level 0.05
6
8
9
Level 0.10
5
6
8
, , Effect 0.9
71
Power 0.80 Power 0.90 Power 0.95
Level 0.01
7
9
10
Level 0.05
5
6
7
Level 0.10
4
5
6
Homework Problem 4: Get the required sample sizes when the number of
populations is 4 under a choice of three levels, three powers, and nine effect
sizes.
72
Chapter 5: Correlations
A subject in the targeted population gives a pair of measurement X and Y on
possibly different numerical characteristics. For example, let X = Weight and Y =
Height. Or, X = SBP and Y = DBP. Assume bivariate normality of the joint
distribution of X and Y. Let ρ be the correlation between X and Y. The entity ρ is a
measure of association between X and Y. The hypothesis of skepticism is:
H 0: ρ = 0
X and Y are not associated. X and Y are statistically independent.
Structure of the data
(X1, Y1), (X2, Y2), … , (Xn, Yn)
What is n?
Let r be the sample correlation coefficient.
Test statistic: t = √# − 2 ∗
√ 3
The alternative is:
H 1: ρ ≠ 0
Test: Reject the null hypothesis if |6| > tn-2,1-α/2.
Specifications
1. Level
2. Power
3. The value of ρ under the alternative
The distribution of the t-statistic under the null hypothesis is Student’s t with n-2
degrees of freedom.
The distribution of the t-statistic under the alternative hypothesis is Student’s
non-central t with n-2 degrees of freedom and non-centrality parameter is the
specified value of ρ.
73
Sample size calculations are simple.
Specifications: α = 0.05; Power = 0.80; ρ = 0.3
> pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.80, alternative = "two.sided")
approximate correlation power calculation (arctangh transformation)
n = 84.74891
r = 0.3
sig.level = 0.05
power = 0.8
alternative = two.sided
Under the two-sided alternative, what happens if ρ = -0.3? We get exactly the
same number.
> pwr.r.test(r = -0.3, sig.level = 0.05, power = 0.80, alternative = "two.sided")
approximate correlation power calculation (arctangh transformation)
n = 84.74891
r = 0.3
sig.level = 0.05
power = 0.8
alternative = two.sided
We can have the whole array of sample sizes to cover a variety of configurations
of the specifications.
Specifications
Level: 0.01, 0.05, 0.10
74
Power: 0.80, 0.90, 0.95
Rho: 0.1(0.1)0.9
The following is the R code.
> Level <- c(0.01, 0.05, 0.10)
> Power <- c(0.80, 0.90, 0.95)
> Rho <- seq(0.1, 0.9, 0.1)
> Sizes <- rep(0, 3*3*9)
> dim(Sizes) <- c(3, 3, 9)
> for (i in 1:3)
+{
+ for (j in 1:3)
+{
+ for (k in 1:9)
+{
+ Sizes[i, j, k] <- pwr.r.test(r = Rho[k], sig.level = Level[i], power = Power[j],
+ alternative = "two.sided")$n
+}
+}
+}
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 80%", "Power 90%", "Power 95%")
75
> dimnames(Sizes)[[3]] <- c("Rho 0.1", "Rho 0.2", "Rho 0.3", "Rho 0.4", "Rho 0.5",
"Rho 0.6", "Rho 0.7",
+ "Rho 0.8", "Rho 0.9")
> round(Sizes)
, , Rho 0.1
Power 80% Power 90% Power 95%
Level 0.01
1163
1481
1773
Level 0.05
782
1046
1293
Level 0.10
617
853
1077
, , Rho 0.2
Power 80% Power 90% Power 95%
Level 0.01
287
365
436
Level 0.05
194
258
319
Level 0.10
153
211
266
, , Rho 0.3
Power 80% Power 90% Power 95%
Level 0.01
125
158
189
Level 0.05
85
112
138
Level 0.10
67
92
115
, , Rho 0.4
Power 80% Power 90% Power 95%
Level 0.01
68
86
102
Level 0.05
47
61
75
Level 0.10
37
50
63
, , Rho 0.5
76
Power 80% Power 90% Power 95%
Level 0.01
42
52
62
Level 0.05
29
38
46
Level 0.10
23
31
38
, , Rho 0.6
Power 80% Power 90% Power 95%
Level 0.01
28
34
40
Level 0.05
19
25
30
Level 0.10
16
20
25
, , Rho 0.7
Power 80% Power 90% Power 95%
Level 0.01
19
23
27
Level 0.05
13
17
20
Level 0.10
11
14
17
, , Rho 0.8
Power 80% Power 90% Power 95%
Level 0.01
13
16
18
Level 0.05
9
12
14
Level 0.10
8
10
12
, , Rho 0.9
Power 80% Power 90% Power 95%
Level 0.01
9
10
11
Level 0.05
7
8
9
Level 0.10
6
7
8
Reflections?
77
Chapter 6: Proportions
Sample size calculations in the case of binary response variables
Single proportion
Problem
A drug is under scrutiny. If it is given to a patient, there are only two possible
outcomes. Either the patient gets better (1 = Success) or fails to get better (0 =
Failure). Let p stand as a generic symbol for the probability of success, which is
unknown. We postulate a null hypothesis
H0: p = p0 (specified).
The alternatives are of three kinds.
H1: p < p0 (one-sided alternative)
H1: p > p0 (one-sided alternative)
H1: p ≠ p0 (two-sided alternative)
We need data to test H0 versus H1. A random sample of n patients is taken. Each
patient is given the drug. Let X be the number of patients who got better on the
drug. Note that X is our data. The possible values of X are 0, 1, 2, … , n. The entity
X is an obvious statistic to build a test. An estimate of p is given by
pˆ =
X
,
n
which is the sample proportion of patients who got better.
For testing H0 versus H1: p < p0, reject H0 in favor of H1 if
78
X
− p0
n
< zα ,
p0 (1 − p0 )
n
where zα is such that area under the standard normal curve to the left of zα is α
and α is your chosen level of significance.
For testing H0 versus H1: p > p0, reject H0 in favor of H1 if
X
− p0
n
> z1−α ,
p0 (1 − p0 )
n
where z1-α is such that area under the standard normal curve to the left of z1-α is 1α and α is your chosen level of significance.
For testing H0 versus H1: p ≠ p0, reject H0 in favor of H1 if
|
X
− p0
n
|> z1−α / 2 ,
p0 (1 − p0 )
n
where z1-α/2 is such that area under the normal curve to the left of z1-α/2 is 1 – α/2
and α is your chosen level of significance.
All these tests are large sample tests. Normal approximation is employed to
calculate the critical values. The normal approximation works very well if the null
proportion is in a close neighborhood of 0.5. If not, the arcsine square root
transformation of the statistic X/n is recommended. Technically, calculate
sin-1(sqrt(X/n)). The ‘pwr’ package calculates sample sizes based on the
transformed statistic.
Sample size calculations with given power and level.
79
Specifications
1.
2.
3.
4.
5.
Level 0.05
Power 0.80
Null proportion: p0 = 0.5
Alternative proportion: p1 = 0.4
Alternative: Two-sided
Before we execute the ‘pwr.p.test’ command, we need to translate the numbers
p0 and p1 to an effect size h that is operational in the environment of transformed
statistic. The technical definition of the effect size h is:
h = 2*sin-1(sqrt(p1)) – 2*sin-1(p0)
In the sample size calculations, the absolute value of h is used.
The following R command gives the value of h.
> effect <- ES.h(0.5, 0.4)
> effect
[1] 0.2013579
Let us understand how the effect size h varies with varying values of the
alternative p1.
> AltProp <- seq(0.1, 0.9, 0.1)
> AltProp
[1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
> EffectSizes <- ES.h(0.5, AltProp)
> EffectSizes
[1] 0.9272952 0.6435011 0.4115168
0.0000000 -0.2013579 -0.4115168
[8] -0.6435011 -0.9272952
80
0.2013579
p1: 0.1
0.9
0.2
0.3
0.4
0.5
0.6
0.7
0.8
h:
0.64
0.41
0.20
0.00
0.20
0.41
0.64
0.93
0.93
Moral: The farther you move away from the null proportion, the larger the effect
size h is.
Let us get the sample size.
> pwr.p.test(h = effect, sig.level = 0.05, power = 0.80, alternative =
+ "two.sided")
proportion power calculation for binomial distribution (arcsine transformation)
h = 0.2013579
n = 193.5839
sig.level = 0.05
power = 0.8
alternative = two.sided
Now the alternative p1 = 0.6.
> effect1 <- ES.h(0.5, 0.6)
> pwr.p.test(h = effect1, sig.level = 0.05, power = 0.80, alternative =
+ "two.sided")
proportion power calculation for binomial distribution (arcsine transformation)
h = 0.2013579
n = 193.5839
sig.level = 0.05
81
power = 0.8
alternative = two.sided
The case of two proportions
Example 1. H0: p1 = p2
H1: p1 ≠ p2
For sample size calculation, spell out the alternative values of the proportions: p1
= 0.3 and p2 = 0.4, say.
Test: Two-sided proportion test
Level = 0.05
Power = 0.80
What is the sample size?
We need to calculate effect size in this context. It is defined by
Effect size = h = 2*sin-1(sqrt(p1)) – 2*sin-1(sqrt(p2))
(arc sine transformation)
The command ‘pwr.2p.test’ demands effect size. R can calculate the effect size.
The command is ‘ES.h.’
> h <- ES.h(0.3, 0.4)
>h
[1] -0.2101589
> h1 <- ES.h(0.4, 0.3)
> h1
[1] 0.2101589
82
We do a two.sided test. It does not matter whether one takes the effect size h or
h1 (the negative of h).
> pwr.2p.test(h = h, sig.level = 0.05, power = 0.80)
Difference of proportion power calculation for binomial distribution (arcsine
transformation)
h = 0.2101589
n = 355.4192
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: same sample sizes
> pwr.2p.test(h = h1, sig.level = 0.05, power = 0.80)
Difference of proportion power calculation for binomial distribution (arcsine
transformation)
h = 0.2101589
n = 355.4192
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: same sample sizes
Comment. We need a sample size of 356 subjects from each group in order to
detect the alternative proportions 0.3 and 0.4 with probability 0.80.
83
Chapter 7
Multiple Proportions
Social scientists and psychologists provide ballpark numbers as to what constitute
small, medium, and large effect sizes. The Guru of these ideas is Jacob Cohen. The
package ‘pwr’ provides the relevant numbers.
The command ‘cohen.ES’ within the ‘pwr’ package provides all the answers.
The case of proportions
> cohen.ES(test = "p", size = c("small"))
Conventional effect size from Cohen (1982)
test = p
size = small
effect.size = 0.2
> cohen.ES(test = "p", size = c("medium"))
Conventional effect size from Cohen (1982)
test = p
size = medium
effect.size = 0.5
> cohen.ES(test = "p", size = c("large"))
Conventional effect size from Cohen (1982)
test = p
size = large
effect.size = 0.8
The test statistic used for testing hypotheses on proportions is given by:
84
Let me explain in the case of a single-proportion problem. Let pˆ be the sample
proportion based on a random sample of size n from the population. The test
statistic is given by
2 * sin −1 ( pˆ ) − 2 * sin −1 ( p0 )
Z=
1
n
It has been shown that Z has approximately a normal distribution even in small
samples. This test statistic is better than the conventional test described earlier.
The case of t-test
What are small, medium, and large effect sizes?
> cohen.ES(test = "t", size = "small")
Conventional effect size from Cohen (1982)
test = t
size = small
effect.size = 0.2
> cohen.ES(test = "t", size = "medium")
Conventional effect size from Cohen (1982)
test = t
size = medium
effect.size = 0.5
> cohen.ES(test = "t", size = "large")
Conventional effect size from Cohen (1982)
test = t
85
size = large
effect.size = 0.8
The case of correlation
Let ρ be the population correlation coefficient between two quantitative
variables.
H 0: ρ = 0
H1: ρ ≠ 0; ρ = ρ1 = 0.3, say.
What are the small, medium, and large effect sizes?
ES = 0.5*ln
1 + ρ1
1 − ρ1
> cohen.ES(test = "r", size = "small")
Conventional effect size from Cohen (1982)
test = r
size = small
effect.size = 0.1
> cohen.ES(test = "r", size = "medium")
Conventional effect size from Cohen (1982)
test = r
size = medium
effect.size = 0.3
> cohen.ES(test = "r", size = "large")
86
Conventional effect size from Cohen (1982)
test = r
size = large
effect.size = 0.5
Small, Medium, and Large effect sizes
> cohen.ES(test = "anov", size = "small")
Conventional effect size from Cohen (1982)
test = anov
size = small
effect.size = 0.1
> cohen.ES(test = "anov", size = "medium")
Conventional effect size from Cohen (1982)
test = anov
size = medium
effect.size = 0.25
> cohen.ES(test = "anov", size = "large")
Conventional effect size from Cohen (1982)
test = anov
size = large
effect.size = 0.4
Multi-proportion problem and chi-squared test
87
How to test the equality of several population proportions?
Something to mull about: what is the difference between this problem and
ANOVA?
A motivating problem
We want to investigate the effect of the duration of pre-natal care, and the place
where that care is received, on the survival of patients. There are two factors
involved.
1. Duration of care:
a. Mother received pre-natal care for less than one month
b. Mother received pre-natal care for at least one month
2. Clinic where the care was given:
a. Clinic A
b. Clinic B
The response variable is whether or not the infant died in the first month of life.
The response variable is binary. Four experimental conditions are recognized. The
following are the data.
Condition
Duration
Clinic
# Infants
#Died
1
< 1 month A
179
3
2
≥ 1 month A
297
4
3
< 1 month B
214
17
4
≥ 1 month B
25
2
Let us calculate the sample proportions of infants who die in the first month of
life.
Condition
Proportion
1
2
3
4
3/179 = 0.0168
4/297 = 0.0135
17/214 = 0.0794
2/25 = 0.08
88
First impressions
1. I see with my naked eye that there are differences in the proportions. Are
these differences statistically significant? My null hypothesis is a
hypothesis of skepticism. The underlying population proportions do not
depend on the condition. In other words, the four population proportions
are the same.
2. My hunch is that the null hypothesis will be rejected.
3. The death proportion in Clinic B is higher than the one in Clinic A. Is the
difference significant?
4. Does the duration of pre-natal care matter?
Formulation of problem
p1 = Probability that the infant dies in the first month of life when the mother
receives pre-natal care for less than a month in Clinic A
p2 = Probability that the infant dies in the first month of life when the mother
receives pre-natal care for at least a month in Clinic A
p3 = Probability that the infant dies in the first month of life when the mother
receives pre-natal care for less than a month in Clinic B
p4 = Probability that the infant dies in the first month of life when the mother
receives pre-natal care for at least a month in Clinic B
H0: p1 = p2 = p3 = p4
H1: H0 not true
The null hypothesis avers that the chances of death do not depend on the
condition.
Use the data collected to test H0 versus H1. One uses a chi-squared test. Arrange
the data in the form of a contingency table.
89
Mortality
Condition
1
2
3
4
Total
# Died
3
4
17
2
26
# Alive
176
293
197
23
689
Total
179
297
214
25
715
The numbers in the above table are called observed frequencies (O). Calculate the
expected frequencies (E) if the null hypothesis is true.
Expected frequencies (E)
Mortality
Condition
1
2
3
4
Total
# Died
26 ∗ 179
715
26 ∗ 297
715
26 ∗ 214
715
26 ∗ 25
715
26
# Alive
689 ∗ 179
715
689 ∗ 297
715
689 ∗ 214
715
689 ∗ 25
715
689
Total
179
297
214
25
715
Expected frequencies (E) simplified
Mortality
Condition
1
2
3
4
# Died
6.51
10.80
7.78
0.91 26
# Alive
172.49
286.20
206.22
24.09 689
Total
179
297
214
25
Interpretation.
90
Total
715
If the null hypothesis were true, the above were the frequencies expected in the 8
cells. The observed cell frequencies are different from the expected ones. Are the
differences significant? In order to answer this question, we calculate the chisquared test statistic.
χ2 = ∑
(O − E ) 2 (3 − 6.51) 2 (176 − 172.49) 2
=
+
+
E
6.51
172.49
(4 − 10.80) 2 (293 − 286.20)2 (17 − 7.78) 2
+
+
10.80
286.20
7.78
(197 − 206.22)2 (2 − 0.91)2 (23 − 24.09) 2
+
+
206.22
0.91
24.09
= 19.10
Theory
If the null hypothesis is true, theoretically the chi-squared statistic has a chisquared distribution with 3 degrees of freedom asymptotically (in large samples).
(Degrees of freedom = # groups – 1) If the null hypothesis is true, we expect the
expected frequencies to be close to the observed frequencies. Consequently, we
expect the chi-squared statistic value to be small. A large observed value of the
chi-squared statistic value would cast doubts on the validity of the null
hypothesis.
Reject the null hypothesis in favor of the alternative if the observed chi-squared
value is large, i.e., > cα, the 100*α critical value. The critical value satisfies the
following equation.
α = Pr( χ 3
2
> cα )
Draw a graph. Use R to get the critical value.
> qchisq(0.05, 3, lower.tail = F)
[1] 7.814728
91
Test
Reject the null hypothesis if the observed chi-squared value > 7.81.
The observed chi-squared value is 19.10 > 7.81.
We reject the null hypothesis. As per the data collected, it seems that the death
rate of infants does depend on the conditions.
We can calculate the p-value associated with the data.
p-value = Pr( χ32 > 19.10) = The probability of observing the chi-squared value as
large as the one we have gotten when the null hypothesis is true
> pchisq(19.10, 3, lower.tail = F)
[1] 0.0002606860
The null hypothesis is rejected very, very strongly. The observed result is very,
very significant.
R can calculate the chi-squared statistic and also the p-value. Let us do this.
Input the data in a matrix form.
> InfantDeaths <- matrix(c(3, 4, 17, 2, 176, 293, 197, 23), nrow = 2, byrow = T)
> InfantDeaths
[,1] [,2]
[,3]
[,4]
[1,]
3
4
17
2
[2,]
176
293
197
23
Label the rows and columns.
> rownames(InfantDeaths) <- c("Died", "Alive")
> colnames(InfantDeaths) <- c("A&<1Month", "A&>1Month", "B&<1Month",
"B&>1Month")
92
> InfantDeaths
A&<1Month A&>1Month B&<1Month B&>1Month
Died
3
4
17
2
Alive
176
293
197
23
The command for the chi-squared test is ‘chisq.test.’
> chisq.test(InfantDeaths)
Pearson's Chi-squared test
data: InfantDeaths
X-squared = 19.0964, df = 3, p-value = 0.0002611
Warning message:
In chisq.test(InfantDeaths) : Chi-squared approximation may be incorrect
The output matches with our hand calculations.
One can ask for the expected frequencies under the null hypothesis.
> chisq.test(InfantDeaths)$expected
A&<1Month A&>1Month B&<1Month B&>1Month
Died 6.509091
10.8
7.781818
Alive 172.490909
286.2
206.218182
One can ask for the observed frequencies.
> chisq.test(InfantDeaths)$observed
93
0.9090909
24.0909091
A&<1Month A&>1Month B&<1Month B&>1Month
Died
3
4
17
2
Alive
176
293
197
23
> O <- chisq.test(InfantDeaths)$observed
> E <- chisq.test(InfantDeaths)$expected
> (O-E)^2/E
A&<1Month A&>1Month B&<1Month B&>1Month
Died
1.89177247
4.2814815
10.919669
1.30909091
Alive 0.07138764
0.1615653
0.412063
0.04939966
By looking at the table above, one can find out which cell is contributing the most
to the chi-squared value. One can find out in which of the cells the discrepancy is
large between the observed and expected frequencies.
The hypothesis of homogeneity of proportions is rejected. The chi-squared test is
traditionally used to test independence of two categorical variables based on the
data collected on the variables.
Testing the equality of several population proportions is equivalent to testing
statistical independence of the two categorical variables ‘Mortality’ and
‘Condition,’ in our context.
Power calculations for testing the equality of three proportions.
A motivating problem from the Digestive Health Center, Children’s Hospital
There are three methods of detection of colon cancer gene based.
1. RTPCR
(Diagnosis) Response: Yes/No
2. Methylation
Response: Yes/No
3. Deletion
Response: Yes/No
94
Let
p1 = probability that response is Yes when RTPCR is used on a patient with cancer
p2 = probability that response is Yes when Methylation is used on a patient with
cancer
p3 = probability that response is Yes when Deletion is used on a patient with
cancer
Only one method can be used on a patient. Which method is more effective? We
start with the hypothesis of skepticism.
H0: p1 = p2 = p3
H1: H0 not true
In order to test the null hypothesis, the investigator intends to use either 40 or 70
cancer patients under each method of detection.
Questions
1. What is the effect size in this context?
2. Calculate power under each proposed sample sizes for different effect
sizes.
Some additional information.
The investigator is very familiar with RTPCR. If cancer is present, the chances are
75% that the response under RTPCR is ‘Yes.’
For power calculations, we need the following.
1. Level, α (0.05?)
2. Power
3. Effect size
Formula for effect size. The general case of k proportions
95
H0: p1 = p2 = … = pk
H1: H0 not true.
Spell out what the proportions are under the alternative hypothesis. (Like as in
the Analysis of Variance)
Let the specified values be
p1 = p11, p2 = p12, … , pk = p1k
The effect size w is defined by
w=
( p11 − po ) 2 ( p12 − p0 ) 2
( p1k − p0 ) 2
+
+ ... +
p0
p0
p0
The entity p0 is the common value of the proportions under the null hypothesis.
How do I get this?
Example (Courtesy: J . Cohen)
A market researcher is seeking to determine the relative preference by
consumers among four different package designs for a new product. Let pi be the
true proportion of customers who like Design i, i = 1, 2, 3, 4.
H0: p1 = p2 = p3 = p4
H1: H0 not true.
He wants to find the necessary sample size to conduct a survey. A pilot study of
100 consumers gave pˆ1 = 0.3750, pˆ 2 = 0.2083, pˆ 3 = 0.2083, pˆ 4 = 0.2083 .
Take these proportions as the alternative proportions. He wants a confirmation of
these proportions. What about the common value of the proportions under the
null hypothesis? Take p0 as the average of these proportions, which is 0.25.
Now compute w (0.289).
96
Let us find out what small, medium, and large effect sizes are in this
environment?
> cohen.ES(test = "chisq", size = "small")
Conventional effect size from Cohen (1982)
test = chisq
size = small
effect.size = 0.1
> cohen.ES(test = "chisq", size = "medium")
Conventional effect size from Cohen (1982)
test = chisq
size = medium
effect.size = 0.3
> cohen.ES(test = "chisq", size = "large")
Conventional effect size from Cohen (1982)
test = chisq
size = large
effect.size = 0.5
Look at the documentation on chis-quared test.
? pwr.chisq.test
Let us find the sample size for our problem.
> pwr.chisq.test(w = 0.289, df = 3, sig.level = 0.05, power = 0.80)
97
Chi squared power calculation
w = 0.289
N = 130.5368
df = 3
sig.level = 0.05
power = 0.8
NOTE: N is the number of observations
We need to survey 131 consumers.
98
Chapter 8
McNemar test
This is similar to the paired t-test for a pair of binary variables. For every subject in
the targeted population, the response is the outcome of a pair (X, Y) of binary
variables.
Example from toxicology (Courtesy: Kanishta)
Two medical devices are being compared for the presence of a particular type of
bacteria in human beings.
Device 1:
Dosimeter
Device 2:
Nasal swab
Design
Take a sample of n subjects. On each subject try both the devices.
X = Verdict from the dosimeter: Presence of bacteria: yes or no
Y = Verdict from the nasal swab: Presence of bacteria: yes or no
X and Y are correlated because the responses are coming from the same subject.
Joint distribution of responses from a random subject
Dosimeter Nasal swab Marginal
Yes
No
Yes
p11
p12
p1
No
p21
p22
p2
Marginal
q1
q2
1
p1 = Population probability that Dosimeter gives the verdict that bacteria is
present
99
q1 = Population probability that Nasal swab gives the verdict that bacteria is
present
Hypothesis of skepticism: No difference
H0: p1 = q1
H1: p1 ≠ q1
Mathematical aside: p1 = q1 if and only if p12 = p21. This observation has an
important bearing in the development of a test for this testing problem.
A sample of n subjects gives data: (X1, Y1), (X2, Y2), … , (Xn, Yn)
Each Xi is Yes or No. Each Yi is Yes or No.
Arrange the data in the form of a 2x2 contingency table.
Dosimeter Nasal swab Marginal
Yes
No
Yes
n11
n12
n1
No
n21
n22
n2
Marginal
m1
m2
n
Test Statistic: χ2 =
V3 3 X3
3 13
If the null hypothesis is true, the chi-squared statistic has a chi-squared
distribution with one degree of freedom. This is important for the computation of
the critical value or p-value.
McNemar test: Reject the null hypothesis if χ2 > e,f
.
The area under the chi-squared distribution with one degree of freedom to the
left of e,f
is 1-α.
100
What about the distribution of the chi-squared statistic under the alternative
hypothesis? This is important for sample size calculation. What is the effect size
here?
Specifications
1. Level α
2. Power 1 – β
3. Values of p12 and p21 (Go to the Joint distribution table.)
The closer the values of p12 and p21 are, the closer the values of p1 and q1.
The closer the values of p12 and p21 are, the closer the alternative and null
hypotheses are.
The closer the values of p12 and p21 are, the larger the sample size is.
An example from Agresti (1990)
A sample of 1600 people is asked whether or not they approve the performance
of the President.
Six months later, the same sample is asked whether or not they approve the
performance of the President.
The question is whether or not there is a significant shift in the approval rate.
The following are the data.
2nd Survey
1st Survey
Approve Disapprove
Approve
Disapprove
794
150
86
570
We want to test the null hypothesis
H0: p1 = q1 (No shift)
H1: p1 ≠ q1
101
Look for the documentation on McNemar test.
> ?mcnemar.test
Input the data in a matrix form.
> Performance <- matrix(c(794, 86, 150, 570), nrow = 2, dimnames = list(
+ "1st Survey" = c("Approve", "Disapprove"), "2nd Survey" = c("Approve",
+ "Disapprove")))
> Performance
2nd Survey
1st Survey
Approve
Disapprove
Approve Disapprove
794
150
86
570
> MB <- mcnemar.test(Performance)
> MB
McNemar's Chi-squared test with continuity correction
data: Performance
McNemar's chi-squared = 16.8178, df = 1, p-value = 4.115e-05
Conclusion: There is a significant shift in the approval rate. It was worse (59%
versus 55%).
An R package ‘TrialSize’ does sample size calculations for the McNemar test. It
came into existence on June 03, 2013.
Download and then activate the package.
Specifications
1. Level: 0.05
102
2. Power = 0.80 or Beta = 0.20
3. p12 = 0.2 and p21 = 0.5 The command needs the ratio of these two numbers
and the sum of these numbers.
Note that p12 and p21 are not close.
> mb <- McNemar.Test(alpha = 0.05, beta = 0.20, psai = 0.2/0.5, paid = 0.7)
> mb
[1] 58.63224
The required sample size is 59.
Change the specifications.
1. Level: 0.05
2. Power = 0.80 or Beta = 0.20
3. p12 = 0.2 and p21 = 0.3 The command needs the ratio of these two numbers
and the sum of these numbers.
Note that p12 and p21 are close.
> mb1 <- McNemar.Test(alpha = 0.05, beta = 0.20, psai = 0.2/0.3, paid = 0.5)
> mb1
[1] 390.0778
The required sample size is 391.
103
Chapter 9: Hazard Ratio in Survival Analysis
A motivating example
The target population is those patients with retinitis pigmentosa. The goal is the
prevention of visual loss. The treatment proposed is heavy intake of vitamin A
supplement.
Design
Take a sample of m patients. Each one takes 15,000 IU of vitamin A per day.
Follow each one of them for six years. This is the experimental group.
Take a sample of n patients. Each one takes 75 IU of vitamin A per day. Follow
each one of them for six years. This is the control group.
The response variable is the time T at which vision loss occurs. In the survival
analysis jargon, this is called survival time. The time T is measured in years.
In studies like this, censoring occurs. Censoring (dropouts) within the period of
study could occur for several reasons.
1.
2.
3.
4.
5.
Death
Stops taking medication because of other diseases.
Stops taking medication because of side effects.
Unwilling to continue take medications.
No vision loss occurs during the entire period of time.
Data structure
Subject
times
status
group
1
2
1
E
2
3
0
C
And so on.
Interpretation
104
The first subject was in the experimental group. Vision loss occurred in Year 2.
This is a genuine observation (status = 1) of the variable T.
The second subject was in the Control group. He dropped out in the Year 3. This is
a censored observation of T. He took vitamin A for three years and no vision loss
occurred. Then he is lost.
The goal is whether taking a mega dose of vitamin A postpones vision loss
significantly. This question can be answered via a model building endeavor. We
use Cox Proportional Hazards model. The covariate is ‘group.’
h(t; group) = h0(t)*exp(γ*group), t > 0
h() is the hazard function of the variable T at time t.
h0(t) is the baseline hazard function.
‘group’ is codified as 1 (Experimental) or 0 (Control).
HR = Hazard Ratio = exp(γ).
Interpretation
HR = 1 if and only if γ = 0, which means the mega treatment is not effective.
HR < 1 means γ < 0, which means the mega treatment is more effective than the
control treatment.
Test
H0: HR = 1
H1: HR ≠ 1
How many patients to recruit under the experimental treatment and how many
under the control treatment?
The answer is complicated by the threat of censoring. We need to know rates of
censoring under both scenarios.
105
We need to specify the level, power, and the HR we are shooting for under the
alternative.
Specifications
1 . level: 0.05
2. power: 0.80 or β = 0.2
3. HR = 0.8
We need lots of other information such as censoring rates before we can
calculate sample sizes. We need data from a pilot study.
The R package ‘powerSurvEpi’ helps calculating sample sizes in this environment.
The R command ‘ssizeCT’ uses the pilot study data thrown at it to calculate
sample sizes. Download this package and activate it.
A pilot study
Berson et al (1993) initiated a study to examine the effectiveness of the mega
treatment. Recruitment was carried out over a two-year period 1984-87. The
study was terminated in September, 1991. The data are available in the package
under the name ‘Oph.’ Let us download the data and understand it.
> data(Oph)
> dim(Oph)
[1] 354 3
> head(Oph)
times status group
1
1
1
E
2
1
1
E
3
1
1
E
4
1
0
E
106
5
1
0
E
6
1
0
E
> summary(Oph)
times
Min.
status
:1.000
:0.000
C:182
1st Qu.:4.000
1st Qu.:0.000
E:172
Median :5.000
Median :0.000
Mean
Mean
:4.401
Min.
group
:0.435
3rd Qu.:5.000
3rd Qu.:1.000
Max.
Max.
:6.000
:1.000
> table(Oph$status)
0
1
200 154
Censoring rate
> 200/354
[1] 0.5649718
Isolate the experimental group.
> OphE <- subset(Oph, Oph$group == "E")
> head(OphE)
times status group
1
1
1
E
2
1
1
E
3
1
1
E
4
1
0
E
107
5
1
0
E
6
1
0
E
> table(OphE$status)
0
1
107
65
Censoring rate under the experimental group.
> 107/172
[1] 0.622093
Isolate the Control group.
> OphC <- subset(Oph, Oph$group == "C")
> head(OphC)
times status group
173
1
1
C
174
1
1
C
175
1
1
C
176
1
1
C
177
1
1
C
178
1
1
C
> table(OphC$status)
0
1
93 89
Censoring rate under the Control group.
> 93/182
108
[1] 0.510989
Fit the Cox model.
> MB <- coxph(Surv(times, status) ~ group, data = Oph)
> summary(MB)
Call:
coxph(formula = Surv(times, status) ~ group, data = Oph)
n= 354, number of events= 154
coef exp(coef) se(coef)
groupE -0.3735
0.6883
z Pr(>|z|)
0.1633 -2.288
0.0222 *
--Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
groupE
0.6883
1.453
0.4998
0.9479
Concordance= 0.548 (se = 0.024 )
Rsquare= 0.015 (max possible= 0.991 )
Likelihood ratio test= 5.31 on 1 df, p=0.02126
Wald test
= 5.23 on 1 df, p=0.02216
Score (logrank) test = 5.29 on 1 df, p=0.0214
Use this pilot data to calculate sample sizes. The stipulation k = 1 means we want
equal sample sizes.
> MB1 <- ssizeCT(Surv(times, status) ~ group, dat = Oph, power = 0.8, k = 1,
+ RR = 0.8, alpha = 0.05)
> names(MB1)
109
[1] "mat.lambda" "mat.event" "pC"
"pE"
"ssize"
> MB1$ssize
nE nC
706 706
We need to recruit 706 patients under each treatment regimen so that we can be
sure with 80% probability that a hazard ratio of 0.8 is detectable if it is true.
Get documentation. CT stands for clinical trial.
> ?ssizeCT
starting httpd help server ... done
110
CHAPTER 10: Multiple Regression
Set-up
Y: Quantitative Response variable
Predictors: X1, X2, … , Xp
Multiple Regression Model
Y = β0 + β1*X1 + β2*X2 + … + βp*Xp + ε
ε ~ N(0, σ)
Assumptions
The vector (Y, X1, X2, … , Xp) has a multivariate normal distribution.
This assumption implies that the conditional distribution
Y | X1, X2, … , Xp ~ N(β0 + β1*X1 + β2*X2 + … + βp*Xp, σ)
This also implies that
E(Y | X1, X2, … , Xp) = β0 + β1*X1 + β2*X2 + … + βp*Xp
Var(Y | X1, X2, … , Xp) = σ2
The standard deviation σ does not depend on the covariates X1, X2, … , Xp
(Homoscedasticity)
It is customary to write the multivariate normal distribution in the multiple
regression style presented above.
The population multiple correlation coefficient between Y and X = (X1, X2, … , Xp) is
usually denoted by gL.
or simply, Ρ2.
A standard interpretation is that the percentage of variation in Y accounted by the
predictors is given by 100*Ρ2.
A mathematical formula is available for P2.
111
We want to test the null hypothesis
H0: Ρ2 = 0
against the alternative
H1: Ρ2 > 0.
Interpretation of the null hypothesis
Y and X1, X2, … , Xp are statistically independent.
β1 = β2 = … βp = 0
As a matter of fact, the null hypothesis and β1 = β2 = … βp = 0 are equivalent.
The covariates X1, X2, … , Xp have nothing to say about Y. They are useless
predictors.
Let R2 be the sample multiple correlation coefficient.
Test Statistic: F =
h3
i
/h3
N/i/
If the null hypothesis is true, F ~ F distribution with numerator degrees of
freedom p and denominator degrees of freedom n – p – 1. The critical value for a
given level can be found from the F distribution in R.
Interpretation of the alternative hypothesis
At least one βi ≠ 0.
At least one covariate is a useful predictor.
What should be the sample size n?
Spell out the alternative value of Ρ2.
112
Under this given alternative value of Ρ2, F has a non-central F distribution with
numerator degrees of freedom p and denominator degrees of freedom n – p – 1
and non-centrality parameter n*
j3
j3
.
Now, we can develop an integral equation to solve for the desired sample size.
The package MBESS has a command, which gives the sample size.
Specifications
1.
2.
3.
4.
Level
Power
Number of predictors
Alternative value of the population Ρ2.
Look at the documentation.
> ?MBESS
Look at the documentation of the following command.
> ?ss.power.R2
Let us calculate the sample size for the following specifications.
Level = 0.05
Power = 0.80
Ρ2 = 0.25
Number of predictors = 5
> MB <- ss.power.R2(Population.R2 = 0.25, alpha.level = 0.05, desired.power =
+ 0.80, p = 5)
> MB
$Necessary.Sample.Size
113
[1] 45
$Actual.Power
[1] 0.8075842
$Noncentral.F.Parm
[1] 15
$Effect.Size
[1] 0.3333333
Lower the value of Ρ2.
> MB <- ss.power.R2(Population.R2 = 0.2, alpha.level = 0.05, desired.power =
+ 0.80, p = 5)
> MB
$Necessary.Sample.Size
[1] 58
$Actual.Power
[1] 0.8078919
$Noncentral.F.Parm
[1] 14.5
$Effect.Size
[1] 0.25
Let us calculate the sample size for each of the configurations coming from:
Level: 0.01, 0.05, 0.10
Power: 0.80, 0.90, 0.95
114
Alternative Ρ2: 0.2, 0.5, 0.8
Number of variables: 3, 4, 5
There are 3*3*3*3 = 81 configuations.
We need a four-dimensional array.
> Level <- c(0.01, 0.05, 0.10)
> Power <- c(0.80, 0.90, 0.95)
> RSquared <- c(0.2, 0.5, 0.8)
> Variables <- c(3, 4, 5)
Create a four-dimensional array of zeros.
> Sizes <- rep(0, 81)
> dim(Sizes) <- c(3, 3, 3, 3)
Label the rows, columns, and doubly-indexed slices.
> dimnames(Sizes)[[1]] <- c("Level 0.01", "Level 0.05", "Level 0.10")
> dimnames(Sizes)[[2]] <- c("Power 80%", "Power 90%", "Power 95%")
> dimnames(Sizes)[[3]] <- c("R2 = 0.2", "R2 = 0.5", "R2 = 0.8")
> dimnames(Sizes)[[4]] <- c("p = 3", "p = 4", "p = 5")
Start four loops.
> for (i in 1:3)
+{
+ for (j in 1:3)
+{
+ for (k in 1:3)
115
+{
+ for (l in 1:3)
+{
+ Sizes[i, j, k, l] <- ss.power.R2(Population.R2 = RSquared[k], alpha.level =
+ Level[i], desired.power = Power[j], p = Variables[l])$Necessary.Sample.Size
+}
+}
+}
+}
> Sizes
, , R2 = 0.2, p = 3
Power 80% Power 90% Power 95%
Level 0.01
68
83
97
Level 0.05
48
61
73
Level 0.10
39
51
62
, , R2 = 0.5, p = 3
Power 80% Power 90% Power 95%
Level 0.01
22
26
29
Level 0.05
16
19
22
Level 0.10
13
16
19
, , R2 = 0.8, p = 3
Power 80% Power 90% Power 95%
Level 0.01
11
12
116
13
Level 0.05
8
9
10
Level 0.10
7
8
9
, , R2 = 0.2, p = 4
Power 80% Power 90% Power 95%
Level 0.01
74
90
105
Level 0.05
53
67
80
Level 0.10
43
56
68
, , R2 = 0.5, p = 4
Power 80% Power 90% Power 95%
Level 0.01
24
28
32
Level 0.05
18
21
24
Level 0.10
15
18
21
, , R2 = 0.8, p = 4
Power 80% Power 90% Power 95%
Level 0.01
12
13
14
Level 0.05
10
11
11
Level 0.10
9
9
10
, , R2 = 0.2, p = 5
Power 80% Power 90% Power 95%
Level 0.01
80
96
111
Level 0.05
58
72
85
Level 0.10
47
61
73
, , R2 = 0.5, p = 5
Power 80% Power 90% Power 95%
Level 0.01
26
31
117
34
Level 0.05
20
23
27
Level 0.10
17
20
23
, , R2 = 0.8, p = 5
Power 80% Power 90% Power 95%
Level 0.01
13
15
16
Level 0.05
11
12
13
Level 0.10
10
10
11
Understanding?
118