Sample Size and Power Analysis for Microarray Studies

Sample Size and Power Analysis for
Microarray Studies
Maarten van Iterson and Ren´ee de Menezes
Center for Human and Clinical Genetics,
Leiden University Medical Center, The Netherlands
Package SSPA, version 2.0.0
March 29, 2011
Contents
1 Introduction
1
2 Basic Example
2
3 Compare multiple experiments
4
4 Using SSPA in combination with limma
7
5 Additional functionality
10
5.1 Ferreira’s π0 estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
5.2 Rupperts estimation method . . . . . . . . . . . . . . . . . . . . . . . . 11
6 Details
11
1 Introduction
This document shows the functionality of the R-package SSPA. The package performs
power and sample size analysis using a method described by [1, 2]. Our implementation allows for fast and realistic estimates of power and sample size for microarray
experiments, given pilot data. By means of two simple commands (pilotData(),
sampleSize()), a researcher can read their data in and compute the desired estimates.
Other functions are provided to facilitate interpretation of results.
Given a set of test statistics from the pilot data, the knowledge of their distribution
under the null hypothesis and the sample size used to compute them, the method
1
estimates the power for a given false discovery rate. The multiple testing problem is
controlled through the adaptive version of the Benjamini and Hochberg method [3].
For more details about the implementation and method we refer to [1, 2]. In [4] we
describe two biological case studies using the package SSPA.
For comparison we implement the power and sample size estimation method proposed by Ruppert [5] which uses a different estimation approach. Two additional
packages need to be installed for using this method namely quadprog and splines.
2 Basic Example
We demonstrate the functionality of the package by using preprocessed gene expression
data from the leukemia ALL/AML study of [6] as provided by multtest package. To
load the leukemia dataset, use data(golub) for a short description of the experiment
and data use ?golub.
> library(multtest)
> data(golub)
The required input for the sample size and power analysis is a vector of test-statistics
and the effective sample sizes. The test-statistics are obtained by a differential gene expression analysis using one of the available packages like limma, maanova, multtest
(it is also possible to import the vector of test-statistics in R if they are calculated
using another software package). Here we will use the function mt.teststat from the
multtest package to obtain a vector of test-statistics from the leukemia data. The
effective sample size for a two-group experiment is given by:
−1
1
1
+
,
(1)
N=
nA
nB
where nA is the sample size of one group and nB the other.
> tst <- mt.teststat(golub, golub.cl)
> X <- model.matrix(~golub.cl)
> effectivesamplesize <- 1/solve(crossprod(X))[2,
+
2]
The first step in performing the sample size and power analysis is creating a object
of class PilotData which will contain all the necessary information needed for the
following power and sample size analysis. A user-friendly interface for creating an
object of PilotData is available as pilotData().
> library(SSPA)
> pd <- pilotData(statistics = tst, effectivesamplesize = effectivesamplesize)
Several ways of viewing the content of the PilotData object are possible either
graphically or using a show-method by just typing the name of the created object of
PilotData:
2
> pd
An object of class "PilotData"
Number of test-statistics: 3051
Effective sample size:
7.82
Null distribution:
Normal
By using plot(pd) several diagnostic plots are produced.
> print(plot(pd))
4
3
2
1
0
0.15
0.10
0.05
0.00
−5
0
5
10
0.0
0.2
test statistic
test statistic
10
5
0
−5
0
0.6
0.8
1.0
0.8
1.0
p−value
1.0
0.8
0.6
0.4
0.2
0.0
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
−2
0.4
2
0.0
qnorm
0.2
0.4
0.6
p−value (sorted)
Figure 1: Diagnostic plots of the pilot-data: Upper-left panel show a histogram of the
test statistics, upper right the corresponding histogram of p-values. The
lower left-panel shows a quantile-quantile plot for the test statistics and in
the lower right-panel the order p-values are plotted against their ranks. In
this example a large effect is visual.
Next we create an object of class SampleSize which will perform the estimation
of the proportion of non-differentially expressed genes and the density of effect sizes.
Several options are available see ?sampleSize. The default method for estimation of
3
the proportion of non-differentially expressed genes is the method proposed by [7] as
implemented by convest() function from the package limma. Additionally the method
by [8] and [2] are available, also a user-defined proportion is possible. Again a generic
show-method is implemented as well.
> ss <- sampleSize(pd)
> ss
An object of class "SampleSize"
Distribution of effect sizes is estimated from -6 to 6 using 1024 points.
Method for estimation of the proportion of non-differentially expressed: "Langaas
Fraction of non-differentially expressed genes: 0.4702 (adjusted=0.4761).
Kernel used in the deconvolution is "fan" with bandwidth 0.353.
The density of effect size can be shown by a call to plot(). When there are both
up- and down-regulated genes a bimodal density is observed.
Estimation of the average power for other sample sizes then that of the pilot-data
can be performed with the power()-function. The user can also give desired false
discovery rate level or possible multiple false discovery rate levels as a vector.
3 Compare multiple experiments
Relative power and sample size analysis is useful when multiple experiments can be
compared e.g. using different treatments or exposure times. The following example is described in more detail in [4]. In this experiment the outcome of specific
PPAR-alpha activation on murine small intestinal gene expression was examined using Affymetrix GeneChip Mouse 430 2.0 arrays. PPAR-alpha was activated by several
PPAR-alpha-agonists that differed in activating potency. In this paper the data of
three agonists were used, namely Wy14,643, fenofibrate and trilinolenin (C18:3). The
first two compounds belong to the fibrate class of drugs that are widely prescribed
to treat dyslipidemia, whereas trilinolenin is an agonist frequently found in the human diet. For intestinal PPAR-alpha, Wy14,643 is the most potent agonist followed
by C18:3 and fenofibrate. Since time of exposure also affects the effect size, intestines
were collected 6 hrs (all three agonists) or 5 days (Wy14,643 and fenofibrate only) after
exposure. The data is available from the SSPA-package using data(Nutrigenomics)
and represents the test statistics for the five experimental conditionals. The first row
of the data-matrix contains the effective sample sizes. In order to construct a PilotData object we use the apply-family of function for convinience. Here we use the
normal distributional assumption and Storey’s method to estimated the proportion of
nondifferentially expressed genes because these methods are the fastest.
> data(Nutrigenomics)
> str(Nutrigenomics)
'data.frame':
16540 obs. of 5 variables:
$ wy5d : num 2 1.22 1.19 -1.14 1 0.86 4 -1.83 -2.64 3.04 ...
$ feno5d: num 2 -1.03 -1.11 0.33 0.22 0.22 0.24 0.36 1.73 0.54 ...
4
> print(plot(ss))
density of effect sizes
0.20
0.15
0.10
0.05
0.00
−6
−4
−2
0
2
4
6
effect size
Figure 2: Density of effect-sizes:
$ X1836h: num 2.5 -2.3 -0.56 0.06 -0.89 -0.09 2 -2.12 -1.47 -1.69 ...
$ wy6h : num 2 1.12 0.21 -0.68 -1.06 -0.4 1.82 -1.71 1.2 1.03 ...
$ feno6h: num 2.22 0.76 -0.37 -0.07 -1.9 ...
> pd <- apply(Nutrigenomics, 2, function(x) pilotData(statistics = x[-1],
+
effectivesamplesize = x[1], distribution = "Normal"))
> ss <- lapply(pd, sampleSize, method = "Storey")
We could use the plot to view the density of effectsizes of each set of test statistics
but it would be more informative to use view them together in one plot e.g. using the
lattice function xyplot.
The sample sizes of these experiments were in the range of 4 to 5 per group with not
all experiments balanced. What would be the average power gain using 10 sample per
group? The following code answers this question again for all experiments together.
5
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.90
●
0.85
Power
0.95
1.00
> 1/(1/27 + 1/11)
[1] 7.81579
> 1/(1/27 + 1/seq(11, 27, 2))
[1] 7.815789 8.775000 9.642857 10.431818 11.152174
[6] 11.812500 12.420000 12.980769 13.500000
> pwr <- power(ss, plot = FALSE, effectivesamplesizes = 1/(1/27 +
+
1/seq(11, 27, 2)), fdr = c(0.01, 0.05))
> matplot(seq(11, 27, 2), pwr, type = "b", pch = 1,
+
ylim = c(0.8, 1), ylab = "Power", xlab = "Sample size")
> grid()
> legend("bottomright", colnames(pwr), col = c(1:ncol(pwr)),
+
pch = 1, lty = 1)
●
0.80
●
15
20
fdr = 0.01
fdr = 0.05
25
Sample size
Figure 3: Estimated average power for different sample sizes and FDR-levels: The
sample size of one group was fixed at 27, the pilot data sample size and the
other was increased from 11 (pilot data sample size) to 27.
6
> library(lattice)
> effectsize <- data.frame(exposure = factor(rep(rep(c("5 Days",
+
"6 Hours"), c(2, 3)), each = 1024)), compound = factor(rep(c("Wy14,643",
+
"fenofibrate", "trilinolenin (C18:3)", "Wy14,643",
+
"fenofibrate"), each = 1024)), lambda = as.vector(sapply(ss,
+
lambda)), theta = as.vector(sapply(ss, theta)))
> print(xyplot(lambda ~ theta | exposure, group = compound,
+
data = effectsize, type = c("g", "l"), layout = c(1,
+
2), lwd = 2, xlab = "effect size", ylab = "",
+
auto.key = list(columns = 3, lines = TRUE,
+
points = FALSE)))
fenofibrate
trilinolenin (C18:3)
Wy14,643
6 Hours
0.4
0.3
0.2
0.1
0.0
5 Days
0.4
0.3
0.2
0.1
0.0
−6
−4
−2
0
2
4
6
effect size
Figure 4: Density of effect size of the five sets of test statistics.
4 Using SSPA in combination with limma
limma is a frequently used package for differential expression analysis of microarray
experiments. Here we will show how SSPA can be used with the results from limma.
7
> samplesize <- 2 * seq(4, 8)
> averagepower <- data.frame(power = as.vector(sapply(ss,
+
function(x) as.numeric(power(x, effectivesamplesize = samplesize/2,
+
fdr = 0.05)[, 1]))), exposure = factor(rep(rep(c("5 Days",
+
"6 Hours"), c(2, 3)), each = length(samplesize))),
+
compound = factor(rep(c("Wy14,643", "fenofibrate",
+
"trilinolenin (C18:3)", "Wy14,643", "fenofibrate"),
+
each = length(samplesize))), samplesize = rep(samplesize,
+
5))
> print(xyplot(power ~ samplesize | exposure, group = compound,
+
data = averagepower, type = c("g", "b"), layout = c(1,
+
2), lwd = 2, pch = 16, xlab = "sample size (per group)",
+
ylab = "", auto.key = list(columns = 3, lines = TRUE,
+
points = FALSE)))
fenofibrate
trilinolenin (C18:3)
Wy14,643
6 Hours
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
5 Days
0.9
0.8
0.7
0.6
0.5
0.4
●
●
8
●
●
●
●
●
●
●
●
10
12
14
16
sample size (per group)
Figure 5: Power curves for the five experiments.
8
0.9
0.8
0.7
0.6
0.5
0.4
First obtain limma fit of the data. Here we will use the swirl example data from
marray which is also extensively described in the limma useguide.
>
>
>
>
>
>
>
>
>
>
library(marray)
data(swirl)
library(convert)
library(limma)
swirl <- as(swirl, "RGList")
MA <- normalizeWithinArrays(swirl)
design <- c(-1, 1, -1, 1)
fit <- lmFit(MA, design)
ordinary.t <- fit$coef/fit$stdev.unscaled/fit$sigma
fitModerated <- eBayes(fit)
Now we load the SSPA-package and create two pilotData-objects, one for the ordinary t test-statistics and one for the moderated t test-statistics with the appropriate
degrees of freedom and effective sample size. Futher information on the moderated
test statistics can be found in [9, 10].
>
>
+
>
>
>
+
+
>
+
+
library(SSPA)
effectivesamplesize <- 1/solve(crossprod(design))[1,
1]
nu <- fit$df.residual[1]
nu0 <- fitModerated$df.prior
pd <- pilotData(statistics = ordinary.t[, 1],
effectivesamplesize = effectivesamplesize,
dof = nu, distribution = "Student")
pdMod <- pilotData(statistics = fitModerated$t[,
1], effectivesamplesize = effectivesamplesize,
dof = nu + nu0, distribution = "Student")
The data comes from a dye-swap experiment with four microarrays, thus for each
RNA source we have two biological and two technical replicates. Because this is an
dye-swap experiment we can estimate the difference between the biological sources,
leaving three degrees of freedom for estimation of the error.
The pdMod-object has a moderated degrees of freedom, the moderated Student tdistribution will be used for the analysis. The effective sample size was defined by:
N = (1/nX + 1/nY )−1 , thus with both four samples gives two.
> ss <- sampleSize(pd)
> ssMod <- sampleSize(pdMod)
Again estimation of the average power for other sample sizes then that of the pilotdata can be performed using:
For small sample sizes the moderated test-statistics has higher power for larger
sample sizes both statistics converge to eachother, as expected.
9
> library(lattice)
> data <- data.frame(theta = rep(theta(ss), 2),
+
lambda = c(lambda(ss), lambda(ssMod)), statistic = rep(c("original",
+
"moderated"), each = length(theta(ss))))
> print(xyplot(lambda ~ theta, group = statistic,
+
data, type = c("g", "l"), xlab = "effect size",
+
ylab = "", auto.key = list(columns = 2, points = FALSE,
+
lines = TRUE)))
moderated
original
0.25
0.20
0.15
0.10
0.05
0.00
−6
−4
−2
0
2
4
effect size
Figure 6: Density of effect size for moderated and orginal t test-statistics.
5 Additional functionality
5.1 Ferreira’s π0 estimate
Ferreira et al. propose a semi-parametric method to estimate the proportion of nondifferentially expressed genes[1, 2].
10
6
> ss <- sampleSize(pd, method = "Ferreira", pi0 = seq(0.05,
+
0.5, 0.05), doplot = TRUE)
5.2 Rupperts estimation method
For using the method proposed by Ruppert [5] two additional packages need to be
installed for using this method namely quadprog and splines. Both π0 and the
density of effect sizes is estimated by Ruppert’s method.
> ss <- sampleSize(pd, method = "Ruppert", nKnots = 11,
+
bDegree = 3)
6 Details
This document was written using:
ˆ R version 2.13.0 Under development (unstable) (2010-12-16 r53865),
x86_64-unknown-linux-gnu
ˆ Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8,
LC_COLLATE=en_US.UTF-8, LC_MONETARY=C, LC_MESSAGES=en_US.UTF-8,
LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C,
LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C
ˆ Base packages: base, datasets, graphics, grDevices, methods, stats, utils
ˆ Other packages: Biobase 2.11.6, convert 1.27.0, lattice 0.19-13, limma 3.7.22,
marray 1.29.1, multtest 2.7.1, qvalue 1.25.0, SSPA 2.0.0
ˆ Loaded via a namespace (and not attached): grid 2.13.0, MASS 7.3-9,
splines 2.13.0, survival 2.36-2, tcltk 2.13.0, tools 2.13.0
References
[1] J.A. Ferreira and A. Zwinderman. Approximate Power and Sample Size Calculations with the Benjamini-Hochberg Method. International Journal of Biostatistics, 2(1), 2006.
[2] J.A. Ferreira and A. Zwinderman. Approximate Sample Size Calculations with
Microarray Data: An Illustration. Statistical Applications in Genetics and Molecular Biology, 5(1), 2006.
[3] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical
and powerful approach to multiple testing. Journal of the Royal Statistical Society:
Series B, 57:289–300, 1995.
11
[4] M. van Iterson, P.A.C. ’t Hoen, P. Pedotti, G.J.E.J. Hooiveld, J.T. den Dunnen,
G.J.B. van Ommen, J.M. Boer, and R.X. Menezes. Relative power and sample
size analysis on gene expression profiling data. BMC Genomics, 10:439–, 2009.
[5] D. Ruppert, D. Nettleton, and J.T.G. Hwang. Exploring the information in pvalues for the analysis and planning of multiple-test experiments. Biometrics,
63(2):483–95, 2007.
[6] T R Golub, D K Slonim, P Tamayo, C Huard, M Gaasenbeek, J P Mesirov,
H Coller, M L Loh, J R Downing, M A Caligiuri, C D Bloomfield, and E S
Lander. Molecular classification of cancer: class discovery and class prediction by
gene expression monitoring. Science, 286(5439):531–7, 1999.
[7] M. Langaas, B.H. Lindqvist, and E. Ferkingstad. Estimating the proportion of
true null hypotheses, with application to DNA microarray data. Journal of the
Royal Statistical Society: Series B, 67(4):555–572, 2005.
[8] J.D. Storey. A direct approach to false discovery rates. Journal of the Royal
Statistical Society: Series B, 64:479–498, 2002.
[9] G. Wright and R. Simon. A random variance model for detection of differential
gene expression in small microarray experiments. Bioinformatics, 19(18):2448–
2455, 2003.
[10] R.X. Menezes, J.M. Boer, and H. van Houwelingen. Microarray data analysis: a
hierarchical t-test to handle heteroscedasticity. Applied Bioinformatics, 3(4):229–
235, 2004.
12
>
>
+
>
+
>
+
+
+
>
+
+
+
+
samplesizes <- 2 * c(2, 4, 6)
pwr <- power(ss, plot = FALSE, effectivesamplesizes = samplesizes/2,
fdr = 0.1)
pwrMod <- power(ssMod, plot = FALSE, effectivesamplesizes = samplesizes/2,
fdr = 0.1)
averagepower <- data.frame(power = c(pwr, pwrMod),
statistics = rep(c("original", "moderated"),
each = 3), samplesizes = rep(samplesizes,
2))
print(xyplot(power ~ samplesizes, group = statistics,
data = averagepower, type = c("g", "b"), lwd = 2,
pch = 16, xlab = "sample size (per group)",
ylab = "", auto.key = list(columns = 2, lines = TRUE,
points = FALSE)))
moderated
original
0.9
●
0.8
●
●
0.7
●
0.6
●
0.5
0.4
0.3
●
4
6
8
10
sample size (per group)
Figure 7: Estimated average power for other sample sizes then that of the pilot-data for
both moderated and orginal t test-statistics and for different false discovery
rate thresholds.
13
12