Sample Size and Power Analysis for Microarray Studies Maarten van Iterson and Ren´ee de Menezes Center for Human and Clinical Genetics, Leiden University Medical Center, The Netherlands Package SSPA, version 2.0.0 March 29, 2011 Contents 1 Introduction 1 2 Basic Example 2 3 Compare multiple experiments 4 4 Using SSPA in combination with limma 7 5 Additional functionality 10 5.1 Ferreira’s π0 estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.2 Rupperts estimation method . . . . . . . . . . . . . . . . . . . . . . . . 11 6 Details 11 1 Introduction This document shows the functionality of the R-package SSPA. The package performs power and sample size analysis using a method described by [1, 2]. Our implementation allows for fast and realistic estimates of power and sample size for microarray experiments, given pilot data. By means of two simple commands (pilotData(), sampleSize()), a researcher can read their data in and compute the desired estimates. Other functions are provided to facilitate interpretation of results. Given a set of test statistics from the pilot data, the knowledge of their distribution under the null hypothesis and the sample size used to compute them, the method 1 estimates the power for a given false discovery rate. The multiple testing problem is controlled through the adaptive version of the Benjamini and Hochberg method [3]. For more details about the implementation and method we refer to [1, 2]. In [4] we describe two biological case studies using the package SSPA. For comparison we implement the power and sample size estimation method proposed by Ruppert [5] which uses a different estimation approach. Two additional packages need to be installed for using this method namely quadprog and splines. 2 Basic Example We demonstrate the functionality of the package by using preprocessed gene expression data from the leukemia ALL/AML study of [6] as provided by multtest package. To load the leukemia dataset, use data(golub) for a short description of the experiment and data use ?golub. > library(multtest) > data(golub) The required input for the sample size and power analysis is a vector of test-statistics and the effective sample sizes. The test-statistics are obtained by a differential gene expression analysis using one of the available packages like limma, maanova, multtest (it is also possible to import the vector of test-statistics in R if they are calculated using another software package). Here we will use the function mt.teststat from the multtest package to obtain a vector of test-statistics from the leukemia data. The effective sample size for a two-group experiment is given by: −1 1 1 + , (1) N= nA nB where nA is the sample size of one group and nB the other. > tst <- mt.teststat(golub, golub.cl) > X <- model.matrix(~golub.cl) > effectivesamplesize <- 1/solve(crossprod(X))[2, + 2] The first step in performing the sample size and power analysis is creating a object of class PilotData which will contain all the necessary information needed for the following power and sample size analysis. A user-friendly interface for creating an object of PilotData is available as pilotData(). > library(SSPA) > pd <- pilotData(statistics = tst, effectivesamplesize = effectivesamplesize) Several ways of viewing the content of the PilotData object are possible either graphically or using a show-method by just typing the name of the created object of PilotData: 2 > pd An object of class "PilotData" Number of test-statistics: 3051 Effective sample size: 7.82 Null distribution: Normal By using plot(pd) several diagnostic plots are produced. > print(plot(pd)) 4 3 2 1 0 0.15 0.10 0.05 0.00 −5 0 5 10 0.0 0.2 test statistic test statistic 10 5 0 −5 0 0.6 0.8 1.0 0.8 1.0 p−value 1.0 0.8 0.6 0.4 0.2 0.0 ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● −2 0.4 2 0.0 qnorm 0.2 0.4 0.6 p−value (sorted) Figure 1: Diagnostic plots of the pilot-data: Upper-left panel show a histogram of the test statistics, upper right the corresponding histogram of p-values. The lower left-panel shows a quantile-quantile plot for the test statistics and in the lower right-panel the order p-values are plotted against their ranks. In this example a large effect is visual. Next we create an object of class SampleSize which will perform the estimation of the proportion of non-differentially expressed genes and the density of effect sizes. Several options are available see ?sampleSize. The default method for estimation of 3 the proportion of non-differentially expressed genes is the method proposed by [7] as implemented by convest() function from the package limma. Additionally the method by [8] and [2] are available, also a user-defined proportion is possible. Again a generic show-method is implemented as well. > ss <- sampleSize(pd) > ss An object of class "SampleSize" Distribution of effect sizes is estimated from -6 to 6 using 1024 points. Method for estimation of the proportion of non-differentially expressed: "Langaas Fraction of non-differentially expressed genes: 0.4702 (adjusted=0.4761). Kernel used in the deconvolution is "fan" with bandwidth 0.353. The density of effect size can be shown by a call to plot(). When there are both up- and down-regulated genes a bimodal density is observed. Estimation of the average power for other sample sizes then that of the pilot-data can be performed with the power()-function. The user can also give desired false discovery rate level or possible multiple false discovery rate levels as a vector. 3 Compare multiple experiments Relative power and sample size analysis is useful when multiple experiments can be compared e.g. using different treatments or exposure times. The following example is described in more detail in [4]. In this experiment the outcome of specific PPAR-alpha activation on murine small intestinal gene expression was examined using Affymetrix GeneChip Mouse 430 2.0 arrays. PPAR-alpha was activated by several PPAR-alpha-agonists that differed in activating potency. In this paper the data of three agonists were used, namely Wy14,643, fenofibrate and trilinolenin (C18:3). The first two compounds belong to the fibrate class of drugs that are widely prescribed to treat dyslipidemia, whereas trilinolenin is an agonist frequently found in the human diet. For intestinal PPAR-alpha, Wy14,643 is the most potent agonist followed by C18:3 and fenofibrate. Since time of exposure also affects the effect size, intestines were collected 6 hrs (all three agonists) or 5 days (Wy14,643 and fenofibrate only) after exposure. The data is available from the SSPA-package using data(Nutrigenomics) and represents the test statistics for the five experimental conditionals. The first row of the data-matrix contains the effective sample sizes. In order to construct a PilotData object we use the apply-family of function for convinience. Here we use the normal distributional assumption and Storey’s method to estimated the proportion of nondifferentially expressed genes because these methods are the fastest. > data(Nutrigenomics) > str(Nutrigenomics) 'data.frame': 16540 obs. of 5 variables: $ wy5d : num 2 1.22 1.19 -1.14 1 0.86 4 -1.83 -2.64 3.04 ... $ feno5d: num 2 -1.03 -1.11 0.33 0.22 0.22 0.24 0.36 1.73 0.54 ... 4 > print(plot(ss)) density of effect sizes 0.20 0.15 0.10 0.05 0.00 −6 −4 −2 0 2 4 6 effect size Figure 2: Density of effect-sizes: $ X1836h: num 2.5 -2.3 -0.56 0.06 -0.89 -0.09 2 -2.12 -1.47 -1.69 ... $ wy6h : num 2 1.12 0.21 -0.68 -1.06 -0.4 1.82 -1.71 1.2 1.03 ... $ feno6h: num 2.22 0.76 -0.37 -0.07 -1.9 ... > pd <- apply(Nutrigenomics, 2, function(x) pilotData(statistics = x[-1], + effectivesamplesize = x[1], distribution = "Normal")) > ss <- lapply(pd, sampleSize, method = "Storey") We could use the plot to view the density of effectsizes of each set of test statistics but it would be more informative to use view them together in one plot e.g. using the lattice function xyplot. The sample sizes of these experiments were in the range of 4 to 5 per group with not all experiments balanced. What would be the average power gain using 10 sample per group? The following code answers this question again for all experiments together. 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.90 ● 0.85 Power 0.95 1.00 > 1/(1/27 + 1/11) [1] 7.81579 > 1/(1/27 + 1/seq(11, 27, 2)) [1] 7.815789 8.775000 9.642857 10.431818 11.152174 [6] 11.812500 12.420000 12.980769 13.500000 > pwr <- power(ss, plot = FALSE, effectivesamplesizes = 1/(1/27 + + 1/seq(11, 27, 2)), fdr = c(0.01, 0.05)) > matplot(seq(11, 27, 2), pwr, type = "b", pch = 1, + ylim = c(0.8, 1), ylab = "Power", xlab = "Sample size") > grid() > legend("bottomright", colnames(pwr), col = c(1:ncol(pwr)), + pch = 1, lty = 1) ● 0.80 ● 15 20 fdr = 0.01 fdr = 0.05 25 Sample size Figure 3: Estimated average power for different sample sizes and FDR-levels: The sample size of one group was fixed at 27, the pilot data sample size and the other was increased from 11 (pilot data sample size) to 27. 6 > library(lattice) > effectsize <- data.frame(exposure = factor(rep(rep(c("5 Days", + "6 Hours"), c(2, 3)), each = 1024)), compound = factor(rep(c("Wy14,643", + "fenofibrate", "trilinolenin (C18:3)", "Wy14,643", + "fenofibrate"), each = 1024)), lambda = as.vector(sapply(ss, + lambda)), theta = as.vector(sapply(ss, theta))) > print(xyplot(lambda ~ theta | exposure, group = compound, + data = effectsize, type = c("g", "l"), layout = c(1, + 2), lwd = 2, xlab = "effect size", ylab = "", + auto.key = list(columns = 3, lines = TRUE, + points = FALSE))) fenofibrate trilinolenin (C18:3) Wy14,643 6 Hours 0.4 0.3 0.2 0.1 0.0 5 Days 0.4 0.3 0.2 0.1 0.0 −6 −4 −2 0 2 4 6 effect size Figure 4: Density of effect size of the five sets of test statistics. 4 Using SSPA in combination with limma limma is a frequently used package for differential expression analysis of microarray experiments. Here we will show how SSPA can be used with the results from limma. 7 > samplesize <- 2 * seq(4, 8) > averagepower <- data.frame(power = as.vector(sapply(ss, + function(x) as.numeric(power(x, effectivesamplesize = samplesize/2, + fdr = 0.05)[, 1]))), exposure = factor(rep(rep(c("5 Days", + "6 Hours"), c(2, 3)), each = length(samplesize))), + compound = factor(rep(c("Wy14,643", "fenofibrate", + "trilinolenin (C18:3)", "Wy14,643", "fenofibrate"), + each = length(samplesize))), samplesize = rep(samplesize, + 5)) > print(xyplot(power ~ samplesize | exposure, group = compound, + data = averagepower, type = c("g", "b"), layout = c(1, + 2), lwd = 2, pch = 16, xlab = "sample size (per group)", + ylab = "", auto.key = list(columns = 3, lines = TRUE, + points = FALSE))) fenofibrate trilinolenin (C18:3) Wy14,643 6 Hours ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 Days 0.9 0.8 0.7 0.6 0.5 0.4 ● ● 8 ● ● ● ● ● ● ● ● 10 12 14 16 sample size (per group) Figure 5: Power curves for the five experiments. 8 0.9 0.8 0.7 0.6 0.5 0.4 First obtain limma fit of the data. Here we will use the swirl example data from marray which is also extensively described in the limma useguide. > > > > > > > > > > library(marray) data(swirl) library(convert) library(limma) swirl <- as(swirl, "RGList") MA <- normalizeWithinArrays(swirl) design <- c(-1, 1, -1, 1) fit <- lmFit(MA, design) ordinary.t <- fit$coef/fit$stdev.unscaled/fit$sigma fitModerated <- eBayes(fit) Now we load the SSPA-package and create two pilotData-objects, one for the ordinary t test-statistics and one for the moderated t test-statistics with the appropriate degrees of freedom and effective sample size. Futher information on the moderated test statistics can be found in [9, 10]. > > + > > > + + > + + library(SSPA) effectivesamplesize <- 1/solve(crossprod(design))[1, 1] nu <- fit$df.residual[1] nu0 <- fitModerated$df.prior pd <- pilotData(statistics = ordinary.t[, 1], effectivesamplesize = effectivesamplesize, dof = nu, distribution = "Student") pdMod <- pilotData(statistics = fitModerated$t[, 1], effectivesamplesize = effectivesamplesize, dof = nu + nu0, distribution = "Student") The data comes from a dye-swap experiment with four microarrays, thus for each RNA source we have two biological and two technical replicates. Because this is an dye-swap experiment we can estimate the difference between the biological sources, leaving three degrees of freedom for estimation of the error. The pdMod-object has a moderated degrees of freedom, the moderated Student tdistribution will be used for the analysis. The effective sample size was defined by: N = (1/nX + 1/nY )−1 , thus with both four samples gives two. > ss <- sampleSize(pd) > ssMod <- sampleSize(pdMod) Again estimation of the average power for other sample sizes then that of the pilotdata can be performed using: For small sample sizes the moderated test-statistics has higher power for larger sample sizes both statistics converge to eachother, as expected. 9 > library(lattice) > data <- data.frame(theta = rep(theta(ss), 2), + lambda = c(lambda(ss), lambda(ssMod)), statistic = rep(c("original", + "moderated"), each = length(theta(ss)))) > print(xyplot(lambda ~ theta, group = statistic, + data, type = c("g", "l"), xlab = "effect size", + ylab = "", auto.key = list(columns = 2, points = FALSE, + lines = TRUE))) moderated original 0.25 0.20 0.15 0.10 0.05 0.00 −6 −4 −2 0 2 4 effect size Figure 6: Density of effect size for moderated and orginal t test-statistics. 5 Additional functionality 5.1 Ferreira’s π0 estimate Ferreira et al. propose a semi-parametric method to estimate the proportion of nondifferentially expressed genes[1, 2]. 10 6 > ss <- sampleSize(pd, method = "Ferreira", pi0 = seq(0.05, + 0.5, 0.05), doplot = TRUE) 5.2 Rupperts estimation method For using the method proposed by Ruppert [5] two additional packages need to be installed for using this method namely quadprog and splines. Both π0 and the density of effect sizes is estimated by Ruppert’s method. > ss <- sampleSize(pd, method = "Ruppert", nKnots = 11, + bDegree = 3) 6 Details This document was written using: R version 2.13.0 Under development (unstable) (2010-12-16 r53865), x86_64-unknown-linux-gnu Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=C, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C Base packages: base, datasets, graphics, grDevices, methods, stats, utils Other packages: Biobase 2.11.6, convert 1.27.0, lattice 0.19-13, limma 3.7.22, marray 1.29.1, multtest 2.7.1, qvalue 1.25.0, SSPA 2.0.0 Loaded via a namespace (and not attached): grid 2.13.0, MASS 7.3-9, splines 2.13.0, survival 2.36-2, tcltk 2.13.0, tools 2.13.0 References [1] J.A. Ferreira and A. Zwinderman. Approximate Power and Sample Size Calculations with the Benjamini-Hochberg Method. International Journal of Biostatistics, 2(1), 2006. [2] J.A. Ferreira and A. Zwinderman. Approximate Sample Size Calculations with Microarray Data: An Illustration. Statistical Applications in Genetics and Molecular Biology, 5(1), 2006. [3] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57:289–300, 1995. 11 [4] M. van Iterson, P.A.C. ’t Hoen, P. Pedotti, G.J.E.J. Hooiveld, J.T. den Dunnen, G.J.B. van Ommen, J.M. Boer, and R.X. Menezes. Relative power and sample size analysis on gene expression profiling data. BMC Genomics, 10:439–, 2009. [5] D. Ruppert, D. Nettleton, and J.T.G. Hwang. Exploring the information in pvalues for the analysis and planning of multiple-test experiments. Biometrics, 63(2):483–95, 2007. [6] T R Golub, D K Slonim, P Tamayo, C Huard, M Gaasenbeek, J P Mesirov, H Coller, M L Loh, J R Downing, M A Caligiuri, C D Bloomfield, and E S Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286(5439):531–7, 1999. [7] M. Langaas, B.H. Lindqvist, and E. Ferkingstad. Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society: Series B, 67(4):555–572, 2005. [8] J.D. Storey. A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B, 64:479–498, 2002. [9] G. Wright and R. Simon. A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics, 19(18):2448– 2455, 2003. [10] R.X. Menezes, J.M. Boer, and H. van Houwelingen. Microarray data analysis: a hierarchical t-test to handle heteroscedasticity. Applied Bioinformatics, 3(4):229– 235, 2004. 12 > > + > + > + + + > + + + + samplesizes <- 2 * c(2, 4, 6) pwr <- power(ss, plot = FALSE, effectivesamplesizes = samplesizes/2, fdr = 0.1) pwrMod <- power(ssMod, plot = FALSE, effectivesamplesizes = samplesizes/2, fdr = 0.1) averagepower <- data.frame(power = c(pwr, pwrMod), statistics = rep(c("original", "moderated"), each = 3), samplesizes = rep(samplesizes, 2)) print(xyplot(power ~ samplesizes, group = statistics, data = averagepower, type = c("g", "b"), lwd = 2, pch = 16, xlab = "sample size (per group)", ylab = "", auto.key = list(columns = 2, lines = TRUE, points = FALSE))) moderated original 0.9 ● 0.8 ● ● 0.7 ● 0.6 ● 0.5 0.4 0.3 ● 4 6 8 10 sample size (per group) Figure 7: Estimated average power for other sample sizes then that of the pilot-data for both moderated and orginal t test-statistics and for different false discovery rate thresholds. 13 12
© Copyright 2024