Microarray experimental design: power and sample size considerations M. C. K. Yang, J. J. Yang, R. A. McIndoe and J. X. She Physiol. Genomics 16:24-28, 2003. First published 7 October 2003; doi:10.1152/physiolgenomics.00037.2003 You might find this additional info useful... This article cites 14 articles, 8 of which can be accessed free at: http://physiolgenomics.physiology.org/content/16/1/24.full.html#ref-list-1 The Autoimmune Regulator Directly Controls the Expression of Genes Critical for Thymic Epithelial Function Qing-Guo Ruan, Kenneth Tung, Daniel Eisenman, Yulius Setiady, Sarah Eckenrode, Bing Yi, Sharad Purohit, Wei-Peng Zheng, Yan Zhang, Leena Peltonen and Jin-Xiong She J Immunol, June 1, 2007; 178 (11): 7173-7180. [Abstract] [Full Text] [PDF] Epistemological issues in omics and high-dimensional biology: give the people what they want Tapan S. Mehta, Stanislav O. Zakharkin, Gary L. Gadbury and David B. Allison Physiol. Genomics, December 13, 2006; 28 (1): 24-32. [Abstract] [Full Text] [PDF] A guide to issues in microarray analysis: application to endometrial biology Christine A White and Lois A Salamonsen Reproduction, July , 2005; 130 (1): 1-13. [Abstract] [Full Text] [PDF] False discovery rate, sensitivity and sample size for microarray studies Yudi Pawitan, Stefan Michiels, Serge Koscielny, Arief Gusnanto and Alexander Ploner Bioinformatics, July 1, 2005; 21 (13): 3017-3024. [Abstract] [Full Text] [PDF] Updated information and services including high resolution figures, can be found at: http://physiolgenomics.physiology.org/content/16/1/24.full.html Additional material and information about Physiological Genomics can be found at: http://www.the-aps.org/publications/pg This infomation is current as of December 18, 2011. Physiological Genomics publishes results of a wide variety of studies from human and from informative model systems with techniques linking genes and pathways to physiology, from prokaryotes to eukaryotes. It is published quarterly in January, April, July, and October by the American Physiological Society, 9650 Rockville Pike, Bethesda MD 20814-3991. Copyright © 2003 by the American Physiological Society. ISSN: 1094-8341, ESSN: 1531-2267. Visit our website at http://www.the-aps.org/. Downloaded from physiolgenomics.physiology.org on December 18, 2011 This article has been cited by 5 other HighWire hosted articles Comparative analysis of innate immune responses following infection of newborn calves with bovine rotavirus and bovine coronavirus Palok Aich, Heather L. Wilson, Radhey S. Kaushik, Andy A. Potter, Lorne A. Babiuk and Philip Griebel J Gen Virol, October , 2007; 88 (10): 2749-2761. [Abstract] [Full Text] [PDF] Physiol Genomics 16: 24–28, 2003. First published October 7, 2003; 10.1152/physiolgenomics.00037.2003. Microarray experimental design: power and sample size considerations M. C. K. Yang,1 J. J. Yang,1 R. A. McIndoe,2 and J. X. She2 1 Department of Statistics, University of Florida, Gainesville, Florida 32611; and 2Center for Biotechnology and Genomic Medicine, Medical College of Georgia, Augusta, Georgia 30912 Submitted 20 March 2003; accepted in final form 30 September 2003 gene expression; statistical analysis; functional genomics HIGH-THROUGHPUT MICROARRAY technologies have created unparalleled opportunities to study the mechanism of disease, monitor disease progression, and evaluate effective therapies. Because tens of thousands of genes are investigated simultaneously with a technology that has not been perfected, there is a large uncertainty in the decision process. Therefore, one must rely on statistical analyses of the data to draw conclusions from microarray experiments. A number of methods have been developed to extract useful and robust information from microarray data sets once the hybridizations are complete. However, not as much attention has been paid to the experimental design before the investigator actually completes the experiment. One of the principal questions while designing a microarray experiment is: What is the sample size required for a microarray experiment to achieve a user defined statistical threshold? To achieve this goal, one must decide how the data will be analyzed and what conclusions can be made. Since microarrays can be used to answer a variety of questions, different experiments will require different designs and analyses. In this study, we will concentrate on finding expression differences between two treatment groups. The goals of the experiments can be classified into three categories: 1) comparing two groups at a single time point, 2) examining the treatment effect over time when the without treatment effect is known, and 3) Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org). Address for reprint requests and other correspondence: J.-X. She, Center for Biotechnology and Genomic Medicine, Medical College of Georgia, 1120 15th St., PV6B108, Augusta, GA 30912 (E-mail: [email protected]); or M. C. K. Yang, Dept. of Statistics, Univ. of Florida, Gainesville, FL 32611 (E-mail: [email protected]). 24 comparing two groups at sequential time points. We will investigate the sample size requirements necessary to satisfy a specified false discovery rate (FDR) and power (1 ⫺ ) given various biological and experimental variation. Although there are other reports on how to compute the sample size, none of them is as simple and straightforward as the methods presented here. In our experience, we often observe a large number of genes that show expression differences in a typical microarray analysis. The key is that we want to make sure that only a small number in these selected genes are false positives (discoveries). A brief survey of some of the other methods is in the DISCUSSION. METHODS cDNA microarrays are printed on glass microscope slides using established protocols (15). The analysis of microarray data can be separated into several stages. The first stage is to eliminate spots with very weak signal, which we define as the signal that is indistinguishable from the background noise. Yang et al. (17) demonstrated that any signal less than 3 standard deviations of the background noise should be discarded. Due to the variation in printing material on the cDNA arrays, the expression is more accurately measured by the ratio of sample intensity vs. intensity from a common reference (5, 12). To derive the ratio, the intensities from both channels have to be normalized. Although there are a number of methods to normalize the two channels, Yang et al. (17) determined that there was actually a very small difference among the different normalization methods (10) when spots with sufficient signal are chosen. In this study, we assume that the analysis will be done after flagging for weak spots, logtransformation, and normalization. Given these assumptions, the useful data is now yijtk, where i is the treatment index, j is the subject index, t is the time index, and k is the replicate index. A few assumptions have to be made in order to obtain the analytic results for sample size determination. The most general model in this study is y ijtk ⫽ it ⫹ ␦ ijt ⫹ ⑀ ijtk (1) where it is the mean expression level under the ith treatment at time t, ␦ijt represents the subject variation, and ⑀ijtk is the experimental variation. Throughout this paper, it is a fixed constant (or fixed effect), and ␦ijt is a random sample from subjects under treatment i, or it is a random effect. Some of the indices can also be eliminated when they are not relevant. For example, when the data is not measured over time, the index t is eliminated, and when there is only one treatment, the index i is omitted. Adjustment of the threshold significance level. As in the usual hypothesis testing, a significance level ␣ is assigned. When there is only one hypothesis, this guarantees that the chance of incorrectly rejecting the null hypothesis is ␣. When many hypotheses are tested simultaneously, ␣ cannot be considered as the significance level for rejecting all the true null hypotheses. The level for each individual test has to be adjusted. One option for adjusting the test is to use the Bonferroni inequality, i.e., to set every test at the new significant level ␣/ng, where ng is the number of hypotheses tested simultaneously. We use the notation ng because this number is the number of genes in the array. However, when ng is very large, the Bonferroni adjustment is 1094-8341/03 $5.00 Copyright © 2003 the American Physiological Society Downloaded from physiolgenomics.physiology.org on December 18, 2011 Yang, M. C. K., J. J. Yang, R. A. McIndoe, and J. X. She. Microarray experimental design: power and sample size considerations. Physiol Genomics 16: 24–28, 2003. First published October 7, 2003; 10.1152/physiolgenomics.00037.2003.—Gene expression analysis using high-throughput microarray technology has become a powerful approach to study systems biology. The exponential growth in microarray experiments has spawned a number of investigations into the reliability and reproducibility of this type of data. However, the sample size requirements necessary to obtain statistically significant results has not had as much attention. We report here statistical methods for the determination of the sufficient number of subjects necessary to minimize the false discovery rate while maintaining high power to detect differentially expressed genes. Two experimental designs were considered: 1) a comparison between two groups at a single time point, and 2) a comparison of two experimental groups with sequential time points. Computer programs are available for the methods discussed in this paper and are adaptable to more complicated situations. 25 SAMPLE SIZE FOR MICROARRAY EXPERIMENTS too conservative because it is meant to control family-wise error rate (1), i.e., to control the probability of one type I error among thousands of hypotheses. In microarray experiments, it is important to find the best compromise between maximizing the power of detection and minimizing false-positive identification. Very often, although not always, additional follow-up research such as database search and real-time PCR will be done to further eliminate false-positive results (6). Thus the FDR defined as the expected proportion of wrong discovery (1) fits this purpose better. At the sample size determination stage, we need to know approximately the minimum number of genes that will be accepted under a given selection rule. An experimenter may know, from past experience, that at least n0 genes will be accepted, then the adjusted significance level for each gene is ␣⬘ ⫽ ␣ max 共1, n0兲/ng (2) y ijk ⫽ i ⫹ ␦ ij ⫹ ⑀ ijk (3) where i ⫽ 1, 2; j ⫽ 1, . . ., ni; k ⫽ 1, . . ., m; ␦ij is the between subjects variation, and ⑀ijk is experimental variation. When m ⬎ 1, replicates are used to reduce the intermicroarray variation under the same biological condition. We assume that both of ␦ij and ⑀ijk are normally distributed with zero mean and standard deviation (SD) s and e, respectively. The hypotheses are H 0: 1 ⫽ 2 versus H1: 1 ⫽ 2 (4) for each gene. After the data has been collected, the mean and variance for each gene is defined by y i.. ⫽ ⌺ j ⌺ k y ijk /共n i m兲, y ij. ⫽ ⌺ k y ijk /m (5) 2 ⫽ ⌺ j ⌺ k 共y ij. ⫺ y i.. 兲 2 /共n 1 ⫹ n 2 ⫺ 2兲 s p.. (6) and the decision rule is to accept H1 at significance level ␣⬘ when t ⫽ 兩y 1.. ⫺ y 2.. 兩/关s p 共1/n 1 ⫹ 1/n 2 兲 1/2 兴 ⱖ t ,␣⬘ (7) where ␣⬘ is defined by Eq. 2, ⫽ n1 ⫹ n2 ⫺ 2, and t,␣⬘ is the upper ␣⬘ quantile of the t distribution with degrees of freedom. Other types of tests such as unequal variance t-test or a Mann-Whitney test can also be used, but the results should be very similar (18). To determine the sample size, we need to specify the power (1 ⫺ ) at a predetermined level of significant gene expression difference 兩1 ⫺ 2兩. In traditional statistics, what we need is the value ⌬ ⫽ 兩 1 ⫺ 2 兩/ s (8) and the variance ratio r⫽ 2e s2 (9) It is often difficult to determine what ⌬ constitutes a significant expression difference. We believe that the overlap concept can make Physiol Genomics • VOL 16 • w ⫽ the proportion of subjects with (10) expression apparently belonging to the other group The term w is the proportion of subjects with expression that appears to belong to the other treatment group. Under the model in Eq. 3, the overlapping proportion is symmetric, i.e., the proportions that violate the treatment assignment are the same in both treatments. The relation between the traditional ⌬ in Eq. 8 and w in Eq. 10 is w ⫽ 1 ⫺ ⌽共⌬/2兲 (11) where ⌽(x) is the standard normal distribution function. Although ⌬ and w are equivalent and can be easily converted from one to another, it is our opinion that the idea of overlapping between groups has an easier interpretation. A historical survey and recent developments in using overlapping for sample size determination have been published (3, 8, 14). Hwang et al. (9) has also used the same idea in microarray analysis. To determine the sample size based on Eqs. 7 and 11, we use the noncentral F distribution (see APPENDIX II). Serial time points with one treatment group. This experimental design will measure gene changes under one treatment over time, while the gene expression under no treatment condition is known. Let the time points be 1, 2, . . ., T. Since there is only one treatment, the index i can be dropped, and Eq. 1 becomes y jtk ⫽ t ⫹ ␦ jt ⫹ ⑀ jtk (12) where ␦jt represents the subject variation. Under the null hypothesis, when there is no treatment, we should have H 0: 1 ⫽ 2 ⫽ · · · ⫽ T i.e., gene expression is not affected by the treatment. The test for this hypothesis would be a two-way analysis of variance with time and subject as two discrete factors with the subjects being a random effect. The alternative is that H0 is not true, i.e., gene expression has been changed. To determine the sample size, we need to specify the pattern of {t} under H1. Moreover, we need to know how this pattern varies among subjects. We use a simple linear expression by letting t ⫽ ␥t, and ␦jt ⫽ ␦jt (13) where ␥ ⫽ 0 under H0 (␥ ⫽ 0 under H1), and the ␦j terms represent the variation of the subjects response, which are assumed to be independent and identically distributed normal random variables with 0 mean and variance ␥2(N(0,␥2)) and ⑀jtk ⬃ N(0,e2). The reason for this alternative model is that with a linear time effect, the slope usually changes with the subjects. However, the linear relationship is not crucial, because when time is considered discrete, Eq. 13 can be a good approximation to a quadratic change. For example, the pattern in Fig. 1, left, can be expressed linearly when time is reordered. We have tried to use specific alternatives such as a quadratic model (which covers the linear model) to increase the power of the test. However, the increase in power is very limited. One could also make the model more general by adding correlations between time points. But during the design stage, this correlation is difficult to ascertain. The random effect, ␦j, in the slope should be enough to rep resent the intra-subject correlation. Again, we will use the overlap concept to specify the alternative. In the model in Eq. 13, suppose the alternative is ␥ ⬎ 0. (The case ␥ ⬍ 0 works almost identically.) We let w ⫽ the proportion of subjects (in the population) that show the opposite (␥ ⫹ ␦j ⬍ 0) mRNA expression. The power can be computed by simulation using SAS PROC MIXED, i.e., we simulate the data a large number of times under the alternative and use the proportion of the times that H1 is accepted as www.physiolgenomics.org Downloaded from physiolgenomics.physiology.org on December 18, 2011 to guarantee an ␣ FDR. The derivation is given in the APPENDIX I. The value n0 has to be determined by experience or a pilot experiment. In our experience, there are always a considerable number of genes that will show different expressions. If there is no way to determine n0, then the experimenter may let n0 ⫽ 0 and determine the most conservative (largest) sample size. Sometimes it may be necessary to do a pilot study to determine n0 and other variance components in the model. For our laboratory, it is common practice to do a pilot study to get an idea of these parameters and then determine the sample size for the required power. At the data analysis stage, other methods such as permutation tests (4, 16) may be used, but their power are not as easy to compute at the design stage. Single time point comparison between two treatment groups. For the case of a comparing between treatments at one time point, one can drop the t index so Eq. 1 becomes the interpretation of the alternative easier. When the alternative hypothesis is true, we have 兩1 ⫺ 2兩 ⬎ 0. Naturally, any expression closer to 1 will be classified to treatment 1 and vice versa. Let 26 SAMPLE SIZE FOR MICROARRAY EXPERIMENTS Fig. 1. Any reaction where time is considered as a discrete variable can be approximated by a linear function. Note the time points on the right are reordered from the times on the left. H 0: 11 ⫽ 21; 12 ⫽ 22; . . . ; 1T ⫽ 2T and the general alternative that H0 is not true. This model is based on two factors, time and treatment, and subjects are nested in the treatments. The experiment is to test the treatment effect or the treatment-time interaction. We will not consider the treatment-time interaction in the design stage. To specify an alternative for the sample size determination, more parameters are involved. We follow the linear approximation in Eq. 13 with 2t ⫽ 1t ⫹ ␥t and ␦ijt ⫽ ␦ijt (14) where the ␦ij terms share the same assumptions as those in Eq. 13. Again we can define the alternative by using the overlap idea with w being equal to the proportion of subjects that have the opposite slope (␥ ⫹ ␦ij ⬍ 0) in mRNA expression when it belongs to the treatment group with ␥ ⬎ 0. This model can be expanded to include other alternatives such as the change of intercept. All of these modifications can be easily embedded into the SAS program mentioned in APPENDIX III. The details of power computation are again given in APPENDIX II. RESULTS For a single time point. Figure 2 presents the sample size requirements for a microarray experiment with 5,000 genes (ng) assayed with an FDR of 0.05 (␣) and power ⫽ 0.85 when n0 ⫽ 0, 5, 20, and 50. The sample sizes for the two treatments are assumed to be equal (i.e., n1 ⫽ n2 ⫽ n) and the number of replicates equals 1 (m). The variance ratio (r) defined in Eq. 9 varies from 0 to 1. For example, if ng ⫽ 5,000, r ⫽ 0.25, and a 15% or less overlap between genes from the two treatments is considered significant, then 22 subjects per treatment group is required to reach a power of 0.85 (n0 ⫽ 0), but only 15 subjects if we expect at least 50 genes will show significance. However, Fig. 2 is used only as a demonstration of the change of sample size under various conditions. The software given in APPENDIX III will produce the required sample size under any significance level, power, and other parameter values. For serial time points. Although Fig. 2 covers some useful experimental cases, it is more difficult to provide examples of general interest in serial time point measurements due to the large number of parameters. The power depends not only on the biological/experimental variations as well as the number of time points (T), but also the time unit of T. For example, for a Physiol Genomics • VOL 16 • Fig. 2. The required sample sizes under various alternatives based on the overlap ratio w defined by Eq. 11 (or equivalently effect size ⌬) and variance ratio r by Eq. 9 under various assumptions on the number of possible selected genes (n0). The curves from the top to bottom are for r ⫽ 1.0, 0.75, 0.50, 0.25, and 0.00, respectively. The overall significance is set at alpha ⫽ 0.05. The sample size is n ⫽ n1 ⫽ n2. An example is given in the text. www.physiolgenomics.org Downloaded from physiolgenomics.physiology.org on December 18, 2011 the power estimate. Note that the power is not determined by w alone. It is also a function of T. Details are given in the RESULTS and APPENDIX II. Serial time points with two treatment groups. This experiment will measure the gene expression changes over time from two treatment groups or from one treatment and one control. The model is now the full model in Eq. 1 with the null hypothesis fixed slope ␥, the relative size of ␥T to the experimental standard deviation plays a crucial role. In other words, seven points in 1 day will be different from seven points in a week with the same growth rate ␥ in Eqs. 13 and 14. Thus the variance ratio (r) alone is not sufficient to determine the sample size. Figure 3 shows one example on the sample size required for detecting mRNA expression changes over time. We used the data from Yang et al. (17). In that study, we found a e ⬇ 0.40 for log2-transformed ratios between the sample and reference. If one assumes ng ⫽ 5,000, ␥ ⫽ 0.5, and 5% or less overlap between the two conditions (effect size ⫽ 3.29), and a twofold change in expression at the end of the time series (␥T ⫽ 1), then the sample size curves for the one-sample and two-sample designs can be obtained. The Bonferroni correction (n0 ⫽ 0) and expected 1% of genes to be selected (n0 ⫽ 50) for replicate sizes m ⫽ 1, 2, and 4 are shown. The FDR is set at 0.05, and the power is set at 0.80. Note that in this design, the end time point is fixed. Thus an increase in the number of time points can merely add points between the predefined start and end points. Since the highest difference happens at the end point, there is only a small increase in power by observing more than four time points. It is also quite intuitive that T can compensate for the number of 27 SAMPLE SIZE FOR MICROARRAY EXPERIMENTS replicates (m), because both of them increase the information for the treatment effect when the treatment is effective. APPENDIX I Adjustment of the Significant Level Based on the False Discovery Rate In single hypothesis testing, the significance level ␣ can be also interpreted as the false discovery rate, FDR, i.e., ␣ ⫽ Pr兵Claim difference 共discovery兲兩No difference其 DISCUSSION This paper provides a reasonable way to estimate the sample size for microarray experiments. The analysis methods are chosen to be simple in order to make the unknown parameters as few as possible. More sophisticated methods may be used after the data has been collected, but based on our experience, different analysis methods usually produce similar results for the prominent gene effects. The computer software is very easy to comprehend. Thus a user can easily modify the programs when other tests are more appropriate. For example, if treatment-time interactions are a main concern, then this term can be easily added to the SAS procedure. The alternative under other models, if known, can also be easily generated using the simulation program with simple modifications. The sample size determination discussed in this paper depends on the expected number of genes n0 to be accepted after the experiment. As we mentioned before, if there is no way to determine this number, then the experimenter may choose to use the most conservative number n0 ⫽ 0. Optimal sample size Physiol Genomics • VOL 16 • In multiple hypotheses testing, if all the null hypotheses are true, then the Bonferroni correction is also the same as the FDR. However, if at least n0 ⬎ 0 genes are expected to be accepted, then FDR ⫽ E关共GenesWronglyAccepted兲/共AlltheGenesAccepted兲兴 Moreover 冋 FDR ⫽ E 册 冋 册 GenesWronglyAccepted GenesWronglyAccepted ⱕE AlltheGenesAccepted n0 ⫽ ␣⬘n⬘g ␣⬘ng ⱕ n0 n0 where ␣⬘ is the adjusted significance level used in each test and ng⬘ is the number of genes with no expression differences. When n0 ⫽ 0, it means that there may be no gene accepted. If this is true, then there is no false discovery. However, n0 is only the expected number; there may be genes accepted after the experiment. Since the smallest number is 1, ␣⬘ is the Bonferroni correction. This gives the max(1, n0) in Eq. 2. www.physiolgenomics.org Downloaded from physiolgenomics.physiology.org on December 18, 2011 Fig. 3. Sample size required by one-sample and two-sample (n1 ⫽ n2) sequential experiments based on a specific data set described in the text. The array size is fixed at ng ⫽ 5,000. The alternative is a twofold increase in gene expression at the end time. T is the number of observations between time 0 and T. From the top to the bottom, the curves are for m ⫽ 1, 2, and 4. depends on experience and the validity of certain assumptions. Of course, there is a certain possibility of failure that an experimenter must face. Even to determine the sample size for the simplest two-sample t-test, we must assume that the two treatments have the same variance. There is no guarantee on this before the experiment is done. Similarly, in most microarray experiments, a large n0 can be expected. If the result shows the assumed n0 is too large, then we must admit our misjudgment and may choose to do more experiments to increase the power. This strategy is usually more economical than choosing a large sample size by always assuming the worst scenario. Zien at al. (18) have presented a method similar to ours. In addition to the t-test, they also mentioned the Mann-Whitney nonparametric test and a model with additive noise. However, they did not address serial time points nor the use of the FDR in the analysis. Their results are based on a simulation which is not as accurate as the SAS output based on a noncentral F distribution. Black and Doerge (2) also used the t-test for the power calculation, but they considered only the number of replications with a Bonferroni correction. Subject level variation and FDR were not considered. Pan et al. (13) discussed sample size based on the mixture model. However, under the null hypothesis when all the means are zero, the mixture model is difficult to identify. Thus it may not be worth the effort to go through such a complex model building procedure at the design stage. The paper by Hwang et al. (9) tried to determine the sample size for finding the best discriminant function for the data similar to those given in Ref. 7. But the goal of finding the discriminant function is different from identifying genes with different expressions. The use of two highly correlated genes adds little discriminant power, but both of them are important to be identified. 28 SAMPLE SIZE FOR MICROARRAY EXPERIMENTS APPENDIX II Data Analysis and Power Computation Two sample one time point per subject. The data analysis is based on Eq. 7. For a given w, ⌬ ⫽ 2⌽⫺1(1 ⫺ w), and the power is p ⫽ 1 ⫺ F 1共, 2兲 where F1( ,2) represents the F distribution of 1, ⫽ n1 ⫹ n2 ⫺ 2 degrees of freedom with noncentrality parameter 2 evaluated at the threshold ⫽ F1 ,␣⬘, which is the upper ␣⬘ quantile of the F distribution with 1, ⫽ n1 ⫹ n2 ⫺ 2 degrees of freedom, and 2 ⫽ ⌬2 共1 ⫹ r/m兲共1/n 1 ⫹ 1/n 2 兲 y jt ⫽ 共␥ ⫹ ␦ j 兲t ⫹ ⑀ jt , j ⫽ 1, . . . , n; t ⫽ 1, . . . , T (15) where k in Eq. 12 is embedded into yjt and ⑀jt so they are averages over m values, and ␦j is a normal random deviate with 0 mean and variance ␥, ⑀jt are standard normal deviates, and ␥ ⫽ ⫺␥⌽⫺1共w兲 The generated data is then tested by the mixed model procedure for the significance of the time effect. The proportion of the rejected samples are the estimated power of this test. The mixed model takes care of the random effect from subjects. Users may refer to a SAS example in Ref. 11 (p. 90) to see how SAS program is written. Two sample serial time points experiment. This analysis is based on a mixed model with time and treatment as fixed effects and the subject as a random effect. Without loss of generality, we may let m ⫽ 1 and 1t ⫽ 2t ⫽ 0. For a given T and w, data are generated by y ijt ⫽ 共␥ i ⫹ ␦ ij 兲t ⫹ ⑀ ijt , i ⫽ 1, 2; j ⫽ 1, . . . , n i ; t ⫽ 1, . . . , T (16) where ␦ij and ⑀ijt are generated by the same procedure in Eq. 15 with ␥1 ⫽ 0 and ␥ 2 ⫽ ⫺2␥⌽⫺1共w兲 The power is again estimated the same way as the one-treatment case. APPENDIX III Software The software is located at http://web.stat.ufl.edu/⬃yang/microarray for the microarray sample size determination. Program one_time.sas. This subroutine contains the SAS program for one time point study based on Eq. 3. Program sequent_1.sas. This subroutine contains the SAS program for one sample serial time points measurements based on Eqs. 12 and 13. Physiol Genomics • VOL 16 • REFERENCES 1. Benjamini Y and Hochberg Y. Controlling the false alarm discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc B57: 298–300, 1995. 2. Black MA and Doerge RW. Calculation of the minimum number of replicate spots required for detection of significant gene expression fold change in microarray experiments. Bioinformatics 18: 1609–1616, 2002. 3. Bradley EL. Overlapping coefficient. In: Encyclopedia of Statistical Sciences. New York, Wiley, 1985, vol. 6, p. 546–547. 4. Callow MJ, Dudoit S, Gong EL, Speed TP, and Rubin EM. Microarray expression profiling identifies gene with altered expression in HDLdeficiency mice. Genome Res 10: 2022–2029, 2000. 5. Chen Y, Dougherty ER, and Bittner ML. Ratio based decisions and quantitative analysis of cDNA microarray images. J Chem Optics 2: 264–374, 1997. 6. Eisen MB, Spellman PT, Brown PO, and Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863–14868, 1998. 7. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Dowing JR, Caligiuri MA, Bloomfield CD, and Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531–537, 1999. 8. Huberty CJ and Lowman LL. Group overlap as basis for effect size. Educ Psychol Meas 60: 543–563, 2000. 9. Hwang D, Schmitt WA, Stephanopoulos G, and Stephanopoulos G. Determination of minimum sample size and discriminatory expression patterns in microarray data. Bioinformatics 18: 1184–1193, 2002. 10. Kerr MK, Martin M, and Churchill GA. Analysis of variance for gene expression microarray data. J Comput Biol 7:819–837, 2000. 11. Littell RC, Milliken GA, Stroupe WW, and Wolfinger RD. SAS System for Mixed Models. Cary, NC: SAS Institute, 1996. 12. Newton MA, Kendziorski CM, Richmond CS, Blattner FR, and Tsui KW. On differential variability of expression ratios: improving statistical inference about gene expression changes form microarray data. J Comput Biol 8: 37–52, 2001. 13. Pan W, Lin J, and Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol 3: research0022, 2002. 14. Rom DR and Hwang H. Testing for individual and population equivalence based on proportion of similar response. Stat Med 15: 1489–1505, 1996. 15. Shalon D, Smith SJ, and Brown PO. A DNA microarray system for analyzing complex DNA samples using two color fluorescent probe hybridization. Genome Res 6: 639–645, 1996. 16. Tusher VG, Tibshirani R, and Chu G. significance analysis of microarray applied to the ionizing radiation response. Proc Natl Acad Sci USA 5116–5121, 2001. 17. Yang MCK, Ruan QG, Yang JJ, Eckenrode S, Wu S, McIndoe RA, and She JX. A statistical method for flagging weak spots improves normalization and ratio estimates in microarray. Physiol Genomics 7: 45–53, 2001. First published August 8, 2001; 10.1152/physiolgenomics. 00020.2001. 18. Zien A, Fluck J, Zimmer R, and Lengauer T. How many do you need? Preprint at http://cartan.gmd.de/⬃zien/HowManyArray/, 2003. www.physiolgenomics.org Downloaded from physiolgenomics.physiology.org on December 18, 2011 with r defined by Eq. 9. Since r and m are confounded, we may let m ⫽ 1 in the computer program by changing r. One sample serial time points experiment. The analysis is based on the mixed model with time as a fixed effect and the subject as a random effect. The SAS procedure is given in the software. For the power simulation, data are generated by Program sequent_2.sas. This subroutine contains the SAS program for one sample serial time points measurements based on Eqs. 1 and 14. All the input instructions are given by the comments in the program. Examples corresponding to the sample sizes in Fig. 2 and Fig. 3 are in the input (ⴱ.sas) and the output (ⴱ.lst) programs.
© Copyright 2024