How to Design a good RNA-Seq experiment in an interdisciplinary context? Pôle Planification Expérimentale, PEPI IBIS 1 RNA-seq technology is a powerful tool for characterizing and quantifying transcriptome. Upstream careful experimental planning is necessary to pull the maximum of relevant information and to make the best use of these experiments. 1 , INRA, France Be aware of different types of bias Why increasing the number of biological replicates? • To generalize to the population level • To estimate to a higher degree of accuracy variation in individual transcript (Hart, 2013) • To improve detection of DE transcripts and control of false positive rate: TRUE with at least 3 (Sonenson 2013, Robles 2012) An RNA-seq experimental design using Fisher’s principles More biological replicates or increasing sequencing depth? Rule 1: Share a minimal common language Keep in mind the influence of effects on results: lane ≤ run ≤ RNA library preparation ≤ biological (Marioni, 2008), (Bullard, 2010) RNA-seq experiment analysis: from A to Z It depends! (Haas, 2012), (Liu, 2014) • DE transcript detection: (+) biological replicates • Construction and annotation of transcriptome: (+) depth and (+) sampling conditions • Transcriptomic variants search: (+) biological replicates and (+) depth A solution: multiplexing. Decision tools available: Scotty (Busby, 2013), RNAseqPower (Hart, 2013) Some definitions Biological and technical replicates: Rule 2: Well define the biological question Sequencing depth: Average number of a given position in a genome or a transcriptome covered by reads in a sequencing run Multiplexing: Tag or bar coded with specific sequences added during library construction and that allow multiple samples to be included in the same sequencing reaction (lane) Blocking: Isolating variation attributable to a nuisance variable (e.g. lane) From Alon, 2009 • Choose scientific problems on feasibility and interest • Order your objectives (primary and secondary) • Ask yourself if RNA-seq is better than microarray regarding the biological question Adapted from Mutz, 2013 Conclusions Make a choice • Identify differentially expressed (DE) genes? Rule 4: Make good choices • Detect and estimate isoforms? • All skills are needed to discussions right from project • Construct a de Novo transcriptome? How many reads? Rule 3: Anticipate difficulties with a well designed experiment Prepare a checklist with all the needed elements to be collected, 2 Collect data and determine all factors of variation, 3 Choose bioinformatics and statistical models, 4 Draw conclusions on results. 1 • Clarify the biological question • 100M to detect 90% of the transcripts of 81% of human genes (Toung, 2011) • 20M reads of 75bp can detect transcripts of medium and low abundance in chicken (Wand, 2011) • 10M to cover by at least 10 reads 90% of all (human and zebrafish) genes (Hart, 2013)... construction • Prefer biological replicates instead of technical replicates • Use multiplexing • Optimum compromise between replication number and sequencing depth depends on the question • Wherever possible apply the three Fisher’s principles of randomization, replication and local control (blocking) And do not forget: budget also includes cost of biological data acquisition, sequencing data backup, bioinformatics and statistical analysis. Who are we? [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] ECCB14, 7 - 10 September 2014, Strasbourg, France Alon, U. (2009). How to choose a good scientific problem. Molecular cell. Bullard, J. H., Purdom, E., Hansen, K. D., and Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mrna-seq experiments. BMC bioinformatics, 11(1):94. Busby, M. A., Stewart, C., Miller, C. A., Grzeda, K. R., and Marth, G. T. (2013). Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression. Bioinformatics (Oxford, England), 29(5):656–7. Consortium, T. E. (2011). A User’s Guide to the Encyclopedia of DNA Elements (ENCODE) - ENCODE UsersGuide.pdf. Plos Biology, 9(4). Fisher, R. A. (1971). The Design of Experiments. Haas, B. J., Chin, M., Nusbaum, C., Birren, B. W., and Livny, J. (2012). How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC genomics, 13(1):734. Hart, S. N., Therneau, T. M., Zhang, Y., Poland, G. A., and Kocher, J.-P. (2013). Calculating Sample Size Estimates for RNA Sequencing Data. Journal of computational biology : a journal of computational molecular cell biology. Liu, Y., Zhou, J., and White, K. P. (2014). RNA-seq differential expression studies: more sequence or more replication? Bioinformatics (Oxford, England), 30(3):301–4. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M., and Gilad, Y. (2008). Rna-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome research, 18(9):1509–17. Mutz, K.-O., Heilkenbrinker, A., Lanne, M., Walter, J.-G., and Stahl, F. (2013). Transcriptome analysis using next-generation sequencing. Current Opinion in Biotechnology, 24(1). Robles, J. A., Qureshi, S. E., Stephen, S. J., Wilson, S. R., Burden, C. J., and Taylor, J. M. (2012). Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing. BMC genomics, 13:484. Soneson, C. and Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq data. BMC bioinformatics, 14(1):91. Toung, J. M., Morley, M., Li, M., and Cheung, V. G. (2011). RNA-sequence analysis of human B-cells. Genome research, 21(6):991–8. Wang, Y., Ghaffari, N., Johnson, C. D., Braga-Neto, U. M., Wang, H., Chen, R., and Zhou, H. (2011). Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens. BMC bioinformatics, 12 Suppl 1(Suppl 10):S5.
© Copyright 2024