How to Design a good RNA-Seq experiment in an interdisciplinary context?

Pôle Planification Expérimentale, PEPI IBIS
INRA, France
RNA-seq is a powerful technology for characterizing and quantifying the transcriptome. Careful upstream experimental planning is necessary to extract as much relevant information as possible and to make the best use of these experiments.
Rule 1: Share a minimal common language
RNA-seq experiment analysis: from A to Z
Adapted from Mutz, 2013
Some definitions
Biological and technical replicates:
Sequencing depth: Average number of times a given position in a genome or a transcriptome is covered by reads in a sequencing run
Multiplexing: Tagging each sample with a specific barcode sequence, added during library construction, so that multiple samples can be included in the same sequencing reaction (lane)
Blocking: Isolating variation attributable to a nuisance variable (e.g. lane)
Rule 2: Define the biological question well
From Alon, 2009
• Choose scientific problems based on feasibility and interest
• Order your objectives (primary and secondary)
• Ask yourself whether RNA-seq is better than microarray for the biological question
Make a choice
• Identify differentially expressed (DE) genes?
• Detect and estimate isoforms?
• Construct a de novo transcriptome?
Rule 3: Anticipate difficulties with a well-designed experiment
1 Prepare a checklist with all the needed elements to be collected,
2 Collect data and determine all factors of variation,
3 Choose bioinformatics and statistical models,
4 Draw conclusions on results.
Be aware of different types of bias
Keep in mind the relative influence of each effect on the results:
lane ≤ run ≤ RNA library preparation ≤ biological
(Marioni, 2008), (Bullard, 2010)
An RNA-seq experimental design using Fisher’s principles
Rule 4: Make good choices
Why increase the number of biological replicates?
• To generalize to the population level
• To estimate the variation of individual transcripts more accurately (Hart, 2013)
• To improve detection of DE transcripts and control of the false positive rate: true with at least 3 replicates (Soneson, 2013), (Robles, 2012)
More biological replicates or greater sequencing depth?
It depends! (Haas, 2012), (Liu, 2014)
• DE transcript detection: (+) biological replicates
• Construction and annotation of transcriptome: (+) depth and (+) sampling conditions
• Transcriptomic variants search: (+) biological replicates and (+) depth
A solution: multiplexing.
Decision tools available: Scotty (Busby, 2013), RNASeqPower (Hart, 2013)
How many reads?
• 100M to detect 90% of the transcripts of 81% of human genes (Toung, 2011)
• 20M reads of 75bp can detect transcripts of medium and low abundance in chicken (Wang, 2011)
• 10M to cover by at least 10 reads 90% of all (human and zebrafish) genes (Hart, 2013)...
Conclusions
• All skills are needed in discussions right from project construction
• Clarify the biological question
• Prefer biological replicates to technical replicates
• Use multiplexing
• The optimum compromise between number of replicates and sequencing depth depends on the question
• Wherever possible, apply Fisher’s three principles of randomization, replication and local control (blocking)
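Fisher’s three principles can be combined with multiplexing when samples are assigned to lanes. The sketch below is only an illustration, not a method taken from this poster: the function name, the sample labels and the assumption that the number of replicates is a multiple of the number of lanes are all hypothetical. Each lane is treated as a block that receives the same number of barcoded samples from every condition, and the replicate-to-lane assignment is randomized.

```python
import random


def balanced_block_design(conditions, n_replicates, n_lanes, seed=0):
    """Assign barcoded samples to lanes so that every lane (block) holds
    the same number of samples from each condition.

    Hypothetical helper for illustration; assumes n_replicates is a
    multiple of n_lanes.
    """
    if n_replicates % n_lanes != 0:
        raise ValueError("n_replicates must be a multiple of n_lanes")
    rng = random.Random(seed)
    lanes = {lane: [] for lane in range(1, n_lanes + 1)}
    for condition in conditions:
        # Replication: several biological replicates per condition.
        samples = [f"{condition}_rep{i}" for i in range(1, n_replicates + 1)]
        # Randomization: which replicate goes to which lane is random.
        rng.shuffle(samples)
        # Blocking: deal the replicates evenly across lanes.
        for index, sample in enumerate(samples):
            lanes[index % n_lanes + 1].append(sample)
    # Randomize the order of samples within each lane as well.
    for members in lanes.values():
        rng.shuffle(members)
    return lanes


design = balanced_block_design(["control", "treated"], n_replicates=4, n_lanes=2)
for lane, members in design.items():
    print(f"lane {lane}: {members}")
```

Because every condition is present in every lane, a lane effect is not confounded with the condition effect and can be accounted for as a blocking factor in the statistical model.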
And do not forget: the budget also includes the cost of biological data acquisition, sequencing data backup, and bioinformatics and statistical analysis.
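To weigh replication against sequencing depth when budgeting, a rough sample-size estimate can help. The sketch below assumes an approximation of the form n = 2 (z_{1-α/2} + z_{power})² (1/depth + CV²) / ln(fold change)², which is understood here to be the relation described by Hart et al. (2013); the function name and the example values (coefficient of variation 0.4, two-fold change) are hypothetical and should be replaced by estimates from pilot or published data.

```python
import math
from statistics import NormalDist


def replicates_per_group(depth, cv, fold_change, alpha=0.05, power=0.8):
    """Approximate number of biological replicates per group.

    depth: average number of reads covering the transcript of interest
    cv: biological coefficient of variation (assumed value)
    fold_change: smallest fold change to detect (assumed value)
    Hypothetical helper; the formula is an assumed approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Technical (counting) variance shrinks with depth, biological does not.
    variance_term = 1.0 / depth + cv ** 2
    n = 2 * (z_alpha + z_power) ** 2 * variance_term / math.log(fold_change) ** 2
    return math.ceil(n)


for depth in (5, 20, 100):
    n = replicates_per_group(depth, cv=0.4, fold_change=2.0)
    print(f"depth {depth}: {n} replicates per group")
```

Under these assumptions, once the 1/depth term becomes small compared with CV², extra depth barely lowers the required number of replicates, which is consistent with favouring biological replicates for DE detection.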
ECCB14, 7 - 10 September 2014, Strasbourg, France
Alon, U. (2009).
How to choose a good scientific problem.
Molecular Cell.
Bullard, J. H., Purdom, E., Hansen, K. D., and Dudoit, S. (2010).
Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments.
BMC Bioinformatics, 11(1):94.
Busby, M. A., Stewart, C., Miller, C. A., Grzeda, K. R., and Marth, G. T. (2013).
Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression.
Bioinformatics, 29(5):656–7.
ENCODE Project Consortium (2011).
A User’s Guide to the Encyclopedia of DNA Elements (ENCODE).
PLoS Biology, 9(4).
Fisher, R. A. (1971).
The Design of Experiments.
Haas, B. J., Chin, M., Nusbaum, C., Birren, B. W., and Livny, J. (2012).
How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes?
BMC Genomics, 13(1):734.
Hart, S. N., Therneau, T. M., Zhang, Y., Poland, G. A., and Kocher, J.-P. (2013).
Calculating Sample Size Estimates for RNA Sequencing Data.
Journal of Computational Biology.
Liu, Y., Zhou, J., and White, K. P. (2014).
RNA-seq differential expression studies: more sequence or more replication?
Bioinformatics, 30(3):301–4.
Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M., and Gilad, Y. (2008).
RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.
Genome Research, 18(9):1509–17.
Mutz, K.-O., Heilkenbrinker, A., Lanne, M., Walter, J.-G., and Stahl, F. (2013).
Transcriptome analysis using next-generation sequencing.
Current Opinion in Biotechnology, 24(1).
Robles, J. A., Qureshi, S. E., Stephen, S. J., Wilson, S. R., Burden, C. J., and Taylor, J. M. (2012).
Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing.
BMC Genomics, 13:484.
Soneson, C. and Delorenzi, M. (2013).
A comparison of methods for differential expression analysis of RNA-seq data.
BMC Bioinformatics, 14(1):91.
Toung, J. M., Morley, M., Li, M., and Cheung, V. G. (2011).
RNA-sequence analysis of human B-cells.
Genome Research, 21(6):991–8.
Wang, Y., Ghaffari, N., Johnson, C. D., Braga-Neto, U. M., Wang, H., Chen, R., and Zhou, H. (2011).
Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens.
BMC Bioinformatics, 12(Suppl 10):S5.