RNA spike-in controls & analysis methods
for trustworthy genome-scale
measurements
Sarah A. Munro, Ph.D.
Genome-Scale Measurements Group
ABRF Meeting
March 29, 2015
Overview
• External RNA Controls Consortium (ERCC)
RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA
controls
How can we have trustworthy
gene expression results?
• We’re simultaneously
measuring thousands
of RNA molecules in
gene expression
experiments
• But are we getting it
right?
External RNA Controls Consortium (ERCC)
initiated by industry, hosted by NIST
• Initiated by Janet Warrington,
VP Clinical Genomics at
Affymetrix
• Open to all interested parties
• Voluntary
• More than 90 participants
– Industry, Academia, Government
– All major microarray technology
developers
– Other gene expression assay
developers
Spike-ins
ERCC control sequences are
in NIST Standard Reference Material 2374
• DNA sequence library
• 96 unique control
sequences in DNA
plasmids
• Controls intended to
mimic mammalian
mRNA
• In vitro transcription to
make RNA controls
NIST SRM 2374 and related data files
are available directly from NIST @
http://tinyurl.com/erccsrm
Making ERCC ratio mixtures
with true positive and true negative ratios
[Figure: NIST plasmid DNA library → in vitro transcription → RNA transcripts → pooling → mixtures with known abundance ratios]
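A minimal sketch of how pooling yields mixtures with known abundance ratios. The subpool names and pooling amounts below are hypothetical illustration values, not the recipe from the talk; a design ratio of 1:1 between the two mixtures is treated as a true negative, and any other ratio as a true positive.

```python
from fractions import Fraction

# Hypothetical pooling recipe (assumption): relative amount of each
# subpool added to Mix 1 and to Mix 2.
POOLING = {
    "A": (Fraction(4), Fraction(1)),
    "B": (Fraction(1), Fraction(1)),
    "C": (Fraction(2), Fraction(3)),
    "D": (Fraction(1), Fraction(2)),
}


def design_ratios(pooling):
    """Return the known Mix 1 / Mix 2 abundance ratio for each subpool."""
    return {name: mix1 / mix2 for name, (mix1, mix2) in pooling.items()}


def classify(ratio):
    """A 1:1 design ratio is a true negative; anything else is a true positive."""
    return "true negative" if ratio == 1 else "true positive"


ratios = design_ratios(POOLING)
for name, ratio in sorted(ratios.items()):
    print(f"subpool {name}: Mix1/Mix2 = {ratio} ({classify(ratio)})")
```

Every control in a subpool shares that subpool's ratio, so the expected answer is known before any measurement is made.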
Using ERCC ratio mixtures
Treated (n>3)
Control (n>3)
[Figure: measurement process → expression measures → statistical analysis]
• Multiple steps
• Many people & labs
• Takes days to weeks
Example gene expression data
[Figure: expression data for Treated and Control samples]
Are the RNA molecule ratios statistically
different across the samples?
Evaluate technical performance with
ERCC true positive and true negative ratios
Overview
• External RNA Controls Consortium (ERCC)
RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA
controls
Use erccdashboard to produce standard
performance metrics for any experiment
• R package is available
from:
– Bioconductor
– NIST GitHub Site
• Open source and open
access for use in
– Other analysis tools and
pipelines
– Commercial software
Gauge technical performance with
4 erccdashboard figures
• Developed as part of
SEQC study, with ABRF
partners
• Technology-independent
ratio performance measures
• Assessed differences in
performance across
– Experiments
– Laboratories
– Measurement processes
Munro, S. A. et al. Nature Communications 5:5125 doi: 10.1038/ncomms6125 (2014).
Ambion ERCC Ratio Mixtures
23 Controls per Subpool
Design abundance spans a 2^20
range within each Subpool
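To give a feel for what a 2^20 design range across 23 controls means, here is a small sketch. It assumes the controls are evenly spaced on a log2 scale, which is an illustration choice and not a statement about the actual Ambion mixture design; the function name and the relative units are also hypothetical.

```python
import math


def design_abundances(n_controls=23, log2_range=20, lowest=1.0):
    """Relative abundances evenly spaced on a log2 scale (assumed spacing)."""
    return [
        lowest * 2 ** (i * log2_range / (n_controls - 1))
        for i in range(n_controls)
    ]


levels = design_abundances()
span_log2 = math.log2(levels[-1] / levels[0])
fold_range = levels[-1] / levels[0]
print(f"{len(levels)} controls, log2 span = {span_log2:.1f}, fold range = {fold_range:,.0f}")
```

Under this assumption the most abundant control in a subpool is roughly a million-fold more abundant than the least abundant one.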
Spike-in design for SEQC RNA Sequencing
Experiments
• Rat Experiment: Treated and Control Rat RNA, Biological Replicates
• Interlaboratory Experiment: Human Reference RNA Samples, Technical Replicates
[Figure: sample replicates for sequencing]
What is the dynamic range of my experiment?
[Figure: Log2 Normalized ERCC Counts vs. Log2 ERCC Spike Amount (attomol nt µg⁻¹ total RNA); panels: Rat Experiment, Interlaboratory Experiment]
• Typical Sequencing: ~40 million sequence reads per replicate
• Deep Sequencing: ~260 million sequence reads per replicate
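The signal-abundance relationship in a dynamic range plot can be summarized with a straight-line fit on the log2 scale. The sketch below uses ordinary least squares and made-up numbers; the function name is hypothetical and this is not the erccdashboard implementation.

```python
import math


def fit_log2_signal(spike_amounts, normalized_counts):
    """Least-squares fit of log2(counts) against log2(spike amount)."""
    xs = [math.log2(a) for a in spike_amounts]
    ys = [math.log2(c) for c in normalized_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up example values, not data from the talk.
spike_amounts = [1, 4, 16, 64, 256, 1024]
normalized_counts = [3, 11, 50, 190, 800, 3100]

slope, intercept, r_squared = fit_log2_signal(spike_amounts, normalized_counts)
# Width of the detected design range, in log2 units.
dynamic_range_log2 = math.log2(max(spike_amounts) / min(spike_amounts))
print(f"slope={slope:.2f}, R^2={r_squared:.3f}, range=2^{dynamic_range_log2:.0f}")
```

A slope near 1 means that doubling the spiked amount roughly doubles the measured signal; only controls that were actually detected should be passed in.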
What was the diagnostic power?
[Figure: True Positive Rate vs. False Positive Rate; panels: Rat Experiment, Interlaboratory Experiment]
• Area Under the Curve (AUC) depends on the number of controls detected!
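As a rough sketch of how an AUC can be computed from the controls: rank the differential expression P-values of true positive controls against those of true negative (1:1) controls. The pairwise form below is the standard rank-based equivalent of the area under the ROC curve; the function name and P-values are made up for illustration.

```python
def roc_auc(tp_pvalues, tn_pvalues):
    """Probability that a true positive control has a smaller P-value
    than a true negative control (ties count as one half)."""
    wins = 0.0
    for p_tp in tp_pvalues:
        for p_tn in tn_pvalues:
            if p_tp < p_tn:
                wins += 1.0
            elif p_tp == p_tn:
                wins += 0.5
    return wins / (len(tp_pvalues) * len(tn_pvalues))


# Made-up P-values for detected controls only.
tp_pvalues = [1e-6, 1e-4, 0.002, 0.3]
tn_pvalues = [0.04, 0.2, 0.6, 0.9]

auc = roc_auc(tp_pvalues, tn_pvalues)
print(f"AUC = {auc}")
```

Because only detected controls enter the calculation, two experiments with the same AUC may have been scored on different numbers of controls.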
AUC is a reasonable summary statistic…
But we’d like to evaluate our diagnostic
performance as a function of abundance…
[Figure: MA Plot, Rat Experiment; Log2 Normalized Ratio of Counts vs. Log2 Normalized Average Counts]
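For readers unfamiliar with MA plots, each transcript contributes one point: the log2 ratio of normalized counts between conditions (M) against the log2 average normalized count (A). This is a sketch under the assumption that the average is taken over the two condition means; the function name and counts are hypothetical.

```python
import math
from statistics import mean


def ma_point(treated_counts, control_counts):
    """Return (M, A) for one transcript from replicate normalized counts."""
    mean_treated = mean(treated_counts)
    mean_control = mean(control_counts)
    m_value = math.log2(mean_treated / mean_control)        # log2 ratio
    a_value = math.log2((mean_treated + mean_control) / 2)  # log2 average
    return m_value, a_value


# Made-up replicate counts for a control with a 4:1 design ratio.
m_value, a_value = ma_point([40, 44, 36], [10, 9, 11])
print(f"M = {m_value:.2f}, A = {a_value:.2f}")
```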
LODR: Limit of Detection of Ratios
• Model P-values as a function of average signal
• Find P-value threshold based on chosen false discovery rate
– Here FDR = 0.1
– Default is FDR = 0.05
• Estimate LODR from intersection of model confidence interval upper bound and P-value threshold
[Figure: DE Test P-values vs. Average Counts; labels: Rat Experiment, Reference RNA]
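The LODR idea can be sketched in a few lines. This simplified version fits a straight line to log10 P-value against log10 average counts and solves for where the fitted line crosses the P-value threshold. It is an illustration only: it uses the fitted line itself instead of the upper bound of a model confidence interval, the P-value threshold is taken as a given input, and the function name and data are made up.

```python
import math


def estimate_lodr(avg_counts, pvalues, p_threshold):
    """Average count at which the fitted P-value equals p_threshold.

    Returns None when P-values do not decrease with signal.
    """
    xs = [math.log10(c) for c in avg_counts]
    ys = [math.log10(p) for p in pvalues]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        return None
    x_cross = (math.log10(p_threshold) - intercept) / slope
    return 10 ** x_cross


# Made-up values for controls sharing one design ratio.
avg_counts = [10, 100, 1000, 10000]
pvalues = [0.2, 0.01, 0.002, 0.0001]

lodr = estimate_lodr(avg_counts, pvalues, p_threshold=0.001)
print(f"LODR estimate: about {lodr:.0f} average counts")
```

Using a confidence interval upper bound in place of the fitted line, as the slide describes, would give a larger and therefore more conservative LODR.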
LODR: Limit of Detection of Ratios
LODR provides
• Specified confidence in the differentially expressed transcripts above LODR (90% chance of <10% FDR)
• Guidance for experimental design
→ increase signal for transcripts above LODR estimate
[Figure: DE Test P-values vs. Average Counts; labels: Rat Experiment, Reference RNA]
[Figure: MA Plot, Rat Experiment; Log2 Ratio of Normalized Counts vs. Log2 Normalized Average Counts, with the 4:1 LODR marked]
• Increased sequencing depth shifts endogenous transcript ratio measurements above LODR
What are the LODR estimates for my experiment?
[Figure: DE Test P-values vs. Average Counts; panels: Rat Experiment, Interlaboratory Experiment]
How do the endogenous samples relate to LODR?
[Figure: Log2 Ratio of Normalized Counts vs. Log2 Normalized Average Counts, with the 4:1 LODR marked; panels: Rat Experiment, Interlaboratory Experiment]
How much technical variability & bias is there?
[Figure: Log2 Ratio of Normalized Counts; panels: Rat Experiment, Interlaboratory Experiment; annotations: Decreased Variability, Significant Ratio Bias]
mRNA Fraction Differences Between Samples
Contributes to Bias in ERCC Ratios
[Figure: spike-in added to total RNA (mRNA and rRNA) for Sample 1 and Sample 2, before and after mRNA enrichment]
The RNA fractions are exaggerated for illustration purposes
[Figure: summary of the four erccdashboard figures: Dynamic Range; Diagnostic performance (AUC); Limit of Detection of Ratios (LODR); Variability, Bias, and LODR & Sample Transcripts]
EVALUATE REPRODUCIBILITY
ACROSS LABORATORIES
[Figure legend: Good Performance to Poor Performance]
Interlaboratory Analysis Using
erccdashboard performance metrics
• Lab 1-6: Illumina + poly-A selection (Illumina kit)
• Lab 7-9: Life Tech + poly-A selection (Life Tech kit)
• Lab 10-12: Illumina + ribosomal RNA depletion
• Diagnostic performance was consistent within and amongst measurement processes
• Lab 7 was an outlier for diagnostic performance
Consistent LODR across 11 of 12 Labs
• LODR agreement with AUC
[Figure: LODR (Average Counts) by Laboratory]
Ratio bias is highly variable amongst experiments
• Ratio bias (rm) can be attributed to mRNA fraction difference between samples:
– Rs = nominal subpool ratio
– (E1/E2)s = empirical ratio
– Shippy et al. 2006
• Large standard errors indicate that mRNA fraction isn’t the only factor contributing to ERCC ratio bias
– mRNA enrichment protocol is a factor…
[Figure: mRNA fraction difference, Log(rm), by Laboratory]
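One way to picture a ratio bias estimate with a standard error is the sketch below. It assumes a purely multiplicative bias, empirical ratio = bias × nominal ratio, and averages the log of that factor over controls. Whether this factor corresponds to rm or to its reciprocal depends on the convention of the model cited on the slide, so the direction here is an assumption; the function name and all numbers are made up.

```python
import math
from statistics import mean, stdev


def log_ratio_bias(nominal_ratios, empirical_ratios):
    """Mean and standard error of log(empirical / nominal) across controls."""
    logs = [
        math.log(empirical / nominal)
        for nominal, empirical in zip(nominal_ratios, empirical_ratios)
    ]
    estimate = mean(logs)
    std_error = stdev(logs) / math.sqrt(len(logs))
    return estimate, std_error


# Made-up values: one control from each of four hypothetical subpools.
nominal_ratios = [4.0, 1.0, 2 / 3, 0.5]
empirical_ratios = [2.2, 0.48, 0.35, 0.24]

estimate, std_error = log_ratio_bias(nominal_ratios, empirical_ratios)
print(f"log bias = {estimate:.3f} +/- {std_error:.3f}")
```

If the bias came from mRNA fraction alone, every control would share one factor and the standard error would be small; a large standard error points to additional, control-specific effects.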
Protocol-dependent bias from poly-A selection affects
ERCC controls due to short poly-A tails
Lab 1-6 ILM Poly-A
Lab 7-9 LIF Poly-A
Lab 10-12 ILM Ribo
mRNA enrichment protocol biases vary across individual
ERCCs but are consistent for a protocol
Results of the erccdashboard Publication
• Ratio performance
measures for any
technology platform and
any experiment
– Diagnostic Power
– Novel LODR metric
– Technical Variability & Bias
• Comparison across
experiments
• Quantification of
mRNA fraction differences
between samples
• Show protocol-dependent
bias
Overview
• External RNA Controls Consortium (ERCC)
RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA
controls
ERCC 2.0: A New Suite of RNA Controls
• Approached by industry
and academia to build
new RNA controls
• NIST-hosted open, public
ERCC 2.0 workshop
– Workshop report and
presentations available:
slideshare.net/ERCC-Workshop
• All interested parties are
welcome to participate
– Sequence contributions
– Interlaboratory analysis
• New and Improved
mRNA Mimics
• Transcript Isoforms
• miRNA
New and Improved mRNA Mimics
• Additional controls
• Expand distributions
of RNA control
properties
– Length (> 2kb)
– GC content
– Poly-A tail length
Transcript Isoform Controls
• Transcript Design
– Non-cognate Spike-in
RNA Variants (SIRVs)
developed by Lexogen
– Cognate sequence
selection in progress
• Schizosaccharomyces pombe
• Mixture design
– Dynamic Range
• 24
– Design Ratios
• < 2:1
Lukas Paul, Lexogen
Small and miRNA Controls
• Needed for validation of
clinical applications
– Early Detection Research
Network
– TGen
• Other applications relevant
to bacterial RNA-Seq
• Non-cognate miRNA
controls
• Include some pre-miRNA
• Direct RNA control synthesis
by Agilent
– no need for DNA templates
Karol Thompson, FDA
Recap
• External RNA Controls Consortium (ERCC)
RNA spike-in controls
• ‘erccdashboard’ analysis tool
• ERCC 2.0: Building an updated suite of RNA
controls
Acknowledgements
• All External RNA Controls
Consortium participants
• NIST
– Marc Salit
– Steve Lund
– P. Scott Pine
– Justin Zook
– David Duewer
– Jerod Parsons
– Jennifer McDaniel
– Margaret Klein
• Empa
– Matthias Roesslein
• SEQC study participants
• Co-authors on erccdashboard
manuscript:
S. P. Lund, P. S. Pine, H. Binder,
D. Clevert, A. Conesa, J. Dopazo,
M. Fasold, S. Hochreiter, H. Hong,
N. Jafari, D. P. Kreil, P. P. Łabaj,
S. Li, Y. Liao, S. M. Lin, J. Meehan,
C. E. Mason, J. Santoyo-Lopez,
R. A. Setterquist, L. Shi, W. Shi,
G. K. Smyth, N. Stralis-Pavese,
Z. Su, W. Tong, C. Wang, J. Wang,
J. Xu, Z. Ye, Y. Yang, Y. Yu, & M. Salit
For more information contact: [email protected]