Table of contents
Letter from the Editor
Index of Experts
For Roche/454 Users:
Q1: How do you ensure accuracy and reproducibility when you isolate genomic regions of interest to be sequenced?
Q2: How do you optimize the amount of input DNA?
Q3: What steps do you take to ensure a time-effective sample preparation protocol?
For Illumina/Solexa Users:
Q4: How do you ensure accuracy and reproducibility when you isolate genomic regions of interest to be sequenced?
Q5: How do you optimize the amount of input DNA?
Q6: What steps do you take to ensure a time-effective sample preparation protocol?
List of Resources
Letter from the editor
For the first installment in what we
envision to be a series, GT looks to the
future of next-gen sequencing. By the
end of last year, three next-gen
platforms had made it to market:
Roche/454's Genome Sequencer FLX (an
upgrade of the Genome Sequencer 20);
Illumina's Genome Analyzer; and Applied Biosystems's
SOLiD sequencer. For the purposes of this guide, we've
focused on Roche and Illumina, the two platforms that
have been around for a year or more, to ensure that our
experts have had enough time to refine their protocols.
While there’s been less demand in the research
market for CE instruments, next-gen platforms have
roared to life for a number of applications, including de
novo genome sequencing, gene expression profiling,
ChIP sequencing, small RNA analysis, metagenomics,
and resequencing. And considering the ever-declining
prices, no doubt scientists will continue to use them for
efficient, high-throughput sequencing analyses.
One area that seems to present the most difficulty
in these early days is sample preparation. To that end,
we've gathered experts familiar with both platforms to
lend their insight to the challenge of maintaining
efficient, standardized procedures. In this guide, users
offer advice on isolating genomic regions of interest,
maximizing the amount of input DNA, and ensuring
timely preparation procedures. As always, don't miss our
resources section, which lists additional places to go for
advice on how to keep your next-gen runs as accurate
and reproducible as possible.
— Jeanene Swanson
Index of experts
Genome Technology would like to thank the
following contributors for taking the time to respond
to the questions in this tech guide.
Ghia Euskirchen, Yale University (Mike Snyder's lab)
Yuan Gao, Virginia Commonwealth University
Neil Hall, University of Liverpool
Stephen Kingsmore, National Center for Genome Resources
Matthias Meyer, Max Planck Institute for Evolutionary Anthropology
Kenneth Nelson, Yale University (Mike Snyder's lab)
Anoja Perera, Stowers Institute for Medical Research
Richard Reinhardt, Max Planck Institute for Molecular Genetics
Bruce Roe, University of Oklahoma
Agnes Viale, Memorial Sloan-Kettering Cancer Center
Next-Gen Sequencing Sample Preparation
Genome Technology
For Roche/454 Users:
How do you ensure accuracy
and reproducibility when you
isolate genomic regions of
interest to be sequenced?
We don't do much of this. So far we have only
isolated genomic regions using high fidelity PCR.
— Neil Hall
Apart from whole genome shotgun sequencing we
are currently targeting small genomic regions,
which can be easily enriched through
pre-amplification by PCR or long-range PCR. In our
hands, the success of long-range PCR greatly varies
not only with DNA quality, but also with the PCR
system, and we found it helpful to evaluate the
performance of kits from different suppliers.
— Matthias Meyer
We don't use the 454 for re-sequencing but mainly
for de novo sequencing (based on pooled BACs,
which have been individually measured and
adjusted) and cDNA/microRNA [libraries].
— Richard Reinhardt
We actually rarely focus on specific genomic
regions, but when we do, we use "Touchdown" PCR
coupled with a second round of nested primers and
"Touchdown" PCR to amplify genomic DNA regions
of interest.
— Bruce Roe
We require core facility users to provide purified
genomic DNA. For whole genome sequencing, we
have obtained good quality 454 data using DNA
extracted by several different methods (e.g.,
DNeasy and Proteinase K/phenol-chloroform
extraction kits from Qiagen). We do not believe the
purification method is a critical parameter provided
that the resultant DNA is high molecular weight and
very clean (260/230 > 1.7). For amplicon
resequencing, a proofreading polymerase should be
used during the amplification. We routinely purify
the PCR product using the AMPure Agencourt kit.
We have not yet optimized protocols for
resequencing long-range PCR products.
— Agnes Viale
How do you optimize the
amount of input DNA?
We use a 2:1 template-to-bead ratio. We don't do
titrations and have had consistent runs between
100 Mb and 150 Mb.
— Neil Hall
The material requirements for 454 sequencing are
very low; 1 nanogram or less starting material will
usually produce sufficient library for sequencing,
and there is, in principle, no requirement for
optimizing the amount of input DNA. However,
this is only true if quantitative PCR is used to
estimate the copy number in the sequencing
library. The quantification methods suggested in
Roche’s library preparation protocol are not
sufficiently sensitive, and micrograms of input
material are required to detect resulting libraries
on Agilent chips or in RiboGreen assays. I
generally recommend implementing
quantitative PCR when working with the 454
platform. It not only drastically reduces the
material requirements to nanograms or
picograms, but in our experience also gives more
consistent sequence yields. From 100 or so
libraries we quantified with this method, most
gave optimal sequence numbers without further
titration runs. When using the method
for the first time, it is advisable to include an
existing, well-titrated sequencing library
in the measurements for use as an initial
reference point.
— Matthias Meyer
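The qPCR quantification described above rests on a standard curve: a dilution series of a reference library with known copy numbers is measured, and the copy number of a new library is read off its threshold cycle (Ct). The following is a minimal sketch of that arithmetic, not the published protocol; the function names and all Ct and copy-number values are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series of a well-titrated reference library.
standard_copies = [1e4, 1e5, 1e6, 1e7]
standard_ct = [28.0, 24.7, 21.4, 18.1]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)

# Hypothetical Ct measured for a new library in the same qPCR run.
unknown_copies = copies_from_ct(23.0, slope, intercept)
print(f"slope={slope:.2f}, estimated copies={unknown_copies:.3g}")
```

Including an existing, well-titrated library as one of the standards ties the estimate to a library whose sequencing yield is already known.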
The titration step is the most accurate and best
method to optimize input DNA. However, to get
into the ballpark, we use RiboGreen and
PicoGreen (Invitrogen) assays for quantity and
Agilent Bioanalyzer for sizing. (These are the
standard 454 methods, but we find that they are
essential and cannot be skipped.) A German
group recently published a method using qPCR,
but we have not tried that yet.
— Kenneth Nelson
We generally check the quality using the Agilent
system from which we extract empirical factors,
and in some cases we use titration runs.
— Richard Reinhardt
We typically begin making our library with 5 to 10
µg input DNA, and at various stages we quantitate
the DNA on the Caliper AMS-90. In the
emPCR step we use less input DNA (0.8 molecules
of DNA per bead) rather than the 1.0
to 1.2 molecules of DNA per bead recommended
by Roche/454.
— Bruce Roe
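A molecules-per-bead ratio such as the 0.8 quoted above translates into a library volume once the library concentration and the number of beads in the emulsion are known. The sketch below shows only that arithmetic; the bead count and library concentration are hypothetical placeholders, not values from the Roche/454 protocol.

```python
def library_volume_for_empcr(molecules_per_ul, bead_count, molecules_per_bead):
    """Return the library volume (in µL) that gives the requested ratio."""
    if molecules_per_ul <= 0:
        raise ValueError("library concentration must be positive")
    return bead_count * molecules_per_bead / molecules_per_ul


# Hypothetical inputs: 2 million beads, library at 200,000 molecules per µL.
beads = 2_000_000
library_conc = 2e5

volume_low = library_volume_for_empcr(library_conc, beads, 0.8)
volume_recommended = library_volume_for_empcr(library_conc, beads, 1.0)
print(f"0.8 per bead: {volume_low:.1f} µL; 1.0 per bead: {volume_recommended:.1f} µL")
```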
This step is crucial. An inadequate copy-per-bead
ratio can completely spoil a run. If the DNA is a
discrete band, we use a PicoGreen-based
quantification method to calculate the molarity of
the sample. If the starting material is a smear
(e.g., cDNA), we use the PicoGreen results but we
size-weight the value according to the Agilent
Bioanalyzer DNA 1000 Assay results. This
approach was developed empirically but it works
fairly well.
— Agnes Viale
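Converting a PicoGreen mass concentration into molarity needs the fragment length, which is why a smear has to be weighted by size. The sketch below is one possible reading of that size-weighting, not the lab's actual procedure: it assumes roughly 660 g/mol per base pair of double-stranded DNA and takes hypothetical (length, mass fraction) bins as they might be read off a Bioanalyzer trace.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA (assumption)


def molarity_nm(conc_ng_per_ul, length_bp):
    """Molarity in nM of dsDNA fragments of a single length."""
    # 1 ng/µL = 1e-3 g/L; dividing by g/mol gives mol/L; times 1e9 gives nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def smear_molarity_nm(conc_ng_per_ul, size_bins):
    """Molarity in nM of a smear given (length_bp, mass_fraction) bins."""
    total_fraction = sum(fraction for _, fraction in size_bins)
    return sum(
        molarity_nm(conc_ng_per_ul * fraction / total_fraction, length)
        for length, fraction in size_bins
    )


# Discrete band: hypothetical 10 ng/µL of 500 bp fragments.
band_nm = molarity_nm(10.0, 500)

# Smear: same mass, split evenly between hypothetical 250 bp and 750 bp bins.
smear_nm = smear_molarity_nm(10.0, [(250, 0.5), (750, 0.5)])
print(f"band: {band_nm:.1f} nM, smear: {smear_nm:.1f} nM")
```

In this example the smear holds more molecules than a band of the same mass and mean length, because short fragments contribute more molecules per nanogram.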
What steps do you take to
ensure a time-effective sample
preparation protocol?
Really we only use the manufacturer's protocol.
Shortcuts such as centrifuging to break emulsions
have not worked for us. At the moment, we find
that shortcuts have reduced our throughput.
— Neil Hall
Sample preparation for 454 sequencing in our lab
often involves barcoding of multiple samples before
the construction of a single sequencing library. This
adapts the 454 technology for use with multiple
samples and in many cases better exploits the
sequencing resources. Since the barcoding reactions
add to the time required for sample preparation, we
have developed a protocol for multichannel setup in
plates, allowing for partial automation on a pipetting
robot. Once the samples are barcoded, Roche's
standard protocol for sequencing library preparation
only takes some hours. However, we have observed
that sequencing libraries degrade very rapidly.
Freezing libraries in aliquots immediately after their
production is very helpful to decrease the risk of
failed or suboptimal sequencing runs, and can
therefore save a lot of time and money on this side.
— Matthias Meyer
Since the library prep usually yields enough DNA for
multiple sequencing runs, one careful library prep is
very time effective. We have not found any real
shortcuts to the Roche/454 protocols.
— Kenneth Nelson
We consistently stick to the protocols supplied
by Roche.
— Richard Reinhardt
We adhere to a strict time schedule for the library
and emPCR protocols that has been established over
the past two-plus years. My technicians and students
doing these protocols also work in teams and that
helps keep to the set schedule.
— Bruce Roe
At this point, we are still processing our samples
manually. To reduce reagent cost, we first set up two
or three emPCRs per sample with different
copy-per-bead ratios. Then, based on the percentage
of bead recovery, we select an optimal ratio and
process the remaining samples using this ratio for
the emPCR. This process bypasses the titration on
PTP, but does not reduce the processing time (in
general, we perform sample preparation/processing
Monday through Thursday and run the 454 Thursday
nights).
— Agnes Viale
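When barcoded samples are pooled into a single sequencing library, as in the barcoding approach described above, the reads have to be sorted back to their samples after the run. The guide does not describe that step, so the following is only an illustrative sketch: it assumes each read begins with its barcode, matches barcodes exactly, and uses made-up barcode sequences and sample names.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by exact match of a leading barcode.

    barcodes maps barcode sequence -> sample name. The barcode is trimmed
    from each assigned read. Reads without a known barcode are returned
    separately.
    """
    by_sample = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        for tag, name in barcodes.items():
            if read.startswith(tag):
                by_sample[name].append(read[len(tag):])
                break
        else:
            unassigned.append(read)
    return by_sample, unassigned


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTAAA", "CCCCCCGGGGGG"]

by_sample, unassigned = demultiplex(reads, barcodes)
print(by_sample, unassigned)
```

Exact matching is the simplest choice; it discards reads with a sequencing error in the barcode rather than risk assigning them to the wrong sample.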
For Illumina Users:
How do you ensure accuracy
and reproducibility when you
isolate genomic regions of
interest to be sequenced?
Most of our Solexa (Illumina) work is ChIP
sequencing. Many of the standards that were
developed for ChIP-chip also apply to ChIP-seq, with
antibody validation being critical to all ChIP
experiments. We validate antibodies by IP-western
as well as by mass spectrometry. For reproducibility
we perform and evaluate three biological replicates,
zeroing in on control loci if they are known for a
given factor.
— Ghia Euskirchen
We pretty much check the accuracy and
reproducibility by:
• mapping the reads to the regions of our interest
• using Sanger sequencing to confirm
• performing technical replicates to see correlation
— Yuan Gao
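One common way to quantify the correlation between technical replicates is the Pearson coefficient of per-region read counts. The sketch below assumes that read counts have already been tallied per region for each replicate; the counts shown are hypothetical, and the source does not say which correlation measure is used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical read counts for the same four regions in two replicates.
replicate_1 = [10, 52, 7, 130]
replicate_2 = [12, 48, 9, 121]

r = pearson(replicate_1, replicate_2)
print(f"replicate correlation r = {r:.3f}")
```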
The National Center for Genome Resources currently
has two Solexa-Illumina sequencers in full-time
operation and a third on its way. About one half of
our throughput represents in-house samples and the
other half are provided by academic and industry
clients nationwide. To date, we have brought two
applications into full production — genomic DNA
sequencing and messenger RNA sequencing. The
mRNA protocol was developed by Gary Schroth's
group at Illumina and has been tweaked by Jim
Huntley at NCGR, while our genomic DNA protocol
is standard. For these sample types, we have
developed standard procedures and a LIMS system
to ensure accuracy and reproducibility. It tracks
each sample through the Solexa sequencing
process and Joann Mudge at NCGR has been
working hard to validate quality metrics at various
stages of the process. The standard yield that
passes quality control from seven channels is ~1
gigabase of singleton reads. Our standard read
length is 36 bp, although we've recently been
extending this to 46 bp. One neat accuracy check
that we've done is to run a set of samples both on
the Solexa sequencer and on Infinium HapMap 550K
genotyping chips. This has helped us tremendously
to validate raw and bioinformatically filtered SNP
detection accuracy. For nucleotide variant
detection and management of case-control
association studies we are using a software system
we've developed called Alpheus
(http://alpheus.ncgr.org/). For other sample types,
such as isolated genomic regions of interest, we ask
clients to do the isolation and first steps in the library
preparation. They ship us libraries and we generate
clusters and sequence them. The yield and quality of
these libraries vary.
— Stephen Kingsmore
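The cross-check against genotyping chips described above comes down to a concordance rate: of the SNP positions called by both methods, what fraction agree? The sketch below is not NCGR's pipeline or Alpheus. It assumes genotypes are given as two-letter allele strings keyed by a SNP identifier, and all identifiers and calls are hypothetical.

```python
def genotype_concordance(seq_calls, array_calls):
    """Return (concordance, n_shared) over SNPs called by both methods.

    Genotypes are compared without regard to allele order, so "AG" and
    "GA" count as the same call.
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no SNPs were called by both methods")
    matches = sum(
        1 for snp in shared
        if sorted(seq_calls[snp]) == sorted(array_calls[snp])
    )
    return matches / len(shared), len(shared)


# Hypothetical calls, for illustration only.
sequencer = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": "GA"}
chip = {"snp1": "GA", "snp2": "CC", "snp3": "CT", "snp5": "AA"}

concordance, n_shared = genotype_concordance(sequencer, chip)
print(f"{concordance:.2%} concordant over {n_shared} shared SNPs")
```

Running the same comparison on raw and on filtered calls shows whether a filter improves agreement with the chip.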
So far we have not isolated genomic regions. We
have only performed whole genome-wide
experiments. In the future, if we do isolate regions
we will have to perform validation experiments. The
type of validation experiment will depend on what
regions are isolated and the techniques used to
isolate. For instance, if we do long-range PCR to
isolate a small region, we could run a gel to ensure
we are amplifying the expected size. Also, we can
perform Sanger sequencing with the PCR primers to
confirm the amplified region.
— Anoja Perera
Any kind of UV- or gel-based measurements are used
to determine the amount of PCR-amplified samples,
cDNA [libraries] for expression profiling or
ChIP-based experiments.
— Richard Reinhardt
How do you optimize the
amount of input DNA?
Library size is an important parameter in obtaining
good quality data. We monitor library performance
in part by examining sequence data for identical
reads which can be generated during the PCR
amplification step if insufficient starting material
was used. Additionally, if there is an excess of
adapters relative to input material, the adapters
ligate to each other without an insert and yield a
large number of adapter reads.
— Ghia Euskirchen
We have used different amounts of input DNA to
make libraries and then determine which
concentration yields better results. We found out
that the most important optimization is the input
library concentration. We usually use 3 pM to 4 pM
of library DNA to generate clusters. There are many
ways to measure the concentration of the library.
We used a combination of measuring the amount of
input DNA by Nanodrop and running against a
quantitative marker on a gel. We highly recommend
doing both, as this may be the most important factor
to determine your final sequencing output.
— Yuan Gao
There are two points at which we seek to optimize
the amount of input material. The first is at the time
of RNA library generation, when many clients want
to generate sequence from as little as 1 microgram
of total RNA. The second point is at cluster
generation. Addition of either too much or too little
library results in fewer sequence reads. The optimal
number of clusters will generate almost 5 million
passing reads per channel. We use an Agilent
Bioanalyzer to determine the library concentration
and typically load 1 pM to 3.5 pM.
— Stephen Kingsmore
Quantity as well as quality matters when it comes to
input DNA. Here, an efficient cleanup technique is
a must!
— Anoja Perera
We try to determine the amount of clusters
generated by fluorescent measurement, but mainly
it is based on empirical feeling and empirical
factors.
— Richard Reinhardt
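Checking sequence data for identical reads, as Ghia Euskirchen describes, can be reduced to a single number: the fraction of reads that are exact copies of a read already seen. The sketch below assumes the reads are available as plain strings and treats only exact matches as duplicates; the example reads are made up, and the source does not give a threshold for what counts as too many.

```python
from collections import Counter


def duplicate_fraction(reads):
    """Fraction of reads that are exact copies of an earlier read."""
    if not reads:
        raise ValueError("no reads given")
    distinct = len(Counter(reads))
    return (len(reads) - distinct) / len(reads)


# Hypothetical reads: "ACGT" appears three times, so two reads are copies.
reads = ["ACGT", "ACGT", "TTTT", "GGGG", "ACGT"]
print(f"duplicate fraction: {duplicate_fraction(reads):.2f}")
```

A rising duplicate fraction from one library to the next is the signal to revisit the amount of starting material.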
What steps do you take to
ensure a time-effective sample
preparation protocol?
We find the genomic and ChIP DNA library
preparation to be quite straightforward. Mostly we
try to space out our samples during the library
preparation to avoid any cross-contamination.
— Ghia Euskirchen
Solexa sample preparation is easy enough. We
pretty much follow Illumina's protocol.
— Yuan Gao
The Solexa-Illumina sample preparation protocol is
fast (~a day) and several libraries can be generated
simultaneously. The bottlenecks in the process are
not at sample preparation, but at cluster generation
(we have two cluster stations for two sequencers to
alleviate this), sequence generation (particularly
when we are generating 46-bp reads), basecalling,
and genomic alignments.
— Stephen Kingsmore
Plan ahead of time, set up a schedule, and organize
yourself. Familiarize yourself with the protocols
beforehand. Make sure all reagents and supplies are
available to work with. Have a backup plan! For
example, have extra supplies in case something goes
wrong. We have had two faulty amplification
manifolds in the past, and if we didn't have backup
ones our experiments would have been delayed.
Read your protocols and draw out timelines
next to the steps. The gene expression protocols take
three full days and without proper preparation you
will be putting in more than eight hours. Look to the
next steps while you are on a waiting step to see
what needs to be thawed to cut out downtime.
Arrange your work area to maximize workflow.
— Anoja Perera
We use the cluster station from Illumina but try to
consistently follow the protocols.
— Richard Reinhardt
List of resources
Our panel of experts referred to a number of
publications and online tools that may be able to
help you get a handle on sample preparation for
next-generation sequencing. Whether you're a
novice or pro at this new technology, these
resources are sure to come in handy.
Publications
Brockman W, Alvarez P, Young S, Garber M,
Giannoukos G, Lee WL, Russ C, Lander ES,
Nusbaum C, Jaffe DB. Quality scores and SNP
detection in sequencing-by-synthesis systems.
Genome Res. Jan 22, 2008 [Epub ahead of print].
Don RH, Cox PT, Wainwright BJ, Baker K, Mattick
JS. 'Touchdown' PCR to circumvent spurious
priming during gene amplification. Nucleic
Acids Res. 19(14): 4008 (1991).
Fahlgren N, Howell MD, Kasschau KD, Chapman EJ,
Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR,
Dangl JL, Carrington JC. High-throughput
sequencing of Arabidopsis microRNAs: evidence
for frequent birth and death of MIRNA genes.
PLoS ONE. 2(2):e219 (2007).
Hafner M, Landgraf P, Ludwig J, Rice A, Ojo T, Lin
C, Holoch D, Lim C, Tuschl T. Identification of
microRNAs and other small regulatory RNAs
using cDNA library sequencing. Methods.
44(1):3-12 (2008).
Hillier LW, Marth GT, Quinlan AR, Dooling D,
Fewell G, Barnett D, Fox P, Glasscock JI,
Hickenbotham M, Huang W, Magrini VJ, Richt RJ,
Sander SN, Stewart DA, Stromberg M, Tsung EF,
Wylie T, Schedl T, Wilson RK, Mardis ER. Whole-genome sequencing and variant discovery in
C. elegans. Nat Methods. 5(2):183-8 (2008).
Meyer M, Briggs AW, Maricic T, Höber B, Höffner
B, Krause J, Weihmann, Pääbo S, Hofreiter M.
From micrograms to picograms: quantitative
PCR reduces the material demands of high-throughput sequencing. Nucleic Acids Res.
36(1):e5 (2008).
Meyer M, Stenzel U, and Hofreiter M. Parallel
tagged sequencing on the 454 platform.
Nature Protocols. 3:267-278 (2008).
Robertson G, Hirst M, Bainbridge M, Bilenky M,
Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R,
Delaney A, Thiessen N, Griffith OL, He A, Marra
M, Snyder M, Jones S. Genome-wide profiles of
STAT1 DNA association using chromatin
immunoprecipitation and massively parallel
sequencing. Nat Methods. 4(8):651-7 (2007).
Rusk N, Kiermer V. Primer: Sequencing — the
next generation. Nat Methods. 5(1):15 (2008).
Schuster SC. Next-generation sequencing
transforms today's biology. Nat Methods.
5(1):16-8 (2008).
Tarasov V, Jung P, Verdoodt B, Lodygin D,
Epanchintsev A, Menssen A, Meister G,
Hermeking H. Differential regulation of
microRNAs by p53 revealed by massively
parallel sequencing: miR-34a is a p53 target
that induces apoptosis and G1-arrest. Cell
Cycle. 6(13):1586-93 (2007).
Wold B, Myers RM. Sequence census methods
for functional genomics. Nat Methods. 5(1):
19-21 (2008).
Conferences
Next Generation Sequencing: Platforms,
Applications, and Case Studies (CHI conference)
http://www.healthtech.com/2008/seq/index.asp
Next Generation Sequencing Symposium
http://www.nminbre.org/pages/events/nmbis/2008/
Next-Generation Sequencing Data Management
http://blog.bioteam.net/2008/01/15/workshopnext-generation-sequencing-data-management/