NOVEL IMPROVEMENTS TO THE ILLUMINA TRUSEQ INDEXED LIBRARY CONSTRUCTION, AMPLIFICATION AND QUANTIFICATION PROTOCOLS FOR OPTIMIZED MULTIPLEXED SEQUENCING

GAVIN J. RUSH¹, ERIC VAN DER WALT¹, JACOB KITZMAN², LIESL NOACH¹, ZAYED ALBERTYN³, COLIN HERCUS³, CHARLIE LEE², JAY SHENDURE², JOHN F. FOSKETT III¹, PAUL J. MCEWAN¹

¹ KAPA BIOSYSTEMS, 600 WEST CUMMINGS PARK, SUITE 2250, WOBURN, MA 01801 | ² UNIVERSITY OF WASHINGTON, DEPARTMENT OF GENOME SCIENCES, FOEGE BUILDING S-250, BOX 355065, 3720 15TH AVE NE, SEATTLE, WA 98195 | ³ NOVOCRAFT, C-23A-5, TWO SQUARE, SECTION 19, 46300 PETALING JAYA, SELANGOR, MALAYSIA

INTRODUCTION

Dramatic improvements in commercial Next Generation Sequencing (NGS) platforms have resulted in spectacular reductions in the cost per base of DNA sequencing. Until recently, innovation has focused primarily on the core sequencing technologies, with optimization of sample preparation playing a secondary role. The exponential gains in sequencing capacity have also led to higher sample throughput, placing increasing emphasis on improved library construction protocols for multiplexed sample sequencing. While the major commercial NGS systems all require the construction of similar libraries via analogous workflows, some protocols and/or reagents offer significant advantages over others, and end-users must choose among numerous alternative methods and reagents for sample preparation.

We re-sequenced the Staphylococcus aureus, Escherichia coli, and Mycobacterium tuberculosis genomes to compare the standard Illumina TruSeq sample preparation reagents and workflow with a number of innovative improvements, including: alternative library preparation reagents and protocols; automated fragment size selection; real-time library amplification; amplification-free sequencing; and accurate qPCR library quantification for sample pooling and multiplexed sequencing.

SAMPLE PREPARATION

[Figure: Yield of adaptor-ligated library molecules before size selection. KAPA Library Prep: 21 767 pM (~11.3%); Illumina TruSeq Library Prep: 885 pM (~0.46%).]

Sheared genomic DNA was prepared in bulk by nebulization, and identical starting material was used for each library (1 μg of gDNA from S. aureus, E. coli, or M. tuberculosis). Libraries were constructed using reagents and procedures (see detailed methods on left-hand side) from KAPA Biosystems (green; 9 libraries), or using the Illumina TruSeq DNA Sample Prep Kit and Low Throughput Protocol (orange; 15 libraries). Libraries were quantified by qPCR before size selection using the KAPA Biosystems Library Quantification Kit according to the recommended protocol. The estimated percentage of starting material that was converted to useful, adaptor-ligated (PCR-amplifiable) library molecules is provided.

[Figure: Yield of adaptor-ligated library molecules after size selection. Manual gel purification: 1 546 pM (~2.09%); Sage Science Pippin Prep: 371 pM (~0.49%).]

Starting with 1 μg of either S. aureus, E. coli, or M. tuberculosis gDNA, we constructed duplicate libraries using reagents and procedures from the KAPA DNA Library Prep Kit (see detailed methods on left-hand side). Duplicate libraries were pooled, and the three resulting libraries (S. aureus, E. coli, or M. tuberculosis) were each split equally before size selection either by manual gel purification (Qiagen QIAquick Gel Extraction Kit) or by Pippin Prep automated gel purification (Sage Science). Libraries were quantified by qPCR using the KAPA Library Quantification Kit according to the recommended protocol. The estimated percentage of starting material converted to useful, adaptor-ligated (PCR-amplifiable) library molecules is provided.

The KAPA Biosystems library construction reagents and protocol produced ~25-fold more adaptor-ligated library fragments from the same amount of starting material than did the standard Illumina TruSeq DNA Sample Prep Kit. Important differences between the two library construction procedures assessed here include: combined enzyme and reaction buffer "master mix" formulations (TruSeq) vs. separate enzyme and reaction buffer (KAPA); and the absence of an AMPure XP bead clean-up between A-tailing and adaptor ligation in the TruSeq protocol.

Size selection by manual gel purification yielded ~4-fold more DNA than size selection by automated gel purification using the Sage Science Pippin Prep apparatus. To a large extent this may be accounted for by the fact that the Pippin Prep method yielded a much narrower size range of library fragments (see below), despite careful manual gel purification targeting an equivalent size range of library molecules.

SIZE SELECTION: MANUAL GEL EXTRACTION vs. AUTOMATED SIZE SELECTION (Pippin Prep DNA Size Selection System)

Manual gel extraction:
· Requires manual gel excision and downstream extraction of the sample from the agarose gel.
· Manual gel electrophoresis requires a longer run time and more manual steps.
· Actual DNA fragment length distribution is broader and less reproducible.
· Contamination with adaptor-dimers and other low molecular weight artifacts leads to lower data quality.
· Potential for cross-contamination amongst samples on a single gel.

Automated size selection:
· Selects tight, accurate, and reproducible DNA fragment size ranges.
· Reduction of low molecular weight DNA contamination reduces library artifacts.
· Reduced potential for cross-contamination amongst samples.
· Significant time and labor savings from quicker run times and automated elution.
· Automated elution of sample material into buffer compatible with downstream workflows (PCR amplification, flow cell amplification).

[Figure: Size distribution of library molecules after size selection by manual agarose gel purification or automated gel purification with Sage Science Pippin Prep. Axes: proportion of read pairs vs. insert size (bp).]

Starting with 1 μg of either S. aureus, E. coli, or M. tuberculosis sheared gDNA, we constructed duplicate libraries using reagents and procedures from the KAPA DNA Library Prep Kit (see detailed methods on left-hand side). Duplicate libraries were pooled, and the three resulting libraries (S. aureus, E. coli, or M. tuberculosis) were each split equally before size selection either by manual gel excision and purification (Qiagen QIAquick Gel Extraction Kit) or by Pippin Prep automated gel purification (Sage Science). We set the Pippin Prep instrument to collect library molecules in the 370–450 bp range including adaptors, and we attempted to select the same range via manual agarose gel purification. TruSeq adaptors are 121 bp in total, so the targeted insert sizes were ~250–330 bp (broken lines). Paired reads (2 x 75 bp) were used to determine actual insert sizes.

AMPLIFICATION

RATIONALE FOR REAL-TIME HIGH FIDELITY AMPLIFICATION OF NEXT-GENERATION DNA SEQUENCING LIBRARIES

High fidelity PCR is used to selectively enrich library fragments carrying appropriate adaptor sequences and to increase the amount of DNA prior to sequencing. During PCR enrichment of libraries, a subset of library molecules is amplified with reduced efficiency, introducing bias and resulting in uneven sequence coverage. GC content is known to be an important factor in NGS library amplification bias, and different PCR enzymes and buffer formulations are likely to show individual strengths and weaknesses in this regard. Furthermore, such biases, along with other artifacts such as PCR-induced errors, adaptor dimers, PCR duplicates, and chimeras, are exacerbated by over-amplification, while under-amplification results in insufficient yields. Inherent uncertainty in the outcome of end-point PCR often demands downstream validation of library quality by electrophoresis.

Here we present a high fidelity, real-time PCR method for rapid and convenient enrichment and amplification of libraries. The benefits of this approach include: 1) automatable workflows, 2) built-in quality metrics for each enriched library, eliminating expensive and time-consuming post-enrichment gel electrophoresis, 3) precise control over the number of PCR cycles required for optimal amplification, 4) a quality control metric for identifying inconsistencies in library preparation, and 5) seamless integration with KAPA qPCR Library Quantification Kits.

[Figure: Real-time high fidelity amplification of next-generation DNA sequencing libraries. (A) Fluorescence vs. cycle for library reactions and Fluorescent Standards 1–4; (B) gel image of one library stopped at cycles 11–16; (C) amplification plots relative to the target concentration range.]

(A) Libraries are amplified using a SYBR® Green-based real-time high fidelity PCR master mix. Four wells of the PCR plate contain fluorescein reference standards representing a range of distinct DNA concentrations. Reactions terminated between standards 1 and 3 represent the optimal library amplification range, depicted here from cycles 10 to 14. (B) Gel image of a typical library stopped at different amplification cycles. Low and high molecular weight artifacts increase progressively with additional cycles. (C) Typical amplification plots for reactions terminated at the lower (left panel) or upper (right panel) bounds of the targeted concentration range (grey box). Library amplification reactions should ideally be terminated within the indicated target concentration range.

[Figure: Example of multiplexed real-time high fidelity amplification (fluorescence vs. cycle).]

Twenty libraries, spanning a ~64-fold concentration range (6 cycles), were simultaneously amplified and terminated after 14 cycles. Fourteen of the 20 libraries fall within the targeted amplification range. The remaining 6 libraries could either be used as is, noting that they may be outside the optimal concentration range, or be re-amplified individually or in high- or low-concentration groups.

TruSeq Low-Throughput Library Amplification Workflow (fixed cycle number amplification)

SIZE SELECTION
Size separation: 1. Manual gel extraction, purification (MinElute Gel Extraction Kit, Qiagen). Recover 20 μL library DNA.

OPTIONAL: CYCLE NUMBER OPTIMIZATION
SET UP AND INITIATE SMALL-SCALE PCR: No details supplied in protocol. Note: This can negatively impact library diversity because it will leave less than 20 μL of library for the subsequent preparative PCR reaction.
PROGRAM THERMOCYCLER: 98 °C for 30 s; 16 cycles of: 98 °C for 10 s, 60 °C for 30 s, 72 °C for 30 s.
REMOVE SMALL-SCALE PCR REACTIONS AT DESIRED PCR CYCLES: Pause the thermocycler, remove a small-scale PCR tube after each predetermined cycle, and place it on ice before resuming cycling.
VALIDATE IDEAL CYCLE NUMBER BY PERFORMING GEL ELECTROPHORESIS: Run an aliquot of each small-scale sample on an agarose gel or Bioanalyzer chip to determine the optimal cycle number to be used for the preparative PCR.

SET UP AND INITIATE PREPARATIVE PCR: 1 x 50 μL PCR reaction per library:
TruSeq PCR MM 25 μL
TruSeq PCR Primer Mix 5 μL
Size Separated Library 20 μL
PROGRAM THERMOCYCLER FOR PREPARATIVE PCR: 98 °C for 30 s; 10 cycles* of: 98 °C for 10 s, 60 °C for 30 s, 72 °C for 30 s; 72 °C for 5 min. *Or as determined by optional small-scale cycle optimization.
CLEAN UP PCR REACTION: Use AMPure XP Beads and elute in 32.5 μL Resuspension Buffer, according to the TruSeq protocol.
QUANTIFY AMPLIFIED LIBRARY USING ILLUMINA QUANTIFICATION PROTOCOL GUIDE

Multiple Sample Workflow using KAPA High Fidelity qPCR Kit for TruSeq Library Amplification (high fidelity real-time PCR)

SIZE SELECTION
Size separation using either:
1. Manual gel extraction, purification (MinElute Gel Extraction Kit, Qiagen). Recover 20 μL library DNA.
2. Automated DNA size selection and collection system (Pippin Prep, Sage Science). Recover 30 μL library DNA.

SET UP AND INITIATE PREPARATIVE HIGH FIDELITY qPCR: 1 x 50 μL qPCR reaction per library:
KAPA HiFi qPCR MM 25 μL
TruSeq PCR Primer Mix 5 μL
Size selected library 20 μL
Load 50 μL of each fluorescent standard 1–4 (in triplicate) per qPCR plate.
PROGRAM REAL-TIME THERMOCYCLER: 98 °C for 45 s; 30 cycles of: 98 °C for 15 s, 60 °C for 30 s, 72 °C for 30 s, data acquisition, 72 °C for 5 s*.
*Note: The 5 s step at 72 °C enables termination of the qPCR reaction (once the desired fluorescence intensity has been achieved) before the next cycle of denaturation is initiated.
TERMINATE HIGH FIDELITY qPCR: Terminate the qPCR reaction when the linear amplification plots of the samples fall between fluorescent standards 1 and 3, i.e. within the targeted concentration range. This 4-cycle termination window enables single-plate amplification of libraries with up to a 16-fold difference in initial concentration.
CLEAN UP qPCR REACTION: Use AMPure XP Beads and elute in 32.5 μL Resuspension Buffer, according to the TruSeq protocol.
QUANTIFY AMPLIFIED LIBRARY USING KAPA LIBRARY QUANTIFICATION KIT

DISADVANTAGES ASSOCIATED WITH THE EXISTING TRUSEQ WORKFLOW
· Wastage of library material required for cycle number optimization leads to loss of library diversity.
· Variability in the nature and/or concentration of the input DNA can result in under- or over-amplification, exacerbating amplification bias and artifacts.
· Inconsistencies in reaction volume scale-up can cause variable results.
· Gel electrophoresis steps are not amenable to automation.
· Post-amplification gel electrophoresis QC is not amenable to automation.
· The longer protocol involves more time and labor.

ADVANTAGES OF HIGH FIDELITY REAL-TIME PCR
· Built-in real-time quality metrics (concentration range) for each amplified DNA library.
· Real-time PCR is amenable to automation.
· Precise control over the PCR cycle number required for optimal amplification.
· Seamless integration with qPCR-based library quantification.
· Ideally, libraries should be minimally amplified to a low final concentration (~3.5 nM), which is not detectable via standard gel electrophoresis.
· KAPA HiFi DNA Polymerase is less prone to amplification bias due to high or low GC content.

[Figure: Windowed depth vs. %GC (mean coverage depth vs. GC fraction, 250 bp windows) for S. aureus (left) and M. tuberculosis (right). Samples: unamplified control (l_8_tru4, amp:noAmp); KAPA HiFi qPCR, "early" (l_7_tru12, amp:kapa_hifi_q_early); TruSeq PCR (l_8_tru2, amp:ilmn).]

Effect of GC content on coverage depth for libraries amplified using KAPA HiFi qPCR Master Mix or Illumina TruSeq PCR Master Mix. Indexed libraries were prepared from identical sheared S. aureus (33% GC; left panels) and M. tuberculosis (65% GC; right panels) gDNA using either the KAPA DNA Library Prep Kit or the Illumina TruSeq DNA Sample Prep Kit, and then amplified using the indicated PCR reagents before paired-end sequencing (2 x 75 bp). After filtering and aligning read pairs to reference sequences, 250 000 read pairs were randomly sampled for each genome, and scatter plots of mean sequence coverage depth vs. GC content were generated by analyzing 250 bp windows. For the AT-rich S. aureus genome (left panels), none of the samples showed gross amplification bias compared to the unamplified control sample. GC-rich M. tuberculosis sequences (right panels) in the library constructed and amplified using Illumina TruSeq reagents are under-represented in the sequencing data. In contrast, the library prepared using KAPA Biosystems reagents yielded coverage across the range of GC content that is almost indistinguishable from that of the unamplified control, indicating that amplification with KAPA HiFi qPCR Master Mix introduced minimal additional GC-dependent coverage bias.

[Figure: Fraction of genome not represented, by library preparation (KAPA or TruSeq) and amplification method (KAPA HiFi, Phusion or TruSeq PCR; qPCR or end-point PCR). Calculated doublings shown: 6.0, 9.9, 13.6, 13.6, 14.2, 13.2.]

Percentage of the M. tuberculosis genome not represented at all (top), or not represented at ≥5x coverage (bottom), when using various library construction and PCR reagents. Indexed libraries were prepared from identical sheared M. tuberculosis (65% GC) gDNA using either the KAPA DNA Library Prep Kit or the Illumina TruSeq DNA Sample Prep Kit, and then amplified using the indicated PCR reagents before paired-end sequencing (2 x 75 bp). Libraries were quantified before and after amplification using the KAPA Library Quantification Kit to determine the number of doublings in each case. After filtering and aligning read pairs to reference sequences, 250 000 read pairs (~8.5x coverage) were randomly sampled for each genome.

LIBRARY QUANTIFICATION

KAPA LIBRARY QUANTIFICATION KIT FOR OPTIMAL SAMPLE MULTIPLEXING

Accurate quantification of the number of amplifiable molecules in a library is critical to the outcome of sequencing on the Illumina Genome Analyzer next-generation sequencing platform. Overestimation of library concentration results in lower cluster density after bridge PCR. Underestimation of library concentration results in too many clusters on the flow cell, which can lead to poor cluster resolution. Both scenarios result in suboptimal sequencing capacity. qPCR is widely regarded as the gold standard for accurate quantification of DNA libraries, as it is the only technique capable of measuring the number of amplifiable molecules. The broad dynamic range of qPCR also enables accurate quantification of extremely dilute libraries.

KAPA Library Quantification Kits comprise DNA Standards (six 10-fold dilutions) and 10X Primer Premix, paired with KAPA SYBR® FAST qPCR Kits to accurately quantify the number of amplifiable molecules in an Illumina GA library. The 452 bp KAPA Illumina GA DNA Standard consists of a linear DNA fragment flanked by qPCR primer binding sites. Quantification is achieved by inference from a standard curve generated using the six DNA Standards.

[Figure: KAPA Library Quantification Kit Workflow. Components: KAPA SYBR® FAST qPCR Master Mix (2X), Primer Mix (10X), DNA Standards 1–6 and library samples. Standard curve: cycle vs. Log (Concentration), y = -3.4496x + 36.24, R² = 0.9998.]

Accurate qPCR-based library quantification ultimately depends on three factors: (i) the accuracy and reproducibility of the standards used, (ii) the ability of the DNA polymerase used in the qPCR to amplify all adaptor-flanked molecules with equal efficiency, and (iii) accurate and reproducible liquid handling. KAPA Library Quantification Kits are rigorously tested to ensure minimal lot-to-lot variation. In addition, KAPA SYBR® FAST qPCR Kits are designed for high-performance, high-throughput real-time PCR. The kit contains a novel DNA polymerase engineered via molecular evolution, resulting in a unique enzyme optimized for qPCR using SYBR® Green I dye chemistry. KAPA SYBR® FAST qPCR Kits are ideally suited to library quantification applications, as they support high-efficiency amplification of both AT- and GC-rich targets, and of fragments up to 1 kb in length.

Here we demonstrate the use of the system to:
1. Generate Illumina TruSeq indexed libraries comprising equimolar concentrations of 3 bacterial genomes spanning a range of average GC contents (S. aureus, 33% GC; E. coli, 51% GC; and M. tuberculosis, 65% GC).
2. Pool multiple Illumina TruSeq indexed libraries in equimolar concentrations for equal representation in sequence data.

[Figure: Relative qPCR concentration of S. aureus, E. coli and M. tuberculosis libraries for each TruSeq index.]

qPCR quantification enables equal representation of different sample types within indexed libraries. For each index (TruSeq 4, 5, and 12) we constructed three separate libraries (S. aureus, 33% GC; E. coli, 51% GC; and M. tuberculosis, 65% GC). Individual libraries were quantified using the KAPA Library Quantification Kit, and for each index the libraries were pooled to achieve equimolar representation for each genome. The results indicate that quantification is reliable for samples with a wide range of GC content.

POOLED LIBRARY 1
TruSeq Index   qPCR Concentration   Vol. in Pool   Conc. in Final Pool
2              76.5 nM              1.04 μL        2.35 nM
4              5.31 nM              15.00 μL       2.35 nM
5              51.09 nM             1.56 μL        2.35 nM
6              36.88 nM             2.16 μL        2.35 nM
7              36.69 nM             2.17 μL        2.35 nM
12             6.67 nM              11.94 μL       2.35 nM
Totals:                             33.87 μL       11.76 nM

POOLED LIBRARY 2
TruSeq Index   qPCR Concentration   Vol. in Pool   Conc. in Final Pool
2              38.41 nM             2.67 μL        4.35 nM
5              51.98 nM             1.97 μL        4.35 nM
6              51.24 nM             2.00 μL        4.35 nM
7              52.05 nM             1.97 μL        4.35 nM
12             6.84 nM              15.00 μL       4.35 nM
Totals:                             23.62 μL       21.73 nM

[Figure: % of total assigned reads in lane for each Illumina TruSeq index in pooled libraries 1 and 2.]

qPCR quantification enables equal representation of pooled indexed libraries.
Eleven indexed Illumina TruSeq libraries were quantified by qPCR using the KAPA Library Quantification Kit according to the recommended protocol, and then combined to achieve equal final concentrations in two separate pools for multiplexed sequencing on different flow-cell lanes. The eleven libraries ranged ~11-fold in concentration from 0.67 pM to 7.65 pM, while representation of each index varied between 90% and 127% of expected assigned reads per lane.
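
The Vol. in Pool column of the pooling tables can be reproduced with a short calculation. The sketch below rests on an assumption inferred from the tabulated numbers rather than a rule stated on the poster: the least concentrated library in each pool is added at 15.00 μL, and every other library is added in the same molar amount (nM × μL = fmol). The names `IndexedLibrary` and `equimolar_pool` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexedLibrary:
    index: int      # TruSeq index number (used only as a label)
    conc_nM: float  # qPCR concentration of the library stock, nM


def equimolar_pool(libraries, max_volume_uL=15.0):
    """Volumes giving every library the same molar amount in the pool.

    Assumption (inferred from the pooling tables): the least concentrated
    library is added at max_volume_uL and all others are matched to it.
    Returns (volumes keyed by index in uL, total volume in uL,
    final concentration of each library in the pool in nM).
    """
    lowest = min(lib.conc_nM for lib in libraries)
    fmol_each = lowest * max_volume_uL  # nM x uL = fmol
    volumes = {lib.index: fmol_each / lib.conc_nM for lib in libraries}
    total_uL = sum(volumes.values())
    per_library_nM = fmol_each / total_uL
    return volumes, total_uL, per_library_nM


if __name__ == "__main__":
    pool_2 = [IndexedLibrary(2, 38.41), IndexedLibrary(5, 51.98),
              IndexedLibrary(6, 51.24), IndexedLibrary(7, 52.05),
              IndexedLibrary(12, 6.84)]
    volumes, total, each = equimolar_pool(pool_2)
    for index, vol in volumes.items():
        print(f"TruSeq index {index}: {vol:.2f} uL")
    print(f"Total {total:.2f} uL; {each:.2f} nM of each library")
```

Under this assumption the computed volumes for both pools agree with the Vol. in Pool column to two decimal places. Capping the volume of the most dilute library, rather than targeting a fixed final concentration, lets that library contribute as much material as the volume budget allows.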
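
Quantification "by inference from a standard curve" amounts to inverting the linear fit of cycle number against log concentration. The sketch below uses the fit printed on the poster (y = -3.4496x + 36.24), assuming x is log10 of concentration in the units of the DNA Standards, which the poster does not state. It also includes a length correction (scaling by 452 bp divided by the library fragment length) that is commonly applied with a 452 bp standard; treat that step as an assumption, since the poster does not describe it. All function names are illustrative.

```python
import math

# Linear fit shown on the poster: Cq = SLOPE * log10(conc) + INTERCEPT.
SLOPE = -3.4496
INTERCEPT = 36.24
STANDARD_LENGTH_BP = 452  # KAPA Illumina GA DNA Standard length


def efficiency(slope):
    """Per-cycle amplification efficiency implied by a log10 standard curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


def concentration_from_cq(cq, slope=SLOPE, intercept=INTERCEPT):
    """Invert the standard curve; the result is in the units of the standards."""
    return 10 ** ((cq - intercept) / slope)


def size_adjusted(conc, library_length_bp, standard_length_bp=STANDARD_LENGTH_BP):
    """Assumed length correction: scale by standard length / library length."""
    return conc * standard_length_bp / library_length_bp


if __name__ == "__main__":
    print(f"Amplification efficiency: {efficiency(SLOPE):.1%}")
    # Hypothetical library Cq and fragment length, for illustration only.
    raw = concentration_from_cq(20.0)
    print(f"Raw: {raw:.3g}; size-adjusted for a 400 bp library: {size_adjusted(raw, 400):.3g}")
```

A slope of -3.4496 corresponds to roughly 95% efficiency per cycle; a slope of about -3.32 would correspond to perfect doubling.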
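
The calculated doublings in the genome-representation figure and the fold ranges quoted for the termination window (4 cycles for 16-fold, 6 cycles for ~64-fold) both follow from log2 arithmetic. In the sketch below, doublings are computed from amounts (concentration × volume) measured before and after amplification; this is an assumption about how the poster's values were derived. The 20 μL input and 32.5 μL elution volumes come from the workflows, while the concentrations in the usage example are hypothetical.

```python
import math


def doublings(pre_conc, pre_volume_uL, post_conc, post_volume_uL):
    """Effective doublings = log2(amount after PCR / amount before PCR).

    Amount = concentration x volume; any concentration unit works as long as
    the same unit is used for both measurements.
    """
    return math.log2((post_conc * post_volume_uL) / (pre_conc * pre_volume_uL))


def fold_range(cycles, efficiency=1.0):
    """Fold difference spanned by `cycles` cycles at a given per-cycle efficiency."""
    return (1.0 + efficiency) ** cycles


def cycles_for_fold(fold, efficiency=1.0):
    """Cycles needed to span a given fold difference in starting concentration."""
    return math.log(fold) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    # Hypothetical concentrations: 20 uL of size-selected library in,
    # 32.5 uL eluted after AMPure XP clean-up.
    print(f"{doublings(0.5, 20.0, 4000.0, 32.5):.1f} doublings")
    print(fold_range(4), fold_range(6))  # ideal 4-cycle window, 6-cycle spread
    print(f"{fold_range(4, 0.949):.1f}-fold over 4 cycles at 94.9% efficiency")
```

The 16-fold and 64-fold figures assume perfect doubling; at ~95% efficiency per cycle, the same 4-cycle window spans about 14.4-fold.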
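
The coverage quoted in the captions (250 000 read pairs of 2 x 75 bp giving ~8.5x for M. tuberculosis) and the windowed depth vs. %GC plots can be reproduced in outline as below. This is a sketch, not the authors' analysis pipeline. It assumes non-overlapping 250 bp windows, a per-base depth array already computed from the alignments, and a genome size of 4 411 532 bp for the M. tuberculosis H37Rv reference (the poster does not name the reference strain). Function names are illustrative.

```python
from typing import List, Sequence, Tuple


def expected_coverage(read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean depth: read pairs x 2 reads x read length / genome size."""
    return read_pairs * 2 * read_length_bp / genome_size_bp


def windowed_gc_depth(seq: str, depth: Sequence[int], window: int = 250) -> List[Tuple[float, float]]:
    """(GC fraction, mean depth) for each complete, non-overlapping window."""
    points = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        gc = (chunk.count("G") + chunk.count("C")) / window
        mean_depth = sum(depth[start:start + window]) / window
        points.append((gc, mean_depth))
    return points


def fraction_below(depth: Sequence[int], min_depth: int) -> float:
    """Fraction of positions with depth < min_depth (min_depth=1: not represented)."""
    return sum(1 for d in depth if d < min_depth) / len(depth)


if __name__ == "__main__":
    print(f"{expected_coverage(250_000, 75, 4_411_532):.1f}x")
    # Toy 500 bp sequence with made-up depths: one AT-rich and one GC-rich window.
    toy_seq = "AT" * 125 + "GC" * 125
    toy_depth = [8] * 250 + [3] * 250
    print(windowed_gc_depth(toy_seq, toy_depth))
    print(fraction_below(toy_depth, 5))
```

Calling `fraction_below` with `min_depth=1` and `min_depth=5` gives the two quantities plotted in the genome-representation figure (not represented at all, and not represented at ≥5x).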
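
Two pieces of arithmetic in the sample-preparation results are easy to check: the targeted insert size is the collected fragment range minus the 121 bp of TruSeq adaptor sequence, and the fold differences in yield follow from the qPCR concentrations. The pairing of concentrations with methods below (21 767 pM vs. 885 pM before size selection; 1 546 pM vs. 371 pM after) is an assumption, chosen because it matches the ~25-fold and ~4-fold differences stated in the text. The function names are hypothetical.

```python
TRUSEQ_ADAPTOR_BP = 121  # total adaptor length stated on the poster


def insert_size_range(fragment_min_bp, fragment_max_bp, adaptor_bp=TRUSEQ_ADAPTOR_BP):
    """Insert sizes implied by a size-selection window that includes adaptors."""
    return fragment_min_bp - adaptor_bp, fragment_max_bp - adaptor_bp


def fold_difference(numerator_pM, denominator_pM):
    """Ratio of two library concentrations measured in the same units."""
    return numerator_pM / denominator_pM


if __name__ == "__main__":
    print(insert_size_range(370, 450))  # Pippin Prep collection window
    print(f"KAPA vs. TruSeq library prep: {fold_difference(21_767, 885):.1f}-fold")
    print(f"Manual gel vs. Pippin Prep: {fold_difference(1_546, 371):.1f}-fold")
```

The 370–450 bp window gives inserts of 249–329 bp, matching the ~250–330 bp quoted for the size distribution figure.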
© Copyright 2024