demonstrated - Pacific Biosciences

Multiplexing Human HLA Class I & II Genotyping with DNA Barcode
Adapters for High Throughput Research
Swati Ranade1, Walter Lee1, John Harting1, Phil Jiao1, Tyson Bowen1, Kevin Eng1, Lance Hepler1, Brett Bowman1, Nienke Westerink2
1Pacific Biosciences of California, Inc., Menlo Park, United States of America
2GenDx, Utrecht, Netherlands
Multiplexing Strategy
Strategy
Multiplexing
Read Length Distribution for 96X5 HLA Loci
(3hr Movies)
A
A
Workflow for Multiplex Sequencing
using Barcoded Adapter
One sample, one barcode for all HLA genes
P6/C4
Chemistry
Polymerase Read Length in Base Pairs
B
NGS-go® AmpX Primers from GenDx
C
D
HLA Locus
A] Average polymerase read length distribution across all amplicons in all samples
B] Consensus sequence length for each HLA gene in concordance with expected full-length
amplicon size
C] Unbalanced pool (without molar ratio considerations) created by equal volume pooling of HLA
amplicons (HLA- A, -B, -C, -DQB1, -DRB1) across all samples
D] Balanced pooling by global adjustment of volume rations in comparison to HLA-A. Ratios were
derived from empirical observations in sequencing data of unbalanced pools.
C
96 DNA Samples
1-96
1-96
+
+
GenDx Primers GenDx Primers
for HLA-B
for HLA-A
Amplify QC and Pool
amplicons for each
sample individually
1-96
1-96
1-96
+
GenDx Primers
for HLA-C
+
GenDx Primers
for HLA-DRB1
+
GenDx Primers
for HLA-DQB1
*Individual amplification efficiencies on a per sample basis were not considered.
Combined Step
End Repair &
Barcoded
adapter ligation
Hands on: 15 min
Incubation: 50 min
Pool all samples
Hands on: 10 min
1-96
1-96
Amplicons from the 5 loci for each individual
+
16 bp barcoded adapters to uniquely
identity all loci of an individual
Ampure Purif.
Damage Repair
Pool barcoded SMRTbell
templates & finish library for
sequencing
DNA
Barcode
Tagging
Hands on: 25 min
Hands on: 15 min
Incubation: 20 min
Exo III/VII Spike
Hands on: 5 min.
Incubation: 1 hr
2X Ampure Purif.
Hands on: 50 min
+
Pol/DNA/Nucleotide complex
~ 4 hours; 2 hours hands-on
Missed
Adjusted
Data
High-quality
Missed calls due to rate for
Alleles
calls
generated
detection
alleles
Total
severe
misdiscordant
via
Locus alleles
of all
found with
allelic
to
automated identified
expected
alleles
imbalance
extra
as chimera orthogonal
LAA
coverage
from
in PCR
analysis
typing
(4 cells)
products
A
177
172
1
4
0
0
100.0
B
180
177
0
3
0
0
100.0
C
175
167
0
4
4
0
97.7
DRB1
171
159
0
9
1
2
98.2
DQB1
178
170
0
2
4
2
96.6
Total
881
845
1
22
9
4
98.5
B.
IHIWP11
(Sample from UCLA reference panel)
Figure 6. Example of discordant
call for one allele from pre-typed
reference sample IHIWP11
The PacBio call 15:01:01:03 is supported
by consensus sequence derived from
high-quality reads as well as high
coverage. The second allele was
concordant and called as DRB1*14:07:01
Automated SMRT® Analysis
Separate
by Barcode
Figure 1. Allele Sequence Coverage Comparison
Conclusion
Filter
Subreads
Overlap
Cluster
A)HLA-A amplified using NGS-go® Reagent & Sequenced on the PacBio RS II
B)Traditional Sanger sequencing (SBT) of exons #2 and #3.
42
1210
1591
168
36
3
Exon1
Exon2
Exon3
Exon4
Exon5
Exon6
4237
(Class I genes: HLA-A, -B, -C
Class II genes: HLA -DRB1,
& -DQB1)
Comparison of PacBio allele
calls generated using internal
tools to pre-typing data from
orthogonal method (Data
based on unbalanced equal
volume pools across
amplicons and samples)
5’ UTR to 3’ UTR 3.2kb
Traditional SBT
1836
Table 1. 96 Samples
interrogated at 5 loci
SBT based Pre-Type: DRB1*15:01:01:01;
PacBio call: DRB1*15:01:01:03, due to discordant base in intron 2
A] Per Patient Barcoding Scheme
B] Amplicons generated by GenDx HLA kits & symmetric SMRTbell™ adapters with
barcodes, used for sample wise 16 bp barcode tagging. Barcodes are attached to the
adapters for use with off-the-shelf assays
C] Barcoded adapters and template preparation set up for multiplexed sample
sequencing
A. Full Length + SMRT Sequencing
HLA Locus
Figure 5. Qualitative analysis of SMRT® Sequencing data for HLA amplicons
across all samples
Figure 3. Multiplexing Strategy and Workflow
HLA Sequencing on PacBio® RS II
B
Reads
Human MHC class I genes HLA-A, -B, -C, and class II genes
HLA-DR, -DP and -DQ, play a critical role in the immune system
as major factors responsible for organ transplant rejection. The
have a direct or linkage-based association with several diseases,
including cancer and autoimmune diseases, and are important
targets for clinical and drug sensitivity research. HLA genes are
also highly polymorphic and their diversity originates from exonic
combinations as well as recombination events. A large number of
new alleles are expected to be encountered if these genes are
sequenced through the UTRs. Thus allele-level resolution is
strongly preferred when sequencing HLA genes. Pacific
Biosciences has developed a method to sequence the HLA
genes in their entirety within the span of a single read taking
advantage of long read lengths (average >10 kb) facilitated by
SMRT® technology. A highly accurate consensus sequence
(≥99.999 or QV50 demonstrated) is generated for each allele in
a de novo fashion by our SMRT Analysis software. In the present
work, we have combined this imputation-free, fully phased,
allele-specific consensus sequence generation workflow and a
newly developed DNA-barcode-tagged SMRTbell™ sample
preparation approach to multiplex 96 individual samples for
sequencing all of the HLA class I and II genes. Commercially
available NGS-go® reagents for full-length HLA Class I and
relevant exons of class II genes were amplified for hi-resolution
HLA sequencing. The 96 samples included 72 that are part of
UCLA reference panel and had pre-typing information available
for 2 fields, based on gold standard SBT methods. SMRTbell™
adapters with 16 bp barcode tags were ligated to long amplicons
in symmetric pairing. PacBio sequencing was highly effective in
generating accurate, phased sequences of full-length alleles of
HLA genes. In this work we demonstrate scalability of HLA
sequencing using off the shelf assays for research applications
to find biological significance in full-length sequencing.
Results
Base Pairs
Abstract
1275
819
62
Phase
Filter
Consensus
Sequences
Quiver
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGAGA
||||||||||||||||||||||||||||||||*||||||||||||||||||||||||||||||
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGA
Quiver
CTCTCTCTGTCTCTCTCT--CACACACACACACACACACACACACACACACACACA
||||||||||||||||||**|||||||||||||||||||||||||||||||||||||||||||
CTCTCTCTGTCTCTCTCTCACACACACACACACACACACACACACACACACACAC
Figure 2. HLA-B Diversity due to Exon Combinations
A) Numbers above Exons denote unique coding sequences (CDS) or variants of exons,
B) Numbers between Exons denote the number of unique combinations with neighboring exons
Recombination events increase diversity in HLA genes, thus
Exon-only sequencing is insufficient for resolving variation in new
alleles. Additional variation is sometimes caused by mutations
occurring in exons other than exon 2 & 3 CDS or sequences
outside of the CDS. Thus, fully phased, allele-level genotyping
with phasing across all exons and introns for accurate SNP
determination in a single read span is highly advantageous for
determining true heterozygosis between alleles.
Report Consensus
Sequences for all
HLA alleles
Iterate
• High quality de novo consensus sequences (QV >50) were
generated using reads >3 kb in automated LAA pipeline
• 868 of the 881 expected HLA alleles were detected in LAA output
• 13 alleles missed by automated analysis came from the 14
samples with <1500 subreads per sample
• 9 of the 13 alleles could be retrieved with manual intervention or
additional SMRT Cell sequencing.
• Balanced pooling volume ratios improved subread coverage for all
loci on a global level (Figure 5D)
• The low coverage samples (<800 reads) continued to stay
undetectable in automated analysis even after balanced pooling
• Low coverage samples due to poor PCR amplification need
systematic pooling using QC data
• 22 discordant calls will be evaluated in future for validation of
discordant sequences.
Figure 4. Diagram of Long Amplicon Analysis (LAA)
Subreads for 96 samples were grouped by barcode identity. Each group was independently
processed and subreads were filtered based on user-definable criteria for read quality and
length. All filtered subreads in a group are aligned and clustered based on the sequence
similarity and iteratively “phased” by identifying and separating subreads according to highscoring mutations. Each resulting sub-cluster from this de novo pipeline is polished with
Quiver to generate high-quality consensus sequences which are filtered to remove artifacts.
References :
[1] Robinson, James, et al. "the IMGT/HLA database." Nucleic acids research41.D1 (2013): D1222-D1227.
[2] Chin, Chen-Shan, et al. "Nonhybrid, finished microbial genome assemblies from long-read SMRT
sequencing data." Nature methods 10.6 (2013): 563-569.
[3] https://github.com/bnbowman/HlaTools
For Research Use Only. Not for use in diagnostic procedures.
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2015 Pacific Biosciences of California, Inc. All rights reserved.