Multiplexing Human HLA Class I & II Genotyping with DNA Barcode Adapters for High Throughput Research Swati Ranade1, Walter Lee1, John Harting1, Phil Jiao1, Tyson Bowen1, Kevin Eng1, Lance Hepler1, Brett Bowman1, Nienke Westerink2 1Pacific Biosciences of California, Inc., Menlo Park, United States of America 2GenDx, Utrecht, Netherlands Multiplexing Strategy Strategy Multiplexing Read Length Distribution for 96X5 HLA Loci (3hr Movies) A A Workflow for Multiplex Sequencing using Barcoded Adapter One sample, one barcode for all HLA genes P6/C4 Chemistry Polymerase Read Length in Base Pairs B NGS-go® AmpX Primers from GenDx C D HLA Locus A] Average polymerase read length distribution across all amplicons in all samples B] Consensus sequence length for each HLA gene in concordance with expected full-length amplicon size C] Unbalanced pool (without molar ratio considerations) created by equal volume pooling of HLA amplicons (HLA- A, -B, -C, -DQB1, -DRB1) across all samples D] Balanced pooling by global adjustment of volume rations in comparison to HLA-A. Ratios were derived from empirical observations in sequencing data of unbalanced pools. C 96 DNA Samples 1-96 1-96 + + GenDx Primers GenDx Primers for HLA-B for HLA-A Amplify QC and Pool amplicons for each sample individually 1-96 1-96 1-96 + GenDx Primers for HLA-C + GenDx Primers for HLA-DRB1 + GenDx Primers for HLA-DQB1 *Individual amplification efficiencies on a per sample basis were not considered. Combined Step End Repair & Barcoded adapter ligation Hands on: 15 min Incubation: 50 min Pool all samples Hands on: 10 min 1-96 1-96 Amplicons from the 5 loci for each individual + 16 bp barcoded adapters to uniquely identity all loci of an individual Ampure Purif. Damage Repair Pool barcoded SMRTbell templates & finish library for sequencing DNA Barcode Tagging Hands on: 25 min Hands on: 15 min Incubation: 20 min Exo III/VII Spike Hands on: 5 min. Incubation: 1 hr 2X Ampure Purif. Hands on: 50 min + Pol/DNA/Nucleotide complex ~ 4 hours; 2 hours hands-on Missed Adjusted Data High-quality Missed calls due to rate for Alleles calls generated detection alleles Total severe misdiscordant via Locus alleles of all found with allelic to automated identified expected alleles imbalance extra as chimera orthogonal LAA coverage from in PCR analysis typing (4 cells) products A 177 172 1 4 0 0 100.0 B 180 177 0 3 0 0 100.0 C 175 167 0 4 4 0 97.7 DRB1 171 159 0 9 1 2 98.2 DQB1 178 170 0 2 4 2 96.6 Total 881 845 1 22 9 4 98.5 B. IHIWP11 (Sample from UCLA reference panel) Figure 6. Example of discordant call for one allele from pre-typed reference sample IHIWP11 The PacBio call 15:01:01:03 is supported by consensus sequence derived from high-quality reads as well as high coverage. The second allele was concordant and called as DRB1*14:07:01 Automated SMRT® Analysis Separate by Barcode Figure 1. Allele Sequence Coverage Comparison Conclusion Filter Subreads Overlap Cluster A)HLA-A amplified using NGS-go® Reagent & Sequenced on the PacBio RS II B)Traditional Sanger sequencing (SBT) of exons #2 and #3. 42 1210 1591 168 36 3 Exon1 Exon2 Exon3 Exon4 Exon5 Exon6 4237 (Class I genes: HLA-A, -B, -C Class II genes: HLA -DRB1, & -DQB1) Comparison of PacBio allele calls generated using internal tools to pre-typing data from orthogonal method (Data based on unbalanced equal volume pools across amplicons and samples) 5’ UTR to 3’ UTR 3.2kb Traditional SBT 1836 Table 1. 96 Samples interrogated at 5 loci SBT based Pre-Type: DRB1*15:01:01:01; PacBio call: DRB1*15:01:01:03, due to discordant base in intron 2 A] Per Patient Barcoding Scheme B] Amplicons generated by GenDx HLA kits & symmetric SMRTbell™ adapters with barcodes, used for sample wise 16 bp barcode tagging. Barcodes are attached to the adapters for use with off-the-shelf assays C] Barcoded adapters and template preparation set up for multiplexed sample sequencing A. Full Length + SMRT Sequencing HLA Locus Figure 5. Qualitative analysis of SMRT® Sequencing data for HLA amplicons across all samples Figure 3. Multiplexing Strategy and Workflow HLA Sequencing on PacBio® RS II B Reads Human MHC class I genes HLA-A, -B, -C, and class II genes HLA-DR, -DP and -DQ, play a critical role in the immune system as major factors responsible for organ transplant rejection. The have a direct or linkage-based association with several diseases, including cancer and autoimmune diseases, and are important targets for clinical and drug sensitivity research. HLA genes are also highly polymorphic and their diversity originates from exonic combinations as well as recombination events. A large number of new alleles are expected to be encountered if these genes are sequenced through the UTRs. Thus allele-level resolution is strongly preferred when sequencing HLA genes. Pacific Biosciences has developed a method to sequence the HLA genes in their entirety within the span of a single read taking advantage of long read lengths (average >10 kb) facilitated by SMRT® technology. A highly accurate consensus sequence (≥99.999 or QV50 demonstrated) is generated for each allele in a de novo fashion by our SMRT Analysis software. In the present work, we have combined this imputation-free, fully phased, allele-specific consensus sequence generation workflow and a newly developed DNA-barcode-tagged SMRTbell™ sample preparation approach to multiplex 96 individual samples for sequencing all of the HLA class I and II genes. Commercially available NGS-go® reagents for full-length HLA Class I and relevant exons of class II genes were amplified for hi-resolution HLA sequencing. The 96 samples included 72 that are part of UCLA reference panel and had pre-typing information available for 2 fields, based on gold standard SBT methods. SMRTbell™ adapters with 16 bp barcode tags were ligated to long amplicons in symmetric pairing. PacBio sequencing was highly effective in generating accurate, phased sequences of full-length alleles of HLA genes. In this work we demonstrate scalability of HLA sequencing using off the shelf assays for research applications to find biological significance in full-length sequencing. Results Base Pairs Abstract 1275 819 62 Phase Filter Consensus Sequences Quiver TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGAGA ||||||||||||||||||||||||||||||||*|||||||||||||||||||||||||||||| TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGAGAGAGAGAGAGAGAGA Quiver CTCTCTCTGTCTCTCTCT--CACACACACACACACACACACACACACACACACACA ||||||||||||||||||**||||||||||||||||||||||||||||||||||||||||||| CTCTCTCTGTCTCTCTCTCACACACACACACACACACACACACACACACACACAC Figure 2. HLA-B Diversity due to Exon Combinations A) Numbers above Exons denote unique coding sequences (CDS) or variants of exons, B) Numbers between Exons denote the number of unique combinations with neighboring exons Recombination events increase diversity in HLA genes, thus Exon-only sequencing is insufficient for resolving variation in new alleles. Additional variation is sometimes caused by mutations occurring in exons other than exon 2 & 3 CDS or sequences outside of the CDS. Thus, fully phased, allele-level genotyping with phasing across all exons and introns for accurate SNP determination in a single read span is highly advantageous for determining true heterozygosis between alleles. Report Consensus Sequences for all HLA alleles Iterate • High quality de novo consensus sequences (QV >50) were generated using reads >3 kb in automated LAA pipeline • 868 of the 881 expected HLA alleles were detected in LAA output • 13 alleles missed by automated analysis came from the 14 samples with <1500 subreads per sample • 9 of the 13 alleles could be retrieved with manual intervention or additional SMRT Cell sequencing. • Balanced pooling volume ratios improved subread coverage for all loci on a global level (Figure 5D) • The low coverage samples (<800 reads) continued to stay undetectable in automated analysis even after balanced pooling • Low coverage samples due to poor PCR amplification need systematic pooling using QC data • 22 discordant calls will be evaluated in future for validation of discordant sequences. Figure 4. Diagram of Long Amplicon Analysis (LAA) Subreads for 96 samples were grouped by barcode identity. Each group was independently processed and subreads were filtered based on user-definable criteria for read quality and length. All filtered subreads in a group are aligned and clustered based on the sequence similarity and iteratively “phased” by identifying and separating subreads according to highscoring mutations. Each resulting sub-cluster from this de novo pipeline is polished with Quiver to generate high-quality consensus sequences which are filtered to remove artifacts. References : [1] Robinson, James, et al. "the IMGT/HLA database." Nucleic acids research41.D1 (2013): D1222-D1227. [2] Chin, Chen-Shan, et al. "Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data." Nature methods 10.6 (2013): 563-569. [3] https://github.com/bnbowman/HlaTools For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2015 Pacific Biosciences of California, Inc. All rights reserved.
© Copyright 2024