Rapid pan-cancer identification of previously unidentified fusion genes to enable novel targeted therapeutics
Pan-cancer identification of fusion genes

Henrik Edgren¹, Kalle Ojala¹, Anja Ruusulehto¹, Carolyn Buser-Doepner², Gopi Ganji²
¹ MediSapiens Ltd, Helsinki, Finland
² Oncology R&D, GlaxoSmithKline, Collegeville, PA, USA

For additional information: Henrik Edgren, CSO, email: [email protected], Tel: +358 50 353 9219

AIM OF THE STUDY

In this study, we developed a highly specific and sensitive fusion gene detection pipeline, FusionSCOUT, building on previous work (Edgren et al., Genome Biol. 2011;12(1):R6). We then performed a comprehensive analysis of fusion genes across 7625 TCGA tumors from 28 cancer types. Here, we report the prevalence of selected fusion genes of high interest for the development of targeted therapies, with a focus on the NTRK receptor tyrosine kinase family.

MATERIALS AND METHODS

The fusion gene identification pipeline is described in Figure 1. For all analyses, transcript and exon definitions from Ensembl database version 75 were used. A total of 7625 TCGA paired-end RNA-seq cancer samples in 28 cancer types, as well as 669 samples from normal tissues, were analyzed (Table 1).

RESULTS

Our pipeline was validated against the reported prevalence of known fusion genes in the respective cancer types (e.g. ETS-family gene fusions in prostate cancer and EML4-ALK in lung cancer). We identified several novel fusion partners for known fused oncogenes. Furthermore, both newly identified and previously known fusion events were found in tumor types where they had not been reported before, expanding the fusion landscape of well-known genes.

As an example, we focused on NTRK gene family fusions across the 7625 TCGA cancer samples. In this set, we identified 26 NTRK fusions in 24 samples, with 24/26 having the NTRK gene as the 3’ fusion partner (Table 2).
The two exceptions are cases in which the reciprocal fusion is also expressed: RBPMS-NTRK3 with NTRK3-RBPMS, and AFAP1-NTRK2 with NTRK2-AFAP1. By far the most common 5’ fusion partner is ETV6, which accounts for 10/24 cases. Among the remaining cases, only SQSTM1 is recurrent, observed twice. Fusions involving NTRK3 are the most common (13/24), followed by NTRK1 (7/24) and NTRK2 (4/24). All but one of the fusions with NTRK as the 3’ partner gene retained the full kinase domain, with 1/24 cases retaining a partial kinase domain (Figures 2 and 3).

A small number of known or suspected oncogenic fusion genes were identified in TCGA normal tissue samples. In some cases, the same gene fusion was found in the matching cancer sample, suggesting potential cross-contamination between tissue samples. In other cases, including two cases of TMPRSS2-ERG in prostate cancer, the fusion gene was observed only in the normal tissue sample.

[Figure 1 step labels: paired-end reads (fastq); align reads to Ensembl transcriptome; identify candidate fusion genes; filter by genome alignments, sequence homology, …; find fusion junction (5’ junction exon, 3’ junction exon); filter by junction read pair alignment, read orientations, …; predict fusion mRNA and reading frame; predict protein domains (Pfam)]
Figure 1: Schematic overview of the MediSapiens’ FusionSCOUT™ pipeline.

Table 1: The number of samples analyzed for each TCGA cancer type, as well as normal tissue samples

Samples  Cancer type
    153  Acute Myeloid Leukemia (LAML)
     79  Adrenocortical carcinoma (ACC)
    273  Bladder Urothelial Carcinoma (BLCA)
    467  Brain Lower Grade Glioma (LGG)
   1029  Breast Invasive Carcinoma (BRCA)
    195  Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC)
    287  Colon Adenocarcinoma (COAD)
    170  Glioblastoma multiforme (GBM)
    412  Head and Neck Squamous Cell Carcinoma (HNSC)
     66  Kidney Chromophobe (KICH)
    523  Kidney Renal Clear Cell Carcinoma (KIRC)
    226  Kidney Renal Papillary Cell Carcinoma (KIRP)
    198  Liver Hepatocellular Carcinoma (LIHC)
    456  Lung Adenocarcinoma (LUAD)
    482  Lung Squamous Cell Carcinoma (LUSC)
     28  Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC)
     36  Mesothelioma (MESO)
    420  Ovarian Serous Cystadenocarcinoma (OV)
     84  Pancreatic Adenocarcinoma (PAAD)
    184  Pheochromocytoma and Paraganglioma (PCPG)
    336  Prostate Adenocarcinoma (PRAD)
     85  Rectum Adenocarcinoma (READ)
    161  Sarcoma (SARC)
    355  Skin Cutaneous Melanoma (SKCM)
    190  Stomach Adenocarcinoma (STAD)
    506  Thyroid Carcinoma (THCA)
     57  Uterine Carcinosarcoma (UCS)
    167  Uterine Corpus Endometrial Carcinoma (UCEC)
    669  Normal tissues

Table 2: Gene fusions involving NTRK1, NTRK2 and NTRK3 identified in 24/7625 TCGA cancer samples.

Sample id        Cancer type  5’ gene  3’ gene  In frame  Kinase domain retained
TCGA-06-5411-01  GBM          NFASC    NTRK1    yes       yes
TCGA-19-2619-01  GBM          BCAN     NTRK1    yes       yes
TCGA-DJ-A4V0-01  THCA         ETV6     NTRK3    yes       yes
TCGA-DJ-A4UP-01  THCA         IRF2BP2  NTRK1    yes       yes
TCGA-E8-A438-01  THCA         ETV6     NTRK3    yes       yes
TCGA-ET-A40S-01  THCA         SQSTM1   NTRK1    yes       yes
TCGA-EL-A3ZN-01  THCA         ETV6     NTRK3    yes       yes
TCGA-DJ-A3UV-01  THCA         ETV6     NTRK3    yes       yes
TCGA-FE-A3PD-01  THCA         ETV6     NTRK3    yes       yes
TCGA-CE-A3MD-01  THCA         SSBP2    NTRK1    yes       yes
TCGA-EM-A3AO-01  THCA         TFG      NTRK1    yes       yes
TCGA-ET-A39L-01  THCA         RBPMS    NTRK3    yes       yes
TCGA-ET-A39L-01  THCA         NTRK3    RBPMS    yes       no
TCGA-CE-A27D-01  THCA         ETV6     NTRK3    yes       yes
TCGA-HT-7680-01  LGG          AFAP1    NTRK2    yes       yes
TCGA-HT-7680-01  LGG          NTRK2    AFAP1    yes       no
TCGA-DU-A76L-01  LGG          SQSTM1   NTRK2    yes       yes
TCGA-CN-6997-01  HNSC         LYN      NTRK3    yes       yes
TCGA-BB-4223-01  HNSC         PAN3     NTRK2    yes       yes
TCGA-CK-5916-01  COAD         ETV6     NTRK3    yes       yes
TCGA-CK-5913-01  COAD         ETV6     NTRK3    yes       yes
TCGA-A6-2674-01  COAD         VPS18    NTRK3    yes       partially
TCGA-AO-A03U-01  BRCA         ETV6     NTRK3    yes       yes
TCGA-55-8091-01  LUAD         TRIM24   NTRK2    yes       yes
TCGA-EB-A51B-01  SKCM         ETV6     NTRK3    yes       yes
TCGA-HZ-7918-01  PAAD         CEL      NTRK1    yes       yes

[Figure 2, panel B domain labels: Pointed_dom (IPR003118), Ser-Thr/Tyr_kinase_cat_dom (IPR001245); x-axis: amino acids]
Figure 2: Structure of the ETV6-NTRK3 fusion gene in TCGA Colon adenocarcinoma sample TCGA-CK-5913-01. Panel A shows RNA-seq coverage for ETV6 (left) and NTRK3 (right). The exons representing the fusion junction are highlighted. Transcript variants from the Ensembl database and genomic positions are shown below. Panel B shows the predicted fusion protein and Pfam functional domains.

[Figure 3, panel B domain labels: CarbesteraseB (IPR002018), AB_hydrolase_3 (IPR013094), Ser-Thr/Tyr_kinase_cat_dom (IPR001245); x-axis: amino acids]
Figure 3: Structure of the CEL-NTRK1 fusion gene in TCGA Pancreatic adenocarcinoma sample TCGA-HZ-7918-01. Panel A shows RNA-seq coverage for CEL (left) and NTRK1 (right). The exons representing the fusion junction are highlighted.
Transcript variants from the Ensembl database and genomic positions are shown below. Panel B shows the predicted fusion protein and Pfam functional domains.

CONCLUSIONS

In summary, we have implemented and validated an efficient fusion gene detection pipeline and applied it to the largest set of cancer samples reported to date. We found at least one fusion gene in 3930/7625 cancer samples, involving 9257 different genes. NTRK1, NTRK2 and NTRK3 fusions are found in 9/28 TCGA cancer types, with multiple occurrences in thyroid (2.2%), glioblastoma (1.2%), lower grade glioma (0.4%), head and neck (0.5%) and colorectal (1%) cancers. Screening tumors for these alterations may identify patients who could benefit from treatment with selective NTRK inhibitors. The same holds for other clinically significant oncogenes such as ALK. In all fusion-positive cases, an NTRK gene is the in-frame 3’ partner with full or partial retention of the catalytic kinase domain. This suggests that the NTRK rearrangements are likely driver events in cancer.

Acknowledgements

Aurexel Ltd. (www.aurexel.com) is thanked for editorial assistance in the preparation of this poster.

www.medisapiens.com
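
The prevalence percentages quoted in the conclusions can be checked against the counts in Tables 1 and 2. The following is a minimal sketch of that arithmetic, not part of the FusionSCOUT pipeline. It rests on several assumptions: the pairing of sample ids with cancer types is reconstructed from Table 2, the names `TABLE2_ROWS`, `SAMPLES_ANALYZED` and `recurrent_prevalence` are hypothetical, a sample with a reciprocal fusion pair is counted once, and the colorectal figure is taken from colon adenocarcinoma (COAD) alone.

```python
from collections import Counter

# Table 2 rows as (sample id, cancer type); pairing reconstructed from the
# poster (an assumption). TCGA-ET-A39L-01 and TCGA-HT-7680-01 appear twice
# because each carries a reciprocal fusion pair.
TABLE2_ROWS = [
    ("TCGA-06-5411-01", "GBM"), ("TCGA-19-2619-01", "GBM"),
    ("TCGA-DJ-A4V0-01", "THCA"), ("TCGA-DJ-A4UP-01", "THCA"),
    ("TCGA-E8-A438-01", "THCA"), ("TCGA-ET-A40S-01", "THCA"),
    ("TCGA-EL-A3ZN-01", "THCA"), ("TCGA-DJ-A3UV-01", "THCA"),
    ("TCGA-FE-A3PD-01", "THCA"), ("TCGA-CE-A3MD-01", "THCA"),
    ("TCGA-EM-A3AO-01", "THCA"), ("TCGA-ET-A39L-01", "THCA"),
    ("TCGA-ET-A39L-01", "THCA"), ("TCGA-CE-A27D-01", "THCA"),
    ("TCGA-HT-7680-01", "LGG"), ("TCGA-HT-7680-01", "LGG"),
    ("TCGA-DU-A76L-01", "LGG"),
    ("TCGA-CN-6997-01", "HNSC"), ("TCGA-BB-4223-01", "HNSC"),
    ("TCGA-CK-5916-01", "COAD"), ("TCGA-CK-5913-01", "COAD"),
    ("TCGA-A6-2674-01", "COAD"),
    ("TCGA-AO-A03U-01", "BRCA"), ("TCGA-55-8091-01", "LUAD"),
    ("TCGA-EB-A51B-01", "SKCM"), ("TCGA-HZ-7918-01", "PAAD"),
]

# Samples analyzed per cancer type (Table 1), limited to the types that
# have more than one NTRK fusion-positive sample.
SAMPLES_ANALYZED = {"THCA": 506, "GBM": 170, "LGG": 467, "HNSC": 412, "COAD": 287}


def recurrent_prevalence(rows, totals, min_samples=2):
    """Percent of fusion-positive samples per cancer type, one decimal place.

    Duplicate (sample, type) rows are collapsed so each sample counts once;
    types with fewer than min_samples positive samples are skipped.
    """
    positives = Counter(ctype for _, ctype in set(rows))
    return {
        ctype: round(100.0 * n / totals[ctype], 1)
        for ctype, n in positives.items()
        if n >= min_samples
    }


if __name__ == "__main__":
    for ctype, pct in sorted(recurrent_prevalence(TABLE2_ROWS, SAMPLES_ANALYZED).items()):
        print(f"{ctype}: {pct}%")
```

Under these assumptions the thyroid figure is 11/506, not 12/506, because the RBPMS-NTRK3 and NTRK3-RBPMS rows come from the same sample. If rectum adenocarcinoma (READ, 85 samples) were pooled with COAD, the colorectal figure would be 3/372 instead of 3/287.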