Poster_AACR15_PanCancer Fusion Gene

Rapid pan-cancer identification of previously unidentified fusion genes to enable
novel targeted therapeutics; Pan-cancer identification of fusion genes
Henrik Edgren1, Kalle Ojala1, Anja Ruusulehto1, Carolyn Buser-Doepner2, Gopi Ganji2
1MediSapiens Ltd, Helsinki, Finland
2Oncology R&D, GlaxoSmithKline, Collegeville, PA, USA
For additional information:
Henrik Edgren, CSO
email: [email protected]
Tel: +358 50 353 9219
AIM OF THE STUDY
In this study, we developed a highly specific and sensitive fusion gene detection
pipeline, FusionSCOUT, building on previous work (Edgren et al., Genome Biol. 2011;12(1):R6).
We then performed a comprehensive analysis of fusion genes across 7625 TCGA tumors
from 28 cancer types. Here, we report the prevalence of selected fusion genes of
high interest for the development of targeted therapies, with a focus on the NTRK receptor
tyrosine kinase family.

MATERIALS AND METHODS
The fusion gene identification pipeline is described in Figure 1. For all analyses, transcript and exon
definitions from Ensembl database version 75 were used. A total of 7625 TCGA paired-end RNA-seq
cancer samples in 28 cancer types, as well as 669 samples from normal tissues, were analyzed
(Table 1).

Table 1: The number of samples analyzed for each TCGA
cancer type, as well as normal tissue samples

Cancer type                                                              Number of samples analyzed
Acute Myeloid Leukemia (LAML)                                            153
Adrenocortical carcinoma (ACC)                                           79
Bladder Urothelial Carcinoma (BLCA)                                      273
Brain Lower Grade Glioma (LGG)                                           467
Breast Invasive Carcinoma (BRCA)                                         1029
Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (CESC)  195
Colon Adenocarcinoma (COAD)                                              287
Glioblastoma multiforme (GBM)                                            170
Head and Neck Squamous Cell Carcinoma (HNSC)                             412
Kidney Chromophobe (KICH)                                                66
Kidney Renal Clear Cell Carcinoma (KIRC)                                 523
Kidney Renal Papillary Cell Carcinoma (KIRP)                             226
Liver Hepatocellular Carcinoma (LIHC)                                    198
Lung Adenocarcinoma (LUAD)                                               456
Lung Squamous Cell Carcinoma (LUSC)                                      482
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC)                   28
Mesothelioma (MESO)                                                      36
Ovarian Serous Cystadenocarcinoma (OV)                                   420
Pancreatic Adenocarcinoma (PAAD)                                         84
Pheochromocytoma and Paraganglioma (PCPG)                                184
Prostate Adenocarcinoma (PRAD)                                           336
Rectum Adenocarcinoma (READ)                                             85
Sarcoma (SARC)                                                           161
Skin Cutaneous Melanoma (SKCM)                                           355
Stomach Adenocarcinoma (STAD)                                            190
Thyroid Carcinoma (THCA)                                                 506
Uterine Carcinosarcoma (UCS)                                             57
Uterine Corpus Endometrial Carcinoma (UCEC)                              167
Normal tissues                                                           669

Figure 1: Schematic overview of the MediSapiens’ FusionSCOUT™ pipeline. Steps shown:
paired-end reads (fastq); align reads to Ensembl transcriptome; identify candidate fusion genes;
filter by genome alignments, sequence homology, …; find fusion junction (5’ junction exon,
3’ junction exon); filter by junction read pair alignment, read orientations, …; predict fusion
mRNA and reading frame; predict protein domains (Pfam).

RESULTS
Our pipeline was validated against the reported prevalence of known fusion genes in the
respective cancer types (e.g. ETS-family gene fusions in prostate cancer and
EML4-ALK in lung cancer).
We identified several novel fusion partners for known fused oncogenes.
Furthermore, both newly identified and previously known fusion events were discovered
in novel tumor types, thus expanding the fusion landscape of well-known genes.
As an example, we focused on NTRK gene family fusions
across 7625 TCGA cancer samples. In this set, we identified
26 NTRK fusions in 24 samples, with 24/26 having the
NTRK gene as 3’ fusion partner (Table 2).
The two exceptions are cases in which the reciprocal fusion
is also expressed: RBPMS-NTRK3 and NTRK3-RBPMS, as well
as AFAP1-NTRK2 and NTRK2-AFAP1.
By far the most common 5’ fusion partner is ETV6, which
accounts for 10/24 cases. In the remaining cases, only
SQSTM1 is recurrent, observed twice.
Fusions involving NTRK3 are the most common (13/24),
followed by NTRK1 (7/24) and NTRK2 (4/24). All but one of
the fusions with NTRK as 3’ partner gene retained the full
kinase domain in the fusion gene, with 1/24 cases retaining
a partial kinase domain (Figures 2 and 3).
A small number of known or suspected oncogenic fusion
genes were identified in TCGA normal tissue samples. In
some cases, the same gene fusion was found in the
matching cancer sample, suggesting potential
cross-contamination between tissue samples. In other cases,
including two cases of TMPRSS2-ERG in prostate cancer,
the fusion gene was only observed in the normal tissue
sample.

Table 2: Gene fusions involving NTRK1, NTRK2 and NTRK3 identified in
24/7625 TCGA cancer samples.

Sample id        Cancer type  5' gene  3' gene  In frame  Kinase domain retained
TCGA-06-5411-01  GBM          NFASC    NTRK1    yes       yes
TCGA-19-2619-01  GBM          BCAN     NTRK1    yes       yes
TCGA-DJ-A4V0-01  THCA         ETV6     NTRK3    yes       yes
TCGA-DJ-A4UP-01  THCA         IRF2BP2  NTRK1    yes       yes
TCGA-E8-A438-01  THCA         ETV6     NTRK3    yes       yes
TCGA-ET-A40S-01  THCA         SQSTM1   NTRK1    yes       yes
TCGA-EL-A3ZN-01  THCA         ETV6     NTRK3    yes       yes
TCGA-DJ-A3UV-01  THCA         ETV6     NTRK3    yes       yes
TCGA-FE-A3PD-01  THCA         ETV6     NTRK3    yes       yes
TCGA-CE-A3MD-01  THCA         SSBP2    NTRK1    yes       yes
TCGA-EM-A3AO-01  THCA         TFG      NTRK1    yes       yes
TCGA-ET-A39L-01  THCA         RBPMS    NTRK3    yes       yes
TCGA-ET-A39L-01  THCA         NTRK3    RBPMS    yes       no
TCGA-CE-A27D-01  THCA         ETV6     NTRK3    yes       yes
TCGA-HT-7680-01  LGG          AFAP1    NTRK2    yes       yes
TCGA-HT-7680-01  LGG          NTRK2    AFAP1    yes       no
TCGA-DU-A76L-01  LGG          SQSTM1   NTRK2    yes       yes
TCGA-CN-6997-01  HNSC         LYN      NTRK3    yes       yes
TCGA-BB-4223-01  HNSC         PAN3     NTRK2    yes       yes
TCGA-CK-5916-01  COAD         ETV6     NTRK3    yes       yes
TCGA-CK-5913-01  COAD         ETV6     NTRK3    yes       yes
TCGA-A6-2674-01  COAD         VPS18    NTRK3    yes       partially
TCGA-AO-A03U-01  BRCA         ETV6     NTRK3    yes       yes
TCGA-55-8091-01  LUAD         TRIM24   NTRK2    yes       yes
TCGA-EB-A51B-01  SKCM         ETV6     NTRK3    yes       yes
TCGA-HZ-7918-01  PAAD         CEL      NTRK1    yes       yes

Figure 2: Structure of the ETV6-NTRK3 fusion gene in TCGA Colon adenocarcinoma sample
TCGA-CK-5913-01. Panel A shows RNA-seq coverage for ETV6 (left) and NTRK3 (right). The
exons representing the fusion junction are highlighted. Transcript variants from the Ensembl
database and genomic positions are shown below. Panel B shows the predicted fusion protein
and Pfam functional domains. Domains annotated in panel B: Pointed_dom (IPR003118),
…yr_kinase_cat_dom (IPR001245); x-axis: amino acids.

Figure 3: Structure of the CEL-NTRK1 fusion gene in TCGA Pancreatic adenocarcinoma
sample TCGA-HZ-7918-01. Panel A shows RNA-seq coverage for CEL (left) and NTRK1
(right). The exons representing the fusion junction are highlighted. Transcript variants from the
Ensembl database and genomic positions are shown below. Panel B shows the predicted
fusion protein and Pfam functional domains. Domains annotated in panel B: CarbesteraseB (IPR002018),
AB_hydrolase_3 (IPR013094), …yr_kinase_cat_dom (IPR001245); x-axis: amino acids.
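The counts quoted for Table 2 (26 fusions in 24 samples, the ETV6 and SQSTM1 recurrences, and the NTRK1/2/3 split) can be re-derived with a short tally. The following is a minimal sketch, not part of the FusionSCOUT pipeline: the rows are transcribed by hand from Table 2, and the variable names are illustrative assumptions.

```python
from collections import Counter

NTRK_GENES = {"NTRK1", "NTRK2", "NTRK3"}

# (sample id, cancer type, 5' gene, 3' gene, kinase domain retained),
# transcribed by hand from Table 2.
ROWS = [
    ("TCGA-06-5411-01", "GBM", "NFASC", "NTRK1", "yes"),
    ("TCGA-19-2619-01", "GBM", "BCAN", "NTRK1", "yes"),
    ("TCGA-DJ-A4V0-01", "THCA", "ETV6", "NTRK3", "yes"),
    ("TCGA-DJ-A4UP-01", "THCA", "IRF2BP2", "NTRK1", "yes"),
    ("TCGA-E8-A438-01", "THCA", "ETV6", "NTRK3", "yes"),
    ("TCGA-ET-A40S-01", "THCA", "SQSTM1", "NTRK1", "yes"),
    ("TCGA-EL-A3ZN-01", "THCA", "ETV6", "NTRK3", "yes"),
    ("TCGA-DJ-A3UV-01", "THCA", "ETV6", "NTRK3", "yes"),
    ("TCGA-FE-A3PD-01", "THCA", "ETV6", "NTRK3", "yes"),
    ("TCGA-CE-A3MD-01", "THCA", "SSBP2", "NTRK1", "yes"),
    ("TCGA-EM-A3AO-01", "THCA", "TFG", "NTRK1", "yes"),
    ("TCGA-ET-A39L-01", "THCA", "RBPMS", "NTRK3", "yes"),
    ("TCGA-ET-A39L-01", "THCA", "NTRK3", "RBPMS", "no"),
    ("TCGA-CE-A27D-01", "THCA", "ETV6", "NTRK3", "yes"),
    ("TCGA-HT-7680-01", "LGG", "AFAP1", "NTRK2", "yes"),
    ("TCGA-HT-7680-01", "LGG", "NTRK2", "AFAP1", "no"),
    ("TCGA-DU-A76L-01", "LGG", "SQSTM1", "NTRK2", "yes"),
    ("TCGA-CN-6997-01", "HNSC", "LYN", "NTRK3", "yes"),
    ("TCGA-BB-4223-01", "HNSC", "PAN3", "NTRK2", "yes"),
    ("TCGA-CK-5916-01", "COAD", "ETV6", "NTRK3", "yes"),
    ("TCGA-CK-5913-01", "COAD", "ETV6", "NTRK3", "yes"),
    ("TCGA-A6-2674-01", "COAD", "VPS18", "NTRK3", "partially"),
    ("TCGA-AO-A03U-01", "BRCA", "ETV6", "NTRK3", "yes"),
    ("TCGA-55-8091-01", "LUAD", "TRIM24", "NTRK2", "yes"),
    ("TCGA-EB-A51B-01", "SKCM", "ETV6", "NTRK3", "yes"),
    ("TCGA-HZ-7918-01", "PAAD", "CEL", "NTRK1", "yes"),
]

# Fusions with an NTRK gene as the 3' partner; the two reciprocal
# fusions (NTRK gene as 5' partner) are excluded from these tallies.
ntrk_3prime = [row for row in ROWS if row[3] in NTRK_GENES]

samples = {row[0] for row in ROWS}
partner_counts = Counter(row[2] for row in ntrk_3prime)
ntrk_counts = Counter(row[3] for row in ntrk_3prime)
retention_counts = Counter(row[4] for row in ntrk_3prime)
recurrent_partners = sorted(g for g, n in partner_counts.items() if n > 1)

print(len(ROWS), "fusions in", len(samples), "samples")
print("recurrent 5' partners:", recurrent_partners)
print("3' NTRK gene counts:", dict(ntrk_counts))
print("kinase domain retention:", dict(retention_counts))
```

Run as written, the tally agrees with the text: ETV6 is the 5’ partner in 10/24 cases, SQSTM1 in 2/24, and NTRK3, NTRK1 and NTRK2 are the 3’ partner in 13, 7 and 4 cases respectively.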
CONCLUSIONS
In summary, we have successfully executed a valid, functional
and efficient pipeline for the analysis of the largest set of
cancer samples reported to date. We found at least one fusion
gene in 3930/7625 cancer samples, involving 9257 different
genes.
NTRK1, 2 and 3 fusions are found in 9/28 TCGA cancer types
with multiple occurrences in thyroid (2.2%), glioblastoma
(1.2%), glioma (0.4%), head & neck (0.5%) and colorectal (1%)
cancers. Screening tumors for these alterations may identify
patients who could benefit from treatment with selective NTRK
inhibitors. This holds true for other clinically significant
oncogenes such as ALK.
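The percentages above follow from dividing the number of NTRK fusion-positive samples per cancer type (Table 2) by the number of samples analyzed (Table 1). A minimal sketch of that arithmetic is given below. It assumes that samples, not fusions, are counted (the two samples carrying reciprocal fusions count once each) and that the colorectal figure uses the COAD cohort alone; pooling COAD with READ (287 + 85 samples) would instead give about 0.8%.

```python
# Samples analyzed per cancer type, from Table 1.
ANALYZED = {"THCA": 506, "GBM": 170, "LGG": 467, "HNSC": 412, "COAD": 287}

# NTRK fusion-positive samples per cancer type, counted from Table 2.
# Assumption: each sample is counted once, even when it carries a
# fusion and its reciprocal.
POSITIVE = {"THCA": 11, "GBM": 2, "LGG": 2, "HNSC": 2, "COAD": 3}


def prevalence_percent(positive: int, analyzed: int) -> float:
    """Return the share of fusion-positive samples as a percentage."""
    return 100.0 * positive / analyzed


prevalence = {
    cancer: round(prevalence_percent(POSITIVE[cancer], ANALYZED[cancer]), 1)
    for cancer in ANALYZED
}

for cancer, percent in prevalence.items():
    print(f"{cancer}: {POSITIVE[cancer]}/{ANALYZED[cancer]} = {percent}%")
```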
In all fusion-positive samples, an NTRK gene is the in-frame 3’ partner,
with full or partial retention of the catalytic kinase domain.
This suggests that the NTRK rearrangements are likely driver
events in cancer.
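To illustrate what "in frame" and "kinase domain retained" mean for a 3’ partner gene, the sketch below shows one simple way such calls could be made from junction coordinates. It is an assumption-laden illustration and not the FusionSCOUT implementation: the function names, the coordinate conventions and all numbers in the usage lines are hypothetical.

```python
def is_in_frame(five_prime_cds_bases: int, three_prime_cds_offset: int) -> bool:
    """Check whether a junction preserves the 3' partner's reading frame.

    five_prime_cds_bases: coding bases contributed by the 5' partner
        up to the junction.
    three_prime_cds_offset: 0-based offset, within the 3' partner's own
        coding sequence, of the first base retained after the junction.
    The frame is preserved when both positions fall on the same codon phase.
    """
    return five_prime_cds_bases % 3 == three_prime_cds_offset % 3


def domain_retention(retained_from_aa: int, domain_start: int, domain_end: int) -> str:
    """Classify retention of a domain in the 3' partner protein.

    Coordinates are 1-based and inclusive, in the 3' partner's own protein.
    retained_from_aa is the first amino acid of the 3' partner that is
    present in the fusion protein.
    """
    if retained_from_aa <= domain_start:
        return "yes"        # the whole domain lies downstream of the junction
    if retained_from_aa <= domain_end:
        return "partially"  # the junction falls inside the domain
    return "no"             # the domain lies entirely upstream of the junction


# Hypothetical coordinates, for illustration only.
print(is_in_frame(five_prime_cds_bases=300, three_prime_cds_offset=1200))
print(domain_retention(retained_from_aa=400, domain_start=500, domain_end=800))
print(domain_retention(retained_from_aa=600, domain_start=500, domain_end=800))
```

The domain check only covers the case where the gene of interest is the 3’ partner; a reciprocal fusion with the kinase gene as 5’ partner would need the mirror-image comparison against the last retained amino acid.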
Acknowledgements
Aurexel Ltd. (www.aurexel.com) is thanked for editorial assistance in the preparation of this poster.
www.medisapiens.com