
MediSapiens FusionSCOUT pipeline and
datasets
Schematic, first steps: paired-end reads (FASTQ) → align reads to the Ensembl transcriptome

One of the latest therapeutic angles in the fight against cancer is fusion
genes and their regulation. To aid fusion gene research and reveal the
multitude of gene fusion events in cancer samples, MediSapiens has developed
a proprietary FusionSCOUT pipeline for identifying fusion genes from RNA
sequencing datasets.

So far we have analysed 7625 tumour samples from the TCGA project, building
a fusion gene dataset that covers 28 different cancers within the TCGA
project and can be accessed through our FusionSCOUT product.

Using this pipeline, we have discovered 3930 samples with gene fusions and
9667 different fusion genes. We've discovered numerous novel gene fusions
as well as new cancer types in which previously known fusions appear.

You can now purchase these gene fusion datasets with a few mouse clicks
and get the world's most comprehensive gene fusion sets from cancer within
days.

You can also subscribe to our pipeline for your samples and get the most out
of your data!

Schematic representation of the FusionSCOUT pipeline. After the reads are
aligned, the remaining steps are:
• Filter by genome alignments, sequence homology, …
• Identify candidate fusion genes
• Find fusion junction (5' junction exon, 3' junction exon)
• Filter by: junction read mate alignment, read orientations, …
• Predict fusion mRNA and reading frame
• Predict protein domains (Pfam; shown as InterPro domains in the schematic)

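Two of the schematic's steps, identifying candidate fusion genes and predicting the reading frame, can be illustrated with a minimal Python sketch. FusionSCOUT is proprietary, so this is not its implementation: the function names, the `min_pairs` threshold, the input layout, and the reading of `no_frame` as "junction outside the coding sequence" are all assumptions made for illustration. Only the three frame labels are taken from the example data.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def candidate_fusions(
    pairs: Iterable[Tuple[str, str]], min_pairs: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count read pairs whose two mates align to different genes.

    `pairs` holds one (gene of mate 1, gene of mate 2) tuple per read pair.
    Hypothetical input layout; a real pipeline would derive it from alignments.
    """
    counts: Counter = Counter()
    for gene_a, gene_b in pairs:
        if gene_a != gene_b:  # mates in the same gene are not fusion evidence
            counts[(gene_a, gene_b)] += 1
    # Keep only gene pairs supported by at least `min_pairs` read pairs.
    return {pair: n for pair, n in counts.items() if n >= min_pairs}


def frame_status(cds_bases_5p: Optional[int], cds_offset_3p: Optional[int]) -> str:
    """Classify the reading frame across a fusion junction.

    cds_bases_5p:  coding bases of the 5' gene kept upstream of the junction.
    cds_offset_3p: coding bases of the 3' gene that precede the junction in
                   its own coding sequence.
    None means the junction lies outside the coding sequence (assumption).
    """
    if cds_bases_5p is None or cds_offset_3p is None:
        return "no_frame"
    # The 3' codons keep their original frame only if both counts leave the
    # same remainder when divided by 3.
    if cds_bases_5p % 3 == cds_offset_3p % 3:
        return "is_in_frame"
    return "is_not_in_frame"
```

For example, `frame_status(300, 120)` returns `"is_in_frame"` because both counts are multiples of three, while `frame_status(301, 120)` returns `"is_not_in_frame"`.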
www.medisapiens.com
With the full order you'll receive:
• A list of all gene fusions that involve your genes of interest, across all TCGA cancer types or one cancer, in Excel and text format
• TCGA sample IDs of the samples with fusions
• Exact exon junctions for the fusions, including alternatively spliced variants; data on whether reading frame is retained
• Detailed list of protein domains retained in the fusion genes
• Positions, cDNA sequences for the fusion mRNAs, and much more!

Example of data from FusionSCOUT:
Dataset: TCGA Breast cancer (BRCA)
Number of samples analysed: 1029
Number of samples with fusion genes: 714
Total number of fusion genes: 2941
Selected 5' genes: ESR1 and FGFR2
Data source: TCGA project, via Cancer Genomics Hub (https://cghub.ucsc.edu/)

5' gene  3' gene       Paired-end  Junction    TCGA sample ID   Canonical fusion
name     name          read count  read count                   reading frame
ESR1     MTHFD1L       18          5           TCGA-BH-A5J0-01  is_in_frame
ESR1     CCDC170       24          9           TCGA-BH-A0C0-01  is_not_in_frame
ESR1     CCDC170       17          11          TCGA-BH-A1FD-01  is_not_in_frame
ESR1     AKAP12        16          17          TCGA-BH-A1FD-01  is_not_in_frame
ESR1     AKAP12        16          17          TCGA-BH-A1FD-01  is_in_frame
ESR1     POLH          15          8           TCGA-BH-A1FD-01  is_not_in_frame
ESR1     C6orf211      42          10          TCGA-E9-A1NA-01  is_in_frame
ESR1     CCDC170       21          16          TCGA-D8-A27N-01  is_not_in_frame
ESR1     CCDC170       14          10          TCGA-A2-A4S3-01  is_in_frame
ESR1     CCDC170       22          16          TCGA-A2-A0CT-01  is_not_in_frame
ESR1     CCDC170       42          25          TCGA-BH-A18R-01  is_not_in_frame
ESR1     CCDC170       35          25          TCGA-AR-A24R-01  no_frame
ESR1     CCDC170       35          25          TCGA-AR-A24R-01  is_not_in_frame
ESR1     CCDC170       26          14          TCGA-D8-A1JP-01  is_in_frame
ESR1     ASPH          613         371         TCGA-A2-A3KD-01  is_not_in_frame
ESR1     ASPH          613         371         TCGA-A2-A3KD-01  is_not_in_frame
FGFR2    RP11-78A18.2  50          51          TCGA-BH-A1EY-01  is_not_in_frame
FGFR2    AP1M1         8           9           TCGA-AQ-A04L-01  is_in_frame
FGFR2    ATE1          22          19          TCGA-D8-A142-01  is_in_frame
FGFR2    CCDC6         27          20          TCGA-D8-A13Z-01  is_in_frame
FGFR2    ENPP2         83          59          TCGA-LD-A7W5-01  is_in_frame
FGFR2    ENPP2         83          59          TCGA-LD-A7W5-01  is_in_frame

5' protein domains (non-empty entries, in the order listed):
Oestr_rcpt ; Znf_hrmn_rcpt
Oestr_rcpt ; Znf_hrmn_rcpt
Oestr_rcpt ; Znf_hrmn_rcpt
Oestr_rcpt ; Znf_hrmn_rcpt
Oestr_rcpt ; Nucl_hrmn_rcpt_lig-bd_core ; Znf_hrmn_rcpt
Oestr_rcpt ; Znf_hrmn_rcpt
Oestr_rcpt
Oestr_rcpt ; Znf_hrmn_rcpt
Oestr_rcpt ; Znf_hrmn_rcpt
Immunoglobulin ; Ig_I-set ; Ser-Thr/Tyr_kinase_cat_dom
Immunoglobulin ; Ig_I-set ; Ser-Thr/Tyr_kinase_cat_dom
Immunoglobulin ; Ig_I-set ; Ser-Thr/Tyr_kinase_cat_dom
Immunoglobulin ; Ig_I-set
Immunoglobulin ; Ig_I-set

3' protein domains (non-empty entries, in the order listed):
Formate_THF_ligase
Pkinase-A_anch_WSK-motif ; RII_binding_1
Pkinase-A_anch_WSK-motif ; RII_binding_1
DNA_repair_prot_UmuC-like ; DNA_pol_Y-fam_little_finger
Asp_Arg_b-Hydrxlase
Asp_Arg_b-Hydrxlase
Clathrin_mu_C ; AP_mu_sigma_su
Arg-tRNA-P_Trfase_C
DUF2046
Phosphodiest/P_Trfase ; Somatomedin_B_dom ; DNA/RNA_non-sp_Endonuclease
Phosphodiest/P_Trfase ; Somatomedin_B_dom ; DNA/RNA_non-sp_Endonuclease

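Once a dataset like the BRCA example is delivered in text format, a typical first step is to shortlist in-frame fusions per 5' gene. The sketch below does that for six rows copied from the example data. The `Fusion` record, its field names, and the `min_junction_reads` cut-off are hypothetical choices for illustration and do not describe the delivered file layout.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Fusion:
    gene_5p: str
    gene_3p: str
    paired_end_reads: int
    junction_reads: int
    sample_id: str
    frame: str  # is_in_frame, is_not_in_frame or no_frame


# Six rows copied from the BRCA example data; the full example has 22 rows.
ROWS = [
    Fusion("ESR1", "MTHFD1L", 18, 5, "TCGA-BH-A5J0-01", "is_in_frame"),
    Fusion("ESR1", "CCDC170", 24, 9, "TCGA-BH-A0C0-01", "is_not_in_frame"),
    Fusion("ESR1", "AKAP12", 16, 17, "TCGA-BH-A1FD-01", "is_in_frame"),
    Fusion("ESR1", "CCDC170", 35, 25, "TCGA-AR-A24R-01", "no_frame"),
    Fusion("FGFR2", "AP1M1", 8, 9, "TCGA-AQ-A04L-01", "is_in_frame"),
    Fusion("FGFR2", "CCDC6", 27, 20, "TCGA-D8-A13Z-01", "is_in_frame"),
]


def in_frame_partners(
    rows: Iterable[Fusion], min_junction_reads: int = 1
) -> Dict[str, Set[str]]:
    """Group 3' partner genes of in-frame fusions by their 5' gene."""
    partners: Dict[str, Set[str]] = {}
    for row in rows:
        # Keep only fusions that retain the frame and have enough
        # junction-spanning reads (the threshold is an arbitrary example).
        if row.frame == "is_in_frame" and row.junction_reads >= min_junction_reads:
            partners.setdefault(row.gene_5p, set()).add(row.gene_3p)
    return partners
```

With these six rows, `in_frame_partners(ROWS)` groups MTHFD1L and AKAP12 under ESR1 and AP1M1 and CCDC6 under FGFR2. Raising `min_junction_reads` to 10 leaves only ESR1 with AKAP12 and FGFR2 with CCDC6.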
For more information on MediSapiens and FusionSCOUT gene datasets and pipeline, please visit
www.medisapiens.com/products/fusionscout or contact us via email at [email protected]