Toward Precision Medicine: Building a patient centric information commons on common and rare diseases with I2b2/tranSMART Application to Autism and Phelan McDermid Syndrome Paul Avillach, MD, PhD Assistant Professor -‐ Harvard Medical School Center of Biomedical Informa:cs Research Connec:on – CHIP-‐ Boston Children’s Hospital Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease Report from National academy of science, USA, 2011 2 PI: Isaac Kohane – Harvard - CBMI Health care Clinical Research Health Information System Browser tools available for Researchers DRG Imaging Pathology Rx 2) Structured data from research studies ETL Biology ETL EHR reports ETL once a week EHR forms 1) Analysis tools 3) ‘omics data Phenotypic ‘omics +’omics • Integrated platform to support translational research • Initiated by Johnson & Johnson et Recombinant 7 years ago • Open-source since January 24th , 2012 • Installed at HEGP Hospital, Paris since May, 2012 • Today, driven and maintained by the tranSMART http://transmartfoundation.org Objectives : 1. Integration of clinical, biological and ‘omics data in one place – hypothesis free – 2. Generation of hypothesis by Clinicians / Researchers 7 Autism cohorts Phenotype data • Simons Simplex Collection (SSC) • AGRE • Autism Consortium (AC) • BCH i2b2 HER data 2,760 3,300 525 16,587 8 BIG data RESEARCH ARTICLE [Bioinformatics & Integrative Genomics] TCGA a Ref. 25 HGS-OvCa 489 tumours GBM b Tumour/gene groups Gene expression Low 1p36.33 10–5.5 10–10 10–20 1p34.2 1q21.2 1,000 genes Differentiated Immunoreactive Mesenchymal Proliferative 245 tumours 0. 25 a 4q13.3 5p15.33 5p13.2 High 6p22.3 Deletion Known rela:onships between genes Neutral TCGA set c 0.25training 10 10 10 b Sequence varia:on (Batches Copy 9–15; 215 tumors) number Death event (red) varia:on –20 –40 –60 10–80 Amplification TCGA test set (BatchesGene 17–24;e255 tumors) xpression 8p11.21 DEPTOR Poor prognosis genes (107) 12q13.2 12q15 14q11.2 (METTL17) 14q32.33 Good prognosis genes (85) Prognostic gene score 10 0 –10 + many more Lower risk Higher risk Lower risk 9 Higher risk inc of 10–40 BR an of BR lat BR viv for ep wi 41 sec ina an Ge mi (Su SSC (928) Raw data RNA Blood (Kunkel-Kohane) – Affymetrix Gene ST 1.0 – Affymetrix U133+2 – Illumina HiSeq 316 19 154 LCL (Geschwind) - Illumina REF-8 3.0 AC (166) 439 Blood (Kunkel-Kohane) – Affymetrix Gene ST 1.0 – Affymetrix U133+2 – Illumina HiSeq BCH (386) 117 21 28 Blood (Kunkel-Kohane) – Affymetrix Gene ST 1.0 – Affymetrix U133+2 – Illumina HiSeq AGRE (1,048) 186 168 32 LCL (Geschwind) - Illumina REF-8 3.0 1,048 10 DNA • Static genomic predisposition – Goal: enable streamlined analysis of genomic variation at any functional unit resolution • Single variant / variant type • Single gene • Gene set / pathway • Regulatory module • Cellular system • Genomic location / context … • Measurement types 11 Raw genotyping array data by cohort • SSC (3,184) – Ilumina Infinium1M – Illumina 1M Duo DNA • AC (60) - Affymetrix SNP 6.0 • AGRE (3,832) - - - - Affymetrix 10K Affymetrix 500K Illumina HumanHap550 Illumina Infinium 1M 12 WHOLE EXOME SEQUENCE data by cohort • SSC (2,963) – State 914 • Illumina GAIIx • Illumina HiSeq – Eichler DNA 676 • Illumina GAIIx • Illumina HiSeq – Wigler 1373 • Illumina HiSeq • AC (381) – Daly 381 • Illumina HiSeq • AGRE (1672) – Walsh 750 • Illumina HiSeq – BI-BCM • Illumina HiSeq 922 ABI SOLiD 13 Exome sequence data processing > Variant calling > Variant annotation > tranSMART input Physical location e.g. Chr:start-end Cytoband … Gene e.g. Gene name Variant function … Gene set e.g. Pathway Molecular process … Predicted variant impact e.g. SIFT PolyPhen … Conservation e.g. GERP PhyloP … Population frequency e.g. 1000 Genomes ESP 6500 … Clinical significance e.g. ClinVar OMIM … Expression patterns e.g. GTEx BrainSpan … Transcriptional regulation e.g. ENCODE TFBS Histone modifications … Comprehensively annotated variants Individual genotypes ANNOVAR SNPs, indels 15 Live DEMO hIps://www.youtube.com/watch?v=rUFH697a2n4 16 17 19 20 PCORnet $105 million Start date Jan 2014 18 months 11 CDRN $7M 18 PPRN $1M 21 PCORI PPRN Grant • Phelan-McDermid Syndrome Data Network – PI: Megan O’Boyle, Mother of PMS patient – Co-PI: Paul Avillach, MD, PhD • Total: $1M • 18 months • To collect all available patient data from PhelanMcDermid Syndrome (PMS) patients to make meaningful, well-annotated clinical data available to researchers and to share insights with members of the PCORI network 22 23 Peter B. Jensen, Lars J. Jensen and Søren Brunak, Nat Rev Genet. 2012 24 Kohane I, Nature Review Gen. 2011 25 Live Demo PMS_DN 26 PaHent Centric InformaHon Commons (PIC) PI: Isaac Kohane RESTful API 27 28 P50: Centers of Excellence in Genomic Science Neuropsychiatric Genome-‐Scale and RDOC Individualized Domains (N-‐GRID) PI: Isaac Kohane 500 subjects will use the same i2b2/tranSMART infrastructure to store and access all the following datatypes: • • • • • • • • • • • Results of 4h of self-report measures, clinician interview, and neuropsych testing. EMR notes and codes RDoC-parsing (NLP) of EMR notes Whole-genome (not just exome) sequencing GWAS (illumina psych chip - ~250k SNPs + CNV's) for at least a subset RNA-seq (baseline and possibly additional experimental conditions) L1000 (expression) data for up to 400+ conditions x multiple cell lines illumina expression data x multiple lines (fibroblast, NPC, differentiated neurons) nanostring (expression) data x multiple lines (iPS, NPC, differentiated neurons) ATAC-seq data x multiple lines Additional data from Greenberg lab (electrophys etc) 29 30 PCORI PMS_DN team • • • Megan O’Boyle, PI & Mom of Shannon Paul Avillach, MD, PhD , Co-PI, Harvard Medical School Liz Horn, PhD, Co-PI, Network Director PMSF Research director • Geraldine Bliss, MSc & Mom of Project Manager • Andria Cornell Mann LGC Data Network Specialist • Rebecca Davis Family Engagement Specialist • Jackie Malasky Harvard Medical School : CBMI • Sushma Hanawal • Michael McDuffie, MSc • Isaac Kohane, MD, PhD • Eric Perakslis, PhD Boston Children’s Hospital: cTAKES NLP • Guergana Savova, PhD • Pei Chen Harvard Medical School : IT Infrastructure Support • Christopher Botka • David Hummel • Daniel Lewis Avillach Lab hIp://avillach-‐lab.hms.harvard.edu Developers • Michael McDuffie, MSc • Sushma Hanawal • Pei Chen Research Associates • Antoine Neuraz, MD, MSc • Cartik Saravana, PhD Graduate students • Maxime Wack, MD • Claire Hassen-Kodja, MD, MSc • Samuel Finlayson Previous members • Ephi Sachs, MD • Heike Lehmann We are hiring now • Senior Software Developer *2 • Senior Database Developer *2 • Postdocs *2 33 P50: Centers of Excellence in Genomic Science Neuropsychiatric Genome-‐Scale and RDOC Individualized Domains (N-‐GRID) PI: Isaac Kohane 34 35
© Copyright 2024