Toward Precision Medicine:
Building a patient-centric information commons on common and
rare diseases with i2b2/tranSMART
Application to Autism and Phelan-McDermid Syndrome
Paul Avillach, MD, PhD
Assistant Professor, Harvard Medical School
Center for Biomedical Informatics
Research Connection, CHIP, Boston Children’s Hospital

Toward Precision Medicine: Building a Knowledge Network for
Biomedical Research and a New Taxonomy of Disease
Report from the National Academy of Sciences, USA, 2011
PI: Isaac Kohane – Harvard - CBMI
[Figure: architecture diagram linking health care and clinical research. Labels: Health Information System (DRG, Imaging, Pathology, Rx, Biology, EHR reports, EHR forms); ETL (once a week); browser tools available for researchers; 1) analysis tools; 2) structured data from research studies; 3) ‘omics data; phenotypic + ‘omics.]
•  Integrated platform to support translational research
•  Initiated by Johnson & Johnson and Recombinant 7 years ago
•  Open-source since January 24th, 2012
•  Installed at HEGP Hospital, Paris, since May 2012
•  Today, driven and maintained by the tranSMART Foundation: http://transmartfoundation.org
Objectives:
1.  Integration of clinical, biological and ‘omics data in one place (hypothesis-free)
2.  Generation of hypotheses by clinicians / researchers
Autism cohorts
Phenotype data
•  Simons Simplex Collection (SSC): 2,760
•  AGRE: 3,300
•  Autism Consortium (AC): 525
•  BCH i2b2 EHR data: 16,587
BIG data
[Figure: panels excerpted from TCGA research articles (HGS-OvCa, 489 tumours; GBM) illustrating integrated ‘omics data: gene expression (low to high) by tumour/gene group (Differentiated, Immunoreactive, Mesenchymal, Proliferative); copy number variation (amplification, neutral, deletion) by cytoband; sequence variation; known relationships between genes; and a prognostic gene score (good prognosis genes (85), poor prognosis genes (107); lower vs. higher risk) in the TCGA training set (Batches 9–15; 215 tumors) and the TCGA test set (Batches 17–24; 255 tumors).]
+ many more
RNA: raw data by cohort
•  SSC (928)
–  Blood (Kunkel-Kohane)
•  Affymetrix Gene ST 1.0: 316
•  Affymetrix U133+2: 19
•  Illumina HiSeq: 154
–  LCL (Geschwind)
•  Illumina REF-8 3.0: 439
•  AC (166)
–  Blood (Kunkel-Kohane)
•  Affymetrix Gene ST 1.0: 117
•  Affymetrix U133+2: 21
•  Illumina HiSeq: 28
•  BCH (386)
–  Blood (Kunkel-Kohane)
•  Affymetrix Gene ST 1.0: 186
•  Affymetrix U133+2: 168
•  Illumina HiSeq: 32
•  AGRE (1,048)
–  LCL (Geschwind)
•  Illumina REF-8 3.0: 1,048
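As a quick consistency check, the per-platform RNA sample counts can be summed and compared with the cohort totals shown in parentheses. The sketch below assumes the counts group by cohort as SSC (316, 19, 154, 439), AC (117, 21, 28), BCH (186, 168, 32) and AGRE (1,048); the dictionary layout and the function name are illustrative only.

```python
# Per-platform RNA sample counts grouped by cohort. The grouping is an
# assumption read off the slide; "total" is the number shown in parentheses.
RNA_COUNTS = {
    "SSC": {"total": 928, "platform_counts": [316, 19, 154, 439]},
    "AC": {"total": 166, "platform_counts": [117, 21, 28]},
    "BCH": {"total": 386, "platform_counts": [186, 168, 32]},
    "AGRE": {"total": 1048, "platform_counts": [1048]},
}


def check_totals(counts):
    """Return {cohort: bool}: do the platform counts sum to the cohort total?"""
    return {
        cohort: sum(entry["platform_counts"]) == entry["total"]
        for cohort, entry in counts.items()
    }


if __name__ == "__main__":
    for cohort, ok in check_totals(RNA_COUNTS).items():
        print(cohort, "consistent" if ok else "mismatch")
```

Under this grouping every cohort total matches the sum of its platform counts, which supports reading the counts in that order.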
DNA
•  Static genomic predisposition
–  Goal: enable streamlined analysis of genomic variation at any functional unit resolution
•  Single variant / variant type
•  Single gene
•  Gene set / pathway
•  Regulatory module
•  Cellular system
•  Genomic location / context
…
•  Measurement types
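One way to picture "any functional unit resolution" is to group the same variant calls under different keys (variant type, gene, gene set). The sketch below is a minimal illustration with placeholder records; the `Variant` fields, the `GENE_SETS` mapping and `count_by_unit` are hypothetical names and are not part of i2b2/tranSMART.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A hypothetical, minimal variant call for one subject."""
    subject_id: str
    chrom: str
    pos: int            # placeholder coordinate, not a real position
    variant_type: str   # e.g. "SNV", "indel", "CNV"
    gene: str


# Illustrative gene-set membership; a real analysis would use a curated source.
GENE_SETS = {"example_gene_set": {"SHANK3", "SHANK2"}}


def count_by_unit(variants, unit):
    """Count variants per functional unit: 'variant_type', 'gene' or 'gene_set'."""
    counts = Counter()
    for v in variants:
        if unit == "variant_type":
            counts[v.variant_type] += 1
        elif unit == "gene":
            counts[v.gene] += 1
        elif unit == "gene_set":
            for name, genes in GENE_SETS.items():
                if v.gene in genes:
                    counts[name] += 1
        else:
            raise ValueError(f"unknown unit: {unit}")
    return counts


if __name__ == "__main__":
    calls = [
        Variant("S1", "22", 1000, "SNV", "SHANK3"),
        Variant("S2", "22", 2000, "indel", "SHANK3"),
        Variant("S3", "11", 3000, "SNV", "SHANK2"),
        Variant("S4", "2", 4000, "CNV", "NRXN1"),
    ]
    for unit in ("variant_type", "gene", "gene_set"):
        print(unit, dict(count_by_unit(calls, unit)))
```

The same list of calls is summarised three ways without changing the underlying records, which is the point of choosing the resolution at query time rather than at load time.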
DNA: raw genotyping array data by cohort
•  SSC (3,184)
–  Illumina Infinium 1M
–  Illumina 1M Duo
•  AC (60)
–  Affymetrix SNP 6.0
•  AGRE (3,832)
–  Affymetrix 10K
–  Affymetrix 500K
–  Illumina HumanHap550
–  Illumina Infinium 1M
DNA: whole exome sequence data by cohort
•  SSC (2,963)
–  State: 914
•  Illumina GAIIx
•  Illumina HiSeq
–  Eichler: 676
•  Illumina GAIIx
•  Illumina HiSeq
–  Wigler: 1,373
•  Illumina HiSeq
•  AC (381)
–  Daly: 381
•  Illumina HiSeq
•  AGRE (1,672)
–  Walsh: 750
•  Illumina HiSeq
–  BI-BCM: 922
•  Illumina HiSeq
•  ABI SOLiD
Exome sequence data processing > Variant calling > Variant annotation > tranSMART input

Variant annotation categories:
•  Physical location, e.g. Chr:start-end, cytoband, …
•  Gene, e.g. gene name, variant function, …
•  Gene set, e.g. pathway, molecular process, …
•  Predicted variant impact, e.g. SIFT, PolyPhen, …
•  Conservation, e.g. GERP, PhyloP, …
•  Population frequency, e.g. 1000 Genomes, ESP 6500, …
•  Clinical significance, e.g. ClinVar, OMIM, …
•  Expression patterns, e.g. GTEx, BrainSpan, …
•  Transcriptional regulation, e.g. ENCODE TFBS, histone modifications, …

Other labels in the diagram: SNPs, indels; individual genotypes; ANNOVAR; comprehensively annotated variants.
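To make the annotation step more concrete, here is a minimal sketch of what an annotated variant record might look like before loading, and how a few of the annotation categories can drive a filter. All field names and thresholds are assumptions chosen for illustration; this is neither ANNOVAR's output format nor tranSMART's input schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedVariant:
    """Hypothetical record combining a few annotation categories."""
    chrom: str
    start: int
    end: int
    gene: str
    variant_function: str             # e.g. "missense", "synonymous"
    sift_score: Optional[float]       # lower is predicted more damaging
    polyphen_score: Optional[float]   # higher is predicted more damaging
    population_freq: Optional[float]  # allele frequency; None if not observed


def is_rare_and_damaging(v, max_freq=0.01, max_sift=0.05, min_polyphen=0.85):
    """Keep variants that are rare and predicted damaging by both tools.

    Thresholds are illustrative defaults, not values taken from the slides.
    A variant absent from the population database is treated as frequency 0.
    """
    freq = 0.0 if v.population_freq is None else v.population_freq
    if freq > max_freq:
        return False
    if v.sift_score is None or v.polyphen_score is None:
        return False
    return v.sift_score <= max_sift and v.polyphen_score >= min_polyphen


if __name__ == "__main__":
    # Placeholder coordinates and scores, for demonstration only.
    variants = [
        AnnotatedVariant("22", 100, 100, "SHANK3", "missense", 0.01, 0.95, None),
        AnnotatedVariant("22", 200, 200, "SHANK3", "missense", 0.01, 0.95, 0.20),
        AnnotatedVariant("1", 300, 300, "GENE_X", "synonymous", 0.80, 0.01, 0.001),
    ]
    kept = [v for v in variants if is_rare_and_damaging(v)]
    print(len(kept), "of", len(variants), "variants kept")
```

Keeping every annotation as a separate field means the same record can be filtered by frequency, predicted impact, or gene without re-running the annotation step.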
Live demo: https://www.youtube.com/watch?v=rUFH697a2n4
PCORnet
•  $105 million
•  Start date: Jan 2014
•  18 months
•  11 CDRN: $7M
•  18 PPRN: $1M
PCORI PPRN Grant
•  Phelan-McDermid Syndrome Data Network
–  PI: Megan O’Boyle, Mother of PMS patient
–  Co-PI: Paul Avillach, MD, PhD
•  Total: $1M
•  18 months
•  To collect all available patient data from Phelan-McDermid Syndrome (PMS) patients, to make meaningful, well-annotated clinical data available to researchers, and to share insights with members of the PCORI network
Peter B. Jensen, Lars J. Jensen and Søren Brunak, Nat Rev Genet. 2012
Kohane I, Nat Rev Genet. 2011
Live Demo PMS_DN
Patient Centric Information Commons (PIC)
PI: Isaac Kohane
RESTful API
P50: Centers of Excellence in Genomic Science
Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID)
PI: Isaac Kohane
Data for 500 subjects will be stored in and accessed through the same i2b2/tranSMART infrastructure, covering all of the following data types:
•  Results of 4 h of self-report measures, clinician interview, and neuropsych testing
•  EMR notes and codes
•  RDoC parsing (NLP) of EMR notes
•  Whole-genome (not just exome) sequencing
•  GWAS (Illumina psych chip, ~250k SNPs + CNVs) for at least a subset
•  RNA-seq (baseline and possibly additional experimental conditions)
•  L1000 (expression) data for up to 400+ conditions x multiple cell lines
•  Illumina expression data x multiple lines (fibroblast, NPC, differentiated neurons)
•  NanoString (expression) data x multiple lines (iPS, NPC, differentiated neurons)
•  ATAC-seq data x multiple lines
•  Additional data from the Greenberg lab (electrophysiology etc.)
PCORI PMS_DN team
•  Megan O’Boyle, PI & Mom of Shannon
•  Paul Avillach, MD, PhD, Co-PI, Harvard Medical School
•  Liz Horn, PhD, Co-PI, Network Director
PMSF Research director
•  Geraldine Bliss, MSc & Mom of
Project Manager
•  Andria Cornell Mann
LGC Data Network Specialist
•  Rebecca Davis
Family Engagement Specialist
•  Jackie Malasky
Harvard Medical School: CBMI
•  Sushma Hanawal
•  Michael McDuffie, MSc
•  Isaac Kohane, MD, PhD
•  Eric Perakslis, PhD
Boston Children’s Hospital: cTAKES NLP
•  Guergana Savova, PhD
•  Pei Chen
Harvard Medical School: IT Infrastructure Support
•  Christopher Botka
•  David Hummel
•  Daniel Lewis
Avillach Lab
http://avillach-lab.hms.harvard.edu
Developers
•  Michael McDuffie, MSc
•  Sushma Hanawal
•  Pei Chen
Research Associates
•  Antoine Neuraz, MD, MSc
•  Cartik Saravana, PhD
Graduate students
•  Maxime Wack, MD
•  Claire Hassen-Kodja, MD, MSc
•  Samuel Finlayson
Previous members
•  Ephi Sachs, MD
•  Heike Lehmann
We are hiring now
•  Senior Software Developer *2
•  Senior Database Developer *2
•  Postdocs *2