Large-scale semantic mining of disease-phenotype
annotations
Robert Hoehndorf, Paul Schofield, George Gkoutos
King Abdullah University of Science and Technology
24 April 2015
Robert Hoehndorf et al. (KAUST)
Mining disease phenotypes
Biocuration 2015
1 / 15
Phenotype analysis
Phenotypes
Phenotypes are the observable characteristics of an organism arising from
its genotype and its response to the environment.
Analysis of phenotypes should reveal information about
genotype,
environment,
mechanisms and processes that determine phenotype from genotype.
Phenotype analysis
PhenomeNET (2011-2015):
PATO-based integration of phenotypes
model organisms: yeast, fly, worm, fish, mouse, rat, slime mold
human: OMIM, OrphaNET, SIDER
plants: Arabidopsis, maize, barrel clover, rice, soybean, tomato
semantic similarity measures phenotypic similarity
analysis reveals:
candidate genes
drug targets
interactions and pathways
homology
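The idea that semantic similarity can stand in for phenotypic similarity can be illustrated with a minimal sketch. It assumes a toy ontology (the class names and the `PARENTS` map are hypothetical, not taken from PATO, HPO, or MP) and uses a plain Jaccard index over superclass closures; the measure actually used by PhenomeNET may differ.

```python
from typing import Dict, Set

# Hypothetical toy ontology: each class maps to its direct superclasses.
PARENTS: Dict[str, Set[str]] = {
    "enlarged heart": {"abnormal heart morphology"},
    "thin ventricular wall": {"abnormal heart morphology"},
    "abnormal heart morphology": {"cardiovascular phenotype"},
    "cardiovascular phenotype": {"phenotype"},
    "abnormal gait": {"nervous system phenotype"},
    "nervous system phenotype": {"phenotype"},
    "phenotype": set(),
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its superclasses."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, set()))
    return seen


def closure(terms: Set[str]) -> Set[str]:
    """Union of the superclass closures of a set of phenotype terms."""
    result: Set[str] = set()
    for term in terms:
        result |= ancestors(term)
    return result


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two phenotype sets after adding inferred superclasses."""
    closed_a, closed_b = closure(a), closure(b)
    union = closed_a | closed_b
    return len(closed_a & closed_b) / len(union) if union else 0.0
```

Because shared superclasses count as overlap, two distinct heart phenotypes score higher against each other than a heart phenotype does against a gait phenotype, even though no term matches exactly.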
Phenotypes for common diseases
Common diseases
Can we learn something about the molecular and physiological
mechanisms underlying common diseases through analysis of phenotypes?
First, we need
a list/database/ontology of common diseases, and
associated phenotypes.
Phenotypes for common diseases
The Human Disease Ontology has 8,944 classes characterizing
rare and common,
genetic, environmental, and infectious diseases.
But (almost) no phenotypes associated with diseases!
Phenotypes for common diseases
Goal: find associations between diseases (from HumanDO) and
phenotypes (MP, HPO)
Aber-OWL (http://aber-owl.net):
ontology repository
automated reasoning
semantic indexing of PubMed and PubMed Central
query documents using OWL reasoning
Phenotypes for common diseases
utilize over 5 million abstracts and articles
find co-occurrences of disease and phenotype terms
phenotypes for 6,220 diseases from DO
9,646 phenotypes from HPO (6,000) and MP (3,646)
rank associations using statistical measures (NPMI)
evaluate using genetically based diseases (from OMIM and
OrphaNET)
http://aber-owl.net/aber-owl/diseasephenotypes
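The co-occurrence and ranking steps above can be sketched as follows. This is a minimal illustration, assuming each document has already been reduced to the set of disease terms and the set of phenotype terms it mentions; the function names and the toy data are hypothetical, and the real pipeline works on ontology-indexed PubMed and PubMed Central text. NPMI is computed as PMI divided by -log p(x, y), which bounds the score to [-1, 1].

```python
import math
from collections import Counter
from itertools import product
from typing import List, Set, Tuple

Document = Tuple[Set[str], Set[str]]  # (disease terms, phenotype terms)


def npmi(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Normalized pointwise mutual information from document counts."""
    if n_xy == 0:
        return -1.0  # the terms never co-occur
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        return 1.0  # both terms occur in every document
    pmi = math.log(p_xy / ((n_x / n_total) * (n_y / n_total)))
    return pmi / -math.log(p_xy)


def rank_associations(docs: List[Document]) -> List[Tuple[str, str, float]]:
    """Score every co-occurring (disease, phenotype) pair, best first."""
    disease_counts: Counter = Counter()
    phenotype_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for diseases, phenotypes in docs:
        disease_counts.update(diseases)
        phenotype_counts.update(phenotypes)
        pair_counts.update(product(diseases, phenotypes))
    scored = [
        (d, p, npmi(n, disease_counts[d], phenotype_counts[p], len(docs)))
        for (d, p), n in pair_counts.items()
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Toy corpus (hypothetical): four documents with their extracted terms.
toy_docs: List[Document] = [
    ({"asthma"}, {"wheezing", "cough"}),
    ({"asthma"}, {"wheezing"}),
    ({"influenza"}, {"cough", "fever"}),
    ({"influenza"}, {"fever"}),
]
ranked = rank_associations(toy_docs)
```

In the toy corpus, a phenotype that appears only with one disease reaches the maximum score, while a phenotype shared evenly between both diseases scores zero.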
Optimal cutoff: 21 phenotypes per disease
[Figure: ROCAUC (y-axis, 0.76 to 0.92) against PMI rank cutoff (x-axis, 0 to 100); series: MGI, MGI (genes), OMIM.]
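Applying a rank cutoff such as the 21 phenotypes per disease reported here amounts to keeping only the top-scoring associations for each disease. A minimal sketch, assuming the scores are already available as (phenotype, score) pairs per disease; the function name and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def top_k_phenotypes(
    scored: Dict[str, List[Tuple[str, float]]], k: int = 21
) -> Dict[str, List[str]]:
    """Keep the k highest-scoring phenotypes for each disease."""
    result: Dict[str, List[str]] = {}
    for disease, pairs in scored.items():
        ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
        result[disease] = [phenotype for phenotype, _ in ranked[:k]]
    return result
```

Sweeping `k` and re-evaluating each resulting set of disease definitions is one way a curve like the one on this slide could be produced.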
Adding text-mined phenotypes to OMIM improves gene prioritization
Predicting mouse models of disease
[Figure: ROC curves, True Positive Rate (0 to 1) against False Positive Rate (0 to 1); legend: Merged disease definitions, OMIM disease definitions, Textmined disease definitions, x.]
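A ROC curve of this kind can be computed from any ranked list of candidates. The sketch below assumes each candidate (for example, a mouse model) has a similarity score to the disease and a boolean label saying whether it is a known association; the function names are hypothetical and the evaluation code behind the slide is not shown in the source.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, best score first."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        j = i
        # Candidates with tied scores are added to the curve together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                true_pos += 1
            else:
                false_pos += 1
            j += 1
        points.append((false_pos / negatives, true_pos / positives))
        i = j
    return points


def roc_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
```

An area of 1.0 means every known association is ranked above every other candidate; 0.5 corresponds to a random ordering.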
Phenotypic homogeneity between disease groups
Disease category                                   ROCAUC
cardiovascular system disease (DOID:1287)          0.720 ± 0.045
disease by infectious agent (DOID:0050117)         0.738 ± 0.035
disease of cellular proliferation (DOID:14566)     0.672 ± 0.021
disease of mental health (DOID:150)                0.783 ± 0.049
disease of metabolism (DOID:0014667)               0.733 ± 0.053
endocrine system disease (DOID:28)                 0.711 ± 0.076
gastrointestinal system disease (DOID:77)          0.718 ± 0.045
genetic disease (DOID:630)                         0.656 ± 0.070
immune system disease (DOID:2914)                  0.730 ± 0.046
integumentary system disease (DOID:16)             0.743 ± 0.051
musculoskeletal system disease (DOID:17)           0.704 ± 0.048
nervous system disease (DOID:863)                  0.712 ± 0.026
physical disorder (DOID:0080015)                   0.593 ± 0.113
reproductive system disease (DOID:15)              0.868 ± 0.048
respiratory system disease (DOID:1579)             0.868 ± 0.047
urinary system disease (DOID:18)                   0.870 ± 0.050
Phenotypic similarity network
http://aber-owl.net/aber-owl/diseasephenotypes/network
Exploring phenotypic similarity
http://aber-owl.net/aber-owl/diseasephenotypes/network
[Figure: phenotypic similarity network, two panels. A. Tay-Sachs disease; B. Alopecia areata. Legend (disease categories): Proliferative, Metabolic, Mental health, Immune, Infectious, Integument, Nervous system, Other.]
Diseases shown as nodes:
survival motor neuron spinal muscular atrophy, juvenile spinal muscular atrophy,
Cronkhite-Canada syndrome, dyshidrosis, tinea manuum, chronic mucocutaneous candidiasis,
black piedra, non-langerhans-cell histiocytosis, alpha-mannosidosis, folliculitis,
Pediculus humanus capitis infestation, tinea cruris, Tay-Sachs disease, tinea pedis,
Fabry disease, Niemann-Pick disease, scalp dermatosis, sphingolipidosis,
nonphotosensitive trichothiodystrophy, histiocytosis, gangliosidosis GM2, neurodermatitis,
tinea corporis, lipid storage disease, alopecia universalis, follicular mucinosis,
Sandhoff disease, lichen planus, Farber lipogranulomatosis, alopecia areata, alopecia,
lysosomal storage disease, angiokeratoma, hair disease, dermatophytosis, tinea favora,
sea-blue histiocyte syndrome, hypotrichosis, gangliosidosis, Krabbe disease, tinea capitis,
metachromatic leukodystrophy, keratosis follicularis, tinea nigra, diffuse alopecia areata,
telogen effluvium, impulse control disorder, vitiligo, trichotillomania
Summary
text-mined disease-phenotype associations
validated against known gene-disease associations
using PhenomeNET
phenotypic clustering and similarity network
freely available
Acknowledgements
George Gkoutos (Aberystwyth)
Paul Schofield (Cambridge)
http://aber-owl.net/aber-owl/diseasephenotypes/