Large-scale semantic mining of disease-phenotype annotations
Robert Hoehndorf, Paul Schofield, George Gkoutos
King Abdullah University of Science and Technology
24 April 2015
Robert Hoehndorf et al. (KAUST), Mining disease phenotypes, Biocuration 2015

Phenotype analysis

Phenotypes
Phenotypes are the observable characteristics of an organism arising from its genotype and its response to the environment.
Analysis of phenotypes should reveal information about genotype, environment, mechanisms and processes that determine phenotype from genotype.

Phenotype analysis

PhenomeNET (2011-2015): PATO-based integration of phenotypes
- model organisms: yeast, fly, worm, fish, mouse, rat, slime mold
- human: OMIM, OrphaNET, SIDER
- plants: Arabidopsis, maize, barrel clover, rice, soybean, tomato
- semantic similarity measures
Phenotypic similarity analysis reveals:
- candidate genes
- drug targets
- interactions and pathways
- homology

Phenotypes for common diseases

Common diseases
Can we learn something about the molecular and physiological mechanisms underlying common diseases through analysis of phenotypes?
First, we need a list, database, or ontology of common diseases and their associated phenotypes.

Phenotypes for common diseases

The Human Disease Ontology has 8,944 classes characterizing rare and common, genetic, environmental, and infectious diseases.
But (almost) no phenotypes are associated with diseases!

Phenotypes for common diseases

Goal: find associations between diseases (from HumanDO) and phenotypes (MP, HPO)
Aber-OWL (http://aber-owl.net):
- ontology repository
- automated reasoning
- semantic indexing of PubMed and PubMed Central
- query documents using OWL reasoning

Phenotypes for common diseases

- utilize over 5 million abstracts and articles
- find co-occurrences of disease and phenotype terms
- phenotypes for 6,220 diseases from DO
- 9,646 phenotypes from HPO (6,000) and MP (3,646)
- rank associations using statistical measures (NPMI)
- evaluate using genetically based diseases (from OMIM and OrphaNET)
http://aber-owl.net/aber-owl/diseasephenotypes

Optimal cutoff: 21 phenotypes per disease

[Figure: ROCAUC (y-axis, 0.76 to 0.92) against PMI rank cutoff (x-axis, 0 to 100) for three series: MGI, MGI (genes), OMIM.]

Adding textmined phenotypes to OMIM improves gene prioritization

Predicting mouse models of disease
[Figure: ROC curves (True Positive Rate against False Positive Rate) comparing merged disease definitions, OMIM disease definitions, and textmined disease definitions.]

Phenotypic homogeneity between disease groups

Disease category                                   ROCAUC
cardiovascular system disease (DOID:1287)          0.720 ± 0.045
disease by infectious agent (DOID:0050117)         0.738 ± 0.035
disease of cellular proliferation (DOID:14566)     0.672 ± 0.021
disease of mental health (DOID:150)                0.783 ± 0.049
disease of metabolism (DOID:0014667)               0.733 ± 0.053
endocrine system disease (DOID:28)                 0.711 ± 0.076
gastrointestinal system disease (DOID:77)          0.718 ± 0.045
genetic disease (DOID:630)                         0.656 ± 0.070
immune system disease (DOID:2914)                  0.730 ± 0.046
integumentary system disease (DOID:16)             0.743 ± 0.051
musculoskeletal system disease (DOID:17)           0.704 ± 0.048
nervous system disease (DOID:863)                  0.712 ± 0.026
physical disorder (DOID:0080015)                   0.593 ± 0.113
reproductive system disease (DOID:15)              0.868 ± 0.048
respiratory system disease (DOID:1579)             0.868 ± 0.047
urinary system disease (DOID:18)                   0.870 ± 0.050

Phenotypic similarity network

http://aber-owl.net/aber-owl/diseasephenotypes/network
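
A similarity network of this kind links diseases whose phenotype annotations are similar. The sketch below is only an illustration under stated assumptions: it uses Jaccard overlap of phenotype sets as a simple stand-in for the ontology-based semantic similarity measures named in the slides, and the function names, the 0.5 threshold, and the placeholder disease and phenotype labels are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two phenotype sets (a stand-in, not the slides' measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(disease_phenotypes, threshold=0.5):
    """Return (disease, disease, similarity) edges at or above the threshold."""
    edges = []
    for d1, d2 in combinations(sorted(disease_phenotypes), 2):
        sim = jaccard(disease_phenotypes[d1], disease_phenotypes[d2])
        if sim >= threshold:
            edges.append((d1, d2, sim))
    return edges


# Placeholder labels only; these are not real annotations.
toy = {
    "disease_A": {"P1", "P2", "P3"},
    "disease_B": {"P2", "P3", "P4"},
    "disease_C": {"P9"},
}
edges = similarity_network(toy, threshold=0.5)
```

A semantic similarity measure would also give credit to phenotypes that are related through the ontology hierarchy, which plain set overlap ignores.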
Exploring phenotypic similarity

http://aber-owl.net/aber-owl/diseasephenotypes/network
[Figure: two neighborhoods in the phenotypic similarity network. Panel A: Tay-Sachs disease. Panel B: Alopecia areata. Legend categories: Proliferative, Metabolic, Mental health, Immune, Infectious, Integument, Other, Nervous system. Node labels include lipid and lysosomal storage diseases (e.g. Sandhoff disease, Niemann-Pick disease, Fabry disease, Krabbe disease) and hair and skin diseases (e.g. alopecia universalis, tinea capitis, lichen planus, trichotillomania).]

Summary

- text-mined disease-phenotype associations
- validated against known gene-disease associations using PhenomeNET
- phenotypic clustering and similarity network
- freely available

Acknowledgements

George Gkoutos (Aberystwyth)
Paul Schofield (Cambridge)
http://aber-owl.net/aber-owl/diseasephenotypes/
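
The text-mining step summarized above ranks disease-phenotype co-occurrences with NPMI and keeps the top-ranked phenotypes per disease. A minimal sketch of that idea follows, assuming document-level counts are already available. The function names and the count dictionaries are hypothetical, the toy numbers are made up for illustration, and the default cutoff of 21 is the value reported on the cutoff slide.

```python
import math
from collections import defaultdict


def npmi(n_xy, n_x, n_y, n_total):
    """Normalized pointwise mutual information from document counts."""
    if n_xy == 0:
        return -1.0  # terms never co-occur
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        return 1.0  # both terms occur in every document; avoids 0/0
    pmi = math.log(p_xy / ((n_x / n_total) * (n_y / n_total)))
    return pmi / -math.log(p_xy)


def rank_phenotypes(cooc, disease_counts, phenotype_counts, n_total, cutoff=21):
    """Keep the `cutoff` highest-NPMI phenotypes for each disease."""
    scored = defaultdict(list)
    for (disease, phenotype), n_xy in cooc.items():
        score = npmi(n_xy, disease_counts[disease],
                     phenotype_counts[phenotype], n_total)
        scored[disease].append((phenotype, score))
    return {
        d: sorted(items, key=lambda item: item[1], reverse=True)[:cutoff]
        for d, items in scored.items()
    }


# Made-up counts: P1 is rare and specific to D1, P2 is common everywhere.
cooc = {("D1", "P1"): 30, ("D1", "P2"): 30}
ranked = rank_phenotypes(cooc, {"D1": 50}, {"P1": 40, "P2": 500}, n_total=1000)
```

With these toy counts the rare, specific phenotype P1 ranks above the common phenotype P2, even though both co-occur with D1 equally often.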