Maulik R. Kamdar and Michel Dumontier

Introduction & Objectives

Ebola virus (EBOV; formerly designated Zaire ebolavirus) is a lethal Category A human pathogen of the family Filoviridae and the causative agent of Ebola virus disease (EVD). EVD can cause severe hemorrhagic fever and has an average case fatality rate of 71%. The ongoing EVD epidemic, which began in Guinea in February 2014, has spread exponentially across 5 other countries in Western Africa and has infected at least 9380 people (as of February 15, 2015). (http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/index.html)

The Viral Hemorrhagic Fever Consortium sequenced a set of 99 EBOV genome sequences from 78 confirmed patients in Sierra Leone to 2000x coverage. (BioProject: http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA257197)

There is a dire need to consolidate and integrate all available knowledge on the EBOV genome, whether it is already in hand or can be retrieved from open-access knowledge bases and the available literature. This could lead to a better understanding of the mechanisms underlying EVD, determination of the druggability of target domains in EBOV, and identification of small molecules that bind those targets with favorable affinity. The information required to pursue these goals is not yet available from a single, aggregated source. As a consequence, biomedical researchers must visit several data portals to retrieve relevant knowledge before they can use it to formulate hypotheses.
Ebola-KB Vocabulary

Availability
http://ebola.semanticscience.org

Linked Data

Bio2RDF Release 3 (http://bio2rdf.org) has 11B triples from 35 biomedical sources and is cross-linked with several other data sources, such as Chem2Bio2RDF and MGD. It can be used to develop bio-mashups and Linked Biomedical Dataspaces that facilitate in silico drug discovery.
Reference: http://lod-cloud.net/

Use Cases

- Retrieve knowledge from KEGG and DrugBank on small molecule ligands which bind to the Ebola protein ‘[AIG96339.1] Polymerase’
- Identify the biological activities and associated scientific publications of the EBOV Gene ‘[AIG96339.1] Polymerase’
- Obtain scientific publications that provide evidence for the binding of the ligands ‘[RFP] Rifampicin’, ‘[RBT] Rifabutin’ and ‘[RPT] Rifapentine’

Table: Knowledge on potential EBOV ‘[AIG96639.1] Polymerase’ Protein-binding Ligands retrieved from KEGG and DrugBank using the Ebola-KB endpoint

Rifabutin [DB00615]: Antibacterial [antimycobacterial]; Rifamycins; DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
Rifampicin [DB01045]: Antibacterial; Ribosome; Rifamycins; DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor; 23S rRNA of 50S ribosomal subunit, protein synthesis inhibitor
Rifapentine [DB01201]: Antibacterial; Rifamycins; DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
Packagers (in the order listed): Pharmacia Inc.; Kaiser Foundation Hospital; Pfizer Inc.; Novartis AG; Eon Labs; Sanofi-Aventis Inc.; UDL Laboratories; Mckesson Corp.; Ciba Geigy Ltd. and 30 others; Sanofi-Aventis Inc.; Gruppo Lepetit SPA

CONSTRUCT {
  ?bioUri ebola:chemicalName ?title ;
          ebola:molecularWeight ?molWeight ;
          ebola:molecularFormula ?formula ;
          graph:pdb-page ?pdbInfo ;
          graph:drugbank-label ?drugbankLabel ;
          graph:packager ?packagerTitle ;
          graph:mechanism-of-action ?mechAction ;
          graph:pharmacology ?pharmacologyDesc
}
WHERE {
  <http://bio2rdf.org/genbank:AIG96639.1> ebola:hasKeyword ?keyword .
  ?keyword ebola:x-pdb ?pdbUri .
  ?pdbUri ebola:hasLigand ?bioUri ;
          ebola:pdbPage ?pdbInfo .
  ?bioUri ebola:chemicalName ?title ;
          ebola:molecularWeight ?molWeight ;
          ebola:molecularFormula ?formula ;
          ebola:x-drugbank ?drugbankUri .
  ?drugbankUri rdfs:label ?drugbankLabel
  FILTER( xsd:double( ?molWeight ) < 500 ) .
  {
    SELECT ?drugbankUri ?mechAction ?packagerTitle ?pharmacologyDesc
    WHERE {
      SERVICE <http://cu.drugbank.bio2rdf.org/sparql> {
        ?drugbankUri drugbank:mechanism-of-action ?action ;
                     drugbank:packager ?packager ;
                     drugbank:pharmacology ?pharmacology .
        ?action dc:description ?mechAction .
        ?packager dc:title ?packagerTitle .
        ?pharmacology dc:description ?pharmacologyDesc
      }
    }
    GROUP BY ?drugbankUri
  }
}

Listing: SPARQL CONSTRUCT Query for Use Case 2

Ebola-KB Dashboard

EBOV Genomic Wheel

System Architecture
Data sources: NCBI Gene, PubMed, InterPro, PDB, Gene Ontology, DrugBank, KEGG

Future Work

- Include PubChem information on BioAssays and on the activities of small molecules that bind potential virus targets in the Ebola-KB, by querying the NCBI E-utilities with specific EBOV keywords.
- Investigate methods that predict small-molecule binding sites on proteins of known or unknown structure, given a protein sequence.
- Use mouse model phenotypes to study the binding profiles of the aggregated molecules against the EBOV targets.
- Conduct a subjectivist, user-driven study evaluating the Ebola-KB and the Ebola-KB Dashboard in a clinical setting.

Discussion: Challenges and Limitations

Because sequencing of the 2014 strain of Zaire ebolavirus was completed only recently, up-to-date and integrated knowledge on gene functions, protein interactions and the activities of binding ligands is lacking. Some popular knowledge bases, such as STITCH (a resource for chemical-protein interaction networks), provided no information on small molecules that bind the EBOV protein sequences or other similar proteins. Very few EBOV InterPro domains had been annotated with Gene Ontology terms.
Our approach of generating EBOV keywords and using them as search terms for PDB and PubMed has introduced some ‘noise’ into the Ebola-KB, for example information on ligands binding ‘DNA Polymerase’ in other species, which may not be useful. More rigorous protein- and domain-similarity features could be used instead. Querying requires user knowledge of SPARQL and depends on the availability and uptime of SPARQL endpoints, which cannot always be guaranteed.

This work was funded by a Stanford University start-up fund to Michel Dumontier.
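
The discussion notes that querying requires SPARQL knowledge and depends on endpoint uptime. One way to soften both is a thin client that builds the query from a protein URI and retries on network failure. The following is only a minimal sketch under stated assumptions, not the Ebola-KB implementation: the endpoint path (`/sparql`), the namespace bound to the `ebola:` prefix, and every function name are hypothetical placeholders. Only the predicate names and the molecular-weight filter are taken from the listing.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumptions (not given on the poster): the endpoint path and the
# namespace behind the `ebola:` prefix are placeholders.
ENDPOINT = "http://ebola.semanticscience.org/sparql"  # hypothetical path
EBOLA_NS = "http://example.org/ebola-vocabulary#"     # placeholder namespace


def build_ligand_query(protein_uri, max_mol_weight=500.0):
    """Return a SELECT query for ligands linked to one protein via PDB."""
    return f"""
PREFIX ebola: <{EBOLA_NS}>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ligand ?name ?molWeight WHERE {{
  <{protein_uri}> ebola:hasKeyword ?keyword .
  ?keyword ebola:x-pdb ?pdb .
  ?pdb ebola:hasLigand ?ligand .
  ?ligand ebola:chemicalName ?name ;
          ebola:molecularWeight ?molWeight .
  FILTER(xsd:double(?molWeight) < {max_mol_weight})
}}
"""


def http_fetch(endpoint, query, timeout=10.0):
    """Send the query with HTTP GET, asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    OSError covers urllib's URLError as well as socket timeouts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # linear back-off between attempts


def parse_bindings(payload):
    """Flatten SPARQL 1.1 JSON results into a list of plain dicts."""
    rows = json.loads(payload)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in rows]


def ligands_for(protein_uri, fetch=http_fetch):
    """Build the query, run it with retries, and return the result rows."""
    query = build_ligand_query(protein_uri)
    payload = fetch_with_retry(lambda: fetch(ENDPOINT, query))
    return parse_bindings(payload)
```

With a reachable endpoint, `ligands_for("http://bio2rdf.org/genbank:AIG96639.1")` would return one dictionary per matching ligand. Passing the fetch function as a parameter keeps the retry logic testable offline and makes it easy to point the same code at a mirror when the primary endpoint is down.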