Ebola Poster - Maulik Kamdar

Maulik R. Kamdar and Michel Dumontier
Introduction & Objectives
• Ebola virus (EBOV; formerly designated Zaire ebolavirus) is a lethal Category A human pathogen of the family Filoviridae and the causative agent of Ebola virus disease (EVD). EVD can cause severe hemorrhagic fever and has an average Case Fatality Rate of 71%.
• The ongoing EVD epidemic, which began in Guinea in February 2014, has spread exponentially across 5 other countries in Western Africa and has infected at least 9380 people (as of February 15, 2015). (http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/index.html)
• The Viral Hemorrhagic Fever Consortium sequenced a set of 99 EBOV genome sequences from 78 confirmed patients in Sierra Leone to 2000x coverage. (BioProject: http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA257197)
• There is a pressing need to consolidate and integrate all available knowledge on the EBOV genome, both what we currently possess and what can be retrieved from open-access knowledge bases and the available literature.
• This could lead to a better understanding of the underlying mechanisms of EVD, determination of the druggability of target domains in EBOV, and identification of small molecules that could show positive binding affinity.
• The information required to pursue these goals is not yet available from a single, aggregated source. As a consequence, biomedical researchers have to traverse several data portals to retrieve relevant knowledge before using it to formulate hypotheses.
Ebola-KB Vocabulary
Availability
http://ebola.semanticscience.org
Use Cases
• Retrieve knowledge from KEGG and DrugBank on small molecule ligands which bind to the Ebola protein ‘[AIG96339.1] Polymerase’
• Identify the biological activities and associated scientific publications of the EBOV Gene ‘[AIG96339.1] Polymerase’
• Obtain scientific publications that provide evidence for the binding of the ligands ‘[RFP] Rifampicin’, ‘[RBT] Rifabutin’ and ‘[RPT] Rifapentine’
Linked Data
Reference: http://lod-cloud.net/
• Bio2RDF Release 3 (http://bio2rdf.org) has 11 billion triples from 35 biomedical sources.
• Cross-linked with several data sources, such as Chem2Bio2RDF and MGD.
• Develop Bio-mashups and Linked Biomedical Dataspaces facilitating in silico drug discovery
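Bio2RDF identifies each record with a URI of the form http://bio2rdf.org/<namespace>:<identifier>, as in the genbank:AIG96639.1 resource used in the query listing. A minimal Python sketch of that convention follows; the helper name bio2rdf_uri is hypothetical and is not part of any Bio2RDF library.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a Bio2RDF-style URI: http://bio2rdf.org/<namespace>:<identifier>.

    Hypothetical helper for illustration; it only concatenates strings and
    does not check that the record exists.
    """
    if not namespace or not identifier:
        raise ValueError("namespace and identifier must be non-empty")
    # Namespaces are written in lowercase in the poster's examples.
    return f"{BIO2RDF_BASE}{namespace.lower()}:{identifier}"


if __name__ == "__main__":
    # Protein and ligand identifiers taken from the poster.
    print(bio2rdf_uri("genbank", "AIG96639.1"))
    print(bio2rdf_uri("drugbank", "DB00615"))
```

Keeping the URI pattern in one place makes it easier to cross-link identifiers from different sources without hand-writing each URI.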
Rifabutin [DB00615]
  Antibacterial [antimycobacterial]
  Rifamycins
  DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
  Packagers: Pharmacia Inc.; Kaiser Foundation Hospital; Pfizer Inc.
Rifampicin [DB01045]
  Antibacterial
  Ribosome
  Rifamycins
  DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
  23S rRNA of 50S ribosomal subunit, protein synthesis inhibitor
  Packagers: Novartis AG; Eon Labs; Sanofi-Aventis Inc.; UDL Laboratories; Mckesson Corp.; Ciba Geigy Ltd.; and 30 others
Rifapentine [DB01201]
  Antibacterial
  Rifamycins
  DNA-dependent RNA polymerase inhibitor, RNA synthesis inhibitor
  Packagers: Sanofi-Aventis Inc.; Gruppo Lepetit SPA
# Prefixes (ebola:, graph:, drugbank:, dc:, rdfs:, xsd:) are assumed to be predefined at the endpoint.
CONSTRUCT {
  ?bioUri ebola:chemicalName ?title ;
          ebola:molecularWeight ?molWeight ;
          ebola:molecularFormula ?formula ;
          graph:pdb-page ?pdbInfo ;
          graph:drugbank-label ?drugbankLabel ;
          graph:packager ?packagerTitle ;
          graph:mechanism-of-action ?mechAction ;
          graph:pharmacology ?pharmacologyDesc
} WHERE {
  # Follow the protein's keywords to PDB entries and their bound ligands
  <http://bio2rdf.org/genbank:AIG96639.1> ebola:hasKeyword ?keyword .
  ?keyword ebola:x-pdb ?pdbUri .
  ?pdbUri ebola:hasLigand ?bioUri ;
          ebola:pdbPage ?pdbInfo .
  ?bioUri ebola:chemicalName ?title ;
          ebola:molecularWeight ?molWeight ;
          ebola:molecularFormula ?formula ;
          ebola:x-drugbank ?drugbankUri .
  ?drugbankUri rdfs:label ?drugbankLabel .
  # Keep only small molecules (molecular weight below 500)
  FILTER( xsd:double( ?molWeight ) < 500 )
  {
    # Federated subquery against the Bio2RDF DrugBank endpoint.
    # DISTINCT removes duplicate rows; no projected variable is aggregated.
    SELECT DISTINCT ?drugbankUri ?mechAction ?packagerTitle ?pharmacologyDesc
    WHERE {
      SERVICE <http://cu.drugbank.bio2rdf.org/sparql> {
        ?drugbankUri drugbank:mechanism-of-action ?action ;
                     drugbank:packager ?packager ;
                     drugbank:pharmacology ?pharmacology .
        ?action dc:description ?mechAction .
        ?packager dc:title ?packagerTitle .
        ?pharmacology dc:description ?pharmacologyDesc
      }
    }
  }
}
Listing: SPARQL CONSTRUCT Query for Use Case 2
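A query such as the one in the listing can be submitted over HTTP using the SPARQL 1.1 Protocol, which carries the query text in a `query` parameter. The sketch below only assembles the GET request URL with the Python standard library and sends nothing. The endpoint `http://example.org/sparql` is a placeholder (the poster does not state the Ebola-KB endpoint path), and the function name and the shortened query are illustrative assumptions.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the actual SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Shortened illustrative query; the ebola: prefix is assumed to be
# predefined at the endpoint, as in the poster's listing.
QUERY = """
SELECT ?keyword WHERE {
  <http://bio2rdf.org/genbank:AIG96639.1> ebola:hasKeyword ?keyword .
} LIMIT 10
"""


def build_sparql_get_url(endpoint, query, default_graph=None):
    """Return a SPARQL 1.1 Protocol GET URL for the given query string."""
    params = {"query": query.strip()}
    if default_graph:
        # Optional protocol parameter naming the default graph.
        params["default-graph-uri"] = default_graph
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_sparql_get_url(ENDPOINT, QUERY))
```

The resulting URL can be passed to any HTTP client. An `Accept` header such as `application/sparql-results+json` would typically be added for SELECT queries.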
Table: Knowledge on potential EBOV ‘[AIG96639.1] Polymerase’
Protein-binding Ligands retrieved from KEGG and DrugBank using
the Ebola-KB endpoint
Ebola-KB Dashboard
EBOV Genomic Wheel
System Architecture
Data sources: NCBI Gene, PubMed, InterPro, PDB, Gene Ontology, DrugBank, KEGG
Future Work
• Include PubChem information on BioAssays and activities of small molecules which bind potential virus targets in the Ebola-KB, by querying the NCBI E-utilities with specific EBOV keywords.
• Explore methods that predict small molecule binding sites on proteins with a known or unknown structure, given a protein sequence.
• Use Mouse Model Phenotypes to study the binding profiles of the aggregated molecules against the EBOV targets.
• Conduct a subjectivistic, user-driven study by evaluating the Ebola-KB and the Ebola-KB Dashboard in a clinical setting.
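The first item above proposes querying the NCBI E-utilities with EBOV keywords. As a rough sketch, an ESearch URL for the PubChem BioAssay database (Entrez database name `pcassay`) could be assembled as follows. The keyword list and the function name are illustrative assumptions, and no request is sent.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(keywords, db="pcassay", retmax=20):
    """Build an ESearch URL that ORs together quoted keyword phrases."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    term = " OR ".join(f'"{k}"' for k in keywords)
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative keywords only; the poster does not list its EBOV keywords.
    print(esearch_url(["Ebola virus", "Zaire ebolavirus"]))
```

ESearch returns matching record identifiers, which would then need a follow-up call (for example ESummary or EFetch) to retrieve assay details.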
Discussion: Challenges and Limitations
• Because sequencing of the 2014 strain of Zaire ebolavirus was completed only recently, there is a lack of up-to-date, integrated knowledge pertaining to gene functions, protein interactions and activities of binding ligands.
• Some popular knowledge bases, such as STITCH (a resource for chemical-protein interaction networks), did not provide any information on small molecules that bind the EBOV protein sequences or other similar proteins.
• Very few EBOV InterPro domains had been annotated with Gene Ontology terms.
• Our approach of generating EBOV keywords and using them as search terms for PDB and PubMed has introduced some ‘noise’ into the Ebola-KB, for example information on ligands binding ‘DNA Polymerase’ in other species, which may not be useful. More rigorous protein- and domain-similarity features could be used.
• Querying requires user knowledge of SPARQL and depends on the availability and uptime of SPARQL endpoints, which cannot always be guaranteed.
This work was funded by a Stanford University start-up fund to Michel Dumontier.