Entrez's PubChem

PubChem BioAssay – Connecting chemical genomics and genomic information resources at NCBI
Yanli Wang
Biocuration 2015
Beijing, China
April 25, 2015
Battle against disease …
No sharing
Data was locked in
PubChem …
http://pubchem.ncbi.nlm.nih.gov
Small molecule and RNAi
Three connected databases
… A public resource providing chemical structures and comprehensive information on the biological activities of small molecules and other reagents, including RNAi
PubChem goals
• serve as a public repository for molecular structures and bioassay data
• integrate information from various sources to provide value-added annotation
• develop data search, analysis, and retrieval tools to facilitate the use of chemical structures and bioactivity results
PubChem history: the Molecular Libraries Initiative (MLI)
(http://mli.nih.gov/mli/, 2004-2014)
• identify and develop small-molecule chemical probes through high-throughput screening (HTS)
• expand the use of chemical probes in basic research (e.g., understanding unknown biological systems and rare diseases)
• all data must be deposited in PubChem and made publicly available
Science, 2004, 306:1138-1139
PubChem: going beyond the MLI
300 sources for substances
60 sources for bioassays
16 MLI sources
• academic institutions (Broad Institute, Harvard U., Johns Hopkins U.)
• government agencies (FDA, EPA, NCATS, NIAID, NIST)
• pharmaceutical companies (Sigma-Aldrich, GlaxoSmithKline, Abbott)
• journal publishers (Nature Publishing Group, Web of Science)
• database providers (DrugBank, ChEMBL, KEGG, SureChem, NMRShiftDB)
• other
See full list at: http://pubchem.ncbi.nlm.nih.gov/sources/
PubChem is a community hub
PubChem growth: substances and chemical structures
PubChem growth: BioAssay data entries (AID count)
1,000,000 bioassay records
• HTS experiments
• Chemical biology research
• Medicinal chemistry studies
• Toxicology
• Literature-based data
• Selectivity profiling
• Physicochemical property profiling
• Small molecule
• RNAi
Literature-Based BioAssay Data Content – Data from 54,000 publications
Journal name | Unique PubMed IDs (PMIDs) | BioAssay records (AIDs)
Journal of medicinal chemistry | 17516 | 319947
Bioorganic & medicinal chemistry | 15313 | 166466
Journal of natural products | 5438 | 41378
Bioorganic & medicinal chemistry letters | 4954 | 67392
European journal of medicinal chemistry | 3192 | 51580
Antimicrobial agents and chemotherapy | 2125 | 67750
The Journal of biological chemistry | 179 | 3410
Nature chemical biology | 171 | 6559
ACS medicinal chemistry letters | 155 | 2524
Proceedings of the National Academy of Sciences of the United States of America | 117 | 4154
The Journal of pharmacology and experimental therapeutics | 99 | 663
Science (New York, N.Y.) | 17 | 970
Nature | 14 | 501
Literature-Based BioAssay Data Content – RNAi screens
Journal name | Unique PubMed IDs (PMIDs) | BioAssay records (AIDs)
Nature | 4 | 5
Science (New York, N.Y.) | 4 | 4
Proceedings of the National Academy of Sciences of the United States of America | 3 | 3
Nature cell biology | 3 | 3
Genes & development | 2 | 2
Molecular cell | 2 | 2
Cancer research | 1 | 2
Science signaling | 1 | 1
The Journal of biological chemistry | 1 | 1
The Journal of cell biology | 1 | 1
Genome research | 1 | 1
Nature genetics | 1 | 1
Journal of cell science | 1 | 2
Molecular biology of the cell | 1 | 1
PLoS genetics | 1 | 1
Growth in PubChem Usage
PubChem Usage
PubChem links to many other databases
• PubMed (a literature database)
• Protein
• OMIM
• BioSystems (a pathway database)
• Gene
• MeSH
• Nucleotide
• Depositor links
• GEO
• Taxonomy
• CDD (a conserved domain database)
• Structure (a mirror of the Protein Data Bank, PDB)
Protein BioAssay Target …
Protein class for BioAssay targets
Enzyme (4399)
Membrane receptor (705)
Ion channel (409)
Transcription factor (224)
Transporter (180)
Epigenetic regulator (166)
Secreted protein (63)
Structural protein (53)
Auxiliary transport protein (32)
Surface antigen (20)
Adhesion (15)
Other cytosolic protein (256)
Other membrane protein (18)
Other nuclear protein (18)
Unclassified protein (1170)
Protein class for MLP probe targets
Enzyme (27)
Membrane receptor (25)
Transporter (6)
Transcription factor (5)
Ion channel (3)
Epigenetic regulator (1)
Other cytosolic protein (1)
Unclassified protein (19)
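To compare the two protein-class breakdowns above, the counts can be tallied and converted to percentages. The sketch below copies the counts from the two lists; the function name `class_shares` is hypothetical, and it assumes each protein is counted in exactly one class, so the totals are class assignments rather than a verified count of unique proteins.

```python
from typing import Dict, Tuple

# Counts copied from the two protein-class lists on this slide.
BIOASSAY_TARGETS = {
    "Enzyme": 4399, "Membrane receptor": 705, "Ion channel": 409,
    "Transcription factor": 224, "Transporter": 180,
    "Epigenetic regulator": 166, "Secreted protein": 63,
    "Structural protein": 53, "Auxiliary transport protein": 32,
    "Surface antigen": 20, "Adhesion": 15, "Other cytosolic protein": 256,
    "Other membrane protein": 18, "Other nuclear protein": 18,
    "Unclassified protein": 1170,
}
MLP_PROBE_TARGETS = {
    "Enzyme": 27, "Membrane receptor": 25, "Transporter": 6,
    "Transcription factor": 5, "Ion channel": 3, "Epigenetic regulator": 1,
    "Other cytosolic protein": 1, "Unclassified protein": 19,
}


def class_shares(counts: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the total count and each class's share in percent (1 decimal)."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    for label, counts in (("BioAssay targets", BIOASSAY_TARGETS),
                          ("MLP probe targets", MLP_PROBE_TARGETS)):
        total, shares = class_shares(counts)
        largest = max(shares, key=shares.get)
        print(f"{label}: {total} class assignments; "
              f"largest class {largest} ({shares[largest]}%)")
```

With these counts, enzymes account for about 57% of BioAssay target class assignments but only about 31% of MLP probe targets, where membrane receptors follow closely at about 29%.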
Biological Pathways for Protein Target …
BioSystems name | KEGG ID (conserved pathway) | Count of genes
Neuroactive ligand-receptor interaction | ko04080 | 623
Calcium signaling pathway | ko04020 | 329
cAMP signaling pathway | ko04024 | 299
PI3K-Akt signaling pathway | ko04151 | 279
MAPK signaling pathway | ko04010 | 252
Ribosome | ko03010 | 220
Proteoglycans in cancer | ko05205 | 194
cGMP-PKG signaling pathway | ko04022 | 190
Focal adhesion | ko04510 | 189
Rap1 signaling pathway | ko04015 | 186
Oxytocin signaling pathway | ko04921 | 181
Retrograde endocannabinoid signaling | ko04723 | 168
Inflammatory mediator regulation of TRP channels | ko04750 | 168
HTLV-I infection | ko05166 | 165
Vascular smooth muscle contraction | ko04270 | 165
Chemokine signaling pathway | ko04062 | 163
Alzheimer's disease | ko05010 | 161
Epstein-Barr virus infection | ko05169 | 155
Adrenergic signaling in cardiomyocytes | ko04261 | 155
Dopaminergic synapse | ko04728 | 153
Chemical Probes
• F2RL3 antagonist: IC50 0.139 uM (CID: 2333)
• CHRM5 antagonist: IC50 0.44 uM (CID: 42519285)
• mGlu5 positive allosteric potentiator: EC50 2.411 uM (CID: 1318633)
• STAR inhibitor: IC50 2.12 uM (CID: 45100448)
• EGFR inhibitor: IC50 0.7079 uM (CID: 2303746)
• mGluR3 modulator: IC50 2.611 uM (CID: 60210836)
• Thyroid Hormone Receptor / Steroid Receptor Coregulator 2 interaction inhibitor: potency 1.4 uM (CID: 5184800)
• MRGPRX1 allosteric activator: EC50 0.19 uM (CID: 71598556)
… 200 more
A Small Molecule BioAssay Record…
http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=1284
BioAssay Descriptions & Data …
BioAssay Target …
An RNAi BioAssay Record…
http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=720703
Gene target
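Record pages like the ones above are addressed by their AID. A minimal sketch that builds such URLs: `legacy_assay_url` reproduces the 2015 page pattern shown on these slides (the live site may since have changed), and `assay_description_url` assumes PubChem's PUG REST programmatic interface, which this talk does not cover. Both function names are hypothetical.

```python
from urllib.parse import urlencode

# Record-page pattern as shown on the slides (2015); the live site may differ.
LEGACY_ASSAY_PAGE = "http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi"
# Assumption: base URL of PubChem's PUG REST service (not part of this talk).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def legacy_assay_url(aid: int) -> str:
    """Build a BioAssay record-page URL in the form used on the slides."""
    return f"{LEGACY_ASSAY_PAGE}?{urlencode({'aid': aid})}"


def assay_description_url(aid: int, fmt: str = "JSON") -> str:
    """Build an assumed PUG REST URL for an assay's description."""
    return f"{PUG_REST}/assay/aid/{aid}/description/{fmt}"


if __name__ == "__main__":
    for aid in (1284, 720703):  # small-molecule and RNAi examples from the slides
        print(legacy_assay_url(aid))
        print(assay_description_url(aid))
```

The code only builds strings, so it runs offline; fetching the URLs is left to whatever HTTP client you prefer.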
PubChem Search
http://PubChem.ncbi.nlm.nih.gov/
http://www.ncbi.nlm.nih.gov/
BioAssay search by target name
BioAssay from NCBI Protein search
GI, accession, UniProt
BioAssay from NCBI Gene search
Select RNAi data
BioAssay tool: related bioassays
Assays related by target
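The target-based searches above use the web interface. For scripted access, the sketch below assumes PubChem's PUG REST service exposes an assay-by-target route; the target type names (`gi`, `accession`, `geneid`, `genesymbol`, `proteinname`) are assumptions to check against the PUG REST documentation, and the helper names are hypothetical. It only builds the query URL and parses a plain-text reply assumed to list one AID per line, so it runs without network access.

```python
from typing import List
from urllib.parse import quote

# Assumption: PUG REST base URL and its assay-by-target route.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
# Assumed target identifier types; verify against the PUG REST documentation.
TARGET_TYPES = {"gi", "accession", "geneid", "genesymbol", "proteinname"}


def aids_by_target_url(target_type: str, value: str, fmt: str = "TXT") -> str:
    """Build a URL asking for the AIDs of assays that test a given target."""
    if target_type not in TARGET_TYPES:
        raise ValueError(f"unsupported target type: {target_type!r}")
    # Percent-encode the value so protein names with spaces stay in one path segment.
    return f"{PUG_REST}/assay/target/{target_type}/{quote(value, safe='')}/aids/{fmt}"


def parse_aid_list(text: str) -> List[int]:
    """Parse a plain-text reply assumed to contain one AID per line."""
    return [int(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(aids_by_target_url("genesymbol", "EGFR"))
    print(parse_aid_list("1284\n720703\n"))
```

Rejecting unknown target types up front keeps a typo from silently producing a URL the service would refuse.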
Upload screening results
• Spreadsheet load
• File or web upload
• Include all test results, e.g., an article table
• Auto-validations
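The auto-validation step can be pictured as a small pre-upload check. This is not PubChem's validator: the column names (`compound_id`, `outcome`, `ic50_uM`) and the allowed outcome labels are hypothetical and were chosen only to show the kind of row-level checks a spreadsheet load might run before deposition.

```python
import csv
import io
from typing import List

# Hypothetical template: these column names and outcome labels are
# illustrative only and are not PubChem's actual deposition format.
REQUIRED_COLUMNS = ("compound_id", "outcome", "ic50_uM")
ALLOWED_OUTCOMES = {"active", "inactive", "inconclusive"}


def validate_results(text: str) -> List[str]:
    """Return a list of problems found in CSV test results (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    for row_num, row in enumerate(reader, start=2):  # row 1 is the header
        # Short rows leave missing cells as None, so normalize them to "".
        cells = {c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        if not cells["compound_id"]:
            problems.append(f"row {row_num}: empty compound_id")
        if cells["outcome"].lower() not in ALLOWED_OUTCOMES:
            problems.append(f"row {row_num}: unknown outcome {cells['outcome']!r}")
        if cells["ic50_uM"]:  # a blank value is allowed, e.g. for inactives
            try:
                if float(cells["ic50_uM"]) <= 0:
                    problems.append(f"row {row_num}: ic50_uM must be positive")
            except ValueError:
                problems.append(f"row {row_num}: ic50_uM is not a number")
    return problems


if __name__ == "__main__":
    sample = "compound_id,outcome,ic50_uM\n2333,active,0.139\n,maybe,abc\n"
    for problem in validate_results(sample):
        print(problem)
```

Collecting every problem instead of stopping at the first lets a depositor fix a whole table in one pass.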
On-hold Record Flexibility
• Time the public release of your data
• Satisfy journals and funding agencies
• Receive PubChem IDs up front
• Create View Keys to share URLs privately with collaborators
• Keep data on hold as long as necessary
Acknowledgement
Steve Bryant
Evan Bolton
Ben Shoemaker
Jie Chen
Paul Thiessen
Tiejun Cheng
Jiyao Wang
Gang Fu
Bo Yu
Volker Haehnke
Jian Zhang
Lewis Geer
Renata Geer
Asta Gindulyte
Lianyi Han
Jane He
Siqian He
Sunghwan Kim