MyVariant.info

Community-aggregated Variant Annotations As a Service
Chunlei Wu, Ph.D.
[email protected]
@chunleiwu
Associate Professor of Molecular Medicine
Dept. of Molecular Experimental Medicine
The Scripps Research Institute
La Jolla, CA, USA
Biocuration 2015
04/24/2015
Core team:
Ben Ainscough
Chunlei Wu
Chris Mungall
Robin Haw
Sean Mooney
Trish Whetzel
Variant annotation: a million-dollar task
Observed genetic variants → Clinical guidance & cure for diseases
So many variant annotation resources
dbNSFP
Tools for Predicting Deleterious Mutations
• Dmutant
• NETdiseaseSNP
• nsSNPAnalyzer
• SNAP
• PhD-SNP
• UMD-predictor
• SNPs&GO
• WoLF-PSORT
• PolyPhen
• MAPP
• SIFT
• Saunders & Baker
• SNPs3D
• PON-P
• SAAP
• Condel
• CanPredict
• SAPred
• AutoMute
• SNPinfo/FuncPred
• CUPSAT
• FoldX
• CHASM
• SNPeffect
• PantherPSEC
• Mutationassessor.org (Xvar)
• Pmut
• MultiMutate
• Align-GVGD
• Mupro
• LS_SNP/PDB
• SKIPPY
• MutPred
• VAAST
Modified from Dr. Gary Bader’s talk at #isb2014
Challenges?
Challenges - 1
Fragmented
variant annotations
Challenges - 2
Repeated effort
parsing/aggregating flatfiles
very error-prone
burden of data upgrade
Challenges - 3
Hard for users
to contribute
to data repositories
Common procedure for variant data retrieval
Data sources (dbSNP, ClinVar, EVS, ...) → Data parsing (update regularly) → Merged Variant Data
Common procedure for variant data retrieval: As A Service?
Data sources (dbSNP, ClinVar, EVS, ...) → Data parsing (update regularly) → Merged Variant Data
Making web services suitable for realtime applications

• Fast
• Always ON
• Up-to-date
• Scalable
• Extensible
A variant annotation represented in JSON
{
"chrom": "1",
"hg19": {
"start": 196659237,
"end": 196659237
},
"ref": "C",
"alt": "T",
"tumor_site": "breast",
"mut_freq": 0.49,
"mut_nt": "C>T",
"cosmic_id": "COSM424915"
}
• Rich data structures to represent heterogeneous data
• Friendly for both humans and computers
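As a minimal sketch of the "friendly for computers" point, the record above can be loaded with Python's standard `json` module and its nested fields read directly. The record is copied from the slide; nothing here calls MyVariant.info itself.

```python
import json

# The COSMIC-style annotation from the slide, as a JSON string.
record_json = """
{
  "chrom": "1",
  "hg19": {"start": 196659237, "end": 196659237},
  "ref": "C",
  "alt": "T",
  "tumor_site": "breast",
  "mut_freq": 0.49,
  "mut_nt": "C>T",
  "cosmic_id": "COSM424915"
}
"""

record = json.loads(record_json)

# Nested JSON objects become nested dicts; numbers keep their types.
position = record["hg19"]["start"]
is_single_base = record["hg19"]["start"] == record["hg19"]["end"]
print(record["cosmic_id"], record["chrom"], position, is_single_base)
```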
MyVariant.info variant objects
{
"_id": "chr1:g.196659237C>T",
"cosmic": {
"chrom": "1",
"hg19": {
"start": 196659237,
"end": 196659237
},
"ref": "C",
"alt": "T",
"tumor_site": "breast",
"mut_freq": 0.49,
"mut_nt": "C>T",
"cosmic_id": "COSM424915"
}
}
{
"_id": "chr1:g.196659237C>T",
"cadd": { … },
"clinvar": { … },
"cosmic": { … },
"dbsnp": { … },
"dbnsfp": { … },
"evs": { … },
"emv": { … },
"mutdb": { … },
"gwassnp": { … },
"snpedia": { … },
"wellderly": { … }
}
Variant annotations aggregated by matching HGVS names
Only genomic-based HGVS names are used (currently on hg19)
HGVS name examples
Table. Examples of HGVS (Human Genome Variation Society) nomenclature.
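Since the `_id` of every variant object is a genomic HGVS name, a small helper can build one from a chromosome, position, and alleles. The sketch below is illustrative only: `snv_hgvs_id` is a hypothetical name, and it covers single-nucleotide substitutions such as `chr1:g.196659237C>T`, not insertions or deletions.

```python
def snv_hgvs_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a genomic HGVS name for a single-nucleotide substitution.

    Hypothetical helper for illustration; it handles substitutions only.
    """
    bases = {"A", "C", "G", "T"}
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected a single-base substitution")
    # Accept "1" or "chr1" and normalise to the "chr"-prefixed form.
    chrom = chrom if chrom.startswith("chr") else "chr" + chrom
    return f"{chrom}:g.{pos}{ref}>{alt}"


hgvs_id = snv_hgvs_id("1", 196659237, "C", "T")
print(hgvs_id)
```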
Simple Aggregation mechanism
{
"_id": "chr1:g.196659237C>T",
"dbsnp": {
"snpclass": "single",
"rsid": "rs1061170",
"func": "missense"
}
}
{
"_id": "chr1:g.196659237C>T",
"cadd": { … },
"clinvar": { … },
"cosmic": { … },
"dbsnp": { … },
"dbnsfp": { … },
"evs": { … },
"emv": { … },
"mutdb": { … },
"gwassnp": { … },
"snpedia": { … },
"wellderly": { … }
}
{
"_id": "chr1:g.196659237C>T",
"cosmic": {
"tumor_site": "breast",
"mut_freq": 0.49
}
}
{
"_id": "chr1:g.196659237C>T",
"dbnsfp": {
"sift": {
"breast": "tolerated",
"val": 1
}
}
}
Further source fragments: "cadd", "clinvar", "evs", …, "mutdb"
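The aggregation idea can be sketched in a few lines: each source contributes a fragment keyed by the same HGVS `_id`, and fragments are folded into one object with one field per source. This is only an illustration of the mechanism, not the project's actual merging code; `merge_by_id` is a hypothetical name.

```python
from collections import defaultdict


def merge_by_id(fragments):
    """Merge per-source fragments into one object per "_id".

    Each fragment carries "_id" plus one or more source fields. Sources do
    not collide because each one lives under its own top-level key.
    """
    merged = defaultdict(dict)
    for fragment in fragments:
        variant = merged[fragment["_id"]]
        variant["_id"] = fragment["_id"]
        for source, annotation in fragment.items():
            if source != "_id":
                variant[source] = annotation
    return dict(merged)


fragments = [
    {"_id": "chr1:g.196659237C>T",
     "dbsnp": {"snpclass": "single", "rsid": "rs1061170", "func": "missense"}},
    {"_id": "chr1:g.196659237C>T",
     "cosmic": {"tumor_site": "breast", "mut_freq": 0.49}},
]
variants = merge_by_id(fragments)
print(sorted(variants["chr1:g.196659237C>T"]))
```

Keeping every source under its own key means a new source can be added without touching the fields of existing ones.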
A real example
A real example online
Syncing from data-hub to query instance
Data hub: Data sources (dbSNP, ClinVar, EVS, ...) → Merging
Query instance: Query Engine ← User queries
Syncing from data-hub to query instance
Data hub: Data sources (dbSNP, ClinVar, EVS, ...) → Merging
Query instance: Cluster of Query Engines ← User queries
MyVariant.info for the end users:
http://MyVariant.info
(currently v1 API, two endpoints)
• any query term(s) → http://MyVariant.info/v1/query?q=<query> → matching variant hits
• hgvs id(s) → http://MyVariant.info/v1/variant/<variantid> → matching variant object(s)
Simple API. No sign-up. No API key.
Try our live API; more docs are coming soon.
Both endpoints support batch mode via POST.
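A batch request might be prepared as below with Python's standard library. The slide only says that batch mode uses POST; the comma-separated `ids` form field is an assumption here, so check the API documentation before relying on it. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://myvariant.info/v1"


def batch_variant_request(hgvs_ids):
    """Build (but do not send) a POST request for several HGVS ids.

    Assumption: the ids go in a comma-separated "ids" form field.
    """
    body = urlencode({"ids": ",".join(hgvs_ids)}).encode("utf-8")
    return Request(
        BASE_URL + "/variant",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = batch_variant_request(["chr1:g.196659237C>T", "chr1:g.31349647C>T"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```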
Retrieving a single variant
http://myvariant.info/v1/variant/chr1:g.31349647C>T
Integrated annotations across
resources in well-formatted data
structure
Always up-to-date
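A minimal sketch of retrieving one variant from Python, using only the standard library. The helper names are hypothetical; the URL pattern is the one shown on the slide, with `>` percent-encoded so the id is safe inside a URL path. The network call is left commented out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://myvariant.info/v1"


def variant_url(hgvs_id: str) -> str:
    """Return the v1 variant URL, percent-encoding characters such as '>'."""
    return f"{BASE_URL}/variant/{quote(hgvs_id, safe=':')}"


def get_variant(hgvs_id: str, timeout: float = 10.0) -> dict:
    """Fetch one variant object (needs network access)."""
    with urlopen(variant_url(hgvs_id), timeout=timeout) as response:
        return json.load(response)


url = variant_url("chr1:g.31349647C>T")
print(url)
# variant = get_variant("chr1:g.31349647C>T")  # uncomment to call the live API
```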
Filtering returned fields
http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp
http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp.clinvar
http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp.clinvar,dbsnp.gmaf,clinvar.hgvs.coding
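The `fields` parameter takes dotted paths into the variant object. The sketch below builds such a URL and reads the same dotted paths back out of a returned object; both helper names are hypothetical, and the sample object and its `gmaf` value are made up for illustration.

```python
from urllib.parse import quote


def variant_url_with_fields(hgvs_id, fields):
    """Build a variant URL that asks only for the given dotted field paths."""
    url = "http://myvariant.info/v1/variant/" + quote(hgvs_id, safe=":")
    return url + "?fields=" + ",".join(fields)


def get_path(doc, dotted_path, default=None):
    """Read a nested value such as "dbsnp.gmaf" from a returned object."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


url = variant_url_with_fields("chr1:g.31349647C>T", ["dbnsfp", "dbsnp.gmaf"])
sample = {"_id": "chr1:g.31349647C>T", "dbsnp": {"gmaf": 0.01}}  # made-up value
print(url)
print(get_path(sample, "dbsnp.gmaf"), get_path(sample, "clinvar.hgvs.coding"))
```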
Making flexible queries
• All variants with dbNSFP annotation:
http://myvariant.info/v1/query?q=_exists_:dbnsfp
• All non-synonymous variants on gene "BTK":
http://myvariant.info/v1/query?q=dbnsfp.genename:BTK
• All variants within a genomic range:
http://myvariant.info/v1/query?q=chr1:69000-70000
• Query Wellderly variants together with other annotation sources:
http://myvariant.info/v1/query?q=_exists_:wellderly AND cadd.polyphen.cat:possibly_damaging&fields=wellderly,cadd.polyphen
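When these queries are issued from code, the query string should be URL-encoded, since it contains spaces and colons. A small sketch, assuming only the `q` and `fields` parameters shown above; `query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

QUERY_URL = "http://myvariant.info/v1/query"


def query_url(q, fields=None):
    """Build a v1 query URL; urlencode escapes spaces, ':' and ','."""
    params = {"q": q}
    if fields:
        params["fields"] = ",".join(fields)
    return QUERY_URL + "?" + urlencode(params)


examples = [
    query_url("_exists_:dbnsfp"),
    query_url("dbnsfp.genename:BTK"),
    query_url("chr1:69000-70000"),
    query_url("_exists_:wellderly AND cadd.polyphen.cat:possibly_damaging",
              fields=["wellderly", "cadd.polyphen"]),
]
for url in examples:
    print(url)
```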
Many more ways of querying, across resources

• Full-text queries
• Wildcard queries
• Range queries
• Boolean queries
• Regex queries
• Field existing/missing
• Faceting
• Paging
• Sorting
• Batch queries
• Support JSONP, CORS
• …
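Paging, for instance, could be driven from a client as sketched below. The slides do not name the paging parameters; `size` (page length) and `from` (zero-based offset) are assumptions here and should be checked against the API documentation. `page_urls` is a hypothetical helper and sends nothing over the network.

```python
from urllib.parse import urlencode


def page_urls(q, total, page_size=100):
    """Yield one query URL per page of results.

    Assumption: "size" sets the page length and "from" the zero-based offset.
    """
    for offset in range(0, total, page_size):
        params = {"q": q, "size": page_size, "from": offset}
        yield "http://myvariant.info/v1/query?" + urlencode(params)


urls = list(page_urls("dbnsfp.genename:BTK", total=250, page_size=100))
for url in urls:
    print(url)
```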
MyVariant.info stats
• total (286,219,908)
• dbNSFP (78,045,379; v2.9)
• dbSNP (110,234,210; v142)
• ClinVar (85,789; 20150323)
• EVS (1,977,300; v2)
• CADD (163,690,986; v1.2)
• MutDB (420,221)
• gwassnps (15,243; from UCSC)
• COSMIC (1,024,498; v68 from UCSC)
• DOCM (1,119)
• SNPedia (5,907)
• EMVClass (12,066)
• Wellderly (21,240,519)
As of April 2015
Use case 1
An easy resource to retrieve
well-structured variant
annotations
Use case 2
Direct queries integrated in
your analysis pipeline
An example workflow for variant prioritization
input variants
→ Filter for NS/SS variants: cadd.consequence:non_synonymous AND cadd.consequence:splice_site
→ Filter for rare variants: dbnsfp.exac.af<0.0001
→ Filter for common genes across samples
→ Sort by cadd scores: fields=cadd.phred,cadd.rawscore
→ output variants
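The filtering and sorting steps of this workflow can be sketched locally on already-retrieved hits. The records below are made up for illustration, the field names follow the slide, and the sketch treats "NS/SS" as either non-synonymous or splice-site (an assumption about the intent). The "common genes across samples" step is omitted because it needs per-sample data.

```python
# Made-up records shaped like trimmed query hits; all values are illustrative.
hits = [
    {"_id": "v1", "cadd": {"consequence": "non_synonymous", "phred": 23.1},
     "dbnsfp": {"exac": {"af": 0.00002}}},
    {"_id": "v2", "cadd": {"consequence": "synonymous", "phred": 5.0},
     "dbnsfp": {"exac": {"af": 0.00001}}},
    {"_id": "v3", "cadd": {"consequence": "splice_site", "phred": 31.0},
     "dbnsfp": {"exac": {"af": 0.00005}}},
    {"_id": "v4", "cadd": {"consequence": "non_synonymous", "phred": 28.0},
     "dbnsfp": {"exac": {"af": 0.2}}},
]


def prioritize(variants, max_af=0.0001):
    """Keep rare NS/SS variants and rank them by CADD phred, highest first."""
    wanted = {"non_synonymous", "splice_site"}
    kept = [
        v for v in variants
        if v["cadd"]["consequence"] in wanted
        and v["dbnsfp"]["exac"]["af"] < max_af
    ]
    return sorted(kept, key=lambda v: v["cadd"]["phred"], reverse=True)


ranked = [v["_id"] for v in prioritize(hits)]
print(ranked)
```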
Use case 3
For curators/data providers:
A platform for
• integrating with other resources (saving repeated effort)
• distributing your valuable data (under your own source field)
Use case 4
For variant curation itself:
• Identify discrepancies
• Serve as the basis for a community-engaged curation process
Base for the community curation
q=cadd.consequence:synonymous AND _exists_:dbnsfp
{
"_id": "chr1:g.196659237C>T",
"cadd": { … },
"clinvar": { … },
"cosmic": { … },
"dbsnp": { … },
"dbnsfp": { … },
"evs": { … },
"emv": { … },
"mutdb": { … },
"gwassnp": { … },
"snpedia": { … },
"wellderly": { … },
"curated": { … }
}
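One reading of the query above is a discrepancy check: variants that CADD labels synonymous but that also carry a dbNSFP record are candidates for curator review. The same check can be sketched locally on merged variant objects; `find_discrepancies` is a hypothetical helper and the objects are made up.

```python
def find_discrepancies(variants):
    """Return ids that CADD calls synonymous but that also have a dbNSFP field."""
    return [
        v["_id"] for v in variants
        if v.get("cadd", {}).get("consequence") == "synonymous" and "dbnsfp" in v
    ]


# Made-up variant objects for illustration only.
variants = [
    {"_id": "a", "cadd": {"consequence": "synonymous"}, "dbnsfp": {}},
    {"_id": "b", "cadd": {"consequence": "synonymous"}},
    {"_id": "c", "cadd": {"consequence": "non_synonymous"}, "dbnsfp": {}},
]
flagged = find_discrepancies(variants)
print(flagged)
```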
Extending to other biological entities
Gene (G), Pathway (P), Genetic Variant (V), Disease (D), Metabolite (M)
MyGene.info
Acknowledgement
TSRI: Andrew Su, Adam Mark, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng
STSI: Eric Topol, Ali Torkamani, Galina Erikson
UC Berkeley: Chris Mungall
Washington U:
Robin Haw
U. Washington:
Ben Ainscough
Obi Griffith
UCSD:
Sean Mooney
OICR:
Trish Whetzel
MyVariant.info
MyGene.info
Funding and Support
U54GM114833
R01GM083924