MyVariant.info
Community-aggregated Variant Annotations As a Service

Chunlei Wu, Ph.D.
@chunleiwu
Associate Professor of Molecular Medicine
Dept. of Molecular Experimental Medicine
The Scripps Research Institute, La Jolla, CA, USA
Biocuration 2015, 04/24/2015

Core team: Ben Ainscough, Chunlei Wu, Chris Mungall, Robin Haw, Sean Mooney, Trish Whetzel

Variant annotation: a million dollar task
Observed genetic variants
Clinical guidance & Cure for diseases

So many variant annotation resources
dbNSFP

Tools for Predicting Deleterious Mutations
• Dmutant • NETdiseaseSNP • nsSNPAnalyzer • SNAP • PhD-SNP • UMD-predictor
• SNPs&GO • WoLF-PSORT • PolyPhen • MAPP • SIFT • Saunders & Baker
• SNPs3D • PON-P • SAAP • Condel • CanPredict • SAPred
• AutoMute • SNPinfo/FuncPred • CUPSAT • FoldX • CHASM • SNPeffect
• PantherPSEC • Mutationassessor.org (Xvar) • Pmut • MultiMutate • Align-GVGD • Mupro
• LS_SNP/PDB • SKIPPY • MutPred • VAAST
Modified from Dr. Gary Bader's talk at #isb2014

Challenges?

Challenges - 1
Fragmented variant annotations

Challenges - 2
Repeated effort parsing/aggregating flatfiles
• very error-prone
• burden of data upgrade

Challenges - 3
It is hard for users to contribute to data repositories

Common procedure for variant data retrieval
Diagram: Data sources (dbSNP, ClinVar, EVS, ...), Data parsing, update regularly, Merged Variant Data

Common procedure for variant data retrieval: As A Service?
Diagram: Data sources (dbSNP, ClinVar, EVS, ...), Data parsing,
update regularly, Merged Variant Data

Making web services suitable for realtime applications
• Fast
• Always ON
• Up-to-date
• Scalable
• Extensible

A variant annotation represented in JSON
    {
      "chrom": "1",
      "hg19": { "start": 196659237, "end": 196659237 },
      "ref": "C",
      "alt": "T",
      "tumor_site": "breast",
      "mut_freq": 0.49,
      "mut_nt": "C>T",
      "cosmic_id": "COSM424915"
    }
• Rich data structures to represent heterogeneous data
• Friendly for both humans and computers

MyVariant.info variant objects
    {
      "_id": "chr1:g.196659237C>T",
      "cosmic": {
        "chrom": "1",
        "hg19": { "start": 196659237, "end": 196659237 },
        "ref": "C",
        "alt": "T",
        "tumor_site": "breast",
        "mut_freq": 0.49,
        "mut_nt": "C>T",
        "cosmic_id": "COSM424915"
      }
    }

    {
      "_id": "chr1:g.196659237C>T",
      "cadd": { … }, "clinvar": { … }, "cosmic": { … }, "dbsnp": { … },
      "dbnsfp": { … }, "evs": { … }, "emv": { … }, "mutdb": { … },
      "gwassnp": { … }, "snpedia": { … }, "wellderly": { … }
    }

Variant annotations aggregated by matching HGVS names
Only genomic-based HGVS names are used (currently on hg19)

HGVS name examples
Table . Examples of HGVS (Human Genome Variation Society) nomenclature.

Simple Aggregation mechanism
Per-source documents sharing one "_id":
    { "_id": "chr1:g.196659237C>T",
      "dbsnp": { "snpclass": "single", "rsid": "rs1061170", "func": "missense" } }

    { "_id": "chr1:g.196659237C>T",
      "cosmic": { "tumor_site": "breast", "mut_freq": 0.49 } }

    { "_id": "chr1:g.196659237C>T",
      "dbnsfp": { "sift": { "breast": "tolerated", "val": 1 } } }

    "cadd", "clinvar", "evs", …, "mutdb"
Merged variant object:
    {
      "_id": "chr1:g.196659237C>T",
      "cadd": { … }, "clinvar": { … }, "cosmic": { … }, "dbsnp": { … },
      "dbnsfp": { … }, "evs": { … }, "emv": { … }, "mutdb": { … },
      "gwassnp": { … }, "snpedia": { … }, "wellderly": { … }
    }

A real example

A real example online

Syncing from data-hub to query instance
Diagram: Data hub (Data sources: dbSNP, ClinVar, EVS, ...; Merging), Query instance (Query Engine),
User queries

Syncing from data-hub to query instance
Diagram: Data hub (Data sources: dbSNP, ClinVar, EVS, ...; Merging), Query instance Cluster (four Query Engines), User queries

MyVariant.info for the end users: http://MyVariant.info (currently v1 API, two endpoints)
• any query term(s): http://MyVariant.info/v1/query?q=<query> returns matching variant hits
• hgvs id(s): http://MyVariant.info/v1/variant/<variantid> returns matching variant object(s)
Simple API. No sign-up. No API key.
Try our live API; more docs are coming soon.
Both endpoints support batch mode via POST.

Retrieving a single variant
http://myvariant.info/v1/variant/chr1:g.31349647C>T
• Integrated annotations across resources in well-formatted data structure
• Always up-to-date

Filtering returned fields
http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp
http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp.clinvar
http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp.clinvar,dbsnp.gmaf,clinvar.hgvs.coding

Making flexible queries
• All variants with dbNSFP annotation:
  http://myvariant.info/v1/query?q=_exists_:dbnsfp
• All non-synonymous variants on gene "BTK":
  http://myvariant.info/v1/query?q=dbnsfp.genename:BTK
• All variants within a genomic range:
  http://myvariant.info/v1/query?q=chr1:69000-70000
• Query Wellderly variants together with other annotation sources:
  http://myvariant.info/v1/query?q=_exists_:wellderly AND cadd.polyphen.cat:possibly_damaging&fields=wellderly,cadd.polyphen

Many more ways of querying, across resources
• Full-text queries
• Wildcard queries
• Range queries
• Boolean queries
• Regex queries
• Field existing/missing
• Faceting
• Paging
• Sorting
• Batch queries
• Support JSONP, CORS
…

MyVariant.info stats
• total (286,219,908)
• dbNSFP (78,045,379; v2.9)
• dbSNP (110,234,210; v142)
• ClinVar (85,789; 20150323)
• EVS (1,977,300; v2)
• CADD (163,690,986; v1.2)
• MutDB (420,221)
• gwassnps (15,243; from UCSC)
• COSMIC (1,024,498; v68 from UCSC)
• DOCM (1,119)
• SNPedia (5,907)
• EMVClass (12,066)
• Wellderly (21,240,519)
As of April, 2015

Use case 1
An easy resource to retrieve well-structured variant annotations

Use case 2
Direct queries integrated in your analysis pipeline

An example workflow for variant prioritization
input variants
1. Filter for NS/SS variants: cadd.consequence:non_synonymous AND cadd.consequence:splice_site
2. Filter for rare variants: dbnsfp.exac.af<0.0001
3. Filter for common genes across samples
4. Sort by cadd scores: fields=cadd.phred,cadd.rawscore
output variants

Use case 3
For curator/data provider: a platform for
• integrating with other resources (saving repetitive efforts)
• distributing your valuable data (under your own source field)

Use case 4
For variant curation itself:
• Identify discrepancies
• Serve as the base of a community-engaged curation process

Base for the community curation
q=cadd.consequence:synonymous AND _exists_:dbnsfp
    {
      "_id": "chr1:g.196659237C>T",
      "cadd": { … }, "clinvar": { … }, "cosmic": { … }, "dbsnp": { … },
      "dbnsfp": { … }, "evs": { … }, "emv": { … }, "mutdb": { … },
      "gwassnp": { … }, "snpedia": { … }, "wellderly": { … },
      "curated": { … }
    }

Extending to other biological entities
Diagram: Gene (G), Pathway (P), Genetic Variant (V), Disease (D), Metabolite (M); MyGene.info

Acknowledgement
TSRI, STSI, UC Berkeley: Andrew Su, Adam Mark, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng, Eric Topol, Ali Torkamani, Galina Erikson, Chris Mungall
Washington U: Robin Haw
U. Washington: Ben Ainscough, Obi Griffith
UCSD: Sean Mooney
OICR: Trish Whetzel
MyVariant.info, MyGene.info

Funding and Support
U54GM114833, R01GM083924
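
Example: building requests for the two v1 endpoints
The two endpoints and the "fields" filter described in the slides can be sketched in Python roughly as follows. This is a minimal illustration, not an official client: the helper names variant_url and query_url are hypothetical, the base URL is copied from the slides, and percent-encoding the ">" in an HGVS id is an assumption about what a client should send. The sketch only builds URLs; fetching them (for example with an HTTP library) is left out.

```python
from urllib.parse import quote, urlencode

# v1 base URL as shown on the slides.
BASE_URL = "http://myvariant.info/v1"


def variant_url(hgvs_id, fields=None):
    """Build a /variant/<variantid> URL (hypothetical helper name).

    The ">" in a genomic HGVS id is percent-encoded; ":" is kept readable.
    """
    url = f"{BASE_URL}/variant/{quote(hgvs_id, safe=':')}"
    if fields:
        # "fields" takes a comma-separated list of dotted field names.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


def query_url(q, fields=None):
    """Build a /query?q=<query> URL (hypothetical helper name)."""
    params = {"q": q}
    if fields:
        params["fields"] = ",".join(fields)
    # Keep ":" and "," unescaped so the query stays close to the slide form;
    # spaces (e.g. around AND) are encoded as "+".
    return f"{BASE_URL}/query?" + urlencode(params, safe=":,")


if __name__ == "__main__":
    print(variant_url("chr1:g.31349647C>T", fields=["dbnsfp.clinvar", "dbsnp.gmaf"]))
    print(query_url("dbnsfp.genename:BTK"))
    print(query_url(
        "_exists_:wellderly AND cadd.polyphen.cat:possibly_damaging",
        fields=["wellderly", "cadd.polyphen"],
    ))
```

Keeping URL construction separate from the HTTP call makes it easy to drop these helpers into an analysis pipeline and to check the generated queries without network access.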
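
Example: aggregating per-source documents by HGVS id
The "Simple Aggregation mechanism" slide shows per-source documents that share a genomic HGVS "_id" being combined into one variant object. A minimal sketch of that idea, assuming each source contributes one document per variant under its own top-level key, might look like this. The function name merge_by_id is hypothetical and the sample documents are the ones shown on the slide; how the real data hub performs the merge is not described in the talk.

```python
def merge_by_id(docs):
    """Merge per-source documents that share the same "_id".

    Each input document is assumed to look like
    {"_id": <hgvs id>, <source name>: {...}}; the result maps each
    "_id" to a single object holding every source's sub-document.
    """
    merged = {}
    for doc in docs:
        variant_id = doc["_id"]
        target = merged.setdefault(variant_id, {"_id": variant_id})
        for source, annotation in doc.items():
            if source != "_id":
                # Assumption: a later document for the same source replaces
                # an earlier one.
                target[source] = annotation
    return merged


if __name__ == "__main__":
    source_docs = [
        {"_id": "chr1:g.196659237C>T",
         "dbsnp": {"snpclass": "single", "rsid": "rs1061170", "func": "missense"}},
        {"_id": "chr1:g.196659237C>T",
         "cosmic": {"tumor_site": "breast", "mut_freq": 0.49}},
    ]
    variant = merge_by_id(source_docs)["chr1:g.196659237C>T"]
    print(sorted(variant))
```

Because every source lives under its own key, adding a new annotation source does not disturb existing fields, which is what lets data providers contribute under their own source field.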