talks Genotype Refinement Pipeline Using addi4onal data to improve genotype calls and likelihoods Why Genotypes? • Many analyses are only interested in variants • Medical gene4cists are interested in genotypes – Do any pa4ents have two copies of a LOF muta4on? – Are the parents of a diseased child likely to have more afflicted children? Genotype Call Quality Is Important! • Poor genotype calls for some samples – Can have low confidence – Might be wrong call • Can addi4onal (independent) data improve genotype calls? – Use high quality data (like 1000G) as priors – Use pedigree (if available) – Calculate posterior genotype probabili4es We are here in the Best Practices workflow" Genotype Refinement Genotype Refinement Pipeline Recalibrated Variants CalculateGenotypePosteriors Popula4on Priors Family Priors VariantAnnotator PossibleDeNovos CalculateGenotypePosteriors Popula4on Priors Family Priors VariantAnnotator PossibleDeNovos POPULATION PRIORS Bayes’s Rule Review • Given that your coworker just walked in with an umbrella, what is the probability that it is raining? • Observa4on = umbrella • Θ = probability of rain posterior probability likelihood prior (normalize) CGP Corrects HOM_VAR Call with Low Frequency Priors Locus 20:3011778 1) Baseline HOM_VAR call" 2) Priors w/low allele frequency applied" 3) Posterior genotype called HET" 4) In agreement w/NIST" and BAMs" Likelihoods x Priors = Posterior Probabilities" [895,3,0]" AF=0.002" [HOM_REF, HET, HOM_VAR]" [868,0,27]" [HOM_REF, HET, HOM_VAR]" Genotype corrected" Confidence improved from Q3 to Q27" CGP Corrects HET Call with High Frequency Priors Locus 8:3978061 1) Baseline HET call" 2) Priors w/high allele frequency applied" 3) Posterior genotype called HOM_VAR" 4) In agreement w/NIST" and BAMs" Likelihoods x Priors = Posterior Probabilities" [894,0,0]" AF=0.987" [HOM_REF, HET, HOM_VAR]" [932,16,0]" [HOM_REF, HET, HOM_VAR]" Genotype corrected " Confidence improved" from Q0 to Q16" Popula4on Priors Improve Genotype Confidence Hom Ref Baseline HR calls are under confident, but posterior calls are more accurate" Het Baseline HV calls are over confident, but posterior calls are improved" Hom Var Assessing Confidence and Correctness Posterior Genotype Quality Homozygous Reference Calls Average Q10 increase for correct calls, Q≤30" Intercept = 9.9612," Slope = 0.9302" Incorrect calls stay about the same" Baseline Genotype Quality CalculateGenotypePosteriors Popula4on Priors Family Priors VariantAnnotator PossibleDeNovos FAMILY PRIORS Parental Genotypes Inform Child Genotypes • Child can only inherit alleles present in parents • Parent genotypes determine possible child genotypes (assuming no muta4ons) Child HR HR HR HR Mother HR HR HET HET Father HR HET HR HET Child HET HET HET HET HET HET HET Mother HET HR HET HV HET HR HV • HaplotypeCaller gives • Given trio data we can derive Father HET HET HR HET HV HV HR Child HV HV HV HV Mother HV HV HET HET Father HV HET HV HET Bayesian Priors Applied to Trios • Recall Bayes’s Rule: • Establish genotype configura4on probabili4es • Apply family priors posterior likelihood apply prior normalize Family Priors Improve Genotype Confidence Hom Ref Baseline HR calls are under confident, but posterior calls are more accurate" Het Posterior HR and HV calls are higher confidence" Hom Var Assessing Confidence and Correctness Posterior Genotype Quality Homozygous Reference Calls Average Q13 increase for correct calls, Q≤30" Intercept = 12.831," Slope = 1.238" Incorrect calls stay about the same" Baseline Genotype Quality CalculateGenotypePosteriors Popula4on Priors Family Priors VariantAnnotator PossibleDeNovos DE NOVO MUTATION TAGGING De Novo Muta4on Defini4on • De novos muta4ons are culprits is many rare, Mendelian disorders Parents are homozygous reference Child is het (one copy of alt allele) Proper4es of Sequenced De Novos • Novelty – Child has only alt allele in trio, not inherited • Rarity – Allele frequency across all samples sequenced is low • Confidence – Set GQ threshold for parents and child – (GQ improvement tools help A LOT here!) Priors Can Be Tuned For Sensi4vity Muta4on prior is a parameter in genotype configura4on probability: • Sensi4vity and specificity can be tuned as in VQSR Genotype Refinement Yields More High-‐Quality Genotypes • Ini4al genotype calls may be ambiguous or wrong • Applying popula4on and family priors improves confidence • More high confidence genotypes means more data for downstream analysis! Hom Var talks Further reading hgp://www.broadins4tute.org/gatk/guide/ hgp://www.broadins4tute.org/gatk/gatkdocs/ org_broadins4tute_s4ng_gatk_walkers_variantu4ls_CalculateGenotypePosteriors.html
© Copyright 2024