ChIP-seq
Xiaole Shirley Liu
STAT115, STAT215

ChIP-chip/seq Technology
• Chromatin ImmunoPrecipitation + microarray or high-throughput sequencing
• Detects the genome-wide in vivo locations of TFs and other DNA-binding proteins
– Find all the DNA sequences bound by TF-X?
– Cook all the dishes with cinnamon
• Can learn the regulatory mechanism of a transcription factor or DNA-binding protein much better and faster

Sonication (~500bp)

Immunoprecipitation

Reverse Crosslink and DNA Purification

ChIP-Seq
[Figure labels: ChIP-DNA, Noise]
• Sequence millions of 30mer ends of fragments
• Map 30mers back to the genome

MACS: Model-based Analysis for ChIP-Seq
• Use confident peaks to model shift size
[Figure label: Binding]

Peak Calls
• Tag distribution along the genome ~ Poisson distribution (λBG = total tag / genome size)
• ChIP-Seq shows local biases in the genome
– Chromatin and sequencing bias
– 200-300bp control windows have too few tags
– But can look further
Dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k)
[Figure: ChIP and Control tracks with 300bp, 1kb, 5kb, and 10kb windows]
http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Bio, 2008

Peak Call Statistics
• P-value and FDR?
• Simulation: random sampling of reads?
• FDR = A / B (Ctrl/ChIP peaks are all FPs)
• Q-value?
[Figure: MAT: Quality Control; labels: <1% enriched, A, Background, B]

ChIP-seq Downstream Analysis

Target Gene Assignment
Yeast TF Regulatory Network
[Figure labels: Protein, Transcribe, Regulate, Gene]

Human TF Binding Distribution
• Most TF binding sites are outside promoters
• How to assign targets?
• Nearest distance?
• Binding within 10KB?
• Number of binding sites?
• Other knowledge?

Binding <> Functional
• Binding has an effect on up-regulated genes at all hours, but only has an effect on down-regulated genes at 12 hours

Stronger sites more function?
• Stronger sites are not closer to differentially regulated genes (not necessarily more functional)
Tang et al, Cancer Res 2011

Peak Conservation
• Evolutionary conservation
– Can be used for ChIP QC
• Conserved sites more functional?
– Majority of functional sites not conserved
Odom et al, Nat Genet 2007

Higher Order Chromatin Interactions
• Chromatin conformation capture
• Hi-C
• Interactions decay exponentially with distance
Lieberman-Aiden et al, Science 2009

Direct Target Identification
• Binary decision?
• Rank product of:
– Regulatory potential (default λ 100kb)
– Differential expression

ChIP-chip/seq Motif Finding
• ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif?
– Raw data is like perfect clustering, plus enrichment values
• MDscan
– High ChIP ranking => true targets, contain more sites
– Search TF motif from highest ranking targets first (high signal / background ratio)
– Refine candidate motifs with all targets

Similarity Defined by m-match
For a given w-mer and any other random w-mer:
TGTAACGT   8-mer
TGTAACGT   matched 8
AGTAACGT   matched 7
TGCAACAT   matched 6
TGACACGG   matched 5
AATAACAG   matched 4
(m-matches for TGTAACGT)
Pick a reasonable m to call two w-mers similar

MDscan Seeds
[Figure labels: Higher enrichment; A 9-mer; Seed motif pattern; ChIP-chip selected upstream sequences]
9-mers: ATTGCAAAT TTGCAAATC TTTGCGAAT
Sequences shown: TTGCAAATC TTGCGAATA TTGCAAATT TTGCCCATC ATTGCAAAT TTTGCGAAT TTTGCAAAT TTTGCAAAT GCAAATCCA CAAATCCAA GCAAATTCG CAAATCCAA GCAAATCCA GAAATCCAC GGAAATCCA GGAAATCCT TGCAAATCC TGCAAATTC GCCACCGT ACCACCGT ACCACGGT GCCACGGC …

Update Motifs With Remaining Seqs
[Figure labels: Seed1 m-matches; Extreme High Rank; All ChIP-selected targets]

Refine the Motifs
[Figure labels: Seed1 m-matches; Extreme High Rank; All ChIP-selected targets]

Further Refine Motifs
• Could also be used to examine known motif enrichment
• Is motif enrichment correlated with ChIP-seq enrichment?
• Is the motif more enriched in peak summits than in peak flanks?
• Motif analysis could identify transcription factor partners of ChIP-seq factors

Estrogen Receptor
• Overactive in > 70% of breast cancers
• Where does it go in the genome?
• ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
Carroll et al, Cell 2005
[Figure labels: ER, TF??]

Estrogen Receptor (ER) Cistrome in Breast Cancer
• ER may function far away (100-200KB) from genes
• Only 20% of ER sites have PhastCons > 0.2
• ER has different effects depending on its collaborators
Carroll et al, Nat Genet 2006
[Figure labels: NRIP, ER, AP1]

Cell Type-Specific Binding
• The same TF binds to very different locations in different tissues and conditions. Why?
• TF concentration?
• Collaborating factors, esp. pioneering factors
• Interesting observations about pioneering factors

Summary
• ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
• ChIP-seq peak calling: shift reads, then calculate correct enrichment and FDR
• Functional analysis of ChIP-seq data:
– Strong vs weak binding, conserved vs non-conserved
– Target identification
– Motif analysis
• Cell type-specific binding

Epigenetics
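
Sketch for the Peak Calls slide: the dynamic background λlocal = max(λBG, λ5k, λ10k) and the Poisson tail probability of the observed ChIP tag count can be illustrated in a few lines of Python. This is a minimal sketch and not the MACS implementation. The function names (`poisson_sf`, `dynamic_lambda`), the rescaling of each control window count to the candidate window width, the assumption that ChIP and control have equal sequencing depth, and all numbers are assumptions made for illustration.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by summing the lower tail.

    Plain summation is fine for a sketch, but it loses precision for
    p-values much smaller than about 1e-15.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) computed from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def dynamic_lambda(total_tags, genome_size, window, ctrl_counts):
    """Expected tags in a candidate window under the dynamic background.

    ctrl_counts maps a control window width in bp (e.g. 5000, 10000) to the
    number of control tags in that window around the candidate peak.
    Each count is rescaled to the candidate window width before the max.
    """
    lam_bg = total_tags / genome_size * window  # genome-wide rate, per window
    local = [count * window / width for width, count in ctrl_counts.items()]
    return max([lam_bg] + local)


# Hypothetical numbers, for illustration only.
lam_bg = dynamic_lambda(10_000_000, 2_700_000_000, 300, {})
lam_local = dynamic_lambda(10_000_000, 2_700_000_000, 300, {5000: 50, 10000: 60})
chip_tags = 12  # ChIP tags observed in the 300bp candidate window

print("lambda_BG:", lam_bg, "p:", poisson_sf(chip_tags, lam_bg))
print("lambda_local:", lam_local, "p:", poisson_sf(chip_tags, lam_local))
```

Taking the maximum makes the test conservative: where the control shows a locally elevated tag density, the same ChIP count gets a larger p-value than it would under the genome-wide rate alone.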
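
Sketch for the m-match slide: two w-mers are called similar when at least m of their w positions agree. The Python below counts matching positions and collects the m-matches of a seed from a sequence. It is an illustrative sketch, not MDscan's code. The function names (`match_count`, `m_matches`) and the test sequence are hypothetical, and reverse-complement matches are not considered.

```python
def match_count(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))


def m_matches(seed, sequence, m):
    """All w-mers in `sequence` that agree with `seed` at >= m positions."""
    w = len(seed)
    hits = []
    for i in range(len(sequence) - w + 1):
        wmer = sequence[i:i + w]
        if match_count(seed, wmer) >= m:
            hits.append(wmer)
    return hits


# The 8-mers from the slide, compared against the seed TGTAACGT.
seed = "TGTAACGT"
for wmer in ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]:
    print(wmer, match_count(seed, wmer))

# Hypothetical sequence: only one window reaches 7 matching positions.
print(m_matches(seed, "CCAGTAACGTCC", 7))
```

A larger m gives fewer, cleaner seed matches; a smaller m tolerates more degenerate sites at the cost of more background w-mers.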