ChIP-Seq - stat115.org

ChIP-seq
Xiaole Shirley Liu
STAT115, STAT215
ChIP-chip/seq Technology
• Chromatin ImmunoPrecipitation + microarray
or high throughput sequencing
• Detect genome-wide in vivo location of TF and
other DNA-binding proteins
– Find all the DNA sequences bound by TF-X?
– Cook all the dishes with cinnamon
• Can learn the regulatory mechanism of a
transcription factor or DNA-binding protein
much better and faster
Sonication (~500bp)
Immunoprecipitation
Reverse Crosslink and DNA
Purification
ChIP-Seq
[Figure: ChIP-DNA fragments mixed with noise fragments]
• Sequence millions of 30-mer ends of fragments
• Map the 30-mers back to the genome
MACS: Model-based Analysis of ChIP-Seq
• Use confident peaks to model shift size
Binding
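The shift-size idea can be illustrated roughly as follows: around a confident peak, forward-strand tags pile up upstream of the binding site and reverse-strand tags downstream, so the offset d between the two clusters is estimated and every tag is moved d/2 toward the fragment center. The function names and the use of the median to summarize each strand are assumptions for illustration, not the MACS implementation.

```python
from statistics import median

def estimate_shift(forward_starts, reverse_starts):
    """Estimate d, the distance between forward- and reverse-strand tag clusters."""
    d = median(reverse_starts) - median(forward_starts)
    if d <= 0:
        raise ValueError("reverse-strand tags should lie downstream of forward-strand tags")
    return d

def shift_tags(forward_starts, reverse_starts, d):
    """Move forward tags right by d/2 and reverse tags left by d/2."""
    half = d / 2
    return [p + half for p in forward_starts] + [p - half for p in reverse_starts]

# Toy peak (hypothetical data): forward tags near 1000, reverse tags near 1200.
fwd = [990, 1000, 1010]
rev = [1190, 1200, 1210]
d = estimate_shift(fwd, rev)
shifted = shift_tags(fwd, rev, d)
```

After shifting, tags from both strands pile up at the same positions, which is what sharpens the peak at the binding site.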
Peak Calls
• Tag distribution along the genome ~ Poisson
distribution (λBG = total tags / genome size)
• ChIP-Seq shows local biases in the genome
– Chromatin and sequencing bias
– 200-300bp control windows have too few tags
– But can look further: dynamic λlocal = max(λBG, [λctrl, λ1k,] λ5k, λ10k)
[Figure: ChIP and control tracks with 300bp, 1kb, 5kb and 10kb windows]
http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Bio, 2008
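A minimal sketch of the dynamic λlocal calculation and the resulting Poisson p-value is given below. It assumes that each control-window count is rescaled to the size of the ChIP peak window and that ChIP and control have equal sequencing depth (`scale=1.0`); the function names and toy numbers are hypothetical and not taken from MACS itself.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k-1)."""
    term = math.exp(-lam)          # P(X = 0)
    cdf = 0.0
    for i in range(k):             # accumulate P(X = 0) .. P(X = k-1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)

def local_lambda(lam_bg, ctrl_counts, window_bp, scale=1.0):
    """Take the largest expected tag count among background and control windows.

    ctrl_counts maps a control window size (bp) to the control tag count in it.
    """
    rates = [lam_bg]
    for size_bp, count in ctrl_counts.items():
        rates.append(scale * count * window_bp / size_bp)
    return max(rates)

window_bp = 300
# Genome-wide background: total tags / genome size, scaled to the peak window.
lam_bg = 3_000_000 * window_bp / 3_000_000_000
ctrl = {1_000: 2, 5_000: 20, 10_000: 30}   # toy control counts
lam = local_lambda(lam_bg, ctrl, window_bp)
p_value = poisson_sf(10, lam)              # 10 ChIP tags observed in the window
```

Taking the maximum makes the test conservative: a region is only called a peak if it is enriched over the genome-wide background and over every local control estimate.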
Peak Call Statistics
• P-value and FDR?
• Simulation: random sampling of reads?
• FDR = A / B (Ctrl/ChIP peaks are all FPs)
• Q-value?
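One way to read FDR = A / B is as a sample-swap estimate: peaks called with control treated as ChIP are all false positives, so their count (A) over the count of ChIP peaks (B) at the same cutoff estimates the FDR. The sketch below assumes that reading; the function name and scores are hypothetical.

```python
def empirical_fdr(chip_peak_scores, ctrl_peak_scores, threshold):
    """Estimate FDR at a score cutoff as (#control peaks) / (#ChIP peaks)."""
    a = sum(s >= threshold for s in ctrl_peak_scores)   # control-over-ChIP peaks
    b = sum(s >= threshold for s in chip_peak_scores)   # ChIP-over-control peaks
    if b == 0:
        return 0.0                                      # no peaks called, nothing to be false
    return min(1.0, a / b)

chip_scores = [50, 40, 30, 20, 10, 8]   # toy peak scores, ChIP vs control
ctrl_scores = [12, 9, 5]                # toy peak scores, control vs ChIP
fdr_loose = empirical_fdr(chip_scores, ctrl_scores, threshold=10)
fdr_strict = empirical_fdr(chip_scores, ctrl_scores, threshold=30)
```

Raising the cutoff removes control peaks faster than ChIP peaks when the ChIP worked, so the estimated FDR falls.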
MAT: Quality Control
[Figure labels: <1% enriched, A, background, B]
ChIP-seq Downstream Analysis
Target Gene Assignment
[Figure: yeast TF regulatory network. Labels: protein, transcribe, regulate, gene]
Human TF Binding Distribution
• Most TF binding sites are outside promoters
• How to assign targets?
• Nearest distance?
• Binding within 10kb?
• Number of binding sites?
• Other knowledge?
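The nearest-distance, 10kb-cutoff and site-count heuristics listed above can be combined in a small sketch. It assumes a single chromosome, peaks given as summit positions, and TSS records as `(position, gene)` tuples sorted by position; all names and coordinates are hypothetical.

```python
from bisect import bisect_left

def nearest_gene(peak_pos, tss_sorted):
    """Return (gene, distance) for the TSS closest to peak_pos."""
    positions = [p for p, _ in tss_sorted]
    i = bisect_left(positions, peak_pos)
    # Only the TSS just left and just right of the peak can be nearest.
    candidates = tss_sorted[max(0, i - 1): i + 1]
    pos, gene = min(candidates, key=lambda t: abs(t[0] - peak_pos))
    return gene, abs(pos - peak_pos)

def assign_targets(peaks, tss_sorted, max_dist=10_000):
    """Count binding sites per gene, keeping only peaks within max_dist of the nearest TSS."""
    counts = {}
    for peak in peaks:
        gene, dist = nearest_gene(peak, tss_sorted)
        if dist <= max_dist:
            counts[gene] = counts.get(gene, 0) + 1
    return counts

tss = [(1_000, "geneA"), (50_000, "geneB"), (120_000, "geneC")]
peaks = [3_000, 45_000, 52_000, 90_000]
targets = assign_targets(peaks, tss)
```

The peak at 90,000 is dropped because its nearest TSS is 30kb away, which shows how a hard distance cutoff discards distal sites even though most TF binding is outside promoters.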
Binding ≠ Functional
• Binding has an effect on up-regulated genes at all hours, but
only has an effect on down-regulated genes at 12 hours
Stronger sites more functional?
• Stronger sites are not closer to differentially
regulated genes (not necessarily more functional)
Tang et al, Cancer Res 2011
Peak Conservation
• Evolutionary conservation
– Can be used for ChIP QC
• Conserved sites more
functional?
– Majority of functional sites
not conserved
Odom et al, Nat Genet 2007
Higher Order Chromatin Interactions
Chromatin conformation capture
Hi-C
Interactions follow
exponential decay with
distance
Lieberman-Aiden et al, Science 2009
Direct Target Identification
• Binary decision?
• Rank product of
– Regulatory potential (default λ = 100kb)
– Differential expression
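A rough sketch of the rank-product idea follows. It assumes that regulatory potential is a sum over binding sites of an exponential decay in distance to the TSS, with λ read as the decay distance (100kb); the slide only gives the default λ, so the exact weighting, the function names and the toy data are all assumptions.

```python
import math

def regulatory_potential(tss, peak_positions, decay=100_000):
    """Sum of exp(-distance / decay) over all binding sites (assumed form)."""
    return sum(math.exp(-abs(p - tss) / decay) for p in peak_positions)

def ranks(scores):
    """Rank genes from 1 (highest score) to n (lowest score)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {gene: r for r, gene in enumerate(order, start=1)}

def rank_product(potential, diff_expr):
    """Product of normalized ranks; smaller means more likely a direct target."""
    rp_rank, de_rank = ranks(potential), ranks(diff_expr)
    n = len(potential)
    return {g: (rp_rank[g] / n) * (de_rank[g] / n) for g in potential}

tss = {"g1": 100_000, "g2": 500_000, "g3": 900_000}   # toy TSS positions
peaks = [110_000, 120_000, 480_000]                   # toy binding sites
potential = {g: regulatory_potential(pos, peaks) for g, pos in tss.items()}
diff_expr = {"g1": 5.0, "g2": 2.0, "g3": 0.5}         # e.g. absolute test statistics
scores = rank_product(potential, diff_expr)
best = min(scores, key=scores.get)
```

Unlike a binary distance cutoff, every site contributes to every gene, and a gene ranks well only if it has both nearby binding and differential expression.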
ChIP-chip/seq Motif Finding
• ChIP-chip gives 10-5000 binding regions ~200-1000bp long. Precise binding motif?
– Raw data is like perfect clustering, plus enrichment
values
• MDscan
– High ChIP ranking => true targets, contain more sites
– Search TF motif from highest ranking targets first
(high signal / background ratio)
– Refine candidate motifs with all targets
Similarity Defined by m-match
For a given w-mer and any other random w-mer
m-matches for the 8-mer TGTAACGT:
TGTAACGT   8 matched
AGTAACGT   7 matched
TGCAACAT   6 matched
TGACACGG   5 matched
AATAACAG   4 matched
Pick a reasonable m to call two w-mers similar
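The m-match counts in the example above can be reproduced with a few lines; the function names are hypothetical.

```python
def match_count(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))

def is_m_match(a, b, m):
    """Two w-mers are called similar if they agree at m or more positions."""
    return match_count(a, b) >= m

reference = "TGTAACGT"
others = ["TGTAACGT", "AGTAACGT", "TGCAACAT", "TGACACGG", "AATAACAG"]
counts = [match_count(reference, w) for w in others]
```

With m = 6, only the first three 8-mers in the list would be considered similar to TGTAACGT.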
MDscan Seeds
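[Figure: ChIP-chip selected upstream sequences, ordered with higher enrichment at the top. A 9-mer from the top sequences (e.g. ATTGCAAAT, TTGCAAATC, TTTGCGAAT) serves as a seed, and its m-matches form a seed motif pattern. w-mers shown in the figure:]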
TTGCAAATC
TTGCGAATA
TTGCAAATT
TTGCCCATC
ATTGCAAAT
TTTGCGAAT
TTTGCAAAT
TTTGCAAAT
GCAAATCCA
CAAATCCAA
GCAAATTCG
CAAATCCAA
GCAAATCCA
GAAATCCAC
GGAAATCCA
GGAAATCCT
TGCAAATCC
TGCAAATTC
GCCACCGT
ACCACCGT
ACCACGGT
GCCACGGC
…
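The seed step can be sketched as follows: every w-mer in the top-ranked sequences is tried as a seed, and its m-matches among the top sequences are collected into a candidate motif. Ranking seeds by the number of m-matches is a simplification assumed here for illustration; it is not MDscan's actual scoring function, and the names and toy input are hypothetical.

```python
def wmers(seq, w):
    """All overlapping w-mers of a sequence."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def matches(a, b):
    """Number of agreeing positions between two equal-length w-mers."""
    return sum(x == y for x, y in zip(a, b))

def best_seed(top_seqs, w, m):
    """Return the seed w-mer with the most m-matches, plus those m-matches."""
    pool = [k for s in top_seqs for k in wmers(s, w)]
    best, best_hits = None, []
    for seed in dict.fromkeys(pool):        # unique w-mers in first-seen order
        hits = [k for k in pool if matches(seed, k) >= m]
        if len(hits) > len(best_hits):      # ties keep the earlier seed
            best, best_hits = seed, hits
    return best, best_hits

# Toy "top-ranked" sequences, each exactly one 9-mer long for readability.
top = ["TTGCAAATC", "TTGCGAATA", "GGGGGGGGG", "TTGCAAATT"]
seed, hits = best_seed(top, w=9, m=7)
```

The aligned m-matches of the winning seed would then be turned into a position weight matrix and refined against the remaining sequences.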
Update Motifs With Remaining Seqs
[Figure labels: Seed1 m-matches; extreme high rank; all ChIP-selected targets]
Refine the Motifs
[Figure labels: Seed1 m-matches; extreme high rank; all ChIP-selected targets]
Further Refine Motifs
• Could also be used to examine known motif
enrichment
• Is motif enrichment correlated with ChIP-seq
enrichment?
• Is motif more enriched in peak summits than
peak flanks?
• Motif analysis could identify transcription factor
partners of ChIP-seq factors
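The summit-versus-flank question can be checked with a simple density comparison. The sketch assumes each peak sequence is centered on its summit and that the motif is given as a regular expression; the AP-1-like pattern, window size and names are illustrative choices. Motif occurrences that straddle the window boundary are counted in neither region.

```python
import re

def count_motif(seq, pattern):
    """Count motif occurrences, including overlapping ones, via a lookahead."""
    return len(re.findall(f"(?=(?:{pattern}))", seq))

def summit_vs_flank(peak_seqs, pattern, summit_half_width=25):
    """Return motif hits per bp in the summit windows and in the flanks."""
    summit_hits = flank_hits = summit_bp = flank_bp = 0
    for seq in peak_seqs:
        mid = len(seq) // 2
        lo = max(0, mid - summit_half_width)
        hi = min(len(seq), mid + summit_half_width)
        left, summit, right = seq[:lo], seq[lo:hi], seq[hi:]
        summit_hits += count_motif(summit, pattern)
        flank_hits += count_motif(left, pattern) + count_motif(right, pattern)
        summit_bp += len(summit)
        flank_bp += len(left) + len(right)
    summit_density = summit_hits / summit_bp if summit_bp else 0.0
    flank_density = flank_hits / flank_bp if flank_bp else 0.0
    return summit_density, flank_density

# Toy peak with one motif instance placed at the center.
peak = "A" * 20 + "TGACTCA" + "A" * 20
summit_density, flank_density = summit_vs_flank([peak], "TGA[CG]TCA", summit_half_width=5)
```

A summit density well above the flank density suggests the motif belongs to the ChIP-ed factor or a close partner rather than to sequence composition of the region.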
Estrogen Receptor
• Overactive in > 70% of breast cancers
• Where does it go in the genome?
• ChIP-chip on chr21/22, motif and expression
analysis found its “pioneering factor” FoxA1
[Figure labels: ER, TF??]
Carroll et al, Cell 2005
Estrogen Receptor (ER)
Cistrome in Breast Cancer
• ER may function far away (100-200kb) from genes
• Only 20% of ER sites have PhastCons > 0.2
• ER has different effects depending on its collaborators
[Figure labels: NRIP, ER, AP1]
Carroll et al, Nat Genet 2006
Cell Type-Specific Binding
• The same TF binds to very different locations in
different tissues and conditions. Why?
• TF concentration?
• Collaborating factors, especially pioneering factors
• Interesting observations about pioneering factors
Summary
• ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
• ChIP-seq peak calling shifts reads and
calculates correct enrichment and FDR
• Functional analysis of ChIP-seq data:
– Strong vs weak binding, conserved vs non-conserved
– Target identification
– Motif analysis
• Cell type-specific binding → Epigenetics