Refined integrative model EGRIN-PROM

Refined Integrative Model of Regulation and
Metabolism Improves Phenotype Prediction
for Saccharomyces cerevisiae
Zhuo Wang
Outline
• Challenge and goal
• Inferring regulatory network by EGRIN
• Construction of integrative model by
EGRIN_PROM
• Prediction for TF KO effects on growth and
comparison with Yeastract_PROM
• Identification of important regulators for PHA
synthesis
• Further work
Integration of Gene Regulatory Networks
and Metabolic Networks
PROM
(Probabilistic Regulation of Metabolism)
Predict Condition Specific Growth Rates
EGRIN
(Environment & Gene Regulatory Influence Network)
Identify Condition Specific Regulators
Bonneau et al. Cell 2007
Danziger et al. Nucleic acids research 2014
Chandrasekaran and Price, Proc. Natl. Acad. Sci. USA, 2010
EGRIN Part 1 - Inferelator:
Infer Regulation With Linear Regression
[Figure: expression of target gene X modeled as a weighted sum of Factor 1 … Factor n]
$\Delta \mathrm{Gene}_x = W_1 \times \mathrm{Factor}_1 + \dots + W_n \times \mathrm{Factor}_n$
Pick Most Influential Regulators with Regression and Shrinkage (e.g. Elastic Net)
Bonneau et al 2006
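The regression-with-shrinkage step can be sketched as below. This is a minimal illustration and not the Inferelator code: the coordinate-descent solver, the names `elastic_net` and `soft_threshold`, the penalty settings, and the synthetic data are all assumptions made for the example.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the L1 part of the penalty)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, alpha=0.2, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    r = list(y)  # residual y - Xw, kept up to date as w changes
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of factor j with the residual that ignores factor j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

# Synthetic data (hypothetical): 5 candidate factors, only factors 0 and 2
# actually drive the change in expression of gene X.
random.seed(0)
true_w = [1.5, 0.0, -2.0, 0.0, 0.0]
factors = [[random.gauss(0, 1) for _ in range(5)] for _ in range(400)]
delta_gene_x = [sum(wj * fj for wj, fj in zip(true_w, row)) + random.gauss(0, 0.1)
                for row in factors]

weights = elastic_net(factors, delta_gene_x)
# Rank factors by absolute weight; the most influential come first.
ranked = sorted(range(len(weights)), key=lambda j: -abs(weights[j]))
print(ranked[:2], [round(wj, 3) for wj in weights])
```

The L1 term pushes weak regulators toward zero weight, while the L2 term keeps correlated regulators from being dropped arbitrarily, so the regulators left with large weights are the "most influential" ones.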
EGRIN Part 2 - cMonkey:
Biclustering Coregulated Genes
[Figure: bicluster of ribosomal genes. Axes: expression ratio vs. experiment number, split into included and excluded experiments. Additional panels: known gene associations; detected upstream motifs.]
Reiss et al 2006
Detailed explanation of cMonkey
• Biclusters are clusters of genes that are co-expressed under common
conditions. With cMonkey, they are built using microarray expression data,
known gene associations, and common upstream motifs.
1) First Click: Each line represents the expression level of one gene in the
bicluster. The experiments to the left of the dotted red line are coherent and
are included in the cluster. Those on the right are less coherent and are
excluded. The conditions in pink are yeast in sugar, the green are yeast soon
after oleic acid exposure, and the blue are yeast after long-term oleic acid
exposure.
2) Second Click: Shown are the regions upstream of each gene in the cluster.
The two motifs shown on the far left are commonly detected in these regions.
The leftmost motif appears in shades of red, and the other one appears in
shades of green. The darker the color, the stronger the motif match.
3) Third Click: The lines in black are known protein-protein interactions taken
from the STRING database. Lines in other colors are associations from internal
experiments.
4) Fourth Click: It turns out that this cluster contains ribosomal genes. As
expected, they are co-expressed.
Validate Yeast EGRIN:
~1500 Train, ~1500 Test
[Figure: EGRIN validation at three levels: global, condition specific, and gene specific]
Find Enriched Regulators of Clusters
[Figure: Factor 1 linked to Genes C, D, E, F, G, and H of Cluster A. Caption: these are significant, P ≤ 0.05]
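One common way to score whether a factor's targets are over-represented in a cluster is a hypergeometric tail test. The slide only states the P ≤ 0.05 cutoff, so the choice of test, the function name `hypergeom_enrichment_p`, and all gene counts below are assumptions for illustration.

```python
from math import comb

def hypergeom_enrichment_p(n_genome, n_targets, n_cluster, n_overlap):
    """P(overlap >= n_overlap) when n_cluster genes are drawn at random from
    n_genome genes, of which n_targets are targets of the factor."""
    total = comb(n_genome, n_cluster)
    upper = min(n_targets, n_cluster)
    tail = sum(comb(n_targets, k) * comb(n_genome - n_targets, n_cluster - k)
               for k in range(n_overlap, upper + 1))
    return tail / total

# Hypothetical counts: 6000 genes in the genome, Factor 1 has 100 targets,
# Cluster A has 6 genes, and 4 of them are targets of Factor 1.
p_value = hypergeom_enrichment_p(6000, 100, 6, 4)
print(p_value, p_value <= 0.05)
```

A factor whose p-value falls at or below 0.05 would be reported as an enriched regulator of the cluster; in practice a multiple-testing correction across factors and clusters would also be needed.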
Candidate Selection:
Regulator of Peroxisome Clusters
[Figure levels: global, condition specific, gene specific]
Selected Genes Regulate Peroxisomes
[Figure labels: peroxisome-annotated genes; deleted regulators]
p-Value vs Control:
Positives < 10^-101
Selected < 10^-14
Can we predict Growth Phenotype?
PROM (Probabilistic Regulation of Metabolism):
Predict Condition Specific Growth Rates
[Figure levels: global, condition specific, gene specific]
Bootstrap for Gene Level Predictions
[Figure: the regulator regression repeated 1000 x]

TF        Target    FDR     Activator?
YDR253C   YAL012W   0       TRUE
YOL108C   YAL012W   0.19    FALSE
YHR124W   YAL012W   0.43    FALSE
YFL031W   YAL012W   0.46    TRUE
YFL031W   YAL022C   0       TRUE
YJL110C   YAL022C   0.005   TRUE
YPL089C   YAL022C   0.015   FALSE
YMR042W   YAL022C   0.015   TRUE
YHR084W   YAL022C   0.025   FALSE
Starting assumption: if FDR = 0.46, then 54% of the target is controlled by the TF.
Wrong, but maybe useful.
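The starting assumption amounts to taking 1 − FDR as the fraction of a target controlled by a TF. A minimal sketch, using the TF/target rows listed on this slide; the function name `control_fraction` and the tuple layout are assumptions for illustration.

```python
def control_fraction(fdr):
    """Starting assumption from the slide: a TF controls (1 - FDR) of its target."""
    return 1.0 - fdr

# (TF, target, FDR, is_activator) rows as listed on the slide.
interactions = [
    ("YDR253C", "YAL012W", 0.0, True),
    ("YOL108C", "YAL012W", 0.19, False),
    ("YHR124W", "YAL012W", 0.43, False),
    ("YFL031W", "YAL012W", 0.46, True),
    ("YFL031W", "YAL022C", 0.0, True),
    ("YJL110C", "YAL022C", 0.005, True),
    ("YPL089C", "YAL022C", 0.015, False),
    ("YMR042W", "YAL022C", 0.015, True),
    ("YHR084W", "YAL022C", 0.025, False),
]

for tf, target, fdr, is_activator in interactions:
    role = "activator" if is_activator else "inhibitor"
    print(f"{tf} -> {target}: {role}, controls {control_fraction(fdr):.1%} of target")
```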
Composition of the integrative models
Yeastract_PROM and EGRIN_PROM
[Figure: bar chart comparing Yeastract_PROM and EGRIN_PROM. Categories: Total TF, Match_expr119TF, Match_17defectTF, regulatory interactions, metabolic genes. Two value axes, 0 to 200 and 0 to 35000.]
Fendt et al. 2010. “Unraveling Condition-Dependent Networks of
Transcription Factors That Control Metabolic Pathway Activity in Yeast.”
Molecular Systems Biology 6 (1)
EGRIN FDRs Modify PROM
[Figure: toy network in which Factors regulate Genes A to E, whose Enzymes 1 to 4 convert metabolites (M I to M V, M X). Flux bounds are scaled by P(Target=on | TF=off):
  EGRIN direct, activates, FDR = 0.005: P = 0.005, bound 0.005 × Vmax
  PROM: P = 0.97, bound 0.97 × Vmax
  EGRIN direct, inhibits: P = 1.00, bound Vmax
Legend: Factor, Gene, Enzyme, Metabolite]
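How the probabilities feed into flux bounds can be sketched as follows. The sketch reflects one reading of the figure and is not the authors' code: PROM estimates P(Target=on | TF=off) from binarized expression data, while for direct EGRIN interactions the FDR is used as that probability for activators and P = 1 for inhibitors. All function names and the toy data are assumptions.

```python
def prom_probability(tf_on, target_on):
    """P(Target=on | TF=off): fraction of TF-off samples where the target is on.
    Inputs are per-sample booleans from binarized expression data."""
    target_when_tf_off = [t for f, t in zip(tf_on, target_on) if not f]
    if not target_when_tf_off:
        return 1.0  # no TF-off samples: leave the reaction unconstrained
    return sum(target_when_tf_off) / len(target_when_tf_off)

def egrin_direct_probability(fdr, is_activator):
    """Assumed reading of the figure: knocking out an activator leaves the target
    on with probability FDR; knocking out an inhibitor leaves it fully on."""
    return fdr if is_activator else 1.0

def scaled_flux_bound(vmax, p_on):
    """Scale the maximal flux of the enzyme's reaction by P(Target=on | TF=off)."""
    return p_on * vmax

vmax = 100.0  # hypothetical maximal flux
# PROM-style estimate from toy binarized expression (hypothetical samples).
p_prom = prom_probability([True, False, False, False, False],
                          [True, True, True, True, False])
print(scaled_flux_bound(vmax, p_prom))
print(scaled_flux_bound(vmax, egrin_direct_probability(0.005, True)))
print(scaled_flux_bound(vmax, egrin_direct_probability(0.005, False)))
```

The scaled bounds would then be passed to the flux balance analysis of the metabolic model when simulating the TF knockout.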
Correlation between measured growth and predicted
growth when TFs are deleted using Yeast 6.06
Glucose minimal media
Integrative model                        Correlation   P-value   Sum of squared error   Normalized sum of squared error / permutation P-value
YEASTRACT-PROM_Y6_TF90                   0.2110        0.0459    4.298                  0.205 / 0.029
YEASTRACT-PROM_Y6_TF51                   0.1019        0.4723    3.566                  0.249 / 0.144
EGRIN-PROM_Y6_DirectPrior_otherParray    0.4183        0.0020    2.481                  0.118 / 0.004
EGRIN-PROM_Y6_DirectPrior_otherP=1       0.4325        0.0014    2.506                  0.121 / 0.003
Fendt et al. 2010. Molecular Systems Biology
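The correlation and squared-error columns compare measured and predicted growth ratios across TF deletions. A minimal sketch of both quantities follows; the growth ratios are made-up values for illustration only, and the normalization used for the "normalized" error column is not given on the slide, so it is not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def sum_squared_error(measured, predicted):
    """Sum of squared differences between measured and predicted values."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted))

# Hypothetical growth ratios (mutant / wild type) for five TF deletions.
measured = [0.95, 0.40, 0.85, 1.00, 0.60]
predicted = [0.90, 0.55, 0.99, 1.00, 0.70]
print(pearson(measured, predicted), sum_squared_error(measured, predicted))
```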
EGRIN_PROM has a higher Matthews correlation
coefficient (MCC) than Yeastract_PROM
[Figure: MCC between predicted and measured growth ratio (y-axis, -0.2 to 0.6) vs. threshold of growth ratio for sick (x-axis, 0.1 to 0.95). Series: YEASTRACT-PROM_Y6_TF90, YEASTRACT-PROM_Y6_TF51, EGRIN-PROM_Y6_DirectPrior_otherP=1, EGRIN-PROM_Y6_DirectPrior_otherParray]
Permutation test with 500 random regulatory association and
expression datasets of the same size: all p < 0.05
How to compare prediction with experiment?
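• True Positive (TP): a positive sample that the model
predicts as positive
• True Negative (TN): a negative sample that the model
predicts as negative
• False Positive (FP): a negative sample that the model
predicts as positive
• False Negative (FN): a positive sample that the model
predicts as negative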
Metric
• Sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
• Precision (Positive predictive value) = TP / (TP + FP)
• Negative predictive value = TN / (TN + FN)
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
Matthews correlation coefficient
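The metrics above, including the Matthews correlation coefficient, can be computed from the four confusion counts as sketched below. Treating "sick" (growth ratio below the threshold) as the positive class is an assumption, as are the function names and the toy growth ratios.

```python
from math import sqrt

def confusion_counts(measured, predicted, threshold):
    """Count TP, TN, FP, FN, calling a deletion 'sick' (positive) when its
    growth ratio is below the threshold."""
    tp = tn = fp = fn = 0
    for m, p in zip(measured, predicted):
        actual_sick, predicted_sick = m < threshold, p < threshold
        if actual_sick and predicted_sick:
            tp += 1
        elif not actual_sick and not predicted_sick:
            tn += 1
        elif predicted_sick:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical growth ratios for four TF deletions.
measured = [0.1, 0.9, 0.3, 1.0]
predicted = [0.2, 0.95, 0.8, 1.0]
counts = confusion_counts(measured, predicted, threshold=0.5)
print(counts, mcc(*counts))
```

Unlike accuracy, MCC stays informative when sick and healthy deletions are unbalanced, which is why it is reported across several sick thresholds.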
Integrative model predicts growth change
better than the metabolic network alone
[Figure panels: threshold = 0.5; threshold = 0.2]
Other media
Integrative model                     Pearson corrcoef   p-value   Normalized sum of squared error   Permutation p-value   MCC 0.95   MCC 0.5   MCC 0.2

Galactose with ammonium medium
YEASTRACT-PROM                        0.162              0.126     0.339                             0.064                 0.039      0.132     -0.158
EGRIN-PROM_DirectPrior_otherPArray    0.227              0.106     0.196                             0.058                 0.010      0.111     0.312
EGRIN-PROM_DirectPrior_otherP=1       0.288              0.038     0.182                             0.025                 0.308      0.146     0.347

Glucose with urea medium
YEASTRACT-PROM                        0.188              0.075     0.213                             0.040                 0.093      0.096     0.009
EGRIN-PROM_DirectPrior_otherPArray    0.294              0.034     0.158                             0.027                 0.104      0.369     0.077
EGRIN-PROM_DirectPrior_otherP=1       0.308              0.026     0.162                             0.023                 0.123      0.369     0.077
Overall MCC for three media
[Figure: overall MCC (y-axis, -0.1 to 0.4) at thresholds 0.1 to 0.95 (x-axis). Series: Yeastract_PROM_Y6_TF90, Yeastract_PROM_Y6_TF51, EGRIN-PROM_Y6_DirectPrior_otherP=1, EGRIN-PROM_Y6_DirectPrior_otherParray]
Different metabolic models
Glucose minimal media

Model                    Correlation Coefficient   p-Value
Control
YEASTRACT-PROM Yeast6    0.2110                    0.0459
YEASTRACT-PROM Yeast7    0.1747                    0.0995
YEASTRACT-PROM iMM904    0.2128                    0.0441
Experiment
EGRIN-PROM Yeast6        0.4325                    0.0014
EGRIN-PROM Yeast7        0.2724                    0.0507
EGRIN-PROM iMM904        0.3689                    0.0071

Comparison                        Δ Correlation   P-Value
T-Test (2 Tail, Paired)           0.1584          0.047348
Aggregate (Fisher’s Transform)    0.2972          0.001002

By Samuel A. Danziger
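The paired t-test row can be checked by hand from the six correlations on this slide. The sketch below pairs EGRIN-PROM and YEASTRACT-PROM by metabolic model (Yeast6, Yeast7, iMM904); that pairing is reconstructed from the slide layout. With three pairs there are 2 degrees of freedom, for which the t distribution has a closed-form CDF, so no statistics library is needed.

```python
from math import sqrt

# Correlation coefficients from the slide, keyed by metabolic model.
yeastract = {"Yeast6": 0.2110, "Yeast7": 0.1747, "iMM904": 0.2128}
egrin = {"Yeast6": 0.4325, "Yeast7": 0.2724, "iMM904": 0.3689}

deltas = [egrin[m] - yeastract[m] for m in yeastract]
n = len(deltas)
mean_delta = sum(deltas) / n
sd = sqrt(sum((d - mean_delta) ** 2 for d in deltas) / (n - 1))
t_stat = mean_delta / (sd / sqrt(n))

# Two-tailed p-value; this closed form is valid only for df = n - 1 = 2,
# where the t CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t^2)).
p_two_tailed = 1.0 - abs(t_stat) / sqrt(2.0 + t_stat ** 2)

print(round(mean_delta, 4), round(t_stat, 3), round(p_two_tailed, 6))
```

With these inputs the mean difference comes out at about 0.1584 and the two-tailed p-value at about 0.047, in line with the T-Test row.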
Comparison of MCC under different
thresholds for sick
[Figure: MCC (y-axis, -0.1 to 0.5) for YEASTRACT52_PROM, YEASTRACT90_PROM, EGRIN_PROM_Y7, EGRIN_PROM_iMM904, and EGRIN_PROM_Y6. Series: MCC_essential0.2, MCC_essential0.5, MCC_essential0.95]
EGRIN-PROM for different metabolic models
sick: grRatio<0.5
sick: grRatio<0.2
YAL051W
YBL005W
YBL021C
YBL103C
YBR049C
YBR083W
YBR182C
YCL055W
YCR065W
YDL020C
YDL056W
YDL170W
YDR034C
YDR123C
YDR146C
YDR207C
YDR216W
YDR253C
YDR259C
YDR423C
YEL009C
YER040W
YER111C
YFL021W
YFL031W
YFR034C
YGL013C
YGL035C
YGL071W
YGL073W
YGL166W
YGL209W
YGL237C
YGL254W
YGR044C
YHR084W
YHR124W
YHR178W
YHR206W
YIL036W
YIL101C
YIL131C
YIR023W
YJL056C
YJL089W
YJL110C
YJR060W
YJR094C
YKL015W
YKL038W
YKL062W
YKL109W
YKL112W
YKL185W
YKR034W
YKR064W
YLR014C
YLR098C
YLR131C
YLR176C
YLR228C
YLR256W
YLR451W
YML007W
YML099C
YMR021C
YMR037C
YMR042W
YMR043W
YMR070W
YMR280C
YNL027W
YNL068C
YNL103W
YNL167C
YNL204C
YNL216W
YNL314W
YOL028C
YOL067C
YOL108C
YOR028C
YOR140W
YOR337W
YOR358W
YOR363C
YPL075W
YPL089C
YPL248C
YPR065W
YPR199C
Double deletion of TF and metabolic genes
[Figure: growth ratio for TF and metabolic gene double deletions; scale 0 to 1]
Highly synthetic lethal TF and metabolic genes
TF        Metabolic gene   Double KO grRatio   SingleGene KO grRatio   SingleTF KO grRatio
YOR363C   YJR019C          0.000               1.000                   1.000
YOR028C   YPL059W          0.000               1.000                   0.994
YOR028C   YMR170C          0.000               1.000                   0.994
YAL051W   YBR221C          0.000               1.000                   0.984
YAL051W   YER178W          0.000               1.000                   0.984
YAL051W   YGR193C          0.000               1.000                   0.984
YAL051W   YNL071W          0.000               1.000                   0.984
YAL051W   YFL018C          0.000               0.989                   0.984
YLR228C   YJR057W          0.005               1.000                   1.000
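A pair counts as synthetic lethal when the double knockout abolishes growth although each single knockout grows almost normally. The sketch below applies that rule to the growth ratios listed on this slide; the cutoffs (`lethal_below`, `healthy_above`) and the function name are assumptions, since the slide does not state the thresholds used.

```python
from collections import Counter

def is_synthetic_lethal(double_ko, single_gene_ko, single_tf_ko,
                        lethal_below=0.05, healthy_above=0.9):
    """True when the double KO is (near) lethal but both single KOs are healthy.
    Both cutoffs are hypothetical."""
    return (double_ko < lethal_below
            and single_gene_ko > healthy_above
            and single_tf_ko > healthy_above)

# (TF, metabolic gene, double KO, single gene KO, single TF KO) growth ratios.
pairs = [
    ("YOR363C", "YJR019C", 0.000, 1.000, 1.000),
    ("YOR028C", "YPL059W", 0.000, 1.000, 0.994),
    ("YOR028C", "YMR170C", 0.000, 1.000, 0.994),
    ("YAL051W", "YBR221C", 0.000, 1.000, 0.984),
    ("YAL051W", "YER178W", 0.000, 1.000, 0.984),
    ("YAL051W", "YGR193C", 0.000, 1.000, 0.984),
    ("YAL051W", "YNL071W", 0.000, 1.000, 0.984),
    ("YAL051W", "YFL018C", 0.000, 0.989, 0.984),
    ("YLR228C", "YJR057W", 0.005, 1.000, 1.000),
]

lethal_pairs = [(tf, gene) for tf, gene, dko, gko, tko in pairs
                if is_synthetic_lethal(dko, gko, tko)]
partners_per_tf = Counter(tf for tf, _ in lethal_pairs)
print(partners_per_tf.most_common())
```

Counting partners per TF highlights the TFs with the most synthetic lethal interactions in this list.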
Synthetic essential TF   Name    Function annotation
YOR363C                  PIP2    Oleate-specific transcriptional activator of peroxisome proliferation; binds oleate response elements (OREs), activates beta-oxidation genes
YOR028C                  CIN5    Basic leucine zipper (bZIP) transcription factor of the yAP-1 family; mediates pleiotropic drug resistance and salt tolerance
YAL051W                  OAF1    Oleate-activated transcription factor, acts alone and as a heterodimer with Pip2p; activates genes involved in beta-oxidation of fatty acids and peroxisome organization and biogenesis
YLR228C                  ECM22   Sterol regulatory element binding protein; regulates transcription of sterol biosynthetic genes
PHA-based plastics are attractive industrial products:
small and large companies currently produce bacterial PHA.
• In 2004, Procter & Gamble (US) and Kaneka Corporation (Japan) announced joint
development to complete the R&D leading to the commercialization of Nodax, a
large range of polyhydroxybutyrate-co-hydroxyalkanoates (PHBHx, PHBO, PHBOd).
Industrial large-scale production was planned with a target price of
around 2 €/kg.
• Tianan, a Chinese company, announced a capacity increase from the current
2,000 tons to 10,000 tons/year.
• The Dutch chemical company DSM announced an investment in a PHA plant together
with a Chinese bio-based plastics company, Tianjin Green Bio-Science Co. The
company will start up production of PHA with an annual capacity of 10,000 tons.
• The Japanese company Kaneka announced that it is ready to produce
1.000.000 tons/year of PHBHx in 2013.
Regulatory network of important TFs for cytosolic PHA synthesis
PIP2 and CAT8 are known regulators of
the peroxisome; SIP4 and MSN4 are top
candidate peroxisome regulators from
EGRIN
Conclusion
• Integrating the transcriptional regulatory network (TRN) with the
metabolic network (MN) predicts growth rate changes for gene
knockouts better than the MN alone.
• The integrative EGRIN_PROM model predicts metabolic
phenotype better than Yeastract_PROM.
• The prediction is further improved by accounting for the
activator or inhibitor status of TFs, and by using the
EGRIN FDR to set the probability between a TF and its
target genes.
• New regulators were found across different conditions, and
half of the essential regulators for PHA synthesis are top
peroxisome candidates (experimental validation).
Further work
• Use EGRIN to calculate the target gene
expression for each TF KO, and then set
constraints in PROM or iMAT
• Prediction for carbon-nitrogen interaction