Refined Integrative Model of Regulation and Metabolism Improves Phenotype Prediction for Saccharomyces cerevisiae
Zhuo Wang

Outline
• Challenge and goal
• Inferring the regulatory network with EGRIN
• Constructing the integrative model with EGRIN_PROM
• Predicting TF KO effects on growth and comparing with Yeastract_PROM
• Identifying important regulators for PHA synthesis
• Further work

Integration of Gene Regulatory Networks and Metabolic Networks
• EGRIN (Environment & Gene Regulatory Influence Network): identify condition-specific regulators. Bonneau et al. Cell 2007; Danziger et al. Nucleic Acids Research 2014
• PROM (Probabilistic Regulation of Metabolism): predict condition-specific growth rates. Chandrasekaran and Price, Proc. Natl. Acad. Sci. USA, 2010

EGRIN Part 1 - Inferelator: Infer Regulation With Linear Regression
The expression change of target gene X is modeled as a weighted sum of Factor 1 through Factor n:
$\Delta \mathrm{Gene}_x = W_1 \times \mathrm{Factor}_1 + \dots + W_n \times \mathrm{Factor}_n$
Pick the most influential regulators with regression and shrinkage (e.g. Elastic Net). Bonneau et al. 2006

EGRIN Part 2 - cMonkey: Biclustering Coregulated Genes
[Figure: expression ratio vs. experiment number for a bicluster of ribosomal genes, with included and excluded experiments marked; further panels show known gene associations and detected upstream motifs. Reiss et al. 2006]

Detailed explanation of cMonkey
• Biclusters are clusters of genes that are co-expressed under common conditions. With cMonkey, they are built using microarray expression data, known gene associations, and common upstream motifs.
1) First Click: Each line represents the expression levels of one gene in the bicluster. Experiments to the left of the dotted red line are coherent and are included in the cluster; those on the right are less coherent and are excluded. Conditions in pink are yeast grown on sugar, green are yeast soon after oleic acid exposure, and blue are yeast after long-term oleic acid exposure.
2) Second Click: Shown are the regions upstream of each gene in the cluster.
The two motifs shown on the far left are commonly detected in these regions. The leftmost motif appears in shades of red and the other in shades of green; the darker the color, the stronger the motif match.
3) Third Click: Black lines are known protein-protein interactions taken from the STRING database. Lines in other colors are associations from internal experiments.
4) Fourth Click: This cluster turns out to contain ribosomal genes. As expected, they are coexpressed.

Validate Yeast EGRIN: ~1500 Train, ~1500 Test
[Figure: EGRIN model validation with panels Global, Condition Specific, and Gene Specific]

Find Enriched Regulators of Clusters
[Figure: Factor 1 linked to Genes C, D, E, F, G, and H of Cluster A; the highlighted links are significant at P ≤ 0.05]

Candidate Selection: Regulator of Peroxisome Clusters
[Figure panels: Global, Condition Specific, Gene Specific]

Selected Genes Regulate Peroxisomes
[Figure: Peroxisome Annotated Genes by Deleted Regulators]
Deleted regulators    p-value vs. control
Positives             < 10^-101
Selected              < 10^-14

Can we predict Growth Phenotype?
PROM (Probabilistic Regulation of Metabolism): Predict Condition Specific Growth Rates
[Figure panels: Global, Condition Specific, Gene Specific]

Bootstrap for Gene Level Predictions
[Figure: model fit repeated 1000 x]
TF         Target     FDR      Activator?
YDR253C    YAL012W    0        TRUE
YOL108C    YAL012W    0.19     FALSE
YHR124W    YAL012W    0.43     FALSE
YFL031W    YAL012W    0.46     TRUE
YFL031W    YAL022C    0        TRUE
YJL110C    YAL022C    0.005    TRUE
YPL089C    YAL022C    0.015    FALSE
YMR042W    YAL022C    0.015    TRUE
YHR084W    YAL022C    0.025    FALSE
Starting Assumption: If FDR = 0.46, then 54% of the target is controlled by the TF. Wrong, but maybe useful.

Composition of the integrative models Yeastract_PROM and EGRIN_PROM
[Figure: bar charts comparing Yeastract_PROM and EGRIN_PROM. Categories: Total TF, Match_expr119TF, Match_17defectTF, Regulatory interaction, metabolic genes]
Fendt et al. 2010.
“Unraveling Condition-Dependent Networks of Transcription Factors That Control Metabolic Pathway Activity in Yeast.” Molecular Systems Biology 6 (1)

EGRIN FDRs Modify PROM
[Figure: example network in which Factor 1 regulates Genes A to E, whose Enzymes 1 to 4 act on metabolites M I to M V. Legend: PROM P(Target=on | TF=off); EGRIN direct (activates); EGRIN direct (inhibits). Annotations on the reactions: FDR = 0.005 (P = 0.005) with bound 0.005 × Vmax; P = 0.97 with bound 0.97 × Vmax; P = 1.00 with bound Vmax]

Correlation between measured growth and predicted growth when TFs are deleted, using Yeast 6.06
Glucose minimal media
Integrative model                        Correlation   P-value   Sum of squared error   Normalized sum of squared error / permutation P-value
YEASTRACT-PROM_Y6_TF90                   0.2110        0.0459    4.298                  0.205 / 0.029
YEASTRACT-PROM_Y6_TF51                   0.1019        0.4723    3.566                  0.249 / 0.144
EGRIN-PROM_Y6_DirectPrior_otherParray    0.4183        0.0020    2.481                  0.118 / 0.004
EGRIN-PROM_Y6_DirectPrior_otherP=1       0.4325        0.0014    2.506                  0.121 / 0.003
Fendt et al. 2010. Molecular Systems Biology

EGRIN_PROM has a higher Matthews correlation coefficient than Yeastract_PROM
[Figure: MCC between predicted and measured growth ratio (y-axis) vs. threshold of growth ratio for sick (x-axis, 0.1 to 0.95). Series: YEASTRACT-PROM_Y6_TF90, YEASTRACT-PROM_Y6_TF51, EGRIN-PROM_Y6_DirectPrior_otherP=1, EGRIN-PROM_Y6_DirectPrior_otherParray]
Permutation test with 500 random regulatory association and expression datasets of the same size: all p < 0.05.

How to compare prediction with experiment?
• True Positive (TP): a positive sample that the model predicts as positive
• True Negative (TN): a negative sample that the model predicts as negative
• False Positive (FP): a negative sample that the model predicts as positive
• False Negative (FN): a positive sample that the model predicts as negative

Metric
• Sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
• Precision (positive predictive value) = TP / (TP + FP)
• Negative predictive value = TN / (TN + FN)
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Matthews correlation coefficient: $\mathrm{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$

Integrative model can predict growth change better than only metabolic network
[Figure panels: threshold=0.5, threshold=0.2]

Other media
(Norm. SSE = normalized sum of squared error; Perm. p = permutation p-value; MCC columns are labeled by the growth-ratio threshold for sick.)

Galactose with ammonium medium
Integrative model                     Pearson corrcoef   p-value   Norm. SSE   Perm. p   MCC 0.95   MCC 0.5   MCC 0.2
YEASTRACT-PROM                        0.162              0.126     0.339       0.064     0.039      0.132     -0.158
EGRIN-PROM_DirectPrior_otherPArray    0.227              0.106     0.196       0.058     0.010      0.111     0.312
EGRIN-PROM_DirectPrior_otherP=1       0.288              0.038     0.182       0.025     0.308      0.146     0.347

Glucose with urea medium
Integrative model                     Pearson corrcoef   p-value   Norm. SSE   Perm. p   MCC 0.95   MCC 0.5   MCC 0.2
YEASTRACT-PROM                        0.188              0.075     0.213       0.040     0.093      0.096     0.009
EGRIN-PROM_DirectPrior_otherPArray    0.294              0.034     0.158       0.027     0.104      0.369     0.077
EGRIN-PROM_DirectPrior_otherP=1       0.308              0.026     0.162       0.023     0.123      0.369     0.077

Overall MCC for three media
[Figure: overall MCC (y-axis) vs. threshold of growth ratio for sick (x-axis, 0.1 to 0.95). Series: Yeastract_PROM_Y6_TF90, Yeastract_PROM_Y6_TF51, EGRINPROM_Y6_DirectPrior_otherP=1, EGRINPROM_Y6_DirectPrior_otherParray]

Different metabolic models
Glucose minimal media
Group        Model                    Correlation coefficient   p-value
Control      YEASTRACT-PROM Yeast6    0.2110                    0.0459
Control      YEASTRACT-PROM Yeast7    0.1747                    0.0995
Control      YEASTRACT-PROM iMM904    0.2128                    0.0441
Experiment   EGRIN-PROM Yeast6        0.4325                    0.0014
Experiment   EGRIN-PROM Yeast7        0.2724                    0.0507
Experiment   EGRIN-PROM iMM904        0.3689                    0.0071

Comparison                        Δ Correlation   P-value
T-Test (2 Tail, Paired)           0.1584          0.047348
Aggregate (Fisher’s Transform)    0.2972          0.001002
By Samuel A. Danziger

Comparison of MCC by setting different threshold for sick
[Figure: MCC (y-axis) for YEASTRACT52_PROM, YEASTRACT90_PROM, EGRIN_PROM_Y7, EGRIN_PROM_iMM904, and EGRIN_PROM_Y6. Series: MCC_essential0.2, MCC_essential0.5, MCC_essential0.95]

EGRIN-PROM for different metabolic models
[Figure panels: sick: grRatio<0.5; sick: grRatio<0.2]
TF labels from the figure: YAL051W YBL005W YBL021C YBL103C YBR049C YBR083W YBR182C YCL055W YCR065W YDL020C YDL056W YDL170W YDR034C YDR123C YDR146C YDR207C YDR216W YDR253C YDR259C YDR423C YEL009C YER040W YER111C YFL021W YFL031W YFR034C YGL013C YGL035C YGL071W YGL073W YGL166W YGL209W YGL237C YGL254W YGR044C YHR084W YHR124W YHR178W YHR206W YIL036W YIL101C YIL131C YIR023W YJL056C YJL089W YJL110C YJR060W YJR094C YKL015W YKL038W YKL062W YKL109W YKL112W YKL185W YKR034W YKR064W YLR014C YLR098C YLR131C YLR176C YLR228C YLR256W YLR451W YML007W YML099C YMR021C YMR037C YMR042W YMR043W YMR070W YMR280C YNL027W YNL068C YNL103W YNL167C YNL204C YNL216W YNL314W YOL028C YOL067C YOL108C YOR028C YOR140W YOR337W YOR358W YOR363C YPL075W YPL089C YPL248C YPR065W YPR199C

Double deletion of TF and metabolic genes
[Figure: growth ratio on a color scale from 0 to 1]

Highly synthetic lethal TF and metabolic genes
TF         Metabolic gene   Double KO grRatio   SingleGene KO grRatio   SingleTF KO grRatio
YOR363C    YJR019C          0.000               1.000                   1.000
YOR028C    YPL059W          0.000               1.000                   0.994
YOR028C    YMR170C          0.000               1.000                   0.994
YAL051W    YBR221C          0.000               1.000                   0.984
YAL051W    YER178W          0.000               1.000                   0.984
YAL051W    YGR193C          0.000               1.000                   0.984
YAL051W    YNL071W          0.000               1.000                   0.984
YAL051W    YFL018C          0.000               0.989                   0.984
YLR228C    YJR057W          0.005               1.000                   1.000

Synthetic essential TFs (TF, name, function annotation)
• YOR363C, PIP2: Oleate-specific transcriptional activator of peroxisome proliferation; binds oleate response elements (OREs), activates beta-oxidation genes
• YOR028C, CIN5: Basic leucine zipper (bZIP) transcription factor of the yAP-1 family; mediates pleiotropic drug resistance and salt tolerance
• YAL051W, OAF1: Oleate-activated transcription factor, acts alone and as a heterodimer with Pip2p; activates genes involved in beta-oxidation of fatty acids and peroxisome organization and biogenesis
• YLR228C, ECM22: Sterol regulatory element binding protein; regulates transcription of sterol biosynthetic genes

PHA-based plastics are attractive industrial products: small and large companies currently produce bacterial PHA.
• In 2004, Procter & Gamble (US) and Kaneka Corporation (Japan) announced a joint development agreement to complete the R&D leading to the commercialization of Nodax, a large range of polyhydroxybutyrate-co-hydroxyalkanoates (PHBHx, PHBO, PHBOd). Large-scale industrial production was planned with a target price of around 2 €/kg.
• Tianan, a Chinese company, announced that it would increase its capacity from the current 2,000 tons to 10,000 tons/year.
• The Dutch chemical company DSM announced that it would invest in a PHA plant together with a Chinese bio-based plastics company, Tianjin Green Bio-Science Co. The company will start up production of PHA with an annual capacity of 10,000 tons.
• The Japanese company Kaneka announced that it is ready to produce 1,000,000 tons/year of PHBHx in 2013.

Regulatory network relevant to important TFs for cytosolic PHA synthesis
• PIP2, CAT8: known regulators of the peroxisome
• SIP4, MSN4: top candidate peroxisome regulators from EGRIN

Conclusion
• Integrating the TRN and the MN predicts growth rate changes for gene knockouts better than using the MN alone.
• The integrative EGRIN_PROM model predicts metabolic phenotype better than Yeastract_PROM.
• The prediction is further improved by accounting for the activator or inhibitor status of TFs, and by using the EGRIN FDR to guide the probability between a TF and its target genes.
• New regulators are found across different conditions, and half of the essential regulators for PHA synthesis are top peroxisome candidates.
(experimental validation)

Further work
• Use EGRIN to calculate the target gene expression for each TF KO, then set constraints on PROM or iMAT
• Prediction for carbon-nitrogen interaction
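The "EGRIN FDRs Modify PROM" slide and the "Starting Assumption" (FDR = 0.46 means 54% of the target is controlled by the TF) suggest a simple rule for the flux bound of a reaction once its regulator is deleted. The following is a minimal sketch of one reading of that rule, not the deck's actual implementation. The function name and parameters are hypothetical, and the treatment of inhibitors (bound stays at Vmax) is an assumption based on the "P = 1.00" annotation in the figure.

```python
def knockout_flux_bound(vmax, *, egrin_fdr=None, is_activator=None, prom_prob=None):
    """Upper flux bound for a reaction after the regulating TF is deleted.

    Hypothetical helper; the rule is one reading of the slide, not a known API.
    """
    if egrin_fdr is not None:
        # Direct EGRIN interaction between the TF and the enzyme's gene.
        if is_activator:
            # Starting assumption from the deck: (1 - FDR) of the target is
            # controlled by the TF, so only the FDR fraction survives the KO.
            return egrin_fdr * vmax
        # Assumed: deleting an inhibitor leaves the target fully on (P = 1.00).
        return vmax
    if prom_prob is not None:
        # No direct EGRIN prior: use PROM's P(Target=on | TF=off).
        return prom_prob * vmax
    # Reaction not regulated by this TF: bound unchanged.
    return vmax


# Values taken from the figure annotations, with an arbitrary Vmax of 10.
print(knockout_flux_bound(10.0, egrin_fdr=0.005, is_activator=True))
print(knockout_flux_bound(10.0, prom_prob=0.97))
print(knockout_flux_bound(10.0, egrin_fdr=0.005, is_activator=False))
```

The model names "DirectPrior_otherP=1" and "DirectPrior_otherParray" appear to correspond to passing `prom_prob=1.0` or an expression-derived probability for the non-direct interactions, although the deck does not spell this out.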
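The metrics listed under "How to compare prediction with experiment?" can be computed from growth ratios once a threshold for "sick" is chosen. The sketch below assumes that "sick" (growth ratio below the threshold) is the positive class, which the slides imply but do not state. The function names and the toy growth ratios are invented for illustration.

```python
import math


def confusion_counts(measured, predicted, threshold):
    """Count TP, TN, FP, FN, calling a strain sick when grRatio < threshold."""
    tp = tn = fp = fn = 0
    for m, p in zip(measured, predicted):
        actually_sick = m < threshold
        called_sick = p < threshold
        if actually_sick and called_sick:
            tp += 1
        elif not actually_sick and not called_sick:
            tn += 1
        elif called_sick:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, precision, NPV, accuracy and MCC."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "precision": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        # Common convention: MCC is reported as 0 when the denominator is 0.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Invented growth ratios for six hypothetical TF knockouts.
measured = [0.1, 0.9, 0.3, 1.0, 0.95, 0.4]
predicted = [0.2, 1.0, 0.8, 0.9, 0.3, 0.1]
counts = confusion_counts(measured, predicted, threshold=0.5)
print(counts, summary_metrics(*counts))
```

Sweeping `threshold` over 0.1 to 0.95 and plotting the "mcc" entry would give a curve of the kind shown in the MCC figures.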
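The "Different metabolic models" table reports a Δ correlation of 0.1584 with a paired two-tailed t-test p-value of 0.047348. As a check on how such a number arises, the sketch below recomputes a paired t-test from the six correlation coefficients in that table, assuming YEASTRACT-PROM is the control group and EGRIN-PROM the experiment group. It uses the closed-form Student t distribution for 2 degrees of freedom, so it only handles exactly three matched pairs. The function name is hypothetical.

```python
import math

# Correlation coefficients copied from the "Different metabolic models" table.
yeastract_prom = {"Yeast6": 0.2110, "Yeast7": 0.1747, "iMM904": 0.2128}
egrin_prom = {"Yeast6": 0.4325, "Yeast7": 0.2724, "iMM904": 0.3689}


def paired_t_test_three_pairs(control, experiment):
    """Return (mean difference, t statistic, two-tailed p) for 3 matched pairs."""
    keys = sorted(control)
    if len(keys) != 3 or sorted(experiment) != keys:
        raise ValueError("this sketch only handles three matched pairs")
    diffs = [experiment[k] - control[k] for k in keys]
    mean_diff = sum(diffs) / 3
    sample_var = sum((d - mean_diff) ** 2 for d in diffs) / 2
    if sample_var == 0:
        raise ValueError("differences are identical; t statistic is undefined")
    t_stat = mean_diff / math.sqrt(sample_var / 3)
    # With 2 degrees of freedom the Student t CDF is
    # 1/2 + t / (2 * sqrt(2 + t^2)), so the two-tailed p-value
    # simplifies to 1 - |t| / sqrt(2 + t^2).
    p_two_tailed = 1.0 - abs(t_stat) / math.sqrt(2.0 + t_stat ** 2)
    return mean_diff, t_stat, p_two_tailed


mean_diff, t_stat, p_value = paired_t_test_three_pairs(yeastract_prom, egrin_prom)
print(f"delta correlation = {mean_diff:.4f}, t = {t_stat:.3f}, p = {p_value:.6f}")
```

Run on the table's values, the mean difference and p-value come out close to the 0.1584 and 0.047348 reported on the slide. The "Aggregate (Fisher’s Transform)" row is not reproduced here because the deck does not say how it was computed.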
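The "Highly synthetic lethal TF and metabolic genes" table can be read as the output of a simple filter: the double knockout barely grows while each single knockout grows almost normally. Here is a minimal sketch of such a filter. The cutoffs (0.01 and 0.95) and the function name are assumptions chosen for illustration; the deck does not state which thresholds were used.

```python
# Rows copied from the table:
# (TF, metabolic gene, double KO grRatio, single gene KO grRatio, single TF KO grRatio)
rows = [
    ("YOR363C", "YJR019C", 0.000, 1.000, 1.000),
    ("YOR028C", "YPL059W", 0.000, 1.000, 0.994),
    ("YOR028C", "YMR170C", 0.000, 1.000, 0.994),
    ("YAL051W", "YBR221C", 0.000, 1.000, 0.984),
    ("YAL051W", "YER178W", 0.000, 1.000, 0.984),
    ("YAL051W", "YGR193C", 0.000, 1.000, 0.984),
    ("YAL051W", "YNL071W", 0.000, 1.000, 0.984),
    ("YAL051W", "YFL018C", 0.000, 0.989, 0.984),
    ("YLR228C", "YJR057W", 0.005, 1.000, 1.000),
]


def is_synthetic_lethal(double_ko, gene_ko, tf_ko, lethal_cutoff=0.01, healthy_cutoff=0.95):
    """True when the double KO is near-lethal but both single KOs grow well.

    Both cutoffs are hypothetical defaults, not values from the deck.
    """
    return (
        double_ko <= lethal_cutoff
        and gene_ko >= healthy_cutoff
        and tf_ko >= healthy_cutoff
    )


# Group the metabolic genes that are synthetic lethal with each TF.
partners = {}
for tf, gene, double_ko, gene_ko, tf_ko in rows:
    if is_synthetic_lethal(double_ko, gene_ko, tf_ko):
        partners.setdefault(tf, []).append(gene)

for tf, genes in partners.items():
    print(tf, genes)
```

Requiring both single knockouts to be healthy separates genuine TF-gene interactions from cases where the metabolic gene is already essential on its own.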