Some Algorithms for Inferring Gene Regulatory Networks Using Gene Expression Data

Changiz Eslahchi
Faculty of Mathematical Sciences, Department of Computer Science, Shahid Beheshti University, G.C., Tehran, Iran.
March, 2015

Outline
1. Challenge
2. Bayesian Network
3. Algorithms
4. Conclusion

Gene Expression
[figure]

Challenge: Inferring gene regulatory networks
Inferring gene regulatory networks (GRNs) is a major issue in systems biology; a GRN explicitly characterizes the regulatory processes in the cell.

Gene Regulatory Network
Differential Equation (Novak et al., 1998)
Stochastic Petri Net (Goss et al., 1999)
Boolean Network (Liang et al., 1998)
Regression Method (Gardner et al., 2003)
Bayesian Network (Friedman, 2004)
Linear Programming (Wang et al., 2006)
And many others!

Independence
Recall the following equivalent characterizations of independence, X ⊥⊥ Y:
P(X | Y) = P(X)
P(Y | X) = P(Y)
P(X, Y) = P(X) P(Y)
Intuitively, if X ⊥⊥ Y, then knowledge of X provides no information about Y.

Conditional Independence
Characterizations of conditional independence, X ⊥⊥ Y | Z:
P(X | Y, Z) = P(X | Z)
P(X, Y | Z) = P(X | Z) P(Y | Z)
P(X, Y, Z) = P(X, Z) P(Y, Z) / P(Z)
Intuitively, if X ⊥⊥ Y | Z, then once Z is known, X provides no further knowledge of Y, and Y provides no further knowledge of X.

Entropy
The entropy function is a suitable tool for measuring the average uncertainty of a variable X:
H(X) = E[I(X)] = -\sum_{i=1}^{n} P(X = x_i) \log P(X = x_i)
H(X, Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} \log p_{ij}
H(Y | X = i) = -\sum_{j=1}^{m} p_i(j) \log p_i(j)
H(Y | X) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} \log p_i(j)
H(Y | X) = H(X, Y) - H(X),  H(X | Y) = H(X, Y) - H(Y)

Mutual Information
MI(X, Y) = \sum_{x} \sum_{y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)}
MI(X, Y) = H(X) + H(Y) - H(X, Y)
MI(X, Y) = H(X) - H(X | Y) = H(Y) - H(Y | X)
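These identities are easy to check numerically. A minimal MATLAB sketch (toy data and helper names are illustrative, not part of the talk's released software) that estimates H and MI from discrete samples via the identity MI(X,Y) = H(X) + H(Y) - H(X,Y):

```matlab
function demo_entropy_mi
% Demo: MI(X,Y) = H(X) + H(Y) - H(X,Y) on discrete toy data.
x = randi(3, 1000, 1);                    % X uniform on {1,2,3}
y = x + randi(2, 1000, 1);                % Y depends on X, so MI > 0
fprintf('H(X)=%.3f  H(Y)=%.3f  MI=%.3f bits\n', ...
        empirical_entropy(x), empirical_entropy(y), empirical_mi(x, y));
end

function h = empirical_entropy(x)
% Plug-in estimate: H(X) = -sum_i p_i log2 p_i over the observed symbols.
[~, ~, idx] = unique(x);
p = accumarray(idx, 1) / numel(x);        % relative frequencies
h = -sum(p .* log2(p));
end

function m = empirical_mi(x, y)
% Joint entropy is obtained by giving each (x,y) pair a single label.
[~, ~, jointidx] = unique([x y], 'rows');
m = empirical_entropy(x) + empirical_entropy(y) - empirical_entropy(jointidx);
end
```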
Conditional Mutual Information
The conditional mutual information between X and Y given Z is defined by:
CMI(X, Y | Z) = \sum_{Z} P(Z) \sum_{X, Y} P(X, Y | Z) \log_2 \frac{P(X, Y | Z)}{P(X | Z) P(Y | Z)}

Directed Acyclic Graphs (DAGs)
The graph is acyclic because we do not permit directed cycles A → ··· → A.
If there is an edge A → B, then A is said to be a parent of B, and B a child of A. Two vertices joined by an edge are said to be adjacent.
The set of parents of a vertex A is denoted pa(A). Example: pa(D) = {B, C}.
We use X to denote the set of all vertices in the DAG.

Example
pa(B) = {A},
an(E) = {A, B, C, D}, an(D) = {A, B, C},
de(A) = {B, C, D, E}, de(B) = {D, E}.

Complete DAGs
A DAG is complete if every pair of vertices is adjacent.

Bayesian Network
The term "Bayesian networks" was coined by Judea Pearl in 1985. A Bayesian network (BN) is a graphical representation of a joint probability distribution that consists of two components [20, 13, 26]:
▶ First, a directed acyclic graph (DAG) G = (X, E_G), where X = {X_1, ..., X_n}.
▶ Second, a set of numerical parameters, which usually represent conditional probability distributions.

Definitions and Theorems
Definition. A DAG G has the Markov property if each variable is conditionally independent of its non-descendants given its parent variables [20, 13, 26]. For example, based on this definition, in the example DAG above (where pa(D) = {B, C}) we have P(D | A, B, C) = P(D | B, C). The joint distribution then factorizes as
P(X_1, X_2, ..., X_n) = \prod_{X_i \in X} P(X_i | pa(X_i)).

Paths and Colliders
A path p from X to Y in G is said to be blocked by a set of variables Z if and only if [20, 13, 26]:
▶ p contains a chain X → K → Y or a fork X ← K → Y such that K ∈ Z, or
▶ p contains a collider X → K ← Y such that neither K nor any descendant of K is in Z.
A set Z is said to d-separate X from Y in G if and only if Z blocks every path from X to Y.
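The parent, ancestor and descendant sets used above (descendants also enter the collider condition) are simple to compute from an adjacency matrix. A minimal sketch, assuming the edge set A→B, A→C, B→D, C→D, D→E, which is consistent with the example sets listed earlier:

```matlab
% pa/an/de from a DAG adjacency matrix (assumed example graph).
names = {'A','B','C','D','E'};
G = zeros(5);                       % G(i,j) = 1 means i -> j
G(1,2) = 1; G(1,3) = 1; G(2,4) = 1; G(3,4) = 1; G(4,5) = 1;

pa = @(v) names(G(:, v) == 1);               % direct parents
R  = (eye(5) + G)^5 > 0;                     % reachability via paths <= n
an = @(v) names(R(:, v)' & (1:5) ~= v);      % ancestors of v
de = @(v) names(R(v, :) & (1:5) ~= v);       % descendants of v

disp(pa(4))   % {'B','C'}          = pa(D)
disp(an(5))   % {'A','B','C','D'}  = an(E)
disp(de(2))   % {'D','E'}          = de(B)
```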
Paths and Colliders
A path p between vertices X and Y consists of a sequence of distinct vertices in which consecutive vertices are adjacent. For example:
X → A ← B ← C → D → Y
A non-endpoint vertex V on a path (between X and Y) is said to be a collider if the path takes the form X ··· → V ← ··· Y. Non-endpoints that are not colliders are called non-colliders:
X ··· ← V → ··· Y,  X ··· → V → ··· Y,  X ··· ← V ← ··· Y

Subtlety regarding colliders
A vertex is a collider or a non-collider only with respect to a given path; a vertex may be a collider on one path and a non-collider on another (even between the same endpoints). Here Z is a collider on the path X → Z ← H → Y, but a non-collider on the path X → Z → Y. Grammatically: V is a collider with respect to the path.

Separation for sets of variables
So far we have considered separation between single variables X, Y given a set Z. We extend this to sets of variables as follows: a set X is d-separated from a set Y given Z iff for all X ∈ X and Y ∈ Y, X and Y are d-separated given Z. We write S_{XY|Z} to denote that X is d-separated from Y given Z.

Learning Bayesian Networks
Learning a Bayesian network comprises two tasks:
Structure learning
Parameter learning

Structure learning
Constraint-Based Methods: constraint-based methods apply conditional independence (CI) tests to determine the dependencies between vertices (a sketch of a typical CI test follows below).
Score-and-Search Methods: score-based search methods produce a series of candidate networks, calculate a score for each candidate, and return a candidate with the highest score.
Hybrid Methods: a hybrid method is a combination of these two approaches.

Structure learning: examples
Constraint-Based Methods: the SGS algorithm [29], FCI algorithm [30], TPDA algorithm [10], PC algorithm [29], IC algorithm [27], QFCI algorithm [3], GPC algorithm [28], and the PCA-CMI algorithm [34].
Score-and-Search Methods: the AIC algorithm [2], BDeu algorithm [6], K2 algorithm [12], BD algorithm [18], BIC algorithm [19], MIT algorithm [13], and the K2GA algorithm [15].
Hybrid Methods: Suzuki's algorithm [7], the HGC algorithm [9], and the MMHC algorithm [32].
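For Gaussian data, a common CI test of the kind constraint-based methods rely on is Fisher's z-test on the partial correlation. A minimal sketch (hypothetical helper name; `corr` and `norminv` assume the Statistics Toolbox):

```matlab
% CI test sketch: Fisher's z-test of the partial correlation of
% variables x and y given the set m, under a Gaussian assumption.
% D: samples-by-variables data matrix; x, y: column indices;
% m: vector of conditioning column indices; alpha: significance level.
function indep = ci_test(D, x, y, m, alpha)
m = m(:)';                                 % force row vector
C = corr(D(:, [x y m]));                   % correlation submatrix
P = inv(C);                                % precision matrix
r = -P(1, 2) / sqrt(P(1, 1) * P(2, 2));    % partial corr(x, y | m)
n = size(D, 1);
z = 0.5 * log((1 + r) / (1 - r)) * sqrt(n - numel(m) - 3);
indep = abs(z) < norminv(1 - alpha / 2);   % true: independence not rejected
end
```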
Constraint-Based Methods
The Path Consistency (PC) algorithm was introduced by Spirtes et al. in 1993 as a constraint-based method for inferring BNs.

PC Algorithm
Algorithm [PC]
1. Start with a complete undirected graph S_{-1}
2. i = 0 (i denotes the order of the algorithm, i.e., the size of the separating sets)
3. Repeat
4.   For each X ∈ X
5.     For each Y ∈ ADJ(X)
6.       Determine whether there is M ⊆ ADJ(X) or M ⊆ ADJ(Y) with |M| = i such that X and Y are independent given M
7.       If such a set exists
8.         Remove the edge between X and Y from S_{i-1}
9.   i = i + 1
10. Until max{|ADJ(X)|, |ADJ(Y)|} ≤ i + 1 for all adjacent X, Y

Example of PC Algorithm: i = 0
No pair of variables is d-separated given ∅, so the graph is unchanged.

Example of PC Algorithm: i = 1
[figure]

Example of PC Algorithm: i = 2
[figure]

Orientation Rules
1. If we have X − Z − Y but no edge between X and Y, and Z ∉ S_{XY}, then orient as X → Z ← Y.
2. Apply the following rules:
(a) If X and Z are not adjacent but X → Y − Z, then orient as Y → Z.
(b) If X → Y → Z and X − Z, then orient as X → Z.
(c) If X and Z are not adjacent, but X − W − Z, X → Y ← Z, and W − Y, then orient as W → Y.

Example of PC Algorithm
[figure]
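A compact sketch of the skeleton phase above, reusing the ci_test helper sketched earlier (edge orientation omitted; as a simplification, the conditioning sets are drawn here from the union of the two current neighbourhoods):

```matlab
% PC skeleton (sketch): start from the complete graph and remove an
% edge X-Y whenever some conditioning set M of size i, drawn from the
% current neighbours of X or Y, makes them conditionally independent.
function [S, sepset] = pc_skeleton(D, alpha)
n = size(D, 2);
S = ~eye(n);                              % complete undirected graph S_{-1}
sepset = cell(n);                         % separating sets, for rule 1 later
i = 0;                                    % order of the algorithm
cont = true;
while cont
    cont = false;
    for x = 1:n
        for y = find(S(x, :))
            if ~S(x, y), continue; end    % edge removed earlier this pass
            nbrs = setdiff(find(S(x, :) | S(y, :)), [x y]);
            if numel(nbrs) < i, continue; end
            cont = true;                  % a test of this order was possible
            if i == 0
                subsets = zeros(1, 0);    % the single empty set
            else
                subsets = nchoosek(nbrs, i);
            end
            for s = 1:size(subsets, 1)
                M = subsets(s, :);
                if ci_test(D, x, y, M, alpha)
                    S(x, y) = 0; S(y, x) = 0;
                    sepset{x, y} = M;     % the Z with X indep. of Y given Z
                    break
                end
            end
        end
    end
    i = i + 1;
end
end
```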
Score-and-Search Methods
Scoring function: measures the degree of fitness of a candidate structure to the data set.
Search method:
G* = argmax_{G ∈ F(n)} g(G : D),
in which g(G : D) denotes the degree of fitness of candidate G to the data set and F(n) is the set of all possible DAGs defined on X. The challenging part of the search procedure is that the size of the space of all structures, f(n), is super-exponential in the number of variables [26].

Scoring Functions
There are many scoring functions for measuring the degree of fitness of a DAG G to a data set. They are generally classified as:
Bayesian scoring functions [6, 18, 22]
Information-theory-based scores [11, 24, 4, 16, 13]

Search Methods
The definition of the search space determines the search operators used to move from one structure to another. In turn, these operators determine the neighborhood of a DAG, namely the DAGs that can be reached in one step from the current DAG. Typically, the operators are:
arc addition: insert a single arc between two nonadjacent nodes.
arc deletion: remove a single arc between two nodes.
arc reversal: reverse the direction of a single arc.

Search Methods
In what follows we let op(S, A) denote the result of performing the arc operation A on the structure S; i.e., op(S, A) is a DAG that differs from S with respect to one arc. We define
Δ(X_i → X_j) = score(X_j, pa(X_j) ∪ {X_i}, D) − score(X_j, pa(X_j), D).

The Greedy Search Algorithm
Algorithm [Greedy search]
1. Let S be an initial structure.
2. Repeat
   (a) Calculate Δ(A) for each legal arc operation A. Let Δ* = max_A Δ(A) and A* = argmax_A Δ(A).
   (b) If Δ* > 0, then set S = op(S, A*).
3. Until Δ* ≤ 0.

The Greedy Search Algorithm
[figure]
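A minimal hill-climbing sketch restricted to arc additions (deletion and reversal are analogous); `score_fn` is an assumed handle to any decomposable local score, such as the MIT score introduced next:

```matlab
% Greedy structure search (sketch), arc additions only for brevity.
% score_fn(j, parents, D) is an assumed handle returning the local
% score of node j with the given parent set on data D.
function G = greedy_search(D, score_fn)
n = size(D, 2);
G = zeros(n);                                  % start from the empty DAG
improved = true;
while improved
    improved = false;
    best = 0; bi = 0; bj = 0;
    for i = 1:n
        for j = 1:n
            % legal addition: no self-loop, no existing arc, no cycle
            if i == j || G(i, j) || creates_cycle(G, i, j), continue; end
            pa = find(G(:, j))';
            delta = score_fn(j, [pa i], D) - score_fn(j, pa, D);
            if delta > best
                best = delta; bi = i; bj = j;
            end
        end
    end
    if best > 0                                % take the best improving move
        G(bi, bj) = 1;
        improved = true;
    end
end
end

function c = creates_cycle(G, i, j)
% Adding i -> j creates a directed cycle iff j can already reach i.
n = size(G, 1);
R = (eye(n) + G)^n > 0;                        % reachability matrix
c = R(j, i);
end
```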
MIT Score
The MIT score is a well-known information-theory-based score that contains two terms: a mutual-information term and a penalization term:
g_{MIT}(G, D) = \sum_{i=1, Pa_G(X_i) \neq \emptyset}^{n} \Big( 2N \, MI_D(X_i, Pa_G(X_i)) - \max_{\sigma_i} \sum_{j=1}^{s_i} \chi_{\alpha, l_{i\sigma_i(j)}} \Big).   (1)

Algorithms
In this work we introduce three algorithms for learning GRNs:
1. Zhang, Xiujun, et al. "Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information." Bioinformatics 28.1 (2012): 98-104.
2. Aghdam, Rosa, Mojtaba Ganjali, and Changiz Eslahchi. "IPCA-CMI: An Algorithm for Inferring Gene Regulatory Networks based on a Combination of PCA-CMI and MIT Score." PloS One 9.4 (2014): e92600.
3. Aghdam, Rosa, Mojtaba Ganjali, Xiujun Zhang, and Changiz Eslahchi. "CN: a consensus algorithm for inferring gene regulatory networks using the SORDER algorithm and conditional mutual information test." Molecular BioSystems (2015).

Information Theory
Let X = (X_1, ..., X_n)^T be an n-dimensional Gaussian vector with mean µ = (µ_1, ..., µ_n)^T and covariance matrix C(X) = E[(X − µ)(X − µ)^T], i.e., X ∼ N(µ, C(X)). Then the MI of two continuous variables X and Y can easily be calculated by the following formula:
MI(X, Y) = \frac{1}{2} \log \frac{\sigma_X^2 \sigma_Y^2}{|C(X, Y)|},   (2)
The conditional mutual information of two continuous variables X and Y given Z is determined by:
CMI(X, Y | Z) = \frac{1}{2} \log \frac{\det(C(X, Z)) \cdot \det(C(Y, Z))}{\det(C(Z)) \cdot \det(C(X, Y, Z))}.   (3)

PC Algorithm based on the CMI test (PCA-CMI)
Algorithm [PCA-CMI]
1. Start with a complete undirected graph S_{-1}
2. i = 0
3. Repeat
4.   For each X ∈ X
5.     For each Y ∈ ADJ(X)
6.       Determine whether there is M ⊆ V_{XY} = ADJ(X) ∩ ADJ(Y) with |M| = i such that X and Y are independent given M
7.       If such a set exists
8.         Remove the edge between X and Y from S_{i-1}
9.   i = i + 1
10. Until i > |V_{XY}| for all adjacent pairs X, Y

PC Algorithm based on the CMI test (PCA-CMI)
[figure]

Zero order of the Improved PC Algorithm based on the CMI test
Algorithm [IPCA-CMI, order 0]
1. Start with a complete undirected graph S_{-1}.
2. Repeat
3.   For each X ∈ X
4.     For each Y ∈ ADJ(X)
5.       If X and Y are independent based on the MI measure
6.         Remove the edge between X and Y from S_{-1}
7. The MIT score is used in the HC (hill-climbing) algorithm to construct G_0.

The IPCA-CMI for i > 0
The sets A^q_Z for 1 ≤ q ≤ 4 are defined as follows:
A^1_Z = {W | X → Z → W}, A^2_Z = {W | X ← Z ← W}, A^3_Z = {W | X ← Z → W}, A^4_Z = {W | X → Z ← W}.
Weight_X(Z) = |A^1_Z| + |A^2_Z| + |A^3_Z| − |A^4_Z|.
Let R_{XY} be defined by:
R_{XY} = {Z | Weight_X(Z) ≥ k or Weight_Y(Z) ≥ k, ∀Z ∈ {ADJ(X) ∪ ADJ(Y)} \ {X, Y}}.

The IPCA-CMI for i > 0
Algorithm [IPCA-CMI for i > 0]
1. Start with G_0
2. i = 1
3. Repeat
4.   For each X ∈ X
5.     For each Y ∈ ADJ(X)
6.       Test whether there exists H ⊆ R_{XY} with |H| = i such that X ⊥⊥ Y | H.
7.       If such a set exists
8.         Remove the edge between X and Y from G_{i-1}
9.   The MIT score is used in the HC algorithm to direct the structure.
10.  For each Z ∈ {ADJ(X) ∪ ADJ(Y)} \ {X, Y}
11.    Weight_X(Z) = |A^1_Z| + |A^2_Z| + |A^3_Z| − |A^4_Z|
12.    R_{XY} = {Z | Weight_X(Z) ≥ k or Weight_Y(Z) ≥ k, ∀Z ∈ {ADJ(X) ∪ ADJ(Y)} \ {X, Y}}
13.  i = i + 1
14. Until i > |R_{XY}|
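The independence tests in PCA-CMI and IPCA-CMI are thresholded CMI values computed from Eq. (3). A minimal sketch (illustrative function name; the released MATLAB code is linked later); an edge test then compares the returned value against a user-chosen threshold, treating values below it as conditional independence:

```matlab
% Gaussian CMI of Eq. (3). D: samples-by-genes matrix; x, y: column
% indices; z: (possibly empty) vector of conditioning columns.
% With z = [], det of an empty matrix is 1 in MATLAB, so the value
% reduces to the Gaussian MI of Eq. (2).
function cmi = gauss_cmi(D, x, y, z)
z = z(:)';                                 % force row vector
C = cov(D);                                % sample covariance matrix
cmi = 0.5 * log( det(C([x z], [x z])) * det(C([y z], [y z])) ...
               / (det(C(z, z)) * det(C([x y z], [x y z]))) );
end
```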
Data Set
The Dialogue for Reverse Engineering Assessments and Methods (DREAM) project was introduced in 2006. DREAM provides data sets that allow researchers to evaluate the performance of their approaches in predicting biological networks. The goal of the in silico network challenge is to reverse-engineer gene regulatory networks from simulated steady-state and time-series data. The true structure of biological gene networks is in general unknown or only partly known, but the structures of in silico networks are known, so predictions can be validated.

Data Set
The data sets can be downloaded from http://www.the-dream-project.org/challenges

Data Set
DREAM3 In Silico, sizes 10 and 50.
[figure]

Evaluation
To assess the validity of the algorithms, the True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) counts of the proposed algorithms are computed. In addition, standard measures such as ACCuracy (ACC), False Discovery Rate (FDR), Positive Predictive Value (PPV), F-score (F-measure), Matthews Correlation Coefficient (MCC) and True Positive Rate (TPR) are considered. These measures are defined by:
TPR = TP / (TP + FN),  SPC = TN / (FP + TN),
PPV = TP / (TP + FP),  NPV = TN / (TN + FN),
ACC = (TP + TN) / (TP + FP + TN + FN),  FPR = FP / (FP + TN),

Evaluation
FDR = FP / (FP + TP),  F = 2 · PPV · TPR / (PPV + TPR),
MCC = (TP · TN − FP · FN) / \sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}.
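A minimal sketch computing these measures from a confusion matrix (the four counts below are illustrative):

```matlab
% Evaluation measures from TP, FP, TN, FN (illustrative counts).
TP = 9; FP = 1; TN = 34; FN = 1;

TPR = TP / (TP + FN);                     % sensitivity (recall)
SPC = TN / (FP + TN);                     % specificity
PPV = TP / (TP + FP);                     % precision
NPV = TN / (TN + FN);
ACC = (TP + TN) / (TP + FP + TN + FN);
FPR = FP / (FP + TN);
FDR = FP / (FP + TP);
F   = 2 * PPV * TPR / (PPV + TPR);        % F-measure
MCC = (TP*TN - FP*FN) / sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN));
```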
Evaluation
To evaluate the performance of the algorithms, they are compared with several other popular inference methods, listed in the table below together with the approach used by each.

Table: Gene network inference methods used for comparison
Method | Approach                 | Reference
LASSO  | Regression               | Royal Statistical Society [31]
LP     | Linear programming       | Bioinformatics [33]
PCC    | PCA based on partial CC  | Machine Learning Research [21]
MI     | Mutual information       | Reverse engineering cellular networks [25]
PCCMI  | PCA based on CMI         | Bioinformatics [34]

Result: IPCA-CMI
[Figure: (A) true network; (B) first-order network inferred by PCA-CMI; (C) first-order network inferred by IPCA-CMI.]

Result: IPCA-CMI
Table: Results on simulated and real data sets at order 0
Network  | TP | FP | ACC  | FPR  | FDR  | PPV  | F    | MCC  | TPR
DREAM10  | 9  | 1  | 0.95 | 0.02 | 0.10 | 0.9  | 0.90 | 0.87 | 0.90
DREAM50  | 36 | 54 | 0.92 | 0.05 | 0.6  | 0.4  | 0.43 | 0.39 | 0.46
DREAM100 | 70 | 58 | 0.96 | 0.01 | 0.45 | 0.55 | 0.47 | 0.46 | 0.42
SOS      | 18 | 4  | 0.72 | 0.33 | 0.18 | 0.82 | 0.78 | 0.40 | 0.75

Result: IPCA-CMI
Table: Results on the DREAM3 challenge with 10 genes and 10 samples
Algorithm | TP  | FP | ACC  | FPR  | FDR  | PPV  | F    | MCC  | TPR
PCA1      | 7   | 1  | 0.91 | 0.03 | 0.13 | 0.87 | 0.78 | 0.73 | 0.7
IPCA1     | 8.8 | 0  | 0.98 | 0    | 0    | 1    | 0.94 | 0.93 | 0.88

Result: IPCA-CMI
Table: Results on the DREAM3 challenge with 50 genes and 50 samples
Algorithm | TP   | FP    | ACC  | FPR  | FDR  | PPV  | F    | MCC  | TPR
PCA1      | 24   | 23    | 0.93 | 0.02 | 0.49 | 0.51 | 0.39 | 0.37 | 0.31
PCA2      | 22   | 21    | 0.93 | 0.02 | 0.49 | 0.51 | 0.37 | 0.35 | 0.29
IPCA1     | 28   | 26.5  | 0.94 | 0.02 | 0.48 | 0.51 | 0.43 | 0.4  | 0.36
IPCA2     | 22.9 | 11.78 | 0.95 | 0.01 | 0.48 | 0.52 | 0.38 | 0.42 | 0.3

Result: IPCA-CMI
Table: Results on the DREAM3 challenge with 100 genes and 100 samples
Algorithm | TP    | FP    | ACC   | FPR   | FDR  | PPV  | F    | MCC  | TPR
PCA1      | 49    | 25    | 0.971 | 0.005 | 0.34 | 0.66 | 0.41 | 0.43 | 0.28
PCA2      | 46    | 25    | 0.971 | 0.005 | 0.35 | 0.64 | 0.38 | 0.41 | 0.27
IPCA1     | 53.11 | 29.77 | 0.972 | 0.006 | 0.35 | 0.65 | 0.43 | 0.44 | 0.32
IPCA2     | 46.55 | 15.16 | 0.973 | 0.003 | 0.24 | 0.75 | 0.4  | 0.45 | 0.28

Result: IPCA-CMI
Table: Results on experimental data from Escherichia coli with 9 genes and 9 samples
Algorithm | TP | FP  | ACC  | FPR  | FDR  | PPV  | F    | MCC  | TPR
PCA1      | 18 | 4   | 0.72 | 0.33 | 0.18 | 0.82 | 0.78 | 0.40 | 0.75
IPCA1     | 18 | 1.8 | 0.73 | 0.32 | 0.17 | 0.82 | 0.79 | 0.41 | 0.75

IPCA-CMI
The IPCA-CMI algorithm is written in MATLAB. It takes simulated and real data as input and is applied to learn the skeleton of GRNs. The source code and data sets are available at http://www.bioinf.cs.ipm.ir/software/IPCA-CMI/.

CN: A Consensus Algorithm for Inferring Gene Regulatory Networks Using the SORDER Algorithm and Conditional Mutual Information Test
The PC algorithm is not robust: it yields different network topologies if the gene order is permuted. In addition, its performance depends on the threshold value used for the independence tests.
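This order-dependence is easy to observe with the pc_skeleton sketch from earlier (D is an assumed samples-by-genes expression matrix; 0.05 is an illustrative test level):

```matlab
% Order-dependence check (sketch): compare the PC skeleton obtained
% on the original gene order with the one obtained after a random
% permutation of the columns.
n = size(D, 2);
perm = randperm(n);
S1 = pc_skeleton(D, 0.05);
S2 = pc_skeleton(D(:, perm), 0.05);
S2(perm, perm) = S2;                      % map back to the original labels
fprintf('edges that differ between the two runs: %d\n', nnz(triu(S1 ~= S2)));
```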
SORDER Algorithm
To overcome the first drawback of the PC algorithm, the dependence on the sequential ordering of vertices, the SORDER algorithm is proposed; it selects a suitable sequential ordering of the vertices (genes) based on the degrees of the vertices in S_0 (the skeleton of order 0 produced by PCA-CMI). Node selection in the SORDER algorithm is based on maximum degree: in each step, a node with maximum degree is selected, and the selection process is repeated until all nodes of the graph have been chosen. When several nodes have the same degree in one step, the induced subgraph on these nodes is constructed, and, in a similar manner, nodes within the induced subgraph are selected according to maximum degree.

SORDER Algorithm
[Figure: example of the SORDER algorithm. Applying SORDER to the graph G yields the ordering O = (c, a, b, d, f, e).]

SORDER Algorithm
To investigate the performance of the SORDER algorithm, all permutations of the n nodes are first determined for the networks with a small set of variables (DREAM3 with 10 nodes and the SOS network with 9 nodes). Since the number of possible permutations of the larger networks is too large to enumerate fully, N = 10^5 random permutations of the gene order are selected and PCA-CMI is run on each. For i = 1, ..., N, T_i and T_SORDER are defined as T_i = (TP_i, FP_i) and T_SORDER = (TP_SORDER, FP_SORDER), respectively. Let
X = {i | TP_i > TP_SORDER and FP_i < FP_SORDER}, for i = 1, ..., N.

SORDER Algorithm
Table: Comparison between the SORDER algorithm and random permutations of the nodes. The Exceeding Value (EV) denotes the proportion of the 10^5 random (unordered) permutations that perform better than the SORDER algorithm.
Data Set | DREAM50 | DREAM100
EV value | 0.018   | 0.048
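A minimal sketch of the degree-based ordering described above, under an assumed reading of the selection rule: degrees are recomputed on the subgraph of not-yet-chosen nodes, with one level of tie-breaking shown (the paper's induced-subgraph rule can be applied recursively):

```matlab
% SORDER (sketch): order nodes by degree in the order-0 skeleton;
% ties are broken by degree inside the induced subgraph of the tied
% nodes. A: symmetric 0/1 adjacency matrix of S0.
function order = sorder(A)
remaining = 1:size(A, 1);
order = zeros(1, 0);
while ~isempty(remaining)
    B = A(remaining, remaining);          % subgraph of unchosen nodes
    d = sum(B, 2);
    tied = find(d == max(d));
    if numel(tied) > 1
        [~, k] = max(sum(B(tied, tied), 2));  % degree among tied nodes
        pick = tied(k);
    else
        pick = tied;
    end
    order(end + 1) = remaining(pick);     %#ok<AGROW>
    remaining(pick) = [];
end
end
```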
CN Algorithm
[Figure: diagram of the CN algorithm.]

CNMIT Algorithm
[Figure: diagram of the CNMIT algorithm. (a) Result of the CN algorithm. (b) Values of the MIT score for the k possible DAGs. (c) Result of the CNMIT algorithm.]

Result
[Figure: ROC curves of different methods (CN, PCA-CMI, PCA-PCC, MI, LASSO, LP) for the DREAM3 challenge with 10 nodes, plotting true positive rate (sensitivity) against false positive rate (1 − specificity). The red line is the ROC curve of the CN algorithm, with an AUC of 0.9734.]

Result
[Figure: result of the SORDER and CN algorithms. (A) The true network with 10 nodes and 10 edges. (B) Network inferred by the SPCA-CMI algorithm; the red edge G2-G4 is a false positive, while the edges G1-G2, G3-G5 and G4-G9 are false negatives. (C) Network obtained by the CN algorithm; the red edge G2-G9 is a false positive and G4-G9 is a false negative, but the edges G1-G2 and G3-G5, missed in (B), are successfully recovered.]

Result
[Figure: comparison of TP and FP values of the CN algorithm with other methods for learning (a) the DREAM3 challenge with 10 genes, (b) the DREAM3 challenge with 50 genes, (c) the DREAM3 challenge with 100 genes, and (d) the SOS network with 9 genes; also, comparison of the CNMIT algorithm with different methods for learning (e) the DREAM3 challenge with 50 genes and (f) the DREAM3 challenge with 100 genes. (LA2012: LASSO2012; Gsqrt: GENIE3-FR-sqrt0; Gall: GENIE3-FR-all.)]

Result
[Figure: comparison of the ACC value and F-measure of the CN algorithm with other methods for learning (a) the DREAM3 challenge with 10 genes, (b) the DREAM3 challenge with 50 genes, (c) the DREAM3 challenge with 100 genes, and (d) the SOS network with 9 genes; also, comparison of the CNMIT algorithm with different methods for learning (e) the DREAM3 challenge with 50 genes and 77 edges and (f) the DREAM3 challenge with 100 genes and 166 edges. (LA2012: LASSO2012; Gsqrt: GENIE3-FR-sqrt0; Gall: GENIE3-FR-all.)]

Result
Table: Comparison of different methods for learning DREAM3. LASSO: regression-based method; LP: linear-programming-based method; PCA-PCC: PC algorithm based on the partial correlation coefficient; MI: mutual information method; PCA-CMI: path consistency algorithm based on conditional mutual information with threshold; CN: consensus network algorithm. AUCD10: AUC for the 10-gene network in DREAM3; AUCD50: AUC for the 50-gene network in DREAM3.
Method  | AUCD10 | AUCD50
LASSO   | 0.813  | 0.819
LP      | 0.75   | 0.559
PCA-PCC | 0.897  | 0.78
MI      | 0.93   | 0.832
PCA-CMI | 0.9642 | 0.8421
CN      | 0.9734 | 0.8515
Result
Table: Comparison of different methods on the DREAM4 challenge. TeamName is the name of the team registered for this challenge (PCA-CMI: path consistency algorithm based on conditional mutual information; CN: consensus network algorithm; the best performer for each item is noted in bold).
Network | Team415 | Team549 | Team395 | PCA-CMI | CN
Net1    | 0.75    | 0.73    | 0.69    | 0.7     | 0.75
Net2    | 0.69    | 0.7     | 0.64    | 0.69    | 0.74
Net3    | 0.76    | 0.74    | 0.72    | 0.74    | 0.76
Net4    | 0.77    | 0.74    | 0.72    | 0.74    | 0.7
Net5    | 0.76    | 0.74    | 0.71    | 0.74    | 0.76

Result
Table: Comparison of AUC on real data sets (AUCSOS: AUC for the SOS network with 9 genes; AUCECO: AUC for the E. coli network with 8 genes; the best performer for each item is noted in bold).
Method  | AUCSOS | AUCECO
LASSO   | 0.61   | 0.63
LP      | 0.59   | 0.54
PCA-PCC | 0.67   | 0.6
MI      | 0.73   | 0.63
PCA-CMI | 0.74   | 0.64
CN      | 0.75   | 0.84

Conclusion
In this study, new algorithms called IPCA-CMI and CN for inferring GRNs from gene expression data were presented. Zhang et al. [34] reported that PCA-CMI performs better than the linear programming method [33], the multiple-linear-regression LASSO method [31], the mutual information method [25] and the PC algorithm based on the partial correlation coefficient [21] for inferring networks from gene expression data such as the DREAM3 challenge and the SOS DNA repair network. Software is provided in the form of MATLAB and JAVA code; the data sets and source code are available at http://bs.ipm.ir/softwares/IPCA-CMI/.

Conclusion
In this study, we showed that PCA-CMI is an order-dependent algorithm: its result can change under different orderings of the n nodes. The results showed that determining a suitable sequential ordering of the nodes can improve the performance of PCA-CMI. Moreover, constraint-based methods that apply CMI tests depend on the threshold selected for those tests. Without the SORDER algorithm, applying the CN algorithm is meaningless. Software is provided in the form of MATLAB and JAVA code; the data sets and source code are available at http://bs.ipm.ir/softwares/CN/.

Thank You

References
[1] N. A. Ahmed and D. Gokhale. Entropy expressions and their estimators for multivariate distributions. IEEE Transactions on Information Theory, 35(3):688–692, 1989.
[2] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.
[3] L. Badea. Inferring large gene networks from microarray data: a constraint-based approach. In IJCAI-2003 Workshop on Learning Graphical Models for Computational Genomics. Citeseer, 2003.
[4] R. R. Bouckaert. Bayesian belief networks: from construction to inference. Universiteit Utrecht, Faculteit Wiskunde en Informatica, 1995.
[5] R. R. Bouckaert. Bayesian belief networks: from construction to inference. 2001.
[6] W. Buntine. Theory refinement on Bayesian networks. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, pages 52–60. Morgan Kaufmann Publishers Inc., 1991.
[7] W. Buntine. A guide to the literature on learning probabilistic networks from data. IEEE Transactions on Knowledge and Data Engineering, 8(2):195–210, 1996.
[8] J. Catlett. On changing continuous attributes into ordered discrete attributes. In Machine Learning-EWSL-91, pages 164–178. Springer, 1991.
[9] J. Cheng, D. Bell, and W. Liu. Learning Bayesian networks from data: An efficient approach based on information theory. On the World Wide Web at http://www.cs.ualberta.ca/~jcheng/bnpc.htm, 1998.
[10] J. Cheng, D. A. Bell, and W. Liu. Learning belief networks from data: An information theory based approach. In Proceedings of the Sixth International Conference on Information and Knowledge Management, pages 325–331. ACM, 1997.
[11] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.
[12] G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992.
[13] L. M. de Campos. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. The Journal of Machine Learning Research, 7:2149–2187, 2006.
[14] J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In ICML, pages 194–202, 1995.
[15] E. Faulkner. K2GA: heuristically guided evolution of Bayesian network structures from data. In Computational Intelligence and Data Mining (CIDM 2007), IEEE Symposium on, pages 18–25. IEEE, 2007.
[16] N. Friedman and M. Goldszmidt. Learning Bayesian networks with local structure. In Learning in Graphical Models, pages 421–459. Springer, 1998.
[17] N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4):601–620, 2000.
[18] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995.
[19] S. Imoto, T. Goto, S. Miyano, et al. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. In Pacific Symposium on Biocomputing, volume 7, pages 175–186. World Scientific, 2002.
[20] F. V. Jensen. An Introduction to Bayesian Networks, volume 210. UCL Press, London, 1996.
[21] M. Kalisch and P. Bühlmann. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. The Journal of Machine Learning Research, 8:613–636, 2007.
[22] M. Kayaalp and G. F. Cooper. A Bayesian network scoring metric that is based on globally uniform parameter priors. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, pages 251–258. Morgan Kaufmann Publishers Inc., 2002.
[23] R. Kerber. ChiMerge: Discretization of numeric attributes. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 123–128. AAAI Press, 1992.
[24] W. Lam and F. Bacchus. Learning Bayesian belief networks: An approach based on the MDL principle. Computational Intelligence, 10(3):269–293, 1994.
[25] A. A. Margolin, K. Wang, W. K. Lim, M. Kustagi, I. Nemenman, and A. Califano. Reverse engineering cellular networks. Nature Protocols, 1(2):662–671, 2006.
[26] T. D. Nielsen and F. V. Jensen. Bayesian Networks and Decision Graphs. Springer, 2009.
[27] J. Pearl. Causality: Models, Reasoning and Inference, volume 29. Cambridge University Press, 2000.
[28] J. M. Peña, J. Björkegren, and J. Tegnér. Growing Bayesian network models of gene networks from seed genes. Bioinformatics, 21(suppl 2):ii224–ii229, 2005.
[29] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search, volume 81. The MIT Press, 2000.
[30] P. Spirtes, C. Meek, and T. Richardson. Causal inference in the presence of latent variables and selection bias. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 499–506. Morgan Kaufmann Publishers Inc., 1995.
[31] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267–288, 1996.
[32] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.
[33] Y. Wang, T. Joshi, X.-S. Zhang, D. Xu, and L. Chen. Inferring gene regulatory networks from multiple microarray datasets. Bioinformatics, 22(19):2413–2420, 2006.
[34] X. Zhang, X.-M. Zhao, K. He, L. Lu, Y. Cao, J. Liu, J.-K. Hao, Z.-P. Liu, and L. Chen. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information. Bioinformatics, 28(1):98–104, 2012.