Some Algorithms for Inferring Gene Regulatory Networks Using Gene Expression Data
Changiz Eslahchi
Faculty of Mathematical Sciences, Department of Computer Science, Shahid Beheshti
University, G.C., Tehran, Iran.
March, 2015
Outline
1. Challenge
2. Bayesian Network
3. Algorithms
4. Conclusion
Gene Expression
Challenge
Inferring gene regulatory networks
Inferring gene regulatory networks (GRNs), which explicitly characterize the regulatory processes in the cell, is a major challenge in systems biology.
Challenge
Gene Regulatory Network
Differential Equation (Novak et al., 1998)
Stochastic Petri Net (Goss et al., 1999)
Boolean Network (Liang et al., 1998)
Regression Method (Gardner et al., 2003)
Bayesian Network (Friedman, 2004)
Linear Programming (Wang et al., 2006)
And so many others!
Independence
Recall the following equivalent characterizations of independence, X ⊥⊥ Y:
P(X | Y) = P(X)
P(Y | X) = P(Y)
P(X, Y) = P(X)P(Y)
Intuitively, if X ⊥⊥ Y then knowledge of X provides no information about Y.
Bayesian Network
Conditional Independence
Characterizations of conditional independence, X ⊥⊥ Y | Z:
P(X | Y, Z) = P(X | Z)
P(X, Y | Z) = P(X | Z)P(Y | Z)
P(X, Y, Z) = P(X, Z)P(Y, Z)/P(Z)
Intuitively, if X ⊥⊥ Y | Z, then once Z is known, X provides no further knowledge of Y, and Y provides no further knowledge of X.
Bayesian Network
Entropy
The entropy function is a suitable tool for measuring the average uncertainty of a variable X:
H(X) = E[I(X)] = −Σ_{i=1}^{n} P(X = x_i) log P(X = x_i)
H(X, Y) = −Σ_{i=1}^{n} Σ_{j=1}^{m} p_{ij} log p_{ij}
H(Y | X = i) = −Σ_{j=1}^{m} p_i(j) log p_i(j)
H(Y | X) = −Σ_{i=1}^{n} Σ_{j=1}^{m} p_{ij} log p_i(j)
H(Y | X) = H(X, Y) − H(X)
H(X | Y) = H(X, Y) − H(Y)
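As a quick numeric illustration of these identities, here is a minimal sketch (assuming a small made-up joint table and natural logarithms; numpy only):

    import numpy as np

    # Hypothetical 2x3 joint table: p[i, j] = P(X = x_i, Y = y_j)
    p = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

    def H(dist):
        """Shannon entropy -sum p log p over the nonzero cells."""
        d = dist[dist > 0]
        return -np.sum(d * np.log(d))

    Hx, Hxy = H(p.sum(axis=1)), H(p)

    # H(Y | X) from the definition: -sum_ij p_ij log p_i(j)
    cond = p / p.sum(axis=1, keepdims=True)   # rows are P(Y | X = x_i)
    HyGx = -np.sum(p * np.log(cond))

    # ...agrees with the chain rule H(Y | X) = H(X, Y) - H(X)
    assert np.isclose(HyGx, Hxy - Hx)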
Bayesian Network
Mutual Information
MI(X, Y) = Σ_{x,y} P(x, y) log [ P(x, y) / (P(x) P(y)) ]
MI(X, Y) = H(X) + H(Y) − H(X, Y)
MI(X, Y) = H(X) − H(X | Y) = H(Y) − H(Y | X)
The conditional mutual information between X and Y given Z is determined by:
CMI(X, Y | Z) = Σ_{z} P(z) Σ_{x,y} P(x, y | z) log_2 [ P(x, y | z) / (P(x | z) P(y | z)) ]
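The equivalence of the definition and the entropy identity can be checked the same way (self-contained sketch; hypothetical joint table, natural logarithms):

    import numpy as np

    p = np.array([[0.10, 0.20, 0.10],      # hypothetical p[i, j] = P(x_i, y_j)
                  [0.25, 0.15, 0.20]])

    def H(dist):
        d = dist[dist > 0]
        return -np.sum(d * np.log(d))

    # Definition: MI = sum_xy P(x, y) log( P(x, y) / (P(x) P(y)) )
    outer = np.outer(p.sum(axis=1), p.sum(axis=0))
    mi_def = np.sum(p * np.log(p / outer))

    # Identity: MI = H(X) + H(Y) - H(X, Y)
    mi_ent = H(p.sum(axis=1)) + H(p.sum(axis=0)) - H(p)

    assert np.isclose(mi_def, mi_ent)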
Bayesian Network
Directed Acyclic Graphs (DAGs)
The graph is acyclic because we do not permit directed cycles: A → ··· → A.
If there is an edge A → B, then A is said to be a parent of B and B a child of A.
Two vertices joined by an edge are said to be adjacent.
The set of parents of a vertex A is denoted Pa(A). Example: Pa(D) = {B, C}.
We use X to denote the set of all vertices in the DAG.
Bayesian Network
Example
In the example DAG (an(·) denotes the ancestors of a vertex and de(·) its descendants):
Pa(B) = {A}
an(E) = {A, B, C, D}, an(D) = {A, B, C}
de(A) = {B, C, D, E}, de(B) = {D, E}
Bayesian Network
Complete DAGs
A DAG is complete if every pair of vertices is adjacent.
Bayesian Network
Bayesian Network
The term "Bayesian network" was coined by Judea Pearl in 1985.
A Bayesian network (BN) is a graphical representation of a joint probability distribution that consists of two components [20, 13, 26]:
▶ a directed acyclic graph (DAG) G = (X, E_G), where X = {X_1, ..., X_n};
▶ a set of numerical parameters, which usually represent conditional probability distributions.
Bayesian Network
Definitions and Theorems
Definition. A DAG G has the Markov property if each variable is conditionally independent of its non-descendants given its parent variables [20, 13, 26].
For example, based on this definition, in the following figure we have P(D | A, B, C) = P(D | B, C).
The joint distribution therefore factorizes as
P(X_1, X_2, ..., X_n) = ∏_{X_i ∈ X} P(X_i | Pa(X_i)).
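A toy sketch of this factorization for a hypothetical binary chain A → B → C (made-up CPTs; the factorized joint must sum to 1, as any proper distribution does):

    import itertools

    # Hypothetical CPTs for the chain A -> B -> C
    P_A = {0: 0.6, 1: 0.4}                       # P(A = a)
    P_B = {(0, 0): 0.9, (0, 1): 0.1,             # P(B = b | A = a), key (a, b)
           (1, 0): 0.3, (1, 1): 0.7}
    P_C = {(0, 0): 0.8, (0, 1): 0.2,             # P(C = c | B = b), key (b, c)
           (1, 0): 0.25, (1, 1): 0.75}

    def joint(a, b, c):
        # P(A, B, C) = P(A) P(B | A) P(C | B): the Markov factorization
        return P_A[a] * P_B[(a, b)] * P_C[(b, c)]

    total = sum(joint(*abc) for abc in itertools.product([0, 1], repeat=3))
    assert abs(total - 1.0) < 1e-12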
Bayesian Network
Paths and Colliders
A path p from X to Y in G is said to be blocked by a set of variables Z if and only if [20, 13, 26]:
▶ p contains a chain X → K → Y or a fork X ← K → Y such that K ∈ Z, or
▶ p contains a collider X → K ← Y such that neither K nor any descendant of K is in Z.
A set Z is said to d-separate X from Y in G if and only if Z blocks every path from X to Y.
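These blocking rules are easy to experiment with; for instance, networkx ships a d-separation test (called d_separated in older releases, is_d_separator in newer ones), so the collider case can be checked directly:

    import networkx as nx

    # Collider X -> K <- Y: the empty set blocks the path,
    # but conditioning on the collider K opens it.
    G = nx.DiGraph([("X", "K"), ("Y", "K")])

    print(nx.d_separated(G, {"X"}, {"Y"}, set()))   # True: blocked
    print(nx.d_separated(G, {"X"}, {"Y"}, {"K"}))   # False: K unblocks the path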
Bayesian Network
Paths and Colliders
A path p between vertices X and Y consists of a sequence of distinct vertices in which consecutive vertices are adjacent. For example:
X → A ← B ← C → D → Y
A non-endpoint vertex V on a path (between X and Y) is said to be a collider if the path takes the form X ··· → V ← ··· Y.
Non-endpoints that are not colliders are called non-colliders:
X ··· ← V → ··· Y,  X ··· → V → ··· Y,  X ··· ← V ← ··· Y
Bayesian Network
Subtlety regarding colliders
A vertex is a collider or a non-collider only with respect to a given path; a vertex may be a collider on one path and a non-collider on another (even between the same endpoints).
Here Z is a collider on the path X → Z ← H → Y, but a non-collider on the path X → Z → Y.
Terminology: V is a collider with respect to a path, not in absolute terms.
Bayesian Network
Separation for sets of variables
So far we have considered separation between single variables X, Y given a set Z.
We extend this to sets of variables as follows: a set X is d-separated from a set Y given Z iff for all X ∈ X and Y ∈ Y, X and Y are d-separated given Z.
We write S_XY = Z to denote that X is d-separated from Y given Z.
Bayesian Network
Learning Bayesian Network
Learning a Bayesian network consists of two tasks:
Structure learning
Parameter learning
Bayesian Network
Structure learning
Constraint-Based Learning Methods: constraint-based methods apply conditional independence (CI) tests to determine dependencies between vertices.
Score-and-Search Methods: score-based search methods produce a series of candidate networks, calculate a score for each candidate, and return a candidate with the highest score.
Hybrid Methods: hybrid methods combine these two approaches.
Bayesian Network
Structure learning
Constraint-Based Learning Methods:
SGS Algorithm [29], FCI Algorithm [30], TPDA Algorithm [10], PC Algorithm [29], IC Algorithm [27], QFCI Algorithm [3], GPC Algorithm [28], and PCA-CMI Algorithm [34].
Score-and-Search Methods:
AIC Algorithm [2], BDeu Algorithm [6], K2 Algorithm [12], BD Algorithm [18], BIC Algorithm [19], MIT Algorithm [13], and K2GA Algorithm [15].
Hybrid Methods:
Suzuki's Algorithm [7], HGC Algorithm [9], and MMHC Algorithm [32].
Bayesian Network
Constraint-Based Learning Methods
The Path Consistency (PC) algorithm was introduced by Spirtes et al. in 1993 as a constraint-based method for inferring BNs.
Bayesian Network
PC Algorithm
Algorithm [PC]
1. Start with a complete undirected graph S_{−1}.
2. i = 0 (i denotes the order of the algorithm, i.e. the size of the separating sets).
3. Repeat
4. For each X ∈ X
5. For each Y ∈ ADJ(X)
6. Determine whether there is M ⊆ ADJ(X) or M ⊆ ADJ(Y) with |M| = i such that X and Y are independent given M.
7. If such a set exists
8. Remove the edge between X and Y from S_{i−1}.
9. i = i + 1
10. Until max{|ADJ(X)|, |ADJ(Y)|} ≤ i + 1.
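A compact Python sketch of the skeleton phase (the independence test is a pluggable callable, since later slides instantiate it with MI/CMI tests; an illustration, not the authors' implementation):

    from itertools import combinations

    def pc_skeleton(variables, indep):
        """indep(x, y, cond) -> True if x and y are independent given cond."""
        # 1. start with the complete undirected graph
        adj = {x: set(variables) - {x} for x in variables}
        i = 0
        while True:
            tested_any = False
            for x in variables:
                for y in list(adj[x]):
                    candidates = adj[x] - {y}        # M is drawn from ADJ(X)
                    if len(candidates) < i:
                        continue                     # no set of size i exists
                    tested_any = True
                    for M in combinations(candidates, i):
                        if indep(x, y, set(M)):      # separating set found
                            adj[x].discard(y)        # remove the edge
                            adj[y].discard(x)
                            break
            if not tested_any:                       # adjacency sets exhausted
                return adj
            i += 1                                   # next order

Because the outer loops visit each ordered pair (X, Y) and (Y, X), drawing M from ADJ(X) \ {Y} also covers the ADJ(Y) alternative of step 6.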
Bayesian Network
Example of PC Algorithm; i = 0
No pair of variables is d-separated given ∅, so the graph is unchanged.
Bayesian Network
Example of PC Algorithm; i = 1
Bayesian Network
Example of PC Algorithm; i = 2
Bayesian Network
Orientation Rules
1. If we have X − Z − Y but no edge between X and Y, and Z ∉ S_XY, then orient as X → Z ← Y.
2. Apply the following rules:
(a) If X and Z are not adjacent but X → Y − Z, then orient as Y → Z.
(b) If X → Y → Z and X − Z, then orient as X → Z.
(c) If X and Z are not adjacent, but X − W − Z, X → Y ← Z, and W − Y, then orient as W → Y.
Bayesian Network
Example of PC Algorithm
Bayesian Network
Score-and-Search Methods
Scoring Function: measures the degree of fit of a candidate structure to the data set.
Search Method: finds
G* = argmax_{G ∈ F(n)} g(G : D),
in which g(G : D) denotes the degree of fit of candidate G to the data set and F(n) denotes all possible DAGs defined on X.
The challenging part of the search procedure is that the size of the space of all structures, f(n), is super-exponential in the number of variables [26].
Bayesian Network
Scoring Function
There are many scoring functions to measure the degree of fit of a DAG G to a data set. These are generally classified as:
Bayesian scoring functions [6, 18, 22]
Information-theory-based scores [11, 24, 4, 16, 13]
Bayesian Network
Search Methods
The definition of the search space determines the definition of the search
operators used to move from one structure to another. In turn, these
operators determine the neighborhood of a DAG, namely the DAGs that
can be reached in one step from the current DAG. Typically, the operators
consist of:
arc addition: insert a single arc between two nonadjacent nodes.
arc deletion: remove a single arc between two nodes.
arc reversal: reverse the direction of a single arc.
Bayesian Network
Search Methods
In what follows we let op(S, A) represent the result of performing the arc
operation A on the structure S, i.e., op(S, A) is a DAG that differs from S
with respect to one arc.
We define
∆(X_i → X_j) = score(X_j, Pa(X_j) ∪ {X_i}, D) − score(X_j, Pa(X_j), D).
Bayesian Network
Search Methods
The Greedy Search Algorithm
Algorithm [Greedy search]
1. Let S be an initial structure.
2. Repeat
a) Calculate ∆(A) for each legal arc operation A.
   Let ∆* = max_A ∆(A) and A* = argmax_A ∆(A).
b) If ∆* > 0, then set S = op(S, A*).
3. Until ∆* ≤ 0.
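The loop translates almost line for line into code (schematic sketch; the score, legal-operation, and apply functions are placeholders to be supplied):

    def greedy_search(S, legal_ops, delta, apply_op):
        """Greedy hill-climbing over structures.

        legal_ops(S): iterable of legal arc operations (add/delete/reverse)
        delta(S, A) : score improvement of operation A on structure S
        apply_op(S, A): the resulting structure op(S, A)
        """
        while True:
            scored = [(delta(S, A), A) for A in legal_ops(S)]   # step a)
            if not scored:
                break
            best_gain, best_op = max(scored, key=lambda t: t[0])
            if best_gain <= 0:          # step 3: local optimum reached
                break
            S = apply_op(S, best_op)    # step b): take the best move
        return S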
Bayesian Network
Score-and-Search Methods
The Greedy Search Algorithm
Bayesian Network
MIT Score
The MIT score is a well-known information-theory-based score that contains two terms: a mutual-information term and a penalization term.
g_MIT(G, D) = Σ_{i=1, Pa_G(X_i) ≠ ∅}^{n} ( 2N · MI_D(X_i, Pa_G(X_i)) − max_{σ_i} Σ_{j=1}^{s_i} χ_{α, l_{i,σ_i(j)}} )    (1)
Here N is the sample size, MI_D denotes mutual information estimated from the data D, s_i = |Pa_G(X_i)|, σ_i ranges over permutations of the parents of X_i, and χ_{α,l} denotes the chi-square quantile at confidence level α with l degrees of freedom [13].
Algorithms
In this work we introduce three algorithms for learning GRNs.
1. Zhang, Xiujun, et al. "Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information." Bioinformatics 28.1 (2012): 98-104.
2. Aghdam, Rosa, Mojtaba Ganjali, and Changiz Eslahchi. "IPCA-CMI: An Algorithm for Inferring Gene Regulatory Networks based on a Combination of PCA-CMI and MIT Score." PLoS ONE 9.4 (2014): e92600.
3. Aghdam, Rosa, Mojtaba Ganjali, Xiujun Zhang, and Changiz Eslahchi. "CN: a consensus algorithm for inferring gene regulatory networks using the SORDER algorithm and conditional mutual information test." Molecular BioSystems (2015).
Algorithms
Information Theory
Let X = (X_1, ..., X_n)^T be an n-dimensional Gaussian vector with mean µ = (µ_1, ..., µ_n)^T and covariance matrix C(X) = E[(X − µ)(X − µ)^T], i.e. X ∼ N(µ, C(X)).
Then the MI of two continuous variables X and Y can be calculated easily using the following formula:
MI(X, Y) = (1/2) log [ σ²_X · σ²_Y / det(C(X, Y)) ]    (2)
The conditional mutual information of two continuous variables X and Y given Z is determined by:
CMI(X, Y | Z) = (1/2) log [ det(C(X, Z)) · det(C(Y, Z)) / ( det(C(Z)) · det(C(X, Y, Z)) ) ]    (3)
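A small numpy sketch of formulas (2) and (3), estimating the covariance matrices from data vectors (illustrative helper names; natural logarithm):

    import numpy as np

    def gaussian_mi(x, y):
        # (2): MI(X, Y) = 1/2 log( var(X) var(Y) / det C(X, Y) )
        c = np.cov(np.vstack([x, y]))
        return 0.5 * np.log(c[0, 0] * c[1, 1] / np.linalg.det(c))

    def gaussian_cmi(x, y, z):
        # (3): CMI(X, Y | Z) via determinants of the four covariance matrices
        z = np.atleast_2d(z)
        det = lambda *rows: np.linalg.det(np.atleast_2d(np.cov(np.vstack(rows))))
        return 0.5 * np.log(det(x, z) * det(y, z) / (det(z) * det(x, y, z)))

    # Toy check: X and Y share a common driver Z, so MI(X, Y) is large
    # while CMI(X, Y | Z) is near zero.
    rng = np.random.default_rng(0)
    z = rng.normal(size=5000)
    x = z + 0.1 * rng.normal(size=5000)
    y = z + 0.1 * rng.normal(size=5000)
    print(gaussian_mi(x, y), gaussian_cmi(x, y, z))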
Algorithms
PC Algorithm based on CMI test (PCA-CMI)
Algorithm [PCA-CMI]
1. Start with a complete undirected graph S_{−1}.
2. i = 0
3. Repeat
4. For each X ∈ X
5. For each Y ∈ ADJ(X)
6. Determine whether there is M ⊆ V_XY = ADJ(X) ∩ ADJ(Y) with |M| = i such that X and Y are independent given M (CMI test).
7. If such a set exists
8. Remove the edge between X and Y from S_{i−1}.
9. i = i + 1
10. Until i > |V_XY|.
Algorithms
PC Algorithm based on CMI test (PCA-CMI)
Algorithms
Zero Order of the Improved PC Algorithm Based on the CMI Test
Algorithm [IPCA-CMI, order 0]
1. Start with a complete undirected graph S_{−1}.
2. Repeat
3. For each X ∈ X
4. For each Y ∈ ADJ(X)
5. If X and Y are independent according to the MI measure
6. Remove the edge between X and Y from S_{−1}.
7. Use the MIT score within the hill-climbing (HC) algorithm to construct G_0.
Algorithms
The IPCA-CMI for i > 0
The sets A_Z^q for 1 ≤ q ≤ 4 are defined as follows:
A_Z^1 = {W | X → Z → W},  A_Z^2 = {W | X ← Z ← W},
A_Z^3 = {W | X ← Z → W},  A_Z^4 = {W | X → Z ← W}.
Weight_X(Z) = |A_Z^1| + |A_Z^2| + |A_Z^3| − |A_Z^4|
Let R_XY be defined by:
R_XY = {Z ∈ (ADJ(X) ∪ ADJ(Y)) \ {X, Y} | Weight_X(Z) ≥ k or Weight_Y(Z) ≥ k}
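A literal sketch of the weight computation (my reading of the definitions above, on a plain successor/predecessor representation of the current directed structure; not the authors' code):

    def weight(succ, pred, x, z):
        """Weight_X(Z) = |A1| + |A2| + |A3| - |A4| for candidate separator Z."""
        out_z = succ[z] - {x}                     # candidate W with Z -> W
        in_z = pred[z] - {x}                      # candidate W with W -> Z
        a1 = out_z if z in succ[x] else set()     # X -> Z -> W
        a2 = in_z  if x in succ[z] else set()     # X <- Z <- W
        a3 = out_z if x in succ[z] else set()     # X <- Z -> W
        a4 = in_z  if z in succ[x] else set()     # X -> Z <- W
        return len(a1) + len(a2) + len(a3) - len(a4)

    # Toy structure a -> z, z -> b, c -> z: A1 = {b}, A4 = {c}, weight = 0
    succ = {"a": {"z"}, "z": {"b"}, "b": set(), "c": {"z"}}
    pred = {"a": set(), "z": {"a", "c"}, "b": {"z"}, "c": set()}
    print(weight(succ, pred, "a", "z"))           # 1 + 0 + 0 - 1 = 0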
Algorithms
The IPCA-CMI for i > 0
Algorithm [IPCA-CMI for i > 0]
1. Start with G_0.
2. i = 1
3. Repeat
4. For each X ∈ X
5. For each Y ∈ ADJ(X)
6. Test whether ∃ H ⊆ R_XY with |H| = i such that X ⊥⊥ Y | H.
7. If such a set exists
8. Remove the edge between X and Y from G_{i−1}.
9. Use the MIT score within the HC algorithm to direct the structure.
10. For each Z ∈ (ADJ(X) ∪ ADJ(Y)) \ {X, Y}
11. Weight_X(Z) = |A_Z^1| + |A_Z^2| + |A_Z^3| − |A_Z^4|
12. R_XY = {Z ∈ (ADJ(X) ∪ ADJ(Y)) \ {X, Y} | Weight_X(Z) ≥ k or Weight_Y(Z) ≥ k}
13. i = i + 1
14. Until i > |R_XY|.
Algorithms
Data Set
The Dialogue for Reverse Engineering Assessments and Methods (DREAM) project was introduced in 2006. DREAM provides benchmark data sets that allow researchers to evaluate the performance of their approaches in predicting biological networks.
The goal of the in silico network challenge is to reverse engineer gene
regulation networks from simulated steady-state and time-series data. The
true network structure of biological gene networks is in general unknown or
only partly known, but the structures of in silico networks are known and
so predictions can be validated.
Algorithms
Data Set
The data sets can be downloaded from
http://www.the-dream-project.org/challenges
Algorithms
Data Set
DREAM3 In Silico Size 10 and 50.
Algorithms
Evaluation
To assess the validity of the algorithms, the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) values of the proposed algorithms are computed. In addition, known measures such as ACCuracy (ACC), False Discovery Rate (FDR), Positive Predictive Value (PPV), F-measure, Matthews Correlation Coefficient (MCC), and True Positive Rate (TPR) are considered. These measures are defined by:
TPR = TP / (TP + FN),   SPC = TN / (FP + TN),
PPV = TP / (TP + FP),   NPV = TN / (TN + FN),
ACC = (TP + TN) / (TP + FP + TN + FN),   FPR = FP / (FP + TN),
Algorithms
Evaluation
FDR = FP / (FP + TP),   F = 2 · (PPV × TPR) / (PPV + TPR),
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)).
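Transcribed directly into code, the measures above become (sketch; the example TN/FN counts are inferred from a 10-gene network with 10 true edges, matching the DREAM10 row of the order-0 results table):

    from math import sqrt

    def confusion_measures(tp, fp, tn, fn):
        """Evaluation measures defined above, from the four raw counts."""
        ppv = tp / (tp + fp)
        tpr = tp / (tp + fn)
        return {
            "TPR": tpr,
            "SPC": tn / (fp + tn),
            "PPV": ppv,
            "NPV": tn / (tn + fn),
            "ACC": (tp + tn) / (tp + fp + tn + fn),
            "FPR": fp / (fp + tn),
            "FDR": fp / (fp + tp),
            "F":   2 * ppv * tpr / (ppv + tpr),
            "MCC": (tp * tn - fp * fn) / sqrt(
                (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        }

    # DREAM10, order 0: TP = 9, FP = 1 (TN = 34, FN = 1 inferred)
    print(confusion_measures(tp=9, fp=1, tn=34, fn=1))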
Algorithms
Evaluation
To evaluate the performance of the algorithms, in the present study they are compared with several other popular inference methods, listed in the table below. The table also states the approach used by each algorithm.

Table: Gene network inference methods applied for comparison

Method   Approach                   Reference
LASSO    Regression                 Royal Statistical Society [31]
LP       Linear Programming         Bioinformatics [33]
PCC      PCA based on partial CC    Machine Learning Research [21]
MI       Mutual Information         Reverse engineering cellular networks [25]
PCCMI    PCA based on CMI           Bioinformatics [34]
Algorithms
Result IPCA-CMI
Figure: (A) True network. (B) First-order network inferred by PCA-CMI. (C) First-order network inferred by IPCA-CMI.
Algorithms
Result IPCA-CMI
Table: Results on simulated and real data sets at order 0

Network     TP  FP  ACC   FPR   FDR   PPV   F     MCC   TPR
DREAM10     9   1   0.95  0.02  0.10  0.90  0.90  0.87  0.90
DREAM50     36  54  0.92  0.05  0.60  0.40  0.43  0.39  0.46
DREAM100    70  58  0.96  0.01  0.45  0.55  0.47  0.46  0.42
SOS         18  4   0.72  0.33  0.18  0.82  0.78  0.40  0.75
Algorithms
Result IPCA-CMI
Table: Results on the DREAM3 Challenge gene expression data set with 10 genes and 10 samples

Algorithm   TP   FP  ACC   FPR   FDR   PPV   F     MCC   TPR
PCA1        7    1   0.91  0.03  0.13  0.87  0.78  0.73  0.70
IPCA1       8.8  0   0.98  0     0     1     0.94  0.93  0.88
Algorithms
Result IPCA-CMI
Table: Results on the DREAM3 Challenge gene expression data set with 50 genes and 50 samples

Algorithm   TP    FP     ACC   FPR   FDR   PPV   F     MCC   TPR
PCA1        24    23     0.93  0.02  0.49  0.51  0.39  0.37  0.31
PCA2        22    21     0.93  0.02  0.49  0.51  0.37  0.35  0.29
IPCA1       28    26.5   0.94  0.02  0.48  0.51  0.43  0.40  0.36
IPCA2       22.9  11.78  0.95  0.01  0.48  0.52  0.38  0.42  0.30
Algorithms
Result IPCA-CMI
Table: Results on the DREAM3 Challenge gene expression data set with 100 genes and 100 samples

Algorithm   TP     FP     ACC    FPR    FDR   PPV   F     MCC   TPR
PCA1        49     25     0.971  0.005  0.34  0.66  0.41  0.43  0.28
PCA2        46     25     0.971  0.005  0.35  0.64  0.38  0.41  0.27
IPCA1       53.11  29.77  0.972  0.006  0.35  0.65  0.43  0.44  0.32
IPCA2       46.55  15.16  0.973  0.003  0.24  0.75  0.40  0.45  0.28
Algorithms
Result IPCA-CMI
Table: Results on experimental data from Escherichia coli (SOS network) with 9 genes and 9 samples

Algorithm   TP  FP   ACC   FPR   FDR   PPV   F     MCC   TPR
PCA1        18  4    0.72  0.33  0.18  0.82  0.78  0.40  0.75
IPCA1       18  3.8  0.73  0.32  0.17  0.82  0.79  0.41  0.75
Algorithms
IPCA-CMI
The IPCA-CMI algorithm is written in MATLAB. It takes simulated and real data as input and is applied to learn the skeleton of GRNs. The source code and data sets are available at
http://www.bioinf.cs.ipm.ir/software/IPCA-CMI/.
Algorithms
CN: A Consensus Algorithm for Inferring Gene Regulatory Networks Using the SORDER Algorithm and a Conditional Mutual Information Test
The PC algorithm is not robust: it produces different network topologies when the order of the genes is permuted.
In addition, the performance of this algorithm depends on the threshold value used for the independence tests.
Algorithms
SORDER Algorithm
To overcome this disadvantage of the PC algorithm, i.e. its dependence on the sequential ordering of the vertices, the SORDER algorithm is proposed to select a suitable sequential ordering of the vertices (genes) based on the degrees of the vertices in S_0 (the order-0 skeleton produced by PCA-CMI).
Node selection in the SORDER algorithm is based on maximum degree: in each step, the node with maximum degree is selected, and the selection process is repeated until all nodes of the graph have been chosen. When several nodes have the same degree in one step, the induced subgraph on these nodes is constructed, and nodes within it are again selected according to maximum degree, in the same manner.
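One possible reading of this selection rule as code (a sketch; how ties that survive the induced-subgraph step are broken is not specified above, so they are resolved arbitrarily here):

    import networkx as nx

    def sorder(G):
        """Order nodes of an undirected skeleton by repeated max-degree
        selection, re-ranking ties on their induced subgraph."""
        remaining = G.copy()
        order = []
        while remaining.number_of_nodes() > 0:
            degs = dict(remaining.degree())
            top = max(degs.values())
            tied = [v for v, d in degs.items() if d == top]
            if len(tied) > 1:
                sub = remaining.subgraph(tied)      # induced subgraph of ties
                node = max(tied, key=lambda v: sub.degree(v))
            else:
                node = tied[0]
            order.append(node)
            remaining.remove_node(node)
        return order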
Algorithms
SORDER Algorithm
Figure: Example of SORDER algorithm
The result of applying the SORDER algorithm to the graph G is O = (c, a, b, d, f, e).
Algorithms
SORDER Algorithm
To investigate the performance of the SORDER algorithm, for networks with a small set of variables (DREAM3 with 10 nodes and the SOS network with 9 nodes) all permutations of the n nodes are first enumerated.
For larger networks the permutations cannot be fully enumerated, so we selected N random permutations (N = 10^5) of the gene order and ran PCA-CMI. For i = 1, ..., N, T_i and T_SORDER are defined as T_i = (TP_i, FP_i) and T_SORDER = (TP_SORDER, FP_SORDER), respectively. Let
X = {i | TP_i > TP_SORDER and FP_i < FP_SORDER}, for i = 1, ..., N.
Algorithms
SORDER Algorithm
Table: Comparison between the SORDER algorithm and random permutations of nodes. Exceeding Value (EV) denotes the percentage of random permutations (unordered) which perform better than the SORDER algorithm out of 10^5 random permutations.

Data Set    EV value
DREAM50     0.018
DREAM100    0.048
Algorithms
CN Algorithm
Figure: Diagram of the CN algorithm.
Algorithms
CNMIT Algorithm
Figure: Diagram of the CNMIT algorithm. (a) Result of the CN algorithm. (b) Values of the MIT score for k possible DAGs. (c) The result of the CNMIT algorithm.
Algorithms
Result
Figure: ROC curves of different methods (CN, PCA-CMI, PCA-PCC, MI, LASSO, LP) for the DREAM3 challenge with 10 nodes; the axes are False Positive Rate (1 − Specificity) versus True Positive Rate (Sensitivity). The red line is the ROC curve of the CN algorithm, with an AUC of 0.9734.
Algorithms
Result
Figure: Results of the SORDER and CN algorithms. (A) The true network with 10 nodes and 10 edges. (B) Network inferred by the SPCA-CMI algorithm. The red edge G2-G4 is a false positive, while the edges G1-G2, G3-G5, and G4-G9 are false negatives. (C) Network obtained by the CN algorithm. The red edge G2-G9 is a false positive and G4-G9 is a false negative, but the edges G1-G2 and G3-G5 are successfully found by this algorithm in comparison with (B).
Algorithms
Result
Figure: Comparison of the TP and FP values of the CN algorithm with other methods for learning (a) the DREAM3 Challenge with 10 genes, (b) the DREAM3 Challenge with 50 genes, (c) the DREAM3 Challenge with 100 genes, and (d) the SOS network with 9 genes. Also, comparison of the CNMIT algorithm with different methods for learning (e) the DREAM3 Challenge with 50 genes and (f) the DREAM3 Challenge with 100 genes. (LA2012: LASSO2012; Gsqrt: GENIE3-FR-sqrt0; Gall: GENIE3-FR-all.)
Algorithms
Result
Figure: Comparison of the ACC value and F-measure of the CN algorithm with other methods for learning (a) the DREAM3 Challenge with 10 genes, (b) the DREAM3 Challenge with 50 genes, (c) the DREAM3 Challenge with 100 genes, and (d) the SOS network with 9 genes. Comparison of the CNMIT algorithm with different methods for learning (e) the DREAM3 Challenge with 50 genes and 77 edges and (f) the DREAM3 Challenge with 100 genes and 166 edges. (LA2012: LASSO2012; Gsqrt: GENIE3-FR-sqrt0; Gall: GENIE3-FR-all.)
Algorithms
Result
Table: Comparison of different methods for learning DREAM3. LASSO: regression-based method; LP: linear-programming-based method; PCA-PCC: PC algorithm based on the partial correlation coefficient; MI: MI method; PCA-CMI: Path Consistency Algorithm based on Conditional Mutual Information with threshold; CN: Consensus Network algorithm. AUCD10: AUC value for a 10-gene network in DREAM3; AUCD50: AUC value for a 50-gene network in DREAM3.

Method     AUCD10  AUCD50
LASSO      0.813   0.819
LP         0.75    0.559
PCA-PCC    0.897   0.78
MI         0.93    0.832
PCA-CMI    0.9642  0.8421
CN         0.9734  0.8515
Algorithms
Result
Table: Comparison of different methods for learning the DREAM4 Challenge. Team-Name is the name of a team registered for this challenge (PCA-CMI: Path Consistency Algorithm based on Conditional Mutual Information; CN: Consensus Network algorithm).

Method    Net1  Net2  Net3  Net4  Net5
Team415   0.75  0.69  0.76  0.77  0.76
Team549   0.73  0.70  0.74  0.74  0.74
Team395   0.69  0.64  0.72  0.72  0.71
PCA-CMI   0.70  0.69  0.74  0.74  0.74
CN        0.75  0.74  0.76  0.70  0.76
Algorithms
Result
Table: Comparison of AUC for real data sets (AUCSOS: AUC values for the SOS network with 9 genes; AUCECO: AUC values for the E. coli network with 8 genes).

Method    AUCSOS  AUCECO
LASSO     0.61    0.63
LP        0.59    0.54
PCA-PCC   0.67    0.60
MI        0.73    0.63
PCA-CMI   0.74    0.64
CN        0.75    0.84
Conclusion
In this study, new algorithms called IPCA-CMI and CN for inferring GRNs from gene expression data were presented.
Zhang et al. [34] reported that PCA-CMI performed better than the linear programming method [33], the multiple linear regression LASSO method [31], the mutual information method [25], and the PC algorithm based on the partial correlation coefficient [21] for inferring networks from gene expression data such as the DREAM3 Challenge and the SOS DNA repair network.
The software is provided as MATLAB and Java code. The data sets and source code are available at http://bs.ipm.ir/softwares/IPCA-CMI/.
Conclusion
In this study, we showed that PCA-CMI is an order-dependent algorithm whose result can change under different orderings of the n nodes. The results showed that determining a suitable sequential ordering of the nodes can improve the performance of PCA-CMI.
Moreover, constraint-based methods that apply CMI tests depend on the threshold selected for those tests.
Without the SORDER algorithm, applying the CN algorithm is meaningless.
The software is provided as MATLAB and Java code. The data sets and source code are available at http://bs.ipm.ir/softwares/CN/.
Conclusion
Thank You
References
[1] N. A. Ahmed and D. Gokhale. Entropy expressions and their estimators for
multivariate distributions. Information Theory, IEEE Transactions on,
35(3):688–692, 1989.
[2] H. Akaike. A new look at the statistical model identification. Automatic Control,
IEEE Transactions on, 19(6):716–723, 1974.
[3] L. Badea. Inferring large gene networks from microarray data: a constraint-based
approach. In IJCAI-2003 Workshop on Learning Graphical Models for
Computational Genomics. Citeseer, 2003.
[4] R. R. Bouckaert. Bayesian belief networks: from construction to inference.
Universiteit Utrecht, Faculteit Wiskunde en Informatica, 1995.
[5] R. R. Bouckaert. Bayesian belief networks: from construction to inference. 2001.
[6] W. Buntine. Theory refinement on bayesian networks. In Proceedings of the
Seventh conference on Uncertainty in Artificial Intelligence, pages 52–60. Morgan
Kaufmann Publishers Inc., 1991.
[7] W. Buntine. A guide to the literature on learning probabilistic networks from data.
Knowledge and Data Engineering, IEEE Transactions on, 8(2):195–210, 1996.
[8] J. Catlett. On changing continuous attributes into ordered discrete attributes. In
Machine Learning-EWSL-91, pages 164–178. Springer,
1991.
[9] J. Cheng, D. Bell, and W. Liu. Learning bayesian networks from data: An efficient approach based on information theory. On World Wide Web at http://www.cs.ualberta.ca/~jcheng/bnpc.htm, 1998.
[10] J. Cheng, D. A. Bell, and W. Liu. Learning belief networks from data: An
information theory based approach. In Proceedings of the sixth international
conference on Information and knowledge management, pages 325–331. ACM,
1997.
[11] C. Chow and C. Liu. Approximating discrete probability distributions with
dependence trees. Information Theory, IEEE Transactions on, 14(3):462–467, 1968.
[12] G. F. Cooper and E. Herskovits. A bayesian method for the induction of
probabilistic networks from data. Machine learning, 9(4):309–347, 1992.
[13] L. M. De Campos. A scoring function for learning bayesian networks based on
mutual information and conditional independence tests. The Journal of Machine
Learning Research, 7:2149–2187, 2006.
[14] J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised
discretization of continuous features. In ICML, pages 194–202, 1995.
[15] E. Faulkner. K2ga: heuristically guided evolution of bayesian network structures
from data. In Computational Intelligence and Data Mining, 2007. CIDM 2007.
IEEE Symposium on, pages 18–25. IEEE, 2007.
[16] N. Friedman and M. Goldszmidt. Learning bayesian networks with local structure.
In Learning in graphical models, pages 421–459. Springer, 1998.
[17] N. Friedman, M. Linial, I. Nachman, and D. Pe’er. Using bayesian networks to
analyze expression data. Journal of computational biology, 7(3-4):601–620, 2000.
[18] D. Heckerman, D. Geiger, and D. M. Chickering. Learning bayesian networks: The
combination of knowledge and statistical data. Machine learning, 20(3):197–243,
1995.
[19] S. Imoto, T. Goto, S. Miyano, et al. Estimation of genetic networks and functional
structures between genes by using bayesian networks and nonparametric regression.
In Pacific Symposium on Biocomputing, volume 7, pages 175–186. World
Scientific, 2002.
[20] F. V. Jensen. An introduction to Bayesian networks, volume 210. UCL press
London, 1996.
[21] M. Kalisch and P. Bühlmann. Estimating high-dimensional directed acyclic graphs
with the PC-algorithm. The Journal of Machine Learning Research, 8:613–636,
2007.
[22] M. Kayaalp and G. F. Cooper. A bayesian network scoring metric that is based on
globally uniform parameter priors. In Proceedings of the Eighteenth conference on
Uncertainty in artificial intelligence, pages 251–258. Morgan Kaufmann Publishers
Inc., 2002.
[23] R. Kerber. Chimerge: Discretization of numeric attributes. In Proceedings of the
tenth national conference on Artificial intelligence, pages 123–128. Aaai Press,
1992.
[24] W. Lam and F. Bacchus. Learning bayesian belief networks: An approach based on
the mdl principle. Computational intelligence, 10(3):269–293, 1994.
[25] A. A. Margolin, K. Wang, W. K. Lim, M. Kustagi, I. Nemenman, and A. Califano.
Reverse engineering cellular networks. Nature Protocols, 1(2):662–671, 2006.
[26] T. D. Nielsen and F. V. Jensen. Bayesian networks and decision graphs. Springer,
2009.
[27] J. Pearl. Causality: models, reasoning and inference, volume 29. Cambridge Univ
Press, 2000.
[28] J. M. Peña, J. Björkegren, and J. Tegnér. Growing bayesian network models of
gene networks from seed genes. Bioinformatics, 21(suppl 2):ii224–ii229, 2005.
[29] P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction, and search,
volume 81. The MIT Press, 2000.
[30] P. Spirtes, C. Meek, and T. Richardson. Causal inference in the presence of latent
variables and selection bias. In Proceedings of the Eleventh conference on
Uncertainty in artificial intelligence, pages 499–506. Morgan Kaufmann Publishers
Inc., 1995.
[31] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[32] I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing bayesian
network structure learning algorithm. Machine learning, 65(1):31–78, 2006.
[33] Y. Wang, T. Joshi, X.-S. Zhang, D. Xu, and L. Chen. Inferring gene regulatory
networks from multiple microarray datasets. Bioinformatics, 22(19):2413–2420,
2006.
[34] X. Zhang, X.-M. Zhao, K. He, L. Lu, Y. Cao, J. Liu, J.-K. Hao, Z.-P. Liu, and
L. Chen. Inferring gene regulatory networks from gene expression data by path
consistency algorithm based on conditional mutual information. Bioinformatics,
28(1):98–104, 2012.