Advanced Topics in Control: Distributed Systems and Control
Project #2
Note: the project is due Monday, May 25th
Leaders, followers and likeness in networks
Introduction
Consider a digraph G = (V, E) describing the web, where the vertex set V of cardinality n represents
web-pages and the edge set E of cardinality m represents the links between pages, with (i, j) ∈ E
if page i contains a link to page j. Let $A_{01} \in \{0, 1\}^{n \times n}$ be the 0-1 adjacency matrix of the graph;
let D be the diagonal matrix whose entry $[D]_{ii}$ is the out-degree of node i, and define
$A := D^{-1} A_{01}$ (assuming that every page has at least one outgoing link, so that D is invertible); notice that A is nonnegative and row-stochastic. In exercise session 3, we introduced
the page-rank algorithm in order to rank the web-pages of G according to their relevance. The idea
behind the page-rank algorithm is that the importance of a page i should be increased by every
page j that links to i, by an amount proportional to the importance of page j.
For every page $i \in \{1, \dots, n\}$ we denote by $x_i \in [0, 1]$, with $\sum_{i=1}^{n} x_i = 1$, its relative importance.
According to the definition given before, this can be computed as
$$x_i = \sum_{j=1}^{n} [A]_{ji}\, x_j , \tag{1}$$
or in vector form
$$x = A^\top x . \tag{2}$$
Finding the solution of the algebraic equations (2) is computationally too expensive for a network
as big as the web. Therefore, in the page-rank algorithm the solution of (2) is obtained as the
asymptotic value $\bar{x} := \lim_{k\to\infty} x_k$ of the iterative algorithm
$$x_{k+1} = A^\top x_k , \qquad x_0 = p , \tag{3}$$
with the initial value p such that $p_i \in [0, 1]$ and $\sum_{i=1}^{n} p_i = 1$. As explained in exercise session 3 and
in paragraph 5.6 of the lecture notes, the page-rank algorithm actually used is a slight modification of (3).
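The following is a minimal sketch of the plain iteration (3) in Python with NumPy. It runs on a hypothetical four-page web, and the graph and the iteration count are illustrative assumptions. It is not the modified page-rank algorithm of the lecture notes. Row-normalizing $A_{01}$ gives $A = D^{-1} A_{01}$, and each step multiplies by $A^\top$:

```python
import numpy as np

def iterate_importance(A01, p, num_iter=1000):
    """Plain iteration (3): x_{k+1} = A^T x_k with A = D^{-1} A01 and x_0 = p."""
    out_degree = A01.sum(axis=1)
    # Assumes every page has at least one outgoing link, so D is invertible.
    A = A01 / out_degree[:, None]
    x = np.array(p, dtype=float)
    for _ in range(num_iter):
        x = A.T @ x
    return x

# Hypothetical 4-page web: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 3, 3 -> 0
A01 = np.array([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)
p = np.full(4, 0.25)            # uniform initial importance, entries sum to 1
x_bar = iterate_importance(A01, p)
print(x_bar)                    # approximately [2/7, 2/7, 2/7, 1/7]
```

Because A is row-stochastic, $A^\top$ preserves the sum of the entries, so every $x_k$ remains a probability vector. The iteration converges on this particular graph because the graph is strongly connected and contains cycles of lengths 3 and 4, which makes it aperiodic. On a periodic graph the plain iteration can oscillate.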
Task 1: modeling
Leader-follower index
In this project you will develop and extend the concept of importance of a vertex in a graph. Coming back to the example of the web, we now consider two indices for every web page i: a leader
index $\ell_i$ and a follower index $f_i$. If we perform a web search with the keyword “university”, the
homepages of ETH, University of Zurich and other universities are good leaders, while web pages
pointing to these homepages are good followers. In other words, good followers are pages that link
to good leaders, and good leaders are pages that are pointed to by good followers.
Task (1.A) Based on the description above, write one equation expressing the follower index $f_i$
and one equation expressing the leader index $\ell_i$ of node i; your equations should have a structure
similar to (1), but with $[A]_{ji}$ replaced by $[A_{01}]_{ji}$. Introduce the compact notation
$$x = \begin{bmatrix} f \\ \ell \end{bmatrix} \in \mathbb{R}^{2|V|} ,$$
where the vector x stacks together the follower and leader indices of all the vertices; the two
equations that you wrote can be expressed in compact form as
$$x = Mx , \tag{4}$$
where the matrix M is symmetric and nonnegative. Express M in terms of the 0-1 adjacency
matrix $A_{01}$.
1-2-3 index
We can give a different interpretation to the leader score $\ell_i$ and follower score $f_i$ of vertex i by
considering the graph with two vertices
$$\text{follower} \longrightarrow \text{leader} \tag{5}$$
Then $\ell_i$ can be interpreted as a measure of how much vertex i of the graph G and vertex “leader”
in graph (5) are alike; in the same way, we interpret $f_i$ as a measure of how much vertex i of the
graph G and vertex “follower” in graph (5) are alike. As you expressed in Task (1.A), the likeness
index between i and “leader” is the sum of the likeness indices between the in-neighbors of i and
“follower”. Analogously, the likeness index between i and “follower” is the sum of the likeness
indices between the out-neighbors of i and “leader”.
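To make the two neighbor rules concrete, the sketch below uses Python/NumPy and a hypothetical three-node graph. It assumes one reading of the rules stated above, namely $f = A_{01} \ell$ and $\ell = A_{01}^\top f$. It assembles the corresponding block matrix and checks it node by node against the verbal description. It is intended as a sanity check for an answer to Task (1.A), not as a reference solution.

```python
import numpy as np

def leader_follower_matrix(A01):
    """Block matrix M acting on x = [f; ell], assuming f = A01 @ ell and ell = A01.T @ f."""
    n = A01.shape[0]
    Z = np.zeros((n, n))
    return np.block([[Z, A01], [A01.T, Z]])

# Hypothetical 3-node graph: 0 -> 1, 0 -> 2, 1 -> 2
A01 = np.array([[0, 1, 1],
                [0, 0, 1],
                [0, 0, 0]], dtype=float)
n = A01.shape[0]
M = leader_follower_matrix(A01)

# Compare M @ x with the neighbor-sum rules, node by node, on a random x = [f; ell].
rng = np.random.default_rng(0)
f, ell = rng.random(n), rng.random(n)
y = M @ np.concatenate([f, ell])
for i in range(n):
    out_nbrs = [j for j in range(n) if A01[i, j]]   # pages that i links to
    in_nbrs = [j for j in range(n) if A01[j, i]]    # pages that link to i
    assert np.isclose(y[i], sum(ell[j] for j in out_nbrs))    # follower index f_i
    assert np.isclose(y[n + i], sum(f[j] for j in in_nbrs))   # leader index ell_i
print("symmetric:", np.allclose(M, M.T))
```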
Building on this interpretation, instead of computing the likeness index between vertex i of G and
the two vertices of (5), we now compute the likeness index between vertex i of G and
the vertices of the following 1-2-3 graph
$$1 \longrightarrow 2 \longrightarrow 3 \tag{6}$$
For each vertex i of the graph G, we define the index $x_{i1}$, indicating how much i in G and 1
in (6) are alike, the index $x_{i2}$, indicating how much i in G and 2 in (6) are alike, and the index $x_{i3}$,
indicating how much i in G and 3 in (6) are alike. As for the leader and follower indices, the 1-, 2-
and 3-indices of i depend in turn on the 1-, 2- and 3-indices of the neighbors of i in G. Specifically:
• $x_{i1}$ is the sum of the likeness indices between j and 2, for all the out-neighbors j of i;
• $x_{i2}$ is the sum of the likeness indices between j and 1, for all the in-neighbors j of i, plus the
sum of the likeness indices between j and 3, for all the out-neighbors j of i;
• $x_{i3}$ is the sum of the likeness indices between j and 2, for all the in-neighbors j of i.
Task (1.B) Based on the description above, write one equation describing the 1-index $x_{i1}$, one
equation describing the 2-index $x_{i2}$ and one describing the 3-index $x_{i3}$ of node i. These equations
have the structure of (1), but with $[A]_{ji}$ replaced by $[A_{01}]_{ji}$. Introduce the compact notation
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^{3|V|} ,$$
where the vector x stacks together the 1-index, 2-index and 3-index of all the vertices; the three
equations that you derived can be rewritten in compact form as
$$x = Mx , \tag{7}$$
where the matrix M is symmetric and nonnegative. Express M in terms of the 0-1 adjacency
matrix $A_{01}$.
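The same kind of check works for the 1-2-3 graph. The sketch below assumes the reading $x_1 = A_{01} x_2$, $x_2 = A_{01}^\top x_1 + A_{01} x_3$, $x_3 = A_{01}^\top x_2$ of the three bullet rules. It assembles the resulting $3 \times 3$ block matrix and compares it with the rules on a hypothetical graph. Treat the block layout as a candidate to compare with your own answer to Task (1.B).

```python
import numpy as np

def one_two_three_matrix(A01):
    """Block matrix M acting on x = [x1; x2; x3], assuming
    x1 = A01 @ x2, x2 = A01.T @ x1 + A01 @ x3, x3 = A01.T @ x2."""
    n = A01.shape[0]
    Z = np.zeros((n, n))
    return np.block([[Z,     A01,   Z],
                     [A01.T, Z,     A01],
                     [Z,     A01.T, Z]])

# Hypothetical graph: 0 -> 1, 1 -> 2, 0 -> 2
A01 = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)
n = A01.shape[0]
M = one_two_three_matrix(A01)

rng = np.random.default_rng(0)
x1, x2, x3 = rng.random(n), rng.random(n), rng.random(n)
y = M @ np.concatenate([x1, x2, x3])
for i in range(n):
    out_nbrs = [j for j in range(n) if A01[i, j]]
    in_nbrs = [j for j in range(n) if A01[j, i]]
    assert np.isclose(y[i], sum(x2[j] for j in out_nbrs))                    # 1-index
    assert np.isclose(y[n + i], sum(x1[j] for j in in_nbrs)
                      + sum(x3[j] for j in out_nbrs))                        # 2-index
    assert np.isclose(y[2 * n + i], sum(x2[j] for j in in_nbrs))             # 3-index
print("symmetric:", np.allclose(M, M.T))
```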
General formulation: likeness index
We now come to a more general description, where we introduce the likeness between the graph
G = (V, E) and an arbitrary reference graph $G_R = (V_R, E_R)$. For the leader-follower case, the
reference graph $G_R$ was given by (5), while in the 1-2-3 case the reference graph was given by (6).
For each node i in G and each node j in $G_R$, we introduce an index $x_{ij}$, describing how much the
node i and the node j are alike. Generalizing what was done for the leader-follower graph (5) and for
the 1-2-3 graph (6), we say that the likeness index $x_{ij}$ is the sum of the likeness indices between
each in-neighbor of i in G and each in-neighbor of j in $G_R$, plus the sum of the likeness indices between each
out-neighbor of i in G and each out-neighbor of j in $G_R$.
Task (1.C) Based on the description above, write an equation describing the likeness index $x_{ij}$
between node i in G and node j in $G_R$. This equation generalizes the structure of (1). If we
define the likeness matrix $X \in \mathbb{R}^{|V| \times |V_R|}$ such that $[X]_{ij} = x_{ij}$, then the equation that you wrote
for one node i in G and one node j in $G_R$ can be expressed for all the nodes of G and of $G_R$ at
the same time in matrix form, by making use of X, the 0-1 adjacency matrix $A_{01}$ of G and the
0-1 adjacency matrix $A_{01,R}$ of $G_R$; report such expression, which in the following we refer to as the
matrix-form likeness equation. Such equation is linear in the entries of the likeness matrix X; it is
possible to make this linear dependence more explicit by means of the matrix-to-vector operator
called vectorization (denoted vec), which transforms the matrix X into the vector $x = \mathrm{vec}(X)$ by stacking
its columns one below the other. It is then possible to express the matrix-form likeness equation as
$$x = Mx , \tag{8}$$
which generalizes (4) and (7) and where the matrix M is symmetric and nonnegative. Provide an
expression for the matrix M in (8) by exploiting the properties that relate the operator
vec, matrix multiplication and the Kronecker product.
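The property the task refers to is, for column-stacking vec, $\mathrm{vec}(PXQ) = (Q^\top \otimes P)\,\mathrm{vec}(X)$. The sketch below first checks this identity numerically with NumPy. It then assumes one candidate matrix-form likeness equation, $X = A_{01} X A_{01,R}^\top + A_{01}^\top X A_{01,R}$. In this candidate the first term collects out-neighbor pairs and the second collects in-neighbor pairs. The sketch applies the identity term by term and verifies that, for $G_R$ equal to graph (5), the result reduces to the leader-follower block matrix. The candidate equation is an assumption to be compared with your own derivation, and the example graphs are hypothetical.

```python
import numpy as np

def vec(X):
    """Column-stacking vectorization (Fortran order)."""
    return X.reshape(-1, order="F")

# 1) Check vec(P X Q) = (Q^T kron P) vec(X) on random matrices.
rng = np.random.default_rng(1)
P, X, Q = rng.random((3, 3)), rng.random((3, 2)), rng.random((2, 2))
identity_holds = np.allclose(vec(P @ X @ Q), np.kron(Q.T, P) @ vec(X))

# 2) Assumed matrix-form likeness equation: X = A01 X A01_R^T + A01^T X A01_R.
#    Applying the identity to each term gives a candidate M for (8).
def kron_likeness_operator(A01, A01_R):
    return np.kron(A01_R, A01) + np.kron(A01_R.T, A01.T)

A01 = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)  # hypothetical G
A01_R = np.array([[0, 1], [0, 0]], dtype=float)                 # follower -> leader, graph (5)
M = kron_likeness_operator(A01, A01_R)

X = rng.random((3, 2))
rhs = A01 @ X @ A01_R.T + A01.T @ X @ A01_R
consistent = np.allclose(M @ vec(X), vec(rhs))

# With G_R = (5) and x = vec(X) = [f; ell], M should equal the leader-follower block matrix.
Z = np.zeros((3, 3))
reduces_to_lf = np.allclose(M, np.block([[Z, A01], [A01.T, Z]]))
print(identity_holds, consistent, reduces_to_lf, np.allclose(M, M.T))
```

Symmetry of the candidate M follows from $(A_{01,R} \otimes A_{01})^\top = A_{01,R}^\top \otimes A_{01}^\top$, so the two Kronecker terms are transposes of each other.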
As for the solution of (2), finding the solution of (8) (which generalizes (4) and (7)) can be computationally too expensive for a very large network. For this reason, as was done for page-rank
in (3), we introduce the iterative algorithm
$$z_{k+1} = \frac{M z_k}{\lVert M z_k \rVert_2} , \qquad z_0 > 0 , \tag{9}$$
where the normalization makes the Euclidean norm of $z_k$ equal to one for all $k \geq 1$. Since we
are interested in solutions to (8), we investigate the convergence properties of the sequence (9) in
the following Task 2.
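A direct sketch of iteration (9) in NumPy follows. The function returns every iterate so that individual terms of the sequence can be inspected. The $2 \times 2$ matrix is a hypothetical example whose eigenvalues are 3 and 1, so Assumption 1 holds for it. The sketch assumes $M z_k \neq 0$ at every step.

```python
import numpy as np

def normalized_iteration(M, z0, num_iter):
    """Iteration (9): z_{k+1} = M z_k / ||M z_k||_2, returning [z_0, ..., z_num_iter]."""
    z = np.asarray(z0, dtype=float)
    iterates = [z]
    for _ in range(num_iter):
        w = M @ z
        z = w / np.linalg.norm(w)    # assumes M z_k != 0
        iterates.append(z)
    return iterates

# Hypothetical symmetric nonnegative matrix with eigenvalues 3 and 1 (Assumption 1 holds).
M = np.array([[2.0, 1.0], [1.0, 2.0]])
zs = normalized_iteration(M, z0=[1.0, 2.0], num_iter=50)
print(zs[-1])
```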
Task 2: convergence analysis
Assumption 1: the spectral radius ρ of the matrix M in (9) is larger than the magnitude of any
other eigenvalue of M.
Task (2.A) Consider a nonnegative symmetric matrix M satisfying Assumption 1 (not necessarily
of the form derived in the previous tasks); prove that the iteration (9) converges and provide an
expression for the limit it converges to. To this end, you could use some of the following facts:
• symmetric matrices admit an eigenvalue decomposition;
• every symmetric nonnegative matrix can be permuted to a block-diagonal matrix with irreducible blocks.
Task (2.B) Assumption 1 is restrictive, and in the following we remove it. Consider a nonnegative
symmetric matrix M; by modifying the proof of Task (2.A), show that the two subsequences $z_{2k}$
and $z_{2k+1}$ in (9) converge, i.e.
$$\lim_{k\to\infty} z_{2k} = z_{\mathrm{even}}(z_0) , \qquad \lim_{k\to\infty} z_{2k+1} = z_{\mathrm{odd}}(z_0) ,$$
and provide an expression for $z_{\mathrm{even}}(z_0)$ and $z_{\mathrm{odd}}(z_0)$.
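The following small numerical example shows why the even and odd subsequences must be treated separately once Assumption 1 is dropped. The hypothetical matrix below has eigenvalues +1 and −1 of equal magnitude. Iteration (9) then alternates between two vectors instead of converging, while each subsequence is constant from some point on.

```python
import numpy as np

def normalized_iteration(M, z0, num_iter):
    """Iteration (9), returning [z_0, ..., z_num_iter]."""
    z = np.asarray(z0, dtype=float)
    iterates = [z]
    for _ in range(num_iter):
        w = M @ z
        z = w / np.linalg.norm(w)
        iterates.append(z)
    return iterates

# Hypothetical example violating Assumption 1: eigenvalues +1 and -1.
M = np.array([[0.0, 1.0], [1.0, 0.0]])
zs = normalized_iteration(M, z0=[1.0, 2.0], num_iter=100)
z_even_est, z_odd_est = zs[100], zs[99]   # last even- and odd-indexed iterates
print(z_even_est, z_odd_est)              # the two subsequences settle on different vectors
```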
Task (2.C) Provide a characterization of the set $\{z_{\mathrm{even}}(z_0) : z_0 > 0\} \cup \{z_{\mathrm{odd}}(z_0) : z_0 > 0\}$. Moreover, by
using the Schwarz inequality, show that $z_{\mathrm{even}}(\mathbf{1})$, where $\mathbf{1}$ is the vector of all ones, is the vector of largest 1-norm in that set.
Since $z_{\mathrm{even}}$ and $z_{\mathrm{odd}}$ are in general different and depend on the initial condition $z_0$, in the following
we adopt $z_{LK} := z_{\mathrm{even}}(\mathbf{1})$ as the definition of the likeness vector; as a consequence, we can define the
likeness matrix $Z_{LK} \in \mathbb{R}^{|V| \times |V_R|}$, where $[Z_{LK}]_{ij} = [z_{LK}]_{(j-1)|V| + i}$. In other terms, if we denote the
inverse of the vectorization operator vec introduced in Task (1.C) by $\mathrm{vec}^{-1}$, then $Z_{LK} = \mathrm{vec}^{-1}(z_{LK})$.
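As an illustration of the definitions above, the sketch computes $z_{LK}$ by running iteration (9) from the all-ones vector for an even number of steps. It then recovers $Z_{LK} = \mathrm{vec}^{-1}(z_{LK})$ with a column-major (Fortran-order) reshape, so that in 0-based indexing $[Z_{LK}]_{ij} = [z_{LK}]_{j|V| + i}$. The graph G, the choice of $G_R$ as the 1-2-3 graph, and the Kronecker form of M are assumptions made for the example.

```python
import numpy as np

def likeness_vector(M, num_iter=1000):
    """z_LK := z_even(1): iteration (9) from the all-ones vector, 2 * num_iter steps."""
    z = np.ones(M.shape[0])
    for _ in range(2 * num_iter):        # an even number of steps
        w = M @ z
        z = w / np.linalg.norm(w)
    return z

# Hypothetical graph G (0 -> 1, 1 -> 2, 0 -> 2) and reference graph G_R = 1 -> 2 -> 3.
A01 = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)
A01_R = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
# Assumed candidate for (8): M = A01_R kron A01 + A01_R^T kron A01^T.
M = np.kron(A01_R, A01) + np.kron(A01_R.T, A01.T)

z_LK = likeness_vector(M)
n_V, n_R = A01.shape[0], A01_R.shape[0]
Z_LK = z_LK.reshape((n_V, n_R), order="F")   # vec^{-1}: refill the matrix column by column
print(np.round(Z_LK, 3))                     # columns: likeness with vertices 1, 2, 3 of G_R
```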
Task 3: Self-likeness
If we compare the graph G with a reference graph which is G itself, i.e. $G = G_R$, then the likeness
matrix $Z_{LK}$ is a square matrix and we refer to it as the self-likeness matrix. We expect each vertex to
have a high likeness with itself; this intuition is formalized in the following task.
Task (3.A) Given a graph G, show that the largest element of its self-likeness matrix appears on
the diagonal. If a diagonal element is zero, what can be said about the
corresponding row and column?
Hint: Write the analogue of iteration (9) in terms of the matrix-form likeness equation; notice
that $Z(k) \to Z_{LK}$ and study whether $Z(k)$ is positive semidefinite, positive definite or indefinite.
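The sketch below lets you inspect numerically the properties the hint asks about. It assumes that the matrix-form analogue of (9) with $G_R = G$ is $Z(k+1) = (A_{01} Z(k) A_{01}^\top + A_{01}^\top Z(k) A_{01}) / \lVert \cdot \rVert_F$, started from the all-ones matrix. The Frobenius norm of a matrix equals the Euclidean norm of its vectorization, so this normalization matches (9). The graph is hypothetical.

```python
import numpy as np

def self_likeness(A01, num_iter=1000):
    """Assumed matrix-form analogue of (9) with G_R = G:
    Z <- (A01 Z A01^T + A01^T Z A01) / ||A01 Z A01^T + A01^T Z A01||_F,
    started from the all-ones matrix and run for an even number of steps."""
    Z = np.ones(A01.shape)
    for _ in range(2 * num_iter):
        W = A01 @ Z @ A01.T + A01.T @ Z @ A01
        Z = W / np.linalg.norm(W)        # Frobenius norm = Euclidean norm of vec(W)
    return Z

# Hypothetical graph: 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 3
A01 = np.array([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
Z = self_likeness(A01)
eigs = np.linalg.eigvalsh(Z)             # Z is symmetric, so eigvalsh applies
print(np.round(Z, 3))
print("smallest eigenvalue:", eigs.min())
print("largest entry is on the diagonal:", np.isclose(Z.max(), np.diag(Z).max()))
```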
Task 4: Example
Consider the butterfly graph reported in Figure 1, where the central node is pointed to by n nodes
and points to m nodes, and where each node is assigned a label.
[Figure 1: The butterfly graph. n nodes point to the central node, which in turn points to m nodes; the nodes are labeled 1, …, n+m+1.]
Task (4.A) Find the leader index and the follower index for each node of the butterfly graph (making a distinction
between the cases m > n and m < n); moreover, find the 1-index, the 2-index
and the 3-index for each node of the butterfly graph; justify your findings with analytical and/or
numerical evidence.
Which index do you think is more appropriate to describe the structure of the butterfly graph, and
why?
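For the numerical part of Task (4.A), the following is one possible experiment setup. It builds the butterfly graph for given n and m and computes $Z_{LK} = \mathrm{vec}^{-1}(z_{\mathrm{even}}(\mathbf{1}))$ against both reference graphs (5) and (6). It prints the resulting indices so that the cases m > n and m < n can be compared. The node numbering is a hypothetical choice (node 0 is the center) and need not match the labels of Figure 1. The Kronecker form of M is again an assumed candidate for (8).

```python
import numpy as np

def butterfly(n, m):
    """Butterfly adjacency matrix with a hypothetical numbering: node 0 is the center,
    nodes 1..n point to it, and it points to nodes n+1..n+m."""
    N = n + m + 1
    A01 = np.zeros((N, N))
    A01[1:n + 1, 0] = 1.0        # n in-neighbors of the center
    A01[0, n + 1:] = 1.0         # m out-neighbors of the center
    return A01

def likeness_vector(M, num_iter=2000):
    """z_even(1): iteration (9) from the all-ones vector, an even number of steps."""
    z = np.ones(M.shape[0])
    for _ in range(2 * num_iter):
        w = M @ z
        z = w / np.linalg.norm(w)
    return z

def indices(A01, A01_R):
    """Z_LK = vec^{-1}(z_LK), assuming M = A_R kron A + A_R^T kron A^T."""
    M = np.kron(A01_R, A01) + np.kron(A01_R.T, A01.T)
    z = likeness_vector(M)
    return z.reshape((A01.shape[0], A01_R.shape[0]), order="F")

LEADER_FOLLOWER = np.array([[0, 1], [0, 0]], dtype=float)                 # graph (5)
ONE_TWO_THREE = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)  # graph (6)

for n, m in [(3, 5), (5, 3)]:                # one case with m > n, one with m < n
    A01 = butterfly(n, m)
    FL = indices(A01, LEADER_FOLLOWER)       # columns: follower, leader
    X123 = indices(A01, ONE_TWO_THREE)       # columns: 1-index, 2-index, 3-index
    print(f"n = {n}, m = {m}")
    print("follower / leader:\n", np.round(FL, 3))
    print("1 / 2 / 3 indices:\n", np.round(X123, 3))
```

Trying a few more pairs (n, m), including n = m, is a cheap way to test any closed-form expressions you derive analytically.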
Task 5: Application
Task (5.A) Make up three graphs and on each of them compute (analytically and/or numerically)
the self-likeness matrix. Choose the three graphs so as to show different patterns of
the self-likeness matrix. You can focus on graphs with a low number of vertices.
Task (5.B) Download the thesaurus graph from http://control.ee.ethz.ch/~ifaatic/2015/Material/projects/thesaurus.zip. The thesaurus
graph has a vertex for every word, and there exists an edge from i to j if j appears in the definition of i.
Let us now consider the set of four words W = {meeting, learn, salt, possibility}. Given the vertex
associated to a word w ∈ W, construct $G_w$ as the subgraph relative to w, composed of all the
neighboring vertices of w and the relative edges. We consider here a subgraph because handling
the entire graph would result in time-consuming computations. Via a simulation tool (such as
Matlab, Octave or similar), compute the 1-index, the 2-index and the 3-index for every vertex of
$G_w$, for all the words w ∈ W. For every word w ∈ W, sort the vertices of $G_w$ by their 2-index and
give an interpretation to such ranking; how do the top 2-index scorers relate to the original word w?
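The file format inside thesaurus.zip is not described here, so the following sketch deliberately leaves out the loading step. It assumes the graph is already available as a dense 0-1 NumPy adjacency matrix `A01` together with a list `words` of vertex labels. Both names are hypothetical. It also reads "neighboring vertices of w" as the vertices that point to w or are pointed to by w, with w itself excluded. This is an interpretation, so adjust it if your reading differs. The Kronecker form of M is the same assumed candidate for (8) as before. For the full thesaurus, a sparse representation of `A01` would be advisable.

```python
import numpy as np

def neighborhood_subgraph(A01, w):
    """Vertices pointing to w or pointed to by w (w excluded, an assumption)
    and the adjacency matrix they induce."""
    nbrs = np.flatnonzero((A01[w, :] > 0) | (A01[:, w] > 0))
    nbrs = nbrs[nbrs != w]
    return nbrs, A01[np.ix_(nbrs, nbrs)]

def one_two_three_indices(A_w, num_iter=500):
    """Columns 1-, 2-, 3-index of Z_LK for G_w against the 1-2-3 graph,
    assuming M = A_R kron A_w + A_R^T kron A_w^T."""
    A_R = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
    M = np.kron(A_R, A_w) + np.kron(A_R.T, A_w.T)
    z = np.ones(M.shape[0])
    for _ in range(2 * num_iter):            # even number of steps: z_even(1)
        v = M @ z
        norm = np.linalg.norm(v)
        if norm == 0:                        # G_w has no edges: all indices are zero
            return np.zeros((A_w.shape[0], 3))
        z = v / norm
    return z.reshape((A_w.shape[0], 3), order="F")

def rank_by_2_index(A01, words, w):
    """(word, 2-index) pairs of G_w, sorted by decreasing 2-index."""
    nbrs, A_w = neighborhood_subgraph(A01, w)
    X = one_two_three_indices(A_w)
    order = np.argsort(-X[:, 1])
    return [(words[nbrs[k]], X[k, 1]) for k in order]
```

For each w ∈ W, one would then call `rank_by_2_index(A01, words, words.index(w))` and inspect the first few entries.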