ISIT Tutorial: Information Theory and Machine Learning, Part II
Martin Wainwright (UC Berkeley), Emmanuel Abbe (Princeton University)

Inverse problems on graphs

A large variety of machine learning and data-mining problems are about inferring global properties of a collection of agents by observing local, noisy interactions among these agents.

Examples:
- community detection in social networks
- image segmentation
- data classification and information retrieval
- object matching, synchronization
- page sorting
- protein-to-protein interactions
- haplotype assembly
- ...

In each case, one observes information on the edges of a network that has been generated from hidden attributes on the nodes, and tries to infer these attributes back. This is dual to the graphical-model learning problem (previous part).

What about graph-based codes?

[Figure: a code C maps a message W to codeword bits X_1, ..., X_n, which pass through a channel W to produce outputs Y_1, ..., Y_N.]

Coding is different: the code is a design parameter and typically uses specific, non-local interactions of the bits (e.g., random, LDPC, polar codes). Two questions:
(1) What are the relevant types of "codes" and "channels" behind machine learning problems?
(2) What are the fundamental limits for these?

Outline of the talk
1. Community detection and clustering
2. Stochastic block models: fundamental limits and capacity-achieving algorithms
3. Open problems
4. Graphical channels and low-rank matrix recovery

Community detection and clustering

Networks provide local interactions among agents, and one often wants to infer global similarity classes:
- social networks: "friendship"
- call graphs: "calls"
- biological networks: "protein interactions"
- genome Hi-C networks: "DNA contacts"

The challenges of community detection

A long-studied and notoriously hard problem:
- What is a good clustering? (assortative and disassortative relations)
- How do we get a good clustering? (computationally hard; work with models; many heuristics...)

Tutorial motto: Can one establish a clear line of sight, as in communications with the Shannon capacity?

[Figure: peak data efficiency vs. SNR for WAN wireless technologies (HSDPA, EV-DO, 802.16), with the Shannon capacity bounding the infeasible region.]

The Stochastic Block Model

SBM(n, p, W) is defined by:
- a probability vector p = (p_1, ..., p_k) giving the relative sizes of the k communities, with P = diag(p);
- a symmetric k x k matrix W with entries in [0,1], where W_ij is the probability of connecting a node in community i to a node in community j.
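To make the model concrete, here is a minimal NumPy sketch of sampling from SBM(n, p, W); the function name and the illustrative parameters are ours, not part of the tutorial.

```python
import numpy as np

def sample_sbm(n, p, W, seed=None):
    """Sample community labels x and an adjacency matrix A from SBM(n, p, W).

    p: relative community sizes (length k); W: symmetric k x k matrix,
    W[i, j] = probability of an edge between communities i and j.
    """
    rng = np.random.default_rng(seed)
    x = rng.choice(len(p), size=n, p=p)           # hidden labels, i.i.d. ~ p
    probs = np.asarray(W)[np.ix_(x, x)]           # edge probability, every pair
    A = np.triu(rng.random((n, n)) < probs, k=1)  # each unordered pair once
    return x, (A | A.T).astype(np.uint8)          # symmetric, no self-loops

# 2-SBM in the exact-recovery regime p = a log(n)/n, q = b log(n)/n
n, a, b = 1000, 9.0, 1.0
p_in, p_out = a * np.log(n) / n, b * np.log(n) / n
x, A = sample_sbm(n, [0.5, 0.5], [[p_in, p_out], [p_out, p_in]], seed=0)
```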
[Figure: a k = 4 stochastic block model; community i has about n·p_i nodes and connects to community j with probability W_ij. The DMC of clustering..?]

The (exact) recovery problem

Let X^n = [X_1, ..., X_n] represent the community variables of the nodes (drawn under p).

Definition. An algorithm X̂^n(·) solves (exact) recovery in the SBM if, for a random graph G under the model, lim_{n→∞} P(X^n = X̂^n(G)) = 1.

We will see weaker recovery requirements later.

Starting point: progress in science often comes from understanding special cases...

SBM with 2 symmetric communities: the 2-SBM

Two communities of size n/2 each: p_1 = p_2 = 1/2, W_11 = W_22 = p, W_12 = q.

Some history for the 2-SBM recovery problem, P(X̂^n = X^n) → 1 (the model goes back to Holland, Laskey, Leinhardt '83):
- Bui, Chaudhuri, Leighton, Sipser '84: maxflow-mincut, p = Ω(1/n), q = o(n^(-1-4/((p+q)n)))
- Boppana '87: spectral method, (p-q)/√(p+q) = Ω(√(log(n)/n))
- Dyer, Frieze '89: min-cut via degrees, p-q = Ω(1)
- Snijders, Nowicki '97: EM algorithm, p-q = Ω(1)
- Jerrum, Sorkin '98: Metropolis algorithm, p-q = Ω(n^(-1/6+ε))
- Condon, Karp '99: augmentation algorithm, p-q = Ω(n^(-1/2+ε))
- Carson, Impagliazzo '01: hill-climbing algorithm, p-q = Ω(n^(-1/2) log^4(n))
- McSherry '01: spectral method, (p-q)/√p ≥ Ω(√(log(n)/n))
- Bickel, Chen '09: N-G modularity, (p-q)/√(p+q) = Ω(log(n)/√n)
- Rohe, Chatterjee, Yu '11: spectral method, p-q = Ω(1)

All algorithm-driven: instead of 'how', when can we recover the clusters (information-theoretically)?

Information-theoretic view of clustering

- Coding: a message W is encoded by a code C into X_1, ..., X_n, and each bit is sent through a BSC W = (1-ε, ε; ε, 1-ε); rate R = n/N; reliable communication iff R < 1 - H(ε).
- 2-SBM: an unorthodox code! The community labels X_1, ..., X_n index N = C(n,2) edge observations, each pair going through the channel W = (1-p, p; 1-q, q) (first row if the pair lies in the same community, second row otherwise); rate R = n/C(n,2) ≈ 2/n. Here reliable communication means exact recovery — at which threshold?

Answer [Abbe-Bandeira-Hall '14]: with p = a log(n)/n and q = b log(n)/n, recovery is solvable iff (a+b)/2 ≥ 1 + √(ab), and the threshold is efficiently achievable.

[Figure: feasible and infeasible regions for exact recovery in the (a, b) plane.]
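The threshold above is a closed-form check; a small sketch (our naming) for the regime p = a log(n)/n, q = b log(n)/n:

```python
import numpy as np

def exact_recovery_solvable(a, b):
    """2-SBM with p = a log(n)/n, q = b log(n)/n [Abbe-Bandeira-Hall '14]:
    exact recovery is solvable (and efficiently so) iff
    (a + b)/2 - sqrt(a*b) >= 1, i.e. (sqrt(a) - sqrt(b))^2 >= 2."""
    return (a + b) / 2 - np.sqrt(a * b) >= 1

print(exact_recovery_solvable(9, 1))   # True:  5 - 3 = 2 >= 1
print(exact_recovery_solvable(5, 1))   # False: 3 - sqrt(5) ~ 0.76 < 1
```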
The detection problem (2010-2014): Decelle, Krzakala, Zdeborová, Moore; Mossel, Neeman, Sly; Massoulié; Coja-Oghlan. An algorithm solves detection if there is ε > 0 such that P(d(X̂^n, X^n)/n < 1/2 - ε) → 1. For p = a/n, q = b/n: detection is solvable iff (a-b)^2 > 2(a+b) [conjectured by Decelle-Krzakala-Zdeborová-Moore '11, settled by Mossel-Neeman-Sly and Massoulié].

What about multiple/asymmetric communities? Conjecture: the detection picture changes with 5 or more communities.

Recovery in the 2-SBM: the IT limit

Converse [Abbe-Bandeira-Hall '14]: if (a+b)/2 - √(ab) < 1, then ML fails w.h.p. (p = a log(n)/n, q = b log(n)/n). What is ML here? Min-bisection. ML fails if two nodes can be swapped to reduce the cut. With R_i the number of neighbors node i has inside its own community and B_i the number across,
P(∃ i : B_i ≥ R_i) ≈ n · P(B_1 ≥ R_1) (weak correlations), and P(B_1 ≥ R_1) = n^(-((a+b)/2 - √(ab)) + o(1)).

Recovery in the 2-SBM: efficient algorithms

- spectral relaxation: max x^T A x s.t. ||x|| = 1, 1^T x = 0
- ML decoding: max x^T A x s.t. x_i = ±1, 1^T x = 0 (NP-hard)
- lifting X = xx^T and dropping rank(X) = 1 gives the SDP: max tr(AX) s.t. X_ii = 1, 1^T X = 0, X ⪰ 0

Theorem [Abbe-Bandeira-Hall '14]. The SDP solves recovery if 2 L_SBM + 11^T + I_n ⪰ 0, where L_SBM = D_{G+} - D_{G-} - A.

→ Analyze the spectral norm of a random matrix:
- [Abbe-Bandeira-Hall '14]: Bernstein inequality, slightly loose
- [Hajek-Wu-Xu '15]: Seginer bound
- [Bandeira '15, Bandeira-van Handel '15]: tight bound

Note that the SDP can be expensive...

The general SBM

Quiz: if a node is in community i, how many neighbors does it have in expectation in community j?
1. n p_1   2. n p_j   3. n p_j W_ij   4. n p_i W_ij

[Figure: a k = 4 block model with the expected degrees n p_j W_ij.]

Answer: 3 — these expected degrees form the "degree profile matrix" nPW, whose i-th column is the profile of a node in community i.

Back to the information-theoretic view of clustering

- BSC coding: reliable communication iff R < 1 - H(ε); general channel W: reliable iff R < max_p I(p, W).
- 2-SBM: reliable (exact recovery) iff 1 < (a+b)/2 - √(ab); general SBM: reliable iff 1 < J(p, W) for a suitable divergence J — what plays the role of the mutual information (a KL-type divergence)?

Main results

Theorem 1 [Abbe-Sandon '15]. Recovery is solvable in SBM(n, p, Q log(n)/n) if and only if
J(p, Q) := min_{i<j} D_+((PQ)_i, (PQ)_j) ≥ 1,
where
D_+(μ, ν) := max_{t∈[0,1]} Σ_{ℓ∈[k]} [ t μ_ℓ + (1-t) ν_ℓ - μ_ℓ^t ν_ℓ^(1-t) ] = max_{t∈[0,1]} D_t(μ, ν).

We call D_+ the CH-divergence. Remarks:
- D_t is an f-divergence: Σ_ℓ ν_ℓ f(μ_ℓ/ν_ℓ) with f(x) = 1 - t + tx - x^t
- for probability vectors, max_t (-log Σ_ℓ μ_ℓ^t ν_ℓ^(1-t)) is the Chernoff divergence
- D_{1/2}(μ, ν) = (1/2) ||√μ - √ν||_2^2 is the Hellinger divergence (distance)
- for the symmetric 2-SBM, J reduces to (1/2)(√a - √b)^2 = (a+b)/2 - √(ab), recovering the threshold above

Is recovery in the general SBM solvable efficiently down to the information-theoretic threshold? YES!

Theorem 2 [Abbe-Sandon '15]. The degree-profiling algorithm achieves the threshold and runs in quasi-linear time.

When can we extract a specific community?

Theorem [Abbe-Sandon '15]. If community i has a profile (PQ)_i at D_+-distance at least 1 from all other profiles (PQ)_j, j ≠ i, then it can be extracted w.h.p.

What if we do not know the parameters? We can learn them on the fly [Abbe-Sandon '15, second paper].

Proof techniques and algorithms

A key step: classify a node v from its degree profile d_v, between
- Hypothesis 1: d_v ~ P(log(n) (PQ)_1)
- Hypothesis 2: d_v ~ P(log(n) (PQ)_2)
where P(λ) denotes the multivariate Poisson distribution with mean vector λ.

Theorem [Abbe-Sandon '15]. For any θ_1, θ_2 ∈ (R_+ \ {0})^k with θ_1 ≠ θ_2 and priors p_1, p_2 ∈ R_+ \ {0},
Σ_{x ∈ Z_+^k} min( P_{ln(n)θ_1}(x) p_1, P_{ln(n)θ_2}(x) p_2 ) = Θ( n^(-D_+(θ_1, θ_2) + o(1)) ),
where D_+ is the CH-divergence.
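A small numerical sketch of the CH-divergence and of J(p, Q), using a grid search over t (function names, the grid size, and the column convention for (PQ)_i are our choices); for the symmetric 2-SBM it reproduces (1/2)(√a - √b)^2:

```python
import numpy as np

def D_t(mu, nu, t):
    """D_t(mu, nu) = sum_l [t mu_l + (1-t) nu_l - mu_l^t nu_l^(1-t)]."""
    mu, nu = np.asarray(mu, float), np.asarray(nu, float)
    return np.sum(t * mu + (1 - t) * nu - mu**t * nu**(1 - t))

def D_plus(mu, nu, grid=10_000):
    """CH-divergence: maximize D_t over t in [0, 1] (grid search here)."""
    return max(D_t(mu, nu, t) for t in np.linspace(0, 1, grid))

def J(p, Q):
    """J(p, Q) = min_{i<j} D_+((PQ)_i, (PQ)_j), taking (PQ)_i to be the
    i-th column of diag(p) Q, the expected degree profile of community i."""
    PQ = np.diag(p) @ np.asarray(Q, float)
    k = len(p)
    return min(D_plus(PQ[:, i], PQ[:, j])
               for i in range(k) for j in range(i + 1, k))

# symmetric 2-SBM, a = 9, b = 1: J = (sqrt(a) - sqrt(b))^2 / 2 = 2 >= 1
print(J([0.5, 0.5], [[9, 1], [1, 9]]))   # ~2.0 -> recovery solvable
```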
How to use this? Plan: put effort into recovering most of the nodes, and then finish greedily with local improvements.

The degree-profiling algorithm [Abbe-Sandon '15]:
(1) Split the edges of G into two graphs, G' (loglog-degree) and G'' (log-degree).
(2) Run Sphere-comparison on G' → classifies a fraction 1 - o(1) of the nodes correctly (see below).
(3) Classify each node of G'' given the clustering of G', by testing its degree profile d_v between Hypothesis 1: d_v ~ P(log(n)(PQ)_1) and Hypothesis 2: d_v ~ P(log(n)(PQ)_2). By the theorem above, the error probability is P_e = n^(-D_+((PQ)_1, (PQ)_2) + o(1)) — capacity-achieving.

How do we get most nodes correctly?

Other recovery requirements (c = fraction of correctly labeled nodes):
- Weak recovery or detection: c = 1/k + ε for some ε > 0 (for the symmetric k-SBM)
- Partial recovery: an algorithm solves partial recovery in the SBM with accuracy c if it produces a clustering which is correct on a fraction c of the nodes with high probability
- Almost exact recovery: c = 1 - o(1)
- Exact recovery: c = 1

For all of the above: what are the "efficient" vs. "information-theoretic" fundamental limits?

Partial recovery in SBM(n, p, Q/n)

What is a good notion of SNR? Proposed notion: SNR = λ_min^2 / λ_max, with λ_min, λ_max the smallest and largest eigenvalues of PQ. This gives
(a-b)^2 / (2(a+b)) for 2 symmetric communities, and (a-b)^2 / (k(a+(k-1)b)) for k symmetric communities.

[Figure: within- and across-community degree distributions, Bin(n/2, a/n) and Bin(n/2, b/n).]

Theorem (informal) [Abbe-Sandon '15]. In the sparse SBM(n, p, Q/n), the Sphere-comparison algorithm recovers a fraction of nodes which approaches 1 as the SNR diverges. Note that the SNR scales if Q scales!

Sphere-comparison [Abbe-Sandon '15]

A node neighborhood in SBM(n, p, Q/n): a node v in community i has about p_j Q_ij depth-1 neighbors in community j, so the community profile of its depth-r sphere N_r(v) behaves like ((PQ)^r)_i.

Take now two nodes v (community i) and v' (community j), and compare them from |N_r(v) ∩ N_{r'}(v')|: hard to analyze...

Decorrelate: subsample each edge of G with probability c to get E, and compare v and v' through the number of E-edges crossing between N_{r[G\E]}(v) and N_{r'[G\E]}(v'):
N_{r,r'[E]}(v · v') ≈ N_{r[G\E]}(v) · (cQ/n) · N_{r'[G\E]}(v')
≈ ((1-c)PQ)^r e_v · (cQ/n) · ((1-c)PQ)^{r'} e_{v'}
= c(1-c)^{r+r'} e_v · Q(PQ)^{r+r'} e_{v'} / n.

Additional steps: 1. look at several depths → a Vandermonde system; 2. use "anchor nodes".

A real data example: the political blogs network [Adamic and Glance '05]

1222 blogs (left- and right-leaning); an edge is a hyperlink between blogs. The empirical CH-divergence is close to 1, and we can recover 95% of the nodes correctly.

Some open problems in community detection

I. The SBM:

a. Recovery. Growing number of communities? Sub-linear communities? [Abbe-Sandon '15] should extend to k = o(log(n)); [Chen-Xu '14] treat k, p, q scaling polynomially with n.

b. Detection and broadcasting on trees. Converse for detection in the 2-SBM with p = a/n, q = b/n [Mossel-Neeman-Sly '13]: the local neighborhood of a node is a Galton-Watson tree with offspring Poisson((a+b)/2); if (a+b)/2 ≤ 1 the tree dies w.p. 1, if (a+b)/2 > 1 it survives w.p. > 0. This yields an unorthodox broadcasting problem: the root bit X_r is broadcast down the tree through BSC(b/(a+b)); when can we detect the root bit? With branching number c = (a+b)/2 and flip probability ε = b/(a+b), detection of the root is possible if and only if c(1-2ε)^2 > 1 [Evans-Kenyon-Peres-Schulman '00], which is equivalent to (a-b)^2 / (2(a+b)) > 1. (A numerical check follows below.)
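The equivalence between the broadcast condition and the SNR condition is a short computation, c(1-2ε)^2 = ((a+b)/2)·((a-b)/(a+b))^2 = (a-b)^2/(2(a+b)); a small sketch (our naming), together with the k-community SNR used next:

```python
import numpy as np

def snr(a, b, k):
    """Proposed SNR for the symmetric k-SBM with p = a/n, q = b/n:
    lambda_min^2 / lambda_max of PQ = (a-b)^2 / (k (a + (k-1) b))."""
    return (a - b) ** 2 / (k * (a + (k - 1) * b))

def root_bit_detectable(a, b):
    """[Evans-Kenyon-Peres-Schulman '00]: the root bit broadcast on a
    Galton-Watson tree with offspring Poisson(c), c = (a+b)/2, through
    BSC(eps), eps = b/(a+b), is detectable iff c (1 - 2 eps)^2 > 1."""
    c, eps = (a + b) / 2, b / (a + b)
    return c * (1 - 2 * eps) ** 2 > 1

# c (1 - 2 eps)^2 = (a-b)^2 / (2(a+b)), so both criteria agree for k = 2
a, b = 7, 1
print(root_bit_detectable(a, b), snr(a, b, k=2) > 1)   # True True
```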
For k clusters? SNR = (a-b)^2 / (k(a + (k-1)b)).

[Figure: conjectured phase diagram for detection — impossible for SNR < c_k, possible for c_k < SNR < 1, efficient for SNR > 1.]

Conjecture [Decelle-Krzakala-Zdeborová-Moore '11]. For the symmetric k-SBM(n, a, b), there exists c_k ≤ 1 s.t.
(1) if SNR < c_k, then detection cannot be solved;
(2) if c_k < SNR < 1, then detection can be solved information-theoretically but not efficiently;
(3) if SNR > 1, then detection can be solved efficiently.
Moreover, c_k = 1 for k ∈ {2, 3, 4} and c_k < 1 for k ≥ 5.

c. Partial recovery and the SNR-distortion curve.

Conjecture. For the symmetric k-SBM(n, a, b) and accuracy α ∈ (1/k, 1), there exist an information-theoretic threshold and a (larger) computational threshold such that partial recovery of accuracy α is solvable if and only if the SNR exceeds the former, and efficiently solvable iff it exceeds the latter.

[Figure: conjectured SNR-distortion curves (distortion 1-α vs. SNR) for k = 2 and k ≥ 5.]

II. Other block models:
- Censored block models: 2-CBM(n, p, ε), where edge labels on a graph with two communities of size n/2 are observed through a BSC(ε). [Abbe-Bandeira-Bracher-Singer '14, Xu-Lelarge-Massoulié, Saad-Krzakala-Lelarge-Zdeborová, Chin-Rao-Vu, Hajek-Wu-Xu]
- Labelled block models and related threads: "correlation clustering" [Bansal-Blum-Chawla '04], "LDGM codes" [Kumar-Pakzad-Salavati-Shokrollahi '12], "labelled block model" [Heimlicher-Lelarge-Massoulié '12], "soft CSPs" [Abbe-Montanari '13], "pairwise measurements" [Chen-Suh-Goldsmith '14 and '15], "bounded-size correlation clustering" [Puleo-Milenkovic '14]
- Degree-corrected block models [Karrer-Newman '11]
- Mixed-membership block models [Airoldi-Blei-Fienberg-Xing '08]
- Overlapping block models [Abbe-Sandon '15]: OSBM(n, p, f), with X_u i.i.d. from p on {0,1}^s and (u, v) connected with probability f(X_u, X_v), where f : {0,1}^s × {0,1}^s → [0,1]
- Planted community [Deshpande-Montanari '14, Montanari '15]
- Hypergraph models

For all of the above: is there a CH-divergence behind? A generalized notion of SNR? Detection gaps? Efficient algorithms?

III. Beyond block models:

a. Exchangeable random arrays and graphons. A graphon is a map w : [0,1]^2 → [0,1]; each node i carries a uniform label u_i, and P(E_ij = 1 | U_i = u_i, U_j = u_j) = w(u_i, u_j) [Lovász]. SBMs can approximate graphons [Choi-Wolfe, Airoldi-Costa-Chan], since a block model is exactly a piecewise-constant graphon (see the sketch below).

[Slide excerpt from the stochastic blockmodel approximation (SBA) paper: given a graphon w, draw i.i.d. u_i ~ Uniform[0,1] and set G[i,j] = 1 with probability w(u_i, u_j); if w is piecewise Lipschitz, it can be consistently estimated (up to node permutations) by a two-dimensional step function, i.e., by a stochastic block model.]
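A minimal sketch of sampling from a graphon, illustrating that a block model is a piecewise-constant graphon (function names and parameter values are ours):

```python
import numpy as np

def sample_graphon(n, w, seed=None):
    """Draw u_i ~ Uniform[0,1] i.i.d. and connect i ~ j w.p. w(u_i, u_j)."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    probs = w(u[:, None], u[None, :])          # w on all pairs (vectorized)
    A = np.triu(rng.random((n, n)) < probs, k=1)
    return u, (A | A.T).astype(np.uint8)

def sbm_graphon(x, y, a=0.8, b=0.2, k=2):
    """A symmetric k-block SBM as a piecewise-constant graphon on [0,1]^2."""
    same = np.floor(k * x) == np.floor(k * y)  # same block of [0,1]?
    return np.where(same, a, b)

u, A = sample_graphon(500, sbm_graphon, seed=0)
```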
b. Graphical channels: a general information-theoretic model?

Graphical channels [Abbe-Montanari '13]: a family of channels motivated by inference-on-graphs problems. Given a k-hypergraph G = (V, E) and a channel (kernel) Q : X^k → Y, the graphical channel maps inputs X_1, ..., X_n to outputs Y_1, ..., Y_N with, for x ∈ X^n and y ∈ Y^|E|,
P(y | x) = Π_{e ∈ E(G)} Q(y_e | x[e]).

How much information do graphical channels carry?

Theorem. lim_{n→∞} (1/n) I(X^n; G) exists for Erdős-Rényi graphs and some kernels.
→ Why not always? What is the limit?

Connection: sparse PCA and clustering

SBM and low-rank Gaussian models. Spiked Wigner model:
Y = √(λ/n) XX^T + Z, with Z Gaussian symmetric (i.e., Y_ij = c X_i X_j + Z_ij).
- X = (X_1, ..., X_n) i.i.d. Bernoulli(ε) → sparse PCA [Amini-Wainwright '09, Deshpande-Montanari '14]
- X = (X_1, ..., X_n) i.i.d. Rademacher(1/2) → SBM???

Theorem [Deshpande-Abbe-Montanari '15]. If n(p_n - q_n)^2 / (2(p_n + q_n)) → λ (finite SNR), with np_n, nq_n → ∞ (large degrees), then
lim_{n→∞} (1/n) I(X; G(n, p_n, q_n)) = lim_{n→∞} (1/n) I(X; Y) = λ/4 + λ_*^2/(4λ) - λ_*/2 + I(λ_*),
where λ_* solves λ_* = λ(1 - MMSE(λ_*)), and I(γ), MMSE(γ) are the mutual information and MMSE of the single-letter channel Y_1(γ) = √γ X_1 + Z_1 (cf. the I-MMSE relation [Guo-Shamai-Verdú]).
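A numerical sketch of the single-letter formula, using the standard scalar-channel expressions for a ±1 input, MMSE(γ) = 1 - E[tanh^2(γ + √γ Z)] and I(γ) = γ - E[log cosh(γ + √γ Z)] (nats); the quadrature order and the fixed-point iteration are our choices, not the paper's. Below the spectral threshold (λ ≤ 1) the fixed point is λ_* = 0 and the limit is λ/4.

```python
import numpy as np

# E_Z[f(Z)], Z ~ N(0,1), via probabilists' Gauss-Hermite quadrature
z, wts = np.polynomial.hermite_e.hermegauss(61)
wts = wts / wts.sum()
gauss_E = lambda f: np.sum(wts * f(z))

def mmse(g):
    """Scalar MMSE of Y = sqrt(g) X + Z, X Rademacher, Z ~ N(0,1)."""
    return 1.0 - gauss_E(lambda t: np.tanh(g + np.sqrt(g) * t) ** 2)

def mi(g):
    """Scalar mutual information (nats): I(g) = g - E[log cosh(g + sqrt(g) Z)]."""
    logcosh = lambda s: np.logaddexp(s, -s) - np.log(2)   # stable log cosh
    return g - gauss_E(lambda t: logcosh(g + np.sqrt(g) * t))

def asymptotic_mi(lam, iters=500):
    """lim (1/n) I(X; Y): solve lam_* = lam (1 - mmse(lam_*)) by iteration,
    then evaluate lam/4 + lam_*^2/(4 lam) - lam_*/2 + I(lam_*)."""
    ls = lam
    for _ in range(iters):
        ls = lam * (1.0 - mmse(ls))
    return lam / 4 + ls**2 / (4 * lam) - ls / 2 + mi(ls)

print(asymptotic_mi(0.5), 0.5 / 4)   # agree: lam_* = 0 for lam <= 1
print(asymptotic_mi(4.0))            # above the threshold, lam_* > 0
```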
Conclusion

Community detection couples naturally with the channel view of information theory, and more specifically with unorthodox versions of:
- graph-based codes
- f-divergences
- broadcasting problems
- I-MMSE
- ...

More generally, the problem of inferring global similarity classes in data sets from noisy local interactions is at the center of many problems in ML, and an information-theoretic view of these problems seems needed and powerful.

Questions?

Documents related to the tutorial: www.princeton.edu/~eabbe