Bump Hunting in the Dark

Local Discrepancy Maximization on Graphs
Aristides Gionis, Michael Mathioudakis, Antti Ukkonen
April 16th, 2015, Seoul, Korea
ICDE 2015

Find the Bump!
•  undirected graph: computer network, online social network, etc.
•  special nodes: failure reported / virus detected! / posted message about topic X
•  regular nodes
Here it is!

•  graph, special & non-special nodes
•  find connected subgraph with max linear discrepancy
•  score = α × #special − #non-special
•  examples for α = 1: score = 0 − 1 = −1, score = 1 − 0 = 1, score = 18 − 3 = 15
•  fixed fraction of special nodes: subgraph size increases, score increases
•  fixed subgraph size: fraction of special nodes increases, score increases

Local Access
•  special nodes are provided as input
•  build graph via get-neighbors function
•  how much of the graph do we need? (infinite graph?)

Approach
•  first retrieve a minimal part of the graph
   –  necessary when we have only local access
   –  use get-neighbors function to expand around the special nodes
•  then solve problem on retrieved subgraph

In What Follows
•  Unrestricted Access
   –  Complexity
   –  Connection to Steiner Trees
   –  Special Case: Graph is Tree
   –  Algorithms
•  Local Access
   –  Algorithms to retrieve (part of) the graph
•  Experiments
•  Future Work

Unrestricted Access
the problem is… NP-hard (reduction from SETCOVER)

Unrestricted Access
the problem is… a special case of PRIZECOLLECTINGSTEINERTREE (PCST)
•  input: graph, positive weight on terminal nodes, non-negative weight on edges
•  output: tree
•  objective: min Σ(missed terminal node weights) + Σ(edge weights)
•  terminal nodes = special nodes: weight = α + 1
•  edges: weight = 1
•  max discrepancy is instance of min PCST weight
•  Goemans-Williamson algorithm: O(|V||E|) & O(|V|^2 log|V|), (2 − 1/(n − 1))-approximation
•  min-approximation does not translate to our max-problem
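
The statement that maximum discrepancy is an instance of minimum PCST weight can be checked with a short calculation. The notation here is an assumption, not taken from the slides: S is the set of all special nodes, and a candidate tree T contains s_T special nodes and n_T non-special nodes, hence s_T + n_T − 1 edges of weight 1.

```latex
\begin{align*}
\mathrm{cost}(T) &= (\alpha+1)\,\bigl(|S| - s_T\bigr) + (s_T + n_T - 1) \\
                 &= (\alpha+1)\,|S| - 1 - (\alpha\, s_T - n_T) \\
                 &= (\alpha+1)\,|S| - 1 - \mathrm{score}(T)
\end{align*}
```

Since (α+1)|S| − 1 does not depend on T, minimizing the PCST cost is the same as maximizing the score. The additive constant changes ratios between solutions, though, which fits the remark that an approximation guarantee for the minimization problem does not carry over to the maximization problem.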

Special Case
graph is a tree
•  optimal linear recursive algorithm
•  optimal in graph: optimal without root (in a subtree) or optimal with root
•  optimal with root: combines optimal with root of subtrees

Algorithms
idea for general case: heuristics
•  generate spanning tree for graph
•  solve problem on spanning tree
choices of tree
•  BFS-trees from each special node
•  min weight spanning tree
   –  random weights: w(u,v) in [0,1]
   –  'smart' weights: w(u,v) = 2 − 1{u is special} − 1{v is special}
•  GW for PCST
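
The tree special case can be illustrated with a small dynamic program. This is a minimal sketch of the "with root / without root" recursion, not the authors' implementation: the function name, the adjacency-dict input, and the iterative traversal (used instead of recursion to avoid Python's recursion limit) are all assumptions made for illustration.

```python
def max_discrepancy_on_tree(adj, special, alpha=1.0):
    """Return (best_score, node_set) for the best connected subgraph of a tree.

    adj:     dict mapping each node to a list of its neighbours (a connected tree)
    special: set of special nodes
    alpha:   weight of a special node; every non-special node costs 1
    """
    root = next(iter(adj))              # any node can serve as the root
    parent = {root: None}
    order = []                          # DFS preorder: parents before children
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)

    # with_root[u]: best score of a connected subgraph of u's subtree containing u
    with_root = {}
    for u in reversed(order):           # children are processed before their parent
        score = alpha if u in special else -1.0
        for v in adj[u]:
            if parent[v] == u and with_root[v] > 0:
                score += with_root[v]   # keep a child subtree only if it helps
        with_root[u] = score

    # The optimum "without root" of the whole tree is the optimum "with root"
    # of some subtree, so the overall best is the maximum over all nodes.
    best = max(with_root, key=with_root.get)

    # Recover the chosen nodes by following the child subtrees that were kept.
    chosen, stack = set(), [best]
    while stack:
        u = stack.pop()
        chosen.add(u)
        for v in adj[u]:
            if parent[v] == u and with_root[v] > 0:
                stack.append(v)
    return with_root[best], chosen


# Path a - b - c - d with special nodes a, c, d and alpha = 2.
tree = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
score, nodes = max_discrepancy_on_tree(tree, special={"a", "c", "d"}, alpha=2)
print(score, sorted(nodes))
```

Each node and edge is visited a constant number of times, which matches the linear running time stated on the slide.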

Local Access
•  start with special nodes
•  call get-neighbors(node) to expand
•  retrieve entire graph?
expansion strategies
•  full expansion: return a subgraph that contains the optimal solution
•  oblivious expansion, adaptive expansion

Full Expansion
retrieve graph that contains optimal solution with calls to get-neighbors()
maintain frontier to expand in each iteration
initially:
1.  frontier = special nodes
loop:
2.  call get-neighbors() on frontier nodes
3.  update frontier
4.  go to 2 if frontier not empty
frontier condition: unexpanded nodes for which the min distance from a reachable special node is
less than (α+1) × #reachable_special_nodes
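
The loop above can be sketched as follows. This is one possible reading of the frontier condition, not the authors' code: "reachable" is taken to mean "in the same connected component of the graph retrieved so far", `get_neighbors` is a caller-supplied function standing in for get-neighbors(), and components are recomputed from scratch every round, which is simple but not efficient.

```python
from collections import deque


def _frontier(adj, expanded, special, alpha):
    """Unexpanded nodes closer than (alpha + 1) * #reachable special nodes."""
    frontier, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component of the retrieved graph containing start.
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        # Multi-source BFS from the special nodes of this component.
        sources = comp & special
        budget = (alpha + 1) * len(sources)
        dist = {s: 0 for s in sources}
        queue = deque(sources)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        frontier |= {u for u in comp if u not in expanded and dist[u] < budget}
    return frontier


def full_expansion(special, get_neighbors, alpha=1.0):
    """Return (retrieved adjacency dict, number of get_neighbors calls)."""
    special = set(special)
    adj = {s: set() for s in special}   # step 1: start from the special nodes
    expanded, calls = set(), 0
    while True:
        frontier = _frontier(adj, expanded, special, alpha)   # step 3
        if not frontier:                                      # step 4
            return adj, calls
        for u in frontier:                                    # step 2
            calls += 1
            expanded.add(u)
            for v in get_neighbors(u):
                adj[u].add(v)
                adj.setdefault(v, set()).add(u)


# Hypothetical hidden graph: a path 0 - 1 - ... - 9.
def path_neighbors(u):
    return [v for v in (u - 1, u + 1) if 0 <= v <= 9]

graph, n_calls = full_expansion({0}, path_neighbors, alpha=1)
print(sorted(graph), n_calls)
```

Every retrieved node is connected to a special node, because expansion only ever starts from special nodes, so the distance lookup is defined for every node in a component.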

Full Expansion
•  expensive in practice
•  we look for heuristics

Oblivious Expansion
•  expand (α+1) times around each special node
•  not guaranteed to cover optimal solution
[Figure: examples (a) and (b) for α = 1, marking the special nodes, the optimal solution, the part retrieved by ObliviousExpansion, and the part additionally retrieved by Full Expansion]

Adaptive Expansion
idea: while expanding, estimate heuristically how close we are to optimal
•  maintain heuristic
   –  upper bound of optimal solution in entire graph
   –  lower bound of optimal solution from retrieved graph
•  for each component (connected components of retrieved graph), solve problem on spanning trees rooted at frontier nodes
•  obtain solutions with and w/o roots
•  terminate when upper bound ≅ lower bound
[Figure: connected components of retrieved graph, with a tree of negative discrepancy and a tree of positive discrepancy]
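
Of the two heuristics, oblivious expansion is simple enough to sketch in a few lines. The version below is an illustration under assumptions, not the authors' implementation: `get_neighbors` is a caller-supplied stand-in for get-neighbors(), a node is never expanded twice, and a non-integer α is rounded up to a whole number of rounds.

```python
import math


def oblivious_expansion(special, get_neighbors, alpha=1.0):
    """Expand (alpha + 1) rounds around each special node.

    Returns (retrieved adjacency dict, number of get_neighbors calls).
    """
    rounds = math.ceil(alpha + 1)       # assumption: round up for fractional alpha
    adj, expanded, calls = {}, set(), 0
    for s in special:
        adj.setdefault(s, set())
        level = {s}                     # nodes to expand in the current round
        for _ in range(rounds):
            next_level = set()
            for u in level:
                if u not in expanded:   # pay for each node at most once
                    calls += 1
                    expanded.add(u)
                    for v in get_neighbors(u):
                        adj[u].add(v)
                        adj.setdefault(v, set()).add(u)
                next_level |= adj[u]    # all neighbours of u are known by now
            level = next_level
    return adj, calls


# Hypothetical hidden graph: a path 0 - 1 - ... - 9, one special node in the middle.
def path_neighbors(u):
    return [v for v in (u - 1, u + 1) if 0 <= v <= 9]

graph, n_calls = oblivious_expansion({5}, path_neighbors, alpha=1)
print(sorted(graph), n_calls)
```

The number of rounds is fixed in advance and ignores how many special nodes lie nearby, which is why this strategy is cheap but, as the slide notes, not guaranteed to cover the optimal solution.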

Datasets

Table I. Dataset statistics (numbers are rounded).

             Dataset        |V|           |E|
synthetic    Geo            1 · 10^6      4 · 10^6
             BA             1 · 10^6      10 · 10^6
             Grid           4 · 10^6      8 · 10^6
real         Livejournal    4.3 · 10^6    69 · 10^6
             Patents        2 · 10^6      16.5 · 10^6
             Pokec          1.4 · 10^6    30.6 · 10^6

All graphs used in the experiments are undirected and their sizes are reported in Table I.

Input
•  graphs from previous datasets
•  special nodes: planted spheres
   –  k spheres
   –  radius ρ
   –  s special nodes inside spheres
   –  z special nodes outside spheres

Algorithms
•  expansion: Full (...not!), Oblivious, Adaptive
   –  measure: cost (get-neighbors calls), graph size
•  discrepancy optimization: BFS, random-ST, smart-ST, PCST
   –  measure: running time vs graph size, accuracy (Jaccard coeff)

Expansion

Table II. Expansion table (averages of 20 runs).

                         ObliviousExp.        AdaptiveExp.
dataset        s    k    cost    size         cost     size
Grid           20   2    302     888          2783     7950
Grid           60   1    261     784          534      1604
Geo            20   2    452     2578         4833     30883
Geo            60   1    418     2452         578      3991
BA             20   2    3943    243227       114      6032
BA             60   1    4477    271870       135      7407
Patents        20   2    605     3076         13436    25544
Patents        60   1    620     3126         5907     13009
Pokec          20   2    3884    217592       161      7249
Pokec          60   1    4343    240544       116      5146
Livejournal    20   2    3703    348933       234      13540
Livejournal    60   1    4667    394023       129      7087

… generate Q, and in interest of presentation, here we report what we consider to be representative results. Table II shows the cost (number of API calls) as well as the size (number of edges) of the retrieved graph G_X. … that for Grid, Geo, … results in fewer API …

Optimization

[Figure 6: running time (in sec) against expansion size (#edges), both on logarithmic axes, for BFS-Tree, Random-ST, PCST-Tree, and Smart-ST]
Figure 6. Running times of the different algorithms as a function of expansion size (number of edges). We can see that in comparison to PCST-Tree, Smart-ST scales to inputs that are up to two orders of magnitude larger.
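
Smart-ST, the variant that Figure 6 reports as scaling furthest, builds a minimum-weight spanning tree under the 'smart' weights w(u,v) = 2 − 1{u is special} − 1{v is special} given earlier in the talk. The sketch below shows only that tree-building step, using Kruskal's algorithm with a small union-find; the function name and the edge-list input are assumptions for illustration, and the authors may well use a different spanning-tree algorithm.

```python
def smart_spanning_tree(nodes, edges, special):
    """Minimum-weight spanning forest under the 'smart' weights (Kruskal)."""
    special = set(special)

    def weight(u, v):
        # 0 if both endpoints are special, 1 if exactly one is, 2 if neither is
        return 2 - (u in special) - (v in special)

    parent = {u: u for u in nodes}      # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    tree = []
    for u, v in sorted(edges, key=lambda e: weight(*e)):
        ru, rv = find(u), find(v)
        if ru != rv:                    # the edge joins two different components
            parent[ru] = rv
            tree.append((u, v))
    return tree


nodes = ["a", "b", "c", "d"]
edges = [("a", "c"), ("c", "b"), ("a", "b"), ("b", "d")]
print(smart_spanning_tree(nodes, edges, special={"a", "b"}))
```

Edges between special nodes are cheapest, so the resulting tree tends to keep special nodes adjacent, which is what the tree algorithm needs in order to find a high-discrepancy subtree.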

Optimization

Table III. Accuracy, averages of 20 runs.

                         ObliviousExpansion                               AdaptiveExpansion
dataset        s    k    BFS-Tree    Random-ST   PCST-Tree   Smart-ST     BFS-Tree    Random-ST   PCST-Tree   Smart-ST
Grid           20   2    0.88 (0)    0.81 (0)    0.93 (0)    0.93 (0)     0.88 (0)    0.85 (0)    0.93 (0)    0.93 (0)
Grid           60   1    1.00 (0)    0.94 (0)    1.00 (0)    1.00 (0)     0.99 (0)    0.98 (0)    1.00 (0)    1.00 (0)
Geo            20   2    1.00 (0)    0.95 (0)    1.00 (0)    1.00 (0)     1.00 (0)    0.98 (0)    1.00 (0)    1.00 (0)
Geo            60   1    1.00 (0)    0.96 (0)    1.00 (0)    1.00 (0)     0.99 (0)    0.98 (0)    0.99 (0)    0.99 (0)
BA             20   2    0.47 (12)   0.18 (12)   NaN (20)    0.46 (0)     0.46 (0)    0.44 (0)    0.46 (0)    0.45 (0)
BA             60   1    NaN (20)    NaN (20)    NaN (20)    0.77 (0)     0.76 (0)    0.76 (0)    0.77 (3)    0.76 (0)
Patents        20   2    0.92 (0)    0.86 (0)    0.91 (0)    0.90 (0)     0.72 (0)    0.74 (0)    0.77 (3)    0.74 (0)
Patents        60   1    0.89 (0)    0.76 (0)    0.89 (0)    0.89 (0)     0.74 (0)    0.73 (0)    0.74 (0)    0.74 (0)
Pokec          20   2    0.53 (2)    0.13 (3)    NaN (20)    0.46 (0)     0.43 (0)    0.41 (0)    0.42 (2)    0.40 (0)
Pokec          60   1    0.74 (6)    0.09 (6)    NaN (20)    0.61 (0)     0.48 (0)    0.46 (0)    0.45 (1)    0.45 (0)
Livejournal    20   2    0.62 (5)    0.19 (5)    NaN (20)    0.54 (0)     0.56 (0)    0.53 (0)    0.58 (5)    0.56 (0)
Livejournal    60   1    0.88 (12)   0.26 (9)    NaN (20)    0.68 (0)     0.65 (0)    0.62 (0)    0.62 (1)    0.62 (0)
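
The accuracy values are Jaccard coefficients, as stated on the Algorithms slide. The text does not say which two node sets are compared; the sketch below assumes it is the node set of the returned subgraph against the planted ground-truth node set, and the function name is hypothetical.

```python
def jaccard(found, truth):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two node collections."""
    found, truth = set(found), set(truth)
    union = found | truth
    if not union:
        return 1.0      # convention chosen here: two empty sets count as identical
    return len(found & truth) / len(union)


# Two of three nodes agree on each side: 2 shared nodes out of 4 distinct ones.
print(jaccard({1, 2, 3}, {2, 3, 4}))
```

A value of 1.00 in the table therefore means the two sets coincide exactly (up to rounding), and values near 0.5 mean that roughly as many nodes differ as agree.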

Table IV. Discrepancy, averages of 20 runs.

                         ObliviousExpansion                               AdaptiveExpansion
dataset        s    k    BFS-Tree    Random-ST   PCST-Tree   Smart-ST     BFS-Tree    Random-ST   PCST-Tree   Smart-ST
Grid           20   2    14.5 (0)    11.8 (0)    16.8 (0)    16.7 (0)     14.8 (0)    13.8 (0)    16.4 (0)    16.3 (0)
Grid           60   1    41.0 (0)    36.9 (0)    41.0 (0)    41.0 (0)     40.5 (0)    38.9 (0)    40.9 (0)    40.9 (0)
Geo            20   2    19.9 (0)    18.4 (0)    20.0 (0)    20.0 (0)     19.9 (0)    19.2 (0)    20.0 (0)    20.0 (0)
Geo            60   1    22.0 (0)    20.6 (0)    22.0 (0)    22.0 (0)     21.8 (0)    21.6 (0)    21.8 (0)    21.8 (0)
BA             20   2    15.0 (12)   2.8 (12)    NaN (20)    15.2 (0)     15.6 (0)    14.4 (0)    14.4 (0)    15.0 (0)
BA             60   1    NaN (20)    NaN (20)    NaN (20)    36.1 (0)     37.4 (0)    35.3 (0)    35.9 (3)    35.5 (0)
Patents        20   2    17.4 (0)    15.8 (0)    17.7 (0)    17.6 (0)     14.9 (0)    13.8 (0)    15.8 (3)    14.8 (0)
Patents        60   1    40.0 (0)    31.1 (0)    40.8 (0)    40.6 (0)     33.0 (0)    32.2 (0)    33.2 (0)    33.3 (0)
Pokec          20   2    11.6 (2)    2.6 (3)     NaN (20)    11.8 (0)     8.6 (0)     8.0 (0)     8.2 (2)     8.0 (0)
Pokec          60   1    36.6 (6)    4.7 (6)     NaN (20)    28.6 (0)     20.9 (0)    17.4 (0)    18.3 (1)    18.5 (0)
Livejournal    20   2    14.3 (5)    3.5 (5)     NaN (20)    13.8 (0)     11.8 (0)    9.8 (0)     10.8 (5)    10.2 (0)
Livejournal    60   1    45.6 (12)   12.0 (9)    NaN (20)    31.1 (0)     29.8 (0)    25.6 (0)    26.8 (1)    27.4 (0)
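
The discrepancy values in Table IV are scores of the form α × #special − #non-special, as defined at the start of the talk. A minimal helper for computing that score from a node set might look as follows; the function name and the set-based inputs are illustrative assumptions.

```python
def discrepancy(nodes, special, alpha=1.0):
    """Linear discrepancy: alpha * #special - #non-special over the given nodes."""
    nodes = set(nodes)
    n_special = len(nodes & set(special))
    return alpha * n_special - (len(nodes) - n_special)


# The slide's example for alpha = 1: 18 special and 3 non-special nodes.
subgraph = set(range(21))
special = set(range(18))
print(discrepancy(subgraph, special, alpha=1))
```

With α = 1 the score is bounded above by the number of special nodes in the subgraph.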

… ObliviousExpansion for denser graphs. We also note that once a subgraph has been discovered in the …

Future Work
•  Approximation Guarantee / Tighter expansions
•  Distributed Setting
•  Unknown Scale
•  Application on Real Data (TKDE Extension)

The End