Bump Hunting in the Dark
Local Discrepancy Maximization on Graphs
Aristides Gionis, Michael Mathioudakis, Antti Ukkonen
April 16th, 2015, Seoul, Korea
ICDE 2015

Find the Bump!
undirected graph: computer network, online social network, etc.
special nodes: failure reported / virus detected! / posted message about topic X
regular nodes
Here it is!

graph, special & non-special nodes
find connected subgraph with max linear discrepancy
score = α × #special - #non-special
for α = 1:
score = 0 - 1 = -1
score = 1 - 0 = 1
score = 18 - 3 = 15
fixed fraction of special nodes: subgraph size ↑, score ↑
fixed subgraph size: fraction of special nodes ↑, score ↑

local access
special nodes are provided as input
build graph via get-neighbors function
how much of the graph do we need? (infinite graph?)

Approach
first retrieve a minimal part of the graph
necessary when we have only local access
use get-neighbors function to expand around the special nodes
then solve problem on retrieved subgraph

In What Follows
• Unrestricted Access
  – Complexity
  – Connection to Steiner Trees
  – Special Case: Graph is Tree
  – Algorithms
• Local Access
  – Algorithms to retrieve (part of) the graph
• Experiments
• Future Work

Unrestricted Access
the problem is... NP-hard (reduction from SETCOVER)

Unrestricted Access
the problem is... a special case of PRIZECOLLECTINGSTEINERTREE (PCST)
input: graph, positive weight on terminal nodes, non-negative weight on edges
output: tree with min Σ(missed terminal node weights) + Σ(edge weights)
terminal nodes are the special nodes: weight = α+1
edges: weight = 1
max discrepancy is an instance of min PCST weight
Goemans-Williamson algorithm: O(|V||E|) & O(|V|² log|V|), (2 - 1/(n - 1))-approximation
min-approximation does not translate to our max-problem

Special Case
graph is a tree
optimal linear recursive algorithm
optimal in graph = optimal without root (in subtree) or optimal with root
optimal with root uses optimal with root of subtree

Algorithms
idea for general case: heuristics
generate spanning tree for graph
solve problem on spanning tree
• BFS-trees from each special node
• min weight spanning tree
  – random weights: w(u,v) in [0,1]
  – 'smart' weights: w(u,v) = 2 - 1{u is special} - 1{v is special}
• GW for PCST

Local Access
start with special nodes
call get-neighbors(node) to expand
retrieve entire graph?
expansion strategies
• full expansion: return a subgraph that contains the optimal solution
• oblivious expansion, adaptive expansion

Full Expansion
retrieve graph that contains optimal solution with calls to get-neighbors()
maintain frontier to expand in each iteration
initially:
1. frontier = special nodes
loop:
2. call get-neighbors() on frontier nodes
3. update frontier
4. go to 2 if frontier not empty
frontier condition: unexpanded nodes for which min distance from reachable special node is less than (α+1) × #reachable_special_nodes

Full Expansion
expensive in practice
we look for heuristics

Oblivious Expansion
α = 1
expand (α+1) times around each special node
not guaranteed to cover optimal solution
figure labels: special node, solution a, solution b, optimal, retrieved by ObliviousExpansion, additionally retrieved by Full Expansion

Adaptive Expansion
idea: while expanding, estimate heuristically how close we are to optimal
for each component, solve problem on spanning trees rooted at frontier nodes
obtain solutions with and w/o roots
maintain heuristic upper bound of optimal solution in entire graph
lower bound of optimal solution from retrieved graph
terminate when upper bound ≅ lower bound
figure labels: connected components of retrieved graph, tree of positive discrepancy, tree of negative discrepancy

Datasets
Table I. Dataset statistics (numbers are rounded)

            Dataset       |V|           |E|
synthetic   Geo           1 · 10^6      4 · 10^6
            BA            1 · 10^6      10 · 10^6
            Grid          4 · 10^6      8 · 10^6
real        Livejournal   4.3 · 10^6    69 · 10^6
            Patents       2 · 10^6      16.5 · 10^6
            Pokec         1.4 · 10^6    30.6 · 10^6

All graphs used in the experiments are undirected and their sizes are reported in Table I.

Input
graphs from previous datasets
special nodes: planted spheres
k spheres
radius ρ
s special nodes inside spheres
z special nodes outside spheres

Algorithms
expansion algorithms:
• Full (...not!)
• Oblivious
• Adaptive
expansion measures: cost (get-neighbors calls), graph size
optimization algorithms:
• BFS
• random-ST
• smart-ST
• PCST
optimization measures: running time vs graph size, accuracy (Jaccard coeff), discrepancy

Expansion
Table II. Expansion table (averages of 20 runs)

                      ObliviousExp.        AdaptiveExp.
dataset       s   k   cost     size        cost     size
Grid          20  2    302      888        2783     7950
Grid          60  1    261      784         534     1604
Geo           20  2    452     2578        4833    30883
Geo           60  1    418     2452         578     3991
BA            20  2   3943   243227         114     6032
BA            60  1   4477   271870         135     7407
Patents       20  2    605     3076       13436    25544
Patents       60  1    620     3126        5907    13009
Pokec         20  2   3884   217592         161     7249
Pokec         60  1   4343   240544         116     5146
Livejournal   20  2   3703   348933         234    13540
Livejournal   60  1   4667   394023         129     7087

In the interest of presentation, here we report what we consider to be representative results. Table II shows the cost (number of API calls) as well as the size (number of edges) of the retrieved graph G_X.

Optimization
[Figure 6: running time (in sec) vs expansion size (#edges), log-log axes, for BFS-Tree, Random-ST, PCST-Tree, Smart-ST]
Figure 6. Running times of the different algorithms as a function of expansion size (number of edges). We can see that, in comparison to PCST-Tree, Smart-ST scales to inputs that are up to two orders of magnitude larger.
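The accuracy measure listed among the evaluation criteria, the Jaccard coefficient, can be sketched as follows. This is a minimal illustration, assuming accuracy compares the node set of a returned subgraph with the node set of a planted ground-truth subgraph; the slides do not spell out which sets are compared, and the function name `jaccard` is hypothetical.

```python
def jaccard(found, truth):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two node collections.

    Assumption (not stated on the slides): `found` is the node set of the
    subgraph returned by an algorithm and `truth` is the planted node set.
    """
    found, truth = set(found), set(truth)
    union = found | truth
    if not union:
        # Two empty sets are treated as identical by convention.
        return 1.0
    return len(found & truth) / len(union)


# Usage: two of four distinct nodes are shared.
print(jaccard({1, 2, 3}, {2, 3, 4}))
```

A value of 1.00 then means the returned subgraph matches the planted one exactly, and lower values mean missing or extra nodes.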
Optimization
Table III. Accuracy, averages of 20 runs

                      ObliviousExpansion                              AdaptiveExpansion
dataset       s   k   BFS-Tree   Random-ST  PCST-Tree  Smart-ST      BFS-Tree   Random-ST  PCST-Tree  Smart-ST
Grid          20  2   0.88 (0)   0.81 (0)   0.93 (0)   0.93 (0)      0.88 (0)   0.85 (0)   0.93 (0)   0.93 (0)
Grid          60  1   1.00 (0)   0.94 (0)   1.00 (0)   1.00 (0)      0.99 (0)   0.98 (0)   1.00 (0)   1.00 (0)
Geo           20  2   1.00 (0)   0.95 (0)   1.00 (0)   1.00 (0)      1.00 (0)   0.98 (0)   1.00 (0)   1.00 (0)
Geo           60  1   1.00 (0)   0.96 (0)   1.00 (0)   1.00 (0)      0.99 (0)   0.98 (0)   0.99 (0)   0.99 (0)
BA            20  2   0.47 (12)  0.18 (12)  NaN (20)   0.46 (0)      0.46 (0)   0.44 (0)   0.46 (0)   0.45 (0)
BA            60  1   NaN (20)   NaN (20)   NaN (20)   0.77 (0)      0.76 (0)   0.76 (0)   0.77 (3)   0.76 (0)
Patents       20  2   0.92 (0)   0.86 (0)   0.91 (0)   0.90 (0)      0.72 (0)   0.74 (0)   0.77 (3)   0.74 (0)
Patents       60  1   0.89 (0)   0.76 (0)   0.89 (0)   0.89 (0)      0.74 (0)   0.73 (0)   0.74 (0)   0.74 (0)
Pokec         20  2   0.53 (2)   0.13 (3)   NaN (20)   0.46 (0)      0.43 (0)   0.41 (0)   0.42 (2)   0.40 (0)
Pokec         60  1   0.74 (6)   0.09 (6)   NaN (20)   0.61 (0)      0.48 (0)   0.46 (0)   0.45 (1)   0.45 (0)
Livejournal   20  2   0.62 (5)   0.19 (5)   NaN (20)   0.54 (0)      0.56 (0)   0.53 (0)   0.58 (5)   0.56 (0)
Livejournal   60  1   0.88 (12)  0.26 (9)   NaN (20)   0.68 (0)      0.65 (0)   0.62 (0)   0.62 (1)   0.62 (0)

Table IV. Discrepancy, averages of 20 runs

                      ObliviousExpansion                              AdaptiveExpansion
dataset       s   k   BFS-Tree   Random-ST  PCST-Tree  Smart-ST      BFS-Tree   Random-ST  PCST-Tree  Smart-ST
Grid          20  2   14.5 (0)   11.8 (0)   16.8 (0)   16.7 (0)      14.8 (0)   13.8 (0)   16.4 (0)   16.3 (0)
Grid          60  1   41.0 (0)   36.9 (0)   41.0 (0)   41.0 (0)      40.5 (0)   38.9 (0)   40.9 (0)   40.9 (0)
Geo           20  2   19.9 (0)   18.4 (0)   20.0 (0)   20.0 (0)      19.9 (0)   19.2 (0)   20.0 (0)   20.0 (0)
Geo           60  1   22.0 (0)   20.6 (0)   22.0 (0)   22.0 (0)      21.8 (0)   21.6 (0)   21.8 (0)   21.8 (0)
BA            20  2   15.0 (12)  2.8 (12)   NaN (20)   15.2 (0)      15.6 (0)   14.4 (0)   14.4 (0)   15.0 (0)
BA            60  1   NaN (20)   NaN (20)   NaN (20)   36.1 (0)      37.4 (0)   35.3 (0)   35.9 (3)   35.5 (0)
Patents       20  2   17.4 (0)   15.8 (0)   17.7 (0)   17.6 (0)      14.9 (0)   13.8 (0)   15.8 (3)   14.8 (0)
Patents       60  1   40.0 (0)   31.1 (0)   40.8 (0)   40.6 (0)      33.0 (0)   32.2 (0)   33.2 (0)   33.3 (0)
Pokec         20  2   11.6 (2)   2.6 (3)    NaN (20)   11.8 (0)      8.6 (0)    8.0 (0)    8.2 (2)    8.0 (0)
Pokec         60  1   36.6 (6)   4.7 (6)    NaN (20)   28.6 (0)      20.9 (0)   17.4 (0)   18.3 (1)   18.5 (0)
Livejournal   20  2   14.3 (5)   3.5 (5)    NaN (20)   13.8 (0)      11.8 (0)   9.8 (0)    10.8 (5)   10.2 (0)
Livejournal   60  1   45.6 (12)  12.0 (9)   NaN (20)   31.1 (0)      29.8 (0)   25.6 (0)   26.8 (1)   27.4 (0)

Future Work
Approximation Guarantee / Tighter expansions
Distributed Setting
Unknown Scale
Application on Real Data
(TKDE Extension)

The End
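Backup sketch for the "Special Case" slide: the linear recursive algorithm for trees can be illustrated as below. This is a hedged reconstruction from the slide text only, not the authors' code. The names `discrepancy` and `best_subtree` and the adjacency-dict input format are assumptions made for illustration. Each node stores the best score of a connected subtree that contains it as root; a child's subtree is attached only when its rooted score is positive, and the overall optimum is the best rooted score over all nodes.

```python
def discrepancy(nodes, special, alpha=1.0):
    """Linear discrepancy: alpha * #special - #non-special."""
    n_special = sum(1 for u in nodes if u in special)
    return alpha * n_special - (len(nodes) - n_special)


def best_subtree(adj, special, alpha=1.0):
    """Max-discrepancy connected subgraph of a tree.

    adj: dict mapping each node to a list of its neighbors (assumed to be
    a connected, non-empty tree). Returns (score, set_of_nodes).
    """
    root = next(iter(adj))
    parent = {root: None}
    order = []                      # preorder: parents before children
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)

    with_root = {}                  # best score of a subtree rooted at u
    kept = {u: [] for u in order}   # children attached in that best subtree
    for u in reversed(order):       # children are processed before parents
        score = alpha if u in special else -1.0
        for v in adj[u]:
            if parent.get(v) == u and with_root[v] > 0:
                score += with_root[v]
                kept[u].append(v)
        with_root[u] = score

    # Optimal in tree = best "optimal with root" over every possible root.
    top = max(order, key=with_root.__getitem__)
    nodes, stack = set(), [top]
    while stack:
        u = stack.pop()
        nodes.add(u)
        stack.extend(kept[u])
    return with_root[top], nodes


# Usage: a star whose center "c" is non-special, with three special leaves
# and one non-special leaf "n".
star = {"c": ["s1", "s2", "s3", "n"],
        "s1": ["c"], "s2": ["c"], "s3": ["c"], "n": ["c"]}
score, nodes = best_subtree(star, {"s1", "s2", "s3"}, alpha=1.0)
print(score, sorted(nodes))
```

Running the same routine on a spanning tree of a general graph gives the spanning-tree heuristics named on the "Algorithms" slide; how the spanning tree is chosen (BFS, random weights, 'smart' weights) is left outside this sketch.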