Mining Maximal Cliques from an Uncertain Graph
Arko Mukherjee, Pan Xu, Srikanta Tirthapura
Iowa State University
4/15/15

Uncertain Graph
• The presence of an edge between two nodes in a graph cannot be definitively ascertained
• Examples
  – Communication Networks: Ghosh, et al. On a routing problem within probabilistic graphs and its application to intermittently connected networks. IEEE Infocom, 2007
  – Social Networks: Guha et al. Propagation of trust and distrust. WWW, 2007
  – Protein-Protein Interaction Networks: Asthana, et al. Predicting protein complex membership using probabilistic network reliability. Genome Research, 2004
  – Regulatory networks in biological systems: Jiang, et al. Network motif identification in stochastic networks. Proceedings of the National Academy of Sciences, 2006

Uncertain Graph
An uncertain graph is a triple G = (V, E, p), where V is a set of vertices, E is a set of possible edges, and p: E → (0, 1] assigns a probability of existence p(e) to each edge e in E.
[Figure: example uncertain graph with edge probabilities 0.6, 0.5, 0.9, 0.5, 0.1, 0.6, 0.5]
We assume that events on different edges are mutually independent.

Uncertain Graph = Probability Distribution on Possible Graphs
[Figure: triangle on vertices A, B, C with probability 0.5 on each edge, shown next to its possible graphs, each occurring with probability 1/8]

Sampling from an Uncertain Graph
[Figure: one sample drawn from the possible graphs on A, B, C, each of which has probability 1/8]

Dense Substructure Enumeration
Enumerate densely connected (potentially overlapping) subgraphs from a graph
• Maximal Clique
• Maximal Biclique
• Quasi clique/biclique
• k-edge-connected subgraphs
• Densest Subgraph, Triangle Densest Subgraph, ...

Research Questions
• What is the notion of a dense structure in an uncertain graph?
• How many such structures can exist in an uncertain graph?
• Time-efficient algorithms to find such structures

Contributions
1. Definition of a Maximal Clique in an Uncertain graph, α-maximal-clique
2. Bound on the number of α-maximal-cliques in an uncertain graph on n vertices
3. Algorithm for enumerating all α-maximal-cliques in an uncertain graph, analysis of runtime
4. Experimental Evaluation

Roadmap
• Introduction
• Maximal Clique in an Uncertain Graph
• Number of Maximal Cliques
• Algorithm for Enumerating Maximal Cliques
• Experimental Results

Maximal Clique (Deterministic Graph)
• A clique in a graph G = (V, E) is a subset of vertices C such that the induced subgraph on C is a complete graph
• A clique C is called maximal if there does not exist another clique C' such that C ⊂ C'

Maximal Clique in a Graph
[Figure: example graph on vertices A to F with several cliques highlighted in turn; the last highlighted set is labeled "Not a maximal clique"]

Definition: Maximal Clique in an Uncertain Graph
• For a set of vertices C, the clique probability clq(C, G) is the probability that C is a clique in a graph sampled from G
• For parameter α, 0 < α ≤ 1, a set of vertices C is called an α-maximal-clique if
  – clq(C, G) ≥ α
  – There is no vertex set C' such that C ⊂ C' and clq(C', G) ≥ α
• If α = 1, this reduces to the deterministic maximal clique

Clique Probabilities
[Figure: uncertain graph on vertices B, C, E, F with edge probabilities 0.4, 1, 0.5, 0.5, 1, 0.9]
• clq({C, E, F}) = 0.45
• clq({C, B, E}) = 0.20

Maximal Clique Enumeration in an Uncertain Graph
Given an uncertain graph G = (V, E, p) and parameter 0 < α < 1, enumerate all α-maximal cliques in G

Enumerate α-maximal-cliques, α = 0.4
[Figure: the B, C, E, F graph with the 0.4-maximal-cliques highlighted]

Enumerate α-maximal-cliques, α = 0.2
[Figure: the B, C, E, F graph with the 0.2-maximal-cliques highlighted]

Number of Maximal Cliques
• Deterministic Graph: The maximum number of maximal cliques in a graph on n vertices is $3^{n/3}$ (Moon and Moser, On cliques in graphs, 1965)
• Our Result for an Uncertain Graph: For any 0 < α < 1, the maximum number of α-maximal cliques in an uncertain graph G is $\binom{n}{n/2} = \Theta(2^n / \sqrt{n})$
Note that this bound is independent of α

Lower Bound on Maximum Number of α-Maximal Cliques
• Consider the graph $K_n$ in which each edge has probability q
• Let $\kappa = \binom{n/2}{2}$ (κ = 3 in the pictured example on vertices A to F)
• Let q be such that $q^{\kappa} = \alpha$
• Each subgraph on n/2 vertices is an α-maximal-clique
• This graph has $\binom{n}{n/2}$ α-maximal cliques

Upper Bound on Maximum Number of α-Maximal Cliques
• Non-redundant Collection of Sets: A collection of sets C is said to be non-redundant if for any pair S1, S2 ∈ C such that S1 ≠ S2, we have
  – S1 not a subset of S2, and
  – S2 not a subset of S1
• The set of uncertain maximal cliques is a non-redundant collection
• The largest number of α-maximal cliques in G is upper bounded by the size of the largest non-redundant collection of subsets of V
• We show this to be $\binom{n}{n/2}$

Prior Work on Clique Enumeration
• Bron and Kerbosch (1973): Branch-and-bound approach to enumerating maximal cliques in a deterministic graph
• Tomita et al. (2006): Branch-and-bound with pivoting, worst-case time optimal, followed up by Eppstein, Löffler and Strash (2011)
• Tsukiyama et al. (1977): Output-sensitive runtime, follow-up work by Makino and Uno (2004)
• Uncertain Graphs: Zou et al. (2010)
  – Different definition of maximal clique
  – Focuses on top-k maximal clique

Algorithm for Maximal Uncertain cLique Enumeration (MULE)
• Branch-and-bound based algorithm
• Worst-case runtime complexity of $O(n \, 2^n)$
• Near-optimal since the output size can be $\Theta(\sqrt{n} \, 2^n)$
  – the number of structures is itself $\Theta(2^n / \sqrt{n})$
  – each maximal clique has size Θ(n)

MULE Ideas (1)
• Consider vertices in some total order
• Depth First Search
  – Start with an α-clique C, initially empty
  – Grow C by adding one vertex at a time, while retaining an α-clique
  – At the point where adding any further vertex would make it no longer an α-clique, we have an α-maximal-clique
  – Output the clique, backtrack and explore

MULE Ideas (2)
• Restrict addition to only those vertices v such that
  – v is already connected to every vertex in the current working set C
  – Adding v to C keeps the clique probability large enough
• Incrementally maintain changes in clique probabilities in O(1) time per vertex addition
• Also leads to a maximality check in O(n) time (note that, given a clique, checking whether it is α-maximal is an $O(n^2)$ operation in the worst case)

Input Graphs
• Real-World Uncertain Graphs
  – PPI: Protein-Protein Interaction Network (3751 vertices, 3692 edges)
  – DBLP Social Network (684911 vertices, 2284991 edges)
• Semi-Synthetic Uncertain Graphs
  – P2P communication networks (5k-10k vertices, 20k-40k edges)
  – Arxiv collaboration networks (5k vertices, 30k edges)
  – Wiki-vote (7k vertices, 100k edges)
• Synthetic Graphs

Runtime, α = 0.8
[Figure: runtime in seconds (log scale, 1 to 100) of DFS-NOIP and MULE on the input graphs wiki-vote, BA5000, ca-GrQc, PPI]

Number of Maximal Cliques: Semi-Synthetic and Real-World Graphs
[Figure: number of α-maximal cliques versus probability threshold α (0.0001 to 1, log scale) for PPI, ca-GrQc, p2p-Gnutella04, p2p-Gnutella08, p2p-Gnutella09, wiki-vote]

Number of Maximal Cliques: Random Graphs
[Figure: number of α-maximal cliques versus probability threshold α (0.0001 to 1, log scale) for BA5000, BA6000, BA7000, BA8000, BA9000, BA10000]

Runtime versus threshold α
[Figure: algorithm runtime in seconds versus probability threshold α (0.0001 to 1, log scale) for PPI, ca-GrQc, p2p-Gnutella04, p2p-Gnutella08, p2p-Gnutella09, wiki-vote]

Runtime vs # Maximal Cliques: Random Graphs
[Figure: algorithm runtime in seconds versus output size (number of maximal cliques, 50000 to 90000); curves labeled 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]

Runtime vs Size Threshold (DBLP Graph)
[Figure: algorithm runtime in seconds versus size threshold (2 to 9); curves labeled 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

# Max Cliques vs Size Threshold (ca-GrQc)
[Figure: number of maximal cliques (log scale, 10 to 100000) versus size threshold (2 to 8); curves labeled 0.2, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]

Related Work
• Uncertain Graphs
  – Reachability, Shortest Paths
  – Nearest Neighbors
  – Frequent and Reliable Subgraphs
• Deterministic Graphs
  – Maximal Clique Enumeration
  – Other types of dense substructures

Conclusions
1. Clear definition of a Maximal Clique in an Uncertain graph, α-maximal-clique
2. Precise bound on the number of α-maximal-cliques in an uncertain graph on n vertices
3. Algorithm for enumerating all α-maximal-cliques in an uncertain graph, analysis of runtime
4. Experimental Evaluation

Open Questions
• Output-sensitive algorithm for maximal clique enumeration in an uncertain graph
• Other dense structures in an uncertain graph
• General models of a probabilistic graph where different edges are not independent
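
Worked example: clique probability and the α-maximal-clique definition
Because edge events are assumed independent, clq(C, G) is the product of the probabilities of all edges among the vertices of C. The sketch below is illustrative only and is not code from the talk. The function names `clq` and `is_alpha_maximal` and the dictionary representation of p are assumptions made for this example; it uses the triangle on A, B, C with probability 0.5 on every edge.

```python
from itertools import combinations
from math import prod

def clq(C, p):
    """Clique probability of vertex set C.

    p maps frozenset({u, v}) to the edge probability; a missing pair is
    treated as probability 0 (hypothetical representation for this sketch).
    Independence of edges makes the clique probability a plain product.
    """
    return prod(p.get(frozenset(pair), 0.0) for pair in combinations(C, 2))

def is_alpha_maximal(C, vertices, p, alpha):
    """Check the definition of an alpha-maximal-clique directly."""
    if clq(C, p) < alpha:
        return False
    # Adding vertices can only multiply in factors <= 1, so it is enough
    # to rule out every single-vertex extension of C.
    return all(clq(set(C) | {v}, p) < alpha for v in vertices if v not in C)

# Triangle from the "possible graphs" slides: every edge has probability 0.5.
V = {"A", "B", "C"}
p = {frozenset(e): 0.5 for e in [("A", "B"), ("B", "C"), ("A", "C")]}

print(clq(V, p))                                  # 0.125, i.e. 1/8
print(is_alpha_maximal({"A", "B"}, V, p, 0.5))    # True
print(is_alpha_maximal({"A", "B"}, V, p, 0.125))  # False: {A, B, C} also qualifies
```

With α = 0.5 each pair is an α-maximal-clique, while lowering α to 1/8 makes the whole triangle the only one.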
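
Worked example: depth-first enumeration in the spirit of the MULE ideas
The two "MULE Ideas" slides describe growing a clique one vertex at a time in a fixed order while maintaining clique probabilities incrementally. The following is a minimal sketch of that idea, not the authors' implementation. The function name, the dictionary representation of p, and the use of a Bron-Kerbosch-style "already explored" set for the maximality check are assumptions made for this example. Floating-point products are compared to α directly, which is exact for the values used here but may need a tolerance in general.

```python
def enumerate_alpha_maximal_cliques(vertices, p, alpha):
    """Enumerate all alpha-maximal-cliques by depth-first search.

    p maps frozenset({u, v}) to an edge probability in (0, 1];
    missing pairs are non-edges (hypothetical representation).
    """
    results = []

    def refine(pool, v, q_new):
        # pool maps u -> r, the product of p(u, w) over w in the current clique.
        # Keep u only if it is adjacent to v and the clique extended by u
        # would still have probability >= alpha; update r in O(1) per vertex.
        kept = {}
        for u, r in pool.items():
            e = p.get(frozenset((u, v)), 0.0)
            if e > 0.0 and q_new * r * e >= alpha:
                kept[u] = r * e
        return kept

    def extend(C, q, cand, excl):
        # q is the clique probability of C. cand holds vertices that can
        # still be added; excl holds vertices that could be added but were
        # already explored on an earlier branch.
        if not cand and not excl:
            results.append(frozenset(C))  # no vertex can extend C
            return
        for v in sorted(cand):            # fixed total order on vertices
            r = cand.pop(v)
            q_new = q * r
            extend(C + [v], q_new, refine(cand, v, q_new), refine(excl, v, q_new))
            excl[v] = r

    extend([], 1.0, {v: 1.0 for v in vertices}, {})
    return results

# Lower-bound construction for n = 4: K_4 with q = 0.5, so kappa = 1 and alpha = 0.5.
V4 = ["a", "b", "c", "d"]
p4 = {frozenset((u, v)): 0.5 for i, u in enumerate(V4) for v in V4[i + 1:]}
cliques = enumerate_alpha_maximal_cliques(V4, p4, 0.5)
print(len(cliques))  # 6, every pair of vertices
```

On the K_4 instance the sketch returns the six vertex pairs, matching the count of subsets of size n/2 from the lower-bound slide.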
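
Worked derivation: why the lower-bound construction works
The lower-bound slide asserts that every set of n/2 vertices in the $K_n$ construction is an α-maximal-clique. A short derivation, assuming n is even and 0 < q < 1 (so that 0 < α < 1), fills in the step. It uses only the definitions from the slides, with S denoting an arbitrary vertex subset.

```latex
\begin{align*}
\mathrm{clq}(S, G) &= q^{\binom{|S|}{2}}
  && \text{every pair is an edge with probability } q \\
\mathrm{clq}(S, G) &= q^{\kappa} = \alpha
  && \text{if } |S| = n/2,\ \text{since } \kappa = \tbinom{n/2}{2} \\
\mathrm{clq}(S, G) &= q^{\binom{n/2 + 1}{2}} = q^{\kappa + n/2} = \alpha\, q^{n/2} < \alpha
  && \text{if } |S| = n/2 + 1
\end{align*}
```

Every subset of size n/2 therefore meets the threshold while no set with one more vertex does, and since clique probability cannot increase as vertices are added, no larger superset does either. This gives $\binom{n}{n/2}$ α-maximal cliques.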