
Mining Maximal Cliques from an Uncertain Graph
Arko Mukherjee, Pan Xu, Srikanta Tirthapura
4/15/15 Iowa State University

Uncertain Graph
• The presence of an edge between two nodes in a graph cannot be definitively ascertained
[Figure: a graph whose edges are each marked "?"]
• Examples
– Communication Networks: Ghosh, et al. On a routing problem within probabilistic graphs and its application to intermittently connected networks. IEEE Infocom, 2007
– Social Networks: Guha et al. Propagation of trust and distrust. WWW, 2007
– Protein-Protein interaction Networks: Asthana, et al. Predicting protein complex membership using probabilistic network reliability. Genome Research, 2004
– Regulatory networks in biological systems: Jiang, et al. Network motif identification in stochastic networks. Proceedings of the National Academy of Sciences, 2006

Uncertain Graph
An uncertain graph is a triple G = (V, E, p), where V is a set of vertices, E is a set of possible edges, and p: E → (0,1] assigns a probability of existence p(e) to each edge e in E.
[Figure: an example uncertain graph with edge probabilities 0.6, 0.5, 0.9, 0.5, 0.1, 0.6 and 0.5]
We assume that events on different edges are mutually independent.

Uncertain Graph = Probability Distribution on Possible Graphs
[Figure: a triangle on A, B, C with every edge probability 0.5, shown next to its eight possible graphs, each with probability 1/8]
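The distribution on possible graphs can be made concrete with a short Python sketch. It assumes the triangle example from the slide (three edges, each with probability 0.5) and represents an uncertain graph as a dictionary from edge to probability; `possible_worlds` and `sample_world` are illustrative names, not part of the original work. Exact `Fraction` arithmetic is used so that each of the eight possible graphs comes out at exactly 1/8.

```python
import random
from fractions import Fraction
from itertools import product

# Hypothetical example: a triangle on A, B, C where every edge has probability 1/2.
EDGES = {("A", "B"): Fraction(1, 2), ("B", "C"): Fraction(1, 2), ("A", "C"): Fraction(1, 2)}


def possible_worlds(edges):
    """Yield (present_edges, probability) for every subset of the possible edges.

    Edges are assumed independent, so a world's probability is the product of
    p(e) over kept edges and 1 - p(e) over dropped edges.
    """
    items = list(edges.items())
    for keep in product((True, False), repeat=len(items)):
        prob = Fraction(1)
        present = []
        for (edge, p), kept in zip(items, keep):
            prob *= p if kept else 1 - p
            if kept:
                present.append(edge)
        yield frozenset(present), prob


def sample_world(edges, rng):
    """Draw one possible graph by keeping each edge independently with probability p(e)."""
    return frozenset(edge for edge, p in edges.items() if rng.random() < p)


if __name__ == "__main__":
    worlds = list(possible_worlds(EDGES))
    print(len(worlds))                      # 8
    print(sorted({p for _, p in worlds}))   # [Fraction(1, 8)]
    print(sum(p for _, p in worlds))        # 1
    print(sample_world(EDGES, random.Random(0)))
```

Listing all possible graphs takes 2^|E| steps and is only a check for tiny examples; drawing a single sample, the operation on the sampling slide, takes one pass over E.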
Sampling from an Uncertain Graph
[Figure: drawing a sample from the uncertain graph on A, B, C yields one of its possible graphs, each with probability 1/8]

Dense Substructure Enumeration
Enumerate densely connected (potentially overlapping) subgraphs from a graph
• Maximal Clique
• Maximal Biclique
• Quasi clique/biclique
• k-edge-connected subgraphs
• Densest Subgraph, Triangle Densest Subgraph, ...

Research Questions
• What is the notion of a dense structure in an uncertain graph?
• How many such structures can exist in an uncertain graph?
• Time-efficient algorithms to find such structures

Contributions
1. Definition of a Maximal Clique in an Uncertain graph, α-maximal-clique
2. Bound on the number of α-maximal-cliques in an uncertain graph on n vertices
3. Algorithm for enumerating all α-maximal-cliques in an uncertain graph, analysis of runtime
4. Experimental Evaluation

Roadmap
• Introduction
• Maximal Clique in an Uncertain Graph
• Number of Maximal Cliques
• Algorithm for Enumerating Maximal Cliques
• Experimental Results

Maximal Clique (Deterministic Graph)
• A clique in a graph G = (V,E) is a subset of vertices C such that the induced subgraph on C is a complete graph
• A clique C is called maximal if there does not exist another clique C′ such that C is a proper subset of C′

Maximal Clique in a Graph
[Figure: a graph on vertices A to F; successive frames highlight cliques, and the last frame is marked "Not a maximal clique"]

Definition: Maximal Clique in an Uncertain Graph
• For a set of vertices C, the clique probability clq(C, G) is the probability that C is a clique in a graph sampled from G
• For a parameter α, 0 < α ≤ 1, a set of vertices C is called an α-maximal-clique if
– clq(C, G) ≥ α, and
– there is no vertex set C′ such that C is a proper subset of C′ and clq(C′, G) ≥ α
• If α = 1, this reduces to the deterministic maximal clique

Clique Probabilities
[Figure: an uncertain graph on vertices B, C, E, F with edge probabilities 0.4, 1, 0.5, 0.5, 1 and 0.9]
• Because edges are independent, clq(C, G) is the product of p(e) over all pairs of vertices in C (and 0 if some pair is not in E)
• clq({C,E,F}) = 0.45
• clq({C,B,E}) = 0.20

Maximal Clique Enumeration in an Uncertain Graph
Given an uncertain graph G = (V,E,p) and a parameter 0 < α < 1, enumerate all α-maximal cliques in G.

Enumerate α-maximal-cliques, α = 0.4
[Figure: the example graph on B, C, E, F with its 0.4-maximal cliques highlighted]

Enumerate α-maximal-cliques, α = 0.2
[Figure: the example graph on B, C, E, F with its 0.2-maximal cliques highlighted]
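To make the definition concrete, here is a brute-force Python sketch that computes clq(C, G) as a product of edge probabilities and tests α-maximality directly. The example graph's exact edge labels are not recoverable from the slide text, so the dictionary `P` is an assumed assignment of the labels 0.4, 1, 0.5, 0.5, 1 and 0.9 that reproduces clq({C,E,F}) = 0.45 and clq({C,B,E}) = 0.20; the function names are hypothetical. The search visits all 2^n subsets, so it is a reference check for tiny graphs, not the enumeration algorithm presented later.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical edge probabilities for the example graph on B, C, E, F.
# The slides fix clq({C,E,F}) = 0.45 and clq({C,B,E}) = 0.20; this assignment
# of the labels 0.4, 1, 0.5, 0.5, 1, 0.9 is one choice consistent with both.
P = {
    frozenset("BC"): Fraction("0.4"), frozenset("BE"): Fraction(1),
    frozenset("BF"): Fraction("0.5"), frozenset("CE"): Fraction("0.5"),
    frozenset("CF"): Fraction(1), frozenset("EF"): Fraction("0.9"),
}
VERTICES = "BCEF"


def clq(C, p):
    """Probability that C is a clique: product of p(e) over all pairs in C.

    A pair missing from p is not a possible edge, so it contributes 0.
    """
    prob = Fraction(1)
    for u, v in combinations(sorted(C), 2):
        prob *= p.get(frozenset((u, v)), 0)
    return prob


def is_alpha_maximal(C, vertices, p, alpha):
    """clq(C) >= alpha and no superset of C reaches alpha.

    Adding vertices only multiplies in more factors <= 1, so clq never grows;
    checking the one-vertex extensions therefore covers every superset.
    """
    if clq(C, p) < alpha:
        return False
    return all(clq(set(C) | {v}, p) < alpha for v in vertices if v not in C)


def alpha_maximal_cliques(vertices, p, alpha):
    """Brute force over all nonempty vertex subsets; exponential, tiny graphs only."""
    return [set(C)
            for k in range(1, len(vertices) + 1)
            for C in combinations(vertices, k)
            if is_alpha_maximal(C, vertices, p, alpha)]


if __name__ == "__main__":
    print(clq({"C", "E", "F"}, P))  # 9/20
    print(clq({"C", "B", "E"}, P))  # 1/5
    for alpha in (Fraction("0.4"), Fraction("0.2")):
        print(alpha, sorted(map(sorted, alpha_maximal_cliques(VERTICES, P, alpha))))
    # 2/5 [['B', 'C'], ['B', 'E', 'F'], ['C', 'E', 'F']]
    # 1/5 [['B', 'C', 'E'], ['B', 'C', 'F'], ['B', 'E', 'F'], ['C', 'E', 'F']]
```

Under this assumed assignment, the 0.4-maximal cliques are {B,C}, {B,E,F} and {C,E,F}; lowering α to 0.2 replaces them with the four triangles, since every triangle then clears the threshold while the whole vertex set (clq = 0.09) does not.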
Enumerate 0.2-maximal-cliques
[Figure: several copies of the example graph on B, C, E, F showing its 0.2-maximal cliques]

Roadmap
• Introduction
• Definition of a Maximal Clique in an Uncertain Graph
• Number of Maximal Cliques
• Algorithm for Enumerating Maximal Cliques
• Experimental Results

Number of Maximal Cliques
• Deterministic graph: the maximum number of maximal cliques in a graph on n vertices is $3^{n/3}$ (Moon and Moser, On cliques in graphs, 1965)
• Our result for an uncertain graph: for any 0 < α < 1, the maximum number of α-maximal cliques in an uncertain graph G on n vertices is $\binom{n}{\lfloor n/2 \rfloor} = \Theta(2^n/\sqrt{n})$
• Note that this bound is independent of α

Lower Bound on Maximum Number of α-Maximal Cliques
• Consider the complete graph $K_n$ in which each edge has probability q
[Figure: $K_6$ on vertices A to F, each edge labeled q]
• Let $\kappa = \binom{n/2}{2}$ (so κ = 3 in the figure, where n = 6), and let q be such that $q^\kappa = \alpha$
• Each subgraph on n/2 vertices is an α-maximal-clique: it contains κ edges, so its clique probability is exactly $q^\kappa = \alpha$, while adding any further vertex adds n/2 edges and lowers the probability to $\alpha q^{n/2} < \alpha$ (q < 1 because α < 1)
• This graph therefore has $\binom{n}{n/2}$ α-maximal cliques
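The lower-bound construction can be checked by brute force on small n, and the same sketch tests the non-redundancy property that the upper-bound argument on the following slide relies on. It assumes n is even, fixes q = 1/2 and sets α = q^κ (equivalent to choosing q with q^κ = α, but exact under `Fraction` arithmetic), and labels vertices 0..n-1; all function names are hypothetical. The upper bound itself, that no non-redundant collection of subsets of an n-element set is larger than C(n, ⌊n/2⌋), is Sperner's theorem; the exhaustive search here only confirms it for n ≤ 3, and `normalized_bound` shows the ratio to 2^n/√n settling near √(2/π) ≈ 0.798, which is the Θ(2^n/√n) growth.

```python
import math
from fractions import Fraction
from itertools import combinations


def clq(C, p):
    """Product of edge probabilities over all pairs in C (0 for a missing pair)."""
    prob = Fraction(1)
    for u, v in combinations(sorted(C), 2):
        prob *= p.get(frozenset((u, v)), 0)
    return prob


def lower_bound_instance(n, q):
    """K_n (n even) with every edge probability q, and alpha = q**kappa, kappa = C(n/2, 2)."""
    kappa = math.comb(n // 2, 2)
    p = {frozenset(e): q for e in combinations(range(n), 2)}
    return p, q ** kappa


def alpha_maximal_sets(n, p, alpha):
    """Brute force: every alpha-maximal clique over vertex labels 0..n-1."""
    found = []
    for k in range(n + 1):
        for C in combinations(range(n), k):
            if clq(C, p) >= alpha and all(
                clq(C + (v,), p) < alpha for v in range(n) if v not in C
            ):
                found.append(frozenset(C))
    return found


def is_non_redundant(collection):
    """True if no set in the collection is a subset of a different set in it."""
    sets = list({frozenset(s) for s in collection})  # ignore duplicate entries
    return all(not (a <= b or b <= a) for a, b in combinations(sets, 2))


def largest_non_redundant(n):
    """Exhaustive search over every family of subsets of range(n); tiny n only."""
    subsets = [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]
    best = 0
    for mask in range(1 << len(subsets)):
        family = [s for i, s in enumerate(subsets) if mask >> i & 1]
        if len(family) > best and is_non_redundant(family):
            best = len(family)
    return best


def normalized_bound(n):
    """C(n, floor(n/2)) / (2**n / sqrt(n)); tends to sqrt(2/pi), about 0.798."""
    return math.comb(n, n // 2) / 2 ** n * math.sqrt(n)


if __name__ == "__main__":
    for n in (4, 6, 8):
        p, alpha = lower_bound_instance(n, Fraction(1, 2))
        cliques = alpha_maximal_sets(n, p, alpha)
        print(n, alpha, len(cliques), math.comb(n, n // 2), is_non_redundant(cliques))
    # 4 1/2 6 6 True
    # 6 1/8 20 20 True
    # 8 1/64 70 70 True
    print([largest_non_redundant(n) for n in range(4)])  # [1, 1, 2, 3]
    for n in (10, 100, 1000):
        print(n, round(normalized_bound(n), 4))
```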
Upper Bound on Maximum Number of α-Maximal Cliques
• Non-redundant collection of sets: a collection of sets C is said to be non-redundant if for any pair S1, S2 ∈ C with S1 ≠ S2, we have
– S1 is not a subset of S2, and
– S2 is not a subset of S1
• The set of α-maximal cliques is a non-redundant collection, since an α-maximal clique cannot be a proper subset of another vertex set whose clique probability is at least α
• The largest number of α-maximal cliques in G is therefore upper bounded by the size of the largest non-redundant collection of subsets of V
• We show this to be $\binom{n}{\lfloor n/2 \rfloor}$
Roadmap
• Introduction
• Definition of a Maximal Clique in an Uncertain Graph
• Number of Maximal Cliques
• Algorithm for Enumerating Maximal Cliques
• Experimental Results

Prior Work on Clique Enumeration
• Bron and Kerbosch (1973): branch-and-bound approach to enumerating maximal cliques in a deterministic graph
• Tomita et al. (2006): branch-and-bound with pivoting, worst-case time optimal; followed up by Eppstein, Loffler and Strash (2011)
• Tsukiyama et al. (1977): output-sensitive runtime; follow-up work by Makino and Uno (2004)
• Uncertain graphs: Zou et al. (2010)
– Different definition of maximal clique
– Focuses on top-k maximal cliques

Algorithm for Maximal Uncertain cLique Enumeration (MULE)
• Branch-and-bound based algorithm
• Worst-case runtime complexity of $O(n 2^n)$
• Near-optimal, since the output size can be $\Theta(\sqrt{n}\, 2^n)$
– the number of structures is itself $\Theta(2^n/\sqrt{n})$
– each maximal clique has $\Theta(n)$ vertices

MULE Ideas (1)
• Consider vertices in some total order
• Depth-first search
– Start with an α-clique C (a vertex set with clq(C, G) ≥ α), initially empty
– Grow C by adding one vertex at a time, while retaining an α-clique
– When no further vertex can be added without C ceasing to be an α-clique, C is an α-maximal-clique
– Output the clique, backtrack and explore

MULE Ideas (2)
• Restrict addition to only those vertices v such that
– v is already connected to every vertex in the current working set C
– adding v to C keeps the clique probability at least α
• Incrementally maintain changes in clique probabilities in $O(1)$ time per vertex addition
• This also leads to a maximality check in $O(n)$ time (note that, given a clique, checking whether it is α-maximal is an $O(n^2)$ operation in the worst case)

Roadmap
• Introduction
• Definition of a Maximal Clique in an Uncertain Graph
• Number of Maximal Cliques
• Algorithm for Enumerating Maximal Cliques
• Experimental Results

Input Graphs
• Real-world uncertain graphs
– PPI: Protein-Protein Interaction Network (3751 vertices, 3692 edges)
– DBLP Social Network (684911, 2284991)
• Semi-synthetic uncertain graphs
– P2p communication networks (5k-10k, 20k-40k)
– Arxiv collaboration networks (5k, 30k)
– Wiki-vote (7k, 100k)
• Synthetic graphs
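The MULE Ideas slides above can be illustrated with a short Python sketch. It is a simplification, not the authors' implementation: the function name, the dictionary representation of p, and the use of exact `Fraction` arithmetic are choices made here, and `P` is the same hypothetical edge assignment consistent with the Clique Probabilities slide. Vertices are processed in sorted order; each candidate carries the ratio clq(C ∪ {v}) / clq(C), so adding a vertex updates every candidate's clique probability with a single multiplication, and C is reported as α-maximal exactly when no candidate, explored or not, can still extend it.

```python
from fractions import Fraction

# Hypothetical example graph on B, C, E, F (one edge assignment consistent with
# clq({C,E,F}) = 0.45 and clq({C,B,E}) = 0.20 from the slides).
P = {
    frozenset("BC"): Fraction("0.4"), frozenset("BE"): Fraction(1),
    frozenset("BF"): Fraction("0.5"), frozenset("CE"): Fraction("0.5"),
    frozenset("CF"): Fraction(1), frozenset("EF"): Fraction("0.9"),
}


def enumerate_alpha_maximal(vertices, p, alpha):
    """Depth-first enumeration of alpha-maximal cliques, following the MULE ideas.

    A simplified sketch, not the authors' implementation. p maps frozenset({u, v})
    to the edge probability; pairs missing from p are treated as non-edges.
    """
    out = []

    def prob(u, v):
        return p.get(frozenset((u, v)), 0)

    def extend(C, q, I, X):
        # C is the current alpha-clique and q = clq(C). Entries (v, r) of I and X
        # store r = clq(C + [v]) / clq(C), so clq(C + [v]) = q * r in O(1).
        # I: candidates after C's last vertex in the order; X: candidates that
        # were already explored on an earlier branch.
        if not I and not X:
            out.append(C)  # no vertex keeps C at or above alpha: C is alpha-maximal
            return
        X = list(X)  # local copy that grows as sibling branches finish
        for i, (u, r_u) in enumerate(I):
            q_new = q * r_u  # clq(C + [u])
            # Keep only vertices adjacent to u whose addition keeps clq >= alpha,
            # updating each stored ratio with the one new edge factor p(u, v).
            I_new = [(v, r * prob(u, v)) for v, r in I[i + 1:]
                     if q_new * r * prob(u, v) >= alpha]
            X_new = [(v, r * prob(u, v)) for v, r in X
                     if q_new * r * prob(u, v) >= alpha]
            extend(C + [u], q_new, I_new, X_new)
            X.append((u, r_u))  # u's branch is done; later branches treat it as explored

    order = sorted(vertices)  # some total order on the vertices
    extend([], Fraction(1), [(v, Fraction(1)) for v in order], [])
    return out


if __name__ == "__main__":
    print(enumerate_alpha_maximal("BCEF", P, Fraction("0.4")))
    # [['B', 'C'], ['B', 'E', 'F'], ['C', 'E', 'F']]
    print(enumerate_alpha_maximal("BCEF", P, Fraction("0.2")))
    # [['B', 'C', 'E'], ['B', 'C', 'F'], ['B', 'E', 'F'], ['C', 'E', 'F']]
```

The list X plays the role of the excluded set in the Bron and Kerbosch algorithm: a vertex explored on an earlier branch can still show that C is not maximal, but it is never added again, so each α-maximal clique is reported once. This sketch omits pivoting and any further pruning the actual MULE implementation may use.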
Runtime, α = 0.8
[Figure: runtime in seconds (log scale, 1 to 100) of DFS-NOIP and MULE on wiki-vote, BA5000, ca-GrQc and PPI]
Number of Maximal Cliques, Semi-Synthetic and Real-World Graphs
[Figure: number of α-maximal cliques (0 to 1.8e+06) versus probability threshold α (0.0001 to 1, log scale) for PPI, ca-GrQc, p2p-Gnutella04, p2p-Gnutella08, p2p-Gnutella09 and wiki-vote]
Number of Maximal Cliques, Random Graphs
[Figure: number of α-maximal cliques (up to 90000) versus probability threshold α (0.0001 to 1, log scale) for BA5000, BA6000, BA7000, BA8000, BA9000 and BA10000]
Runtime versus Threshold α
[Figure: algorithm runtime in seconds (up to 120) versus probability threshold α (0.0001 to 1, log scale) for PPI, ca-GrQc, p2p-Gnutella04, p2p-Gnutella08, p2p-Gnutella09 and wiki-vote]
Runtime vs Number of Maximal Cliques, Random Graphs
[Figure: algorithm runtime in seconds (6 to 20) versus output size (number of maximal cliques, 50000 to 90000), one curve per α in {0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001}]
Runtime vs Size Threshold (DBLP Graph)
[Figure: algorithm runtime in seconds (up to 20000) versus size threshold (2 to 9), one curve per α from 0.1 to 0.9]
# Max Cliques vs Size Threshold (ca-GrQc)
[Figure: number of maximal cliques (log scale, 10 to 100000) versus size threshold (2 to 8), one curve per α in {0.2, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001}]
Related Work
• Uncertain Graphs
– Reachability, Shortest Paths
– Nearest Neighbors
– Frequent and Reliable Subgraphs
• Deterministic Graphs
– Maximal Clique Enumeration
– Other types of dense substructures

Conclusions
1. Clear Definition of a Maximal Clique in an Uncertain graph, α-maximal-clique
2. Precise bound on the number of α-maximal-cliques in an uncertain graph on n vertices
3. Algorithm for enumerating all α-maximal-cliques in an uncertain graph, analysis of runtime
4. Experimental Evaluation

Open Questions
• Output-sensitive algorithm for maximal clique enumeration in an uncertain graph
• Other dense structures in an uncertain graph
• General models of a probabilistic graph where different edges are not independent