DAVA: Distributing Vaccines over Networks under Prior Information
Yao Zhang, B. Aditya Prakash
Department of Computer Science
Virginia Tech
SDM, Philadelphia, April 24, 2014
Motivation: Epidemiology
• Virus spreads over contact networks
• SIR model [Anderson+ 1991]
• Susceptible-Infectious-Recovered
• Weights pij: propagation prob. from i to j
• Recovery prob. δ for each node
• (models mumps-like infections)
Zhang and Prakash, SDM2014
Motivation: Social Media
• Meme/rumor spreads over friendship networks
• E.g.: Twitter following network
• Independent cascade (IC) model [Kempe+ KDD2003]
• Each node has only one chance to infect its neighbors
• Special case of SIR model
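The IC process described above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' code: the function name `simulate_ic` and the adjacency format (a dict mapping each node to `{neighbor: p_ij}`) are hypothetical choices made for the example.

```python
import random

def simulate_ic(adj, seeds, rng=None):
    """Run one independent-cascade simulation.

    adj   : dict node -> {neighbor: p_ij}   (assumed input format)
    seeds : iterable of initially infected nodes
    Returns the set of nodes infected at the end of the cascade.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    frontier = list(infected)
    while frontier:
        newly_infected = []
        for u in frontier:
            # u gets exactly one chance to infect each healthy neighbor
            for v, p in adj.get(u, {}).items():
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    newly_infected.append(v)
        # nodes in the old frontier never try again
        frontier = newly_infected
    return infected
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected number of infected nodes.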
Immunization
• The Centers for Disease Control (CDC) cares about containing epidemic diseases
• E.g.: ~400 million dollars used for vaccines for children in 2013
• Twitter tries to stop rumor spread
• E.g.: rumors of victims after the Boston Marathon bombings in 2013
How to choose the best nodes to vaccinate (remove)?
Immunization
Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
• Pick a random person, immunize one of its neighbors at random
• Netshield [Tong+ 2010]
• Minimize the epidemic threshold (the point at which the virus takes off)
Good baseline strategies
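As an illustration of the acquaintance strategy, the following sketch picks a random person and then immunizes one of that person's neighbors at random, repeating until k distinct nodes are chosen. The function name, the adjacency-list format, and the stopping rule for small graphs are assumptions for this example, not details from [Cohen+ 2003].

```python
import random

def acquaintance_immunization(adj, k, rng=None):
    """Choose up to k nodes by the acquaintance strategy.

    adj : dict node -> list of neighbors   (assumed input format)
    """
    rng = rng or random.Random()
    # only people with at least one neighbor can nominate someone
    people = sorted(v for v in adj if adj[v])
    nominable = {u for v in people for u in adj[v]}
    k = min(k, len(nominable))          # cannot pick more nodes than exist
    chosen = set()
    while len(chosen) < k:
        person = rng.choice(people)                  # random person
        chosen.add(rng.choice(list(adj[person])))    # one random neighbor
    return chosen
```

High-degree nodes are neighbors of many people, so they are nominated more often than under uniform sampling, without the strategy ever looking at the degrees.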
In reality
Pre-emptive immunization (choose nodes before the epidemic starts)
• Acquaintance strategy [Cohen+ 2003]
• Netshield [Tong+ 2010]
Typically the epidemic has already started!
• More realistic intervention
• Which nodes to vaccinate now?
• We call it Data-Aware Immunization (this paper)
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• Experiments
• Conclusion
Data-Aware Vaccination Problem
Problem: Given a set of infected nodes and a contact graph, how to distribute k vaccines (node removal) to minimize the expected number of infected nodes at the end of the epidemic?
[Figure: example contact graph on nodes A-F with pij = 1 for all edges. Which node should get 1 vaccine?]
• Remove A, save {A, D}
• Remove B, save {B}
• Remove C, save {C}
Best solution: remove A
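The one-vaccine example can be checked with a small reachability computation. The sketch below assumes pij = 1 on every edge, so a node is infected exactly when it is reachable from an infected node. The edge list is hypothetical: it was chosen only to reproduce the saved sets listed on the slide, since the figure's actual edges are not recoverable from the text, and E and F are assumed to be the infected nodes.

```python
from collections import deque

def infected_set(adj, sources, removed=frozenset()):
    """Nodes reached from the sources when every edge transmits (p_ij = 1)."""
    seen = {s for s in sources if s not in removed}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def saved_nodes(adj, sources, removed):
    """Nodes that are infected without vaccination but not after removing `removed`."""
    return infected_set(adj, sources) - infected_set(adj, sources, removed)

# Hypothetical undirected graph: E-A, A-D, E-B, F-B, F-C (E and F infected).
EXAMPLE_ADJ = {
    "A": ["E", "D"], "B": ["E", "F"], "C": ["F"],
    "D": ["A"], "E": ["A", "B"], "F": ["B", "C"],
}
EXAMPLE_INFECTED = {"E", "F"}

for candidate in "ABC":
    print(candidate, sorted(saved_nodes(EXAMPLE_ADJ, EXAMPLE_INFECTED, {candidate})))
```

On this assumed graph, vaccinating A also protects D, because every path from an infected node to D passes through A.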
Complexity of DAV
• NP-hard (see paper for details)
• Reduction from the Maximum K-Intersection Problem (MaxKI: maximizing the intersection of k subsets)
• MaxKI is NP-Complete [Vinterbo 2004]
• Approximation algorithm?
• Not submodular
• Actually, DAV is hard to approximate within an absolute error!
Outline
• Motivation
• Problem Definition
• Complexity
• Our Proposed Methods
• assume IC model and undirected graph
• Experiments
• Conclusion
1: Simplify - Merging infected nodes
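• Idea: merge all the infected nodes into a single ‘super infected’ node I
[Figure: the original graph and its equivalent merged graph. The infected nodes are replaced by the super node I; edge probabilities pA and pC carry over, and the two edges from infected nodes into B (probabilities pX and pY) are combined by a logical OR.]
pB = 1 - (1 - pX)(1 - pY)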
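The merging step can be sketched as follows. This is a minimal illustration, assuming an undirected graph given as a dict from node pairs to probabilities with each edge listed once; the function name `merge_infected` and the default label `"I"` for the super node are hypothetical.

```python
from collections import defaultdict

def merge_infected(edges, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    edges    : dict (u, v) -> p_uv, each undirected edge listed once (assumed format)
    infected : set of infected nodes
    Returns a symmetric adjacency dict: merged[u][v] = probability.
    """
    merged = defaultdict(dict)
    for (u, v), p in edges.items():
        u = super_node if u in infected else u
        v = super_node if v in infected else v
        if u == v:
            continue  # edge between two infected nodes disappears
        # logical OR of parallel edges: 1 - (1 - p_old)(1 - p_new)
        previous = merged[u].get(v, 0.0)
        combined = 1.0 - (1.0 - previous) * (1.0 - p)
        merged[u][v] = merged[v][u] = combined
    return dict(merged)
```

With two infected nodes X and Y that both touch B with probability 0.5, the merged edge I-B gets 1 - 0.5 * 0.5 = 0.75.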
2: DAVA-Tree Algorithm: Idea
• Select nodes with the largest “benefit”
• Saved nodes of S: the expected number of saved nodes after removing set S on graph G
• Benefit of adding an additional node j into S: the additional number of saved nodes when adding node j into S, i.e. (# of saved nodes after adding j into S) minus (# of saved nodes for S alone)
[Figure: tree rooted at the merged infected node with pij = 1 for all edges; nodes are labeled with benefit 4, 5, and 2.]
DAVA-Tree Alg.: Optimal on Trees
For any set S:
• Fact 1: the chosen nodes in the optimal set must be neighbors of infected node I
• Fact 2: the benefit of each such node is independent of the rest of the set S
DAVA-tree algorithm: select the top-k nodes from I’s neighbors with the max. benefit
• Linear time
[Figure: tree rooted at the merged infected node with pij = 1 for all edges; nodes are labeled with benefit 2, 4, and 5.]
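The selection rule can be sketched on a rooted tree. The sketch assumes the benefit of a neighbor v of I is the expected number of infections in v's subtree, computed as p(I,v) times (1 plus the probability-weighted expected sizes of the child subtrees); with pij = 1 this is simply the subtree size. The function name and the input format are hypothetical, and the recursive helper is written for clarity rather than for very deep trees.

```python
import heapq

def dava_tree(children, prob, root="I", k=1):
    """Pick the k neighbors of the root with the largest benefit.

    children : dict node -> list of child nodes in the rooted tree (assumed format)
    prob     : dict (parent, child) -> propagation probability
    Returns (chosen nodes in decreasing benefit, dict of benefits).
    """
    def expected_subtree(v):
        # expected number of infected nodes in v's subtree, given v is infected
        return 1.0 + sum(prob[(v, c)] * expected_subtree(c)
                         for c in children.get(v, ()))

    benefit = {v: prob[(root, v)] * expected_subtree(v)
               for v in children.get(root, ())}
    chosen = heapq.nlargest(k, benefit, key=benefit.get)
    return chosen, benefit
```

Because each neighbor's benefit depends only on its own subtree (Fact 2), the benefits are computed once and the top k are taken without re-evaluation.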
3: General Case – Arbitrary Graphs
• Idea
• We have the optimal algorithm for a tree
• Extract a spanning tree, then run DAVA-tree
• What kind of tree?
• Minimum spanning tree
[Figure: the optimal solution on the graph compared with the solution that is optimal on the MST, found by DAVA-tree (pij = 1 for all edges).]
• We propose to use the dominator tree (a concept from software engineering)
• u dominates v: every path from I to v contains u
[Figure: example graph with pij = 1 for all edges, in which node 4 dominates nodes 8, 9, 10, 11.]
Dominator Tree
• u is the immediate dominator of v: u dominates v AND every other dominator of v dominates u
• Dominator tree: add an edge between every such u and v
• Linear time [Buchsbaum, Tarjan 1998]
[Figure: merged graph and its dominator tree with pij = 1 for all edges, marking the optimal solution on the merged graph and the optimal solution from DAVA-tree on the dominator tree.]
• Fact 1: the optimal solution should be among the children of root I in the dominator tree for any arbitrary graph
• Fact 2: (for special case, k = 1, p = 1) running DAVA-tree on the dominator tree gives the optimal solution
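The two definitions above translate directly into a small computation. The sketch below is not the linear-time algorithm cited on the slide; it swaps in the simple iterative set-intersection method, which is easier to read but slower. It assumes an undirected graph as an adjacency dict, and the function name is hypothetical.

```python
def immediate_dominators(adj, root):
    """Return {v: immediate dominator of v} for every v reachable from root.

    adj : dict node -> iterable of neighbors (undirected graph, assumed format)
    """
    # nodes reachable from the root
    reach, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in reach:
                reach.add(v)
                stack.append(v)

    # dom[v] = set of nodes lying on every path from root to v
    dom = {v: set(reach) for v in reach}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in reach - {root}:
            # in an undirected graph, every neighbor is a predecessor
            new = {v} | set.intersection(*(dom[u] for u in adj[v]))
            if new != dom[v]:
                dom[v] = new
                changed = True

    # the dominators of v form a chain; the immediate dominator is the
    # strict dominator that itself has the most dominators
    return {v: max(dom[v] - {v}, key=lambda u: len(dom[u]))
            for v in reach - {root}}
```

The edges (idom[v], v) form the dominator tree rooted at I.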
Weighting the dominator tree
• Computing the exact weights is #P-complete
• Our solution: maximum propagation path probability between nodes I and v (using Dijkstra’s algorithm)
[Figure: merged graph with edge probabilities p1, p3, p6 and its dominator tree with edge weights w1, w3, w6.]
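The maximum propagation path probability can be computed with Dijkstra's algorithm by using -log(p) as the edge length, since maximizing a product of probabilities is the same as minimizing the sum of their negative logarithms. The sketch below shows only this step; how the resulting probabilities are turned into dominator-tree weights is described in the paper and is not reproduced here. The function name and input format are assumptions.

```python
import heapq
import math

def max_path_prob(adj, source):
    """Maximum path probability from source to every reachable node.

    adj : dict node -> {neighbor: p_ij} with 0 <= p_ij <= 1 (assumed format)
    """
    dist = {source: 0.0}          # dist[v] = min over paths of sum(-log p)
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, p in adj.get(u, {}).items():
            if p <= 0.0:
                continue          # edge never transmits
            candidate = d - math.log(p)
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return {v: math.exp(-d) for v, d in dist.items()}
```

Node labels must be mutually comparable (for example, all strings), because the heap compares them when two distances tie.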
DAVA algorithm
Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget = 1
3. Add v to S and remove v from G
4. Go to Step 1 until |S| = k
[Figure: example on the merged graph (pij = 1 for all edges) with |S| = 2. Iteration 1 selects a node on the dominator tree; the selected node is removed, the dominator tree is rebuilt, and iteration 2 selects the second node.]
• Running time: O(k(|E| + |V| log |V|))
• Too slow for large networks!
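The iterative loop can be illustrated for the special case pij = 1. In that case the benefit of a node v equals the number of nodes it dominates, which is the same as the number of nodes cut off from I when v is removed. The sketch below uses that reachability count as a stand-in for building the dominator tree and running DAVA-tree, so it shows the select-remove-repeat structure rather than the authors' implementation; the function names and input format are hypothetical.

```python
def reachable(adj, root, removed):
    """Nodes reachable from root after deleting the nodes in `removed`."""
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return seen

def dava_p1(adj, root, k):
    """Greedy DAVA-style loop for p_ij = 1: pick one node, remove it, repeat.

    adj : dict node -> list of neighbors in the merged graph (assumed format)
    Returns the selected nodes in the order they were picked.
    """
    selected = []
    for _ in range(k):
        removed = set(selected)
        base = reachable(adj, root, removed)
        candidates = sorted(base - {root})
        if not candidates:
            break                      # nothing left to save
        # benefit of v = number of nodes cut off from the root by removing v
        best = max(candidates,
                   key=lambda v: len(base) - len(reachable(adj, root, removed | {v})))
        selected.append(best)
    return selected
```

Recomputing the benefits after every removal is what makes the loop cost a factor of k more than a single pass.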
DAVA-fast: a faster algorithm
Steps:
1. T = Build a dominator tree
2. S = Run DAVA-tree on T with budget = k
• In practice, the performance of DAVA-fast is very close to DAVA
• Time complexity: subquadratic!
– DAVA-fast: O(|V| log |V| + |E|)
[Figure: merged graph and its dominator tree, with |S| = 2 nodes selected in a single pass.]
Extending to SIR model
• See the paper
Experiments
• Virus Propagation Model
• IC and SIR
• Settings (see more settings in the paper)
• Initial infected nodes chosen uniformly at random
• Baseline Algorithms
• RANDOM: choose healthy nodes uniformly at random
• DEGREE: choose nodes with top weighted degrees
• PAGERANK: choose nodes with top PageRank scores
• NETSHIELD: state-of-the-art pre-emptive immunization algorithm to minimize the epidemic threshold of the graph [Tong+ ICDM 2010]
• Assumes no data is given before the epidemic starts
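For reference, the DEGREE baseline can be sketched in a few lines. The sketch assumes that the weighted degree of a node is the sum of the propagation probabilities on its incident edges and that only healthy nodes are eligible; both are assumptions for illustration, as is the function name.

```python
def degree_baseline(adj, infected, k):
    """Return the k healthy nodes with the largest weighted degree.

    adj      : dict node -> {neighbor: p_ij}   (assumed format)
    infected : set of already infected nodes, which are skipped
    """
    healthy = [v for v in adj if v not in infected]
    # weighted degree = sum of probabilities on incident edges
    healthy.sort(key=lambda v: sum(adj[v].values()), reverse=True)
    return healthy[:k]
```

Unlike DAVA, this ranking ignores where the infected nodes are, apart from excluding them.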
Experiments: datasets
Datasets are chosen from different domains
• Social media (IC model)
• OREGON: AS router graph
• STANFORD: hyperlink network
• GNUTELLA: peer-to-peer network
• BRIGHTKITE: friendship network
• Epidemiology (SIR model)
• PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

Dataset      |V|           |E|
OREGON       633           2,172
STANFORD     8,929         53,829
GNUTELLA     10,876        39,994
BRIGHTKITE   58,228        214,078
PORTLAND     0.5 million   1.6 million
MIAMI        0.6 million   2.1 million
Experiments: Quality
[Figure: quality of the algorithms on GNUTELLA (IC model) and PORTLAND (SIR model); higher is better.]
DAVA consistently outperforms the baseline algorithms.
Further, DAVA-fast performs almost as well as DAVA.
(See more results in the paper)
Experiments: Scalability
[Figure: running time (sec.); lower is better. Some runs did not finish within 10 hours.]
Conclusion
Data-Aware Vaccination problem
Given: Graph and Infected nodes
Find: ‘best’ nodes for immunization
• Complexity
• NP-hard
• Hard to approximate within an absolute error
• DAVA-tree
• Optimal solution on the tree
• DAVA and DAVA-fast
• Merging infected nodes
• Build a dominator tree, and run DAVA-tree
• Running time: subquadratic
• DAVA: O(k(|E| + |V| log |V|))
• DAVA-fast: O(|E| + |V| log |V|)
[Figure: graph with infected nodes, the merged graph, and the dominator tree.]
Any Questions?
Code at: http://people.cs.vt.edu/~yaozhang
Yao Zhang
B. Aditya Prakash
Thanks for the support of NSF (Grant No. IIS1353346).