Lecture notes

Lecture 6
Solving SDPs using Multiplicative Weights∗
In this lecture we first propose an algorithm to solve semidefinite programs, and then apply it to the MAXCUT problem as an example. As we will see, the method requires an oracle with specific properties, so we show how to build such an oracle for the MAXCUT problem. Finally, we investigate the quality of the SDP relaxation for a more general class of discrete quadratic programs.
6.1 Semidefinite Programming
As we saw in lecture 04, the canonical form for an SDP is

$$\begin{aligned} \sup\ & B \bullet X \\ \text{s.t.}\ & A_i \bullet X \le c_i, \quad i = 1, \dots, m, \\ & X \succeq 0. \end{aligned}$$
Recall that we also added the assumption that $A_1 = I$ and $c_1 = R$, implying that $\mathrm{Tr}(X) \le R$ for any feasible $X$. The dual problem associated to this SDP is

$$\begin{aligned} \inf\ & c^T y \\ \text{s.t.}\ & \sum_{i=1}^m y_i A_i - B \succeq 0, \\ & y \ge 0. \end{aligned}$$
Now suppose that we have the following oracle, which is going to help us in our algorithm.

∗ Lecturer: Thomas Vidick. Scribe: Ehsan Abbasi.

Oracle: Given $X \succeq 0$ and $\delta > 0$, return either

(i) "$X$ is primal feasible with objective value greater than or equal to $(1-\delta)\alpha$", or

(ii) $y \in \mathbb{R}^m$ such that $y \ge 0$, $X \bullet (\sum_i y_i A_i - B) \ge 0$, $\|\sum_i y_i A_i - B\| \le \sigma$, and $c^T y \le \alpha$.
Recall that $\sigma$ is called the width of the oracle. Assuming such an oracle is given to us, we introduce the following algorithm.

Algorithm: Run the MMWA with $\varepsilon = \frac{\delta\alpha}{2\sigma R}$ and loss matrices $M^{(t)} = \frac{\sum_i y_i^{(t)} A_i - B + \sigma I}{2\sigma}$ (thus $0 \preceq M^{(t)} \preceq I$), starting from $X^{(0)} = \frac{1}{n} I$. At each step, update $X$ using the MMW update rule from the previous lecture.
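For reference, the MMW update rule from the previous lecture, $X^{(t+1)} \propto \exp(-\varepsilon \sum_{s \le t} M^{(s)})$, can be sketched in a few lines of NumPy (a minimal sketch; the function name is my own):

```python
import numpy as np

def mmw_update(loss_sum, eps):
    """One matrix-multiplicative-weights iterate: X proportional to
    exp(-eps * loss_sum), normalized to have trace 1.

    loss_sum is the (symmetric) running sum of the loss matrices M^(s)."""
    w, V = np.linalg.eigh(loss_sum)
    e = np.exp(-eps * (w - w.min()))  # shift exponents for numerical stability
    X = (V * e) @ V.T                 # V @ diag(e) @ V.T
    return X / np.trace(X)

# With no losses accumulated yet, the iterate is X^(0) = I/n.
n = 4
X0 = mmw_update(np.zeros((n, n)), eps=0.1)
```

Note that the constant shift of the eigenvalues cancels after the trace normalization, so it does not change the iterate.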
In this algorithm, if the oracle ever returns (i) we are done. Otherwise we have the following theorem.

Theorem 6.1. If the oracle does not fail (i.e. it always returns a vector $y$) for $T = \frac{8\sigma^2 R^2}{\delta^2\alpha^2}\log n$ iterations, then $\bar{y} = \frac{\delta\alpha}{R} e_1 + \frac{1}{T}\sum_t y^{(t)}$ is dual feasible with objective value at most $(1+\delta)\alpha$.
Proof. First let us check the guarantee on the objective value:

$$c^T \bar{y} = \frac{\delta\alpha}{R}\, c^T e_1 + \frac{1}{T}\sum_t c^T y^{(t)} \le \delta\alpha + \alpha = (1+\delta)\alpha,$$

where we used $c_1 = R$ and the guarantee $c^T y^{(t)} \le \alpha$ for any $y^{(t)}$ returned by the oracle.

Now we should check feasibility. From the oracle we know $\bar{y} \ge 0$. It remains to check that $\sum_i \bar{y}_i A_i - B \succeq 0$.
From the MMW theorem we know that, for any unit vector $v$, and in particular for the eigenvector of $\sum_t M^{(t)}$ associated with its smallest eigenvalue,

$$\frac{T}{2} \le \sum_{t=1}^T M^{(t)} \bullet X^{(t)} \overset{(1)}{\le} (1+\varepsilon)\sum_{t=1}^T v^T M^{(t)} v + \frac{\log n}{\varepsilon} \overset{(2)}{=} (1+\varepsilon)\,\lambda_n\Big(\sum_{t=1}^T \frac{\sum_i y_i^{(t)} A_i - B + \sigma I}{2\sigma}\Big) + \frac{\log n}{\varepsilon} \overset{(3)}{=} (1+\varepsilon)\Big(\frac{1}{2\sigma}\,\lambda_n\Big(\sum_i \Big(\sum_t y_i^{(t)}\Big) A_i - T B\Big) + \frac{T}{2}\Big) + \frac{\log n}{\varepsilon},$$

where $\lambda_n$ denotes the smallest eigenvalue. The first inequality holds because the oracle guarantees $X^{(t)} \bullet (\sum_i y_i^{(t)} A_i - B) \ge 0$ and $\mathrm{Tr}(X^{(t)}) = 1$, so each term $M^{(t)} \bullet X^{(t)}$ is at least $1/2$. (1) is the MMW theorem, (2) holds by our choice of $v$, and (3) uses properties of eigenvalues of PSD matrices, specifically $\lambda_i((A + bI)/c) = (\lambda_i(A) + b)/c$ for any $b$ and any $c > 0$. Rearranging terms,
$$-\frac{2\sigma}{1+\varepsilon}\Big(\frac{\varepsilon}{2} + \frac{\log n}{\varepsilon T}\Big) \le \lambda_n\Big(\sum_i \Big(\frac{1}{T}\sum_t y_i^{(t)}\Big) A_i - B\Big) \overset{(4)}{=} \lambda_n\Big(\sum_i \bar{y}_i A_i - B\Big) - \frac{\delta\alpha}{R}$$

$$\Rightarrow\quad -\frac{4\sigma \log n}{(1+\varepsilon)T\varepsilon} \le \lambda_n\Big(\sum_i \bar{y}_i A_i - B\Big) - \frac{\delta\alpha}{R},$$
where (4) is because of the way we defined $\bar{y}$ in the theorem (recall $A_1 = I$, so the $e_1$ term shifts every eigenvalue up by $\delta\alpha/R$), and the last implication uses that, for the chosen parameters, $\varepsilon/2 = \log n/(\varepsilon T)$. Given the choice of parameters made in the theorem, you can check that $\frac{\delta\alpha}{R} - \frac{4\sigma\log n}{(1+\varepsilon)T\varepsilon} > 0$, so the smallest eigenvalue of $\sum_i \bar{y}_i A_i - B$ is positive, meaning this is a PSD matrix and $\bar{y}$ is feasible, as required.
6.2 Application to the MAXCUT problem
In this part, we use the MMWA algorithm to solve the MAXCUT SDP introduced in lecture 04. We saw that for a given undirected graph $G = (V, E)$, assuming $G$ is $d$-regular (i.e. each vertex has degree exactly $d$), the size of the largest cut can be written as

$$\mathrm{MAXCUT}(G) = \frac{|E|}{2} + \sup_{x_i \in \{\pm 1\}} \Big(-\frac{1}{2}\sum_{(i,j)\in E} x_i x_j\Big) \le \frac{|E|}{2} + \sup_{\substack{u_i \in \mathbb{R}^{2n} \\ \|u_i\| = 1}} \Big(-\frac{1}{2}\sum_{i,j} A_{i,j}\, u_i \cdot u_j\Big),$$

where $A$ is the (symmetrized) adjacency matrix of the graph $G$, which has $1/2$ in every entry $(i, j)$ and $(j, i)$ associated to an edge $\{i, j\}$, and zeros elsewhere. This problem can be written in standard form in the following way:
$$\mathrm{MAXCUT}(G) = \sup\ B \bullet X \quad \text{s.t.}\quad E_i \bullet X \le 1,\; i = 1,\dots,n, \qquad X \succeq 0,$$

where $B = \frac{d}{4} I - \frac{A}{2}$ and $E_i$ is the matrix whose $i$th diagonal entry is one and all other entries are zero. Using that it has at most $d$ non-zero entries, each equal to $1/2$, in every row, the adjacency matrix satisfies $\|A\| \le d/2$; thus $\|B\| \le d/2$ and $B \succeq 0$.
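As a concrete sanity check, $B$ can be assembled directly from an edge list; here is a small sketch (the helper name is mine), with the sign convention that makes $B \bullet xx^T$ equal to the size of the cut induced by $x$, verified on the 4-cycle:

```python
import numpy as np

def maxcut_sdp_matrix(n, edges):
    """Return B = (d/4) I - A/2 for a d-regular graph on n vertices,
    where A is the symmetrized adjacency matrix with entries 1/2 per edge."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 0.5
    d = int(round(2 * len(edges) / n))  # degree of the d-regular graph
    return (d / 4) * np.eye(n) - A / 2

# Example: the 4-cycle (2-regular). For x in {±1}^4, x^T B x is the cut size.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
B = maxcut_sdp_matrix(4, edges)
x = np.array([1, -1, 1, -1])  # alternating signs: all 4 edges are cut
cut = x @ B @ x
```

On this example one can also check numerically that $B \succeq 0$ and $\|B\| \le d/2$.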
Observations: If $\alpha$ is the optimal value of the SDP, then $\frac{|E|}{2} = \frac{nd}{4} \le \alpha \le |E| = \frac{nd}{2}$. The first inequality follows since there is always a cut of size $|E|/2$ (a random cut cuts half the edges in expectation), and the second follows from the bound on the norm of $B$.
Now our goal is to design an oracle $\mathcal{O}$ that we can use in the algorithm we proposed for solving SDPs using the MMWA. In other words, given $X \succeq 0$ such that $\mathrm{Tr}(X) = n$ ($n$ plays the role of $R$ in our algorithm), find $y \ge 0$ such that $c^T y = \sum_i y_i \le \alpha$ (in this problem all of the entries of $c$ are ones) and $X \bullet (\sum_i y_i E_i - B) \ge 0$.
We design the oracle by distinguishing the following cases.

First Case: If $B \bullet X \le \alpha$, let $y_i = \frac{\alpha}{n} \ge 0$ for all $i$. Then $c^T y = \sum_i y_i = n \cdot \frac{\alpha}{n} = \alpha \le \alpha$. Besides,

$$X \bullet \Big(\sum_i y_i E_i - B\Big) = \sum_i y_i X_{ii} - X \bullet B = \frac{\alpha}{n}\,\mathrm{Tr}(X) - X \bullet B = \alpha - X \bullet B \ge 0.$$
Second Case: Suppose $B \bullet X = \lambda\alpha > \alpha$ (so $\lambda > 1$). We also have $\lambda \le 2$, because $B \bullet X \le \|B\|\,\mathrm{Tr}(X) = \frac{d}{2}\, n \le 2\alpha$. We already know that $B \bullet X > \alpha$, so if $X$ is feasible then we have case (i) for the oracle and we are done: we found a very good feasible solution. Otherwise define

$$S = \{i : X_{ii} > \lambda\}, \qquad K = \sum_{i \in S} X_{ii}.$$

$S$ is the set of indices whose constraint is violated by a large amount (since $\lambda > 1$), and $K$ is the sum of the violated diagonal entries of $X$. Now consider the following two sub-cases.
If $K > \frac{\delta\lambda n}{4}$, then let

$$y_i = \begin{cases} \frac{\lambda\alpha}{K} & \text{if } i \in S, \\ 0 & \text{if } i \notin S. \end{cases}$$

Then obviously $y \ge 0$ and

$$c^T y = \sum_i y_i = \frac{\lambda\alpha}{K}\,|S| \overset{(5)}{\le} \frac{\lambda\alpha}{K}\cdot\frac{K}{\lambda} = \alpha,$$

where (5) holds since $K \ge \lambda|S|$ from the way we defined $K$ and $S$. Besides,

$$X \bullet \Big(\sum_i y_i E_i - B\Big) = \frac{\lambda\alpha}{K}\sum_{i\in S} X_{ii} - X \bullet B = \frac{\lambda\alpha}{K}\, K - X \bullet B = \lambda\alpha - X \bullet B = 0.$$
Finally, in the other sub-case, when only a few constraints are violated ($K \le \frac{\delta\lambda n}{4}$), assume without loss of generality (permuting the rows and columns if necessary) that the first $|S|$ diagonal entries of $X$ correspond to the indices $i \in S$, so we can write

$$X = \begin{pmatrix} X_{S,S} & X_{S,\bar{S}} \\ X_{\bar{S},S} & X_{\bar{S},\bar{S}} \end{pmatrix}.$$

Now define a new matrix $\bar{X}$ as

$$\bar{X} = \begin{pmatrix} 0 & 0 \\ 0 & \frac{1}{\lambda} X_{\bar{S},\bar{S}} \end{pmatrix}.$$

A diagonal block extracted from a PSD matrix is PSD, so $\bar{X} \succeq 0$. Besides, $\bar{X}_{ii} \le 1$ for every $i$ (since $X_{ii} \le \lambda$ for $i \notin S$), thus $\bar{X}$ is primal feasible. It remains to evaluate its objective value.
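The construction of $\bar{X}$ can be sketched in a few lines (a throwaway sketch with hypothetical names; the example matrix is assumed PSD with one violating coordinate):

```python
import numpy as np

def truncate_iterate(X, lam):
    """Zero out the rows/columns with X_ii > lam and rescale the remaining
    block by 1/lam, so that every surviving diagonal entry is at most 1."""
    keep = np.diag(X) <= lam          # the indices outside S
    Xbar = np.zeros_like(X)
    idx = np.ix_(keep, keep)
    Xbar[idx] = X[idx] / lam
    return Xbar

# Tiny example: coordinate 0 violates its constraint by a lot.
X = np.array([[3.0, 0.5, 0.2],
              [0.5, 1.2, 0.1],
              [0.2, 0.1, 0.9]])
Xbar = truncate_iterate(X, lam=1.5)
```

One can then verify numerically that the result is PSD with diagonal at most one, i.e. primal feasible.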
Claim: $B \bullet \bar{X} \ge (1 - 3\delta)\alpha$.

Write $X$ as a Gram matrix, $X_{i,j} = u_i \cdot u_j$. From the definition of $\lambda$,

$$\alpha - B \bullet \bar{X} \overset{(6)}{=} B \bullet \Big(\frac{1}{\lambda}X - \bar{X}\Big) = \frac{1}{\lambda}\sum_{(i,j)\in T} B_{i,j} X_{i,j} \overset{(7)}{\le} \frac{3}{\lambda}\sum_{i\in S} d\,\|u_i\| \overset{(8)}{\le} \frac{3d}{\lambda}\sqrt{|S|}\sqrt{\sum_{i\in S}\|u_i\|^2} \overset{(9)}{=} \frac{3d}{\lambda}\sqrt{|S|\, K} \overset{(10)}{\le} \frac{3d}{\lambda}\cdot\frac{K}{\sqrt{\lambda}} \overset{(11)}{\le} \frac{3\delta n d}{4\sqrt{\lambda}} \overset{(12)}{\le} 3\delta\alpha,$$

where in (6) we introduced $T = (S\times S) \cup (S\times\bar{S}) \cup (\bar{S}\times S)$ and used $B \bullet X = \lambda\alpha$. (7) holds because $T$ is the union of three sets, and for each of them the sum is at most $\sum_{i\in S}\|u_i\|\sum_j |B_{i,j}| \le \sum_{i\in S} d\,\|u_i\|$, using $X_{i,j} \le \|u_i\|\,\|u_j\|$ and $\|u_j\| \le 1$. (8) is a result of the Cauchy–Schwarz inequality, (9) is because $\|u_i\|^2 = X_{ii}$ and $\sum_{i\in S} X_{ii} = K$, (10) uses $K > \lambda|S|$, (11) follows from our assumption $K \le \frac{\delta\lambda n}{4}$, and (12) is because $\lambda \ge 1$ and $\frac{nd}{4} \le \alpha$. Thus finally we have $B \bullet \bar{X} \ge (1 - 3\delta)\alpha$.
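Putting the cases together, the oracle can be sketched as follows (a minimal Python sketch with my own names and return conventions; in the last case it simply reports that $\bar{X}$ certifies near-optimal primal feasibility):

```python
import numpy as np

def maxcut_oracle(X, B, alpha, delta):
    """Oracle sketch for the MAXCUT SDP: given X >= 0 with Tr(X) = n,
    either report a (near-)feasible primal solution or return dual weights y."""
    n = X.shape[0]
    bx = np.sum(B * X)                       # B . X
    if bx <= alpha:                          # first case: uniform weights
        return ("dual", np.full(n, alpha / n))
    lam = bx / alpha                         # second case: B . X = lam * alpha
    S = np.diag(X) > lam                     # badly violated coordinates
    K = np.diag(X)[S].sum()
    if K > delta * lam * n / 4:              # many violations: weights on S
        y = np.zeros(n)
        y[S] = lam * alpha / K
        return ("dual", y)
    # few violations: the truncated iterate X-bar is primal feasible
    # with objective value at least (1 - 3*delta) * alpha
    return ("primal", None)

# Usage on the 4-cycle with X = I (Tr(X) = n = 4): B . I = 2 = alpha = nd/4,
# so the first case fires and y = (alpha/n, ..., alpha/n).
A4 = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A4[i, j] = A4[j, i] = 0.5
B4 = 0.5 * np.eye(4) - A4 / 2
kind, y = maxcut_oracle(np.eye(4), B4, alpha=2.0, delta=0.1)
```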
So the oracle works. How good is it? First note that it runs very fast: there are only three cases to distinguish between, and in each one we check a linear constraint. Thus the running time is linear in the number of edges $m$ of the graph.
Next we need to bound the width of the oracle:

$$\Big\|\sum_i y_i E_i - B\Big\| \le \left\|\begin{pmatrix} y_1 & & 0 \\ & \ddots & \\ 0 & & y_n \end{pmatrix}\right\| + \|B\| \le \max_i |y_i| + \frac{d}{2}.$$

In order to bound $\max_i |y_i|$ we should check all of the cases. For example, in the case $K \ge \frac{\delta\lambda n}{4}$ we have

$$y_i \le \frac{\lambda\alpha}{K} \le \frac{4\lambda\alpha}{\delta\lambda n} = \frac{4\alpha}{\delta n} \le \frac{4}{\delta n}\cdot\frac{nd}{2} = \frac{2d}{\delta}.$$

Thus $\max_i y_i = O(d/\delta)$, and the width is $\sigma = O(d/\delta)$.
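The triangle-inequality step in the width bound can be checked numerically on a random instance (a throwaway sketch; the random graph here is not exactly $d$-regular, which does not matter for the inequality itself):

```python
import numpy as np

# Sanity check of || sum_i y_i E_i - B || <= max_i |y_i| + ||B||:
# the left-hand matrix is just diag(y) - B.
rng = np.random.default_rng(0)
n, d = 8, 4
A = np.zeros((n, n))
for _ in range(n * d):                      # random symmetric 1/2-entries
    i, j = rng.integers(0, n, size=2)
    if i != j:
        A[i, j] = A[j, i] = 0.5
B = (d / 4) * np.eye(n) - A / 2
y = rng.uniform(0, 2 * d, size=n)           # nonnegative dual weights
width = np.linalg.norm(np.diag(y) - B, 2)   # spectral norm
bound = y.max() + np.linalg.norm(B, 2)
```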
6.3 General quadratic programs
Consider the following problem:

$$\alpha = \sup_{\substack{x_i, y_j \in \{\pm 1\} \\ i = 1,\dots,n,\; j = 1,\dots,m}} \sum_{i,j} A_{i,j}\, x_i y_j,$$
where $A \in \mathbb{R}^{n\times m}$. Just like the MAXCUT problem, this is an NP-hard problem (as you will show in the homework, MAXCUT is a special case). We will see how a good approximation can be obtained in polynomial time. For this we propose the following relaxation:
$$\alpha \le \beta = \sup_{\substack{u_i, v_j \in \mathbb{R}^{m+n} \\ \|u_i\| = \|v_j\| = 1}} \sum_{i,j} A_{i,j}\, u_i \cdot v_j.$$
It is not obvious that this program is an SDP; we will come back to this point later. The interesting point here is the following theorem.
Theorem 6.2. Given $u_i$'s and $v_j$'s achieving the optimum in the above SDP, there exists a polynomial-time algorithm that produces $x_i$'s and $y_j$'s in $\{\pm 1\}$ such that

$$\sum_{i,j} A_{i,j}\, x_i y_j \ge C\beta,$$

where $C$ is a universal constant.
There are different methods to prove the theorem, which yield different values of $C$. For example, in your homework you will develop an algorithm that achieves $C \approx 0.56$. The best possible value of $C$ is the inverse of Grothendieck's constant $K_G$, which can be defined as

$$K_G = \inf\Big\{ C : \forall m, n,\ \forall A \in \mathbb{R}^{n\times m},\ \sup_{\substack{u_i, v_j \in \mathbb{R}^{m+n} \\ \|u_i\| = \|v_j\| = 1}} \sum_{i,j} A_{i,j}\, u_i \cdot v_j \le C \sup_{x_i, y_j \in \{\pm 1\}} \sum_{i,j} A_{i,j}\, x_i y_j \Big\}.$$
Now let us rewrite the above problem as an SDP of the form

$$\sup\ B \bullet Z \quad \text{s.t.}\quad A_i \bullet Z \le c_i, \qquad Z \succeq 0.$$

If the $u_i$'s are the columns of $U$ and the $v_j$'s are the columns of $V$, then define

$$Z = \begin{pmatrix} U & V \end{pmatrix}^T \begin{pmatrix} U & V \end{pmatrix} = \begin{pmatrix} [u_i \cdot u_j] & [u_i \cdot v_j] \\ [v_i \cdot u_j] & [v_i \cdot v_j] \end{pmatrix} \in \mathbb{R}^{(m+n)\times(m+n)}.$$

Trivially $Z$ is a PSD matrix (it is a Gram matrix), and its diagonal elements are the squared norms $\|u_i\|^2$ and $\|v_j\|^2$, which should be at most one. Thus we let $c_i = 1$ and $A_i = E_i$ for $i = 1, \dots, n+m$, where $E_i$ is the matrix whose $i$th diagonal entry is one and all other entries are zero. Finally, for the objective, we define

$$B = \frac{1}{2}\begin{pmatrix} 0 & A \\ A^T & 0 \end{pmatrix},$$

and this problem is equivalent to the relaxed problem that we introduced for our original quadratic optimization problem, except now it is in standard SDP form.
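To see that the change of objective is consistent, here is a small numerical sketch (random data, my own variable names) checking that $B \bullet Z = \sum_{i,j} A_{i,j}\, u_i \cdot v_j$ for arbitrary unit vectors:

```python
import numpy as np

# Assemble Z and B for the general quadratic program and verify the identity
# B . Z = sum_ij A_ij (u_i . v_j) on random unit vectors u_i, v_j.
rng = np.random.default_rng(1)
n, m = 3, 5
A = rng.standard_normal((n, m))

U = rng.standard_normal((m + n, n))
V = rng.standard_normal((m + n, m))
U /= np.linalg.norm(U, axis=0)              # columns u_1..u_n, unit norm
V /= np.linalg.norm(V, axis=0)              # columns v_1..v_m, unit norm

W = np.hstack([U, V])
Z = W.T @ W                                  # Gram matrix: PSD, unit diagonal
B = 0.5 * np.block([[np.zeros((n, n)), A],
                    [A.T, np.zeros((m, m))]])

objective = np.sum(B * Z)                    # B . Z
direct = sum(A[i, j] * (U[:, i] @ V[:, j])
             for i in range(n) for j in range(m))
```

The factor $1/2$ in $B$ compensates for each product $u_i \cdot v_j$ appearing twice in $Z$, once in each off-diagonal block.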