Math 184: Combinatorics
Lecture 1: Introduction
Instructor: Benny Sudakov
1 What is combinatorics?
Defining combinatorics within the larger field of mathematics is not an easy task. Typically, combinatorics deals with finite structures such as graphs, hypergraphs, partitions or partially ordered
sets. However, rather than the object of study, what characterizes combinatorics are its methods:
counting arguments, induction, inclusion-exclusion, the probabilistic method - in general, surprising applications of relatively elementary tools, rather than gradual development of a sophisticated
machinery. That is what makes combinatorics very elegant and accessible, and why combinatorial
methods should be in the toolbox of any mainstream mathematician.
Let’s start with a few examples where combinatorial ideas play a key role.
1. Ramsey theory. In the 1950’s, a Hungarian sociologist S. Szalai studied friendship relationships between children. He observed that in any group of around 20 children, he was able to
find four children who were mutual friends, or four children such that no two of them were
friends. Before drawing any sociological conclusions, Szalai consulted three eminent mathematicians in Hungary at that time: Erdős, Turán and Sós. A brief discussion revealed that
indeed this is a mathematical phenomenon rather than a sociological one. For any symmetric
relation R on at least 18 elements, there is a subset S of 4 elements such that R contains either
all pairs in S or none of them. This fact is a special case of Ramsey’s theorem proved in 1930,
the foundation of Ramsey theory which developed later into a rich area of combinatorics.
2. Tournament paradox. Suppose that n basketball teams compete in a tournament where
each pair of teams plays exactly one game. The organizers want to award the k best teams.
However, whichever k teams they pick, there is always another team that beats them all! Is
this possible? It can be proved using a random construction that for any k > 0 there is n > k
such that this can indeed happen.
3. Brouwer’s Theorem. In 1911, Luitzen Brouwer published his famous Fixed Point Theorem:
Every continuous map $f : B^n \to B^n$ (where $B^n$ is an n-dimensional ball) has a fixed point,
$f(x) = x$.
The special case of n = 1 follows easily from the intermediate value theorem. For higher
dimensions, however, the original proof was complicated. In 1928, Emanuel Sperner found
a simple combinatorial result which implies Brouwer’s fixed point theorem in an elegant way.
The proof of Sperner’s lemma is equally elegant, by double counting.
4. Borsuk’s conjecture. In 1933, Karol Borsuk published a paper which contained a proof of
a conjecture stated by Stanislaw Ulam:
Every continuous map $f : S^n \to \mathbb{R}^n$ (where $S^n$ is an n-dimensional sphere) maps two antipodal
points to the same value, $f(x) = f(-x)$.
Borsuk also asked whether the following conjecture is true:
Every set $S \subset \mathbb{R}^n$ of finite diameter can be partitioned into n + 1 sets of strictly smaller
diameter.
The example of an n-dimensional regular simplex shows that n + 1 parts are necessary, since
we need to separate the n + 1 vertices into different parts. Borsuk’s conjecture was proved for
smooth bodies, using the Borsuk-Ulam theorem above. However, the general case was open
until 1993 when Kahn and Kalai disproved it in a dramatic way. They constructed discrete
sets of points that cannot be partitioned into fewer than $1.2^{\sqrt{n}}$ parts.
5. Littlewood and Offord studied the following problem in 1943:
Given complex numbers $z_1, z_2, \ldots, z_n$ of absolute value $|z_i| \ge 1$, what is the maximum number
of distinct sums $\sum_{i=1}^{n} \pm z_i$ that lie inside some unit disk? Kleitman and Katona proved that
the maximum number is $\binom{n}{\lfloor n/2 \rfloor}$, using the methods of extremal combinatorics.
2 Graph theory
Let us begin with an area of combinatorics called graph theory. What we mean by a graph here is
not the graph of a function, but a structure consisting of vertices some of which are connected by
edges.
Definition 1. A graph is a pair G = (V, E) where V is a finite set whose elements we call vertices
and $E \subseteq \binom{V}{2}$ is a collection of pairs of vertices that we call edges.
Some basic examples of graphs are:
• $K_n$: the complete graph on n vertices, where every pair of vertices forms an edge.
• $K_{s,t}$: the complete bipartite graph, where V = S ∪ T , |S| = s, |T | = t, and every pair in
S × T forms an edge.
• $C_\ell$: the cycle of length ℓ, where $V = \{v_0, v_1, \ldots, v_{\ell-1}\}$ and $\{v_i, v_j\} \in E$ if $j \equiv i + 1 \pmod{\ell}$.
A famous story which stands at the beginning of graph theory is the problem of the bridges of
Königsberg. Königsberg had 7 bridges connecting different parts of the town. Local inhabitants
were wondering whether it was possible to walk across each bridge exactly once and return to the
same point. In the language of graph theory, the bridges are edges connecting different vertices, i.e.
parts of town. The bridges of Königsberg looked like the following graph (or rather, “multigraph”,
since here multiple edges connect the same pair of vertices):
Figure 1: The graph of Königsberg.
The question is whether it is possible to walk around the graph so that we traverse each edge
exactly once and come back to the same vertex. Such a walk is called an Eulerian circuit and
graphs where this is possible are called Eulerian (after a paper by Euler where this question was
considered in 1736). The following result shows that the graph of Königsberg is not Eulerian. First,
we need some definitions.
Definition 2. We say that an edge e is incident with a vertex v, if v ∈ e. Two vertices are adjacent,
if they form an edge. The degree d(v) of a vertex is the number of edges incident with v.
Definition 3. A walk is a sequence of vertices $(v_1, v_2, \ldots, v_k)$ (with possible repetition) such that
each successive pair $(v_i, v_{i+1})$ forms an edge.
An Eulerian circuit is a walk where $v_k = v_1$ and each edge appears exactly once.
A path is a walk $(v_1, v_2, \ldots, v_k)$ without repetition of vertices.
A graph G is called connected if there is a path between any pair of vertices.
Theorem 1. A (multi)graph G is Eulerian if and only if it is connected and the degree of every
vertex is even.
Proof. Suppose G has an Eulerian circuit. Draw arrows along the edges as they are traversed by
the circuit. Then the circuit induces a connecting path for any pair of vertices. Also, for every
vertex the number of incoming edges equals the number of outgoing edges. Hence the degree of
each vertex is even.
Conversely, assume that G is connected and all degrees are even. First, we prove the following:
the set of edges E can be decomposed into a union of disjoint cycles C1 , . . . , Cn . This is true,
because for any graph with even degrees, we can find a cycle by walking arbitrarily until we hit our
own path. Then we remove the cycle, all degrees are still even, and we continue as long as there are
any edges in the graph.
Now we construct the Eulerian circuit L inductively as follows. Start with L := C1 . We
incorporate the cycles into L one by one. At any point, there must be an unused cycle Cj that
intersects L, otherwise the graph is disconnected. Then, we add Cj to the circuit L by walking
along L up to the first point where we hit Cj , then walk around Cj and continue along L. We
repeat this procedure until all cycles are exhausted. Since the cycles cover each edge exactly once,
our circuit also covers each edge exactly once at the end.
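The constructive half of the proof can be sketched in code. The following is a minimal illustration, assuming the (multi)graph is given as a list of vertex pairs; the function name `eulerian_circuit` and the stack-based splicing (a variant usually attributed to Hierholzer, not necessarily the exact procedure of the proof) are choices made here for illustration only.

```python
from collections import defaultdict

def eulerian_circuit(edges):
    """Return a closed walk using every edge exactly once, or None.

    `edges` is a list of (u, v) pairs; repeated pairs model a multigraph.
    Vertices that appear in no edge are ignored (an assumption of this sketch).
    """
    if not edges:
        return None
    adj = defaultdict(list)          # vertex -> list of (neighbor, edge index)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    # Necessary condition from Theorem 1: every degree is even.
    if any(len(nbrs) % 2 for nbrs in adj.values()):
        return None
    used = [False] * len(edges)
    stack, circuit = [edges[0][0]], []
    while stack:
        v = stack[-1]
        # Discard edges that were already traversed from the other endpoint.
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, idx = adj[v].pop()
            used[idx] = True
            stack.append(w)              # keep walking along an unused edge
        else:
            circuit.append(stack.pop())  # dead end: vertex joins the circuit
    # If some edge was never reached, the graph is not connected.
    return circuit if all(used) else None

# The Königsberg multigraph (land masses A, B, C, D) has odd degrees.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(eulerian_circuit(konigsberg))
print(eulerian_circuit([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]))
```

For the Königsberg input the function reports that no circuit exists, because all four vertices have odd degree; for the second input (two triangles sharing a vertex) it returns a closed walk through all six edges.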
Lecture 2: The pigeonhole principle and double counting
1 The pigeonhole principle
If n objects are placed in k boxes, k < n, then at least one box contains more than one object.
This is so obvious, one might think that nothing non-trivial can be derived from this “principle”.
And yet, this principle is very useful.
1.1 Two equal degrees
The following consequence is quite easy.
Theorem 1. In any graph with at least two vertices, there are two vertices of equal degree.
Proof. For any graph on n vertices, the degrees are between 0 and n−1. Therefore, the only way all
degrees could be different is that there is exactly one vertex of each possible degree. In particular,
there is a vertex v of degree 0 and a vertex w of degree n − 1. However, if there is an edge (v, w),
then v cannot have degree 0, and if there is no edge (v, w) then w cannot have degree n − 1. This
is a contradiction.
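As a sanity check, the statement can be verified exhaustively for small graphs. The sketch below enumerates every graph on the vertex set {0, ..., n−1} for 2 ≤ n ≤ 5; the helper names are hypothetical and the brute-force approach is only an illustration, not part of the proof.

```python
from itertools import combinations, product

def has_repeated_degree(n, edges):
    """True if two of the n vertices have the same degree."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return len(set(deg)) < n

def check_all_graphs(n):
    """Check the claim for every graph on n labelled vertices."""
    pairs = list(combinations(range(n), 2))
    for mask in product([False, True], repeat=len(pairs)):
        edges = [p for p, keep in zip(pairs, mask) if keep]
        if not has_repeated_degree(n, edges):
            return False
    return True

print(all(check_all_graphs(n) for n in range(2, 6)))
```

A single vertex is excluded from the check, since a one-vertex graph trivially has no two vertices to compare.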
1.2 Subsets without divisors
Let [2n] = {1, 2, . . . , 2n}. Suppose you want to pick a subset S ⊂ [2n] so that no number in S divides
another. How many numbers can you pick? Obviously, you can take S = {n + 1, n + 2, . . . , 2n}
and no number divides another. Can you pick more than n numbers? The answer is negative.
Theorem 2. For any subset S ⊂ [2n] of size |S| > n, there are two numbers a, b ∈ S such that a|b.
Proof. For each odd number a ∈ [2n], let $C_a = \{2^k a : k \ge 0,\ 2^k a \le 2n\}$. The number of these
classes is n and every element b ∈ [2n] belongs to exactly one of them, for a obtained by dividing b
by the highest possible power of 2. Consider S ⊂ [2n] of size |S| > n. By the pigeonhole principle,
there is a class $C_a$ that contains at least two elements of S, say $2^j a$ and $2^k a$ with j < k. Then $2^j a$ divides $2^k a$.
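The classes used in the proof translate directly into a search procedure. The following sketch, with hypothetical helper names, groups the elements of S by their odd part and returns the first pair that lands in the same class.

```python
def odd_part(b):
    """Divide b by the highest power of 2 that divides it."""
    while b % 2 == 0:
        b //= 2
    return b

def dividing_pair(S):
    """Return (a, b) in S with a | b and a < b found via the odd-part classes, or None."""
    seen = {}                      # odd part -> smallest element of S seen in that class
    for b in sorted(S):
        a = odd_part(b)
        if a in seen:
            return seen[a], b      # both are a * 2^k, so the smaller divides the larger
        seen[a] = b
    return None

print(dividing_pair({4, 5, 6, 7, 8, 9}))   # six numbers from [10]
print(dividing_pair({6, 7, 8, 9, 10}))     # the extremal set for n = 5
```

Note that the function only detects pairs within one class, which is all the pigeonhole argument needs; it may return None for a set that contains a dividing pair across different classes.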
1.3 Rational approximation
Theorem 3. For any x ∈ R and n > 0, there is a rational number p/q, 1 ≤ q ≤ n, such that
$$\left| x - \frac{p}{q} \right| < \frac{1}{nq}.$$
Note that it is easy to get an approximation whose error is at most 1/n, by fixing the denominator
to be q = n. The improved approximation uses the pigeonhole principle and is due to Dirichlet
(1842).
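Theorem 3 can be checked numerically with a bucket search. The sketch below is an illustration under stated assumptions: the function name `dirichlet_approx` is hypothetical, x is passed as a `Fraction` so that all comparisons are exact, and the pair (p, q) is read off from two multiples of x whose fractional parts share a bucket of width 1/n.

```python
from fractions import Fraction
from math import floor

def dirichlet_approx(x, n):
    """Return integers (p, q) with 1 <= q <= n and |x - p/q| < 1/(n*q)."""
    buckets = {}                          # bucket index -> multiplier a seen first
    for a in range(1, n + 2):             # n + 1 multipliers, n buckets
        frac = a * x - floor(a * x)       # fractional part {a x}, in [0, 1)
        b = floor(frac * n)               # index of the bucket [b/n, (b+1)/n)
        if b in buckets:
            a0 = buckets[b]
            q = a - a0                    # 1 <= q <= n
            p = floor(a * x) - floor(a0 * x)
            return p, q                   # |q x - p| = |{a x} - {a0 x}| < 1/n
        buckets[b] = a
    raise AssertionError("unreachable by the pigeonhole principle")

x = Fraction(314159, 100000)
p, q = dirichlet_approx(x, 10)
print(p, q, abs(x - Fraction(p, q)) < Fraction(1, 10 * q))
```

With floating-point input the same code runs, but rounding near bucket boundaries may matter, which is why exact rationals are used in this sketch.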
Proof. Let {x} denote the fractional part of x. Consider {ax} for a = 1, 2, . . . , n + 1 and place
these n + 1 numbers into n buckets [0, 1/n), [1/n, 2/n), . . ., [(n − 1)/n, 1). There must be a bucket
containing at least two numbers {ax} ≤ {a′x}. We set q = a′ − a and we get {qx} = {a′x − ax} <
1/n. This means that qx = p + ε where p is an integer and ε = {qx} < 1/n. Hence,
$$x = \frac{p}{q} + \frac{\varepsilon}{q},$$
so that |x − p/q| = ε/|q| < 1/(n|q|). Here 1 ≤ |q| ≤ n; if q < 0, we replace (p, q) by (−p, −q).
1.4 Monotone subsequences
Finally, we give an application which is less immediate. Given an arbitrary sequence of distinct
real numbers, what is the largest monotone subsequence that we can always find? It is easy to
construct sequences of mn numbers such that any increasing subsequence has length at most m
and any decreasing subsequence has length at most n. We show that this is an extremal example.
Theorem 4. For any sequence of mn+1 distinct real numbers a0 , a1 , . . . , amn , there is an increasing
subsequence of length m + 1 or a decreasing subsequence of length n + 1.
Proof. Let $t_i$ denote the maximum length of an increasing subsequence starting with $a_i$. If $t_i > m$
for some i, we are done. So assume $t_i \in \{1, 2, \ldots, m\}$ for all i; i.e. we have mn + 1 numbers in m
buckets. By the pigeonhole principle, there must be a value s ∈ {1, 2, . . . , m} such that $t_i = s$ for
at least n + 1 indices, $i_0 < i_1 < \ldots < i_n$. Now we claim that $a_{i_0} > a_{i_1} > \ldots > a_{i_n}$. Indeed, if there
were a pair such that $a_{i_j} < a_{i_{j+1}}$, we could extend the increasing subsequence starting at $a_{i_{j+1}}$ by
adding $a_{i_j}$ in front, to get an increasing subsequence of length s + 1 starting at $a_{i_j}$. However, this contradicts $t_{i_j} = s$.
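The quantities $t_i$ from the proof are easy to compute, which gives a quick way to test the theorem on examples. In the sketch below the function names are hypothetical; `extremal_sequence` is one possible construction (m increasing blocks, each internally decreasing) of mn numbers that avoids both long subsequences.

```python
from itertools import permutations

def longest_monotone(seq):
    """Return (longest increasing length, longest decreasing length)."""
    n = len(seq)
    inc = [1] * n   # inc[i]: longest increasing subsequence starting at seq[i]
    dec = [1] * n   # dec[i]: longest decreasing subsequence starting at seq[i]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if seq[i] < seq[j]:
                inc[i] = max(inc[i], 1 + inc[j])
            elif seq[i] > seq[j]:
                dec[i] = max(dec[i], 1 + dec[j])
    return max(inc, default=0), max(dec, default=0)

def extremal_sequence(m, n):
    """m*n distinct numbers: m blocks of n decreasing values, blocks increasing."""
    return [b * n + (n - 1 - r) for b in range(m) for r in range(n)]

print(longest_monotone(extremal_sequence(3, 4)))
# Every arrangement of 2*2 + 1 = 5 distinct numbers has a monotone run of length 3.
print(all(max(longest_monotone(p)) >= 3 for p in permutations(range(5))))
```

The first call shows that mn numbers are not enough in general; the second checks Theorem 4 exhaustively for m = n = 2.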
2 Double counting
Another elementary trick which often brings surprising results is double counting. As the name
suggests, the trick involves counting a certain quantity in two different ways and comparing the
results.
2.1 Sum of degrees in a graph
The following observation is due to Leonhard Euler (1736).
Lemma 1. For any graph G, the sum of degrees over all vertices is even.
Proof. For a vertex v and edge e, let i(v, e) = 1 if v ∈ e and 0 otherwise. We count all the incidences
between vertices and edges in two ways:
• $\sum_{v \in V, e \in E} i(v, e) = \sum_{v \in V} \sum_{e \in E} i(v, e) = \sum_{v \in V} d(v)$,
because d(v) is exactly the number of edges incident with v.
• $\sum_{v \in V, e \in E} i(v, e) = \sum_{e \in E} \sum_{v \in V} i(v, e) = 2|E|$,
because every edge is incident with exactly two vertices.
Thus we have proved that the sum of all degrees is exactly twice the number of edges.
2.2 Average number of divisors
Let t(n) denote the number of divisors of n. E.g., for a prime n, t(n) = 2, while for a power of 2,
$t(2^k) = k + 1$. We would like to know what is the average number of divisors,
$$\bar{t}(n) = \frac{1}{n} \sum_{j=1}^{n} t(j).$$
This seems to be a complicated question; however, double counting gives a simple answer. Let
d(i, j) = 1 if i|j and 0 otherwise. I.e., $t(j) = \sum_{i=1}^{n} d(i, j)$ for j ≤ n. We count the total number of dividing
pairs i|j, $\sum_{i,j=1}^{n} d(i, j)$, in two different ways.
• $\sum_{i,j=1}^{n} d(i, j) = \sum_{j=1}^{n} t(j) = n \cdot \bar{t}(n)$.
• $\sum_{i,j=1}^{n} d(i, j) = \sum_{i=1}^{n} \lfloor n/i \rfloor \simeq \sum_{i=1}^{n} \frac{n}{i} = n \cdot H_n$, where $H_n$ is the n-th harmonic number.
In the second case, we have been somewhat sloppy and neglected some roundoff errors, but these
add up to at most n overall. We can conclude that $\bar{t}(n) \simeq H_n \simeq \ln n$, within an error of 1.
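The estimate is easy to compare with exact values. The sketch below computes the average number of divisors by counting, for each i, the multiples of i up to n, and prints it next to the harmonic number and ln n; the function names are chosen here for illustration.

```python
from math import log

def average_divisors(n):
    """Average of t(1), ..., t(n), counting dividing pairs column by column."""
    total = sum(n // i for i in range(1, n + 1))   # i has floor(n / i) multiples up to n
    return total / n

def harmonic(n):
    return sum(1 / i for i in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, average_divisors(n), harmonic(n), log(n))
```

In each row the first value lies within 1 of the other two, in line with the error bound stated above.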
Lecture 3: Sperner’s lemma and Brouwer’s theorem
1 Sperner’s lemma
In 1928, young Emanuel Sperner found a surprisingly simple proof of Brouwer’s famous Fixed Point
Theorem: Every continuous map of an n-dimensional ball to itself has a fixed point. At the heart of
his proof is the following combinatorial lemma. First, we need to define the notions of simplicial
subdivision and proper coloring.
Definition 1. An n-dimensional simplex is the set of all convex combinations of n+1 points in general
position. I.e., for given vertices $v_1, \ldots, v_{n+1}$, the simplex would be
$$S = \left\{ \sum_{i=1}^{n+1} \alpha_i v_i : \alpha_i \ge 0,\ \sum_{i=1}^{n+1} \alpha_i = 1 \right\}.$$
A simplicial subdivision of an n-dimensional simplex S is a partition of S into small simplices
(“cells”) such that any two cells are either disjoint, or they share a full face of a certain dimension.
Definition 2. A proper coloring of a simplicial subdivision is an assignment of n + 1 colors to the
vertices of the subdivision, so that the vertices of S receive all different colors, and points on each
face of S use only the colors of the vertices defining the respective face of S.
For example, for n = 2 we have a subdivision of a triangle T into triangular cells. A proper
coloring of T assigns different colors to the 3 vertices of T , and inside vertices on each edge of T
use only the two colors of the respective endpoints. (Note that we do not require that endpoints of
an edge receive different colors.)
Lemma 1 (Sperner, 1928). Every properly colored simplicial subdivision contains a cell whose
vertices have all different colors.
Proof. Let us call a cell of the subdivision a rainbow cell, if its vertices receive all different colors.
We actually prove a stronger statement, namely that the number of rainbow cells is odd for any
proper coloring.
Case n = 1. First, let us consider the 1-dimensional case. Here, we have a line segment (a, b)
subdivided into smaller segments, and we color the vertices of the subdivision with 2 colors. It is
required that a and b receive different colors. Thus, going from a to b, we must switch color an
odd number of times, so that we get a different color for b. Hence, there is an odd number of small
segments that receive two different colors.
Case n = 2. We have a properly colored simplicial subdivision of a triangle T . Let Q denote
the number of cells colored (1, 1, 2) or (1, 2, 2), and R the number of rainbow cells, colored (1, 2, 3).
Consider edges in the subdivision whose endpoints receive colors 1 and 2. Let X denote the
number of boundary edges colored (1, 2), and Y the number of interior edges colored (1, 2) (inside
the triangle T ). We count in two different ways:
• Over cells of the subdivision: For each cell of type Q, we get 2 edges colored (1, 2), while for
each cell of type R, we get exactly 1 such edge. Note that this way we count internal edges
of type (1, 2) twice, whereas boundary edges only once. We conclude that 2Q + R = X + 2Y .
• Over the boundary of T : Edges colored (1, 2) can be only inside the edge between two vertices
of T colored 1 and 2. As we already argued in the 1-dimensional case, between 1 and 2 there
must be an odd number of edges colored (1, 2). Hence, X is odd. This implies that R is also
odd.
General case. In the general n-dimensional case, we proceed by induction on n. We have a
proper coloring of a simplicial subdivision of S using n + 1 colors. Let R denote the number of
rainbow cells, using all n + 1 colors. Let Q denote the number of simplicial cells that get all the
colors except n + 1, i.e. they are colored using {1, 2, . . . , n} so that exactly one of these colors is
used twice and the other colors once. Also, we consider (n − 1)-dimensional faces that use exactly
the colors {1, 2, . . . , n}. Let X denote the number of such faces on the boundary of S, and Y the
number of such faces inside S. Again, we count in two different ways.
• Each cell of type R contributes exactly one face colored {1, 2, . . . , n}. Each cell of type Q
contributes exactly two faces colored {1, 2, . . . , n}. However, inside faces appear in two cells
while boundary faces appear in one cell. Hence, we get the equation 2Q + R = X + 2Y .
• On the boundary, the only (n − 1)-dimensional faces colored {1, 2, . . . , n} can be on the face
F ⊂ S whose vertices are colored {1, 2, . . . , n}. Here, we use the inductive hypothesis for
F , which forms a properly colored (n − 1)-dimensional subdivision. By the hypothesis, F
contains an odd number of rainbow (n − 1)-dimensional cells, i.e. X is odd. We conclude
that R is odd as well.
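The parity statement can be tested experimentally in the case n = 2. The sketch below is an illustration under stated assumptions: the triangle is subdivided in the standard way into N² small triangles whose vertices are the integer points (a, b, c) with a + b + c = N, colors are numbered 0, 1, 2 rather than 1, 2, 3, and all function names are hypothetical.

```python
import random

def grid_cells(N):
    """Yield the N*N cells of the standard subdivision as triples of lattice points."""
    for a in range(N):
        for b in range(N - a):
            c = N - 1 - a - b
            yield (a + 1, b, c), (a, b + 1, c), (a, b, c + 1)              # "upward" cell
            if c >= 1:
                yield (a, b + 1, c), (a + 1, b, c), (a + 1, b + 1, c - 1)  # "downward" cell

def random_proper_coloring(N, rng):
    """Color each lattice point with an index of one of its positive coordinates.

    This is exactly properness: corners get their own color, and a point on a
    boundary edge only uses the colors of that edge's endpoints.
    """
    coloring = {}
    for a in range(N + 1):
        for b in range(N + 1 - a):
            point = (a, b, N - a - b)
            allowed = [i for i, coord in enumerate(point) if coord > 0]
            coloring[point] = rng.choice(allowed)
    return coloring

def count_rainbow(N, coloring):
    return sum(1 for cell in grid_cells(N)
               if {coloring[v] for v in cell} == {0, 1, 2})

rng = random.Random(0)
print([count_rainbow(6, random_proper_coloring(6, rng)) for _ in range(10)])
```

Every count printed should be odd, whatever the random choices are.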
2 Brouwer’s Fixed Point Theorem
Theorem 1 (Brouwer, 1911). Let $B^n$ denote an n-dimensional ball. For any continuous map
$f : B^n \to B^n$, there is a point $x \in B^n$ such that f(x) = x.
We show how this theorem follows from Sperner’s lemma. It will be convenient to work with
a simplex instead of a ball (which is equivalent by a homeomorphism). Specifically, let S be a
simplex embedded in $\mathbb{R}^{n+1}$ so that the vertices of S are $v_1 = (1, 0, \ldots, 0)$, $v_2 = (0, 1, \ldots, 0)$, ...,
$v_{n+1} = (0, 0, \ldots, 1)$. Let f : S → S be a continuous map and assume that it has no fixed point.
We construct a sequence of subdivisions of S that we denote by $S_1, S_2, S_3, \ldots$. Each $S_j$ is a
subdivision of $S_{j-1}$, so that the size of each cell in $S_j$ tends to zero as j → ∞.
Now we define a coloring of $S_j$. For each vertex $x \in S_j$, we assign a color $c(x) \in [n+1]$ such that
$(f(x))_{c(x)} < x_{c(x)}$. To see that this is possible, note that for each point x ∈ S, $\sum_i x_i = 1$, and also
$\sum_i (f(x))_i = 1$. Unless f(x) = x, there are coordinates such that $(f(x))_i < x_i$ and also $(f(x))_{i'} > x_{i'}$.
In case there are multiple coordinates such that $(f(x))_i < x_i$, we pick the smallest i.
Let us check that this is a proper coloring in the sense of Sperner’s lemma. For vertices of
S, $v_i = (0, \ldots, 1, \ldots, 0)$, we have $c(v_i) = i$ because i is the only coordinate where $(f(x))_i < x_i$
is possible (all other coordinates of $v_i$ are zero). Similarly, for a vertex x on a face of S, e.g. $x \in \mathrm{conv}\{v_i : i \in A\}$, the only
coordinates where $(f(x))_i < x_i$ is possible are those where i ∈ A, and hence c(x) ∈ A.
Sperner’s lemma implies that there is a rainbow cell with vertices $x^{(j,1)}, \ldots, x^{(j,n+1)} \in S_j$, where $x^{(j,i)}$ has color i. In
other words, $(f(x^{(j,i)}))_i < x^{(j,i)}_i$ for each i ∈ [n + 1]. Since this is true for each $S_j$, we get a sequence
of points $\{x^{(j,1)}\}$ inside a compact set S which has a convergent subsequence. Let us throw away
all the elements outside of this subsequence - we can assume that $\{x^{(j,1)}\}$ itself is convergent. Since
the size of the cells in $S_j$ tends to zero, the limits $\lim_{j\to\infty} x^{(j,i)}$ are in fact the same for all i ∈ [n + 1]
- let’s call this common limit point $x^* = \lim_{j\to\infty} x^{(j,i)}$.
We assumed that there is no fixed point, therefore $f(x^*) \ne x^*$. This means that $(f(x^*))_i > x^*_i$
for some coordinate i. But we know that $(f(x^{(j,i)}))_i < x^{(j,i)}_i$ for all j and $\lim_{j\to\infty} x^{(j,i)} = x^*$, which
implies $(f(x^*))_i \le x^*_i$ by continuity. This is a contradiction, so f must have a fixed point.
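The coloring used in this proof also suggests a way to locate approximate fixed points. The following is a rough sketch for the 2-dimensional simplex only, under several assumptions: the subdivision is the standard grid with N² cells, indices run over 0, 1, 2, the map `rotate` is just a sample continuous map of the simplex to itself, and floating-point rounding is ignored. It is an illustration of the argument, not a robust numerical method.

```python
def grid_cells(N):
    """Yield the cells of the standard subdivision as triples of lattice points."""
    for a in range(N):
        for b in range(N - a):
            c = N - 1 - a - b
            yield (a + 1, b, c), (a, b + 1, c), (a, b, c + 1)
            if c >= 1:
                yield (a, b + 1, c), (a + 1, b, c), (a + 1, b + 1, c - 1)

def sperner_color(x, fx):
    """Smallest index i with f(x)_i < x_i, or None if no coordinate decreases."""
    for i in range(len(x)):
        if fx[i] < x[i]:
            return i
    return None

def approximate_fixed_point(f, N):
    """Return a grid fixed point, or the centroid of a rainbow cell."""
    color = {}
    for a in range(N + 1):
        for b in range(N + 1 - a):
            v = (a, b, N - a - b)
            x = tuple(t / N for t in v)       # barycentric coordinates of the vertex
            col = sperner_color(x, f(x))
            if col is None:
                return x                       # f(x) = x at a grid vertex
            color[v] = col
    for cell in grid_cells(N):
        if {color[v] for v in cell} == {0, 1, 2}:
            return tuple(sum(v[i] for v in cell) / (3 * N) for i in range(3))
    return None  # not expected: Sperner's lemma guarantees a rainbow cell

def rotate(x):
    """Sample map: cyclic shift of coordinates; its fixed point is (1/3, 1/3, 1/3)."""
    return (x[1], x[2], x[0])

y = approximate_fixed_point(rotate, 50)
print(y, max(abs(a - b) for a, b in zip(rotate(y), y)))
```

Refining the grid (larger N) shrinks the rainbow cell, which mirrors the limiting step of the proof.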
Lecture 4: Principle of inclusion and exclusion
1 Principle of inclusion and exclusion
Very often, we need to calculate the number of elements in the union of certain sets. Assuming
that we know the sizes of these sets, and their mutual intersections, the principle of inclusion and
exclusion allows us to do exactly that.
Suppose that you have two sets A, B. The size of the union is certainly at most |A| + |B|. This
way, however, we are counting twice all elements in A ∩ B, the intersection of the two sets. To
correct for this, we subtract |A ∩ B| to obtain the following formula:
|A ∪ B| = |A| + |B| − |A ∩ B|.
In general, the formula gets more complicated because we have to take into account intersections
of multiple sets. The following formula is what we call the principle of inclusion and exclusion.
Lemma 1. For any collection of finite sets $A_1, A_2, \ldots, A_n$, we have
$$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{\emptyset \ne I \subseteq [n]} (-1)^{|I|+1} \left| \bigcap_{i \in I} A_i \right|.$$
Writing out the formula more explicitly, we get
$|A_1 \cup \ldots \cup A_n| = |A_1| + \ldots + |A_n| - |A_1 \cap A_2| - \ldots - |A_{n-1} \cap A_n| + |A_1 \cap A_2 \cap A_3| + \ldots$
In other words, we add up the sizes of the sets, subtract intersections of pairs, add intersection of
triples, etc. The proof of this formula is very short and elegant, using the notion of a characteristic
function.
Proof. Assume that $A_1, \ldots, A_n \subseteq X$. For each set $A_i$, define the “characteristic function” $f_i(x)$
where $f_i(x) = 1$ if $x \in A_i$ and $f_i(x) = 0$ if $x \notin A_i$. We consider the following formula:
$$F(x) = \prod_{i=1}^{n} (1 - f_i(x)).$$
Observe that this is the characteristic function of the complement of $\bigcup_{i=1}^{n} A_i$: it is 1 iff x is not in
any of the sets $A_i$. Hence,
$$\sum_{x \in X} F(x) = \left| X \setminus \bigcup_{i=1}^{n} A_i \right|. \qquad (1)$$
Now we write F(x) differently, by expanding the product into $2^n$ terms:
$$F(x) = \prod_{i=1}^{n} (1 - f_i(x)) = \sum_{I \subseteq [n]} (-1)^{|I|} \prod_{i \in I} f_i(x).$$
Observe that $\prod_{i \in I} f_i(x)$ is the characteristic function of $\bigcap_{i \in I} A_i$. Therefore, we get
$$\sum_{x \in X} F(x) = \sum_{I \subseteq [n]} (-1)^{|I|} \sum_{x \in X} \prod_{i \in I} f_i(x) = \sum_{I \subseteq [n]} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right|. \qquad (2)$$
By comparing (1) and (2), we see that
$$\left| X \setminus \bigcup_{i=1}^{n} A_i \right| = |X| - \left| \bigcup_{i=1}^{n} A_i \right| = \sum_{I \subseteq [n]} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right|.$$
The first term in the sum here is $|\bigcap_{i \in \emptyset} A_i| = |X|$ by convention (consider how we obtained this
term in the derivation above). Therefore, the lemma follows.
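The formula of Lemma 1 can be evaluated directly and compared with the size of the union computed by other means. The sketch below assumes the sets are given as a non-empty list of Python `set` objects; the function name is hypothetical.

```python
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """Evaluate the right-hand side of Lemma 1 for a non-empty list of finite sets."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):                  # k = |I|
        sign = (-1) ** (k + 1)
        for I in combinations(range(n), k):    # all index sets I of size k
            common = set.intersection(*(sets[i] for i in I))
            total += sign * len(common)
    return total

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {7}]
print(union_size_by_inclusion_exclusion(sets), len(set.union(*sets)))
```

The sum has $2^n - 1$ terms, so this is only practical for a small number of sets; its value lies in checking the identity, not in computing unions efficiently.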
2 The number of derangements
As an application of this principle, consider the following problem. A sequence of n theatergoers
want to pick up their hats on the way out. However, the deranged attendant does not know which
hat belongs to whom and hands them out in a random order. What is the probability that nobody
gets their own hat? More formally, we have a random permutation π : [n] → [n] and we are asking
what is the probability that π(i) ≠ i for all i. Such permutations are called derangements.
Theorem 1. The probability that a random permutation π : [n] → [n] is a derangement is
$\sum_{k=0}^{n} (-1)^k / k!$, which tends to 1/e = 0.3678 . . . as n → ∞.
Proof. Let X be the set of all n! permutations, and let Ai denote the set of permutations that fix
element i, i.e.
Ai = {π ∈ X | π(i) = i}.
By simple counting, there are (n − 1)! permutations in $A_i$, since by fixing i, we still have n − 1
elements to permute. Similarly, $\bigcap_{i \in I} A_i$ consists of the permutations where all elements of I are
fixed, hence the number of such permutations is (n − |I|)!. By inclusion-exclusion, the number of
permutations with some fixed point is
$$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{\emptyset \ne I \subseteq [n]} (-1)^{|I|+1} \left| \bigcap_{i \in I} A_i \right| = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} (n-k)! = \sum_{k=1}^{n} (-1)^{k+1} \frac{n!}{k!}.$$
Hence, the probability that a random permutation has some fixed point is $\sum_{k=1}^{n} (-1)^{k+1}/k!$. By
taking the complement, the probability that there is no fixed point is $1 - \sum_{k=1}^{n} (-1)^{k+1}/k! = \sum_{k=0}^{n} (-1)^k/k!$. In the limit, this tends to the Taylor expansion of $e^{-1} = \sum_{k=0}^{\infty} (-1)^k/k!$.
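For small n the count of derangements can be confirmed by enumeration. In the sketch below both function names are hypothetical; the formula is evaluated in exact integer arithmetic as $n! \sum_{k=0}^{n} (-1)^k / k!$.

```python
from itertools import permutations
from math import exp, factorial

def derangements_brute_force(n):
    """Count permutations of {0, ..., n-1} without a fixed point."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

def derangements_formula(n):
    """n! * sum_{k=0}^{n} (-1)^k / k!, using integer division n!/k!."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))

for n in range(1, 8):
    print(n, derangements_brute_force(n), derangements_formula(n),
          derangements_formula(n) / factorial(n))
print(exp(-1))
```

The last column approaches 1/e very quickly, since the error after n terms of the alternating series is below 1/(n + 1)!.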
3 The number of surjections
Next, consider the following situation. There are m hunters and n rabbits, m ≥ n. Each hunter
shoots at random and kills exactly one (random) rabbit. What is the probability that all rabbits
are dead?
Theorem 2. The probability that all rabbits are dead is $\sum_{k=0}^{n-1} (-1)^k \binom{n}{k} (1 - k/n)^m$.
Proof. We formalize this problem as follows. A function f : [m] → [n] is called a surjection if it
covers all elements of [n]. There are $n^m$ functions total; we are interested in how many of these are
surjections. We denote by $A_i$ the set of functions that leave element i uncovered, i.e.
$A_i = \{f : [m] \to [n] \mid f(j) \ne i \text{ for all } j\}$.
The number of such functions is $(n-1)^m$, since we have n−1 choices for each of f (1), f (2), . . . , f (m).
Similarly,
$$\left| \bigcap_{i \in I} A_i \right| = (n - |I|)^m$$
because we have |I| forbidden choices for each function value. By inclusion-exclusion, we get that
the number of functions which are not surjections is
$$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{\emptyset \ne I \subseteq [n]} (-1)^{|I|+1} (n - |I|)^m = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} (n-k)^m.$$
By taking the complement, the number of surjections is
$$n^m - \left| \bigcup_{i=1}^{n} A_i \right| = \sum_{k=0}^{n-1} (-1)^k \binom{n}{k} (n-k)^m,$$
where the k = n term is omitted because it vanishes. Dividing by $n^m$, we get the desired probability.
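The count of surjections can be cross-checked against direct enumeration for small m and n. The names below are hypothetical; `surjections_formula` evaluates $\sum_{k=0}^{n-1} (-1)^k \binom{n}{k} (n-k)^m$.

```python
from itertools import product
from math import comb

def surjections_formula(m, n):
    """Number of surjections [m] -> [n] by inclusion-exclusion."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** m for k in range(n))

def surjections_brute_force(m, n):
    """Enumerate all n^m functions and keep those that cover every value."""
    return sum(1 for f in product(range(n), repeat=m) if len(set(f)) == n)

m, n = 4, 3   # four hunters, three rabbits
print(surjections_formula(m, n), surjections_brute_force(m, n))
print(surjections_formula(m, n) / n ** m)   # probability that all rabbits are dead
```

For four hunters and three rabbits both counts agree, and the probability is the count divided by $3^4$.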
Lecture 5: Ramsey Theory
1 Ramsey’s theorem for graphs
The metastatement of Ramsey theory is that “complete disorder is impossible”. In other words, in
a large system, however complicated, there is always a smaller subsystem which exhibits some sort
of special structure. Perhaps the oldest statement of this type is the following.
Proposition 1. Among any six persons, either there are three persons any two of whom are friends,
or there are three persons such that no two of them are friends.
This is not a sociological claim, but a very simple graph-theoretic statement: in other words,
in any graph on 6 vertices, there is either a triangle or three vertices with no edges between them.
Proof. Let G = (V, E) be a graph and |V | = 6. Fix a vertex v ∈ V . We consider two cases.
• If the degree of v is at least 3, then consider three neighbors of v, call them x, y, z. If any two
among {x, y, z} are friends, we are done because they form a triangle together with v. If not,
no two of {x, y, z} are friends and we are done as well.
• If the degree of v is at most 2, then there are at least three other vertices which are not
neighbors of v, call them x, y, z. In this case, the argument is complementary to the previous
one. Either {x, y, z} are mutual friends, in which case we are done. Or there are two among
{x, y, z} who are not friends, for example x and y, and then no two of {v, x, y} are friends.
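Proposition 1 concerns only finitely many graphs, so it can also be confirmed by exhaustive search. The sketch below, with hypothetical function names, checks all $2^{15}$ graphs on 6 vertices and shows that the statement fails on 5 vertices.

```python
from itertools import combinations

def has_mono_triple(n, edge_set):
    """True if some 3 vertices span a triangle or span no edge at all."""
    for trio in combinations(range(n), 3):
        present = sum(1 for pair in combinations(trio, 2) if pair in edge_set)
        if present in (0, 3):
            return True
    return False

def every_graph_has_mono_triple(n):
    """Check every graph on the vertex set {0, ..., n-1}."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        edges = {p for bit, p in enumerate(pairs) if mask >> bit & 1}
        if not has_mono_triple(n, edges):
            return False
    return True

print(every_graph_has_mono_triple(6))
print(every_graph_has_mono_triple(5))
```

The second check fails because of the 5-cycle, which contains neither a triangle nor three pairwise non-adjacent vertices; so six is the smallest number for which the proposition holds.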
More generally, we consider the following setting. We color the edges of Kn (a complete graph
on n vertices) with a certain number of colors and we ask whether there is a complete subgraph (a
clique) of a certain size such that all its edges have the same color. We shall see that this is always
true for a sufficiently large n. Note that the question about friendships corresponds to a coloring of
K6 with 2 colors, “friendly” and “unfriendly”. Equivalently, we start with an arbitrary graph and
we want to find either a clique or the complement of a clique, which is called an independent set.
This leads to the definition of Ramsey numbers.
Definition 1. A clique of size t is a set of t vertices such that all pairs among them are edges.
An independent set of size s is a set of s vertices such that there is no edge between them.
Ramsey’s theorem states that for any large enough graph, there is an independent set of size s
or a clique of size t. The smallest number of vertices required to achieve this is called a Ramsey
number.
Definition 2. The Ramsey number R(s, t) is the minimum number n such that any graph on n
vertices contains either an independent set of size s or a clique of size t.
The Ramsey number $R_k(s_1, s_2, \ldots, s_k)$ is the minimum number n such that any coloring of the
edges of $K_n$ with k colors contains a clique of size $s_i$ in color i, for some i.
Note that it is not clear a priori that Ramsey numbers are finite! Indeed, it could be the case
that there is no finite number satisfying the conditions of R(s, t) for some choice of s, t. However,
the following theorem proves that this is not the case and gives an explicit bound on R(s, t).
Theorem 1 (Ramsey’s theorem). For any s, t ≥ 1, there is R(s, t) < ∞ such that any graph on
R(s, t) vertices contains either an independent set of size s or a clique of size t. In particular,
$$R(s, t) \le \binom{s+t-2}{s-1}.$$
We remark that the bound given here is stronger than Ramsey’s original bound.
Proof. We show that R(s, t) ≤ R(s − 1, t) + R(s, t − 1). To see this, let n = R(s − 1, t) + R(s, t − 1)
and consider any graph G on n vertices. Fix a vertex v ∈ V . Each of the other n − 1 vertices is either a neighbor or a non-neighbor of v, so at least one of the following two cases occurs:
• There are at least R(s, t−1) edges incident with v. Then we apply induction on the neighbors
of v, which implies that either they contain an independent set of size s, or a clique of size
t − 1. In the second case, we can extend the clique by adding v, and hence G contains either
an independent set of size s or a clique of size t.
• There are at least R(s − 1, t) non-neighbors of v. Then we apply induction to the non-neighbors of v and we get either an independent set of size s − 1, or a clique of size t. Again,
the independent set can be extended by adding v and hence we are done.
Given that R(s, t) ≤ R(s − 1, t) + R(s, t − 1), it follows by induction that these Ramsey numbers
are finite. Moreover, we get an explicit bound. First, $R(s, t) \le \binom{s+t-2}{s-1}$ holds for the base cases
where s = 1 or t = 1 since every graph contains a clique or an independent set of size 1. The
inductive step is as follows:
$$R(s, t) \le R(s-1, t) + R(s, t-1) \le \binom{s+t-3}{s-2} + \binom{s+t-3}{s-1} = \binom{s+t-2}{s-1}$$
by a standard identity for binomial coefficients.
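The induction can be mirrored by a short recursion. The sketch below (the function name is hypothetical) unrolls the recurrence with value 1 in the base cases and confirms that the result equals the binomial coefficient of the theorem; it computes the upper bound only, not the Ramsey numbers themselves.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def ramsey_upper_bound(s, t):
    """Bound obtained from R(s,t) <= R(s-1,t) + R(s,t-1) with R(1,t) = R(s,1) = 1."""
    if s == 1 or t == 1:
        return 1
    return ramsey_upper_bound(s - 1, t) + ramsey_upper_bound(s, t - 1)

for s in range(1, 6):
    print([ramsey_upper_bound(s, t) for t in range(1, 6)])
print(all(ramsey_upper_bound(s, t) == comb(s + t - 2, s - 1)
          for s in range(1, 9) for t in range(1, 9)))
```

For s = t = 3 the bound is 6, which is tight; for s = t = 4 it gives 20, while 18 vertices already suffice, so the bound is not sharp in general.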
For a larger number of colors, we get a similar statement.
Theorem 2. For any $s_1, \ldots, s_k \ge 1$, there is $R_k(s_1, \ldots, s_k) < \infty$ such that for any k-coloring of
the edges of $K_n$, $n \ge R_k(s_1, \ldots, s_k)$, there is a clique of size $s_i$ in some color i.
We only sketch the proof here. Let us assume for simplicity that k ≥ 4 is even. We show that
$$R_k(s_1, s_2, \ldots, s_k) \le R_{k/2}(R(s_1, s_2), R(s_3, s_4), \ldots, R(s_{k-1}, s_k)).$$
To prove this, let $n = R_{k/2}(R(s_1, s_2), R(s_3, s_4), \ldots, R(s_{k-1}, s_k))$ and consider any k-coloring of the
edges of $K_n$. We pair up the colors: {1, 2}, {3, 4}, {5, 6}, etc., and regard each pair as a single color, which gives a (k/2)-coloring. By the definition of n, there exists, for some i, a
subset S of $R(s_{2i-1}, s_{2i})$ vertices such that all edges on S use only colors 2i − 1 and 2i. By applying
Ramsey’s theorem once again to S, there is either a clique of size $s_{2i-1}$ in color 2i − 1, or a clique
of size $s_{2i}$ in color 2i.
2 Schur’s theorem
Ramsey theory for integers is about finding monochromatic subsets with a certain arithmetic
structure. It starts with the following theorem of Schur (1916), which turns out to be an easy
application of Ramsey’s theorem for graphs.
Theorem 3. For any k ≥ 2, there is n > 3 such that for any k-coloring of {1, 2, . . . , n}, there are
three integers x, y, z of the same color such that x + y = z.
Proof. We choose $n = R_k(3, 3, \ldots, 3)$, i.e. the Ramsey number such that any k-coloring of $K_n$
contains a monochromatic triangle. Given a coloring c : [n] → [k], we define an edge coloring of
$K_n$: the color of edge {i, j} will be χ({i, j}) = c(|j − i|). By the Ramsey theorem for graphs, there
is a monochromatic triangle {i, j, ℓ}; assume i < j < ℓ. Then we set x = j − i, y = ℓ − j and
z = ℓ − i. We have c(x) = c(y) = c(z) and x + y = z.
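For small k the threshold in Schur's theorem can be found by backtracking. The sketch below is an illustration with a hypothetical function name; it searches for a k-coloring of {1, ..., n} that avoids monochromatic solutions of x + y = z, where x = y is allowed as in the theorem.

```python
def schur_free_coloring(n, k):
    """Return a k-coloring of 1..n with no monochromatic x + y = z, or None."""
    color = [None] * (n + 1)     # color[0] is unused

    def ok(z):
        # z is the largest colored number, so only sums equal to z can be new.
        return not any(color[x] == color[z] == color[z - x]
                       for x in range(1, z // 2 + 1))

    def extend(z):
        if z > n:
            return True
        for c in range(k):
            color[z] = c
            if ok(z) and extend(z + 1):
                return True
        color[z] = None
        return False

    return color[1:] if extend(1) else None

print(schur_free_coloring(4, 2))    # a valid coloring of {1, 2, 3, 4}
print(schur_free_coloring(5, 2))    # None: n = 5 forces a solution for k = 2
print(schur_free_coloring(13, 3))   # a valid coloring of {1, ..., 13}
print(schur_free_coloring(14, 3))   # None: n = 14 forces a solution for k = 3
```

These thresholds are far smaller than the Ramsey number used in the proof, which is only an upper bound.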
Schur used this in his work related to Fermat’s Last Theorem. More specifically, he proved that
Fermat’s Last Theorem is false in the finite field Zp for any sufficiently large prime p.
Theorem 4. For every m ≥ 1, there is $p_0$ such that for any prime $p \ge p_0$, the congruence
$$x^m + y^m \equiv z^m \pmod{p}$$
has a solution with x, y, z not divisible by p.
Proof. The multiplicative group $\mathbb{Z}_p^*$ is known to be cyclic and hence it has a generator g. Each
element of $\mathbb{Z}_p^*$ can be written as $x = g^{mj+i}$ where 0 ≤ i < m. We color the elements of $\mathbb{Z}_p^*$ by m
colors, where c(x) = i if $x = g^{mj+i}$. By Schur’s theorem, for p sufficiently large, there are elements
$x', y', z' \in \mathbb{Z}_p^*$ such that x′ + y′ = z′ and c(x′) = c(y′) = c(z′). Therefore, $x' = g^{m j_x + i}$, $y' = g^{m j_y + i}$
and $z' = g^{m j_z + i}$ and
$$g^{m j_x + i} + g^{m j_y + i} = g^{m j_z + i}.$$
Dividing by $g^i$ and setting $x = g^{j_x}$, $y = g^{j_y}$ and $z = g^{j_z}$, we get a solution of $x^m + y^m = z^m$ in $\mathbb{Z}_p$.
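A direct search shows why the theorem needs p to be large. The sketch below (hypothetical function name) looks for x, y, z in {1, ..., p − 1} with $x^m + y^m \equiv z^m \pmod p$ by tabulating the nonzero m-th powers.

```python
def fermat_solution_mod_p(m, p):
    """Return (x, y, z) with 1 <= x, y, z < p and x^m + y^m = z^m (mod p), or None."""
    root = {}
    for z in range(1, p):
        root.setdefault(pow(z, m, p), z)   # one m-th root for each nonzero m-th power
    for x in range(1, p):
        for y in range(x, p):
            s = (pow(x, m, p) + pow(y, m, p)) % p
            if s in root:
                return x, y, root[s]
    return None

for p in (7, 11, 13, 19):
    print(p, fermat_solution_mod_p(3, p))
```

For m = 3 the search finds nothing modulo 7 and modulo 13, where the nonzero cubes are {1, 6} and {1, 5, 8, 12} respectively and no two of them sum to a third, while modulo 11 and 19 solutions exist.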
Lecture 5B: Ramsey theory: integers
1 Hales-Jewett and Van der Waerden’s theorem
The question of finding monochromatic subsets with some additive structure was investigated more
generally by Van der Waerden. He proved the following general theorem. We remark that this still
predates Ramsey’s theorem for graphs.
Theorem 1 (Van der Waerden, 1927). For any k ≥ 2, ℓ ≥ 3, there is n such that any k-coloring of
[n] contains a monochromatic arithmetic progression of length ℓ: {a, a + b, a + 2b, . . . , a + (ℓ − 1)b}.
Later, Hales and Jewett discovered the following Ramsey theorem, from which many Ramsey-type statements can be easily deduced. This theorem is about colorings of sequences of n symbols
from some alphabet A. We denote the set of such sequences by $A^n$. Geometrically, this can be
viewed as an n-dimensional cube.
Definition 1. A combinatorial line in $A^n$ is a set of |A| points defined by $x \in A^n$ and a nonempty set $S \subseteq [n]$:
$$L(x, S) = \{y \in A^n : y_i = y_j \ \forall i, j \in S \ \text{and} \ y_i = x_i \ \forall i \notin S\}.$$
In other words, we fix the coordinates outside of S, while the coordinates in S all vary simultaneously
over the symbols in A.
Theorem 2 (Hales-Jewett, 1963). For any finite alphabet A and k ≥ 2, there is $n_0(A, k)$ such that
for any k-coloring of $A^n$, $n \ge n_0(A, k)$, there exists a monochromatic combinatorial line.
We will not prove this theorem in this class, although it can be done in one lecture (and the
proof can be found in Jukna’s book). We will only show how Van der Waerden’s theorem follows
from Hales-Jewett.
Given k and ℓ, we fix an alphabet A = {1, 2, . . . , ℓ} and choose $n = \ell \cdot n_0$, where $n_0 = n_0(A, k)$ is
large enough according to the Hales-Jewett theorem. Given a k-coloring c : [n] → [k], we define an
induced coloring $\chi : A^{n_0} \to [k]$:
$$\chi(a_1, a_2, \ldots, a_{n_0}) = c(a_1 + a_2 + \ldots + a_{n_0}).$$
By the Hales-Jewett theorem, there is a monochromatic combinatorial line L(x, S) in $A^{n_0}$. It is easy to see
that this combinatorial line translates under the mapping $(a_1, \ldots, a_{n_0}) \to a_1 + \ldots + a_{n_0}$ into an
arithmetic progression of length ℓ with common difference |S|, and all its terms have the same color under c.
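The last step can be made concrete with a small example. The Python sketch below is our own illustration (0-based coordinates, hypothetical function name). It lists the points of a combinatorial line L(x, S) and sums their coordinates, which yields an arithmetic progression with difference |S|.

```python
def combinatorial_line(x, S, alphabet):
    """Points of L(x, S): coordinates with index in S all equal the same
    symbol a, the remaining coordinates are copied from x."""
    return [tuple(a if i in S else x[i] for i in range(len(x))) for a in alphabet]


ell = 4
alphabet = range(1, ell + 1)      # A = {1, ..., ell}
x = (2, 1, 4, 3, 1)               # a point of A^5; entries at indices in S are ignored
S = {0, 3}                        # the "moving" coordinates

line = combinatorial_line(x, S, alphabet)
sums = [sum(point) for point in line]   # image under (a_1, ..., a_n) -> a_1 + ... + a_n
```

Here the fixed coordinates contribute 1 + 4 + 1 = 6 and the two moving coordinates contribute 2a for a = 1, . . . , 4, so the sums are 8, 10, 12, 14.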
2 Szemerédi’s theorem
Much later, Szemerédi proved that arithmetic progressions can be found not only in any k-coloring,
but in fact in any set of sufficient density.
Theorem 3 (Szemerédi). For any δ > 0 and ℓ ≥ 3, there is $n_0$ such that for any $n \ge n_0$ and any
set S ⊆ [n], |S| ≥ δn, S contains an arithmetic progression of length ℓ.
It can be seen that this implies Van der Waerden’s theorem, since we can set δ = 1/k and for any
k-coloring of [n], one color class contains at least δn elements. However, the proof of Szemerédi’s
theorem is much more complicated (it uses the Szemerédi regularity lemma).
Recently, there has been renewed interest in this theory, in particular the relationship between
the magnitude of $n_0$ and the density δ. The best known bounds are due to Ben Green and
Terence Tao, who showed that it is enough to choose $n_0 \sim 2^{1/\delta^c}$ for some constant c. This is
a significant improvement over the original bounds of Szemerédi that involve tower functions of
1/δ. These developments also led to the celebrated result of Green and Tao that prime numbers
contain arbitrarily long arithmetic progressions.
Lecture 6: Ramsey theory: continued
1 Bounds on Ramsey numbers
Ramsey numbers of particular interest are the diagonal Ramsey numbers R(s, s). The bound we
have proved gives
$$R(s, s) \le \binom{2s-2}{s-1} \le \frac{4^s}{\sqrt{s}}.$$
This bound has not been improved significantly for over 50 years! All we know currently is that
exponential growth is the right order of magnitude, but the base of the exponential is not known.
The following is an old lower bound of Erdős. Note that to get a lower bound, we need to show
that there is a large graph without cliques and independent sets of a certain size. Equivalently, we
need to prove there is a 2-coloring such that there is no monochromatic clique of a certain size in
either color. This is quite difficult to achieve by an explicit construction. (The early lower bounds
on R(s, s) were polynomial in s.)
The interesting feature of Erdős’s proof is that he never presents a specific coloring. He simply
proves that choosing a coloring at random almost always works! This was one of the first occurrences
of the probabilistic method in combinatorics. The probabilistic method has been used in combinatorics ever since with phenomenal success, using much more sophisticated tools; we will return to
this later.
Theorem 1. For s ≥ 3,
$$R(s, s) > 2^{s/2}.$$
Proof. Let $n = 2^{s/2}$ (rounded down if this is not an integer). Consider a random coloring of $K_n$ where each edge is colored independently
red or blue with probability 1/2. For any particular s-tuple of vertices S, the probability that the
clique on S has all edges of the same color is $2/2^{\binom{s}{2}}$. The number of s-tuples of vertices is $\binom{n}{s}$ and
therefore the probability that some s-clique is monochromatic is at most
$$\binom{n}{s} \frac{2}{2^{\binom{s}{2}}} < \frac{2n^s}{s! \, 2^{s(s-1)/2}} = \frac{2^{1+s/2}}{s!} < 1$$
for s ≥ 3. Therefore, with non-zero probability, there is no monochromatic clique of size s and such
a coloring certifies that $R(s, s) > 2^{s/2}$.
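The union bound in this proof is easy to evaluate exactly. The following Python sketch is our own illustration, not from the notes. It takes n to be $2^{s/2}$ rounded down and computes the bound $\binom{n}{s} \cdot 2/2^{\binom{s}{2}}$ as an exact fraction.

```python
from fractions import Fraction
from math import comb, isqrt


def monochromatic_clique_bound(s):
    """Return (n, bound) where n = floor(2^(s/2)) and bound is the union
    bound on the probability that a uniformly random 2-coloring of the
    edges of K_n contains a monochromatic clique of size s."""
    n = isqrt(2 ** s)  # floor of the square root of 2^s, i.e. floor(2^(s/2))
    bound = Fraction(2 * comb(n, s), 2 ** comb(s, 2))
    return n, bound
```

Whenever the returned bound is below 1, some coloring of $K_n$ has no monochromatic s-clique. The computation only certifies existence and does not produce such a coloring.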
Determining Ramsey numbers exactly, even for small values of s, is a notoriously difficult task.
The currently known diagonal values are: R(2, 2) = 2, R(3, 3) = 6, R(4, 4) = 18. R(5, 5) is known
to be somewhere between 43 and 49, and R(6, 6) between 102 and 165.¹
¹ A famous quote from Paul Erdős goes as follows: “Imagine an alien force, vastly more powerful than us, demanding
the value of R(5, 5) or they will destroy our planet. In that case, we should marshal all our computers and all our
mathematicians and attempt to find the value. But suppose, instead, that they ask for R(6, 6). Then we should
attempt to destroy the aliens.”
Lecture 7: Ramsey theory: continued
1 Ramsey’s theorem for hypergraphs
Next, we will talk about “generalized graphs” which are called hypergraphs. While a graph is
given by a collection of pairs (edges) on a set of vertices V , a hypergraph can contain “hyperedges”
of arbitrary size. Thus a hypergraph in full generality is any collection of subsets of V . Hypergraphs
of particular importance are r-uniform hypergraphs, where all hyperedges have size r. Thus graphs
can be viewed as 2-uniform hypergraphs.
Definition 1. An r-uniform hypergraph is a pair H = (V, E) where V is a finite set of vertices
and $E \subseteq \binom{V}{r}$ is a set of hyperedges.
An empty hypergraph is a hypergraph with no hyperedges.
A complete r-uniform hypergraph is $K_n^{(r)} = (V, \binom{V}{r})$ where |V| = n.
A subhypergraph of H induced by a set of vertices S is $H[S] = (S, E \cap \binom{S}{r})$.
An independent set is a set of vertices S that induces an empty hypergraph.
A clique is a set of vertices T that induces a complete hypergraph.
We define Ramsey numbers for hypergraphs in a way similar to the previous lecture.
Definition 2. The hypergraph Ramsey number $R^{(r)}(s, t)$ is the minimum number n such that any
r-uniform hypergraph on n vertices contains either an independent set of size s or a clique of size
t.
The Ramsey number $R_k^{(r)}(s_1, s_2, \ldots, s_k)$ is the minimum number n such that any coloring of the
edges of the complete hypergraph $K_n^{(r)}$ with k colors contains a clique of size $s_i$ whose edges all have
color i, for some i.
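For very small parameters the definition can be tested by brute force. The Python sketch below is an illustration with hypothetical names. It encodes each r-uniform hypergraph on {0, . . . , n − 1} as a bitmask over all possible hyperedges and checks whether every such hypergraph has an independent set of size s or a clique of size t.

```python
from itertools import combinations


def has_ramsey_property(n, r, s, t):
    """True if every r-uniform hypergraph on n vertices contains an
    independent set of size s or a clique of size t.
    Enumerates 2^C(n, r) hypergraphs, so only tiny cases are feasible."""
    edges = list(combinations(range(n), r))
    index = {e: i for i, e in enumerate(edges)}

    def edge_mask(vertices):
        # Bitmask of all r-subsets lying inside the given vertex set.
        mask = 0
        for e in combinations(vertices, r):
            mask |= 1 << index[e]
        return mask

    indep_masks = [edge_mask(S) for S in combinations(range(n), s)]
    clique_masks = [edge_mask(T) for T in combinations(range(n), t)]
    for H in range(1 << len(edges)):
        if any((H & m) == 0 for m in indep_masks):
            continue  # H has an independent set of size s
        if any((H & m) == m for m in clique_masks):
            continue  # H has a clique of size t
        return False  # H is a counterexample
    return True
```

With r = 2 this recovers the graph case, e.g. the property holds for (s, t) = (3, 3) on 6 vertices but fails on 5 vertices.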
Theorem 1 (Ramsey for hypergraphs). For any s, t ≥ r ≥ 1, the Ramsey number $R^{(r)}(s, t)$ is
finite and satisfies
$$R^{(r)}(s, t) \le R^{(r-1)}\big(R^{(r)}(s-1, t), \ R^{(r)}(s, t-1)\big) + 1.$$
Similarly, it can be shown that the hypergraph Ramsey numbers for k colors are finite.
Proof. We know from the previous lecture that the Ramsey numbers $R^{(2)}(s, t) = R(s, t)$ are finite.
It’s also easy to see that $R^{(r)}(s, r) = s$ (because any hypergraph on s vertices is either empty or it
contains some edge on r vertices). Similarly $R^{(r)}(r, t) = t$.
So it remains to prove the inductive step. Fix r, s, t and assume that $R^{(r)}(s-1, t)$, $R^{(r)}(s, t-1)$
and $R^{(r-1)}(u, v)$ for all u, v are finite. Let $n = R^{(r-1)}\big(R^{(r)}(s-1, t), R^{(r)}(s, t-1)\big) + 1$ and consider
any r-uniform hypergraph H on n vertices. Fix a vertex v ∈ V and define an (r − 1)-uniform
hypergraph L(v) on the remaining vertices: an (r − 1)-tuple R′ is in L(v), if and only if R′ ∪ {v}
is a hyperedge of H. (L(v) is sometimes called the link of v. It generalizes the notion of a neighborhood in a graph.)
Since L(v) is an (r − 1)-uniform hypergraph on $R^{(r-1)}\big(R^{(r)}(s-1, t), R^{(r)}(s, t-1)\big)$ vertices, it
has either an independent set of size $R^{(r)}(s-1, t)$ or a clique of size $R^{(r)}(s, t-1)$.
• If L(v) has an independent set S of size $R^{(r)}(s-1, t)$, then we know that for any (r − 1)-tuple
$R' \in \binom{S}{r-1}$, {v} ∪ R′ is not a hyperedge in H. Also, by induction we can apply the Ramsey
property to the induced hypergraph H[S]: it contains either an independent set $S_1$ of size
s − 1 or a clique $T_1$ of size t. In the first case, $S_1$ ∪ {v} is an independent set of size s in H,
and in the second case, $T_1$ is a clique of size t in H.
• If L(v) has a clique T of size $R^{(r)}(s, t-1)$, then we know that for any (r − 1)-tuple $R' \in \binom{T}{r-1}$,
R′ ∪ {v} is a hyperedge of H. Again, by induction on H[T], we obtain that H[T] has either
an independent set $S_1$ of size s or a clique $T_1$ of size t − 1. In the first case, we are done and
in the second case, $T_1$ ∪ {v} is a clique of size t in H.
The proof that the Ramsey numbers $R_k^{(r)}(s_1, s_2, \ldots, s_k)$ are finite is more complicated but in
the same spirit. Let us just state a corollary of this statement, Ramsey’s theorem for coloring sets.
Theorem 2. For any t ≥ r ≥ 2 and k ≥ 2, there is n such that for any coloring $\chi : \binom{[n]}{r} \to [k]$,
there is a subset T ⊂ [n] of size |T| = t such that all subsets $R \in \binom{T}{r}$ have the same color χ(R).
Note that this corresponds to a special case where we color a complete r-uniform hypergraph
with k colors and we want to find a monochromatic clique of size t in some color. Therefore, n can
be chosen as $n = R_k^{(r)}(t, t, \ldots, t)$.
2 Convex polygons among points in the plane
Ramsey theory has many applications. The following is a geometric statement which follows quite
surprisingly from Ramsey’s theorem for 4-uniform hypergraphs.
Theorem 3 (Erdős-Szekeres, 1935). For any m ≥ 4, there is n, such that given any configuration
of n points in the plane, no three on the same line, there are m points forming a convex polygon.
To deduce this theorem from Ramsey’s theorem, we need only the following two geometric facts.
We say that points are in a convex position if they form a convex polygon.
Fact 1. Among five points in the plane, no three on the same line, there are always four points in
a convex position.
Fact 2. If out of m points, any four are in a convex position, then they are all in a convex position.
The first fact can be seen by checking three cases, depending on whether the convex hull of the
5 points consists of 3,4 or 5 points. Note that this proves the statement of the theorem for m = 4.
The second fact holds because if one of the m points is inside the convex hull of others, then
it is also inside a triangle formed by 3 other points and then these 4 are not in a convex position.
Now we are ready to prove the theorem.
Proof. Given m, let $n = R^{(4)}(5, m)$ and consider any set of n points in the plane, no three on the
same line. We define a 4-uniform hypergraph H on this set of n points. Let a 4-tuple of points
u, v, x, y form a hyperedge if they are in a convex position. By Ramsey’s theorem, H must contain
either an independent set of size 5 or a clique of size m. However, an independent set of size 5
would mean that there are 5 points without any 4-tuple in a convex position. This would contradict
Fact 1. Therefore, there must be a clique of size m which means m points where all 4-tuples are in
a convex position. Then Fact 2 implies that these m points form a convex polygon.
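Fact 1 can be spot-checked numerically. The Python sketch below is our own illustration (integer grid points, hypothetical helper names). It declares a point set to be in convex position when no point lies strictly inside a triangle formed by three others, which for points in general position is the criterion used in the argument for Fact 2.

```python
import random
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (exact for integer points)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_general_position(points):
    return all(orient(a, b, c) != 0 for a, b, c in combinations(points, 3))


def inside_triangle(p, a, b, c):
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def in_convex_position(points):
    for p in points:
        others = [q for q in points if q != p]
        if any(inside_triangle(p, a, b, c) for a, b, c in combinations(others, 3)):
            return False
    return True


def four_in_convex_position(points):
    return any(in_convex_position(list(q)) for q in combinations(points, 4))


def check_fact_1(trials, seed):
    """Test Fact 1 on random 5-point sets in general position."""
    rng = random.Random(seed)
    checked = 0
    while checked < trials:
        pts = [(rng.randrange(50), rng.randrange(50)) for _ in range(5)]
        if not in_general_position(pts):
            continue  # skip degenerate samples
        if not four_in_convex_position(pts):
            return False
        checked += 1
    return True
```

A random test of this kind is of course no proof; it only illustrates how the hyperedges of H in the proof above could be computed for a concrete point set.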
Lecture 8: Extremal combinatorics
1 Relationship between Ramsey theory and extremal theory
Consider the following theorem, which falls within the framework of Ramsey theory.
Theorem 1 (Van der Waerden, 1927). For any k ≥ 2, ℓ ≥ 3, there is n such that any k-coloring of
[n] contains a monochromatic arithmetic progression of length ℓ: {a, a + b, a + 2b, . . . , a + (ℓ − 1)b}.
This is a classical theorem which predates even Ramsey’s theorem about graphs. We are not
going to present the proof here. Note, however, that in order to prove such a statement, it would
be enough to show that for a sufficiently large [n], any subset of at least n/k elements contains an
arithmetic progression of length ℓ. This is indeed what Szemerédi proved, much later and using
much more involved techniques.
Theorem 2 (Szemerédi). For any δ > 0 and ℓ ≥ 3, there is $n_0$ such that for any $n \ge n_0$ and any
set S ⊆ [n], |S| ≥ δn, S contains an arithmetic progression of length ℓ.
It can be seen that this implies Van der Waerden’s theorem, since we can set δ = 1/k and for any
k-coloring of [n], one color class contains at least δn elements. Szemerédi’s theorem is an extremal
type of statement, stating that any object of sufficient size must contain a certain structure.
2 Bipartite graphs
Definition 1. A graph G is called bipartite, if the vertices can be partitioned into V1 and V2 , so
that there are no edges inside V1 and no edges inside V2 .
Equivalently, G is bipartite if its vertices can be colored with 2 colors so that the endpoints of
every edge get two different colors. (The 2 colors correspond to V1 and V2 .) Thus, bipartite graphs
are equivalently called 2-colorable.
We also have the following characterization, which is useful to know.
Lemma 1. G is bipartite, if and only if it does not contain any cycle of odd length.
Proof. Suppose G has an odd cycle. Then obviously it cannot be bipartite, because no odd cycle
is 2-colorable.
Conversely, suppose G has no odd cycle. Then we can color the vertices greedily by 2 colors,
always choosing a different color for a neighbor of some vertex which has been colored already. Any
additional edges are consistent with our coloring, otherwise they would close a cycle of odd length
with the edges we considered already.
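The greedy coloring in this proof corresponds to a breadth-first search. The Python sketch below is an illustration, assuming the graph is given as a dictionary mapping each vertex to a list of its neighbors. It returns a proper 2-coloring, or None when two adjacent vertices are forced to share a color, which happens exactly when an odd cycle is present.

```python
from collections import deque


def two_color(adj):
    """Return a dict vertex -> 0/1 that properly 2-colors the graph,
    or None if the graph is not bipartite."""
    color = {}
    for start in adj:
        if start in color:
            continue  # this component was already colored
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # neighbor gets the other color
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # an edge inside one color class
    return color


cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
```

On the 4-cycle the function returns a valid coloring, and on the 5-cycle it returns None.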
The easiest extremal question is about the maximum possible number of edges in a bipartite
graph on n vertices.
Lemma 2. A bipartite graph on n vertices can have at most $\frac14 n^2$ edges.
Proof. Suppose the bipartition is (V1 , V2 ) and |V1 | = k, |V2 | = n − k. The number of edges between
V1 and V2 can be at most k(n − k), which is maximized for k = n/2.
3 Graphs without a triangle
Let us consider Ramsey’s theorem for graphs, which guarantees the existence of a monochromatic
triangle for an arbitrary coloring of the edges. An analogous extremal question is, what is the
largest number of edges in a graph that does not have any triangle? We remark that this is not
the right way to prove Ramsey’s theorem - even for triangles, it is not true that for any 2-coloring
of a large complete graph, the larger color class must contain a triangle.
Exercise: what is a counterexample?
The question how many edges are necessary to force a graph to contain a triangle is very old
and it was resolved by the following theorem.
Theorem 3 (Mantel, 1907). For any graph G with n vertices and more than $\frac14 n^2$ edges, G contains
a triangle.
Proof. Assume that G has n vertices, m edges and no triangle. Let $d_x$ denote the degree of x ∈ V.
Whenever (x, y) ∈ E, we know that x and y cannot share a neighbor (which would form a triangle),
and therefore $d_x + d_y \le n$. Summing up over all edges, and noting that each vertex x appears in exactly $d_x$ edges, we get
$$mn \ge \sum_{(x,y) \in E} (d_x + d_y) = \sum_{x \in V} d_x^2.$$
On the other hand, applying Cauchy-Schwarz to the vectors $(d_1, d_2, \ldots, d_n)$ and (1, 1, . . . , 1), we
obtain
$$n \sum_{x \in V} d_x^2 \ge \Big( \sum_{x \in V} d_x \Big)^2 = (2m)^2.$$
Combining these two inequalities, we get $mn^2 \ge 4m^2$ and conclude that $m \le \frac14 n^2$.
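For small n, the bound can be compared with an exhaustive search. The following Python sketch is our own, with a hypothetical function name. It enumerates all graphs on n labeled vertices and records the largest number of edges in a triangle-free one.

```python
from itertools import combinations


def max_triangle_free_edges(n):
    """Largest number of edges in a triangle-free graph on n vertices,
    found by enumerating all 2^C(n, 2) graphs (small n only)."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if (mask >> i) & 1}
        if len(edges) <= best:
            continue  # cannot improve the current record
        has_triangle = any(
            (a, b) in edges and (a, c) in edges and (b, c) in edges
            for a, b, c in triples
        )
        if not has_triangle:
            best = len(edges)
    return best
```

For n up to 6 the search returns the integer part of $n^2/4$, in agreement with the theorem and with the complete bipartite graph as an extremal example.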
We remark that the analysis above can be tight only if for every edge, any other vertex is
connected to exactly one of the two endpoints. This defines a partition $V_1 \cup V_2$ such that we have
all edges between $V_1$ and $V_2$, i.e. a complete bipartite graph. When $|V_1| = |V_2|$, this is the unique
extremal graph without a triangle, containing $\frac14 n^2$ edges.
4 Graphs without a clique $K_{t+1}$
More generally, it is interesting to ask how many edges G can have if G does not contain any clique
$K_{t+1}$. Graphs without $K_{t+1}$ can be constructed for example by taking t disjoint sets of vertices,
$V = V_1 \cup \ldots \cup V_t$, and inserting all edges between vertices in different sets. Now, obviously there is no
$K_{t+1}$, since any set of t + 1 vertices has two vertices in the same set $V_i$. The number of edges in such
a graph is maximized when the sets $V_i$ are as evenly sized as possible, i.e. $|V_i| - |V_j| \in \{-1, 0, +1\}$
for all i, j. We call such a graph on n vertices the Turán graph $T_{n,t}$. Turán proved in 1941 that
this is indeed the graph without $K_{t+1}$ containing the maximum number of edges. Note that the
number of edges in $T_{n,t}$ is $\frac12 (1 - \frac1t) n^2$, assuming for simplicity that n is divisible by t.
Theorem 4 (Turán, 1941). Among all $K_{t+1}$-free graphs on n vertices, $T_{n,t}$ has the most edges.
Proof. Let G be a graph without $K_{t+1}$ and $v_m$ a vertex of maximum degree $d_m$. Let S be the set
of neighbors of $v_m$, $|S| = d_m$, and T = V \ S. Note that by assumption, S has no clique of size t.
We modify the graph into G′ as follows: we keep the graph inside S, we include all possible
edges between S and T, and we remove all edges inside T. For each vertex, the degree can only
increase: for vertices in S, this is obvious, and for vertices in T, the new degrees are at least $d_m$,
i.e. at least as large as any degree in G. Thus the total number of edges can only increase.
By induction, we can prove that G[S] can be also modified into a union of disjoint independent
sets with all edges between them. Therefore, the best possible graph has the structure of a Turán
graph.
To prove that the Turán graph is the unique extremal graph, we note that if G had any edges
inside T, then we strictly gain by modifying the graph into G′.
We present another proof of Turán’s theorem, which is probabilistic. Here, we only prove the
quantitative part, that $\frac12 (1 - \frac1t) n^2$ is the maximum number of edges in a graph without $K_{t+1}$.
Proof. Let’s consider a probability distribution on the vertices, $p_1, \ldots, p_n$ such that $\sum_{i=1}^n p_i = 1$.
We start with $p_i = 1/n$ for all vertices. Suppose we sample two vertices $v_1, v_2$ independently
according to this distribution - what is the probability that $\{v_1, v_2\} \in E$? We can write this
probability as
$$\Pr[\{v_1, v_2\} \in E] = \sum_{i,j : \{i,j\} \in E} p_i p_j,$$
where the sum runs over ordered pairs (i, j). At the beginning, this is equal to $\frac{2}{n^2} |E|$.
Now we modify the distribution in order to make $\Pr[\{v_1, v_2\} \in E]$ as large as possible. We
claim that the probability distribution that maximizes this probability is uniform on some maximal
clique.
We proceed as follows: If there are two non-adjacent vertices i, j such that $p_i, p_j > 0$, let
$s_i = \sum_{k : \{i,k\} \in E} p_k$ and $s_j = \sum_{k : \{j,k\} \in E} p_k$. If $s_i \ge s_j$, we set the probability of vertex i to $p_i + p_j$
and the probability of vertex j to 0 (and conversely if $s_i < s_j$). It can be verified that this changes
$\Pr[\{v_1, v_2\} \in E]$ by $2p_j (s_i - s_j) \ge 0$ or $2p_i (s_j - s_i) > 0$, respectively, so the probability never decreases.
Eventually, we reach a situation where there are no two non-adjacent vertices of positive probability, i.e. the distribution is on a clique Q. Then,
$$\Pr[\{v_1, v_2\} \in E] = \Pr[v_1 \ne v_2] = 1 - \sum_{i \in Q} p_i^2.$$
By Cauchy-Schwarz, this is maximized when $p_i$ is uniform on Q, i.e.
$$\Pr[\{v_1, v_2\} \in E] \le 1 - \frac{1}{|Q|} \le 1 - \frac1t$$
assuming that there is no clique larger than t. Recall that the probability we started with was
$\frac{2}{n^2} |E|$ and we never decreased it in the process. Therefore,
$$|E| \le \Big(1 - \frac1t\Big) \frac{n^2}{2}.$$
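The weight-shifting procedure in this proof can be run on a concrete graph. The Python sketch below is an illustration under our own conventions (adjacency given as a dictionary of neighbor sets, exact arithmetic with fractions, hypothetical function names). It starts from the uniform distribution, repeatedly merges the weight of two non-adjacent vertices of positive probability, and records the edge probability after each step.

```python
from fractions import Fraction
from itertools import combinations


def edge_probability(p, adj):
    """Pr[{v1, v2} in E] for independent samples: sum over ordered adjacent pairs."""
    return sum(p[i] * p[j] for i in adj for j in adj[i])


def shift_to_clique(adj):
    """Run the weight-shifting process; return the final distribution and
    the list of edge probabilities observed along the way."""
    n = len(adj)
    p = {v: Fraction(1, n) for v in adj}
    history = [edge_probability(p, adj)]
    while True:
        support = [v for v in p if p[v] > 0]
        pair = next(
            ((i, j) for i, j in combinations(support, 2) if j not in adj[i]), None
        )
        if pair is None:
            return p, history  # the support is now a clique
        i, j = pair
        s_i = sum(p[k] for k in adj[i])
        s_j = sum(p[k] for k in adj[j])
        if s_i < s_j:
            i, j = j, i  # always move weight to the vertex with larger s
        p[i] += p[j]
        p[j] = Fraction(0)
        history.append(edge_probability(p, adj))


cycle5 = {v: {(v - 1) % 5, (v + 1) % 5} for v in range(5)}
final_p, history = shift_to_clique(cycle5)
```

On the 5-cycle, which has no triangle (t = 2), the recorded probabilities start at 2|E|/n² = 2/5, never decrease, and end at a value of at most 1 − 1/2.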
Lecture 9: Extremal combinatorics
1 The Erdős-Stone theorem
We can ask more generally, what is the maximum number of edges in a graph G on n vertices,
which does not contain a given subgraph H? We denote this number by ex(n, H). For graphs G
on n vertices, this question is resolved up to an additive error of $o(n^2)$ by the Erdős-Stone theorem.
In order to state the theorem, we first need the notion of a chromatic number.
Definition 1. For a graph H, the chromatic number χ(H) is the smallest c such that the vertices
of H can be colored with c colors with no neighboring vertices receiving the same color.
The chromatic number is an important parameter of a graph. The graphs of chromatic number
at most 2 are exactly bipartite graphs. In contrast, graphs of chromatic number 3 are already
hard to describe and hard to recognize algorithmically. Let us also mention the famous Four Color
Theorem which states that any graph that can be drawn in the plane without crossing edges has
chromatic number at most 4.
The chromatic number of H turns out to be closely related to the question of how many edges
are necessary for H to appear as a subgraph.
Theorem 1 (Erdős-Stone). For any fixed graph H and fixed ε > 0, there is $n_0$ such that for any
$n \ge n_0$,
$$\frac12 \Big(1 - \frac{1}{\chi(H) - 1} - \epsilon\Big) n^2 \le ex(n, H) \le \frac12 \Big(1 - \frac{1}{\chi(H) - 1} + \epsilon\Big) n^2.$$
In particular, for bipartite graphs H, which can be colored with 2 colors, we get that ex(n, H) ≤
$\epsilon n^2$ for any ε > 0 and sufficiently large n, so the theorem only says that the extremal number
is very small compared to $n^2$. We denote this by $ex(n, H) = o(n^2)$. For graphs H of chromatic
number 3, we get $ex(n, H) = \frac14 n^2 + o(n^2)$, etc. Note that this also matches the bound we obtained
for $H = K_{t+1}$ ($\chi(K_{t+1}) = t + 1$), where we got the exact answer $ex(n, K_{t+1}) = \frac12 (1 - \frac1t) n^2$ when n is divisible by t.
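To read off the leading coefficient for a specific forbidden graph, one only needs its chromatic number. The Python sketch below is an illustration with hypothetical names. It computes χ(H) by trying all colorings, which is feasible only for very small H, and returns the coefficient $\frac12 (1 - \frac{1}{\chi(H) - 1})$ of $n^2$.

```python
from fractions import Fraction
from itertools import product


def chromatic_number(vertices, edges):
    """Smallest c such that the graph has a proper coloring with c colors
    (brute force over all c^|V| assignments)."""
    vertices = list(vertices)
    for c in range(1, len(vertices) + 1):
        for assignment in product(range(c), repeat=len(vertices)):
            color = dict(zip(vertices, assignment))
            if all(color[u] != color[v] for u, v in edges):
                return c
    return 0  # graph with no vertices


def erdos_stone_coefficient(vertices, edges):
    """Coefficient of n^2 in ex(n, H); assumes H has at least one edge."""
    chi = chromatic_number(vertices, edges)
    return Fraction(1, 2) * (1 - Fraction(1, chi - 1))


c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
```

The coefficient is 0 for the 4-cycle, 1/4 for the 5-cycle and 1/3 for $K_4$. In the bipartite case the value 0 only says that ex(n, H) is $o(n^2)$.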
First, we prove the following technical lemma.
Lemma 1. Fix k ≥ 1, 0 < ε < 1/k and t ≥ 1. Then there is $n_0(k, \epsilon, t)$ such that any graph G
with $n \ge n_0(k, \epsilon, t)$ vertices and $m \ge \frac12 (1 - 1/k + \epsilon) n^2$ edges contains k + 1 disjoint sets of vertices
$A_1, A_2, \ldots, A_{k+1}$ of size t, such that any two vertices in different sets $A_i, A_j$ are joined by an edge.
Proof. First, we will find a subgraph G′ ⊂ G where all degrees are at least (1 − 1/k + ε/2)|V(G′)|.
The procedure to find such a subgraph is very simple: as long as there is a vertex of degree smaller
than (1 − 1/k + ε/2)|V(G)|, where G denotes the current graph, remove the vertex from the graph. We just have to prove that this
procedure terminates before the graph becomes too small.
Suppose that the procedure stops when the graph has $n_0$ vertices (potentially $n_0 = 0$, but we
will prove that this is impossible). Let’s count the total number of edges that we have removed
from the graph. At the point when G has ℓ vertices, we remove at most (1 − 1/k + ε/2)ℓ edges.
Therefore, the total number of removed edges is at most
$$\sum_{\ell = n_0 + 1}^{n} \Big(1 - \frac1k + \frac{\epsilon}{2}\Big) \ell = \Big(1 - \frac1k + \frac{\epsilon}{2}\Big) (n - n_0) \frac{n + n_0 + 1}{2} \le \Big(1 - \frac1k + \frac{\epsilon}{2}\Big) \frac{n^2 - n_0^2}{2} + \frac{n - n_0}{2}.$$
At the end, G has at most $\frac12 n_0^2$ edges. Therefore, the number of edges in the original graph must
have been
$$|E(G)| \le \Big(1 - \frac1k + \frac{\epsilon}{2}\Big) \frac{n^2 - n_0^2}{2} + \frac{n - n_0}{2} + \frac12 n_0^2 = \Big(1 - \frac1k + \frac{\epsilon}{2}\Big) \frac{n^2}{2} + \Big(\frac1k - \frac{\epsilon}{2}\Big) \frac{n_0^2}{2} + \frac{n - n_0}{2}.$$
On the other hand, we assumed that $|E(G)| \ge (1 - \frac1k + \epsilon) \frac{n^2}{2}$. Combining these two inequalities,
we obtain that
$$\Big(\frac1k - \frac{\epsilon}{2}\Big) \frac{n_0^2}{2} - \frac{n_0}{2} \ge \frac{\epsilon n^2}{4} - \frac{n}{2}.$$
Thus if we want to get $n_0$ large enough, it’s sufficient to choose n appropriately larger (roughly
$n \simeq n_0 / \sqrt{\epsilon k}$).
From now on, we can assume that all degrees in G are at least (1 − 1/k + ε/2)n. We prove by
induction on k that there are k + 1 sets of size t such that we have all edges between vertices in
different sets. For k = 1, there is nothing to prove.
Let k ≥ 2 and $s = \lceil t/\epsilon \rceil$. By the induction hypothesis, we can find k disjoint sets of size
s, $A_1, \ldots, A_k$ such that any two vertices in two different sets are joined by an edge. Let U =
$V \setminus (A_1 \cup \ldots \cup A_k)$ and let W denote the set of vertices in U, adjacent to at least t points in each
$A_i$. Let us count the edges missing between U and $A_1 \cup \ldots \cup A_k$. Since every vertex in U \ W is
adjacent to less than t vertices in some $A_i$, the number of missing edges is at least
$$\tilde{m} \ge |U \setminus W| (s - t) \ge (n - ks - |W|)(1 - \epsilon) s.$$
On the other hand, any vertex in the graph has at most (1/k − ε/2)n missing edges, so counting
over $A_1 \cup \ldots \cup A_k$, we get
$$\tilde{m} \le ks (1/k - \epsilon/2) n = (1 - k\epsilon/2) sn.$$
From these inequalities, we deduce
$$|W| (1 - \epsilon) s \ge (n - ks)(1 - \epsilon) s - (1 - k\epsilon/2) sn = \epsilon (k/2 - 1) sn - (1 - \epsilon) k s^2.$$
Everything else being constant, we can make n large enough so that |W| is arbitrarily large. In
particular, we make sure that
$$|W| > \binom{s}{t}^k (t - 1).$$
We know that each vertex w ∈ W is adjacent to at least t points in each $A_i$. Select t specific points
from each $A_i$ and denote the union of all these kt points $T_w$. We have $\binom{s}{t}^k$ possible sets $T_w$; by
the pigeonhole principle, at least one of them is chosen for at least t vertices w ∈ W. We define
these t vertices to constitute our new set $A_{k+1}$, and the respective t-tuples of vertices connected to
it $A'_i \subset A_i$. The collection of sets $A'_1, \ldots, A'_k, A_{k+1}$ satisfies the property that all pairs of vertices from
different sets form edges.
Now we are ready to prove the Erdős-Stone theorem.
Proof. Let χ(H) = k + 1. The Turán graph $T_{n,k}$ has chromatic number k, hence it cannot contain
H. This proves $ex(n, H) \ge \frac12 (1 - 1/k) n^2$ whenever n is a multiple of k. For arbitrary n, we apply this to the largest multiple of k not exceeding n, which is at least n − k. Therefore,
$$ex(n, H) \ge \frac12 \Big(1 - \frac1k\Big) (n - k)^2 \ge \frac12 \Big(1 - \frac1k - \frac{2k}{n}\Big) n^2.$$
On the other hand, fix t = |V(H)| and consider a graph G with n vertices and $m \ge (1 - 1/k + \epsilon) \frac{n^2}{2}$
edges. If n is large enough, then by Lemma 1, G contains sets $A_1, \ldots, A_{k+1}$ of size t such that all
edges between different sets are present. H is a graph of chromatic number k + 1 and therefore its
vertices can be embedded in $A_1, \ldots, A_{k+1}$ based on their color. We conclude that H is a subgraph
of G and hence
$$ex(n, H) \le \frac12 \Big(1 - \frac1k + \epsilon\Big) n^2$$
for any ε > 0 and sufficiently large n.
Lecture 10: Extremal combinatorics
1 Bipartite forbidden subgraphs
We have seen the Erdős-Stone theorem which says that given a forbidden subgraph H, the extremal
number of edges is $ex(n, H) = \frac12 (1 - 1/(\chi(H) - 1) + o(1)) n^2$. Here, o(1) means a term tending to zero
as n → ∞. This basically resolves the question for forbidden subgraphs H of chromatic number
at least 3, since then the answer is roughly $cn^2$ for some constant c > 0. However, for bipartite
forbidden subgraphs, χ(H) = 2, this answer is not satisfactory, because we get $ex(n, H) = o(n^2)$,
which does not determine the order of ex(n, H). Hence, bipartite graphs form the most interesting
class of forbidden subgraphs.
2 Graphs without any 4-cycle
Let us start with the first non-trivial case where H is bipartite, $H = C_4$. I.e., the question is how
many edges G can have before a 4-cycle appears. The answer is roughly $n^{3/2}$.
Theorem 1. For any graph G on n vertices, not containing a 4-cycle,
$$|E(G)| \le \frac14 \big(1 + \sqrt{4n - 3}\big) n.$$
Proof. Let $d_v$ denote the degree of v ∈ V. Let F denote the set of “labeled forks”:
$$F = \{(u, v, w) : (u, v) \in E, (u, w) \in E, v \ne w\}.$$
Note that we do not care whether (v, w) is an edge or not. We count the size of F in two possible
ways: First, each vertex u contributes $d_u (d_u - 1)$ forks, since this is the number of choices for v
and w among the neighbors of u. Hence,
$$|F| = \sum_{u \in V} d_u (d_u - 1).$$
On the other hand, every ordered pair of vertices (v, w) can contribute at most one fork, because
otherwise we get a 4-cycle in G. Hence,
$$|F| \le n(n - 1).$$
By combining these two inequalities,
$$n(n - 1) \ge \sum_{u \in V} d_u^2 - \sum_{u \in V} d_u$$
and by applying Cauchy-Schwarz, we get
$$n(n - 1) \ge \frac1n \Big( \sum_{u \in V} d_u \Big)^2 - \sum_{u \in V} d_u = \frac{(2m)^2}{n} - 2m.$$
This yields a quadratic inequality $4m^2 - 2mn - n^2 (n - 1) \le 0$. Solving it for m yields the theorem.
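The bound can be compared with an exhaustive search for very small n. The Python sketch below is our own, with hypothetical function names. It uses the observation from the proof that a 4-cycle exists exactly when some pair of vertices has at least two common neighbors.

```python
from itertools import combinations
from math import sqrt


def has_four_cycle(n, edges):
    """True if the graph on vertices 0..n-1 contains a 4-cycle, i.e. if
    two distinct vertices share at least two neighbors."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return any(len(adj[v] & adj[w]) >= 2 for v, w in combinations(range(n), 2))


def c4_free_bound(n):
    """The upper bound (1/4)(1 + sqrt(4n - 3)) n from the theorem."""
    return (1 + sqrt(4 * n - 3)) * n / 4


def max_c4_free_edges(n):
    """Brute force over all graphs on n vertices (small n only)."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if (mask >> i) & 1]
        if len(edges) > best and not has_four_cycle(n, edges):
            best = len(edges)
    return best


bowtie = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4)]  # two triangles sharing vertex 0
```

For n = 5 the bound evaluates to about 6.4, and the two triangles sharing a vertex show that 6 edges are attainable.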
This bound can be indeed achieved, i.e. there exist graphs with $\Omega(n^{3/2})$¹ edges, not containing
any 4-cycle. One example is the incidence graphs between lines and points of a finite projective
plane. We give a similar example here, which is algebraically defined and easier to analyze.
Example. Let $V = Z_p \times Z_p$, i.e. vertices are pairs (x, y) of elements of a finite field. The number
of vertices is $n = p^2$. We define a graph G, where (x, y) and (x′, y′) are joined by an edge, if
x + x′ = yy′. For each vertex (x, y), there are p solutions of this equation (pick any $y' \in Z_p$ and x′
is uniquely determined). One of these solutions could be (x, y) itself, but in any case (x, y) has at
least p − 1 neighbors. Hence, the number of edges in the graph is $m \ge \frac12 p^2 (p - 1) = \Omega(n^{3/2})$.
Finally, observe that there is no 4-cycle in G. Suppose that (x, y) has neighbors $(x_1, y_1)$ and
$(x_2, y_2)$. This means $x + x_1 = y y_1$ and $x + x_2 = y y_2$, therefore $x_1 - x_2 = y (y_1 - y_2)$. Hence, given
$(x_1, y_1) \ne (x_2, y_2)$, y is determined uniquely, and then x can be also computed from one of the
equations above. So $(x_1, y_1)$ and $(x_2, y_2)$ can have only one shared neighbor, which means there is
no $C_4$ in the graph.
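The example is small enough to build explicitly. The following Python sketch is an illustration with hypothetical function names, and p is assumed to be an odd prime. It constructs the graph, counts its edges, and computes the largest number of common neighbors over all pairs of vertices.

```python
from itertools import combinations


def build_graph(p):
    """Graph on Z_p x Z_p where (x, y) ~ (x2, y2) iff x + x2 = y * y2 (mod p).
    Loops are dropped. p is assumed to be an odd prime."""
    vertices = [(x, y) for x in range(p) for y in range(p)]
    adj = {v: set() for v in vertices}
    for (x, y) in vertices:
        for y2 in range(p):
            x2 = (y * y2 - x) % p  # the unique x2 solving the equation
            if (x2, y2) != (x, y):
                adj[(x, y)].add((x2, y2))
    return adj


def edge_count(adj):
    return sum(len(neighbors) for neighbors in adj.values()) // 2


def max_common_neighbors(adj):
    return max(len(adj[u] & adj[v]) for u, v in combinations(adj, 2))
```

For p = 5 the graph has 25 vertices and 60 edges, above the bound $\frac12 p^2 (p - 1) = 50$ from the text, and no two vertices have more than one common neighbor.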
3 Graphs without a complete bipartite subgraph
Observe that another way to view $C_4$ is as a complete bipartite subgraph $K_{2,2}$. More generally, we
can ask how many edges force a graph to contain a complete bipartite graph $K_{t,t}$.
Theorem 2. Let t ≥ 2. Then there is a constant c > 0 such that any graph on n vertices without
$K_{t,t}$ has at most $cn^{2-1/t}$ edges.
Proof. Let G be a graph without $K_{t,t}$, V(G) = {1, 2, . . . , n} and let $d_i$ denote the degree of vertex
i. The neighborhood of vertex i contains $\binom{d_i}{t}$ t-tuples of vertices. Let’s count such t-tuples over
the neighborhoods of all vertices i. Note that any particular t-tuple can be counted at most t − 1
times in this way, otherwise we would get a copy of $K_{t,t}$. Therefore,
$$\sum_{i=1}^{n} \binom{d_i}{t} \le (t - 1) \binom{n}{t}.$$
Observe that the average degree in the graph is much more than t, otherwise we have nothing to
prove. Due to the convexity of $\binom{d_i}{t}$ as a function of $d_i$, the left-hand side is minimized if all the
degrees are equal, $d_i = 2m/n$. Therefore,
$$\sum_{i=1}^{n} \binom{d_i}{t} \ge n \binom{2m/n}{t} \ge n \frac{(2m/n - t)^t}{t!}$$
and
$$(t - 1) \binom{n}{t} \le (t - 1) \frac{n^t}{t!}.$$
We conclude that
$$n (2m/n - t)^t \le (t - 1) n^t$$
which means that $m \le \frac12 (t - 1)^{1/t} n^{2-1/t} + \frac12 tn \le n^{2-1/t} + \frac12 tn$.
¹ Ω(f(n)) denotes any function which is lower-bounded by cf(n) for some constant c > 0 for sufficiently large n.
As an exercise, the reader can generalize the bound above to the following.
Theorem 3. Let s ≥ t ≥ 2. Then for sufficiently large n, any graph on n vertices without $K_{s,t}$ has
$O(s^{1/t} n^{2-1/t})$ edges.
Another extremal bound of this type is for forbidden even cycles. (Recall that for a forbidden
odd cycle, the number of edges can be as large as $\frac14 n^2$.)
Theorem 4. If G has n vertices and no cycle $C_{2k}$, then the number of edges is $m \le cn^{1+1/k}$ for
some constant c > 0.
We prove a weaker version of this bound, for graphs that do not contain any cycle of length at
most 2k.
Theorem 5. If G has n vertices and no cycles of length shorter than 2k + 1, then the number of
edges is $m < n(n^{1/k} + 1)$.
Proof. Let ρ(G) = |E(G)|/|V(G)| denote the density of a graph G. First, we show that there is a
subgraph G′ where every vertex has degree at least ρ(G): Let G′ be a graph of maximum density
among all subgraphs of G (certainly ρ(G′) ≥ ρ(G)). We claim that all degrees in G′ are at least
ρ(G′). If not, suppose G′ has n′ vertices and m′ = ρ(G′)n′ edges; then by removing a vertex of degree
d′ < ρ(G′), we obtain a subgraph G″ with n″ = n′ − 1 vertices and m″ = m′ − d′ > ρ(G′)(n′ − 1)
edges, hence ρ(G″) = m″/n″ > ρ(G′) which is a contradiction.
Now consider a graph G with $m \ge n(n^{1/k} + 1)$ edges and its subgraph G′ of maximum density,
where all degrees are at least $\rho(G) \ge n^{1/k} + 1$. We start from any vertex $v_0$ and grow a tree where
on each level $L_j$ we include all the neighbors of vertices in $L_{j-1}$. As long as we do not encounter
any cycle, each vertex has at least $n^{1/k}$ new children and $|L_j| \ge n^{1/k} |L_{j-1}|$. Assuming that there is
no cycle of length shorter than 2k + 1, we can grow this tree up to level $L_k$ and we have $|L_k| \ge n$.
However, this contradicts the fact that the levels should be disjoint and all contained in a graph
on n vertices.
4 Application to additive number theory
The following type of question is studied in additive number theory. Suppose we have a set of
integers B and we want to generate B by forming sums of numbers from a smaller set A. How
small can A be?
More specifically, suppose we would like to generate a certain sequence of squares, $B =
\{1^2, 2^2, 3^2, \ldots, m^2\}$, by taking sums of pairs of numbers, $A + A = \{a + b : a, b \in A\}$. How small can
$A$ be so that $B \subseteq A + A$? Obviously, we need $|A| \ge \sqrt{m}$ to generate any set of $m$ numbers.
Theorem 6. For any set $A$ such that $B = \{1^2, 2^2, 3^2, \ldots, m^2\} \subseteq A + A$, we need $|A| \ge m^{2/3-o(1)}$.
Proof. Let $B \subseteq A + A$ and suppose $|A| = n$. We define a graph $G$ whose vertices are $A$ and $(a_1, a_2)$
is an edge if $a_1 + a_2 = x^2$ for some integer $x$. Since we need to generate $m$ different squares, the
number of edges is at least $m$.
Consider $a_1, a_2 \in A$ and all numbers $b$ such that $a_1 + b = x^2$ and $a_2 + b = y^2$. Note that we get
a different pair $(x, y)$ for each $b$. Then, $a_1 - a_2 = x^2 - y^2 = (x + y)(x - y)$. Now, $(x + y, x - y)$
cannot be the same pair for different numbers $b$. Denoting the number of divisors of $a_1 - a_2$ by $d$,
we can have at most $d^2$ such possible pairs, and each of them can be used only for one number $b$.
Now we use the following proposition.
Proposition. For any $\epsilon > 0$ and $n$ large enough, $n$ has less than $n^{\epsilon}$ divisors.
This can be proved by considering the prime decomposition of $n = \prod_{i=1}^{t} p_i^{\alpha_i}$, where the number
of divisors is $d = \prod_{i=1}^{t} (1 + \alpha_i)$. We assume $\alpha_i \ge 1$ for all $i$. We claim that for any fixed $\epsilon > 0$ and
$n$ large enough,
\[ \phi(n) = \frac{\log d}{\log n} = \frac{\sum_{i=1}^{t} \log(1 + \alpha_i)}{\sum_{i=1}^{t} \alpha_i \log p_i} < \epsilon. \]
Observe that $\frac{\log(1+\alpha_i)}{\alpha_i \log p_i}$ can be larger than $\epsilon/2$ only if $p_i^{\alpha_i} \le (1 + \alpha_i)^{2/\epsilon}$, and this can be true only if $p_i$
and $\alpha_i$ are bounded by some constants $P_\epsilon$, $A_\epsilon$. All such factors together contribute only a constant
$C_\epsilon$ in the decomposition of $n$. For $n$ arbitrarily large, a majority of the terms $\log(1 + \alpha_i)$ will be
smaller than $\frac{\epsilon}{2} \alpha_i \log p_i$ and hence $\phi(n)$ will drop below $\epsilon$ for sufficiently large $n$.
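The quantity $\phi(n) = \log d / \log n$ from the proposition is easy to compute for small $n$. The following is a minimal sketch using trial division; the names `num_divisors` and `phi` are illustrative choices, and the code only illustrates the definition rather than proving the limit statement.

```python
import math


def num_divisors(n):
    """Number of divisors of n, from the factorisation n = prod p_i^a_i as prod (1 + a_i)."""
    d, p = 1, 2
    while p * p <= n:
        a = 0
        while n % p == 0:
            n //= p
            a += 1
        d *= 1 + a
        p += 1
    if n > 1:  # one prime factor larger than sqrt(n) remains
        d *= 2
    return d


def phi(n):
    """The ratio log d(n) / log n used in the proposition (n >= 2)."""
    return math.log(num_divisors(n)) / math.log(n)
```

For example, `phi(12)` is about 0.72, while a high prime power such as `2**20` gives a much smaller ratio.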
To summarize, for any pair $a_1, a_2 \in A$, we have less than $n^{2\epsilon}$ numbers $b$ which are neighbors of
both $a_1$ and $a_2$ in the graph $G$. In other words, $G$ does not contain $K_{2,n^{2\epsilon}}$. By our extremal bound,
it has at most $c\, n^{3/2+\epsilon}$ edges. I.e., $m \le c\, n^{3/2+\epsilon}$, for any fixed $\epsilon > 0$.
Lecture 11: The probabilistic method
Very often, we need to construct a combinatorial object satisfying certain properties, for example to
show a counterexample or a lower bound for a certain statement. In situations where we do not
have much a priori information and it’s not clear how to define a concrete example, it’s often useful
to try a random construction.
1
Probability basics
A probability space is a pair (Ω, Pr) where Pr is a normalized measure on Ω, i.e. Pr(Ω) = 1. In
combinatorics, it’s mostly sufficient to work with finite probability spaces, so we can avoid a lot
of the technicalities of measure theory. We can assume that $\Omega$ is a finite set and each elementary
event $\omega \in \Omega$ has a certain probability $\Pr[\omega] \in [0, 1]$; $\sum_{\omega \in \Omega} \Pr[\omega] = 1$.
Any subset $A \subseteq \Omega$ is an event, of probability $\Pr[A] = \sum_{\omega \in A} \Pr[\omega]$. Observe that a union of
events corresponds to OR and an intersection of events corresponds to AND.
A random variable is any function X : Ω → R. Two important notions here will be expectation
and independence.
Definition 1. The expectation of a random variable $X$ is
\[ E[X] = \sum_{\omega \in \Omega} X(\omega) \Pr[\omega] = \sum_{a} a \Pr[X = a]. \]
Definition 2. Two events A, B ⊆ Ω are independent, if
Pr[A ∩ B] = Pr[A] Pr[B].
Two random variables X, Y are independent, if the events X = a and Y = b are independent for
any choices of a, b.
Lemma 1. For independent random variables X, Y , we have E[XY ] = E[X]E[Y ].
Proof.
\[ E[XY] = \sum_{\omega \in \Omega} X(\omega) Y(\omega) \Pr[\omega] = \sum_{a,b} ab \Pr[X = a, Y = b] = \sum_{a} a \Pr[X = a] \sum_{b} b \Pr[Y = b] = E[X] E[Y]. \]
The two most elementary tools that we will use are the following.
1.1
The union bound
Lemma 2. For any collection of events $A_1, \ldots, A_n$,
\[ \Pr[A_1 \cup A_2 \cup \ldots \cup A_n] \le \sum_{i=1}^{n} \Pr[A_i]. \]
Equality holds if the events $A_i$ are disjoint.
This is obviously true by the properties of a measure. This bound is very general, since we do
not need to assume anything about the independence of A1 , . . . , An .
1.2
Linearity of expectation
Lemma 3. For any collection of random variables $X_1, \ldots, X_n$,
\[ E[X_1 + X_2 + \ldots + X_n] = \sum_{i=1}^{n} E[X_i]. \]
Again, we do not need to assume anything about the independence of X1 , . . . , Xn .
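Proof.
\[ E\Big[\sum_{i=1}^{n} X_i\Big] = \sum_{\omega \in \Omega} \sum_{i=1}^{n} X_i(\omega) \Pr[\omega] = \sum_{i=1}^{n} \sum_{\omega \in \Omega} X_i(\omega) \Pr[\omega] = \sum_{i=1}^{n} E[X_i]. \]
2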
2-colorability of hypergraphs
Our first application is the question of 2-colorability of hypergraphs. We call a hypergraph 2-colorable, if its vertices can be assigned 2 colors so that every hyperedge contains both colors. An
example which is not 2-colorable is the complete $r$-uniform hypergraph on $2r - 1$ vertices, $K^{(r)}_{2r-1}$.
This is certainly not 2-colorable, because for any coloring there is a set of $r$ vertices of the same
color. The number of hyperedges here is $\binom{2r-1}{r} \simeq 4^r / \sqrt{r}$.
A question is whether a number of edges exponential in r is necessary to make a hypergraph
non-2-colorable. The probabilistic method shows easily that this is true.
Theorem 1. Any $r$-uniform hypergraph with fewer than $2^{r-1}$ hyperedges is 2-colorable.
Proof. Consider a random coloring, where every vertex is colored independently red/blue with
probability 1/2. For each hyperedge $e$, the probability that $e$ is monochromatic is $2/2^r$. By the
union bound,
\[ \Pr[\exists\ \text{monochromatic edge}] \le \sum_{e \in E} \frac{2}{2^r} = \frac{2|E|}{2^r} < 1 \]
by our assumption that $|E| < 2^{r-1}$. If every coloring contained a monochromatic edge, this probability would be 1; therefore, for at least one coloring this is not the case and therefore the hypergraph
is 2-colorable.
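For small hypergraphs the statement can be checked exhaustively. The sketch below replaces the random coloring by a search over all colorings and computes the exact probability of a monochromatic edge, so that it can be compared with the union bound $2|E|/2^r$. The function names and the two example hypergraphs (three triples on six vertices, and the Fano plane, which has $7 \ge 2^{3-1}$ edges) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def monochromatic(edge, coloring):
    """True if all vertices of the edge received the same colour."""
    return len({coloring[v] for v in edge}) == 1


def find_2_coloring(num_vertices, edges):
    """Return a colouring (tuple of 0/1) with no monochromatic edge, or None."""
    for coloring in product((0, 1), repeat=num_vertices):
        if not any(monochromatic(e, coloring) for e in edges):
            return coloring
    return None


def bad_fraction(num_vertices, edges):
    """Exact probability that a uniformly random colouring has a monochromatic edge."""
    bad = sum(any(monochromatic(e, c) for e in edges)
              for c in product((0, 1), repeat=num_vertices))
    return Fraction(bad, 2 ** num_vertices)


# 3-uniform, 3 < 2^(3-1) edges: the theorem applies.
small = [(0, 1, 2), (2, 3, 4), (4, 5, 0)]
# Fano plane: 3-uniform with 7 edges, so the theorem does not apply.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
```

The Fano plane shows that some bound on the number of edges is needed: it has no proper 2-coloring at all.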
3
A tournament paradox
A tournament is a directed graph where we have an arrow in exactly one direction for each pair
of vertices. A tournament can represent the outcome of a competition where exactly one game is
played between every pair of teams. A natural notion of k winning teams would be such that there
is no other team, beating all these k teams. Unfortunately, such a notion can be ill-defined, for any
value of k.
Theorem 2. For any k ≥ 1, there exists a tournament T such that for every set of k vertices B,
there exists another vertex x such that x → y for all y ∈ B.
Proof. We can assume k sufficiently large, because the theorem gets only stronger for larger k.
Given $k$, we set $n = k + k^2 2^k$ and consider a uniformly random tournament on $n$ vertices. This
means, we select an arrow x → y or y → x randomly for each pair of vertices x, y.
First let’s fix a set of vertices B, |B| = k, and analyze the event that no other vertex beats all
the vertices in B. For each particular vertex x,
\[ \Pr[\forall y \in B;\ x \to y] = \frac{1}{2^k} \]
and by taking the complement,
\[ \Pr[\exists y \in B;\ y \to x] = 1 - \frac{1}{2^k}. \]
Since these events are independent for different vertices $x \in V \setminus B$, we can conclude that
\[ \Pr[\forall x \in V \setminus B;\ \exists y \in B;\ y \to x] = (1 - 2^{-k})^{n-k} = (1 - 2^{-k})^{k^2 2^k} \le e^{-k^2}. \]
By the union bound over all potential sets $B$,
\[ \Pr[\exists B;\ |B| = k;\ \forall x \in V \setminus B;\ \exists y \in B;\ y \to x] \le \binom{n}{k} e^{-k^2} \le (k^2 2^k)^k e^{-k^2} = \left( \frac{k^2 2^k}{e^k} \right)^{k}. \]
For $k$ sufficiently large, this is less than 1, and hence there exists a tournament where the respective
event is false. In other words, $\forall B;\ |B| = k;\ \exists x \in V \setminus B;\ \forall y \in B;\ x \to y$.
It is known that $k^2 2^k$ is quite close to the optimal size of a tournament satisfying this property;
more precisely, $c k 2^k$ for some $c > 0$ is known to be insufficient.
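The property in Theorem 2 can be tested directly on small tournaments. The sketch below does not use the random construction from the proof; instead it checks the property by brute force on explicit examples, including the tournament on $\mathbb{Z}_7$ where $x \to y$ iff $y - x$ is a nonzero square modulo 7 (an example chosen here for illustration). All function names are illustrative.

```python
from itertools import combinations


def circulant_tournament(p, residues):
    """beats[x] = set of y with x -> y, where x -> y iff (y - x) mod p is in residues."""
    return {x: {(x + r) % p for r in residues} for x in range(p)}


def has_property(beats, k):
    """True if every k-set B of vertices is beaten entirely by some vertex outside B."""
    vertices = list(beats)
    for B in combinations(vertices, k):
        Bset = set(B)
        if not any(Bset <= beats[x] for x in vertices if x not in Bset):
            return False
    return True


paley7 = circulant_tournament(7, {1, 2, 4})   # squares modulo 7
cycle3 = circulant_tournament(3, {1})         # 0 -> 1 -> 2 -> 0
transitive3 = {0: {1, 2}, 1: {2}, 2: set()}   # 0 beats everyone
```

The 7-vertex example has the property for $k = 2$ but not for $k = 3$, since each vertex beats only three others and can therefore dominate only one triple.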
4
Sum-free sets
Our third application is a statement about sum-free sets, that is sets of integers $B$ such that if
$x, y \in B$ then $x + y \notin B$. A question that we investigate here is, how many elements can we pick
from any set $A$ of $n$ integers so that they form a sum-free set? As an example, consider $A = [2n]$.
We can certainly pick $B = \{n + 1, n + 2, \ldots, 2n\}$ and this is a sum-free set of size $\frac{1}{2} |A|$. Perhaps
this is not possible for any $A$, but we can prove the following.
Theorem 3. For any set of nonzero integers $A$, there is a sum-free subset $B \subseteq A$ of size $|B| \ge \frac{1}{3} |A|$.
Proof. We proceed by reducing the problem to a finite field $\mathbb{Z}_p$. We choose $p$ prime large enough so
that $|a| < p$ for all $a \in A$. We observe that in $\mathbb{Z}_p$ (counting addition modulo $p$), there is a sum-free
set $S = \{\lceil p/3 \rceil, \ldots, \lfloor 2p/3 \rfloor\}$, which has size $|S| \ge \frac{1}{3}(p - 1)$.
We choose a subset of $A$ as follows. Pick a random element $x \in \mathbb{Z}_p^* = \mathbb{Z}_p \setminus \{0\}$, and let
\[ A_x = \{a \in A : (ax \bmod p) \in S\}. \]
Note that $A_x$ is sum-free, because for any $a, b \in A_x$, we have $(ax \bmod p), (bx \bmod p) \in S$ and hence
$(ax + bx \bmod p) \notin S$, $a + b \notin A_x$. It remains to show that $A_x$ is large for some $x \in \mathbb{Z}_p^*$. We have
\[ E[|A_x|] = \sum_{a \in A} \Pr[a \in A_x] = \sum_{a \in A} \Pr[(ax \bmod p) \in S] \ge \frac{1}{3} |A| \]
because $\Pr[(ax \bmod p) \in S]$ is equal to $|S|/(p - 1) \ge \frac{1}{3}$ for any fixed $a \ne 0$. This implies that there
is a value of $x$ for which $|A_x| \ge \frac{1}{3} |A|$.
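Since the expectation is an average over the finitely many $x \in \mathbb{Z}_p^*$, the random choice can be replaced by trying every $x$ and keeping the best $A_x$. The following sketch does exactly that. It assumes a prime $p \ge 5$ with $p > \max |a|$ (for $p = 3$ the middle interval is not sum-free), and the function names are illustrative.

```python
from math import isqrt


def is_prime(q):
    return q >= 2 and all(q % d for d in range(2, isqrt(q) + 1))


def is_sum_free(B):
    S = set(B)
    return all(x + y not in S for x in S for y in S)


def large_sum_free_subset(A):
    """Return the largest A_x over all x in Z_p^*; its size is at least |A|/3."""
    A = sorted(set(A))
    if not A:
        return []
    if 0 in A:
        raise ValueError("A must consist of nonzero integers")
    p = max(5, max(abs(a) for a in A) + 1)  # assumption: any prime p >= 5 with |a| < p
    while not is_prime(p):
        p += 1
    lo, hi = -(-p // 3), (2 * p) // 3       # ceil(p/3) and floor(2p/3)
    best = []
    for x in range(1, p):
        # Python's % returns a value in [0, p) even for negative a
        Ax = [a for a in A if lo <= (a * x) % p <= hi]
        if len(Ax) > len(best):
            best = Ax
    return best
```

The maximum over $x$ is at least the average, so the returned set always meets the $\frac{1}{3}|A|$ bound of the theorem.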
Lecture 12: Extremal results on finite sets
1
Largest antichains
Suppose we are given a family $\mathcal{F}$ of subsets of $[n]$. We call $\mathcal{F}$ an antichain, if there are no two
sets $A, B \in \mathcal{F}$ such that $A \subset B$. For example, $\mathcal{F} = \{S \subseteq [n] : |S| = k\}$ is an antichain of size
$\binom{n}{k}$. How large can an antichain be? The choice of $k = \lfloor n/2 \rfloor$ gives an antichain of size $\binom{n}{\lfloor n/2 \rfloor}$. In 1928,
Emanuel Sperner proved that this is the largest possible antichain that we can have. In fact, we
prove a slightly stronger statement.
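Theorem 1 (Sperner's theorem). For any antichain $\mathcal{F} \subset 2^{[n]}$,
\[ \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \le 1. \]
Since $\binom{n}{|A|} \le \binom{n}{\lfloor n/2 \rfloor}$ for any $A \subseteq [n]$, we conclude that $|\mathcal{F}| \le \binom{n}{\lfloor n/2 \rfloor}$.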
Proof. We present a very short proof due to Lubell. Consider a random permutation π : [n] → [n].
We compute the probability of the event that a prefix of this permutation {π1 , . . . , πk } is in F for
some k. Note that this can happen only for one value of k, since otherwise F would not be an
antichain.
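For each particular set $A \in \mathcal{F}$ with $|A| = k$, the probability that $A = \{\pi_1, \ldots, \pi_{|A|}\}$ is equal to $k!(n - k)!/n!$,
corresponding to all possible orderings of $A$ and $[n] \setminus A$. By the property of an antichain, these
events for different sets $A \in \mathcal{F}$ are disjoint, and hence
\[ \Pr[\exists A \in \mathcal{F};\ A = \{\pi_1, \ldots, \pi_{|A|}\}] = \sum_{A \in \mathcal{F}} \Pr[A = \{\pi_1, \ldots, \pi_{|A|}\}] = \sum_{A \in \mathcal{F}} \frac{|A|!(n - |A|)!}{n!} = \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}}. \]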
The fact that any probability is at most 1 concludes the proof.
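The inequality of Theorem 1 can be evaluated exactly for any concrete family. The sketch below computes the sum $\sum_{A} 1/\binom{n}{|A|}$ with rational arithmetic and checks the antichain condition; the function names and the example families are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def is_antichain(family):
    """True if no member of the family is a proper subset of another."""
    sets = [frozenset(A) for A in family]
    return not any(A < B for A in sets for B in sets)


def lym_sum(family, n):
    """The sum over A in the family of 1 / C(n, |A|), as an exact fraction."""
    return sum(Fraction(1, comb(n, len(A))) for A in family)


middle_layer = list(combinations(range(4), 2))   # all 2-subsets of a 4-set
mixed = [{0}, {1, 2}, {1, 3}]                     # an antichain of mixed sizes
```

The middle layer attains the bound with equality, while the mixed antichain gives $1/4 + 1/6 + 1/6 = 7/12$.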
This has the following application. We note that the theorem actually holds for arbitrary vectors
and any ball of radius 1, but we stick to the 1-dimensional case for simplicity.
Theorem 2. Let $a_1, a_2, \ldots, a_n$ be real numbers of absolute value $|a_i| \ge 1$. Consider the $2^n$ linear
combinations $\sum_{i=1}^{n} \epsilon_i a_i$, $\epsilon_i \in \{-1, +1\}$. Then the number of sums which are in any interval $(x - 1, x + 1)$
is at most $\binom{n}{\lfloor n/2 \rfloor}$.
An interpretation of this theorem is that for any random walk on the real line, where the $i$-th
step is either $+a_i$ or $-a_i$ at random, the probability that after $n$ steps we end up in some fixed
interval $(x - 1, x + 1)$ is at most $\binom{n}{\lfloor n/2 \rfloor} / 2^n = O(1/\sqrt{n})$.
Proof. We can assume that $a_i \ge 1$. For $\epsilon \in \{-1, +1\}^n$, let $I = \{i \in [n] : \epsilon_i = +1\}$. If $I \subset I'$, and $\epsilon'$
corresponds to $I'$, we have
\[ \sum_{i} \epsilon'_i a_i - \sum_{i} \epsilon_i a_i = 2 \sum_{i \in I' \setminus I} a_i \ge 2 |I' \setminus I|. \]
Therefore, if $I$ is a proper subset of $I'$ then only one of them can correspond to a sum inside
$(x - 1, x + 1)$. Consequently, the sums inside $(x - 1, x + 1)$ correspond to an antichain and we can
have at most $\binom{n}{\lfloor n/2 \rfloor}$ such sums.
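For small $n$ the bound of Theorem 2 can be checked by enumerating all $2^n$ sign vectors. The sketch below is a brute-force count; the function name and the sample inputs are illustrative assumptions.

```python
from itertools import product
from math import comb


def sums_in_interval(a, x):
    """Number of sign vectors eps in {-1,+1}^n with x - 1 < sum(eps_i * a_i) < x + 1."""
    return sum(1 for eps in product((-1, 1), repeat=len(a))
               if x - 1 < sum(e * ai for e, ai in zip(eps, a)) < x + 1)
```

With all $a_i = 1$ and $x = 0$ the bound is attained: for $n = 4$ exactly $\binom{4}{2} = 6$ sign vectors give the sum 0.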
Theorem 3 (Bollobás, 1965). If $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$ are two sequences of sets such that
$A_i \cap B_j = \emptyset$ if and only if $i = j$, then
\[ \sum_{i=1}^{m} \binom{|A_i| + |B_i|}{|A_i|}^{-1} \le 1. \]
Note that if A1 , . . . , Am is an antichain on [n] and we set Bi = [n] \ Ai , we get a system of sets
satisfying the conditions above. Therefore this is a generalization of Sperner’s theorem.
Proof. Suppose that Ai , Bi ⊆ [n] for some n. Again, we consider a random permutation π : [n] →
[n]. Here we look at the event that there is some pair (Ai , Bi ) such that π(Ai ) < π(Bi ), in the
sense that π(a) < π(b) for all a ∈ Ai , b ∈ Bi . For each particular pair (Ai , Bi ), the probability of
this event is |Ai |!|Bi |!/(|Ai | + |Bi |)!.
On the other hand, suppose that π(Ai ) < π(Bi ) and π(Aj ) < π(Bj ). Hence, there are points
xi , xj such that the two pairs are separated by xi and xj , respectively. Depending on the relative
order of xi , xj , we get either Ai ∩ Bj = ∅ or Aj ∩ Bi = ∅, which contradicts our assumptions.
Therefore, the events for different pairs (Ai , Bi ) are disjoint. We conclude that
\[ \Pr[\exists i;\ (A_i, B_i) \text{ are separated in } \pi] = \sum_{i=1}^{m} \frac{|A_i|!\,|B_i|!}{(|A_i| + |B_i|)!} = \sum_{i=1}^{m} \binom{|A_i| + |B_i|}{|A_i|}^{-1} \le 1. \]
This theorem has an application in the following setting. For a collection of sets $\mathcal{F} \subseteq 2^X$, we call
$T \subseteq X$ a transversal of $\mathcal{F}$, if $\forall A \in \mathcal{F};\ A \cap T \ne \emptyset$. One question is, what is the smallest transversal
for a given collection of sets $\mathcal{F}$. We denote the size of the smallest transversal by $\tau(\mathcal{F})$.
A set system $\mathcal{F}$ is called $\tau$-critical, if removing any member of $\mathcal{F}$ decreases $\tau(\mathcal{F})$. An example
of a $\tau$-critical system is the collection $\mathcal{F} = \binom{[k+\ell]}{k}$ of all subsets of size $k$ out of $k + \ell$ elements.
The smallest transversal has size $\ell + 1$, because any set of size $\ell + 1$ intersects every member of
$\mathcal{F}$, whereas no set of size $\ell$ is a transversal, since its complement is a member of $\mathcal{F}$. Moreover,
removing any set $A \in \mathcal{F}$ decreases $\tau(\mathcal{F})$ to $\ell$, because then $\bar{A}$ is a transversal of $\mathcal{F} \setminus \{A\}$. This is
an example of a $\tau$-critical system of size $\binom{k+\ell}{k}$, where $\tau(\mathcal{F}) = \ell + 1$ and $\forall A \in \mathcal{F};\ |A| = k$.
Observe that if $\mathcal{F} = \{A_1, A_2, \ldots, A_n\}$ is $\tau$-critical and $\tau(\mathcal{F}) = \ell + 1$, then for each $i$ there is a set
$B_i$, $|B_i| = \ell$, which is a transversal of $\mathcal{F} \setminus \{A_i\}$, i.e. it intersects each $A_j$, $j \ne i$. However, $B_i$ does not intersect $A_i$, otherwise
it would also be a transversal of $\mathcal{F}$. Therefore, Theorem 3 implies the following.
Theorem 4. Suppose $\mathcal{F}$ is a $\tau$-critical system, where $\tau(\mathcal{F}) = \ell + 1$ and each $A \in \mathcal{F}$ has size $k$.
Then
\[ |\mathcal{F}| \le \binom{k + \ell}{k}. \]
Exercise. If every collection of $\binom{r+s}{r}$ edges in an $r$-uniform hypergraph can be covered by $s$
points, then all edges can.
2
Intersecting families
Here we consider a different type of family of subsets. We call $\mathcal{F} \subseteq 2^{[n]}$ intersecting, if $A \cap B \ne \emptyset$
for any $A, B \in \mathcal{F}$. The question what is the largest such family is quite easy: For any set $A$, we
can take only one of $A$ and $[n] \setminus A$. Conversely, we can take exactly one set from each pair like this
- for example all the sets containing element 1. Hence, the largest intersecting family of subsets of
$[n]$ has size exactly $2^{n-1}$.
A more interesting question is, how large can be an intersecting family of sets of size k? We
assume k ≤ n/2, otherwise we can take all k-sets.
Theorem 5 (Erdős-Ko-Rado). For any $k \le n/2$, the largest size of an intersecting family of subsets
of $[n]$ of size $k$ is $\binom{n-1}{k-1}$.
Observe that an intersecting family of size $\binom{n-1}{k-1}$ can be constructed by taking all $k$-sets containing element 1. To prove the upper bound, we use an elegant argument of Katona. First, we
prove the following lemma.
Lemma 1. Consider a circle divided into n intervals by n points. Let k ≤ n/2. Suppose we have
“arcs” A1 , . . . , At , each Ai containing k successive intervals around the circle, and each pair of arcs
overlapping in at least one interval. Then t ≤ k.
Proof. No point x can be the endpoint of two arcs - then they are either the same arc, or two arcs
starting from x in opposite directions, but then they do not share any interval.
Now fix an arc A1 . Every other arc must intersect A1 , hence it must start at one of the k − 1
points inside A1 . Each such endpoint can have at most one arc.
Now we proceed with the proof of the Erdős-Ko-Rado theorem.
Proof. Let $\mathcal{F}$ be an intersecting family of sets of size $k$. Consider a random permutation $\pi : [n] \to
[n]$. We consider each set $A \in \mathcal{F}$ mapped onto the circle as above, by associating $\pi(A)$ with the
respective set of intervals on the circle. Let $X$ be the number of sets $A \in \mathcal{F}$ which are mapped onto
contiguous arcs $\pi(A)$ on the circle. For each set $A \in \mathcal{F}$, the probability that $\pi(A)$ is a contiguous
arc is $n\, k!(n - k)!/n! = n / \binom{n}{k}$. Therefore,
\[ E[X] = \sum_{A \in \mathcal{F}} \Pr[\pi(A) \text{ is contiguous}] = \frac{n}{\binom{n}{k}} |\mathcal{F}|. \]
On the other hand, we know by our lemma that $\pi(A)$ can be contiguous for at most $k$ sets at the
same time, because $\mathcal{F}$ is an intersecting family. Therefore,
\[ E[X] \le k. \]
From these two bounds, we obtain
\[ |\mathcal{F}| \le \frac{k}{n} \binom{n}{k} = \binom{n-1}{k-1}. \]
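For very small parameters the Erdős-Ko-Rado bound can be confirmed by exhaustive search over all families of $k$-sets. The sketch below is a brute-force check, feasible only when $\binom{n}{k}$ is at most about 15; the function name is an illustrative choice.

```python
from itertools import combinations
from math import comb


def max_intersecting_family(n, k):
    """Size of the largest pairwise-intersecting family of k-subsets of an n-set."""
    sets = [frozenset(c) for c in combinations(range(n), k)]
    best = 0
    for mask in range(1 << len(sets)):          # every subfamily as a bitmask
        chosen = [s for i, s in enumerate(sets) if (mask >> i) & 1]
        if len(chosen) > best and all(a & b for a, b in combinations(chosen, 2)):
            best = len(chosen)
    return best
```

For $n = 5$, $k = 2$ the search returns 4, matching $\binom{4}{1}$.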
Lecture 13: Graphs of high girth and high chromatic number
1
Markov’s inequality
Another simple tool that’s often useful is Markov’s inequality, which bounds the probability that a
random variable X is too large, based on the expectation E[X].
Lemma 1. Let $X$ be a nonnegative random variable and $t > 0$. Then
\[ \Pr[X \ge t] \le \frac{E[X]}{t}. \]
Proof.
\[ E[X] = \sum_{a} a \Pr[X = a] \ge \sum_{a \ge t} t \Pr[X = a] = t \Pr[X \ge t]. \]
Working with expectations is usually easier than working directly with probabilities or more
complicated quantities such as variance. Recall that $E[X_1 + X_2 + \ldots + X_n] = \sum_{i=1}^{n} E[X_i]$ for any
collection of random variables.
2
Graphs of high girth and high chromatic number
We return to the notion of a chromatic number $\chi(G)$. Observe that for a graph that does not
contain any cycles, $\chi(G) \le 2$ because every component is a tree that can be colored easily by 2
colors. More generally, consider graphs of girth $\ell$, which means that the length of the shortest
cycle is $\ell$. If $\ell$ is large, this means that starting from any vertex, the graph looks like a tree within
distance $\ell/2 - 1$. One might expect that such graphs can be also colored using a small number
of colors, since locally they can be colored using 2 colors. However, this is far from being true, as
shown by a classical application of the probabilistic method.
Theorem 1. For any $k$ and $\ell$, there is a graph of chromatic number $\chi(G) > k$ and girth $g(G) > \ell$.
Proof. We start by generating a random graph $G_{n,p}$, where each edge appears independently with
probability $p$. We fix a value $\lambda \in (0, 1/\ell)$ and we set $p = n^{\lambda - 1}$. Let $X$ be the number of cycles of
length at most $\ell$ in $G_{n,p}$. The number of potential cycles of length $j$ is certainly at most $n^j$, and
each of them appears with probability $p^j$, therefore
\[ E[X] \le \sum_{j=3}^{\ell} n^j p^j = \sum_{j=3}^{\ell} n^{\lambda j} \le \frac{n^{\lambda \ell}}{1 - n^{-\lambda}}. \]
Because $\lambda \ell < 1$, this is less than $n/4$ for $n$ sufficiently large. By Markov's inequality, $\Pr[X \ge
n/2] \le E[X]/(n/2) < 1/2$. Note that we are not able to prove that there are no short cycles in $G_{n,p}$, but we will
deal with this later.
Now let us consider the chromatic number of $G_{n,p}$. Rather than the chromatic number $\chi(G)$
itself, we analyze the independence number $\alpha(G)$, i.e. the size of the largest independent set in $G$.
Since every color class forms an independent set, it's easy to see that $\chi(G) \ge |V(G)|/\alpha(G)$. We set
$a = \lceil \frac{3}{p} \ln n \rceil$ and consider the event that there is an independent set of size $a$. By the union bound,
we get
\[ \Pr[\alpha(G) \ge a] \le \binom{n}{a} (1 - p)^{\binom{a}{2}} \le n^a e^{-pa(a-1)/2} \le n^a n^{-3(a-1)/2} \to 0. \]
For $n$ sufficiently large, this probability is less than 1/2. Hence, again by the union bound, we get
\[ \Pr[X \ge n/2 \text{ or } \alpha(G) \ge a] < 1. \]
Therefore there is a graph where the number of short cycles is $X < n/2$ and the independence
number $\alpha(G) < a$. We can just delete one vertex from each short cycle arbitrarily, and we obtain
a graph $G'$ on at least $n/2$ vertices which has no cycles of length at most $\ell$, and $\alpha(G') < a$. The
chromatic number of this graph is
\[ \chi(G') \ge \frac{|V(G')|}{\alpha(G')} \ge \frac{n/2}{3 n^{1-\lambda} \ln n} = \frac{n^{\lambda}}{6 \ln n}. \]
By taking $n$ sufficiently large, we get $\chi(G') > k$.
We should mention that constructing such graphs explicitly is not easy. We present a construction for triangle-free graphs, which is quite simple.
Proposition 1. Let $G_2$ be a graph consisting of a single edge. Given $G_n = (V, E)$, construct $G_{n+1}$
as follows. The new set of vertices is $V \cup V' \cup \{z\}$, where $V'$ is a copy of $V$ and $z$ is a single new
vertex. $G_{n+1}[V]$ is isomorphic to $G_n$. For each vertex $v' \in V'$ which is a copy of $v \in V$, we connect
it by edges to all vertices $w \in V$ such that $(v, w) \in E$. We also connect each $v' \in V'$ to the new
vertex $z$.
Then $G_n$ is triangle-free and $\chi(G_n) = n$.
Proof. The base case $n = 2$ is trivial. Assuming that $G_n$ is triangle-free, it is easy to see that $G_{n+1}$
is triangle-free as well. Any triangle would have to use one vertex from $V'$ and two vertices from
$V$, because there are no edges inside $V'$ and $z$ is adjacent only to vertices of $V'$. However, by the construction of $G_{n+1}$, this would also
imply a triangle in $G_n$, which is a contradiction.
Finally, we deal with the chromatic number. We assume $\chi(G_n) = n$. Note that it's possible to
color $V$ and $V'$ in the same way, and then assign a new color to $z$, hence $\chi(G_{n+1}) \le n + 1$. We
claim that this is essentially the best way to color $G_{n+1}$. Consider any $n$-coloring of $V$. For each
color $c$, there is a vertex $v_c$ of color $c$, which is connected to vertices of all other colors - otherwise,
we could re-color all vertices of color $c$ and decrease the number of colors in $G_n$. Therefore, there
is also a vertex $v'_c \in V'$ which is connected to all other colors different from $c$. If we want to color
$G_{n+1}$ using $n$ colors, we must use color $c$ for $v'_c$. But then, $V'$ uses all $n$ colors and $z$ cannot use
any of them. Therefore, $\chi(G_{n+1}) = n + 1$.
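The construction of Proposition 1 is short enough to run. The sketch below builds $G_{n+1}$ from $G_n$ on integer vertex labels (an encoding chosen here: $0..n-1$ for $V$, $n..2n-1$ for $V'$, and $2n$ for $z$) and checks triangle-freeness and the chromatic number by backtracking. All function names are illustrative, and the chromatic number search is only practical for small graphs.

```python
def next_graph(n, edges):
    """One step of the construction: returns (number of vertices, edge list) of the new graph."""
    new_edges = list(edges)
    for v, w in edges:
        new_edges.append((v + n, w))   # copy v' joined to the neighbours of v
        new_edges.append((w + n, v))   # copy w' joined to the neighbours of w
    new_edges.extend((v + n, 2 * n) for v in range(n))  # every copy joined to z
    return 2 * n + 1, new_edges


def adjacency(n, edges):
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def has_triangle(n, edges):
    adj = adjacency(n, edges)
    return any(adj[u] & adj[v] for u, v in edges)


def colorable(n, edges, c):
    """Backtracking test for a proper colouring with c colours."""
    adj = adjacency(n, edges)
    colors = [None] * n

    def extend(v):
        if v == n:
            return True
        for col in range(c):
            if all(colors[u] != col for u in adj[v]):
                colors[v] = col
                if extend(v + 1):
                    return True
        colors[v] = None
        return False

    return extend(0)


def chromatic_number(n, edges):
    c = 1
    while not colorable(n, edges, c):
        c += 1
    return c


g2 = (2, [(0, 1)])
g3 = next_graph(*g2)   # a 5-cycle
g4 = next_graph(*g3)   # 11 vertices, 20 edges
```

Starting from a single edge, one step gives the 5-cycle and the next step gives an 11-vertex triangle-free graph that needs 4 colors.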
Lecture 14: Topological methods
1
The Borsuk-Ulam theorem
We have seen how combinatorics borrows from probability theory. Another area which has been
very beneficial to combinatorics, perhaps even more surprisingly, is topology. We have already seen
Brouwer’s fixed point theorem and its combinatorial proof.
Theorem 1 (Brouwer). For any continuous function $f : B^n \to B^n$, there is a point $x \in B^n$ such
that $f(x) = x$.
A more powerful topological tool which seems to stand at the root of most combinatorial
applications is a somewhat related result which can be stated as follows. Here, $S^n$ denotes the
$n$-dimensional sphere, i.e. the surface of the $(n + 1)$-dimensional ball $B^{n+1}$.
Theorem 2 (Borsuk-Ulam). For any continuous function $f : S^n \to \mathbb{R}^n$, there is a point $x \in S^n$
such that $f(x) = f(-x)$.
There are many different proofs of this theorem, some of them elementary and some of them
using a certain amount of the machinery of algebraic topology. All the proofs are, however, more
involved than the proof of Brouwer’s theorem. We will not give the proof here.
In the following, we use a corollary (in fact an equivalent re-statement of the Borsuk-Ulam
theorem).
Theorem 3. For any covering of $S^n$ by $n + 1$ open or closed sets $A_0, \ldots, A_n$, there is a set $A_i$
which contains two antipodal points $x, -x$.
Let's just give some intuition how this is related to Theorem 2. For now, let us assume that all
the sets $A_i$ are closed. (The extension to open sets is a technicality but the idea is the same.) We
define a continuous function $f : S^n \to \mathbb{R}^n$,
\[ f(x) = (\mathrm{dist}(x, A_1), \mathrm{dist}(x, A_2), \ldots, \mathrm{dist}(x, A_n)) \]
where $\mathrm{dist}(x, A) = \inf_{y \in A} \|x - y\|$ is the distance of $x$ from $A$. By Theorem 2, there is a point
$x \in S^n$ such that $f(x) = f(-x)$. This means that $\mathrm{dist}(x, A_i) = \mathrm{dist}(-x, A_i)$ for $1 \le i \le n$. If
$\mathrm{dist}(x, A_i) = 0$ for some $i$, then we are done. If $\mathrm{dist}(x, A_i) = \mathrm{dist}(-x, A_i) \ne 0$ for all $i \in \{1, \ldots, n\}$,
it means that $x, -x \notin A_1 \cup \ldots \cup A_n$. But then $x, -x \in A_0$.
2
Kneser graphs
Similarly to the previous sections, Kneser graphs are derived from the intersection pattern of a
collection of sets. More precisely, the vertex set of a Kneser graph consists of all k-sets on a given
ground set, and two k-sets form an edge if they are disjoint.
Definition 1. The Kneser graph on a ground set $[n]$ is
\[ KG_{n,k} = \left( \binom{[n]}{k},\ \{(A, B) : |A| = |B| = k,\ A \cap B = \emptyset\} \right). \]
Thus, the maximum independent set in $KG_{n,k}$ is equivalent to the maximum intersecting family
of $k$-sets - by the Erdős-Ko-Rado theorem, $\alpha(KG_{n,k}) = \binom{n-1}{k-1} = \frac{k}{n} |V|$ for $k \le n/2$. The maximum
clique in $KG_{n,k}$ is equivalent to the maximum number of disjoint $k$-sets, i.e. $\omega(KG_{n,k}) = \lfloor n/k \rfloor$.
Another natural question is, what is the chromatic number of $KG_{n,k}$? Note that for $n = 3k - 1$,
the Kneser graph does not have any triangle, and also $\alpha(KG_{3k-1,k}) = \frac{k}{3k-1} |V| \approx \frac{1}{3} |V|$. Yet, we will show that
the chromatic number $\chi(KG_{n,k})$ grows with $n$. Therefore, these graphs give another example of a
triangle-free graph of high chromatic number.
Theorem 4 (Lovász-Kneser). For all $k > 0$ and $n \ge 2k - 1$, $\chi(KG_{n,k}) = n - 2k + 2$.
Proof. First, we show that $KG_{n,k}$ can be colored using $n - 2k + 2$ colors. This means assigning
colors to $k$-sets, so that all $k$-sets of the same color intersect. This is easy to achieve: color each
$k$-set by its smallest element if that element is at most $n - 2k + 1$, and give one additional color to all the
remaining $k$-sets. The remaining $k$-sets are contained in $\{n - 2k + 2, \ldots, n\}$, a set of $2k - 1$ elements, so any two of them intersect.
Hence we have $n - 2k + 2$ colors and all $k$-sets of a given color intersect.
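An upper bound of $n - 2k + 2$ colors can be verified mechanically for small parameters. The sketch below assumes one standard explicit coloring (a $k$-set gets its smallest element as color, capped at $n - 2k + 2$) and checks that no edge of $KG_{n,k}$ is monochromatic. The function names are illustrative, and the check says nothing about the lower bound, which is the topological part of the theorem.

```python
from itertools import combinations


def kneser_graph(n, k):
    """Vertices are the k-subsets of {1..n}; edges join disjoint pairs."""
    sets = [frozenset(c) for c in combinations(range(1, n + 1), k)]
    edges = [(A, B) for A, B in combinations(sets, 2) if not A & B]
    return sets, edges


def kneser_color(A, n, k):
    """Smallest element of A, capped at n - 2k + 2 (assumed standard colouring)."""
    return min(min(A), n - 2 * k + 2)


def is_proper(n, k):
    _, edges = kneser_graph(n, k)
    return all(kneser_color(A, n, k) != kneser_color(B, n, k) for A, B in edges)
```

For $n = 5$, $k = 2$ this is the Petersen graph with 10 vertices and 15 edges, colored with 3 colors.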
The proof that $n - 2k + 1$ colors are not enough is more interesting. Let $d = n - 2k + 1$ and
assume that $KG_{n,k}$ is colored using $d$ colors. Let $X$ be a set of $n$ points on $S^d$ in a general position
(there are no $d + 1$ points lying on a $d$-dimensional hyperplane through the origin). Each subset
$A \in \binom{X}{k}$ corresponds to a vertex of $KG_{n,k}$ which is colored with one of $d$ colors. Let $\mathcal{A}_i$ be the
collection of $k$-sets corresponding to color $i$.
We define sets $U_1, \ldots, U_d \subseteq S^d$ as follows: $x \in U_i$, if there exists $A \in \mathcal{A}_i$ such that $\forall y \in
A;\ x \cdot y > 0$. In other words, $x \in U_i$ if some $k$-set of color $i$ lies in the open hemisphere whose pole
is $x$. Finally, we define $U_0 = S^d \setminus (U_1 \cup U_2 \cup \ldots \cup U_d)$. It's easy to see that the sets $U_1, \ldots, U_d$ are
open and $U_0$ is closed. By Theorem 3, there is a set $U_i$ and two antipodes $x, -x \in U_i$.
If this happens for $i = 0$, then we have two antipodes $x, -x$ which are not contained in any
$U_i$, $i > 0$. This means that both hemispheres contain fewer than $k$ points, but then $n - 2(k - 1) = d + 1$
points must be contained in the "equator" between the two hemispheres, contradicting the general
position of $X$. Therefore, $x, -x \in U_i$ for some $i > 0$, which means we have two $k$-sets of color $i$
lying in opposite hemispheres. This means that they are disjoint and hence form an edge in
$KG_{n,k}$, which is a contradiction.
3
Dolnikov’s theorem
The Kneser graph can be defined naturally for any set system F: two sets form an edge if they are
disjoint. We denote this graph by KG(F):
\[ KG(\mathcal{F}) = \left( \mathcal{F},\ \{(A, B) : A, B \in \mathcal{F},\ A \cap B = \emptyset\} \right). \]
We derive a bound on the chromatic number of KG(F) which generalizes Theorem 4. For this
purpose, we need the notion of a 2-colorability defect.
Definition 2. For a hypergraph (or set system) $\mathcal{F}$, the 2-colorability defect $cd_2(\mathcal{F})$ is the smallest
number of vertices, such that removing them and all incident hyperedges from $\mathcal{F}$ produces a 2-colorable hypergraph.
For example, if $H$ is the hypergraph of all $k$-sets on $n$ vertices, we need to remove $n - 2k + 2$
vertices and then the remaining hypergraph of $k$-sets on $2k - 2$ vertices is 2-colorable. (Note that a
similar hypergraph on $2k - 1$ vertices is not 2-colorable.) Thus, $cd_2(H) = n - 2k + 2$. Coincidentally,
this is also the chromatic number of the corresponding Kneser graph. We prove the following.
Theorem 5 (Dolnikov). For any hypergraph (or set system) F,
χ(KG(F)) ≥ cd2 (F).
We remark that equality does not always hold, and also $cd_2(\mathcal{F})$ is not easy to determine for a
given hypergraph. The connection between two very different coloring concepts is quite surprising,
though. Our first proof follows the lines of the Kneser-Lovász theorem.
Proof. Let $d = \chi(KG(\mathcal{F}))$ and consider a coloring of $\mathcal{F}$ by $d$ colors. Again, we identify the ground
set of $\mathcal{F}$ with a set of points $X \subset S^d$ in general position, with no $d + 1$ points on the same hyperplane
through the origin. We define $A_i \subseteq S^d$ by $x \in A_i$ iff some set $F \in \mathcal{F}$ of color $i$ is contained in
$H(x) = \{y \in S^d : x \cdot y > 0\}$. Also, we set $A_0 = S^d \setminus (A_1 \cup \ldots \cup A_d)$.
By Theorem 3, there is a set $A_i$ containing two antipodal points $x, -x$. This cannot happen
for $i \ge 1$, because then there would be two sets $F, F' \in \mathcal{F}$ of color $i$ such that $F \subset H(x)$ and
$F' \subset H(-x)$. This would imply $F \cap F' = \emptyset$, contradicting the coloring property of the Kneser
graph $KG(\mathcal{F})$.
Therefore, there are two antipodal points $x, -x \in A_0$. This implies that there is no set
$F \in \mathcal{F}$ in either hemisphere $H(x)$ or $H(-x)$. By removing the points on the equator between
$H(x)$ and $H(-x)$, whose number is at most $d = \chi(KG(\mathcal{F}))$, and also removing all the sets in
$\mathcal{F}$ containing them, we obtain a hypergraph $\mathcal{F}'$ such that all the sets $F \in \mathcal{F}'$ touch both hemispheres $H(x)$, $H(-x)$. This hypergraph can be colored by 2 colors corresponding to the two hemispheres.
Next, we present Dolnikov’s original proof, which is longer but perhaps more intuitive. It relies
on the following geometric lemma, which follows from the Borsuk-Ulam theorem.
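Lemma 1. Let $\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_d$ be families of convex bounded sets in $\mathbb{R}^d$. Suppose that each family
$\mathcal{C}_i$ is intersecting, i.e. $\forall C, C' \in \mathcal{C}_i;\ C \cap C' \ne \emptyset$. Then there is a hyperplane intersecting all the sets
in $\bigcup_{i=1}^{d} \mathcal{C}_i$.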
Proof. Let's consider a vector $v \in S^{d-1}$, which defines a line in $\mathbb{R}^d$, $L_v = \{\alpha v : \alpha \in \mathbb{R}\}$. For each
family $\mathcal{C}_i$, we consider its projection on $L_v$. Formally, for each $C \in \mathcal{C}_i$ we consider
\[ P(C, v) = \{x \cdot v : x \in C\}. \]
Since each $C$ is a convex bounded set, $P(C, v)$ is a bounded interval. $C \cap C' \ne \emptyset$ for all $C, C' \in \mathcal{C}_i$,
and therefore all the intervals $P(C, v)$ are pairwise intersecting as well. Hence, the intersection of
all these intervals, $\bigcap_{C \in \mathcal{C}_i} P(C, v)$, is a nonempty bounded interval as well. Let $f_i(v)$ denote the
midpoint of $\bigcap_{C \in \mathcal{C}_i} P(C, v)$. This means that the hyperplane
\[ H(v, \lambda) = \{x \in \mathbb{R}^d : x \cdot v = \lambda\} \]
for $\lambda = f_i(v)$ intersects all the sets in $\mathcal{C}_i$.
For each $1 \le i \le d - 1$, define $g_i(v) = f_i(v) - f_d(v)$. Observe that $P(C, -v) = -P(C, v)$ and
hence $f_i(-v) = -f_i(v)$, and also $g_i(-v) = -g_i(v)$. By Theorem 2, there is a point $v \in S^{d-1}$ such
that for all $1 \le i \le d - 1$, $g_i(v) = g_i(-v)$. Since $g_i(-v) = -g_i(v)$, this implies that in fact $g_i(v) = 0$.
In other words, $f_i(v) = f_d(v) = \lambda$ for all $1 \le i \le d - 1$. This means that the hyperplane $H(v, \lambda)$
intersects all the sets in $\mathcal{C}_i$, for each $1 \le i \le d$.
Now we can give the second proof of Dolnikov’s theorem.
Proof. We consider a coloring of the Kneser graph KG(F) by d colors. Denote by Fi the collection
of sets in F corresponding to vertices of color i.
We represent the ground set of $\mathcal{F}$ by a set of points $X \subset \mathbb{R}^d$ in general position. (Observe that
in the first proof, we placed the points in $S^d \subset \mathbb{R}^{d+1}$.) Again, we assume that there are no $d + 1$
points on the same hyperplane. We define $d$ families of convex sets: for every $i \in [d]$,
\[ \mathcal{C}_i = \{\mathrm{conv}(F) : F \in \mathcal{F}_i\}. \]
In other words, these are polytopes corresponding to sets of color $i$. In each family, all polytopes
are pairwise intersecting, by the coloring property of $KG(\mathcal{F})$. Therefore by Lemma 1, there is a
hyperplane $H$ intersecting all these polytopes in each $\mathcal{C}_i$. Let $Y = H \cap X$ be the set of points
exactly on the hyperplane. Let's remove $Y$ and all the sets containing some point in $Y$, and denote
the remaining sets by $\mathcal{F}'$. Each set $F' \in \mathcal{F}'$ must contain vertices on both sides of $H$, otherwise
$\mathrm{conv}(F')$ would not be intersected by $H$. Therefore, coloring the open halfspaces on the two sides
of $H$ by 2 colors, we obtain a valid 2-coloring of $\mathcal{F}'$.
Lecture 15: Applications of linear algebra
1
Linear algebra in combinatorics
After seeing how probability and topology can be useful in combinatorics, we are going to exploit
an even more basic area of mathematics - linear algebra. While the probabilistic method is usually
useful to construct examples and prove lower bounds, a common application of linear algebra is
to prove an upper bound, where we show that a collection of objects satisfying certain properties
cannot be too large. A typical argument to prove this is that we replace the objects by vectors in a
linear space of certain dimension, and we show that the respective vectors are linearly independent.
Hence, there cannot be more of them than the dimension of the space.
2
Even and odd towns
We start with the following classical example. Suppose there is a town where residents love forming
different clubs. To limit the number of possible clubs, the town council establishes the following
rules:
Even town.
• Every club must have an even number of members.
• Two clubs must not have exactly the same members.
• Every two clubs must share an even number of members.
How many clubs can be formed in such a town? We leave it as an exercise to the reader that
there can be as many as $2^{n/2}$ clubs (for an even number of residents $n$). Thus, the town council
reconvened and invited a mathematician to help with this problem. The mathematician suggested
the following modified rules.
Odd/even town.
• Every club must have an odd number of members.
• Every two clubs must share an even number of members.
The residents soon found out that they were able to form only n clubs under these rules, for
example by each resident forming a separate club. In fact, the mathematician was able to prove
that more than n clubs are impossible to form.
Theorem 1. Let $\mathcal{F} \subset 2^{[n]}$ be such that $|A|$ is odd for every $A \in \mathcal{F}$ and $|A \cap B|$ is even for every
distinct $A, B \in \mathcal{F}$. Then $|\mathcal{F}| \le n$.
Proof. Consider the vector space $\mathbb{Z}_2^n$, where $\mathbb{Z}_2 = \{0, 1\}$ is the finite field with operations modulo 2.
Represent each club $A \in \mathcal{F}$ by its incidence vector $1_A \in \mathbb{Z}_2^n$, whose component $i$ is equal to 1
exactly if $i \in A$. We claim that these vectors are linearly independent.
Suppose that $z = \sum_{A \in \mathcal{F}} \alpha_A 1_A = 0$. Fix any $B \in \mathcal{F}$. We consider the inner product $z \cdot 1_B = 0$.
By the linearity of the inner product and the odd-town rules ($1_A \cdot 1_B = |A \cap B| \bmod 2$, which is 1 for $A = B$ and 0 otherwise),
\[ 0 = z \cdot 1_B = \sum_{A \in \mathcal{F}} \alpha_A (1_A \cdot 1_B) = \alpha_B, \]
all operations over $\mathbb{Z}_2$. We conclude that $\alpha_B = 0$ for all $B \in \mathcal{F}$. Therefore, the vectors $\{1_A : A \in \mathcal{F}\}$
are linearly independent and their number cannot be more than $n$, the dimension of $\mathbb{Z}_2^n$.
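The odd-town argument can be checked by machine on small instances. The following is a minimal sketch, not part of the original notes: it encodes clubs as integer bitmasks (an assumption made here for convenience), computes the rank of the incidence vectors over $\mathbb{Z}_2$, and finds the largest odd-town family by exhaustive search. The helper names `gf2_rank`, `is_oddtown` and `max_oddtown` are hypothetical.

```python
from itertools import combinations

def gf2_rank(vectors):
    """Rank over Z_2 of vectors given as integer bitmasks (Gaussian elimination)."""
    basis = {}  # highest set bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]  # clears the leading bit of v
    return len(basis)

def is_oddtown(family):
    """True if every club has odd size and every two clubs share an even number of members."""
    odd_sizes = all(bin(a).count("1") % 2 == 1 for a in family)
    even_meets = all(bin(a & b).count("1") % 2 == 0 for a, b in combinations(family, 2))
    return odd_sizes and even_meets

def max_oddtown(n):
    """Size of a largest odd-town family on n residents (exhaustive, small n only)."""
    odd_sets = [s for s in range(1, 2 ** n) if bin(s).count("1") % 2 == 1]
    best = 0

    def extend(chosen, start):
        nonlocal best
        best = max(best, len(chosen))
        for idx in range(start, len(odd_sets)):
            c = odd_sets[idx]
            if all(bin(c & a).count("1") % 2 == 0 for a in chosen):
                extend(chosen + [c], idx + 1)

    extend([], 0)
    return best
```

For example, the four triples on 4 residents form an odd town whose incidence vectors have full rank 4 over $\mathbb{Z}_2$, in line with the independence claim in the proof.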
An alternative variant is an even/odd town, where the rules are reversed.
Even/odd town.
• Every club must have an even number of members.
• Every two clubs must share an odd number of members.
Exercise. By a simple reduction, any even/odd town with n residents and m clubs can be converted to
an odd/even town with n + 1 residents and m clubs. This shows that there is no even/odd town with n
residents and n + 2 clubs.
Theorem 2. Let $\mathcal{F} \subset 2^{[n]}$ be such that $|A|$ is even for every $A \in \mathcal{F}$ and $|A \cap B|$ is odd for every
distinct $A, B \in \mathcal{F}$. Then $|\mathcal{F}| \le n$.
Proof. Assume that $|\mathcal{F}| = n + 1$. All calculations in the following are taken mod 2. The $n + 1$
vectors $\{1_A : A \in \mathcal{F}\}$ must be linearly dependent, i.e. $\sum_{A \in \mathcal{F}} \alpha_A 1_A = 0$ for some non-trivial linear
combination. Note that $1_A \cdot 1_B = 1$ for distinct $A, B \in \mathcal{F}$ and $1_A \cdot 1_A = 0$ for any $A \in \mathcal{F}$. Therefore, for every $B \in \mathcal{F}$,
\[ 1_B \cdot \sum_{A \in \mathcal{F}} \alpha_A 1_A = \sum_{A \in \mathcal{F} : A \ne B} \alpha_A = 0. \]
By subtracting these expressions for $B, B' \in \mathcal{F}$, we get $\alpha_B = \alpha_{B'}$. This means that all the
coefficients $\alpha_B$ are equal and in fact equal to 1 (otherwise the linear combination is trivial).
We have proved that for any even/odd town with $n + 1$ clubs, $\sum_{A \in \mathcal{F}} 1_A = 0$. Moreover, for
any $B \in \mathcal{F}$, $0 = 1_B \cdot \sum_{A \in \mathcal{F}} 1_A = |\mathcal{F}| - 1 = n$, which means that $|\mathcal{F}|$ is odd and $n$ is even.
Now we use the following duality. Replace each set $A \in \mathcal{F}$ by its complement $\bar{A}$. Since the total
number of elements $n$ is even, we get $|\bar{A}|$ even and $|\bar{A} \cap \bar{B}|$ odd for any distinct $A, B \in \mathcal{F}$. This
means that the $n + 1$ complementary clubs $\bar{A}$ should also form an even/odd town and therefore
again, we should have $\sum_{A \in \mathcal{F}} 1_{\bar{A}} = 0$. But then,
\[ 0 = \sum_{A \in \mathcal{F}} 1_A + \sum_{A \in \mathcal{F}} 1_{\bar{A}} = |\mathcal{F}| \, \mathbf{1}, \]
where $\mathbf{1}$ is the all-ones vector. This implies that $|\mathcal{F}|$ is even, contradicting our previous conclusion
that $|\mathcal{F}|$ is odd.
3
Fisher’s inequality
A slight modification of the odd-town rules is that every two clubs share a fixed number of members
k (there is no condition here on the size of each club). We get a similar result here, which is known
as Fisher’s inequality.
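Theorem 3 (Fisher’s inequality). Suppose that $\mathcal{F} \subset 2^{[n]}$ is a family of nonempty clubs such that
for some fixed $k$, $|A \cap B| = k$ for every distinct $A, B \in \mathcal{F}$. Then $|\mathcal{F}| \le n$.
Proof. Again, we consider the incidence vectors $\{1_A : A \in \mathcal{F}\}$, this time as vectors in the real
vector space $\mathbb{R}^n$. We have $1_A \cdot 1_B = k$ for all $A \ne B$ in $\mathcal{F}$. Suppose that $\sum_{A \in \mathcal{F}} \alpha_A 1_A = 0$. Then
\[ 0 = \Big\| \sum_{A \in \mathcal{F}} \alpha_A 1_A \Big\|^2 = \Big( \sum_{A \in \mathcal{F}} \alpha_A 1_A \Big) \cdot \Big( \sum_{B \in \mathcal{F}} \alpha_B 1_B \Big) = \sum_{A \in \mathcal{F}} \alpha_A^2 |A| + \sum_{A \ne B \in \mathcal{F}} \alpha_A \alpha_B k = k \Big( \sum_{A \in \mathcal{F}} \alpha_A \Big)^2 + \sum_{A \in \mathcal{F}} \alpha_A^2 (|A| - k). \]
Note that $|A| \ge k$, and at most one set $A^*$ can actually have size $k$. Therefore, the contributions to
the last expression are all nonnegative and $\alpha_A = 0$ except for $|A^*| = k$. But then, $\sum_{A \in \mathcal{F}} \alpha_A = \alpha_{A^*}$
and this must be zero as well.
We have proved that the vectors $\{1_A : A \in \mathcal{F}\}$ are linearly independent in $\mathbb{R}^n$ and hence their
number can be at most $n$.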
Fisher’s inequality is related to the study of designs, set systems with special intersection patterns. We show here how such a system can be used to construct a graph on $n$ vertices, which does
not have any clique or independent set of size $\omega(n^{1/3})$. Recall that in a random graph, there are
no cliques or independent sets significantly larger than $\log n$; so this explicit construction is very
weak in comparison.
Lemma 1. For a fixed $k$, let $G$ be a graph whose vertices are triples $T \in \binom{[k]}{3}$ and $\{A, B\}$ is an
edge if $|A \cap B| = 1$. Then $G$ does not contain any clique or independent set of size more than $k$.
Proof. Suppose Q is a clique in G. This means we have a set of triples on [k] where each pair
intersects in exactly one element. By Fisher’s inequality, the number of such triples can be at most
k.
Suppose S is an independent set in G. This is a set of triples on [k] where each pair intersects
in an even number of elements, either 0 or 2. By the odd-town theorem, the number of such triples
is again at most k.
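The lemma is easy to test by brute force for small $k$. The sketch below is an illustration added here, with hypothetical helper names; it builds the graph on triples and computes the clique number of the graph and of its complement with a simple branching search, which is only practical for a few dozen vertices.

```python
from itertools import combinations

def triple_graph(k):
    """Vertices are 3-subsets of {0..k-1}; two triples are adjacent iff they share exactly one element."""
    verts = [frozenset(t) for t in combinations(range(k), 3)]
    adj = {v: {w for w in verts if w != v and len(v & w) == 1} for v in verts}
    return verts, adj

def max_clique(verts, adj):
    """Size of a largest clique, by branching with a simple size-based cutoff."""
    best = 0

    def grow(size, candidates):
        nonlocal best
        best = max(best, size)
        for i, v in enumerate(candidates):
            # Even taking every remaining candidate cannot beat the current best.
            if size + len(candidates) - i <= best:
                return
            grow(size + 1, [w for w in candidates[i + 1:] if w in adj[v]])

    grow(0, verts)
    return best

def clique_and_independence(k):
    """Return (clique number, independence number) of the triple graph on [k]."""
    verts, adj = triple_graph(k)
    co_adj = {v: {w for w in verts if w != v and w not in adj[v]} for v in verts}
    return max_clique(verts, adj), max_clique(verts, co_adj)
```

For $k = 7$ the clique bound is attained: the seven lines of the Fano plane are triples that pairwise meet in exactly one point.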
Another application of Fisher’s inequality is the following.
Lemma 2. Suppose P is a set of n points in the plane, not all on one line. Then pairs of points
from P define at least n distinct lines.
Proof. Let $L$ be the set of lines defined by pairs of points from $P$. For each point $x_i \in P$, let $A_i \subseteq L$
be the set of lines containing $x_i$. We have $|A_i| \ge 2$, otherwise all points lie on the same line. Also,
$A_i$ is different for each point; the same set of at least 2 lines would define the same point. Moreover,
any two points share exactly one line, i.e. $|A_i \cap A_{i'}| = 1$ for any $i \ne i'$. By Fisher’s inequality, we
get $|P| \le |L|$.
Lecture 16: Linear algebra - continued
1
Spaces of polynomials
In the previous lecture, we considered objects represented by 0/1 vectors. A vector $1_A$ corresponding
to a set $A$ can be also viewed as a linear form $f(\vec{x}) = \sum_{i \in A} x_i$. (All our proofs could be written
equivalently in the language of linearly independent linear forms.) More generally, however, we can
represent objects by polynomials f (x). Polynomials of a certain degree form a vector space and we
can still apply the same arguments about dimension and linear independence. This gives us more
flexibility and power compared to the linear case.
2
Two-distance sets
Consider a set of points $A \subset \mathbb{R}^n$. If all the pairwise distances between points in $A$ are equal, then
these are the vertices of a simplex. The number of such points can be at most $n + 1$.
What if we relax the condition and require that there are two possible distances $c, d$, so that
any pairwise distance is either $c$ or $d$? Such a set is called a two-distance set.
Exercise. Construct a two-distance set in $\mathbb{R}^n$ with $\binom{n}{2}$ points.
Theorem 1. Any two-distance set in $\mathbb{R}^n$ has at most $\frac{1}{2}(n + 1)(n + 4)$ points.
Proof. Let $A \subset \mathbb{R}^n$ be a two-distance set. For each point $a \in A$, we define a polynomial on $\mathbb{R}^n$,
\[ f_a(x) = \left( \|x - a\|^2 - c^2 \right) \left( \|x - a\|^2 - d^2 \right). \]
Here, $\|x\|^2 = \sum_i x_i^2$ denotes the square of the euclidean norm. Let’s prove that the polynomials
$f_a(x)$ are linearly independent. Suppose that $\sum_{a \in A} \alpha_a f_a(x)$ is identically zero. Then plug in $x = b$
for some point $b \in A$. We have $f_a(b) = 0$ for any $b \ne a$, because $\|a - b\|$ is either $c$ or $d$. So we
have $0 = \sum_{a \in A} \alpha_a f_a(b) = \alpha_b f_b(b) = \alpha_b c^2 d^2$. Since $cd \ne 0$, this implies $\alpha_b = 0$, for any $b \in A$. This
shows that the polynomials $f_a(x)$ are linearly independent.
Finally, we want to bound the dimension of the vector space containing our polynomials. By
expanding the euclidean norms, it can be seen that each $f_a(x)$ can be expressed as a linear combination of the following polynomials:
\[ V = \Big\{ \Big( \sum_{i=1}^n x_i^2 \Big)^2, \; x_j \sum_{i=1}^n x_i^2, \; x_i x_j, \; x_i, \; 1 \;\Big|\; i, j \in [n] \Big\}. \]
The number of generators here is $1 + n + \frac{1}{2} n(n + 1) + n + 1 = \frac{1}{2}(n + 1)(n + 4)$. Therefore, the
polynomials $f_a(x)$ reside in a vector space of dimension at most $\frac{1}{2}(n + 1)(n + 4)$, and there can be at most that many of them.
3
Sets with few possible intersection sizes
Here we discuss a generalization of Fisher’s inequality. Consider a family of sets $\mathcal{F} \subseteq 2^{[n]}$ and let
$L \subset \{0, 1, \ldots, n\}$. We say that $\mathcal{F}$ is $L$-intersecting if $|A \cap B| \in L$ for any distinct $A, B \in \mathcal{F}$. Fisher’s
inequality says that if $|L| = 1$ then $|\mathcal{F}| \le n$. Frankl and Wilson proved the following generalization
in 1981.
Theorem 2. If $\mathcal{F}$ is an $L$-intersecting family of subsets of $[n]$, then
\[ |\mathcal{F}| \le \sum_{k=0}^{|L|} \binom{n}{k}. \]
Note that the family of all subsets of size at most $\ell$ is $L$-intersecting, for $L = \{0, 1, \ldots, \ell - 1\}$,
so this bound is best possible.
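Proof. Let $\mathcal{F} \subset 2^{[n]}$ and $|L| = s$. For any $A \ne B \in \mathcal{F}$, $|A \cap B| \in L$. We define a polynomial on $\mathbb{R}^n$
for each $A \in \mathcal{F}$:
\[ f_A(x) = \prod_{\ell \in L : \ell < |A|} \Big( \sum_{e \in A} x_e - \ell \Big). \]
Observe that for any $B \in \mathcal{F}$, $B \ne A$, with $|B| \le |A|$, if we plug in the indicator vector $1_B$, we get
\[ f_A(1_B) = \prod_{\ell \in L : \ell < |A|} (|A \cap B| - \ell) = 0 \]
because $A \not\subseteq B$ and hence $|A \cap B| = \ell < |A|$ for some $\ell \in L$. On the other hand,
\[ f_A(1_A) = \prod_{\ell \in L : \ell < |A|} (|A| - \ell) > 0. \]
By an argument similar to the one we used before, the polynomials $\{f_A(x) : A \in \mathcal{F}\}$ are independent:
in a vanishing linear combination $\sum_A \alpha_A f_A = 0$, plug in $1_B$ for a set $B$ of smallest size among those with $\alpha_B \ne 0$;
all other terms with nonzero coefficients vanish, which forces $\alpha_B = 0$.
It remains to compute the dimension of the space containing all these polynomials. A trick
that helps reduce the dimension is that we are only using 0/1 vectors here. Thus, we can replace
all higher powers $x_i^k$ by $x_i$ itself; this does not change the values on 0/1 vectors and hence does not change the linear independence property. Then, the
polynomials are generated by all monomials $\prod_{i \in I} x_i$, where $|I| \le s$. The number of such monomials
is exactly $\sum_{k=0}^{s} \binom{n}{k}$, as required.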
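The remark that the bound is best possible can be confirmed numerically for small parameters. This is a small sketch with hypothetical function names: it builds the family of all subsets of size at most $s$, lists the intersection sizes that occur, and compares the family size with $\sum_{k=0}^{s} \binom{n}{k}$.

```python
from itertools import combinations
from math import comb

def frankl_wilson_bound(n, s):
    """Upper bound sum_{k=0}^{s} C(n, k) for an L-intersecting family with |L| = s."""
    return sum(comb(n, k) for k in range(s + 1))

def small_sets_family(n, s):
    """All subsets of {0..n-1} of size at most s."""
    return [frozenset(c) for k in range(s + 1) for c in combinations(range(n), k)]

def intersection_sizes(family):
    """Set of values |A & B| over all pairs of distinct members."""
    return {len(a & b) for a, b in combinations(family, 2)}
```

With $n = 6$ and $s = 2$, the family has $1 + 6 + 15 = 22$ members and only the intersection sizes 0 and 1 occur, so the bound with $L = \{0, 1\}$ is met with equality.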
Using essentially the same argument, we can also prove the following modular version of the
theorem.
Theorem 3. Let $p$ be prime and $L \subset \mathbb{Z}_p$. Assume $\mathcal{F} \subset 2^{[n]}$ is a family of sets such that
• $|A| \notin L \pmod p$ for any $A \in \mathcal{F}$.
• $|A \cap B| \in L \pmod p$ for any distinct $A, B \in \mathcal{F}$.
Then
\[ |\mathcal{F}| \le \sum_{k=0}^{|L|} \binom{n}{k}. \]
Proof. Let $\mathcal{F} \subset 2^{[n]}$ and $L \subset \mathbb{Z}_p$. In the following, all operations are mod $p$. For any $A, B \in \mathcal{F}$
distinct, $|A \cap B| \in L$. We define a polynomial on $\mathbb{Z}_p^n$ for each $A \in \mathcal{F}$:
\[ f_A(x) = \prod_{\ell \in L} \Big( \sum_{e \in A} x_e - \ell \Big). \]
Observe that for any $B \in \mathcal{F}$, $B \ne A$, if we plug in the indicator vector $1_B$, we get
\[ f_A(1_B) = \prod_{\ell \in L} (|A \cap B| - \ell) = 0 \]
because $|A \cap B| \in L$. On the other hand,
\[ f_A(1_A) = \prod_{\ell \in L} (|A| - \ell) \ne 0. \]
Again, we replace each $f_A(x)$ by $\tilde{f}_A(x)$ where each factor $x_i^k$ is replaced by $x_i$. Since we are only
substituting 0/1 values, this does not affect the properties above. Hence, the polynomials $\tilde{f}_A(x)$
are independent. They are generated by all monomials $\prod_{i \in I} x_i$, where $|I| \le |L|$. The number of
such monomials is exactly $\sum_{k=0}^{|L|} \binom{n}{k}$, as required.
Lecture 17: Linear algebra - continued
1
Few possible intersections - summary
Last time, we proved two results about families of sets with few possible intersection sizes. Let us
compare them here.
Theorem 1. If $\mathcal{F}$ is an $L$-intersecting family of subsets of $[n]$, then
\[ |\mathcal{F}| \le \sum_{k=0}^{|L|} \binom{n}{k}. \]
Theorem 2. Let $p$ be prime and $L \subset \mathbb{Z}_p$. Assume $\mathcal{F} \subset 2^{[n]}$ is an $L$-intersecting family (with
intersections taken mod $p$), and no set in $\mathcal{F}$ has size in $L$ (mod $p$). Then
\[ |\mathcal{F}| \le \sum_{k=0}^{|L|} \binom{n}{k}. \]
Both results have interesting applications. First, let’s return to Ramsey graphs.
2
Explicit Ramsey graphs
We saw how to construct a graph on $n = \binom{k}{3}$ vertices, which does not contain any clique or
independent set larger than $k$. Here, we improve this construction to $n = k^{\Omega(\log k / \log \log k)}$, i.e.
superpolynomial in $k$.
Theorem 3 (Frankl, Wilson 1981). For any prime $p$, there is a graph $G$ on $n = \binom{p^3}{p^2 - 1}$ vertices
such that any clique or independent set in $G$ has size at most $k = \sum_{i=0}^{p-1} \binom{p^3}{i}$.
Note that $n \simeq p^{3(p^2 - 1)}$, while $k \simeq p^{3(p - 1)}$. I.e., $n \simeq k^{p+1} \simeq k^{\log k / \log \log k}$.
Proof. We construct $G$ as follows. Let $V = \binom{[p^3]}{p^2 - 1}$, and let $A, B \in V$ form an edge if $|A \cap B| \not\equiv p - 1 \pmod p$.
Note that for each $A \in V$, $|A| = p^2 - 1 \equiv p - 1 \pmod p$.
If $A_1, \ldots, A_k$ is a clique, then $|A_i| \equiv p - 1 \pmod p$, while $|A_i \cap A_j| \not\equiv p - 1 \pmod p$ for all
$i \ne j$. By Theorem 2 with $L = \{0, 1, \ldots, p - 2\}$, we get $k \le \sum_{i=0}^{p-1} \binom{p^3}{i}$.
If $A_1, \ldots, A_k$ is an independent set, then $|A_i \cap A_j| \equiv p - 1 \pmod p$ for all $i \ne j$. This means
$|A_i \cap A_j| \in L = \{p - 1, 2p - 1, \ldots, p^2 - p - 1\}$, without any modulo operations. By Theorem 1, we
get $k \le \sum_{i=0}^{p-1} \binom{p^3}{i}$.
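To get a feeling for the parameters of the Frankl-Wilson graph, one can simply evaluate the two binomial expressions for small primes. The helper below is a hypothetical illustration that only computes the number of vertices $n$ and the bound $k$ on cliques and independent sets; it does not build the graph.

```python
from math import comb, log

def frankl_wilson_parameters(p):
    """Return (n, k): vertex count C(p^3, p^2 - 1) and bound sum_{i<p} C(p^3, i)."""
    n = comb(p ** 3, p ** 2 - 1)
    k = sum(comb(p ** 3, i) for i in range(p))
    return n, k

def growth_exponent(p):
    """The exponent e with n = k^e, i.e. log n / log k."""
    n, k = frankl_wilson_parameters(p)
    return log(n) / log(k)
```

For $p = 2$ this gives a graph on 56 vertices with no clique or independent set of size more than 9, and for $p = 3$ a graph on 2220075 vertices with bound 379; the exponent $\log n / \log k$ grows with $p$.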
3
Borsuk’s conjecture
Can every bounded set $S \subset \mathbb{R}^d$ be partitioned into $d + 1$ sets of strictly smaller diameter?
This conjecture was a long-standing open problem, solved in the special cases of a sphere $S$
(by Borsuk himself), $S$ being a smooth body (using the Borsuk-Ulam theorem) and low dimension
$d \le 3$. It can be seen that a simplex requires $d + 1$ sets, otherwise we have 2 vertices in the same
part and hence the diameter does not decrease.
The conjecture was disproved dramatically in 1993, when Kahn and Kalai showed that significantly more than $d + 1$ parts are required.
Theorem 4. For any $d$ sufficiently large, there exists a bounded set $S \subset \mathbb{R}^d$ (in fact a finite set)
such that any partition of $S$ into fewer than $1.2^{\sqrt{d}}$ parts contains a part of the same diameter.
The proof uses an algebraic construction, relying on the following lemma.
Lemma 1. For any prime $p$, there exists a set of $\frac{1}{2} \binom{4p}{2p}$ vectors $F \subseteq \{-1, +1\}^{4p}$ such that every
subset of $2 \binom{4p}{p-1}$ vectors contains an orthogonal pair of vectors.
Proof. Consider $4p$ elements and all subsets of size $2p$, containing a fixed element 1:
\[ \mathcal{F} = \{ I : I \subseteq [4p], \; |I| = 2p, \; 1 \in I \}. \]
For each set $I$, we define a vector $v^I$ by $v^I_i = +1$ if $i \in I$ and $v^I_i = -1$ if $i \notin I$. We set $F = \{v^I : I \in \mathcal{F}\}$.
The only way that a pair of such vectors $v^I, v^J$ can be orthogonal is that $|I \Delta J| = 2p$ and then
$|I \cap J| = p$. Note that $|I \cap J|$ is always between 1 and $2p - 1$ ($I, J$ are different and they share at
least 1 element). Hence $v^I \cdot v^J = 0$ iff $|I \cap J| \equiv 0 \pmod p$.
We claim that this is the desired collection of vectors. For a subset $G \subset F$ without any
orthogonal pair, we would have a family of sets $\mathcal{G} \subset \mathcal{F}$ such that
• $\forall I \in \mathcal{G}$: $|I| \equiv 0 \pmod p$.
• $\forall I, J \in \mathcal{G}$ distinct: $|I \cap J| \in \{1, 2, \ldots, p - 1\} \pmod p$.
By Theorem 2,
\[ |\mathcal{G}| \le \sum_{k=0}^{p-1} \binom{4p}{k} < 2 \binom{4p}{p-1}. \]
Now we are ready to prove the theorem.
Proof. Given a set of vectors $F \subseteq \mathbb{R}^n = \mathbb{R}^{4p}$ provided by the lemma above, we define a set of
vectors
\[ X = \{ v \otimes v : v \in F \} \subset \mathbb{R}^{n^2}. \]
Here, each vector is a tensor product $w = v \otimes v$. More explicitly,
\[ w_{ij} = v_i v_j, \quad 1 \le i, j \le n. \]
These vectors satisfy the following properties:
• $w \in \{-1, +1\}^{n^2}$; $\|w\| = \sqrt{n^2} = n$.
• $w \cdot w' = (v \otimes v) \cdot (v' \otimes v') = (v \cdot v')^2 \ge 0$.
• $w, w'$ are orthogonal if and only if $v, v'$ are orthogonal.
• $\|w - w'\|^2 = \|w\|^2 + \|w'\|^2 - 2(w \cdot w') = 2n^2 - 2(v \cdot v')^2 \le 2n^2$, and the pairs of maximum
distance correspond to orthogonal vectors.
By the lemma, any subset of $2 \binom{4p}{p-1}$ vectors contains an orthogonal pair and so its diameter
is the same as that of the original set. If we want to decrease the diameter, we must partition $X$ into
sets of size less than $2 \binom{4p}{p-1}$, and the number of such parts is at least
\[ \frac{|X|}{2 \binom{4p}{p-1}} = \frac{\frac{1}{2} \binom{4p}{2p}}{2 \binom{4p}{p-1}} = \frac{(3p + 1)(3p)(3p - 1) \cdots (2p + 2)(2p + 1)}{4 (2p)(2p - 1) \cdots (p + 1) p} \ge \left( \frac{3}{2} \right)^{p-1}. \]
The dimension of our space is $d = n^2 = (4p)^2$, and the number of parts must be at least
$(3/2)^{p-1} = (3/2)^{\sqrt{d}/4 - 1}$. (The bound can be somewhat improved by a more careful analysis.)
Lecture 18: Spectral graph theory
1
Eigenvalues of graphs
Looking at a graph, we see some basic parameters: the maximum degree, the minimum degree, its
connectivity, maximum clique, maximum independent set, etc. Parameters which are less obvious
yet very useful are the eigenvalues of the graph. Eigenvalues are a standard notion in linear algebra,
defined as follows.
Definition 1. For a matrix $A \in \mathbb{R}^{n \times n}$, a number $\lambda$ is an eigenvalue if for some vector $x \ne 0$,
$Ax = \lambda x$.
The vector x is called an eigenvector corresponding to λ.
Some basic properties of eigenvalues are
• The eigenvalues are exactly the numbers λ that make the matrix A−λI singular, i.e. solutions
of det(A − λI) = 0.
• All eigenvectors corresponding to λ form a subspace Vλ ; the dimension of Vλ is called the
multiplicity of λ.
• In general, eigenvalues can be complex numbers. However, if $A$ is a symmetric matrix ($a_{ij} = a_{ji}$),
then all eigenvalues are real, and moreover there is an orthogonal basis consisting of
eigenvectors.
• The sum of all eigenvalues, including multiplicities, is $\sum_{i=1}^n \lambda_i = \mathrm{Tr}(A) = \sum_{i=1}^n a_{ii}$, the trace
of $A$.
• The product of all eigenvalues, including multiplicities, is $\prod_{i=1}^n \lambda_i = \det(A)$, the determinant
of $A$.
• For a symmetric matrix, the number of non-zero eigenvalues, including multiplicities, is the rank of $A$.
For graphs, we define eigenvalues as the eigenvalues of the adjacency matrix.
Definition 2. For a graph G, the adjacency matrix A(G) is defined as follows:
• $a_{ij} = 1$ if $(i, j) \in E(G)$.
• $a_{ij} = 0$ if $i = j$ or $(i, j) \notin E(G)$.
Because T r(A(G)) = 0, we get immediately the following.
Lemma 1. The sum of all eigenvalues of a graph is always 0.
Examples.
1. The complete graph Kn has an adjacency matrix equal to A = J − I, where J is the all-1’s
matrix and I is the identity. The rank of J is 1, i.e. there is one nonzero eigenvalue equal
to n (with an eigenvector 1 = (1, 1, . . . , 1)). All the remaining eigenvalues are 0. Subtracting
the identity shifts all eigenvalues by −1, because Ax = (J − I)x = Jx − x. Therefore the
eigenvalues of Kn are n − 1 and −1 (of multiplicity n − 1).
2. If $G$ is $d$-regular, then $\mathbf{1} = (1, 1, \ldots, 1)$ is an eigenvector. We get $A\mathbf{1} = d\mathbf{1}$, and hence $d$ is an
eigenvalue. It is easy to see that no eigenvalue can be larger than $d$. In general graphs, the
largest eigenvalue can be viewed as a spectral notion of the typical degree in $G$: it always lies between the average degree and the maximum degree.
3. If $G$ is $d$-regular and $d = \lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$ are the eigenvalues of $G$, then the eigenvalues
of the complement $\bar{G}$ are $n - 1 - d$ and $\{-1 - \lambda_i : 2 \le i \le n\}$. This is because $A(\bar{G}) = J - I - A(G)$; $\bar{G}$
is $(n - 1 - d)$-regular, so the largest eigenvalue is $n - 1 - d$. Any other eigenvalue $\lambda$ of $G$ has an
eigenvector $x$ orthogonal to $\mathbf{1}$, and hence
\[ A(\bar{G}) x = (J - I - A(G)) x = 0 - x - \lambda x = (-1 - \lambda) x. \]
4. The complete bipartite graph $K_{m,n}$ has an adjacency matrix of rank 2, therefore we expect
to have eigenvalue 0 of multiplicity $m + n - 2$, and two non-trivial eigenvalues. These should be
equal to $\pm\lambda$, because the sum of all eigenvalues is always 0.
We find $\lambda$ by solving $Ax = \lambda x$. By symmetry, we guess that the eigenvector $x$ should have $m$
coordinates equal to $\alpha$ (on the side with $m$ vertices) and $n$ coordinates equal to $\beta$. Since each vertex is adjacent to all vertices on the other side,
\[ Ax = (n\beta, \ldots, n\beta, m\alpha, \ldots, m\alpha). \]
This should be a multiple of $x = (\alpha, \ldots, \alpha, \beta, \ldots, \beta)$. Therefore, we get $n\beta = \lambda\alpha$ and
$m\alpha = \lambda\beta$, i.e. $mn\beta = \lambda^2 \beta$ and $\lambda = \sqrt{mn}$.
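The eigenvector guess for the complete bipartite graph can be verified directly. The sketch below is an added illustration with hypothetical names; it assumes the concrete choice $\alpha = \sqrt{n}$ on the $m$ vertices of one side and $\beta = \pm\sqrt{m}$ on the $n$ vertices of the other side, and checks $Ax = \lambda x$ for $\lambda = \pm\sqrt{mn}$ with plain lists.

```python
from math import sqrt, isclose

def complete_bipartite_adjacency(m, n):
    """Adjacency matrix of K_{m,n}: vertices 0..m-1 on one side, m..m+n-1 on the other."""
    size = m + n
    return [[1 if (i < m) != (j < m) else 0 for j in range(size)] for i in range(size)]

def check_eigenpair(m, n, sign=1):
    """Check A x = lam x for lam = sign * sqrt(mn) and the symmetric guess for x."""
    A = complete_bipartite_adjacency(m, n)
    lam = sign * sqrt(m * n)
    # Assumed normalization: alpha = sqrt(n) on the m-side, beta = sign * sqrt(m) on the n-side.
    x = [sqrt(n)] * m + [sign * sqrt(m)] * n
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return all(isclose(y, lam * xi, abs_tol=1e-9) for y, xi in zip(Ax, x))
```

Both signs work, which matches the observation that the two non-trivial eigenvalues are $\pm\sqrt{mn}$.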
Lecture 19: The Petersen graph and Moore graphs
1
The Petersen graph
As a more interesting exercise, we will compute the eigenvalues of the Petersen graph.
Definition 1. The Petersen graph is a graph with 10 vertices and 15 edges. It can be described in
the following three ways:
1. The Kneser graph KG(5, 2), of pairs on 5 elements, where edges are formed by disjoint pairs.
2. The complement of the line graph of $K_5$: the vertices of the line graph are the edges of $K_5$,
and two edges are joined if they share a vertex.
3. Take two disjoint copies of $C_5$: $(v_1, v_2, v_3, v_4, v_5)$ and $(w_1, w_2, w_3, w_4, w_5)$. Then add a matching of 5 edges between them: $(v_1, w_1), (v_2, w_3), (v_3, w_5), (v_4, w_2), (v_5, w_4)$.
The Petersen graph is a very interesting small graph, which provides a counterexample to many
graph-theoretic statements. For example,
• It is the smallest bridgeless 3-regular graph, which has no 3-coloring of the edges so that
adjacent edges get different colors (the smallest “snark”).
• It is the smallest 3-regular graph of girth 5.
• It is the largest 3-regular graph of diameter 2.
• It has 2000 spanning trees, the most of any 3-regular graph on 10 vertices.
To compute the eigenvalues of the Petersen graph, we use the fact that it is strongly regular.
This means that not only does each vertex have the same degree (3), but each pair of vertices
$(u, v) \in E$ has the same number of shared neighbors (0), and each pair of distinct vertices $(u, v) \notin E$ has
the same number of shared neighbors (1). In terms of the adjacency matrix, this can be expressed
as follows:
• $(A^2)_{ij} = \sum_k a_{ik} a_{kj}$ is the number of neighbors shared by $i$ and $j$.
• For $i = j$, $(A^2)_{ij} = 3$.
• For $i \ne j$, $(A^2)_{ij} = 1 - a_{ij}$: either 0 or 1 depending on whether $(i, j) \in E$.
In concise form, this can be written as
\[ A^2 + A - 2I = J. \]
Now consider any eigenvector, $Ax = \lambda x$. We know that one eigenvector is $\mathbf{1}$ which has eigenvalue
$d = 3$. Other than that, all eigenvectors $x$ are orthogonal to $\mathbf{1}$, which also means that $Jx = 0$.
Then we get
\[ (A^2 + A - 2I) x = \lambda^2 x + \lambda x - 2x = 0. \]
This means that each eigenvalue apart from the largest one should satisfy a quadratic equation
$\lambda^2 + \lambda - 2 = 0$. This equation has two roots, 1 and $-2$.
Finally, we calculate the multiplicity of each root from the condition that $\sum \lambda_i = 0$. The
largest eigenvalue has multiplicity 1 (it is obvious that any vector such that $Ax = 3x$ is a multiple
of $\mathbf{1}$). Therefore, if eigenvalue 1 comes with multiplicity $a$ and $-2$ with multiplicity $b$, we get
$3 + a \cdot 1 + b \cdot (-2) = 0$ and $a + b = 9$, which implies $a = 5$ and $b = 4$. We conclude that the Petersen
graph has eigenvalues including multiplicities $(3, 1, 1, 1, 1, 1, -2, -2, -2, -2)$.
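The computation above can be replayed on the Kneser-graph description of the Petersen graph. The following sketch uses hypothetical helper names and plain integer matrices; it checks the identity $A^2 + A - 2I = J$ entry by entry and solves the two linear conditions on the multiplicities by a tiny search.

```python
from itertools import combinations

def petersen_adjacency():
    """Adjacency matrix of KG(5, 2): vertices are pairs from {0..4}, adjacent iff disjoint."""
    verts = list(combinations(range(5), 2))
    return [[int(not set(u) & set(v)) for v in verts] for u in verts]

def matmul(X, Y):
    """Product of two square integer matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def satisfies_identity(A):
    """Check A^2 + A - 2I = J, i.e. every entry of the left-hand side equals 1."""
    n = len(A)
    A2 = matmul(A, A)
    return all(A2[i][j] + A[i][j] - 2 * (i == j) == 1 for i in range(n) for j in range(n))

def multiplicities():
    """All (a, b) with a + b = 9 and 3 + a - 2b = 0 (trace condition)."""
    return [(a, b) for a in range(10) for b in range(10) if a + b == 9 and 3 + a - 2 * b == 0]
```

As a further consistency check, the trace of $A^2$ equals the sum of the degrees, 30, which agrees with $3^2 + 5 \cdot 1^2 + 4 \cdot (-2)^2$.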
Finally, we show an application of eigenvalues to the following question. Consider 3 overlapping
copies of the Petersen graph. The degrees in each copy are equal to 3, so the degrees in total could
add up to 9 and form the complete graph K10 . However, something does not work here when you
try it. The following statement shows that indeed this is impossible.
Theorem 1. There is no decomposition of the edge set of K10 into 3 copies of the Petersen graph.
Proof. Suppose that $A, B, C$ are adjacency matrices of different permutations of the Petersen graph,
such that they add up to the adjacency matrix of $K_{10}$, $A + B + C = J - I$. Let $V_A$ and $V_B$ be
the subspaces corresponding to eigenvalue 1 for matrices $A$ and $B$, respectively. We know that
$\dim(V_A) = \dim(V_B) = 5$, and moreover both $V_A$ and $V_B$ are orthogonal to the eigenvector $\mathbf{1}$. This
implies that they cannot intersect only in the zero vector (then $V_A$, $V_B$ and $\mathbf{1}$ together would give 11 independent vectors in $\mathbb{R}^{10}$), and
therefore there is a nonzero vector $z \in V_A \cap V_B$. This vector is also orthogonal to $\mathbf{1}$, i.e. $Jz = 0$.
Therefore, we get
\[ Cz = (J - I - A - B) z = -z - Az - Bz = -3z. \]
But we know that $-3$ is not an eigenvalue of the Petersen graph, which is a contradiction.
2
Moore graphs and cages
The Petersen graph is a special case of the following kind of graph: Suppose that G is d-regular,
starting from any vertex it looks like a tree up to distance k and within distance k we already see
the entire graph. In other words, the diameter of the graph is k and the girth is 2k + 1. Such
graphs are called Moore graphs.
By simple counting, we get that the number of vertices in such a graph must be
\[ n_{d,k} = 1 + d \sum_{i=0}^{k-1} (d - 1)^i. \]
This is obviously the minimum possible number of vertices for a d-regular graph of girth 2k + 1.
Such graphs are also called cages.
The Petersen graph is a (unique) example of a 3-regular Moore graph of diameter 2 and girth
5. There are surprisingly few known examples of Moore graphs. We prove here that for girth 5
there cannot be too many indeed.
Theorem 2 (Hoffman-Singleton). A $d$-regular Moore graph of diameter 2 can exist only for $d =
2, 3, 7$ and possibly 57.
Proof. Assume $G$ is a $d$-regular Moore graph of girth 5. The number of vertices is $n = 1 + d + d(d - 1) = d^2 + 1$.
Again, we consider the square of the adjacency matrix $A^2$. Observe that adjacent
vertices don’t share any neighbors, otherwise there is a triangle in $G$. Non-adjacent vertices share
exactly one neighbor, because the diameter of $G$ is 2 and there is no 4-cycle in $G$. Hence, $A^2$ has
$d$ on the diagonal, 0 for edges and 1 for non-edges. In other words,
\[ A^2 + A - (d - 1) I = J. \]
If $\lambda$ is an eigenvalue of $A$ different from $d$, we get $\lambda^2 + \lambda - (d - 1) = 0$. This means
\[ \lambda = -\frac{1}{2} \pm \frac{1}{2} \sqrt{1 + 4(d - 1)} = -\frac{1}{2} \pm \frac{1}{2} \sqrt{4d - 3}. \]
Assume that $-\frac{1}{2} + \frac{1}{2}\sqrt{4d - 3}$ has multiplicity $a$ and $-\frac{1}{2} - \frac{1}{2}\sqrt{4d - 3}$ has multiplicity $b$. Since the eigenvalues sum to 0, we get
\[ d - \frac{a + b}{2} + \frac{1}{2} (a - b) \sqrt{4d - 3} = 0. \]
We also know that $a + b = n - 1 = d^2$. Therefore,
\[ (a - b) \sqrt{4d - 3} = a + b - 2d = d^2 - 2d. \]
This can be true only if $a = b$ and $d = 2$, or else $4d - 3$ is a square. Let $4d - 3 = s^2$, i.e. $d = \frac{1}{4}(s^2 + 3)$.
Substituting this into the equation
\[ d - \frac{d^2}{2} + \frac{s}{2} (2a - d^2) = 0, \]
we get
\[ \frac{1}{4}(s^2 + 3) - \frac{1}{32}(s^2 + 3)^2 + \frac{s}{2} \Big( 2a - \frac{1}{16}(s^2 + 3)^2 \Big) = 0. \]
From here, we get
\[ s^5 + s^4 + 6s^3 - 2s^2 + (9 - 32a) s = 15. \]
To satisfy this equation by integers, $s$ must divide 15 and hence $s \in \{1, 3, 5, 15\}$, giving $d \in
\{1, 3, 7, 57\}$. Case $d = 1$ leads to $G = K_2$ which is not a Moore graph.
We remark that the graph for $d = 2$ is $C_5$, for $d = 3$ it is the Petersen graph, for $d = 7$ it is
the “Hoffman-Singleton graph” (with 50 vertices and 175 edges) and for $d = 57$ it is not known
whether such a graph exists. This graph would need to have 3250 vertices, 92,625 edges, diameter
2 and girth 5.
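The integrality argument can also be run as a search over degrees. The sketch below is a hypothetical helper added for illustration; it uses only the two conditions from the proof, namely $a + b = d^2$ and $(a - b)\sqrt{4d - 3} = d^2 - 2d$, and reports the degrees for which nonnegative integer multiplicities $a, b$ exist.

```python
from math import isqrt

def feasible_moore_degrees(max_d):
    """Degrees 2 <= d <= max_d passing the eigenvalue-multiplicity test for girth 5, diameter 2."""
    feasible = []
    for d in range(2, max_d + 1):
        rhs = d * d - 2 * d            # must equal (a - b) * sqrt(4d - 3)
        s = isqrt(4 * d - 3)
        if s * s != 4 * d - 3:
            # sqrt(4d - 3) is irrational, so a = b and rhs must vanish.
            if rhs == 0:
                feasible.append(d)
            continue
        if rhs % s != 0:
            continue
        diff = rhs // s                # a - b
        if (d * d + diff) % 2 == 0:    # a = (d^2 + diff) / 2 must be an integer
            feasible.append(d)
    return feasible
```

Passing this test is only a necessary condition; in particular it says nothing about whether a graph with $d = 57$ actually exists.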
Lecture 20: Friends and politicians
1
The friendship theorem
Theorem 1. Suppose G is a (finite) graph where any two vertices share exactly one neighbor. Then
there is a vertex adjacent to all other vertices.
The interpretation of this theorem is as follows: if any two people have exactly one friend in
common, then there is a person (the politician) who is everybody’s friend. We actually prove
a stronger statement, namely that the only graph with this structure consists of a collection of
triangles that all share one vertex.
Surprisingly, the friendship theorem is false for infinite graphs. Let $G_0 = C_5$, and let $G_{n+1}$ be
obtained from $G_n$ by adding a separate common neighbor to each pair of vertices that does not
have one yet. Then $G = \bigcup_{n=0}^{\infty} G_n$ is a counterexample to the theorem.
The theorem for finite graphs sounds somewhat similar to the Erdős-Ko-Rado theorem. Interestingly, the proof requires some spectral analysis.
Proof. Assume for the sake of contradiction that any two vertices in $G$ share exactly one neighbor,
but there is no vertex adjacent to all other vertices. Note that the first condition implies that there
is no $C_4$ subgraph in $G$.
First, we claim that $G$ is a regular graph. Suppose $(u, v) \notin E$ and $w_1, \ldots, w_k$ are the neighbors
of $u$. We know that $v$ and $w_i$ share a neighbor $z_i$ for each $i$. The vertices $z_i$ must be distinct,
otherwise we would get a $C_4$ (between $u$, $w_i$, $w_j$ and $z_i = z_j$). Therefore, $v$ also has at least
$k$ neighbors. By symmetry, we conclude that $\deg(u) = \deg(v)$ for any $(u, v) \notin E$. Assuming
that $w_1$ is the only shared neighbor of $u$ and $v$, any other vertex $w$ is adjacent to at most one
of $u, v$ and hence $\deg(w) = \deg(u) = \deg(v)$. Finally, $w_1$ is not adjacent to all these vertices, so
$\deg(w_1) = \deg(u) = \deg(v)$ as well.
Hence, all degrees are equal to $k$. The number of walks of length 2 from a fixed vertex $x$ is $k^2$.
Because every vertex $y \ne x$ has a unique path of length 2 from $x$, this way we count every vertex
once except $x$ itself, which is counted $k$ times. To account for that, we subtract $k - 1$ and the total
number of vertices is $n = k^2 - k + 1$.
We consider the adjacency matrix $A$. Since any two vertices share exactly one neighbor, the
matrix $A^2$ has $k$ on the diagonal and 1 everywhere else. We can write
\[ A^2 = J + (k - 1) I. \]
From this expression, it’s easy to see that $A^2$ has eigenvalues $n + k - 1 = k^2$, and $k - 1$ of multiplicity
$n - 1$. The eigenvalues of $A^2$ are squares of the eigenvalues of $A$, which are $k$ (the degree of each
vertex), and $\pm\sqrt{k - 1}$. We know that the eigenvalues should sum up to 0. Suppose that $\sqrt{k - 1}$ appears with
multiplicity $r$ and $-\sqrt{k - 1}$ appears with multiplicity $s$. This yields
\[ k + (r - s) \sqrt{k - 1} = 0. \]
This implies $k^2 = (s - r)^2 (k - 1)$, i.e. $k - 1$ divides $k^2$. This is possible only for $k = 1, 2$; otherwise,
$k - 1$ divides $k^2 - 1$ and hence cannot divide $k^2$. For $k = 1, 2$, we get two regular graphs: $K_1$ and
$K_3$. These both satisfy the conditions of the theorem and also the conclusion.
Otherwise, we conclude that there must be a vertex $x$ adjacent to all other vertices. Then it’s
easy to see that these vertices are matched up and form triangles with the vertex $x$.
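The extremal structure described in the theorem, a collection of triangles sharing one vertex, is easy to build and test. The sketch below is an added illustration with hypothetical function names; it checks the friendship property (every two vertices have exactly one common neighbor) on adjacency sets.

```python
from itertools import combinations

def windmill(t):
    """t triangles glued at vertex 0; returns a dict of adjacency sets."""
    adj = {0: set()}
    for i in range(t):
        a, b = 2 * i + 1, 2 * i + 2
        adj[a] = {0, b}
        adj[b] = {0, a}
        adj[0] |= {a, b}
    return adj

def cycle(n):
    """The cycle C_n as a dict of adjacency sets."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def has_friendship_property(adj):
    """True if every pair of distinct vertices has exactly one common neighbor."""
    return all(len(adj[u] & adj[v]) == 1 for u, v in combinations(adj, 2))
```

The windmill passes the check for every $t \ge 1$, while $C_5$, the starting graph of the infinite counterexample, fails it because adjacent vertices have no common neighbor.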
We finish with a related conjecture of Kotzig.
Conjecture. For any fixed $\ell > 2$, there is no finite graph such that every pair of vertices is
connected by precisely one path of length $\ell$.
For $\ell = 2$, we concluded that there is exactly one such graph - a collection of triangles joined
by one vertex. This conjecture has been verified for $3 \le \ell \le 33$, but a general proof remains elusive.
2
The variational definition of eigenvalues
We continue with an equivalent definition of eigenvalues.
Lemma 1. The $k$-th largest eigenvalue of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is equal to
\[ \lambda_k = \max_{\dim(U) = k} \; \min_{x \in U} \frac{x^T A x}{x^T x} = \min_{\dim(U) = k-1} \; \max_{x \perp U} \frac{x^T A x}{x^T x}. \]
Here, the maximum/minimum is over all subspaces $U$ of a given dimension, and over all nonzero
vectors $x$ in the respective subspace.
Proof. We only prove the first equality - the second one is analogous. First of all, note that the
quantity $x^T A x / x^T x$ is invariant under replacing $x$ by any nonzero multiple $\mu x$. Therefore, we can
assume that $x$ is a unit vector and $x^T x = 1$.
Consider an orthonormal basis of eigenvectors $u_1, u_2, \ldots, u_n$. Any vector $x \in \mathbb{R}^n$ can be written
as $x = \sum_{i=1}^n \alpha_i u_i$ and the expression $x^T A x$ reduces to
\[ x^T A x = \Big( \sum_{i=1}^n \alpha_i u_i \Big)^T A \Big( \sum_{j=1}^n \alpha_j u_j \Big) = \sum_{i,j=1}^n \alpha_i \alpha_j (u_i \cdot \lambda_j u_j) = \sum_{i=1}^n \alpha_i^2 \lambda_i, \]
using the fact that $u_i \cdot u_j = 1$ if $i = j$ and 0 otherwise. By a similar argument, $x^T x = \sum_{i=1}^n \alpha_i^2$
and for a unit vector we get $\sum_{i=1}^n \alpha_i^2 = 1$. I.e., the expression $x^T A x / x^T x$ can be interpreted as a
weighted average of the eigenvalues.
Now consider a subspace $U$ generated by the first $k$ eigenvectors, $[u_1, \ldots, u_k]$. For any unit
vector $x \in U$, we get $x^T A x = \sum_{i=1}^k \alpha_i^2 \lambda_i$ and $\sum_{i=1}^k \alpha_i^2 = 1$. This weighted average is at least the
smallest of the $k$ eigenvalues, i.e.
\[ \max_{\dim(U) = k} \; \min_{x \in U} \frac{x^T A x}{x^T x} \ge \lambda_k. \]
On the other hand, consider any subspace $U$ of dimension $k$, and a subspace $V = [u_k, u_{k+1}, \ldots, u_n]$
which has dimension $n - k + 1$. These two subspaces have a nontrivial intersection, namely there exists
a nonzero vector $z \in U \cap V$. We can assume that $z = \sum_{j=k}^n \beta_j u_j$ is a unit vector, $z^T z = \sum_{j=k}^n \beta_j^2 = 1$.
We obtain
\[ \frac{z^T A z}{z^T z} = \sum_{j=k}^n \beta_j^2 \lambda_j \le \lambda_k, \]
since this is a weighted average of the last $n - k + 1$ eigenvalues and the largest of these eigenvalues
is $\lambda_k$. Consequently,
\[ \max_{\dim(U) = k} \; \min_{x \in U} \frac{x^T A x}{x^T x} \le \frac{z^T A z}{z^T z} \le \lambda_k. \]
3
A bound on the independence number
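Theorem 2. For a $d$-regular graph with smallest (most negative) eigenvalue $\lambda_n$, the independence
number is
\[ \alpha(G) \le \frac{n}{1 - d/\lambda_n}. \]
Keep in mind that $\lambda_n < 0$, so the denominator is larger than 1.
Proof. Let $S \subseteq V$ be a maximum independent set, $|S| = \alpha$. We consider a vector $x = n 1_S - \alpha \mathbf{1}$.
(This vector can be seen as an indicator vector of $S$, modified to be orthogonal to $\mathbf{1}$.) By Lemma 1
with $U = \mathbb{R}^n$, we know that
\[ \frac{x^T A x}{x^T x} \ge \lambda_n. \]
It remains to compute $x^T A x$. We get
\[ x^T A x = n^2 \, 1_S^T A 1_S - 2\alpha n \, 1_S^T A \mathbf{1} + \alpha^2 \, \mathbf{1}^T A \mathbf{1}. \]
By the property of the independent set, we have $1_S^T A 1_S = \sum_{i,j \in S} a_{ij} = 0$. Similarly, we get
$1_S^T A \mathbf{1} = d \, 1_S \cdot \mathbf{1} = \alpha d$ and $\mathbf{1}^T A \mathbf{1} = d \, \mathbf{1} \cdot \mathbf{1} = dn$. All in all,
\[ x^T A x = -2\alpha n \cdot \alpha d + \alpha^2 \cdot dn = -\alpha^2 d n. \]
Also,
\[ x^T x = n^2 \|1_S\|^2 - 2\alpha n \, 1_S \cdot \mathbf{1} + \alpha^2 \|\mathbf{1}\|^2 = n^2 \alpha - 2\alpha^2 n + \alpha^2 n = \alpha n (n - \alpha). \]
We conclude that
\[ \lambda_n \le \frac{x^T A x}{x^T x} = \frac{-\alpha^2 d n}{\alpha n (n - \alpha)} = \frac{d}{1 - n/\alpha}, \]
which implies
\[ \alpha \le \frac{n}{1 - d/\lambda_n}. \]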
This bound need not be tight in general, but it gives the right value in many interesting cases.
• The complete graph $K_n$ has eigenvalues $\lambda_1 = d = n - 1$ and $\lambda_n = -1$. This yields
\[ \alpha(G) \le \frac{n}{1 - d/\lambda_n} = 1. \]
• The complete bipartite graph $K_{n,n}$ has eigenvalues $\lambda_1 = d = n$ and $\lambda_n = -n$, hence
\[ \alpha(G) \le \frac{2n}{1 - d/\lambda_n} = n. \]
• The Petersen graph has eigenvalues $\lambda_1 = d = 3$ and $\lambda_n = -2$, therefore
\[ \alpha(G) \le \frac{n}{1 - d/\lambda_n} = \frac{10}{1 - 3/(-2)} = 4, \]
which is the right value.
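The three examples can be evaluated with exact arithmetic, and for the Petersen graph the bound can be compared with a brute-force independence number. This is a small added sketch with hypothetical names; the brute force enumerates vertex subsets and is only meant for graphs of about this size.

```python
from fractions import Fraction
from itertools import combinations

def hoffman_bound(n, d, lam_min):
    """The bound n / (1 - d / lam_min) for a d-regular graph on n vertices."""
    return Fraction(n) / (1 - Fraction(d) / Fraction(lam_min))

def petersen_graph():
    """Petersen graph as KG(5, 2): adjacency sets on pairs from {0..4}."""
    verts = list(combinations(range(5), 2))
    return {u: {v for v in verts if not set(u) & set(v)} for u in verts}

def independence_number(adj):
    """Largest size of a vertex set with no edge inside (exhaustive search)."""
    verts = list(adj)
    for size in range(len(verts), 0, -1):
        for subset in combinations(verts, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 0
```

In the Kneser description, an independent set of size 4 is given by the four pairs containing a fixed element.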
Lecture 21: Bounds on the chromatic number
1
Bounds on the chromatic number
Last time, we proved that for any $d$-regular graph,
\[ \alpha(G) \le \frac{n}{1 - d/\lambda_n}. \]
Since the chromatic number always satisfies $\chi(G) \ge n/\alpha(G)$, we obtain an immediate corollary.
Corollary 1. For a $d$-regular graph with the smallest eigenvalue $\lambda_n$, the chromatic number is
\[ \chi(G) \ge 1 - \frac{d}{\lambda_n}. \]
An upper bound on $\chi(G)$ can be obtained as follows. This bound is not restricted to regular
graphs - note that the special case of regular graphs ($\lambda_1 = d$) is easy.
Theorem 1. For any graph $G$,
\[ \chi(G) \le 1 + \lambda_1. \]
Proof. Consider any induced subgraph $H = G[S]$. We claim that the average degree in $H$ is at most $\lambda_1$, the maximum eigenvalue of $G$. Consider the indicator vector $\mathbf{1}_S$ of $S$. We have
\[ \lambda_1 = \max_{x \ne 0} \frac{x^T A x}{x^T x} \ge \frac{\mathbf{1}_S^T A \mathbf{1}_S}{\mathbf{1}_S^T \mathbf{1}_S} = \frac{2|E(S)|}{|S|} = \bar d(H), \]
where $\bar d(H)$ is the average degree in $H$. Therefore, any subgraph $H \subset G$ has average degree $\bar d(H) \le \lambda_1$.
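We color the graph $G$ by induction on the number of vertices. Since $\bar d(G) \le \lambda_1$, there is a vertex $v$ of degree at most $\lambda_1$. We remove vertex $v$ to obtain a subgraph $H$. Every subgraph of $H$ is also a subgraph of $G$, so the average-degree bound still applies to $H$ with the same $\lambda_1$, and by induction $H$ can be colored using at most $1 + \lambda_1$ colors. Finally, we add back vertex $v$, which has at most $\lambda_1$ neighbors; at least one of the $1 + \lambda_1$ colors does not appear on its neighbors, so we can use it for $v$. (Since degrees are integers, the argument in fact uses at most $1 + \lfloor \lambda_1 \rfloor$ colors.)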
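The inductive proof translates directly into a procedure: peel off a vertex of minimum degree, recurse, and color the removed vertices in reverse order. Below is a minimal sketch of that idea under stated assumptions; the function name `peel_and_colour` and the dictionary-of-sets graph format are illustrative choices, and the test graph is the Petersen graph with $\lambda_1 = 3$.

```python
def peel_and_colour(adj):
    """Colour a graph by repeatedly removing a minimum-degree vertex.

    adj: dict mapping each vertex to the set of its neighbours (undirected, no loops).
    Returns a dict vertex -> colour, with colours 0, 1, 2, ...
    """
    remaining = {v: set(nb) for v, nb in adj.items()}
    order = []
    while remaining:
        # A vertex of minimum degree has degree at most the average degree.
        v = min(remaining, key=lambda u: len(remaining[u]))
        order.append(v)
        for w in remaining[v]:
            remaining[w].discard(v)
        del remaining[v]

    colour = {}
    for v in reversed(order):  # add vertices back, last removed first
        used = {colour[w] for w in adj[v] if w in colour}
        colour[v] = next(c for c in range(len(adj) + 1) if c not in used)
    return colour

# Petersen graph as a dict of neighbour sets (outer cycle, spokes, inner pentagram).
petersen = {v: set() for v in range(10)}
for i in range(5):
    for a, b in ((i, (i + 1) % 5), (i, i + 5), (5 + i, 5 + (i + 2) % 5)):
        petersen[a].add(b)
        petersen[b].add(a)

colouring = peel_and_colour(petersen)
print(max(colouring.values()) + 1, "colours used; Theorem 1 allows at most 1 + 3 = 4")
```

When a vertex is added back, its already-colored neighbors are exactly the neighbors it had at the moment of removal, which is why the bound on the minimum degree of every subgraph controls the number of colors.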
Finally, we generalize the lower bound on $\chi(G)$ to arbitrary graphs.
Theorem 2. For any graph $G$,
\[ \chi(G) \ge 1 - \frac{\lambda_1}{\lambda_n}. \]
Proof. Consider a coloring $c : V(G) \to [k]$, and define $U_i$ to be the span of the basis vectors $\{e_j : c(j) = i\}$. These subspaces are obviously orthogonal.
Consider an eigenvector $z$ corresponding to $\lambda_1$ and write $z = \sum_{i=1}^k \alpha_i u^i$, where $u^i \in U_i$ and $\|u^i\| = 1$. Let $\tilde U$ be the subspace spanned by $u^1, u^2, \ldots, u^k$ and let $S$ be the rectangular $n \times k$ matrix with columns $(u^1, u^2, \ldots, u^k)$. Note that $a = (\alpha_1, \ldots, \alpha_k)$ is mapped to $Sa = z$.
Now consider the $k \times k$ matrix $B = S^T A S$. For any $i$, we have
\[ B_{ii} = (u^i)^T A u^i = \sum_{j,l} a_{jl}\, u^i_j u^i_l = 0 \]
because $u^i$ has nonzero coordinates only on vertices of color $i$ and there are no edges between such vertices. Therefore, $\mathrm{Tr}(B) = 0$.
For any unit vector $u \in \mathbb{R}^k$, the vector $x = Su \in \tilde U$ is a unit vector as well: observe that $S^T S = I$ by the orthonormality of $u^1, \ldots, u^k$, and $x^T x = u^T S^T S u = u^T u = 1$. Therefore,
\[ u^T B u = u^T S^T A S u = (Su)^T A (Su) = x^T A x \in [\lambda_n, \lambda_1]. \]
I.e., the eigenvalues of $B$ lie within $[\lambda_n, \lambda_1]$. In fact, since $z = Sa$ is the eigenvector of $A$ corresponding to $\lambda_1$,
\[ \frac{a^T B a}{a^T a} = \frac{a^T S^T A S a}{a^T S^T S a} = \frac{z^T A z}{z^T z} = \lambda_1 \]
and $\lambda_1$ is the maximum eigenvalue of $B$ as well.
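The trace $\mathrm{Tr}(B) = 0$ is the sum of the eigenvalues of $B$. The largest of them is $\lambda_1$ and each of the other $k-1$ is at least $\lambda_n$, so the sum is at least $\lambda_1 + (k-1)\lambda_n$. We conclude that
\[ 0 = \mathrm{Tr}(B) \ge \lambda_1 + (k-1)\lambda_n, \]
which, since $\lambda_n < 0$, implies $k \ge 1 - \lambda_1/\lambda_n$.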
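As a quick sanity check, here is the bound evaluated on the three example graphs whose eigenvalues were listed in the previous lecture. This is only a worked illustration; the remark that the Petersen graph is 3-chromatic is a standard fact that is not derived in these notes.

```latex
\begin{align*}
K_n:&\quad \chi \ge 1 - \frac{n-1}{-1} = n
      && \text{(tight, since } \chi(K_n) = n\text{)}, \\
K_{n,n}:&\quad \chi \ge 1 - \frac{n}{-n} = 2
      && \text{(tight, since } K_{n,n} \text{ is bipartite)}, \\
\text{Petersen}:&\quad \chi \ge 1 - \frac{3}{-2} = \frac{5}{2}
      && \text{so } \chi \ge 3, \text{ while Theorem 1 gives } \chi \le 1 + 3 = 4.
\end{align*}
```

For the Petersen graph the lower bound, rounded up, matches the true chromatic number 3.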
Lecture 22: Eigenvalues and expanders
1
Expander graphs
Expander graphs are graphs with the special property that any set of vertices $S$ (unless very large) has a number of outgoing edges proportional to $|S|$. Expansion can be defined with respect to either the number of edges or the number of vertices on the boundary of $S$. We will stick with edge expansion, which is more directly related to eigenvalues.
Definition 1. The edge expansion (or “Cheeger constant”) of a graph is
\[ h(G) = \min_{|S| \le n/2} \frac{e(S, \bar S)}{|S|} \]
where $e(S, \bar S)$ is the number of edges between $S$ and its complement.
Definition 2. A graph is a $(d, \epsilon)$-expander if it is $d$-regular and $h(G) \ge \epsilon$.
Observe that $e(S, \bar S) \le d|S|$ and so $\epsilon$ cannot be more than $d$. Graphs with $\epsilon$ comparable to $d$ are very good expanders. Expanders are very useful in computer science. We will mention some applications later.
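For very small graphs, the edge expansion can be computed directly from Definition 1 by enumerating all nonempty sets $S$ with $|S| \le n/2$. The sketch below does exactly that; the function name `edge_expansion` and the edge-list input format are illustrative assumptions, and the running time is exponential in $n$.

```python
from fractions import Fraction
from itertools import combinations

def edge_expansion(n, edges):
    """h(G): minimum of e(S, complement of S) / |S| over nonempty S with |S| <= n/2.

    n: number of vertices, labelled 0..n-1; edges: list of pairs (a, b).
    Returns an exact Fraction.
    """
    best = None
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            # An edge leaves S when exactly one of its endpoints lies in S.
            boundary = sum(1 for a, b in edges if (a in s) != (b in s))
            ratio = Fraction(boundary, size)
            if best is None or ratio < best:
                best = ratio
    return best

cycle6 = [(i, (i + 1) % 6) for i in range(6)]
k4 = list(combinations(range(4), 2))
print(edge_expansion(6, cycle6))  # 2/3: a path of 3 vertices has 2 outgoing edges
print(edge_expansion(4, k4))      # 2: any 2 vertices of K_4 have 4 outgoing edges
```

The cycle is a poor expander (its expansion tends to 0 as the cycle grows), while the complete graph has expansion about half its degree.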
2
Random graphs
It is known that random graphs are good expanders. It is easier to analyze bipartite expanders
which are defined as follows.
Definition 3. A bipartite graph G on n + n vertices L ∪ R is called a (d, β)-expander, if the degrees
in L are d and any set of vertices S ⊂ L of size |S| ≤ n/d has at least β|S| neighbors in R.
Theorem 1. Let d ≥ 4 and G be a random bipartite graph obtained by choosing d random edges
for each vertex in L. Then G is a (d, d/4)-expander with constant positive probability.
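Proof. For each $S \subseteq L$ and $T \subseteq R$, let $E_{S,T}$ denote the event that all neighbors of $S$ are in $T$. The probability of this event is
\[ \Pr[E_{S,T}] = \left(\frac{|T|}{n}\right)^{d|S|}. \]
Let $\beta = d/4 \ge 1$. By the union bound, and the standard estimate $\binom{n}{k} \le \left(\frac{ne}{k}\right)^k$,
\[ \Pr[\exists S, T;\ |S| \le n/d,\ |T| < \beta|S|] \le \sum_{s=1}^{n/d} \binom{n}{s}\binom{n}{\beta s}\left(\frac{\beta s}{n}\right)^{ds} \le \sum_{s=1}^{n/d} \binom{n}{\beta s}^2\left(\frac{\beta s}{n}\right)^{ds} \le \sum_{s=1}^{n/d} \left(\frac{ne}{\beta s}\right)^{2\beta s}\left(\frac{\beta s}{n}\right)^{ds}. \]
Substituting $\beta = d/4$ and using $ds \le n$ in the last step,
\[ \sum_{s=1}^{n/d} \left(\frac{ne}{\beta s}\right)^{2\beta s}\left(\frac{\beta s}{n}\right)^{ds} = \sum_{s=1}^{n/d} \left(\frac{4ne}{ds}\right)^{ds/2}\left(\frac{ds}{4n}\right)^{ds} = \sum_{s=1}^{n/d} \left(\frac{e\,ds}{4n}\right)^{ds/2} \le \sum_{s=1}^{n/d} \left(\frac{e}{4}\right)^{ds/2}. \]
This is bounded by
\[ \sum_{s=1}^{\infty} \left(\frac{e}{4}\right)^{ds/2} = \frac{(e/4)^{d/2}}{1 - (e/4)^{d/2}} < 1 \quad \text{for } d \ge 4. \]
Hence, with positive probability no such pair $S, T$ exists, and $G$ is a $(d, d/4)$-expander.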
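To see how much room the final inequality leaves, one can evaluate the geometric-series bound numerically. This is a small illustrative sketch; the function name `failure_bound` is not from the notes.

```python
import math

def failure_bound(d):
    """Value of r / (1 - r) with r = (e/4)^(d/2), the geometric-series bound in the proof."""
    r = (math.e / 4) ** (d / 2)
    return r / (1 - r)

for d in (4, 6, 8, 16):
    print(d, round(failure_bound(d), 4))
```

For $d = 4$ the bound is roughly 0.86, so the proof only guarantees success with a small constant probability; the bound drops quickly as $d$ grows.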
3
Eigenvalue bounds on expansion
In general, random graphs are very good expanders, so the existence of expanders is not hard to
establish. However, the difficult question is how to construct expanders explicitly. For now, we
leave this question aside and we explore the connection between expansion and eigenvalues.
Theorem 2. For any $d$-regular graph $G$ with second eigenvalue $\lambda_2$,
\[ h(G) \ge \frac{1}{2}(d - \lambda_2). \]
Proof. For any subset of vertices $S$ of size $s$, let $x = (n-s)\mathbf{1}_S - s\mathbf{1}_{\bar S}$ [1]. We get
\[ x^T x = (n-s)^2 s + s^2 (n-s) = s(n-s)n \]
and
\[ x^T A x = 2\sum_{(i,j) \in E} x_i x_j = 2(n-s)^2 e(S) - 2s(n-s)\,e(S, \bar S) + 2s^2 e(\bar S), \]
where $e(S)$ and $e(\bar S)$ denote the number of edges inside $S$ and inside $\bar S$, respectively.
To eliminate $e(S)$ and $e(\bar S)$, observe that every degree is equal to $d$, and $ds$ can be viewed as counting each edge in $e(S)$ twice and each edge in $e(S, \bar S)$ once. Therefore, $ds = 2e(S) + e(S, \bar S)$. Similarly, $d(n-s) = 2e(\bar S) + e(S, \bar S)$. This yields
\[ x^T A x = (n-s)^2\,(ds - e(S, \bar S)) - 2s(n-s)\,e(S, \bar S) + s^2\,(d(n-s) - e(S, \bar S)) = dns(n-s) - n^2 e(S, \bar S). \]
Since $x \cdot \mathbf{1} = 0$, we can use the variational definition of $\lambda_2$ to claim that
\[ \lambda_2 \ge \frac{x^T A x}{x^T x} = \frac{dns(n-s) - n^2 e(S, \bar S)}{s(n-s)n} = d - \frac{n\, e(S, \bar S)}{s(n-s)}. \]
For any set $S$ of size $s \le n/2$, we have
\[ \frac{e(S, \bar S)}{|S|} \ge \frac{n-s}{n}(d - \lambda_2) \ge \frac{1}{2}(d - \lambda_2). \]
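The two identities used in the proof, $x^T x = s(n-s)n$ and $x^T A x = dns(n-s) - n^2 e(S, \bar S)$, can be verified exhaustively on a small regular graph. The sketch below does this for the Petersen graph ($n = 10$, $d = 3$); the helper names are illustrative and the graph is built in its usual outer-cycle/inner-pentagram form.

```python
from itertools import combinations

def petersen_edges():
    """Edge list of the Petersen graph: outer cycle 0..4, spokes, inner pentagram 5..9."""
    edges = []
    for i in range(5):
        edges.append((i, (i + 1) % 5))
        edges.append((i, i + 5))
        edges.append((5 + i, 5 + (i + 2) % 5))
    return edges

def quadratic_forms(n, edges, S):
    """Return (x^T x, x^T A x, e(S, S-bar)) for x = (n - s) 1_S - s 1_{S-bar}."""
    s = len(S)
    x = [(n - s) if v in S else -s for v in range(n)]
    xtx = sum(c * c for c in x)
    xtax = 2 * sum(x[a] * x[b] for a, b in edges)  # each edge contributes twice
    cut = sum(1 for a, b in edges if (a in S) != (b in S))
    return xtx, xtax, cut

n, d = 10, 3
edges = petersen_edges()
for s in range(1, n // 2 + 1):
    for subset in combinations(range(n), s):
        xtx, xtax, cut = quadratic_forms(n, edges, set(subset))
        assert xtx == s * (n - s) * n
        assert xtax == d * n * s * (n - s) - n * n * cut
print("both identities hold for every S with |S| <= 5")
```

For $S$ equal to the outer 5-cycle the Rayleigh quotient $x^T A x / x^T x$ equals 1, which is the second eigenvalue of the Petersen graph (a known value, not computed in these notes), so the theorem gives $h \ge 1$ and this set shows $h \le 5/5 = 1$.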
This theorem shows that if $d - \lambda_2$ is large, for example $\lambda_2 \le d/2$, then the graph is a $(d, d/4)$-expander, which is very close to best possible. The quantity $d - \lambda_2$ is called the spectral gap.
There is also a bound in the opposite direction, although we will not prove it here.
Theorem 3. For any $d$-regular graph with second eigenvalue $\lambda_2$,
\[ h(G) \le \sqrt{d(d - \lambda_2)}. \]
[1] Note that we used exactly the same vector to prove our bound on the independence number.
4
How large can the spectral gap be?
We have seen that graphs where the maximum eigenvalue $\lambda_1$ dominates all other eigenvalues have very interesting properties. Here we ask the question, how small can the remaining eigenvalues possibly be? We know that the complete graph $K_n$ has eigenvalues $n-1$ and $-1$ and therefore $\lambda = \max_{i \ne 1} |\lambda_i|$ is dominated by $\lambda_1$ by a factor of $n-1$, the degree in $K_n$. For a constant degree $d$ and large $n$, this cannot happen.
Theorem 4. For any constant $d > 1$, any $d$-regular graph has an eigenvalue $\lambda_i \ne d$ of absolute value
\[ \lambda = \max_{i \ne 1} |\lambda_i| \ge (1 - o(1))\sqrt{d} \]
where $o(1) \to 0$ as $n \to \infty$.
Proof. Consider the square of the adjacency matrix, $A^2$. The matrix $A^2$ has $d$ on the diagonal, and therefore $\mathrm{Tr}(A^2) = dn$. On the other hand, the eigenvalues of $A^2$ are $\lambda_i^2$, and so
\[ \mathrm{Tr}(A^2) = \sum_{i=1}^n \lambda_i^2 \le d^2 + (n-1)\lambda^2. \]
Putting these together, we get
\[ \lambda^2 \ge \frac{dn - d^2}{n-1} \ge (1 - d/n)\,d = (1 - o(1))\,d. \]
So the best possible spectral gap that we can have is roughly the gap between $d$ and $\sqrt{d}$. More precisely, it is known that the second eigenvalue is always at least $2\sqrt{d-1} - o(1)$. This leads to the definition of Ramanujan graphs.
Definition 4. A $d$-regular graph is Ramanujan if all eigenvalues in absolute value are either equal to $d$ or at most $2\sqrt{d-1}$.
It is known in fact that a random $d$-regular graph has all non-trivial eigenvalues bounded by $2\sqrt{d-1} + o(1)$ in absolute value. However, it is more difficult to come up with explicit Ramanujan graphs.
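As a concrete illustration of Theorem 4 and Definition 4, the sketch below compares the trace bound with the Ramanujan threshold on the Petersen graph. The spectrum used here (3 once, 1 five times, $-2$ four times) is the standard one for the Petersen graph and is entered by hand rather than computed; the function names are illustrative.

```python
import math

def trace_lower_bound(n, d):
    """Lower bound on max_{i != 1} |lambda_i| from Tr(A^2) = dn: sqrt(d (n - d) / (n - 1))."""
    return math.sqrt(d * (n - d) / (n - 1))

def is_ramanujan(d, eigenvalues, tol=1e-9):
    """Definition 4: every eigenvalue has absolute value d or at most 2 sqrt(d - 1)."""
    threshold = 2 * math.sqrt(d - 1)
    return all(
        abs(abs(lam) - d) <= tol or abs(lam) <= threshold + tol
        for lam in eigenvalues
    )

# Assumed (standard) spectrum of the Petersen graph, with multiplicities.
petersen_spectrum = [3] + [1] * 5 + [-2] * 4
n, d = 10, 3
lam = max(abs(l) for l in petersen_spectrum[1:])

print(sum(l * l for l in petersen_spectrum) == d * n)  # Tr(A^2) = dn
print(trace_lower_bound(n, d) <= lam <= 2 * math.sqrt(d - 1))
print(is_ramanujan(d, petersen_spectrum))
```

Here $\lambda = 2$ sits between the trace bound $\sqrt{7/3} \approx 1.53$ and the Ramanujan threshold $2\sqrt{2} \approx 2.83$, so the Petersen graph satisfies Definition 4.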
5
Explicit expanders
The following graph is a beautiful algebraic construction of Margulis which was the earliest explicit
expander known.
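Definition 5 (Margulis’ graph). Let $V = \mathbb{Z}_n \times \mathbb{Z}_n$ and define an 8-regular graph on $V$ as follows. Let
\[ T_1 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad T_2 = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \quad e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]
Each vertex $v \in \mathbb{Z}_n \times \mathbb{Z}_n$ is adjacent to $T_1 v$, $T_2 v$, $T_1 v + e_1$, $T_2 v + e_2$ and four other vertices obtained by the inverse transformations. (This is a multigraph with possible multiedges and loops.)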
This graph has maximum eigenvalue $d = 8$, and it can also be computed that the second eigenvalue is $\lambda_2 \le 5\sqrt{2}$. (We will not show this here.) The spectral gap is $d - \lambda_2 \ge 0.92$ and therefore this graph is an $(8, 0.46)$-expander.
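The construction is simple enough to write down in a few lines. The sketch below lists the 8 neighbors of a vertex; it assumes that "the inverse transformations" means the inverses of the four affine maps $v \mapsto T_1 v$, $v \mapsto T_2 v$, $v \mapsto T_1 v + e_1$, $v \mapsto T_2 v + e_2$, which is the reading that makes the adjacency relation symmetric. The function name `margulis_neighbours` is illustrative.

```python
def margulis_neighbours(v, n):
    """The 8 neighbours (with multiplicity) of v = (x, y) in Z_n x Z_n."""
    x, y = v
    return [
        ((x + 2 * y) % n, y),          # T1 v
        (x, (2 * x + y) % n),          # T2 v
        ((x + 2 * y + 1) % n, y),      # T1 v + e1
        (x, (2 * x + y + 1) % n),      # T2 v + e2
        ((x - 2 * y) % n, y),          # inverse of v -> T1 v
        (x, (y - 2 * x) % n),          # inverse of v -> T2 v
        ((x - 1 - 2 * y) % n, y),      # inverse of v -> T1 v + e1
        (x, (y - 1 - 2 * x) % n),      # inverse of v -> T2 v + e2
    ]

n = 5
vertices = [(x, y) for x in range(n) for y in range(n)]
symmetric = all(
    margulis_neighbours(v, n).count(w) == margulis_neighbours(w, n).count(v)
    for v in vertices
    for w in margulis_neighbours(v, n)
)
print(symmetric)
```

Each neighbor is found with a constant number of arithmetic operations modulo $n$, which is what makes the graph explicit: one never needs to store the whole graph.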
Later, even simpler constructions were found.
Definition 6. Let $p$ be prime and let $V = \mathbb{Z}_p$. We define a 3-regular graph $G = (V, E)$ where the edges are of two types: $(x, x+1)$, and $(x, x^{-1})$ for each $x \in \mathbb{Z}_p$. (We assume that $0^{-1} = 0$ for this purpose.)
It is known that this is a $(3, \epsilon)$-expander for some fixed $\epsilon > 0$ and any prime $p$. The proof of this relies on deep results in number theory.
These graphs are not Ramanujan graphs, i.e. their second eigenvalue is not on the order of $\sqrt{d}$. However, even such graphs can be constructed explicitly. The first explicit construction of Ramanujan graphs was found by Lubotzky, Phillips and Sarnak in 1988.
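The graph of Definition 6 is equally easy to generate. The following sketch, with the illustrative name `neighbours`, assumes $p$ is an odd prime and computes $x^{-1}$ as $x^{p-2} \bmod p$ (Fermat's little theorem), which conveniently returns 0 for $x = 0$.

```python
def neighbours(x, p):
    """Neighbours of x in the 3-regular graph on Z_p: x + 1, x - 1 and x^{-1} (with 0^{-1} = 0).

    Assumes p is an odd prime.
    """
    inverse = pow(x, p - 2, p)  # x^(p-2) = x^(-1) mod p for x != 0, and 0 for x = 0
    return [(x + 1) % p, (x - 1) % p, inverse]

p = 7
for x in range(p):
    print(x, neighbours(x, p))
```

The vertices $0$, $1$ and $p-1$ are their own inverses, so the edge $(x, x^{-1})$ is a loop there; the definition tolerates this in the same way as the multigraph in Definition 5.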
Lecture 23: Random walks on expanders
1
Random walks on expanders
Expanders have particularly nice behavior with respect to random walks. A random walk is a sequence of vertices, where each successive vertex is obtained by following a random edge from the previous vertex. Starting from a particular vertex $v_0$, we will be interested in the probability distribution of the vertex $v_t$ after $t$ steps of a random walk. This distribution can be described by a vector $x^{(t)}$, where $x^{(t)}_i = \Pr[v_t = i]$. First, we have the following simple lemma.
Lemma 1. Let $G$ be a $d$-regular graph with adjacency matrix $A$, and let $B = \frac{1}{d}A$. Then starting from probability distribution $x^{(0)}$, a random walk after $t$ steps has probability distribution
\[ x^{(t)} = B^t x^{(0)}. \]
Proof. We show that starting from a distribution $x$, the distribution after one step is $y = Bx$:
\[ y_j = \Pr[v_1 = j] = \sum_{i : (i,j) \in E} \Pr[v_1 = j \mid v_0 = i]\,\Pr[v_0 = i] = \sum_{i : (i,j) \in E} \frac{1}{d} \cdot x_i = \frac{1}{d}(Ax)_j = (Bx)_j. \]
Now the lemma follows by induction:
\[ x^{(t)} = B x^{(t-1)} = B(B^{t-1} x^{(0)}) = B^t x^{(0)}. \]
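The one-step update $y = Bx$ is just an average over neighbors, which the following sketch spells out with exact fractions on the 5-cycle. The function name `step` and the adjacency-list format are illustrative choices.

```python
from fractions import Fraction

def step(adj, x):
    """One step of the random walk on a d-regular graph: y_j = (1/d) * sum of x_i over neighbours i of j."""
    d = len(adj[0])
    return [sum(x[i] for i in adj[j]) / d for j in range(len(adj))]

# 5-cycle: vertex j is adjacent to j - 1 and j + 1 (mod 5).
cycle5 = [[(j - 1) % 5, (j + 1) % 5] for j in range(5)]

x0 = [Fraction(1), Fraction(0), Fraction(0), Fraction(0), Fraction(0)]  # start at vertex 0
x1 = step(cycle5, x0)
x2 = step(cycle5, x1)
print(x1)  # mass 1/2 on each of the vertices 1 and 4
print(x2)  # mass 1/2 on vertex 0 and 1/4 on each of the vertices 2 and 3
```

Applying `step` $t$ times computes $B^t x^{(0)}$ without ever forming the matrix power.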
It is very natural to analyze the behavior of a random walk in the basis of eigenvectors of B
(which are equal to the eigenvectors of A). The eigenvalues of B are equal to the eigenvalues of A
divided by d. For an expander graph, we obtain the following.
Theorem 1. Suppose that $G$ is a $d$-regular graph with all other eigenvalues bounded by $\lambda$ in absolute value. Let $x^{(0)}$ be any initial distribution (e.g. concentrated on one vertex $v_0$), and let $u = \frac{1}{n}\mathbf{1}$ denote the uniform distribution. Then after $t$ steps of a random walk on $G$, we obtain
\[ \|x^{(t)} - u\| \le \left(\frac{\lambda}{d}\right)^t. \]
Note: we bound the deviation from $u$ in the $L_2$ norm. For probability distributions, it is more natural to use the total variation distance, or $L_1$ norm. In the $L_1$ norm, we get $\|x^{(t)} - u\|_1 = \mathbf{1} \cdot |x^{(t)} - u| \le \sqrt{n}\,\|x^{(t)} - u\|_2 \le \sqrt{n}\,(\lambda/d)^t$ by Cauchy-Schwarz.
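Proof. Let $x^{(0)} = u + v$ where $u = \frac{1}{n}\mathbf{1}$. Since $x^{(0)} \cdot \mathbf{1} = u \cdot \mathbf{1} = 1$, we have $v \cdot \mathbf{1} = 0$ and hence $v \cdot u = 0$. Therefore $\|v\|^2 = \|x^{(0)}\|^2 - \|u\|^2 \le 1$. After $t$ steps of a random walk, we have $x^{(t)} = B^t x^{(0)}$. Since $Bu = u$ and $v$ is orthogonal to $\mathbf{1}$, where all eigenvalues of $B$ are at most $\lambda/d$ in absolute value, the difference between $x^{(t)}$ and $u$ can be bounded by
\[ \|x^{(t)} - u\|^2 = \|B^t(x^{(0)} - u)\|^2 = \|B^t v\|^2 = v^T B^{2t} v \le \left(\frac{\lambda}{d}\right)^{2t}\|v\|^2 \le \left(\frac{\lambda}{d}\right)^{2t}. \]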
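The convergence rate can be observed numerically. The sketch below runs the walk on the Petersen graph from a single start vertex and compares the $L_2$ distance to uniform with $(\lambda/d)^t$. It assumes the standard spectrum of the Petersen graph (eigenvalues 3, 1, $-2$), so $\lambda = 2$ and $d = 3$; the helper names are illustrative.

```python
import math

def petersen_adj():
    """Adjacency lists of the Petersen graph: outer cycle 0..4, spokes, inner pentagram 5..9."""
    adj = [[] for _ in range(10)]
    for i in range(5):
        for a, b in ((i, (i + 1) % 5), (i, i + 5), (5 + i, 5 + (i + 2) % 5)):
            adj[a].append(b)
            adj[b].append(a)
    return adj

def distances_to_uniform(adj, start, steps):
    """L2 distance between x^(t) and the uniform distribution, for t = 0..steps."""
    n, d = len(adj), len(adj[0])
    x = [0.0] * n
    x[start] = 1.0
    dists = []
    for _ in range(steps + 1):
        dists.append(math.sqrt(sum((xi - 1.0 / n) ** 2 for xi in x)))
        x = [sum(x[i] for i in adj[j]) / d for j in range(n)]  # x <- B x
    return dists

lam, d = 2, 3  # assumed values for the Petersen graph
dists = distances_to_uniform(petersen_adj(), start=0, steps=20)
for t in (0, 5, 10, 20):
    print(t, dists[t], (lam / d) ** t)
```

Every printed distance should lie below the corresponding value of $(\lambda/d)^t$, as Theorem 1 predicts.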
Thus, a random walk on an expander (where λ/d is small) converges very quickly to the uniform
distribution. Now, we prove an even stronger statement. Given that the distribution is already
uniform (or very close to uniform), we are interested in how often the random walk visits a certain
set S ⊂ V . Although in each particular step, the probability is σ = |S|/|V |, this might not be the
case in a sequence of steps (which are certainly not independent). Nonetheless, we prove that the
random walk behaves almost as if successive locations were independent.
Lemma 2. Let $(v_0, v_1, \ldots)$ be a random walk, where $v_0$ has the uniform distribution. Let $S \subset V$ be a subset of vertices of size $|S| = \sigma|V|$. Then
\[ \Pr[v_0 \in S \ \&\ v_1 \in S \ \&\ \ldots \ \&\ v_t \in S] \le \left(\sigma + \frac{\lambda}{d}\right)^t. \]
Note that for $t$ independent samples, the probability would have been $\sigma^t$. If $\lambda/d \ll \sigma$, we are not making a huge error by imagining that the vertices $v_0, \ldots, v_t$ are in fact independent.
Proof. Let $P$ be a projection operator corresponding to $S$, i.e. $Px$ makes all coordinates outside of $S$ zero, and leaves the coordinates in $S$ intact. Then we can write
\[ \Pr[v_0 \in S] = \|Pu\|_1, \]
\[ \Pr[v_0 \in S \ \&\ v_1 \in S] = \|PBPu\|_1, \]
etc., and
\[ \Pr[v_0 \in S \ \&\ v_1 \in S \ \&\ \ldots \ \&\ v_t \in S] = \|(PBP)^t u\|_1. \]
Thus the goal is to analyze the operator $PBP$. The idea is that $B$ shrinks all the components of a vector except the component parallel to $\mathbf{1}$, and $P$ shrinks the component parallel to $\mathbf{1}$.
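More formally, for any vector $x$, write $Px = \alpha u + v$ where $u = \frac{1}{n}\mathbf{1}$ and $v \perp u$. Observe that
\[ \alpha = \frac{u^T P x}{u^T u} = \frac{x^T P u}{1/n} \le n\|x\|_2 \cdot \|Pu\|_2 = \sqrt{\sigma n}\,\|x\|_2 \]
using Cauchy-Schwarz and $\|Pu\|_2 = \sqrt{\sigma/n}$. We have $PBPx = \alpha PBu + PBv$ and by the triangle inequality,
\[ \|PBPx\|_2 \le \|\alpha PBu\|_2 + \|PBv\|_2 \le \alpha\|Pu\|_2 + \frac{\lambda}{d}\|v\|_2. \]
Again, we use $\|Pu\|_2 = \sqrt{\sigma/n}$ and $\alpha \le \sqrt{\sigma n}\,\|x\|_2$, together with $\|v\|_2 \le \|Px\|_2 \le \|x\|_2$, which gives
\[ \|PBPx\|_2 \le \sigma\|x\|_2 + \frac{\lambda}{d}\|v\|_2 \le \left(\sigma + \frac{\lambda}{d}\right)\|x\|_2. \]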
Iterating $t$ times, we get
\[ \|(PBP)^t u\|_2 \le \left(\sigma + \frac{\lambda}{d}\right)^t \|u\|_2. \]
Finally, we return to the $L_1$ norm. We obtain
\[ \|(PBP)^t u\|_1 \le \sqrt{n}\,\|(PBP)^t u\|_2 \le \sqrt{n}\left(\sigma + \frac{\lambda}{d}\right)^t \|u\|_2 = \left(\sigma + \frac{\lambda}{d}\right)^t. \]
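The quantity $\|(PBP)^t u\|_1$ can be computed exactly on a small example and compared with the bound of Lemma 2. The sketch below uses the complete graph $K_{10}$, whose eigenvalues $n - 1$ and $-1$ were quoted earlier, so $\lambda/d = 1/9$; the set $S$ and the function name `stay_probability` are illustrative choices.

```python
from fractions import Fraction

def stay_probability(adj, S, t):
    """Exact value of Pr[v_0, ..., v_t all in S] for a walk started from the uniform distribution.

    Computes the L1 norm of (PBP)^t u by alternating a walk step (B) with a projection onto S (P).
    """
    n, d = len(adj), len(adj[0])
    y = [Fraction(1, n) if v in S else Fraction(0) for v in range(n)]  # P u
    for _ in range(t):
        by = [sum(y[i] for i in adj[j]) / d for j in range(n)]         # B y
        y = [by[j] if j in S else Fraction(0) for j in range(n)]       # P B y
    return sum(y)

n = 10
complete = [[w for w in range(n) if w != v] for v in range(n)]  # K_10, d = 9
S = {0, 1, 2, 3, 4}                                             # sigma = 1/2
sigma, ratio = Fraction(1, 2), Fraction(1, 9)                   # lambda / d = 1/9

for t in range(6):
    print(t, stay_probability(complete, S, t), (sigma + ratio) ** t)
```

In $K_{10}$ the walk stays inside $S$ with probability $4/9$ at each step, so the exact value is $\frac{1}{2}(4/9)^t$, comfortably below the bound $(11/18)^t$.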
2
Application to randomized algorithms
Suppose we have a randomized algorithm which uses r random coin flips and returns an answer,
YES or NO. Let us assume that the algorithm is “safe” in the sense that if the correct answer is
YES, our algorithm always returns YES. However, if the correct answer is NO, our algorithm is
allowed to make a mistake with probability 1/2 - in other words, it will answer YES with probability
at most 1/2.
This might not be very satisfying, but we can actually make the probability of error much smaller by running the algorithm several times. It is easy to see that if we run the algorithm $t$ times and answer YES if and only if the algorithm returned YES every time, we make an error with probability at most $1/2^t$.
However, to implement this we need $tr$ random coin flips. Sometimes, true randomness is a scarce resource and we would like to reduce the number of coin flips that we really need. Here’s how we can do this using expanders.
• Let $G$ be a $d$-regular expander with $\lambda/d \le 1/4$, on the vertex set $V = \{0, 1\}^r$. I.e., each vertex corresponds to the outcome of $r$ coin flips.
• Using r random coin flips, generate a random vertex v0 ∈ V .
• Perform a random walk on G of length t, in each step using log d random bits to find a random
neighbor vi+1 of vi .
• Run the algorithm t + 1 times, with random bits corresponding to v0 , v1 , . . . , vt .
• Answer YES if and only if the algorithm always returned YES.
Observe that the total number of random bits needed here is r + t log d, much smaller than rt
for constant d (like d = 8). It remains to analyze the probability of error.
Assume that the true answer is NO and let $S$ be the set of vertices such that the algorithm with the respective random bits makes a mistake, i.e. returns YES. Since the true answer is NO, we know that $|S| \le \frac{1}{2}|V|$. By Lemma 2 with $\sigma = 1/2$ and $\lambda/d \le 1/4$, we get
\[ \Pr[v_0 \in S \ \&\ v_1 \in S \ \&\ \ldots \ \&\ v_t \in S] \le \left(\frac{3}{4}\right)^t. \]
I.e., the probability of error decreases exponentially in $t$.
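To get a feeling for the savings, here is the arithmetic for one hypothetical choice of parameters. The numbers $r = 100$, $t = 20$ and $d = 8$ are illustrative only, and the logarithm is taken base 2.

```latex
\begin{align*}
\text{random bits, expander walk:}      &\quad r + t\log_2 d = 100 + 20 \cdot 3 = 160, \\
\text{random bits, independent runs:}   &\quad rt = 100 \cdot 20 = 2000, \\
\text{error bound from Lemma 2:}        &\quad (3/4)^{t} = (3/4)^{20} \approx 3.2 \times 10^{-3}.
\end{align*}
```

The error bound is weaker than the $2^{-t}$ obtained from independent runs, but it still decays exponentially while using far fewer random bits.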