Math 184: Combinatorics
Lecture 1: Introduction
Instructor: Benny Sudakov

1 What is combinatorics?

Defining combinatorics within the larger field of mathematics is not an easy task. Typically, combinatorics deals with finite structures such as graphs, hypergraphs, partitions or partially ordered sets. However, what characterizes combinatorics is not so much its objects of study as its methods: counting arguments, induction, inclusion-exclusion, the probabilistic method - in general, surprising applications of relatively elementary tools, rather than the gradual development of sophisticated machinery. That is what makes combinatorics elegant and accessible, and why combinatorial methods should be in the toolbox of any mainstream mathematician. Let's start with a few examples where combinatorial ideas play a key role.

1. Ramsey theory. In the 1950s, a Hungarian sociologist, S. Szalai, studied friendship relationships between children. He observed that in any group of around 20 children, he was able to find four children who were mutual friends, or four children such that no two of them were friends. Before drawing any sociological conclusions, Szalai consulted three eminent mathematicians in Hungary at that time: Erdős, Turán and Sós. A brief discussion revealed that this is a mathematical phenomenon rather than a sociological one: for any symmetric relation R on at least 18 elements, there is a subset S of 4 elements such that R contains either all pairs in S or none of them. This fact is a special case of Ramsey's theorem, proved in 1930, the foundation of Ramsey theory, which later developed into a rich area of combinatorics.

2. Tournament paradox. Suppose that n basketball teams compete in a tournament where each pair of teams plays exactly one game. The organizers want to award the k best teams. However, whichever k teams they pick, there is always another team that beats them all! Is this possible?
It can be proved using a random construction that for any k > 0 there is n > k such that this can indeed happen.

3. Brouwer's theorem. In 1911, Luitzen Brouwer published his famous Fixed Point Theorem: every continuous map $f: B^n \to B^n$ (where $B^n$ is an n-dimensional ball) has a fixed point, $f(x) = x$. The special case n = 1 follows easily from the intermediate value theorem. For higher dimensions, however, the original proof was complicated. In 1928, Emanuel Sperner found a simple combinatorial result which implies Brouwer's fixed point theorem in an elegant way. The proof of Sperner's lemma is equally elegant, by double counting.

4. Borsuk's conjecture. In 1933, Karol Borsuk published a paper which contained a proof of a conjecture stated by Stanislaw Ulam: every continuous map $f: S^n \to \mathbb{R}^n$ (where $S^n$ is an n-dimensional sphere) maps two antipodal points to the same value, $f(x) = f(-x)$. Borsuk also asked whether the following conjecture is true: every set $S \subset \mathbb{R}^n$ of finite diameter can be partitioned into n + 1 sets of strictly smaller diameter. The example of an n-dimensional regular simplex shows that n + 1 parts are necessary, since we need to separate the n + 1 vertices into different parts. Borsuk's conjecture was proved for smooth bodies, using the Borsuk-Ulam theorem above. However, the general case was open until 1993, when Kahn and Kalai disproved it in a dramatic way. They constructed discrete sets of points that cannot be partitioned into fewer than $1.2^{\sqrt{n}}$ parts.

5. The Littlewood-Offord problem. Littlewood and Offord studied the following problem in 1943: given complex numbers $z_1, z_2, \ldots, z_n$ of absolute value $|z_i| \geq 1$, what is the maximum number of distinct sums $\sum_{i=1}^{n} \pm z_i$ that lie inside some unit disk? Kleitman and Katona proved that the maximum number is $\binom{n}{\lfloor n/2 \rfloor}$, using the methods of extremal combinatorics.

2 Graph theory

Let us begin with an area of combinatorics called graph theory.
What we mean by a graph here is not the graph of a function, but a structure consisting of vertices, some of which are connected by edges.

Definition 1. A graph is a pair G = (V, E) where V is a finite set whose elements we call vertices and $E \subseteq \binom{V}{2}$ is a collection of pairs of vertices that we call edges.

Some basic examples of graphs are:

• $K_n$: the complete graph on n vertices, where every pair of vertices forms an edge.

• $K_{s,t}$: the complete bipartite graph, where $V = S \cup T$, |S| = s, |T| = t, and every pair in $S \times T$ forms an edge.

• $C_\ell$: the cycle of length $\ell$, where $V = \{v_0, v_1, \ldots, v_{\ell-1}\}$ and $\{v_i, v_j\} \in E$ if $j = i + 1 \pmod{\ell}$.

A famous story which stands at the beginning of graph theory is the problem of the bridges of Königsberg. Königsberg had 7 bridges connecting different parts of the town. Local inhabitants were wondering whether it was possible to walk across each bridge exactly once and return to the same point. In the language of graph theory, the bridges are edges connecting different vertices, i.e. parts of town. The bridges of Königsberg looked like the following graph (or rather, "multigraph", since here multiple edges connect the same pair of vertices):

Figure 1: The graph of Königsberg.

The question is whether it is possible to walk around the graph so that we traverse each edge exactly once and come back to the same vertex. Such a walk is called an Eulerian circuit, and graphs where this is possible are called Eulerian (after a paper by Euler where this question was considered in 1736). The following result shows that the graph of Königsberg is not Eulerian. First, we need some definitions.

Definition 2. We say that an edge e is incident with a vertex v if $v \in e$. Two vertices are adjacent if they form an edge. The degree d(v) of a vertex is the number of edges incident with v.

Definition 3. A walk is a sequence of vertices $(v_1, v_2, \ldots, v_k)$ (with possible repetition) such that each successive pair $\{v_i, v_{i+1}\}$ forms an edge. An Eulerian circuit is a walk where $v_k = v_1$ and each edge appears exactly once. A path is a walk $(v_1, v_2, \ldots, v_k)$ without repetition of vertices. A graph G is called connected if there is a path between any pair of vertices.

Theorem 1. A (multi)graph G is Eulerian if and only if it is connected and the degree of every vertex is even.

Proof. Suppose G has an Eulerian circuit. Draw arrows along the edges as they are traversed by the circuit. Then the circuit induces a connecting path between any pair of vertices, so G is connected. Also, for every vertex the number of incoming edges equals the number of outgoing edges. Hence the degree of each vertex is even.

Conversely, assume that G is connected and all degrees are even. First, we prove the following: the set of edges E can be decomposed into a union of edge-disjoint cycles $C_1, \ldots, C_n$. This is true because in any graph with all degrees even (and at least one edge), we can find a cycle by walking arbitrarily until we hit our own path. When we remove the cycle, all degrees are still even, and we continue as long as there are any edges left in the graph.

Now we construct the Eulerian circuit L inductively as follows. Start with $L := C_1$. We incorporate the cycles into L one by one. At any point, there must be an unused cycle $C_j$ that intersects L, otherwise the graph would be disconnected. We add $C_j$ to the circuit L by walking along L up to the first point where we hit $C_j$, then walking around $C_j$, and then continuing along L. We repeat this procedure until all cycles are exhausted. Since the cycles cover each edge exactly once, at the end our circuit also covers each edge exactly once.

Math 184: Combinatorics
Lecture 2: The pigeonhole principle and double counting
Instructor: Benny Sudakov

1 The pigeonhole principle

If n objects are placed in k boxes, k < n, then at least one box contains more than one object.
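In executable form, the principle even has a quantitative version: some box receives at least $\lceil n/k \rceil$ objects. The following is a small illustrative sketch (the helper `fullest_box` is ours, not from the notes):

```python
import math
import random

def fullest_box(assignment, k):
    """Given an assignment of items to boxes 0..k-1 (a list of box
    indices), return the size of the most heavily loaded box."""
    counts = [0] * k
    for box in assignment:
        counts[box] += 1
    return max(counts)

n, k = 20, 6
for _ in range(100):
    # However the 20 items are placed into 6 boxes ...
    assignment = [random.randrange(k) for _ in range(n)]
    # ... some box receives at least ceil(20/6) = 4 of them.
    assert fullest_box(assignment, k) >= math.ceil(n / k)
```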
This is so obvious that one might think nothing non-trivial can be derived from this "principle". And yet, it is very useful.

1.1 Two equal degrees

The following consequence is quite easy.

Theorem 1. In any graph on at least two vertices, there are two vertices of equal degree.

Proof. In a graph on n vertices, the degrees are between 0 and n − 1. Therefore, the only way all degrees could be different is that there is exactly one vertex of each possible degree. In particular, there is a vertex v of degree 0 and a vertex w of degree n − 1. However, if there is an edge (v, w), then v cannot have degree 0, and if there is no edge (v, w), then w cannot have degree n − 1. This is a contradiction.

1.2 Subsets without divisors

Let $[2n] = \{1, 2, \ldots, 2n\}$. Suppose you want to pick a subset $S \subset [2n]$ so that no number in S divides another. How many numbers can you pick? Obviously, you can take $S = \{n+1, n+2, \ldots, 2n\}$, and no number divides another. Can you pick more than n numbers? The answer is negative.

Theorem 2. For any subset $S \subset [2n]$ of size |S| > n, there are two numbers $a, b \in S$ such that $a \mid b$.

Proof. For each odd number $a \in [2n]$, let $C_a = \{2^k a : k \geq 0,\ 2^k a \leq 2n\}$. The number of these classes is n, and every element $b \in [2n]$ belongs to exactly one of them, namely the one for the a obtained by dividing b by the highest possible power of 2. Consider $S \subset [2n]$ of size |S| > n. By the pigeonhole principle, there is a class $C_a$ that contains at least two elements of S, and one of them divides the other.

1.3 Rational approximation

Theorem 3. For any $x \in \mathbb{R}$ and n > 0, there is a rational number p/q, $1 \leq q \leq n$, such that
$\left| x - \frac{p}{q} \right| < \frac{1}{nq}$.

Note that it is easy to get an approximation whose error is at most $\frac{1}{n}$, by fixing the denominator to be q = n. The improved approximation uses the pigeonhole principle and is due to Dirichlet (1879).

Proof. Let $\{x\}$ denote the fractional part of x. Consider $\{ax\}$ for $a = 1, 2, \ldots, n+1$ and place these n + 1 numbers into the n buckets $[0, 1/n), [1/n, 2/n), \ldots, [(n-1)/n, 1)$.
There must be a bucket containing two of these numbers, say $\{ax\}$ and $\{a'x\}$ with $a < a'$ and $\{ax\} \leq \{a'x\}$. We set $q = a' - a$, and we get $\{qx\} = \{a'x\} - \{ax\} < 1/n$. This means that $qx = p + \epsilon$, where p is an integer and $\epsilon = \{qx\} < 1/n$. Hence
$x = \frac{p}{q} + \frac{\epsilon}{q}$, and so $\left| x - \frac{p}{q} \right| = \frac{\epsilon}{q} < \frac{1}{nq}$.

1.4 Monotone subsequences

Finally, we give an application which is less immediate. Given an arbitrary sequence of distinct real numbers, how long a monotone subsequence can we always find? It is easy to construct sequences of mn numbers such that any increasing subsequence has length at most m and any decreasing subsequence has length at most n. We show that this is an extremal example.

Theorem 4. For any sequence of mn + 1 distinct real numbers $a_0, a_1, \ldots, a_{mn}$, there is an increasing subsequence of length m + 1 or a decreasing subsequence of length n + 1.

Proof. Let $t_i$ denote the maximum length of an increasing subsequence starting with $a_i$. If $t_i > m$ for some i, we are done. So assume $t_i \in \{1, 2, \ldots, m\}$ for all i; i.e. we have mn + 1 numbers in m buckets. By the pigeonhole principle, there must be a value $s \in \{1, 2, \ldots, m\}$ such that $t_i = s$ for at least n + 1 indices, $i_0 < i_1 < \ldots < i_n$. Now we claim that $a_{i_0} > a_{i_1} > \ldots > a_{i_n}$. Indeed, if there were a pair such that $a_{i_j} < a_{i_{j+1}}$, we could extend the increasing subsequence starting at $a_{i_{j+1}}$ by prepending $a_{i_j}$, to get an increasing subsequence of length s + 1. However, this contradicts $t_{i_j} = s$.

2 Double counting

Another elementary trick which often brings surprising results is double counting. As the name suggests, the trick involves counting a certain quantity in two different ways and comparing the results.

2.1 Sum of degrees in a graph

The following observation is due to Leonhard Euler (1736).

Lemma 1. For any graph G, the sum of degrees over all vertices is even.

Proof. For a vertex v and edge e, let i(v, e) = 1 if $v \in e$ and 0 otherwise.
We count all the incidences between vertices and edges in two ways:

• $\sum_{v \in V, e \in E} i(v, e) = \sum_{v \in V} \sum_{e \in E} i(v, e) = \sum_{v \in V} d(v)$, because d(v) is exactly the number of edges incident with v.

• $\sum_{v \in V, e \in E} i(v, e) = \sum_{e \in E} \sum_{v \in V} i(v, e) = 2|E|$, because every edge is incident with exactly two vertices.

Thus we have proved that the sum of all degrees is exactly twice the number of edges.

2.2 Average number of divisors

Let t(n) denote the number of divisors of n. E.g., for a prime n, t(n) = 2, while for a power of 2, $t(2^k) = k + 1$. We would like to know the average number of divisors,
$\bar{t}(n) = \frac{1}{n} \sum_{j=1}^{n} t(j)$.
This seems to be a complicated question; however, double counting gives a simple answer. Let d(i, j) = 1 if $i \mid j$ and 0 otherwise; i.e., $t(j) = \sum_{i=1}^{n} d(i, j)$. We count the total number of dividing pairs $i \mid j$, that is $\sum_{i,j=1}^{n} d(i, j)$, in two different ways.

• $\sum_{i,j=1}^{n} d(i, j) = \sum_{j=1}^{n} t(j) = n \cdot \bar{t}(n)$.

• $\sum_{i,j=1}^{n} d(i, j) \simeq \sum_{i=1}^{n} \frac{n}{i} = n \cdot H_n$, where $H_n$ is the n-th harmonic number.

In the second case, we have been somewhat sloppy and neglected some roundoff errors (the exact count is $\sum_{i=1}^{n} \lfloor n/i \rfloor$), but these add up to at most n overall. We can conclude that $\bar{t}(n) \simeq H_n \simeq \ln n$, within an error of 1.

Math 184: Combinatorics
Lecture 3: Sperner's lemma and Brouwer's theorem
Instructor: Benny Sudakov

1 Sperner's lemma

In 1928, young Emanuel Sperner found a surprisingly simple proof of Brouwer's famous Fixed Point Theorem: every continuous map of an n-dimensional ball to itself has a fixed point. At the heart of his proof is the following combinatorial lemma. First, we need to define the notions of simplicial subdivision and proper coloring.

Definition 1. An n-dimensional simplex is the set of convex combinations of n + 1 points in general position. I.e., for given vertices $v_1, \ldots, v_{n+1}$, the simplex would be
$S = \left\{ \sum_{i=1}^{n+1} \alpha_i v_i : \alpha_i \geq 0, \sum_{i=1}^{n+1} \alpha_i = 1 \right\}$.

A simplicial subdivision of an n-dimensional simplex S is a partition of S into small simplices ("cells") such that any two cells are either disjoint, or they share a full face of a certain dimension.

Definition 2. A proper coloring of a simplicial subdivision is an assignment of n + 1 colors to the vertices of the subdivision, so that the vertices of S receive all different colors, and points on each face of S use only the colors of the vertices defining the respective face of S.

For example, for n = 2 we have a subdivision of a triangle T into triangular cells. A proper coloring of T assigns different colors to the 3 vertices of T, and vertices in the interior of each edge of T use only the two colors of the respective endpoints. (Note that we do not require that the endpoints of an edge of the subdivision receive different colors.)

Lemma 1 (Sperner, 1928). Every properly colored simplicial subdivision contains a cell whose vertices have all different colors.

Proof. Let us call a cell of the subdivision a rainbow cell if its vertices receive all different colors. We actually prove a stronger statement, namely that the number of rainbow cells is odd for any proper coloring.

Case n = 1. First, let us consider the 1-dimensional case. Here, we have a line segment (a, b) subdivided into smaller segments, and we color the vertices of the subdivision with 2 colors. It is required that a and b receive different colors. Thus, going from a to b, we must switch colors an odd number of times, so that we end with a different color at b. Hence, there is an odd number of small segments that receive two different colors.

Case n = 2. We have a properly colored simplicial subdivision of a triangle T. Let Q denote the number of cells colored (1, 1, 2) or (1, 2, 2), and R the number of rainbow cells, colored (1, 2, 3). Consider edges in the subdivision whose endpoints receive colors 1 and 2.
Let X denote the number of boundary edges colored (1, 2), and Y the number of interior edges colored (1, 2) (inside the triangle T). We count these edges in two different ways:

• Over cells of the subdivision: each cell of type Q contributes 2 edges colored (1, 2), while each cell of type R contributes exactly 1 such edge. Note that this way we count interior edges of type (1, 2) twice, whereas boundary edges are counted only once. We conclude that $2Q + R = X + 2Y$.

• Over the boundary of T: edges colored (1, 2) can lie only on the edge of T between the two vertices of T colored 1 and 2. As we already argued in the 1-dimensional case, between 1 and 2 there must be an odd number of edges colored (1, 2). Hence, X is odd.

Since $R = X + 2Y - 2Q$, this implies that R is also odd.

General case. In the general n-dimensional case, we proceed by induction on n. We have a proper coloring of a simplicial subdivision of S using n + 1 colors. Let R denote the number of rainbow cells, using all n + 1 colors. Let Q denote the number of simplicial cells that get all the colors except n + 1, i.e. they are colored using $\{1, 2, \ldots, n\}$ so that exactly one of these colors is used twice and the other colors once. Also, we consider (n − 1)-dimensional faces that use exactly the colors $\{1, 2, \ldots, n\}$. Let X denote the number of such faces on the boundary of S, and Y the number of such faces inside S. Again, we count in two different ways.

• Each cell of type R contributes exactly one face colored $\{1, 2, \ldots, n\}$. Each cell of type Q contributes exactly two such faces. However, interior faces appear in two cells while boundary faces appear in one cell. Hence, we get the equation $2Q + R = X + 2Y$.

• On the boundary, the only (n − 1)-dimensional faces colored $\{1, 2, \ldots, n\}$ can be on the face $F \subset S$ whose vertices are colored $\{1, 2, \ldots, n\}$. Here, we use the inductive hypothesis for F, which forms a properly colored (n − 1)-dimensional subdivision.
By the hypothesis, F contains an odd number of rainbow (n − 1)-dimensional cells, i.e. X is odd. We conclude that R is odd as well.

2 Brouwer's Fixed Point Theorem

Theorem 1 (Brouwer, 1911). Let $B^n$ denote an n-dimensional ball. For any continuous map $f: B^n \to B^n$, there is a point $x \in B^n$ such that f(x) = x.

We show how this theorem follows from Sperner's lemma. It will be convenient to work with a simplex instead of a ball (which is equivalent up to homeomorphism). Specifically, let S be the simplex embedded in $\mathbb{R}^{n+1}$ whose vertices are $v_1 = (1, 0, \ldots, 0)$, $v_2 = (0, 1, \ldots, 0)$, ..., $v_{n+1} = (0, 0, \ldots, 1)$. Let $f: S \to S$ be a continuous map and assume that it has no fixed point. We construct a sequence of subdivisions of S that we denote by $S_1, S_2, S_3, \ldots$. Each $S_j$ is a subdivision of $S_{j-1}$, such that the size of each cell in $S_j$ tends to zero as $j \to \infty$.

Now we define a coloring of $S_j$. To each vertex $x \in S_j$, we assign a color $c(x) \in [n+1]$ such that $(f(x))_{c(x)} < x_{c(x)}$. To see that this is possible, note that for each point $x \in S$, $\sum_i x_i = 1$ and also $\sum_i (f(x))_i = 1$. Unless f(x) = x, there are coordinates such that $(f(x))_i < x_i$ and also $(f(x))_{i'} > x_{i'}$. In case there are multiple coordinates such that $(f(x))_i < x_i$, we pick the smallest such i.

Let us check that this is a proper coloring in the sense of Sperner's lemma. For a vertex of S, $v_i = (0, \ldots, 1, \ldots, 0)$, we have $c(v_i) = i$ because i is the only coordinate where $(f(v_i))_i < (v_i)_i$ is possible. Similarly, for vertices on a face of S, say $x \in \mathrm{conv}\{v_i : i \in A\}$, the only coordinates where $(f(x))_i < x_i$ is possible are those with $i \in A$, and hence $c(x) \in A$.

Sperner's lemma implies that there is a rainbow cell with vertices $x^{(j,1)}, \ldots, x^{(j,n+1)} \in S_j$. In other words, $(f(x^{(j,i)}))_i < x^{(j,i)}_i$ for each $i \in [n+1]$. Since this is true for each $S_j$, we get a sequence of points $\{x^{(j,1)}\}$ inside the compact set S, which has a convergent subsequence.
Let us throw away all the elements outside of this subsequence, so we can assume that $\{x^{(j,1)}\}$ itself is convergent. Since the size of the cells in $S_j$ tends to zero, the limits $\lim_{j\to\infty} x^{(j,i)}$ are in fact the same for all $i \in [n+1]$; let us call this common limit point $x^* = \lim_{j\to\infty} x^{(j,i)}$. We assumed that there is no fixed point, so $f(x^*) \neq x^*$. This means that $(f(x^*))_i > x^*_i$ for some coordinate i. But we know that $(f(x^{(j,i)}))_i < x^{(j,i)}_i$ for all j, and $\lim_{j\to\infty} x^{(j,i)} = x^*$, which implies $(f(x^*))_i \leq x^*_i$ by continuity. This is a contradiction, so f must have a fixed point.

Math 184: Combinatorics
Lecture 4: Principle of inclusion and exclusion
Instructor: Benny Sudakov

1 Principle of inclusion and exclusion

Very often, we need to calculate the number of elements in the union of certain sets. Assuming that we know the sizes of these sets and of their mutual intersections, the principle of inclusion and exclusion allows us to do exactly that. Suppose that we have two sets A, B. The size of the union is certainly at most |A| + |B|. This way, however, we are counting twice all the elements in $A \cap B$, the intersection of the two sets. To correct for this, we subtract $|A \cap B|$ and obtain the following formula:
$|A \cup B| = |A| + |B| - |A \cap B|$.
In general, the formula gets more complicated because we have to take into account intersections of multiple sets. The following formula is what we call the principle of inclusion and exclusion.

Lemma 1. For any collection of finite sets $A_1, A_2, \ldots, A_n$, we have
$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{\emptyset \neq I \subseteq [n]} (-1)^{|I|+1} \left| \bigcap_{i \in I} A_i \right|$.

Writing out the formula more explicitly, we get
$|A_1 \cup \ldots \cup A_n| = |A_1| + \ldots + |A_n| - |A_1 \cap A_2| - \ldots - |A_{n-1} \cap A_n| + |A_1 \cap A_2 \cap A_3| + \ldots$
In other words, we add up the sizes of the sets, subtract the intersections of pairs, add the intersections of triples, etc. The proof of this formula is very short and elegant, using the notion of a characteristic function.

Proof. Assume that $A_1, \ldots, A_n \subseteq X$.
For each set $A_i$, define the "characteristic function" $f_i(x)$ where $f_i(x) = 1$ if $x \in A_i$ and $f_i(x) = 0$ if $x \notin A_i$. We consider the following function:
$F(x) = \prod_{i=1}^{n} (1 - f_i(x))$.
Observe that F is the characteristic function of the complement of the union: F(x) = 1 iff x is not in any of the sets $A_i$. Hence,
$\sum_{x \in X} F(x) = \left| X \setminus \bigcup_{i=1}^{n} A_i \right|$.    (1)
Now we write F(x) differently, by expanding the product into $2^n$ terms:
$F(x) = \prod_{i=1}^{n} (1 - f_i(x)) = \sum_{I \subseteq [n]} (-1)^{|I|} \prod_{i \in I} f_i(x)$.
Observe that $\prod_{i \in I} f_i(x)$ is the characteristic function of $\bigcap_{i \in I} A_i$. Therefore, we get
$\sum_{x \in X} F(x) = \sum_{I \subseteq [n]} (-1)^{|I|} \sum_{x \in X} \prod_{i \in I} f_i(x) = \sum_{I \subseteq [n]} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right|$.    (2)
By comparing (1) and (2), we see that
$\left| X \setminus \bigcup_{i=1}^{n} A_i \right| = |X| - \left| \bigcup_{i=1}^{n} A_i \right| = \sum_{I \subseteq [n]} (-1)^{|I|} \left| \bigcap_{i \in I} A_i \right|$.
The first term in the sum here is $|\bigcap_{i \in \emptyset} A_i| = |X|$ by convention (consider how we obtained this term in the derivation above). Subtracting |X| from both sides and changing signs, the lemma follows.

2 The number of derangements

As an application of this principle, consider the following problem. After the show, n theatergoers want to pick up their hats on the way out. However, the deranged attendant does not know which hat belongs to whom and hands them out in a random order. What is the probability that nobody gets their own hat? More formally, we have a random permutation $\pi: [n] \to [n]$ and we are asking what is the probability that $\pi(i) \neq i$ for all i. Such permutations are called derangements.

Theorem 1. The probability that a random permutation $\pi: [n] \to [n]$ is a derangement is $\sum_{k=0}^{n} (-1)^k / k!$, which tends to $1/e = 0.3678\ldots$ as $n \to \infty$.

Proof. Let X be the set of all n! permutations, and let $A_i$ denote the set of permutations that fix element i, i.e. $A_i = \{\pi \in X \mid \pi(i) = i\}$. By simple counting, there are (n − 1)! permutations in $A_i$, since after fixing i we still have n − 1 elements to permute. Similarly, $\bigcap_{i \in I} A_i$ consists of the permutations where all elements of I are fixed, hence the number of such permutations is (n − |I|)!.
By inclusion-exclusion, the number of permutations with some fixed point is
$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{\emptyset \neq I \subseteq [n]} (-1)^{|I|+1} \left| \bigcap_{i \in I} A_i \right| = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} (n-k)! = \sum_{k=1}^{n} (-1)^{k+1} \frac{n!}{k!}$.
Hence, the probability that a random permutation has some fixed point is $\sum_{k=1}^{n} (-1)^{k+1}/k!$. By taking the complement, the probability that there is no fixed point is $1 - \sum_{k=1}^{n} (-1)^{k+1}/k! = \sum_{k=0}^{n} (-1)^k/k!$. In the limit, this tends to the Taylor expansion of $e^{-1} = \sum_{k=0}^{\infty} (-1)^k/k!$.

3 The number of surjections

Next, consider the following situation. There are m hunters and n rabbits, $m \geq n$. Each hunter shoots and kills exactly one rabbit, chosen at random. What is the probability that all rabbits are dead?

Theorem 2. The probability that all rabbits are dead is $\sum_{k=0}^{n-1} (-1)^k \binom{n}{k} (1 - k/n)^m$.

Proof. We formalize this problem as follows. A function $f: [m] \to [n]$ is called a surjection if it covers all elements of [n]. There are $n^m$ functions in total; we are interested in how many of these are surjections. We denote by $A_i$ the set of functions that leave element i uncovered, i.e. $A_i = \{f: [m] \to [n] \mid f(j) \neq i \text{ for all } j\}$. The number of such functions is $(n-1)^m$, since we have n − 1 choices for each of $f(1), f(2), \ldots, f(m)$. Similarly,
$\left| \bigcap_{i \in I} A_i \right| = (n - |I|)^m$,
because we have |I| forbidden choices for each function value. By inclusion-exclusion, the number of functions which are not surjections is
$\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{\emptyset \neq I \subseteq [n]} (-1)^{|I|+1} (n - |I|)^m = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} (n-k)^m$.
By taking the complement, the number of surjections is
$n^m - \left| \bigcup_{i=1}^{n} A_i \right| = \sum_{k=0}^{n-1} (-1)^k \binom{n}{k} (n-k)^m$.
Dividing by $n^m$, we get the desired probability.

Math 184: Combinatorics
Lecture 5: Ramsey Theory
Instructor: Benny Sudakov

1 Ramsey's theorem for graphs

The metastatement of Ramsey theory is that "complete disorder is impossible". In other words, in a large system, however complicated, there is always a smaller subsystem which exhibits some sort of special structure.
Perhaps the oldest statement of this type is the following.

Proposition 1. Among any six persons, either there are three persons any two of whom are friends, or there are three persons such that no two of them are friends.

This is not a sociological claim, but a very simple graph-theoretic statement: in any graph on 6 vertices, there is either a triangle or a set of three vertices with no edges between them.

Proof. Let G = (V, E) be a graph with |V| = 6. Fix a vertex $v \in V$. We consider two cases.

• If the degree of v is at least 3, then consider three neighbors of v, call them x, y, z. If any two among {x, y, z} are friends, we are done, because they form a triangle together with v. If not, then no two of {x, y, z} are friends and we are done as well.

• If the degree of v is at most 2, then there are at least three other vertices which are not neighbors of v, call them x, y, z. In this case, the argument is complementary to the previous one. Either {x, y, z} are mutual friends, in which case we are done. Or there are two among {x, y, z} who are not friends, for example x and y, and then no two of {v, x, y} are friends.

More generally, we consider the following setting. We color the edges of $K_n$ (the complete graph on n vertices) with a certain number of colors, and we ask whether there is a complete subgraph (a clique) of a certain size such that all its edges have the same color. We shall see that this is always true for a sufficiently large n. Note that the question about friendships corresponds to a coloring of $K_6$ with 2 colors, "friendly" and "unfriendly". Equivalently, we can start with an arbitrary graph and look for either a clique or the complement of a clique, which is called an independent set. This leads to the definition of Ramsey numbers.

Definition 1. A clique of size t is a set of t vertices such that all pairs among them are edges. An independent set of size s is a set of s vertices such that there is no edge between them.
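Both halves of the six-person argument can also be verified by exhaustive search over all $2^{\binom{6}{2}} = 32768$ graphs on 6 vertices. This brute-force sketch (the function name is ours) additionally shows that 6 cannot be replaced by 5, since the 5-cycle contains neither structure:

```python
from itertools import combinations

def has_mono_triple(n, edges):
    """True iff the graph on vertices 0..n-1 (edges: set of frozenset
    pairs) contains a triangle or an independent set of 3 vertices."""
    for a, b, c in combinations(range(n), 3):
        inside = sum(frozenset(p) in edges for p in ((a, b), (b, c), (a, c)))
        if inside in (0, 3):
            return True
    return False

# The 5-cycle has neither a triangle nor an independent triple ...
c5 = {frozenset((i, (i + 1) % 5)) for i in range(5)}
assert not has_mono_triple(5, c5)

# ... while every graph on 6 vertices contains one of the two,
# confirming Proposition 1 and showing 6 is optimal.
pairs = list(combinations(range(6), 2))
for mask in range(1 << len(pairs)):
    edges = {frozenset(p) for i, p in enumerate(pairs) if mask >> i & 1}
    assert has_mono_triple(6, edges)
```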
Ramsey's theorem states that every large enough graph contains an independent set of size s or a clique of size t. The smallest number of vertices required to achieve this is called a Ramsey number.

Definition 2. The Ramsey number R(s, t) is the minimum number n such that any graph on n vertices contains either an independent set of size s or a clique of size t. The Ramsey number $R_k(s_1, s_2, \ldots, s_k)$ is the minimum number n such that any coloring of the edges of $K_n$ with k colors contains a clique of size $s_i$ in color i, for some i.

Note that it is not clear a priori that Ramsey numbers are finite! Indeed, it could conceivably be the case that there is no finite number satisfying the conditions of R(s, t) for some choice of s, t. However, the following theorem proves that this is not the case and gives an explicit bound on R(s, t).

Theorem 1 (Ramsey's theorem). For any $s, t \geq 1$, there is $R(s, t) < \infty$ such that any graph on R(s, t) vertices contains either an independent set of size s or a clique of size t. In particular,
$R(s, t) \leq \binom{s+t-2}{s-1}$.

We remark that the bound given here is stronger than Ramsey's original bound.

Proof. We show that $R(s, t) \leq R(s-1, t) + R(s, t-1)$. To see this, let $n = R(s-1, t) + R(s, t-1)$ and consider any graph G on n vertices. Fix a vertex $v \in V$. Since v has n − 1 = R(s, t−1) + R(s−1, t) − 1 other vertices, at least one of the following two cases must occur:

• There are at least R(s, t−1) edges incident with v. Then we apply induction to the neighbors of v, which implies that they contain either an independent set of size s, or a clique of size t − 1. In the second case, we can extend the clique by adding v. Hence G contains either an independent set of size s or a clique of size t.

• There are at least R(s−1, t) non-neighbors of v. Then we apply induction to the non-neighbors of v and we get either an independent set of size s − 1, or a clique of size t. Again, the independent set can be extended by adding v, and hence we are done.
Given that $R(s, t) \leq R(s-1, t) + R(s, t-1)$, it follows by induction that these Ramsey numbers are finite. Moreover, we get an explicit bound. First, $R(s, t) \leq \binom{s+t-2}{s-1}$ holds in the base cases where s = 1 or t = 1, since every graph contains a clique or an independent set of size 1. The inductive step is as follows:
$R(s, t) \leq R(s-1, t) + R(s, t-1) \leq \binom{s+t-3}{s-2} + \binom{s+t-3}{s-1} = \binom{s+t-2}{s-1}$
by Pascal's identity for binomial coefficients.

For a larger number of colors, we get a similar statement.

Theorem 2. For any $s_1, \ldots, s_k \geq 1$, there is $R_k(s_1, \ldots, s_k) < \infty$ such that for any k-coloring of the edges of $K_n$, $n \geq R_k(s_1, \ldots, s_k)$, there is a clique of size $s_i$ in some color i.

We only sketch the proof here. Let us assume for simplicity that $k \geq 4$ is even. We show that
$R_k(s_1, s_2, \ldots, s_k) \leq R_{k/2}(R(s_1, s_2), R(s_3, s_4), \ldots, R(s_{k-1}, s_k))$.
To prove this, let $n = R_{k/2}(R(s_1, s_2), R(s_3, s_4), \ldots, R(s_{k-1}, s_k))$ and consider any k-coloring of the edges of $K_n$. We pair up the colors: {1, 2}, {3, 4}, {5, 6}, etc., and merge each pair into a single color. By the definition of n, there exist an i and a subset S of $R(s_{2i-1}, s_{2i})$ vertices such that all edges on S use only the colors 2i − 1 and 2i. By applying Ramsey's theorem once again to S, there is either a clique of size $s_{2i-1}$ in color 2i − 1, or a clique of size $s_{2i}$ in color 2i.

2 Schur's theorem

Ramsey theory for integers is about finding monochromatic subsets with a certain arithmetic structure. It starts with the following theorem of Schur (1916), which turns out to be an easy application of Ramsey's theorem for graphs.

Theorem 3. For any $k \geq 2$, there is n such that for any k-coloring of $\{1, 2, \ldots, n\}$, there are three integers x, y, z of the same color such that x + y = z.

Proof. We choose $n = R_k(3, 3, \ldots, 3)$, i.e. the Ramsey number such that any k-coloring of the edges of $K_n$ contains a monochromatic triangle.
Given a coloring $c: [n] \to [k]$, we define an edge coloring $\chi$ of $K_n$: the color of the edge {i, j} will be $\chi(\{i, j\}) = c(|j - i|)$. By Ramsey's theorem for graphs, there is a monochromatic triangle on vertices {i, j, k}; assume i < j < k. Then we set x = j − i, y = k − j and z = k − i. We have c(x) = c(y) = c(z) and x + y = z.

Schur used this in his work related to Fermat's Last Theorem. More specifically, he proved that Fermat's Last Theorem is false in the finite field $\mathbb{Z}_p$ for any sufficiently large prime p.

Theorem 4. For every $m \geq 1$, there is $p_0$ such that for any prime $p \geq p_0$, the congruence $x^m + y^m = z^m \pmod{p}$ has a solution with $x, y, z \not\equiv 0$.

Proof. The multiplicative group $\mathbb{Z}_p^*$ is known to be cyclic, and hence it has a generator g. Each element of $\mathbb{Z}_p^*$ can be written as $x = g^{mj+i}$ where $0 \leq i < m$. We color the elements of $\mathbb{Z}_p^*$ by m colors, where c(x) = i if $x = g^{mj+i}$. By Schur's theorem, for p sufficiently large there are elements $x', y', z' \in \mathbb{Z}_p^*$ of the same color such that $x' + y' = z'$. Therefore, $x' = g^{mj_x+i}$, $y' = g^{mj_y+i}$ and $z' = g^{mj_z+i}$, and
$g^{mj_x+i} + g^{mj_y+i} = g^{mj_z+i}$.
Dividing by $g^i$ and setting $x = g^{j_x}$, $y = g^{j_y}$ and $z = g^{j_z}$, we get a solution of $x^m + y^m = z^m$ in $\mathbb{Z}_p$.

Math 184: Combinatorics
Lecture 5B: Ramsey theory: integers
Instructor: Benny Sudakov

1 Hales-Jewett and Van der Waerden's theorems

The question of finding monochromatic subsets with some additive structure was investigated more generally by Van der Waerden. He proved the following general theorem. We remark that this still predates Ramsey's theorem for graphs.

Theorem 1 (Van der Waerden, 1927). For any $k \geq 2$ and $\ell \geq 3$, there is n such that any k-coloring of [n] contains a monochromatic arithmetic progression of length $\ell$: $\{a, a+b, a+2b, \ldots, a+(\ell-1)b\}$.

Later, Hales and Jewett discovered the following Ramsey theorem, from which many Ramsey-type statements can be easily deduced. This theorem is about colorings of sequences of n symbols from some alphabet A. We denote the set of such sequences by $A^n$.
Geometrically, A^n can be viewed as an n-dimensional cube.

Definition 1. A combinatorial line in A^n is a set of |A| points defined by x ∈ A^n and a nonempty S ⊆ [n]:

$$L(x,S) = \{\, y \in A^n : \forall i,j \in S,\ y_i = y_j; \ \forall i \notin S,\ y_i = x_i \,\}.$$

In other words, we fix the coordinates outside of S, while the coordinates in S all vary simultaneously over the symbols of A.

Theorem 2 (Hales-Jewett, 1963). For any finite alphabet A and k ≥ 2, there is n_0(A,k) such that for any k-coloring of A^n, n ≥ n_0(A,k), there exists a monochromatic combinatorial line.

We will not prove this theorem in this class, although it can be done in one lecture (the proof can be found in Jukna's book). We will only show how Van der Waerden's theorem follows from Hales-Jewett. Given k and ℓ, we fix an alphabet A = {1, 2, …, ℓ} and choose n = ℓ·n_0(A,k), where n_0(A,k) is large enough according to the Hales-Jewett theorem. Given a k-coloring c : [n] → [k], we define an induced coloring χ : A^{n_0} → [k]:

$$\chi(a_1, a_2, \ldots, a_{n_0}) = c(a_1 + a_2 + \cdots + a_{n_0}).$$

(The sum is always between n_0 and ℓ·n_0 = n, so this is well defined.) By the Hales-Jewett theorem, there is a monochromatic combinatorial line in A^{n_0}. It is easy to see that under the mapping (a_1, …, a_{n_0}) ↦ a_1 + ⋯ + a_{n_0}, this combinatorial line translates into an arithmetic progression of length ℓ, with common difference |S|.

2 Szemerédi's theorem

Much later, Szemerédi proved that arithmetic progressions can be found not only in any k-coloring, but in fact in any set of sufficient density.

Theorem 3 (Szemerédi). For any δ > 0 and ℓ ≥ 3, there is n_0 such that for any n ≥ n_0 and any set S ⊆ [n] with |S| ≥ δn, S contains an arithmetic progression of length ℓ.

It can be seen that this implies Van der Waerden's theorem, since we can set δ = 1/k, and for any k-coloring of [n] one color class contains at least δn elements. However, the proof of Szemerédi's theorem is much more complicated (it uses the Szemerédi regularity lemma).
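Small cases of Van der Waerden's theorem can be checked exhaustively. The sketch below (plain Python written for these notes; the helper name is ours) verifies that every 2-coloring of {1, …, 9} contains a monochromatic 3-term arithmetic progression, while {1, …, 8} still admits a coloring avoiding one, i.e. the Van der Waerden number W(3; 2) equals 9.

```python
from itertools import product

def has_mono_ap(coloring, length=3):
    """Does a coloring of {1,...,n} (given as a 0-indexed list of colors)
    contain a monochromatic arithmetic progression of the given length?"""
    n = len(coloring)
    for a in range(1, n + 1):
        for b in range(1, n):
            terms = [a + i * b for i in range(length)]
            if terms[-1] > n:
                break  # larger b only overshoots further
            if len({coloring[t - 1] for t in terms}) == 1:
                return True
    return False

# Every 2-coloring of [9] contains a monochromatic 3-term progression,
# while [8] still admits a coloring avoiding one, so W(3; 2) = 9.
assert all(has_mono_ap(list(c)) for c in product(range(2), repeat=9))
assert not all(has_mono_ap(list(c)) for c in product(range(2), repeat=8))
```

The search is feasible because there are only 2^9 = 512 colorings to examine; the exhaustive approach obviously does not scale to larger parameters.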
Recently, there has been renewed interest in this theory, in particular in the relationship between the magnitude of n_0 and the density δ. The best known bounds are due to Ben Green and Terence Tao, who showed that it is enough to choose n_0 ∼ $2^{1/\delta^c}$ for some constant c. This is a significant improvement over the original bounds of Szemerédi, which involve tower functions of 1/δ. These developments also led to the celebrated result of Green and Tao that the prime numbers contain arbitrarily long arithmetic progressions.

Math 184: Combinatorics Lecture 6: Ramsey theory: continued Instructor: Benny Sudakov

1 Bounds on Ramsey numbers

Of particular interest are the diagonal Ramsey numbers R(s,s). The bound we have proved gives

$$R(s,s) \le \binom{2s-2}{s-1} \le \frac{4^s}{\sqrt{s}}.$$

This bound has not been improved significantly for over 50 years! All we know currently is that exponential growth is the right order of magnitude, but the base of the exponent is not known. The following is an old lower bound of Erdős.

Note that to get a lower bound, we need to show that there is a large graph without cliques and independent sets of a certain size. Equivalently, we need to prove that there is a 2-coloring such that there is no monochromatic clique of a certain size in either color. This is quite difficult to achieve by an explicit construction. (The early constructive lower bounds on R(s,s) were only polynomial in s.) The interesting feature of Erdős's proof is that he never presents a specific coloring. He simply proves that choosing a coloring at random almost always works! This was one of the first occurrences of the probabilistic method in combinatorics. The probabilistic method has been used in combinatorics ever since with phenomenal success, using much more sophisticated tools; we will return to this later.

Theorem 1. For s ≥ 3, R(s,s) > 2^{s/2}.

Proof. Let n = 2^{s/2}. Consider a random coloring of K_n where each edge is colored independently red or blue with probability 1/2.
For any particular s-tuple of vertices S, the probability that the clique on S has all edges of the same color is $2/2^{\binom{s}{2}}$. The number of s-tuples of vertices is $\binom{n}{s}$, and therefore the probability that some s-clique is monochromatic is at most

$$\binom{n}{s} \cdot \frac{2}{2^{\binom{s}{2}}} < \frac{n^s}{s!} \cdot \frac{2}{2^{s(s-1)/2}} = \frac{2^{1+s/2}}{s!} < 1$$

for s ≥ 3 (using n = 2^{s/2}, so n^s = 2^{s^2/2}). Therefore, with non-zero probability there is no monochromatic clique of size s, and such a coloring certifies that R(s,s) > 2^{s/2}.

Determining Ramsey numbers exactly, even for small values of s, is a notoriously difficult task. The currently known diagonal values are R(2,2) = 2, R(3,3) = 6 and R(4,4) = 18. R(5,5) is known to be somewhere between 43 and 49, and R(6,6) between 102 and 165.¹

¹ A famous quote from Paul Erdős goes as follows: "Imagine an alien force, vastly more powerful than us, demanding the value of R(5,5) or they will destroy our planet. In that case, we should marshal all our computers and all our mathematicians and attempt to find the value. But suppose, instead, that they ask for R(6,6). Then we should attempt to destroy the aliens."

Math 184: Combinatorics Lecture 7: Ramsey theory: continued Instructor: Benny Sudakov

1 Ramsey's theorem for hypergraphs

Next, we will talk about "generalized graphs" which are called hypergraphs. While a graph is given by a collection of pairs (edges) on a set of vertices V, a hypergraph can contain "hyperedges" of arbitrary size. Thus a hypergraph in full generality is any collection of subsets of V. Of particular importance are r-uniform hypergraphs, where all hyperedges have size r. Thus graphs can be viewed as 2-uniform hypergraphs.

Definition 1. An r-uniform hypergraph is a pair H = (V,E) where V is a finite set of vertices and E ⊆ $\binom{V}{r}$ is a set of hyperedges. An empty hypergraph is a hypergraph with no hyperedges. A complete r-uniform hypergraph is $K_n^{(r)} = (V, \binom{V}{r})$ where |V| = n. A subhypergraph of H induced by a set of vertices S is H[S] = (S, E ∩ $\binom{S}{r}$).
An independent set is a set of vertices S that induces an empty hypergraph. A clique is a set of vertices T that induces a complete hypergraph. We define Ramsey numbers for hypergraphs in a way similar to the previous lecture.

Definition 2. The hypergraph Ramsey number R^{(r)}(s,t) is the minimum number n such that any r-uniform hypergraph on n vertices contains either an independent set of size s or a clique of size t. The Ramsey number R_k^{(r)}(s_1, s_2, …, s_k) is the minimum number n such that any coloring of the edges of the complete hypergraph K_n^{(r)} with k colors contains a clique of size s_i whose edges all have color i, for some i.

Theorem 1 (Ramsey for hypergraphs). For any s, t ≥ r ≥ 1, the Ramsey number R^{(r)}(s,t) is finite and satisfies

$$R^{(r)}(s,t) \le R^{(r-1)}\big(R^{(r)}(s-1,t),\ R^{(r)}(s,t-1)\big) + 1.$$

Similarly, it can be shown that the hypergraph Ramsey numbers for k colors are finite.

Proof. We know from the previous lecture that the Ramsey numbers R^{(2)}(s,t) = R(s,t) are finite. It is also easy to see that R^{(r)}(s,r) = s (because any hypergraph on s vertices is either empty or contains some edge on r vertices), and similarly R^{(r)}(r,t) = t. So it remains to prove the inductive step.

Fix r, s, t and assume that R^{(r)}(s−1,t), R^{(r)}(s,t−1) and R^{(r−1)}(u,v) for all u, v are finite. Let n = R^{(r−1)}(R^{(r)}(s−1,t), R^{(r)}(s,t−1)) + 1 and consider any r-uniform hypergraph H on n vertices. Fix a vertex v ∈ V and define an (r−1)-uniform hypergraph L(v) on the remaining vertices: an (r−1)-tuple R′ is in L(v) if and only if R′ ∪ {v} is a hyperedge of H.¹

¹ L(v) is sometimes called the link of v. It generalizes the notion of a neighborhood in a graph.

Since L(v) is an (r−1)-uniform hypergraph on R^{(r−1)}(R^{(r)}(s−1,t), R^{(r)}(s,t−1)) vertices, it has either an independent set of size R^{(r)}(s−1,t) or a clique of size R^{(r)}(s,t−1).
• If L(v) has an independent set S of size R^{(r)}(s−1,t), then we know that for any (r−1)-tuple R′ ∈ $\binom{S}{r-1}$, {v} ∪ R′ is not a hyperedge of H. Also, by induction we can apply the Ramsey property to the induced hypergraph H[S]: it contains either an independent set S_1 of size s−1 or a clique T_1 of size t. In the first case, S_1 ∪ {v} is an independent set of size s in H, and in the second case, T_1 is a clique of size t in H.

• If L(v) has a clique T of size R^{(r)}(s,t−1), then we know that for any (r−1)-tuple R′ ∈ $\binom{T}{r-1}$, R′ ∪ {v} is a hyperedge of H. Again, by induction on H[T], we obtain that H[T] has either an independent set S_1 of size s or a clique T_1 of size t−1. In the first case we are done, and in the second case T_1 ∪ {v} is a clique of size t in H.

The proof that the Ramsey numbers R_k^{(r)}(s_1, s_2, …, s_k) are finite is more complicated but in the same spirit. Let us just state a corollary of this statement, Ramsey's theorem for coloring sets.

Theorem 2. For any t ≥ r ≥ 2 and k ≥ 2, there is n such that for any coloring χ : $\binom{[n]}{r}$ → [k], there is a subset T ⊂ [n] of size |T| = t such that all subsets R ∈ $\binom{T}{r}$ have the same color χ(R).

Note that this corresponds to the special case where we color a complete r-uniform hypergraph with k colors and want to find a monochromatic clique of size t in some color. Therefore, n can be chosen as n = R_k^{(r)}(t, t, …, t).

2 Convex polygons among points in the plane

Ramsey theory has many applications. The following is a geometric statement which follows quite surprisingly from Ramsey's theorem for 4-uniform hypergraphs.

Theorem 3 (Erdős-Szekeres, 1935). For any m ≥ 4, there is n such that given any configuration of n points in the plane, no three on the same line, there are m points forming a convex polygon.

To deduce this theorem from Ramsey's theorem, we need only the following two geometric facts. We say that points are in convex position if they form a convex polygon.

Fact 1.
Among any five points in the plane, no three on the same line, there are always four points in convex position.

Fact 2. If out of m points, any four are in convex position, then all of them are in convex position.

The first fact can be seen by checking three cases, depending on whether the convex hull of the 5 points consists of 3, 4 or 5 points. Note that this proves the statement of the theorem for m = 4. The second fact holds because if one of the m points is inside the convex hull of the others, then it is also inside a triangle formed by 3 of the other points, and then these 4 points are not in convex position.

Now we are ready to prove the theorem.

Proof. Given m, let n = R^{(4)}(5,m) and consider any set of n points in the plane, no three on the same line. We define a 4-uniform hypergraph H on this set of n points: a 4-tuple of points u, v, x, y forms a hyperedge if they are in convex position. By Ramsey's theorem, H must contain either an independent set of size 5 or a clique of size m. However, an independent set of size 5 would mean that there are 5 points without any 4-tuple in convex position, which would contradict Fact 1. Therefore, there must be a clique of size m, which means m points where all 4-tuples are in convex position. Then Fact 2 implies that these m points form a convex polygon.

Math 184: Combinatorics Lecture 8: Extremal combinatorics Instructor: Benny Sudakov

1 Relationship between Ramsey theory and extremal theory

Consider the following theorem, which falls within the framework of Ramsey theory.

Theorem 1 (Van der Waerden, 1927). For any k ≥ 2 and ℓ ≥ 3, there is n such that any k-coloring of [n] contains a monochromatic arithmetic progression of length ℓ: {a, a+b, a+2b, …, a+(ℓ−1)b}.

This is a classical theorem which predates even Ramsey's theorem about graphs. We are not going to present the proof here.
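The two geometric facts behind the Erdős-Szekeres argument are easy to test numerically. Below is a minimal sketch (our own helper functions, based on standard orientation tests; none of this code is from the notes): four points in general position are in convex position exactly when none lies strictly inside the triangle of the other three, and among five points in general position some four are always in convex position.

```python
from itertools import combinations

def orient(p, q, r):
    # Sign of the cross product (q - p) x (r - p): > 0 left turn, < 0 right turn.
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def in_triangle(p, a, b, c):
    # Strictly inside: all three orientation tests agree in sign.
    s1, s2, s3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    return (s1 > 0) == (s2 > 0) == (s3 > 0)

def four_in_convex_position(quad):
    # Four points (no three collinear) are in convex position iff
    # none of them lies inside the triangle spanned by the other three.
    return not any(in_triangle(p, *[q for q in quad if q != p]) for p in quad)

def has_convex_four(points):
    return any(four_in_convex_position(q) for q in combinations(points, 4))

# A big triangle with two interior points: Fact 1 still yields a convex 4-tuple.
pts = [(0, 0), (10, 1), (5, 9), (4, 3), (6, 5)]
assert has_convex_four(pts)
```

Enumerating all 4-subsets mirrors the hypergraph in the proof above: the hyperedges of H are exactly the quadruples for which `four_in_convex_position` returns True.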
Note, however, that in order to prove such a statement, it would be enough to show that for sufficiently large n, any subset of [n] with at least n/k elements contains an arithmetic progression of length ℓ. This is indeed what Szemerédi proved, much later and using much more involved techniques.

Theorem 2 (Szemerédi). For any δ > 0 and ℓ ≥ 3, there is n_0 such that for any n ≥ n_0 and any set S ⊆ [n] with |S| ≥ δn, S contains an arithmetic progression of length ℓ.

It can be seen that this implies Van der Waerden's theorem, since we can set δ = 1/k, and for any k-coloring of [n] one color class contains at least δn elements. Szemerédi's theorem is an extremal type of statement, asserting that any object of sufficient size must contain a certain structure.

2 Bipartite graphs

Definition 1. A graph G is called bipartite if its vertices can be partitioned into V_1 and V_2 so that there are no edges inside V_1 and no edges inside V_2.

Equivalently, G is bipartite if its vertices can be colored with 2 colors so that the endpoints of every edge receive two different colors (the 2 colors correspond to V_1 and V_2). Thus bipartite graphs are exactly the 2-colorable graphs. We also have the following characterization, which is useful to know.

Lemma 1. G is bipartite if and only if it does not contain any cycle of odd length.

Proof. Suppose G has an odd cycle. Then obviously it cannot be bipartite, because no odd cycle is 2-colorable. Conversely, suppose G has no odd cycle. Then we can 2-color each connected component greedily, always choosing for a newly reached vertex the color opposite to that of the already-colored neighbor from which we reached it. Any additional edge is consistent with this coloring; otherwise it would close a cycle of odd length together with the edges we considered already.

The easiest extremal question is about the maximum possible number of edges in a bipartite graph on n vertices.

Lemma 2. A bipartite graph on n vertices can have at most n²/4 edges.

Proof.
Suppose the bipartition is (V_1, V_2) with |V_1| = k and |V_2| = n−k. The number of edges between V_1 and V_2 can be at most k(n−k), which is maximized for k = n/2, giving at most n²/4 edges.

3 Graphs without a triangle

Let us consider Ramsey's theorem for graphs, which guarantees the existence of a monochromatic triangle for an arbitrary coloring of the edges. An analogous extremal question is: what is the largest number of edges in a graph that does not have any triangle? We remark that this is not the right way to prove Ramsey's theorem; even for triangles, it is not true that for any 2-coloring of a large complete graph, the larger color class must contain a triangle. Exercise: what is a counterexample?

The question of how many edges are necessary to force a graph to contain a triangle is very old, and it was resolved by the following theorem.

Theorem 3 (Mantel, 1907). Any graph G with n vertices and more than n²/4 edges contains a triangle.

Proof. Assume that G has n vertices, m edges and no triangle. Let d_x denote the degree of x ∈ V. Whenever {x,y} ∈ E, the vertices x and y cannot share a neighbor (which would form a triangle), and therefore d_x + d_y ≤ n. Summing over all edges, we get

$$mn \ge \sum_{\{x,y\} \in E} (d_x + d_y) = \sum_{x \in V} d_x^2,$$

where the equality holds because each vertex x contributes d_x to the sum once for each of its d_x edges. On the other hand, applying Cauchy-Schwarz to the vectors (d_1, d_2, …, d_n) and (1, 1, …, 1), we obtain

$$n \sum_{x \in V} d_x^2 \ge \Big(\sum_{x \in V} d_x\Big)^2 = (2m)^2.$$

Combining these two inequalities, we conclude that m ≤ n²/4.

We remark that the analysis above can be tight only if for every edge, every other vertex is connected to exactly one of the two endpoints. This defines a partition V_1 ∪ V_2 such that all edges between V_1 and V_2 are present, i.e. a complete bipartite graph. When |V_1| = |V_2|, this is the unique extremal triangle-free graph, containing n²/4 edges.

4 Graphs without a clique K_{t+1}

More generally, it is interesting to ask how many edges G can have if G does not contain any clique K_{t+1}.
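Mantel's theorem can be confirmed exhaustively for small n. The sketch below (our own brute force, written for these notes) enumerates all graphs on n vertices and checks that the triangle-free maximum is exactly ⌊n²/4⌋; the triangle test deliberately mirrors the proof, looking for an edge whose endpoints share a neighbor.

```python
from itertools import combinations

def max_triangle_free_edges(n):
    """Brute-force maximum edge count of a triangle-free graph on n vertices."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):  # every subset of pairs = every graph
        edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        if len(edges) <= best:
            continue
        adj = {v: set() for v in range(n)}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        # Triangle test as in the proof: an edge whose endpoints share a neighbor.
        if any(adj[u] & adj[v] for u, v in edges):
            continue
        best = len(edges)
    return best

# Mantel: the maximum is floor(n^2 / 4), attained by a complete bipartite graph.
for n in range(2, 6):
    assert max_triangle_free_edges(n) == n * n // 4
```

The enumeration is 2^{n(n-1)/2} graphs, so this is only feasible for very small n; it is a sanity check, not a proof technique.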
Graphs without K_{t+1} can be constructed, for example, by taking t disjoint sets of vertices, V = V_1 ∪ … ∪ V_t, and inserting all edges between vertices in different sets. Obviously there is no K_{t+1}, since among any t+1 vertices two lie in the same set V_i. The number of edges in such a graph is maximized when the sets V_i are as evenly sized as possible, i.e. |V_i| − |V_j| ∈ {−1, 0, +1} for all i, j. We call such a graph on n vertices the Turán graph T_{n,t}. Turán proved in 1941 that this is indeed the K_{t+1}-free graph containing the maximum number of edges. Note that the number of edges in T_{n,t} is $\frac{1}{2}(1 - \frac{1}{t})n^2$, assuming for simplicity that n is divisible by t.

Theorem 4 (Turán, 1941). Among all K_{t+1}-free graphs on n vertices, T_{n,t} has the most edges.

Proof. Let G be a graph without K_{t+1} and v_m a vertex of maximum degree d_m. Let S be the set of neighbors of v_m, |S| = d_m, and T = V \ S. Note that by assumption, S contains no clique of size t (together with v_m it would form a K_{t+1}). We modify the graph into G′ as follows: we keep the graph inside S, we include all possible edges between S and T, and we remove all edges inside T. The graph G′ is still K_{t+1}-free, and for each vertex the degree can only increase: for vertices in S this is obvious, and for vertices in T the new degree is d_m, at least as large as any degree in G. Thus the total number of edges can only increase. By induction, G[S] can also be modified into a union of disjoint independent sets with all edges between them. Therefore, the best possible graph has the structure of a Turán graph. To see that the Turán graph is the unique extremal graph, note that if G had any edge inside T, then we would strictly gain by modifying the graph into G′.

We present another proof of Turán's theorem, which is probabilistic. Here we only prove the quantitative part: $\frac{1}{2}(1-\frac{1}{t})n^2$ is the maximum number of edges in a graph without K_{t+1}.

Proof. Consider a probability distribution (p_1, …, p_n) on the vertices, with $\sum_{i=1}^n p_i = 1$. We start with p_i = 1/n for all vertices. Suppose we sample two vertices v_1, v_2 independently according to this distribution; what is the probability that {v_1, v_2} ∈ E? We can write this probability as

$$\Pr[\{v_1,v_2\} \in E] = \sum_{(i,j):\,\{i,j\} \in E} p_i p_j = 2 \sum_{\{i,j\} \in E} p_i p_j,$$

where the first sum runs over ordered pairs. At the beginning, this is equal to $\frac{2}{n^2}|E|$. Now we modify the distribution in order to make Pr[{v_1,v_2} ∈ E] as large as possible. We claim that this probability is maximized by a distribution that is uniform on some clique.

We proceed as follows. If there are two non-adjacent vertices i, j such that p_i, p_j > 0, let $s_i = \sum_{k:\{i,k\} \in E} p_k$ and $s_j = \sum_{k:\{j,k\} \in E} p_k$. If s_i ≥ s_j, we set the probability of vertex i to p_i + p_j and the probability of vertex j to 0 (and conversely if s_i < s_j). It can be verified that this increases Pr[{v_1,v_2} ∈ E] by 2p_j(s_i − s_j) or 2p_i(s_j − s_i), respectively, which is non-negative. Since each step decreases the number of vertices with positive probability, eventually we reach a situation where there are no two non-adjacent vertices of positive probability, i.e. the distribution is supported on a clique Q. Then

$$\Pr[\{v_1,v_2\} \in E] = \Pr[v_1 \ne v_2] = 1 - \sum_{i \in Q} p_i^2.$$

By Cauchy-Schwarz, $\sum_{i \in Q} p_i^2 \ge 1/|Q|$, with equality exactly when p is uniform on Q. Hence

$$\Pr[\{v_1,v_2\} \in E] \le 1 - \frac{1}{|Q|} \le 1 - \frac{1}{t},$$

assuming that there is no clique larger than t. Recall that the probability we started with was $\frac{2}{n^2}|E|$ and we never decreased it in the process. Therefore

$$\frac{2}{n^2}|E| \le 1 - \frac{1}{t}, \qquad \text{i.e.} \qquad |E| \le \frac{1}{2}\Big(1 - \frac{1}{t}\Big)n^2.$$

Math 184: Combinatorics Lecture 9: Extremal combinatorics Instructor: Benny Sudakov

1 The Erdős-Stone theorem

We can ask more generally: what is the maximum number of edges in a graph G on n vertices which does not contain a given subgraph H? We denote this number by ex(n, H). For graphs on n vertices, this question is resolved up to an additive error of o(n²) by the Erdős-Stone theorem. In order to state the theorem, we first need the notion of the chromatic number.

Definition 1.
For a graph H, the chromatic number χ(H) is the smallest c such that the vertices of H can be colored with c colors with no two neighboring vertices receiving the same color.

The chromatic number is an important parameter of a graph. The graphs of chromatic number at most 2 are exactly the bipartite graphs. In contrast, graphs of chromatic number 3 are already hard to describe and hard to recognize algorithmically. Let us also mention the famous Four Color Theorem, which states that any graph that can be drawn in the plane without crossing edges has chromatic number at most 4.

The chromatic number of H turns out to be closely related to the question of how many edges are necessary for H to appear as a subgraph.

Theorem 1 (Erdős-Stone). For any fixed graph H and fixed ε > 0, there is n_0 such that for any n ≥ n_0,

$$\frac{1}{2}\Big(1 - \frac{1}{\chi(H)-1} - \varepsilon\Big) n^2 \le \mathrm{ex}(n,H) \le \frac{1}{2}\Big(1 - \frac{1}{\chi(H)-1} + \varepsilon\Big) n^2.$$

In particular, for bipartite graphs H, which can be colored with 2 colors, we get ex(n,H) ≤ εn² for any ε > 0 and sufficiently large n, so the theorem only says that the extremal number is very small compared to n². We denote this by ex(n,H) = o(n²). For graphs H of chromatic number 3, we get ex(n,H) = n²/4 + o(n²), etc. Note that this also matches the bound we obtained for H = K_{t+1} (where χ(K_{t+1}) = t+1), for which we got the exact answer ex(n, K_{t+1}) = $\frac{1}{2}(1-\frac{1}{t})n^2$.

First, we prove the following technical lemma.

Lemma 1. Fix k ≥ 1, 0 < ε < 1/k and t ≥ 1. Then there is n_0(k,ε,t) such that any graph G with n ≥ n_0(k,ε,t) vertices and m ≥ $\frac{1}{2}(1 - \frac{1}{k} + \varepsilon)n^2$ edges contains k+1 disjoint sets of vertices A_1, A_2, …, A_{k+1} of size t, such that any two vertices in different sets A_i, A_j are joined by an edge.

Proof. First, we find a subgraph G′ ⊆ G in which all degrees are at least (1 − 1/k + ε/2)|V(G′)|. The procedure to find such a subgraph is very simple: as long as there is a vertex of degree smaller than (1 − 1/k + ε/2)|V(G)|, remove that vertex from the graph.
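This vertex-deletion procedure can be sketched in a few lines. In the sketch below (our own code; the function name and interface are illustrative), a generic density threshold c stands in for 1 − 1/k + ε/2: we repeatedly delete any vertex whose current degree falls below c times the current number of vertices.

```python
def peel(adj, c):
    """Repeatedly delete a vertex whose degree is below c * (current order).
    adj maps each vertex to its set of neighbours; returns the surviving
    vertex set (possibly empty)."""
    live = set(adj)
    changed = True
    while changed and live:
        changed = False
        for v in list(live):
            if v in live and len(adj[v] & live) < c * len(live):
                live.remove(v)
                changed = True
    return live

# K5 on vertices 0..4 with a pendant vertex 5 attached to vertex 0:
adj = {v: {u for u in range(5) if u != v} for v in range(5)}
adj[0].add(6 - 6 + 5) if False else adj[0].add(5)
adj[5] = {0}
assert peel(adj, 0.5) == {0, 1, 2, 3, 4}   # the dense core survives the peeling
```

With a very aggressive threshold the whole graph can disappear, which is exactly the scenario the next part of the proof rules out for c = 1 − 1/k + ε/2.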
We just have to prove that this procedure terminates before the graph becomes too small. Suppose that the procedure stops when the graph has n_0 vertices (potentially n_0 = 0, but we will prove that this is impossible). Let us count the total number of edges removed from the graph. At the point when G has ℓ vertices, we remove at most (1 − 1/k + ε/2)ℓ edges. Therefore, the total number of removed edges is at most

$$\sum_{\ell = n_0+1}^{n} \Big(1 - \frac{1}{k} + \frac{\varepsilon}{2}\Big) \ell \le \Big(1 - \frac{1}{k} + \frac{\varepsilon}{2}\Big) \Big( \frac{n^2 - n_0^2}{2} + \frac{n - n_0}{2} \Big).$$

At the end, the remaining graph has at most $\frac{1}{2}n_0^2$ edges. Therefore, the number of edges in the original graph satisfies

$$|E(G)| \le \Big(1 - \frac{1}{k} + \frac{\varepsilon}{2}\Big)\Big( \frac{n^2 - n_0^2}{2} + \frac{n-n_0}{2} \Big) + \frac{n_0^2}{2}.$$

On the other hand, we assumed |E(G)| ≥ $\frac{1}{2}(1 - \frac{1}{k} + \varepsilon)n^2$. Combining these two inequalities and subtracting $(1 - \frac{1}{k} + \frac{\varepsilon}{2})\frac{n^2}{2}$ from both sides, we obtain

$$\frac{\varepsilon n^2}{4} \le \Big(\frac{1}{k} - \frac{\varepsilon}{2}\Big)\frac{n_0^2}{2} + \Big(1 - \frac{1}{k} + \frac{\varepsilon}{2}\Big)\frac{n - n_0}{2} \le \frac{n_0^2}{2k} + \frac{n}{2}.$$

Thus if we want n_0 large enough, it is sufficient to choose n appropriately larger (roughly n ≈ $n_0\sqrt{2/(\varepsilon k)}$ suffices). From now on, we can assume that all degrees in G are at least (1 − 1/k + ε/2)n.

We prove by induction on k that there are k+1 disjoint sets of size t such that all edges between vertices in different sets are present. For k = 1, there is nothing to prove. Let k ≥ 2 and s = ⌈t/ε⌉. By the induction hypothesis, we can find k disjoint sets of size s, A_1, …, A_k, such that any two vertices in two different sets are joined by an edge. Let U = V \ (A_1 ∪ … ∪ A_k) and let W denote the set of vertices in U adjacent to at least t points in each A_i. Let us count the edges missing between U and A_1 ∪ … ∪ A_k. Since every vertex in U \ W is adjacent to fewer than t vertices in some A_i, and hence misses more than s − t ≥ (1 − ε)s edges into that A_i, the number of missing edges is at least

$$\tilde m \ge |U \setminus W|\,(s - t) \ge (n - ks - |W|)(1-\varepsilon)s.$$

On the other hand, any vertex in the graph misses at most (1/k − ε/2)n edges, so counting over A_1 ∪ … ∪ A_k, we get

$$\tilde m \le ks\Big(\frac{1}{k} - \frac{\varepsilon}{2}\Big)n = \Big(1 - \frac{k\varepsilon}{2}\Big)sn.$$
From these inequalities, we deduce

$$|W|(1-\varepsilon)s \ge (n - ks)(1-\varepsilon)s - \Big(1 - \frac{k\varepsilon}{2}\Big)sn = \Big(\frac{k}{2} - 1\Big)\varepsilon sn - (1-\varepsilon)ks^2.$$

Everything else being constant, we can make n large enough so that |W| is arbitrarily large. In particular, we make sure that

$$|W| > \binom{s}{t}^k (t-1).$$

We know that each vertex w ∈ W is adjacent to at least t points in each A_i. For each w, select t specific neighbors in each A_i and denote the union of these kt points by T_w. There are at most $\binom{s}{t}^k$ possible sets T_w; by the pigeonhole principle, at least one of them is chosen for at least t vertices w ∈ W. We let these t vertices constitute our new set A_{k+1}, and we let the respective t-tuples of vertices connected to all of them be A′_i ⊂ A_i. The collection of sets A′_1, …, A′_k, A_{k+1} then satisfies the property that all pairs of vertices from different sets form edges.

Now we are ready to prove the Erdős-Stone theorem.

Proof. Let χ(H) = k+1. The Turán graph T_{n,k} has chromatic number k, hence it cannot contain H. This proves ex(n,H) ≥ $\frac{1}{2}(1-\frac{1}{k})n^2$ whenever n is a multiple of k. For general n,

$$\mathrm{ex}(n,H) \ge \frac{1}{2}\Big(1 - \frac{1}{k}\Big)(n-k)^2 \ge \frac{1}{2}\Big(1 - \frac{1}{k} - \frac{2k}{n}\Big)n^2.$$

On the other hand, fix t = |V(H)| and consider a graph G with n vertices and m ≥ $\frac{1}{2}(1 - \frac{1}{k} + \varepsilon)n^2$ edges. If n is large enough, then by Lemma 1, G contains sets A_1, …, A_{k+1} of size t such that all edges between different sets are present. H is a graph of chromatic number k+1, and therefore its vertices can be embedded into A_1, …, A_{k+1} according to their color classes. We conclude that H is a subgraph of G, and hence

$$\mathrm{ex}(n,H) \le \frac{1}{2}\Big(1 - \frac{1}{k} + \varepsilon\Big)n^2$$

for any ε > 0 and sufficiently large n.

Math 184: Combinatorics Lecture 10: Extremal combinatorics Instructor: Benny Sudakov

1 Bipartite forbidden subgraphs

We have seen the Erdős-Stone theorem, which says that given a forbidden subgraph H, the extremal number of edges is ex(n,H) = $\frac{1}{2}(1 - \frac{1}{\chi(H)-1} + o(1))n^2$. Here, o(1) means a term tending to zero as n → ∞.
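The Turán graphs that serve as the extremal examples above have an easily computed edge count. A quick sketch (helper written for illustration, not from the notes) checks the $\frac{1}{2}(1-\frac{1}{t})n^2$ formula in the divisible case:

```python
def turan_edges(n, t):
    """Edge count of the Turán graph T_{n,t}: the complete t-partite graph
    with parts as equal as possible."""
    sizes = [n // t + (1 if i < n % t else 0) for i in range(t)]
    # Non-edges are exactly the pairs inside a part.
    non_edges = sum(s * (s - 1) // 2 for s in sizes)
    return n * (n - 1) // 2 - non_edges

# When t divides n this equals (1 - 1/t) n^2 / 2 exactly (integer arithmetic
# avoids floating-point roundoff):
assert turan_edges(12, 3) == (3 - 1) * 12 * 12 // (2 * 3)   # 48 edges
assert turan_edges(10, 5) == (5 - 1) * 10 * 10 // (2 * 5)   # 40 edges
assert turan_edges(5, 2) == 6    # K_{2,3}, the Mantel extremal graph for n = 5
```

Counting non-edges inside the parts, rather than edges across them, keeps the formula a one-liner and works for all n, divisible by t or not.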
This basically resolves the question for forbidden subgraphs H of chromatic number at least 3, since then the answer is roughly cn² for some constant c > 0. However, for bipartite forbidden subgraphs, χ(H) = 2, this answer is not satisfactory: we only get ex(n,H) = o(n²), which does not determine the order of ex(n,H). Hence, bipartite graphs form the most interesting class of forbidden subgraphs.

2 Graphs without any 4-cycle

Let us start with the first non-trivial case where H is bipartite, H = C_4; i.e., the question is how many edges G can have before a 4-cycle appears. The answer is roughly n^{3/2}.

Theorem 1. For any graph G on n vertices not containing a 4-cycle,

$$|E(G)| \le \frac{1}{4}\big(1 + \sqrt{4n-3}\big) n.$$

Proof. Let d_v denote the degree of v ∈ V. Let F denote the set of "labeled forks":

$$F = \{(u,v,w) : \{u,v\} \in E,\ \{u,w\} \in E,\ v \ne w\}.$$

Note that we do not care whether {v,w} is an edge or not. We count the size of F in two possible ways. First, each vertex u contributes d_u(d_u − 1) forks, since this is the number of choices for v and w among the neighbors of u. Hence

$$|F| = \sum_{u \in V} d_u (d_u - 1).$$

On the other hand, every ordered pair of vertices (v,w) can contribute at most one fork, because two forks on the same pair would form a 4-cycle in G. Hence |F| ≤ n(n−1). By combining these two inequalities,

$$n(n-1) \ge \sum_{u \in V} d_u^2 - \sum_{u \in V} d_u,$$

and by applying Cauchy-Schwarz, we get

$$n(n-1) \ge \frac{1}{n}\Big(\sum_{u \in V} d_u\Big)^2 - \sum_{u \in V} d_u = \frac{(2m)^2}{n} - 2m.$$

This yields the quadratic inequality 4m² − 2mn − n²(n−1) ≤ 0, whose solution yields the theorem.

This bound can indeed be achieved, i.e. there exist graphs with Ω(n^{3/2}) edges¹ not containing any 4-cycle. One example is the incidence graph between the lines and points of a finite projective plane. We give a similar example here, which is algebraically defined and easier to analyze.

Example. Let V = Z_p × Z_p, i.e. the vertices are pairs (x,y) of elements of a finite field. The number of vertices is n = p².
We define a graph G where (x,y) and (x′,y′) are joined by an edge if x + x′ = yy′. For each vertex (x,y), there are p solutions of this equation (pick any y′ ∈ Z_p, and x′ is uniquely determined). One of these solutions could be (x,y) itself, but in any case (x,y) has at least p−1 neighbors. Hence, the number of edges in the graph is m ≥ $\frac{1}{2}p^2(p-1)$ = Ω(n^{3/2}).

¹ Ω(f(n)) denotes any function which is lower-bounded by c·f(n) for some constant c > 0 and sufficiently large n.

Finally, observe that there is no 4-cycle in G. Suppose that (x,y) is adjacent to both (x_1,y_1) and (x_2,y_2). This means x + x_1 = yy_1 and x + x_2 = yy_2, and therefore x_1 − x_2 = y(y_1 − y_2). Hence, given (x_1,y_1) ≠ (x_2,y_2), we must have y_1 ≠ y_2 (otherwise also x_1 = x_2), so y is determined uniquely, and then x can also be computed from one of the equations above. So (x_1,y_1) and (x_2,y_2) can have at most one common neighbor, which means there is no C_4 in the graph.

3 Graphs without a complete bipartite subgraph

Observe that another way to view C_4 is as the complete bipartite graph K_{2,2}. More generally, we can ask how many edges force a graph to contain a complete bipartite graph K_{t,t}.

Theorem 2. Let t ≥ 2. Then there is a constant c > 0 such that any graph on n vertices without K_{t,t} has at most cn^{2−1/t} edges.

Proof. Let G be a graph without K_{t,t}, V(G) = {1,2,…,n}, and let d_i denote the degree of vertex i. The neighborhood of vertex i contains $\binom{d_i}{t}$ t-tuples of vertices. Let us count such t-tuples over the neighborhoods of all vertices i. Any particular t-tuple can be counted at most t−1 times in this way, since otherwise we would get a copy of K_{t,t}. Therefore,

$$\sum_{i=1}^n \binom{d_i}{t} \le (t-1)\binom{n}{t}.$$

Observe that the average degree in the graph is much more than t, otherwise there is nothing to prove. Due to the convexity of $\binom{d_i}{t}$ as a function of d_i, the left-hand side is minimized when all the degrees are equal, d_i = 2m/n. Therefore,

$$\sum_{i=1}^n \binom{d_i}{t} \ge n \binom{2m/n}{t} \ge n \cdot \frac{(2m/n - t)^t}{t!},$$

and

$$(t-1)\binom{n}{t} \le (t-1)\frac{n^t}{t!}.$$
We conclude that n(2m/n − t)^t ≤ (t−1)n^t, which means

$$m \le \frac{1}{2}(t-1)^{1/t} n^{2-1/t} + \frac{1}{2}tn \le n^{2-1/t} + \frac{1}{2}tn.$$

As an exercise, the reader can generalize the bound above to the following.

Theorem 3. Let s ≥ t ≥ 2. Then for sufficiently large n, any graph on n vertices without K_{s,t} has O(s^{1/t} n^{2−1/t}) edges.

Another extremal bound of this type concerns forbidden even cycles. (Recall that for a forbidden odd cycle, the number of edges can be as large as n²/4.)

Theorem 4. If G has n vertices and no cycle C_{2k}, then the number of edges is m ≤ cn^{1+1/k} for some constant c > 0.

We prove a weaker version of this bound, for graphs that do not contain any cycle of length at most 2k.

Theorem 5. If G has n vertices and no cycle of length shorter than 2k+1, then the number of edges is m < n(n^{1/k} + 1).

Proof. Let ρ(G) = |E(G)|/|V(G)| denote the density of a graph G. First, we show that there is a subgraph G′ where every vertex has degree at least ρ(G). Let G′ be a subgraph of maximum density among all subgraphs of G (certainly ρ(G′) ≥ ρ(G)). We claim that all degrees in G′ are at least ρ(G′). If not, suppose G′ has n′ vertices and m′ = ρ(G′)n′ edges; then by removing a vertex of degree d′ < ρ(G′), we obtain a subgraph G″ with n″ = n′ − 1 vertices and m″ = m′ − d′ > ρ(G′)(n′ − 1) edges, hence ρ(G″) = m″/n″ > ρ(G′), which is a contradiction.

Now consider a graph G with m ≥ n(n^{1/k} + 1) edges and its subgraph G′ of maximum density, in which all degrees are at least ρ(G) ≥ n^{1/k} + 1. We start from any vertex v_0 and grow a tree where each level L_j contains all the neighbors of the vertices in L_{j−1}. As long as we do not encounter any cycle, each vertex has at least n^{1/k} new children, and |L_j| ≥ n^{1/k}|L_{j−1}|. Assuming that there is no cycle of length shorter than 2k+1, we can grow this tree up to level L_k, and we have |L_k| ≥ n. However, this contradicts the fact that the levels should be disjoint and all contained in a graph on n vertices.
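The algebraic C4-free construction from the section above can be verified directly for a small prime. The sketch below (our own code) builds the graph on Z_p × Z_p with (x,y) ~ (x′,y′) iff x + x′ = yy′ (mod p), and checks both the edge count and the absence of a 4-cycle, using the fact that a C_4 exists exactly when some pair of vertices has two common neighbors.

```python
from itertools import combinations

def c4_free_graph(p):
    """Graph on Z_p x Z_p with (x, y) ~ (x', y') iff x + x' = y * y' (mod p)."""
    verts = [(x, y) for x in range(p) for y in range(p)]
    edges = [(u, v) for u, v in combinations(verts, 2)
             if (u[0] + v[0]) % p == (u[1] * v[1]) % p]
    return verts, edges

def has_c4(verts, edges):
    # A 4-cycle exists iff some pair of vertices has two common neighbours.
    adj = {v: set() for v in verts}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return any(len(adj[u] & adj[v]) >= 2 for u, v in combinations(verts, 2))

p = 5
verts, edges = c4_free_graph(p)
assert len(edges) >= p * p * (p - 1) // 2   # at least p^2 (p - 1) / 2 edges
assert not has_c4(verts, edges)
```

On n = p² = 25 vertices this is instant; the check scales as roughly n² pairs, so it remains feasible for moderately larger primes as well.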
4 Application to additive number theory

The following type of question is studied in additive number theory. Suppose we have a set of integers B and we want to generate B by forming sums of pairs of numbers from a smaller set A: A + A = {a + b : a, b ∈ A}. How small can A be? More specifically, suppose we would like to generate the set of squares B = {1², 2², 3², …, m²}. Obviously, we need |A| ≥ √m to generate any set of m numbers.

Theorem 6. For any set A such that B = {1², 2², 3², …, m²} ⊆ A + A, we have |A| ≥ m^{2/3−o(1)}.

Proof. Let B ⊆ A + A and suppose |A| = n. We define a graph G whose vertex set is A, and {a_1, a_2} is an edge if a_1 + a_2 = x² for some integer x. Since we need to generate m different squares, the number of edges is at least m.

Consider a_1, a_2 ∈ A and all numbers b such that a_1 + b = x² and a_2 + b = y². Note that we get a different pair (x,y) for each b. Then

$$a_1 - a_2 = x^2 - y^2 = (x+y)(x-y).$$

Now, (x+y, x−y) cannot be the same pair for different numbers b. Denoting the number of divisors of a_1 − a_2 by d, we can have at most d² such possible pairs (the factors x+y and x−y are both divisors of a_1 − a_2), and each of them can be used only for one number b. Now we use the following proposition.

Proposition. For any ε > 0 and n large enough, n has fewer than n^ε divisors.

This can be proved by considering the prime decomposition $n = \prod_{i=1}^t p_i^{\alpha_i}$, where the number of divisors is $d = \prod_{i=1}^t (1 + \alpha_i)$. We assume α_i ≥ 1 for all i. We claim that for any fixed ε > 0 and n large enough,

$$\phi(n) = \frac{\log d}{\log n} = \frac{\sum_{i=1}^t \log(1+\alpha_i)}{\sum_{i=1}^t \alpha_i \log p_i} < \varepsilon.$$

Observe that the ratio $\frac{\log(1+\alpha_i)}{\alpha_i \log p_i}$ can be larger than ε/2 only if p_i and α_i are bounded by some constants P_ε and A_ε. All such factors together contribute only a constant C_ε to the decomposition of n.
For $n$ arbitrarily large, a majority of the terms will satisfy $\log(1+\alpha_i) < \frac{\epsilon}{2} \alpha_i \log p_i$, and hence $\phi(n)$ drops below $\epsilon$ for sufficiently large $n$.

To summarize, for any pair $a_1, a_2 \in A$, there are fewer than $n^{2\epsilon}$ numbers $b$ that are neighbors of both $a_1$ and $a_2$ in the graph $G$. In other words, $G$ does not contain $K_{2, n^{2\epsilon}}$. By our extremal bound, it has at most $c n^{3/2+\epsilon}$ edges; i.e., $m \le c n^{3/2+\epsilon}$ for any fixed $\epsilon > 0$. Since the number of edges is at least $m$, this gives $n \ge m^{2/3 - o(1)}$.

Math 184: Combinatorics
Lecture 11: The probabilistic method
Instructor: Benny Sudakov

Very often, we need to construct a combinatorial object satisfying certain properties, for example to exhibit a counterexample or a lower bound for a certain statement. In situations where we do not have much a priori information and it is not clear how to define a concrete example, it is often useful to try a random construction.

1 Probability basics

A probability space is a pair $(\Omega, \Pr)$ where $\Pr$ is a normalized measure on $\Omega$, i.e. $\Pr(\Omega) = 1$. In combinatorics, it is mostly sufficient to work with finite probability spaces, so we can avoid most of the technicalities of measure theory. We can assume that $\Omega$ is a finite set and each elementary event $\omega \in \Omega$ has a probability $\Pr[\omega] \in [0,1]$, with $\sum_{\omega \in \Omega} \Pr[\omega] = 1$. Any subset $A \subseteq \Omega$ is an event, of probability $\Pr[A] = \sum_{\omega \in A} \Pr[\omega]$. Observe that a union of events corresponds to OR and an intersection of events corresponds to AND. A random variable is any function $X : \Omega \to \mathbb{R}$. Two important notions here will be expectation and independence.

Definition 1. The expectation of a random variable $X$ is
$$E[X] = \sum_{\omega \in \Omega} X(\omega) \Pr[\omega] = \sum_a a \Pr[X = a].$$

Definition 2. Two events $A, B \subseteq \Omega$ are independent if $\Pr[A \cap B] = \Pr[A] \Pr[B]$. Two random variables $X, Y$ are independent if the events $X = a$ and $Y = b$ are independent for any choice of $a, b$.

Lemma 1. For independent random variables $X, Y$, we have $E[XY] = E[X] E[Y]$.

Proof.
$$E[XY] = \sum_{\omega \in \Omega} X(\omega) Y(\omega) \Pr[\omega] = \sum_{a,b} ab \Pr[X = a, Y = b] = \sum_a a \Pr[X = a] \sum_b b \Pr[Y = b] = E[X] E[Y].$$

The two most elementary tools that we will use are the following.
1.1 The union bound

Lemma 2. For any collection of events $A_1, \ldots, A_n$,
$$\Pr[A_1 \cup A_2 \cup \ldots \cup A_n] \le \sum_{i=1}^n \Pr[A_i].$$
Equality holds if the events $A_i$ are disjoint.

This is obviously true by the properties of a measure. The bound is very general, since we do not need to assume anything about the independence of $A_1, \ldots, A_n$.

1.2 Linearity of expectation

Lemma 3. For any collection of random variables $X_1, \ldots, X_n$,
$$E[X_1 + X_2 + \ldots + X_n] = \sum_{i=1}^n E[X_i].$$
Again, we do not need to assume anything about the independence of $X_1, \ldots, X_n$.

Proof.
$$E\Big[\sum_{i=1}^n X_i\Big] = \sum_{\omega \in \Omega} \sum_{i=1}^n X_i(\omega) \Pr[\omega] = \sum_{i=1}^n \sum_{\omega \in \Omega} X_i(\omega) \Pr[\omega] = \sum_{i=1}^n E[X_i].$$

2 2-colorability of hypergraphs

Our first application is the question of 2-colorability of hypergraphs. We call a hypergraph 2-colorable if its vertices can be assigned 2 colors so that every hyperedge contains both colors. An example which is not 2-colorable is the complete $r$-uniform hypergraph on $2r-1$ vertices, $K^{(r)}_{2r-1}$. This is certainly not 2-colorable, because for any 2-coloring there is a set of $r$ vertices of the same color. The number of hyperedges here is $\binom{2r-1}{r} \simeq 4^r / \sqrt{r}$. A natural question is whether a number of edges exponential in $r$ is necessary to make a hypergraph non-2-colorable. The probabilistic method shows easily that this is true.

Theorem 1. Any $r$-uniform hypergraph with fewer than $2^{r-1}$ hyperedges is 2-colorable.

Proof. Consider a random coloring, where every vertex is colored independently red/blue with probability 1/2. For each hyperedge $e$, the probability that $e$ is monochromatic is $2/2^r$. By the union bound,
$$\Pr[\exists \text{ monochromatic edge}] \le \sum_{e \in E} \frac{2}{2^r} = \frac{2|E|}{2^r} < 1$$
by our assumption that $|E| < 2^{r-1}$. If every coloring contained a monochromatic edge, this probability would be 1; therefore, at least one coloring has no monochromatic edge, and the hypergraph is 2-colorable.
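The proof is effectively an algorithm: since the failure probability of a random coloring is below 1, repeatedly trying random colorings finds a proper 2-coloring quickly. A small sketch (the particular hypergraph is a random example, not from the text):

```python
import random

random.seed(0)
r, n = 4, 12
# 7 hyperedges < 2^(r-1) = 8, so Theorem 1 guarantees 2-colorability.
edges = [random.sample(range(n), r) for _ in range(7)]

# Each attempt fails with probability at most 2*|E|/2^r = 7/8,
# so this loop terminates after a few iterations in expectation.
while True:
    color = [random.randrange(2) for _ in range(n)]
    if all(len({color[v] for v in e}) == 2 for e in edges):
        break
```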
3 A tournament paradox

A tournament is a directed graph with an arrow in exactly one direction between each pair of vertices. A tournament can represent the outcome of a competition where exactly one game is played between every pair of teams. A natural notion of $k$ winning teams would be a set of $k$ teams such that no other team beats all of them. Unfortunately, such a notion can be ill-defined, for any value of $k$.

Theorem 2. For any $k \ge 1$, there exists a tournament $T$ such that for every set $B$ of $k$ vertices, there exists another vertex $x$ such that $x \to y$ for all $y \in B$.

Proof. We can assume $k$ sufficiently large, because the theorem only gets stronger for larger $k$. Given $k$, we set $n = k + k^2 2^k$ and consider a uniformly random tournament on $n$ vertices; that is, we select an arrow $x \to y$ or $y \to x$ at random for each pair of vertices $x, y$. First let us fix a set of vertices $B$, $|B| = k$, and analyze the event that no other vertex beats all the vertices in $B$. For each particular vertex $x$,
$$\Pr[\forall y \in B;\ x \to y] = \frac{1}{2^k}$$
and by taking the complement,
$$\Pr[\exists y \in B;\ y \to x] = 1 - \frac{1}{2^k}.$$
Since these events are independent for different vertices $x \in V \setminus B$, we conclude that
$$\Pr[\forall x \in V \setminus B;\ \exists y \in B;\ y \to x] = (1 - 2^{-k})^{n-k} = (1 - 2^{-k})^{k^2 2^k} \le e^{-k^2}.$$
By the union bound over all potential sets $B$, using $\binom{n}{k} \le n^k / k! \le (k^2 2^k)^k$ for $k$ large enough,
$$\Pr[\exists B;\ |B| = k;\ \forall x \in V \setminus B;\ \exists y \in B;\ y \to x] \le \binom{n}{k} e^{-k^2} \le (k^2 2^k)^k e^{-k^2} = k^{2k} \Big(\frac{2}{e}\Big)^{k^2}.$$
For $k$ sufficiently large, this is less than 1, and hence there exists a tournament for which the respective event is false. In other words, $\forall B;\ |B| = k;\ \exists x \in V \setminus B;\ \forall y \in B;\ x \to y$.

It is known that $k^2 2^k$ is quite close to the optimal size of a tournament satisfying this property; more precisely, $c k 2^k$ vertices for some $c > 0$ are known to be insufficient.

4 Sum-free sets

Our third application is a statement about sum-free sets, that is, sets of integers $B$ such that if $x, y \in B$ then $x + y \notin B$.
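Returning briefly to the tournament theorem: for $k = 2$ the phenomenon can be witnessed by a concrete small tournament. A standard example (an assumption used only for this illustration, not taken from the text) is the Paley tournament on 7 vertices, where $x$ beats $y$ iff $y - x$ is a nonzero quadratic residue mod 7; its defining property is easy to verify by brute force:

```python
from itertools import combinations

QR = {1, 2, 4}                       # nonzero quadratic residues mod 7
def beats(x, y):
    return (y - x) % 7 in QR

# For every pair of teams B, some third team beats both of them.
for B in combinations(range(7), 2):
    assert any(all(beats(x, y) for y in B)
               for x in range(7) if x not in B)
```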
A question that we investigate here is: how many elements can we pick from an arbitrary set $A$ of $n$ integers so that they form a sum-free set? As an example, consider $A = [2n]$. We can certainly pick $B = \{n+1, n+2, \ldots, 2n\}$, which is a sum-free set of size $\frac12 |A|$. Perhaps this is not possible for every $A$, but we can prove the following.

Theorem 3. For any set of nonzero integers $A$, there is a sum-free subset $B \subseteq A$ of size $|B| \ge \frac13 |A|$.

Proof. We proceed by reducing the problem to a finite field $\mathbb{Z}_p$. We choose a prime $p$ large enough that $|a| < p$ for all $a \in A$. We observe that in $\mathbb{Z}_p$ (with addition modulo $p$), there is a sum-free set $S = \{\lceil p/3 \rceil, \ldots, \lfloor 2p/3 \rfloor\}$, which has size $|S| \ge \frac13 (p-1)$. We choose a subset of $A$ as follows. Pick a random element $x \in \mathbb{Z}_p^* = \mathbb{Z}_p \setminus \{0\}$, and let $A_x = \{a \in A : (ax \bmod p) \in S\}$. Note that $A_x$ is sum-free: for any $a, b \in A_x$, we have $(ax \bmod p), (bx \bmod p) \in S$, hence $(ax + bx \bmod p) \notin S$ and $a + b \notin A_x$. It remains to show that $A_x$ is large for some $x \in \mathbb{Z}_p^*$. We have
$$E[|A_x|] = \sum_{a \in A} \Pr[a \in A_x] = \sum_{a \in A} \Pr[(ax \bmod p) \in S] \ge \frac13 |A|,$$
because $\Pr[(ax \bmod p) \in S] = |S|/(p-1) \ge \frac13$ for any fixed $a \ne 0$. This implies that there is a value of $x$ for which $|A_x| \ge \frac13 |A|$.

Math 184: Combinatorics
Lecture 12: Extremal results on finite sets
Instructor: Benny Sudakov

1 Largest antichains

Suppose we are given a family $\mathcal{F}$ of subsets of $[n]$. We call $\mathcal{F}$ an antichain if there are no two sets $A, B \in \mathcal{F}$ with $A \subset B$. For example, $\mathcal{F} = \{S \subseteq [n] : |S| = k\}$ is an antichain of size $\binom{n}{k}$. How large can an antichain be? The choice $k = \lfloor n/2 \rfloor$ gives an antichain of size $\binom{n}{\lfloor n/2 \rfloor}$. In 1928, Emanuel Sperner proved that this is the largest possible antichain. In fact, we prove a slightly stronger statement.

Theorem 1 (Sperner's theorem). For any antichain $\mathcal{F} \subseteq 2^{[n]}$,
$$\sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}} \le 1.$$
Since $\binom{n}{|A|} \le \binom{n}{\lfloor n/2 \rfloor}$ for any $A \subseteq [n]$, we conclude that $|\mathcal{F}| \le \binom{n}{\lfloor n/2 \rfloor}$.

Proof. We present a very short proof due to Lubell.
Consider a random permutation $\pi : [n] \to [n]$. We compute the probability of the event that a prefix $\{\pi_1, \ldots, \pi_k\}$ of this permutation is in $\mathcal{F}$ for some $k$. Note that this can happen for at most one value of $k$, since otherwise $\mathcal{F}$ would not be an antichain. For each particular set $A \in \mathcal{F}$, the probability that $A = \{\pi_1, \ldots, \pi_{|A|}\}$ is $|A|!(n-|A|)!/n!$, corresponding to all possible orderings of $A$ and $[n] \setminus A$. By the antichain property, these events for different sets $A \in \mathcal{F}$ are disjoint, and hence
$$\Pr[\exists A \in \mathcal{F};\ A = \{\pi_1, \ldots, \pi_{|A|}\}] = \sum_{A \in \mathcal{F}} \Pr[A = \{\pi_1, \ldots, \pi_{|A|}\}] = \sum_{A \in \mathcal{F}} \frac{|A|!(n-|A|)!}{n!} = \sum_{A \in \mathcal{F}} \frac{1}{\binom{n}{|A|}}.$$
The fact that any probability is at most 1 concludes the proof.

This has the following application. We note that the theorem below actually holds for arbitrary vectors and any ball of radius 1, but we stick to the 1-dimensional case for simplicity.

Theorem 2. Let $a_1, a_2, \ldots, a_n$ be real numbers of absolute value $|a_i| \ge 1$. Consider the $2^n$ linear combinations $\sum_{i=1}^n \epsilon_i a_i$, $\epsilon_i \in \{-1, +1\}$. Then the number of these sums that lie in any interval $(x-1, x+1)$ is at most $\binom{n}{\lfloor n/2 \rfloor}$.

An interpretation of this theorem: for a random walk on the real line where the $i$-th step is either $+a_i$ or $-a_i$ at random, the probability that after $n$ steps we end up in some fixed interval $(x-1, x+1)$ is at most $\binom{n}{\lfloor n/2 \rfloor}/2^n = O(1/\sqrt{n})$.

Proof. We can assume that $a_i \ge 1$ (otherwise flip the sign of $a_i$). For $\epsilon \in \{-1,+1\}^n$, let $I_\epsilon = \{i \in [n] : \epsilon_i = +1\}$. If $I_\epsilon \subset I_{\epsilon'}$, then
$$\sum_{i=1}^n \epsilon'_i a_i - \sum_{i=1}^n \epsilon_i a_i = 2 \sum_{i \in I_{\epsilon'} \setminus I_\epsilon} a_i \ge 2 |I_{\epsilon'} \setminus I_\epsilon| \ge 2.$$
Therefore, if $I_\epsilon$ is a proper subset of $I_{\epsilon'}$, only one of the two corresponding sums can lie inside $(x-1, x+1)$. Consequently, the sums inside $(x-1, x+1)$ correspond to an antichain, and there can be at most $\binom{n}{\lfloor n/2 \rfloor}$ such sums.

Theorem 3 (Bollobás, 1965). If $A_1, \ldots, A_m$ and $B_1, \ldots, B_m$ are two sequences of sets such that $A_i \cap B_j = \emptyset$ if and only if $i = j$, then
$$\sum_{i=1}^m \binom{|A_i| + |B_i|}{|A_i|}^{-1} \le 1.$$
Note that if $A_1, \ldots$
, $A_m$ is an antichain on $[n]$ and we set $B_i = [n] \setminus A_i$, we get a system of sets satisfying the conditions above. Therefore this is a generalization of Sperner's theorem.

Proof. Suppose that $A_i, B_i \subseteq [n]$ for some $n$. Again, we consider a random permutation $\pi : [n] \to [n]$. Here we look at the event that there is some pair $(A_i, B_i)$ such that $\pi(A_i) < \pi(B_i)$, in the sense that $\pi(a) < \pi(b)$ for all $a \in A_i$, $b \in B_i$. For each particular pair $(A_i, B_i)$, the probability of this event is $|A_i|!\,|B_i|!/(|A_i|+|B_i|)!$. On the other hand, suppose that $\pi(A_i) < \pi(B_i)$ and $\pi(A_j) < \pi(B_j)$ for $i \ne j$. Then there are points $x_i, x_j$ such that the two pairs are separated by $x_i$ and $x_j$, respectively. Depending on the relative order of $x_i, x_j$, we get either $A_i \cap B_j = \emptyset$ or $A_j \cap B_i = \emptyset$, which contradicts our assumptions. Therefore, the events for different pairs $(A_i, B_i)$ are disjoint. We conclude that
$$\Pr[\exists i;\ \pi(A_i) < \pi(B_i)] = \sum_{i=1}^m \frac{|A_i|!\,|B_i|!}{(|A_i|+|B_i|)!} = \sum_{i=1}^m \binom{|A_i|+|B_i|}{|A_i|}^{-1} \le 1.$$

This theorem has an application in the following setting. For a collection of sets $\mathcal{F} \subseteq 2^X$, we call $T \subseteq X$ a transversal of $\mathcal{F}$ if $A \cap T \ne \emptyset$ for all $A \in \mathcal{F}$. One question is: what is the smallest transversal for a given collection $\mathcal{F}$? We denote the size of the smallest transversal by $\tau(\mathcal{F})$. A set system $\mathcal{F}$ is called $\tau$-critical if removing any member of $\mathcal{F}$ decreases $\tau(\mathcal{F})$. An example of a $\tau$-critical system is the collection $\mathcal{F} = \binom{[k+\ell]}{k}$ of all subsets of size $k$ out of $k+\ell$ elements. The smallest transversal has size $\ell+1$: any set of size $\ell+1$ intersects every member of $\mathcal{F}$, whereas no set of size $\ell$ is a transversal, since its complement is a member of $\mathcal{F}$. Moreover, removing any set $A \in \mathcal{F}$ decreases $\tau(\mathcal{F})$ to $\ell$, because then $\bar{A}$ is a transversal of $\mathcal{F} \setminus \{A\}$. This is an example of a $\tau$-critical system of size $\binom{k+\ell}{k}$, where $\tau(\mathcal{F}) = \ell+1$ and $|A| = k$ for all $A \in \mathcal{F}$. Observe that if $\mathcal{F} = \{A_1, A_2,$
$\ldots, A_n\}$ is $\tau$-critical and $\tau(\mathcal{F}) = \ell+1$, then for each $i$ there is a transversal $B_i$ of $\mathcal{F} \setminus \{A_i\}$ with $|B_i| = \ell$, which intersects each $A_j$, $j \ne i$. However, $B_i$ does not intersect $A_i$, otherwise it would also be a transversal of $\mathcal{F}$. Therefore, Theorem 3 implies the following.

Theorem 4. Suppose $\mathcal{F}$ is a $\tau$-critical system, where $\tau(\mathcal{F}) = \ell+1$ and each $A \in \mathcal{F}$ has size $k$. Then
$$|\mathcal{F}| \le \binom{k+\ell}{k}.$$

Exercise. If every 2 edges in an $r$-uniform hypergraph can be covered by $s$ points, then all edges can be covered by $\binom{r+s}{r}$ points.

2 Intersecting families

Here we consider a different type of family of subsets. We call $\mathcal{F} \subseteq 2^{[n]}$ intersecting if $A \cap B \ne \emptyset$ for any $A, B \in \mathcal{F}$. The question of the largest such family is quite easy: for any set $A$, we can take only one of $A$ and $[n] \setminus A$. Conversely, we can take exactly one set from each such pair, for example all the sets containing the element 1. Hence, the largest intersecting family of subsets of $[n]$ has size exactly $2^{n-1}$. A more interesting question is: how large can an intersecting family of sets of size $k$ be? We assume $k \le n/2$, since otherwise we can take all $k$-sets.

Theorem 5 (Erdős-Ko-Rado). For any $k \le n/2$, the largest size of an intersecting family of subsets of $[n]$ of size $k$ is $\binom{n-1}{k-1}$.

Observe that an intersecting family of size $\binom{n-1}{k-1}$ can be constructed by taking all $k$-sets containing the element 1. To prove the upper bound, we use an elegant argument of Katona. First, we prove the following lemma.

Lemma 1. Consider a circle divided into $n$ intervals by $n$ points. Let $k \le n/2$. Suppose we have "arcs" $A_1, \ldots, A_t$, each $A_i$ consisting of $k$ successive intervals of the circle, and each pair of arcs overlapping in at least one interval. Then $t \le k$.

Proof. No point $x$ can be an endpoint of two arcs of the family: they would either be the same arc, or two arcs extending from $x$ in opposite directions, and since $k \le n/2$, the latter two do not share any interval. Now fix an arc $A_1$. Every other arc must intersect $A_1$, hence it must have an endpoint at one of the $k-1$ points inside $A_1$.
Each such point can be the endpoint of at most one arc of the family, hence $t \le 1 + (k-1) = k$.

Now we proceed with the proof of the Erdős-Ko-Rado theorem.

Proof. Let $\mathcal{F}$ be an intersecting family of sets of size $k$. Consider a random permutation $\pi : [n] \to [n]$. We consider each set $A \in \mathcal{F}$ mapped onto the circle as above, by associating $\pi(A)$ with the respective set of intervals on the circle. Let $X$ be the number of sets $A \in \mathcal{F}$ mapped onto contiguous arcs $\pi(A)$ on the circle. For each set $A \in \mathcal{F}$, the probability that $\pi(A)$ is a contiguous arc is $n \cdot k!(n-k)!/n! = n/\binom{n}{k}$. Therefore,
$$E[X] = \sum_{A \in \mathcal{F}} \Pr[\pi(A) \text{ is contiguous}] = \frac{n}{\binom{n}{k}} |\mathcal{F}|.$$
On the other hand, we know by our lemma that $\pi(A)$ can be contiguous for at most $k$ sets at the same time, because $\mathcal{F}$ is an intersecting family. Therefore, $E[X] \le k$. From these two bounds, we obtain
$$|\mathcal{F}| \le \frac{k}{n} \binom{n}{k} = \binom{n-1}{k-1}.$$

Math 184: Combinatorics
Lecture 13: Graphs of high girth and high chromatic number
Instructor: Benny Sudakov

1 Markov's inequality

Another simple tool that is often useful is Markov's inequality, which bounds the probability that a random variable $X$ is too large, in terms of the expectation $E[X]$.

Lemma 1. Let $X$ be a nonnegative random variable and $t > 0$. Then
$$\Pr[X \ge t] \le \frac{E[X]}{t}.$$

Proof.
$$E[X] = \sum_a a \Pr[X = a] \ge \sum_{a \ge t} t \Pr[X = a] = t \Pr[X \ge t].$$

Working with expectations is usually easier than working directly with probabilities or more complicated quantities such as the variance. Recall that $E[X_1 + X_2 + \ldots + X_n] = \sum_{i=1}^n E[X_i]$ for any collection of random variables.

2 Graphs of high girth and high chromatic number

We return to the notion of the chromatic number $\chi(G)$. Observe that for a graph that does not contain any cycle, $\chi(G) \le 2$, because every component is a tree, which can easily be colored with 2 colors. More generally, consider graphs of girth $\ell$, which means that the length of the shortest cycle is $\ell$. If $\ell$ is large, then starting from any vertex, the graph looks like a tree within distance $\ell/2 - 1$.
One might expect that such graphs can also be colored using a small number of colors, since locally they can be colored using 2 colors. However, this is far from true, as shown by a classical application of the probabilistic method.

Theorem 1. For any $k$ and $\ell$, there is a graph $G$ of chromatic number $\chi(G) > k$ and girth $g(G) > \ell$.

Proof. We start by generating a random graph $G_{n,p}$, where each edge appears independently with probability $p$. We fix a value $\lambda \in (0, 1/\ell)$ and set $p = n^{\lambda - 1}$. Let $X$ be the number of cycles of length at most $\ell$ in $G_{n,p}$. The number of potential cycles of length $j$ is certainly at most $n^j$, and each of them appears with probability $p^j$; therefore
$$E[X] \le \sum_{j=3}^{\ell} n^j p^j = \sum_{j=3}^{\ell} n^{\lambda j} \le \frac{n^{\lambda \ell}}{1 - n^{-\lambda}}.$$
Because $\lambda \ell < 1$, this is less than $n/4$ for $n$ sufficiently large. By Markov's inequality, $\Pr[X \ge n/2] \le 1/2$. Note that we are not able to prove that there are no short cycles at all in $G_{n,p}$, but we will deal with this later.

Now let us consider the chromatic number of $G_{n,p}$. Rather than the chromatic number $\chi(G)$ itself, we analyze the independence number $\alpha(G)$, i.e. the size of the largest independent set in $G$. Since every color class forms an independent set, it is easy to see that $\chi(G) \ge |V(G)|/\alpha(G)$. We set $a = \lceil \frac{3}{p} \ln n \rceil$ and consider the event that there is an independent set of size $a$. By the union bound, we get
$$\Pr[\alpha(G) \ge a] \le \binom{n}{a} (1-p)^{\binom{a}{2}} \le n^a e^{-pa(a-1)/2} \le n^a n^{-3(a-1)/2} \to 0.$$
For $n$ sufficiently large, this probability is less than 1/2. Hence, again by the union bound, we get
$$\Pr[X \ge n/2 \text{ or } \alpha(G) \ge a] < 1.$$
Therefore there is a graph in which the number of short cycles is $X < n/2$ and the independence number is $\alpha(G) < a$. We can just delete one vertex from each short cycle arbitrarily, obtaining a graph $G'$ on at least $n/2$ vertices which has no cycle of length at most $\ell$, and $\alpha(G') < a$. The chromatic number of this graph is
$$\chi(G') \ge \frac{|V(G')|}{\alpha(G')} \ge \frac{n/2}{3 n^{1-\lambda} \ln n} = \frac{n^{\lambda}}{6 \ln n}.$$
By taking $n$ sufficiently large, we get $\chi(G') > k$.
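The deletion step at the end of the proof is easy to carry out in code. A sketch for $\ell = 4$, with brute-force cycle enumeration (the parameters are illustrative assumptions, and the chromatic-number part of the argument is omitted):

```python
import random
from itertools import combinations

random.seed(1)
n, lam = 60, 0.2
p = n ** (lam - 1)                       # edge probability p = n^(lambda-1)
adj = {v: set() for v in range(n)}
for u, v in combinations(range(n), 2):
    if random.random() < p:
        adj[u].add(v); adj[v].add(u)

def short_cycles(adj):
    """Vertex sets of all cycles of length 3 or 4 (brute force)."""
    cyc = [{a, b, c} for a, b, c in combinations(adj, 3)
           if b in adj[a] and c in adj[b] and a in adj[c]]
    for a, b, c, d in combinations(adj, 4):
        # the 3 distinct cyclic orderings of a 4-element vertex set
        for w, x, y, z in [(a, b, c, d), (a, b, d, c), (a, c, b, d)]:
            if x in adj[w] and y in adj[x] and z in adj[y] and w in adj[z]:
                cyc.append({w, x, y, z})
    return cyc

# Delete one (arbitrary) vertex of each short cycle; since every short
# cycle loses a vertex, the remaining graph has girth greater than 4.
doomed = {min(C) for C in short_cycles(adj)}
adj = {v: adj[v] - doomed for v in adj if v not in doomed}
assert not short_cycles(adj)
```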
We should mention that constructing such graphs explicitly is not easy. We present a quite simple construction of triangle-free graphs of high chromatic number (known as the Mycielski construction).

Proposition 1. Let $G_2$ be a graph consisting of a single edge. Given $G_n = (V, E)$, construct $G_{n+1}$ as follows. The new vertex set is $V \cup V' \cup \{z\}$, where $V'$ is a copy of $V$ and $z$ is a single new vertex. $G_{n+1}[V]$ is isomorphic to $G_n$. Each vertex $v' \in V'$ which is a copy of $v \in V$ is connected by edges to all vertices $w \in V$ such that $(v, w) \in E$. We also connect each $v' \in V'$ to the new vertex $z$. Then $G_n$ is triangle-free and $\chi(G_n) = n$.

Proof. The base case $n = 2$ is trivial. Assuming that $G_n$ is triangle-free, it is easy to see that $G_{n+1}$ is triangle-free as well. Any triangle would have to use one vertex from $V'$ and two vertices from $V$, because there are no edges inside $V'$. However, by the construction of $G_{n+1}$, this would also imply a triangle in $G_n$, which is a contradiction.

Finally, we deal with the chromatic number. We assume $\chi(G_n) = n$ by induction. Note that it is possible to color $V'$ in the same way as $V$ and then assign a new color to $z$, hence $\chi(G_{n+1}) \le n+1$. We claim that this is essentially the best way to color $G_{n+1}$. Consider any $n$-coloring of $V$. For each color $c$, there is a vertex $v_c$ of color $c$ which is adjacent to vertices of all the other colors; otherwise, we could re-color all such vertices of color $c$ with colors missing in their neighborhoods and decrease the number of colors used on $G_n$. The copy $v'_c \in V'$ has the same neighbors in $V$ as $v_c$, so it is also adjacent to vertices of all colors different from $c$. Hence, if we want to color $G_{n+1}$ using $n$ colors, we must use color $c$ for $v'_c$. But then $V'$ uses all $n$ colors, and $z$ cannot use any of them. Therefore, $\chi(G_{n+1}) = n+1$.

Math 184: Combinatorics
Lecture 14: Topological methods
Instructor: Benny Sudakov

1 The Borsuk-Ulam theorem

We have seen how combinatorics borrows from probability theory. Another area which has been very beneficial to combinatorics, perhaps even more surprisingly, is topology.
We have already seen Brouwer's fixed point theorem and its combinatorial proof.

Theorem 1 (Brouwer). For any continuous function $f : B^n \to B^n$, there is a point $x \in B^n$ such that $f(x) = x$.

A more powerful topological tool, which seems to stand at the root of most combinatorial applications, is a somewhat related result that can be stated as follows. Here, $S^n$ denotes the $n$-dimensional sphere, i.e. the surface of the $(n+1)$-dimensional ball $B^{n+1}$.

Theorem 2 (Borsuk-Ulam). For any continuous function $f : S^n \to \mathbb{R}^n$, there is a point $x \in S^n$ such that $f(x) = f(-x)$.

There are many different proofs of this theorem, some of them elementary and some using a certain amount of the machinery of algebraic topology. All the proofs are, however, more involved than the proof of Brouwer's theorem, and we will not give one here. In the following, we use a corollary (in fact an equivalent restatement of the Borsuk-Ulam theorem).

Theorem 3. For any covering of $S^n$ by $n+1$ open or closed sets $A_0, \ldots, A_n$, there is a set $A_i$ which contains two antipodal points $x, -x$.

Let us give some intuition for how this is related to Theorem 2. For now, assume that all the sets $A_i$ are closed. (The extension to open sets is a technicality, but the idea is the same.) We define a continuous function $f : S^n \to \mathbb{R}^n$,
$$f(x) = (\mathrm{dist}(x, A_1), \mathrm{dist}(x, A_2), \ldots, \mathrm{dist}(x, A_n)),$$
where $\mathrm{dist}(x, A) = \inf_{y \in A} \|x - y\|$ is the distance of $x$ from $A$. By Theorem 2, there is a point $x \in S^n$ such that $f(x) = f(-x)$. This means that $\mathrm{dist}(x, A_i) = \mathrm{dist}(-x, A_i)$ for $1 \le i \le n$. If $\mathrm{dist}(x, A_i) = 0$ for some $i$, then we are done, since $A_i$ is closed. If $\mathrm{dist}(x, A_i) = \mathrm{dist}(-x, A_i) \ne 0$ for all $i \in \{1, \ldots, n\}$, it means that $x, -x \notin A_1 \cup \ldots \cup A_n$. But then $x, -x \in A_0$.

2 Kneser graphs

Similarly to the previous sections, Kneser graphs are derived from the intersection pattern of a collection of sets.
More precisely, the vertex set of a Kneser graph consists of all $k$-sets of a given ground set, and two $k$-sets form an edge if they are disjoint.

Definition 1. The Kneser graph on the ground set $[n]$ is
$$KG_{n,k} = \Big( \binom{[n]}{k},\ \{(A, B) : |A| = |B| = k,\ A \cap B = \emptyset\} \Big).$$

Thus, a maximum independent set in $KG_{n,k}$ is equivalent to a maximum intersecting family of $k$-sets; by the Erdős-Ko-Rado theorem, $\alpha(KG_{n,k}) = \binom{n-1}{k-1} = \frac{k}{n} |V|$ for $k \le n/2$. A maximum clique in $KG_{n,k}$ is equivalent to a maximum collection of pairwise disjoint $k$-sets, i.e. $\omega(KG_{n,k}) = \lfloor n/k \rfloor$. Another natural question is: what is the chromatic number of $KG_{n,k}$? Note that for $n = 3k-1$, the Kneser graph does not contain any triangle, and also $\alpha(KG_{3k-1,k}) = \frac{k}{3k-1} |V| \approx \frac13 |V|$. Yet, we will show that the chromatic number $\chi(KG_{n,k})$ grows with $n$. Therefore, these graphs give another example of triangle-free graphs of high chromatic number.

Theorem 4 (Lovász-Kneser). For all $k > 0$ and $n \ge 2k-1$, $\chi(KG_{n,k}) = n - 2k + 2$.

Proof. First, we show that $KG_{n,k}$ can be colored using $n-2k+2$ colors. This means assigning colors to $k$-sets so that all $k$-sets of the same color intersect. This is easy to achieve: color each $k$-set by its element which is as close to $n/2$ as possible. Since every $k$-set has an element between $k$ and $n-k+1$, we use at most $n-2k+2$ colors, and all $k$-sets of a given color intersect (they share the element that serves as their color).

The proof that $n-2k+1$ colors are not enough is more interesting. Let $d = n-2k+1$ and assume that $KG_{n,k}$ is colored using $d$ colors. Let $X$ be a set of $n$ points on $S^d$ in general position (no $d+1$ points lie on a $d$-dimensional hyperplane through the origin). Each subset $A \in \binom{X}{k}$ corresponds to a vertex of $KG_{n,k}$, which is colored with one of the $d$ colors. Let $\mathcal{A}_i$ be the collection of $k$-sets of color $i$. We define sets $U_1, \ldots, U_d \subseteq S^d$ as follows: $x \in U_i$ if there exists $A \in \mathcal{A}_i$ such that $x \cdot y > 0$ for all $y \in A$. In other words, $x \in U_i$ if some $k$-set of color $i$ lies in the open hemisphere whose pole is $x$.
Finally, we define $U_0 = S^d \setminus (U_1 \cup U_2 \cup \ldots \cup U_d)$. It is easy to see that the sets $U_1, \ldots, U_d$ are open and $U_0$ is closed. By Theorem 3, there is a set $U_i$ and two antipodes $x, -x \in U_i$. If this happens for $i = 0$, then we have two antipodes $x, -x$ which are not contained in any $U_i$, $i > 0$; this means that both open hemispheres (with poles $x$ and $-x$) contain fewer than $k$ points of $X$, but then at least $n - 2(k-1) = d+1$ points must be contained in the "equator" between the two hemispheres, contradicting the general position of $X$. Therefore, $x, -x \in U_i$ for some $i > 0$, which means we have two $k$-sets of color $i$ lying in opposite open hemispheres. These sets are disjoint and hence form an edge in $KG_{n,k}$, which is a contradiction.

3 Dolnikov's theorem

The Kneser graph can be defined naturally for any set system $\mathcal{F}$: two sets form an edge if they are disjoint. We denote this graph by $KG(\mathcal{F})$:
$$KG(\mathcal{F}) = \big( \mathcal{F},\ \{(A, B) : A, B \in \mathcal{F},\ A \cap B = \emptyset\} \big).$$
We derive a bound on the chromatic number of $KG(\mathcal{F})$ which generalizes Theorem 4. For this purpose, we need the notion of 2-colorability defect.

Definition 2. For a hypergraph (or set system) $\mathcal{F}$, the 2-colorability defect $cd_2(\mathcal{F})$ is the smallest number of vertices whose removal (together with all incident hyperedges) from $\mathcal{F}$ produces a 2-colorable hypergraph.

For example, if $H$ is the hypergraph of all $k$-sets on $n$ vertices, we need to remove $n-2k+2$ vertices, and then the remaining hypergraph of $k$-sets on $2k-2$ vertices is 2-colorable. (Note that the similar hypergraph on $2k-1$ vertices is not 2-colorable.) Thus, $cd_2(H) = n-2k+2$. Coincidentally, this is also the chromatic number of the corresponding Kneser graph. We prove the following.

Theorem 5 (Dolnikov). For any hypergraph (or set system) $\mathcal{F}$, $\chi(KG(\mathcal{F})) \ge cd_2(\mathcal{F})$.

We remark that equality does not always hold, and also that $cd_2(\mathcal{F})$ is not easy to determine for a given hypergraph. The connection between two very different coloring concepts is quite surprising, though. Our first proof follows the lines of the Lovász-Kneser theorem.
Proof. Let $d = \chi(KG(\mathcal{F}))$ and consider a coloring of $\mathcal{F}$ by $d$ colors. Again, we identify the ground set of $\mathcal{F}$ with a set of points $X \subset S^d$ in general position, with no $d+1$ points on the same hyperplane through the origin. We define $U_i \subseteq S^d$ by letting $x \in U_i$ iff some set $F \in \mathcal{F}$ of color $i$ is contained in the open hemisphere $H(x) = \{y \in S^d : x \cdot y > 0\}$. Also, we set $U_0 = S^d \setminus (U_1 \cup \ldots \cup U_d)$. By Theorem 3, there is a set $U_i$ containing two antipodal points $x, -x$. This cannot happen for $i \ge 1$, because then there would be two sets $F, F' \in \mathcal{F}$ of color $i$ such that $F \subset H(x)$ and $F' \subset H(-x)$; this would imply $F \cap F' = \emptyset$, contradicting the coloring property of the Kneser graph $KG(\mathcal{F})$. Therefore, there are two antipodal points $x, -x \in U_0$. This implies that there is no set $F \in \mathcal{F}$ contained in either hemisphere $H(x)$ or $H(-x)$. By removing the points on the equator between $H(x)$ and $H(-x)$, whose number is at most $d = \chi(KG(\mathcal{F}))$ by general position, and also removing all the sets of $\mathcal{F}$ containing them, we obtain a hypergraph $\mathcal{F}'$ in which every set touches both hemispheres $H(x), H(-x)$. This hypergraph can be colored by 2 colors corresponding to the two hemispheres, so $cd_2(\mathcal{F}) \le d$.

Next, we present Dolnikov's original proof, which is longer but perhaps more intuitive. It relies on the following geometric lemma, which follows from the Borsuk-Ulam theorem.

Lemma 1. Let $C_1, C_2, \ldots, C_d$ be families of convex bounded sets in $\mathbb{R}^d$. Suppose that each family $C_i$ is intersecting, i.e. $C \cap C' \ne \emptyset$ for all $C, C' \in C_i$. Then there is a hyperplane intersecting all the sets in $\bigcup_{i=1}^d C_i$.

Proof. Consider a vector $v \in S^{d-1}$, which defines a line $L_v = \{\alpha v : \alpha \in \mathbb{R}\}$ in $\mathbb{R}^d$. For each family $C_i$, we consider its projection onto $L_v$: formally, for each $C \in C_i$ we consider $P(C, v) = \{x \cdot v : x \in C\}$. Since each $C$ is a convex bounded set, $P(C, v)$ is a bounded interval. We have $C \cap C' \ne \emptyset$ for all $C, C' \in C_i$, and therefore all the intervals $P(C, v)$, $C \in C_i$, are pairwise intersecting as well.
Hence the intersection of all these intervals, $\bigcap_{C \in C_i} P(C, v)$, is a nonempty bounded interval as well. Let $f_i(v)$ denote the midpoint of $\bigcap_{C \in C_i} P(C, v)$. Then the hyperplane $H(v, \lambda) = \{x \in \mathbb{R}^d : x \cdot v = \lambda\}$ with $\lambda = f_i(v)$ intersects all the sets in $C_i$.

For each $1 \le i \le d-1$, define $g_i(v) = f_i(v) - f_d(v)$. Observe that $P(C, -v) = -P(C, v)$, hence $f_i(-v) = -f_i(v)$ and also $g_i(-v) = -g_i(v)$. By Theorem 2, applied to the map $(g_1, \ldots, g_{d-1}) : S^{d-1} \to \mathbb{R}^{d-1}$, there is a point $v \in S^{d-1}$ such that $g_i(v) = g_i(-v)$ for all $1 \le i \le d-1$. Since $g_i(-v) = -g_i(v)$, this implies that in fact $g_i(v) = 0$. In other words, $f_i(v) = f_d(v) = \lambda$ for all $1 \le i \le d-1$. This means that the hyperplane $H(v, \lambda)$ intersects all the sets in $C_i$, for each $1 \le i \le d$.

Now we can give the second proof of Dolnikov's theorem.

Proof. We consider a coloring of the Kneser graph $KG(\mathcal{F})$ by $d$ colors. Denote by $\mathcal{F}_i$ the collection of sets in $\mathcal{F}$ corresponding to vertices of color $i$. We represent the ground set of $\mathcal{F}$ by a set of points $X \subset \mathbb{R}^d$ in general position. (Observe that in the first proof, we placed the points on $S^d \subset \mathbb{R}^{d+1}$.) Again, we assume that no $d+1$ points lie on the same hyperplane. We define $d$ families of convex sets: for every $i \in [d]$,
$$C_i = \{\mathrm{conv}(F) : F \in \mathcal{F}_i\}.$$
In other words, these are the polytopes corresponding to sets of color $i$. Within each family, all polytopes are pairwise intersecting, by the coloring property of $KG(\mathcal{F})$. Therefore by Lemma 1, there is a hyperplane $H$ intersecting all the polytopes in each $C_i$. Let $Y = H \cap X$ be the set of points lying exactly on the hyperplane; by general position, $|Y| \le d$. We remove $Y$ and all the sets containing some point of $Y$, and denote the remaining system by $\mathcal{F}'$. Each set $F' \in \mathcal{F}'$ must contain points on both sides of $H$, otherwise $\mathrm{conv}(F')$ would not be intersected by $H$. Therefore, coloring the open halfspaces on the two sides of $H$ with 2 colors, we obtain a valid 2-coloring of $\mathcal{F}'$, which shows $cd_2(\mathcal{F}) \le d$.
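Small cases of the Lovász-Kneser theorem, and of the Erdős-Ko-Rado bound on the independence number, can be checked by brute force. For $n = 5$, $k = 2$ the Kneser graph $KG_{5,2}$ is the Petersen graph, with $\chi = 3 = n - 2k + 2$ and $\alpha = 4 = \binom{n-1}{k-1}$ (a sketch; the exhaustive search below is only feasible for such tiny parameters):

```python
from itertools import combinations, product

n, k = 5, 2
V = list(combinations(range(1, n + 1), k))   # KG(5,2): the Petersen graph
E = [(A, B) for A, B in combinations(V, 2) if not set(A) & set(B)]

def chromatic_number():
    # Smallest c admitting a proper coloring, by exhaustive search.
    for c in range(1, len(V) + 1):
        for col in product(range(c), repeat=len(V)):
            assign = dict(zip(V, col))
            if all(assign[A] != assign[B] for A, B in E):
                return c

# Largest independent set = largest intersecting family of 2-sets.
alpha = max(len(S) for r in range(len(V) + 1)
            for S in combinations(V, r)
            if all(set(A) & set(B) for A, B in combinations(S, 2)))

assert chromatic_number() == n - 2 * k + 2   # Lovász-Kneser: 3
assert alpha == 4                            # Erdős-Ko-Rado: C(4,1)
```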
Math 184: Combinatorics
Lecture 15: Applications of linear algebra
Instructor: Benny Sudakov

1 Linear algebra in combinatorics

After seeing how probability and topology can be useful in combinatorics, we are going to exploit an even more basic area of mathematics: linear algebra. While the probabilistic method is usually useful for constructing examples and proving lower bounds, a common application of linear algebra is to prove an upper bound, showing that a collection of objects satisfying certain properties cannot be too large. A typical argument replaces the objects by vectors in a linear space of a certain dimension and shows that the respective vectors are linearly independent; hence, there cannot be more of them than the dimension of the space.

2 Even and odd towns

We start with the following classical example. Suppose there is a town whose residents love forming different clubs. To limit the number of possible clubs, the town council establishes the following rules.

Even town.
• Every club must have an even number of members.
• Two clubs must not have exactly the same members.
• Every two clubs must share an even number of members.

How many clubs can be formed in such a town? We leave it as an exercise for the reader that there can be as many as $2^{n/2}$ clubs (for an even number of residents $n$). Thus, the town council reconvened and invited a mathematician to help with this problem. The mathematician suggested the following modified rules.

Odd/even town.
• Every club must have an odd number of members.
• Every two clubs must share an even number of members.

The residents soon found out that they were able to form only $n$ clubs under these rules, for example by each resident forming a separate club. In fact, the mathematician was able to prove that more than $n$ clubs are impossible to form.

Theorem 1. Let $\mathcal{F} \subseteq 2^{[n]}$ be such that $|A|$ is odd for every $A \in \mathcal{F}$ and $|A \cap B|$ is even for every distinct $A, B \in \mathcal{F}$. Then $|\mathcal{F}| \le n$.

Proof.
Consider the vector space $\mathbb{Z}_2^n$, where $\mathbb{Z}_2 = \{0, 1\}$ is the finite field with operations modulo 2. Represent each club $A \in \mathcal{F}$ by its incidence vector $\mathbf{1}_A \in \mathbb{Z}_2^n$, whose component $i$ equals 1 exactly if $i \in A$. We claim that these vectors are linearly independent. Suppose that $z = \sum_{A \in \mathcal{F}} \alpha_A \mathbf{1}_A = 0$. Fix any $B \in \mathcal{F}$ and consider the inner product $z \cdot \mathbf{1}_B = 0$. By the linearity of the inner product and the odd/even-town rules,
$$0 = z \cdot \mathbf{1}_B = \sum_{A \in \mathcal{F}} \alpha_A (\mathbf{1}_A \cdot \mathbf{1}_B) = \alpha_B,$$
all operations over $\mathbb{Z}_2$. We conclude that $\alpha_B = 0$ for all $B \in \mathcal{F}$. Therefore, the vectors $\{\mathbf{1}_A : A \in \mathcal{F}\}$ are linearly independent, and their number cannot exceed $n$, the dimension of $\mathbb{Z}_2^n$.

An alternative variant is an even/odd town, where the rules are reversed.

Even/odd town.
• Every club must have an even number of members.
• Every two clubs must share an odd number of members.

Exercise. By a simple reduction, any even/odd town with $n$ residents and $m$ clubs can be converted to an odd/even town with $n+1$ residents and $m$ clubs. This shows that there is no even/odd town with $n$ residents and $n+2$ clubs.

Theorem 2. Let $\mathcal{F} \subseteq 2^{[n]}$ be such that $|A|$ is even for every $A \in \mathcal{F}$ and $|A \cap B|$ is odd for every distinct $A, B \in \mathcal{F}$. Then $|\mathcal{F}| \le n$.

Proof. Assume for contradiction that $|\mathcal{F}| = n+1$. All calculations in the following are taken mod 2. The $n+1$ vectors $\{\mathbf{1}_A : A \in \mathcal{F}\}$ must be linearly dependent, i.e. $\sum_{A \in \mathcal{F}} \alpha_A \mathbf{1}_A = 0$ for some nontrivial linear combination. Note that $\mathbf{1}_A \cdot \mathbf{1}_B = 1$ for distinct $A, B \in \mathcal{F}$ and $\mathbf{1}_A \cdot \mathbf{1}_A = 0$ for any $A \in \mathcal{F}$. Therefore,
$$0 = \mathbf{1}_B \cdot \sum_{A \in \mathcal{F}} \alpha_A \mathbf{1}_A = \sum_{A \in \mathcal{F} : A \ne B} \alpha_A.$$
By subtracting these expressions for $B, B' \in \mathcal{F}$, we get $\alpha_B = \alpha_{B'}$. This means that all the coefficients $\alpha_B$ are equal, and in fact equal to 1 (otherwise the linear combination is trivial). We have proved that for any even/odd town with $n+1$ clubs, $\sum_{A \in \mathcal{F}} \mathbf{1}_A = 0$. Moreover, for any $B \in \mathcal{F}$,
$$0 = \mathbf{1}_B \cdot \sum_{A \in \mathcal{F}} \mathbf{1}_A = |\mathcal{F}| - 1 = n,$$
which means that $|\mathcal{F}|$ is odd and $n$ is even.

Now we use the following duality. Replace each set $A \in \mathcal{F}$ by its complement $\bar{A}$. Since the total
¯ ¯ ¯ number of elements n is even, we get |A| even and |A ∩ B| odd for any distinct A, B ∈ F. This means that the n + 1 P complementary clubs A¯ should also form an even/odd town and therefore again, we should have A∈F 1A¯ = 0. But then, 0= X A∈F 1A + X 1A¯ = |F|1 A∈F where 1 is the all-ones vector. This implies that |F| is even, contradicting our previous conclusion that |F| is odd. 2 3 Fisher’s inequality A slight modification of the odd-town rules is that every two clubs share a fixed number of members k (there is no condition here on the size of each club). We get a similar result here, which is known as Fisher’s inequality. Theorem 3 (Fisher’s inequality). Suppose that F ⊂ 2[n] is a family of nonempty clubs such that for some fixed k, |A ∩ B| = k for every distinct A, B ∈ F. Then |F| ≤ n. Proof. Again, we consider the incidence vectors {1A : A ∈ F}, this time P as vectors in the real vector space Rn . We have 1A · 1B = k for all A 6= B in F. Suppose that A∈F αA 1A = 0. Then ! ! X X X 0 = || αA 1A ||2 = αA 1A · αB 1B A∈F A∈F B∈F !2 = X A∈F 2 αA |A| + X αA αB k = k X αA A∈F A6=B∈F + X 2 αA (|A| − k). A∈F Note that |A| ≥ k, and at most one set A∗ can actually have size k. Therefore, theP contributions to the last expression are all nonnegative and αA = 0 except for |A∗ | = k. But then, A∈F αA = αA∗ and this must be zero as well. We have proved that the vectors {1A : A ∈ F} are linearly independent in Rn and hence their number can be at most n. Fisher’s inequality is related to the study of designs, set systems with special intersection patterns. We show here how such a system can be used to construct a graph on n vertices, which does not have any clique or independent set of size ω(n1/3 ). Recall that in a random graph, there are no cliques or independent sets significantly larger than log n; so this explicit construction is very weak in comparison. Lemma 1. 
For a fixed k, let G be the graph whose vertices are the triples T ∈ ([k] choose 3), with {A, B} an edge if |A ∩ B| = 1. Then G does not contain any clique or independent set of size more than k.

Proof. Suppose Q is a clique in G. This means we have a set of triples on [k] where each pair intersects in exactly one element. By Fisher's inequality, the number of such triples can be at most k.

Suppose S is an independent set in G. This is a set of triples on [k] where each pair intersects in an even number of elements, either 0 or 2. Each triple has odd size, so by the odd-town theorem, the number of such triples is again at most k.

Another application of Fisher's inequality is the following.

Lemma 2. Suppose P is a set of n points in the plane, not all on one line. Then the pairs of points from P define at least n distinct lines.

Proof. Let L be the set of lines defined by pairs of points from P. For each point x_i ∈ P, let A_i ⊆ L be the set of lines containing x_i. We have |A_i| ≥ 2; otherwise all points would lie on the same line. Also, the sets A_i are distinct: two distinct points cannot lie on the same set of at least 2 lines, since two lines share at most one point. Moreover, any two points share exactly one line, i.e. |A_i ∩ A_{i′}| = 1 for any i ≠ i′. By Fisher's inequality (applied to the sets A_i as subsets of L), we get |P| ≤ |L|.

Math 184: Combinatorics
Lecture 16: Linear algebra - continued
Instructor: Benny Sudakov

1 Spaces of polynomials

In the previous lecture, we considered objects represented by 0/1 vectors. A vector 1_A corresponding to a set A can also be viewed as a linear form f(x) = Σ_{i∈A} x_i. (All our proofs could be written equivalently in the language of linearly independent linear forms.) More generally, however, we can represent objects by polynomials f(x). Polynomials of a given degree form a vector space, and we can still apply the same arguments about dimension and linear independence. This gives us more flexibility and power compared to the linear case.

2 Two-distance sets

Consider a set of points A ⊂ R^n.
If all the pairwise distances between points in A are equal, then these are the vertices of a simplex, and the number of such points can be at most n + 1. What if we relax the condition and require that there are two possible distances c, d, so that every pairwise distance is either c or d? Such a set is called a two-distance set.

Exercise. Construct a two-distance set in R^n with (n choose 2) points.

Theorem 1. Any two-distance set in R^n has at most ½(n + 1)(n + 4) points.

Proof. Let A ⊂ R^n be a two-distance set. For each point a ∈ A, we define a polynomial on R^n:

f_a(x) = (||x − a||² − c²)(||x − a||² − d²).

Here, ||x||² = Σ_i x_i² denotes the square of the euclidean norm. Let's prove that the polynomials f_a(x) are linearly independent. Suppose that Σ_{a∈A} α_a f_a(x) is identically zero. Plug in x = b for some point b ∈ A. We have f_a(b) = 0 for any a ≠ b, because ||a − b|| is either c or d. So we have 0 = Σ_{a∈A} α_a f_a(b) = α_b f_b(b) = α_b c² d². Since cd ≠ 0, this implies α_b = 0 for every b ∈ A. This shows that the polynomials f_a(x) are linearly independent.

Finally, we want to bound the dimension of the vector space containing our polynomials. By expanding the euclidean norms, it can be seen that each f_a(x) can be expressed as a linear combination of the following polynomials:

V = { (Σ_{i=1}^n x_i²)², x_j Σ_{i=1}^n x_i², x_i x_j, x_i, 1 | i, j ∈ [n] }.

The number of generators here is 1 + n + ½n(n + 1) + n + 1 = ½(n + 1)(n + 4). Therefore, the polynomials f_a(x) reside in a vector space of dimension at most ½(n + 1)(n + 4).

3 Sets with few possible intersection sizes

Here we discuss a generalization of Fisher's inequality. Consider a family of sets F ⊆ 2^{[n]} and let L ⊂ {0, 1, . . . , n}. We say that F is L-intersecting if |A ∩ B| ∈ L for any distinct A, B ∈ F. Fisher's inequality says that if |L| = 1 then |F| ≤ n. Frankl and Wilson proved the following generalization in 1981.

Theorem 2. If F is an L-intersecting family of subsets of [n], then

|F| ≤ Σ_{k=0}^{|L|} (n choose k).
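As a quick sanity check of this bound (a small script, not part of the original notes), one can verify for small parameters that the family of all subsets of size at most ℓ is L-intersecting for L = {0, 1, . . . , ℓ − 1} and attains the bound with equality:

```python
from itertools import combinations
from math import comb

def frankl_wilson_bound(n, L):
    """The bound sum_{k=0}^{|L|} C(n, k) from the theorem above."""
    return sum(comb(n, k) for k in range(len(L) + 1))

n, ell = 6, 2
L = set(range(ell))  # L = {0, 1, ..., ell - 1}

# The family of all subsets of [n] of size at most ell.
family = [frozenset(c) for k in range(ell + 1)
          for c in combinations(range(n), k)]

# Check that the family is L-intersecting: |A ∩ B| ∈ L for distinct A, B.
assert all(len(A & B) in L for A, B in combinations(family, 2))

# The family attains the bound with equality.
print(len(family), frankl_wilson_bound(n, L))  # 22 22
```

Two distinct sets of size at most 2 can share at most one element, so every pairwise intersection size lies in {0, 1} = L, and the family has exactly C(6,0) + C(6,1) + C(6,2) = 22 members.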
Note that the family of all subsets of size at most ℓ is L-intersecting for L = {0, 1, . . . , ℓ − 1}, so this bound is best possible.

Proof. Let F ⊆ 2^{[n]} and |L| = s. For any distinct A, B ∈ F, |A ∩ B| ∈ L. We define a polynomial on R^n for each A ∈ F:

f_A(x) = Π_{ℓ∈L, ℓ<|A|} (Σ_{e∈A} x_e − ℓ).

Observe that for any B ∈ F with B ≠ A, if we plug in the indicator vector 1_B, we get

f_A(1_B) = Π_{ℓ∈L, ℓ<|A|} (|A ∩ B| − ℓ) = 0,

because |A ∩ B| = ℓ for some ℓ ∈ L with ℓ < |A| (unless A ⊆ B). On the other hand,

f_A(1_A) = Π_{ℓ∈L, ℓ<|A|} (|A| − ℓ) > 0.

By an argument similar to the one we used before (ordering the sets by size and plugging 1_A in for a set A of minimal size with a nonzero coefficient, which rules out the case A ⊆ B above), the polynomials {f_A(x) : A ∈ F} are linearly independent. It remains to compute the dimension of the space containing all these polynomials. A trick that helps reduce the dimension is that we only evaluate the polynomials on 0/1 vectors. Thus, we can replace every higher power x_i^k by x_i itself; this does not change the linear independence property. Then the polynomials are generated by all monomials Π_{i∈I} x_i with |I| ≤ s.

Using essentially the same argument, we can also prove the following modular version of the theorem.

Theorem 3. Let p be prime and L ⊂ Z_p. Assume F ⊆ 2^{[n]} is a family of sets such that
• |A| ∉ L (mod p) for any A ∈ F;
• |A ∩ B| ∈ L (mod p) for any distinct A, B ∈ F.
Then

|F| ≤ Σ_{k=0}^{|L|} (n choose k).

Proof. Let F ⊆ 2^{[n]} and L ⊂ Z_p. In the following, all operations are mod p. For any distinct A, B ∈ F, |A ∩ B| ∈ L. We define a polynomial on Z_p^n for each A ∈ F:

f_A(x) = Π_{ℓ∈L} (Σ_{e∈A} x_e − ℓ).

Observe that for any B ∈ F with B ≠ A, if we plug in the indicator vector 1_B, we get

f_A(1_B) = Π_{ℓ∈L} (|A ∩ B| − ℓ) = 0

because |A ∩ B| ∈ L (mod p). On the other hand,

f_A(1_A) = Π_{ℓ∈L} (|A| − ℓ) ≠ 0

because |A| ∉ L (mod p). Again, we replace each f_A(x) by f̃_A(x), where each factor x_i^k is replaced by x_i. Since we only substitute 0/1 values, this does not affect the properties above. Hence the polynomials f̃_A(x) are linearly independent. They are generated by all monomials Π_{i∈I} x_i with |I| ≤ |L|.
The number of such monomials is exactly Σ_{k=0}^{|L|} (n choose k), as required.

Math 184: Combinatorics
Lecture 17: Linear algebra - continued
Instructor: Benny Sudakov

1 Few possible intersections - summary

Last time, we proved two results about families of sets with few possible intersection sizes. Let us compare them here.

Theorem 1. If F is an L-intersecting family of subsets of [n], then |F| ≤ Σ_{k=0}^{|L|} (n choose k).

Theorem 2. Let p be prime and L ⊂ Z_p. Assume F ⊆ 2^{[n]} is an L-intersecting family (with intersections taken mod p), and no set in F has size in L (mod p). Then |F| ≤ Σ_{k=0}^{|L|} (n choose k).

Both results have interesting applications. First, let's return to Ramsey graphs.

2 Explicit Ramsey graphs

We saw how to construct a graph on n = (k choose 3) vertices which does not contain any clique or independent set larger than k. Here, we improve this construction to n = k^{Ω(log k / log log k)}, i.e. superpolynomial in k.

Theorem 3 (Frankl, Wilson 1981). For any prime p, there is a graph G on n = (p³ choose p²−1) vertices such that any clique or independent set in G has size at most k = Σ_{i=0}^{p−1} (p³ choose i).

Note that n ≈ p^{3(p²−1)}, while k ≈ p^{3(p−1)}. I.e., n ≈ k^{p+1} ≈ k^{log k / log log k}.

Proof. We construct G as follows. Let V = ([p³] choose p²−1), and let A, B ∈ V form an edge if |A ∩ B| ≢ p − 1 (mod p). Note that for each A ∈ V, |A| = p² − 1 ≡ p − 1 (mod p).

If A_1, . . . , A_k is a clique, then |A_i| ≡ p − 1 (mod p), while |A_i ∩ A_j| ≢ p − 1 (mod p) for all i ≠ j. By Theorem 2 with L = {0, 1, . . . , p − 2}, we get k ≤ Σ_{i=0}^{p−1} (p³ choose i).

If A_1, . . . , A_k is an independent set, then |A_i ∩ A_j| ≡ p − 1 (mod p) for all i ≠ j. This means |A_i ∩ A_j| ∈ L = {p − 1, 2p − 1, . . . , p² − p − 1}, without any modulo operations. By Theorem 1, we get k ≤ Σ_{i=0}^{p−1} (p³ choose i).

3 Borsuk's conjecture

Can every bounded set S ⊂ R^d be partitioned into d + 1 sets of strictly smaller diameter?
This conjecture was a long-standing open problem, solved in the special cases of a sphere S (by Borsuk himself), of a smooth body S (using the Borsuk-Ulam theorem), and of low dimension d ≤ 3. It can be seen that a simplex requires d + 1 parts; otherwise we have 2 vertices in the same part and hence the diameter does not decrease. The conjecture was disproved dramatically in 1993, when Kahn and Kalai showed that significantly more than d + 1 parts are required.

Theorem 4. For any sufficiently large d, there exists a bounded set S ⊂ R^d (in fact a finite set) such that in any partition of S into fewer than 1.2^√d parts, some part has the same diameter as S.

The proof uses an algebraic construction, relying on the following lemma.

Lemma 1. For any prime p, there exists a set of ½(4p choose 2p) vectors F ⊆ {−1, +1}^{4p} such that every subset of 2(4p choose p−1) vectors contains an orthogonal pair of vectors.

Proof. Consider 4p elements and all subsets of size 2p containing the fixed element 1:

F = {I : I ⊆ [4p], |I| = 2p, 1 ∈ I}.

For each set I, we define a vector v^I by v^I_i = +1 if i ∈ I and v^I_i = −1 if i ∉ I. We set F = {v^I : I ∈ F}. The only way that a pair of such vectors v^I, v^J can be orthogonal is that |I Δ J| = 2p, and then |I ∩ J| = p. Note that |I ∩ J| is always between 1 and 2p − 1 (I, J are distinct and share at least the element 1). Hence v^I · v^J = 0 iff |I ∩ J| ≡ 0 (mod p).

We claim that this is the desired collection of vectors. For a subset G ⊆ F without any orthogonal pair, we would have a family of sets G ⊆ F such that
• ∀I ∈ G: |I| ≡ 0 (mod p);
• ∀ distinct I, J ∈ G: |I ∩ J| ∈ {1, 2, . . . , p − 1} (mod p).
By Theorem 2,

|G| ≤ Σ_{k=0}^{p−1} (4p choose k) < 2 (4p choose p−1).

Now we are ready to prove the theorem.

Proof. Given the set of vectors F ⊆ R^n = R^{4p} provided by the lemma above, we define a set of vectors

X = {v ⊗ v : v ∈ F} ⊂ R^{n²}.

Here, each vector is a tensor product w = v ⊗ v; more explicitly, w_ij = v_i v_j for 1 ≤ i, j ≤ n. These vectors satisfy the following properties:

• w ∈ {−1, +1}^{n²}; ||w||² = n².
• w · w′ = (v ⊗ v) · (v′ ⊗ v′) = (v · v′)² ≥ 0.
• w, w′ are orthogonal if and only if v, v′ are orthogonal.
• ||w − w′||² = ||w||² + ||w′||² − 2(w · w′) = 2n² − 2(v · v′)² ≤ 2n², and the pairs at maximum distance correspond exactly to orthogonal pairs of vectors.

By the lemma, any subset of 2(4p choose p−1) vectors contains an orthogonal pair, and so its diameter is the same as that of the original set. If we want to decrease the diameter, we must partition X into sets of size less than 2(4p choose p−1), and the number of such parts is at least

|X| / (2 (4p choose p−1)) = ½(4p choose 2p) / (2 (4p choose p−1))
  = (3p+1)(3p)(3p−1) · · · (2p+2)(2p+1) / [4 · (2p)(2p−1) · · · (p+1)p] ≥ (3/2)^{p−1}.

The dimension of our space is d = n² = (4p)², so the number of parts must be at least (3/2)^{p−1} = (3/2)^{√d/4 − 1}. (The bound can be somewhat improved by a more careful analysis.)

Math 184: Combinatorics
Lecture 18: Spectral graph theory
Instructor: Benny Sudakov

1 Eigenvalues of graphs

Looking at a graph, we see some basic parameters: the maximum degree, the minimum degree, connectivity, the maximum clique, the maximum independent set, etc. Less obvious yet very useful parameters are the eigenvalues of the graph. Eigenvalues are a standard notion in linear algebra, defined as follows.

Definition 1. For a matrix A ∈ R^{n×n}, a number λ is an eigenvalue if Ax = λx for some vector x ≠ 0. The vector x is called an eigenvector corresponding to λ.

Some basic properties of eigenvalues:

• The eigenvalues are exactly the numbers λ that make the matrix A − λI singular, i.e. the solutions of det(A − λI) = 0.
• The eigenvectors corresponding to λ form a subspace V_λ; the dimension of V_λ is called the multiplicity of λ.
• In general, eigenvalues can be complex numbers. However, if A is a symmetric matrix (a_ij = a_ji), then all eigenvalues are real, and moreover there is an orthogonal basis consisting of eigenvectors.
• The sum of all eigenvalues, counted with multiplicities, is Σ_{i=1}^n λ_i = Tr(A) = Σ_{i=1}^n a_ii, the trace of A.
• The product of all eigenvalues, counted with multiplicities, is Π_{i=1}^n λ_i = det(A), the determinant of A.
• The number of non-zero eigenvalues, counted with multiplicities, is the rank of A (for a symmetric matrix).

For graphs, we define eigenvalues as the eigenvalues of the adjacency matrix.

Definition 2. For a graph G, the adjacency matrix A(G) is defined as follows:
• a_ij = 1 if (i, j) ∈ E(G);
• a_ij = 0 if i = j or (i, j) ∉ E(G).

Because Tr(A(G)) = 0, we immediately get the following.

Lemma 1. The sum of all eigenvalues of a graph is always 0.

Examples.

1. The complete graph K_n has adjacency matrix A = J − I, where J is the all-1's matrix and I is the identity. The rank of J is 1, i.e. J has one nonzero eigenvalue, equal to n (with eigenvector 1 = (1, 1, . . . , 1)); all the remaining eigenvalues of J are 0. Subtracting the identity shifts all eigenvalues by −1, because Ax = (J − I)x = Jx − x. Therefore the eigenvalues of K_n are n − 1 and −1 (with multiplicity n − 1).

2. If G is d-regular, then 1 = (1, 1, . . . , 1) is an eigenvector: A1 = d·1, and hence d is an eigenvalue. It is easy to see that no eigenvalue can be larger than d. In general graphs, the largest eigenvalue can be viewed as a notion of what the degrees of G essentially are; it always lies between the average degree and the maximum degree.

3. If G is d-regular and d = λ_1 ≥ λ_2 ≥ . . . ≥ λ_n are the eigenvalues of G, then the eigenvalues of the complement Ḡ are n − 1 − d and {−1 − λ_i : 2 ≤ i ≤ n}. This is because A(Ḡ) = J − I − A(G); Ḡ is (n − 1 − d)-regular, so its largest eigenvalue is n − 1 − d. Any other eigenvalue λ_i of G has an eigenvector x orthogonal to 1, and hence

A(Ḡ)x = (J − I − A(G))x = 0 − x − λ_i x = (−1 − λ_i)x.

4. The complete bipartite graph K_{m,n} has an adjacency matrix of rank 2; therefore we expect eigenvalue 0 with multiplicity m + n − 2, and two non-trivial eigenvalues. These should be equal to ±λ, because the sum of all eigenvalues is always 0. We find λ by solving Ax = λx.
By symmetry, we guess that the eigenvector x should have the m coordinates of one side equal to α and the n coordinates of the other side equal to β. Each vertex on the side of size m has n neighbors, all with coordinate β, and vice versa; hence Ax has value nβ on the first side and mα on the second, and this should be a multiple of x = (α, . . . , α, β, . . . , β). Therefore nβ = λα and mα = λβ, i.e. mnβ = λ²β, and λ = √(mn).

Math 184: Combinatorics
Lecture 19: The Petersen graph and Moore graphs
Instructor: Benny Sudakov

1 The Petersen graph

As a more interesting exercise, we will compute the eigenvalues of the Petersen graph.

Definition 1. The Petersen graph is a graph with 10 vertices and 15 edges. It can be described in the following ways:

1. It is the Kneser graph KG(5, 2): the vertices are the pairs from a 5-element set, and two pairs are adjacent exactly if they are disjoint.
2. It is the complement of the line graph of K_5: the vertices of the line graph are the edges of K_5, and two of them are joined if they share a vertex.
3. Take two disjoint copies of C_5, (v_1, v_2, v_3, v_4, v_5) and (w_1, w_2, w_3, w_4, w_5), and add a matching of 5 edges between them: (v_1, w_1), (v_2, w_3), (v_3, w_5), (v_4, w_2), (v_5, w_4).

The Petersen graph is a very interesting small graph, which provides a counterexample to many graph-theoretic statements. For example:

• It is the smallest bridgeless 3-regular graph that has no 3-coloring of the edges in which adjacent edges get different colors (the smallest "snark").
• It is the smallest 3-regular graph of girth 5.
• It is the largest 3-regular graph of diameter 2.
• It has 2000 spanning trees, the most of any 3-regular graph on 10 vertices.

To compute the eigenvalues of the Petersen graph, we use the fact that it is strongly regular. This means that not only does each vertex have the same degree (3), but each pair of adjacent vertices has the same number of shared neighbors (0), and each pair of non-adjacent vertices has the same number of shared neighbors (1). In terms of the adjacency matrix, this can be expressed as follows:

• (A²)_ij = Σ_k a_ik a_kj is the number of neighbors shared by i and j.
• For i = j, (A²)_ii = 3.
• For i ≠ j, (A²)_ij = 1 − a_ij: either 0 or 1, depending on whether (i, j) ∈ E.

In concise form, this can be written as

A² + A − 2I = J.

Now consider any eigenvector, Ax = λx. We know that one eigenvector is 1, which has eigenvalue d = 3. All other eigenvectors x are orthogonal to 1, which also means that Jx = 0. Then we get

(A² + A − 2I)x = λ²x + λx − 2x = 0.

This means that each eigenvalue apart from the largest one satisfies the quadratic equation λ² + λ − 2 = 0, which has two roots, 1 and −2.

Finally, we calculate the multiplicity of each root from the condition Σ λ_i = 0. The largest eigenvalue has multiplicity 1 (it is easy to see that any vector with Ax = 3x is a multiple of 1). Therefore, if eigenvalue 1 has multiplicity a and −2 has multiplicity b, we get 3 + a · 1 + b · (−2) = 0 and a + b = 9, which implies a = 5 and b = 4. We conclude that the Petersen graph has eigenvalues, counted with multiplicities, (3, 1, 1, 1, 1, 1, −2, −2, −2, −2).

Finally, we show an application of eigenvalues to the following question. Consider 3 overlapping copies of the Petersen graph. The degrees in each copy are equal to 3, so the degrees could in total add up to 9 and form the complete graph K_10. However, something goes wrong when you try it. The following statement shows that this is indeed impossible.

Theorem 1. There is no decomposition of the edge set of K_10 into 3 copies of the Petersen graph.

Proof. Suppose that A, B, C are adjacency matrices of three permuted copies of the Petersen graph such that they add up to the adjacency matrix of K_10, A + B + C = J − I. Let V_A and V_B be the eigenspaces corresponding to eigenvalue 1 for the matrices A and B, respectively. We know that dim(V_A) = dim(V_B) = 5, and moreover both V_A and V_B are orthogonal to the eigenvector 1.
This implies that V_A and V_B cannot intersect trivially (otherwise, together with 1, we would have 11 linearly independent vectors in R^10), and therefore there is a nonzero vector z ∈ V_A ∩ V_B. This vector is also orthogonal to 1, i.e. Jz = 0. Therefore we get

Cz = (J − I − A − B)z = −z − Az − Bz = −3z.

But −3 is not an eigenvalue of the Petersen graph, which is a contradiction.

2 Moore graphs and cages

The Petersen graph is a special case of the following kind of graph. Suppose that G is d-regular, and starting from any vertex it looks like a tree up to distance k, while within distance k we already see the entire graph. In other words, the diameter of the graph is k and the girth is 2k + 1. Such graphs are called Moore graphs. By simple counting, the number of vertices in such a graph must be

n_{d,k} = 1 + d Σ_{i=0}^{k−1} (d − 1)^i.

This is the minimum possible number of vertices in a d-regular graph of girth 2k + 1; such minimal graphs are also called cages. The Petersen graph is the (unique) 3-regular Moore graph of diameter 2 and girth 5. There are surprisingly few known examples of Moore graphs. We prove here that for girth 5 there indeed cannot be many.

Theorem 2 (Hoffman-Singleton). A d-regular Moore graph of diameter 2 can exist only for d = 2, 3, 7 and possibly 57.

Proof. Assume G is a d-regular Moore graph of girth 5. The number of vertices is n = 1 + d + d(d − 1) = d² + 1. Again we consider the square of the adjacency matrix, A². Observe that adjacent vertices do not share any neighbors, otherwise there would be a triangle in G. Non-adjacent vertices share exactly one neighbor, because the diameter of G is 2 and there is no 4-cycle in G. Hence A² has d on the diagonal, 0 for edges and 1 for non-edges. In other words,

A² + A − (d − 1)I = J.

If λ is an eigenvalue of A different from d, we get λ² + λ − (d − 1) = 0. This means

λ = −1/2 ± (1/2)√(1 + 4(d − 1)) = −1/2 ± (1/2)√(4d − 3).

Assume that −1/2 + (1/2)√(4d − 3) has multiplicity a and −1/2 − (1/2)√(4d − 3) has multiplicity b.
We get

d − (a + b)/2 + ((a − b)/2)√(4d − 3) = 0.

We also know that a + b = n − 1 = d². Therefore,

(a − b)√(4d − 3) = a + b − 2d = d² − 2d.

This can hold only if a = b and d = 2, or else 4d − 3 is a perfect square. Let 4d − 3 = s², i.e. d = (s² + 3)/4. Substituting this into the equation

d − d²/2 + (s/2)(2a − d²) = 0,

we get

(s² + 3)/4 − (s² + 3)²/32 + (s/2)(2a − (s² + 3)²/16) = 0.

From here, we get

s⁵ + s⁴ + 6s³ − 2s² + (9 − 32a)s = 15.

For this equation to have an integer solution, s must divide 15, hence s ∈ {1, 3, 5, 15}, giving d ∈ {1, 3, 7, 57}. The case d = 1 leads to G = K_2, which is not a Moore graph.

We remark that the graph for d = 2 is C_5, for d = 3 it is the Petersen graph, for d = 7 it is the "Hoffman-Singleton graph" (with 50 vertices and 175 edges), and for d = 57 it is not known whether such a graph exists. This graph would need to have 3250 vertices, 92,625 edges, diameter 2 and girth 5.

Math 184: Combinatorics
Lecture 20: Friends and politicians
Instructor: Benny Sudakov

1 The friendship theorem

Theorem 1. Suppose G is a (finite) graph in which any two vertices share exactly one neighbor. Then there is a vertex adjacent to all other vertices.

The interpretation of this theorem is as follows: if any two people have exactly one friend in common, then there is a person (the politician) who is everybody's friend. We actually prove a stronger statement, namely that the only graphs with this structure consist of a collection of triangles that all share one vertex.

Surprisingly, the friendship theorem is false for infinite graphs. Let G_0 = C_5, and let G_{n+1} be obtained from G_n by adding a separate common neighbor for each pair of vertices that does not have one yet. Then G = ∪_{n=0}^∞ G_n is a counterexample to the theorem.

The theorem for finite graphs sounds somewhat similar to the Erdős-Ko-Rado theorem. Interestingly, the proof requires some spectral analysis.

Proof.
Assume for the sake of contradiction that any two vertices in G share exactly one neighbor, but there is no vertex adjacent to all other vertices. Note that the first condition implies that there is no C_4 subgraph in G.

First, we claim that G is a regular graph. Suppose (u, v) ∉ E and let w_1, . . . , w_k be the neighbors of u. We know that v and w_i share a neighbor z_i for each i. The vertices z_i must be distinct; otherwise we would get a C_4 (on u, w_i, w_j and z_i = z_j). Therefore v also has at least k neighbors, and by symmetry we conclude that deg(u) = deg(v) for any (u, v) ∉ E. Now let w_1 be the only shared neighbor of u and v. Any other vertex w is adjacent to at most one of u, v, and hence deg(w) = deg(u) = deg(v). Finally, w_1 is not adjacent to all of these vertices, so deg(w_1) = deg(u) = deg(v) as well. Hence all degrees are equal, say to k.

The number of walks of length 2 from a fixed vertex x is k². Because every vertex y ≠ x shares a unique neighbor with x, every vertex is counted exactly once in this way, except x itself, which is counted k times. Therefore k² = (n − 1) + k, and the total number of vertices is n = k² − k + 1.

We consider the adjacency matrix A. Since any two vertices share exactly one neighbor, the matrix A² has k on the diagonal and 1 everywhere else. We can write

A² = J + (k − 1)I.

From this expression, it is easy to see that A² has eigenvalues n + k − 1 = k² (once), and k − 1 with multiplicity n − 1. The eigenvalues of A² are the squares of the eigenvalues of A, which are therefore k (the degree of each vertex) and ±√(k − 1). We know that the eigenvalues should sum to 0. If √(k − 1) appears with multiplicity r and −√(k − 1) with multiplicity s, this yields

k + (r − s)√(k − 1) = 0.

This implies k² = (r − s)²(k − 1), i.e. k − 1 divides k². This is possible only for k = 1, 2; otherwise k − 1 divides k² − 1 and hence cannot divide k². For k = 1, 2, we get two regular graphs: K_1 and K_3.
Both of these satisfy the conditions of the theorem as well as the conclusion. Otherwise, we conclude that there must be a vertex x adjacent to all other vertices. Then it is easy to see that the remaining vertices are matched up in pairs, each forming a triangle with x.

We finish with a related conjecture of Kotzig.

Conjecture. For any fixed ℓ > 2, there is no finite graph such that every pair of vertices is connected by precisely one path of length ℓ.

For ℓ = 2, we concluded that there is exactly one family of such graphs: collections of triangles joined at one vertex. The conjecture has been verified for 3 ≤ ℓ ≤ 33, but a general proof remains elusive.

2 The variational definition of eigenvalues

We continue with an equivalent definition of eigenvalues.

Lemma 1. The k-th largest eigenvalue of a symmetric matrix A ∈ R^{n×n} is equal to

λ_k = max_{dim(U)=k} min_{x∈U} (x^T A x)/(x^T x) = min_{dim(U)=k−1} max_{x⊥U} (x^T A x)/(x^T x).

Here, the maximum/minimum is over all subspaces U of the given dimension, and over all nonzero vectors x in the respective subspace.

Proof. We only prove the first equality; the second one is analogous. First of all, note that the quantity x^T A x / x^T x is invariant under replacing x by any nonzero multiple μx. Therefore, we can assume that x is a unit vector, x^T x = 1.

Consider an orthonormal basis of eigenvectors u_1, u_2, . . . , u_n. Any vector x ∈ R^n can be written as x = Σ_{i=1}^n α_i u_i, and the expression x^T A x reduces to

x^T A x = (Σ_{i=1}^n α_i u_i)^T A (Σ_{j=1}^n α_j u_j) = Σ_{i,j} α_i α_j (u_i · λ_j u_j) = Σ_{i=1}^n α_i² λ_i,

using the fact that u_i · u_j = 1 if i = j and 0 otherwise. By a similar argument, x^T x = Σ_{i=1}^n α_i², and for a unit vector we get Σ_{i=1}^n α_i² = 1. I.e., the expression x^T A x / x^T x can be interpreted as a weighted average of the eigenvalues.

Now consider the subspace U generated by the first k eigenvectors, [u_1, . . . , u_k]. For any unit vector x ∈ U, we get x^T A x = Σ_{i=1}^k α_i² λ_i with Σ_{i=1}^k α_i² = 1. This weighted average is at least the smallest of the first k eigenvalues, i.e.

max_{dim(U)=k} min_{x∈U} (x^T A x)/(x^T x) ≥ λ_k.

On the other hand, consider any subspace U of dimension k, and the subspace V = [u_k, u_{k+1}, . . . , u_n], which has dimension n − k + 1. These two subspaces have a nontrivial intersection, i.e. there exists a nonzero vector z ∈ U ∩ V. We can assume that z = Σ_{j=k}^n β_j u_j is a unit vector, z^T z = Σ_{j=k}^n β_j² = 1. We obtain

(z^T A z)/(z^T z) = Σ_{j=k}^n β_j² λ_j ≤ λ_k,

since this is a weighted average of the last n − k + 1 eigenvalues, the largest of which is λ_k. Consequently, for every subspace U of dimension k,

min_{x∈U} (x^T A x)/(x^T x) ≤ (z^T A z)/(z^T z) ≤ λ_k,

and taking the maximum over U gives the first equality.

3 A bound on the independence number

Theorem 2. For a d-regular graph G with smallest (most negative) eigenvalue λ_n, the independence number satisfies

α(G) ≤ n / (1 − d/λ_n).

Keep in mind that λ_n < 0, so the denominator is larger than 1.

Proof. Let S ⊆ V be a maximum independent set, |S| = α. We consider the vector x = n 1_S − α 1. (This vector can be seen as the indicator vector of S, modified to be orthogonal to 1.) By Lemma 1 with U = R^n, we know that

(x^T A x)/(x^T x) ≥ λ_n.

It remains to compute x^T A x. We get

x^T A x = n² 1_S^T A 1_S − 2αn 1_S^T A 1 + α² 1^T A 1.

Because S is an independent set, 1_S^T A 1_S = Σ_{i,j∈S} a_ij = 0. Similarly, we get 1_S^T A 1 = d 1_S · 1 = αd and 1^T A 1 = d 1 · 1 = dn. All in all,

x^T A x = −2αn · αd + α² · dn = −α² dn.

Also,

x^T x = n² ||1_S||² − 2αn 1_S · 1 + α² ||1||² = n²α − 2α²n + α²n = αn(n − α).

We conclude that

λ_n ≤ (x^T A x)/(x^T x) = −α² dn / (αn(n − α)) = d / (1 − n/α),

which implies

α ≤ n / (1 − d/λ_n).

This bound need not be tight in general, but it gives the right value in many interesting cases.

• The complete graph K_n has eigenvalues λ_1 = d = n − 1 and λ_n = −1. This yields α(G) ≤ n/(1 − d/λ_n) = n/n = 1.
• The complete bipartite graph K_{n,n} has eigenvalues λ_1 = d = n and smallest eigenvalue −n, hence α(G) ≤ 2n/(1 + 1) = n.
• The Petersen graph has eigenvalues λ_1 = d = 3 and λ_n = −2, therefore α(G) ≤ n/(1 − d/λ_n) = 10/(1 − 3/(−2)) = 4, which is the right value.
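The Petersen computation is easy to confirm by brute force; the following small script (an illustration, not part of the original notes) builds the graph as the Kneser graph KG(5, 2) and checks that the maximum independent set has size exactly 4, matching the eigenvalue bound:

```python
from itertools import combinations

# Petersen graph as the Kneser graph KG(5,2): vertices are 2-element
# subsets of {0,...,4}, adjacent exactly when disjoint.
verts = [frozenset(p) for p in combinations(range(5), 2)]
adj = {(u, v) for u in verts for v in verts if u != v and not (u & v)}

def is_independent(subset):
    """True if no two vertices of the subset are adjacent."""
    return all((u, v) not in adj for u, v in combinations(subset, 2))

# Brute force over all 2^10 vertex subsets.
alpha = max(len(s) for r in range(11)
            for s in combinations(verts, r) if is_independent(s))

# Eigenvalue bound n / (1 - d/lambda_n) with n = 10, d = 3, lambda_n = -2.
bound = 10 / (1 - 3 / (-2))
print(alpha, bound)  # 4 4.0
```

An independent set of size 4 attaining the bound is the "star" of all pairs containing a fixed element, e.g. {0,1}, {0,2}, {0,3}, {0,4}: any two of these intersect, so they are pairwise non-adjacent in the Kneser graph.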
Math 184: Combinatorics
Lecture 21: Bounds on the chromatic number
Instructor: Benny Sudakov

1 Bounds on the chromatic number

Last time, we proved that for any d-regular graph,

α(G) ≤ n / (1 − d/λ_n).

Since the chromatic number always satisfies χ(G) ≥ n/α(G), we obtain an immediate corollary.

Corollary 1. For a d-regular graph with smallest eigenvalue λ_n, the chromatic number satisfies

χ(G) ≥ 1 − d/λ_n.

An upper bound on χ(G) can be obtained as follows. This bound is not restricted to regular graphs; note that the special case of regular graphs (where λ_1 = d) is easy.

Theorem 1. For any graph G, χ(G) ≤ 1 + λ_1.

Proof. Consider any induced subgraph H = G[S]. We claim that the average degree in H is at most λ_1, the maximum eigenvalue of G. Consider the indicator vector 1_S of S. We have

λ_1 = max_{x≠0} (x^T A x)/(x^T x) ≥ (1_S^T A 1_S)/(1_S^T 1_S) = 2|E(H)|/|S| = d̄(H),

where d̄(H) is the average degree of H. Therefore, every subgraph H ⊆ G has average degree d̄(H) ≤ λ_1.

We color the graph G by induction. Since d̄(G) ≤ λ_1, there is a vertex v of degree at most λ_1. We remove v to obtain a subgraph H. By induction, H can be colored using at most 1 + λ_1 colors. Finally, we add back the vertex v, which has at most λ_1 neighbors; hence one of the 1 + λ_1 colors is available for it.

Finally, we generalize the lower bound on χ(G) to arbitrary graphs.

Theorem 2. For any graph G,

χ(G) ≥ 1 − λ_1/λ_n.

Proof. Consider a proper coloring c : V(G) → [k], and define U_i to be the span of the basis vectors {e_j : c(j) = i}. These subspaces are obviously orthogonal. Consider an eigenvector z corresponding to λ_1 and write z = Σ_{i=1}^k α_i u^i, where u^i ∈ U_i and ||u^i|| = 1. Let Ũ be the subspace spanned by u^1, u^2, . . . , u^k, and let S be the rectangular n × k matrix with columns u^1, u^2, . . . , u^k. Note that a = (α_1, . . . , α_k) is mapped to Sa = z.

Now consider the k × k matrix B = S^T A S.
For any i, we have

B_ii = (u^i)^T A u^i = Σ_{j,k} a_jk u^i_j u^i_k = 0,

because u^i has nonzero coordinates only on vertices of color i, and there are no edges between such vertices. Therefore, Tr(B) = 0.

For any unit vector u ∈ R^k, x = Su is a unit vector as well: observe that S^T S = I by the orthonormality of u^1, . . . , u^k, and x^T x = u^T S^T S u = u^T u = 1. Therefore,

u^T B u = u^T S^T A S u = (Su)^T A (Su) = x^T A x ∈ [λ_n, λ_1].

I.e., the eigenvalues of B lie within [λ_n, λ_1]. In fact, since z = Sa is the eigenvector of A corresponding to λ_1,

(a^T B a)/(a^T a) = (a^T S^T A S a)/(a^T S^T S a) = (z^T A z)/(z^T z) = λ_1,

and λ_1 is the maximum eigenvalue of B as well. Now, Tr(B) = 0 is the sum of the eigenvalues of B, which is at least λ_1 + (k − 1)λ_n. We conclude that

0 = Tr(B) ≥ λ_1 + (k − 1)λ_n,

which implies k ≥ 1 − λ_1/λ_n.

Math 184: Combinatorics
Lecture 22: Eigenvalues and expanders
Instructor: Benny Sudakov

1 Expander graphs

Expander graphs are graphs with the special property that any set of vertices S (unless very large) has a number of outgoing edges proportional to |S|. Expansion can be defined either with respect to the number of edges or the number of vertices on the boundary of S. We will stick with edge expansion, which is more directly related to eigenvalues.

Definition 1. The edge expansion (or "Cheeger constant") of a graph is

h(G) = min_{|S|≤n/2} e(S, S̄)/|S|,

where e(S, S̄) is the number of edges between S and its complement.

Definition 2. A graph is a (d, ε)-expander if it is d-regular and h(G) ≥ ε.

Observe that e(S, S̄) ≤ d|S|, and so h(G) cannot be more than d. Graphs with ε comparable to d are very good expanders. Expanders are very useful in computer science; we will mention some applications later.

2 Random graphs

It is known that random graphs are good expanders. It is easier to analyze bipartite expanders, which are defined as follows.

Definition 3.
A bipartite graph G on n + n vertices L ∪ R is called a (d, β)-expander if the degrees in L are d and any set of vertices S ⊆ L of size |S| ≤ n/d has at least β|S| neighbors in R.

Theorem 1. Let d ≥ 4 and let G be a random bipartite graph obtained by choosing d random edges for each vertex in L. Then G is a (d, d/4)-expander with constant positive probability.

Proof. For each S ⊆ L and T ⊆ R, let E_{S,T} denote the event that all neighbors of S are in T. The probability of this event is

  Pr[E_{S,T}] = (|T|/n)^{d|S|}.

Let β = d/4 ≥ 1. By the union bound, and the standard estimate C(n,k) ≤ (ne/k)^k,

  Pr[∃ S, T : |S| ≤ n/d, |T| < β|S|] ≤ Σ_{s=1}^{n/d} C(n,s) C(n,βs) (βs/n)^{ds} ≤ Σ_{s=1}^{n/d} (ne/(βs))^{2βs} (βs/n)^{ds}.

Since βs = ds/4, each term equals (4ne/(ds))^{ds/2} (ds/(4n))^{ds} = (eds/(4n))^{ds/2}, and ds ≤ n for s ≤ n/d, so this is bounded by

  Σ_{s=1}^{n/d} (e/4)^{ds/2} ≤ Σ_{s=1}^{∞} (e/4)^{ds/2} = (e/4)^{d/2}/(1 − (e/4)^{d/2}) < 1

for d ≥ 4.

3 Eigenvalue bounds on expansion

In general, random graphs are very good expanders, so the existence of expanders is not hard to establish. The difficult question, however, is how to construct expanders explicitly. For now, we leave this question aside and explore the connection between expansion and eigenvalues.

Theorem 2. For any d-regular graph G with second eigenvalue λ_2,

  h(G) ≥ (d − λ_2)/2.

Proof. For any subset of vertices S of size s, let x = (n − s)1_S − s·1_{S̄}.¹ We get

  x^T x = (n − s)² s + s² (n − s) = s(n − s)n

and

  x^T A x = 2 Σ_{(i,j)∈E} x_i x_j = 2(n − s)² e(S) − 2s(n − s) e(S, S̄) + 2s² e(S̄).

To eliminate e(S) and e(S̄), observe that every degree is equal to d, and ds can be viewed as counting each edge in e(S) twice and each edge in e(S, S̄) once. Therefore, ds = 2e(S) + e(S, S̄), and similarly d(n − s) = 2e(S̄) + e(S, S̄). This yields

  x^T A x = (n − s)²(ds − e(S, S̄)) − 2s(n − s) e(S, S̄) + s²(d(n − s) − e(S, S̄)) = dns(n − s) − n² e(S, S̄).

Since x · 1 = 0, we can use the variational definition of λ_2 to claim that

  λ_2 ≥ (x^T A x)/(x^T x) = (dns(n − s) − n² e(S, S̄))/(s(n − s)n) = d − (n/(s(n − s))) e(S, S̄).

For any set S of size s ≤ n/2, we have

  e(S, S̄)/|S| ≥ (d − λ_2)(n − s)/n ≥ (d − λ_2)/2.

This theorem shows that if d − λ_2 is large, for example λ_2 ≤ d/2, then the graph is a (d, d/4)-expander, very close to best possible. The quantity d − λ_2 is called the spectral gap. There is also a bound in the opposite direction, although we will not prove it here.

Theorem 3. For any d-regular graph with second eigenvalue λ_2,

  h(G) ≤ √(d(d − λ_2)).

¹ Note that we used exactly the same vector to prove our bound on the independence number.

4 How large can the spectral gap be?

We have seen that graphs where the maximum eigenvalue λ_1 dominates all other eigenvalues have very interesting properties. Here we ask: how small can the remaining eigenvalues possibly be? We know that the complete graph K_n has eigenvalues n − 1 and −1, and therefore λ = max_{i≠1} |λ_i| is dominated by λ_1 by a factor of n − 1, the degree in K_n. For a constant degree d and large n, this cannot happen.

Theorem 4. For any constant d > 1, any d-regular graph has an eigenvalue λ_i ≠ d of absolute value

  λ = max_{i≠1} |λ_i| ≥ (1 − o(1))√d,

where o(1) → 0 as n → ∞.

Proof. Consider the square of the adjacency matrix, A². A² has d on the diagonal, and therefore Tr(A²) = dn. On the other hand, the eigenvalues of A² are λ_i², and so

  Tr(A²) = Σ_{i=1}^n λ_i² ≤ d² + (n − 1)λ².

Putting these together, we get

  λ² ≥ (dn − d²)/(n − 1) ≥ (1 − d/n)d = (1 − o(1))d.

So the best possible spectral gap that we can have is roughly between √d and d. More precisely, it is known that the second eigenvalue is always at least 2√(d−1) − o(1). This leads to the definition of Ramanujan graphs.

Definition 4. A d-regular graph is Ramanujan if all its eigenvalues in absolute value are either equal to d or at most 2√(d−1).

It is known, in fact, that a random d-regular graph has all non-trivial eigenvalues bounded by 2√(d−1) + o(1) in absolute value.
However, it is more difficult to come up with explicit Ramanujan graphs.

5 Explicit expanders

The following graph is a beautiful algebraic construction of Margulis, which was the earliest explicit expander known.

Definition 5 (Margulis' graph). Let V = Z_n × Z_n and define an 8-regular graph on V as follows. Let

  T_1 = (1 2; 0 1),  T_2 = (1 0; 2 1),  e_1 = (1, 0),  e_2 = (0, 1).

Each vertex v ∈ Z_n × Z_n is adjacent to T_1 v, T_2 v, T_1 v + e_1, T_2 v + e_2, and four other vertices obtained by the inverse transformations. (This is a multigraph with possible multi-edges and loops.)

This graph has maximum eigenvalue d = 8, and it can also be computed that the second eigenvalue is λ_2 ≤ 5√2. (We will not show this here.) The spectral gap is d − λ_2 ≥ 8 − 5√2 > 0.92, and therefore this graph is an (8, 0.46)-expander. Later, even simpler constructions were found.

Definition 6. Let p be prime and let V = Z_p. We define a 3-regular graph G = (V, E) where the edges are of two types: (x, x ± 1) and (x, x⁻¹) for each x ∈ Z_p. (We assume that 0⁻¹ = 0 for this purpose.)

It is known that this is a (3, ε)-expander for some fixed ε > 0 and any prime p. The proof of this relies on deep results in number theory. These graphs are not Ramanujan graphs, i.e., their second eigenvalue is not on the order of √d. However, even such graphs can be constructed explicitly. The first explicit construction of Ramanujan graphs was found by Lubotzky, Phillips and Sarnak in 1988.

Math 184: Combinatorics
Lecture 23: Random walks on expanders
Instructor: Benny Sudakov

1 Random walks on expanders

Expanders have particularly nice behavior with respect to random walks. A random walk is a sequence of vertices, where each successive vertex is obtained by following a random edge from the previous vertex. Starting from a particular vertex v_0, we will be interested in the probability distribution of the vertex v_t after t steps of the walk. This distribution can be described by a vector x^(t), where x_i^(t) = Pr[v_t = i].
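Before formalizing this, here is a minimal Python sketch (not from the lecture; the 4-cycle example is an assumption chosen for illustration) that computes x^(t) by direct probability propagation: at each step, every vertex passes a 1/d fraction of its probability mass to each neighbor.

```python
# Compute x^(t), where x_i^(t) = Pr[v_t = i], by stepping probabilities.

def walk_distribution(neighbors, start, t):
    """Distribution after t steps of a walk started at vertex `start`."""
    n = len(neighbors)
    x = [0.0] * n
    x[start] = 1.0
    for _ in range(t):
        y = [0.0] * n
        for i in range(n):
            for j in neighbors[i]:
                # each neighbor receives a 1/deg(i) share of x_i
                y[j] += x[i] / len(neighbors[i])
        x = y
    return x

# 4-cycle: 0 - 1 - 2 - 3 - 0 (2-regular)
cycle4 = [[1, 3], [0, 2], [1, 3], [0, 2]]
x2 = walk_distribution(cycle4, start=0, t=2)
# After 2 steps the walk sits at even distance from the start:
assert x2 == [0.5, 0.0, 0.5, 0.0]
```

Note that the 4-cycle is bipartite (its smallest eigenvalue is −d), so this walk oscillates between the two sides forever and never converges to uniform; the convergence theorem below requires all non-trivial eigenvalues to be strictly smaller than d in absolute value.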
First, we have the following simple lemma.

Lemma 1. Let G be a d-regular graph with adjacency matrix A, and let B = (1/d)A. Then starting from probability distribution x^(0), a random walk after t steps has probability distribution x^(t) = B^t x^(0).

Proof. We show that starting from a distribution x, the distribution after one step is y = Bx:

  y_j = Pr[v_1 = j] = Σ_{i:(i,j)∈E} Pr[v_1 = j | v_0 = i] Pr[v_0 = i] = Σ_{i:(i,j)∈E} (1/d) x_i = (1/d)(Ax)_j = (Bx)_j.

Now the lemma follows by induction: x^(t) = B x^(t−1) = B(B^{t−1} x^(0)) = B^t x^(0).

It is very natural to analyze the behavior of a random walk in the basis of eigenvectors of B (which are equal to the eigenvectors of A). The eigenvalues of B are equal to the eigenvalues of A divided by d. For an expander graph, we obtain the following.

Theorem 1. Suppose that G is a d-regular graph with all other eigenvalues bounded by λ in absolute value. Let x^(0) be any initial distribution (e.g., concentrated on one vertex v_0), and let u = (1/n)1 denote the uniform distribution. Then after t steps of a random walk on G, we obtain

  ||x^(t) − u|| ≤ (λ/d)^t.

Note: we bound the deviation from u in the L2 norm. For probability distributions, it is more natural to use the total variation distance, or L1 norm. In the L1 norm, we get ||x^(t) − u||_1 = 1 · |x^(t) − u| ≤ √n (λ/d)^t by Cauchy-Schwarz.

Proof. Let x^(0) = u + v, where u = (1/n)1. Since x^(0) · 1 = u · 1 = 1, we have v · u = 0. We have ||v||² = ||x^(0)||² − ||u||² ≤ 1. After t steps of a random walk, we have x^(t) = B^t x^(0). Since Bu = u, the difference between x^(t) and u can be bounded by

  ||x^(t) − u||² = ||B^t(x^(0) − u)||² = ||B^t v||² ≤ (λ/d)^{2t} ||v||² ≤ (λ/d)^{2t}.

Thus, a random walk on an expander (where λ/d is small) converges very quickly to the uniform distribution. Now, we prove an even stronger statement. Given that the distribution is already uniform (or very close to uniform), we are interested in how often the random walk visits a certain set S ⊂ V.
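The convergence rate of Theorem 1 can be checked numerically; here is a minimal sketch (not part of the lecture) on the complete graph K_4, where d = 3 and all non-trivial eigenvalues equal −1, so λ/d = 1/3 and the walk should mix at rate (1/3)^t.

```python
# Check ||x^(t) - u||_2 <= (lambda/d)^t on K_4, where lambda/d = 1/3.

def step(x, neighbors):
    """One step of the walk: y = Bx with B = (1/d) A."""
    y = [0.0] * len(x)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            y[j] += x[i] / len(nbrs)
    return y

def l2_dist_from_uniform(x):
    n = len(x)
    return sum((xi - 1.0 / n) ** 2 for xi in x) ** 0.5

# K_4: every vertex adjacent to the other three
k4 = [[j for j in range(4) if j != i] for i in range(4)]

x = [1.0, 0.0, 0.0, 0.0]   # start concentrated on vertex 0
for t in range(1, 8):
    x = step(x, k4)
    # Theorem 1: ||x^(t) - u||_2 <= (1/3)^t (small tolerance for rounding)
    assert l2_dist_from_uniform(x) <= (1.0 / 3.0) ** t + 1e-12
```

In fact, on K_4 the distance equals ||x^(0) − u||·(1/3)^t exactly, since the entire orthogonal complement of u is an eigenspace of B with eigenvalue −1/3.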
Although in each particular step the probability is σ = |S|/|V|, this might not be the case for a sequence of steps (which are certainly not independent). Nonetheless, we prove that the random walk behaves almost as if successive locations were independent.

Lemma 2. Let (v_0, v_1, ...) be a random walk, where v_0 has the uniform distribution. Let S ⊂ V be a subset of vertices of size |S| = σ|V|. Then

  Pr[v_0 ∈ S & v_1 ∈ S & ... & v_t ∈ S] ≤ (σ + λ/d)^t.

Note that for t independent samples, the probability would have been σ^t. If λ/d ≪ σ, we are not making a huge error by imagining that the vertices v_0, ..., v_t are in fact independent.

Proof. Let P be the projection operator corresponding to S, i.e., Px makes all coordinates outside of S zero and leaves the coordinates in S intact. Then we can write Pr[v_0 ∈ S] = ||Pu||_1, Pr[v_0 ∈ S & v_1 ∈ S] = ||PBPu||_1, etc., and

  Pr[v_0 ∈ S & v_1 ∈ S & ... & v_t ∈ S] = ||(PBP)^t u||_1.

Thus the goal is to analyze the operator PBP. The idea is that B shrinks all the components of a vector except the component parallel to 1, and P shrinks the component parallel to 1. More formally, for any vector x, write Px = αu + v, where u = (1/n)1 and v ⊥ u. Observe that

  α = (u^T P x)/(u^T u) = (x^T P u)/(1/n) ≤ n ||x||_2 · ||Pu||_2 = √(σn) ||x||_2,

using Cauchy-Schwarz and ||Pu||_2 = √(σ/n). We have PBPx = αPBu + PBv, and by the triangle inequality,

  ||PBPx||_2 ≤ ||αPBu||_2 + ||PBv||_2 ≤ α||Pu||_2 + (λ/d)||v||_2.

Again, we use ||Pu||_2 = √(σ/n) and α ≤ √(σn)||x||_2, together with ||v||_2 ≤ ||Px||_2 ≤ ||x||_2, which gives

  ||PBPx||_2 ≤ σ||x||_2 + (λ/d)||v||_2 ≤ (σ + λ/d)||x||_2.

Iterating t times, we get

  ||(PBP)^t u||_2 ≤ (σ + λ/d)^t ||u||_2.

Finally, we return to the L1 norm. We obtain

  ||(PBP)^t u||_1 ≤ √n ||(PBP)^t u||_2 ≤ √n (σ + λ/d)^t ||u||_2 = (σ + λ/d)^t.

2 Application to randomized algorithms

Suppose we have a randomized algorithm which uses r random coin flips and returns an answer, YES or NO.
Let us assume that the algorithm is "safe" in the sense that if the correct answer is YES, our algorithm always returns YES. However, if the correct answer is NO, our algorithm is allowed to make a mistake with probability 1/2; in other words, it will answer YES with probability at most 1/2. This might not be very satisfying, but we can make the probability of error much smaller by running the algorithm several times. It is easy to see that if we run the algorithm t times and answer YES if and only if the algorithm returned YES every time, we make an error with probability at most 1/2^t. However, to implement this we need tr random coin flips. Sometimes true randomness is a scarce resource, and we would like to reduce the number of coin flips that we really need. Here is how we can do this using expanders.

• Let G be a d-regular expander with λ/d ≤ 1/4, on the vertex set V = {0, 1}^r. I.e., each vertex corresponds to an outcome of r coin flips.
• Using r random coin flips, generate a random vertex v_0 ∈ V.
• Perform a random walk on G of length t, in each step using log d random bits to find a random neighbor v_{i+1} of v_i.
• Run the algorithm t + 1 times, with random bits corresponding to v_0, v_1, ..., v_t.
• Answer YES if and only if the algorithm always returned YES.

Observe that the total number of random bits needed here is r + t log d, much smaller than rt for constant d (such as d = 8). It remains to analyze the probability of error. Assume that the true answer is NO, and let S be the set of vertices such that the algorithm with the respective random bits makes a mistake, i.e., returns YES. Since the true answer is NO, we know that |S| ≤ (1/2)|V|. By Lemma 2 with σ = 1/2 and λ/d ≤ 1/4, we get

  Pr[v_0 ∈ S & v_1 ∈ S & ... & v_t ∈ S] ≤ (3/4)^t.

I.e., the probability of error decreases exponentially in t.
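To see the savings concretely, here is a short back-of-the-envelope calculation in Python (a sketch with illustrative parameters, not from the lecture: r = 100 bits per run and d = 8 are assumptions). It compares the random-bit budgets of the two schemes for the same target error probability.

```python
import math

def bits_independent(r, t):
    """t independent runs: error <= 2^-t, costs t*r random bits."""
    return t * r

def bits_expander_walk(r, t, d):
    """Walk of length t on a d-regular expander with lambda/d <= 1/4:
    error <= (3/4)^t, costs r + t*log2(d) random bits."""
    return r + t * round(math.log2(d))

r, d = 100, 8   # illustrative: r = 100 bits per run, d = 8 (log d = 3)

# Target error 2^-40:
t_ind = 40                                  # (1/2)^40 = 2^-40
t_walk = math.ceil(40 / math.log2(4 / 3))   # smallest t with (3/4)^t <= 2^-40
assert t_walk == 97

# The walk needs ~2.4x more runs for the same error, but each extra run
# costs only log d = 3 bits instead of a fresh r = 100 bits.
assert bits_independent(r, t_ind) == 4000
assert bits_expander_walk(r, t_walk, d) == 391
```

The design point is exactly the trade-off in the lecture: amplification by an expander walk pays a constant-factor penalty in the number of repetitions (the base of the exponential drops from 1/2 to 3/4) in exchange for reducing the randomness cost per repetition from r bits to log d bits.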