MATH41112/61112 Ergodic Theory
Charles Walkden
6th February, 2015

Contents

0 Preliminaries
1 An introduction to ergodic theory. Uniform distribution of real sequences
2 More on uniform distribution mod 1. Measure spaces.
3 Lebesgue integration. Invariant measures
4 More examples of invariant measures
5 Ergodic measures: definition, criteria, and basic examples
6 Ergodic measures: Using the Hahn-Kolmogorov Extension Theorem to prove ergodicity
7 Continuous transformations on compact metric spaces
8 Ergodic measures for continuous transformations
9 Recurrence
10 Birkhoff’s Ergodic Theorem
11 Applications of Birkhoff’s Ergodic Theorem
12 Solutions to the Exercises

0. Preliminaries

§0.1 Contact details

The lecturer is Dr Charles Walkden, Room 2.241, Tel: 0161 275 5805, Email: [email protected]. My office hour is: Thursday, 11:30am-12:30pm. If you want to see me at another time then please email me first to arrange a mutually convenient time.

§0.2 Course structure

This is a reading course, supported by one lecture per week. I have split the notes into weekly sections. You are expected to have read through the material before the lecture, and then go over it again afterwards in your own time.

In the lectures I will highlight the most important parts, explain the statements of the theorems and what they mean in practice, and point out common misunderstandings. As a general rule, I will not normally go through the proofs in great detail (but they are examinable unless indicated otherwise). You will be expected to work through the proofs yourself in your own time.

All the material in the notes is examinable, unless it says otherwise. (Note that if a proof is marked ‘not examinable’ then it means that I won’t expect you to reproduce the proof, but you will be expected to know and understand the statement of the result.
If an entire section is marked ‘not examinable’ (for example, the review of Riemann integration in §1.3, and the discussions on the proofs of von Neumann’s Ergodic Theorem and Birkhoff’s Ergodic Theorem in §§9.6, 10.5, respectively) then you don’t need to know the statements of any subsidiary lemmas/propositions in those sections that are not used elsewhere, but reading this material may help your understanding.)

Each section of the notes contains exercises. The exercises are a key part of the course and you are expected to attempt them. The solutions to the exercises are contained in the notes; I would strongly recommend attempting the exercises first without referring to the solutions.

Please point out any mistakes (typographical or mathematical) in the notes.

§0.3 The exam

The exam is a 3 hour written exam. There are several past exam papers on the course website. Note that some topics (for example, entropy) were covered in previous years and are not covered this year; there are also some new topics (for example, Kac’s Lemma) that were not covered in 2010 or earlier.

The format of the exam is the same as last year’s. The exam has 5 questions, of which you must do 4. If you attempt all 5 questions then you will get credit for your best 4 answers. The style of the questions is similar to last year’s exam as well as to ‘Section B’ questions from earlier years.

Each question is worth 30 marks. Thus the total number of marks available on the exam is 4 × 30 = 120. This will then be converted to a mark out of 100 (by multiplying by 100/120).

There is no coursework, in-class test or mid-term for this course.

§0.4 Recommended texts

There are several suitable introductory texts on ergodic theory, including

W. Parry, Topics in Ergodic Theory
P. Walters, An Introduction to Ergodic Theory
I.P. Cornfeld, S.V. Fomin and Ya.G. Sinai, Ergodic Theory
K. Petersen, Ergodic Theory
M. Einsiedler and T.
Ward, Ergodic Theory: With a View Towards Number Theory.

The books by Parry and Walters are the most suitable for this course.

1. An introduction to ergodic theory. Uniform distribution of real sequences

§1.1 Introduction

A dynamical system consists of a space $X$, often called a phase space, and a rule that determines how points in $X$ evolve in time. Time can be either continuous (in which case a dynamical system is given by a first order autonomous differential equation) or discrete (in which case we are studying the iterates of a single map $T : X \to X$). We will only consider the case of discrete time in this course.

We will write $T^n = T \circ \cdots \circ T$ ($n$ times) to denote the $n$th composition of $T$. If $x \in X$ then we can think of $T^n(x)$, the result of applying the map $T$ $n$ times to the point $x$, as being where $x$ has moved to after time $n$. We call the sequence $x, T(x), T^2(x), \ldots, T^n(x), \ldots$ the orbit of $x$. If $T$ is invertible (and so we can iterate backwards by repeatedly applying $T^{-1}$) then sometimes we refer to the doubly-infinite sequence $\ldots, T^{-n}(x), \ldots, T^{-1}(x), x, T(x), \ldots, T^n(x), \ldots$ as the orbit of $x$ and the sequence $x, T(x), \ldots, T^n(x), \ldots$ as the forward orbit of $x$.

As an example, consider the map $T : [0, 1] \to [0, 1]$ defined by
\[
T(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1/2, \\ 2x - 1 & \text{if } 1/2 < x \le 1. \end{cases}
\]
We call this the doubling map.

Some orbits for the doubling map are periodic, i.e. they return to where they started after a finite number of iterations. For example, $2/5$ is periodic as
\[
T(2/5) = 4/5, \quad T(4/5) = 3/5, \quad T(3/5) = 1/5, \quad T(1/5) = 2/5.
\]
Thus $T^4(2/5) = 2/5$. We say that $2/5$ has period 4. In general, for a dynamical system $T : X \to X$, a point $x \in X$ is a periodic point with period $n > 0$ if $T^n(x) = x$. (Note that we do not assume that $n$ is least.) If $x$ is a periodic point of period $n$ then we call $\{x, T(x), \ldots, T^{n-1}(x)\}$ a periodic orbit of period $n$.

Other points for the doubling map may have a dense orbit in $[0, 1]$. Recall that a set $Y$ is dense in $[0, 1]$ if any point in $[0, 1]$ can be arbitrarily well approximated by a point in $Y$. Thus the orbit of $x$ is dense in $[0, 1]$ if: for all $x' \in [0, 1]$ and for all $\varepsilon > 0$ there exists $n > 0$ such that $|T^n(x) - x'| < \varepsilon$.

Consider a subinterval $[a, b] \subset [0, 1]$. How frequently does an orbit of a point under the doubling map visit the interval $[a, b]$? Define the characteristic function $\chi_B$ of a set $B$ by
\[
\chi_B(x) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases}
\]
Then
\[
\sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x))
\]
denotes the number of the first $n$ points in the orbit of $x$ that lie in $[a, b]$. Hence
\[
\frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x))
\]
denotes the proportion of the first $n$ points in the orbit of $x$ that lie in $[a, b]$. Hence
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x))
\]
denotes the frequency with which the orbit of $x$ lies in $[a, b]$. In ergodic theory, one wants to understand when this is equal to the ‘size’ of the interval $[a, b]$ (we will make ‘size’ precise later by using measure theory; for the moment, ‘size’ = ‘length’). That is, when does
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x)) = b - a \tag{1.1.1}
\]
hold for every interval $[a, b]$? Note that if $x$ satisfies (1.1.1) then the proportion of time that the orbit of $x$ spends in an interval $[a, b]$ is equal to the length of that interval; i.e. the orbit of $x$ is equidistributed in $[0, 1]$ and does not favour one region of $[0, 1]$ over another.

In general, one cannot expect (1.1.1) to hold for every point $x$; indeed, if $x$ is periodic then (1.1.1) does not hold. Even if the orbit of $x$ is dense, (1.1.1) may not hold. However, one might expect (1.1.1) to hold for ‘typical’ points $x \in X$ (where again we can make ‘typical’ precise using measure theory). One might also want to replace the function $\chi_{[a,b]}$ with an arbitrary function $f : X \to \mathbb{R}$.
In this case one would want to ask: for the doubling map $T$, when is it the case that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \int_0^1 f(x)\,dx?
\]

The goal of the course is to understand the statement of the following result, to prove it, and to explain how to apply it.

Theorem 1.1.1 (Birkhoff’s Ergodic Theorem)
Let $(X, \mathcal{B}, \mu)$ be a probability space. Let $f \in L^1(X, \mathcal{B}, \mu)$ be an integrable function. Suppose that $T : X \to X$ is an ergodic measure-preserving transformation of $X$. Then
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \int f \, d\mu
\]
for $\mu$-a.e. point $x \in X$.

Ergodic theory has many applications to other areas of mathematics, notably hyperbolic geometry, number theory, fractal geometry, and mathematical physics. We shall see some of the (simpler) applications to number theory throughout the course.

§1.2 Uniform distribution mod 1

Let $T : X \to X$ be a dynamical system. In ergodic theory we are interested in the long-term distributional behaviour of the sequence of points $x, T(x), T^2(x), \ldots$. Before studying this problem, we consider an analogous problem in the context of sequences of real numbers.

Let $x_n \in \mathbb{R}$ be a sequence of real numbers. We may decompose $x_n$ as the sum of its integer part $[x_n] = \sup\{m \in \mathbb{Z} \mid m \le x_n\}$ (i.e. the largest integer which is less than or equal to $x_n$) and its fractional part $\{x_n\} = x_n - [x_n]$. Clearly, $0 \le \{x_n\} < 1$. The study of $x_n$ mod 1 is the study of the sequence $\{x_n\}$ in $[0, 1]$.

Definition. We say that the sequence $x_n$ is uniformly distributed mod 1 (udm1 for short) if for every $a, b$ with $0 \le a < b \le 1$, we have that
\[
\lim_{n \to \infty} \frac{1}{n} \operatorname{card}\{0 \le j \le n - 1 \mid \{x_j\} \in [a, b]\} = b - a.
\]

Remarks.
(i) Here, card denotes the cardinality of a set.
(ii) Thus $x_n$ is uniformly distributed mod 1 if, given any interval $[a, b] \subset [0, 1]$, the frequency with which the fractional parts of $x_n$ lie in the interval $[a, b]$ is equal to its length, $b - a$.
(iii) We can replace $[a, b]$ by $[a, b)$, $(a, b]$ or $(a, b)$ without altering the definition.
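The definition can be explored numerically. The following is a minimal sketch and not part of the original notes: the function names, the interface (a sequence passed as a function $j \mapsto x_j$), the test sequence $x_n = \sqrt{2}\,n$ (treated in §1.2.1) and the sample sizes are all illustrative assumptions, and floating-point arithmetic only approximates the fractional parts. It computes, for finite $n$, the proportion appearing in the definition.

```python
import math


def fractional_part(x):
    """Return {x} = x - [x], which lies in [0, 1)."""
    return x - math.floor(x)


def visit_frequency(sequence, n, a, b):
    """Proportion of j in {0, ..., n-1} with {x_j} in [a, b].

    `sequence` is a function j -> x_j (a hypothetical interface chosen here).
    """
    hits = sum(1 for j in range(n) if a <= fractional_part(sequence(j)) <= b)
    return hits / n


if __name__ == "__main__":
    # Illustrative choice: x_n = sqrt(2) * n, compared with the length b - a.
    for n in (100, 1_000, 10_000):
        freq = visit_frequency(lambda j: math.sqrt(2) * j, n, 0.2, 0.5)
        print(n, freq, "target length:", 0.5 - 0.2)
```

A finite computation of this kind can suggest, but never establish, uniform distribution mod 1, since the definition concerns the limit as $n \to \infty$ for every interval.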
The following result gives a necessary and sufficient condition for the sequence $x_n \in \mathbb{R}$ to be uniformly distributed mod 1.

Theorem 1.2.1 (Weyl’s Criterion)
The following are equivalent:
(i) the sequence $x_n \in \mathbb{R}$ is uniformly distributed mod 1;
(ii) for every continuous function $f : [0, 1] \to \mathbb{R}$ with $f(0) = f(1)$ we have
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = \int_0^1 f(x)\,dx; \tag{1.2.1}
\]
(iii) for each $\ell \in \mathbb{Z} \setminus \{0\}$ we have
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0.
\]

Remarks.
(i) As a grammatical point, criterion is singular (the plural is criteria). Weyl’s criterion is that (i) and (iii) are equivalent. Statement (ii) has been included because it is an important intermediate step in the proof and, as we shall see, it closely resembles an ergodic theorem.
(ii) One can replace the hypothesis that $f$ is continuous in (1.2.1) with the hypothesis that $f$ is Riemann integrable.
(iii) To prove that (i) is equivalent to (iii) we work, in fact, not on the unit interval $[0, 1]$ but on the unit circle $\mathbb{R}/\mathbb{Z}$. To form $\mathbb{R}/\mathbb{Z}$, we work with real numbers modulo the integers (informally: we ignore integer parts). Note that ignoring integer parts means that 0 and 1 in $[0, 1]$ are ‘the same’. Thus the end-points of the unit interval ‘join up’ and we see that $\mathbb{R}/\mathbb{Z}$ is a circle. More formally, $\mathbb{R}$ is an additive group, $\mathbb{Z}$ is a subgroup and the quotient group $\mathbb{R}/\mathbb{Z}$ is, topologically, a circle. Note that the requirement in (ii) that $f(0) = f(1)$ means that $f : [0, 1] \to \mathbb{R}$ is a well-defined function on the circle $\mathbb{R}/\mathbb{Z}$. It is, however, the case that (i) is equivalent to (ii) without the hypothesis in (ii) that $f(0) = f(1)$.

§1.2.1 The sequence $x_n = \alpha n$

The behaviour of the sequence $x_n = \alpha n$ depends on whether $\alpha$ is rational or irrational. If $\alpha \in \mathbb{Q}$ then it is easy to see that $\{\alpha n\}$ can take on only finitely many values in $[0, 1]$. Indeed, if $\alpha = p/q$ ($p \in \mathbb{Z}$, $q \in \mathbb{Z}$, $q \ne 0$, $\operatorname{hcf}(p, q) = 1$) then $\{\alpha n\}$ takes the $q$ values
\[
0 = \frac{0}{q}, \ \left\{\frac{p}{q}\right\}, \ \left\{\frac{2p}{q}\right\}, \ \ldots, \ \left\{\frac{(q-1)p}{q}\right\}
\]
as $\{qp/q\} = 0$.
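The finiteness claim for rational $\alpha$ can be checked in exact arithmetic. This is a minimal sketch, not part of the original notes; the function names and the sample values $\alpha = 3/7$ and $\alpha = 5/4$ are assumptions made here for illustration. Using `fractions.Fraction` avoids the rounding errors that floating-point fractional parts would introduce.

```python
import math
from fractions import Fraction


def fractional_part(x):
    """Return {x} = x - [x] exactly when x is a Fraction."""
    return x - math.floor(x)


def distinct_values(alpha, n_terms):
    """Set of values {alpha * n} for n = 0, ..., n_terms - 1."""
    return {fractional_part(alpha * n) for n in range(n_terms)}


if __name__ == "__main__":
    # alpha = p/q in lowest terms: only q distinct fractional parts appear,
    # however many terms of the sequence are taken.
    for alpha in (Fraction(3, 7), Fraction(5, 4)):
        values = distinct_values(alpha, 1_000)
        print(alpha, "->", len(values), "values:", sorted(values))
```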
In particular, $\alpha n$ is not uniformly distributed mod 1.

If $\alpha \notin \mathbb{Q}$ then the situation is completely different. We shall show that $\alpha n$ is uniformly distributed mod 1 by applying Weyl’s Criterion. Let $\ell \in \mathbb{Z} \setminus \{0\}$. As $\alpha \notin \mathbb{Q}$ we have that $\ell\alpha$ is never an integer; hence $e^{2\pi i \ell \alpha} \ne 1$. Note that
\[
\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell \alpha j} = \frac{1}{n} \, \frac{e^{2\pi i \ell \alpha n} - 1}{e^{2\pi i \ell \alpha} - 1}
\]
by summing the geometric progression. Hence
\[
\left| \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| = \frac{1}{n} \, \frac{|e^{2\pi i \ell \alpha n} - 1|}{|e^{2\pi i \ell \alpha} - 1|} \le \frac{1}{n} \, \frac{2}{|e^{2\pi i \ell \alpha} - 1|}. \tag{1.2.2}
\]
As $\alpha \notin \mathbb{Q}$, the denominator in the right-hand side of (1.2.2) is not 0. Letting $n \to \infty$ we see that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0.
\]
Hence $x_n$ is uniformly distributed mod 1.

Remarks.
1. More generally, we could consider the sequence $x_n = \alpha n + \beta$. It is easy to see by modifying the above argument that $x_n$ is uniformly distributed mod 1 if and only if $\alpha$ is irrational. (See Exercise 1.2.)
2. Fix $\alpha > 1$ and consider the sequence $x_n = \alpha^n x$. Then it is possible to show that for (Lebesgue) almost every $x \in \mathbb{R}$, the sequence $x_n$ is uniformly distributed mod 1. We will prove this, at least for the cases when $\alpha = 2, 3, 4, \ldots$.
3. Suppose we set $x = 1$ in the above remark and consider the sequence $x_n = \alpha^n$. Then one can show that $x_n$ is uniformly distributed mod 1 for almost every $\alpha > 1$. However, not a single example of such an $\alpha$ is known! Indeed, it is not even known if $(3/2)^n$ is dense mod 1.

§1.2.2 Proof of Weyl’s Criterion

We prove (i) implies (ii). Suppose that the sequence $x_n \in \mathbb{R}$ is uniformly distributed mod 1. If $\chi_{[a,b]}$ is the characteristic function of the interval $[a, b]$, then we may rewrite the definition of uniform distribution mod 1 as
\[
\frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) \to \int_0^1 \chi_{[a,b]}(x)\,dx, \quad \text{as } n \to \infty.
\]
From this we deduce that
\[
\frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) \to \int_0^1 g(x)\,dx, \quad \text{as } n \to \infty,
\]
whenever $g$ is a step function, i.e., when $g(x) = \sum_{k=1}^{m} c_k \chi_{[a_k, b_k]}(x)$ is a finite linear combination of characteristic functions of intervals.
Now let $f$ be a continuous function on $[0, 1]$. Then, given $\varepsilon > 0$, we can find a step function $g$ with $\|f - g\|_\infty \le \varepsilon$. We have the estimate
\begin{align*}
\left| \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) - \int_0^1 f(x)\,dx \right|
&\le \left| \frac{1}{n} \sum_{j=0}^{n-1} \bigl( f(\{x_j\}) - g(\{x_j\}) \bigr) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right| \\
&\qquad + \left| \int_0^1 g(x)\,dx - \int_0^1 f(x)\,dx \right| \\
&\le \frac{1}{n} \sum_{j=0}^{n-1} \bigl| f(\{x_j\}) - g(\{x_j\}) \bigr| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right| \\
&\qquad + \int_0^1 |g(x) - f(x)|\,dx \\
&\le 2\varepsilon + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right|.
\end{align*}
Since the last term converges to zero as $n \to \infty$, we obtain
\[
\limsup_{n \to \infty} \left| \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) - \int_0^1 f(x)\,dx \right| \le 2\varepsilon.
\]
Since $\varepsilon > 0$ is arbitrary, this gives us that
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) \to \int_0^1 f(x)\,dx
\]
as $n \to \infty$.

We now prove (ii) implies (iii). Suppose that $f : [0, 1] \to \mathbb{C}$ is continuous and $f(0) = f(1)$. By writing $f = \operatorname{Re} f + i \operatorname{Im} f$ and applying (ii) to the real and imaginary parts of $f$ we have that
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) \to \int_0^1 f(x)\,dx
\]
as $n \to \infty$. For $\ell \in \mathbb{Z}$, $\ell \ne 0$, we let $f(x) = e^{2\pi i \ell x}$. Note that, as exp is $2\pi i$-periodic, $f(\{x_j\}) = e^{2\pi i \ell x_j}$. Hence
\[
\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \to \int_0^1 e^{2\pi i \ell x}\,dx = \left[ \frac{1}{2\pi i \ell} e^{2\pi i \ell x} \right]_0^1 = 0
\]
as $n \to \infty$, as $\ell \ne 0$.

We prove (iii) implies (i). Suppose that (iii) holds. Then
\[
\frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) \to \int_0^1 g(x)\,dx, \quad \text{as } n \to \infty,
\]
whenever $g(x) = \sum_{k=1}^{m} c_k e^{2\pi i \ell_k x}$, $c_k \in \mathbb{C}$, is a trigonometric polynomial, i.e. a finite linear combination of exponential functions.

Note that the space $C(X, \mathbb{C})$ is a vector space: if $f, g \in C(X, \mathbb{C})$ then $f + g \in C(X, \mathbb{C})$ and if $f \in C(X, \mathbb{C})$, $\lambda \in \mathbb{C}$ then $\lambda f \in C(X, \mathbb{C})$. A linear subspace $S \subset C(X, \mathbb{C})$ is an algebra if whenever $f, g \in S$ then $fg \in S$.

We will need the following result:

Theorem 1.2.2 (Stone-Weierstrass Theorem)
Let $X$ be a compact metric space and let $C(X, \mathbb{C})$ denote the space of continuous functions defined on $X$. Suppose that $S \subset C(X, \mathbb{C})$ is an algebra of continuous functions such that
(i) if $f \in S$ then $\bar{f} \in S$,
(ii) $S$ separates the points of $X$, i.e.
for all $x, y \in X$, $x \ne y$, there exists $f \in S$ such that $f(x) \ne f(y)$,
(iii) for every $x \in X$ there exists $f \in S$ such that $f(x) \ne 0$.
Then $S$ is uniformly dense in $C(X, \mathbb{C})$, i.e. for all $f \in C(X, \mathbb{C})$ and all $\varepsilon > 0$, there exists $g \in S$ such that $\|f - g\|_\infty = \sup_{x \in X} |f(x) - g(x)| < \varepsilon$.

We shall apply the Stone-Weierstrass Theorem with $S$ given by the set of trigonometric polynomials. It is easy to see that $S$ satisfies the hypotheses of Theorem 1.2.2. Let $f$ be any continuous function on $[0, 1]$ with $f(0) = f(1)$. Given $\varepsilon > 0$ we can find a trigonometric polynomial $g$ such that $\|f - g\|_\infty \le \varepsilon$. As in the first part of the proof, we can conclude that
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) \to \int_0^1 f(x)\,dx, \quad \text{as } n \to \infty.
\]

Now consider the interval $[a, b] \subset [0, 1]$. Given $\varepsilon > 0$, we can find continuous functions $f_1$ and $f_2$ (with $f_1(0) = f_1(1)$, $f_2(0) = f_2(1)$) such that $f_1 \le \chi_{[a,b]} \le f_2$ and
\[
\int_0^1 f_2(x) - f_1(x)\,dx \le \varepsilon.
\]
We then have that
\begin{align*}
\liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\})
&\ge \liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f_1(\{x_j\}) = \int_0^1 f_1(x)\,dx \\
&\ge \int_0^1 f_2(x)\,dx - \varepsilon \ge \int_0^1 \chi_{[a,b]}(x)\,dx - \varepsilon
\end{align*}
and
\begin{align*}
\limsup_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\})
&\le \limsup_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f_2(\{x_j\}) = \int_0^1 f_2(x)\,dx \\
&\le \int_0^1 f_1(x)\,dx + \varepsilon \le \int_0^1 \chi_{[a,b]}(x)\,dx + \varepsilon.
\end{align*}
Since $\varepsilon > 0$ is arbitrary, we have shown that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) = \int_0^1 \chi_{[a,b]}(x)\,dx = b - a,
\]
so that $x_n$ is uniformly distributed mod 1. □

§1.2.3 Exercises

Exercise 1.1
Show that if $x_n$ is uniformly distributed mod 1 then $\{x_n\}$ is dense in $[0, 1]$. (The converse is not true.)

Exercise 1.2
Let $\alpha, \beta \in \mathbb{R}$. Let $x_n = \alpha n + \beta$. Show that $x_n$ is uniformly distributed mod 1 if and only if $\alpha \notin \mathbb{Q}$.

Exercise 1.3
(i) Prove that $\log_{10} 2$ is irrational.
(ii) The leading digit of an integer is the left-most digit of its base 10 representation. (Thus the leading digit of 32 is 3, the leading digit of 1024 is 1, etc.) Show that the frequency with which $2^n$ has leading digit $r$ ($r = 1, 2, \ldots, 9$) is $\log_{10}(1 + 1/r)$.
(Hint: first show that $2^n$ has leading digit $r$ if and only if $r \cdot 10^k \le 2^n < (r + 1) \cdot 10^k$ for some $k \in \mathbb{N}$.)

Exercise 1.4
Calculate the frequency with which the penultimate leading digit of $2^n$ is equal to $r$, $r = 0, 1, 2, \ldots, 9$. (The penultimate leading digit is the second-to-leftmost digit in the base 10 expansion. The penultimate leading digit of 2048 is 0, etc.)

§1.3 Appendix: a recap on the Riemann Integral

(This subsection is included for general interest and to motivate the Lebesgue integral. Hence it is not examinable.)

You have probably already seen the construction of the Riemann integral. This gives a method for defining the integral of suitable functions defined on an interval $[a, b]$. In the next section we will see how the Lebesgue integral is a generalisation of the Riemann integral in the sense that it allows us to integrate functions defined on spaces more general than subintervals of $\mathbb{R}$. The Lebesgue integral has other nice properties, for example it is well-behaved with respect to limits. Here we give a brief exposition about some inadequacies of the Riemann integral and how they motivate the Lebesgue integral.

Let $f : [a, b] \to \mathbb{R}$ be a bounded function (for the moment we impose no other conditions on $f$). A partition $\Delta$ of $[a, b]$ is a finite set of points $\Delta = \{x_0, x_1, x_2, \ldots, x_n\}$ with
\[
a = x_0 < x_1 < x_2 < \cdots < x_n = b.
\]
In other words, we are dividing $[a, b]$ up into subintervals. We then form the upper and lower Riemann sums
\begin{align*}
U(f, \Delta) &= \sum_{i=0}^{n-1} \sup_{x \in [x_i, x_{i+1}]} f(x) \, (x_{i+1} - x_i), \\
L(f, \Delta) &= \sum_{i=0}^{n-1} \inf_{x \in [x_i, x_{i+1}]} f(x) \, (x_{i+1} - x_i).
\end{align*}
The idea is then that if we make the subintervals in the partition small, these sums will be a good approximation to our intuitive notion of the integral of $f$ over $[a, b]$ as the area bounded by the graph of $f$.
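As a minimal numerical sketch (not part of the original notes), the upper and lower sums can be computed directly in a simple case. The code below assumes that $f$ is non-decreasing on $[a, b]$, so that the supremum and infimum over each subinterval are attained at its right and left end-points; the function names, the uniform partition, and the example $f(x) = x^2$ on $[0, 1]$ are illustrative choices made here.

```python
def uniform_partition(a, b, n):
    """Partition a = x_0 < x_1 < ... < x_n = b into n equal subintervals."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def upper_lower_sums(f, partition):
    """Return (U(f, partition), L(f, partition)) for a NON-DECREASING f.

    Monotonicity is an assumption of this sketch: it lets us read off
    sup f = f(right end-point) and inf f = f(left end-point).
    """
    upper = 0.0
    lower = 0.0
    for left, right in zip(partition, partition[1:]):
        width = right - left
        upper += f(right) * width
        lower += f(left) * width
    return upper, lower


if __name__ == "__main__":
    # The two sums squeeze together as the subintervals shrink.
    for n in (10, 100, 1_000):
        upper, lower = upper_lower_sums(lambda x: x * x, uniform_partition(0.0, 1.0, n))
        print(n, lower, upper)
```

For a general bounded $f$ the supremum and infimum on each subinterval are not attained at the end-points, so this shortcut does not apply.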
More precisely, if
\[
\inf_{\Delta} U(f, \Delta) = \sup_{\Delta} L(f, \Delta),
\]
where the infimum and supremum are taken over all possible partitions of $[a, b]$, then we write
\[
\int_a^b f(x)\,dx
\]
for their common value and call it the (Riemann) integral of $f$ between those limits. We also say that $f$ is Riemann integrable.

The class of Riemann integrable functions includes continuous functions and step functions (i.e. finite linear combinations of characteristic functions of intervals). However, there are many functions for which one wishes to define an integral but which are not Riemann integrable, making the theory rather unsatisfactory. For example, define $f : [0, 1] \to \mathbb{R}$ by
\[
f(x) = \chi_{\mathbb{Q} \cap [0,1]}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{otherwise.} \end{cases}
\]
Since between any two distinct real numbers we can find both a rational number and an irrational number, given $0 \le y < z \le 1$, we can find $y < x < z$ with $f(x) = 1$ and $y < x' < z$ with $f(x') = 0$. Hence for any partition $\Delta = \{x_0, x_1, \ldots, x_n\}$ of $[0, 1]$, we have
\[
U(f, \Delta) = \sum_{i=0}^{n-1} (x_{i+1} - x_i) = 1, \qquad L(f, \Delta) = 0.
\]
Taking the infimum and supremum, respectively, over all partitions $\Delta$ shows that $f$ is not Riemann integrable.

Why does Riemann integration not work for the above function and how could we go about improving it? Let us look again at (and slightly rewrite) the formulæ for $U(f, \Delta)$ and $L(f, \Delta)$. We have
\[
U(f, \Delta) = \sum_{i=0}^{n-1} \sup_{x \in [x_i, x_{i+1}]} f(x) \, \lambda([x_i, x_{i+1}])
\]
and
\[
L(f, \Delta) = \sum_{i=0}^{n-1} \inf_{x \in [x_i, x_{i+1}]} f(x) \, \lambda([x_i, x_{i+1}]),
\]
where, for an interval $[y, z]$,
\[
\lambda([y, z]) = z - y
\]
denotes its length. In the example above, things did not work because dividing $[0, 1]$ into intervals (no matter how small) did not ‘separate out’ the different values that $f$ could take. But suppose we had a notion of ‘length’ that worked for more general sets than intervals. Then we could do better by considering more complicated ‘partitions’ of $[0, 1]$, where by partition we now mean a collection of subsets $\{E_1, \ldots, E_m\}$ of $[0, 1]$ such that $E_i \cap E_j = \emptyset$ if $i \ne j$, and $\bigcup_{i=1}^{m} E_i = [0, 1]$.

In the example, for instance, it might be reasonable to write
\begin{align*}
\int_0^1 f(x)\,dx &= 1 \times \lambda([0, 1] \cap \mathbb{Q}) + 0 \times \lambda([0, 1] \setminus \mathbb{Q}) \\
&= \lambda([0, 1] \cap \mathbb{Q}).
\end{align*}
Instead of using subintervals, the Lebesgue integral uses a much wider class of subsets (namely, sets in a given σ-algebra) together with a notion of ‘generalised length’ (namely, measure).

2. More on uniform distribution mod 1. Measure spaces

§2.1 Uniform distribution of sequences in $\mathbb{R}^k$

We shall now look at the uniform distribution of sequences in $\mathbb{R}^k$. We will say that a sequence $x_n = (x_{n,1}, \ldots, x_{n,k}) \in \mathbb{R}^k$ is uniformly distributed mod 1 if, given any $k$-dimensional cube, the frequency with which the fractional parts of $x_n$ lie in the cube is equal to its $k$-dimensional volume.

Definition. A sequence $x_n = (x_{n,1}, \ldots, x_{n,k}) \in \mathbb{R}^k$ is said to be uniformly distributed mod 1 if, for each choice of $k$ intervals $[a_1, b_1], \ldots, [a_k, b_k] \subset [0, 1]$, we have that
\[
\frac{1}{n} \operatorname{card}\bigl\{ j \in \{0, 1, \ldots, n - 1\} \mid (\{x_{j,1}\}, \ldots, \{x_{j,k}\}) \in [a_1, b_1] \times \cdots \times [a_k, b_k] \bigr\} \to (b_1 - a_1) \cdots (b_k - a_k)
\]
as $n \to \infty$.

We have the following criterion for uniform distribution.

Theorem 2.1.1 (Multi-dimensional Weyl’s Criterion)
Let $x_n = (x_{n,1}, \ldots, x_{n,k}) \in \mathbb{R}^k$. The following are equivalent:
(i) the sequence $x_n \in \mathbb{R}^k$ is uniformly distributed mod 1;
(ii) for any continuous function $f : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}$ we have
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(\{x_{j,1}\}, \ldots, \{x_{j,k}\}) \to \int \cdots \int f(x_1, \ldots, x_k)\,dx_1 \ldots dx_k;
\]
(iii) for all $\ell = (\ell_1, \ldots, \ell_k) \in \mathbb{Z}^k \setminus \{0\}$ we have
\[
\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i (\ell_1 x_{j,1} + \cdots + \ell_k x_{j,k})} \to 0
\]
as $n \to \infty$.

Remark. Here and throughout $0 \in \mathbb{Z}^k$ denotes the zero vector $(0, \ldots, 0)$.

Remark. In §1 we commented that, topologically, the quotient group $\mathbb{R}/\mathbb{Z}$ is a circle. More generally, the quotient group $\mathbb{R}^k/\mathbb{Z}^k$ is a $k$-dimensional torus.

Remark. Consider the case when $k = 2$ so that $\mathbb{R}^2/\mathbb{Z}^2$ is the 2-dimensional torus.
We can regard $\mathbb{R}^2/\mathbb{Z}^2$ as the square $[0, 1] \times [0, 1]$ with the top and bottom sides identified and the left and right sides identified. Thus a continuous function $f : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}$ has the property that $f(0, y) = f(1, y)$ and $f(x, 0) = f(x, 1)$. More generally, we can identify the $k$-dimensional torus $\mathbb{R}^k/\mathbb{Z}^k$ with $[0, 1]^k$ with
\[
(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_k) \quad \text{and} \quad (x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_k)
\]
identified, $1 \le i \le k$. A continuous function $f : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}$ then corresponds to a continuous function $f : [0, 1]^k \to \mathbb{R}$ such that
\[
f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_k) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_k)
\]
for each $i$, $1 \le i \le k$.

Proof of Theorem 2.1.1. The proof of Theorem 2.1.1 is essentially the same as in the case $k = 1$. □

§2.2 The sequence $x_n = (\alpha_1 n, \ldots, \alpha_k n)$

We shall apply Theorem 2.1.1 to the sequence $x_n = (\alpha_1 n, \ldots, \alpha_k n)$, for real numbers $\alpha_1, \ldots, \alpha_k$.

Definition. Real numbers $\beta_1, \ldots, \beta_s \in \mathbb{R}$ are said to be rationally independent if the only rationals $r_1, \ldots, r_s \in \mathbb{Q}$ such that
\[
r_1 \beta_1 + \cdots + r_s \beta_s = 0
\]
are $r_1 = \cdots = r_s = 0$.

Proposition 2.2.1
Let $\alpha_1, \ldots, \alpha_k \in \mathbb{R}$. Then the following are equivalent:
(i) the sequence $x_n = (\alpha_1 n, \ldots, \alpha_k n) \in \mathbb{R}^k$ is uniformly distributed mod 1;
(ii) $\alpha_1, \ldots, \alpha_k$ and 1 are rationally independent.

Proof. The proof is similar to the discussion in §1.2.1 and we leave it as an exercise. (See Exercise 2.1.) □

Remark. Note that in the case $k = 1$, Proposition 2.2.1 reduces to the results of §1.2.1. To see this, note that $\alpha, 1$ are rationally dependent if and only if there exist rationals $r, s$ (not both zero) such that $r\alpha + s = 0$. This holds if and only if $\alpha$ is rational. Hence $\alpha, 1$ are rationally independent if and only if $\alpha$ is irrational.

§2.3 Weyl’s Theorem on Polynomials

We have seen that $\alpha n + \beta$ is uniformly distributed mod 1 if $\alpha$ is irrational. Weyl’s Theorem generalises this to polynomials of higher degree.
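Before the formal statement, here is a minimal numerical sketch of the phenomenon; it is not part of the original notes. By Weyl’s Criterion (Theorem 1.2.1(iii)), uniform distribution mod 1 is equivalent to the averaged exponential sums tending to 0, so one can compute these averages for a quadratic sequence. The function name `weyl_average`, the sequence $\sqrt{2}\,n^2$, the rational comparison sequence $n^2/2$ and the sample sizes are assumptions chosen here for illustration, and floating-point arithmetic limits how large $n$ can sensibly be taken.

```python
import cmath
import math


def weyl_average(sequence, n, ell=1):
    """Return (1/n) * sum_{k=0}^{n-1} exp(2*pi*i * ell * x_k).

    `sequence` is a function k -> x_k (a hypothetical interface chosen here).
    The phase is reduced mod 1 before exponentiating to limit rounding error.
    """
    total = 0j
    for k in range(n):
        phase = (ell * sequence(k)) % 1.0
        total += cmath.exp(2j * math.pi * phase)
    return total / n


if __name__ == "__main__":
    # Irrational leading coefficient: the averages appear to shrink with n.
    for n in (100, 1_000, 10_000):
        print(n, abs(weyl_average(lambda k: math.sqrt(2) * k * k, n)))
    # Rational coefficients: with x_k = k^2 / 2 and ell = 2 every term equals 1.
    print(abs(weyl_average(lambda k: 0.5 * k * k, 1_000, ell=2)))
```

Small averages for a few values of $n$ and $\ell$ are only suggestive; the criterion requires the limit to be 0 for every non-zero integer $\ell$.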
Write $p(n) = \alpha_k n^k + \alpha_{k-1} n^{k-1} + \cdots + \alpha_1 n + \alpha_0$.

Theorem 2.3.1 (Weyl’s Theorem on Polynomials)
If any one of $\alpha_1, \ldots, \alpha_k$ is irrational then $p(n)$ is uniformly distributed mod 1.

To prove this theorem we shall need the following technical result.

Lemma 2.3.2 (van der Corput’s Inequality)
Let $z_0, \ldots, z_{n-1} \in \mathbb{C}$ and let $1 < m < n$. Then
\[
m^2 \left| \sum_{j=0}^{n-1} z_j \right|^2 \le m(n + m - 1) \sum_{j=0}^{n-1} |z_j|^2 + 2(n + m - 1) \operatorname{Re} \sum_{j=1}^{m-1} (m - j) \sum_{i=0}^{n-1-j} z_{i+j} \bar{z}_i.
\]

Proof (not examinable). The proof is essentially an exercise in multiplying out a product and some careful book-keeping of the cross-terms. You are familiar with a particular case of it, namely the fact that
\begin{align*}
|z_0 + z_1|^2 &= (z_0 + z_1)(\bar{z}_0 + \bar{z}_1) \\
&= |z_0|^2 + |z_1|^2 + z_0 \bar{z}_1 + \bar{z}_0 z_1 \\
&= |z_0|^2 + |z_1|^2 + 2 \operatorname{Re}(z_0 \bar{z}_1).
\end{align*}
Construct the following parallelogram:
\[
\begin{array}{ccccccccc}
z_0 \\
z_0 & z_1 \\
z_0 & z_1 & z_2 \\
\vdots & \vdots & \vdots & \ddots \\
z_0 & z_1 & z_2 & \cdots & z_{m-1} \\
 & z_1 & z_2 & \cdots & z_{m-1} & z_m \\
 & & z_2 & \cdots & z_{m-1} & z_m & z_{m+1} \\
 & & & \ddots & & & & \ddots \\
 & & & & z_{n-m} & \cdots & \cdots & \cdots & z_{n-1} \\
 & & & & & z_{n-m+1} & \cdots & \cdots & z_{n-1} \\
 & & & & & & \ddots & & \vdots \\
 & & & & & & & z_{n-2} & z_{n-1} \\
 & & & & & & & & z_{n-1}
\end{array}
\]
(There are $n$ columns, with each column containing $m$ terms, and $n + m - 1$ rows.) Let $s_j$, $0 \le j \le n + m - 2$, denote the sum of the terms in the $j$th row. Each $z_i$ occurs in exactly $m$ of the row sums $s_j$. Hence
\[
s_0 + \cdots + s_{n+m-2} = m(z_0 + \cdots + z_{n-1})
\]
so that
\begin{align*}
m^2 \left| \sum_{j=0}^{n-1} z_j \right|^2 &= |s_0 + \cdots + s_{n+m-2}|^2 \\
&\le (|s_0| + \cdots + |s_{n+m-2}|)^2 \\
&\le (n + m - 1)(|s_0|^2 + \cdots + |s_{n+m-2}|^2),
\end{align*}
where the final inequality follows from the $(n + m - 1)$-dimensional Cauchy-Schwarz inequality.

Recall that $|s_j|^2 = s_j \bar{s}_j$. Expanding out this product and recalling that $2 \operatorname{Re}(z) = z + \bar{z}$ we have that
\[
|s_j|^2 = \sum_{k} |z_k|^2 + 2 \operatorname{Re} \sum_{k, \ell} z_k \bar{z}_\ell
\]
where the first sum is over all indices $k$ of the $z_i$ occurring in the definition of $s_j$, and the second sum is over the indices $\ell < k$ of the $z_i$ occurring in the definition of $s_j$.
Noting that the number of times the term $z_k \bar{z}_\ell$ (with $\ell < k$) occurs in $|s_0|^2 + \cdots + |s_{n+m-2}|^2$ is equal to $m - (k - \ell)$, we can write
\[
|s_0|^2 + \cdots + |s_{n+m-2}|^2 \le m \sum_{j=0}^{n-1} |z_j|^2 + 2 \operatorname{Re} \sum_{j=1}^{m-1} (m - j) \sum_{i=0}^{n-1-j} z_{i+j} \bar{z}_i
\]
and the result follows. □

Let $x_n \in \mathbb{R}$. For each $m \ge 1$ define the sequence $x_n^{(m)} = x_{n+m} - x_n$ to be the sequence of $m$th differences. The following lemma allows us to infer the uniform distribution of the sequence $x_n$ if we know the uniform distribution of each of the $m$th differences of $x_n$.

Lemma 2.3.3
Let $x_n \in \mathbb{R}$ be a sequence. Suppose that for each $m \ge 1$ the sequence $x_n^{(m)}$ of $m$th differences is uniformly distributed mod 1. Then $x_n$ is uniformly distributed mod 1.

Proof. We shall apply Weyl’s Criterion. We need to show that if $\ell \in \mathbb{Z} \setminus \{0\}$ then
\[
\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \to 0, \quad \text{as } n \to \infty.
\]
Let $z_j = e^{2\pi i \ell x_j}$ for $j = 0, \ldots, n - 1$. Note that $|z_j| = 1$. Let $1 < m < n$. By van der Corput’s inequality,
\begin{align*}
\frac{m^2}{n^2} \left| \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right|^2
&\le \frac{m}{n^2} (n + m - 1) n + \frac{2(n + m - 1)}{n} \operatorname{Re} \sum_{j=1}^{m-1} \frac{(m - j)}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell (x_{i+j} - x_i)} \\
&= \frac{m}{n} (m + n - 1) + \frac{2(n + m - 1)}{n} \operatorname{Re} \sum_{j=1}^{m-1} (m - j) A_{n,j}
\end{align*}
where
\[
A_{n,j} = \frac{1}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell (x_{i+j} - x_i)} = \frac{1}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell x_i^{(j)}}.
\]
As the sequence $x_i^{(j)}$ of $j$th differences is uniformly distributed mod 1, by Weyl’s criterion we have that $A_{n,j} \to 0$ as $n \to \infty$ for each $j = 1, \ldots, m - 1$. Hence for each $m > 1$
\[
\limsup_{n \to \infty} \frac{m^2}{n^2} \left| \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right|^2 \le \limsup_{n \to \infty} \frac{m(n + m - 1)}{n} = m.
\]
Hence, for each $m > 1$ we have
\[
\limsup_{n \to \infty} \left| \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| \le \frac{1}{\sqrt{m}}.
\]
As $m > 1$ is arbitrary, the result follows. □

Proof of Weyl’s Theorem. We will only prove Weyl’s Theorem on Polynomials (Theorem 2.3.1) in the special case where the leading coefficient $\alpha_k$ of
\[
p(n) = \alpha_k n^k + \cdots + \alpha_1 n + \alpha_0
\]
is irrational. (The general case, where $\alpha_i$ is irrational for some $1 \le i \le k$, can be deduced easily from this special case and we leave this as an exercise. See Exercise 2.2.)
We shall use induction on the degree of $p$. Let $\Delta(k)$ denote the statement ‘for every polynomial $p$ of degree $\le k$, with irrational leading coefficient, the sequence $p(n)$ is uniformly distributed mod 1’. We know that $\Delta(1)$ is true; this follows immediately from Exercise 1.2.

Suppose that $\Delta(k - 1)$ is true. Let $p(n) = \alpha_k n^k + \cdots + \alpha_1 n + \alpha_0$ be any polynomial of degree $k$ with $\alpha_k$ irrational. Let $m \in \mathbb{N}$ and consider the sequence $p^{(m)}(n) = p(n + m) - p(n)$ of $m$th differences. We have that
\begin{align*}
p^{(m)}(n) &= p(n + m) - p(n) \\
&= \alpha_k (n + m)^k + \alpha_{k-1} (n + m)^{k-1} + \cdots + \alpha_1 (n + m) + \alpha_0 \\
&\qquad - \alpha_k n^k - \alpha_{k-1} n^{k-1} - \cdots - \alpha_1 n - \alpha_0 \\
&= \alpha_k n^k + \alpha_k k n^{k-1} m + \cdots + \alpha_{k-1} n^{k-1} + \alpha_{k-1} (k - 1) n^{k-2} m + \cdots + \alpha_1 n + \alpha_1 m + \alpha_0 \\
&\qquad - \alpha_k n^k - \alpha_{k-1} n^{k-1} - \cdots - \alpha_1 n - \alpha_0.
\end{align*}
After cancellation, we can see that, for each $m$, $p^{(m)}(n)$ is a polynomial of degree $k - 1$ with irrational leading coefficient $\alpha_k k m$. Therefore, by the inductive hypothesis, $p^{(m)}(n)$ is uniformly distributed mod 1. We may now apply Lemma 2.3.3 to conclude that $p(n)$ is uniformly distributed mod 1 and so $\Delta(k)$ holds. This completes the induction. □

§2.4 Measures and the Lebesgue integral

You may have seen the definition of Lebesgue measure, Lebesgue outer measure and the Lebesgue integral in other courses, for example in Fourier Analysis and Lebesgue Integration. The theory developed in that course is one particular example of a more general theory, which we sketch here. Measure theory is a key technical tool in ergodic theory, and so a good knowledge of measures and integration is essential for this course (although we will not need to know the (many) technical intricacies).

§2.4.1 Measure spaces

Loosely speaking, a measure is a function that, when given a subset of a space $X$, will say how ‘big’ that subset is. A motivating example is given by Lebesgue measure on $[0, 1]$. The Lebesgue measure of an interval $[a, b]$ is given by its length $b - a$.
In defining an abstract measure space, we will be taking the properties of ‘length’ (or, in higher dimensions, ‘volume’) and abstracting them, in much the same way that a metric space abstracts the properties of ‘distance’.

It turns out that in general it is not possible to define the measure of an arbitrary subset of $X$. Instead, we will usually have to restrict our attention to a class of subsets of $X$.

Definition. A collection $\mathcal{B}$ of subsets of $X$ is called a σ-algebra if the following properties hold:
(i) $\emptyset \in \mathcal{B}$,
(ii) if $E \in \mathcal{B}$ then its complement $X \setminus E \in \mathcal{B}$,
(iii) if $E_n \in \mathcal{B}$, $n = 1, 2, 3, \ldots$, is a countable sequence of sets in $\mathcal{B}$ then their union $\bigcup_{n=1}^{\infty} E_n \in \mathcal{B}$.

Definition. If $X$ is a set and $\mathcal{B}$ a σ-algebra of subsets of $X$ then we call $(X, \mathcal{B})$ a measurable space.

Examples.
1. The trivial σ-algebra is given by $\mathcal{B} = \{\emptyset, X\}$.
2. The full σ-algebra is given by $\mathcal{B} = \mathcal{P}(X)$, i.e. the collection of all subsets of $X$.

Remark. In general, the trivial σ-algebra is too small and the full σ-algebra is too big. We shall see some more interesting examples of σ-algebras later.

Here are some easy properties of σ-algebras:

Lemma 2.4.1
Let $\mathcal{B}$ be a σ-algebra of subsets of $X$. Then
(i) $X \in \mathcal{B}$;
(ii) if $E_n \in \mathcal{B}$ then $\bigcap_{n=1}^{\infty} E_n \in \mathcal{B}$.

In the special case when $X$ is a compact metric space there is a particularly important σ-algebra.

Definition. Let $X$ be a compact metric space. We define the Borel σ-algebra $\mathcal{B}(X)$ to be the smallest σ-algebra of subsets of $X$ which contains all the open subsets of $X$.

Remarks.
1. By ‘smallest’ we mean that if $\mathcal{C}$ is another σ-algebra that contains all open subsets of $X$ then $\mathcal{B}(X) \subset \mathcal{C}$, that is:
\[
\mathcal{B}(X) = \bigcap \{\mathcal{C} \mid \mathcal{C} \text{ is a σ-algebra that contains the open sets}\}.
\]
2. We say that the Borel σ-algebra is generated by the open sets. We call a set in $\mathcal{B}(X)$ a Borel set.
3. By property (ii) in the definition of a σ-algebra, the Borel σ-algebra also contains all the closed sets and is the smallest σ-algebra with this property.
4.
By Lemma 2.4.1 it follows that $\mathcal{B}(X)$ contains all countable intersections of open sets, all countable unions of countable intersections of open sets, all countable intersections of countable unions of countable intersections of open sets, etc., and indeed many other sets.
5. There are plenty of sets that are not Borel sets, although by necessity they are rather complicated. For example, consider $\mathbb{R}$ as an additive group and $\mathbb{Q} \subset \mathbb{R}$ as a subgroup. Form the quotient group $\mathbb{R}/\mathbb{Q}$ and choose an element in $[0, 1]$ for each coset (this requires the Axiom of Choice). The set $E$ of coset representatives is a non-Borel set.
6. In the case when $X = [0, 1]$ or $\mathbb{R}/\mathbb{Z}$, the Borel σ-algebra is also the smallest σ-algebra that contains all sub-intervals.

Let $X$ be a set and let $\mathcal{B}$ be a σ-algebra of subsets of $X$.

Definition. A function $\mu : \mathcal{B} \to \mathbb{R}$ is called a (finite) measure if:
(i) $\mu(\emptyset) = 0$;
(ii) if $E_n$ is a countable collection of pairwise disjoint sets in $\mathcal{B}$ (i.e. $E_n \cap E_m = \emptyset$ for $n \ne m$) then
\[
\mu\left( \bigcup_{n=1}^{\infty} E_n \right) = \sum_{n=1}^{\infty} \mu(E_n).
\]
We call $(X, \mathcal{B}, \mu)$ a measure space. If $\mu(X) = 1$ then we call $\mu$ a probability or probability measure and refer to $(X, \mathcal{B}, \mu)$ as a probability space.

Remark. Thus a measure just abstracts properties of ‘length’ or ‘volume’. Condition (i) says that the empty set has zero length, and condition (ii) says that the length of a disjoint union is the sum of the lengths of the individual sets.

Definition. We say that a property holds almost everywhere if the set of points on which the property fails to hold has measure zero.

Example. We shall see (Exercise 2.9) that the set of rationals in $[0, 1]$ forms a Borel set with zero Lebesgue measure. Thus Lebesgue almost every point in $[0, 1]$ is irrational. (Thus, ‘typical’ points in $[0, 1]$, in the sense of measure theory and with respect to Lebesgue measure, are irrational.)

We will usually be interested in studying measures on the Borel σ-algebra of a compact metric space $X$.
To define such a measure, we need to define the measure of an arbitrary Borel set. In general, the Borel σ-algebra is extremely large. We shall see that it is often unnecessary to do this and instead it is sufficient to define the measure of a certain class of subsets.

§2.4.2 The Hahn-Kolmogorov Extension Theorem

A collection $\mathcal{A}$ of subsets of $X$ is called an algebra if:
(i) $\emptyset \in \mathcal{A}$,
(ii) if $A_1, A_2, \ldots, A_n \in \mathcal{A}$ then $\bigcup_{j=1}^{n} A_j \in \mathcal{A}$,
(iii) if $A \in \mathcal{A}$ then $A^c \in \mathcal{A}$.

Thus an algebra is like a σ-algebra, except that it is closed under finite unions and not necessarily closed under countable unions.

Example. Take $X = [0, 1]$, and $\mathcal{A} = \{\text{all finite unions of subintervals}\}$.

Let $\mathcal{B}(\mathcal{A})$ denote the σ-algebra generated by $\mathcal{A}$, i.e., the smallest σ-algebra containing $\mathcal{A}$. More precisely:
\[
\mathcal{B}(\mathcal{A}) = \bigcap \{ \mathcal{C} \mid \mathcal{C} \text{ is a } \sigma\text{-algebra}, \ \mathcal{C} \supset \mathcal{A} \}.
\]
In the case when $X = [0, 1]$ and $\mathcal{A}$ is the algebra of finite unions of intervals, we have that $\mathcal{B}(\mathcal{A})$ is the Borel σ-algebra. Indeed, in the special case of the Borel σ-algebra of a compact metric space $X$, it is usually straightforward to check whether an algebra generates the Borel σ-algebra.

Proposition 2.4.2 Let $X$ be a compact metric space and let $\mathcal{B}$ be the Borel σ-algebra. Let $\mathcal{A}$ be an algebra of Borel subsets, $\mathcal{A} \subset \mathcal{B}$. Suppose that for every $x_1, x_2 \in X$, $x_1 \ne x_2$, there exist disjoint open sets $A_1, A_2 \in \mathcal{A}$ such that $x_1 \in A_1$, $x_2 \in A_2$. Then $\mathcal{A}$ generates the Borel σ-algebra $\mathcal{B}$.

The following result says that if we have a function which looks like a measure defined on an algebra, then it extends uniquely to a measure defined on the σ-algebra generated by the algebra.

Theorem 2.4.3 (Hahn-Kolmogorov Extension Theorem) Let $\mathcal{A}$ be an algebra of subsets of $X$. Suppose that $\mu : \mathcal{A} \to [0, 1]$ satisfies:
(i) $\mu(\emptyset) = 0$;
(ii) if $A_n \in \mathcal{A}$, $n \ge 1$, are pairwise disjoint and if $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ then
\[
\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n).
\]
Then there is a unique probability measure $\mu : \mathcal{B}(\mathcal{A}) \to [0, 1]$ which is an extension of $\mu : \mathcal{A} \to [0, 1]$.
Remarks.
(i) We will often use the Hahn-Kolmogorov Extension Theorem as follows. Take $X = [0, 1]$ and take $\mathcal{A}$ to be the algebra consisting of all finite unions of subintervals of $X$. We then define the ‘measure’ $\mu$ of a subinterval in such a way as to be consistent with the hypotheses of the Hahn-Kolmogorov Extension Theorem. It then follows that $\mu$ does indeed define a measure on the Borel σ-algebra.
(ii) Here is another way in which we shall use the Hahn-Kolmogorov Extension Theorem. Suppose we have two measures, $\mu$ and $\nu$, and we want to see if $\mu = \nu$. A priori we would have to check that $\mu(B) = \nu(B)$ for all $B \in \mathcal{B}$. The Hahn-Kolmogorov Extension Theorem says that it is sufficient to check that $\mu(A) = \nu(A)$ for all $A$ in an algebra $\mathcal{A}$ that generates $\mathcal{B}$. For example, to show that two Borel probability measures on $[0, 1]$ are equal, it is sufficient to show that they give the same measure to each subinterval.
(iii) There is a more general version of the Hahn-Kolmogorov Extension Theorem for the case when $X$ does not have finite measure (indeed, this is the setting in which the Hahn-Kolmogorov Theorem is usually stated). Suppose that $X$ is a set, $\mathcal{B}$ is a σ-algebra of subsets of $X$, and $\mathcal{A}$ is an algebra that generates $\mathcal{B}$. Suppose that $\mu : \mathcal{A} \to \mathbb{R} \cup \{\infty\}$ satisfies conditions (i) and (ii) of Theorem 2.4.3. Suppose in addition that there exist a countable number of sets $A_n \in \mathcal{A}$, $n = 1, 2, 3, \ldots$, such that $X = \bigcup_{n=1}^{\infty} A_n$ and $\mu(A_n) < \infty$ for each $n$. Then there exists a unique measure $\mu : \mathcal{B}(\mathcal{A}) \to \mathbb{R} \cup \{\infty\}$ which is an extension of $\mu : \mathcal{A} \to \mathbb{R} \cup \{\infty\}$.

A consequence of the proof (which we omit) of the Hahn-Kolmogorov Extension Theorem is that sets in $\mathcal{B}$ can be arbitrarily well approximated by sets in $\mathcal{A}$ in the following sense. We define the symmetric difference between two sets $A, B$ by $A \triangle B = (A \setminus B) \cup (B \setminus A)$. Thus, two sets are ‘close’ if their symmetric difference is small.
Proposition 2.4.4 Suppose that $\mathcal{A}$ is an algebra that generates the σ-algebra $\mathcal{B}$. Let $B \in \mathcal{B}$ and let $\varepsilon > 0$. Then there exists $A \in \mathcal{A}$ such that $\mu(A \triangle B) < \varepsilon$.

Remark. It is straightforward to check that if $\mu(A \triangle B) < \varepsilon$ then $|\mu(A) - \mu(B)| < \varepsilon$.

§2.4.3 Examples of measure spaces

Lebesgue measure on $[0, 1]$. Take $X = [0, 1]$ and take $\mathcal{A}$ to be the collection of all finite unions of subintervals of $[0, 1]$. For a subinterval $[a, b]$ define $\mu([a, b]) = b - a$. This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra $\mathcal{B}$. This is Lebesgue measure.

Lebesgue measure on $\mathbb{R}/\mathbb{Z}$. Take $X = \mathbb{R}/\mathbb{Z}$ and take $\mathcal{A}$ to be the collection of all finite unions of subintervals of $[0, 1)$. For a subinterval $[a, b]$ define $\mu([a, b]) = b - a$. This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra $\mathcal{B}$. This is Lebesgue measure on the circle.

Lebesgue measure on the $k$-dimensional torus. Take $X = \mathbb{R}^k/\mathbb{Z}^k$ and take $\mathcal{A}$ to be the collection of all finite unions of $k$-dimensional sub-cubes $\prod_{j=1}^{k} [a_j, b_j]$ of $[0, 1]^k$. For a sub-cube $\prod_{j=1}^{k} [a_j, b_j]$ of $[0, 1]^k$, define
\[
\mu\left( \prod_{j=1}^{k} [a_j, b_j] \right) = \prod_{j=1}^{k} (b_j - a_j).
\]
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra $\mathcal{B}$. This is Lebesgue measure on the torus.

Stieltjes measures.¹ Take $X = [0, 1]$ and let $\rho : [0, 1] \to \mathbb{R}^+$ be an increasing function such that $\rho(1) - \rho(0) = 1$. Take $\mathcal{A}$ to be the algebra of finite unions of subintervals and define $\mu_\rho([a, b]) = \rho(b) - \rho(a)$. This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra $\mathcal{B}$. We say that $\mu_\rho$ is the measure on $[0, 1]$ with density $\rho$.

Dirac measures. Finally, we give an example of a class of measures that do not fall into the above categories. Let $X$ be an arbitrary space and let $\mathcal{B}$ be an arbitrary σ-algebra. Let $x \in X$.
Define the measure $\delta_x$ by
\[
\delta_x(A) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases}
\]
Then $\delta_x$ defines a probability measure. It is called the Dirac measure at $x$.

§2.5 Exercises

Exercise 2.1 Prove Proposition 2.2.1: let $\alpha_1, \ldots, \alpha_k \in \mathbb{R}$ and let $x_n = (\alpha_1 n, \ldots, \alpha_k n) \in \mathbb{R}^k$. Prove that $x_n$ is uniformly distributed mod 1 if and only if $\alpha_1, \ldots, \alpha_k, 1$ are rationally independent.

Exercise 2.2 Deduce the general case of Weyl’s Theorem on Polynomials (where at least one non-constant coefficient is irrational) from the special case proved above (where the leading coefficient is irrational).

Exercise 2.3 Let $\alpha$ be irrational. Show that $p(n) = \alpha n^2 + n + 1$ is uniformly distributed mod 1 by using Lemma 2.3.3 and Exercise 1.2: i.e. show that, for each $m \ge 1$, the sequence $p^{(m)}(n) = p(n + m) - p(n)$ of $m$th differences is uniformly distributed mod 1.

Exercise 2.4 Let
\[
p(n) = \alpha_k n^k + \alpha_{k-1} n^{k-1} + \cdots + \alpha_1 n + \alpha_0, \qquad q(n) = \beta_k n^k + \beta_{k-1} n^{k-1} + \cdots + \beta_1 n + \beta_0.
\]
Show that $(p(n), q(n)) \in \mathbb{R}^2$ is uniformly distributed mod 1 if, for some $1 \le i \le k$, $\alpha_i$, $\beta_i$ and 1 are rationally independent.

Exercise 2.5 Prove Lemma 2.4.1.

Exercise 2.6 Let $X = [0, 1]$. Find the smallest σ-algebra that contains the sets: $[0, 1/4)$, $[1/4, 1/2)$, $[1/2, 3/4)$, and $[3/4, 1]$.

Exercise 2.7 Let $X = [0, 1]$ and let $\mathcal{B}$ denote the Borel σ-algebra. A dyadic interval is an interval of the form
\[
\left[ \frac{p_1}{2^k}, \frac{p_2}{2^k} \right], \quad p_1, p_2 \in \{0, 1, \ldots, 2^k\}.
\]
Show that the algebra formed by taking finite unions of all dyadic intervals (over all $k \in \mathbb{N}$) generates the Borel σ-algebra.

¹ An approximate pronunciation of Stieltjes is ‘Steeel-tyuz’.

Exercise 2.8 Show that $\mathcal{A} = \{\text{all finite unions of subintervals of } [0, 1]\}$ is an algebra.

Exercise 2.9 Let $\mu$ denote Lebesgue measure on $[0, 1]$. Show that for any $x \in [0, 1]$ we have that $\mu(\{x\}) = 0$. Hence show that the Lebesgue measure of any countable set is zero. Show that Lebesgue almost every point in $[0, 1]$ is irrational.

Exercise 2.10 Let $X = [0, 1]$.
Let $\mu = \delta_{1/2}$ denote the Dirac δ-measure at $1/2$. Show that $\mu([0, 1/2) \cup (1/2, 1]) = 0$. Conclude that $\mu$-almost every point in $[0, 1]$ is equal to $1/2$.

3. Lebesgue integration. Invariant measures

§3.1 Lebesgue integration

Let $(X, \mathcal{B}, \mu)$ be a measure space. We are interested in how to integrate functions defined on $X$ with respect to the measure $\mu$. In the special case when $X = [0, 1]$, $\mathcal{B}$ is the Borel σ-algebra and $\mu$ is Lebesgue measure, this will extend the definition of the Riemann integral to a class of functions that are not Riemann integrable.

Definition. Let $f : X \to \mathbb{R}$ be a function. If $D \subset \mathbb{R}$ then we define the pre-image of $D$ to be the set $f^{-1}D = \{x \in X \mid f(x) \in D\}$. A function $f : X \to \mathbb{R}$ is measurable if $f^{-1}D \in \mathcal{B}$ for every Borel subset $D$ of $\mathbb{R}$. One can show that this is equivalent to requiring that $f^{-1}(-\infty, c) \in \mathcal{B}$ for all $c \in \mathbb{R}$. A function $f : X \to \mathbb{C}$ is measurable if both the real and imaginary parts, $\operatorname{Re} f$ and $\operatorname{Im} f$, are measurable.

Remark. In writing $f^{-1}D$, we are not assuming that $f$ is a bijection. We are writing $f^{-1}D$ to denote the pre-image of the set $D$.

We define integration via simple functions.

Definition. A function $f : X \to \mathbb{R}$ is simple if it can be written as a linear combination of characteristic functions of sets in $\mathcal{B}$, i.e.:
\[
f = \sum_{j=1}^{r} a_j \chi_{B_j},
\]
for some $a_j \in \mathbb{R}$, $B_j \in \mathcal{B}$, where the $B_j$ are pairwise disjoint.

Remarks.
(i) Note that the sets $B_j$ are sets in the σ-algebra $\mathcal{B}$; even in the case when $X = [0, 1]$ we do not assume that the sets $B_j$ are intervals.
(ii) For example, $\chi_{\mathbb{Q} \cap [0,1]}$ is a simple function. Note, however, that $\chi_{\mathbb{Q} \cap [0,1]}$ is not Riemann integrable.

For a simple function $f : X \to \mathbb{R}$ we define
\[
\int f \, d\mu = \sum_{j=1}^{r} a_j \mu(B_j).
\]
For example, if $\mu$ denotes Lebesgue measure on $[0, 1]$ then
\[
\int \chi_{\mathbb{Q} \cap [0,1]} \, d\mu = \mu(\mathbb{Q} \cap [0, 1]) = 0,
\]
as $\mathbb{Q} \cap [0, 1]$ is a countable set and so has Lebesgue measure zero.
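The definition of the integral of a simple function is easy to check by hand on small examples. The following is a minimal Python sketch added for illustration, not part of the course material. It assumes that each set $B_j$ is a half-open interval $[l, r)$ stored as a pair of endpoints, and the names `integrate_simple`, `lebesgue` and `dirac` are hypothetical helpers, not notation from the notes.

```python
from fractions import Fraction

def integrate_simple(terms, measure):
    """Integral of f = sum_j a_j * chi_{B_j}, i.e. sum_j a_j * measure(B_j).

    `terms` is a list of pairs (a_j, B_j); the B_j are assumed pairwise disjoint.
    """
    return sum(a * measure(B) for a, B in terms)

def lebesgue(interval):
    """Lebesgue measure of a half-open interval [left, right): its length."""
    left, right = interval
    return right - left

def dirac(x):
    """Dirac measure at x, evaluated only on half-open intervals [left, right)."""
    def measure(interval):
        left, right = interval
        return 1 if left <= x < right else 0
    return measure

# f = 2 * chi_[0, 1/4) + 5 * chi_[1/4, 1)
f = [(2, (Fraction(0), Fraction(1, 4))), (5, (Fraction(1, 4), Fraction(1)))]

lebesgue_integral = integrate_simple(f, lebesgue)            # 2*(1/4) + 5*(3/4)
dirac_integral = integrate_simple(f, dirac(Fraction(1, 2)))  # value of f at 1/2
print(lebesgue_integral, dirac_integral)
```

With respect to Lebesgue measure the sum is $2 \cdot \tfrac14 + 5 \cdot \tfrac34 = \tfrac{17}{4}$, while with respect to the Dirac measure at $1/2$ only the set containing $1/2$ contributes, so the integral is the value of $f$ at that point.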
A simple function can be written as a linear combination of characteristic functions of pairwise disjoint sets in many different ways (for example, $\chi_{[1/4,3/4]} = \chi_{[1/4,1/2)} + \chi_{[1/2,3/4]}$). However, one can show that the definition of the integral of a simple function $f$ given in (3.1.1) is independent of the choice of representation of $f$ as a linear combination of characteristic functions. Thus for a simple function $f$, the integral of $f$ can be regarded as being the area of the region in $X \times \mathbb{R}$ bounded by the graph of $f$.

If $f : X \to \mathbb{R}$, $f \ge 0$, is measurable then one can show that there exists an increasing sequence of simple functions $f_n$ such that $f_n \uparrow f$ pointwise¹ as $n \to \infty$ and we define
\[
\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu.
\]
This can be shown to exist (although it may be $\infty$) and to be independent of the choice of sequence $f_n$.

For an arbitrary measurable function $f : X \to \mathbb{R}$, we write $f = f^+ - f^-$, where $f^+ = \max\{f, 0\} \ge 0$ and $f^- = \max\{-f, 0\} \ge 0$, and define
\[
\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu.
\]
If $\int f^+ \, d\mu = \infty$ and $\int f^- \, d\mu$ is finite then we set $\int f \, d\mu = \infty$. Similarly, if $\int f^+ \, d\mu$ is finite but $\int f^- \, d\mu = \infty$ then we set $\int f \, d\mu = -\infty$. If both $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ are infinite then we leave $\int f \, d\mu$ undefined.

Finally, for a measurable function $f : X \to \mathbb{C}$, we define
\[
\int f \, d\mu = \int \operatorname{Re} f \, d\mu + i \int \operatorname{Im} f \, d\mu.
\]
We say that $f$ is integrable if
\[
\int |f| \, d\mu < +\infty.
\]
(Note that, in the case of a measurable function $f : X \to \mathbb{R}$, saying that $f$ is integrable is equivalent to saying that both $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ are finite.)

Denote the space of $\mathbb{C}$-valued integrable functions by $L^1(X, \mathcal{B}, \mu)$. (We shall see a slightly more sophisticated definition of this space below.)

Note that when we write $\int f \, d\mu$ we are implicitly integrating over the whole space $X$. We can define integration over subsets of $X$ as follows.

Definition. Let $(X, \mathcal{B}, \mu)$ be a probability space. Let $f \in L^1(X, \mathcal{B}, \mu)$ and let $B \in \mathcal{B}$. Then $\chi_B f \in L^1(X, \mathcal{B}, \mu)$. We define
\[
\int_B f \, d\mu = \int \chi_B f \, d\mu.
\]

¹ $f_n \uparrow f$ pointwise means: for every $x$, $f_n(x)$ is an increasing sequence of real numbers and $f_n(x) \to f(x)$ as $n \to \infty$.

§3.1.1 Examples

Lebesgue measure. Let $X = [0, 1]$ and let $\mu$ denote Lebesgue measure on the Borel σ-algebra. If $f : [0, 1] \to \mathbb{R}$ is Riemann integrable then it is also Lebesgue integrable and the two definitions agree. However, there are plenty of examples of functions which are Lebesgue integrable but not Riemann integrable. For example, take $f(x) = \chi_{\mathbb{Q} \cap [0,1]}(x)$ defined on $[0, 1]$ to be the characteristic function of the rationals. Then $f(x) = 0$ $\mu$-a.e. Hence $f$ is integrable and $\int f \, d\mu = 0$. However, $f$ is not Riemann integrable.

The Stieltjes integral. Let $\rho : [0, 1] \to \mathbb{R}^+$ and suppose that $\rho$ is differentiable. Then one can show that
\[
\int f \, d\mu_\rho = \int f(x) \rho'(x) \, dx.
\]

Integration with respect to Dirac measures. Let $x \in X$. Recall that we defined the Dirac measure at $x$ by
\[
\delta_x(B) = \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B. \end{cases}
\]
If $\chi_B$ denotes the characteristic function of $B$ then
\[
\int \chi_B \, d\delta_x = \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B. \end{cases}
\]
Suppose that $f = \sum a_j \chi_{B_j}$ is a simple function and that, without loss of generality, the $B_j$ are pairwise disjoint. Then $\int f \, d\delta_x = a_j$ where $j$ is chosen so that $x \in B_j$ (and equals zero if no such $B_j$ exists). Now let $f : X \to \mathbb{R}$. By choosing an increasing sequence of simple functions, we see that
\[
\int f \, d\delta_x = f(x).
\]

We say that two measurable functions $f, g : X \to \mathbb{C}$ are equivalent or equal $\mu$-a.e. if $f = g$ $\mu$-a.e., i.e. if $\mu(\{x \in X \mid f(x) \ne g(x)\}) = 0$. The following result says that if two functions differ only on a set of measure zero then their integrals are equal.

Lemma 3.1.1 Suppose that $f, g \in L^1(X, \mathcal{B}, \mu)$ and $f, g$ are equal $\mu$-a.e. Then $\int f \, d\mu = \int g \, d\mu$.

Functions being equivalent is an equivalence relation. We shall write $L^1(X, \mathcal{B}, \mu)$ for the set of equivalence classes of integrable functions $f : X \to \mathbb{C}$ on $(X, \mathcal{B}, \mu)$. We define
\[
\|f\|_1 = \int |f| \, d\mu.
\]
Then $d(f, g) = \|f - g\|_1$ is a metric on $L^1(X, \mathcal{B}, \mu)$.
One can show that $L^1(X, \mathcal{B}, \mu)$ is a vector space; indeed, it is complete in the $L^1$ metric, and so is a Banach space.

Remark. In practice, we will often abuse notation and regard elements of $L^1(X, \mathcal{B}, \mu)$ as functions rather than equivalence classes of functions. In general, in measure theory one can often ignore sets of measure zero and treat two objects (functions, sets, etc) that differ only on a set of measure zero as ‘the same’.

More generally, for any $p \ge 1$, we can define the space $L^p(X, \mathcal{B}, \mu)$ consisting of (equivalence classes of) measurable functions $f : X \to \mathbb{C}$ such that $|f|^p$ is integrable. We can again define a metric on $L^p(X, \mathcal{B}, \mu)$ by defining $d(f, g) = \|f - g\|_p$ where
\[
\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p}
\]
is the $L^p$ norm.

Apart from $L^1$, the most interesting $L^p$ space is $L^2(X, \mathcal{B}, \mu)$. This is a Hilbert space² with the inner product
\[
\langle f, g \rangle = \int f \bar{g} \, d\mu.
\]
The Cauchy-Schwarz inequality holds: $|\langle f, g \rangle| \le \|f\|_2 \|g\|_2$ for all $f, g \in L^2(X, \mathcal{B}, \mu)$.

Suppose that $\mu$ is a finite measure. It follows from the Cauchy-Schwarz inequality that $L^2(X, \mathcal{B}, \mu) \subset L^1(X, \mathcal{B}, \mu)$.

In general, the Riemann integral does not behave well with respect to limits. For example, if $f_n$ is a sequence of Riemann integrable functions such that $f_n(x) \to f(x)$ at every point $x$ then it does not follow that $f$ is Riemann integrable. Even if $f$ is Riemann integrable, it does not follow that $\int f_n(x) \, dx \to \int f(x) \, dx$.

The following convergence theorems hold for the Lebesgue integral.

Theorem 3.1.2 (Monotone Convergence Theorem) Suppose that $f_n : X \to \mathbb{R}$ is an increasing sequence of integrable functions on $(X, \mathcal{B}, \mu)$. Suppose that $\int f_n \, d\mu$ is a bounded sequence of real numbers (i.e. there exists $M > 0$ such that $\left| \int f_n \, d\mu \right| \le M$ for all $n$). Then $f(x) = \lim_{n \to \infty} f_n(x)$ exists $\mu$-a.e. Moreover, $f$ is integrable and
\[
\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu.
\]
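As an illustration of the Monotone Convergence Theorem in a case where the Riemann integral is of no help, here is a short worked example. It is an added sketch rather than part of the original notes; $\mu$ is taken to be Lebesgue measure on $[0, 1]$ and the enumeration $q_1, q_2, \ldots$ of $\mathbb{Q} \cap [0, 1]$ is an assumption of the example.

```latex
Let $q_1, q_2, \ldots$ be an enumeration of $\mathbb{Q} \cap [0,1]$ and set
\[
  f_n = \chi_{\{q_1, \ldots, q_n\}} .
\]
Then $f_n \le f_{n+1}$, each $f_n$ is a simple function, and
\[
  \int f_n \, d\mu = \mu(\{q_1, \ldots, q_n\}) = 0 \quad \text{for all } n ,
\]
so the integrals form a bounded sequence. The pointwise limit is
$f = \chi_{\mathbb{Q} \cap [0,1]}$, and the theorem gives
\[
  \int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu = 0 ,
\]
in agreement with $\mu(\mathbb{Q} \cap [0,1]) = 0$.
```

Each $f_n$ here is also Riemann integrable (it has only finitely many discontinuities), whereas the limit $f$ is not, so the same limiting argument is not available for the Riemann integral.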
Theorem 3.1.3 (Dominated Convergence Theorem) Suppose that $g : X \to \mathbb{R}$ is integrable and that $f_n : X \to \mathbb{R}$ is a sequence of measurable functions with $|f_n| \le g$ $\mu$-a.e. and $\lim_{n \to \infty} f_n = f$ $\mu$-a.e. Then $f$ is integrable and
\[
\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.
\]

Remark. Both the Monotone Convergence Theorem and the Dominated Convergence Theorem fail for Riemann integration.

§3.2 Invariant measures

We are now in a position to study dynamical systems. Let $(X, \mathcal{B}, \mu)$ be a probability space. Let $T : X \to X$ be a dynamical system. If $B \in \mathcal{B}$ then we define
\[
T^{-1}B = \{x \in X \mid T(x) \in B\},
\]
that is, $T^{-1}B$ is the pre-image of $B$ under $T$.

² An inner product $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{C}$ on a complex vector space $H$ is a function such that: (i) $\langle v, v \rangle \ge 0$ for all $v \in H$ with equality if and only if $v = 0$, (ii) $\langle u, v \rangle = \overline{\langle v, u \rangle}$, and (iii) for each $v \in H$, $u \mapsto \langle u, v \rangle$ is linear. An inner product determines a norm by setting $\|v\| = (\langle v, v \rangle)^{1/2}$. A norm determines a metric by setting $d(u, v) = \|u - v\|$. We say that $H$ is a Hilbert space if the vector space $H$ is complete with respect to the metric induced from the inner product.

Remark. Note that we do not have to assume that $T$ is a bijection for this definition to make sense. For example, let $T(x) = 2x \bmod 1$ be the doubling map on $[0, 1]$. Then $T$ is not a bijection. One can easily check that, for example, $T^{-1}(0, 1/2) = (0, 1/4) \cup (1/2, 3/4)$.

Definition. A transformation $T : X \to X$ is said to be measurable if $T^{-1}B \in \mathcal{B}$ for all $B \in \mathcal{B}$.

Remark. We will often work with compact metric spaces $X$ equipped with the Borel σ-algebra. In this setting, any continuous transformation is measurable.

Remark. Suppose that $\mathcal{A}$ is an algebra of sets that generates the σ-algebra $\mathcal{B}$. One can show that if $T^{-1}A \in \mathcal{B}$ for all $A \in \mathcal{A}$ then $T$ is measurable.

Definition. We say that $T$ is a measure-preserving transformation (m.p.t. for short) or, equivalently, $\mu$ is said to be a $T$-invariant measure, if $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$.
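The condition $\mu(T^{-1}B) = \mu(B)$ can be tested on intervals for the doubling map from the Remark above. The following Python sketch is an illustration only, under stated assumptions: it assumes that the pre-image of an interval with endpoints $0 \le a \le b \le 1$ consists of the two intervals with endpoints $a/2, b/2$ and $(a+1)/2, (b+1)/2$ (consistent with $T^{-1}(0, 1/2) = (0, 1/4) \cup (1/2, 3/4)$), takes $\mu$ to be Lebesgue measure, and uses exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction

def doubling_preimage(a, b):
    """Endpoints of the two intervals that make up the pre-image of the
    interval from a to b under T(x) = 2x mod 1 (assumes 0 <= a <= b <= 1)."""
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]

def total_length(intervals):
    """Lebesgue measure of a finite union of intervals that overlap in at
    most endpoints: the sum of their lengths."""
    return sum(right - left for left, right in intervals)

a, b = Fraction(0), Fraction(1, 2)
pre = doubling_preimage(a, b)
print(pre)                # two intervals, with endpoints 0, 1/4 and 1/2, 3/4
print(total_length(pre))  # agrees with b - a
```

Each of the two pieces has half the length of the original interval, so the total length is unchanged. Checking intervals only is a sanity check on the definition; it does not by itself establish invariance on every Borel set.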
§3.3 Using the Hahn-Kolmogorov Extension Theorem to prove invariance

Recall the Hahn-Kolmogorov Extension Theorem:

Theorem 3.3.1 (Hahn-Kolmogorov Extension Theorem) Let $\mathcal{A}$ be an algebra of subsets of $X$ and let $\mathcal{B}(\mathcal{A})$ denote the σ-algebra generated by $\mathcal{A}$. Suppose that $\mu : \mathcal{A} \to [0, 1]$ satisfies:
(i) $\mu(\emptyset) = 0$;
(ii) if $A_n \in \mathcal{A}$, $n \ge 1$, are pairwise disjoint and if $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ then
\[
\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n).
\]
Then there is a unique probability measure $\mu : \mathcal{B}(\mathcal{A}) \to [0, 1]$ which is an extension of $\mu : \mathcal{A} \to [0, 1]$.

That is, if $\mu$ looks like a measure on an algebra $\mathcal{A}$, then it extends uniquely to a measure defined on the σ-algebra $\mathcal{B}(\mathcal{A})$ generated by $\mathcal{A}$.

Corollary 3.3.2 Let $\mathcal{A}$ be an algebra of subsets of $X$. Suppose that $\mu_1$ and $\mu_2$ are two measures on $\mathcal{B}(\mathcal{A})$ such that $\mu_1(A) = \mu_2(A)$ for all $A \in \mathcal{A}$. Then $\mu_1 = \mu_2$ on $\mathcal{B}(\mathcal{A})$.

We shall discuss several examples of dynamical systems and prove that certain naturally occurring measures are invariant using the Hahn-Kolmogorov Extension Theorem.

Suppose that $(X, \mathcal{B}, \mu)$ is a probability space and suppose that $T : X \to X$ is measurable. We define a new measure $T_*\mu$ by
\[
T_*\mu(B) = \mu(T^{-1}B) \tag{3.3.1}
\]
where $B \in \mathcal{B}$. It is straightforward to check that $T_*\mu$ is a probability measure on $(X, \mathcal{B})$ (see Exercise 3.4). Thus $\mu$ is a $T$-invariant measure if and only if $T_*\mu = \mu$, i.e. $T_*\mu$ and $\mu$ are the same measure. Corollary 3.3.2 says that if two measures agree on an algebra, then they agree on the σ-algebra generated by that algebra. Hence if we can show that $T_*\mu(A) = \mu(A)$ for all sets $A \in \mathcal{A}$ for some algebra $\mathcal{A}$ that generates $\mathcal{B}$, then $T_*\mu = \mu$, and so $\mu$ is a $T$-invariant measure.

§3.3.1 The doubling map

Let $X = \mathbb{R}/\mathbb{Z}$ be the circle, $\mathcal{B}$ be the Borel σ-algebra, and let $\mu$ denote Lebesgue measure. Define the doubling map by $T(x) = 2x \bmod 1$.

Proposition 3.3.3 Let $X = \mathbb{R}/\mathbb{Z}$ be the circle, $\mathcal{B}$ be the Borel σ-algebra, and let $\mu$ denote Lebesgue measure. Define the doubling map by $T(x) = 2x \bmod 1$.
Then Lebesgue measure $\mu$ is $T$-invariant.

Proof. Let $\mathcal{A}$ denote the algebra of finite unions of intervals. For an interval $[a, b]$ we have that
\[
T^{-1}[a, b] = \{x \in \mathbb{R}/\mathbb{Z} \mid T(x) \in [a, b]\} = \left[ \frac{a}{2}, \frac{b}{2} \right] \cup \left[ \frac{a+1}{2}, \frac{b+1}{2} \right].
\]
See Figure 3.3.1.

[Figure 3.3.1: The pre-image of an interval under the doubling map]

Hence
\[
\begin{aligned}
T_*\mu([a, b]) &= \mu(T^{-1}[a, b]) \\
&= \mu\left( \left[ \frac{a}{2}, \frac{b}{2} \right] \cup \left[ \frac{a+1}{2}, \frac{b+1}{2} \right] \right) \\
&= \frac{b}{2} - \frac{a}{2} + \frac{(b+1)}{2} - \frac{(a+1)}{2} \\
&= b - a = \mu([a, b]).
\end{aligned}
\]
Hence $T_*\mu = \mu$ on the algebra $\mathcal{A}$. As $\mathcal{A}$ generates the Borel σ-algebra, by uniqueness in the Hahn-Kolmogorov Extension Theorem we see that $T_*\mu = \mu$. Hence Lebesgue measure is $T$-invariant. □

§3.3.2 Rotations on a circle

Let $X = \mathbb{R}/\mathbb{Z}$ be the circle, let $\mathcal{B}$ be the Borel σ-algebra and let $\mu$ be Lebesgue measure. Fix $\alpha \in \mathbb{R}$. Define $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ by $T(x) = x + \alpha \bmod 1$. We call $T$ a rotation through angle $\alpha$.

One can also regard $\mathbb{R}/\mathbb{Z}$ as the unit circle $K = \{z \in \mathbb{C} \mid |z| = 1\}$ in the complex plane via the map $t \mapsto e^{2\pi i t}$. In these co-ordinates, the map $T$ becomes $T(e^{2\pi i \theta}) = e^{2\pi i \alpha} e^{2\pi i \theta}$, which is a rotation about the origin through the angle $2\pi\alpha$.

Proposition 3.3.4 Let $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$, $T(x) = x + \alpha \bmod 1$, be a circle rotation. Then Lebesgue measure is an invariant measure.

Proof. Let $[a, b] \subset \mathbb{R}/\mathbb{Z}$ be an interval. By the Hahn-Kolmogorov Extension Theorem, if we can show that $T_*\mu([a, b]) = \mu([a, b])$ then it follows that $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$, hence $\mu$ is $T$-invariant.

Note that $T^{-1}[a, b] = [a - \alpha, b - \alpha]$ where we interpret the endpoints mod 1. (One needs to be careful here: if $a - \alpha < 0 < b - \alpha$ then $T^{-1}([a, b]) = [0, b - \alpha] \cup [a - \alpha + 1, 1]$, etc.) Hence
\[
T_*\mu([a, b]) = \mu([a - \alpha, b - \alpha]) = (b - \alpha) - (a - \alpha) = b - a = \mu([a, b]).
\]
Hence $T_*\mu = \mu$ on the algebra $\mathcal{A}$ of finite unions of intervals. As $\mathcal{A}$ generates the Borel σ-algebra, by uniqueness in the Hahn-Kolmogorov Extension Theorem we see that $T_*\mu = \mu$. Hence Lebesgue measure is $T$-invariant.
□

§3.3.3 The Gauss map

Let $X = [0, 1]$ be the unit interval and let $\mathcal{B}$ be the Borel σ-algebra. Define the Gauss map $T : X \to X$ by
\[
T(x) = \begin{cases} \dfrac{1}{x} \bmod 1 & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}
\]
See Figure 3.3.2.

[Figure 3.3.2: The graph of the Gauss map (note that there are, in fact, infinitely many branches to the graph, only the first 5 are illustrated)]

The Gauss map is very closely related to continued fractions. Recall that if $x \in (0, 1)$ then $x$ has a continued fraction expansion of the form
\[
x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}} \tag{3.3.2}
\]
where $x_j \in \mathbb{N}$. If $x$ is rational then this expansion is finite. One can show that $x$ is irrational if and only if it has an infinite continued fraction expansion. Moreover, if $x$ is irrational then it has a unique infinite continued fraction expansion.

If $x$ has continued fraction expansion given by (3.3.2) then
\[
\frac{1}{x} = x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}.
\]
Hence, taking the fractional part, we see that $T(x)$ has continued fraction expansion given by
\[
T(x) = \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}
\]
i.e. $T$ acts by deleting the zeroth term in the continued fraction expansion of $x$ and then shifting the remaining digits one place to the left.

The Gauss map does not preserve Lebesgue measure (see Exercise 3.5). However it does preserve Gauss’ measure $\mu$ defined by
\[
\mu(B) = \frac{1}{\log 2} \int_B \frac{dx}{1 + x}
\]
(here $\log$ denotes the natural logarithm; the factor $\log 2$ is a normalising constant to make this a probability measure).

Proposition 3.3.5 Gauss’ measure is an invariant measure for the Gauss map.

Proof. It is sufficient to check that $\mu([a, b]) = \mu(T^{-1}[a, b])$ for any interval $[a, b]$. First note that
\[
T^{-1}[a, b] = \bigcup_{n=1}^{\infty} \left[ \frac{1}{b + n}, \frac{1}{a + n} \right].
\]
Thus
\[
\begin{aligned}
\mu(T^{-1}[a, b]) &= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{\frac{1}{b+n}}^{\frac{1}{a+n}} \frac{1}{1 + x} \, dx \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left[ \log\left( 1 + \frac{1}{a + n} \right) - \log\left( 1 + \frac{1}{b + n} \right) \right] \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left[ \log(a + n + 1) - \log(a + n) - \log(b + n + 1) + \log(b + n) \right] \\
&= \lim_{N \to \infty} \frac{1}{\log 2} \sum_{n=1}^{N} \left[ \log(a + n + 1) - \log(a + n) - \log(b + n + 1) + \log(b + n) \right] \\
&= \frac{1}{\log 2} \lim_{N \to \infty} \left[ \log(a + N + 1) - \log(a + 1) - \log(b + N + 1) + \log(b + 1) \right] \\
&= \frac{1}{\log 2} \left( \log(b + 1) - \log(a + 1) + \lim_{N \to \infty} \log\left( \frac{a + N + 1}{b + N + 1} \right) \right) \\
&= \frac{1}{\log 2} \left( \log(b + 1) - \log(a + 1) \right) \\
&= \frac{1}{\log 2} \int_a^b \frac{1}{1 + x} \, dx = \mu([a, b]),
\end{aligned}
\]
as required. □

§3.3.4 Markov shifts

Let $S$ be a finite set, for example $S = \{1, 2, \ldots, k\}$, with $k \ge 2$. Let
\[
\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in S\}
\]
denote the set of all infinite sequences of symbols chosen from $S$. Thus a point $x$ in the phase space $\Sigma$ is an infinite sequence of symbols $x = (x_0, x_1, x_2, \ldots)$.

Define the shift map $\sigma : \Sigma \to \Sigma$ by
\[
\sigma((x_0, x_1, x_2, \ldots)) = (x_1, x_2, x_3, \ldots)
\]
(equivalently, $(\sigma(x))_j = x_{j+1}$). Thus $\sigma$ takes a sequence, deletes the zeroth term in this sequence, and then shifts the remaining terms in the sequence one place to the left.

When constructing a measure $\mu$ on the Borel σ-algebra $\mathcal{B}$ of $[0, 1]$ we first defined $\mu$ on an algebra $\mathcal{A}$ that generates the σ-algebra $\mathcal{B}$ and then extended $\mu$ to $\mathcal{B}$ using the Hahn-Kolmogorov Extension Theorem. In this case, our algebra $\mathcal{A}$ was the collection of finite unions of intervals; thus to define $\mu$ on $\mathcal{A}$ it was sufficient to define $\mu$ on an interval. We want to use a similar procedure to define measures on $\Sigma$. To do this, we first need to define a metric on $\Sigma$, so that it makes sense to talk about the Borel σ-algebra, and then we need an algebra of subsets that generates the Borel σ-algebra.

Let $x, y \in \Sigma$. Suppose that $x \ne y$. Define $n(x, y) = n$ where $x_n \ne y_n$ but $x_j = y_j$ for $0 \le j \le n - 1$. Thus $n(x, y)$ is the index of the first place in which the sequences $x, y$ disagree. For convenience, define $n(x, y) = \infty$ if $x = y$. Define
\[
d(x, y) = \frac{1}{2^{n(x, y)}}.
\]
Thus two sequences $x, y$ are close if they agree for a large number of initial places. One can show (see Exercise 3.10) that $d$ is a metric on $\Sigma$ and that the shift map $\sigma : \Sigma \to \Sigma$ is continuous.

Fix $i_j \in S$, $j = 0, 1, \ldots, n - 1$.
We define the cylinder set
\[
[i_0, i_1, \ldots, i_{n-1}] = \{x = (x_j)_{j=0}^{\infty} \in \Sigma \mid x_j = i_j, \ j = 0, 1, \ldots, n - 1\}.
\]
That is, the cylinder set $[i_0, i_1, \ldots, i_{n-1}]$ consists of all infinite sequences of symbols from $S$ that begin $i_0, i_1, \ldots, i_{n-1}$. We call $n$ the rank of the cylinder. Cylinder sets for shifts often play the same role that intervals do for maps of the unit interval or circle.

Let $\mathcal{A}$ denote the algebra of all finite unions of cylinders. Then $\mathcal{A}$ generates the Borel σ-algebra $\mathcal{B}$. To see this we use Proposition 2.4.2. It is sufficient to check that $\mathcal{A}$ separates every pair of distinct points in $\Sigma$. Let $x = (x_j)_{j=0}^{\infty}$, $y = (y_j)_{j=0}^{\infty} \in \Sigma$ and suppose that $x \ne y$. Then there exists $n \ge 0$ such that $x_n \ne y_n$. Hence $x, y$ are in different cylinders of rank $n + 1$, and the claim follows.

We will construct a family of $\sigma$-invariant measures on $\Sigma$ by first constructing them on cylinders and then extending them to the Borel σ-algebra by using the Hahn-Kolmogorov Extension Theorem.

A $k \times k$ matrix $P$ is called a stochastic matrix if
(i) $P(i, j) \in [0, 1]$,
(ii) each row of $P$ sums to 1: for each $i$, $\sum_{j=1}^{k} P(i, j) = 1$.
(Here, $P(i, j)$ denotes the $(i, j)$th entry of the matrix $P$.)

We say that $P$ is irreducible if: for all $i, j$, there exists $n > 0$ such that $P^n(i, j) > 0$. We say that $P$ is aperiodic if there exists $n > 0$ such that every entry of $P^n$ is strictly positive. Thus $P$ is irreducible if for every $(i, j)$ there exists an $n$ such that the $(i, j)$th entry of $P^n$ is positive, and $P$ is aperiodic if this $n$ can be chosen to be independent of $(i, j)$.

Suppose that $P$ is irreducible. Let $d$ be the highest common factor of $\{n > 0 \mid P^n(i, i) > 0\}$. One can show that $P$ is aperiodic if and only if $d = 1$. We call $d$ the period of $P$. In general, if $d$ is the period of an irreducible matrix $P$ then $\{1, 2, \ldots, k\}$ can be partitioned into $d$ sets, $S_0, S_1, \ldots, S_{d-1}$, say, such that $P(i, j) > 0$ only if $i \in S_\ell$, $j \in S_{\ell + 1 \bmod d}$.
The matrix $P^d$ restricted to the indices that comprise each set $S_j$ is then aperiodic.

The eigenvalues of aperiodic (or, more generally, irreducible) stochastic matrices are extremely well-behaved.

Theorem 3.3.6 (Perron-Frobenius Theorem) Let $P$ be an irreducible stochastic matrix with period $d$. Then the following statements hold:
(i) The $d$th roots of unity are simple eigenvalues for $P$ and all other eigenvalues have modulus strictly less than 1.
(ii) Let $\mathbf{1}$ denote the column vector $(1, 1, \ldots, 1)^T$. Then $P\mathbf{1} = \mathbf{1}$ so that $\mathbf{1}$ is a right eigenvector corresponding to the maximal eigenvalue 1. Moreover, there exists a corresponding left eigenvector $p = (p(1), \ldots, p(k))$ for the eigenvalue 1, that is $pP = p$. The vector $p$ has strictly positive entries $p(j) > 0$, and we can assume that $p$ is normalised so that $\sum_{j=1}^{k} p(j) = 1$.
(iii) For all $i, j \in \{1, 2, \ldots, k\}$, we have that $P^{nd}(i, j) \to p(j)$ as $n \to \infty$.

Proof (not examinable). We prove only the aperiodic case. In this case, the period $d = 1$. We must show that 1 is a simple eigenvalue, construct the positive left eigenvector $p$, and show that $P^n(i, j) \to p(j)$ as $n \to \infty$.

First note that 1 is an eigenvalue of $P$ as $P\mathbf{1} = \mathbf{1}$; this follows from the fact that, for a stochastic matrix, the rows sum to 1.

Suppose $P$ has an eigenvalue $\lambda$ with corresponding eigenvector $v$. Then $Pv = \lambda v$. Hence $P^n v = \lambda^n v$. As the entries of $P$ are non-negative we have that
\[
|\lambda^n| |v| \le P^n |v|.
\]
Note that if $P$ is stochastic then so is $P^n$ for any $n \ge 1$. As $P^n$ is stochastic, the right-hand side is a bounded sequence in $n$. Hence $P$ keeps an eigenvector in a bounded region of $\mathbb{C}^k$. If $|\lambda| > 1$ then $|\lambda^n| |v| \to \infty$, a contradiction if $v \ne 0$. Hence the eigenvalues of $P$ have modulus less than or equal to 1.

Suppose that $Pv = \lambda v$ and $|\lambda| = 1$. Then $P^n |v| \ge |v|$. As $P$ is aperiodic, we can choose $n$ such that $P^n(i, j) > 0$ for all $i, j$.
Hence k X j=1 P n (i, j)|v(j)| ≥ |v(i)| (3.3.3) and choose i0 such that |v(i0 )| = max{|v(j)| | 1 ≤ j ≤ k}. Also, as P n is stochastic and P n (i, j) > 0, we must have that |v(i0 )| ≥ k X P n (i0 , j)|v(j)| (3.3.4) j=1 as the right-hand side of (3.3.4) is a convex combination of the |v(j)|. Thus |v(j)| = |v(i0 )| for every j, 1 ≤ j ≤ k. We canP assume, by normalising, that |v(j)| = 1 for all 1 ≤ j ≤ k. Now P v = λv, i.e. λv(i) = kj=1 P (i, j)v(j), a convex combination of v(j). As the |v(j)| all have the same modulus, this can only happen if all of the v(j) are the same. Hence v is a multiple of 1 and λ = 1. So 1 is a simple eigenvalue and there are no other eigenvalues of modulus 1. Since 1 is a simple eigenvalue, there is a unique (up to scalar multiples) left eigenvector p such that pP = p. As P is non-negative, we have that |p|P ≥ |p|, i.e. k X i=1 |p(i)|P (i, j) ≥ |p(j)| (3.3.5) and summing over j gives k X i=1 |p(i)| ≥ k X j=1 |p(j)| as P is stochastic. Hence we must have equality in (3.3.5), i.e. |p|P = |p|. Hence |p| is a left eigenvector for P . Hence p is a scalar multiple of |p|, so without loss of generality we can assume that p(i) ≥ 0 for all i. To see that p(i) > 0,Pchoose n such that all of the entries of P n are positive. Then n pP = p. Hence p(j) = ki=1 p(i)P n (i, j). The right-hand side of this expression is a sum of non-negative terms and can only be zero if p(i) = 0 for all 1 ≤ i ≤ k, i.e. if p = 0. Hence all of the entries of p are strictly positive.P We can normalise p and assume that kj=1 p(j) = 1. Decompose Rk into the sum V0 + V1 of eigenspaces where V0 = {v | hp, vi = 0}, V1 = span{1} so that V1 is the eigenspace corresponding to the eigenvalue 1 and V0 is the sum of the eigenspaces of the remaining eigenvalues. Then P (V1 ) = V1 and P (V0 ) ⊂ V0 . Note that if w ∈ V0 then P n w → 0 as 1 is not an eigenvalue of P when restricted to V0 and the eigenvalues of P restricted to V0 have modulus strictly less than 1. 34 MATH4/61112 3. 
Let $v \in \mathbb{R}^k$ and write $v = c\mathbf{1} + w$ where $\langle p, w\rangle = 0$. Hence $c = \langle p, v\rangle$ (recall that $\langle p, \mathbf{1}\rangle = 1$). Then $P^n v = \langle p, v\rangle \mathbf{1} + P^n w$. Hence $P^n v \to \langle p, v\rangle \mathbf{1}$ as $n \to \infty$. Taking $v = e_j = (0, \ldots, 0, 1, 0, \ldots, 0)$, the standard basis vectors, we see that $P^n(i,j) \to p(j)$. □

Given an irreducible stochastic matrix $P$ with corresponding normalised left eigenvector $p$, we define a Markov measure $\mu_P$ on cylinders by defining
\[ \mu_P([i_0, i_1, \ldots, i_{n-1}]) = p(i_0) P(i_0, i_1) P(i_1, i_2) \cdots P(i_{n-2}, i_{n-1}). \]
We can then extend $\mu_P$ to a probability measure on the Borel σ-algebra of $\Sigma$.

Bernoulli measures are particular examples of Markov measures. Let $p = (p(1), \ldots, p(k))$, $p(j) \in (0,1)$, $\sum_{j=1}^k p(j) = 1$, be a probability vector. Define
\[ \mu_p([i_0, i_1, \ldots, i_{n-1}]) = p(i_0) p(i_1) \cdots p(i_{n-1}) \]
and then extend to the Borel σ-algebra. We call $\mu_p$ the Bernoulli-$p$ measure.

We can now prove that Markov measures are invariant for shift maps.

Proposition 3.3.7
Let $\sigma : \Sigma \to \Sigma$ be a shift map on $k$ symbols. Let $P$ be an irreducible stochastic matrix with left eigenvector $p$. Then the Markov measure $\mu_P$ is a $\sigma$-invariant measure.

Proof. It is sufficient to prove that $\mu_P(\sigma^{-1}[i_0, \ldots, i_{n-1}]) = \mu_P([i_0, \ldots, i_{n-1}])$ for each cylinder $[i_0, \ldots, i_{n-1}]$. First note that
\begin{align*}
\sigma^{-1}[i_0, \ldots, i_{n-1}] &= \{x \in \Sigma \mid \sigma(x) \in [i_0, \ldots, i_{n-1}]\} \\
&= \{x \in \Sigma \mid x = (i, i_0, \ldots, i_{n-1}, \ldots),\ i \in \{1, 2, \ldots, k\}\} \\
&= \bigcup_{i=1}^k [i, i_0, \ldots, i_{n-1}].
\end{align*}
Hence
\begin{align*}
\mu_P(\sigma^{-1}[i_0, \ldots, i_{n-1}]) &= \mu_P\left(\bigcup_{i=1}^k [i, i_0, \ldots, i_{n-1}]\right) \\
&= \sum_{i=1}^k \mu_P([i, i_0, \ldots, i_{n-1}]) \quad \text{as this is a disjoint union} \\
&= \sum_{i=1}^k p(i) P(i, i_0) P(i_0, i_1) \cdots P(i_{n-2}, i_{n-1}) \\
&= p(i_0) P(i_0, i_1) \cdots P(i_{n-2}, i_{n-1}) \quad \text{as } pP = p \\
&= \mu_P([i_0, \ldots, i_{n-1}]).
\end{align*}
□

Remark. Bernoulli measures are familiar to you from probability theory. Suppose that $S = \{H, T\}$ so that $\Sigma$ denotes all infinite sequences of $H$s and $T$s.
We can think of an element of $\Sigma$ as the outcome of an infinite sequence of coin tosses. Suppose that $p = (p_H, p_T)$ is a probability vector with corresponding Bernoulli measure $\mu_p$. Then, for example, the cylinder set $[H, H, T]$ denotes the set of (infinite) coin tosses that start $H, H, T$, and this set has measure $p_H p_H p_T$, corresponding to the probability of tossing $H, H, T$.

Markov measures are similar. Given a stochastic matrix $P = (P(i,j))$ and a left probability eigenvector $p = (p(1), \ldots, p(k))$ we defined
\[ \mu_P([i_0, i_1, \ldots, i_{n-1}]) = p(i_0) P(i_0, i_1) P(i_1, i_2) \cdots P(i_{n-2}, i_{n-1}). \]
We can regard $p(i_0)$ as being the probability of outcome $i_0$. Then we can regard $P(i_0, i_1)$ as being the probability of outcome $i_1$, given that the previous outcome was $i_0$.

§3.4 Exercises

Exercise 3.1
Show that in Weyl's Criterion (Theorem 1.2.1) one cannot replace the hypothesis in equation (1.2.1) that $f$ is continuous with the hypothesis that $f \in L^1(\mathbb{R}/\mathbb{Z}, \mathcal{B}, \mu)$ (where $\mu$ denotes Lebesgue measure).

Exercise 3.2
Let $X$ be a compact metric space equipped with the Borel σ-algebra $\mathcal{B}$. Show that a continuous transformation $T : X \to X$ is measurable.

Exercise 3.3
Give an example of a sequence of functions $f_n \in L^1([0,1], \mathcal{B}, \mu)$ ($\mu$ = Lebesgue measure) such that $f_n \to 0$ $\mu$-a.e. but $f_n \not\to 0$ in $L^1$.

Exercise 3.4
Let $(X, \mathcal{B}, \mu)$ be a probability space and suppose that $T : X \to X$ is measurable. Show that $T_*\mu$ is a probability measure on $(X, \mathcal{B})$.

Exercise 3.5
(i) Show that the Gauss map does not preserve Lebesgue measure. (That is, find an example of a Borel set $B$ such that $T^{-1}B$ and $B$ have different Lebesgue measures.)
(ii) Let $\mu$ denote Gauss' measure and let $\lambda$ denote Lebesgue measure. Show that if $B \in \mathcal{B}$, the Borel σ-algebra of $[0,1]$, then
\[ \frac{1}{2\log 2}\,\lambda(B) \le \mu(B) \le \frac{1}{\log 2}\,\lambda(B). \tag{3.4.1} \]
Conclude that a set $B \in \mathcal{B}$ has Lebesgue measure zero if and only if it has Gauss' measure zero. (Two measures with the same sets of measure zero are said to be equivalent.)
(iii) Using (3.4.1), show that $f \in L^1([0,1], \mathcal{B}, \mu)$ if and only if $f \in L^1([0,1], \mathcal{B}, \lambda)$.

Exercise 3.6
For an integer $k \ge 2$ define $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ by $T(x) = kx \bmod 1$. Show that $T$ preserves Lebesgue measure.

Exercise 3.7
Let $\beta > 1$ denote the golden ratio (so that $\beta^2 = \beta + 1$). Define $T : [0,1] \to [0,1]$ by $T(x) = \beta x \bmod 1$. Show that $T$ does not preserve Lebesgue measure. Define the measure $\mu$ by $\mu(B) = \int_B k(x)\,dx$ where
\[ k(x) = \begin{cases} \dfrac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}} & \text{on } [0, 1/\beta), \\[2ex] \dfrac{1}{\beta\left(\frac{1}{\beta} + \frac{1}{\beta^3}\right)} & \text{on } [1/\beta, 1). \end{cases} \]
By using the Hahn-Kolmogorov Extension Theorem, show that $\mu$ is a $T$-invariant measure.

Exercise 3.8
Define the logistic map $T : [0,1] \to [0,1]$ by $T(x) = 4x(1-x)$. Define the measure $\mu$ by
\[ \mu(B) = \frac{1}{\pi} \int_B \frac{1}{\sqrt{x(1-x)}}\,dx. \]
(i) Check that $\mu$ is a probability measure.
(ii) By using the Hahn-Kolmogorov Extension Theorem, show that $\mu$ is a $T$-invariant measure.

Exercise 3.9
Define $T : [0,1] \to [0,1]$ by
\[ T(x) = \begin{cases} n(n+1)x - n & \text{if } x \in \left(\dfrac{1}{n+1}, \dfrac{1}{n}\right], \\[1ex] 0 & \text{if } x = 0. \end{cases} \]
This is called the Lüroth map. Show that
\[ \sum_{n=1}^\infty \frac{1}{n(n+1)} = 1. \]
Show that $T$ preserves Lebesgue measure.

Exercise 3.10
Let $\Sigma = \{x = (x_j)_{j=0}^\infty \mid x_j \in \{1, 2, \ldots, k\}\}$ denote the shift space on $k$ symbols. For $x, y \in \Sigma$, define $n(x,y)$ to be the index of the first place in which the two sequences $x, y$ disagree (and write $n(x,y) = \infty$ if $x = y$). Define
\[ d(x,y) = \frac{1}{2^{n(x,y)}}. \]
(i) Show that $d(x,y)$ is a metric.
(ii) Show that the shift map $\sigma$ is continuous.
(iii) Show that a cylinder set $[i_0, \ldots, i_{n-1}]$ is both open and closed.
(One can also prove that $\Sigma$ is compact; we shall use this fact later.)

Exercise 3.11
Show that the matrix
\[ P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \]
is irreducible but not aperiodic. Show that $P$ has period 2. Show that $\{1, 2, \ldots, 5\}$ can be partitioned into two sets $S_0 \cup S_1$ so that $P(i,j) > 0$ only if $i \in S_\ell$ and $j \in S_{\ell+1 \bmod 2}$.
Show that $P^2$, when restricted to indices in $S_0$ and in $S_1$, is aperiodic. Determine the eigenvalues of $P$. Find the unique left probability eigenvector $p$ such that $pP = p$.

Exercise 3.12
Show that Bernoulli measures are Markov measures. That is, given a probability vector $p = (p(1), \ldots, p(k))$, construct a stochastic matrix $P$ such that $pP = p$. Show that the corresponding Markov measure is the Bernoulli-$p$ measure.

4. More examples of invariant measures

§4.1 Criteria for invariance

We shall give more examples of invariant measures. Recall that, given a measurable transformation $T : X \to X$ of a probability space $(X, \mathcal{B}, \mu)$, we say that $\mu$ is a $T$-invariant measure (or, equivalently, $T$ is a measure-preserving transformation) if $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$.

We will need the following characterisations of invariance.

Lemma 4.1.1
Let $T : X \to X$ be a measurable transformation of a probability space $(X, \mathcal{B}, \mu)$. Then the following are equivalent:
(i) $T$ is a measure-preserving transformation;
(ii) for each $f \in L^1(X, \mathcal{B}, \mu)$ we have
\[ \int f\,d\mu = \int f \circ T\,d\mu; \]
(iii) for each $f \in L^2(X, \mathcal{B}, \mu)$ we have
\[ \int f\,d\mu = \int f \circ T\,d\mu. \]

Proof. We will use the identity $\chi_{T^{-1}B} = \chi_B \circ T$; this is straightforward to check, see Exercise 4.1.

We prove that (i) implies (ii). Suppose that $T$ is a measure-preserving transformation. For any characteristic function $\chi_B$, $B \in \mathcal{B}$,
\[ \int \chi_B\,d\mu = \mu(B) = \mu(T^{-1}B) = \int \chi_{T^{-1}B}\,d\mu = \int \chi_B \circ T\,d\mu \]
and so the equality holds for any simple function (a finite linear combination of characteristic functions). Given any $f \in L^1(X, \mathcal{B}, \mu)$ with $f \ge 0$, we can find an increasing sequence of simple functions $f_n$ with $f_n \to f$ pointwise, as $n \to \infty$. For each $n$ we have
\[ \int f_n\,d\mu = \int f_n \circ T\,d\mu \]
and, applying the Monotone Convergence Theorem to both sides, we obtain
\[ \int f\,d\mu = \int f \circ T\,d\mu. \]
To extend the result to a general real-valued integrable function $f$, consider the positive and negative parts.
To extend the result to complex-valued integrable functions $f$, take real and imaginary parts.

That (ii) implies (iii) follows immediately, as $L^2(X, \mathcal{B}, \mu) \subset L^1(X, \mathcal{B}, \mu)$.

Finally, we prove that (iii) implies (i). Let $B \in \mathcal{B}$. Then $\chi_B \in L^2(X, \mathcal{B}, \mu)$ as $\int |\chi_B|^2\,d\mu = \int \chi_B\,d\mu = \mu(B)$. Recalling that $\chi_B \circ T = \chi_{T^{-1}B}$ we have that
\[ \mu(B) = \int \chi_B\,d\mu = \int \chi_B \circ T\,d\mu = \int \chi_{T^{-1}B}\,d\mu = \mu(T^{-1}B) \]
so that $\mu$ is a $T$-invariant probability measure. □

§4.2 Invariant measures on periodic orbits

Recall that if $x \in X$ then we define the Dirac measure $\delta_x$ by
\[ \delta_x(B) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases} \]
We also recall that if $f : X \to \mathbb{R}$ then $\int f\,d\delta_x = f(x)$.

Let $T : X \to X$ be a measurable dynamical system defined on a measurable space $(X, \mathcal{B})$. Suppose that $x = T^n x$ is a periodic point with period $n$. Then the probability measure
\[ \mu = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x} \]
is $T$-invariant. This is clear from Lemma 4.1.1, noting that for $f \in L^1(X, \mathcal{B}, \mu)$
\begin{align*}
\int f \circ T\,d\mu &= \frac{1}{n}\left(f(Tx) + \cdots + f(T^{n-1}x) + f(T^n x)\right) \\
&= \frac{1}{n}\left(f(x) + f(Tx) + \cdots + f(T^{n-1}x)\right) \\
&= \int f\,d\mu,
\end{align*}
using the fact that $T^n x = x$.

§4.3 The change of variables formula

The change of variables formula (equivalently, integration by substitution) for (Riemann) integration should be familiar to you. It can be stated in the following way: if $u : [a,b] \to [c,d]$ is a differentiable bijection with continuous derivative and $f : [c,d] \to \mathbb{R}$ is (Riemann) integrable then $f \circ u : [a,b] \to \mathbb{R}$ is (Riemann) integrable and
\[ \int_{u(a)}^{u(b)} f(x)\,dx = \int_a^b f(u(x))\,u'(x)\,dx. \tag{4.3.1} \]
Allowing for the possibility that $u$ is decreasing (so that $u(b) < u(a)$), we can rewrite (4.3.1) as
\[ \int_{[c,d]} f(x)\,dx = \int_{[a,b]} f(u(x))\,|u'(x)|\,dx. \tag{4.3.2} \]
We would like a version of (4.3.2) that holds for (Lebesgue) integrable functions on subsets of $\mathbb{R}^n$, equipped with Lebesgue measure on $\mathbb{R}^n$.
Theorem 4.3.1 (Change of variables formula)
Let $B \subset \mathbb{R}^n$ be a Borel subset of $\mathbb{R}^n$ and suppose that $B \subset U$ for some open subset $U$. Suppose that $u : U \to \mathbb{R}^n$ is a diffeomorphism onto its image (i.e. $u : U \to u(U)$ is a differentiable bijection with differentiable inverse). Then $u(B)$ is a Borel set. Let $\mu$ denote Lebesgue measure on $\mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{C}$ be integrable. Then
\[ \int_{u(B)} f\,d\mu = \int_B f \circ u\,|\det Du|\,d\mu \]
where $Du$ denotes the matrix of partial derivatives of $u$.

There are more sophisticated versions of the change of variables formula that hold for arbitrary measures on $\mathbb{R}^n$.

§4.4 Rotations of a circle

We illustrate how one can use the change of variables formula for integration to prove that Lebesgue measure is an invariant measure for certain maps on the circle.

Proposition 4.4.1
Fix $\alpha \in \mathbb{R}$ and define $T(x) = x + \alpha \bmod 1$. Then Lebesgue measure $\mu$ is $T$-invariant.

Proof. By Lemma 4.1.1 we need to show that $\int f \circ T\,d\mu = \int f\,d\mu$ for every $f \in L^1(X, \mathcal{B}, \mu)$.

Recall that we can identify functions on $\mathbb{R}/\mathbb{Z}$ with 1-periodic functions on $\mathbb{R}$. By using the substitution $u(x) = x + \alpha$ and the change of variables formula for integration we have that
\begin{align*}
\int f \circ T\,d\mu &= \int_0^1 f(Tx)\,dx = \int_0^1 f(x + \alpha)\,dx = \int_\alpha^{1+\alpha} f(x)\,dx \\
&= \int_\alpha^1 f(x)\,dx + \int_1^{1+\alpha} f(x)\,dx = \int_\alpha^1 f(x)\,dx + \int_0^\alpha f(x)\,dx = \int_0^1 f(x)\,dx
\end{align*}
where we have used the fact that $\int_0^\alpha f(x)\,dx = \int_1^{1+\alpha} f(x)\,dx$ by the periodicity of $f$. □

§4.5 Toral automorphisms

Let $X = \mathbb{R}^k/\mathbb{Z}^k$ be the $k$-dimensional torus. Let $A = (a(i,j))$ be a $k \times k$ matrix with entries in $\mathbb{Z}$ and with $\det A \ne 0$. We can define a linear map $\mathbb{R}^k \to \mathbb{R}^k$ by
\[ \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix} \mapsto A \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix}. \]
For brevity, we shall often abuse this notation by writing this as $(x_1, \ldots, x_k) \mapsto A(x_1, \ldots, x_k)$.

Since $A$ is an integer matrix it maps $\mathbb{Z}^k$ to itself. We claim that $A$ allows us to define a map
\[ T = T_A : X \to X : (x_1, \ldots, x_k) + \mathbb{Z}^k \mapsto A(x_1, \ldots, x_k) + \mathbb{Z}^k. \]
We shall often abuse notation and write $T(x_1, \ldots, x_k) = A(x_1, \ldots, x_k) \bmod 1$.
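As a hedged illustration of the definition of $T_A$, the following minimal sketch applies $x \mapsto Ax \bmod 1$ to a point with rational coordinates, using exact arithmetic, and compares the images of two representatives of the same point of the torus. The function name `toral_map`, the particular integer matrix, the point and the integer vector are all choices made here for illustration and are not taken from the notes.

```python
from fractions import Fraction

def toral_map(A, x):
    """Apply x -> A x mod 1, where A is a square integer matrix (list of rows)
    and x is a list of Fractions representing a point of the torus."""
    k = len(A)
    # Python's % with a positive modulus returns a value in [0, 1) for Fractions.
    return [sum(A[i][j] * x[j] for j in range(k)) % 1 for i in range(k)]

# Illustrative integer matrix with non-zero determinant (an assumption of this sketch).
A = [[2, 1],
     [1, 1]]

x = [Fraction(1, 3), Fraction(2, 5)]           # a representative of a torus point
n = [3, -7]                                    # an arbitrary integer vector
x_shifted = [xi + ni for xi, ni in zip(x, n)]  # another representative of the same point

image = toral_map(A, x)
image_shifted = toral_map(A, x_shifted)

# Both representatives are sent to the same point of the torus.
print(image, image_shifted)
```

Exact rational arithmetic is used so that the comparison of the two images is not affected by floating-point rounding.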
To see that this map is well defined, we need to check that if $x + \mathbb{Z}^k = y + \mathbb{Z}^k$ then $Ax + \mathbb{Z}^k = Ay + \mathbb{Z}^k$. If $x, y \in \mathbb{R}^k$ give the same point in the torus, then $x = y + n$ for some $n \in \mathbb{Z}^k$. Hence $Ax = A(y + n) = Ay + An$. As $A$ maps $\mathbb{Z}^k$ to itself, we see that $An \in \mathbb{Z}^k$ so that $Ax$, $Ay$ determine the same point in the torus.

Definition. Let $A = (a(i,j))$ denote a $k \times k$ matrix with integer entries such that $\det A \ne 0$. Then we call the map $T_A : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}^k/\mathbb{Z}^k$ a linear toral endomorphism.

The map $T$ is not invertible in general. However, if $\det A = \pm 1$ then $A^{-1}$ exists and is an integer matrix. Hence we have a map $T^{-1}$ given by
\[ T^{-1}(x_1, \ldots, x_k) = A^{-1}(x_1, \ldots, x_k) \bmod 1. \]
One can easily check that $T^{-1}$ is the inverse of $T$.

Definition. Let $A = (a(i,j))$ denote a $k \times k$ matrix with integer entries such that $\det A = \pm 1$. Then we call the map $T_A : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}^k/\mathbb{Z}^k$ a linear toral automorphism.

Remark. The reason for this nomenclature is clear. If $T_A$ is either a linear toral endomorphism or linear toral automorphism, then it is an endomorphism or automorphism, respectively, of the torus regarded as an additive group.

Example. Take $A$ to be the matrix
\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \]
and define $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ to be the induced map:
\[ T(x_1, x_2) = (2x_1 + x_2 \bmod 1,\ x_1 + x_2 \bmod 1). \]
Then $T$ is a linear toral automorphism and is called Arnold's CAT map (CAT stands for 'C'ontinuous 'A'utomorphism of the 'T'orus). See Figure 4.5.1.

Definition. Suppose that $\det A = \pm 1$. Then we call $T$ a hyperbolic toral automorphism if $A$ has no eigenvalues of modulus 1.

Proposition 4.5.1
Let $T$ be a linear toral automorphism of the $k$-dimensional torus $X = \mathbb{R}^k/\mathbb{Z}^k$. Then Lebesgue measure $\mu$ is $T$-invariant.

Proof. By Lemma 4.1.1(ii) we need to show that $\int f \circ T\,d\mu = \int f\,d\mu$ for every $f \in L^1(X, \mathcal{B}, \mu)$.

Recall that we can identify functions $f : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{C}$ with functions $f : \mathbb{R}^k \to \mathbb{C}$ that satisfy $f(x + n) = f(x)$ for all $n \in \mathbb{Z}^k$.
We apply the change of variables formula with the substitution $T(x) = Ax$. Note that $DT(x) = A$ and $|\det DT| = 1$. Hence, by the change of variables formula,
\[ \int f \circ T\,d\mu = \int f \circ T\,|\det DT|\,d\mu = \int_{T(\mathbb{R}^k/\mathbb{Z}^k)} f\,d\mu = \int_{\mathbb{R}^k/\mathbb{Z}^k} f\,d\mu. \]
□

We shall see in §5.4.3 that linear toral endomorphisms (i.e. when $A$ is a $k \times k$ integer matrix with $\det A \ne 0$) also preserve Lebesgue measure.

[Figure 4.5.1: Arnold's CAT map]

§4.6 Exercises

Exercise 4.1
Suppose that $T : X \to X$. Show that $\chi_{T^{-1}B} = \chi_B \circ T$.

Exercise 4.2
Let $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$, $T(x) = 2x \bmod 1$, denote the doubling map. Show that the periodic points for $T$ are points of the form $p/(2^n - 1)$, $p = 0, 1, \ldots, 2^n - 2$. Conclude that $T$ has infinitely many invariant measures.

Exercise 4.3
By using the change of variables formula, prove that the doubling map $T(x) = 2x \bmod 1$ on $\mathbb{R}/\mathbb{Z}$ preserves Lebesgue measure.

Exercise 4.4
Fix $\alpha \in \mathbb{R}$ and define the map $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ by
\[ T(x, y) = (x + \alpha,\ x + y). \]
By using the change of variables formula, prove that Lebesgue measure is $T$-invariant.

5. Ergodic measures: definition, criteria, and basic examples

§5.1 Introduction

In section 3 we defined what is meant by an invariant measure or, equivalently, what is meant by a measure-preserving transformation. In this section, we define what is meant by an ergodic measure. The primary motivation for ergodicity is Birkhoff's Ergodic Theorem: if $T$ is an ergodic measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$ then, for each $f \in L^1(X, \mathcal{B}, \mu)$, we have that
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \int f\,d\mu \quad \text{for } \mu\text{-a.e. } x \in X. \]
Checking that a given measure-preserving transformation is ergodic is often a highly nontrivial task and we shall study some methods for proving ergodicity.

§5.2 Ergodicity

We define what it means to say that a measure-preserving transformation is ergodic.

Definition.
Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a measure-preserving transformation. We say that $T$ is an ergodic transformation with respect to $\mu$ (or that $\mu$ is an ergodic measure) if, whenever $B \in \mathcal{B}$ satisfies $T^{-1}B = B$, then we have that $\mu(B) = 0$ or 1.

Remark. Ergodicity can be viewed as an indecomposability condition. If ergodicity does not hold then we can find a set $B \in \mathcal{B}$ such that $T^{-1}B = B$ and $0 < \mu(B) < 1$. We can then split $T : X \to X$ into $T : B \to B$ and $T : X \setminus B \to X \setminus B$ with invariant probability measures $\frac{1}{\mu(B)}\mu(\cdot \cap B)$ and $\frac{1}{1 - \mu(B)}\mu(\cdot \cap (X \setminus B))$, respectively.

It will sometimes be convenient for us to weaken the condition $T^{-1}B = B$ to $\mu(T^{-1}B \triangle B) = 0$, where $\triangle$ denotes the symmetric difference:
\[ A \triangle B = (A \setminus B) \cup (B \setminus A). \]
We will often write that $A = B$ $\mu$-a.e. or $A = B \bmod 0$ to mean that $\mu(A \triangle B) = 0$.

Remark. It is easy to see that if $A = B$ $\mu$-a.e. then $\mu(A) = \mu(B)$.

Lemma 5.2.1
Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. Suppose that $B \in \mathcal{B}$ is such that $\mu(T^{-1}B \triangle B) = 0$. Then there exists $B' \in \mathcal{B}$ with $T^{-1}B' = B'$ and $\mu(B \triangle B') = 0$. (In particular, $\mu(B) = \mu(B')$.)

Proof (not examinable). For each $n \ge 0$, we have the inclusion
\[ T^{-n}B \triangle B \subset \bigcup_{j=0}^{n-1} T^{-(j+1)}B \triangle T^{-j}B = \bigcup_{j=0}^{n-1} T^{-j}(T^{-1}B \triangle B). \]
Hence, as $T$ preserves $\mu$,
\[ \mu(T^{-n}B \triangle B) \le n\,\mu(T^{-1}B \triangle B) = 0. \]
Let
\[ B' = \bigcap_{n=0}^\infty \bigcup_{j=n}^\infty T^{-j}B. \]
We have that
\[ \mu\left(B \triangle \bigcup_{j=n}^\infty T^{-j}B\right) \le \sum_{j=n}^\infty \mu(B \triangle T^{-j}B) = 0. \]
Since the sets $\bigcup_{j=n}^\infty T^{-j}B$ decrease as $n$ increases we have that $\mu(B \triangle B') = 0$. Also,
\[ T^{-1}B' = \bigcap_{n=0}^\infty \bigcup_{j=n}^\infty T^{-(j+1)}B = \bigcap_{n=0}^\infty \bigcup_{j=n+1}^\infty T^{-j}B = B', \]
as required. □

Corollary 5.2.2
If $T$ is ergodic and $\mu(T^{-1}B \triangle B) = 0$ then $\mu(B) = 0$ or 1.

We have the following convenient characterisations of ergodicity.

Proposition 5.2.3
Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. The following are equivalent:
(i) $T$ is ergodic;
(ii) whenever $f \in L^1(X, \mathcal{B}, \mu)$ satisfies $f \circ T = f$ $\mu$-a.e. we have that $f$ is constant $\mu$-a.e.
(iii) whenever $f \in L^2(X, \mathcal{B}, \mu)$ satisfies $f \circ T = f$ $\mu$-a.e. we have that $f$ is constant $\mu$-a.e.

Remark. If $f$ is a constant function then clearly $f \circ T = f$. Proposition 5.2.3 says that, when $T$ is ergodic, the constants are the only $T$-invariant functions (up to sets of measure zero).

Proof. We prove that (i) implies (ii). Suppose that $T$ is ergodic. Suppose that $f \in L^1(X, \mathcal{B}, \mu)$ is such that $f \circ T = f$ $\mu$-a.e. By taking real and imaginary parts, we can assume without loss of generality that $f$ is real-valued. For $k \in \mathbb{Z}$ and $n \in \mathbb{N}$, define
\[ X(k,n) = \left\{x \in X \;\middle|\; \frac{k}{2^n} \le f(x) < \frac{k+1}{2^n}\right\} = f^{-1}\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right). \]
Since $f$ is measurable, we have that $X(k,n) \in \mathcal{B}$.

We have that
\[ T^{-1}X(k,n) \triangle X(k,n) \subset \{x \in X \mid f(Tx) \ne f(x)\} \]
so that
\[ \mu(T^{-1}X(k,n) \triangle X(k,n)) = 0. \]
Hence, as $T$ is ergodic, we have by Corollary 5.2.2 that $\mu(X(k,n)) = 0$ or $\mu(X(k,n)) = 1$.

As $f \in L^1(X, \mathcal{B}, \mu)$ is integrable, we have that $f$ is finite almost everywhere. Hence, for each $n$,
\[ \bigcup_{k=-\infty}^\infty X(k,n) = \bigcup_{k=-\infty}^\infty f^{-1}\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right) = f^{-1}\left(\bigcup_{k=-\infty}^\infty \left[\frac{k}{2^n}, \frac{k+1}{2^n}\right)\right) = f^{-1}(\mathbb{R}) \]
is equal to $X$ up to a set of measure zero, i.e.,
\[ \mu\left(X \triangle \bigcup_{k \in \mathbb{Z}} X(k,n)\right) = 0; \]
moreover, this union is disjoint. Hence we have
\[ \sum_{k \in \mathbb{Z}} \mu(X(k,n)) = \mu(X) = 1 \]
and so there is a unique $k_n$ for which $\mu(X(k_n, n)) = 1$. Let
\[ Y = \bigcap_{n=1}^\infty X(k_n, n). \]
Then $\mu(Y) = 1$. Let $x, y \in Y$. Then $f(x), f(y) \in [k_n/2^n, (k_n+1)/2^n)$ for all $n \ge 1$. Hence for all $n \ge 1$ we have that
\[ |f(x) - f(y)| \le \frac{1}{2^n}. \]
Hence $f(x) = f(y)$. Hence $f$ is constant on the set $Y$. Hence $f$ is constant $\mu$-a.e.

That (ii) implies (iii) is clear as if $f \in L^2(X, \mathcal{B}, \mu)$ then $f \in L^1(X, \mathcal{B}, \mu)$.

Finally, we prove that (iii) implies (i). Suppose that $B \in \mathcal{B}$ is such that $T^{-1}B = B$. Then $\chi_B \in L^2(X, \mathcal{B}, \mu)$ and $\chi_B \circ T(x) = \chi_B(x)$ for all $x \in X$. Hence $\chi_B$ is constant $\mu$-a.e. Since $\chi_B$ only takes the values 0 and 1, we must have $\chi_B = 0$ $\mu$-a.e. or $\chi_B = 1$ $\mu$-a.e. Therefore
\[ \mu(B) = \int_X \chi_B\,d\mu = \begin{cases} 0 & \text{if } \chi_B = 0\ \mu\text{-a.e.}, \\ 1 & \text{if } \chi_B = 1\ \mu\text{-a.e.} \end{cases} \]
Hence $T$ is ergodic with respect to $\mu$.
□

§5.3 Fourier series

We shall give a method for proving that certain transformations of the circle or torus are ergodic with respect to Lebesgue measure. To do this, we use Proposition 5.2.3 and Fourier series.

Let $X = \mathbb{R}/\mathbb{Z}$ denote the unit circle and let $f : X \to \mathbb{R}$. (Alternatively, we can think of $f$ as a periodic function $\mathbb{R} \to \mathbb{R}$ with $f(x) = f(x + n)$ for all $n \in \mathbb{Z}$.) Equip $X$ with the Borel σ-algebra, let $\mu$ denote Lebesgue measure and assume that $f \in L^2(X, \mathcal{B}, \mu)$. We can associate to $f$ its Fourier series
\[ \frac{a_0}{2} + \sum_{n=1}^\infty \left(a_n \cos 2\pi n x + b_n \sin 2\pi n x\right), \tag{5.3.1} \]
where
\[ a_n = 2\int_0^1 f(x) \cos 2\pi n x\,d\mu, \qquad b_n = 2\int_0^1 f(x) \sin 2\pi n x\,d\mu. \]
(Notice that we are not claiming that the series converges; we are just formally associating the Fourier series to $f$.)

We shall find it more convenient to work with a complex form of the Fourier series and rewrite (5.3.1) as
\[ \sum_{n=-\infty}^\infty c_n e^{2\pi i n x}, \tag{5.3.2} \]
where
\[ c_n = \int_0^1 f(x) e^{-2\pi i n x}\,d\mu. \]
(In particular, $c_0 = \int_0^1 f\,d\mu$.) We call $c_n$ the $n$th Fourier coefficient.

Remark. That (5.3.2) and (5.3.1) are equivalent follows from the fact that
\[ \cos 2\pi n x = \frac{e^{2\pi i n x} + e^{-2\pi i n x}}{2}, \qquad \sin 2\pi n x = \frac{e^{2\pi i n x} - e^{-2\pi i n x}}{2i}. \]

One can explain Fourier series by considering a more general construction. Recall that an inner product on a complex vector space $H$ is a function $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{C}$ such that
(i) $\langle u, v\rangle = \overline{\langle v, u\rangle}$ for all $u, v \in H$,
(ii) for each $v \in H$, $u \mapsto \langle u, v\rangle$ is linear,
(iii) $\langle v, v\rangle \ge 0$ for all $v \in H$, with equality if and only if $v = 0$.

Given an inner product, one can define a norm on $H$ by setting $\|v\| = \sqrt{\langle v, v\rangle}$. One can then define a metric on $H$ by setting $d_H(u, v) = \|u - v\|$.

If $H$ is a complex vector space with an inner product $\langle \cdot, \cdot \rangle$ such that $H$ is complete with respect to the metric given by the inner product then we call $H$ a Hilbert space.

Recall that $L^2(X, \mathcal{B}, \mu)$ is a Hilbert space with the inner product
\[ \langle f, g\rangle = \int f \bar{g}\,d\mu. \]
The metric on $L^2(X, \mathcal{B}, \mu)$ is then given by
\[ d(f, g) = \left(\int |f - g|^2\,d\mu\right)^{1/2}. \]

Let $H$ be an infinite dimensional Hilbert space. We say that $\{e_j\}_{j=0}^\infty$ is an orthonormal basis for $H$ if:
(i) $\langle e_i, e_j\rangle = 0$ if $i \ne j$ and $\langle e_i, e_j\rangle = 1$ if $i = j$;
(ii) every $v \in H$ can be written in the form
\[ v = \sum_{j=0}^\infty c_j e_j. \tag{5.3.3} \]
As (5.3.3) involves an infinite sum, we need to be careful about what convergence means. To make (5.3.3) precise, let $s_n = \sum_{j=0}^n c_j e_j$ denote the $n$th partial sum. Then (5.3.3) means that $\|v - s_n\| \to 0$ as $n \to \infty$.

As the vectors $\{e_j\}_{j=0}^\infty$ are orthonormal, taking the inner product of (5.3.3) with $e_i$ shows that
\[ \langle v, e_i\rangle = \left\langle \sum_{j=0}^\infty c_j e_j, e_i \right\rangle = \sum_{j=0}^\infty c_j \langle e_j, e_i\rangle = c_i. \]

Let $X = \mathbb{R}/\mathbb{Z}$ be the circle and let $\mathcal{B}$ be the Borel σ-algebra. Let $\mu$ denote Lebesgue measure. Let $e_n(x) = e^{2\pi i n x}$. Then $\{e_n\}_{n=-\infty}^\infty$ is an orthonormal basis for the Hilbert space $L^2(X, \mathcal{B}, \mu)$. Thus if $f \in L^2(X, \mathcal{B}, \mu)$ then we can write
\[ f(x) = \sum_{n=-\infty}^\infty c_n e^{2\pi i n x} \]
(in the sense that the sequence of partial sums $L^2$-converges to $f$) where
\[ c_n = \langle f, e_n\rangle = \int f(x) e^{-2\pi i n x}\,d\mu. \tag{5.3.4} \]
If we want to make the dependence of $c_n$ on $f$ clear, then we will sometimes write $c_n(f)$ for $c_n$.

We shall need the following facts about Fourier coefficients.

Proposition 5.3.1
(i) Let $f, g \in L^2(X, \mathcal{B}, \mu)$. Then $f = g$ $\mu$-a.e. if and only if their Fourier coefficients are equal, i.e. $c_n(f) = c_n(g)$ for all $n \in \mathbb{Z}$.
(ii) Let $f \in L^2(X, \mathcal{B}, \mu)$. Then $c_n \to 0$ as $n \to \pm\infty$.

Remark. Proposition 5.3.1(ii) is better known as the Riemann-Lebesgue Lemma.

So far, we have studied Fourier series for functions defined on the circle; a similar construction works for functions defined on the $k$-dimensional torus. Let $X = \mathbb{R}^k/\mathbb{Z}^k$ be the $k$-dimensional torus equipped with the Borel σ-algebra and let $\mu$ denote Lebesgue measure on $X$. Then $L^2(X, \mathcal{B}, \mu)$ is a Hilbert space when equipped with the inner product
\[ \langle f, g\rangle = \int f \bar{g}\,d\mu. \]
Let $n = (n_1, \ldots, n_k) \in \mathbb{Z}^k$ and define $e_n(x) = e^{2\pi i \langle n, x\rangle}$ where $\langle n, x\rangle = n_1 x_1 + \cdots + n_k x_k$. Then $\{e_n\}_{n \in \mathbb{Z}^k}$ is an orthonormal basis for $L^2(X, \mathcal{B}, \mu)$.
Thus we can write $f \in L^2(X, \mathcal{B}, \mu)$ as
\[ f(x) = \sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle} \]
in the sense that the sequence of partial sums $s_N$ converges in $L^2(X, \mathcal{B}, \mu)$ where
\[ s_N(x) = \sum_{n = (n_1, \ldots, n_k) \in \mathbb{Z}^k,\ |n_j| \le N} c_n e^{2\pi i \langle n, x\rangle}. \]
The $n$th Fourier coefficient is given by
\[ c_n = c_n(f) = \int f(x) e^{-2\pi i \langle n, x\rangle}\,d\mu. \]

We have the following analogue of Proposition 5.3.1:

Proposition 5.3.2
(i) Let $f, g \in L^2(X, \mathcal{B}, \mu)$. Then $f = g$ $\mu$-a.e. if and only if their Fourier coefficients are equal.
(ii) Let $f \in L^2(X, \mathcal{B}, \mu)$. Let $n = (n_1, \ldots, n_k) \in \mathbb{Z}^k$ and define $\|n\| = \max_{1 \le j \le k} |n_j|$. Then $c_n \to 0$ as $\|n\| \to \infty$.

Remark. We could have used any norm on $\mathbb{Z}^k$ in (ii).

§5.4 Proving ergodicity using Fourier series

In the previous section we studied a number of examples of dynamical systems defined on the circle or the torus and we proved that Lebesgue measure is invariant. We show how Proposition 5.2.3 can be used in conjunction with Fourier series to determine whether Lebesgue measure is ergodic.

Recall that if $f \in L^2(X, \mathcal{B}, \mu)$ then we associate to $f$ the Fourier series
\[ \sum_{n=-\infty}^\infty c_n(f) e^{2\pi i n x} \]
where
\[ c_n(f) = \int f(x) e^{-2\pi i n x}\,d\mu. \]
If we let $s_n(x) = \sum_{\ell=-n}^n c_\ell(f) e^{2\pi i \ell x}$ then $\|s_n - f\|_2 \to 0$ as $n \to \infty$.

If $T$ is a measure-preserving transformation then it follows that
\begin{align*}
\|s_n \circ T - f \circ T\|_2 &= \left(\int |s_n \circ T - f \circ T|^2\,d\mu\right)^{1/2} = \left(\int \left(|s_n - f|^2\right) \circ T\,d\mu\right)^{1/2} \\
&= \left(\int |s_n - f|^2\,d\mu\right)^{1/2} = \|s_n - f\|_2 \to 0
\end{align*}
as $n \to \infty$, where we have used Lemma 4.1.1. By Proposition 5.3.1(i) it follows that, if $\lim_{n \to \infty} s_n \circ T$ is a possibly infinite sum of terms of the form $e^{2\pi i n x}$, then it must be the Fourier series of $f \circ T$. In practice, this means that if we take the Fourier series for $f(x)$ and evaluate it at $T(x)$, then we obtain the Fourier series for $f(Tx)$.

If $f \circ T = f$ almost everywhere, then we can use Proposition 5.3.1(i) to compare Fourier coefficients to obtain relationships between the Fourier coefficients, and then show that $f$ must be constant.

A similar method works for Fourier series on the torus, as we shall see.
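The principle just described can be checked numerically in a simple case. The sketch below is an illustration only, not part of the course's method: it takes the rotation $T(x) = x + \alpha \bmod 1$ of §4.4 and a trigonometric polynomial $f$, approximates Fourier coefficients by equally spaced Riemann sums, and compares $c_n(f \circ T)$ with $e^{2\pi i n \alpha} c_n(f)$, which is what evaluating the Fourier series of $f$ at $x + \alpha$ predicts. The test function, the value of $\alpha$, the sample count and all function names are assumptions made for this example.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=256):
    """Approximate c_n(f) = integral over [0,1] of f(x) e^{-2 pi i n x} dx
    by an equally spaced Riemann sum with the given number of samples."""
    total = 0j
    for m in range(samples):
        x = m / samples
        total += f(x) * cmath.exp(-2j * math.pi * n * x)
    return total / samples

def f(x):
    # Hypothetical test function: a trigonometric polynomial with frequencies up to 3.
    return 1.0 + 2.0 * math.cos(2 * math.pi * x) + math.sin(6 * math.pi * x)

alpha = math.sqrt(2) - 1  # an irrational rotation number, chosen for illustration

def f_rotated(x):
    # f composed with T(x) = x + alpha mod 1; f is 1-periodic, so no explicit mod is needed.
    return f(x + alpha)

# Compare c_n(f o T) with e^{2 pi i n alpha} c_n(f) for a few frequencies.
max_error = 0.0
for n in range(-3, 4):
    lhs = fourier_coefficient(f_rotated, n)
    rhs = cmath.exp(2j * math.pi * n * alpha) * fourier_coefficient(f, n)
    max_error = max(max_error, abs(lhs - rhs))

print(max_error)
```

For a trigonometric polynomial the Riemann sum reproduces the coefficients up to rounding error, which is why a modest sample count suffices here; for a general $f \in L^2$ the sum is only an approximation.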
§5.4.1 Rotations on a circle

Fix $\alpha \in \mathbb{R}$ and define $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ by $T(x) = x + \alpha \bmod 1$. We have already seen that $T$ preserves Lebesgue measure. The following result gives a necessary and sufficient condition for $T$ to be ergodic.

Theorem 5.4.1
Let $T(x) = x + \alpha \bmod 1$.
(i) If $\alpha \in \mathbb{Q}$ then $T$ is not ergodic with respect to Lebesgue measure.
(ii) If $\alpha \notin \mathbb{Q}$ then $T$ is ergodic with respect to Lebesgue measure.

Proof. Suppose that $\alpha \in \mathbb{Q}$ and write $\alpha = p/q$ for $p, q \in \mathbb{Z}$ with $q \ne 0$. Define
\[ f(x) = e^{2\pi i q x} \in L^2(X, \mathcal{B}, \mu). \]
Then $f$ is not constant but
\[ f(Tx) = e^{2\pi i q (x + p/q)} = e^{2\pi i (qx + p)} = e^{2\pi i q x} = f(x). \]
Hence $T$ is not ergodic.

Suppose that $\alpha \notin \mathbb{Q}$. Suppose that $f \in L^2(X, \mathcal{B}, \mu)$ is such that $f \circ T = f$ a.e. We want to prove that $f$ is constant. Suppose that $f$ has Fourier series
\[ \sum_{n=-\infty}^\infty c_n e^{2\pi i n x}. \]
Then $f \circ T$ has Fourier series
\[ \sum_{n=-\infty}^\infty c_n e^{2\pi i n \alpha} e^{2\pi i n x}. \]
Comparing Fourier coefficients we see that
\[ c_n = c_n e^{2\pi i n \alpha} \]
for all $n \in \mathbb{Z}$. As $\alpha \notin \mathbb{Q}$, we see that $e^{2\pi i n \alpha} \ne 1$ unless $n = 0$. Hence $c_n = 0$ for $n \ne 0$. Hence $f$ has Fourier series $c_0$, i.e. $f$ is constant a.e. □

§5.4.2 The doubling map

Let $X = \mathbb{R}/\mathbb{Z}$. Recall that if $f \in L^2(X, \mathcal{B}, \mu)$ has Fourier series
\[ \sum_{n=-\infty}^\infty c_n e^{2\pi i n x} \]
then the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)) tells us that $c_n \to 0$ as $|n| \to \infty$.

Proposition 5.4.2
The doubling map $T : X \to X$ defined by $T(x) = 2x \bmod 1$ is ergodic with respect to Lebesgue measure $\mu$.

Proof. Let $f \in L^2(X, \mathcal{B}, \mu)$ and suppose that $f \circ T = f$ $\mu$-a.e. Let $f$ have Fourier series
\[ f(x) = \sum_{n=-\infty}^\infty c_n e^{2\pi i n x}. \]
For each $p \ge 0$, $f \circ T^p$ has Fourier series
\[ \sum_{n=-\infty}^\infty c_n e^{2\pi i n 2^p x}. \]
Comparing Fourier coefficients we see that
\[ c_n = c_{2^p n} \]
for all $n \in \mathbb{Z}$ and each $p = 0, 1, 2, \ldots$. Suppose that $n \ne 0$. Then $|2^p n| \to \infty$ as $p \to \infty$. By the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)), $c_{2^p n} \to 0$ as $p \to \infty$. As $c_{2^p n} = c_n$, we must have that $c_n = 0$ for $n \ne 0$. Thus $f$ has Fourier series $c_0$, and so must be equal to a constant a.e.
Hence $T$ is ergodic with respect to $\mu$. □

§5.4.3 Toral endomorphisms

The argument for the doubling map can be generalised using higher-dimensional Fourier series to study toral endomorphisms. Let $X = \mathbb{R}^k/\mathbb{Z}^k$ and let $\mu$ denote Lebesgue measure. When $T$ is invertible (and so a linear toral automorphism) we have already seen that Lebesgue measure is an invariant measure; in §7 we shall see that Lebesgue measure is an invariant measure when $T$ is a linear toral endomorphism.

Recall that $f \in L^2(X, \mathcal{B}, \mu)$ has Fourier series
\[ \sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle}, \]
where $n = (n_1, \ldots, n_k)$, $x = (x_1, \ldots, x_k)$. Define $|n| = \max_{1 \le j \le k} |n_j|$. Then the Riemann-Lebesgue Lemma tells us that $c_n \to 0$ as $|n| \to \infty$.

Let $A$ be a $k \times k$ integer matrix with $\det A \ne 0$ and define $T : X \to X$ by
\[ T((x_1, \ldots, x_k) + \mathbb{Z}^k) = A(x_1, \ldots, x_k) + \mathbb{Z}^k. \]

Proposition 5.4.3
A linear toral endomorphism $T$ is ergodic with respect to $\mu$ if and only if no eigenvalue of $A$ is a root of unity.

Remark. In particular, hyperbolic toral automorphisms (i.e. $\det A = \pm 1$ and $A$ has no eigenvalues of modulus 1) are ergodic with respect to Lebesgue measure.

Proof. Suppose that $T$ is ergodic but, for a contradiction, that $A$ has a $p$th root of unity as an eigenvalue. We choose $p > 0$ to be the least such integer. Then $A^p$ has 1 as an eigenvalue, and so $n(A^p - I) = 0$ for some non-zero vector $n = (n_1, \ldots, n_k) \in \mathbb{R}^k$. Since $A$ is an integer matrix, we have that $A^p - I$ is an integer matrix, and so we can in fact take $n \in \mathbb{Z}^k$. Note that
\[ e^{2\pi i \langle n, A^p x\rangle} = e^{2\pi i \langle nA^p, x\rangle} = e^{2\pi i \langle n, x\rangle}. \]
This is because, writing $x = (x_1, \ldots, x_k)^T$,
\[ \langle n, Ax\rangle = (n_1, \ldots, n_k) \begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & & \vdots \\ a_{k,1} & \cdots & a_{k,k} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix} = \langle nA, x\rangle. \]
Define
\[ f(x) = \sum_{j=0}^{p-1} e^{2\pi i \langle n, A^j x\rangle}. \]
Then $f \in L^2(X, \mathcal{B}, \mu)$ and is $T$-invariant. Since $T$ is ergodic, we must have that $f$ is constant. But the only way in which this can happen is if $n = 0$, a contradiction.
Conversely suppose that no eigenvalue of $A$ is a root of unity; we prove that $T$ is ergodic with respect to Lebesgue measure. Suppose that $f \in L^2(X, \mathcal{B}, \mu)$ is $T$-invariant $\mu$-a.e. We show that $f$ is constant $\mu$-a.e. Associate to $f$ its Fourier series:
\[ \sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle}. \]
Since $f \circ T^p = f$ $\mu$-a.e., for all $p > 0$, we have that
\[ \sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle nA^p, x\rangle} = \sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle}. \]
Comparing Fourier coefficients we see that, for every $n \in \mathbb{Z}^k$,
\[ c_n = c_{nA} = \cdots = c_{nA^p} = \cdots. \]
If $c_n \ne 0$ then there can only be finitely many distinct indices in the above list, for otherwise it would contradict the fact that $c_n \to 0$ as $|n| \to \infty$, by the Riemann-Lebesgue Lemma (Proposition 5.3.2(ii)). Hence there exist $q_1 > q_2 \ge 0$ such that $nA^{q_1} = nA^{q_2}$. Letting $p = q_1 - q_2 > 0$ and using the fact that $A$ is invertible as a real matrix (as $\det A \ne 0$), we see that $nA^p = n$. Thus $n$ is either equal to 0 or $n$ is an eigenvector for $A^p$ with eigenvalue 1. In the latter case, $A$ would have a $p$th root of unity as an eigenvalue. Hence $n = 0$. Hence $c_n = 0$ unless $n = 0$ and so $f$ is equal to the constant $c_0$ $\mu$-a.e. Thus $T$ is ergodic. □

§5.5 Exercises

Exercise 5.1
Suppose that $\alpha \in \mathbb{Q}$. Show directly from the definition that the rotation $T(x) = x + \alpha \bmod 1$ is not ergodic, i.e. find an invariant set $B = T^{-1}B$, $B \in \mathcal{B}$, which has Lebesgue measure $0 < \mu(B) < 1$.

Exercise 5.2
Define $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ by
\[ T(x, y) = (x + \alpha,\ x + y). \]
Suppose that $\alpha \notin \mathbb{Q}$. By using Fourier series, show that $T$ is ergodic with respect to Lebesgue measure.

Exercise 5.3
Let $T : X \to X$ be a measurable transformation of a measurable space $(X, \mathcal{B})$. Suppose that $x = T^n x$ is a periodic point with period $n$. Define the measure $\mu$ supported on the periodic orbit of $x$ by
\[ \mu = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x} \]
where $\delta_x$ denotes the Dirac measure at $x$. Show from the definition of ergodicity that $\mu$ is an ergodic measure.

Exercise 5.4
(Part (iv) of this exercise is outside the scope of the course!)
It is easy to construct lots of examples of hyperbolic toral automorphisms (i.e.
no eigenvalues of modulus 1; the CAT map is such an example), which must necessarily be ergodic with respect to Lebesgue measure. It is harder to show that there are ergodic toral automorphisms with some eigenvalues of modulus 1.
(i) Show that to have an ergodic toral automorphism of $\mathbb{R}^k/\mathbb{Z}^k$ with an eigenvalue of modulus 1, we must have $k \ge 4$.
Consider the matrix
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 8 & -6 & 8 \end{pmatrix}. \]
(ii) Show that $A$ defines a linear toral automorphism $T_A$ of the 4-dimensional torus $\mathbb{R}^4/\mathbb{Z}^4$.
(iii) Show that $A$ has four eigenvalues, two of which have modulus 1.
(iv) Show that $T_A$ is ergodic with respect to Lebesgue measure. (Hint: you have to show that the two eigenvalues of modulus 1 are not roots of unity, i.e. are not solutions to $\lambda^n - 1 = 0$ for some $n$. The best way to do this is to use results from Galois theory on the irreducibility of polynomials.)

6. Ergodic measures: Using the Hahn-Kolmogorov Extension Theorem to prove ergodicity

§6.1 Introduction

We illustrate a method for proving that a given transformation is ergodic using the Hahn-Kolmogorov Extension Theorem. The key observation is the following technical lemma.

Lemma 6.1.1
Let $(X, \mathcal{B}, \mu)$ be a probability space and suppose that $\mathcal{A} \subset \mathcal{B}$ is an algebra that generates $\mathcal{B}$. Let $B \in \mathcal{B}$. Suppose there exists $K > 0$ such that
\[ \mu(B)\mu(I) \le K\mu(B \cap I) \tag{6.1.1} \]
for all $I \in \mathcal{A}$. Then $\mu(B) = 0$ or 1.

Proof. Let $\varepsilon > 0$. As $\mathcal{A}$ generates $\mathcal{B}$ there exists $I \in \mathcal{A}$ such that $\mu(B^c \triangle I) < \varepsilon$. Hence $|\mu(B^c) - \mu(I)| < \varepsilon$. Moreover, note that $B \cap I \subset B^c \triangle I$ so that $\mu(B \cap I) < \varepsilon$. Hence
\[ \mu(B)\mu(B^c) \le \mu(B)(\mu(I) + \varepsilon) \le \mu(B)\mu(I) + \mu(B)\varepsilon \le K\mu(B \cap I) + \varepsilon \le (K + 1)\varepsilon. \]
As $\varepsilon > 0$ is arbitrary, it follows that $\mu(B)\mu(B^c) = 0$. Hence $\mu(B) = 0$ or 1. □

Remark. We will often apply Lemma 6.1.1 when $\mathcal{A}$ is an algebra of finite unions of intervals or cylinders. In this case, we need only check that there exists a constant $K > 0$ such that (6.1.1) holds for intervals or cylinders.
To see this, let $I = \bigcup_{j=1}^k I_j$ be a finite union of pairwise disjoint sets in $\mathcal{A}$. Then if (6.1.1) holds for each $I_j$ then
\[ \mu(B)\mu(I) = \mu(B)\mu\Big(\bigcup_{j=1}^k I_j\Big) = \sum_{j=1}^k \mu(B)\mu(I_j) \leq K \sum_{j=1}^k \mu(B \cap I_j) = K\mu\Big(B \cap \bigcup_{j=1}^k I_j\Big) = K\mu(B \cap I). \]

We will also use the change of variables formula for integration. Recall that if $I, J \subset \mathbb{R}$ are intervals, $u : I \to J$ is a differentiable bijection, and $f : J \to \mathbb{R}$ is integrable, then
\[ \int_J f(x)\,dx = \int_I f(u(x))\,|u'(x)|\,dx. \]

§6.2 The doubling map

To illustrate the method, we give another proof that the doubling map is ergodic with respect to Lebesgue measure.

Let $X = [0, 1]$ be the unit interval, let $\mathcal{B}$ be the Borel σ-algebra, and let $\mu$ be Lebesgue measure. Given $x \in [0, 1]$, we can write $x$ as a base 2 'decimal' expansion:
\[ x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{2^{j+1}} \tag{6.2.1} \]
where $x_j \in \{0, 1\}$. Note that
\[ T(x) = 2 \sum_{j=0}^{\infty} \frac{x_j}{2^{j+1}} \bmod 1 = x_0 + \sum_{j=0}^{\infty} \frac{x_{j+1}}{2^{j+1}} \bmod 1 = \sum_{j=0}^{\infty} \frac{x_{j+1}}{2^{j+1}}. \]
Hence if $x$ has base 2 expansion given by (6.2.1) then $T(x)$ has base 2 expansion given by
\[ T(x) = \cdot x_1 x_2 x_3 \ldots \]
i.e. $T$ deletes the zeroth term in the base 2 expansion of $x$ and shifts the remaining terms one place to the left.

We introduce dyadic intervals or cylinders to be the sets
\[ I(i_0, i_1, \ldots, i_{n-1}) = \{x \in [0, 1] \mid x_j = i_j,\ j = 0, \ldots, n - 1\}. \]
(So, for example, $I(0) = [0, 1/2]$, $I(1) = [1/2, 1]$, $I(0, 0) = [0, 1/4]$, $I(0, 1) = [1/4, 1/2]$, etc.) We call $n$ the rank of the cylinder. A dyadic interval is an interval with end-points at $k/2^n$, $(k + 1)/2^n$ where $n \geq 1$ and $k \in \{0, 1, \ldots, 2^n - 1\}$.

Let $\mathcal{A}$ denote the algebra of finite unions of cylinders. Then $\mathcal{A}$ generates the Borel σ-algebra. This follows from Proposition 2.4.2 by noting that cylinders are intervals (and so Borel) and that they separate points: if $x, y \in [0, 1]$, $x \neq y$, then they have base 2 expansions that differ at some index, say $x_n \neq y_n$. Hence $x, y$ belong to different cylinders of rank $n + 1$.

Define the maps
\[ \phi_0(x) = \frac{x}{2}, \qquad \phi_1(x) = \frac{x + 1}{2}. \]
Then $\phi_0 : [0, 1] \to I(0)$ and $\phi_1 : [0, 1] \to I(1)$ are differentiable bijections. Indeed, if $x \in [0, 1]$ has base 2 expansion $x = \cdot x_0 x_1 x_2 \ldots$ then $\phi_0(x)$ and $\phi_1(x)$ have base 2 expansions given by
\[ \phi_0(x) = \cdot 0 x_0 x_1 x_2 \ldots, \qquad \phi_1(x) = \cdot 1 x_0 x_1 x_2 \ldots. \]
Thus $\phi_0$ and $\phi_1$ act on base 2 expansions as a shift to the right, inserting the digits 0 and 1 in the zeroth place, respectively. Note that $T\phi_0(x) = x$ and $T\phi_1(x) = x$ for all $x \in [0, 1]$.

Given $i_0, i_1, \ldots, i_{n-1} \in \{0, 1\}$, define
\[ \phi_{i_0, i_1, \ldots, i_{n-1}} : [0, 1] \to I(i_0, i_1, \ldots, i_{n-1}) \]
by
\[ \phi_{i_0, i_1, \ldots, i_{n-1}} = \phi_{i_0} \phi_{i_1} \cdots \phi_{i_{n-1}}. \tag{6.2.2} \]
Thus $\phi_{i_0, i_1, \ldots, i_{n-1}}$ takes the point $x$ with base 2 expansion given by (6.2.1), shifts the digits $n$ places to the right, and inserts the digits $i_0, i_1, \ldots, i_{n-1}$ in the first $n$ places. Note that $T^n \phi_{i_0, i_1, \ldots, i_{n-1}}(x) = x$ for all $x \in [0, 1]$.

We are now in a position to prove that $T$ is ergodic with respect to Lebesgue measure. Let $B \in \mathcal{B}$ be such that $T^{-1}B = B$. We must show that $\mu(B) = 0$ or $1$. By Lemma 6.1.1 and the Remark following it, it is sufficient to prove that there exists $K > 0$ such that $\mu(B)\mu(I) \leq K\mu(B \cap I)$ for all dyadic intervals $I$; in fact, we shall prove that $\mu(B)\mu(I) = \mu(B \cap I)$ for all dyadic intervals $I$.

Note that $T^{-n}B = B$. Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder of rank $n$ and let $\phi = \phi_{i_0, i_1, \ldots, i_{n-1}}$. Then $T^n \phi(x) = x$. Note also that $\mu(I) = 1/2^n$. We will also need the fact that $\phi'(x) = 1/2^n$ (this follows by noting that $\phi_0'(x) = \phi_1'(x) = 1/2$ and differentiating (6.2.2) using the chain rule).

Finally, we observe that
\begin{align*}
\mu(B \cap I) &= \int \chi_{B \cap I}(x)\,dx \\
&= \int \chi_B(x)\chi_I(x)\,dx \\
&= \int_I \chi_B(x)\,dx \\
&= \int_0^1 \chi_B(\phi(x))\phi'(x)\,dx && \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi(x))\phi'(x)\,dx && \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi(x)))\phi'(x)\,dx && \text{as } \chi_{T^{-n}B} = \chi_B \circ T^n \\
&= \int_0^1 \chi_B(x)\phi'(x)\,dx && \text{as } T^n\phi(x) = x \\
&= \frac{1}{2^n}\int_0^1 \chi_B(x)\,dx && \text{as } \phi'(x) = 1/2^n \\
&= \mu(I)\mu(B) && \text{as } \mu(I) = 1/2^n.
\end{align*}
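The inverse branches used in the computation above can be checked on a concrete example. The following is a minimal sketch in exact rational arithmetic; the function names (`T`, `phi`, `iterate`) and the sample digits and point are illustrative choices, not part of the notes. It confirms, for one cylinder of rank 3, that $T^n\phi(x) = x$ and that $\phi$ maps $[0, 1]$ onto an interval of length $1/2^n$.

```python
from fractions import Fraction

def T(x):
    """Doubling map T(x) = 2x mod 1."""
    return (2 * x) % 1

def phi(digits, x):
    """Inverse branch phi_{i_0} o phi_{i_1} o ... o phi_{i_{n-1}} applied to x."""
    # The innermost map phi_{i_{n-1}} is applied first, so run through the digits in reverse.
    for i in reversed(digits):
        x = (x + i) / 2  # phi_i(x) = (x + i) / 2 for i in {0, 1}
    return x

def iterate(f, n, x):
    """Apply f to x a total of n times."""
    for _ in range(n):
        x = f(x)
    return x

digits = (1, 0, 1)            # the cylinder I(1, 0, 1)
n = len(digits)
x = Fraction(3, 7)            # an arbitrary sample point in [0, 1)
y = phi(digits, x)            # a point of I(1, 0, 1)

left = phi(digits, Fraction(0))    # left end-point of the cylinder
right = phi(digits, Fraction(1))   # right end-point of the cylinder

print(y, iterate(T, n, y), left, right)
```

Exact fractions are used rather than floating point because repeated doubling of a float discards one binary digit per step, which would obscure the identity being checked.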
Hence $\mu(B \cap I) = \mu(B)\mu(I)$ for all cylinders $I$ and so, by the Remark following Lemma 6.1.1, (6.1.1) holds with $K = 1$ for all sets $I$ in the algebra of finite unions of cylinders. By Lemma 6.1.1 it follows that $\mu(B) = 0$ or $1$. Hence Lebesgue measure is an ergodic measure for $T$.

§6.3 The Gauss map

Let $x \in [0, 1]$. If $x$ has continued fraction expansion
\[ x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}} \]
then for brevity we write $x = [x_0, x_1, x_2, \ldots]$.

Let $X = [0, 1]$ and recall that the Gauss map is defined by $T(x) = 1/x \bmod 1$ (with $T$ defined at 0 by setting $T(0) = 0$). If $x$ has continued fraction expansion $[x_0, x_1, x_2, \ldots]$ then $T(x)$ has continued fraction expansion $[x_1, x_2, \ldots]$. We have already seen that $T$ leaves Gauss' measure $\mu$ invariant, where Gauss' measure is defined by
\[ \mu(B) = \frac{1}{\log 2} \int_B \frac{1}{1 + x}\,dx. \]

We shall find it convenient to swap between Gauss' measure $\mu$ and Lebesgue measure, which we shall denote here by $\lambda$. Recall from Exercise 3.5 that for any set $B \in \mathcal{B}$ we have
\[ \frac{1}{2\log 2}\lambda(B) \leq \mu(B) \leq \frac{1}{\log 2}\lambda(B). \]
Hence $\mu(B) = 0$ if and only if $\lambda(B) = 0$. Thus to prove ergodicity it suffices to show that any $T$-invariant set $B$ has either $\lambda(B) = 0$ or $\lambda(B^c) = 0$.

We shall also need some basic facts about continued fractions. Let $x \in (0, 1)$ be irrational and have continued fraction expansion $[x_0, x_1, \ldots]$. For any $t \in [0, 1]$, write
\[ [x_0, x_1, \ldots, x_{n-1} + t] = \frac{P_n(x_0, x_1, \ldots, x_{n-1}; t)}{Q_n(x_0, x_1, \ldots, x_{n-1}; t)} \]
where $P_n(x_0, x_1, \ldots, x_{n-1}; t)$ and $Q_n(x_0, x_1, \ldots, x_{n-1}; t)$ are polynomials in $x_0, x_1, \ldots, x_{n-1}$ and $t$. Let $P_n = P_n(x_0, x_1, \ldots, x_{n-1}; 0)$, $Q_n = Q_n(x_0, x_1, \ldots, x_{n-1}; 0)$ (we suppress the dependence of $P_n$ and $Q_n$ on $x_0, \ldots, x_{n-1}$ for brevity). The following lemma is easily proved using induction.

Lemma 6.3.1
(i) We have
\[ P_n(x_0, x_1, \ldots, x_{n-1}; t) = P_n + tP_{n-1}, \qquad Q_n(x_0, x_1, \ldots, x_{n-1}; t) = Q_n + tQ_{n-1}, \]
and the following recurrence relations hold:
\[ P_{n+1} = x_n P_n + P_{n-1}, \qquad Q_{n+1} = x_n Q_n + Q_{n-1} \]
with initial conditions $P_0 = 0$, $P_1 = 1$, $Q_0 = 1$, $Q_1 = x_0$.
(ii) The following identity holds:
\[ Q_n P_{n-1} - Q_{n-1} P_n = (-1)^n. \]

Let $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$. Define the cylinder $I(i_0, i_1, \ldots, i_{n-1})$ to be the set of all points $x \in (0, 1)$ whose continued fraction expansion starts with $i_0, \ldots, i_{n-1}$. This is easily seen to be an interval; indeed
\[ I(i_0, i_1, \ldots, i_{n-1}) = \{[i_0, i_1, \ldots, i_{n-1} + t] \mid t \in [0, 1)\}. \]
Let $\mathcal{A}$ denote the algebra of finite unions of cylinders. Then $\mathcal{A}$ generates the Borel σ-algebra. (This follows from Proposition 2.4.2: cylinders are clearly Borel sets and they separate points. To see this, note that if $x \neq y$ then they have different continued fraction expansions. Hence there exists $n$ such that $x_n \neq y_n$. Hence $x, y$ are in different cylinders of rank $n + 1$, and these cylinders are disjoint.)

For each $i \in \mathbb{N}$ define the map $\phi_i : [0, 1) \to I(i)$ by
\[ \phi_i(x) = \frac{1}{i + x}. \]
Thus if $x$ has continued fraction expansion $[x_0, x_1, \ldots]$ then $\phi_i(x)$ has continued fraction expansion $[i, x_0, x_1, \ldots]$. Clearly $T(\phi_i(x)) = x$ for all $x \in [0, 1)$.

For $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$, define
\[ \phi_{i_0, i_1, \ldots, i_{n-1}} = \phi_{i_0} \phi_{i_1} \cdots \phi_{i_{n-1}} : [0, 1) \to I(i_0, i_1, \ldots, i_{n-1}). \]
Then $\phi_{i_0, i_1, \ldots, i_{n-1}}(x)$ takes the continued fraction expansion of $x$, shifts every digit $n$ places to the right, and inserts the digits $i_0, i_1, \ldots, i_{n-1}$ in the first $n$ places. Clearly $T^n(\phi_{i_0, i_1, \ldots, i_{n-1}}(x)) = x$ for all $x \in [0, 1)$.

We first need an estimate on the length of (i.e. the Lebesgue measure of) the cylinder $I(i_0, i_1, \ldots, i_{n-1})$. Note that
\[ \phi_{i_0, i_1, \ldots, i_{n-1}}(t) = \frac{P_n(i_0, \ldots, i_{n-1}; t)}{Q_n(i_0, \ldots, i_{n-1}; t)} = \frac{P_n + tP_{n-1}}{Q_n + tQ_{n-1}}. \]
Differentiating this expression with respect to $t$ and using Lemma 6.3.1(ii), we see that
\[ |\phi'_{i_0, i_1, \ldots, i_{n-1}}(t)| = \left| \frac{Q_n P_{n-1} - P_n Q_{n-1}}{(Q_n + tQ_{n-1})^2} \right| = \frac{1}{(Q_n + tQ_{n-1})^2}. \]
It follows from the recurrence relation in Lemma 6.3.1(i) that $Q_{n-1} \leq Q_n$, so that $Q_n + Q_{n-1} \leq 2Q_n$. Hence
\[ \frac{1}{4Q_n^2} \leq \frac{1}{(Q_n + Q_{n-1})^2} \leq |\phi'_{i_0, i_1, \ldots, i_{n-1}}(t)| \leq \frac{1}{Q_n^2}. \tag{6.3.1} \]
Hence
\[ \lambda(I(i_0, i_1, \ldots, i_{n-1})) = \int \chi_{I(i_0, i_1, \ldots, i_{n-1})}(t)\,dt = \int_{I(i_0, i_1, \ldots, i_{n-1})} dt = \int_0^1 |\phi'_{i_0, i_1, \ldots, i_{n-1}}(t)|\,dt \tag{6.3.2} \]
where we have used the change of variables formula. Combining (6.3.2) with (6.3.1) we see that
\[ \frac{1}{4Q_n^2} \leq \lambda(I(i_0, i_1, \ldots, i_{n-1})) \leq \frac{1}{Q_n^2}. \tag{6.3.3} \]

We can now prove that the Gauss map is ergodic with respect to Gauss' measure $\mu$. Suppose that $T^{-1}B = B$ where $B \in \mathcal{B}$. Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder and write $\phi = \phi_{i_0, i_1, \ldots, i_{n-1}}$. Then
\begin{align*}
\lambda(B \cap I) &= \int_I \chi_B(x)\,dx \\
&= \int_0^1 \chi_B(\phi(x))\,|\phi'(x)|\,dx && \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi(x))\,|\phi'(x)|\,dx && \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi(x)))\,|\phi'(x)|\,dx && \text{as } \chi_{T^{-n}B} = \chi_B \circ T^n \\
&= \int_0^1 \chi_B(x)\,|\phi'(x)|\,dx && \text{as } T^n\phi(x) = x.
\end{align*}
By (6.3.1) and (6.3.3) it follows that
\[ \lambda(B \cap I) \geq \frac{1}{4Q_n^2}\lambda(B) \geq \frac{1}{4}\lambda(B)\lambda(I) \]
so that
\[ \lambda(B)\lambda(I(i_0, i_1, \ldots, i_{n-1})) \leq 4\lambda(B \cap I(i_0, i_1, \ldots, i_{n-1})). \]
By Lemma 6.1.1 it follows that $\lambda(B) = 0$ or $\lambda(B^c) = 0$. Hence, as Lebesgue measure and Gauss' measure have the same sets of measure zero, it follows that either $\mu(B) = 0$ or $\mu(B^c) = 0$. Hence $T$ is ergodic with respect to Gauss' measure.

§6.4 Bernoulli shifts

Let $S = \{1, \ldots, k\}$ be a finite set of symbols and let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in \{1, 2, \ldots, k\}\}$ denote the shift space on $k$ symbols. Let $\sigma : \Sigma \to \Sigma$ denote the left shift map, so that $(\sigma(x))_j = x_{j+1}$. Recall that we defined the cylinder $[i_0, \ldots, i_{n-1}]$ to be the set of all sequences in $\Sigma$ that start with symbols $i_0, \ldots, i_{n-1}$, that is
\[ [i_0, \ldots, i_{n-1}] = \{x = (x_j)_{j=0}^{\infty} \in \Sigma \mid x_j = i_j,\ j = 0, 1, \ldots, n - 1\}. \]
Let $p = (p(1), \ldots, p(k))$ be a probability vector (that is, $p(j) > 0$, $\sum_{j=1}^k p(j) = 1$). We defined the Bernoulli measure $\mu_p$ on cylinders by setting
\[ \mu_p[i_0, \ldots, i_{n-1}] = p(i_0)p(i_1) \cdots p(i_{n-1}). \]
We have already seen that $\mu_p$ is a $\sigma$-invariant measure.

Proposition 6.4.1
Let $\mu_p$ be a Bernoulli measure. Then $\mu_p$ is ergodic.

Proof. We first make the following observation: let $I = [i_0, \ldots, i_{p-1}]$, $J = [j_0, \ldots, j_{q-1}]$ be cylinders of ranks $p, q$, respectively. Consider $I \cap \sigma^{-n}J$ where $n \geq p$. Then
\begin{align*}
I \cap \sigma^{-n}J &= \{x = (x_\ell)_{\ell=0}^{\infty} \in \Sigma \mid x_\ell = i_\ell \text{ for } \ell = 0, 1, \ldots, p - 1,\ x_{\ell+n} = j_\ell \text{ for } \ell = 0, 1, \ldots, q - 1\} \\
&= \bigcup_{x_p, \ldots, x_{n-1}} [i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}],
\end{align*}
a disjoint union. Hence
\begin{align*}
\mu_p(I \cap \sigma^{-n}J) &= \sum_{x_p, \ldots, x_{n-1}} \mu_p[i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}] \\
&= \sum_{x_p, \ldots, x_{n-1}} p(i_0)p(i_1) \cdots p(i_{p-1})\,p(x_p) \cdots p(x_{n-1})\,p(j_0)p(j_1) \cdots p(j_{q-1}) \\
&= p(i_0)p(i_1) \cdots p(i_{p-1})\,p(j_0)p(j_1) \cdots p(j_{q-1}) && \text{as } \sum_{x_p} p(x_p) = \cdots = \sum_{x_{n-1}} p(x_{n-1}) = 1 \\
&= \mu_p(I)\mu_p(J). \tag{6.4.1}
\end{align*}

Let $B \in \mathcal{B}$ be $\sigma$-invariant. By Lemma 6.1.1 it is sufficient to prove that $\mu_p(B)\mu_p(I) \leq \mu_p(B \cap I)$ for each cylinder $I$. Let $\varepsilon > 0$. We first approximate the invariant set $B$ by a finite union of cylinders. By Proposition 2.4.4, we can find a finite disjoint union of cylinders $A = \bigcup_{j=1}^r J_j$ such that $\mu_p(B \triangle A) < \varepsilon$. Note that $|\mu_p(A) - \mu_p(B)| < \varepsilon$.

Let $n$ be any integer greater than the rank of $I$. Note that $\sigma^{-n}B \triangle \sigma^{-n}A = \sigma^{-n}(B \triangle A)$. Hence
\[ \mu_p(\sigma^{-n}B \triangle \sigma^{-n}A) = \mu_p(\sigma^{-n}(B \triangle A)) = \mu_p(B \triangle A) < \varepsilon, \]
where we have used the facts that $\sigma^{-n}B = B$ and that $\mu_p$ is an invariant measure.

As $A = \bigcup_{j=1}^r J_j$ is a finite union of cylinders and $n$ is greater than the rank of $I$, it follows from (6.4.1) that
\begin{align*}
\mu_p(\sigma^{-n}A \cap I) &= \mu_p\Big(\sigma^{-n}\bigcup_{j=1}^r J_j \cap I\Big) = \sum_{j=1}^r \mu_p(\sigma^{-n}J_j \cap I) \\
&= \sum_{j=1}^r \mu_p(J_j)\mu_p(I) = \mu_p\Big(\bigcup_{j=1}^r J_j\Big)\mu_p(I) = \mu_p(A)\mu_p(I).
\end{align*}
Finally, note that $(\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I) \subset (\sigma^{-n}A) \triangle (\sigma^{-n}B)$. Hence $\mu_p((\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I)) < \varepsilon$ so that $\mu_p(\sigma^{-n}A \cap I) < \mu_p(\sigma^{-n}B \cap I) + \varepsilon$.
Hence
\begin{align*}
\mu_p(B)\mu_p(I) &= \mu_p(\sigma^{-n}B)\mu_p(I) \\
&\leq \mu_p(\sigma^{-n}A)\mu_p(I) + \varepsilon \\
&= \mu_p(\sigma^{-n}A \cap I) + \varepsilon \\
&\leq \mu_p(\sigma^{-n}B \cap I) + 2\varepsilon \\
&= \mu_p(B \cap I) + 2\varepsilon.
\end{align*}
As $\varepsilon > 0$ is arbitrary, we have that $\mu_p(B)\mu_p(I) \leq \mu_p(B \cap I)$ for any cylinder $I$. By Lemma 6.1.1, it follows that $\mu_p(B) = 0$ or $1$. Hence $\mu_p$ is ergodic. □

§6.5 Markov shifts

Let $P$ be an irreducible stochastic $k \times k$ matrix with entries $P(i, j)$. Let $p = (p(1), \ldots, p(k))$ be the unique left probability eigenvector corresponding to eigenvalue 1, so that $pP = p$. Recall that the Markov measure $\mu_P$ is defined on the Borel σ-algebra by defining it on cylinders in the following way:
\[ \mu_P([i_0, i_1, \ldots, i_{n-1}]) = p(i_0)P(i_0, i_1)P(i_1, i_2) \cdots P(i_{n-2}, i_{n-1}). \]
We have seen that $\mu_P$ is an invariant measure for the shift map $\sigma$. We can adapt the proof of Proposition 6.4.1 to show that $\mu_P$ is ergodic.

Proposition 6.5.1
Let $P$ be an irreducible stochastic matrix. Then the corresponding Markov measure $\mu_P$ is ergodic.

Proof (not examinable). Let $d$ denote the period of $P$. Let $I = [i_0, \ldots, i_{p-1}]$, $J = [j_0, \ldots, j_{q-1}]$ be cylinders of ranks $p, q$, respectively. Consider $I \cap \sigma^{-n}J$ where $n \geq p$. Then
\begin{align*}
I \cap \sigma^{-n}J &= \{x \in \Sigma \mid x_\ell = i_\ell \text{ for } \ell = 0, 1, \ldots, p - 1,\ x_{\ell+n} = j_\ell \text{ for } \ell = 0, 1, \ldots, q - 1\} \\
&= \bigcup_{x_p, \ldots, x_{n-1}} [i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}],
\end{align*}
a disjoint union. Hence
\begin{align*}
\mu_P(I \cap \sigma^{-n}J) &= \sum_{x_p, \ldots, x_{n-1}} \mu_P[i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}] \\
&= \sum_{x_p, \ldots, x_{n-1}} p(i_0)P(i_0, i_1) \cdots P(i_{p-2}, i_{p-1})\,P(i_{p-1}, x_p)P(x_p, x_{p+1}) \cdots P(x_{n-1}, j_0) \\
&\qquad\qquad \times P(j_0, j_1) \cdots P(j_{q-2}, j_{q-1}) \\
&= \mu_P(I)\mu_P(J)\,\frac{1}{p(j_0)} \sum_{x_p, \ldots, x_{n-1}} P(i_{p-1}, x_p)P(x_p, x_{p+1}) \cdots P(x_{n-1}, j_0) \\
&= \mu_P(I)\mu_P(J)\,\frac{P^{n-p+1}(i_{p-1}, j_0)}{p(j_0)}.
\end{align*}
By the Perron-Frobenius Theorem (Theorem 3.3.6), we know that $P^{nd}(i, j) \to p(j)$ as $n \to \infty$. Hence, letting $n \to \infty$ through an appropriate subsequence, we see that
\[ \mu_P(I \cap \sigma^{-n}J) \to \mu_P(I)\mu_P(J). \tag{6.5.1} \]

The remainder of the proof is almost identical to the proof of Proposition 6.4.1. Let $B \in \mathcal{B}$ be $\sigma$-invariant. By Lemma 6.1.1 it is sufficient to prove that $\mu_P(B)\mu_P(I) \leq \mu_P(B \cap I)$ for every cylinder $I$. Let $\varepsilon > 0$. We approximate $B$ by a finite union of cylinders by using Proposition 2.4.4. That is, we can find a finite disjoint union of cylinders $A = \bigcup_{j=1}^r J_j$ such that $\mu_P(B \triangle A) < \varepsilon$. Note that $|\mu_P(A) - \mu_P(B)| < \varepsilon$.

Let $n$ be any integer greater than the rank of $I$. Note that $\sigma^{-n}B \triangle \sigma^{-n}A = \sigma^{-n}(B \triangle A)$. Hence
\[ \mu_P(\sigma^{-n}B \triangle \sigma^{-n}A) = \mu_P(\sigma^{-n}(B \triangle A)) = \mu_P(B \triangle A) < \varepsilon \]
where we have used the facts that $\sigma^{-n}B = B$ and that $\mu_P$ is an invariant measure.

As $A = \bigcup_{j=1}^r J_j$ is a finite union of cylinders, it follows from (6.5.1) that by choosing $n$ sufficiently large (along the subsequence), we have that $\mu_P(J_j)\mu_P(I) \leq \mu_P(\sigma^{-n}J_j \cap I) + \varepsilon/r$ for $j = 1, 2, \ldots, r$. Hence, using the invariance of $\mu_P$,
\begin{align*}
\mu_P(\sigma^{-n}A)\mu_P(I) &= \mu_P\Big(\sigma^{-n}\bigcup_{j=1}^r J_j\Big)\mu_P(I) = \sum_{j=1}^r \mu_P(\sigma^{-n}J_j)\mu_P(I) = \sum_{j=1}^r \mu_P(J_j)\mu_P(I) \\
&\leq \sum_{j=1}^r \mu_P(\sigma^{-n}J_j \cap I) + \varepsilon = \mu_P\Big(\sigma^{-n}\bigcup_{j=1}^r J_j \cap I\Big) + \varepsilon \\
&= \mu_P(\sigma^{-n}A \cap I) + \varepsilon.
\end{align*}
Finally, note that $(\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I) \subset (\sigma^{-n}A) \triangle (\sigma^{-n}B)$. Hence $\mu_P((\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I)) < \varepsilon$ so that $\mu_P(\sigma^{-n}A \cap I) < \mu_P(\sigma^{-n}B \cap I) + \varepsilon$. Hence
\begin{align*}
\mu_P(B)\mu_P(I) &= \mu_P(\sigma^{-n}B)\mu_P(I) \\
&\leq \mu_P(\sigma^{-n}A)\mu_P(I) + \varepsilon \\
&\leq \mu_P(\sigma^{-n}A \cap I) + 2\varepsilon \\
&\leq \mu_P(\sigma^{-n}B \cap I) + 3\varepsilon \\
&= \mu_P(B \cap I) + 3\varepsilon.
\end{align*}
As $\varepsilon > 0$ was arbitrary, we have that $\mu_P(B)\mu_P(I) \leq \mu_P(B \cap I)$. By Lemma 6.1.1, the result follows. □

§6.6 Exercises

Exercise 6.1
The dynamical system $T : [0, 1] \to [0, 1]$ defined by
\[ T(x) = \begin{cases} 2x & \text{if } 0 \leq x \leq 1/2 \\ 2(1 - x) & \text{if } 1/2 \leq x \leq 1 \end{cases} \]
is called the tent map.

(i) Prove that $T$ preserves Lebesgue measure.

(ii) Prove that $T$ is ergodic with respect to Lebesgue measure.

Exercise 6.2
Recall that the Lüroth map $T : [0, 1] \to [0, 1]$ is defined to be
\[ T(x) = \begin{cases} n(n + 1)x - n & \text{if } x \in \left( \dfrac{1}{n + 1}, \dfrac{1}{n} \right] \\ 0 & \text{if } x = 0. \end{cases} \]
We saw in Exercise 3.9 that Lebesgue measure is a $T$-invariant probability measure. Prove that Lebesgue measure is ergodic.
Exercise 6.3
Prove (using induction on $n$) Lemma 6.3.1.

Exercise 6.4
Let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in \{0, 1\}\}$ and let $\sigma : \Sigma \to \Sigma$, $(\sigma(x))_j = x_{j+1}$ be the shift map on the space of infinite sequences of two symbols $\{0, 1\}$. Note that $\Sigma$ supports uncountably many different $\sigma$-invariant measures (for example, the Bernoulli-$(p, 1 - p)$ measures are all ergodic and all distinct for $p \in (0, 1)$). We will use this observation to prove that the doubling map has uncountably many ergodic measures.

Define $\pi : \Sigma \to \mathbb{R}/\mathbb{Z}$ by
\[ \pi(x) = \pi(x_0, x_1, \ldots) = \frac{x_0}{2} + \frac{x_1}{2^2} + \cdots + \frac{x_n}{2^{n+1}} + \cdots. \]

(i) Show that $\pi$ is continuous.

(ii) Let $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ be the doubling map: $T(x) = 2x \bmod 1$. Show that $\pi \circ \sigma = T \circ \pi$.

(iii) If $\mu$ is a $\sigma$-invariant probability measure on $\Sigma$, show that $\pi_*\mu$ (where $\pi_*\mu(B) = \mu(\pi^{-1}B)$ for a Borel subset $B \subset \mathbb{R}/\mathbb{Z}$) is a $T$-invariant probability measure on $\mathbb{R}/\mathbb{Z}$. (Lebesgue measure on $\mathbb{R}/\mathbb{Z}$ corresponds to choosing $\mu$ to be the Bernoulli-$(1/2, 1/2)$ measure on $\Sigma$.)

(iv) Show that if $\mu$ is an ergodic measure for $\sigma$, then $\pi_*\mu$ is an ergodic measure for $T$.

(v) Conclude that there are uncountably many different ergodic measures for the doubling map.

7. Continuous transformations on compact metric spaces

§7.1 Introduction

So far, we have been studying a measurable map $T$ defined on a probability space $(X, \mathcal{B}, \mu)$. We have asked whether the given measure $\mu$ is invariant or ergodic. In this section, we shift our focus slightly and consider, for a given transformation $T : X \to X$, the space $M(X, T)$ of all probability measures that are invariant under $T$. In order to equip $M(X, T)$ with some structure we will need to assume that the underlying space $X$ is itself equipped with some additional structure other than merely being a measure space. Throughout this section we will work in the context of $X$ being a compact metric space and $T$ being a continuous transformation.
§7.2 Probability measures on compact metric spaces

Let $X$ be a compact metric space equipped with the Borel σ-algebra $\mathcal{B}$. (Recall that the Borel σ-algebra is the smallest σ-algebra that contains all the open subsets of $X$.)

Let
\[ C(X, \mathbb{R}) = \{f : X \to \mathbb{R} \mid f \text{ is continuous}\} \]
denote the space of real-valued continuous functions defined on $X$. Define the uniform norm of $f \in C(X, \mathbb{R})$ by
\[ \|f\|_\infty = \sup_{x \in X} |f(x)|. \]
With this norm, $C(X, \mathbb{R})$ is a Banach space.

An important property of $C(X, \mathbb{R})$ that will prove to be useful later on is that it is separable: $C(X, \mathbb{R})$ contains a countable dense subset. Thus we can choose a sequence $\{f_n \in C(X, \mathbb{R})\}_{n=1}^{\infty}$ such that, for all $f \in C(X, \mathbb{R})$ and all $\varepsilon > 0$, there exists $n$ such that $\|f - f_n\|_\infty < \varepsilon$.

Let $M(X)$ denote the set of all Borel probability measures on $(X, \mathcal{B})$. It will be very important to have a sensible notion of convergence in $M(X)$; the appropriate notion for us is called weak* convergence. We say that a sequence of probability measures $\mu_n$ weak* converges to $\mu$, as $n \to \infty$, if, for every $f \in C(X, \mathbb{R})$,
\[ \int f\,d\mu_n \to \int f\,d\mu, \quad \text{as } n \to \infty. \]
If $\mu_n$ weak* converges to $\mu$ then we write $\mu_n \rightharpoonup \mu$.

We can make $M(X)$ into a metric space compatible with this definition of convergence by choosing a countable dense subset $\{f_n\}_{n=1}^{\infty} \subset C(X, \mathbb{R})$ and, for $\mu_1, \mu_2 \in M(X)$, setting
\[ d_{M(X)}(\mu_1, \mu_2) = \sum_{n=1}^{\infty} \frac{1}{2^n \|f_n\|_\infty} \left| \int f_n\,d\mu_1 - \int f_n\,d\mu_2 \right| \]
(we can assume that $f_n \not\equiv 0$ for any $n$). It is easy to check that $\mu_n \rightharpoonup \mu$ if and only if $d_{M(X)}(\mu_n, \mu) \to 0$. However, we will not need to work with a particular metric: what will be important is the definition of convergence.

Remark. Note that with this definition it is not necessarily true that $\mu_n(B) \to \mu(B)$, as $n \to \infty$, for $B \in \mathcal{B}$.

§7.2.1 The Riesz Representation Theorem

Let $\mu \in M(X)$ be a Borel probability measure. Then we can think of $\mu$ as a functional that acts on $C(X, \mathbb{R})$, that is we can regard $\mu$ as a map
\[ \mu : C(X, \mathbb{R}) \to \mathbb{R} : f \mapsto \int f\,d\mu. \]
We will often write $\mu(f)$ for $\int f\,d\mu$.
Notice that this functional enjoys several natural properties:

(i) the functional defined by $\mu$ is linear:
\[ \mu(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1\mu(f_1) + \lambda_2\mu(f_2) \]
where $\lambda_1, \lambda_2 \in \mathbb{R}$ and $f_1, f_2 \in C(X, \mathbb{R})$;

(ii) the functional defined by $\mu$ is bounded: i.e. if $f \in C(X, \mathbb{R})$ then $|\mu(f)| \leq \|f\|_\infty$;

(iii) if $f \geq 0$ then $\mu(f) \geq 0$ (we say that the functional $\mu$ is positive);

(iv) consider the function $\mathbf{1}$ defined by $\mathbf{1}(x) \equiv 1$ for all $x$; then $\mu(\mathbf{1}) = 1$ (we say that the functional $\mu$ is normalised).

The Riesz Representation Theorem says that the above properties characterise all Borel probability measures on $X$. That is, if we have a map $w : C(X, \mathbb{R}) \to \mathbb{R}$ that satisfies the above four properties, then $w$ must be given by integrating with respect to a Borel probability measure. This will be a very useful method of constructing measures: we need only construct bounded positive normalised linear functionals.

Theorem 7.2.1 (Riesz Representation Theorem)
Let $w : C(X, \mathbb{R}) \to \mathbb{R}$ be a functional such that:

(i) $w$ is linear: i.e. $w(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1 w(f_1) + \lambda_2 w(f_2)$;

(ii) $w$ is bounded: i.e. for all $f \in C(X, \mathbb{R})$ we have $|w(f)| \leq \|f\|_\infty$;

(iii) $w$ is positive: i.e. if $f \geq 0$ then $w(f) \geq 0$;

(iv) $w$ is normalised: i.e. $w(\mathbf{1}) = 1$.

Then there exists a Borel probability measure $\mu \in M(X)$ such that
\[ w(f) = \int f\,d\mu. \]
Moreover, $\mu$ is unique.

Thus the Riesz Representation Theorem says that "if it looks like integration on continuous functions, then it is integration with respect to a (unique) Borel probability measure."

§7.2.2 Properties of $M(X)$

First note that the space $M(X)$ of Borel probability measures on the compact metric space $X$ is non-empty (provided $X \neq \emptyset$). This is because, for each $x \in X$, the Dirac measure $\delta_x$ is a Borel probability measure. Indeed, we have the following result:

Proposition 7.2.2
There is a continuous embedding of $X$ in $M(X)$ given by the map $X \to M(X) : x \mapsto \delta_x$, i.e. if $x_n \to x$ then $\delta_{x_n} \rightharpoonup \delta_x$.

Proof. See Exercise 7.1.
□

Recall that a subset $C$ of a vector space is convex if whenever $v_1, v_2 \in C$ and $\alpha \in [0, 1]$ then $\alpha v_1 + (1 - \alpha)v_2 \in C$.

Proposition 7.2.3
The space $M(X)$ is convex.

Proof. Let $\mu_1, \mu_2 \in M(X)$, $\alpha \in [0, 1]$. Then it is easy to check that $\alpha\mu_1 + (1 - \alpha)\mu_2$, defined by
\[ (\alpha\mu_1 + (1 - \alpha)\mu_2)(B) = \alpha\mu_1(B) + (1 - \alpha)\mu_2(B), \]
is a Borel probability measure. □

Finally, recall that a metric space $K$ is said to be (sequentially) compact if every sequence of points in $K$ has a convergent subsequence.

Proposition 7.2.4
The space $M(X)$ is weak* compact.

Proof. For convenience, we shall write $\mu(f) = \int f\,d\mu$. Since $C(X, \mathbb{R})$ is separable, we can choose a countable dense subset of functions $\{f_i\}_{i=1}^{\infty} \subset C(X, \mathbb{R})$. Given a sequence $\mu_n \in M(X)$, we shall first consider the sequence of real numbers $\mu_n(f_1) \in \mathbb{R}$. We have that $|\mu_n(f_1)| \leq \|f_1\|_\infty$ for all $n$, so $\mu_n(f_1)$ is a bounded sequence of real numbers. As such, it has a convergent subsequence, $\mu_n^{(1)}(f_1)$ say.

Next we apply the sequence of measures $\mu_n^{(1)}$ to $f_2$ and consider the sequence $\mu_n^{(1)}(f_2) \in \mathbb{R}$. Again, this is a bounded sequence of real numbers and so it has a convergent subsequence $\mu_n^{(2)}(f_2)$.

In this way we obtain, for each $i \geq 1$, nested subsequences $\{\mu_n^{(i)}\} \subset \{\mu_n^{(i-1)}\}$ such that $\mu_n^{(i)}(f_j)$ converges for $1 \leq j \leq i$. Now consider the diagonal sequence $\mu_n^{(n)}$. Since, for $n \geq i$, $\mu_n^{(n)}$ is a subsequence of $\mu_n^{(i)}$, $\mu_n^{(n)}(f_i)$ converges for every $i \geq 1$.

We can now use the fact that $\{f_i\}$ is dense to show that $\mu_n^{(n)}(f)$ converges for all $f \in C(X, \mathbb{R})$, as follows. For any $\varepsilon > 0$, we can choose $f_i$ such that $\|f - f_i\|_\infty \leq \varepsilon$. Since $\mu_n^{(n)}(f_i)$ converges, there exists $N$ such that if $n, m \geq N$ then
\[ |\mu_n^{(n)}(f_i) - \mu_m^{(m)}(f_i)| \leq \varepsilon. \]
Thus if $n, m \geq N$ we have
\[ |\mu_n^{(n)}(f) - \mu_m^{(m)}(f)| \leq |\mu_n^{(n)}(f) - \mu_n^{(n)}(f_i)| + |\mu_n^{(n)}(f_i) - \mu_m^{(m)}(f_i)| + |\mu_m^{(m)}(f_i) - \mu_m^{(m)}(f)| \leq 3\varepsilon, \]
so $\mu_n^{(n)}(f)$ converges, as required.

To complete the proof, write $w(f) = \lim_{n \to \infty} \mu_n^{(n)}(f)$.
We claim that $w$ satisfies the hypotheses of the Riesz Representation Theorem and so corresponds to integration with respect to a probability measure.

(i) By construction, $w$ is a linear mapping: $w(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1 w(f_1) + \lambda_2 w(f_2)$.

(ii) As $|w(f)| \leq \|f\|_\infty$, we see that $w$ is bounded.

(iii) If $f \geq 0$ then it is easy to check that $w(f) \geq 0$. Hence $w$ is positive.

(iv) It is easy to check that $w$ is normalised: $w(\mathbf{1}) = 1$.

Therefore, by the Riesz Representation Theorem, there exists $\mu \in M(X)$ such that $w(f) = \int f\,d\mu$. We then have that $\int f\,d\mu_n^{(n)} \to \int f\,d\mu$, as $n \to \infty$, for all $f \in C(X, \mathbb{R})$, i.e., that $\mu_n^{(n)}$ converges weak* to $\mu$, as $n \to \infty$. □

§7.3 Invariant measures for continuous transformations

Let $X$ be a compact metric space equipped with the Borel σ-algebra and let $T : X \to X$ be a continuous transformation. It is clear that $T$ is measurable.

Given a measure $\mu$, we have already defined the measure $T_*\mu$ by $T_*\mu(B) = \mu(T^{-1}B)$. If $\mu$ is a Borel probability measure, then it is straightforward to check that $T_*\mu$ is a Borel probability measure. We can think of $T_*$ as a transformation on $M(X)$, namely:
\[ T_* : M(X) \to M(X), \qquad T_*\mu = \mu \circ T^{-1}. \]
That is, if $B \in \mathcal{B}$ then $T_*\mu(B) = \mu(T^{-1}B)$.

The following result tells us how to integrate with respect to $T_*\mu$.

Lemma 7.3.1
For $f \in L^1(X, \mathcal{B}, \mu)$ we have
\[ \int f\,d(T_*\mu) = \int f \circ T\,d\mu. \]

Proof. From the definition, for $B \in \mathcal{B}$,
\[ \int \chi_B\,d(T_*\mu) = (T_*\mu)(B) = \mu(T^{-1}B) = \int \chi_{T^{-1}B}\,d\mu = \int \chi_B \circ T\,d\mu. \]
Thus, by linearity, the result holds for simple functions. If $f \geq 0$ is a positive measurable function then we can choose an increasing sequence of simple functions $f_n$ increasing to $f$ pointwise. We have
\[ \int f_n\,d(T_*\mu) = \int f_n \circ T\,d\mu \]
and, applying the Monotone Convergence Theorem (Theorem 3.1.2) to each side, we obtain
\[ \int f\,d(T_*\mu) = \int f \circ T\,d\mu. \]
The result extends to an arbitrary real-valued $f \in L^1(X, \mathcal{B}, \mu)$ by considering positive and negative parts and then to complex-valued integrable functions by taking real and imaginary parts. □
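For a finitely supported measure, Lemma 7.3.1 reduces to a finite sum and can be checked directly. The following is a minimal sketch; representing a measure as a dictionary of point masses, and the names `pushforward` and `integrate`, are assumptions made for illustration only. The map is the doubling map and the test function is an arbitrary polynomial.

```python
from fractions import Fraction

def pushforward(T, measure):
    """Return T_* mu for a finitely supported measure given as {point: weight}."""
    out = {}
    for x, w in measure.items():
        y = T(x)
        out[y] = out.get(y, 0) + w  # the mass at x is moved to T(x)
    return out

def integrate(f, measure):
    """Integral of f with respect to a finitely supported measure."""
    return sum(w * f(x) for x, w in measure.items())

def T(x):
    """Doubling map T(x) = 2x mod 1."""
    return (2 * x) % 1

def f(x):
    """An arbitrary continuous test function."""
    return x * x + 1

# mu = (1/4) delta_0 + (1/4) delta_{1/3} + (1/2) delta_{2/3}
mu = {
    Fraction(0): Fraction(1, 4),
    Fraction(1, 3): Fraction(1, 4),
    Fraction(2, 3): Fraction(1, 2),
}

lhs = integrate(f, pushforward(T, mu))       # integral of f d(T_* mu)
rhs = integrate(lambda x: f(T(x)), mu)       # integral of f o T d(mu)
print(lhs, rhs)
```

Here $T$ swaps $1/3$ and $2/3$ and fixes $0$, so $T_*\mu \neq \mu$ for this choice of weights; the identity of Lemma 7.3.1 holds regardless of whether the measure is invariant.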
Recall that a measure $\mu$ is said to be $T$-invariant if $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$. Hence $\mu$ is $T$-invariant if and only if $T_*\mu = \mu$. Write
\[ M(X, T) = \{\mu \in M(X) \mid T_*\mu = \mu\} \]
to denote the space of all $T$-invariant Borel probability measures.

The following result gives a useful criterion for checking whether a measure is $T$-invariant.

Lemma 7.3.2
Let $T : X \to X$ be a continuous mapping of a compact metric space. The following are equivalent:

(i) $\mu \in M(X, T)$;

(ii) for all $f \in C(X, \mathbb{R})$ we have that
\[ \int f \circ T\,d\mu = \int f\,d\mu. \tag{7.3.1} \]

Proof. We prove (i) implies (ii). Suppose that $\mu \in M(X, T)$ so that $T_*\mu = \mu$. Let $f \in C(X, \mathbb{R})$. Then $f \in L^1(X, \mathcal{B}, \mu)$. Hence by Lemma 7.3.1, for any $f \in C(X, \mathbb{R})$ we have
\[ \int f \circ T\,d\mu = \int f\,d(T_*\mu) = \int f\,d\mu. \]
Conversely, Lemma 7.3.1 allows us to write (7.3.1) as: $\mu(f) = (T_*\mu)(f)$ for all $f \in C(X, \mathbb{R})$. Hence $\mu$ and $T_*\mu$ determine the same linear functional on $C(X, \mathbb{R})$. By uniqueness in the Riesz Representation Theorem, we have $T_*\mu = \mu$. □

§7.4 Invariant measures for continuous maps on the torus

We can use Lemma 7.3.2 to prove that a given measure is invariant for certain dynamical systems. We first note that we need only check (7.3.1) for a dense set of continuous functions.

Lemma 7.4.1
Suppose that $S \subset C(X, \mathbb{R})$ is a uniformly dense subset of functions (that is, for all $f \in C(X, \mathbb{R})$ and all $\varepsilon > 0$ there exists $g \in S$ such that $\|f - g\|_\infty < \varepsilon$). Suppose that $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$. Then $\int f \circ T\,d\mu = \int f\,d\mu$ for all $f \in C(X, \mathbb{R})$.

Proof. Let $f \in C(X, \mathbb{R})$ and let $\varepsilon > 0$. Choose $g \in S$ such that $\|f - g\|_\infty < \varepsilon$. Then
\begin{align*}
\left| \int f \circ T\,d\mu - \int f\,d\mu \right| &\leq \left| \int f \circ T\,d\mu - \int g \circ T\,d\mu \right| + \left| \int g \circ T\,d\mu - \int g\,d\mu \right| + \left| \int g\,d\mu - \int f\,d\mu \right| \\
&\leq \int |f \circ T - g \circ T|\,d\mu + \left| \int g \circ T\,d\mu - \int g\,d\mu \right| + \int |f - g|\,d\mu.
\end{align*}
Noting that, as $\|f - g\|_\infty < \varepsilon$, we have that $|f(Tx) - g(Tx)| < \varepsilon$ for all $x$, and that $\int g \circ T\,d\mu = \int g\,d\mu$, we have that
\[ \left| \int f \circ T\,d\mu - \int f\,d\mu \right| < 2\varepsilon. \]
As $\varepsilon$ is arbitrary, the result follows.
□

Corollary 7.4.2
Let $T$ be a continuous transformation of a compact metric space $X$, equipped with the Borel σ-algebra. Let $\mu$ be a Borel probability measure on $X$. Suppose that $S \subset C(X, \mathbb{R})$ is a uniformly dense subset of functions such that $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$. Then $\mu$ is a $T$-invariant measure.

Proof. This follows immediately from Lemma 7.3.2 and Lemma 7.4.1. □

We show how to use Corollary 7.4.2 by studying some of our examples.

§7.4.1 Circle rotations

Let $T(x) = x + \alpha \bmod 1$ be a circle rotation. We show how to use Corollary 7.4.2 to prove that Lebesgue measure $\mu$ is $T$-invariant.

Let $\ell \in \mathbb{Z}$. We first note that if $\ell \neq 0$ then
\[ \int e^{2\pi i \ell x}\,dx = \left[ \frac{1}{2\pi i \ell} e^{2\pi i \ell x} \right]_0^1 = 0. \]
We also note that if $\ell = 0$ then $\int e^{2\pi i \ell x}\,dx = 1$.

Let $S$ denote the set of trigonometric polynomials, i.e.
\[ S = \Big\{ \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j x} \ \Big|\ c_j \in \mathbb{R},\ \ell_j \in \mathbb{Z} \Big\}. \]
Then $S$ is uniformly dense in $C(X, \mathbb{R})$ by the Stone-Weierstrass Theorem (Theorem 1.2.2). Let $g \in S$ be a trigonometric polynomial and write
\[ g(x) = \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j x} \]
where $\ell_j = 0$ if and only if $j = 0$. Hence $\int g\,d\mu = c_0$. Note that
\[ g(Tx) = \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j (x + \alpha)} = \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j \alpha} e^{2\pi i \ell_j x}. \]
Hence
\[ \int g \circ T\,d\mu = \int \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j \alpha} e^{2\pi i \ell_j x}\,d\mu = \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j \alpha} \int e^{2\pi i \ell_j x}\,d\mu \]
and the only non-zero integral occurs when $\ell_j = 0$, i.e. $j = 0$. We must therefore have that $\int g \circ T\,d\mu = c_0$. Hence $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$. It follows from Corollary 7.4.2 that $\mu$ is $T$-invariant.

§7.4.2 Toral endomorphisms

Let $A$ be a $k \times k$ integer matrix with $\det A \neq 0$. Define the linear toral endomorphism $T : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}^k/\mathbb{Z}^k$ by
\[ T((x_1, \ldots, x_k) + \mathbb{Z}^k) = A(x_1, \ldots, x_k) + \mathbb{Z}^k. \]
When $T$ is a linear toral automorphism (i.e. when $\det A = \pm 1$) we have already seen that Lebesgue measure is invariant. We use Corollary 7.4.2 to prove that Lebesgue measure $\mu$ is $T$-invariant when $\det A \neq 0$.

For $n = (n_1, \ldots, n_k) \in \mathbb{Z}^k$ and $x = (x_1, \ldots, x_k) \in \mathbb{R}^k$ define, as before,
\[ \langle n, x \rangle = n_1 x_1 + \cdots + n_k x_k. \]
Note that
\[ \int e^{2\pi i \langle n, x \rangle}\,d\mu = \int \cdots \int e^{2\pi i n_1 x_1} \cdots e^{2\pi i n_k x_k}\,dx_1 \cdots dx_k. \]
Hence
\[ \int e^{2\pi i \langle n, x \rangle}\,d\mu = \begin{cases} 0 & \text{if } n \neq 0 \\ 1 & \text{if } n = 0 \end{cases} \]
where $0 = (0, \ldots, 0) \in \mathbb{Z}^k$.

Let
\[ S = \Big\{ \sum_{j=0}^{r-1} c_j e^{2\pi i \langle n^{(j)}, x \rangle} \ \Big|\ c_j \in \mathbb{R},\ n^{(j)} = (n_1^{(j)}, \ldots, n_k^{(j)}) \in \mathbb{Z}^k \Big\}. \]
By the Stone-Weierstrass Theorem (Theorem 1.2.2), we see that $S$ is uniformly dense in $C(\mathbb{R}^k/\mathbb{Z}^k, \mathbb{R})$.

Let $g \in S$ and write
\[ g(x) = \sum_{j=0}^{r-1} c_j e^{2\pi i \langle n^{(j)}, x \rangle} \]
where $n^{(j)} = 0$ if and only if $j = 0$. Then
\[ \int g\,d\mu = \int \sum_{j=0}^{r-1} c_j e^{2\pi i \langle n^{(j)}, x \rangle}\,d\mu = \sum_{j=0}^{r-1} c_j \int e^{2\pi i \langle n^{(j)}, x \rangle}\,d\mu = c_0. \]
Note that
\[ g(Tx) = \sum_{j=0}^{r-1} c_j e^{2\pi i \langle n^{(j)}, Ax \rangle} = \sum_{j=0}^{r-1} c_j e^{2\pi i \langle n^{(j)}A, x \rangle}. \]
Hence
\[ \int g \circ T\,d\mu = \int \sum_{j=0}^{r-1} c_j e^{2\pi i \langle n^{(j)}A, x \rangle}\,d\mu = \sum_{j=0}^{r-1} c_j \int e^{2\pi i \langle n^{(j)}A, x \rangle}\,d\mu. \]
These integrals are zero unless $n^{(j)}A = 0$. As $\det A \neq 0$ this happens only when $n^{(j)} = 0$, i.e. when $j = 0$. Hence
\[ \int g \circ T\,d\mu = c_0 = \int g\,d\mu. \]
Hence by Corollary 7.4.2, $\mu$ is a $T$-invariant measure.

Remark. You will notice a strong connection between the above arguments and Fourier series and you may think that we could take $g(x)$ to be the $n$th partial sum of the Fourier series for $f$. However, one needs to take care. Suppose $f \in C(\mathbb{R}^k/\mathbb{Z}^k, \mathbb{R})$ has Fourier series $\sum_n c_n(f) e^{2\pi i \langle n, x \rangle}$. We need to be careful about what it means for this infinite series to converge. We know that the sequence of partial sums $s_n$ converges in $L^2$ to $f$, but we do not know that the partial sums converge uniformly to $f$. That is, we know that $\|f - s_n\|_2 \to 0$, but not necessarily that $\|f - s_n\|_\infty \to 0$. In fact, in general, it is not true that $\|f - s_n\|_\infty \to 0$. However, if one defines $\sigma_n = \frac{1}{n}\sum_{j=0}^{n-1} s_j$ to be the average of the first $n$ partial sums, then it is true that $\|f - \sigma_n\|_\infty \to 0$. (This is quite a deep result.)

§7.5 Existence of invariant measures

Given a continuous mapping $T : X \to X$ of a compact metric space, it is natural to ask whether invariant measures necessarily exist, i.e., whether $M(X, T) \neq \emptyset$. The next result shows that this is the case.
Theorem 7.5.1
Let $T:X\to X$ be a continuous mapping of a compact metric space. Then there exists at least one $T$-invariant probability measure.

Proof. Let $\nu\in M(X)$ be a probability measure (for example, we could take $\nu$ to be a Dirac measure). Define the sequence $\mu_n\in M(X)$ by
\[ \mu_n=\frac1n\sum_{j=0}^{n-1}T_*^j\nu, \]
so that, for $B\in\mathcal{B}$,
\[ \mu_n(B)=\frac1n\big(\nu(B)+\nu(T^{-1}B)+\cdots+\nu(T^{-(n-1)}B)\big). \]
Since $M(X)$ is weak* compact, some subsequence $\mu_{n_k}$ converges, as $k\to\infty$, to a measure $\mu\in M(X)$. We shall show that $\mu\in M(X,T)$. By Lemma 7.3.2, this is equivalent to showing that
\[ \int f\,d\mu=\int f\circ T\,d\mu \quad\text{for all } f\in C(X,\mathbb{R}). \]
To see this, first note that $f\circ T-f$ is continuous. Then
\begin{align*}
\Big|\int f\circ T\,d\mu-\int f\,d\mu\Big|
&=\Big|\int (f\circ T-f)\,d\mu\Big| \\
&=\lim_{k\to\infty}\Big|\int (f\circ T-f)\,d\mu_{n_k}\Big| \\
&=\lim_{k\to\infty}\Big|\int (f\circ T-f)\,d\Big(\frac{1}{n_k}\sum_{j=0}^{n_k-1}T_*^j\nu\Big)\Big| \\
&=\lim_{k\to\infty}\Big|\frac{1}{n_k}\sum_{j=0}^{n_k-1}\int (f\circ T-f)\,dT_*^j\nu\Big| \\
&=\lim_{k\to\infty}\Big|\frac{1}{n_k}\int \sum_{j=0}^{n_k-1}(f\circ T^{j+1}-f\circ T^j)\,d\nu\Big| \\
&=\lim_{k\to\infty}\Big|\frac{1}{n_k}\int (f\circ T^{n_k}-f)\,d\nu\Big| \\
&\le\lim_{k\to\infty}\frac{2\|f\|_\infty}{n_k}=0.
\end{align*}
Therefore, $\mu\in M(X,T)$, as required. □

We will need the following additional properties of $M(X,T)$.

Theorem 7.5.2
Let $T:X\to X$ be a continuous mapping of a compact metric space. Then $M(X,T)$ is a weak* compact and convex subset of $M(X)$.

Proof. The fact that $M(X,T)$ is convex is straightforward from the definition. To see that $M(X,T)$ is weak* compact it is sufficient to show that it is a weak* closed subset of the weak* compact set $M(X)$. Suppose that $\mu_n\in M(X,T)$ is such that $\mu_n\rightharpoonup\mu\in M(X)$. We need to show that $\mu\in M(X,T)$. To see this, observe that for any $f\in C(X,\mathbb{R})$ we have that
\begin{align*}
\int f\circ T\,d\mu &=\lim_{n\to\infty}\int f\circ T\,d\mu_n && \text{as } f\circ T \text{ is continuous} \\
&=\lim_{n\to\infty}\int f\,d\mu_n && \text{as } \mu_n\in M(X,T) \\
&=\int f\,d\mu && \text{as } \mu_n\rightharpoonup\mu.
\end{align*}
□

§7.6 Exercises

Exercise 7.1
Prove Proposition 7.2.2: show that if $x_n,x\in X$ and $x_n\to x$ then $\delta_{x_n}\rightharpoonup\delta_x$.

Exercise 7.2
Prove that $T_*:M(X)\to M(X)$ is weak* continuous (i.e. if $\mu_n\rightharpoonup\mu$ then $T_*\mu_n\rightharpoonup T_*\mu$).

Exercise 7.3
Let $X$ be a compact metric space. For $\mu\in M(X)$ define
\[ \|\mu\|=\sup_{f\in C(X,\mathbb{R}),\ \|f\|_\infty\le 1}\Big|\int f\,d\mu\Big|. \]
We say that $\mu_n$ converges strongly to $\mu$ if $\|\mu_n-\mu\|\to 0$ as $n\to\infty$. The topology this determines is called the strong topology (or the operator topology).
(i) Show that if $\mu_n\to\mu$ strongly then $\mu_n\rightharpoonup\mu$ in the weak* topology.
(ii) Suppose that $X$ is infinite. Show that $X\hookrightarrow M(X):x\mapsto\delta_x$ is not continuous in the strong topology.
(iii) Prove that $\|\delta_x-\delta_y\|=2$ if $x\neq y$. (You may use Urysohn's Lemma: Let $A$ and $B$ be disjoint closed subsets of a metric space $X$. Then there is a continuous function $f\in C(X,\mathbb{R})$ such that $0\le f\le 1$ on $X$ while $f\equiv 0$ on $A$ and $f\equiv 1$ on $B$.) Hence prove that $M(X)$ is not compact in the strong topology when $X$ is infinite.

Exercise 7.4
Give an example of a sequence of measures $\mu_n$ and a set $B$ such that $\mu_n\rightharpoonup\mu$ but $\mu_n(B)\not\to\mu(B)$.

Exercise 7.5
Prove that $M(X,T)$ is convex.

Exercise 7.6
Suppose that $S\subset C(X,\mathbb{R})$ is a uniformly dense subset of functions (that is, for all $f\in C(X,\mathbb{R})$ and all $\varepsilon>0$ there exists $g\in S$ such that $\|f-g\|_\infty<\varepsilon$). Let $\mu_n,\mu\in M(X)$. Suppose that $\int f\,d\mu_n\to\int f\,d\mu$ for all $f\in S$. Prove that $\mu_n\rightharpoonup\mu$.

Exercise 7.7
Let $\Sigma=\{x=(x_j)_{j=0}^\infty \mid x_j\in\{0,1\}\}$ denote the shift space on two symbols 0, 1. Let $\sigma:\Sigma\to\Sigma$, $(\sigma(x))_j=x_{j+1}$ denote the shift map.
(i) How many periodic points of period $n$ are there?
(ii) Let $\mathrm{Per}(n)$ denote the set of periodic points of period $n$. Define
\[ \mu_n=\frac{1}{2^n}\sum_{x\in\mathrm{Per}(n)}\delta_x. \]
Let $i_j\in\{0,1\}$, $0\le j\le m-1$, and define the cylinder set
\[ [i_0,i_1,\ldots,i_{m-1}]=\{x=(x_j)_{j=0}^\infty\in\Sigma \mid x_j=i_j,\ j=0,1,\ldots,m-1\}. \]
Let $\mu$ denote the Bernoulli-$(1/2,1/2)$ measure. Prove that
\[ \int\chi_{[i_0,i_1,\ldots,i_{m-1}]}\,d\mu_n\to\int\chi_{[i_0,i_1,\ldots,i_{m-1}]}\,d\mu \quad\text{as } n\to\infty. \]
(iii) Prove that $\chi_{[i_0,i_1,\ldots,i_{m-1}]}$ is a continuous function.
(iv) Use Exercise 7.6 and the Stone-Weierstrass Theorem (Theorem 1.2.2) to show that $\mu_n\rightharpoonup\mu$ as $n\to\infty$.

Exercise 7.8
Let $X=\mathbb{R}^3/\mathbb{Z}^3$ be the 3-dimensional torus. Let $\alpha\in\mathbb{R}$. Define $T:X\to X$ by
\[ T\left(\begin{pmatrix}x\\y\\z\end{pmatrix}+\mathbb{Z}^3\right)=\begin{pmatrix}\alpha+x\\y+x\\z+y\end{pmatrix}+\mathbb{Z}^3. \]
Use Corollary 7.4.2 to prove that Lebesgue measure $\mu$ is a $T$-invariant measure.

8. Ergodic measures for continuous transformations

§8.1 Introduction

In the previous section we saw that, given a continuous transformation of a compact metric space, the set of $T$-invariant Borel probability measures is non-empty. One can ask a similar question: is the set of ergodic Borel probability measures non-empty? In this section we address this question. We let $E(X,T)\subset M(X,T)$ denote the set of ergodic $T$-invariant Borel probability measures on $X$.

§8.2 Radon-Nikodym derivatives

We will need the concept of Radon-Nikodym derivatives.

Definition. Let $\mu$ be a measure on the measurable space $(X,\mathcal{B})$. We say that a measure $\nu$ is absolutely continuous with respect to $\mu$ and write $\nu\ll\mu$ if $\nu(B)=0$ whenever $\mu(B)=0$, $B\in\mathcal{B}$.

Remark. Thus $\nu$ is absolutely continuous with respect to $\mu$ if sets of $\mu$-measure zero also have $\nu$-measure zero (but there may be more sets of $\nu$-measure zero).

For example, let $f\in L^1(X,\mathcal{B},\mu)$ be non-negative and define a measure $\nu$ by
\[ \nu(B)=\int_B f\,d\mu. \tag{8.2.1} \]
Then $\nu\ll\mu$. As a particular example, let $X=[0,1]$ be equipped with the Borel σ-algebra $\mathcal{B}$. Define $f:[0,1]\to\mathbb{R}$ by
\[ f(x)=\begin{cases}2x & \text{if } 0\le x\le 1/2,\\ 0 & \text{if } 1/2<x\le 1.\end{cases} \]
Let $\mu$ be Lebesgue measure and let $\nu$ be the measure given by
\[ \nu(B)=\int_B f\,d\mu. \]
If $A\subset[1/2,1]$ is any Borel set then $\nu(A)=0$.

The following theorem says that, essentially, all absolutely continuous measures occur by the construction in (8.2.1).

Theorem 8.2.1 (Radon-Nikodym)
Let $(X,\mathcal{B},\mu)$ be a probability space. Let $\nu$ be a measure defined on $\mathcal{B}$ and suppose that $\nu\ll\mu$. Then there is a non-negative measurable function $f$ such that
\[ \nu(B)=\int_B f\,d\mu \quad\text{for all } B\in\mathcal{B}. \]
Moreover, $f$ is unique in the sense that if $g$ is a measurable function with the same property then $f=g$ $\mu$-a.e.

Remark.
If $\nu\ll\mu$ then it is customary to write $d\nu/d\mu$ for the function given by the Radon-Nikodym theorem, that is
\[ \nu(B)=\int_B\frac{d\nu}{d\mu}\,d\mu. \]
The following relations are all easy to prove, and indicate why the notation was chosen in this way.
(i) If $\nu\ll\mu$ and $f$ is a $\mu$-integrable function then $f$ is $\nu$-integrable and
\[ \int f\,d\nu=\int f\,\frac{d\nu}{d\mu}\,d\mu. \]
(ii) If $\nu_1,\nu_2\ll\mu$ then
\[ \frac{d(\nu_1+\nu_2)}{d\mu}=\frac{d\nu_1}{d\mu}+\frac{d\nu_2}{d\mu}. \]
(iii) If $\lambda\ll\nu\ll\mu$ then $\lambda\ll\mu$ and
\[ \frac{d\lambda}{d\mu}=\frac{d\lambda}{d\nu}\,\frac{d\nu}{d\mu}. \]

§8.3 Ergodic measures as extreme points

§8.3.1 Extreme points of convex sets

A point in a convex set is called an extreme point if it cannot be written as a non-trivial convex combination of (other) elements of the set. More precisely, $\mu$ is an extreme point of $M(X,T)$ if, whenever
\[ \mu=\alpha\mu_1+(1-\alpha)\mu_2, \]
with $\mu_1,\mu_2\in M(X,T)$, $0<\alpha<1$, then we have $\mu_1=\mu_2=\mu$.

Remarks.
(i) Let $Y$ be the unit square $Y=\{(x,y)\mid 0\le x\le 1,\ 0\le y\le 1\}\subset\mathbb{R}^2$. Then the extreme points of $Y$ are the corners $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$.
(ii) Let $Y$ be the (closed) unit disc $Y=\{(x,y)\mid x^2+y^2\le 1\}\subset\mathbb{R}^2$. Then the set of extreme points of $Y$ is precisely the unit circle $\{(x,y)\mid x^2+y^2=1\}$.

§8.3.2 Existence of ergodic measures

The next result will allow us to show that ergodic measures for continuous transformations on compact metric spaces always exist.

Theorem 8.3.1
Let $T$ be a continuous transformation of a compact metric space $X$ equipped with the Borel σ-algebra $\mathcal{B}$. The following are equivalent:
(i) the $T$-invariant probability measure $\mu$ is ergodic;
(ii) $\mu$ is an extreme point of $M(X,T)$.

Proof. We prove (ii) implies (i): if $\mu$ is an extreme point of $M(X,T)$ then it is ergodic. In fact, we shall prove the contrapositive. Suppose that $\mu$ is not ergodic; we show that $\mu$ is not an extreme point of $M(X,T)$. As $\mu$ is not ergodic, there exists $B\in\mathcal{B}$ such that $T^{-1}B=B$ and $0<\mu(B)<1$. Define probability measures $\mu_1$ and $\mu_2$ on $X$ by
\[ \mu_1(A)=\frac{\mu(A\cap B)}{\mu(B)},\qquad \mu_2(A)=\frac{\mu(A\cap(X\setminus B))}{\mu(X\setminus B)}. \]
(The assumption that $0<\mu(B)<1$ ensures that the denominators are not equal to zero.) Clearly, $\mu_1\neq\mu_2$, since $\mu_1(B)=1$ while $\mu_2(B)=0$. Since $T^{-1}B=B$, we also have $T^{-1}(X\setminus B)=X\setminus B$. Thus we have
\begin{align*}
\mu_1(T^{-1}A) &=\frac{\mu(T^{-1}A\cap B)}{\mu(B)} \\
&=\frac{\mu(T^{-1}A\cap T^{-1}B)}{\mu(B)} \\
&=\frac{\mu(T^{-1}(A\cap B))}{\mu(B)} \\
&=\frac{\mu(A\cap B)}{\mu(B)} \\
&=\mu_1(A)
\end{align*}
and (by the same argument)
\[ \mu_2(T^{-1}A)=\frac{\mu(T^{-1}A\cap(X\setminus B))}{\mu(X\setminus B)}=\mu_2(A), \]
i.e., $\mu_1$ and $\mu_2$ are both in $M(X,T)$. However, we may write $\mu$ as the non-trivial (since $0<\mu(B)<1$) convex combination
\[ \mu=\mu(B)\mu_1+(1-\mu(B))\mu_2, \]
so that $\mu$ is not an extreme point. □

Proof (not examinable). We prove (i) implies (ii). Suppose that $\mu$ is ergodic and that $\mu=\alpha\mu_1+(1-\alpha)\mu_2$, with $\mu_1,\mu_2\in M(X,T)$ and $0<\alpha<1$. We shall show that $\mu_1=\mu$ (so that $\mu_2=\mu$, also). This will show that $\mu$ is an extreme point of $M(X,T)$.

If $\mu(A)=0$ then $\mu_1(A)=0$ (because $\alpha\mu_1(A)\le\mu(A)$ and $\alpha>0$), so that $\mu_1\ll\mu$. Therefore the Radon-Nikodym derivative $d\mu_1/d\mu\ge 0$ exists. One can easily deduce from the statement of the Radon-Nikodym Theorem that $\mu_1=\mu$ if and only if $d\mu_1/d\mu=1$ $\mu$-a.e. We shall show that this is indeed the case by showing that the sets where, respectively, $d\mu_1/d\mu<1$ and $d\mu_1/d\mu>1$ both have $\mu$-measure zero.

Let
\[ B=\Big\{x\in X \;\Big|\; \frac{d\mu_1}{d\mu}(x)<1\Big\}. \]
Now
\[ \mu_1(B)=\int_B\frac{d\mu_1}{d\mu}\,d\mu=\int_{B\cap T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu+\int_{B\setminus T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu \tag{8.3.1} \]
and
\[ \mu_1(T^{-1}B)=\int_{T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu=\int_{B\cap T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu+\int_{T^{-1}B\setminus B}\frac{d\mu_1}{d\mu}\,d\mu. \tag{8.3.2} \]
As $\mu_1\in M(X,T)$, we have that $\mu_1(B)=\mu_1(T^{-1}B)$. Hence comparing the last summands in both (8.3.1) and (8.3.2) we obtain
\[ \int_{B\setminus T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu=\int_{T^{-1}B\setminus B}\frac{d\mu_1}{d\mu}\,d\mu. \tag{8.3.3} \]
In fact, these integrals are taken over sets of the same $\mu$-measure:
\begin{align*}
\mu(T^{-1}B\setminus B) &=\mu(T^{-1}B)-\mu(T^{-1}B\cap B) \\
&=\mu(B)-\mu(T^{-1}B\cap B) \\
&=\mu(B\setminus T^{-1}B).
\end{align*}
Note that on the left-hand side of (8.3.3), the integrand $d\mu_1/d\mu<1$. However, on the right-hand side of (8.3.3), the integrand $d\mu_1/d\mu\ge 1$.
Thus we must have that $\mu(B\setminus T^{-1}B)=\mu(T^{-1}B\setminus B)=0$, which is to say that $\mu(T^{-1}B\,\triangle\,B)=0$, i.e. $T^{-1}B=B$ $\mu$-a.e. Therefore, since $\mu$ is ergodic, we have that $\mu(B)=0$ or $\mu(B)=1$. We can rule out the possibility that $\mu(B)=1$ by observing that if $\mu(B)=1$ then
\[ 1=\mu_1(X)=\int_X\frac{d\mu_1}{d\mu}\,d\mu=\int_B\frac{d\mu_1}{d\mu}\,d\mu<\mu(B)=1, \]
a contradiction. Therefore $\mu(B)=0$.

If we define
\[ C=\Big\{x\in X \;\Big|\; \frac{d\mu_1}{d\mu}(x)>1\Big\} \]
then repeating essentially the same argument gives $\mu(C)=0$. Hence
\[ \mu\Big\{x\in X \;\Big|\; \frac{d\mu_1}{d\mu}(x)=1\Big\}=\mu(X\setminus(B\cup C))=\mu(X)-\mu(B)-\mu(C)=1, \]
i.e., $d\mu_1/d\mu=1$ $\mu$-a.e. Therefore $\mu_1=\mu$, as required. □

We can now prove that a continuous transformation of a compact metric space always has an ergodic measure. To do this, we will show that $M(X,T)$ has an extreme point.

Theorem 8.3.2
Let $T:X\to X$ be a continuous mapping of a compact metric space. Then there exists at least one ergodic measure in $M(X,T)$.

Proof. By Theorem 8.3.1, it is equivalent to prove that $M(X,T)$ has an extreme point.

Choose a countable dense subset of $C(X,\mathbb{R})$, $\{f_i\}_{i=0}^\infty$ say. Consider the first function $f_0$. Since the map
\[ M(X,T)\to\mathbb{R}:\mu\mapsto\int f_0\,d\mu \]
is (weak*) continuous and $M(X,T)$ is compact, there exists (at least one) $\nu\in M(X,T)$ such that
\[ \int f_0\,d\nu=\sup_{\mu\in M(X,T)}\int f_0\,d\mu. \]
If we define
\[ M_0=\Big\{\nu\in M(X,T) \;\Big|\; \int f_0\,d\nu=\sup_{\mu\in M(X,T)}\int f_0\,d\mu\Big\} \]
then the above shows that $M_0$ is non-empty. Also, $M_0$ is closed and hence compact.

We now consider the next function $f_1$ and define
\[ M_1=\Big\{\nu\in M_0 \;\Big|\; \int f_1\,d\nu=\sup_{\mu\in M_0}\int f_1\,d\mu\Big\}. \]
By the same reasoning as above, $M_1$ is a non-empty closed subset of $M_0$. Continuing inductively, we define
\[ M_j=\Big\{\nu\in M_{j-1} \;\Big|\; \int f_j\,d\nu=\sup_{\mu\in M_{j-1}}\int f_j\,d\mu\Big\} \]
and hence obtain a nested sequence of sets
\[ M(X,T)\supset M_0\supset M_1\supset\cdots\supset M_j\supset\cdots \]
with each $M_j$ non-empty and closed. Now consider the intersection
\[ M_\infty=\bigcap_{j=0}^\infty M_j. \]
Recall that the intersection of a decreasing sequence of non-empty compact sets is non-empty.
Hence $M_\infty$ is non-empty and we can pick $\mu_\infty\in M_\infty$. We shall show that $\mu_\infty$ is an extreme point (and hence ergodic).

Suppose that we can write $\mu_\infty=\alpha\mu_1+(1-\alpha)\mu_2$, $\mu_1,\mu_2\in M(X,T)$, $0<\alpha<1$. We have to show that $\mu_1=\mu_2$. Since $\{f_j\}_{j=0}^\infty$ is dense in $C(X,\mathbb{R})$, it suffices to show that
\[ \int f_j\,d\mu_1=\int f_j\,d\mu_2 \quad \forall\, j\ge 0. \]
Consider $f_0$. By assumption
\[ \int f_0\,d\mu_\infty=\alpha\int f_0\,d\mu_1+(1-\alpha)\int f_0\,d\mu_2. \]
In particular,
\[ \int f_0\,d\mu_\infty\le\max\Big\{\int f_0\,d\mu_1,\ \int f_0\,d\mu_2\Big\}. \]
However $\mu_\infty\in M_0$ and so
\[ \int f_0\,d\mu_\infty=\sup_{\mu\in M(X,T)}\int f_0\,d\mu\ge\max\Big\{\int f_0\,d\mu_1,\ \int f_0\,d\mu_2\Big\}. \]
Therefore
\[ \int f_0\,d\mu_1=\int f_0\,d\mu_2=\int f_0\,d\mu_\infty. \]
Thus, the first identity we require is proved and $\mu_1,\mu_2\in M_0$. This last fact allows us to employ the same argument on $f_1$ (with $M(X,T)$ replaced by $M_0$) and conclude that
\[ \int f_1\,d\mu_1=\int f_1\,d\mu_2=\int f_1\,d\mu_\infty \]
and $\mu_1,\mu_2\in M_1$. Continuing inductively, we show that for an arbitrary $j\ge 0$,
\[ \int f_j\,d\mu_1=\int f_j\,d\mu_2 \]
and $\mu_1,\mu_2\in M_j$. This completes the proof. □

§8.4 An example: the North-South map

For many dynamical systems there exist uncountably many different ergodic measures. This is the case for the doubling map, Markov shifts, toral automorphisms, etc. Here we give an example of a dynamical system $T:X\to X$ for which one can construct $M(X,T)$ and $E(X,T)$ explicitly.

Let $X\subset\mathbb{R}^2$ denote the circle of radius 1 centred at $(0,1)$ in $\mathbb{R}^2$. Call $N=(0,2)$ the North Pole and $S=(0,0)$ the South Pole of $X$. Define a map $\varphi:X\setminus\{N\}\to\mathbb{R}\times\{0\}$ by drawing a straight line through $N$ and $x$ and denoting by $\varphi(x)$ the unique point on the $x$-axis that this line crosses (this is just stereographic projection of the circle). Define $T:X\to X$ by
\[ T(x)=\begin{cases}\varphi^{-1}\big(\tfrac12\varphi(x)\big) & \text{if } x\in X\setminus\{N\},\\ N & \text{if } x=N.\end{cases} \]
Hence $T(N)=N$, $T(S)=S$ and if $x\neq N,S$ then $T^n(x)\to S$ as $n\to\infty$. We call $T$ the North-South map.

[Figure 8.4.1: The North-South map]

Clearly both $N$ and $S$ are fixed points for $T$.
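The definition of $T$ above can be made concrete in coordinates. The following is a minimal sketch, not part of the original notes: it assumes the explicit formulas $\varphi(a,b)=2a/(2-b)$ and $\varphi^{-1}(t)=\big(4t/(t^2+4),\,2t^2/(t^2+4)\big)$, obtained by intersecting the line through $N$ with the $x$-axis and with the circle, and identifies $\mathbb{R}\times\{0\}$ with $\mathbb{R}$. The function names `phi`, `phi_inv`, `T` and `orbit` are illustrative choices.

```python
import math

N = (0.0, 2.0)  # North Pole
S = (0.0, 0.0)  # South Pole


def phi(p):
    """Stereographic projection from N: where the line through N and p meets the x-axis."""
    a, b = p
    return 2.0 * a / (2.0 - b)


def phi_inv(t):
    """Point of the circle (other than N) on the line through N and (t, 0)."""
    d = t * t + 4.0
    return (4.0 * t / d, 2.0 * t * t / d)


def T(p):
    """North-South map: halve the projected coordinate, keeping N fixed."""
    if p == N:
        return N
    return phi_inv(phi(p) / 2.0)


def orbit(p, n):
    """Return the list [p, T(p), ..., T^n(p)]."""
    points = [p]
    for _ in range(n):
        p = T(p)
        points.append(p)
    return points


if __name__ == "__main__":
    for point in orbit((1.0, 1.0), 5):
        print(point)
```

Starting from $x=(1,1)$, the projected coordinate is halved at each step, so the orbit moves monotonically down the right semi-circle towards $S$, while $N$ and $S$ themselves do not move.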
Hence $\delta_N$ and $\delta_S$ (the Dirac delta measures at $N$, $S$, respectively) are $T$-invariant. It is easy to see that both $\delta_N$ and $\delta_S$ are ergodic.

Now let $\mu\in M(X,T)$ be an invariant measure. We claim that $\mu$ assigns zero measure to the set $X\setminus\{N,S\}$. Let $x\in X$ be any point in the right semi-circle (for example, take $x=(1,1)\in\mathbb{R}^2$) and consider the (half-open) arc $I$ of semi-circle from $x$ to $T(x)$. Then $\bigcup_{n=-\infty}^\infty T^{-n}I$ is a disjoint union of arcs of semi-circle and, moreover, is equal to the entire right semi-circle. Now
\[ \mu\Big(\bigcup_{n=-\infty}^\infty T^{-n}I\Big)=\sum_{n=-\infty}^\infty\mu(T^{-n}I)=\sum_{n=-\infty}^\infty\mu(I) \]
and the only way for this to be finite is if $\mu(I)=0$. Hence $\mu$ assigns zero measure to the entire right semi-circle. Similarly, $\mu$ assigns zero measure to the left semi-circle. Hence $\mu$ is concentrated on the two points $N$, $S$, and so must be a convex combination of the Dirac delta measures $\delta_N$ and $\delta_S$. Hence
\[ M(X,T)=\{\alpha\delta_N+(1-\alpha)\delta_S \mid \alpha\in[0,1]\} \]
and the ergodic measures are the extreme points of $M(X,T)$, namely $\delta_N$, $\delta_S$.

§8.5 Unique ergodicity

We conclude by looking at the case where $T:X\to X$ has a unique invariant probability measure.

Definition. Let $T:X\to X$ be a continuous transformation of a compact metric space $X$. If there is a unique $T$-invariant probability measure then we say that $T$ is uniquely ergodic.

Remark. You might wonder why such $T$ are not instead called 'uniquely invariant'. Recall that the extreme points of $M(X,T)$ are precisely the ergodic measures. If $M(X,T)$ consists of just one measure then that measure is an extreme point, and so must be ergodic.

Unique ergodicity implies the following strong convergence result.

Theorem 8.5.1 (Oxtoby's Ergodic Theorem)
Let $X$ be a compact metric space and let $T:X\to X$ be a continuous transformation. The following are equivalent:
(i) $T$ is uniquely ergodic;
(ii) for each $f\in C(X,\mathbb{R})$ there exists a constant $c(f)$ such that
\[ \frac1n\sum_{j=0}^{n-1}f(T^j(x))\to c(f), \tag{8.5.1} \]
uniformly for $x\in X$, as $n\to\infty$.
Remark. The convergence in (8.5.1) means that
\[ \lim_{n\to\infty}\sup_{x\in X}\Big|\frac1n\sum_{j=0}^{n-1}f(T^j(x))-c(f)\Big|=0. \]

Remark. If $M(X,T)=\{\mu\}$ then the constant $c(f)$ in (8.5.1) is $\int f\,d\mu$.

Proof. We prove (ii) implies (i). Suppose that $\mu,\nu$ are $T$-invariant probability measures; we shall show that $\mu=\nu$. Integrating the expression in (ii), we obtain
\[ \int f\,d\mu=\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\int f\circ T^j\,d\mu=\int\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}f\circ T^j\,d\mu=\int c(f)\,d\mu=c(f) \]
(that the convergence in (8.5.1) is uniform allows us to interchange integration and taking limits) and, by the same argument,
\[ \int f\,d\nu=c(f). \]
Therefore
\[ \int f\,d\mu=\int f\,d\nu \quad\text{for all } f\in C(X,\mathbb{R}) \]
and so $\mu=\nu$ (by the Riesz Representation Theorem).

We prove (i) implies (ii). Let $M(X,T)=\{\mu\}$. If (ii) is true, then, by the Dominated Convergence Theorem (Theorem 3.1.3), we must necessarily have $c(f)=\int f\,d\mu$. The convergence in (ii) means: $\forall f\in C(X,\mathbb{R})$, $\forall\varepsilon>0$, $\exists N\in\mathbb{N}$ such that if $n\ge N$ then for all $x\in X$ we have
\[ \Big|\frac1n\sum_{j=0}^{n-1}f(T^jx)-\int f\,d\mu\Big|<\varepsilon. \]
Suppose that (ii) is false. Then, negating the above quantifiers, we see that there exist $f_0\in C(X,\mathbb{R})$, $\varepsilon>0$ and an increasing sequence $n_k\uparrow\infty$ such that there exists $x_{n_k}$ for which
\[ \Big|\frac{1}{n_k}\sum_{j=0}^{n_k-1}f_0(T^jx_{n_k})-\int f_0\,d\mu\Big|\ge\varepsilon. \tag{8.5.2} \]
Define the probability measure $\mu_k\in M(X)$ by
\[ \mu_k=\frac{1}{n_k}\sum_{j=0}^{n_k-1}T_*^j\delta_{x_{n_k}}, \]
so that (8.5.2) can be written as
\[ \Big|\int f_0\,d\mu_k-\int f_0\,d\mu\Big|\ge\varepsilon. \]
Now $\mu_k\in M(X)$ and $M(X)$ is weak* compact. Hence there exists a weak* convergent subsequence, say with weak* limit $\nu$. By following the proof of Theorem 7.5.1, it is easy to see that $\nu\in M(X,T)$. In particular, we have
\[ \Big|\int f_0\,d\nu-\int f_0\,d\mu\Big|\ge\varepsilon. \]
Therefore, $\nu\neq\mu$, contradicting unique ergodicity. □

§8.6 Irrational rotations

Let $X=\mathbb{R}/\mathbb{Z}$, $T:X\to X:x\mapsto x+\alpha \bmod 1$, $\alpha$ irrational. We have already seen that Lebesgue measure $\mu$ is an ergodic $T$-invariant measure.
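Condition (ii) of Oxtoby's Ergodic Theorem can be explored numerically for this map. The sketch below is illustrative only: it takes $f(x)=e^{2\pi i\ell x}$ with $\ell\neq 0$, approximates the supremum over $x$ by a finite grid (an assumption, since a grid cannot certify a true supremum), and compares it with the geometric-series bound $2/(n\,|e^{2\pi i\ell\alpha}-1|)$. The names `ergodic_average`, `geometric_bound` and `sup_over_grid`, and the choice $\alpha=\sqrt2$, are hypothetical.

```python
import cmath
import math


def ergodic_average(ell, alpha, x, n):
    """(1/n) * sum_{j<n} f(T^j x) for f(x) = exp(2*pi*i*ell*x), T x = x + alpha mod 1."""
    total = sum(cmath.exp(2j * math.pi * ell * (x + j * alpha)) for j in range(n))
    return total / n


def geometric_bound(ell, alpha, n):
    """Upper bound 2 / (n * |exp(2*pi*i*ell*alpha) - 1|) from summing the geometric series."""
    return 2.0 / (n * abs(cmath.exp(2j * math.pi * ell * alpha) - 1.0))


def sup_over_grid(ell, alpha, n, grid=50):
    """Maximum of |ergodic average| over grid points x = k/grid (a proxy for sup over x)."""
    return max(abs(ergodic_average(ell, alpha, k / grid, n)) for k in range(grid))


if __name__ == "__main__":
    alpha = math.sqrt(2)  # an irrational rotation number
    for n in (10, 100, 1000):
        print(n, sup_over_grid(1, alpha, n), geometric_bound(1, alpha, n))
```

Because $|e^{2\pi i\ell x}|=1$, the modulus of the average does not depend on $x$, which is why the decay is uniform. For rational $\alpha$ the denominator $|e^{2\pi i\ell\alpha}-1|$ vanishes for suitable $\ell$ and the bound is unavailable.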
We can prove that Lebesgue measure is the only invariant measure.

Proposition 8.6.1
An irrational rotation of a circle is uniquely ergodic and the unique $T$-invariant measure is Lebesgue measure.

Proof. We use Oxtoby's Ergodic Theorem. To prove that $T$ is uniquely ergodic, we must show that (8.5.1) holds for every continuous function $f\in C(X,\mathbb{R})$. Note that the convergence in (8.5.1) is uniform, i.e. we must show that
\[ \Big\|\frac1n\sum_{j=0}^{n-1}f(T^j(x))-\int f\,d\mu\Big\|_\infty\to 0 \tag{8.6.1} \]
as $n\to\infty$.

We first prove (8.6.1) in the case when $f(x)=e^{2\pi i\ell x}$, $\ell\in\mathbb{Z}\setminus\{0\}$. Note that $T^j(x)=x+j\alpha$. Hence
\begin{align*}
\Big|\frac1n\sum_{j=0}^{n-1}f(T^j(x))\Big| &=\Big|\frac1n\sum_{j=0}^{n-1}e^{2\pi i\ell(x+j\alpha)}\Big| \\
&=\Big|e^{2\pi i\ell x}\,\frac1n\sum_{j=0}^{n-1}e^{2\pi i\ell\alpha j}\Big| \\
&=\frac1n\,\frac{|e^{2\pi i\ell\alpha n}-1|}{|e^{2\pi i\ell\alpha}-1|} \\
&\le\frac1n\,\frac{2}{|e^{2\pi i\ell\alpha}-1|}. \tag{8.6.2}
\end{align*}
As $\alpha$ is irrational, the denominator in (8.6.2) is not zero. Note also that $\int e^{2\pi i\ell x}\,d\mu=0$. Hence
\[ \sup_{x\in X}\Big|\frac1n\sum_{j=0}^{n-1}f(T^j(x))-\int f\,d\mu\Big|\to 0 \]
as $n\to\infty$ when $f(x)=e^{2\pi i\ell x}$, $\ell\in\mathbb{Z}\setminus\{0\}$. Clearly (8.6.1) holds when $f$ is a constant function. By taking finite linear combinations of exponential functions we see that
\[ \sup_{x\in X}\Big|\frac1n\sum_{j=0}^{n-1}g(T^j(x))-\int g\,d\mu\Big|\to 0 \]
as $n\to\infty$ for all trigonometric polynomials $g$.

By the Stone-Weierstrass Theorem (Theorem 1.2.2), trigonometric polynomials are uniformly dense in $C(X,\mathbb{R})$. Let $f\in C(X,\mathbb{R})$ and let $\varepsilon>0$. Then there exists a trigonometric polynomial $g$ such that $\|f-g\|_\infty<\varepsilon$. Hence for any $x\in X$ we have
\begin{align*}
\Big|\frac1n\sum_{j=0}^{n-1}f(T^j(x))-\int f\,d\mu\Big|
&\le\Big|\frac1n\sum_{j=0}^{n-1}\big(f(T^j(x))-g(T^j(x))\big)+\frac1n\sum_{j=0}^{n-1}g(T^j(x))-\int g\,d\mu+\int\big(g(x)-f(x)\big)\,d\mu\Big| \\
&\le\frac1n\sum_{j=0}^{n-1}\big|f(T^j(x))-g(T^j(x))\big|+\Big|\frac1n\sum_{j=0}^{n-1}g(T^j(x))-\int g\,d\mu\Big|+\int|g(x)-f(x)|\,d\mu \\
&\le 2\varepsilon+\Big|\frac1n\sum_{j=0}^{n-1}g(T^j(x))-\int g\,d\mu\Big|.
\end{align*}
Hence, taking the supremum over all $x\in X$, we have
\[ \Big\|\frac1n\sum_{j=0}^{n-1}f(T^j(x))-\int f\,d\mu\Big\|_\infty\le 2\varepsilon+\Big\|\frac1n\sum_{j=0}^{n-1}g(T^j(x))-\int g\,d\mu\Big\|_\infty. \]
Letting $n\to\infty$ we see that
\[ \limsup_{n\to\infty}\Big\|\frac1n\sum_{j=0}^{n-1}f(T^j(x))-\int f\,d\mu\Big\|_\infty<2\varepsilon. \]
As $\varepsilon>0$ is arbitrary, it follows that
\[ \lim_{n\to\infty}\Big\|\frac1n\sum_{j=0}^{n-1}f(T^j(x))-\int f\,d\mu\Big\|_\infty=0. \]
Hence statement (ii) in Oxtoby's Ergodic Theorem holds. As (i) and (ii) in Oxtoby's Ergodic Theorem are equivalent, it follows that $T$ is uniquely ergodic and Lebesgue measure is the unique invariant measure. □

§8.7 Exercises

Exercise 8.1
Prove the following identities concerning Radon-Nikodym derivatives.
(i) If $\nu\ll\mu$ and $f\in L^1(X,\mathcal{B},\mu)$ then $f\in L^1(X,\mathcal{B},\nu)$ and
\[ \int f\,d\nu=\int f\,\frac{d\nu}{d\mu}\,d\mu. \]
(ii) If $\nu_1,\nu_2\ll\mu$ then
\[ \frac{d(\nu_1+\nu_2)}{d\mu}=\frac{d\nu_1}{d\mu}+\frac{d\nu_2}{d\mu}. \]
(iii) If $\lambda\ll\nu\ll\mu$ then $\lambda\ll\mu$ and
\[ \frac{d\lambda}{d\mu}=\frac{d\lambda}{d\nu}\,\frac{d\nu}{d\mu}. \]

Exercise 8.2
Let $X=\mathbb{R}^3/\mathbb{Z}^3$ be the 3-dimensional torus. Fix $\alpha\notin\mathbb{Q}$. Define $T:X\to X$ by
\[ T\left(\begin{pmatrix}x\\y\\z\end{pmatrix}+\mathbb{Z}^3\right)=\begin{pmatrix}\alpha+x\\y+x\\z+y\end{pmatrix}+\mathbb{Z}^3. \]
Prove by induction that for $n\ge 3$
\[ T^n\left(\begin{pmatrix}x\\y\\z\end{pmatrix}+\mathbb{Z}^3\right)=\begin{pmatrix}\binom{n}{1}\alpha+x\\[2pt] \binom{n}{2}\alpha+\binom{n}{1}x+y\\[2pt] \binom{n}{3}\alpha+\binom{n}{2}x+\binom{n}{1}y+z\end{pmatrix}+\mathbb{Z}^3 \]
(here $\binom{n}{r}$ denotes the binomial coefficient).
Let $f(x,y,z)=e^{2\pi i(kx+\ell y+mz)}$ where $k,\ell,m\in\mathbb{Z}$. Assuming Weyl's Theorem on Polynomials (Theorem 2.3.1), prove using Weyl's Criterion (Theorem 1.2.1) that
\[ \sup_{x,y,z}\Big|\frac1n\sum_{j=0}^{n-1}f\big(T^j((x,y,z)+\mathbb{Z}^3)\big)\Big|\to 0 \]
as $n\to\infty$, whenever $(k,\ell,m)\in\mathbb{Z}^3\setminus\{(0,0,0)\}$.
Hence, using Oxtoby's Ergodic Theorem, prove that $T$ is uniquely ergodic and Lebesgue measure is the unique invariant measure.

Exercise 8.3
Let $T$ be a homeomorphism of a compact metric space $X$. Suppose that $T$ is uniquely ergodic with unique invariant measure $\mu$. Prove that every orbit of $T$ is dense if, and only if, $\mu(U)>0$ for every non-empty open set $U$.

9. Recurrence

§9.1 Introduction

We can now begin to study ergodic theorems. Before we do this, we discuss a remarkable result due to Poincaré.

§9.2 Poincaré's Recurrence Theorem

Theorem 9.2.1 (Poincaré's Recurrence Theorem)
Let $T:X\to X$ be a measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$. Let $B\in\mathcal{B}$ be such that $\mu(B)>0$. Then for $\mu$-a.e.
$x\in B$, the orbit $\{T^nx\}_{n=0}^\infty$ returns to $B$ infinitely often.

Proof. Let
\[ E=\{x\in B \mid T^nx\in B \text{ for infinitely many } n\ge 1\}; \]
then we have to show that $\mu(B\setminus E)=0$. If we write
\[ F=\{x\in B \mid T^nx\notin B\ \ \forall n\ge 1\} \]
then we have the identity
\[ B\setminus E=\bigcup_{k=0}^\infty(T^{-k}F\cap B). \]
Thus we have the estimate
\[ \mu(B\setminus E)=\mu\Big(\bigcup_{k=0}^\infty(T^{-k}F\cap B)\Big)\le\mu\Big(\bigcup_{k=0}^\infty T^{-k}F\Big)\le\sum_{k=0}^\infty\mu(T^{-k}F). \]
Since $\mu(T^{-k}F)=\mu(F)$ for all $k\ge 0$ (because the measure is preserved), it suffices to show that $\mu(F)=0$.

First suppose that $n>m$ and that $T^{-m}F\cap T^{-n}F\neq\emptyset$. If $y$ lies in this intersection then $T^my\in F$ and $T^{n-m}(T^my)=T^ny\in F\subset B$, which contradicts the definition of $F$. Thus $T^{-m}F$ and $T^{-n}F$ are disjoint.

Since $\{T^{-k}F\}_{k=0}^\infty$ is a disjoint family, we have
\[ \sum_{k=0}^\infty\mu(T^{-k}F)=\mu\Big(\bigcup_{k=0}^\infty T^{-k}F\Big)\le\mu(X)=1. \]
Since the terms in the summation have the constant value $\mu(F)$, we must have $\mu(F)=0$. □

Remark. Note that the hypotheses of the Poincaré Recurrence Theorem are very mild: all one needs is for $T$ to be a measure-preserving transformation of a probability space. (One does not need $T$ to be ergodic.) If you look carefully at the proof, you will see that the fact that $T$ is measure-preserving and the fact that $\mu(X)=1$ are used just once. The same proof continues to hold in the case when $\mu(X)$ is finite. Poincaré's Recurrence Theorem is false if either of the hypotheses that $\mu(X)$ is finite or that $T$ is measure-preserving is removed.

§9.3 Ergodic Theorems

An ergodic theorem is a result that describes the limiting behaviour of sequences of the form
\[ \frac1n\sum_{j=0}^{n-1}f\circ T^j \tag{9.3.1} \]
as $n\to\infty$. The precise formulation of an ergodic theorem depends on the class of function $f$ (for example, one could assume that $f$ is integrable, $L^2$, or continuous), and the notion of convergence used (for example, the convergence could be pointwise, $L^2$, or uniform).
We have already studied when one has uniform convergence of (9.3.1): this is Oxtoby's Ergodic Theorem and only holds in the very special circumstances when $T$ is uniquely ergodic. In what follows we will discuss von Neumann's (Mean) Ergodic Theorem and Birkhoff's Ergodic Theorem. Von Neumann's Ergodic Theorem is in the context of $f\in L^2(X,\mathcal{B},\mu)$ and $L^2$-convergence of the ergodic averages (9.3.1); Birkhoff's Ergodic Theorem is in the context of $f\in L^1(X,\mathcal{B},\mu)$ and almost everywhere pointwise convergence of (9.3.1). Note that $L^2$ convergence neither implies nor is implied by almost everywhere pointwise convergence.

Before stating these theorems, we first need to discuss conditional expectation.

§9.4 Conditional expectation

Let $(X,\mathcal{B},\mu)$ be a probability space. Let $\mathcal{A}\subset\mathcal{B}$ be a sub-σ-algebra. Note that $\mu$ defines a measure on $\mathcal{A}$ by restriction. Let $f\in L^1(X,\mathcal{B},\mu)$ be non-negative. Then we can define a measure $\nu$ on $\mathcal{A}$ by setting, for $A\in\mathcal{A}$,
\[ \nu(A)=\int_A f\,d\mu. \]
Note that $\nu\ll\mu|_{\mathcal{A}}$. Hence by the Radon-Nikodym theorem, there is a unique $\mathcal{A}$-measurable function $E(f\mid\mathcal{A})$ such that
\[ \nu(A)=\int_A E(f\mid\mathcal{A})\,d\mu \quad\text{for all } A\in\mathcal{A}. \]
We call $E(f\mid\mathcal{A})$ the conditional expectation of $f$ with respect to the σ-algebra $\mathcal{A}$.

So far, we have only defined $E(f\mid\mathcal{A})$ for non-negative $f$. To define $E(f\mid\mathcal{A})$ for an arbitrary real-valued $f$, we split $f$ into positive and negative parts $f=f_+-f_-$ where $f_+,f_-\ge 0$ and define
\[ E(f\mid\mathcal{A})=E(f_+\mid\mathcal{A})-E(f_-\mid\mathcal{A}). \]
For a complex-valued $f$ we split $f$ into its real and imaginary parts and define
\[ E(f\mid\mathcal{A})=E(\mathrm{Re}(f)\mid\mathcal{A})+iE(\mathrm{Im}(f)\mid\mathcal{A}). \]
Thus we can view conditional expectation as an operator
\[ E(\cdot\mid\mathcal{A}):L^1(X,\mathcal{B},\mu)\to L^1(X,\mathcal{A},\mu). \]
Note that $E(f\mid\mathcal{A})$ is uniquely determined by the two requirements that
(i) $E(f\mid\mathcal{A})$ is $\mathcal{A}$-measurable, and
(ii) $\int_A f\,d\mu=\int_A E(f\mid\mathcal{A})\,d\mu$ for all $A\in\mathcal{A}$.
Intuitively, one can think of $E(f\mid\mathcal{A})$ as the best approximation to $f$ in the smaller space of $\mathcal{A}$-measurable functions.
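The two defining requirements can be checked by hand in the simplest setting. The sketch below is an illustration under stated assumptions, not part of the original notes: $X$ is a finite set, $\mu$ is given by point masses, and $\mathcal{A}$ is the σ-algebra generated by a finite partition, in which case the conditional expectation is the $\mu$-weighted average of $f$ over each block. The function name `conditional_expectation` and the six-point example are hypothetical.

```python
from fractions import Fraction


def conditional_expectation(f, mu, partition):
    """E(f | A) for the sigma-algebra A generated by a finite partition of a finite set.

    f: dict point -> value, mu: dict point -> mass, partition: list of disjoint blocks.
    On each block of positive mass the result is the mu-weighted average of f.
    """
    result = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        if mass == 0:
            # E(f | A) is only determined mu-a.e., so any value works on a null block.
            average = Fraction(0)
        else:
            average = sum(f[x] * mu[x] for x in block) / mass
        for x in block:
            result[x] = average
    return result


# Example: six equally likely points, f(x) = x, and a partition into two blocks.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
f = {x: x for x in X}
blocks = [{0, 1, 2}, {3, 4, 5}]
E = conditional_expectation(f, mu, blocks)

if __name__ == "__main__":
    print([E[x] for x in X])
```

In this example the output is constant on each block (requirement (i)), and integrating `f` or `E` over either block gives the same number (requirement (ii)); every set in the generated σ-algebra is a union of blocks, so checking the blocks suffices.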
To state von Neumann's and Birkhoff's Ergodic Theorems precisely, we will need the sub-σ-algebra $\mathcal{I}$ of $T$-invariant subsets, namely:
\[ \mathcal{I}=\{B\in\mathcal{B} \mid T^{-1}B=B \text{ a.e.}\}. \]
It is straightforward to check that $\mathcal{I}$ is a σ-algebra. Note that if $T$ is ergodic then $\mathcal{I}$ is the trivial σ-algebra consisting of all sets in $\mathcal{B}$ of measure 0 or 1.

§9.5 Von Neumann's Ergodic Theorem

Von Neumann's Ergodic Theorem deals with the $L^2$-limiting behaviour of $\frac1n\sum_{j=0}^{n-1}f\circ T^j$ for $f\in L^2(X,\mathcal{B},\mu)$.

Theorem 9.5.1 (Von Neumann's Ergodic Theorem)
Let $(X,\mathcal{B},\mu)$ be a probability space and let $T:X\to X$ be a measure-preserving transformation. Let $\mathcal{I}$ denote the σ-algebra of $T$-invariant sets. Then for every $f\in L^2(X,\mathcal{B},\mu)$, we have
\[ \frac1n\sum_{j=0}^{n-1}f\circ T^j\to E(f\mid\mathcal{I}) \]
where the convergence is in $L^2$.

When $T$ is ergodic with respect to $\mu$, von Neumann's Ergodic Theorem takes a particularly simple form.

Corollary 9.5.2
Let $(X,\mathcal{B},\mu)$ be a probability space and let $T:X\to X$ be an ergodic measure-preserving transformation. Let $f\in L^2(X,\mathcal{B},\mu)$. Then
\[ \frac1n\sum_{j=0}^{n-1}f\circ T^j\to\int f\,d\mu, \quad\text{as } n\to\infty, \tag{9.5.1} \]
where the convergence is in $L^2$.

Proof. If $T$ is ergodic then $\mathcal{I}$ is the trivial σ-algebra $\mathcal{N}$ consisting of sets of measure 0 and 1. If $f\in L^2(X,\mathcal{B},\mu)$ then $E(f\mid\mathcal{N})=\int f\,d\mu$. □

Remark. The meaning of convergence in (9.5.1) is that
\[ \lim_{n\to\infty}\Big\|\frac1n\sum_{j=0}^{n-1}f\circ T^j-\int f\,d\mu\Big\|_2=0, \]
i.e.
\[ \lim_{n\to\infty}\left(\int\Big|\frac1n\sum_{j=0}^{n-1}f(T^jx)-\int f\,d\mu\Big|^2d\mu\right)^{1/2}=0. \]

§9.6 Proof of von Neumann's Ergodic Theorem

None of this section is examinable; it is included for people who like hard-core functional analysis!

We prove von Neumann's Ergodic Theorem in the case where $T$ is invertible. In order to prove von Neumann's Ergodic Theorem, it is useful to recast it in terms of linear analysis.

Theorem 9.6.1 (von Neumann's Ergodic Theorem for Operators)
Let $U$ be a unitary operator of a complex Hilbert space $H$. Let $I=\{v\in H \mid Uv=v\}$ be the closed subspace of $U$-invariant vectors and let $P_I:H\to I$ be orthogonal projection onto $I$.
Then for all $v\in H$ we have
\[ \frac1n\sum_{j=0}^{n-1}U^jv\to P_Iv \tag{9.6.1} \]
in the norm induced on $H$ by the inner product.

Proof of Theorem 9.6.1. Denote the inner product and norm on $H$ by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively. First note that if $v\in I$ then (9.6.1) holds, as
\[ \frac1n\sum_{j=0}^{n-1}U^jv=v=P_Iv. \]
If $v=Uw-w$ for some $w\in H$ then
\[ \Big\|\frac1n\sum_{j=0}^{n-1}U^jv\Big\|=\frac1n\|U^nw-w\|\le\frac1n\,2\|w\|\to 0. \]
If we let $C$ denote the norm-closure of the subspace $\{Uw-w \mid w\in H\}$ then it follows that
\[ \frac1n\sum_{j=0}^{n-1}U^jv\to 0 \]
for all $v\in C$, by approximation.

We claim that $H=I\oplus C$, an orthogonal decomposition. Suppose that $v\perp C$. Then $\langle v,Uw-w\rangle=0$ for all $w\in H$. Hence $\langle U^*v,w\rangle=\langle v,w\rangle$ for all $w\in H$. Hence $U^*v=v$. As $U$ is unitary, we have that $U^*=U^{-1}$. Hence $v=Uv$, so that $v\in I$. Reversing each implication we see that $v\in I$ implies $v\perp C$, and the claim follows. □

Remark. Note that an isometry of a Hilbert space $H$ is a linear operator $U$ such that $\langle Uv,Uw\rangle=\langle v,w\rangle$ for all $v,w\in H$. We say that $U$ is unitary if, in addition, it is invertible. Equivalently, $U$ is unitary if the dual operator $U^*$ is the inverse of $U$: $U^*U=UU^*=\mathrm{id}$.

We can prove von Neumann's Ergodic Theorem for an invertible measure-preserving transformation $T$ of a probability space $(X,\mathcal{B},\mu)$ as follows. Recall that $L^2(X,\mathcal{B},\mu)$ is a Hilbert space with respect to the inner product
\[ \langle f,g\rangle=\int f\bar g\,d\mu \]
and that $T$ induces a linear operator $U:L^2(X,\mathcal{B},\mu)\to L^2(X,\mathcal{B},\mu)$ by $Uf=f\circ T$. As $T$ is measure-preserving, we have that $U$ is an isometry; if $T$ is invertible then $U$ is unitary. Let $P_I:L^2(X,\mathcal{B},\mu)\to L^2(X,\mathcal{I},\mu)$ denote the orthogonal projection onto the subspace of $T$-invariant functions. One can easily check (see Exercise 9.6) that $P_If=E(f\mid\mathcal{I})$. Hence, when $T$ is invertible, Theorem 9.5.1 follows immediately from Theorem 9.6.1.

One can deduce from Theorem 9.6.1 that the result continues to hold when $U$ is an isometry and is not assumed to be invertible.
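The operator statement can be watched in action in a finite-dimensional toy case. The sketch below is an assumption-laden illustration rather than part of the notes: $X=\{0,\ldots,4\}$ with the uniform measure, $T$ is a permutation `sigma` (so $Uf=f\circ T$ permutes coordinates and is unitary), and the $U$-invariant vectors are exactly those constant on each cycle of `sigma`, so $P_I$ replaces each entry by the mean over its cycle. The names `ergodic_average` and `cycle_projection` are hypothetical, and rational entries are used so that the arithmetic is exact.

```python
from fractions import Fraction


def ergodic_average(v, sigma, n):
    """(1/n) * sum_{j<n} U^j v, where (U v)[i] = v[sigma[i]] (composition with T = sigma)."""
    size = len(v)
    total = [Fraction(0)] * size
    current = [Fraction(c) for c in v]
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]
        current = [current[sigma[i]] for i in range(size)]  # apply U once more
    return [t / n for t in total]


def cycle_projection(v, sigma):
    """Orthogonal projection onto U-invariant vectors: average v over each cycle of sigma."""
    size = len(v)
    out = [None] * size
    for start in range(size):
        if out[start] is not None:
            continue
        cycle = [start]
        i = sigma[start]
        while i != start:
            cycle.append(i)
            i = sigma[i]
        mean = sum(Fraction(v[k]) for k in cycle) / len(cycle)
        for k in cycle:
            out[k] = mean
    return out


# Example: sigma has a 3-cycle (0 1 2) and a 2-cycle (3 4).
sigma = [1, 2, 0, 4, 3]
v = [1, 2, 6, 0, 10]

if __name__ == "__main__":
    print(cycle_projection(v, sigma))
    print(ergodic_average(v, sigma, 6))
    print([float(c) for c in ergodic_average(v, sigma, 601)])
```

When $n$ is a common multiple of the cycle lengths the average equals the projection exactly; for other $n$ the error is of order $1/n$, in line with the estimate $\frac1n\|U^nw-w\|\le\frac{2}{n}\|w\|$ used in the proof.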
§9.7 Exercises

Exercise 9.1
Construct an example to show that Poincaré's Recurrence Theorem does not hold on infinite measure spaces. That is, find a measure space $(X,\mathcal{B},\mu)$ with $\mu(X)=\infty$ and a measure-preserving transformation $T:X\to X$ such that the conclusion of Poincaré's Recurrence Theorem does not hold.

Exercise 9.2
Poincaré's Recurrence Theorem says that, if we have a measure-preserving transformation $T$ of a probability space $(X,\mathcal{B},\mu)$ and a set $A\in\mathcal{B}$, $\mu(A)>0$, then, if we start iterating a typical point $x\in A$, the orbit of $x$ will return to $A$ infinitely often.
Construct an example to show that if we have a measure-preserving transformation $T$ of a probability space $(X,\mathcal{B},\mu)$ and two sets $A,B\in\mathcal{B}$, $\mu(A),\mu(B)>0$, then, if we start iterating a typical point $x\in A$, the orbit of $x$ does not necessarily visit $B$ infinitely often.

Exercise 9.3
(i) Prove that $f\mapsto E(f\mid\mathcal{A})$ is linear.
(ii) Suppose that $T$ is a measure-preserving transformation. Show that $E(f\mid\mathcal{A})\circ T=E(f\circ T\mid T^{-1}\mathcal{A})$.
(iii) Show that $E(f\mid\mathcal{B})=f$.
(iv) Let $\mathcal{N}$ denote the trivial σ-algebra consisting of all sets of measure 0 and 1. Show that a function $f$ is $\mathcal{N}$-measurable if and only if it is constant a.e. Show that $E(f\mid\mathcal{N})=\int f\,d\mu$.

Exercise 9.4
Let $(X,\mathcal{B},\mu)$ be a probability space.
(i) Let $\alpha=\{A_1,\ldots,A_n\}$, $A_j\in\mathcal{B}$, be a finite partition of $X$. (By a partition we mean that $X=\bigcup_{j=1}^nA_j$ and $A_i\cap A_j=\emptyset$ if $i\neq j$.) Let $\mathcal{A}$ denote the set of all finite unions of sets in $\alpha$. Check that $\mathcal{A}$ is a σ-algebra.
(ii) Show that $g:X\to\mathbb{R}$ is $\mathcal{A}$-measurable if and only if $g$ is constant on each $A_j$, i.e.
\[ g(x)=\sum_{j=1}^nc_j\chi_{A_j}(x). \]
(iii) Let $f\in L^1(X,\mathcal{B},\mu)$. Show that
\[ E(f\mid\mathcal{A})(x)=\sum_{j=1}^n\chi_{A_j}(x)\,\frac{\int_{A_j}f\,d\mu}{\mu(A_j)}. \]
Thus $E(f\mid\mathcal{A})$ is the best approximation to $f$ that is constant on sets in the partition $\alpha$.

Exercise 9.5
Prove that $\mathcal{I}$ is a σ-algebra.
Exercise 9.6
Let $T$ be a measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$ and let $\mathcal{I}$ denote the sub-σ-algebra of $T$-invariant sets. Let $P_I:L^2(X,\mathcal{B},\mu)\to L^2(X,\mathcal{I},\mu)$ denote the orthogonal projection onto the subspace of $T$-invariant functions. Prove that $P_If=E(f\mid\mathcal{I})$ for all $f\in L^2(X,\mathcal{B},\mu)$.

10. Birkhoff's Ergodic Theorem

§10.1 Birkhoff's Ergodic Theorem

Birkhoff's Ergodic Theorem deals with the behaviour of $\frac1n\sum_{j=0}^{n-1}f(T^jx)$ for $\mu$-a.e. $x\in X$, and for $f\in L^1(X,\mathcal{B},\mu)$.

Theorem 10.1.1 (Birkhoff's Ergodic Theorem)
Let $(X,\mathcal{B},\mu)$ be a probability space and let $T:X\to X$ be a measure-preserving transformation. Let $\mathcal{I}$ denote the σ-algebra of $T$-invariant sets. Then for every $f\in L^1(X,\mathcal{B},\mu)$, we have
\[ \frac1n\sum_{j=0}^{n-1}f(T^jx)\to E(f\mid\mathcal{I})(x) \]
for $\mu$-a.e. $x\in X$.

Corollary 10.1.2 (Birkhoff's Ergodic Theorem for an ergodic transformation)
Let $(X,\mathcal{B},\mu)$ be a probability space and let $T:X\to X$ be an ergodic measure-preserving transformation. Let $f\in L^1(X,\mathcal{B},\mu)$. Then
\[ \frac1n\sum_{j=0}^{n-1}f(T^jx)\to\int f\,d\mu, \quad\text{as } n\to\infty, \]
for $\mu$-a.e. $x\in X$.

§10.2 Consequences of, and criteria for, ergodicity

Here we give some simple corollaries of Birkhoff's Ergodic Theorem. The first result says that, for a typical orbit of an ergodic dynamical system, 'time averages' equal 'space averages'.

Corollary 10.2.1
Let $T$ be an ergodic measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$. Suppose that $B\in\mathcal{B}$. Then for $\mu$-a.e. $x\in X$, the frequency with which the orbit of $x$ lies in $B$ is given by $\mu(B)$, i.e.,
\[ \lim_{n\to\infty}\frac1n\,\mathrm{card}\{j\in\{0,1,\ldots,n-1\} \mid T^jx\in B\}=\mu(B) \quad\mu\text{-a.e.} \]

Proof. Apply the Birkhoff Ergodic Theorem with $f=\chi_B$. □

It is possible to characterise ergodicity in terms of the behaviour of iteration of pre-images of sets, rather than the iteration of points, under the dynamics. The next result deals with this.

Proposition 10.2.2
Let $(X,\mathcal{B},\mu)$ be a probability space and let $T:X\to X$ be a measure-preserving transformation. The following are equivalent:
(i) $T$ is ergodic;
(ii) for all $A,B\in\mathcal{B}$,
\[ \frac1n\sum_{j=0}^{n-1}\mu(T^{-j}A\cap B)\to\mu(A)\mu(B), \quad\text{as } n\to\infty. \]

Proof. (i) ⇒ (ii): Suppose that $T$ is ergodic. Since $\chi_A\in L^1(X,\mathcal{B},\mu)$, Birkhoff's Ergodic Theorem tells us that
\[ \frac1n\sum_{j=0}^{n-1}\chi_A(T^jx)\to\mu(A), \quad\text{as } n\to\infty, \]
for $\mu$-a.e. $x\in X$. Multiplying both sides by $\chi_B$ gives
\[ \frac1n\sum_{j=0}^{n-1}\chi_A(T^jx)\,\chi_B(x)\to\mu(A)\chi_B(x), \quad\text{as } n\to\infty, \]
for $\mu$-a.e. $x\in X$. Since the left-hand side is bounded (by 1), we can apply the Dominated Convergence Theorem (Theorem 3.1.3) to see that
\[ \frac1n\sum_{j=0}^{n-1}\mu(T^{-j}A\cap B)=\frac1n\sum_{j=0}^{n-1}\int\chi_A\circ T^j\,\chi_B\,d\mu=\int\frac1n\sum_{j=0}^{n-1}\chi_A\circ T^j\,\chi_B\,d\mu\to\mu(A)\mu(B), \]
as $n\to\infty$.

(ii) ⇒ (i): Now suppose that the convergence holds. Suppose that $T^{-1}B=B$ and take $A=B$. Then $\mu(T^{-j}A\cap B)=\mu(B)$ so
\[ \frac1n\sum_{j=0}^{n-1}\mu(B)\to\mu(B)^2, \quad\text{as } n\to\infty. \]
This gives $\mu(B)=\mu(B)^2$. Therefore $\mu(B)=0$ or 1 and so $T$ is ergodic. □

§10.3 Kac's Lemma

Poincaré's Recurrence Theorem tells us that, under a measure-preserving transformation, almost every point of a subset $A$ of positive measure will return to $A$. However, it does not tell us how long we should have to wait for this to happen. One would expect that return times to sets of large measure are small, and that return times to sets of small measure are large. This is indeed the case, and forms the content of Kac's Lemma.

Let $T:X\to X$ be a measure-preserving transformation of a probability space $(X,\mathcal{B},\mu)$ and let $A\subset X$ be a measurable subset with $\mu(A)>0$. By Poincaré's Recurrence Theorem, the integer
\[ n_A(x)=\inf\{n\ge 1 \mid T^n(x)\in A\} \]
is defined for a.e. $x\in A$.

Theorem 10.3.1 (Kac's Lemma)
Let $T$ be an ergodic measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$. Let $A\in\mathcal{B}$ be such that $\mu(A)>0$. Then
\[ \int_An_A\,d\mu=1. \]

Proof. Let
\[ A_n=A\cap T^{-1}A^c\cap\cdots\cap T^{-(n-1)}A^c\cap T^{-n}A. \]
Then $A_n$ consists of those points in $A$ that return to $A$ after exactly $n$ iterations of $T$, i.e. $A_n = \{x \in A \mid n_A(x) = n\}$.

Consider the illustration in Figure 10.3.1. As $T$ is ergodic, almost every point of $X$ eventually enters $A$. Hence the diagram represents almost all of $X$.

[Figure 10.3.1: The return times to $A$. The set $A$ is split into the sets $A_1, A_2, A_3, \ldots, A_n, \ldots$; above each $A_n$ is a column of sets linked by $T$.]

Note that the column above $A_n$ in the diagram consists of $n$ sets, $A_{n,0}, \ldots, A_{n,n-1}$ say, with $A_{n,0} = A_n$. Note that $T^{-k}A_{n,k} = A_n$. As $T$ is measure-preserving, it follows that $\mu(A_{n,k}) = \mu(A_n)$ for $k = 0, \ldots, n-1$. Hence
\[ 1 = \mu(X) = \sum_{n=1}^{\infty} \sum_{k=0}^{n-1} \mu(A_{n,k}) = \sum_{n=1}^{\infty} n\,\mu(A_n) = \sum_{n=1}^{\infty} \int_{A_n} n_A \, d\mu = \int_A n_A \, d\mu. \]
□

Remark. Let $A$ be as in the statement of Kac’s Lemma (Theorem 10.3.1). Define a probability measure $\mu_A$ on $A$ by $\mu_A = \mu/\mu(A)$, so that $\mu_A(A) = 1$. Then Kac’s Lemma says that
\[ \int_A n_A \, d\mu_A = \frac{1}{\mu(A)}, \]
i.e. the expected return time of a point in $A$ to the set $A$ is $1/\mu(A)$.

§10.4 Ehrenfests’ example

The following example, due to P. and T. Ehrenfest, demonstrates that the return times in Poincaré’s Recurrence Theorem may be extremely large.

Consider two urns. One urn contains 100 balls, numbered 1 to 100, and the other urn is empty. We also have a random number generator: this could be a bag containing 100 slips of paper, numbered 1 to 100. Each second, a slip of paper is drawn from the bag, the number is noted, and the slip of paper is returned to the bag. The ball bearing that number is then moved from whichever urn it is currently in to the other urn.

Naively, we would expect that the system will settle into an equilibrium state in which there are 50 balls in each urn. Of course, there will continue to be small random fluctuations about the 50-50 distribution. However, it would appear highly unlikely for the system to return to the state in which 100 balls are in the first urn.
Nevertheless, the Poincaré Recurrence Theorem tells us that this situation will occur almost surely and Kac’s Lemma tells us how long we should expect to wait.

To see this, we represent the system as a shift on 101 symbols with an appropriate Markov measure. Regard $x_j \in \{0, \ldots, 100\}$ as being the number of balls in the first urn after $j$ seconds. Hence a sequence $(x_j)_{j=0}^{\infty}$ records the number of balls in the first urn at each time. Let
\[ \Sigma = \{ x = (x_j)_{j=0}^{\infty} \mid x_j \in \{0, 1, \ldots, 100\} \}. \]

Let $p(i)$ denote the probability of there being $i$ balls in the first urn. This is equal to the number of possible ways of choosing $i$ balls from 100, divided by the total number of ways of distributing 100 balls across the 2 urns. There are $\binom{100}{i}$ ways of choosing $i$ balls from 100 balls. As there are 2 possible urns for each ball to be in, there are $2^{100}$ possible arrangements of all the balls. Hence the probability of there being $i$ balls in the first urn is
\[ p(i) = \frac{1}{2^{100}} \binom{100}{i}. \]

If we have $i$ balls in the first urn then at the next stage we must have either $i-1$ or $i+1$ balls in the first urn. The number of balls becomes $i-1$ if the random number chosen is equal to the number of one of the balls in the first urn. As there are currently $i$ such balls, the probability of this happening is $i/100$. Hence the probability $P(i, i-1)$ that there are $i-1$ balls remaining given that we started with $i$ balls in the first urn is $i/100$. Similarly, the probability $P(i, i+1)$ that there are $i+1$ balls in the first urn given that we started with $i$ balls is $(100-i)/100$. If $j \ne i-1, i+1$ then we cannot have $j$ balls in the first urn given that we started with $i$ balls; thus $P(i, j) = 0$. This defines a stochastic matrix:
\[ P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
\frac{1}{100} & 0 & \frac{99}{100} & 0 & 0 & \cdots \\
0 & \frac{2}{100} & 0 & \frac{98}{100} & 0 & \cdots \\
0 & 0 & \frac{3}{100} & 0 & \frac{97}{100} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}. \]

It is straightforward to check that $pP = p$. Hence we have a Markov probability measure $\mu_P$ defined on $\Sigma$.
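The identity $pP = p$ can also be confirmed by an exact computation. The following is a minimal sketch using rational arithmetic; the variable and function names are chosen for illustration only and are not part of the notes.

```python
from fractions import Fraction
from math import comb

N = 100  # number of balls

# Candidate stationary vector p(i) = C(N, i) / 2^N for i = 0, ..., N.
p = [Fraction(comb(N, i), 2**N) for i in range(N + 1)]

def P(i, j):
    """Transition probability from i balls to j balls in the first urn."""
    if j == i - 1:
        return Fraction(i, N)
    if j == i + 1:
        return Fraction(N - i, N)
    return Fraction(0)

# (pP)(j) = sum_i p(i) P(i, j); only i = j - 1 and i = j + 1 can contribute.
pP = [sum(p[i] * P(i, j) for i in (j - 1, j + 1) if 0 <= i <= N)
      for j in range(N + 1)]

is_stationary = (pP == p)   # exact equality of rational vectors
total_mass = sum(p)         # p should be a probability vector
print(is_stationary, total_mass)
```

Because the arithmetic is exact, `is_stationary` tests the identity itself rather than an approximation to it. The same check works for any number of balls `N`, for instance the 4-ball version of the system.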
The matrix $P$ is irreducible (but is not aperiodic); this ensures that $\mu_P$ is ergodic.

Consider the cylinder $A = [100]$ of length 1. This represents there being 100 balls in the first urn. By Poincaré’s Recurrence Theorem, if we start in $A$ then we return to $A$ infinitely often. Thus, with probability 1, we will return to the situation where all 100 balls have returned to the first urn, and this will happen infinitely often!

We can use Kac’s Lemma to calculate the expected amount of time we will have to wait until all the balls first return to the first urn. By Kac’s Lemma, the expected first return time to $A$ is
\[ \frac{1}{\mu_P(A)} = 2^{100} \text{ seconds}, \]
which is about $4 \times 10^{22}$ years, or about $3 \times 10^{12}$ times the length of time that the Universe has so far existed!

(This measure-preserving system, with 4 balls rather than 100, was also studied in Exercise 3.11.)

§10.5 Proof of Birkhoff’s Ergodic Theorem

None of this section is examinable; it is included for people who like hard-core ε-δ analysis!

The proof is based on the following inequality.

Theorem 10.5.1 (Maximal Inequality)
Let $(X, \mathcal{B}, \mu)$ be a probability space, let $T : X \to X$ be a measure-preserving transformation and let $f \in L^1(X, \mathcal{B}, \mu)$. Define $f_0 = 0$ and, for $n \ge 1$,
\[ f_n = f + f \circ T + \cdots + f \circ T^{n-1}. \]
For $n \ge 1$, set $F_n(x) = \max_{0 \le j \le n} f_j(x)$, so that $F_n(x) \ge 0$. Then
\[ \int_{\{x \in X \mid F_n(x) > 0\}} f \, d\mu \ge 0. \]

Proof. Clearly $F_n \in L^1(X, \mathcal{B}, \mu)$. For $0 \le j \le n$, we have $F_n \ge f_j$, so $F_n \circ T \ge f_j \circ T$. Hence
\[ F_n \circ T + f \ge f_j \circ T + f = f_{j+1} \]
and therefore
\[ F_n \circ T(x) + f(x) \ge \max_{1 \le j \le n} f_j(x). \]
If $F_n(x) > 0$ then
\[ \max_{1 \le j \le n} f_j(x) = \max_{0 \le j \le n} f_j(x) = F_n(x), \]
so we obtain that $f \ge F_n - F_n \circ T$ on the set $A = \{x \mid F_n(x) > 0\}$. Hence
\begin{align*}
\int_A f \, d\mu &\ge \int_A F_n \, d\mu - \int_A F_n \circ T \, d\mu \\
&= \int_X F_n \, d\mu - \int_A F_n \circ T \, d\mu && \text{as } F_n = 0 \text{ on } X \setminus A \\
&\ge \int_X F_n \, d\mu - \int_X F_n \circ T \, d\mu && \text{as } F_n \circ T \ge 0 \\
&= 0 && \text{as } \mu \text{ is } T\text{-invariant.}
\end{align*}
□

Corollary 10.5.2
Let $g \in L^1(X, \mathcal{B}, \mu)$ and let
\[ M_\alpha = \left\{ x \in X \;\middle|\; \sup_{n \ge 1} \frac{1}{n}\sum_{j=0}^{n-1} g(T^j x) > \alpha \right\}. \]
Then for all $B \in \mathcal{B}$ with $T^{-1}B = B$ we have that
\[ \int_{M_\alpha \cap B} g \, d\mu \ge \alpha\,\mu(M_\alpha \cap B). \]

Proof. Suppose first that $B = X$. Let $f = g - \alpha$; then
\[ M_\alpha = \bigcup_{n=1}^{\infty} \left\{ x \;\middle|\; \sum_{j=0}^{n-1} g(T^j x) > n\alpha \right\} = \bigcup_{n=1}^{\infty} \{x \mid f_n(x) > 0\} = \bigcup_{n=1}^{\infty} \{x \mid F_n(x) > 0\} \]
(since $f_n(x) > 0 \Rightarrow F_n(x) > 0$ and $F_n(x) > 0 \Rightarrow f_j(x) > 0$ for some $1 \le j \le n$). Write $C_n = \{x \mid F_n(x) > 0\}$ and observe that $C_n \subset C_{n+1}$. Thus $\chi_{C_n}$ converges to $\chi_{M_\alpha}$ and so $f\chi_{C_n}$ converges to $f\chi_{M_\alpha}$, as $n \to \infty$. Furthermore, $|f\chi_{C_n}| \le |f|$. Hence, by the Dominated Convergence Theorem,
\[ \int_{C_n} f \, d\mu = \int_X f\chi_{C_n} \, d\mu \to \int_X f\chi_{M_\alpha} \, d\mu = \int_{M_\alpha} f \, d\mu, \quad \text{as } n \to \infty. \]
Applying the Maximal Inequality, we have for all $n \ge 1$ that $\int_{C_n} f \, d\mu \ge 0$. Therefore $\int_{M_\alpha} f \, d\mu \ge 0$, i.e., $\int_{M_\alpha} g \, d\mu \ge \alpha\,\mu(M_\alpha)$.

For the general case, we work with the restriction of $T$ to $B$, $T|_B : B \to B$, and apply the Maximal Inequality on this subset to get
\[ \int_{M_\alpha \cap B} g \, d\mu \ge \alpha\,\mu(M_\alpha \cap B), \]
as required. □

We will also need the following convergence result.

Proposition 10.5.3 (Fatou’s Lemma)
Let $(X, \mathcal{B}, \mu)$ be a probability space and suppose that $f_n : X \to \mathbb{R}$ are non-negative measurable functions. Define $f(x) = \liminf_{n \to \infty} f_n(x)$. Then $f$ is measurable and
\[ \int f \, d\mu \le \liminf_{n \to \infty} \int f_n \, d\mu \]
(one or both of these expressions may be infinite).

Proof of Birkhoff’s Ergodic Theorem. Let
\[ f^*(x) = \limsup_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x), \qquad f_*(x) = \liminf_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x). \]
These exist (but may be $\pm\infty$) at all points $x \in X$. Clearly $f_*(x) \le f^*(x)$. Let
\[ a_n(x) = \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x). \]
Observe that
\[ \frac{n+1}{n}\, a_{n+1}(x) = a_n(Tx) + \frac{1}{n} f(x). \]
As $f$ is finite $\mu$-a.e., we have that $f(x)/n \to 0$ $\mu$-a.e. as $n \to \infty$. Hence, taking the lim sup and lim inf as $n \to \infty$ gives us that $f^* \circ T = f^*$ $\mu$-a.e. and $f_* \circ T = f_*$ $\mu$-a.e.

We have to show
(i) $f^* = f_*$ $\mu$-a.e.,
(ii) $f^* \in L^1(X, \mathcal{B}, \mu)$,
(iii) $\int f^* \, d\mu = \int f \, d\mu$.

We prove (i). For $\alpha, \beta \in \mathbb{R}$, define
\[ E_{\alpha,\beta} = \{ x \in X \mid f_*(x) < \beta \text{ and } f^*(x) > \alpha \}. \]
Note that
\[ \{ x \in X \mid f_*(x) < f^*(x) \} = \bigcup_{\beta < \alpha,\ \alpha, \beta \in \mathbb{Q}} E_{\alpha,\beta} \]
(a countable union).
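Thus, to show that $f^* = f_*$ $\mu$-a.e., it suffices to show that $\mu(E_{\alpha,\beta}) = 0$ whenever $\beta < \alpha$. Since $f_* \circ T = f_*$ and $f^* \circ T = f^*$, we see that $T^{-1}E_{\alpha,\beta} = E_{\alpha,\beta}$. If we write
\[ M_\alpha = \left\{ x \in X \;\middle|\; \sup_{n \ge 1} \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) > \alpha \right\} \]
then $E_{\alpha,\beta} \cap M_\alpha = E_{\alpha,\beta}$.

Applying Corollary 10.5.2 we have that
\[ \int_{E_{\alpha,\beta}} f \, d\mu = \int_{E_{\alpha,\beta} \cap M_\alpha} f \, d\mu \ge \alpha\,\mu(E_{\alpha,\beta} \cap M_\alpha) = \alpha\,\mu(E_{\alpha,\beta}). \]
Replacing $f$, $\alpha$ and $\beta$ by $-f$, $-\beta$ and $-\alpha$, and using the fact that $(-f)^* = -f_*$ and $(-f)_* = -f^*$, we also get
\[ \int_{E_{\alpha,\beta}} f \, d\mu \le \beta\,\mu(E_{\alpha,\beta}). \]
Therefore
\[ \alpha\,\mu(E_{\alpha,\beta}) \le \beta\,\mu(E_{\alpha,\beta}) \]
and since $\beta < \alpha$ this shows that $\mu(E_{\alpha,\beta}) = 0$. Thus $f^* = f_*$ $\mu$-a.e. and
\[ \lim_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) = f^*(x) \quad \mu\text{-a.e.} \]

We prove (ii). Let
\[ g_n(x) = \left| \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) \right|. \]
Then $g_n \ge 0$ and
\[ \int g_n \, d\mu \le \int |f| \, d\mu, \]
so we can apply Fatou’s Lemma (Proposition 10.5.3) to conclude that $\lim_{n \to \infty} g_n = |f^*|$ is integrable, i.e., that $f^* \in L^1(X, \mathcal{B}, \mu)$.

We prove (iii). For $n \in \mathbb{N}$ and $k \in \mathbb{Z}$, define
\[ D_k^n = \left\{ x \in X \;\middle|\; \frac{k}{n} \le f^*(x) < \frac{k+1}{n} \right\}. \]
For every $\varepsilon > 0$, we have that
\[ D_k^n \cap M_{\frac{k}{n} - \varepsilon} = D_k^n. \]
Since $T^{-1}D_k^n = D_k^n$, we can apply Corollary 10.5.2 again to obtain
\[ \int_{D_k^n} f \, d\mu \ge \left( \frac{k}{n} - \varepsilon \right) \mu(D_k^n). \]
Since $\varepsilon > 0$ is arbitrary, we have
\[ \int_{D_k^n} f \, d\mu \ge \frac{k}{n}\,\mu(D_k^n). \]
Thus
\[ \int_{D_k^n} f^* \, d\mu \le \frac{k+1}{n}\,\mu(D_k^n) \le \frac{1}{n}\,\mu(D_k^n) + \int_{D_k^n} f \, d\mu \]
(where the first inequality follows from the definition of $D_k^n$). Since
\[ X = \bigcup_{k \in \mathbb{Z}} D_k^n \]
(a disjoint union), summing over $k \in \mathbb{Z}$ gives
\[ \int_X f^* \, d\mu \le \frac{1}{n}\,\mu(X) + \int_X f \, d\mu = \frac{1}{n} + \int_X f \, d\mu. \]
Since this holds for all $n \ge 1$, we obtain
\[ \int_X f^* \, d\mu \le \int_X f \, d\mu. \]
Applying the same argument to $-f$ gives
\[ \int (-f)^* \, d\mu \le \int -f \, d\mu \]
so that
\[ \int f^* \, d\mu = \int f_* \, d\mu \ge \int f \, d\mu. \]
Therefore
\[ \int f^* \, d\mu = \int f \, d\mu, \]
as required.

Finally, we prove that $f^* = E(f \mid \mathcal{I})$. First note that as $f^*$ is $T$-invariant, it is measurable with respect to $\mathcal{I}$. Moreover, if $I$ is any $T$-invariant set then
\[ \int_I f \, d\mu = \int_I f^* \, d\mu. \]
Hence $f^* = E(f \mid \mathcal{I})$.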
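The conclusion $f^* = E(f \mid \mathcal{I})$ is easiest to see in a finite example where $T$ is measure-preserving but not ergodic. The sketch below is an illustration under assumptions that are not part of the notes: $X = \{0, \ldots, 5\}$ with the uniform probability measure, $T$ a permutation with cycles $(0\,1\,2)$, $(3\,4)$, $(5)$, and an arbitrary function $f$. Here the invariant sets are the unions of cycles, so $E(f \mid \mathcal{I})$ is the average of $f$ over the cycle containing $x$.

```python
from fractions import Fraction

# Assumed toy system: a permutation of {0,...,5} with cycles (0 1 2), (3 4), (5).
# It preserves the uniform measure but is not ergodic.
T = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3, 5: 5}
f = {0: 1, 1: 2, 2: 6, 3: 10, 4: 20, 5: 7}

def birkhoff_average(f, T, x, n):
    """(1/n) * sum_{j=0}^{n-1} f(T^j x), computed exactly."""
    total = Fraction(0)
    for _ in range(n):
        total += f[x]
        x = T[x]
    return total / n

def conditional_expectation(f, T, x):
    """E(f | I)(x) for the uniform measure: the average of f over the cycle of x."""
    orbit = [x]
    y = T[x]
    while y != x:
        orbit.append(y)
        y = T[y]
    return Fraction(sum(f[z] for z in orbit), len(orbit))

n = 600  # a multiple of every cycle length, so the averages are exact
averages = [birkhoff_average(f, T, x, n) for x in range(6)]
expectations = [conditional_expectation(f, T, x) for x in range(6)]
print(averages)
print(expectations)
```

The time averages depend on the starting point (they are constant on each cycle but differ between cycles), which is exactly what a non-constant $E(f \mid \mathcal{I})$ records. Their mean over $X$ is still $\int f \, d\mu$, in line with part (iii) of the proof.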
□

§10.6 Exercises

Exercise 10.1
Suppose that $T$ is an ergodic measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$ and suppose that $f \in L^1(X, \mathcal{B}, \mu)$. Prove that
\[ \lim_{n \to \infty} \frac{f(T^n x)}{n} = 0 \quad \mu\text{-a.e.} \]

Exercise 10.2
Deduce from Birkhoff’s Ergodic Theorem that if $T$ is an ergodic measure-preserving transformation of a probability space $(X, \mathcal{B}, \mu)$ and $f \ge 0$ is measurable but $\int f \, d\mu = \infty$ then
\[ \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) \to \infty \quad \mu\text{-a.e.} \]
(Hint: define $f_M = \min\{f, M\}$ and note that $f_M \in L^1(X, \mathcal{B}, \mu)$. Apply Birkhoff’s Ergodic Theorem to each $f_M$.)

Exercise 10.3
Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. Prove that the following are equivalent:
(i) $T$ is ergodic with respect to $\mu$,
(ii) for all $f, g \in L^2(X, \mathcal{B}, \mu)$ we have that
\[ \lim_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1} \int f(T^j x)\, g(x) \, d\mu = \int f \, d\mu \int g \, d\mu. \]

Exercise 10.4
Let $X$ be a compact metric space equipped with the Borel σ-algebra $\mathcal{B}$ and let $T : X \to X$ be continuous. Suppose that $\mu \in M(X)$ is an ergodic measure. Prove that there exists a set $Y \in \mathcal{B}$ with $\mu(Y) = 1$ such that
\[ \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) \to \int f \, d\mu \]
for all $x \in Y$ and for all $f \in C(X, \mathbb{R})$.
(Thus, in the special case of a continuous transformation of a compact metric space and continuous functions $f$, the set of full measure for which Corollary 10.1.2 holds can be chosen to be independent of the function $f$.)

Exercise 10.5
A popular illustration of recurrence concerns a monkey typing the complete works of Shakespeare on a typewriter. Here we study this from an ergodic-theoretic viewpoint.
Imagine a(n idealised) monkey typing on a typewriter. Each second he types one letter, and each letter occurs with equal probability (independently of the preceding letter). Suppose that the keyboard has 26 keys (so no space bar, carriage return, numbers, etc). Show how to model this using a shift space on 26 symbols with an appropriate Bernoulli measure.
Use Birkhoff’s Ergodic Theorem to show that the monkey must, with probability 1, eventually type the word ‘MONKEY’.
Use Kac’s Lemma to calculate the expected time it would take for the monkey to first type ‘MONKEY’.

11. Applications of Birkhoff’s Ergodic Theorem

§11.1 Introduction

We will show how to use Birkhoff’s Ergodic Theorem to prove some interesting results in number theory.

§11.2 Normal and simply normal numbers

Recall that any number $x \in [0, 1]$ can be written as a decimal
\[ x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{10^{j+1}} \]
where $x_j \in \{0, 1, \ldots, 9\}$. This decimal expansion is unique unless the decimal expansion ends in either infinitely repeated 0s or infinitely repeated 9s.

More generally, given any integer base $b \ge 2$, we can write $x \in [0, 1]$ as a base $b$ expansion:
\[ x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} \]
where $x_j \in \{0, 1, \ldots, b-1\}$. This expansion is unique unless it ends in either infinitely repeated 0s or infinitely repeated $(b-1)$s.

Definition. A number $x \in [0, 1]$ is said to be simply normal in base $b$ if for each $k = 0, 1, \ldots, b-1$, the frequency with which digit $k$ occurs in the base $b$ expansion of $x$ is equal to $1/b$.

Remarks.
1. Thus a number is simply normal in base $b$ if all of the $b$ possible digits in its base $b$ expansion are equally likely to occur.
2. It is straightforward to construct examples of simply normal numbers in a given base. For example,
\[ x = \cdot 012 \cdots 9\, 012 \cdots 9 \cdots, \]
consisting of the block of decimal digits $012 \cdots 9$ infinitely repeated, is simply normal in base 10. If a number is simply normal in one base then it need not be simply normal in any other base.

Fix $b \ge 2$. Define the map $T : [0, 1] \to [0, 1]$ by $T(x) = T_b(x) = bx \bmod 1$. It is easy to see, by following any of the arguments we have seen for the doubling map, that Lebesgue measure $\mu$ on $[0, 1]$ is an ergodic invariant measure for $T$.
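The map $T_b$ can be iterated exactly on rational points, and doing so shows concretely how it interacts with base $b$ digits. The following is a small sketch under illustrative assumptions (the function names `T` and `digits`, the point $x = 1/7$ and the base $b = 10$ are choices made here, not part of the notes).

```python
from fractions import Fraction

def T(x, b):
    """The map T_b(x) = b*x mod 1, computed exactly on a Fraction in [0, 1)."""
    y = b * x
    return y - (y.numerator // y.denominator)  # subtract the integer part

def digits(x, b, n):
    """First n digits x_0, ..., x_{n-1} of the base-b expansion of x in [0, 1)."""
    out = []
    for _ in range(n):
        y = b * x
        d = y.numerator // y.denominator  # integer part of b*x is the next digit
        out.append(d)
        x = y - d                         # this is T_b(x)
    return out

x = Fraction(1, 7)               # 1/7 = 0.142857142857...
d_x = digits(x, 10, 8)           # digits of x
d_Tx = digits(T(x, 10), 10, 7)   # digits of T_10(x) = 3/7
print(d_x)
print(d_Tx)
```

In this example the digits of $T_{10}(x)$ are the digits of $x$ with the first one removed. Exact fractions are used deliberately: in floating-point arithmetic each application of $T_b$ discards leading digits of the stored value, so long floating-point orbits are not reliable.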
There is a close connection between the map $T_b$ and base $b$ expansions. Note that if $x \in [0, 1]$ has base $b$ expansion
\[ x = \sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} = \cdot x_0 x_1 x_2 \cdots \]
then
\[ T_b(x) = b \sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} \bmod 1 = x_0 + \sum_{j=1}^{\infty} \frac{x_j}{b^j} \bmod 1 = \sum_{j=0}^{\infty} \frac{x_{j+1}}{b^{j+1}} = \cdot x_1 x_2 x_3 \cdots. \]
Thus $T_b$ acts on base $b$ expansions by deleting the zeroth term and then shifting the remaining digits one place to the left.

This relationship between base $b$ expansions and the map $T_b$ can be used to prove the following result.

Proposition 11.2.1
Let $b \ge 2$. Then Lebesgue almost every real number in $[0, 1]$ is simply normal in base $b$.

Proof. Fix $k \in \{0, 1, \ldots, b-1\}$. Note that $x_0 = k$ if and only if $x \in [k/b, (k+1)/b)$. Hence $x_j = k$ if and only if $T_b^j(x) \in [k/b, (k+1)/b)$. Thus
\[ \frac{1}{n}\,\mathrm{card}\{0 \le j \le n-1 \mid x_j = k\} = \frac{1}{n}\sum_{j=0}^{n-1} \chi_{[k/b,(k+1)/b)}(T^j x). \tag{11.2.1} \]
By Birkhoff’s Ergodic Theorem, for Lebesgue almost every point $x$ the above expression converges to $\int \chi_{[k/b,(k+1)/b)}(x) \, dx = 1/b$. Let $X_b(k)$ denote the set of points $x \in [0, 1]$ for which (11.2.1) converges to $1/b$. Then $\mu(X_b(k)) = 1$ for each $k = 0, 1, \ldots, b-1$.
Let $X_b = \bigcap_{k=0}^{b-1} X_b(k)$. Then $\mu(X_b) = 1$. If $x \in X_b$ then the frequency with which digit $k$ occurs in the base $b$ expansion of $x$ is equal to $1/b$, i.e. $x$ is simply normal in base $b$. □

We can consider a more general notion of normality of numbers as follows. Take $x \in [0, 1]$ and write $x$ as a base $b$ expansion
\[ x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} \]
where $x_j \in \{0, 1, \ldots, b-1\}$. Fix a finite word of symbols $i_0, i_1, \ldots, i_{k-1}$ where $i_j \in \{0, 1, \ldots, b-1\}$, $j = 0, \ldots, k-1$. We can ask what is the frequency with which the block of symbols $i_0, i_1, \ldots, i_{k-1}$ occurs in the base $b$ expansion of $x$.

Note that $x$ has a base $b$ expansion that starts $i_0 i_1 \cdots i_{k-1}$ precisely when
\[ x \in \left[ \sum_{j=0}^{k-1} \frac{i_j}{b^{j+1}},\ \sum_{j=0}^{k-1} \frac{i_j}{b^{j+1}} + \frac{1}{b^k} \right). \]
Call this interval $I(i_0, \ldots, i_{k-1})$ and note that it has Lebesgue measure $1/b^k$.

Definition.
A number $x \in [0, 1]$ is said to be normal in base $b$ if, for each $k \ge 1$ and for each word $i_0, i_1, \ldots, i_{k-1}$ of length $k$, the frequency with which this word occurs in the base $b$ expansion of $x$ is equal to $1/b^k$.

Proposition 11.2.2
Let $b \ge 2$ be an integer. Lebesgue almost every real number in $[0, 1]$ is normal in base $b$.

Proof. Fix a word $i_0, i_1, \ldots, i_{k-1}$ of length $k$ and define the interval $I(i_0, \ldots, i_{k-1})$ as above. Then the word $i_0, i_1, \ldots, i_{k-1}$ occurs at the $j$th place in the base $b$ expansion of $x$ if and only if $T_b^j(x) \in I(i_0, \ldots, i_{k-1})$. Thus
\[ \frac{1}{n}\,\mathrm{card}\{0 \le j \le n-1 \mid i_0, i_1, \ldots, i_{k-1} \text{ occurs at the } j\text{th place in the base } b \text{ expansion of } x\} = \frac{1}{n}\sum_{j=0}^{n-1} \chi_{I(i_0,\ldots,i_{k-1})}(T^j x). \tag{11.2.2} \]
By Birkhoff’s Ergodic Theorem, for Lebesgue almost every point $x$ the above expression converges to $\int \chi_{I(i_0,\ldots,i_{k-1})}(x) \, dx = 1/b^k$. Let $X_b(i_0, i_1, \ldots, i_{k-1})$ denote the set of points $x \in [0, 1]$ for which (11.2.2) converges to $1/b^k$. Then $\mu(X_b(i_0, i_1, \ldots, i_{k-1})) = 1$ for each word $i_0, i_1, \ldots, i_{k-1}$ of length $k$.
Let
\[ X_b = \bigcap_{k=1}^{\infty} \ \bigcap_{i_0, i_1, \ldots, i_{k-1}} X_b(i_0, i_1, \ldots, i_{k-1}) \]
where the second intersection is taken over all words of length $k$. As this is a countable intersection, we have that $\mu(X_b) = 1$. If $x \in X_b$ then the frequency with which any word of length $k$ occurs in the base $b$ expansion of $x$ is equal to $1/b^k$, i.e. $x$ is normal in base $b$. □

We can then make the following definition.

Definition. A number $x \in [0, 1]$ is normal if it is normal in base $b$ for every base $b \ge 2$.

One can then prove the following result:

Proposition 11.2.3
Lebesgue almost every number $x \in [0, 1]$ is normal.

Remark. Although a ‘typical’ number is normal, there are no known examples of normal numbers!

§11.3 Continued fractions

We can use Birkhoff’s Ergodic Theorem to study the frequency with which a given digit occurs in the continued fraction expansion of real numbers.
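Digit frequencies of this kind are easy to explore numerically. The sketch below is only a heuristic experiment under assumptions that are not part of the notes: it uses a pseudo-random rational $p/q$ with a very large denominator as a stand-in for a Lebesgue-typical point, computes its (finite) continued fraction digits with the Euclidean algorithm, and compares the observed frequency of each digit $k$ with the value $\frac{1}{\log 2}\log\frac{(k+1)^2}{k(k+2)}$ that appears in the proposition below. The names `cf_digits` and `limiting_frequency` are chosen here for illustration.

```python
import math
import random
from collections import Counter

def cf_digits(p, q):
    """Continued fraction digits x_0, x_1, ... of p/q in (0, 1).

    If x = p/q then 1/x = q/p = a + r/p, so the first digit is a and the
    continued fraction map sends x to r/p.
    """
    out = []
    while p != 0:
        a, r = divmod(q, p)
        out.append(a)
        p, q = r, p
    return out

def limiting_frequency(k):
    """(1/log 2) * log((k+1)^2 / (k(k+2)))."""
    return math.log((k + 1) ** 2 / (k * (k + 2))) / math.log(2)

rng = random.Random(0)                          # fixed seed, for reproducibility
bits = 20000
q = rng.getrandbits(bits) | (1 << (bits - 1))   # denominator with exactly 20000 bits
p = rng.randrange(1, q)                         # numerator, so 0 < p/q < 1
cf = cf_digits(p, q)
counts = Counter(cf)
n_digits = len(cf)

for k in (1, 2, 3):
    print(k, counts[k] / n_digits, limiting_frequency(k))
```

This is evidence rather than proof: a rational number has only finitely many digits, so the experiment can suggest the limiting frequencies but cannot replace the almost-everywhere statement.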
Proposition 11.3.1
For Lebesgue-almost every $x \in [0, 1]$, the frequency with which the natural number $k$ occurs in the continued fraction expansion of $x$ is
\[ \frac{1}{\log 2} \log \frac{(k+1)^2}{k(k+2)}. \]

Proof. Let $\lambda$ denote Lebesgue measure and let $\mu$ denote Gauss’ measure. Recall that $\lambda$ and $\mu$ are equivalent, i.e. they have the same sets of measure zero. Then $\lambda$-a.e. and $\mu$-a.e. $x \in (0, 1)$ is irrational and has an infinite continued fraction expansion
\[ x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}. \]
Let $T$ denote the continued fraction map. Then
\[ T(x) = \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}} \]
so that
\[ \frac{1}{T(x)} = x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}. \]
Hence $x_1 = [1/T(x)]$, where $[x]$ denotes the integer part of $x$. More generally, $x_n = [1/T^n x]$.

Fix $k \in \mathbb{N}$. Note that $x$ has a continued fraction expansion starting with digit $k$ (i.e. $x_0 = k$) precisely when $[1/x] = k$. That is, $x_0 = k$ precisely when
\[ k \le \frac{1}{x} < k+1, \]
which is equivalent to requiring
\[ \frac{1}{k+1} < x \le \frac{1}{k}, \]
i.e. $x \in (1/(k+1), 1/k]$. Similarly $x_n = k$ precisely when $T^n x \in (1/(k+1), 1/k]$. Hence
\begin{align*}
\frac{1}{n}\,\mathrm{card}\{0 \le j \le n-1 \mid x_j = k\} &= \frac{1}{n}\sum_{j=0}^{n-1} \chi_{(1/(k+1),1/k]}(T^j x) \\
&\to \int \chi_{(1/(k+1),1/k]} \, d\mu && \text{for } \mu\text{-a.e. } x \\
&= \frac{1}{\log 2}\left( \log\left(1 + \frac{1}{k}\right) - \log\left(1 + \frac{1}{k+1}\right) \right) && \text{for } \mu\text{-a.e. } x \\
&= \frac{1}{\log 2} \log \frac{(k+1)^2}{k(k+2)} && \text{for } \mu\text{-a.e. } x.
\end{align*}
As $\mu$ and $\lambda$ have the same sets of measure zero, this holds for Lebesgue almost every point. □

We can also study the limiting arithmetic and geometric means of the digits in the continued fraction expansion of Lebesgue almost every point $x \in [0, 1]$.

Proposition 11.3.2
(i) For Lebesgue-almost every $x \in [0, 1]$, the limiting arithmetic mean of the digits in the continued fraction expansion of $x$ is infinite.
(ii) For Lebesgue-almost every $x \in [0, 1]$, the limiting geometric mean of the digits in the continued fraction expansion of $x$ is
\[ \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k^2 + 2k} \right)^{\log k / \log 2}. \]

Proof. Writing
\[ x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}, \]
the proposition claims that
\[ \lim_{n \to \infty} \frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty \tag{11.3.1} \]
for Lebesgue almost every point, and that
\[ \lim_{n \to \infty} (x_0 x_1 \cdots x_{n-1})^{1/n} = \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k^2 + 2k} \right)^{\log k / \log 2} \tag{11.3.2} \]
for Lebesgue almost every point.

We leave (11.3.1) as an exercise.

We prove (11.3.2). Define $f(x) = \log k$ for $x \in (1/(k+1), 1/k]$, so that $f(x) = \log k$ precisely when $x_0 = k$. Then $f(T^j x) = \log k$ precisely when $x_j = k$. By Exercise 3.5(iii), to show $f \in L^1(X, \mathcal{B}, \mu)$ it is sufficient to show that $f \in L^1(X, \mathcal{B}, \lambda)$. Note that
\[ \int f \, d\lambda = \sum_{k=1}^{\infty} \log k \ \lambda\left( \left( \frac{1}{k+1}, \frac{1}{k} \right] \right) = \sum_{k=1}^{\infty} \frac{\log k}{k(k+1)} \le \sum_{k=1}^{\infty} \frac{\log k}{k^2}, \]
which converges. Hence $f \in L^1(X, \mathcal{B}, \mu)$.

Now
\begin{align*}
\frac{1}{n}(\log x_0 + \log x_1 + \cdots + \log x_{n-1}) &= \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) \\
&\to \frac{1}{\log 2}\int_0^1 \frac{f(x)}{1+x} \, dx \\
&= \frac{1}{\log 2}\sum_{k=1}^{\infty} \int_{1/(k+1)}^{1/k} \frac{\log k}{1+x} \, dx \\
&= \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \log\left( 1 + \frac{1}{k^2 + 2k} \right),
\end{align*}
for Gauss-almost every point $x \in [0, 1]$. As Gauss’ measure and Lebesgue measure have the same sets of measure zero, this limit also exists for Lebesgue almost every point. □

Let $x \in (0, 1)$ be irrational and have continued fraction expansion $[x_0, x_1, \ldots]$. Then $[x_0, x_1, \ldots, x_{n-1}]$ is a rational number; write $[x_0, x_1, \ldots, x_{n-1}] = P_n/Q_n$, where $P_n, Q_n$ are co-prime integers. Then $P_n/Q_n$ is a ‘good’ rational approximation to $x$. We write $P_n(x), Q_n(x)$ if we wish to indicate the dependence on $x$. As $x$ and $P_n/Q_n$ lie in the same cylinder $I(x_0, \ldots, x_{n-1})$ of rank $n$, we must have that
\[ \left| x - \frac{P_n}{Q_n} \right| \le \mathrm{diam}\, I(x_0, \ldots, x_{n-1}) \le \frac{1}{Q_n^2}. \]
Thus we can quantify how good a rational approximation $P_n/Q_n$ is to $x$ by looking at the denominator $Q_n$. Thus understanding how $Q_n$ grows gives us information about $x$. For a typical point, $Q_n$ grows exponentially fast and we can determine the exponential growth rate.

Proposition 11.3.3
For Lebesgue almost every real number $x \in (0, 1)$ we have that
\[ \lim_{n \to \infty} \frac{1}{n} \log Q_n(x) = \frac{\pi^2}{12 \log 2}. \]

Remark.
Thus, for a typical point $x \in (0, 1)$, we have that $Q_n(x) \sim e^{n\pi^2/(12 \log 2)}$.

Proof (not examinable). Let $x \in (0, 1)$ be irrational and have continued fraction expansion $[x_0, x_1, \ldots]$. Write
\[ [x_0, x_1, \ldots, x_{n-1}] = \frac{P_n(x)}{Q_n(x)}. \]
Then
\[ \frac{P_n(x)}{Q_n(x)} = \frac{1}{x_0 + [x_1, \ldots, x_{n-1}]} = \cfrac{1}{x_0 + \cfrac{P_{n-1}(Tx)}{Q_{n-1}(Tx)}} = \frac{Q_{n-1}(Tx)}{P_{n-1}(Tx) + x_0 Q_{n-1}(Tx)}. \tag{11.3.3} \]
By Lemma 6.3.1(ii) and the Euclidean algorithm, we know that for all $n$ and all $x$, $P_n(x)$ and $Q_n(x)$ are coprime. As $P_{n-1}(Tx)$ and $Q_{n-1}(Tx)$ are coprime, it follows that $P_{n-1}(Tx) + x_0 Q_{n-1}(Tx)$ and $Q_{n-1}(Tx)$ are coprime. Hence, comparing the numerators in (11.3.3), we see that $P_n(x) = Q_{n-1}(Tx)$. Also note that $P_1(x) = 1$. Hence
\[ \frac{P_n(x)}{Q_n(x)} \cdot \frac{P_{n-1}(Tx)}{Q_{n-1}(Tx)} \cdots \frac{P_1(T^{n-1}x)}{Q_1(T^{n-1}x)} = \frac{P_1(T^{n-1}x)}{Q_n(x)} = \frac{1}{Q_n(x)}. \]
Taking the logarithm and dividing by $n$ gives that
\[ -\frac{1}{n} \log Q_n(x) = \frac{1}{n} \log \prod_{j=0}^{n-1} \frac{P_{n-j}(T^j x)}{Q_{n-j}(T^j x)} = \frac{1}{n}\sum_{j=0}^{n-1} \log \frac{P_{n-j}(T^j x)}{Q_{n-j}(T^j x)}. \tag{11.3.4} \]
This resembles an ergodic sum, except that the function $P_{n-j}/Q_{n-j}$ depends on $j$ and so we cannot immediately apply Birkhoff’s Ergodic Theorem. We will consider the ergodic sums $\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x)$ using the function $f(x) = \log x$ and show that the difference between these and (11.3.4) is small.

Let $f(x) = \log x$. Then we can write (11.3.4) as
\[ -\frac{1}{n} \log Q_n(x) = \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) - \frac{1}{n}\sum_{j=0}^{n-1} \left( \log T^j(x) - \log \frac{P_{n-j}(T^j x)}{Q_{n-j}(T^j x)} \right) = \frac{1}{n}\Sigma_n^{(1)} - \frac{1}{n}\Sigma_n^{(2)}. \]

We evaluate $\lim_{n \to \infty} \frac{1}{n}\Sigma_n^{(1)}$. By Birkhoff’s Ergodic Theorem and the fact that Gauss’ measure $\mu$ and Lebesgue measure are equivalent, it follows that for Lebesgue almost every $x \in [0, 1]$ we have that
\[ \lim_{n \to \infty} \frac{1}{n}\Sigma_n^{(1)} = \frac{1}{\log 2}\int_0^1 \frac{f(x)}{1+x} \, dx = \frac{1}{\log 2}\int_0^1 \frac{\log x}{1+x} \, dx. \]
Integrating by parts we have that
\[ \int_0^1 \frac{\log x}{1+x} \, dx = \Big[ \log x \log(1+x) \Big]_0^1 - \int_0^1 \frac{\log(1+x)}{x} \, dx = -\int_0^1 \frac{\log(1+x)}{x} \, dx. \]
The Taylor series expansion of $\log(1+x)$ about zero is
\[ \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots = \sum_{k=1}^{\infty} \frac{(-1)^{k-1} x^k}{k} \]
so that
\[ \frac{\log(1+x)}{x} = \sum_{k=0}^{\infty} \frac{(-1)^k x^k}{k+1}. \]
Hence for almost every $x$,
\[ \lim_{n \to \infty} \frac{1}{n}\Sigma_n^{(1)} = -\frac{1}{\log 2}\sum_{k=0}^{\infty} \frac{(-1)^k}{k+1} \int_0^1 x^k \, dx = -\frac{1}{\log 2}\sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)^2}. \]
Note that
\[ \sum_{k=0}^{\infty} \frac{(-1)^k}{(k+1)^2} = \sum_{n=1}^{\infty} \frac{1}{n^2} - 2\sum_{n=1}^{\infty} \frac{1}{(2n)^2} = \sum_{n=1}^{\infty} \frac{1}{n^2} - \frac{1}{2}\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{1}{2}\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{12}, \]
using the well-known fact that $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$. Hence for almost every $x$,
\[ \lim_{n \to \infty} \frac{1}{n}\Sigma_n^{(1)} = -\frac{\pi^2}{12 \log 2}. \]

It remains to show that $\frac{1}{n}\Sigma_n^{(2)} \to 0$ as $n \to \infty$. Recall that in §6.3 we introduced the cylinder set $I(x_0, x_1, \ldots, x_{n-1})$ of rank $n$, denoting the set of points $x$ with continued fraction expansion that starts $x_0, \ldots, x_{n-1}$. We proved in §6.3 that $I(x_0, x_1, \ldots, x_{n-1})$ is an interval with length at most $1/Q_n(x)^2$. Note that both $x$ and $P_n(x)/Q_n(x)$ lie in the same interval of rank $n$. Hence
\[ \left| \frac{x}{P_n/Q_n} - 1 \right| = \frac{Q_n}{P_n} \left| x - \frac{P_n}{Q_n} \right| \le \frac{Q_n}{P_n} \cdot \frac{1}{Q_n^2} = \frac{1}{P_n Q_n}. \]
It follows from Lemma 6.3.1(i) that $P_n \ge 2^{(n-2)/2}$ and $Q_n \ge 2^{(n-1)/2}$. Hence
\[ \left| \frac{x}{P_n/Q_n} - 1 \right| \le \frac{1}{2^{n-3/2}} \le \frac{1}{2^{n-2}}. \]
By the triangle inequality and the fact that $\log y \le y - 1$ we have that
\begin{align*}
|\Sigma_n^{(2)}| &\le \sum_{j=0}^{n-1} \left| \log \frac{T^j(x)}{P_{n-j}(T^j(x))/Q_{n-j}(T^j(x))} \right| \\
&\le \sum_{j=0}^{n-1} \left| \frac{T^j(x)}{P_{n-j}(T^j(x))/Q_{n-j}(T^j(x))} - 1 \right| \\
&\le \sum_{j=0}^{n-1} \frac{1}{2^{n-j-2}}.
\end{align*}
Note that
\[ \sum_{j=0}^{n-1} \frac{1}{2^{n-j-2}} = 2\sum_{j=0}^{n-1} \frac{1}{2^j} \le 2\sum_{j=0}^{\infty} \frac{1}{2^j} = 4. \]
Hence $|\Sigma_n^{(2)}| \le 4$ for all $n$. Hence
\[ \lim_{n \to \infty} \frac{1}{n}\Sigma_n^{(2)} = 0 \]
and the result follows. □

§11.4 Exercises

Exercise 11.1
Let $b \ge 2$ be an integer. Prove that Lebesgue measure is an ergodic invariant measure for $T_b(x) = bx \bmod 1$ defined on the unit interval.

Exercise 11.2
(i) A number $x \in [0, 1]$ is said to be simply normal if it is simply normal in base $b$ for all $b \ge 2$. Prove that Lebesgue a.e. number $x \in [0, 1]$ is simply normal.
(ii) Prove Proposition 11.2.3.

Exercise 11.3
Let $r \ge 2$ be an integer.
Prove that for Lebesgue almost every $x \in [0, 1]$, the sequence $x_n = r^n x$ is uniformly distributed mod 1.

Exercise 11.4
Prove that the arithmetic mean of the digits appearing in the base 10 expansion of Lebesgue-a.e. $x \in [0, 1)$ is equal to 4.5, i.e. prove that if $x = \sum_{j=0}^{\infty} x_j/10^{j+1}$, $x_j \in \{0, 1, \ldots, 9\}$, then
\[ \lim_{n \to \infty} \frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = 4.5 \quad \text{a.e.} \]

Exercise 11.5
Let $x \in (0, 1)$ have continued fraction expansion $x = [x_0, x_1, x_2, \ldots]$. Prove that
\[ \lim_{n \to \infty} \frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty \]
for Lebesgue almost every $x \in [0, 1]$. (Hint: use Exercise 10.2.)

12. Solutions to the Exercises

Solution 1.1
Suppose that $x_n \in \mathbb{R}$ is uniformly distributed mod 1. Let $x \in [0, 1]$ and let $\varepsilon > 0$. We want to show that there exists $n$ such that $\{x_n\} \in (x - \varepsilon, x + \varepsilon) \cap [0, 1]$ (as usual, $\{x_n\}$ denotes the fractional part of $x_n$). By the definition of uniform distribution mod 1 we have that
\[ \lim_{n \to \infty} \frac{1}{n}\,\mathrm{card}\{ j \mid 0 \le j \le n-1,\ \{x_j\} \in (x - \varepsilon, x + \varepsilon) \} = 2\varepsilon. \]
Then there exists $n_0$ such that if $n \ge n_0$ then
\[ \frac{1}{n}\,\mathrm{card}\{ j \mid 0 \le j \le n-1,\ \{x_j\} \in (x - \varepsilon, x + \varepsilon) \} > \varepsilon > 0. \]
Hence $\mathrm{card}\{ j \mid 0 \le j \le n-1,\ \{x_j\} \in (x - \varepsilon, x + \varepsilon) \} > 0$ for some $n$, so there exists $j$ such that $\{x_j\} \in (x - \varepsilon, x + \varepsilon)$.

Solution 1.2
We use Weyl’s Criterion. Let $\ell \in \mathbb{Z} \setminus \{0\}$. Then
\[ \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell(\alpha j + \beta)} = e^{2\pi i \ell \beta}\, \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell \alpha j} = e^{2\pi i \ell \beta}\, \frac{1}{n}\, \frac{e^{2\pi i \ell \alpha n} - 1}{e^{2\pi i \ell \alpha} - 1}, \]
summing the geometric progression. As $\alpha \notin \mathbb{Q}$, we have that $e^{2\pi i \ell \alpha} \ne 1$ for any $\ell \in \mathbb{Z} \setminus \{0\}$. Hence
\[ \left| \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| \le \frac{1}{n}\, \frac{|e^{2\pi i \ell \alpha n} - 1|}{|e^{2\pi i \ell \alpha} - 1|} \le \frac{1}{n}\, \frac{2}{|e^{2\pi i \ell \alpha} - 1|} \to 0 \]
as $n \to \infty$, as $|e^{2\pi i \ell \beta}| = 1$. Hence $x_n = \alpha n + \beta$ is uniformly distributed.

Solution 1.3
(i) If $\log_{10} 2 = p/q$ with $p, q$ integers, $\mathrm{hcf}(p, q) = 1$, then $2 = 10^{p/q}$, i.e. $2^q = 10^p = 5^p 2^p$. Comparing indices, we see that $0 = p = q$, a contradiction.
(ii) Let $2^n$ have leading digit $r$.
Then
\[ 2^n = r \cdot 10^\ell + \text{terms involving lower powers of 10} \]
where the terms involving lower powers of 10 are integers lying in $[0, 10^\ell)$. Hence
\begin{align*}
2^n \text{ has leading digit } r &\Leftrightarrow r \cdot 10^\ell \le 2^n < (r+1) \cdot 10^\ell \\
&\Leftrightarrow \log_{10} r + \ell \le n \log_{10} 2 < \log_{10}(r+1) + \ell \\
&\Leftrightarrow \log_{10} r \le \{n \log_{10} 2\} < \log_{10}(r+1).
\end{align*}
Hence
\[ \frac{1}{n}\,\mathrm{card}\{k \mid 0 \le k \le n-1,\ 2^k \text{ has leading digit } r\} = \frac{1}{n}\,\mathrm{card}\{k \mid 0 \le k \le n-1,\ \{k \log_{10} 2\} \in [\log_{10} r, \log_{10}(r+1))\} \]
which, by uniform distribution, converges to $\log_{10}(r+1) - \log_{10} r = \log_{10}(1 + 1/r)$ as $n \to \infty$.

Solution 1.4
The frequency with which the penultimate leading digit of $2^n$ is $r$ is given by $\sum_{q=1}^{9} A(q, r)$, where $A(q, r)$ is the frequency with which the leading digit is $q$ and the penultimate leading digit is $r$. Now $2^n$ has leading digit $q$ and penultimate digit $r$ precisely when
\[ q \cdot 10^\ell + r \cdot 10^{\ell-1} \le 2^n < q \cdot 10^\ell + (r+1) \cdot 10^{\ell-1}. \]
Taking logs shows that $2^n$ has leading digit $q$ and penultimate leading digit $r$ when
\[ \log_{10}(10q + r) + \ell - 1 \le n \log_{10} 2 < \log_{10}(10q + r + 1) + (\ell - 1). \]
Reducing this mod 1 gives
\[ \log_{10}(10q + r) - 1 \le \{n \log_{10} 2\} < \log_{10}(10q + r + 1) - 1 \]
(the $-1$s appear because $1 < \log_{10}(10q + r), \log_{10}(10q + r + 1) < 2$). As $\{n \log_{10} 2\}$ is uniformly distributed mod 1, we see that
\[ A(q, r) = (\log_{10}(10q + r + 1) - 1) - (\log_{10}(10q + r) - 1) = \log_{10}\left( 1 + \frac{1}{10q + r} \right). \]
Hence the frequency with which the penultimate leading digit of $2^n$ is $r$ is
\[ \sum_{q=1}^{9} \log_{10}\left( 1 + \frac{1}{10q + r} \right) = \log_{10} \prod_{q=1}^{9} \left( 1 + \frac{1}{10q + r} \right). \]

Solution 2.1
Suppose first that the numbers $\alpha_1, \ldots, \alpha_k, 1$ are rationally independent. This means that if $r_1, \ldots, r_k, r$ are rational numbers such that
\[ r_1\alpha_1 + \cdots + r_k\alpha_k + r = 0, \]
then $r_1 = \cdots = r_k = r = 0$. In particular, for $\ell = (\ell_1, \ldots, \ell_k) \in \mathbb{Z}^k \setminus \{0\}$,
\[ \ell_1\alpha_1 + \cdots + \ell_k\alpha_k \notin \mathbb{Z}, \]
so that
\[ e^{2\pi i(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} \ne 1. \]
By summing the geometric progression we have that
\[ \left| \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i(\ell_1 j\alpha_1 + \cdots + \ell_k j\alpha_k)} \right| = \frac{1}{n} \left| \frac{e^{2\pi i n(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} - 1}{e^{2\pi i(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} - 1} \right| \le \frac{1}{n}\, \frac{2}{|e^{2\pi i(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} - 1|} \to 0, \]
as $n \to \infty$. Therefore, by Weyl’s Criterion, $(n\alpha_1, \ldots, n\alpha_k)$ is uniformly distributed mod 1.

Now suppose that the numbers $\alpha_1, \ldots, \alpha_k, 1$ are rationally dependent. Thus there exist rationals $r_1, \ldots, r_k, r$ (not all zero) such that $r_1\alpha_1 + \cdots + r_k\alpha_k + r = 0$. By multiplying by a common denominator we can find $\ell = (\ell_1, \ldots, \ell_k) \in \mathbb{Z}^k \setminus \{0\}$ such that
\[ \ell_1\alpha_1 + \cdots + \ell_k\alpha_k \in \mathbb{Z}. \]
Thus $e^{2\pi i(\ell_1 n\alpha_1 + \cdots + \ell_k n\alpha_k)} = 1$ for all $n \in \mathbb{N}$ and so
\[ \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i(\ell_1 j\alpha_1 + \cdots + \ell_k j\alpha_k)} = 1 \not\to 0, \]
as $n \to \infty$. Therefore, $(n\alpha_1, \ldots, n\alpha_k)$ is not uniformly distributed mod 1.

Solution 2.2
Let $p(n) = \alpha_k n^k + \cdots + \alpha_1 n + \alpha_0$. Suppose that $\alpha_k, \ldots, \alpha_{s+1} \in \mathbb{Q}$ but $\alpha_s \notin \mathbb{Q}$. Let
\begin{align*}
p_1(n) &= \alpha_k n^k + \cdots + \alpha_{s+1} n^{s+1}, \\
p_2(n) &= \alpha_s n^s + \cdots + \alpha_1 n + \alpha_0,
\end{align*}
so that $p(n) = p_1(n) + p_2(n)$. By choosing $q$ to be a common denominator for $\alpha_k, \ldots, \alpha_{s+1}$, we can write
\[ p_1(n) = \frac{1}{q}(m_k n^k + \cdots + m_{s+1} n^{s+1}) \]
where $m_j \in \mathbb{Z}$.

By Weyl’s Criterion, we want to show that for $\ell \in \mathbb{Z} \setminus \{0\}$ we have
\[ \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell p(j)} \to 0 \]
as $n \to \infty$. Write $j = qm + r$ where $r = 0, \ldots, q-1$. Then
\[ p_1(qm + r) = d_r \bmod 1 \]
for some $d_r \in \mathbb{Q}$. Moreover, $p_2^{(q,r)}(m) = p_2(qm + r)$ is a polynomial in $m$ with irrational leading coefficient. Now
\begin{align*}
\lim_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell p(j)} &= \lim_{n \to \infty} \frac{1}{n}\sum_{m=0}^{[n/q]-1} \sum_{r=0}^{q-1} e^{2\pi i \ell d_r} e^{2\pi i \ell p_2^{(q,r)}(m)} \\
&= \lim_{n \to \infty} \frac{[n/q]}{n} \sum_{r=0}^{q-1} e^{2\pi i \ell d_r}\, \frac{1}{[n/q]} \sum_{m=0}^{[n/q]-1} e^{2\pi i \ell p_2^{(q,r)}(m)} \\
&= 0
\end{align*}
as $p_2^{(q,r)}(m)$ is uniformly distributed mod 1.

Solution 2.3
Let $p(n) = \alpha n^2 + n + 1$ where $\alpha \notin \mathbb{Q}$. Let $m \ge 1$ and consider the sequence $p^{(m)}(n) = p(n+m) - p(n)$ of $m$th differences. We have that
\[ p^{(m)}(n) = \alpha(n+m)^2 + (n+m) + 1 - \alpha n^2 - n - 1 = 2\alpha mn + \alpha m^2 + m, \]
which is a degree 1 polynomial in $n$ with leading coefficient $2\alpha m \notin \mathbb{Q}$.
Note that $2\alpha m \ne 0$, as $m \ge 1$. By Exercise 1.2 we have that $p^{(m)}(n)$ is uniformly distributed mod 1 for every $m \ge 1$. By Lemma 2.3.3, it follows that $p(n)$ is uniformly distributed mod 1.

Solution 2.4
By Weyl’s Criterion, we require that for each $(\ell_1, \ell_2) \in \mathbb{Z}^2 \setminus \{(0, 0)\}$ we have
\[ \frac{1}{n}\sum_{k=0}^{n-1} \exp 2\pi i(\ell_1 p(k) + \ell_2 q(k)) \to 0 \tag{12.0.1} \]
as $n \to \infty$. Let
\[ p_{\ell_1,\ell_2}(n) = \ell_1 p(n) + \ell_2 q(n) = (\ell_1\alpha_k + \ell_2\beta_k) n^k + \cdots + (\ell_1\alpha_1 + \ell_2\beta_1) n + (\ell_1\alpha_0 + \ell_2\beta_0). \]
This is a polynomial of degree at most $k$. Then (12.0.1) can be written as
\[ \frac{1}{n}\sum_{k=0}^{n-1} \exp 2\pi i\, p_{\ell_1,\ell_2}(k). \]
By the 1-dimensional version of Weyl’s Criterion (using the integer $\ell = 1$), this will converge to 0 as $n \to \infty$ if $p_{\ell_1,\ell_2}(n)$ is uniformly distributed mod 1. By Weyl’s Theorem on Polynomials (Theorem 2.3.1), this happens if at least one of $\ell_1\alpha_k + \ell_2\beta_k$, $\ell_1\alpha_{k-1} + \ell_2\beta_{k-1}$, $\ldots$, $\ell_1\alpha_1 + \ell_2\beta_1$ is irrational. Note that $\ell_1\alpha_i + \ell_2\beta_i \notin \mathbb{Q}$ for every $(\ell_1, \ell_2) \in \mathbb{Z}^2 \setminus \{(0, 0)\}$ if and only if $\alpha_i, \beta_i, 1$ are rationally independent.

Solution 2.5
(i) We know that $\emptyset \in \mathcal{B}$ and that if $E \in \mathcal{B}$ then $X \setminus E \in \mathcal{B}$. Hence $X = X \setminus \emptyset \in \mathcal{B}$.
(ii) Let $E_n \in \mathcal{B}$. Then $X \setminus E_n \in \mathcal{B}$. Then $\bigcup_n (X \setminus E_n) \in \mathcal{B}$. Now $\bigcup_n (X \setminus E_n) = X \setminus \bigcap_n E_n$. Hence $\bigcap_n E_n = X \setminus (X \setminus \bigcap_n E_n) \in \mathcal{B}$.

Solution 2.6
The smallest σ-algebra containing the sets $[0, 1/4)$, $[1/4, 1/2)$, $[1/2, 3/4)$ and $[3/4, 1]$ is
\begin{align*}
\mathcal{B} = \{ & \emptyset,\ [0, 1/4),\ [1/4, 1/2),\ [1/2, 3/4),\ [3/4, 1], \\
& [0, 1/2),\ [0, 1/4) \cup [1/2, 3/4),\ [0, 1/4) \cup [3/4, 1], \\
& [1/4, 3/4),\ [1/4, 1/2) \cup [3/4, 1],\ [1/2, 1], \\
& [0, 3/4),\ [0, 1/2) \cup [3/4, 1],\ [0, 1/4) \cup [1/2, 1],\ [1/4, 1],\ [0, 1] \}.
\end{align*}

Solution 2.7
Clearly a finite union of dyadic intervals is a Borel set. By Proposition 2.4.2 we need to show that if $x, y \in [0, 1]$, $x \ne y$, then there exist disjoint dyadic intervals $I_1, I_2$ such that $x \in I_1$, $y \in I_2$.
Let $\varepsilon = |x - y|$ and choose $n$ such that $1/2^n < \varepsilon/2$. Without loss of generality, assume that $x < y$. Then there exist integers $p, q$, $p < q$, such that
\[ \frac{p-1}{2^n} \le x < \frac{p}{2^n} < \frac{q}{2^n} < y \le \frac{q+1}{2^n}. \]
Hence $x, y$ belong to different dyadic intervals.
Solution 2.8
Let $\mathcal{A}$ denote the collection of finite unions of intervals. Trivially $\emptyset \in \mathcal{A}$. If $A, B \in \mathcal{A}$ are finite unions of intervals then $A \cup B$ is a finite union of intervals. Hence $\mathcal{A}$ is closed under taking finite unions. If $A = [a,b] \subset [0,1]$ then $A^c = [0,a) \cup (b,1]$ is a finite union of intervals. Hence $\mathcal{A}$ is an algebra.

Solution 2.9
First note that if $\mu$ is a measure and $A \subset B$ then $\mu(A) \leq \mu(B)$. (To see this, note that if $A \subset B$ then $B = A \cup (B \setminus A)$ is a disjoint union. Hence $\mu(B) = \mu(A \cup (B \setminus A)) = \mu(A) + \mu(B \setminus A) \geq \mu(A)$.)

Let $\mu$ denote Lebesgue measure on $[0,1]$. Let $x \in [0,1]$. For any $\varepsilon > 0$, we have that $\{x\} \subset (x - \varepsilon, x + \varepsilon) \cap [0,1]$. Hence $\mu(\{x\}) \leq 2\varepsilon$. As $\varepsilon > 0$ is arbitrary, it follows that $\mu(\{x\}) = 0$.

Let $E = \{x_j\}_{j=1}^{\infty}$ be a countable set. Then
\[
\mu(E) = \mu\left( \bigcup_{j=1}^{\infty} \{x_j\} \right) = \sum_{j=1}^{\infty} \mu(\{x_j\}) = 0.
\]
Hence any countable set has Lebesgue measure 0. As the rational points in $[0,1]$ are countable, it follows that $\mu(\mathbb{Q} \cap [0,1]) = 0$. Hence Lebesgue almost every point in $[0,1]$ is irrational.

Solution 2.10
Let $\mu = \delta_{1/2}$ be the Dirac $\delta$-measure at $1/2$. Then, by definition, $\mu([0,1/2) \cup (1/2,1]) = 0$ as $1/2 \notin [0,1/2) \cup (1/2,1]$. Hence $\mu\{x \in [0,1] \mid x \neq 1/2\} = 0$, so that $\mu$-a.e. point in $[0,1]$ is equal to $1/2$.

Solution 3.1
Let $x_n = \alpha n$ where $\alpha \in \mathbb{R}$ is irrational. Then $x_n$ is uniformly distributed mod 1 (by the results in §1.2.1). Let $A = \{\{\alpha n\} \mid n \geq 0\} \subset [0,1]$ denote the set of fractional parts of the sequence $x_n$; note that $A$ is a countable set. Let $f = \chi_A$. Then $f \in L^1([0,1], \mathcal{B}, \mu)$ (where $\mathcal{B}$ denotes the Borel $\sigma$-algebra and $\mu$ denotes Lebesgue measure on $[0,1]$) and $f \equiv 0$ a.e. Hence $\int f \, d\mu = 0$. However, $f(\{x_n\}) = 1$ for each $n$. Hence
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = 1 \not\to \int f \, d\mu = 0
\]
as $n \to \infty$.

Solution 3.2
Let $X$ be a compact metric space equipped with the Borel $\sigma$-algebra $\mathcal{B}$. Let $T : X \to X$ be continuous. Recall that $\mathcal{B}$ is generated by the open sets. It is sufficient to check that $T^{-1}U \in \mathcal{B}$ for all open sets $U$.
But this is clear: as $T$ is continuous, the pre-image $T^{-1}U$ of any open set is open, hence $T^{-1}U \in \mathcal{B}$.

Solution 3.3
Define $f_n : [0,1] \to \mathbb{R}$ by
\[
f_n(x) = \begin{cases} n - n^2 x & \text{if } 0 \leq x \leq 1/n, \\ 0 & \text{if } 1/n \leq x \leq 1. \end{cases}
\]
(Draw a picture!) Then $f_n$ is continuous, hence $f_n \in L^1(X, \mathcal{B}, \mu)$. Moreover, $\int f_n \, d\mu = 1/2$ for each $n$. Hence $f_n \not\to 0$ in $L^1(X, \mathcal{B}, \mu)$. However, $f_n \to 0$ $\mu$-a.e. To see this, let $x \in [0,1]$, $x \neq 0$. Choose $N$ such that $1/N < x$. Then $f_n(x) = 0$ for any $n \geq N$. Hence, if $x \neq 0$, we have that $f_n(x) = 0$ for all sufficiently large $n$. Hence $f_n \to 0$ $\mu$-a.e.

Solution 3.4
First note that if $B \in \mathcal{B}$ then $T^{-1}B \in \mathcal{B}$. Hence $T_*\mu(B) = \mu(T^{-1}B)$ is well-defined. Clearly $T^{-1}(\emptyset) = \emptyset$. Hence $T_*\mu(\emptyset) = \mu(T^{-1}\emptyset) = \mu(\emptyset) = 0$.

Let $E_n \in \mathcal{B}$ be pairwise disjoint. Then $T^{-1}E_n \in \mathcal{B}$ are pairwise disjoint. (To see this, suppose that $x \in T^{-1}E_n \cap T^{-1}E_m$. Then $T(x) \in E_n$ and $T(x) \in E_m$. Hence $T(x) \in E_n \cap E_m$. As the $E_n$ are pairwise disjoint, this implies that $n = m$. Hence $T^{-1}E_n = T^{-1}E_m$.) Hence
\[
T_*\mu\left( \bigcup_{n=1}^{\infty} E_n \right) = \mu\left( T^{-1} \bigcup_{n=1}^{\infty} E_n \right) = \mu\left( \bigcup_{n=1}^{\infty} T^{-1}E_n \right) = \sum_{n=1}^{\infty} \mu(T^{-1}E_n) = \sum_{n=1}^{\infty} T_*\mu(E_n)
\]
where we have used the fact that $T^{-1} \bigcup_{n=1}^{\infty} E_n = \bigcup_{n=1}^{\infty} T^{-1}E_n$. Hence $T_*\mu$ is a measure.

Finally, note that $T^{-1}(X) = X$. Hence $T_*\mu(X) = \mu(T^{-1}X) = \mu(X) = 1$, so that $T_*\mu$ is a probability measure.

Solution 3.5
(i) Let $\lambda$ denote Lebesgue measure on $[0,1]$. All one needs to do is to find a set $B$ such that $\lambda(B) \neq \lambda(T^{-1}B)$, and any (reasonable) choice of set $B$ will work. For example, take $B = (1/2, 1)$. Then
\[
T^{-1}(B) = \bigcup_{n=1}^{\infty} \left( \frac{1}{n+1}, \frac{1}{n + 1/2} \right).
\]
It follows that
\[
\lambda(T^{-1}B) = \sum_{n=1}^{\infty} \frac{1}{(1+2n)(1+n)} = \log(4) - 1 < \frac{1}{2} = \lambda(B).
\]

(ii) Recall that
\[
\mu(B) = \frac{1}{\log 2} \int_B \frac{dx}{1+x} = \frac{1}{\log 2} \int \frac{\chi_B(x)}{1+x} \, dx.
\]
Note that $1/2 \leq 1/(1+x) \leq 1$ if $0 \leq x \leq 1$. Hence
\[
\frac{1}{2\log 2} \int \chi_B(x) \, dx \leq \mu(B) \leq \frac{1}{\log 2} \int \chi_B(x) \, dx
\]
so that
\[
\frac{1}{2\log 2} \lambda(B) = \frac{1}{2\log 2} \int \chi_B(x) \, dx \leq \mu(B) \leq \frac{1}{\log 2} \int \chi_B(x) \, dx = \frac{1}{\log 2} \lambda(B).
\]
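(iii) From (3.4.1) it follows that
\[
\frac{1}{2\log 2} \int f \, d\lambda \leq \int f \, d\mu \leq \frac{1}{\log 2} \int f \, d\lambda \tag{12.0.2}
\]
for all non-negative simple functions $f$. By taking increasing sequences of simple functions, we see that (12.0.2) continues to hold for non-negative measurable functions. Let $f \in L^1(X, \mathcal{B}, \mu)$. Then
\[
\frac{1}{2\log 2} \int |f| \, d\lambda \leq \int |f| \, d\mu
\]
so that $f \in L^1(X, \mathcal{B}, \lambda)$. Similarly, if $f \in L^1(X, \mathcal{B}, \lambda)$ then $f \in L^1(X, \mathcal{B}, \mu)$.

Solution 3.6
Let $[a,b] \subset [0,1]$. Then
\[
T^{-1}[a,b] = \bigcup_{j=0}^{k-1} \left[ \frac{a+j}{k}, \frac{b+j}{k} \right]
\]
so that
\[
T_*\mu([a,b]) = \sum_{j=0}^{k-1} \left( \frac{b+j}{k} - \frac{a+j}{k} \right) = \sum_{j=0}^{k-1} \frac{b-a}{k} = b - a = \mu([a,b]).
\]
Hence $T_*\mu$ and $\mu$ agree on intervals. Hence, by the Hahn-Kolmogorov Extension Theorem, $T_*\mu = \mu$ so that $\mu$ is a $T$-invariant measure.

Solution 3.7
Let $T(x) = \beta x \bmod 1$. Then $T$ has a graph as illustrated in Figure 12.1.

[Figure 12.1: The graph of $T(x) = \beta x \bmod 1$. Both axes run from 0 to 1, with $1/\beta$ marked on each.]

Let us first show that $T$ does not preserve Lebesgue measure. For this, it is sufficient to find a set $B$ such that $B$ and $T^{-1}B$ do not have equal Lebesgue measure; in fact, almost any reasonable choice of $B$ will suffice, but here is a specific example. Let $\lambda$ denote Lebesgue measure. Take $B = [1/\beta, 1]$. Then $\lambda(B) = 1 - 1/\beta = 1/\beta^2$ (as $\beta - 1 = 1/\beta$). Now $T^{-1}[1/\beta, 1] = [1/\beta^2, 1/\beta]$ so that $\lambda(T^{-1}B) = 1/\beta - 1/\beta^2 = 1/\beta^3 \neq \lambda(B)$.

We now show that $T$ does preserve the measure $\mu$ defined as in the statement of the question. To do this we again use the Hahn-Kolmogorov Extension Theorem, which tells us that it is sufficient to prove that $\mu(T^{-1}[a,b]) = \mu[a,b]$ for all intervals $[a,b] \subset [0,1]$.

If $[a,b] \subset [0, 1/\beta]$ then
\[
T^{-1}[a,b] = \left[ \frac{a}{\beta}, \frac{b}{\beta} \right] \cup \left[ \frac{a+1}{\beta}, \frac{b+1}{\beta} \right],
\]
a disjoint union. Hence,
\[
\begin{aligned}
\mu(T^{-1}[a,b]) &= \frac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}} \cdot \frac{b-a}{\beta} + \frac{1}{\beta\left( \frac{1}{\beta} + \frac{1}{\beta^3} \right)} \cdot \frac{(b+1) - (a+1)}{\beta} \\
&= \frac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}} \left( \frac{1}{\beta} + \frac{1}{\beta^2} \right)(b - a) \\
&= \mu([a,b]),
\end{aligned}
\]
where the last step uses $1/\beta + 1/\beta^2 = 1$.

If $[a,b] \subset [1/\beta, 1]$ then $T^{-1}[a,b] = [a/\beta, b/\beta]$ and
\[
\mu(T^{-1}[a,b]) = \mu\left( \left[ \frac{a}{\beta}, \frac{b}{\beta} \right] \right) = \frac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}} \cdot \frac{b-a}{\beta} = \mu([a,b]).
\]

If $a < 1/\beta < b$ then we write $[a,b] = [a, 1/\beta] \cup [1/\beta, b]$.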
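Then $T^{-1}[a,b] = T^{-1}[a, 1/\beta] \cup T^{-1}[1/\beta, b]$, a disjoint union. Hence
\[
\begin{aligned}
\mu(T^{-1}[a,b]) &= \mu(T^{-1}[a, 1/\beta] \cup T^{-1}[1/\beta, b]) \\
&= \mu(T^{-1}[a, 1/\beta]) + \mu(T^{-1}[1/\beta, b]) \\
&= \mu([a, 1/\beta]) + \mu([1/\beta, b]) = \mu([a,b]).
\end{aligned}
\]

Solution 3.8
(i) Note that
\[
\mu([0,1]) = \frac{1}{\pi} \int_0^1 \frac{1}{\sqrt{x(1-x)}} \, dx = \frac{1}{\pi} \int_0^{\pi/2} 2 \, d\theta = 1,
\]
using the substitution $x = \sin^2\theta$ (so that $dx = 2\sin\theta\cos\theta \, d\theta$ and $\sqrt{x(1-x)} = \sin\theta\cos\theta$).

(ii) By the Hahn-Kolmogorov Extension Theorem it is sufficient to prove that $\mu([a,b]) = \mu(T^{-1}[a,b])$ for all intervals $[a,b]$. Note that
\[
T^{-1}[a,b] = \left[ \frac{1 - \sqrt{1-a}}{2}, \frac{1 - \sqrt{1-b}}{2} \right] \cup \left[ \frac{1 + \sqrt{1-b}}{2}, \frac{1 + \sqrt{1-a}}{2} \right]
\]
(as the graph of $T$ is decreasing on $[1/2, 1]$ the order of $a, b$ is reversed in the second sub-interval). It is sufficient to prove that
\[
\mu\left( \left[ \tfrac{1 - \sqrt{1-a}}{2}, \tfrac{1 - \sqrt{1-b}}{2} \right] \right) = \frac{1}{2}\mu([a,b])
\quad\text{and}\quad
\mu\left( \left[ \tfrac{1 + \sqrt{1-b}}{2}, \tfrac{1 + \sqrt{1-a}}{2} \right] \right) = \frac{1}{2}\mu([a,b]).
\]
We prove the first equality (the second is similar). Now
\[
\mu\left( \left[ \tfrac{1 - \sqrt{1-a}}{2}, \tfrac{1 - \sqrt{1-b}}{2} \right] \right) = \frac{1}{\pi} \int_{\frac{1 - \sqrt{1-a}}{2}}^{\frac{1 - \sqrt{1-b}}{2}} \frac{1}{\sqrt{x(1-x)}} \, dx. \tag{12.0.3}
\]
Consider the substitution $u = 4x(1-x)$. Then $du = 4(1 - 2x)\,dx$ and as $x$ ranges between $(1 - \sqrt{1-a})/2$ and $(1 - \sqrt{1-b})/2$, $u$ ranges between $a$ and $b$. Note also that a simple manipulation shows that $(1 - 2x)^2 = 1 - u$. Hence the right-hand side of (12.0.3) is equal to
\[
\frac{1}{2\pi} \int_a^b \frac{1}{\sqrt{u(1-u)}} \, du = \frac{1}{2}\mu([a,b]).
\]
Similarly,
\[
\mu\left( \left[ \tfrac{1 + \sqrt{1-b}}{2}, \tfrac{1 + \sqrt{1-a}}{2} \right] \right) = \frac{1}{2}\mu([a,b])
\]
and the result follows.

Solution 3.9
Note that
\[
X = \bigcup_{n=1}^{\infty} \left( \frac{1}{n+1}, \frac{1}{n} \right] \cup \{0\}
\]
and that this is a disjoint union. Hence, denoting Lebesgue measure by $\mu$,
\[
1 = \mu(X) = \sum_{n=1}^{\infty} \mu\left( \left( \frac{1}{n+1}, \frac{1}{n} \right] \right) = \sum_{n=1}^{\infty} \left( \frac{1}{n} - \frac{1}{n+1} \right) = \sum_{n=1}^{\infty} \frac{1}{n(n+1)}.
\]
By the Hahn-Kolmogorov Extension Theorem, it is sufficient to check that $\mu(T^{-1}[a,b]) = \mu([a,b])$ for all intervals $[a,b]$. It is straightforward to check that
\[
T^{-1}[a,b] = \bigcup_{n=1}^{\infty} \left[ \frac{a+n}{n(n+1)}, \frac{b+n}{n(n+1)} \right]
\]
and that this is a disjoint union. Hence
\[
\mu(T^{-1}[a,b]) = \sum_{n=1}^{\infty} \mu\left( \left[ \frac{a+n}{n(n+1)}, \frac{b+n}{n(n+1)} \right] \right) = \sum_{n=1}^{\infty} \frac{b-a}{n(n+1)} = b - a = \mu([a,b]).
\]

Solution 3.10
(i) Clearly $d(x,y) \geq 0$ with equality if and only if $x = y$.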
It is also clear that $d(x,y) = d(y,x)$. It remains to prove the triangle inequality: $d(x,z) \leq d(x,y) + d(y,z)$ for all $x, y, z \in \Sigma$. If any of $x, y, z$ are equal then the triangle inequality is clear, so we can assume that $x, y, z$ are all distinct. Suppose that $x$ and $y$ agree in the first $n$ places and that $y$ and $z$ agree in the first $m$ places. Then $x$ and $z$ agree in at least the first $\min\{n, m\}$ places. Hence $n(x,z) \geq \min\{n(x,y), n(y,z)\}$. Hence
\[
d(x,z) = \frac{1}{2^{n(x,z)}} \leq \frac{1}{2^{\min\{n(x,y), n(y,z)\}}} \leq \frac{1}{2^{n(x,y)}} + \frac{1}{2^{n(y,z)}} = d(x,y) + d(y,z).
\]

(ii) Let $\varepsilon > 0$ and choose $n \geq 1$ such that $1/2^n < \varepsilon$. Choose $\delta = 1/2^{n+1}$. Suppose that $d(x,y) < \delta = 1/2^{n+1}$. Then $n(x,y) > n+1$, i.e. $x$ and $y$ agree in at least the first $n+1$ places. Hence $\sigma(x)$ and $\sigma(y)$ agree in at least the first $n$ places. Hence $n(\sigma(x), \sigma(y)) > n$. Hence $d(\sigma(x), \sigma(y)) < 1/2^n < \varepsilon$.

(iii) We show that $[i_0, \ldots, i_{n-1}]$ is open. Let $x \in [i_0, \ldots, i_{n-1}]$ so that $x_j = i_j$ for $j = 0, 1, \ldots, n-1$. Choose $\varepsilon = 1/2^n$. Suppose that $d(x,y) < \varepsilon$. Then $n(x,y) > n$, i.e. $x$ and $y$ agree in at least the first $n$ places. Hence $x_j = y_j$ for $j = 0, 1, \ldots, n-1$. Hence $y_j = i_j$ for $j = 0, 1, \ldots, n-1$ so that $y \in [i_0, \ldots, i_{n-1}]$. Hence $[i_0, \ldots, i_{n-1}]$ is open.

To see that $[i_0, \ldots, i_{n-1}]$ is closed, note that
\[
\Sigma \setminus [i_0, \ldots, i_{n-1}] = \bigcup [i'_0, \ldots, i'_{n-1}]
\]
where the union is over all $n$-tuples $(i'_0, i'_1, \ldots, i'_{n-1}) \neq (i_0, i_1, \ldots, i_{n-1})$. This is a finite union of open sets, and so is open. Hence $[i_0, \ldots, i_{n-1}]$, as the complement of an open set, is closed.

Solution 3.11
First note that
\[
P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix},
\qquad
P^2 = \begin{pmatrix} 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 5/8 & 0 & 3/8 & 0 \\ 1/8 & 0 & 3/4 & 0 & 1/8 \\ 0 & 3/8 & 0 & 5/8 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \end{pmatrix},
\]
\[
P^3 = \begin{pmatrix} 0 & 5/8 & 0 & 3/8 & 0 \\ 5/32 & 0 & 3/4 & 0 & 3/32 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 3/32 & 0 & 3/4 & 0 & 5/32 \\ 0 & 3/8 & 0 & 5/8 & 0 \end{pmatrix},
\qquad
P^4 = \begin{pmatrix} 5/32 & 0 & 3/4 & 0 & 3/32 \\ 0 & 17/32 & 0 & 15/32 & 0 \\ 1/8 & 0 & 3/4 & 0 & 1/8 \\ 0 & 15/32 & 0 & 17/32 & 0 \\ 3/32 & 0 & 3/4 & 0 & 5/32 \end{pmatrix}.
\]
As for each $i, j$ there exists $n$ for which $P^n(i,j) > 0$, it follows that $P$ is irreducible.

Recall that the period of $P$ is the highest common factor of $\{n > 0 \mid P^n(i,i) > 0\}$. As all the diagonal entries of $P^2$ are positive, whereas $P^n(i,i) = 0$ whenever $n$ is odd, it follows that $P$ has period 2. Decompose $\{1,2,3,4,5\} = \{1,3,5\} \cup \{2,4\} = S_0 \cup S_1$. If $P(i,j) > 0$ then either $i \in S_0$ and $j \in S_1$, or $i \in S_1$ and $j \in S_0$, i.e. $i \in S_\ell$ and $j \in S_{\ell+1 \bmod 2}$. When restricted to the indices $\{1,3,5\}$, $P^2$ has the form
\[
\begin{pmatrix} 1/4 & 3/4 & 0 \\ 1/8 & 3/4 & 1/8 \\ 0 & 3/4 & 1/4 \end{pmatrix}
\]
which is easily seen to be irreducible and aperiodic. When restricted to the indices $\{2,4\}$, $P^2$ has the form
\[
\begin{pmatrix} 5/8 & 3/8 \\ 3/8 & 5/8 \end{pmatrix}
\]
which is clearly irreducible and aperiodic.

The eigenvalues of $P$ are found by evaluating the determinant
\[
\begin{vmatrix} -\lambda & 1 & 0 & 0 & 0 \\ 1/4 & -\lambda & 3/4 & 0 & 0 \\ 0 & 1/2 & -\lambda & 1/2 & 0 \\ 0 & 0 & 3/4 & -\lambda & 1/4 \\ 0 & 0 & 0 & 1 & -\lambda \end{vmatrix}.
\]
After simplifying this expression, we obtain
\[
(1 - \lambda)(1 + \lambda)\lambda\left( \lambda^2 - \frac{1}{4} \right),
\]
so that the eigenvalues are $1, -1, 0, 1/2, -1/2$. (Note that, as $P$ has period 2, we expect from the Perron-Frobenius Theorem that the square roots of 1 are the eigenvalues of modulus 1 for $P$.)

A left eigenvector $p = (p(1), p(2), p(3), p(4), p(5))$ for the eigenvalue 1 is determined by
\[
(p(1), p(2), p(3), p(4), p(5)) \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} = (p(1), p(2), p(3), p(4), p(5))
\]
which simplifies to
\[
\tfrac{1}{4}p(2) = p(1), \quad p(1) + \tfrac{1}{2}p(3) = p(2), \quad \tfrac{3}{4}p(2) + \tfrac{3}{4}p(4) = p(3), \quad \tfrac{1}{2}p(3) + p(5) = p(4), \quad \tfrac{1}{4}p(4) = p(5).
\]
Setting $p(1) = 1$ we obtain $(p(1), p(2), p(3), p(4), p(5)) = (1, 4, 6, 4, 1)$, and normalising this to form a probability vector we obtain
\[
p = \left( \frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16} \right).
\]

Solution 3.12
Let $p = (p(1), \ldots, p(k))$ be a probability vector. Let $P$ be the matrix
\[
P = \begin{pmatrix} p(1) & p(2) & \cdots & p(k) \\ p(1) & p(2) & \cdots & p(k) \\ \vdots & \vdots & & \vdots \\ p(1) & p(2) & \cdots & p(k) \end{pmatrix}.
\]
Then $P$ is a stochastic matrix. As each $p(j) > 0$, it follows that $P$ is aperiodic. It is straightforward to check that $pP = p$.
As $P(i,j) = p(j)$, the Markov measure determined by the matrix $P$ is the same as the Bernoulli measure determined by the probability vector $p$.

Solution 4.1
Note that
\[
\chi_{T^{-1}B}(x) = 1 \iff x \in T^{-1}B \iff T(x) \in B \iff \chi_B(T(x)) = 1.
\]
Hence $\chi_{T^{-1}B} = \chi_B \circ T$.

Solution 4.2
Note that $T^n(x) = x$ if and only if $2^n x = x \bmod 1$, i.e. $2^n x = x + p$ for some integer $p$. Hence $x = p/(2^n - 1)$. We get distinct values of $x$ in $\mathbb{R}/\mathbb{Z}$ when $p = 0, 1, \ldots, 2^n - 2$ (note that when $p = 2^n - 1$ then $x = 1$, which is the same as 0 in $\mathbb{R}/\mathbb{Z}$). Hence there are infinitely many distinct periodic orbits for the doubling map. If $x, Tx, \ldots, T^{n-1}x$ is a periodic orbit of period $n$ then let
\[
\delta^{(x)} = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x}
\]
denote the periodic orbit measure supported on the orbit of $x$. As there are infinitely many distinct periodic orbits, there are infinitely many distinct measures supported on periodic orbits.

Solution 4.3
Recall that $\mathbb{R}/\mathbb{Z}$ can be regarded as $[0,1]$ where 0 and 1 are identified. Suppose that $f : [0,1] \to \mathbb{R}$ is integrable and that $f(0) = f(1)$ so that $f$ is a well-defined function on $\mathbb{R}/\mathbb{Z}$. Then
\begin{align}
\int f \circ T \, d\mu &= \int_0^{1/2} f \circ T \, d\mu + \int_{1/2}^1 f \circ T \, d\mu \notag \\
&= \int_0^{1/2} f(2x) \, dx + \int_{1/2}^1 f(2x - 1) \, dx \tag{12.0.4} \\
&= \frac{1}{2} \int_0^1 f(x) \, dx + \frac{1}{2} \int_0^1 f(x) \, dx \notag \\
&= \int f \, d\mu \notag
\end{align}
where we have used the substitution $u(x) = 2x$ for the first integral and $u(x) = 2x - 1$ for the second integral in (12.0.4).

Solution 4.4
It is straightforward to check that $T : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}^k/\mathbb{Z}^k$ is a diffeomorphism. Recall that we can identify functions $f : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{C}$ with functions $f : \mathbb{R}^k \to \mathbb{C}$ that satisfy $f(x + n) = f(x)$ for all $n \in \mathbb{Z}^k$. We apply the change of variables formula with the substitution $u(x) = T(x)$. Note that
\[
DT(x) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\]
so that $|\det DT| = 1$. Hence, by the change of variables formula
\[
\int f \circ T \, d\mu = \int_{\mathbb{R}^k/\mathbb{Z}^k} f \circ T \, |\det DT| \, d\mu = \int_{T(\mathbb{R}^k/\mathbb{Z}^k)} f \, d\mu = \int f \, d\mu.
\]

Solution 5.1
Let $\alpha = p/q$ with $p, q \in \mathbb{Z}$, $q \neq 0$, $\mathrm{hcf}(p,q) = 1$. Let
\[
B = \bigcup_{j=0}^{q-1} \left[ \frac{j}{q}, \frac{j}{q} + \frac{1}{2q} \right].
\]
Then
\[
T^{-1}\left[ \frac{j}{q}, \frac{j}{q} + \frac{1}{2q} \right] = \left[ \frac{j-p}{q}, \frac{j-p}{q} + \frac{1}{2q} \right]
\]
so that $T^{-1}B = B$ (draw a picture to understand this better). However $\mu(B) = 1/2$, so that $T$ is not ergodic with respect to Lebesgue measure.

Solution 5.2
Suppose that $f \in L^2(X, \mathcal{B}, \mu)$ has Fourier series
\[
\sum_{(n,m) \in \mathbb{Z}^2} c_{(n,m)} e^{2\pi i(nx + my)}.
\]
Then $f \circ T$ has Fourier series
\[
\sum_{(n,m) \in \mathbb{Z}^2} c_{(n,m)} e^{2\pi i(n(x+\alpha) + m(x+y))} = \sum_{(n,m) \in \mathbb{Z}^2} c_{(n,m)} e^{2\pi i n\alpha} e^{2\pi i((n+m)x + my)}.
\]
Comparing coefficients we see that
\[
c_{(n+m,m)} = e^{2\pi i n\alpha} c_{(n,m)}.
\]
Suppose that $m \neq 0$. Then for each $j > 0$,
\[
|c_{(n+jm,m)}| = \cdots = |c_{(n+m,m)}| = |c_{(n,m)}|,
\]
as $|e^{2\pi i n\alpha}| = 1$. Note that if $m \neq 0$ then $(n + jm, m) \to \infty$ as $j \to \infty$. By the Riemann-Lebesgue Lemma (Proposition 5.3.2(ii)), we must have that $c_{(n,m)} = 0$ if $m \neq 0$. Hence $f$ has Fourier series
\[
\sum_{n \in \mathbb{Z}} c_{(n,0)} e^{2\pi i nx}
\]
and $f \circ T$ has Fourier series
\[
\sum_{n \in \mathbb{Z}} c_{(n,0)} e^{2\pi i n\alpha} e^{2\pi i nx}.
\]
Comparing Fourier coefficients we see that
\[
c_{(n,0)} = c_{(n,0)} e^{2\pi i n\alpha}.
\]
Suppose that $n \neq 0$. As $\alpha \notin \mathbb{Q}$, $e^{2\pi i n\alpha} \neq 1$. Hence $c_{(n,0)} = 0$ unless $n = 0$. Hence $f$ has Fourier series $c_{(0,0)}$, i.e. $f$ is constant a.e. Hence $T$ is ergodic with respect to Lebesgue measure.

Solution 5.3
Suppose that $T : X \to X$ has a periodic point $x$ with period $n$. Let
\[
\mu = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x}.
\]
Let $B \in \mathcal{B}$ and suppose that $T^{-1}B = B$. We must show that $\mu(B) = 0$ or 1.

Suppose that $x \in B$. Then $x \in T^{-1}B$. Hence $T(x) \in B$. Continuing inductively, we see that $T^j(x) \in B$ for $j = 0, 1, \ldots, n-1$. Hence
\[
\mu(B) = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x}(B) = \frac{1}{n} \sum_{j=0}^{n-1} 1 = 1.
\]
Similarly, if $x \in X \setminus B$ then $T^j(x) \in X \setminus B$ for $j = 0, 1, \ldots, n-1$ (we have used the fact that if $B$ is $T$-invariant then $X \setminus B$ is $T$-invariant). Hence $\mu(B) = 0$.

Solution 5.4
(i) Recall that the determinant of a matrix is equal to the product of all the eigenvalues. Let $T$ be a linear toral automorphism with corresponding matrix $A$. Suppose that $A$ has an eigenvalue of modulus 1. By considering $A^2$ if necessary, there is no loss in generality in assuming that $\det A = 1$. Suppose $k = 2$.
Then the matrix $A$ has two eigenvalues, $\lambda, \bar\lambda$. As $A$ has an eigenvalue of modulus 1, we must have that $\lambda = e^{2\pi i\theta}$ for some $\theta \in [0,1)$. Then $\lambda, \bar\lambda$ satisfy the equation $\lambda^2 - 2\cos(2\pi\theta)\lambda + 1 = 0$. However the matrix $A = (a, b; c, d)$ has characteristic equation $\lambda^2 - (a+d)\lambda + 1 = 0$. Hence $2\cos(2\pi\theta) = a + d$, an integer. Thus $\cos(2\pi\theta) \in \{0, \pm 1/2, \pm 1\}$. Hence $\lambda$ is a root of unity (of order 1, 2, 3, 4 or 6) and so $T$ cannot be ergodic.

Now suppose $k = 3$. Then, assuming that $A$ has an eigenvalue of modulus 1, the eigenvalues must be $\lambda = e^{2\pi i\theta}$, $\bar\lambda$ and $\mu \in \mathbb{R}$. As $\det A = 1$, we must have that $\lambda\bar\lambda\mu = 1$. As $\lambda\bar\lambda = 1$, it follows that $\mu = 1$. Hence $A$ has 1 as an eigenvalue and so $T$ cannot be ergodic. Thus $k \geq 4$.

(ii) $A$ has integer entries and it is easy to see that $\det A = 1$. Hence $A$ determines a linear toral automorphism of $\mathbb{R}^4/\mathbb{Z}^4$.

(iii) It is straightforward to calculate that the characteristic equation for $A$ is
\[
\lambda^4 - 8\lambda^3 + 6\lambda^2 - 8\lambda + 1 = 0.
\]
Clearly, $\lambda \neq 0$. Dividing by $\lambda^2$ and substituting $u = \lambda + \lambda^{-1}$ we see that $u^2 - 8u + 4 = 0$. Hence $u = 4 \pm 2\sqrt{3}$. Multiplying $\lambda + \lambda^{-1} = u$ by $\lambda$ we obtain a quadratic in $\lambda$ with solution
\[
\lambda = \frac{u \pm \sqrt{u^2 - 4}}{2}.
\]
Substituting the two different values of $u$ gives four values of $\lambda$, namely:
\[
2 + \sqrt{3} \pm \sqrt{6 + 4\sqrt{3}}, \qquad 2 - \sqrt{3} \pm i\sqrt{4\sqrt{3} - 6}.
\]
The first two are real and not of unit modulus, whereas the second two are complex numbers of unit modulus.

(iv) This question is not part of the course and is included for completeness only. The solution requires ideas from Galois theory.

We first claim that $\lambda^4 - 8\lambda^3 + 6\lambda^2 - 8\lambda + 1$ is irreducible over $\mathbb{Q}$. (To see this, recall that irreducibility over $\mathbb{Q}$ is equivalent to irreducibility over $\mathbb{Z}$. Now apply Eisenstein's criterion; one way to do this is to replace $\lambda$ by $\lambda - 1$, which gives $\lambda^4 - 12\lambda^3 + 36\lambda^2 - 48\lambda + 24$, and then use the prime 3.) Hence $\lambda^4 - 8\lambda^3 + 6\lambda^2 - 8\lambda + 1$ has no common factors with $\lambda^n - 1$ for any $n$. Hence $\lambda$ is not a root of unity.

Solution 6.1
(i) Let $\mu$ denote Lebesgue measure. We prove that $T_*\mu = \mu$ by using the Hahn-Kolmogorov Extension Theorem.
It is sufficient to prove that $T_*\mu([a,b]) = \mu([a,b])$ for all intervals $[a,b]$. Note that
\[
T^{-1}[a,b] = \left[ \frac{a}{2}, \frac{b}{2} \right] \cup \left[ 1 - \frac{b}{2}, 1 - \frac{a}{2} \right].
\]
Hence
\[
T_*\mu([a,b]) = \left( \frac{b}{2} - \frac{a}{2} \right) + \left( \left( 1 - \frac{a}{2} \right) - \left( 1 - \frac{b}{2} \right) \right) = b - a = \mu([a,b]).
\]
Hence $\mu$ is a $T$-invariant measure.

(ii) Define $I(0) = [0, 1/2]$, $I(1) = [1/2, 1]$ and define the maps
\[
\phi_0 : [0,1] \to I(0) : x \mapsto \frac{x}{2}, \qquad \phi_1 : [0,1] \to I(1) : x \mapsto 1 - \frac{x}{2}.
\]
Then $T\phi_0(x) = x$, $T\phi_1(x) = x$. Given $i_0, \ldots, i_{n-1} \in \{0,1\}$ define
\[
\phi_{i_0,i_1,\ldots,i_{n-1}} = \phi_{i_0}\phi_{i_1} \cdots \phi_{i_{n-1}}
\]
and note that $T^n\phi_{i_0,i_1,\ldots,i_{n-1}}(x) = x$ for all $x \in [0,1]$. Define
\[
I(i_0, i_1, \ldots, i_{n-1}) = \phi_{i_0,i_1,\ldots,i_{n-1}}([0,1])
\]
and call this a cylinder of rank $n$. It is easy to see that cylinders of rank $n$ are dyadic intervals (although the labelling of these cylinders is not the same as the labelling that one gets when using the doubling map: for example, for the tent map $I(1,1) = [1/2, 3/4]$ whereas for the doubling map $I(1,1) = [3/4, 1]$). Hence the algebra $\mathcal{A}$ of finite unions of cylinders generates the Borel $\sigma$-algebra.

Let $B \in \mathcal{B}$ be such that $T^{-1}B = B$. Note that $T^{-n}B = B$. Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder of rank $n$ and let $\phi = \phi_{i_0,i_1,\ldots,i_{n-1}}$. Then $T^n\phi(x) = x$. Note also that $\mu(I) = 1/2^n$. We will also need the fact that $|\phi'(x)| = 1/2^n$ (this follows from noting that $|\phi_0'(x)| = |\phi_1'(x)| = 1/2$ and using the chain rule). Finally, we observe that
\begin{align*}
\mu(B \cap I) &= \int \chi_{B \cap I}(x) \, dx \\
&= \int \chi_B(x)\chi_I(x) \, dx \\
&= \int_I \chi_B(x) \, dx \\
&= \int_0^1 \chi_B(\phi(x))|\phi'(x)| \, dx \quad \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi(x))|\phi'(x)| \, dx \quad \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi(x)))|\phi'(x)| \, dx \\
&= \int_0^1 \chi_B(x)|\phi'(x)| \, dx \quad \text{as } T^n\phi(x) = x \\
&= \frac{1}{2^n} \int_0^1 \chi_B(x) \, dx \quad \text{as } |\phi'(x)| = 1/2^n \\
&= \mu(I)\mu(B) \quad \text{as } \mu(I) = 1/2^n.
\end{align*}
Hence $\mu(B \cap I) = \mu(B)\mu(I)$ for all sets $I$ in the algebra of cylinders. By Lemma 6.1.1 it follows that $\mu(B) = 0$ or 1. Hence Lebesgue measure is an ergodic measure for $T$.

Solution 6.2
For each $n \geq 1$ define $I(n) = [1/(n+1), 1/n]$ and define the maps
\[
\phi_n : [0,1] \to I(n) : x \mapsto \frac{x + n}{n(n+1)}.
\]
Since $\phi_n(x) = (x+n)/(n(n+1))$ inverts the branch of $T$ on $I(n)$, we have $T\phi_n(x) = x$ for all $x \in [0,1]$. Given $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$ define
\[
\phi_{i_0,i_1,\ldots,i_{n-1}} = \phi_{i_0}\phi_{i_1} \cdots \phi_{i_{n-1}}
\]
and note that $T^n\phi_{i_0,i_1,\ldots,i_{n-1}}(x) = x$ for all $x \in [0,1]$. Define
\[
I(i_0, i_1, \ldots, i_{n-1}) = \phi_{i_0,i_1,\ldots,i_{n-1}}([0,1])
\]
and call this a cylinder of rank $n$. Note that
\[
\phi_n'(x) = \frac{1}{n(n+1)} \leq \frac{1}{2}
\]
so that, by the chain rule,
\[
\phi'_{i_0,i_1,\ldots,i_{n-1}}(x) = \prod_{j=0}^{n-1} \frac{1}{i_j(i_j + 1)} \leq \frac{1}{2^n}.
\]
By the Intermediate Value Theorem, $I(i_0, i_1, \ldots, i_{n-1})$ is an interval, and by the derivative bound its length is no more than $1/2^n$.

For each $n$, the cylinders of rank $n$ partition $[0,1]$. Let $x, y \in [0,1]$ and suppose that $x \neq y$. Choose $n$ such that $|x - y| > 1/2^n$. Then $x, y$ must lie in different cylinders of rank $n$. Hence the cylinders separate the points of $[0,1]$. By Proposition 2.4.2 it follows that the algebra $\mathcal{A}$ of finite unions of cylinders generates the Borel $\sigma$-algebra.

Let $B \in \mathcal{B}$ be such that $T^{-1}B = B$. Note that $T^{-n}B = B$. Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder of rank $n$ and let $\phi = \phi_{i_0,i_1,\ldots,i_{n-1}}$. Then $T^n\phi(x) = x$. Note that
\[
\mu(I) = \prod_{j=0}^{n-1} \frac{1}{i_j(i_j + 1)} = \phi'(x) \tag{12.0.5}
\]
for any $x \in [0,1]$. Finally, we observe that
\begin{align*}
\mu(B \cap I) &= \int \chi_{B \cap I}(x) \, dx \\
&= \int \chi_B(x)\chi_I(x) \, dx \\
&= \int_I \chi_B(x) \, dx \\
&= \int_0^1 \chi_B(\phi(x))|\phi'(x)| \, dx \quad \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi(x))|\phi'(x)| \, dx \quad \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi(x)))|\phi'(x)| \, dx \\
&= \int_0^1 \chi_B(x)|\phi'(x)| \, dx \\
&= \mu(I)\mu(B) \quad \text{by (12.0.5)}.
\end{align*}
Hence $\mu(B \cap I) = \mu(B)\mu(I)$ for all sets $I$ in the algebra of cylinders. By Lemma 6.1.1 it follows that $\mu(B) = 0$ or 1. Hence Lebesgue measure is an ergodic measure for $T$.

Solution 6.3
(i) First note that
\[
\frac{1}{x_0} = \frac{P_1}{Q_1}, \qquad \frac{1}{x_0 + \frac{1}{x_1}} = \frac{x_1}{x_0 x_1 + 1} = \frac{P_2}{Q_2}.
\]
If we define $P_0 = 0$, $Q_0 = 1$ then we have that $P_2 = x_1 P_1 + P_0$ and $Q_2 = x_1 Q_1 + Q_0$. Similarly,
\[
\frac{1}{x_0 + t} = \frac{P_1(x_0; t)}{Q_1(x_0; t)}, \qquad \frac{1}{x_0 + \frac{1}{x_1 + t}} = \frac{x_1 + t}{x_0 x_1 + 1 + t x_0} = \frac{P_2(x_0, x_1; t)}{Q_2(x_0, x_1; t)},
\]
so that $P_2(x_0, x_1; t) = P_2 + tP_1$, $Q_2(x_0, x_1; t) = Q_2 + tQ_1$.

Suppose, for the inductive step, that the corresponding formulae hold for $n$:
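$P_n(x_0, \ldots, x_{n-1}; t) = P_n + tP_{n-1}$, $Q_n(x_0, \ldots, x_{n-1}; t) = Q_n + tQ_{n-1}$. Then
\begin{align*}
\frac{P_{n+1}(x_0, x_1, \ldots, x_n; t)}{Q_{n+1}(x_0, x_1, \ldots, x_n; t)} &= [x_0, \ldots, x_{n-1}, x_n + t] \\
&= \left[ x_0, \ldots, x_{n-1} + \frac{1}{x_n + t} \right] \\
&= \frac{P_n\left( x_0, x_1, \ldots, x_{n-1}; \frac{1}{x_n + t} \right)}{Q_n\left( x_0, x_1, \ldots, x_{n-1}; \frac{1}{x_n + t} \right)} \\
&= \frac{P_n + \frac{1}{x_n + t} P_{n-1}}{Q_n + \frac{1}{x_n + t} Q_{n-1}} \\
&= \frac{x_n P_n + P_{n-1} + tP_n}{x_n Q_n + Q_{n-1} + tQ_n}.
\end{align*}
Hence
\[
P_{n+1}(x_0, x_1, \ldots, x_n; t) = x_n P_n + P_{n-1} + tP_n, \qquad Q_{n+1}(x_0, x_1, \ldots, x_n; t) = x_n Q_n + Q_{n-1} + tQ_n.
\]
Putting $t = 0$ we obtain the recurrence relations
\[
P_{n+1} = x_n P_n + P_{n-1}, \qquad Q_{n+1} = x_n Q_n + Q_{n-1}.
\]
Hence
\[
P_{n+1}(x_0, x_1, \ldots, x_n; t) = P_{n+1} + tP_n, \qquad Q_{n+1}(x_0, x_1, \ldots, x_n; t) = Q_{n+1} + tQ_n.
\]
By induction, the recurrence relations hold.

(ii) Note that
\begin{align*}
Q_n P_{n-1} - Q_{n-1} P_n &= (x_{n-1} Q_{n-1} + Q_{n-2}) P_{n-1} - Q_{n-1}(x_{n-1} P_{n-1} + P_{n-2}) \\
&= -(Q_{n-1} P_{n-2} - Q_{n-2} P_{n-1}) \\
&= \cdots = (-1)^n.
\end{align*}

Solution 6.4
(i) Let $x = (x_0, x_1, \ldots)$, $y = (y_0, y_1, \ldots) \in \Sigma$. Let $d_{\mathbb{R}/\mathbb{Z}}$ and $d_\Sigma$ denote the usual metrics on $\mathbb{R}/\mathbb{Z}$ and $\Sigma$, respectively. Now
\begin{align}
d_{\mathbb{R}/\mathbb{Z}}(\pi(x), \pi(y)) &\leq |\pi(x_0, x_1, \ldots) - \pi(y_0, y_1, \ldots)| \tag{12.0.6} \\
&= \left| \frac{x_0 - y_0}{2} + \frac{x_1 - y_1}{2^2} + \cdots \right| \notag \\
&\leq \frac{|x_0 - y_0|}{2} + \frac{|x_1 - y_1|}{2^2} + \cdots. \tag{12.0.7}
\end{align}
Now if $d_\Sigma(x, y) < 1/2^n$ then $x_j = y_j$ for $j = 0, \ldots, n$. Hence we can bound the right-hand side of (12.0.7) by
\begin{align*}
\frac{|x_{n+1} - y_{n+1}|}{2^{n+2}} + \frac{|x_{n+2} - y_{n+2}|}{2^{n+3}} + \cdots &\leq \frac{1}{2^{n+2}} + \frac{1}{2^{n+3}} + \cdots \\
&= \frac{1}{2^{n+2}} \left( 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots \right) \\
&\leq \frac{1}{2^{n+1}},
\end{align*}
summing the geometric progression. This implies that $\pi$ is continuous. To see this, let $\varepsilon > 0$. Choose $n$ such that $1/2^{n+1} < \varepsilon$. Choose $\delta = 1/2^n$. If $d_\Sigma(x, y) < \delta$ then $d_{\mathbb{R}/\mathbb{Z}}(\pi(x), \pi(y)) < \varepsilon$.

(ii) Observe that if $x = (x_j)_{j=0}^{\infty} \in \Sigma$ then
\[
\pi(\sigma(x)) = \pi(\sigma(x_0, x_1, \ldots)) = \pi(x_1, x_2, \ldots) = \frac{x_1}{2} + \frac{x_2}{2^2} + \cdots
\]
and
\begin{align*}
T(\pi(x)) &= T(\pi(x_0, x_1, \ldots)) \\
&= T\left( \frac{x_0}{2} + \frac{x_1}{2^2} + \cdots \right) \\
&= x_0 + \frac{x_1}{2} + \frac{x_2}{2^2} + \cdots \bmod 1 \\
&= \frac{x_1}{2} + \frac{x_2}{2^2} + \cdots.
\end{align*}

(iii) We must show that $T_*(\pi_*\mu) = \pi_*\mu$.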
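To see this, observe that
\begin{align*}
T_*(\pi_*\mu)(B) &= \pi_*\mu(T^{-1}B) \\
&= \mu(\pi^{-1}T^{-1}B) \\
&= \mu(\sigma^{-1}\pi^{-1}B) \quad \text{as } \pi\sigma = T\pi \\
&= (\sigma_*\mu)(\pi^{-1}B) \\
&= \mu(\pi^{-1}B) \quad \text{as } \mu \text{ is } \sigma\text{-invariant} \\
&= (\pi_*\mu)(B).
\end{align*}

(iv) Suppose that $\mu$ is an ergodic measure for $\sigma$. We claim that $\pi_*\mu$ is an ergodic measure for $T$, i.e. if $B \in \mathcal{B}(\mathbb{R}/\mathbb{Z})$ is such that $T^{-1}B = B$ then $\pi_*\mu(B) = 0$ or 1. First observe that $\pi^{-1}(B)$ is $\sigma$-invariant. This follows as:
\[
\sigma^{-1}(\pi^{-1}(B)) = \pi^{-1}T^{-1}(B) = \pi^{-1}(B).
\]
As $\mu$ is an ergodic measure for $\sigma$, we must have that $\mu(\pi^{-1}(B)) = 0$ or 1. Hence $\pi_*\mu(B) = 0$ or 1.

(v) There are uncountably many different Bernoulli measures $\mu_p$ for $\Sigma$ given by the family of probability vectors $(p, 1-p)$. These are ergodic for $\sigma$. To see that the measures $\pi_*\mu_p$ are all different, notice that
\[
\pi_*\mu_p([0, 1/2)) = \mu_p(\pi^{-1}[0, 1/2)) = \mu_p([0]) = p,
\]
where $[0]$ denotes the cylinder consisting of all sequences that start with 0.

Solution 7.1
Suppose that $x_n \to x$. We must show that $\delta_{x_n} \rightharpoonup \delta_x$. Let $f \in C(X, \mathbb{R})$. Then
\[
\int f \, d\delta_{x_n} = f(x_n) \to f(x) = \int f \, d\delta_x
\]
as $f$ is continuous. Hence $\delta_{x_n} \rightharpoonup \delta_x$.

Solution 7.2
Suppose that $\mu_n \rightharpoonup \mu$. We must show that $T_*\mu_n \rightharpoonup T_*\mu$. Let $f \in C(X, \mathbb{R})$. Then
\[
\int f \, d(T_*\mu_n) = \int f \circ T \, d\mu_n \to \int f \circ T \, d\mu = \int f \, d(T_*\mu)
\]
as $f \circ T$ is continuous. Hence $T_*\mu_n \rightharpoonup T_*\mu$.

Solution 7.3
(i) Suppose that $\mu_n \to \mu$. We claim that $\mu_n \rightharpoonup \mu$. To show this, we have to prove that if $f \in C(X, \mathbb{R})$ then $\int f \, d\mu_n \to \int f \, d\mu$. Let $f \in C(X, \mathbb{R})$; we may assume that $f$ is not identically zero. Note that $f/\|f\|_\infty \in C(X, \mathbb{R})$ and that $\| f/\|f\|_\infty \|_\infty = 1$. Hence
\begin{align*}
\left| \int f \, d\mu_n - \int f \, d\mu \right| &= \|f\|_\infty \left| \int \frac{f}{\|f\|_\infty} \, d\mu_n - \int \frac{f}{\|f\|_\infty} \, d\mu \right| \\
&\leq \|f\|_\infty \sup_{g \in C(X,\mathbb{R}),\, \|g\|_\infty \leq 1} \left| \int g \, d\mu_n - \int g \, d\mu \right| \\
&= \|f\|_\infty \|\mu_n - \mu\|,
\end{align*}
which tends to 0 as $n \to \infty$.

(ii) Suppose that $x_n \to x$ but that $x_n \neq x$ for all $n$. We claim that $\delta_{x_n} \not\to \delta_x$. Note that
\[
\|\delta_{x_n} - \delta_x\| = \sup_{f \in C(X,\mathbb{R}),\, \|f\|_\infty \leq 1} |f(x_n) - f(x)|.
\]
For each $n$, we can choose a continuous function $f_n \in C(X, \mathbb{R})$ such that $f_n(x) = 1$, $f_n(x_n) = 0$ and $\|f_n\|_\infty \leq 1$. Hence
\[
\|\delta_{x_n} - \delta_x\| = \sup_{f \in C(X,\mathbb{R}),\, \|f\|_\infty \leq 1} |f(x_n) - f(x)| \geq |f_n(x_n) - f_n(x)| = 1.
\]
Hence $\delta_{x_n} \not\to \delta_x$.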
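(iii) First note that if $f \in C(X, \mathbb{R})$ is any continuous function with $\|f\|_\infty \leq 1$, then
\[
\left| \int f \, d\delta_x - \int f \, d\delta_y \right| = |f(x) - f(y)| \leq |f(x)| + |f(y)| \leq 2.
\]
Hence
\[
\|\delta_x - \delta_y\| = \sup_{f \in C(X,\mathbb{R}),\, \|f\|_\infty \leq 1} \left| \int f \, d\delta_x - \int f \, d\delta_y \right| \leq 2.
\]
Conversely, by Urysohn's Lemma, there exist continuous functions $g_1, g_2$ such that $g_1(x) = g_2(y) = 1$, $g_1(y) = g_2(x) = 0$ and $0 \leq g_1, g_2 \leq 1$. Let $h = g_1 - g_2$. Then $h(x) = 1$, $h(y) = -1$ and $-1 \leq h \leq 1$ (so that $\|h\|_\infty = 1$). Hence
\begin{align*}
2 &= |h(x) - h(y)| \\
&= \left| \int h \, d\delta_x - \int h \, d\delta_y \right| \\
&\leq \sup_{f \in C(X,\mathbb{R}),\, \|f\|_\infty \leq 1} \left| \int f \, d\delta_x - \int f \, d\delta_y \right| \\
&= \|\delta_x - \delta_y\|.
\end{align*}
Hence $\|\delta_x - \delta_y\| = 2$.

Solution 7.4
Let $x_n \in X$ be a sequence such that $x_n \to x$ and $x_n \neq x$ for all $n$. Let $\mu_n = \delta_{x_n}$ and $\mu = \delta_x$. Then $\mu_n \rightharpoonup \mu$. Take $B = \{x\}$. Then $\mu_n(B) = 0$ but $\mu(B) = 1$. Hence $\mu_n(B) \not\to \mu(B)$.

Solution 7.5
Let $\mu_1, \mu_2 \in M(X, T)$ and suppose that $\alpha \in [0,1]$. Then $\alpha\mu_1 + (1-\alpha)\mu_2 \in M(X)$. To check that $\alpha\mu_1 + (1-\alpha)\mu_2 \in M(X, T)$, note that
\begin{align*}
(T_*(\alpha\mu_1 + (1-\alpha)\mu_2))(B) &= (\alpha\mu_1 + (1-\alpha)\mu_2)(T^{-1}B) \\
&= \alpha\mu_1(T^{-1}B) + (1-\alpha)\mu_2(T^{-1}B) \\
&= \alpha\mu_1(B) + (1-\alpha)\mu_2(B) \\
&= (\alpha\mu_1 + (1-\alpha)\mu_2)(B).
\end{align*}

Solution 7.6
Let $S \subset C(X, \mathbb{R})$ be uniformly dense. Let $f \in C(X, \mathbb{R})$. Let $\varepsilon > 0$. Choose $g \in S$ such that $\|f - g\|_\infty < \varepsilon$. Choose $N$ such that if $n \geq N$ then $\left| \int g \, d\mu_n - \int g \, d\mu \right| < \varepsilon$. Then
\begin{align*}
\left| \int f \, d\mu_n - \int f \, d\mu \right| &\leq \left| \int f \, d\mu_n - \int g \, d\mu_n \right| + \left| \int g \, d\mu_n - \int g \, d\mu \right| + \left| \int f \, d\mu - \int g \, d\mu \right| \\
&\leq \int |f - g| \, d\mu_n + \left| \int g \, d\mu_n - \int g \, d\mu \right| + \int |f - g| \, d\mu \\
&\leq 3\varepsilon.
\end{align*}
As $\varepsilon > 0$ is arbitrary, the result follows.

Solution 7.7
(i) Recall that $x \in \Sigma$ is a periodic point with period $n$ if $\sigma^n(x) = x$. If $x$ is periodic with period $n$ then $x_{j+n} = x_j$ for all $j = 0, 1, 2, \ldots$. Hence $x$ is determined by the first $n$ symbols, which then repeat. As there are two choices for each $x_j$, there are $2^n$ periodic points with period $n$.

(ii) First note that $\mu_n$ is a Borel probability measure. Let $[i_0, i_1, \ldots, i_{m-1}]$ be a cylinder. Let $n \geq m$. Then the periodic points $x$ of period $n$ in $[i_0, i_1, \ldots, i_{m-1}]$ have the form
\[
x = (i_0, i_1, \ldots, i_{m-1}, x_m, \ldots, x_{n-1}, i_0, i_1, \ldots, i_{m-1}, x_m, \ldots, x_{n-1}, i_0, \ldots)
\]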
where the finite string of symbols $i_0, i_1, \ldots, i_{m-1}, x_m, \ldots, x_{n-1}$ repeats. The symbols $x_m, \ldots, x_{n-1}$ can be chosen arbitrarily. Hence there are $2^{n-m}$ such periodic points. Hence, if $n \geq m$,
\[
\int \chi_{[i_0,i_1,\ldots,i_{m-1}]} \, d\mu_n = \frac{1}{2^n} \times 2^{n-m} = \frac{1}{2^m} = \int \chi_{[i_0,i_1,\ldots,i_{m-1}]} \, d\mu.
\]

(iii) To prove that $\chi_{[i_0,i_1,\ldots,i_{m-1}]}$ is continuous we need to show that, if $x_n \to x$ then $\chi_{[i_0,i_1,\ldots,i_{m-1}]}(x_n) \to \chi_{[i_0,i_1,\ldots,i_{m-1}]}(x)$.

First suppose that $x \in [i_0, i_1, \ldots, i_{m-1}]$. As $x_n \to x$, it follows from the definition of the metric on $\Sigma$ that there exists $N \in \mathbb{N}$ such that if $n \geq N$ then $x_n$ and $x$ agree in the first $m$ places. Hence if $n \geq N$ then $x_n \in [i_0, i_1, \ldots, i_{m-1}]$. Hence, if $n \geq N$, then $\chi_{[i_0,i_1,\ldots,i_{m-1}]}(x_n) = 1 = \chi_{[i_0,i_1,\ldots,i_{m-1}]}(x)$.

Now suppose that $x \notin [i_0, i_1, \ldots, i_{m-1}]$. As $x_n \to x$, it follows from the definition of the metric on $\Sigma$ that there exists $N \in \mathbb{N}$ such that if $n \geq N$ then $x_n$ and $x$ agree in the first $m$ places. Hence if $n \geq N$, there exists $j \in \{0, 1, \ldots, m-1\}$ such that $(x_n)_j \neq i_j$; that is, if $n \geq N$, then $x_n \notin [i_0, i_1, \ldots, i_{m-1}]$. Hence, if $n \geq N$, then $\chi_{[i_0,i_1,\ldots,i_{m-1}]}(x_n) = 0 = \chi_{[i_0,i_1,\ldots,i_{m-1}]}(x)$. Hence $\chi_{[i_0,i_1,\ldots,i_{m-1}]}$ is continuous.

(iv) Let $S$ denote the set of finite linear combinations of characteristic functions of cylinders. By the Stone-Weierstrass Theorem, $S$ is uniformly dense in $C(X, \mathbb{R})$. By (ii) above, if $g \in S$ then $\int g \, d\mu_n \to \int g \, d\mu$ as $n \to \infty$. Let $f \in C(X, \mathbb{R})$ and let $\varepsilon > 0$. Choose $g \in S$ such that $\|f - g\|_\infty < \varepsilon$. Then a $3\varepsilon$ argument as in the solution to Exercise 7.6 proves that $\limsup_{n\to\infty} \left| \int f \, d\mu_n - \int f \, d\mu \right| < 3\varepsilon$ and the result follows.

Solution 7.8
As trigonometric polynomials are uniformly dense in $C(X, \mathbb{R})$, it is sufficient to prove that $\int g \circ T \, d\mu = \int g \, d\mu$ for all trigonometric polynomials $g$. Let $g(x) = \sum_{j=0}^{r} c_j e^{2\pi i \langle n^{(j)}, x \rangle}$, $c_j \in \mathbb{R}$, $n^{(j)} = (n_1^{(j)}, n_2^{(j)}, n_3^{(j)}) \in \mathbb{Z}^3$ be a trigonometric polynomial.
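We label the coefficients so that $n^{(j)} = 0$ if and only if $j = 0$. Then $\int g \, d\mu = c_0$. Note that
\begin{align*}
g \circ T\left( \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \mathbb{Z}^3 \right) &= g\left( \begin{pmatrix} \alpha + x \\ x + y \\ y + z \end{pmatrix} + \mathbb{Z}^3 \right) \\
&= \sum_{j=0}^{r} c_j e^{2\pi i \langle (n_1^{(j)}, n_2^{(j)}, n_3^{(j)}), (\alpha + x, x + y, y + z) \rangle} \\
&= \sum_{j=0}^{r} c_j e^{2\pi i n_1^{(j)}\alpha} e^{2\pi i \left( n_1^{(j)} x + n_2^{(j)}(x + y) + n_3^{(j)}(y + z) \right)} \\
&= \sum_{j=0}^{r} c_j e^{2\pi i n_1^{(j)}\alpha} e^{2\pi i \langle (n_1^{(j)} + n_2^{(j)}, n_2^{(j)} + n_3^{(j)}, n_3^{(j)}), (x, y, z) \rangle}.
\end{align*}
Hence
\begin{align*}
\int g \circ T \, d\mu &= \int \sum_{j=0}^{r} c_j e^{2\pi i n_1^{(j)}\alpha} e^{2\pi i \langle (n_1^{(j)} + n_2^{(j)}, n_2^{(j)} + n_3^{(j)}, n_3^{(j)}), (x, y, z) \rangle} \, d\mu \\
&= \sum_{j=0}^{r} c_j e^{2\pi i n_1^{(j)}\alpha} \int e^{2\pi i \langle (n_1^{(j)} + n_2^{(j)}, n_2^{(j)} + n_3^{(j)}, n_3^{(j)}), (x, y, z) \rangle} \, d\mu.
\end{align*}
The integral is equal to zero unless $(n_1^{(j)} + n_2^{(j)}, n_2^{(j)} + n_3^{(j)}, n_3^{(j)}) = (0, 0, 0)$, i.e. unless $n_1^{(j)} = n_2^{(j)} = n_3^{(j)} = 0$. By our choice of labelling the coefficients, this only happens if $j = 0$. Hence
\[
\int g \circ T \, d\mu = c_0 = \int g \, d\mu.
\]

Solution 8.1
(i) Let $B \in \mathcal{B}$ and let $f = \chi_B$. Note that
\[
\int f \, d\nu = \int \chi_B \, d\nu = \int_B d\nu = \nu(B) = \int_B \frac{d\nu}{d\mu} \, d\mu = \int \chi_B \frac{d\nu}{d\mu} \, d\mu = \int f \frac{d\nu}{d\mu} \, d\mu.
\]
Hence the result holds for characteristic functions, hence for simple functions (finite linear combinations of characteristic functions). Let $f \in L^1(X, \mathcal{B}, \mu)$ be such that $f \geq 0$. By considering an increasing sequence of simple functions, the result follows for positive $L^1$ functions. By splitting an arbitrary real-valued $L^1$ function into its positive and negative parts, and then an arbitrary $L^1(X, \mathcal{B}, \mu)$ function into its real and imaginary parts, the result holds.

(ii) Now $d\nu_1/d\mu$, $d\nu_2/d\mu$ are the unique functions such that
\[
\nu_1(B) = \int_B \frac{d\nu_1}{d\mu} \, d\mu, \qquad \nu_2(B) = \int_B \frac{d\nu_2}{d\mu} \, d\mu,
\]
respectively. Hence
\[
\nu_1(B) + \nu_2(B) = \int_B \frac{d\nu_1}{d\mu} \, d\mu + \int_B \frac{d\nu_2}{d\mu} \, d\mu = \int_B \left( \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu} \right) d\mu.
\]
However,
\[
(\nu_1 + \nu_2)(B) = \int_B \frac{d(\nu_1 + \nu_2)}{d\mu} \, d\mu.
\]
Hence, by uniqueness in the Radon-Nikodym theorem, we have that
\[
\frac{d(\nu_1 + \nu_2)}{d\mu} = \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu}.
\]

(iii) Suppose that $\mu(B) = 0$. As $\nu \ll \mu$ then $\nu(B) = 0$. As $\lambda \ll \nu$ then $\lambda(B) = 0$. Hence $\lambda \ll \mu$.

Now as $\lambda \ll \mu$ we have
\[
\lambda(B) = \int_B \frac{d\lambda}{d\mu} \, d\mu.
\]
As $\lambda \ll \nu$ we have
\[
\lambda(B) = \int_B \frac{d\lambda}{d\nu} \, d\nu = \int \chi_B \frac{d\lambda}{d\nu} \, d\nu.
\]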
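By part (i), using the fact that $\nu \ll \mu$, it follows that
\[
\int \chi_B \frac{d\lambda}{d\nu} \, d\nu = \int \chi_B \frac{d\lambda}{d\nu} \frac{d\nu}{d\mu} \, d\mu = \int_B \frac{d\lambda}{d\nu} \frac{d\nu}{d\mu} \, d\mu.
\]
Hence by uniqueness in the Radon-Nikodym theorem,
\[
\frac{d\lambda}{d\mu} = \frac{d\lambda}{d\nu} \frac{d\nu}{d\mu}.
\]

Solution 8.2
The claimed formula is easily seen to be valid for $n = 3$. Suppose the formula is valid for $n$. Then
\begin{align*}
T^{n+1}\left( \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \mathbb{Z}^3 \right) &= T\left( T^n\left( \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \mathbb{Z}^3 \right) \right) \\
&= T\left( \begin{pmatrix} \binom{n}{1}\alpha + x \\ \binom{n}{2}\alpha + \binom{n}{1}x + y \\ \binom{n}{3}\alpha + \binom{n}{2}x + \binom{n}{1}y + z \end{pmatrix} + \mathbb{Z}^3 \right) \\
&= \begin{pmatrix} \binom{n}{1}\alpha + x + \alpha \\ \binom{n}{2}\alpha + \binom{n}{1}x + y + \binom{n}{1}\alpha + x \\ \binom{n}{3}\alpha + \binom{n}{2}x + \binom{n}{1}y + z + \binom{n}{2}\alpha + \binom{n}{1}x + y \end{pmatrix} + \mathbb{Z}^3 \\
&= \begin{pmatrix} \binom{n+1}{1}\alpha + x \\ \binom{n+1}{2}\alpha + \binom{n+1}{1}x + y \\ \binom{n+1}{3}\alpha + \binom{n+1}{2}x + \binom{n+1}{1}y + z \end{pmatrix} + \mathbb{Z}^3.
\end{align*}
Hence the claimed formula holds by induction.

Let $f(x, y, z) = e^{2\pi i(kx + \ell y + mz)}$. Then
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(T^j((x, y, z) + \mathbb{Z}^3)) = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i p^{(k,\ell,m)}_{x,y,z}(j)}
\]
where $p^{(k,\ell,m)}_{x,y,z}(n)$ is a polynomial. When $m \neq 0$, $p^{(k,\ell,m)}_{x,y,z}(n)$ is a degree 3 polynomial with leading coefficient $m\alpha/6 \notin \mathbb{Q}$. When $m = 0$, $\ell \neq 0$, $p^{(k,\ell,m)}_{x,y,z}(n)$ is a degree 2 polynomial with leading coefficient $\ell\alpha/2 \notin \mathbb{Q}$. When $m = \ell = 0$, $k \neq 0$, $p^{(k,\ell,m)}_{x,y,z}(n)$ is a degree 1 polynomial with leading coefficient $k\alpha \notin \mathbb{Q}$. In all three cases, $p^{(k,\ell,m)}_{x,y,z}(n)$ is uniformly distributed mod 1, by Weyl's Theorem on Polynomials (Theorem 2.3.1). Hence by Weyl's Criterion (Theorem 1.2.1), for all $(k, \ell, m) \in \mathbb{Z}^3 \setminus \{(0,0,0)\}$ we have
\[
\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i p^{(k,\ell,m)}_{x,y,z}(j)} \to 0
\]
as $n \to \infty$. When $k = \ell = m = 0$ we trivially have that
\[
\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i p^{(k,\ell,m)}_{x,y,z}(j)} = \frac{1}{n} \sum_{j=0}^{n-1} 1 \to 1
\]
as $n \to \infty$. Hence
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(T^j((x, y, z) + \mathbb{Z}^3)) \to \int f \, d\mu
\]
whenever $f(x, y, z) = e^{2\pi i(kx + \ell y + mz)}$. By taking finite linear combinations of exponential functions we see that
\[
\sup_{x \in X} \left| \frac{1}{n} \sum_{j=0}^{n-1} g(T^j(x)) - \int g \, d\mu \right| \to 0
\]
as $n \to \infty$ for all trigonometric polynomials $g$. By the Stone-Weierstrass Theorem (Theorem 1.2.2), trigonometric polynomials are uniformly dense in $C(X, \mathbb{R})$. Let $f \in C(X, \mathbb{R})$ and let $\varepsilon > 0$. Then there exists a trigonometric polynomial $g$ such that $\|f - g\|_\infty < \varepsilon$.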
Hence for any $x \in X$ we have
\[
\begin{aligned}
\left| \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) - \int f \, d\mu \right|
&\leq \left| \frac{1}{n} \sum_{j=0}^{n-1} \big( f(T^j(x)) - g(T^j(x)) \big) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(T^j(x)) - \int g \, d\mu \right| + \left| \int (g - f) \, d\mu \right| \\
&\leq \frac{1}{n} \sum_{j=0}^{n-1} \left| f(T^j(x)) - g(T^j(x)) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(T^j(x)) - \int g \, d\mu \right| + \int |g - f| \, d\mu \\
&\leq 2\varepsilon + \left\| \frac{1}{n} \sum_{j=0}^{n-1} g \circ T^j - \int g \, d\mu \right\|_\infty.
\end{aligned}
\]
Hence, taking the supremum over all $x \in X$, we have
\[
\left\| \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j - \int f \, d\mu \right\|_\infty \leq 2\varepsilon + \left\| \frac{1}{n} \sum_{j=0}^{n-1} g \circ T^j - \int g \, d\mu \right\|_\infty.
\]
Letting $n \to \infty$ we see that
\[
\limsup_{n \to \infty} \left\| \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j - \int f \, d\mu \right\|_\infty \leq 2\varepsilon.
\]
As $\varepsilon > 0$ is arbitrary, it follows that
\[
\lim_{n \to \infty} \left\| \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j - \int f \, d\mu \right\|_\infty = 0.
\]
Hence statement (ii) in Oxtoby's Ergodic Theorem holds. As (i) and (ii) in Oxtoby's Ergodic Theorem are equivalent, it follows that $T$ is uniquely ergodic and Lebesgue measure is the unique invariant measure.

Solution 8.3

Let $T$ be a uniquely ergodic homeomorphism with unique invariant measure $\mu$. Suppose that every orbit is dense. Let $U$ be a non-empty open set. Then for all $x \in X$, there exists $n \in \mathbb{Z}$ such that $T^n(x) \in U$. Hence $X = \bigcup_{n=-\infty}^{\infty} T^{-n} U$. Hence
\[
1 = \mu(X) = \mu\left( \bigcup_{n=-\infty}^{\infty} T^{-n} U \right) \leq \sum_{n=-\infty}^{\infty} \mu(T^{-n} U) = \sum_{n=-\infty}^{\infty} \mu(U)
\]
as $\mu$ is $T$-invariant. Hence $\mu(U) > 0$.

Conversely, suppose that $\mu(U) > 0$ for all non-empty open sets. Suppose for a contradiction that there exists $x_0 \in X$ such that the orbit of $x_0$ is not dense. Clearly $\{T^n(x_0) \mid n \in \mathbb{Z}\}$ is $T$-invariant. As $T$ is continuous, the set
\[
Y = \operatorname{cl}\{T^n(x_0) \mid n \in \mathbb{Z}\}
\]
is also $T$-invariant. As the orbit of $x_0$ is not dense, $Y$ is a proper subset of $X$. As $Y$ is closed and $X$ is compact, it follows that $Y$ is compact. By Theorem 7.5.1 there exists an invariant probability measure $\nu$ for the map $T : Y \to Y$. Extend $\nu$ to $X$ by setting $\nu(B) = \nu(B \cap Y)$ for Borel subsets $B \subset X$. Noting that $X \setminus Y$ is also $T$-invariant, it follows that $\nu$ is an invariant measure for $T : X \to X$. This contradicts unique ergodicity: $\nu(X \setminus Y) = 0$ but, as $X \setminus Y$ is a non-empty open set, $\mu(X \setminus Y) > 0$, so that $\nu \neq \mu$.
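The closed formula for $T^n$ in Solution 8.2, and the decay of the Birkhoff averages of the exponential functions, can be checked numerically. The following is a minimal sketch rather than part of the proof: the function names are illustrative, the rational value of $\alpha$ is used only so that the algebraic identity can be tested in exact arithmetic, and the floating-point run with $\alpha = \sqrt{2}$ merely suggests the convergence that Weyl's Criterion establishes.

```python
from fractions import Fraction
from math import comb, cos, pi, sin, sqrt


def T(point, alpha):
    """One step of T(x, y, z) = (alpha + x, x + y, y + z) mod 1."""
    x, y, z = point
    return ((alpha + x) % 1, (x + y) % 1, (y + z) % 1)


def iterate(point, alpha, n):
    """Apply T to the point n times."""
    for _ in range(n):
        point = T(point, alpha)
    return point


def closed_form(point, alpha, n):
    """The formula for T^n from Solution 8.2, written with binomial coefficients."""
    x, y, z = point
    return (
        (comb(n, 1) * alpha + x) % 1,
        (comb(n, 2) * alpha + comb(n, 1) * x + y) % 1,
        (comb(n, 3) * alpha + comb(n, 2) * x + comb(n, 1) * y + z) % 1,
    )


def birkhoff_average(point, alpha, k, l, m, n):
    """(1/n) * sum over j < n of exp(2 pi i (k x_j + l y_j + m z_j)) along the orbit."""
    total = 0j
    for _ in range(n):
        x, y, z = point
        angle = 2 * pi * (k * x + l * y + m * z)
        total += complex(cos(angle), sin(angle))
        point = T(point, alpha)
    return total / n


if __name__ == "__main__":
    # Exact check of the induction formula (rational alpha chosen only for exactness).
    p = (Fraction(1, 3), Fraction(2, 5), Fraction(1, 7))
    a = Fraction(3, 11)
    print(all(iterate(p, a, n) == closed_form(p, a, n) for n in range(1, 30)))
    # Floating-point illustration with an irrational alpha; the modulus should be small.
    print(abs(birkhoff_average((0.1, 0.2, 0.3), sqrt(2), 0, 0, 1, 20000)))
```

The exact comparison uses `Fraction` so that reduction mod 1 introduces no rounding; the floating-point average accumulates small errors along the orbit and should be read only as an illustration.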
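Solution 9.1

Let $X = \mathbb{R}$ equipped with the Borel $\sigma$-algebra and Lebesgue measure. Define $T(x) = x + 1$. Then Lebesgue measure is $T$-invariant. Take $A = [0, 1)$. Then $A$ has positive measure, but no point of $A$ returns to $A$ under $T$.

Solution 9.2

Take $X = \{0, 1\}$ to be a set consisting of two elements. Let $\mathcal{B}$ be the set of all subsets of $X$ and equip $X$ with the measure $\mu = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$ that assigns measure $1/2$ to both 0 and 1. Take $T(x) = x$ to be the identity. Then $T$ is a measure-preserving transformation. Let $A = \{0\}$, $B = \{1\}$. Then $\mu(A) = \mu(B) = 1/2 > 0$. However, $T^j(0)$ never lands in $B$.

Solution 9.3

Recall that $E(f \mid \mathcal{A})$ is determined as being the unique $\mathcal{A}$-measurable function such that
\[
\int_A E(f \mid \mathcal{A}) \, d\mu = \int_A f \, d\mu
\]
for all $A \in \mathcal{A}$.

(i) We need to show that $E(\alpha f + \beta g \mid \mathcal{A}) = \alpha E(f \mid \mathcal{A}) + \beta E(g \mid \mathcal{A})$. Note that $\alpha E(f \mid \mathcal{A}) + \beta E(g \mid \mathcal{A})$ is $\mathcal{A}$-measurable. Moreover, as
\[
\begin{aligned}
\int_A \alpha E(f \mid \mathcal{A}) + \beta E(g \mid \mathcal{A}) \, d\mu
&= \alpha \int_A E(f \mid \mathcal{A}) \, d\mu + \beta \int_A E(g \mid \mathcal{A}) \, d\mu \\
&= \alpha \int_A f \, d\mu + \beta \int_A g \, d\mu \\
&= \int_A \alpha f + \beta g \, d\mu \\
&= \int_A E(\alpha f + \beta g \mid \mathcal{A}) \, d\mu
\end{aligned}
\]
for all $A \in \mathcal{A}$, the claim follows.

(ii) First note that $E(f \mid \mathcal{A}) \circ T$ is $T^{-1}\mathcal{A}$-measurable. To see this, note that $E(f \mid \mathcal{A})$ is $\mathcal{A}$-measurable, i.e. $\{x \in X \mid E(f \mid \mathcal{A})(x) \leq c\} \in \mathcal{A}$ for all $c \in \mathbb{R}$. Hence
\[
\{x \in X \mid E(f \mid \mathcal{A})(Tx) \leq c\} = T^{-1}\{x \in X \mid E(f \mid \mathcal{A})(x) \leq c\} \in T^{-1}\mathcal{A}
\]
so that $E(f \mid \mathcal{A}) \circ T$ is $T^{-1}\mathcal{A}$-measurable.

Note that for any $A \in \mathcal{A}$
\[
\begin{aligned}
\int_{T^{-1}A} E(f \mid \mathcal{A}) \circ T \, d\mu
&= \int \chi_{T^{-1}A} \, E(f \mid \mathcal{A}) \circ T \, d\mu \\
&= \int \chi_A \circ T \cdot E(f \mid \mathcal{A}) \circ T \, d\mu \\
&= \int \chi_A \, E(f \mid \mathcal{A}) \, d\mu \quad \text{as $\mu$ is $T$-invariant} \\
&= \int_A E(f \mid \mathcal{A}) \, d\mu \\
&= \int_A f \, d\mu.
\end{aligned}
\]
Moreover
\[
\begin{aligned}
\int_{T^{-1}A} E(f \circ T \mid T^{-1}\mathcal{A}) \, d\mu
&= \int_{T^{-1}A} f \circ T \, d\mu \\
&= \int \chi_{T^{-1}A} \, f \circ T \, d\mu \\
&= \int \chi_A \circ T \cdot f \circ T \, d\mu \\
&= \int \chi_A f \, d\mu \\
&= \int_A f \, d\mu.
\end{aligned}
\]
Hence
\[
\int_{T^{-1}A} E(f \circ T \mid T^{-1}\mathcal{A}) \, d\mu = \int_{T^{-1}A} E(f \mid \mathcal{A}) \circ T \, d\mu
\]
for all $A \in \mathcal{A}$. By the characterisation of conditional expectation, it follows that $E(f \circ T \mid T^{-1}\mathcal{A}) = E(f \mid \mathcal{A}) \circ T$.

(iii) That $E(f \mid \mathcal{B}) = f$ is immediate from the above characterisation of conditional expectation.

(iv) Recall that a function $f : X \to \mathbb{R}$ is $\mathcal{A}$-measurable if $f^{-1}(-\infty, c) \in \mathcal{A}$ for all $c \in \mathbb{R}$.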
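Suppose that $f$ is $\mathcal{N}$-measurable. Let $B_c = f^{-1}(-\infty, c) \in \mathcal{N}$. Hence $\mu(B_c) = 0$ or 1. Note that $c_1 < c_2$ implies $B_{c_1} \subset B_{c_2}$. Hence there exists $c_0$ such that
\[
c_0 = \sup\{c \mid \mu(B_c) = 0\} = \inf\{c \mid \mu(B_c) = 1\}.
\]
We claim that $f(x) = c_0$ $\mu$-a.e. If $c < c_0$ then $\mu(\{x \in X \mid f(x) < c\}) = 0$. Hence $f(x) \geq c_0$ $\mu$-a.e. Let $c > c_0$. Then
\[
\mu(\{x \in X \mid f(x) \geq c\}) = \mu(X \setminus \{x \in X \mid f(x) < c\}) = 1 - \mu(\{x \in X \mid f(x) < c\}) = 0.
\]
Hence $\mu(\{x \in X \mid f(x) > c_0\}) = 0$. Hence $f(x) = c_0$ $\mu$-a.e.

Suppose that $f$ is constant almost everywhere, say $f(x) = a$ $\mu$-a.e. Then $f^{-1}(-\infty, c) = \emptyset$ $\mu$-a.e. if $c \leq a$ and $f^{-1}(-\infty, c) = X$ $\mu$-a.e. if $c > a$. Hence $\mu(f^{-1}(-\infty, c)) = 0$ or 1. Hence $f^{-1}(-\infty, c) \in \mathcal{N}$ for all $c \in \mathbb{R}$. Hence $f$ is $\mathcal{N}$-measurable.

If $N \in \mathcal{N}$ has measure 0 then
\[
\int_N f \, d\mu = 0 = \int_N \left( \int f \, d\mu \right) d\mu
\]
and if $N \in \mathcal{N}$ has measure 1 then
\[
\int_N f \, d\mu = \int f \, d\mu = \int_N \left( \int f \, d\mu \right) d\mu.
\]
Hence $E(f \mid \mathcal{N}) = \int f \, d\mu$.

Solution 9.4

(i) Let $\alpha = \{A_1, \ldots, A_n\}$ be a finite partition of $X$ into sets $A_j \in \mathcal{B}$ and let $\mathcal{A}$ be the set of all finite unions of sets in $\alpha$. Trivially $\emptyset \in \mathcal{A}$.

Let $B_j = \bigcup_{i=1}^{\ell_j} A_{i,j}$, $A_{i,j} \in \alpha$, be a countable collection of finite unions of sets in $\alpha$. Then $\bigcup_j B_j$ is a union of sets in $\alpha$. As there are only finitely many sets in $\alpha$, we have that $\bigcup_j B_j$ is a finite union of sets in $\alpha$. Hence $\bigcup_j B_j \in \mathcal{A}$.

It is clear that $\mathcal{A}$ is closed under taking complements. Hence $\mathcal{A}$ is a $\sigma$-algebra.

(ii) Recall that $g : X \to \mathbb{R}$ is $\mathcal{A}$-measurable if $g^{-1}(-\infty, c) \in \mathcal{A}$ for all $c \in \mathbb{R}$.

Suppose that $g$ is constant on each $A_j \in \alpha$ and write $g(x) = \sum_j c_j \chi_{A_j}(x)$. Then $g^{-1}(-\infty, c) = \bigcup A_j$ where the union is taken over sets $A_j$ for which $c_j < c$. Hence $g$ is $\mathcal{A}$-measurable.

Conversely, suppose that $g$ is $\mathcal{A}$-measurable. For each $c \in \mathbb{R}$, let $A_c = g^{-1}(-\infty, c)$. Then $A_c \in \mathcal{A}$. Moreover, $A_c \downarrow \emptyset$ as $c \to -\infty$ (in the sense that $\bigcap_{c \in \mathbb{R}} A_c = \emptyset$) and $A_c \uparrow X$ as $c \to \infty$ (in the sense that $\bigcup_c A_c = X$). Let $A \in \alpha$. Then there exists $c_0$ such that $A \not\subset A_c$ for $c < c_0$ and $A \subset A_c$ for $c > c_0$. As each $A_c$ is a union of elements of the partition $\alpha$, $A \not\subset A_c$ means that $A \cap A_c = \emptyset$. Hence $g(x) = c_0$ for all $x \in A$. Hence $g$ is constant on each element of $\alpha$.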
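(iii) Define $g$ by
\[
g(x) = \sum_{j=1}^{n} \chi_{A_j}(x) \, \frac{\int_{A_j} f \, d\mu}{\mu(A_j)}.
\]
Then $g$ is constant on each set in $\alpha$, hence $g$ is $\mathcal{A}$-measurable. Let $A_i \in \alpha$. Then
\[
\int_{A_i} g \, d\mu = \sum_{j=1}^{n} \int \chi_{A_i} \chi_{A_j} \, \frac{\int_{A_j} f \, d\mu}{\mu(A_j)} \, d\mu = \sum_{j=1}^{n} \int \chi_{A_i \cap A_j} \, \frac{\int_{A_j} f \, d\mu}{\mu(A_j)} \, d\mu = \int_{A_i} f \, d\mu.
\]
Hence $\int_A g \, d\mu = \int_A f \, d\mu$ for all $A \in \mathcal{A}$. It follows that $g = E(f \mid \mathcal{A})$.

Solution 9.5

Clearly $\emptyset \in \mathcal{I}$.

Let $I \in \mathcal{I}$, so that $T^{-1}(I) = I$. Then $T^{-1}(X \setminus I) = X \setminus I$, so that the complement of $I$ is in $\mathcal{I}$.

Let $I_n \in \mathcal{I}$. Then $T^{-1}\left(\bigcup_n I_n\right) = \bigcup_n T^{-1} I_n = \bigcup_n I_n$ so that $\bigcup_n I_n \in \mathcal{I}$.

Hence $\mathcal{I}$ is a $\sigma$-algebra.

Solution 9.6

Recall that $E(f \mid \mathcal{I})$ is determined by the requirements that $E(f \mid \mathcal{I})$ is $\mathcal{I}$-measurable and that
\[
\int_I E(f \mid \mathcal{I}) \, d\mu = \int_I f \, d\mu
\]
for all $I \in \mathcal{I}$. Let $P_{\mathcal{I}} : L^2(X, \mathcal{B}, \mu) \to L^2(X, \mathcal{I}, \mu)$ denote the orthogonal projection onto the subspace of $\mathcal{I}$-measurable functions. To show that $P_{\mathcal{I}} f = E(f \mid \mathcal{I})$ it is thus sufficient to check that for each $I \in \mathcal{I}$ we have
\[
\int_I P_{\mathcal{I}} f \, d\mu = \int_I f \, d\mu
\]
for all $f \in L^2(X, \mathcal{B}, \mu)$.

Note that $\int_I P_{\mathcal{I}} f \, d\mu = \int \chi_I P_{\mathcal{I}} f \, d\mu = \langle \chi_I, P_{\mathcal{I}} f \rangle$ and, similarly, $\int_I f \, d\mu = \langle \chi_I, f \rangle$, where we use $\langle \cdot, \cdot \rangle$ to denote the inner product on $L^2(X, \mathcal{B}, \mu)$. Hence it is sufficient to prove that, for all $I \in \mathcal{I}$, $\langle \chi_I, f - P_{\mathcal{I}} f \rangle = 0$.

It is proved in the proof of Theorem 9.6.1 that $L^2(X, \mathcal{B}, \mu) = L^2(X, \mathcal{I}, \mu) \oplus C$ where $C$ denotes the norm-closure of the subspace $\{w \circ T - w \mid w \in L^2(X, \mathcal{B}, \mu)\}$. Hence it is sufficient to prove that $\langle \chi_I, g \rangle = 0$ for all $g \in C$. To see this, first note that for $w \in L^2(X, \mathcal{B}, \mu)$ we have that
\[
\begin{aligned}
\langle \chi_I, w \circ T - w \rangle
&= \langle \chi_I, w \circ T \rangle - \langle \chi_I, w \rangle \\
&= \langle \chi_{T^{-1}I}, w \circ T \rangle - \langle \chi_I, w \rangle \\
&= \langle \chi_I \circ T, w \circ T \rangle - \langle \chi_I, w \rangle \\
&= 0,
\end{aligned}
\]
using the facts that $I = T^{-1}I$ a.e. and that $T$ is measure-preserving. It follows that $\langle \chi_I, g \rangle = 0$ for all $g \in C$.

Solution 10.1

Let $T$ be an ergodic measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$ and let $f \in L^1(X, \mathcal{B}, \mu)$. Let
\[
S_n = \sum_{j=0}^{n-1} f(T^j x).
\]
By Birkhoff's Ergodic Theorem, there exists a set $N$ such that $\mu(N) = 0$ and if $x \notin N$ then $S_n/n \to \int f \, d\mu$ as $n \to \infty$.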
Let $x \notin N$. Note that
\[
\frac{n+1}{n} \cdot \frac{S_{n+1}}{n+1} = \frac{S_n}{n} + \frac{f(T^n x)}{n}.
\]
Letting $n \to \infty$ we have that $(n+1)/n \to 1$, $\frac{1}{n+1} S_{n+1} \to \int f \, d\mu$ and $\frac{1}{n} S_n \to \int f \, d\mu$. Hence if $x \notin N$ then $f(T^n x)/n \to 0$ as $n \to \infty$. Hence $f(T^n x)/n \to 0$ as $n \to \infty$ for $\mu$-a.e. $x \in X$.

Solution 10.2

Let $f \geq 0$ be measurable and suppose that $\int f \, d\mu = \infty$. For each integer $M > 0$ define $f_M(x) = \min\{f(x), M\}$. Then $0 \leq f_M \leq M$, hence $f_M \in L^1(X, \mathcal{B}, \mu)$. Moreover $f_M(x) \uparrow f(x)$ as $M \to \infty$ for all $x \in X$. Hence by the Monotone Convergence Theorem (Theorem 3.1.2), $\int f_M \, d\mu \to \int f \, d\mu = \infty$.

By Birkhoff's Ergodic Theorem, there exists $N_M \subset X$ with $\mu(N_M) = 0$ such that for all $x \notin N_M$ we have
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f_M(T^j x) = \int f_M \, d\mu. \tag{12.0.8}
\]
Let $N = \bigcup_{M=1}^{\infty} N_M$. Then $\mu(N) = 0$. Moreover, for any $M > 0$ we have that if $x \notin N$ then (12.0.8) holds.

Let $K \geq 0$ be arbitrary. As $\int f_M \, d\mu \to \infty$, it follows that there exists $M > 0$ such that $\int f_M \, d\mu \geq K$. Hence for all $x \notin N$ we have
\[
\liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \geq \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f_M(T^j x) = \int f_M \, d\mu \geq K.
\]
As $K$ is arbitrary, we have that for all $x \notin N$
\[
\liminf_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \infty.
\]
Hence $\frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \to \infty$ for $\mu$-a.e. $x \in X$.

Solution 10.3

We prove that (i) implies (ii). Suppose that $T$ is an ergodic measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. Recall from Proposition 10.2.2 that for all $A, B \in \mathcal{B}$,
\[
\frac{1}{n} \sum_{j=0}^{n-1} \mu(T^{-j} A \cap B) \to \mu(A)\mu(B)
\]
as $n \to \infty$; equivalently, for all $A, B \in \mathcal{B}$ we have
\[
\frac{1}{n} \sum_{j=0}^{n-1} \int \chi_A(T^j x) \chi_B(x) \, d\mu \to \int \chi_A \, d\mu \int \chi_B \, d\mu. \tag{12.0.9}
\]
Let $f(x) = \sum_{k=1}^{r} c_k \chi_{A_k}(x)$ be a simple function. Then taking linear combinations of expressions of the form (12.0.9) we have that
\[
\frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) \chi_B(x) \, d\mu \to \int f \, d\mu \int \chi_B \, d\mu.
\]
If $f \geq 0$ is a positive measurable function then we can choose a sequence of simple functions $f_n \uparrow f$ that increase pointwise to $f$. By the Monotone Convergence Theorem (Theorem 3.1.2) we have that
\[
\frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) \chi_B(x) \, d\mu \to \int f \, d\mu \int \chi_B \, d\mu \tag{12.0.10}
\]
for all positive measurable functions $f$. Suppose that $f \in L^1(X, \mathcal{B}, \mu)$ is real-valued. Then by writing $f = f^+ - f^-$ where $f^+, f^-$ are positive, we have that (12.0.10) holds when $f$ is integrable and real-valued. By taking real and imaginary parts of $f$, we have that (12.0.10) holds for all $f \in L^1(X, \mathcal{B}, \mu)$.

By taking finite linear combinations of characteristic functions in (12.0.10) we have that
\[
\frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) g(x) \, d\mu \to \int f \, d\mu \int g \, d\mu \tag{12.0.11}
\]
for all simple functions $g$. By taking an increasing sequence of simple functions and applying the Monotone Convergence Theorem as above, we have that (12.0.11) holds for all positive measurable functions $g$. By writing $g = g^+ - g^-$ where $g^+, g^-$ are positive, we have that (12.0.11) holds for any real-valued integrable function $g$. By taking real and imaginary parts, we have that (12.0.11) holds for any $g \in L^1(X, \mathcal{B}, \mu)$.

We prove that (ii) implies (i). Suppose that for all $f, g \in L^2(X, \mathcal{B}, \mu)$ we have that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) g(x) \, d\mu = \int f \, d\mu \int g \, d\mu.
\]
Suppose that $T^{-1}B = B$, $B \in \mathcal{B}$. Then $\chi_B \in L^2(X, \mathcal{B}, \mu)$. Taking $f = g = \chi_B$ we have that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \int \chi_B(T^j x) \chi_B(x) \, d\mu = \int \chi_B \, d\mu \int \chi_B \, d\mu = \mu(B)^2.
\]
Note that $\chi_B(T^j x) \chi_B(x) = \chi_{T^{-j}B \cap B}(x) = \chi_B(x)$ as $T^{-j}B = B$. Hence
\[
\frac{1}{n} \sum_{j=0}^{n-1} \int \chi_B(T^j x) \chi_B(x) \, d\mu = \frac{1}{n} \sum_{j=0}^{n-1} \int \chi_B \, d\mu = \int \chi_B \, d\mu = \mu(B).
\]
Hence $\mu(B) = \mu(B)^2$ so that $\mu(B) = 0$ or 1.

Solution 10.4

Choose a countable dense set of continuous functions $\{f_i\}_{i=1}^{\infty} \subset C(X, \mathbb{R})$. By Birkhoff's Ergodic Theorem there exists $Y_i \in \mathcal{B}$ such that $\mu(Y_i) = 1$ and
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f_i(T^j x) = \int f_i \, d\mu
\]
for all $x \in Y_i$. Let $Y = \bigcap_{i=1}^{\infty} Y_i$. Then $Y \in \mathcal{B}$ and $\mu(Y) = 1$.

Let $f \in C(X, \mathbb{R})$, $\varepsilon > 0$, $x \in Y$. Choose $i$ such that $\|f - f_i\|_\infty < \varepsilon$. Choose $N$ such that if $n \geq N$ then
\[
\left| \frac{1}{n} \sum_{j=0}^{n-1} f_i(T^j x) - \int f_i \, d\mu \right| < \varepsilon.
\]
Then, for $n \geq N$,
\[
\begin{aligned}
\left| \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) - \int f \, d\mu \right|
&\leq \left| \frac{1}{n} \sum_{j=0}^{n-1} \big( f(T^j x) - f_i(T^j x) \big) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} f_i(T^j x) - \int f_i \, d\mu \right| + \left| \int f_i \, d\mu - \int f \, d\mu \right| \\
&< 3\varepsilon.
\end{aligned}
\]
As $\varepsilon > 0$ is arbitrary, we have that for all $f \in C(X, \mathbb{R})$ and for all $x \in Y$,
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \int f \, d\mu.
\]

Solution 10.5

Let $S = \{A, B, C, \ldots, Z\}$ denote the finite set of letters (symbols) in the alphabet. Let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in S,\ j = 0, 1, 2, \ldots\}$ denote the space of all infinite sequences of symbols. For each $s \in S$, let $p(s) = 1/26$ denote the probability of choosing symbol $s$. Let $\mathcal{B}$ denote the Borel $\sigma$-algebra on $\Sigma$ and equip $\Sigma$ with the Bernoulli probability measure $\mu$ defined on cylinders by
\[
\mu([i_0, i_1, \ldots, i_{n-1}]) = p(i_0) p(i_1) \cdots p(i_{n-1}).
\]
Define $\sigma : \Sigma \to \Sigma$ by $(\sigma(x))_j = x_{j+1}$. We regard an element $x \in \Sigma$ as one possible outcome of the monkey typing an infinite sequence of letters.

Let $B$ denote the cylinder $[M, O, N, K, E, Y]$. Then $\mu(B) = 1/26^6 > 0$. By Birkhoff's Ergodic Theorem, for $\mu$-a.e. $x \in \Sigma$,
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_B(\sigma^j(x)) = \mu(B) > 0.
\]
Hence, almost surely, the infinite sequence of letters $x$ will contain 'MONKEY'. Hence, with probability 1, the monkey will type the word 'MONKEY'. (Indeed, with probability one he will type 'MONKEY' infinitely often.)

By Kac's Lemma, the expected first time at which 'MONKEY' appears is $1/\mu(B) = 26^6$. If the monkey types 1 letter a second, then one would expect to wait $26^6$ seconds (about 9.8 years) until 'MONKEY' first appears in a block of 6.

Solution 11.1

We first claim that for each integer $b \geq 2$, $T(x) = T_b(x) = bx \bmod 1$ is ergodic with respect to Lebesgue measure $\mu$ (we already know that Lebesgue measure is invariant by Exercise 3.6). To see this, we use Fourier series, following the argument that was used to prove that the doubling map is ergodic with respect to Lebesgue measure.

Suppose that $f \in L^2(\mathbb{R}/\mathbb{Z}, \mathcal{B}, \mu)$ is such that $f \circ T = f$ $\mu$-a.e. Then $f \circ T^p = f$ $\mu$-a.e. for all $p \in \mathbb{N}$. Associate to $f$ its Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}$. Then $f \circ T^p$ has Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n b^p x}$. Comparing Fourier coefficients we see that $c_{b^p n} = c_n$. Suppose that $n \neq 0$. Then $|b^p n| \to \infty$ as $p \to \infty$.
By the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)), $c_n = c_{b^p n} \to 0$ as $p \to \infty$. Hence $c_n = 0$ if $n \neq 0$. Hence $f$ has Fourier series $c_0$, i.e. $f$ is constant a.e. Hence $T$ is ergodic with respect to Lebesgue measure.

Solution 11.2

(i) Let
\[
X_b = \{x \in [0, 1) \mid x \text{ is simply normal in base } b\}.
\]
Then for each $b \geq 2$, $X_b$ has Lebesgue measure $\mu(X_b) = 1$. Hence
\[
X_\infty = \bigcap_{b=2}^{\infty} X_b
\]
consists of all numbers that are simply normal in every base $b \geq 2$. Clearly $\mu(X_\infty) = 1$.

(ii) Let $X(b)$ denote the set of numbers that are normal in base $b$, $b \geq 2$. Then $X_\infty = \bigcap_{b=2}^{\infty} X(b)$ consists of all normal numbers. Clearly $\mu(X_\infty) = 1$.

Alternatively, note that $x \in [0, 1]$ is simply normal in base $b^k$ if and only if every word of length $k$ occurs with frequency $1/b^k$ in the base $b$ expansion of $x$. Hence a number is normal in every base if and only if it is simply normal in every base.

Solution 11.3

Let $T(x) = rx \bmod 1$, $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$. From Exercise 11.1 we know that $T$ is ergodic with respect to Lebesgue measure $\mu$.

Let $x \in [0, 1]$ and let $x_n = r^n x$. Then $\{x_n\}$, the fractional part of $x_n$, is equal to $T^n x$. Let $[a, b] \subset [0, 1]$.

Let $\ell \in \mathbb{Z} \setminus \{0\}$ and let $f_\ell(x) = e^{2\pi i \ell x}$. Then there exists $N_\ell \in \mathcal{B}$, $\mu(N_\ell) = 0$, such that
\[
\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \frac{1}{n} \sum_{j=0}^{n-1} f_\ell(T^j x) \to \int f_\ell(x) \, dx = 0
\]
for all $x \notin N_\ell$.

Let $N = \bigcup_{\ell \in \mathbb{Z} \setminus \{0\}} N_\ell$. As $\mu(N_\ell) = 0$ and this is a countable union, we have that $\mu(N) = 0$. Hence if $x \notin N$ we have for all $\ell \in \mathbb{Z} \setminus \{0\}$
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0.
\]
By Weyl's Criterion it follows that if $x \notin N$ then $x_n$ is uniformly distributed mod 1.

(Aside: you might wonder why we had to use Weyl's Criterion and did not just use the definition of uniform distribution. Whilst it is certainly true that
\[
\frac{1}{n} \operatorname{card}\{j \in \{0, 1, \ldots, n-1\} \mid \{x_j\} \in [a, b]\} = \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j x) \to \int \chi_{[a,b]} \, d\mu = b - a
\]
for $\mu$-a.e. $x \in X$, the set of measure zero for which this fails depends on the interval $[a, b]$. We need a set of measure zero that works for all intervals.
As there are uncountably many intervals, we cannot just take the union of all the sets of measure zero as we did above. One can make an argument along these lines work, by considering intervals with rational endpoints (and so a countable collection of intervals) and then approximating an arbitrary interval.)

Solution 11.4

Let $T(x) = 10x \bmod 1$. From Exercise 11.1 we know that $T$ is ergodic with respect to Lebesgue measure. Let $x \in [0, 1]$ have decimal expansion
\[
x = \sum_{j=0}^{\infty} \frac{x_j}{10^{j+1}}
\]
with $x_j \in \{0, 1, \ldots, 9\}$. Let
\[
f(x) = \sum_{k=0}^{9} k \, \chi_{[k/10, (k+1)/10)}(x)
\]
so that $f(x) = k$ precisely when $x_0 = k$. Note that $f(T^j x) = k$ precisely when $x_j = k$. Then
\[
\frac{1}{n} (x_0 + x_1 + \cdots + x_{n-1}) = \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x).
\]
Hence by Birkhoff's Ergodic Theorem,
\[
\begin{aligned}
\lim_{n \to \infty} \frac{1}{n} (x_0 + x_1 + \cdots + x_{n-1})
&= \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \\
&= \int f(x) \, dx \quad \text{a.e.} \\
&= \sum_{k=0}^{9} \frac{k}{10} \quad \text{a.e.} \\
&= 4.5 \quad \text{a.e.}
\end{aligned}
\]

Solution 11.5

If $x \in (0, 1)$ then write the continued fraction expansion of $x$ as $[x_0, x_1, \ldots]$. Define $f : (0, 1) \to \mathbb{R}$ by
\[
f(x) = \sum_{k=1}^{\infty} k \, \chi_{(1/(k+1), 1/k]}(x).
\]
Then $f(x) = k$ precisely when $1/(k+1) < x \leq 1/k$, i.e. $f(x) = k$ precisely when $x_0 = k$. Hence $f(T^j x) = k$ precisely when $x_j = k$. We can write
\[
\frac{1}{n} (x_0 + \cdots + x_{n-1}) = \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x).
\]
Clearly $f \geq 0$ and is measurable. However $f \notin L^1(X, \mathcal{B}, \mu)$. To see this, using Exercise 3.5(iii), it is sufficient to show that $f \notin L^1(X, \mathcal{B}, \lambda)$ where $\lambda$ denotes Lebesgue measure. Note that
\[
\int f \, d\lambda = \sum_{k=1}^{\infty} k \, \lambda\left( \left( \frac{1}{k+1}, \frac{1}{k} \right] \right) = \sum_{k=1}^{\infty} \frac{1}{k+1} = \infty.
\]
By the results of Exercise 10.2, it follows that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \infty
\]
for $\mu$-a.e. $x \in X$. As Gauss' measure and Lebesgue measure have the same sets of measure zero, we have that
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \infty
\]
for Lebesgue almost every point $x \in X$.
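The digit average in Solution 11.4 can be illustrated with exact integer arithmetic. The sketch below follows the orbit of a decimal truncation of $\sqrt{2} - 1$ under $T(x) = 10x \bmod 1$ and averages $f(T^j x)$, which is the mean of the first decimal digits. The choice of $\sqrt{2} - 1$ and the function names are assumptions made for illustration only: Birkhoff's Ergodic Theorem gives the limit 4.5 for Lebesgue-almost every $x$, and it is not known whether $\sqrt{2}$ belongs to that full-measure set, so the output is empirical evidence rather than a consequence of the theorem.

```python
from math import isqrt


def sqrt2_minus_1_numerator(n_digits):
    """Return N such that N / 10**n_digits is sqrt(2) - 1 truncated to n_digits decimals."""
    scale = 10 ** n_digits
    return isqrt(2 * scale * scale) - scale


def digit_average(n_digits):
    """Average of f(T^j x) for j < n_digits, where T(x) = 10x mod 1 and f(x) = floor(10x).

    The point x = num / scale is held as an exact integer numerator over a fixed
    power of ten, so no floating-point error enters the orbit.
    """
    scale = 10 ** n_digits
    num = sqrt2_minus_1_numerator(n_digits)
    total = 0
    for _ in range(n_digits):
        digit, num = divmod(10 * num, scale)  # digit = f(x), new num represents T(x)
        total += digit
    return total / n_digits


if __name__ == "__main__":
    print(digit_average(6))       # digits 4, 1, 4, 2, 1, 3 of 0.414213...
    print(digit_average(10000))   # empirically close to 4.5
```

Holding the numerator as an integer avoids the loss of one decimal digit of precision per iteration that a floating-point implementation of $10x \bmod 1$ would suffer.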