Ergodic theory lecture notes

MATH41112/61112
Ergodic Theory
Charles Walkden
6th February, 2015
Contents
0 Preliminaries
1 An introduction to ergodic theory. Uniform distribution of real sequences
2 More on uniform distribution mod 1. Measure spaces
3 Lebesgue integration. Invariant measures
4 More examples of invariant measures
5 Ergodic measures: definition, criteria, and basic examples
6 Ergodic measures: Using the Hahn-Kolmogorov Extension Theorem to prove ergodicity
7 Continuous transformations on compact metric spaces
8 Ergodic measures for continuous transformations
9 Recurrence
10 Birkhoff's Ergodic Theorem
11 Applications of Birkhoff's Ergodic Theorem
12 Solutions to the Exercises
0. Preliminaries
§0.1
Contact details
The lecturer is Dr Charles Walkden, Room 2.241, Tel: 0161 275 5805,
Email: [email protected].
My office hour is: Thursday, 11:30am-12:30pm. If you want to see me at another time
then please email me first to arrange a mutually convenient time.
§0.2
Course structure
This is a reading course, supported by one lecture per week. I have split the notes into
weekly sections. You are expected to have read through the material before the lecture,
and then go over it again afterwards in your own time. In the lectures I will highlight
the most important parts, explain the statements of the theorems and what they mean in
practice, and point out common misunderstandings. As a general rule, I will not normally
go through the proofs in great detail (but they are examinable unless indicated otherwise).
You will be expected to work through the proofs yourself in your own time. All the material
in the notes is examinable, unless it says otherwise. (Note that if a proof is marked ‘not
examinable’ then it means that I won’t expect you to reproduce the proof, but you will
be expected to know and understand the statement of the result. If an entire section is
marked ‘not examinable’ (for example, the review of Riemann integration in §1.3, and
the discussions on the proofs of von Neumann’s Ergodic Theorem and Birkhoff’s Ergodic
Theorem in §§9.6, 10.5, respectively) then you don’t need to know the statements of any
subsidiary lemmas/propositions in those sections that are not used elsewhere but reading
this material may help your understanding.)
Each section of the notes contains exercises. The exercises are a key part of the course
and you are expected to attempt them. The solutions to the exercises are contained in the
notes; I would strongly recommend attempting the exercises first without referring to the
solutions.
Please point out any mistakes (typographical or mathematical) in the notes.
§0.3
The exam
The exam is a 3 hour written exam. There are several past exam papers on the course
website. Note that some topics (for example, entropy) were covered in previous years and
are not covered this year; there are also some new topics (for example, Kac’s Lemma) that
were not covered in 2010 or earlier.
The format of the exam is the same as last year’s. The exam has 5 questions, of which
you must do 4. If you attempt all 5 questions then you will get credit for your best 4
answers. The style of the questions is similar to last year’s exam as well as to ‘Section B’
questions from earlier years.
Each question is worth 30 marks. Thus the total number of marks available on the
exam is 4 × 30 = 120. This will then be converted to a mark out of 100 (by multiplying by
100/120).
There is no coursework, in-class test or mid-term for this course.
§0.4
Recommended texts
There are several suitable introductory texts on ergodic theory, including
W. Parry, Topics in Ergodic Theory
P. Walters, An Introduction to Ergodic Theory
I.P. Cornfeld, S.V. Fomin and Ya.G. Sinai, Ergodic Theory
K. Petersen, Ergodic Theory
M. Einsiedler and T. Ward, Ergodic Theory: With a View Towards Number Theory.
Parry's or Walters' books are the most suitable for this course.
1. An introduction to ergodic theory. Uniform distribution
of real sequences
§1.1
Introduction
A dynamical system consists of a space X, often called a phase space, and a rule that
determines how points in X evolve in time. Time can be either continuous (in which case
a dynamical system is given by a first order autonomous differential equation) or discrete
(in which case we are studying the iterates of a single map T : X → X).
We will only consider the case of discrete time in this course. Thus we will be studying
the iterates of a single map T : X → X. We will write T n = T ◦ · · · ◦ T (n times) to denote
the nth composition of T . If x ∈ X then we can think of T n (x), the result of applying the
map T n times to the point x, as being where x has moved to after time n.
We call the sequence x, T (x), T 2 (x), . . . , T n (x), . . . the orbit of x. If T is invertible (and
so we can iterate backwards by repeatedly applying T −1 ) then sometimes we refer to the
doubly-infinite sequence . . . , T −n (x), . . . , T −1 (x), x, T (x), . . . , T n (x), . . . as the orbit of x
and the sequence x, T (x), . . . , T n (x), . . . as the forward orbit of x.
As an example, consider the map T : [0, 1] → [0, 1] defined by
T(x) = 2x if 0 ≤ x ≤ 1/2,  2x − 1 if 1/2 < x ≤ 1.
We call this the doubling map.
Some orbits for the doubling map are periodic, i.e. they return to where they started
after a finite number of iterations. For example, 2/5 is periodic as
T (2/5) = 4/5, T (4/5) = 3/5, T (3/5) = 1/5, T (1/5) = 2/5.
Thus T 4 (2/5) = 2/5. We say that 2/5 has period 4.
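The orbit computation above is easy to replicate. The following Python sketch (my own, not part of the notes) iterates the doubling map using exact rational arithmetic; floating-point arithmetic would be a poor choice here, since doubling mod 1 is a binary shift and a float orbit degenerates after about 53 iterations.

```python
from fractions import Fraction

def doubling(x):
    # T(x) = 2x if 0 <= x <= 1/2, and 2x - 1 if 1/2 < x <= 1.
    y = 2 * x
    return y if y <= 1 else y - 1

x = Fraction(2, 5)
orbit = [x]
for _ in range(4):
    x = doubling(x)
    orbit.append(x)

print(orbit)  # [Fraction(2, 5), Fraction(4, 5), Fraction(3, 5), Fraction(1, 5), Fraction(2, 5)]
```

The orbit returns to 2/5 after four steps, confirming the period-4 orbit computed above.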
In general, for a general dynamical system T : X → X, a point x ∈ X is a periodic
point with period n > 0 if T n (x) = x. (Note that we do not assume that n is least.) If x is
a periodic point of period n then we call {x, T (x), . . . , T n−1 (x)} a periodic orbit of period
n.
Other points for the doubling map may have a dense orbit in [0,1]. Recall that a set Y
is dense in [0, 1] if any point in [0, 1] can be arbitrarily well approximated by a point in Y .
Thus the orbit of x is dense in [0, 1] if: for all x′ ∈ [0, 1] and for all ε > 0 there exists n > 0
such that |T n (x) − x′ | < ε.
Consider a subinterval [a, b] ⊂ [0, 1]. How frequently does an orbit of a point under the
doubling map visit the interval [a, b]? Define the characteristic function χB of a set B by
χ_B(x) = 1 if x ∈ B,  0 if x ∉ B.
Then
Σ_{j=0}^{n−1} χ_{[a,b]}(T^j(x))
denotes the number of the first n points in the orbit of x that lie in [a, b]. Hence
(1/n) Σ_{j=0}^{n−1} χ_{[a,b]}(T^j(x))
denotes the proportion of the first n points in the orbit of x that lie in [a, b]. Hence
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_{[a,b]}(T^j(x))
denotes the frequency with which the orbit of x lies in [a, b]. In ergodic theory, one wants to
understand when this is equal to the ‘size’ of the interval [a, b] (we will make ‘size’ precise
later by using measure theory; for the moment, ‘size’=‘length’). That is, when does
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_{[a,b]}(T^j(x)) = b − a        (1.1.1)
for every interval [a, b]? Note that if x satisfies (1.1.1) then the proportion of time that the
orbit of x spends in an interval [a, b] is equal to the length of that interval; i.e. the orbit of
x is equidistributed in [0, 1] and does not favour one region of [0, 1] over another.
In general, one cannot expect (1.1.1) to hold for every point x; indeed, if x is periodic
then (1.1.1) does not hold. Even if the orbit of x is dense, then (1.1.1) may not hold.
However, one might expect (1.1.1) to hold for ‘typical’ points x ∈ X (where again we can
make ‘typical’ precise using measure theory). One might also want to replace the function
χ[a,b] with an arbitrary function f : X → R. In this case one would want to ask: for the
doubling map T , when is it the case that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j(x)) = ∫_0^1 f(x) dx?
The goal of the course is to understand the statement of, prove, and explain how to apply
the following result.
Theorem 1.1.1 (Birkhoff's Ergodic Theorem)
Let (X, B, µ) be a probability space. Let f ∈ L1 (X, B, µ) be an integrable function. Suppose
that T : X → X is an ergodic measure-preserving transformation of X. Then
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j(x)) = ∫ f dµ
for µ-a.e. point x ∈ X.
Ergodic theory has many applications to other areas of mathematics, notably hyperbolic
geometry, number theory, fractal geometry, and mathematical physics. We shall see some
of the (simpler) applications to number theory throughout the course.
§1.2
Uniform distribution mod 1
Let T : X → X be a dynamical system. In ergodic theory we are interested in the long-term
distributional behaviour of the sequence of points x, T (x), T 2 (x), . . .. Before studying this
problem, we consider an analogous problem in the context of sequences of real numbers.
Let xn ∈ R be a sequence of real numbers. We may decompose xn as the sum of its
integer part [xn ] = sup{m ∈ Z | m ≤ xn } (i.e. the largest integer which is less than or equal
to xn ) and its fractional part {xn } = xn − [xn ]. Clearly, 0 ≤ {xn } < 1. The study of
xn mod 1 is the study of the sequence {xn } in [0, 1].
Definition. We say that the sequence xn is uniformly distributed mod 1 (udm1 for short)
if for every a, b with 0 ≤ a < b ≤ 1, we have that
lim_{n→∞} (1/n) card{0 ≤ j ≤ n − 1 | {x_j} ∈ [a, b]} = b − a.
Remarks.
(i) Here, card denotes the cardinality of a set.
(ii) Thus xn is uniformly distributed mod 1 if, given any interval [a, b] ⊂ [0, 1], the
frequency with which the fractional parts of xn lie in the interval [a, b] is equal to its
length, b − a.
(iii) We can replace [a, b] by [a, b), (a, b] or (a, b) without altering the definition.
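The definition can be checked empirically. The sketch below is my own (the sequence n√2 and the helper name are illustrative choices, not from the notes); it estimates the frequency with which the fractional parts of a sequence land in an interval [a, b].

```python
import math

def frequency(xs, a, b):
    # Proportion of the terms whose fractional parts lie in [a, b].
    fracs = [x - math.floor(x) for x in xs]
    return sum(a <= f <= b for f in fracs) / len(fracs)

xs = [n * math.sqrt(2) for n in range(100_000)]
print(frequency(xs, 0.25, 0.75))  # close to b - a = 0.5
```

For a uniformly distributed sequence the printed value should approach b − a as the number of terms grows, consistent with the definition.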
The following result gives a necessary and sufficient condition for the sequence xn ∈ R
to be uniformly distributed mod 1.
Theorem 1.2.1 (Weyl’s Criterion)
The following are equivalent:
(i) the sequence xn ∈ R is uniformly distributed mod 1;
(ii) for every continuous function f : [0, 1] → R with f (0) = f (1) we have
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f({x_j}) = ∫_0^1 f(x) dx;        (1.2.1)
(iii) for each ℓ ∈ Z \ {0} we have
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} = 0.
Remarks.
(i) As a grammatical point, criterion is singular (the plural is criteria). Weyl’s criterion
is that (i) and (iii) are equivalent. Statement (ii) has been included because it is an
important intermediate step in the proof and, as we shall see, it closely resembles an
ergodic theorem.
(ii) One can replace the hypothesis in (1.2.1) that f is continuous with the hypothesis that f is Riemann
integrable.
(iii) To prove that (i) is equivalent to (iii) we work, in fact, not on the unit interval [0, 1] but
on the unit circle R/Z. To form R/Z, we work with real numbers modulo the integers
(informally: we ignore integer parts). Note that ignoring integer parts means that
0 and 1 in [0, 1] are ‘the same’. Thus the end-points of the unit interval ‘join up’ and
we see that R/Z is a circle. More formally, R is an additive group, Z is a subgroup
and the quotient group R/Z is, topologically, a circle. Note that the requirement in
(ii) that f (0) = f (1) means that f : [0, 1] → R is a well-defined function on the circle
R/Z.
It is, however, the case that (i) is equivalent to (ii) without the hypothesis in (ii) that
f (0) = f (1).
§1.2.1
The sequence xn = αn
The behaviour of the sequence xn = αn depends on whether α is rational or irrational.
If α ∈ Q then it is easy to see that {αn} can take on only finitely many values in [0, 1].
Indeed, if α = p/q (p ∈ Z, q ∈ Z, q ≠ 0, hcf(p, q) = 1) then {αn} takes the q values
0 = 0/q, {p/q}, {2p/q}, . . . , {(q − 1)p/q},
as {qp/q} = 0. In particular αn is not uniformly distributed mod 1.
If α ∉ Q then the situation is completely different. We shall show that αn is uniformly
distributed mod 1 by applying Weyl's Criterion. Let ℓ ∈ Z \ {0}. As α ∉ Q we have that
ℓα is never an integer; hence e^{2πiℓα} ≠ 1. Note that
(1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} = (1/n) Σ_{j=0}^{n−1} e^{2πiℓαj} = (1/n) · (e^{2πiℓαn} − 1)/(e^{2πiℓα} − 1)
by summing the geometric progression. Hence
|(1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j}| = (1/n) · |e^{2πiℓαn} − 1| / |e^{2πiℓα} − 1| ≤ (1/n) · 2 / |e^{2πiℓα} − 1|.        (1.2.2)
As α ∉ Q, the denominator on the right-hand side of (1.2.2) is not 0. Letting n → ∞ we
see that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} = 0.
Hence xn is uniformly distributed mod 1.
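The decay just established can be observed numerically. In this sketch (my own; the notes contain no code) the averaged exponential sums for α = √2 are compared with the bound 2/(n |e^{2πiℓα} − 1|) from (1.2.2).

```python
import cmath
import math

def weyl_average(alpha, ell, n):
    # |(1/n) * sum_{j=0}^{n-1} e^{2 pi i * ell * alpha * j}|
    total = sum(cmath.exp(2j * math.pi * ell * alpha * j) for j in range(n))
    return abs(total) / n

alpha = math.sqrt(2)
for n in (100, 1_000, 10_000):
    bound = 2 / (n * abs(cmath.exp(2j * math.pi * alpha) - 1))
    print(n, weyl_average(alpha, 1, n), bound)
```

Each printed average stays below the corresponding bound, which shrinks like 1/n.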
Remarks.
1. More generally, we could consider the sequence xn = αn + β. It is easy to see by
modifying the above argument that xn is uniformly distributed mod 1 if and only if
α is irrational. (See Exercise 1.2.)
2. Fix α > 1 and consider the sequence xn = α^n x. Then it is possible to show that for
(Lebesgue) almost every x ∈ R, the sequence xn is uniformly distributed mod 1. We
will prove this, at least for the cases when α = 2, 3, 4, . . ..
3. Suppose we set x = 1 in the above remark and consider the sequence xn = α^n. Then
one can show that xn is uniformly distributed mod 1 for almost every α > 1. However,
not a single example of such an α is known! Indeed, it is not even known if (3/2)^n is
dense mod 1.
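One can at least experiment with (3/2)^n using exact rational arithmetic (a sketch of mine, not from the notes; floating point is unusable here because (3/2)^n exceeds 2^53 long before any interesting behaviour appears).

```python
from fractions import Fraction

x = Fraction(1)
fracs = []
for _ in range(200):
    x *= Fraction(3, 2)
    # Exact fractional part; converted to float only for display.
    fracs.append(float(x - x.numerator // x.denominator))

print(fracs[:5])  # [0.5, 0.25, 0.375, 0.0625, 0.59375]
```

Such experiments suggest equidistribution but, as the remark above says, prove nothing: the distribution of (3/2)^n mod 1 remains open.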
§1.2.2
Proof of Weyl’s Criterion
We prove (i) implies (ii). Suppose that the sequence xn ∈ R is uniformly distributed mod 1.
If χ[a,b] is the characteristic function of the interval [a, b], then we may rewrite the definition
of uniform distribution mod 1 as
(1/n) Σ_{j=0}^{n−1} χ_{[a,b]}({x_j}) → ∫_0^1 χ_{[a,b]}(x) dx, as n → ∞.
From this we deduce that
(1/n) Σ_{j=0}^{n−1} g({x_j}) → ∫_0^1 g(x) dx, as n → ∞,
whenever g is a step function, i.e., when g(x) = Σ_{k=1}^{m} c_k χ_{[a_k,b_k]}(x) is a finite linear combination of characteristic functions of intervals.
Now let f be a continuous function on [0, 1]. Then, given ε > 0, we can find a step
function g with ‖f − g‖_∞ ≤ ε. We have the estimate
|(1/n) Σ_{j=0}^{n−1} f({x_j}) − ∫_0^1 f(x) dx|
    ≤ |(1/n) Σ_{j=0}^{n−1} (f({x_j}) − g({x_j}))| + |(1/n) Σ_{j=0}^{n−1} g({x_j}) − ∫_0^1 g(x) dx| + |∫_0^1 g(x) dx − ∫_0^1 f(x) dx|
    ≤ (1/n) Σ_{j=0}^{n−1} |f({x_j}) − g({x_j})| + |(1/n) Σ_{j=0}^{n−1} g({x_j}) − ∫_0^1 g(x) dx| + ∫_0^1 |g(x) − f(x)| dx
    ≤ 2ε + |(1/n) Σ_{j=0}^{n−1} g({x_j}) − ∫_0^1 g(x) dx|.
Since the last term converges to zero as n → ∞, we obtain
lim sup_{n→∞} |(1/n) Σ_{j=0}^{n−1} f({x_j}) − ∫_0^1 f(x) dx| ≤ 2ε.
Since ε > 0 is arbitrary, this gives us that
(1/n) Σ_{j=0}^{n−1} f({x_j}) → ∫_0^1 f(x) dx
as n → ∞.
We now prove (ii) implies (iii). Suppose that f : [0, 1] → C is continuous and f(0) = f(1).
By writing f = Re f + i Im f and applying (ii) to the real and imaginary parts of f we
have that
(1/n) Σ_{j=0}^{n−1} f({x_j}) → ∫_0^1 f(x) dx,
as n → ∞. For ℓ ∈ Z, ℓ ≠ 0 we let f(x) = e^{2πiℓx}. Note that, as exp is 2πi-periodic,
f({x_j}) = e^{2πiℓx_j}. Hence
(1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} → ∫_0^1 e^{2πiℓx} dx = [e^{2πiℓx}/(2πiℓ)]_0^1 = 0
as n → ∞, as ℓ ≠ 0.
We prove (iii) implies (i). Suppose that (iii) holds. Then
(1/n) Σ_{j=0}^{n−1} g({x_j}) → ∫_0^1 g(x) dx, as n → ∞,
whenever g(x) = Σ_{k=1}^{m} c_k e^{2πiℓ_k x}, c_k ∈ C, is a trigonometric polynomial, i.e. a finite linear
combination of exponential functions.
Note that the space C(X, C) is a vector space: if f, g ∈ C(X, C) then f + g ∈ C(X, C)
and if f ∈ C(X, C), λ ∈ C then λf ∈ C(X, C). A linear subspace S ⊂ C(X, C) is an algebra
if whenever f, g ∈ S then f g ∈ S. We will need the following result:
Theorem 1.2.2 (Stone-Weierstrass Theorem)
Let X be a compact metric space and let C(X, C) denote the space of continuous functions
defined on X. Suppose that S ⊂ C(X, C) is an algebra of continuous functions such that
(i) if f ∈ S then its complex conjugate f̄ ∈ S,
(ii) S separates the points of X, i.e. for all x, y ∈ X, x ≠ y, there exists f ∈ S such that
f(x) ≠ f(y),
(iii) for every x ∈ X there exists f ∈ S such that f(x) ≠ 0.
Then S is uniformly dense in C(X, C), i.e. for all f ∈ C(X, C) and all ε > 0, there exists
g ∈ S such that ‖f − g‖_∞ = sup_{x∈X} |f(x) − g(x)| < ε.
We shall apply the Stone-Weierstrass Theorem with S given by the set of trigonometric
polynomials. It is easy to see that S satisfies the hypotheses of Theorem 1.2.2. Let f be
any continuous function on [0, 1] with f (0) = f (1). Given ε > 0 we can find a trigonometric
polynomial g such that ‖f − g‖_∞ ≤ ε. As in the first part of the proof, we can conclude
that
(1/n) Σ_{j=0}^{n−1} f({x_j}) → ∫_0^1 f(x) dx, as n → ∞.
Now consider the interval [a, b] ⊂ [0, 1]. Given ε > 0, we can find continuous functions
f1 and f2 (with f1 (0) = f1 (1), f2 (0) = f2 (1)) such that
f_1 ≤ χ_{[a,b]} ≤ f_2 and ∫_0^1 (f_2(x) − f_1(x)) dx ≤ ε.
We then have that
lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_{[a,b]}({x_j}) ≥ lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} f_1({x_j}) = ∫_0^1 f_1(x) dx
    ≥ ∫_0^1 f_2(x) dx − ε ≥ ∫_0^1 χ_{[a,b]}(x) dx − ε
and
lim sup_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_{[a,b]}({x_j}) ≤ lim sup_{n→∞} (1/n) Σ_{j=0}^{n−1} f_2({x_j}) = ∫_0^1 f_2(x) dx
    ≤ ∫_0^1 f_1(x) dx + ε ≤ ∫_0^1 χ_{[a,b]}(x) dx + ε.
Since ε > 0 is arbitrary, we have shown that
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_{[a,b]}({x_j}) = ∫_0^1 χ_{[a,b]}(x) dx = b − a,
so that xn is uniformly distributed mod 1. □
§1.2.3
Exercises
Exercise 1.1
Show that if xn is uniformly distributed mod 1 then {xn } is dense in [0, 1]. (The converse
is not true.)
Exercise 1.2
Let α, β ∈ R. Let xn = αn + β. Show that xn is uniformly distributed mod 1 if and only
if α ∉ Q.
Exercise 1.3
(i) Prove that log_10 2 is irrational.
(ii) The leading digit of an integer is the left-most digit of its base 10 representation.
(Thus the leading digit of 32 is 3, the leading digit of 1024 is 1, etc.) Show that the
frequency with which 2^n has leading digit r (r = 1, 2, . . . , 9) is log_10(1 + 1/r).
(Hint: first show that 2^n has leading digit r if and only if
r · 10^k ≤ 2^n < (r + 1) · 10^k
for some k ∈ N.)
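The claimed frequencies in Exercise 1.3(ii) can be sanity-checked numerically. The sketch below is mine, not part of the exercise; it counts leading digits of 2^n using Python's exact integers and compares each empirical frequency with log_10(1 + 1/r).

```python
import math
from collections import Counter

N = 10_000
# Leading digit of 2^n, computed exactly via Python's arbitrary-precision ints.
counts = Counter(int(str(2 ** n)[0]) for n in range(1, N + 1))
for r in range(1, 10):
    print(r, counts[r] / N, math.log10(1 + 1 / r))
```

The empirical frequencies agree with the predicted logarithmic law to about two decimal places for N = 10000 terms.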
Exercise 1.4
Calculate the frequency with which the penultimate leading digit of 2^n is equal to r, r =
0, 1, 2, . . . , 9. (The penultimate leading digit is the second-to-leftmost digit in the base 10
expansion. The penultimate leading digit of 2048 is 0, etc.)
§1.3
Appendix: a recap on the Riemann Integral
(This subsection is included for general interest and to motivate the Lebesgue integral.
Hence it is not examinable.)
You have probably already seen the construction of the Riemann integral. This gives
a method for defining the integral of suitable functions defined on an interval [a, b]. In
the next section we will see how the Lebesgue integral is a generalisation of the Riemann
integral in the sense that it allows us to integrate functions defined on spaces more general
than subintervals of R. The Lebesgue integral has other nice properties, for example it is
well-behaved with respect to limits. Here we give a brief exposition about some inadequacies
of the Riemann integral and how they motivate the Lebesgue integral.
Let f : [a, b] → R be a bounded function (for the moment we impose no other conditions
on f ).
A partition ∆ of [a, b] is a finite set of points ∆ = {x0 , x1 , x2 , . . . , xn } with
a = x0 < x1 < x2 < · · · < xn = b.
In other words, we are dividing [a, b] up into subintervals.
We then form the upper and lower Riemann sums
U(f, ∆) = Σ_{i=0}^{n−1} sup_{x∈[x_i, x_{i+1}]} f(x) (x_{i+1} − x_i),
L(f, ∆) = Σ_{i=0}^{n−1} inf_{x∈[x_i, x_{i+1}]} f(x) (x_{i+1} − x_i).
The idea is then that if we make the subintervals in the partition small, these sums will
be a good approximation to our intuitive notion of the integral of f over [a, b] as the area
bounded by the graph of f . More precisely, if
inf_∆ U(f, ∆) = sup_∆ L(f, ∆),
where the infimum and supremum are taken over all possible partitions of [a, b], then we
write
∫_a^b f(x) dx
for their common value and call it the (Riemann) integral of f between those limits. We
also say that f is Riemann integrable.
The class of Riemann integrable functions includes continuous functions and step functions (i.e. finite linear combinations of characteristic functions of intervals).
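The upper and lower sums are easy to compute numerically for a continuous function. The following sketch is my own (the sup and inf on each subinterval are approximated by sampling, which is adequate for continuous f but not rigorous); it computes both sums for f(x) = x² on a uniform partition of [0, 1].

```python
def riemann_sums(f, partition, samples_per_interval=100):
    # Approximate U(f, Delta) and L(f, Delta) by sampling each subinterval.
    U = L = 0.0
    for a, b in zip(partition, partition[1:]):
        vals = [f(a + (b - a) * t / samples_per_interval)
                for t in range(samples_per_interval + 1)]
        U += max(vals) * (b - a)
        L += min(vals) * (b - a)
    return U, L

pts = [i / 1000 for i in range(1001)]  # uniform partition of [0, 1]
U, L = riemann_sums(lambda x: x * x, pts)
print(L, U)  # both approach 1/3 as the partition is refined
```

For this f the two sums squeeze the Riemann integral 1/3 between them, and refining the partition shrinks the gap U − L.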
However, there are many functions for which one wishes to define an integral but which
are not Riemann integrable, making the theory rather unsatisfactory. For example, define
f : [0, 1] → R by
f(x) = χ_{Q∩[0,1]}(x) = 1 if x ∈ Q,  0 otherwise.
Since between any two distinct real numbers we can find both a rational number and an
irrational number, given 0 ≤ y < z ≤ 1, we can find y < x < z with f (x) = 1 and y < x′ < z
with f(x′) = 0. Hence for any partition ∆ = {x_0, x_1, . . . , x_n} of [0, 1], we have
U(f, ∆) = Σ_{i=0}^{n−1} (x_{i+1} − x_i) = 1,    L(f, ∆) = 0.
Taking the infimum and supremum, respectively, over all partitions ∆ shows that f is not
Riemann integrable.
Why does Riemann integration not work for the above function and how could we go
about improving it? Let us look again at (and slightly rewrite) the formulæ for U (f, ∆)
and L(f, ∆). We have
U(f, ∆) = Σ_{i=0}^{n−1} sup_{x∈[x_i, x_{i+1}]} f(x) λ([x_i, x_{i+1}])
and
L(f, ∆) = Σ_{i=0}^{n−1} inf_{x∈[x_i, x_{i+1}]} f(x) λ([x_i, x_{i+1}]),
where, for an interval [y, z],
λ([y, z]) = z − y
denotes its length.
denotes its length. In the example above, things did not work because dividing [0, 1] into
intervals (no matter how small) did not ‘separate out’ the different values that f could take.
But suppose we had a notion of ‘length’ that worked for more general sets than intervals.
Then we could do better by considering more complicated 'partitions' of [0, 1], where by
partition we now mean a collection of subsets {E_1, . . . , E_m} of [0, 1] such that E_i ∩ E_j = ∅
if i ≠ j, and ∪_{i=1}^{m} E_i = [0, 1].
In the example, for instance, it might be reasonable to write
∫_0^1 f(x) dx = 1 × λ([0, 1] ∩ Q) + 0 × λ([0, 1] \ Q) = λ([0, 1] ∩ Q).
Instead of using subintervals, the Lebesgue integral uses a much wider class of subsets
(namely, sets in a given σ-algebra) together with a notion of ‘generalised length’ (namely,
measure).
2. More on uniform distribution mod 1. Measure spaces
§2.1
Uniform distribution of sequences in Rk
We shall now look at the uniform distribution of sequences in Rk. We will say that a
sequence xn = (xn,1 , . . . , xn,k ) ∈ Rk is uniformly distributed mod 1 if, given any k-dimensional
cube, the frequency with which the fractional parts of xn lie in the cube is equal to its k-dimensional volume.
Definition. A sequence xn = (xn,1 , . . . , xn,k ) ∈ Rk is said to be uniformly distributed
mod 1 if, for each choice of k intervals [a1 , b1 ], . . . , [ak , bk ] ⊂ [0, 1], we have that
(1/n) card{j ∈ {0, 1, . . . , n − 1} | ({x_{j,1}}, . . . , {x_{j,k}}) ∈ [a_1, b_1] × · · · × [a_k, b_k]} → (b_1 − a_1) · · · (b_k − a_k)
as n → ∞.
We have the following criterion for uniform distribution.
Theorem 2.1.1 (Multi-dimensional Weyl’s Criterion)
Let xn = (xn,1 , . . . , xn,k ) ∈ Rk . The following are equivalent:
(i) the sequence xn ∈ Rk is uniformly distributed mod 1;
(ii) for any continuous function f : Rk /Zk → R we have
(1/n) Σ_{j=0}^{n−1} f({x_{j,1}}, . . . , {x_{j,k}}) → ∫ · · · ∫ f(x_1, . . . , x_k) dx_1 . . . dx_k;
(iii) for all ℓ = (ℓ1 , . . . , ℓk ) ∈ Zk \ {0} we have
(1/n) Σ_{j=0}^{n−1} e^{2πi(ℓ_1 x_{j,1} + · · · + ℓ_k x_{j,k})} → 0
as n → ∞.
Remark. Here and throughout 0 ∈ Zk denotes the zero vector (0, . . . , 0).
Remark. In §1 we commented that, topologically, the quotient group R/Z is a circle.
More generally, the quotient group Rk /Zk is a k-dimensional torus.
Remark. Consider the case when k = 2 so that R2 /Z2 is the 2-dimensional torus. We can
regard R2 /Z2 as the square [0, 1] × [0, 1] with the top and bottom sides identified and left
and right sides identified. Thus a continuous function f : R2 /Z2 → R has the property that
f (0, y) = f (1, y) and f (x, 0) = f (x, 1). More generally, we can identify the k-dimensional
torus Rk /Zk with [0, 1]k with (x1 , . . . , xi−1 , 0, xi+1 , . . . , xk ) and (x1 , . . . , xi−1 , 1, xi+1 , . . . , xk )
identified, 1 ≤ i ≤ k. A continuous function f : Rk /Zk → R then corresponds to a
continuous function f : [0, 1]k → R such that
f (x1 , . . . , xi−1 , 0, xi+1 , . . . , xk ) = f (x1 , . . . , xi−1 , 1, xi+1 , . . . , xk )
for each i, 1 ≤ i ≤ k.
Proof of Theorem 2.1.1. The proof of Theorem 2.1.1 is essentially the same as in the
case k = 1. □
§2.2
The sequence xn = (α1 n, . . . , αk n)
We shall apply Theorem 2.1.1 to the sequence xn = (α1 n, . . . , αk n), for real numbers
α1 , . . . , αk .
Definition. Real numbers β1 , . . . , βs ∈ R are said to be rationally independent if the only
rationals r1 , . . . , rs ∈ Q such that
r1 β1 + · · · + rs βs = 0
are r1 = · · · = rs = 0.
Proposition 2.2.1
Let α1 , . . . , αk ∈ R. Then the following are equivalent:
(i) the sequence xn = (α1 n, . . . , αk n) ∈ Rk is uniformly distributed mod 1;
(ii) α1 , . . . , αk and 1 are rationally independent.
Proof. The proof is similar to the discussion in §1.2.1 and we leave it as an exercise. (See
Exercise 2.1.) □
Remark. Note that in the case k = 1, Proposition 2.2.1 reduces to the results of §1.2.1.
To see this, note that α, 1 are rationally dependent if and only if there exist rationals r, s
(not both zero) such that rα + s = 0. This holds if and only if α is rational. Hence α, 1 are
rationally independent if and only if α is irrational.
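Proposition 2.2.1 can be illustrated numerically (a sketch of mine; the notes contain no code). For (n√2, n√3) the numbers √2, √3, 1 are rationally independent and the pairs fill the unit square evenly, while for (n√2, n(1 + √2)) we have (1 + √2) − √2 − 1 = 0, a rational dependence, and mod 1 the two coordinates coincide, so the pairs stay on a line.

```python
import math

def box_frequency(pairs):
    # Proportion of points whose fractional parts land in [0, 1/2] x [0, 1/2].
    hits = sum((x - math.floor(x)) <= 0.5 and (y - math.floor(y)) <= 0.5
               for x, y in pairs)
    return hits / len(pairs)

N = 100_000
indep = [(n * math.sqrt(2), n * math.sqrt(3)) for n in range(N)]
dep = [(n * math.sqrt(2), n * (1 + math.sqrt(2))) for n in range(N)]
print(box_frequency(indep))  # close to 1/4, as equidistribution predicts
print(box_frequency(dep))    # close to 1/2: the coordinates agree mod 1
```

The dependent example shows why rational independence (including of 1) is needed: the sequence can be dense in a lower-dimensional subset without filling the torus.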
§2.3
Weyl’s Theorem on Polynomials
We have seen that αn + β is uniformly distributed mod 1 if α is irrational. Weyl’s Theorem
generalises this to polynomials of higher degree. Write
p(n) = α_k n^k + α_{k−1} n^{k−1} + · · · + α_1 n + α_0 .
Theorem 2.3.1 (Weyl’s Theorem on Polynomials)
If any one of α1 , . . . , αk is irrational then p(n) is uniformly distributed mod 1.
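As a quick numerical sanity check of the theorem (my own sketch, not from the notes), the fractional parts of p(n) = √2 n² appear equidistributed; for moderate n the double-precision fractional part still carries enough accuracy for frequency counts.

```python
import math

# Fractional parts of p(n) = sqrt(2) * n^2 for the first 50,000 values of n.
vals = [math.sqrt(2) * n * n for n in range(50_000)]
fracs = [v - math.floor(v) for v in vals]
freq = sum(0.2 <= f <= 0.7 for f in fracs) / len(fracs)
print(freq)  # close to the interval length 0.5
```

The empirical frequency of the interval [0.2, 0.7] is close to its length, consistent with uniform distribution mod 1.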
To prove this theorem we shall need the following technical result.
Lemma 2.3.2 (van der Corput’s Inequality)
Let z_0, . . . , z_{n−1} ∈ C and let 1 < m < n. Then
m² |Σ_{j=0}^{n−1} z_j|² ≤ m(n + m − 1) Σ_{j=0}^{n−1} |z_j|² + 2(n + m − 1) Re Σ_{j=1}^{m−1} (m − j) Σ_{i=0}^{n−1−j} z_{i+j} z̄_i.
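The inequality is easy to test numerically before reading the proof. In this sketch (my own; the helper name is invented) both sides are evaluated for random complex data.

```python
import random

def van_der_corput_sides(z, m):
    # Left- and right-hand sides of van der Corput's inequality, 1 < m < n.
    n = len(z)
    lhs = m * m * abs(sum(z)) ** 2
    corr = sum((m - j) * sum(z[i + j] * z[i].conjugate() for i in range(n - j))
               for j in range(1, m))
    rhs = m * (n + m - 1) * sum(abs(w) ** 2 for w in z) \
        + 2 * (n + m - 1) * corr.real
    return lhs, rhs

random.seed(0)
z = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
for m in (2, 5, 10):
    lhs, rhs = van_der_corput_sides(z, m)
    print(m, lhs <= rhs + 1e-9)
```

Every trial satisfies the inequality, as the lemma guarantees.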
Proof (not examinable). The proof is essentially an exercise in multiplying out a product and some careful book-keeping of the cross-terms. You are familiar with a particular
case of it, namely the fact that
|z_0 + z_1|² = (z_0 + z_1)(z̄_0 + z̄_1) = |z_0|² + |z_1|² + z_0 z̄_1 + z̄_0 z_1 = |z_0|² + |z_1|² + 2 Re(z_0 z̄_1).
Construct the following parallelogram array, in which column i (0 ≤ i ≤ n − 1) contains m copies of z_i, each column shifted one row down from the previous one:
z_0
z_0 z_1
z_0 z_1 z_2
  ⋮
z_0 z_1 z_2 · · · z_{m−1}
    z_1 z_2 · · · z_m
        z_2 · · · z_{m+1}
            ⋱
            z_{n−m} · · · z_{n−1}
                ⋮
                z_{n−2} z_{n−1}
                        z_{n−1}
(There are n columns, with each column containing m terms, and n + m − 1 rows.) Let s_j,
0 ≤ j ≤ n + m − 2, denote the sum of the terms in the jth row. Each z_i occurs in exactly
m of the row sums s_j. Hence
s0 + · · · + sn+m−2 = m(z0 + · · · + zn−1 )
so that
m² |Σ_{j=0}^{n−1} z_j|² = |s_0 + · · · + s_{n+m−2}|² ≤ (|s_0| + · · · + |s_{n+m−2}|)² ≤ (n + m − 1)(|s_0|² + · · · + |s_{n+m−2}|²),
where the final inequality follows from the (n + m − 1)-dimensional Cauchy-Schwarz inequality.
Recall that |s_j|² = s_j s̄_j. Expanding out this product and recalling that 2 Re(z) = z + z̄
we have that
|s_j|² = Σ_k |z_k|² + 2 Re Σ_{ℓ<k} z_k z̄_ℓ
where the first sum is over all indices k of the z_i occurring in the definition of s_j, and the
second sum is over the pairs of indices ℓ < k of the z_i occurring in the definition of s_j.
Noting that the number of times the term z_k z̄_ℓ occurs in |s_0|² + · · · + |s_{n+m−2}|² is
equal to m − (k − ℓ), we can write
|s_0|² + · · · + |s_{n+m−2}|² ≤ m Σ_{j=0}^{n−1} |z_j|² + 2 Re Σ_{j=1}^{m−1} (m − j) Σ_{i=0}^{n−1−j} z_{i+j} z̄_i
and the result follows. □
Let x_n ∈ R. For each m ≥ 1 define the sequence x_n^{(m)} = x_{n+m} − x_n to be the sequence
of mth differences. The following lemma allows us to infer the uniform distribution of the
sequence x_n if we know the uniform distribution of each of the mth differences of x_n.
Lemma 2.3.3
Let x_n ∈ R be a sequence. Suppose that for each m ≥ 1 the sequence x_n^{(m)} of mth differences
is uniformly distributed mod 1. Then x_n is uniformly distributed mod 1.
Proof. We shall apply Weyl’s Criterion. We need to show that if ℓ ∈ Z \ {0} then
(1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} → 0, as n → ∞.
Let z_j = e^{2πiℓx_j} for j = 0, . . . , n − 1. Note that |z_j| = 1. Let 1 < m < n. By van der
Corput's inequality,
(m²/n²) |Σ_{j=0}^{n−1} e^{2πiℓx_j}|² ≤ (m/n²)(n + m − 1)n + (2(n + m − 1)/n²) Re Σ_{j=1}^{m−1} (m − j) Σ_{i=0}^{n−1−j} e^{2πiℓ(x_{i+j} − x_i)}
    = (m/n)(m + n − 1) + (2(n + m − 1)/n) Re Σ_{j=1}^{m−1} (m − j) A_{n,j}
where
A_{n,j} = (1/n) Σ_{i=0}^{n−1−j} e^{2πiℓ(x_{i+j} − x_i)} = (1/n) Σ_{i=0}^{n−1−j} e^{2πiℓ x_i^{(j)}}.
As the sequence x_i^{(j)} of jth differences is uniformly distributed mod 1, by Weyl's criterion
we have that A_{n,j} → 0 for each j = 1, . . . , m − 1. Hence for each m ≥ 1
lim sup_{n→∞} (m²/n²) |Σ_{j=0}^{n−1} e^{2πiℓx_j}|² ≤ lim sup_{n→∞} (m/n)(n + m − 1) = m.
Hence, for each m > 1 we have
lim sup_{n→∞} |(1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j}| ≤ 1/√m.
As m > 1 is arbitrary, the result follows. □
Proof of Weyl's Theorem. We will only prove Weyl's Theorem on Polynomials (Theorem 2.3.1) in the special case where the leading coefficient α_k of
p(n) = α_k n^k + · · · + α_1 n + α_0
is irrational. (The general case, where α_i is irrational for some 1 ≤ i ≤ k, can be deduced
easily from this special case and we leave this as an exercise. See Exercise 2.2.)
We shall use induction on the degree of p. Let ∆(k) denote the statement ‘for every
polynomial p of degree ≤ k, with irrational leading coefficient, the sequence p(n) is uniformly
distributed mod 1’. We know that ∆(1) is true; this follows immediately from Exercise 1.2.
Suppose that ∆(k − 1) is true. Let p(n) = α_k n^k + · · · + α_1 n + α_0 be any polynomial of
degree k with α_k irrational. Let m ∈ N and consider the sequence p^{(m)}(n) = p(n + m) − p(n)
of mth differences. We have that
p^{(m)}(n) = p(n + m) − p(n)
    = α_k (n + m)^k + α_{k−1} (n + m)^{k−1} + · · · + α_1 (n + m) + α_0 − α_k n^k − α_{k−1} n^{k−1} − · · · − α_1 n − α_0
    = α_k n^k + α_k k n^{k−1} m + · · · + α_{k−1} n^{k−1} + α_{k−1} (k − 1) n^{k−2} m
    + · · · + α_1 n + α_1 m + α_0 − α_k n^k − α_{k−1} n^{k−1} − · · · − α_1 n − α_0.
After cancellation, we can see that, for each m, p^{(m)}(n) is a polynomial of degree k − 1
with irrational leading coefficient α_k k m. Therefore, by the inductive hypothesis, p^{(m)}(n)
is uniformly distributed mod 1. We may now apply Lemma 2.3.3 to conclude that p(n) is
uniformly distributed mod 1 and so ∆(k) holds. This completes the induction. □
§2.4
Measures and the Lebesgue integral
You may have seen the definition of Lebesgue measure, Lebesgue outer measure and the
Lebesgue integral in other courses, for example in Fourier Analysis and Lebesgue Integration. The theory developed in that course is one particular example of a more general
theory, which we sketch here. Measure theory is a key technical tool in ergodic theory, and
so a good knowledge of measures and integration is essential for this course (although we
will not need to know the (many) technical intricacies).
§2.4.1
Measure spaces
Loosely speaking, a measure is a function that, when given a subset of a space X, will
say how ‘big’ that subset is. A motivating example is given by Lebesgue measure on
[0, 1]. The Lebesgue measure of an interval [a, b] is given by its length b − a. In defining an
abstract measure space, we will be taking the properties of ‘length’ (or, in higher dimensions,
‘volume’) and abstracting them, in much the same way that a metric space abstracts the
properties of ‘distance’.
It turns out that in general it is not possible to define the measure of an
arbitrary subset of X. Instead, we will usually have to restrict our attention to a class of
subsets of X.
Definition. A collection B of subsets of X is called a σ-algebra if the following properties
hold:
(i) ∅ ∈ B,
(ii) if E ∈ B then its complement X \ E ∈ B,
(iii) if E_n ∈ B, n = 1, 2, 3, . . ., is a countable sequence of sets in B then their union
∪_{n=1}^{∞} E_n ∈ B.
Definition. If X is a set and B a σ-algebra of subsets of X then we call (X, B) a measurable space.
Examples.
1. The trivial σ-algebra is given by B = {∅, X}.
2. The full σ-algebra is given by B = P(X), i.e. the collection of all subsets of X.
Remark. In general, the trivial σ-algebra is too small and the full σ-algebra is too big.
We shall see some more interesting examples of σ-algebras later.
Here are some easy properties of σ-algebras:
Lemma 2.4.1
Let B be a σ-algebra of subsets of X. Then
(i) X ∈ B;
(ii) if En ∈ B, n = 1, 2, 3, . . ., then ∩_{n=1}^∞ En ∈ B.
In the special case when X is a compact metric space there is a particularly important
σ-algebra.
Definition. Let X be a compact metric space. We define the Borel σ-algebra B(X) to
be the smallest σ-algebra of subsets of X which contains all the open subsets of X.
Remarks.
1. By ‘smallest’ we mean that if C is another σ-algebra that contains all open subsets of X then B(X) ⊂ C, that is:
B(X) = ∩ {C | C is a σ-algebra that contains the open sets}.
2. We say that the Borel σ-algebra is generated by the open sets. We call a set in B(X)
a Borel set.
3. By property (ii) of the definition of a σ-algebra, the Borel σ-algebra also contains all the closed sets and is the smallest σ-algebra with this property.
4. By Lemma 2.4.1 it follows that B contains all countable intersections of open sets, all
countable unions of countable intersections of open sets, all countable intersections of
countable unions of countable intersections of open sets, etc—and indeed many other
sets.
5. There are plenty of sets that are not Borel sets, although by necessity they are rather
complicated. For example, consider R as an additive group and Q ⊂ R as a subgroup.
Form the quotient group R/Q and choose an element in [0, 1] for each coset (this
requires the Axiom of Choice). The set E of coset representatives is a non-Borel set.
6. In the case when X = [0, 1] or R/Z, the Borel σ-algebra is also the smallest σ-algebra
that contains all sub-intervals.
Let X be a set and let B be a σ-algebra of subsets of X.
Definition. A function µ : B → R is called a (finite) measure if:
(i) µ(∅) = 0;
(ii) if En is a countable collection of pairwise disjoint sets in B (i.e. En ∩ Em = ∅ for n ≠ m) then
µ(∪_{n=1}^∞ En) = ∑_{n=1}^∞ µ(En).
We call (X, B, µ) a measure space.
If µ(X) = 1 then we call µ a probability or probability measure and refer to (X, B, µ)
as a probability space.
Remark. Thus a measure just abstracts properties of ‘length’ or ‘volume’. Condition (i)
says that the empty set has zero length, and condition (ii) says that the length of a disjoint
union is the sum of the lengths of the individual sets.
Definition. We say that a property holds almost everywhere if the set of points on which
the property fails to hold has measure zero.
Example. We shall see (Exercise 2.9) that the set of rationals in [0, 1] forms a Borel
set with zero Lebesgue measure. Thus Lebesgue almost every point in [0, 1] is irrational.
(Thus, ‘typical’ (in the sense of measure theory, and with respect to Lebesgue measure)
points in [0, 1] are irrational.)
We will usually be interested in studying measures on the Borel σ-algebra of a compact
metric space X. To define such a measure, we need to define the measure of an arbitrary
Borel set. In general, the Borel σ-algebra is extremely large. We shall see that it is often
unnecessary to do this and instead it is sufficient to define the measure of a certain class of
subsets.
§2.4.2
The Hahn-Kolmogorov Extension Theorem
A collection A of subsets of X is called an algebra if:
(i) ∅ ∈ A,
(ii) if A1, A2, . . . , An ∈ A then ∪_{j=1}^n Aj ∈ A,
(iii) if A ∈ A then Ac ∈ A.
Thus an algebra is like a σ-algebra, except that it is closed under finite unions and not
necessarily closed under countable unions.
Example. Take X = [0, 1], and A = {all finite unions of subintervals}.
Let B(A) denote the σ-algebra generated by A, i.e., the smallest σ-algebra containing A. More precisely:
B(A) = ∩ {C | C is a σ-algebra, C ⊃ A}.
In the case when X = [0, 1] and A is the algebra of finite unions of intervals, we have
that B(A) is the Borel σ-algebra. Indeed, in the special case of the Borel σ-algebra of a
compact metric space X, it is usually straightforward to check whether an algebra generates
the Borel σ-algebra.
Proposition 2.4.2
Let X be a compact metric space and let B be the Borel σ-algebra. Let A be an algebra
of Borel subsets, A ⊂ B. Suppose that for every x1, x2 ∈ X, x1 ≠ x2, there exist disjoint
open sets A1 , A2 ∈ A such that x1 ∈ A1 , x2 ∈ A2 . Then A generates the Borel σ-algebra B.
The following result says that if we have a function which looks like a measure defined
on an algebra, then it extends uniquely to a measure defined on the σ-algebra generated
by the algebra.
Theorem 2.4.3 (Hahn-Kolmogorov Extension Theorem)
Let A be an algebra of subsets of X. Suppose that µ : A → [0, 1] satisfies:
(i) µ(∅) = 0;
(ii) if An ∈ A, n ≥ 1, are pairwise disjoint and if ∪_{n=1}^∞ An ∈ A then
µ(∪_{n=1}^∞ An) = ∑_{n=1}^∞ µ(An).
Then there is a unique probability measure µ : B(A) → [0, 1] which is an extension of
µ : A → [0, 1].
Remarks.
(i) We will often use the Hahn-Kolmogorov Extension Theorem as follows. Take X =
[0, 1] and take A to be the algebra consisting of all finite unions of subintervals of X.
We then define the ‘measure’ µ of a subinterval in such a way as to be consistent with
the hypotheses of the Hahn-Kolmogorov Extension Theorem. It then follows that µ
does indeed define a measure on the Borel σ-algebra.
(ii) Here is another way in which we shall use the Hahn-Kolmogorov Extension Theorem.
Suppose we have two measures, µ and ν, and we want to see if µ = ν. A priori
we would have to check that µ(B) = ν(B) for all B ∈ B. The Hahn-Kolmogorov
Extension Theorem says that it is sufficient to check that µ(A) = ν(A) for all A in
an algebra A that generates B. For example, to show that two Borel probability
measures on [0, 1] are equal, it is sufficient to show that they give the same measure
to each subinterval.
(iii) There is a more general version of the Hahn-Kolmogorov Extension Theorem for the
case when X does not have finite measure (indeed, this is the setting in which the
Hahn-Kolmogorov Theorem is usually stated). Suppose that X is a set, B is a σ-algebra
of subsets of X, and A is an algebra that generates B. Suppose that µ : A → R ∪ {∞}
satisfies conditions (i) and (ii) of Theorem 2.4.3. Suppose in addition that there exist
countably many sets An ∈ A, n = 1, 2, 3, . . ., such that X = ∪_{n=1}^∞ An and
µ(An) < ∞ for each n. Then there exists a unique measure µ : B(A) → R ∪ {∞} which
is an extension of µ : A → R ∪ {∞}.
A consequence of the proof (which we omit) of the Hahn-Kolmogorov Extension Theorem is that sets in B can be arbitrarily well approximated by sets in A in the following
sense. We define the symmetric difference between two sets A, B by
A△B = (A \ B) ∪ (B \ A).
Thus, two sets are ‘close’ if their symmetric difference is small.
Proposition 2.4.4
Suppose that A is an algebra that generates the σ-algebra B. Let B ∈ B and let ε > 0.
Then there exists A ∈ A such that µ(A△B) < ε.
Remark. It is straightforward to check that if µ(A△B) < ε then |µ(A) − µ(B)| < ε.
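For intuition, this inequality can be verified on a toy finite measure space; the Python sketch below uses a normalised counting measure in place of µ, with the space X and the sets A, B chosen arbitrarily for illustration:

```python
# Toy check of: |mu(A) - mu(B)| <= mu(A symmetric-difference B),
# on a finite space with normalised counting measure.
X = set(range(10))

def mu(S):
    """Normalised counting measure on X."""
    return len(S & X) / len(X)

A = {0, 1, 2, 3, 4}
B = {0, 1, 2, 3, 5}           # differs from A in two points
sym_diff = (A - B) | (B - A)  # the symmetric difference A triangle B

assert abs(mu(A) - mu(B)) <= mu(sym_diff)
```

The inequality holds because A \ B and B \ A are both contained in the symmetric difference, so neither measure can exceed the other by more than µ(A△B).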
§2.4.3
Examples of measure spaces
Lebesgue measure on [0, 1]. Take X = [0, 1] and take A to be the collection of all finite
unions of subintervals of [0, 1]. For a subinterval [a, b] define
µ([a, b]) = b − a.
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines
a measure on the Borel σ-algebra B. This is Lebesgue measure.
Lebesgue measure on R/Z. Take X = R/Z and take A to be the collection of all finite
unions of subintervals of [0, 1). For a subinterval [a, b] define
µ([a, b]) = b − a.
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines
a measure on the Borel σ-algebra B. This is Lebesgue measure on the circle.
Lebesgue measure on the k-dimensional torus. Take X = R^k/Z^k and take A to be
the collection of all finite unions of k-dimensional sub-cubes ∏_{j=1}^k [aj, bj] of [0, 1]^k. For a
sub-cube ∏_{j=1}^k [aj, bj] of [0, 1]^k, define
µ(∏_{j=1}^k [aj, bj]) = ∏_{j=1}^k (bj − aj).
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines
a measure on the Borel σ-algebra B. This is Lebesgue measure on the torus.
Stieltjes measures.¹ Take X = [0, 1] and let ρ : [0, 1] → R+ be an increasing function
such that ρ(1) − ρ(0) = 1. Take A to be the algebra of finite unions of subintervals and
define
µρ([a, b]) = ρ(b) − ρ(a).
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines
a measure on the Borel σ-algebra B. We say that µρ is the measure on [0, 1] with density ρ.
Dirac measures. Finally, we give an example of a class of measures that do not fall into
the above categories. Let X be an arbitrary space and let B be an arbitrary σ-algebra. Let
x ∈ X. Define the measure δx by
δx(A) = 1 if x ∈ A, and 0 if x ∉ A.
Then δx defines a probability measure. It is called the Dirac measure at x.
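In code, a Dirac measure is easy to model once a set is represented by its membership predicate; the following Python sketch (the names are hypothetical, chosen for this illustration) implements the definition directly:

```python
# A Dirac measure delta_x, with a set A represented by its indicator function
# (a predicate X -> {True, False}).
def dirac(x):
    def delta(indicator):
        # delta_x(A) = 1 if x is in A, 0 otherwise
        return 1 if indicator(x) else 0
    return delta

delta_half = dirac(0.5)
assert delta_half(lambda t: 0.0 <= t < 0.5) == 0    # 1/2 is not in [0, 1/2)
assert delta_half(lambda t: 0.25 <= t <= 0.75) == 1  # 1/2 is in [1/4, 3/4]
```

Countable additivity is immediate here: x lies in at most one set of any pairwise disjoint collection, so exactly one term of the sum can be 1.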
§2.5
Exercises
Exercise 2.1
Prove Proposition 2.2.1: let α1 , . . . , αk ∈ R and let xn = (α1 n, . . . , αk n) ∈ Rk . Prove that
xn is uniformly distributed mod 1 if and only if α1 , . . . , αk , 1 are rationally independent.
Exercise 2.2
Deduce the general case of Weyl’s Theorem on Polynomials (where at least one non-constant
coefficient is irrational) from the special case proved above (where the leading coefficient is
irrational).
Exercise 2.3
Let α be irrational. Show that p(n) = αn2 + n + 1 is uniformly distributed mod 1 by using
Lemma 2.3.3 and Exercise 1.2: i.e. show that, for each m ≥ 1, the sequence p(m) (n) =
p(n + m) − p(n) of mth differences is uniformly distributed mod 1.
Exercise 2.4
Let p(n) = αk nk + αk−1 nk−1 + · · · + α1 n + α0 , q(n) = βk nk + βk−1 nk−1 + · · · + β1 n + β0 .
Show that (p(n), q(n)) ∈ R2 is uniformly distributed mod 1 if, for some 1 ≤ i ≤ k, αi , βi
and 1 are rationally independent.
Exercise 2.5
Prove Lemma 2.4.1.
Exercise 2.6
Let X = [0, 1]. Find the smallest σ-algebra that contains the sets [0, 1/4), [1/4, 1/2), [1/2, 3/4),
and [3/4, 1].
Exercise 2.7
Let X = [0, 1] and let B denote the Borel σ-algebra. A dyadic interval is an interval of the
form
[p1/2^k, p2/2^k], where p1, p2 ∈ {0, 1, . . . , 2^k}.
¹An approximate pronunciation of Stieltjes is ‘Steeel-tyuz’.
Show that the algebra formed by taking finite unions of all dyadic intervals (over all k ∈ N)
generates the Borel σ-algebra.
Exercise 2.8
Show that A = {all finite unions of subintervals of [0, 1]} is an algebra.
Exercise 2.9
Let µ denote Lebesgue measure on [0, 1]. Show that for any x ∈ [0, 1] we have that
µ({x}) = 0. Hence show that the Lebesgue measure of any countable set is zero.
Show that Lebesgue almost every point in [0, 1] is irrational.
Exercise 2.10
Let X = [0, 1]. Let µ = δ1/2 denote the Dirac δ-measure at 1/2. Show that
µ ([0, 1/2) ∪ (1/2, 1]) = 0.
Conclude that µ-almost every point in [0, 1] is equal to 1/2.
3. Lebesgue integration. Invariant measures
§3.1
Lebesgue integration
Let (X, B, µ) be a measure space. We are interested in how to integrate functions defined
on X with respect to the measure µ. In the special case when X = [0, 1], B is the Borel
σ-algebra and µ is Lebesgue measure, this will extend the definition of the Riemann integral
to a class of functions that are not Riemann integrable.
Definition. Let f : X → R be a function. If D ⊂ R then we define the pre-image of D
to be the set f −1 D = {x ∈ X | f (x) ∈ D}.
A function f : X → R is measurable if f −1 D ∈ B for every Borel subset D of R. One
can show that this is equivalent to requiring that f −1 (−∞, c) ∈ B for all c ∈ R.
A function f : X → C is measurable if both the real and imaginary parts, Ref and
Imf , are measurable.
Remark. In writing f −1 D, we are not assuming that f is a bijection. We are writing
f −1 D to denote the pre-image of the set D.
We define integration via simple functions.
Definition. A function f : X → R is simple if it can be written as a linear combination
of characteristic functions of sets in B, i.e.:
f = ∑_{j=1}^r aj χBj,
for some aj ∈ R, Bj ∈ B, where the Bj are pairwise disjoint.
Remarks.
(i) Note that the sets Bj are sets in the σ-algebra B; even in the case when X = [0, 1]
we do not assume that the sets Bj are intervals.
(ii) For example, χQ∩[0,1] is a simple function. Note, however, that χQ∩[0,1] is not Riemann
integrable.
For a simple function f : X → R we define
∫ f dµ = ∑_{j=1}^r aj µ(Bj).   (3.1.1)
For example, if µ denotes Lebesgue measure on [0, 1] then
∫ χQ∩[0,1] dµ = µ(Q ∩ [0, 1]) = 0,
as Q ∩ [0, 1] is a countable set and so has Lebesgue measure zero.
A simple function can be written as a linear combination of characteristic functions of
pairwise disjoint sets in many different ways (for example, χ[1/4,3/4] = χ[1/4,1/2) + χ[1/2,3/4]).
However, one can show that the integral of a simple function f given in (3.1.1) is independent
of the choice of representation of f as a linear combination of characteristic functions.
Thus for a simple function f, the integral of f can be regarded as the area of the
region in X × R bounded by the graph of f.
If f : X → R, f ≥ 0, is measurable then one can show that there exists an increasing
sequence of simple functions fn such that fn ↑ f pointwise¹ as n → ∞, and we define
∫ f dµ = lim_{n→∞} ∫ fn dµ.
This can be shown to exist (although it may be ∞) and to be independent of the choice of
sequence fn.
For an arbitrary measurable function f : X → R, we write f = f⁺ − f⁻, where
f⁺ = max{f, 0} ≥ 0 and f⁻ = max{−f, 0} ≥ 0, and define
∫ f dµ = ∫ f⁺ dµ − ∫ f⁻ dµ.
If ∫ f⁺ dµ = ∞ and ∫ f⁻ dµ is finite then we set ∫ f dµ = ∞. Similarly, if ∫ f⁺ dµ is finite
but ∫ f⁻ dµ = ∞ then we set ∫ f dµ = −∞. If both ∫ f⁺ dµ and ∫ f⁻ dµ are infinite then
we leave ∫ f dµ undefined.
Finally, for a measurable function f : X → C, we define
∫ f dµ = ∫ Re f dµ + i ∫ Im f dµ.
We say that f is integrable if
∫ |f| dµ < +∞.
(Note that, in the case of a measurable function f : X → R, saying that f is integrable is
equivalent to saying that both ∫ f⁺ dµ and ∫ f⁻ dµ are finite.)
Denote the space of C-valued integrable functions by L1(X, B, µ). (We shall see a
slightly more sophisticated definition of this space below.)
Note that when we write ∫ f dµ we are implicitly integrating over the whole space X.
We can define integration over subsets of X as follows.
Definition. Let (X, B, µ) be a probability space. Let f ∈ L1(X, B, µ) and let B ∈ B.
Then χB f ∈ L1(X, B, µ). We define
∫_B f dµ = ∫ χB f dµ.
¹fn ↑ f pointwise means: for every x, fn(x) is an increasing sequence of real numbers and fn(x) → f(x)
as n → ∞.
§3.1.1
Examples
Lebesgue measure. Let X = [0, 1] and let µ denote Lebesgue measure on the Borel
σ-algebra. If f : [0, 1] → R is Riemann integrable then it is also Lebesgue integrable and
the two definitions agree. However, there are plenty of examples of functions which are
Lebesgue integrable but not Riemann integrable. For example, take f(x) = χQ∩[0,1](x)
defined on [0, 1] to be the characteristic function of the rationals. Then f(x) = 0 µ-a.e.
Hence f is integrable and ∫ f dµ = 0. However, f is not Riemann integrable.
The Stieltjes integral. Let ρ : [0, 1] → R+ and suppose that ρ is differentiable. Then
one can show that
∫ f dµρ = ∫ f(x)ρ′(x) dx.
Integration with respect to Dirac measures. Let x ∈ X. Recall that we defined the
Dirac measure at x by
δx(B) = 1 if x ∈ B, and 0 if x ∉ B.
If χB denotes the characteristic function of B then
∫ χB dδx = 1 if x ∈ B, and 0 if x ∉ B.
Suppose that f = ∑ aj χBj is a simple function and that, without loss of generality, the Bj
are pairwise disjoint. Then ∫ f dδx = aj where j is chosen so that x ∈ Bj (and equals zero
if no such Bj exists). Now let f : X → R. By choosing an increasing sequence of simple
functions, we see that
∫ f dδx = f(x).
We say that two measurable functions f, g : X → C are equivalent or equal µ-a.e. if
f = g µ-a.e., i.e. if µ({x ∈ X | f(x) ≠ g(x)}) = 0. The following result says that if two
functions differ only on a set of measure zero then their integrals are equal.
Lemma 3.1.1
Suppose that f, g ∈ L1(X, B, µ) and f, g are equal µ-a.e. Then ∫ f dµ = ∫ g dµ.
Functions being equivalent is an equivalence relation. We shall write L1(X, B, µ) for
the set of equivalence classes of integrable functions f : X → C on (X, B, µ). We define
‖f‖1 = ∫ |f| dµ.
Then d(f, g) = ‖f − g‖1 is a metric on L1(X, B, µ). One can show that L1(X, B, µ) is a
vector space; indeed, it is complete in the L1 metric, and so is a Banach space.
Remark. In practice, we will often abuse notation and regard elements of L1 (X, B, µ) as
functions rather than equivalence classes of functions. In general, in measure theory one
can often ignore sets of measure zero and treat two objects (functions, sets, etc) that differ
only on a set of measure zero as ‘the same’.
More generally, for any p ≥ 1, we can define the space Lp(X, B, µ) consisting of
(equivalence classes of) measurable functions f : X → C such that |f|^p is integrable. We can
again define a metric on Lp(X, B, µ) by defining d(f, g) = ‖f − g‖p where
‖f‖p = ( ∫ |f|^p dµ )^{1/p}
is the Lp norm.
Apart from L1, the most interesting Lp space is L2(X, B, µ). This is a Hilbert space²
with the inner product
⟨f, g⟩ = ∫ f ḡ dµ.
The Cauchy-Schwarz inequality holds: |⟨f, g⟩| ≤ ‖f‖2 ‖g‖2 for all f, g ∈ L2(X, B, µ).
Suppose that µ is a finite measure. It follows from the Cauchy-Schwarz inequality that
L2(X, B, µ) ⊂ L1(X, B, µ).
In general, the Riemann integral does not behave well with respect to limits. For
example, if fn is a sequence of Riemann integrable functions such that fn(x) → f(x) at
every point x then it does not follow that f is Riemann integrable. Even if f is Riemann
integrable, it does not follow that ∫ fn(x) dx → ∫ f(x) dx. The following convergence
theorems hold for the Lebesgue integral.
Theorem 3.1.2 (Monotone Convergence Theorem)
Suppose that fn : X → R is an increasing sequence of integrable functions on (X, B, µ).
Suppose that ∫ fn dµ is a bounded sequence of real numbers (i.e. there exists M > 0 such
that |∫ fn dµ| ≤ M for all n). Then f(x) = lim_{n→∞} fn(x) exists µ-a.e. Moreover, f is
integrable and
∫ f dµ = lim_{n→∞} ∫ fn dµ.
Theorem 3.1.3 (Dominated Convergence Theorem)
Suppose that g : X → R is integrable and that fn : X → R is a sequence of measurable
functions with |fn| ≤ g µ-a.e. and lim_{n→∞} fn = f µ-a.e. Then f is integrable and
lim_{n→∞} ∫ fn dµ = ∫ f dµ.
Remark. Both the Monotone Convergence Theorem and the Dominated Convergence
Theorem fail for Riemann integration.
§3.2
Invariant measures
We are now in a position to study dynamical systems. Let (X, B, µ) be a probability space.
Let T : X → X be a dynamical system. If B ∈ B then we define
T −1 B = {x ∈ X | T (x) ∈ B},
that is, T −1 B is the pre-image of B under T .
²An inner product ⟨·, ·⟩ : H × H → C on a complex vector space H is a function such that: (i) ⟨v, v⟩ ≥ 0
for all v ∈ H, with equality if and only if v = 0, (ii) ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩, and (iii) for
each v ∈ H, u ↦ ⟨u, v⟩ is linear. An inner product determines a norm by setting ‖v‖ = ⟨v, v⟩^{1/2}. A norm
determines a metric by setting d(u, v) = ‖u − v‖. We say that H is a Hilbert space if the vector space H is
complete with respect to the metric induced from the inner product.
Remark. Note that we do not have to assume that T is a bijection for this definition to
make sense. For example, let T (x) = 2x mod 1 be the doubling map on [0, 1]. Then T is
not a bijection. One can easily check that, for example, T −1 (0, 1/2) = (0, 1/4) ∪ (1/2, 3/4).
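The computation of T⁻¹(0, 1/2) in this remark can be spot-checked by sampling; a small Python sketch (the grid of sample points is an arbitrary choice):

```python
# Sampling check of T^{-1}(0, 1/2) = (0, 1/4) u (1/2, 3/4) for the
# doubling map T(x) = 2x mod 1.
def T(x):
    return (2 * x) % 1.0

for i in range(1, 1000):
    x = i / 1000.0
    in_preimage = 0.0 < T(x) < 0.5                  # x lies in T^{-1}(0, 1/2)
    in_union = (0.0 < x < 0.25) or (0.5 < x < 0.75)  # the claimed union
    assert in_preimage == in_union
```

Each point of (0, 1/2) has exactly two preimages, one in each half of the circle, which is why the preimage splits into two intervals of half the length.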
Definition. A transformation T : X → X is said to be measurable if T −1 B ∈ B for all
B ∈ B.
Remark. We will often work with compact metric spaces X equipped with the Borel
σ-algebra. In this setting, any continuous transformation is measurable.
Remark. Suppose that A is an algebra of sets that generates the σ-algebra B. One can
show that if T −1 A ∈ B for all A ∈ A then T is measurable.
Definition. We say that T is a measure-preserving transformation (m.p.t. for short) or,
equivalently, µ is said to be a T -invariant measure, if µ(T −1 B) = µ(B) for all B ∈ B.
§3.3
Using the Hahn-Kolmogorov Extension Theorem to prove invariance
Recall the Hahn-Kolmogorov Extension Theorem:
Theorem 3.3.1 (Hahn-Kolmogorov Extension Theorem)
Let A be an algebra of subsets of X and let B(A) denote the σ-algebra generated by A.
Suppose that µ : A → [0, 1] satisfies:
(i) µ(∅) = 0;
(ii) if An ∈ A, n ≥ 1, are pairwise disjoint and if ∪_{n=1}^∞ An ∈ A then
µ(∪_{n=1}^∞ An) = ∑_{n=1}^∞ µ(An).
Then there is a unique probability measure µ : B(A) → [0, 1] which is an extension of
µ : A → [0, 1].
That is, if µ looks like a measure on an algebra A, then it extends uniquely to a measure
defined on the σ-algebra B(A) generated by A.
Corollary 3.3.2
Let A be an algebra of subsets of X. Suppose that µ1 and µ2 are two measures on B(A)
such that µ1 (A) = µ2 (A) for all A ∈ A. Then µ1 = µ2 on B(A).
We shall discuss several examples of dynamical systems and prove that certain naturally
occurring measures are invariant using the Hahn-Kolmogorov Extension Theorem.
Suppose that (X, B, µ) is a probability space and suppose that T : X → X is measurable.
We define a new measure T∗ µ by
T∗µ(B) = µ(T⁻¹B)   (3.3.1)
where B ∈ B. It is straightforward to check that T∗ µ is a probability measure on (X, B, µ)
(see Exercise 3.4). Thus µ is a T -invariant measure if and only if T∗ µ = µ, i.e. T∗ µ and
µ are the same measure. Corollary 3.3.2 says that if two measures agree on an algebra,
then they agree on the σ-algebra generated by that algebra. Hence if we can show that
T∗ µ(A) = µ(A) for all sets A ∈ A for some algebra A that generates B, then T∗ µ = µ, and
so µ is a T -invariant measure.
§3.3.1
The doubling map
Let X = R/Z be the circle, B be the Borel σ-algebra, and let µ denote Lebesgue measure.
Define the doubling map by T (x) = 2x mod 1.
Proposition 3.3.3
Let X = R/Z be the circle, B be the Borel σ-algebra, and let µ denote Lebesgue measure.
Define the doubling map by T (x) = 2x mod 1. Then Lebesgue measure µ is T -invariant.
Proof. Let A denote the algebra of finite unions of intervals. For an interval [a, b] we
have that
T⁻¹[a, b] = {x ∈ R/Z | T(x) ∈ [a, b]} = [a/2, b/2] ∪ [(a + 1)/2, (b + 1)/2].
See Figure 3.3.1.
Figure 3.3.1: The pre-image of an interval under the doubling map
Hence
T∗µ([a, b]) = µ(T⁻¹[a, b])
  = µ([a/2, b/2] ∪ [(a + 1)/2, (b + 1)/2])
  = (b/2 − a/2) + ((b + 1)/2 − (a + 1)/2)
  = b − a = µ([a, b]).
Hence T∗µ = µ on the algebra A. As A generates the Borel σ-algebra, by uniqueness in
the Hahn-Kolmogorov Extension Theorem we see that T∗µ = µ. Hence Lebesgue measure
is T-invariant. □
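Invariance can also be observed empirically: if x is sampled uniformly (i.e. according to Lebesgue measure), the fraction of samples whose image T(x) lands in an interval [a, b] should be close to b − a. A Monte Carlo sketch in Python (the interval and sample size are arbitrary choices):

```python
import random

# Monte Carlo check that Lebesgue measure is invariant under the doubling
# map: the fraction of uniform samples x with T(x) in [a, b] should be
# approximately b - a, i.e. the Lebesgue measure of [a, b].
random.seed(0)
a, b = 0.3, 0.8
N = 200000
hits = sum(1 for _ in range(N) if a <= (2 * random.random()) % 1.0 <= b)
assert abs(hits / N - (b - a)) < 0.01
```

This is exactly the statement µ(T⁻¹[a, b]) = µ([a, b]) read probabilistically: the event {T(x) ∈ [a, b]} is the event {x ∈ T⁻¹[a, b]}.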
§3.3.2
Rotations on a circle
Let X = R/Z be the circle, let B be the Borel σ-algebra and let µ be Lebesgue measure.
Fix α ∈ R. Define T : R/Z → R/Z by T (x) = x + α mod 1. We call T a rotation through
angle α.
One can also regard R/Z as the unit circle K = {z ∈ C | |z| = 1} in the complex plane
via the map t 7→ e2πit . In these co-ordinates, the map T becomes T (e2πiθ ) = e2πiα e2πiθ ,
which is a rotation about the origin through the angle 2πα.
Proposition 3.3.4
Let T : R/Z → R/Z, T (x) = x + α mod 1, be a circle rotation. Then Lebesgue measure is
an invariant measure.
Proof. Let [a, b] ⊂ R/Z be an interval. By the Hahn-Kolmogorov Extension Theorem, if
we can show that T∗ µ([a, b]) = µ([a, b]) then it follows that µ(T −1 B) = µ(B) for all B ∈ B,
hence µ is T -invariant.
Note that T −1 [a, b] = [a − α, b − α] where we interpret the endpoints mod 1. (One needs
to be careful here: if a − α < 0 < b − α then T −1 ([a, b]) = [0, b − α] ∪ [a − α + 1, 1], etc.)
Hence
T∗ µ([a, b]) = µ([a − α, b − α]) = (b − α) − (a − α) = b − a = µ([a, b]).
Hence T∗ µ = µ on the algebra A. As A generates the Borel σ-algebra, by uniqueness in
the Hahn-Kolmogorov Extension Theorem we see that T∗ µ = µ. Hence Lebesgue measure
is T-invariant. □
§3.3.3
The Gauss map
Let X = [0, 1] be the unit interval and let B be the Borel σ-algebra. Define the Gauss map
T : X → X by
T(x) = 1/x mod 1 if x ≠ 0, and T(0) = 0.
See Figure 3.3.2.
Figure 3.3.2: The graph of the Gauss map (note that there are, in fact, infinitely many
branches to the graph, only the first 5 are illustrated)
The Gauss map is very closely related to continued fractions. Recall that if x ∈ (0, 1)
then x has a continued fraction expansion of the form
x = 1/(x0 + 1/(x1 + 1/(x2 + · · ·)))   (3.3.2)
where xj ∈ N. If x is rational then this expansion is finite. One can show that x is irrational
if and only if it has an infinite continued fraction expansion. Moreover, if x is irrational
then it has a unique infinite continued fraction expansion.
If x has continued fraction expansion given by (3.3.2) then
1/x = x0 + 1/(x1 + 1/(x2 + 1/(x3 + · · ·))).
Hence, taking the fractional part, we see that T(x) has continued fraction expansion given
by
T(x) = 1/(x1 + 1/(x2 + 1/(x3 + · · ·))),
i.e. T acts by deleting the zeroth term in the continued fraction expansion of x and then
shifting the remaining digits one place to the left.
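This shift action can be checked numerically; the Python sketch below computes the first few continued fraction digits (by floating-point arithmetic, so only a handful of digits are reliable) and verifies the deletion of the zeroth digit for x = √2 − 1 = [2, 2, 2, . . .]:

```python
from math import floor

# The Gauss map T(x) = 1/x mod 1 deletes the zeroth continued fraction digit.
def gauss(x):
    return (1.0 / x) % 1.0 if x != 0 else 0.0

def cf_digits(x, n):
    """First n continued fraction digits x0, x1, ... of x in (0, 1)."""
    digits = []
    for _ in range(n):
        x = 1.0 / x
        digits.append(floor(x))
        x -= floor(x)
    return digits

x = 2 ** 0.5 - 1                     # sqrt(2) - 1 has digits [2, 2, 2, ...]
assert cf_digits(x, 5) == [2, 2, 2, 2, 2]
assert cf_digits(gauss(x), 4) == cf_digits(x, 5)[1:]   # digits shifted left
```

The loop in `cf_digits` is itself just iteration of the Gauss map, recording the integer part discarded at each step.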
The Gauss map does not preserve Lebesgue measure (see Exercise 3.5). However, it does
preserve Gauss’ measure µ defined by
µ(B) = (1/log 2) ∫_B dx/(1 + x)
(here log denotes the natural logarithm; the factor log 2 is a normalising constant to make
this a probability measure).
Proposition 3.3.5
Gauss’ measure is an invariant measure for the Gauss map.
Proof. It is sufficient to check that µ([a, b]) = µ(T⁻¹[a, b]) for any interval [a, b]. First
note that
T⁻¹[a, b] = ∪_{n=1}^∞ [1/(b + n), 1/(a + n)].
Thus
µ(T⁻¹[a, b]) = (1/log 2) ∑_{n=1}^∞ ∫_{1/(b+n)}^{1/(a+n)} dx/(1 + x)
  = (1/log 2) ∑_{n=1}^∞ [ log(1 + 1/(a + n)) − log(1 + 1/(b + n)) ]
  = (1/log 2) ∑_{n=1}^∞ [ log(a + n + 1) − log(a + n) − log(b + n + 1) + log(b + n) ]
  = lim_{N→∞} (1/log 2) ∑_{n=1}^N [ log(a + n + 1) − log(a + n) − log(b + n + 1) + log(b + n) ]
  = (1/log 2) lim_{N→∞} [ log(a + N + 1) − log(a + 1) − log(b + N + 1) + log(b + 1) ]
  = (1/log 2) [ log(b + 1) − log(a + 1) + lim_{N→∞} log((a + N + 1)/(b + N + 1)) ]
  = (1/log 2) ( log(b + 1) − log(a + 1) )
  = (1/log 2) ∫_a^b dx/(1 + x) = µ([a, b]),
as required. □
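The identity just proved can be checked numerically: using µ([a, b]) = (log(1 + b) − log(1 + a))/log 2 and the description of T⁻¹[a, b] as a union of intervals [1/(b + n), 1/(a + n)], the series over n should sum to µ([a, b]). A Python sketch, truncating the series at a large (arbitrarily chosen) N:

```python
from math import log

# Numerical check that Gauss' measure is preserved by the Gauss map:
# mu([a, b]) = (log(1 + b) - log(1 + a)) / log 2, and T^{-1}[a, b] is the
# disjoint union over n >= 1 of the intervals [1/(b + n), 1/(a + n)].
def mu(a, b):
    return (log(1 + b) - log(1 + a)) / log(2)

a, b = 0.2, 0.7
preimage_measure = sum(mu(1 / (b + n), 1 / (a + n)) for n in range(1, 100000))
assert abs(preimage_measure - mu(a, b)) < 1e-4
```

The truncation error is roughly (b − a)/(N log 2) by the telescoping in the proof, so the tolerance above is comfortable for N = 10⁵.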
§3.3.4
Markov shifts
Let S be a finite set, for example S = {1, 2, . . . , k}, with k ≥ 2. Let
Σ = {x = (xj)_{j=0}^∞ | xj ∈ S}
denote the set of all infinite sequences of symbols chosen from S. Thus a point x in the
phase space Σ is an infinite sequence of symbols x = (x0 , x1 , x2 , . . .).
Define the shift map σ : Σ → Σ by
σ((x0 , x1 , x2 , . . .)) = (x1 , x2 , x3 , . . .)
(equivalently, (σ(x))j = xj+1 ). Thus σ takes a sequence, deletes the zeroth term in this
sequence, and then shifts the remaining terms in the sequence one place to the left.
When constructing a measure µ on the Borel σ-algebra B of [0, 1] we first defined µ on
an algebra A that generates the σ-algebra B and then extended µ to B using the Hahn-Kolmogorov Extension Theorem. In this case, our algebra A was the collection of finite
unions of intervals; thus to define µ on A it was sufficient to define µ on an interval. We
want to use a similar procedure to define measures on Σ. To do this, we first need to define
a metric on Σ, so that it makes sense to talk about the Borel σ-algebra, and then we need
an algebra of subsets that generates the Borel σ-algebra.
Let x, y ∈ Σ. Suppose that x ≠ y. Define n(x, y) = n where xn ≠ yn but xj = yj
for 0 ≤ j ≤ n − 1. Thus n(x, y) is the index of the first place in which the sequences x, y
disagree. For convenience, define n(x, y) = ∞ if x = y. Define
d(x, y) = 1/2^{n(x,y)}.
Thus two sequences x, y are close if they agree for a large number of initial places.
One can show (see Exercise 3.10) that d is a metric on Σ and that the shift map
σ : Σ → Σ is continuous.
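The metric d and the shift σ can be modelled on finite truncations of sequences; the Python sketch below (a simplification: sequences are cut off after finitely many terms) shows how the shift moves the first disagreement one place to the left, which is why σ is continuous (indeed Lipschitz, with d(σx, σy) ≤ 2 d(x, y)):

```python
# The metric d(x, y) = 1/2^{n(x,y)}, with n(x, y) the first index where the
# sequences x and y disagree; sequences are truncated to finite tuples here.
def d(x, y):
    for n, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 1.0 / 2 ** n
    return 0.0   # the sequences agree on the whole (finite) window shown

def shift(x):
    return x[1:]   # the shift map sigma deletes the zeroth term

x = (1, 2, 1, 1, 2, 2)
y = (1, 2, 1, 2, 1, 1)
assert d(x, y) == 1.0 / 2 ** 3              # first disagreement at index 3
assert d(shift(x), shift(y)) == 1.0 / 2 ** 2  # shift moves it to index 2
```

Note the parallel with the Gauss map: both delete a zeroth “digit” and shift the rest, which is why cylinder sets here play the role that intervals play there.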
Fix ij ∈ S, j = 0, 1, . . . , n − 1. We define the cylinder set
[i0, i1, . . . , in−1] = {x = (xj)_{j=0}^∞ ∈ Σ | xj = ij, j = 0, 1, . . . , n − 1}.
That is, the cylinder set [i0 , i1 , . . . , in−1 ] consists of all infinite sequences of symbols from S
that begin i0 , i1 , . . . , in−1 . We call n the rank of the cylinder. Cylinder sets for shifts often
play the same role that intervals do for maps of the unit interval or circle. Let A denote
the algebra of all finite unions of cylinders. Then A generates the Borel σ-algebra B. To
see this we use Proposition 2.4.2. It is sufficient to check that A separates every pair of
distinct points in Σ. Let x = (xj)_{j=0}^∞, y = (yj)_{j=0}^∞ ∈ Σ and suppose that x ≠ y. Then there
exists n ≥ 0 such that xn ≠ yn. Hence x, y are in different cylinders of rank n + 1, and the
claim follows.
We will construct a family of σ-invariant measures on Σ by first constructing them on
cylinders and then extending them to the Borel σ-algebra by using the Hahn-Kolmogorov
Extension Theorem. A k × k matrix P is called a stochastic matrix if
(i) P(i, j) ∈ [0, 1];
(ii) each row of P sums to 1: for each i, ∑_{j=1}^k P(i, j) = 1.
(Here, P (i, j) denotes the (i, j)th entry of the matrix P .)
We say that P is irreducible if for all i, j there exists n > 0 such that P^n(i, j) > 0. We
say that P is aperiodic if there exists n > 0 such that every entry of P^n is strictly positive.
Thus P is irreducible if for every (i, j) there exists an n such that the (i, j)th entry of P^n
is positive, and P is aperiodic if this n can be chosen to be independent of (i, j).
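These definitions can be tested directly by computing matrix powers; a Python sketch (pure Python, with small hand-picked matrices as illustrative examples):

```python
# Checking irreducibility and aperiodicity of a stochastic matrix by
# inspecting the entries of its powers P^n.
def matmul(A, B):
    k = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def power(P, n):
    Q = P
    for _ in range(n - 1):
        Q = matmul(Q, P)
    return Q

# A 2-state chain that always switches state: irreducible but not aperiodic
# (period 2), since P^n alternates between P and the identity matrix.
P = [[0.0, 1.0], [1.0, 0.0]]
assert power(P, 2)[0][0] > 0 and power(P, 3)[0][1] > 0    # irreducible
assert any(x == 0.0 for row in power(P, 5) for x in row)  # never all positive

Q = [[0.5, 0.5], [1.0, 0.0]]   # aperiodic: already Q^2 has all entries > 0
assert all(x > 0 for row in power(Q, 2) for x in row)
```

The alternating example P is exactly the d = 2 case of the partition {1, 2} = S0 ∪ S1 described above.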
Suppose that P is irreducible. Let d be the highest common factor of {n > 0 | P^n(i, i) > 0}
(for irreducible P this is independent of i). One can show that P is aperiodic if and only if
d = 1. We call d the period of P.
In general, if d is the period of an irreducible matrix P then {1, 2, . . . , k} can be partitioned
into d sets, S0, S1, . . . , Sd−1, say, such that P(i, j) > 0 only if i ∈ Sℓ, j ∈ S_{ℓ+1 mod d}.
The matrix P^d restricted to the indices that comprise each set Sj is then aperiodic.
The eigenvalues of aperiodic (or, more generally, irreducible) stochastic matrices are
extremely well-behaved.
Theorem 3.3.6 (Perron-Frobenius Theorem)
Let P be an irreducible stochastic matrix with period d. Then the following statements
hold:
(i) The dth roots of unity are simple eigenvalues for P and all other eigenvalues have
modulus strictly less than 1.
(ii) Let 1 denote the column vector (1, 1, . . . , 1)^T. Then P1 = 1, so that 1 is a right
eigenvector corresponding to the maximal eigenvalue 1. Moreover, there exists a
corresponding left eigenvector p = (p(1), . . . , p(k)) for the eigenvalue 1, that is, pP = p.
The vector p has strictly positive entries p(j) > 0, and we can assume that p is
normalised so that ∑_{j=1}^k p(j) = 1.
(iii) For all i, j ∈ {1, 2, . . . , k}, we have that P^{nd}(i, j) → p(j) as n → ∞.
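For an aperiodic example the theorem can be observed numerically; the Python sketch below finds the left eigenvector p by power iteration (the matrix P is an arbitrary aperiodic choice) and checks it against the exact stationary vector, which for this P is (0.8, 0.2):

```python
# Numerical illustration of the Perron-Frobenius theorem: for an aperiodic
# stochastic matrix, iterating p <- pP from any probability vector converges
# to the normalised left eigenvector p with pP = p.
def mat_vec_left(p, P):
    k = len(P)
    return [sum(p[i] * P[i][j] for i in range(k)) for j in range(k)]

P = [[0.9, 0.1], [0.4, 0.6]]
p = [0.5, 0.5]                 # uniform starting distribution
for _ in range(200):           # power iteration for the left eigenvector
    p = mat_vec_left(p, P)

# Solving pP = p with p(1) + p(2) = 1 gives p = (0.8, 0.2) for this P.
assert abs(p[0] - 0.8) < 1e-9 and abs(p[1] - 0.2) < 1e-9
```

The convergence rate is governed by the second eigenvalue (here 0.5), in line with part (i): all eigenvalues other than 1 have modulus strictly less than 1.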
Proof (not examinable). We prove only the aperiodic case. In this case, the period
d = 1. We must show that 1 is a simple eigenvalue, construct the positive left eigenvector
p, and show that P^n(i, j) → p(j) as n → ∞.
First note that 1 is an eigenvalue of P as P1 = 1; this follows from the fact that, for a
stochastic matrix, the rows sum to 1.
Suppose P has an eigenvalue λ with corresponding eigenvector v. Then Pv = λv. Hence
P^n v = λ^n v. As the entries of P are non-negative we have that
|λ|^n |v| ≤ P^n |v|.
Note that if P is stochastic then so is P^n for any n ≥ 1. As P^n is stochastic, the right-hand
side is a bounded sequence in n. Hence P keeps an eigenvector in a bounded region of C^k.
If |λ| > 1 then |λ|^n |v| → ∞, a contradiction if v ≠ 0. Hence the eigenvalues of P have
modulus less than or equal to 1.
Suppose that Pv = λv and |λ| = 1. Then P^n |v| ≥ |v|. As P is aperiodic, we can choose n such that P^n(i, j) > 0 for all i, j. Hence

Σ_{j=1}^k P^n(i, j)|v(j)| ≥ |v(i)|    (3.3.3)

and we may choose i0 such that |v(i0)| = max{|v(j)| | 1 ≤ j ≤ k}. Also, as P^n is stochastic and P^n(i, j) > 0, we must have that

|v(i0)| ≥ Σ_{j=1}^k P^n(i0, j)|v(j)|    (3.3.4)

as the right-hand side of (3.3.4) is a convex combination of the |v(j)|. Combining this with (3.3.3) at i = i0, and noting that all the weights P^n(i0, j) are strictly positive, we see that |v(j)| = |v(i0)| for every j, 1 ≤ j ≤ k. We can assume, by normalising, that |v(j)| = 1 for all 1 ≤ j ≤ k. Now Pv = λv, i.e. λv(i) = Σ_{j=1}^k P(i, j)v(j), a convex combination of the v(j). As the v(j) all have the same modulus, this can only happen if all of the v(j) are the same. Hence v is a multiple of 1 and λ = 1. So 1 is a simple eigenvalue and there are no other eigenvalues of modulus 1.
Since 1 is a simple eigenvalue, there is a unique (up to scalar multiples) left eigenvector p such that pP = p. As P is non-negative, we have that |p|P ≥ |p|, i.e.

Σ_{i=1}^k |p(i)|P(i, j) ≥ |p(j)|,    (3.3.5)

and summing over j gives

Σ_{i=1}^k |p(i)| ≥ Σ_{j=1}^k |p(j)|

as P is stochastic. Hence we must have equality in (3.3.5), i.e. |p|P = |p|. Hence |p| is a left eigenvector for P. Hence p is a scalar multiple of |p|, so without loss of generality we can assume that p(i) ≥ 0 for all i.
To see that p(i) > 0, choose n such that all of the entries of P^n are positive. Then pP^n = p. Hence p(j) = Σ_{i=1}^k p(i)P^n(i, j). The right-hand side of this expression is a sum of non-negative terms and can only be zero if p(i) = 0 for all 1 ≤ i ≤ k, i.e. if p = 0. Hence all of the entries of p are strictly positive.
We can normalise p and assume that Σ_{j=1}^k p(j) = 1.
Decompose Rk into the sum V0 + V1 of eigenspaces where
V0 = {v | hp, vi = 0}, V1 = span{1}
so that V1 is the eigenspace corresponding to the eigenvalue 1 and V0 is the sum of the
eigenspaces of the remaining eigenvalues. Then P (V1 ) = V1 and P (V0 ) ⊂ V0 . Note that
if w ∈ V0 then P n w → 0 as 1 is not an eigenvalue of P when restricted to V0 and the
eigenvalues of P restricted to V0 have modulus strictly less than 1.
Let v ∈ R^k and write v = c1 + w where ⟨p, w⟩ = 0; taking the inner product with p gives c = ⟨p, v⟩. Then

P^n v = ⟨p, v⟩1 + P^n w.

Hence P^n v → ⟨p, v⟩1 as n → ∞. Taking v = ej = (0, . . . , 0, 1, 0, . . . , 0), the jth standard basis vector, we see that P^n(i, j) → p(j). □
Given an irreducible stochastic matrix P with corresponding normalised left eigenvector
p, we define a Markov measure µP on cylinders by defining
µP ([i0 , i1 , . . . , in−1 ]) = p(i0 )P (i0 , i1 )P (i1 , i2 ) · · · P (in−2 , in−1 ).
We can then extend µP to a probability measure on the Borel σ-algebra of Σ.
Bernoulli measures are particular examples of Markov measures. Let p = (p(1), . . . , p(k)), with p(j) ∈ (0, 1) and Σ_{j=1}^k p(j) = 1, be a probability vector. Define

µp([i0, i1, . . . , in−1]) = p(i0)p(i1) · · · p(in−1)

and then extend to the Borel σ-algebra. We call µp the Bernoulli-p measure.
We can now prove that Markov measures are invariant for shift maps.
Proposition 3.3.7
Let σ : Σ → Σ be a shift map on k symbols. Let P be an irreducible stochastic matrix
with left eigenvector p. Then the Markov measure µP is a σ-invariant measure.
Proof. It is sufficient to prove that µP(σ^{-1}[i0, . . . , in−1]) = µP([i0, . . . , in−1]) for each cylinder [i0, . . . , in−1]. First note that

σ^{-1}[i0, . . . , in−1] = {x ∈ Σ | σ(x) ∈ [i0, . . . , in−1]}
                        = {x ∈ Σ | x = (i, i0, . . . , in−1, . . .), i ∈ {1, 2, . . . , k}}
                        = ⋃_{i=1}^k [i, i0, . . . , in−1].

Hence

µP(σ^{-1}[i0, . . . , in−1]) = µP( ⋃_{i=1}^k [i, i0, . . . , in−1] )
                            = Σ_{i=1}^k µP([i, i0, . . . , in−1])   as this is a disjoint union
                            = Σ_{i=1}^k p(i)P(i, i0)P(i0, i1) · · · P(in−2, in−1)
                            = p(i0)P(i0, i1) · · · P(in−2, in−1)   as pP = p
                            = µP([i0, . . . , in−1]). □
Remark. Bernoulli measures are familiar to you from probability theory. Suppose that
S = {H, T } so that Σ denotes all infinite sequences of Hs and T s. We can think of an
element of Σ as the outcome of an infinite sequence of coin tosses. Suppose that p = (pH , pT )
is a probability vector with corresponding Bernoulli measure µp . Then, for example, the
cylinder set [H, H, T ] denotes the set of (infinite) coin tosses that start H, H, T , and this
set has measure pH pH pT , corresponding to the probability of tossing H, H, T .
Markov measures are similar. Given a stochastic matrix P = (P (i, j)) and a left probability eigenvector p = (p(1), . . . , p(k)) we defined
µP ([i0 , i1 , . . . , in−1 ]) = p(i0 )P (i0 , i1 )P (i1 , i2 ) · · · P (in−2 , in−1 ).
We can regard p(i0 ) as being the probability of outcome i0 . Then we can regard P (i0 , i1 )
as being the probability of outcome i1 , given that the previous outcome was i0 .
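This interpretation is easy to test numerically: on cylinders, the invariance proved in Proposition 3.3.7 reduces to the single identity pP = p. A minimal sketch (the helper name cylinder_measure and the particular matrix P are my own illustrative choices):

```python
def cylinder_measure(p, P, word):
    """Markov measure of the cylinder [i0, ..., i_{n-1}] (0-based states)."""
    mu = p[word[0]]
    for a, b in zip(word, word[1:]):
        mu *= P[a][b]
    return mu

P = [[0.5, 0.5], [0.25, 0.75]]   # a stochastic matrix
p = [1 / 3, 2 / 3]               # its stationary vector: pP = p

word = (0, 1, 1, 0)
# The preimage of a cylinder is the disjoint union over the extra first symbol.
preimage = sum(cylinder_measure(p, P, (i,) + word) for i in range(2))
assert abs(preimage - cylinder_measure(p, P, word)) < 1e-12
```

Replacing the transition rows of P by copies of p turns this into the Bernoulli case of Exercise 3.12.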
§3.4
Exercises
Exercise 3.1
Show that in Weyl’s Criterion (Theorem 1.2.1) one cannot replace the hypothesis in equation (1.2.1) that f is continuous with the hypothesis that f ∈ L1 (R/Z, B, µ) (where µ
denotes Lebesgue measure).
Exercise 3.2
Let X be a compact metric space equipped with the Borel σ-algebra B. Show that a
continuous transformation T : X → X is measurable.
Exercise 3.3
Give an example of a sequence of functions fn ∈ L1([0, 1], B, µ) (µ = Lebesgue measure) such that fn → 0 µ-a.e. but fn does not converge to 0 in L1.
Exercise 3.4
Let (X, B, µ) be a probability space and suppose that T : X → X is measurable. Show
that T∗ µ is a probability measure on (X, B, µ).
Exercise 3.5
(i) Show that the Gauss map does not preserve Lebesgue measure. (That is, find an
example of a Borel set B such that T −1 B and B have different Lebesgue measures.)
(ii) Let µ denote Gauss’ measure and let λ denote Lebesgue measure. Show that if B ∈ B,
the Borel σ-algebra of [0, 1], then
(1/(2 log 2)) λ(B) ≤ µ(B) ≤ (1/log 2) λ(B).    (3.4.1)
Conclude that a set B ∈ B has Lebesgue measure zero if and only if it has Gauss’
measure zero. (Two measures with the same sets of measure zero are said to be
equivalent.)
(iii) Using (3.4.1), show that f ∈ L1 ([0, 1], B, µ) if and only if f ∈ L1 ([0, 1], B, λ).
Exercise 3.6
For an integer k ≥ 2 define T : R/Z → R/Z by T (x) = kx mod 1. Show that T preserves
Lebesgue measure.
Exercise 3.7
Let β > 1 denote the golden ratio (so that β² = β + 1). Define T : [0, 1] → [0, 1] by T(x) = βx mod 1. Show that T does not preserve Lebesgue measure. Define the measure µ by µ(B) = ∫_B k(x) dx where

k(x) = { 1/(1/β + 1/β³)            on [0, 1/β)
       { (1/β) · 1/(1/β + 1/β³)    on [1/β, 1).

By using the Hahn-Kolmogorov Extension Theorem, show that µ is a T-invariant measure.
Exercise 3.8
Define the logistic map T : [0, 1] → [0, 1] by T(x) = 4x(1 − x). Define the measure µ by

µ(B) = (1/π) ∫_B 1/√(x(1 − x)) dx.
(i) Check that µ is a probability measure.
(ii) By using the Hahn-Kolmogorov Extension Theorem, show that µ is a T -invariant
measure.
Exercise 3.9
Define T : [0, 1] → [0, 1] by

T(x) = { n(n + 1)x − n   if x ∈ [1/(n + 1), 1/n),
       { 0               if x = 0.

This is called the Lüroth map.
Show that

Σ_{n=1}^∞ 1/(n(n + 1)) = 1.
Show that T preserves Lebesgue measure.
Exercise 3.10
Let Σ = {x = (xj)_{j=0}^∞ | xj ∈ {1, 2, . . . , k}} denote the shift space on k symbols. For x, y ∈ Σ, define n(x, y) to be the index of the first place in which the two sequences x, y disagree (and write n(x, y) = ∞ if x = y). Define

d(x, y) = 1/2^{n(x,y)}.
(i) Show that d(x, y) is a metric.
(ii) Show that the shift map σ is continuous.
(iii) Show that a cylinder set [i0 , . . . , in−1 ] is both open and closed.
(One can also prove that Σ is compact; we shall use this fact later.)
Exercise 3.11
Show that the matrix

P = (  0    1    0    0    0  )
    ( 1/4   0   3/4   0    0  )
    (  0   1/2   0   1/2   0  )
    (  0    0   3/4   0   1/4 )
    (  0    0    0    1    0  )
is irreducible but not aperiodic. Show that P has period 2. Show that {1, 2, . . . , 5} can be partitioned into two sets S0 ∪ S1 so that P(i, j) > 0 only if i ∈ Sℓ and j ∈ Sℓ+1 mod 2. Show that P², when restricted to the indices in S0 and in S1, is aperiodic.
Determine the eigenvalues of P . Find the unique left probability eigenvector p such
that pP = p.
Exercise 3.12
Show that Bernoulli measures are Markov measures. That is, given a probability vector
p = (p(1), . . . , p(k)), construct a stochastic matrix P such that pP = p. Show that the
corresponding Markov measure is the Bernoulli-p measure.
4. More examples of invariant measures
§4.1
Criteria for invariance
We shall give more examples of invariant measures. Recall that, given a measurable transformation T : X → X of a probability space (X, B, µ), we say that µ is a T -invariant
measure (or, equivalently, T is a measure-preserving transformation) if µ(T −1 B) = µ(B)
for all B ∈ B.
We will need the following characterisations of invariance.
Lemma 4.1.1
Let T : X → X be a measurable transformation of a probability space (X, B, µ). Then the following are equivalent:
(i) T is a measure-preserving transformation;
(ii) for each f ∈ L1(X, B, µ) we have ∫ f dµ = ∫ f ◦ T dµ;
(iii) for each f ∈ L2(X, B, µ) we have ∫ f dµ = ∫ f ◦ T dµ.
Proof. We will use the identity χ_{T^{-1}B} = χB ◦ T; this is straightforward to check, see Exercise 4.1.
We prove that (i) implies (ii). Suppose that T is a measure-preserving transformation. For any characteristic function χB, B ∈ B,

∫ χB dµ = µ(B) = µ(T^{-1}B) = ∫ χ_{T^{-1}B} dµ = ∫ χB ◦ T dµ,

and so the equality holds for any simple function (a finite linear combination of characteristic functions). Given any f ∈ L1(X, B, µ) with f ≥ 0, we can find an increasing sequence of simple functions fn with fn → f pointwise, as n → ∞. For each n we have

∫ fn dµ = ∫ fn ◦ T dµ

and, applying the Monotone Convergence Theorem to both sides, we obtain

∫ f dµ = ∫ f ◦ T dµ.

To extend the result to a general real-valued integrable function f, consider the positive and negative parts. To extend the result to complex-valued integrable functions f, take real and imaginary parts.
That (ii) implies (iii) follows immediately, as L2(X, B, µ) ⊂ L1(X, B, µ).
Finally, we prove that (iii) implies (i). Let B ∈ B. Then χB ∈ L2(X, B, µ) as ∫ |χB|² dµ = ∫ χB dµ = µ(B). Recalling that χB ◦ T = χ_{T^{-1}B} we have that

µ(B) = ∫ χB dµ = ∫ χB ◦ T dµ = ∫ χ_{T^{-1}B} dµ = µ(T^{-1}B)

so that µ is a T-invariant probability measure. □

§4.2
Invariant measures on periodic orbits
Recall that if x ∈ X then we define the Dirac measure δx by

δx(B) = { 1 if x ∈ B,
        { 0 if x ∉ B.

We also recall that if f : X → R then ∫ f dδx = f(x).
Let T : X → X be a measurable dynamical system defined on a measurable space (X, B). Suppose that x = T^n x is a periodic point with period n. Then the probability measure

µ = (1/n) Σ_{j=0}^{n−1} δ_{T^j x}

is T-invariant. This is clear from Lemma 4.1.1, noting that for f ∈ L1(X, B, µ)

∫ f ◦ T dµ = (1/n) (f(Tx) + · · · + f(T^{n−1}x) + f(T^n x))
           = (1/n) (f(x) + f(Tx) + · · · + f(T^{n−1}x))
           = ∫ f dµ,

using the fact that T^n x = x.
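The same cancellation can be checked exactly in code for a concrete example, say the doubling map and its period-2 orbit {1/3, 2/3} (illustrative choices; exact rational arithmetic avoids rounding):

```python
from fractions import Fraction

def T(x):
    """Doubling map x -> 2x mod 1, in exact rational arithmetic."""
    return (2 * x) % 1

x = Fraction(1, 3)
orbit = [x, T(x)]            # the period-2 orbit {1/3, 2/3}
assert T(T(x)) == x

def integrate(f, orbit):
    """Integral of f against mu = (1/n) * sum of Dirac measures on the orbit."""
    return sum(f(y) for y in orbit) / len(orbit)

f = lambda y: y * y + 3      # an arbitrary test function
# Invariance: the integral of f o T equals the integral of f.
assert integrate(lambda y: f(T(y)), orbit) == integrate(f, orbit)
print(integrate(f, orbit))   # → 59/18
```

Composing with T merely permutes the points of the orbit, which is exactly why the two averages agree.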
§4.3
The change of variables formula
The change of variables formula (equivalently, integration by substitution) for (Riemann) integration should be familiar to you. It can be stated in the following way: if u : [a, b] → [c, d] is a differentiable bijection with continuous derivative and f : [c, d] → R is (Riemann) integrable, then f ◦ u : [a, b] → R is (Riemann) integrable and

∫_{u(a)}^{u(b)} f(x) dx = ∫_a^b f(u(x)) u′(x) dx.    (4.3.1)

Allowing for the possibility that u is decreasing (so that u(b) < u(a)), we can rewrite (4.3.1) as

∫_{[c,d]} f(x) dx = ∫_{[a,b]} f(u(x)) |u′(x)| dx.    (4.3.2)

We would like a version of (4.3.2) that holds for (Lebesgue) integrable functions on subsets of R^n, equipped with Lebesgue measure on R^n.
Theorem 4.3.1 (Change of variables formula)
Let B ⊂ Rn be a Borel subset of Rn and suppose that B ⊂ U for some open subset U .
Suppose that u : U → Rn is a diffeomorphism onto its image (i.e. u : U → u(U ) is a
differentiable bijection with differentiable inverse). Then u(B) is a Borel set.
Let µ denote Lebesgue measure on Rn and let f : Rn → C be integrable. Then
∫_{u(B)} f dµ = ∫_B (f ◦ u) |det Du| dµ
where Du denotes the matrix of partial derivatives of u.
There are more sophisticated versions of the change of variables formula that hold for
arbitrary measures on Rn .
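A quick one-dimensional sanity check of (4.3.2) by Riemann sums, taking u(x) = x² on [a, b] = [1, 2] (so [c, d] = [1, 4]) and f(x) = 1/x (illustrative choices; both sides equal log 4):

```python
import math

def riemann(g, a, b, N=200000):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / N
    return sum(g(a + (i + 0.5) * h) for i in range(N)) * h

u = lambda x: x * x           # increasing bijection [1, 2] -> [1, 4]
du = lambda x: 2 * x          # u'(x)
f = lambda x: 1.0 / x

lhs = riemann(f, 1.0, 4.0)                                 # integral of f over [c, d] = u([a, b])
rhs = riemann(lambda x: f(u(x)) * abs(du(x)), 1.0, 2.0)    # substituted integral over [a, b]
assert abs(lhs - rhs) < 1e-6
assert abs(lhs - math.log(4)) < 1e-6   # both sides equal log 4
```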
§4.4
Rotations of a circle
We illustrate how one can use the change of variables formula for integration to prove that
Lebesgue measure is an invariant measure for certain maps on the circle.
Proposition 4.4.1
Fix α ∈ R and define T (x) = x + α mod 1. Then Lebesgue measure µ is T -invariant.
Proof. By Lemma 4.1.1 we need to show that ∫ f ◦ T dµ = ∫ f dµ for every f ∈ L1(X, B, µ).
Recall that we can identify functions on R/Z with 1-periodic functions on R. By using the substitution u(x) = x + α and the change of variables formula for integration we have that

∫ f ◦ T dµ = ∫_0^1 f(Tx) dx = ∫_0^1 f(x + α) dx = ∫_α^{1+α} f(x) dx
           = ∫_α^1 f(x) dx + ∫_1^{1+α} f(x) dx
           = ∫_α^1 f(x) dx + ∫_0^α f(x) dx = ∫_0^1 f(x) dx,

where we have used the fact that ∫_1^{1+α} f(x) dx = ∫_0^α f(x) dx by the periodicity of f. □

§4.5
Toral automorphisms
Let X = R^k/Z^k be the k-dimensional torus. Let A = (a(i, j)) be a k × k matrix with entries in Z and with det A ≠ 0. We can define a linear map R^k → R^k by

(x1, . . . , xk)^T ↦ A (x1, . . . , xk)^T.

For brevity, we shall often abuse this notation by writing this as (x1, . . . , xk) ↦ A(x1, . . . , xk).
Since A is an integer matrix it maps Z^k to itself. We claim that A allows us to define a map

T = TA : X → X : (x1, . . . , xk) + Z^k ↦ A(x1, . . . , xk) + Z^k.

We shall often abuse notation and write T(x1, . . . , xk) = A(x1, . . . , xk) mod 1.
To see that this map is well defined, we need to check that if x + Z^k = y + Z^k then Ax + Z^k = Ay + Z^k. If x, y ∈ R^k give the same point in the torus, then x = y + n for some n ∈ Z^k. Hence Ax = A(y + n) = Ay + An. As A maps Z^k to itself, we see that An ∈ Z^k so that Ax, Ay determine the same point in the torus.
Definition. Let A = (a(i, j)) denote a k × k matrix with integer entries such that det A ≠ 0. Then we call the map TA : R^k/Z^k → R^k/Z^k a linear toral endomorphism.
The map T is not invertible in general. However, if det A = ±1 then A−1 exists and is
an integer matrix. Hence we have a map T −1 given by
T −1 (x1 , . . . , xk ) = A−1 (x1 , . . . , xk ) mod 1.
One can easily check that T −1 is the inverse of T .
Definition. Let A = (a(i, j)) denote a k × k matrix with integer entries such that det A =
±1. Then we call the map TA : Rk /Zk → Rk /Zk a linear toral automorphism.
Remark. The reason for this nomenclature is clear. If TA is either a linear toral endomorphism or linear toral automorphism, then it is an endomorphism or automorphism,
respectively, of the torus regarded as an additive group.
Example. Take A to be the matrix

A = ( 2 1 )
    ( 1 1 )

and define T : R2/Z2 → R2/Z2 to be the induced map:

T(x1, x2) = (2x1 + x2 mod 1, x1 + x2 mod 1).
Then T is a linear toral automorphism and is called Arnold’s CAT map (CAT stands for
‘C’ontinuous ‘A’utomorphism of the ‘T’orus). See Figure 4.5.1.
Definition. Suppose that det A = ±1. Then we call T a hyperbolic toral automorphism
if A has no eigenvalues of modulus 1.
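As an illustration (not part of the notes), one can verify that the CAT map is hyperbolic: its matrix has determinant 1 and eigenvalues (3 ± √5)/2, neither of modulus 1:

```python
import math

A = [[2, 1], [1, 1]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
trace = A[0][0] + A[1][1]
assert det == 1   # so T_A is a linear toral automorphism

# Eigenvalues of a 2x2 matrix are the roots of t^2 - (trace)t + det.
disc = math.sqrt(trace * trace - 4 * det)
eigs = [(trace + disc) / 2, (trace - disc) / 2]
print(eigs)  # ≈ [2.618, 0.382], i.e. (3 ± sqrt(5))/2

# Hyperbolic: no eigenvalue of modulus 1.
assert all(abs(abs(t) - 1) > 1e-9 for t in eigs)
```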
Proposition 4.5.1
Let T be a linear toral automorphism of the k-dimensional torus X = Rk /Zk . Then
Lebesgue measure µ is T -invariant.
Proof. By Lemma 4.1.1(iii) we need to show that ∫ f ◦ T dµ = ∫ f dµ for every f ∈ L2(X, B, µ).
Recall that we can identify functions f : R^k/Z^k → C with functions f : R^k → C that satisfy f(x + n) = f(x) for all n ∈ Z^k. We apply the change of variables formula with the substitution T(x) = Ax. Note that DT(x) = A and |det DT| = 1. Hence, by the change of variables formula,

∫_{R^k/Z^k} f ◦ T dµ = ∫_{R^k/Z^k} (f ◦ T) |det DT| dµ = ∫_{T(R^k/Z^k)} f dµ = ∫_{R^k/Z^k} f dµ. □
We shall see in §5.4.3 that a linear toral endomorphism TA (i.e. where A is a k × k integer matrix with det A ≠ 0) also preserves Lebesgue measure.
Figure 4.5.1: Arnold’s CAT map
§4.6
Exercises
Exercise 4.1
Suppose that T : X → X. Show that χT −1 B = χB ◦ T .
Exercise 4.2
Let T : R/Z → R/Z, T(x) = 2x mod 1, denote the doubling map. Show that the periodic points for T are the points of the form p/(2^n − 1), p = 0, 1, . . . , 2^n − 2, n ≥ 1. Conclude that T has infinitely many invariant measures.
Exercise 4.3
By using the change of variables formula, prove that the doubling map T (x) = 2x mod 1
on R/Z preserves Lebesgue measure.
Exercise 4.4
Fix α ∈ R and define the map T : R2 /Z2 → R2 /Z2 by
T (x, y) = (x + α, x + y).
By using the change of variables formula, prove that Lebesgue measure is T -invariant.
5. Ergodic measures: definition, criteria, and basic examples
§5.1
Introduction
In section 3 we defined what is meant by an invariant measure or, equivalently, what is
meant by a measure-preserving transformation. In this section, we define what is meant by
an ergodic measure. The primary motivation for ergodicity is Birkhoff’s Ergodic Theorem:
if T is an ergodic measure-preserving transformation of the probability space (X, B, µ) then,
for each f ∈ L1 (X, B, µ) we have that
n−1
1X
lim
f (T j x) →
n→∞ n
j=0
Z
f dµ for µ-a.e. x ∈ X.
Checking that a given measure-preserving transformation is ergodic is often a highly nontrivial task and we shall study some methods for proving ergodicity.
§5.2
Ergodicity
We define what it means to say that a measure-preserving transformation is ergodic.
Definition. Let (X, B, µ) be a probability space and let T : X → X be a measure-preserving transformation. We say that T is an ergodic transformation with respect to µ (or that µ is an ergodic measure) if, whenever B ∈ B satisfies T^{-1}B = B, we have that µ(B) = 0 or 1.
Remark. Ergodicity can be viewed as an indecomposability condition. If ergodicity does not hold then we can find a set B ∈ B such that T^{-1}B = B and 0 < µ(B) < 1. We can then split T : X → X into T : B → B and T : X \ B → X \ B with invariant probability measures (1/µ(B)) µ(· ∩ B) and (1/(1 − µ(B))) µ(· ∩ (X \ B)), respectively.
It will sometimes be convenient for us to weaken the condition T −1 B = B to µ(T −1 B△B) =
0, where △ denotes the symmetric difference:
A△B = (A \ B) ∪ (B \ A).
We will often write that A = B µ-a.e. or A = B mod 0 to mean that µ(A△B) = 0.
Remark. It is easy to see that if A = B µ-a.e. then µ(A) = µ(B).
Lemma 5.2.1
Let T be a measure-preserving transformation of the probability space (X, B, µ).
Suppose that B ∈ B is such that µ(T −1 B△B) = 0. Then there exists B ′ ∈ B with
T −1 B ′ = B ′ and µ(B△B ′ ) = 0. (In particular, µ(B) = µ(B ′ ).)
Proof (not examinable). For each n ≥ 0, we have the inclusion

T^{-n}B △ B ⊂ ⋃_{j=0}^{n−1} T^{-(j+1)}B △ T^{-j}B = ⋃_{j=0}^{n−1} T^{-j}(T^{-1}B △ B).

Hence, as T preserves µ,

µ(T^{-n}B △ B) ≤ n µ(T^{-1}B △ B) = 0.

Let

B′ = ⋂_{n=0}^∞ ⋃_{j=n}^∞ T^{-j}B.

We have that

µ( B △ ⋃_{j=n}^∞ T^{-j}B ) ≤ Σ_{j=n}^∞ µ(B △ T^{-j}B) = 0.

Since the sets ⋃_{j=n}^∞ T^{-j}B decrease as n increases, we have that µ(B △ B′) = 0. Also,

T^{-1}B′ = ⋂_{n=0}^∞ ⋃_{j=n+1}^∞ T^{-j}B = ⋂_{n=0}^∞ ⋃_{j=n}^∞ T^{-j}B = B′,

as required. □
Corollary 5.2.2
If T is ergodic and µ(T −1 B△B) = 0 then µ(B) = 0 or 1.
We have the following convenient characterisations of ergodicity.
Proposition 5.2.3
Let T be a measure-preserving transformation of the probability space (X, B, µ). The
following are equivalent:
(i) T is ergodic;
(ii) whenever f ∈ L1 (X, B, µ) satisfies f ◦ T = f µ-a.e. we have that f is constant µ-a.e.
(iii) whenever f ∈ L2 (X, B, µ) satisfies f ◦ T = f µ-a.e. we have that f is constant µ-a.e.
Remark. If f is a constant function then clearly f ◦ T = f . Proposition 5.2.3 says that,
when T is ergodic, the constants are the only T -invariant functions (up to sets of measure
zero).
Proof. We prove that (i) implies (ii). Suppose that T is ergodic. Suppose that f ∈
L1 (X, B, µ) is such that f ◦ T = f µ-a.e. By taking real and imaginary parts, we can
assume without loss of generality that f is real-valued. For k ∈ Z and n ∈ N, define
X(k, n) = { x ∈ X | k/2^n ≤ f(x) < (k + 1)/2^n } = f^{-1}[ k/2^n, (k + 1)/2^n ).
Since f is measurable, we have that X(k, n) ∈ B.
We have that

T^{-1}X(k, n) △ X(k, n) ⊂ {x ∈ X | f(Tx) ≠ f(x)}

so that

µ(T^{-1}X(k, n) △ X(k, n)) = 0.

Hence, as T is ergodic, we have by Corollary 5.2.2 that µ(X(k, n)) = 0 or µ(X(k, n)) = 1.
As f ∈ L1(X, B, µ) is integrable, we have that f is finite almost everywhere. Hence, for each n,

f^{-1}(R) = f^{-1}( ⋃_{k=−∞}^∞ [ k/2^n, (k + 1)/2^n ) ) = ⋃_{k=−∞}^∞ X(k, n)

is equal to X up to a set of measure zero, i.e.

µ( X △ ⋃_{k∈Z} X(k, n) ) = 0;

moreover, this union is disjoint. Hence we have

Σ_{k∈Z} µ(X(k, n)) = µ(X) = 1

and so there is a unique kn for which µ(X(kn, n)) = 1. Let

Y = ⋂_{n=1}^∞ X(kn, n).

Then µ(Y) = 1. Let x, y ∈ Y. Then f(x), f(y) ∈ [ kn/2^n, (kn + 1)/2^n ) for all n ≥ 1. Hence for all n ≥ 1 we have that

|f(x) − f(y)| ≤ 1/2^n.

Hence f(x) = f(y). Hence f is constant on the set Y. Hence f is constant µ-a.e.
That (ii) implies (iii) is clear as if f ∈ L2 (X, B, µ) then f ∈ L1 (X, B, µ).
Finally, we prove that (iii) implies (i). Suppose that B ∈ B is such that T^{-1}B = B. Then χB ∈ L2(X, B, µ) and χB ◦ T(x) = χB(x) for all x ∈ X. Hence χB is constant µ-a.e. Since χB only takes the values 0 and 1, we must have χB = 0 µ-a.e. or χB = 1 µ-a.e. Therefore

µ(B) = ∫_X χB dµ = { 0 if χB = 0 µ-a.e.,
                   { 1 if χB = 1 µ-a.e.

Hence T is ergodic with respect to µ. □

§5.3
Fourier series
We shall give a method for proving that certain transformations of the circle or torus are
ergodic with respect to Lebesgue measure. To do this, we use Proposition 5.2.3 and Fourier
series.
Let X = R/Z denote the unit circle and let f : X → R. (Alternatively, we can think
of f as a periodic function R → R with f (x) = f (x + n) for all n ∈ Z.) Equip X with the
Borel σ-algebra, let µ denote Lebesgue measure and assume that f ∈ L2 (X, B, µ).
We can associate to f its Fourier series

a0/2 + Σ_{n=1}^∞ (an cos 2πnx + bn sin 2πnx),    (5.3.1)

where

an = 2 ∫_0^1 f(x) cos 2πnx dµ,   bn = 2 ∫_0^1 f(x) sin 2πnx dµ.

(Notice that we are not claiming that the series converges; we are just formally associating the Fourier series to f.)
We shall find it more convenient to work with a complex form of the Fourier series and rewrite (5.3.1) as

Σ_{n=−∞}^∞ cn e^{2πinx},    (5.3.2)

where

cn = ∫_0^1 f(x) e^{−2πinx} dµ.

(In particular, c0 = ∫_0^1 f dµ.) We call cn the nth Fourier coefficient.
Remark. That (5.3.2) and (5.3.1) are equivalent follows from the fact that

cos 2πnx = (e^{2πinx} + e^{−2πinx})/2,   sin 2πnx = (e^{2πinx} − e^{−2πinx})/(2i).
One can explain Fourier series by considering a more general construction. Recall that
an inner product on a complex vector space H is a function ⟨·, ·⟩ : H × H → C such that
(i) ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩ for all u, v ∈ H,
(ii) for each v ∈ H, u ↦ ⟨u, v⟩ is linear,
(iii) ⟨v, v⟩ ≥ 0 for all v ∈ H, with equality if and only if v = 0.
Given an inner product, one can define a norm on H by setting ‖v‖ = ⟨v, v⟩^{1/2}. One can then define a metric on H by setting dH(u, v) = ‖u − v‖.
If H is a complex vector space with an inner product ⟨·, ·⟩ such that H is complete with respect to the metric given by the inner product, then we call H a Hilbert space.
Recall that L2(X, B, µ) is a Hilbert space with the inner product

⟨f, g⟩ = ∫ f ḡ dµ.

The metric on L2(X, B, µ) is then given by

d(f, g) = ( ∫ |f − g|² dµ )^{1/2}.
Let H be an infinite dimensional Hilbert space. We say that {ej}_{j=0}^∞ is an orthonormal basis for H if:
(i) ⟨ei, ej⟩ = 0 if i ≠ j and ⟨ei, ej⟩ = 1 if i = j;
(ii) every v ∈ H can be written in the form

v = Σ_{j=0}^∞ cj ej.    (5.3.3)

As (5.3.3) involves an infinite sum, we need to be careful about what convergence means. To make (5.3.3) precise, let sn = Σ_{j=0}^n cj ej denote the nth partial sum. Then (5.3.3) means that ‖v − sn‖ → 0 as n → ∞.
As the vectors {ej}_{j=0}^∞ are orthonormal, taking the inner product of (5.3.3) with ei shows that

⟨v, ei⟩ = ⟨ Σ_{j=0}^∞ cj ej, ei ⟩ = Σ_{j=0}^∞ cj ⟨ej, ei⟩ = ci.
Let X = R/Z be the circle and let B be the Borel σ-algebra. Let µ denote Lebesgue measure. Let en(x) = e^{2πinx}. Then {en}_{n=−∞}^∞ is an orthonormal basis for the Hilbert space L2(X, B, µ). Thus if f ∈ L2(X, B, µ) then we can write

f(x) = Σ_{n=−∞}^∞ cn e^{2πinx}

(in the sense that the sequence of partial sums L2-converges to f) where

cn = ⟨f, en⟩ = ∫ f(x) e^{−2πinx} dµ.    (5.3.4)

If we want to make the dependence of cn on f clear, then we will sometimes write cn(f) for cn.
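Formula (5.3.4) can be checked numerically on a trigonometric polynomial, whose coefficients are known in advance; a uniform N-point Riemann sum computes them exactly for frequencies below N. A sketch (the function f and the helper name are illustrative choices):

```python
import cmath

N = 64  # sample points; the Riemann sum is exact for frequencies |n| < N

def fourier_coeff(f, n):
    """Approximate c_n = integral of f(x) e^{-2 pi i n x} dx by a Riemann sum."""
    return sum(f(j / N) * cmath.exp(-2j * cmath.pi * n * j / N)
               for j in range(N)) / N

# f(x) = 5 + 2 e^{2 pi i x} - 3 e^{-4 pi i x}: c_0 = 5, c_1 = 2, c_{-2} = -3.
f = lambda x: 5 + 2 * cmath.exp(2j * cmath.pi * x) - 3 * cmath.exp(-4j * cmath.pi * x)

assert abs(fourier_coeff(f, 0) - 5) < 1e-9
assert abs(fourier_coeff(f, 1) - 2) < 1e-9
assert abs(fourier_coeff(f, -2) + 3) < 1e-9
assert abs(fourier_coeff(f, 3)) < 1e-9   # every other coefficient vanishes
```

The exactness rests on the orthogonality of the sampled exponentials: Σ_j e^{2πikj/N} vanishes unless k ≡ 0 mod N.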
We shall need the following facts about Fourier coefficients.
Proposition 5.3.1
(i) Let f, g ∈ L2 (X, B, µ). Then f = g µ-a.e. if and only if their Fourier coefficients are
equal, i.e. cn (f ) = cn (g) for all n ∈ Z.
(ii) Let f ∈ L2 (X, B, µ). Then cn → 0 as n → ±∞.
Remark. Proposition 5.3.1(ii) is better known as the Riemann-Lebesgue Lemma.
So far, we have studied Fourier series for functions defined on the circle; a similar construction works for functions defined on the k-dimensional torus. Let X = R^k/Z^k be the k-dimensional torus equipped with the Borel σ-algebra B and let µ denote Lebesgue measure on X. Then L2(X, B, µ) is a Hilbert space when equipped with the inner product

⟨f, g⟩ = ∫ f ḡ dµ.

Let n = (n1, . . . , nk) ∈ Z^k and define en(x) = e^{2πi⟨n,x⟩} where ⟨n, x⟩ = n1x1 + · · · + nkxk. Then {en}_{n∈Z^k} is an orthonormal basis for L2(X, B, µ). Thus we can write f ∈ L2(X, B, µ) as

f(x) = Σ_{n∈Z^k} cn e^{2πi⟨n,x⟩}
in the sense that the sequence of partial sums sN converges in L2(X, B, µ), where

sN(x) = Σ_{n=(n1,...,nk)∈Z^k, |nj|≤N} cn e^{2πi⟨n,x⟩}.

The nth Fourier coefficient is given by

cn = cn(f) = ∫ f(x) e^{−2πi⟨n,x⟩} dµ.
We have the following analogue of Proposition 5.3.1:
Proposition 5.3.2
(i) Let f, g ∈ L2 (X, B, µ). Then f = g µ-a.e. if and only if their Fourier coefficients are
equal.
(ii) Let f ∈ L2(X, B, µ). Let n = (n1, . . . , nk) ∈ Z^k and define ‖n‖ = max_{1≤j≤k} |nj|. Then cn → 0 as ‖n‖ → ∞.
Remark. We could have used any norm on Zk in (ii).
§5.4
Proving ergodicity using Fourier series
In the previous section we studied a number of examples of dynamical systems defined
on the circle or the torus and we proved that Lebesgue measure is invariant. We show
how Proposition 5.2.3 can be used in conjunction with Fourier series to determine whether
Lebesgue measure is ergodic.
Recall that if f ∈ L2 (X, B, µ) then we associate to f the Fourier series
Σ_{n=−∞}^∞ cn(f) e^{2πinx}

where

cn(f) = ∫ f(x) e^{−2πinx} dµ.

If we let sn(x) = Σ_{ℓ=−n}^n cℓ(f) e^{2πiℓx} then ‖sn − f‖2 → 0 as n → ∞.
If T is a measure-preserving transformation then it follows that

‖sn ◦ T − f ◦ T‖2 = ( ∫ |sn ◦ T − f ◦ T|² dµ )^{1/2} = ( ∫ (|sn − f|²) ◦ T dµ )^{1/2}
                  = ( ∫ |sn − f|² dµ )^{1/2} = ‖sn − f‖2 → 0

as n → ∞, where we have used Lemma 4.1.1. By Proposition 5.3.1(i) it follows that, if lim_{n→∞} sn ◦ T is a (possibly infinite) sum of terms of the form e^{2πinx}, then it must be the Fourier series of f ◦ T. In practice, this means that if we take the Fourier series for f(x) and evaluate it at T(x), then we obtain the Fourier series for f(Tx). If f ◦ T = f almost everywhere, then we can compare Fourier coefficients using Proposition 5.3.1(i) to obtain relationships between them, and then show that f must be constant.
A similar method works for Fourier series on the torus, as we shall see.
§5.4.1
Rotations on a circle
Fix α ∈ R and define T : R/Z → R/Z by T (x) = x + α mod 1. We have already seen that T
preserves Lebesgue measure. The following result gives a necessary and sufficient condition
for T to be ergodic.
Theorem 5.4.1
Let T (x) = x + α mod 1.
(i) If α ∈ Q then T is not ergodic with respect to Lebesgue measure.
(ii) If α ∉ Q then T is ergodic with respect to Lebesgue measure.

Proof. Suppose that α ∈ Q and write α = p/q for p, q ∈ Z with q ≠ 0. Define f(x) = e^{2πiqx} ∈ L2(X, B, µ). Then f is not constant but

f(Tx) = e^{2πiq(x+p/q)} = e^{2πi(qx+p)} = e^{2πiqx} = f(x).

Hence T is not ergodic.
Suppose that α ∉ Q and that f ∈ L2(X, B, µ) is such that f ◦ T = f a.e. We want to prove that f is constant. Suppose that f has Fourier series

Σ_{n=−∞}^∞ cn e^{2πinx}.

Then f ◦ T has Fourier series

Σ_{n=−∞}^∞ cn e^{2πinα} e^{2πinx}.

Comparing Fourier coefficients we see that

cn = cn e^{2πinα}

for all n ∈ Z. As α ∉ Q, we have e^{2πinα} ≠ 1 unless n = 0. Hence cn = 0 for n ≠ 0, so f has Fourier series c0, i.e. f is constant a.e. □
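As an illustration of the ergodic averages that motivated §5.1, the orbit averages of f(x) = e^{2πix} under an irrational rotation tend to ∫ f dµ = 0. A quick numerical sketch (α = √2 and the starting point are arbitrary choices):

```python
import cmath, math

alpha = math.sqrt(2)   # an irrational rotation number
x = 0.1                # arbitrary starting point
N = 100000

# Birkhoff average of f(y) = e^{2 pi i y} along the orbit x, x + a, x + 2a, ...
avg = sum(cmath.exp(2j * cmath.pi * (x + j * alpha)) for j in range(N)) / N
print(abs(avg))        # close to 0, the integral of f against Lebesgue measure

assert abs(avg) < 1e-3
```

In fact the average is a geometric sum, so its modulus is at most 2/(N |1 − e^{2πiα}|), which makes the convergence here explicit.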
§5.4.2
The doubling map
Let X = R/Z. Recall that if f ∈ L2 (X, B, µ) has Fourier series
Σ_{n=−∞}^∞ cn e^{2πinx}

then the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)) tells us that cn → 0 as n → ±∞.
Proposition 5.4.2
The doubling map T : X → X defined by T (x) = 2x mod 1 is ergodic with respect to
Lebesgue measure µ.
Proof. Let f ∈ L2(X, B, µ) and suppose that f ◦ T = f µ-a.e. Let f have Fourier series

f(x) = Σ_{n=−∞}^∞ cn e^{2πinx}.

For each p ≥ 0, f ◦ T^p has Fourier series

Σ_{n=−∞}^∞ cn e^{2πin2^p x}.

Comparing Fourier coefficients we see that

cn = c_{2^p n}

for all n ∈ Z and each p = 0, 1, 2, . . .. Suppose that n ≠ 0. Then |2^p n| → ∞ as p → ∞. By the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)), c_{2^p n} → 0 as p → ∞. As c_{2^p n} = cn, we must have that cn = 0 for n ≠ 0. Thus f has Fourier series c0, and so must be equal to a constant a.e. Hence T is ergodic with respect to µ. □
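The coefficient relation used above can be observed numerically: composing with T doubles frequencies, so c_{2n}(f ◦ T) = cn(f) while the odd coefficients of f ◦ T vanish. A sketch under the same Riemann-sum approximation of the coefficients as earlier (names are my own):

```python
import cmath

N = 64

def fourier_coeff(g, n):
    """c_n(g) approximated by an N-point Riemann sum on [0, 1)."""
    return sum(g(j / N) * cmath.exp(-2j * cmath.pi * n * j / N)
               for j in range(N)) / N

f = lambda x: 7 + 4 * cmath.exp(2j * cmath.pi * x)   # c_0 = 7, c_1 = 4
fT = lambda x: f((2 * x) % 1)                        # f composed with the doubling map

assert abs(fourier_coeff(fT, 2) - fourier_coeff(f, 1)) < 1e-9  # c_2(f o T) = c_1(f)
assert abs(fourier_coeff(fT, 1)) < 1e-9                        # odd coefficients vanish
```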
§5.4.3
Toral endomorphisms
The argument for the doubling map can be generalised using higher-dimensional Fourier series to study toral endomorphisms. Let X = R^k/Z^k and let µ denote Lebesgue measure. When T is invertible (and so a linear toral automorphism) we have already seen that Lebesgue measure is an invariant measure; in §7 we shall see that Lebesgue measure is an invariant measure when T is a linear toral endomorphism.
Recall that f ∈ L2(X, B, µ) has Fourier series

Σ_{n∈Z^k} cn e^{2πi⟨n,x⟩},

where n = (n1, . . . , nk), x = (x1, . . . , xk). Define |n| = max_{1≤j≤k} |nj|. Then the Riemann-Lebesgue Lemma tells us that cn → 0 as |n| → ∞.
Let A be a k × k integer matrix with det A 6= 0 and define T : X → X by
T ((x1 , . . . , xk ) + Zk ) = A(x1 , . . . , xk ) + Zk .
Proposition 5.4.3
A linear toral endomorphism T is ergodic with respect to µ if and only if no eigenvalue of
A is a root of unity.
Remark. In particular, hyperbolic toral automorphisms (i.e. det A = ±1 and A has no
eigenvalues of modulus 1) are ergodic with respect to Lebesgue measure.
Proof. Suppose that T is ergodic but, for a contradiction, that A has a pth root of unity
as an eigenvalue. We choose p > 0 to be the least such integer. Then Ap has 1 as an
eigenvalue, and so n(Ap − I) = 0 for some non-zero vector n = (n1 , . . . , nk ) ∈ Rk . Since A
is an integer matrix, we have that Ap − I is an integer matrix, and so we can in fact take
n ∈ Zk . Note that
p
p
e2πihn,A xi = e2πihnA ,xi = e2πihn,xi .
51
MATH4/61112
5. Ergodic measures
This is because, writing x = (x1 , . . . , xk )T ,



a1,1 · · · a1,k
x1

..   ..  = hnA, xi.
hn, Axi = (n1 , . . . , nk )  ...
.  . 
ak,1 · · · ak,k
xk
Define
f (x) =
p−1
X
e2πihn,A
j xi
.
j=0
Then f ∈ L2 (X, B, µ) and is T -invariant. Since T is ergodic, we must have that f is
constant. But the only way in which this can happen is if n = 0, a contradiction.
Conversely suppose that no eigenvalue of A is a root of unity; we prove that T is ergodic
with respect to Lebesgue measure. Suppose that f ∈ L2 (X, B, µ) is T -invariant µ-a.e. We
show that f is constant µ-a.e. Associate to f its Fourier series:
X
cn e2πihn,xi .
n∈Zk
Since f T p = f µ-a.e., for all p > 0, we have that
X
X
p
cn e2πihnA ,xi =
cn e2πihn,xi .
n∈Zk
n∈Zk
Comparing Fourier coefficients we see that, for every $n \in \mathbb{Z}^k$,
$$c_n = c_{nA} = \cdots = c_{nA^p} = \cdots.$$
If $c_n \ne 0$ then there can only be finitely many distinct indices in the above list, for otherwise it would contradict the fact that $c_n \to 0$ as $|n| \to \infty$, by the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)). Hence there exist $q_1 > q_2 \ge 0$ such that $nA^{q_1} = nA^{q_2}$. Letting $p = q_1 - q_2 > 0$ we see that $nA^p = n$. Thus $n$ is either equal to 0 or $n$ is an eigenvector for $A^p$ with eigenvalue 1. In the latter case, $A$ would have a $p$th root of unity as an eigenvalue. Hence $n = 0$. Hence $c_n = 0$ unless $n = 0$, and so $f$ is equal to the constant $c_0$ $\mu$-a.e. Thus $T$ is ergodic. □
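The criterion of Proposition 5.4.3 is easy to test numerically for concrete matrices. The following sketch (our own illustration; the helper names and the cut-offs `max_power` and `tol` are ad hoc, and a floating-point check is of course not a proof) tests whether a $2 \times 2$ integer matrix has a root of unity among its eigenvalues:

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the quadratic formula."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def has_root_of_unity_eigenvalue(M, max_power=24, tol=1e-9):
    """True if some eigenvalue lam of M satisfies lam**p = 1 for some 1 <= p <= max_power."""
    return any(abs(lam ** p - 1) < tol
               for lam in eigenvalues_2x2(*M)
               for p in range(1, max_power + 1))

# The CAT map matrix [[2,1],[1,1]]: eigenvalues (3 +/- sqrt(5))/2, no roots of unity,
# so the induced toral automorphism is ergodic by Proposition 5.4.3.
print(has_root_of_unity_eigenvalue((2, 1, 1, 1)))   # False
# [[0,-1],[1,0]] rotates the plane by 90 degrees: eigenvalues +/- i are 4th roots of
# unity, so the induced toral automorphism is not ergodic.
print(has_root_of_unity_eigenvalue((0, -1, 1, 0)))  # True
```

For the second matrix, $T^4$ is the identity, so non-trivial invariant sets are easy to exhibit directly, consistent with the failure of ergodicity.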
§5.5 Exercises
Exercise 5.1
Suppose that $\alpha \in \mathbb{Q}$. Show directly from the definition that the rotation $T(x) = x + \alpha \bmod 1$ is not ergodic, i.e. find an invariant set $B = T^{-1}B$, $B \in \mathcal{B}$, which has Lebesgue measure $0 < \mu(B) < 1$.
Exercise 5.2
Define $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ by
$$T(x, y) = (x + \alpha, x + y).$$
Suppose that $\alpha \notin \mathbb{Q}$. By using Fourier series, show that $T$ is ergodic with respect to Lebesgue measure.
Exercise 5.3
Let $T : X \to X$ be a measurable transformation of a measurable space $(X, \mathcal{B})$. Suppose that $x = T^n x$ is a periodic point with period $n$. Define the measure $\mu$ supported on the periodic orbit of $x$ by
$$\mu = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x}$$
where $\delta_x$ denotes the Dirac measure at $x$. Show from the definition of ergodicity that $\mu$ is an ergodic measure.
Exercise 5.4
(Part (iv) of this exercise is outside the scope of the course!)
It is easy to construct lots of examples of hyperbolic toral automorphisms (i.e. no eigenvalues of modulus 1; the CAT map is such an example), which must necessarily be ergodic with respect to Lebesgue measure. It is harder to show that there are ergodic toral automorphisms with some eigenvalues of modulus 1.
(i) Show that to have an ergodic toral automorphism of $\mathbb{R}^k/\mathbb{Z}^k$ with an eigenvalue of modulus 1, we must have $k \ge 4$.
Consider the matrix
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 8 & -6 & 8 \end{pmatrix}.$$
(ii) Show that $A$ defines a linear toral automorphism $T_A$ of the 4-dimensional torus $\mathbb{R}^4/\mathbb{Z}^4$.
(iii) Show that $A$ has four eigenvalues, two of which have modulus 1.
(iv) Show that $T_A$ is ergodic with respect to Lebesgue measure. (Hint: you have to show that the two eigenvalues of modulus 1 are not roots of unity, i.e. are not solutions to $\lambda^n - 1 = 0$ for any $n$. The best way to do this is to use results from Galois theory on the irreducibility of polynomials.)
6. Ergodic measures: Using the Hahn-Kolmogorov Extension Theorem to prove ergodicity
§6.1 Introduction
We illustrate a method for proving that a given transformation is ergodic using the Hahn-Kolmogorov Extension Theorem. The key observation is the following technical lemma.
Lemma 6.1.1
Let $(X, \mathcal{B}, \mu)$ be a probability space and suppose that $\mathcal{A} \subset \mathcal{B}$ is an algebra that generates $\mathcal{B}$. Let $B \in \mathcal{B}$. Suppose there exists $K > 0$ such that
$$\mu(B)\mu(I) \le K\mu(B \cap I) \qquad (6.1.1)$$
for all $I \in \mathcal{A}$. Then $\mu(B) = 0$ or 1.
Proof. Let $\varepsilon > 0$. As $\mathcal{A}$ generates $\mathcal{B}$, there exists $I \in \mathcal{A}$ such that $\mu(B^c \triangle I) < \varepsilon$. Hence $|\mu(B^c) - \mu(I)| < \varepsilon$. Moreover, note that $B \cap I \subset B^c \triangle I$, so that $\mu(B \cap I) < \varepsilon$. Hence
$$\mu(B)\mu(B^c) \le \mu(B)(\mu(I) + \varepsilon) \le \mu(B)\mu(I) + \mu(B)\varepsilon \le K\mu(B \cap I) + \varepsilon \le (K+1)\varepsilon.$$
As $\varepsilon > 0$ is arbitrary, it follows that $\mu(B)\mu(B^c) = 0$. Hence $\mu(B) = 0$ or 1. □
Remark. We will often apply Lemma 6.1.1 when $\mathcal{A}$ is an algebra of finite unions of intervals or cylinders. In this case, we need only check that there exists a constant $K > 0$ such that (6.1.1) holds for intervals or cylinders. To see this, let $I = \bigcup_{j=1}^{k} I_j$ be a finite union of pairwise disjoint sets in $\mathcal{A}$. Then if (6.1.1) holds for each $I_j$ then
$$\mu(B)\mu(I) = \mu(B)\mu\left(\bigcup_{j=1}^{k} I_j\right) = \sum_{j=1}^{k} \mu(B)\mu(I_j) \le K \sum_{j=1}^{k} \mu(B \cap I_j) = K\mu\left(B \cap \bigcup_{j=1}^{k} I_j\right) = K\mu(B \cap I).$$
We will also use the change of variables formula for integration. Recall that if $I, J \subset \mathbb{R}$ are intervals, $u : I \to J$ is a differentiable bijection, and $f : J \to \mathbb{R}$ is integrable, then
$$\int_J f(x)\,dx = \int_I f(u(x))|u'(x)|\,dx.$$
§6.2 The doubling map
To illustrate the method, we give another proof that the doubling map is ergodic with
respect to Lebesgue measure. Let X = [0, 1] be the unit interval, let B be the Borel
σ-algebra, and let µ be Lebesgue measure.
Given $x \in [0, 1]$, we can write $x$ as a base 2 'decimal' expansion:
$$x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{2^{j+1}} \qquad (6.2.1)$$
where $x_j \in \{0, 1\}$. Note that
$$T(x) = 2\sum_{j=0}^{\infty} \frac{x_j}{2^{j+1}} \bmod 1 = x_0 + \sum_{j=0}^{\infty} \frac{x_{j+1}}{2^{j+1}} \bmod 1 = \sum_{j=0}^{\infty} \frac{x_{j+1}}{2^{j+1}}.$$
Hence if $x$ has base 2 expansion given by (6.2.1) then $T(x)$ has base 2 expansion given by
$$T(x) = \cdot x_1 x_2 x_3 \ldots,$$
i.e. $T$ deletes the zeroth term in the base 2 expansion of $x$ and shifts the remaining terms one place to the left.
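The digit-shift description is easy to see in code. In the sketch below (the helper name is ours), extracting base 2 digits is itself just repeated application of the doubling map:

```python
def binary_digits(x, n):
    """First n base-2 digits x_0, x_1, ..., x_{n-1} of x in [0, 1)."""
    digits = []
    for _ in range(n):
        x *= 2
        d = int(x)        # d = 1 exactly when doubling carried past 1
        digits.append(d)
        x -= d            # keep the fractional part: this is T(x) = 2x mod 1
    return digits

T = lambda x: (2 * x) % 1.0

x = 0.6875                       # = .10110 in base 2 (a dyadic point, so exact in floats)
print(binary_digits(x, 5))       # [1, 0, 1, 1, 0]
print(binary_digits(T(x), 4))    # [0, 1, 1, 0]: the digits shifted one place to the left
```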
We introduce dyadic intervals or cylinders to be the sets
$$I(i_0, i_1, \ldots, i_{n-1}) = \{x \in [0, 1] \mid x_j = i_j, \ j = 0, \ldots, n-1\}.$$
(So, for example, $I(0) = [0, 1/2]$, $I(1) = [1/2, 1]$, $I(0,0) = [0, 1/4]$, $I(0,1) = [1/4, 1/2]$, etc.) We call $n$ the rank of the cylinder. A dyadic interval is an interval with end-points at $k/2^n$, $(k+1)/2^n$ where $n \ge 1$ and $k \in \{0, 1, \ldots, 2^n - 1\}$.
Let $\mathcal{A}$ denote the algebra of finite unions of cylinders. Then $\mathcal{A}$ generates the Borel σ-algebra. This follows from Proposition 2.4.2 by noting that cylinders are intervals (and so Borel) and that they separate points: if $x, y \in [0, 1]$, $x \ne y$, then they have base 2 expansions that differ at some index, say $x_n \ne y_n$. Hence $x, y$ belong to disjoint cylinders of rank $n$.
Define the maps
$$\phi_0(x) = \frac{x}{2}, \qquad \phi_1(x) = \frac{x+1}{2}.$$
Then φ0 : [0, 1] → I(0) and φ1 : [0, 1] → I(1) are differentiable bijections. Indeed, if
x ∈ [0, 1] has base 2 expansion
x = ·x0 x1 x2 . . .
then φ0 (x) and φ1 (x) have base 2 expansions given by
φ0 (x) = ·0x0 x1 x2 . . . , φ1 (x) = ·1x0 x1 x2 . . . .
Thus φ0 and φ1 act on base 2 expansions as a shift to the right, inserting the digits 0 and
1 in the zeroth place, respectively. Note that T φ0 (x) = x and T φ1 (x) = x for all x ∈ [0, 1].
Given $i_0, i_1, \ldots, i_{n-1} \in \{0, 1\}$, define
$$\phi_{i_0, i_1, \ldots, i_{n-1}} : [0, 1] \to I(i_0, i_1, \ldots, i_{n-1})$$
by
$$\phi_{i_0, i_1, \ldots, i_{n-1}} = \phi_{i_0} \phi_{i_1} \cdots \phi_{i_{n-1}}. \qquad (6.2.2)$$
Thus $\phi_{i_0, i_1, \ldots, i_{n-1}}$ takes the point $x$ with base 2 expansion given by (6.2.1), shifts the digits $n$ places to the right, and inserts the digits $i_0, i_1, \ldots, i_{n-1}$ in the first $n$ places. Note that $T^n \phi_{i_0, i_1, \ldots, i_{n-1}}(x) = x$ for all $x \in [0, 1]$.
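A sketch of these composed inverse branches (our own illustration; the identities are checked only numerically, up to floating-point error):

```python
phi0 = lambda x: x / 2            # inserts digit 0
phi1 = lambda x: (x + 1) / 2      # inserts digit 1
T = lambda x: (2 * x) % 1.0

def phi(digits, x):
    """phi_{i_0,...,i_{n-1}}(x): apply phi_{i_{n-1}} first, then work outwards to phi_{i_0}."""
    for d in reversed(digits):
        x = phi0(x) if d == 0 else phi1(x)
    return x

x = 0.3
y = phi([1, 0, 1], x)
print(5 / 8 <= y <= 6 / 8)        # True: y lies in the cylinder I(1,0,1) = [5/8, 6/8]
z = y
for _ in range(3):
    z = T(z)
print(abs(z - x) < 1e-12)         # True: T^3(phi_{1,0,1}(x)) = x
```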
We are now in a position to prove that $T$ is ergodic with respect to Lebesgue measure. Let $B \in \mathcal{B}$ be such that $T^{-1}B = B$. We must show that $\mu(B) = 0$ or 1. By Lemma 6.1.1, it is sufficient to prove that there exists $K > 0$ such that $\mu(B)\mu(I) \le K\mu(B \cap I)$ for all intervals $I$; in fact, we shall prove that $\mu(B)\mu(I) = \mu(B \cap I)$ for all dyadic intervals $I$.
Note that $T^{-n}B = B$. Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder of rank $n$ and let $\phi = \phi_{i_0, i_1, \ldots, i_{n-1}}$. Then $T^n \phi(x) = x$. Note also that $\mu(I) = 1/2^n$. We will also need the fact that $\phi'(x) = 1/2^n$ (this follows by noting that $\phi_0'(x) = \phi_1'(x) = 1/2$ and differentiating (6.2.2) using the chain rule).
Finally, we observe that
$$\begin{aligned}
\mu(B \cap I) &= \int \chi_{B \cap I}(x)\,dx \\
&= \int \chi_B(x)\chi_I(x)\,dx \\
&= \int_I \chi_B(x)\,dx \\
&= \int_0^1 \chi_B(\phi(x))\phi'(x)\,dx && \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi(x))\phi'(x)\,dx && \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi(x)))\phi'(x)\,dx && \text{as } \chi_{T^{-n}B} = \chi_B \circ T^n \\
&= \int_0^1 \chi_B(x)\phi'(x)\,dx && \text{as } T^n\phi(x) = x \\
&= \frac{1}{2^n}\int_0^1 \chi_B(x)\,dx && \text{as } \phi'(x) = 1/2^n \\
&= \mu(I)\mu(B) && \text{as } \mu(I) = 1/2^n.
\end{aligned}$$
Hence µ(B ∩ I) = µ(B)µ(I) for all sets I in the algebra of cylinders. By Lemma 6.1.1 it
follows that µ(B) = 0 or 1. Hence Lebesgue measure is an ergodic measure for T .
§6.3 The Gauss map
Let $x \in [0, 1]$. If $x$ has continued fraction expansion
$$x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}$$
then for brevity we write $x = [x_0, x_1, x_2, \ldots]$.
Let X = [0, 1] and recall that the Gauss map is defined by T (x) = 1/x mod 1 (with
T defined at 0 by setting T (0) = 0). If x has continued fraction expansion [x0 , x1 , x2 , . . .]
then T (x) has continued fraction expansion [x1 , x2 , . . .]. We have already seen that T leaves
Gauss' measure $\mu$ invariant, where Gauss' measure is defined by
$$\mu(B) = \frac{1}{\log 2} \int_B \frac{1}{1+x}\,dx.$$
We shall find it convenient to swap between Gauss' measure $\mu$ and Lebesgue measure, which we shall denote here by $\lambda$. Recall from Exercise 3.5 that for any set $B \in \mathcal{B}$ we have
$$\frac{1}{2\log 2}\lambda(B) \le \mu(B) \le \frac{1}{\log 2}\lambda(B).$$
Hence µ(B) = 0 if and only if λ(B) = 0. Thus to prove ergodicity it suffices to show that
any T -invariant set B has either λ(B) = 0 or λ(B c ) = 0.
We shall also need some basic facts about continued fractions. Let $x \in (0, 1)$ be irrational and have continued fraction expansion $[x_0, x_1, \ldots]$. For any $t \in [0, 1]$, write
$$[x_0, x_1, \ldots, x_{n-1} + t] = \frac{P_n(x_0, x_1, \ldots, x_{n-1}; t)}{Q_n(x_0, x_1, \ldots, x_{n-1}; t)}$$
where $P_n(x_0, x_1, \ldots, x_{n-1}; t)$ and $Q_n(x_0, x_1, \ldots, x_{n-1}; t)$ are polynomials in $x_0, x_1, \ldots, x_{n-1}$ and $t$. Let $P_n = P_n(x_0, x_1, \ldots, x_{n-1})$, $Q_n = Q_n(x_0, x_1, \ldots, x_{n-1})$ (we suppress the dependence of $P_n$ and $Q_n$ on $x_0, \ldots, x_{n-1}$ for brevity). The following lemma is easily proved using induction.
Lemma 6.3.1
(i) We have
$$P_n(x_0, x_1, \ldots, x_{n-1}; t) = P_n + tP_{n-1}, \qquad Q_n(x_0, x_1, \ldots, x_{n-1}; t) = Q_n + tQ_{n-1},$$
and the following recurrence relations hold:
$$P_{n+1} = x_n P_n + P_{n-1}, \qquad Q_{n+1} = x_n Q_n + Q_{n-1}$$
with initial conditions $P_0 = 0$, $P_1 = 1$, $Q_0 = 1$, $Q_1 = x_0$.
(ii) The following identity holds:
$$Q_n P_{n-1} - Q_{n-1} P_n = (-1)^n.$$
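The recurrences in (i) and the identity in (ii) are easy to check computationally; a small sketch (the digits are chosen arbitrarily):

```python
def convergents(digits):
    """P_0, ..., P_n and Q_0, ..., Q_n from the recurrences of Lemma 6.3.1(i)."""
    P, Q = [0, 1], [1, digits[0]]
    for n, x in enumerate(digits[1:], start=1):
        P.append(x * P[n] + P[n - 1])
        Q.append(x * Q[n] + Q[n - 1])
    return P, Q

P, Q = convergents([1, 2, 3, 4, 5])
print(P)   # [0, 1, 2, 7, 30, 157]
print(Q)   # [1, 1, 3, 10, 43, 225]
# identity (ii): Q_n P_{n-1} - Q_{n-1} P_n = (-1)^n
print(all(Q[n] * P[n - 1] - Q[n - 1] * P[n] == (-1) ** n for n in range(1, len(P))))  # True
```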
Let $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$. Define the cylinder $I(i_0, i_1, \ldots, i_{n-1})$ to be the set of all points $x \in (0, 1)$ whose continued fraction expansion starts with $i_0, \ldots, i_{n-1}$. This is easily seen to be an interval; indeed
$$I(i_0, i_1, \ldots, i_{n-1}) = \{[i_0, i_1, \ldots, i_{n-1} + t] \mid t \in [0, 1)\}.$$
Let $\mathcal{A}$ denote the algebra of finite unions of cylinders. Then $\mathcal{A}$ generates the Borel σ-algebra. (This follows from Proposition 2.4.2: cylinders are clearly Borel sets and they separate points. To see this, note that if $x \ne y$ then they have different continued fraction expansions. Hence there exists $n$ such that $x_n \ne y_n$. Hence $x, y$ are in different cylinders of rank $n$, and these cylinders are disjoint.)
For each $i \in \mathbb{N}$ define the map $\phi_i : [0, 1) \to I(i)$ by
$$\phi_i(x) = \frac{1}{i + x}.$$
Thus if $x$ has continued fraction expansion $[x_0, x_1, \ldots]$ then $\phi_i(x)$ has continued fraction expansion $[i, x_0, x_1, \ldots]$. Clearly $T(\phi_i(x)) = x$ for all $x \in [0, 1)$.
For $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$, define
$$\phi_{i_0, i_1, \ldots, i_{n-1}} = \phi_{i_0} \phi_{i_1} \cdots \phi_{i_{n-1}} : [0, 1) \to I(i_0, i_1, \ldots, i_{n-1}).$$
Then $\phi_{i_0, i_1, \ldots, i_{n-1}}$ takes the continued fraction expansion of $x$, shifts every digit $n$ places to the right, and inserts the digits $i_0, i_1, \ldots, i_{n-1}$ in the first $n$ places. Clearly
$$T^n(\phi_{i_0, i_1, \ldots, i_{n-1}}(x)) = x$$
for all $x \in [0, 1)$.
We first need an estimate on the length of (i.e. the Lebesgue measure of) the cylinder $I(i_0, i_1, \ldots, i_{n-1})$. Note that
$$\phi_{i_0, i_1, \ldots, i_{n-1}}(t) = \frac{P_n(i_0, \ldots, i_{n-1}; t)}{Q_n(i_0, \ldots, i_{n-1}; t)} = \frac{P_n + tP_{n-1}}{Q_n + tQ_{n-1}}.$$
Differentiating this expression with respect to $t$ and using Lemma 6.3.1(ii), we see that
$$|\phi_{i_0, i_1, \ldots, i_{n-1}}'(t)| = \left|\frac{Q_n P_{n-1} - P_n Q_{n-1}}{(Q_n + tQ_{n-1})^2}\right| = \frac{1}{(Q_n + tQ_{n-1})^2}.$$
It follows from the recurrences in Lemma 6.3.1(i) that $Q_{n-1} \le Q_n$, so that $Q_n + Q_{n-1} \le 2Q_n$. Hence
$$\frac{1}{4Q_n^2} \le \frac{1}{(Q_n + Q_{n-1})^2} \le |\phi_{i_0, i_1, \ldots, i_{n-1}}'(t)| \le \frac{1}{Q_n^2}. \qquad (6.3.1)$$
Hence
$$\lambda(I(i_0, i_1, \ldots, i_{n-1})) = \int \chi_{I(i_0, i_1, \ldots, i_{n-1})}(t)\,dt = \int_{I(i_0, i_1, \ldots, i_{n-1})} dt = \int_0^1 |\phi_{i_0, i_1, \ldots, i_{n-1}}'(t)|\,dt, \qquad (6.3.2)$$
where we have used the change of variables formula. Combining (6.3.2) with (6.3.1) we see that
$$\frac{1}{4Q_n^2} \le \lambda(I(i_0, i_1, \ldots, i_{n-1})) \le \frac{1}{Q_n^2}. \qquad (6.3.3)$$
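Since $\phi_{i_0, \ldots, i_{n-1}}$ is monotonic, the length of the cylinder is the distance between its values at $t = 0$ and $t = 1$, which is exactly $1/(Q_n(Q_n + Q_{n-1}))$. This, and the bounds (6.3.3), can be verified with exact rational arithmetic (a sketch with an arbitrarily chosen cylinder):

```python
from fractions import Fraction

def convergents(digits):
    # the recurrences of Lemma 6.3.1(i)
    P, Q = [0, 1], [1, digits[0]]
    for n, x in enumerate(digits[1:], start=1):
        P.append(x * P[n] + P[n - 1])
        Q.append(x * Q[n] + Q[n - 1])
    return P, Q

P, Q = convergents([3, 1, 4, 1])
Pn, Pm, Qn, Qm = P[-1], P[-2], Q[-1], Q[-2]
# the endpoints of the cylinder are [i_0, ..., i_{n-1} + t] at t = 1 and t = 0
length = abs(Fraction(Pn + Pm, Qn + Qm) - Fraction(Pn, Qn))
print(length == Fraction(1, Qn * (Qn + Qm)))                       # True
print(Fraction(1, 4 * Qn ** 2) <= length <= Fraction(1, Qn ** 2))  # True: the bounds (6.3.3)
```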
We can now prove that the Gauss map is ergodic with respect to Gauss' measure $\mu$. Suppose that $T^{-1}B = B$ where $B \in \mathcal{B}$. Let $I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder. Then
$$\begin{aligned}
\lambda(B \cap I(i_0, i_1, \ldots, i_{n-1})) &= \int_{I(i_0, i_1, \ldots, i_{n-1})} \chi_B(x)\,dx \\
&= \int_0^1 \chi_B(\phi_{i_0, i_1, \ldots, i_{n-1}}(x))|\phi_{i_0, i_1, \ldots, i_{n-1}}'(x)|\,dx && \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi_{i_0, i_1, \ldots, i_{n-1}}(x))|\phi_{i_0, i_1, \ldots, i_{n-1}}'(x)|\,dx && \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi_{i_0, i_1, \ldots, i_{n-1}}(x)))|\phi_{i_0, i_1, \ldots, i_{n-1}}'(x)|\,dx && \text{as } \chi_{T^{-n}B} = \chi_B \circ T^n \\
&= \int_0^1 \chi_B(x)|\phi_{i_0, i_1, \ldots, i_{n-1}}'(x)|\,dx && \text{as } T^n\phi_{i_0, i_1, \ldots, i_{n-1}}(x) = x.
\end{aligned}$$
By (6.3.1) and (6.3.3) it follows that
$$\lambda(B \cap I(i_0, i_1, \ldots, i_{n-1})) \ge \frac{1}{4Q_n^2}\lambda(B) \ge \frac{1}{4}\lambda(B)\lambda(I(i_0, i_1, \ldots, i_{n-1})),$$
so that
$$\lambda(B)\lambda(I(i_0, i_1, \ldots, i_{n-1})) \le 4\lambda(B \cap I(i_0, i_1, \ldots, i_{n-1})).$$
By Lemma 6.1.1 it follows that λ(B) = 0 or λ(B c ) = 0. Hence, as Lebesgue measure and
Gauss’ measure have the same sets of measure zero, it follows that either µ(B) = 0 or
µ(B c ) = 0. Hence T is ergodic with respect to Gauss’ measure.
§6.4 Bernoulli shifts
Let $S = \{1, \ldots, k\}$ be a finite set of symbols and let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in \{1, 2, \ldots, k\}\}$ denote the shift space on $k$ symbols. Let $\sigma : \Sigma \to \Sigma$ denote the left shift map, so that $(\sigma(x))_j = x_{j+1}$.
Recall that we defined the cylinder $[i_0, \ldots, i_{n-1}]$ to be the set of all sequences in $\Sigma$ that start with symbols $i_0, \ldots, i_{n-1}$, that is
$$[i_0, \ldots, i_{n-1}] = \{x = (x_j)_{j=0}^{\infty} \in \Sigma \mid x_j = i_j, \ j = 0, 1, \ldots, n-1\}.$$
Let $p = (p(1), \ldots, p(k))$ be a probability vector (that is, $p(j) > 0$ and $\sum_{j=1}^{k} p(j) = 1$). We defined the Bernoulli measure $\mu_p$ on cylinders by setting
$$\mu_p[i_0, \ldots, i_{n-1}] = p(i_0)p(i_1)\cdots p(i_{n-1}).$$
We have already seen that µp is a σ-invariant measure.
Proposition 6.4.1
Let µp be a Bernoulli measure. Then µp is ergodic.
Proof. We first make the following observation: let $I = [i_0, \ldots, i_{p-1}]$, $J = [j_0, \ldots, j_{q-1}]$ be cylinders of ranks $p$, $q$, respectively. Consider $I \cap \sigma^{-n}J$ where $n \ge p$. Then
$$I \cap \sigma^{-n}J = \{x = (x_j)_{j=0}^{\infty} \in \Sigma \mid x_j = i_j \text{ for } j = 0, 1, \ldots, p-1, \ x_{n+j} = j_j \text{ for } j = 0, 1, \ldots, q-1\} = \bigcup_{x_p, \ldots, x_{n-1}} [i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}],$$
a disjoint union. Hence
$$\begin{aligned}
\mu_p(I \cap \sigma^{-n}J) &= \sum_{x_p, \ldots, x_{n-1}} \mu_p[i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}] \\
&= \sum_{x_p, \ldots, x_{n-1}} p(i_0)p(i_1)\cdots p(i_{p-1})p(x_p)\cdots p(x_{n-1})p(j_0)p(j_1)\cdots p(j_{q-1}) \\
&= p(i_0)p(i_1)\cdots p(i_{p-1})p(j_0)p(j_1)\cdots p(j_{q-1}) && \text{as } \textstyle\sum_{x_p} p(x_p) = \cdots = \sum_{x_{n-1}} p(x_{n-1}) = 1 \\
&= \mu_p(I)\mu_p(J). && (6.4.1)
\end{aligned}$$
Let $B \in \mathcal{B}$ be $\sigma$-invariant. By Lemma 6.1.1 it is sufficient to prove that $\mu_p(B)\mu_p(I) \le \mu_p(B \cap I)$ for each cylinder $I$. Let $\varepsilon > 0$. We first approximate the invariant set $B$ by a finite union of cylinders. By Proposition 2.4.4, we can find a finite disjoint union of cylinders $A = \bigcup_{j=1}^{r} J_j$ such that $\mu_p(B \triangle A) < \varepsilon$. Note that $|\mu_p(A) - \mu_p(B)| < \varepsilon$.
Let $n$ be any integer greater than the rank of $I$. Note that $\sigma^{-n}B \triangle \sigma^{-n}A = \sigma^{-n}(B \triangle A)$. Hence
$$\mu_p(\sigma^{-n}B \triangle \sigma^{-n}A) = \mu_p(\sigma^{-n}(B \triangle A)) = \mu_p(B \triangle A) < \varepsilon,$$
where we have used the facts that $\sigma^{-n}B = B$ and that $\mu_p$ is an invariant measure.
As $A = \bigcup_{j=1}^{r} J_j$ is a finite union of cylinders and $n$ is greater than the rank of $I$, it follows from (6.4.1) that
$$\mu_p(\sigma^{-n}A \cap I) = \mu_p\left(\sigma^{-n}\left(\bigcup_{j=1}^{r} J_j\right) \cap I\right) = \sum_{j=1}^{r} \mu_p(\sigma^{-n}J_j \cap I) = \sum_{j=1}^{r} \mu_p(J_j)\mu_p(I) = \mu_p\left(\bigcup_{j=1}^{r} J_j\right)\mu_p(I) = \mu_p(A)\mu_p(I).$$
Finally, note that $(\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I) \subset (\sigma^{-n}A) \triangle (\sigma^{-n}B)$. Hence $\mu_p((\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I)) < \varepsilon$, so that $\mu_p(\sigma^{-n}A \cap I) < \mu_p(\sigma^{-n}B \cap I) + \varepsilon$. Hence
$$\mu_p(B)\mu_p(I) = \mu_p(\sigma^{-n}B)\mu_p(I) \le \mu_p(\sigma^{-n}A)\mu_p(I) + \varepsilon = \mu_p(\sigma^{-n}A \cap I) + \varepsilon \le \mu_p(\sigma^{-n}B \cap I) + 2\varepsilon = \mu_p(B \cap I) + 2\varepsilon.$$
As $\varepsilon > 0$ is arbitrary, we have that $\mu_p(B)\mu_p(I) \le \mu_p(B \cap I)$ for any cylinder $I$. By Lemma 6.1.1, it follows that $\mu_p(B) = 0$ or 1. Hence $\mu_p$ is ergodic. □
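The cylinder computation at the start of the proof, $\mu_p(I \cap \sigma^{-n}J) = \mu_p(I)\mu_p(J)$, can be verified directly by summing over the finitely many words filling the gap (a sketch; the probability vector, the cylinders and $n$ are arbitrary choices):

```python
from itertools import product

p = {1: 0.2, 2: 0.3, 3: 0.5}        # a probability vector on S = {1, 2, 3}

def mu_p(word):
    out = 1.0
    for s in word:
        out *= p[s]
    return out

I, J, n = (1, 3), (2, 2, 1), 4      # cylinders [1,3] and [2,2,1]; shift by n >= rank of I
# mu_p(I intersect sigma^{-n} J): sum over all fillings of positions rank(I), ..., n-1
total = sum(mu_p(I + gap + J) for gap in product(p, repeat=n - len(I)))
print(abs(total - mu_p(I) * mu_p(J)) < 1e-15)   # True
```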
§6.5 Markov shifts
Let $P$ be an irreducible stochastic $k \times k$ matrix with entries $P(i, j)$. Let $p = (p(1), \ldots, p(k))$ be the unique left probability eigenvector corresponding to the eigenvalue 1, so that $pP = p$. Recall that the Markov measure $\mu_P$ is defined on the Borel σ-algebra by defining it on cylinders in the following way:
$$\mu_P([i_0, i_1, \ldots, i_{n-1}]) = p(i_0)P(i_0, i_1)P(i_1, i_2)\cdots P(i_{n-2}, i_{n-1}).$$
We have seen that µP is an invariant measure for the shift map σ. We can adapt the proof
of Proposition 6.4.1 to show that µP is ergodic.
Proposition 6.5.1
Let P be an irreducible stochastic matrix. Then the corresponding Markov measure µP is
ergodic.
Proof (not examinable). Let $d$ denote the period of $P$.
Let $I = [i_0, \ldots, i_{p-1}]$, $J = [j_0, \ldots, j_{q-1}]$ be cylinders of ranks $p$, $q$, respectively. Consider $I \cap \sigma^{-n}J$ where $n \ge p$. Then
$$I \cap \sigma^{-n}J = \{x \in \Sigma \mid x_j = i_j \text{ for } j = 0, 1, \ldots, p-1, \ x_{n+j} = j_j \text{ for } j = 0, 1, \ldots, q-1\} = \bigcup_{x_p, \ldots, x_{n-1}} [i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}],$$
a disjoint union. Hence
$$\begin{aligned}
\mu_P(I \cap \sigma^{-n}J) &= \sum_{x_p, \ldots, x_{n-1}} \mu_P[i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}] \\
&= \sum_{x_p, \ldots, x_{n-1}} p(i_0)P(i_0, i_1)\cdots P(i_{p-2}, i_{p-1})P(i_{p-1}, x_p)P(x_p, x_{p+1})\cdots P(x_{n-1}, j_0) \times P(j_0, j_1)\cdots P(j_{q-2}, j_{q-1}) \\
&= \mu_P(I)\mu_P(J)\frac{1}{p(j_0)}\sum_{x_p, \ldots, x_{n-1}} P(i_{p-1}, x_p)P(x_p, x_{p+1})\cdots P(x_{n-1}, j_0) \\
&= \mu_P(I)\mu_P(J)\frac{P^{n-p+1}(i_{p-1}, j_0)}{p(j_0)}.
\end{aligned}$$
By the Perron-Frobenius Theorem (Theorem 3.3.6), we know that $P^{nd}(i, j) \to p(j)$ as $n \to \infty$. Hence, letting $n \to \infty$ through an appropriate subsequence, we see that
$$\mu_P(I \cap \sigma^{-n}J) \to \mu_P(I)\mu_P(J). \qquad (6.5.1)$$
The remainder of the proof is almost identical to the proof of Proposition 6.4.1. Let $B \in \mathcal{B}$ be $\sigma$-invariant. By Lemma 6.1.1 it is sufficient to prove that $\mu_P(B)\mu_P(I) \le \mu_P(B \cap I)$ for every cylinder $I$. Let $\varepsilon > 0$. We approximate $B$ by a finite union of cylinders by using Proposition 2.4.4. That is, we can find a finite disjoint union of cylinders $A = \bigcup_{j=1}^{r} J_j$ such that $\mu_P(B \triangle A) < \varepsilon$. Note that $|\mu_P(A) - \mu_P(B)| < \varepsilon$.
Let $n$ be any integer greater than the rank of $I$. Note that $\sigma^{-n}B \triangle \sigma^{-n}A = \sigma^{-n}(B \triangle A)$. Hence
$$\mu_P(\sigma^{-n}B \triangle \sigma^{-n}A) = \mu_P(\sigma^{-n}(B \triangle A)) = \mu_P(B \triangle A) < \varepsilon,$$
where we have used the facts that $\sigma^{-n}B = B$ and that $\mu_P$ is an invariant measure.
As $A = \bigcup_{j=1}^{r} J_j$ is a finite union of cylinders, it follows from (6.5.1) that, by choosing $n$ sufficiently large, we have $\mu_P(J_j)\mu_P(I) \le \mu_P(\sigma^{-n}J_j \cap I) + \varepsilon/r$ for $j = 1, 2, \ldots, r$. Hence
$$\mu_P(\sigma^{-n}A)\mu_P(I) = \sum_{j=1}^{r} \mu_P(\sigma^{-n}J_j)\mu_P(I) = \sum_{j=1}^{r} \mu_P(J_j)\mu_P(I) \le \sum_{j=1}^{r} \left(\mu_P(\sigma^{-n}J_j \cap I) + \frac{\varepsilon}{r}\right) = \mu_P(\sigma^{-n}A \cap I) + \varepsilon.$$
Finally, note that $(\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I) \subset (\sigma^{-n}A) \triangle (\sigma^{-n}B)$. Hence $\mu_P((\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I)) < \varepsilon$, so that $\mu_P(\sigma^{-n}A \cap I) < \mu_P(\sigma^{-n}B \cap I) + \varepsilon$. Hence
$$\mu_P(B)\mu_P(I) = \mu_P(\sigma^{-n}B)\mu_P(I) \le \mu_P(\sigma^{-n}A)\mu_P(I) + \varepsilon \le \mu_P(\sigma^{-n}A \cap I) + 2\varepsilon \le \mu_P(\sigma^{-n}B \cap I) + 3\varepsilon = \mu_P(B \cap I) + 3\varepsilon.$$
As $\varepsilon > 0$ was arbitrary, we have that $\mu_P(B)\mu_P(I) \le \mu_P(B \cap I)$. By Lemma 6.1.1, the result follows. □
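The Perron-Frobenius behaviour that drives (6.5.1) is easy to observe numerically in the aperiodic case $d = 1$: every row of $P^n$ converges to the stationary vector $p$. A sketch with a hand-picked $2 \times 2$ stochastic matrix (the stationary vector below is computed by hand):

```python
# An irreducible, aperiodic stochastic matrix and its stationary probability vector
P = [[0.9, 0.1],
     [0.4, 0.6]]
p = [0.8, 0.2]        # pP = p: 0.8*0.9 + 0.2*0.4 = 0.8 and 0.8*0.1 + 0.2*0.6 = 0.2

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

Pn = P
for _ in range(60):   # Pn is now P^61
    Pn = matmul(Pn, P)
print(all(abs(Pn[i][j] - p[j]) < 1e-12 for i in range(2) for j in range(2)))  # True
```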
§6.6 Exercises
Exercise 6.1
The dynamical system $T : [0, 1] \to [0, 1]$ defined by
$$T(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1/2 \\ 2(1 - x) & \text{if } 1/2 \le x \le 1 \end{cases}$$
is called the tent map.
(i) Prove that T preserves Lebesgue measure.
(ii) Prove that T is ergodic with respect to Lebesgue measure.
Exercise 6.2
Recall that the Lüroth map $T : [0, 1] \to [0, 1]$ is defined by
$$T(x) = \begin{cases} n(n+1)x - n & \text{if } x \in \left(\dfrac{1}{n+1}, \dfrac{1}{n}\right] \\ 0 & \text{if } x = 0. \end{cases}$$
We saw in Exercise 3.9 that Lebesgue measure is a $T$-invariant probability measure. Prove that Lebesgue measure is ergodic.
Exercise 6.3
Prove (using induction on n) Lemma 6.3.1.
Exercise 6.4
Let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in \{0, 1\}\}$ and let $\sigma : \Sigma \to \Sigma$, $(\sigma(x))_j = x_{j+1}$, be the shift map on the space of infinite sequences of two symbols $\{0, 1\}$. Note that $\Sigma$ supports uncountably many different $\sigma$-invariant measures (for example, the Bernoulli-$(p, 1-p)$ measures are all ergodic and all distinct for $p \in (0, 1)$). We will use this observation to prove that the doubling map has uncountably many ergodic measures.
Define $\pi : \Sigma \to \mathbb{R}/\mathbb{Z}$ by
$$\pi(x) = \pi(x_0, x_1, \ldots) = \frac{x_0}{2} + \frac{x_1}{2^2} + \cdots + \frac{x_n}{2^{n+1}} + \cdots.$$
(i) Show that π is continuous.
(ii) Let T : R/Z → R/Z be the doubling map: T (x) = 2x mod 1. Show that π ◦ σ = T ◦ π.
(iii) If $\mu$ is a $\sigma$-invariant probability measure on $\Sigma$, show that $\pi_*\mu$ (where $\pi_*\mu(B) = \mu(\pi^{-1}B)$ for a Borel subset $B \subset \mathbb{R}/\mathbb{Z}$) is a $T$-invariant probability measure on $\mathbb{R}/\mathbb{Z}$. (Lebesgue measure on $\mathbb{R}/\mathbb{Z}$ corresponds to choosing $\mu$ to be the Bernoulli-$(1/2, 1/2)$ measure on $\Sigma$.)
(iv) Show that if µ is an ergodic measure for σ, then π∗ µ is an ergodic measure for T .
(v) Conclude that there are uncountably many different ergodic measures for the doubling
map.
7. Continuous transformations on compact metric spaces
§7.1 Introduction
So far, we have been studying a measurable map T defined on a probability space (X, B, µ).
We have asked whether the given measure µ is invariant or ergodic. In this section, we shift
our focus slightly and consider, for a given transformation T : X → X, the space M (X, T )
of all probability measures that are invariant under T . In order to equip M (X, T ) with some
structure we will need to assume that the underlying space X is itself equipped with some
additional structure other than merely being a measure space. Throughout this section
we will work in the context of X being a compact metric space and T being a continuous
transformation.
§7.2 Probability measures on compact metric spaces
Let X be a compact metric space equipped with the Borel σ-algebra B. (Recall that the
Borel σ-algebra is the smallest σ-algebra that contains all the open subsets of X.)
Let $C(X, \mathbb{R}) = \{f : X \to \mathbb{R} \mid f \text{ is continuous}\}$ denote the space of real-valued continuous functions defined on $X$. Define the uniform norm of $f \in C(X, \mathbb{R})$ by
$$\|f\|_\infty = \sup_{x \in X} |f(x)|.$$
With this norm, C(X, R) is a Banach space.
An important property of $C(X, \mathbb{R})$ that will prove to be useful later on is that it is separable: $C(X, \mathbb{R})$ contains a countable dense subset. Thus we can choose a sequence $\{f_n\}_{n=1}^{\infty} \subset C(X, \mathbb{R})$ such that, for all $f \in C(X, \mathbb{R})$ and all $\varepsilon > 0$, there exists $n$ such that $\|f - f_n\|_\infty < \varepsilon$.
Let M (X) denote the set of all Borel probability measures on (X, B).
It will be very important to have a sensible notion of convergence in $M(X)$; the appropriate notion for us is called weak* convergence. We say that a sequence of probability measures $\mu_n$ weak* converges to $\mu$ as $n \to \infty$ if, for every $f \in C(X, \mathbb{R})$,
$$\int f\,d\mu_n \to \int f\,d\mu \quad \text{as } n \to \infty.$$
If $\mu_n$ weak* converges to $\mu$ then we write $\mu_n \rightharpoonup \mu$. We can make $M(X)$ into a metric space compatible with this definition of convergence by choosing a countable dense subset $\{f_n\}_{n=1}^{\infty} \subset C(X, \mathbb{R})$ and, for $\mu_1, \mu_2 \in M(X)$, setting
$$d_{M(X)}(\mu_1, \mu_2) = \sum_{n=1}^{\infty} \frac{1}{2^n \|f_n\|_\infty} \left|\int f_n\,d\mu_1 - \int f_n\,d\mu_2\right|$$
(we can assume that $f_n \not\equiv 0$ for any $n$). It is easy to check that $\mu_n \rightharpoonup \mu$ if and only if $d_{M(X)}(\mu_n, \mu) \to 0$.
However, we will not need to work with a particular metric: what will be important is
the definition of convergence.
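Weak* convergence can be illustrated numerically. For an irrational circle rotation $T$, the empirical measures $\mu_N = \frac{1}{N}\sum_{j=0}^{N-1}\delta_{T^j x}$ converge weak* to Lebesgue measure (a standard fact about irrational rotations, quoted here only to illustrate the definition), so $\int f\,d\mu_N \to \int f\,dx$ for every continuous $f$. A sketch with arbitrary choices of $\alpha$, $x$ and $f$:

```python
import math

alpha = math.sqrt(2) - 1
T = lambda x: (x + alpha) % 1.0
f = lambda x: math.cos(2 * math.pi * x) ** 2   # test function; its Lebesgue integral is 1/2

def empirical_integral(N, x=0.1):
    """Integral of f against mu_N = (1/N) * (sum of Dirac measures along the orbit of x)."""
    total, z = 0.0, x
    for _ in range(N):
        total += f(z)
        z = T(z)
    return total / N

for N in (10, 1000, 100000):
    print(N, abs(empirical_integral(N) - 0.5))   # the error shrinks as N grows
```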
Remark. Note that with this definition it is not necessarily true that µn (B) → µ(B), as
n → ∞, for B ∈ B.
§7.2.1 The Riesz Representation Theorem
Let $\mu \in M(X)$ be a Borel probability measure. Then we can think of $\mu$ as a functional that acts on $C(X, \mathbb{R})$, that is, we can regard $\mu$ as a map
$$\mu : C(X, \mathbb{R}) \to \mathbb{R} : f \mapsto \int f\,d\mu.$$
We will often write $\mu(f)$ for $\int f\,d\mu$.
Notice that this functional enjoys several natural properties:
(i) the functional defined by $\mu$ is linear: $\mu(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1\mu(f_1) + \lambda_2\mu(f_2)$, where $\lambda_1, \lambda_2 \in \mathbb{R}$ and $f_1, f_2 \in C(X, \mathbb{R})$;
(ii) the functional defined by $\mu$ is bounded: if $f \in C(X, \mathbb{R})$ then $|\mu(f)| \le \|f\|_\infty$;
(iii) if $f \ge 0$ then $\mu(f) \ge 0$ (we say that the functional $\mu$ is positive);
(iv) consider the function $\mathbf{1}$ defined by $\mathbf{1}(x) \equiv 1$ for all $x$; then $\mu(\mathbf{1}) = 1$ (we say that the functional $\mu$ is normalised).
The Riesz Representation Theorem says that the above properties characterise all Borel
probability measures on X. That is, if we have a map w : C(X, R) → R that satisfies
the above four properties, then w must be given by integrating with respect to a Borel
probability measure. This will be a very useful method of constructing measures: we need
only construct bounded positive normalised linear functionals.
Theorem 7.2.1 (Riesz Representation Theorem)
Let $w : C(X, \mathbb{R}) \to \mathbb{R}$ be a functional such that:
(i) $w$ is linear: i.e. $w(\lambda_1 f_1 + \lambda_2 f_2) = \lambda_1 w(f_1) + \lambda_2 w(f_2)$;
(ii) $w$ is bounded: i.e. for all $f \in C(X, \mathbb{R})$ we have $|w(f)| \le \|f\|_\infty$;
(iii) $w$ is positive: i.e. if $f \ge 0$ then $w(f) \ge 0$;
(iv) $w$ is normalised: i.e. $w(\mathbf{1}) = 1$.
Then there exists a Borel probability measure $\mu \in M(X)$ such that
$$w(f) = \int f\,d\mu.$$
Moreover, $\mu$ is unique.
Thus the Riesz Representation Theorem says that “if it looks like integration on continuous
functions, then it is integration with respect to a (unique) Borel probability measure.”
§7.2.2 Properties of $M(X)$
First note that the space $M(X)$ of Borel probability measures on the compact metric space $X$ is non-empty (provided $X \ne \emptyset$). This is because, for each $x \in X$, the Dirac measure $\delta_x$ is a Borel probability measure. Indeed, we have the following result:
Proposition 7.2.2
There is a continuous embedding of $X$ in $M(X)$ given by the map $X \to M(X) : x \mapsto \delta_x$, i.e. if $x_n \to x$ then $\delta_{x_n} \rightharpoonup \delta_x$.
Proof. See Exercise 7.1. □
Recall that a subset $C$ of a vector space is convex if whenever $v_1, v_2 \in C$ and $\alpha \in [0, 1]$ then $\alpha v_1 + (1 - \alpha)v_2 \in C$.
Proposition 7.2.3
The space $M(X)$ is convex.
Proof. Let $\mu_1, \mu_2 \in M(X)$, $\alpha \in [0, 1]$. Then it is easy to check that $\alpha\mu_1 + (1 - \alpha)\mu_2$, defined by
$$(\alpha\mu_1 + (1 - \alpha)\mu_2)(B) = \alpha\mu_1(B) + (1 - \alpha)\mu_2(B),$$
is a Borel probability measure. □
Finally, recall that a metric space K is said to be (sequentially) compact if every sequence of points in K has a convergent subsequence.
Proposition 7.2.4
The space $M(X)$ is weak* compact.
Proof. For convenience, we shall write $\mu(f) = \int f\,d\mu$.
Since $C(X, \mathbb{R})$ is separable, we can choose a countable dense subset of functions $\{f_i\}_{i=1}^{\infty} \subset C(X, \mathbb{R})$. Given a sequence $\mu_n \in M(X)$, we shall first consider the sequence of real numbers $\mu_n(f_1) \in \mathbb{R}$. We have that $|\mu_n(f_1)| \le \|f_1\|_\infty$ for all $n$, so $\mu_n(f_1)$ is a bounded sequence of real numbers. As such, it has a convergent subsequence, $\mu_n^{(1)}(f_1)$ say.
Next we apply the sequence of measures $\mu_n^{(1)}$ to $f_2$ and consider the sequence $\mu_n^{(1)}(f_2) \in \mathbb{R}$. Again, this is a bounded sequence of real numbers and so it has a convergent subsequence $\mu_n^{(2)}(f_2)$.
In this way we obtain, for each $i \ge 1$, nested subsequences $\{\mu_n^{(i)}\} \subset \{\mu_n^{(i-1)}\}$ such that $\mu_n^{(i)}(f_j)$ converges for $1 \le j \le i$. Now consider the diagonal sequence $\mu_n^{(n)}$. Since, for $n \ge i$, $\mu_n^{(n)}$ is a subsequence of $\mu_n^{(i)}$, $\mu_n^{(n)}(f_i)$ converges for every $i \ge 1$.
We can now use the fact that $\{f_i\}$ is dense to show that $\mu_n^{(n)}(f)$ converges for all $f \in C(X, \mathbb{R})$, as follows. For any $\varepsilon > 0$, we can choose $f_i$ such that $\|f - f_i\|_\infty \le \varepsilon$. Since $\mu_n^{(n)}(f_i)$ converges, there exists $N$ such that if $n, m \ge N$ then
$$|\mu_n^{(n)}(f_i) - \mu_m^{(m)}(f_i)| \le \varepsilon.$$
Thus if $n, m \ge N$ we have
$$|\mu_n^{(n)}(f) - \mu_m^{(m)}(f)| \le |\mu_n^{(n)}(f) - \mu_n^{(n)}(f_i)| + |\mu_n^{(n)}(f_i) - \mu_m^{(m)}(f_i)| + |\mu_m^{(m)}(f_i) - \mu_m^{(m)}(f)| \le 3\varepsilon,$$
so $\mu_n^{(n)}(f)$ converges, as required.
To complete the proof, write $w(f) = \lim_{n \to \infty} \mu_n^{(n)}(f)$. We claim that $w$ satisfies the
hypotheses of the Riesz Representation Theorem and so corresponds to integration with
respect to a probability measure.
(i) By construction, $w$ is a linear mapping: $w(\lambda f + \mu g) = \lambda w(f) + \mu w(g)$.
(ii) As $|w(f)| \le \|f\|_\infty$, we see that $w$ is bounded.
(iii) If $f \ge 0$ then it is easy to check that $w(f) \ge 0$. Hence $w$ is positive.
(iv) It is easy to check that $w$ is normalised: $w(\mathbf{1}) = 1$.
Therefore, by the Riesz Representation Theorem, there exists $\mu \in M(X)$ such that $w(f) = \int f\,d\mu$. We then have that $\int f\,d\mu_n^{(n)} \to \int f\,d\mu$, as $n \to \infty$, for all $f \in C(X, \mathbb{R})$, i.e. that $\mu_n^{(n)}$ converges weak* to $\mu$, as $n \to \infty$. □
§7.3 Invariant measures for continuous transformations
Let X be a compact metric space equipped with the Borel σ-algebra and let T : X → X
be a continuous transformation. It is clear that T is measurable.
Given a measure µ, we have already defined the measure T∗ µ by T∗ µ(B) = µ(T −1 B).
If µ is a Borel probability measure, then it is straightforward to check that T∗ µ is a Borel
probability measure. We can think of T∗ as a transformation on M (X), namely:
T∗ : M (X) → M (X), T∗ µ = µ ◦ T −1 .
That is, if B ∈ B then T∗ µ(B) = µ(T −1 B).
The following result tells us how to integrate with respect to T∗ µ.
Lemma 7.3.1
For $f \in L^1(X, \mathcal{B}, \mu)$ we have
$$\int f\,d(T_*\mu) = \int f \circ T\,d\mu.$$
Proof. From the definition, for $B \in \mathcal{B}$,
$$\int \chi_B\,d(T_*\mu) = (T_*\mu)(B) = \mu(T^{-1}B) = \int \chi_{T^{-1}B}\,d\mu = \int \chi_B \circ T\,d\mu.$$
Thus, by linearity of the integral, the result holds for simple functions. If $f \ge 0$ is a positive measurable function then we can choose an increasing sequence of simple functions $f_n$ increasing to $f$ pointwise. We have
$$\int f_n\,d(T_*\mu) = \int f_n \circ T\,d\mu$$
and, applying the Monotone Convergence Theorem (Theorem 3.1.2) to each side, we obtain
$$\int f\,d(T_*\mu) = \int f \circ T\,d\mu.$$
The result extends to an arbitrary real-valued $f \in L^1(X, \mathcal{B}, \mu)$ by considering positive and negative parts, and then to complex-valued integrable functions by taking real and imaginary parts. □
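Lemma 7.3.1 can be sanity-checked numerically on an example where $T_*\mu \ne \mu$: take $T(x) = x^2$ on $[0, 1]$ with $\mu$ Lebesgue measure, for which $T_*\mu$ has density $1/(2\sqrt{x})$ (a standard computation, stated here without proof). Both sides of the lemma, here with the test function $f(x) = x$, can then be approximated by midpoint Riemann sums:

```python
import math

f = lambda x: x
N = 100_000
h = 1.0 / N
mid = [i * h + h / 2 for i in range(N)]                 # midpoint quadrature nodes
rhs = sum(f(x ** 2) for x in mid) * h                   # integral of f o T against Lebesgue
lhs = sum(f(x) / (2 * math.sqrt(x)) for x in mid) * h   # integral of f against T_*(Lebesgue)
# both sides equal the exact value 1/3
print(abs(rhs - 1 / 3) < 1e-6, abs(lhs - 1 / 3) < 1e-6)   # True True
```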
Recall that a measure $\mu$ is said to be $T$-invariant if $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$. Hence $\mu$ is $T$-invariant if and only if $T_*\mu = \mu$. Write
$$M(X, T) = \{\mu \in M(X) \mid T_*\mu = \mu\}$$
to denote the space of all $T$-invariant Borel probability measures.
The following result gives a useful criterion for checking whether a measure is $T$-invariant.
Lemma 7.3.2
Let $T : X \to X$ be a continuous mapping of a compact metric space. The following are equivalent:
(i) $\mu \in M(X, T)$;
(ii) for all $f \in C(X, \mathbb{R})$ we have that
$$\int f \circ T\,d\mu = \int f\,d\mu. \qquad (7.3.1)$$
Proof. We prove (i) implies (ii). Suppose that $\mu \in M(X, T)$, so that $T_*\mu = \mu$. Let $f \in C(X, \mathbb{R})$. Then $f \in L^1(X, \mathcal{B}, \mu)$. Hence by Lemma 7.3.1, for any $f \in C(X, \mathbb{R})$ we have
$$\int f \circ T\,d\mu = \int f\,d(T_*\mu) = \int f\,d\mu.$$
Conversely, Lemma 7.3.1 allows us to write (7.3.1) as: $\mu(f) = (T_*\mu)(f)$ for all $f \in C(X, \mathbb{R})$. Hence $\mu$ and $T_*\mu$ determine the same linear functional on $C(X, \mathbb{R})$. By uniqueness in the Riesz Representation Theorem, we have $T_*\mu = \mu$. □
§7.4 Invariant measures for continuous maps on the torus
We can use Lemma 7.3.2 to prove that a given measure is invariant for certain dynamical systems. We first note that we need only check (7.3.1) for a dense set of continuous functions.
Lemma 7.4.1
Suppose that $S \subset C(X, \mathbb{R})$ is a uniformly dense subset of functions (that is, for all $f \in C(X, \mathbb{R})$ and all $\varepsilon > 0$ there exists $g \in S$ such that $\|f - g\|_\infty < \varepsilon$). Suppose that $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$. Then $\int f \circ T\,d\mu = \int f\,d\mu$ for all $f \in C(X, \mathbb{R})$.
Proof. Let $f \in C(X, \mathbb{R})$ and let $\varepsilon > 0$. Choose $g \in S$ such that $\|f - g\|_\infty < \varepsilon$. Then
$$\begin{aligned}
\left|\int f \circ T\,d\mu - \int f\,d\mu\right| &\le \left|\int f \circ T\,d\mu - \int g \circ T\,d\mu\right| + \left|\int g \circ T\,d\mu - \int g\,d\mu\right| + \left|\int g\,d\mu - \int f\,d\mu\right| \\
&\le \int |f \circ T - g \circ T|\,d\mu + \left|\int g \circ T\,d\mu - \int g\,d\mu\right| + \int |f - g|\,d\mu.
\end{aligned}$$
Noting that, as $\|f - g\|_\infty < \varepsilon$, we have that $|f(Tx) - g(Tx)| < \varepsilon$ for all $x$, and that $\int g \circ T\,d\mu = \int g\,d\mu$, we have that
$$\left|\int f \circ T\,d\mu - \int f\,d\mu\right| < 2\varepsilon.$$
As $\varepsilon$ is arbitrary, the result follows. □
Corollary 7.4.2
Let $T$ be a continuous transformation of a compact metric space $X$, equipped with the Borel σ-algebra. Let $\mu$ be a Borel probability measure on $X$. Suppose that $S \subset C(X, \mathbb{R})$ is a uniformly dense subset of functions such that $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$. Then $\mu$ is a $T$-invariant measure.
Proof. This follows immediately from Lemma 7.3.2 and Lemma 7.4.1. □
We show how to use Corollary 7.4.2 by studying some of our examples.
§7.4.1 Circle rotations
Let $T(x) = x + \alpha \bmod 1$ be a circle rotation. We show how to use Corollary 7.4.2 to prove that Lebesgue measure $\mu$ is $T$-invariant.
Let $\ell \in \mathbb{Z}$. We first note that if $\ell \ne 0$ then
$$\int_0^1 e^{2\pi i \ell x}\,dx = \left[\frac{1}{2\pi i \ell} e^{2\pi i \ell x}\right]_0^1 = 0.$$
We also note that if $\ell = 0$ then $\int e^{2\pi i \ell x}\,dx = 1$.
Let $S$ denote the set of trigonometric polynomials, i.e.
$$S = \left\{\sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j x} \;\middle|\; c_j \in \mathbb{R}, \ \ell_j \in \mathbb{Z}\right\}.$$
Then $S$ is uniformly dense in $C(X, \mathbb{R})$ by the Stone-Weierstrass Theorem (Theorem 1.2.2).
Let $g \in S$ be a trigonometric polynomial and write
$$g(x) = \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j x}$$
where $\ell_j = 0$ if and only if $j = 0$. Hence $\int g\,d\mu = c_0$.
Note that
$$g(Tx) = \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j (x + \alpha)} = \sum_{j=0}^{r-1} c_j e^{2\pi i \ell_j \alpha} e^{2\pi i \ell_j x}.$$
Hence

∫ g ◦ T dµ = ∫ Σ_{j=0}^{r−1} c_j e^{2πiℓ_j α} e^{2πiℓ_j x} dµ = Σ_{j=0}^{r−1} c_j e^{2πiℓ_j α} ∫ e^{2πiℓ_j x} dµ

and the only non-zero integral occurs when ℓ_j = 0, i.e. j = 0. We must therefore have that ∫ g ◦ T dµ = c₀.
Hence ∫ g ◦ T dµ = ∫ g dµ for all g ∈ S. It follows from Corollary 7.4.2 that µ is T-invariant.
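The calculation above can be checked numerically. The following sketch (not part of the notes' argument) approximates both integrals for a sample trigonometric polynomial; the rotation angle and the coefficients of g are arbitrary illustrative choices.

```python
import numpy as np

# Numerical sanity check: for the rotation T(x) = x + alpha mod 1 and a
# trigonometric polynomial g, the integrals of g and g o T against Lebesgue
# measure agree (both equal the constant term c_0).
alpha = np.sqrt(2) - 1          # an irrational rotation angle

def g(x):
    # g(x) = 3 + 2 cos(2 pi x) + 5 sin(6 pi x): constant term c_0 = 3
    return 3 + 2 * np.cos(2 * np.pi * x) + 5 * np.sin(6 * np.pi * x)

N = 200_000
x = np.arange(N) / N                     # uniform grid on [0, 1)
int_g = g(x).mean()                      # approximates  ∫ g dµ
int_gT = g((x + alpha) % 1.0).mean()     # approximates  ∫ g ◦ T dµ

print(int_g, int_gT)                     # both are (numerically) 3
```

On a uniform grid the averages of the non-constant exponentials cancel exactly, so both quantities agree with c₀ = 3 to floating-point accuracy.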
§7.4.2 Toral endomorphisms
Let A be a k × k integer matrix with det A 6= 0. Define the linear toral endomorphism
T : Rk /Zk → Rk /Zk by
T ((x1 , . . . , xk ) + Zk ) = A(x1 , . . . , xk ) + Zk .
When T is a linear toral automorphism (i.e. when det A = ±1) we have already seen that Lebesgue measure is invariant. We use Corollary 7.4.2 to prove that Lebesgue measure µ is T-invariant when det A ≠ 0.
For n = (n₁, ..., n_k) ∈ Z^k and x = (x₁, ..., x_k) ∈ R^k define, as before, ⟨n, x⟩ = n₁x₁ + ··· + n_k x_k. Note that

∫ e^{2πi⟨n,x⟩} dµ = ∫ ··· ∫ e^{2πin₁x₁} ··· e^{2πin_k x_k} dx₁ ··· dx_k.

Hence

∫ e^{2πi⟨n,x⟩} dµ = 0 if n ≠ 0, and 1 if n = 0,

where 0 = (0, ..., 0) ∈ Z^k.
Let

S = { Σ_{j=0}^{r−1} c_j e^{2πi⟨n^{(j)}, x⟩} | c_j ∈ R, n^{(j)} = (n₁^{(j)}, ..., n_k^{(j)}) ∈ Z^k }.

By the Stone-Weierstrass Theorem (Theorem 1.2.2), we see that S is uniformly dense in C(R^k/Z^k, R).
Let g ∈ S and write

g(x) = Σ_{j=0}^{r−1} c_j e^{2πi⟨n^{(j)}, x⟩}

where n^{(j)} = 0 if and only if j = 0. Then

∫ g dµ = ∫ Σ_{j=0}^{r−1} c_j e^{2πi⟨n^{(j)}, x⟩} dµ = Σ_{j=0}^{r−1} c_j ∫ e^{2πi⟨n^{(j)}, x⟩} dµ = c₀.

Note that

g(Tx) = Σ_{j=0}^{r−1} c_j e^{2πi⟨n^{(j)}, Ax⟩} = Σ_{j=0}^{r−1} c_j e^{2πi⟨n^{(j)} A, x⟩}.

Hence

∫ g ◦ T dµ = ∫ Σ_{j=0}^{r−1} c_j e^{2πi⟨n^{(j)} A, x⟩} dµ = Σ_{j=0}^{r−1} c_j ∫ e^{2πi⟨n^{(j)} A, x⟩} dµ.

These integrals are zero unless n^{(j)} A = 0. As det A ≠ 0 this happens only when n^{(j)} = 0, i.e. when j = 0. Hence

∫ g ◦ T dµ = c₀ = ∫ g dµ.
Hence by Corollary 7.4.2, µ is a T -invariant measure.
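A numerical sanity check along the same lines (an illustrative sketch; the matrix A and the test function f below are arbitrary choices, not taken from the notes) compares the integrals of f and f ◦ T on the 2-torus:

```python
import numpy as np

# Check numerically that Lebesgue measure on the 2-torus is invariant under
# T(x) = Ax mod 1 for an integer matrix with det A != 0.
A = np.array([[2, 0],
              [0, 3]])      # det A = 6 != 0: a non-invertible toral endomorphism

def f(x, y):
    # a real trigonometric polynomial with constant term 1
    return 1 + np.cos(2 * np.pi * (x + y)) + 0.5 * np.sin(2 * np.pi * (x - 2 * y))

N = 500
u = np.arange(N) / N
x, y = np.meshgrid(u, u)                    # uniform grid on the torus

Tx = (A[0, 0] * x + A[0, 1] * y) % 1.0
Ty = (A[1, 0] * x + A[1, 1] * y) % 1.0

int_f = f(x, y).mean()        # approximates  ∫ f dµ  (the constant term, 1)
int_fT = f(Tx, Ty).mean()     # approximates  ∫ f ◦ T dµ

print(int_f, int_fT)
```

Because the non-constant characters average to zero exactly on a uniform grid, both values equal the constant term to floating-point accuracy.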
Remark. You will notice a strong connection between the above arguments and Fourier series, and you may think that we could take g(x) to be the nth partial sum of the Fourier series for f. However, one needs to take care. Suppose f ∈ C(R^k/Z^k, R) has Fourier series Σ_n c_n(f) e^{2πi⟨n,x⟩}. We need to be careful about what it means for this infinite series to converge. We know that the sequence of partial sums s_n converges in L² to f, but we do not know that the partial sums converge uniformly to f. That is, we know that ‖f − s_n‖₂ → 0, but not necessarily that ‖f − s_n‖_∞ → 0. In fact, in general, it is not true that ‖f − s_n‖_∞ → 0.
However, if one defines σ_n = (1/n) Σ_{j=0}^{n−1} s_j to be the average of the first n partial sums, then it is true that ‖f − σ_n‖_∞ → 0. (This is quite a deep result.)
§7.5 Existence of invariant measures
Given a continuous mapping T : X → X of a compact metric space, it is natural to ask whether invariant measures necessarily exist, i.e., whether M(X, T) ≠ ∅. The next result shows that this is the case.
Theorem 7.5.1
Let T : X → X be a continuous mapping of a compact metric space. Then there exists at
least one T -invariant probability measure.
Proof. Let ν ∈ M(X) be a probability measure (for example, we could take ν to be a Dirac measure). Define the sequence µ_n ∈ M(X) by

µ_n = (1/n) Σ_{j=0}^{n−1} T_*^j ν,

so that, for B ∈ B,

µ_n(B) = (1/n)(ν(B) + ν(T^{−1}B) + ··· + ν(T^{−(n−1)}B)).
Since M(X) is weak* compact, some subsequence µ_{n_k} converges, as k → ∞, to a measure µ ∈ M(X). We shall show that µ ∈ M(X, T). By Lemma 7.3.2, this is equivalent to showing that

∫ f dµ = ∫ f ◦ T dµ for all f ∈ C(X, R).

To see this, first note that f ◦ T − f is continuous. Then

∫ f ◦ T dµ − ∫ f dµ = ∫ (f ◦ T − f) dµ
  = lim_{k→∞} ∫ (f ◦ T − f) dµ_{n_k}
  = lim_{k→∞} ∫ (f ◦ T − f) d( (1/n_k) Σ_{j=0}^{n_k−1} T_*^j ν )
  = lim_{k→∞} (1/n_k) Σ_{j=0}^{n_k−1} ∫ (f ◦ T − f) dT_*^j ν
  = lim_{k→∞} (1/n_k) Σ_{j=0}^{n_k−1} ∫ (f ◦ T^{j+1} − f ◦ T^j) dν
  = lim_{k→∞} (1/n_k) ∫ (f ◦ T^{n_k} − f) dν = 0,

since |(1/n_k) ∫ (f ◦ T^{n_k} − f) dν| ≤ 2‖f‖_∞ / n_k → 0 as k → ∞.
Therefore, µ ∈ M(X, T), as required. □
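The averaging construction in this proof (often called the Krylov–Bogolyubov argument) can be watched numerically. In the sketch below (an illustration, not part of the proof) we start from a Dirac mass ν = δ_{x₀} for the irrational rotation; integrating f against µ_n is then just a Birkhoff average, which settles down to the Lebesgue integral of f. The choices of α, x₀ and f are arbitrary.

```python
import numpy as np

# Krylov-Bogolyubov averages for the rotation T(x) = x + alpha mod 1,
# starting from nu = delta_{x0}:  ∫ f dµ_n = (1/n) Σ_j f(T^j x0).
alpha, x0 = np.sqrt(2) - 1, 0.3

def f(x):
    return np.cos(2 * np.pi * x) ** 2   # ∫ f dLeb = 1/2

n = 100_000
orbit = (x0 + alpha * np.arange(n)) % 1.0    # x0, T x0, ..., T^{n-1} x0
int_f_mu_n = f(orbit).mean()                 # ∫ f dµ_n

print(int_f_mu_n)                            # close to 0.5
```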
We will need the following additional properties of M (X, T ).
Theorem 7.5.2
Let T : X → X be a continuous mapping of a compact metric space. Then M (X, T ) is a
weak∗ compact and convex subset of M (X).
Proof. The fact that M(X, T) is convex is straightforward from the definition.
To see that M(X, T) is weak* compact it is sufficient to show that it is a weak* closed subset of the weak* compact set M(X). Suppose that µ_n ∈ M(X, T) is such that µ_n ⇀ µ ∈ M(X). We need to show that µ ∈ M(X, T). To see this, observe that for any f ∈ C(X, R) we have that

∫ f ◦ T dµ = lim_{n→∞} ∫ f ◦ T dµ_n (as f ◦ T is continuous)
  = lim_{n→∞} ∫ f dµ_n (as µ_n ∈ M(X, T))
  = ∫ f dµ (as µ_n ⇀ µ). □
§7.6 Exercises
Exercise 7.1
Prove Proposition 7.2.2: show that if x_n, x ∈ X and x_n → x then δ_{x_n} ⇀ δ_x.
Exercise 7.2
Prove that T_* : M(X) → M(X) is weak* continuous (i.e. if µ_n ⇀ µ then T_*µ_n ⇀ T_*µ).
Exercise 7.3
Let X be a compact metric space. For µ ∈ M(X) define

‖µ‖ = sup_{f ∈ C(X,R), ‖f‖_∞ ≤ 1} |∫ f dµ|.

We say that µ_n converges strongly to µ if ‖µ_n − µ‖ → 0 as n → ∞. The topology this determines is called the strong topology (or the operator topology).
(i) Show that if µn → µ strongly then µn ⇀ µ in the weak∗ topology.
(ii) Suppose that X is infinite. Show that X ↪ M(X) : x ↦ δ_x is not continuous in the strong topology.
(iii) Prove that ‖δ_x − δ_y‖ = 2 if x ≠ y. (You may use Urysohn's Lemma: Let A and B be disjoint closed subsets of a metric space X. Then there is a continuous function f ∈ C(X, R) such that 0 ≤ f ≤ 1 on X while f ≡ 0 on A and f ≡ 1 on B.)
Hence prove that M (X) is not compact in the strong topology when X is infinite.
Exercise 7.4
Give an example of a sequence of measures µ_n and a set B such that µ_n ⇀ µ but µ_n(B) ↛ µ(B).
Exercise 7.5
Prove that M (X, T ) is convex.
Exercise 7.6
Suppose that S ⊂ C(X, R) is a uniformly dense subset of functions (that is, for all f ∈ C(X, R) and all ε > 0 there exists g ∈ S such that ‖f − g‖_∞ < ε). Let µ_n, µ ∈ M(X). Suppose that ∫ f dµ_n → ∫ f dµ for all f ∈ S. Prove that µ_n ⇀ µ.
Exercise 7.7
Let Σ = {x = (x_j)_{j=0}^∞ | x_j ∈ {0, 1}} denote the shift space on the two symbols 0, 1. Let σ : Σ → Σ, (σ(x))_j = x_{j+1}, denote the shift map.
(i) How many periodic points of period n are there?
(ii) Let Per(n) denote the set of periodic points of period n. Define

µ_n = (1/2^n) Σ_{x ∈ Per(n)} δ_x.

Let i_j ∈ {0, 1}, 0 ≤ j ≤ m − 1, and define the cylinder set

[i₀, i₁, ..., i_{m−1}] = {x = (x_j)_{j=0}^∞ ∈ Σ | x_j = i_j, j = 0, 1, ..., m − 1}.

Let µ denote the Bernoulli-(1/2, 1/2) measure. Prove that

∫ χ_{[i₀,i₁,...,i_{m−1}]} dµ_n → ∫ χ_{[i₀,i₁,...,i_{m−1}]} dµ as n → ∞.
(iii) Prove that χ_{[i₀,i₁,...,i_{m−1}]} is a continuous function.
(iv) Use Exercise 7.6 and the Stone-Weierstrass Theorem (Theorem 1.2.2) to show that
µn ⇀ µ as n → ∞.
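The combinatorics behind Exercise 7.7 can be verified directly by enumeration; the sketch below (an illustration, not a solution) uses the fact that a period-n point of the full 2-shift is determined by its first n symbols, so there are 2^n of them, and for n ≥ m the measure µ_n gives every cylinder of length m exactly the Bernoulli value 2^{−m}.

```python
from itertools import product

# mu_n of a cylinder: the fraction of period-n points of the 2-shift whose
# first len(word) symbols match the given word.
def mu_n_of_cylinder(word, n):
    m = len(word)
    assert n >= m
    count = sum(1 for w in product((0, 1), repeat=n) if w[:m] == tuple(word))
    return count / 2 ** n

word = (0, 1, 1)                          # the cylinder [0, 1, 1], m = 3
for n in (3, 6, 10):
    print(n, mu_n_of_cylinder(word, n))   # each equals 1/8 = mu([0,1,1])
```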
Exercise 7.8
Let X = R³/Z³ be the 3-dimensional torus. Let α ∈ R. Define T : X → X by

T((x, y, z) + Z³) = (α + x, y + x, z + y) + Z³.
Use Corollary 7.4.2 to prove that Lebesgue measure µ is a T -invariant measure.
8. Ergodic measures for continuous transformations
§8.1 Introduction
In the previous section we saw that, given a continuous transformation of a compact metric
space, the set of T -invariant Borel probability measures is non-empty. One can ask a similar
question: is the set of ergodic Borel probability measures non-empty? In this section we
address this question. We let E(X, T ) ⊂ M (X, T ) denote the set of ergodic T -invariant
Borel probability measures on X.
§8.2 Radon-Nikodym derivatives
We will need the concept of Radon-Nikodym derivatives.
Definition. Let µ be a measure on the measurable space (X, B). We say that a measure ν
is absolutely continuous with respect to µ and write ν ≪ µ if ν(B) = 0 whenever µ(B) = 0,
B ∈ B.
Remark. Thus ν is absolutely continuous with respect to µ if sets of µ-measure zero also
have ν-measure zero (but there may be more sets of ν-measure zero).
For example, let f ∈ L¹(X, B, µ) be non-negative and define a measure ν by

ν(B) = ∫_B f dµ.    (8.2.1)

Then ν ≪ µ.
As a particular example, let X = [0, 1] be equipped with the Borel σ-algebra B. Define f : [0, 1] → R by

f(x) = 2x if 0 ≤ x ≤ 1/2, and f(x) = 0 if 1/2 < x ≤ 1.

Let µ be Lebesgue measure and let ν be the measure given by

ν(B) = ∫_B f dµ.

If A ⊂ [1/2, 1] is any Borel set then ν(A) = 0.
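This example can be explored numerically. The sketch below (illustrative only) approximates ν(B) = ∫_B f dµ for intervals B by a Riemann sum: subsets of [1/2, 1] get ν-measure zero even though their Lebesgue measure is positive, so ν ≪ µ but µ is not absolutely continuous with respect to ν.

```python
import numpy as np

# The density of nu with respect to Lebesgue measure from the example above.
def f(x):
    return np.where(x <= 0.5, 2 * x, 0.0)

def nu(a, b, n=1_000_000):
    """Midpoint Riemann sum approximating nu([a, b]) = int_a^b f(x) dx."""
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return f(x).mean() * (b - a)

print(nu(0.0, 0.5))   # int_0^{1/2} 2x dx = 1/4
print(nu(0.6, 0.9))   # 0: a set of positive Lebesgue measure, nu-measure 0
```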
The following theorem says that, essentially, all absolutely continuous measures occur
by the construction in (8.2.1).
Theorem 8.2.1 (Radon-Nikodym)
Let (X, B, µ) be a probability space. Let ν be a measure defined on B and suppose that ν ≪ µ. Then there is a non-negative measurable function f such that

ν(B) = ∫_B f dµ for all B ∈ B.

Moreover, f is unique in the sense that if g is a measurable function with the same property then f = g µ-a.e.
Remark. If ν ≪ µ then it is customary to write dν/dµ for the function given by the Radon-Nikodym theorem, that is,

ν(B) = ∫_B (dν/dµ) dµ.

The following relations are all easy to prove, and indicate why the notation was chosen in this way.

(i) If ν ≪ µ and f is a µ-integrable function then f is ν-integrable and

∫ f dν = ∫ f (dν/dµ) dµ.

(ii) If ν₁, ν₂ ≪ µ then

d(ν₁ + ν₂)/dµ = dν₁/dµ + dν₂/dµ.

(iii) If λ ≪ ν ≪ µ then λ ≪ µ and

dλ/dµ = (dλ/dν)(dν/dµ).

§8.3 Ergodic measures as extreme points

§8.3.1 Extreme points of convex sets
A point in a convex set is called an extreme point if it cannot be written as a non-trivial convex combination of (other) elements of the set. More precisely, µ is an extreme point of M(X, T) if, whenever

µ = αµ₁ + (1 − α)µ₂

with µ₁, µ₂ ∈ M(X, T), 0 < α < 1, then we have µ₁ = µ₂ = µ.
Remarks.
(i) Let Y be the unit square Y = {(x, y) | 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} ⊂ R². Then the extreme points of Y are the corners (0, 0), (0, 1), (1, 0), (1, 1).
(ii) Let Y be the (closed) unit disc Y = {(x, y) | x² + y² ≤ 1} ⊂ R². Then the set of extreme points of Y is precisely the unit circle {(x, y) | x² + y² = 1}.
§8.3.2 Existence of ergodic measures
The next result will allow us to show that ergodic measures for continuous transformations
on compact metric spaces always exist.
Theorem 8.3.1
Let T be a continuous transformation of a compact metric space X equipped with the Borel
σ-algebra B. The following are equivalent:
(i) the T -invariant probability measure µ is ergodic;
(ii) µ is an extreme point of M (X, T ).
Proof. We prove (ii) implies (i): if µ is an extreme point of M(X, T) then it is ergodic.
In fact, we shall prove the contrapositive. Suppose that µ is not ergodic; we show that µ
is not an extreme point of M (X, T ). As µ is not ergodic, there exists B ∈ B such that
T −1 B = B and 0 < µ(B) < 1.
Define probability measures µ₁ and µ₂ on X by

µ₁(A) = µ(A ∩ B)/µ(B),  µ₂(A) = µ(A ∩ (X \ B))/µ(X \ B).

(The assumption that 0 < µ(B) < 1 ensures that the denominators are not equal to zero.) Clearly, µ₁ ≠ µ₂, since µ₁(B) = 1 while µ₂(B) = 0.
Since T^{−1}B = B, we also have T^{−1}(X \ B) = X \ B. Thus we have

µ₁(T^{−1}A) = µ(T^{−1}A ∩ B)/µ(B)
  = µ(T^{−1}A ∩ T^{−1}B)/µ(B)
  = µ(T^{−1}(A ∩ B))/µ(B)
  = µ(A ∩ B)/µ(B)
  = µ₁(A)

and (by the same argument)

µ₂(T^{−1}A) = µ(T^{−1}A ∩ (X \ B))/µ(X \ B) = µ₂(A),

i.e., µ₁ and µ₂ are both in M(X, T).
However, we may write µ as the non-trivial (since 0 < µ(B) < 1) convex combination

µ = µ(B)µ₁ + (1 − µ(B))µ₂,

so that µ is not an extreme point. □
Proof (not examinable). We prove (i) implies (ii). Suppose that µ is ergodic and that
µ = αµ1 + (1 − α)µ2 , with µ1 , µ2 ∈ M (X, T ) and 0 < α < 1. We shall show that µ1 = µ
(so that µ2 = µ, also). This will show that µ is an extreme point of M (X, T ).
If µ(A) = 0 then µ1 (A) = 0, so that µ1 ≪ µ. Therefore the Radon-Nikodym derivative
dµ1 /dµ ≥ 0 exists. One can easily deduce from the statement of the Radon-Nikodym
Theorem that µ1 = µ if and only if dµ1 /dµ = 1 µ-a.e. We shall show that this is indeed
the case by showing that the sets where, respectively, dµ1 /dµ < 1 and dµ1 /dµ > 1 both
have µ-measure zero.
Let

B = {x ∈ X | (dµ₁/dµ)(x) < 1}.
Now

µ₁(B) = ∫_B (dµ₁/dµ) dµ = ∫_{B ∩ T^{−1}B} (dµ₁/dµ) dµ + ∫_{B \ T^{−1}B} (dµ₁/dµ) dµ    (8.3.1)

and

µ₁(T^{−1}B) = ∫_{T^{−1}B} (dµ₁/dµ) dµ = ∫_{B ∩ T^{−1}B} (dµ₁/dµ) dµ + ∫_{T^{−1}B \ B} (dµ₁/dµ) dµ.    (8.3.2)
As µ₁ ∈ M(X, T), we have that µ₁(B) = µ₁(T^{−1}B). Hence, comparing the last summands in (8.3.1) and (8.3.2), we obtain

∫_{B \ T^{−1}B} (dµ₁/dµ) dµ = ∫_{T^{−1}B \ B} (dµ₁/dµ) dµ.    (8.3.3)

In fact, these integrals are taken over sets of the same µ-measure:

µ(T^{−1}B \ B) = µ(T^{−1}B) − µ(T^{−1}B ∩ B) = µ(B) − µ(T^{−1}B ∩ B) = µ(B \ T^{−1}B).
Note that on the left-hand side of (8.3.3) the integrand satisfies dµ₁/dµ < 1, while on the right-hand side dµ₁/dµ ≥ 1. Thus we must have that µ(B \ T^{−1}B) = µ(T^{−1}B \ B) = 0, which is to say that µ(T^{−1}B △ B) = 0, i.e. T^{−1}B = B µ-a.e. Therefore, since µ is ergodic, we have that µ(B) = 0 or µ(B) = 1.
We can rule out the possibility that µ(B) = 1 by observing that if µ(B) = 1 then

1 = µ₁(X) = ∫_X (dµ₁/dµ) dµ = ∫_B (dµ₁/dµ) dµ < µ(B) = 1,

a contradiction. Therefore µ(B) = 0.
If we define

C = {x ∈ X | (dµ₁/dµ)(x) > 1}

then repeating essentially the same argument gives µ(C) = 0.
Hence

µ({x ∈ X | (dµ₁/dµ)(x) = 1}) = µ(X \ (B ∪ C)) = µ(X) − µ(B) − µ(C) = 1,

i.e., dµ₁/dµ = 1 µ-a.e. Therefore µ₁ = µ, as required. □
We can now prove that a continuous transformation of a compact metric space always
has an ergodic measure. To do this, we will show that M (X, T ) has an extreme point.
Theorem 8.3.2
Let T : X → X be a continuous mapping of a compact metric space. Then there exists at
least one ergodic measure in M (X, T ).
Proof. By Theorem 8.3.1, it is equivalent to prove that M (X, T ) has an extreme point.
Choose a countable dense subset {f_i}_{i=0}^∞ of C(X, R). Consider the first function f₀. Since the map

M(X, T) → R : µ ↦ ∫ f₀ dµ

is (weak*) continuous and M(X, T) is compact, there exists (at least one) ν ∈ M(X, T) such that

∫ f₀ dν = sup_{µ ∈ M(X,T)} ∫ f₀ dµ.
If we define

M₀ = { ν ∈ M(X, T) | ∫ f₀ dν = sup_{µ ∈ M(X,T)} ∫ f₀ dµ }

then the above shows that M₀ is non-empty. Also, M₀ is closed and hence compact.
We now consider the next function f₁ and define

M₁ = { ν ∈ M₀ | ∫ f₁ dν = sup_{µ ∈ M₀} ∫ f₁ dµ }.

By the same reasoning as above, M₁ is a non-empty closed subset of M₀.
Continuing inductively, we define

M_j = { ν ∈ M_{j−1} | ∫ f_j dν = sup_{µ ∈ M_{j−1}} ∫ f_j dµ }

and hence obtain a nested sequence of sets

M(X, T) ⊃ M₀ ⊃ M₁ ⊃ ··· ⊃ M_j ⊃ ···

with each M_j non-empty and closed.
Now consider the intersection

M_∞ = ⋂_{j=0}^∞ M_j.

Recall that the intersection of a decreasing sequence of non-empty compact sets is non-empty. Hence M_∞ is non-empty and we can pick µ_∞ ∈ M_∞. We shall show that µ_∞ is an extreme point (and hence ergodic).
Suppose that we can write µ_∞ = αµ₁ + (1 − α)µ₂, µ₁, µ₂ ∈ M(X, T), 0 < α < 1. We have to show that µ₁ = µ₂. Since {f_j}_{j=0}^∞ is dense in C(X, R), it suffices to show that

∫ f_j dµ₁ = ∫ f_j dµ₂ for all j ≥ 0.
Consider f₀. By assumption

∫ f₀ dµ_∞ = α ∫ f₀ dµ₁ + (1 − α) ∫ f₀ dµ₂.

In particular,

∫ f₀ dµ_∞ ≤ max{ ∫ f₀ dµ₁, ∫ f₀ dµ₂ }.

However µ_∞ ∈ M₀ and so

∫ f₀ dµ_∞ = sup_{µ ∈ M(X,T)} ∫ f₀ dµ ≥ max{ ∫ f₀ dµ₁, ∫ f₀ dµ₂ }.

Therefore

∫ f₀ dµ₁ = ∫ f₀ dµ₂ = ∫ f₀ dµ_∞.
Thus, the first identity we require is proved and µ₁, µ₂ ∈ M₀. This last fact allows us to employ the same argument on f₁ (with M(X, T) replaced by M₀) and conclude that

∫ f₁ dµ₁ = ∫ f₁ dµ₂ = ∫ f₁ dµ_∞

and µ₁, µ₂ ∈ M₁.
Continuing inductively, we show that for an arbitrary j ≥ 0,

∫ f_j dµ₁ = ∫ f_j dµ₂

and µ₁, µ₂ ∈ M_j. This completes the proof. □

§8.4 An example: the North-South map
For many dynamical systems there exist uncountably many different ergodic measures.
This is the case for the doubling map, Markov shifts, toral automorphisms, etc. Here we
give an example of a dynamical system T : X → X for which one can construct M (X, T )
and E(X, T ) explicitly.
Let X ⊂ R² denote the circle of radius 1 centred at (0, 1) in R². Call N = (0, 2) the North Pole and S = (0, 0) the South Pole of X. Define a map φ : X \ {N} → R × {0} by drawing a straight line through N and x and denoting by φ(x) the unique point on the x-axis that this line crosses (this is just stereographic projection of the circle). Define T : X → X by

T(x) = φ^{−1}((1/2)φ(x)) if x ∈ X \ {N}, and T(N) = N.

Hence T(N) = N, T(S) = S and if x ≠ N, S then T^n(x) → S as n → ∞. We call T the North-South map.

[Figure 8.4.1: The North-South map]
Clearly both N and S are fixed points for T . Hence δN and δS (the Dirac delta measures
at N , S, respectively) are T -invariant. It is easy to see that both δN and δS are ergodic.
Now let µ ∈ M(X, T) be an invariant measure. We claim that µ assigns zero measure to the set X \ {N, S}. Let x ∈ X be any point in the right semi-circle (for example, take x = (1, 1) ∈ R²) and consider the arc I of semi-circle from x to T(x). Then ⋃_{n=−∞}^∞ T^{−n}I is a disjoint union of arcs of semi-circle and, moreover, is equal to the entire right semi-circle. Now

µ( ⋃_{n=−∞}^∞ T^{−n}I ) = Σ_{n=−∞}^∞ µ(T^{−n}I) = Σ_{n=−∞}^∞ µ(I)

and the only way for this to be finite is if µ(I) = 0. Hence µ assigns zero measure to the entire right semi-circle. Similarly, µ assigns zero measure to the left semi-circle.
Hence µ is concentrated on the two points N, S, and so must be a convex combination of the Dirac delta measures δ_N and δ_S. Hence

M(X, T) = {αδ_N + (1 − α)δ_S | α ∈ [0, 1]}

and the ergodic measures are the extreme points of M(X, T), namely δ_N, δ_S.
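The North-South map is easy to implement from the description above. In this sketch (illustrative only; the explicit formulas for φ and φ⁻¹ are worked out from the geometry, not quoted from the notes) an orbit started on the right semi-circle visibly converges to S:

```python
import numpy as np

# The circle has centre (0, 1) and radius 1; phi projects stereographically
# from N = (0, 2) to the x-axis, and T halves the projected coordinate, so
# every orbit except the fixed point N converges to S = (0, 0).
N_pole = np.array([0.0, 2.0])
S_pole = np.array([0.0, 0.0])

def phi(p):
    """The line through N and p meets the x-axis at (t, 0); return t."""
    x, y = p
    return 2.0 * x / (2.0 - y)

def phi_inv(t):
    """The second intersection of the circle with the line through N and (t, 0)."""
    return np.array([4.0 * t, 2.0 * t * t]) / (t * t + 4.0)

def T(p):
    return N_pole if np.allclose(p, N_pole) else phi_inv(0.5 * phi(p))

p = np.array([1.0, 1.0])      # a point on the right semi-circle
for _ in range(40):
    p = T(p)

print(p)                      # very close to S = (0, 0)
```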
§8.5 Unique ergodicity
We conclude by looking at the case where T : X → X has a unique invariant probability
measure.
Definition. Let T : X → X be a continuous transformation of a compact metric space
X. If there is a unique T -invariant probability measure then we say that T is uniquely
ergodic.
Remark. You might wonder why such T are not instead called 'uniquely invariant'. Recall that the extreme points of M(X, T) are precisely the ergodic measures. If M(X, T) consists of just one measure then that measure is an extreme point, and so must be ergodic.
Unique ergodicity implies the following strong convergence result.
Theorem 8.5.1 (Oxtoby’s Ergodic Theorem)
Let X be a compact metric space and let T : X → X be a continuous transformation. The
following are equivalent:
(i) T is uniquely ergodic;
(ii) for each f ∈ C(X, R) there exists a constant c(f) such that

(1/n) Σ_{j=0}^{n−1} f(T^j(x)) → c(f)    (8.5.1)

uniformly for x ∈ X, as n → ∞.

Remark. The convergence in (8.5.1) means that

lim_{n→∞} sup_{x∈X} | (1/n) Σ_{j=0}^{n−1} f(T^j(x)) − c(f) | = 0.

Remark. If M(X, T) = {µ} then the constant c(f) in (8.5.1) is ∫ f dµ.
Proof. We prove (ii) implies (i). Suppose that µ, ν are T-invariant probability measures; we shall show that µ = ν. Integrating the expression in (ii), we obtain

∫ f dµ = lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f ◦ T^j dµ = lim_{n→∞} ∫ (1/n) Σ_{j=0}^{n−1} f ◦ T^j dµ = ∫ c(f) dµ = c(f)

(that the convergence in (8.5.1) is uniform allows us to interchange integration and taking limits) and, by the same argument,

∫ f dν = c(f).

Therefore

∫ f dµ = ∫ f dν for all f ∈ C(X, R)

and so µ = ν (by the Riesz Representation Theorem).
We prove (i) implies (ii). Let M(X, T) = {µ}. If (ii) is true then, by the Dominated Convergence Theorem (Theorem 3.1.3), we must necessarily have c(f) = ∫ f dµ.
The convergence in (ii) means: ∀ f ∈ C(X, R), ∀ ε > 0, ∃ N ∈ N such that if n ≥ N then for all x ∈ X we have

| (1/n) Σ_{j=0}^{n−1} f(T^j x) − ∫ f dµ | < ε.

Suppose that (ii) is false. Then, negating the above quantifiers, we see that there exist f₀ ∈ C(X, R), ε > 0 and an increasing sequence n_k ↑ ∞ such that there exists x_{n_k} for which

| (1/n_k) Σ_{j=0}^{n_k−1} f₀(T^j x_{n_k}) − ∫ f₀ dµ | ≥ ε.    (8.5.2)

Define the probability measure µ_k ∈ M(X) by

µ_k = (1/n_k) Σ_{j=0}^{n_k−1} T_*^j δ_{x_{n_k}},

so that (8.5.2) can be written as

| ∫ f₀ dµ_k − ∫ f₀ dµ | ≥ ε.

Now µ_k ∈ M(X) and M(X) is weak* compact. Hence there exists a weak* convergent subsequence, say with weak* limit ν. By following the proof of Theorem 7.5.1, it is easy to see that ν ∈ M(X, T). In particular, we have

| ∫ f₀ dν − ∫ f₀ dµ | ≥ ε.

Therefore, ν ≠ µ, contradicting unique ergodicity. □
§8.6 Irrational rotations
Let X = R/Z, T : X → X : x 7→ x + α mod 1, α irrational. We have already seen
that Lebesgue measure µ is an ergodic T -invariant measure. We can prove that Lebesgue
measure is the only invariant measure.
Proposition 8.6.1
An irrational rotation of a circle is uniquely ergodic and the unique T -invariant measure is
Lebesgue measure.
Proof. We use Oxtoby's Ergodic Theorem. To prove that T is uniquely ergodic, we must show that (8.5.1) holds for every continuous function f ∈ C(X, R). Note that the convergence in (8.5.1) is uniform, i.e. we must show that

‖ (1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ ‖_∞ → 0    (8.6.1)

as n → ∞.
We first prove (8.6.1) in the case when f(x) = e^{2πiℓx}, ℓ ∈ Z \ {0}. Note that T^j(x) = x + jα. Hence

| (1/n) Σ_{j=0}^{n−1} f(T^j(x)) | = | (1/n) Σ_{j=0}^{n−1} e^{2πiℓ(x+jα)} |
  = |e^{2πiℓx}| · | (1/n) Σ_{j=0}^{n−1} e^{2πiℓαj} |
  = (1/n) |e^{2πiℓαn} − 1| / |e^{2πiℓα} − 1|    (8.6.2)
  ≤ (1/n) · 2/|e^{2πiℓα} − 1|.

As α is irrational, the denominator in (8.6.2) is not zero. Note also that ∫ e^{2πiℓx} dµ = 0. Hence

sup_{x∈X} | (1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ | → 0

as n → ∞ when f(x) = e^{2πiℓx}, ℓ ∈ Z \ {0}. Clearly (8.6.1) holds when f is a constant function. By taking finite linear combinations of exponential functions we see that

sup_{x∈X} | (1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ | → 0

as n → ∞ for all trigonometric polynomials g. By the Stone-Weierstrass Theorem (Theorem 1.2.2), trigonometric polynomials are uniformly dense in C(X, R). Let f ∈ C(X, R) and let ε > 0. Then there exists a trigonometric polynomial g such that ‖f − g‖_∞ < ε.
Hence for any x ∈ X we have

| (1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ |
  ≤ (1/n) Σ_{j=0}^{n−1} |f(T^j(x)) − g(T^j(x))| + | (1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ | + ∫ |g − f| dµ
  ≤ 2ε + | (1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ |.

Hence, taking the supremum over all x ∈ X, we have

‖ (1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ ‖_∞ ≤ 2ε + ‖ (1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ ‖_∞.

Letting n → ∞ we see that

limsup_{n→∞} ‖ (1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ ‖_∞ ≤ 2ε.

As ε > 0 is arbitrary, it follows that

lim_{n→∞} ‖ (1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ ‖_∞ = 0.

Hence statement (ii) in Oxtoby's Ergodic Theorem holds. As (i) and (ii) in Oxtoby's Ergodic Theorem are equivalent, it follows that T is uniquely ergodic and Lebesgue measure is the unique invariant measure. □
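The uniform convergence in statement (ii) can be observed numerically. In this sketch (illustrative only; the rotation angle, test function and grid of starting points are arbitrary choices) we compute the sup over a sample of starting points of the error between the Birkhoff average and ∫ f dµ:

```python
import numpy as np

# Birkhoff averages for the golden-ratio rotation T(x) = x + alpha mod 1.
alpha = (np.sqrt(5) - 1) / 2

def f(x):
    return np.sin(2 * np.pi * x) + 2.0       # ∫ f dLeb = 2

starts = np.linspace(0.0, 1.0, 101)          # sample of starting points x

def sup_error(n):
    j = np.arange(n)
    # (1/n) sum_j f(x + j alpha) for every start at once
    avgs = f((starts[:, None] + alpha * j[None, :]) % 1.0).mean(axis=1)
    return np.abs(avgs - 2.0).max()

for n in (100, 1000, 10000):
    print(n, sup_error(n))                   # the sup error shrinks as n grows
```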
§8.7 Exercises
Exercise 8.1
Prove the following identities concerning Radon-Nikodym derivatives.
(i) If ν ≪ µ and f ∈ L¹(X, B, µ) then f ∈ L¹(X, B, ν) and

∫ f dν = ∫ f (dν/dµ) dµ.

(ii) If ν₁, ν₂ ≪ µ then

d(ν₁ + ν₂)/dµ = dν₁/dµ + dν₂/dµ.

(iii) If λ ≪ ν ≪ µ then λ ≪ µ and

dλ/dµ = (dλ/dν)(dν/dµ).
Exercise 8.2
Let X = R³/Z³ be the 3-dimensional torus. Fix α ∉ Q. Define T : X → X by

T((x, y, z) + Z³) = (α + x, y + x, z + y) + Z³.

Prove by induction that for n ≥ 3

T^n((x, y, z) + Z³) = ( C(n,1)α + x,  C(n,2)α + C(n,1)x + y,  C(n,3)α + C(n,2)x + C(n,1)y + z ) + Z³

(here C(n, r) denotes the binomial coefficient "n choose r").
Let f(x, y, z) = e^{2πi(kx+ℓy+mz)} where k, ℓ, m ∈ Z. Assuming Weyl's Theorem on Polynomials (Theorem 2.3.1), prove using Weyl's Criterion (Theorem 1.2.1) that

sup_{x,y,z} | (1/n) Σ_{j=0}^{n−1} f(T^j((x, y, z) + Z³)) | → 0

as n → ∞, whenever (k, ℓ, m) ∈ Z³ \ {(0, 0, 0)}.
Hence, using Oxtoby’s Ergodic Theorem, prove that T is uniquely ergodic and Lebesgue
measure is the unique invariant measure.
Exercise 8.3
Let T be a homeomorphism of a compact metric space X. Suppose that T is uniquely
ergodic with unique invariant measure µ. Prove that every orbit of T is dense if, and only
if, µ(U ) > 0 for every non-empty open set U .
9. Recurrence
§9.1 Introduction
We can now begin to study ergodic theorems. Before we do this, we discuss a remarkable result due to Poincaré.
§9.2 Poincaré's Recurrence Theorem

Theorem 9.2.1 (Poincaré's Recurrence Theorem)
Let T : X → X be a measure-preserving transformation of the probability space (X, B, µ).
Let B ∈ B be such that µ(B) > 0. Then for µ-a.e. x ∈ B, the orbit {T^n x}_{n=0}^∞ returns to B infinitely often.
Proof. Let

E = {x ∈ B | T^n x ∈ B for infinitely many n ≥ 1};

then we have to show that µ(B \ E) = 0.
If we write

F = {x ∈ B | T^n x ∉ B for all n ≥ 1}

then we have the identity

B \ E = ⋃_{k=0}^∞ (T^{−k}F ∩ B).

Thus we have the estimate

µ(B \ E) = µ( ⋃_{k=0}^∞ (T^{−k}F ∩ B) ) ≤ µ( ⋃_{k=0}^∞ T^{−k}F ) ≤ Σ_{k=0}^∞ µ(T^{−k}F).
Since µ(T^{−k}F) = µ(F) for all k ≥ 0 (because the measure is preserved), it suffices to show that µ(F) = 0.
First suppose that n > m and that T^{−m}F ∩ T^{−n}F ≠ ∅. If y lies in this intersection then T^m y ∈ F and T^{n−m}(T^m y) = T^n y ∈ F ⊂ B, which contradicts the definition of F. Thus T^{−m}F and T^{−n}F are disjoint.
Since {T^{−k}F}_{k=0}^∞ is a disjoint family, we have

Σ_{k=0}^∞ µ(T^{−k}F) = µ( ⋃_{k=0}^∞ T^{−k}F ) ≤ µ(X) = 1.

Since the terms in the summation have the constant value µ(F), we must have µ(F) = 0. □
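Recurrence is easy to watch in an example. The sketch below (illustrative only; the rotation angle, the set B and the starting point are arbitrary choices) counts the visits of an orbit of the measure-preserving rotation to B = [0, 0.1); the orbit keeps returning, and in this uniquely ergodic example the return frequency is close to µ(B).

```python
import numpy as np

# Returns of the orbit of x0 under T(x) = x + alpha mod 1 to B = [0, 0.1).
alpha = np.sqrt(3) - 1
x0 = 0.05                                  # a point of B
n = 100_000
orbit = (x0 + alpha * np.arange(n)) % 1.0
returns = np.count_nonzero(orbit < 0.1)    # visits of the orbit to B

print(returns, returns / n)    # many returns; frequency close to mu(B) = 0.1
```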
Remark. Note that the hypotheses of Poincaré's Recurrence Theorem are very mild: all one needs is for T to be a measure-preserving transformation of a probability space. (One does not need T to be ergodic.) If you look carefully at the proof, you will see that the fact that T is measure-preserving and the fact that µ(X) = 1 are each used just once. The same proof continues to hold in the case when µ(X) is finite. Poincaré's Recurrence Theorem is false with either of the hypotheses that µ(X) is finite or that T is measure-preserving removed.
§9.3 Ergodic Theorems
An ergodic theorem is a result that describes the limiting behaviour of sequences of the form

(1/n) Σ_{j=0}^{n−1} f ◦ T^j    (9.3.1)

as n → ∞.
as n → ∞. The precise formulation of an ergodic theorem depends on the class of function
f (for example, one could assume that f is integrable, L2 , or continuous), and the notion
of convergence used (for example, the convergence could be pointwise, L2 , or uniform).
We have already studied when one has uniform convergence of (9.3.1): this is Oxtoby’s
Ergodic Theorem and only holds in the very special circumstances when T is uniquely
ergodic. In what follows we will discuss von Neumann’s (Mean) Ergodic Theorem and
Birkhoff’s Ergodic Theorem. Von Neumann’s Ergodic Theorem is in the context of f ∈
L2 (X, B, µ) and L2 -convergence of the ergodic averages (9.3.1); Birkhoff’s Ergodic Theorem
is in the context of f ∈ L1 (X, B, µ) and almost everywhere pointwise convergence of (9.3.1).
Note that L2 convergence neither implies nor is implied by almost everywhere pointwise
convergence.
Before stating these theorems, we first need to discuss conditional expectation.
§9.4 Conditional expectation
Let (X, B, µ) be a probability space. Let A ⊂ B be a sub-σ-algebra. Note that µ defines a measure on A by restriction. Let f ∈ L¹(X, B, µ) be non-negative. Then we can define a measure ν on A by setting, for A ∈ A,

ν(A) = ∫_A f dµ.

Note that ν ≪ µ|_A. Hence by the Radon-Nikodym theorem, there is a unique A-measurable function E(f | A) such that

ν(A) = ∫_A E(f | A) dµ

for all A ∈ A. We call E(f | A) the conditional expectation of f with respect to the σ-algebra A.
So far, we have only defined E(f | A) for non-negative f . To define E(f | A) for an
arbitrary real-valued f , we split f into positive and negative parts f = f+ − f− where
f+ , f− ≥ 0 and define
E(f | A) = E(f+ | A) − E(f− | A).
For a complex-valued f we split f into its real and imaginary parts and define
E(f | A) = E(Re(f ) | A) + iE(Im(f ) | A).
Thus we can view conditional expectation as an operator

E(· | A) : L¹(X, B, µ) → L¹(X, A, µ).

Note that E(f | A) is uniquely determined by the two requirements that:
(i) E(f | A) is A-measurable, and
(ii) ∫_A f dµ = ∫_A E(f | A) dµ for all A ∈ A.

Intuitively, one can think of E(f | A) as the best approximation to f in the smaller space of A-measurable functions.
To state von Neumann's and Birkhoff's Ergodic Theorems precisely, we will need the sub-σ-algebra I of T-invariant subsets, namely

I = {B ∈ B | T^{−1}B = B a.e.}.

It is straightforward to check that I is a σ-algebra. Note that if T is ergodic then I is the trivial σ-algebra consisting of all sets in B of measure 0 or 1.
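When A is generated by a finite partition (as in Exercise 9.4 below), E(f | A) is simply the function that is constant on each atom, equal there to the µ-average of f over that atom. The sketch below (illustrative only; the grid approximation of Lebesgue measure, the partition into quarters and the choice of f are arbitrary) computes this and checks requirement (ii):

```python
import numpy as np

# Conditional expectation of f(x) = x^2 on ([0,1), Leb) with respect to the
# sigma-algebra generated by the partition {[j/4, (j+1)/4) : j = 0,...,3},
# with the measure approximated by a fine uniform grid of midpoints.
x = (np.arange(1_000_000) + 0.5) / 1_000_000
f = x ** 2

atoms = np.minimum((x * 4).astype(int), 3)       # which atom each point is in
cond_exp = np.empty_like(f)
for j in range(4):
    mask = atoms == j
    cond_exp[mask] = f[mask].mean()              # average of f over the atom

# sanity check of property (ii): f and E(f|A) integrate equally over each atom
for j in range(4):
    mask = atoms == j
    print(j, f[mask].mean(), cond_exp[mask].mean())
```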
§9.5 Von Neumann's Ergodic Theorem

Von Neumann's Ergodic Theorem deals with the L²-limiting behaviour of (1/n) Σ_{j=0}^{n−1} f ◦ T^j for f ∈ L²(X, B, µ).
Theorem 9.5.1 (Von Neumann's Ergodic Theorem)
Let (X, B, µ) be a probability space and let T : X → X be a measure-preserving transformation. Let I denote the σ-algebra of T-invariant sets. Then for every f ∈ L²(X, B, µ), we have

(1/n) Σ_{j=0}^{n−1} f ◦ T^j → E(f | I)

where the convergence is in L².
When T is ergodic with respect to µ then von Neumann’s Ergodic Theorem takes a
particularly simple form.
Corollary 9.5.2
Let (X, B, µ) be a probability space and let T : X → X be an ergodic measure-preserving transformation. Let f ∈ L²(X, B, µ). Then

(1/n) Σ_{j=0}^{n−1} f ◦ T^j → ∫ f dµ, as n → ∞,    (9.5.1)

where the convergence is in L².

Proof. If T is ergodic then I is the trivial σ-algebra N consisting of sets of measure 0 and 1. If f ∈ L²(X, B, µ) then E(f | N) = ∫ f dµ. □
Remark. The meaning of convergence in (9.5.1) is that

lim_{n→∞} ‖ (1/n) Σ_{j=0}^{n−1} f ◦ T^j − ∫ f dµ ‖₂ = 0,

i.e.

lim_{n→∞} ( ∫ | (1/n) Σ_{j=0}^{n−1} f(T^j x) − ∫ f dµ |² dµ )^{1/2} = 0.
§9.6 Proof of von Neumann's Ergodic Theorem
None of this section is examinable—it is included for people who like hard-core functional
analysis!
We prove von Neumann’s Ergodic Theorem in the case where T is invertible.
In order to prove von Neumann’s Ergodic Theorem, it is useful to recast it in terms of
linear analysis.
Theorem 9.6.1 (von Neumann's Ergodic Theorem for Operators)
Let U be a unitary operator of a complex Hilbert space H. Let I = {v ∈ H | Uv = v} be the closed subspace of U-invariant vectors and let P_I : H → I be the orthogonal projection onto I. Then for all v ∈ H we have

(1/n) Σ_{j=0}^{n−1} U^j v → P_I v    (9.6.1)

in the norm induced on H by the inner product.
Proof of Theorem 9.6.1. Denote the inner product and norm on H by ⟨·, ·⟩ and ‖·‖, respectively.
First note that if v ∈ I then (9.6.1) holds, as

(1/n) Σ_{j=0}^{n−1} U^j v = v = P_I v.

If v = Uw − w for some w ∈ H then

‖ (1/n) Σ_{j=0}^{n−1} U^j v ‖ = (1/n) ‖U^n w − w‖ ≤ (2/n) ‖w‖ → 0.

If we let C denote the norm-closure of the subspace {Uw − w | w ∈ H} then it follows that

(1/n) Σ_{j=0}^{n−1} U^j v → 0

for all v ∈ C, by approximation.
We claim that H = I ⊕ C, an orthogonal decomposition. Suppose that v ⊥ C. Then ⟨v, Uw − w⟩ = 0 for all w ∈ H. Hence ⟨U*v, w⟩ = ⟨v, w⟩ for all w ∈ H. Hence U*v = v. As U is unitary, we have that U* = U^{−1}. Hence v = Uv, so that v ∈ I. Reversing each implication we see that v ∈ I implies v ⊥ C, and the claim follows. □
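A finite-dimensional sketch makes the statement concrete (illustrative only; the unitary U, the angles and the vector v below are arbitrary choices). For a diagonal unitary on C³ with a single eigenvalue 1, the fixed space I is spanned by the first coordinate vector, and the averages of U^j v converge to the projection onto it:

```python
import numpy as np

# Averages (1/n) sum_j U^j v for a diagonal unitary U on C^3 converge to the
# orthogonal projection of v onto I = {v : Uv = v} (the first coordinate).
theta1, theta2 = 2.0, 0.7                       # angles with e^{i theta} != 1
U = np.diag([1.0, np.exp(1j * theta1), np.exp(1j * theta2)])
v = np.array([1.0 + 0j, 2.0, -1.0])

def average(n):
    acc, w = np.zeros(3, dtype=complex), v.copy()
    for _ in range(n):
        acc += w
        w = U @ w
    return acc / n

P_I_v = np.array([1.0 + 0j, 0.0, 0.0])          # projection onto the fixed space
print(np.linalg.norm(average(100_000) - P_I_v)) # small, and -> 0 as n grows
```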
Remark. Note that an isometry of a Hilbert space H is a linear operator U such that
hU v, U wi = hv, wi for all v, w ∈ H. We say that U is unitary if, in addition, it is invertible.
Equivalently, U is unitary if the dual operator U ∗ is the inverse of U : U ∗ U = U U ∗ = id.
We can prove von Neumann’s Ergodic Theorem for an invertible measure-preserving
transformation T of a probability space (X, B, µ) as follows. Recall that L2 (X, B, µ) is a
Hilbert space with respect to the inner product
Z
hf, gi = f g¯ dµ
and that T induces a linear operator U : L2 (X, B, µ) → L2 (X, B, µ) by U f = f ◦ T . As T
is measure-preserving, we have that U is an isometry; if T is invertible then U is unitary.
Let PI : L2 (X, B, µ) → L2 (X, I, µ) denote the orthogonal projection onto the subspace
of T-invariant functions. One can easily check (see Exercise 9.6) that P_I f = E(f | I).
Hence, when T is invertible, Theorem 9.5.1 follows immediately from Theorem 9.6.1.
One can deduce from Theorem 9.6.1 that the result continues to hold when U is an
isometry and is not assumed to be invertible.
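The operator averaging in Theorem 9.6.1 can be illustrated numerically in finite dimensions. The following sketch (my illustration, not part of the notes' formal development) takes the unitary operator U = diag(1, e^{2πiα}) on ℂ² with α irrational; its invariant subspace is spanned by the first coordinate, so the averages (1/n) ∑ U^j v should converge to the orthogonal projection of v onto that axis.

```python
import numpy as np

# A unitary operator on C^2: eigenvalue 1 on the invariant direction, and
# e^{2*pi*i*alpha} with alpha irrational on the orthogonal direction.
alpha = np.sqrt(2) - 1
U = np.diag([1.0, np.exp(2j * np.pi * alpha)])

v = np.array([0.7 + 0.2j, 1.0 - 0.5j])

# Compute the average (1/n) * sum_{j=0}^{n-1} U^j v.
n = 10000
acc = np.zeros(2, dtype=complex)
w = v.copy()
for _ in range(n):
    acc += w
    w = U @ w
avg = acc / n

# Orthogonal projection of v onto the invariant subspace {Uv = v} = span(e_0).
proj = np.array([v[0], 0.0])
print(np.abs(avg - proj).max())  # small, shrinking like 1/n
```

The error in the second coordinate is a geometric sum of modulus at most 2/(n |e^{2πiα} − 1|), matching the estimate used in the proof.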
§9.7
Exercises
Exercise 9.1
Construct an example to show that Poincaré’s Recurrence Theorem does not hold on infinite measure spaces. That is, find a measure space (X, B, µ) with µ(X) = ∞ and a measure-preserving transformation T : X → X such that the conclusion of Poincaré’s Recurrence Theorem does not hold.
Exercise 9.2
Poincaré’s Recurrence Theorem says that, if we have a measure-preserving transformation
T of a probability space (X, B, µ) and a set A ∈ B, µ(A) > 0, then, if we start iterating a
typical point x ∈ A then the orbit of x will return to A infinitely often.
Construct an example to show that if we have a measure-preserving transformation T
of a probability space (X, B, µ) and two sets A, B ∈ B, µ(A), µ(B) > 0, then, if we start
iterating a typical point x ∈ A then the orbit of x does not necessarily visit B infinitely
often.
Exercise 9.3
(i) Prove that f 7→ E(f | A) is linear.
(ii) Suppose that T is a measure-preserving transformation. Show that E(f | A) ◦ T =
E(f ◦ T | T −1 A).
(iii) Show that E(f | B) = f .
(iv) Let N denote the trivial σ-algebra consisting of all sets of measure 0 and 1. Show that a function f is N-measurable if and only if it is constant a.e. Show that E(f | N) = ∫ f dµ.
Exercise 9.4
Let (X, B, µ) be a probability space.
(i) Let α = {A_1, . . . , A_n}, A_j ∈ B, be a finite partition of X. (By a partition we mean that X = ⋃_{j=1}^{n} A_j and A_i ∩ A_j = ∅ if i ≠ j.) Let A denote the set of all finite unions of sets in α. Check that A is a σ-algebra.
(ii) Show that g : X → R is A-measurable if and only if g is constant on each Aj , i.e.
g(x) = ∑_{j=1}^{n} c_j χ_{A_j}(x) for some constants c_1, . . . , c_n ∈ ℝ.
(iii) Let f ∈ L1 (X, B, µ). Show that
E(f | A)(x) = ∑_{j=1}^{n} χ_{A_j}(x) · (1/µ(A_j)) ∫_{A_j} f dµ.
Thus E(f | A) is the best approximation to f that is constant on sets in the partition
α.
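The formula in (iii) can be checked numerically: on each cell of the partition, E(f | A) is the µ-average of f over that cell. Below is a minimal Monte Carlo sketch (my illustrative choice of measure, function, and partition: Lebesgue measure on [0, 1], f(x) = x², and four equal subintervals).

```python
import numpy as np

# Monte Carlo sketch of E(f | A) for the partition of [0,1] into 4 equal cells:
# on each cell A_j, E(f | A) equals the average of f over A_j.
rng = np.random.default_rng(0)
x = rng.random(200000)            # samples from Lebesgue measure on [0, 1]
f = x ** 2                        # an integrable function

cells = np.minimum((4 * x).astype(int), 3)   # index j of the cell containing x
cell_means = np.array([f[cells == j].mean() for j in range(4)])
cond_exp = cell_means[cells]      # the function E(f | A), constant on each cell

# Exact value on A_0 = [0, 1/4): 4 * integral_0^{1/4} x^2 dx = 1/48
print(cell_means[0])
```

The estimated cell average on A_0 should be close to the exact value 1/48 ≈ 0.0208.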
Exercise 9.5
Prove that I is a σ-algebra.
Exercise 9.6
Let T be a measure-preserving transformation of the probability space (X, B, µ) and let I
denote the sub-σ-algebra of T -invariant sets. Let PI : L2 (X, B, µ) → L2 (X, I, µ) denote
the orthogonal projection onto the subspace of T -invariant functions. Prove that PI f =
E(f | I) for all f ∈ L2 (X, B, µ).
10. Birkhoff’s Ergodic Theorem
§10.1
Birkhoff ’s Ergodic Theorem
Birkhoff’s Ergodic Theorem deals with the behaviour of (1/n) ∑_{j=0}^{n−1} f(T^j x) for µ-a.e. x ∈ X and for f ∈ L^1(X, B, µ).
Theorem 10.1.1 (Birkhoff ’s Ergodic Theorem)
Let (X, B, µ) be a probability space and let T : X → X be a measure-preserving transformation. Let I denote the σ-algebra of T -invariant sets. Then for every f ∈ L1 (X, B, µ),
we have
(1/n) ∑_{j=0}^{n−1} f(T^j x) → E(f | I)(x)
for µ-a.e. x ∈ X.
Corollary 10.1.2 (Birkhoff ’s Ergodic Theorem for an ergodic transformation)
Let (X, B, µ) be a probability space and let T : X → X be an ergodic measure-preserving
transformation. Let f ∈ L1 (X, B, µ). Then
(1/n) ∑_{j=0}^{n−1} f(T^j x) → ∫ f dµ,   as n → ∞,
for µ-a.e. x ∈ X.
§10.2
Consequences of, and criteria for, ergodicity
Here we give some simple corollaries of Birkhoff’s Ergodic Theorem. The first result says
that, for a typical orbit of an ergodic dynamical system, ‘time averages’ equal ‘space averages’.
Corollary 10.2.1
Let T be an ergodic measure-preserving transformation of the probability space (X, B, µ).
Suppose that B ∈ B. Then for µ-a.e. x ∈ X, the frequency with which the orbit of x lies
in B is given by µ(B), i.e.,
lim_{n→∞} (1/n) card{j ∈ {0, 1, . . . , n − 1} | T^j x ∈ B} = µ(B)   µ-a.e.
Proof. Apply the Birkhoff Ergodic Theorem with f = χB .
2
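Corollary 10.2.1 can be seen numerically. The sketch below (my illustration) uses the circle rotation T(x) = x + α mod 1 with α irrational, which preserves Lebesgue measure and is ergodic, and counts the visits of a single orbit to B = [0, 1/2).

```python
import math

# Frequency of visits for the circle rotation T(x) = x + alpha mod 1.
alpha = math.sqrt(2) - 1          # an irrational rotation number
B = (0.0, 0.5)                    # target set, mu(B) = 1/2
x = 0.123                         # a concrete starting point
n = 100000
hits = 0
for _ in range(n):
    if B[0] <= x < B[1]:
        hits += 1
    x = (x + alpha) % 1.0
freq = hits / n
print(freq)                       # close to mu(B) = 0.5
```

For the rotation the convergence in fact holds for every starting point, not just almost every one, but the sketch only illustrates the almost-everywhere statement of the corollary.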
It is possible to characterise ergodicity in terms of the iteration of preimages of sets, rather than the iterates of points, under the dynamics. The next result deals with this.
Proposition 10.2.2
Let (X, B, µ) be a probability space and let T : X → X be a measure-preserving transformation. The following are equivalent:
(i) T is ergodic;
(ii) for all A, B ∈ B,
(1/n) ∑_{j=0}^{n−1} µ(T^{−j}A ∩ B) → µ(A)µ(B),   as n → ∞.
Proof. (i) ⇒ (ii): Suppose that T is ergodic. Since χA ∈ L1 (X, B, µ), Birkhoff’s Ergodic
Theorem tells us that
(1/n) ∑_{j=0}^{n−1} χ_A(T^j x) → µ(A),   as n → ∞
for µ-a.e. x ∈ X. Multiplying both sides by χB gives
(1/n) ∑_{j=0}^{n−1} χ_A(T^j x) χ_B(x) → µ(A) χ_B(x),   as n → ∞
for µ-a.e. x ∈ X. Since the left-hand side is bounded (by 1), we can apply the Dominated
Convergence Theorem (Theorem 3.1.3) to see that
(1/n) ∑_{j=0}^{n−1} µ(T^{−j}A ∩ B) = (1/n) ∑_{j=0}^{n−1} ∫ (χ_A ◦ T^j) χ_B dµ = ∫ (1/n) ∑_{j=0}^{n−1} (χ_A ◦ T^j) χ_B dµ → µ(A)µ(B),
as n → ∞.
(ii) ⇒ (i): Now suppose that the convergence holds. Suppose that T −1 B = B and take
A = B. Then µ(T −j A ∩ B) = µ(B) so
(1/n) ∑_{j=0}^{n−1} µ(B) → µ(B)²,
as n → ∞. This gives µ(B) = µ(B)². Therefore µ(B) = 0 or 1 and so T is ergodic. □

§10.3
Kac’s Lemma
Poincaré’s Recurrence Theorem tells us that, under a measure-preserving transformation,
almost every point of a subset A of positive measure will return to A. However, it does not
tell us how long we should have to wait for this to happen. One would expect that return
times to sets of large measure are small, and that return times to sets of small measure are
large. This is indeed the case, and forms the content of Kac’s Lemma.
Let T : X → X be a measure-preserving transformation of a probability space (X, B, µ)
and let A ⊂ X be a measurable subset with µ(A) > 0. By Poincaré’s Recurrence Theorem, the integer
n_A(x) = inf{n ≥ 1 | T^n(x) ∈ A}
is defined for a.e. x ∈ A.
Theorem 10.3.1 (Kac’s Lemma)
Let T be an ergodic measure-preserving transformation of the probability space (X, B, µ).
Let A ∈ B be such that µ(A) > 0. Then
∫_A n_A dµ = 1.
Proof. Let
A_n = A ∩ T^{−1}A^c ∩ · · · ∩ T^{−(n−1)}A^c ∩ T^{−n}A.
Then A_n consists of those points in A that return to A after exactly n iterations of T, i.e.
A_n = {x ∈ A | n_A(x) = n}.
Consider the illustration in Figure 10.3.1. As T is ergodic, almost every point of X eventually enters A. Hence the diagram represents almost all of X.

[Figure 10.3.1: The return times to A — the sets A_1, A_2, A_3, . . . , A_n, . . . partition A, with a column of n sets above each A_n mapped successively by T.]

Note that the column
above A_n in the diagram consists of n sets, A_{n,0}, . . . , A_{n,n−1} say, with A_{n,0} = A_n. Note that T^{−k}A_{n,k} = A_n. As T is measure-preserving, it follows that µ(A_{n,k}) = µ(A_n) for k = 0, . . . , n − 1. Hence
1 = µ(X) = ∑_{n=1}^∞ ∑_{k=0}^{n−1} µ(A_{n,k}) = ∑_{n=1}^∞ n µ(A_n) = ∑_{n=1}^∞ ∫_{A_n} n_A dµ = ∫_A n_A dµ.  □
Remark. Let A be as in the statement of Kac’s Lemma (Theorem 10.3.1). Define a probability measure µ_A on A by µ_A = µ/µ(A) so that µ_A(A) = 1. Then Kac’s Lemma says that
∫_A n_A dµ_A = 1/µ(A),
i.e. the expected return time of a point in A to the set A is 1/µ(A).
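Kac's Lemma can be checked by simulation. A hedged sketch (my choice of system): for the Bernoulli(1/2, 1/2) shift on two symbols, take A = {x | x_0 = x_1 = 0}, so µ(A) = 1/4 and the expected return time to A should be 1/µ(A) = 4.

```python
import random

# Kac's Lemma sketch for the Bernoulli(1/2,1/2) shift with A = {x_0 = x_1 = 0}.
random.seed(1)
s = [random.randint(0, 1) for _ in range(200000)]

# Positions i whose shifted sequence lies in A, i.e. s[i] == s[i+1] == 0.
visits = [i for i in range(len(s) - 1) if s[i] == 0 and s[i + 1] == 0]

# Gaps between successive visits are the return times n_A along the orbit.
gaps = [b - a for a, b in zip(visits, visits[1:])]
mean_return = sum(gaps) / len(gaps)
print(mean_return)   # approximately 1/mu(A) = 4
```

Averaging the gaps along a single long orbit approximates the integral of n_A against the induced measure µ_A, which is exactly what Kac's Lemma computes.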
§10.4
Ehrenfests’ example
The following example, due to P. and T. Ehrenfest, demonstrates that the return times in Poincaré’s Recurrence Theorem may be extremely large.
Consider two urns. One urn contains 100 balls, numbered 1 to 100, and the other urn
is empty. We also have a random number generator: this could be a bag containing 100
slips of paper, numbered 1 to 100.
Each second, a slip of paper is drawn from the bag, the number is noted, and the slip of
paper is returned to the bag. The ball bearing that number is then moved from whichever
urn it is currently in to the other urn.
Naively, we would expect that the system will settle into an equilibrium state in which
there are 50 balls in each urn. Of course, there will continue to be small random fluctuations
about the 50-50 distribution. However, it would appear highly unlikely for the system to
return to the state in which 100 balls are in the first urn. Nevertheless, the Poincaré Recurrence Theorem tells us that this situation will occur almost surely and Kac’s Lemma tells us how long we should expect to wait.
To see this, we represent the system as a shift on 101 symbols with an appropriate
Markov measure. Regard xj ∈ {0, . . . , 100} as being the number of balls in the first urn
after j seconds. Hence a sequence (x_j)_{j=0}^∞ records the number of balls in the first urn at each time. Let Σ = {x = (x_j)_{j=0}^∞ | x_j ∈ {0, 1, . . . , 100}}.
Let p(i) denote the probability of there being i balls in the first urn. This is equal to the number of possible ways of choosing i balls from 100, divided by the total number of ways of distributing 100 balls across the 2 urns. There are C(100, i) ways of choosing i balls from 100 balls, where C(100, i) denotes the binomial coefficient. As there are 2 possible urns for each ball to be in, there are 2^100 possible arrangements of all the balls. Hence the probability of there being i balls in the first urn is
p(i) = C(100, i) / 2^100.
If we have i balls in the first urn then at the next stage we must have either i − 1 or i + 1
balls in the first urn. The number of balls becomes i − 1 if the random number chosen is
equal to the number of one of the balls in the first urn. As there are currently i such balls,
the probability of this happening is i/100. Hence the probability P (i, i − 1) that there are
i − 1 balls remaining given that we started with i balls in the first urn is i/100. Similarly,
the probability P (i, i + 1) that there are i + 1 balls in the first urn given that we started
with i balls is (100 − i)/100. If j ≠ i − 1, i + 1 then we cannot have j balls in the first urn given that we started with i balls; thus P(i, j) = 0. This defines a stochastic matrix:

        (   0       1       0       0       0    ···
          1/100     0     99/100    0       0    ···
P  =        0     2/100     0     98/100    0    ···
            0       0     3/100     0     97/100 ···
            ⋮       ⋮       ⋮       ⋮       ⋮     ⋱  )
It is straightforward to check that pP = p. Hence we have a Markov probability measure
µ_P defined on Σ. The matrix P is irreducible (but not aperiodic); this ensures that µ_P is ergodic.
Consider the cylinder A = [100] of length 1. This represents there being 100 balls in the first urn. By Poincaré’s Recurrence Theorem, if we start in A then we return to A infinitely
often. Thus, with probability 1, we will return to the situation where all 100 balls have
returned to the first urn—and this will happen infinitely often! We can use Kac’s Lemma
to calculate the expected amount of time we will have to wait until all the balls first return
to the first urn. By Kac’s lemma, the expected first return time to A is
1
= 2100 seconds,
µP (A)
which is about 4 × 1022 years, or about 3 × 1012 times the length of time that the Universe
has so far existed!
(This system, with 4 balls rather than 100, was also studied in Exercise 3.11.)
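The arithmetic behind the "4 × 10^22 years" figure is easy to verify directly (exact integer arithmetic for 2^100, then a conversion to years):

```python
# Arithmetic check for the Ehrenfest example: 2^100 seconds expressed in years.
seconds = 2 ** 100                     # exact big-integer value
seconds_per_year = 60 * 60 * 24 * 365
years = seconds / seconds_per_year
print(f"{years:.2e}")                  # about 4e22 years
```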
§10.5
Proof of Birkhoff ’s Ergodic Theorem
None of this section is examinable—it is included for people who like hard-core ε-δ analysis!
The proof is based on the following inequality.
Theorem 10.5.1 (Maximal Inequality)
Let (X, B, µ) be a probability space, let T : X → X be a measure-preserving transformation
and let f ∈ L1 (X, B, µ). Define f0 = 0 and, for n ≥ 1,
f_n = f + f ◦ T + · · · + f ◦ T^{n−1}.
For n ≥ 1, set F_n(x) = max_{0≤j≤n} f_j(x), so that F_n(x) ≥ 0. Then
∫_{{x ∈ X | F_n(x) > 0}} f dµ ≥ 0.
Proof. Clearly Fn ∈ L1 (X, B, µ). For 0 ≤ j ≤ n, we have Fn ≥ fj , so Fn ◦ T ≥ fj ◦ T .
Hence
Fn ◦ T + f ≥ fj ◦ T + f = fj+1
and therefore
F_n ◦ T(x) + f(x) ≥ max_{1≤j≤n} f_j(x).
If F_n(x) > 0 then
max_{1≤j≤n} f_j(x) = max_{0≤j≤n} f_j(x) = F_n(x),
so we obtain that
f ≥ Fn − Fn ◦ T
on the set A = {x | Fn (x) > 0}.
Hence
∫_A f dµ ≥ ∫_A F_n dµ − ∫_A F_n ◦ T dµ
         = ∫_X F_n dµ − ∫_A F_n ◦ T dµ   (as F_n = 0 on X \ A)
         ≥ ∫_X F_n dµ − ∫_X F_n ◦ T dµ   (as F_n ◦ T ≥ 0)
         = 0   (as µ is T-invariant).  □
Corollary 10.5.2
Let g ∈ L^1(X, B, µ) and let
M_α = { x ∈ X | sup_{n≥1} (1/n) ∑_{j=0}^{n−1} g(T^j x) > α }.
Then for all B ∈ B with T^{−1}B = B we have that
∫_{M_α ∩ B} g dµ ≥ α µ(M_α ∩ B).
Proof. Suppose first that B = X. Let f = g − α. Then
M_α = ⋃_{n=1}^∞ { x | ∑_{j=0}^{n−1} g(T^j x) > nα } = ⋃_{n=1}^∞ { x | f_n(x) > 0 } = ⋃_{n=1}^∞ { x | F_n(x) > 0 }
(since f_n(x) > 0 ⇒ F_n(x) > 0 and F_n(x) > 0 ⇒ f_j(x) > 0 for some 1 ≤ j ≤ n). Write C_n = { x | F_n(x) > 0 } and observe that C_n ⊂ C_{n+1}. Thus χ_{C_n} converges to χ_{M_α} and so f χ_{C_n} converges to f χ_{M_α}, as n → ∞. Furthermore, |f χ_{C_n}| ≤ |f|. Hence, by the Dominated Convergence Theorem,
∫_{C_n} f dµ = ∫_X f χ_{C_n} dµ → ∫_X f χ_{M_α} dµ = ∫_{M_α} f dµ,   as n → ∞.
Applying the Maximal Inequality, we have for all n ≥ 1 that ∫_{C_n} f dµ ≥ 0. Therefore ∫_{M_α} f dµ ≥ 0, i.e. ∫_{M_α} g dµ ≥ α µ(M_α).
For the general case, we work with the restriction of T to B, T|_B : B → B, and apply the Maximal Inequality on this subset to get
∫_{M_α ∩ B} g dµ ≥ α µ(M_α ∩ B),
as required.  □
We will also need the following convergence result.
Proposition 10.5.3 (Fatou’s Lemma)
Let (X, B, µ) be a probability space and suppose that f_n : X → ℝ are non-negative measurable functions. Define f(x) = lim inf_{n→∞} f_n(x). Then f is measurable and
∫ f dµ ≤ lim inf_{n→∞} ∫ f_n dµ
(one or both of these expressions may be infinite).
Proof of Birkhoff’s Ergodic Theorem. Let
f*(x) = lim sup_{n→∞} (1/n) ∑_{j=0}^{n−1} f(T^j x),   f_*(x) = lim inf_{n→∞} (1/n) ∑_{j=0}^{n−1} f(T^j x).
These exist (but may be +∞ or −∞, respectively) at all points x ∈ X. Clearly f_*(x) ≤ f*(x). Let
a_n(x) = (1/n) ∑_{j=0}^{n−1} f(T^j x).
Observe that
((n + 1)/n) a_{n+1}(x) = a_n(T x) + (1/n) f(x).
As f is finite µ-a.e., we have that f(x)/n → 0 µ-a.e. as n → ∞. Hence, taking the lim sup and lim inf as n → ∞ gives that f* ◦ T = f* µ-a.e. and f_* ◦ T = f_* µ-a.e.
We have to show:
(i) f* = f_* µ-a.e.;
(ii) f* ∈ L^1(X, B, µ);
(iii) ∫ f* dµ = ∫ f dµ.
We prove (i). For α, β ∈ ℝ, define
E_{α,β} = {x ∈ X | f_*(x) < β and f*(x) > α}.
Note that
{x ∈ X | f_*(x) < f*(x)} = ⋃_{β<α, α,β∈ℚ} E_{α,β}
(a countable union). Thus, to show that f* = f_* µ-a.e., it suffices to show that µ(E_{α,β}) = 0 whenever β < α. Since f_* ◦ T = f_* and f* ◦ T = f*, we see that T^{−1}E_{α,β} = E_{α,β}. If we write
M_α = { x ∈ X | sup_{n≥1} (1/n) ∑_{j=0}^{n−1} f(T^j x) > α }
then E_{α,β} ∩ M_α = E_{α,β}.
Applying Corollary 10.5.2 we have that
∫_{E_{α,β}} f dµ = ∫_{E_{α,β} ∩ M_α} f dµ ≥ α µ(E_{α,β} ∩ M_α) = α µ(E_{α,β}).
Replacing f, α and β by −f, −β and −α, and using the fact that (−f)* = −f_* and (−f)_* = −f*, we also get
∫_{E_{α,β}} f dµ ≤ β µ(E_{α,β}).
Therefore
α µ(E_{α,β}) ≤ β µ(E_{α,β})
and since β < α this shows that µ(E_{α,β}) = 0. Thus f* = f_* µ-a.e. and
lim_{n→∞} (1/n) ∑_{j=0}^{n−1} f(T^j x) = f*(x)   µ-a.e.
We prove (ii). Let
g_n(x) = | (1/n) ∑_{j=0}^{n−1} f(T^j x) |.
Then g_n ≥ 0 and
∫ g_n dµ ≤ ∫ |f| dµ,
so we can apply Fatou’s Lemma (Proposition 10.5.3) to conclude that lim_{n→∞} g_n = |f*| is integrable, i.e. that f* ∈ L^1(X, B, µ).
We prove (iii). For n ∈ ℕ and k ∈ ℤ, define
D_k^n = { x ∈ X | k/n ≤ f*(x) < (k + 1)/n }.
For every ε > 0, we have that
D_k^n ∩ M_{k/n − ε} = D_k^n.
Since T^{−1}D_k^n = D_k^n, we can apply Corollary 10.5.2 again to obtain
∫_{D_k^n} f dµ ≥ (k/n − ε) µ(D_k^n).
Since ε > 0 is arbitrary, we have
∫_{D_k^n} f dµ ≥ (k/n) µ(D_k^n).
Thus
∫_{D_k^n} f* dµ ≤ ((k + 1)/n) µ(D_k^n) ≤ (1/n) µ(D_k^n) + ∫_{D_k^n} f dµ
(where the first inequality follows from the definition of D_k^n). Since
X = ⋃_{k∈ℤ} D_k^n
(a disjoint union), summing over k ∈ ℤ gives
∫_X f* dµ ≤ (1/n) µ(X) + ∫_X f dµ = 1/n + ∫_X f dµ.
Since this holds for all n ≥ 1, we obtain
∫_X f* dµ ≤ ∫_X f dµ.
Applying the same argument to −f gives
∫ (−f)* dµ ≤ ∫ (−f) dµ
so that
∫ f_* dµ ≥ ∫ f dµ.
Therefore
∫ f* dµ = ∫ f_* dµ = ∫ f dµ,
as required.
Finally, we prove that f* = E(f | I). First note that as f* is T-invariant, it is measurable with respect to I. Moreover, if I is any T-invariant set then
∫_I f dµ = ∫_I f* dµ.
Hence f* = E(f | I).  □

§10.6
Exercises
Exercise 10.1
Suppose that T is an ergodic measure-preserving transformation of the probability space
(X, B, µ) and suppose that f ∈ L^1(X, B, µ). Prove that
lim_{n→∞} f(T^n x)/n = 0   µ-a.e.
Exercise 10.2
Deduce from Birkhoff’s Ergodic Theorem that if T is an ergodic measure-preserving transformation of a probability space (X, B, µ) and f ≥ 0 is measurable but ∫ f dµ = ∞ then
(1/n) ∑_{j=0}^{n−1} f(T^j x) → ∞   µ-a.e.
(Hint: define fM = min{f, M } and note that fM ∈ L1 (X, B, µ). Apply Birkhoff’s Ergodic
Theorem to each fM .)
Exercise 10.3
Let T be a measure-preserving transformation of the probability space (X, B, µ). Prove
that the following are equivalent:
(i) T is ergodic with respect to µ,
(ii) for all f, g ∈ L^2(X, B, µ) we have that
lim_{n→∞} (1/n) ∑_{j=0}^{n−1} ∫ f(T^j x) g(x) dµ = ∫ f dµ · ∫ g dµ.
Exercise 10.4
Let X be a compact metric space equipped with the Borel σ-algebra B and let T : X → X
be continuous. Suppose that µ ∈ M (X) is an ergodic measure.
Prove that there exists a set Y ∈ B with µ(Y ) = 1 such that
(1/n) ∑_{j=0}^{n−1} f(T^j x) → ∫ f dµ
for all x ∈ Y and for all f ∈ C(X, ℝ).
(Thus, in the special case of a continuous transformation of a compact metric space
and continuous functions f , the set of full measure for which Corollary 10.1.2 holds can be
chosen to be independent of the function f .)
Exercise 10.5
A popular illustration of recurrence concerns a monkey typing the complete works of Shakespeare on a typewriter. Here we study this from an ergodic-theoretic viewpoint.
Imagine a(n idealised) monkey typing on a typewriter. Each second he types one letter, and each letter occurs with equal probability (independently of the preceding letter).
Suppose that the keyboard has 26 keys (so no space bar, carriage return, numbers, etc).
Show how to model this using a shift space on 26 symbols with an appropriate Bernoulli
measure. Use Birkhoff’s Ergodic Theorem to show that the monkey must, with probability
1, eventually type the word ‘MONKEY’. Use Kac’s Lemma to calculate the expected time
it would take for the monkey to first type ‘MONKEY’.
11. Applications of Birkhoff’s Ergodic Theorem
§11.1
Introduction
We will show how to use Birkhoff’s Ergodic Theorem to prove some interesting results in
number theory.
§11.2
Normal and simply normal numbers
Recall that any number x ∈ [0, 1] can be written as a decimal
x = ·x_0 x_1 x_2 . . . = ∑_{j=0}^∞ x_j / 10^{j+1}
where xj ∈ {0, 1, . . . , 9}. This decimal expansion is unique unless the decimal expansion
ends in either infinitely repeated 0s or infinitely repeated 9s.
More generally, given any integer base b ≥ 2, we can write x ∈ [0, 1] as a base b expansion:
x = ·x_0 x_1 x_2 . . . = ∑_{j=0}^∞ x_j / b^{j+1}
where xj ∈ {0, 1, . . . , b − 1}. This expansion is unique unless it ends in either infinitely
repeated 0s or infinitely repeated (b − 1)s.
Definition. A number x ∈ [0, 1] is said to be simply normal in base b if for each k =
0, 1, . . . , b − 1, the frequency with which digit k occurs in the base b expansion of x is equal
to 1/b.
Remarks.
1. Thus a number is simply normal in base b if all of the b possible digits in its base b
expansion are equally likely to occur.
2. It is straightforward to construct examples of simply normal numbers in a given base.
For example,
x = ·012 · · · 9012 · · · 9 · · ·
consisting of the block of decimal digits 012 · · · 9 infinitely repeated is simply normal
in base 10. If a number is simply normal in one base then it need not be simply
normal in any other base.
Fix b ≥ 2. Define the map T : [0, 1] → [0, 1] by T (x) = Tb (x) = bx mod 1. It is easy to
see, by following any of the arguments we have seen for the doubling map, that Lebesgue
measure µ on [0, 1] is an ergodic invariant measure for T .
There is a close connection between the map Tb and base b expansions. Note that if
x ∈ [0, 1] has base b expansion
x = ∑_{j=0}^∞ x_j / b^{j+1} = ·x_0 x_1 x_2 · · ·
then
T_b(x) = b ∑_{j=0}^∞ x_j / b^{j+1}  mod 1 = x_0 + ∑_{j=1}^∞ x_j / b^j  mod 1 = ∑_{j=0}^∞ x_{j+1} / b^{j+1} = ·x_1 x_2 x_3 · · · .
Thus Tb acts on base b expansions by deleting the zeroth term and then shifting the remaining digits one place to the left. This relationship between base b expansions and the
map Tb can be used to prove the following result.
Proposition 11.2.1
Let b ≥ 2. Then Lebesgue almost every real number in [0, 1] is simply normal in base b.
Proof. Fix k ∈ {0, 1, . . . , b − 1}. Note that x_0 = k if and only if x ∈ [k/b, (k + 1)/b). Hence x_j = k if and only if T_b^j(x) ∈ [k/b, (k + 1)/b). Thus
(1/n) card{0 ≤ j ≤ n − 1 | x_j = k} = (1/n) ∑_{j=0}^{n−1} χ_{[k/b,(k+1)/b)}(T^j x).    (11.2.1)
By Birkhoff’s Ergodic Theorem, for Lebesgue almost every point x the above expression converges to ∫ χ_{[k/b,(k+1)/b)}(x) dx = 1/b. Let X_b(k) denote the set of points x ∈ [0, 1] for which (11.2.1) converges. Then µ(X_b(k)) = 1 for each k = 0, 1, . . . , b − 1.
Let X_b = ⋂_{k=0}^{b−1} X_b(k). Then µ(X_b) = 1. If x ∈ X_b then the frequency with which digit k occurs in the base b expansion of x is equal to 1/b, i.e. x is simply normal in base b.  □
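Simple normality can be illustrated numerically. Under Lebesgue measure the base-10 digits x_0, x_1, . . . behave like independent uniform draws from {0, . . . , 9}, so sampling digits at random and counting frequencies is a faithful, if informal, model (my illustration, with a fixed seed):

```python
import random

# Empirical check of simple normality in base 10: each digit of a "random"
# x in [0,1] should appear with frequency about 1/10.
random.seed(0)
n = 100000
digits = [random.randrange(10) for _ in range(n)]   # base-10 digits of a random x
freqs = [digits.count(k) / n for k in range(10)]
print(max(abs(p - 0.1) for p in freqs))             # small deviation from 1/10
```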
We can consider a more general notion of normality of numbers as follows. Take x ∈ [0, 1] and write x as a base b expansion
x = ·x_0 x_1 x_2 . . . = ∑_{j=0}^∞ x_j / b^{j+1}
where x_j ∈ {0, 1, . . . , b − 1}. Fix a finite word of symbols i_0, i_1, . . . , i_{k−1} where i_j ∈ {0, 1, . . . , b − 1}, j = 0, . . . , k − 1. We can ask what is the frequency with which the block of symbols i_0, i_1, . . . , i_{k−1} occurs in the base b expansion of x. Note that x has a base b expansion that starts i_0 i_1 · · · i_{k−1} precisely when
x ∈ [ ∑_{j=0}^{k−1} i_j / b^{j+1} , ∑_{j=0}^{k−1} i_j / b^{j+1} + 1/b^k ).
Call this interval I(i_0, . . . , i_{k−1}) and note that it has Lebesgue measure 1/b^k.
Definition. A number x ∈ [0, 1] is said to be normal in base b if, for each k ≥ 1 and for
each word i0 , i1 , . . . , ik−1 of length k, the frequency with which this word occurs in the base
b expansion of x is equal to 1/bk .
Proposition 11.2.2
Let b ≥ 2 be an integer. Lebesgue almost every real number in [0, 1] is normal in base b.
Proof. Fix a word i_0, i_1, . . . , i_{k−1} of length k and define the interval I(i_0, . . . , i_{k−1}) as above. Then the word i_0, i_1, . . . , i_{k−1} occurs at the jth place in the base b expansion of x if and only if T_b^j(x) ∈ I(i_0, . . . , i_{k−1}). Thus
(1/n) card{0 ≤ j ≤ n − 1 | i_0, i_1, . . . , i_{k−1} occurs at the jth place in the base b expansion of x} = (1/n) ∑_{j=0}^{n−1} χ_{I(i_0,...,i_{k−1})}(T^j x).    (11.2.2)
By Birkhoff’s Ergodic Theorem, for Lebesgue almost every point x the above expression converges to ∫ χ_{I(i_0,...,i_{k−1})}(x) dx = 1/b^k. Let X_b(i_0, i_1, . . . , i_{k−1}) denote the set of points x ∈ [0, 1] for which (11.2.2) converges. Then µ(X_b(i_0, i_1, . . . , i_{k−1})) = 1 for each word i_0, i_1, . . . , i_{k−1} of length k.
Let
X_b = ⋂_{k=1}^∞ ⋂_{i_0,i_1,...,i_{k−1}} X_b(i_0, i_1, . . . , i_{k−1})
where the second intersection is taken over all words of length k. As this is a countable intersection, we have that µ(X_b) = 1. If x ∈ X_b then the frequency with which any word of length k occurs in the base b expansion of x is equal to 1/b^k, i.e. x is normal in base b.  □
We can then make the following definition.
Definition. A number x ∈ [0, 1] is normal if it is normal in base b for every base b ≥ 2.
One can then prove the following result:
Proposition 11.2.3
Lebesgue almost every number x ∈ [0, 1] is normal.
Remark. Although a ‘typical’ number is normal, no naturally occurring number is known to be normal: for example, it is not known whether π, e or √2 are normal.
§11.3
Continued fractions
We can use Birkhoff’s Ergodic Theorem to study the frequency with which a given digit
occurs in the continued fraction expansion of real numbers.
Proposition 11.3.1
For Lebesgue-almost every x ∈ [0, 1], the frequency with which the natural number k occurs in the continued fraction expansion of x is
(1/log 2) log( (k + 1)² / (k(k + 2)) ).
Proof. Let λ denote Lebesgue measure and let µ denote Gauss’ measure. Recall that λ and µ are equivalent, i.e. they have the same sets of measure zero. Then λ-a.e. and µ-a.e. x ∈ (0, 1) is irrational and has an infinite continued fraction expansion
x = 1/(x_0 + 1/(x_1 + 1/(x_2 + · · · ))).
Let T denote the continued fraction map. Then
T(x) = 1/(x_1 + 1/(x_2 + 1/(x_3 + · · · )))
so that
1/T(x) = x_1 + 1/(x_2 + 1/(x_3 + · · · )).
Hence x_1 = [1/T(x)], where [x] denotes the integer part of x. More generally, x_n = [1/T^n x].
Fix k ∈ ℕ. Note that x has a continued fraction expansion starting with digit k (i.e. x_0 = k) precisely when [1/x] = k. That is, x_0 = k precisely when
k ≤ 1/x < k + 1
which is equivalent to requiring
1/(k + 1) < x ≤ 1/k,
i.e. x ∈ (1/(k + 1), 1/k]. Similarly x_n = k precisely when T^n x ∈ (1/(k + 1), 1/k].
Hence
(1/n) card{0 ≤ j ≤ n − 1 | x_j = k} = (1/n) ∑_{j=0}^{n−1} χ_{(1/(k+1),1/k]}(T^j x)
→ ∫ χ_{(1/(k+1),1/k]} dµ   for µ-a.e. x
= (1/log 2) ( log(1 + 1/k) − log(1 + 1/(k + 1)) )   for µ-a.e. x
= (1/log 2) log( (k + 1)² / (k(k + 2)) )   for µ-a.e. x.
As µ and λ have the same sets of measure zero, this holds for Lebesgue almost every point.  □
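As a sanity check on Proposition 11.3.1, the frequencies should sum to 1 over k, since every digit is some natural number: the product of (k + 1)²/(k(k + 2)) telescopes to 2, so the partial sums of the frequencies tend to log 2 / log 2 = 1. This can be verified directly (my illustrative computation):

```python
import math

def freq(k):
    # Frequency of digit k from Proposition 11.3.1.
    return math.log((k + 1) ** 2 / (k * (k + 2))) / math.log(2)

p1 = freq(1)                                   # log2(4/3) ~ 0.415: digit 1 is the most common
total = sum(freq(k) for k in range(1, 10**5))  # partial sum, just below 1
print(p1, total)
```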
We can also study the limiting arithmetic and geometric means of the digits in the
continued fraction expansion of Lebesgue almost every point x ∈ [0, 1].
Proposition 11.3.2
(i) For Lebesgue-almost every x ∈ [0, 1], the limiting arithmetic mean of the digits in the
continued fraction expansion of x is infinite.
(ii) For Lebesgue-almost every x ∈ [0, 1], the limiting geometric mean of the digits in the continued fraction expansion of x is
∏_{k=1}^∞ ( 1 + 1/(k² + 2k) )^{log k / log 2}.
Proof. Writing
x = 1/(x_0 + 1/(x_1 + 1/(x_2 + · · · ))),
the proposition claims that
lim_{n→∞} (1/n)(x_0 + x_1 + · · · + x_{n−1}) = ∞    (11.3.1)
for Lebesgue almost every point, and that
lim_{n→∞} (x_0 x_1 · · · x_{n−1})^{1/n} = ∏_{k=1}^∞ ( 1 + 1/(k² + 2k) )^{log k / log 2}    (11.3.2)
for Lebesgue almost every point.
We leave (11.3.1) as an exercise.
We prove (11.3.2). Define f(x) = log k for x ∈ (1/(k + 1), 1/k], so that f(x) = log k precisely when x_0 = k. Then f(T^j x) = log k precisely when x_j = k. By Exercise 3.5(iii), to show f ∈ L^1(X, B, µ) it is sufficient to show that f ∈ L^1(X, B, λ). Note that
∫ f dλ = ∑_{k=1}^∞ log k · λ( (1/(k + 1), 1/k] ) = ∑_{k=1}^∞ log k / (k(k + 1)) ≤ ∑_{k=1}^∞ log k / k²,
which converges. Hence f ∈ L^1(X, B, µ).
n−1
1
(log x0 + log x1 + · · · + log xn−1 )
n
=
→
=
=
1X
f (T j x)
n
j=0
Z 1
f (x)
1
dx
log 2 0 1 + x
∞ Z
1 X 1/k
log k
log 2
k=1
∞
X
log k
k=1
1/(k+1)
1+x
dx
1
log 1 + 2
log 2
k + 2k
,
for Gauss-almost every point x ∈ [0, 1]. As Gauss’ measure and Lebesgue measure have the
same sets of measure zero, this limit also exists for Lebesgue almost every point.
2
104
MATH4/61112
11. Applications of Birkhoff’s Ergodic Theorem
Let x ∈ (0, 1) be irrational and have continued fraction expansion [x0 , x1 , . . .]. Then
[x0 , x1 , . . . , xn−1 ] is a rational number; write [x0 , x1 , . . . , xn−1 ] = Pn /Qn , where Pn , Qn
are co-prime integers. Then Pn /Qn is a ‘good’ rational approximation to x. We write
Pn (x), Qn (x) if we wish to indicate the dependence on x. As x and Pn /Qn lie in the same
cylinder I(x0 , . . . , xn−1 ) of rank n, we must have that
| x − P_n/Q_n | ≤ diam I(x_0, . . . , x_{n−1}) ≤ 1/Q_n².
Thus we can quantify how good a rational approximation Pn /Qn is to x by looking at the
denominator Qn . Thus understanding how Qn grows gives us information about x. For a
typical point, Qn grows exponentially fast and we can determine the exponential growth
rate.
Proposition 11.3.3
For Lebesgue almost every real number x ∈ (0, 1) we have that
lim_{n→∞} (1/n) log Q_n(x) = π² / (12 log 2).
Remark. Thus, for a typical point x ∈ (0, 1), we have that Q_n(x) ∼ e^{nπ²/(12 log 2)}.
Proof (not examinable). Let x ∈ (0, 1) be irrational and have continued fraction expansion [x_0, x_1, . . .]. Write
[x_0, x_1, . . . , x_{n−1}] = P_n(x)/Q_n(x).
Then
P_n(x)/Q_n(x) = 1/( x_0 + [x_1, . . . , x_{n−1}] ) = 1/( x_0 + P_{n−1}(Tx)/Q_{n−1}(Tx) ) = Q_{n−1}(Tx) / ( P_{n−1}(Tx) + x_0 Q_{n−1}(Tx) ).    (11.3.3)
By Lemma 6.3.1(ii) and the Euclidean algorithm, we know that for all n and all x, P_n(x) and Q_n(x) are coprime. As P_{n−1}(Tx) and Q_{n−1}(Tx) are coprime, it follows that P_{n−1}(Tx) + x_0 Q_{n−1}(Tx) and Q_{n−1}(Tx) are coprime. Hence, comparing the numerators in (11.3.3), we see that P_n(x) = Q_{n−1}(Tx). Also note that P_1(x) = 1. Hence
(P_n(x)/Q_n(x)) · (P_{n−1}(Tx)/Q_{n−1}(Tx)) · · · (P_1(T^{n−1}x)/Q_1(T^{n−1}x)) = P_1(T^{n−1}x)/Q_n(x) = 1/Q_n(x).
Taking the logarithm and dividing by n gives that
−(1/n) log Q_n(x) = (1/n) log ∏_{j=0}^{n−1} P_{n−j}(T^j x)/Q_{n−j}(T^j x) = (1/n) ∑_{j=0}^{n−1} log( P_{n−j}(T^j x)/Q_{n−j}(T^j x) ).    (11.3.4)
This resembles an ergodic sum, except that the function P_{n−j}/Q_{n−j} depends on j and so we cannot immediately apply Birkhoff’s Ergodic Theorem. We will consider ergodic sums using the function f(x) = log x and show that the difference between (1/n) ∑_{j=0}^{n−1} f(T^j x) and (11.3.4) is small.
Let f(x) = log x. Then we can write (11.3.4) as
−(1/n) log Q_n(x) = (1/n) ∑_{j=0}^{n−1} f(T^j x) − (1/n) ∑_{j=0}^{n−1} ( log T^j(x) − log( P_{n−j}(T^j x)/Q_{n−j}(T^j x) ) ) = (1/n) Σ_n^{(1)} − (1/n) Σ_n^{(2)},
where Σ_n^{(1)} = ∑_{j=0}^{n−1} f(T^j x) and Σ_n^{(2)} = ∑_{j=0}^{n−1} ( log T^j(x) − log( P_{n−j}(T^j x)/Q_{n−j}(T^j x) ) ).
We evaluate lim_{n→∞} (1/n) Σ_n^{(1)}. By Birkhoff’s Ergodic Theorem and the fact that Gauss’ measure µ and Lebesgue measure are equivalent, it follows that for Lebesgue almost every x ∈ [0, 1] we have that
lim_{n→∞} (1/n) Σ_n^{(1)} = (1/log 2) ∫_0^1 f(x)/(1 + x) dx = (1/log 2) ∫_0^1 log x/(1 + x) dx.
Integrating by parts we have that
∫_0^1 log x/(1 + x) dx = [ log x · log(1 + x) ]_0^1 − ∫_0^1 log(1 + x)/x dx = −∫_0^1 log(1 + x)/x dx.
The Taylor series expansion of log(1 + x) about zero is
log(1 + x) = x − x²/2 + x³/3 − · · · = ∑_{k=1}^∞ (−1)^{k−1} x^k / k
so that
log(1 + x)/x = ∑_{k=0}^∞ (−1)^k x^k / (k + 1).
Hence for almost every x,
lim_{n→∞} (1/n) Σ_n^{(1)} = −(1/log 2) ∑_{k=0}^∞ ( (−1)^k/(k + 1) ) ∫_0^1 x^k dx = −(1/log 2) ∑_{k=0}^∞ (−1)^k/(k + 1)².
Note that
∑_{k=0}^∞ (−1)^k/(k + 1)² = ∑_{n=1}^∞ 1/n² − 2 ∑_{n=1}^∞ 1/(2n)² = ∑_{n=1}^∞ 1/n² − (1/2) ∑_{n=1}^∞ 1/n² = π²/12,
using the well-known fact that ∑_{n=1}^∞ 1/n² = π²/6. Hence for almost every x,
lim_{n→∞} (1/n) Σ_n^{(1)} = −π² / (12 log 2).
It remains to show that (1/n) Σ_n^{(2)} → 0 as n → ∞. Recall that in §6.3 we introduced the cylinder set I(x_0, x_1, . . . , x_{n−1}) of rank n, denoting the set of points x with continued fraction expansion that starts x_0, . . . , x_{n−1}. We proved in §6.3 that I(x_0, x_1, . . . , x_{n−1}) is an interval with length at most 1/Q_n(x)². Note that both x and P_n(x)/Q_n(x) lie in the same interval of rank n. Hence
| x/(P_n/Q_n) − 1 | = (Q_n/P_n) | x − P_n/Q_n | ≤ (Q_n/P_n) · 1/Q_n² = 1/(P_n Q_n).
It follows from Lemma 6.3.1(i) that P_n ≥ 2^{(n−2)/2} and Q_n ≥ 2^{(n−1)/2}. Hence
| x/(P_n/Q_n) − 1 | ≤ 1/2^{n−3/2} ≤ 1/2^{n−2}.
By the triangle inequality and the fact that log y ≤ y − 1 we have that
|Σ_n^{(2)}| ≤ ∑_{j=0}^{n−1} | log( T^j(x) / ( P_{n−j}(T^j x)/Q_{n−j}(T^j x) ) ) |
≤ ∑_{j=0}^{n−1} | T^j(x) / ( P_{n−j}(T^j x)/Q_{n−j}(T^j x) ) − 1 |
≤ ∑_{j=0}^{n−1} 1/2^{n−j−2}.
Note that
∑_{j=0}^{n−1} 1/2^{n−j−2} ≤ ∑_{j=0}^∞ 1/2^{j−1} = 4.
Hence |Σ_n^{(2)}| < 4 for all n. Hence
lim_{n→∞} (1/n) Σ_n^{(2)} = 0
and the result follows.  □

§11.4
Exercises
Exercise 11.1
Let b ≥ 2 be an integer. Prove that Lebesgue measure is an ergodic invariant measure for
Tb (x) = bx mod 1 defined on the unit interval.
Exercise 11.2
(i) A number x ∈ [0, 1] is said to be simply normal if it is simply normal in base b for all
b ≥ 2. Prove that Lebesgue a.e. number x ∈ [0, 1] is simply normal.
(ii) Prove Proposition 11.2.3.
Exercise 11.3
Let r ≥ 2 be an integer. Prove that for Lebesgue almost every x ∈ [0, 1], the sequence x_n = r^n x is uniformly distributed mod 1.
Exercise 11.4
Prove that the arithmetic mean of the digits appearing in the base 10 expansion of Lebesgue-a.e. x ∈ [0, 1) is equal to 4.5, i.e. prove that if x = ∑_{j=0}^∞ x_j/10^{j+1}, x_j ∈ {0, 1, . . . , 9}, then
lim_{n→∞} (1/n)(x_0 + x_1 + · · · + x_{n−1}) = 4.5   a.e.
Exercise 11.5
Let x ∈ (0, 1) have continued fraction expansion x = [x0 , x1 , x2 , . . .].
Prove that
lim_{n→∞} (1/n)(x_0 + x_1 + · · · + x_{n−1}) = ∞
for Lebesgue almost every x ∈ [0, 1]. (Hint: use Exercise 10.2.)
12. Solutions to the Exercises
Solution 1.1
Suppose that xn ∈ R is uniformly distributed mod 1. Let x ∈ [0, 1] and let ε > 0. We want
to show that there exists n such that {xn } ∈ (x − ε, x + ε) ∩ [0, 1] (as usual, {xn } denotes
the fractional part of xn ).
By the definition of uniform distribution mod 1 we have that

    lim_{n→∞} (1/n) card{j | 0 ≤ j ≤ n − 1, {x_j} ∈ (x − ε, x + ε)} = 2ε.

Then there exists n_0 such that if n ≥ n_0 then

    (1/n) card{j | 0 ≤ j ≤ n − 1, {x_j} ∈ (x − ε, x + ε)} > ε > 0.
Hence
card{j | 0 ≤ j ≤ n − 1, {xj } ∈ (x − ε, x + ε)} > 0
for some n, so there exists j such that {xj } ∈ (x − ε, x + ε).
Solution 1.2
We use Weyl’s Criterion. Let ℓ ∈ Z \ {0}. Then
    (1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} = (1/n) Σ_{j=0}^{n−1} e^{2πiℓ(αj+β)} = e^{2πiℓβ} (1/n) Σ_{j=0}^{n−1} e^{2πiℓαj}
                                    = e^{2πiℓβ} (1/n) (e^{2πiℓαn} − 1)/(e^{2πiℓα} − 1),

summing the geometric progression. As α ∉ Q, we have that e^{2πiℓα} ≠ 1 for any ℓ ∈ Z \ {0}.
Hence

    | (1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} | ≤ (1/n) |e^{2πiℓαn} − 1| / |e^{2πiℓα} − 1| ≤ (1/n) · 2/|e^{2πiℓα} − 1| → 0

as n → ∞, as |e^{2πiℓβ}| = 1. Hence x_n = αn + β is uniformly distributed.
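The decay of these Weyl averages can be illustrated numerically. The sketch below is ours (the helper name is an assumption, not from the notes); it compares the average against the geometric-series bound 2/(n |e^{2πiℓα} − 1|) derived above.

```python
import cmath
import math

def weyl_sum(xs, ell):
    """The average (1/n) sum_j exp(2 pi i ell x_j) from Weyl's Criterion."""
    return sum(cmath.exp(2j * math.pi * ell * x) for x in xs) / len(xs)

n, alpha, beta = 20000, math.sqrt(2), 0.3       # alpha irrational
xs = [alpha * j + beta for j in range(n)]
for ell in (1, 2, 5):
    bound = 2 / (n * abs(cmath.exp(2j * math.pi * ell * alpha) - 1))
    assert abs(weyl_sum(xs, ell)) <= bound + 1e-9   # bound from the solution
```

The bound is of order 1/n, so the averages visibly tend to 0 as n grows.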
Solution 1.3
(i) If log_10 2 = p/q with p, q integers, hcf(p, q) = 1, then 2 = 10^{p/q}, i.e. 2^q = 10^p = 5^p 2^p.
    Comparing indices, we see that 0 = p = q, a contradiction.
(ii) Let 2^n have leading digit r. Then

    2^n = r · 10^ℓ + terms involving lower powers of 10

where the terms involving lower powers of 10 are integers lying in [0, 10^ℓ). Hence

    2^n has leading digit r ⇔ r · 10^ℓ ≤ 2^n < (r + 1) · 10^ℓ
                            ⇔ log_10 r + ℓ ≤ n log_10 2 < log_10 (r + 1) + ℓ
                            ⇔ log_10 r ≤ {n log_10 2} < log_10 (r + 1).
Hence

    (1/n) card{k | 0 ≤ k ≤ n − 1, 2^k has leading digit r}
        = (1/n) card{k | 0 ≤ k ≤ n − 1, {k log_10 2} ∈ [log_10 r, log_10 (r + 1))}

which, by uniform distribution, converges to log_10 (r + 1) − log_10 r = log_10 (1 + 1/r) as
n → ∞.
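This Benford-type law for powers of 2 is easy to test empirically. A minimal sketch of ours, using the equivalence with the fractional parts {k log_10 2} derived above rather than computing 2^k directly:

```python
import math
from collections import Counter

n = 100000
frac = lambda t: t - math.floor(t)
# the leading digit of 2^k is int(10^{ {k log10 2} })
counts = Counter(int(10 ** frac(k * math.log10(2))) for k in range(n))
for r in range(1, 10):
    assert abs(counts[r] / n - math.log10(1 + 1 / r)) < 0.01
```

For r = 1 the empirical frequency is close to log_10 2 ≈ 0.30103, as uniform distribution predicts.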
Solution 1.4
The frequency with which the penultimate leading digit of 2^n is r is given by Σ_{q=1}^{9} A(q, r),
where A(q, r) is the frequency with which the leading digit is q and the penultimate leading
digit is r.
Now 2^n has leading digit q and penultimate digit r precisely when

    q · 10^ℓ + r · 10^{ℓ−1} ≤ 2^n < q · 10^ℓ + (r + 1) · 10^{ℓ−1}.

Taking logs shows that 2^n has leading digit q and penultimate leading digit r when

    log_10 (10q + r) + ℓ − 1 ≤ n log_10 2 < log_10 (10q + r + 1) + (ℓ − 1).

Reducing this mod 1 gives

    log_10 (10q + r) − 1 ≤ {n log_10 2} < log_10 (10q + r + 1) − 1

(the −1s appear because 1 < log_10 (10q + r), log_10 (10q + r + 1) < 2). As {n log_10 2} is
uniformly distributed mod 1, we see that

    A(q, r) = (log_10 (10q + r + 1) − 1) − (log_10 (10q + r) − 1) = log_10 (1 + 1/(10q + r)).

Hence the frequency with which the penultimate leading digit of 2^n is r is

    Σ_{q=1}^{9} log_10 (1 + 1/(10q + r)) = log_10 Π_{q=1}^{9} (1 + 1/(10q + r)).
Solution 2.1
Suppose first that the numbers α1 , . . . , αk , 1 are rationally independent. This means that
if r1 , . . . , rk , r are rational numbers such that
r1 α1 + · · · + rk αk + r = 0,
then r_1 = · · · = r_k = r = 0. In particular, for ℓ = (ℓ_1, . . . , ℓ_k) ∈ Z^k \ {0},

    ℓ_1 α_1 + · · · + ℓ_k α_k ∉ Z,

so that

    e^{2πi(ℓ_1 α_1 + ··· + ℓ_k α_k)} ≠ 1.
By summing the geometric progression we have that

    | (1/n) Σ_{j=0}^{n−1} e^{2πi(ℓ_1 jα_1 + ··· + ℓ_k jα_k)} | = (1/n) |e^{2πin(ℓ_1 α_1 + ··· + ℓ_k α_k)} − 1| / |e^{2πi(ℓ_1 α_1 + ··· + ℓ_k α_k)} − 1|
        ≤ (1/n) · 2/|e^{2πi(ℓ_1 α_1 + ··· + ℓ_k α_k)} − 1| → 0,

as n → ∞.
Therefore, by Weyl's Criterion, (nα_1, . . . , nα_k) is uniformly distributed mod 1.
Now suppose that the numbers α1 , . . . , αk , 1 are rationally dependent. Thus there exist
rationals r1 , . . . , rk , r (not all zero) such that r1 α1 + · · · + rk αk + r = 0. By multiplying by
a common denominator we can find ℓ = (ℓ_1, . . . , ℓ_k) ∈ Z^k \ {0} such that

    ℓ_1 α_1 + · · · + ℓ_k α_k ∈ Z.

Thus e^{2πi(ℓ_1 nα_1 + ··· + ℓ_k nα_k)} = 1 for all n ∈ N and so

    (1/n) Σ_{j=0}^{n−1} e^{2πi(ℓ_1 jα_1 + ··· + ℓ_k jα_k)} = 1 ↛ 0

as n → ∞.
Therefore, (nα_1, . . . , nα_k) is not uniformly distributed mod 1.
Solution 2.2
Let p(n) = α_k n^k + · · · + α_1 n + α_0. Suppose that α_k, . . . , α_{s+1} ∈ Q but α_s ∉ Q. Let

    p_1(n) = α_k n^k + · · · + α_{s+1} n^{s+1},
    p_2(n) = α_s n^s + · · · + α_1 n + α_0,

so that p(n) = p_1(n) + p_2(n). By choosing q to be a common denominator for α_k, . . . , α_{s+1},
we can write

    p_1(n) = (1/q)(m_k n^k + · · · + m_{s+1} n^{s+1})

where m_j ∈ Z.
By Weyl's Criterion, we want to show that for ℓ ∈ Z \ {0} we have

    (1/n) Σ_{j=0}^{n−1} e^{2πiℓp(j)} → 0
as n → ∞.
Write j = qm + r where r = 0, . . . , q − 1. Then p_1(qm + r) = d_r mod 1 for some d_r ∈ Q.
Moreover, p_2^{(q,r)}(m) = p_2(qm + r) is a polynomial in m with irrational leading coefficient.
Now

    lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πiℓp(j)}
        = lim_{n→∞} (1/n) Σ_{r=0}^{q−1} Σ_{m=0}^{[n/q]−1} e^{2πiℓd_r} e^{2πiℓp_2^{(q,r)}(m)}
        = lim_{n→∞} ([n/q]/n) Σ_{r=0}^{q−1} e^{2πiℓd_r} (1/[n/q]) Σ_{m=0}^{[n/q]−1} e^{2πiℓp_2^{(q,r)}(m)}
        = 0

as p_2^{(q,r)}(m) is uniformly distributed mod 1.
Solution 2.3
Let p(n) = αn^2 + n + 1 where α ∉ Q. Let m ≥ 1 and consider the sequence p^{(m)}(n) =
p(n + m) − p(n) of mth differences. We have that

    p^{(m)}(n) = α(n + m)^2 + (n + m) + 1 − αn^2 − n − 1 = 2αmn + αm^2 + m,

which is a degree 1 polynomial in n with leading coefficient 2αm ∉ Q. Note that 2αm ≠ 0,
as m ≥ 1. By Exercise 1.2 we have that p^{(m)}(n) is uniformly distributed mod 1 for every
m ≥ 1. By Lemma 2.3.3, it follows that p(n) is uniformly distributed mod 1.
Solution 2.4
By Weyl’s Criterion, we require that for each (ℓ1 , ℓ2 ) ∈ Z2 \ {(0, 0)} we have
    (1/n) Σ_{k=0}^{n−1} exp 2πi(ℓ_1 p(k) + ℓ_2 q(k)) → 0    (12.0.1)

as n → ∞. Let

    p_{ℓ_1,ℓ_2}(n) = ℓ_1 p(n) + ℓ_2 q(n)
                   = (ℓ_1 α_k + ℓ_2 β_k)n^k + · · · + (ℓ_1 α_1 + ℓ_2 β_1)n + (ℓ_1 α_0 + ℓ_2 β_0).

This is a polynomial of degree at most k. Then (12.0.1) can be written as

    (1/n) Σ_{k=0}^{n−1} exp 2πip_{ℓ_1,ℓ_2}(k).
By the 1-dimensional version of Weyl’s criterion (using the integer ℓ = 1), this will converge
to 0 as n → ∞ if p_{ℓ_1,ℓ_2}(n) is uniformly distributed mod 1. By Weyl's Theorem on Polynomials (Theorem 2.3.1), this happens if at least one of ℓ_1 α_k + ℓ_2 β_k, ℓ_1 α_{k−1} + ℓ_2 β_{k−1}, . . . , ℓ_1 α_1 +
ℓ_2 β_1 is irrational. Note that ℓ_1 α_i + ℓ_2 β_i ∉ Q if and only if α_i, β_i, 1 are rationally independent.
Solution 2.5
(i) We know that ∅ ∈ B and that if E ∈ B then X \ E ∈ B. Hence X = X \ ∅ ∈ B.
(ii) Let E_n ∈ B. Then X \ E_n ∈ B. Then ∪_n (X \ E_n) ∈ B. Now ∪_n (X \ E_n) = X \ ∩_n E_n.
     Hence ∩_n E_n = X \ (X \ ∩_n E_n) ∈ B.
Solution 2.6
The smallest σ-algebra containing the sets [0, 1/4), [1/4, 1/2), [1/2, 3/4) and [3/4, 1] is
B = {∅, [0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1], [0, 1/2), [0, 1/4) ∪ [1/2, 3/4), [0, 1/4) ∪ [3/4, 1],
[1/4, 3/4), [1/4, 1/2) ∪ [3/4, 1], [1/2, 1], [0, 3/4), [0, 1/2) ∪ [3/4, 1],
[0, 1/4) ∪ [1/2, 1], [1/4, 1], [0, 1]}
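As a quick sanity check (ours, not part of the notes), this σ-algebra can be enumerated by machine: since it is generated by a partition into four atoms, its members are exactly the unions of atoms, giving 2^4 = 16 sets.

```python
from itertools import combinations

# labels only -- stand-ins for the four quarter-intervals
atoms = ["[0,1/4)", "[1/4,1/2)", "[1/2,3/4)", "[3/4,1]"]

# every element of the generated sigma-algebra is a union of atoms
algebra = [frozenset(s) for k in range(len(atoms) + 1)
           for s in combinations(atoms, k)]
assert len(algebra) == 16        # matches the 16 sets listed above
```

The empty combination corresponds to ∅ and the full combination to [0, 1].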
Solution 2.7
Clearly a finite union of dyadic intervals is a Borel set.
By Proposition 2.4.2 we need to show that if x, y ∈ [0, 1], x 6= y, then there exist disjoint
dyadic intervals I1 , I2 such that x ∈ I1 , y ∈ I2 . Let ε = |x − y| and choose n such that
1/2^n < ε/2. Without loss of generality, assume that x < y. Then there exist integers p, q,
p < q, such that

    (p − 1)/2^n ≤ x < p/2^n ≤ q/2^n < y ≤ (q + 1)/2^n.
Hence x, y belong to different dyadic intervals.
Solution 2.8
Let A denote the collection of finite unions of intervals. Trivially ∅ ∈ A. If A, B ∈ A
are finite unions of intervals then A ∪ B is a finite union of intervals. Hence A is closed
under taking finite unions. If A = [a, b] ⊂ [0, 1] then Ac = [0, a) ∪ (b, 1] is a finite union of
intervals. Hence A is an algebra.
Solution 2.9
First note that if µ is a measure and A ⊂ B then µ(A) ≤ µ(B). (To see this, note that
if A ⊂ B then B = A ∪ (B \ A) is a disjoint union. Hence µ(B) = µ(A ∪ (B \ A)) =
µ(A) + µ(B \ A) ≥ µ(A).)
Let µ denote Lebesgue measure on [0, 1]. Let x ∈ [0, 1]. For any ε > 0, we have that
{x} ⊂ (x − ε, x + ε) ∩ [0, 1]. Hence µ({x}) ≤ 2ε. As ε > 0 is arbitrary, it follows that
µ({x}) = 0.
Let E = {x_j}_{j=1}^{∞} be a countable set. Then

    µ(E) = µ( ∪_{j=1}^{∞} {x_j} ) = Σ_{j=1}^{∞} µ({x_j}) = 0.
Hence any countable set has Lebesgue measure 0.
As the rational points in [0, 1] are countable, it follows that µ(Q ∩ [0, 1]) = 0. Hence
Lebesgue almost every point in [0, 1] is irrational.
Solution 2.10
Let µ = δ_{1/2} be the Dirac δ-measure at 1/2. Then, by definition, µ([0, 1/2) ∪ (1/2, 1]) = 0
as 1/2 ∉ [0, 1/2) ∪ (1/2, 1]. Hence µ{x ∈ [0, 1] | x ≠ 1/2} = 0, so that µ-a.e. point in [0, 1]
is equal to 1/2.
Solution 3.1
Let xn = αn where α ∈ R is irrational. Then xn is uniformly distributed mod 1 (by the
results in §1.2.1). Let A = {{αn} | n ≥ 0} ⊂ [0, 1] denote the set of fractional parts of the
sequence x_n; note that A is a countable set. Let f = χ_A. Then f ∈ L^1([0, 1], B, µ) (where
B denotes the Borel σ-algebra and µ denotes Lebesgue measure on [0, 1]) and f ≡ 0 a.e.
Hence ∫ f dµ = 0. However, f({x_n}) = 1 for each n. Hence

    (1/n) Σ_{j=0}^{n−1} f({x_j}) = 1 ↛ ∫ f dµ = 0

as n → ∞.
Solution 3.2
Let X be a compact metric space equipped with the Borel σ-algebra B. Let T : X → X
be continuous. Recall that B is generated by the open sets. It is sufficient to check that
T −1 U ∈ B for all open sets U . But this is clear: as T is continuous, the pre-image T −1 U
of any open set is open, hence T −1 U ∈ B.
Solution 3.3
Define fn : [0, 1] → R by
    f_n(x) = n − n^2 x   if 0 ≤ x ≤ 1/n,
           = 0           if 1/n ≤ x ≤ 1.
(Draw a picture!) Then f_n is continuous, hence f_n ∈ L^1(X, B, µ). Moreover, ∫ f_n dµ = 1/2
for each n. Hence f_n ↛ 0 in L^1(X, B, µ).
However, f_n → 0 µ-a.e. To see this, let x ∈ [0, 1], x ≠ 0. Choose N such that 1/N < x.
Then f_n(x) = 0 for any n ≥ N. Hence, if x ≠ 0, we have that f_n(x) = 0 for all sufficiently
large n. Hence f_n → 0 µ-a.e.
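A short numerical sketch of this example (ours, not part of the notes): the integrals ∫ f_n dµ stay at 1/2 while f_n(x) is eventually 0 for each fixed x > 0.

```python
def f(n, x):
    # f_n(x) = n - n^2 x on [0, 1/n], and 0 on [1/n, 1]
    return n - n * n * x if x <= 1.0 / n else 0.0

def integral(n, steps=100000):
    # midpoint-rule approximation of the integral of f_n over [0, 1]
    h = 1.0 / steps
    return sum(f(n, (i + 0.5) * h) for i in range(steps)) * h

for n in (2, 5, 10):
    assert abs(integral(n) - 0.5) < 1e-3     # triangle of base 1/n, height n
assert f(100, 0.25) == 0.0                   # pointwise, f_n(x) -> 0 for x > 0
```

So L^1 convergence fails even though the pointwise (a.e.) limit is 0, exactly as claimed.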
Solution 3.4
First note that if B ∈ B then T −1 B ∈ B. Hence T∗ µ(B) = µ(T −1 B) is well-defined.
Clearly T −1 (∅) = ∅. Hence T∗ µ(∅) = µ(T −1 ∅) = µ(∅) = 0.
Let E_n ∈ B be pairwise disjoint. Then the T^{−1}E_n ∈ B are pairwise disjoint. (To see
this, suppose that x ∈ T^{−1}E_n ∩ T^{−1}E_m. Then T(x) ∈ E_n and T(x) ∈ E_m. Hence
T(x) ∈ E_n ∩ E_m. As the E_n are pairwise disjoint, this implies that n = m. Hence
T^{−1}E_n = T^{−1}E_m.) Hence

    T_*µ( ∪_{n=1}^{∞} E_n ) = µ( T^{−1} ∪_{n=1}^{∞} E_n ) = µ( ∪_{n=1}^{∞} T^{−1}E_n ) = Σ_{n=1}^{∞} µ(T^{−1}E_n) = Σ_{n=1}^{∞} T_*µ(E_n),

where we have used the fact that T^{−1} ∪_{n=1}^{∞} E_n = ∪_{n=1}^{∞} T^{−1}E_n.
Hence T_*µ is a measure.
Finally, note that T^{−1}(X) = X. Hence T_*µ(X) = µ(T^{−1}X) = µ(X) = 1, so that T_*µ is
a probability measure.
Solution 3.5
(i) Let λ denote Lebesgue measure on [0, 1]. All one needs to do is to find a set B such
that λ(B) 6= λ(T −1 B), and any (reasonable) choice of set B will work. For example,
take B = (1/2, 1). Then
    T^{−1}(B) = ∪_{n=1}^{∞} ( 1/(n + 1), 1/(n + 1/2) ).

It follows that

    λ(T^{−1}B) = Σ_{n=1}^{∞} 1/((1 + 2n)(1 + n)) = log(4) − 1 < 1/2 = λ(B).
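The value of this series can be confirmed numerically (a sketch of ours, not in the notes): partial sums of Σ 1/((1 + 2n)(1 + n)) approach log 4 − 1 ≈ 0.3863.

```python
import math

# partial sum of the series from the solution; the tail after N terms
# is of order 1/(2N), so two million terms give ~1e-6 accuracy
total = sum(1.0 / ((1 + 2 * n) * (1 + n)) for n in range(1, 2000000))
assert abs(total - (math.log(4) - 1)) < 1e-6
```

Since log 4 − 1 ≈ 0.386 ≠ 1/2, Lebesgue measure is indeed not invariant for the Gauss map.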
(ii) Recall that
    µ(B) = (1/log 2) ∫_B dx/(1 + x) = (1/log 2) ∫ χ_B(x)/(1 + x) dx.

Note that 1/2 ≤ 1/(1 + x) ≤ 1 if 0 ≤ x ≤ 1. Hence

    (1/(2 log 2)) ∫ χ_B(x) dx ≤ µ(B) ≤ (1/log 2) ∫ χ_B(x) dx

so that

    (1/(2 log 2)) λ(B) = (1/(2 log 2)) ∫ χ_B(x) dx ≤ µ(B) ≤ (1/log 2) ∫ χ_B(x) dx = (1/log 2) λ(B).
(iii) From (3.4.1) it follows that
    (1/(2 log 2)) ∫ f dλ ≤ ∫ f dµ ≤ (1/log 2) ∫ f dλ    (12.0.2)

for all simple functions f. By taking increasing sequences of simple functions, we
see that (12.0.2) continues to hold for non-negative measurable functions. Let f ∈
L^1(X, B, µ). Then

    (1/(2 log 2)) ∫ |f| dλ ≤ ∫ |f| dµ

so that f ∈ L^1(X, B, λ). Similarly, if f ∈ L^1(X, B, λ) then f ∈ L^1(X, B, µ).
Solution 3.6
Let [a, b] ⊂ [0, 1]. Then

    T^{−1}[a, b] = ∪_{j=0}^{k−1} [ (a + j)/k, (b + j)/k ]

so that

    T_*µ([a, b]) = Σ_{j=0}^{k−1} ( (b + j)/k − (a + j)/k ) = Σ_{j=0}^{k−1} (b − a)/k = b − a = µ([a, b]).
Hence T∗ µ and µ agree on intervals. Hence, by the Hahn-Kolmogorov Extension Theorem,
T∗ µ = µ so that µ is a T -invariant measure.
Solution 3.7
Let T (x) = βx mod 1. Then T has a graph as illustrated in Figure 12.1.
Figure 12.1: The graph of T (x) = βx mod 1.
Let us first show that T does not preserve Lebesgue measure. For this, it is sufficient to
find a set B such that B and T −1 B do not have equal Lebesgue measure; in fact, almost any
reasonable choice of B will suffice, but here is a specific example. Let λ denote Lebesgue
measure. Take B = [1/β, 1]. Then λ(B) = 1 − 1/β = 1/β^2 (as β − 1 = 1/β). Now
T^{−1}[1/β, 1] = [1/β^2, 1/β] so that λ(T^{−1}B) = 1/β − 1/β^2 = 1/β^3 ≠ λ(B).
We now show that T does preserve the measure µ defined as in the statement of the
question. To do this we again use the Hahn-Kolmogorov Extension Theorem, which tells
us that it is sufficient to prove that µ(T −1 [a, b]) = µ[a, b] for all intervals [a, b] ⊂ [0, 1].
If [a, b] ⊂ [0, 1/β] then

    T^{−1}[a, b] = [ a/β, b/β ] ∪ [ (a + 1)/β, (b + 1)/β ],

a disjoint union. Hence,

    µ(T^{−1}[a, b]) = (1/(1/β + 1/β^3)) ( (b − a)/β + (1/β) · ((b + 1) − (a + 1))/β )
                    = ( (b − a)/(1/β + 1/β^3) ) (1/β + 1/β^2)
                    = (b − a)/(1/β + 1/β^3)    (as 1/β + 1/β^2 = 1)
                    = µ([a, b]).
If [a, b] ⊂ [1/β, 1] then T^{−1}[a, b] = [a/β, b/β] and

    µ(T^{−1}[a, b]) = µ([ a/β, b/β ]) = (1/(1/β + 1/β^3)) · (b − a)/β = µ([a, b]).

If a < 1/β < b then we write [a, b] = [a, 1/β] ∪ [1/β, b]. Then T^{−1}[a, b] = T^{−1}[a, 1/β] ∪
T^{−1}[1/β, b], a disjoint union. Hence

    µ(T^{−1}[a, b]) = µ(T^{−1}[a, 1/β] ∪ T^{−1}[1/β, b])
                    = µ(T^{−1}[a, 1/β]) + µ(T^{−1}[1/β, b])
                    = µ([a, 1/β]) + µ([1/β, b]) = µ([a, b]).
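The invariance just proved can also be tested numerically. The sketch below is ours and assumes the density implicit in the computation above: constant 1 on [0, 1/β) and 1/β on [1/β, 1], normalised by 1/β + 1/β^3.

```python
import math

beta = (1 + math.sqrt(5)) / 2              # golden mean: beta^2 = beta + 1
C = 1 / (1 / beta + 1 / beta ** 3)         # normalising constant (assumed form)

def mu(a, b, steps=200000):
    """Riemann-sum approximation of mu([a, b]) under the assumed density."""
    h = (b - a) / steps
    return C * sum((1.0 if a + (i + 0.5) * h < 1 / beta else 1 / beta) * h
                   for i in range(steps))

def mu_preimage(a, b):
    # branches of T(x) = beta x mod 1: x/beta always; (x+1)/beta only for
    # the part of [a, b] lying in [0, beta - 1] = [0, 1/beta]
    total = mu(a / beta, b / beta)
    lo, hi = max(a, 0.0), min(b, 1 / beta)
    if lo < hi:
        total += mu((lo + 1) / beta, (hi + 1) / beta)
    return total

for a, b in [(0.1, 0.3), (0.7, 0.9), (0.2, 0.8)]:
    assert abs(mu_preimage(a, b) - mu(a, b)) < 1e-4
```

The identity 1/β + 1/β^2 = 1 used in the computation is exactly what makes the two branch contributions add up to µ([a, b]).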
Solution 3.8
(i) Note that

    µ([0, 1]) = (1/π) ∫_0^1 dx/√(x(1 − x)) = (1/π) ∫_0^{π/2} 2 dθ = 1,

using the substitution x = sin^2 θ.
(ii) By the Hahn-Kolmogorov Extension Theorem it is sufficient to prove that µ([a, b]) =
µ(T −1 [a, b]) for all intervals [a, b].
Note that

    T^{−1}[a, b] = [ (1 − √(1 − a))/2, (1 − √(1 − b))/2 ] ∪ [ (1 + √(1 − b))/2, (1 + √(1 − a))/2 ]

(as the graph of T is decreasing on [1/2, 1] the order of a, b is reversed in the second
sub-interval). It is sufficient to prove that

    µ([ (1 − √(1 − a))/2, (1 − √(1 − b))/2 ]) = (1/2) µ([a, b])
and

    µ([ (1 + √(1 − b))/2, (1 + √(1 − a))/2 ]) = (1/2) µ([a, b]).
We prove the first equality (the second is similar). Now
    µ([ (1 − √(1 − a))/2, (1 − √(1 − b))/2 ]) = (1/π) ∫_{(1−√(1−a))/2}^{(1−√(1−b))/2} dx/√(x(1 − x)).    (12.0.3)
Consider the substitution u = 4x(1 − x). Then du = 4(1 − 2x) dx and, as x ranges
between (1 − √(1 − a))/2 and (1 − √(1 − b))/2, u ranges between a and b. Note also that
a simple manipulation shows that (1 − 2x)^2 = 1 − u. Hence the right-hand side of
(12.0.3) is equal to
    (1/2π) ∫_a^b du/√(u(1 − u)) = (1/2) µ([a, b]).
Similarly,

    µ([ (1 + √(1 − b))/2, (1 + √(1 − a))/2 ]) = (1/2) µ([a, b])

and the result follows.
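Using the closed form ∫_a^b dx/(π√(x(1 − x))) = (2/π)(arcsin √b − arcsin √a), this computation can be verified numerically. The sketch below is ours; it takes T to be the logistic map T(x) = 4x(1 − x), which is the map the preimage formula above describes.

```python
import math

def mu(a, b):
    """mu([a, b]) for the arcsine measure, via the closed form
    (2/pi)(asin(sqrt(b)) - asin(sqrt(a)))."""
    return (2 / math.pi) * (math.asin(math.sqrt(b)) - math.asin(math.sqrt(a)))

def mu_preimage(a, b):
    # preimages of [a, b] under T(x) = 4x(1-x): one branch each side of 1/2
    left = ((1 - math.sqrt(1 - a)) / 2, (1 - math.sqrt(1 - b)) / 2)
    right = ((1 + math.sqrt(1 - b)) / 2, (1 + math.sqrt(1 - a)) / 2)
    return mu(*left) + mu(*right)

assert abs(mu(0.0, 1.0) - 1.0) < 1e-12          # mu is a probability measure
for a, b in [(0.1, 0.4), (0.25, 0.75), (0.0, 0.5)]:
    assert abs(mu_preimage(a, b) - mu(a, b)) < 1e-12
```

Each branch contributes exactly half of µ([a, b]), matching the two equalities proved above.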
Solution 3.9
Note that

    X = ∪_{n=1}^{∞} [ 1/(n + 1), 1/n ] ∪ {0}

and that this is a disjoint union. Hence, denoting Lebesgue measure by µ,

    1 = µ(X) = Σ_{n=1}^{∞} µ([ 1/(n + 1), 1/n ]) = Σ_{n=1}^{∞} ( 1/n − 1/(n + 1) ) = Σ_{n=1}^{∞} 1/(n(n + 1)).
By the Hahn-Kolmogorov Extension Theorem, it is sufficient to check that µ(T −1 [a, b]) =
µ([a, b]) for all intervals [a, b]. It is straightforward to check that
    T^{−1}[a, b] = ∪_{n=1}^{∞} [ (a + n)/(n(n + 1)), (b + n)/(n(n + 1)) ]

and that this is a disjoint union. Hence

    µ(T^{−1}[a, b]) = Σ_{n=1}^{∞} µ([ (a + n)/(n(n + 1)), (b + n)/(n(n + 1)) ]) = Σ_{n=1}^{∞} (b − a)/(n(n + 1)) = b − a = µ([a, b]).
Solution 3.10
(i) Clearly d(x, y) ≥ 0 with equality if and only if x = y. It is also clear that d(x, y) =
d(y, x). It remains to prove the triangle inequality: d(x, z) ≤ d(x, y) + d(y, z) for all
x, y, z ∈ Σ.
If any of x, y, z are equal then the triangle inequality is clear, so we can assume that
x, y, z are all distinct. Suppose that x and y agree in the first n places and that y
and z agree in the first m places. Then x and z agree in at least the first min{n, m}
places. Hence n(x, z) ≥ min{n(x, y), n(y, z)}. Hence
    d(x, z) = 1/2^{n(x,z)} ≤ 1/2^{min{n(x,y), n(y,z)}} ≤ 1/2^{n(x,y)} + 1/2^{n(y,z)} = d(x, y) + d(y, z).
(ii) Let ε > 0 and choose n ≥ 1 such that 1/2n < ε. Choose δ = 1/2n+1 . Suppose that
d(x, y) < δ = 1/2n+1 . Then n(x, y) > n + 1, i.e. x and y agree in at least the
first n + 1 places. Hence σ(x) and σ(y) agree in at least the first n places. Hence
n(σ(x), σ(y)) > n. Hence d(σ(x), σ(y)) < 1/2n < ε.
(iii) We show that [i0 , . . . , in−1 ] is open. Let x ∈ [i0 , . . . , in−1 ] so that xj = ij for j =
0, 1, . . . , n − 1. Choose ε = 1/2n . Suppose that d(x, y) < ε. Then n(x, y) > n, i.e. x
and y agree in at least the first n places. Hence xj = yj for j = 0, 1, . . . , n − 1. Hence
yj = ij for j = 0, 1, . . . , n − 1 so that y ∈ [i0 , . . . , in−1 ]. Hence [i0 , . . . , in−1 ] is open.
To see that [i_0, . . . , i_{n−1}] is closed, note that

    Σ \ [i_0, . . . , i_{n−1}] = ∪ [i′_0, . . . , i′_{n−1}]

where the union is over all n-tuples (i′_0, i′_1, . . . , i′_{n−1}) ≠ (i_0, i_1, . . . , i_{n−1}). This is a
finite union of open sets, and so is open. Hence [i_0, . . . , i_{n−1}], as the complement of
an open set, is closed.
Solution 3.11
First note that

    P = [ 0    1    0    0    0
          1/4  0    3/4  0    0
          0    1/2  0    1/2  0
          0    0    3/4  0    1/4
          0    0    0    1    0 ],

    P^2 = [ 1/4  0    3/4  0    0
            0    5/8  0    3/8  0
            1/8  0    3/4  0    1/8
            0    3/8  0    5/8  0
            0    0    3/4  0    1/4 ],

    P^3 = [ 0     5/8  0    3/8  0
            5/32  0    3/4  0    3/32
            0     1/2  0    1/2  0
            3/32  0    3/4  0    5/32
            0     3/8  0    5/8  0 ],

    P^4 = [ 5/32  0      3/4  0      3/32
            0     17/32  0    15/32  0
            1/8   0      3/4  0      1/8
            0     15/32  0    17/32  0
            3/32  0      3/4  0      5/32 ].
As for each i, j there exists n for which P^n(i, j) > 0, it follows that P is irreducible.
Recall that the period of P is the highest common factor of {n > 0 | P^n(i, i) > 0}. As
all the diagonal entries of P^2 are positive, it follows that P has period 2.
Decompose {1, 2, 3, 4, 5} = {1, 3, 5} ∪ {2, 4} = S_0 ∪ S_1. If P(i, j) > 0 then either i ∈ S_0
and j ∈ S_1, or i ∈ S_1 and j ∈ S_0, i.e. i ∈ S_ℓ and j ∈ S_{ℓ+1 mod 2}. When restricted to the
indices {1, 3, 5}, P^2 has the form

    [ 1/4  3/4  0
      1/8  3/4  1/8
      0    3/4  1/4 ]
which is easily seen to be irreducible and aperiodic. When restricted to the indices {2, 4},
P^2 has the form

    [ 5/8  3/8
      3/8  5/8 ]
which is clearly irreducible and aperiodic.
The eigenvalues of P are found by evaluating the determinant

    det [ −λ   1    0    0    0
          1/4  −λ   3/4  0    0
          0    1/2  −λ   1/2  0
          0    0    3/4  −λ   1/4
          0    0    0    1    −λ ] = 0.

After simplifying this expression, we obtain

    λ(1 − λ)(1 + λ)(λ^2 − 1/4) = 0,

so the eigenvalues of P are 0, ±1, ±1/2.
(Note that, as P has period 2, we expect from the Perron-Frobenius Theorem the
square roots of 1 to be the eigenvalues of modulus 1 for P.)
A left eigenvector p = (p(1), p(2), p(3), p(4), p(5)) for the eigenvalue 1 is determined by

    (p(1), p(2), p(3), p(4), p(5)) [ 0    1    0    0    0
                                     1/4  0    3/4  0    0
                                     0    1/2  0    1/2  0
                                     0    0    3/4  0    1/4
                                     0    0    0    1    0 ] = (p(1), p(2), p(3), p(4), p(5))
which simplifies to
    (1/4)p(2) = p(1),  p(1) + (1/2)p(3) = p(2),  (3/4)p(2) + (3/4)p(4) = p(3),
    (1/2)p(3) + p(5) = p(4),  (1/4)p(4) = p(5).
Setting p(1) = 1 we obtain (p(1), p(2), p(3), p(4), p(5)) = (1, 4, 6, 4, 1), and normalising this
to form a probability vector we obtain

    p = ( 1/16, 1/4, 3/8, 1/4, 1/16 ).
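A short check (ours, not part of the notes) that p is indeed a left eigenvector of P for the eigenvalue 1:

```python
# left-eigenvector check: p P = p for the stationary probability vector
P = [
    [0,    1,   0,    0,   0],
    [1/4,  0,   3/4,  0,   0],
    [0,    1/2, 0,    1/2, 0],
    [0,    0,   3/4,  0,   1/4],
    [0,    0,   0,    1,   0],
]
p = [1/16, 1/4, 3/8, 1/4, 1/16]

pP = [sum(p[i] * P[i][j] for i in range(5)) for j in range(5)]
assert all(abs(pP[j] - p[j]) < 1e-12 for j in range(5))   # p P = p
assert abs(sum(p) - 1) < 1e-12                            # p is a probability vector
```

Note the binomial pattern (1, 4, 6, 4, 1)/16, as found by the hand computation.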
Solution 3.12
Let p = (p(1), . . . , p(k)) be a probability vector. Let P be the matrix

    P = [ p(1)  p(2)  · · ·  p(k)
          p(1)  p(2)  · · ·  p(k)
          ...
          p(1)  p(2)  · · ·  p(k) ].
Then P is a stochastic matrix. As each p(j) > 0, it follows that P is aperiodic. It is
straightforward to check that pP = p.
As P (i, j) = p(j), the Markov measure determined by the matrix P is the same as
Bernoulli measure determined by the probability vector p.
Solution 4.1
Note that
    χ_{T^{−1}B}(x) = 1 ⇔ x ∈ T^{−1}B ⇔ T(x) ∈ B ⇔ χ_B(T(x)) = 1.

Hence χ_{T^{−1}B} = χ_B ∘ T.
Solution 4.2
Note that T^n(x) = x if and only if 2^n x = x mod 1, i.e. 2^n x = x + p for some integer p.
Hence x = p/(2^n − 1). We get distinct values of x in R/Z when p = 0, 1, . . . , 2^n − 2 (note
that when p = 2^n − 1 then x = 1, which is the same as 0 in R/Z).
Hence there are infinitely many distinct periodic orbits for the doubling map. If
x, T x, . . . , T^{n−1} x is a periodic orbit of period n then let δ(x) = (1/n) Σ_{j=0}^{n−1} δ_{T^j x} denote the
periodic orbit measure supported on the orbit of x. As there are infinitely many distinct
periodic orbits, there are infinitely many distinct measures supported on periodic orbits.
Solution 4.3
Recall that R/Z can be regarded as [0, 1] where 0 and 1 are identified. Suppose that
f : [0, 1] → R is integrable and that f (0) = f (1) so that f is a well-defined function on
R/Z. Then

    ∫ f ∘ T dµ = ∫_0^{1/2} f ∘ T dµ + ∫_{1/2}^1 f ∘ T dµ
               = ∫_0^{1/2} f(2x) dx + ∫_{1/2}^1 f(2x − 1) dx    (12.0.4)
               = (1/2) ∫_0^1 f(x) dx + (1/2) ∫_0^1 f(x) dx
               = ∫ f dµ,

where we have used the substitution u(x) = 2x for the first integral and u(x) = 2x − 1 for
the second integral in (12.0.4).
Solution 4.4
It is straightforward to check that T : Rk /Zk → Rk /Zk is a diffeomorphism.
Recall that we can identify functions f : Rk /Zk → C with functions f : Rk → C that
satisfy f (x + n) = f (x) for all n ∈ Zk . We apply the change of variables formula with the
substitution u(x) = T(x). Note that

    DT(x) = [ 1  0
              1  1 ]

so that |det DT| = 1. Hence, by the change of variables formula,

    ∫ f ∘ T dµ = ∫ f ∘ T |det DT| dµ = ∫_{T(R^k/Z^k)} f dµ = ∫ f dµ.
Solution 5.1
Let α = p/q with p, q ∈ Z, q ≠ 0, hcf(p, q) = 1. Let

    B = ∪_{j=0}^{q−1} [ j/q, j/q + 1/(2q) ].

Then

    T^{−1} [ j/q, j/q + 1/(2q) ] = [ (j − p)/q, (j − p)/q + 1/(2q) ]

so that T^{−1}B = B (draw a picture to understand this better). However µ(B) = 1/2, so
that T is not ergodic with respect to Lebesgue measure.
Solution 5.2
Suppose that f ∈ L^2(X, B, µ) has Fourier series

    Σ_{(n,m)∈Z^2} c_{(n,m)} e^{2πi(nx+my)}.

Then f ∘ T has Fourier series

    Σ_{(n,m)∈Z^2} c_{(n,m)} e^{2πi(n(x+α)+m(x+y))} = Σ_{(n,m)∈Z^2} c_{(n,m)} e^{2πinα} e^{2πi((n+m)x+my)}.

Comparing coefficients we see that

    c_{(n+m,m)} = e^{2πinα} c_{(n,m)}.

Suppose that m ≠ 0. Then for each j > 0,

    |c_{(n+jm,m)}| = · · · = |c_{(n+m,m)}| = |c_{(n,m)}|,

as |e^{2πinα}| = 1. Note that if m ≠ 0 then (n + jm, m) → ∞ as j → ∞. By the Riemann-Lebesgue Lemma (Proposition 5.3.2(ii)), we must have that c_{(n,m)} = 0 if m ≠ 0. Hence f
has Fourier series

    Σ_n c_{(n,0)} e^{2πinx}

and f ∘ T has Fourier series

    Σ_n c_{(n,0)} e^{2πinα} e^{2πinx}.

Comparing Fourier coefficients we see that

    c_{(n,0)} = c_{(n,0)} e^{2πinα}.

Suppose that n ≠ 0. As α ∉ Q, e^{2πinα} ≠ 1. Hence c_{(n,0)} = 0 unless n = 0. Hence f has
Fourier series c_{(0,0)}, i.e. f is constant a.e. Hence T is ergodic with respect to Lebesgue
measure.
Solution 5.3
Suppose that T : X → X has a periodic point x with period n. Let

    µ = (1/n) Σ_{j=0}^{n−1} δ_{T^j x}.
Let B ∈ B and suppose that T −1 B = B. We must show that µ(B) = 0 or 1.
Suppose that x ∈ B. Then x ∈ T^{−1}B. Hence T(x) ∈ B. Continuing inductively, we see
that T^j(x) ∈ B for j = 0, 1, . . . , n − 1. Hence

    µ(B) = (1/n) Σ_{j=0}^{n−1} δ_{T^j x}(B) = (1/n) Σ_{j=0}^{n−1} 1 = 1.
Similarly, if x ∈ X \ B then T j (x) ∈ X \ B for j = 0, 1, . . . , n − 1 (we have used the fact
that if B is T -invariant then X \ B is T -invariant). Hence µ(B) = 0.
Solution 5.4
(i) Recall that the determinant of a matrix is equal to the product of all the eigenvalues.
Let T be a linear toral automorphism with corresponding matrix A. Suppose that A
has an eigenvalue of modulus 1. By considering A2 if necessary, there is no loss in
generality in assuming that det A = 1.
Suppose k = 2. Then the matrix A has two eigenvalues, λ, λ̄. As A has an eigenvalue
of modulus 1, we must have that λ = e^{iθ} for some θ ∈ [0, 2π). Then λ, λ̄ satisfy the
equation λ^2 − (2 cos θ)λ + 1 = 0. However the matrix A = (a, b; c, d) has characteristic
equation λ^2 − (a + d)λ + 1 = 0. Hence 2 cos θ = a + d, an integer. Thus cos θ ∈
{0, ±1/2, ±1}, so λ is a root of unity, and so T cannot be ergodic.
Now suppose k = 3. Then, assuming that A has an eigenvalue of modulus 1, the
eigenvalues must be λ = e^{2πiθ}, λ̄ and µ ∈ R. As det A = 1, we must have that
λλ̄µ = 1. As λλ̄ = 1, it follows that µ = 1. Hence A has 1 as an eigenvalue and so T
cannot be ergodic.
Thus k ≥ 4.
(ii) A has integer entries and it is easy to see that det A = 1. Hence A determines a linear
toral automorphism of R^4/Z^4.
(iii) It is straightforward to calculate that the characteristic equation for A is

    λ^4 − 8λ^3 + 6λ^2 − 8λ + 1 = 0.

Clearly, λ ≠ 0. Dividing by λ^2 and substituting u = λ + λ^{−1} we see that

    u^2 − 8u + 4 = 0.

Hence

    u = 4 ± 2√3.

Multiplying λ + λ^{−1} = u by λ we obtain a quadratic in λ with solution

    λ = ( u ± √(u^2 − 4) )/2.

Substituting the two different values of u gives four values of λ, namely:

    2 + √3 ± √(6 + 4√3),   2 − √3 ± i√(4√3 − 6).

The first two are real and not of unit modulus, whereas the second two are complex
numbers of unit modulus.
(iv) This question is not part of the course and is included for completeness only. The
solution requires ideas from Galois theory.
We first claim that λ^4 − 8λ^3 + 6λ^2 − 8λ + 1 is irreducible over Q. (To see this, recall
that irreducibility over Q is equivalent to irreducibility over Z.) Hence λ^4 − 8λ^3 + 6λ^2 − 8λ + 1 has no common factors
with λ^n − 1 for any n. Hence λ is not a root of unity.
Solution 6.1
(i) Let µ denote Lebesgue measure. We prove that T_*µ = µ by using the Hahn-Kolmogorov Extension Theorem. It is sufficient to prove that T_*µ([a, b]) = µ([a, b])
for all intervals [a, b]. Note that

    T^{−1}[a, b] = [ a/2, b/2 ] ∪ [ 1 − b/2, 1 − a/2 ].

Hence

    T_*µ([a, b]) = ( b/2 − a/2 ) + ( (1 − a/2) − (1 − b/2) ) = b − a = µ([a, b]).
Hence µ is a T -invariant measure.
(ii) Define I(0) = [0, 1/2], I(1) = [1/2, 1] and define the maps
    φ_0 : [0, 1] → I(0) : x ↦ x/2,   φ_1 : [0, 1] → I(1) : x ↦ 1 − x/2.

Then T φ_0(x) = x, T φ_1(x) = x.
Given i0 , . . . , in−1 ∈ {0, 1} define
φi0 ,i1 ,...,in−1 = φi0 φi1 · · · φin−1
and note that T n φi0 ,i1 ,...,in−1 (x) = x for all x ∈ [0, 1].
Define
I(i0 , i1 , . . . , in−1 ) = φi0 ,i1 ,...,in−1 ([0, 1])
and call this a cylinder of rank n. It is easy to see that cylinders of rank n are
dyadic intervals (although the labelling of these cylinders is not the same as the
labelling that one gets when using the doubling map: for example, for the tent map
I(1, 1) = [1/2, 3/4] whereas for the doubling map I(1, 1) = [3/4, 1]). Hence the algebra
A of finite unions of cylinders generates the Borel σ-algebra.
Let B ∈ B be such that T −1 B = B. Note that T −n B = B. Let I = I(i0 , i1 , . . . , in−1 )
be a cylinder of rank n and let φ = φi0 ,i1 ,...,in−1 . Then T n φ(x) = x. Note also that
µ(I) = 1/2n . We will also need the fact that |φ′ (x)| = 1/2n (this follows from noting
that |φ′0 (x)| = |φ′1 (x)| = 1/2 and using the chain rule).
Finally, we observe that

    µ(B ∩ I) = ∫ χ_{B∩I}(x) dx
             = ∫ χ_B(x) χ_I(x) dx
             = ∫_I χ_B(x) dx
             = ∫_0^1 χ_B(φ(x)) |φ′(x)| dx    by the change of variables formula
             = ∫_0^1 χ_{T^{−n}B}(φ(x)) |φ′(x)| dx    as T^{−n}B = B
             = ∫_0^1 χ_B(T^n(φ(x))) |φ′(x)| dx
             = ∫_0^1 χ_B(x) |φ′(x)| dx    as T^n φ(x) = x
             = (1/2^n) ∫_0^1 χ_B(x) dx    as |φ′(x)| = 1/2^n
             = µ(I)µ(B)    as µ(I) = 1/2^n.
Hence µ(B ∩ I) = µ(B)µ(I) for all sets I in the algebra of cylinders. By Lemma 6.1.1
it follows that µ(B) = 0 or 1. Hence Lebesgue measure is an ergodic measure for T .
Solution 6.2
For each n ≥ 1 define I(n) = [1/(n + 1), 1/n] and define the maps

    φ_n : [0, 1] → I(n) : x ↦ (x + n)/(n(n + 1)).
Note that T φn (x) = x for all x ∈ [0, 1].
Given i0 , i1 , . . . , in−1 ∈ N define
φi0 ,i1 ,...,in−1 = φi0 φi1 · · · φin−1
and note that T n φi0 ,i1 ,...,in−1 (x) = x for all x ∈ [0, 1].
Define I(i0 , i1 , . . . , in−1 ) = φi0 ,i1 ,...,in−1 ([0, 1]) and call this a cylinder of rank n. Note
that

    φ′_n(x) = 1/(n(n + 1)) ≤ 1/2

so that, by the chain rule,

    |φ′_{i_0,i_1,...,i_{n−1}}(x)| = Π_{j=0}^{n−1} 1/(i_j(i_j + 1)) ≤ 1/2^n.
By the Intermediate Value Theorem, I(i0 , i1 , . . . , in−1 ) is an interval of length no more than
1/2n . For each n, the cylinders of rank n partition [0, 1]. Let x, y ∈ [0, 1] and suppose that
x 6= y. Choose n such that |x − y| > 1/2n . Then x, y must lie in different cylinders of rank
n. Hence the cylinders separate the points of [0, 1]. By Proposition 2.4.2 it follows that the
algebra A of finite unions of cylinders generates the Borel σ-algebra.
Let B ∈ B be such that T −1 B = B. Note that T −n B = B. Let I = I(i0 , i1 , . . . , in−1 )
be a cylinder of rank n and let φ = φ_{i_0,i_1,...,i_{n−1}}. Then T^n φ(x) = x. Note that

    µ(I) = Π_{j=0}^{n−1} 1/(i_j(i_j + 1)) = |φ′(x)|    (12.0.5)

for any x ∈ [0, 1].
MATH4/61112
12. Solutions
Finally, we observe that

    µ(B ∩ I) = ∫ χ_{B∩I}(x) dx
             = ∫ χ_B(x) χ_I(x) dx
             = ∫_I χ_B(x) dx
             = ∫_0^1 χ_B(φ(x)) |φ′(x)| dx    by the change of variables formula
             = ∫_0^1 χ_{T^{−n}B}(φ(x)) |φ′(x)| dx    as T^{−n}B = B
             = ∫_0^1 χ_B(T^n(φ(x))) |φ′(x)| dx
             = ∫_0^1 χ_B(x) |φ′(x)| dx    as T^n φ(x) = x
             = µ(I)µ(B)    by (12.0.5).
Hence µ(B ∩ I) = µ(B)µ(I) for all sets I in the algebra of cylinders. By Lemma 6.1.1 it
follows that µ(B) = 0 or 1. Hence Lebesgue measure is an ergodic measure for T .
Solution 6.3
(i) First note that

    P_1/Q_1 = 1/x_0,    P_2/Q_2 = 1/(x_0 + 1/x_1) = x_1/(x_0 x_1 + 1).

If we define P_0 = 0, Q_0 = 1 then we have that P_2 = x_1 P_1 + P_0 and Q_2 = x_1 Q_1 + Q_0.
Similarly,

    P_1(x_0; t)/Q_1(x_0; t) = 1/(x_0 + t),
    P_2(x_0, x_1; t)/Q_2(x_0, x_1; t) = 1/(x_0 + 1/(x_1 + t)) = (x_1 + t)/(x_0 x_1 + 1 + t x_0),

so that

    P_2(x_0, x_1; t) = P_2 + tP_1,    Q_2(x_0, x_1; t) = Q_2 + tQ_1.
Suppose that P_n(x_0, . . . , x_{n−1}; t) = P_n + tP_{n−1} and Q_n(x_0, . . . , x_{n−1}; t) = Q_n + tQ_{n−1}. Then

    P_{n+1}(x_0, x_1, . . . , x_n; t)/Q_{n+1}(x_0, x_1, . . . , x_n; t)
        = [x_0, . . . , x_{n−1}, x_n + t]
        = [x_0, . . . , x_{n−1} + 1/(x_n + t)]
        = P_n(x_0, x_1, . . . , x_{n−1}; 1/(x_n + t)) / Q_n(x_0, x_1, . . . , x_{n−1}; 1/(x_n + t))
        = ( P_n + P_{n−1}/(x_n + t) ) / ( Q_n + Q_{n−1}/(x_n + t) )
        = ( x_n P_n + P_{n−1} + tP_n ) / ( x_n Q_n + Q_{n−1} + tQ_n ).
Hence

    P_{n+1}(x_0, x_1, . . . , x_n; t) = x_n P_n + P_{n−1} + tP_n,    Q_{n+1}(x_0, x_1, . . . , x_n; t) = x_n Q_n + Q_{n−1} + tQ_n.

Putting t = 0 we obtain the recurrence relations

    P_{n+1} = x_n P_n + P_{n−1},    Q_{n+1} = x_n Q_n + Q_{n−1}.

Hence

    P_{n+1}(x_0, x_1, . . . , x_n; t) = P_{n+1} + tP_n,    Q_{n+1}(x_0, x_1, . . . , x_n; t) = Q_{n+1} + tQ_n.

By induction, the recurrence relations hold.
(ii) Note that

    Q_n P_{n−1} − Q_{n−1} P_n = (x_{n−1} Q_{n−1} + Q_{n−2}) P_{n−1} − Q_{n−1} (x_{n−1} P_{n−1} + P_{n−2})
                              = −(Q_{n−1} P_{n−2} − Q_{n−2} P_{n−1}) = · · · = (−1)^n.
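Both parts can be sanity-checked by machine. A sketch of ours, building P_n and Q_n from the recurrences of part (i) and verifying the determinant identity of part (ii) in exact integer arithmetic:

```python
def convergent_numerators_denominators(digits):
    """P_n, Q_n from the recurrences P_{n+1} = x_n P_n + P_{n-1},
    Q_{n+1} = x_n Q_n + Q_{n-1}, with P_0 = 0, Q_0 = 1, P_1 = 1, Q_1 = x_0."""
    P = [0, 1]
    Q = [1, digits[0]]
    for x in digits[1:]:
        P.append(x * P[-1] + P[-2])
        Q.append(x * Q[-1] + Q[-2])
    return P, Q

digits = [1, 2, 3, 4, 5, 6, 7]     # arbitrary sample continued fraction digits
P, Q = convergent_numerators_denominators(digits)
# the identity Q_n P_{n-1} - Q_{n-1} P_n = (-1)^n from part (ii)
for n in range(1, len(P)):
    assert Q[n] * P[n - 1] - Q[n - 1] * P[n] == (-1) ** n
```

In particular the identity shows that P_n and Q_n are coprime, so the convergents P_n/Q_n are in lowest terms.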
Solution 6.4
(i) Let x = (x_0, x_1, . . .), y = (y_0, y_1, . . .) ∈ Σ. Let d_{R/Z} and d_Σ denote the usual metrics
on R/Z and Σ, respectively. Now

    d_{R/Z}(π(x), π(y)) ≤ |π(x_0, x_1, . . .) − π(y_0, y_1, . . .)|    (12.0.6)
                        = | (x_0 − y_0)/2 + (x_1 − y_1)/2^2 + · · · |
                        ≤ |x_0 − y_0|/2 + |x_1 − y_1|/2^2 + · · · .    (12.0.7)

Now if d_Σ(x, y) < 1/2^n then x_j = y_j for j = 0, . . . , n. Hence we can bound the
right-hand side of (12.0.7) by

    |x_{n+1} − y_{n+1}|/2^{n+2} + |x_{n+2} − y_{n+2}|/2^{n+3} + · · · ≤ 1/2^{n+2} + 1/2^{n+3} + · · ·
        = (1/2^{n+2}) ( 1 + 1/2 + 1/2^2 + · · · ) = 1/2^{n+1},

summing the geometric progression. This implies that π is continuous. To see this,
let ε > 0. Choose n such that 1/2^{n+1} < ε. Choose δ = 1/2^n. If d_Σ(x, y) < δ then
d_{R/Z}(π(x), π(y)) < ε.
(ii) Observe that if x = (x_j)_{j=0}^{∞} ∈ Σ then

    π(σ(x)) = π(σ(x_0, x_1, . . .)) = π(x_1, x_2, . . .) = x_1/2 + x_2/2^2 + · · ·
and

    T(π(x)) = T(π(x_0, x_1, . . .))
            = T( x_0/2 + x_1/2^2 + · · · )
            = x_0 + x_1/2 + x_2/2^2 + · · · mod 1
            = x_1/2 + x_2/2^2 + · · · .
(iii) We must show that T∗ (π∗ µ) = π∗ µ. To see this, observe that
T∗ (π∗ µ)(B) = π∗ µ(T −1 B)
= µ(π −1 T −1 B)
= µ(σ −1 π −1 B) as πσ = T π
= (σ∗ µ)(π −1 B)
= µ(π −1 B) as µ is σ-invariant
= (π∗ µ)(B).
(iv) Suppose that µ is an ergodic measure for σ. We claim that π∗ µ is an ergodic measure
for T , i.e. if B ∈ B(R/Z) is such that T −1 B = B then π∗ µ(B) = 0 or 1.
First observe that π −1 (B) is σ-invariant. This follows as:
σ −1 (π −1 (B)) = π −1 T −1 (B) = π −1 (B).
As µ is an ergodic measure for σ, we must have that µ(π −1 (B)) = 0 or 1. Hence
π∗ µ(B) = 0 or 1.
(v) There are uncountably many different Bernoulli measures µp for Σ given by the family
of probability vectors (p, 1 − p). These are ergodic for σ. To see that π∗ µp are all
different, notice that π∗ µp ([0, 1/2)) = µp (π −1 [0, 1/2)) = µp ([0]) = p, where [0] denotes
the cylinder consisting of all sequences that start with 0.
Solution 7.1
Suppose that xn → x. We must show that δxn ⇀ δx . Let f ∈ C(X, R). Then
    ∫ f dδ_{x_n} = f(x_n) → f(x) = ∫ f dδ_x

as f is continuous. Hence δ_{x_n} ⇀ δ_x.
Solution 7.2
Suppose that µ_n ⇀ µ. We must show that T_*µ_n ⇀ T_*µ. Let f ∈ C(X, R). Then

    ∫ f d(T_*µ_n) = ∫ f ∘ T dµ_n → ∫ f ∘ T dµ = ∫ f d(T_*µ)

as f ∘ T is continuous. Hence T_*µ_n ⇀ T_*µ.
Solution 7.3
(i) Suppose that µ_n → µ. We claim that µ_n ⇀ µ. To show this, we have to prove that
if f ∈ C(X, R) then ∫ f dµ_n → ∫ f dµ.
Let f ∈ C(X, R); we may assume f ≠ 0. Note that f/‖f‖_∞ ∈ C(X, R) and that
‖f/‖f‖_∞‖_∞ = 1. Hence

    | ∫ f dµ_n − ∫ f dµ | = ‖f‖_∞ | ∫ f/‖f‖_∞ dµ_n − ∫ f/‖f‖_∞ dµ |
                          ≤ ‖f‖_∞ sup_{g∈C(X,R), ‖g‖_∞≤1} | ∫ g dµ_n − ∫ g dµ |
                          = ‖f‖_∞ ‖µ_n − µ‖,

which tends to 0 as n → ∞.
(ii) Suppose that x_n → x but that x_n ≠ x for all n. We claim that δ_{x_n} ↛ δ_x. Note that

‖δ_{x_n} − δ_x‖ = sup_{f∈C(X,R), ‖f‖_∞≤1} |f(x_n) − f(x)|.

For each n, we can choose a continuous function f_n ∈ C(X, R) such that f_n(x) = 1,
f_n(x_n) = 0 and ‖f_n‖_∞ ≤ 1. Hence

‖δ_{x_n} − δ_x‖ = sup_{f∈C(X,R), ‖f‖_∞≤1} |f(x_n) − f(x)| ≥ |f_n(x_n) − f_n(x)| = 1.

Hence δ_{x_n} ↛ δ_x.
(iii) First note that if f ∈ C(X, R) is any continuous function with ‖f‖_∞ ≤ 1, then

|∫ f dδ_x − ∫ f dδ_y| = |f(x) − f(y)| ≤ |f(x)| + |f(y)| ≤ 2.

Hence

‖δ_x − δ_y‖ = sup_{f∈C(X,R), ‖f‖_∞≤1} |∫ f dδ_x − ∫ f dδ_y| ≤ 2.

Conversely, by Urysohn's Lemma, there exist continuous functions g_1, g_2 such that
g_1(x) = g_2(y) = 1, g_1(y) = g_2(x) = 0 and 0 ≤ g_1, g_2 ≤ 1. Let h = g_1 − g_2. Then
h(x) = 1, h(y) = −1 and −1 ≤ h ≤ 1 (so that ‖h‖_∞ = 1). Hence

2 = |h(x) − h(y)| = |∫ h dδ_x − ∫ h dδ_y| ≤ sup_{f∈C(X,R), ‖f‖_∞≤1} |∫ f dδ_x − ∫ f dδ_y| = ‖δ_x − δ_y‖.

Hence ‖δ_x − δ_y‖ = 2.
Solution 7.4
Let x_n ∈ X be a sequence such that x_n → x and x_n ≠ x for all n. Let µ_n = δ_{x_n} and µ = δ_x.
Then µ_n ⇀ µ. Take B = {x}. Then µ_n(B) = 0 but µ(B) = 1. Hence µ_n(B) ↛ µ(B).
Solution 7.5
Let µ_1, µ_2 ∈ M(X, T) and suppose that α ∈ [0, 1]. Then αµ_1 + (1 − α)µ_2 ∈ M(X). To
check that αµ_1 + (1 − α)µ_2 ∈ M(X, T), note that

(T_*(αµ_1 + (1 − α)µ_2))(B) = (αµ_1 + (1 − α)µ_2)(T^{−1}B)
                            = αµ_1(T^{−1}B) + (1 − α)µ_2(T^{−1}B)
                            = αµ_1(B) + (1 − α)µ_2(B)
                            = (αµ_1 + (1 − α)µ_2)(B).
Solution 7.6
Let S ⊂ C(X, R) be uniformly dense. Let f ∈ C(X, R). Let ε > 0. Choose g ∈ S such that
‖f − g‖_∞ < ε. Choose N such that if n ≥ N then |∫ g dµ_n − ∫ g dµ| < ε. Then

|∫ f dµ_n − ∫ f dµ|
  ≤ |∫ f dµ_n − ∫ g dµ_n| + |∫ g dµ_n − ∫ g dµ| + |∫ f dµ − ∫ g dµ|
  ≤ ∫ |f − g| dµ_n + |∫ g dµ_n − ∫ g dµ| + ∫ |f − g| dµ
  ≤ 3ε.

As ε > 0 is arbitrary, the result follows.
Solution 7.7
(i) Recall that x ∈ Σ is a periodic point with period n if σ^n(x) = x. If x is periodic
with period n then x_{j+n} = x_j for all j = 0, 1, 2, . . .. Hence x is determined by the
first n symbols, which then repeat. As there are two choices for each x_j, there are 2^n
periodic points with period n.
(ii) First note that µ_n is a Borel probability measure.
Let [i_0, i_1, . . . , i_{m−1}] be a cylinder. Let n ≥ m. Then the periodic points x of period
n in [i_0, i_1, . . . , i_{m−1}] have the form

x = (i_0, i_1, . . . , i_{m−1}, x_m, . . . , x_{n−1}, i_0, i_1, . . . , i_{m−1}, x_m, . . . , x_{n−1}, i_0, . . .)

where the finite string of symbols i_0, i_1, . . . , i_{m−1}, x_m, . . . , x_{n−1} repeats. The symbols
x_m, . . . , x_{n−1} can be chosen arbitrarily. Hence there are 2^{n−m} such periodic points.
Hence, if n ≥ m,

∫ χ_{[i_0,i_1,...,i_{m−1}]} dµ_n = (1/2^n) × 2^{n−m} = 1/2^m = ∫ χ_{[i_0,i_1,...,i_{m−1}]} dµ.
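The counting in (ii) can be verified by brute force for small n. A sketch, assuming Python; the helper names are illustrative:

```python
from itertools import product

def period_n_blocks(n):
    """The 2^n periodic points of period n for the full shift on two symbols,
    each represented by its repeating block of length n."""
    return list(product((0, 1), repeat=n))

def mu_n_of_cylinder(n, word):
    """mu_n([word]): the proportion of period-n points lying in the cylinder
    [word], i.e. whose repeating block starts with word (needs n >= len(word))."""
    count = sum(1 for block in period_n_blocks(n) if block[:len(word)] == word)
    return count / 2 ** n

# For every n >= m the answer is exactly 1/2^m, independently of n:
checks = [mu_n_of_cylinder(n, (0, 1, 1)) for n in range(3, 9)]
```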
(iii) To prove that χ_{[i_0,i_1,...,i_{m−1}]} is continuous we need to show that if x_n → x then
χ_{[i_0,i_1,...,i_{m−1}]}(x_n) → χ_{[i_0,i_1,...,i_{m−1}]}(x).

First suppose that x ∈ [i_0, i_1, . . . , i_{m−1}]. As x_n → x, it follows from the definition of
the metric on Σ that there exists N ∈ N such that if n ≥ N then x_n and x agree in
the first m places. Hence if n ≥ N then x_n ∈ [i_0, i_1, . . . , i_{m−1}]. Hence, if n ≥ N, then
χ_{[i_0,i_1,...,i_{m−1}]}(x_n) = 1 = χ_{[i_0,i_1,...,i_{m−1}]}(x).

Now suppose that x ∉ [i_0, i_1, . . . , i_{m−1}]. As x_n → x, there again exists N ∈ N such
that if n ≥ N then x_n and x agree in the first m places. As x ∉ [i_0, i_1, . . . , i_{m−1}],
there exists j ∈ {0, 1, . . . , m − 1} such that x_j ≠ i_j; hence, if n ≥ N, then (x_n)_j ≠ i_j,
that is, x_n ∉ [i_0, i_1, . . . , i_{m−1}]. Hence, if n ≥ N, then
χ_{[i_0,i_1,...,i_{m−1}]}(x_n) = 0 = χ_{[i_0,i_1,...,i_{m−1}]}(x).

Hence χ_{[i_0,i_1,...,i_{m−1}]} is continuous.
(iv) Let S denote the set of finite linear combinations of characteristic functions of
cylinders. By the Stone-Weierstrass Theorem, S is uniformly dense in C(X, R). By (ii)
above, if g ∈ S then ∫ g dµ_n → ∫ g dµ as n → ∞. Let f ∈ C(X, R) and let ε > 0.
Choose g ∈ S such that ‖f − g‖_∞ < ε. Then a 3ε argument as in the solution to
Exercise 7.6 proves that lim sup_{n→∞} |∫ f dµ_n − ∫ f dµ| ≤ 3ε and the result follows.
Solution 7.8
As trigonometric polynomials are uniformly dense in C(X, R), it is sufficient to prove that
∫ g ∘ T dµ = ∫ g dµ for all trigonometric polynomials g. Let g(x) = Σ_{j=0}^r c_j e^{2πi⟨n^{(j)}, x⟩},
c_j ∈ R, n^{(j)} = (n_1^{(j)}, n_2^{(j)}, n_3^{(j)}) ∈ Z^3, be a trigonometric polynomial. We label the
coefficients so that n^{(j)} = 0 if and only if j = 0. Then ∫ g dµ = c_0.
Note that

g ∘ T((x, y, z) + Z^3) = g((α + x, x + y, y + z) + Z^3)
  = Σ_{j=0}^r c_j e^{2πi⟨(n_1^{(j)}, n_2^{(j)}, n_3^{(j)}), (α+x, x+y, y+z)⟩}
  = Σ_{j=0}^r c_j e^{2πi n_1^{(j)} α} e^{2πi(n_1^{(j)} x + n_2^{(j)}(x+y) + n_3^{(j)}(y+z))}
  = Σ_{j=0}^r c_j e^{2πi n_1^{(j)} α} e^{2πi⟨(n_1^{(j)}+n_2^{(j)}, n_2^{(j)}+n_3^{(j)}, n_3^{(j)}), (x,y,z)⟩}.

Hence

∫ g ∘ T dµ = ∫ Σ_{j=0}^r c_j e^{2πi n_1^{(j)} α} e^{2πi⟨(n_1^{(j)}+n_2^{(j)}, n_2^{(j)}+n_3^{(j)}, n_3^{(j)}), (x,y,z)⟩} dµ
           = Σ_{j=0}^r c_j e^{2πi n_1^{(j)} α} ∫ e^{2πi⟨(n_1^{(j)}+n_2^{(j)}, n_2^{(j)}+n_3^{(j)}, n_3^{(j)}), (x,y,z)⟩} dµ.

The integral is equal to zero unless (n_1^{(j)}+n_2^{(j)}, n_2^{(j)}+n_3^{(j)}, n_3^{(j)}) = (0, 0, 0), i.e. unless
n_1^{(j)} = n_2^{(j)} = n_3^{(j)} = 0. By our choice of labelling the coefficients, this only happens if
j = 0. Hence

∫ g ∘ T dµ = c_0 = ∫ g dµ.
Solution 8.1
(i) Let B ∈ B and let f = χ_B. Note that

∫ f dν = ∫ χ_B dν = ν(B) = ∫_B (dν/dµ) dµ = ∫ χ_B (dν/dµ) dµ = ∫ f (dν/dµ) dµ.
Hence the result holds for characteristic functions, hence for simple functions (finite
linear combinations of characteristic functions). Let f ∈ L1 (X, B, µ) be such that
f ≥ 0. By considering an increasing sequence of simple functions, the result follows
for positive L1 functions. By splitting an arbitrary real-valued L1 function into its
positive and negative parts, and then an arbitrary L1 (X, B, µ) function into its real
and imaginary parts, the result holds.
(ii) Now dν_1/dµ, dν_2/dµ are the unique functions such that

ν_1(B) = ∫_B (dν_1/dµ) dµ,   ν_2(B) = ∫_B (dν_2/dµ) dµ,

respectively. Hence

ν_1(B) + ν_2(B) = ∫_B (dν_1/dµ) dµ + ∫_B (dν_2/dµ) dµ = ∫_B (dν_1/dµ + dν_2/dµ) dµ.

However,

(ν_1 + ν_2)(B) = ∫_B (d(ν_1 + ν_2)/dµ) dµ.

Hence, by uniqueness in the Radon-Nikodym theorem, we have that

d(ν_1 + ν_2)/dµ = dν_1/dµ + dν_2/dµ.
(iii) Suppose that µ(B) = 0. As ν ≪ µ then ν(B) = 0. As λ ≪ ν then λ(B) = 0. Hence
λ ≪ µ.

Now as λ ≪ µ we have

λ(B) = ∫_B (dλ/dµ) dµ.

As λ ≪ ν we have

λ(B) = ∫_B (dλ/dν) dν = ∫ χ_B (dλ/dν) dν.

By part (i), using the fact that ν ≪ µ, it follows that

∫ χ_B (dλ/dν) dν = ∫ χ_B (dλ/dν)(dν/dµ) dµ = ∫_B (dλ/dν)(dν/dµ) dµ.

Hence by uniqueness in the Radon-Nikodym theorem,

dλ/dµ = (dλ/dν)(dν/dµ).
Solution 8.2
The claimed formula is easily seen to be valid for n = 3. Suppose the formula is valid for
n. Then

T^{n+1}((x, y, z) + Z^3)
  = T(T^n((x, y, z) + Z^3))
  = T(( \binom{n}{1}α + x, \binom{n}{2}α + \binom{n}{1}x + y, \binom{n}{3}α + \binom{n}{2}x + \binom{n}{1}y + z ) + Z^3)
  = ( α + \binom{n}{1}α + x,
      \binom{n}{2}α + \binom{n}{1}x + y + \binom{n}{1}α + x,
      \binom{n}{3}α + \binom{n}{2}x + \binom{n}{1}y + z + \binom{n}{2}α + \binom{n}{1}x + y ) + Z^3
  = ( \binom{n+1}{1}α + x, \binom{n+1}{2}α + \binom{n+1}{1}x + y, \binom{n+1}{3}α + \binom{n+1}{2}x + \binom{n+1}{1}y + z ) + Z^3.

Hence the claimed formula holds by induction.
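The closed formula (and the induction behind it) can also be checked by direct iteration in exact arithmetic. A sketch, assuming Python; the identity holds for any α, so a rational test value suffices:

```python
from fractions import Fraction
from math import comb

def T(v, alpha):
    """One step of the skew product T(x, y, z) = (x + alpha, x + y, y + z) mod 1."""
    x, y, z = v
    return ((x + alpha) % 1, (x + y) % 1, (y + z) % 1)

def T_closed_form(v, alpha, n):
    """The claimed closed form for T^n, with binomial coefficients C(n, k)."""
    x, y, z = v
    return ((comb(n, 1) * alpha + x) % 1,
            (comb(n, 2) * alpha + comb(n, 1) * x + y) % 1,
            (comb(n, 3) * alpha + comb(n, 2) * x + comb(n, 1) * y + z) % 1)

alpha = Fraction(13, 97)
v = (Fraction(1, 3), Fraction(2, 7), Fraction(4, 11))
w = v
for n in range(1, 25):
    w = T(w, alpha)                     # n-fold iteration, one step at a time
    assert w == T_closed_form(v, alpha, n)
```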
Let f(x, y, z) = e^{2πi(kx+ℓy+mz)}. Then

(1/n) Σ_{j=0}^{n−1} f(T^j((x, y, z) + Z^3)) = (1/n) Σ_{j=0}^{n−1} e^{2πi p_{x,y,z}^{(k,ℓ,m)}(j)}

where p_{x,y,z}^{(k,ℓ,m)}(n) is a polynomial. When m ≠ 0, p_{x,y,z}^{(k,ℓ,m)}(n) is a degree 3 polynomial with
leading coefficient mα/6 ∉ Q. When m = 0, ℓ ≠ 0, p_{x,y,z}^{(k,ℓ,m)}(n) is a degree 2 polynomial with
leading coefficient ℓα/2 ∉ Q. When m = ℓ = 0, k ≠ 0, p_{x,y,z}^{(k,ℓ,m)}(n) is a degree 1 polynomial
with leading coefficient kα ∉ Q. In all three cases, p_{x,y,z}^{(k,ℓ,m)}(n) is uniformly distributed
mod 1, by Weyl's Theorem on Polynomials (Theorem 2.3.1). Hence by Weyl's Criterion
(Theorem 1.2.1), for all (k, ℓ, m) ∈ Z^3 \ {(0, 0, 0)} we have

(1/n) Σ_{j=0}^{n−1} e^{2πi p_{x,y,z}^{(k,ℓ,m)}(j)} → 0

as n → ∞. When k = ℓ = m = 0 we trivially have that

(1/n) Σ_{j=0}^{n−1} e^{2πi p_{x,y,z}^{(k,ℓ,m)}(j)} = (1/n) Σ_{j=0}^{n−1} 1 → 1

as n → ∞. Hence

(1/n) Σ_{j=0}^{n−1} f(T^j((x, y, z) + Z^3)) → ∫ f dµ

whenever f(x, y, z) = e^{2πi(kx+ℓy+mz)}.

By taking finite linear combinations of exponential functions we see that

sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ| → 0

as n → ∞ for all trigonometric polynomials g. By the Stone-Weierstrass Theorem (Theorem 1.2.2), trigonometric polynomials are uniformly dense in C(X, R). Let f ∈ C(X, R)
and let ε > 0. Then there exists a trigonometric polynomial g such that ‖f − g‖_∞ < ε.
Hence for any x ∈ X we have

|(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ|
  ≤ (1/n) Σ_{j=0}^{n−1} |f(T^j(x)) − g(T^j(x))| + |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ| + ∫ |g − f| dµ
  ≤ 2ε + |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ|.

Hence, taking the supremum over all x ∈ X, we have

sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ| ≤ 2ε + sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ|.

Letting n → ∞ we see that

lim sup_{n→∞} sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ| ≤ 2ε.

As ε > 0 is arbitrary, it follows that

lim_{n→∞} sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ| = 0.

Hence statement (ii) in Oxtoby's Ergodic Theorem holds. As (i) and (ii) in Oxtoby's Ergodic
Theorem are equivalent, it follows that T is uniquely ergodic and Lebesgue measure is the
unique invariant measure.
Solution 8.3
Let T be a uniquely ergodic homeomorphism with unique invariant measure µ.
Suppose that every orbit is dense. Let U be a non-empty open set. Then for all x ∈ X,
there exists n ∈ Z such that T^n(x) ∈ U. Hence X = ⋃_{n=−∞}^∞ T^{−n}U. Hence

1 = µ(X) = µ(⋃_{n=−∞}^∞ T^{−n}U) ≤ Σ_{n=−∞}^∞ µ(T^{−n}U) = Σ_{n=−∞}^∞ µ(U)

as µ is T-invariant. Hence µ(U) > 0.
Conversely, suppose that µ(U) > 0 for all non-empty open sets. Suppose for a contradiction
that there exists x_0 ∈ X such that the orbit of x_0 is not dense.
Clearly {T^n(x_0) | n ∈ Z} is T-invariant. As T is continuous, the set

Y = cl{T^n(x_0) | n ∈ Z}
is also T -invariant. As the orbit of x0 is not dense, Y is a proper subset of X. As Y is closed
and X is compact, it follows that Y is compact. By Theorem 7.5.1 there exists an invariant
probability measure ν for the map T : Y → Y . Extend ν to X by setting ν(B) = ν(B ∩ Y )
for Borel subsets B ⊂ X. Noting that X \ Y is also T -invariant, it follows that ν is an
invariant measure for T : X → X. This contradicts unique ergodicity as ν(X \ Y ) = 0 but
µ(X \ Y ) > 0.
Solution 9.1
Let X = R equipped with the Borel σ-algebra and Lebesgue measure. Define T (x) = x + 1.
Then Lebesgue measure is T -invariant. Take A = [0, 1). Then A has positive measure, but
no point of A returns to A under T .
Solution 9.2
Take X = {0, 1} to be a set consisting of two elements. Let B be the set of all subsets of
X and equip X with the measure µ = (1/2)δ_0 + (1/2)δ_1 that assigns measure 1/2 to both 0 and
1. Take T(x) = x to be the identity. Then T is a measure-preserving transformation. Let
A = {0}, B = {1}. Then µ(A) = µ(B) = 1/2 > 0. However, T^j(0) never lands in B.
Solution 9.3
Recall that E(f | A) is determined as being the unique A-measurable function such that

∫_A E(f | A) dµ = ∫_A f dµ

for all A ∈ A.
(i) We need to show that

E(αf + βg | A) = αE(f | A) + βE(g | A).

Note that αE(f | A) + βE(g | A) is A-measurable. Moreover, as

∫_A (αE(f | A) + βE(g | A)) dµ = α ∫_A E(f | A) dµ + β ∫_A E(g | A) dµ
                               = α ∫_A f dµ + β ∫_A g dµ
                               = ∫_A (αf + βg) dµ
                               = ∫_A E(αf + βg | A) dµ

for all A ∈ A, the claim follows.
(ii) First note that E(f | A) ∘ T is T^{−1}A-measurable. To see this, note that E(f | A) is
A-measurable, i.e.

{x ∈ X | E(f | A)(x) ≤ c} ∈ A for all c ∈ R.

Hence

{x ∈ X | E(f | A)(Tx) ≤ c} = T^{−1}{x ∈ X | E(f | A)(x) ≤ c} ∈ T^{−1}A

so that E(f | A) ∘ T is T^{−1}A-measurable.
Note that for any A ∈ A

∫_{T^{−1}A} E(f | A) ∘ T dµ = ∫ χ_{T^{−1}A} · E(f | A) ∘ T dµ
                           = ∫ χ_A ∘ T · E(f | A) ∘ T dµ
                           = ∫ χ_A E(f | A) dµ   as µ is T-invariant
                           = ∫_A E(f | A) dµ
                           = ∫_A f dµ.

Moreover

∫_{T^{−1}A} E(f ∘ T | T^{−1}A) dµ = ∫_{T^{−1}A} f ∘ T dµ
                                 = ∫ χ_{T^{−1}A} f ∘ T dµ
                                 = ∫ χ_A ∘ T · f ∘ T dµ
                                 = ∫ χ_A f dµ
                                 = ∫_A f dµ.

Hence

∫_{T^{−1}A} E(f ∘ T | T^{−1}A) dµ = ∫_{T^{−1}A} E(f | A) ∘ T dµ

for all A ∈ A. By the characterisation of conditional expectation, it follows that

E(f ∘ T | T^{−1}A) = E(f | A) ∘ T.
(iii) That E(f | B) = f is immediate from the above characterisation of conditional
expectation.
(iv) Recall that a function f : X → R is A-measurable if f^{−1}(−∞, c) ∈ A for all c ∈ R.
Suppose that f is N-measurable. Let B_c = f^{−1}(−∞, c) ∈ N. Hence µ(B_c) = 0 or 1.
Note that c_1 < c_2 implies B_{c_1} ⊂ B_{c_2}. Hence there exists c_0 such that

c_0 = sup{c | µ(B_c) = 0} = inf{c | µ(B_c) = 1}.

We claim that f(x) = c_0 µ-a.e. If c < c_0 then µ({x ∈ X | f(x) < c}) = 0. Hence
f(x) ≥ c_0 µ-a.e. Let c > c_0. Then

µ({x ∈ X | f(x) ≥ c}) = µ(X \ {x ∈ X | f(x) < c}) = 1 − µ({x ∈ X | f(x) < c}) = 0.

Hence µ({x ∈ X | f(x) > c_0}) = 0. Hence f(x) = c_0 µ-a.e.
Suppose that f is constant almost everywhere, say f(x) = a µ-a.e. Then f^{−1}(−∞, c) =
∅ µ-a.e. if c < a and f^{−1}(−∞, c) = X µ-a.e. if c > a. Hence µ(f^{−1}(−∞, c)) = 0 or 1.
Hence f^{−1}(−∞, c) ∈ N for all c ∈ R. Hence f is N-measurable.
If N ∈ N has measure 0 then

∫_N f dµ = 0 = ∫_N (∫ f dµ) dµ

and if N ∈ N has measure 1 then

∫_N f dµ = ∫ f dµ = ∫_N (∫ f dµ) dµ.

Hence E(f | N) = ∫ f dµ.
Solution 9.4
(i) Let α = {A_1, . . . , A_n} be a finite partition of X into sets A_j ∈ B and let A be the set
of all finite unions of sets in α.

Trivially ∅ ∈ A.

Let B_j = ⋃_{i=1}^{ℓ_j} A_{i,j}, A_{i,j} ∈ α, be a countable collection of finite unions of sets in α.
Then ⋃_j B_j is a union of sets in α. As there are only finitely many sets in α, we have
that ⋃_j B_j is a finite union of sets in α. Hence ⋃_j B_j ∈ A.

It is clear that A is closed under taking complements.

Hence A is a σ-algebra.
(ii) Recall that g : X → R is A-measurable if g^{−1}(−∞, c) ∈ A for all c ∈ R.

Suppose that g is constant on each A_j ∈ α and write g(x) = Σ_j c_j χ_{A_j}(x). Then
g^{−1}(−∞, c) = ⋃ A_j where the union is taken over sets A_j for which c_j < c. Hence g
is A-measurable.

Conversely, suppose that g is A-measurable. For each c ∈ R, let A_c = g^{−1}(−∞, c).
Then A_c ∈ A. Moreover, A_c ↓ ∅ as c → −∞ (in the sense that ⋂_{c∈R} A_c = ∅) and
A_c ↑ X as c → ∞ (in the sense that ⋃_c A_c = X). Let A ∈ α. Then there exists c_0
such that A ⊄ A_c for c < c_0 and A ⊂ A_c for c > c_0. Hence g(x) = c_0 for all x ∈ A.
Hence g is constant on each element of α.
(iii) Define g by

g(x) = Σ_{j=1}^n χ_{A_j}(x) (∫_{A_j} f dµ) / µ(A_j).

Then g is constant on each set in α, hence g is A-measurable.

Let A_i ∈ α. Then

∫_{A_i} g dµ = Σ_{j=1}^n ∫ χ_{A_i} χ_{A_j} (∫_{A_j} f dµ) / µ(A_j) dµ
            = Σ_{j=1}^n ∫ χ_{A_i ∩ A_j} (∫_{A_j} f dµ) / µ(A_j) dµ
            = ∫_{A_i} f dµ.

Hence ∫_A g dµ = ∫_A f dµ for all A ∈ A. It follows that g = E(f | A).
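The formula in (iii) says that E(f | A) is obtained by averaging f over each cell of the partition. A discretised sketch (assuming Python; grid-based, so the integrals become finite sums, and the function name is illustrative):

```python
def cond_expectation(values, cells):
    """Conditional expectation with respect to a finite partition, on a grid.

    values[i] is f at grid point i and cells[i] is the index of the partition
    element containing grid point i; each point gets the average of f over
    its own cell, as in the formula for E(f | A)."""
    sums, counts = {}, {}
    for v, c in zip(values, cells):
        sums[c] = sums.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    return [sums[c] / counts[c] for c in cells]

N = 1000
grid = [i / N for i in range(N)]
f_vals = [x * x for x in grid]           # f(x) = x^2 on [0, 1)
cells = [int(4 * x) for x in grid]       # partition of [0, 1) into 4 equal cells
g = cond_expectation(f_vals, cells)
```

Here g is constant on each cell, and on each cell it has the same (discretised) integral as f, which are exactly the two defining properties checked in the solution.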
Solution 9.5
Clearly ∅ ∈ I.
Let I ∈ I, so that T^{−1}(I) = I. Then T^{−1}(X \ I) = X \ I, so that the complement of I
is in I.
Let I_n ∈ I. Then T^{−1}(⋃_n I_n) = ⋃_n T^{−1}I_n = ⋃_n I_n so that ⋃_n I_n ∈ I.
Hence I is a σ-algebra.
Solution 9.6
Recall that E(f | I) is determined by the requirements that E(f | I) is I-measurable and
that

∫_I E(f | I) dµ = ∫_I f dµ

for all I ∈ I. Let P_I : L^2(X, B, µ) → L^2(X, I, µ) denote the orthogonal projection onto the
subspace of I-measurable functions. To show that P_I f = E(f | I) it is thus sufficient to
check that for each I ∈ I we have

∫_I P_I f dµ = ∫_I f dµ

for all f ∈ L^2(X, B, µ).

Note that ∫_I P_I f dµ = ∫ χ_I P_I f dµ = ⟨χ_I, P_I f⟩ and, similarly, ∫_I f dµ = ⟨χ_I, f⟩, where
we use ⟨·, ·⟩ to denote the inner product on L^2(X, B, µ). Hence it is sufficient to prove that,
for all I ∈ I, ⟨χ_I, f − P_I f⟩ = 0.

It is proved in the proof of Theorem 9.6.1 that L^2(X, B, µ) = L^2(X, I, µ) ⊕ C where C
denotes the norm-closure of the subspace {w ∘ T − w | w ∈ L^2(X, B, µ)}. Hence it is sufficient
to prove that ⟨χ_I, g⟩ = 0 for all g ∈ C. To see this, first note that for w ∈ L^2(X, B, µ) we
have that

⟨χ_I, w ∘ T − w⟩ = ⟨χ_I, w ∘ T⟩ − ⟨χ_I, w⟩
                = ⟨χ_{T^{−1}I}, w ∘ T⟩ − ⟨χ_I, w⟩
                = ⟨χ_I ∘ T, w ∘ T⟩ − ⟨χ_I, w⟩ = 0,

using the facts that I = T^{−1}I a.e. and that T is measure-preserving. It follows that
⟨χ_I, g⟩ = 0 for all g ∈ C.
Solution 10.1
Let T be an ergodic measure-preserving transformation of the probability space (X, B, µ)
and let f ∈ L^1(X, B, µ). Let

S_n = Σ_{j=0}^{n−1} f(T^j x).

By Birkhoff's Ergodic Theorem, there exists a set N such that µ(N) = 0 and if x ∉ N then
S_n/n → ∫ f dµ as n → ∞. Let x ∉ N. Note that

((n+1)/n) · S_{n+1}/(n+1) = S_n/n + f(T^n x)/n.

Letting n → ∞ we have that (n+1)/n → 1, S_{n+1}/(n+1) → ∫ f dµ and S_n/n → ∫ f dµ as
n → ∞. Hence if x ∉ N then f(T^n x)/n → 0 as n → ∞. Hence f(T^n x)/n → 0 as n → ∞
for µ-a.e. x ∈ X.
Solution 10.2
Let f ≥ 0 be measurable and suppose that ∫ f dµ = ∞. For each integer M > 0 define
f_M(x) = min{f(x), M}. Then 0 ≤ f_M ≤ M, hence f_M ∈ L^1(X, B, µ). Moreover
f_M(x) ↑ f(x) as M → ∞ for all x ∈ X. Hence by the Monotone Convergence Theorem
(Theorem 3.1.2), ∫ f_M dµ → ∫ f dµ = ∞.

By Birkhoff's Ergodic Theorem, there exists N_M ⊂ X with µ(N_M) = 0 such that for
all x ∉ N_M we have

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f_M(T^j x) = ∫ f_M dµ.        (12.0.8)

Let N = ⋃_{M=1}^∞ N_M. Then µ(N) = 0. Moreover, for any M > 0 we have that if x ∉ N
then (12.0.8) holds.

Let K ≥ 0 be arbitrary. As ∫ f_M dµ → ∞, it follows that there exists M > 0 such that
∫ f_M dµ ≥ K. Hence for all x ∉ N we have

lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) ≥ lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f_M(T^j x) = ∫ f_M dµ ≥ K.

As K is arbitrary, we have that for all x ∉ N

lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∞.

Hence (1/n) Σ_{j=0}^{n−1} f(T^j x) → ∞ for µ-a.e. x ∈ X.
Solution 10.3
We prove that (i) implies (ii). Suppose that T is an ergodic measure-preserving transformation of the probability space (X, B, µ). Recall from Proposition 10.2.2 that for all
A, B ∈ B,

(1/n) Σ_{j=0}^{n−1} µ(T^{−j}A ∩ B) → µ(A)µ(B)

as n → ∞; equivalently, for all A, B ∈ B we have

(1/n) Σ_{j=0}^{n−1} ∫ χ_A(T^j x) χ_B(x) dµ → ∫ χ_A dµ ∫ χ_B dµ.        (12.0.9)

Let f(x) = Σ_{k=1}^r c_k χ_{A_k}(x) be a simple function. Then taking linear combinations of
expressions of the form (12.0.9) we have that

(1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) χ_B(x) dµ → ∫ f dµ ∫ χ_B dµ.

If f ≥ 0 is a positive measurable function then we can choose a sequence of simple
functions f_n ↑ f that increase pointwise to f. By the Monotone Convergence Theorem
(Theorem 3.1.2) we have that

(1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) χ_B(x) dµ → ∫ f dµ ∫ χ_B dµ        (12.0.10)

for all positive measurable functions f. Suppose that f ∈ L^1(X, B, µ) is real-valued. Then
by writing f = f^+ − f^− where f^+, f^− are positive, we have that (12.0.10) holds when f is
integrable and real-valued. By taking real and imaginary parts of f, we have that (12.0.10)
holds for all f ∈ L^1(X, B, µ).

By taking finite linear combinations of characteristic functions in (12.0.10) we have that

(1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) g(x) dµ → ∫ f dµ ∫ g dµ        (12.0.11)

for all simple functions g. By taking an increasing sequence of simple functions and applying
the Monotone Convergence Theorem as above, we have that (12.0.11) holds for all positive
measurable functions g. By writing g = g^+ − g^− where g^+, g^− are positive, we have that
(12.0.11) holds for any real-valued integrable function g. By taking real and imaginary
parts, we have that (12.0.11) holds for any g ∈ L^1(X, B, µ).

We prove that (ii) implies (i). Suppose that for all f, g ∈ L^2(X, B, µ) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) g(x) dµ = ∫ f dµ ∫ g dµ.

Suppose that T^{−1}B = B, B ∈ B. Then χ_B ∈ L^2(X, B, µ). Taking f = g = χ_B we have
that

(1/n) Σ_{j=0}^{n−1} ∫ χ_B(T^j x) χ_B(x) dµ → ∫ χ_B dµ ∫ χ_B dµ = µ(B)^2.

Note that χ_B(T^j x)χ_B(x) = χ_{T^{−j}B ∩ B}(x) = χ_B(x) as T^{−j}B = B. Hence

(1/n) Σ_{j=0}^{n−1} ∫ χ_B(T^j x) χ_B(x) dµ = (1/n) Σ_{j=0}^{n−1} ∫ χ_B dµ = ∫ χ_B dµ = µ(B).

Hence µ(B) = µ(B)^2 so that µ(B) = 0 or 1.
Solution 10.4
Choose a countable dense set of continuous functions {f_i}_{i=1}^∞ ⊂ C(X, R). By Birkhoff's
Ergodic Theorem there exists Y_i ∈ B such that µ(Y_i) = 1 and

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f_i(T^j x) = ∫ f_i dµ

for all x ∈ Y_i. Let Y = ⋂_{i=1}^∞ Y_i. Then Y ∈ B and µ(Y) = 1.
Let f ∈ C(X, R), ε > 0, x ∈ Y. Choose i such that ‖f − f_i‖_∞ < ε. Choose N such that
if n ≥ N then

|(1/n) Σ_{j=0}^{n−1} f_i(T^j x) − ∫ f_i dµ| < ε.

Then

|(1/n) Σ_{j=0}^{n−1} f(T^j x) − ∫ f dµ|
  ≤ (1/n) Σ_{j=0}^{n−1} |f(T^j x) − f_i(T^j x)| + |(1/n) Σ_{j=0}^{n−1} f_i(T^j x) − ∫ f_i dµ| + |∫ f_i dµ − ∫ f dµ|
  < 3ε.

As ε > 0 is arbitrary, we have that for all f ∈ C(X, R) and for all x ∈ Y,

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∫ f dµ.
Solution 10.5
Let S = {A, B, C, . . . , Z} denote the finite set of letters (symbols) in the alphabet. Let
Σ = {x = (x_j)_{j=0}^∞ | x_j ∈ S, j = 0, 1, 2, . . .} denote the space of all infinite sequences of
symbols. For each s ∈ S, let p(s) = 1/26 denote the probability of choosing symbol s. Let
B denote the Borel σ-algebra on Σ and equip Σ with the Bernoulli probability measure µ
defined on cylinders by

µ([i_0, i_1, . . . , i_{n−1}]) = p(i_0)p(i_1) · · · p(i_{n−1}).

Define σ : Σ → Σ by (σ(x))_j = x_{j+1}.
We regard an element x ∈ Σ as one possible outcome of the monkey typing an infinite
sequence of letters.
Let B denote the cylinder [M, O, N, K, E, Y]. Then µ(B) = 1/26^6 > 0. By Birkhoff's
Ergodic Theorem, for µ-a.e. x ∈ Σ,

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_B(σ^j(x)) = µ(B) > 0.

Hence, almost surely, the infinite sequence of letters x will contain 'MONKEY'. Hence,
with probability 1, the monkey will type the word 'MONKEY'. (Indeed, with probability
one he will type 'MONKEY' infinitely often.)
By Kac's Lemma, the expected first time at which 'MONKEY' appears is 1/µ(B) = 26^6.
If the monkey types 1 letter a second, then one would expect to wait 26^6 seconds (about
9.8 years) until 'MONKEY' first appears in a block of 6.
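Kac's Lemma is easy to test by simulation on a smaller alphabet. The sketch below (assuming Python; `mean_return_time` is an illustrative name) uses the full two-shift, where the cylinder [1, 0, 1] has measure 1/2^3 = 1/8, so the average gap between successive occurrences of the word 101 in a random bit stream should be close to 8:

```python
import random

def mean_return_time(word, n_steps=200_000, seed=1):
    """Average gap between successive starting positions of `word` in a
    random bit stream; by Kac's Lemma this should approach 1/mu([word])."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_steps)]
    w = list(word)
    hits = [i for i in range(n_steps - len(w) + 1) if bits[i:i + len(w)] == w]
    gaps = [b - a for a, b in zip(hits, hits[1:])]
    return sum(gaps) / len(gaps)
```

The same reasoning with 26 symbols and a 6-letter word gives the 26^6 figure above, though that is far too long to wait for in a direct simulation.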
Solution 11.1
We first claim that for each integer b ≥ 2, T(x) = T_b(x) = bx mod 1 is ergodic with
respect to Lebesgue measure µ (we already know that Lebesgue measure is invariant by
Exercise 3.6). To see this, we use Fourier series, following the argument that was used to
prove that the doubling map is ergodic with respect to Lebesgue measure.

Suppose that f ∈ L^2(R/Z, B, µ) is such that f ∘ T = f µ-a.e. Then f ∘ T^p = f µ-a.e.
for all p ∈ N. Associate to f its Fourier series Σ_{n=−∞}^∞ c_n e^{2πinx}. Then f ∘ T^p has
Fourier series Σ_{n=−∞}^∞ c_n e^{2πinb^p x}. Comparing Fourier coefficients we see that c_{b^p n} = c_n.
Suppose that n ≠ 0. Then b^p n → ∞ as p → ∞. By the Riemann-Lebesgue Lemma
(Proposition 5.3.1(ii)), c_n = c_{b^p n} → 0 as p → ∞. Hence c_n = 0 if n ≠ 0. Hence f has
Fourier series c_0, i.e. f is constant a.e. Hence T is ergodic with respect to Lebesgue measure.
Solution 11.2
(i) Let

X_b = {x ∈ [0, 1) | x is simply normal in base b}.

Then for each b ≥ 2, X_b has Lebesgue measure µ(X_b) = 1. Hence

X_∞ = ⋂_{b=2}^∞ X_b

consists of all numbers that are simply normal in every base b ≥ 2. Clearly µ(X_∞) = 1.

(ii) Let X(b) denote the set of numbers that are normal in base b, b ≥ 2. Then X_∞ =
⋂_{b=2}^∞ X(b) consists of all normal numbers. Clearly µ(X_∞) = 1.

Alternatively, note that x ∈ [0, 1] is simply normal in base b^k if and only if every word
of length k occurs with frequency 1/b^k in the base b expansion of x. Hence a number
is normal in every base if and only if it is simply normal in every base.
Solution 11.3
Let T(x) = rx mod 1, T : R/Z → R/Z. From Exercise 11.1 we know that T is ergodic with
respect to Lebesgue measure µ. Let x ∈ [0, 1] and let x_n = r^n x. Then {x_n}, the fractional
part of x_n, is equal to T^n x. Let ℓ ∈ Z \ {0} and let f_ℓ(x) = e^{2πiℓx}. Then there exists
N_ℓ ∈ B, µ(N_ℓ) = 0, such that

(1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} = (1/n) Σ_{j=0}^{n−1} f_ℓ(T^j x) → ∫ f_ℓ(x) dx = 0

for all x ∉ N_ℓ.

Let N = ⋃_{ℓ∈Z\{0}} N_ℓ. As µ(N_ℓ) = 0 and this is a countable union, we have that
µ(N) = 0. Hence if x ∉ N we have for all ℓ ∈ Z \ {0}

(1/n) Σ_{j=0}^{n−1} e^{2πiℓx_j} → 0.

By Weyl's Criterion it follows that if x ∉ N then x_n is uniformly distributed mod 1.

(Aside: you might wonder why we had to use Weyl's Criterion and did not just use the
definition of uniform distribution. Whilst it is certainly true that, for any interval [a, b] ⊂ [0, 1],

(1/n) card{j ∈ {0, 1, . . . , n − 1} | {x_j} ∈ [a, b]} = (1/n) Σ_{j=0}^{n−1} χ_{[a,b]}(T^j x) → ∫ χ_{[a,b]} dµ = b − a

for µ-a.e. x ∈ X, the set of measure zero for which this fails depends on the interval [a, b].
We need a set of measure zero that works for all intervals. As there are uncountably many
intervals, we cannot just take the union of all the sets of measure zero as we did above.
One can make an argument along these lines work, by considering intervals with rational
endpoints (and so a countable collection of intervals) and then approximate an arbitrary
interval.)
Solution 11.4
Let T(x) = 10x mod 1. From Exercise 11.1 we know that T is ergodic with respect to
Lebesgue measure. Let x ∈ [0, 1] have decimal expansion

x = Σ_{j=0}^∞ x_j/10^{j+1}

with x_j ∈ {0, 1, . . . , 9}. Let

f(x) = Σ_{k=0}^9 k χ_{[k/10,(k+1)/10)}(x)

so that f(x) = k precisely when x_0 = k. Note that f(T^j x) = k precisely when x_j = k.
Then

(1/n)(x_0 + x_1 + · · · + x_{n−1}) = (1/n) Σ_{j=0}^{n−1} f(T^j x).

Hence by Birkhoff's Ergodic Theorem,

lim_{n→∞} (1/n)(x_0 + x_1 + · · · + x_{n−1}) = lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x)
                                           = ∫ f(x) dx a.e.
                                           = Σ_{k=0}^9 k/10 a.e.
                                           = 4.5 a.e.
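Of course, Birkhoff's theorem says nothing about any particular x, and it is unknown whether familiar constants such as √2 are normal; still, a numerical experiment is suggestive. A sketch, assuming Python; the digits are computed exactly via the integer square root:

```python
from math import isqrt

def sqrt2_digits(n):
    """First n decimal digits of sqrt(2) after the decimal point, computed
    exactly: isqrt(2 * 10**(2*n)) equals floor(sqrt(2) * 10**n)."""
    s = str(isqrt(2 * 10 ** (2 * n)))
    return [int(d) for d in s[1:]]       # drop the leading integer part '1'

digits = sqrt2_digits(5000)
average = sum(digits) / len(digits)      # empirically close to 4.5
```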
Solution 11.5
If x ∈ (0, 1) then write the continued fraction expansion of x as [x_0, x_1, . . .].
Define f : (0, 1) → R by

f(x) = Σ_{k=1}^∞ k χ_{(1/(k+1),1/k]}(x).

Then f(x) = k precisely when 1/(k + 1) < x ≤ 1/k, i.e. f(x) = k when x_0 = k. Hence
f(T^j x) = k precisely when x_j = k, where T denotes the Gauss map.
We can write

(1/n)(x_0 + · · · + x_{n−1}) = (1/n) Σ_{j=0}^{n−1} f(T^j x).

Clearly f ≥ 0 and is measurable. However f ∉ L^1(X, B, µ). To see this, using Exercise 3.5(iii), it is sufficient to show that f ∉ L^1(X, B, λ) where λ denotes Lebesgue measure.
Note that

∫ f dλ = Σ_{k=1}^∞ k λ((1/(k+1), 1/k]) = Σ_{k=1}^∞ k (1/k − 1/(k+1)) = Σ_{k=1}^∞ 1/(k+1) = ∞.

By the results of Exercise 10.2, it follows that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∞

for µ-a.e. x ∈ X. As Gauss' measure and Lebesgue measure have the same sets of measure
zero, we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∞

for Lebesgue almost every point x ∈ X.
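The divergence of ∫ f dλ in this solution reduces to the harmonic series; the term-by-term simplification and the growth of the partial sums can be checked directly (a sketch, assuming Python):

```python
from fractions import Fraction

# Term by term: k * lambda((1/(k+1), 1/k]) = k * (1/k - 1/(k+1)) = 1/(k+1), exactly.
for k in range(1, 200):
    assert Fraction(k) * (Fraction(1, k) - Fraction(1, k + 1)) == Fraction(1, k + 1)

# The partial sums of 1/(k+1) grow like log K, hence the integral is infinite.
partial = 0.0
for k in range(1, 100001):
    partial += 1.0 / (k + 1)
```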