Ergodic Number Theory
A Course at Nagoya University
Jörn Steuding
This is Felix, and iterations of his picture under the ergodic cat map (from left to
right and top down). Since (discrete) ergodic theory is no harm for animals, Felix
returns after finitely many iterations. We will explain this phenomenon...
What is Ergodic Number Theory?
Ergodic theory studies the long time behaviour of dynamical systems.
This line of investigation has its origin in Poincaré’s investigations in statistical physics more than one hundred years ago. However, in the meantime
ergodic theory has found many remarkable applications in various branches
of mathematics. Here we shall focus on arithmetical applications. We begin
with Weyl’s theorems on uniform distribution and applications to diophantine analysis, which might be interpreted as another starting point of ergodic
theory. Then we introduce necessary concepts, techniques, and notions –
mostly from measure and integration theory – in order to set the stage for
the highlights: Birkhoff’s famous ergodic theorem, a sketch of Hlawka’s
proof of the uniform distribution of the ordinates of the nontrivial zeros
of the Riemann zeta-function, and Khintchine’s theorem on patterns in
continued fraction expansions of real numbers.
Our approach is intended to be self-contained. For this aim we recall the
foundations of Lebesgue measure and integral as well as classical results
on continued fractions, the latter topic in detail and with proofs; however,
the account on the zeta-function should be regarded as an appetizer. These
notes contain slightly more than the material presented during the lectures
at Nagoya University; in particular, references to related results and advanced topics are given which could not be investigated here in detail. For
those readers who want to learn more we recommend the excellent monographs of Dajani & Kraaikamp [37], resp. Choe [31]. A German version
of these notes originates from a course I gave at Würzburg University in 2007/08; it can be downloaded at http://www.mathematik.uniwuerzburg.de/∼steuding/ergod.htm. I am very grateful to Christian Beck
for his careful reading of the German notes; comments and corrections are
welcome and can be sent to [email protected]. Furthermore, I would like to thank Julia Koch for providing the photograph of
her beautiful cat Felix (see the front page) and for her trust in the cat map to
let her cat return. I am also grateful to Martin Schröter for creating
the impressive pictures of Felix under the cat map. Moreover, I would like
to thank Thomas Christ for technical support and my wife Rasa for her
help with most of the other pictures. Last but not least, I would like to
express my gratitude to the hospitable audience at Nagoya University, and,
in particular, Prof. Kohji Matsumoto for giving me the opportunity to
teach this course and many valuable comments.
Jörn Steuding, Nagoya, November 2010.
Contents
Chapter 1. Motivation: Billiards and Benford
1.1. Classical Diophantine Approximation
1.2. Uniform Distribution Modulo One
Chapter 2. Prelude: Lebesgue Measure and Integral
2.1. Measure Theory
2.2. The Lebesgue Integral
Chapter 3. Measure Invariance and Ergodicity
3.1. Measure Preserving Transformations
3.2. Ergodicity and Mixing
Chapter 4. Classical Ergodic Theorems
4.1. The Mean Ergodic Theorem of von Neumann
4.2. The Birkhoff Pointwise Ergodic Theorem
Chapter 5. Heavenly and Normal Applications
5.1. Poincaré’s Recurrence Theorem
5.2. Normal numbers
Chapter 6. Interlude: The Riemann Zeta-Function
6.1. Primes and Zeros
6.2. Applications of Uniform Distribution and Ergodic Theory
Chapter 7. Crash Course in Continued Fractions
7.1. The Euclidean Algorithm Revisited
7.2. Infinite Continued Fractions
Chapter 8. Metric Theory of Continued Fractions
8.1. Ergodicity of the Continued Fraction Mapping
8.2. The Theorems of Khintchine and Lévy
Chapter 9. Coda: Arithmetic Progressions
Biographical and Historical Notes
Notations
Bibliography
Index
CHAPTER 1
Motivation: Billiards and Benford
Imagine a square with mirrors at its sides and a ray of light leaving
the interior of the square. The light ray is reflected from the mirrors
and we may ask whether its path will be periodic or aperiodic. What initial
data determine periodicity and what aperiodicity? Can it happen that
the path is dense in the square? These questions in the context of billiards
were first raised by König & Szücs [85] in 1913.∗
For the sake of simplicity let us replace the square by a disk. Then
the ray of light is always reflected by the same angle at its boundary –
a phenomenon called rotation symmetry which makes circle billiards a bit
easier than square billiards. Moreover, we may assume that the boundary
of the disk is the unit circle in the complex plane C. The so-called circle
group is the multiplicative group of all complex numbers of absolute value
one and can be parametrized by the exponential function:
\[ \mathbb{T} := \{\exp(2\pi i x) : x \in [0, 1)\} \quad \text{with } i = \sqrt{-1}. \]
Note that the map exp : R → T, x ↦ exp(2πix) is a surjective but not
injective group homomorphism. By the isomorphism theorem from algebra we
find
\[ \mathbb{T} \cong \mathbb{R}/\mathbb{Z}. \]
Hence, the circle group T is an isomorphic and homeomorphic image of the
unit interval [0, 1) as an additive group, resp. of the real line R modulo
Z. In the sequel it will often be advantageous to work with cosets r + Z,
resp. the corresponding points on the unit circle (or higher-dimensional tori)
rather than with real numbers r. Billiards gives a first example. Let πα
denote the angle between the ray of light and the circle T. Since this angle
remains the same after each reflection, geometry shows that a consecutive
intersection point of the ray with the circle is obtained from the previous
one by a circle rotation through the angle 2πα. Thus, denoting the n-th
point where the ray intersects the circle by $\zeta_n = \exp(2\pi i x_n)$, we find
\[ x_n - x_{n-1} \equiv \alpha \bmod 1 \quad \text{resp.} \quad x_n = x_0 + n\alpha \quad \text{for } n \in \mathbb{N}. \]
(Here the notation ’mod 1’ reflects that we only need to consider the fractional parts of the sequence of $x_n$.) If α is rational, the path of the light
ray is periodic. More precisely, for α = p/q with p, q ∈ N the ray of light
is q-periodic (meaning $x_{n+q} \equiv x_n \bmod 1$). But what about irrational α?
In this case the ray of light will sooner or later visit any arc of the circle.
We will treat this case below with classical methods from Diophantine
approximation theory (namely Corollary 1.5).
∗ The inexperienced reader may try the tutorial ’Donald Duck in Mathmagicland’
about math and billiards...
Figure 1. A periodic ray of light; here we have α = 1/5, which
corresponds to an angle of 36° between the ray and the circle.
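The dichotomy between rational and irrational α can be tried out numerically. The following Python sketch is an illustration added here, not part of the original notes; the helper names `orbit` and `arcs_visited`, the number of steps and the choice of 100 arcs are assumptions made for the example. It computes the points x_n = x_0 + nα mod 1, using exact fractions for rational α and floating point for √2.

```python
from fractions import Fraction
import math

def orbit(alpha, steps, x0=0):
    """Return the points x_n = x0 + n*alpha mod 1 for n = 0, ..., steps."""
    return [(x0 + n * alpha) % 1 for n in range(steps + 1)]

def arcs_visited(alpha, steps, arcs=100):
    """Count how many of the arcs [j/arcs, (j+1)/arcs) the orbit of 0 meets."""
    return len({int(arcs * x) for x in orbit(alpha, steps)})

# Rational rotation number: exact arithmetic exhibits q-periodicity (q = 5).
points = orbit(Fraction(1, 5), 10, x0=Fraction(1, 7))
print(points[5] == points[0], points[10] == points[0])

# For alpha = 1/5 only five arcs are ever hit; for the irrational
# rotation number sqrt(2) (in floating point) every arc is reached.
print(arcs_visited(Fraction(1, 5), 2000), arcs_visited(math.sqrt(2), 2000))
```

The floating-point orbit only approximates the true irrational rotation, so this is a numerical experiment rather than a proof.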
Another interesting phenomenon is Benford’s law, which describes irregularities in the distribution of digits in statistical data. In 1881 Newcomb noticed that in books of logarithm tables
the pages starting with digit 1 had been used more often than the others. In
1938 this observation was rediscovered and popularized by the physicist
Benford [16], who gave further examples from statistics about American
towns. Accordingly, a set of numbers is said to be Benford distributed if the leading digit equals k ∈ {1, 2, . . . , 9} for a proportion $\log_{10}(1 + \frac{1}{k})$
of the numbers. Thus, slightly more than thirty percent of the numbers in a data set
distributed following Benford’s law have leading digit 1, and only about
six percent start with digit 7. Obviously, this distribution phenomenon,
commonly known as Benford’s law, cannot be true in general. Here is an
illustrative example of a deterministic sequence which follows Benford’s
law, also known as Gelfand’s problem.†
Considering the powers of two, we notice that among the first of those
powers,
\[ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, \ldots, \]
there are more integers starting with digit 1 than with digit 3. Given a
power of 2 with a decimal expansion of m + 1 digits and leading digit k,
\[ 10^m k \le 2^n < 10^m (k + 1) \quad \text{for } k \in \{1, \ldots, 9\}, \]
taking the logarithm leads to
\[ m + \log_{10} k \le n \log_{10} 2 < m + \log_{10}(k + 1). \]
For a real number x we introduce the decomposition into its integral and
fractional parts by writing x = ⌊x⌋ + {x} with ⌊x⌋ being the largest integer
less than or equal to x and {x} ∈ [0, 1) the fractional part (which we denote
sometimes also as x mod 1 as above). Consequently,
\[ \log_{10} k \le \{n \log_{10} 2\} < \log_{10}(k + 1). \]
† Although Gelfand, being an excellent mathematician, definitely had no problem
with this task.
Since the logarithm is concave, the interval $[\log_{10} k, \log_{10}(k + 1))$ is longer for small k, so,
heuristically, the chance is larger to have an n for which $n \log_{10} 2$ has fractional part in this interval. We shall later show (again by Corollary 1.5)
that the sequence of numbers $\log_{10} 2^n = n \log_{10} 2$ is uniformly distributed
modulo 1; hence, as n → ∞, the proportion of powers of 2 with leading digit
k ∈ {1, 2, 3, . . . , 9} equals the length of the interval $[\log_{10} k, \log_{10}(k + 1))$,
that is
\[ \log_{10}(k + 1) - \log_{10} k = \log_{10}\left(1 + \tfrac{1}{k}\right), \]
and, in particular, $\log_{10} 2 \approx 30.1$ percent of the powers of 2 have a decimal
expansion with leading digit 1 whereas the leading digit equals 7 for only
approximately 5.8 percent. On the contrary, powers of 10 always have leading digit 1 in the decimal system. This shows that the arithmetic nature of
$\log_{10} 2$ is relevant for the proportion with which leading digits appear.
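The predicted proportions can be compared with actual counts. The following Python sketch is an added illustration; the function name `leading_digit_counts` and the cut-off N = 3000 are arbitrary choices for the example. It tabulates the leading digits of 2^n for n = 1, ..., N with exact integer arithmetic and prints them next to log10(1 + 1/k).

```python
import math
from collections import Counter

def leading_digit_counts(N):
    """Count the leading decimal digits of 2**n for n = 1, ..., N (exact integers)."""
    counts = Counter()
    power = 1
    for _ in range(N):
        power *= 2
        counts[int(str(power)[0])] += 1
    return counts

N = 3000
counts = leading_digit_counts(N)
for k in range(1, 10):
    # observed proportion versus the Benford proportion log10(1 + 1/k)
    print(k, counts[k] / N, math.log10(1 + 1 / k))
```

Such a table only illustrates the limit statement; the proof rests on the uniform distribution of n log10 2 modulo 1.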
Benford’s law is supposed to hold for quite a few data sets, such as constants in physics and stock market values.‡ A further example of a sequence
for which Benford’s law is known to be true is the sequence of Fibonacci
numbers
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . . ,
whereas the sequence of primes does not follow it, as was proved by Jolissaint [73] and
Diaconis [41]. Recent investigations show that certain stochastic processes,
e.g., the geometric Brownian motion, as well as the 3X + 1-iteration due to Collatz
satisfy Benford’s law, as shown by Kontorovich & Miller [90].
1.1. Classical Diophantine Approximation
Diophantine analysis deals with integer solutions to algebraic equations and rational approximations to real numbers, respectively; the name
is attributed to Diophantus, a Greek mathematician of the third
century who wrote an influential treatise on questions of this type. Since Q
is dense in R, we can approximate any real number by rationals as closely
as we please. The classical approximation theorem of Dirichlet from 1842
provides a quantitative version:
Theorem 1.1. Given ξ ∈ R \ Q, there exist infinitely many rational numbers p/q satisfying
\[ \left| \xi - \frac{p}{q} \right| < \frac{1}{q^2}. \tag{1.1} \]
This property characterizes irrational numbers: if ξ is rational, inequality
(1.1) has only finitely many solutions p/q.
‡ Benford’s law is quite popular. The U.S. television series NUMB3RS deals with
Benford’s law (in the episode “The Running Man”). Moreover, the creative bookkeeping of the enterprise Enron was discovered by the U.S. tax authority with the help of
Benford’s law.
The quality of a rational approximation to a given real number is measured
in terms of the denominator. Roughly speaking, Dirichlet’s theorem shows
that irrational numbers possess more and better rational approximations
than rationals!
Proof. We shall apply the pigeonhole principle: If n + 1 objects are distributed among n boxes, there is at least one box containing at least two
objects. Given Q ∈ N the Q + 1 points 0, {ξ}, {2ξ}, . . . , {Qξ} lie in the Q
disjoint intervals
\[ \left[ \frac{j-1}{Q}, \frac{j}{Q} \right) \quad \text{for } j = 1, \ldots, Q. \]
Hence, there is at least one interval containing two points {kξ}, {ℓξ}, say.
Assuming {kξ} ≥ {ℓξ}, it follows that
\[ \{k\xi\} - \{\ell\xi\} = k\xi - \lfloor k\xi \rfloor - \ell\xi + \lfloor \ell\xi \rfloor = \{(k-\ell)\xi\} + \underbrace{\lfloor (k-\ell)\xi \rfloor + \lfloor \ell\xi \rfloor - \lfloor k\xi \rfloor}_{\in \mathbb{Z}}. \tag{1.2} \]
Since $\{k\xi\} - \{\ell\xi\} \in [0, \frac{1}{Q})$, the integral parts in (1.2) sum up to zero. For
q := |k − ℓ| we thus get
\[ \{q\xi\} = \{k\xi\} - \{\ell\xi\} < \frac{1}{Q}. \]
With p := ⌊qξ⌋ we obtain
\[ \left| \xi - \frac{p}{q} \right| = \frac{|q\xi - p|}{q} = \frac{\{q\xi\}}{q} < \frac{1}{qQ}, \tag{1.3} \]
which, since q ≤ Q, immediately leads to (1.1).
Now assume that ξ is irrational. Suppose there exist only finitely many
solutions $\frac{p_1}{q_1}, \ldots, \frac{p_n}{q_n}$ to (1.1). Since ξ ∉ Q, we can find a number Q such that
\[ \left| \xi - \frac{p_j}{q_j} \right| > \frac{1}{Q} \quad \text{for } j = 1, \ldots, n, \]
contradicting (1.3).
Finally, assume that ξ is rational, that is ξ = a/b with some a ∈ Z and
b ∈ N. If ξ = a/b ≠ p/q, then
\[ \left| \xi - \frac{p}{q} \right| = \frac{|aq - bp|}{bq} \ge \frac{1}{bq}, \]
and (1.1) implies q < b. Hence only finitely many p/q can exist satisfying
(1.1). •
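The pigeonhole argument is constructive and can be run as a small program. The following Python sketch is an added illustration; the function name `dirichlet_approximation` is hypothetical. It sorts the fractional parts {kξ} into Q boxes and stops at the first collision. As a deliberate deviation from the proof, it takes p to be the integer nearest to qξ rather than the integral part, which avoids the case distinction on the order of the two fractional parts; floating-point rounding is ignored.

```python
import math

def dirichlet_approximation(xi, Q):
    """Pigeonhole search for (p, q) with 1 <= q <= Q and |q*xi - p| < 1/Q."""
    boxes = {}                       # box index j -> multiplier k with {k*xi} in box j
    for k in range(Q + 1):
        frac = (k * xi) % 1.0
        j = min(int(frac * Q), Q - 1)
        if j in boxes:               # two points in one box: a collision
            q = k - boxes[j]
            p = round(q * xi)        # nearest integer to q*xi
            return p, q
        boxes[j] = k
    raise AssertionError("unreachable by the pigeonhole principle")

xi = math.sqrt(2)
for Q in (10, 100, 1000):
    p, q = dirichlet_approximation(xi, Q)
    print(Q, p, q, abs(xi - p / q), 1 / q**2)
```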
The classical approximation theorem of Kronecker from 1884 generalizes Dirichlet’s Theorem 1.1 to the inhomogeneous case:
Theorem 1.2. Let ξ ∈ R \ Q and η ∈ R. For any N ∈ N, there exist Q ∈ N
with Q > N and P ∈ Z such that
\[ |Q\xi - P - \eta| < \frac{3}{Q}. \]
In §23.6 of the classic [61] on number theory, the authors Hardy & Wright
state a multi-dimensional analogue of Kronecker’s theorem (see Exercise
1.2) and comment on this result as “one of those mathematical theorems
which assert (...) that what is not impossible will happen sometimes however
improbable it may be.”∗
Proof. According to Theorem 1.1 there exist coprime integers q > 2N and
p such that
\[ |q\xi - p| < \frac{1}{q}. \]
Now suppose that m is an integer satisfying
\[ |q\eta - m| \le \frac{1}{2}. \]
In view of Bézout’s theorem from elementary number theory we may find a
linear combination m = px − qy with integers x, y, where $|x| \le \frac{1}{2} q$; actually,
this is an easy consequence of the Euclidean algorithm for p and q; see
[128]. Hence
\[ q(x\xi - y - \eta) = x(q\xi - p) - (q\eta - m), \]
and
\[ |q(x\xi - y - \eta)| < \frac{1}{2} q \cdot \frac{1}{q} + \frac{1}{2} = 1, \]
respectively. Setting Q = q + x and P = p + y we thus obtain
\[ N < \frac{1}{2} q \le Q \le \frac{3}{2} q. \]
It follows that
\[ |Q\xi - P - \eta| \le |x\xi - y - \eta| + |q\xi - p| < \frac{1}{q} + \frac{1}{q} = \frac{2}{q} \le \frac{3}{Q}. \]
This is the assertion of the theorem. •
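For concrete ξ and η a pair (P, Q) as in Theorem 1.2 can simply be found by search. The following Python sketch is an added illustration and does not follow the Bézout construction of the proof; the function name `kronecker_search` and the search bound `limit` are assumptions made for the example. It tries each Q > N with the best matching integer P.

```python
import math

def kronecker_search(xi, eta, N, limit=10**6):
    """Return the first (P, Q) with N < Q < limit and |Q*xi - P - eta| < 3/Q, else None."""
    for Q in range(N + 1, limit):
        P = round(Q * xi - eta)      # best integer P for this Q
        if abs(Q * xi - P - eta) < 3.0 / Q:
            return P, Q
    return None

xi, eta = math.sqrt(2), 0.3
result = kronecker_search(xi, eta, 50)
print(result)
```

The theorem guarantees that such a search terminates for irrational ξ; the bound `limit` only protects against unsuitable (e.g. rational) input.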
The Kronecker approximation theorem allows a solution to our billiards problem from the beginning. Here we shall consider the square billiard.
We may assume the square to be given by $[0, \frac{1}{2})^2 \subset \mathbb{R}^2$. If γ denotes the
angle between one edge and the initial direction of the ray, then the path of
the ray is determined by the linear equation
\[ y = \xi x + \beta, \]
where ξ = tan γ and β is some real number according to the starting point
of the ray. If we reflect the edges rather than the ray, our straight line
is defined in the whole plane and we observe that the path is periodic if,
and only if, the above line degenerates modulo $\mathbb{Z}^2$ into a finite number of
straight line segments; otherwise, the path of the ray is dense in the square.
In fact, it is periodic if, and only if, the straight line intersects with the same
points on the reflected edges modulo $\mathbb{Z}^2$, that is when its slope ξ is rational.
∗ Outside mathematics this is also known as ‘Murphy’s law’.
Now suppose that ξ is irrational. Then, for any point $(x_1, y_1) \in \mathbb{R}^2$ and
any ε > 0, in view of Kronecker’s approximation theorem 1.2 applied to −η with
$\eta = -y_1 + \beta + \xi x_1$ (and N so large that 3/Q < ε), there exist integers P, Q such that
\[ |y_1 + P - (\xi(x_1 + Q) + \beta)| = |\underbrace{y_1 - \beta - \xi x_1}_{= -\eta} + P - Q\xi| < \epsilon. \]
Hence the point $(x_1, y_1)$ and the point $(x_1, \xi(x_1 + Q) + \beta)$ on the line differ
modulo $\mathbb{Z}^2$ by a quantity less than ε. We conclude: the ray of light describes
a closed resp. periodic path if the line has a rational tangent, i.e. ξ = tan γ ∈
Q. Otherwise, the ray of light visits any neighbourhood of any point
in the square.† Despite the different geometry, circle billiards also needs
irrationality for the denseness of the corresponding path (see Exercise 1.3).
Figure 2. The paths of two different rays of light, one with
irrational tangent, the other one with rational tangent.
1.2. Uniform Distribution Modulo One
In view of Kronecker’s approximation theorem the fractional parts of
the numbers nξ lie dense in the unit interval as n ranges through N, provided ξ is irrational. Now we want to study this denseness in a quantitative
manner. A sequence $(x_n)$ of real numbers is said to be uniformly distributed
modulo 1 (resp. equidistributed) if for all α, β with 0 ≤ α < β ≤ 1 the
proportion of the fractional parts of the $x_n$ in the interval [α, β) corresponds
to its length in the following sense:
\[ \lim_{N \to \infty} \frac{1}{N} \, \#\{1 \le n \le N : \{x_n\} \in [\alpha, \beta)\} = \beta - \alpha. \]
Obviously, it suffices to consider only intervals of the form [0, β) with arbitrary β ∈ (0, 1). In terms of probability this means that for a uniformly
distributed random variable all possible positions are equally probable.
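The defining limit can be approximated by counting. The following Python sketch is an added illustration; the helper name `proportion`, the test sequence n√2 and the interval [0.25, 0.6) are arbitrary choices for the example. It computes the proportion of fractional parts falling into [α, β) for a finite initial segment of the sequence.

```python
import math

def proportion(xs, a, b):
    """Proportion of the numbers x in xs whose fractional part lies in [a, b)."""
    hits = sum(1 for x in xs if a <= x % 1.0 < b)
    return hits / len(xs)

N = 20000
xs = [n * math.sqrt(2) for n in range(1, N + 1)]
# The value should be close to the interval length 0.6 - 0.25 = 0.35.
print(proportion(xs, 0.25, 0.6))
```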
The first important results in this direction were obtained by Hermann
Weyl∗ around 1913-16 (see [150]). Here is his first
Theorem 1.3. A sequence $(x_n)$ of real numbers is uniformly distributed
modulo 1 if, and only if, for any Riemann integrable function f : [0, 1] → C,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) = \int_0^1 f(x) \, dx. \tag{1.4} \]
† See jdm.mathematik.uni-karlsruhe.de/.../vortrag.pdf for a very nice visualization.
∗ Not to be confused with André Weil.
Proof. Given α, β ∈ [0, 1), denote by $\chi_{[\alpha,\beta)}$ the indicator function of the
interval [α, β), i.e.,
\[ \chi_{[\alpha,\beta)}(x) = \begin{cases} 1 & \text{if } \alpha \le x < \beta, \\ 0 & \text{otherwise.} \end{cases} \]
Obviously,
\[ \int_0^1 \chi_{[\alpha,\beta)}(x) \, dx = \beta - \alpha. \]
Therefore, the sequence $(x_n)$ is uniformly distributed modulo 1 if, and only
if, for any pair α, β ∈ [0, 1),
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \chi_{[\alpha,\beta)}(\{x_n\}) = \int_0^1 \chi_{[\alpha,\beta)}(x) \, dx. \]
Assuming the asymptotic formula (1.4) for any Riemann integrable function
f, it follows that $(x_n)$ is indeed uniformly distributed modulo 1.
In order to show the converse implication suppose that $(x_n)$ is uniformly
distributed modulo 1. Then (1.4) holds for $f = \chi_{[\alpha,\beta)}$ and, consequently, for
any linear combination of such indicator functions. In particular, we may
deduce that (1.4) is true for any step function. It is well-known from any
beginner’s course on calculus that, for any real-valued Riemann integrable
function f and any ε > 0, we can find step functions $t_-, t_+$ such that
\[ t_-(x) \le f(x) \le t_+(x) \quad \text{for all } x \in [0, 1], \]
and
\[ \int_0^1 (t_+(x) - t_-(x)) \, dx < \epsilon. \]
Hence,
\[ \int_0^1 f(x) \, dx \ge \int_0^1 t_-(x) \, dx > \int_0^1 t_+(x) \, dx - \epsilon, \]
and
\[ \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) - \int_0^1 f(x) \, dx \le \frac{1}{N} \sum_{n=1}^{N} t_+(\{x_n\}) - \int_0^1 t_+(x) \, dx + \epsilon, \]
which is less than 2ε for all sufficiently large N. Analogously, we obtain
\[ \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) - \int_0^1 f(x) \, dx > -2\epsilon. \]
Consequently, (1.4) holds for all real-valued Riemann integrable functions f.
The case of complex-valued Riemann integrable functions can be deduced
from the real case by treating real and imaginary part of f separately. •
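Formula (1.4) can be checked numerically for a specific integrand. The following Python sketch is an added illustration; the helper name `weyl_average`, the sequence n√2 and the test function f(x) = x² are choices made for the example (the uniform distribution of n√2 is taken for granted here). It compares the average of f over the fractional parts with the integral of x² over [0, 1], which equals 1/3.

```python
import math

def weyl_average(f, xs):
    """Average of f over the fractional parts of the numbers in xs."""
    return sum(f(x % 1.0) for x in xs) / len(xs)

N = 20000
xs = [n * math.sqrt(2) for n in range(1, N + 1)]
avg = weyl_average(lambda x: x * x, xs)
print(avg, 1 / 3)   # average over the sequence versus the exact integral
```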
The converse of Weyl’s Theorem was found by de Bruijn [25]: given a
function f : [0, 1) → C with the property that for any uniformly distributed
sequence $(x_n)$ the limit
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) \]
exists, then f is Riemann integrable.
It is interesting that here the Riemann integral is superior to the
Lebesgue integral. In fact, Theorem 1.3 does not hold for Lebesgue integrable functions f in general since f might vanish at each point $\{x_n\}$
but have a non-vanishing integral. This subtle difference is related to a
rather important application of uniformly distributed sequences, namely to
so-called Monte-Carlo methods and their use in numerical integration.† If
we distribute N points randomly in the square $[-1, 1]^2$ in the Euclidean
plane and count the number M of those points which lie inside the unit
circle centered at the origin, then the quotient M/N is a good guess for the
ratio π/4 of the area π of the unit disk to the area 4 of the square, so that 4M/N approximates π; with growing N this approximation is expected to
get better and better. In view of this idea uniformly distributed sequences
can be used to numerically evaluate certain integrals for which there is no
elementary method, e.g. the Gaussian integral $\int \exp(-x^2) \, dx$. More on
this topic can be found in Hlawka [65]. Further applications are relevant
in the theory of pseudo-random numbers (see [34]).
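The Monte-Carlo idea for the unit disk is easily programmed. The following Python sketch is an added illustration; the function name `estimate_pi`, the fixed seed and the sample size are assumptions made for the example. Since the square [−1, 1]² has area 4, the quotient M/N estimates π/4, and the sketch therefore returns 4M/N.

```python
import math
import random

def estimate_pi(N, seed=0):
    """Monte-Carlo estimate of pi from N pseudo-random points in [-1, 1]^2."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    M = 0                            # number of points inside the unit disk
    for _ in range(N):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            M += 1
    return 4.0 * M / N

print(estimate_pi(200000), math.pi)
```

The statistical error of such an estimate decays only like N^(−1/2), which is one reason why deterministic uniformly distributed sequences are of interest in numerical integration.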
Already Weyl noticed that the appearing limits are uniform, which has
been studied ever since under the notion of discrepancy. This topic has
amusing applications, for instance, in billiards where we may ask how soon
an aperiodic ray of light will visit a given domain. First results for effective
billiards are due to Weyl [149]; interesting and surprising results on square
billiards have recently been discovered by Beck [15]. Also important in this
setting are effective versions of the inhomogeneous Kronecker approximation theorem 1.2 as, for example, [117]. For the general theory of uniform
distribution and discrepancy we refer to the monographs of Harman [60]
and Kuipers & Niederreiter [106].
Our next aim is another characterization of uniform distribution modulo
one, also due to Weyl. Recall the parametrization of the unit interval
by the exponential function from the very beginning. For abbreviation we
write e(ξ) = exp(2πiξ) for ξ ∈ R which translates the 2πi-periodicity of the
exponential function to 1-periodicity: e(ξ) = e(ξ + Z).
Theorem 1.4. A sequence $(x_n)$ of real numbers is uniformly distributed
modulo 1 if, and only if, for any integer m ≠ 0,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e(m x_n) = 0. \tag{1.5} \]
† The name Monte-Carlo alludes to gambling; there is no university in Monte
Carlo or nearby.
Proof. Suppose the sequence $(x_n)$ is uniformly distributed modulo 1; then
Theorem 1.3 applied with f(x) = e(mx) shows
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e(m x_n) = \int_0^1 e(mx) \, dx. \]
For any integer m ≠ 0 the right-hand side equals zero, which gives (1.5).
For the converse suppose (1.5) holds for all integers m ≠ 0. Using the trigonometric polynomial
\[ P(x) = \sum_{m=-M}^{+M} a_m e(mx) \quad \text{with } a_m \in \mathbb{C}, \]
it follows from linearity that
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} P(\{x_n\}) = \sum_{m=-M}^{+M} a_m \cdot \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e(m x_n) = a_0 = \int_0^1 P(x) \, dx. \tag{1.6} \]
Recall Weierstraß’ approximation theorem which claims that, for any
continuous 1-periodic function f and any ε > 0, there exists a trigonometric
polynomial P such that
\[ |f(x) - P(x)| < \epsilon \quad \text{for } 0 \le x < 1 \tag{1.7} \]
(this can be proved, for example, with Fourier analysis; see [69]‡). Using
this approximating polynomial, we deduce
\[ \left| \frac{1}{N} \sum_{n=1}^{N} f(\{x_n\}) - \int_0^1 f(x) \, dx \right| \le \left| \frac{1}{N} \sum_{n=1}^{N} \big( f(\{x_n\}) - P(\{x_n\}) \big) \right| + \left| \frac{1}{N} \sum_{n=1}^{N} P(\{x_n\}) - \int_0^1 P(x) \, dx \right| + \left| \int_0^1 (P(x) - f(x)) \, dx \right|. \]
The first and the third term on the right are less than ε thanks to (1.7); the
second term is small by (1.6). Hence, formula (1.4) holds for all continuous,
1-periodic functions f. Denoting by $\chi_{[\alpha,\beta)}$ the indicator function of the
interval [α, β) (as in the proof of the previous theorem), for any ε > 0, there
exist continuous 1-periodic functions $f_-, f_+$ satisfying
\[ f_-(x) \le \chi_{[\alpha,\beta)}(x) \le f_+(x) \quad \text{for all } 0 \le x < 1, \]
and
\[ \int_0^1 (f_+(x) - f_-(x)) \, dx < \epsilon. \]
This leads to
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \chi_{[\alpha,\beta)}(\{x_n\}) = \int_0^1 \chi_{[\alpha,\beta)}(x) \, dx. \]
Hence, the sequence $(x_n)$ is uniformly distributed modulo 1. •
‡ Actually, the authors of [69] attribute this result to Fejér.
Combinatorial proofs of Weyl’s theorems can be found in [72].
We shall illustrate the latter criterion with an example. Consider the
fractional parts of the numbers $x_n = \log n$. An easy computation shows
\[ \sum_{n=1}^{N} e(\log n) = \sum_{n=1}^{N} n^{2\pi i} = N^{2\pi i} \sum_{n=1}^{N} \left( \frac{n}{N} \right)^{2\pi i} \sim N^{2\pi i} \, N \int_0^1 u^{2\pi i} \, du = \frac{N^{1+2\pi i}}{1 + 2\pi i}, \]
which is not o(N). Hence, the sequence $(\log n)_n$ is not uniformly distributed
modulo 1. Actually, this is the reason why we have been surprised by Benford’s law. If $(\log_{10} x_n)$ is uniformly distributed modulo one, then $(x_n)$ is
Benford distributed. As a matter of fact, the Benford distribution is
nothing else than the probability law of the mantissa with respect to the
basis.
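The asymptotic formula for the exponential sum over log n can be observed numerically. The following Python sketch is an added illustration; the helper name `weyl_mean` and the cut-off N are assumptions made for the example. It computes the normalized sum (1/N) Σ e(log n) and compares its absolute value with 1/|1 + 2πi|, the absolute value of the predicted main term divided by N.

```python
import cmath
import math

def weyl_mean(xs, m=1):
    """Normalized Weyl sum (1/N) * sum of e(m*x) over the numbers x in xs."""
    return sum(cmath.exp(2j * math.pi * m * x) for x in xs) / len(xs)

N = 100000
s = weyl_mean([math.log(n) for n in range(1, N + 1)])
# |N^(2*pi*i)| = 1, so |s| should be close to 1/|1 + 2*pi*i| and not tend to 0.
print(abs(s), 1 / abs(1 + 2j * math.pi))
```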
There is an important application of Theorem 1.4 due to Bohl [22]
improving our observation on the denseness of the fractional parts of nξ in
the unit interval, as n varies through N.§
Corollary 1.5. Given a real number ξ, the sequence (nξ)n is uniformly
distributed modulo 1 if, and only if, ξ is irrational.
Proof. If ξ is irrational, then e(kξ) ≠ 1 for any non-zero k ∈ Z and the formula for
the finite geometric series yields
\[ \sum_{n=1}^{N} e(mn\xi) = e(m\xi) \, \frac{1 - e(mN\xi)}{1 - e(m\xi)} \]
for all integers m ≠ 0. Since this quantity is bounded independently of N,
it follows that
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \exp(2\pi i m n \xi) = 0. \]
Otherwise, ξ = a/b for some integers a, b with b ≠ 0. In this case the limit is
different from zero for all integer multiples m of b and Theorem 1.4 implies
the assertion. •
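Both cases of the proof show up clearly in a computation. The following Python sketch is an added illustration; the function name `rotation_weyl_mean` and the parameters are assumptions made for the example. It evaluates (1/N) Σ e(mnξ) for the irrational number √2 (in floating point) and for ξ = 1/4 with m = 4, a multiple of the denominator.

```python
import cmath
import math

def rotation_weyl_mean(xi, m, N):
    """Normalized Weyl sum (1/N) * sum_{n=1}^{N} e(m*n*xi)."""
    return sum(cmath.exp(2j * math.pi * m * n * xi) for n in range(1, N + 1)) / N

N = 10000
# Irrational xi: the geometric sum is bounded, so the mean is of size O(1/N).
print(abs(rotation_weyl_mean(math.sqrt(2), 1, N)))
# Rational xi = 1/4 and m = 4: every summand equals 1, so the mean has modulus 1.
print(abs(rotation_weyl_mean(0.25, 4, N)))
```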
We return to Gelfand’s problem from the beginning. First, we observe that $\log_{10} 2$ is irrational. In fact, assuming that it is not, there
would exist positive integers a and b such that $10^a = 2^b$, which is impossible by the unique prime factorization of integers. Hence, applying Corollary 1.5, the proportion of positive integers n for which the inequalities
$\log_{10} k \le \{n \log_{10} 2\} < \log_{10}(k + 1)$ hold equals the length of the interval,
that is $\log_{10}(1 + \frac{1}{k})$, as predicted by Benford’s law.¶
Figure 3. The uniform distribution modulo 1 for the sequence $(n\sqrt{2})$:
on the left a histogram concerning the distribution of $\{n\sqrt{2}\}$ for n = 1, . . . , 500 in the intervals $[\frac{j-1}{10}, \frac{j}{10})$
with 1 ≤ j ≤ 10, in the middle as points $(n, \{n\sqrt{2}\})$ in the
unit square, and as points distributed on the circle group on
the right.
§ Remarkably, at about the same time also Sierpinski and Weyl obtained similar
results; for the interesting history we refer to [66].
Corollary 1.5 can be generalized in various ways. Vinogradov [144]
proved the ternary Goldbach conjecture that any sufficiently large odd
integer can be represented as a sum of three primes; an important tool in
his approach are estimates for exponential sums of the form
\[ \sum_{p_n \le N} e(\xi p_n), \]
where $p_n$ denotes the nth prime (in ascending order). For irrational ξ the
sequence $(\xi p_n)$ is uniformly distributed modulo 1. In order to get an impression of the depth of this result the reader may start with the non-trivial
problem how the sequence $(\xi p_n)$ is distributed modulo 1 if ξ is rational;
interestingly, again the name Dirichlet will pop up. In contrast,
the binary Goldbach conjecture that any even integer larger than two is
representable as a sum of two primes is still open.
¶ There is another problem of Gel’fond on digits of prime numbers which was recently
solved by Mauduit & Rivat [103], who showed that the sum of digits of primes written in
a basis q ≥ 2 is uniformly distributed in arithmetic progressions; this is not uniform distribution modulo one, and the mathematician A.O. Gel’fond is not the mathematician
I.M. Gelfand.
Exercises
Actually, the statement of Dirichlet’s approximation theorem has already been
known to Lagrange and his contemporaries. However, Dirichlet’s elegant approach allowed him the following interesting generalizations:
Exercise 1.1. Prove the following simultaneous approximation theorem: assume
$\xi_{ij} \in \mathbb{R}$ with 1 ≤ i ≤ m, 1 ≤ j ≤ n and 1 < Q ∈ Z, then there exist integers
$p_1, \ldots, p_m, q_1, \ldots, q_n$ satisfying
\[ 1 \le \max\{|q_j| : 1 \le j \le n\} < Q^{m/n} \quad \text{and} \quad |\xi_{i1} q_1 + \ldots + \xi_{in} q_n - p_i| \le \frac{1}{Q} \quad \text{for } 1 \le i \le m. \]
Why is this a generalization of Theorem 1.1?
Exercise 1.2. Prove the following simultaneous inhomogeneous approximation theorem: assume $1, \xi_1, \ldots, \xi_m$ are linearly independent over the rationals, $\eta_1, \ldots, \eta_m$
are arbitrary, and N and ε are positive. Then there exist integers Q > N and
$P_1, \ldots, P_m$ such that
\[ |Q\xi_k - P_k - \eta_k| < \epsilon \quad \text{for } 1 \le k \le m. \]
Deduce that the sequence of vectors $(n\xi_1, \ldots, n\xi_m)$ lies dense in the unit cube $[0, 1)^m$.
This is the multidimensional version of Kronecker’s approximation theorem. It
might be interesting to consider the illustrative astronomical application to conjunctions in planetary systems from Hardy & Wright [61], §23.6, and to read
three rather different proofs in the same source.
We return to fun, meaning the billiard problem from the beginning.
Exercise 1.3. Explain the details for aperiodicity in square billiards. Moreover,
solve the problem of circular billiard: under what condition on the angle α is the
path periodic, resp. aperiodic? What can be said about other convex bodies? Which
results can be obtained with ergodic theory?
For advice and more information on related topics we refer to the textbooks [61,
128] as well as to the entertaining book [132] by Tabachnikov, Birkhoff’s
classical articles [21], and the surprising results of Veech [143] on mathematical
billiards not only on disks and squares.
Sometimes it is not easy to decide whether a given sequence is uniformly distributed modulo one. Actually, it is not known whether the sequence of powers
$(\frac{3}{2})^n$ or the numbers exp(n) are uniformly distributed modulo one. Here are some
easier sequences to be checked:
Exercise 1.4. Find a sequence of numbers $(x_n)$ which consists of exactly all rational numbers of the unit interval and is uniformly distributed. Moreover, show that
the sequence of numbers
\[ y_n := \left( \frac{\sqrt{5} + 1}{2} \right)^n \quad \text{for } n \in \mathbb{N} \]
is not uniformly distributed modulo one.
Recall that the Fibonacci numbers, defined by the recursion $F_{n+1} = F_n + F_{n-1}$
with initial values $F_0 = 0$, $F_1 = 1$, can be computed alternatively with Binet’s
explicit formula
\[ F_n = \frac{1}{\sqrt{5}} \left( \left( \frac{\sqrt{5} + 1}{2} \right)^n - \left( \frac{1 - \sqrt{5}}{2} \right)^n \right). \tag{1.8} \]
Koksma [87] showed that the sequence $(\alpha^n)$ is uniformly distributed modulo one for almost all α > 1;
however, there is no single α with this property explicitly known. On
the contrary, if α is a Pisot number, i.e., an algebraic integer α > 1 all of whose algebraic conjugates (except α)
have absolute value less than one, the sequence $(\alpha^n)$ is not uniformly distributed;
the golden ratio $\frac{\sqrt{5}+1}{2}$ is an example of such a number. See the work of
Pisot & Salem [111] to catch a first glimpse.
Here is another generalization of Corollary 1.5 due to Weyl.
Exercise 1.5. How are the values of polynomials distributed modulo one? Let
\[ P = a_d X^d + \ldots + a_1 X + a_0 \]
be a polynomial with real coefficients, where at least one coefficient $a_j$ with j ≠ 0
is irrational. Prove that the values P(n), as n ranges through N, are uniformly
distributed modulo one. Hint: first, one may show the following result due to van
der Corput: A sequence of real numbers $x_n$ is uniformly distributed modulo one
if for any positive integer m the sequence of real numbers $x_{m+n} - x_n$ is uniformly
distributed modulo one. This might be used in combination with the observation
that P(X + m) − P(X) is a polynomial of degree d − 1. More advice can be found
in [33], §XI.1.
We conclude with an aperitif for Monte Carlo methods:
Exercise 1.6. Write a computer program to generate random points in the unit
square $[0, 1)^2$. Count those points in an appropriate subset in order to obtain a
numerical approximation to π = 3.14159 . . .. How many points are needed for an
accuracy of $10^{-3}$? Can you do the same for e = exp(1) = 2.71828 . . .?
* * *
It should be noted that not only observations in physics on the motion of
heavenly bodies gave motivation for ergodic theory; surprisingly, also number theory had some impact on statistical physics. In his 1914 paper [149],
entitled Sur une application de la théorie des nombres à la mécanique statistique et la théorie des perturbations, Weyl applied his uniform distribution
theory to statistical mechanics.
We aim at the important ergodic theorem of Birkhoff from 1931 which
generalizes the uniform distribution results of Weyl significantly. Important
ingredients are Lebesgue measure and integral whose construction and basic properties we recall in the following chapter.
CHAPTER 2
Prelude: Lebesgue Measure and Integral
There do exist sets to which one cannot assign a geometrical length,
area or volume. In 1905 Vitali showed the unsolvability of the so-called
measure problem for any space $\mathbb{R}^d$. An example for the one-dimensional
case is provided by the equivalence relation defined by
\[ x \sim y \iff x - y \in \mathbb{Q}. \]
By the axiom of choice we may define the set A ⊂ [0, 1] consisting of exactly
one representative of each equivalence class. Now assume that there exists a
meaningful measure µ with all the nice properties we want (e.g., monotonic,
translation invariant, and countably additive). Then
\[ 1 = \mu([0, 1]) \le \sum_{x \in [-1,1] \cap \mathbb{Q}} \underbrace{\mu(A + x)}_{= \mu(A)} \le \mu([-1, 2]) = 3, \]
where A + x is defined as {a + x : a ∈ A}. Since the sum consists of infinitely many copies of µ(A), it is either zero or infinite. Therefore, we obviously cannot
assign any meaningful value to µ(A).∗
It was Emile Borel who introduced the notions of measure and measurable sets in a rigorous way in analysis and it was Henri Lebesgue who
built up a new integration theory on this ground — different from and more
powerful than Riemann’s integration theory which is based on functions
rather than sets. An excellent reference for measure and integration theory
is the classic [89] of Kolmogorov & Fomin.
2.1. Measure Theory
Let X be a non-empty set and denote by P(X) its power set. A nonempty system of sets F ⊂ P(X) is called an algebra if X ∈ F and if with
A, B ∈ F also A ∪ B and X \ B lie in F. Such an algebra F is called a
σ-algebra if F is closed with respect to countable unions, i.e., if the following
axioms are satisfied:
• ∅, X ∈ F;
• X \ A ∈ F for any A ∈ F;
• $\bigcup_j A_j$ ∈ F for any countable sequence of sets $A_j$ ∈ F.
∗ This is slightly related to the famous counter-intuitive Banach-Tarski paradox, which claims that a ball in $\mathbb{R}^3$ can be cut into five pieces which can be rearranged as two balls of the same size, for short: • = • + • (see [147]).
Since
\[ \bigcap_j A_j = A \setminus \bigcup_j (A \setminus A_j) \quad\text{for}\quad A := \bigcup_j A_j, \]
it follows from the last axiom that $\bigcap_j A_j \in \mathcal{F}$. Hence, a σ-algebra is closed
under countably many unions and intersections. For X ≠ ∅ the systems
{X, ∅} and the power set P(X) of X itself are examples of σ-algebras;
however, being extremely small, resp. extremely large, they do not play a
big role in the sequel.
It is not difficult to see that any countable intersection of σ-algebras is
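again a σ-algebra. Hence, for any system ∅ ≠ E ⊂ P(X) the intersection
\[ \mathcal{A}_\sigma(\mathcal{E}) = \bigcap_{\substack{\mathcal{E} \subset \mathcal{F} \\ \mathcal{F} \text{ is a } \sigma\text{-algebra}}} \mathcal{F} \]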
is the smallest σ-algebra which contains E; for this reason Aσ (E) is said to
be generated by E. A quite important σ-algebra is the Borel σ-algebra B of
a (non-empty) metric space X defined as the smallest σ-algebra generated
by the open sets in X.
A non-negative function µ, defined on a σ-algebra F over some space
X ≠ ∅, is called a measure if the following axioms are satisfied:
• µ(∅) = 0;
• for any countable sequence of pairwise disjoint sets $A_j$ ∈ F,
\[ \mu\Big(\bigcup_j A_j\Big) = \sum_j \mu(A_j). \]
In view of the last property µ is said to be σ-additive (resp. countably
additive). Note that we allow µ to take the value +∞ (of course, taking into
account the standard arithmetic with infinity). Then the triple (X, F, µ),
consisting of a set X ≠ ∅, an associated σ-algebra F, and a measure µ,
is called a measure space and the elements in F are called measurable. If
µ(X) < ∞ the measure space is said to be finite. A very important concept
in this theory is the notion of the null set, i.e., any set A ∈ F with measure
zero: µ(A) = 0.
First properties are
• Monotonicity: µ(A) ≤ µ(B) for all measurable sets A ⊂ B;
• Nesting Principle: for any nested sequence of measurable sets
$A_1 \supset A_2 \supset \ldots$ with $\mu(A_1) < \infty$,
\[ \lim_{n\to\infty} \mu(A_n) = \mu\Big(\bigcap_n A_n\Big). \]
We shall give some examples. First of all there is the counting measure
\[ A \mapsto |A| = \begin{cases} \sharp A & \text{if } \sharp A < +\infty, \\ +\infty & \text{otherwise}, \end{cases} \]
where ♯A counts the number of elements of the finite set A, which has many
applications in combinatorics and number theory. In physics the Dirac
measure plays a central role; it is defined by
\[ A \mapsto \delta_x(A) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise}. \end{cases} \]
Last, but not least, there is the Lebesgue measure which we will denote by
λ. To start we define the Lebesgue measure for cuboids Q by
\[ \lambda(Q) = \prod_{j=1}^{d} (\beta_j - \alpha_j), \quad\text{where}\quad Q = (\alpha_1,\beta_1) \times \ldots \times (\alpha_d,\beta_d) \tag{2.1} \]
with some real numbers $\alpha_j \le \beta_j$. Of course, here we may also consider
semi-open or closed cuboids. Then the definition of the Lebesgue measure
can be extended first by additivity to finite (disjoint) unions of cuboids, so-called figures, and, secondly, by identifying with the outer measure $\lambda^*$ for
generic measurable sets A (in F) by using countable unions of limits A of
sequences of figures $A_n$ (modulo null sets), where
\[ A_n \to A \text{ as } n \to \infty \iff \lim_{n\to\infty} \lambda^*(A_n \Delta A) = 0. \]
Recall that
\[ A \Delta B := (A \setminus B) \cup (B \setminus A) \]
is the symmetric difference of A and B and that the outer measure
is given by
\[ \lambda^*(A) = \inf \sum_{n=1}^{\infty} \lambda(A_n), \]
where the infimum is taken over all countable coverings of A by open figures
An . It should be noticed that λ∗ (A∆B) is small if A and B differ by a set
of small measure only. To simplify arithmetic with sets we shall also write
A = B if λ(A∆B) = 0.
The above construction of the Lebesgue measure dates back to
Carathéodory and can be generalized without big efforts.∗ An important feature of the Lebesgue measure is translation invariance, i.e.,
λ(A) = λ(A + x) for all measurable sets A and all points x; moreover, it
is unique among all normed measures satisfying these properties. Examples for Lebesgue null sets are Q, resp. Qd according to the underlying
space, or, more generally, all countable sets; a more advanced example is
the uncountable Cantor set.
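The statement that countable sets are null sets follows directly from the definition of the outer measure. A short derivation, written out for the one-dimensional case and for an arbitrary enumeration $q_1, q_2, \ldots$ of a countable set A (the intervals $I_n$ are introduced here only for illustration), may read as follows: for any ε > 0,

```latex
\[
A = \{q_1, q_2, q_3, \ldots\} \subset \mathbb{R}, \qquad
I_n := \Big(q_n - \frac{\varepsilon}{2^{n+1}},\; q_n + \frac{\varepsilon}{2^{n+1}}\Big), \qquad
\lambda(I_n) = \frac{\varepsilon}{2^{n}},
\]
\[
\lambda^*(A) \le \sum_{n=1}^{\infty} \lambda(I_n)
= \varepsilon \sum_{n=1}^{\infty} 2^{-n} = \varepsilon .
\]
```

Since ε > 0 was arbitrary, $\lambda^*(A) = 0$.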
∗ Actually, the idea to enlarge the set of figures, which do not constitute a σ-algebra, by limits of figures modulo null sets reminds us of Cantor's construction of the real numbers.
We conclude our brief outline of measure theory with the notion of a
probability space. A measure P is said to be a probability measure if the
values of P lie in [0, 1] and P(X) = 1. For any finite measure µ with µ(X) > 0 we can
thus always define a probability measure by setting P(A) = µ(A)/µ(X).
An important property of a probability measure is
\[ P(X \setminus A) = 1 - P(A) \quad\text{for any } A \in \mathcal{F}. \]
A triple (X, F, P) consisting of a set X ≠ ∅, a σ-algebra F, and a probability measure P is called a probability space. The underlying σ-algebra is
said to be the event space and its elements E are the events which appear
with probability P(E). It is remarkable that the axiomatic foundation of
probability theory was given not earlier than in 1933 by Kolmogorov [88].
Probability theory often allows an interesting view on number theoretical questions, in particular, in context with distribution properties of
arithmetical functions (which is only another expression for sequences of
complex numbers). If $(X_n)$ is a sequence of independent random variables uniformly
distributed on [0, 1), then the law of the iterated logarithm implies,
for any m ≠ 0,
\[ \limsup_{N\to\infty} \frac{\big|\sum_{n \le N} e(mX_n)\big|}{\sqrt{2N \log\log N}} = 1 \quad\text{almost surely}, \]
which means that this equality holds with probability P(E) = 1, where E
stands for this event. Consequently, the set of all sequences {xn } in [0, 1)
for which the above lim sup condition does not hold is a null set. (For this
law of the iterated logarithm see [17, 83].) This may be compared with
Weyl’s theorem 1.4.
2.2. The Lebesgue Integral
Next we give a short introduction to Lebesgue’s integration theory.
Only here functions enter the stage. We write f ≤ g for two real-valued
functions if the inequality f (x) ≤ g(x) holds for almost all x for which f
and g are defined (which will always be clear from the context). Given a
measure space (X, F, µ), a function f : X → R is called measurable (resp.
µ-measurable) if the set {x ∈ X : f (x) < α} is measurable for any α ∈ R
(i.e., if it lies in F). In particular, any continuous function is measurable
with respect to the Lebesgue measure or, more generally, with respect to any
measure defined on a Borel σ-algebra.
A function is said to be simple if its image is finite. In order to define
the integral for non-negative simple functions η we write η as a finite linear
combination of indicator functions
\[ \eta = \sum_{j=1}^{m} c_j \chi_{B_j} \quad\text{with}\quad B_j := \{x : \eta(x) = c_j\} \]
and pairwise distinct $c_j \ge 0$ which constitute the image η(X); in particular, we suppose the sets $B_j$ to be disjoint. Here the indicator function $\chi_B$
according to B ⊂ X is defined by
\[ \chi_B(x) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{otherwise}. \end{cases} \]
For an interval B this coincides with the definition of indicator functions
from the previous chapter. Obviously, this function is measurable if, and
only if, B is measurable. A similar statement holds for simple functions η.
The integral of χB with B ∈ F taken over a measurable set A is defined by
\[ \int_A \chi_B \, d\mu = \mu(A \cap B), \]
resp. for measurable simple functions η by
\[ \int_A \eta \, d\mu = \sum_{j=1}^{m} c_j \int_A \chi_{B_j} \, d\mu = \sum_{j=1}^{m} c_j \, \mu(A \cap B_j). \]
Using simple functions we can approximate any non-negative, real-valued
measurable function f to any accuracy and thus define the Lebesgue integral by
\[ \int_A f \, d\mu = \sup \int_A \eta \, d\mu, \tag{2.1} \]
where the supremum is taken over all measurable simple functions η satisfying 0 ≤ η ≤ f . Using Young’s decomposition, that is
\[ f = f^+ - f^- \quad\text{with}\quad f^+ := \max\{f, 0\}, \quad f^- := -\min\{f, 0\}, \tag{2.2} \]
we define the integral for any measurable, real-valued function f by
\[ \int_A f \, d\mu = \int_A f^+ \, d\mu - \int_A f^- \, d\mu \]
(simply by applying our integral for non-negative functions to both summands, $f^+$ and $f^-$, individually). The function f is said to be integrable
(resp. µ-integrable) if both integrals on the right-hand side are finite. This
definition of the Lebesgue integral reflects all important properties we are
used to, namely monotonicity, translation invariance, and linearity (which
allows us to define also the integral for complex-valued measurable functions). Moreover, it does not depend on the representation of simple functions as linear combinations of indicator functions (because of (2.1)).
What is the difference to Riemann’s integral? Here is an illuminating
quotation of Lebesgue himself on this difference:
“The geometers of the seventeenth century considered the integral of f (x) — the word ‘integral’ had not been invented,
but that does not matter — as the sum of an infinity of indivisibles, each of which was the ordinate, positive or negative,
of f (x). Very well! We have simply grouped together the
indivisibles of comparable size. (...) One could say that, according to Riemann’s procedure, one tried to add the indivisibles by taking them in the order in which they were furnished
by variation in x, like an unsystematic merchant who counts
coins and bills at random in the order in which they came to
hand, while we operate like a methodical merchant who says:
I have m(E1 ) pennies which are worth 1 · m(E1 ),
I have m(E2 ) nickels which are worth 5 · m(E2 ),
I have m(E3 ) dimes which are worth 10 · m(E3 ), etc.
Altogether then I have
S = 1 · m(E1 ) + 5 · m(E2 ) + 10 · m(E3 ) + . . .
The two procedures will certainly lead the merchant to the
same result because no matter how much money he has there
is only a finite number of coins or bills to count. But for us
who must add an infinite number of indivisibles the difference
between the two methods is of capital importance.”
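Lebesgue's merchant can be imitated in a few lines. The sketch below assumes a finite list of coin values (so that, as in the quotation, both procedures must agree) and interprets $m(E_v)$ as the counting measure of the set of coins of value v; the function names are chosen for illustration only.

```python
from collections import Counter


def riemann_style_total(coins):
    """Add the values in the order in which the coins come to hand."""
    total = 0
    for value in coins:
        total += value
    return total


def lebesgue_style_total(coins):
    """Group the coins by value first: m(E_v) is the number of coins
    worth v, and the total is the sum of v * m(E_v) over all values v."""
    measure_of_level_set = Counter(coins)
    return sum(value * count for value, count in measure_of_level_set.items())


if __name__ == "__main__":
    purse = [1, 5, 10, 1, 1, 10, 5, 25, 1]
    print(riemann_style_total(purse), lebesgue_style_total(purse))
```

The second function is precisely the integral of a simple function with respect to the counting measure: the values play the role of the $c_j$ and the counts the role of $\mu(B_j)$.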
Calculating a Lebesgue integral we may disregard null sets. For instance, the Dirichlet function δ = χQ , defined by δ(x) = 1 for x ∈ Q and
δ(x) = 0 for x ∈ R \ Q, is not integrable in the sense of Riemann, however,
it is Lebesgue integrable with
\[ \int_{[0,1]} \delta \, d\lambda = \lambda([0,1] \cap \mathbb{Q}) = 0 \]
(since Q is countable and a fortiori a null set). This reflects what we should
expect from the integral of a function which is vanishing almost everywhere.
If a property E holds for all x ∈ A \ B, where A, B are µ-measurable, and if
B is a null set, that is µ(B) = 0, then E holds for almost all x ∈ A and E
is true on A almost everywhere. If µ is a probability measure, we may also
write µ(A) = 1 and the event E can be identified with A. This makes the
Lebesgue integral a powerful tool!
Another important feature in the above construction is the σ-additivity
of the underlying measure, which allows properties such as measurability
and integrability to be passed on from sequences of functions to their limits! This leads to
the famous convergence theorems due to Lebesgue and his contemporaries.
Here is Lebesgue’s dominated convergence theorem:
Theorem 2.1. Let (gn ) be a sequence of measurable functions on a measure
space (X, µ). Assume that limn→∞ gn (x) exists for almost all x ∈ X and
is measurable, and that there exists an integrable function g ≥ 0 such that
|gn (x)| ≤ g(x) for almost all x and any n. Then
\[ \lim_{n\to\infty} \int_X g_n \, d\mu = \int_X \lim_{n\to\infty} g_n \, d\mu. \]
Thus, we may interchange limit and integration under quite weak conditions.
Here only pointwise convergence of the sequence of the gn is needed, not
the more restrictive uniform convergence which is needed for the Riemann
integral.
The monotone convergence theorem strengthens the theorem on dominated convergence:
Theorem 2.2. If $(g_n)$ is an almost everywhere increasing sequence of real-valued, non-negative, measurable functions on X, and if $g_n$ converges to g
pointwise almost everywhere, then
\[ \lim_{n\to\infty} \int_X g_n \, d\mu = \int_X g \, d\mu. \]
We conclude with introducing a vector space structure. For 1 ≤ p < +∞
we denote the vector space of all measurable functions f : X → C with
semi-norm
\[ \|f\|_p := \Big( \int_X |f|^p \, d\mu \Big)^{1/p} < +\infty \]
by $\mathcal{L}^p(X, \mathcal{F}, \mu)$. Taking the quotient with respect to the equivalence relation
\[ f \sim g \;:\iff\; \{x \in X : f(x) \ne g(x)\} \text{ is a null set}, \]
we obtain the normed quotient vector space
\[ L^p(X, \mathcal{F}, \mu) = \mathcal{L}^p(X, \mathcal{F}, \mu) / \sim \]
or, for short, $L^p$. Here two functions are identified if their values differ only on
a set of measure zero, and the norm is defined as continuation of $\|\cdot\|_p$. The
famous theorem of Riesz & Fischer states that the spaces $L^p$ are complete;
the special case p = +∞ does not play any role in the sequel.
* * *
In the first chapter we have found characterisations for uniformly distributed sequences, e.g., the sequence of numbers $\mathbb{N} \ni n \mapsto x_n := n\xi$ with
irrational ξ. In the following we want to investigate mappings T : X → X
defined on certain sets X in order to study the dynamics of the iteration of
T . In our approach the concept of measure plays a central role.
CHAPTER 3
Measure Invariance and Ergodicity
We shall consider mappings T : X → X defined on certain sets X. Our
aim is to understand the dynamics of iterations of T . For this purpose we
may assume that the transformation T respects the structure of X: if X
is a topological space, we may suppose T to be continuous; if X obeys a
differentiable structure, we want T to be a diffeomorphism. In the sequel
we shall often work in probability spaces, hence we may suppose that T is
measurable.
3.1. Measure Preserving Transformations
Given a measure space (X, F, µ), a transformation T : X → X is
said to be measurable (or more precisely µ-measurable), if $T^{-1}A := \{x : T(x) \in A\} \in \mathcal{F}$ for all A ∈ F. Any such mapping T is said to be invertible, if
$TA := \{T(x) : x \in A\} \in \mathcal{F}$ for all A ∈ F and TX = X. A measurable
mapping T is said to be measure preserving with respect to µ, if
\[ \mu(T^{-1}A) = \mu(A) \quad\text{for all } A \in \mathcal{F}; \]
that means the measure of a set always equals the measure of its preimage.
If T is additionally invertible, the latter property is equivalent to µ(T A) =
µ(A). If T is measure preserving, then (X, F, µ, T ) is called a dynamical
system. From a measure theoretical point of view one may also say that ’µ
is T -invariant’ rather than ’T is µ-measure preserving’.
Given a mapping T as above and x ∈ X, define
\[ T^0(x) = x, \quad T^1(x) = T(x) \quad\text{and}\quad T^{n+1}(x) = T(T^n(x)) \quad\text{for } n \in \mathbb{N}; \]
however, we shall use the abbreviation $T^n x$ in place of $T^n(x)$. The orbit
or trajectory of x under T is defined as the set $\{T^n x : n \in \mathbb{N}_0\}$. The
orbit encodes important information about the point x and the map T,
respectively. In case of invertible maps it makes also sense to consider the
past, i.e.,
\[ \ldots, T^{-2}x, \; T^{-1}x, \; T^0 x = x, \; Tx, \; T^2 x, \ldots. \]
We may interpret this configuration as a dynamical system with discrete
time. In systems with continuous time one studies flows $\varphi : X \times \mathbb{R} \to X$, $(x, t) \mapsto \varphi(x, t) =: \varphi_t(x)$ with $\varphi_0(x) = x$ for all x ∈ X and $\varphi_s \circ \varphi_t = \varphi_{s+t}$;
however, in the sequel we shall focus on the discrete time setting.
We already know two interesting transformations. In order to embed
these examples into our new language let us take the measure space X =
[0, 1) with the Borel σ-algebra B and the Lebesgue measure λ.
♣ Example 1): The transformation from circle billiards is called circle
rotation (resp. translation) and is for fixed θ ∈ (0, 1) defined by
\[ R_\theta : \mathbb{T} \to \mathbb{T}, \quad x \mapsto x + \theta. \]
(Obviously, we could also define $R_\theta(x) = \{x + \theta\} = x + \theta \bmod 1$ on [0, 1).)
The projection of the sequence n ↦ nξ onto the circle group T is a circle
rotation: for $x_n = n\xi$ we have $R_\xi^n(0) = x_n$ modulo one. Obviously, $R_\theta$ is measurable with respect
to the Lebesgue measure. In fact, given a subinterval (α, β) of [0, 1), we
have
\[ R_\theta^{-1}(\alpha, \beta) = (\alpha - \theta, \beta - \theta) \quad\text{or}\quad = (1 + \alpha - \theta, 1 + \beta - \theta), \]
according to θ ≤ α or β ≤ θ, and
\[ R_\theta^{-1}(\alpha, \beta) = (0, \beta - \theta) \cup (1 + \alpha - \theta, 1), \]
if α < θ ≤ β. This shows also that $R_\theta$ is measure preserving with respect to
λ since in both cases
\[ \lambda(R_\theta^{-1}(\alpha, \beta)) = \beta - \alpha = \lambda((\alpha, \beta)). \]
In our reasoning we are allowed to restrict to intervals only since the Borel
σ-algebra is generated by the open subsets of X = [0, 1). This simplification
is based on the notion of a monotonic class C consisting of all finite disjoint
unions of elements of an algebra A. If additionally F is a σ-algebra generated
by C and (X, F, µ) a measure space, then for any A ∈ F and any ε > 0 there
exists a set B ∈ C such that µ(A∆B) < ε; hence, B approximates the given
set A as closely as we want. On account of this approximation, properties
such as measurability and measure invariance can be transported from C to the
completion F with respect to µ. This is known as the theorem of Hahn-Kolmogorov; we refer the interested reader to [37] and [148] for details.
♣ Example 2): The transformation of the Gelfand problem is given by
\[ T : [0, 1) \to [0, 1), \quad x \mapsto 2x \bmod 1 = \begin{cases} 2x & \text{if } 0 \le x < \tfrac12, \\ 2x - 1 & \text{if } \tfrac12 \le x < 1 \end{cases} \]
(in some literature this is also called the doubling-map). Given a subinterval
(α, β) in [0, 1), we have
\[ T^{-1}(\alpha, \beta) = \big(\tfrac{\alpha}{2}, \tfrac{\beta}{2}\big) \cup \big(\tfrac{\alpha+1}{2}, \tfrac{\beta+1}{2}\big), \]
which obviously is an element of B; hence, T is Lebesgue measurable. The
union on the right is disjoint (since α + 1 ≥ β) and, moreover,
\[ \lambda(T^{-1}(\alpha, \beta)) = \beta - \alpha = \lambda((\alpha, \beta)). \]
Thus, T is measure preserving with respect to the Lebesgue measure. The
cautious reader might have been surprised about the definition of measure
preserving where it is requested that both $T^{-1}A$ and A have the same
measure (and not TA and A). The doubling-map provides an example why
this is a good concept since it is measure preserving as shown above, but
not invertible, and in general
\[ \lambda(T(\alpha, \beta)) \ne \beta - \alpha \tag{3.1} \]
(for instance, T maps the interval (0, 1/2) onto (0, 1)).
Although this example is simple, iterations of this mapping yield the binary
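expansion for numbers from the unit interval [0, 1). Given x ∈ [0, 1), let
\[ b_1 = b_1(x) = \begin{cases} 0 & \text{if } 0 \le x < \tfrac12, \\ 1 & \text{if } \tfrac12 \le x < 1. \end{cases} \]
Then $Tx = 2x - b_1(x)$. Writing
\[ b_n = b_n(x) = b_1(T^{n-1}x) \quad\text{for } n \in \mathbb{N}, \]
we thus find
\[ x = \tfrac12 (b_1 + Tx) \quad\text{and}\quad Tx = \tfrac12 (b_2 + T^2 x), \]
resp. by induction
\[ x = \frac{b_1}{2} + \frac{b_2}{2^2} + \ldots + \frac{b_n}{2^n} + \frac{T^n x}{2^n} \quad\text{for } n \in \mathbb{N}. \]
Since $0 \le T^n x < 1$ the remainder term $T^n x / 2^n$ converges to zero as n → ∞. Hence,
we obtain the binary expansion
\[ x = \sum_{n=1}^{\infty} \frac{b_n}{2^n}. \]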
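The digit algorithm hidden in this derivation can be tried out directly. The following sketch uses exact rational arithmetic (`fractions.Fraction`) so that no rounding occurs; the names `doubling_map` and `binary_digits` are illustrative and not taken from the text.

```python
from fractions import Fraction


def doubling_map(x):
    """T x = 2x mod 1 on [0, 1)."""
    y = 2 * x
    return y - 1 if y >= 1 else y


def binary_digits(x, n):
    """Return the digits b_1, ..., b_n of x in [0, 1) together with T^n x,
    using b_k = b_1(T^(k-1) x), where b_1(x) = 1 exactly if x >= 1/2."""
    digits = []
    for _ in range(n):
        digits.append(1 if x >= Fraction(1, 2) else 0)
        x = doubling_map(x)
    return digits, x


if __name__ == "__main__":
    print(binary_digits(Fraction(5, 8), 5))
    print(binary_digits(Fraction(1, 3), 4))
```

For x = 5/8 the digits terminate (5/8 = 1/2 + 1/8), whereas for x = 1/3 the orbit is periodic and so is the digit sequence. In both cases the finite identity with the remainder $T^n x / 2^n$ can be checked exactly.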
♣ Example 3): For the same measure space as in the preceding example
let $G = \tfrac12(\sqrt{5} + 1)$ be the golden section and define $T_G : X \to X$ by
\[ T_G x = Gx \bmod 1 = \begin{cases} Gx & \text{if } 0 \le x < \tfrac1G, \\ Gx - 1 & \text{if } \tfrac1G \le x < 1. \end{cases} \]
Actually, $T_G$ is not measure preserving with respect to the Lebesgue measure; however, it is measure preserving with respect to µ defined by
\[ \mu(A) = \int_A g(x) \, dx \quad\text{with}\quad g(x) = \begin{cases} \dfrac{1+2G}{2+G} & \text{if } 0 \le x < \tfrac1G, \\[2mm] \dfrac{1+G}{2+G} & \text{if } \tfrac1G \le x < 1. \end{cases} \]
The iterations $T_G^n x$ provide the so-called G-expansion of x ∈ [0, 1), that is
\[ x = \sum_{n=1}^{\infty} \frac{c_n}{G^n} \]
with $c_n \in \{0, 1\}$ and $c_n c_{n+1} = 0$ for all n ∈ N.
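The G-expansion can be generated in the same way as the binary expansion. The sketch below assumes that the digit $c_n$ is the integer part of $G \cdot T_G^{n-1} x$ (the analogue of $b_n$ for the doubling-map) and works in floating point arithmetic, so it is only meaningful for a moderate number of digits.

```python
import math

G = (1 + math.sqrt(5)) / 2  # golden section


def g_expansion(x, n):
    """Return the first n digits c_1, ..., c_n of the G-expansion of x in [0, 1)."""
    digits = []
    for _ in range(n):
        y = G * x
        c = 1 if y >= 1.0 else 0  # integer part of G*x; it is 0 or 1 because G*x < G < 2
        digits.append(c)
        x = y - c                 # x is replaced by T_G x = G*x mod 1
    return digits


if __name__ == "__main__":
    digits = g_expansion(0.3, 30)
    approx = sum(c / G ** (k + 1) for k, c in enumerate(digits))
    print(digits)
    print(approx)
```

The condition $c_n c_{n+1} = 0$ is visible in the code: after a digit 1 the new point is $Gx - 1 < G - 1 = 1/G$, so the next digit must be 0.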
♣ Example 4): Next we consider a two-dimensional generalization of the
Gelfand-mapping, also known as 'baker's transformation'. Consider $X = [0, 1)^2$ equipped with the product σ-algebra B × B and the product Lebesgue
measure λ × λ. Then the map is defined by
\[ b : [0, 1)^2 \to [0, 1)^2, \quad (x, y) \mapsto b(x, y) = \begin{cases} (2x, \tfrac{y}{2}) & \text{if } 0 \le x < \tfrac12, \\ (2x - 1, \tfrac{y+1}{2}) & \text{otherwise.} \end{cases} \]
Figure 1. The 'baker's transformation' in action; it bears its
name from the process with which a baker is mixing water
and flour for a dough. It looks like flaky pastry.
These graphics were created with Maple-notebooks from Choe [31]; points
$(x_j, b(x_j))$ from a sufficiently large set of uniformly distributed $x_j$ are taken
for an approximation to the graph of b.∗
The baker's transformation b is measurable, invertible, and measure preserving with respect to λ × λ.
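Invertibility can be made explicit by writing down the inverse map. The sketch below is one way to do this; the formula in `baker_inverse` is derived here from the definition of b (it does not appear in the text) and is checked on a grid of rational points with exact arithmetic.

```python
from fractions import Fraction


def baker(x, y):
    """The baker's transformation b on [0, 1)^2."""
    if x < Fraction(1, 2):
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2


def baker_inverse(u, v):
    """Inverse of b: the lower half of the square (v < 1/2) is the image
    of the left half, the upper half is the image of the right half."""
    if v < Fraction(1, 2):
        return u / 2, 2 * v
    return (u + 1) / 2, 2 * v - 1


if __name__ == "__main__":
    point = (Fraction(3, 7), Fraction(2, 5))
    image = baker(*point)
    print(image, baker_inverse(*image))
```

Each branch of b stretches by the factor 2 in the x-direction and shrinks by the factor 1/2 in the y-direction, which is consistent with the invariance of λ × λ.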
Figure 2. The first iterations $b, b^2, b^3$ of the baker's transformation.
♣ Example 5): The so-called logistic transformation
\[ \ell : [0, 1] \to [0, 1], \quad x \mapsto 4x(1 - x) \]
is measurable and measure preserving with respect to
\[ \mu(A) = \frac{1}{\pi} \int_A \frac{dx}{\sqrt{x(1 - x)}}. \]
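A numerical plausibility check of this invariance is sketched below. It relies on the closed form $\mu([0, t]) = \frac{2}{\pi} \arcsin \sqrt{t}$, obtained from the substitution $x = \sin^2 \varphi$ (a computation added here, not taken from the text), draws samples from µ by inverting this distribution function, and compares the frequencies of the events x ≤ t and ℓ(x) ≤ t. All function names are illustrative.

```python
import math
import random


def logistic(x):
    return 4.0 * x * (1.0 - x)


def mu_cdf(t):
    """mu([0, t]) = (2/pi) * arcsin(sqrt(t)) for the density 1/(pi*sqrt(x(1-x)))."""
    return 2.0 / math.pi * math.asin(math.sqrt(t))


def sample_mu(rng):
    """Draw a mu-distributed point by inverting the distribution function."""
    return math.sin(math.pi * rng.random() / 2.0) ** 2


def empirical_check(t, num_samples=100_000, seed=0):
    """Relative frequencies of {x <= t} and {l(x) <= t} for mu-distributed x."""
    rng = random.Random(seed)
    before = after = 0
    for _ in range(num_samples):
        x = sample_mu(rng)
        before += x <= t
        after += logistic(x) <= t
    return before / num_samples, after / num_samples


if __name__ == "__main__":
    print(mu_cdf(0.25), empirical_check(0.25))
```

If µ is ℓ-invariant, both frequencies should be close to $\mu([0, t])$; for t = 1/4 this value equals 1/3.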
This density plays a prominent role in the Sato–Tate–conjecture on the
distribution of group orders of elliptic curves by reduction modulo primes
which was recently proved by Taylor [136]. In fact, it is the uniform
distribution on the conjugacy classes of the special unitary group $SU_2(\mathbb{C})$
with respect to the Haar measure. In a similar way Deligne's famous proof
of the Weil conjectures [38] shows a uniform distribution of the Frobenius
conjugacy classes.
Figure 3. The logistic transformation: to the left the graph
y = 4x(1 − x), and its density to the right.
∗ This is in the spirit of the French impressionist painter Georges Seurat and his
contemporaries.
♣ Example 6): Identifying the circle group T with the unit interval [0, 1)
modulo one, we get for $\mathbb{T}^2 = \mathbb{T} \times \mathbb{T}$ the unit square $[0, 1)^2$ where opposite
sides are identified, hence $\mathbb{T}^2$ is a two-dimensional torus (or a doughnut in
terms of a baker). The mapping
\[ A : \mathbb{T}^2 \to \mathbb{T}^2, \quad \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \bmod 1 \]
is invertible (since the corresponding integer matrix has determinant 1)
and, as a short computation shows, is measure preserving with respect to the
two-dimensional Lebesgue measure. This map A is also called "Arnold's
cat map" in honour of V.I. Arnold.† The mapping A is an example of a
so-called toral automorphism.
Figure 4. How the cat map maps...
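On a digital picture the cat map acts on finitely many pixels, which is why the original picture must return. The sketch below assumes an n × n pixel grid with n ≥ 2, on which A acts by the same matrix modulo n; the helper `return_time` is a hypothetical name for the smallest number of iterations after which every pixel is back in place.

```python
def cat_map(point, n):
    """One step of the cat map on the n x n pixel grid: (x, y) -> (2x + y, x + y) mod n."""
    x, y = point
    return ((2 * x + y) % n, (x + y) % n)


def return_time(n):
    """Smallest k >= 1 such that k iterations of the cat map fix every pixel (n >= 2).

    The map is linear modulo n, so it suffices to follow the two basis vectors;
    the loop terminates because the matrix is invertible modulo n."""
    e1, e2 = (1, 0), (0, 1)
    p, q = cat_map(e1, n), cat_map(e2, n)
    k = 1
    while (p, q) != (e1, e2):
        p, q = cat_map(p, n), cat_map(q, n)
        k += 1
    return k


if __name__ == "__main__":
    for n in (2, 3, 5, 100):
        print(n, return_time(n))
```

For instance, on a 5 × 5 grid the picture returns after 10 iterations.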
Figure 5. Iterations of cat Felix under the "Arnold cat
map": $A^0, A^1, A^2$ from left to right.
† For the origin of this name see his monograph [6].
We conclude with a last example. The famous 3X + 1-problem (also
known as Collatz- or Syracuse-problem) is based on the following iteration
on the set of positive integers:
\[ x \mapsto Tx = \begin{cases} x/2 & \text{if } x \text{ even}, \\ 3x + 1 & \text{if } x \text{ odd}. \end{cases} \]
For instance,
\[ \ldots \mapsto 12 \mapsto 6 \mapsto 3 \mapsto 10 \mapsto 5 \mapsto 16 \mapsto 8 \mapsto 4 \mapsto 2 \mapsto 1 \mapsto \ldots, \]
hence, the orbit of x = 12 is eventually periodic. It is conjectured that
this iteration is eventually periodic with period 4 ↦ 2 ↦ 1, independent of
the initial value x. A weaker conjecture claims that there are no divergent
trajectories of this iteration. The mapping T is definitely not injective. This
example illustrates that it sometimes makes sense to study the past of an
iteration: what is the preimage of 1 under this iteration?
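Both the future and the past of this iteration are easy to explore by machine. In the following sketch the function names and the cut-off `max_steps` are illustrative choices; the orbit is followed only until it first reaches 1.

```python
def collatz_step(x):
    """T x = x/2 for even x and 3x + 1 for odd x."""
    return x // 2 if x % 2 == 0 else 3 * x + 1


def collatz_orbit(x, max_steps=1000):
    """Orbit of x up to the first visit of 1 (or until max_steps is exceeded)."""
    orbit = [x]
    while x != 1 and len(orbit) <= max_steps:
        x = collatz_step(x)
        orbit.append(x)
    return orbit


def preimages(y):
    """All positive integers x with T x = y."""
    result = [2 * y]              # the even preimage always exists
    if y % 3 == 1 and ((y - 1) // 3) % 2 == 1:
        result.append((y - 1) // 3)   # an odd x with 3x + 1 = y
    return result


if __name__ == "__main__":
    print(collatz_orbit(12))
    print(preimages(1), preimages(4), preimages(16))
```

In particular, the only preimage of 1 is 2, whereas 4 has the two preimages 8 and 1, which shows again that T is not injective.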
Actually, there is an interesting ergodic approach to this open problem.
Matthews & Watts [102] showed that T is measure preserving on the set
$\mathbb{Z}_2$ of 2-adic integers equipped with the corresponding Haar measure, and,
using Birkhoff's ergodic theorem, that the iterations $T^n x$ are uniformly
distributed modulo $2^k$ for any k ∈ N and almost all $x \in \mathbb{Z}_2$. Unfortunately, this result is beyond the scope of this course; however, the interested
reader can find more information in the survey of Lagarias [93] and in
Wirsching's book [153]. It is somewhat surprising that a problem as easy to
formulate as the 3X + 1-problem seems to be so difficult to solve.‡
Further examples of measure preserving transformations can be found
in [31]; for the case of Bernoulli-shifts we refer to [37].
Next we shall give a criterion for measure preservation in analogy to
Weyl's Theorem 1.3 on uniform distribution modulo one:
Theorem 3.1. A transformation T : X → X is measure preserving with
respect to µ if, and only if, for all µ-integrable functions f : X → C,
\[ \int_X f \, d\mu = \int_X f \circ T \, d\mu. \tag{3.2} \]
In the formula giving the equivalent for measure invariance one may understand T as the time evolution of the dynamical system, f as the outcome of
a physical experiment, and the integral as the expected value of the outcome
of f; then the invariance of the measure µ is nothing but the statement that the expected
outcome is the same now and one time unit later.
In the case of metric spaces it suffices to prove the condition only for
continuous functions f. One implication then follows from the proof below,
the converse one from the representation theorems of Hahn-Banach and
Riesz (see [121]).
‡ In the 1960s Kakutani became interested in this problem; he is reported to have said: "For
about a month everybody at Yale (University) worked on it, with no result. A similar
phenomenon happened when I mentioned it at the University of Chicago. A joke was
made that this problem was part of a conspiracy to slow down mathematical research in
the U.S." (cf. [93])
Proof. Assume (3.2) holds. Let A be a measurable set and denote by $\chi_A$
its (measurable) indicator function. Then,
\[ \mu(A) = \int_X \chi_A \, d\mu = \int_X \chi_A \circ T \, d\mu = \int_X \chi_{T^{-1}A} \, d\mu = \mu(T^{-1}A). \]
Hence, T is measure preserving.
Now assume that T is measure preserving. Then (3.2) holds in particular
for indicator functions and, consequently, for all simple functions too. Now
suppose that f ≥ 0 and $(f_n)$ is a convergent sequence of measurable simple
functions with limit f. Then $\lim_{n\to\infty} f_n \circ T = f \circ T$. Applying Lebesgue's
theorem on dominated convergence, Theorem 2.1, with $g_n = f_n \circ T$ and
$g_n = f_n$ as well, we find
\[ \int f \circ T \, d\mu = \lim_{n\to\infty} \int f_n \circ T \, d\mu = \lim_{n\to\infty} \int f_n \, d\mu = \int f \, d\mu, \]
where we have used (3.2) in the last but one step for simple functions. By
the decomposition (2.2), we deduce the statement for arbitrary real-valued
functions f; complex-valued f can be treated by separating into real and
imaginary part (in the same manner as in the proof of Theorem 1.4). •
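♣ Example 7): Let T : R → R be defined by T(0) = 0 and
\[ Tx = \tfrac12 \Big( x - \frac{1}{x} \Big) \quad\text{for } x \ne 0. \]
Then
\[ T^{-1}(\alpha, \beta) = \big(\alpha - \sqrt{\alpha^2 + 1}, \; \beta - \sqrt{\beta^2 + 1}\big) \cup \big(\alpha + \sqrt{\alpha^2 + 1}, \; \beta + \sqrt{\beta^2 + 1}\big), \]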
hence, T is measurable. For any Lebesgue integrable function f, we find
via the substitution τ = Tx, $d\tau = \tfrac12 (1 + \tfrac{1}{x^2}) \, dx$ that
\[ \int_{-\infty}^{+\infty} f(Tx) \, \frac{dx}{1 + x^2} = \int_{-\infty}^{+\infty} f(\tau) \, \frac{d\tau}{1 + \tau^2}. \]
Thus, Theorem 3.1 implies that T is measure preserving with respect to the
probability measure P defined by
\[ P((\alpha, \beta)) = \frac{1}{\pi} \int_\alpha^\beta \frac{d\tau}{1 + \tau^2}. \tag{3.3} \]
Alternatively, one may use the addition theorem
\[ \arctan\big(x + \sqrt{x^2 + 1}\big) + \arctan\big(x - \sqrt{x^2 + 1}\big) = \arctan(x). \]
Actually, the transformation T originates from Newton's iteration applied
to the function $f(x) = x^2 + 1$. Here, Newton's iteration translates as
follows:
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \quad\leftrightarrow\quad Tx = x - \frac{x^2 + 1}{2x} = \tfrac12 \Big( x - \frac{1}{x} \Big). \]
If f had a real zero, the sequence of the numbers $x_n$ would
converge; however, since f(x) ≠ 0 for real x, the iteration diverges and
provides an interesting random transformation (which we shall meet again
in Chapter 6). This example is due to Lind (cf. [31]).
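The invariance of the measure (3.3) can be checked numerically in two ways, as sketched below: exactly, by inserting the preimage formula for $T^{-1}(\alpha, \beta)$ into P, and empirically, by applying T to P-distributed random points. The sampling formula $x = \tan(\pi(u - \tfrac12))$ with uniformly distributed u is the standard inversion of the distribution function of P; all function names are illustrative.

```python
import math
import random


def newton_map(x):
    """T x = (x - 1/x)/2 for x != 0 and T 0 = 0."""
    return 0.5 * (x - 1.0 / x) if x != 0.0 else 0.0


def prob(alpha, beta):
    """P((alpha, beta)) = (arctan(beta) - arctan(alpha)) / pi, cf. (3.3)."""
    return (math.atan(beta) - math.atan(alpha)) / math.pi


def prob_of_preimage(alpha, beta):
    """P(T^{-1}(alpha, beta)), using the two intervals of the preimage formula."""
    ra, rb = math.sqrt(alpha * alpha + 1.0), math.sqrt(beta * beta + 1.0)
    return prob(alpha - ra, beta - rb) + prob(alpha + ra, beta + rb)


def empirical_prob_of_image(alpha, beta, num_samples=100_000, seed=0):
    """Relative frequency of alpha < T x < beta for P-distributed samples x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        x = math.tan(math.pi * (rng.random() - 0.5))  # P-distributed sample
        if alpha < newton_map(x) < beta:
            hits += 1
    return hits / num_samples


if __name__ == "__main__":
    print(prob(0.0, 1.0), prob_of_preimage(0.0, 1.0), empirical_prob_of_image(0.0, 1.0))
```

For the interval (0, 1) all three values should be close to 1/4.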
3.2. Ergodicity and Mixing
Now we consider a probability space (X, F, µ). A measure preserving
transformation T : X → X is said to be ergodic with respect to µ if for
any measurable set A with $T^{-1}A = A$ either µ(A) = 0 or µ(A) = 1 holds.
In this case (X, F, µ, T ) is called an ergodic dynamical system. Ergodicity
thus means that any measurable T -invariant set is either a null set or has
full measure.∗
Theorem 3.2. The following statements are equivalent:
(i) T is ergodic;
(ii) µ(B) = 0 or = 1 for all B ∈ F with $\mu(T^{-1}B \Delta B) = 0$;
(iii) $\mu\big(\bigcup_n T^{-n}A\big) = 1$ for all A ∈ F with µ(A) > 0;
(iv) for any A, B ∈ F with µ(A) > 0 and µ(B) > 0, there exists some n ∈ N such that $\mu(T^{-n}A \cap B) > 0$.
If T is invertible, we can replace $T^{-n}$ by $T^n$ in these conditions for ergodicity.
We want to give a few remarks. Condition (iii) claims that whenever A has
positive measure almost every x ∈ X eventually will visit A under T (even infinitely
often), whereas Condition (iv) shows that any element of B will almost
surely visit A under T provided B has positive measure.
Proof. (i) ⇒ (ii): We assume that B is measurable with $\mu(T^{-1}B \Delta B) = 0$
and that T is ergodic. We denote the limit superior by
\[ C := \bigcap_{m=0}^{\infty} \bigcup_{n=m}^{\infty} T^{-n}B. \]
For $m \in \mathbb{N}_0$, we have
\[ B \,\Delta \bigcup_{n=m}^{\infty} T^{-n}B \subset \bigcup_{n=m}^{\infty} B \,\Delta\, T^{-n}B. \]
Now
\[ B \,\Delta\, T^{-n}B \subset \bigcup_{k=0}^{n-1} T^{-k}B \,\Delta\, T^{-(k+1)}B \]
and since, by assumption and since T is measure preserving, the set on the right-hand side has measure zero,
it follows that $\mu(B \Delta T^{-n}B) = 0$ for any n ∈ N. Now let
\[ C_m = \bigcup_{n=m}^{\infty} T^{-n}B, \]
hence, the $C_m$ are nested one in another:
\[ C_0 \supset C_1 \supset C_2 \supset \ldots. \]
Moreover, $\mu(C_m) = \mu(B)$ for any $m \in \mathbb{N}_0$. It thus follows that µ(C∆B) = 0
and µ(C) = µ(B), respectively. Furthermore, we have
\[ T^{-1}C = \bigcap_{m=0}^{\infty} \bigcup_{n=m}^{\infty} T^{-(n+1)}B = \bigcap_{m=0}^{\infty} \bigcup_{n=m+1}^{\infty} T^{-n}B = C. \]
By assumption, µ(C) = 0 or µ(C) = 1. In view of our previous observation
it follows that either µ(B) = 0 or µ(B) = 1.
∗ In probability theory many so-called 0–1 laws are known (starting from the work
of Kolmogorov, Borel).
(ii) ⇒ (iii): Now assume that we are given a set A such that µ(A) > 0
and let $B = \bigcup_{n=1}^{\infty} T^{-n}A$. Then
\[ T^{-1}B = \bigcup_{n=2}^{\infty} T^{-n}A \subset B. \]
Since T is measure preserving, it follows that $\mu(T^{-1}B) = \mu(B)$, hence
\[ \mu(B \Delta T^{-1}B) = \mu(B) - \mu(T^{-1}B) = 0. \]
Consequently, µ(B) = 0 or µ(B) = 1. Since $T^{-1}A \subset B$, monotonicity yields
$\mu(B) \ge \mu(T^{-1}A) = \mu(A) > 0$, and it follows that µ(B) = 1.
(iii) ⇒ (iv): Let both A and B be sets of positive measure. By Condition
(iii),
\[ \mu\Big( \bigcup_{n=1}^{\infty} T^{-n}A \Big) = 1, \]
hence
\[ 0 < \mu(B) = \mu\Big( \bigcup_{n=1}^{\infty} B \cap T^{-n}A \Big) \le \sum_{n=1}^{\infty} \mu(B \cap T^{-n}A). \]
In particular, there exists some n with $\mu(B \cap T^{-n}A) > 0$.
(iv) ⇒ (i): Let A be a set with $T^{-1}A = A$. Then
\[ 0 = \mu(A \cap (X \setminus A)) = \mu(T^{-n}A \cap (X \setminus A)) \]
for arbitrary n ≥ 1. It thus follows from Condition (iv) that µ(A) = 0 or
µ(X \ A) = 0, resp. µ(A) = 1 − µ(X \ A) = 1. •
Next we shall prove a criterion for ergodicity relevant for practical purposes:
Theorem 3.3. The following assertions are equivalent:
(i) T is ergodic;
(v) if f is a measurable function such that f(Tx) = f(x) for (almost) all x, then f is constant (almost) everywhere;
(vi) if $f \in L^2(X, \mathcal{F}, \mu)$ with f(Tx) = f(x) for (almost) all x, then f is constant (almost) everywhere.
In Conditions (v) and (vi) we may suppose f(Tx) = f(x) for all or just
for almost all x ∈ X; because of the negligibility of null sets in Lebesgue
integration these statements are equivalent.
Proof. (i) ⇒ (v): Suppose that T is ergodic and f : X → C is measurable
and satisfies f (T x) = f (x) for almost all x. Since this implies the same for
both, the real and the imaginary part of f individually, we may suppose
that f is real-valued. For k ∈ Z and n ∈ N let
\[ A_n^k = \big\{ x \in X : f(x) \in \big[ \tfrac{k}{n}, \tfrac{k+1}{n} \big) \big\}. \]
Then
\[ T^{-1}A_n^k \,\Delta\, A_n^k \subset \{ x \in X : f \circ T(x) \ne f(x) \}. \]
Since the set on the right-hand side has measure zero, Theorem 3.2, (ii),
implies $\mu(A_n^k) \in \{0, 1\}$. For any n the set X is a disjoint union of sets
$A_n^k$, i.e., $X = \bigcup_{k \in \mathbb{Z}} A_n^k$. Thus, there exists a unique integer k(n)
(depending on n) such that $\mu(A_n^{k(n)}) = 1$. Now let
\[ Y = \bigcap_{n=1}^{\infty} A_n^{k(n)}. \]
Then µ(Y ) = 1 and f is constant on Y . Since Y has full measure, f is
constant almost everywhere.
The implication (v) ⇒ (vi) is trivial, so it remains to prove (vi) ⇒ (i):
suppose that $T^{-1}A = A$ with a measurable set A of positive measure; then
we need to show µ(A) = 1. For the indicator function of A we have
$\chi_A \in L^2(X, \mathcal{F}, \mu)$ and $\chi_A \circ T = \chi_{T^{-1}A} = \chi_A$. By assumption, $\chi_A$ is
constant almost everywhere; hence, since µ(A) > 0, $\chi_A(x) = 1$ for almost all x. This implies
µ(A) = 1. The theorem is proved. •
As application we study two examples of measure preserving transformations from the previous section with respect to ergodicity. Both mappings
are defined by use of a periodicity instruction which suggests using Criterion (vi) from above in combination with Fourier analysis. Recall that
any L2 -function can be represented by its Fourier series (as proved, for
example, in [121]).
♣ Example 1): The circle rotation $R_\theta : [0, 1) \to [0, 1)$, x ↦ x + θ mod 1
describes the distribution of the fractional parts of the real number sequence
xn = nθ + β with β = Rθ 0. Corollary 1.5 implies that the sequence (nθ) is
uniformly distributed modulo one if, and only if, θ is irrational. Analogously,
the same statement holds true for shifted sequences (nθ + β) independent of
β. The following theorem shows that this is indeed an ergodic phenomenon:
Theorem 3.4. The circle rotation Rθ is ergodic with respect to the
Lebesgue measure if, and only if, θ is irrational.
Proof. Suppose θ = p/q is rational. Then x ↦ e(qx) = exp(2πiqx) defines a
non-constant Rθ-invariant function:
\[ e(qR_\theta x) = \exp\big(2\pi i q(x + \tfrac{p}{q})\big) = \exp(2\pi i qx)\exp(2\pi i p) = e(qx). \]
In view of Theorem 3.3, Condition (v), it follows that Rθ is not ergodic.
Suppose that θ is irrational. Let
\[ f(x) = \sum_{n\in\mathbb{Z}} c_n e(nx) \tag{3.4} \]
denote the Fourier series of an Rθ-invariant function f ∈ L². Then
\[ f(x) = f(R_\theta x) = f(x + \theta) = \sum_{n\in\mathbb{Z}} c_n e(n\theta)\, e(nx), \]
and, by the uniqueness of the Fourier expansion, $c_n = c_n e(n\theta)$, resp.
\[ c_n (1 - e(n\theta)) = 0 \qquad \text{for } n \in \mathbb{Z}. \]
For n ≠ 0 we have e(nθ) ≠ 1 thanks to the irrationality of θ, hence
$c_n = 0$. Thus $f(x) = c_0$ is constant and Theorem 3.3, Condition (vi), implies
the ergodicity of Rθ. • (For a proof without Fourier theory we refer to
[47].)
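The dichotomy of Theorem 3.4 can be observed numerically. The following is a minimal sketch, not part of the original notes: the helper name `visit_frequency`, the test interval [0, 0.3), the starting point 0 and the number of steps are illustrative assumptions. It counts how often the orbit of 0 under Rθ visits the interval, once for an irrational θ (the golden mean) and once for θ = 1/4.

```python
import math

def visit_frequency(theta, a, b, n_steps, x0=0.0):
    """Fraction of the orbit points x0 + n*theta mod 1, 0 <= n < n_steps, lying in [a, b)."""
    hits = 0
    for n in range(n_steps):
        x = (x0 + n * theta) % 1.0
        if a <= x < b:
            hits += 1
    return hits / n_steps

# Irrational rotation number (golden mean): the frequency should be close
# to the Lebesgue measure 0.3 of the interval [0, 0.3).
golden = (math.sqrt(5.0) - 1.0) / 2.0
freq_irrational = visit_frequency(golden, 0.0, 0.3, 10_000)

# Rational rotation number 1/4: the orbit of 0 is {0, 1/4, 1/2, 3/4},
# and two of these four points lie in [0, 0.3).
freq_rational = visit_frequency(0.25, 0.0, 0.3, 10_000)

print(freq_irrational, freq_rational)
```

For the rational rotation the frequency depends on the starting point and on the arithmetic of θ rather than on the length of the interval, which is the numerical face of the non-ergodicity proved above.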
♣ Example 2): Consider the doubling map T : [0, 1) → [0, 1), x ↦
2x mod 1. As in the previous proof we start with a T-invariant function
f ∈ L² with Fourier series (3.4). Then
\[ f(x) = f(Tx) = \sum_{n\in\mathbb{Z}} c_n e(2nx). \]
Comparing coefficients we find $c_n = c_{2n}$ and, by iteration, $c_n = c_{2^k n}$ for all k ∈ N. The Parseval identity (see [121])
yields
\[ \|f\|_2^2 = \int_0^1 |f(x)|^2\,dx = \sum_{n\in\mathbb{Z}} |c_n|^2 < +\infty, \]
so that $c_m \to 0$ as |m| → ∞. Hence, all $c_n$ with n ≠ 0 vanish, and by Theorem 3.3, (vi), the ergodicity of
T follows. This reasoning can be extended to toral automorphisms:
Theorem 3.5. Let $A \in \mathbb{Z}^{d\times d}$ be a matrix and
\[ T_\varphi : \mathbb{T}^d \to \mathbb{T}^d, \qquad \varphi(x) = Ax \bmod 1 \]
for $x \in \mathbb{T}^d$. Then $T_\varphi$ is ergodic if, and only if, no eigenvalue of A is
a root of unity.
In particular, the mapping x ↦ x mod 1 is not ergodic (which, of course, is
trivial). The proof of the general statement is not much more difficult than
the special case sketched above. We refer to [33, 31] for details.
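For d = 2 the criterion of Theorem 3.5 can be tested in exact integer arithmetic. The sketch below rests on an observation that is not in the notes: a root of unity which is an eigenvalue of a 2×2 integer matrix has order 1, 2, 3, 4 or 6, so its twelfth power equals 1, and det(A¹² − I) is the product of the numbers λ¹² − 1 over the eigenvalues λ. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 2x2 integer matrices given as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def has_root_of_unity_eigenvalue(a):
    """True iff some eigenvalue of the 2x2 integer matrix a is a root of unity.

    Such an eigenvalue has order 1, 2, 3, 4 or 6, hence satisfies
    lambda**12 == 1, and this happens iff det(a**12 - I) == 0.
    """
    power = [[1, 0], [0, 1]]
    for _ in range(12):
        power = mat_mul(power, a)
    m00, m01 = power[0][0] - 1, power[0][1]
    m10, m11 = power[1][0], power[1][1] - 1
    return m00 * m11 - m01 * m10 == 0

cat_map = [[2, 1], [1, 1]]          # eigenvalues (3 ± sqrt(5))/2
quarter_turn = [[0, -1], [1, 0]]    # eigenvalues ± i
print(has_root_of_unity_eigenvalue(cat_map))       # no root of unity
print(has_root_of_unity_eigenvalue(quarter_turn))  # i is a root of unity
```

By Theorem 3.5 the first matrix induces an ergodic map on the torus, whereas the second one does not.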
A close relative of ergodicity is the notion of mixing. A transformation
T is said to be strongly mixing if, for all A, B ∈ F,
\[ \lim_{n\to\infty} \mu(A \cap T^{-n}B) = \mu(A)\mu(B). \]
In contrast, T is called weakly mixing if, for all A, B ∈ F,
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \big|\mu(A \cap T^{-n}B) - \mu(A)\mu(B)\big| = 0. \]
Obviously, the following chain of implications holds:
\[ \text{strongly mixing} \;\Rightarrow\; \text{weakly mixing} \;\Rightarrow\; \text{ergodic}. \]
An example of a strongly mixing process is provided by the Baker transformation b. By contrast, circle rotations Rθ with irrational θ are ergodic
but not strongly mixing. Examples of weakly but not strongly mixing
transformations were given by Kakutani [76].
For the different notions of mixing Halmos [58] found the following
intuitive cocktail example: consider a bowl with 90 percent gin and 10 percent
vermouth. After shaking sufficiently long the two fluids mix to one drink
in which any Borel set should contain the same proportions of gin and
vermouth.
Exercises
Practice makes perfect. Since mathematics is no spectator sport, it is always
best to examine new theory by examples.
Exercise 3.1. Verify all claims made above about i) TG and the G-expansion, and
ii) the Baker transformation.
Next we state inverse problems: how can ergodicity be discovered, and is the measure
associated with an ergodic transformation unique? These are both difficult questions
as one may experience by the following
Exercise 3.2. Define the transformation T by T0 = 0 and Tx = {1/x} for x ∈ (0, 1).
Try to find a measure µ on [0, 1) such that T is measure preserving with respect to
µ. Moreover, prove that the Lebesgue measure is the only probability measure for which
the circle rotation Rθ with irrational θ is ergodic. Hint for the second task: use the circle group
characters x ↦ e(mx), m ∈ Z.
Given a transformation T and a σ-algebra, there may be many ergodic measures
with respect to T . If there is only one ergodic measure, then T is said to be uniquely
ergodic.
The next exercises are also good for getting experience with all the new notions
and techniques:
Exercise 3.3. Let (X, F , µ) be a measure space and T : X → X a measurable
mapping. Show that all T -invariant sets constitute a σ-algebra.
Exercise 3.4. Let m > 1 be a positive integer and denote by X = Z/mZ the ring
of residue classes modulo m. Further, put F = P(X) and denote by µ the uniform
distribution on X. Finally, for b ∈ {1, 2, . . . , m} put
\[ T_b : X \to X, \qquad x \mapsto x + b \bmod m. \]
Prove that i) Tb is measure preserving, and ii) (X, F , µ, Tb ) is ergodic if, and only
if, b and m are coprime.
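Before attempting a proof of Exercise 3.4 one may test the claim by brute force for small m. The following sketch is only an experiment under illustrative assumptions (the function name `is_ergodic` and the range of m are not from the notes): since µ is the uniform distribution, T_b is ergodic exactly when no subset other than ∅ and X satisfies $T_b^{-1}S = S$.

```python
from itertools import combinations
from math import gcd

def is_ergodic(m, b):
    """Brute-force test for T_b(x) = x + b mod m with the uniform measure.

    T_b is ergodic iff the only subsets S with T_b^{-1} S = S are the
    empty set and the whole space Z/mZ.
    """
    points = range(m)
    for size in range(1, m):                 # proper non-empty subsets only
        for subset in combinations(points, size):
            s = set(subset)
            preimage = {x for x in points if (x + b) % m in s}
            if preimage == s:                # a non-trivial invariant set
                return False
    return True

for m in range(2, 9):
    coprime = [b for b in range(1, m + 1) if gcd(b, m) == 1]
    ergodic = [b for b in range(1, m + 1) if is_ergodic(m, b)]
    print(m, coprime == ergodic)
```

Such a check proves nothing for general m, but it shows where invariant sets come from when b and m have a common factor.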
Exercise 3.5. Prove all statements on mixing and ergodicity and their hierarchy.
* * *
Our next aim is to establish ergodic theorems. Birkhoff wrote in [21]:
“What the Ergodic Theorem means, roughly speaking, is
that for a discrete measure-preserving transformation or a
measure-preserving flow of a finite volume, probabilities and
weighted means tend towards limits when we start from a definite state P (not belonging to a possible exceptional set of
measure 0), and, furthermore, the limiting value is the same
in both directions.”
Following Billingsley [17], here is an easy probabilistic proof of a special
case of the ergodic theorem: if T is mixing and A is a measurable set,
then $\frac{1}{N}\sum_{0\le n<N} \chi_A(T^n x)$ converges in probability to the expectation of the
indicator function $\chi_A$, that is $E\chi_A = P(A)$. In fact, if
\[ c_{mn} := E\big[(\chi_A(T^m x) - P(A))(\chi_A(T^n x) - P(A))\big] = P(T^{-m}A \cap T^{-n}A) - P(A)^2, \]
then, since T preserves measure, we find $c_{mn} = \rho_{|n-m|}$ with $\rho_k := P(A \cap T^{-k}A) - P(A)^2$, which tends to zero as k → ∞ (because of the mixing
property). Thus
\[ E\Big\{\Big(\frac{1}{N}\sum_{0\le n<N} \chi_A(T^n x) - P(A)\Big)^2\Big\} = \frac{1}{N^2}\sum_{0\le m,n<N} c_{mn} = \frac{1}{N}\rho_0 + \frac{2}{N^2}\sum_{0<m<N} (N-m)\rho_m, \]
which tends to zero as N → ∞ by the theorem on arithmetic means of
convergent sequences. It thus follows from Chebyshev’s inequality that
$\frac{1}{N}\sum_{0\le n<N} \chi_A(T^n x)$ converges in probability to P(A). In the following
chapter we shall prove a much stronger version...
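The counting step in the variance computation, namely that the pairs (m, n) with |n − m| = k contribute N times for k = 0 and 2(N − k) times for 0 < k < N, can be checked in exact arithmetic. The sketch below uses a purely hypothetical correlation sequence `rho` tending to zero; the function names are illustrative as well.

```python
from fractions import Fraction

def rho(k):
    """Hypothetical correlation sequence with rho(k) -> 0 (an assumption)."""
    return Fraction(1, 4) * Fraction(1, 2) ** k

def double_sum(n_terms):
    """Sum of rho(|n - m|) over all pairs 0 <= m, n < n_terms."""
    return sum(rho(abs(n - m)) for m in range(n_terms) for n in range(n_terms))

def closed_form(n_terms):
    """n_terms * rho(0) + 2 * sum over 0 < k < n_terms of (n_terms - k) * rho(k)."""
    return n_terms * rho(0) + 2 * sum((n_terms - k) * rho(k)
                                      for k in range(1, n_terms))

def variance(n_terms):
    """Second moment of the centred time average, by the closed form."""
    return closed_form(n_terms) / n_terms ** 2

print(all(double_sum(n) == closed_form(n) for n in range(1, 30)))
print(float(variance(10)), float(variance(100)), float(variance(400)))
```

The printed variances decrease roughly like a constant times 1/N, which is what Chebyshev's inequality needs.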
CHAPTER 4
Classical Ergodic Theorems
In statistical mechanics one studies a large number of particles whose
positions and momenta are governed by Hamilton’s equations for a given
Hamiltonian. The trajectories of these particles can be considered as a
flow in phase space. It was Boltzmann’s idea to study the flow rather
than a single particle. In his ergodicity hypothesis from 1871 he claims
that the average amount of time any given orbit spends in some set exactly
equals the measure of this set. This statement implies an equivalence of
the mean along a trajectory (Greek: odos) of the system and
the mean over all possible states of equal energy (Greek: ergon). In 1879
Maxwell claimed that any system in any state, sooner or later, will move
to any state possible with respect to the physical side conditions. It was
Poincaré who discovered in 1890 that it is too restrictive to demand that
the trajectory visits every point in the phase space which is compatible with the
physical side conditions; hence, this restrictive ergodic hypothesis is false.∗
Actually, Poincaré formulated a weak ergodic hypothesis which states that
the trajectory comes as close to any point of the phase space as we want
(however, the trajectory does not need to visit this target point). The
ergodic theorems below yield a justification of this weak ergodic hypothesis;
hence, they might be interpreted as a mathematical foundation of statistical
mechanics.†
4.1. The Mean Ergodic Theorem of von Neumann
The first ergodic theorem was found by John von Neumann [105]
(although his result was published one year after Birkhoff’s theorem which
will be discussed in the next section).
Theorem 4.1. Let (X, F, µ) be a probability space and T : X → X measure
preserving. Then, for f, g ∈ L²(X, F, µ), the limit of
\[ \frac{1}{N} \sum_{0\le n<N} \int_X f(T^n x)\,g(x)\,d\mu(x) \]
exists as N → ∞; if T is ergodic, then
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \int_X f(T^n x)\,g(x)\,d\mu(x) = \int_X f\,d\mu \int_X g\,d\mu. \tag{4.1} \]
∗ It seems that the strong formulation of Boltzmann’s ergodic hypothesis that one
single trajectory is filling the whole of phase space is due to Ehrenfest’s review of
Boltzmann’s work.
† In case of spontaneous breaks in symmetry refutations of the ergodicity hypothesis
can appear; this scenario has been observed in phase transitions when fluids freeze and
in spin glasses.
This theorem is also called the mean ergodic theorem because of the integration
over X; the function g is a suitable weight function and does not
appear in von Neumann’s original work. As a special case we conclude the
convergence in L²
\[ \lim_{N\to\infty} \Big\| \frac{1}{N} \sum_{0\le n<N} f(T^n x) - f^* \Big\|_2 = 0 \tag{4.2} \]
with a T-invariant limit f* ∈ L². The von Neumann ergodic theorem is a
functional-analytic statement in the following sense: the limit f* in (4.2)
is the orthogonal projection of f onto the subspace
of T-invariant functions in the Hilbert space L² equipped with the inner product
\[ \langle f, g \rangle = \int_X f\,\overline{g}\,d\mu. \]
If T is ergodic, this subspace consists of the almost everywhere constant functions and $f^* = \int_X f\,d\mu$, in agreement with the right-hand side of (4.1).
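Sketch of proof. Consider the subspace of T-invariant functions
\[ I := \{f \in L^2 : f \circ T = f\}, \]
and let
\[ J := \{f \in L^2 : \exists\, h \in L^2 \text{ with } f = h \circ T - h\}. \]
For $f_1 \in I$ and $f_2 = h \circ T - h \in J$ we have
\[ \frac{1}{N} \sum_{0\le n<N} f_1(T^n x) = f_1(x) \]
and, by telescoping,
\[ \frac{1}{N} \sum_{0\le n<N} f_2(T^n x) = \frac{1}{N}\big(h(T^N x) - h(x)\big) \]
for any N ∈ N. By the Cauchy-Schwarz inequality (and $\|h \circ T^N\|_2 = \|h\|_2$, since T is measure preserving),
\[ \Big| \frac{1}{N} \int_X \big(h(T^N x) - h(x)\big) g(x)\,d\mu(x) \Big| \le \frac{2}{N}\,\|h\|_2\,\|g\|_2, \]
which tends to zero as N → ∞. If we had a decomposition $f = f_1 + f_2$
with $f_1, f_2$ as above, we could deduce
\[ \frac{1}{N} \sum_{0\le n<N} \int_X f(T^n x)\,g(x)\,d\mu(x) = \int_X f_1(x)\,g(x)\,d\mu(x) + \frac{1}{N} \sum_{0\le n<N} \int_X f_2(T^n x)\,g(x)\,d\mu(x), \]
hence
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \int_X f(T^n x)\,g(x)\,d\mu(x) = \int_X f_1\,g\,d\mu. \]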
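Unfortunately, in general there is no such decomposition of f available.
However, for every ε > 0 we can find functions $f_1 \in I$ and
$f_2 \in J$ such that
\[ \|f - (f_1 + f_2)\|_2^2 = \int_X |f - (f_1 + f_2)|^2\,d\mu < \epsilon; \]
consequently, $f_1 + f_2$ approximates the target function f in the mean-square.
In a similar way as in the case $f = f_1 + f_2$ it follows that the limit exists and that, if T is ergodic (so that I consists of the almost everywhere constant functions and the component of f in I is the constant $\int_X f\,d\mu$),
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \int_X f(T^n x)\,g(x)\,d\mu(x) = \int_X f\,d\mu \int_X g\,d\mu. \]
To finish the proof we only need to show that L² decomposes into the direct sum $I \oplus \overline{J}$, where $\overline{J}$ denotes the closure of J. For
this purpose suppose that f is orthogonal to J, i.e., $\langle f, f_2 \rangle = 0$ for
all $f_2 \in J$. In particular (taking $f_2 = f \circ T - f$),
\[ \int_X |f|^2\,d\mu = \int_X (f \circ T) \cdot \overline{f}\,d\mu. \]
It remains to show that f ∈ I. For this we compute, using that T is measure preserving,
\[ \int_X |f \circ T - f|^2\,d\mu = \int_X |f \circ T|^2\,d\mu - 2\,\mathrm{Re} \int_X (f \circ T)\,\overline{f}\,d\mu + \int_X |f|^2\,d\mu = 0. \]
Hence, f ∘ T = f almost everywhere, which means f ∈ I and finishes the
proof. •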
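On a finite space the projection statement can be verified exactly. The sketch below is a toy model under stated assumptions, not von Neumann's setting: X = Z/mZ with the uniform measure and T x = x + b mod m, the function `f` is an arbitrary example, and the helper names are hypothetical. The orbit of x is the residue class of x modulo d = gcd(b, m), so the projection onto the T-invariant functions is the average over that class; averaging over N = m steps reproduces it exactly.

```python
from fractions import Fraction
from math import gcd

def cesaro_average(f, m, b, n_terms):
    """x -> (1/N) * sum_{0 <= n < N} f(T^n x) for T x = x + b mod m."""
    return [sum(Fraction(f[(x + n * b) % m]) for n in range(n_terms)) / n_terms
            for x in range(m)]

def invariant_projection(f, m, b):
    """Orthogonal projection of f onto the T-invariant functions.

    For the uniform measure this is the mean of f over the orbit of x,
    i.e. over all y congruent to x modulo d = gcd(b, m).
    """
    d = gcd(b, m)
    return [sum(Fraction(f[y]) for y in range(m) if y % d == x % d) / (m // d)
            for x in range(m)]

f = [3, 1, 4, 1, 5, 9]                       # an arbitrary function on Z/6Z
print(cesaro_average(f, 6, 2, 6) == invariant_projection(f, 6, 2))  # non-ergodic case
print(cesaro_average(f, 6, 1, 6))            # ergodic case: the constant 23/6
```

In the non-ergodic case b = 2 the limit is not constant but only constant on the two orbits, in accordance with the role of the subspace I in the proof.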
Recently, Tao [134] proved a multidimensional version of von Neumann’s
theorem which had been a long-standing problem.
4.2. The Birkhoff Pointwise Ergodic Theorem
Our next aim is the important ergodic theorem of George D.
Birkhoff [20]:
Theorem 4.2. Let T be a measure preserving transformation on a probability space (X, F, µ). If f ∈ L(X, F, µ), then, for almost all x ∈ X, the
limit
\[ f^*(x) := \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} f(T^n x) \tag{4.3} \]
exists with f* ∈ L(X, F, µ) and satisfies f*(Tx) = f*(x) as well as
\[ \int_X f^*\,d\mu = \int_X f\,d\mu. \tag{4.4} \]
If additionally T is ergodic, then f* is constant almost everywhere and
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} f(T^n x) = \int_X f\,d\mu. \tag{4.5} \]
Birkhoff’s original paper [20] deals only with the case of indicator functions. It was Aleksandr Khintchine [80] who extended Birkhoff’s result to arbitrary integrable functions f on a finite measure space. For
this reason, in some literature this result is also called the Birkhoff-Khintchine theorem. Another name one can find is pointwise ergodic
theorem. It shows that, for ergodic T, the time mean (4.3) of f along the orbit {T^n x}
equals for almost all x the space mean of f (taken over the complete space
X). This provides a rather precise prediction although not too much might
be known about f or T: the conditions that f ∈ L and T is measure preserving are rather weak. In this sense Birkhoff’s ergodic theorem allows one to
predict the long-term average of f along a trajectory, practically, without knowing
anything! For example, if M ⊂ X is measurable and f = χ_M, the mean number of
visits of T^n x to M is for almost all initial values x equal to the measure of
M provided T is ergodic. Ergodicity enforces uniform distribution!
Our proof follows Kamae & Keane [77]:
Proof. We shall prove the statement for non-negative real-valued functions;
the general case follows in a standard way by use of the decomposition $f = f^+ - f^-$ with
non-negative $f^+, f^-$ (see (2.2)) for arbitrary real-valued functions, and for complex-valued functions by treating the real and the imaginary part separately.
Therefore, let us assume f ≥ 0. We define
\[ f_N(x) = \sum_{0\le n<N} f(T^n x) \]
as well as
\[ \overline{f}(x) = \limsup_{N\to\infty} \frac{f_N(x)}{N} \qquad\text{and}\qquad \underline{f}(x) = \liminf_{N\to\infty} \frac{f_N(x)}{N}. \]
It follows that both $\overline{f}$ and $\underline{f}$ are measurable (since $\limsup_{N\to\infty} f_N(x) = \inf_m \sup_{N\ge m} f_N(x)$, and analogously for lim inf). In view of
\[ \overline{f}(Tx) = \limsup_{N\to\infty} \frac{f_N(Tx)}{N} = \limsup_{N\to\infty} \Big( \frac{f_{N+1}(x)}{N+1} \cdot \frac{N+1}{N} - \frac{f(x)}{N} \Big) = \limsup_{N\to\infty} \frac{f_{N+1}(x)}{N+1} = \overline{f}(x) \]
it follows that $\overline{f}$ is T-invariant. In an analogous manner we find that
$\underline{f}(Tx) = \underline{f}(x)$. In order to prove the existence of the limit f*, its integrability, and its T-invariance, it suffices to show
\[ \int_X \overline{f}\,d\mu \le \int_X f\,d\mu \le \int_X \underline{f}\,d\mu, \tag{4.6} \]
since then $\underline{f} \le \overline{f}$ implies $\underline{f}(x) = \overline{f}(x) = f^*(x)$ for almost all x and integration yields (4.4). (Here we use that if the Lebesgue integral of a
non-negative function equals zero, then the function vanishes almost everywhere.)
Let ε ∈ (0, 1) and L > 0 be given. By definition of $\overline{f}$, for any x ∈ X there
exists a positive integer m such that
\[ \frac{f_m(x)}{m} \ge (1 - \epsilon) \min\{\overline{f}(x), L\}; \]
in fact, such an m exists for every L (possibly with a different m). For
any given δ > 0 we further find a positive integer M such that the set
\[ X_+ := \big\{x \in X : \exists\, 1 \le m \le M \text{ with } f_m(x) \ge m(1 - \epsilon) \min\{\overline{f}(x), L\}\big\} \]
has measure greater than or equal to 1 − δ. Next define
\[ \tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in X_+, \\ L & \text{otherwise.} \end{cases} \]
It follows that $f \le \tilde{f}$; to see this assume $x \in X \setminus X_+$, hence, for all 1 ≤ m ≤ M,
\[ f_m(x) < m(1 - \epsilon) \min\{\overline{f}(x), L\}, \]
which for m = 1 implies $f(x) = f_1(x) \le L$. For x ∈ X and $n \in \mathbb{N}_0$, let
\[ a_n := a_n(x) := \tilde{f}(T^n x) \qquad\text{and}\qquad b_n := b_n(x) := (1 - \epsilon) \min\{\overline{f}(x), L\}. \]
We claim that, for any $n \in \mathbb{N}_0$, there exists an integer 1 ≤ m ≤ M
satisfying
\[ a_n + \ldots + a_{n+m-1} \ge b_n + \ldots + b_{n+m-1}. \tag{4.7} \]
In order to verify this, let us first assume that $T^n x \in X_+$. Then there exists
1 ≤ m ≤ M such that
\[ f_m(T^n x) \ge m(1 - \epsilon) \min\{\overline{f}(T^n x), L\} = m(1 - \epsilon) \min\{\overline{f}(x), L\} = b_n + \ldots + b_{n+m-1}, \]
where we have used the T-invariance of $\overline{f}$. Hence,
\[ a_n + \ldots + a_{n+m-1} = \tilde{f}(T^n x) + \ldots + \tilde{f}(T^{n+m-1} x) \ge f(T^n x) + \ldots + f(T^{n+m-1} x) = f_m(T^n x) \ge b_n + \ldots + b_{n+m-1}. \]
If $T^n x \notin X_+$, we may take m = 1 since
\[ a_n = \tilde{f}(T^n x) = L \ge (1 - \epsilon) \min\{\overline{f}(x), L\} = b_n. \]
Consequently, our assertion about (4.7) is proved.
In view of (4.7), for any positive integer N > M, there exist recursively
defined integers $m_0 < m_1 < \ldots < m_k < N$ satisfying $1 \le m_0 \le M$, $m_{j+1} - m_j \le M$ for j = 0, 1, . . . , k − 1 as well as $N - m_k \le M$ and
\[ \begin{aligned} a_0 + \ldots + a_{m_0-1} &\ge b_0 + \ldots + b_{m_0-1}, \\ a_{m_0} + \ldots + a_{m_1-1} &\ge b_{m_0} + \ldots + b_{m_1-1}, \\ &\;\;\vdots \\ a_{m_{k-1}} + \ldots + a_{m_k-1} &\ge b_{m_{k-1}} + \ldots + b_{m_k-1}. \end{aligned} \]
Addition of these inequalities leads to
\[ a_0 + \ldots + a_{N-1} \ge a_0 + \ldots + a_{m_k-1} \ge b_0 + \ldots + b_{m_k-1} \ge b_0 + \ldots + b_{N-M-1}. \tag{4.8} \]
Note that the numbers $b_n$ are all independent of n. Translating the latter
inequalities, we find
\[ \sum_{0\le n<N} \tilde{f}(T^n x) \ge (N - M)(1 - \epsilon) \min\{\overline{f}(x), L\}. \]
Integration yields
\[ \sum_{0\le n<N} \int_X \tilde{f}(T^n x)\,d\mu(x) \ge (N - M)(1 - \epsilon) \int_X \min\{\overline{f}(x), L\}\,d\mu(x). \]
Since T is measure preserving, it follows by Theorem 3.1 that
\[ \int_X g(Tx)\,d\mu(x) = \int_X g(x)\,d\mu(x) \]
for all integrable functions g, in particular for $g = \tilde{f}$. Hence all N integrals on the left-hand side coincide and we obtain
\[ N \int_X \tilde{f}\,d\mu \ge (N - M)(1 - \epsilon) \int_X \min\{\overline{f}(x), L\}\,d\mu(x). \]
Since
\[ \int_X \tilde{f}(x)\,d\mu(x) = \int_{X_+} f(x)\,d\mu(x) + L\,\mu(X \setminus X_+), \]
it follows from our construction that
\[ \int_X f(x)\,d\mu(x) \ge \int_{X_+} f(x)\,d\mu(x) = \int_X \tilde{f}(x)\,d\mu(x) - L\,\mu(X \setminus X_+) \ge \frac{N - M}{N}(1 - \epsilon) \int_X \min\{\overline{f}(x), L\}\,d\mu(x) - L\delta. \]
Next we let N tend to infinity, and then δ and ε to zero in order to deduce
\[ \int_X f\,d\mu \ge \int_X \min\{\overline{f}, L\}\,d\mu. \]
Applying the monotone convergence Theorem 2.2 with $g_L = \min\{\overline{f}, L\}$ and
L → ∞, we may interchange limit and integration:
\[ \lim_{L\to\infty} \int_X \min\{\overline{f}, L\}\,d\mu = \int_X \lim_{L\to\infty} \min\{\overline{f}, L\}\,d\mu = \int_X \overline{f}\,d\mu. \]
Thus,
\[ \int_X f\,d\mu \ge \int_X \overline{f}\,d\mu. \]
This is the first inequality in (4.6).
For the proof of the second inequality in (4.6) we start in a similar manner
as above: for any ε > 0 and any x ∈ X, there exists a positive integer m
satisfying
\[ \frac{f_m(x)}{m} \le \underline{f}(x) + \epsilon. \]
For arbitrary δ > 0 we can find a positive integer M such that
\[ X_- := \big\{x \in X : \exists\, 1 \le m \le M \text{ with } f_m(x) \le m(\underline{f}(x) + \epsilon)\big\} \]
has measure at least 1 − δ. Now define
\[ \hat{f}(x) = \begin{cases} f(x) & \text{if } x \in X_-, \\ 0 & \text{otherwise.} \end{cases} \]
Then $\hat{f} \le f$ and, setting $b_n = \hat{f}(T^n x)$ and $a_n = \underline{f}(x) + \epsilon$ (this time independent of n), we deduce as in (4.7) and (4.8) that
\[ \sum_{0\le n<N-M} \hat{f}(T^n x) \le N(\underline{f}(x) + \epsilon). \]
Since T is measure preserving, integration of both sides yields
\[ (N - M) \int_X \hat{f}\,d\mu \le N \int_X \underline{f}\,d\mu + \epsilon N. \]
Because of f ≥ 0 the measure $\tilde{\mu}$, given by
\[ \tilde{\mu}(A) = \int_A f\,d\mu, \]
is absolutely continuous with respect to µ; that means that for every $\tilde{\delta} > 0$ there exists some δ > 0 such that
$\tilde{\mu}(A) < \tilde{\delta}$ whenever µ(A) ≤ δ. It thus follows from $\mu(X \setminus X_-) \le \delta$ that
\[ \int_X f\,d\mu = \int_X \hat{f}\,d\mu + \int_{X \setminus X_-} f\,d\mu \le \frac{N}{N - M} \int_X (\underline{f} + \epsilon)\,d\mu + \tilde{\delta}. \]
Letting first N → ∞ and then δ → 0 (as well as $\tilde{\delta} \to 0$), and, finally, ε → 0,
we get
\[ \int_X f(x)\,d\mu(x) \le \int_X \underline{f}(x)\,d\mu(x). \]
Hence, (4.6) is proved.
It remains to prove (4.5) in case of ergodic T. In view of Theorem 3.3,
(v), the function f* is constant almost everywhere, hence f*(x) = c for
almost all x ∈ X. This implies
\[ c = \int_X f^*\,d\mu = \int_X f\,d\mu. \]
The theorem is proved. •
In contrast to von Neumann’s functional approach Birkhoff chose the
concept of measure space for his ergodic theorem which led him to his more
practical result. Important generalizations of both ergodic theorems were
given by Hopf, Yosida & Kakutani as well as Wiener & Wintner
[151], Hurewicz [68]‡ and, even more general, Chacon & Ornstein [29]
(resp. [40]).
‡ see the excellent online notes http://www.math.uu.nl/people/dajani/lecturenotes2006.pdf
of Dajani
The rate of convergence in Birkhoff’s Theorem 4.2 can be rather slow
(as indicated by the simulation below). One can show that a computable
rate of convergence in general cannot exist (cf. [91]). Recently, Kohlenbach & Leuştean [86] obtained a quantitative version of Theorem 4.2 for
uniformly convex Banach spaces by use of proof theoretical techniques (in
particular, Gödel’s functional interpretation); see also Avigad et al. [8] in
this context.
[Plot: three simulation panels, each with horizontal axis n from 0 to 1000 and vertical axis from 0 to 0.6.]
Figure 1. From left to right: doubling T x = 2x mod 1, the
logistic transformation ℓx = 4x(1 − x), and T x = {1/x},
which will play a central role in a later chapter.
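A simulation in the spirit of Figure 1 can be sketched as follows for the logistic transformation. Everything specific here is an assumption and not taken from the notes: the observable f(x) = x, the starting value 0.1234, the numbers of steps, and the well-known fact that ℓ preserves the measure with density 1/(π√(x(1 − x))) on (0, 1), with respect to which f has space mean 1/2. The doubling map is avoided on purpose, because every binary floating point number in [0, 1) is a dyadic rational, so its computed orbit under x ↦ 2x mod 1 reaches 0 after finitely many steps.

```python
def birkhoff_average(transform, observable, x0, n_steps):
    """Time mean (1/N) * sum_{0 <= n < N} observable(T^n x0) in floating point."""
    total, x = 0.0, x0
    for _ in range(n_steps):
        total += observable(x)
        x = transform(x)
    return total / n_steps

def logistic(x):
    """The logistic transformation x -> 4x(1 - x) on [0, 1]."""
    return 4.0 * x * (1.0 - x)

def identity(x):
    return x

# Time means along one orbit for growing N; the expected space mean is 1/2
# (an assumption based on the invariant density stated in the lead-in).
averages = {n: birkhoff_average(logistic, identity, 0.1234, n)
            for n in (10, 100, 1_000, 10_000, 100_000)}
for n, value in averages.items():
    print(n, value)
```

Rounding errors mean that the computed sequence is only a pseudo-orbit, so the output illustrates the theorem but is no substitute for it.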
As a first application of Birkhoff’s ergodic theorem we shall derive a
measure theoretical characterization of ergodicity:
Theorem 4.3. Let (X, F, µ) be a probability space and assume that T :
X → X is measure preserving with respect to µ. Then T is ergodic if, and
only if, for all A, B ∈ F,
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \mu(T^{-n}A \cap B) = \mu(A)\mu(B). \tag{4.9} \]
The theorem states that, in the mean, the preimages of a set A under an ergodic transformation T cover any given set B in proportion to the measure of A. This criterion
for ergodicity may be compared with the notions of weak and strong mixing
from the previous section.
Proof. Assume that T is ergodic. Applying the Birkhoff ergodic Theorem
4.2 to the indicator function $f = \chi_A$ yields
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \chi_A(T^n x) = \int_X \chi_A\,d\mu = \mu(A) \tag{4.10} \]
for almost all x. Hence,
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \chi_{T^{-n}A \cap B}(x) = \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \chi_A(T^n x)\,\chi_B(x) = \mu(A)\,\chi_B(x) \]
almost everywhere. For any N, the average on the left-hand side is bounded
by the constant function 1. Thus, Lebesgue’s Theorem 2.1 on dominated
convergence implies
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \mu(T^{-n}A \cap B) = \int_X \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \chi_{T^{-n}A \cap B}(x)\,d\mu(x) = \mu(A) \int_X \chi_B(x)\,d\mu(x) = \mu(A)\mu(B), \]
which is nothing else than (4.9).
For the converse, suppose that $T^{-1}A = A$. Setting B = A in (4.9) and using $T^{-n}A \cap A = A$
shows that
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \mu(A) = \mu(A)^2, \]
that is µ(A) = µ(A)², which implies either µ(A) = 0 or µ(A) = 1. •
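Criterion (4.9) can be evaluated exactly on the finite system X = Z/mZ with the uniform measure and T x = x + b mod m. The sketch below uses hypothetical names and example sets; since the sequence $\mu(T^{-n}A \cap B)$ is periodic in n with period dividing m, the limit in (4.9) equals the average over N = m terms.

```python
from fractions import Fraction

def mean_overlap(m, b, set_a, set_b, n_terms):
    """(1/N) * sum_{0 <= n < N} mu(T^{-n}A ∩ B) for T x = x + b mod m, uniform mu."""
    total = Fraction(0)
    for n in range(n_terms):
        preimage = {x for x in range(m) if (x + n * b) % m in set_a}  # T^{-n}A
        total += Fraction(len(preimage & set_b), m)
    return total / n_terms

# Ergodic case gcd(5, 12) = 1: the mean equals mu(A) * mu(B) = (3/12) * (4/12).
print(mean_overlap(12, 5, {0, 1, 2}, {0, 4, 8, 9}, 12))

# Non-ergodic case gcd(4, 12) = 4 with A = B = {0}: the mean is 1/36,
# whereas mu(A) * mu(B) = 1/144.
print(mean_overlap(12, 4, {0}, {0}, 12))
```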
An alternative proof of this theorem (based on Wiener’s maximal inequality) can be found in [47].
These ergodic theorems were translated by Kolmogorov and Khintchine into the language of probability theory (see [83, 31] for their precise
formulation). In the ergodic theorem of Birkhoff the quantity $f^* = \int f\,d\mu$
may be regarded in the case of an ergodic T as the expectation of f. This
interpretation allows far-reaching generalizations of the fundamental law of
large numbers which states that, given a sequence of identically distributed
and independent (i.i.d.) random variables $X_1, X_2, \ldots$ on some probability
space with finite expectation $E|X_n| < +\infty$, the following limit exists:
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} X_n = E X_1 \qquad \text{almost everywhere.} \]
Thus, taking the mean over the realizations of many i.i.d. random variables
is in the limit the same as taking the mean over the realizations of a single one
— without any such limit law a theory of randomness would be impossible!
This observation essentially dates back to Jakob Bernoulli although in
a very simple form; the first formulation for random variables was given by
Chebyshev.
Exercises
Since the proof of the mean ergodic theorem might be a bit sketchy, it is a good
start to fill the gaps we left:
Exercise 4.1. Complete the proof of von Neumann’s Ergodic Theorem 4.1 (maybe
with the help of [114]) and deduce (4.2). Moreover, show that for f ∈ Lp with
1 ≤ p < +∞ the convergence (4.2) can be replaced by the same statement with
respect to the p-norm with a limit f ∗ ∈ Lp .
The ergodic theorems of von Neumann and Birkhoff are closely related:
Exercise 4.2. Deduce von Neumann’s Ergodic Theorem 4.1 with weight function
g ≡ 1 from Birkhoff’s Theorem 4.2 along the following lines: given a function
$h \in L^\infty$, define
\[ H = \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} h \circ T^n, \]
use Birkhoff’s ergodic theorem to show that the difference
\[ \frac{1}{N} \sum_{0\le n<N} h(T^n x) - H(x) \]
tends almost everywhere to zero as N → ∞. Use Lebesgue’s theorem on dominated convergence to prove that $S_N h := \frac{1}{N} \sum_{0\le n<N} h \circ T^n$ is a Cauchy sequence in $L^p$. Now
show for f that $S_N f$ is also a Cauchy sequence and conclude the proof by proving
\[ \frac{N+1}{N}\, S_{N+1} f - S_N (f \circ T) = \frac{1}{N}\, f. \]
There is a converse of Birkhoff’s ergodic theorem which reminds us of the
analogue for Riemann integrals in the case of uniform distribution:
Exercise 4.3. Given an ergodic transformation T on a finite measure space and a
non-negative measurable function f, show that if
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} f(T^n x) \]
exists and is finite for almost all x, then f is integrable. Hint: Define functions $f_k(x) = \min\{f(x), k\}$ and use the theorem of monotone convergence.
Exercise 4.4. What is wrong with the following ’proof’ of Birkhoff’s ergodic theorem: if f is a complex-valued function on $\mathbb{N}_0$, we write
\[ \int f(n)\,dn = \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} f(n) \]
whenever the limit exists, and call such functions integrable. If now T is a measure preserving transformation on a space X and if f is
integrable on X, then
\[ \iint |f(T^n x)|\,dn\,dx = \iint |f(T^n x)|\,dx\,dn = \iint |f(x)|\,dx\,dn = \int |f(x)|\,dx < \infty, \]
by Fubini’s theorem. Hence, $f(T^n x)$ is an integrable function of both variables
and therefore, for almost every x, an integrable function of n. This alternative
but not completely serious reasoning can be found in [58], p. 24. Which parts are
mathematically correct and which are not?
As announced, ergodic theorems generalize the concept of uniform distribution.
For instance, we may derive many results from the first chapter via ergodic theory:
Exercise 4.5. Apply Birkhoff’s ergodic Theorem 4.2 to the circle rotation and
give an alternative proof of Corollary 1.5.
* * *
Birkhoff [21] already gave applications of his ergodic theorem to a
restricted three-body problem Earth–Sun–Moon.§ Since then many obvious and non-obvious applications of ergodic theory have been found. In the
following chapter we shall give two classical examples.
§ and to convex billiards as well!
CHAPTER 5
Heavenly and Normal Applications
We have already mentioned that the origin of ergodic theory dates back
to Poincaré and his studies of heavenly bodies. In this chapter we shall
prove his famous recurrence theorem which, of course, can be proved without the concept of ergodicity (the latter came more than thirty years later);
however, we shall also see that the ergodic machinery allows further insights. Moreover, we give another application of Birkhoff’s ergodic theorem which, besides Weyl’s work on uniform distribution modulo one, can
also be interpreted as a forerunner, namely Borel’s theorem on normal
numbers. Here we shall discuss questions such as: how often does the digit 7 appear
in the decimal expansion of π? The reader may make a guess...
5.1. Poincaré’s Recurrence Theorem
Is our solar system stable? The dynamics of two bodies in space under gravity are described by Kepler’s laws. In his 270-page paper [112]
Henri Poincaré solved part of the three-body problem, that is the mathematical description of the orbits of three bodies interacting through gravity. With
his approach Poincaré laid the foundations for treating chaotic movements and invariant integrals. In his monumental work [113] consisting of
three volumes Poincaré sets the stage for mathematical ergodic theory. It
contains his famous recurrence theorem.
Before we state and prove this remarkable result we first need to introduce more vocabulary. Let T be a measure preserving transformation on a
probability space (X, F, µ) and A be a measurable set. A point x ∈ A is
said to be A-recurrent if there exists a positive integer n for which T n x ∈ A.
The notion of recurrence plays a crucial role in the branch of topological
dynamics. Here is Poincaré’s recurrence theorem:
Theorem 5.1. Let T : X → X be a measure preserving transformation on
a probability space (X, F, µ) and let A be measurable with µ(A) > 0. Then,
for almost all x ∈ A, the trajectory {T n x} will return to A infinitely often,
in particular, x is almost surely A-recurrent.
Equivalent to the almost sure recurrence is the divergence of the infinite
series $\sum_{n=0}^{\infty} \chi_A(T^n x)$ for almost all x ∈ A. This formulation reminds us of the
almost sure identity (4.10) from the proof of Theorem 4.3. In fact, for ergodic T it follows
immediately from Birkhoff’s ergodic theorem:
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0\le n<N} \chi_A(T^n x) = \int_X \chi_A\,d\mu = \mu(A), \]
which is positive by assumption.
The recurrence theorem of Poincaré yields a proof of the weak ergodic hypothesis (mentioned briefly in the previous chapter). The restriction to recurrence almost everywhere allows the existence of a null set of non-recurrent
points. This is necessary as follows from the example provided by the transformation Tx = 2x mod 1 (Example 2 in Chapter 3): the orbit of x = 1/2
(or any other reciprocal of a power of two) is eventually stationary at 0. As
a matter of fact, Poincaré did not prove his result by deeper measure theoretical or
even ergodic theoretical arguments.
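The exceptional orbits just mentioned can be followed in exact rational arithmetic. In the sketch below the set A = [1/4, 3/4), the helper names and the number of steps are illustrative assumptions; note that all rational starting points together form a null set, so the example concerns only the exceptional set permitted by the theorem.

```python
from fractions import Fraction

def doubling(x):
    """The doubling map x -> 2x mod 1 in exact rational arithmetic."""
    return (2 * x) % 1

def orbit(x, n_steps):
    """The points x, Tx, ..., T^{n_steps} x."""
    points = [x]
    for _ in range(n_steps):
        x = doubling(x)
        points.append(x)
    return points

def returns_to_a(x, n_steps):
    """Does T^n x lie in A = [1/4, 3/4) for some 1 <= n <= n_steps?"""
    return any(Fraction(1, 4) <= y < Fraction(3, 4) for y in orbit(x, n_steps)[1:])

print(orbit(Fraction(1, 8), 5))          # 1/8, 1/4, 1/2, 0, 0, 0
print(returns_to_a(Fraction(1, 2), 50))  # 1/2 lies in A but is mapped to the fixed point 0
print(returns_to_a(Fraction(1, 3), 50))  # the orbit 1/3, 2/3, 1/3, ... stays in A
```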
We give an alternative proof due to Carathéodory [27] independent of
Birkhoff’s theorem:
Proof. Let B be the subset of A which consists exactly of those points x
that are not A-recurrent, that is
\[ B = \{x \in A : T^n x \notin A \text{ for all } n \in \mathbb{N}\}. \]
Alternatively,
\[ B = A \cap T^{-1}(X \setminus A) \cap T^{-2}(X \setminus A) \cap \ldots, \]
which shows that B is measurable. We shall show that µ(B) = 0. Since B ⊂
A we have $B \cap T^{-n}B = \emptyset$ for any n ∈ N, and, consequently, $T^{-k}B \cap T^{-k-n}B = \emptyset$
for all k ≥ 0 and n ≥ 1. Hence, the sets $B, T^{-1}B, T^{-2}B, \ldots$ are pairwise disjoint and,
since T is measure preserving, it follows that $\mu(B) = \mu(T^{-n}B)$ for all n ∈ N.
Now assume that µ(B) > 0, then
\[ 1 = \mu(X) \ge \mu\Big( \bigcup_{n\in\mathbb{N}_0} T^{-n}B \Big) = \sum_{n=0}^{\infty} \mu(B) = +\infty, \]
a contradiction. This implies the A-recurrence of almost all x ∈ A. In fact,
almost all x ∈ A will return infinitely often to A. To see that, define
\[ C = \{x \in A : T^n x \in A \text{ for only finitely many } n \in \mathbb{N}\}. \]
Then
\[ C = \{x \in A : T^n x \in B \text{ for some } n \in \mathbb{N}_0\} \subset \bigcup_{n=0}^{\infty} T^{-n}B. \]
Since µ(B) = 0 and T is measure preserving, it follows that µ(C) = 0. •
The statement and the proof of Poincaré’s recurrence theorem may be
interpreted as a measure theoretical pigeonhole principle (see Chapter 1).
And another part of the reasoning reminds us of Vitali’s negative solution
of the measure problem.
The recurrence property is intimately related to the assumption of a
finite measure. For instance, the transformation T : R → R, T x = x + 1 is
measure preserving on R with respect to the Lebesgue measure, however,
for any bounded set A ⊂ R and x ∈ A the set {n ∈ N : T n x ∈ A} is empty
or finite which shows that T does not allow recurrence.
We give a physical interpretation of the recurrence theorem: consider a
box in R³ with an evacuated right chamber and a left chamber filled with
gas, separated by a dividing wall. After removing the dividing wall, we may
expect the gas molecules to distribute in the whole box, resulting in some
kind of uniform distribution.
[Figure: on the left, a box whose left chamber is filled with gas molecules (•) and whose right chamber is evacuated (◦), separated by a dividing wall (|); on the right (−→), the same box after removal of the wall, with molecules and empty positions mixed throughout.]
Contrary to our intuition, Poincaré’s recurrence theorem implies that the system will
return after a (long but) finite time to its starting constellation, at least
approximately: the vacuum to the right (◦), the gas molecules on the left
(•). At first view this seems to contradict the second law of
thermodynamics and Boltzmann’s theorem which claims that the entropy
of a closed system cannot decrease.∗ However, the assertion of the recurrence
theorem is primarily of statistical nature and the apparent incompatibility
resolves by taking the expected return time into account, which is
in all practical instances beyond the age of our universe. For the probability
to observe such violations of the second law of thermodynamics
we refer to Evans & Searls [51].
With regard to Poincaré’s Recurrence Theorem 5.1 we may ask how
soon an orbit {T n x} will visit a measurable set A. For the following investigation we shall use an idea of Kakutani [75], namely, to consider the
transformation T only for the time when T n x visits A. For x ∈ A ∈ F we
define the return time of x to A by
nA (x) = min{n ∈ N : T n x ∈ A}.
Since nA is a minimum, it is measurable. In view of Poincaré’s recurrence
theorem it follows that nA (x) is finite for almost all x. Next we remove from
A the null set consisting of all x for which nA (x) = +∞ and we denote the
resulting set again by A. Now we introduce a measure induced by µ on the
σ-algebra generated by F ∩ A:
\[ \mu_A(B) = \frac{\mu(B)}{\mu(A)} \qquad \text{for } B \subset A, \]
which reminds us of the notion of conditional probability. This yields another probability space (A, F ∩ A, µ_A). Moreover, we define the induced
transformation
\[ T_A : A \to A, \qquad x \mapsto T^{n_A(x)} x. \]
∗ By the way, the second law of thermodynamics excludes the existence of
a perpetuum mobile and, because of the irreversibility of time, travels in time.
Now we are in the position to prove the following technical result:
Theorem 5.2. Let A be measurable and assume the definitions and conditions from above. Then the transformation $T_A$ is measure preserving with
respect to $\mu_A$. Moreover, if T is ergodic, then so is $T_A$.
Proof. For n ∈ N define
An = {x ∈ A : n(x) = n},
Bn = {x ∈ X \ A : T x, . . . , T n−1 x 6∈ A, T n x ∈ A}.
Then An ∩ Bm = ∅. Moreover,
(5.1) T −1 A = A1 ∪ B1
T −1 Bn = An+1 ∪ Bn+1
and
for
n ∈ N.
Now let C ∈ F ∩ A. Since T is measure preserving with respect to µ, it
follows that µ(C) = µ(T −1 C).
In order to prove the first statement we shall show that the same holds for $T_A$ and $\mu_A$. We have
\[ T_A^{-1}C = \bigcup_{n=1}^{\infty} A_n \cap T_A^{-1}C = \bigcup_{n=1}^{\infty} A_n \cap T^{-n}C, \]
where the sets $A_n \cap T^{-n}C$ are pairwise disjoint. Hence,
\[ \mu(T_A^{-1}C) = \sum_{n=1}^{\infty} \mu(A_n \cap T^{-n}C). \tag{5.2} \]
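Since measures are preserved, repeated application of (5.1) leads to
\begin{align*}
\mu(T^{-1}C) &= \mu(A_1 \cap T^{-1}C) + \mu(B_1 \cap T^{-1}C) \\
&= \mu(A_1 \cap T^{-1}C) + \mu(T^{-1}(B_1 \cap T^{-1}C)) \\
&= \mu(A_1 \cap T^{-1}C) + \mu(A_2 \cap T^{-2}C) + \mu(B_2 \cap T^{-2}C) \\
&\;\;\vdots \\
&= \sum_{n=1}^{N} \mu(A_n \cap T^{-n}C) + \mu(B_N \cap T^{-N}C).
\end{align*}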
This construction is called the Kakutani skyscraper since one climbs from the sets $A_1$ and $B_1$ to $A_n$ and $B_n$ and so forth. In a similar manner,
\[ 1 \ge \mu\Big(\bigcup_{n=1}^{\infty} B_n \cap T^{-n}C\Big) = \sum_{n=1}^{\infty} \mu(B_n \cap T^{-n}C), \]
hence $\mu(B_n \cap T^{-n}C)$ tends to zero as $n \to \infty$. In view of (5.2) this yields
\[ \mu(C) = \mu(T^{-1}C) = \sum_{n=1}^{\infty} \mu(A_n \cap T^{-n}C) = \mu(T_A^{-1}C), \]
resp.
\[ \mu_A(C) = \frac{\mu(C)}{\mu(A)} = \frac{\mu(T_A^{-1}C)}{\mu(A)} = \mu_A(T_A^{-1}C). \]
It follows that $T_A$ is measure preserving with respect to $\mu_A$.
It remains to show that $T_A$ inherits the ergodicity property. For this purpose let us assume that $T$ is ergodic. Then, for a $T_A$-invariant set $B \subset A$ of positive measure $\mu_A(B) > 0$, we have to show $\mu_A(B) = 1$. Using the $T_A$-invariance we have $B = T_A^{-1}B = T_A^{-2}B = \ldots$ and so on. Thus,
\[ B = \Big(\bigcup_{n=0}^{\infty} T^{-n}B\Big) \cap A. \]
We deduce from $0 < \mu_A(B) = \mu(B)/\mu(A)$ that $\mu(B) > 0$. The union $\bigcup_{n \ge 0} T^{-n}B$ contains its own preimage under $T$ and hence is $T$-invariant up to a null set. Since $T$ is ergodic and this union has positive measure, we get
\[ \mu\Big(\bigcup_{n=0}^{\infty} T^{-n}B\Big) = 1, \]
which yields $X = \bigcup_{n=0}^{\infty} T^{-n}B$ and $B = A$, respectively, up to null sets. Hence, we get $\mu_A(B) = 1$. The proof is complete. •
The next statement is due to Kac [74], called Kac’s lemma, and provides a quantitative version of Poincaré’s recurrence theorem (analogous to Weyl’s quantitative description of Kronecker’s and Bohl’s results on the distribution of $(n\xi)$):
Theorem 5.3. Let $T : X \to X$ be a measurable ergodic transformation on a probability space $(X, \mathcal{F}, \mu)$ and let $A$ be a measurable set with $\mu(A) > 0$. Then $n_A \in L^1$ and, for the first return $n_A(x)$ of a point $x \in A$,
\[ \int_A n_A(x)\, d\mu(x) = 1 \quad\text{resp.}\quad \int_A n_A(x)\, d\mu_A(x) = \frac{1}{\mu(A)} \]
and, for almost all $x \in A$,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} n_A(T_A^n x) = \frac{1}{\mu(A)}. \]
Thus, the expectation for the first return of an orbit to a given set equals $1/\mu(A)$.
Proof. For $x \in A$ we consider the orbit of $x$ under $T_A$, that is
\[ x, T_A x, \ldots, T_A^n x, \ldots, T_A^N x, \ldots. \]
The quantity $t := \sum_{0 \le n < N} n_A(T_A^n x)$ measures the time for the first $N$ returns of the orbit of $x$ under $T$ to the set $A$, i.e.,
\[ \sum_{0 \le n < t} \chi_A(T^n x) = N. \]
Applying Birkhoff’s Ergodic Theorem 4.2 to $T_A$ and $T$ (with $N \to \infty$ resp. $t \to \infty$), we get
\[ \int_A n_A(x)\, d\mu_A(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} n_A(T_A^n x) = \lim_{t \to \infty} \frac{t}{\sum_{0 \le n < t} \chi_A(T^n x)} = \Big(\int_X \chi_A\, d\mu\Big)^{-1} = \frac{1}{\mu(A)}, \]
which we had to show. • (For a nice variant of the proof see Baéz-Duarte [9].)
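Kac’s lemma can be observed numerically. The following is a minimal sketch, not part of the original notes: it assumes the rotation $x \mapsto x + \alpha \bmod 1$ with the golden-ratio rotation number (an ergodic example chosen here for illustration) and the set $A = [0, 0.1)$, so the mean return time should be close to $1/\mu(A) = 10$. The function name and all parameters are hypothetical.

```python
import math

def mean_return_time(alpha, a, b, x0, returns):
    """Average first-return time to A = [a, b) along the orbit of x0
    under the rotation x -> x + alpha mod 1 (x0 must lie in A)."""
    assert a <= x0 < b
    x = x0
    total_steps = 0
    for _ in range(returns):
        steps = 0
        while True:
            x = (x + alpha) % 1.0
            steps += 1
            if a <= x < b:      # the orbit is back in A
                break
        total_steps += steps    # this is n_A evaluated at the previous return point
    return total_steps / returns

# Assumed example: golden-ratio rotation, A = [0, 0.1), start point 0.05.
alpha = (math.sqrt(5) - 1) / 2
mean = mean_return_time(alpha, 0.0, 0.1, 0.05, 5000)
print(mean, 1 / 0.1)
```

The average is just $t/N$ in the notation of the proof, so the two printed numbers should nearly agree.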
We return briefly to the cat Felix and the phenomenon of his complete recurrence after a finite number of iterations of Arnold’s cat map (see the pictures for the iterations $A^n$ with $n = 0$, Felix himself, then $n = 1, 2, 3, 4, 6, 50$, and, finally, $n = 405$, Felix once again).† Actually one can show that if $(X, \mathcal{F}, \mu, T)$ is an ergodic system with discrete space $X$ and uniform distribution $\mu$, then the recurrence is sure for any point (see Exercise 5.3).
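The recurrence of a discretized picture can be explored with a few lines of code. The sketch below assumes the cat map acts on an $n \times n$ pixel grid through the matrix $M = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ modulo $n$; this particular matrix is an assumption, and the map behind the pictures may be normalized differently, which changes the periods. The function returns the least $k$ with $M^k \equiv I \bmod n$, that is the number of iterations after which every pixel is back in place.

```python
def cat_map_period(n):
    """Smallest k >= 1 with M^k = identity (mod n) for M = [[2, 1], [1, 1]]."""
    identity = (1 % n, 0, 0, 1 % n)
    a, b, c, d = 2 % n, 1 % n, 1 % n, 1 % n   # current power of M, starting with M itself
    k = 1
    while (a, b, c, d) != identity:
        # multiply the current power [[a, b], [c, d]] by M from the right
        a, b, c, d = (2 * a + b) % n, (a + b) % n, (2 * c + d) % n, (c + d) % n
        k += 1
    return k

for n in (2, 3, 5, 10, 100):
    print(n, cat_map_period(n))
```

Since $M$ is invertible modulo any $n$, the loop always terminates; this is the finite-space recurrence mentioned above.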
Finally, we give a measure theoretical variation of Theorem 5.1:
Theorem 5.4. Let T : X → X be a measure preserving transformation on
a probability space (X, F, µ) and let A be a measurable set with µ(A) > 0.
Then µ(A ∩ T −n A) > 0 for infinitely many n.
Proof. Since $T$ is measure preserving, all sets $A, T^{-1}A, T^{-2}A, \ldots$ have the same measure. If all pairwise intersections of these sets were null sets, finitely many of them would provide a finite union of measure larger than $\mu(X) = 1$, a contradiction. Thus, there are integers $0 \le m < n$ such that $\mu(T^{-n}A \cap T^{-m}A) > 0$. Writing $k = n - m$ it follows that $\mu(A \cap T^{-k}A) > 0$ (since $T$ is measure preserving). Repeating this argument with $A, T^{-k}A, T^{-2k}A, \ldots$ implies $\mu(A \cap T^{-n}A) > 0$ for infinitely many $n$. •
5.2. Normal numbers
Let b be a positive integer strictly larger than one. Any real number x
possesses a representation to base b, also called b-adic representation, i.e.,
\[ x = \sum_{n=0}^{\infty} a_n b^{-n} \quad\text{with}\quad a_0 \in \mathbb{Z},\ a_n \in \{0, 1, \ldots, b-1\}. \tag{5.3} \]
Here $a_0 = \lfloor x \rfloor$ is the integral part of $x$ and the $a_n$ are said to be the $b$-adic digits of $\{x\} \in [0, 1)$. This representation is not unique; however, we should not think too much about this defect since it is related only to a null set. We illustrate this with a simple and well-known example from decimal expansion:
\[ 0.\overline{9} = 0.99999\,99999\ldots = 1.\overline{0} = 1, \]
where, as usual, the expression $\overline{9}$ stands for the infinite sequence of digits 9. In fact, if $x$ has an eventually periodic $b$-adic representation, then $x$ is rational and thus belongs to a set of Lebesgue measure zero; if the representation is not eventually periodic, then it is unique and $x$ is irrational.
† The article [56] provides similar pictures with Henri Poincaré in place of Felix, which is credited to [35].
A real number $x$ is called normal to base $b$ if for each $k \in \mathbb{N}$ any block of digits $\alpha_1 \ldots \alpha_k$ with $\alpha_j \in \{0, 1, \ldots, b-1\}$ appears with the same frequency in the $b$-adic representation of $x = a_0.a_1a_2\ldots$. For $k = 1$ this means that any digit appears with the same frequency:
\[ \lim_{N \to \infty} \frac{1}{N}\, \sharp\{n \le N : a_n = \alpha\} = \frac{1}{b} \]
for all $\alpha \in \{0, 1, \ldots, b-1\}$; for $k = 2$ normality implies
\[ \lim_{N \to \infty} \frac{1}{N}\, \sharp\{n \le N : a_n = \alpha,\ a_{n+1} = \alpha'\} = \frac{1}{b^2} \]
for all pairs $\alpha, \alpha' \in \{0, 1, \ldots, b-1\}$. In the general case the pattern $\alpha_1 \ldots \alpha_k$ with $\alpha_j \in \{0, 1, \ldots, b-1\}$ appears with asymptotic frequency $b^{-k}$. Obviously, it suffices to consider only that part of the $b-adic$ representation which belongs to the fractional part $\{x\} \in [0, 1)$. Next we shall prove Borel’s theorem:
Theorem 5.5. Almost all real numbers x are normal to any base b.
This theorem explains why numbers with such a regularity in their b-adic
representation are called normal. It should be noted that a number can
be normal to base b but not normal with respect to another base b′ . This
observation is due to Cassels [28] and Schmidt [123] obtained a criterion
to describe under which circumstances a normal number to base b is normal
to base b′ .
Proof. In view of our previous observation it suffices to prove the statement for numbers $x \in [0, 1)$. The mapping $T_b : [0, 1) \to [0, 1)$, defined by $T_b x = bx \bmod 1$, is measurable with respect to the Lebesgue measure $\lambda$. Moreover, it is ergodic (which can be proved in the general case along the lines of our proof for the special case $b = 2$, which was Example 2 in §4). If now $x$ to base $b$ is given by (5.3), then we have
\[ T_b^n x \in \Big[\frac{\alpha}{b}, \frac{\alpha+1}{b}\Big) =: I(\alpha) \]
for some given $\alpha \in \{0, 1, \ldots, b-1\}$ if, and only if, $a_{n+1} = \alpha$. By Birkhoff’s ergodic theorem 4.2 it thus follows
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{I(\alpha)}(T_b^n x) = \int_{[0,1)} \chi_{I(\alpha)}\, d\lambda = \lambda(I(\alpha)) = \frac{1}{b} \]
for almost all $x$. This implies the statement in the case of individual digits $\alpha$ (that is, blocks of length $k = 1$). The general case ($k \in \mathbb{N}$) follows via
\[ \alpha := \alpha_1 b^{k-1} + \alpha_2 b^{k-2} + \ldots + \alpha_k \quad\text{and}\quad I(\alpha, k) := \Big[\frac{\alpha}{b^k}, \frac{\alpha+1}{b^k}\Big) \]
in an analogous manner:
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{I(\alpha,k)}(T_b^n x) = \int_{[0,1)} \chi_{I(\alpha,k)}\, d\lambda = \lambda(I(\alpha, k)) = \frac{1}{b^k}, \]
which leads to the assertion of the theorem. •
Borel’s original argument in [24] was based on the Borel-Cantelli lemma
from probability theory (and faulty; cf. [33]). An elementary (and
correct) proof which follows Borel’s reasoning was given by Niven [107]. A
different but unpublished approach is due to Alan Turing [138]; recently,
Becher, Figueira & Picchi [14] completed his work.
With probabilistic tools we can derive more information. The central limit theorem states: given a sequence $X_1, X_2, \ldots$ of independent identically distributed $L^2$-variables with expectation $\mu$ and variance $\sigma^2$, let $S_N := X_1 + X_2 + \ldots + X_N$; then
\[ \lim_{N \to \infty} P\Big(\frac{S_N - \mu N}{\sigma \sqrt{N}} \le y\Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp\Big(-\frac{t^2}{2}\Big)\, dt. \]
This refines the strong law of large numbers which claims $\frac{1}{N} S_N \to \mu$ as $N \to \infty$ under weaker conditions on the $X_j$. Now let us consider real numbers $x$ from the unit interval, given in their binary expansion:
\[ x = \sum_{j=1}^{\infty} \frac{b_j(x)}{2^j} \quad\text{with}\quad b_j \in \{0, 1\}. \]
This is Example 2 from Chapter 3. By Borel’s theorem, resp. Birkhoff’s theorem,
\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} b_1(T^n x) = \int_{[0,1)} b_1\, d\lambda = \tfrac{1}{2} \]
for almost all $x$. Note that $b_j = b_1 \circ T^{j-1}$. Using this we can compute expectation and variance by
\[ \int_{[0,1)} b_j\, d\lambda = \int_0^1 b_1(T^{j-1}x)\, dx = \int_0^1 b_1(x)\, dx = \tfrac{1}{2} \]
and
\[ \int_0^1 \big(b_j(x) - \tfrac{1}{2}\big)^2 dx = \int_0^1 \big(b_1(x) - \tfrac{1}{2}\big)^2 dx = \tfrac{1}{4}. \]
Moreover, $P(b_j(x) = 0) = P(b_j(x) = 1) = \frac{1}{2}$. So we may apply the central limit theorem and obtain
\[ \lim_{N \to \infty} P\Bigg(\frac{\sum_{n=1}^{N} b_1 \circ T^{n-1}(x) - \frac{1}{2}N}{\frac{1}{2}\sqrt{N}} \le y\Bigg) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp\Big(-\frac{t^2}{2}\Big)\, dt. \]
So there is a Gaussian normal distribution behind Borel’s normal numbers theorem.
Normal distribution is a common feature of many natural distributions.
To give another example, Sinai [122] investigated the geodesic flow ϕt of the
unitary tangent bundle T1 V of a surface V of constant negative curvature; if
A is a domain of T1 V with piecewise differentiable boundary, then the mean
sojourn time of ϕt in this domain has a Gaussian distribution.
Although Borel’s theorem 5.5 shows that almost all real numbers are
normal to any base it is a difficult problem to prove normality of a given
real number. For instance, it is unknown whether the famous number
π = 3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510
58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 . . .
is normal with respect to some base.∗ This problem reminds us of the difficulty of deciding whether a given number is algebraic or transcendental. Actually, for the latter question some techniques are known and, in particular, the transcendence of $\pi$ was shown by Lindemann in 1882 (which implies the impossibility of the ancient problem of squaring the circle by ruler and compass).† Kanada & Takahashi [78] computed more than 50 billion digits of the decimal expansion and the deviation from normality in this data set is less than 0.002% for any digit. The situation is not better with other famous constants such as $e = \exp(1)$ and $\sqrt{2}$. Bailey & Crandall [11] conjectured that any algebraic irrational number is normal.
Obviously, rational numbers are not normal (since their $b$-adic representations are eventually periodic). A more advanced example of non-normal numbers results from the Cantor set $C$ which is defined by successively deleting the middle thirds from the unit interval $[0, 1]$. More precisely,
\[ C = [0, 1] \setminus \bigcup_{n=0}^{\infty} \bigcup_{j=1}^{2^n} \big(x_{nj} + 3^{-n-1},\ x_{nj} + 2 \cdot 3^{-n-1}\big) \]
with certain rationals $x_{nj}$. The Cantor set $C$ is an example of an uncountable perfect set with empty interior (see [49]); recall that a set is said to be perfect if it is closed and each of its elements is a limit point. The elements of $C$ are exactly those numbers $x \in [0, 1]$ having a ternary expansion without digit 1 (since the middle thirds were deleted), i.e.,
\[ x \in C \iff x = \sum_{n=1}^{\infty} a_n 3^{-n} \quad\text{with } a_n \in \{0, 2\}. \]
The numbers $x_{nj}$ from above provide all possible partial sums of such elements $x$. Hence, the Cantor set does not contain any base 3 normal number; in particular, it follows from Borel’s theorem that the Cantor set $C$ has Lebesgue measure zero.‡
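For rational numbers the ternary criterion can be checked exactly. The sketch below is an illustration under stated assumptions, not taken from the notes: it walks through the ternary digits of a fraction by repeatedly applying $x \mapsto 3x$ on the left third and $x \mapsto 3x - 2$ on the right third, and reports failure as soon as the point falls into an open middle third. Because the orbit of a rational is eventually periodic, the loop terminates. The function name is hypothetical.

```python
from fractions import Fraction

def in_cantor_set(x):
    """Decide whether the rational number x lies in the Cantor set."""
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    seen = set()
    while x not in seen:          # orbit of a rational is eventually periodic
        seen.add(x)
        if 3 * x <= 1:            # left third: next ternary digit may be taken as 0
            x = 3 * x
        elif 3 * x >= 2:          # right third: next ternary digit may be taken as 2
            x = 3 * x - 2
        else:                     # open middle third: digit 1 is unavoidable
            return False
    return True

for q in (Fraction(1, 4), Fraction(1, 3), Fraction(3, 10), Fraction(1, 2), Fraction(1, 5)):
    print(q, in_cantor_set(q))
```

Endpoints such as $1/3 = 0.0\overline{2}$ in base 3 are handled correctly because the closed thirds are tested before the middle third.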
∗ This problem is mentioned in the avant-garde movie Pi by D. Aronofsky.
† It should be noted that Ferdinand Lindemann did his habilitation at Würzburg University in 1877; however, the breakthrough with π he had during his time in Munich.
‡ See http://mathworld.wolfram.com/CantorSet.html for amazing computer animations on this topic.
Only a few methods for constructing normal numbers are known. The first explicit example of a normal number was given by Sierpinski [127]. An easy and convincing example of a normal number is due to Champernowne [30]:
0.123456789 10111213141516171819 2021 . . . .
Moreover, Copeland & Erdös [32] proved that
0.23571113171923293137414347 . . . ,
is normal to base 10. In these examples it is obvious how the numbers are constructed. It is not too difficult to compute any digit explicitly.§ Normal numbers are definitely not made for generating random numbers.
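To get a feeling for how slowly the digit frequencies of Champernowne’s number settle, one can simply count digits. The following sketch (function names are hypothetical) generates the decimal digits by concatenating $1, 2, 3, \ldots$ and tallies the first `num_digits` of them.

```python
from collections import Counter
from itertools import count, islice

def champernowne_digits():
    """Yield the decimal digits of 0.123456789101112... one by one."""
    for n in count(1):
        for ch in str(n):
            yield int(ch)

def digit_counts(num_digits):
    """Tally the first num_digits decimal digits of Champernowne's number."""
    return Counter(islice(champernowne_digits(), num_digits))

# The integers 1..9999 contribute 9 + 180 + 2700 + 36000 = 38889 digits.
counts = digit_counts(38889)
for d in range(10):
    print(d, counts[d], counts[d] / 38889)
```

At such a cut-off each nonzero digit has occurred equally often, whereas the digit 0 still lags behind since leading zeros are never written; the frequencies approach $1/10$ only slowly.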
Figure 1. The first 1600 binary digits of $\pi$ (left) and of its rational approximation $\frac{22}{7}$ (right), ordered in a spiral. Which is the rational number with least denominator which approximates $\pi$ such that the first 1600 binary digits of both numbers are equal?
We return to the number $\pi$. It is expected that there is no pattern hidden in the decimal expansion. It is also conjectured that $\pi$ is normal to any base $b$. Nevertheless, it was a big surprise ten years ago when Bailey, Borwein & Plouffe [10] discovered the so-called BBP-formula (named after their initials) which allows one to compute an arbitrary digit of $\pi$ in the hexadecimal system (base $b = 16$) without knowing any previous digit:
\[ \pi = \sum_{n=0}^{\infty} \frac{1}{16^n} \Big(\frac{4}{8n+1} - \frac{2}{8n+4} - \frac{1}{8n+5} - \frac{1}{8n+6}\Big). \tag{5.4} \]
We shall sketch how to derive this formula. One starts with
\[ \int_0^{1/\sqrt{2}} \frac{x^{k-1}}{1 - x^8}\, dx = \int_0^{1/\sqrt{2}} \sum_{m=0}^{\infty} x^{k-1+8m}\, dx = 2^{-k/2} \sum_{m=0}^{\infty} \frac{1}{16^m} \cdot \frac{1}{8m+k}. \]
Thus, (5.4) is equivalent to
\[ \pi = \int_0^{1/\sqrt{2}} \frac{4\sqrt{2} - 8x^3 - 4\sqrt{2}\,x^4 - 8x^5}{1 - x^8}\, dx = 16 \int_0^1 \frac{y - 1}{y^4 - 2y^3 + 4y - 4}\, dy. \]
Using
\[ \arctan x = \int_0^x \frac{du}{1 + u^2} \]
and partial fraction decomposition (resp. a computer algebra package), this implies the BBP-formula (5.4). But how to read off an arbitrary digit of $\pi$? We explain this by a simpler example, namely,
\[ \log 2 = \sum_{k=1}^{\infty} \frac{1}{k 2^k}, \]
which follows immediately from the power series expansion of the logarithm and Abel’s limit theorem. Thus the digits of the binary expansion of $\log 2$ from the $(d+1)$-th position onwards are the binary digits of
\[ \{2^d \log 2\} = \Bigg\{ \Bigg\{ \sum_{k=1}^{d} \frac{2^{d-k} \bmod k}{k} \Bigg\} + \Bigg\{ \sum_{k=d+1}^{\infty} \frac{2^{d-k}}{k} \Bigg\} \Bigg\}. \]
§ Well, in the case of Copeland & Erdös’s number the computation is not too easy; however, thanks to the recent primality test of Agrawal, Kayal & Saxena [3] the computation can be performed in polynomial time.
The numerators $2^{d-k} \bmod k$ in the first sum can be computed by a method called fast exponentiation.¶ The second sum converges quickly, hence only a few terms need to be computed. In a similar manner, only with more technical effort, one can compute an arbitrary digit of the hexadecimal expansion of $\pi$ by using the BBP-formula (5.4). However, this does not imply any pattern for the digits in this 16-adic expansion (in contrast to Champernowne’s number); at least no consequences for normality are known so far. Recently, Bailey & Crandall [11] made a conjecture how a formula of (5.4)-type (as those above for $\pi$ and $\log 2$) could be related to a sequence of real numbers which should be uniformly distributed modulo one if, and only if, the underlying number is normal. We do not go into the details but mention that, as a consequence of this unproved hypothesis, $\pi$ would be normal to base 16 if the sequence $(x_n)$, given by
\[ x_0 = 0, \qquad x_n = 16 x_{n-1} + \frac{120n^2 - 89n + 16}{512n^4 - 1024n^3 + 712n^2 - 206n + 21}, \tag{5.5} \]
is uniformly distributed modulo one. In case of $\log 2$ normality would result from the uniform distribution of
\[ x_0 = 0, \qquad x_{n+1} = 2\Big(x_n + \frac{1}{n}\Big) \bmod 1. \]
Unfortunately, for both sequences it is not known whether they are uniformly distributed modulo one.
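The digit extraction for $\log 2$ described above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: it works in double precision, so it is only trustworthy for moderate positions, the number of tail terms is an arbitrary choice, and the function names are hypothetical. Python’s built-in three-argument `pow` performs the fast modular exponentiation.

```python
import math

def frac_pow2_log2(d, tail_terms=60):
    """Fractional part of 2**d * log(2), using log 2 = sum_{k>=1} 1/(k 2**k)."""
    s = 0.0
    for k in range(1, d + 1):
        # only the residue of 2**(d-k) modulo k matters for the fractional part
        s = (s + pow(2, d - k, k) / k) % 1.0
    for k in range(d + 1, d + 1 + tail_terms):
        s += 2.0 ** (d - k) / k          # rapidly decaying tail
    return s % 1.0

def binary_digit_log2(position):
    """Binary digit of log 2 at the given position (1 = first digit after the point)."""
    return int(2 * frac_pow2_log2(position - 1))

digits = "".join(str(binary_digit_log2(n)) for n in range(1, 33))
print(digits)
# Cross-check against the floating-point value of log 2.
print(format(int(math.log(2) * 2 ** 32), "032b"))
```

Each digit is computed independently of the previous ones; the same scheme with base 16 and the four sums of (5.4) yields hexadecimal digits of $\pi$.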
A curiosity: if $\pi$ is normal, let us say to base $b = 26$, and if we assign to each of the 26 digits a letter of the Latin alphabet, $A \mapsto 1$, $B \mapsto 2$, . . ., say, then the 26-adic expansion of $\pi$ would include a proof of its normality, provided that this statement is provable.‖
¶ For example, $2^{17} = ((((2^2)^2)^2)^2) \cdot 2$, since $17 = 2^4 + 2^0$.
Exercises
A rolling stone gathers no moss. We start with recurrence:
Exercise 5.1. Prove the following metrical version of Poincaré’s recurrence theorem: assume the conditions of Theorem 5.1 and suppose that $X$ is a metric space with metric $d$ which respects $\mu$. Then, for almost all $x$,
\[ \liminf_{n \to \infty} d(x, T^n x) = 0. \]
Moreover, show that for some $n \le 1 + \lfloor 1/\mu(A) \rfloor$ the inequality $\mu(A \cap T^{-n}A) > 0$ holds.
Next we consider random walks on the unit circle:
Exercise 5.2. Imagine a random walker starts at the point $P$ on the unit circle $\mathbb{T}$ and tosses a fair coin; if it comes up heads the walker moves counterclockwise by the distance $\alpha$ whereas for tails the walker moves clockwise by the distance $\beta$, where $\alpha$ and $\beta$ are positive real numbers. A sequence of coin tossings can be regarded as a binary expansion of a number from the unit interval: $x = (x_1, x_2, \ldots) = \sum_{j \ge 1} x_j 2^{-j}$ with $x_j \in \{0, 1\}$ according to heads or tails. Can the random walker visit any open neighbourhood of a given point on $\mathbb{T}$, and, if so, how long does it take to return to a neighbourhood of the starting point $P$? Hint: for some advice see [31], §7.5.
The next task is to explain Felix’s recurrence:
Exercise 5.3. Prove: if $(X, \mathcal{F}, \mu, T)$ is an ergodic system with discrete space $X$ and uniform distribution $\mu$, then the recurrence is sure. Explain why cat Felix returns completely after $n = 405$ iterations. Can you also explain why the first return of Felix occurs after $n = 135$ iterations? Hint: the picture of Felix consists of $810 \times 810$ pixels; note that 810 is an integer multiple of 405 which is an integer multiple of 135. An illuminating reading might be [42].
The Cantor set and its relatives have a very interesting topological structure.
Exercise 5.4. Prove all statements about the Cantor set C, in particular show
that it has Lebesgue measure λ(C) = 0, does not contain any open interval, is
compact and perfect, and hence uncountable. Moreover, search in the mathematical
literature for generalizations of C (e.g., the Sierpinski gasket).
We conclude with a more explicit task on the BBP-formula:
Exercise 5.5. Give a complete proof for (5.4). Furthermore, implement an algorithm to compute the hexadecimal expansion of π by use of the BBP-formula.
Compare your results with the values xn according to (5.5) and do some statistics
for the digits.
‖ Unfortunately, it would also contain false proofs. There is a computer program on the webpage www.angio.net/pi/bigpi.cgi which finds – if possible – the first appearance of any date in the decimal expansion of π; for instance, my birth date appears at position 151897.
*
*
*
In the next chapter we get to know the Riemann zeta-function which is
one of the main objects in analytic number theory. It is known to play a
central role in prime number distribution theory (as we will briefly highlight),
however, it will also show up in the context of continued fraction expansions
and their ergodic properties in a later chapter. The main tool to study the
zeta-function is complex analysis and in some places our exposition will be
only fragmentary.
CHAPTER 6
Interlude: The Riemann Zeta-Function
The zeta-function is of particular interest in analytic number theory. For $\mathrm{Re}\, s > 1$, zeta is defined by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1}; \tag{6.1} \]
here the product is taken over all prime numbers p. The identity between the
series and the product is an analytic version of the unique prime factorization
of the integers as becomes obvious by expanding each factor of the product
into a geometric series. This type of series is called Dirichlet series and a
product over primes as above is referred to as Euler product. This indicates
that already Euler studied the zeta-function. We give a first glimpse of his
insights by reproducing his one-sentence proof of the infinitude of primes: if
there were only finitely many primes, the product would converge for s = 1,
contradicting the divergence of the harmonic series. This analytic reasoning proved to be more powerful than elementary approaches to prime number distribution. For instance, Euler showed that the sum over the reciprocals of the primes is divergent, which he noted as follows:
\[ \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \ldots = \log\log\infty; \]
in modern notation we would write this in the form $\sum_{p \le x} \frac{1}{p} \sim \log\log x$, as $x \to \infty$. It follows that the primes form a sparse set within $\mathbb{N}$. Unfortunately, this does not imply an asymptotic formula for the number $\pi(x)$ of primes $p \le x$. It was the young Gauss who, at the age of seventeen, conjectured that $\pi(x) \sim \frac{x}{\log x}$, as $x \to \infty$. And it was Riemann who was the first to study $\zeta(s)$ as a function of a complex variable which led, finally, to a proof of Gauss’ conjecture. We shall briefly survey the remarkable relation between primes and zeta in the following.
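The identity between the Dirichlet series and the Euler product in (6.1) can be checked numerically. The sketch below compares a truncated series and a truncated product at $s = 2$; the cut-offs and function names are arbitrary choices made for illustration.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 2 assumed)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def zeta_partial_sum(s, n_max):
    """Truncated Dirichlet series: sum of n**(-s) for n <= n_max."""
    return sum(n ** -s for n in range(1, n_max + 1))

def euler_partial_product(s, p_max):
    """Truncated Euler product: product of (1 - p**(-s))**(-1) over primes p <= p_max."""
    product = 1.0
    for p in primes_up_to(p_max):
        product /= 1.0 - p ** -s
    return product

print(zeta_partial_sum(2, 100_000))
print(euler_partial_product(2, 10_000))
print(math.pi ** 2 / 6)
```

Both truncations approach the same limit; for real $s$ closer to 1 the convergence becomes much slower, in line with the divergence at $s = 1$.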
Since our exposition here will be rather sketchy we recommend further
reading. For more information as well as citations to the original works on
the zeta-function and its impact on prime number distribution we refer to
the classical monograph [137] by Titchmarsh and the historical account
[104] of Narkiewicz.
6.1. Primes and Zeros
It is not difficult to show (e.g., by Riemann’s integral test) that both the series and the product in (6.1) converge absolutely for all complex numbers $s$ with $\mathrm{Re}\, s > 1$. Of special interest are the values of zeta at the positive integers. Euler proved
Theorem 6.1. For $k \in \mathbb{N}$,
\[ \zeta(2k) = (-1)^{k+1} \frac{(2\pi)^{2k}}{2(2k)!} B_{2k}. \]
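Here $B_m$ denotes the $m$th Bernoulli number, defined by the identity
\[ \frac{x}{\exp(x) - 1} = \sum_{m=0}^{\infty} B_m \frac{x^m}{m!} = 1 - \frac{1}{2}x + \frac{1}{6} \cdot \frac{x^2}{2!} - \frac{1}{30} \cdot \frac{x^4}{4!} + \ldots. \]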
The Bernoulli numbers were discovered independently by the Swiss mathematician Jakob Bernoulli and by the Japanese mathematician Seki Kōwa; both discoveries were published posthumously, in 1713 resp. 1712. In particular, we deduce Euler’s famous formula
\[ \zeta(2) = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \ldots = \frac{\pi^2}{6}. \tag{6.2} \]
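We shall give only an idea of his proof (details are left to the reader as Exercise 6.1). Recall the infinite product
\[ \frac{\sin z}{z} = \prod_{\substack{n=-\infty \\ n \ne 0}}^{\infty} \Big(1 - \frac{z}{\pi n}\Big) = \prod_{n=1}^{\infty} \Big(1 - \frac{z^2}{\pi^2 n^2}\Big) \]
and the power series representation
\[ \frac{\sin z}{z} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} \mp \ldots = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k+1)!}. \]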
Comparing the coefficients at z 2 , we deduce (6.2). Euler’s proof was much
discussed by his contemporaries. At his times it was not clear whether sin z
has no complex zeros; furthermore, the convergence of the infinite product
for the sine-function cannot be proved without complex analysis which was
not developed (although questions of convergence in those times were often
considered as negligible). However, today Euler’s argument is waterproof
and might be the easiest proof of all.
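Theorem 6.1 is easy to test numerically. The following sketch computes the Bernoulli numbers exactly from the standard recursion $\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0$ for $m \ge 1$ (a well-known consequence of the generating function, with $B_1 = -\frac12$) and compares Euler’s closed form with partial sums of the series. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli_numbers(m_max):
    """Exact Bernoulli numbers B_0, ..., B_{m_max} (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        # solve sum_{j=0}^{m} C(m+1, j) B_j = 0 for B_m
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def zeta_even(k):
    """zeta(2k) via Euler's formula of Theorem 6.1."""
    B = bernoulli_numbers(2 * k)
    return (-1) ** (k + 1) * (2 * pi) ** (2 * k) / (2 * factorial(2 * k)) * float(B[2 * k])

for k in (1, 2, 3):
    partial = sum(n ** (-2 * k) for n in range(1, 100_001))
    print(2 * k, zeta_even(k), partial)
```

For $k = 1$ and $k = 2$ the closed form reduces to $\pi^2/6$ and $\pi^4/90$.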
It is easily seen that the Bernoulli numbers are rational, hence the values ζ(2k) are rational multiples of π 2k and, by the transcendence of π, they
are even transcendental. However, not too much is known about the arithmetic nature of the values at the positive odd integers. Apéry [4] showed
that ζ(3) is irrational, however, it is unknown whether ζ(3) is transcendental
or whether ζ(5) is irrational.
Next we need an analytic continuation of the zeta-function to the left of this half-plane of absolute convergence; a certain problem is the singularity at $s = 1$ which, in the form of the harmonic series, implied the existence of infinitely many primes in Euler’s proof. Assume $\mathrm{Re}\, s > 1$, then
\[ \sum_{n=1}^{\infty} \frac{1}{n^s} - 2 \sum_{m=1}^{\infty} \frac{1}{(2m)^s} = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s}. \]
Rewriting the left-hand side in terms of the zeta-function, we obtain the representation
\[ \zeta(s) = (1 - 2^{1-s})^{-1} \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s}. \tag{6.3} \]
We observe that the series on the right converges in the half-plane $\mathrm{Re}\, s > 0$. The factor $(1 - 2^{1-s})^{-1}$ has simple poles at $s = 1 + \frac{2\pi i}{\log 2} k$ for any $k \in \mathbb{Z}$; however, for $k \ne 0$ the alternating series vanishes at these points. Hence, (6.3) provides an analytic continuation for $\zeta(s)$ to the half-plane $\mathrm{Re}\, s > 0$ except for a simple pole at $s = 1$.
For a later purpose we shall derive another formula for the zeta-value $\zeta(2)$ which is related to the representation (6.3). Recall the Gamma-function, for complex $s$ with positive real part defined by the integral
\[ \Gamma(s) = \int_0^{\infty} y^{s-1} \exp(-y)\, dy. \tag{6.4} \]
Substituting $y = nu$ we deduce
\[ n^{-s}\, \Gamma(s) = \int_0^{\infty} u^{s-1} \exp(-nu)\, du, \]
resp., for $\mathrm{Re}\, s > 1$,
\[ \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s}\, \Gamma(s) = \int_0^{\infty} u^{s-1} \Big(\sum_{n=1}^{\infty} (-1)^{n-1} \exp(-nu)\Big)\, du; \]
here we may interchange summation and integration because of absolute convergence. Substituting $u = -\log x$ we obtain
\[ \sum_{n=1}^{\infty} (-1)^{n-1} \exp(-nu) = \sum_{n=1}^{\infty} (-1)^{n-1} x^n = \frac{x}{1+x}, \]
hence
\[ (1 - 2^{1-s})\, \zeta(s)\, \Gamma(s) = \int_0^1 (-\log x)^{s-1}\, \frac{dx}{1+x}. \tag{6.5} \]
In particular,
\[ \frac{\pi^2}{12} = \frac{1}{2}\zeta(2) = \int_0^1 (-\log x)\, \frac{dx}{1+x}, \]
which we will need in a later chapter.
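Both sides of (6.5) at $s = 2$ can be approximated with elementary numerics. The sketch below is only a plausibility check under simple assumptions: the alternating series is truncated, and the integral is evaluated with a plain midpoint rule, which tolerates the integrable logarithmic singularity at $x = 0$. Function names and step counts are hypothetical choices.

```python
import math

def alternating_series(s, terms):
    """Partial sum of sum_{n>=1} (-1)**(n-1) / n**s."""
    return sum((-1) ** (n - 1) / n ** s for n in range(1, terms + 1))

def integral_midpoint(s, steps):
    """Midpoint rule for the integral of (-log x)**(s-1) / (1+x) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h            # midpoints avoid the endpoint x = 0
        total += (-math.log(x)) ** (s - 1) / (1.0 + x)
    return total * h

print(alternating_series(2, 10_000))
print(integral_midpoint(2, 100_000))
print(math.pi ** 2 / 12)
```

Since $\Gamma(2) = 1$, all three printed values should agree to several decimal places.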
Here is another way of extending $\zeta(s)$ beyond the domain of absolute convergence of its defining Dirichlet series which is due to Riemann [118] and stands at the beginning of his and others’ investigations of the zeta-function as a function of a complex variable. Substituting $y = \pi n^2 x$ in (6.4), with $\frac{s}{2}$ in place of $s$, leads to
\[ \Gamma\Big(\frac{s}{2}\Big) \pi^{-\frac{s}{2}} \frac{1}{n^s} = \int_0^{\infty} x^{\frac{s}{2}-1} \exp(-\pi n^2 x)\, dx. \tag{6.6} \]
Summing up over all $n \in \mathbb{N}$ yields
\[ \pi^{-\frac{s}{2}}\, \Gamma\Big(\frac{s}{2}\Big) \sum_{n=1}^{\infty} \frac{1}{n^s} = \sum_{n=1}^{\infty} \int_0^{\infty} x^{\frac{s}{2}-1} \exp(-\pi n^2 x)\, dx. \]
On the left-hand side we find the Dirichlet series defining $\zeta(s)$; in view of its convergence, the latter formula is valid only for $\mathrm{Re}\, s > 1$. On the right-hand side we may interchange summation and integration, justified by absolute convergence. Thus we obtain
\[ \pi^{-\frac{s}{2}}\, \Gamma\Big(\frac{s}{2}\Big) \zeta(s) = \int_0^{\infty} x^{\frac{s}{2}-1} \sum_{n=1}^{\infty} \exp(-\pi n^2 x)\, dx. \]
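We split the integral at $x = 1$ and get
\[ \pi^{-\frac{s}{2}}\, \Gamma\Big(\frac{s}{2}\Big) \zeta(s) = \Big(\int_0^1 + \int_1^{\infty}\Big)\, x^{\frac{s}{2}-1} \omega(x)\, dx, \tag{6.7} \]
where the series $\omega(x)$ is given in terms of the ’half’ theta-function of Jacobi, $\theta(x) = \sum_{n \in \mathbb{Z}} \exp(-\pi n^2 x)$:
\[ \omega(x) := \sum_{n=1}^{\infty} \exp(-\pi n^2 x) = \frac{1}{2}(\theta(x) - 1) \]
(since $\exp(-\pi n^2 x) = \exp(-\pi(-n)^2 x)$ for any $n \in \mathbb{N}$). In view of the functional equation for the theta-function,
\[ \omega\Big(\frac{1}{x}\Big) = \frac{1}{2}\Big(\theta\Big(\frac{1}{x}\Big) - 1\Big) = \sqrt{x}\, \omega(x) + \frac{1}{2}(\sqrt{x} - 1), \]
which can be deduced from the Poisson summation formula, we find by the substitution $x \mapsto \frac{1}{x}$ that the first integral in (6.7) is equal to
\[ \int_1^{\infty} x^{-\frac{s}{2}-1} \omega\Big(\frac{1}{x}\Big)\, dx = \int_1^{\infty} x^{-\frac{s+1}{2}} \omega(x)\, dx + \frac{1}{s-1} - \frac{1}{s}. \]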
Substituting this in (6.7) yields
\[ \pi^{-\frac{s}{2}}\, \Gamma\Big(\frac{s}{2}\Big) \zeta(s) = \frac{1}{s(s-1)} + \int_1^{\infty} \Big(x^{-\frac{s+1}{2}} + x^{\frac{s}{2}-1}\Big) \omega(x)\, dx. \tag{6.8} \]
Since $\omega(x) \ll \exp(-\pi x)$, the last integral converges for all values of $s$, and thus (6.8) holds, by analytic continuation, throughout the complex plane. The right-hand side remains unchanged by $s \mapsto 1 - s$. This proves Riemann’s functional equation:
\[ \pi^{-\frac{s}{2}}\, \Gamma\Big(\frac{s}{2}\Big) \zeta(s) = \pi^{-\frac{1-s}{2}}\, \Gamma\Big(\frac{1-s}{2}\Big) \zeta(1-s), \tag{6.9} \]
valid for all complex $s$.
Figure 1. $\zeta(s)$ in the range $s \in [-14.5, 0.5]$.
In view of the Euler product (6.1) it is easily seen that $\zeta(s)$ has no zeros in the half-plane $\mathrm{Re}\, s > 1$. It follows from the functional equation and from basic properties of the Gamma-function that $\zeta(s)$ vanishes in $\mathrm{Re}\, s < 0$ exactly at the so-called trivial zeros $s = -2n$ with $n \in \mathbb{N}$. All other zeros of $\zeta(s)$ are said to be nontrivial, and we denote them by $\rho = \beta + i\gamma$. Obviously, they have to lie inside the so-called critical strip $0 \le \mathrm{Re}\, s \le 1$, and it is easily seen that they are non-real. The functional equation (6.9) and the identity $\zeta(\overline{s}) = \overline{\zeta(s)}$ show some symmetries of $\zeta(s)$. In particular, the nontrivial zeros of $\zeta(s)$ are distributed symmetrically with respect to the real axis and to the vertical line $\mathrm{Re}\, s = \frac{1}{2}$. It was Riemann’s ingenious contribution to number theory to point out how the distribution of these nontrivial zeros is linked to the distribution of prime numbers. Riemann conjectured the asymptotics for the number $N(T)$ of nontrivial zeros $\rho = \beta + i\gamma$ with $0 < \gamma \le T$ (counted according to multiplicities). This conjecture was proved in 1895 by von Mangoldt who found more precisely
\[ N(T) = \frac{T}{2\pi} \log \frac{T}{2\pi e} + O(\log T). \tag{6.10} \]
Riemann worked with the function $t \mapsto \zeta(\frac{1}{2} + it)$ and wrote that very likely all roots $t$ are real, i.e., all nontrivial zeros lie on the so-called critical line $\mathrm{Re}\, s = \frac{1}{2}$. This is the famous, yet unproved Riemann hypothesis which we rewrite equivalently as
Riemann’s hypothesis. $\zeta(s) \ne 0$ for $\mathrm{Re}\, s > \frac{1}{2}$.
Figure 2. The values of $\zeta(\frac{1}{2} + it)$ as $t$ ranges from 0 to 40.
In support of his conjecture, Riemann calculated some zeros; the first one with positive imaginary part is $\rho = \frac{1}{2} + i\,14.134\ldots$. Furthermore, he conjectured that there exist constants $A$ and $B$ such that
\[ s(s-1)\, \pi^{-\frac{s}{2}}\, \Gamma\Big(\frac{s}{2}\Big) \zeta(s) = \exp(A + Bs) \prod_{\rho} \Big(1 - \frac{s}{\rho}\Big) \exp\Big(\frac{s}{\rho}\Big), \]
where the product on the right is taken over all nontrivial zeros (the trivial zeta zeros are cancelled by the poles of the Gamma-factor). This was shown by Hadamard in 1893 (by means of his theory of product representations of entire functions). Finally, Riemann conjectured the so-called explicit formula which states that
\[ \pi(x) + \sum_{n=2}^{\infty} \frac{\pi(x^{1/n})}{n} = \mathrm{li}(x) - \sum_{\substack{\rho = \beta + i\gamma \\ \gamma > 0}} \big(\mathrm{li}(x^{\rho}) + \mathrm{li}(x^{1-\rho})\big) + \int_x^{\infty} \frac{du}{u(u^2 - 1)\log u} - \log 2 \tag{6.11} \]
for any $x \ge 2$ not being a prime power (otherwise a term $\frac{1}{2k}$ has to be added on the left-hand side, where $x = p^k$). The appearing integral logarithm is defined by
\[ \mathrm{li}(x^{\beta + i\gamma}) = \int_{(-\infty + i\gamma)\log x}^{(\beta + i\gamma)\log x} \frac{\exp(z)}{z + \delta i \gamma}\, dz, \]
where $\delta = +1$ if $\gamma > 0$ and $\delta = -1$ otherwise. The explicit formula was proved by von Mangoldt in 1895 as a consequence of both product representations for $\zeta(s)$, the Euler product (6.1) and the Hadamard product.
Building on these ideas, Hadamard and de la Vallée-Poussin found (independently) in 1896 the first proof of Gauss’ conjecture, the celebrated prime number theorem. For technical reasons it is of advantage to work with the logarithmic derivative of $\zeta(s)$ which is for $\mathrm{Re}\, s > 1$ given by
\[ \frac{\zeta'}{\zeta}(s) = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}, \]
where the von Mangoldt $\Lambda$-function is defined by
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ with } k \in \mathbb{N}, \\ 0 & \text{otherwise.} \end{cases} \tag{6.12} \]
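A lot of information concerning the prime counting function $\pi(x)$ can be recovered from information about
\[ \psi(x) := \sum_{n \le x} \Lambda(n) = \sum_{p \le x} \log p + O\big(x^{\frac{1}{2}} \log x\big). \]
Partial summation gives $\pi(x) \sim \frac{\psi(x)}{\log x}$. First of all, we shall express $\psi(x)$ in terms of the zeta-function. If $c$ is a positive constant, then
\[ \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{x^s}{s}\, ds = \begin{cases} 1 & \text{if } x > 1, \\ 0 & \text{if } 0 < x < 1. \end{cases} \]
This yields the so-called Perron formula: for $x \notin \mathbb{Z}$ and $c > 1$,
\[ \psi(x) = -\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{\zeta'}{\zeta}(s)\, \frac{x^s}{s}\, ds. \tag{6.13} \]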
Moving the path of integration to the left, we find that the latter expression is equal to the corresponding sum of residues, that is, the residues of the integrand at the pole of $\zeta(s)$ at $s = 1$, at the zeros of $\zeta(s)$, and at the additional pole of the integrand at $s = 0$. The main term turns out to be
\[ \mathrm{Res}_{s=1}\Big(-\frac{\zeta'}{\zeta}(s)\, \frac{x^s}{s}\Big) = \lim_{s \to 1}\, (s - 1) \Big(\frac{1}{s-1} + O(1)\Big) \frac{x^s}{s} = x, \]
whereas each nontrivial zero $\rho$ gives the contribution
\[ \mathrm{Res}_{s=\rho}\Big(-\frac{\zeta'}{\zeta}(s)\, \frac{x^s}{s}\Big) = -\frac{x^{\rho}}{\rho}. \]
By the same reasoning, the trivial zeros altogether contribute
\[ \sum_{n=1}^{\infty} \frac{x^{-2n}}{2n} = -\frac{1}{2} \log\Big(1 - \frac{1}{x^2}\Big). \]
Incorporating the residue at $s = 0$, this leads to the exact explicit formula
\[ \psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \frac{1}{2} \log\Big(1 - \frac{1}{x^2}\Big) - \log(2\pi), \]
which is equivalent to Riemann’s formula (6.11). This formula is valid for any $x \notin \mathbb{Z}$. Notice that the right-hand side of this formula is not absolutely convergent. If $\zeta(s)$ had only finitely many nontrivial zeros, the right-hand side would be a continuous function of $x$, contradicting the jumps of $\psi(x)$ for prime powers $x$. Going on, it is more convenient to cut the integral in (6.13) at $t = \pm T$ which leads to the truncated version
\[ \psi(x) = x - \sum_{|\gamma| \le T} \frac{x^{\rho}}{\rho} + O\Big(\frac{x}{T} (\log(xT))^2\Big), \tag{6.14} \]
valid for all values of $x$. Next we need information on the distribution of the nontrivial zeros. Already the non-vanishing of $\zeta(s)$ on the line $\mathrm{Re}\, s = 1$ yields the asymptotic relations $\psi(x) \sim x$, resp. $\pi(x) \sim \mathrm{li}(x)$, which is Gauss’ conjecture and sufficient for many applications. However, more precise asymptotics with a remainder term can be obtained by a zero-free region inside the critical strip. The largest known zero-free region for $\zeta(s)$ was found by Vinogradov and Korobov (independently) in 1958 who proved
\[ \zeta(s) \ne 0 \quad\text{in}\quad \mathrm{Re}\, s \ge 1 - \frac{c}{(\log(|t| + 3))^{\frac{2}{3}} (\log\log(|t| + 3))^{\frac{1}{3}}}, \]
where $c$ is some positive absolute constant. In combination with the Riemann-von Mangoldt formula (6.10) one can estimate the sum over the nontrivial zeros in (6.14). Balancing out $T$ and $x$, we obtain the prime number theorem with the sharpest known remainder term:
Theorem 6.2. There exists an absolute positive constant $C$ such that for sufficiently large $x$
\[ \pi(x) = \mathrm{li}(x) + O\Bigg(x \exp\Bigg(-C\, \frac{(\log x)^{\frac{3}{5}}}{(\log\log x)^{\frac{1}{5}}}\Bigg)\Bigg). \]
By the explicit formula (6.14) the impact of the Riemann hypothesis on the prime number distribution becomes visible. In 1900, von Koch showed that for fixed $\theta \in [\frac{1}{2}, 1)$
\[ \pi(x) - \mathrm{li}(x) \ll x^{\theta + \epsilon} \iff \zeta(s) \ne 0 \ \text{ for } \ \mathrm{Re}\, s > \theta; \tag{6.15} \]
equivalently, one can replace the left-hand side by $\psi(x) - x$. Here and in the sequel $\epsilon$ stands for an arbitrarily small positive constant, not necessarily the same at each appearance. With regard to known zeros of $\zeta(s)$ on the critical line it turns out that an error term with $\theta < \frac{1}{2}$ is impossible. Thus, the Riemann hypothesis states that the prime numbers are as uniformly distributed as possible!
Many computations were done to find a counterexample to the Riemann hypothesis. Van de Lune, te Riele & Winter localized the first
1 500 000 001 zeros, all lying without exception on the critical line; moreover, they are all simple! Observations like this support the conjecture that
all or at least almost all zeros of the zeta-function are simple. This is the
so-called essential simplicity hypothesis. Already classical density theorems
(e.g. those of Bohr & Landau) show that most of the zeros lie arbitrarily
close to the critical line. On the other hand, Hardy showed that infinitely
many zeros lie on the critical line. Refining a mollifying technique of Selberg, Levinson localized more than one third of the nontrivial zeros of
the zeta-function on the critical line, and as Heath-Brown and Selberg
(unpublished) discovered, they are all simple. The current record is due to
Conrey who showed that more than two fifths of the zeros are simple and
on the critical line.
We give a heuristic probabilistic argument for the truth of Riemann’s
hypothesis due to Denjoy. For this purpose we introduce the Möbius µ-function, which is defined by µ(1) = 1, µ(n) = 0 if n has a quadratic divisor
≠ 1, and $\mu(n) = (-1)^r$ if n is the product of r distinct primes. It is easily
seen that µ(n) is multiplicative and appears as coefficients of the Dirichlet
series representation of the reciprocal of the zeta-function:
\[
\zeta(s)^{-1} = \prod_{p} \left( 1 - \frac{1}{p^s} \right) = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s},
\]
valid for Re s > 1. Riemann’s hypothesis is equivalent to the estimate
\[
M(x) := \sum_{n \le x} \mu(n) \ll x^{\frac{1}{2}+\epsilon}.
\]
This is related to (6.15). Now Denjoy [39] argued as follows: Assume that
{Xn } is a sequence of random variables with distribution
\[
P(X_n = +1) = P(X_n = -1) = \frac{1}{2}.
\]
Define $S_0 = 0$ and $S_n = \sum_{j=1}^{n} X_j$; then $\{S_n\}$ is a symmetrical random
walk in $\mathbb{Z}^2$ with starting point at 0. A simple application of Chebyshev's
inequality yields, for any positive c,
\[
P\{ |S_n| \ge c\, n^{1/2} \} \le \frac{1}{c^2},
\]
which shows that large values for $S_n$ are rare events. By the theorem of
de Moivre-Laplace this can be made more precise. It follows that
\[
\lim_{n\to\infty} P\{ |S_n| < c\, n^{1/2} \} = \frac{1}{\sqrt{2\pi}} \int_{-c}^{c} \exp\left( -\frac{x^2}{2} \right) dx.
\]
Since the right-hand side above tends to 1 as c → ∞, we obtain
\[
\lim_{n\to\infty} P\{ |S_n| \ll n^{\frac{1}{2}+\epsilon} \} = 1
\]
for every ε > 0. We observe that this might be regarded as a model for the
value-distribution of the Möbius µ-function. The law of the iterated logarithm
would even give the stronger estimate
\[
\lim_{n\to\infty} P\{ |S_n| \ll (n \log\log n)^{1/2} \} = 1,
\]
which suggests for M(x) the upper bound $(x \log\log x)^{1/2}$. This estimate is
pretty close to the so-called weak Mertens hypothesis which states
\[
\int_{1}^{X} \left( \frac{M(x)}{x} \right)^2 dx \ll \log X.
\]
Note that this bound implies the Riemann hypothesis and the essential
simplicity hypothesis. On the contrary, Odlyzko & te Riele disproved
the original Mertens hypothesis,
\[
|M(x)| < x^{1/2},
\]
by showing that
\[
\liminf_{x\to\infty} \frac{M(x)}{x^{1/2}} < -1.009 \quad \text{and} \quad \limsup_{x\to\infty} \frac{M(x)}{x^{1/2}} > 1.06. \tag{6.16}
\]
Figure 3. The random walk generated by the values of the
µ-function for n ≤ 10 000.
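The summatory function M(x) discussed above is easy to tabulate. The following Python sketch is an illustration only and not part of the original notes; the helper names `mobius_sieve` and `mertens` are our own choice. It computes µ(n) by a sieve and compares |M(x)| with the square root of x in the range of Figure 3.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] = Moebius function of n for 0 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for k in range(p, limit + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]            # one more prime factor: flip the sign
            for k in range(p * p, limit + 1, p * p):
                mu[k] = 0                 # divisible by a square
    return mu


def mertens(limit):
    """Return a list M with M[x] = sum of mu(n) over 1 <= n <= x."""
    mu = mobius_sieve(limit)
    M = [0] * (limit + 1)
    for n in range(1, limit + 1):
        M[n] = M[n - 1] + mu[n]
    return M


M = mertens(10_000)
# Largest value of |M(x)| / sqrt(x) for 2 <= x <= 10000.
worst = max(abs(M[x]) / x ** 0.5 for x in range(2, 10_001))
print(M[10], M[100], M[1000], M[10_000], worst)
```

In this small range the ratio stays below 1, in accordance with the fact that counterexamples to the original Mertens hypothesis are not accessible to naive computation.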
6.2. Applications of Uniform Distribution and Ergodic Theory
Our first application deals with the arithmetic nature of the ordinates
of the nontrivial zeros of the zeta-function. Rademacher [115] proved the
remarkable result that these ordinates are uniformly distributed modulo one
provided that the Riemann hypothesis is true; later Elliott [48] remarked
that the latter condition can be removed, and (independently) Hlawka [64]
obtained the following unconditional
Theorem 6.3. The ordinates of the nontrivial zeros of the zeta-function
are uniformly distributed modulo one.
Proof. We need some deep results from zeta-function theory. We start with
a theorem of Landau [94] who proved, for x > 1,
\[
\sum_{0 < \gamma \le T} x^{\rho} = -\Lambda(x)\, \frac{T}{2\pi} + O(\log T),
\]
where the summation is over all nontrivial zeros ρ = β + iγ and Λ(x) is
the von Mangoldt Λ-function, defined by (6.12). Hence, in view of the
Riemann-von Mangoldt formula (6.10) it follows that
\[
\frac{1}{N(T)} \sum_{0 < \gamma \le T} x^{\rho} \ll \frac{1}{\log T}. \tag{6.17}
\]
We do not want to argue under assumption of the Riemann hypothesis. For
this aim we observe that
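\[
|x^{\frac{1}{2}+i\gamma} - x^{\beta+i\gamma}| \le x^{\beta}\, |\exp((\tfrac{1}{2} - \beta) \log x) - 1| \le x^{\beta} \log x\, |\beta - \tfrac{1}{2}|.
\]
Thus,
\[
\frac{1}{N(T)} \sum_{0 < \gamma \le T} |x^{\frac{1}{2}+i\gamma} - x^{\beta+i\gamma}| \le \frac{x \log x}{N(T)} \sum_{0 < \gamma \le T} |\beta - \tfrac{1}{2}|. \tag{6.18}
\]
Next we shall use a result of Littlewood [99], namely
\[
\sum_{0 < \gamma \le T} |\beta - \tfrac{1}{2}| \ll T \log\log T.
\]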
It should be noted that Selberg improved upon this result by replacing the
right-hand side by T; both estimates indicate that most of the zeta zeros are
clustered around the critical line. Inserting this in (6.18) and using (6.10)
leads to
\[
\frac{1}{N(T)} \sum_{0 < \gamma \le T} (x^{\frac{1}{2}+i\gamma} - x^{\beta+i\gamma}) \ll \frac{\log\log T}{\log T}.
\]
Thus, it follows from (6.17) that also
\[
\frac{1}{N(T)} \sum_{0 < \gamma \le T} x^{\frac{1}{2}+i\gamma} \ll \frac{\log\log T}{\log T}.
\]
Letting $x = z^h$ with some real number z > 1 and h ∈ N, we deduce
\[
\frac{1}{N(T)} \sum_{0 < \gamma \le T} \exp(ih\gamma \log z) \ll \frac{\log\log T}{\log T},
\]
which tends to zero as T → ∞. Hence, it follows from Weyl's criterion,
Theorem 1.4, that the sequence of numbers $\frac{1}{2\pi} \gamma \log z$ is uniformly distributed
modulo one; the choice $z = \exp(2\pi)$ gives the assertion of the theorem. •
It is a long-standing conjecture that the ordinates of the nontrivial zeros are linearly independent over the rationals.∗ Ingham observed an interesting impact on the distribution of values of the Möbius µ-function
in showing that, if the ordinates of the nontrivial zeros are indeed linearly independent over the rationals, then $\limsup_{x\to\infty} M(x) x^{-1/2} = +\infty$ and
$\liminf_{x\to\infty} M(x) x^{-1/2} = -\infty$, which should be compared with (6.16).

∗ One may ask why to expect anything like that? Well, an appropriate answer would
be: why not? There is definitely no reason why the zeros should satisfy some algebraic
relations, hence it is reasonable to expect the converse.

Our second aim is a recent application of ergodic theory (see [130]). For
this purpose we need an ergodic transformation on the real line. Adler &
Weiss [2] trace back the transformation $x \mapsto x - \frac{1}{x}$ to a paper of Boole [23]
from the second half of the nineteenth century who observed the remarkable
identity $\int_{\mathbb{R}} f(x)\, dx = \int_{\mathbb{R}} f(x - \frac{1}{x})\, dx$, valid for all continuous functions f;
we quote from their introduction:
”Now as is well known there are fundamental differences between the measure preserving transformations of finite measure spaces and those of infinite measure-spaces. In particular the latter theory suffers from a paucity of good examples,
and so a natural question arose – what can ergodic theory say
about Boole’s transformation...”
Adler & Weiss prove that Boole's transformation is indeed ergodic, as
the dear reader probably already suspected. For some reason we shall study
a different, however related, transformation that has the advantage of an
associated finite measure space.
Recall the transformation given by T0 = 0 and $Tx = \frac{1}{2}(x - \frac{1}{x})$ for x ≠ 0
on R (that is Example 7 from Chapter 3). Our aim is to study the values of
the zeta-function on vertical lines with respect to this transformation. First
of all, we note that T is ergodic: the only T-invariant sets A with
respect to the related probability measure P, given by (3.3), are A = {0}
and A = R, for which P(A) = 0 or = 1. Now
define
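\[
\mathbb{R} \ni x \mapsto f(x) := \zeta(s + ix),
\]
then f is integrable with respect to P for all s with $\operatorname{Re} s > -\frac{1}{2}$. This follows
immediately from the estimate
\[
\zeta(\sigma + it) \ll t^{\mu(\sigma)+\epsilon} \quad \text{with} \quad \mu(\sigma) \le
\begin{cases}
0 & \text{if } \sigma > 1, \\
\frac{1-\sigma}{2} & \text{if } 0 \le \sigma \le 1, \\
\frac{1}{2} - \sigma & \text{if } \sigma < 0,
\end{cases} \tag{6.19}
\]
as t → ∞. Hence, applying Birkhoff's ergodic theorem implies, for $\operatorname{Re} s > -\frac{1}{2}$,
\[
\lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \zeta(s + iT^n x) = \frac{1}{\pi} \int_{\mathbb{R}} \zeta(s + i\tau)\, \frac{d\tau}{1+\tau^2}
\]
for almost all x ∈ R. For the evaluation of these ergodic limits we shall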
use another interpretation of these integrals. Recently, Lifshits & Weber
[97] published a paper entitled "Sampling the Lindelöf Hypothesis with the
Cauchy Random Walk", a title which explains the content of their interesting paper
very well. If $(X_m)$ is an infinite sequence of independent Cauchy distributed
random variables, the Cauchy random walk is defined by $C_n = \sum_{m \le n} X_m$.
Lifshits & Weber proved among other things (in slightly different notation) that almost surely
\[
\frac{1}{N} \sum_{1 \le n \le N} \zeta(\tfrac{1}{2} + iC_n) = 1 + o(N^{-\frac{1}{2}} (\log N)^b) \tag{6.20}
\]
for any b > 2. It should be noted that the expectations $EX_m$ and $EC_n$
do not exist, and, indeed, the values of $C_n$ provide a sampling of randomly
distributed real numbers of unpredictable size. Hence, the almost sure convergence theorem of Lifshits & Weber shows that the expectation value
of ζ(s) on the Cauchy random walk $s = \frac{1}{2} + iC_n$ equals one, which indicates
that most of the values of the zeta-function on the critical line are pretty
small. The yet unproved Lindelöf hypothesis states that, for any ε > 0,
\[
\zeta(\tfrac{1}{2} + it) \ll t^{\epsilon} \tag{6.21}
\]
as t → ∞. The Riemann hypothesis implies the Lindelöf hypothesis (see
[137]) and, thus, the Lindelöf hypothesis serves in some applications as a
valuable substitute. The presently best estimate in this direction is due to
Huxley who obtained the exponent $\frac{32}{205} + \epsilon$ in place of the tiny ε above.
Our Cesàro mean $\frac{1}{N} \sum_{0 \le n < N} \zeta(s + iT^n x)$ may as well be interpreted as a
sample for testing the Lindelöf hypothesis.
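To get a feeling for such Cesàro means along the orbit of T, one may experiment with a simpler observable than the zeta-function. The following Python sketch is an illustration under our own assumptions and not part of the original notes: the bounded test function f(t) = 1/(1 + t²) is our choice, its mean with respect to the Cauchy density is 1/2, and the names `T` and `birkhoff_mean` are hypothetical helpers. In floating point the computed orbit is only a pseudo-orbit, so the output should be read as a plausibility check and nothing more. The sketch also checks the elementary identity T(cot θ) = cot 2θ, which exhibits T as a doubling map in disguise.

```python
import math


def T(x):
    """The map T0 = 0, Tx = (x - 1/x)/2 from the text."""
    return 0.0 if x == 0.0 else 0.5 * (x - 1.0 / x)


def birkhoff_mean(f, x, n_steps):
    """Cesaro mean (1/N) * sum of f(T^n x) over 0 <= n < N."""
    total = 0.0
    for _ in range(n_steps):
        total += f(x)
        x = T(x)
    return total / n_steps


def f(t):
    """Bounded test observable (our assumption, not taken from the text)."""
    return 1.0 / (1.0 + t * t)


theta = 0.3
# T(cot(theta)) should agree with cot(2*theta) up to rounding.
doubling_gap = abs(T(1.0 / math.tan(theta)) - 1.0 / math.tan(2 * theta))
mean = birkhoff_mean(f, 42.0, 200_000)
print(doubling_gap, mean)   # the mean is expected to be close to 1/2
```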
Noting that the density function of a Cauchy distributed random variable X is given by $\tau \mapsto \frac{1}{\pi(1+\tau^2)}$, it follows that the associated probability
measure is also given by (3.3), as our ergodic measure, and the integral in
question is thus nothing but the expectation of $\zeta(\frac{1}{2} + iX)$,
\[
E\zeta(\tfrac{1}{2} + iX) = \frac{1}{\pi} \int_{\mathbb{R}} \zeta(\tfrac{1}{2} + i\tau)\, \frac{d\tau}{1+\tau^2}.
\]
In their account to prove (6.20) Lifshits & Weber computed by elementary means several expectation values, in particular this one, which yields
\[
\lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \zeta(\tfrac{1}{2} + iT^n x) = \zeta(\tfrac{3}{2}) - \tfrac{8}{3} = -0.05429\ldots.
\]
See [130] for a different proof which relies on the calculus of residues. It is
not difficult to consider other vertical lines rather than the critical line. The
general result is
Theorem 6.4. Let s be given with $\operatorname{Re} s > -\frac{1}{2}$. Then
\[
\lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \zeta(s + iT^n x) = \frac{1}{\pi} \int_{\mathbb{R}} \zeta(s + i\tau)\, \frac{d\tau}{1+\tau^2}
\]
for almost all x ∈ R. Define
\[
\ell(s) = \frac{1}{\pi} \int_{\mathbb{R}} \zeta(s + i\tau)\, \frac{d\tau}{1+\tau^2}.
\]
If Re s < 1, then
\[
\ell(s) = \zeta(s+1) - \frac{2}{s(2-s)},
\]
where the case of s = 0 is included as $\ell(0) = \lim_{s\to 0} \ell(s) = \gamma - \frac{1}{2}$,
where γ = 0.577 . . . is the Euler-Mascheroni constant defined as
$\gamma = \lim_{M\to\infty} \left( \sum_{m=1}^{M} \frac{1}{m} - \log M \right)$. If s = 1 + it with some real number t, then
\[
\ell(s) = \zeta(s+1) - \frac{1}{s(2-s)} = \zeta(2+it) - \frac{1}{1+t^2}.
\]
Finally, ℓ(s) = ζ(s + 1) for Re s > 1.
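The closed form for ℓ(s) can be evaluated numerically for real s between 0 and 1. The following Python sketch is an illustration and not part of the original notes; `zeta_real` is a hypothetical helper which approximates ζ(s) for real s > 1 by a partial sum together with the first Euler-Maclaurin correction terms, and the cutoff 1000 is an arbitrary choice.

```python
def zeta_real(s, cutoff=1000):
    """Approximate zeta(s) for real s > 1: partial sum up to cutoff - 1
    plus the first Euler-Maclaurin correction terms."""
    head = sum(n ** -s for n in range(1, cutoff))
    tail = cutoff ** (1 - s) / (s - 1) + 0.5 * cutoff ** -s
    tail += s * cutoff ** (-s - 1) / 12
    return head + tail


def ell(s):
    """Closed form of Theorem 6.4, here only for real 0 < s < 1."""
    return zeta_real(s + 1) - 2 / (s * (2 - s))


print(ell(0.5))     # compare with zeta(3/2) - 8/3 = -0.05429...
print(ell(1e-4))    # should be near gamma - 1/2 = 0.0772...
```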
Moreover, this allows us to give an equivalent formulation of the Riemann
hypothesis in terms of our ergodic transformation. It is widely expected
that if the Riemann hypothesis is true, this should be related to the Euler
product (6.1) although this representation is valid only for Re s > 1. This
belief is grounded on counterexamples to the Riemann hypothesis which
have a Dirichlet series expansion and satisfy a Riemann-type functional
equation (see [137], §10.25). In many reformulations of the Riemann hypothesis one can find a multiplicative feature inside. For our purpose we
replace the zeta-function by its logarithm which is, thanks to the Euler
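product, also representable as a Dirichlet series in its half-plane of convergence. We denote the nontrivial zeros of ζ(s) by ρ. Balazard, Saias &
Yor [12] proved
\[
\frac{1}{2\pi} \int_{\operatorname{Re} s = \frac{1}{2}} \frac{\log|\zeta(s)|}{|s|^2}\, |ds| = \sum_{\operatorname{Re} \rho > \frac{1}{2}} \log\left| \frac{\rho}{1-\rho} \right|, \tag{6.22}
\]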
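and deduced (the obvious consequence) that the Riemann hypothesis is true
if, and only if, the integral vanishes. Substituting $t = \frac{\tau}{2}$ the integral in (6.22)
can be rewritten as
\[
\frac{1}{2\pi} \int_{-\infty}^{\infty} \log|\zeta(\tfrac{1}{2} + it)|\, \frac{dt}{|\frac{1}{2} + it|^2} = \frac{1}{\pi} \int_{\mathbb{R}} \log|\zeta(\tfrac{1}{2} + \tfrac{1}{2} i\tau)|\, \frac{d\tau}{1+\tau^2},
\]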
which Balazard, Saias & Yor also interpret as the expectation value of
log |ζ(s)| of a Brownian motion on the critical line with Cauchy distribution. We may interpret this integral as limit of a Cesàro mean under
application of Birkhoff’s ergodic theorem; the applicability of the ergodic
theorem is obvious by (6.22). This leads to
Theorem 6.5. For almost all x ∈ R,
\[
\lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \log|\zeta(\tfrac{1}{2} + \tfrac{1}{2} iT^n x)| = \sum_{\operatorname{Re} \rho > \frac{1}{2}} \log\left| \frac{\rho}{1-\rho} \right|;
\]
in particular, the Riemann hypothesis is true if, and only if, either side
vanishes, the left-hand side for almost all real x.
We have checked the statement of Theorem 6.5 for various values of x
numerically. For instance, with the initial value x = 42 we found
\[
10^{-6} \sum_{0 \le n < 10^{k}} \log|\zeta(\tfrac{1}{2} + iT^n 42)| = -0.00004\,45327\ldots.
\]
Figure 4. The values of $\zeta(\frac{1}{2} + it)$ as −155 ≤ t ≤ 155 in red
and the values of $\zeta(\frac{1}{2} + iT^n x)$ with x = 42 for 0 ≤ n < 100
in black; the range for t is according to the values $T^n 42$.

There is an important application of Birkhoff's ergodic theorem in
the value-distribution theory of zeta- and L-functions. In 1975 Voronin
[146] discovered a remarkable approximation property of the zeta-function.
His famous universality theorem states: Let $0 < r < \frac{1}{4}$ and g(s) be a non-vanishing continuous function defined on the disk |s| ≤ r, which is analytic
in the interior of the disk. Then, for any ε > 0, there exists a real number
τ > 0 such that
\[
\max_{|s| \le r} \left| \zeta(s + \tfrac{3}{4} + i\tau) - g(s) \right| < \epsilon;
\]
moreover, the set of all τ ∈ [0, T] with this property has positive lower density with respect to the Lebesgue measure. Meanwhile many examples of
universal zeta-functions are known; for example, Dirichlet L-functions
which are given by
\[
L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_{p} \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1},
\]
where χ is a Dirichlet character (that is a group homomorphism on the
group of residue classes Z/q Z), or zeta-functions associated with number
fields (see [129] for an overview). Besides Voronin’s original proof there is a
probabilistic approach to universality due to Bagchi, Reich, Laurinčikas
and further developed by many others. In this method the ergodic theorem
replaces the use of Weyl’s uniform distribution theorem in Voronin’s approach. It is conjectured that universality is an ergodic phenomenon (see
[101]). More on this fascinating topic can be found in [95, 101, 129]; also
have a look at [13] for a slightly different presentation.
Interestingly, Birkhoff proved a universality theorem long before
Voronin. In [19] he showed the existence of an entire function f (z) with
the property that, given any entire function g(z), there exists a sequence of
complex numbers an such that
\[
f(z + a_n) \xrightarrow[n\to\infty]{} g(z)
\]
uniformly on compact subsets of C.
Although this result has a striking similarity with Voronin’s theorem,
Birkhoff’s universal function f is not explicitly known and the Riemann
zeta-function and its relatives are so far the only explicitly known universal
functions.
Exercises
The first appearance of the zeta defining series seems to be in the work of the 14th
century scientist Oresme. The evaluation of the sum over the reciprocals of the
squares was one of the great open problems of the beginning of the 18th century.
Where the Bernoullis failed Euler succeeded.
Exercise 6.1. Give a rigorous proof of Theorem 6.1. A good hint could be the
following formula:
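\[
\sum_{k=1}^{\infty} (-1)^k \frac{(2\pi)^{2k}}{(2k)!} B_{2k} z^{2k} = \pi z \cot(\pi z) - 1 = z \frac{d}{dz} \log \frac{\sin(\pi z)}{\pi z}.
\]
Moreover, evaluate ζ(2) along the following way: verify
\[
\begin{aligned}
\frac{3}{4} \sum_{n=1}^{\infty} \frac{1}{n^2} = \sum_{m=0}^{\infty} \frac{1}{(2m+1)^2}
&= \sum_{m=0}^{\infty} \int_0^1 \int_0^1 x^{2m} y^{2m}\, dx\, dy \\
&= \int_0^1 \int_0^1 \sum_{m=0}^{\infty} (xy)^{2m}\, dx\, dy = \int_0^1 \int_0^1 \frac{dx\, dy}{1 - x^2 y^2}.
\end{aligned}
\]
Use the transformation
\[
x = \frac{\sin u}{\cos v} \quad \text{and} \quad y = \frac{\sin v}{\cos u}
\]
in order to compute the appearing double integral above and deduce
\[
\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.
\]
This elementary method is due to Calabi.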
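Before attacking the double integral one may convince oneself numerically that the sum over the odd squares is π²/8, which is what the first identity of the chain predicts. The following Python sketch is only a sanity check and no substitute for the proof; the function name `odd_square_sum` is our own.

```python
import math


def odd_square_sum(terms):
    """Partial sum of 1/(2m+1)^2 over 0 <= m < terms."""
    return sum(1.0 / (2 * m + 1) ** 2 for m in range(terms))


partial = odd_square_sum(100_000)
print(partial, math.pi ** 2 / 8)            # the two values nearly agree
print(4 * partial / 3, math.pi ** 2 / 6)    # hence zeta(2) = pi^2 / 6
```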
It is remarkable that already Euler had partial results toward the functional
equation for ζ(s), namely, formulae for the values of ζ(s) for integral s and for
half-integral s relating s with 1 − s although he considered ζ(s) as a function of a
real variable s and so the pole at s = 1 is a severe barrier for continuation of ζ(s)
on the real axis. Here is a sketch of his reasoning: for m ∈ N0 ,
\[
1^m - 2^m + 3^m \mp \ldots = (1 - 2^{m+1})\, \zeta(-m), \tag{6.23}
\]
and
\[
1^m x - 2^m x^2 + 3^m x^3 \mp \ldots = \left( x \frac{d}{dx} \right)^{m} \frac{x}{1+x}.
\]
Using the latter formula with x = exp(2πiw), we get
\[
(1 - 2^{m+1})\, \zeta(-m) = (2\pi i)^{-m} \left( \frac{d}{dw} \right)^{m} \frac{\exp(2\pi i w)}{1 + \exp(2\pi i w)} \Bigg|_{w=0}.
\]
This leads to a formula relating values of the zeta-function at s = 2k and 1 − s =
1 − 2k. Euler’s proof needs a modified notion of convergence – this is obvious with
respect to (6.23); using summability arguments one can make also this approach
waterproof.
Exercise 6.2. Read in [59, 137] and provide a rigorous proof for the following
statement due to Euler: for n ∈ N,
\[
\zeta(1-n) = -\frac{B_n}{n}.
\]
Exercise 6.3. Try to give complete proofs of Theorem 6.4 and 6.5. Apply these
ideas to other zeta- or L-functions and inform the author of your results...
* * *
In the next two chapters we will learn about continued fractions and their
remarkable diophantine approximation properties as well as their ergodic
behaviour.
CHAPTER 7
Crash Course in Continued Fractions
Continued fractions are a powerful tool in Diophantine approximation
theory. They have been used for long time and in various cultures, however,
a systematic theory for continued fractions was only developed in the 17th
century by the astronomer and mathematician Christiaan Huygens while
constructing a mechanical planetarium.
7.1. The Euclidean Algorithm Revisited
Recall the Euclidean algorithm: given two positive integers a and b,
define $r_{-1} := a$, $r_0 := b$ and apply successively division with remainder as
follows:
\[
r_{n-1} = a_n r_n + r_{n+1} \quad \text{with} \quad 0 \le r_{n+1} < r_n
\]
for n = 0, 1, 2, . . .. The sequence of remainders $r_n$ is a strictly decreasing
sequence of non-negative integers, hence the algorithm terminates and it turns
out, by elementary divisibility properties, that the least non-vanishing remainder $r_m$ is equal to the greatest common divisor of a and b, which we
denote by $r_m = \gcd(a, b)$. We may rewrite the Euclidean algorithm as
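\[
\frac{r_{n-1}}{r_n} = \left\lfloor \frac{r_{n-1}}{r_n} \right\rfloor + \frac{r_{n+1}}{r_n} \quad \text{with} \quad 0 \le r_{n+1} < r_n \tag{7.1}
\]
for n ≤ m. Here we have $a_n = \lfloor r_{n-1}/r_n \rfloor$, which implies
\[
\frac{a}{b} = \frac{r_{-1}}{r_0} = a_0 + \left( \frac{r_0}{r_1} \right)^{-1} = a_0 + \cfrac{1}{a_1 + \left( \cfrac{r_1}{r_2} \right)^{-1}} = \ldots.
\]
The first equality yields the integral part $a_0$ of $\frac{a}{b}$; disregarding the remainder terms $r_1, \ldots$, the further equalities provide better and better rational
approximations.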
An example: the tropical year, that is by definition the time from one
spring equinox to the next, consists of
\[
365 \text{ days } 5 \text{ hours } 48 \text{ minutes and } 45.8 \text{ seconds} \ \approx \ 365 + \frac{419}{1730} \text{ days}.
\]
Unfortunately, this is not an integer, so how to define a good calendar? Using
the Euclidean algorithm we find
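\[
\begin{aligned}
1730 &= 4 \cdot 419 + 54, \\
419 &= 7 \cdot 54 + 41, \\
54 &= 1 \cdot 41 + 13, \\
&\ \ \vdots
\end{aligned}
\]
In view of (7.1) this gives
\[
\frac{1730}{419} = 4 + \frac{54}{419}, \quad \text{resp.} \quad 365 + \frac{419}{1730} = 365 + \left( \frac{1730}{419} \right)^{-1} \approx 365 + \frac{1}{4}.
\]
This is nothing but the Julian calendar:∗ every fourth year is a leap year with an
additional day. With the complete Euclidean algorithm we obtain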
\[
365 + \frac{419}{1730} = 365 + \cfrac{1}{4 + \cfrac{1}{7 + \cfrac{1}{1 + \cfrac{1}{3 + \cfrac{1}{6 + \cfrac{1}{2}}}}}}.
\]
Disregarding the last fraction $\frac{1}{2}$, we get the rational approximation
\[
365 + \frac{419}{1730} \approx 365 + \frac{194}{801},
\]
which corresponds to the Gregorian calendar:† in 800 years 6 (= 200−194)
leap years are left out. In Japan a lunisolar calendar adapted from the
Chinese calendar was in use before 1873 when the Gregorian calendar was
introduced. In lunisolar calendars the tropical year is approximated by lunar
months, a lunar month being the time from one new Moon to the next; for an impression
on lunisolar calendars in general and the Chinese one in particular we refer
to [7].
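The calendar computation can be retraced mechanically. The following Python sketch is an illustration and not part of the original notes; the function names `partial_quotients` and `evaluate` are our own. It runs the Euclidean algorithm on the fraction 419/1730 and evaluates the truncated continued fractions exactly with rational arithmetic.

```python
from fractions import Fraction


def partial_quotients(a, b):
    """Partial quotients a_0, a_1, ... of a/b from the Euclidean algorithm."""
    quotients = []
    while b != 0:
        quotients.append(a // b)
        a, b = b, a % b
    return quotients


def evaluate(quotients):
    """Value of the finite continued fraction [a_0, ..., a_m] as a Fraction."""
    value = Fraction(quotients[-1])
    for a in reversed(quotients[:-1]):
        value = a + 1 / value
    return value


year = partial_quotients(419, 1730)
print(year)                                    # [0, 4, 7, 1, 3, 6, 2]
print([str(evaluate(year[:k])) for k in range(2, len(year) + 1)])
```

The second line of output contains 1/4 (Julian calendar) and 194/801 (the approximation behind the Gregorian calendar) among the truncations.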
∗ named after Julius Caesar who introduced this calendar in 45 B.C. with the scientific support of the Greek astronomer Sosigenes of Alexandria.
† named after Pope Gregory XIII, who introduced this calendar in 1582 when the solar
year was already ten days ahead of the Julian calendar; the scientists behind this reform
were Aloysius Lilius and Pietro Pitati.

The expression
\[
a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_{m-1} + \cfrac{1}{a_m}}}}
\]
is called a (regular) continued fraction and the appearing numbers $a_n$ are
said to be its partial quotients. In order to save space and ink we shall use
the abbreviation
[a0 , a1 , a2 , . . . , am ].
For the moment we consider $[a_0, \ldots, a_m]$ as a function of independent variables
$a_0, \ldots, a_m$. Obviously,
\[
[a_0] = a_0, \qquad [a_0, a_1] = \frac{a_1 a_0 + 1}{a_1}
\]
and
\[
[a_0, a_1, a_2] = \frac{a_2 a_1 a_0 + a_2 + a_0}{a_2 a_1 + 1}.
\]
By induction, one shows
\[
[a_0, a_1, \ldots, a_n] = \left[ a_0, a_1, \ldots, a_{n-1} + \frac{1}{a_n} \right] \tag{7.2}
\]
and
\[
[a_0, a_1, \ldots, a_n] = a_0 + \frac{1}{[a_1, \ldots, a_n]} = [a_0, [a_1, \ldots, a_n]].
\]
For a positive integer n ≤ m the expression $[a_0, a_1, \ldots, a_n]$ is called the n-th
convergent to $[a_0, a_1, \ldots, a_m]$. Moreover, we define recurrent sequences by
\[
\begin{aligned}
p_{-1} &= 1, & p_0 &= a_0, & &\text{and} & p_n &= a_n p_{n-1} + p_{n-2}, \\
q_{-1} &= 0, & q_0 &= 1, & &\text{and} & q_n &= a_n q_{n-1} + q_{n-2}.
\end{aligned} \tag{7.3}
\]
The computation of continued fractions is not too difficult thanks to the
following theorem:
Theorem 7.1. For 0 ≤ n ≤ m,
\[
\frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n].
\]
Proof by induction. The case n = 0 is trivial; the case n = 1 follows
immediately from
\[
[a_0, a_1] = \frac{a_1 a_0 + 1}{a_1} = \frac{p_1}{q_1}.
\]
Now let us suppose that the formula of the theorem is true for n. In view
of (7.2) we find
\[
[a_0, a_1, \ldots, a_n, a_{n+1}] = \left[ a_0, a_1, \ldots, a_n + \frac{1}{a_{n+1}} \right].
\]
Using the recursion formulae for $p_n$ and $q_n$, the latter expression equals
\[
\frac{\left( a_n + \frac{1}{a_{n+1}} \right) p_{n-1} + p_{n-2}}{\left( a_n + \frac{1}{a_{n+1}} \right) q_{n-1} + q_{n-2}}
= \frac{(a_{n+1} a_n + 1) p_{n-1} + a_{n+1} p_{n-2}}{(a_{n+1} a_n + 1) q_{n-1} + a_{n+1} q_{n-2}}
= \frac{a_{n+1} p_n + p_{n-1}}{a_{n+1} q_n + q_{n-1}} = \frac{p_{n+1}}{q_{n+1}},
\]
which concludes the induction. •
The sequences of numerators and denominators, respectively, have interesting arithmetical properties:
Theorem 7.2. For 1 ≤ n ≤ m,
\[
p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1},
\]
and
\[
p_n q_{n-2} - p_{n-2} q_n = (-1)^{n} a_n.
\]
Proof. It follows from (7.3) that
\[
\begin{aligned}
p_n q_{n-1} - p_{n-1} q_n &= (a_n p_{n-1} + p_{n-2}) q_{n-1} - p_{n-1} (a_n q_{n-1} + q_{n-2}) \\
&= -(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}).
\end{aligned}
\]
Repeating this for n − 1, n − 2, . . . , 2, 1 we derive the first assertion. In a
similar manner
\[
\begin{aligned}
p_n q_{n-2} - p_{n-2} q_n &= (a_n p_{n-1} + p_{n-2}) q_{n-2} - p_{n-2} (a_n q_{n-1} + q_{n-2}) \\
&= a_n (p_{n-1} q_{n-2} - p_{n-2} q_{n-1}),
\end{aligned}
\]
which implies the second statement. •
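The recursion (7.3) and the first identity of Theorem 7.2 are easily checked by machine. The following Python sketch is an illustration and not part of the original notes; the function name `convergents` is our own, and the partial quotients 3, 7, 15, 1, 292 are those of the beginning of the expansion of π.

```python
def convergents(quotients):
    """Pairs (p_n, q_n) for n = 0, ..., m computed from the recursion (7.3)."""
    p_prev, q_prev = 1, 0                  # p_{-1}, q_{-1}
    p, q = quotients[0], 1                 # p_0, q_0
    pairs = [(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs


pairs = convergents([3, 7, 15, 1, 292])
# p_n q_{n-1} - p_{n-1} q_n for n = 1, ..., 4
dets = [pairs[n][0] * pairs[n - 1][1] - pairs[n - 1][0] * pairs[n][1]
        for n in range(1, len(pairs))]
print(pairs)   # [(3, 1), (22, 7), (333, 106), (355, 113), (103993, 33102)]
print(dets)    # [1, -1, 1, -1], that is (-1)^(n-1)
```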
Next we assign numerical values to the partial quotients an and, consequently, to the continued fraction [a0 , a1 , . . .] itself. In the sequel we assume
a0 ∈ Z and an ∈ N for 1 ≤ n < m as well as am ≥ 1. In view of Theorem 7.1
it follows that pn and qn are integers for n < m; moreover, the first assertion
of Theorem 7.2 implies that they are coprime.
Now let α be rational. Then there exist coprime integers a and b > 0
such that $\alpha = \frac{a}{b}$. Using our variation of the Euclidean algorithm (7.1)
applied to $r_{-1} = a$ and $r_0 = b$, it follows that α can be represented as a
finite continued fraction:
\[
\frac{a}{b} = [a_0, a_1, a_2, \ldots, a_m] \quad \text{with} \quad a_n = \left\lfloor \frac{r_{n-1}}{r_n} \right\rfloor.
\]
This representation is not unique, since
[a0 , a1 , a2 , . . . , am ] = [a0 , a1 , a2 , . . . , am − 1, 1].
There is an obvious way out of this non-uniqueness. We conclude: every
rational number has a unique representation as a finite continued fraction if
we assume the last partial quotient to be strictly larger than one.
7.2. Infinite Continued Fractions
We may rewrite algorithm (7.1) for the computation of the continued
fraction expansion of a given rational α as
\[
\alpha_0 := \alpha, \qquad \alpha_n = \lfloor \alpha_n \rfloor + \frac{1}{\alpha_{n+1}} \quad \text{for } n = 0, 1, \ldots. \tag{7.4}
\]
Setting $a_n = \lfloor \alpha_n \rfloor$ we obtain $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$. This algorithm is
called the continued fraction algorithm. If α is rational, the iteration terminates after finitely many steps and it is nothing but the Euclidean algorithm
in disguise. What happens if we start with an irrational number? For instance, for α = π = 3.14159 . . . we compute
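\[
\begin{aligned}
a_0 &= \lfloor \pi \rfloor = 3 & &\text{and} & \alpha_1 &= \frac{1}{\pi - 3} = 7.06251\ldots, \\
a_1 &= \lfloor 7.06251\ldots \rfloor = 7 & &\text{and} & \alpha_2 &= \frac{1}{7.06251\ldots - 7} = 15.99659\ldots, \\
a_2 &= \lfloor 15.99659\ldots \rfloor = 15 & &\text{and} & \alpha_3 &= \frac{1}{15.99659\ldots - 15},
\end{aligned}
\]
which leads to $\pi = [3, 7, 15, \alpha_3]$.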
The Japanese samurai and mathematician Matsunaga Yoshisuke
computed π correct to 52 digits, the most precise numerical value for π
in wasan. He also gave the beginning of the continued fraction expansion by
a method (similar to the above) called reiyakujyutsu, in translation dividing
by zero (cf. [63]).
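The continued fraction algorithm (7.4) translates directly into a few lines of code. The following Python sketch is an illustration and not part of the original notes; the function name `continued_fraction` is our own. It works in floating point arithmetic, so only the first few partial quotients can be trusted, since each step magnifies the rounding error of the previous one.

```python
import math


def continued_fraction(alpha, count):
    """First `count` partial quotients of alpha via (7.4), in floating point."""
    quotients = []
    for _ in range(count):
        a = math.floor(alpha)
        quotients.append(a)
        frac = alpha - a
        if frac == 0:
            break                  # rational input: the iteration terminates
        alpha = 1.0 / frac
    return quotients


print(continued_fraction(math.pi, 5))   # [3, 7, 15, 1, 292]
print(continued_fraction(0.25, 5))      # [0, 4]
```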
Now let α be an arbitrary irrational number. Then the iteration does
not terminate (since otherwise we would obtain a representation of α as
a finite continued fraction, contradicting the irrationality of α). It follows
that the continued fraction algorithm applied to an irrational α produces an
infinite sequence of finite continued fractions:
\[
[a_0, a_1, \ldots] := \lim_{m\to\infty} [a_0, a_1, \ldots, a_m].
\]
The limit of this sequence is denoted by [a0 , a1 , a2 , . . .] and is called an
infinite continued fraction. The first task is to examine whether this infinite
process is convergent and, if so, whether the limit is related to our starting
value α.
Theorem 7.3. Let $\alpha = [a_0, a_1, \ldots, a_n, \alpha_{n+1}]$ be irrational with convergents
$[a_0, a_1, \ldots, a_n] = \frac{p_n}{q_n}$. Then
\[
\alpha - \frac{p_n}{q_n} = \frac{(-1)^n}{q_n (\alpha_{n+1} q_n + q_{n-1})}.
\]
In particular,
\[
\alpha = \lim_{n\to\infty} \frac{p_n}{q_n} = [a_0, a_1, a_2, \ldots].
\]
Proof. Firstly, we note that all previous observations on finite continued
fractions carry over to infinite ones, in particular (7.3) and Theorem 7.1. A
short computation shows
\[
\alpha - \frac{p_n}{q_n} = \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}} - \frac{p_n}{q_n} = \frac{p_{n-1} q_n - p_n q_{n-1}}{q_n (\alpha_{n+1} q_n + q_{n-1})}.
\]
Hence, Theorem 7.2 implies the first assertion.
Since $a_{n+1} \le \alpha_{n+1}$ we further have
\[
\left| \alpha - \frac{p_n}{q_n} \right| \le \frac{1}{q_n (a_{n+1} q_n + q_{n-1})}.
\]
In case of irrational α the sequences of $p_n$ and of $q_n$ both are strictly increasing for n ≥ 2. Thus, the sequence of convergents $\frac{p_n}{q_n}$ is alternately larger,
resp. smaller than α; those with even index n lie to the left and those with
odd index n to the right of α:
\[
\frac{p_0}{q_0} < \frac{p_2}{q_2} < \ldots < \alpha < \ldots < \frac{p_3}{q_3} < \frac{p_1}{q_1}.
\]
If α is irrational, the continued fraction algorithm does not terminate and
the denominators $q_n$ of its convergents form a strictly monotonic increasing
sequence of integers. It thus follows from the proven part of the theorem that the distance between consecutive convergents tends to zero.
Hence, the numbers $\frac{p_n}{q_n}$ converge to the limit $[a_0, a_1, \ldots]$ and this limit equals
α. •
It is easy to see that the continued fraction expansion of an irrational
number is unique. This already allows to construct the set of real numbers
R out of the rationals Q. Moreover, continued fractions induce an order
on the real axis. Given two real numbers α = [a0 , . . . , an , αn+1 ] and α′ =
[a0 , . . . , an , α′n+1 ] with identical partial quotients aj for j ≤ n, it follows
that any α′′ lying in between α and α′ has a continued fraction expansion
starting with the same partial quotients, more precisely,
α′′ = [a0 , . . . , an , α′′n+1 ].
Theorem 7.3 already shows the importance of continued fractions in the
theory of Diophantine approximation. Here we note
Corollary 7.4. Let $\alpha = [a_0, a_1, \ldots]$ be irrational with convergents $\frac{p_n}{q_n}$.
Then
\[
\left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}. \tag{7.5}
\]
This statement improves Dirichlet's approximation theorem 1.1: the
sequence of convergents approximates α better and better (since the denominators are strictly increasing and each partial quotient is greater than
or equal to one). Thus, we do not only know about very good rational
approximations to a given α, but can compute them explicitly from the
continued fraction expansion of α.
Actually, the approximation theorem of Hurwitz gives another improvement: for any α ∈ R \ Q there exist infinitely many rationals $\frac{p}{q}$ such
that
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{\sqrt{5}\, q^2}, \tag{7.6}
\]
and the constant $\sqrt{5}$ cannot be replaced by any larger constant. For the
proof one considers the slowest converging continued fraction
\[
\frac{\sqrt{5}+1}{2} = [1, 1, 1, 1, 1, \ldots] = \lim_{n\to\infty} \frac{F_{n+1}}{F_n},
\]
where the $F_n$ are the Fibonacci numbers, defined by the recursion
\[
F_0 := 0, \quad F_1 := 1 \quad \text{and} \quad F_{n+1} = F_n + F_{n-1} \quad \text{for } n \in \mathbb{N}.
\]
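The role of the constant $\sqrt{5}$ becomes visible numerically: for the quotients of consecutive Fibonacci numbers the scaled error $q^2 |\alpha - p/q|$ approaches $1/\sqrt{5}$. The following Python sketch is an illustration in floating point and not part of the original notes; the generator name `fib_pairs` is our own.

```python
import math

phi = (math.sqrt(5) + 1) / 2


def fib_pairs(count):
    """Yield the pairs (F_n, F_{n+1}) for n = 1, ..., count."""
    f_prev, f = 1, 1                    # F_1, F_2
    for _ in range(count):
        yield f_prev, f
        f_prev, f = f, f_prev + f


# q^2 * |phi - p/q| for p/q = F_{n+1}/F_n
scaled = [q * q * abs(phi - p / q) for q, p in fib_pairs(15)]
print(scaled[-1], 1 / math.sqrt(5))     # both are close to 0.44721
```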
Another example of an infinite continued fraction is the one for π:‡ We
compute
\[
\pi = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, \ldots].
\]
Cutting the continued fraction in front of the partial quotient 292, we obtain
\[
\frac{355}{113} = [3, 7, 15, 1] = \frac{p_3}{q_3}.
\]
This leads to an excellent approximation:
\[
0 < \frac{355}{113} - \pi < \frac{1}{292 \cdot 113^2} = 0.00000\,02682\ldots,
\]
which was already known by the Chinese mathematician Tsu Chung Chi
in 500 A.D. Moreover, the next convergent has an extremely large denominator: $q_4 = a_4 q_3 + q_2 = 292 \cdot 113 + 106 = 33\,102$. The sequence of the
convergents, which is identical with the best rational approximations to π, starts as
follows:
\[
\frac{3}{1} < \frac{333}{106} < \frac{103993}{33102} < \ldots < \pi < \ldots < \frac{355}{113} < \frac{22}{7}.
\]
This is no miracle as Lagrange proved in 1770:
Theorem 7.5. Let α be real with convergents $\frac{p_n}{q_n}$. Then, for n ≥ 2 and any
positive integers p, q satisfying $0 < q \le q_n$ and $\frac{p}{q} \ne \frac{p_n}{q_n}$,
\[
|q_n \alpha - p_n| < |q \alpha - p|.
\]
This is the so-called law of best approximation; it shows that one cannot
approximate better than with the convergents of the continued fraction expansion!
Proof. We may assume that p and q are coprime. Since
\[
|q_n \alpha - p_n| < |q_{n-1} \alpha - p_{n-1}|
\]
it suffices to prove the assertion under the assumption that $q_{n-1} < q \le q_n$;
the full assertion follows by induction.
First suppose $q = q_n$, then $p \ne p_n$ and
\[
\left| \frac{p}{q} - \frac{p_n}{q_n} \right| \ge \frac{1}{q_n}.
\]
By Theorem 7.3,
\[
\left| \alpha - \frac{p_n}{q_n} \right| \le \frac{1}{q_n q_{n+1}} < \frac{1}{2 q_n},
\]
where we have used $q_{n+1} \ge 3$ (since n ≥ 2). By the triangle inequality,
\[
\left| \alpha - \frac{p}{q} \right| \ge \left| \frac{p}{q} - \frac{p_n}{q_n} \right| - \left| \alpha - \frac{p_n}{q_n} \right| > \frac{1}{2 q_n} > \left| \alpha - \frac{p_n}{q_n} \right|,
\]
which yields the inequality of the theorem after multiplication with $q = q_n$.

‡ So far no pattern has been found in the regular continued fraction expansion of π,
in contrast to e = exp(1) = [2, 1, 2, 1, 1, 4, 1, . . . , 1, 2n, 1, . . .]; here the meaning of this
notation is obvious.
Now suppose that $q_{n-1} < q < q_n$. By Theorem 7.2 the linear equation
system
\[
p_n X + p_{n-1} Y = p \quad \text{and} \quad q_n X + q_{n-1} Y = q
\]
has the unique solution
\[
x = \frac{p q_{n-1} - q p_{n-1}}{p_n q_{n-1} - p_{n-1} q_n} = \pm (p q_{n-1} - q p_{n-1})
\]
and
\[
y = \frac{q p_n - p q_n}{p_n q_{n-1} - p_{n-1} q_n} = \pm (q p_n - p q_n).
\]
Thus, x and y are distinct integers different from zero. We observe that x
and y have different signs and the same holds for $q_n \alpha - p_n$ and $q_{n-1} \alpha - p_{n-1}$
as well. Hence, the numbers $x (q_n \alpha - p_n)$ and $y (q_{n-1} \alpha - p_{n-1})$ have the same
sign. Since
\[
q \alpha - p = x (q_n \alpha - p_n) + y (q_{n-1} \alpha - p_{n-1}),
\]
it follows that
\[
|q \alpha - p| > |q_{n-1} \alpha - p_{n-1}| > |q_n \alpha - p_n|,
\]
and this concludes the proof. •
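The law of best approximation can be observed by brute force. The following Python sketch is an illustration in floating point and not part of the original notes; the function name `best_approximation` is our own. It searches all denominators up to a given bound for the pair (p, q) minimising |qα − p| and recovers the convergents 22/7 and 355/113 of π.

```python
import math


def best_approximation(alpha, q_max):
    """Pair (p, q) with 1 <= q <= q_max minimising |q * alpha - p|."""
    best = None
    for q in range(1, q_max + 1):
        p = round(q * alpha)               # nearest integer to q * alpha
        error = abs(q * alpha - p)
        if best is None or error < best[0]:
            best = (error, p, q)
    return best[1], best[2]


print(best_approximation(math.pi, 7))      # (22, 7)
print(best_approximation(math.pi, 113))    # (355, 113)
```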
Exercises
How fast do the convergents to an infinite continued fraction grow? Of course, this
depends on the limit of the continued fraction. But one may try to give universal
lower bounds for the denominator and numerator of the convergents.
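Exercise 7.1. For the convergents $\frac{p_n}{q_n}$ to a given irrational $\alpha = [a_0, a_1, \ldots]$ prove
that
\[
p_n \ge 2^{\frac{n}{2} - 1} \quad \text{and} \quad q_n \ge 2^{\frac{n-1}{2}}
\]
for any n ∈ N. Moreover, show that
\[
\frac{p_n}{q_n} = a_0 + \sum_{j=1}^{n} \frac{(-1)^{j-1}}{q_j q_{j-1}}.
\]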
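The next topic provides a bijection between N and Q different from the usual
one, and much more convenient, from Calkin & Wilf [26]. Starting with the
initial value $\frac{1}{1}$ construct recursively a tree by
\[
\frac{a}{b} \mapsto \frac{a}{a+b}, \ \frac{a+b}{b}.
\]
The first four rows of this tree are
\[
\begin{array}{c}
\frac{1}{1} \\[4pt]
\frac{1}{2} \quad \frac{2}{1} \\[4pt]
\frac{1}{3} \quad \frac{3}{2} \quad \frac{2}{3} \quad \frac{3}{1} \\[4pt]
\frac{1}{4} \quad \frac{4}{3} \quad \frac{3}{5} \quad \frac{5}{2} \quad \frac{2}{5} \quad \frac{5}{3} \quad \frac{3}{4} \quad \frac{4}{1}
\end{array}
\]
The Calkin–Wilf sequence is then given by reading this tree line by line from the
top, $\frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{1}{3}, \frac{3}{2}, \frac{2}{3}, \frac{3}{1}, \frac{1}{4}, \frac{4}{3}, \frac{3}{5}, \ldots$.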
Exercise 7.2. Show that the successors of any reduced fraction in the Calkin–
Wilf sequence are reduced too. Further, prove that the Calkin–Wilf sequence
takes any positive rational value exactly once. Moreover, compute the continued
fraction expansions for the rational numbers appearing in the first four rows of the
Calkin–Wilf tree. Is there any pattern? Where will the number $\frac{355}{113}$ appear?
Try to find a rule for how the rationals in the Calkin–Wilf sequence can be
enumerated in terms of their continued fraction expansions. Finally, prove that the
Calkin–Wilf sequence satisfies the following recursion formula: for $n \in \mathbb{N}$,
\[ x_1 = \frac{1}{1}, \qquad x_{n+1} = \frac{1}{\lfloor x_n \rfloor + 1 - \{x_n\}}. \]
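For experimenting with Exercise 7.2, the following minimal sketch builds the first rows of the tree from the rule $\frac ab \mapsto \frac{a}{a+b}, \frac{a+b}{b}$ and, independently, iterates the recursion formula. The function names `calkin_wilf_rows` and `calkin_wilf_by_recursion` are our own choices for illustration; comparing finitely many terms is of course no proof.

```python
from fractions import Fraction
from math import floor

def calkin_wilf_rows(depth):
    """First `depth` rows of the tree generated by a/b -> a/(a+b), (a+b)/b."""
    rows = [[Fraction(1, 1)]]
    for _ in range(depth - 1):
        next_row = []
        for x in rows[-1]:
            a, b = x.numerator, x.denominator
            next_row.append(Fraction(a, a + b))  # left successor
            next_row.append(Fraction(a + b, b))  # right successor
        rows.append(next_row)
    return rows

def calkin_wilf_by_recursion(count):
    """First `count` terms of x_1 = 1, x_{n+1} = 1 / (floor(x_n) + 1 - {x_n})."""
    x = Fraction(1, 1)
    terms = [x]
    for _ in range(count - 1):
        integer_part = floor(x)
        fractional_part = x - integer_part
        x = 1 / (integer_part + 1 - fractional_part)
        terms.append(x)
    return terms

# Reading the tree row by row should reproduce the recursion.
tree_order = [x for row in calkin_wilf_rows(4) for x in row]
print(tree_order == calkin_wilf_by_recursion(len(tree_order)))
```

Exact rational arithmetic via `Fraction` avoids any rounding issues, and every fraction is automatically stored in reduced form.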
Exercise 7.3. Prove Hurwitz’s approximation theorem; the constant $\sqrt{5}$ is closely
related to $[1, 1, \ldots]$. Hint: use the law of best approximation, Theorem 7.5.
Whenever an algorithm is used it is important to know whether it terminates,
and in case it does, how fast.
Exercise 7.4. For the number of steps $m$ in the Euclidean algorithm for the
integers $b \le a$ show
\[ m \le \left(\log\frac{\sqrt{5}+1}{2}\right)^{-1} (1 + \log a). \]
Hint: show that the Euclidean algorithm is extraordinarily slow for consecutive
Fibonacci numbers $F_n$. Use Binet’s formula (1.8) to derive the estimate. Moreover, show that any positive integer $a$ has a binary representation
\[ a = \sum_{k=0}^{\ell} a_k 2^k, \quad\text{where}\quad a_k \in \{0, 1\},\ a_\ell = 1. \]
Give an upper bound for the quantity $\ell$ and deduce that $a$ can be expressed by approximately $\frac{\log a}{\log 2}$ bits. What does this imply for the running time of the Euclidean
algorithm?
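Before attempting the proof, one may test the bound of Exercise 7.4 numerically. The sketch below counts division steps and compares them with $(\log\frac{\sqrt5+1}{2})^{-1}(1+\log a)$; the helper names `euclid_steps`, `lame_bound` and `fibonacci` are ours, and we use the indexing $F_1 = F_2 = 1$, which is an assumption about the convention behind (1.8).

```python
from math import log, sqrt

def euclid_steps(a, b):
    """Number of division steps of the Euclidean algorithm for a >= b >= 1."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps

def lame_bound(a):
    """Right-hand side of the estimate in Exercise 7.4."""
    golden_ratio = (sqrt(5) + 1) / 2
    return (1 + log(a)) / log(golden_ratio)

def fibonacci(n):
    """F_n with F_1 = F_2 = 1 (indexing convention assumed here)."""
    f, g = 1, 1
    for _ in range(n - 1):
        f, g = g, f + g
    return f

# Consecutive Fibonacci numbers are the slow case suggested by the hint.
for n in (5, 10, 20, 40):
    a, b = fibonacci(n + 1), fibonacci(n)
    print(n, euclid_steps(a, b), round(lame_bound(a), 2))
```

For the pair $(F_{n+1}, F_n)$ every quotient equals 1 except the last one, so the step count grows linearly in $n$, that is, logarithmically in $a$.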
The Euclidean algorithm terminates in polynomial time in the input data. The
estimate for the running time of the Euclidean algorithm is due to Lamé in 1845,
long before the computer age. It is remarkable that the average case does not fall
much behind the bound for the worst case. Heilbronn [62] showed that the average length of the Euclidean algorithm is $\frac{12 \log 2}{\pi^2} \log a$. See [139] for improvements.
For the final task a look into the literature is probably needed:
Exercise 7.5. Prove Lagrange’s theorem: the continued fraction expansion of
an irrational real number α is eventually periodic if, and only if, α is quadratic
irrational, i.e., there exists an irreducible quadratic polynomial P ∈ Z[X] with
P (α) = 0.
*
*
*
In the following chapter we shall study statistical patterns in continued
fraction expansions. We motivate these investigations by an example from
outer space: if the quotient of periods of evolution of two planets around the
Sun is close to a rational number, a phenomenon called resonance in celestial
mechanics, then these planets will perturb each other. For instance, Jupiter
and Saturn pass approximately 299 and 120.5 angular seconds a day, which
implies a resonance value of $\frac{120.5}{299} = 0.403\ldots \approx \frac{2}{5}$ and generates an observable secular perturbation that increases for several hundred years before the
planets return to their previous orbits (cf. [5]). When Poincaré was thinking about the stability of our solar system, he suggested investigating how
many resonance relations exist. The method of choice is continued fractions, since large partial quotients go along with extraordinarily good rational
approximations. Therefore, it is necessary to understand how many real
numbers have continued fraction expansions with large partial quotients.
CHAPTER 8
Metric Theory of Continued Fractions
In a letter to Laplace from January 1812 (long before ergodic theory or even measure and probability theory) Gauss described a ’curious’
problem he already had been working on for twelve years without a satisfying solution: given 0 ≤ ξ ≤ 1, let mn (ξ) denote the probability that for
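$\alpha = [0, a_1, a_2, \ldots, a_n, \alpha_{n+1}] \in [0, 1)$ the inequality
\[ \frac{1}{\alpha_{n+1}} < \xi \]
holds. Obviously, $m_0(\xi) = \xi$ and $m_{n+1}$ depends on $m_n$. Very likely Gauss
was aware of the identity
\[ m_{n+1}(\xi) = \sum_{k=1}^{\infty} \left( m_n\Bigl(\frac{1}{k}\Bigr) - m_n\Bigl(\frac{1}{k+\xi}\Bigr) \right). \]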
Actually, Gauss wrote in his letter that he had found a simple proof for
\[ \lim_{n\to\infty} m_n(\xi) = \frac{\log(1+\xi)}{\log 2} \tag{8.1} \]
and that the limit satisfies the functional equation
\[ m(\xi) = \sum_{k=1}^{\infty} \left( m\Bigl(\frac{1}{k}\Bigr) - m\Bigl(\frac{1}{k+\xi}\Bigr) \right) \]
in addition to $m(0) = 0$ and $m(1) = 1$. However, he was not able to describe the difference $m_n(\xi) - \frac{\log(1+\xi)}{\log 2}$. There was also some kind of language
problem for Gauss. It was difficult to formulate his result without the notion of measure. Definitely, he knew of exceptions from his probabilistic law
but the additional term ’for almost all ξ’ was added only one century later
when the deviation from the limit was successfully investigated by Kuzmin
[92]. His solution gives not only a first published proof of (8.1) but also an
explicit error term for the limit law. This error term estimate was improved
by Lévy [96] who showed
\[ m_n(\xi) = \frac{\log(1+\xi)}{\log 2} + O(q^n) \]
for some $q \in (0, 0.76)$; a proof can be found in Rockett & Szüsz [119].
The sharpest known bound is due to Wirsing [154]. Using this theorem
of Gauss–Kuzmin–Lévy in various ways Lévy and Khintchine observed
interesting statistical results for continued fractions, e.g., that almost all
continued fractions $[0, a_1, a_2, \ldots]$ satisfy
\[ \lim_{N\to\infty} \left( \prod_{n=1}^{N} a_n \right)^{\frac{1}{N}} = \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k^2+2k} \right)^{\frac{\log k}{\log 2}}, \tag{8.2} \]
where the product on the right is convergent with a limit approximately
2.68. This almost sure convergence for the geometrical mean and much
more we shall prove by an ergodic argument (without using the Gauss–
Kuzmin–Lévy theorem). Whereas the approaches of Khintchine and
Lévy were of probabilistic nature Wolfgang Doeblin [43] in 1940 and
Ryll-Nardzewski showed independently in 1951 that an ergodic system
rules the complicated arithmetic of continued fractions. The ergodicity of
the Gauss map had been established earlier by Knopp [84] in 1926, and
(independently) by Martin [100] in 1934; however, both used a different
language and followed a different line of investigation.
8.1. Ergodicity of the Continued Fraction Mapping
The continued fraction map $T : [0, 1) \to [0, 1)$ is defined by
\[ Tx = \frac{1}{x} \bmod 1 \quad\text{for } 0 < x < 1 \]
and $T0 = 0$; note that for $0 < x < 1$ we could also have written $Tx = \frac{1}{x} - \lfloor \frac{1}{x} \rfloor = \{ \frac{1}{x} \}$. Obviously, $T^n x = 0$ holds for some $n$ if, and only if, $x$ is
rational.
[Figure 1. The continued fraction map: on the left its graph,
on the right the graph of its density.]
In fact, from the previous chapter we know
\[ T[0, a_1, a_2, \ldots] = [a_1, a_2, a_3, \ldots] \bmod 1 = [0, a_2, a_3, \ldots]. \tag{8.3} \]
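The shift property (8.3) can be watched at work on rational numbers, where the orbit reaches 0 after finitely many steps. The following is a small sketch with exact arithmetic; the names `gauss_map` and `partial_quotients` are ours and chosen only for illustration.

```python
from fractions import Fraction

def gauss_map(x):
    """T x = 1/x - floor(1/x) for 0 < x < 1, and T 0 = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    y = 1 / x
    return y - (y.numerator // y.denominator)

def partial_quotients(x):
    """Partial quotients a_1, a_2, ... of a rational x in [0, 1), read off the T-orbit."""
    x = Fraction(x)
    digits = []
    while x != 0:
        y = 1 / x
        digits.append(y.numerator // y.denominator)  # a_1 of the current point
        x = gauss_map(x)                             # delete a_1 and shift
    return digits

x = Fraction(113, 355)
print(partial_quotients(x))             # digits of x
print(partial_quotients(gauss_map(x)))  # the same digits with the first one removed
```

For irrational $x$ the loop would never terminate, which is just the statement that $T^n x = 0$ for some $n$ characterizes the rationals.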
For our ergodic machinery we need to find a measure for which T is measure
preserving. In general this is no easy task (see Exercise 3.2).
Here comes the solution: for a Lebesgue measurable set $A$ the Gauss
measure $\mu$ is given by
\[ \mu(A) = \frac{1}{\log 2} \int_A \frac{dx}{1+x}. \]
This defines a probability measure on [0, 1). Now we shall prove that the
continued fraction map T is measure preserving with respect to the Gauss
measure µ.
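It suffices to show that $\mu(T^{-1}(0, \xi)) = \mu((0, \xi))$, resp.
\[ \int_{T^{-1}(0,\xi)} \frac{dx}{1+x} = \int_{(0,\xi)} \frac{dx}{1+x} \]
for any $\xi \in [0, 1)$. We note
\[ T^{-1}(0, \xi) = \bigcup_{n=1}^{\infty} \left( \frac{1}{n+\xi}, \frac{1}{n} \right), \]
where the union on the right-hand side is disjoint since $0 \le \xi < 1$. It follows
from
\[ \int_{1/(n+\xi)}^{1/n} \frac{dx}{1+x} = \log\left(1 + \frac{1}{n}\right) - \log\left(1 + \frac{1}{n+\xi}\right) \]
that
\[ \int_{T^{-1}(0,\xi)} \frac{dx}{1+x} = \sum_{n=1}^{\infty} \int_{1/(n+\xi)}^{1/n} \frac{dx}{1+x} = \sum_{n=1}^{\infty} \left( \log\left(1 + \frac{1}{n}\right) - \log\left(1 + \frac{1}{n+\xi}\right) \right); \tag{8.4} \]
it is not difficult to see that the appearing series is convergent. Since
\[ \frac{1 + \frac{1}{n}}{1 + \frac{1}{n+\xi}} = \frac{n+1}{n} \cdot \frac{n+\xi}{n+1+\xi} = \frac{1 + \frac{\xi}{n}}{1 + \frac{\xi}{n+1}}, \]
we may replace the series in (8.4) by
\[ \sum_{n=1}^{\infty} \left( \log\left(1 + \frac{\xi}{n}\right) - \log\left(1 + \frac{\xi}{n+1}\right) \right). \]
Now reading backwards we find
\[ \int_{T^{-1}(0,\xi)} \frac{dx}{1+x} = \sum_{n=1}^{\infty} \int_{\xi/(n+1)}^{\xi/n} \frac{dx}{1+x} = \int_0^{\xi} \frac{dx}{1+x}, \]
and, consequently, the map $T$ is measure preserving.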
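As a plausibility check of the computation above (not a substitute for it), one can sum the Gauss measure of the intervals $(\frac{1}{n+\xi}, \frac{1}{n})$ numerically and compare with $\mu((0,\xi))$. The function names and the truncation parameter `terms` below are our own assumptions.

```python
from math import log

def gauss_measure(a, b):
    """Gauss measure of an interval (a, b) contained in [0, 1]."""
    return (log(1 + b) - log(1 + a)) / log(2)

def preimage_measure(xi, terms=100000):
    """Partial sum of mu over the intervals (1/(n+xi), 1/n), n = 1, ..., terms."""
    return sum(gauss_measure(1 / (n + xi), 1 / n) for n in range(1, terms + 1))

for xi in (0.1, 0.5, 0.9):
    # The two columns should agree up to the truncation error of the series.
    print(xi, preimage_measure(xi), gauss_measure(0.0, xi))
```

The partial sums telescope, so the truncation error after $N$ terms is $\log(1 + \frac{\xi}{N+1})/\log 2$, which is roughly $\xi/(N \log 2)$.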
Next we want to show that $T$ is ergodic with respect to $\mu$, which will turn out to be more
sophisticated. For positive integers $a_j$, define
\[ \Delta_n := \Delta_n(a_1, \ldots, a_n) := \{ x = [0, a_1(x), a_2(x), \ldots] \in [0, 1) : a_1(x) = a_1, \ldots, a_n(x) = a_n \}. \]
These sets consist of those $x$ from the unit interval which have partial quotients $a_j(x)$ equal to the prescribed values $a_j$ for $j = 1, \ldots, n$; for example,
\[ \Delta_1(1) = \left( \frac{1}{2}, 1 \right), \qquad \Delta_1(n) = \left( \frac{1}{n+1}, \frac{1}{n} \right] \quad\text{for } n \ge 2. \]
In fact, the sets $\Delta_n$ are semi-open intervals with end points
\[ \frac{p_n}{q_n} \quad\text{and}\quad \frac{p_n + p_{n-1}}{q_n + q_{n-1}}, \]
which follows immediately from the bijective mapping
\[ [0, 1] \ni t \mapsto \frac{p_n + t p_{n-1}}{q_n + t q_{n-1}} = [0, a_1, \ldots, a_n + t] \]
(together with our observations on continued fractions from the previous
chapter). Here, as usual, $\frac{p_n}{q_n}$ stands for the $n$th convergent to $[a_0, a_1, \ldots]$.
Now denote by D the set of all intervals ∆n (built from all possible ingredients a1 , . . . , an ∈ N with arbitrary n ∈ N). Then the end points of all
intervals ∆n coincide with the set of rational numbers in the unit interval
[0, 1). Thus, D is a countable family of semi-open intervals which are related
to continued fractions and generate the Borel σ-algebra.
Using Theorem 7.2 we compute the Lebesgue measure of the sets $\Delta_n$
as
\[ \lambda(\Delta_n(a_1, \ldots, a_n)) = \frac{1}{q_n(q_n + q_{n-1})}. \tag{8.5} \]
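Formula (8.5) is easy to test on examples. The sketch below computes the end points of $\Delta_n(a_1, \ldots, a_n)$ from the usual recursion for $p_n$ and $q_n$ (with $a_0 = 0$); the names `convergents` and `cylinder` are ours, and the code ignores which end point belongs to the interval.

```python
from fractions import Fraction

def convergents(digits):
    """Return (p_n, q_n, p_{n-1}, q_{n-1}) for [0; a_1, ..., a_n]."""
    p_prev, q_prev = 1, 0  # p_{-1}, q_{-1}
    p, q = 0, 1            # p_0, q_0 for a_0 = 0
    for a in digits:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return p, q, p_prev, q_prev

def cylinder(digits):
    """End points (lower, upper) of the interval Delta_n(a_1, ..., a_n)."""
    p, q, p_prev, q_prev = convergents(digits)
    ends = sorted([Fraction(p, q), Fraction(p + p_prev, q + q_prev)])
    return ends[0], ends[1]

digits = [2, 3, 1, 4]
p, q, p_prev, q_prev = convergents(digits)
lower, upper = cylinder(digits)
print(lower, upper, upper - lower == Fraction(1, q * (q + q_prev)))
```

Sorting the two end points hides the fact that their order alternates with the parity of $n$.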
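For $0 \le a < b \le 1$, we either have
\[ \{ x : a \le T^n x < b \} \cap \Delta_n = \left[ \frac{p_n + a p_{n-1}}{q_n + a q_{n-1}}, \frac{p_n + b p_{n-1}}{q_n + b q_{n-1}} \right) \tag{8.6} \]
or
\[ \{ x : a \le T^n x < b \} \cap \Delta_n = \left( \frac{p_n + b p_{n-1}}{q_n + b q_{n-1}}, \frac{p_n + a p_{n-1}}{q_n + a q_{n-1}} \right] \tag{8.7} \]
according to $n$ being even or odd. In any case we have
\[ \{ x : a \le T^n x < b \} = T^{-n}[a, b) \]
and
\[ \lambda(T^{-n}[a, b) \cap \Delta_n) = \lambda([a, b)) \lambda(\Delta_n) \frac{q_n(q_n + q_{n-1})}{(q_n + a q_{n-1})(q_n + b q_{n-1})}. \tag{8.8} \]
These technical computations are left to the reader as Exercise 8.2.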
Since the sequence of the denominators $q_n$ is monotonic,
\[ \frac{1}{2} < \frac{q_n}{q_n + q_{n-1}} < \frac{q_n(q_n + q_{n-1})}{(q_n + a q_{n-1})(q_n + b q_{n-1})} < \frac{q_n(q_n + q_{n-1})}{q_n^2} < 2. \]
In view of (8.8) we deduce for an arbitrary interval $I \subset [0, 1)$ the inequalities
\[ \frac{1}{2} \lambda(I) \lambda(\Delta_n) < \lambda(T^{-n} I \cap \Delta_n) < 2 \lambda(I) \lambda(\Delta_n). \]
The same inequalities hold if we replace $I$ by any finite union of disjoint
intervals of this type. And since the set of such disjoint unions generates
the Borel σ-algebra, the corresponding inequality holds for any Borel set and, in
particular, for any Lebesgue measurable set $A$:
\[ \frac{1}{2} \lambda(A) \lambda(\Delta_n) \le \lambda(T^{-n} A \cap \Delta_n) \le 2 \lambda(A) \lambda(\Delta_n). \tag{8.9} \]
Note that we have replaced the strict inequalities by simple inequalities since
the above argument to step from intervals to measurable sets involves an
approximation process.
However, we have to introduce the Gauss measure $\mu$ in our consideration. We have
\[ \frac{1}{2 \log 2} < \frac{1}{\log 2} \cdot \frac{1}{1+x} \le \frac{1}{\log 2} \quad\text{for } 0 \le x < 1. \]
Comparing the densities of $\lambda$ and $\mu$, it follows that, for any Lebesgue measurable set $A$, the inequalities
\[ \frac{1}{2 \log 2} \lambda(A) < \mu(A) \le \frac{1}{\log 2} \lambda(A) \tag{8.10} \]
hold. Now we use the above inequalities to get rid of the Lebesgue measure.
It follows from (8.9) and (8.10) that
\[ \mu(T^{-n} A \cap \Delta_n) > \frac{\log 2}{4} \mu(A) \mu(\Delta_n). \tag{8.11} \]
Now we are in the position to prove the following statement:
Theorem 8.1. The continued fraction map T is a measure preserving ergodic transformation on the probability space ([0, 1), L, µ), where L is the
family of Lebesgue measurable sets in [0, 1) and µ is the Gauss measure.
In particular, ([0, 1), L, µ, T ) is an ergodic system.
Proof. We have already shown that $T$ preserves $\mu$. Hence, it remains
to show that $T$ is ergodic. Let $B$ be a $T$-invariant Lebesgue measurable set of positive measure,
i.e., $T^{-1}B = B$ and $\mu(B) > 0$; we have to show that $\mu(B) = 1$.
The set $B$ has a representation as disjoint union $B = E \cup F$, where $E$ is a Borel
set of measure $\mu(E) = \mu(B)$ and $F$ is a null set (see [49]). Now assume
that $\mu(B) < 1$. Since the complement of $B$ then has positive measure, so has the
complement $E^c$ of $E$. For any $\epsilon > 0$ there exists a set $G_\epsilon$ which can be
represented as a finite disjoint union of intervals $\Delta_n \in D$ and has a
small symmetric difference with $E^c$:
\[ \mu(E^c \,\triangle\, G_\epsilon) < \epsilon \]
(this is some kind of approximation). It follows from (8.11), applied with $A = B$ and using $T^{-n}B = B$, that
\[ \mu(E \cap G_\epsilon) \ge \gamma \mu(G_\epsilon) \quad\text{with}\quad \gamma = \frac{\log 2}{4} \mu(B). \]
By construction, we get
\[ \mu(E^c \,\triangle\, G_\epsilon) \ge \mu(E \cap G_\epsilon) \ge \gamma \mu(G_\epsilon) \ge \gamma \mu(E^c \cap G_\epsilon) > \gamma(\mu(E^c) - \epsilon), \]
which leads to
\[ \gamma(\mu(E^c) - \epsilon) < \mu(E^c \,\triangle\, G_\epsilon) < \epsilon. \]
This yields the inequality $\gamma \mu(E^c) < \epsilon + \epsilon\gamma$, which is impossible for sufficiently
small $\epsilon > 0$. Hence, we have found a contradiction and conclude $\mu(B) = 1$.
Thus $T$ is ergodic. •
In the proof we have used the lemma of Knopp [84] (and its proof): Given
a probability space ([0, 1), F, λ); if B is a Lebesgue measurable set and C
is a class of subintervals of [0, 1) such that
• any open subinterval of [0, 1) has a representation as a countable
union of disjoint elements of C, and
• for any A ∈ C we have λ(A ∩ B) ≥ γλ(A) with some positive
constant γ independent of A,
then λ(B) = 1. This ergodicity criterion is important for practical purposes.
8.2. The Theorems of Khintchine and Lévy
Now we apply our machinery to the ergodic system ([0, 1), L, µ, T ) to
obtain remarkable results on the statistical data of continued fraction expansions. We start with almost sure asymptotics for some mean values for
partial quotients (as in (8.2)). Khintchine [81] proved
Theorem 8.2. For almost all $x = [0, a_1, a_2, \ldots] \in [0, 1)$,
(i) the positive integer $k$ appears in the sequence of partial quotients
$a_n$ with asymptotical density
\[ \lim_{N\to\infty} \frac{1}{N} \sharp\{ 1 \le n \le N : a_n = k \} = \frac{1}{\log 2} \log\left( 1 + \frac{1}{k(k+2)} \right); \]
(ii) the arithmetical mean of the partial quotients is infinity:
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} a_n = +\infty; \]
(iii) for the geometrical mean,
\[ \lim_{N\to\infty} \left( \prod_{n=1}^{N} a_n \right)^{\frac{1}{N}} = \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k(k+2)} \right)^{\frac{\log k}{\log 2}}. \]
According to (i) we have partial quotient 1 for almost all $x$ from the unit interval with a frequency of $\frac{\log 4/3}{\log 2} \approx 41.50\ldots$ percent whereas partial quotient
2 appears with approximately $\frac{\log 9/8}{\log 2} \approx 16.99\ldots$ percent. This is nothing else
but a sophisticated analogue of Benford’s law, resp. Gelfand’s problem
on digit distribution of continued fraction expansions.
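The densities in (i) are quickly tabulated. The following sketch (the name `digit_frequency` is ours) prints the first few values and checks numerically that they add up to 1, as the measures of the intervals exhausting the unit interval should.

```python
from math import log

def digit_frequency(k):
    """Almost sure density of the partial quotient k according to Theorem 8.2 (i)."""
    return log(1 + 1 / (k * (k + 2))) / log(2)

for k in range(1, 6):
    print(k, round(100 * digit_frequency(k), 2), "percent")

# The partial sums telescope to log(2(K+1)/(K+2)) / log 2, which tends to 1.
print(sum(digit_frequency(k) for k in range(1, 100001)))
```

More than half of all partial quotients are thus equal to 1 or 2 for almost every $x$.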
Proof. We write $x = [0, a_1(x), a_2(x), \ldots]$. Recall from the last section that
the continued fraction map deletes the first partial quotient and shifts the
others. Thus, $a_1(x) = \lfloor \frac{1}{x} \rfloor$ and $a_2(x) = a_1(Tx)$ by (8.3), which
implies $a_n(x) = a_1(T^{n-1}x)$ for $n \ge 2$. Using the intervals $\nabla_k := ( \frac{1}{k+1}, \frac{1}{k} ]$ we
have $a_1(\xi) = k$ if, and only if, $\{\xi\} \in \nabla_k$, thence
\[ a_n(x) = k \iff a_1(T^{n-1}x) = k \iff T^{n-1}x \in \nabla_k. \tag{8.12} \]
The partial quotients of the continued fraction expansion $x = [0, a_1(x), a_2(x), \ldots]$ are thus intimately related with
the visits of the iterates $T^n x$ to the intervals $\nabla_k$.
[Figure 2. The slow convergence of the geometric mean
(left) and the arithmetic mean (right) of the partial quotients
in the example $x = \pi - 3$; horizontal axis: $n$ from 0 to 1000.]
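Since the continued fraction map $T$ is ergodic by Theorem 8.1, an application of Birkhoff’s Ergodic Theorem 4.2 with the indicator function
$f = \chi_{\nabla_k}$ yields, for almost all $x$,
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{0 \le n < N} \chi_{\nabla_k}(T^n x) = \int_{[0,1]} \chi_{\nabla_k} \, d\mu = \mu(\nabla_k); \]
this integral can be computed as
\[ \frac{1}{\log 2} \int_{1/(k+1)}^{1/k} \frac{dx}{1+x} = \frac{1}{\log 2} \left( \log\left(1 + \frac{1}{k}\right) - \log\left(1 + \frac{1}{k+1}\right) \right) = \frac{1}{\log 2} \log\left( \frac{k+1}{k} \cdot \frac{k+1}{k+2} \right), \]
which is the value appearing in (i). Since $\chi_{\nabla_k}(T^n x) = 1$ holds with regard
to (8.12) exactly for $a_{n+1} = k$, the proof of (i) is complete.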
The second assertion follows in a similar manner by using the step function $f(x) = \lfloor \frac{1}{x} \rfloor = a_1(x)$. In this case the integral $\int_0^1 f \, d\mu$ diverges to $+\infty$.
For (iii) we consider the step function $f(x) = \log a_1(x)$ which, in view of
(8.12), we may also rewrite as $f(x) = \log k$ for $x \in \nabla_k$. We note
\[ \int_0^1 f(x) \, dx = \sum_{k=1}^{\infty} \lambda(\nabla_k) \log k \le \sum_{k=1}^{\infty} \frac{\log k}{k^2}, \]
which implies the convergence of $\int_{[0,1]} f \, d\mu$, since
\[ \frac{d\mu}{dx} = \frac{1}{\log 2} \cdot \frac{1}{1+x} \ll 1 \quad\text{for } x \in [0, 1). \]
Birkhoff’s Ergodic Theorem 4.2 yields
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{1 \le n \le N} \log a_n = \int_{[0,1)} f \, d\mu. \]
The latter integral is easily evaluated as
\[ \int_0^1 f(x) \, d\mu(x) = \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \int_{1/(k+1)}^{1/k} \frac{dx}{1+x} = \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \log\left( 1 + \frac{1}{k(k+2)} \right); \]
it should be noted that the terms are asymptotically of size $\frac{\log k}{k(k+2)}$ as $k \to \infty$,
which implies the convergence of both the infinite series and the
integral. For the geometric mean we thus get
\[ \lim_{N\to\infty} \left( \prod_{n=1}^{N} a_n \right)^{\frac{1}{N}} = \exp\left( \int_0^1 f(x) \, d\mu(x) \right) = \exp\left( \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \log\left( 1 + \frac{1}{k(k+2)} \right) \right), \]
which leads to the limit according to (iii). •
For $N \to \infty$ the almost sure limit for the geometric mean defines the
so-called Khintchine constant
\[ \sqrt[N]{a_1 a_2 \cdot \ldots \cdot a_N} \longrightarrow \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k(k+2)} \right)^{\frac{\log k}{\log 2}} = 2.68545\,20010\ldots. \]
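The numerical value can be approximated directly from the series in the exponent, although the convergence is slow since the terms only decay like $\frac{\log k}{k^2}$. The sketch below is a naive truncation; the name `khinchin_partial` and the cut-off are our own choices, and no convergence acceleration is attempted.

```python
from math import exp, log, log1p

def khinchin_partial(terms):
    """exp of the series sum_{k <= terms} (log k / log 2) * log(1 + 1/(k(k+2)))."""
    series = sum(log(k) * log1p(1 / (k * (k + 2))) for k in range(2, terms + 1))
    return exp(series / log(2))

# The partial products increase towards the Khintchine constant from below.
for terms in (10, 1000, 200000):
    print(terms, khinchin_partial(terms))
```

With $2 \cdot 10^5$ terms the remaining tail of the series is of size about $10^{-4}$, so only the first three decimals can be trusted.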
We shall discuss some special continued fractions with respect to this result. For instance, Euler’s number has the following continued fraction
expansion
\[ e = \exp(1) = [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, \ldots, 1, 2n, 1, \ldots] \]
(a proof can be found in [128]). Here we have for the arithmetic mean
$a_1 + a_2 + \ldots + a_N \sim \frac{1}{9} N^2$, whereas for the geometric mean
\[ \sqrt[N]{a_1 a_2 \cdot \ldots \cdot a_N} \sim \sqrt[N]{2^{N/3} \left(\tfrac{N}{3}\right)!} \sim \sqrt[3]{\frac{2N}{3e}} \]
holds. In the latter case we observe a behaviour different from normality. For
$\pi$ there is no regularity in the continued fraction expansion known; here computer experiments show a regular behaviour in the sense of Khintchine’s
theorem.
A classical theorem of Lagrange characterizes quadratic irrationalities,
i.e., roots of irreducible quadratic polynomials with rational coefficients, as
those real numbers with an eventually periodic continued fraction expansion
(see [128]). For example,
\[ \sqrt{2} = [1, 2, 2, 2, \ldots], \qquad \frac{\sqrt{5}+1}{2} = [1, 1, 1, 1, \ldots], \qquad \frac{\sqrt{3}+1}{2} = [1, 2, 1, 2, \ldots]. \]
In particular, the partial quotients of quadratic irrationals are bounded and
it is not too difficult to show that in general this does not match the almost
sure statistics of Khintchine’s theorem. Actually, it is not known whether
cubic irrationals – such as $\sqrt[3]{2}$ – or algebraic irrationals of higher degree have
or have not arbitrarily large partial quotients in their continued fraction
expansion.
Recently, Wolf [155] checked the ordinates of the first 2 600 nontrivial
zeros (in ascending order in the upper half-plane), all accurate to 1 000
digits, with respect to their continued fraction expansion; they all were
found to have a geometric mean matching the Khintchine constant. This
may be interpreted as a hint for their irrationality, a deep open conjecture in
zeta-function theory (see Chapter 6.2). We add more data indicating the
irrationality of some zeta-values:
ζ(2)
ζ(3)
ζ(4)
ζ(5)
K100
K1000
2.21929 . . .
2.40379 . . .
3.10594 . . .
2.75239 . . .
2.64745 . . .
2.68948 . . .
2.83378 . . .
2.59444 . . .
Here $K_N := \sqrt[N]{a_1 \cdot \ldots \cdot a_N}$, where the $a_n$ are the partial quotients
of the continued fraction of the corresponding zeta-value, e.g., $\zeta(3) =
[1, 4, 1, 18, 1, 1, 1, 4, 1, 9, 9, \ldots]$.
Birkhoff’s ergodic theorem allows more asymptotical results of this
type. Next we investigate the sequence of denominators qn of the convergents. In particular their growth qn → ∞ is of interest and leads to interesting insights with respect to Diophantine approximation. The following
theorem is due to Lévy [96]:
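Theorem 8.3. Denote by $\frac{p_n}{q_n} = \frac{p_n(x)}{q_n(x)}$ the $n$-th convergent to $x$. Then, for
almost all $x \in [0, 1)$,
\[ \lim_{n\to\infty} \frac{1}{n} \log q_n(x) = \frac{\pi^2}{12 \log 2} \tag{8.13} \]
and
\[ \lim_{n\to\infty} \frac{1}{n} \log\left| x - \frac{p_n}{q_n} \right| = -\frac{\pi^2}{6 \log 2}. \tag{8.14} \]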
Surprisingly, these asymptotics have various interpretations in physics,
e.g., as Kolmogorov entropy of mixmaster cosmologies; see Csordás &
Szépfalusy [36].
Proof. Since
\[ \frac{p_m(x)}{q_m(x)} = \frac{1}{a_1 + [0, a_2, a_3, \ldots, a_m]} = \frac{1}{a_1 + \frac{p_{m-1}(Tx)}{q_{m-1}(Tx)}} = \frac{q_{m-1}(Tx)}{p_{m-1}(Tx) + a_1 q_{m-1}(Tx)}, \]
it follows that $p_m(x) = q_{m-1}(Tx)$ for $m \in \mathbb{N}$. Hence,
\[ \frac{1}{q_n(x)} = \frac{1}{q_n(x)} \cdot \frac{p_n(x)}{q_{n-1}(Tx)} \cdot \ldots \cdot \frac{p_1(T^{n-1}x)}{q_0(T^n x)} = \frac{p_n(x)}{q_n(x)} \cdot \frac{p_{n-1}(Tx)}{q_{n-1}(Tx)} \cdot \ldots \cdot \frac{p_1(T^{n-1}x)}{q_1(T^{n-1}x)}, \]
since $q_0(T^n x) = q_0 = 1$ independent of $x$ and $n$. Taking the logarithm leads
to
\[ -\log q_n(x) = \sum_{0 \le j < n} \log \frac{p_{n-j}(T^j x)}{q_{n-j}(T^j x)}. \]
Since the numbers $\frac{p_n(x)}{q_n(x)}$ approximate $x$, we write
\[ -\frac{1}{n} \log q_n(x) = \frac{1}{n} \sum_{0 \le j < n} \log(T^j x) + \frac{1}{n} R_n(x) \tag{8.15} \]
with remainder term
\[ R_n(x) = \sum_{0 \le j < n} \left( \log \frac{p_{n-j}(T^j x)}{q_{n-j}(T^j x)} - \log(T^j x) \right). \]
First we shall estimate the error $R_n(x)$. Setting $k = n - j$ we see that
$\xi := T^j x$ belongs to an interval $\Delta_k$ with end points $\frac{p_k}{q_k}$ and $\frac{p_k + p_{k-1}}{q_k + q_{k-1}}$. Now
Theorem 7.3 and the mean value theorem from analysis imply for even $k$
that
\[ 0 < \log \xi - \log \frac{p_k}{q_k} = \int_{p_k/q_k}^{\xi} \frac{du}{u} = \left( \xi - \frac{p_k}{q_k} \right) \frac{1}{\eta} \le \frac{1}{q_k(q_k + q_{k-1})} \cdot \frac{q_k}{p_k} < \frac{1}{q_k} \]
with some $\eta \in ( \frac{p_k}{q_k}, \xi )$. Similarly,
\[ -\frac{1}{q_k} < \log \xi - \log \frac{p_k}{q_k} < 0 \]
for odd $k$. Denoting by $F_k$ the $k$th Fibonacci number, their recursive
definition yields the estimate $q_k(x) \ge F_k$ with equality if, and only if, $x$ has only partial quotients equal to 1, as the golden ratio
$\frac{1}{2}(\sqrt{5} + 1)$ does. Thus,
\[ |R_n(x)| \le \sum_{k=1}^{n} \frac{1}{F_k}, \]
which we can bound thanks to Binet’s formula (1.8). Writing $G := \frac{\sqrt{5}+1}{2}$, we have $F_k \ge G^{k-2}$, and
the infinite geometric series expansion yields
\[ |R_n(x)| < \sum_{k=1}^{\infty} \frac{1}{F_k} \le \sum_{k=1}^{\infty} G^{2-k} < +\infty. \]
In particular,
\[ \lim_{n\to\infty} \frac{1}{n} R_n(x) = 0 \]
for all $x$. Hence, the remainder term $R_n(x)$ in (8.15) is negligible.
If the following limit exists,
\[ -\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \log(T^{n-j} x), \tag{8.16} \]
then $\lim_{n\to\infty} \frac{1}{n} \log q_n(x)$ exists as well and both limits have the same value.
The expression (8.16) can be computed with Birkhoff’s ergodic theorem
for almost all $x$ as
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} \log(T^j x) = \frac{1}{\log 2} \int_0^1 \frac{\log x}{1+x} \, dx = -\frac{\pi^2}{12 \log 2}, \tag{8.17} \]
where the appearing integral was evaluated in (6.5) by use of Euler’s formula (6.2). This proves (8.13). In order to prove the second assertion, we
apply Theorem 7.3 to obtain
\[ \frac{1}{2 q_n q_{n+1}} < \left| x - \frac{p_n}{q_n} \right| < \frac{1}{q_n q_{n+1}}. \]
Now, we may deduce (8.14) from (8.13). •
There are quite a few interesting results beyond Lévy’s theorem.
Philipp & Stackelberg [110] improved his result in showing
\[ \limsup_{n\to\infty} \frac{\left| \log q_n(x) - \frac{n\pi^2}{12 \log 2} \right|}{\sqrt{2\sigma^2 n \log\log n}} = 1 \]
for almost all $x \in [0, 1)$, where
\[ \sigma^2 = \lim_{n\to\infty} \frac{1}{n} \int_0^1 \left( \log q_n(x) - \frac{n\pi^2}{12 \log 2} \right)^2 \frac{dx}{(\log 2)(1+x)} \]
is a positive constant. A further result due to Philipp [109] shows a
Gaussian normal distribution:
\[ \lim_{n\to\infty} \mu\left( \left\{ x \in [0, 1] : \frac{\log q_n(x) - \frac{n\pi^2}{12 \log 2}}{\sigma\sqrt{n}} < z \right\} \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} \exp(-\tfrac{1}{2} u^2) \, du, \]
where $\mu$ is any probability measure which is absolutely continuous with respect to
the Lebesgue measure.
Faivre [52] investigated quadratic irrational numbers $x$. In this case
the sequence $\frac{1}{n} \log q_n(x)$ converges (thanks to the eventually periodic continued fraction expansion) and its limit $\beta(x)$ is called the Lévy constant.
An interesting question is which values $\beta(x)$ can assume. Arnold proposed
the investigation of the average frequencies of partial quotients in continued
fractions of solutions of the quadratic equation $X^2 + pX + q = 0$ as $p, q$ grow
(e.g., lie inside some disk of radius $R$ as $R \to \infty$); surprisingly, with such an
averaging the continued fractions seem to behave like those of random real
numbers (cf. [5, 140]). Kesseböhmer & Stratmann [79] obtained multifractal generalizations of the classical results of Lévy and Khintchine.
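To get a feeling for Lévy constants of quadratic irrationals, the following sketch computes $\frac{1}{n} \log q_n$ for the purely periodic expansions $[0, 1, 1, \ldots]$ and $[0, 2, 2, \ldots]$ and compares with the almost sure value $\frac{\pi^2}{12 \log 2}$ from (8.13). The helper names are ours; the closed forms $\log\frac{\sqrt5+1}{2}$ and $\log(1+\sqrt2)$ used for comparison come from the growth of the Fibonacci and Pell numbers.

```python
from math import log, pi, sqrt

def last_denominator(digits):
    """q_n for the continued fraction [0; a_1, ..., a_n]."""
    q_prev, q = 0, 1  # q_{-1}, q_0
    for a in digits:
        q, q_prev = a * q + q_prev, q
    return q

def levy_estimate(digits):
    """(1/n) log q_n for the given n partial quotients."""
    return log(last_denominator(digits)) / len(digits)

LEVY_ALMOST_SURE = pi ** 2 / (12 * log(2))  # right-hand side of (8.13)

print(levy_estimate([1] * 200), log((sqrt(5) + 1) / 2))  # golden ratio case
print(levy_estimate([2] * 200), log(1 + sqrt(2)))        # expansion of sqrt(2) - 1
print(LEVY_ALMOST_SURE)
```

Both quadratic examples stay well below the almost sure value, in line with the remark that quadratic irrationals do not follow the typical statistics.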
In our metrical investigations we have not used Gauss’ limit formula
(8.1) which can be rewritten as
\[ \lim_{n\to\infty} \lambda(T^{-n}[0, \xi]) = \mu([0, \xi]). \]
A proof of the theorems of Khintchine and Lévy using this approach can
be found in the monograph of Rockett & Szüsz [119] which includes
as well a proof of the theorem of Gauss–Kuzmin–Lévy with explicit error term. Further deep results on metric theory of this and other types of
continued fractions (e.g., the proof of the Doeblin–Lenstra conjecture by
Bosma, Jager & Wiedijk) can be found in [37]. The analogues for continued fractions to the nearest integer were given by Rieger [116]. The book
of Schweiger [124] investigates higher dimensional analogues of continued
fractions.
The theory of continued fractions shows that for any given real number
$x$ there exists a sequence $(q_m)$ of strictly increasing positive integers with
$q_m \| q_m x \| < 1$ where, as above, $\| \cdot \|$ denotes the minimal distance to the next
integer. Littlewood conjectured that
\[ \inf_{n \in \mathbb{N}} n \| nx \| \| ny \| = 0 \quad\text{for all } x, y \in \mathbb{R}. \]
It is not difficult to show that this is satisfied for rationals as well as for quadratic irrationals. For arbitrarily given $x$, Adamczewski & Bugeaud [1]
constructed explicitly continuum many real numbers $y$ with bounded partial
quotients for which the pair $(x, y)$ satisfies a strong form of Littlewood’s
conjecture. Recently, Einsiedler, Katok & Lindenstrauss [46] proved
that Littlewood’s conjecture is indeed true for almost all pairs of real numbers:
the set of exceptional pairs $(x, y) \in \mathbb{R}^2$ has Hausdorff dimension
zero. Elon Lindenstrauss has recently received a Fields medal
at the International Congress of Mathematicians 2010 in Hyderabad for his
work on interactions between dynamical systems theory and Diophantine
analysis! Besides his work on the Littlewood conjecture he has solved the
so-called quantum ergodicity conjecture.∗
We have not mentioned numerous applications of ergodic theory to Diophantine equations. For instance, for a squarefree integer $d > 1$ we are
interested in the set of points with integer coordinates on a two-dimensional
sphere of radius $\sqrt{d}$:
\[ I_d := \{ \mathbf{x} = (x, y, z) \in \mathbb{Z}^3 : x^2 + y^2 + z^2 = d \}. \]
A deep result of Gauss shows that $I_d$ is non-empty if, and only if, $d$ is not
of the form $4^a(8b - 1)$ for positive integers $a, b$. In the 1950s Linnik [98]
showed that, as $d \to \infty$ amongst the squarefree integers with $d \equiv \pm 1 \bmod 5$,
the set
\[ \{ \mathbf{x}/\sqrt{d} : \mathbf{x} \in I_d \} \subset S^2 \]
becomes uniformly distributed on the unit sphere $S^2$ with respect to the
Lebesgue measure. Linnik’s approach used ergodic theory. The constraint
$d \equiv \pm 1 \bmod 5$ was removed by Duke [44].
∗ And more Fields medals have been awarded to mathematicians working in ergodic
theory; however, this is a topic of the next chapter.
Exercises
No pains - no gains! Here is another round to experience the world of continued
fractions and ergodic theory.
Exercise 8.1. Show that the continued fraction map T is not measure preserving
with respect to the Lebesgue measure.
Exercise 8.2. Prove statements (8.5)-(8.8).
Exercise 8.3. Give a proof of Knopp’s lemma in its full generality. Moreover, fill
all gaps as, for example, Binet’s formula (1.8) and the deduction of (8.14) from
(8.13). (Hint: help can be found in [37].)
Exercise 8.4. For some quadratic and cubic irrationalities compute the first partial
quotients of their continued fraction expansion and try to compare the asymptotic
behaviour of the geometric and arithmetical means.
Exercise 8.5. Prove that the set of real numbers having a continued fraction expansion with bounded partial quotients has Lebesgue measure zero. Moreover, show
that, given a function $f$ with $f(n) > 1$ for $n \in \mathbb{N}$ such that $\sum_{n=1}^{\infty} \frac{1}{f(n)}$ diverges,
then the set
\[ \{ x = [a_0, a_1, \ldots] \in [0, 1) : a_n < f(n) \text{ for } n \in \mathbb{N} \} \]
has measure zero. If you have been successful, try to prove under the same hypothesis on $f$ that
\[ \{ x = [a_0, a_1, \ldots] \in [0, 1) : a_k(x) > f(k) \text{ for infinitely many } k \in \mathbb{N} \} \]
has measure zero. Hint: The latter proof uses the Borel-Cantelli lemma from
probability theory; for further advice see [119].
Consequently, most of the numbers have a continued fraction expansion with partial
quotients that, although not bounded, are not too large!
Here is another result of Khintchine [81]:
Exercise 8.6. Show that, for almost all $x = [0, a_1, a_2, \ldots]$,
\[ \lim_{N\to\infty} \frac{N}{\frac{1}{a_1} + \ldots + \frac{1}{a_N}} = 1.74540\ldots. \]
Moreover, let $f$ be a function with $f(k) = O(k^{1-\delta})$ for some positive $\delta$. Prove that,
for almost all $x = [a_0, a_1, \ldots]$,
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} f(a_n) = \sum_{k=1}^{\infty} f(k) \frac{\log\bigl(1 + \frac{1}{k(k+2)}\bigr)}{\log 2}. \]
*
*
*
A week is definitely not enough to learn about all beautiful applications
of ergodic theory to arithmetic. We did not speak about Margulis’ proof
of the Oppenheim conjecture on the values of indefinite quadratic forms in at
least three unknowns, nor did we mention the recent results on quantum
ergodicity. The next and final chapter gives a first glimpse of another, completely different direction of ergodic number theory which is free of integrals
but full of compact sets...
CHAPTER 9
Coda: Arithmetic Progressions
An arithmetic progression of length ℓ and common difference d is a sequence
a, a + d, a + 2d, . . . , a + (ℓ − 1)d,
where a, d, ℓ are integers with d ≥ 1 and ℓ ≥ 3 (to exclude trivialities). For
instance,
3, 13, 23, 33, 43, 53, 63, 73
is an arithmetic progression of length 8. We are interested in sets of integers
which contain arithmetic progressions of arbitrary length as, for example,
the set of even (resp. odd) integers. On the contrary, the powers of 10 do
not contain any arithmetic progression. We ask: under what conditions does
an infinite subset of Z contain arithmetic progressions of arbitrary length?
Erdös & Turán [50] conjectured that any subset $\{a_1, a_2, \ldots\} \subset \mathbb{N}$ of
positive lower density, i.e.,
\[ \liminf_{N\to\infty} \frac{1}{N} \sum_{a_n \le N} 1 > 0, \]
contains arbitrarily long arithmetic progressions. It is remarkable that there
is no structure assumption made upon the set of elements $a_n$, only that it
has to be sufficiently large. The conjecture of Erdös & Turán was solved
by Szemerédi [131] by introducing a complicated combinatorial technique.
Furstenberg [54] successfully studied the problem of simultaneous recurrence of sets of positive measure. In this context he found a remarkable
generalization of Theorem 5.4: Let T : X → X be measure preserving transformation on a probability space (X, F, µ) and let A be a measurable set with
µ(A) > 0. Then, for any positive integer k, there exists a positive integer n
such that
(9.1)
µ(A ∩ T −n A ∩ . . . ∩ T −kn A) > 0.
This theorem plays a crucial role in Furstenberg’s ergodic proof of Szemerédi’s theorem on the solution of the Erdös & Turán conjecture. We
illustrate the proof (and refer to [114] for details). We denote by $\Omega = \{0, 1\}^{\mathbb{Z}}$
the space of doubly infinite $\{0, 1\}$-sequences and interpret its elements as indicator functions $\chi_A$ of sets $A \subset \mathbb{Z}$. Since $\{0, 1\}$ is compact, by Tychonov’s
theorem, so is $\Omega$ and we may define a metric on $\Omega$ as follows: given sequences
$x = (x_n)$, $y = (y_n)$, let
\[ N(x, y) = \min\{ N \in \mathbb{N} : x_N \ne y_N \text{ or } x_{-N} \ne y_{-N} \} \]
for $x \ne y$, and
\[ d(x, y) = \begin{cases} 2^{-N(x,y)} & \text{if } x \ne y, \\ 0 & \text{otherwise.} \end{cases} \tag{9.2} \]
It is easy to check that d defines a metric on Ω, hence (Ω, d) is a compact
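metric space. We define the shift transformation by
\[ \sigma : \Omega \to \Omega, \qquad \omega(n) \mapsto \sigma\omega(n) = \omega(n + 1). \tag{9.3} \]
Given an element $\omega \in \Omega$, we say that 1 occurs with positive upper Banach
density if the set $\mathcal{Z} := \{ n \in \mathbb{Z} : \omega(n) = 1 \}$ has positive upper Banach
density, i.e.,
\[ \limsup_{\sharp I \to \infty} \frac{\sharp(\mathcal{Z} \cap I)}{\sharp I} > 0, \]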
where I runs through the set of intervals in Z and ♯I denotes the number
of integers contained in I. Moreover, for ω ∈ Ω we write X = {σ n ω : n ∈
Z} ⊂ Ω. Then, one can show that if 1 occurs with positive upper Banach
density, there exists a σ-invariant measure µ on X satisfying
µ(A) > 0
for
A := {ω ∈ Ω : ω(0) = 1}.
Now we sketch how to apply Furstenberg’s simultaneous recurrence
theorem (9.1) to the Erdös–Turán conjecture. Assume that $B \subset \mathbb{Z}$ has
positive upper Banach density, and let $X$ and $A$ be as above for $\omega = \chi_B$. Then, by (9.1) applied to the set $A \cap X$, for any given $k$ there exists
a positive integer $n$ and some point $\omega' \in X$ such that $\sigma^{jn} \omega' \in A \cap X$ for
$0 \le j < k$. This implies
\[ \omega'(0) = \omega'(n) = \ldots = \omega'((k - 1)n) = 1. \]
Since $\omega' \in X$ is a limit of translates of the indicator function $\chi_B$, we have
\[ \chi_B(b) = \chi_B(b + n) = \ldots = \chi_B(b + (k - 1)n) = 1 \]
for some $b \in \mathbb{Z}$, hence $B$ contains the arithmetic progression $b, b + n, \ldots, b +
(k - 1)n$. This is essentially Furstenberg’s proof of the theorem of Szemerédi. ◦
Furstenberg’s ergodic approach marks the beginning of an impressive
success story. Gowers and later Tao [133] obtained quantitative results
with respect to Szemerédi’s theorem. Both were awarded a Fields medal
for their work in this direction: Gowers in 1998 at the ICM in Berlin,
Tao in 2006 at the ICM in Madrid. Also Roth obtained a Fields medal,
1958 in Edinburgh, however, mainly for his improvement of the diophantine approximation theorems of Thue and Siegel for algebraic numbers.
Moreover, Margulis and Bourgain received Fields medals for their contributions to ergodic theory and harmonic analysis, namely at the ICM 1978
in Helsinki and 1994 in Zurich, respectively.
There is another famous problem which cannot be deduced from the
theorems of Szemerédi and Furstenberg: Do the prime numbers contain arbitrarily long arithmetic progressions? In view of the Prime Number
Theorem 6.2 the primes have asymptotic density zero in N and, consequently,
Szemerédi’s theorem does not apply. In 2004, Green & Tao [57] proved:
The set of prime numbers contains arbitrarily long arithmetic progressions.
Their deep theorem builds on previous work of other mathematicians such
as, for example, Gowers. The longest arithmetic progression of primes
presently known has length 23,
\[ 56\,211\,383\,760\,397 + 44\,546\,738\,095\,860\,k \quad\text{for}\quad k = 0, 1, \ldots, 22, \]
and has been discovered by Frind, Underwood & Jobling (cf. Green
& Tao [57]).∗ The new methods of Green & Tao are applicable to rather
thin sets and we may speculate what kind of results will be proven with these
new powerful tools; indeed it is somewhat surprising that for the Green &
Tao theorem no deep results from analytic number theory are needed.
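The progression of length 23 quoted above can be checked by machine. The following Python sketch is an illustration only and not part of the original notes: the helper name `is_prime` and the use of a deterministic Miller-Rabin test (with the first twelve primes as bases, which is known to suffice far beyond the size of the numbers involved here) are assumptions made for this example.

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin test; the chosen bases suffice for n < 3.3 * 10**24."""
    if n < 2:
        return False
    small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small_primes:
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in small_primes:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness for the compositeness of n
    return True


START = 56_211_383_760_397   # first term, as quoted in the text
STEP = 44_546_738_095_860    # common difference, as quoted in the text

progression = [START + STEP * k for k in range(23)]
all_prime = all(is_prime(p) for p in progression)
print(all_prime)
```

Note that the common difference is divisible by every prime up to 23; otherwise one of the 23 terms would necessarily be divisible by such a prime.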
Our next aim is a related theorem of van der Waerden [141]:
Theorem 9.1. In any partition of Z into finitely many classes there is at
least one class which contains arbitrarily long arithmetic progressions.
If we divide the integers into $r$ disjoint sets,
\[ \mathbb{Z} = A_1 \cup \ldots \cup A_r, \tag{9.4} \]
then van der Waerden’s theorem claims that it is impossible to avoid arithmetic progressions of arbitrary length in all $A_j$. Note that this does not mean that there necessarily exist infinite arithmetic progressions (and indeed this cannot hold in general, as the reader can confirm with a simple example). The statement remains true when we replace Z by N, and all known proofs can be formulated under this restriction without any difficulties. None of those proofs is easy.† The history of this theorem and its different proofs is very interesting and worth studying; see the notes of van der Waerden [142]. The original problem probably dates back to Schur for the case r = 2, and not to Baudet as is often claimed; however, and this is interesting although not extraordinary, the more general point of view, that is arbitrary r, led to an easier proof.
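A finite version of the statement can be explored by brute force. The sketch below is illustrative and not taken from the notes; the function names are made up for this example. It runs through all 2-colourings of {1, ..., n} and tests whether each of them contains a monochromatic arithmetic progression of length three. For n = 8 this fails, while for n = 9 it holds.

```python
from itertools import product


def has_mono_ap(coloring, length):
    """True if some colour class of the positions 0..n-1 contains an
    arithmetic progression with the given number of terms."""
    n = len(coloring)
    for start in range(n):
        for step in range(1, n):
            last = start + (length - 1) * step
            if last >= n:
                break
            if all(coloring[start + i * step] == coloring[start]
                   for i in range(length)):
                return True
    return False


def every_coloring_has_ap(n, colors, length):
    """Exhaustive check over all colourings of n consecutive integers."""
    return all(has_mono_ap(c, length)
               for c in product(range(colors), repeat=n))


print(every_coloring_has_ap(8, 2, 3))  # False
print(every_coloring_has_ap(9, 2, 3))  # True
```

The colouring (0, 0, 1, 1, 0, 0, 1, 1) of eight consecutive integers is one that avoids monochromatic three-term progressions. The exhaustive search grows exponentially in n, so it is only feasible for very small cases.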
We shall sketch a dynamical proof of van der Waerden’s theorem. For
this purpose we shall work in metric spaces. Recall that a homeomorphism is
a bijective continuous mapping whose inverse is continuous too. The branch
of mathematics that deals with the dynamics of such homeomorphisms is
called topological dynamics.
∗ To illustrate the depth of this result we ask the dear reader to break this current record on long arithmetic progressions in primes!
† Clearly, there exists something like invariance of difficulty: there cannot be a simple proof of a deep theorem.
First we mention some basic facts on a certain space of sequences: for $k \ge 2$ let $\Omega_k = \{1, 2, \ldots, k\}^{\mathbb{Z}}$ be the set of doubly infinite sequences $\omega = (\omega(n))_{n \in \mathbb{Z}}$ with values in $\{1, 2, \ldots, k\}$. On $\Omega_k$ we define via (9.2) the same metric $d$, only with $\Omega_k$ in place of $\Omega$. It is not difficult to see that
(i) (Ωk , d) is a compact metric space;
(ii) the shift transformation σ : Ωk → Ωk , given by (9.3), can be
defined on Ωk and is a homeomorphism.
The only difficult part in proving (i) is the triangle inequality. For this aim we may suppose that $x, y, z \in \Omega_k$ are pairwise distinct. Then we have to show
\[ 2^{-N(x,y)} = d(x,y) \le d(x,z) + d(z,y) = 2^{-N(x,z)} + 2^{-N(z,y)}, \]
which is equivalent to
\[ 2^{N(z,y)+N(x,z)} \le 2^{N(x,y)+N(z,y)} + 2^{N(x,y)+N(x,z)} = 2^{N(x,y)} \bigl( 2^{N(z,y)} + 2^{N(x,z)} \bigr). \]
This is obvious (actually, $N(x,y) \ge N(x,z) \ge N(z,y)$ is the only non-trivial case to consider).
In order to prove (ii) let $x, y \in \Omega_k$ with $x \ne y$ and $d(x,y) = 2^{-N}$. Then $x_i = y_i$ for $-N < i < N$, hence $(\sigma x)(i) = x_{i+1} = y_{i+1} = (\sigma y)(i)$ for $-(N+1) < i < N-1$. This implies
\[ d(\sigma x, \sigma y) \le 2^{1-N} = 2\, d(x,y). \]
Consequently, $\sigma$ is continuous. Obviously, $\sigma$ is invertible and the inverse $\sigma^{-1}$ turns out to be continuous by the same reasoning as for $\sigma$.
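The two estimates just proved can be tested numerically. The following sketch is only an illustration. It assumes that the metric from (9.2) has the form d(x, y) = 2^(-N(x, y)) with N(x, y) the smallest |n| at which x and y differ, as the computation above suggests, and that (9.3) is the shift (σx)(i) = x(i + 1). Sequences are modelled as Python functions on the integers, and all names are hypothetical.

```python
import random


def first_difference(x, y, bound=64):
    """Smallest |n| <= bound with x(n) != y(n); None if x and y agree there."""
    for n in range(bound + 1):
        if x(n) != y(n) or x(-n) != y(-n):
            return n
    return None


def dist(x, y, bound=64):
    """d(x, y) = 2**(-N(x, y)), and 0 if no difference is found (assumed form of (9.2))."""
    n = first_difference(x, y, bound)
    return 0.0 if n is None else 2.0 ** (-n)


def shift(x):
    """The shift: (sigma x)(i) = x(i + 1) (assumed form of (9.3))."""
    return lambda i: x(i + 1)


def random_sequence(k, seed, radius=8):
    """A sequence with values in {1..k}: random for |n| <= radius, equal to 1 elsewhere."""
    rng = random.Random(seed)
    table = {n: rng.randint(1, k) for n in range(-radius, radius + 1)}
    return lambda n: table.get(n, 1)


seqs = [random_sequence(3, seed) for seed in range(12)]

triangle_ok = all(dist(x, y) <= dist(x, z) + dist(z, y)
                  for x in seqs for y in seqs for z in seqs)
lipschitz_ok = all(dist(shift(x), shift(y)) <= 2 * dist(x, y)
                   for x in seqs for y in seqs)
print(triangle_ok, lipschitz_ok)
```

Since all distances are powers of two, the floating point comparisons are exact. Of course such a test on finitely many samples does not replace the proof.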
The central role in our dynamical proof of van der Waerden’s Theorem 9.1 is played by the following multi-dimensional recurrence theorem of
Furstenberg & Weiss [55]:
Theorem 9.2. Let $T_1, \ldots, T_N : X \to X$ be homeomorphisms on a compact metric space which commute, that means $T_i T_j = T_j T_i$ for $1 \le i, j \le N$. Then there exist an $x \in X$ and a sequence of positive integers $n_k$ with $\lim_{k\to\infty} n_k = +\infty$ such that
\[ \lim_{k\to\infty} d(T_i^{n_k} x, x) = 0 \quad\text{for any}\quad i = 1, 2, \ldots, N. \]
The commutativity property is essential; note that $T_i T_j$ denotes $T_i \circ T_j$. Consequently, the $T_j$ generate a commutative semigroup.
Now we sketch how to deduce van der Waerden’s Theorem 9.1 from Theorem 9.2. Given a partition of Z into disjoint sets,
\[ \mathbb{Z} = A_1 \cup \ldots \cup A_k, \]
we may associate a sequence $\omega = (\omega(n))_{n\in\mathbb{Z}} \in \Omega_k$ by setting $\omega(n) = i$ if $n \in A_i$. Next we consider the orbit $\{\sigma^n \omega : n \in \mathbb{Z}\}$, where $\sigma$ is the shift transformation introduced in (9.3). We write $X$ for the closure of this orbit with respect to $d$. Applying Theorem 9.2 with $T_i = \sigma_i := \sigma^i$ $(= \sigma \circ \ldots \circ \sigma)$, for any sufficiently small $\epsilon < 1$, there exist $x \in X$ and $d \in \mathbb{N}$ (not to be confused with the metric $d$) such that
\[ d(\sigma_i^{d} x, x) < \epsilon < 1 \quad\text{for}\quad i = 1, \ldots, N. \]
Since $d(x,y) = 2^{-N(x,y)}$ we deduce that the terms with index 0 coincide:
\[ x_0 = x_{id} = (\sigma_i^{d} x)(0) \quad\text{for}\quad i = 0, 1, \ldots, N. \]
By construction, the sequence $\{x_n\}_{0 \le n \le Nd}$ appears somewhere in $\omega$, starting at position $a$, say. Thus
\[ \omega(a) = x_0 = x_{id} = (\sigma_i^{d} x)(0) = \omega(a + id) \quad\text{for}\quad i = 0, 1, \ldots, N. \]
This shows that $a + id \in A_{\omega(a)}$ for $i = 0, 1, \ldots, N$. Hence, for any $\ell = N + 1$ we have found an index $j$ such that the set $A_j$ contains an arithmetic progression of length $\ge \ell$. Since there are only finitely many classes, some index $j$ must occur for infinitely many $\ell$. It thus follows that at least one $A_j$ in the dissection (9.4) contains arbitrarily long arithmetic progressions! Reviewing this proof, we are reminded of some ideas of Furstenberg’s approach to the theorem of Szemerédi.
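The encoding of a partition as a sequence ω, and the search for a position a and a difference d with ω(a) = ω(a + d) = ... = ω(a + Nd), can be sketched as follows. The colouring used here (by the parity of the binary digit sum) and all function names are hypothetical choices for illustration; they do not come from the notes.

```python
def coloring(n: int) -> int:
    """A hypothetical 2-colouring of Z: class 1 or 2 according to the
    parity of the number of binary digits 1 of |n|."""
    return 1 + bin(abs(n)).count("1") % 2


def find_progression(omega, length, window):
    """Search 0 <= a < window for (a, d) with
    omega(a) = omega(a + d) = ... = omega(a + (length - 1) * d),
    where all terms stay below window. Returns None if nothing is found."""
    for d in range(1, window):
        for a in range(window - (length - 1) * d):
            if all(omega(a + i * d) == omega(a) for i in range(length)):
                return a, d
    return None


result = find_progression(coloring, length=5, window=200)
print(result)
```

The search returns the first pair (a, d) found, so that a, a + d, ..., a + 4d all lie in the class with index ω(a). Unlike the dynamical proof, this brute-force search gives no insight into why such a pair must exist.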
Next we give a proof of Theorem 9.2 for the special case that the homeomorphisms $T_i$ are all built from one homeomorphism $T$ by setting $T_i = T^i$ for $i = 1, \ldots, N$ as in our application for van der Waerden’s theorem. For $N = 1$ this can be reduced to Birkhoff’s recurrence theorem [18] (not to be confused with his ergodic theorem):
Theorem 9.3. Let $T : X \to X$ be a homeomorphism on a compact metric space $X$. Then there exists an element $x \in X$ with $T^{n_k} x \to x$ for a divergent sequence of positive integers $n_k \to \infty$.
Proof. We shall use Zorn’s lemma.‡ If $E$ denotes the family of all non-empty closed and $T$-invariant subsets $Z$ of $X$, equipped with the semi-order
\[ Z_1 \le Z_2 \;:\Longleftrightarrow\; Z_1 \subset Z_2, \]
then for each chain $\{Z_\kappa\}_\kappa$ there exists a maximal and completely ordered subsystem $F \subset E$; this is the so-called Hausdorff maximal chain theorem (see [121]). Then, the set $Z = \bigcap_\kappa Z_\kappa$ with $Z_\kappa \in F$ is closed, $T$-invariant, and, by construction, minimal, i.e., all non-empty closed proper subsets of $Z$ are not $T$-invariant. Moreover, $Z$ is not empty since $X$ is compact. If $A$ is any closed $T$-invariant subset of $Z$, then either $A = \emptyset$ or $A = Z$ (similar to the notion of ergodicity). In particular, it follows that the closure $A$ of the orbit $\{T^n x : n \in \mathbb{Z}\}$ with an arbitrary $x \in Z$ satisfies $A = Z \subset X$. Hence, for any $\epsilon > 0$ there exists $n \in \mathbb{N}$ such that $d(T^n x, x) < \epsilon$.§ This immediately implies the assertion. •
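For a concrete feeling of Theorem 9.3 one may look at the rotation T x = x + α mod 1 of the circle, a compact metric space on which every point is recurrent. The sketch below is an illustration under assumptions of our own: the rotation angle α = √2, the starting point x = 0, and the function names are not from the notes. It lists the times n at which T^n x returns closer to x than ever before.

```python
import math

ALPHA = math.sqrt(2)  # illustrative rotation angle (an assumption for this example)


def circle_dist(a: float, b: float) -> float:
    """Distance between a and b on the circle R/Z."""
    t = abs(a - b) % 1.0
    return min(t, 1.0 - t)


def record_returns(x: float, alpha: float, limit: int):
    """Pairs (n, d(T^n x, x)) for those n <= limit at which the orbit of x
    under T x = x + alpha mod 1 comes closer to x than at all earlier times."""
    best, records = 1.0, []
    for n in range(1, limit + 1):
        dist = circle_dist((x + n * alpha) % 1.0, x)
        if dist < best:
            best = dist
            records.append((n, dist))
    return records


returns = record_returns(0.0, ALPHA, 1000)
for n, dist in returns:
    print(n, round(dist, 6))
```

The record times form a divergent sequence n_k with T^{n_k} x → x. For this particular α they are the denominators of the continued fraction convergents of √2, which links the example to the earlier chapters on continued fractions.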
‡ Zorn’s lemma claims that any non-empty semi-ordered set in which every totally ordered subset has an upper bound contains a maximal element; it is infamous for its equivalence to the unloved axiom of choice. This was discovered by Zorn in 1935.
§ Note that the $T$-invariance allows more than the standard conclusion, namely, the existence of an accumulation point.
The remaining part of the proof of Theorem 9.2 is by induction on $N$. Hence, we have to show that, if the assertion holds for $N - 1$ homeomorphisms $T_1 = T, \ldots, T_{N-1} = T^{N-1}$, then it is also true if we add the $N$-th homeomorphism $T_N = T^N$. For this we may suppose that $X$ is the least closed set which is invariant with respect to each $T^j$ with $j = 1, \ldots, N$ (again by Hausdorff’s maximal chain theorem as in the previous proof).
Firstly, given $\epsilon > 0$ and arbitrary $x, x' \in X$, we prove the existence of a finite set $K \subset \mathbb{N}$ such that
\[ d(T^k x, x') < \epsilon \quad\text{for some}\quad k \in K. \tag{9.5} \]
If $\emptyset \ne B \subset X$ is open, then the minimality of $X$ implies that for any $z \in X$ there exists some $n \in \mathbb{N}$ with $T^n z \in B$. Hence, $X = \bigcup_{n\in\mathbb{N}} T^{-n} B$. Since $X$ is compact by assumption and $T^{-n} B$ is open, the theorem of Heine-Borel implies that $X$ possesses a finite covering of the form
\[ X = \bigcup_{k \in K(B)} T^{-k} B \]
with some finite subset $K(B) \subset \mathbb{N}$. Once again by the compactness of $X$ there exist finitely many open balls $B_1, \ldots, B_r$, each of radius $\epsilon/2$, such that
\[ X = \bigcup_{j=1}^{r} B_j. \]
Thus, given $x, x' \in X$, we have $x \in B_i$ for some $i \in \{1, \ldots, r\}$ and $x' \in T^{-k} B_i$ for some $k \in K(B_i)$. This gives (9.5) with $K = \bigcup_{j=1}^{r} K(B_j)$, a finite set which depends only on $\epsilon$.
Next we show that, for any $\epsilon > 0$ and any $x \in X$, there exist $y \in X$ and $n \in \mathbb{N}$ such that
\[ d(T^{jn} y, x) < \epsilon \quad\text{for}\quad j = 1, \ldots, N. \tag{9.6} \]
Since any homeomorphism $T^k$ is uniformly continuous on the compact set $X$, there exists $\rho > 0$ for which
\[ d(T^k x_1, T^k x_2) < \epsilon \quad\text{for } x_1, x_2 \in X \text{ whenever}\quad d(x_1, x_2) < \rho. \tag{9.7} \]
Actually, we may suppose this for all $k$ from the finite(!) set $K$, defined by (9.5) (by uniformity and compactness). By the induction hypothesis, there exist $x' \in X$ and $n \in \mathbb{N}$ such that
\[ d(T^{jn} x', x') < \rho \quad\text{for}\quad j = 1, \ldots, N-1. \]
Since $X$ is compact, it follows that the $T$-invariant set $TX$ is closed, hence, by minimality, $TX = X$ and $T^n X = X$, respectively. Thus we can find $y' \in X$ such that $T^n y' = x'$ and
\[ d(T^n y', x') = 0, \qquad d(T^{jn} y', x') < \rho \quad\text{for}\quad j = 2, \ldots, N. \]
Using our previous uniform estimate (9.7), it thus follows that
\[ d(T^{jn+k} y', T^k x') < \epsilon \quad\text{for}\quad k \in K,\ j = 1, \ldots, N. \]
For any $x \in X$ there exists some $k \in K$ with $d(T^k x', x) < \epsilon$. Setting $y := T^k y'$, the triangle inequality yields
\[ d(T^{jn} y, x) \le d(T^{jn+k} y', T^k x') + d(T^k x', x) < 2\epsilon \]
for $j = 1, \ldots, N$. Since $\epsilon > 0$ is arbitrary, we may deduce (9.6).
We get close to the finish. Given $\epsilon_0 > 0$ and some arbitrary $x_0 \in X$, in view of (9.6) there exist $x_1 \in X$ and $n_1 \in \mathbb{N}$ satisfying
\[ d(T^{jn_1} x_1, x_0) < \epsilon_0 \quad\text{for}\quad j = 1, \ldots, N. \]
Now we choose $\epsilon_1 \in (0, \epsilon_0)$ such that with $d(x, x_1) < \epsilon_1$ also
\[ d(T^{jn_1} x, x_0) < \epsilon_0 \quad\text{for}\quad j = 1, \ldots, N \]
holds. We iterate this as follows: suppose we have defined
• points $x_1, \ldots, x_k \in X$,
• positive integers $n_1, \ldots, n_k$, and
• a strictly monotonic decreasing sequence of positive real numbers $\epsilon_1, \ldots, \epsilon_k$
with the property that, for all $i = 1, \ldots, k$,
\[ d(T^{jn_i} x_i, x_{i-1}) < \epsilon_{i-1} \quad\text{for}\quad j = 1, \ldots, N, \tag{9.8} \]
and, if $d(x, x_i) < \epsilon_i$, additionally
\[ d(T^{jn_i} x, x_{i-1}) < \epsilon_{i-1} \quad\text{for}\quad j = 1, \ldots, N \tag{9.9} \]
is true. Then, by (9.6), there exist (as in the case $i = 0$ above) $x_{k+1} \in X$ and $n_{k+1} \in \mathbb{N}$ such that
\[ d(T^{jn_{k+1}} x_{k+1}, x_k) < \epsilon_k \quad\text{for}\quad j = 1, \ldots, N. \]
Now we choose $\epsilon_{k+1} \in (0, \epsilon_k)$ such that $d(x, x_{k+1}) < \epsilon_{k+1}$ implies
\[ d(T^{jn_{k+1}} x, x_k) < \epsilon_k \quad\text{for}\quad j = 1, \ldots, N. \]
Hence, (9.8) and (9.9) hold with $i = k + 1$. This process can be continued ad infinitum which finishes the induction.
Finally, we let $i = \ell - 1, \ell - 2, \ldots$ and deduce for $i < \ell$ with regard to (9.8) and (9.9) that
\[ d(T^{j(n_{i+1} + \ldots + n_\ell)} x_\ell, x_i) < \epsilon_i \quad\text{for}\quad j = 1, \ldots, N. \]
Since $X$ is compact, there exists a finite covering of $X$ by $r$ open balls of radius $\epsilon_0$. Consequently, there are indices $i, \ell$ satisfying $0 \le i < \ell \le r$ and $d(x_i, x_\ell) < \epsilon_0$. Setting $m = n_{i+1} + \ldots + n_\ell$ it follows from $\epsilon_i < \epsilon_0$ that
\[ d(T^{jm} x_\ell, x_\ell) \le d(T^{jm} x_\ell, x_i) + d(x_i, x_\ell) < 2\epsilon_0 \quad\text{for}\quad j = 1, \ldots, N. \]
Since $\epsilon_0 > 0$ is arbitrary, we conclude the assertion of Theorem 9.2 in the special case $T_j = T^j$ for $j = 1, \ldots, N$. •
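The special case just proved can be made tangible for the circle rotation T x = x + α mod 1, where d(T^{jn} x, x) equals the distance of jnα to the nearest integer and does not depend on x. The sketch below uses α = √2, N = 3 and ε = 0.05 as illustrative values, with function names of our own. It searches for the first n that brings all of T^n, T^{2n}, ..., T^{Nn} back within ε simultaneously.

```python
import math


def circle_norm(t: float) -> float:
    """Distance from t to the nearest integer."""
    return abs(t - round(t))


def simultaneous_return(alpha: float, N: int, eps: float, limit: int = 10**6):
    """Smallest n <= limit with ||j * n * alpha|| < eps for all j = 1..N,
    or None if there is no such n up to limit."""
    for n in range(1, limit + 1):
        if all(circle_norm(j * n * alpha) < eps for j in range(1, N + 1)):
            return n
    return None


n = simultaneous_return(math.sqrt(2), N=3, eps=0.05)
print(n, [round(circle_norm(j * n * math.sqrt(2)), 4) for j in (1, 2, 3)])
```

For a single rotation the simultaneous return is easy to explain: if the distance of nα to the nearest integer is less than ε/N, then the distance of jnα is at most j times that. The strength of Theorem 9.2 lies in the fact that no such linear structure is needed.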
The given proof of van der Waerden’s theorem uses some infinitary tools (e.g., the theorem of Tychonoff, Zorn’s lemma, and the theorem of Heine-Borel). In fact, one can circumvent these statements by a
quantitative approach which leads to a purely combinatorial proof. For this
and further thoughts we refer to [135].
Chaotic or random structures, if sufficiently large, do contain regular substructures. This is the essence of the above results. The van der Waerden theorem allows many applications. We give an example which is related to the distribution of values of quadratic polynomials modulo one (which closes the circle of our course):
Corollary 9.4. Given a real number $\alpha$ and some $\epsilon > 0$, there exist infinitely many $m \in \mathbb{N}$ such that
\[ \|\alpha m^2\| < \epsilon. \]
Here $\|x\|$ denotes the minimal distance of $x$ to an integer. The corollary can be proved in various ways, for instance, along the uniform distribution results of Weyl; however, the following proof is of a completely different nature:
Proof. We dissect the unit interval into finitely many subintervals $I$, each of length $\le \epsilon/2$. Then each of the sets
\[ \{ n \in \mathbb{N} : \tfrac{1}{2}\alpha n^2 \bmod 1 \in I \} \]
defines a subset of N which does not intersect any other such set. By the theorem of van der Waerden at least one of those subsets of N contains an arithmetic progression of length three and common difference $d$ as large as we please (by removing terms from longer arithmetic progressions). Hence, there exists $n \in \mathbb{N}$ such that
\[ \tfrac{1}{2}\alpha n^2,\ \tfrac{1}{2}\alpha (n+d)^2,\ \tfrac{1}{2}\alpha (n+2d)^2 \in I \]
for some $I$. Next we consider the identity
\[ \tfrac{1}{2}\alpha n^2 - 2 \cdot \tfrac{1}{2}\alpha (n+d)^2 + \tfrac{1}{2}\alpha (n+2d)^2 = \alpha d^2. \]
The left-hand side is made from two differences of elements in $I$ modulo one, each of which is $\le \epsilon/2$. This implies the inequality for $m = d$. Letting $\epsilon \to 0$ we obtain infinitely many such $m \in \mathbb{N}$. •
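Both ingredients of this proof are easy to experiment with. The following sketch (illustrative only; the function names and the sample values α = 3/7 and α = √2 are our own choices) verifies the second-difference identity in exact rational arithmetic and then lists some m for which the distance of αm² to the nearest integer is small.

```python
import math
from fractions import Fraction


def dist_to_nearest_integer(t: float) -> float:
    """The quantity ||t||: the distance from t to the nearest integer."""
    return abs(t - round(t))


def second_difference(alpha: Fraction, n: int, d: int) -> Fraction:
    """Left-hand side of the identity used in the proof, computed exactly."""
    half = Fraction(1, 2)
    return (half * alpha * n**2
            - 2 * half * alpha * (n + d)**2
            + half * alpha * (n + 2 * d)**2)


def small_square_multiples(alpha: float, eps: float, limit: int):
    """All m <= limit with ||alpha * m^2|| < eps (floating point arithmetic,
    hence reliable only for moderate m)."""
    return [m for m in range(1, limit + 1)
            if dist_to_nearest_integer(alpha * m * m) < eps]


identity_holds = all(
    second_difference(Fraction(3, 7), n, d) == Fraction(3, 7) * d**2
    for n in range(1, 20) for d in range(1, 20)
)
hits = small_square_multiples(math.sqrt(2), 0.01, 100)
print(identity_holds, hits)
```

For α = √2 the value m = 13 appears in the list, since 169√2 = 239.0020... lies very close to an integer.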
Erdös offered 3000 US dollars for a proof of the following still open conjecture:¶ If $(a_n)$ is a strictly increasing sequence of positive integers and
\[ \sum_{n=1}^{\infty} \frac{1}{a_n} \]
diverges, then the sequence contains arbitrarily long arithmetic progressions. In particular, this would imply the Green & Tao theorem since the series over the reciprocals of the primes diverges as already Euler knew (see the introduction to Chapter 6).
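The divergence of the series over the reciprocals of the primes is extremely slow, which a short computation makes visible. The sketch below is an illustration only; the sieve and the function names are our own, and the comparison value log log x + 0.2615 (Mertens’ classical asymptotics, with the constant rounded) is quoted as a known fact rather than derived here.

```python
import math


def primes_up_to(limit: int):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out the multiples p*p, p*p + p, ... of p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]


def reciprocal_sum(limit: int) -> float:
    """Partial sum of 1/p over all primes p <= limit."""
    return sum(1.0 / p for p in primes_up_to(limit))


for x in (10**2, 10**4, 10**6):
    print(x, round(reciprocal_sum(x), 4), round(math.log(math.log(x)) + 0.2615, 4))
```

The partial sums grow like log log x, so that even the primes up to one million contribute a total of less than 3.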
¶ Actually, Erdös offered many such prizes for his countless conjectures, starting from 5 dollars, the amount being an indicator of the expected degree of difficulty. It is said that Erdös claimed that he could also announce a prize of 10^6 dollars for the above conjecture since he would not see the proof in his lifetime. Erdös died in 1996.
Exercises
Exercise 9.1. Give a proof of Theorem 9.2 for the general case of arbitrary commuting homeomorphisms T1 , . . . , TN . (Help can be found in [114].)
What could be more appropriate for the last task in these course notes than
Exercise 9.2. Try to prove Erdös’ conjecture that whenever $(a_n)$ is a strictly increasing sequence of positive integers with diverging series $\sum_{n=1}^{\infty} \frac{1}{a_n}$, then $(a_n)$ contains arbitrarily long arithmetic progressions. (And if you succeed, please inform me!)
* * *
Ergodic theory exhibits patterns in (random or deterministic) data sets. So it is no surprise that ergodic theory has important applications in information theory. Basic questions are: what is randomness and how much randomness is in a given set of data? In 1948 Shannon defined randomness by entropy and laid the foundations of information theory and modern digital communication technology. Kolmogorov and Sinai extended the definition to the study of measure preserving transformations. For the fundamental results in this direction and applications in data compression we refer to the book of Choe [31].
However, there is much more that could and should be said about ergodic theory in general and ergodic number theory in particular. We cannot do better than to refer to the rich literature on this fascinating topic: Although Halmos’ book [58] is very thin, it contains an interesting list of unsolved problems which are still open. The standard reference on ergodic theory is Krengel’s monograph [91]. The new book [47] by Einsiedler & Ward offers many number-theoretical applications. The standard reference for metrical continued fraction theory is [71] due to Iosifescu & Kraaikamp.
Biographical and Historical Notes
We conclude with some biographical and historical notes related to ergodic number theory. Our selection does not pretend to be complete. In fact, we will be very brief with those mathematicians who are either very famous or have not directly contributed to our topic. We cannot explain all the historical interactions which are in one way or another related to the topic of these notes, but we intend to comment on a few interesting incidents which are related to the story behind it. The biographical accounts are mostly based on ‘The MacTutor History of Mathematics archive’ http://turnbull.mcs.st-and.ac.uk/ history/.
Questions about Life: Laplace’s Demon and Boltzmann’s Brain
The scientific revolution of Copernicus, Brahe, Galileo, and others was the starting point for a new look at old questions about life and God. Philosophers such as John Calvin believed in predestination, which means that there exists a God who determined the fate of the universe for all time and space before Creation. Although this is not easily compatible with human free will, we can find non-theistic or polytheistic ideas of determinism, destiny, fate, or doom in many different cultures. Here we are concerned with its counterpart in science.
Pierre-Simon de Laplace, born 1749, was a French mathematician who is well-known for his contributions to probability theory, differential equations, and, in particular, mathematical physics. His mathematical career started at the University of Caen before he later moved to Paris where he became professor at the École Militaire. In 1790 he became a member of the committee of the Académie des Sciences charged with standardizing weights and measures. This committee worked on the metric system and advocated a decimal base. However, the social and political upheaval of the French Revolution demanded republican virtues and hatred of kings, and Laplace and his family left Paris for some time until Robespierre was guillotined. With the rise of Napoleon in 1799 Laplace became Minister of the Interior; however, as written in Napoleon’s memoirs, Laplace was removed from this position after only six weeks because he brought the spirit of the infinitely small into the government, and was promoted to the Senate.
One direction of his research was devoted to questions about astronomical stability, a recurrent motif in these notes. Here Laplace pointed out that there could be massive stars whose gravity is so great that not even light could escape from their surface, a forerunner of the notion of a black hole. In 1814 Laplace proposed the following thought experiment:
”We may regard the present state of the universe as the effect of
its past and the cause of its future. An intellect which at a certain
moment would know all forces that set nature in motion, and all
positions of all items of which nature is composed, if this intellect
were also vast enough to submit these data to analysis, it would
embrace in a single formula the movements of the greatest bodies
of the universe and those of the tiniest atom; for such an intellect
nothing would be uncertain and the future just like the past would
be present before its eyes.”
This quotation is from Laplace’s introduction to his Philosophical essay on probabilities and the mentioned ’intellect’ is often referred to as Laplace’s demon. By Heisenberg’s uncertainty principle∗ it turned out that Laplace’s thought experiment is inherently incompatible with quantum mechanics where exact simultaneous measurement of position and velocity is impossible. Actually, Laplace’s demon already met its end with the foundation of thermodynamics by Maxwell and his contemporaries in the 19th century. Certain processes in nature are irreversible. Laplace died in 1827.
∗ Werner Heisenberg was born 1901 in Würzburg!
Figure 1. Laplace, Boltzmann, and James Clerk Maxwell, ∗ 1831 - † 1879, who, of course, also played a leading role in the development of thermodynamics.
Another not unrelated point of view was discussed by Ludwig Boltzmann, who lived from 1844 to 1906 and was an Austrian physicist famous for his contributions to statistical mechanics and thermodynamics. His kinetic theory of gases and statistical mechanics was popularized by Paul and Tatiana Ehrenfest [45]. However, the ergodic hypothesis was disproved in 1913 by Rosenthal and Plancherel; we refer to the simple measure-theoretical argument of Carathéodory [27]: the points of an orbit form a set of measure zero whereas the energy surface has non-zero measure. Boltzmann was also one of the most important advocates of atomic theory when that scientific model was highly controversial. However, here we want to report briefly about a philosophical idea to explain why there is a lot of observable organization in the universe: a ’Boltzmann brain’ is a hypothetical self-aware object which arises due to random fluctuations out of some chaotic state. The famous physicist Richard Feynman wrote in his Lectures on Physics [53]:
”So far as we know, all the fundamental laws of physics, such as
Newton’s equations, are reversible. Then where does irreversibility
come from? It comes from order going to disorder, but we do not
understand this until we know the origin of the order. Why is it
that the situations we find ourselves in every day are always out
of equilibrium? (...) One possible explanation is the following.
Look again at our box of mixed white and black molecules. Now
it is possible, if we wait long enough, by sheer, grossly improbable, but possible, accident, that the distribution of molecules gets to
be mostly white on one side and mostly black on the other. After
that, as time goes on and accidents continue, they get more mixed
up again. Thus one possible explanation of the high degree of order
in the present-day world is that it is just a question of luck. Perhaps our universe happened to have had a fluctuation of some kind
in the past, in which things got somewhat separated, and now they
are running back together again. This kind of theory is not unsymmetrical, because we can ask what the separated gas looks like
either a little in the future or a little in the past. In either case, we
see a grey smear at the interface, because the molecules are mixing
again. No matter which way we run time, the gas mixes. So this
theory would say the irreversibility is just one of the accidents of
life. (...) We would like to argue that this is not the case. Suppose
we do not look at the whole box at once, but only at a piece of the
box. Then, at a certain moment, suppose we discover a certain
amount of order. In this little piece, white and black are separate. What should we deduce about the condition in places where
we have not yet looked? If we really believe that the order arose
from complete disorder by a fluctuation, we must surely take the
most likely fluctuation which could produce it, and the most likely
condition is not that the rest of it has also become disentangled!
Therefore, from the hypothesis that the world is a fluctuation, all
of the predictions are that if we look at a part of the world we have
never seen before, we will find it mixed up, and not like the piece
we just looked at. If our order were due to a fluctuation, we would
not expect order anywhere but where we have just noticed it. (...)
We therefore conclude that the universe is not a fluctuation, and
that the order is a memory of conditions when things started. This
is not to say that we understand the logic of it. For some reason, the universe at one time had a very low entropy for its energy
content, and since then the entropy has increased. So that is the
way toward the future. That is the origin of all irreversibility, that
is what makes the processes of growth and decay, that makes us
remember the past and not the future, remember the things which
are closer to that moment in history of the universe when the order
was higher than now, and why we are not able to remember things
where the disorder is higher than now, which we call the future.”
Rationality vs. Irrationality
The very first observation of irrationality probably dates back to the school of Pythagoras, when Hippasus discovered the existence of irrationality or, in the words of the later Eudoxos, that the lengths of the side and the diagonal of a square are incommensurable: $\sqrt{2} \notin \mathbb{Q}$. Previously, Greek philosophy and mathematics held that everything can be expressed in rational numbers. This point of view implied a central position of mathematics among all sciences since the laws of the world are written in mathematical terms or, as Galileo said
”Mathematics is the language with which God has written the universe.”
The belief of the ancient Greeks in rational numbers also indicates the incompleteness of mathematics at that time. We often underestimate the efforts and insights of past generations; however, it has been a long way from incommensurability as it appears in the simple geometry of squares to Cantor’s concept of numbers and sets.
Probably less known are the following quotations of the fourteenth century
scientist Nicole Oresme, born around 1320 near Caen in France, died 1382 as
bishop of Lisieux. The first is from his work De proportionibus proportionum and
claims:
It is probable that two proposed unknown ratios are incommensurable because if many unknown ratios are proposed it is most
probable that any [one] would be incommensurable to any [other],
whereas the second appears in his Tractatus de commensurabilitate vel incommensurabilitate motuum celi (from 1351!) in which he considers two bodies moving on
a circle with uniform but incommensurable velocities:
No sector of a circle is so small that two such bodies could not
conjunct in it at some future time, and could not have conjuncted
in it sometime [in the past]
These sentences form part of Oresme’s refutation of astrology. His viewpoint is that the future is essentially unpredictable, which he tries to illustrate with a dynamical system! These quotations indicate Oresme’s deep understanding of irrationality and circle rotations. In modern mathematical language his observation is that the rational numbers form a null set (in the sense of Lebesgue) and that the multiples of an irrational number lie dense in the unit interval modulo one (cf. [108], §4.2). This is indeed a remarkable observation Oresme made more than half a millennium ahead of his time; we refer to [145] for a detailed analysis of his reasoning. Oresme is well-known for his opposition to Aristotle’s astronomy; indeed he thought about a rotation of the Earth about two centuries before Copernicus, however, in the end he rejected his new ideas. Moreover, he wrote an interesting treatise on the speed of light and he invented a kind of coordinate geometry before Descartes, to mention just a few of his ingenious ideas.
Figure 2. From left to right: Oresme, Kronecker, Cantor.
Another Competition: Intuitionism and Infinity
Leopold Kronecker was a German mathematician who lived from 1823 to 1891 and is well-known for rivalries with contemporaries. Among his academic teachers were Kummer and Dirichlet; he also got to know Eisenstein and Jacobi during his studies, and they all influenced him to work on number theory and elliptic functions. Kronecker became a wealthy man by managing the banking business of his uncle successfully and marrying his uncle’s daughter. From then on he did not need to take on paid employment and spent much time on mathematical research. He moved to Berlin and became an elected member of the Berlin Academy of Sciences.
Kronecker is quite famous for his sentence
”God created the integers, all else is the work of man”
in which he expressed his belief that all mathematics could be reduced to arguments which involve only the integers and a finite number of steps. In the 1870s he was opposed to the use of irrational numbers, limits, and the theorem of Bolzano-Weierstrass because of their non-constructive nature. Also transcendental numbers could, following Kronecker, not exist.† This ended in a conflict, in particular with the new concepts of set and cardinality introduced by Cantor. However, it should be noticed that there were more mathematicians than only Kronecker who shared these strong feelings against arguments involving, for example, infinity. Kronecker’s viewpoint was further developed by Poincaré and Brouwer, who introduced the name intuitionism to stress that mathematics is made by mathematicians and thus has priority over logic.
Kronecker’s strong feelings against the concept of infinity have to be seen in the context of a new rising mathematical theory in the second half of the nineteenth century: set theory, invented by Georg Ferdinand Ludwig Philipp Cantor, who was a German mathematician, born 1845 into the family of a broker at the St. Petersburg Stock Exchange. During his studies in Berlin Cantor was mostly interested in number theory; however, after his Ph.D. he changed his field from number theory to analysis and moved to Halle. In the 1870s he started his revolutionary work on the foundation of set theory, as we would describe it today. We know of his progress from his communication with his friend and colleague Dedekind. In 1873 Cantor proved that the rational numbers are countable, and that the real numbers are not countable. For this purpose he introduced the concept of bijective mappings to mathematics. Implicitly, he also showed by these means that the algebraic numbers are countable and thus almost all numbers are transcendental. Soon after, Cantor succeeded in showing that there is a bijective map from the unit interval to d-dimensional space with arbitrary d. His paper on this astonishing result was treated with suspicion by Kronecker and it was published only after Dedekind intervened on Cantor’s behalf. In the period from 1879 to 1884 Cantor published his major papers on set theory, including transfinite arithmetic; however, realizing that his ideas were not widely accepted, Cantor had a first recorded attack of depression. Another reason could also be his inability to prove his continuum hypothesis that the order of infinity of the real numbers was the next after that of the integers (a question which was shown by Gödel and Cohen in the twentieth century to be independent of the standard axioms of set theory).
† How this goes along with his approximation theorem, the dear reader may investigate on her or his own!
In 1890 Cantor founded the Deutsche Mathematiker-Vereinigung and organized the first meeting of this association in Halle one year later. In the late 1890s Cantor’s ideas were finally accepted by the mathematical community. Both Hurwitz and Hadamard expressed their positive opinion on his work in their lectures at the first International Congress of Mathematicians 1897 in Zurich. At this time Cantor discovered the first paradox in set theory which, probably, caused another depression. For an example of such a paradox we quote from Russell the ’barber paradox’ of the male barber who shaves all men of his town who do not shave themselves; but who shaves the barber? During these heavy periods of his mental illness he was concerned with questions of philosophy and literature (which led him to his belief that Francis Bacon wrote the plays of Shakespeare). In his last years Cantor was faced with the deaths of some of his children and further periods of depression. He died in 1918 of a heart attack. Legendary are the words of Hilbert concerning Cantor’s work, in English translation:
”No one will drive us from the paradise which Cantor created for
us.”
It might be a bit ironic that Kronecker’s results on diophantine approximation give motivation for several mathematical theories, e.g., ergodic theory, which do not question the methods but heavily use non-intuitionistic concepts such as infinity or even Zorn’s lemma.
Figure 3. From left to right: Hermann Weyl, ∗ 1885 - † 1955,
and Edmund Hlawka ∗ 1916 - † 2009; both contributed to uniform distribution theory. Unfortunately, I could not find any picture of Bohl; we compensate for this with a short biography.
Less known is the Latvian mathematician Piers Bohl who lived from 1865 to
1921. Besides his contributions to uniform distribution theory Bohl is also known
for his proof of Brouwer’s fixed-point theorem for continuous mappings from a
sphere into itself, although this result provoked only little interest at that time.
Latvia had been under Russian rule since the 18th century; in 1914, because of
World War I, Bohl’s institute at Riga was evacuated to Moscow. Bohl went to
Moscow with his colleagues. When Latvia regained independence after the Russian
Revolution of 1917 and the end of World War I in 1918 (an independence that lasted only
until World War II), Bohl returned to Riga to take up a chair at the
University of Latvia, which had just been established. He died prematurely of a
stroke soon after.
Biographical and Historical Notes
New Theories: Measure, Integral, and Probability
There is an interesting old quotation of Galileo:
”Measure what is measurable, and make measurable what is not
so.”
This is indeed a good motto for describing the contributions of French analysts at
the turn of the twentieth century.
Émile Borel lived from 1871 to 1956. Already his thesis contains
important results on the theory of measure and divergent series, as well as
the famous covering theorem which nowadays is called the Heine-Borel theorem
(and taught in any course on calculus). In the following very productive years
(in quantity and quality) he introduced in particular the notion of σ-additivity
(cf. [125]), and some of his works are related to Einstein’s theory
of relativity, which shows a remarkable spectrum of interests. He certainly was
influenced by mathematicians such as Jordan, Picard, Goursat, Painlevé, and
Appell whose daughter Marguerite, a quite famous author writing under the
name Camille Marbo, he married. In 1897 he was joint secretary at the first
International Congress of Mathematicians held in Zurich, and in 1905 he was elected
president of the French Mathematical Society. Besides many awards and further
activities it should be mentioned that he founded the Institut de Statistique de
l’Université de Paris in 1922 and, with the financial support of Rockefeller and
Rothschild, he set up the Institut Henri Poincaré in 1928. In his many papers
on probability theory he stressed its practical value and its variety of applications
in various sciences.
In the 1920s Borel started a second career in politics. He joined the
Republican-Socialist Party, the group to which also Painlevé belonged. Borel
became Minister of the Navy in the French Government in the period from 1925
until 1940. During World War II he still produced mathematical works of high
level besides his activities in the Résistance. He was arrested and imprisoned in
1941 but released after one month. For his resistance against Vichy he was awarded
the Grand Croix Légion d’Honneur. In 1948 he became president of the Science
Committee of UNESCO.
Borel’s mathematical ideas had a successor in Henri Lebesgue, another
French mathematician, who lived from 1875 to 1941. His doctoral thesis and early
works were a real breakthrough. In the 19th century analysis was limited to continuous functions. Generalizing the concept of the Riemann integral as the area below a
curve to discontinuous functions, Lebesgue contributed one of the biggest achievements of modern analysis. He worked out the theory of measure in 1901, building
on previous works of Borel and Jordan, and in the same year he formulated
the definition of what is now known as the Lebesgue integral.
At more or less the same time the English mathematician Young developed
independently an integration theory analogous to the one of Lebesgue; however,
as Burkill wrote
”He did not meet the recognition he deserved. This was due in part
to his late start, and in part to a certain conservative hostility to
the modern theory of real functions - a theory which few Englishmen in the early years of this century understood. Even when his
profundity and originality were better appreciated, he was passed
over in elections to chairs in favour of men who might be expected
to be less exacting colleagues.”
Young’s definitions of measure and integration were different from but essentially
equivalent to those of Lebesgue. Whereas his work is almost forgotten, Lebesgue’s
thesis had a deep impact on Fourier analysis and reawakened this field after a
calm period in the second half of the 19th century. In his later career Lebesgue’s
interests moved to topology, potential theory, set theory, and the theory of surfaces.
He also contributed to pedagogy and the history of science.
Figure 4. From left to right: Borel, Lebesgue, William
Henry Young, ∗ 1863- † 1942, and Kolmogorov
It took a while before the new concepts of measure and integration were
adopted in other disciplines of mathematics, in particular, in probability theory.
Andrey Nikolaevich Kolmogorov was born in 1903 in Russia. His career and
his contributions to various mathematical fields (including dynamical systems) were
exceptional. In 1933 Kolmogorov published his influential Grundbegriffe der
Wahrscheinlichkeitsrechnung [88], a treatise on probability theory with the first
widely accepted axiomatic setting (see [125] for its history and reception). Actually,
his approach is now widely considered as the mathematical foundation of probability.
Before Kolmogorov rather different concepts were proposed, from the very simple
experimental one due to Laplace to several more developed, yet in their details
unsatisfactory, approaches by the French and German schools. Hilbert’s Sixth
problem proposed
”to treat (...) by means of axioms those physical sciences in which
mathematics plays an important part; in the first rank are the theory of probability and mechanics”
In this sense, Kolmogorov solved the probabilistic part of this task. Moreover,
in 1957 he and his student Arnold settled Hilbert’s Thirteenth Problem for continuous functions by showing that
every continuous function of three variables is representable by continuous
functions of two variables. Kolmogorov died in 1987.
The Struggle for the First Ergodic Theorem
We start with the famous French mathematician and physicist Henri
Poincaré, who lived from 1854 to 1912, and is often compared with Hilbert for
being the last universalist in mathematics (although he had started his scientific
career as a mining engineer). He contributed to numerous branches of mathematics.
In his 270-page paper [112] Poincaré solved part of the three-body problem,
that is the mathematical description of the orbits of three bodies interacting through gravity. This extraordinary work was awarded a prize by the Swedish king Oscar II on the occasion
of his sixtieth birthday; however, the publication of this work was delayed by three years
and fifty letters of correspondence with Phragmén and Mittag-Leffler, who
found a gap in the original version. With his approach Poincaré laid the foundations for treating chaotic movements and invariant integrals. The complete
analytic solution of the three-body problem was given by Sundman in 1907. The
stability of a system consisting of three bodies is described by the KAM theory
due to Kolmogorov, Arnold & Moser, established in the period 1954-1964.
Poincaré’s studies on topology were path-breaking; in particular, his conjecture about the characterization of the three-dimensional sphere among three-dimensional manifolds influenced much of the investigation in this field before Perelman’s solution of this millennium problem in 2003. For a more elaborate appreciation of his life and work see http://turnbull.mcs.st-and.ac.uk/~history/. His works
on automorphic forms and elliptic curves were the starting points of fruitful investigations in number theory in the twentieth century. In physics Poincaré is
considered together with Lorentz and, of course, Einstein as one of the discoverers of special relativity. He died prematurely of an embolism; soon after, his cousin
Raymond Poincaré became President of France, an office he held from 1913 to 1920.
Figure 5. From left to right: Poincaré, Birkhoff, von Neumann
For the field of ergodic theory Poincaré’s investigations related to the ergodic
hypothesis were most influential.
George D. Birkhoff, ∗ 1884 - † 1944, was probably the most famous American mathematician of his time. He worked at the universities of Harvard and
Princeton and his main field of interest was analysis, in particular mathematical physics, where he proved Poincaré’s ‘Last Geometric Theorem’, a special case
of the three-body problem. Moreover, he studied the four colour problem and,
of course, dynamical systems. His ergodic theorem gave a rigorous foundation of
Maxwell’s kinetic theory of gases. Here is a quotation by Butler:
“Birkhoff’s discovery of what has come to be known as the ’ergodic theorem’ in 1931 - 32 is his most well-known contribution
to dynamics. This theory, which resolved in principle one of the
fundamental problems arising in the theory of gases and statistical
mechanics, has been influential not only in dynamics itself but also
in probability theory, group theory, and functional analysis.”
Birkhoff was awarded the first Bôcher Memorial Prize of the American Mathematical Society; later he was vice-president of this association. However, there is
also something negative to report about his life. According to Einstein he was one
of the world’s greatest anti-Semites. And it is said that Birkhoff used his influential position to prevent the appointment of Jewish scientists. It should be noted
that his son Garrett Birkhoff, ∗1911-†1996, also played an important role in
ergodic theory. In contrast to his father, Garrett was not anti-Semitic. At first he
worked in group theory; later, during and after World War II, he changed his
interests and studied problems in applied mathematics (in particular in numerical
linear algebra). During this time he became a friend of John von Neumann and
they wrote an influential joint paper on the logic of quantum mechanics.
We conclude with a quotation by Davis on both Birkhoffs:
”G D Birkhoff was an early teacher of mine, and his son Garrett was my (much appreciated) thesis supervisor. G D (but not
Garrett) was consistently anti-Semitic, as shown in correspondence
over the years (...) He systematically kept Jews out of his department, but apparently relented late in life and favoured appointing
ONE by the 1940s. He also helped some Jewish refugees find jobs
NOT at Harvard in the 1930s, while acting generally to hinder their
entry. Though his record is mixed and some were more implacably
anti-Semitic than he was, his actions in this regard are important
because of his very great influence. However, it does not seem to
be true (as rumoured credibly at the time) that he opposed the appointment of Oscar Zariski to his department. As I mentioned,
Garrett was not anti-Semitic at all.”
John von Neumann was born in 1903 in Budapest. Already the young John
(in those times János) gave proof of his brilliant memory, as we quote from
Poundstone:
“At the age of six, he was able to exchange jokes with his father
in classical Greek. The Neumann family sometimes entertained
guests with demonstrations of Johnny’s ability to memorise phone
books. A guest would select a page and column of the phone book
at random. Young Johnny read the column over a few times, then
handed the book back to the guest. He could answer any question
put to him (who has number such and such?) or recite names,
addresses, and numbers in order.”
Starting from 1921 von Neumann studied mathematics and chemistry in Budapest, Berlin, and Zurich; amongst his teachers were Weyl and Pólya. In 1926
he finished his doctoral studies with a dissertation about ordinal numbers in set
theory. Afterwards he taught in Berlin, Hamburg, and Göttingen (together with
Hilbert). On invitation by Veblen, von Neumann came to Princeton in 1929 to teach quantum mechanics; soon after he became professor at the
newly founded Institute for Advanced Study (together with Alexander, Einstein, Morse, Veblen, and Weyl). At the same time he held academic positions
in Germany; however, after the takeover by the Nazi party many Jewish scientists
had to leave Germany, among them von Neumann. His research was extremely
broad, ranging from pure directions such as logic, axiomatic set theory, and measure
theory to more applied branches such as partial differential equations, mathematical
foundations of quantum mechanics, statistical mechanics, and operator theory. In
this context he discovered the very first ergodic theorem. Moreover, he contributed
to Haar’s development of measure theory for groups, which led to a partial solution
of Hilbert’s Fifth Problem on characterizing Lie groups. He is also considered
the founder of game theory and of the concept of cellular automata in computer science. He was awarded numerous prizes for his mathematics and, besides, he was
also well-known for parties and driving. During World War II he contributed many
ideas to the construction of the atomic bomb and the hydrogen bomb in Los Alamos. John von
Neumann died prematurely of cancer in 1957 at the age of 53.
Figure 6. From left to right: Eberhard Friedrich Ferdinand Hopf, ∗1902 - † 1983, an Austrian mathematician who contributed to topology and ergodic theory; Marc Kac, ∗ 1914 - †
1984, Polish-U.S. American mathematician with wide-ranging interests in probability theory, physics, and number theory; Shizuo
Kakutani, ∗ 1911 - † 2004, Osaka-born mathematician, who
proved together with Yosida the first maximal ergodic theorem.
In particular, the question of priority of the discovery of ergodic theorems of
von Neumann and Birkhoff and the process of their publication is of interest.
It is a fact that von Neumann was the first although his publication was a little later. According to von Neumann’s mean ergodic theorem the Cesàro limit
$\frac{1}{N}\sum_{0\le n<N} f(T^n x)$ is convergent in the mean (of order two) while Birkhoff’s
pointwise ergodic theorem claims its convergence almost everywhere. For applications in physics, convergence in the mean is sufficient. However, for mathematics
both are relevant.‡ In an interesting recent paper, Zund [156] presents a lost letter of von Neumann to his friend Robertson from January 1932 in which the
process of von Neumann’s discovery of his ergodic theorem is outlined and which
indicates Birkhoff’s unwillingness to postpone his article [20] for the publication
of von Neumann’s [105]. The latter article was originally written in German and
its translation into English cost von Neumann and his friend Koopman
some time. It should be noted that von Neumann was then aiming to move
from Europe to the U.S. In [20] Birkhoff’s citation is inadequate since it
does not indicate the priority of von Neumann’s result:
”The important recent work of von Neumann (not yet published)
shows only that there is convergence in the mean, (...) and the time
probability is not established in the usual sense for any trajectory.
A direct proof of von Neumann’s results (not yet published) has
been obtained by E. Hopf.”
‡ In this course we have used Birkhoff’s theorem several times and von Neumann’s
not at all.
Later Birkhoff changed his mind, probably by intervention of Veblen (as Zund
speculates), and mentioned the priority of von Neumann’s discovery explicitly.
The paper by Hopf [67] contains also some ideas concerning improvements of
Birkhoff’s theorem. Whereas Birkhoff’s pointwise ergodic theorem is based on
Lebesgue’s integral and measure theory, von Neumann’s approach is a first and
striking example of abstract Hilbert space theory. The latter was popularized
by von Neumann himself through his work on quantum mechanics and papers
of Koopman, a former doctoral student of Birkhoff, who worked on the weak
ergodic hypothesis.
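The Cesàro averages that appear in both ergodic theorems can be illustrated numerically. The following is only a minimal sketch in Python and is not taken from any of the papers cited above: the transformation (rotation of the unit interval by α = √2 − 1), the test function (the indicator of [0, 1/2)), the starting point 0.1, and all function names are our own illustrative assumptions. For this rotation the averages are expected to approach the space average 1/2.

```python
import math


def birkhoff_average(f, T, x, N):
    """Return (1/N) * sum of f(T^n x) over 0 <= n < N."""
    total = 0.0
    for _ in range(N):
        total += f(x)   # add the value of f at the current orbit point T^n x
        x = T(x)        # move on to the next orbit point
    return total / N


# Illustrative assumption: an irrational rotation number.
ALPHA = math.sqrt(2) - 1


def rotation(x):
    """Rotation x -> x + ALPHA mod 1 on the unit interval."""
    return (x + ALPHA) % 1.0


def indicator(x):
    """Indicator function of the interval [0, 1/2); its integral is 1/2."""
    return 1.0 if x < 0.5 else 0.0


# Averages along the orbit of the (arbitrarily chosen) starting point 0.1.
averages = {
    N: birkhoff_average(indicator, rotation, 0.1, N)
    for N in (10, 100, 1000, 10000)
}

for N, value in averages.items():
    print(N, value, abs(value - 0.5))
```

The printed deviations from 1/2 describe the behaviour along one single orbit, which is the kind of statement Birkhoff’s pointwise theorem makes for almost every starting point; von Neumann’s theorem concerns instead the distance in the mean (of order two) between the averages, regarded as functions of x, and their limit.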
The Unreasonable Effectiveness of Analysis
This is a variation of the title of an interesting article by the physicist Eugene
Wigner [152] on the ubiquity of mathematics in nature. We have already mentioned the merits of analysis when applied to number theoretical problems, e.g., in
the case of the Riemann zeta-function and its relation to prime number distribution. We do not need to write biographical sketches of those who contributed to
prime number theory since, thanks to the famous open Riemann hypothesis, much
has been written about their lives.§ But we quote from Davenport who wrote:
”Analytic number theory may be said to begin with the work of
Dirichlet, and in particular with Dirichlet’s memoir of 1837 on the
existence of primes in a given arithmetic progression.”
Indeed, for his proof of the existence of infinitely many primes in any prime residue
class from 1837/1838 Dirichlet introduced the important concepts of characters
and Dirichlet series. However, for many reasons one may consider Euler as the
first to apply analytic methods to arithmetical problems (see the introduction to Chapter
6).
Figure 7. From left to right: Leonhard Euler, Carl
Friedrich Gauss, Johann Peter Gustav Lejeune Dirichlet (his family had roots in the Belgian town of Richelet which
explains part of his name Le jeune de Richelet standing for Young
from Richelet), and Bernhard Riemann.
§ A good read is the popular book The Music of the Primes by du Sautoy.
Another such success story of ’applied analysis’ is provided by the results on statistics
of continued fraction expansions, starting with Gauss’ observations, the rigorous
proofs thereof by Lévy and Kuzmin, and leading to Khintchine’s constant among
other interesting results. We start with Paul Lévy, a French mathematician who
lived from 1886 to 1971. Being a pupil of Hadamard he worked in analysis, probability and measure theory; however, in 1919 he turned his focus to probability theory on
the occasion of a series of lectures he was asked to deliver at the École Polytechnique. Recall that there was no mathematical foundation of probability at that
time; nevertheless, probability theory attracted many young researchers in those
days. Besides Kolmogorov in Russia it was Lévy in France who pushed this field
forward; we give a quotation of Loève:
”Paul Lévy was a painter in the probabilistic world. Like the very
great painting geniuses, his palette was his own and his paintings
transmuted forever our vision of reality. (...) His three main,
somewhat overlapping, periods were: the limit laws period, the great
period of additive processes and of martingales painted in pathtime
colours, and the Brownian pathfinder period.”
The next contributor to be mentioned is Aleksandr Yakovlevich Khintchine, ∗ 1894 - † 1959, a Russian mathematician, which makes the transcription
of his name from Cyrillic into Latin letters difficult. Besides mathematics he was all his
life fascinated by poetry and theatre. In mathematics he started from analysis and
probability theory and turned later to number theoretical problems, where, however, his approach was always analytic. At Moscow University he founded, together
with Kolmogorov and others, including their pupil Gnedenko, the school of
probability theory. During this time he widened his interests again and started research
in statistical mechanics and information theory. Remarkably, Khintchine wrote
monographs on more or less each of these topics which became standard references
and were translated into various languages. In 1939 he became an elected member of
the USSR Academy of Sciences and in the following year he was awarded the State
Prize for scientific achievements.
Figure 8. From left to right: Khintchine, Lévy, Doeblin; unfortunately, I could not find a picture of Kuzmin.
We conclude with Wolfgang Doeblin, not because of his interesting but unfortunately short life, but because of his deep contributions to mathematics.
Doeblin was born in 1915 in Berlin; his father, Alfred Döblin, was a famous
German writer (his masterpiece is ’Berlin Alexanderplatz’) who emigrated with
his family from Nazi Germany in 1933, first to Switzerland and later to France, where
they obtained French citizenship. In 1936 Wolfgang Döblin changed his
name to the French Vincent Doeblin (although he signed his mathematical works
with Wolfgang Doeblin) and started to study economics and statistics at the
Sorbonne. Among his academic teachers were Denjoy, Fréchet, and Lévy.
He finished his thesis in 1938, the same year in which he was recruited to the French
army. During the war he continued his studies in probability theory. In February 1940
he sent his treatise Sur l’équation de Kolmogoroff to the Académie des Sciences in
Paris because he was afraid that his studies could get lost. When the German army
surrounded his battalion in June 1940 Doeblin committed suicide. His contributions
to mathematics were not fully credited for a long time, until Iosifescu [70]
gave an extensive analysis of his work (apart from Billingsley [17], p.49). Doeblin’s letter to the Académie was opened only in 2000 and its content came as a big
surprise to the leading probabilists all around the world. His treatise anticipated
many results on stochastic analysis which were found only in the 1950s and even
60s.¶
Last but not Least
Much has been written about Paul Erdös, ∗ 1913 - † 1996, the Hungarian
cosmopolitan who published more papers and who proposed more conjectures than
any other mathematician. He worked in various fields such as combinatorics, graph
theory, number theory, and probability theory. For most of his life he traveled
from one math department to the next, giving and searching for inspiration for
his mathematics. There was an interruption during the McCarthy anti-communist
era in the 1950s when the U.S. government denied Erdös a re-entry visa into the
United States; the reasons have never been explained, but probably this was related to the
Cold War and Erdös being a citizen of a communist country. In 1963 the U.S.
government changed its opinion and he resumed including American universities in
his teaching and travels.
We could continue with biographical sketches of another cosmopolitan, Bartel
van der Waerden, his important work in various branches of mathematics, his
contributions to quantum physics, and his historical pieces to popularize mathematics. And we should also write about the work of many others since mathematics
is discovered by humans and often the discovery itself is related to an interesting
life. However, we had better be brief and conclude with two of our heroes, from left to
right: Arnold and Felix:
¶ The biography ’The lost equation. In search of Wolfgang and Alfred Doeblin’ by
Marc Petit gives a very readable account of his and his father’s life.
Notations
We indicate here some of the notation and conventions used in these notes.
However, this list is not complete. We omit notions which only appear in one
chapter (where they are defined in situ) or which are covered by the index or which
are standard.
As usual, we denote by N = {1, 2, 3, . . .} the set of positive integers. The sets
of integers, rational numbers, real numbers, and complex numbers are denoted by
Z, Q, R, and C, respectively.
The logarithm is, if not indicated differently, always taken to the base e =
exp(1). The integer part and fractional part of a real number x are denoted by ⌊x⌋
and {x}, respectively. Very convenient is the use of the Landau and Vinogradov symbols. Given two functions f (x) and g(x), both defined for x ∈ X, where g(x)
is positive for all x ∈ X, we write
• f (x) = O(g(x)) and f (x) ≪ g(x), respectively, if there exists a constant
C ≥ 0 such that
|f (x)| ≤ Cg(x)
for all x ∈ X;
• f (x) ≍ g(x) if f (x) ≪ g(x) ≪ |f (x)|;
here X is specified either explicitly or implicitly. Usually, the set X is an interval
[ξ, ∞) for some real number ξ; in this case we also write
• f (x) ∼ g(x) if the limit $\lim_{x\to\infty} |f(x)|/g(x)$ exists and is equal to 1;
• f (x) = o(g(x)) if the latter limit exists and is equal to zero;
• f (x) = Ω(g(x)) if $\liminf_{x\to\infty} |f(x)|/g(x) > 0$ (this is the negation of f (x) = o(g(x))).
Sometimes the limit x → ∞ is replaced by another limit x → x0 , where x0 is some
complex number; in this case the limit x0 is explicitly stated. In estimates, ε always
denotes a small positive number, not necessarily the same at each appearance.
We denote by ♯A the cardinality of a set A. For a probability measure we use
the bold faced letter P and E stands for the expectation.
Bibliography
[1] B. Adamczewski, Y. Bugeaud, On the Littlewood conjecture in simultaneous Diophantine approximation, J. London Math. Soc. 73 (2006), 355-366
[2] R.L. Adler, B. Weiss, The ergodic infinite measure preserving transformation of
Boole, Israel J. Math. 16 (1973), 263-278
[3] M. Agrawal, N. Kayal, N. Saxena, PRIMES is in P, Ann. of Math. 160 (2004),
781-793
[4] R. Apéry, Irrationalité de ζ(2) et ζ(3), Astérisque 61 (1979), 11-13
[5] V.I. Arnold, Stochastic and Deterministic Characteristics of Orbits in Chaotically
Looking Dynamical Systems, Trans. Moscow Math. Soc. 70 (2009), 31-69
[6] V.I. Arnold, A. Avez, Ergodic Problems of classical mechanics, Benjamin, NY 1968
[7] H. Aslaksen, When is Chinese New Year?, available at
www.math.nus.edu.sg/aslaksen/
[8] J. Avigad, P. Gerhardy, H. Towsner, Local stability of ergodic averages, Trans.
A.M.S. 362 (2010), 261-288
[9] L. Baéz-Duarte, Sobre el promedio espacial del ciclo de Poincaré,
Bull. Venezuela Acad. Sciences 24 (1964), 64-66; engl. translation at
http://front.math.ucdavis.edu/0505.5625
[10] D.H. Bailey, P.B. Borwein, S. Plouffe, On the rapid computation of various
polylogarithmic constants, Math. Comp. 66 (1997), 903-913
[11] D.H. Bailey, R.E. Crandall, On the random character of fundamental constant
expansions, Exper. Math. 10 (2001), 175-190
[12] M. Balazard, E. Saias, M. Yor, Notes sur la fonction de Riemann, 2. Advances
Math. 143 (1999), 284-287
[13] F. Bayart, É. Matheron, Dynamics of linear operators, Cambridge University
Press 2009
[14] V. Becher, S. Figueira & R. Picchi, Turing’s unpublished algorithm for normal
numbers, Theor. Computer Science 377 (2007), 126-138
[15] J. Beck, Super-uniformity of the typical billiard path, in: An irregular mind: Szemerédi is 70, I. Bárány, J. Solymosi (eds.), Springer 2010, 39-130
[16] F. Benford, The law of anomalous numbers, Proc. Amer. Philos. Soc. 78 (1938),
551-572
[17] P. Billingsley, Ergodic theory and Information, John Wiley & Sons, New York
1965
[18] G.D. Birkhoff, Proof of Poincaré’s geometric theorem, Trans. Amer. Math. Soc.
14 (1913), 14-22
[19] G.D. Birkhoff, Démonstration d’un théorème élémentaire sur les fonctions entières,
C. R. Acad. Sci. Paris 189 (1929), 473-475
[20] G.D. Birkhoff, Proof of the ergodic theorem, Proc. Nat. Acad. Sci. USA 17 (1931),
656-660
[21] G.D. Birkhoff, What is the ergodic theorem?, Amer. Math. Monthly 49 (1942),
222-226
[22] P. Bohl, Über ein in der Theorie der säkularen Störungen vorkommendes Problem,
J. f. Math. 135 (1909), 189-283
[23] G. Boole, On the comparison of transcendents with certain applications to the
theory of definite integrals, Philos. Trans. Roy. Soc. London 147 (1857), 745-803
[24] É. Borel, Les probabilités dénombrables et leurs applications arithmétiques, Rend.
Circ. Matematico di Palermo 27 (1909), 247-271
[25] N.G. de Bruijn, K.A. Post, A remark on uniformly distributed sequences and
Riemann integrability, Indagationes math. 30 (1968), 149-150
[26] N. Calkin, H.S. Wilf, Recounting the rationals, Am. Math. Mon. 107 (2000),
360-363
[27] C. Carathéodory, Über den Wiederkehrsatz von Poincaré, Sitzungsberichte Preuß
Akad. Wiss. (1919), 580-584
[28] J.W.S. Cassels, On a problem of Steinhaus about normal numbers, Colloq. Math.
7 (1959), 95-101
[29] R.V. Chacon, D.S. Ornstein, A general ergodic theorem, Illinois J. Math. 4
(1960), 153-160
[30] D.G. Champernowne, The construction of decimals normal in the scale of ten, J.
London Math. Soc. 8 (1933), 254-260
[31] G.H. Choe, Computational Ergodic Theory, Springer 2005
[32] A.H. Copeland, P. Erdös, Note on normal numbers, Bull. Amer. Math. Soc. 52
(1946), 857-860
[33] W.A. Coppel, Number Theory. An Introduction to Mathematics, Part B, Springer
2006
[34] R. Crandall, C. Pomerance, Prime numbers. A computational perspective,
Springer, 2001
[35] J.P. Crutchfield, J.D. Farmer, N.H. Packard, R.S. Shaw, Chaos, Scientific
American 255 (1986), 46-57
[36] A. Csordás, P. Szépfalusy, Singularities in Rényi information as phase transitions
in chaotic states, Phys. Rev. A 39 (1989), 4767-4777
[37] K. Dajani, C. Kraaikamp, Ergodic theory of numbers, Mathematical Association
of America, Washington DC 2002
[38] P. Deligne, La conjecture de Weil. II. Publ. Math., Inst. Hautes Étud. Sci. 52 (1980),
137-252
[39] A. Denjoy, L’Hypothèse de Riemann sur la distribution des zéros de ζ(s), reliée à
la théorie des probabilités, Comptes Rendus Acad. Sci. Paris 192 (1931), 656-658
[40] M. Denker, Einführung in die Analysis dynamischer Systeme, Springer 2005
[41] P. Diaconis, The distributions of leading digits and uniform distribution mod 1,
Ann. Probab. 5 (1977), 72-81
[42] F.J. Dyson, H. Falk, Period of a discrete Cat mapping, Amer. Math. Monthly 99
(1992), 603-614
[43] W. Doeblin, Remarques sur la théorie métrique des fractions continues, Compositio
Math. 7 (1940), 353-371
[44] W. Duke, Hyperbolic distribution problems and half-integral weight Maass forms,
Invent. Math. 92 (1988), 73-90
[45] P. Ehrenfest, T. Ehrenfest, Begriffliche Grundlagen der statistischen Auffassung
in der Mechanik, in: Encyklopaedie der Mathematischen Wissenschaften, Teubner,
Leipzig 1912, 1-90
[46] M. Einsiedler, A. Katok, E. Lindenstrauss, Invariant measures and the set of
exceptions to Littlewood’s conjecture, Ann. of Math. 164 (2005), 513-560
[47] M. Einsiedler, T. Ward, Ergodic Theory: with a view towards Number Theory,
Springer 2010
[48] P.D.T.A. Elliott, The Riemann zeta function and coin tossing, J. reine angew.
Math. 254 (1972), 100-109
[49] J. Elstrodt, Maß- und Integrationstheorie, Springer 2007, 8.Auflage
[50] P. Erdös, P. Turán, On some integer sequences, J. London Math. Society 11 (1936),
261-264
[51] D. Evans, D. Searls, The fluctuation theorem, Advances in Physics 51 (2002),
1529-1585
[52] C. Faivre, Distribution of Lévy constants for quadratic numbers, Acta Arith. 61
(1992), 13-34
[53] R. Feynman, R. Leighton, M. Sands, The Feynman Lectures on Physics, three
volumes, Caltech 1964
[54] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi
on arithmetic progressions, J. d’Analyse Math. 71 (1977), 204-256
[55] H. Furstenberg & B. Weiss, Topological dynamics and combinatorial number
theory, J. d’Analyse Math. 34 (1978), 61-85
[56] É. Ghys, Variations on Poincaré’s recurrence theorem, in: The Scientific Legacy of
Poincaré, É. Charpentier et al. (eds.), AMS, Providence 2010
[57] B.J. Green, T.C. Tao, The Primes contain arbitrarily long arithmetic progressions,
Ann. Math. 167 (2008), 481-547
[58] P.R. Halmos, Lectures on Ergodic Theory, Math. Soc. of Japan, Tokyo 1956
[59] G.H. Hardy, Divergent Series, Clarendon Press, Oxford 1949
[60] G. Harman, Metric Number Theory, Clarendon Press, Oxford 1998
[61] G.H. Hardy, E.M. Wright, An introduction to the theory of numbers, Clarendon
Press, Oxford, 1979, 5th ed.
[62] H. Heilbronn, On the average length of a class of finite continued fractions, in
Number Theory and Analysis (Papers in Honor of Edmund Landau), Plenum, New
York 1969, 87-96
[63] F. Hidetoshi, T. Rothman, Sacred Mathematics, Japanese Temple Geometry,
Princeton University Press 2008
[64] E. Hlawka, Über die Gleichverteilung gewisser Folgen, welche mit den Nullstellen
der Zetafunktion zusammenhängen, Österr. Akad. Wiss., Math.-Naturw. Kl. Abt. II
184 (1975), 459-471
[65] E. Hlawka, Theorie der Gleichverteilung, BIB, Mannheim, 1979
[66] E. Hlawka, C. Binder, Über die Entwicklung der Theorie der Gleichverteilung in
den Jahren 1909 bis 1916, Arch. Histor. Exact Sciences 36 (1986), 197-249
[67] E. Hopf, On the time average theorem in dynamics, Proc. Nat. Acad. Sciences 18
(1932), 93-100
[68] W. Hurewicz, Ergodic theorem without invariant measure, Ann. Math. 45 (1944),
192-206
[69] A. Hurwitz, R. Courant, Funktionentheorie, Springer, 4. Auflage 1964
[70] M. Iosifescu, Doeblin and the metric theory of continued fractions: a functional-theoretic solution to Gauss’ 1812 problem, in: ’Doeblin and modern probability’, AMS,
Providence 1993, 97-110
[71] M. Iosifescu, C. Kraaikamp, Metrical theory of Continued Fractions, Kluwer 2002
[72] K. Jacobs, Selecta Mathematica IV, Springer 1972
[73] P. Jolissaint, Loi de Benford, relations de récurrence et suites équidistribuées, Elem.
Math. 60 (2005), 10-18
[74] M. Kac, On the notion of recurrence in discrete stochastic processes, Bull. Amer.
Math. Soc. 53 (1947), 1002-1010
[75] S. Kakutani, Induced measure preserving transformations, Proc. Imp. Acad. Tokyo
19 (1943), 635-641
[76] S. Kakutani, Examples of ergodic measure preserving transformations which are
weakly mixing but not strongly mixing, in “Recent advances in topological dynamics”,
Proceedings Conference Yale University in honour of G.A. Hedlund, Lecture Notes
Math. 318, Springer 1973, 143-149
[77] T. Kamae & M. Keane, A simple proof of the ratio ergodic theorem, Osaka J.
Math. 34 (1997), 653-657
[78] Y. Kanada, D. Takahashi, Calculation of π to 51.5 billion decimal digits on distributed memory parallel processors, Trans. Inform. Process. Soc. Japan 39 (1998),
2074-2083
[79] M. Kesseböhmer, B.O. Stratmann, A multifractal analysis for Stern-Brocot intervals, continued fractions and Diophantine growth rates, J. reine angew. Math. 605
(2007), 133-163
[80] A.Yu. Khintchine, Zu Birkhoffs Lösung des Ergodenproblems, Math. Ann. 107
(1933), 485-488
[81] A.Yu. Khintchine, Metrische Kettenbruchprobleme, Compositio Math. 1 (1935),
361-382
[82] A.Yu. Khintchine, Three pearls of number theory, Graylock Press, Baltimore 1952
[83] A. Klenke, Wahrscheinlichkeitstheorie, Springer 2006
[84] K. Knopp, Mengentheoretische Behandlung einiger Probleme der diophantischen Approximationen und der transfiniten Wahrscheinlichkeiten, Math. Ann. 95 (1926), 409-426
[85] D. König, A. Szücs, Mouvement d’un point abandonné à l’intérieur d’un cube,
Palermo Rend. 36 (1913), 79-90 (in Hungarian)
[86] U. Kohlenbach, L. Leuştean, A quantitative mean ergodic theorem for uniformly
convex Banach spaces, Ergodic Theory Dyn. Syst. 29 (2009), 1907-1915; erratum ibid.
29 (2009), 1995
[87] J.F. Koksma, Ein mengentheoretischer Satz über die Gleichverteilung modulo 1,
Compositio Math. 2 (1935), 250-258
[88] A.N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer 1933
[89] A.N. Kolmogorov, S.V. Fomin, Measure, Lebesgue Integrals, and Hilbert Space,
Academic Press, New York and London 1961
[90] A.V. Kontorovich, S.J. Miller, Benford’s law, values of L-functions and the 3x+1
Problem, Acta Arith. 120 (2005), 269-297
[91] U. Krengel, Ergodic theorems, de Gruyter 1985 (with a supplement by A. Brunel)
[92] R.O. Kuzmin, Sur un problème de Gauss, Atti Congr. Intern. Bologne 6 (1928), 83-89
[93] J.C. Lagarias, The ’3X + 1’ Problem and its generalizations, Amer. Math. Mon. 92
(1985), 3-23
[94] E. Landau, Über die Nullstellen der Zetafunktion, Math. Ann. 71 (1912), 548-564
[95] A. Laurinčikas, Limit theorems for the Riemann zeta-function, Kluwer Academic
Publishers, Dordrecht 1996
[96] P. Lévy, Sur les lois de probabilité dont dépendent les quotients complets et incomplets d’une fraction continue, Bull. Soc. Math. France 57 (1929), 178-194
[97] M. Lifshits, M. Weber, Sampling the Lindelöf hypothesis with the Cauchy random
walk, Proc. London Math. Soc. 98 (2009), 241-270
[98] Yu. V. Linnik, Ergodic properties of algebraic fields, Springer 1968
[99] J.E. Littlewood, On the zeros of the Riemann zeta-function, Proc. Cambridge Phil.
Soc. 22 (1924), 295-318
[100] M.H. Martin, Metrically transitive point transformations, Bull. Amer. Math. Soc.
40 (1934), 606-612
[101] K. Matsumoto, Probabilistic value-distribution theory of zeta-functions, Sugaku
53 (2001), 279-296 (in Japanese); English translation in Sugaku Expositions 17 (2004),
51-71
[102] K.R. Matthews, A.M. Watts, A generalization of Hasse’s generalization of the
Syracuse algorithm, Acta Arith. 43 (1984), 167-175
[103] C. Mauduit, J. Rivat, Sur un problème de Gelfond: la somme des chiffres des
nombres premiers, Ann. of Math. 171 (2010), 1591-1646
[104] W. Narkiewicz, The development of prime number theory, Springer 2000
[105] J. von Neumann, Proof of the quasi-ergodic hypothesis, Proc. Nat. Acad. Sci. USA
18 (1932), 70-82
[106] L. Kuipers, H. Niederreiter, Uniform distribution of sequences, John Wiley &
Sons, New York 1974
[107] I. Niven, Irrational numbers, Carus Mathematical Monographs, John Wiley & Sons
1963
[108] K. Petersen, Ergodic theory, Cambridge University Press 1989, corrected reprint
[109] W. Philipp, Mixing sequences of random variables and probabilistic number theory,
Memoirs Amer. Math. Soc. 114, 1971
[110] W. Philipp, O.P. Stackelberg, Zwei Gesetze für Kettenbrüche, Math. Ann. 181
(1969), 152-156
[111] Ch. Pisot, R. Salem, Distribution modulo 1 of the powers of real numbers larger
than 1, Comp. Math. 16 (1964), 164-168
[112] H. Poincaré, Sur le problème des trois corps et les équations de la dynamique,
Acta Math. 13 (1890), 1-270
[113] H. Poincaré, Les méthodes nouvelles de la mécanique céleste, Gauthier-Villars et Fils, Paris 1892-1899
[114] M. Pollicott, M. Yuri, Dynamical Systems and Ergodic Theory, London Mathematical Society Student Texts 40, Cambridge University Press, 1998 (see footnote k)
[115] H.A. Rademacher, Fourier Analysis in Number Theory, Symposium on Harmonic
Analysis and Related Integral Transforms (Cornell Univ., Ithaca, N.Y., 1956) in: Collected Papers of Hans Rademacher, Vol. II, pp. 434–458, Massachusetts Inst. Tech.,
Cambridge, Mass., 1974
[116] G.J. Rieger, Mischung und Ergodizität bei Kettenbrüchen nach nächsten Ganzen,
J. reine angew. Math. 310 (1979), 171-181
[117] G.J. Rieger, Effective simultaneous approximation of complex numbers by conjugate algebraic integers, Acta Arith. 63 (1993), 325-334
[118] B. Riemann, Über die Anzahl der Primzahlen unterhalb einer gegebenen Grösse,
Monatsber. Preuss. Akad. Wiss. Berlin (1859), 671-680
[119] A.M. Rockett, P. Szüsz, Continued fractions, World Scientific 1992
[120] K.F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 104-109
[121] W. Rudin, Real and Complex Analysis, McGraw-Hill 1974, 2nd ed.
[122] Ya. Sinai, The central limit theorem for geodesic flows on manifolds of constant
negative curvature, Dokl. Akad. Nauk 133 (1960), 1303-1306; translation in Soviet
Math. Dokl. 1 (1960), 983-987
[123] W. Schmidt, On normal numbers, Pacific J. Math. 10 (1960), 661-672
[124] F. Schweiger, Multidimensional continued fractions, Oxford 2000
[125] G. Shafer, V. Vovk, The origins and legacy of Kolmogorov’s Grundbegriffe,
available at www.probabilityandfinance.com/article/04.pdf
[126] C.E. Shannon, A mathematical theory of communication, Bell System Technical
J. 27 (1948), 379-423, 623-656
[127] W. Sierpinski, Démonstration élémentaire d’un théorème de M. Borel sur les nombres absolument normaux et détermination effective d’un tel nombre, Bull. Soc. Math.
France 45 (1917), 125-144
[128] J. Steuding, Diophantine Analysis, Chapman & Hall/CRC Press, Boca Raton 2005
[129] J. Steuding, Value distribution of L-functions, Lecture Notes in Mathematics 1877,
Springer 2007
[130] J. Steuding, Sampling the Lindelöf hypothesis by an ergodic transformation,
preprint 2010
k) A corrected version of [114] is available online at www.warwick.ac.uk/~masdbl/book.html
[131] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 199-224
[132] S. Tabachnikov, Geometry and billiards, Amer. Math. Soc., Providence 2005
[133] T.C. Tao, A quantitative ergodic theory proof of Szemerédi’s theorem, Electronic
J. Combinatorics 13 (2006), R99
[134] T.C. Tao, Norm convergence of multiple ergodic averages for commuting transformations, Ergod. Th. & Dynam. Sys. 28 (2008), 657-688
[135] T.C. Tao, The ergodic and combinatorial approaches to Szemerédi’s theorem,
preprint available at http://uk.arxiv.org/pdf/math.CO/0604456.pdf
[136] R. Taylor, Automorphy for some l-adic lifts of automorphic mod l representations.
II, Publ. Math. Inst. Hautes Études Sci. 108 (2008), 183-239
[137] E.C. Titchmarsh, The theory of the Riemann zeta-function, Oxford University
Press 1986, 2nd ed., revised by D.R. Heath-Brown
[138] A.M. Turing, A note on normal numbers, Collected Works of A.M. Turing, J.L.
Britton (Ed.), North Holland, Amsterdam 1992, 117-119
[139] A. Ustinov, On the statistical properties of finite continued fractions, Zap. Nauchn.
Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. 322 (2005), 186–211, 255 (Russian);
English translation in J. Math. Sci. 137 (2006), 4722-4738
[140] A. Ustinov, On the Gauss-Kuz'min statistics for finite continued fractions, Fundam. Prikl. Mat. 11 (2005), 195–208 (Russian); English translation in J. Math. Sci.
146 (2007), no. 2, 5771-5781
[141] B.L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw Arch. Wisk.
15 (1928), 212-216
[142] B.L. van der Waerden, Wie der Beweis der Vermutung von Baudet gefunden
wurde, Elem. Math. 9 (1954), 49-56; reprinted in Elem. Math. 53 (1998), 139-148
[143] W.A. Veech, Teichmüller curves in moduli space, Eisenstein series and an application to triangular billiards, Invent. Math. 97 (1989), 553-583; erratum: Invent. Math.
103 (1991), 447
[144] I.M. Vinogradov, Darstellung einer ungeraden Zahl als Summe von drei
Primzahlen, Doklady Akad. Nauk SSSR 15 (1937), 291-294 (in Russian)
[145] J. von Plato, Oresme’s proof of the density of rotations of a circle through an
irrational angle, Hist. Math. 20 (1993), 428-433
[146] S.M. Voronin, Theorem on the ’universality’ of the Riemann zeta-function, Izv.
Akad. Nauk SSSR, Ser. Matem., 39 (1975), 475-486 (Russian); Math. USSR Izv. 9
(1975), 443-445
[147] S. Wagon, The Banach-Tarski paradox, Cambridge University Press 1985.
[148] P. Walters, Ergodic Theory - Introductory lectures, Lecture Notes in Mathematics
458, Springer 1975
[149] H. Weyl, Sur une application de la théorie des nombres à la mécanique statistique
et la théorie des perturbations, L’Enseign. math 16 (1914), 455-467
[150] H. Weyl, Über die Gleichverteilung von Zahlen mod. Eins, Math. Ann. 77 (1916),
313-352
[151] N. Wiener, A. Wintner, Harmonic analysis and ergodic theory, Amer. J. Math.
63 (1941), 415-426
[152] E. Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences,
Comm. Pure Applied Math., 13 (1960), 1-14
[153] G. Wirsching, The dynamical system generated by the 3X + 1 function, Lecture
Notes in Mathematics 1681, Springer 1998
[154] E. Wirsing, On the theorem of Gauss-Kusmin-Lévy and a Frobenius-type theorem
for function spaces, Acta Arith. 24 (1973/74), 507-528
[155] M. Wolf, Two arguments that the nontrivial zeros of the Riemann zeta function
are irrational, preprint available at arXiv:1002.4171v1
[156] J.D. Zund, George David Birkhoff and John von Neumann: A Question of Priority
and the Ergodic Theorems, 1931-1932, Historia Math. 29 (2002), 138-156
Index
arithmetic progression 102
Arnold’s cat map 28, 52
baker’s transformation 26
BBP-formula 56
Benford’s law 5, 14
billiards 5, 8
Birkhoff’s ergodic theorem 39, 48, 54, 72, 94, 98
Boltzmann’s brain 112
Borel set 18
Calkin-Wilf iteration 85
circle group 5
circle rotation 25, 33
cocktail 35
continued fraction 79
continued fraction algorithm 81
convergents 80
Denjoy’s heuristics 69
dense 13
Dirichlet’s approximation theorem 6, 83
doubling-map 25, 34
dynamical system 24
ergodic 31, 44
ergodicity hypothesis 37, 112
Euclidean algorithm 77, 86
Felix 28, 52, 122
Fibonacci numbers 15, 83
Gauss measure 89
Gelfand’s problem 5, 13, 25
Kac’s lemma 51
Khintchine’s theorem 93
Khintchine constant 93, 95
Kronecker’s approximation theorem 8
Laplace’s demon 109
law of best approximations 84
Lebesgue measure 21
Lebesgue’s theorems 22, 23
Lévy’s theorem 96
Lindelöf hypothesis 73
Littlewood conjecture 99
measure 18
measure preserving 24
mixing 34, 35
Murphy’s law 8
Newton iteration 30, 72
normal number 53
orbit 24
π 55, 82, 84
pigeonhole principle 7, 48
Poincaré’s recurrence theorem 47, 49
prime number theorem 68
probability measure 19
recurrence 47, 49, 104
Riemann hypothesis 65, 69, 74
Riemann zeta-function 61, 96
theorem of Gauss-Kuzmin-Lévy 88
thermodynamics 49
trajectory 24
uniform distribution modulo one 9
van der Waerden’s theorem 104
von Neumann’s ergodic theorem 37
Voronin’s universality theorem 75
Weyl’s theorems 10, 11
Young’s decomposition 21
zeta zeros 65, 70