LMS-CMI RESEARCH SCHOOL, LOUGHBOROUGH, 13-17 APRIL 2015 NOTES CARLANGELO LIVERANI Abstract. This are informal notes for a five hours mini course. Each one hour lecture contains a selection of problems suitable for at least a 45 minute problem session. These notes are not original as they are largely based on the paper [4] of which they reproduced a lot of material almost verbatim. They are intended for the exclusive use of the students of the Loughborough LMS-CMI 2015 Research School Statistical Properties of Dynamical Systems. Contents 1. Lecture One: the transfer operator 1.1. Introduction 1.2. Invariant measures 1.3. Expanding maps of the circle 2. Lecture Two: standard pairs 2.1. Standard Pairs (expanding maps) 2.2. Coupling 2.3. Expanding maps revisited 3. Lecture three: Fast-Slow partially hyperbolic systems 3.1. The unperturbed system: ε = 0 3.2. Trajectories and path space 3.3. What we would like to prove 3.4. Standard Pairs (partially hyperbolic systems) 3.5. Conditioning 4. Lecture Four: Averaging (the Law of Large Numbers) 4.1. Differentiating with respect to time 4.2. The Martingale Problem at work 4.3. A recap of what we have done so far 5. Lecture five: A brief survey of further results 5.1. Fluctuations around the limit 5.2. Local Limit Theorem 5.3. Wentzel-Friedlin processes 5.4. Statistical propeties of Fε 6. Supplementary material: Proof of Theorem 5.1 6.1. Tightness 6.2. Differentiating with respect to time (poor man’s It¯ o’s formula) 6.3. Uniqueness of the Martingale Problem Appendix A. Geometry Appendix B. Shadowing Appendix C. Martingales, operators and It¯ o’s calculus Appendix D. Solutions to the Problems References Date: April 16, 2015. 1 2 2 3 5 6 6 7 8 10 10 11 11 12 15 17 17 19 20 20 20 21 21 22 22 23 26 30 32 33 34 35 40 2 CARLANGELO LIVERANI 1. Lecture One: the transfer operator 1.1. Introduction. 1.1.1. Dynamical Systems (few general facts). The concept of Dynamical System is a very general one and it appears in many branches of mathematics from discrete mathematics, number theory, probability, geometry and analysis and has wide applications in physics, chemistry, biology, economy and social sciences. Probably the most general formulation of such a concept is the action of a group over an algebra. Given a group G and a vector space V, over a filed K, the (left)action of G on V is simply a map f : G × V → V such that (1) f (gh, a) = f (g, f (h, a)) for each g, h ∈ G and a ∈ V; (2) f (e, a) = a for every a ∈ V, where e is the identity element of G; (3) f (g, αa + βb) = αf (g, a) + βf (g, b) for each g ∈ G and a, b ∈ V, α, β ∈ K; In our discussion we will be mainly motivated by physics. In fact, we will consider mainly the cases G = N as the simplest representative of the most common physics situations in which G ∈ {N, Z, R+ , R}1 is interpreted as time and V is given by the space of functions over some set X, interpreted as the observables of the system.2 In addition, we will restrict ourselves to situations where the action over the algebra is induced by an action over the set X (this is a map f : G × X → X that satisfies condition 1, 2 above).3 Indeed, given an action f of G on X and the vector space V, over the filed C, of function on X one can define f˜(g, a)(x) := a(f (g, x)) for all g ∈ G, a ∈ V and x ∈ X. It is then easy to verify that f˜ satisfies conditions 1-3 above. The Dynamical System for which G ∈ {N, Z} are said to be in discrete time while the Dynamical Systems in which G ∈ {R+ , R} are in continuos time. Note that, in the first case, f (n, x) = f (n − 1 + 1, x) = f (1, f (n − 1, x)), thus defining T : X → X as T (x) = f (1, x), holds f (n, x) = T n (x).4 Thus in such a case we can (and will) specify the Dynamical System by writing only (X, T ). In the case of continuos Dynamical Systems we will write φt (x) := f (t, x) and call φt a flow (if the group is R) or a semi-flow (if the group is R+ ) and will specify the Dynamical System by writing (X, φt ). In fact, in this notes we will be interested only in Dynamical Systems with more structure i.e. topological, measurable or smooth Dynamical Systems. By topological Dynamical Systems we mean a triplet (X, T , T ), where T is a topology and T is continuos (if B ∈ T , then T −1 B ∈ T ). By smooth we consider the case in which X has a differentiable structure and T is r-times differentiable for some r ∈ N. Finally, a measurable Dynamical Systems is a quadruple (X, Σ, T, µ) where Σ is a σ-algebra, T is measurable (if B ∈ Σ, then T −1 B ∈ Σ) and µ is an invariant measure (for all B ∈ Σ, µ(T −1 B) = µ(B)).5 1Although even in physics other possibilities are very relevant, e.g. in the case of Statistical Mechanics it is natural to consider the action of the space translations, i.e. the groups {Zd , Rd } for some d ∈ N, d > 1. 2Again other possibilities are relevant, e.g. the case of Quantum Mechanics (in the so called Heisenberg picture) where the vector space is given by the non-abelian algebra of the observables and consists of the bounded operators over some Hilbert space. 3Again relevant cases are not included, for example all Markov Process where the evolution is given by the action of some semigroup. 4Obviously T 2 (x) = T ◦ T (x) = T (T (x)), T 3 (x) = T ◦ T ◦ T (x) = T (T (T (x))) and so on. 5The definitions for continuos Dynamical Systems are the same with {φ } taking the place of t T. NOTES 3 1.1.2. Strong dependence on initial conditions. One of the great discoveries of the last century, starting with the seminal work of Poincar`e, is that it may be essentially impossible to predict the long term behaviour of a dynamical system. This might seem strange as such a behaviour is, in principle, exactly computable. To understand better what is going one let us consider one of the simplest possible √ cases X = T and T (x) = 2x mod 1. Let us say that you start from x = 1/ 2 and 106 you want to compute TP x. If you do it on a computer, then it is better to write ∞ x in binary digits: x = n=1 an 2−n . Then Tx = ∞ X −n−1 an 2 mod 1 = n=1 ∞ X an+1 2−n . n=2 Hence 6 T 10 x = ∞ X an+106 2−n . n=106 +1 6 This means that if T 10 x ∈ [0, 1/2] or not depends on the 106 + 1 digit of x. This of course can be computed, but it is not super easy. And this is a really simple example. A slightly more sophisticated exampled is the so called Arnold cat map (which is the simplest possible example of a, so called, hyperbolic system: let F : T2 → T2 be defined by x 1 1 x F (x, y) = A mod 1 := mod 1. y 1 2 y Problem 1.1. Compute the eigenvalues of A a verify that they are of the form λ, λ−1 with λ > 1. Deduce that, for almost all small vectors h ∈ R2 , distance between the orbits of (x, y) and (x, y) + h grows exponentially both in the past and in the future. 1.2. Invariant measures. The above seems to cast a shadow of despair on the possibility to use the theory of dynamical systems to make long term predictions. In such a bleak scenario another fundamental fact was discovered in the second half of the last century (starting with the work of Kolmogorov and other prominent members of the so called Russian School): even though it may be impossible to make predictions for a single orbit, some times it is possible to make statistical predictions. That is, instead of looking at the evolution of points, look at the evolution of measures: let M1 (X) be the set of Borel probability measures on X and consider the map T∗ : M1 (x) → M1 (X) defined by T∗ µ(A) = µ(T −1 (A)) for each µ ∈ M1 (X) and measurable set A. Note that (M1 (X), T∗ ) is itself a topological dynamical systems.6 Moreover the fixed points of (M1 (X), T∗ ), i.e. the probability measures such that T∗ µ = µ, correspond to the invariant measures of (X, T ). Problem 1.2. Show that, for all f ∈ C 0 , and measure µ on X, T∗ µ(f ) = µ(f ◦ T ). 6 The natural topology on M (X) being the weak topology. 1 4 CARLANGELO LIVERANI Problem* 1.3 (Krylov–Bogolyubov theorem). 7 Let (X, T ) be a topological dynamical system, with X compact. Let Σ be the Borel σ-algebra. Show that there exists at least a measure µ such that (X, Σ, T, µ) is a measurable dynamical system.8 Remark 1.1. The idea of studying the dynamical system (M1 (X), T∗ ) instead of (X, T ) may seem totally crazy. Indeed, first of all if δx ∈ M1 (X) is defined as δx (f ) = f (x), then the reader can easily check that T∗ δx = δT x , hence (X, T ) can be seen as a minuscule part of (M1 (X), T∗ ). Second (M1 (X), T∗ ) is infinite dimensional even if (X, T ) is one dimensional. The only redeeming property is that T∗ is a linear operator on the space of measures. We will see that this is has deep implications. Due to the above considerations it is not surprising that the study of (M1 (X), T∗ ) might be profitable if one restrict oneself to a space smaller than M1 (X). This can be done in many ways and the actual choice will depend on the questions one is interested in. In this note we will consider the simplest possible case: we are interested in measures that have some connection with Lebesgue. What this means in general it is not so clear, yet we can easily explain it in a very simple class of examples. 1.2.1. Few basic definitions. Let us remind to the reader few basic definitions that will be used during these notes. For many more details see [11]. Definition 1. A topological dynamical system (X, T ) is called transitive if it has a dense orbit.9 Definition 2. A measurable dynamical system (X, Σ, T, µ), µ ∈ M1 , is called ergodic if µ is the only invariant measure in the set M(µ) = {ν ∈ M1 : ν u}. Definition 3. A measurable dynamical system (X, Σ, T, µ), µ ∈ M1 , is called mixing if for each ν ∈ M(µ) one has limn→∞ T∗n ν = µ, where the convergence is in the strong topology. The last thing we need is a quantitative form of mixing (which often goes under the name of decay of correlation. This can be done in several different ways. We will adopt the following. Definition 4. Consider a measurable dynamical system (X, Σ, T, µ), µ ∈ M1 , and sets of functions D+ , D− ⊂ L1 (X, µ) such that for each f ∈ D− and g ∈ D+ we have gf ∈ L1 (X, µ); if f ∈ D+ , then f ◦ T ∈ D+ and if dν = f dµ with f ∈ D− , then d(T∗ ν) = f¯dµ with f¯ ∈ D− . Then we say that for such sets of function we have decay of correlation with speed {cn } if for each f ∈ D+ , g ∈ D− there exists a constant C > 0 such that, for all n ∈ N we have Z Z Z gf ◦ T n dµ − ≤ Ccn . gdµ · f dµ X X X 7Starred problems are harder, If you do not have any idea in 15 minutes, skyp to the next one. 8Hint: choose any initial probability measure µ and let the dynamics work for you by consid- 0 1 P−1 k ering the sequence µu = n k=0 T∗ µ0 . 9 Actually, the usual definition is: for every pair of non-empty open U, V ⊂ X, there exists n ∈ N such that T n (U ) ∩ V 6= ∅. This is not equivalent to our definitions in general, but it is equivalent when X has no isolated points and is separable and second category (see http://www.scholarpedia.org/article/Topological transitivity for more infos). NOTES 5 1.3. Expanding maps of the circle. An expanding map of the circle is a map T ∈ C 2 (T, T) such that T 0 (x) ≥ λ > 1 for all x ∈ T. In this case we will be interested in measures absolutely continuous with respect to Lebesgue. A first remarkable fact is that such a class of measure is invariant under the action of T∗ . Problem 1.4. Show that if dµ = h(x)dx, then d(T∗ µ) = (Lh)(x)dx where X h(y) . (Lh)(x) = 0 (y) T −1 y∈T (x) The operator L describes the dynamics on the densities and is commonly called Ruelle operator or Ruelle-Perron-Frobenius operator. It is naturally defined on L1 but it preserves also smaller spaces. 0 1 Problem 1.5. Let W 1,1 be the Sobolev space R R 0of the functions0 such that f, f ∈ L equipped with the norm kf kW 1,1 = |f | + |f | = kf kL1 + kf kL1 . Prove that, for all h ∈ W 1,1 , kLhkL1 ≤ khkL1 k d n (L h)kL1 ≤ λ−n kh0 kL1 + DkhkL1 dx 00 |T (x)| where D = supx |T 0 (x)|2 . Prove that, there exist C > 0, such that for all n ∈ N and h ∈ C 1 , kLn hkC 0 ≤ CkhkC 0 kLn hkC 1 ≤ Cλ−n khkC 1 + CkhkC 0 . Prove that for each σ ∈ (λ−1 , 1) and a ≥ D(σ − λ−1 )−1 , setting f (x) Da = f ∈ C 1 (T, R≥ ) : ≤ ea|x−y| , f (y) holds true L(Da ) ⊂ Dσa . The above first two sets of inequalities are commonly called Lasota-Yorke or Doeblin-Fortet (in L1 , W 1,1 or C 0 , C 1 , respectively). The last one is a different manifestation of the same phenomena: the dynamics tends to make densities smoother, up to a point. From the above Lasota-Yorke inequalities and an abstract functional analytic theorem it follows that the essential spectrum of L is contained in the disk of radius λ. We will not develop such a theory (see [1] for details) but let us consider the simple case in which D < 1 (that is the map is not so far from being linear). Lemma 1.2. If D < 1, then T has a unique invariant measure µ∗ absolutely continuous with respect to Lebesgue. In addition, every measure with density in W 1,1 converges to it exponentially fast. R R R Proof. First of all note that if h = 0, then Lh = h = 0, thus V = {h ∈ W 1,1 : R h = 0} is an invariant space for L. Also note that khk∗ = kh0 kL1 is a norm on V. Moreover, h ∈ V,10 then there must exists a ∈ T such that h(a) = 0 and then R x if 0 |h(x)| = | a h | ≤ khk∗ . Thus, for each h ∈ V, kLn hk∗ ≤ λ−n khk∗ + DkhkL1 ≤ (λ−n + D)khk∗ . 10 Recall that W 1,1 ⊂ C 0 . 6 CARLANGELO LIVERANI By hypotheses we can choose n∗ so that λ−n∗ + D = σ < 1 which shows that the spectral radius of L on V is smaller that σ 1/n∗ < 1. On the other hand, setting hn = n−1 1X k L 1 n k=0 it follows khn kW 1,1 ≤ D + 1, hence {hn } is a compact sequence in L1 ,11 This means that there exists a subsequence hnj converging in L1 to some h∗ ∈ W 1,1 . Moreover R Lh∗ = h∗ , h∗ ≥ 0 and h∗ = 1. We have thus an invariant probability measure. On the other hand for each h ∈ W 1,1 there exists a ∈ R such that h − ah∗ ∈ V, thus kLn h − ah∗ kW 1,1 = kLn (h − ah∗ )kW 1,1 ≤ σ n kh − ah∗ kW 1,1 , which implies the Lemma. We can now make clear what we mean by statistical predictions. Problem 1.6. Let T (x) = 3x + 10−2 sin(2πx) mod 1. Consider a situation in which x ∈ [0, 10−3 ] with uniform probability. Give an estimate of the probability that T 30 x ∈ [1/4, 1/2]. Problem 1.7. Show that Lemma 1.2 implies that an invariant measure with density in W 1,1 is mixing. 2. Lecture Two: standard pairs 2.1. Standard Pairs (expanding maps). Let us revisit what we have learned about the smooth expanding maps of the circle using a different technique: standard pairs. This tool is less powerful than the transfer operator one, but much more flexible, it will then be instrumental in the study of less trivial systems. We start by presenting it in a very simplified manner, such a simplification is possible only because we treat very simple systems: smooth expanding maps. A standard pair is a|x−y| a couple ` = (I, ρ) where I = [α, β] ⊂ T and ρ ∈ C 1 (I, R≥ ) such that ρ(x) , ρ(y) ≤ e R for all x, y ∈ I, and I ρ = 1. We fix some δ < 1/2 and call the set of all possible such objects, such that δ ≤ |I| ≤ 2δ, La . To a standard pair is uniquely associated the probability measure. Z µ` (ϕ) = ϕ(x)ρ(x)dx. I Remark 2.1. In this particular case we could had considered only the case I = T, but this would not have illustrated the flexibility of the method nor prepared us for future developments. R (k+1)/m 1 11 Indeed, if, for each m ∈ N, we define Π h = Pm−1 1 h, then m [k/m,(k+1)/m] m k/m k=0 kΠm h − hkL1 ≤ m−1 X Z (k+1)/m k=0 k/m Z (k+1)/m m−1 X 1 Z (k+1)/m 1 1 h ≤ |h0 | ≤ khkW 1,1 . h − m k/m m m k/m k=0 Thus the set AC = {khkW 1,1 ≤ C} can be uniformly approximated by the finite dimensional set Πm AC , hence the claim. NOTES 7 Lemma 2.2. There exists a0 > 0 such that, for all a ≥ a0 and ` ∈ La there exists N ∈ N and {`i }N i=1 ⊂ La such that T∗ µ` = N X pi µ`i , i=1 where P i pi = 1. Proof. Note that, if we choose 2δ small enough, then T is invertible on each interval I of length smaller than I. Hence, calling φ the inverse of T |I . Z T∗ µ` (ϕ) = ρ ◦ φ · |φ0 | · ϕ TI Note that by hypothesis T I is longer than λ times I. If it is longer than 2δ, then we can divide it in sub-interval of length between δ and 2δ. Let {Ii } the collection R 0 of such a partition of I. Also, letting pi = Ii ρ ◦ φ · |φ0 | and ρi = p−1 i ρ ◦ φ · |φ |, we have X Z T∗ µ` (ϕ) = pi ρi · ϕ. Ii i R Note that ρi = 1 by construction and that ρi ∈ Da follows by the same computations done in Problem 1.5. We have thus seen that the convex combinations of standard pairs (called standard families) are invariant under the dynamics. This is a different way to restrict the action of the transfer operator on a suitable class of measures. In fact, it is not so different from the previous one as (finite) standard families yield measure absolutely continuos with respect to Lebesgue and with densities that are piecewise C1. 2.2. Coupling. Given two measures µ, ν on two probability spaces X, Y , respectively, a coupling of µ, ν is a probability measure α on X × Y such that, for all f ∈ C 0 (X, R) and g ∈ C 0 (Y, R) we have Z f (x)α(dx, dy) = µ(f ) X×Y Z g(y)α(dx, dy) = ν(g). X×Y That is, the marginals of α are exactly µ and ν. Problem* 2.1. Let X be a compact Polish space,12 let d be the distance and consider the Borel σ-algebra. For each couple of probability measures µ, ν let G(µ, ν) be the set of coupling of µ and ν. (1) Show that G(µ, ν) 6= ∅. (2) Show that Z d(µ, ν) = inf d(x, y)α(dx, dy) α∈G(µ,ν) X2 is a distance in the space of measures. (3) Show that the topology induced by d on the set of probability measures is the weak topology. 12 That is a complete, separable, metric space. 8 CARLANGELO LIVERANI (4) Discuss the cases X = [0, 1], d(x, y) = |x − y| and X = [0, 1], d0 (x, y) = 0 iff x = y and d0 (x, y) = 1 otherwise. 2.3. Expanding maps revisited. In this section we provide an alternative approach to exponential mixing for expanding maps based on the above mentioned ideas. Let us consider any two standard pairs `, `˜ ∈ La . First of all note that there exists n0 ∈ N such that, for each I of length δ, T n0 I = T1 . We then consider the standard pairs {`i = (Ii , ρi )} and {`˜i = (I˜i , ρ˜i )} in which, according to Lemma 2.2, we can decompose the measures T∗n0 µ` and T∗n0 µ`˜, respectively. Chose any of ˜ let us call them the intervals {Ii }, say I0 . There are, at most, three intervals I, I˜0 , I˜1 , I˜2 , whose union covers I0 . Then there must exists an interval I˜i , say I˜0 , such that J˜ = I˜0 ∩ I0 is an interval of length at least δ/3. Let J be the central third ˜ We can then subdivide I0 and I˜0 , I˜1 , I˜2 in subintervals of size at least δ/9 so of J. that J appears in both subdivisions. Arguing like in Lemma 2.2 we can write Z X Z n T∗ µ` (ϕ) = pJ ρJ ϕ + pi ρi ϕ Ii i T∗n µ`˜(ϕ) = p˜J Z ρ˜J ϕ + X Z p˜i i I˜i ρ˜i ϕ. Note hoverer that the above decomposition it is not in standard pairs as the interval may be too short (but always longer than δ/9). Note that there exist a fixed constant c0 > 0 such that min{pJ , p˜J } ≥ 4c0 . In addition, by definition inf{ρJ , ρ˜J } ≥ e−aδ ≥ 1/2, provided δ has been chosen small enough. Accordingly, 1 , we can write setting ρ¯ = |J| Z Z Z X Z T∗n µ` (ϕ) =c0 ρ¯ϕ + (pJ − 2c0 ) ρJ ϕ + c0 (2ρJ − ρ¯)ϕ + pi ρi ϕ J J J i Ii =c0 ν(ϕ) + (1 − c0 )νR (ϕ) Z Z Z X Z n T∗ µ`˜(ϕ) =c0 ρ¯ϕ + (˜ pJ − 2c0 ) ρ˜J ϕ + c0 (2˜ ρJ − ρ¯)ϕ + p˜i ρ˜i ϕ J J J i I˜i =c0 ν(ϕ) + (1 − c0 )˜ νR (ϕ) We can then consider the coupling Z α1 (g) = c0 ρ¯(x)g(x, x) + (1 − c0 )νR × ν˜R (g). Problem 2.2. For each m ∈ N, calling ρ¯n the density of T∗n ν, show that Z (T∗ × T∗ )n α1 (g) = c0 ρ¯n (x)g(x, x) + (1 − c0 )T∗n νR × T∗n ν˜R (g). Problem 2.3. Show that there exists n1 ∈ N, such that both T∗n1 νR and T∗n1 ν˜R admit a decomposition in standard pairs {`1i } and {`˜1i }, respectively. The above implies that, at time n ¯ = n0 + n1 , we can take any two standard pairs `1i and `˜1j , apply to them the same argument and obtain 1,i,j T∗n0 µ`1i = c0 νi,j + (1 − c0 )˜ νR 1,i,j T∗n0 µ`˜1 = c0 νi,j + (1 − c0 )˜ νR i NOTES 9 P P We can thus write T∗n0 νR = i,j pi p˜j T∗n0 µ`1i and T∗n0 ν˜R = i,j pi p˜j T∗n0 µ`˜1 and, j letting ρ¯i,j be the density of the measure νi,j consider the coupling Z X 1,i,j 1,i,j α2 (g) = pi p˜j c0 ρ¯i,j (x)g(x, x) + (1 − c0 )νR × ν˜R (g). Ji,j ij Collecting the above consideration, it follows that there exists a probability density ρ¯1 such that the measures T∗2¯n µ` and T∗2¯n µ`˜ admit a coupling of the form Z 1,i,j 1,i,j n1 n1 T∗ × T∗ α(g) = [c0 + (1 − c0 )c0 ] ρ¯1 (x)g(x, x) + (1 − c0 )2 T∗n1 νR × T∗n1 ν˜R (g). But the above implies, using the discrete distance d0 of Problem 2.1, d(T∗2¯n µ` , T∗2¯n µ`˜) ≤ (1 − c0 )2 . By induction it then follows d(T∗k¯n µ` , T∗k¯n µ`˜) ≤ (1 − c0 )k . PN Remark 2.3. Note that if µ = i=1 pi µ`i is the measure associated to the standard family {pi , `i }N i=1 , then it is absolutely continuous with respect to Lebesgue with density ρ given by X X ρ(x) = pi ρi (x)1Ii (x) ≤ p i ea ≤ ea . i i Remark 2.4. Note that this implies that we can choose D− as the standard families and D+ = L∞ in Definition 4 and obtain, for each µ, ν ∈ D− ,13 |T∗n µ(f ) − T∗n ν(f )| ≤ Ce−cn kf kL∞ . Thus, if we have a measure µ determined by a standard family it follows that {T∗n µ} is a sequence of measures determined by standard families, hence it must be a Cauchy sequence (just apply the previous remark to µ and T∗m µ). If follows that there exists a unique measure ν, with density ρ ∈ L∞ such that, for each standard family measure µ and measurable set A, lim µ(T −n A) = ν(A) = ν(T −1 A). n→∞ Moreover it is easy to see that we can approximate any measure µ that it is absolutely with respect to Lebesgue by a sequence of measures {µk } that arise as standard families. It thus follows that the dynamical systems (T, T, ν) is mixing, i.e., for each measure µ absolutely with respect to Lebesgue (hence with respect to ν ) we have lim µ(T −n A) = ν(A). n→∞ In particular, ν is the unique absolutely continue invariant measure. 13 Indeed, let any G a coupling of T n µ and T n ν, then let g(x, y) = f (x) − f (y) ∗ ∗ |T∗n µ(f ) − T∗n ν(f )| = |G(g)| ≤ G(d · |g|) ≤ kgk∞ G(d) ≤ 2kf k∞ G(d) and the claim follows by taking the infimum on the couplings. 10 CARLANGELO LIVERANI 3. Lecture three: Fast-Slow partially hyperbolic systems Now that we have reviewed some basic results on expanding maps we are ready to tackle less trivial systems. At this point we could take several interesting directions to illustrate in which sense a deterministic system can behave similarly to a stochastic one. I choose to study fast-slow systems in which the fast variable undergoes a strongly chaotic motion. Namely, let M, S be two compact Riemannian manifolds, let X = M × S be the configuration space of our systems and let mLeb be the Riemannian measure on M . For simplicity, we consider only the case in which S = M = T1 . We consider a map F0 ∈ C r (T2 , T2 ), r ≥ 3, defined by F0 (x, θ) = (f (x, θ), θ) where the maps f (·, θ) are uniformly expanding for every θ, i.e.14 ∂x f (x, θ) ≥ λ > 2. If we consider a small perturbation of F0 we note that the perturbation of f still yields a uniformly hyperbolic system, by structural stability. Thus such a perturbation can be subsumed in the original maps. Hence, it suffices to study families of maps of the form Fε (x, θ) = (f (x, θ), θ + εω(x, θ)) with ε ∈ (0, ε0 ), for some ε0 small enough, and ω ∈ C r . Such systems are called fast-slow, since the variable θ, the slow variable, needs a time at least ε−1 to change substantially. Problem 3.1. Show that there exists invariant cone fields for Fε . 3.1. The unperturbed system: ε = 0. The statistical properties of the system are well understood in the case ε = 0. In such a case θ is an invariant of motion, while for every θ the map f (·, θ) has strong statistical properties. We will need such properties in the following discussion which will be predicated on the idea that, for times long but much shorter than ε−1 , on the one hand θ remains almost constant, while, on the other hand, its change depends essentially on the behavior of an ergodic sum with respect to a fixed dynamics f (·, θ). It follows that the dynamical systems (T2 ,RF0 ) has uncountable many SRB measures: all the measures of the form µ(ϕ) = ϕ(x, θ)mθ (dx)ν(dθ) for an arbitrary measure ν. The ergodic measures are the ones in which ν is a point mass. The system is partially hyperbolic and has a central foliation. Indeed, the f (·, θ) are all topologically conjugate by structural stability of expanding maps [11]. Let h(·, θ) be the map conjugating f (·, 0) with f (·, θ), that is h(f (x, 0), θ) = f (h(x, θ), θ). Thus the foliation Wxc = {(h(x, θ), θ)}θ∈S is invariant under F0 and consists of points that stay, more or less, always at the same distance, hence it is a center foliation. Note however that, since in general h is only a H¨older continuous function (see [11]) the foliation is very irregular and, typically, not absolutely continuous. In conclusion, the map F0 has rather poor statistical properties and a not very intuitive description as a partially hyperbolic system. It is then not surprising that its perturbations form a very rich universe to explore and already the study of the behavior of the dynamics for times of order ε−1 (a time long enough so that the 14 In fact, λ > 1 would suffice, we put 2 to simplify a bit later computations. NOTES 11 variable θ has a non trivial evolution, but far too short to investigate the statistical properties of Fε ) is interesting and non trivial. 3.2. Trajectories and path space. To continue, a more detailed discussion concerning the initial conditions is called for. Note that not all measures are reasonable as initial conditions. Just think of the possibility to start with initial conditions given by a point mass, hence killing any trace of randomness. The best one can reasonably do is to fix the slow variable and leave the randomness only in the fast one. Thus we Rwill consider measures µ0 of the following type: for each ϕ ∈ C 0 (T2 , R), µ0 (ϕ) = ϕ(x, θ0 )h(x)dx for some θ0 ∈ S and h ∈ C 1 (M, R+ ). Let µ0 be a probability measure on T2 . Let us define (xn , θn ) = Fεn (x, θ), then (xn , θn ) are random variables 15 if (x0 , θ0 ) are distributed according to µ0 .16 It is natural to define the polygonalization17 (3.1) Θε (t) = θbε−1 tc + (ε−1 t − bε−1 tc)(θbε−1 tc+1 − θbε−1 tc ), 2 t ∈ [0, T ]. 0 Note that Θε is a random variable on T with values in C ([0, T ], S). Also, note the time rescaling done so that one expects non trivial paths. It is often convenient to consider random variables defined directly on the space C 0 ([0, T ], S) rather than T2 . Let us discuss the set up from such a point of view. The space C 0 ([0, T ], S) endowed with the uniform topology is a separable metric space. We can then view C 0 ([0, T ], S) as a probability space equipped with the Borel σ-algebra. Problem T 3.2. Show that such a σ-algebra is the minimal σ-algebra containing the n open sets i=1 {ϑ ∈ C 0 ([0, T ], S) | ϑ(ti ) ∈ Ui } for each {ti } ⊂ [0, T ] and open sets Ui ⊂ S. Since Θε can be viewed as a continuous map from T2 to C 0 ([0, T ], S), the measure µ0 induces naturally a measure Pε on C 0 ([0, T ], S): Pε = (Θε )∗ µ0 .18 Also, for each t ∈ [0, T ] let Θ(t) ∈ C 0 (C 0 ([0, T ], S), S) be the random variable defined by Θ(t, ϑ) = ϑ(t), for each ϑ ∈ C 0 ([0, T ], S). Next, for each A ∈ C 0 (C 0 ([0, T ], S), R), we will write Eε (A) for the expectation with respect to Pε . For A ∈ C 0 (S, R) and t ∈ [0, T ], Eε (A ◦ Θ(t)) = Eε (A(Θ(t))) is the expectation of the function A(ϑ) = A(ϑ(t)), ϑ ∈ C 0 ([0, T ], S). 3.3. What we would like to prove. Our first problem is to understand limε→0 Pε . After some necessary preliminaries, in Lecture 4 we will prove the following result. Theorem 3.1. The measures {Pε } have a weak limit P, moreover P is a measure supported on the trajectory determined by the O.D.E. ˙ =ω Θ ¯ (Θ) (3.2) Θ(0) = θ0 R where ω ¯ (θ) = M ω(x, θ)mθ (dx). 15 Recall that a random variable is a measurable function from a probability space to a measurable space. 16 That is, the probability space is T2 equipped with the Borel σ-algebra, µ is the probability 0 measure and (xn , θn ) are functions of (x0 , θ0 ) ∈ T2 . 17 Since we interpolate between near by points the procedure is uniquely defined in T. 18 Given a measurable map T : X → Y between measurable spaces and a measure P on X, T∗ P is a measure on Y defined by T∗ P (A) = P (T −1 (A)) for each measurable set A ⊂ Y . 12 CARLANGELO LIVERANI The above theorem specifies in which sense the random variable Θε converges to the average dynamics described by equation (3.2). Having stated our goal, let us begin with a first, very simple, result. Problem 3.3. Show that the measures {Pε } are tight. Proof. By (3.1) it follows that the path Θε is made of segments of length ε and maximal slope kωkL∞ , thus for all h > 0,19 bε−1 (t+h)c−1 kΘε (t + h) − Θε (t)k ≤ C# h + ε X kω(xk , θk )k ≤ C# h. k=dε−1 te Thus the measures Pε are all supported on a set of uniformly Lipschitz functions, that is a compact set. The above means that there exist converging subsequences {Pεj }. Our next step is to identify the set of accumulation points. An obstacle that we face immediately is the impossibility of using some typical probabilistic tools. In particular, conditioning with respect to the past and It¯o’s formula. In fact, even if the initial condition is random, the dynamics is still deterministic, hence conditioning with respect to the past seems hopeless as it might kill all the randomness at later times. To solve the first problem it is therefore necessary to devise a systematic way to use the strong dependence on the initial condition (typical of hyperbolic systems) to show that the dynamics, in some sense, forgets the past. One way of doing this effectively is to use standard pairs, introduced in the next section, whereby slightly enlarging our allowed initial conditions. Exactly how this solves the conditioning problem will be explained in Section 3.5. The lack of It¯o’s formula will be overcome by taking the point of view of the Martingale problem to define the solution of a SDE. To explain what this means in the present context is the goal of the present note, but see Appendix C for a brief comment on this issue in the simple case of an SDE. We will come back to the problem of studying the accumulation points of {Pε } after having settled the issue of conditioning. 3.4. Standard Pairs (partially hyperbolic systems). Let us fix δ > 0 small enough, and D > 0 large enough, to be specified later; for c1 > 0 consider the set of functions Σc1 = {G ∈ C 2 ([a, b], S) | a, b ∈ T1 , b − a ∈ [δ/2, δ], kG0 kC 0 ≤ εc1 , kG00 kC 0 ≤ εDc1 , }. Let us associate to each G ∈ Σc1 the map G ∈ C 2 ([a, b], T2 ) defined by G(x) = (x, G(x)) whose image is a curve –the graph of G– which will be called a standard curve. For c2 > 0 large enough, let us define the set of c2 -standard probability densities on the standard curve as ( ) Z b 0 ρ Dc2 (G) = ρ ∈ C 1 ([a, b], R+ ) ρ(x)dx = 1, ρ 0 ≤ c2 . a C 19 The reader should be aware that we use the notation C to designate a generic constant # (depending only on f and ω) which numerical value can change from one occurrence to the next, even in the same line. NOTES 13 A standard pair ` is given by ` = (G, ρ) where G ∈ Σc1 and ρ ∈ Dc2 (G). Let Lε be the collection of all standard pairs for a given ε > 0. A standard pair ` = (G, ρ) induces a probability measure µ` on T2 defined as follows: for any continuous function g on T2 let Z b µ` (g) := g(x, G(x))ρ(x)dx. a We define20 a standard family L = (A, ν, {`j }j∈A ), where A ⊂ N and ν is a probability measureP on A; i.e. we associate to each standard pair `j a positive weight ν({j}) so that j∈A ν({j}) = 1. For the following we will use also the notation ν`j = ν({j}) for each j ∈ A and we will write ` ∈ L if ` = `j for some j ∈ A. A standard family L naturally induces a probability measure µL on T2 defined as follows: for any measurable function g on T2 let X µL (g) := ν` µ` (g). `∈L Let us denote by ∼ the equivalence relation induced by the above correspondence i.e. we let L ∼ L0 if and only if µL = µL0 . Proposition 3.2 (Invariance). There exist δ and D such that, for any c1 , c2 sufficiently large, and ε sufficiently small, for any standard family L, the measure Fε∗ µL can be decomposed in standard pairs, i.e. there exists a standard family L0 such that Fε∗ µL = µL0 . We say that L0 is a standard decomposition of Fε∗ µL . Proof. For simplicity, let us assume that L is given by a single standard pair `; the general case does not require any additional ideas and it is left to the reader. By definition, for any measurable function g: Fε∗ µ` (g) = µ` (g ◦ Fε ) = Z b = g(f (x, G(x)), G(x) + εω(x, G(x))) · ρ(x)dx. a It is then natural to introduce the map fG : [a, b] → T1 defined by fG (x) = f ◦ G(x). Note that fG0 ≥ λ − εc1 k∂θ f kC 0 > 3/2 provided that ε is small enough (depending on how large is c1 ). Hence all fG ’s are expanding maps, moreover they are invertible if δ has been chosen small enough. In addition, for any sufficiently smooth function A on T2 , it is trivial to check that, by the definition of standard curve, if ε is small enough (once again depending on c1 )21 (3.3a) k(A ◦ G)0 kC 0 ≤ kdAkC 0 + εkdAkC 0 c1 k(A ◦ G)00 kC 0 ≤ 2kdAkC 1 + εkdAkC 0 Dc1 . Sm Then, fix a partition (mod 0) [fG (a), fG (b)] = j=1 [aj , bj ], with bj − aj ∈ [δ/2, δ] and bj = aj+1 ; moreover let ϕj (x) = fG−1 (x) for x ∈ [aj , bj ] and define (3.3b) Gj (x) = G ◦ ϕj (x) + εω(ϕj (x), G ◦ ϕj (x)); ρ˜j (x) = ρ ◦ ϕj (x)ϕ0j (x). 20 This is not the most general definition of standard family, yet it suffices for our purposes. 21 Given a function A by dA we mean the differential. 14 CARLANGELO LIVERANI By a change of variables we can thus write: m Z bj X (3.4) Fε∗ µ` (g) = ρ˜j (x)g(x, Gj (x))dx. j=1 aj Observe that, by immediate differentiation we obtain, for ϕj : (3.5) ϕ0j = 1 ◦ ϕj fG0 ϕ00j = − fG00 ◦ ϕj . fG03 ¯ = G + εωG . Differentiating the definitions of Gj and ρ˜j and Let ωG = ω ◦ G and G using (3.5) yields (3.6) G0j = ¯0 G ◦ ϕj fG0 G00j = ¯ 00 G f 00 ◦ ϕj − G0j · G ◦ ϕj 02 fG fG02 and similarly (3.7) ρ˜0j f 00 ρ0 ◦ ϕj − G ◦ ϕj . = 0 ρ˜j ρ · fG fG02 Using the above equations it is possible to conclude our proof: first of all, using (3.6), ¯ and equations (3.3) we obtain, for small enough ε: the definition of G 0 0 G + εωG 0 ≤ 2 (1 + C# ε)εc1 + C# ε ≤ kGj k ≤ 0 3 fG 3 ≤ εc1 + C# ε ≤ εc1 , 4 provided that c1 is large enough; then: 00 00 G + εωG 00 + C# (1 + εDc1 )εc1 ≤ kGj k ≤ fG02 3 ≤ εDc1 + εC# c1 + εC# ≤ εDc1 4 provided c1 and D are sufficiently large. Likewise, using (3.3) together with (3.7) we obtain 0 ρ˜j 2 ≤ c2 + C# (1 + Dc1 ) ≤ c2 , ρ˜j 3 provided that c2 is large enough. This concludes our proof: it suffices to define Rb the family L0 given by (A, ν, {`j }j∈A ), where A = {1, . . . , m}, ν({j}) = ajj ρ˜j , ρj = ν({j})−1 ρ˜j and `j = (Gj , ρj ). Our previousPestimates imply that (Gj , ρj ) are 0 standard pairs; note moreover that (3.4) implies `∈L ˜ 0 ν`˜ = 1, thus L is a standard family. Then we can rewrite (3.4) as follows: X Fε∗ µ` (g) = ν`˜µ`˜(g) = µL0 (g). ˜ 0 `∈L Remark 3.3. Given a standard pair ` = (G, ρ), we will interpret (xk , θk ) as random variables defined as (xk , θk ) = Fεk (x, G(x)), where x is distributed according to ρ. NOTES 15 3.5. Conditioning. In probability, conditioning is one of the most basic techniques and one would like to use it freely when dealing with random variables. Yet, as already mentioned, conditioning seems unnatural when dealing with deterministic systems. The use of standard pairs provides a very efficient solution to this conundrum. The basic idea is that one can apply repeatedly Proposition 3.2 to obtain at each time a family of standard pairs and then “condition” by specifying to which standard pair the random variable belongs at a given time.22 Note that if ` is a standard pair with G0 = 0, then it belongs to Lε for all ε > 0. In the following, abusing notations, we will use ` also toTdesignate a family {`ε }, `ε ∈ Lε that weakly converges to a standard pair ` ∈ ε>0 Lε . For every standard pair ` we let Pε` be the induced measure in path space and Eε` the associated expectation. Before continuing, let us recall and state a bit of notation: for each t ∈ [0, T ] recall that the random variable Θ(t) ∈ C 0 (C 0 ([0, T ], S), S) is defined by Θ(t, ϑ) = ϑ(t), for all ϑ ∈ C 0 ([0, T ], S). Also we will need the filtration of σ-algebras Ft defined as the smallest σ-algebra for which all the functions {Θ(s) : s ≤ t} are measurable. Last, we consider the shift τs : C 0 ([0, T ], S) → C 0 ([0, T − s], S) defined by τs (ϑ)(t) = ϑ(t + s). Note that Θ(t) ◦ τs = Θ(t + s). Also, it is helpful to keep in mind that, for all A ∈ C 0 (S, R), we have23 Eε` (A(Θ(t + kε))) = µ` (A(Θε (t + kε))) = µ` (A(Θε (t) ◦ Fεk )). Our goal is to compute, in some reasonable way, expectations of a function of Θ(t + s) conditioned to Ft , notwithstanding the above mentioned problems due to the fact that the dynamics is deterministic. Obviously, we can hope to obtain a result only in the limit ε → 0. Note that we can always reduce to the case in which the conditional expectation is zero by subtracting an appropriate function, thus it suffices to analyze such a case. The basic fact that we will use is the following. Lemma 3.4. Let t0 ∈ [0, T ] and A be a continuous bounded random variable on C 0 ([0, t0 ], S) with values in R. If we have lim sup |Eε` (A)| = 0, ε→0 `∈Lε 0 then, for each s ∈ [0, T − t ], standard pair `, uniformly bounded continuous functions {Bi }m i=1 , Bi : S → R and times {t1 < · · · < tm } ⊂ [0, s), ! m Y ε lim E` Bi (Θ(ti )) · A ◦ τs = 0. ε→0 i=1 Proof. The quantity we want to study can be written as ! m Y µ` Bi (Θε (ti )) · A(τs (Θε )) . i=1 22 Note that the set of standard pairs does not form a σ-algebra, so to turn the above into a precise statement would be a bit cumbersome. We thus prefer to follow a slightly different strategy, although the substance is unchanged. 23 To be really precise, maybe one should write, e.g., Eε (A ◦ Θ(t + kε)), but we conform to the ` above more intuitive notation. 16 CARLANGELO LIVERANI To simplify our notation, let ki = bti ε−1 c and km+1 = bsε−1 c. Also, for every stan˜ let L ˜ denote an arbitrary standard decomposition of (Fεki+1 −ki )∗ µ ˜ dard pair `, i,` ` Rb and define θ`∗ = µ` (θ) = a`` ρ` (x)G` (x)dx. Then, by Proposition 3.2, ! ! m m Y Y km+1 µ` Bi (Θε (ti )) · A(τs (Θε )) = µ` Bi (Θε (ti )) · A(τs−εkm+1 (Θε ◦ Fε )) i=1 = = X i=1 X ··· `1 ∈L1,` `m+1 ∈Lm,`m X X `1 ∈L1,` ··· `m+1 ∈Lm,`m "m Y # ν`i Bi (θ`∗i ) νm+1 µ`m+1 (A(Θε )) + o(1) i=1 "m Y # ν`i Bi (θ`∗i ) νm+1 Eε`m+1 (A) + o(1) i=1 where limε→0 o(1) = 0. The lemma readily follows. Lemma 3.4 implies that, calling P an accumulation point of Pε` , we have24 ! m Y (3.8) E Bi (Θ(ti )) · A ◦ τs = 0. i=1 This solves the conditioning problems thanks to the following Lemma 3.5. Property (3.8) is equivalent to E (A ◦ τs | Fs1 ) = 0, for all s1 ≤ s. Proof. Note that the statement of the Lemma immediately implies (3.8), we thus worry only about the other direction. If the lemma were not true then there would exist a positive measure set of the form K= ∞ \ {ϑ(ti ) ∈ Ki }, i=0 where the {Ki } is a collection of compact sets in S, and ti < s, on which the conditional expectation is strictly positive (or strictly negative, which can be treated in exactly the same way). For some arbitrary δ > 0, consider open sets Ui ⊃ Ki be such that P({ϑ(ti ) ∈ Ui \ Ki }) ≤ δ2−i . Also, let Bδ,i be a continuous function such that Bδ,i (ϑ) = 1 for ϑ ∈ Ki and Bδ,i (ϑ) = 0 for ϑ 6∈ Ui . Then ! n Y 0 < E(1K A ◦ τs ) = lim E Bδ,i (Θ(ti )) · A ◦ τs + C# δ = C# δ n→∞ i=1 which yields a contradiction by the arbitrariness of δ. In other words, we have recovered the possibility of conditioning with respect to the past after the limit ε → 0. 24 By E we mean the expectation with respect to P. NOTES 17 4. Lecture Four: Averaging (the Law of Large Numbers) We are now ready to provide the proof of Theorem 3.1. The proof consists of several steps; we first illustrate the global strategy while momentarily postponing the proof of the single steps. Proof of Theorem 3.1. As already mentioned we will prove the theorem for a larger class of initial conditions: any initial condition determined by a sequence of standard pairs. Note that for any fixed flat standard pair `, i.e. G` (x) = θ, we have an initial condition assumed in the statement of the Theorem. Given a standard pair ` let {Pε` } be the associate measures in path space (the latter measures being determined, as explained at the beginning of Section ??, by the standard pair ` and (3.1)). We have already seen in Lemma 3.3 that the set {Pε` } is tight. Next we will prove in Lemma 4.1 that, for each A ∈ C 2 (S, R), we have Z t hω(Θ(τ )), ∇A(Θ(τ ))idτ = 0. (4.1) lim sup Eε` A(Θ(t)) − A(Θ(0)) − ε→0 0 `∈Lε Accordingly, it is natural to consider the random variables A(t) defined by Z t A(t, ϑ) = A(ϑ(t)) − A(ϑ(0)) − hω(ϑ(τ )), ∇A(ϑ(τ ))idτ, 0 for each t ∈ [0, T ] and ϑ ∈ C 0 ([0, T ], S), and the first order differential operator (4.2) LA = hω, ∇Ai. Then equation (4.1), together with Lemmata 3.4 and 3.5, means that each accumulation point P` of {Pε` } satisfies, for all s ∈ [0, T ] and t ∈ [0, T − s], Z t+s (4.3) E` (A ◦ τs | Fs ) = E` A(Θ(t + s)) − A(Θ(s)) − LA(Θ(τ ))dτ Fs = 0 s this is the simplest possible version of the Martingale Problem. Indeed it implies that, for all θ, A and standard pair ` such that G` (x) = θ, Z t M (t) = A(Θ(t)) − A(Θ(0)) − LA(Θ(s))ds 0 is a martingale with respect to the measure Pθ and the filtration Ft (i.e., for each 0 ≤ s ≤ t ≤ T , Eθ (M (t) | Fs ) = M (s)).25 Finally we will show in Lemma 4.2 that there is a unique measure that has such a property: the measure supported on the unique solution of equation (3.2). This concludes the proof of the theorem. In the rest of this section we provide the missing proofs. 4.1. Differentiating with respect to time. Before starting with the proof of (4.1) we need to compare the dynamics of (xn , θn ) = Fεn (x0 , θ0 ) with the one of f∗ (x) = f (x, θ0 ). This is a type of shadowing but here we need it a quantitative form. 1 Problem 4.1. Show that there, for each n ≤ ε− 2 , there exist a map Yn ∈ C 1 (T, T) such that, setting x∗k = f∗k (Yn (x0 )) we have x∗n = xn and, for all k ≤ n, |xk − x∗k | ≤ C1 εk. 25 We use P to designate any measure P with G (x) = θ. θ ` ` 18 CARLANGELO LIVERANI 1 Problem 4.2. Sow that there exists C > 0 such that, for each n ≤ Cε− 2 , 2 2 e−c# εn ≤ |Yn0 | ≤ ec# εn . (4.4) In particular, Yn is invertible with uniformly bounded derivative. Lemma 4.1. For each A ∈ C 2 (S, R) we have26 Z t hω(Θ(s)), ∇A(Θ(s))ids = 0, lim sup Eε` A(Θ(t)) − A(Θ(0)) − ε→0 0 `∈Lε where (we recall) ω(θ) = mθ (ω(·, θ)) and mθ is the unique SRB measure of f (·, θ). Proof. We will use the notation of Appendix B. Given a standard pair ` let ρ` = ρ, θ`∗ = µ` (θ) and f∗ (x) = f (x, θ`∗ ). Then, by Lemmata B.1 and B.2, we can write, 1 for n ≤ Cε− 2 ,27 ! Z b n−1 X µ` (A(θn )) = ω(xk , θk ) dx ρ(x)A θ0 + ε a Z k=0 b = θ`∗ ρ(x)A +ε a Z n−1 X dx + O(ε2 n2 + ε) k=0 b ρ(x)A(θ`∗ )dx = +ε a Z ! ω(xk , θ`∗ ) n−1 XZ b k=0 ρ(x)h∇A(θ`∗ ), ω(xk , θ`∗ )idx + O(ε) a b ρ(x)A(G` (x))dx + O(ε) = a +ε n−1 XZ b k=0 ρ(x)h∇A(θ`∗ ), ω(f∗k ◦ Yn (x), θ`∗ )idx a = µ` (A(θ0 )) + ε n−1 XZ k=0 h χ[a,b] ρ Yn0 T1 ρ˜n (x)h∇A(θ`∗ ), ω(f∗k (x), θ`∗ )idx + O(ε) i R ρkBV ◦ Yn−1 (x). Note that T1 ρ˜n = 1 but, unfortunately, k˜ where ρ˜n (x) = may be enormous. Thus, we cannot estimate the integral in the above expression by naively using decay of correlations. Yet, equation (B.3) implies |Yn0 − 1| ≤ C# εn2 . Moreover, ρ¯ = (χ[a,b] ρ) ◦ Y −1 has uniformly bounded variation.28 Accordingly, by the decay of correlations and the C 1 dependence of the invariant measure on θ (see Section 3.1) we have Z Z ρ˜n (x)h∇A(θ`∗ ), ω(f∗k (x), θ`∗ )idx = ρ¯n (x)h∇A(θ`∗ ), ω(f∗k (x), θ`∗ )idx + O(εn2 ) T1 T1 = mLeb (˜ ρn (x))mθ`∗ (h∇A(θ`∗ ), ω(·, θ`∗ )i) + O(εn2 + e−c# k ) = µ` (h∇A(θ0 ), ω(θ0 )i) + O(εn2 + e−c# k ). Hence, (4.5) µ` (A(θn )) = µ` (A(θ0 ) + εnh∇A(θ0 ), ω ¯ (θ0 )i) + O(n3 ε2 + ε). 26 See Lemma C.1 for the relation between the differentiability mentioned in the title of the subsection and the present integral formula. 27 By O(εa nb ) we mean a quantity bounded by C εa nb , where C does not depend on `. # R R # 28 Indeed, for all ϕ ∈ C 1 , |ϕ| ≤ 1, R ρ ¯ϕ0 = ab ρ · ϕ0 ◦ Y · Y 0 = ab ρ(ϕ ◦ Y )0 ≤ kρkBV . ∞ NOTES 19 1 Finally, we choose n = dε− 3 e and set h = εn. We define inductively standard families such that L`0 = {`} and for each standard pair `i+1 ∈ L`i the family L`i+1 is 2 a standard decomposition of the measure (Fεn )∗ µ`i+1 . Thus, setting m = dtε− 3 e−1, recalling equation (4.5) and using repeatedly Proposition 3.2, Eε` (A(Θ(t))) = µ` (A(θtε−1 )) = µ` (A(θ0 )) + m−1 X µ` (A(θε−1 (k+1)h )) − A(θε−1 kh )) k=0 = µ` (A(θ0 )) + m−1 X X k=0 `1 ∈L`0 = Eε` A(Θ(0)) + ··· k−1 Y X i h 2 ¯ (θ0 )i) + O(ε) ν`j µ`k−1 (ε 3 h∇A(θ0 ), ω `k−1 ∈L`k−2 j=1 ! m−1 X h∇A(Θ(kh)), ω ¯ (Θ(kh))ih 1 + O(ε 3 t) k=0 Z t 1 = Eε` A(Θ(0)) + h∇A(Θ(s)), ω ¯ (Θ(s))ids + O(ε 3 t). 0 The lemma follows by taking the limit ε → 0. 4.2. The Martingale Problem at work. First of all let us finally specify precisely what we mean by the martingale problem. Definition 5 (Martingale Problem). Given a Riemannian manifold S, a linear operator L : D(L) ⊂ C 0 (S, Rd ) → C 0 (S, Rd ), a set of measures Py , y ∈ S, on C 0 ([0, T ], S) and a filtration Ft we say that {Py } satisfies the martingale problem if for each function A ∈ D(L), Py ({z(0) = y}) = 1 Z M (t, z) := A(z(t)) − A(z(0)) − t LA(z(s))ds is Ft -martingale under all Py . 0 We can now prove the last announced result. Lemma 4.2. If ω ¯ is Lipschitz, then the martingale problem determined by (4.2) has a unique solution consisting of the measures supported on the solutions of the ODE ˙ = ω(Θ) Θ (4.6) Θ(0) = y. Proof. Let Θ be the solution of (4.6) with initial condition y ∈ Td and Py the probability measure in the martingale problem. The idea is to compute d d Ey (kΘ(t) − Θ(t)k2 ) = Ey (hΘ(t), Θ(t)i) − 2h¯ ω (Θ(t)), Ey (Θ(t))i dt dt d − 2hΘ(t), Ey (Θ(t))i + 2h¯ ω (Θ(t)), Θ(t))i. dt To continue we use Lemma C.1 where, in the first term A(θ) = kθk2 , in the third A(θ) = θi and the generator in (4.3) is given by LA(θ) = h∇A(θ), ω ¯ (θ)i. d Ey (kΘ(t) − Θ(t)k2 ) = 2Ey (hΘ(t), ω ¯ (Θ(t))i) − 2h¯ ω (Θ(t)), Ey (Θ(t))i dt − 2hΘ(t), Ey (¯ ω (Θ(t)))i + 2Ey (hΘ(t), ω ¯ (Θ(t))i) = Ey (hΘ(t) − Θ(t), ω(Θ(t)) − ω(Θ(t))i). 20 CARLANGELO LIVERANI By the Lipschitz property of ω ¯ (let CL be the Lipschitz constant), using the Schwartz inequality and integrating we have Z t Ey (kΘ(t) − Θ(t)k2 ) ≤ 2CL Ey (kΘ(s) − Θ(s)k2 )ds 0 which, by Gronwall’s inequality, implies that Py ({Θ}) = 1. 4.3. A recap of what we have done so far. We have just seen that the martingale method (in Dolgopyat’s version) consists of four steps (1) Identify a suitable class of measures on path space which allow one to handle the conditioning problem (in our case: the one coming from standard pairs) (2) Prove tightness for such measures (in our case: they are supported on uniformly Lipschitz functions) (3) Identify an equation characterizing the accumulation points (in our case: an ODE) (4) Prove uniqueness of the limit equation in the martingale sense. The beauty of the previous scheme is that it can be easily adapted to a variety of problems. To convince the reader of this fact we proceed further and apply it to obtain more refined information on the behavior of the system. 5. Lecture five: A brief survey of further results In this lecture we will describe more precise, and interesting results, that can be obtained by the technique discussed in the previous lectures (and quite some more extra work). Problem 5.1. To complete the understanding of the previous sections read an meditate Appendix C. 5.1. Fluctuations around the limit. The next natural question is how fast the convergence takes place. To this end it is natural to consider, for each t ∈ [0, T ], 1 ζε (t) = ε− 2 Θε (t) − Θ(t) . Note that ζε is a random variable on T2 with values in C 0 ([0, T ], Rd ) which describes eε be the path measure describing ζε the fluctuations around the average.29 Let P eε = (ζε )∗ µ0 . when (x0 , θ0 ) are distributed according to the measure µ0 . That is, P eε , hence of the fluctuation So it is natural to ask what is the limit behavior of P around the average. The following result holds (the proof is provided, for the reader conveniece in the supplementary material presented in Section 6). eε } have a weak limit P. e Moreover, P e is the measure Theorem 5.1. The measures {P of the zero average Gaussian process defined by the Stochastic Differential Equation (SDE) (5.1) dζ = Dω ¯ (Θ)ζdt + σ(Θ)dB ζ(0) = 0, 29 Here we are using that S = Td can be lifted to its universal cover Rd . NOTES 21 where B is the Rd dimensional standard Brownian motion and the diffusion coefficient σ is given by 30 ∞ X σ(θ)2 =mθ (ˆ ω (·, θ) ⊗ ω ˆ (·, θ)) + mθ (ˆ ω (fθm (·), θ) ⊗ ω ˆ (·, θ)) + m=1 (5.2) + ∞ X mθ (ˆ ω (·, θ) ⊗ ω ˆ (fθm (·), θ)) . m=1 where ω ˆ = ω−ω ¯ and we have used the notation fθ (x) = f (x, θ). In addition, σ 2 is symmetric and non-negative, hence σ is uniquely defined as a symmetric positive definite matrix. Finally, σ(θ) is strictly positive, unless ω ˆ (θ, ·) is a coboundary for fθ . Problem 5.2. Show that, setting ψ(λ, t) = E(eihλ,ζ(t)i ), equation (5.1) implies that 1 ∂t ψ = hλ, Dω ¯ ∂λ ψi − hλ, σ 2 λiψ 2 ψ(λ, 0) = 1 which implies that ψ is a zero mean Gaussian. Remark 5.2. It is interesting to notice that equation (5.1) with σ ≡ 0 is just the equation for the evolution of an infinitesimal displacement of the initial condition, that is the linearised equation along an orbit of the averaged deterministic system. This is rather natural, since in the time scale we are considering, the fluctuations around the deterministic trajectory are very small. Remark 5.3. Note that the condition that insures that the diffusion coefficient σ is non zero can be constructively checked by finding periodic orbits with different averages. 5.2. Local Limit Theorem. In fact, with much more work, it is possible to prove a much sharper results Theorem 5.4 ([4]). For any T > 0, there exists ε0 > 0 so that the following holds. For any compact interval I ⊂ R, real numbers κ > 0, ε ∈ (0, ε0 ) and t ∈ [ε1/2000 , T ], any fixed θ0 ∈ T1 , if x0 is distributed according to a smooth distribution, we have: 2 (5.3) − 21 ε 2 e−κ /2σt (θ0 ) √ P(ζε (t) ∈ ε I + κ) = |I| · + O(1). σ t (θ0 ) 2π 1 2 where the variance σ 2t (θ0 ) is given by Z t R t 0 ¯ ¯ θ0 ))ds, (5.4) σ 2t (θ0 ) = e2 s ω¯ (θ(r,θ0 ))dr σ 2 (θ(s, 0 5.3. Wentzel-Friedlin processes. Wentzel and Friedlin in [21] have developed a rather detailed theory to study small perturbation of a deterministic dynamical system. The simplest possible example of the type of situations that they discuss is given by √ dη(t) = ω ¯ (η(t))dt + εσ(η(t))dB (5.5) η(0) = θ0 30 In our notation, for any measure µ and vectors v, w, µ(v ⊗ w) is a matrix with entries µ(vi wj ). 22 CARLANGELO LIVERANI where B is the standard Brownian motion. In fact, using Theorem 5.4 and [21, Theorem 2.2], it is possible to show that the process θε−1 t and η(t) are ε close with very high probability for times of √ order C# ln ε−1 . For such times the evolution of η(t) is such that the process is C# ε close to a sink of ω ¯ with very high probability, thus the same happens to our process. We have thus obtained a very good and simple description of our process θn for times of order n ∼ ε−1 ln ε−1 . Yet, a natural question now appears: is it possible to understand the process for longer times? Possibly infinitely long? 5.4. Statistical propeties of Fε . I believe that it is possible to answer positively to the above question for a generic ω. However at the moment the only available results hold in the case in which the center foliation has negative Lypunov exponent. Loosely stated the result reads: Theorem 5.5 ([5]). Assume that the central foliation has negative Lyapunov exponent and let nZ be the number of sinks of ω ¯ . Then, for a generic ω with the above property the following hold. For all ε > 0 sufficiently small, the map Fε has a finite number of SRB measures µε . If Fε has a unique SRB (and physical) measure which enjoys exponential decay of correlations with rate cε > 0. In other words, there exists C1 , C2 , C3 , C4 > 0 such that, for any two functions A, B ∈ C 2 , |m(A · B ◦ Fεn ) − m(A)µε (B)| ≤ C1 sup kA(·, θ)kC 2 sup kB(x, ·)kC 2 exp(−cε n). x θ Moreover, the rate of decay of correlations satisfies the following bounds: ( C2 ε/ log ε−1 if nZ = 1, (5.6) cε = C3 exp(−C4 ε−1 ) otherwise. The above shows that it is possible to understand quite well what happens in a particular case. The study of the general case stands as a challenge. 6. Supplementary material: Proof of Theorem 5.1 It is possible to study the limit behavior of ζε using the strategy summarized in Section 4.3, even though now the story becomes technically more involved. Let us eε be the path measure describing discuss the situation a bit more in detail. Let P ` ζε when (x0 , θ0 ) are distributed according to the standard pair `.31 Note that, eε = (ζε )∗ µ` . Again, we provide a proof of the claimed results based on some facts P ` that will be proven in later sections. eε is tight, which Proof of Theorem 5.1. First of all, the sequence of measures P ` will be proven in Proposition 6.1. Next, by Proposition 6.4, we have that Z t ε e Ls A(ζ(s))ds = 0, (6.1) lim sup E` A(ζ(t)) − A(ζ(0)) − ε→0 `∈Lε 0 where (6.2) d 1 X 2 (Ls A)(ζ) = h∇A(ζ), Dω ¯ (Θ(s))ζi + [σ (Θ(s))]i,j ∂ζi ∂ζj A(ζ), 2 i,j=1 31 As already explained, here we allow ` to stand also for a family {` } which weakly converges ε to `. In particular, this means that Θ is also a random variable, as it depends on the initial condition θ0 . NOTES 23 with diffusion coefficient σ 2 given by (5.2). In the following we will often write, slightly abusing notations, σ(t) for σ(Θ(t)). We can then use equation (6.1) and Lemma 3.4 followed by Lemma 3.5 to obtain that Z t A(ζ(t)) − A(ζ(0)) − Ls A(ζ(s))ds 0 e of the measures P eε with respect is a martingale under any accumulation point P ` e to the filtration Ft with P({ζ(0) = 0}) = 1. In Proposition 6.6 we will prove that eε = P. e such a problem has a unique solution thereby showing that limε→0 P ` Note that the time dependent operator Ls is a second order operator, this means that the accumulation points of ζε do not satisfy a deterministic equation, but e is equal in law to rather a stochastic one. Indeed our last task is to show that P the measure determined by the stochastic differential equation dζ = Dω ¯ ◦ Θ(t) ζdt + σdB (6.3) ζ(0) = 0 d where B is a standard R dimensional Brownian motion. Note that the above equation is well defined in consequence of Lemma 6.5 which shows that the matrix σ 2 is symmetric and non negative, hence σ = σ T is well defined and strictly positive if ω ˆ is not a coboundary (see Lemma 6.5). To conclude it suffices to show that the probability measure describing the solution of (6.3) satisfies the martingale problem.32 It follows from It¯o’s calculus, indeed if ζ is the solution of (6.3) and A ∈ C r , then It¯ o’s formula reads X 1X dA(ζ) = ∂ζi A(ζ)dζi + ∂ζi ∂ζj A(ζ)σik σjk dt. 2 i i,j,k Integrating it from s to t and taking the conditional expectation we have Z t E A(ζ(t)) − A(ζ(s)) − Lτ A(ζ(τ ))dτ Fs = 0. s See Appendix C for more details on the relation between the Martingale problem and the theory of SDE and how this allows to dispense form It¯o’s formula altogether. We have thus seen that the measure determined by (6.3) satisfies the martingale e since P e is the unique solution of the martingale problem, hence it must agree with P problem. The proof of the Theorem is concluded by noticing that (6.3) defines a zero mean Gaussian process (see the end of the proof of Proposition 6.6). 6.1. Tightness. eε }ε>0 are tight. Proposition 6.1. For every standard pair `, the measures {P ` Proof. Now the proof of tightness is less obvious since the paths have a Lipschitz constant that explodes. Luckily, there exists a convenient criterion for tightness: Kolmogorov criterion [19, Remark A.5]. Theorem 6.2 (Kolmogorov). Given a sequence of measures Pε on C 0 ([0, T ], R), if there exists α, β, C > 0 such that Eε (|z(t) − z(s)|β ) ≤ C|t − s|1+α 32 We do not prove that such a solution exists as this is a standard result in probability, [19]. 24 CARLANGELO LIVERANI for all t, s ∈ [0, T ] and the distribution of z(0) is tight, then {Pε } is tight. Note that ζε (0) = 0. Of course, it is easier to apply the above criteria with β ∈ N. It is reasonable to expect that the fluctuations behave like a Brownian motion, so the variance should be finite. To verify this let us compute first the case β = 2. Note that, setting ω ˆ (x, θ) = ω(x, θ) − ω ¯ (θ), ζε (t) = √ dε−1 te−1 X ε √ ω(xk , θk ) − ω ¯ (Θ(εk)) + O( ε) k=0 = √ dε ε −1 te−1 X √ ω ˆ (xk , θk ) + ω ¯ (θk ) − ω ¯ (Θ(εk)) + O( ε) k=0 (6.4) = √ dε−1 te−1 ε X √ √ ω ˆ (xk , θk ) + εDω ¯ (Θ(εk))ζε (kε) + O( ε) k=0 dε−1 te−1 + 3 X O(ε 2 kζε (εk)k2 ). k=0 We start with a basic result. Lemma 6.3. For each standard pair ` and k, l ∈ {0, . . . , ε−1 }, k ≥ l, we have 2 X k ω ˆ (xj , θj ) µ` ≤ C# (k − l). j=l The proof of the above Lemma is postponed to the end of the section. Let us see how it can be profitably used. Note that, for t = εk, s = εl, (6.5) e ε (kζ(t) − ζ(s)k2 ) ≤ C# |t − s| + C# |t − s|ε E ` k X µ` (kζε (εj)k2 ) + C# ε, j=l 1 where we have used Lemma 6.3 and the trivial estimate kζε k ≤ C# ε− 2 . If we use the above with s = 0 and define Mk = supj≤k µ` (|ζε (εj)|2 ) we have Mk ≤ C# εk + C# k 2 ε2 Mk . Thus there exists C > 0 such that, if k ≤ Cε−1 , we have Mk ≤ C# εk. Hence, we can substitute such an estimate in (6.5) and obtain (6.6) e ε (kζ(t) − ζ(s)k2 ) ≤ C# |t − s| + C# ε. E ` Since the estimate for |t − s| ≤ C# ε is trivial, we have the bound, e ε (kζ(t) − ζ(s)k2 ) ≤ C# |t − s|. E ` This is interesting but, unfortunately, it does not suffice to apply the Kolmogorov criteria. The next step could be to compute for β = 3. This has the well known disadvantage of being an odd function of the path, and hence one has to deal with the absolute value. Due to this, it turns out to be more convenient to consider directly the case β = 4. This can be done in complete analogy with the above NOTES 25 computation, by first generalizing the result of Lemma 6.3 to higher momenta. Doing so we obtain e ε (kζ(t) − ζ(s)k4 ) ≤ C# |t − s|2 , E ` (6.7) which concludes the proof of the proposition. Indeed, the proof of Lemma 6.3 explains how to treat correlations. Multiple correlations can be treated similarly and one can thus show that they do not contribute to the leading term. Thus the computation becomes similar (although much more involved) to the case of the sum independent zero mean random variables Xi (where no correlations are present), that is k X E([ Xi ]4 ) = i=l k X E(Xi1 Xi2 Xi2 Xi4 ) = i1 ,...,i4 =l k X E(Xi2 Xj2 ) = O((k − l)2 ). i,j=l For future use let us record that, by equation (6.7) and the Young inequality, (6.8) e ε (kζ(t) − ζ(s)k3 ) ≤ C# |t − s| 23 . E ` We still owe the reader the Proof of Lemma 6.3. The proof starts with a direct computation:33 2 k k X X ω ˆ (xj , θj ) ≤ µ` ω ˆ (xj , θj )2 µ` j=l j=l +2 k k X X µ` (ˆ ω (xj , θj )ˆ ω (xr , θr )) j=l r=l+1 ≤ C# |k − l| + 2 k k X X µ` (ˆ ω (xj , θj )ˆ ω (xr , θr )) . j=l r=j+1 To compute the last correlation, remember Proposition 3.2. We can thus call Lj the standard family associated to (Fεj )∗ µ` and, for r ≥ j, we write X µ` (ˆ ω (xj , θj )ˆ ω (xr , θr )) = ν`1 µ`1 (ˆ ω (x0 , θ0 )ˆ ω (xr−j , θr−j )) `1 ∈Lj = X Z b `1 ν`1 ρ`1 (x)ˆ ω (x, G`1 (x))ˆ ω (xr−j , θr−j ). a `1 `1 ∈Lj We would like to argue as in the proof of Lemma 4.1 and try to reduce the problem to Z b` Z b` 1 1 ρ`1 (x)ˆ ω (x, θ`∗1 )ˆ ω (xr−j , θ`∗1 ) = ρ`1 (x)ˆ ω (x, θ`∗1 )ˆ ω (f∗r−j (Yr−j (x)), θ`∗1 ) a `1 a`1 Z = T1 −1 ρ˜(x)ˆ ω (Yr−j (x), θ`∗1 )ˆ ω (f∗r−j (x), θ`∗1 ), but then the mistake that we would make substituting ρ˜ with ρ¯ is too big for our current purposes. It is thus necessary to be more subtle. The idea is to write 33 To simplify notation we do the computation in the case d = 1, the general case is identical. 26 CARLANGELO LIVERANI ρ`1 (x)ˆ ω (x, G`1 (x)) = α1 ρˆ1 (x)+α2 ρˆ2 (x), where ρˆ1 , ρˆ2 are standard densities.34 Note that α1 , α2 are uniformly bounded. Next, let us fix L > 0 to be chosen later and assume r − j ≥ L. Since `1,i = (G, ρˆi ) are standard pairs, by construction, calling L`1,i = (F r−j−L )∗ µ`1,i we have Z b`1 ρˆi (x)ˆ ω (xr−j , θr−j ) = a`1 X Z a`2 `2 ∈L`1,i = X Z ν`2 T1 `2 ∈L`1,i = X `2 ∈L`1,i b`2 ν`2 Z ν`2 T1 ρ`2 (x)ˆ ω (f∗L (YL (x)), θ`∗2 ) + O(εL) ρ˜(x)ˆ ω (f∗L (x), θ`∗2 ) + O(εL) ρ¯(x)ˆ ω (f∗L (x), θ`∗2 ) + O(εL2 ) = O(e−c# L + εL2 ), due to the decay of correlations for the map f∗ and the fact that ω ˆ (·, θ`∗2 ) is a zero average function for the invariant measure of f∗ . By the above we have 2 X k k X −c L [e # + εL2 ](k − j) + 1 + εL3 ω ˆ (xj , θj ) ≤ C# µ` j=l j=l which yields the result by choosing L = c log(k − j) for c large enough. 6.2. Differentiating with respect to time (poor man’s It¯ o’s formula). Proposition 6.4. For every standard pair ` and A ∈ C 3 (S, R) we have Z t ε e Ls A(ζ(s))ds = 0. lim sup E` A(ζ(t)) − A(ζ(0)) − ε→0 `∈Lε 0 Proof. As in Lemma 4.1, the idea is to fix h ∈ (0, 1) to be chosen later, and compute (6.9) e ε (A(ζ(t + h)) − A(ζ(t))) = E e ε (h∇A(ζ(t)), ζ(t + h) − ζ(t)i) E ` ` 1 e ε ( h(D2 A)(ζ(t))(ζ(t + h) − ζ(t)), ζ(t + h) − ζ(t)i) + O(h 23 ), +E ` 2 where we have used (6.8). Unfortunately this time the computation is a bit lengthy and rather boring, yet it basically does not contain any new idea, it is just a brute force computation. Let us start computing the last term of (6.9). Setting ζ h (t) = ζ(t + h) − ζ(t) P(t+h)ε−1 and Ωh = k=tε−1 ω ˆ (xk , θk ), by equations (6.4) and using the trivial estimate 34 In fact, it would be more convenient to define standard pairs with signed (actually even complex) measures, but let us keep it simple. NOTES 27 1 kζε (t)k ≤ C# ε− 2 , we have (t+h)ε−1 X e ε (h(D2 A)(ζ(t))ζ h (t), ζ h (t)i) = ε E ` µ` (h(D2 A)(ζε (t))ˆ ω (xk , θk ), ω ˆ (xj , θj )i) k,j=tε−1 (t+h)ε−1 + O ε 3 2 X µ` h Ω kζε (jε)k + O εµ` (kΩh k) j=ε−1 t (t+h)ε−1 (t+h)ε−1 X X + O ε2 µ` (kΩh k kζε (jε)k2 ) + ε2 j=tε−1 + O ε X (t+h)ε−1 µ` (kζε (kε)k) + ε 5 2 X k=tε−1 k,j=tε−1 (t+h)ε−1 (t+h)ε−1 X X + O ε2 µ` (kζε (kε)k kζε (jε)k) k,j=tε−1 (t+h)ε−1 3 2 µ` (kζε (kε)k2 ) + ε3 k=tε−1 µ` (kζε (kε)k kζε (jε)k2 ) + ε µ` (kζε (kε)k2 kζε (jε)k2 ) . k,j=tε−1 Observe that (6.6), (6.8) and (6.7) yield m µ` (kζε (kε)km ) = µ` (kζε (kε) − ζε (0)km ) ≤ C# (εk) 2 ≤ C# for m ∈ {1, 2, 3, 4} and k ≤ C# ε−1 . We can now use Lemma 6.3 together with Schwartz inequality to obtain (6.10) (t+h)ε−1 e ε (h(D2 A)(ζ(t))ζ h (t), ζ h (t)i) = ε E ` X µ` (h(D2 A)(ζε (t))ˆ ω (xk , θk ), ω ˆ (xj , θj )i) k,j=tε−1 √ + O( εh + h2 + ε). Next, we must perform a similar analysis on the first term of equation (6.9). e ε (h∇A(ζ(t)), ζ h (t)i) E ` = √ (t+h)ε−1 ε X µ` (h∇A(ζε (t)), ω ˆ (xk , θk )i) k=tε−1 (6.11) (t+h)ε−1 X +ε √ µ` (h∇A(ζε (t)), Dω ¯ (Θ(εk))ζε (εk)i) + O( ε). k=tε−1 To estimate the term in the second line of (6.11) we have to use again (6.4): (t+h)ε−1 X µ` (h∇A(ζε (t)), Dω ¯ (Θ(εk))ζε (εk)i) = hε−1 µ` (h∇A(ζε (t)), Dω ¯ (Θ(t))ζε (t)i) k=tε−1 −1 2 + O(ε h +ε − 12 h) + √ (t+h)ε−1 ε X k X k=tε−1 j=tε−1 ω (xj , θj )i). µ` (h∇A(ζε (t)), Dω ¯ (Θ(t))ˆ To compute the last term in the above equation let L` be the standard family 1 generated by ` at time ε−1 t, then, setting αε (θ, t) = ∇A(ε− 2 (θ − Θ(t))) and ˆj = 28 CARLANGELO LIVERANI j − tε−1 , we can write µ` (h∇A(ζε (t)), Dω ¯ (Θ(t))ˆ ω (xj , θj )i) = d X X ν`1 µ`1 (αε (θ0 , t)r Dω ¯ (Θ(t))r,s ω ˆ (xˆj , θˆj )s ). `1 ∈L` r,s=1 Next, notice that for every r, the signed measure µ`1 ,r (φ) = µ`1 (αε (θ0 , t)r φ) has density ρ`1 αε (G`1 (x), t)r whose derivative is uniformly bounded in ε, t. We can then write µ`1 ,r as a linear combination of two standard pairs `1,i . Finally, given L ∈ N, if ˆj ≥ L, we can consider the standard families L`1,i generated by `1,i at time ˆj − L and write, arguing as in the proof of Lemma 6.3, X µ`1,i (ˆ ω (xˆj , θˆj )s ) = ν`2 µ`2 (ˆ ω (xL , θL )s ) `2 ∈L`1,i = X `2 ∈L`1,i Z b `2 ν`2 a `2 ρ`2 (x)ˆ ω (fθL`∗ (x), θ`∗2 )s + O(εL2 ) = O(e−C# L + εL2 ). 2 Collecting all the above estimates yields (t+h)ε−1 (6.12) ε 1 X µ` (h∇A(ζε (t)), Dω ¯ (Θ(εk))ζε (εk)i) = O(h2 + ε 2 h) k=tε−1 1 1 1 + hµ` (h∇A(ζε (t)), Dω ¯ (Θ(t))ζε (t)i) + O(h2 ε 2 L2 + h2 ε− 2 e−C# L + ε 2 Lh). To deal with the second term in the first line of equation (6.11) we argue as before: (t+h)ε−1 X k=tε−1 µ` (h∇A(ζε (t)), ω ˆ (xk , θk )i) = −1 tεX +L µ` (h∇A(ζε (t)), ω ˆ (xk , θk )i) k=tε−1 + O(hL2 + ε−1 heC# L ) = O(L + hL2 + ε−1 heC# L ). Collecting the above computations and remembering (6.4) we obtain (6.13) e ε (h∇A(ζ(t)), ζ h (t)i) =hE e ε (h∇A(ζ(t)), Dω E ¯ (Θ(t))ζ(t)i) ` ` √ √ 2 + O(h + L ε + h εL2 ) 1 provided L is chosen in the interval [C∗ ln ε−1 , ε− 4 ] with C∗ > 0 large enough. To conclude we must compute the term on the right hand side of the first line of equation (6.10). Consider first the case |j − k| > L. Suppose k > j, the other case being equal, then, letting L` be the standard family generated by ` at time ε−1 t, 1 and set kˆ = k − ε−1 t, ˆj = j − ε−1 t, B(x, θ, t) = (D2 A)(ε− 2 (θ − Θ(t))) X µ` (h(D2 A)(ζε (t))ˆ ω (xk , θk ), ω ˆ (xj , θj )i) = ν`1 µ`1 (hB(x0 , θ0 , t)ˆ ω (xkˆ , θkˆ ), ω ˆ (xˆj , θˆj )i). `1 ∈L` Note that the signed measure µ ˆ`1 ,r,s (g) = µ`1 (Br,s g) has a density with uniformly bounded derivative given by ρˆ`1 ,r,s = ρ`1 B(x, G`1 (x), t)r,s . Such a density can then be written as a linear combination of standard densities ρˆ`1 ,r,s = α1,`1 ,r,s ρ1,`1 ,r,s + α2,`1 ,r,s ρ2,`1 ,r,s with uniformly bounded coefficients αi,`1 ,r,s . We can then use the NOTES 29 same trick at time j and then at time k − L and obtain that the quantity we are interested in can be written as a linear combination of quantities of the type Z b ∗ µ`3 ,r,s (ˆ ωs (xL , θL )) = µ`3 ,r,s (ˆ ωs (xL , θ`3 ) + O(Lε) = ρ˜r,s ω ˆ s (fθL`∗ (x), θ`∗3 ) + O(L2 ε) a = O(e −C# L 3 2 + L ε) where we argued as in the proof of Lemma 6.3. Thus the total contribution of all such terms is of order L2 h2 + ε−1 e−C# L h2 . Next, the terms such that |k − j| ≤ L but j ≤ ε−1 t + L give a total contribution of order L2 ε while to estimate the other terms it is convenient to proceed as before but stop at the time j − L. Setting k˜ = k − j + L we obtain terms of the form µ`2 ,r,s (ˆ ωs (xk˜ , θk˜ )ˆ ωr (xL , θL )i) = Γk−j (θ`∗2 )r,s + O(e−C# L + L2 ε) where Z Γk (θ) = ω ˆ (fθk (x), θ) ⊗ ω ˆ (x, θ) mθ (dx). S The case j > k yields the same results but with Γ∗j . Remembering the smooth dependence of the covariance on the parameter θ (see [13]), substituting the result of the above computation in (6.10) and then (6.10) and (6.13) in (6.9) we finally have e ε (A(ζ(t + h)) − A(ζ(t))) = hE e ε (h∇A(ζ(t)), Dω E ¯ (Θ(t))ζ(t)i) ` ` √ √ ε 2 2 e (Tr(σ (Θ(t))D A(ζ(t)))) + O(L ε + hL2 ε + h2 L2 ) + hE ` Z t+h h i e ε (h∇A(ζ(s)), Dω e ε (Tr(σ 2 (Θ(s))D2 A(ζ(s))) ds = E ¯ (Θ(s))ζ(s)i) + E ` ` t √ √ 3 + O(L ε + hL2 ε + h 2 + h2 L2 ). The proposition follows by summing the h−1 t terms in the interval [0, t] and by 1 1 choosing L = ε− 100 and h = ε 3 . In the previous Lemma the expression σ 2 just stands for a specified matrix, we did not prove that such a matrix is positive definite and hence it has a well defined square root σ, nor we have much understanding of the properties of such a σ (provided it exists). To clarify this is our next task. Lemma 6.5. The matrices σ 2 (s), s ∈ [0, T ], are symmetric and non negative, hence they have a unique real symmetric square root σ(s). In addition, if, for each v ∈ Rd , hv, ω ¯ i is not a smooth coboundary, then there exists c > 0 such that σ(s) ≥ c1. Proof. For each v ∈ Rd a direct computation shows that " #2 n−1 n−1 X 1 X 1 lim mθ hv, ω(fθk (·), θ)i = lim mθ hv, ω(fθk (·), θ)ihv, ω(fθj (·), θ)i n→∞ n n→∞ n k=0 k,j=0 = mθ (hv, ω(·, θ)i2 ) + lim n→∞ = mθ (hv, ω(·, θ)i2 ) + 2 ∞ X k=1 2 n n−1 X (n − k)mθ (hω(·, θ), vihv, ω(fθk (·), θ)i) k=1 mθ hω(·, θ), vihv, ω(fθk (·), θ)i = hv, σ(θ)2 vi. 30 CARLANGELO LIVERANI This implies that σ(θ)2 ≥ 0 and since it is symmetric, there exists, unique, σ(θ) symmetric and non-negative. On the other hand if hv, σ 2 (θ)vi = 0, then, by the decay of correlations and the above equation, we have " #2 n−1 X mθ hv, ω(fθk (·), θ)i = n mθ (hv, ω(·, θ)i2 ) k=1 + 2n n−1 X mθ hv, ω(·, θ)ihv, ω(fθk (·), θ)i + O(1) k=0 = 2n ∞ X mθ hv, ω(·, θ)ihv, ω(fθk (·), θ)i + O(1) = O(1). k=n Pn−1 Thus the L2 norm of φn = k=1 hv, ω(fθk (·), θ)i is uniformly bounded. Hence there exist a weakly convergent subsequence. Let φ ∈ L2 be an accumulation point, then for each ϕ ∈ C 1 we have mθ (φ ◦ fθ ϕ) = lim mθ (φnk ◦ fθ ϕ) = mθ (φϕ) − mθ (hv, ω(·, θ)iϕ) k→∞ That is hv, ω(x, θ)i = φ(x) − φ ◦ fθ (x). In other words hv, ω(x, θ)i is an L2 coboundary. Since the Livsic Theorem [14] states that the solution of the cohomological equation must be smooth, we have φ ∈ C 1 . 6.3. Uniqueness of the Martingale Problem. We are left with the task of proving the uniqueness of the martingale problem. Note that in the present case the operator depends explicitly on time. Thus if we want to set the initial condition at a time s 6= 0 we need to slightly generalise the definition of martingale problem. To avoid this, for simplicity, here we consider only initial conditions at time zero, which suffice for our purposes. In fact, we will consider only the initial condition ζ(0) = 0, since it is the only one we are interested in. We have then the same definition of the martingale problem as in Definition 5, apart form the fact that L is replaced by Ls and y = 0. Since the operators Ls are second order operators, we could use well known results. Indeed, there exists a deep theory due to Stroock and Varadhan that establishes the uniqueness of the martingale problem for a wide class of second order operators, [17]. Yet, our case is especially simple because the coefficients of the higher order part of the differential operator depend only on time and not on ζ. In this case it is possible to modify a simple proof of the uniqueness that works when all the coefficients depend only on time, [17, Lemma 6.1.4]. We provide here the argument for the reader’s convenience. Proposition 6.6. The martingale problem associated to the operators Ls in Proposition 6.4 has a unique solution. Proof. As already noticed, Lt , defined in (6.2), depends on ζ only via the coefficient of the first order part. It is then natural to try to change measure so that such a dependence is eliminated and we obtain a martingale problem with respect to an operator with all coefficients depending only on time, then one can conclude arguing as in [17, Lemma 6.1.4]. Such a reduction is routinely done in probability via the Cameron-Martin-Girsanov formula. Yet, given the simple situation at hand NOTES 31 one can proceed in a much more naive manner. Let S(t) : [0, T ] → Md , Md being the space of d × d matrices, be the generated by the differential equation ˙ S(t) = −Dω(Θ(t))S(t) S(0) = 1. Note that, setting ς(t) = det S(t) and B(t) = Dω(Θ(t)), we have ς(t) ˙ = − tr(B(t))ς(t) ς(0) = 1. The above implies that S(t) is invertible. Define the map S ∈ C 0 (C 0 ([0, T ], Rd ), C 0 ([0, T ], Rd )) by [Sζ](t) = S(t)ζ(t) and e Note that the map S is invertible. Finally, we define the operator set P = S∗ P. 1Xb 2 Lbt = [Σ(t) ]i,j ∂ζi ∂ζj , 2 i,j b 2 = S(t)σ(t)2 S(t)∗ , σ(t) = σ(Θ(t)) as mentioned after (6.2). Let us where Σ(t) verify that P satisfies the martingale problem with respect to the operators Lbt . By Lemma C.1 we have d de | Fs ) E(A(ζ(t)) | Fs ) = E(A(S(t)ζ(t)) dt dt e S(t)∇A(S(t)ζ(t)) ˙ = E( + Lt A(S(t)ζ(t)) | Fs ) 1e X 2 = E σ (t)i,j ∂ζk ∂ζl A(S(t)ζ(t))S(t)k,i S(t)l,j Fs 2 i,j,k,l = E(Lbt A(ζ(t)) | Fs ). Thus the claim follows by Lemma C.1 again. Accordingly, if we prove that the above martingale problem has a unique solution, e concluding then P is uniquely determined, which, in turn, determines uniquely P, the proof. Let us define the function B ∈ C 1 (R2d+1 , R) by 1 B(t, ζ, λ) = ehλ,ζi− 2 Rt s b )2 λidτ hλ,Σ(τ then Lemma C.1 implies d 1 b 2 λiB(t, ζ(t), λ) + Lbt B(t, ζ(t), λ) | Fs ) = 0. E(B(t, ζ(t), λ) | Fs ) = E(− hλ, Σ(t) dt 2 Hence Rt 2 1 b E(ehλ,ζ(t)i | Fs ) = ehλ,ζ(s)i+ 2 s hλ,Σ(τ ) λidτ . From this follows that the finite dimensional distributions are uniquely determined. Indeed, for each n ∈ N, {λi }ni=1 and 0 ≤ t1 < · · · < tn we have Pn Pn−1 E e i=1 hλi ,ζ(ti )i = E e i=1 hλi ,ζ(ti )i E ehλn ,ζ(tn )i Ftn−1 Pn−2 1 R tn hλ ,Σ(τ b )2 λn idτ n = E e i=1 hλi ,ζ(ti )i+hλn−1 +λn ,ζ(tn−1 )i e 2 tn−1 1 = e2 R tn Pn b )2 Pn h i=n(τ ) λi ,Σ(τ i=n(τ ) λi idτ 0 32 CARLANGELO LIVERANI where n(τ ) = inf{m | tm ≥ τ }. This concludes the Lemma since it implies that the measure is uniquely determined on the sets that generate the σ-algebra.35 Note that we have also proven that the process is a zero mean Gaussian process; this, after translating back to the original measure, generalises Remark 5.2. Appendix A. Geometry For c > 0, consider the cones Cc = {(ξ, η) ∈ R2 : |η| ≤ εc|ξ|}. Note that ∂x f ∂θ f dFε = . ε∂x ω 1 + ε∂θ ω Thus if (1, εu) ∈ Cc , dp Fε (1, εu) = (∂x f (p) + εu∂θ f (p), ε∂x ω(p) + εu + ε2 u∂θ ω(p)) ∂θ f (p) = ∂x f (p) 1 + ε u · (1, εΞp (u)) ∂x f (p) where (A.1) Ξp (u) = ∂x ω(p) + (1 + ε∂θ ω(p))u . ∂x f (p) + ε∂θ f (p)u Thus the vector (1, εu) is mapped to the vector (1, εΞp (u)). Thus letting K = max{k∂x ωk∞ , k∂θ ωk∞ , k∂θ f k∞ } we have, for |u| ≤ c and assuming Kεc ≤ 1, |Ξp (u)| ≤ K + (1 + εK)c K +1+c ≤ . λ − εKc λ−1 −1 Thus, if we choose c ∈ [ K+1 ] we have that dp Fε (Cc ) ⊂ Cc . Since this λ−2 , (εK) −1 implies that dp Fε {Cc ⊂ {Cc we have that the complementary cone {CKε−1 is invariant under dFε−1 . From now on we fix c = K+1 λ−2 . 1+d Hence, for any p ∈ T and n ∈ N, we can define the quantities vn , un , sn , rn as follows: (A.2) dp Fεn (1, 0) = vn (1, εun ) dp Fεn (sn , 1) = rn (0, 1) with |un | ≤ c and |sn | ≤ K. For each n the slope field sn is smooth, therefore integrable; given any small ∆ > 0 and p = (x, θ) ∈ T1+d , define Wnc (p, ∆) the local n-step central manifold of size ∆ as the connected component containing p of the intersection with the strip {|θ0 − θ| < ∆} of the integral curve of (sn , 1) passing through p. Notice that, by definition, dp Fε (sn (p), 1) = rn /rn−1 (sn−1 (Fε p), 1); thus, by definition, there exists a constant b such that: rn (A.3) exp(−bε) ≤ ≤ exp(bε). rn−1 Qn−1 Furthermore, define Γn = k=0 ∂x f ◦ Fεk , and let ∂θ f (A.4) a = c ∂x f . ∞ Clearly, (A.5) 35 See Problem 3.2. Γn exp(−aεn) ≤ vn ≤ Γn exp(aεn). NOTES 33 Appendix B. Shadowing In this section we provide a simple quantitative version of shadowing that is needed in the argument. Let (xk , θk ) = Fεk (x, θ) with k ∈ {0, . . . , n}. We assume that θ belongs to the range of a standard pair ` (i.e., θ = G(x) for some x ∈ [a, b]). Let θ∗ ∈ S such that kθ∗ − θk ≤ ε and set f∗ (x) = f (x, θ∗ ). Let us denote with πx : T2 → S the canonical projection on the x coordinate; then, for any s ∈ [0, 1], let n Hn (x, z, s) = πx Fsε (x, θ∗ + s(G` (x) − θ∗ )) − f∗n (z) Note that, Hn (x, x, 0) = 0, in addition, for any x, z and s ∈ [0, 1] ∂z Hn (x, z, s) = −(f∗n )0 (z). Accordingly, by the Implicit Function Theorem any n ∈ N and s ∈ [0, 1], there exists Yn (x, s) such that36 Hn (x, Yn (x, s), s) = 0; from now on Yn (x) stands for Yn (x, 1). Note that setting x∗k = f∗k (Yn (x)), by construction, x∗n = xn . Observe moreover that (1 − G0` sn )vn (B.1) ∂x Yn = (f∗n )0 (z)−1 d(πx Fεn ) = , (f∗n )0 ◦ Yn where we have used the notations introduced in equation (A.2). Recalling (A.5) and by the cone condition we have n−1 n−1 Y ∂x f (xk , θk ) (1 − G0 sn )vn Y ∂x f (xk , θk ) ` ≤ ec# εn (B.2) e−c# εn ≤ . ∗ 0 n 0 f∗ (xk ) (f∗ ) f∗0 (x∗k ) k=0 k=0 Next, we want to estimate to which degree x∗k shadows the true trajectory. 1 Lemma B.1. There exists C > 0 such that, for each k ≤ n < Cε− 2 we have kθk − θ∗ k ≤ C# εk |xk − x∗k | ≤ C# εk. Proof. Observe that θk = ε k−1 X ω(xj , θj ) + θ0 j=0 thus kθk − θ∗ k ≤ C# εk. Accordingly, let us set37 ξk = x∗k − xk ; then, by the mean value theorem, |ξk+1 | = |∂x f · ξk + ∂θ f · (θk − θ∗ )| ≥ λ|ξk | − C# εk. Since, by definition, ξn = 0, we can proceed by backward induction, which yields |ξk | ≤ n−1 X j=k −j+k λ C# εj ≤ C# ε ∞ X λ−j (j + k) ≤ C# εk. j=0 36 The Implicit Function Theorem allows to define Y (x, s) in a neighborhood of s = 0; in fact n we claim that this neighborhood necessarily contains [0, 1]. Otherwise, there would exist s¯ ∈ (0, 1) and x ¯ so that Yn is defined at (¯ x, s¯) but not at (¯ x, s) with s > s¯. We then could apply the Implicit Function Theorem at the point (¯ x, Yn (¯ x, s¯), s¯) and obtain, by uniqueness, an extension of the previous function Yn to a larger neighborhood of s = 0, which contradicts our assumption. 37 Here, as we already done before, we are using the fact that we can lift T1 to the universal covering R. 34 CARLANGELO LIVERANI 1 Lemma B.2. There exists C > 0 such that, for each n ≤ Cε− 2 , 2 2 e−c# εn ≤ |Yn0 | ≤ ec# εn . (B.3) In particular, Yn is invertible with uniformly bounded derivative. Proof. Let us prove the upper bound, the lower bound being similar. By equations (B.1), (B.2) and Lemma B.1 we have |Yn0 | ≤ ec# εn e Pn−1 k=0 ln ∂x f (xk ,θk )−ln f∗0 (x∗ k) ≤ ec# εn ec# Pn−1 k=0 εk . ¯ ’s calculus Appendix C. Martingales, operators and Ito Suppose that Lt ∈ L(C r (Rd , R), C 0 (Rd , R)), t ∈ R, is a one parameter family of bounded linear operators that depends continuously on t.38 Also suppose that P is a measure on C 0 ([0, T ], Rd ) and let Ft be the σ-algebra generated by the variables {z(s)}s≤t .39 Lemma C.1. The two properties below are equivalent: (1) For all A ∈ C 1 (Rd+1 , R), such that, for all t ∈ R, A(t, ·) ∈ C r (Rd , R), and for all times s, t ∈ [0, T ], s < t, the function g(t) = E(A(t, z(t)) | Fs ) is differentiable and g 0 (t) = E(∂t A(t, z(t)) + Lt A(t, z(t)) | Fs ). Rt (2) For all A ∈ C r (Rd , R), M (t) = A(z(t)) − A(z(0)) − 0 Ls A(z(s))ds is a martingale with respect to Ft . Proof. Let us start with (1) ⇒ (2). Let us fix t ∈ [0, T ], then for each s ∈ [0, t] let us define the random variables B(s) by Z t B(s, z) = A(z(t)) − A(z(s)) − Lτ A(z(τ ))dτ. s Clearly, for each z ∈ C 0 , B(s, z) is continuous in s, and B(t, z) = 0. Hence, for all τ ∈ (s, t], by Fubini we have40 Z t d d d E(B(τ ) | Fs ) = − E(A(z(τ )) | Fs ) − E(Lr A(z(r)) | Fs )dr dτ dτ dτ τ = E(−Lτ A(z(τ )) + Lτ A(z(τ )) | Fs ) = 0. Thus, since B is bounded, by Lebesgue dominated convergence theorem, we have 0 = E(B(t) | Fs ) = lim E(B(τ ) | Fs ) = E(B(s) | Fs ). τ →0 This implies E(M (t) | Fs ) = E(B(s) | Fs ) + M (s) = M (s) as required. Next, let us check (2) ⇒ (1). For each h > 0 we have E(A(t + h, z(t + h))−A(t, z(t)) | Fs ) = E ((∂t A)(t, z(t + h)) | Fs ) h + o(h) ! Z t+h + E M (t + h) − M (t) + Lτ A(t, z(τ ))dτ | Fs . t 38 Here C r are thought as Banach spaces, hence consist of bounded functions. A more general setting can be discussed by introducing the concept of a local martingale. 39 At this point the reader is supposed to be familiar with the intended meaning: for all ϑ ∈ C 0 ([0, T ], Rd ), [z(s)](ϑ) = z(ϑ, s) = ϑ(s). 40 If uncomfortable about applying Fubini to conditional expectations, then have a look at [19, Theorem 4.7]. NOTES 35 Since M is a martingale E (M (t + h) − M (t) | Fs ) = 0. The lemma follows by Lebesgue dominated convergence theorem. The above is rather general, to say more it is necessary to specify other properties of the family of operators Ls . A case of particular interest arises for second order differential operators like (6.2). Namely, suppose that (Ls A)(z) = X a(z, s)i ∂zi A(z) + i d 1 X 2 [σ (z, s))]i,j ∂zi ∂zj A(z), 2 i,j=1 where, for simplicity, we assume a, σ to be smooth and bounded and σij = σji . Clearly, (6.2) is a special case of the above. In such a case it turns out that it can be established a strict connection between Ls and the Stochastic Differential Equation (C.1) dz = adt + σdB where B is the standard Brownian motion. The solution of (C.1) can be defined in various way. One possibility is to define it as the solution of the Martingale problem [17], another is to use stochastic integrals [19, Theorem 6.1]. The latter, more traditional, approach leads to It¯o’s formula that reads, for each bounded continuous function A of t and z, [19, page 91], Z t Z tX A(z(t), t) − A(z(0), 0) = ∂s A(z(s), s)ds + a(z(s), s)i ∂zi A(z(s), s)ds 0 0 Z 1 tX + + 2 0 σ 2 (z(s), s)∂zi ∂zj A(z(s), s)ds i,j XZ i,j i t σij (z(s), s)∂zj A(z(s), s)j dBi (s) 0 where the last is a stochastic integral [19, Theorem 5.3]. This formula is often written in the more impressionistic form 1 dA = ∂t Adt + a∂z Adt + σ∂z AdB + σ 2 ∂z2 Adt = ∂t Adt + σ∂z AdB + Lt Adt. 2 Taking the expectation with respect to E(· | Fs ) we obtain exactly condition (1) of Lemma C.1, hence we have that the solution satisfies the Martingale problem. Remark C.2. Note that, if one defines the solution of (C.1) as the solution of the associated Martingale problem, then one can dispense from It¯ o’s calculus altogether. This is an important observation in our present context in which the fluctuations come form a deterministic problem rather than from a Brownian motion and hence a direct application of It¯ o’s formula is not possible. Appendix D. Solutions to the Problems Here we provide the solution of the problems given in the text. Problem 1.1. Just compute. Problem 1.2. Approximate uniformly f with step functions. 36 CARLANGELO LIVERANI Problem 1.3. Consider any Borel probability measure ν and define the following sequence of measures {νn }n∈N :41 for each Borel set A νn (A) = ν(T −n A). The reader can easily see that νn ∈ M1 (X), the sets of the probability measures. Indeed, since T −1 X = X, νn (X) = 1 for each n ∈ N. Next, define µn = n−1 1X νi . n i=0 Again µn (X) = 1, so the sequence {µi }∞ i=1 is contained in a weakly compact set (the unit ball) and therefore admits a weakly convergent subsequence {µni }∞ i=1 ; let µ be the weak limit.42 We claim that µ is T invariant. Since µ is a Borel measure it suffices to verify that for each f ∈ C 0 (X) holds µ(f ◦ T ) = µ(f ). Let f be a continuous function, then by the weak convergence we have43 nj −1 nj −1 1 X 1 X νi (f ◦ T ) = lim ν(f ◦ T i+1 ) j→∞ nj j→∞ nj i=0 i=0 (nj −1 ) X 1 nj = lim νi (f ) + ν(f ◦ T ) − ν(f ) = µ(f ). j→∞ nj i=0 µ(f ◦ T ) = lim Problem 1.4. For any continuous function f Z T∗ µ(f ) = f ◦ T (x)h(x)dx. T Divide the integral in monotonicity intervals for T and change variable in each interval. Problem 1.5. For any function f ∈ C 1 compute d T 00 1 Lh = L h0 0 + L h 0 2 dx T (T ) Integrating and iterating this yields the first inequalities. Since (Ln h)0 is bounded in L1 , then also Ln h must be bounded in C 0 . The second inequalities follows. The third inequality is more of the same. Problem 1.6. Use the last fact of Problem 1.5 and Lemma 1.2. 41Intuitively, if we chose a point x ∈ X at random, according to the measure ν and we ask what is the probability that T n x ∈ A, this is exactly ν(T −n A). Hence, our procedure to produce the point T n x is equivalent to picking a point at random according to the evolved measure νn . 42This depends on the Riesz-Markov Representation Theorem [16] that states that M(X) is exactly the dual of the Banach space C 0 (X). Since the weak convergence of measures in this case correspond exactly to the weak-* topology [16], the result follows from the Banach-Alaoglu theorem stating that the unit ball of the dual of a Banach space is compact in the weak-* topology. 43Note that it is essential that we can check invariance only on continuous functions: if we would have to check it with respect to all bounded measurable functions we would need that µn converges in a stronger sense (strong convergence) and this may not be true. Note as well that this is the only point where the continuity of T is used: to insure that f ◦ T is continuous and hence that µnj (f ◦ T ) → µ(f ◦ T ). NOTES 37 Problem 1.7. Recall that a dynamical systems (X, T, µ) is mixing if for all probability measures µ absolutely continuous with respect to µ one has limn→∞ T∗n ν = µ, where the converges takes place in the weak topology. Also note that the third fact of Problem 1.5 implies that the unique invariant measure µ absolutely continuous with respect to Lebesgue is, in fact, equivalente to Lebesgue. Let dµ = hdx, h ∈ L1 , be a probability measure. Then we can approximate it, in L1 , with a sequence of density of probability densities {hn } ⊂ W 1,1 . Let ν be the invariant measure provided by Lemma 1.2. Then for each f ∈ C 0 , Z Z n lim T∗ µ(f ) = lim lim L hk f = lim lim Ln hk f = ν(f ). n→∞ n→∞ k→∞ k→∞ n→∞ Problem 2.1. R (1) Consider α(f ) = f (x, y)µ(dx)ν(dy). Obviously α ∈ G(µ, ν) (2) First of all d(µ, µ) = 0 since we can consider the coupling Z α(f ) = f (x, x)µ(dx). Next, note that G(µ, ν) is weakly closed and is a subset of the probability measures on X 2 , which is weakly compact. Hence, G(µ, ν) is weakly compact, thus the inf is attained. Accordingly, if d(ν, µ) = 0, there exists α ∈ G(ν, µ) such that α(d) = 0. Thus α is supported on the set D = {(x, y) ∈ X 2 | x = y}. Thus, for each ϕ ∈ C 0 (X), we have, setting π1 (x, y) = x and π2 (x, y) = y, µ(ϕ) = α(ϕ ◦ π1 ) = α(1D ϕ ◦ π1 ) = α(1D ϕ ◦ π2 ) = α(ϕ ◦ π2 ) = ν(ϕ). The fact that d(ν, µ) = d(µ, ν) is obvious from the definition. It remains to prove the triangle inequality. It is possible to obtain a fast argument by using the disintegration of the couplings, but here is an alimentary proof. Let us start with some preparation. Since X is compact, for each ε we can construct a finite partition {pi }M i=1 of X such that each pi has diameter less than ε (do it). Given two probability measures µ, ν and α ∈ G(µ, ν) note that if µ(pi ) = 0, then X α(pi × pj ) = α(pi × X) = µ(pi ) = 0. j Hence α(pi × pj ) = 0 for all j. Thus we can define XZ α(pi × pj ) µ(dx)ν(dy), αε (f ) = f (x, y)1pi (x)1pj (y) µ(p 2 i )ν(pj ) i,j X where the sun runs only on the indixes for which the denominator is strictly positive. It is easy to check that αε ∈ G(µ, ν) and that the weak limit of αε is α. Finally, let ν, µ, ζ be three probability measures and let α ∈ G(µ, ζ), β ∈ G(ζ, ν) such that d(µ, ζ) = α(d) and d(ν, ζ) = β(d). For each ε > 0, there exists δ > 0 such that |α(d) − αδ (d)| ≤ ε and, likewise, |β(d) − βδ (d)| ≤ ε. We can then define the following measure on X 3 XZ α(pi × pj )β(pj × pk ) γδ (f ) = f (x, z, y)1pi (x)1pj (z)1pk (y) µ(dx)ν(dy)ζ(dz), µ(pi )ζ(pj )2 ν(pk ) 3 X i,j,k 38 CARLANGELO LIVERANI where again the sum is restricted to the indexes for which the denominator is strictly positive. The reader can check that the marginal on (x, z) is αδ , the marginal on (z, y) is βδ . It follows that the marginal on (x, y) belongs to G(µ, ν), thus Z Z Z d(µ, ν) ≤ d(x, y)γδ (dx, dz, dy) ≤ d(x, z)αδ (dx, dz) + d(z, y)βδ (dz, dy) X3 X2 X2 ≤ α(d) + β(d) + 2ε = d(µ, ζ) + d(ζ, ν) + 2ε. The result follows by the arbitrariness of ε. (3) It suffices to prove that µn converges weakly to µ if and only if lim d(µn , µ) = 0. n→∞ If it converges in the metric, then, letting αn ∈ G(µn , µ), for each Lipschitz function f we have Z |µn (f ) − µ(f )| ≤ |f (x) − f (y)|αn (dx, dy) ≤ Lf αn (d) X2 where Lf is the Lipschitz constant. Taking the inf on αn we have |µn (f ) − µ(f )| ≤ Lf d(µn , µ). We have thus that limn→∞ µn (f ) = µ(f ) for each Lipschitz function. The claim follows since the Lipschitz functions are dense by StoneWeierstrass Theorem. If it converges weakly, then, to prove convergence in the metric, we need a slightly more sophisticated partitions Pδ : partitions with the property that µ(∂pi ) = 0. Note that this implies limn→∞ µn (pi ) = µ(pi ), [18]. Let us construct explicitly such partitions. For each x ∈ X consider the balls Br (x) = {z ∈ X : d(x, z) < r}. Give δ, let S1,1 = {x ∈ B δ \ B 34 δ } and S1,2 = {x ∈ B 12 δ \ B 41 δ }. These two spherical shells are disjoins. Let σ(1) ∈ {1, 2} be such that µ(S1,σ(1) ) = min{µ(S1,1 ), µ(S1,2 )}. Divide again the spherical shell S1,σ(1) into three, throw away the middle part and let S2,σ(2) be the one with the smaller measure. Continue in this way to obtain a sequence Sn,σ(n) . Note that µ(Bδ (x)) ≥ 2n µ(Sn,σ(n) ) and Sn+1,σ(n+1) ⊂ Sn,σ(n) , thus there exists r(x) ∈ [δ/3, δ] such that ∂Br(x) (x) = ∩∞ n=1 Sn,σ(n) and µ(∂Br(x) (x)) = 0. Since X is compact we can extract a finite sub cover, {Bi }N i=1 , from the cover {Br(x) (x)}x∈X . If we consider all the (open) sets Bi ∩ Bj they form a mod-0 partition of X. To get a partition {pi } just attribute the boundary points in any (measurable) way you like.44 Also, for each partition element pi choose a reference point xi ∈ pi . Having constructed the wanted partition we can discretize any measure ν by associating to it the measure X νδ (f ) = f (xi )ν(pi ). i Define also α(f ) = XZ i f (x, xi )ν(dx) pi 44 E.g., if C are the elements of the partition, you can set p = C , p = C \ (∂C ) and so 1 1 2 2 1 i on. NOTES 39 and check that α ∈ G(ν, νδ ), hence d(ν, νδ ) ≤ α(d) ≤ 2δ. For each n ∈ N and δ > 0 we can then write d(µ, µn ) ≤ d(µδ , µn,δ ) + 4δ. P Next, let zn,i = min{µ(pi ), µn (pi )}, Zn−1 = i zn,i , and define X (1 − Zn )2 X βn (f ) = Zn f (xi , xi )zi + f (xi , xj )[µ(pi ) − zn,i ] · [µn (pi ) − zn,i ] Zn2 i i,j and verify that β ∈ G(µδ , µn,δ ). In addition, for each δ > 0, limn→∞ zn,i = µn (pi ). Hence, limn→∞ Zn = 1. Collecting the above fact, and calling K the diameter of X, yields (1 − Zn )2 + 4δ = 4δ. n→∞ n→∞ n→∞ Zn2 The result follows by the arbitriness of δ. Comment: In the field of optimal transport one usually would prove the above facts via the duality relation lim d(µ, µn ) ≤ lim βn (d) + 4δ ≤ K lim d(µ, ν) = sup {µ(φ) − ν(φ)} φ∈L1 where L1 is the set of Lipschitz function with Lipschitz constant equal 1. We refrain from this point of view because, in spite of its efficiency, requires the development of a little bit of technology outside the scopes of this notes. The interested reader can see [20, Chapter 1] for details. (4) The first metric gives rise to the usual topology, hence convergence in d is equivalente to the usual weak convergence of measures. The metric d0 instead give rise to the discrete topology, hence each function in L∞ is continuous. Hence the convergence in the d is equivalent to the usual strong convergence of measures. Problem 2.2. Just compute. Problem 2.3. Use the third part of Problem 1.5. Problem 3.1. See Appendix A. Problem 3.2. See [17, Section 1.3]. Problem 3.3. By (3.1) it follows that the path Θε is made of segments of length ε and maximal slope kωkL∞ , thus for all h > 0,45 bε−1 (t+h)c−1 kΘε (t + h) − Θε (t)k ≤ C# h + ε X kω(xk , θk )k ≤ C# h. k=dε−1 te Thus the measures Pε are all supported on a set of uniformly Lipschitz functions, that is a compact set. Problem 4.1. See Lemma B.1. 45 The reader should be aware that we use the notation C to designate a generic constant # (depending only on f and ω) which numerical value can change from one occurrence to the next, even in the same line. 40 CARLANGELO LIVERANI Problem 4.2. See Lemma B.2. Problem 5.1. Do it. Problem 5.2. Compute using Ito’s formula. In turn, this implies that ζ is a zero mean Gaussian process, see the proof of Proposition 6.6 for more details. References [1] Baladi, Viviane. Positive transfer operators and decay of correlations. Advanced Series in Nonlinear Dynamics, 16. World Scientific Publishing Co., Inc., River Edge, NJ, 2000. [2] Baladi, Viviane; Tsujii, Masato. Anisotropic H¨ older and Sobolev spaces for hyperbolic diffeomorphisms. Ann. Inst. Fourier (Grenoble) 57 (2007), no. 1, 127–154. [3] Jacopo De Simoi, Carlangelo Liverani, The Martingale approach after Varadhan and Dolpogpyat. In ”Hyperbolic Dynamics, Fluctuations and Large Deviations”, Dolgopyat, Pesin, Pollicott, Stoyanov editors, Proceedings of Symposia in Pure Mathematics, 89, AMS (2015) Preprint. arXiv:1402.0090. [4] Jacopo De Simoi, Carlangelo Liverani, Fast-slow partially hyperbolic systems: Beyond averaging. Part I (Limit Theorems). Preprint. arXiv:1408.5453. [5] Jacopo De Simoi, Carlangelo Liverani, Fast-slow partially hyperbolic systems: Beyond averaging. Part II (statistical properties). Preprint. arXiv:1408.5454. [6] Dolgopyat, Dmitry. Averaging and invariant measures. Mosc. Math. J. 5 (2005), no. 3, 537–576, 742. [7] Donsker, M. D.; Varadhan, S. R. S. Large deviations for noninteracting infinite-particle systems. J. Statist. Phys. 46 (1987), no. 5-6, 1195–1232 [8] Gou¨ ezel, S´ ebastien; Liverani, Carlangelo. Banach spaces adapted to Anosov systems. Ergodic Theory Dynam. Systems 26 (2006), no. 1, 189–217. [9] Gou¨ ezel, S´ ebastien; Liverani, Carlangelo. Compact locally maximal hyperbolic sets for smooth maps: fine statistical properties. J. Differential Geom. 79 (2008), no. 3, 433–477. [10] Guo, M. Z.; Papanicolaou, G. C.; Varadhan, S. R. S. Nonlinear diffusion limit for a system with nearest neighbor interactions. Comm. Math. Phys. 118 (1988), no. 1, 31–59. [11] Katok, Anatole; Hasselblatt, Boris. Introduction to the modern theory of dynamical systems. With a supplementary chapter by Anatole Katok and Leonardo Mendoza. Encyclopedia of Mathematics and its Applications, 54. Cambridge University Press, Cambridge, 1995. [12] R.Z. Khasminskii. On the averaging principle for It¯ o stochastic differential equations, Kybernetika 4 (1968) 260–279 (in Russian). [13] Liverani, Carlangelo. Invariant measures and their properties. A functional analytic point of view, Dynamical Systems. Part II: Topological Geometrical and Ergodic Properties of Dynamics. Pubblicazioni della Classe di Scienze, Scuola Normale Superiore, Pisa. Centro di Ricerca Matematica ”Ennio De Giorgi” : Proceedings. Published by the Scuola Normale Superiore in Pisa (2004). [14] Livsic, A.N. Cohomology of dynamical systems, Math. USSR Iz . 6 (1972) 12781301. [15] Pugh, Charles; Shub, Michael Stably ergodic dynamical systems and partial hyperbolicity. J. Complexity 13 (1997), no. 1, 125–179. [16] Reed, Michael; Simon, Barry Methods of modern mathematical physics. I. Functional analysis. Second edition. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York, 1980. xv+400 pp. [17] Stroock, Daniel W.; Varadhan, S. R. Srinivasa. Multidimensional diffusion processes. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 233. Springer-Verlag, Berlin-New York, 1979. xii+338 pp. [18] Varadhan, S. R. S. Probability theory. Courant Lecture Notes in Mathematics, 7. New York University, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2001. viii+167 pp. [19] Varadhan, S. R. S. Stochastic processes. Courant Lecture Notes in Mathematics, 16. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2007. x+126 pp. [20] Villani, C´ edric, Topics in optimal transportation. Graduate Studies in Mathematics, 58. American Mathematical Society, Providence, RI, 2003. xvi+370 pp. NOTES 41 [21] M. I. Freidlin and A. D. Wentzell. Random perturbations of dynamical systems, volume 260 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. ` di Roma (Tor VerCarlangelo Liverani, Dipartimento di Matematica, II Universita gata), Via della Ricerca Scientifica, 00133 Roma, Italy. E-mail address: [email protected]
© Copyright 2025