Advanced Calculus

Michael E. Taylor

Contents

0. One variable calculus
1. The derivative
2. Inverse function and implicit function theorem
3. Fundamental local existence theorem for ODE
4. Constant coefficient linear systems; exponentiation of matrices
5. Variable coefficient linear systems of ODE; Duhamel’s principle
6. Dependence of solutions on initial data, and other parameters
7. Flows and vector fields
8. Lie brackets
9. Differential equations from classical mechanics
10. Central force problems
11. The Riemann integral in n variables
12. Integration on surfaces
13. Differential forms
14. Products and exterior derivatives of forms
15. The general Stokes formula
16. The classical Gauss, Green, and Stokes formulas
17. Holomorphic functions and harmonic functions
18. Variational problems; the stationary action principle
19. The symplectic form; canonical transformations
20. Topological applications of differential forms
21. Critical points and index of a vector field
A. Metric spaces, compactness, and all that
B. Sard’s Theorem
Introduction
The goal of this text is to present a compact treatment of the basic results of
Advanced Calculus, to students who have had 3 semesters of calculus, including
an introductory treatment of multivariable calculus, and who have had a course in
linear algebra.
We begin with an introductory section, §0, presenting the elements of one-dimensional calculus. We first define the integral of a class of functions on an
interval. We then introduce the derivative, and establish the Fundamental Theorem of Calculus, relating differentiation and integration as essentially inverse operations. Further results are dealt with in the exercises, such as the change of variable
formula for integrals, and the Taylor formula with remainder.
In §1 we define the derivative of a function F : O → R^m, when O is an open
subset of R^n, as a linear map from R^n to R^m. We establish some basic properties,
such as the chain rule. We use the one-dimensional integral as a tool to show that,
if the matrix of first order partial derivatives of F is continuous on O, then F is
differentiable on O. We also discuss two convenient multi-index notations for higher
derivatives, and derive the Taylor formula with remainder for a smooth function F
on O ⊂ R^n.
In §2 we establish the inverse function theorem, stating that a smooth map
F : O → R^n with an invertible derivative DF(p) has a smooth inverse defined near
q = F (p). We derive the implicit function theorem as a consequence of this.
In §3 we establish the fundamental local existence theorem for systems of ordinary differential equations. Historically and logically, differential equations is a
major subject in calculus. Our treatment emphasizes the role of ODE, to a greater
extent than most contemporary treatments of ‘Advanced Calculus,’ in this respect
following such classics as [Wil]. Sections 4-8 develop further the theory of ODE.
We discuss explicit solutions to constant coefficient linear systems in §4, via the
matrix exponential. In §4 we also derive a number of basic results on matrix theory
and linear algebra. We proceed to some general results on variable coefficient linear
systems, in §5, and on dependence of solutions to nonlinear systems on parameters,
in §6. This leads to a geometrical picture of ODE theory, in §§7-8, via a study of
flows generated by vector fields, and the role of the Lie bracket of vector fields.
In §9 we discuss a major source of ODE problems, namely problems arising
from classical mechanics, expressed as differential equations via Newton’s laws. We
show how some can be converted to first order systems of a sort called Hamiltonian
systems. We will return to general results on such systems in §§18 and 19. For
the present, in §10 we treat a special class of equations, arising from central force
problems, including particularly the Kepler problem, for planetary motion.
In §11 we take up the multidimensional Riemann integral. The basic definition is
quite parallel to the one-dimensional case, but a number of central results, such as
the change of variable formula, while parallel in statement to the one-dimensional
case, require more elaborate demonstrations in higher dimensions. We also study
the reduction of multiple integrals to iterated integrals, a convenient computational
device.
In §12 we pass from integrals of functions defined on domains in Rn to integrals
of functions defined on n-dimensional surfaces. This includes the study of surface
area. Surface area is defined via a ‘metric tensor,’ an object which is a little more
complicated than a vector field.
In §13 we introduce a further class of objects which are defined on surfaces,
differential forms. A k-form can be integrated over a k-dimensional surface, endowed with an extra piece of structure, an ‘orientation.’ The change of variable
formula established in §11 plays a central role in establishing this. Properties of
differential forms are developed further in the next two sections. In §14 we define
exterior products of forms, and interior products of forms with vector fields. Then
we define the exterior derivative of a form, and the Lie derivative of a form, and
relate these concepts via an important identity. We establish the Poincaré Lemma,
characterizing when a k-form is the exterior derivative of some (k − 1)-form, on
a ball in Rn . Section 15 is devoted to the general Stokes formula, an important
integral identity which contains as special cases classical identities of Green, Gauss,
and Stokes. These special cases are discussed in some detail in §16.
In §17 we use Green’s Theorem to derive fundamental properties of holomorphic
functions of a complex variable. Sprinkled throughout earlier sections are some
allusions to functions of complex variables, particularly in some of the exercises in
§§1, 2, and 6. Readers who have had no prior exposure to complex variables should
be able to return to these exercises after getting through §17. In this section, we
also discuss some results on the closely related study of harmonic functions. One
result is Liouville’s Theorem, stating that a bounded harmonic function on all of
R^n must be constant. When specialized to holomorphic functions on C = R^2, this
yields a proof of the Fundamental Theorem of Algebra.
In §18 we delve more deeply into matters introduced in §9, showing how many
problems in classical physics can be systematically treated as variational problems,
by writing down a ‘Lagrangian,’ and showing how a broad class of variational
problems can be converted to Hamiltonian systems. Other variational problems,
such as finding geodesics, i.e., length-minimizing curves on surfaces, are also treated.
In §19 we introduce a differential form, called the symplectic form, and show how
its use clarifies the study of Hamiltonian systems.
In §20 we derive some topological applications of differential forms. We define
the Brouwer degree of a map between two compact oriented surfaces, and establish
some of its basic properties, as a consequence of a characterization of which n-forms
on a compact oriented surface of dimension n are exact. We establish a number of
topological results, including Brouwer’s fixed point theorem, and the Jordan curve
theorem and the Jordan-Brouwer separation theorem (in the smooth case). In the
exercises we sketch a degree-theory proof of the Fundamental Theorem of Algebra,
which one can contrast with the proof given in §17, using complex analysis. In §21
we use degree theory to study the index of a vector field, an important quantity
defined in terms of the nature of the critical points of a vector field.
There are two appendices at the end of these notes. Appendix A covers some
basic notions of metric spaces and compactness, used from time to time throughout
the notes, particularly in the study of the Riemann integral and in the proof of the
fundamental existence theorem for ODE. Appendix B discusses the easy case of
Sard’s Theorem, used at one point in the development of degree theory in §20.
There are also scattered exercises intended to give the reader a fresh perspective
on topics familiar from previous study. We mention in particular exercises on
determinants in §1, on trigonometric functions in §3, and on the cross product in
§5.
0. One variable calculus
In this brief discussion of one variable calculus, we first consider the integral, and
then relate it to the derivative. We will define the Riemann integral of a bounded
function over an interval I = [a, b] on the real line. To do this, we partition
I into smaller intervals. A partition P of I is a finite collection of subintervals
{J_k : 0 ≤ k ≤ N}, disjoint except for their endpoints, whose union is I. We can
order the J_k so that J_k = [x_k, x_{k+1}], where

(0.1)    x_0 < x_1 < ··· < x_N < x_{N+1},    x_0 = a,  x_{N+1} = b.

We call the points x_k the endpoints of P. We set

(0.2)    ℓ(J_k) = x_{k+1} − x_k,    maxsize(P) = max_{0≤k≤N} ℓ(J_k).

We then set

(0.3)    \overline{I}_P(f) = Σ_k sup_{J_k} f(x) · ℓ(J_k),    \underline{I}_P(f) = Σ_k inf_{J_k} f(x) · ℓ(J_k).
Note that \underline{I}_P(f) ≤ \overline{I}_P(f). These quantities should approximate the Riemann integral of f, if the partition P is sufficiently ‘fine.’
To be more precise, if P and Q are two partitions of I, we say P refines Q,
and write P ≻ Q, if P is formed by partitioning each interval in Q. Equivalently,
P ≻ Q if and only if all the endpoints of Q are also endpoints of P. It is easy to
see that any two partitions have a common refinement; just take the union of their
endpoints, to form a new partition. Note also that

(0.4)    P ≻ Q =⇒ \overline{I}_P(f) ≤ \overline{I}_Q(f), and \underline{I}_P(f) ≥ \underline{I}_Q(f).

Consequently, if P_1, P_2 are any two partitions and Q is a common refinement, we have

(0.5)    \underline{I}_{P_1}(f) ≤ \underline{I}_Q(f) ≤ \overline{I}_Q(f) ≤ \overline{I}_{P_2}(f).
Now, whenever f : I → R is bounded, the following quantities are well defined:

(0.6)    \overline{I}(f) = inf_{P∈Π(I)} \overline{I}_P(f),    \underline{I}(f) = sup_{P∈Π(I)} \underline{I}_P(f),

where Π(I) is the set of all partitions of I. Clearly, by (0.5), \underline{I}(f) ≤ \overline{I}(f). We then
say that f is Riemann integrable provided \overline{I}(f) = \underline{I}(f), and in such a case, we set

(0.7)    ∫_a^b f(x) dx = ∫_I f(x) dx = \overline{I}(f) = \underline{I}(f).
We will denote the set of Riemann integrable functions on I by R(I).
We derive some basic properties of the Riemann integral.
Proposition 0.1. If f, g ∈ R(I), then f + g ∈ R(I), and
(0.8)    ∫_I (f + g) dx = ∫_I f dx + ∫_I g dx.

Proof. If J_k is any subinterval of I, then

    sup_{J_k} (f + g) ≤ sup_{J_k} f + sup_{J_k} g,

so, for any partition P, we have \overline{I}_P(f + g) ≤ \overline{I}_P(f) + \overline{I}_P(g). Also, using common
refinements, we can simultaneously approximate \overline{I}(f) and \overline{I}(g) by \overline{I}_P(f) and \overline{I}_P(g).
Thus the characterization (0.6) implies \overline{I}(f + g) ≤ \overline{I}(f) + \overline{I}(g). A parallel argument
implies \underline{I}(f + g) ≥ \underline{I}(f) + \underline{I}(g), and the proposition follows.
Next, there is a fair supply of Riemann integrable functions.
Proposition 0.2. If f is continuous on I, then f is Riemann integrable.
Proof. Any continuous function on a compact interval is uniformly continuous; let
ω(δ) be a modulus of continuity for f, so
(0.9)    |x − y| ≤ δ =⇒ |f(x) − f(y)| ≤ ω(δ),    ω(δ) → 0 as δ → 0.

Then

(0.10)    maxsize(P) ≤ δ =⇒ \overline{I}_P(f) − \underline{I}_P(f) ≤ ω(δ) · ℓ(I),
which yields the proposition.
We denote the set of continuous functions on I by C(I). Thus Proposition 0.2
says
C(I) ⊂ R(I).
The last argument, showing that every continuous function is Riemann integrable,
also provides a criterion on a partition P guaranteeing that \overline{I}_P(f) and \underline{I}_P(f)
are close to ∫_I f dx, when f is continuous. The following is a useful extension. Let
f ∈ R(I), take ε > 0, and let P_0 be a partition such that

(0.11)    \overline{I}_{P_0}(f) − ε ≤ ∫_I f dx ≤ \underline{I}_{P_0}(f) + ε.

Let

(0.12)    M = sup_I |f(x)|,    δ = minsize(P_0) = min{ℓ(J_k) : J_k ∈ P_0}.
Proposition 0.3. Under the hypotheses above, if P is any partition of I satisfying
(0.13)    maxsize(P) ≤ δ/k,

then

(0.14)    \overline{I}_P(f) − ε_1 ≤ ∫_I f dx ≤ \underline{I}_P(f) + ε_1,  with ε_1 = ε + (2M/k) ℓ(I).

Proof. Consider on the one hand those intervals in P which are contained in intervals in P_0 and on the other hand those intervals in P which are not contained
in intervals in P_0 (whose lengths sum to ≤ ℓ(I)/k). Let P_1 denote the minimal
common refinement of P and P_0. We obtain:

    \overline{I}_P(f) ≤ \overline{I}_{P_1}(f) + (2M/k) ℓ(I),    \underline{I}_P(f) ≥ \underline{I}_{P_1}(f) − (2M/k) ℓ(I).

Since also \overline{I}_{P_1}(f) ≤ \overline{I}_{P_0}(f) and \underline{I}_{P_1}(f) ≥ \underline{I}_{P_0}(f), this implies (0.14).
The following corollary is sometimes called Darboux’s Theorem.
Corollary 0.4. Let P_ν be any sequence of partitions of I into ν intervals J_{νk}, 1 ≤
k ≤ ν, such that

    maxsize(P_ν) = δ_ν → 0,

and let ξ_{νk} be any choice of one point in each interval J_{νk} of the partition P_ν. Then,
whenever f ∈ R(I),

(0.15)    ∫_I f(x) dx = lim_{ν→∞} Σ_{k=1}^ν f(ξ_{νk}) ℓ(J_{νk}).
However, one should be warned that, once such a specific choice of P_ν and ξ_{νk} has
been made, the limit on the right side of (0.15) might exist for a bounded function
f that is not Riemann integrable. This and other phenomena are illustrated by the
following example of a function which is not Riemann integrable. For x ∈ I, set

(0.16)    ϑ(x) = 1 if x ∈ Q,    ϑ(x) = 0 if x ∉ Q,

where Q is the set of rational numbers. Now every interval J ⊂ I of positive length
contains points in Q and points not in Q, so for any partition P of I we have
\overline{I}_P(ϑ) = ℓ(I) and \underline{I}_P(ϑ) = 0, hence

(0.17)    \overline{I}(ϑ) = ℓ(I),    \underline{I}(ϑ) = 0.
Note that, if P_ν is a partition of I into ν equal subintervals, then we could pick
each ξ_{νk} to be rational, in which case the limit on the right side of (0.15) would
be ℓ(I), or we could pick each ξ_{νk} to be irrational, in which case this limit would
be zero. Alternatively, we could pick half of them to be rational and half to be
irrational, and the limit would be (1/2)ℓ(I).
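For f ∈ R(I), by contrast, (0.15) says any choice of the points ξ_{νk} works, and this
is easy to try numerically. A minimal Python sketch (the test function f(x) = x^2
on [0, 1], with exact integral 1/3, is an arbitrary choice made just for illustration):

    # Riemann sums (0.15) for f(x) = x^2 on [0,1]: as maxsize(P_nu) -> 0,
    # the sums approach 1/3 no matter how xi_{nu k} is chosen in each J_{nu k}.
    def riemann_sum(f, a, b, nu, choose):
        h = (b - a) / nu                      # nu equal subintervals, ell(J_k) = h
        return sum(f(choose(a + k*h, a + (k+1)*h)) * h for k in range(nu))

    f = lambda x: x**2
    for nu in (10, 100, 1000):
        left = riemann_sum(f, 0.0, 1.0, nu, lambda l, r: l)          # left endpoints
        mid  = riemann_sum(f, 0.0, 1.0, nu, lambda l, r: (l + r)/2)  # midpoints
        print(nu, left, mid)   # both columns tend to 1/3

Left endpoints, midpoints, or any other choice of ξ_{νk} give sums tending to the same
limit, as Corollary 0.4 asserts; for ϑ above, of course, no such stability holds.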
Associated to the Riemann integral is a notion of size of a set S, called content.
If S is a subset of I, define the ‘characteristic function’
(0.18)    χ_S(x) = 1 if x ∈ S,    0 if x ∉ S.

We define ‘upper content’ cont⁺ and ‘lower content’ cont⁻ by

(0.19)    cont⁺(S) = \overline{I}(χ_S),    cont⁻(S) = \underline{I}(χ_S).
We say S ‘has content,’ or ‘is contented,’ if these quantities are equal, which happens
if and only if χ_S ∈ R(I), in which case the common value of cont⁺(S) and cont⁻(S)
is

(0.20)    m(S) = ∫_I χ_S(x) dx.

It is easy to see that

(0.21)    cont⁺(S) = inf { Σ_{k=1}^N ℓ(J_k) : S ⊂ J_1 ∪ ··· ∪ J_N },

where J_k are intervals. Here, we require S to be in the union of a finite collection
of intervals.
There is a more sophisticated notion of the size of a subset of I, called Lebesgue
measure. The key to the construction of Lebesgue measure is to cover a set S by a
countable (either finite or infinite) set of intervals. The outer measure of S ⊂ I is
defined by
(0.22)    m*(S) = inf { Σ_{k≥1} ℓ(J_k) : S ⊂ ∪_{k≥1} J_k }.

Here {J_k} is a finite or countably infinite collection of intervals. Clearly

(0.23)    m*(S) ≤ cont⁺(S).
Note that, if S = I ∩ Q, then χ_S = ϑ, defined by (0.16). In this case it is easy to
see that cont⁺(S) = ℓ(I), but m*(S) = 0. Zero is the ‘right’ measure of this set.
More material on the development of measure theory can be found in a number of
books, including [Fol] and [T2].
It is useful to note that ∫_I f dx is additive in I, in the following sense.

Proposition 0.5. If a < b < c, f : [a, c] → R, f_1 = f|_{[a,b]}, f_2 = f|_{[b,c]}, then

(0.24)    f ∈ R([a, c]) ⇐⇒ f_1 ∈ R([a, b]) and f_2 ∈ R([b, c]),

and, if this holds,

(0.25)    ∫_a^c f dx = ∫_a^b f_1 dx + ∫_b^c f_2 dx.

Proof. Since any partition of [a, c] has a refinement for which b is an endpoint, we
may as well consider a partition P = P_1 ∪ P_2, where P_1 is a partition of [a, b] and
P_2 is a partition of [b, c]. Then

(0.26)    \underline{I}_P(f) = \underline{I}_{P_1}(f_1) + \underline{I}_{P_2}(f_2),    \overline{I}_P(f) = \overline{I}_{P_1}(f_1) + \overline{I}_{P_2}(f_2),

so

(0.27)    \overline{I}_P(f) − \underline{I}_P(f) = {\overline{I}_{P_1}(f_1) − \underline{I}_{P_1}(f_1)} + {\overline{I}_{P_2}(f_2) − \underline{I}_{P_2}(f_2)}.
Since both terms in braces in (0.27) are ≥ 0, we have equivalence in (0.24). Then
(0.25) follows from (0.26) upon taking sufficiently fine partitions.
Let I = [a, b]. If f ∈ R(I), then f ∈ R([a, x]) for all x ∈ [a, b], and we can
consider the function
(0.28)    g(x) = ∫_a^x f(t) dt.

If a ≤ x_0 ≤ x_1 ≤ b, then

(0.29)    g(x_1) − g(x_0) = ∫_{x_0}^{x_1} f(t) dt,

so, if |f| ≤ M,

(0.30)    |g(x_1) − g(x_0)| ≤ M |x_1 − x_0|.
In other words, if f ∈ R(I), then g is Lipschitz continuous on I.
A function g : (a, b) → R is said to be differentiable at x ∈ (a, b) provided there
exists the limit

(0.31)    lim_{h→0} (1/h)[g(x + h) − g(x)] = g′(x).
When such a limit exists, g′(x), also denoted dg/dx, is called the derivative of g at
x. Clearly g is continuous wherever it is differentiable.
The next result is part of the Fundamental Theorem of Calculus.
Theorem 0.6. If f ∈ C([a, b]), then the function g, defined by (0.28), is differentiable at each point x ∈ (a, b), and
(0.32)    g′(x) = f(x).

Proof. Parallel to (0.29), we have, for h > 0,

(0.33)    (1/h)[g(x + h) − g(x)] = (1/h) ∫_x^{x+h} f(t) dt.
If f is continuous at x, then, for any ε > 0, there exists δ > 0 such that |f (t) −
f (x)| ≤ ε whenever |t − x| ≤ δ. Thus the right side of (0.33) is within ε of f (x)
whenever h ∈ (0, δ]. Thus the desired limit exists as h ↘ 0. A similar argument
treats h ↗ 0.
The next result is the rest of the Fundamental Theorem of Calculus.
Theorem 0.7. If G is differentiable and G′(x) is continuous on [a, b], then

(0.34)    ∫_a^b G′(t) dt = G(b) − G(a).

Proof. Consider the function

(0.35)    g(x) = ∫_a^x G′(t) dt.

We have g ∈ C([a, b]), g(a) = 0, and, by Theorem 0.6,

    g′(x) = G′(x),    ∀ x ∈ (a, b).

Thus f(x) = g(x) − G(x) is continuous on [a, b], and

(0.36)    f′(x) = 0,    ∀ x ∈ (a, b).
We claim that (0.36) implies f is constant on [a, b]. Granted this, since f(a) =
g(a) − G(a) = −G(a), we have f(x) = −G(a) for all x ∈ [a, b], so the integral (0.35)
is equal to G(x) − G(a) for all x ∈ [a, b]. Taking x = b yields (0.34).
The fact that (0.36) implies f is constant on [a, b] is a consequence of the following
result, known as the Mean Value Theorem.
Theorem 0.8. Let f : [a, β] → R be continuous, and assume f is differentiable on
(a, β). Then ∃ ξ ∈ (a, β) such that
(0.37)    f′(ξ) = (f(β) − f(a))/(β − a).
Proof. Replacing f(x) by f̃(x) = f(x) − κ(x − a), where κ is the right side of (0.37),
we can assume without loss of generality that f(a) = f(β). Then we claim that
f′(ξ) = 0 for some ξ ∈ (a, β). Indeed, since [a, β] is compact, f must assume a
maximum and a minimum on [a, β]. If f(a) = f(β), one of these must be assumed
at an interior point, ξ, at which f′ clearly vanishes.
Now, to see that (0.36) implies f is constant on [a, b], if not, ∃ β ∈ (a, b] such
that f(β) ≠ f(a). Then just apply Theorem 0.8 to f on [a, β]. This completes the
proof of Theorem 0.7.
If a function G is differentiable on (a, b), and G′ is continuous on (a, b), we say
G is a C^1 function, and write G ∈ C^1((a, b)). Inductively, we say G ∈ C^k((a, b))
provided G′ ∈ C^{k−1}((a, b)).
An easy consequence of the definition (0.31) of the derivative is that, for any real
constants a, b, and c,

    f(x) = ax^2 + bx + c =⇒ df/dx = 2ax + b.
Now, it is a simple enough step to replace a, b, c by y, z, w, in these formulas. Having
done that, we can regard y, z, and w as variables, along with x:

    F(x, y, z, w) = yx^2 + zx + w.

We can then hold y, z and w fixed (e.g., set y = a, z = b, w = c), and then
differentiate with respect to x; we get

    ∂F/∂x = 2yx + z,

the partial derivative of F with respect to x. Generally, if F is a function of n
variables, x_1, ..., x_n, we set

    (∂F/∂x_j)(x_1, ..., x_n)
        = lim_{h→0} (1/h)[F(x_1, ..., x_{j−1}, x_j + h, x_{j+1}, ..., x_n) − F(x_1, ..., x_{j−1}, x_j, x_{j+1}, ..., x_n)],
where the limit exists. Section one carries on with a further investigation of the
derivative of a function of several variables.
Exercises
1. Let c > 0 and let f : [ac, bc] → R be Riemann integrable. Working directly with
the definition of integral, show that
    ∫_a^b f(cx) dx = (1/c) ∫_{ac}^{bc} f(x) dx.

More generally, show that

    ∫_{a−d/c}^{b−d/c} f(cx + d) dx = (1/c) ∫_{ac}^{bc} f(x) dx.
2. Let f : I × S → R be continuous, where I = [a, b] and S ⊂ R^n. Take ϕ(y) =
∫_I f(x, y) dx. Show that ϕ is continuous on S.
Hint. If f_j : I → R are continuous and |f_1(x) − f_2(x)| ≤ δ on I, then

    | ∫_I f_1 dx − ∫_I f_2 dx | ≤ ℓ(I) δ.
3. With f as in problem 2, suppose g_j : S → R are continuous and a ≤ g_0(y) <
g_1(y) ≤ b. Take ϕ(y) = ∫_{g_0(y)}^{g_1(y)} f(x, y) dx. Show that ϕ is continuous on S.
Hint. Make a change of variables, linear in y, to reduce this to problem 2.
4. Suppose f : (a, b) → (c, d) and g : (c, d) → R are differentiable. Show that
h(x) = g(f(x)) is differentiable and

(0.38)    h′(x) = g′(f(x)) f′(x).
This is the chain rule.
5. If f_1 and f_2 are differentiable on (a, b), show that f_1(x)f_2(x) is differentiable and

(0.39)    (d/dx)(f_1(x) f_2(x)) = f_1′(x) f_2(x) + f_1(x) f_2′(x).

If f_2(x) ≠ 0, ∀ x ∈ (a, b), show that f_1(x)/f_2(x) is differentiable and

(0.40)    (d/dx)(f_1(x)/f_2(x)) = (f_1′(x) f_2(x) − f_1(x) f_2′(x)) / f_2(x)^2.
6. If f : (a, b) → R is differentiable, show that
(0.41)    (d/dx) f(x)^n = n f(x)^{n−1} f′(x),

for all positive integers n. If f(x) ≠ 0 ∀ x ∈ (a, b), show that this holds for all n ∈ Z.
Hint. To get (0.41) for positive n, use induction and apply (0.39).
7. Let ϕ : [a, b] → [A, B] be C^1 on a neighborhood J of [a, b], with ϕ′(x) > 0 for all
x ∈ [a, b]. Assume ϕ(a) = A, ϕ(b) = B. Show that the identity

(0.42)    ∫_A^B f(y) dy = ∫_a^b f(ϕ(t)) ϕ′(t) dt,

for any f ∈ C(J), follows from the chain rule and the fundamental theorem of
for any f ∈ C(J), follows from the chain rule and the fundamental theorem of
calculus.
Hint. Replace b by x, B by ϕ(x), and differentiate.
Note that this result contains that of Problem 1.
Try to establish (0.42) directly by working with the definition of the integral as a
limit of Riemann sums.
8. Show that, if f and g are C^1 on a neighborhood of [a, b], then

(0.43)    ∫_a^b f(s) g′(s) ds = − ∫_a^b f′(s) g(s) ds + [f(b)g(b) − f(a)g(a)].
This transformation of integrals is called ‘integration by parts.’
9. Let f : (−a, a) → R be a C^{j+1} function. Show that, for x ∈ (−a, a),

(0.44)    f(x) = f(0) + f′(0)x + (f″(0)/2)x^2 + ··· + (f^{(j)}(0)/j!)x^j + R_j(x),

where

(0.45)    R_j(x) = (1/j!) ∫_0^x (x − s)^j f^{(j+1)}(s) ds
                 = (x^{j+1}/(j+1)!) ∫_0^1 f^{(j+1)}((1 − t^{1/(j+1)}) x) dt.
This is Taylor’s formula with remainder.
Hint. Use induction. If (0.44)-(0.45) holds for 0 ≤ j ≤ k, show that it holds for
j = k + 1, by showing that
(0.46)    ∫_0^x ((x − s)^k/k!) f^{(k+1)}(s) ds
              = (f^{(k+1)}(0)/(k+1)!) x^{k+1} + ∫_0^x ((x − s)^{k+1}/(k+1)!) f^{(k+2)}(s) ds.
To establish this, use the integration by parts formula (0.43), with f (s) replaced
by f (k+1) (s), and with appropriate g(s).
1. The derivative
Let O be an open subset of R^n, and F : O → R^m a continuous function. We say
F is differentiable at a point x ∈ O, with derivative L, if L : R^n → R^m is a linear
transformation such that, for y ∈ R^n, small,

(1.1)    F(x + y) = F(x) + Ly + R(x, y)

with

(1.2)    ‖R(x, y)‖/‖y‖ → 0 as y → 0.

We denote the derivative at x by DF(x) = L. With respect to the standard bases
of R^n and R^m, DF(x) is simply the matrix of partial derivatives,

(1.3)    DF(x) = (∂F_j/∂x_k),

so that, if v = (v_1, ..., v_n) (regarded as a column vector), then

(1.4)    DF(x)v = ( Σ_k (∂F_1/∂x_k) v_k, ..., Σ_k (∂F_m/∂x_k) v_k ).
It will be shown below that F is differentiable whenever all the partial derivatives
exist and are continuous on O. In such a case we say F is a C^1 function on O. More
generally, F is said to be C^k if all its partial derivatives of order ≤ k exist and are
continuous.
In (1.2), we can use the Euclidean norm on R^n and R^m. This norm is defined by

(1.5)    ‖x‖ = (x_1^2 + ··· + x_n^2)^{1/2}

for x = (x_1, ..., x_n) ∈ R^n. Any other norm would do equally well.
We now derive the chain rule for the derivative. Let F : O → R^m be differentiable at x ∈ O, as above, let U be a neighborhood of z = F(x) in R^m, and let
G : U → R^k be differentiable at z. Consider H = G ∘ F. We have

(1.6)    H(x + y) = G(F(x + y))
                  = G(F(x) + DF(x)y + R(x, y))
                  = G(z) + DG(z)(DF(x)y + R(x, y)) + R_1(x, y)
                  = G(z) + DG(z)DF(x)y + R_2(x, y)

with

    ‖R_2(x, y)‖/‖y‖ → 0 as y → 0.

Thus G ∘ F is differentiable at x, and

(1.7)    D(G ∘ F)(x) = DG(F(x)) · DF(x).
Another useful remark is that, by the fundamental theorem of calculus, applied
to ϕ(t) = F (x + ty),
(1.8)    F(x + y) = F(x) + ∫_0^1 DF(x + ty) y dt,

provided F is C^1. For a typical application, see (6.6).
A closely related application of the fundamental theorem of calculus is that, if
we assume F : O → R^m is differentiable in each variable separately, and that each
∂F/∂x_j is continuous on O, then

(1.9)    F(x + y) = F(x) + Σ_{j=1}^n [F(x + z_j) − F(x + z_{j−1})] = F(x) + Σ_{j=1}^n A_j(x, y) y_j,

         A_j(x, y) = ∫_0^1 (∂F/∂x_j)(x + z_{j−1} + t y_j e_j) dt,

where z_0 = 0, z_j = (y_1, ..., y_j, 0, ..., 0), and {e_j} is the standard basis of R^n. Now
(1.9) implies F is differentiable on O, as we stated below (1.4). As is shown in many
calculus texts, one can use the mean value theorem instead of the fundamental theorem of calculus, and obtain a slightly sharper result. We leave the reconstruction
of this argument to the reader.
We now describe two convenient notations to express higher order derivatives of
a C^k function f : Ω → R, where Ω ⊂ R^n is open. In one, let J be a k-tuple of
integers between 1 and n; J = (j_1, ..., j_k). We set

(1.10)    f^{(J)}(x) = ∂_{j_k} ··· ∂_{j_1} f(x),    ∂_j = ∂/∂x_j.

We set |J| = k, the total order of differentiation. As will be seen in the exercises,
∂_i ∂_j f = ∂_j ∂_i f provided f ∈ C^2(Ω). It follows that, if f ∈ C^k(Ω), then
∂_{j_k} ··· ∂_{j_1} f = ∂_{ℓ_k} ··· ∂_{ℓ_1} f whenever {ℓ_1, ..., ℓ_k} is a permutation of
{j_1, ..., j_k}. Thus, another convenient notation to use is the following. Let α be an n-tuple of non-negative
integers, α = (α_1, ..., α_n). Then we set

(1.11)    f^{(α)}(x) = ∂_1^{α_1} ··· ∂_n^{α_n} f(x),    |α| = α_1 + ··· + α_n.

Note that, if |J| = |α| = k and f ∈ C^k(Ω),

(1.12)    f^{(J)}(x) = f^{(α)}(x),  with α_i = #{ℓ : j_ℓ = i}.
Correspondingly, there are two expressions for monomials in x = (x_1, ..., x_n):

(1.13)    x^J = x_{j_1} ··· x_{j_k},    x^α = x_1^{α_1} ··· x_n^{α_n},

and x^J = x^α provided J and α are related as in (1.12). Both these notations are
called ‘multi-index’ notations.
We now derive Taylor’s formula with remainder for a smooth function F : Ω → R,
making use of these multi-index notations. We will apply the one variable formula
(0.44)-(0.45), i.e.,

(1.14)    ϕ(t) = ϕ(0) + ϕ′(0)t + (1/2)ϕ″(0)t^2 + ··· + (1/k!)ϕ^{(k)}(0)t^k + r_k(t),

with

(1.15)    r_k(t) = (1/k!) ∫_0^t (t − s)^k ϕ^{(k+1)}(s) ds,

given ϕ ∈ C^{k+1}(I), I = (−a, a). Let us assume 0 ∈ Ω, and that the line segment
from 0 to x is contained in Ω. We set ϕ(t) = F(tx), and apply (1.14)-(1.15) with
t = 1. Applying the chain rule, we have

(1.16)    ϕ′(t) = Σ_{j=1}^n ∂_j F(tx) x_j = Σ_{|J|=1} F^{(J)}(tx) x^J.

Differentiating again, we have

(1.17)    ϕ″(t) = Σ_{|J|=1, |K|=1} F^{(J+K)}(tx) x^{J+K} = Σ_{|J|=2} F^{(J)}(tx) x^J,

where, if |J| = k, |K| = ℓ, we take J + K = (j_1, ..., j_k, k_1, ..., k_ℓ). Inductively, we
have

(1.18)    ϕ^{(k)}(t) = Σ_{|J|=k} F^{(J)}(tx) x^J.

Hence, from (1.14) with t = 1,

    F(x) = F(0) + Σ_{|J|=1} F^{(J)}(0) x^J + ··· + (1/k!) Σ_{|J|=k} F^{(J)}(0) x^J + R_k(x),

or, more briefly,

(1.19)    F(x) = Σ_{|J|≤k} (1/|J|!) F^{(J)}(0) x^J + R_k(x),

where

(1.20)    R_k(x) = (1/k!) Σ_{|J|=k+1} ( ∫_0^1 (1 − s)^k F^{(J)}(sx) ds ) x^J.

This gives Taylor’s formula with remainder for F ∈ C^{k+1}(Ω), in the J-multi-index
notation.
We also want to write the formula in the α-multi-index notation. We have

(1.21)    Σ_{|J|=k} F^{(J)}(tx) x^J = Σ_{|α|=k} ν(α) F^{(α)}(tx) x^α,

where

(1.22)    ν(α) = #{J : α = α(J)},

and we define the relation α = α(J) to hold provided the condition (1.12) holds, or
equivalently provided x^J = x^α. Thus ν(α) is uniquely defined by

(1.23)    Σ_{|α|=k} ν(α) x^α = Σ_{|J|=k} x^J = (x_1 + ··· + x_n)^k.

One sees that, if |α| = k, then ν(α) is equal to the product of the number of
combinations of k objects, taken α_1 at a time, times the number of combinations
of k − α_1 objects, taken α_2 at a time, ··· times the number of combinations of
k − (α_1 + ··· + α_{n−1}) objects, taken α_n at a time. Thus

(1.24)    ν(α) = \binom{k}{α_1} \binom{k−α_1}{α_2} ··· \binom{k−α_1−···−α_{n−1}}{α_n} = k!/(α_1! α_2! ··· α_n!).

In other words, for |α| = k,

(1.25)    ν(α) = k!/α!,  where α! = α_1! ··· α_n!.

Thus the Taylor formula (1.19) can be rewritten

(1.26)    F(x) = Σ_{|α|≤k} (1/α!) F^{(α)}(0) x^α + R_k(x),

where

(1.27)    R_k(x) = Σ_{|α|=k+1} ((k+1)/α!) ( ∫_0^1 (1 − s)^k F^{(α)}(sx) ds ) x^α.
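Identity (1.23) lends itself to a machine check; in the sympy sketch below, the
choices n = 3, k = 4, and α = (2, 1, 1) are arbitrary, and the coefficient of x^α in
(x_1 + x_2 + x_3)^4 is compared with k!/α! from (1.25):

    import sympy as sp
    from math import factorial

    # Compare the coefficient of x1^2 x2 x3 in (x1+x2+x3)^4 with 4!/(2!1!1!).
    x1, x2, x3 = sp.symbols('x1 x2 x3')
    poly = sp.expand((x1 + x2 + x3)**4)
    coeff = poly.coeff(x1, 2).coeff(x2, 1).coeff(x3, 1)
    nu = factorial(4) // (factorial(2) * factorial(1) * factorial(1))
    print(coeff, nu)   # both equal 12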
Exercises
1. Let M_{n×n} be the space of complex n × n matrices, det : M_{n×n} → C the
determinant. Show that, if I is the identity matrix,

    D det(I)B = Tr B,

i.e.,

(1.28)    (d/dt) det(I + tB)|_{t=0} = Tr B.
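A quick finite-difference experiment corroborates (1.28); the numpy sketch below
(with a randomly generated 4 × 4 matrix B, an arbitrary choice) is illustrative only:

    import numpy as np

    # Finite-difference check of (1.28): (d/dt) det(I + tB)|_{t=0} = Tr B.
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    t = 1e-6
    lhs = (np.linalg.det(np.eye(4) + t*B) - 1.0) / t   # det(I) = 1
    print(lhs, np.trace(B))                            # agree to ~6 digits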
2. If A(t) = (a_{jk}(t)) is a curve in M_{n×n}, use the expansion of (d/dt) det A(t) as a
sum of n determinants, in which the rows of A(t) are successively differentiated, to
show that, for A ∈ M_{n×n},

(1.29)    D det(A)B = Tr(Cof(A)^t · B),

where Cof(A) is the cofactor matrix of A.
3. Suppose A ∈ M_{n×n} is invertible. Using

    det(A + tB) = (det A) det(I + tA^{−1}B),

show that

    D det(A)B = (det A) Tr(A^{−1}B).

Comparing the result of exercise 2, deduce Cramer’s formula:

(1.30)    (det A) A^{−1} = Cof(A)^t.
4. Identify R^2 and C via z = x + iy. Then multiplication by i on C corresponds to
applying

    J = [ 0  −1 ]
        [ 1   0 ].

Let O ⊂ R^2 be open, f : O → R^2 be C^1. Say f = (u, v). Regard Df(x, y) as a
2 × 2 real matrix. One says f is holomorphic, or complex analytic, provided the
Cauchy-Riemann equations hold:

    ∂u/∂x = ∂v/∂y,    ∂u/∂y = −∂v/∂x.

Show that this is equivalent to the condition

    Df(x, y) J = J Df(x, y).

Generalize to O open in C^m, f : O → C^n.
5. If f is C^1 on a region in R^2 containing [a, b] × {y}, show that

    (d/dy) ∫_a^b f(x, y) dx = ∫_a^b (∂f/∂y)(x, y) dx.

Hint. Show that the left side is equal to

    lim_{h→0} ∫_a^b (1/h) ∫_0^h (∂f/∂y)(x, y + s) ds dx.
6. Suppose F : O → R^m is a C^2 function. Applying the fundamental theorem of
calculus, first to

    G_j(x) = F(x + h e_j) − F(x)

(as a function of h) and then to

    H_{jk}(x) = G_j(x + h e_k) − G_j(x),

where {e_j} is the standard basis of R^n, show that, if x ∈ O and h is small, then

    F(x + h e_j + h e_k) − F(x + h e_k) − F(x + h e_j) + F(x)
        = ∫_0^h ∫_0^h (∂/∂x_k)(∂F/∂x_j)(x + s e_j + t e_k) ds dt.

(Regard this simply as an iterated integral.) Similarly show that this quantity is
equal to

    ∫_0^h ∫_0^h (∂/∂x_j)(∂F/∂x_k)(x + s e_j + t e_k) dt ds.

Taking h → 0, deduce that

(1.31)    (∂/∂x_k)(∂F/∂x_j)(x) = (∂/∂x_j)(∂F/∂x_k)(x).
Hint. Use problem 5. You should be able to avoid use of results on double integrals,
such as established in §11.
Arguments that use the mean value theorem instead of the fundamental theorem
of calculus can be found in many calculus texts.
7. Considering the power series

    f(x) = f(y) + f′(y)(x − y) + ··· + (f^{(j)}(y)/j!)(x − y)^j + R_j(x, y),

show that

    ∂R_j/∂y = −(1/j!) f^{(j+1)}(y)(x − y)^j,    R_j(x, x) = 0.
Use this to re-derive (0.45).
Auxiliary exercises on determinants
If M_{n×n} denotes the space of n × n complex matrices, we want to show that
there is a map

(1.32)    Det : M_{n×n} → C

which is uniquely specified as a function ϑ : M_{n×n} → C satisfying:
(a) ϑ is linear in each column a_j of A,
(b) ϑ(Ã) = −ϑ(A) if Ã is obtained from A by interchanging two columns,
(c) ϑ(I) = 1.
1. Let A = (a_1, ..., a_n), where a_j are column vectors; a_j = (a_{1j}, ..., a_{nj})^t. Show
that, if (a) holds, we have the expansion

(1.33)    Det A = Σ_j a_{j1} Det(e_j, a_2, ..., a_n) = ···
                = Σ_{j_1,...,j_n} a_{j_1 1} ··· a_{j_n n} Det(e_{j_1}, e_{j_2}, ..., e_{j_n}),

where {e_1, ..., e_n} is the standard basis of C^n.
2. Show that, if (b) and (c) also hold, then

(1.34)    Det A = Σ_{σ∈S_n} (sgn σ) a_{σ(1)1} a_{σ(2)2} ··· a_{σ(n)n},

where S_n is the set of permutations of {1, ..., n}, and

(1.35)    sgn σ = Det(e_{σ(1)}, ..., e_{σ(n)}) = ±1.
To define sgn σ, the ‘sign’ of a permutation σ, we note that every permutation σ
can be written as a product of transpositions: σ = τ1 · · · τν , where a transposition
of {1, . . . , n} interchanges two elements and leaves the rest fixed. We say sgn σ = 1
if ν is even and sgn σ = −1 if ν is odd. It is necessary to show that sgn σ is
independent of the choice of such a product representation. (Referring to (1.35)
begs the question until we know that Det is well defined.)
3. Let σ ∈ S_n act on a function of n variables by

(1.36)    (σf)(x_1, ..., x_n) = f(x_{σ(1)}, ..., x_{σ(n)}).

Let P be the polynomial

(1.37)    P(x_1, ..., x_n) = Π_{1≤j<k≤n} (x_j − x_k).

Show that

(1.38)    (σP)(x) = (sgn σ) P(x),

and that this implies that sgn σ is well defined.
4. Deduce that there is a unique determinant satisfying (a)-(c), and that it is given
by (1.34).
5. Show that (1.34) implies

(1.39)    Det A = Det A^t.

Conclude that one can replace columns by rows in the characterization (a)-(c) of
determinants.
Hint. a_{σ(j)j} = a_{ℓτ(ℓ)} with ℓ = σ(j), τ = σ^{−1}. Also, sgn σ = sgn τ.
6. Show that, if (a)-(c) hold (for rows), it follows that
(d) ϑ(Ã) = ϑ(A) if Ã is obtained from A by adding cρ_ℓ to ρ_k, for some c ∈ C,
where ρ_1, ..., ρ_n are the rows of A.
Re-prove the uniqueness of ϑ satisfying (a)-(d) (for rows) by applying row operations
to A until either some row vanishes or A is converted to I.
7. Show that

(1.40)    Det(AB) = (Det A)(Det B).
Hint. For fixed B ∈ Mn×n , compare ϑ1 (A) = Det(AB) and ϑ2 (A) = (Det A)(Det B).
For uniqueness, use an argument from problem 6.
8. Show that

(1.41)    Det [ 1    a_{12}  ···  a_{1n} ]
              [ 0    a_{22}  ···  a_{2n} ]
              [ ⋮      ⋮             ⋮   ]
              [ 0    a_{n2}  ···  a_{nn} ]

            = Det [ 1    0       ···  0      ]
                  [ 0    a_{22}  ···  a_{2n} ]
                  [ ⋮      ⋮             ⋮   ]
                  [ 0    a_{n2}  ···  a_{nn} ]

            = Det A_{11},

where A_{11} = (a_{jk})_{2≤j,k≤n}.
Hint. Do the first identity by the analogue of (d), for columns. Then exploit
uniqueness for Det on M_{(n−1)×(n−1)}.
9. Deduce that Det(e_j, a_2, ..., a_n) = (−1)^{j−1} Det A_{1j}, where A_{kj} is formed by
deleting the k-th column and the j-th row from A.
10. Deduce from the first sum in (1.33) that

(1.42)    Det A = Σ_{j=1}^n (−1)^{j−1} a_{j1} Det A_{1j}.

More generally, for any k ∈ {1, ..., n},

(1.43)    Det A = Σ_{j=1}^n (−1)^{j−k} a_{jk} Det A_{kj}.

This is called an expansion of Det A by minors, down the k-th column.
11. Show that

(1.44)    Det [ a_{11}  a_{12}  ···  a_{1n} ]
              [         a_{22}  ···  a_{2n} ]
              [                  ⋱     ⋮    ]
              [                      a_{nn} ]  = a_{11} a_{22} ··· a_{nn}.

Hint. Use (1.41) and induction.
2. Inverse function and implicit function theorem
The Inverse Function Theorem, together with its corollary, the Implicit Function
Theorem, is a fundamental result in several variable calculus. First we state the
Inverse Function Theorem.
Theorem 2.1. Let F be a C^k map from an open neighborhood Ω of p_0 ∈ R^n to
R^n, with q_0 = F(p_0). Suppose the derivative DF(p_0) is invertible. Then there is
a neighborhood U of p_0 and a neighborhood V of q_0 such that F : U → V is one
to one and onto, and F^{−1} : V → U is a C^k map. (One says F : U → V is a
diffeomorphism.)
Using the chain rule, it is easy to reduce to the case p_0 = q_0 = 0 and DF(p_0) = I,
the identity matrix, so we suppose this has been done. Thus

(2.1)    F(u) = u + R(u),    R(0) = 0,  DR(0) = 0.

For v small, we want to solve

(2.2)    F(u) = v.

This is equivalent to u + R(u) = v, so let

(2.3)    T_v(u) = v − R(u).

Thus solving (2.2) is equivalent to solving

(2.4)    T_v(u) = u.

We look for a fixed point u = K(v) = F^{−1}(v). Also, we want to prove that DK(0) =
I, i.e., that K(v) = v + r(v) with r(v) = o(‖v‖). If we succeed in doing this, it follows
easily that, for general x close to 0,

    DK(x) = (DF(K(x)))^{−1},

and a simple inductive argument shows that K is C^k if F is C^k.
A tool we will use to solve (2.4) is the following general result, known as the
Contraction Mapping Principle.
Theorem 2.2. Let X be a complete metric space, and let T : X → X satisfy

(2.5)    dist(Tx, Ty) ≤ r dist(x, y),

for some r < 1. (We say T is a contraction.) Then T has a unique fixed point x.
For any y_0 ∈ X, T^k y_0 → x as k → ∞.
Proof. Pick y_0 ∈ X and let y_k = T^k y_0. Then dist(y_{k+1}, y_k) ≤ r^k dist(y_1, y_0), so

(2.6)    dist(y_{k+m}, y_k) ≤ dist(y_{k+m}, y_{k+m−1}) + ··· + dist(y_{k+1}, y_k)
                            ≤ (r^k + ··· + r^{k+m−1}) dist(y_1, y_0)
                            ≤ r^k (1 − r)^{−1} dist(y_1, y_0).

It follows that (y_k) is a Cauchy sequence, so it converges; y_k → x. Since T y_k = y_{k+1}
and T is continuous, it follows that Tx = x, i.e., x is a fixed point. Uniqueness
of the fixed point is clear from the estimate dist(Tx, Tx′) ≤ r dist(x, x′), which
implies dist(x, x′) = 0 if x and x′ are fixed points. This proves Theorem 2.2.
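Theorem 2.2 is also an effective computational device. As a small illustration (a
Python sketch; the map Tx = cos x is a standard example, a contraction on [0, 1]
since |T′x| = |sin x| ≤ sin 1 < 1 there, and the first iterate from the starting point
below already lands in [0, 1]):

    from math import cos

    # Iterate T(x) = cos(x); T is a contraction on [0,1] (|T'| <= sin 1 < 1),
    # so T^k y0 converges to the unique fixed point, per Theorem 2.2.
    x = 0.0
    for k in range(60):
        x = cos(x)
    print(x)   # ~0.739085..., the unique solution of cos(x) = x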
Returning to the problem of solving (2.4), we consider

(2.7)    T_v : X_v → X_v

with

(2.8)    X_v = {u ∈ Ω : ‖u − v‖ ≤ A_v},

where we set

(2.9)    A_v = sup_{‖w‖≤2‖v‖} ‖R(w)‖.

We claim that (2.7) holds if ‖v‖ is sufficiently small. To prove this, note that
T_v(u) − v = −R(u), so we need to show that, provided ‖v‖ is small, u ∈ X_v implies
‖R(u)‖ ≤ A_v. But indeed, if u ∈ X_v, then ‖u‖ ≤ ‖v‖ + A_v, which is ≤ 2‖v‖ if ‖v‖
is small, so then ‖R(u)‖ ≤ sup_{‖w‖≤2‖v‖} ‖R(w)‖ = A_v; this establishes (2.7).
Note that if ‖v‖ is small enough, the map (2.7) is a contraction map, so there
exists a unique fixed point u = K(v) ∈ X_v. Note that, since u ∈ X_v,

(2.10)    ‖K(v) − v‖ ≤ A_v = o(‖v‖),

so the inverse function theorem is proved.
Thus if DF is invertible on the domain of F, F is a local diffeomorphism, though
stronger hypotheses are needed to guarantee that F is a global diffeomorphism onto
its range. Here is one result along these lines.
Proposition 2.3. If Ω ⊂ R^n is open and convex, F : Ω → R^n is C^1, and the
symmetric part of DF(u) is positive definite for each u ∈ Ω, then F is one to one
on Ω.
Proof. Suppose F(u_1) = F(u_2), u_2 = u_1 + w. Consider ϕ : [0, 1] → R given by

    ϕ(t) = w · F(u_1 + tw).

Thus ϕ(0) = ϕ(1), so ϕ′(t_0) must vanish for some t_0 ∈ (0, 1), by the mean value
theorem. But ϕ′(t) = w · DF(u_1 + tw)w > 0, if w ≠ 0, by the hypothesis on DF.
This shows F is one to one.
We can obtain the following Implicit Function Theorem as a consequence of the
Inverse Function Theorem.
Theorem 2.4. Suppose U is a neighborhood of x_0 ∈ R^k, V a neighborhood of
z_0 ∈ R^ℓ, and

(2.11)    F : U × V → R^ℓ

is a C^k map. Assume D_z F(x_0, z_0) is invertible; say F(x_0, z_0) = u_0. Then the
equation F(x, z) = u_0 defines z = f(x, u_0) for x near x_0, with f a C^k map.
To prove this, consider H : U × V → R^k × R^ℓ defined by

(2.12)    H(x, z) = (x, F(x, z)).

We have

(2.13)    DH = [ I      0     ]
               [ D_x F  D_z F ].

Thus DH(x_0, z_0) is invertible, so J = H^{−1} exists and is C^k, by the Inverse Function
Theorem. It is clear that J(x, u_0) has the form

(2.14)    J(x, u_0) = (x, f(x, u_0)),

and f is the desired map.
Exercises
1. Suppose F : U → R^n is a C^2 map, p ∈ U, open in R^n, and DF(p) is invertible.
With q = F(p), define a map N on a neighborhood of p by

(2.15)    N(x) = x + DF(x)^{−1}(q − F(x)).

Show that there exist ε > 0 and C < ∞ such that, for 0 ≤ r < ε,

    ‖x − p‖ ≤ r =⇒ ‖N(x) − p‖ ≤ C r^2.

Conclude that, if ‖x_1 − p‖ ≤ r with r < min(ε, 1/2C), then x_{j+1} = N(x_j) defines
a sequence converging very rapidly to p. This is the basis of Newton’s method, for
solving F(p) = q for p.
Hint. Apply F to both sides of (2.15).
2. Applying Newton’s method to f (x) = 1/x, show that you get a fast approximation to division using only addition and multiplication.
Hint. Carry out the calculation of N (x) in this case and notice a ‘miracle.’
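Spelling out the ‘miracle’ (a sketch, with notation as in exercise 1): for F(x) = 1/x
and target q, one computes N(x) = x + DF(x)^{−1}(q − F(x)) = x(2 − qx), so the
iteration uses only addition and multiplication:

    # Newton's method for F(x) = 1/x, solving F(p) = q, i.e. p = 1/q:
    # N(x) = x + DF(x)^{-1}(q - F(x)) = x(2 - q x), division-free.
    q = 7.0
    x = 0.1          # initial guess; needs |1 - q*x| < 1 to converge
    for j in range(6):
        x = x * (2.0 - q * x)
        print(x)     # quadratic convergence to 1/7 = 0.142857...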
3. Identify R^{2n} with C^n via z = x + iy, as in exercise 4 of §1. Let U ⊂ R^{2n} be
open, F : U → R^{2n} be C^1. Assume p ∈ U, DF(p) invertible. If F^{−1} : V → U is
given as in Theorem 2.1, show that F^{−1} is holomorphic provided F is.
4. Let O ⊂ R^n be open. We say a function f ∈ C^∞(O) is real analytic provided
that, for each x_0 ∈ O, we have a convergent power series expansion

(2.16)    f(x) = Σ_{α≥0} (1/α!) f^{(α)}(x_0) (x − x_0)^α,

valid in a neighborhood of x_0. Show that we can let x be complex in (2.16), and
obtain an extension of f to a neighborhood of O in C^n. Show that the extended
function is holomorphic, i.e., satisfies the Cauchy-Riemann equations.
Remark. It can be shown that, conversely, any holomorphic function has a power
series expansion. See §17. For the next exercise, assume this as known.
5. Let O ⊂ R^n be open, p ∈ O, f : O → R^n be real analytic, with Df(p) invertible.
Take f^{−1} : V → U as in Theorem 2.1. Show f^{−1} is real analytic.
Hint. Consider a holomorphic extension F : Ω → C^n of f and apply exercise 3.
3. Fundamental local existence theorem for ODE
The goal of this section is to establish existence of solutions to an ODE
(3.1)    dy/dt = F(t, y),    y(t_0) = y_0.
We will prove the following fundamental result.
Theorem 3.1. Let y_0 ∈ O, an open subset of R^n, I ⊂ R an interval containing
t_0. Suppose F is continuous on I × O and satisfies the following Lipschitz estimate
in y:

(3.2)    ‖F(t, y_1) − F(t, y_2)‖ ≤ L ‖y_1 − y_2‖

for t ∈ I, y_j ∈ O. Then the equation (3.1) has a unique solution on some t-interval
containing t_0.
To begin the proof, we note that the equation (3.1) is equivalent to the integral
equation

(3.3)    y(t) = y_0 + ∫_{t_0}^t F(s, y(s)) ds.

Existence will be established via the Picard iteration method, which is the following.
Guess y_0(t), e.g., y_0(t) = y_0. Then set

(3.4)    y_k(t) = y_0 + ∫_{t_0}^t F(s, y_{k−1}(s)) ds.

We aim to show that, as k → ∞, y_k(t) converges to a (unique) solution of (3.3), at
least for t close enough to t_0.
To do this, we use the Contraction Mapping Principle, established in §2. We
look for a fixed point of T, defined by

(3.5)    (Ty)(t) = y_0 + ∫_{t_0}^t F(s, y(s)) ds.

Let

(3.6)    X = {u ∈ C(J, R^n) : u(t_0) = y_0, sup_{t∈J} ‖u(t) − y_0‖ ≤ K}.

Here J = [t_0 − ε, t_0 + ε], where ε will be chosen, sufficiently small, below. K is
picked so {y : ‖y − y_0‖ ≤ K} is contained in O, and we also suppose J ⊂ I. Then
there exists M such that

(3.7)    sup_{s∈J, ‖y−y_0‖≤K} ‖F(s, y)‖ ≤ M.

Then, provided

(3.8)    ε ≤ K/M,

we have

(3.9)    T : X → X.

Now, using the Lipschitz hypothesis (3.2), we have, for t ∈ J,

(3.10)    ‖(Ty)(t) − (Tz)(t)‖ ≤ | ∫_{t_0}^t L ‖y(s) − z(s)‖ ds | ≤ ε L sup_{s∈J} ‖y(s) − z(s)‖,

assuming y and z belong to X. It follows that T is a contraction on X provided
one has

(3.11)    ε < 1/L,

in addition to the hypotheses above. This proves Theorem 3.1.
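The proof is constructive, and the Picard iterates (3.4) can be computed in practice.
A minimal Python sketch, taking the test problem dy/dt = y, y(0) = 1 (so
F(t, y) = y, with exact solution e^t) and a trapezoid rule for the integral, both
choices being arbitrary:

    import numpy as np

    # Picard iteration (3.4) for dy/dt = y, y(0) = 1 on J = [0, 1/2].
    t = np.linspace(0.0, 0.5, 51)
    dt = t[1] - t[0]
    y = np.ones_like(t)                          # y_0(t) = y0 = 1
    for k in range(8):
        # integral of y over [0, t] by cumulative trapezoids
        integral = np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) / 2) * dt))
        y = 1.0 + integral                       # y_{k+1} = y0 + int_0^t F(s, y_k(s)) ds
    print(np.max(np.abs(y - np.exp(t))))         # small: iterates approach e^t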
In view of the lower bound on the length of the interval J on which the existence
theorem works, it is easy to show that the only way a solution can fail to be globally
defined, i.e., to exist for all t ∈ I, is for y(t) to ‘explode to infinity’ by leaving every
compact set K ⊂ O, as t → t1 , for some t1 ∈ I.
Often one wants to deal with a higher order ODE. There is a standard method
of reducing an nth order ODE

(3.12)    y^{(n)}(t) = f(t, y, y′, ..., y^{(n−1)})

to a first order system. One sets u = (u_0, ..., u_{n−1}) with

(3.13)    u_0 = y,    u_j = y^{(j)},

and then

(3.14)    du/dt = (u_1, ..., u_{n−1}, f(t, u_0, ..., u_{n−1})) = g(t, u).

If y takes values in R^k, then u takes values in R^{kn}.
If the system (3.1) is non-autonomous, i.e., if F explicitly depends on t, it can be
converted to an autonomous system (one with no explicit t-dependence) as follows.
Set z = (t, y). We then have

(3.15)    dz/dt = (1, dy/dt) = (1, F(z)) = G(z).
Sometimes this process destroys important features of the original system (3.1).
For example, if (3.1) is linear, (3.15) might be nonlinear. Nevertheless, the trick of
converting (3.1) to (3.15) has some uses.
Many systems of ODE are difficult to solve explicitly. There is one very basic
class of ODE which can be solved explicitly, in terms of integrals, namely the single
first order linear ODE:
(3.16)    dy/dt = a(t)y + b(t),    y(0) = y_0,

where a(t) and b(t) are continuous real or complex valued functions. Set

(3.17)    A(t) = ∫_0^t a(s) ds.

Then (3.16) can be written as

(3.18)    e^{A(t)} (d/dt)(e^{−A(t)} y) = b(t).

See exercise 5 below for a comment on the exponential function. From (3.18) we
get

(3.19)    y(t) = e^{A(t)} y_0 + e^{A(t)} ∫_0^t e^{−A(s)} b(s) ds.
Compare formulas (4.42) and (5.8), in subsequent sections.
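Formula (3.19) can also be verified symbolically; in the sympy sketch below, the
sample coefficients a(t) = 2t and b(t) = 1 are arbitrary:

    import sympy as sp

    # Check that (3.19) solves (3.16) for a(t) = 2t, b(t) = 1.
    t, s, y0 = sp.symbols('t s y0')
    a = 2*t
    A = sp.integrate(a.subs(t, s), (s, 0, t))              # A(t) = int_0^t a(s) ds
    y = sp.exp(A)*y0 + sp.exp(A)*sp.integrate(sp.exp(-A.subs(t, s)), (s, 0, t))
    print(sp.simplify(sp.diff(y, t) - (a*y + 1)))          # 0
    print(y.subs(t, 0))                                    # y0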
Exercises
1. Solve the initial value problem
    dy/dt = y^2,    y(0) = a,
given a ∈ R. On what t-interval is the solution defined?
2. Under the hypotheses of Theorem 3.1, if y solves (3.1) for t ∈ [T_0, T_1], and
y(t) ∈ K, compact in O, for all such t, prove that y(t) extends to a solution for
t ∈ [S_0, S_1], with S_0 < T_0 and S_1 > T_1, as stated below (3.11).
3. Let M be a compact smooth surface in R^n. Suppose F : R^n → R^n is a smooth
map (vector field), such that, for each x ∈ M, F(x) is tangent to M, i.e., the line
γ_x(t) = x + tF(x) is tangent to M at x, at t = 0. Show that, if x ∈ M, then the
initial value problem

    dy/dt = F(y),    y(0) = x

has a solution for all t ∈ R, and y(t) ∈ M for all t.
Hint. Locally, straighten out M to be a linear subspace of Rn , to which F is tangent.
Use uniqueness. Material in §2 helps do this local straightening.
Reconsider this problem after reading §7.
4. Show that the initial value problem

    dx/dt = −x(x^2 + y^2),    dy/dt = −y(x^2 + y^2),    x(0) = x_0,  y(0) = y_0

has a solution for all t ≥ 0, but not for all t < 0, unless (x_0, y_0) = (0, 0).
5. Let a ∈ R. Show that the unique solution to u′(t) = au(t), u(0) = 1 is given by

(3.20)    u(t) = Σ_{j=0}^∞ (a^j/j!) t^j.

We denote this function by u(t) = e^{at}, the exponential function. We also write
exp(t) = e^t.
Hint. Integrate the series term by term and use the fundamental theorem of
calculus.
Alternative. Setting u_0(t) = 1, and using the Picard iteration method (3.4) to
define the sequence u_k(t), show that u_k(t) = Σ_{j=0}^k a^j t^j/j!.
6. Show that, for all s, t ∈ R,

(3.21)    e^{a(s+t)} = e^{as} e^{at}.

Hint. Show that u_1(t) = e^{a(s+t)} and u_2(t) = e^{as} e^{at} solve the same initial value
problem.
7. Show that exp : R → (0, ∞) is a diffeomorphism. We denote the inverse by

    log : (0, ∞) → R.

Show that v(x) = log x solves the ODE dv/dx = 1/x, v(1) = 0, and deduce that

(3.22)    ∫_1^x (1/y) dy = log x.

8. Let a ∈ R, i = √−1. Show that the unique solution to f′(t) = iaf, f(0) = 1 is
given by

(3.23)    f(t) = Σ_{j=0}^∞ ((ia)^j/j!) t^j.

We denote this function by f(t) = e^{iat}. Show that, for all t ∈ R,

(3.24)    e^{ia(s+t)} = e^{ias} e^{iat}.
9. Write

(3.25)    e^{it} = Σ_{j=0}^∞ ((−1)^j/(2j)!) t^{2j} + i Σ_{j=0}^∞ ((−1)^j/(2j+1)!) t^{2j+1} = u(t) + iv(t).

Show that

    u′(t) = −v(t),    v′(t) = u(t).

We denote these functions by u(t) = cos t, v(t) = sin t. The identity

(3.26)    e^{it} = cos t + i sin t

is called Euler’s formula.
Auxiliary exercises on trigonometric functions
We use the definitions of sin t and cos t given in problem 9 of the last exercise
set.
1. Use (3.24) to derive the identities
(3.27)    sin(x + y) = sin x cos y + cos x sin y,
          cos(x + y) = cos x cos y − sin x sin y.
2. Use (3.26)-(3.27) to show that

(3.28)    sin^2 t + cos^2 t = 1,    cos^2 t = (1/2)(1 + cos 2t).
3. Show that

    γ(t) = (cos t, sin t)

is a map of R onto the unit circle S^1 ⊂ R^2 with non-vanishing derivative, and, as t
increases, γ(t) moves monotonically, counterclockwise.
We define π to be the smallest number t_1 ∈ (0, ∞) such that γ(t_1) = (−1, 0), so

    cos π = −1,    sin π = 0.

Show that 2π is the smallest number t_2 ∈ (0, ∞) such that γ(t_2) = (1, 0), so

    cos 2π = 1,    sin 2π = 0.

Show that

    cos(t + 2π) = cos t,    sin(t + 2π) = sin t,
    cos(t + π) = −cos t,    sin(t + π) = −sin t.

Show that γ(π/2) = (0, 1), and that

    cos(t + π/2) = −sin t,    sin(t + π/2) = cos t.
4. Show that sin : (−π/2, π/2) → (−1, 1) is a diffeomorphism. We denote its inverse
by

    arcsin : (−1, 1) → (−π/2, π/2).

Show that u(t) = arcsin t solves the ODE

    du/dt = 1/√(1 − t^2),    u(0) = 0.

Hint. Apply the chain rule to sin u(t) = t.
Deduce that, for t ∈ (−1, 1),

(3.29)    arcsin t = ∫_0^t dx/√(1 − x^2).

5. Show that

    e^{πi/3} = 1/2 + (√3/2) i,    e^{πi/6} = √3/2 + (1/2) i.

Hint. First compute (1/2 + (√3/2) i)^3 and use problem 3. Then compute
e^{πi/2} e^{−πi/3}.
For intuition behind these formulas, look at Fig.3.1.
6. Show that sin(π/6) = 1/2, and hence that

    π/6 = ∫_0^{1/2} dx/√(1 − x^2) = Σ_{n=0}^∞ (a_n/(2n+1)) (1/2)^{2n+1},

where a_0 = 1, a_{n+1} = ((2n+1)/(2n+2)) a_n. Show that

    | π/6 − Σ_{n=0}^k (a_n/(2n+1)) (1/2)^{2n+1} | < 4^{−k}/(3(2k+3)).

Using a calculator, sum the series over 0 ≤ n ≤ 20, and verify that

    π ≈ 3.141592653589···
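The suggested computation takes only a few lines; a Python sketch of the sum over
0 ≤ n ≤ 20:

    # pi/6 = sum_{n>=0} (a_n/(2n+1)) (1/2)^{2n+1}, a_0 = 1,
    # a_{n+1} = ((2n+1)/(2n+2)) a_n; sum the first 21 terms.
    a, total = 1.0, 0.0
    for n in range(21):
        total += a / (2*n + 1) * 0.5**(2*n + 1)
        a *= (2*n + 1) / (2*n + 2)
    print(6 * total)   # 3.141592653589...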
7. For x ≠ (k + 1/2)π, k ∈ Z, set

    tan x = sin x / cos x.

Show that 1 + tan^2 x = 1/cos^2 x. Show that w(x) = tan x satisfies the ODE

    dw/dx = 1 + w^2,    w(0) = 0.
8. Show that tan : (−π/2, π/2) → R is a diffeomorphism. Denote the inverse by

    arctan : R → (−π/2, π/2).

Show that

(3.30)    arctan y = ∫_0^y dx/(1 + x^2).
4. Constant coefficient linear systems; exponentiation of matrices
Let A be an n × n matrix, real or complex. We consider the linear ODE
(4.1)    dy/dt = Ay;    y(0) = y_0.

In analogy to the scalar case, we can produce the solution in the form

(4.2)    y(t) = e^{tA} y_0,

where we define

(4.3)    e^{tA} = Σ_{k=0}^∞ (t^k/k!) A^k.
We will establish estimates implying the convergence of this infinite series for
all real t, indeed for all complex t. Then term by term differentiation is valid, and
gives (4.1). To discuss convergence of (4.3), we need the notion of the norm of a
matrix.
If u = (u_1, ..., u_n) belongs to R^n or to C^n, set, as in (1.5),

(4.4)    ‖u‖ = (|u_1|^2 + ··· + |u_n|^2)^{1/2}.

Then, if A is an n × n matrix, set

(4.5)    ‖A‖ = sup{‖Au‖ : ‖u‖ ≤ 1}.
The norm (4.4) possesses the following properties:

(4.6)    ‖u‖ ≥ 0,    ‖u‖ = 0 if and only if u = 0,

(4.7)    ‖cu‖ = |c| ‖u‖, for real or complex c,

(4.8)    ‖u + v‖ ≤ ‖u‖ + ‖v‖.

The last, known as the triangle inequality, follows from Cauchy’s inequality:

(4.9)    |(u, v)| ≤ ‖u‖ · ‖v‖,

where the inner product is (u, v) = u_1 \bar{v}_1 + ··· + u_n \bar{v}_n. To deduce (4.8) from (4.9),
just square both sides of (4.8). To prove (4.9), use (u − v, u − v) ≥ 0 to get

    2 Re(u, v) ≤ ‖u‖^2 + ‖v‖^2.

Then replace u by e^{iθ} u to deduce

    2|(u, v)| ≤ ‖u‖^2 + ‖v‖^2.

Next, replace u by tu and v by t^{−1}v, to get

    2|(u, v)| ≤ t^2 ‖u‖^2 + t^{−2} ‖v‖^2,

for any t > 0. Picking t so that t^2 = ‖v‖/‖u‖, we have Cauchy’s inequality (4.9).
Granted (4.6)-(4.8), we easily get

(4.10)    ‖A‖ ≥ 0,    ‖cA‖ = |c| ‖A‖,    ‖A + B‖ ≤ ‖A‖ + ‖B‖.

Also, ‖A‖ = 0 if and only if A = 0. The fact that ‖A‖ is the smallest constant K
such that ‖Au‖ ≤ K‖u‖ gives

(4.11)    ‖AB‖ ≤ ‖A‖ · ‖B‖.

In particular,

(4.12)    ‖A^k‖ ≤ ‖A‖^k.
This makes it easy to check convergence of the power series (4.3).
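Indeed, by (4.12) the k-th term of (4.3) has norm at most (|t| ‖A‖)^k/k!, so the
series converges absolutely, and its partial sums are usable numerically. A numpy
sketch (the 2 × 2 matrix below, which generates rotations, is an arbitrary test case)
summing the series and checking the instance e^A = e^{A/2} e^{A/2} of (4.13):

    import numpy as np

    def expm_series(A, terms=30):
        # Partial sum of (4.3): sum_{k < terms} A^k / k!.
        E = np.eye(len(A))
        term = np.eye(len(A))
        for k in range(1, terms):
            term = term @ A / k
            E = E + term
        return E

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])    # e^{tA} = rotation by angle t
    E1 = expm_series(A)
    E2 = expm_series(0.5 * A) @ expm_series(0.5 * A)
    print(np.max(np.abs(E1 - E2)))             # ~1e-16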
Power series manipulations can be used to establish the identity

(4.13)    e^{sA} e^{tA} = e^{(s+t)A}.

Another way to prove this is the following. Regard t as fixed; denote the left side
of (4.13) as X(s) and the right side as Y(s). Then differentiation with respect to s
gives, respectively,

(4.14)    X′(s) = AX(s),  X(0) = e^{tA};    Y′(s) = AY(s),  Y(0) = e^{tA},

so uniqueness of solutions to the ODE implies X(s) = Y(s) for all s. We note that
(4.13) is a special case of the following.
Proposition 4.1. e^{t(A+B)} = e^{tA} e^{tB} for all t, if and only if A and B commute.
Proof. Let

(4.15)    Y(t) = e^{t(A+B)},    Z(t) = e^{tA} e^{tB}.

Note that Y(0) = Z(0) = I, so it suffices to show that Y(t) and Z(t) satisfy the
same ODE, to deduce that they coincide. Clearly

(4.16)    Y′(t) = (A + B)Y(t).

Meanwhile

(4.17)    Z′(t) = Ae^{tA} e^{tB} + e^{tA} B e^{tB}.

Thus we get the equation (4.16) for Z(t) provided we know that

(4.18)    e^{tA} B = B e^{tA} if AB = BA.

This follows from the power series expansion for e^{tA}, together with the fact that

(4.19)    A^k B = B A^k for all k ≥ 0, if AB = BA.

For the converse, if Y(t) = Z(t) for all t, then e^{tA} B = B e^{tA}, by (4.17), and hence,
taking the t-derivative, e^{tA} AB = B A e^{tA}; setting t = 0 gives AB = BA.
If A is in diagonal form

(4.20)    A = [ a_1          ]
              [      ⋱       ]
              [          a_n ],

then clearly

(4.21)    e^{tA} = [ e^{ta_1}               ]
                   [           ⋱            ]
                   [               e^{ta_n} ].

The following result makes it useful to diagonalize A in order to compute e^{tA}.
Proposition 4.2. If K is an invertible matrix and B = KAK^{−1}, then

(4.22)    e^{tB} = K e^{tA} K^{−1}.

Proof. This follows from the power series expansion (4.3), given the observation
that

(4.23)    B^k = K A^k K^{−1}.
In view of (4.20)-(4.22), it is convenient to record a few standard results about
eigenvalues and eigenvectors here. Let A be an n × n matrix over F, F = R or C.
An eigenvector of A is a nonzero u ∈ F^n such that

(4.24)    Au = λu

for some λ ∈ F. Such an eigenvector exists if and only if A − λI : F^n → F^n is not
invertible, i.e., if and only if

(4.25)    det(A − λI) = 0.
Now (4.25) is a polynomial equation, so it always has a complex root. This proves
the following.
Proposition 4.3. Given an n × n matrix A, there exists at least one (complex)
eigenvector u.
Of course, if A is real and we know there is a real root of (4.25) (e.g., if n is
odd) then a real eigenvector exists. One important class of matrices guaranteed to
have real eigenvalues is the class of self adjoint matrices. The adjoint of an n × n
complex matrix is specified by the identity (Au, v) = (u, A∗ v).
Proposition 4.4. If A = A∗ , then all eigenvalues of A are real.
Proof. Au = λu implies
(4.26)    λ‖u‖^2 = (λu, u) = (Au, u) = (u, Au) = (u, λu) = \bar{λ}‖u‖^2.

Hence λ = \bar{λ}, if u ≠ 0.
We now establish the following important result.
Theorem 4.5. If A = A*, then there is an orthonormal basis of C^n consisting of
eigenvectors of A.
Proof. Let u_1 be one unit eigenvector; Au_1 = λu_1. Existence is guaranteed by
Proposition 4.3. Let V = (u_1)^⊥ be the orthogonal complement of the linear span
of u_1. Then dim V is n − 1 and

(4.27)    A : V → V, if A = A*.

The result follows by induction on n.
Corollary 4.6. If A = A^t is a real symmetric matrix, then there is an orthonormal
basis of R^n consisting of eigenvectors of A.
Proof. By Proposition 4.4 and the remarks following Proposition 4.3, there is one
unit eigenvector u_1 ∈ R^n. The rest of the proof is as above.
The proof of the last four results rests on the fact that every nonconstant polynomial has a complex root. This is the Fundamental Theorem of Algebra. Two
proofs of this result are given in §17 and §20. An alternative approach to Proposition 4.3 when A = A∗ , yielding Proposition 4.4-Corollary 4.6, is given in one of the
exercises below.
Given an ODE in upper triangular form,

(4.28)    dy/dt = [ a_{11}  *       ···  *      ]
                  [         a_{22}  ···  *      ] y,
                  [                  ⋱    ⋮     ]
                  [                      a_{nn} ]

you can solve the last ODE for y_n, as it is just dy_n/dt = a_{nn} y_n. Then you get a
single inhomogeneous ODE for y_{n−1}, which can be solved as demonstrated in
(3.16)-(3.19), and you can continue inductively to solve. Thus it is often useful to be able
to put an n × n matrix A in upper triangular form, with respect to a convenient
choice of basis. We will establish two results along these lines. The first is due to
Schur.
Theorem 4.7. For any n × n matrix A, there is an orthonormal basis u_1, ..., u_n
of C^n with respect to which A is in upper triangular form.
This result is equivalent to:
Proposition 4.8. For any A, there is a sequence of vector spaces V_j of dimension
j, contained in C^n, with

(4.29)    V_n ⊃ V_{n−1} ⊃ ··· ⊃ V_1

and

(4.30)    A : V_j → V_j.

To see the equivalence, if we are granted (4.29)-(4.30), pick u_n ⊥ V_{n−1}, a unit
vector, then pick u_{n−1} ∈ V_{n−1} such that u_{n−1} ⊥ V_{n−2}, and so forth. Meanwhile,
Proposition 4.8 is a simple inductive consequence of the following result.
Lemma 4.9. For any matrix A acting on V_n, there is a linear subspace V_{n−1}, of
codimension 1, such that A : V_{n−1} → V_{n−1}.
Proof. Use Proposition 4.3, applied to A*. There is a vector v_1 such that A*v_1 =
λv_1. Let V_{n−1} = (v_1)^⊥. This completes the proof of the Lemma, hence of Theorem
4.7.
Let’s look more closely at what you can say about solutions to an ODE which
has been put in the form (4.28). As mentioned, we can obtain y_j inductively by
solving nonhomogeneous scalar ODEs

(4.31)    dy_j/dt = a_{jj} y_j + b_j(t),

where b_j(t) is a linear combination of y_{j+1}(t), ..., y_n(t), and the formula (3.19)
applies, with A(t) = a_{jj} t. We have y_n(t) = Ce^{a_{nn}t}, so b_{n−1}(t) is a multiple of e^{a_{nn}t}. If
a_{n−1,n−1} ≠ a_{nn}, y_{n−1}(t) will be a linear combination of e^{a_{nn}t} and e^{a_{n−1,n−1}t}, but if
a_{n−1,n−1} = a_{nn}, y_{n−1}(t) may be a linear combination of e^{a_{nn}t} and te^{a_{nn}t}. Further
integration will involve ∫ p(t)e^{αt} dt, where p(t) is a polynomial. That no other sort
of function will arise is guaranteed by the following result.
Lemma 4.10. If p(t) ∈ P_n, the space of polynomials of degree ≤ n, and α ≠ 0,
then

(4.32)    ∫ p(t)e^{αt} dt = q(t)e^{αt} + C

for some q(t) ∈ P_n.
Proof. The map p = Tq defined by (d/dt)(q(t)e^{αt}) = p(t)e^{αt} is a map on P_n; in
fact we have

(4.33)    Tq(t) = αq(t) + q′(t).

It suffices to show T : P_n → P_n is invertible. But D = d/dt is nilpotent on
P_n; D^{n+1} = 0. Hence

    T^{−1} = α^{−1}(I + α^{−1}D)^{−1} = α^{−1}(I − α^{−1}D + ··· + α^{−n}(−D)^n).

Note that this gives a neat formula for the integral (4.32). For example,

(4.34)    ∫ t^n e^{−t} dt = −(t^n + nt^{n−1} + ··· + n!)e^{−t} + C
                        = −n!(1 + t + t^2/2 + ··· + t^n/n!)e^{−t} + C.

This could also be established by integration by parts and induction. Of course,
when α = 0 in (4.32) the result is different; q(t) is a polynomial of degree n + 1.
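For instance, (4.34) with n = 3 reads ∫ t^3 e^{−t} dt = −(t^3 + 3t^2 + 6t + 6)e^{−t} + C,
which a sympy sketch confirms:

    import sympy as sp

    # Check (4.34) for n = 3.
    t = sp.symbols('t')
    lhs = sp.integrate(t**3 * sp.exp(-t), t)
    rhs = -(t**3 + 3*t**2 + 6*t + 6) * sp.exp(-t)
    print(sp.simplify(lhs - rhs))   # 0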
Now the implication for the solution to (4.28) is that all the components of y(t)
are products of polynomials and exponentials. By Theorem 4.7, we can draw the
same conclusion about the solution to dy/dt = Ay for any n × n matrix A. We can
formally state the result as follows.
Proposition 4.11. For any n × n matrix A,

(4.35)    e^{tA} v = Σ_j e^{λ_j t} v_j(t),

where {λ_j} is the set of eigenvalues of A and v_j(t) are C^n-valued polynomials. All
the v_j(t) are constant when A is diagonalizable.
To see that the λ_j are the eigenvalues of A, note that, in the upper triangular case, only the exponentials e^{a_{jj} t} arise, and in that case the eigenvalues are precisely the diagonal elements.
If we let E_λ denote the space of Cn-valued functions of the form V(t) = e^{λt} v(t), where v(t) is a Cn-valued polynomial, then E_λ is invariant under the action of both d/dt and A, hence of d/dt − A. Hence, if a sum V_1(t) + · · · + V_k(t), V_j(t) ∈ E_{λ_j} (with the λ_j distinct), is annihilated by d/dt − A, so is each term in this sum.

Therefore, if (4.35) is a sum over the distinct eigenvalues λ_j of A, it follows that each term e^{λ_j t} v_j(t) is annihilated by d/dt − A, or equivalently is of the form e^{tA} w_j, where w_j = v_j(0). This leads to the following conclusion. Set

(4.36)   G_λ = {v ∈ Cn : e^{tA} v = e^{tλ} v(t), v(t) polynomial}.

Then Cn has a direct sum decomposition

(4.37)   Cn = G_{λ_1} + · · · + G_{λ_k},

where λ_1, . . . , λ_k are the distinct eigenvalues of A. Furthermore, each G_{λ_j} is invariant under A, and

(4.38)   A_j = A|_{G_{λ_j}} has exactly one eigenvalue, λ_j.

This last statement holds because, when v ∈ G_{λ_j}, e^{tA} v involves only the exponential e^{λ_j t}. We say G_{λ_j} is the generalized eigenspace of A, with eigenvalue λ_j. Of course, G_{λ_j} contains ker(A − λ_j I). Now B_j = A_j − λ_j I has only 0 as an eigenvalue.
It is subject to the following result.
Lemma 4.12. If B : Ck → Ck has only 0 as an eigenvalue, then B is nilpotent, i.e.,

(4.39)   B^m = 0 for some m.

Proof. Let W_j = B^j(Ck); then Ck ⊃ W_1 ⊃ W_2 ⊃ · · · is a sequence of finite dimensional vector spaces, each invariant under B. This sequence must stabilize, so for some m, B : W_m → W_m bijectively. If W_m ≠ 0, B has a nonzero eigenvalue.
We next discuss the famous Jordan normal form of a complex n × n matrix. The
result is the following.
Theorem 4.13. If A is an n × n matrix, then there is a basis of Cn with respect to which A becomes a direct sum of blocks of the form

(4.40)   \begin{pmatrix} λ_j & 1 & & \\ & λ_j & \ddots & \\ & & \ddots & 1 \\ & & & λ_j \end{pmatrix}.
In light of the decomposition (4.37) and Lemma 4.12, it suffices to establish the Jordan normal form for a nilpotent matrix B. Given v_0 ∈ Ck, let m be the smallest integer such that B^m v_0 = 0; m ≤ k. If m = k, then {v_0, Bv_0, . . . , B^{m−1} v_0} gives a basis of Ck putting B in Jordan normal form. We then say v_0 is a cyclic vector for B, and Ck is generated by v_0. We call {v_0, . . . , B^{m−1} v_0} a string.

We will have a Jordan normal form precisely if we can write Ck as a direct sum of cyclic subspaces. We establish that this can be done by induction on the dimension.

Thus, inductively, we can suppose W_1 = B(Ck) is a direct sum of cyclic subspaces, so W_1 has a basis that is a union of strings, say a union of d strings {v_j, Bv_j, . . . , B^{ℓ_j} v_j}, 1 ≤ j ≤ d. In this case, N_1 = ker B ∩ W_1 has dimension d, and the vectors B^{ℓ_j} v_j, 1 ≤ j ≤ d, span N_1. Furthermore, each v_j has the form v_j = Bw_j for some w_j ∈ Ck.

Now dim ker B = k − r ≥ d, where r = dim W_1. Let {z_1, . . . , z_{k−r−d}} span a subspace of ker B complementary to N_1. Then the strings {w_j, v_j = Bw_j, . . . , B^{ℓ_j} v_j}, 1 ≤ j ≤ d, and {z_1}, . . . , {z_{k−r−d}} generate cyclic subspaces whose direct sum is Ck, giving the Jordan normal form.
The argument above is part of an argument of Filippov. In fact, Filippov’s proof
contains a further clever twist, enabling one to prove Theorem 4.13 without using
the decomposition (4.37). However, since we got this decomposition almost for free
as a byproduct of the ODE analysis in Proposition 4.11, I decided to make use of
it. See Strang [Str] for Filippov’s proof.
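For concrete matrices a Jordan basis can be computed symbolically. A minimal sketch (Python with sympy, not part of the text; the matrices are our arbitrary choices, built so that the answer is known in advance):

    # Recovering the block structure (4.40) with sympy's jordan_form.
    import sympy as sp

    # conjugate a known Jordan matrix by an invertible P0
    J0 = sp.Matrix([[2, 1, 0],
                    [0, 2, 0],
                    [0, 0, 3]])
    P0 = sp.Matrix([[1, 1, 0],
                    [0, 1, 1],
                    [1, 0, 1]])
    A = P0 * J0 * P0.inv()

    P, J = A.jordan_form()   # returns P, J with A = P J P^{-1}
    print(J)                 # a 2x2 block for eigenvalue 2, a 1x1 block for 3
    assert sp.simplify(P * J * P.inv()) == A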
We have seen how constructing e^{tA} solves the equation (4.1). We can also use it to solve an inhomogeneous equation, of the form

(4.41)   dy/dt = Ay + b(t),   y(0) = y_0.

Direct calculation shows that the solution is given by

(4.42)   y(t) = e^{tA} y_0 + ∫_0^t e^{(t−s)A} b(s) ds.
Note how this partially generalizes the formula (3.19). This formula is a special
case of Duhamel’s Principle, which will be discussed further in §5.
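Formula (4.42) is easy to test numerically. In the following sketch (Python with numpy/scipy, not part of the text; A, b, and y_0 are arbitrary choices), the integral is approximated by a trapezoidal sum and compared against a direct ODE solve:

    # Numerical sanity check of Duhamel's formula (4.42).
    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp

    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    b = lambda t: np.array([np.sin(t), 1.0])
    y0 = np.array([1.0, 0.0])
    T = 2.0

    # reference: integrate dy/dt = Ay + b(t) directly
    sol = solve_ivp(lambda t, y: A @ y + b(t), (0, T), y0, rtol=1e-10, atol=1e-12)

    # Duhamel: y(T) = e^{TA} y0 + int_0^T e^{(T-s)A} b(s) ds (trapezoid rule)
    s = np.linspace(0.0, T, 4001)
    vals = np.array([expm((T - si) * A) @ b(si) for si in s])
    h = s[1] - s[0]
    integral = h * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)
    y_duhamel = expm(T * A) @ y0 + integral

    print(sol.y[:, -1], y_duhamel)   # the two vectors agree to ~1e-6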
Exercises
1. In addition to the 'operator norm' ‖A‖ of an n × n matrix, defined by (4.5), we consider the 'Hilbert-Schmidt norm' ‖A‖_{HS}, defined by

‖A‖²_{HS} = Σ_{j,k} |a_{jk}|²,

if A = (a_{jk}). Show that

‖A‖ ≤ ‖A‖_{HS}.

Hint. If r_1, . . . , r_n are the rows of A, then for u ∈ Cn, Au has entries r_j · u, 1 ≤ j ≤ n. Use Cauchy's inequality (4.9) to estimate |r_j · u|².

Show also that

Σ_j |a_{jk}|² ≤ ‖A‖² for each k,

and hence

‖A‖²_{HS} ≤ n‖A‖².

Hint. ‖A‖ ≥ ‖Ae_k‖ for each standard basis vector e_k.
2. Show that, in analogy with (4.11), we have

‖AB‖_{HS} ≤ ‖A‖_{HS} ‖B‖_{HS}.

Indeed, show that

‖AB‖_{HS} ≤ ‖A‖ · ‖B‖_{HS},

the first factor on the right being the operator norm ‖A‖.
3. Let X be an n × n matrix. Show that

det e^X = e^{Tr X}.

Hint. Use a normal form.
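A quick numerical check of this identity (not a proof; the matrix is an arbitrary random choice) can be run as follows:

    # det(e^X) = e^{Tr X}, checked numerically.
    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(1)
    X = rng.standard_normal((5, 5))
    print(np.linalg.det(expm(X)), np.exp(np.trace(X)))  # agree to rounding error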
In problems 4-5 we will deal with the set SO(n) of real n × n matrices T such that T^t T = I and Det T > 0 (hence Det T = 1). Also, let Skew(n) denote the set of real n × n skew-symmetric matrices, and let M(n) denote the set of all real n × n matrices. Set Exp(X) = e^X.
4. Show that
Exp : Skew(n) −→ SO(n).
Show that this map is onto.
5. Show that the map Exp : Skew(n) → M(n) satisfies

D Exp(0)A = A,

for A ∈ Skew(n), so D Exp(0) is the inclusion ι : Skew(n) ↪ M(n).
6. Let A : Rn → Rn be symmetric. Let Q(x) = (Ax, x). Let v_1 ∈ S^{n−1} = {x ∈ Rn : |x| = 1} be a point where Q|_{S^{n−1}} assumes a maximum. Show that v_1 is an eigenvector of A.

Hint. Show that ∇Q(v_1) is parallel to ∇E(v_1), where E(x) = (x, x).

Use this result to give an alternative proof of Corollary 4.6.

Extend this argument to establish Theorem 4.5.
5. Variable coefficient linear systems of ODE. Duhamel’s principle
Let A(t) be a continuous n × n matrix valued function of t ∈ I. We consider the
general linear homogeneous ODE
(5.1)
dy/dt = A(t)y,
y(0) = y0 .
The general theory of §3 gives local solutions. We claim the solutions here exist for all t ∈ I. This follows from the discussion after the proof of Theorem 3.1, together with the following estimate on the solution to (5.1).
Proposition 5.1. If ‖A(t)‖ ≤ M for t ∈ I, then the solution to (5.1) satisfies

(5.2)   ‖y(t)‖ ≤ e^{M|t|} ‖y_0‖.

It suffices to prove this for t ≥ 0. Then z(t) = e^{−Mt} y(t) satisfies

(5.3)   dz/dt = C(t)z,   z(0) = y_0,

with C(t) = A(t) − M. Hence C(t) satisfies

(5.4)   Re (C(t)u, u) ≤ 0 for all u ∈ Cn.

Thus (5.2) is a consequence of the next energy estimate, which is of independent interest.

Proposition 5.2. If z solves (5.3) and if (5.4) holds for C(t), then

‖z(t)‖ ≤ ‖z(0)‖ for t ≥ 0.

Proof. We have

(5.5)   (d/dt)‖z(t)‖² = (z′(t), z(t)) + (z(t), z′(t)) = 2 Re (C(t)z(t), z(t)) ≤ 0.
Thus we have global existence for (5.1). There is a matrix valued function S(t, s) such that the unique solution to (5.1) satisfies

(5.6)   y(t) = S(t, s)y(s).

Using this solution operator, we can treat the inhomogeneous equation

(5.7)   dy/dt = A(t)y + b(t),   y(0) = y_0.

Indeed, direct calculation yields

(5.8)   y(t) = S(t, 0)y_0 + ∫_0^t S(t, s)b(s) ds.

This identity is known as Duhamel's Principle.
We next prove an identity which might be called the 'noncommutative fundamental theorem of calculus.'
Proposition 5.3. If A(t) is a continuous matrix function and S(t, 0) is defined as above, then

(5.9)   S(t, 0) = lim_{n→∞} e^{(t/n)A((n−1)t/n)} · · · e^{(t/n)A(0)},

where there are n factors on the right.

Proof. To prove this at t = T, divide the interval [0, T] into n equal parts. Set y = S(t, 0)y_0 and define z_n(t) by z_n(0) = y_0, and

(5.10)   dz_n/dt = A(jT/n)z_n for t ∈ (jT/n, (j + 1)T/n),
requiring continuity across each endpoint of these intervals. We see that

(5.11)   dz_n/dt = A(t)z_n + R_n(t),

with

(5.12)   ‖R_n(t)‖ ≤ δ_n ‖z_n(t)‖,   δ_n → 0 as n → ∞.

Meanwhile we see that, on [0, T], ‖z_n(t)‖ ≤ C_T ‖y_0‖. We want to compare z_n(t) and y(t). We have

(5.13)   (d/dt)(z_n − y) = A(t)(z_n − y) + R_n(t);   z_n(0) − y(0) = 0.

Hence Duhamel's principle gives

(5.14)   z_n(t) − y(t) = ∫_0^t S(t, s)R_n(s) ds,

and since we have an a priori bound ‖S(t, s)‖ ≤ K for |s|, |t| ≤ T, we get

(5.15)   ‖z_n(t) − y(t)‖ ≤ K T C_T δ_n ‖y_0‖ → 0 as n → ∞, |t| ≤ T.

In particular, z_n(T) → y(T) as n → ∞. Since z_n(T) is given by the right side of (5.9) with t = T, this proves (5.9).
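The convergence in (5.9) can be watched numerically. In the sketch below (Python with numpy/scipy, not part of the text; the coefficient matrix A(t) is an arbitrary choice), S(T, 0) is computed column-by-column from (5.1), and the n-fold product approaches it as n grows:

    # Numerical illustration of the product formula (5.9).
    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp

    A = lambda t: np.array([[0.0, 1.0], [-(1.0 + t), 0.0]])
    T = 1.0

    def S_column(e):
        # a column of S(T, 0): solve (5.1) with y(0) = e
        return solve_ivp(lambda t, y: A(t) @ y, (0, T), e,
                         rtol=1e-10, atol=1e-12).y[:, -1]

    S = np.column_stack([S_column(e) for e in np.eye(2)])

    for n in (10, 100, 1000):
        P = np.eye(2)
        for j in range(n):   # builds e^{(T/n)A((n-1)T/n)} ... e^{(T/n)A(0)}
            P = expm((T / n) * A(j * T / n)) @ P
        print(n, np.linalg.norm(P - S))   # error decays roughly like 1/n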
Exercises.
1. Let A(t) and X(t) be n × n matrices, satisfying

dX/dt = A(t)X.

We form the Wronskian W(t) = det X(t). Show that W satisfies the ODE

dW/dt = a(t)W,   a(t) = Tr A(t).

Hint. Use exercise 2 of §1 to write dW/dt = Tr(Cof(X)^t dX/dt), and use Cramer's formula, (det X)X^{−1} = Cof(X)^t.

Alternative. Write X(t + h) = e^{hA(t)} X(t) + O(h²) and use exercise 3 of §4 to write det e^{hA(t)} = e^{ha(t)}, hence W(t + h) = e^{ha(t)} W(t) + O(h²).
2. Let u(t) = ‖y(t)‖², for a solution y to (5.1). Show that

(5.16)   du/dt ≤ M(t)u(t),

provided ‖A(t)‖ ≤ ½M(t). Such a differential inequality implies the integral inequality

(5.17)   u(t) ≤ A + ∫_0^t M(s)u(s) ds,   t ≥ 0,

with A = u(0). The following is a Gronwall inequality: if (5.17) holds for a real valued function u, then, provided M(s) ≥ 0, we have, for t ≥ 0,

(5.18)   u(t) ≤ A e^{N(t)},   N(t) = ∫_0^t M(s) ds.

Prove this. Note that the quantity dominating u(t) in (5.18) is equal to U, solving U(0) = A, dU/dt = M(t)U(t).
3. Generalize the Gronwall inequality of exercise 2 as follows. Let U be a real valued solution to

(5.19)   dU/dt = F(t, U),   U(0) = A,

and let u satisfy the integral inequality

(5.20)   u(t) ≤ A + ∫_0^t F(s, u(s)) ds.

Then prove that

(5.21)   u(t) ≤ U(t), for t ≥ 0,

provided ∂F/∂u ≥ 0. Show that this continues to hold if we replace (5.19) by

(5.19′)   U(t) ≥ A + ∫_0^t F(s, U(s)) ds.

Hint. Set v = u − U. Then (5.19′)-(5.20) imply

v(t) ≤ ∫_0^t [F(s, u(s)) − F(s, U(s))] ds = ∫_0^t M(s)v(s) ds,

where

M(s) = ∫_0^1 F_u(s, τu(s) + (1 − τ)U(s)) dτ.

Thus (5.17) applies, with A = 0.
4. Let x(t) be a smooth curve in R3; assume it is parametrized by arclength, so T(t) = x′(t) has unit length; T(t) · T(t) = 1. Differentiating, we have T′(t) ⊥ T(t). The curvature is defined to be κ(t) = ‖T′(t)‖. If κ(t) ≠ 0, we set N(t) = T′/‖T′‖, so

T′ = κN,

and N is a unit vector orthogonal to T. We define B(t) by

(5.22)   B = T × N.

Note that (T, N, B) form an orthonormal basis of R3 for each t, and

(5.23)   T = N × B,   N = B × T.

By (5.22) we have B′ = T × N′. Deduce that B′ is orthogonal to both T and B, hence parallel to N. We set

B′ = −τN,

for a smooth function τ(t), called the torsion.
5. From N′ = B′ × T + B × T′ and the formulas for T′ and B′ above, deduce the following system, called the Frenet-Serret formula:

(5.24)   T′ = κN,
         N′ = −κT + τB,
         B′ = −τN.

Form the 3 × 3 matrix

(5.25)   A(t) = \begin{pmatrix} 0 & −κ & 0 \\ κ & 0 & −τ \\ 0 & τ & 0 \end{pmatrix},

and deduce that the 3 × 3 matrix F(t) whose columns are T, N, B:

F = (T, N, B)

satisfies the ODE

dF/dt = F A(t).
6. Derive the following converse to the Frenet-Serret formula. Let T (0), N (0), B(0)
be an orthonormal set in R3 , such that B(0) = T (0) × N (0), let κ(t) and τ (t) be
given smooth functions, and solve the system (5.24). Show that there is a unique
curve x(t) such that x(0) = 0 and T (t), N (t), B(t) are associated to x(t) by the
construction in problem 4, so in particular the curve has curvature κ(t) and torsion
τ (t).
Hint. To prove that (5.22)-(5.23) hold for all t, consider the next exercise.
7. Let A(t) be a smooth n × n real matrix function which is skew adjoint for all
t (of which (5.25) is an example). Suppose F (t) is a real n × n matrix function
satisfying
dF/dt = F A(t).
If F (0) is an orthogonal matrix, show that F (t) is orthogonal for all t.
Hint. Set J(t) = F(t)^* F(t). Show that J(t) and J_0(t) ≡ I both solve the initial value problem

dJ/dt = [J, A(t)],   J(0) = I.
8. Let U_1 = T, U_2 = N, U_3 = B, and set ω(t) = τT + κB. Show that (5.24) is equivalent to U_j′ = ω × U_j, 1 ≤ j ≤ 3.
9. Suppose τ and κ are constant. Show that ω is constant, so T(t) satisfies the constant coefficient ODE

T′(t) = ω × T(t).

Note that ω · T(0) = τ. Show that, after a translation and rotation, x(t) takes the form

γ(t) = (κλ^{−2} cos λt, κλ^{−2} sin λt, τλ^{−1} t),   λ² = κ² + τ².
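Exercises 6 and 9 can be illustrated numerically: integrating (5.24) with constant κ, τ and then integrating T recovers the helix above. A sketch (Python with numpy/scipy, not part of the text; the constants and the adapted initial frame are our choices):

    # Frenet-Serret with constant curvature/torsion reproduces a helix.
    import numpy as np
    from scipy.integrate import solve_ivp, cumulative_trapezoid

    kappa, tau = 2.0, 1.0
    lam = np.hypot(kappa, tau)               # lambda^2 = kappa^2 + tau^2

    def frenet(t, u):
        T, N, B = u[0:3], u[3:6], u[6:9]
        return np.concatenate([kappa * N, -kappa * T + tau * B, -tau * N])

    # frame adapted to the helix gamma above at t = 0
    a = kappa / lam**2
    x0 = np.array([a, 0.0, 0.0])
    T0 = np.array([0.0, kappa / lam, tau / lam])
    N0 = np.array([-1.0, 0.0, 0.0])
    B0 = np.cross(T0, N0)

    ts = np.linspace(0, 10, 500)
    sol = solve_ivp(frenet, (0, 10), np.concatenate([T0, N0, B0]),
                    t_eval=ts, rtol=1e-10)

    # x(t) = x(0) + int_0^t T(s) ds, by cumulative trapezoidal sums
    x = x0 + cumulative_trapezoid(sol.y[0:3].T, ts, axis=0, initial=0)

    helix = np.column_stack([a * np.cos(lam * ts), a * np.sin(lam * ts),
                             (tau / lam) * ts])
    print(np.max(np.abs(x - helix)))   # small (dominated by quadrature step)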
Auxiliary exercises on the cross product
1. If u, v ∈ R3, show that the formula

(5.26)   w · (u × v) = Det \begin{pmatrix} w_1 & u_1 & v_1 \\ w_2 & u_2 & v_2 \\ w_3 & u_3 & v_3 \end{pmatrix}

for u × v = κ(u, v) defines uniquely a bilinear map κ : R3 × R3 → R3. Show that it satisfies

i × j = k,   j × k = i,   k × i = j,

where {i, j, k} is the standard basis of R3.
2. We say T ∈ SO(3) provided that T is a real 3 × 3 matrix satisfying T^t T = I and Det T > 0 (hence Det T = 1). Show that

(5.27)   T ∈ SO(3) =⇒ Tu × Tv = T(u × v).

Hint. Multiply the 3 × 3 matrix in problem 1 on the left by T.
3. Show that, if θ is the angle between u and v in R3 , then
(5.28)
|u × v| = |u| |v| | sin θ|.
Hint. Check this for u = i, v = ai + bj, and use problem 2 to show this suffices.
4. Show that κ : R3 → Skew(3), given by

(5.29)   κ(y_1, y_2, y_3) = \begin{pmatrix} 0 & −y_3 & y_2 \\ y_3 & 0 & −y_1 \\ −y_2 & y_1 & 0 \end{pmatrix},

satisfies

(5.30)   Kx = y × x,   K = κ(y).

Show that, with [A, B] = AB − BA,

(5.31)   κ(x × y) = [κ(x), κ(y)],   Tr(κ(x)κ(y)^t) = 2x · y.
6. Dependence of solutions on initial data, and other parameters
We consider how a solution to an ODE depends on the initial conditions. Consider a nonlinear system
(6.1)
dy/dt = F (y),
y(0) = x.
As noted in the introduction, we can consider an autonomous system, such as
(6.1), without loss of generality. Suppose F : U → Rn is smooth, U ⊂ Rn open; for
simplicity we assume U is convex. Say y = y(t, x). We want to examine smoothness
in x.
Note that formally differentiating (6.1) with respect to x suggests that W =
Dx y(t, x) satisfies an ODE called the linearization of (6.1):
(6.2)
dW/dt = DF (y)W,
W (0) = I.
In other words, w(t, x) = Dx y(t, x)w0 satisfies
(6.3)
dw/dt = DF (y)w,
w(0) = w0 .
To justify this, we want to compare w(t) and

(6.4)   z(t) = y_1(t) − y(t) = y(t, x + w_0) − y(t, x).

It would be convenient to show that z satisfies an ODE similar to (6.3). Indeed, z(t) satisfies

(6.5)   dz/dt = F(y_1) − F(y) = Φ(y_1, y)z,   z(0) = w_0,
where

(6.6)   Φ(y_1, y) = ∫_0^1 DF(τy_1 + (1 − τ)y) dτ.
If we assume

(6.7)   ‖DF(u)‖ ≤ M for u ∈ U,

then the solution operator S(t, 0) of the linear ODE d/dt − B(t), with B(t) = Φ(y_1(t), y(t)), satisfies a bound ‖S(t, 0)‖ ≤ e^{|t|M} as long as y(t), y_1(t) ∈ U. Hence

(6.8)   ‖y_1(t) − y(t)‖ ≤ e^{|t|M} ‖w_0‖.

This establishes that y(t, x) is Lipschitz in x.
To continue, since Φ(y, y) = DF(y), we rewrite (6.5) as

(6.9)   dz/dt = Φ(y + z, y)z = DF(y)z + R(y, z),   z(0) = w_0,

where

(6.10)   F ∈ C¹(U) =⇒ ‖R(y, z)‖ = o(‖z‖) = o(‖w_0‖).
Now, comparing the ODE (6.9) with (6.3), we have

(6.11)   (d/dt)(z − w) = DF(y)(z − w) + R(y, z),   (z − w)(0) = 0.

Then Duhamel's principle yields

(6.12)   z(t) − w(t) = ∫_0^t S(t, s)R(y(s), z(s)) ds,

so by the bound ‖S(t, s)‖ ≤ e^{|t−s|M} and (6.10) we have

(6.13)   z(t) − w(t) = o(‖w_0‖).
This is precisely what is required to show that y(t, x) is differentiable with respect
to x, with derivative W = Dx y(t, x) satisfying (6.2). We state our first result.
Proposition 6.1. If F ∈ C 1 (U ), and if solutions to (6.1) exist for t ∈ (−T0 , T1 ),
then for each such t, y(t, x) is C 1 in x, with derivative Dx y(t, x) = W (t, x) satisfying (6.2).
So far we have shown that y(t, x) is both Lipschitz and differentiable in x, but
the continuity of W (t, x) in x follows easily by comparing the ODEs of the form
(6.2) for W (t, x) and W (t, x + w0 ), in the spirit of the analysis of (6.11).
If F possesses further smoothness, we can obtain higher differentiability of y(t, x)
in x by the following trick. Couple (6.1) and (6.2), to get an ODE for (y, W ) :
(6.14)
dy/dt = F (y)
dW/dt = DF (y)W
with initial condition
(6.15)
y(0) = x,
W (0) = I.
We can reiterate the argument above, getting results on D_x(y, W), i.e., on D_x² y(t, x), and continue, proving:
Proposition 6.2. If F ∈ C k (U ), then y(t, x) is C k in x.
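The coupled system (6.14)-(6.15) is also the standard way to compute D_x y(t, x) in practice. A minimal sketch (Python with numpy/scipy, not part of the text; F is an arbitrary choice, here a pendulum-type field), comparing W with a centered-difference Jacobian:

    # Integrating y together with its variational matrix W = D_x y.
    import numpy as np
    from scipy.integrate import solve_ivp

    F = lambda y: np.array([y[1], -np.sin(y[0])])
    DF = lambda y: np.array([[0.0, 1.0], [-np.cos(y[0]), 0.0]])

    def coupled(t, u):
        y, W = u[:2], u[2:].reshape(2, 2)
        return np.concatenate([F(y), (DF(y) @ W).ravel()])

    def flow(x, T=3.0):
        u0 = np.concatenate([x, np.eye(2).ravel()])
        u = solve_ivp(coupled, (0, T), u0, rtol=1e-10, atol=1e-12).y[:, -1]
        return u[:2], u[2:].reshape(2, 2)

    x = np.array([1.0, 0.0])
    y, W = flow(x)

    # finite-difference check of W = D_x y(T, x)
    eps = 1e-6
    Wfd = np.column_stack([(flow(x + eps * e)[0] - flow(x - eps * e)[0]) / (2 * eps)
                           for e in np.eye(2)])
    print(np.max(np.abs(W - Wfd)))   # ~1e-9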
Similarly we can consider dependence of the solution to a system of the form
(6.16)
dy/dt = F (τ, y),
y(0) = x
on a parameter τ, assuming F is smooth jointly in τ, y. This result can be deduced
from the previous one by the following trick: consider the ODE
(6.17)
dy/dt = F (z, y), dz/dt = 0;
y(0) = x, z(0) = τ.
Thus we get smoothness of y(t, τ, x) in (τ, x). As one special case, let F (τ, y) =
τ F (y). In this case y(t0 , τ, x) = y(τ t0 , 1, x), so we can improve the conclusion of
Proposition 6.2 to:
(6.18)
F ∈ C k (U ) =⇒ y ∈ C k jointly in (t, x).
It is also true that if F is analytic, then one has analytic dependence of solutions
on parameters, especially on t, so that power series techniques work in that case.
One approach to the proof of this is given in exercises below.
Exercises
1. Let Ω be open in R2n, identified with Cn, via z = x + iy. Let X : Ω → R2n have components X = (a_1, . . . , a_n, b_1, . . . , b_n), where a_j(x, y) and b_j(x, y) are real valued. Denote the solution to du/dt = X(u), u(0) = z, by u(t, z). Assume f_j(z) = a_j(z) + ib_j(z) is holomorphic in z, which is to say its derivative commutes with J, acting on R2n = Cn as multiplication by i. Show that, for each t, u(t, z) is holomorphic in z, i.e., D_z u(t, z) commutes with J.

Hint. Use the linearized equation (6.2) to show that K(t) = [W(t), J] satisfies the ODE

dK/dt = DX(u(t, z))K,   K(0) = 0.
2. If O ⊂ Rn is open and F : O → Rn is real analytic, show that the solution
y(t, x) to (6.1) is real analytic in x.
Hint. With F = (a1 , . . . , an ), take holomorphic extensions fj (z) of aj (x) and use
exercise 1.
Using the trick leading to (6.18), show that y(t, x) is real analytic jointly in (t, x).
7. Flows and vector fields
Let U ⊂ Rn be open. A vector field on U is a smooth map
X : U −→ Rn .
(7.1)
Consider the corresponding ODE
(7.2)
dy/dt = X(y),
y(0) = x,
with x ∈ U. A curve y(t) solving (7.2) is called an integral curve of the vector field
X. It is also called an orbit. For fixed t, write

(7.3)   y = y(t, x) = F^t_X(x).

The locally defined F^t_X, mapping (a subdomain of) U to U, is called the flow generated by the vector field X.
The vector field X defines a differential operator on scalar functions, as follows:

(7.4)   L_X f(x) = lim_{h→0} h^{−1}[f(F^h_X x) − f(x)] = (d/dt)f(F^t_X x)|_{t=0}.
We also use the common notation
(7.5)
LX f (x) = Xf,
that is, we apply X to f as a first order differential operator.
Note that, if we apply the chain rule to (7.4) and use (7.2), we have

(7.6)   L_X f(x) = X(x) · ∇f(x) = Σ_j a_j(x) ∂f/∂x_j,

if X = Σ_j a_j(x)e_j, with {e_j} the standard basis of Rn. In particular, using the notation (7.5), we have

(7.7)   a_j(x) = Xx_j.

In the notation (7.5),

(7.8)   X = Σ_j a_j(x) ∂/∂x_j.
We note that X is a derivation, i.e., a map on C ∞ (U ), linear over R, satisfying
(7.9)
X(f g) = (Xf )g + f (Xg).
Conversely, any derivation on C ∞ (U ) defines a vector field, i.e., has the form (7.8),
as we now show.
Proposition 7.1. If X is a derivation on C∞(U), then X has the form (7.8).

Proof. Set a_j(x) = Xx_j, X# = Σ_j a_j(x) ∂/∂x_j, and Y = X − X#. Then Y is a derivation satisfying Yx_j = 0 for each j; we aim to show that Yf = 0 for all f. Note that, whenever Y is a derivation,

1 · 1 = 1 ⇒ Y(1) = 2Y(1) ⇒ Y(1) = 0,

i.e., Y annihilates constants. Thus in this case Y annihilates all polynomials of degree ≤ 1.

Now we show Yf(p) = 0 for all p ∈ U. Without loss of generality, we can suppose p = 0, the origin. Then, by (1.8), we can take b_j(x) = ∫_0^1 (∂_j f)(tx) dt, and write

f(x) = f(0) + Σ_j b_j(x)x_j.

It immediately follows that Yf vanishes at 0, so the proposition is proved.
A fundamental fact about vector fields is that they can be 'straightened out' near points where they do not vanish. To see this, suppose a smooth vector field X is given on U such that, for a certain p ∈ U, X(p) ≠ 0. Then near p there is a hypersurface M which is nowhere tangent to X. We can choose coordinates near p so that p is the origin and M is given by {x_n = 0}. Thus we can identify a point x′ ∈ Rn−1 near the origin with x′ ∈ M. We can define a map

(7.10)   F : M × (−t_0, t_0) −→ U

by

(7.11)   F(x′, t) = F^t_X(x′).

This is C∞ and has surjective derivative, so by the Inverse Function Theorem it is a local diffeomorphism. This defines a new coordinate system near p, in which the flow generated by X has the form

(7.12)   F^s_X(x′, t) = (x′, t + s).

If we denote the new coordinates by (u_1, . . . , u_n), we see that the following result is established.

Theorem 7.2. If X is a smooth vector field on U with X(p) ≠ 0, then there exists a coordinate system (u_1, . . . , u_n) centered at p (so u_j(p) = 0) with respect to which

(7.13)   X = ∂/∂u_n.
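Flows are easy to compute numerically, and the group property F^{t+s}_X = F^t_X ◦ F^s_X, implicit in (7.12), can be observed directly. A sketch (Python with numpy/scipy, not part of the text; the planar field X is an arbitrary choice):

    # Computing the flow F^t_X of (7.2) and checking the group property.
    import numpy as np
    from scipy.integrate import solve_ivp

    X = lambda p: np.array([-p[1], p[0] * (1.0 - p[0]**2)])  # arbitrary field

    def flow(t, x):
        # F^t_X(x): solve (7.2) from x over time t
        return solve_ivp(lambda s, p: X(p), (0, t), x,
                         rtol=1e-11, atol=1e-12).y[:, -1]

    x = np.array([0.5, 0.2])
    t, s = 0.7, 1.1
    print(flow(t + s, x))
    print(flow(t, flow(s, x)))   # matches the line above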
We now make some elementary comments on vector fields in the plane. Here,
the object is to find the integral curves of
(7.14)
f (x, y)∂/∂x + g(x, y)∂/∂y,
that is, to solve

(7.15)   x′ = f(x, y),   y′ = g(x, y).

This implies

(7.16)   dy/dx = g(x, y)/f(x, y),

or, written in differential form notation (which will be discussed more thoroughly in §13),

(7.17)   g(x, y)dx − f(x, y)dy = 0.
Suppose we manage to find an explicit solution to (7.16):
(7.18)
y = ϕ(x),
x = ψ(y).
Often, it is not feasible to do so, but beginning chapters of elementary texts on
ODE often give methods for doing so in some cases. Then the original system
becomes
(7.19)   x′ = f(x, ϕ(x)),   y′ = g(ψ(y), y).

In other words, we have reduced ourselves to integrating vector fields on the line. We have

(7.20)   ∫ [f(x, ϕ(x))]^{−1} dx = t + C_1,   ∫ [g(ψ(y), y)]^{−1} dy = t + C_2.

If (7.18) can be explicitly achieved, it may be that one integral or the other in (7.20) is easier to evaluate. With either x or y solved, as a function of t, the other is determined by (7.18).
One case when the planar vector field can be integrated explicitly (locally) is
when there is a smooth u, with nonvanishing gradient, explicitly given, such that
(7.21)
Xu = 0
where X is the vector field (7.14). One says u is a conserved quantity. In such a
case, let w be any smooth function such that (u, w) form a local coordinate system.
In this coordinate system,

(7.22)   X = b(u, w) ∂/∂w,

by (7.7), so

(7.23)   Xv = 1,

with

(7.24)   v(u, w) = ∫_{w_0}^{w} b(u, s)^{−1} ds,

and the local coordinate system (u, v) linearizes X.
Exercises
1. Suppose h(x, y) is homogeneous of degree 0, i.e., h(rx, ry) = h(x, y), so h(x, y) =
k(x/y). Show that the ODE
dy/dx = h(x, y)
is changed to a separable ODE for u = u(x), if u = y/x.
2. Using exercise 1, discuss constructing the integral curves of a vector field
X = f (x, y)∂/∂x + g(x, y)∂/∂y
when f (x, y) and g(x, y) are homogeneous of degree a, i.e.,
f (rx, ry) = ra f (x, y) for r > 0,
and similarly for g.
3. Describe the integral curves of
(x2 + y 2 )∂/∂x + xy∂/∂y.
4. Describe the integral curves of
A(x, y)∂/∂x + B(x, y)∂/∂y
when A(x, y) = a1 x + a2 y + a3 , B(x, y) = b1 x + b2 y + b3 .
5. Let X = f (x, y)(∂/∂x) + g(x, y)(∂/∂y) be a vector field on a disc Ω ⊂ R2 .
Suppose
div X = 0, i.e., ∂f /∂x + ∂g/∂y = 0. Show that a function u(x, y) such that
∂u/∂x = g,
∂u/∂y = −f
is given by a line integral. Show that Xu = 0 and hence integrate X.
Reconsider this problem after reading §14.
6. Find the integral curves of the vector field

X = (2xy + y² + 1) ∂/∂x + (x² + 1 − y²) ∂/∂y.

7. Show that

div(e^v X) = e^v (div X + Xv).

Hence, if X is a vector field on Ω ⊂ R2 as in exercise 5, show that you can integrate X if you can construct a function v(x, y) such that Xv = −div X. Construct such v if either

(div X)/f(x, y) = ϕ(x) or (div X)/g(x, y) = ψ(y).
8. Find the integral curves of the vector field

X = 2xy ∂/∂x + (x² + y² − 1) ∂/∂y.
Let X be a vector field on Rn, with a critical point at 0, i.e., X(0) = 0. Suppose that, for x ∈ Rn near 0,

(7.25)   X(x) = Ax + R(x),   ‖R(x)‖ = O(‖x‖²),

where A is an n × n matrix. We call Ax the linearization of X at 0.
9. Suppose all the eigenvalues of A have negative real part. Construct a quadratic polynomial Q : Rn → [0, ∞), such that Q(0) = 0, (∂²Q/∂x_j∂x_k) is positive definite, and such that, for any integral curve x(t) of X as in (7.25),

(d/dt)Q(x(t)) < 0 for t ≥ 0,

provided x(0) = x_0 (≠ 0) is close enough to 0. Deduce that, for small enough C, if ‖x_0‖ ≤ C, then x(t) exists for all t ≥ 0 and x(t) → 0 as t → ∞.

Hint. Take Q(x) = ⟨x, x⟩, using exercise 10 below.
10. Let A be an n × n matrix, all of whose eigenvalues λ_j have negative real part. Show there exists a Hermitian inner product ⟨ , ⟩ on Cn such that Re ⟨Au, u⟩ < 0 for nonzero u ∈ Cn.

Hint. Put A in Jordan normal form, but with ε's instead of 1's above the diagonal, where ε is small compared with |Re λ_j|.
8. Lie brackets
If F : V → W is a diffeomorphism between two open domains in Rn, and Y is a vector field on W, we define a vector field F#Y on V so that

(8.1)   F^t_{F#Y} = F^{−1} ◦ F^t_Y ◦ F,

or equivalently, by the chain rule,

(8.2)   (F#Y)(x) = D(F^{−1})(F(x)) Y(F(x)).
In particular, if U ⊂ Rn is open and X is a vector field on U, defining a flow F^t, then for a vector field Y, F^t_# Y is defined on most of U, for |t| small, and we can define the Lie derivative:

(8.3)   L_X Y = lim_{h→0} h^{−1}(F^h_# Y − Y) = (d/dt)F^t_# Y|_{t=0},

as a vector field on U.
Another natural construction is the operator-theoretic bracket:
(8.4)
[X, Y ] = XY − Y X,
where the vector fields X and Y are regarded as first order differential operators
on C ∞ (U ). One verifies that (8.4) defines a derivation on C ∞ (U ), hence a vector
field on U. The basic elementary fact about the Lie bracket is the following.
Theorem 8.1. If X and Y are smooth vector fields, then
(8.5)
LX Y = [X, Y ].
Proof. Let us first verify the identity in the special case

X = ∂/∂x_1,   Y = Σ_j b_j(x) ∂/∂x_j.

Then F^t_# Y = Σ_j b_j(x + te_1) ∂/∂x_j. Hence, in this case, L_X Y = Σ_j (∂b_j/∂x_1) ∂/∂x_j, and a straightforward calculation shows this is also the formula for [X, Y], in this case.

Now we verify (8.5) in general, at any point x_0 ∈ U. First, if X is nonvanishing at x_0, we can choose a local coordinate system so the example above gives the identity. By continuity, we get the identity (8.5) on the closure of the set of points x_0 where X(x_0) ≠ 0. Finally, if x_0 has a neighborhood where X = 0, clearly L_X Y = 0 and [X, Y] = 0 at x_0. This completes the proof.
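The bracket (8.4) is mechanical to compute: by (7.7), the j-th coefficient of [X, Y] is X(b_j) − Y(a_j) when X = Σ a_j ∂/∂x_j and Y = Σ b_j ∂/∂x_j. A symbolic sketch (Python with sympy, not part of the text; the two fields are arbitrary choices, the first borrowed from exercise 2 below):

    # Computing the Lie bracket of two planar vector fields symbolically.
    import sympy as sp

    x, y = sp.symbols('x y')

    def apply_field(c, f):
        # the field c[0] d/dx + c[1] d/dy acting on a function f
        return c[0] * sp.diff(f, x) + c[1] * sp.diff(f, y)

    def bracket(Xc, Yc):
        # j-th coefficient of [X, Y] is X(Yc_j) - Y(Xc_j)
        return [sp.simplify(apply_field(Xc, Yc[j]) - apply_field(Yc, Xc[j]))
                for j in range(2)]

    Xc = [x + y**2, y]   # X = (x + y^2) d/dx + y d/dy
    Yc = [x, y]          # Y = x d/dx + y d/dy (an Euler field)
    print(bracket(Xc, Yc))   # prints [-y**2, 0], i.e. [X, Y] = -y^2 d/dx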
Corollary 8.2. If X and Y are smooth vector fields on U, then

(8.6)   (d/dt)F^t_{X#} Y = F^t_{X#} [X, Y]

for all t.

Proof. Since locally F^{t+s}_X = F^t_X F^s_X, we have the same identity for F^{t+s}_{X#}, which yields (8.6) upon taking the s-derivative.
We make some further comments about cases when one can explicitly integrate
a vector field X in the plane, exploiting ‘symmetries’ which might be apparent. In
fact, suppose one has in hand a vector field Y such that
(8.7)
[X, Y ] = 0.
By (8.6), this implies F^t_{Y#} X = X for all t; this connection will be pursued further in the next section. Suppose one has an explicit hold on the flow generated by Y, so one can produce explicit local coordinates (u, v) with respect to which

(8.8)   Y = ∂/∂u.
In this coordinate system, write X = a(u, v)∂/∂u+b(u, v)∂/∂v. The condition (8.7)
implies ∂a/∂u = 0 = ∂b/∂u, so in fact we have
(8.9)
X = a(v)∂/∂u + b(v)∂/∂v.
Integral curves of (8.9) satisfy

(8.10)   u′ = a(v),   v′ = b(v),

and can be found explicitly in terms of integrals; one has

(8.11)   ∫ b(v)^{−1} dv = t + C_1,

and then

(8.12)   u = ∫ a(v(t)) dt + C_2.
More generally than (8.7), we can suppose that, for some constant c,

(8.13)   [X, Y] = cX,

which by (8.6) is the same as

(8.14)   F^t_{Y#} X = e^{−ct} X.
An example would be

(8.15)   X = f(x, y) ∂/∂x + g(x, y) ∂/∂y,

where f and g satisfy 'homogeneity' conditions of the form

(8.16)   f(r^a x, r^b y) = r^{a−c} f(x, y),   g(r^a x, r^b y) = r^{b−c} g(x, y),

for r > 0; in such a case one can take explicitly

(8.17)   F^t_Y(x, y) = (e^{at} x, e^{bt} y).

Now, if one again has (8.8) in a local coordinate system (u, v), then X must have the form

(8.18)   X = e^{cu}[a(v) ∂/∂u + b(v) ∂/∂v],

which can be explicitly integrated, since

(8.19)   u′ = e^{cu} a(v),  v′ = e^{cu} b(v)  =⇒  du/dv = a(v)/b(v).
The hypothesis (8.13) implies that the linear span (over R) of X and Y is a
two dimensional ‘solvable Lie algebra.’ Sophus Lie devoted a good deal of effort to
examining when one could use constructions of solvable Lie algebras of vector fields
to explicitly integrate vector fields; his investigations led to his foundation of what
is now called the theory of Lie groups.
Exercises
1. Verify that the bracket (8.4) satisfies the ‘Jacobi identity’
[X, [Y, Z]] − [Y, [X, Z]] = [[X, Y ], Z],
i.e.,
[LX , LY ]Z = L[X,Y ] Z.
2. Find the integral curves of

X = (x + y²) ∂/∂x + y ∂/∂y,

using (8.16).
3. Find the integral curves of

X = (x²y + y⁵) ∂/∂x + (x² + xy² + y⁴) ∂/∂y.
9. Differential equations from classical mechanics
Consider the equations of motion which arise from Newton’s law F = ma, when
the force F is given by
(9.1)
F = − grad V (x),
V being potential energy. We get the ODE

(9.2)   m d²x/dt² = −∂V/∂x.
We can convert this into a first order system for (x, ξ), where
(9.3)
ξ = m dx/dt
is the momentum. We have
(9.4)
dx/dt = ξ/m,
dξ/dt = −∂V /∂x.
This is called a Hamiltonian system.
Now consider the total energy
(9.5)
f (x, ξ) = |ξ|2 /2m + V (x).
Note that ∂f /∂ξ = ξ/m, ∂f /∂x = ∂V /∂x. Thus (9.4) is of the form
(9.6)
dxj /dt = ∂f /∂ξj ,
dξj /dt = −∂f /∂xj .
Hence we're looking for the integral curves of the vector field

(9.7)   H_f = Σ_{j=1}^n [ (∂f/∂ξ_j)(∂/∂x_j) − (∂f/∂x_j)(∂/∂ξ_j) ].
For smooth f (x, ξ), we call Hf , defined by (9.7), a Hamiltonian vector field. Note
that, directly from (9.7),
(9.8)
Hf f = 0.
A useful notation is the Poisson bracket, defined by
(9.9)
{f, g} = Hf g.
One verifies directly from (9.7) that
(9.10)
{f, g} = −{g, f },
generalizing (9.8). Also a routine calculation verifies that
(9.11)
[Hf , Hg ] = H{f,g} .
As noted at the end of §7, if X is a vector field in the plane, and we have
explicitly a function u with nonvanishing gradient such that Xu = 0, then X can
be explicitly integrated. These comments apply to X = Hf , u = f, in case Hf is
a planar Hamiltonian vector field. We can rephrase this description, as follows. If
x ∈ R, ξ ∈ R, integral curves of

(9.12)   x′ = ∂f/∂ξ,   ξ′ = −∂f/∂x

lie on a level set

(9.13)   f(x, ξ) = E.

Suppose that, locally, this set is described by

(9.14)   x = ϕ(ξ), or ξ = ψ(x).

Then we have one of the following ODEs:

(9.15)   x′ = f_ξ(x, ψ(x)), or ξ′ = −f_x(ϕ(ξ), ξ),
and hence we have

(9.16)   ∫ [f_ξ(x, ψ(x))]^{−1} dx = t + C,

or

(9.17)   −∫ [f_x(ϕ(ξ), ξ)]^{−1} dξ = t + C′.

Thus solving (9.12) is reduced to a quadrature, i.e., a calculation of an explicit integral, (9.16) or (9.17).
If the planar Hamiltonian vector field H_f arises from describing motion in a force field on a line, via Newton's laws (9.2), so

(9.18)   f(x, ξ) = ξ²/2m + V(x),

then the second curve in (9.14) is

(9.19)   ξ = ±[(2m)(E − V(x))]^{1/2},

and the formula (9.16) becomes

(9.20)   ±(m/2)^{1/2} ∫ [E − V(x)]^{−1/2} dx = t + C,
defining x implicitly as a function of t.
In some cases, the integral (9.20) can be evaluated by elementary means. This includes the trivial case of a constant force, where V(x) = cx, and also the case of the 'harmonic oscillator,' or linearized spring, where V(x) = cx². It also includes the case of motion of a rocket in space, along a line through the center of a planet, where V(x) = −K/|x|. The case V(x) = −K cos x arises in the analysis of the pendulum (see (18.38)). In that case (9.20) is an elliptic integral, rather than one that arises in first year calculus.
For Hamiltonian vector fields in higher dimensions, more effort is required to
understand the resulting flows. One important tractable class arises from central
force problems. We treat this in the next section.
Hamiltonian vector fields arise in treating many problems in addition to those
derived from Newton’s laws in Cartesian coordinates. In §18 we study a broad
class of variational problems, which lead to Hamiltonian systems. This variational
approach has many convenient features, such as allowing easy formulation of the
equations of motion in arbitrary coordinate systems.
Exercises
1. Verify that [Hf , Hg ] = H{f,g} .
2. Demonstrate that the Poisson bracket satisfies the Jacobi identity
{f, {g, h}} − {g, {f, h}} = {{f, g}, h}.
Hint. Use exercise 1 above and exercise 1 of §8.
3. Identifying y and ξ, show that a planar vector field X = f (x, y)(∂/∂x) +
g(x, y)(∂/∂y) is Hamiltonian if and only if div X = 0.
Reconsider problem 5 in §7.
4. Show that

(d/dt) g(x, ξ) = {f, g}

on an orbit of H_f.
10. Central force problems
Central force problems give rise to a class of Hamiltonian systems with two
degrees of freedom, whose solutions can be found ‘explicitly,’ in terms of integrals.
Before treating these problems specifically, we look at a class of Hamiltonians on a
region in R4 , of the form
(10.1)
F (y, η) = F (y1 , η1 , η2 ),
i.e., with no y_2 dependence. Thus Hamilton's equations take the form

(10.2)   ẏ_j = ∂F/∂η_j,   η̇_1 = −∂F/∂y_1,   η̇_2 = 0.

In particular, η_2 is constant on any orbit, say

(10.3)   η_2 = L.
This, in addition to F, provides the second conservation law implying integrability; note that {F, η_2} = 0. If F(y_1, η_1, L) = E on an integral curve, we write this relation as

(10.4)   η_1 = ψ(y_1, L, E).

We can now pursue an analysis which is a variant of that described by (9.14)-(9.20). The first equation in (10.2) becomes

(10.5)   ẏ_1 = F_{η_1}(y_1, ψ(y_1, L, E), L),

with solution given implicitly by

(10.6)   ∫ [F_{η_1}(y_1, ψ(y_1, L, E), L)]^{−1} dy_1 = t + C.
Once you have y_1(t), then you have

(10.7)   η_1(t) = ψ(y_1(t), L, E),

and then the remaining equation in (10.2) becomes

(10.8)   ẏ_2 = F_{η_2}(y_1(t), η_1(t), L),

which is solved by an integration.
We apply this method to the central force problem:

(10.9)   ẋ = G_ξ(x, ξ),   ξ̇ = −G_x(x, ξ),   G(x, ξ) = ½|ξ|² + v(|x|),   x ∈ R2.

Use of polar coordinates is clearly suggested, so we set

(10.10)   y_1 = r,  y_2 = θ;   x_1 = r cos θ,  x_2 = r sin θ.

A calculation shows that the system (10.9) takes the form (10.2), with

(10.11)   F(y, η) = ½(η_1² + y_1^{−2} η_2²) + v(y_1).
We see that the first pair of ODEs in (10.2) takes the form

(10.12)   ṙ = η_1,   θ̇ = Lr^{−2},

where L is the constant value of η_2 along an integral curve, as in (10.3). The last equation, rewritten as

(10.13)   r²θ̇ = L,

expresses conservation of angular momentum. The remaining ODE in (10.2) becomes

(10.14)   η̇_1 = L²r^{−3} − v′(r).

Note that differentiating the first equation of (10.12) and using (10.14) gives

(10.15)   r̈ = L²r^{−3} − v′(r),

an equation which can be integrated by the methods described in (9.12)-(9.20).
We won't solve (10.15) by this means here, though (10.15) will be used below, to produce (10.23). For now, we use instead (10.4)-(10.6). In the present case, (10.4) takes the form

(10.16)   η_1 = ±[2E − 2v(r) − L²r^{−2}]^{1/2},

and since F_{η_1} = η_1, (10.6) takes the form

(10.17)   ±∫ [2Er² − 2r²v(r) − L²]^{−1/2} r dr = t + C.

In the case of the Kepler problem, where v(r) = −K/r, the resulting integral

(10.18)   ±∫ [2Er² + 2Kr − L²]^{−1/2} r dr = t + C
can be evaluated using techniques of first year calculus, by completing the square in 2Er² + 2Kr − L². Once r = r(t) is given, the equation (10.13) provides an integral formula for θ = θ(t).
One of the most remarkable aspects of the analysis of the Kepler problem is the demonstration that orbits all lie on some conic section, given in polar coordinates by

(10.19)   r[1 + e cos(θ − θ_0)] = ed,

where e is the 'eccentricity.' We now describe the famous, elementary but ingenious trick used to demonstrate this. The method involves producing a differential equation for r in terms of θ, from (10.13) and (10.15). More precisely, we produce a differential equation for u, defined by

(10.20)   u = 1/r.
By the chain rule,

(10.21)   dr/dt = −r² du/dt = −r² (du/dθ)(dθ/dt) = −L du/dθ,

in light of (10.13). Differentiating this with respect to t gives

(10.22)   d²r/dt² = −L (d/dt)(du/dθ) = −L (d²u/dθ²)(dθ/dt) = −L²u² (d²u/dθ²),

again using (10.13). Comparing this with (10.15), we get −L²u²(d²u/dθ²) = L²u³ − v′(1/u), or equivalently

(10.23)   d²u/dθ² + u = (Lu)^{−2} v′(1/u).
In the case of the Kepler problem, v(r) = −K/r, the right side becomes the constant K/L², so in this case (10.23) becomes the linear equation

(10.24)   d²u/dθ² + u = K/L²,

with general solution

(10.25)   u(θ) = A cos(θ − θ_0) + K/L²,
which is equivalent to the formula (10.19) for a conic section.
For more general central force problems, the equation (10.23) is typically not
linear, but it is of the form treatable by the method of (9.12)-(9.20).
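The conclusion (10.25) can be observed numerically: along a computed Kepler orbit, u − K/L² fits a single cosine in θ to within integration error. A sketch (Python with numpy/scipy, not part of the text; the initial data are arbitrary choices giving a bounded orbit):

    # Kepler orbits are conics: u = 1/r minus K/L^2 is a pure cosine in theta.
    import numpy as np
    from scipy.integrate import solve_ivp

    K = 1.0
    def kepler(t, w):
        x, y, vx, vy = w
        r3 = (x*x + y*y)**1.5
        return [vx, vy, -K * x / r3, -K * y / r3]

    w0 = [1.0, 0.0, 0.0, 1.2]               # position and velocity at t = 0
    L = w0[0] * w0[3] - w0[1] * w0[2]       # angular momentum, conserved

    sol = solve_ivp(kepler, (0, 20), w0, rtol=1e-11, atol=1e-13,
                    dense_output=True)
    ts = np.linspace(0, 20, 4000)
    x, y = sol.sol(ts)[0], sol.sol(ts)[1]
    r, theta = np.hypot(x, y), np.arctan2(y, x)

    u = 1.0 / r - K / L**2
    # least-squares fit u = a cos(theta) + b sin(theta)
    M = np.column_stack([np.cos(theta), np.sin(theta)])
    coef, *_ = np.linalg.lstsq(M, u, rcond=None)
    print(coef, np.max(np.abs(M @ coef - u)))  # tiny residual: orbit is a conic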
Exercises
1. Solve explicitly w″(t) = −w(t), for w taking values in R2 = C. Show that |w(t)|² + |w′(t)|² = 2E is constant on each orbit.
2. For w(t) taking values in C, define a new curve by

Z(τ) = w(t)²,   dτ/dt = |w(t)|².

Show that, if w″(t) = −w(t), then

Z″(τ) = −4E Z(τ)/|Z(τ)|³,

i.e., Z(τ) solves the Kepler problem.
3. Analyze the equation (10.23) for u(θ) in the following cases.
(a) v(r) = −K/r²,
(b) v(r) = Kr²,
(c) v(r) = −K/r + εr².
Show that, in case (c), u(θ) is typically not periodic in θ.
4. Consider the smooth map Φ : (0, ∞) × R → R2, defined by Φ(y_1, y_2) = (x_1, x_2), with

x_1 = y_1 cos y_2,   x_2 = y_1 sin y_2.

Then define Ψ : (0, ∞) × R × R2 → R2 × R2 by Ψ(y_1, y_2, η_1, η_2) = (x_1, x_2, ξ_1, ξ_2), where (x_1, x_2) = Φ(y_1, y_2) and

(η_1, η_2)^t = DΦ(y_1, y_2)^t (ξ_1, ξ_2)^t.

If G(x, ξ) is given by (10.9), show that G ◦ Ψ = F, given by (10.11). How is this related to the calculation that the change of variables (10.10) transforms the Hamiltonian system (10.9) to (10.2)?
Remark. Other perspectives on this calculation can be obtained from problem 4 of
§18, and from the discussion of canonical transformations in §19.
11. The Riemann integral in n variables
We define the Riemann integral of a bounded function f : R → R, where R ⊂ Rn
is a cell, i.e., a product of intervals R = I1 × · · · × In , where Iν = [aν , bν ] are
intervals in R. Recall that a partition of an interval I = [a, b] is a finite collection
of subintervals {Jk : 0 ≤ k ≤ N }, disjoint except for their endpoints, whose union
is I. We can take Jk = [xk , xk+1 ], where
(11.1)
a = x0 < x1 < · · · < xN < xN +1 = b.
Now, if one has a partition of each Iν into Jν1 ∪ · · · ∪ Jν,N (ν) , then a partition P of
R consists of the cells
(11.2)
Rα = J1α1 × J2α2 × · · · × Jnαn ,
where 0 ≤ α_ν ≤ N(ν). For such a partition, define

(11.3)   maxsize(P) = max_α diam R_α,

where (diam R_α)² = ℓ(J_{1α_1})² + · · · + ℓ(J_{nα_n})². Here, ℓ(J) denotes the length of an interval J. Each cell has n-dimensional volume

(11.4)   V(R_α) = ℓ(J_{1α_1}) · · · ℓ(J_{nα_n}).

Sometimes we use V_n(R_α) for emphasis on the dimension. We also use A(R) for V_2(R), and, of course, ℓ(R) for V_1(R).
We set

(11.5)   Ī_P(f) = Σ_α sup_{R_α} f(x) · V(R_α),   I_P(f) = Σ_α inf_{R_α} f(x) · V(R_α).

Note that I_P(f) ≤ Ī_P(f). These quantities should approximate the Riemann integral of f, if the partition P is sufficiently 'fine.'
To be more precise, if P and Q are two partitions of R, we say P refines Q, and write P ≻ Q, if each partition of each interval factor I_ν of R involved in the definition of Q is further refined in order to produce the partitions of the factors I_ν used to define P, via (11.2). It is easy to see that any two partitions of R have a common refinement. Note also that

(11.6)   P ≻ Q =⇒ Ī_P(f) ≤ Ī_Q(f), and I_P(f) ≥ I_Q(f).
Consequently, if P_1 and P_2 are any two partitions of R and Q is a common refinement, we have

(11.7)   I_{P_1}(f) ≤ I_Q(f) ≤ Ī_Q(f) ≤ Ī_{P_2}(f).

Now, whenever f : R → R is bounded, the following quantities are well defined:

(11.8)   Ī(f) = inf_{P∈Π(R)} Ī_P(f),   I(f) = sup_{P∈Π(R)} I_P(f),

where Π(R) is the set of all partitions of R, as defined above. Clearly, by (11.7), I(f) ≤ Ī(f). We then say that f is Riemann integrable (on R) provided I(f) = Ī(f), and in such a case, we set

(11.9)   ∫_R f(x) dV(x) = I(f) = Ī(f).

We will denote the set of Riemann integrable functions on R by R(R). If dim R = 2, we will often use dA(x) instead of dV(x) in (11.9), and if dim R = 1, we will often use simply dx.
We derive some basic properties of the Riemann integral. First, the proof of Proposition 0.3 and Corollary 0.4 readily extend, to give:

Proposition 11.1. Let P_ν be any sequence of partitions of R such that

maxsize(P_ν) = δ_ν → 0,

and let ξ_{να} be any choice of one point in each cell R_{να} in the partition P_ν. Then, whenever f ∈ R(R),

(11.10)   ∫_R f(x) dV(x) = lim_{ν→∞} Σ_α f(ξ_{να}) V(R_{να}).
Since the right side of (11.10) is clearly linear for each ν, we obtain:

Proposition 11.2. If f_j ∈ R(R) and c_j ∈ R, then c_1 f_1 + c_2 f_2 ∈ R(R), and

(11.11)   ∫_R (c_1 f_1 + c_2 f_2) dV = c_1 ∫_R f_1 dV + c_2 ∫_R f_2 dV.
Next, we establish an integrability result analogous to Proposition 0.2.
Proposition 11.3. If f is continuous on R, then f ∈ R(R).
Proof. As in the proof of Proposition 0.2, we have that

(11.12)   maxsize(P) ≤ δ =⇒ Ī_P(f) − I_P(f) ≤ ω(δ) · V(R),

where ω(δ) is a modulus of continuity for f on R. This proves the proposition.
When the number of variables exceeds one, it becomes more important to identify some nice classes of discontinuous functions on R which are Riemann integrable. A useful tool for this is the following notion of size of a set S ⊂ R, called content. Extending (0.18)-(0.19), we define 'upper content' cont⁺ and 'lower content' cont⁻ by

(11.13)   cont⁺(S) = Ī(χ_S),   cont⁻(S) = I(χ_S),

where χ_S is the characteristic function of S. We say S has content, or 'is contented,' if these quantities are equal, which happens if and only if χ_S ∈ R(R), in which case the common value of cont⁺(S) and cont⁻(S) is

(11.14)   V(S) = ∫_R χ_S(x) dV(x).
It is easy to see that

(11.15)   cont⁺(S) = inf { Σ_{k=1}^N V(R_k) : S ⊂ R_1 ∪ · · · ∪ R_N },

where the R_k are cells contained in R. In the formal definition, the R_k in (11.15) should be part of a partition P of R, as defined above, but if {R_1, . . . , R_N} are any cells in R, they can be chopped up into smaller cells, some perhaps thrown away, to yield a finite cover of S by cells in a partition of R, so one gets the same result.
It is easy to see that, for any set S ⊂ R,

(11.16)   cont⁺(S) = cont⁺(S̄),

where S̄ is the closure of S. We note that, generally, for a bounded function f on R,

(11.17)   Ī(f) + I(1 − f) = V(R).

This follows directly from (11.5). In particular, given S ⊂ R,

(11.18)   cont⁻(S) + cont⁺(R \ S) = V(R).
Using this together with (11.16), with S and R \ S switched, we have

(11.19)   cont⁻(S) = cont⁻(S°),

where S° is the interior of S. The difference S̄ \ S° is called the boundary of S, and denoted bS.
Note that

(11.20)   cont⁻(S) = sup { Σ_{k=1}^N V(R_k) : R_1 ∪ · · · ∪ R_N ⊂ S },

where here we take {R_1, . . . , R_N} to be cells within a partition P of R, and let P vary over all partitions of R. Now, given a partition P of R, classify each cell in P as either being contained in R \ S̄, or intersecting bS, or contained in S°. Letting P vary over all partitions of R, we see that

(11.21)   cont⁺(S) = cont⁺(bS) + cont⁻(S°).
In particular, we have:
Proposition 11.4. If S ⊂ R, then S is contented if and only if cont⁺(bS) = 0.

If a set Σ ⊂ R has the property that cont⁺(Σ) = 0, we say that Σ has content zero, or is a nil set. Clearly Σ is nil if and only if Σ̄ is nil. It follows easily from Proposition 11.2 that, if Σ_j are nil, 1 ≤ j ≤ K, then ∪_{j=1}^K Σ_j is nil.
If S_1, S_2 ⊂ R and S = S_1 ∪ S_2, then S̄ = S̄_1 ∪ S̄_2 and S° ⊃ S_1° ∪ S_2°. Hence bS ⊂ b(S_1) ∪ b(S_2). It follows then from Proposition 11.4 that, if S_1 and S_2 are contented, so is S_1 ∪ S_2. Clearly, if S_j are contented, so are S_j^c = R \ S_j. It follows that, if S_1 and S_2 are contented, so is S_1 ∩ S_2 = (S_1^c ∪ S_2^c)^c. A family F of subsets of R such that R ∈ F; S_j ∈ F ⇒ S_1 ∪ S_2 ∈ F; and S_1 ∈ F ⇒ R \ S_1 ∈ F is called an algebra of subsets of R. Algebras of sets are automatically closed under finite intersections also. We see that:
Proposition 11.5. The family of contented subsets of R is an algebra of sets.
The following result specifies a useful class of Riemann integrable functions.
Proposition 11.6. If f : R → R is bounded and the set S of points of discontinuity
of f is a nil set, then f ∈ R(R).
Proof. Suppose |f| ≤ M on R, and take ε > 0. Take a partition P of R, and write P = P′ ∪ P″, where cells in P′ do not meet S, and cells in P″ do intersect S. Since cont⁺(S) = 0, we can pick P so that the cells in P″ have total volume ≤ ε. Now f is continuous on each cell in P′. Further refining the partition if necessary, we can assume that f varies by ≤ ε on each cell in P′. Thus

(11.22)   Ī_P(f) − I_P(f) ≤ [V(R) + 2M]ε.

This proves the proposition.
To give an example, suppose K ⊂ R is a closed set such that bK is nil. Let f : K → R be continuous. Define f̃ : R → R by

(11.23)   f̃(x) = f(x) for x ∈ K,   f̃(x) = 0 for x ∈ R \ K.

Then the set of points of discontinuity of f̃ is contained in bK. Hence f̃ ∈ R(R). We set

(11.24)   ∫_K f dV = ∫_R f̃ dV.
The following is a useful criterion for a set S ⊂ Rn to have content zero.

Proposition 11.7. Let Σ ⊂ Rn−1 be a closed bounded set and let g : Σ → R be continuous. Then the graph of g,

G = {(x, g(x)) : x ∈ Σ},

is a nil subset of Rn.

Proof. Put Σ in a cell R_0 ⊂ Rn−1. Suppose |g| ≤ M on Σ. Take N ∈ Z⁺ and set ε = M/N. Pick a partition P_0 of R_0, sufficiently fine that g varies by at most ε on each set Σ ∩ R_α, for any cell R_α ∈ P_0. Partition the interval I = [−M, M] into 2N equal intervals J_ν, of length ε. Then {R_α × J_ν} = {Q_αν} forms a partition of R_0 × I. Now, over each cell R_α ∈ P_0, there lie at most 2 cells Q_αν which intersect G, so cont⁺(G) ≤ 2ε · V(R_0). Letting N → ∞, we have the proposition.
Similarly, for any j ∈ {1, . . . , n}, the graph of xj as a continuous function of the
complementary variables is a nil set in Rn . So are finite unions of such graphs. Such
sets arise as boundaries of many ordinary-looking regions in Rn .
Here is a further class of nil sets.
Proposition 11.8. Let S be a compact nil subset of Rn and f : S → Rn a Lipschitz map. Then f(S) is a nil subset of Rn.

Proof. The Lipschitz hypothesis on f is that, for p, q ∈ S,

(11.25)   |f(p) − f(q)| ≤ C_1 |p − q|.

If we cover S with k cells (in a partition), of total volume ≤ α, each cubical with edgesize δ, then f(S) is covered by k sets of diameter ≤ C_1√n δ, hence it can be covered by (approximately) k(C_1√n)^n cubical cells of edgesize δ, having total volume ≤ (C_1√n)^n α. From this we have the (not very sharp) general bound

(11.26)   cont⁺(f(S)) ≤ (C_1√n)^n cont⁺(S),

which proves the proposition.
In evaluating n-dimensional integrals, it is usually convenient to reduce them to iterated integrals. The following is a special case of a result known as Fubini's Theorem.

Theorem 11.9. Let Σ ⊂ Rn−1 be a closed, bounded contented set and let g_j : Σ → R be continuous, with g_0(x) < g_1(x) on Σ. Take

(11.27)   Ω = {(x, y) ∈ Rn : x ∈ Σ, g_0(x) ≤ y ≤ g_1(x)}.

Then Ω is a contented set in Rn. If f : Ω → R is continuous, then

(11.28)   ϕ(x) = ∫_{g_0(x)}^{g_1(x)} f(x, y) dy

is continuous on Σ, and

(11.29)   ∫_Ω f dV_n = ∫_Σ ϕ dV_{n−1},

i.e.,

(11.30)   ∫_Ω f dV_n = ∫_Σ ( ∫_{g_0(x)}^{g_1(x)} f(x, y) dy ) dV_{n−1}(x).
Proof. The continuity of (11.28) is an exercise in one-variable integration; see exercises 2-3 of §0.

Let ω(δ) be a modulus of continuity for g_0, g_1, f, and also ϕ. Put Σ in a cell R_0 and let P_0 be a partition of R_0. If A ≤ g_0 ≤ g_1 ≤ B, partition the interval [A, B], and from this and P_0 construct a partition P of R = R_0 × [A, B]. We denote a cell in P_0 by R_α and a cell in P by R_{αℓ} = R_α × J_ℓ. Pick points ξ_{αℓ} ∈ R_{αℓ}.

Write P_0 = P_0′ ∪ P_0″ ∪ P_0‴, consisting respectively of cells inside Σ°, meeting bΣ, and inside R_0 \ Σ. Similarly write P = P′ ∪ P″ ∪ P‴, consisting respectively of cells inside Ω°, meeting bΩ, and inside R \ Ω, as illustrated in Fig. 11.1. For fixed α, let z′(α) = {ℓ : R_{αℓ} ∈ P′}, and let z″(α) and z‴(α) be similarly defined. Note that

z′(α) ≠ ∅ ⇐⇒ R_α ∈ P_0′,

provided we assume maxsize(P) ≤ δ and 2δ < min (g_1(x) − g_0(x)), as we will from here on.
It follows from (0.9)-(0.10) that

(11.31)   | Σ_{ℓ∈z′(α)} f(ξ_{αℓ}) ℓ(J_ℓ) − ∫_{A(α)}^{B(α)} f(x, y) dy | ≤ (B − A)ω(δ),   ∀ x ∈ R_α,

where ∪_{ℓ∈z′(α)} J_ℓ = [A(α), B(α)]. Note that A(α) and B(α) are within 2ω(δ) of g_0(x) and g_1(x), respectively, for all x ∈ R_α, if R_α ∈ P_0′. Hence, if |f| ≤ M,
(11.32)   | ∫_{A(α)}^{B(α)} f(x, y) dy − ϕ(x) | ≤ 4Mω(δ).
Thus, with C = B − A + 4M,

(11.33)   | Σ_{ℓ∈z′(α)} f(ξ_{αℓ}) ℓ(J_ℓ) − ϕ(x) | ≤ Cω(δ),   ∀ x ∈ R_α ∈ P_0′.

Multiplying by V_{n−1}(R_α) and summing over R_α ∈ P_0′, we have

(11.34)   | Σ_{R_{αℓ}∈P′} f(ξ_{αℓ}) V_n(R_{αℓ}) − Σ_{R_α∈P_0′} ϕ(x_α) V_{n−1}(R_α) | ≤ C V(R_0) ω(δ),

where x_α is an arbitrary point in R_α.
Now, if P_0 is a sufficiently fine partition of R_0, it follows from the proof of Proposition 11.6 that the second term in (11.34) is arbitrarily close to ∫_Σ ϕ dV_{n−1}, since bΣ has content zero. Furthermore, an argument such as used to prove Proposition 11.7 shows that bΩ has content zero, and one verifies that, for a sufficiently fine partition, the first sum in (11.34) is arbitrarily close to ∫_Ω f dV_n. This proves the desired identity (11.29).
We next take up the change of variables formula for multiple integrals, extending the one variable formula, (0.38). We begin with a result on linear changes of variables. The set of invertible real n × n matrices is denoted Gl(n, R).

Proposition 11.10. Let f be a continuous function with compact support in Rn. If A ∈ Gl(n, R), then

(11.35)   ∫ f(x) dV = |Det A| ∫ f(Ax) dV.

Proof. Let G be the set of elements A ∈ Gl(n, R) for which (11.35) is true. Clearly I ∈ G. Using Det A^{−1} = (Det A)^{−1}, and Det AB = (Det A)(Det B), we conclude
that G is a subgroup of Gl(n, R). To prove the proposition, it will suffice to show that G contains all elements of the following 3 forms, since the method of applying elementary row operations to reduce a matrix shows that any element of Gl(n, R) is a product of a finite number of these elements. Here, {e_j : 1 ≤ j ≤ n} denotes the standard basis of Rn, and σ a permutation of {1, . . . , n}:

(11.36)   A_1 e_j = e_{σ(j)};   A_2 e_j = c_j e_j, c_j ≠ 0;   A_3 e_2 = e_2 + ce_1, A_3 e_j = e_j for j ≠ 2.
The first two cases are elementary consequences of the definition of the Riemann integral, and can be left as exercises.

We show that (11.35) holds for transformations of the form A_3 by using Theorem 11.9 (in a special case), to reduce it to the case n = 1. Given f ∈ C(R), compactly supported, and b ∈ R, we clearly have

(11.37)   ∫ f(x) dx = ∫ f(x + b) dx.

Now, for the case A = A_3, with x = (x_1, x′), we have

(11.38)   ∫ f(x_1 + cx_2, x′) dV_n(x) = ∫ ( ∫ f(x_1 + cx_2, x′) dx_1 ) dV_{n−1}(x′) = ∫ ( ∫ f(x_1, x′) dx_1 ) dV_{n−1}(x′),

the second identity by (11.37). Thus we get (11.35) in case A = A_3, so the proposition is proved.
It is desirable to extend Proposition 11.10 to some discontinuous functions. Given a cell R and a bounded function f : R → R, we say

(11.39)   f ∈ C(R) ⇐⇒ the set of discontinuities of f is nil.

Proposition 11.6 implies

(11.40)   C(R) ⊂ R(R).

From the closure of the class of nil sets under finite unions it is clear that C(R) is closed under sums and products, i.e., that C(R) is an algebra of functions on R. We will denote by C_c(Rn) the set of functions f : Rn → R such that f has compact support and its set of discontinuities is nil. Any f ∈ C_c(Rn) is supported in some cell R, and f|_R ∈ C(R).
Proposition 11.11. Given A ∈ Gl(n, R), the identity (11.35) holds for all f ∈ C_c(Rn).

Proof. Assume |f| ≤ M. Suppose supp f is in the interior of a cell R. Let S be the set of discontinuities of f. The function g(x) = f(Ax) has A^{−1}S as its set of discontinuities, which is also nil, by Proposition 11.8. Thus g ∈ C_c(Rn).
Take ε > 0. Let P be a partition of R, of maxsize ≤ δ, such that S is covered by cells in P of total volume ≤ ε. Write P = P′ ∪ P″ ∪ P‴, where P″ consists of cells meeting S, P‴ consists of all cells touching cells in P″, and P′ consists of all the rest. Then P″ ∪ P‴ consists of cells of total volume ≤ 3^n ε. Thus if we set

(11.41)   B = ∪_{R_α∈P′} R_α,

we have that B is compact, B ∩ S = ∅, and V(R \ B) ≤ 3^n ε.

Now construct a continuous, compactly supported function ϕ : Rn → R such that

(11.42)   ϕ = 1 on B,   ϕ = 0 near S,   0 ≤ ϕ ≤ 1,

and set

(11.43)   f = ϕf + (1 − ϕ)f = f_1 + f_2.

We see that f_1 is continuous on Rn, with compact support, |f_2| ≤ M, and f_2 is supported on R \ B, which is a union of cells of total volume ≤ 3^n ε.
Now Proposition 11.10 applies directly to f_1, to yield

(11.44)   ∫ f_1(y) dV(y) = |Det A| ∫ f_1(Ax) dV(x).

We have f_2 ∈ C_c(Rn), and g_2(x) = f_2(Ax) ∈ C_c(Rn). Also, g_2 is supported by A^{−1}(R \ B), which by the estimate (11.26) has the property cont⁺(A^{−1}(R \ B)) ≤ (C_1√n)^n 3^n ε, with C_1 = ‖A^{−1}‖. Thus

(11.45)   | ∫ f_2(y) dV(y) | ≤ 3^n Mε,   | ∫ f_2(Ax) dV(x) | ≤ (C_1√n)^n 3^n Mε.

Taking ε → 0, we have the proposition.
In particular, if Σ ⊂ Rn is a compact set which is contented, then its characteristic function χ_Σ, which has bΣ as its set of points of discontinuity, belongs to C_c(Rn), and we deduce:

Corollary 11.12. If Σ ⊂ Rn is a compact, contented set and A ∈ Gl(n, R), then A(Σ) = {Ax : x ∈ Σ} is contented, and

(11.46)   V(A(Σ)) = |Det A| V(Σ).
We now extend Proposition 11.10 to nonlinear changes of variables.
Proposition 11.13. Let O and Ω be open in Rn, G : O → Ω a C¹ diffeomorphism, and f a continuous function with compact support in Ω. Then

(11.47)   ∫_Ω f(y) dV(y) = ∫_O f(G(x)) |Det DG(x)| dV(x).
Proof. It suffices to prove the result under the additional assumption that f ≥ 0, which we make from here on.

If K = supp f, put G^{−1}(K) inside a rectangle R and let P = {R_α} be a partition of R. Let A be the set of α such that R_α ⊂ O. For α ∈ A, G(R_α) ⊂ Ω. See Fig. 11.2. Pick P fine enough that α ∈ A whenever R_α ∩ G^{−1}(K) is nonempty.

Let ξ_α be the center of R_α, and let R̃_α = R_α − ξ_α, a cell with center at the origin. Then

(11.48)   G(ξ_α) + DG(ξ_α)(R̃_α) = η_α + H_α

is an n-dimensional parallelepiped which is close to G(R_α), if R_α is small enough. See Fig. 11.3. In fact, given ε > 0, if δ > 0 is small enough and maxsize(P) ≤ δ, then we have

(11.49)   η_α + (1 + ε)H_α ⊃ G(R_α),

for all α ∈ A. Now, by (11.46),

(11.50)   V(H_α) = |Det DG(ξ_α)| V(R_α).

Hence

(11.51)   V(G(R_α)) ≤ (1 + ε)^n |Det DG(ξ_α)| V(R_α).
Now we have

(11.52)   ∫ f dV = Σ_{α∈A} ∫_{G(R_α)} f dV ≤ Σ_{α∈A} sup_{R_α} (f ◦ G) · V(G(R_α)) ≤ (1 + ε)^n Σ_{α∈A} sup_{R_α} (f ◦ G) |Det DG(ξ_α)| V(R_α).

The second inequality here uses (11.51) and f ≥ 0. If we set

(11.53)   h(x) = f ◦ G(x) |Det DG(x)|,
then we have

(11.54)   sup_{R_α} (f ◦ G) |Det DG(ξ_α)| ≤ sup_{R_α} h + Mω(δ),

provided |f| ≤ M and ω(δ) is a modulus of continuity for DG. Taking arbitrarily fine partitions, we get

(11.55)   ∫ f dV ≤ ∫ h dV.

If we apply this result, with G replaced by G^{−1}, O and Ω switched, and f replaced by h, given by (11.53), we have

(11.56)   ∫ h dV ≤ ∫ h ◦ G^{−1}(y) |Det DG^{−1}(y)| dV(y) = ∫ f dV.

The inequalities (11.55) and (11.56) together yield the identity (11.47).
Now the same argument used to establish Proposition 11.11 from Proposition 11.10 yields the following improvement of Proposition 11.13. We say f ∈ C_c(Ω) if f ∈ C_c(Rn) has support in the open set Ω.

Theorem 11.14. Let O and Ω be open in Rn, G : O → Ω a C¹ diffeomorphism, and f ∈ C_c(Ω). Then f ◦ G ∈ C_c(O), and

(11.57)   ∫_Ω f(y) dV(y) = ∫_O f(G(x)) |Det DG(x)| dV(x).
The most frequently invoked case of the change of variable formula, in the case n = 2, involves the following change from Cartesian to polar coordinates:

(11.58)   x = r cos θ,   y = r sin θ.

Thus, take G(r, θ) = (r cos θ, r sin θ). We have

(11.59)   DG(r, θ) = \begin{pmatrix} cos θ & −r sin θ \\ sin θ & r cos θ \end{pmatrix},   Det DG(r, θ) = r.

For example, if ρ ∈ (0, ∞) and

(11.60)   D_ρ = {(x, y) ∈ R2 : x² + y² ≤ ρ²},

then, for f ∈ C(D_ρ),

(11.61)   ∫_{D_ρ} f(x, y) dA = ∫_0^ρ ∫_0^{2π} f(r cos θ, r sin θ) r dθ dr.

To get this, we first apply Theorem 11.14, with O = [ε, ρ] × [0, 2π − ε], then apply Theorem 11.9, then let ε ↘ 0.
It is often useful to integrate a function whose support is not bounded. Generally, given a bounded function f : Rn → R, we say f ∈ R(Rn) provided f|_R ∈ R(R) for each cell R ⊂ Rn, and ∫_R |f| dV ≤ C, for some C < ∞, independent of R. If f ∈ R(Rn), we set

(11.62)   ∫_{Rn} f dV = lim_{s→∞} ∫_{R_s} f dV,   R_s = {x ∈ Rn : |x_j| ≤ s, ∀ j}.

It is easy to see that, if K_ν is any sequence of compact contented subsets of Rn such that each R_s, for s < ∞, is contained in all K_ν for ν sufficiently large, i.e., for ν ≥ N(s), then, whenever f ∈ R(Rn),

(11.63)   ∫_{Rn} f dV = lim_{ν→∞} ∫_{K_ν} f dV.
Change of variables formulas and Fubini's Theorem extend to this case. For example, the limiting case of (11.61) as ρ → ∞ is

(11.64)   f ∈ C(R2) ∩ R(R2) =⇒ ∫_{R2} f(x, y) dA = ∫_0^∞ ∫_0^{2π} f(r cos θ, r sin θ) r dθ dr.

The following is a good example. Take f(x, y) = e^{−x²−y²}. We have

(11.65)   ∫_{R2} e^{−x²−y²} dA = ∫_0^∞ ∫_0^{2π} e^{−r²} r dθ dr = 2π ∫_0^∞ e^{−r²} r dr.

Now, methods of §0 allow the substitution s = r², so ∫_0^∞ e^{−r²} r dr = ½ ∫_0^∞ e^{−s} ds = ½. Hence

(11.66)   ∫_{R2} e^{−x²−y²} dA = π.
On the other hand, Theorem 11.9 extends to give

(11.67)   ∫_{R2} e^{−x²−y²} dA = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−x²−y²} dy dx = ( ∫_{−∞}^{∞} e^{−x²} dx )( ∫_{−∞}^{∞} e^{−y²} dy ).

Note that the two factors in the last product are equal. We deduce that

(11.68)   ∫_{−∞}^{∞} e^{−x²} dx = √π.
We can generalize (11.67), to obtain (via (11.68))
(11.69)  ∫_{R^n} e^{−|x|^2} dV_n = (∫_{−∞}^∞ e^{−x^2} dx)^n = π^{n/2}.
The integrals (11.65)-(11.69) are called Gaussian integrals, and their evaluation has many uses. We shall see some in §12.
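Both (11.68) and (11.66) are easy to confirm numerically. A minimal sketch, again assuming Python with NumPy and SciPy:

    import numpy as np
    from scipy import integrate

    # (11.68): the integral of e^{-x^2} over R equals sqrt(pi)
    one_d, _ = integrate.quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
    print(one_d, np.sqrt(np.pi))              # both ~1.7724538509

    # (11.66): the two-dimensional Gaussian integral equals pi
    two_d, _ = integrate.dblquad(lambda y, x: np.exp(-x**2 - y**2),
                                 -np.inf, np.inf, -np.inf, np.inf)
    print(two_d, np.pi)                       # both ~3.1415926536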
Exercises

1. Let f : R^n → R be unbounded. For s ∈ (0, ∞), set
   f_s(x) = f(x) if |f(x)| ≤ s,   f_s(x) = 0 otherwise.
Assume f_s ∈ R(R^n) for each s < ∞, and assume
   ∫_{R^n} |f_s(x)| dx ≤ B < ∞,   ∀ s < ∞.
In such a case, say f ∈ R^#(R^n). Set
   ∫_{R^n} f(x) dx = lim_{s→∞} ∫_{R^n} f_s(x) dx.
Show that this limit exists whenever f ∈ R^#(R^n).
Extend the change of variable formula to such functions.
Show that
   f(x) = |x|^{−a} e^{−|x|^2} ∈ R^#(R^n) ⟺ a < n.
2. Consider spherical polar coordinates on R^3, given by
   x = ρ sin φ cos θ,   y = ρ sin φ sin θ,   z = ρ cos φ,
i.e., take G(ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ). Show that
   Det DG(ρ, φ, θ) = ρ^2 sin φ.
Use this to compute the volume of the unit ball in R^3.
12. Integration on surfaces
A smooth m-dimensional surface M ⊂ Rn is characterized by the following
property. Given p ∈ M, there is a neighborhood U of p in M and a smooth map
ϕ : O → U, from an open set O ⊂ Rm bijectively to U, with injective derivative
at each point. Such a map ϕ is called a coordinate chart on M. We call U ⊂ M a
coordinate patch. If all such maps ϕ are smooth of class C k , we say M is a surface
of class C k . In §15 we will define analogous notions of a C k surface with boundary,
and of a C k surface with corners.
There is an abstraction of the notion of a surface, namely the notion of a ‘manifold,’ which is useful in more advanced studies. A treatment of calculus on manifolds
can be found in [Spi].
There is associated an m × m matrix G(x) = (g_jk(x)) of functions on O, defined in terms of the inner product of vectors tangent to M:
(12.1)  g_jk(x) = Dϕ(x)e_j · Dϕ(x)e_k = Σ_{ℓ=1}^n (∂ϕ_ℓ/∂x_j)(∂ϕ_ℓ/∂x_k),
where {e_j : 1 ≤ j ≤ m} is the standard orthonormal basis of R^m. Equivalently,
(12.2)  G(x) = Dϕ(x)^t Dϕ(x).
We call (gjk ) the metric tensor of M, on U. Note that this matrix is positive definite.
From a coordinate-independent point of view, the metric tensor on M specifies inner
products of vectors tangent to M , using the inner product of Rn .
Suppose there is another coordinate chart ψ : Ω → U, and we compare (g_jk) with H = (h_jk), given by
(12.3)  h_jk(y) = Dψ(y)e_j · Dψ(y)e_k,   i.e.,   H(y) = Dψ(y)^t Dψ(y).
We can write ϕ = ψ∘F, where F : O → Ω is a diffeomorphism. See Fig. 12.1. By the chain rule, Dϕ(x) = Dψ(y)DF(x), where y = F(x). Consequently,
(12.4)  G(x) = DF(x)^t H(y) DF(x),
or equivalently,
(12.5)  g_jk(x) = Σ_{i,ℓ} (∂F_i/∂x_j)(∂F_ℓ/∂x_k) h_{iℓ}(y).
If f : M → R is a continuous function supported on U, we will define the surface integral by
(12.6)  ∫_M f dS = ∫_O f∘ϕ(x) √(g(x)) dx,
where
(12.7)  g(x) = Det G(x).
We need to know that this is independent of the choice of coordinate chart ϕ : O → U. Thus, if we use ψ : Ω → U instead, we want to show that (12.6) is equal to ∫_Ω f∘ψ(y) √(h(y)) dy, where h(y) = Det H(y). Indeed, since f∘ψ∘F = f∘ϕ, we can apply the change of variable formula, Theorem 11.14, to get
(12.8)  ∫_Ω f∘ψ(y) √(h(y)) dy = ∫_O f∘ϕ(x) √(h(F(x))) |Det DF(x)| dx.
Now, (12.4) implies that
(12.9)  √(g(x)) = |Det DF(x)| √(h(y)),
so the right side of (12.8) is seen to be equal to (12.6), and our surface integral is well defined, at least for f supported in a coordinate patch. More generally, if f : M → R has compact support, write it as a finite sum of terms, each supported on a coordinate patch, and use (12.6) on each patch.
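To make (12.6)-(12.7) concrete: for the standard chart ϕ(θ, φ) = (sin φ cos θ, sin φ sin θ, cos φ) on the unit sphere in R^3 one finds √g = sin φ by hand, and integrating f = 1 recovers the area 4π. The sketch below (Python with NumPy and SciPy assumed, an illustration rather than anything from the text) instead computes √g numerically from G = Dϕ^t Dϕ:

    import numpy as np
    from scipy import integrate

    def phi(theta, ph):
        return np.array([np.sin(ph)*np.cos(theta),
                         np.sin(ph)*np.sin(theta),
                         np.cos(ph)])

    def sqrt_g(theta, ph, h=1e-6):
        # columns of D(phi) by central differences
        d1 = (phi(theta + h, ph) - phi(theta - h, ph)) / (2*h)
        d2 = (phi(theta, ph + h) - phi(theta, ph - h)) / (2*h)
        Dphi = np.column_stack([d1, d2])
        G = Dphi.T @ Dphi                  # metric tensor, as in (12.2)
        return np.sqrt(np.linalg.det(G))

    area, _ = integrate.dblquad(lambda ph, th: sqrt_g(th, ph),
                                0.0, 2*np.pi, 0.0, np.pi)
    print(area, 4*np.pi)                   # both ~12.566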
Let us pause to consider the special cases m = 1 and m = 2. For m = 1, we are considering a curve in R^n, say ϕ : [a, b] → R^n. Then G(x) is a 1 × 1 matrix, namely G(x) = |ϕ′(x)|^2. If we denote the curve in R^n by γ, rather than M, the formula (12.6) becomes
(12.10)  ∫_γ f ds = ∫_a^b f∘ϕ(x) |ϕ′(x)| dx.
In case m = 2, let us consider a surface M ⊂ R^3, with a coordinate chart ϕ : O → U ⊂ M. For f supported in U, an alternative way to write the surface integral is
(12.11)  ∫_M f dS = ∫_O f∘ϕ(x) |∂_1ϕ × ∂_2ϕ| dx_1 dx_2,
where u × v is the cross product of vectors u and v in R^3. To see this, we compare this integrand with the one in (12.6). In this case,
(12.12)  g = Det [ ∂_1ϕ·∂_1ϕ  ∂_1ϕ·∂_2ϕ ; ∂_2ϕ·∂_1ϕ  ∂_2ϕ·∂_2ϕ ] = |∂_1ϕ|^2 |∂_2ϕ|^2 − (∂_1ϕ·∂_2ϕ)^2.
Recall from (5.28) that |u × v| = |u| |v| |sin θ|, where θ is the angle between u and v. Equivalently, since u·v = |u| |v| cos θ,
(12.13)  |u × v|^2 = |u|^2 |v|^2 (1 − cos^2 θ) = |u|^2 |v|^2 − (u·v)^2.
Thus we see that |∂_1ϕ × ∂_2ϕ| = √g in this case, and (12.11) is equivalent to (12.6).
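The identity |∂_1ϕ × ∂_2ϕ| = √g can also be spot-checked numerically for any chart one cares to write down. A short sketch (NumPy assumed; the particular ϕ below is an arbitrary smooth chart we made up for illustration):

    import numpy as np

    def phi(x1, x2):                       # a sample chart O -> R^3
        return np.array([x1, x2, np.sin(x1)*np.cos(2*x2)])

    def partials(x1, x2, h=1e-6):
        p1 = (phi(x1 + h, x2) - phi(x1 - h, x2)) / (2*h)
        p2 = (phi(x1, x2 + h) - phi(x1, x2 - h)) / (2*h)
        return p1, p2

    p1, p2 = partials(0.7, -0.3)
    lhs = np.linalg.norm(np.cross(p1, p2))             # |d1 phi x d2 phi|
    G = np.array([[p1 @ p1, p1 @ p2], [p2 @ p1, p2 @ p2]])
    rhs = np.sqrt(np.linalg.det(G))                    # sqrt(g), via (12.12)
    print(lhs, rhs)                                    # agree to ~1e-9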
An important class of surfaces is the class of graphs of smooth functions. Let u ∈ C^1(Ω), for an open Ω ⊂ R^{n−1}, and let M be the graph of z = u(x). The map ϕ(x) = (x, u(x)) provides a natural coordinate system, in which the metric tensor is given by
(12.14)  g_jk(x) = δ_jk + (∂u/∂x_j)(∂u/∂x_k).
If u is C^1, we see that g_jk is continuous. To calculate g = Det(g_jk) at a given point p ∈ Ω, if ∇u(p) ≠ 0, rotate coordinates so that ∇u(p) is parallel to the x_1 axis. We see that
(12.15)  √g = (1 + |∇u|^2)^{1/2}.
In particular, the (n−1)-dimensional volume of the surface M is given by
(12.16)  V_{n−1}(M) = ∫_M dS = ∫_Ω (1 + |∇u(x)|^2)^{1/2} dx.
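Formula (12.16) is immediate to use numerically. A small sketch (NumPy/SciPy assumed; u is an illustrative choice) computes the area of the graph of u(x, y) = x^2 + y^2 over the unit square:

    import numpy as np
    from scipy import integrate

    def integrand(y, x):
        ux, uy = 2*x, 2*y                  # gradient of u = x^2 + y^2
        return np.sqrt(1 + ux**2 + uy**2)

    area, _ = integrate.dblquad(integrand, 0.0, 1.0, 0.0, 1.0)
    print(area)        # exceeds 1, the area of the flat square, as it must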
Particularly important examples of surfaces are the unit spheres S^{n−1} in R^n,
   S^{n−1} = {x ∈ R^n : |x| = 1}.
Spherical polar coordinates on R^n are defined in terms of a smooth diffeomorphism
(12.17)  R : (0, ∞) × S^{n−1} → R^n \ 0,   R(r, ω) = rω.
If (h_{ℓm}) denotes the metric tensor on S^{n−1} induced from its inclusion in R^n, we see that the Euclidean metric tensor can be written
(12.18)  (e_jk) = [ 1  0 ; 0  r^2 h_{ℓm} ].
Thus
(12.19)  √e = r^{n−1} √h.
We therefore have the following result for integrating a function in spherical polar coordinates:
(12.20)  ∫_{R^n} f(x) dx = ∫_{S^{n−1}} [ ∫_0^∞ f(rω) r^{n−1} dr ] dS(ω).
We next compute the (n−1)-dimensional area A_{n−1} of the unit sphere S^{n−1} ⊂ R^n, using (12.20) together with the computation
(12.21)  ∫_{R^n} e^{−|x|^2} dx = π^{n/2},
from (11.69). First note that, whenever f(x) = ϕ(|x|), (12.20) yields
(12.22)  ∫_{R^n} ϕ(|x|) dx = A_{n−1} ∫_0^∞ ϕ(r) r^{n−1} dr.
In particular, taking ϕ(r) = e^{−r^2} and using (12.21), we have
(12.23)  π^{n/2} = A_{n−1} ∫_0^∞ e^{−r^2} r^{n−1} dr = (1/2) A_{n−1} ∫_0^∞ e^{−s} s^{n/2−1} ds,
where we used the substitution s = r^2 to get the last identity. We hence have
(12.24)  A_{n−1} = 2π^{n/2} / Γ(n/2),
where Γ(z) is Euler's Gamma function, defined for z > 0 by
(12.25)  Γ(z) = ∫_0^∞ e^{−s} s^{z−1} ds.
We need to complement (12.24) with some results on Γ(z) allowing a computation of Γ(n/2) in terms of more familiar quantities. Of course, setting z = 1 in (12.25), we immediately get
(12.26)  Γ(1) = 1.
Also, setting n = 1 in (12.23), we have
   π^{1/2} = 2 ∫_0^∞ e^{−r^2} dr = ∫_0^∞ e^{−s} s^{−1/2} ds,
or
(12.27)  Γ(1/2) = π^{1/2}.
We can proceed inductively from (12.26)-(12.27) to a formula for Γ(n/2) for any n ∈ Z^+, using the following.
Lemma 12.1. For all z > 0,
(12.28)  Γ(z + 1) = z Γ(z).
Proof. We can write
   Γ(z + 1) = −∫_0^∞ (d/ds)(e^{−s}) s^z ds = ∫_0^∞ e^{−s} (d/ds)(s^z) ds,
the last identity by integration by parts. The last expression here is seen to equal the right side of (12.28).
Consequently, for k ∈ Z^+,
(12.29)  Γ(k) = (k − 1)!,   Γ(k + 1/2) = (k − 1/2) ··· (1/2) π^{1/2}.
Thus (12.24) can be rewritten
(12.30)  A_{2k−1} = 2π^k / (k − 1)!,   A_{2k} = 2π^k / [(k − 1/2) ··· (1/2)].
We discuss another important example of a smooth surface, in the space M(n) ≈ R^{n^2} of real n × n matrices, namely SO(n), the set of matrices T ∈ M(n) satisfying T^t T = I and Det T > 0 (hence Det T = 1). As noted in exercises 4-5 of §4, the exponential map Exp : M(n) → M(n) defined by Exp(A) = e^A has the property
(12.31)  Exp : Skew(n) → SO(n),
where Skew(n) is the set of skew-symmetric matrices in M(n). Also D Exp(0)A = A; hence
(12.32)  D Exp(0) = ι : Skew(n) ↪ M(n).
It follows from the Inverse Function Theorem that there is a neighborhood O of 0 in Skew(n) which is mapped by Exp diffeomorphically onto a smooth surface U ⊂ M(n), of dimension m = n(n−1)/2. Furthermore, U is a neighborhood of I in SO(n). For general T ∈ SO(n), we can define maps
(12.33)  ϕ_T : O → SO(n),   ϕ_T(A) = T Exp(A),
and obtain coordinate charts in SO(n), which is consequently a smooth surface of dimension n(n−1)/2 in M(n). Note that SO(n) is a closed bounded subset of M(n); hence it is compact.
We use the inner product on M(n) computed componentwise; equivalently,
(12.34)  ⟨A, B⟩ = Tr(B^t A) = Tr(B A^t).
This produces a metric tensor on SO(n). The surface integral over SO(n) has the following important invariance property.
Proposition 12.2. Given f ∈ C(SO(n)), if we set
(12.35)  ρ_T f(X) = f(XT),   λ_T f(X) = f(TX),
for T, X ∈ SO(n), we have
(12.36)  ∫_{SO(n)} ρ_T f dS = ∫_{SO(n)} λ_T f dS = ∫_{SO(n)} f dS.
Proof. Given T ∈ SO(n), the maps R_T, L_T : M(n) → M(n) defined by R_T(X) = XT, L_T(X) = TX are easily seen from (12.34) to be isometries. Thus they yield maps of SO(n) to itself which preserve the metric tensor, proving (12.36).
Since SO(n) is compact, its total volume V(SO(n)) = ∫_{SO(n)} 1 dS is finite. We define the integral with respect to 'Haar measure'
(12.37)  ∫_{SO(n)} f(g) dg = (1 / V(SO(n))) ∫_{SO(n)} f dS.
This is used in many arguments involving 'averaging over rotations.' One example will arise in the proof of Proposition 20.5.
Exercises
1. Define ϕ : [0, θ] → R2 to be ϕ(t) = (cos t, sin t). Show that, if 0 < θ ≤ 2π, the
image of [0, θ] under ϕ is an arc of the unit circle, of length θ. Deduce that the unit
circle in R2 has total length 2π.
This result follows also from (12.30).
Remark. Use the definition of π given in the auxiliary problem set after §3.
This length formula provided the original definition of π, in ancient Greek geometry.
2. Compute the volume of the unit ball B n = {x ∈ Rn : |x| ≤ 1}.
Hint. Apply (12.22) with ϕ = χ[0,1] .
3. Suppose M is a surface in R^n of dimension 2, and ϕ : O → U ⊂ M is a coordinate chart, with O ⊂ R^2. Set ϕ_jk(x) = (ϕ_j(x), ϕ_k(x)), so ϕ_jk : O → R^2. Show that the formula (12.6) for the surface integral is equivalent to
   ∫_M f dS = ∫_O f∘ϕ(x) √( Σ_{j<k} (Det Dϕ_jk(x))^2 ) dx.
Hint. Show that the quantity under the square root is equal to (12.12).
4. If M is an m-dimensional surface, ϕ : O → U ⊂ M a coordinate chart, for J = (j_1, ..., j_m) set
   ϕ_J(x) = (ϕ_{j_1}(x), ..., ϕ_{j_m}(x)),   ϕ_J : O → R^m.
Show that the formula (12.6) is equivalent to
   ∫_M f dS = ∫_O f∘ϕ(x) √( Σ_{j_1<···<j_m} (Det Dϕ_J(x))^2 ) dx.
Hint. Reduce to the following. For fixed x_0 ∈ O, the quantity under the square root is equal to g(x) at x = x_0, in the case Dϕ(x_0) = (Dϕ_1(x_0), ..., Dϕ_m(x_0), 0, ..., 0).
Reconsider this problem when working on the exercises for §13.
5. Let M be the graph in R^{n+1} of x_{n+1} = u(x), x ∈ O ⊂ R^n. Let N be the unit normal to M given by N = (1 + |∇u|^2)^{−1/2} (−∇u, 1). Show that, for a continuous function f : M → R^{n+1},
   ∫_M f·N dS = ∫_O f(x, u(x)) · (−∇u(x), 1) dx.
The left side is often denoted ∫_M f · dS.
13. Differential forms
It is very desirable to be able to make constructions which depend as little as
possible on a particular choice of coordinate system. The calculus of differential
forms, whose study we now take up, is one convenient set of tools for this purpose.
We start with the notion of a 1-form. It is an object that gets integrated over a curve; formally, a 1-form on Ω ⊂ R^n is written
(13.1)  α = Σ_j a_j(x) dx_j.
If γ : [a, b] → Ω is a smooth curve, we set
(13.2)  ∫_γ α = ∫_a^b Σ_j a_j(γ(t)) γ_j′(t) dt.
In other words,
(13.3)  ∫_γ α = ∫_I γ*α,
where I = [a, b] and γ*α = Σ_j a_j(γ(t)) γ_j′(t) dt is the pull-back of α under the map γ.
More generally, if F : O → Ω is a smooth map (O ⊂ R^m open), the pull-back F*α is a 1-form on O defined by
(13.4)  F*α = Σ_{j,k} a_j(F(y)) (∂F_j/∂y_k) dy_k.
The usual change of variable formula for integrals gives
(13.5)  ∫_γ α = ∫_σ F*α,
if γ is the curve F∘σ.
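Definitions (13.2) and (13.5) are easy to confirm numerically. A minimal sketch (NumPy/SciPy assumed; the example data are ours): take α = −y dx + x dy on R^2, γ the unit circle, and write γ = F∘σ with F(r, t) = (r cos t, r sin t), σ(t) = (1, t); by hand, F*α = r^2 dt via (13.4).

    import numpy as np
    from scipy import integrate

    a = lambda x, y: np.array([-y, x])     # coefficients of alpha = -y dx + x dy

    def gamma(t):  return np.array([np.cos(t), np.sin(t)])
    def dgamma(t): return np.array([-np.sin(t), np.cos(t)])

    # Left side of (13.5): integral of alpha over gamma, via (13.2)
    lhs, _ = integrate.quad(lambda t: a(*gamma(t)) @ dgamma(t), 0, 2*np.pi)

    # Right side: F* alpha = r^2 dt, integrated along sigma(t) = (1, t)
    rhs, _ = integrate.quad(lambda t: 1.0**2, 0, 2*np.pi)

    print(lhs, rhs)                        # both ~6.2832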
If F : O → Ω is a diffeomorphism, and
(13.6)  X = Σ b_j(x) ∂/∂x_j
is a vector field on Ω, recall that we have the vector field on O:
(13.7)  F# X(y) = (DF)^{−1}(p) X(p),   p = F(y).
If we define a pairing between 1-forms and vector fields on Ω by
(13.8)  ⟨X, α⟩ = Σ_j b_j(x) a_j(x) = b · a,
a simple calculation gives
(13.9)  ⟨F# X, F*α⟩ = ⟨X, α⟩ ∘ F.
Thus, a 1-form on Ω is characterized at each point p ∈ Ω as a linear transformation of vectors at p to R.
More generally, we can regard a k-form α on Ω as a k-multilinear map on vector fields:
(13.10)  α(X_1, ..., X_k) ∈ C^∞(Ω);
we impose the further condition of anti-symmetry:
(13.11)  α(X_1, ..., X_j, ..., X_ℓ, ..., X_k) = −α(X_1, ..., X_ℓ, ..., X_j, ..., X_k).
There is a special notation we use for k-forms. If 1 ≤ j_1 < ··· < j_k ≤ n, j = (j_1, ..., j_k), we set
(13.12)  α = Σ_j a_j(x) dx_{j_1} ∧ ··· ∧ dx_{j_k},
where
(13.13)  a_j(x) = α(D_{j_1}, ..., D_{j_k}),   D_j = ∂/∂x_j.
More generally, we assign meaning to (13.12) summed over all k-indices (j_1, ..., j_k), where we identify
(13.14)  dx_{j_1} ∧ ··· ∧ dx_{j_k} = (sgn σ) dx_{j_{σ(1)}} ∧ ··· ∧ dx_{j_{σ(k)}},
σ being a permutation of {1, ..., k}. If any j_m = j_ℓ (m ≠ ℓ), then (13.14) vanishes. A common notation for the statement that α is a k-form on Ω is
(13.15)  α ∈ Λ^k(Ω).
In particular, we can write a 2-form β as
(13.16)  β = Σ b_jk(x) dx_j ∧ dx_k,
and pick coefficients satisfying b_jk(x) = −b_kj(x). According to (13.12)-(13.13), if we set U = Σ u_j(x) ∂/∂x_j and V = Σ v_j(x) ∂/∂x_j, then
(13.17)  β(U, V) = 2 Σ b_jk(x) u_j(x) v_k(x).
If b_jk is not required to be antisymmetric, one gets β(U, V) = Σ (b_jk − b_kj) u_j v_k.
If F : O → Ω is a smooth map as above, we define the pull-back F*α of a k-form α, given by (13.12), to be
(13.18)  F*α = Σ_j a_j(F(y)) (F* dx_{j_1}) ∧ ··· ∧ (F* dx_{j_k}),
where
(13.19)  F* dx_j = Σ_ℓ (∂F_j/∂y_ℓ) dy_ℓ,
the algebraic computation in (13.18) being performed using the rule (13.14). Extending (13.9), if F is a diffeomorphism, we have
(13.20)  (F*α)(F# X_1, ..., F# X_k) = α(X_1, ..., X_k) ∘ F.
If B = (b_jk) is an n × n matrix, then, by (13.14),
(13.21)  (Σ_k b_{1k} dx_k) ∧ (Σ_k b_{2k} dx_k) ∧ ··· ∧ (Σ_k b_{nk} dx_k)
           = (Σ_σ (sgn σ) b_{1σ(1)} b_{2σ(2)} ··· b_{nσ(n)}) dx_1 ∧ ··· ∧ dx_n
           = (det B) dx_1 ∧ ··· ∧ dx_n.
Hence, if F : O → Ω is a C^1 map between two domains of dimension n, and α = A(x) dx_1 ∧ ··· ∧ dx_n is an n-form on Ω, then
(13.22)  F*α = det DF(y) A(F(y)) dy_1 ∧ ··· ∧ dy_n.
Comparison with the change of variable formula for multiple integrals suggests that one has an intrinsic definition of ∫_Ω α when α is an n-form on Ω, n = dim Ω. To implement this, we need to take into account that det DF(y), rather than |det DF(y)|, appears in (13.22). We say a smooth map F : O → Ω between two open subsets of R^n preserves orientation if det DF(y) is everywhere positive. The object called an 'orientation' on Ω can be identified as an equivalence class of nowhere vanishing n-forms on Ω, two such forms being equivalent if one is a multiple of another by a positive function in C^∞(Ω); the standard orientation on R^n is determined by dx_1 ∧ ··· ∧ dx_n. If S is an n-dimensional surface in R^{n+k}, an orientation on S can also be specified by a nowhere vanishing form ω ∈ Λ^n(S). If such a form exists, S is said to be orientable. The equivalence class of positive multiples a(x)ω is said to consist of 'positive' forms. A smooth map ψ : S → M between oriented n-dimensional surfaces preserves orientation provided ψ*σ is positive on S whenever σ ∈ Λ^n(M) is positive. If S is oriented, one can choose coordinate charts which are all orientation preserving. We mention that there exist surfaces which cannot be oriented.
If O, Ω are open in R^n and F : O → Ω is an orientation preserving diffeomorphism, we have
(13.23)  ∫_O F*α = ∫_Ω α.
More generally, if S is an n-dimensional surface with an orientation, say the image of an open set O ⊂ R^n by ϕ : O → S, carrying the natural orientation of O, we can set
(13.24)  ∫_S α = ∫_O ϕ*α
for an n-form α on S. If it takes several coordinate patches to cover S, define ∫_S α by writing α as a sum of forms, each supported on one patch.
We need to show that this definition of ∫_S α is independent of the choice of coordinate system on S (as long as the orientation of S is respected). Thus, suppose ϕ : O → U ⊂ S and ψ : Ω → U ⊂ S are both coordinate patches, so that F = ψ^{−1}∘ϕ : O → Ω is an orientation-preserving diffeomorphism, as in Fig. 12.1 of the last section. We need to check that, if α is an n-form on S, supported on U, then
(13.25)  ∫_O ϕ*α = ∫_Ω ψ*α.
To see this, first note that, for any form α of any degree,
(13.26)  ψ∘F = ϕ ⟹ ϕ*α = F*ψ*α.
It suffices to check this for α = dx_j. Then (13.19) gives ψ* dx_j = Σ_ℓ (∂ψ_j/∂x_ℓ) dx_ℓ, so
   F*ψ* dx_j = Σ_{ℓ,m} (∂F_ℓ/∂x_m)(∂ψ_j/∂x_ℓ) dx_m,   ϕ* dx_j = Σ_m (∂ϕ_j/∂x_m) dx_m;
but the identity of these forms follows from the chain rule:
   Dϕ = (Dψ)(DF) ⟹ ∂ϕ_j/∂x_m = Σ_ℓ (∂ψ_j/∂x_ℓ)(∂F_ℓ/∂x_m).
Now that we have (13.26), we see that the left side of (13.25) is equal to
   ∫_O F*(ψ*α),
which is equal to the right side of (13.25), by (13.23). Thus the integral of an n-form over an oriented n-dimensional surface is well defined.
Exercises

1. If F : U_0 → U_1 and G : U_1 → U_2 are smooth maps and α ∈ Λ^k(U_2), then (13.26) implies
   (G∘F)*α = F*(G*α) in Λ^k(U_0).
In the special case that U_j = R^n, F and G are linear maps, and k = n, show that this identity implies
   det(GF) = (det F)(det G).

2. Let Λ^k R^n denote the space of k-forms (13.12) with constant coefficients. Show that dim_R Λ^k R^n = n!/(k!(n−k)!), the binomial coefficient.
If T : R^m → R^n is linear, then T* preserves this class of spaces; we denote the map
   Λ^k T* : Λ^k R^n → Λ^k R^m.
Similarly, replacing T by T* yields
   Λ^k T : Λ^k R^m → Λ^k R^n.

3. Show that Λ^k T is uniquely characterized as a linear map from Λ^k R^m to Λ^k R^n which satisfies
   (Λ^k T)(v_1 ∧ ··· ∧ v_k) = (T v_1) ∧ ··· ∧ (T v_k),   v_j ∈ R^m.
4. If {e_1, ..., e_n} is the standard orthonormal basis of R^n, define an inner product on Λ^k R^n by declaring an orthonormal basis to be
   {e_{j_1} ∧ ··· ∧ e_{j_k} : 1 ≤ j_1 < ··· < j_k ≤ n}.
Show that, if {u_1, ..., u_n} is any other orthonormal basis of R^n, then the set {u_{j_1} ∧ ··· ∧ u_{j_k} : 1 ≤ j_1 < ··· < j_k ≤ n} is an orthonormal basis of Λ^k R^n.

5. Let ϕ : O → R^n be smooth, with O ⊂ R^m open. Show that, for each x ∈ O,
   ‖Λ^m Dϕ(x)ω‖^2 = Det(Dϕ(x)^t Dϕ(x)),
where ω = e_1 ∧ ··· ∧ e_m. Take another look at problem 4 of §12.
14. Products and exterior derivatives of forms
Having discussed the notion of a differential form as something to be integrated,
we now consider some operations on forms. There is a wedge product, or exterior
product, characterized as follows. If α ∈ Λk (Ω) has the form (13.12) and if
(14.1)  β = Σ_i b_i(x) dx_{i_1} ∧ ··· ∧ dx_{i_ℓ} ∈ Λ^ℓ(Ω),
define
(14.2)  α ∧ β = Σ_{j,i} a_j(x) b_i(x) dx_{j_1} ∧ ··· ∧ dx_{j_k} ∧ dx_{i_1} ∧ ··· ∧ dx_{i_ℓ}
in Λ^{k+ℓ}(Ω). A special case of this arose in (13.18)-(13.21). We retain the equivalence (13.14). It follows easily that
(14.3)  α ∧ β = (−1)^{kℓ} β ∧ α.
In addition, there is an interior product of α ∈ Λ^k(Ω) with a vector field X on Ω, producing ι_X α = α⌟X ∈ Λ^{k−1}(Ω), defined by
(14.4)  (α⌟X)(X_1, ..., X_{k−1}) = α(X, X_1, ..., X_{k−1}).
Consequently, if α = dx_{j_1} ∧ ··· ∧ dx_{j_k} and D_i = ∂/∂x_i, then
(14.5)  α⌟D_{j_ℓ} = (−1)^{ℓ−1} dx_{j_1} ∧ ··· ∧ d̂x_{j_ℓ} ∧ ··· ∧ dx_{j_k},
where d̂x_{j_ℓ} denotes removing the factor dx_{j_ℓ}. Furthermore,
   i ∉ {j_1, ..., j_k} ⟹ α⌟D_i = 0.
If F : O → Ω is a diffeomorphism and α, β are forms and X a vector field on Ω, it is readily verified that
(14.6)  F*(α ∧ β) = (F*α) ∧ (F*β),   F*(α⌟X) = (F*α)⌟(F# X).
We make use of the operators ∧_k and ι_k on forms:
(14.7)  ∧_k α = dx_k ∧ α,   ι_k α = α⌟D_k.
There is the following useful anticommutation relation:
(14.8)  ∧_k ι_ℓ + ι_ℓ ∧_k = δ_{kℓ},
where δ_{kℓ} is 1 if k = ℓ, 0 otherwise. This is a fairly straightforward consequence of (14.5). We also have
(14.9)  ∧_j ∧_k + ∧_k ∧_j = 0,   ι_j ι_k + ι_k ι_j = 0.
From (14.8)-(14.9) one says that the operators {ι_j, ∧_j : 1 ≤ j ≤ n} generate a 'Clifford algebra.'
Another important operator on forms is the exterior derivative:
(14.10)  d : Λ^k(Ω) → Λ^{k+1}(Ω),
defined as follows. If α ∈ Λ^k(Ω) is given by (13.12), then
(14.11)  dα = Σ_{j,ℓ} (∂a_j/∂x_ℓ) dx_ℓ ∧ dx_{j_1} ∧ ··· ∧ dx_{j_k}.
Equivalently,
(14.12)  dα = Σ_{ℓ=1}^n ∂_ℓ ∧_ℓ α,
where ∂_ℓ = ∂/∂x_ℓ and ∧_ℓ is given by (14.7). The antisymmetry dx_m ∧ dx_ℓ = −dx_ℓ ∧ dx_m, together with the identity ∂^2 a_j/∂x_ℓ∂x_m = ∂^2 a_j/∂x_m∂x_ℓ, implies
(14.13)  d(dα) = 0,
for any differential form α. We also have a product rule:
(14.14)  d(α ∧ β) = (dα) ∧ β + (−1)^k α ∧ (dβ),   α ∈ Λ^k(Ω), β ∈ Λ^j(Ω).
The exterior derivative has the following important property under pull-backs:
(14.15)  F*(dα) = d(F*α),
if α ∈ Λ^k(Ω) and F : O → Ω is a smooth map. To see this, extending (14.14) to a formula for d(α ∧ β_1 ∧ ··· ∧ β_ℓ) and using this to apply d to F*α, we have
(14.16)  d(F*α) = Σ_{j,ℓ} (∂/∂x_ℓ)(a_j∘F(x)) dx_ℓ ∧ (F* dx_{j_1}) ∧ ··· ∧ (F* dx_{j_k})
                + Σ_{j,ν} (±) a_j(F(x)) (F* dx_{j_1}) ∧ ··· ∧ d(F* dx_{j_ν}) ∧ ··· ∧ (F* dx_{j_k}).
Now
   d(F* dx_i) = Σ_{j,ℓ} (∂^2 F_i/∂x_j∂x_ℓ) dx_j ∧ dx_ℓ = 0,
so only the first sum in (14.16) contributes to d(F*α). Meanwhile,
(14.17)  F*(dα) = Σ_{j,m} (∂a_j/∂x_m)(F(x)) (F* dx_m) ∧ (F* dx_{j_1}) ∧ ··· ∧ (F* dx_{j_k}),
so (14.15) follows from the identity
(14.18)  Σ_ℓ (∂/∂x_ℓ)(a_j∘F(x)) dx_ℓ = Σ_m (∂a_j/∂x_m)(F(x)) F* dx_m,
which in turn follows from the chain rule.
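The identity (14.15) can be verified symbolically in any concrete case. A minimal sketch, assuming Python with SymPy (the sample α and F are arbitrary choices of ours), for a 1-form on R^2: both sides are multiples of du ∧ dv, and we compare the coefficients.

    import sympy as sp

    x, y, u, v = sp.symbols('x y u v')

    a = x**2 * y                 # alpha = a dx + b dy, a sample 1-form
    b = sp.sin(x) + y
    X = u*v                      # a sample map F(u, v) = (X, Y)
    Y = u + v**2

    # F* alpha = A du + B dv, by (13.4)
    A = a.subs({x: X, y: Y})*sp.diff(X, u) + b.subs({x: X, y: Y})*sp.diff(Y, u)
    B = a.subs({x: X, y: Y})*sp.diff(X, v) + b.subs({x: X, y: Y})*sp.diff(Y, v)

    # d(F* alpha): coefficient of du ^ dv
    lhs = sp.diff(B, u) - sp.diff(A, v)

    # F*(d alpha): d alpha = (db/dx - da/dy) dx ^ dy, pulled back via (13.22)
    c = (sp.diff(b, x) - sp.diff(a, y)).subs({x: X, y: Y})
    jac = sp.diff(X, u)*sp.diff(Y, v) - sp.diff(X, v)*sp.diff(Y, u)
    rhs = c * jac

    print(sp.simplify(lhs - rhs))          # 0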
If dα = 0, we say α is closed; if α = dβ for some β ∈ Λ^{k−1}(Ω), we say α is exact. Formula (14.13) implies that every exact form is closed. The converse is not always true globally. Consider the multi-valued angular coordinate θ on R^2 \ (0, 0); dθ is a single-valued closed form on R^2 \ (0, 0) which is not globally exact. As we will see below, every closed form is locally exact.
First we introduce another important construction. If α ∈ Λ^k(Ω) and X is a vector field on Ω, generating a flow F_X^t, the Lie derivative L_X α is defined to be
(14.19)  L_X α = (d/dt)(F_X^t)*α |_{t=0}.
Note the formal similarity to the definition (8.2) of L_X Y for a vector field Y. Recall the formula (8.4) for L_X Y. The following is not only a computationally convenient formula for L_X α, but also an identity of fundamental importance.
Proposition 14.1. We have
(14.20)  L_X α = d(α⌟X) + (dα)⌟X.
Proof. First we compare both sides in the special case X = ∂/∂x_ℓ = D_ℓ. Note that
   (F_{D_ℓ}^t)*α = Σ_j a_j(x + t e_ℓ) dx_{j_1} ∧ ··· ∧ dx_{j_k},
so
(14.21)  L_{D_ℓ} α = Σ_j (∂a_j/∂x_ℓ) dx_{j_1} ∧ ··· ∧ dx_{j_k} = ∂_ℓ α.
To evaluate the right side of (14.20) with X = D_ℓ, use (14.12) to write this quantity as
(14.22)  d(ι_ℓ α) + ι_ℓ dα = Σ_{j=1}^n (∂_j ∧_j ι_ℓ + ι_ℓ ∂_j ∧_j) α.
Using the commutativity of ∂_j with ∧_j and with ι_ℓ, and the anticommutation relations (14.8), we see that the right side of (14.22) is ∂_ℓ α, which coincides with (14.21). Thus the proposition holds for X = ∂/∂x_ℓ.
Now we can prove the proposition in general, for a smooth vector field X on Ω. It is to be verified at each point x_0 ∈ Ω. If X(x_0) ≠ 0, choose a coordinate system about x_0 so X = ∂/∂x_1 and use the calculation above. This shows that the desired identity holds on the set of points {x_0 ∈ Ω : X(x_0) ≠ 0}, and by continuity it holds on the closure of this set. However, if x_0 ∈ Ω has a neighborhood on which X vanishes, it is clear that L_X α = 0 near x_0 and also α⌟X and (dα)⌟X vanish near x_0. This completes the proof.
The identity (14.20) can furnish a formula for the exterior derivative in terms of Lie brackets, as follows. By (8.4) and (13.20) we have, for a k-form ω,
(14.23)  (L_X ω)(X_1, ..., X_k) = X·ω(X_1, ..., X_k) − Σ_j ω(X_1, ..., [X, X_j], ..., X_k).
Now (14.20) can be rewritten as
(14.24)  ι_X dω = L_X ω − d ι_X ω.
This implies
(14.25)  (dω)(X_0, X_1, ..., X_k) = (L_{X_0} ω)(X_1, ..., X_k) − (d ι_{X_0} ω)(X_1, ..., X_k).
We can substitute (14.23) into the first term on the right in (14.25). In case ω is a 1-form, the last term is easily evaluated; we get
(14.26)  (dω)(X_0, X_1) = X_0·ω(X_1) − X_1·ω(X_0) − ω([X_0, X_1]).
More generally, we can tackle the last term on the right side of (14.25) by the same method, using (14.24) with ω replaced by the (k−1)-form ι_{X_0} ω. In this way we inductively obtain the formula
(14.27)  (dω)(X_0, ..., X_k) = Σ_{ℓ=0}^k (−1)^ℓ X_ℓ·ω(X_0, ..., X̂_ℓ, ..., X_k)
              + Σ_{0≤ℓ<j≤k} (−1)^{j+ℓ} ω([X_ℓ, X_j], X_0, ..., X̂_ℓ, ..., X̂_j, ..., X_k).
Note that from (14.19) and the property F_X^{s+t} = F_X^s F_X^t it easily follows that
(14.28)  (d/dt)(F_X^t)*α = L_X (F_X^t)*α = (F_X^t)* L_X α.
It is useful to generalize this. Let F_t be any smooth family of diffeomorphisms from M into M. Define vector fields X_t on F_t(M) by
(14.29)  (d/dt)F_t(x) = X_t(F_t(x)).
Then it easily follows that, for α ∈ Λ^k M,
(14.30)  (d/dt)F_t*α = F_t* L_{X_t} α = F_t*[ d(α⌟X_t) + (dα)⌟X_t ].
In particular, if α is closed, then, if F_t are diffeomorphisms for 0 ≤ t ≤ 1,
(14.31)  F_1*α − F_0*α = dβ,   β = ∫_0^1 F_t*(α⌟X_t) dt.
Using this, we can prove the celebrated Poincaré Lemma.
Theorem 14.2. If B is the unit ball in R^n, centered at 0, α ∈ Λ^k(B), k > 0, and dα = 0, then α = dβ for some β ∈ Λ^{k−1}(B).
Proof. Consider the family of maps F_t : B → B given by F_t(x) = tx. For 0 < t ≤ 1 these are diffeomorphisms, and the formula (14.30) applies. Note that
   F_1*α = α,   F_0*α = 0.
Now a simple limiting argument shows (14.31) remains valid, so α = dβ with
(14.32)  β = ∫_0^1 F_t*(α⌟V) t^{−1} dt,
where V = r ∂/∂r = Σ x_j ∂/∂x_j. Since F_0* = 0, the apparent singularity in the integrand is removable.
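For a closed 1-form α = Σ a_j dx_j, the recipe (14.32) reduces to the explicit potential β(x) = ∫_0^1 Σ_j a_j(tx) x_j dt, which can be checked symbolically. A minimal sketch, assuming SymPy (the sample α below is ours, chosen to be exact so the check is transparent):

    import sympy as sp

    x, y, t = sp.symbols('x y t')

    # A closed 1-form on R^2: alpha = d(x^2 y + y^2/2)
    a1 = 2*x*y
    a2 = x**2 + y
    assert sp.simplify(sp.diff(a2, x) - sp.diff(a1, y)) == 0   # d alpha = 0

    # (14.32) for k = 1: beta(x) = integral over 0 <= t <= 1 of alpha_{tx}(x)
    beta = sp.integrate(a1.subs({x: t*x, y: t*y})*x +
                        a2.subs({x: t*x, y: t*y})*y, (t, 0, 1))

    print(sp.simplify(sp.diff(beta, x) - a1))   # 0
    print(sp.simplify(sp.diff(beta, y) - a2))   # 0, so d beta = alpha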
Since in the proof of the theorem we dealt with F_t such that F_0 was not a diffeomorphism, we are motivated to generalize (14.31) to the case where F_t : M → N is a smooth family of maps, not necessarily diffeomorphisms. Then (14.29) does not work to define X_t as a vector field, but we do have
(14.33)  (d/dt)F_t(x) = Z(t, x),   Z(t, x) ∈ T_{F_t(x)} N.
Now in (14.31) we see that
   F_t*(α⌟X_t)(Y_1, ..., Y_{k−1}) = α(F_t(x))(X_t, DF_t(x)Y_1, ..., DF_t(x)Y_{k−1}),
and we can replace X_t by Z(t, x). Hence, in this more general case, if α is closed, we can write
(14.34)  F_1*α − F_0*α = dβ,   β = ∫_0^1 γ_t dt,
where, at x ∈ M,
(14.35)  γ_t(Y_1, ..., Y_{k−1}) = α(F_t(x))(Z(t, x), DF_t(x)Y_1, ..., DF_t(x)Y_{k−1}).
For an alternative approach to this homotopy invariance, see exercise 6 below.

Exercises

1. If α is a k-form, verify the formula (14.14), i.e., d(α ∧ β) = (dα) ∧ β + (−1)^k α ∧ dβ.
If α is closed and β is exact, show that α ∧ β is exact.
2. Let F be a vector field on U, open in R^3, F = Σ_{j=1}^3 f_j(x) ∂/∂x_j. Consider the 1-form ϕ = Σ_{j=1}^3 f_j(x) dx_j. Show that dϕ and curl F are related in the following way:
   curl F = Σ_{j=1}^3 g_j(x) ∂/∂x_j,
   dϕ = g_1(x) dx_2 ∧ dx_3 + g_2(x) dx_3 ∧ dx_1 + g_3(x) dx_1 ∧ dx_2.
3. If F and ϕ are related as in exercise 2, show that curl F is uniquely specified by the relation
   dϕ ∧ α = ⟨curl F, α⟩ ω
for all 1-forms α on U ⊂ R^3, where ω = dx_1 ∧ dx_2 ∧ dx_3 is the volume form.
4. Let B be a ball in R^3, F a smooth vector field on B. Show that
   ∃ u ∈ C^∞(B) s.t. F = grad u ⟺ curl F = 0.
Hint. Compare F = grad u with ϕ = du.

5. Let B be a ball in R^3 and G a smooth vector field on B. Show that
   ∃ vector field F s.t. G = curl F ⟺ div G = 0.
Hint. If G = Σ_{j=1}^3 g_j(x) ∂/∂x_j, consider ψ = g_1(x) dx_2 ∧ dx_3 + g_2(x) dx_3 ∧ dx_1 + g_3(x) dx_1 ∧ dx_2. Compute dψ.
6. Suppose f_0, f_1 : X → Y are smoothly homotopic maps, via Φ : X × R → Y, Φ(x, j) = f_j(x). Let α ∈ Λ^k Y. Apply (14.34) to α̃ = Φ*α ∈ Λ^k(X × R), with F_t(x, s) = (x, s + t), to obtain β̃ ∈ Λ^{k−1}(X × R) such that F_1*α̃ − α̃ = dβ̃, and from there produce β ∈ Λ^{k−1}(X) such that f_1*α − f_0*α = dβ.
Hint. Use β = ι*β̃, where ι(x) = (x, 0).
For the next set of exercises, let Ω be a planar domain, X = f(x, y) ∂/∂x + g(x, y) ∂/∂y a nonvanishing vector field on Ω. Consider the 1-form α = g(x, y) dx − f(x, y) dy.

7. Let γ : I → Ω be a smooth curve, I = (a, b). Show that the image C = γ(I) is the image of an integral curve of X if and only if γ*α = 0. Consequently, with slight abuse of notation, one describes the integral curves by g dx − f dy = 0.
If α is exact, i.e., α = du, conclude that the level curves of u are the integral curves of X.

8. A function ϕ is called an integrating factor if α̃ = ϕα is exact, i.e., if d(ϕα) = 0, provided Ω is simply connected. Show that an integrating factor always exists, at least locally. Show that ϕ = e^v is an integrating factor if and only if Xv = −div X. Reconsider problem 7 in §7.
Find an integrating factor for α = (x^2 + y^2 − 1) dx − 2xy dy.

9. Let Y be a vector field which you know how to linearize (i.e., conjugate to ∂/∂x) and suppose L_Y α = 0. Show how to construct an integrating factor for α. Treat the more general case L_X α = cα for some constant c. Compare the discussion in §8 of the situation where [X, Y] = cX.
15. The general Stokes formula
A basic result in the theory of differential forms is the generalized Stokes formula:
Proposition 15.1. Given a compactly supported (k−1)-form β of class C^1 on an oriented k-dimensional surface M (of class C^2) with boundary ∂M, with its natural orientation,
(15.1)  ∫_M dβ = ∫_{∂M} β.
The orientation induced on ∂M is uniquely determined by the following requirement. If
(15.2)  M = R^k_− = {x ∈ R^k : x_1 ≤ 0},
then ∂M = {(x_2, ..., x_k)} has the orientation determined by dx_2 ∧ ··· ∧ dx_k.
Proof. Using a partition of unity and invariance of the integral and the exterior derivative under coordinate transformations, it suffices to prove this when M has the form (15.2). In that case, we will be able to deduce (15.1) from the fundamental theorem of calculus. Indeed, if
(15.3)  β = b_j(x) dx_1 ∧ ··· ∧ d̂x_j ∧ ··· ∧ dx_k,
with b_j(x) of bounded support, we have
(15.4)  dβ = (−1)^{j−1} (∂b_j/∂x_j) dx_1 ∧ ··· ∧ dx_k.
If j > 1, we have
(15.5)  ∫_M dβ = ∫ { ∫_{−∞}^∞ (∂b_j/∂x_j) dx_j } dx′ = 0,
and also κ*β = 0, where κ : ∂M → M is the inclusion. On the other hand, for j = 1, we have
(15.6)  ∫_M dβ = ∫ { ∫_{−∞}^0 (∂b_1/∂x_1) dx_1 } dx_2 ··· dx_k = ∫ b_1(0, x′) dx′ = ∫_{∂M} β.
This proves Stokes' formula (15.1).
It is useful to allow singularities in ∂M. We say a point p ∈ M is a corner of dimension ν if there is a neighborhood U of p in M and a C^2 diffeomorphism of U onto a neighborhood of 0 in
(15.7)  K = {x ∈ R^k : x_j ≤ 0, for 1 ≤ j ≤ k − ν},
where k is the dimension of M. If M is a C^2 surface and every point p ∈ ∂M is a corner (of some dimension), we say M is a C^2 surface with corners. In such a case, ∂M is a locally finite union of C^2 surfaces with corners. The following result extends Proposition 15.1.
Proposition 15.2. If M is a C^2 surface of dimension k, with corners, and β is a compactly supported (k−1)-form of class C^1 on M, then (15.1) holds.
Proof. It suffices to establish this when β is supported on a small neighborhood of a corner p ∈ ∂M, of the form U described above. Hence it suffices to show that (15.1) holds whenever β is a (k−1)-form of class C^1, with compact support on K in (15.7); and we can take β to have the form (15.3). Then, for j > k − ν, (15.5) still holds, while, for j ≤ k − ν, we have, as in (15.6),
(15.8)  ∫_K dβ = (−1)^{j−1} ∫ { ∫_{−∞}^0 (∂b_j/∂x_j) dx_j } dx_1 ··· d̂x_j ··· dx_k
              = (−1)^{j−1} ∫ b_j(x_1, ..., x_{j−1}, 0, x_{j+1}, ..., x_k) dx_1 ··· d̂x_j ··· dx_k
              = ∫_{∂K} β.
The reason we required M to be a surface of class C^2 (with corners) in Propositions 15.1 and 15.2 is the following. Due to the formulas (13.18)-(13.19) for a pull-back, if β is of class C^j and F is of class C^ℓ, then F*β is generally of class C^μ, with μ = min(j, ℓ − 1). Thus, if j = ℓ = 1, F*β might be only of class C^0, so there is not a well-defined notion of a differential form of class C^1 on a C^1 surface, though such a notion is well defined on a C^2 surface. This problem can be overcome, and one can extend Propositions 15.1-15.2 to the case where M is a C^1 surface (with corners), and β is a (k−1)-form with the property that both β and dβ are continuous. We will not go into the details. Substantially more sophisticated generalizations are given in [Fed].
Exercises

1. Consider the region Ω ⊂ R^2 defined by
   Ω = {(x, y) : 0 ≤ y ≤ x^2, 0 ≤ x ≤ 1}.
Show that the boundary points (1, 0) and (1, 1) are corners, but (0, 0) is not a corner. The boundary of Ω is too sharp at (0, 0) to be a corner; it is called a 'cusp.' Extend Proposition 15.2 to treat this region.

2. Suppose U ⊂ R^n is an open set with smooth boundary M = ∂U, and U has the standard orientation, determined by dx_1 ∧ ··· ∧ dx_n. (See the paragraph above (13.23).) Let ϕ ∈ C^1(R^n) satisfy ϕ(x) < 0 for x ∈ U, ϕ(x) > 0 for x ∈ R^n \ Ū, and grad ϕ(x) ≠ 0 for x ∈ ∂U, so grad ϕ points out of U. Show that the natural orientation on ∂U, as defined after Proposition 15.1, is the same as the following. The equivalence class of forms β ∈ Λ^{n−1}(∂U) defining the orientation on ∂U satisfies the property that dϕ ∧ β is a positive multiple of dx_1 ∧ ··· ∧ dx_n, on ∂U.

3. Suppose U = {x ∈ R^n : x_n < 0}. Show that the orientation on ∂U described above is that of (−1)^{n−1} dx_1 ∧ ··· ∧ dx_{n−1}.
16. The classical Gauss, Green, and Stokes formulas
The case of (15.1) where M = Ω is a region in R^2 with smooth boundary yields the classical Green Theorem. In this case, we have
(16.1)  β = f dx + g dy,   dβ = (∂g/∂x − ∂f/∂y) dx ∧ dy,
and hence (15.1) becomes the following.
Proposition 16.1. If Ω is a region in R^2 with smooth boundary, and f and g are smooth functions on Ω, which vanish outside some compact set in Ω, then
(16.2)  ∫∫_Ω (∂g/∂x − ∂f/∂y) dx dy = ∫_{∂Ω} (f dx + g dy).
Note that, if we have a vector field X = X_1 ∂/∂x + X_2 ∂/∂y on Ω, then the integrand on the left side of (16.2) is
(16.3)  ∂X_1/∂x + ∂X_2/∂y = div X,
provided g = X_1, f = −X_2. We obtain
(16.4)  ∫∫_Ω (div X) dx dy = ∫_{∂Ω} (−X_2 dx + X_1 dy).
If ∂Ω is parametrized by arc-length, as γ(s) = (x(s), y(s)), with orientation as defined for Proposition 15.1, then the unit normal ν, to ∂Ω, pointing out of Ω, is given by ν(s) = (ẏ(s), −ẋ(s)), and (16.4) is equivalent to
(16.5)  ∫∫_Ω div X dx dy = ∫_{∂Ω} ⟨X, ν⟩ ds.
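A concrete check of (16.5): on the unit disk, with X = x ∂/∂x + y ∂/∂y, one has div X = 2, so the left side is twice the area of the disk; on the unit circle ν = (x, y), so ⟨X, ν⟩ = x^2 + y^2 = 1 and the right side is the length of the circle. A minimal numerical sketch (NumPy/SciPy assumed):

    import numpy as np
    from scipy import integrate

    # Left side: div X = 2 over the unit disk
    lhs, _ = integrate.dblquad(lambda y, x: 2.0, -1, 1,
                               lambda x: -np.sqrt(1 - x**2),
                               lambda x: np.sqrt(1 - x**2))

    # Right side: <X, nu> = 1 along the unit circle, arc length 2*pi
    rhs, _ = integrate.quad(lambda s: 1.0, 0, 2*np.pi)

    print(lhs, rhs)                        # both ~6.2832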
This is a special case of Gauss’ Divergence Theorem. We now derive a more general form of the Divergence Theorem. We begin with a definition of the divergence
of a vector field on a surface M.
Let M be a region in Rn , or an n-dimensional surface in Rn+k , provided with a
volume form ω ∈ Λn M. Let X be a vector field on M. Then the divergence of X,
denoted div X, is a function on M which measures the rate of change of the volume
form under the flow generated by X. Thus it is defined by
(16.6)  L_X ω = (div X) ω.
Here, L_X denotes the Lie derivative. In view of the general formula L_X α = (dα)⌟X + d(α⌟X), derived in (14.20), since dω = 0 for any n-form ω on M, we have
(16.7)  (div X) ω = d(ω⌟X).
If M = R^n, with the standard volume element
(16.8)  ω = dx_1 ∧ ··· ∧ dx_n,
and if
(16.9)  X = Σ X^j(x) ∂/∂x_j,
then
(16.10)  ω⌟X = Σ_{j=1}^n (−1)^{j−1} X^j(x) dx_1 ∧ ··· ∧ d̂x_j ∧ ··· ∧ dx_n.
Hence, in this case, (16.7) yields the familiar formula
(16.11)  div X = Σ_{j=1}^n ∂_j X^j,
where we use the notation
(16.12)  ∂_j f = ∂f/∂x_j.
Suppose now that M is endowed with a metric tensor g_jk(x). Then M carries a natural volume element ω, determined by the condition that, if one has a coordinate system in which g_jk(p_0) = δ_jk, then ω(p_0) = dx_1 ∧ ··· ∧ dx_n. This condition produces the following formula, in any oriented coordinate system:
(16.13)  ω = √g dx_1 ∧ ··· ∧ dx_n,   g = det(g_jk),
as we have seen in (12.7).
We now compute div X when the volume element on M is given by (16.13). We have
(16.14)  ω⌟X = Σ_j (−1)^{j−1} X^j √g dx_1 ∧ ··· ∧ d̂x_j ∧ ··· ∧ dx_n,
and hence
(16.15)  d(ω⌟X) = ∂_j(√g X^j) dx_1 ∧ ··· ∧ dx_n.
Here, as below, we use the summation convention. Hence the formula (16.7) gives
(16.16)  div X = g^{−1/2} ∂_j(g^{1/2} X^j).
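In polar coordinates on R^2, where the metric is diag(1, r^2) and √g = r, (16.16) reads div X = r^{−1}[∂_r(r X^r) + ∂_θ(r X^θ)]. One can verify symbolically that this agrees with the Cartesian divergence. A sketch, assuming SymPy (the sample field is ours):

    import sympy as sp

    r, th = sp.symbols('r theta', positive=True)
    x, y = r*sp.cos(th), r*sp.sin(th)

    P, Q = x**2, x*y          # Cartesian components of a sample field X

    # Solve X = P d/dx + Q d/dy = Xr d/dr + Xth d/dtheta for (Xr, Xth),
    # using d/dr = cos*d/dx + sin*d/dy, d/dtheta = -r sin*d/dx + r cos*d/dy
    M = sp.Matrix([[sp.cos(th), -r*sp.sin(th)],
                   [sp.sin(th),  r*sp.cos(th)]])
    Xr, Xth = M.solve(sp.Matrix([P, Q]))

    # (16.16) with sqrt(g) = r
    div_polar = (sp.diff(r*Xr, r) + sp.diff(r*Xth, th)) / r

    div_cart = 3*x            # dP/dx + dQ/dy = 2x + x, written in polar form
    print(sp.simplify(div_polar - div_cart))    # 0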
We now derive the Divergence Theorem, as a consequence of Stokes' formula, which we recall is
(16.17)  ∫_M dα = ∫_{∂M} α,
for an (n−1)-form α on M, assumed to be a smooth compact oriented surface with boundary. If α = ω⌟X, formula (16.7) gives
(16.18)  ∫_M (div X) ω = ∫_{∂M} ω⌟X.
This is one form of the Divergence Theorem. We will produce an alternative expression for the integrand on the right before stating the result formally.
Given that ω is the volume form for M determined by a Riemannian metric, we can write the interior product ω⌟X in terms of the volume element ω_∂ on ∂M, with its induced Riemannian metric, as follows. Pick coordinates on M, centered at p_0 ∈ ∂M, such that ∂M is tangent to the hyperplane {x_n = 0} at p_0 = 0, and such that g_jk(p_0) = δ_jk. Then it is clear that, at p_0,
(16.19)  j*(ω⌟X) = ⟨X, ν⟩ ω_∂,
where ν is the unit vector normal to ∂M, pointing out of M, and j : ∂M ↪ M is the natural inclusion. The two sides of (16.19), which are both defined in a coordinate independent fashion, are hence equal on ∂M, and the identity (16.18) becomes
(16.20)  ∫_M (div X) ω = ∫_{∂M} ⟨X, ν⟩ ω_∂.
Finally, we adopt the following common notation: we denote the volume element on M by dV and that on ∂M by dS, obtaining the Divergence Theorem:
Theorem 16.2. If M is a compact surface with boundary, X a smooth vector field on M, then
(16.21)  ∫_M (div X) dV = ∫_{∂M} ⟨X, ν⟩ dS,
where ν is the unit outward-pointing normal to ∂M.
The only point left to mention here is that M need not be orientable. Indeed, we can treat the integrals in (16.21) as surface integrals, as in §12, and note that all objects in (16.21) are independent of a choice of orientation. To prove the general case, just use a partition of unity supported on orientable pieces.
The definition of the divergence of a vector field given by (16.6), in terms of how the flow generated by the vector field magnifies or diminishes volumes, is a good geometrical characterization, explaining the use of the term 'divergence.'
We obtain some further integral identities. First, we apply (16.21) with X replaced by uX. We have the following 'derivation' identity:
(16.22)  div(uX) = u div X + ⟨du, X⟩ = u div X + Xu,
which follows easily from the formula (16.16). The Divergence Theorem immediately gives
(16.23)  ∫_M (div X)u dV + ∫_M Xu dV = ∫_{∂M} ⟨X, ν⟩u dS.
Replacing u by uv and using the derivation identity X(uv) = (Xu)v + u(Xv), we have
(16.24)  ∫_M [(Xu)v + u(Xv)] dV = −∫_M (div X)uv dV + ∫_{∂M} ⟨X, ν⟩uv dS.
It is very useful to apply (16.23) to a gradient vector field X. If v is a smooth function on M, grad v is a vector field satisfying
(16.25)  ⟨grad v, Y⟩ = ⟨Y, dv⟩,
for any vector field Y, where the brackets on the left are given by the metric tensor on M and those on the right by the natural pairing of vector fields and 1-forms. Hence grad v = X has components X^j = g^{jk} ∂_k v, where (g^{jk}) is the matrix inverse of (g_jk).
Applying div to grad v, we have the Laplace operator:
(16.26)  Δv = div grad v = g^{−1/2} ∂_j(g^{jk} g^{1/2} ∂_k v).
When M is a region in R^n and we use the standard Euclidean metric, so div X is given by (16.11), we have the Laplace operator on Euclidean space:
(16.27)  Δv = ∂^2 v/∂x_1^2 + ··· + ∂^2 v/∂x_n^2.
Now, setting X = grad v in (16.23), we have Xu = ⟨grad u, grad v⟩, and ⟨X, ν⟩ = ⟨ν, grad v⟩, which we call the normal derivative of v, and denote ∂v/∂ν. Hence
(16.28)  ∫_M u(Δv) dV = −∫_M ⟨grad u, grad v⟩ dV + ∫_{∂M} u (∂v/∂ν) dS.
If we interchange the roles of u and v and subtract, we have
(16.29)  ∫_M u(Δv) dV = ∫_M (Δu)v dV + ∫_{∂M} [ u (∂v/∂ν) − (∂u/∂ν) v ] dS.
Formulas (16.28)-(16.29) are also called Green formulas. We will make further use of them in §17.
We return to the Green formula (16.2), and give it another formulation. Consider a vector field Z = (f, g, h) on a region in R^3 containing the planar surface U = {(x, y, 0) : (x, y) ∈ Ω}. If we form
(16.30)  curl Z = Det [ i  j  k ; ∂_x  ∂_y  ∂_z ; f  g  h ],
we see that the integrand on the left side of (16.2) is the k-component of curl Z, so (16.2) can be written
(16.31)  ∫∫_U (curl Z)·k dA = ∫_{∂U} (Z·T) ds,
where T is the unit tangent vector to ∂U. To see how to extend this result, note that k is a unit normal field to the planar surface U.
To formulate and prove the extension of (16.31) to any compact oriented surface with boundary in R^3, we use the relation between curl and exterior derivative discussed in exercises 2-3 of §14. In particular, if we set
(16.32)  F = Σ_{j=1}^3 f_j(x) ∂/∂x_j,   ϕ = Σ_{j=1}^3 f_j(x) dx_j,
then curl F = Σ_{j=1}^3 g_j(x) ∂/∂x_j, where
(16.33)  dϕ = g_1(x) dx_2 ∧ dx_3 + g_2(x) dx_3 ∧ dx_1 + g_3(x) dx_1 ∧ dx_2.
Now suppose M is a smooth oriented (n−1)-dimensional surface with boundary in R^n. Using the orientation of M, we pick a unit normal field N to M as follows. Take a smooth function v which vanishes on M but such that ∇v(x) ≠ 0 on M. Thus ∇v is normal to M. Let σ ∈ Λ^{n−1}(M) define the orientation of M. Then dv ∧ σ = a(x) dx_1 ∧ ··· ∧ dx_n, where a(x) is nonvanishing on M. For x ∈ M, we take N(x) = ∇v(x)/|∇v(x)| if a(x) > 0, and N(x) = −∇v(x)/|∇v(x)| if a(x) < 0. We call N the 'upward' unit normal field to the oriented surface M, in this case. Part of the motivation for this characterization of N is that, if Ω ⊂ R^n is an open set with smooth boundary M = ∂Ω, and we give M the induced orientation, as described in §15, then the upward normal field N just defined coincides with the unit normal field pointing out of Ω. Compare exercises 2-3 of §15.
Now, if G = (g_1, ..., g_n) is a vector field defined near M, then
(16.34)  ∫_M (N·G) dS = ∫_M Σ_{j=1}^n (−1)^{j−1} g_j(x) dx_1 ∧ ··· ∧ d̂x_j ∧ ··· ∧ dx_n.
This result follows from (16.19). When n = 3 and G = curl F, we deduce from (16.32)-(16.33) that
(16.35)  ∫∫_M dϕ = ∫∫_M (N · curl F) dS.
Furthermore, in this case we have
(16.36)  ∫_{∂M} ϕ = ∫_{∂M} (F·T) ds,
where T is the unit tangent vector to ∂M, specified as follows by the orientation of ∂M: if τ ∈ Λ^1(∂M) defines the orientation of ∂M, then ⟨T, τ⟩ > 0 on ∂M. We call T the 'forward' unit tangent vector field to the oriented curve ∂M. By the calculations above, we have the classical Stokes formula:
Proposition 16.3. If M is a compact oriented surface with boundary in R^3, and F is a C^1 vector field on a neighborhood of M, then
(16.37)  ∫∫_M (N · curl F) dS = ∫_{∂M} (F·T) ds,
where N is the upward unit normal field on M and T the forward unit tangent field to ∂M.
Exercises

1. Given a Hamiltonian vector field
   H_f = Σ_{j=1}^n [ (∂f/∂ξ_j) ∂/∂x_j − (∂f/∂x_j) ∂/∂ξ_j ],
calculate div H_f directly from (16.11).
2. Show that the identity (16.22) for div(uX) follows from (16.7) and
   du ∧ (ω⌟X) = (Xu) ω.
Prove this identity, for any n-form ω on M^n. What happens if ω is replaced by a k-form, k < n?

3. Relate problem 2 to the calculations
(16.38)  L_{uX} α = u L_X α + du ∧ (ι_X α)
and
(16.39)  du ∧ (ι_X α) = −ι_X(du ∧ α) + (Xu) α,
valid for any k-form α. The last identity follows from (14.8).
4. Show that
   div [X, Y] = X(div Y) − Y(div X).

5. Show that, if F : R^3 → R^3 is a linear rotation, then, for a C^1 vector field Z on R^3,
(16.40)  F#(curl Z) = curl(F# Z).
6. Let M be the graph in R^3 of a smooth function, z = u(x, y), (x, y) ∈ O ⊂ R^2, a bounded region with smooth boundary (maybe with corners). Show that
(16.41)  ∫∫_M (curl F · N) dS = ∫∫_O [ −(∂F_3/∂y − ∂F_2/∂z)(∂u/∂x) − (∂F_1/∂z − ∂F_3/∂x)(∂u/∂y) + (∂F_2/∂x − ∂F_1/∂y) ] dx dy,
where ∂F_j/∂x and ∂F_j/∂y are evaluated at (x, y, u(x, y)). Show that
(16.42)  ∫_{∂M} (F·T) ds = ∫_{∂O} (F̃_1 + F̃_3 ∂u/∂x) dx + (F̃_2 + F̃_3 ∂u/∂y) dy,
where F̃_j(x, y) = F_j(x, y, u(x, y)). Apply Green's Theorem, with f = F̃_1 + F̃_3 ∂u/∂x, g = F̃_2 + F̃_3 ∂u/∂y, to show that the right sides of (16.41) and (16.42) are equal, hence proving Stokes' Theorem in this case.
7. Let M ⊂ R^n be the graph of a function x_n = u(x′), x′ = (x_1, ..., x_{n−1}). If
   β = Σ_{j=1}^n (−1)^{j−1} g_j(x) dx_1 ∧ ··· ∧ d̂x_j ∧ ··· ∧ dx_n,
as in (16.34), and ϕ(x′) = (x′, u(x′)), show that
   ϕ*β = (−1)^n [ Σ_{j=1}^{n−1} g_j(x′, u(x′)) ∂u/∂x_j − g_n(x′, u(x′)) ] dx_1 ∧ ··· ∧ dx_{n−1}
       = (−1)^{n−1} G · (−∇u, 1) dx_1 ∧ ··· ∧ dx_{n−1},
where G = (g_1, ..., g_n), and verify the identity (16.34) in this case.
Hint. For the last part, recall problems 2-3 of §15, regarding the orientation of M.
8. Let S be a smooth oriented 2-dimensional surface in R^3, and M an open subset of S, with smooth boundary; see Fig. 16.1. Let N be the upward unit normal field to S, defined by its orientation. For x ∈ ∂M, let ν(x) be the unit vector, tangent to M, normal to ∂M, and pointing out of M, and let T be the forward unit tangent vector field to ∂M. Show that, on ∂M,
   N × ν = T,   ν × T = N.
17. Holomorphic functions and harmonic functions
Let f be a complex-valued C^1 function on a region Ω ⊂ R^2. We identify R^2 with C, via z = x + iy, and write f(z) = f(x, y). We say f is holomorphic on Ω if it satisfies the Cauchy-Riemann equation:
(17.1)  ∂f/∂x = (1/i) ∂f/∂y.
Note that f(z) = z^k has this property, for any k ∈ Z, via an easy calculation of the partial derivatives of (x + iy)^k. Thus z^k is holomorphic on C if k ∈ Z^+, and on C \ {0} if k ∈ Z^−. Another example is the exponential function, e^z = e^x e^{iy}. On the other hand, x^2 + y^2 is not holomorphic.
The following is often useful for producing more holomorphic functions:
Lemma 17.1. If f and g are holomorphic on Ω, so is fg.
Proof. We have
(17.2)  ∂(fg)/∂x = (∂f/∂x)g + f(∂g/∂x),   ∂(fg)/∂y = (∂f/∂y)g + f(∂g/∂y),
so if f and g satisfy the Cauchy-Riemann equation, so does fg.
We next apply Green's theorem to the line integral ∫_{∂Ω} f dz = ∫_{∂Ω} f (dx + i dy). Clearly (16.2) applies to complex-valued functions, and if we set g = if, we get
(17.3)  ∫_{∂Ω} f dz = ∫∫_Ω (i ∂f/∂x − ∂f/∂y) dx dy.
Whenever f is holomorphic, the integrand on the right side of (17.3) vanishes, so we have the following result, known as Cauchy's Integral Theorem:
Theorem 17.2. If f ∈ C^1(Ω) is holomorphic, then
(17.4)  ∫_{∂Ω} f(z) dz = 0.
Until further notice, we assume Ω is a bounded region in C, with smooth boundary. Using (17.4), we can establish Cauchy's Integral Formula:
Theorem 17.3. If f ∈ C^1(Ω) is holomorphic and z_0 ∈ Ω, then
(17.5)  f(z_0) = (1/2πi) ∫_{∂Ω} f(z)/(z − z_0) dz.
Proof. Note that g(z) = f(z)/(z − z_0) is holomorphic on Ω \ {z_0}. Let D_r be the disk of radius r centered at z_0. Pick r so small that D_r ⊂ Ω. Then (17.4) implies
(17.6)  ∫_{∂Ω} f(z)/(z − z_0) dz = ∫_{∂D_r} f(z)/(z − z_0) dz.
To evaluate the integral on the right, parametrize the curve ∂D_r by γ(θ) = z_0 + re^{iθ}. Hence dz = ire^{iθ} dθ, so the integral on the right is equal to
(17.7)  ∫_0^{2π} [ f(z_0 + re^{iθ})/(re^{iθ}) ] ire^{iθ} dθ = i ∫_0^{2π} f(z_0 + re^{iθ}) dθ.
As r → 0, this tends in the limit to 2πi f(z_0), so (17.5) is established.
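Formula (17.5) is pleasant to confirm numerically; for a periodic integrand a plain Riemann sum over the circle converges rapidly. A minimal sketch (NumPy assumed; f, z_0, and the contour are illustrative choices):

    import numpy as np

    # Check (17.5) for f(z) = e^z on the circle |z - z0| = 1, z0 = 0.3 + 0.2i
    z0 = 0.3 + 0.2j
    n = 2000
    theta = np.linspace(0.0, 2*np.pi, n, endpoint=False)
    z = z0 + np.exp(1j*theta)             # gamma(theta) = z0 + e^{i theta}
    dz = 1j*np.exp(1j*theta)              # dz/dtheta

    integral = np.sum(np.exp(z)/(z - z0)*dz) * (2*np.pi/n)
    print(integral/(2j*np.pi), np.exp(z0))    # both ~1.3230+0.2682j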
Suppose f ∈ C^1(Ω) is holomorphic, z_0 ∈ D_r ⊂ Ω, where D_r is the disk of radius r centered at z_0, and suppose z ∈ D_r. Then Theorem 17.3 implies
(17.8)  f(z) = (1/2πi) ∫_{∂Ω} f(ζ)/[(ζ − z_0) − (z − z_0)] dζ.
We have the infinite series expansion
(17.9)  1/[(ζ − z_0) − (z − z_0)] = (1/(ζ − z_0)) Σ_{n=0}^∞ ((z − z_0)/(ζ − z_0))^n,
valid as long as |z − z_0| < |ζ − z_0|. Hence, given |z − z_0| < r, this series is uniformly convergent for ζ ∈ ∂Ω, and we have
(17.10)  f(z) = (1/2πi) Σ_{n=0}^∞ ∫_{∂Ω} [ f(ζ)/(ζ − z_0) ] ((z − z_0)/(ζ − z_0))^n dζ.
Thus, for z ∈ D_r, f(z) has the convergent power series expansion
(17.11)  f(z) = Σ_{n=0}^∞ a_n (z − z_0)^n,   a_n = (1/2πi) ∫_{∂Ω} f(ζ)/(ζ − z_0)^{n+1} dζ.
Note that, when (17.5) is applied to Ω = D_r, the disk of radius r centered at z_0, the computation (17.7) yields
(17.12)  f(z_0) = (1/2π) ∫_0^{2π} f(z_0 + re^{iθ}) dθ = (1/ℓ(∂D_r)) ∫_{∂D_r} f(z) ds(z),
when f is holomorphic and C^1 on D_r, and ℓ(∂D_r) = 2πr is the length of the circle ∂D_r. This is a mean value property, which extends to harmonic functions on domains in R^n, as we will see below.
Note that we can write (17.1) as (∂_x + i∂_y)f = 0; applying the operator ∂_x − i∂_y to this gives
(17.13)  ∂^2 f/∂x^2 + ∂^2 f/∂y^2 = 0
for any holomorphic function. A general C^2 solution to (17.13) on a region Ω ⊂ R^2 is called a harmonic function. More generally, if O is an open set in R^n, a function f ∈ C^2(O) is called harmonic if Δf = 0 on O, where, as in (16.27),
(17.14)  Δf = ∂^2 f/∂x_1^2 + ··· + ∂^2 f/∂x_n^2.
Generalizing (17.12), we have the following, known as the mean value property of harmonic functions:
Proposition 17.4. Let Ω ⊂ R^n be open, u ∈ C^2(Ω) be harmonic, p ∈ Ω, and B_R(p) = {x ∈ Ω : |x − p| ≤ R} ⊂ Ω. Then
(17.15)  u(p) = (1/A(∂B_R(p))) ∫_{∂B_R(p)} u(x) dS(x).
For the proof, set
(17.16)  ψ(r) = (1/A(S^{n−1})) ∫_{S^{n−1}} u(p + rω) dS(ω),
for 0 < r ≤ R. We have ψ(R) equal to the right side of (17.15), while clearly ψ(r) → u(p) as r → 0. Now
(17.17)  ψ′(r) = (1/A(S^{n−1})) ∫_{S^{n−1}} ω·∇u(p + rω) dS(ω) = (1/A(∂B_r(p))) ∫_{∂B_r(p)} (∂u/∂ν) dS(x).
At this point, we establish:
Lemma 17.5. If O ⊂ R^n is a bounded domain with smooth boundary and u ∈ C^2(O) is harmonic in O, then
(17.18)  ∫_{∂O} (∂u/∂ν)(x) dS(x) = 0.
Proof. Apply the Green formula (16.29), with M = O and v = 1. If Δu = 0, every integrand in (16.29) vanishes, except the one appearing in (17.18), so this integrates to zero.
It follows from this lemma that (17.17) vanishes, so ψ(r) is constant. This completes the proof of (17.15).
We can integrate the identity (17.15), to obtain
(17.19)  u(p) = (1/V(B_R(p))) ∫_{B_R(p)} u(x) dV(x),
where u ∈ C^2(B_R(p)) is harmonic. This is another expression of the mean value property.
The mean value property of harmonic functions has a number of important consequences. Here we mention one result, known as Liouville's Theorem.
Proposition 17.5. If u ∈ C^2(R^n) is harmonic on all of R^n and bounded, then u is constant.
Proof. Pick any two points p, q ∈ R^n. We have, for any r > 0,
(17.20)  u(p) − u(q) = (1/V(B_r(0))) [ ∫_{B_r(p)} u(x) dx − ∫_{B_r(q)} u(x) dx ].
Note that V(B_r(0)) = C_n r^n, where C_n is evaluated in problem 2 of §12. Thus
(17.21)  |u(p) − u(q)| ≤ (C_n r^n)^{−1} ∫_{Δ(p,q,r)} |u(x)| dx,
where
(17.22)  Δ(p, q, r) = B_r(p) △ B_r(q) = (B_r(p) \ B_r(q)) ∪ (B_r(q) \ B_r(p)).
Note that, if a = |p − q|, then Δ(p, q, r) ⊂ B_{r+a}(p) \ B_{r−a}(p); hence
(17.23)  V(Δ(p, q, r)) ≤ C(p, q) r^{n−1},   r ≥ 1.
It follows that, if |u(x)| ≤ M for all x ∈ R^n, then
(17.24)  |u(p) − u(q)| ≤ M C(p, q) C_n^{−1} r^{−1},   ∀ r ≥ 1.
Taking r → ∞, we obtain u(p) − u(q) = 0, so u is constant.
We will now use Liouville's Theorem to prove the Fundamental Theorem of Algebra:
Theorem 17.6. If p(z) = a_n z^n + a_{n−1} z^{n−1} + ··· + a_1 z + a_0 is a polynomial of degree n ≥ 1 (a_n ≠ 0), then p(z) must vanish somewhere in C.
Proof. Consider
(17.25)  f(z) = 1/p(z).
If p(z) does not vanish anywhere in C, then f(z) is holomorphic on all of C. On the other hand,
(17.26)  f(z) = (1/z^n) · 1/(a_n + a_{n−1} z^{−1} + ··· + a_0 z^{−n}),
so
(17.27)  |f(z)| → 0, as |z| → ∞.
Thus f is bounded on C, if p(z) has no roots. By Proposition 17.5, f(z) must be constant, which is impossible, so p(z) must have a complex root.
From the fact that every holomorphic function f on O ⊂ R^2 is harmonic, it follows that its real and imaginary parts are harmonic. This result has a converse. Let u ∈ C^2(O) be harmonic. Consider the 1-form
(17.28)  α = −(∂u/∂y) dx + (∂u/∂x) dy.
We have dα = (Δu) dx ∧ dy, so α is closed if and only if u is harmonic. Now, if O is diffeomorphic to a disk, Theorem 14.2 implies that α is exact on O whenever it is closed, so, in such a case,
(17.29)  Δu = 0 on O ⟹ ∃ v ∈ C^1(O) s.t. α = dv.
In other words,
(17.30)  ∂u/∂x = ∂v/∂y,   ∂u/∂y = −∂v/∂x.
This is precisely the Cauchy-Riemann equation (17.1) for f = u + iv, so we have:
Proposition 17.7. If O ⊂ R^2 is diffeomorphic to a disk and u ∈ C^2(O) is harmonic, then u is the real part of a holomorphic function on O.
The function v (which is unique up to an additive constant) is called the harmonic conjugate of u.
We close this section with a brief mention of holomorphic functions on a domain O ⊂ C^n. We say f ∈ C^1(O) is holomorphic provided it satisfies
(17.31)  ∂f/∂x_j = (1/i) ∂f/∂y_j,   1 ≤ j ≤ n.
Suppose z ∈ O, z = (z_1, ..., z_n), and suppose ζ ∈ O whenever |z − ζ| < r. Then, by successively applying Cauchy's integral formula (17.5) to each complex variable z_j, we have that
(17.32)  f(z) = (2πi)^{−n} ∫_{γ_n} ··· ∫_{γ_1} f(ζ)(ζ_1 − z_1)^{−1} ··· (ζ_n − z_n)^{−1} dζ_1 ··· dζ_n,
where γ_j is any simple counterclockwise curve about z_j in C with the property that |ζ_j − z_j| < r/√n for all ζ_j ∈ γ_j.
Consequently, if p ∈ C^n and O contains the 'polydisc' D̄ = {z ∈ C^n : |z − p| ≤ √n δ}, then, for z ∈ D, the interior of D̄, we have
(17.33)  f(z) = (2πi)^{−n} ∫_{C_n} ··· ∫_{C_1} f(ζ) [(ζ_1 − p_1) − (z_1 − p_1)]^{−1} ··· [(ζ_n − p_n) − (z_n − p_n)]^{−1} dζ_1 ··· dζ_n,
where C_j = {ζ ∈ C : |ζ − p_j| = δ}. Then, parallel to (17.8)-(17.11), we have
(17.34)  f(z) = Σ_{α≥0} c_α (z − p)^α,
for z ∈ D, where α = (α_1, ..., α_n) is a multi-index, (z − p)^α = (z_1 − p_1)^{α_1} ··· (z_n − p_n)^{α_n}, as in (1.13), and
(17.35)  c_α = (2πi)^{−n} ∫_{C_n} ··· ∫_{C_1} f(ζ)(ζ_1 − p_1)^{−α_1−1} ··· (ζ_n − p_n)^{−α_n−1} dζ_1 ··· dζ_n.
Thus holomorphic functions on open domains in C^n have convergent power series expansions.
We refer to [Ahl] and [Hil] for more material on holomorphic functions of one complex variable, and to [Kr] for material on holomorphic functions of several complex variables. A source of much information on harmonic functions is [Kel]. Also further material on these subjects can be found in [T1].
Exercises

1. Look again at problem 4 in §1.

2. Look again at problems 3-5 in §2.

3. Look again at problems 1-2 in §6.

4. Let O, Ω be open in C. If f is holomorphic on O, with range in Ω, and g is holomorphic on Ω, show that h = g∘f is holomorphic on O.
Hint. Chain rule.

5. If f(x) = ϕ(|x|) on R^n, show that
   Δf(x) = ϕ″(|x|) + ((n−1)/|x|) ϕ′(|x|).
In particular, show that |x|^{−(n−2)} is harmonic on R^n \ 0, if n ≥ 3, and log |x| is harmonic on R^2 \ 0.

If O, Ω are open in R^n, a smooth map ϕ : O → Ω is said to be conformal provided the matrix function G(x) = Dϕ(x)^t Dϕ(x) is a multiple of the identity, G(x) = γ(x)I. Recall formula (12.2).

6. Suppose n = 2 and ϕ preserves orientation. Show that ϕ (pictured as a function ϕ : O → C) is conformal if and only if it is holomorphic. If ϕ reverses orientation, ϕ is conformal ⟺ ϕ̄ is holomorphic (we say ϕ is anti-holomorphic).

7. If O and Ω are open in R^2 and u is harmonic on Ω, show that u∘ϕ is harmonic on O, whenever ϕ : O → Ω is a smooth conformal map.
Hint. Use problem 4.

8. Let e^z be defined by
   e^z = e^x e^{iy},   z = x + iy, x, y ∈ R,
where e^x and e^{iy} are defined by (3.20), (3.25). Show that e^z is holomorphic in z ∈ C, and
   e^z = Σ_{k=0}^∞ z^k/k!.

9. For z ∈ C, set
   cos z = (e^{iz} + e^{−iz})/2,   sin z = (e^{iz} − e^{−iz})/(2i).
Show that these functions agree with the definitions of cos t and sin t given in (3.25)-(3.26), for z = t ∈ R. Show that cos z and sin z are holomorphic in z ∈ C.
18. Variational problems; the stationary action principle
The calculus of variations consists of the study of stationary points (e.g., maxima and minima) of a real valued function which itself is defined on some space of functions. Here, we let M be a region in R^n, or more generally an n-dimensional surface in R^{n+k}, fix two points p, q ∈ M and an interval [a, b] ⊂ R, and consider a space of functions P consisting of smooth curves u : [a, b] → M satisfying u(a) = p, u(b) = q. We consider functions I : P → R of the form
(18.1)  I(u) = ∫_a^b F(u(t), u̇(t)) dt.
Here F(x, v) is a smooth function on the tangent bundle TM, or perhaps on some open subset of TM. By definition, the condition for I to be stationary at u is that
(18.2)  (d/ds)I(u_s)|_{s=0} = 0
for any smooth family u_s of elements of P with u_0 = u. Note that
(18.3)  (d/ds)u_s(t)|_{s=0} = w(t)
defines a tangent vector to M at u(t), and precisely those tangent vectors w(t) vanishing at t = a and at t = b arise from making some variation of u within P.
We can compute the left side of (18.2) by differentiating under the integral, and obtaining a formula for this involves considering t-derivatives of w. We work in local coordinates on M. Since any smooth curve on M can be enclosed by a single coordinate patch, this involves no loss of generality. Then, given (18.3), we have
(18.4)  (d/ds)I(u_s)|_{s=0} = ∫_a^b [ F_x(u, u̇)w + F_v(u, u̇)ẇ ] dt.
Integrating the last term by parts and recalling that w(a) and w(b) vanish, we see that this is equal to
(18.5)  ∫_a^b [ F_x(u, u̇) − (d/dt)F_v(u, u̇) ] w dt.
It follows that the condition for u to be stationary is precisely that u satisfy the equation
(18.6)  (d/dt)F_v(u, u̇) − F_x(u, u̇) = 0,
a second order ODE, called Lagrange's equation. Written more fully, it is
(18.7)  F_vv(u, u̇)ü + F_vx(u, u̇)u̇ − F_x(u, u̇) = 0,
where F_vv is the n × n matrix of second order v-derivatives of F(x, v), acting on the vector ü, etc. This is a nonsingular system as long as F(x, v) satisfies the condition
(18.8)  F_vv(x, v) is invertible, as an n × n matrix, for each (x, v) = (u(t), u̇(t)), t ∈ [a, b].
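Equation (18.6) can be formed symbolically for a concrete integrand. A minimal sketch, assuming SymPy; the integrand F = v^2/2 − x^4/4 is an illustrative choice of ours (not the general case of the text), for which (18.6) becomes ü + u^3 = 0:

    import sympy as sp

    t = sp.symbols('t')
    x, v = sp.symbols('x v')
    u = sp.Function('u')

    F = v**2/2 - x**4/4                    # illustrative F(x, v)

    udot = sp.diff(u(t), t)
    Fv = sp.diff(F, v).subs({x: u(t), v: udot})
    Fx = sp.diff(F, x).subs({x: u(t), v: udot})

    # Lagrange's equation (18.6): (d/dt) F_v(u, u') - F_x(u, u') = 0
    print(sp.Eq(sp.diff(Fv, t) - Fx, 0))   # u'' + u^3 = 0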
The ODE (18.6) suggests a particularly important role for
(18.9)
ξ = Fv (x, v).
Then, for (x, v) = (u, u),
˙ we have
ξ˙ = Fx (x, v),
(18.10)
x˙ = v.
We claim this system, in (x, ξ) coordinates, is in Hamiltonian form. Note that
(x, ξ) gives a local coordinate system under the hypothesis (18.8), by the Inverse
Function Theorem. In other words, we will produce a function E(x, ξ) such that
(18.10) is the same as
(18.11)
x˙ = Eξ ,
ξ˙ = −Ex ,
so the goal is to construct E(x, ξ) such that
(18.12)
Ex (x, ξ) = −Fx (x, v),
Eξ (x, ξ) = v,
when v = v(x, ξ) is defined by inverting the transformation
(18.13)
(x, ξ) = (x, Fv (x, v)) = λ(x, v).
If we set
E b (x, v) = E(λ(x, v)),
(18.14)
then (18.12) is equivalent to

(18.15)   E^b_x(x, v) = −Fx + vFvx,   E^b_v(x, v) = vFvv,
as follows from the chain rule. This calculation is most easily performed using
differential forms. In the differential form notation, our task is to find E^b(x, v)
such that
(18.16)   dE^b = (−Fx + vFvx)dx + vFvv dv.
It can be seen by inspection that this identity is satisfied by
(18.17)   E^b(x, v) = Fv(x, v)v − F(x, v).
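To make the inspection explicit (a short check, using only dF = Fx dx + Fv dv and dFv = Fvx dx + Fvv dv):

   d(Fv v − F) = v dFv + Fv dv − Fx dx − Fv dv = v(Fvx dx + Fvv dv) − Fx dx = (−Fx + vFvx)dx + vFvv dv,

which is precisely the right side of (18.16).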
Thus the ODE (18.7) describing a stationary point for (18.1) has been converted
to a first order Hamiltonian system, in the (x, ξ) coordinates, given the hypothesis
(18.8) on Fvv . In view of (18.13), one often writes (18.17) informally as
E(x, ξ) = ξ · v − F (x, v).
We make some observations about the transformation λ of (18.13). If v ∈ Tx M,
then Fv (x, v) acts naturally as a linear functional on Tx M. In other words, ξ =
Fv (x, v) is naturally regarded as an element of Tx∗ M, in the cotangent bundle of M ;
it makes invariant sense to regard
(18.18)   λ : T M −→ T∗M
(if F is defined on all of T M ). This map is called the Legendre transformation. As
we have already noted, the hypothesis (18.8) is equivalent to the statement that λ
is a local diffeomorphism.
As an example, suppose M has a metric tensor g, and

   F(x, v) = (1/2) g(v, v).

Then the map (18.18) is the identification of T M and T∗M associated with 'lowering indices,' using the metric tensor gjk. A straightforward calculation gives E(x, ξ) equal to half the natural square norm on cotangent vectors, in this case. On the other hand, the function F(x, v) = √(g(v, v)) fails to satisfy hypothesis (18.8). Since this is the integrand for arc length, it is important to incorporate this case into our analysis. This case involves parametrizing a curve by arc length. We will look at the following more general situation.
We say F(x, v) is homogeneous of degree r in v if F(x, cv) = c^r F(x, v) for c > 0. Thus √(g(v, v)) above is homogeneous of degree 1. When F is homogeneous of degree 1, hypothesis (18.8) is never satisfied. Furthermore, I(u) is independent of the parametrization of a curve in this case; if σ : [a, b] → [a, b] is a diffeomorphism (fixing a and b) then I(u) = I(ũ) for ũ(t) = u(σ(t)). Let us look at a function f(x, v) related to F(x, v) by

(18.19)   f(x, v) = ψ(F(x, v)),   F(x, v) = ϕ(f(x, v)).
Given a family us of curves as before, we can write

(18.20)   (d/ds)I(us)|s=0 = ∫_a^b [ ϕ′(f(u, u̇)) fx(u, u̇) − (d/dt){ϕ′(f(u, u̇)) fv(u, u̇)} ] w dt.
If u satisfies the condition

(18.21)   f(u, u̇) = c,
with c constant, this is equal to
(18.22)   c′ ∫_a^b [ fx(u, u̇) − (d/dt)fv(u, u̇) ] w dt,

with c′ = ϕ′(c). Of course, setting

(18.23)   J(u) = ∫_a^b f(u, u̇) dt,
we have
(18.24)   (d/ds)J(us)|s=0 = ∫_a^b [ fx(u, u̇) − (d/dt)fv(u, u̇) ] w dt.
Consequently, if u satisfies (18.21), then u is stationary for I if and only if u is stationary for J (provided ϕ′(c) ≠ 0).
It is possible that f(x, v) satisfies (18.8) even though F(x, v) does not, as the case F(x, v) = √(g(v, v)) illustrates. Note that

   fvj vk = ψ′(F) Fvj vk + ψ″(F) Fvj Fvk.
Let us specialize to the case ψ(F) = F^2, so f(x, v) = F(x, v)^2 is homogeneous of degree 2. If F is convex in v and (Fvj vk), a positive semidefinite matrix, annihilates only radial vectors, and if F > 0, then f(x, v) is strictly convex, i.e., fvv is positive definite, and hence (18.8) holds for f(x, v). This is the case when F(x, v) = √(g(v, v)) is the arc length integrand.
If f(x, v) = F(x, v)^2 satisfies (18.8), then the stationary condition for (18.23) is that u satisfy the ODE

   fvv(u, u̇)ü + fvx(u, u̇)u̇ − fx(u, u̇) = 0,
a nonsingular ODE for which we know there is a unique local solution, with u(a) = p, u̇(a) given. We will be able to say such a solution is also stationary for (18.1) once we know that (18.21) holds, i.e., f(u, u̇) is constant. Indeed, if f(x, v) is homogeneous of degree two, then fv(x, v)v = 2f(x, v), and hence

(18.25)   e^b(x, v) = fv(x, v)v − f(x, v) = f(x, v).

But since the equations for u take Hamiltonian form in the coordinates (x, ξ) = (x, fv(x, v)), it follows that e^b(u(t), u̇(t)) is constant for u stationary, so (18.21) does hold in this case.
In particular, the equation for a geodesic, i.e., for a critical point of the length functional

   ℓ(u) = ∫_a^b √(g(u̇, u̇)) dt,

with constant speed, is the same as the equation for a critical point of the functional

   E(u) = ∫_a^b g(u̇, u̇) dt.
We will write out the ODE explicitly below; see (18.32)-(18.34).
There is a general principle, known as the stationary action principle, or Hamilton’s principle, for producing equations of mathematical physics. In this setup, the
state of a physical system at a given time is described by a pair (x, v), position and
velocity. One has a kinetic energy function T (x, v) and a potential energy function
V (x, v), determining the dynamics, as follows. Form the difference
(18.26)   L(x, v) = T(x, v) − V(x, v),
known as the Lagrangian. Hamilton’s principle states that a path u(t) describing
the evolution of the state in this system is a stationary path for the action integral
(18.27)   I(u) = ∫_a^b L(u, u̇) dt.
In many important cases, the potential V = V(x) is velocity independent and T(x, v) is a quadratic form in v; say T(x, v) = (1/2) v · G(x)v for a symmetric matrix G(x). In that case, we consider

(18.28)   L(x, v) = (1/2) v · G(x)v − V(x).
Thus we have
(18.29)   ξ = Lv(x, v) = G(x)v,
and the conserved quantity (18.17) becomes

(18.30)   E^b(x, v) = v · G(x)v − [ (1/2) v · G(x)v − V(x) ] = (1/2) v · G(x)v + V(x),
which is the total energy T(x, v) + V(x). Note that the nondegeneracy condition is that G(x) be invertible; assuming this, we have

(18.31)   E(x, ξ) = (1/2) ξ · G(x)^{−1} ξ + V(x),
whose Hamiltonian vector field defines the dynamics. Note that, in this case, Lagrange’s equation (18.6) takes the form
(18.32)   (d/dt)[G(u)u̇] = (1/2) u̇ · Gx(u)u̇ − Vx(u),
which can be rewritten as
(18.33)   ü + Γu̇u̇ + G(u)^{−1} Vx(u) = 0,
where Γu̇u̇ is a vector whose ℓth component is

(18.34)   Γ^ℓ_jk u̇_j u̇_k,   Γ^ℓ_jk = (1/2) g^{ℓm} ( ∂gkm/∂xj + ∂gjm/∂xk − ∂gjk/∂xm ).

Here, (g^{ℓm}) is the matrix G(x)^{−1}, and we use the 'summation convention', i.e., we sum over repeated indices (here j, k, and m).
The equation (18.33) contains as a special case the geodesic equation for the
Riemannian metric (gjk ) = G(x), which is what would arise in the case V = 0.
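As a concrete cross-check (a minimal sketch, not part of the text, assuming Python with sympy), one can compute the symbols (18.34) for the polar-coordinate metric dr^2 + r^2 dθ^2 of Exercise 4 below:

   import sympy as sp

   r, th = sp.symbols('r theta', positive=True)
   x = [r, th]
   G = sp.Matrix([[1, 0], [0, r**2]])    # metric tensor (gjk) of dr^2 + r^2 dθ^2
   Ginv = G.inv()                        # (g^{lm}) = G(x)^{-1}

   def Gamma(l, j, k):
       # (18.34): Γ^l_jk = (1/2) g^{lm} (∂gkm/∂xj + ∂gjm/∂xk − ∂gjk/∂xm), summed over m
       return sp.simplify(sum(
           sp.Rational(1, 2) * Ginv[l, m]
           * (sp.diff(G[k, m], x[j]) + sp.diff(G[j, m], x[k]) - sp.diff(G[j, k], x[m]))
           for m in range(2)))

   print(Gamma(0, 1, 1), Gamma(1, 0, 1))   # -r and 1/r

With these symbols, (18.33) with V = 0 gives the familiar geodesic equations r̈ − r θ̇^2 = 0 and θ̈ + (2/r) ṙ θ̇ = 0.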
We refer to [Ar], [Go] for a discussion of the relation of Hamilton’s principle to
other formulations of the laws of Newtonian mechanics, but illustrate it briefly with
a couple of examples here.
Consider the basic case of motion of a particle in Euclidean space Rn, in the presence of a force field of potential type F(x) = −grad V(x), as in the beginning of §9. Then

(18.35)   T(x, v) = (1/2) m|v|^2,   V(x, v) = V(x).
This is of course the special case of (18.28) with G(x) = mI, and the ODE satisfied by stationary paths for (18.27) hence has the form

(18.36)   mü + Vx(u) = 0,
precisely the equation (10.2) expressing Newton’s law F = ma.
Next we consider one example where Cartesian coordinates are not used, namely the motion of a pendulum. See Fig.18.1 at the end of this section. We suppose a mass m is at the end of a (massless) rod of length ℓ, swinging under the influence of gravity. In this case, we can express the potential energy as

(18.37)   V(θ) = −mgℓ cos θ,
where θ is the angle the rod makes with the downward vertical ray, and g denotes the strength of gravity. The speed of the mass at the end of the pendulum is ℓ|θ̇|, so the kinetic energy is

(18.38)   T(θ, θ̇) = (1/2) mℓ^2 |θ̇|^2.

In this case we see that Hamilton's principle leads to the ODE

(18.39)   ℓθ̈ + g sin θ = 0,

describing the motion of a pendulum.
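A quick numerical illustration (a sketch, not part of the text, assuming Python with numpy and scipy, and the sample values ℓ = 1, g = 9.8): integrating (18.39) and checking that the conserved energy of (18.30), here E = (1/2)ℓ^2 θ̇^2 − gℓ cos θ per unit mass, stays constant along the flow.

   import numpy as np
   from scipy.integrate import solve_ivp

   g, ell = 9.8, 1.0

   def pendulum(t, y):                  # y = (θ, θ̇); (18.39): ℓθ̈ + g sin θ = 0
       return [y[1], -(g / ell) * np.sin(y[0])]

   sol = solve_ivp(pendulum, (0.0, 10.0), [1.0, 0.0], rtol=1e-10, atol=1e-10)
   E = 0.5 * ell**2 * sol.y[1]**2 - g * ell * np.cos(sol.y[0])
   print(E.max() - E.min())             # ~ 1e-8: constant up to solver error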
Exercises
1. Suppose that, more generally than (18.28), we have a Lagrangian of the form

   L(x, v) = (1/2) v · G(x)v + A(x) · v − V(x).

Show that (18.30) continues to hold, i.e.,

   E^b(x, v) = (1/2) v · G(x)v + V(x),

and that the Hamiltonian function becomes, in place of (18.31),

   E(x, ξ) = (1/2) (ξ − A(x)) · G(x)^{−1} (ξ − A(x)) + V(x).
Work out the modification to (18.33) when the extra term A(x) · v is included.
2. Work out the differential equations for a planar double pendulum, in the spirit of (18.36)-(18.38). See Fig.18.2.
Hint. To compute kinetic and potential energy, think of the plane as the complex plane, with the real axis pointing down. The position of particle 1 is ℓ1 e^{iθ1} and that of particle 2 is ℓ1 e^{iθ1} + ℓ2 e^{iθ2}.
3. The statement before (18.4), that any smooth curve u(s) on M can be enclosed by a single coordinate patch, is not strictly accurate, as the curve may have self-intersections. Give a more precise statement.
4. Show that, if one uses polar coordinates, x = r cos θ, y = r sin θ, the Euclidean metric tensor ds^2 = dx^2 + dy^2 becomes dr^2 + r^2 dθ^2. Use this together with (18.28)-(18.31) to explain the calculation (10.11), in §10.
19. The symplectic form; canonical transformations
Recall from §9 that a Hamiltonian vector field on a region Ω ⊂ R^{2n}, with coordinates ζ = (x, ξ), is a vector field of the form

(19.1)   Hf = Σ_{j=1}^n [ (∂f/∂ξj) ∂/∂xj − (∂f/∂xj) ∂/∂ξj ].
We want to gain an understanding of Hamiltonian vector fields, free from coordinates. In particular, we ask the following question. Let F : O → Ω be a diffeomorphism, Hf a Hamiltonian vector field on Ω. Under what condition on F is F# Hf a
Hamiltonian vector field on O?
A central object in this study is the symplectic form, a 2-form on R^{2n} defined by

(19.2)   σ = Σ_{j=1}^n dξj ∧ dxj.
Note that, if

   U = Σ [ u^j(ζ) ∂/∂xj + aj(ζ) ∂/∂ξj ],   V = Σ [ v^j(ζ) ∂/∂xj + bj(ζ) ∂/∂ξj ],

then

(19.3)   σ(U, V) = Σ_{j=1}^n [ −u^j(ζ)bj(ζ) + aj(ζ)v^j(ζ) ].
In particular, σ satisfies the following nondegeneracy condition: if U has the property that, for some (x0 , ξ0 ) ∈ R2n , σ(U, V ) = 0 at (x0 , ξ0 ) for all vector fields
V, then U must vanish at (x0 , ξ0 ). The relation between the symplectic form and
Hamiltonian vector fields is the following.
Proposition 19.1. The vector field Hf is uniquely determined by the identity

(19.4)   σ⌋Hf = −df.
Proof. The content of the identity is

(19.5)   σ(Hf, V) = −V f
for any smooth vector field V. If V has the form used in (14.3), then that identity gives

   σ(Hf, V) = − Σ_{j=1}^n [ (∂f/∂ξj)bj(ζ) + (∂f/∂xj)v^j(ζ) ],
which coincides with the right side of (19.5). In view of the nondegeneracy of σ, the proposition is proved. Note the special case

(19.6)   σ(Hf, Hg) = {f, g}.
The following is an immediate corollary.
Proposition 19.2. If O, Ω are open in R^{2n}, and F : O → Ω is a diffeomorphism preserving σ, i.e., satisfying

(19.7)   F∗σ = σ,

then for any f ∈ C∞(Ω), F# Hf is Hamiltonian on O, and

(19.8)   F# Hf = HF∗f,

where F∗f(y) = f(F(y)).
A diffeomorphism satisfying (19.7) is called a canonical transformation, or a symplectic transformation. Let us now look at the condition on a vector field X on Ω that the flow F^t_X generated by X preserve σ for each t. There is a simple general condition in terms of the Lie derivative for a given form to be preserved.

Lemma 19.3. Let α ∈ Λ^k(Ω). Then (F^t_X)∗α = α for all t if and only if LX α = 0.
Proof. This is an immediate consequence of (14.28).
Recall the formula (14.20):

(19.9)   LX α = d(α⌋X) + (dα)⌋X.
We apply it in the case α = σ, the symplectic form. Clearly (19.2) implies

(19.10)   dσ = 0,

so

(19.11)   LX σ = d(σ⌋X).
Consequently, F^t_X preserves the symplectic form σ if and only if d(σ⌋X) = 0 on Ω. In view of Poincare's Lemma, at least locally, one has a smooth function f(x, ξ) such that

(19.12)   σ⌋X = df,

provided d(σ⌋X) = 0. Any two f's satisfying (19.12) must differ by a constant, and it follows that such f exists globally provided Ω is simply connected. In view of Proposition 19.1, (19.12) is equivalent to the identity

(19.13)   X = −Hf.

In particular, we have established:
Proposition 19.4. The flow generated by a Hamiltonian vector field Hf preserves
the symplectic form σ.
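As a numerical illustration of Proposition 19.4 (a sketch, not part of the text, assuming Python with numpy and scipy, and the sample Hamiltonian f(x, ξ) = ξ^2/2 − cos x on R^2): the time-1 flow map Φ of Hf should satisfy (DΦ)^t S (DΦ) = S, where S is the matrix of σ in the coordinates (x, ξ).

   import numpy as np
   from scipy.integrate import solve_ivp

   def flow(z, t=1.0):                      # flow of ẋ = ∂f/∂ξ = ξ, ξ̇ = −∂f/∂x = −sin x
       rhs = lambda s, y: [y[1], -np.sin(y[0])]
       return solve_ivp(rhs, (0.0, t), z, rtol=1e-12, atol=1e-12).y[:, -1]

   z0, h = np.array([0.5, 0.3]), 1e-6       # base point; step for finite differences
   DPhi = np.column_stack([(flow(z0 + h*e) - flow(z0 - h*e)) / (2*h) for e in np.eye(2)])
   S = np.array([[0.0, -1.0], [1.0, 0.0]])  # σ(U, V) = U^t S V, from (19.3) with n = 1
   print(DPhi.T @ S @ DPhi)                 # ≈ S: the flow preserves σ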
It follows a fortiori that the flow F^t generated by a Hamiltonian vector field Hf leaves invariant the 2n-form

   v = σ ∧ ··· ∧ σ   (n factors),

which provides a volume form on Ω. That this volume form is preserved is known as a theorem of Liouville. This result has the following refinement. Let S be a level surface of the function f; suppose f is non-degenerate on S. Then we can define a (2n − 1)-form w on S (giving rise to a volume element on S) which is also invariant under the flow F^t, as follows. Let X be any vector field on Ω such that Xf = 1 on S, and define

(19.14)   w = j∗(v⌋X),

where j : S ↪ Ω is the natural inclusion. We claim this is well defined.
Lemma 19.5. The form (19.14) is independent of the choice of X, as long as Xf = 1 on S.

Proof. The difference of two such forms is j∗(v⌋Y1), where Y1 f = 0 on S, i.e., Y1 is tangent to S. Now this form, acting on vectors Y2, . . . , Y2n, all tangent to S, is merely (j∗v)(Y1, . . . , Y2n); but obviously j∗v = 0, since dim S < 2n.
We can now establish the invariance of the form w on S.

Proposition 19.6. The form (19.14) is invariant under the flow F^t on S.

Proof. Since v is invariant under F^t, we have

   F^{t∗}w = j∗(F^{t∗}v ⌋ F^t_# X) = j∗(v ⌋ F^t_# X) = w + j∗( v ⌋ (F^t_# X − X) ).

Since F^{t∗}f = f, we see that (F^t_# X)f = 1 = Xf, so the last term vanishes, by Lemma 19.5, and the proof is complete.
Let O ⊂ Rn be open; we claim the symplectic form σ is well defined on T∗O = O × Rn, in the following sense. Suppose g : O → Ω is a diffeomorphism, i.e., a coordinate change. The map this induces from T∗O to T∗Ω is

(19.15)   G(x, ξ) = ( g(x), ((Dg)^t)^{−1}(x)ξ ) = (y, η).

Our invariance result is:

(19.16)   G∗σ = σ.
In fact, a stronger result is true. We can write

(19.17)   σ = dκ,   κ = Σ_j ξj dxj,

where the 1-form κ is called the contact form. We claim that

(19.18)   G∗κ = κ,
which implies (19.16), since G∗dκ = dG∗κ. To see (19.18), note that

   dyj = Σ_k (∂gj/∂xk) dxk,   ηj = Σ_ℓ Hjℓ ξℓ,

where (Hjℓ) is the matrix of ((Dg)^t)^{−1}, i.e., the inverse matrix of (∂gℓ/∂xj). Hence

(19.19)   Σ_j ηj dyj = Σ_{j,k,ℓ} (∂gj/∂xk) Hjℓ ξℓ dxk = Σ_{k,ℓ} δkℓ ξℓ dxk = Σ_k ξk dxk,

which establishes (19.18).
As a particular case, a vector field Y on O, generating a flow F^t_Y on O, induces a flow G^t_Y on T∗O. Not only does this flow preserve the symplectic form; in fact G^t_Y is generated by the Hamiltonian vector field HΦ, where

(19.20)   Φ(x, ξ) = ⟨Y(x), ξ⟩ = Σ_j ξj v^j(x)

if Y = Σ v^j(x) ∂/∂xj.
The symplectic form given by (19.2) can be regarded as a special case of a general
symplectic form, which is a closed nondegenerate 2-form on a domain (or manifold)
Ω. Often such a form ω arises naturally, not a priori looking like (19.2). It is a
theorem of Darboux that locally one can pick coordinates in such a fashion that
ω does take the standard form (19.2). We present a short proof, due to Moser, of
that theorem.
To start, pick p ∈ Ω, and consider B = ω(p), a nondegenerate, antisymmetric, bilinear form on the vector space V = TpΩ. It is a simple exercise in linear algebra that if one has such a form, then dim V must be even, say 2n, and V has a basis {ej, fj : 1 ≤ j ≤ n} such that

(19.21)   B(ej, eℓ) = B(fj, fℓ) = 0,   B(ej, fℓ) = δjℓ

for 1 ≤ j, ℓ ≤ n. Using such a basis to impose linear coordinates (x, ξ) on a neighborhood of p, taken to the origin, we have ω = ω0 = Σ dξj ∧ dxj at p. Thus Darboux' Theorem follows from:
Proposition 19.7. If ω and ω0 are closed nondegenerate 2-forms on Ω, and ω = ω0 at p ∈ Ω, then there is a diffeomorphism G1 defined on a neighborhood of p, such that

(19.22)   G1(p) = p and G1∗ω = ω0.
Proof. For t ∈ [0, 1], let

(19.23)   ωt = (1 − t)ω0 + tω = ω0 + tα,   α = ω − ω0.
Thus α = 0 at p, and α is a closed 2-form. We can hence write

(19.24)   α = dβ
on a neighborhood of p, and if β is given by the formula (14.32) in the proof of the
Poincare Lemma, we have β = 0 at p. Since, for each t, ωt = ω at p, we see that
each ωt is nondegenerate on some common neighborhood of p, for t ∈ [0, 1].
Our strategy will be to produce a smooth family of local diffeomorphisms Gt, 0 ≤ t ≤ 1, such that Gt(p) = p, G0 = id., and such that Gt∗ωt is independent of t, hence Gt∗ωt = ω0. Gt will be specified by a time-varying family of vector fields, via the ODE

(19.25)   (d/dt)Gt(x) = Xt(Gt(x)),   G0(x) = x.
We will have Gt(p) = p provided Xt(p) = 0. To arrange that Gt∗ωt is independent of t, note that, by the product rule,

(19.26)   (d/dt)Gt∗ωt = Gt∗ LXt ωt + Gt∗ (dωt/dt).
By (19.23), dωt/dt = α = dβ, and by Proposition 14.1,

(19.27)   LXt ωt = d(ωt⌋Xt)
since ωt is closed. Thus we can write (19.26) as

(19.28)   (d/dt)Gt∗ωt = Gt∗ d(ωt⌋Xt + β).
This vanishes provided Xt is defined to satisfy

(19.29)   ωt⌋Xt = −β.
Since ωt is nondegenerate near p, this does indeed uniquely specify a vector field
Xt near p, for each t ∈ [0, 1], which vanishes at p, since β = 0 at p. The proof of
Darboux’ Theorem is complete.
Exercises
1. Do the linear algebra exercise stated before Proposition 19.7, as a preparation
for the proof of Darboux’ Theorem.
2. On R^2, identify (x, ξ) with (x, y), so the symplectic form is σ = dy ∧ dx. Show that

   X = f ∂/∂x + g ∂/∂y and α = g dx − f dy

are related by

   α = σ⌋X.
Reconsider problems 7-9 of §14 in light of this.
3. Show that the volume form w on the level surface S of f, given by (19.14), can be characterized as follows. Let Sh be the level set {f(x, ξ) = c + h}, S = S0. Given any vector field X transversal to S, and any open set O ⊂ S with smooth boundary, let Õh be the thin set sandwiched between S and Sh, lying on orbits of X through O. Then, with v = σ ∧ ··· ∧ σ the volume form on Ω,

   ∫_O w = lim_{h→0} (1/h) ∫_{Õh} v.
20. Topological applications of differential forms
Differential forms are a fundamental tool in calculus. In addition, they have
important applications to topology. We give a few here, starting with simple proofs
of some important topological results of Brouwer.
Proposition 20.1. There is no continuous retraction ϕ : B → S n−1 of the closed
unit ball B in Rn onto its boundary S n−1 .
In fact, it is just as easy to prove the following more general result. The approach
we use is adapted from [Kan].
Proposition 20.2. If M is a compact oriented surface with nonempty boundary
∂M, there is no continuous retraction ϕ : M → ∂M.
Proof. A retraction ϕ satisfies ϕ ◦ j(x) = x, where j : ∂M ↪ M is the natural inclusion. By a simple approximation, if there were a continuous retraction there would be a smooth one, so we can suppose ϕ is smooth. Pick ω ∈ Λ^{n−1}(∂M) to be the volume form on ∂M, endowed with some Riemannian metric (n = dim M), so ∫_{∂M} ω > 0. Now apply Stokes' theorem to α = ϕ∗ω. If ϕ is a retraction, j∗ϕ∗ω = ω, so we have

(20.1)   ∫_{∂M} ω = ∫_M dϕ∗ω.
But dϕ∗ ω = ϕ∗ dω = 0, so the integral (20.1) is zero. This is a contradiction, so
there can be no retraction.
A simple consequence of this is the famous Brouwer Fixed Point Theorem.
Theorem 20.3. If F : B → B is a continuous map on the closed unit ball in Rn ,
then F has a fixed point.
Proof. We are claiming that F (x) = x for some x ∈ B. If not, define ϕ(x) to be the
endpoint of the ray from F (x) to x, continued until it hits ∂B = S n−1 . It is clear
that ϕ would be a retraction, contradicting Proposition 20.1.
We next show that an even dimensional sphere cannot have a smooth nonvanishing vector field.
Proposition 20.4. There is no smooth nonvanishing vector field on S n if n = 2k
is even.
Proof. If X were such a vector field, we could arrange it to have unit length, so we would have X : S^n → S^n with X(v) ⊥ v for v ∈ S^n ⊂ R^{n+1}. Thus there is a unique unit speed geodesic γv from v to X(v), of length π/2. Define a smooth family of maps Ft : S^n → S^n by Ft(v) = γv(t). Thus F0(v) = v, F_{π/2}(v) = X(v), and Fπ = A would be the antipodal map, A(v) = −v. By (14.34), we deduce that A∗ω − ω = dβ is exact, where ω is the volume form on S^n. Hence, by Stokes' theorem,

(20.2)   ∫_{S^n} A∗ω = ∫_{S^n} ω.

On the other hand, it is straightforward that A∗ω = (−1)^{n+1} ω, so (20.2) is possible only when n is odd.
Note that an important ingredient in the proof of both Proposition 20.2 and
Proposition 20.4 is the existence of n-forms on a compact oriented n-dimensional
surface M which are not exact (though of course they are closed). We next establish
the following important counterpoint to the Poincare lemma.
Proposition 20.5. If M is a compact connected oriented surface of dimension n and α ∈ Λ^n M, then α = dβ for some β ∈ Λ^{n−1}(M) if and only if

(20.3)   ∫_M α = 0.
We have already discussed the necessity of (20.3). To prove the sufficiency, we
first look at the case M = S n .
In that case, any n-form α is of the form a(x)ω, a ∈ C ∞ (S n ), ω the volume form
on S n , with its standard metric. The group G = SO(n + 1) of rotations of Rn+1
acts as a transitive group of isometries on S n . In §12 we constructed the integral
of functions over SO(n + 1), with respect to Haar measure.
As noted in §12, we have the map Exp:Skew(n + 1) → SO(n + 1), giving a
diffeomorphism from a ball O about 0 in Skew(n + 1) onto an open set U ⊂ SO(n +
1) = G, a neighborhood of the identity. Since G is compact, we can pick a finite
number of elements ξj ∈ G such that the open sets Uj = {ξj g : g ∈ U } cover G.
Pick ηj ∈ Skew(n + 1) such that Exp ηj = ξj . Define Φjt : Uj → G for 0 ≤ t ≤ 1 by
(20.4)   Φjt( ξj Exp(A) ) = (Exp tηj)(Exp tA),   A ∈ O.
Now partition G into subsets Ωj, each of whose boundaries has content zero, such that Ωj ⊂ Uj. If g ∈ Ωj, set g(t) = Φjt(g). This family of elements of SO(n + 1) defines a family of maps Fgt : S^n → S^n. Now, as in (14.31) we have,

(20.5)   α = g∗α − dκg(α),   κg(α) = ∫_0^1 Fgt∗(α⌋Xgt) dt,
for each g ∈ SO(n + 1), where Xgt is the family of vector fields on S^n generated by Fgt, as in (14.29). Therefore,

(20.6)   α = ∫_G g∗α dg − d ∫_G κg(α) dg.
Now the first term on the right is equal to ᾱω, where ᾱ = ∫ a(g · x) dg is a constant; in fact, the constant is

(20.7)   ᾱ = (1/Vol S^n) ∫_{S^n} α.

Thus in this case (20.3) is precisely what serves to make (20.6) a representation of α as an exact form. This finishes the case M = S^n.
For a general compact, oriented, connected M, proceed as follows. Cover M with open sets O1, . . . , OK such that each Oj is diffeomorphic to the closed unit ball in R^n. Set U1 = O1, and inductively enlarge each Oj to Uj, so that Ūj is also diffeomorphic to the closed ball, and such that Uj+1 ∩ Uj ≠ ∅, 1 ≤ j < K. You can do this by drawing a simple curve from Oj+1 to a point in Uj and thickening it. Pick a smooth partition of unity ϕj, subordinate to this cover.
Given α ∈ Λ^n M, satisfying (20.3), take α̃j = ϕj α. Most likely ∫ α̃1 = c1 ≠ 0, so take σ1 ∈ Λ^n M, with compact support in U1 ∩ U2, such that ∫ σ1 = c1. Set α1 = α̃1 − σ1, and redefine α̃2 to be the old α̃2 plus σ1. Make a similar construction using ∫ α̃2 = c2, and continue. When you are done, you have

(20.8)   α = α1 + ··· + αK,

with αj compactly supported in Uj. By construction,

(20.9)   ∫ αj = 0

for 1 ≤ j < K. But then (20.3) implies ∫ αK = 0 too.
Now pick p ∈ S^n and define smooth maps

(20.10)   ψj : M −→ S^n

which map Uj diffeomorphically onto S^n \ p, and map M \ Uj to p. There is a unique vj ∈ Λ^n S^n, with compact support in S^n \ p, such that ψj∗vj = αj. Clearly

   ∫_{S^n} vj = 0,

so by the case M = S^n of Proposition 20.5 already established, we know that vj = dwj for some wj ∈ Λ^{n−1} S^n, and then

(20.11)   αj = dβj,   βj = ψj∗wj.

This concludes the proof.
We can sharpen and extend some of the topological results given above, using
the notion of the degree of a map between compact oriented surfaces. Let X and
Y be compact oriented n-dimensional surfaces. We want to define the degree of a
smooth map F : X → Y. To do this, assume Y is connected. We pick ω ∈ Λn Y
such that

(20.12)   ∫_Y ω = 1.
We want to define

(20.13)   Deg(F) = ∫_X F∗ω.
The following result shows that Deg(F ) is indeed well defined by this formula. The
key argument is an application of Proposition 20.5.
Lemma 20.6. The quantity (20.13) is independent of the choice of ω, as long as
(20.12) holds.
Proof. Pick ω1 ∈ Λ^n Y satisfying ∫_Y ω1 = 1, so ∫_Y (ω − ω1) = 0. By Proposition 20.5, this implies

(20.14)   ω − ω1 = dα, for some α ∈ Λ^{n−1} Y.
Thus

(20.15)   ∫_X F∗ω − ∫_X F∗ω1 = ∫_X dF∗α = 0,

and the lemma is proved.
The following is a most basic property.
Proposition 20.7. If F0 and F1 are homotopic, then Deg(F0 ) = Deg(F1 ).
Proof. As noted in problem 6 of §14, if F0 and F1 are homotopic, then F0∗ω − F1∗ω is exact, say dβ, and of course ∫_X dβ = 0.
We next give an alternative formula for the degree of a map, which is very useful
in many applications. A point y0 ∈ Y is called a regular value of F, provided that,
for each x ∈ X satisfying F (x) = y0 , DF (x) : Tx X → Ty0 Y is an isomorphism. The
easy case of Sard’s Theorem, discussed in Appendix B, implies that most points in
Y are regular. Endow X with a volume element ωX , and similarly endow Y with
ωY . If DF (x) is invertible, define JF (x) ∈ R \ 0 by F ∗ (ωY ) = JF (x)ωX . Clearly
the sign of JF (x), i.e., sgn JF (x) = ±1, is independent of choices of ωX and ωY ,
as long as they determine the given orientations of X and Y.
Proposition 20.8. If y0 is a regular value of F, then

(20.16)   Deg(F) = Σ { sgn JF(xj) : F(xj) = y0 }.
Proof. Pick ω ∈ Λ^n Y, satisfying (20.12), with support in a small neighborhood of y0. Then F∗ω will be a sum Σ ωj, with ωj supported in a small neighborhood of xj, and ∫ ωj = ±1 as sgn JF(xj) = ±1.
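As a concrete illustration of (20.16) (a sketch, not part of the text; the map below is a hypothetical example, assuming Python with numpy): for a smooth map of S^1 = R/(2πZ), locate the preimages of a regular value and sum the signs of the derivative there.

   import numpy as np

   F  = lambda t: 2*t + 0.9*np.sin(3*t)    # a degree-2 map of S^1 = R/(2πZ)
   dF = lambda t: 2 + 2.7*np.cos(3*t)      # JF = F'; not everywhere positive
   y0 = 1.0                                # a regular value

   # scan for solutions of F(t) ≡ y0 (mod 2π); g vanishes exactly at the preimages
   t = np.linspace(0.0, 2*np.pi, 200001)
   g = (F(t) - y0 + np.pi) % (2*np.pi) - np.pi
   cross = (np.sign(g[:-1]) * np.sign(g[1:]) < 0) & (np.abs(g[:-1]) < 1.0)
   print(sum(int(np.sign(dF(r))) for r in t[:-1][cross]))   # Deg(F) = 2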
The following result is a powerful tool in degree theory.

Proposition 20.9. Let M be a compact oriented surface with boundary, dim M = n + 1. Given a smooth map F : M → Y, let f = F|_{∂M} : ∂M → Y. Then

   Deg(f) = 0.
Proof. Applying Stokes' Theorem to α = F∗ω, we have

   ∫_{∂M} f∗ω = ∫_M dF∗ω.

But dF∗ω = F∗dω, and dω = 0 if dim Y = n, so we are done.
An easy corollary of this is Brouwer's no-retraction theorem. Compare the proof of Proposition 20.2.

Corollary 20.10. If M is a compact oriented surface with nonempty boundary ∂M, then there is no smooth retraction ϕ : M → ∂M.

Proof. Without loss of generality, we can assume M is connected. If there were a retraction, then ∂M = ϕ(M) must also be connected, so Proposition 20.9 applies. But then we would have, for the map id. = ϕ|_{∂M}, the contradiction that its degree is both zero and 1.
For another application of degree theory, let X be a compact smooth oriented hypersurface in R^{n+1}, and set Ω = R^{n+1} \ X. Given p ∈ Ω, define

(20.17)   Fp : X −→ S^n,   Fp(x) = (x − p)/|x − p|.

It is clear that Deg(Fp) is constant on each connected component of Ω. It is also easy to see that, when p crosses X, Deg(Fp) jumps by ±1. Thus Ω has at least two connected components. This is most of the smooth case of the Jordan-Brouwer separation theorem:
Theorem 20.11. If X is a smooth compact oriented hypersurface of R^{n+1}, which is connected, then Ω = R^{n+1} \ X has exactly 2 connected components.

Proof. X being oriented, it has a smooth global normal vector field. Use this to separate a small collar neighborhood C of X into 2 pieces; C \ X = C0 ∪ C1. The collar C is diffeomorphic to [−1, 1] × X, and each Cj is clearly connected. It suffices to show that any connected component O of Ω intersects either C0 or C1. Take p ∈ ∂O. If p ∉ X, then p ∈ Ω, which is open, so p cannot be a boundary point of any component of Ω. Thus ∂O ⊂ X, so O must intersect a Cj. This completes the proof.
Let us note that, of the two components of Ω, exactly one is unbounded, say Ω0, and the other is bounded; call it Ω1. Then we claim

(20.18)   p ∈ Ωj =⇒ Deg(Fp) = j.

Indeed, for p very far from X, Fp : X → S^n is not onto, so its degree is 0. And when p crosses X, from Ω0 to Ω1, the degree jumps by +1.
For a simple closed curve in R2 , this result is the smooth case of the Jordan curve
theorem. That special case of the argument given above can be found in [Sto].
We remark that, with a bit more work, one can show that any compact smooth
hypersurface in Rn+1 is orientable. For one proof, see appendix B in Chapter 5 of
[T1].
The next application of degree theory is useful in the study of closed orbits of planar vector fields. Let C be a simple smooth closed curve in R^2, parametrized by arc-length, of total length L. Say C is given by x = γ(t), γ(t + L) = γ(t). Then we have a unit tangent field to C, T(γ(t)) = γ′(t), defining

(20.19)   T : C −→ S^1.

Proposition 20.12. For T given by (20.19), we have

(20.20)   Deg(T) = 1.
Proof. Pick a tangent line ℓ to C such that C lies on one side of ℓ, as in Fig.20.1. Without changing Deg(T), you can flatten out C a little, so it intersects ℓ along a line segment, from γ(L0) to γ(L) = γ(0), where we take L0 = L − 2ε, L1 = L − ε. Now T is close to the map Ts : C → S^1, given by

(20.21)   Ts(γ(t)) = (γ(t + s) − γ(t)) / |γ(t + s) − γ(t)|,

for any s > 0 small enough; hence T and such Ts are homotopic; hence T and Ts are homotopic for all s ∈ (0, L). Furthermore, we can even let s = s(t) be any continuous function s : [0, L] → (0, L), such that s(0) = s(L). In particular, T is homotopic to the map V : C → S^1, obtained from (20.21) by taking

   s(t) = L1 − t, for t ∈ [0, L0],

and s(t) going monotonically from L1 − L0 to L1 for t ∈ [L0, L]. Note that

   V(γ(t)) = (γ(L1) − γ(t)) / |γ(L1) − γ(t)|,   0 ≤ t ≤ L0.

The parts of V over the ranges 0 ≤ t ≤ L0 and L0 ≤ t ≤ L, respectively, are illustrated in Figures 20.1 and 20.2. We see that V maps the segment of C from γ(0) to γ(L0) into the lower half of the circle S^1, and it maps the segment of C from γ(L0) to γ(L) into the upper half of the circle S^1. Therefore V (hence T) is homotopic to a one-to-one map of C onto S^1, preserving orientation, and (20.20) is proved.
Exercises
1. Show that the identity map I : X → X has degree 1.
2. Show that, if F : X → Y is not onto, then Deg(F ) = 0.
3. If A : S n → S n is the antipodal map, show that Deg(A) = (−1)n−1 .
4. Show that the homotopy invariance property given in Proposition 20.7 can be
deduced as a corollary of Proposition 20.9.
Hint. Take M = X × [0, 1].
5. Let p(z) = z^n + a_{n−1} z^{n−1} + ··· + a1 z + a0 be a polynomial of degree n ≥ 1. Show that, if we identify S^2 ≈ C ∪ {∞}, then p : C → C has a unique continuous extension p̃ : S^2 → S^2, with p̃(∞) = ∞. Show that

   Deg p̃ = n.

Deduce that p̃ : S^2 → S^2 is onto, and hence that p : C → C is onto. In particular, each nonconstant polynomial in z has a complex root. This result is the Fundamental Theorem of Algebra.
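One can watch this degree numerically (a sketch, not part of the text; the cubic below is a hypothetical example, assuming Python with numpy): on a large circle |z| = R, the image curve p(z) winds n times around 0.

   import numpy as np

   p = np.polynomial.polynomial.Polynomial([-1.0, 2.0, 0.0, 1.0])  # p(z) = z^3 + 2z - 1
   t = np.linspace(0.0, 2*np.pi, 20000, endpoint=False)
   w = p(50.0 * np.exp(1j*t))                      # p on the circle |z| = 50
   winding = np.angle(np.roll(w, -1) / w).sum() / (2*np.pi)
   print(round(winding))                           # 3 = deg p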
21. Critical points and index of a vector field
A critical point of a vector field V is a point where V vanishes. Let V be a vector field defined on a neighborhood O of p ∈ R^n, with a single critical point, at p. Then, for any small ball Br about p, Br ⊂ O, we have a map

(21.1)   Vr : ∂Br → S^{n−1},   Vr(x) = V(x)/|V(x)|.
The degree of this map is called the index of V at p, denoted indp(V); it is clearly independent of r. If V has a finite number of critical points, then the index of V is defined to be

(21.2)   Index(V) = Σ indpj(V).
If ψ : O → O′ is an orientation preserving diffeomorphism, taking p to p and V to W, then we claim

(21.3)   indp(V) = indp(W).
In fact, Dψ(p) is an element of GL(n, R) with positive determinant, so it is homotopic to the identity, and from this it readily follows that Vr and Wr are homotopic
maps of ∂Br → S n−1 . Thus one has a well defined notion of the index of a vector
field with a finite number of critical points on any oriented surface M.
A vector field V on O ⊂ Rn is said to have a non-degenerate critical point at p
provided DV (p) is a nonsingular n × n matrix. The following formula is convenient.
Proposition 21.1. If V has a nondegenerate critical point at p, then

(21.4)   indp(V) = sgn det DV(p).
Proof. If p is a nondegenerate critical point, and we set ψ(x) = DV (p)x, ψr (x) =
ψ(x)/|ψ(x)|, for x ∈ ∂Br , it is readily verified that ψr and Vr are homotopic, for
r small. The fact that Deg(ψr ) is given by the right side of (21.4) is an easy
consequence of Proposition 20.8.
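A quick numerical check of (21.4) (a sketch, not part of the text; the planar fields below are hypothetical examples, assuming Python with numpy): compute indp(V) as the winding number of V/|V| around a small circle about the critical point.

   import numpy as np

   def index_at_origin(V, r=0.1, n=4000):
       t = np.linspace(0.0, 2*np.pi, n, endpoint=False)
       vx, vy = V(r*np.cos(t), r*np.sin(t))
       w = vx + 1j*vy                              # V along ∂Br, as complex numbers
       return round(np.angle(np.roll(w, -1) / w).sum() / (2*np.pi))

   print(index_at_origin(lambda x, y: (x, -y)))    # saddle, det DV = -1: index -1
   print(index_at_origin(lambda x, y: (x, y)))     # source, det DV = +1: index +1
   print(index_at_origin(lambda x, y: (-y, x)))    # center, det DV = +1: index +1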
The following is an important global relation between index and degree.
Proposition 21.2. Let Ω be a smooth bounded region in R^{n+1}. Let V be a vector field on Ω, with a finite number of critical points pj, all in the interior of Ω. Define F : ∂Ω → S^n by F(x) = V(x)/|V(x)|. Then

(21.5)   Index(V) = Deg(F).

Proof. If we apply Proposition 20.9 to M = Ω \ ∪j Bε(pj), we see that Deg(F) is equal to the sum of degrees of the maps of ∂Bε(pj) to S^n, which gives (21.5).
Next we look at a process of producing vector fields in higher dimensional spaces
from vector fields in lower dimensional spaces.
Proposition 21.3. Let W be a vector field on R^n, vanishing only at 0. Define a vector field V on R^{n+k} by V(x, y) = (W(x), y). Then V vanishes only at (0, 0), and we have

(21.6)   ind0 W = ind(0,0) V.

Proof. If we use Proposition 20.8 to compute degrees of maps, and choose y0 ∈ S^{n−1} ⊂ S^{n+k−1}, a regular value of Wr, and hence also for Vr, this identity follows.
We turn to a more sophisticated variation. Let X be a compact oriented n-dimensional surface in R^{n+k}, W a (tangent) vector field on X with a finite number of critical points pj. Let Ω be a small tubular neighborhood of X, π : Ω → X mapping z ∈ Ω to the nearest point in X. Let ϕ(z) = dist(z, X)^2. Now define a vector field V on Ω by

(21.7)   V(z) = W(π(z)) + ∇ϕ(z).
Proposition 21.4. If F : ∂Ω → S^{n+k−1} is given by F(z) = V(z)/|V(z)|, then

(21.8)   Deg(F) = Index(W).
Proof. We see that all the critical points of V are points in X which are critical
for W, and, as in Proposition 21.3, Index(W ) = Index(V ). But Proposition 21.2
implies Index(V ) = Deg(F ).
Since ϕ(z) is increasing as one goes away from X, it is clear that, for z ∈ ∂Ω, V(z) points out of Ω, provided it is a sufficiently small tubular neighborhood of X. Thus F : ∂Ω → S^{n+k−1} is homotopic to the Gauss map

(21.9)   N : ∂Ω −→ S^{n+k−1},

given by the outward pointing normal. This immediately gives:
Corollary 21.5. Let X be a compact oriented surface in R^{n+k}, Ω a small tubular neighborhood of X, and N : ∂Ω → S^{n+k−1} the Gauss map. If W is a vector field on X with a finite number of critical points, then

(21.10)   Index(W) = Deg(N).
Clearly the right side of (21.10) is independent of the choice of W. Thus any two vector fields on X with a finite number of critical points have the same index, i.e., Index(W) is an invariant of X. This invariant is denoted

(21.11)   Index(W) = χ(X),
and is called the Euler characteristic of X. See the exercises for more results on
χ(M ).
Exercises
A nondegenerate critical point p of a vector field V is said to be a source if the real parts of the eigenvalues of DV(p) are all positive, a sink if they are all negative, and a saddle if they are all nonzero and there exist some of each sign. Such a critical point is called a center if all orbits of V close to p are closed orbits, which stay near p; this requires all the eigenvalues of DV(p) to be purely imaginary.
In problems 1-3, V is a vector field on a region Ω ⊂ R2 .
1. Let V have a nondegenerate critical point at p. Show that

   p saddle =⇒ indp(V) = −1,
   p source =⇒ indp(V) = 1,
   p sink =⇒ indp(V) = 1,
   p center =⇒ indp(V) = 1.
2. If V has a closed orbit γ, show that the map T : γ → S 1 , T (x) = V (x)/|V (x)|,
has degree +1.
Hint. Use Proposition 20.12.
3. If V has a closed orbit γ whose inside O is contained in Ω, show that V must
have at least one critical point in O, and that the sum of the indices of such critical
points must be +1.
Hint. Use Proposition 21.2.
If V has exactly one critical point in O, show that it cannot be a saddle.
4. Let M be a compact oriented 2-dimensional surface. Given a triangulation of M,
within each triangle construct a vector field, vanishing at 7 points as illustrated in
Fig.21.1, with the vertices as attractors, the center as a repeller, and the midpoints
of each side as saddle points. Fit these together to produce a smooth vector field
X on M. Show directly that
Index(X) = V − E + F,
where V = # vertices, E = # edges, F = # faces, in the triangulation.
5. More generally, construct a vector field on an n-simplex so that, when a compact oriented n-dimensional surface M is triangulated into simplices, one produces a vector field X on M such that

(21.12)   Index(X) = Σ_{j=0}^n (−1)^j νj,

where νj is the number of j-simplices in the triangulation, i.e., ν0 = # vertices, ν1 = # edges, . . . , νn = # of n-simplices.
See Fig.21.2 for a picture of a 3-simplex, with its faces (i.e., 2-simplices), edges, and vertices labelled.
The right side of (21.12) is one definition of χ(M). As we have seen that the left side of (21.12) is independent of the choice of X, it follows that the right side is independent of the choice of triangulation.
6. Let M be the sphere S^n, which is homeomorphic to the boundary of an (n + 1)-simplex. Computing the right side of (21.12), show that

(21.13)   χ(S^n) = 2 if n even,   0 if n odd.

Conclude that, if n is even, there is no smooth nowhere vanishing vector field on S^n.
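A small combinatorial check of (21.13) (a sketch, not part of the text, assuming Python): the boundary of an (n + 1)-simplex has C(n + 2, j + 1) faces of dimension j, so the right side of (21.12) can be summed directly.

   from math import comb

   def chi_sphere(n):
       # ν_j = C(n+2, j+1) j-simplices on the boundary of an (n+1)-simplex
       return sum((-1)**j * comb(n + 2, j + 1) for j in range(n + 1))

   print([chi_sphere(n) for n in range(1, 7)])   # [0, 2, 0, 2, 0, 2], as in (21.13)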
7. With X = S n ⊂ Rn+1 , note that the manifold ∂Ω in (21.9) consists of two
copies of S n , with opposite orientations. Compute the degree of the map N in
(21.9)-(21.10), and use this to give another derivation of (21.13), granted (21.11).
8. Consider the vector field R on S 2 generating rotation about an axis. Show
that R has two critical points, at the ‘poles.’ Classify the critical points, compute
Index(R), and compare the n = 2 case of (21.13).
9. Show that the computation of the index of a vector field X on a surface M
is independent of orientation, and that Index(X) can be defined when M is not
orientable.
A. Metric spaces, compactness, and all that
A metric space is a set X, together with a distance function d : X × X → [0, ∞), having the properties that

(A.1)   d(x, y) = 0 ⟺ x = y,   d(x, y) = d(y, x),   d(x, y) ≤ d(x, z) + d(y, z).
The third of these properties is called the triangle inequality. An example of a metric space is the set of rational numbers Q, with d(x, y) = |x − y|. Another example is X = R^n, with d(x, y) = √((x1 − y1)^2 + ··· + (xn − yn)^2).
If (xν) is a sequence in X, indexed by ν = 1, 2, 3, . . . , i.e., by ν ∈ Z^+, one says xν → y if d(xν, y) → 0, as ν → ∞. One says (xν) is a Cauchy sequence if d(xν, xµ) → 0 as µ, ν → ∞. One says X is a complete metric space if every Cauchy sequence converges to a limit in X. Some metric spaces are not complete; for example, Q is not complete. You can take a sequence (xν) of rational numbers such that xν → √2, which is not rational. Then (xν) is Cauchy in Q, but it has no limit in Q.
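A concrete such sequence (a small illustration, not part of the text, assuming Python's fractions module): Newton's iteration x ↦ (x + 2/x)/2 stays in Q and converges to √2.

   from fractions import Fraction

   x = Fraction(2)
   for _ in range(6):
       x = (x + 2 / x) / 2         # every iterate is rational; the limit √2 is not
   print(float(x), float(x)**2)    # 1.41421356..., and its square is ≈ 2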
If a metric space X is not complete, one can construct its completion X̂ as follows. Let an element ξ of X̂ consist of an equivalence class of Cauchy sequences in X, where we say (xν) ∼ (yν) provided d(xν, yν) → 0. We write the equivalence class containing (xν) as [xν]. If ξ = [xν] and η = [yν], we can set d(ξ, η) = lim_{ν→∞} d(xν, yν), and verify that this is well defined, and makes X̂ a complete metric space. If the completion of Q is constructed by this process, you get R, the set of real numbers.
A metric space X is said to be compact provided any sequence (xν ) in X has
a convergent subsequence. Clearly every compact metric space is complete. There
are two useful conditions, each equivalent to the characterization of compactness
just stated, on a metric space. The reader can establish the equivalence, as an
exercise.
(i) If S ⊂ X is a set with infinitely many elements, then there is an accumulation
point, i.e., a point p ∈ X such that every neighborhood U of p contains infinitely
many points in S.
Here, a neighborhood of p ∈ X is a set containing the ball

(A.2)   Bε(p) = {x ∈ X : d(x, p) < ε},

for some ε > 0.
(ii) Every open cover {Uα } of X has a finite subcover.
Here, a set U ⊂ X is called open if it contains a neighborhood of each of its points.
The complement of an open set is said to be closed. Equivalently, K ⊂ X is closed
provided that
(A.3)   xν ∈ K, xν → p ∈ X =⇒ p ∈ K.
It is clear that any closed subset of a compact metric space is also compact.
If Xj, 1 ≤ j ≤ m, is a finite collection of metric spaces, with metrics dj, we can define a product metric space

(A.4)   X = Π_{j=1}^m Xj,   d(x, y) = d1(x1, y1) + ··· + dm(xm, ym).

Another choice of metric is δ(x, y) = √(d1(x1, y1)^2 + ··· + dm(xm, ym)^2). The metrics d and δ are equivalent, i.e., there exist constants C0, C1 ∈ (0, ∞) such that

(A.5)   C0 δ(x, y) ≤ d(x, y) ≤ C1 δ(x, y),   ∀ x, y ∈ X.
We describe some useful classes of compact spaces.
Proposition A.1. If Xj are compact metric spaces, 1 ≤ j ≤ m, so is X = Π_{j=1}^m Xj.

Proof. If (xν) is an infinite sequence of points in X, say xν = (x1ν, . . . , xmν), pick a convergent subsequence of (x1ν) in X1, and consider the corresponding subsequence of (xν), which we relabel (xν). Using this, pick a convergent subsequence of (x2ν) in X2. Continue. Having a subsequence such that xjν → yj in Xj for each j = 1, . . . , m, we then have a convergent subsequence in X.
The following result is useful for calculus on Rn .
Proposition A.2. If K is a closed bounded subset of Rn , then K is compact.
Proof. The discussion above reduces the problem to showing that any closed interval
I = [a, b] in R is compact. Suppose S is a subset of I with infinitely many elements.
Divide I into 2 equal subintervals, I1 = [a, b1 ], I2 = [b1 , b], b1 = (a + b)/2. Then
either I1 or I2 must contain infinitely many elements of S. Say Ij does. Let x1
be any element of S lying in Ij . Now divide Ij in two equal pieces, Ij = Ij1 ∪ Ij2 .
One of these intervals (say Ijk ) contains infinitely many points of S. Pick x2 ∈
Ijk to be one such point (different from x1 ). Then subdivide Ijk into two equal
subintervals, and continue. We get an infinite sequence of distinct points xν ∈ S,
and |xν − xν+k| ≤ 2^{−ν}(b − a), for k ≥ 1. Since R is complete, (xν) converges, say to
y ∈ I. Any neighborhood of y contains infinitely many points in S, so we are done.
If X and Y are metric spaces, a function f : X → Y is said to be continuous
provided xν → x in X implies f (xν ) → f (x) in Y.
Proposition A.3. If X and Y are metric spaces, f : X → Y continuous, and
K ⊂ X compact, then f (K) is a compact subset of Y.
Proof. If (yν ) is an infinite sequence of points in f (K), pick xν ∈ K such that
f (xν ) = yν . If K is compact, we have a subsequence xνj → p in X, and then
yνj → f (p) in Y.
If f : X → R is continuous, we say f ∈ C(X). A useful corollary of Proposition
A.3 is:
Proposition A.4. If X is a compact metric space and f ∈ C(X), then f assumes
a maximum and a minimum value on X.
Proof. We know from Proposition A.3 that f (X) is a compact subset of R. Hence
f (X) is bounded, say f (X) ⊂ I = [a, b]. Repeatedly subdividing I into equal halves,
as in the proof of Proposition A.2, at each stage throwing out intervals that do not
intersect f (X), and keeping only the leftmost and rightmost interval amongst those
remaining, we obtain points α ∈ f (X) and β ∈ f (X) such that f (X) ⊂ [α, β]. Then
α = f (x0 ) for some x0 ∈ X is the minimum and β = f (x1 ) for some x1 ∈ X is the
maximum.
At this point, the reader might take a look at the proof of the Mean Value
Theorem, given in §0, which applies this result.
A function f ∈ C(X) is said to be uniformly continuous provided that, for any ε > 0, there exists δ > 0 such that

(A.6)   x, y ∈ X, d(x, y) ≤ δ =⇒ |f(x) − f(y)| ≤ ε.

An equivalent condition is that f have a modulus of continuity, i.e., a monotonic function ω : [0, 1) → [0, ∞) such that δ ↘ 0 ⇒ ω(δ) ↘ 0, and such that

(A.7)   x, y ∈ X, d(x, y) ≤ δ ≤ 1 =⇒ |f(x) − f(y)| ≤ ω(δ).
Not all continuous functions are uniformly continuous. For example, if X = (0, 1) ⊂
R, then f (x) = sin 1/x is continuous, but not uniformly continuous, on X. The
following result is useful, for example, in the development of the Riemann integral
in §11.
Proposition A.5. If X is a compact space and f ∈ C(X), then f is uniformly
continuous.
Proof. If not, there exist xν, yν ∈ X and ε > 0 such that d(xν, yν) ≤ 2^{−ν} but

(A.8)   |f(xν) − f(yν)| ≥ ε.
Taking a convergent subsequence xνj → p, we also have yνj → p. Now continuity
of f at p implies f (xνj ) → f (p) and f (yνj ) → f (p), contradicting (A.8).
If X and Y are metric spaces, the space C(X, Y) of continuous maps f : X → Y has a natural metric structure, under some additional hypotheses. We use

(A.9)   D(f, g) = sup_{x∈X} d(f(x), g(x)).
This sup exists provided f (X) and g(X) are bounded subsets of Y, where to say
B ⊂ Y is bounded is to say d : B × B → [0, ∞) has bounded image. In particular,
this supremum exists if X is compact. The following result is useful in the proof of
the fundamental local existence theorem for ODE, in §3.
Proposition A.6. If X is a compact metric space and Y is a complete metric
space, then C(X, Y ), with the metric (A.9), is complete.
We leave the proof as an exercise.
We end this appendix with a couple of slightly more sophisticated results on compactness. The following extension of Proposition A.1 is a special case of Tychonov's Theorem.

Proposition A.7. If {Xj : j ∈ Z^+} are compact metric spaces, so is X = Π_{j=1}^∞ Xj.
Here, we can make X a metric space by setting

(A.10)   d(x, y) = Σ_{j=1}^∞ 2^{−j} dj(xj, yj) / (1 + dj(xj, yj)).

It is easy to verify that, if xν ∈ X, then xν → y in X, as ν → ∞, if and only if, for each j, pj(xν) → pj(y) in Xj, where pj : X → Xj is the projection onto the jth factor.
Proof. Following the argument in Proposition A.1, if (xν) is an infinite sequence of points in X, we obtain a nested family of subsequences

(A.11)   (xν) ⊃ (x1ν) ⊃ (x2ν) ⊃ ··· ⊃ (xjν) ⊃ ···

such that pℓ(xjν) converges in Xℓ, for 1 ≤ ℓ ≤ j. The next step is a diagonal construction. We set

(A.12)   ξν = xνν ∈ X.

Then, for each j, after throwing away a finite number N(j) of elements, one obtains from (ξν) a subsequence of the sequence (xjν) in (A.11), so pℓ(ξν) converges in Xℓ for all ℓ. Hence (ξν) is a convergent subsequence of (xν).
Before stating the next result, we establish a simple lemma.
Lemma A.8. If X is a compact metric space, then there is a countable dense
subset Σ of X. (One says X is separable.)
Proof. For each ν ∈ Z+ , the collection of balls B1/ν (x) covers X, so there is a finite
subcover, {B1/ν (xνj ) : 1 ≤ j ≤ N (ν)}. It follows that Σ = {xνj : j ≤ N (ν), ν ∈ Z+ }
is a countable dense subset of X.
The next result is a special case of Ascoli’s Theorem.
Proposition A.9. Let X and Y be compact metric spaces, and fix a modulus of continuity ω(δ). Then

(A.13)   Cω = { f ∈ C(X, Y) : d(f(x), f(y)) ≤ ω(d(x, y)) }

is a compact subset of C(X, Y).
Proof. Let (fν) be a sequence in Cω. Let Σ be a countable dense subset of X. For each x ∈ Σ, (fν(x)) is a sequence in Y, which hence has a convergent subsequence. Using a diagonal construction similar to that in the proof of Proposition A.7, we obtain a subsequence (ϕν) of (fν) with the property that ϕν(x) converges in Y, for each x ∈ Σ, say

(A.14)   x ∈ Σ =⇒ ϕν(x) → ψ(x),

where ψ : Σ → Y. So far, we have not used (A.13), but this hypothesis readily yields

(A.15)   d(ψ(x), ψ(y)) ≤ ω(d(x, y)),

for all x, y ∈ Σ. Using the denseness of Σ ⊂ X, we can extend ψ uniquely to a continuous map of X → Y, which we continue to denote ψ. Also, (A.15) holds for all x, y ∈ X, i.e., ψ ∈ Cω.
It remains to show that ϕν → ψ uniformly on X. Pick ε > 0. Then pick δ > 0 such that ω(δ) < ε/3. Since X is compact, we can cover X by finitely many balls Bδ(xj), 1 ≤ j ≤ N, xj ∈ Σ. Pick M so large that ϕν(xj) is within ε/3 of its limit for all ν ≥ M (when 1 ≤ j ≤ N). Now, for any x ∈ X, picking ℓ ∈ {1, . . . , N} such that d(x, xℓ) ≤ δ, we have, for k ≥ 0, ν ≥ M,

(A.16)   d(ϕν+k(x), ϕν(x)) ≤ d(ϕν+k(x), ϕν+k(xℓ)) + d(ϕν+k(xℓ), ϕν(xℓ)) + d(ϕν(xℓ), ϕν(x)) ≤ ε/3 + ε/3 + ε/3,

proving the proposition.
B. Sard’s theorem
Let F : O → Rn be a C 1 map, with O open in Rn . If p ∈ O and DF (p) : Rn → Rn
is not surjective, then p is said to be a critical point, and F (p) a critical value. The
set C of critical points can be a large subset of O, even all of it, but the set of
critical values F (C) must be small in Rn , as the following result implies.
Proposition B.1. If F : O → Rn is a C 1 map, C ⊂ O its set of critical points,
and K ⊂ O compact, then F (C ∩ K) is a nil subset of Rn .
Proof. Without loss of generality, we can assume K is a cubical cell. Let P be a partition of K into cubical cells Rα, all of diameter δ. Write P = P′ ∪ P″, where cells in P′ are disjoint from C, and cells in P″ intersect C. Pick xα ∈ Rα ∩ C, for Rα ∈ P″. Fix ε > 0. Now we have

(B.1)   F(xα + y) = F(xα) + DF(xα)y + rα(y),
and, if δ > 0 is small enough, then |rα(y)| ≤ ε|y| ≤ εδ, for xα + y ∈ Rα. Thus F(Rα) is contained in an εδ-neighborhood of the set Hα = F(xα) + DF(xα)(Rα − xα), which is a parallelepiped of dimension ≤ n − 1, and diameter ≤ Mδ, if |DF| ≤ M. Hence

(B.2)   cont⁺ F(Rα) ≤ Cεδ^n ≤ C′εV(Rα),   for Rα ∈ P″.
Thus

(B.3)   cont⁺ F(C ∩ K) ≤ Σ_{Rα∈P″} cont⁺ F(Rα) ≤ C″ε.

Taking ε → 0, we have the proof.
This is the easy case of a result known as Sard’s Theorem, which also treats the
case F : O → Rn when O is an open set in Rm , m > n. Then a more elaborate
argument is needed, and one requires more differentiability, namely that F is class
C k , with k = m − n + 1. A proof can be found in [Mil] or [Stb].
References
[AM] R.Abraham and J.Marsden, Foundations of Mechanics, Benjamin, Reading,
Mass., 1978.
[Ahl] L.Ahlfors, Complex Analysis, McGraw-Hill, New York, 1979.
[Ar] V.Arnold, Mathematical Methods of Classical Mechanics, Springer, New
York, 1978.
[Ar2] V.Arnold, Geometrical Methods in the Theory of Ordinary Differential
Equations, Springer, New York, 1983.
[Bir] G.D.Birkhoff, Dynamical Systems, AMS Colloq. Publ., vol. 9, Providence,
RI, 1927.
[Bu] R.Buck, Advanced Calculus, McGraw-Hill, New York, 1978.
[Car] C.Caratheodory, Calculus of Variations and Partial Differential Equations
of the First Order, Holden-Day, San Francisco, 1965.
[Fed] H.Federer, Geometric Measure Theory, Springer, New York, 1969.
[Fle] W.Fleming, Functions of Several Variables, Addison-Wesley, Reading, Mass.,
1965.
[Fol] G.Folland, Real Analysis: Modern Techniques and Applications, Wiley-Interscience, New York, 1984.
[Fr] P.Franklin, A Treatise on Advanced Calculus, John Wiley, New York, 1955.
[Go] H.Goldstein, Classical Mechanics, Addison-Wesley, New York, 1950.
[Hal] J.Hale, Ordinary Differential Equations, Wiley, New York, 1969.
[Hil] E.Hille, Analytic Function Theory, Chelsea Publ., New York, 1977.
[HS] M.Hirsch and S.Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, New York, 1974.
[Kan] Y.Kannai, An elementary proof of the no-retraction theorem, Amer. Math.
Monthly 88(1981), 264-268.
[Kel] O.Kellog, Foundations of Potential Theory, Dover, New York, 1953.
[Kr] S.Krantz, Function Theory of Several Complex Variables, Wiley, New York,
1982.
[Lef] S.Lefschetz, Differential Equations, Geometric Theory, J. Wiley, New York,
1957.
[LS] L.Loomis and S.Sternberg, Advanced Calculus, Addison-Wesley, New York,
1968.
[Mil] J.Milnor, Topology from the Differentiable Viewpoint, Univ. Press of Virginia, Charlottesville, 1965.
[Mos] J.Moser, Stable and Random Motions in Dynamical Systems, Princeton
Univ. Press, 1973.
[NS] V.Nemytskii and V.Stepanov, Qualitative Theory of Differential Equations,
Dover, New York, 1989.
[NSS] H.Nickerson, D.Spencer, and N.Steenrod, Advanced Calculus, Van Nostrand, Princeton, New Jersey, 1959.
[Poi] H.Poincare, Les Methodes Nouvelles de la Mecanique Celeste, Gauthier-Villars, 1899.
[Spi] M.Spivak, Calculus on Manifolds, Benjamin, New York, 1965.
[Stb] S.Sternberg, Lectures on Differential Geometry, Prentice Hall, New Jersey,
1964.
[Sto] J.J.Stoker, Differential Geometry, Wiley-Interscience, New York, 1969.
[Str] G.Strang, Linear Algebra and its Applications, Harcourt Brace Jovanovich,
San Diego, 1988.
[T1] M.Taylor, Partial Differential Equations, MET Press, 1993.
[T2] M.Taylor, Measure Theory and Integration, MET Press, 1993.
[TS] J.M.Thompson and H.Stewart, Nonlinear Dynamics and Chaos, J. Wiley,
New York, 1986.
[Wh] E.Whittaker, A Treatise on the Analytical Dynamics of Particles and Rigid
Bodies, Dover, New York, 1944.
[Wig] S.Wiggins, Introduction to Applied Dynamical Systems and Chaos, Springer,
New York, 1990.
[Wil] E.B.Wilson, Advanced Calculus, Dover, New York, 1958. (Reprint of 1912
ed.)
Index
A
action integral 122
algebra
of sets 70
analytic function 48
complex 17
real 23, 49
angular momentum 64
anticommutation relation 92
antipodal map 131, 137
arclength 43, 86, 120
arcsin 29
arctan 30
Ascoli’s theorem 145
autonomous system 25, 46
averaging over rotations 85
B
basis 13, 34, 35, 37, 80, 91
boundary 99
Brouwer fixed point theorem 131
C
canonical transformation 66, 126
Cauchy sequence 21, 142
Cauchy-Riemann equations 17, 110
Cauchy’s inequality 31
Cauchy’s integral formula 110, 114
Cauchy’s integral theorem 110
central force 63
chain rule 11, 13, 20, 80, 90, 93, 116
change of variable formula 76, 79, 81, 87, 89
closed set 143
cofactor 17
combination 16
compact space 6, 10, 142
complete metric space 20, 142
completion of a metric space 142
complex variable 115
conformal map 116
conic section 65
conserved quantity 52, 64
contact form 128
content 7, 69, 147
upper 8, 69
continuous 6, 11, 13, 68, 72, 80, 143
uniformly 6, 144
contraction mapping 20, 24
coordinate 51
chart 80, 86, 90
corner 100
cos 28, 117
Cramer’s formula 17, 42
critical point 147
of a vector field 54, 138, 140
critical value 147
cross product 4, 45, 81
curl 97, 106, 108
curvature 43
cusp 101
cyclic vector 37
D
Darboux’ Theorem 128
degree of a map 3, 133, 138
derivation 50
derivative 8, 13, 47, 51
determinant 17, 19, 73, 75, 81
diagonal construction 145
diffeomorphism 20, 75, 76, 80, 82, 84, 87, 90, 129
differential form 3, 52, 87, 119
closed 94, 97
exact 94, 97, 98
div 54, 62, 97, 102, 108
Divergence Theorem 102, 104
Duhamel’s principle 37, 40, 41, 47
E
eigenvalue 33, 36, 54
eigenvector 33, 39
energy 59
Euclidean norm 13
Euler characteristic 139
Euler’s formula 28
exponential 26, 27
of a matrix 31, 39, 84
exterior derivative 93
F
fixed point 20, 24, 131
flow 50, 127
force 59
Frenet-Serret formula 43
Fubini’s Theorem 71, 77
fundamental theorem of algebra 3, 34, 113, 137
fundamental theorem of calculus 2, 9, 14, 17
noncommutative 40
G
gamma function 83
Gauss map 139
Gaussian integral 78
geodesic 121, 123
Gl(n, R) 73, 138
grad 105
Green’s formula 106
Green’s Theorem 102, 110
Gronwall inequality 42
H
Haar measure 85, 132
Hamiltonian systems 2, 59, 62, 108, 120, 125
harmonic conjugate 114
harmonic function 3, 112, 116
holomorphic function 3, 17, 23, 49, 110, 114
homotopy 96, 97, 134, 137
I
implicit function theorem 21
index of a vector field 138
initial value problem 24
integral curve 50, 51, 53, 58, 97
integral equation 24
integrating factor 98
integration by parts 12, 118
interior product 92
inverse function theorem 20, 51, 84
iterated integral 71
J
Jacobi identity 58, 62
Jordan curve theorem 136
Jordan normal form 37, 54
Jordan-Brouwer separation theorem 135
K
Kepler problem 2, 64, 66
kinetic energy 122
L
Lagrange’s equation 118, 122
Lagrangian 3, 122, 124
Laplace operator 105
Legendre transformation 120
Lie bracket 55, 95
Lie derivative 55, 94, 103, 126
Lie group 57
linearization 46, 54
Liouville’s Theorem 113
Lipschitz continuity 24, 47, 71
log 28
M
manifold 80
matrix 13, 17, 19, 31, 40, 73, 84, 89
maximum 39, 144
mean value property 112
Mean Value Theorem 9
metric space 142
metric tensor 3, 80, 82, 103
modulus of continuity 72, 76, 144
multi-index 15
N
Newton’s law 2, 59, 123
Newton’s method 23
nil set 70, 71, 74, 147
nilpotent 37
norm 31, 38
normal derivative 105
normal vector
outward 106, 109
upward 106, 109
O
open set 142
orbit 50, 62
closed 136, 140
orientation 89, 99, 101, 109, 116
orthogonal matrix 44
P
partial derivative 10, 13
partition 5, 67, 73
pendulum 61, 123
double 124
permutation 19
sign of 19
pi (π) 29, 30, 77, 86
Picard iteration 24
Poincare lemma 95, 126, 129, 132
Poisson bracket 59, 62
polar coordinates 64, 76, 82, 124
positive definite matrix 80
potential energy 59, 122
power series 23, 33, 48, 110, 114
pull-back 87, 89, 93, 100
Q
quadrature 60
R
rational numbers 142
real numbers 142
regular value 134
retraction 131
Riemann integral 2, 5, 67
rotation 82
S
Sard’s theorem 134, 147
sin 28, 117
skew-adjoint matrix 44
Skew(n) 38, 84, 132
SO(n) 38, 84, 132
sphere 82
area of 82
spherical polar coordinates 79
Stokes formula 3, 99, 104, 131, 135
surface 3, 80
surface integral 80
symplectic form 125
symplectic transformation 126
T
tan 30
tangent vector
forward 107, 109
Taylor’s formula with remainder 2, 12, 15
torsion 43
triangle inequality 31
triangulation 140
trigonometric functions 4, 29
Tychonov’s theorem 144
V
variational problem 61, 118
vector field 50, 59, 88, 97, 106, 125
volume form 127
W
wedge product 92
Wronskian 42