Linear Algebra, Theory And Applications

Kenneth Kuttler
January 16, 2015
Contents
1 Preliminaries
1.1 Sets And Set Notation
1.2 Functions
1.3 The Number Line And Algebra Of The Real Numbers
1.4 Ordered fields
1.5 The Complex Numbers
1.6 Exercises
1.7 Completeness of R
1.8 Well Ordering And Archimedean Property
1.9 Division
1.10 Systems Of Equations
1.11 Exercises
1.12 $F^n$
1.13 Algebra in $F^n$
1.14 Exercises
1.15 The Inner Product In $F^n$
1.16 What Is Linear Algebra?
1.17 Exercises
2 Linear Transformations
2.1 Matrices
2.1.1 The $ij^{th}$ Entry Of A Product
2.1.2 Digraphs
2.1.3 Properties Of Matrix Multiplication
2.1.4 Finding The Inverse Of A Matrix
2.2 Exercises
2.3 Linear Transformations
2.4 Some Geometrically Defined Linear Transformations
2.5 The Null Space Of A Linear Transformation
2.6 Subspaces And Spans
2.7 An Application To Matrices
2.8 Matrices And Calculus
2.8.1 The Coriolis Acceleration
2.8.2 The Coriolis Acceleration On The Rotating Earth
2.9 Exercises
3 Determinants
3.1 Basic Techniques And Properties
3.2 Exercises
3.3 The Mathematical Theory Of Determinants
3.3.1 The Function sgn
3.3.2 The Definition Of The Determinant
3.3.3 A Symmetric Definition
3.3.4 Basic Properties Of The Determinant
3.3.5 Expansion Using Cofactors
3.3.6 A Formula For The Inverse
3.3.7 Rank Of A Matrix
3.3.8 Summary Of Determinants
3.4 The Cayley Hamilton Theorem
3.5 Block Multiplication Of Matrices
3.6 Exercises
4 Row Operations
4.1 Elementary Matrices
4.2 The Rank Of A Matrix
4.3 The Row Reduced Echelon Form
4.4 Rank And Existence Of Solutions To Linear Systems
4.5 Fredholm Alternative
4.6 Exercises
5 Some Factorizations
5.1 LU Factorization
5.2 Finding An LU Factorization
5.3 Solving Linear Systems Using An LU Factorization
5.4 The PLU Factorization
5.5 Justification For The Multiplier Method
5.6 Existence For The PLU Factorization
5.7 The QR Factorization
5.8 Exercises
6 Linear Programming
6.1 Simple Geometric Considerations
6.2 The Simplex Tableau
6.3 The Simplex Algorithm
6.3.1 Maximums
6.3.2 Minimums
6.4 Finding A Basic Feasible Solution
6.5 Duality
6.6 Exercises
7 Spectral Theory
7.1 Eigenvalues And Eigenvectors Of A Matrix
7.2 Some Applications Of Eigenvalues And Eigenvectors
7.3 Exercises
7.4 Schur’s Theorem
7.5 Trace And Determinant
7.6 Quadratic Forms
7.7 Second Derivative Test
7.8 The Estimation Of Eigenvalues
7.9 Advanced Theorems
7.10 Exercises
8 Vector Spaces And Fields
8.1 Vector Space Axioms
8.2 Subspaces And Bases
8.2.1 Basic Definitions
8.2.2 A Fundamental Theorem
8.2.3 The Basis Of A Subspace
8.3 Lots Of Fields
8.3.1 Irreducible Polynomials
8.3.2 Polynomials And Fields
8.3.3 The Algebraic Numbers
8.3.4 The Lindemann Weierstrass Theorem And Vector Spaces
8.4 Exercises
9 Linear Transformations
9.1 Matrix Multiplication As A Linear Transformation
9.2 L(V, W) As A Vector Space
9.3 The Matrix Of A Linear Transformation
9.3.1 Rotations About A Given Vector
9.3.2 The Euler Angles
9.4 Eigenvalues And Eigenvectors Of Linear Transformations
9.5 Exercises
10 Canonical Forms
10.1 A Theorem Of Sylvester, Direct Sums
10.2 Direct Sums, Block Diagonal Matrices
10.3 Cyclic Sets
10.4 Nilpotent Transformations
10.5 The Jordan Canonical Form
10.6 Exercises
10.7 The Rational Canonical Form
10.8 Uniqueness
10.9 Exercises
11 Markov Processes
11.1 Regular Markov Matrices
11.2 Migration Matrices
11.3 Absorbing States
11.4 Exercises
12 Inner Product Spaces
12.1 General Theory
12.2 The Gram Schmidt Process
12.3 Riesz Representation Theorem
12.4 The Tensor Product Of Two Vectors
12.5 Least Squares
12.6 Fredholm Alternative Again
12.7 Exercises
12.8 The Determinant And Volume
12.9 Exercises
13 Self Adjoint Operators
13.1 Simultaneous Diagonalization
13.2 Schur’s Theorem
13.3 Spectral Theory Of Self Adjoint Operators
13.4 Positive And Negative Linear Transformations
13.5 The Square Root
13.6 Fractional Powers
13.7 Polar Decompositions
13.8 An Application To Statistics
13.9 The Singular Value Decomposition
13.10 Approximation In The Frobenius Norm
13.11 Least Squares And Singular Value Decomposition
13.12 The Moore Penrose Inverse
13.13 Exercises
14 Norms
14.1 The p Norms
14.2 The Condition Number
14.3 The Spectral Radius
14.4 Series And Sequences Of Linear Operators
14.5 Iterative Methods For Linear Systems
14.6 Theory Of Convergence
14.7 Exercises
15 Numerical Methods, Eigenvalues
15.1 The Power Method For Eigenvalues
15.1.1 The Shifted Inverse Power Method
15.1.2 The Explicit Description Of The Method
15.1.3 Complex Eigenvalues
15.1.4 Rayleigh Quotients And Estimates for Eigenvalues
15.2 The QR Algorithm
15.2.1 Basic Properties And Definition
15.2.2 The Case Of Real Eigenvalues
15.2.3 The QR Algorithm In The General Case
15.3 Exercises
A Matrix Calculator On The Web
A.1 Use Of Matrix Calculator On Web
B Positive Matrices
C Functions Of Matrices
D Differential Equations
D.1 Theory Of Ordinary Differential Equations
D.2 Linear Systems
D.3 Local Solutions
D.4 First Order Linear Systems
D.5 Geometric Theory Of Autonomous Systems
D.6 General Geometric Theory
D.7 The Stable Manifold
E Compactness And Completeness
E.1 The Nested Interval Lemma
E.2 Convergent Sequences, Sequential Compactness
F Fundamental Theorem Of Algebra
G Fields And Field Extensions
G.1 The Symmetric Polynomial Theorem
G.2 The Fundamental Theorem Of Algebra
G.3 Transcendental Numbers
G.4 More On Algebraic Field Extensions
G.5 The Galois Group
G.6 Normal Subgroups
G.7 Normal Extensions And Normal Subgroups
G.8 Conditions For Separability
G.9 Permutations
G.10 Solvable Groups
G.11 Solvability By Radicals
H Selected Exercises
Copyright © 2012
Preface
This is a book on linear algebra and matrix theory. While it is self contained, it will work
best for those who have already had some exposure to linear algebra. It is also assumed that
the reader has had calculus. Some optional topics require more analysis than this, however.
I think that the subject of linear algebra is likely the most significant topic discussed in
undergraduate mathematics courses. Part of the reason for this is its usefulness in unifying
so many different topics. Linear algebra is essential in analysis, applied math, and even in
theoretical mathematics. This is the point of view of this book, more than a presentation
of linear algebra for its own sake. This is why there are numerous applications, some fairly
unusual.
This book features an ugly, elementary, and complete treatment of determinants early
in the book. Thus it might be considered as Linear algebra done wrong. I have done this
because of the usefulness of determinants. However, all major topics are also presented in
an alternative manner which is independent of determinants.
The book has an introduction to various numerical methods used in linear algebra.
This is done because of the interesting nature of these methods. The presentation here
emphasizes the reasons why they work. It does not discuss many important numerical
considerations necessary to use the methods effectively. These considerations are found in
numerical analysis texts.
In the exercises, you may occasionally see ↑ at the beginning. This means you ought to
have a look at the exercise above it. Some exercises develop a topic sequentially. There are
also a few exercises which appear more than once in the book. I have done this deliberately
because I think that these illustrate exceptionally important topics and because some people
don’t read the whole book from start to finish but instead jump into the middle somewhere.
There is one on a theorem of Sylvester which appears no fewer than 3 times. Then it is also
proved in the text. There are multiple proofs of the Cayley Hamilton theorem, some in the
exercises. Some exercises also are included for the sake of emphasizing something which has
been done in the preceding chapter.
Chapter 1
Preliminaries
1.1 Sets And Set Notation
A set is just a collection of things called elements. For example, {1, 2, 3, 8} is the set
consisting of the elements 1, 2, 3, and 8. To indicate that 3 is an element of {1, 2, 3, 8}, it is
customary to write 3 ∈ {1, 2, 3, 8}. Similarly, 9 ∉ {1, 2, 3, 8} means 9 is not an element of {1, 2, 3, 8}.
Sometimes a rule specifies a set. For example, you could specify a set as all integers larger
than 2. This would be written as S = {x ∈ Z : x > 2}. This notation says: the set of all
integers x such that x > 2.
If A and B are sets with the property that every element of A is an element of B, then A is
a subset of B. For example, {1, 2, 3, 8} is a subset of {1, 2, 3, 4, 5, 8} , in symbols, {1, 2, 3, 8} ⊆
{1, 2, 3, 4, 5, 8} . It is sometimes said that “A is contained in B” or even “B contains A”.
The same statement about the two sets may also be written as {1, 2, 3, 4, 5, 8} ⊇ {1, 2, 3, 8}.
The union of two sets is the set consisting of everything which is an element of at least
one of the sets, A or B. As an example of the union of two sets {1, 2, 3, 8} ∪ {3, 4, 7, 8} =
{1, 2, 3, 4, 7, 8} because these numbers are those which are in at least one of the two sets. In
general
A ∪ B ≡ {x : x ∈ A or x ∈ B} .
Be sure you understand that something which is in both A and B is in the union. It is not
an exclusive or.
The intersection of two sets, A and B consists of everything which is in both of the sets.
Thus {1, 2, 3, 8} ∩ {3, 4, 7, 8} = {3, 8} because 3 and 8 are those elements the two sets have
in common. In general,
A ∩ B ≡ {x : x ∈ A and x ∈ B} .
The symbol [a, b] where a and b are real numbers, denotes the set of real numbers x,
such that a ≤ x ≤ b and [a, b) denotes the set of real numbers such that a ≤ x < b. (a, b)
consists of the set of real numbers x such that a < x < b and (a, b] indicates the set of
numbers x such that a < x ≤ b. [a, ∞) means the set of all numbers x such that x ≥ a and
(−∞, a] means the set of all real numbers which are less than or equal to a. These sorts of
sets of real numbers are called intervals. The two points a and b are called endpoints of the
interval. Other intervals such as (−∞, b) are defined by analogy to what was just explained.
In general, the curved parenthesis indicates the end point it sits next to is not included
while the square parenthesis indicates this end point is included. The reason that there
will always be a curved parenthesis next to ∞ or −∞ is that these are not real numbers.
Therefore, they cannot be included in any set of real numbers.
A special set which needs to be given a name is the empty set also called the null set,
denoted by ∅. Thus ∅ is defined as the set which has no elements in it. Mathematicians like
to say the empty set is a subset of every set. The reason they say this is that if it were not
so, there would have to exist a set A, such that ∅ has something in it which is not in A.
However, ∅ has nothing in it and so the least intellectual discomfort is achieved by saying
∅ ⊆ A.
If A and B are two sets, A \ B denotes the set of things which are in A but not in B.
Thus
A \ B ≡ {x ∈ A : x ∉ B} .
Set notation is used whenever convenient.
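The set operations described in this section can be tried out directly. The following is a minimal sketch using Python's built-in set type; the variable names A, B, and S are chosen here only for illustration, and the rule-defined set is restricted to a finite range (an assumption made so that it can be listed).

```python
# Set operations from this section, using Python's built-in set type.
A = {1, 2, 3, 8}
B = {3, 4, 7, 8}

print(3 in A)                        # membership: 3 is an element of A
print(9 not in A)                    # non-membership: 9 is not an element of A
print(A <= {1, 2, 3, 4, 5, 8})       # subset test
print(A | B == {1, 2, 3, 4, 7, 8})   # union: in at least one of the sets
print(A & B == {3, 8})               # intersection: in both sets
print(A - B == {1, 2})               # set difference A \ B
print(set() <= A)                    # the empty set is a subset of every set

# A set specified by a rule, restricted to the integers -5, ..., 7
# so that the result is finite and can be printed.
S = {x for x in range(-5, 8) if x > 2}
print(sorted(S))
```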
1.2 Functions
The concept of a function is that of something which gives a unique output for a given input.
Definition 1.2.1 Consider two sets, D and R along with a rule which assigns a unique
element of R to every element of D. This rule is called a function and it is denoted by a
letter such as f. Given x ∈ D, f (x) is the name of the thing in R which results from doing
f to x. Then D is called the domain of f. In order to specify that D pertains to f , the
notation D (f ) may be used. The set R is sometimes called the range of f. These days it
is referred to as the codomain. The set of all elements of R which are of the form f (x)
for some x ∈ D is therefore, a subset of R. This is sometimes referred to as the image of
f . When this set equals R, the function f is said to be onto, also surjective. If whenever
x ≠ y it follows f (x) ≠ f (y), the function is called one to one, also injective. It is
common notation to write f : D ↦ R to denote the situation just described in this definition
where f is a function defined on a domain D which has values in a codomain R. Sometimes
you may also see something like $D \overset{f}{\mapsto} R$ to denote the same thing.
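For a function on a finite domain, the notions in Definition 1.2.1 can be checked by direct enumeration. The sketch below is only an illustration; the helper names image, is_onto, and is_one_to_one and the sample sets D and R are assumptions made here, not notation from the text.

```python
def image(f, domain):
    """The set of all values f(x) for x in the domain."""
    return {f(x) for x in domain}

def is_onto(f, domain, codomain):
    """f is onto (surjective) when its image equals the codomain."""
    return image(f, domain) == set(codomain)

def is_one_to_one(f, domain):
    """f is one to one (injective) when distinct inputs give distinct outputs."""
    domain = set(domain)
    return len(image(f, domain)) == len(domain)

D = {-2, -1, 0, 1, 2}
R = {0, 1, 2, 3, 4}

def square(x):
    return x * x

def shift(x):
    return x + 2

print(image(square, D), is_onto(square, D, R), is_one_to_one(square, D))
print(image(shift, D), is_onto(shift, D, R), is_one_to_one(shift, D))
```

Here square is neither onto nor one to one on D, since its image is only {0, 1, 4} and square(-1) equals square(1), while shift is both.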
1.3 The Number Line And Algebra Of The Real Numbers
Next, consider the real numbers, denoted by R, as a line extending infinitely far in both
directions. In this book, the notation, ≡ indicates something is being defined. Thus the
integers are defined as
Z ≡ {· · · − 1, 0, 1, · · · } ,
the natural numbers,
N ≡ {1, 2, · · · }
and the rational numbers, defined as the numbers which are the quotient of two integers.
$$\mathbb{Q} \equiv \left\{ \frac{m}{n} \text{ such that } m, n \in \mathbb{Z},\ n \neq 0 \right\}$$
are each subsets of R as indicated in the following picture.
[Figure: the number line, marked at the integers −4 through 4, with the point 1/2 shown halfway between 0 and 1.]
As shown in the picture, 1/2 is half way between the number 0 and the number 1. By
analogy, you can see where to place all the other rational numbers. It is assumed that R has
the following algebra properties, listed here as a collection of assertions called axioms. These
properties will not be proved which is why they are called axioms rather than theorems. In
general, axioms are statements which are regarded as true. Often these are things which
are “self evident” either from experience or from some sort of intuition but this does not
have to be the case.
Axiom 1.3.1 x + y = y + x, (commutative law for addition).
Axiom 1.3.2 x + 0 = x, (additive identity).
Axiom 1.3.3 For each x ∈ R, there exists −x ∈ R such that x + (−x) = 0, (existence of
additive inverse).
Axiom 1.3.4 (x + y) + z = x + (y + z) , (associative law for addition).
Axiom 1.3.5 xy = yx, (commutative law for multiplication).
Axiom 1.3.6 (xy) z = x (yz) , (associative law for multiplication).
Axiom 1.3.7 1x = x, (multiplicative identity).
Axiom 1.3.8 For each x ≠ 0, there exists $x^{-1}$ such that $x x^{-1} = 1$, (existence of multiplicative inverse).
Axiom 1.3.9 x (y + z) = xy + xz, (distributive law).
These axioms are known as the field axioms and any set (there are many others besides
R) which has two such operations satisfying the above axioms is called a field. Division and
subtraction are defined in the usual way by x − y ≡ x + (−y) and $x/y \equiv x(y^{-1})$.
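To see that fields other than R exist, one can check the nine axioms by brute force on a small finite example. The sketch below is an illustration that goes slightly beyond the text: it assumes the integers {0, 1, ..., n − 1} with addition and multiplication taken modulo n, and the function name is_field_mod is hypothetical.

```python
from itertools import product

def is_field_mod(n):
    """Brute-force check of Axioms 1.3.1 to 1.3.9 for the integers mod n (n >= 2)."""
    F = range(n)
    add = lambda x, y: (x + y) % n
    mul = lambda x, y: (x * y) % n
    for x, y, z in product(F, repeat=3):
        if add(x, y) != add(y, x):                          # 1.3.1
            return False
        if add(add(x, y), z) != add(x, add(y, z)):          # 1.3.4
            return False
        if mul(x, y) != mul(y, x):                          # 1.3.5
            return False
        if mul(mul(x, y), z) != mul(x, mul(y, z)):          # 1.3.6
            return False
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):  # 1.3.9
            return False
    for x in F:
        if add(x, 0) != x or mul(1, x) != x:                # 1.3.2 and 1.3.7
            return False
        if not any(add(x, y) == 0 for y in F):              # 1.3.3
            return False
        if x != 0 and not any(mul(x, y) == 1 for y in F):   # 1.3.8
            return False
    return True

print(is_field_mod(5))  # arithmetic mod 5 satisfies every axiom
print(is_field_mod(6))  # mod 6 fails Axiom 1.3.8: 2 has no multiplicative inverse
```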
Here is a little proposition which derives some familiar facts.
Proposition 1.3.10 0 and 1 are unique. Also −x is unique and $x^{-1}$ is unique. Furthermore, 0x = x0 = 0 and −x = (−1) x.
Proof: Suppose 0′ is another additive identity. Then
0′ = 0′ + 0 = 0.
Thus 0 is unique. Say 1′ is another multiplicative identity. Then
1 = 1′ 1 = 1′ .
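Now suppose y acts like the additive inverse of x. Then
−x = (−x) + 0 = (−x) + (x + y) = (−x + x) + y = y
so −x is unique. The same argument, with multiplication in place of addition and 1 in place
of 0, shows $x^{-1}$ is unique. Next,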
0x = (0 + 0) x = 0x + 0x
and so
0 = − (0x) + 0x = − (0x) + (0x + 0x) = (− (0x) + 0x) + 0x = 0x
Finally
x + (−1) x = (1 + (−1)) x = 0x = 0
and so by uniqueness of the additive inverse, (−1) x = −x.
1.4 Ordered fields
The real numbers R are an example of an ordered field. More generally, here is a definition.
Definition 1.4.1 Let F be a field. It is an ordered field if there exists an order, < which
satisfies
1. For any x ≠ y, either x < y or y < x.
2. If x < y and either z < w or z = w, then, x + z < y + w.
3. If 0 < x, 0 < y, then xy > 0.
With this definition, the familiar properties of order can be proved. The following
proposition lists many of these familiar properties. The relation ‘a > b’ has the same
meaning as ‘b < a’.
Proposition 1.4.2 The following are obtained.
1. If x < y and y < z, then x < z.
2. If x > 0 and y > 0, then x + y > 0.
3. If x > 0, then −x < 0.
4. If x ≠ 0, either x or −x is > 0.
5. If x < y, then −x > −y.
6. If x ≠ 0, then $x^2 > 0$.
7. If 0 < x < y then $x^{-1} > y^{-1}$.
Proof: First consider 1, called the transitive law. Suppose that x < y and y < z. Then
from part 2 of Definition 1.4.1, x + y < y + z and so, adding −y to both sides, it follows
x < z.
Next consider 2. Suppose x > 0 and y > 0. Then from part 2 of Definition 1.4.1,
0 = 0 + 0 < x + y.
Next consider 3. It is assumed x > 0 so
0 = −x + x > 0 + (−x) = −x.
Now consider 4. If x < 0, then
0 = x + (−x) < 0 + (−x) = −x.
Consider 5. Since x < y, it follows from part 2 of Definition 1.4.1 that
0 = x + (−x) < y + (−x)
and so by 3 and Proposition 1.3.10,
(−1) (y + (−x)) < 0.
Also from Proposition 1.3.10 (−1) (−x) = − (−x) = x and so
−y + x < 0.
Hence
−y < −x.
Consider 6. If x > 0, there is nothing to show; it follows from part 3 of Definition 1.4.1. If x < 0,
then by 4, −x > 0 and so by Proposition 1.3.10 and the definition of the order,
$(-x)^2 = (-1)(-1)x^2 > 0.$
By this proposition again, (−1) (−1) = − (−1) = 1 and so $x^2 > 0$ as claimed. Note that
1 > 0 because it equals $1^2$.
Finally, consider 7. First, if x > 0 then if $x^{-1} < 0$, it would follow $(-1)x^{-1} > 0$ and so
$x(-1)x^{-1} = (-1)1 = -1 > 0$. However, this would require
$0 > 1 = 1^2 > 0$
from what was just shown. Therefore, $x^{-1} > 0$. Now the assumption implies y + (−1) x > 0
and so multiplying by $x^{-1}$,
$yx^{-1} + (-1)xx^{-1} = yx^{-1} + (-1) > 0.$
Now multiply by $y^{-1}$, which by the above satisfies $y^{-1} > 0$, to obtain
$x^{-1} + (-1)y^{-1} > 0$
and so
$x^{-1} > y^{-1}$.
In an ordered field the symbols ≤ and ≥ have the usual meanings. Thus a ≤ b means
a < b or else a = b, etc.
1.5 The Complex Numbers
Just as a real number should be considered as a point on the line, a complex number is
considered a point in the plane which can be identified in the usual way using the Cartesian
coordinates of the point. Thus (a, b) identifies a point whose x coordinate is a and whose
y coordinate is b. In dealing with complex numbers, such a point is written as a + ib and
multiplication and addition are defined in the most obvious way subject to the convention
that $i^2 = -1$. Thus,
(a + ib) + (c + id) = (a + c) + i (b + d)
and
$$(a + ib)(c + id) = ac + iad + ibc + i^2 bd = (ac - bd) + i(bc + ad).$$
Every non zero complex number a + ib, with $a^2 + b^2 \neq 0$, has a unique multiplicative inverse.
$$\frac{1}{a + ib} = \frac{a - ib}{a^2 + b^2} = \frac{a}{a^2 + b^2} - i\frac{b}{a^2 + b^2}.$$
You should prove the following theorem.
Theorem 1.5.1 The complex numbers with multiplication and addition defined as above
form a field satisfying all the field axioms listed in Section 1.3.
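The formulas for the sum, the product, and the multiplicative inverse can be checked on examples. The following sketch represents a + ib as the ordered pair (a, b), as in the text; the function names c_add, c_mul, and c_inv are hypothetical, and exact rational arithmetic via Fraction is an assumption made to avoid rounding.

```python
from fractions import Fraction

def c_add(z, w):
    """(a + ib) + (c + id) = (a + c) + i(b + d)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def c_mul(z, w):
    """(a + ib)(c + id) = (ac - bd) + i(bc + ad)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, b * c + a * d)

def c_inv(z):
    """1/(a + ib) = a/(a^2 + b^2) - i b/(a^2 + b^2), for a + ib nonzero."""
    a, b = z
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (a / n, -b / n)

z = (Fraction(3), Fraction(4))
w = (Fraction(1), Fraction(2))
print(c_add(z, w))
print(c_mul(z, w))
print(c_inv(z))
print(c_mul(z, c_inv(z)))  # the pair representing 1 + i0
```

A few examples do not prove the theorem, but they are a quick way to catch a sign error when working through the axioms by hand.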
Note that if x + iy is a nonzero complex number, it can be written as
$$x + iy = \sqrt{x^2 + y^2}\left(\frac{x}{\sqrt{x^2 + y^2}} + i\frac{y}{\sqrt{x^2 + y^2}}\right).$$
Now $\left(\frac{x}{\sqrt{x^2 + y^2}}, \frac{y}{\sqrt{x^2 + y^2}}\right)$ is a point on the unit circle and so there exists a unique θ ∈ [0, 2π)
such that this ordered pair equals (cos θ, sin θ) . Letting $r = \sqrt{x^2 + y^2}$, it follows that the
complex number can be written in the form
x + iy = r (cos θ + i sin θ)
This is called the polar form of the complex number.
The field of complex numbers is denoted as C. An important construction regarding
complex numbers is the complex conjugate denoted by a horizontal line above the number.
It is defined as follows.
$$\overline{a + ib} \equiv a - ib.$$
What it does is reflect a given complex number across the x axis. Algebraically, the following
formula is easy to obtain.
$$\left(\overline{a + ib}\right)(a + ib) = a^2 + b^2.$$
Definition 1.5.2 Define the absolute value of a complex number as follows.
$$|a + ib| \equiv \sqrt{a^2 + b^2}.$$
Thus, denoting by z the complex number, z = a + ib,
$$|z| = (z\overline{z})^{1/2}.$$
With this definition, it is important to note the following. Be sure to verify this. It is
not too hard but you need to do it.
Remark 1.5.3 : Let z = a + ib and w = c + id. Then $|z - w| = \sqrt{(a - c)^2 + (b - d)^2}$. Thus
the distance between the point in the plane determined by the ordered pair (a, b) and the
ordered pair (c, d) equals |z − w| where z and w are as just described.
For example, consider the distance between (2, 5) and (1, 8) . From the distance formula
this distance equals $\sqrt{(2 - 1)^2 + (5 - 8)^2} = \sqrt{10}$. On the other hand, letting z = 2 + i5 and
w = 1 + i8, z − w = 1 − i3 and so $(z - w)\overline{(z - w)} = (1 - i3)(1 + i3) = 10$ so $|z - w| = \sqrt{10}$,
the same thing obtained with the distance formula.
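The worked example in Remark 1.5.3 can be reproduced numerically. This is a small sketch using Python's built-in complex type; the names d_abs, d_formula, and d_conj are chosen here for illustration only.

```python
import math

z = complex(2, 5)
w = complex(1, 8)

d_abs = abs(z - w)                                        # |z - w|
d_formula = math.hypot(2 - 1, 5 - 8)                      # distance formula in the plane
d_conj = math.sqrt(((z - w) * (z - w).conjugate()).real)  # square root of (z - w) times its conjugate

print(d_abs, d_formula, d_conj, math.sqrt(10))
```

All four printed values agree up to floating point rounding.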
Complex numbers are often written in the so called polar form which is described next.
Suppose x + iy is a complex number. Then
$$x + iy = \sqrt{x^2 + y^2}\left(\frac{x}{\sqrt{x^2 + y^2}} + i\frac{y}{\sqrt{x^2 + y^2}}\right).$$
Now note that
$$\left(\frac{x}{\sqrt{x^2 + y^2}}\right)^2 + \left(\frac{y}{\sqrt{x^2 + y^2}}\right)^2 = 1$$
and so
$$\left(\frac{x}{\sqrt{x^2 + y^2}}, \frac{y}{\sqrt{x^2 + y^2}}\right)$$
is a point on the unit circle. Therefore, there exists a unique angle, θ ∈ [0, 2π) such that
$$\cos\theta = \frac{x}{\sqrt{x^2 + y^2}}, \quad \sin\theta = \frac{y}{\sqrt{x^2 + y^2}}.$$
The polar form of the complex number is then
r (cos θ + i sin θ)
where θ is this angle just described and $r = \sqrt{x^2 + y^2}$.
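As a numerical illustration of the polar form, the sketch below computes r and θ for a nonzero complex number. The function name polar_form is hypothetical. Python's cmath.phase returns an angle in [−π, π], so the result is reduced modulo 2π to match the convention θ ∈ [0, 2π) used here; for angles within rounding error of 0 this reduction may behave slightly differently than exact arithmetic would.

```python
import cmath
import math

def polar_form(z):
    """Return (r, theta) with z = r(cos theta + i sin theta) and theta in [0, 2*pi)."""
    if z == 0:
        raise ValueError("the angle is not defined for z = 0")
    r = abs(z)
    theta = cmath.phase(z) % (2 * math.pi)  # shift from [-pi, pi] into [0, 2*pi)
    return r, theta

r, theta = polar_form(complex(-1, -1))
print(r, theta)                                   # sqrt(2) and 5*pi/4, up to rounding
print(r * complex(math.cos(theta), math.sin(theta)))  # recovers -1 - i, up to rounding
```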
A fundamental identity is the formula of De Moivre which follows.
Theorem 1.5.4 Let r > 0 be given. Then if n is a positive integer,
$$[r(\cos t + i\sin t)]^n = r^n(\cos nt + i\sin nt).$$
Proof: It is clear the formula holds if n = 1. Suppose it is true for n. Then
$$[r(\cos t + i\sin t)]^{n+1} = [r(\cos t + i\sin t)]^n [r(\cos t + i\sin t)]$$
which by induction equals
$$= r^{n+1}(\cos nt + i\sin nt)(\cos t + i\sin t)$$
$$= r^{n+1}((\cos nt\cos t - \sin nt\sin t) + i(\sin nt\cos t + \cos nt\sin t))$$
$$= r^{n+1}(\cos(n+1)t + i\sin(n+1)t)$$
by the formulas for the cosine and sine of the sum of two angles.
Corollary 1.5.5 Let z be a non zero complex number. Then there are always exactly k $k^{th}$
roots of z in C.
Proof: Let z = x + iy and let z = |z| (cos t + i sin t) be the polar form of the complex
number. By De Moivre’s theorem, a complex number,
$$r(\cos\alpha + i\sin\alpha),$$
is a $k$th root of z if and only if
$$r^k(\cos k\alpha + i\sin k\alpha) = |z|(\cos t + i\sin t).$$
This requires $r^k = |z|$ and so $r = |z|^{1/k}$ and also both cos (kα) = cos t and sin (kα) = sin t.
This can only happen if
$$k\alpha = t + 2l\pi$$
for l an integer. Thus
$$\alpha = \frac{t + 2l\pi}{k}, \quad l \in \mathbb{Z}$$
and so the $k$th roots of z are of the form
$$|z|^{1/k}\left(\cos\left(\frac{t + 2l\pi}{k}\right) + i\sin\left(\frac{t + 2l\pi}{k}\right)\right), \quad l \in \mathbb{Z}.$$
Since the cosine and sine are periodic of period 2π, there are exactly k distinct numbers
which result from this formula.
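The formula in the proof can be tried out numerically. The following is a minimal sketch in Python, not part of the original text; the function name `kth_roots` is hypothetical, and the angle returned by the standard library lies in (−π, π] rather than [0, 2π), which does not change the set of roots produced.

```python
import cmath
import math

def kth_roots(z, k):
    """Return the k distinct k-th roots of a nonzero complex number z."""
    if z == 0 or k < 1:
        raise ValueError("z must be nonzero and k a positive integer")
    r, t = cmath.polar(z)        # z = r (cos t + i sin t), with t in (-pi, pi]
    root_r = r ** (1.0 / k)      # |z|^(1/k)
    # One root for each l = 0, 1, ..., k-1; other integers l repeat these.
    return [cmath.rect(root_r, (t + 2 * l * math.pi) / k) for l in range(k)]

cube_roots_of_i = kth_roots(1j, 3)
```

Cubing each entry of `cube_roots_of_i` should return a number equal to i up to floating point rounding.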
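Example 1.5.6 Find the three cube roots of i.
First note that $i = 1\left(\cos\left(\frac{\pi}{2}\right) + i\sin\left(\frac{\pi}{2}\right)\right)$. Using the formula in the proof of the above
corollary, the cube roots of i are
$$1\left(\cos\left(\frac{(\pi/2) + 2l\pi}{3}\right) + i\sin\left(\frac{(\pi/2) + 2l\pi}{3}\right)\right)$$
where l = 0, 1, 2. Therefore, the roots are
$$\cos\left(\frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{6}\right), \quad \cos\left(\frac{5}{6}\pi\right) + i\sin\left(\frac{5}{6}\pi\right), \quad \text{and} \quad \cos\left(\frac{3}{2}\pi\right) + i\sin\left(\frac{3}{2}\pi\right).$$
Thus the cube roots of i are $\frac{\sqrt{3}}{2} + i\frac{1}{2}$, $\frac{-\sqrt{3}}{2} + i\frac{1}{2}$, and $-i$.
The ability to find $k$th roots can also be used to factor some polynomials.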
Example 1.5.7 Factor the polynomial $x^3 - 27$.
First find the cube roots of 27. By the above procedure using De Moivre’s theorem,
these cube roots are 3, $3\left(\frac{-1}{2} + i\frac{\sqrt{3}}{2}\right)$, and $3\left(\frac{-1}{2} - i\frac{\sqrt{3}}{2}\right)$. Therefore,
$$x^3 - 27 = (x - 3)\left(x - 3\left(\frac{-1}{2} + i\frac{\sqrt{3}}{2}\right)\right)\left(x - 3\left(\frac{-1}{2} - i\frac{\sqrt{3}}{2}\right)\right).$$
Note also
$$\left(x - 3\left(\frac{-1}{2} + i\frac{\sqrt{3}}{2}\right)\right)\left(x - 3\left(\frac{-1}{2} - i\frac{\sqrt{3}}{2}\right)\right) = x^2 + 3x + 9$$
and so
$$x^3 - 27 = (x - 3)\left(x^2 + 3x + 9\right)$$
where the quadratic polynomial, $x^2 + 3x + 9$ cannot be factored without using complex
numbers.
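As a check on the quadratic factor, here is the multiplication written out. The symbol ω is introduced only for this computation and stands for $\frac{-1}{2} + i\frac{\sqrt{3}}{2}$, so that $\overline{\omega} = \frac{-1}{2} - i\frac{\sqrt{3}}{2}$, $\omega + \overline{\omega} = -1$, and $\omega\overline{\omega} = \frac{1}{4} + \frac{3}{4} = 1$.

$$(x - 3\omega)(x - 3\overline{\omega}) = x^2 - 3(\omega + \overline{\omega})x + 9\omega\overline{\omega} = x^2 - 3(-1)x + 9(1) = x^2 + 3x + 9.$$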
The real and complex numbers both are fields satisfying the axioms on Page 13 and it is
usually one of these two fields which is used in linear algebra. The numbers are often called
scalars. However, it turns out that all algebraic notions work for any field and there are
many others. For this reason, I will often refer to the field of scalars as F although F will
usually be either the real or complex numbers. If there is any doubt, assume it is the field
of complex numbers which is meant. The reason the complex numbers are so significant in
linear algebra is that they are algebraically complete. This means that every polynomial
$\sum_{k=0}^{n} a_k z^k$, $n \geq 1$, $a_n \neq 0$, having coefficients $a_k$ in C has a root in C. I will give next a
simple explanation of why it is reasonable to believe in this theorem. A correct proof based
on analysis is given in an appendix, Theorem F.0.14 on Page 443.
Theorem 1.5.8 Let $p(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0$ where each $a_k$ is a complex
number and $a_n \neq 0$, $n \geq 1$. Then there exists w ∈ C such that p (w) = 0.
Here is an informal explanation. Dividing by the leading coefficient $a_n$, there is no loss
of generality in assuming that the polynomial is of the form
$$p(z) = z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0$$
If $a_0 = 0$, there is nothing to prove because p (0) = 0. Therefore, assume $a_0 \neq 0$. From
the polar form of a complex number z, it can be written as |z| (cos θ + i sin θ). Thus, by
De Moivre’s theorem,
$$z^n = |z|^n(\cos(n\theta) + i\sin(n\theta)).$$
It follows that $z^n$ is some point on the circle of radius $|z|^n$.
Denote by $C_r$ the circle of radius r in the complex plane which is centered at 0. Then if
r is sufficiently large and |z| = r, the term $z^n$ is far larger than the rest of the polynomial.
It is on the circle of radius $|z|^n$ while the other terms are on circles of fixed multiples of $|z|^k$
for k ≤ n − 1. Thus, for r large enough, $A_r = \{p(z) : z \in C_r\}$ describes a closed curve which
misses the inside of some circle having 0 as its center. It won’t be as simple as suggested
in the following picture, but it will be a closed curve thanks to De Moivre’s theorem and
the observation that the cosine and sine are periodic. Now shrink r. Eventually, for r small
enough, the non constant terms are negligible and so $A_r$ is a curve which is contained in
some circle centered at $a_0$ which has 0 on the outside.
[Figure: two sketches of the closed curve $A_r$, labeled “r large” and “r small”, showing the points 0 and $a_0$.]
Thus it is reasonable to believe that for some r during this shrinking process, the set $A_r$
must hit 0. It follows that p (z) = 0 for some z.
For example, consider the polynomial $x^3 + x + 1 + i$. It has no real zeros. However, you
could let z = r (cos t + i sin t) and insert this into the polynomial. Thus you would want to
find a point where
$$(r(\cos t + i\sin t))^3 + r(\cos t + i\sin t) + 1 + i = 0.$$
Expanding this expression on the left to write it in terms of real and imaginary parts, you
get on the left
$$r^3\cos^3 t - 3r^3\cos t\sin^2 t + r\cos t + 1 + i\left(3r^3\cos^2 t\sin t - r^3\sin^3 t + r\sin t + 1\right).$$
Thus you need to have both the real and imaginary parts equal to 0. In other words, you
need to have
$$\left(r^3\cos^3 t - 3r^3\cos t\sin^2 t + r\cos t + 1,\ 3r^3\cos^2 t\sin t - r^3\sin^3 t + r\sin t + 1\right) = (0, 0)$$
for some value of r and t. First here is a graph of this parametric function of t for t ∈ [0, 2π]
when r = 2.
[Figure: graph of the curve in the xy plane for r = 2.]
Note how the graph misses the origin 0 + i0. In fact, the closed curve contains a small
circle which has the point 0 + i0 on its inside. Now here is the graph when r = .5.
[Figure: graph of the curve in the xy plane for r = .5.]
Note how the closed curve is included in a circle which has 0 + i0 on its outside. As you
shrink r you get closed curves. At first, these closed curves enclose 0 + i0 and later, they
exclude 0 + i0. Thus one of them should pass through this point. In fact, consider the curve
which results when r = 1.3862.
[Figure: graph of the curve in the xy plane for r = 1.3862.]
Note how for this value of r the curve passes through the point 0 + i0. Thus for some t,
$$1.3862(\cos t + i\sin t)$$
is a solution of the equation p (z) = 0. A complete proof is in an appendix.
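The claim that the curves enclose 0 + i0 for large r and exclude it for small r can be checked numerically by counting how many times the curve winds around the origin. The following Python sketch is an illustration only and is not from the original text; the names `p` and `winding_number` and the sample count are assumptions, and the count is only meaningful when the curve does not pass through 0.

```python
import cmath
import math

def p(z):
    """The example polynomial z^3 + z + 1 + i."""
    return z**3 + z + 1 + 1j

def winding_number(r, samples=4000):
    """Count how many times the curve t -> p(r(cos t + i sin t)) winds around 0.

    Assumes the curve does not pass through 0 for this value of r.
    """
    total = 0.0
    prev = p(complex(r, 0.0))
    for j in range(1, samples + 1):
        t = 2 * math.pi * j / samples
        cur = p(cmath.rect(r, t))
        total += cmath.phase(cur / prev)   # small signed change in the angle
        prev = cur
    return round(total / (2 * math.pi))

large = winding_number(2.0)   # curve for r = 2
small = winding_number(0.5)   # curve for r = .5
```

A nonzero count for r = 2 and a zero count for r = .5 agree with the pictures described above: somewhere between these radii the curve has to cross the origin.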
1.6 Exercises
1. Let z = 5 + i9. Find $z^{-1}$.
2. Let z = 2 + i7 and let w = 3 − i8. Find zw, z + w, $z^2$, and w/z.
3. Give the complete solution to $x^4 + 16 = 0$.
4. Graph the complex cube roots of −8 in the complex plane. Do the same for the four
fourth roots of −16.
5. If z is a complex number, show there exists ω a complex number with |ω| = 1 and
ωz = |z| .
6. De Moivre’s theorem says $[r(\cos t + i\sin t)]^n = r^n(\cos nt + i\sin nt)$ for n a positive
integer. Does this formula continue to hold for all integers, n, even negative integers?
Explain.
7. You already know formulas for cos (x + y) and sin (x + y) and these were used to prove
De Moivre’s theorem. Now using De Moivre’s theorem, derive a formula for sin (5x)
and one for cos (5x). Hint: Use the binomial theorem.
8. If z and w are two complex numbers and the polar form of z involves the angle θ while
the polar form of w involves the angle ϕ, show that in the polar form for zw the angle
involved is θ + ϕ. Also, show that in the polar form of a complex number, z, r = |z| .
9. Factor $x^3 + 8$ as a product of linear factors.
10. Write $x^3 + 27$ in the form $(x + 3)\left(x^2 + ax + b\right)$ where $x^2 + ax + b$ cannot be factored
any more using only real numbers.
11. Completely factor $x^4 + 16$ as a product of linear factors.
12. Factor $x^4 + 16$ as the product of two quadratic polynomials each of which cannot be
factored further without using complex numbers.
13. If z, w are complex numbers prove $\overline{zw} = \overline{z}\,\overline{w}$ and then show by induction that
$\overline{z_1 \cdots z_m} = \overline{z_1} \cdots \overline{z_m}$. Also verify that $\overline{\sum_{k=1}^{m} z_k} = \sum_{k=1}^{m} \overline{z_k}$. In words this says the conjugate of a
product equals the product of the conjugates and the conjugate of a sum equals the
sum of the conjugates.
14. Suppose $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ where all the $a_k$ are real numbers.
Suppose also that p (z) = 0 for some z ∈ C. Show it follows that $p(\overline{z}) = 0$ also.
15. I claim that 1 = −1. Here is why.
$$-1 = i^2 = \sqrt{-1}\sqrt{-1} = \sqrt{(-1)^2} = \sqrt{1} = 1.$$
This is clearly a remarkable result but is there something wrong with it? If so, what
is wrong?
16. De Moivre’s theorem is really a grand thing. I plan to use it now for rational exponents,
not just integers.
$$1 = 1^{(1/4)} = (\cos 2\pi + i\sin 2\pi)^{1/4} = \cos(\pi/2) + i\sin(\pi/2) = i.$$
Therefore, squaring both sides it follows 1 = −1 as in the previous problem. What
does this tell you about De Moivre’s theorem? Is there a profound difference between
raising numbers to integer powers and raising numbers to non integer powers?
17. Show that C cannot be considered an ordered field. Hint: Consider $i^2 = -1$. Recall
that 1 > 0 by Proposition 1.4.2.
18. Say a + ib < x + iy if a < x or if a = x, then b < y. This is called the lexicographic
order. Show that any two different complex numbers can be compared with this order.
What goes wrong in terms of the other requirements for an ordered field?
19. With the order of Problem 18, consider for n ∈ N the complex number $1 - \frac{1}{n}$. Show
that with the lexicographic order just described, each of 1 − in is an upper bound to
all these numbers. Therefore, this is a set which is “bounded above” but has no least
upper bound with respect to the lexicographic order on C.
1.7 Completeness of R
Recall the following important definition from calculus, completeness of R.
Definition 1.7.1 A non empty set, S ⊆ R is bounded above (below) if there exists x ∈ R
such that x ≥ (≤) s for all s ∈ S. If S is a nonempty set in R which is bounded above,
then a number, l which has the property that l is an upper bound and that every other upper
bound is no smaller than l is called a least upper bound, l.u.b. (S) or often sup (S) . If S is a
nonempty set bounded below, define the greatest lower bound, g.l.b. (S) or inf (S) similarly.
Thus g is the g.l.b. (S) means g is a lower bound for S and it is the largest of all lower
bounds. If S is a nonempty subset of R which is not bounded above, this information is
expressed by saying sup (S) = +∞ and if S is not bounded below, inf (S) = −∞.
Every existence theorem in calculus depends on some form of the completeness axiom.
Axiom 1.7.2 (completeness) Every nonempty set of real numbers which is bounded above
has a least upper bound and every nonempty set of real numbers which is bounded below has
a greatest lower bound.
It is this axiom which distinguishes Calculus from Algebra. A fundamental result about
sup and inf is the following.
Proposition 1.7.3 Let S be a nonempty set and suppose sup (S) exists. Then for every
δ > 0,
S ∩ (sup (S) − δ, sup (S)] ̸= ∅.
If inf (S) exists, then for every δ > 0,
S ∩ [inf (S) , inf (S) + δ) ̸= ∅.
Proof: Consider the first claim. If the indicated set equals ∅, then sup (S) − δ is an
upper bound for S which is smaller than sup (S) , contrary to the definition of sup (S) as
the least upper bound. In the second claim, if the indicated set equals ∅, then inf (S) + δ
would be a lower bound which is larger than inf (S) contrary to the definition of inf (S).
1.8 Well Ordering And Archimedean Property
Definition 1.8.1 A set is well ordered if every nonempty subset S, contains a smallest
element z having the property that z ≤ x for all x ∈ S.
Axiom 1.8.2 Any set of integers larger than a given number is well ordered.
In particular, the natural numbers defined as
N ≡ {1, 2, · · · }
is well ordered.
The above axiom implies the principle of mathematical induction.
Theorem 1.8.3 (Mathematical induction) A set S ⊆ Z, having the property that a ∈ S
and n + 1 ∈ S whenever n ∈ S contains all integers x ∈ Z such that x ≥ a.
Proof: Let T ≡ ([a, ∞) ∩ Z) \ S. Thus T consists of all integers larger than or equal
to a which are not in S. The theorem will be proved if T = ∅. If T ̸= ∅ then by the well
ordering principle, there would have to exist a smallest element of T, denoted as b. It must
be the case that b > a since by definition, a ∉ T. Then the integer, b − 1 ≥ a and b − 1 ∉ S
because if b − 1 ∈ S, then b − 1 + 1 = b ∈ S by the assumed property of S. Therefore,
b − 1 ∈ ([a, ∞) ∩ Z) \ S = T which contradicts the choice of b as the smallest element of T.
(b − 1 is smaller.) Since a contradiction is obtained by assuming T ≠ ∅, it must be the case
that T = ∅ and this says that everything in [a, ∞) ∩ Z is also in S.
Example 1.8.4 Show that for all n ∈ N,
$$\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n-1}{2n} < \frac{1}{\sqrt{2n+1}}.$$
If n = 1 this reduces to the statement that $\frac{1}{2} < \frac{1}{\sqrt{3}}$ which is obviously true. Suppose
then that the inequality holds for n. Then
$$\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n-1}{2n} \cdot \frac{2n+1}{2n+2} < \frac{1}{\sqrt{2n+1}} \cdot \frac{2n+1}{2n+2} = \frac{\sqrt{2n+1}}{2n+2}.$$
The theorem will be proved if this last expression is less than $\frac{1}{\sqrt{2n+3}}$. This happens if and
only if
$$\left(\frac{1}{\sqrt{2n+3}}\right)^2 = \frac{1}{2n+3} > \frac{2n+1}{(2n+2)^2}$$
which occurs if and only if $(2n+2)^2 > (2n+3)(2n+1)$ and this is clearly true which may
be seen from expanding both sides. This proves the inequality.
Definition 1.8.5 The Archimedean property states that whenever x ∈ R, and a > 0, there
exists n ∈ N such that na > x.
Proposition 1.8.6 R has the Archimedean property.
Proof: Suppose it is not true. Then there exists x ∈ R and a > 0 such that na ≤ x
for all n ∈ N. Let S = {na : n ∈ N} . By assumption, this is bounded above by x. By
completeness, it has a least upper bound y. By Proposition 1.7.3 there exists n ∈ N such
that
y − a < na ≤ y.
Then y = y − a + a < na + a = (n + 1) a ≤ y, a contradiction.
Theorem 1.8.7 Suppose x < y and y − x > 1. Then there exists an integer l ∈ Z, such
that x < l < y. If x is an integer, there is no integer y satisfying x < y < x + 1.
Proof: Let x be the smallest positive integer. Not surprisingly, x = 1 but this can be
proved. If x < 1 then $x^2 < x$ contradicting the assertion that x is the smallest natural
number. Therefore, 1 is the smallest natural number. This shows there is no integer, y,
satisfying x < y < x + 1 since otherwise, you could subtract x and conclude 0 < y − x < 1
for some integer y − x.
Now suppose y − x > 1 and let
S ≡ {w ∈ N : w ≥ y} .
The set S is nonempty by the Archimedean property. Let k be the smallest element of S.
Therefore, k − 1 < y. Either k − 1 ≤ x or k − 1 > x. If k − 1 ≤ x, then
$$y - x \leq y - (k - 1) = \overbrace{y - k}^{\leq 0} + 1 \leq 1$$
contrary to the assumption that y − x > 1. Therefore, x < k − 1 < y. Let l = k − 1.
It is the next theorem which gives the density of the rational numbers. This means that
for any real number, there exists a rational number arbitrarily close to it.
Theorem 1.8.8 If x < y then there exists a rational number r such that x < r < y.
Proof: Let n ∈ N be large enough that
n (y − x) > 1.
Thus (y − x) added to itself n times is larger than 1. Therefore,
n (y − x) = ny + n (−x) = ny − nx > 1.
It follows from Theorem 1.8.7 there exists m ∈ Z such that
nx < m < ny
and so take r = m/n.
Definition 1.8.9 A set S ⊆ R is dense in R if whenever a < b, S ∩ (a, b) ≠ ∅.
Thus the above theorem says Q is “dense” in R.
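The proof of Theorem 1.8.8 is constructive, and the steps can be followed in a few lines of Python. This is only a sketch under stated assumptions: the function name `rational_between` is hypothetical, and the inputs are taken to be exact `Fraction` values so that no rounding enters.

```python
from fractions import Fraction
import math

def rational_between(x, y):
    """Return a Fraction r with x < r < y, following the proof of Theorem 1.8.8."""
    if not x < y:
        raise ValueError("need x < y")
    n = math.floor(1 / (y - x)) + 1   # a natural number with n (y - x) > 1
    m = math.floor(n * x) + 1         # an integer with n x < m < n y
    return Fraction(m, n)

r1 = rational_between(Fraction(1, 3), Fraction(1, 2))
r2 = rational_between(Fraction(-22, 7), Fraction(-3))
```

Here `m` is chosen as the first integer past nx, which is one way to pick the integer whose existence Theorem 1.8.7 guarantees.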
Theorem 1.8.10 Suppose 0 < a and let b ≥ 0. Then there exists a unique integer p and
real number r such that 0 ≤ r < a and b = pa + r.
Proof: Let S ≡ {n ∈ N : an > b} . By the Archimedean property this set is nonempty.
Let p + 1 be the smallest element of S. Then pa ≤ b because p + 1 is the smallest in S.
Therefore,
r ≡ b − pa ≥ 0.
If r ≥ a then b − pa ≥ a and so b ≥ (p + 1) a contradicting p + 1 ∈ S. Therefore, r < a as
desired.
To verify uniqueness of p and r, suppose $p_i$ and $r_i$, i = 1, 2, both work and $r_2 > r_1$. Then
a little algebra shows
$$p_1 - p_2 = \frac{r_2 - r_1}{a} \in (0, 1).$$
Thus $p_1 - p_2$ is an integer between 0 and 1, contradicting Theorem 1.8.7. The case that
$r_1 > r_2$ cannot occur either by similar reasoning. Thus $r_1 = r_2$ and it follows that $p_1 = p_2$.
This theorem is called the Euclidean algorithm when a and b are integers.
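For concrete numbers the pair p, r of Theorem 1.8.10 can be computed directly. The sketch below is an illustration, not the author's method; the name `divide_with_remainder` is hypothetical, and it relies on Python's floor division, which works for integers and for exact `Fraction` values alike.

```python
from fractions import Fraction

def divide_with_remainder(b, a):
    """Return (p, r) with b = p*a + r, p an integer and 0 <= r < a."""
    if a <= 0 or b < 0:
        raise ValueError("need 0 < a and b >= 0")
    p = b // a          # the largest integer p with p*a <= b
    r = b - p * a       # what is left over
    return p, r

int_case = divide_with_remainder(385, 165)
frac_case = divide_with_remainder(Fraction(7, 2), Fraction(3, 4))
```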
1.9 Division
First recall Theorem 1.8.10, the Euclidean algorithm.
Theorem 1.9.1 Suppose 0 < a and let b ≥ 0. Then there exists a unique integer p and real
number r such that 0 ≤ r < a and b = pa + r.
The following definition describes what is meant by a prime number and also what is
meant by the word “divides”.
Definition 1.9.2 The number, a divides the number, b if in Theorem 1.8.10, r = 0. That
is there is zero remainder. The notation for this is a|b, read a divides b and a is called a
factor of b. A prime number is one which has the property that the only numbers which
divide it are itself and 1. The greatest common divisor of two positive integers, m, n is that
number, p which has the property that p divides both m and n and also if q divides both m
and n, then q divides p. Two integers are relatively prime if their greatest common divisor
is one. The greatest common divisor of m and n is denoted as (m, n) .
There is a phenomenal and amazing theorem which relates the greatest common divisor
to the smallest number in a certain set. Suppose m, n are two positive integers. Then if x, y
are integers, so is xm + yn. Consider all integers which are of this form. Some are positive
such as 1m + 1n and some are not. The set S in the following theorem consists of exactly
those integers of this form which are positive. Then the greatest common divisor of m and
n will be the smallest number in S. This is what the following theorem says.
Theorem 1.9.3 Let m, n be two positive integers and define
S ≡ {xm + yn ∈ N : x, y ∈ Z } .
Then the smallest number in S is the greatest common divisor, denoted by (m, n) .
Proof: First note that both m and n are in S so it is a nonempty set of positive integers.
By well ordering, there is a smallest element of S, called $p = x_0 m + y_0 n$. Either p divides m
or it does not. If p does not divide m, then by Theorem 1.8.10,
$$m = pq + r$$
where 0 < r < p. Thus $m = (x_0 m + y_0 n)q + r$ and so, solving for r,
$$r = m(1 - x_0 q) + (-y_0 q)n \in S.$$
However, this is a contradiction because p was the smallest element of S. Thus p|m. Similarly
p|n.
Now suppose q divides both m and n. Then m = qx and n = qy for integers, x and y.
Therefore,
$$p = mx_0 + ny_0 = x_0 qx + y_0 qy = q(x_0 x + y_0 y)$$
showing q|p. Therefore, p = (m, n).
There is a relatively simple algorithm for finding (m, n) which will be discussed now.
Suppose 0 < m < n where m, n are integers. Also suppose the greatest common divisor is
(m, n) = d. Then by the Euclidean algorithm, there exist integers q, r such that
$$n = qm + r, \quad r < m \tag{1.1}$$
Now d divides n and m so there are numbers k, l such that dk = m, dl = n. From the above
equation,
r = n − qm = dl − qdk = d (l − qk)
Thus d divides both m and r. If k divides both m and r, then from the equation of 1.1 it
follows k also divides n. Therefore, k divides d by the definition of the greatest common
divisor. Thus d is the greatest common divisor of m and r but m + r < m + n. This yields
another pair of positive integers for which d is still the greatest common divisor but the
sum of these integers is strictly smaller than the sum of the first two. Now you can do the
same thing to these integers. Eventually the process must end because the sum gets strictly
smaller each time it is done. It ends when there are not two positive integers produced.
That is, one is a multiple of the other. At this point, the greatest common divisor is the
smaller of the two numbers.
Procedure 1.9.4 To find the greatest common divisor of m, n where 0 < m < n, replace
the pair {m, n} with {m, r} where n = qm + r for r < m. This new pair of numbers has
the same greatest common divisor. Do the process to this pair and continue doing this till
you obtain a pair of numbers where one is a multiple of the other. Then the smaller is the
sought for greatest common divisor.
Example 1.9.5 Find the greatest common divisor of 165 and 385.
Use the Euclidean algorithm to write
385 = 2 (165) + 55
Thus the next two numbers are 55 and 165. Then
165 = 3 × 55
and so the greatest common divisor of the first two numbers is 55.
Example 1.9.6 Find the greatest common divisor of 1237 and 4322.
Use the Euclidean algorithm to write
4322 = 3 (1237) + 611
Now the two new numbers are 1237, 611. Then
1237 = 2 (611) + 15
The two new numbers are 15, 611. Then
611 = 40 (15) + 11
The two new numbers are 15, 11. Then
15 = 1 (11) + 4
The two new numbers are 11, 4. Then
11 = 2 (4) + 3
The two new numbers are 4, 3. Then
4 = 1 (3) + 1
The two new numbers are 3, 1. Then
3 = 3 × 1
and so 1 is the greatest common divisor. Of course you could see this right away when the
two new numbers were 15 and 11. Recall the process delivers numbers which have the same
greatest common divisor.
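Procedure 1.9.4 is easy to mechanize. The following Python sketch is not from the original text; the names `gcd_steps` and `bezout` are hypothetical. The first function records each division n = qm + r carried out by the procedure, and the second also tracks integers x, y with xm + yn equal to the greatest common divisor, as in Theorem 1.9.3.

```python
def gcd_steps(m, n):
    """Procedure 1.9.4: return (gcd, steps), each step (n, q, m, r) with n = q*m + r."""
    if not 0 < m < n:
        raise ValueError("need 0 < m < n")
    steps = []
    while True:
        q, r = divmod(n, m)
        steps.append((n, q, m, r))
        if r == 0:              # n is a multiple of m, so m is the gcd
            return m, steps
        n, m = m, r             # replace the pair {m, n} with {r, m}

def bezout(m, n):
    """Return (d, x, y) with d = (m, n) and d = x*m + y*n."""
    x0, y0, x1, y1 = 1, 0, 0, 1     # current m and n as combinations of the inputs
    while n != 0:
        q, r = divmod(m, n)
        m, n = n, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return m, x0, y0

d1, steps1 = gcd_steps(165, 385)      # Example 1.9.5
d2, steps2 = gcd_steps(1237, 4322)    # Example 1.9.6
d, x, y = bezout(165, 385)
```

The entries of `steps2` can be compared line by line with the divisions displayed in Example 1.9.6.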
This amazing theorem will now be used to prove a fundamental property of prime numbers which leads to the fundamental theorem of arithmetic, the major theorem which says
every integer can be factored as a product of primes.
Theorem 1.9.7 If p is a prime and p|ab then either p|a or p|b.
Proof: Suppose p does not divide a. Then since p is prime, the only factors of p are 1
and p so it follows that (p, a) = 1 and therefore, there exist integers, x and y such that
1 = ax + yp.
Multiplying this equation by b yields
b = abx + ybp.
Since p|ab, ab = pz for some integer z. Therefore,
b = abx + ybp = pzx + ybp = p (xz + yb)
and this shows p divides b.
Theorem 1.9.8 (Fundamental theorem of arithmetic) Let a ∈ N \ {1}. Then $a = \prod_{i=1}^{n} p_i$
where $p_i$ are all prime numbers. Furthermore, this prime factorization is unique except for
the order of the factors.
Proof: If a equals a prime number, the prime factorization clearly exists. In particular
the prime factorization exists for the prime number 2. Assume this theorem is true for all
a ≤ n − 1. If n is a prime, then it has a prime factorization. On the other hand, if n is not
a prime, then there exist two integers k and m such that n = km where each of k and m
are less than n. Therefore, each of these is no larger than n − 1 and consequently, each has
a prime factorization. Thus so does n. It remains to argue the prime factorization is unique
except for order of the factors.
Suppose
$$\prod_{i=1}^{n} p_i = \prod_{j=1}^{m} q_j$$
where the $p_i$ and $q_j$ are all prime, there is no way to reorder the $q_k$ such that m = n and
$p_i = q_i$ for all i, and n + m is the smallest positive integer such that this happens. Then
by Theorem 1.9.7, $p_1 | q_j$ for some j. Since these are prime numbers this requires $p_1 = q_j$.
Reordering if necessary it can be assumed that $q_j = q_1$. Then dividing both sides by $p_1 = q_1$,
$$\prod_{i=1}^{n-1} p_{i+1} = \prod_{j=1}^{m-1} q_{j+1}.$$
Since n + m was as small as possible for the theorem to fail, it follows that n − 1 = m − 1
and the prime numbers, $q_2, \cdots, q_m$ can be reordered in such a way that $p_k = q_k$ for all
k = 2, · · · , n. Hence $p_i = q_i$ for all i because it was already argued that $p_1 = q_1$, and this
results in a contradiction.
There is a similar division result for polynomials. This will be discussed more intensively
later. For now, here is a definition and the division theorem.
Definition 1.9.9 A polynomial is an expression of the form $a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$,
$a_n \neq 0$ where the $a_i$ come from a field of scalars. Two polynomials are equal means that the
coefficients match for each power of λ. The degree of a polynomial is the largest power
of λ. Thus the degree of the above polynomial is n. Addition of polynomials is defined in the
usual way as is multiplication of two polynomials. The leading term in the above polynomial
is $a_n\lambda^n$.
Lemma 1.9.10 Let f (λ) and g (λ) ̸= 0 be polynomials. Then there exist polynomials, q (λ)
and r (λ) such that
f (λ) = q (λ) g (λ) + r (λ)
where the degree of r (λ) is less than the degree of g (λ) or r (λ) = 0. These polynomials
q (λ) and r (λ) are unique.
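The lemma can be illustrated by ordinary long division, which repeatedly cancels the leading term of the current remainder. The Python sketch below is an assumption-laden illustration rather than part of the text: the name `poly_divmod` is hypothetical, polynomials are stored as coefficient lists starting with the constant term, the scalars are taken to be rational numbers, and the zero polynomial is the empty list.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r == [] or deg r < deg g.

    Coefficients are listed from the constant term upward.
    """
    r = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:
        g.pop()
    if not g:
        raise ValueError("g must not be the zero polynomial")
    while r and r[-1] == 0:
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # the exponent m - n
        a = r[-1] / g[-1]            # a * lambda^(m-n) * g has r's leading term
        q[shift] += a
        for i, c in enumerate(g):
            r[shift + i] -= a * c
        while r and r[-1] == 0:      # drop the cancelled leading term(s)
            r.pop()
    return q, r

q1, r1 = poly_divmod([-27, 0, 0, 1], [-3, 1])   # (x^3 - 27) divided by (x - 3)
q2, r2 = poly_divmod([1, 0, 1], [1, 1])         # (x^2 + 1) divided by (x + 1)
```

The first call recovers the factorization of Example 1.5.7, with quotient $x^2 + 3x + 9$ and zero remainder.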
Proof: Suppose that f (λ) − q (λ) g (λ) is never equal to 0 for any q (λ). If it is, then the
conclusion follows. Now suppose
r (λ) = f (λ) − q (λ) g (λ)
and the degree of r (λ) is m ≥ n where n is the degree of g (λ). Then there exists a
such that $a\lambda^{m-n} g(\lambda)$ has the same leading term as r (λ). Thus
$r_1(\lambda) \equiv r(\lambda) - a\lambda^{m-n} g(\lambda)$ has degree no more than m − 1. Then
$$r_1(\lambda) = f(\lambda) - \left(q(\lambda) g(\lambda) + a\lambda^{m-n} g(\lambda)\right) = f(\lambda) - \overbrace{\left(q(\lambda) + a\lambda^{m-n}\right)}^{q_1(\lambda)} g(\lambda).$$
Denote by S the set of polynomials f (λ) − g (λ) l (λ) . Out of all these polynomials, there
exists one which has smallest degree r (λ). Let this take place when l (λ) = q (λ). Then by
the above argument, the degree of r (λ) is less than the degree of g (λ). Otherwise, there is
one which has smaller degree. Thus f (λ) = g (λ) q (λ) + r (λ).
As to uniqueness, if you have $r(\lambda), \hat{r}(\lambda), q(\lambda), \hat{q}(\lambda)$ which work, then you would have
$$(\hat{q}(\lambda) - q(\lambda))\, g(\lambda) = r(\lambda) - \hat{r}(\lambda).$$
Now if the polynomial on the right is not zero, then neither is the one on the left. Hence this
would involve two polynomials which are equal although their degrees are different. This is
impossible. Hence $r(\lambda) = \hat{r}(\lambda)$ and so, matching coefficients implies that $\hat{q}(\lambda) = q(\lambda)$.
1.10 Systems Of Equations
Sometimes it is necessary to solve systems of equations. For example the problem could be
to find x and y such that
x + y = 7 and 2x − y = 8.
(1.2)
The set of ordered pairs, (x, y) which solve both equations is called the solution set. For
example, you can see that (5, 2) = (x, y) is a solution to the above system. To solve this,
note that the solution set does not change if any equation is replaced by a non zero multiple
of itself. It also does not change if one equation is replaced by itself added to a multiple
of the other equation. For example, x and y solve the above system if and only if x and y
solve the system
$$x + y = 7, \quad \overbrace{2x - y + (-2)(x + y) = 8 + (-2)(7)}^{-3y = -6}. \tag{1.3}$$
The second equation was replaced by −2 times the first equation added to the second. Thus
the solution is y = 2, from −3y = −6 and now, knowing y = 2, it follows from the other
equation that x + 2 = 7 and so x = 5.
Why exactly does the replacement of one equation with a multiple of another added to
it not change the solution set? The two equations of 1.2 are of the form
$$E_1 = f_1, \quad E_2 = f_2 \tag{1.4}$$
where $E_1$ and $E_2$ are expressions involving the variables. The claim is that if a is a number,
then 1.4 has the same solution set as
$$E_1 = f_1, \quad E_2 + aE_1 = f_2 + af_1. \tag{1.5}$$
Why is this?
If (x, y) solves 1.4 then it solves the first equation in 1.5. Also, it satisfies aE1 = af1
and so, since it also solves E2 = f2 it must solve the second equation in 1.5. If (x, y) solves
1.5 then it solves the first equation of 1.4. Also aE1 = af1 and it is given that the second
equation of 1.5 is verified. Therefore, E2 = f2 and it follows (x, y) is a solution of the second
equation in 1.4. This shows the solutions to 1.4 and 1.5 are exactly the same which means
they have the same solution set. Of course the same reasoning applies with no change if
there are many more variables than two and many more equations than two. It is still the
case that when one equation is replaced with a multiple of another one added to itself, the
solution set of the whole system does not change.
The other thing which does not change the solution set of a system of equations consists
of listing the equations in a different order. Here is another example.
Example 1.10.1 Find the solutions to the system,
x + 3y + 6z = 25
2x + 7y + 14z = 58
2y + 5z = 19
(1.6)
To solve this system replace the second equation by (−2) times the first equation added
to the second. This yields the system
x + 3y + 6z = 25
y + 2z = 8
2y + 5z = 19
(1.7)
Now take (−2) times the second and add to the third. More precisely, replace the third
equation with (−2) times the second added to the third. This yields the system
x + 3y + 6z = 25
y + 2z = 8
z=3
(1.8)
At this point, you can tell what the solution is. This system has the same solution as the
original system and in the above, z = 3. Then using this in the second equation, it follows
y + 6 = 8 and so y = 2. Now using this in the top equation yields x + 6 + 18 = 25 and so
x = 1.
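The two replacement steps and the back substitution in this example can be reproduced with a short Python sketch. It is an illustration under assumptions and not part of the original text: each equation of 1.6 is stored as a list holding the coefficients of x, y, z followed by the right side, and the helper name `add_multiple` is hypothetical.

```python
from fractions import Fraction

def add_multiple(rows, src, dst, factor):
    """Replace equation dst by itself plus factor times equation src."""
    rows[dst] = [d + factor * s for s, d in zip(rows[src], rows[dst])]

# Coefficients of x, y, z and then the right side, for the system 1.6.
system = [[Fraction(v) for v in row] for row in (
    [1, 3, 6, 25],
    [2, 7, 14, 58],
    [0, 2, 5, 19],
)]

add_multiple(system, 0, 1, -2)   # gives y + 2z = 8, as in 1.7
add_multiple(system, 1, 2, -2)   # gives z = 3, as in 1.8

# Back substitution, from the last equation up.
z = system[2][3] / system[2][2]
y = (system[1][3] - system[1][2] * z) / system[1][1]
x = (system[0][3] - system[0][1] * y - system[0][2] * z) / system[0][0]
```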
This process is not really much different from what you have always done in solving a
single equation. For example, suppose you wanted to solve 2x + 5 = 3x − 6. You did the
same thing to both sides of the equation thus preserving the solution set until you obtained
an equation which was simple enough to give the answer. In this case, you would add −2x
to both sides and then add 6 to both sides. This yields x = 11.
In 1.8 you could have continued as follows. Add (−2) times the bottom equation to the
middle and then add (−6) times the bottom to the top. This yields
x + 3y = 7
y = 2
z = 3
Now add (−3) times the second to the top. This yields
x = 1
y = 2
z = 3
a system which has the same solution set as the original system.
It is foolish to write the variables every time you do these operations. It is easier to
write the system 1.6 as the following “augmented matrix”
$$\begin{pmatrix} 1 & 3 & 6 & 25 \\ 2 & 7 & 14 & 58 \\ 0 & 2 & 5 & 19 \end{pmatrix}.$$
It has exactly the same information as the original system but here it is understood there is
an x column, $\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$, a y column, $\begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix}$ and a z column, $\begin{pmatrix} 6 \\ 14 \\ 5 \end{pmatrix}$. The rows correspond
to the equations in the system. Thus the top row in the augmented matrix corresponds to
the equation,
x + 3y + 6z = 25.
Now when you replace an equation with a multiple of another equation added to itself, you
are just taking a row of this augmented matrix and replacing it with a multiple of another
row added to it. Thus the first step in solving 1.6 would be to take (−2) times the first row
of the augmented matrix above and add it to the second row,
$$\begin{pmatrix} 1 & 3 & 6 & 25 \\ 0 & 1 & 2 & 8 \\ 0 & 2 & 5 & 19 \end{pmatrix}.$$
Note how this corresponds to 1.7. Next take (−2) times the second row and add to the
third,
$$\begin{pmatrix} 1 & 3 & 6 & 25 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 3 \end{pmatrix}$$
which is the same as 1.8. You get the idea I hope. Write the system as an augmented matrix
and follow the procedure of either switching rows, multiplying a row by a non zero number,
or replacing a row by a multiple of another row added to it. Each of these operations leaves
the solution set unchanged. These operations are called row operations.
Definition 1.10.2 The row operations consist of the following
1. Switch two rows.
2. Multiply a row by a nonzero number.
3. Replace a row by a multiple of another row added to it.
It is important to observe that any row operation can be “undone” by another inverse
row operation. For example, if r1 , r2 are two rows, and r2 is replaced with r′2 = αr1 + r2
using row operation 3, then you could get back to where you started by replacing the row r′2
with −α times r1 and adding to r′2 . In the case of operation 2, you would simply multiply
the row that was changed by the inverse of the scalar which multiplied it in the first place,
and in the case of row operation 1, you would just make the same switch again and you
would be back to where you started. In each case, the row operation which undoes what
was done is called the inverse row operation.
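The three row operations and their inverses can be written as small functions. The Python sketch below is only an illustration; the function names are hypothetical and a matrix is stored as a list of rows with exact `Fraction` entries. Each operation is applied to the augmented matrix of 1.6 and then undone by its inverse operation.

```python
from fractions import Fraction

def switch(rows, i, j):
    """Row operation 1: switch rows i and j."""
    rows[i], rows[j] = rows[j], rows[i]

def scale(rows, i, c):
    """Row operation 2: multiply row i by a nonzero number c."""
    if c == 0:
        raise ValueError("c must be nonzero")
    rows[i] = [c * v for v in rows[i]]

def add_multiple(rows, src, dst, alpha):
    """Row operation 3: replace row dst by alpha times row src added to row dst."""
    rows[dst] = [alpha * s + d for s, d in zip(rows[src], rows[dst])]

original = [[Fraction(v) for v in row] for row in (
    [1, 3, 6, 25],
    [2, 7, 14, 58],
    [0, 2, 5, 19],
)]
rows = [row[:] for row in original]

add_multiple(rows, 0, 1, Fraction(-2))
add_multiple(rows, 0, 1, Fraction(2))     # inverse: use -alpha
scale(rows, 2, Fraction(5))
scale(rows, 2, Fraction(1, 5))            # inverse: use 1/c
switch(rows, 0, 2)
switch(rows, 0, 2)                        # inverse: the same switch again
```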
Example 1.10.3 Give the complete solution to the system of equations, 5x+10y−7z = −2,
2x + 4y − 3z = −1, and 3x + 6y + 5z = 9.
The augmented matrix for this system is
$$\begin{pmatrix} 2 & 4 & -3 & -1 \\ 5 & 10 & -7 & -2 \\ 3 & 6 & 5 & 9 \end{pmatrix}.$$
Multiply the second row by 2, the first row by 5, and then take (−1) times the first row and
add to the second. Then multiply the first row by 1/5. This yields
$$\begin{pmatrix} 2 & 4 & -3 & -1 \\ 0 & 0 & 1 & 1 \\ 3 & 6 & 5 & 9 \end{pmatrix}.$$
Now, combining some row operations, take (−3) times the first row and add this to 2 times
the last row and replace the last row with this. This yields
$$\begin{pmatrix} 2 & 4 & -3 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 19 & 21 \end{pmatrix}.$$
Putting in the variables, the last two rows say z = 1 and 19z = 21. This is impossible so
the last system of equations determined by the above augmented matrix has no solution.
However, it has the same solution set as the first system of equations. This shows there is no
solution to the three given equations. When this happens, the system is called inconsistent.
It should not be surprising that something like this can take place. It can even happen
for one equation in one variable. Consider for example, x = x + 1. There is clearly no solution
to this.
Example 1.10.4 Give the complete solution to the system of equations, 3x − y − 5z = 9,
y − 10z = 0, and −2x + y = −6.
The augmented matrix of this system is
$$\begin{pmatrix} 3 & -1 & -5 & 9 \\ 0 & 1 & -10 & 0 \\ -2 & 1 & 0 & -6 \end{pmatrix}.$$
Replace the last row with 2 times the top row added to 3 times the bottom row. This gives
$$\begin{pmatrix} 3 & -1 & -5 & 9 \\ 0 & 1 & -10 & 0 \\ 0 & 1 & -10 & 0 \end{pmatrix}.$$
Next take −1 times the middle row and add to the bottom.
$$\begin{pmatrix} 3 & -1 & -5 & 9 \\ 0 & 1 & -10 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
Take the middle row and add to the top and then divide the top row which results by 3.
$$\begin{pmatrix} 1 & 0 & -5 & 3 \\ 0 & 1 & -10 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
This says y = 10z and x = 3 + 5z. Apparently z can equal any number. Therefore, the
solution set of this system is x = 3 + 5t, y = 10t, and z = t where t is completely arbitrary.
The system has an infinite set of solutions and this is a good description of the solutions.
This is what it is all about, finding the solutions to the system.
Definition 1.10.5 Since z = t where t is arbitrary, the variable z is called a free variable.
The phenomenon of an infinite solution set occurs in equations having only one variable
also. For example, consider the equation x = x. It doesn’t matter what x equals.
Definition 1.10.6 A system of linear equations is a list of equations,
\[
\sum_{j=1}^{n} a_{ij} x_j = f_i, \quad i = 1, 2, 3, \cdots, m
\]
where a_{ij} are numbers, f_i is a number, and it is desired to find (x_1, · · · , x_n) solving each of
the equations listed.
As illustrated above, such a system of linear equations may have a unique solution, no
solution, or infinitely many solutions. It turns out these are the only three cases which can
occur for linear systems. Furthermore, you do exactly the same things to solve any linear
system. You write the augmented matrix and do row operations until you get a simpler
system in which it is possible to see the solution. All is based on the observation that the
row operations do not change the solution set. You can have more equations than variables,
fewer equations than variables, etc. It doesn’t matter. You always set up the augmented
matrix and go to work on it. These things are all the same.
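The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the development in the text: the names `rref` and `classify` are made up here, exact arithmetic is taken from the standard `fractions` module, and the two test systems are those of Examples 1.10.3 and 1.10.4.

```python
from fractions import Fraction

def rref(aug):
    """Row reduce an augmented matrix (a list of rows) using the three row operations."""
    rows = [[Fraction(x) for x in row] for row in aug]
    m, n = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n - 1):  # the last column holds the right-hand sides
        # look for a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, m) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]  # switch two rows
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]        # scale a row
        for r in range(m):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                # replace a row by itself plus a multiple of another row
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return rows

def classify(aug):
    """Report which of the three cases occurs for the system with this augmented matrix."""
    reduced = rref(aug)
    n_vars = len(aug[0]) - 1
    # a row of the form 0 ... 0 | c with c nonzero says 0 = c, which is impossible
    if any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in reduced):
        return "no solution"
    rank = sum(any(x != 0 for x in row[:-1]) for row in reduced)
    return "unique solution" if rank == n_vars else "infinitely many solutions"

example_3 = [[2, 4, -3, -1], [5, 10, -7, -2], [3, 6, 5, 9]]
example_4 = [[3, -1, -5, 9], [0, 1, -10, 0], [-2, 1, 0, -6]]
print(classify(example_3))
print(classify(example_4))
```

The sketch always clears a whole column at once, so its intermediate matrices differ from the ones written out by hand above, but the solution set is the same because only row operations are used.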
Example 1.10.7 Give the complete solution to the system of equations, −41x + 15y = 168,
109x − 40y = −447, −3x + y = 12, and 2x + z = −1.
The augmented matrix is
\[
\begin{pmatrix} -41 & 15 & 0 & 168 \\ 109 & -40 & 0 & -447 \\ -3 & 1 & 0 & 12 \\ 2 & 0 & 1 & -1 \end{pmatrix}.
\]
To solve this multiply the top row by 109, the second row by 41, add the top row to the
second row, and multiply the top row by 1/109. Note how this process combined several
row operations. This yields
\[
\begin{pmatrix} -41 & 15 & 0 & 168 \\ 0 & -5 & 0 & -15 \\ -3 & 1 & 0 & 12 \\ 2 & 0 & 1 & -1 \end{pmatrix}.
\]
Next take 2 times the third row and replace the fourth row by this added to 3 times the
fourth row. Then take (−41) times the third row and replace the first row by this added to
3 times the first row. Then switch the third and the first rows. This yields
\[
\begin{pmatrix} 123 & -41 & 0 & -492 \\ 0 & -5 & 0 & -15 \\ 0 & 4 & 0 & 12 \\ 0 & 2 & 3 & 21 \end{pmatrix}.
\]
Take −1/2 times the third row and add to the bottom row. Then take 5 times the third
row and add to four times the second. Finally take 41 times the third row and add to 4
times the top row. This yields
\[
\begin{pmatrix} 492 & 0 & 0 & -1476 \\ 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 12 \\ 0 & 0 & 3 & 15 \end{pmatrix}
\]
It follows x = −1476/492 = −3, y = 3 and z = 5.
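A claimed solution is easy to check by substituting it back into the original equations. The following short Python sketch does this for Example 1.10.7; the variable names are illustrative only.

```python
# Substitute the claimed solution of Example 1.10.7 into each of the four equations.
x, y, z = -3, 3, 5
equations = [
    (-41 * x + 15 * y, 168),
    (109 * x - 40 * y, -447),
    (-3 * x + y, 12),
    (2 * x + z, -1),
]
# Each residual is (left side) - (right side); all zeros means every equation holds.
residuals = [lhs - rhs for lhs, rhs in equations]
print(residuals)
```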
You should practice solving systems of equations. Here are some exercises.
1.11 Exercises
1. Give the complete solution to the system of equations, 3x − y + 4z = 6, y + 8z = 0,
and −2x + y = −4.
2. Give the complete solution to the system of equations, x+3y +3z = 3, 3x+2y +z = 9,
and −4x + z = −9.
3. Consider the system −5x + 2y − z = 0 and −5x − 2y − z = 0. Both equations equal
zero and so −5x + 2y − z = −5x − 2y − z which is equivalent to y = 0. Thus x and
z can equal anything. But when x = 1, z = −4, and y = 0 are plugged in to the
equations, it doesn’t work. Why?
4. Give the complete solution to the system of equations, x + 2y + 6z = 5, 3x + 2y + 6z = 7,
−4x + 5y + 15z = −7.
5. Give the complete solution to the system of equations
x + 2y + 3z = 5, 3x + 2y + z = 7,
−4x + 5y + z = −7, x + 3z = 5.
6. Give the complete solution of the system of equations,
x + 2y + 3z = 5, 3x + 2y + 2z = 7,
−4x + 5y + 5z = −7, x = 5.
7. Give the complete solution of the system of equations
x + y + 3z = 2, 3x − y + 5z = 6,
−4x + 9y + z = −8, x + 5y + 7z = 2.
8. Determine a such that there are infinitely many solutions and then find them. Next
determine a such that there are no solutions. Finally determine which values of a
correspond to a unique solution. The system of equations for the unknown variables
x, y, z is
\[
\begin{aligned}
3za^2 - 3a + x + y + 1 &= 0 \\
3x - a - y + z\left(a^2 + 4\right) - 5 &= 0 \\
za^2 - a - 4x + 9y + 9 &= 0
\end{aligned}
\]
9. Find the solutions to the following system of equations for x, y, z, w.
y + z = 2, z + w = 0, y − 4z − 5w = 2, 2y + z − w = 4
10. Find all solutions to the following equations.
x + y + z = 2, z + w = 0,
2x + 2y + z − w = 4, x + y − 4z − 5w = 2
1.12 F^n
The notation, Cn refers to the collection of ordered lists of n complex numbers. Since every
real number is also a complex number, this simply generalizes the usual notion of Rn , the
collection of all ordered lists of n real numbers. In order to avoid worrying about whether
it is real or complex numbers which are being referred to, the symbol F will be used. If it is
not clear, always pick C. More generally, F^n refers to the ordered lists of n elements of F.
Definition 1.12.1 Define Fn ≡ {(x1 , · · · , xn ) : xj ∈ F for j = 1, · · · , n} . (x1 , · · · , xn ) =
(y1 , · · · , yn ) if and only if for all j = 1, · · · , n, xj = yj . When (x1 , · · · , xn ) ∈ Fn , it is
conventional to denote (x1 , · · · , xn ) by the single bold face letter x. The numbers xj are
called the coordinates. The set
{(0, · · · , 0, t, 0, · · · , 0) : t ∈ F}
for t in the ith slot is called the ith coordinate axis. The point 0 ≡ (0, · · · , 0) is called the
origin.
Thus (1, 2, 4i) ∈ F^3 and (2, 1, 4i) ∈ F^3 but (1, 2, 4i) ≠ (2, 1, 4i) because, even though the
same numbers are involved, they don’t match up. In particular, the first entries are not
equal.
1.13 Algebra in F^n
There are two algebraic operations done with elements of Fn . One is addition and the other
is multiplication by numbers, called scalars. In the case of Cn the scalars are complex
numbers while in the case of Rn the only allowed scalars are real numbers. Thus, the scalars
always come from F in either case.
Definition 1.13.1 If x ∈ F^n and a ∈ F, also called a scalar, then ax ∈ F^n is defined by
\[
a\mathbf{x} = a(x_1, \cdots, x_n) \equiv (ax_1, \cdots, ax_n). \tag{1.9}
\]
This is known as scalar multiplication. If x, y ∈ F^n then x + y ∈ F^n and is defined by
\[
\mathbf{x} + \mathbf{y} = (x_1, \cdots, x_n) + (y_1, \cdots, y_n) \equiv (x_1 + y_1, \cdots, x_n + y_n) \tag{1.10}
\]
With this definition, the algebraic properties satisfy the conclusions of the following
theorem.
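Theorem 1.13.2 For v, w, z ∈ F^n and α, β scalars (elements of F), the following hold.
\[ \mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}, \tag{1.11} \]
the commutative law of addition,
\[ (\mathbf{v} + \mathbf{w}) + \mathbf{z} = \mathbf{v} + (\mathbf{w} + \mathbf{z}), \tag{1.12} \]
the associative law for addition,
\[ \mathbf{v} + \mathbf{0} = \mathbf{v}, \tag{1.13} \]
the existence of an additive identity,
\[ \mathbf{v} + (-\mathbf{v}) = \mathbf{0}, \tag{1.14} \]
the existence of an additive inverse. Also
\[ \alpha(\mathbf{v} + \mathbf{w}) = \alpha\mathbf{v} + \alpha\mathbf{w}, \tag{1.15} \]
\[ (\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}, \tag{1.16} \]
\[ \alpha(\beta\mathbf{v}) = \alpha\beta(\mathbf{v}), \tag{1.17} \]
\[ 1\mathbf{v} = \mathbf{v}. \tag{1.18} \]
In the above 0 = (0, · · · , 0).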
You should verify that these properties all hold. As usual subtraction is defined as
x − y ≡ x+ (−y) . The conclusions of the above theorem are called the vector space axioms.
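The two operations are simple enough to try out on a computer. Here is a minimal Python sketch, assuming vectors are stored as tuples and scalars are Python complex numbers; the function names `add` and `scale` are invented for this illustration. Checking a few axioms on sample vectors is of course no substitute for verifying them in general.

```python
def add(x, y):
    """Coordinate-wise sum of two vectors in F^n (definition 1.10)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of coordinates")
    return tuple(xj + yj for xj, yj in zip(x, y))

def scale(a, x):
    """Scalar multiple a x (definition 1.9)."""
    return tuple(a * xj for xj in x)

v = (1, 2, 4j)
w = (2, 1, 4j)
alpha, beta = 2 - 1j, 3

commutative = add(v, w) == add(w, v)                                              # 1.11
distributive = scale(alpha, add(v, w)) == add(scale(alpha, v), scale(alpha, w))   # 1.15
scalars_add = scale(alpha + beta, v) == add(scale(alpha, v), scale(beta, v))      # 1.16
print(commutative, distributive, scalars_add)
```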
1.14 Exercises
1. Verify all the properties 1.11-1.18.
2. Compute 5 (1, 2 + 3i, 3, −2) + 6 (2 − i, 1, −2, 7) .
3. Draw a picture of the points in R2 which are determined by the following ordered
pairs.
(a) (1, 2)
(b) (−2, −2)
(c) (−2, 3)
(d) (2, −5)
4. Does it make sense to write (1, 2) + (2, 3, 1)? Explain.
5. Draw a picture of the points in R3 which are determined by the following ordered
triples. If you have trouble drawing this, describe it in words.
(a) (1, 2, 0)
(b) (−2, −2, 1)
(c) (−2, 3, −2)
1.15 The Inner Product In F^n
When F = R or C, there is something called an inner product. In case of R it is also called
the dot product. This is also often referred to as the scalar product.
Definition 1.15.1 Let a, b ∈ F^n. Define a · b as
\[
\mathbf{a} \cdot \mathbf{b} \equiv \sum_{k=1}^{n} a_k \overline{b_k}.
\]
This will also be denoted as (a, b). Often it is also denoted as ⟨a, b⟩. The notation with the
dot is more usually used when the field is R.
With this definition, there are several important properties satisfied by the inner product.
In the statement of these properties, α and β will denote scalars and a, b, c will denote
vectors or in other words, points in Fn .
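Proposition 1.15.2 The inner product satisfies the following properties.
\[ \mathbf{a} \cdot \mathbf{b} = \overline{\mathbf{b} \cdot \mathbf{a}} \tag{1.19} \]
\[ \mathbf{a} \cdot \mathbf{a} \geq 0 \text{ and equals zero if and only if } \mathbf{a} = \mathbf{0} \tag{1.20} \]
\[ (\alpha\mathbf{a} + \beta\mathbf{b}) \cdot \mathbf{c} = \alpha(\mathbf{a} \cdot \mathbf{c}) + \beta(\mathbf{b} \cdot \mathbf{c}) \tag{1.21} \]
\[ \mathbf{c} \cdot (\alpha\mathbf{a} + \beta\mathbf{b}) = \overline{\alpha}(\mathbf{c} \cdot \mathbf{a}) + \overline{\beta}(\mathbf{c} \cdot \mathbf{b}) \tag{1.22} \]
\[ |\mathbf{a}|^2 = \mathbf{a} \cdot \mathbf{a} \tag{1.23} \]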
You should verify these properties. Also be sure you understand that 1.22 follows from
the first three and is therefore redundant. It is listed here for the sake of convenience.
Example 1.15.3 Find (1, 2, 0, −1) · (0, i, 2, 3) .
This equals 0 + 2 (−i) + 0 + (−3) = −3 − 2i.
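The computation in Example 1.15.3 can be reproduced with a short Python sketch. It assumes the inner product conjugates the entries of the second vector, which is what the term 2(−i) in the example indicates; the function name `inner` is chosen here for illustration.

```python
def inner(a, b):
    """Inner product on F^n, conjugating the entries of the second vector."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of coordinates")
    return sum(ak * complex(bk).conjugate() for ak, bk in zip(a, b))

a = (1, 2, 0, -1)
b = (0, 1j, 2, 3)
print(inner(a, b))   # the value found in Example 1.15.3
print(inner(b, a))   # the complex conjugate of the previous value, as in 1.19
```

For real vectors the conjugate does nothing and this reduces to the usual dot product.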
The Cauchy Schwarz inequality takes the following form in terms of the inner product.
I will prove it using only the above axioms for the inner product.
Theorem 1.15.4 The inner product satisfies the inequality
|a · b| ≤ |a| |b| .
(1.24)
Furthermore equality is obtained if and only if one of a or b is a scalar multiple of the other.
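Proof: First define θ ∈ C such that
\[
\overline{\theta}\,(\mathbf{a} \cdot \mathbf{b}) = |\mathbf{a} \cdot \mathbf{b}|, \quad |\theta| = 1,
\]
and define a function of t ∈ R
\[
f(t) = (\mathbf{a} + t\theta\mathbf{b}) \cdot (\mathbf{a} + t\theta\mathbf{b}).
\]
Then by 1.20, f (t) ≥ 0 for all t ∈ R. Also from 1.21, 1.22, 1.19, and 1.23
\[
\begin{aligned}
f(t) &= \mathbf{a} \cdot (\mathbf{a} + t\theta\mathbf{b}) + t\theta\,\mathbf{b} \cdot (\mathbf{a} + t\theta\mathbf{b}) \\
&= \mathbf{a} \cdot \mathbf{a} + t\overline{\theta}\,(\mathbf{a} \cdot \mathbf{b}) + t\theta\,(\mathbf{b} \cdot \mathbf{a}) + t^2 |\theta|^2\, \mathbf{b} \cdot \mathbf{b} \\
&= |\mathbf{a}|^2 + 2t \operatorname{Re}\left(\overline{\theta}\,(\mathbf{a} \cdot \mathbf{b})\right) + |\mathbf{b}|^2 t^2 = |\mathbf{a}|^2 + 2t\,|\mathbf{a} \cdot \mathbf{b}| + |\mathbf{b}|^2 t^2
\end{aligned}
\]
Now if |b|^2 = 0 it must be the case that a · b = 0 because otherwise, you could pick large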
negative values of t and violate f (t) ≥ 0. Therefore, in this case, the Cauchy Schwarz
inequality holds. In the case that |b| ≠ 0, y = f (t) is a polynomial which opens up and
therefore, if it is always nonnegative, its graph is like that illustrated in the following picture
[Figure: graphs of y = f (t) lying on or above the t axis]
Then the quadratic formula requires that the discriminant satisfy
\[
4\,|\mathbf{a} \cdot \mathbf{b}|^2 - 4\,|\mathbf{a}|^2 |\mathbf{b}|^2 \leq 0
\]
since otherwise the function, f (t) would have two real zeros and would necessarily have a
graph which dips below the t axis. This proves 1.24.
It is clear from the axioms of the inner product that equality holds in 1.24 whenever one
of the vectors is a scalar multiple of the other. It only remains to verify this is the only way
equality can occur. If either vector equals zero, then equality is obtained in 1.24 so it can be
assumed both vectors are non zero. Then if equality is achieved, it follows f (t) has exactly
one real zero because the discriminant vanishes. Therefore, for some value of t, a + tθb = 0
showing that a is a multiple of b.
You should note that the entire argument was based only on the properties of the inner
product listed in 1.19 - 1.23. This means that whenever something satisfies these properties,
the Cauchy Schwarz inequality holds. There are many other instances of these properties
besides vectors in F^n. Also note that 1.24 holds if 1.20 is simplified to a · a ≥ 0.
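A numerical spot check can make the inequality and its equality case concrete. The sketch below is only an illustration, assuming the inner product conjugates the second vector; the helper names `inner` and `norm` are not from the text, and floating point comparisons are done with `math.isclose`.

```python
import math

def inner(a, b):
    """Inner product on F^n, conjugating the entries of the second vector."""
    return sum(ak * complex(bk).conjugate() for ak, bk in zip(a, b))

def norm(a):
    """|a| = sqrt(a . a); the inner product of a with itself is real and nonnegative."""
    return math.sqrt(inner(a, a).real)

a = (1, 2, 0, -1)
b = (0, 1j, 2, 3)
lhs = abs(inner(a, b))
rhs = norm(a) * norm(b)
print(lhs <= rhs)            # strict inequality here: b is not a multiple of a

# Equality case: c is a scalar multiple of a.
c = tuple((2 - 1j) * ak for ak in a)
print(math.isclose(abs(inner(a, c)), norm(a) * norm(c)))
```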
The Cauchy Schwarz inequality allows a proof of the triangle inequality for distances in
Fn in much the same way as the triangle inequality for the absolute value.
Theorem 1.15.5 (Triangle inequality) For a, b ∈ Fn
|a + b| ≤ |a| + |b|
(1.25)
and equality holds if and only if one of the vectors is a nonnegative scalar multiple of the
other. Also
||a| − |b|| ≤ |a − b|
(1.26)
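Proof: By properties of the inner product and the Cauchy Schwarz inequality,
\[
\begin{aligned}
|\mathbf{a} + \mathbf{b}|^2 &= (\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} + \mathbf{b}) = (\mathbf{a} \cdot \mathbf{a}) + (\mathbf{a} \cdot \mathbf{b}) + (\mathbf{b} \cdot \mathbf{a}) + (\mathbf{b} \cdot \mathbf{b}) \\
&= |\mathbf{a}|^2 + 2 \operatorname{Re}(\mathbf{a} \cdot \mathbf{b}) + |\mathbf{b}|^2 \leq |\mathbf{a}|^2 + 2\,|\mathbf{a} \cdot \mathbf{b}| + |\mathbf{b}|^2 \\
&\leq |\mathbf{a}|^2 + 2\,|\mathbf{a}|\,|\mathbf{b}| + |\mathbf{b}|^2 = (|\mathbf{a}| + |\mathbf{b}|)^2.
\end{aligned}
\]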
Taking square roots of both sides you obtain 1.25.
It remains to consider when equality occurs. If either vector equals zero, then that
vector equals zero times the other vector and the claim about when equality occurs is
verified. Therefore, it can be assumed both vectors are nonzero. To get equality in the
second inequality above, Theorem 1.15.4 implies one of the vectors must be a multiple of
the other. Say b = αa. Also, to get equality in the first inequality, (a · b) must be a
nonnegative real number. Thus
\[
0 \leq (\mathbf{a} \cdot \mathbf{b}) = (\mathbf{a} \cdot \alpha\mathbf{a}) = \overline{\alpha}\,|\mathbf{a}|^2.
\]
Therefore, α must be a real number which is nonnegative.
To get the other form of the triangle inequality,
a=a−b+b
so
|a| = |a − b + b| ≤ |a − b| + |b| .
Therefore,
\[ |\mathbf{a}| - |\mathbf{b}| \leq |\mathbf{a} - \mathbf{b}| \tag{1.27} \]
Similarly,
\[ |\mathbf{b}| - |\mathbf{a}| \leq |\mathbf{b} - \mathbf{a}| = |\mathbf{a} - \mathbf{b}|. \tag{1.28} \]
It follows from 1.27 and 1.28 that 1.26 holds. This is because ||a| − |b|| equals the left side
of either 1.27 or 1.28 and either way, ||a| − |b|| ≤ |a − b|.
1.16 What Is Linear Algebra?
The above preliminary considerations form the necessary scaffolding upon which linear algebra is built. Linear algebra is the study of a certain algebraic structure called a vector
space described in a special case in Theorem 1.13.2 and in more generality below along with
special functions known as linear transformations. These linear transformations preserve
certain algebraic properties.
A good argument could be made that linear algebra is the most useful subject in all
of mathematics and that it exceeds even courses like calculus in its significance. It is used
extensively in applied mathematics and engineering. Continuum mechanics, for example,
makes use of topics from linear algebra in defining things like the strain and in determining
appropriate constitutive laws. It is fundamental in the study of statistics. For example,
principal component analysis is really based on the singular value decomposition discussed
in this book. It is also fundamental in pure mathematics areas like number theory, functional
analysis, geometric measure theory, and differential geometry. Even calculus cannot be
correctly understood without it. For example, the derivative of a function of many variables
is an example of a linear transformation, and this is the way it must be understood as soon
as you consider functions of more than one variable.
1.17 Exercises
1. Show that
\[ (\mathbf{a} \cdot \mathbf{b}) = \frac{1}{4}\left[\, |\mathbf{a} + \mathbf{b}|^2 - |\mathbf{a} - \mathbf{b}|^2 \right]. \]
2. Prove from the axioms of the inner product the parallelogram identity,
\[ |\mathbf{a} + \mathbf{b}|^2 + |\mathbf{a} - \mathbf{b}|^2 = 2\,|\mathbf{a}|^2 + 2\,|\mathbf{b}|^2. \]
3. For a, b ∈ R^n, define a · b ≡ \sum_{k=1}^{n} β_k a_k b_k where β_k > 0 for each k. Show this satisfies
the axioms of the inner product. What does the Cauchy Schwarz inequality say in
this case?
4. In Problem 3 above, suppose you only know β k ≥ 0. Does the Cauchy Schwarz inequality still hold? If so, prove it.
5. Let f, g be continuous functions and define
\[ f \cdot g \equiv \int_0^1 f(t)\,\overline{g(t)}\,dt. \]
Show this satisfies the axioms of an inner product if you think of continuous functions
in the place of a vector in F^n. What does the Cauchy Schwarz inequality say in this
case?
6. Show that if f is a real valued continuous function,
\[ \left( \int_a^b f(t)\,dt \right)^2 \leq (b - a) \int_a^b f(t)^2\,dt. \]
Chapter 2
Linear Transformations
2.1 Matrices
You have now solved systems of equations by writing them in terms of an augmented matrix
and then doing row operations on this augmented matrix. It turns out that such rectangular
arrays of numbers are important from many other different points of view. Numbers are
also called scalars. In general, scalars are just elements of some field. However, in the first
part of this book, the field will typically be either the real numbers or the complex numbers.
A matrix is a rectangular array of numbers. Several of them are referred to as matrices.
For example, here is a matrix.
\[
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & -9 & 1 & 2 \end{pmatrix}
\]
This matrix is a 3 × 4 matrix because there are three rows and four columns. The first
row is (1 2 3 4), the second row is (5 2 8 7) and so forth. The first column is
\[
\begin{pmatrix} 1 \\ 5 \\ 6 \end{pmatrix}.
\]
The convention in dealing with matrices is to always list the rows first and then the columns.
Also, you can remember the columns are like columns in a Greek temple. They stand
upright while the rows just lie there like rows made by a tractor in a plowed field. Elements of
the matrix are identified according to position in the matrix. For example, 8 is in position
2, 3 because it is in the second row and the third column. You might remember that you
always list the rows before the columns by using the phrase Rowman Catholic. The symbol,
(aij ) refers to a matrix in which the i denotes the row and the j denotes the column. Using
this notation on the above matrix, a23 = 8, a32 = −9, a12 = 2, etc.
There are various operations which are done on matrices. They can sometimes be added,
multiplied by a scalar and sometimes multiplied. To illustrate scalar multiplication, consider
the following example.
\[
3 \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & -9 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 9 & 12 \\ 15 & 6 & 24 & 21 \\ 18 & -27 & 3 & 6 \end{pmatrix}.
\]
The new matrix is obtained by multiplying every entry of the original matrix by the given
scalar. If A is an m × n matrix −A is defined to equal (−1) A.
Two matrices which are the same size can be added. When this is done, the result is the
matrix which is obtained by adding corresponding entries. Thus
\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 2 \end{pmatrix} + \begin{pmatrix} -1 & 4 \\ 2 & 8 \\ 6 & -4 \end{pmatrix} = \begin{pmatrix} 0 & 6 \\ 5 & 12 \\ 11 & -2 \end{pmatrix}.
\]
Two matrices are equal exactly when they are the same size and the corresponding entries
are identical. Thus
\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \neq \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
\]
because they are different sizes. As noted above, you write (cij ) for the matrix C whose
ij th entry is cij . In doing arithmetic with matrices you must define what happens in terms
of the cij sometimes called the entries of the matrix or the components of the matrix.
The above discussion stated for general matrices is given in the following definition.
Definition 2.1.1 Let A = (aij ) and B = (bij ) be two m × n matrices. Then A + B = C
where
C = (cij )
for cij = aij + bij . Also if x is a scalar,
xA = (cij )
where cij = xaij . The number Aij will typically refer to the ij th entry of the matrix A. The
zero matrix, denoted by 0 will be the matrix consisting of all zeros.
Do not be upset by the use of the subscripts, ij. The expression cij = aij + bij is just
saying that you add corresponding entries to get the result of summing two matrices as
discussed above.
Note that there are 2 × 3 zero matrices, 3 × 4 zero matrices, etc. In fact for every size
there is a zero matrix.
With this definition, the following properties are all obvious but you should verify all of
these properties are valid for A, B, and C, m × n matrices and 0 an m × n zero matrix,
\[ A + B = B + A, \tag{2.1} \]
the commutative law of addition,
\[ (A + B) + C = A + (B + C), \tag{2.2} \]
the associative law for addition,
\[ A + 0 = A, \tag{2.3} \]
the existence of an additive identity,
\[ A + (-A) = 0, \tag{2.4} \]
the existence of an additive inverse. Also, for α, β scalars, the following also hold.
\[ \alpha(A + B) = \alpha A + \alpha B, \tag{2.5} \]
\[ (\alpha + \beta)A = \alpha A + \beta A, \tag{2.6} \]
\[ \alpha(\beta A) = \alpha\beta(A), \tag{2.7} \]
\[ 1A = A. \tag{2.8} \]
The above properties, 2.1 - 2.8 are known as the vector space axioms and the fact that
the m × n matrices satisfy these axioms is what is meant by saying this set of matrices with
addition and scalar multiplication as defined above forms a vector space.
Definition 2.1.2 Matrices which are n × 1 or 1 × n are especially called vectors and are
often denoted by a bold letter. Thus
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\]
is an n × 1 matrix also called a column vector while a 1 × n matrix of the form (x1 · · · xn )
is referred to as a row vector.
All the above is fine, but the real reason for considering matrices is that they can be
multiplied. This is where things quit being banal.
First consider the problem of multiplying an m × n matrix by an n × 1 column vector.
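Consider the following example
\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 7 \\ 8 \\ 9 \end{pmatrix} = ?
\]
It equals
\[
7 \begin{pmatrix} 1 \\ 4 \end{pmatrix} + 8 \begin{pmatrix} 2 \\ 5 \end{pmatrix} + 9 \begin{pmatrix} 3 \\ 6 \end{pmatrix}
\]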
Thus it is what is called a linear combination of the columns. These will be discussed
more later. Motivated by this example, here is the definition of how to multiply an m × n
matrix by an n × 1 matrix (vector).
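Definition 2.1.3 Let A = (A_{ij}) be an m × n matrix and let v be an n × 1 matrix,
\[
\mathbf{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}, \quad A = (\mathbf{a}_1, \cdots, \mathbf{a}_n)
\]
where a_i is an m × 1 vector. Then Av, written as
\[
\begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix},
\]
is the m × 1 column vector which equals the following linear combination of the columns.
\[
v_1 \mathbf{a}_1 + v_2 \mathbf{a}_2 + \cdots + v_n \mathbf{a}_n \equiv \sum_{j=1}^{n} v_j \mathbf{a}_j \tag{2.9}
\]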
If the j th column of A is
\[
\begin{pmatrix} A_{1j} \\ A_{2j} \\ \vdots \\ A_{mj} \end{pmatrix}
\]
then 2.9 takes the form
\[
v_1 \begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix} + v_2 \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix} + \cdots + v_n \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix}
\]
Thus the i th entry of Av is \sum_{j=1}^{n} A_{ij} v_j. Note that multiplication by an m × n matrix takes
an n × 1 matrix, and produces an m × 1 matrix (vector).
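The two descriptions of Av, as a linear combination of the columns and entry by entry, can be compared directly. The following Python sketch assumes a matrix is stored as a list of rows; the function names are hypothetical and chosen only for this illustration.

```python
def matvec_by_columns(A, v):
    """Av as the linear combination v_1 a_1 + ... + v_n a_n of the columns of A (2.9)."""
    m, n = len(A), len(A[0])
    if len(v) != n:
        raise ValueError("the vector must have as many entries as A has columns")
    result = [0] * m
    for j in range(n):
        column_j = [row[j] for row in A]
        result = [r + v[j] * c for r, c in zip(result, column_j)]
    return result

def matvec_by_entries(A, v):
    """Av computed entry by entry: the i th entry is the sum over j of A_ij v_j."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[1, 2, 3], [4, 5, 6]]
v = [7, 8, 9]
print(matvec_by_columns(A, v))
print(matvec_by_entries(A, v))
```

Both calls produce the same 2 × 1 vector, here written as a list with two entries.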
Here is another example.
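Example 2.1.4 Compute
\[
\begin{pmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}.
\]
First of all, this is of the form (3 × 4) (4 × 1) and so the result should be a (3 × 1).
Note how the inside numbers cancel. To get the entry in the second row and first and only
column, compute
\[
\begin{aligned}
\sum_{k=1}^{4} a_{2k} v_k &= a_{21} v_1 + a_{22} v_2 + a_{23} v_3 + a_{24} v_4 \\
&= 0 \times 1 + 2 \times 2 + 1 \times 0 + (-2) \times 1 = 2.
\end{aligned}
\]
You should do the rest of the problem and verify
\[
\begin{pmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 8 \\ 2 \\ 5 \end{pmatrix}.
\]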
With this done, the next task is to multiply an m × n matrix times an n × p matrix.
Before doing so, the following may be helpful.
\[
(m \times \overbrace{n)\,(n}^{\text{these must match}} \times p) = m \times p
\]
If the two middle numbers don’t match, you can’t multiply the matrices!
Definition 2.1.5 Let A be an m × n matrix and let B be an n × p matrix. Then B is of
the form
B = (b1 , · · · , bp )
where bk is an n × 1 matrix. Then an m × p matrix AB is defined as follows:
AB ≡ (Ab1 , · · · , Abp )
(2.10)
where Abk is an m × 1 matrix. Hence AB as just defined is an m × p matrix. For example,
Example 2.1.6 Multiply the following.
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix}
\]
The first thing you need to check before doing anything else is whether it is possible to
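do the multiplication. The first matrix is a 2 × 3 and the second matrix is a 3 × 3. Therefore,
it is possible to multiply these matrices. According to the above discussion it should be a
2 × 3 matrix of the form
\[
\left( \overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}}^{\text{First column}},\ \overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}}^{\text{Second column}},\ \overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}}^{\text{Third column}} \right)
\]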
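You know how to multiply a matrix times a vector and so you do so to obtain each of the
three columns. Thus
\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 9 & 3 \\ -2 & 7 & 3 \end{pmatrix}.
\]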
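Definition 2.1.5 builds the product one column at a time, and this can be mirrored in code. The sketch below is an illustration only, assuming matrices are stored as lists of rows; `matvec` and `matmul_by_columns` are names made up for this purpose. It reproduces the product of Example 2.1.6.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul_by_columns(A, B):
    """AB = (A b_1, ..., A b_p), where b_1, ..., b_p are the columns of B."""
    n, p = len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("the inside dimensions must match")
    # each column of the product is A times the corresponding column of B
    columns = [matvec(A, [B[k][j] for k in range(n)]) for j in range(p)]
    # reassemble the columns into a list of rows
    return [[columns[j][i] for j in range(p)] for i in range(len(A))]

A = [[1, 2, 1], [0, 2, 1]]
B = [[1, 2, 0], [0, 3, 1], [-2, 1, 1]]
print(matmul_by_columns(A, B))
```

Calling `matmul_by_columns(B, A)` raises `ValueError` in this sketch, because a 3 × 3 matrix cannot be multiplied on the right by a 2 × 3 matrix.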
Here is another example.
Example 2.1.7 Multiply the following.
\[
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}
\]
First check if it is possible. This is of the form (3 × 3) (2 × 3) . The inside numbers do not
match and so you can’t do this multiplication. This means that anything you write will be
absolute nonsense because it is impossible to multiply these matrices in this order. Aren’t
they the same two matrices considered in the previous example? Yes they are. It is just
that here they are in a different order. This shows something you must always remember
about matrix multiplication.
Order Matters!
Matrix multiplication is not commutative. This is very different than multiplication of
numbers!
2.1.1 The ij th Entry Of A Product
It is important to describe matrix multiplication in terms of entries of the matrices. What
is the ij th entry of AB? It would be the ith entry of the j th column of AB. Thus it would
be the i th entry of Ab_j. Now
\[
\mathbf{b}_j = \begin{pmatrix} B_{1j} \\ \vdots \\ B_{nj} \end{pmatrix}
\]
and from the above definition, the i th entry is
\[
\sum_{k=1}^{n} A_{ik} B_{kj}. \tag{2.11}
\]
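In terms of pictures of the matrix, you are doing
\[
\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1p} \\ B_{21} & B_{22} & \cdots & B_{2p} \\ \vdots & \vdots & & \vdots \\ B_{n1} & B_{n2} & \cdots & B_{np} \end{pmatrix}
\]
Then as explained above, the j th column is of the form
\[
\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} \begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{nj} \end{pmatrix}
\]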
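which is an m × 1 matrix or column vector which equals
\[
\begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix} B_{1j} + \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix} B_{2j} + \cdots + \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix} B_{nj}.
\]
The i th entry of this m × 1 matrix is
\[
A_{i1} B_{1j} + A_{i2} B_{2j} + \cdots + A_{in} B_{nj} = \sum_{k=1}^{n} A_{ik} B_{kj}.
\]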
This shows the following definition for matrix multiplication in terms of the ij th entries of
the product harmonizes with Definition 2.1.3.
This motivates the definition for matrix multiplication which identifies the ij th entries
of the product.
Definition 2.1.8 Let A = (Aij ) be an m × n matrix and let B = (Bij ) be an n × p matrix.
Then AB is an m × p matrix and
\[
(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}. \tag{2.12}
\]
Two matrices, A and B are said to be conformable in a particular order if they can be
multiplied in that order. Thus if A is an r × s matrix and B is a s × p then A and B are
conformable in the order AB. The above formula for (AB)_{ij} says that it equals the i th row
of A times the j th column of B.
Example 2.1.9 Multiply if possible
\[
\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \end{pmatrix}.
\]
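First check to see if this is possible. It is of the form (3 × 2) (2 × 3) and since the inside
numbers match, it must be possible to do this and the result should be a 3 × 3 matrix. The
answer is of the form
\[
\left( \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 \\ 7 \end{pmatrix},\ \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 3 \\ 6 \end{pmatrix},\ \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right)
\]
where the commas separate the columns in the resulting product. Thus the above product
equals
\[
\begin{pmatrix} 16 & 15 & 5 \\ 13 & 15 & 5 \\ 46 & 42 & 14 \end{pmatrix},
\]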
a 3 × 3 matrix as desired. In terms of the ij th entries and the above definition, the entry in
the third row and second column of the product should equal
\[
\sum_{k} a_{3k} b_{k2} = a_{31} b_{12} + a_{32} b_{22} = 2 \times 3 + 6 \times 6 = 42.
\]
You should try a few more such examples to verify the above definition in terms of the ij th
entries works for other entries.
Example 2.1.10 Multiply if possible
\[
\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{pmatrix}.
\]
This is not possible because it is of the form (3 × 2) (3 × 3) and the middle numbers
don’t match.
Example 2.1.11 Multiply if possible
\[
\begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}.
\]
This is possible because in this case it is of the form (3 × 3) (3 × 2) and the middle
numbers do match. When the multiplication is done it equals
\[
\begin{pmatrix} 13 & 13 \\ 29 & 32 \\ 0 & 0 \end{pmatrix}.
\]
Check this and be sure you come up with the same answer.
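One way to check answers like these is a direct implementation of formula 2.12. The Python sketch below is an illustration under the assumption that matrices are stored as lists of rows; the names `shape` and `matmul` are chosen here and do not come from the text. It refuses to multiply matrices which are not conformable.

```python
def shape(M):
    """Return (number of rows, number of columns) of a matrix stored as a list of rows."""
    return len(M), len(M[0])

def matmul(A, B):
    """(AB)_ij = sum over k of A_ik B_kj, defined only when the inside numbers match."""
    m, n = shape(A)
    n2, p = shape(B)
    if n != n2:
        raise ValueError(f"cannot multiply ({m} x {n}) by ({n2} x {p})")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

M = [[1, 2], [3, 1], [2, 6]]
N = [[2, 3, 1], [7, 6, 2]]
P = [[2, 3, 1], [7, 6, 2], [0, 0, 0]]

product_9 = matmul(M, N)     # Example 2.1.9, a 3 x 3 matrix
product_11 = matmul(P, M)    # Example 2.1.11, a 3 x 2 matrix
try:
    matmul(M, P)             # Example 2.1.10, (3 x 2)(3 x 3)
    example_10_possible = True
except ValueError:
    example_10_possible = False
print(product_9, product_11, example_10_possible)
```

Since Python counts from zero, the entry in the third row and second column of the first product is `product_9[2][1]`.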
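Example 2.1.12 Multiply if possible
\[
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 0 \end{pmatrix}.
\]
In this case you are trying to do (3 × 1) (1 × 4). The inside numbers match so you can
do it. Verify
\[
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \end{pmatrix}
\]
2.1.2 Digraphs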
Consider the following graph illustrated in the picture.
[Figure: directed graph on the three locations 1, 2, and 3]
There are three locations in this graph, labelled 1, 2, and 3. The directed lines represent
a way of going from one location to another. Thus there is one way to go from location 1
to location 1. There is one way to go from location 1 to location 3. It is not possible to go
from location 2 to location 3 although it is possible to go from location 3 to location 2. Let’s
refer to moving along one of these directed lines as a step. The following 3 × 3 matrix is
a numerical way of writing the above graph. This is sometimes called a digraph, short for
directed graph.
\[
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix}
\]
Thus aij , the entry in the ith row and j th column represents the number of ways to go from
location i to location j in one step.
Problem: Find the number of ways to go from i to j using exactly k steps.
Denote the answer to the above problem by a^k_{ij}. We don’t know what it is right now
unless k = 1 when it equals a_{ij} described above. However, if we did know what it was, we
could find a^{k+1}_{ij} as follows.
\[
a^{k+1}_{ij} = \sum_{r} a^{k}_{ir} a_{rj}
\]
This is because if you go from i to j in k + 1 steps, you first go from i to r in k steps and
then for each of these ways there are a_{rj} ways to go from there to j. Thus a^k_{ir} a_{rj} gives
the number of ways to go from i to j in k + 1 steps such that the k th step leaves you at
location r. Adding these gives the above sum. Now you recognize this as the ij th entry of
the product of two matrices. Thus
\[
a^{2}_{ij} = \sum_{r} a_{ir} a_{rj}, \quad a^{3}_{ij} = \sum_{r} a^{2}_{ir} a_{rj}
\]
and so forth. From the above definition of matrix multiplication, this shows that if A is the
matrix associated with the directed graph as above, then a^k_{ij} is just the ij th entry of A^k
where A^k is just what you would think it should be, A multiplied by itself k times.
Thus in the above example, to find the number of ways of going from 1 to 3 in two steps
you would take that matrix and multiply it by itself and then take the entry in the first row
and third column. Thus
\[
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix}^{2} = \begin{pmatrix} 3 & 2 & 1 \\ 1 & 1 & 1 \\ 2 & 1 & 1 \end{pmatrix}
\]
and you see there is exactly one way to go from 1 to 3 in two steps. You can easily see this
is true from looking at the graph also. Note there are three ways to go from 1 to 1 in 2
steps. Can you find them from the graph? What would you do if you wanted to consider 5
steps?
\[
\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix}^{5} = \begin{pmatrix} 28 & 19 & 13 \\ 13 & 9 & 6 \\ 19 & 13 & 9 \end{pmatrix}
\]
There are 19 ways to go from 1 to 2 in five steps. Do you think you could list them all by
looking at the graph? I don’t think you could do it without wasting a lot of time.
Of course there is nothing sacred about having only three locations. Everything works
just as well with any number of locations. In general if you have n locations, you would
need to use an n × n matrix.
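The claim that the entries of A^k count the ways of going from i to j in exactly k steps can be tested against a brute force count. The following Python sketch is an illustration only; the function names are hypothetical, and locations are numbered from 0 in the code, so location 1 of the text is index 0.

```python
from itertools import product

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, k):
    """A multiplied by itself k times, for k >= 1."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

def count_walks(A, i, j, k):
    """Count the ways to go from i to j in exactly k steps by listing every route."""
    n = len(A)
    total = 0
    for middle in product(range(n), repeat=k - 1):
        path = (i, *middle, j)
        ways = 1
        for s, t in zip(path, path[1:]):
            ways *= A[s][t]   # zero if there is no directed line from s to t
        total += ways
    return total

A = [[1, 1, 1], [1, 0, 0], [1, 1, 0]]
print(matpow(A, 2))
print(matpow(A, 5)[0][1], count_walks(A, 0, 1, 5))
```

The brute force count examines every possible sequence of intermediate locations, so it becomes slow quickly as k grows, while the matrix power needs only k − 1 matrix multiplications.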
Example 2.1.13 Consider the following directed graph.
[Figure: directed graph on the four locations 1, 2, 3, and 4]
Write the matrix which is associated with this directed graph and find the number of ways
to go from 2 to 4 in three steps.
Here you need to use a 4×4 matrix. The one you need is
\[
\begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}
\]
Then to find the answer, you just need to multiply this matrix by itself three times and look
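at the entry in the second row and fourth column.
\[
\begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}^{3} = \begin{pmatrix} 1 & 3 & 2 & 1 \\ 2 & 1 & 0 & 1 \\ 3 & 3 & 1 & 2 \\ 1 & 2 & 1 & 1 \end{pmatrix}
\]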
There is exactly one way to go from 2 to 4 in three steps.
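How many ways would there be of going from 2 to 4 in five steps?
\[
\begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}^{5} = \begin{pmatrix} 5 & 9 & 5 & 4 \\ 5 & 4 & 1 & 3 \\ 9 & 10 & 4 & 6 \\ 4 & 6 & 3 & 3 \end{pmatrix}
\]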
There are three ways. Note there are 10 ways to go from 3 to 2 in five steps.
This is an interesting application of the concept of the ij th entry of the product matrices.
2.1.3 Properties Of Matrix Multiplication
As pointed out above, sometimes it is possible to multiply matrices in one order but not
in the other order. What if it makes sense to multiply them in either order? Will they be
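equal then?
Example 2.1.14 Compare
\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.
\]
The first product is
\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix},
\]
the second product is
\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix},
\]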
and you see these are not equal. Therefore, you cannot conclude that AB = BA for matrix
multiplication. However, there are some properties which do hold.
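The comparison in Example 2.1.14 takes only a few lines to reproduce. This is a small Python sketch with matrices stored as lists of rows; the helper `matmul` is written here just for the illustration.

```python
def matmul(A, B):
    """(AB)_ij = sum over k of A_ik B_kj for conformable matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
AB = matmul(A, B)   # multiplying by B on the right switches the columns of A
BA = matmul(B, A)   # multiplying by B on the left switches the rows of A
print(AB, BA, AB == BA)
```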
Proposition 2.1.15 If all multiplications and additions make sense, the following hold for
matrices, A, B, C and a, b scalars.
A (aB + bC) = a (AB) + b (AC)
(2.13)
(B + C) A = BA + CA
(2.14)
A (BC) = (AB) C
(2.15)
Proof: Using the above definition of matrix multiplication,
\[
\begin{aligned}
(A(aB + bC))_{ij} &= \sum_{k} A_{ik} (aB + bC)_{kj} \\
&= \sum_{k} A_{ik} (aB_{kj} + bC_{kj}) \\
&= a \sum_{k} A_{ik} B_{kj} + b \sum_{k} A_{ik} C_{kj} \\
&= a (AB)_{ij} + b (AC)_{ij} \\
&= (a(AB) + b(AC))_{ij}
\end{aligned}
\]
showing that A (aB + bC) = a (AB) + b (AC) as claimed. Formula 2.14 is entirely similar.
Consider 2.15, the associative law of multiplication. Before reading this, review the
definition of matrix multiplication in terms of entries of the matrices.
\[
\begin{aligned}
(A(BC))_{ij} &= \sum_{k} A_{ik} (BC)_{kj} \\
&= \sum_{k} A_{ik} \sum_{l} B_{kl} C_{lj} \\
&= \sum_{l} (AB)_{il} C_{lj} \\
&= ((AB)C)_{ij}.
\end{aligned}
\]
Another important operation on matrices is that of taking the transpose. The following
example shows what is meant by this operation, denoted by placing a T as an exponent on
the matrix.
\[
\begin{pmatrix} 1 & 1 + 2i \\ 3 & 1 \\ 2 & 6 \end{pmatrix}^{T} = \begin{pmatrix} 1 & 3 & 2 \\ 1 + 2i & 1 & 6 \end{pmatrix}
\]
What happened? The first column became the first row and the second column became
the second row. Thus the 3 × 2 matrix became a 2 × 3 matrix. The number 3 was in the
second row and the first column and it ended up in the first row and second column. This
motivates the following definition of the transpose of a matrix.
Definition 2.1.16 Let A be an m × n matrix. Then AT denotes the n × m matrix which
is defined as follows.
\[
\left( A^{T} \right)_{ij} = A_{ji}
\]
The transpose of a matrix has the following important property.
Lemma 2.1.17 Let A be an m × n matrix and let B be an n × p matrix. Then
$$(AB)^T = B^T A^T \tag{2.16}$$
and if α and β are scalars and A and B are matrices of the same size,
$$(\alpha A + \beta B)^T = \alpha A^T + \beta B^T \tag{2.17}$$
Proof: From the definition,
$$\begin{aligned}
((AB)^T)_{ij} &= (AB)_{ji} \\
&= \sum_k A_{jk}B_{ki} \\
&= \sum_k (B^T)_{ik}(A^T)_{kj} \\
&= (B^T A^T)_{ij}
\end{aligned}$$
2.17 is left as an exercise.
Definition 2.1.18 An n × n matrix A is said to be symmetric if $A = A^T$. It is said to be skew symmetric if $A^T = -A$.
Example 2.1.19 Let
$$A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 5 & -3 \\ 3 & -3 & 7 \end{pmatrix}.$$
Then A is symmetric.
Example 2.1.20 Let
$$A = \begin{pmatrix} 0 & 1 & 3 \\ -1 & 0 & 2 \\ -3 & -2 & 0 \end{pmatrix}.$$
Then A is skew symmetric.
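The definitions of transpose, symmetric and skew symmetric can be checked mechanically. The following is a minimal sketch; the function names and the list-of-rows representation are assumptions made here, and the matrices are those of Examples 2.1.19 and 2.1.20 together with the 3 × 2 matrix used to introduce the transpose.

```python
def transpose(A):
    """Return A^T for a matrix stored as a list of rows: (A^T)_ij = A_ji."""
    return [list(col) for col in zip(*A)]

def is_symmetric(A):
    """True when A equals its transpose."""
    return A == transpose(A)

def is_skew_symmetric(A):
    """True when the transpose of A equals -A."""
    return transpose(A) == [[-a for a in row] for row in A]

S = [[2, 1, 3], [1, 5, -3], [3, -3, 7]]     # Example 2.1.19
K = [[0, 1, 3], [-1, 0, 2], [-3, -2, 0]]    # Example 2.1.20
M = [[1, 1 + 2j], [3, 1], [2, 6]]           # 3 x 2 matrix with a complex entry

print(is_symmetric(S), is_skew_symmetric(K))
print(transpose(M))
```

Note that the transpose defined here does not conjugate complex entries, in agreement with Definition 2.1.16.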
There is a special matrix called I and defined by
$$I_{ij} = \delta_{ij}$$
where $\delta_{ij}$ is the Kronecker symbol defined by
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
It is called the identity matrix because it is a multiplicative identity in the following sense.
Lemma 2.1.21 Suppose A is an m × n matrix and $I_n$ is the n × n identity matrix. Then $AI_n = A$. If $I_m$ is the m × m identity matrix, it also follows that $I_m A = A$.
Proof:
$$(AI_n)_{ij} = \sum_k A_{ik}\delta_{kj} = A_{ij}$$
and so $AI_n = A$. The other case is left as an exercise for you.
Definition 2.1.22 An n × n matrix A has an inverse $A^{-1}$ if and only if there exists a matrix, denoted as $A^{-1}$, such that $AA^{-1} = A^{-1}A = I$ where $I = (\delta_{ij})$ for
$$\delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
Such a matrix is called invertible.
If it acts like an inverse, then it is the inverse. This is the message of the following
proposition.
Proposition 2.1.23 Suppose $AB = BA = I$. Then $B = A^{-1}$.
Proof: From the definition B is an inverse for A. Could there be another one $B'$?
$$B' = B'I = B'(AB) = (B'A)B = IB = B.$$
Thus, the inverse, if it exists, is unique.

2.1.4 Finding The Inverse Of A Matrix
A little later a formula is given for the inverse of a matrix. However, it is not a good way
to find the inverse for a matrix. There is a much easier way and it is this which is presented
here. It is also important to note that not all matrices have inverses.
Example 2.1.24 Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. Does A have an inverse?
One might think A would have an inverse because it does not equal zero. However,
$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
and if $A^{-1}$ existed, this could not happen because you could multiply on the left by the inverse $A^{-1}$ and conclude the vector $(-1, 1)^T = (0, 0)^T$. Thus the answer is that A does not have an inverse.
Suppose you want to find B such that AB = I. Let
$$B = \begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix}$$
Also the ith column of I is
$$e_i = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix}^T$$
Thus, if AB = I, then $b_i$, the ith column of B, must satisfy the equation $Ab_i = e_i$. The augmented matrix for finding $b_i$ is $(A|e_i)$. Thus, by doing row operations till A becomes I, you end up with $(I|b_i)$ where $b_i$ is the solution to $Ab_i = e_i$. Now the same sequence of row operations works regardless of the right side of the augmented matrix $(A|e_i)$ and so you can save trouble by simply doing the following.
$$(A|I) \xrightarrow{\text{row operations}} (I|B)$$
and the ith column of B is $b_i$, the solution to $Ab_i = e_i$. Thus AB = I.
This is the reason for the following simple procedure for finding the inverse of a matrix.
This procedure is called the Gauss Jordan procedure. It produces the inverse if the matrix
has one. Actually, it produces the right inverse.
Procedure 2.1.25 Suppose A is an n × n matrix. To find $A^{-1}$ if it exists, form the augmented n × 2n matrix,
$$(A|I)$$
and then do row operations until you obtain an n × 2n matrix of the form
$$(I|B) \tag{2.18}$$
if possible. When this has been done, $B = A^{-1}$. The matrix A has an inverse exactly when it is possible to do row operations and end up with one like 2.18.
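Procedure 2.1.25 translates directly into a short program. The sketch below is one possible rendering, not the author's code: the function name `gauss_jordan_inverse`, the choice of pivot (the first nonzero entry found in the column), and the use of exact `Fraction` arithmetic are all assumptions made here for illustration.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Row reduce (A|I) toward (I|B). Return B, or None if the left half
    cannot be turned into the identity."""
    n = len(A)
    # Form the augmented n x 2n matrix (A|I) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a usable pivot in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                       # no pivot: I cannot be obtained
        M[col], M[pivot] = M[pivot], M[col]   # row operation: switch two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]      # row operation: scale a row
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                # row operation: add a multiple of the pivot row to row r
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]             # the right half is B

B = gauss_jordan_inverse([[1, 0, 1], [1, -1, 1], [1, 1, -1]])
print(B)
print(gauss_jordan_inverse([[1, 2, 2], [1, 0, 2], [2, 2, 4]]))
```

Exact fractions are used so that the output can be compared entry by entry with a hand computation; with floating point numbers the test for a zero pivot would need a tolerance.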
As described above, the following is a description of what you have just done.
$$A \xrightarrow{R_q R_{q-1} \cdots R_1} I, \qquad I \xrightarrow{R_q R_{q-1} \cdots R_1} B$$
where those $R_i$ symbolize row operations. It follows that you could undo what you did by doing the inverse of these row operations in the opposite order. Thus
$$I \xrightarrow{R_1^{-1} \cdots R_{q-1}^{-1} R_q^{-1}} A, \qquad B \xrightarrow{R_1^{-1} \cdots R_{q-1}^{-1} R_q^{-1}} I$$
Here $R^{-1}$ is the row operation which undoes the row operation R. Therefore, if you form $(B|I)$ and do the inverse of the row operations which produced I from A in the reverse order, you would obtain $(I|A)$. By the same reasoning above, it follows that A is a right inverse of B and so BA = I also. It follows from Proposition 2.1.23 that $B = A^{-1}$. Thus the procedure produces the inverse whenever it works.
If it is possible to do row operations and end up with $A \xrightarrow{\text{row operations}} I$, then the above argument shows that A has an inverse. Conversely, if A has an inverse, can it be found by the above procedure? In this case there exists a unique solution x to the equation Ax = y. In fact it is just $x = Ix = A^{-1}Ax = A^{-1}y$. Thus in terms of augmented matrices, you would expect to obtain
$$(A|y) \to (I|A^{-1}y)$$
That is, you would expect to be able to do row operations to A and end up with I.
The details will be explained fully when a more careful discussion is given which is based
on more fundamental considerations. For now, it suffices to observe that whenever the above
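procedure works, it finds the inverse.
Example 2.1.26 Let $A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}$. Find $A^{-1}$.
Form the augmented matrix
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 & 1 & 0 \\ 1 & 1 & -1 & 0 & 0 & 1 \end{array}\right).$$
Now do row operations until the n × n matrix on the left becomes the identity matrix. This yields after some computations,
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 1 & 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 1 & -\frac{1}{2} & -\frac{1}{2} \end{array}\right)$$
and so the inverse of A is the matrix on the right,
$$\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{pmatrix}.$$
Checking the answer is easy. Just multiply the matrices and see if it works.
$$\begin{pmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$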
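Always check your answer because if you are like some of us, you will usually have made a mistake.
Example 2.1.27 Let $A = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 3 & 1 & -1 \end{pmatrix}$. Find $A^{-1}$.
Set up the augmented matrix $(A|I)$
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 3 & 1 & -1 & 0 & 0 & 1 \end{array}\right)$$
Next take (−1) times the first row and add to the second followed by (−3) times the first row added to the last. This yields
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -5 & -7 & -3 & 0 & 1 \end{array}\right).$$
Then take 5 times the second row and add to −2 times the last row.
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right)$$
Next take the last row and add to (−7) times the top row. This yields
$$\left(\begin{array}{ccc|ccc} -7 & -14 & 0 & -6 & 5 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right).$$
Now take (−7/5) times the second row and add to the top.
$$\left(\begin{array}{ccc|ccc} -7 & 0 & 0 & 1 & -2 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right).$$
Finally divide the top row by −7, the second row by −10 and the bottom row by 14 which yields
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ 0 & 1 & 0 & \frac{1}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & 1 & \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{array}\right).$$
Therefore, the inverse is
$$\begin{pmatrix} -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ \frac{1}{2} & -\frac{1}{2} & 0 \\ \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{pmatrix}$$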
Example 2.1.28 Let $A = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 2 & 2 & 4 \end{pmatrix}$. Find $A^{-1}$.
Write the augmented matrix $(A|I)$
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 2 & 2 & 4 & 0 & 0 & 1 \end{array}\right)$$
and proceed to do row operations attempting to obtain $(I|A^{-1})$. Take (−1) times the top row and add to the second. Then take (−2) times the top row and add to the bottom.
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -2 & 0 & -2 & 0 & 1 \end{array}\right)$$
Next add (−1) times the second row to the bottom row.
$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 1 \end{array}\right)$$
At this point, you can see there will be no inverse because you have obtained a row of zeros in the left half of the augmented matrix $(A|I)$. Thus there will be no way to obtain I on the left. In other words, the three systems of equations you must solve to find the inverse have no solution. In particular, there is no solution for the first column of $A^{-1}$ which must solve
$$A\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$
because a sequence of row operations leads to the impossible equation, 0x + 0y + 0z = −1.

2.2 Exercises
1. In 2.1 - 2.8 describe −A and 0.
2. Let A be an n×n matrix. Show A equals the sum of a symmetric and a skew symmetric
matrix.
3. Show every skew symmetric matrix has all zeros down the main diagonal. The main
diagonal consists of every entry of the matrix which is of the form aii . It runs from
the upper left down to the lower right.
4. Using only the properties 2.1 - 2.8 show −A is unique.
5. Using only the properties 2.1 - 2.8 show 0 is unique.
6. Using only the properties 2.1 - 2.8 show 0A = 0. Here the 0 on the left is the scalar 0
and the 0 on the right is the zero for m × n matrices.
7. Using only the properties 2.1 - 2.8 and previous problems show (−1) A = −A.
8. Prove 2.17.
9. Prove that Im A = A where A is an m × n matrix.
10. Let A be a real m × n matrix and let $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Show $(Ax, y)_{\mathbb{R}^m} = (x, A^T y)_{\mathbb{R}^n}$ where $(\cdot, \cdot)_{\mathbb{R}^k}$ denotes the dot product in $\mathbb{R}^k$.
11. Use the result of Problem 10 to verify directly that $(AB)^T = B^T A^T$ without making any reference to subscripts.
12. Let x = (−1, −1, 1) and y = (0, 1, 2). Find $x^T y$ and $x y^T$ if possible.
13. Give an example of matrices, A, B, C such that B ≠ C, A ≠ 0, and yet AB = AC.
14. Let $A = \begin{pmatrix} 1 & 1 \\ -2 & -1 \\ 1 & 2 \end{pmatrix}$, $B = \begin{pmatrix} 1 & -1 & -2 \\ 2 & 1 & -2 \end{pmatrix}$, and $C = \begin{pmatrix} 1 & 1 & -3 \\ -1 & 2 & 0 \\ -3 & -1 & 0 \end{pmatrix}$. Find if possible the following products. AB, BA, AC, CA, CB, BC.
15. Consider the following digraph.
[Figure: digraph with vertices 1, 2, 3, 4.]
Write the matrix associated with this digraph and find the number of ways to go from 3 to 4 in three steps.
16. Show that if $A^{-1}$ exists for an n × n matrix, then it is unique. That is, if BA = I and AB = I, then $B = A^{-1}$.
17. Show $(AB)^{-1} = B^{-1}A^{-1}$.
18. Show that if A is an invertible n × n matrix, then so is $A^T$ and $(A^T)^{-1} = (A^{-1})^T$.
19. Show that if A is an n × n invertible matrix and x is an n × 1 matrix such that Ax = b for b an n × 1 matrix, then $x = A^{-1}b$.
20. Give an example of a matrix A such that $A^2 = I$ and yet A ≠ I and A ≠ −I.
21. Give an example of matrices, A, B such that neither A nor B equals zero and yet AB = 0.
22. Write $\begin{pmatrix} x_1 - x_2 + 2x_3 \\ 2x_3 + x_1 \\ 3x_3 \\ 3x_4 + 3x_2 + x_1 \end{pmatrix}$ in the form $A\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$ where A is an appropriate matrix.
23. Give another example other than the one given in this section of two square matrices, A and B such that AB ≠ BA.
24. Suppose A and B are square matrices of the same size. Which of the following are correct?
(a) $(A - B)^2 = A^2 - 2AB + B^2$
(b) $(AB)^2 = A^2 B^2$
(c) $(A + B)^2 = A^2 + 2AB + B^2$
(d) $(A + B)^2 = A^2 + AB + BA + B^2$
(e) $A^2 B^2 = A(AB)B$
(f) $(A + B)^3 = A^3 + 3A^2 B + 3AB^2 + B^3$
(g) $(A + B)(A - B) = A^2 - B^2$
(h) None of the above. They are all wrong.
(i) All of the above. They are all right.
25. Let $A = \begin{pmatrix} -1 & -1 \\ 3 & 3 \end{pmatrix}$. Find all 2 × 2 matrices, B such that AB = 0.
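26. Prove that if $A^{-1}$ exists and Ax = 0 then x = 0.
27. Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 1 & 0 & 2 \end{pmatrix}.$$
Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.
28. Let
$$A = \begin{pmatrix} 1 & 0 & 3 \\ 2 & 3 & 4 \\ 1 & 0 & 2 \end{pmatrix}.$$
Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.
29. Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 4 & 5 & 10 \end{pmatrix}.$$
Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.
30. Let
$$A = \begin{pmatrix} 1 & 2 & 0 & 2 \\ 1 & 1 & 2 & 0 \\ 2 & 1 & -3 & 2 \\ 1 & 2 & 1 & 2 \end{pmatrix}$$
Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.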

2.3 Linear Transformations
By 2.13, if A is an m × n matrix, then for v, u vectors in $\mathbb{F}^n$ and a, b scalars,
$$A\overbrace{(au + bv)}^{\in \mathbb{F}^n} = aAu + bAv \in \mathbb{F}^m \tag{2.19}$$
Definition 2.3.1 A function, A : Fn → Fm is called a linear transformation if for all
u, v ∈ Fn and a, b scalars, 2.19 holds.
From 2.19, matrix multiplication defines a linear transformation as just defined. It
turns out this is the only type of linear transformation available. Thus if A is a linear
transformation from Fn to Fm , there is always a matrix which produces A. Before showing
this, here is a simple definition.
Definition 2.3.2 A vector, $e_i \in \mathbb{F}^n$ is defined as follows:
$$e_i \equiv \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix},$$
where the 1 is in the ith position and there are zeros everywhere else. Thus
$$e_i = (0, \cdots, 0, 1, 0, \cdots, 0)^T.$$
Of course the ei for a particular value of i in Fn would be different than the ei for that
same value of i in Fm for m ̸= n. One of them is longer than the other. However, which one
is meant will be determined by the context in which they occur.
These vectors have a significant property.
Lemma 2.3.3 Let $v \in \mathbb{F}^n$. Thus v is a list of numbers arranged vertically, $v_1, \cdots, v_n$. Then
$$e_i^T v = v_i. \tag{2.20}$$
Also, if A is an m × n matrix, then letting $e_i \in \mathbb{F}^m$ and $e_j \in \mathbb{F}^n$,
$$e_i^T A e_j = A_{ij} \tag{2.21}$$
Proof: First note that $e_i^T$ is a 1 × n matrix and v is an n × 1 matrix so the above multiplication in 2.20 makes perfect sense. It equals
$$(0, \cdots, 1, \cdots, 0)\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix} = v_i$$
as claimed.
Consider 2.21. From the definition of matrix multiplication, and noting that $(e_j)_k = \delta_{kj}$,
$$e_i^T A e_j = e_i^T \begin{pmatrix} \sum_k A_{1k}(e_j)_k \\ \vdots \\ \sum_k A_{ik}(e_j)_k \\ \vdots \\ \sum_k A_{mk}(e_j)_k \end{pmatrix} = e_i^T \begin{pmatrix} A_{1j} \\ \vdots \\ A_{ij} \\ \vdots \\ A_{mj} \end{pmatrix} = A_{ij}$$
by the first part of the lemma.
Theorem 2.3.4 Let $L : \mathbb{F}^n \to \mathbb{F}^m$ be a linear transformation. Then there exists a unique m × n matrix A such that
$$Ax = Lx$$
for all $x \in \mathbb{F}^n$. The ikth entry of this matrix is given by
$$e_i^T L e_k \tag{2.22}$$
Stated in another way, the kth column of A equals $Le_k$.
Proof: By the lemma,
$$(Lx)_i = e_i^T Lx = e_i^T \sum_k x_k Le_k = \sum_k \left(e_i^T Le_k\right) x_k.$$
Let $A_{ik} = e_i^T Le_k$, to prove the existence part of the theorem.
To verify uniqueness, suppose Bx = Ax = Lx for all $x \in \mathbb{F}^n$. Then in particular, this is true for $x = e_j$ and then multiply on the left by $e_i^T$ to obtain
$$B_{ij} = e_i^T Be_j = e_i^T Ae_j = A_{ij}$$
showing A = B.
Corollary 2.3.5 A linear transformation, $L : \mathbb{F}^n \to \mathbb{F}^m$ is completely determined by the vectors $\{Le_1, \cdots, Le_n\}$.
Proof: This follows immediately from the above theorem. The unique matrix determining the linear transformation which is given in 2.22 depends only on these vectors.
For a different proof of this theorem and corollary, see the following section.
This theorem shows that any linear transformation defined on Fn can always be considered as matrix multiplication. Therefore, the terms “linear transformation” and “matrix”
are often used interchangeably. For example, to say that a matrix is one to one, means the
linear transformation determined by the matrix is one to one.
Example 2.3.6 Find the linear transformation, $L : \mathbb{R}^2 \to \mathbb{R}^2$ which has the property that $Le_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $Le_2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$. From the above theorem and corollary, this linear transformation is that determined by matrix multiplication by the matrix
$$\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}.$$
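Theorem 2.3.4 says the kth column of the matrix is $Le_k$, which suggests a direct computation. The following sketch assumes the linear transformation is available as a Python function on lists; the names `matrix_of`, `apply`, and `L` are hypothetical and introduced only for this illustration.

```python
def matrix_of(L, n):
    """Return the matrix (list of rows) whose k-th column is L(e_k)."""
    columns = []
    for k in range(n):
        e_k = [1 if i == k else 0 for i in range(n)]   # standard basis vector
        columns.append(L(e_k))
    return [list(row) for row in zip(*columns)]        # columns -> rows

def apply(A, x):
    """Compute the matrix-vector product Ax."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def L(x):
    # A linear map with L(e1) = (2, 1) and L(e2) = (1, 3), as in Example 2.3.6.
    return [2 * x[0] + x[1], x[0] + 3 * x[1]]

A = matrix_of(L, 2)
print(A)
print(apply(A, [5, -2]), L([5, -2]))   # the two results agree
```

Only the n values $Le_1, \cdots, Le_n$ are ever requested from `L`, which is the content of Corollary 2.3.5.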

2.4 Some Geometrically Defined Linear Transformations
If T is any linear transformation which maps $\mathbb{F}^n$ to $\mathbb{F}^m$, there is always an m × n matrix $A \equiv [T]$ with the property that
$$Ax = Tx \tag{2.23}$$
for all $x \in \mathbb{F}^n$. What is the form of A? Suppose $T : \mathbb{F}^n \to \mathbb{F}^m$ is a linear transformation and you want to find the matrix defined by this linear transformation as described in 2.23. Then if $x \in \mathbb{F}^n$ it follows
$$x = \sum_{i=1}^n x_i e_i$$
where $e_i$ is the vector which has zeros in every slot but the ith and a 1 in this slot. Then since T is linear,
$$Tx = \sum_{i=1}^n x_i T(e_i) = \begin{pmatrix} | & & | \\ T(e_1) & \cdots & T(e_n) \\ | & & | \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \equiv A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$
and so you see that the matrix desired is obtained from letting the ith column equal $T(e_i)$. This proves the existence part of the following theorem.
Theorem 2.4.1 Let T be a linear transformation from $\mathbb{F}^n$ to $\mathbb{F}^m$. Then the matrix A satisfying 2.23 is given by
$$\begin{pmatrix} | & & | \\ T(e_1) & \cdots & T(e_n) \\ | & & | \end{pmatrix}$$
where $Te_i$ is the ith column of A.
Proof: It remains to verify uniqueness. However, if A is a matrix which works, $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$, then $Te_i \equiv Ae_i = a_i$ and so the matrix is of the form claimed above.
Example 2.4.2 Determine the matrix for the transformation mapping $\mathbb{R}^2$ to $\mathbb{R}^2$ which consists of rotating every vector counter clockwise through an angle of θ.
Let $e_1 \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. These identify the geometric vectors which point along the positive x axis and positive y axis as shown.
[Figure: the vectors $e_1$ along the positive x axis and $e_2$ along the positive y axis.]
From Theorem 2.4.1, you only need to find $Te_1$ and $Te_2$, the first being the first column of the desired matrix A and the second being the second column. From drawing a picture and doing a little geometry, you see that
$$Te_1 = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \quad Te_2 = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}.$$
Therefore, from Theorem 2.4.1,
$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
Example 2.4.3 Find the matrix of the linear transformation which is obtained by first rotating all vectors through an angle of ϕ and then through an angle θ. Thus you want the linear transformation which rotates all vectors through an angle of θ + ϕ.
Let $T_{\theta+\phi}$ denote the linear transformation which rotates every vector through an angle of θ + ϕ. Then to get $T_{\theta+\phi}$, you could first do $T_\phi$ and then do $T_\theta$ where $T_\phi$ is the linear transformation which rotates through an angle of ϕ and $T_\theta$ is the linear transformation which rotates through an angle of θ. Denoting the corresponding matrices by $A_{\theta+\phi}$, $A_\phi$, and $A_\theta$, you must have for every x
$$A_{\theta+\phi}x = T_{\theta+\phi}x = T_\theta T_\phi x = A_\theta A_\phi x.$$
Consequently, you must have
$$A_{\theta+\phi} = \begin{pmatrix} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{pmatrix} = A_\theta A_\phi = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{pmatrix} = \begin{pmatrix} \cos\theta\cos\phi - \sin\theta\sin\phi & -\cos\theta\sin\phi - \sin\theta\cos\phi \\ \sin\theta\cos\phi + \cos\theta\sin\phi & \cos\theta\cos\phi - \sin\theta\sin\phi \end{pmatrix}.$$
Don't these look familiar? They are the usual trig. identities for the sum of two angles derived here using linear algebra concepts.
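The identity $A_{\theta+\phi} = A_\theta A_\phi$ is easy to spot-check numerically. The sketch below uses two arbitrarily chosen angles (0.7 and 0.4 radians, picked here only for illustration) and compares entries up to floating point rounding; the helper names are assumptions of this example.

```python
import math

def rotation(theta):
    """Matrix of counter clockwise rotation of the plane through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

theta, phi = 0.7, 0.4                      # arbitrary test angles
lhs = rotation(theta + phi)                # rotate once through theta + phi
rhs = matmul(rotation(theta), rotation(phi))   # rotate by phi, then by theta

same = all(math.isclose(lhs[i][j], rhs[i][j], abs_tol=1e-12)
           for i in range(2) for j in range(2))
print(same)
```

A numerical check of this kind confirms the identity only for the sampled angles; the argument in Example 2.4.3 is what establishes it for all θ and ϕ.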
Example 2.4.4 Find the matrix of the linear transformation which rotates vectors in $\mathbb{R}^3$ counterclockwise about the positive z axis through an angle of θ.
Let T be the name of this linear transformation. In this case, $Te_3 = e_3$, $Te_1 = (\cos\theta, \sin\theta, 0)^T$, and $Te_2 = (-\sin\theta, \cos\theta, 0)^T$. Therefore, the matrix of this transformation is just
$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{2.24}$$
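In Physics it is important to consider the work done by a force field on an object. This involves the concept of projection onto a vector. Suppose you want to find the projection of a vector, v onto the given vector, u, denoted by $\operatorname{proj}_u(v)$. This is done using the dot product as follows.
$$\operatorname{proj}_u(v) = \left(\frac{v \cdot u}{u \cdot u}\right)u$$
Because of properties of the dot product, the map $v \to \operatorname{proj}_u(v)$ is linear,
$$\operatorname{proj}_u(\alpha v + \beta w) = \left(\frac{(\alpha v + \beta w) \cdot u}{u \cdot u}\right)u = \alpha\left(\frac{v \cdot u}{u \cdot u}\right)u + \beta\left(\frac{w \cdot u}{u \cdot u}\right)u = \alpha\operatorname{proj}_u(v) + \beta\operatorname{proj}_u(w).$$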
Example 2.4.5 Let the projection map be defined above and let $u = (1, 2, 3)^T$. Find the matrix of this linear transformation with respect to the usual basis.
You can find this matrix in the same way as in earlier examples. $\operatorname{proj}_u(e_i)$ gives the ith column of the desired matrix. Therefore, it is only necessary to find
$$\operatorname{proj}_u(e_i) \equiv \left(\frac{e_i \cdot u}{u \cdot u}\right)u$$
For the given vector in the example, this implies the columns of the desired matrix are
$$\frac{1}{14}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \frac{2}{14}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \frac{3}{14}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$$
Hence the matrix is
$$\frac{1}{14}\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix}.$$
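The column-by-column construction of Example 2.4.5 can be sketched in Python. The function names `dot` and `projection_matrix` are hypothetical, and the code assumes u has integer entries so that `Fraction` can keep the arithmetic exact.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def projection_matrix(u):
    """Matrix whose i-th column is proj_u(e_i) = ((e_i . u) / (u . u)) u."""
    n = len(u)
    uu = dot(u, u)
    columns = []
    for i in range(n):
        e_i = [int(j == i) for j in range(n)]
        coeff = Fraction(dot(e_i, u), uu)          # (e_i . u) / (u . u)
        columns.append([coeff * c for c in u])     # that scalar times u
    return [list(row) for row in zip(*columns)]    # columns -> rows

P = projection_matrix([1, 2, 3])
for row in P:
    print([str(x) for x in row])
```

The fractions are printed in lowest terms, so for instance the entry 2/14 appears as 1/7.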
Example 2.4.6 Find the matrix of the linear transformation which reflects all vectors in $\mathbb{R}^3$ through the xz plane.
As illustrated above, you just need to find $Te_i$ where T is the name of the transformation. But $Te_1 = e_1$, $Te_3 = e_3$, and $Te_2 = -e_2$ so the matrix is
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Example 2.4.7 Find the matrix of the linear transformation which first rotates counter clockwise about the positive z axis and then reflects through the xz plane.
This linear transformation is just the composition of two linear transformations having matrices
$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
respectively. Thus the matrix desired is
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ -\sin\theta & -\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

2.5 The Null Space Of A Linear Transformation
The null space or kernel of a matrix or linear transformation is given in the following
definition. Essentially, it is just the set of all vectors which are sent to the zero vector by
the linear transformation.
Definition 2.5.1 Let L : Fn → Fm be a linear transformation and let its matrix be the
m × n matrix A. Then ker (L) ≡ {x ∈ Fn : Lx = 0} . Sometimes people also write this as
N (A) , the null space of A.
Then there is a fundamental result in the case where m < n. In this case, the matrix A of the linear transformation has more columns than rows.
Theorem 2.5.2 Let A be an m × n matrix where m < n. Then N (A) contains nonzero
vectors.
Proof: First consider the case where A is a 1 × n matrix for n > 1. Say
$$A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$$
If $a_1 = 0$, consider the vector $x = e_1$. If $a_1 \neq 0$, let
$$x = \begin{pmatrix} b \\ 1 \\ \vdots \\ 1 \end{pmatrix}$$
where b is chosen to satisfy the equation
$$a_1 b + \sum_{k=2}^n a_k = 0$$
Suppose now that the theorem is true for any m × n matrix with n > m and consider an (m + 1) × n matrix A where n > m + 1. If the first column of A is 0, then you could let $x = e_1$ as above. If the first column is not the zero vector, then by doing row operations, the equation Ax = 0 can be reduced to the equivalent system
$$A_1 x = 0$$
where $A_1$ is of the form
$$A_1 = \begin{pmatrix} 1 & a^T \\ 0 & B \end{pmatrix}$$
where B is an m × (n − 1) matrix. Since n > m + 1, it follows that (n − 1) > m and so by induction, there exists a nonzero vector $y \in \mathbb{F}^{n-1}$ such that By = 0. Then consider the vector
$$x = \begin{pmatrix} b \\ y \end{pmatrix}$$
$A_1 x$ has for its top entry the expression $b + a^T y$. Letting $B = \begin{pmatrix} b_1^T \\ \vdots \\ b_m^T \end{pmatrix}$, the ith entry of $A_1 x$ for i > 1 is of the form $b_{i-1}^T y = 0$. Thus if b is chosen to satisfy the equation $b + a^T y = 0$, then $A_1 x = 0$. Since y ≠ 0, x is a nonzero vector in N(A).
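As a small illustration of the 1 × n case of the proof, here is a worked instance. The matrix $A = \begin{pmatrix} 2 & 4 & 6 \end{pmatrix}$ is chosen here purely for illustration and does not come from the text.

```latex
A = \begin{pmatrix} 2 & 4 & 6 \end{pmatrix}, \qquad a_1 = 2 \neq 0, \qquad
2b + (4 + 6) = 0 \;\Longrightarrow\; b = -5,
\qquad
x = \begin{pmatrix} -5 \\ 1 \\ 1 \end{pmatrix}, \qquad
Ax = 2(-5) + 4(1) + 6(1) = 0 .
```

So x is a nonzero vector in N(A), as the theorem predicts for a matrix with fewer rows than columns.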

2.6 Subspaces And Spans
Definition 2.6.1 Let $\{x_1, \cdots, x_p\}$ be vectors in $\mathbb{F}^n$. A linear combination is any expression of the form
$$\sum_{i=1}^p c_i x_i$$
where the $c_i$ are scalars. The set of all linear combinations of these vectors is called $\operatorname{span}(x_1, \cdots, x_p)$. If $V \subseteq \mathbb{F}^n$ is nonempty, then V is called a subspace if whenever α, β are scalars and u and v are vectors of V, it follows αu + βv ∈ V. That is, it is "closed under the algebraic operations of vector addition and scalar multiplication". A linear combination of vectors is said to be trivial if all the scalars in the linear combination equal zero. A set of vectors is said to be linearly independent if the only linear combination of these vectors which equals the zero vector is the trivial linear combination. Thus $\{x_1, \cdots, x_p\}$ is called linearly independent if whenever
$$\sum_{k=1}^p c_k x_k = 0$$
it follows that all the scalars $c_k$ equal zero. A set of vectors, $\{x_1, \cdots, x_p\}$, is called linearly dependent if it is not linearly independent. Thus the set of vectors is linearly dependent if there exist scalars $c_k$, $k = 1, \cdots, p$, not all zero such that $\sum_{k=1}^p c_k x_k = 0$.
Proposition 2.6.2 Let $V \subseteq \mathbb{F}^n$. Then V is a subspace if and only if it is a vector space itself with respect to the same operations of scalar multiplication and vector addition.
Proof: Suppose first that V is a subspace. All algebraic properties involving scalar multiplication and vector addition hold for V because these things hold for $\mathbb{F}^n$. Is 0 ∈ V? Yes it is. This is because 0v ∈ V and 0v = 0. By assumption, for α a scalar and v ∈ V, αv ∈ V. Therefore, −v = (−1)v ∈ V. Thus V has the additive identity and additive inverse. By assumption, V is closed with respect to the two operations. Thus V is a vector space. If $V \subseteq \mathbb{F}^n$ is a vector space, then by definition, if α, β are scalars and u, v vectors in V, it follows that αv + βu ∈ V.
Thus, from the above, subspaces of $\mathbb{F}^n$ are just subsets of $\mathbb{F}^n$ which are themselves vector spaces.
Lemma 2.6.3 A set of vectors $\{x_1, \cdots, x_p\}$ is linearly independent if and only if none of the vectors can be obtained as a linear combination of the others.
Proof: Suppose first that $\{x_1, \cdots, x_p\}$ is linearly independent. If $x_k = \sum_{j \neq k} c_j x_j$, then
$$0 = 1x_k + \sum_{j \neq k} (-c_j)x_j,$$
a nontrivial linear combination, contrary to assumption. This shows that if the set is linearly independent, then none of the vectors is a linear combination of the others.
Now suppose no vector is a linear combination of the others. Is $\{x_1, \cdots, x_p\}$ linearly independent? If it is not, there exist scalars $c_i$, not all zero such that
$$\sum_{i=1}^p c_i x_i = 0.$$
Say $c_k \neq 0$. Then you can solve for $x_k$ as
$$x_k = \sum_{j \neq k} \left(-c_j / c_k\right) x_j$$
contrary to assumption.
The following is called the exchange theorem.
Theorem 2.6.4 (Exchange Theorem) Let $\{x_1, \cdots, x_r\}$ be a linearly independent set of vectors such that each $x_i$ is in $\operatorname{span}(y_1, \cdots, y_s)$. Then r ≤ s.
Proof 1: Suppose not. Then r > s. By assumption, there exist scalars $a_{ji}$ such that
$$x_i = \sum_{j=1}^s a_{ji} y_j$$
The s × r matrix A whose jith entry is $a_{ji}$ has more columns than rows. Therefore, by Theorem 2.5.2 there exists a nonzero vector $b \in \mathbb{F}^r$ such that Ab = 0. Thus
$$0 = \sum_{i=1}^r a_{ji} b_i, \text{ each } j.$$
Then
$$\sum_{i=1}^r b_i x_i = \sum_{i=1}^r b_i \sum_{j=1}^s a_{ji} y_j = \sum_{j=1}^s \left(\sum_{i=1}^r a_{ji} b_i\right) y_j = 0$$
contradicting the assumption that $\{x_1, \cdots, x_r\}$ is linearly independent.
Proof 2: Define $\operatorname{span}\{y_1, \cdots, y_s\} \equiv V$. It follows there exist scalars $c_1, \cdots, c_s$ such that
$$x_1 = \sum_{i=1}^s c_i y_i. \tag{2.25}$$
Not all of these scalars can equal zero because if this were the case, it would follow that $x_1 = 0$ and so $\{x_1, \cdots, x_r\}$ would not be linearly independent. Indeed, if $x_1 = 0$, then $1x_1 + \sum_{i=2}^r 0x_i = x_1 = 0$ and so there would exist a nontrivial linear combination of the vectors $\{x_1, \cdots, x_r\}$ which equals zero.
Say $c_k \neq 0$. Then solve (2.25) for $y_k$ and obtain
$$y_k \in \operatorname{span}\Big(x_1, \overbrace{y_1, \cdots, y_{k-1}, y_{k+1}, \cdots, y_s}^{s-1 \text{ vectors here}}\Big).$$
Define $\{z_1, \cdots, z_{s-1}\}$ by
$$\{z_1, \cdots, z_{s-1}\} \equiv \{y_1, \cdots, y_{k-1}, y_{k+1}, \cdots, y_s\}$$
Therefore, $\operatorname{span}\{x_1, z_1, \cdots, z_{s-1}\} = V$ because if v ∈ V, there exist constants $c_1, \cdots, c_s$ such that
$$v = \sum_{i=1}^{s-1} c_i z_i + c_s y_k.$$
Now replace the $y_k$ in the above with a linear combination of the vectors, $\{x_1, z_1, \cdots, z_{s-1}\}$ to obtain $v \in \operatorname{span}\{x_1, z_1, \cdots, z_{s-1}\}$. The vector $y_k$, in the list $\{y_1, \cdots, y_s\}$, has now been replaced with the vector $x_1$ and the resulting modified list of vectors has the same span as the original list of vectors, $\{y_1, \cdots, y_s\}$.
Now suppose that r > s and that $\operatorname{span}\{x_1, \cdots, x_l, z_1, \cdots, z_p\} = V$ where the vectors, $z_1, \cdots, z_p$ are each taken from the set, $\{y_1, \cdots, y_s\}$ and l + p = s. This has now been done for l = 1 above. Then since r > s, it follows that l ≤ s < r and so l + 1 ≤ r. Therefore, $x_{l+1}$ is a vector not in the list, $\{x_1, \cdots, x_l\}$ and since $\operatorname{span}\{x_1, \cdots, x_l, z_1, \cdots, z_p\} = V$, there exist scalars $c_i$ and $d_j$ such that
$$x_{l+1} = \sum_{i=1}^l c_i x_i + \sum_{j=1}^p d_j z_j. \tag{2.26}$$
Now not all the $d_j$ can equal zero because if this were so, it would follow that $\{x_1, \cdots, x_r\}$ would be a linearly dependent set because one of the vectors would equal a linear combination of the others. Therefore, (2.26) can be solved for one of the $z_i$, say $z_k$, in terms of $x_{l+1}$ and the other $z_i$ and just as in the above argument, replace that $z_i$ with $x_{l+1}$ to obtain
$$\operatorname{span}\Big\{x_1, \cdots, x_l, x_{l+1}, \overbrace{z_1, \cdots, z_{k-1}, z_{k+1}, \cdots, z_p}^{p-1 \text{ vectors here}}\Big\} = V.$$
Continue this way, eventually obtaining
$$\operatorname{span}\{x_1, \cdots, x_s\} = V.$$
But then $x_r \in \operatorname{span}\{x_1, \cdots, x_s\}$ contrary to the assumption that $\{x_1, \cdots, x_r\}$ is linearly independent. Therefore, r ≤ s as claimed.
Proof 3: Suppose r > s. Let $z_k$ denote a vector of $\{y_1, \cdots, y_s\}$. Thus there exists j as small as possible such that
$$\operatorname{span}(y_1, \cdots, y_s) = \operatorname{span}(x_1, \cdots, x_m, z_1, \cdots, z_j)$$
where m + j = s. It is given that m = 0, corresponding to no vectors of $\{x_1, \cdots, x_m\}$ and j = s, corresponding to all the $y_k$ results in the above equation holding. If j > 0 then m < s and so
$$x_{m+1} = \sum_{k=1}^m a_k x_k + \sum_{i=1}^j b_i z_i$$
Not all the $b_i$ can equal 0 and so you can solve for one of the $z_i$ in terms of $x_{m+1}, x_m, \cdots, x_1$, and the other $z_k$. Therefore, there exists
$$\{z_1, \cdots, z_{j-1}\} \subseteq \{y_1, \cdots, y_s\}$$
such that
$$\operatorname{span}(y_1, \cdots, y_s) = \operatorname{span}(x_1, \cdots, x_{m+1}, z_1, \cdots, z_{j-1})$$
contradicting the choice of j. Hence j = 0 and
$$\operatorname{span}(y_1, \cdots, y_s) = \operatorname{span}(x_1, \cdots, x_s)$$
It follows that
$$x_{s+1} \in \operatorname{span}(x_1, \cdots, x_s)$$
contrary to the assumption the $x_k$ are linearly independent. Therefore, r ≤ s as claimed.
Definition 2.6.5 A finite set of vectors, $\{x_1, \cdots, x_r\}$ is a basis for $\mathbb{F}^n$ if $\operatorname{span}(x_1, \cdots, x_r) = \mathbb{F}^n$ and $\{x_1, \cdots, x_r\}$ is linearly independent.
Corollary 2.6.6 Let $\{x_1, \cdots, x_r\}$ and $\{y_1, \cdots, y_s\}$ be two bases¹ of $\mathbb{F}^n$. Then r = s = n.
Proof: From the exchange theorem, r ≤ s and s ≤ r. Now note the vectors,
$$e_i = (\overbrace{0, \cdots, 0, 1, 0, \cdots, 0}^{1 \text{ is in the } i\text{th slot}})$$
for i = 1, 2, · · · , n are a basis for $\mathbb{F}^n$.
Lemma 2.6.7 Let $\{v_1, \cdots, v_r\}$ be a set of vectors. Then $V \equiv \operatorname{span}(v_1, \cdots, v_r)$ is a subspace.
Proof: Suppose α, β are two scalars and let $\sum_{k=1}^r c_k v_k$ and $\sum_{k=1}^r d_k v_k$ be two elements of V. What about
$$\alpha\sum_{k=1}^r c_k v_k + \beta\sum_{k=1}^r d_k v_k?$$
Is it also in V?
$$\alpha\sum_{k=1}^r c_k v_k + \beta\sum_{k=1}^r d_k v_k = \sum_{k=1}^r (\alpha c_k + \beta d_k)v_k \in V$$
so the answer is yes.
¹ This is the plural form of basis. We could say basiss but it would involve an inordinate amount of hissing as in "The sixth sheik's sixth sheep is sick". This is the reason that bases is used instead of basiss.
Definition 2.6.8 A finite set of vectors, $\{x_1, \cdots, x_r\}$ is a basis for a subspace V of $\mathbb{F}^n$ if $\operatorname{span}(x_1, \cdots, x_r) = V$ and $\{x_1, \cdots, x_r\}$ is linearly independent.
Corollary 2.6.9 Let $\{x_1, \cdots, x_r\}$ and $\{y_1, \cdots, y_s\}$ be two bases for V. Then r = s.
Proof: From the exchange theorem, r ≤ s and s ≤ r.
Definition 2.6.10 Let V be a subspace of $\mathbb{F}^n$. Then dim(V), read as the dimension of V, is the number of vectors in a basis.
Of course you should wonder right now whether an arbitrary subspace even has a basis.
In fact it does and this is in the next theorem. First, here is an interesting lemma.
Lemma 2.6.11 Suppose $v \notin \operatorname{span}(u_1, \cdots, u_k)$ and $\{u_1, \cdots, u_k\}$ is linearly independent. Then $\{u_1, \cdots, u_k, v\}$ is also linearly independent.
Proof: Suppose $\sum_{i=1}^k c_i u_i + dv = 0$. It is required to verify that each $c_i = 0$ and that d = 0. But if d ≠ 0, then you can solve for v as a linear combination of the vectors, $\{u_1, \cdots, u_k\}$,
$$v = -\sum_{i=1}^k \left(\frac{c_i}{d}\right) u_i$$
contrary to assumption. Therefore, d = 0. But then $\sum_{i=1}^k c_i u_i = 0$ and the linear independence of $\{u_1, \cdots, u_k\}$ implies each $c_i = 0$ also.
Theorem 2.6.12 Let V be a nonzero subspace of $\mathbb{F}^n$. Then V has a basis.
Proof: Let $v_1 \in V$ where $v_1 \neq 0$. If $\operatorname{span}\{v_1\} = V$, stop. $\{v_1\}$ is a basis for V. Otherwise, there exists $v_2 \in V$ which is not in $\operatorname{span}\{v_1\}$. By Lemma 2.6.11 $\{v_1, v_2\}$ is a linearly independent set of vectors. If $\operatorname{span}\{v_1, v_2\} = V$ stop, $\{v_1, v_2\}$ is a basis for V. If $\operatorname{span}\{v_1, v_2\} \neq V$, then there exists $v_3 \notin \operatorname{span}\{v_1, v_2\}$ and $\{v_1, v_2, v_3\}$ is a larger linearly independent set of vectors. Continuing this way, the process must stop before n + 1 steps because if not, it would be possible to obtain n + 1 linearly independent vectors contrary to the exchange theorem.
In words the following corollary states that any linearly independent set of vectors can be enlarged to form a basis.
Corollary 2.6.13 Let V be a subspace of $\mathbb{F}^n$ and let $\{v_1, \cdots, v_r\}$ be a linearly independent set of vectors in V. Then either it is a basis for V or there exist vectors, $v_{r+1}, \cdots, v_s$ such that $\{v_1, \cdots, v_r, v_{r+1}, \cdots, v_s\}$ is a basis for V.
Proof: This follows immediately from the proof of Theorem 2.6.12. You do exactly the same argument except you start with $\{v_1, \cdots, v_r\}$ rather than $\{v_1\}$.
It is also true that any spanning set of vectors can be restricted to obtain a basis.
Theorem 2.6.14 Let V be a subspace of Fn and suppose span (u1 · · · , up ) = V where
the ui are nonzero vectors. Then there exist vectors {v1 · · · , vr } such that {v1 · · · , vr } ⊆
{u1 · · · , up } and {v1 · · · , vr } is a basis for V .
Proof: Let r be the smallest positive integer with the property that for some set
{v1 · · · , vr } ⊆ {u1 · · · , up } ,
span (v1 · · · , vr ) = V.
Then r ≤ p and it must be the case that {v1 · · · , vr } is linearly independent because if it
were not so, one of the vectors, say vk would be a linear combination of the others. But
then you could delete this vector from {v1 · · · , vr } and the resulting list of r − 1 vectors
would still span V contrary to the definition of r.
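The statement of Theorem 2.6.14 can be tried out on a small example. The sketch below is only an illustration and does not follow the minimality argument of the proof: it swaps in a greedy pass that keeps a vector exactly when it is not in the span of the vectors already kept, testing this with Gaussian elimination over the rationals. The function names `rank` and `extract_basis` and the sample vectors are assumptions made for this example.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # look for a row at or below position r with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extract_basis(spanning):
    """Keep each vector that is not in the span of the vectors already kept."""
    basis = []
    for u in spanning:
        if rank(basis + [u]) > len(basis):
            basis.append(u)
    return basis

# Hypothetical spanning set of a two dimensional subspace of R^3.
spanning = [(1, 0, 1), (2, 0, 2), (0, 1, 1), (1, 1, 2)]
basis = extract_basis(spanning)
print(basis)
```

The vectors kept are a subset of the original spanning set, they are linearly independent, and they have the same span, which is what the theorem asserts.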
2.7 An Application To Matrices
The following is a theorem of major significance.
Theorem 2.7.1 Suppose A is an n × n matrix. Then A is one to one (injective) if and
only if A is onto (surjective). Also, if B is an n × n matrix and AB = I, then it follows
BA = I.
Proof: First suppose A is one to one. Consider the vectors, {Ae1 , · · · , Aen } where ek
is the column vector which is all zeros except for a 1 in the k th position. This set of vectors
is linearly independent because if
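\[ \sum_{k=1}^{n} c_k A\mathbf{e}_k = \mathbf{0}, \]
then since A is linear,
\[ A\left( \sum_{k=1}^{n} c_k \mathbf{e}_k \right) = \mathbf{0} \]
and since A is one to one, it follows
\[ \sum_{k=1}^{n} c_k \mathbf{e}_k = \mathbf{0} \]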
which implies each ck = 0 because the ek are clearly linearly independent.
Therefore, {Ae1 , · · · , Aen } must be a basis for Fn because if not there would exist a
vector y ∉ span (Ae1 , · · · , Aen ) and then by Lemma 2.6.11, {Ae1 , · · · , Aen , y} would be
an independent set of vectors having n + 1 vectors in it, contrary to the exchange theorem.
It follows that for y ∈ Fn there exist constants ck such that
\[ \mathbf{y} = \sum_{k=1}^{n} c_k A\mathbf{e}_k = A\left( \sum_{k=1}^{n} c_k \mathbf{e}_k \right) \]
showing that, since y was arbitrary, A is onto.
Next suppose A is onto. This means the span of the columns of A equals Fn . If these
columns are not linearly independent, then by Lemma 2.6.3 on Page 64, one of the columns
is a linear combination of the others and so the span of the columns of A equals the span of
the n − 1 other columns. This violates the exchange theorem because {e1 , · · · , en } would be
a linearly independent set of vectors contained in the span of only n − 1 vectors. Therefore,
the columns of A must be independent and this is equivalent to saying that Ax = 0 if and
only if x = 0. This implies A is one to one because if Ax = Ay, then A (x − y) = 0 and so
x − y = 0.
Now suppose AB = I. Why is BA = I? Since AB = I it follows B is one to one since
otherwise, there would exist, x ̸= 0 such that Bx = 0 and then ABx = A0 = 0 ̸= Ix.
Therefore, from what was just shown, B is also onto. In addition to this, A must be one
to one because if Ay = 0, then y = Bx for some x and then x = ABx = Ay = 0 showing
y = 0. Now from what is given to be so, it follows (AB) A = A and so using the associative
law for matrix multiplication,
A (BA) − A = A (BA − I) = 0.
But this means (BA − I) x = 0 for all x since otherwise, A would not be one to one. Hence
BA = I as claimed.
This theorem shows that if an n × n matrix B acts like an inverse when multiplied on
one side of A, it follows that B = A−1 and it will act like an inverse on both sides of A.
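As a quick sanity check of Theorem 2.7.1, the following sketch multiplies a pair of 2 × 2 integer matrices in both orders. The matrices `A` and `B` and the helper `matmul` are assumptions chosen for illustration; they do not come from the text.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# Hypothetical square matrices with AB = I.
A = [[2, 1],
     [5, 3]]
B = [[3, -1],
     [-5, 2]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)  # the 2 x 2 identity
print(BA)  # the theorem says this must be the identity as well
```

Since both matrices are square and AB = I, the theorem guarantees BA = I without any further computation; the second product only confirms it.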
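The conclusion of this theorem pertains to square matrices only. For example, let
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1 \end{pmatrix} \tag{2.27} \]
Then
\[ BA = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
but
\[ AB = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1 \\ 1 & 0 & 0 \end{pmatrix}. \]

2.8 Matrices And Calculus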
The study of moving coordinate systems gives a non trivial example of the usefulness of the
ideas involving linear transformations and matrices. To begin with, here is the concept of
the product rule extended to matrix multiplication.
Definition 2.8.1 Let A (t) be an m × n matrix. Say A (t) = (Aij (t)) . Suppose also that
Aij (t) is a differentiable function for all i, j. Then define A′ (t) ≡ (A′ij (t)) . That is, A′ (t)
is the matrix which consists of replacing each entry by its derivative. Such an m × n matrix
in which the entries are differentiable functions is called a differentiable matrix.
The next lemma is just a version of the product rule.
Lemma 2.8.2 Let A (t) be an m × n matrix and let B (t) be an n × p matrix with the
property that all the entries of these matrices are differentiable functions. Then
\[ (A(t)B(t))' = A'(t)B(t) + A(t)B'(t). \]
Proof: This is like the usual proof.
\[ \frac{1}{h}\left( A(t+h)B(t+h) - A(t)B(t) \right) = \frac{1}{h}\left( A(t+h)B(t+h) - A(t+h)B(t) \right) + \frac{1}{h}\left( A(t+h)B(t) - A(t)B(t) \right) \]
\[ = A(t+h)\,\frac{B(t+h) - B(t)}{h} + \frac{A(t+h) - A(t)}{h}\,B(t) \]
and now, using the fact that the entries of the matrices are all differentiable, one can pass
to a limit in both sides as h → 0 and conclude that
\[ (A(t)B(t))' = A'(t)B(t) + A(t)B'(t). \]
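The product rule for matrices can be checked numerically. The sketch below compares a central difference approximation of (A (t) B (t))′ with A′ (t) B (t) + A (t) B′ (t). The particular matrices `A`, `B`, their hand computed derivatives `dA`, `dB`, and the step size are assumptions made only for this example.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def madd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Hypothetical differentiable matrices and their entrywise derivatives.
def A(t):  return [[t, t * t], [1.0, math.sin(t)]]
def dA(t): return [[1.0, 2 * t], [0.0, math.cos(t)]]
def B(t):  return [[math.cos(t), 1.0], [t, t ** 3]]
def dB(t): return [[-math.sin(t), 0.0], [1.0, 3 * t * t]]

def numeric_derivative(F, t, h=1e-5):
    """Entrywise central difference approximation of F'(t)."""
    Fp, Fm = F(t + h), F(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(Fp, Fm)]

t = 0.7
lhs = numeric_derivative(lambda s: matmul(A(s), B(s)), t)
rhs = madd(matmul(dA(t), B(t)), matmul(A(t), dB(t)))
max_err = max(abs(l - r) for rl, rr in zip(lhs, rhs) for l, r in zip(rl, rr))
print(max_err)  # small, limited only by the finite difference error
```

Note that the order of the factors matters: A′ (t) B (t) cannot be replaced by B (t) A′ (t), since matrix multiplication does not commute.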
2.8.1 The Coriolis Acceleration
Imagine a point on the surface of the earth. Now consider unit vectors, one pointing South,
one pointing East and one pointing directly away from the center of the earth.
[Figure: the unit vectors i, j, k at a point on the surface of the earth.]
Denote the first as i, the second as j, and the third as k. If you are standing on the earth
you will consider these vectors as fixed, but of course they are not. As the earth turns, they
change direction and so each is in reality a function of t. Nevertheless, it is with respect
to these apparently fixed vectors that you wish to understand acceleration, velocities, and
displacements.
In general, let i∗ , j∗ , k∗ be the usual fixed vectors in space and let i (t) , j (t) , k (t) be an
orthonormal basis of vectors for each t, like the vectors described in the first paragraph.
It is assumed these vectors are $C^1$ functions of t. Letting the positive x axis extend in the
direction of i (t) , the positive y axis extend in the direction of j (t), and the positive z axis
extend in the direction of k (t) , yields a moving coordinate system. Now let u be a vector
and let t0 be some reference time. For example you could let t0 = 0. Then define the
components of u with respect to these vectors, i, j, k at time t0 as
\[ \mathbf{u} \equiv u^1 \mathbf{i}(t_0) + u^2 \mathbf{j}(t_0) + u^3 \mathbf{k}(t_0). \]
Let u (t) be defined as the vector which has the same components with respect to i, j, k but
at time t. Thus
\[ \mathbf{u}(t) \equiv u^1 \mathbf{i}(t) + u^2 \mathbf{j}(t) + u^3 \mathbf{k}(t), \]
and the vector has changed although the components have not.
This is exactly the situation in the case of the apparently fixed basis vectors on the earth
if u is a position vector from the given spot on the earth’s surface to a point regarded as
fixed with the earth due to its keeping the same coordinates relative to the coordinate axes
which are fixed with the earth. Now define a linear transformation Q (t) mapping R3 to R3
by
Q (t) u ≡ u1 i (t) + u2 j (t) + u3 k (t)
where
u ≡ u1 i (t0 ) + u2 j (t0 ) + u3 k (t0 )
Thus letting v be a vector defined in the same manner as u and α, β, scalars,
(
)
(
)
(
)
Q (t) (αu + βv) ≡ αu1 + βv 1 i (t) + αu2 + βv 2 j (t) + αu3 + βv 3 k (t)
) (
)
αu1 i (t) + αu2 j (t) + αu3 k (t) + βv 1 i (t) + βv 2 j (t) + βv 3 k (t)
(
)
(
)
= α u1 i (t) + u2 j (t) + u3 k (t) + β v 1 i (t) + v 2 j (t) + v 3 k (t)
=
(
≡ αQ (t) u + βQ (t) v
2.8. MATRICES AND CALCULUS
71
showing that Q (t) is a linear transformation. Also, Q (t) preserves all distances because,
since the vectors, i (t) , j (t) , k (t) form an orthonormal set,
(
|Q (t) u| =
3
∑
( i )2
u
)1/2
= |u| .
i=1
Lemma 2.8.3 Suppose Q (t) is a real, differentiable n×n matrix which preserves distances.
Then $Q(t)Q(t)^T = Q(t)^T Q(t) = I$. Also, if u (t) ≡ Q (t) u, then there exists a vector, Ω (t)
such that
u′ (t) = Ω (t) × u (t) .
The symbol × refers to the cross product.
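Proof: Recall that $(\mathbf{z} \cdot \mathbf{w}) = \frac{1}{4}\left( |\mathbf{z} + \mathbf{w}|^2 - |\mathbf{z} - \mathbf{w}|^2 \right)$. Therefore,
\begin{align*}
(Q(t)\mathbf{u} \cdot Q(t)\mathbf{w}) &= \frac{1}{4}\left( |Q(t)(\mathbf{u} + \mathbf{w})|^2 - |Q(t)(\mathbf{u} - \mathbf{w})|^2 \right) \\
&= \frac{1}{4}\left( |\mathbf{u} + \mathbf{w}|^2 - |\mathbf{u} - \mathbf{w}|^2 \right) \\
&= (\mathbf{u} \cdot \mathbf{w}).
\end{align*}
This implies
\[ \left( Q(t)^T Q(t)\mathbf{u} \cdot \mathbf{w} \right) = (\mathbf{u} \cdot \mathbf{w}) \]
for all u, w. Therefore, $Q(t)^T Q(t)\mathbf{u} = \mathbf{u}$ and so $Q(t)^T Q(t) = Q(t)Q(t)^T = I$. This proves
the first part of the lemma.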
It follows from the product rule, Lemma 2.8.2 that
\[ Q'(t)Q(t)^T + Q(t)Q'(t)^T = 0 \]
and so
\[ Q'(t)Q(t)^T = -\left( Q'(t)Q(t)^T \right)^T. \tag{2.28} \]
From the definition, Q (t) u = u (t) ,
\[ \mathbf{u}'(t) = Q'(t)\mathbf{u} = Q'(t)\overbrace{Q(t)^T \mathbf{u}(t)}^{=\mathbf{u}}. \]
Then writing the matrix of $Q'(t)Q(t)^T$ with respect to fixed in space orthonormal basis
vectors, i∗ , j∗ , k∗ , where these are the usual basis vectors for R3 , it follows from 2.28 that
the matrix of $Q'(t)Q(t)^T$ is of the form
\[ \begin{pmatrix} 0 & -\omega_3(t) & \omega_2(t) \\ \omega_3(t) & 0 & -\omega_1(t) \\ -\omega_2(t) & \omega_1(t) & 0 \end{pmatrix} \]
for some time dependent scalars ω i . Therefore,
\[ \begin{pmatrix} u^1 \\ u^2 \\ u^3 \end{pmatrix}'(t) = \begin{pmatrix} 0 & -\omega_3(t) & \omega_2(t) \\ \omega_3(t) & 0 & -\omega_1(t) \\ -\omega_2(t) & \omega_1(t) & 0 \end{pmatrix} \begin{pmatrix} u^1 \\ u^2 \\ u^3 \end{pmatrix}(t) \]
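where the $u^i$ are the components of the vector u (t) in terms of the fixed vectors i∗ , j∗ , k∗ .
Therefore,
\[ \mathbf{u}'(t) = \mathbf{\Omega}(t) \times \mathbf{u}(t) = Q'(t)Q(t)^T \mathbf{u}(t) \tag{2.29} \]
where
\[ \mathbf{\Omega}(t) = \omega_1(t)\,\mathbf{i}^* + \omega_2(t)\,\mathbf{j}^* + \omega_3(t)\,\mathbf{k}^*, \]
because
\[ \mathbf{\Omega}(t) \times \mathbf{u}(t) \equiv \begin{vmatrix} \mathbf{i}^* & \mathbf{j}^* & \mathbf{k}^* \\ \omega_1 & \omega_2 & \omega_3 \\ u^1 & u^2 & u^3 \end{vmatrix} \equiv \mathbf{i}^*\left( \omega_2 u^3 - \omega_3 u^2 \right) + \mathbf{j}^*\left( \omega_3 u^1 - \omega_1 u^3 \right) + \mathbf{k}^*\left( \omega_1 u^2 - \omega_2 u^1 \right). \]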
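The relation u′ (t) = Ω (t) × u (t) can be checked on a concrete rotation. The sketch below assumes, purely for illustration, that Q (t) is a rotation about the k∗ axis with constant angular speed `W`; it forms Q′ (t) Q (t)^T, reads the scalars ω i off the skew symmetric matrix, and compares Q′ (t) u with Ω × u (t). All names in the code are hypothetical.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matvec(X, v):
    return tuple(sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X)))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

W = 0.3  # assumed constant angular speed about the k* axis

def Q(t):
    c, s = math.cos(W * t), math.sin(W * t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def dQ(t):
    # entrywise derivative of Q(t)
    c, s = math.cos(W * t), math.sin(W * t)
    return [[-W * s, -W * c, 0.0], [W * c, -W * s, 0.0], [0.0, 0.0, 0.0]]

t = 1.2
S = matmul(dQ(t), transpose(Q(t)))       # the matrix Q'(t) Q(t)^T
omega = (S[2][1], S[0][2], S[1][0])      # (omega_1, omega_2, omega_3) read off S
u0 = (1.0, 2.0, 3.0)                     # components at the reference time
u_t = matvec(Q(t), u0)                   # u(t) = Q(t) u
du_t = matvec(dQ(t), u0)                 # u'(t) = Q'(t) u
omega_cross_u = cross(omega, u_t)
print(omega)
print(du_t, omega_cross_u)
```

For this rotation the vector read off the matrix is W k∗, which is the axis of rotation scaled by the angular speed.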
This proves the lemma and yields the existence part of the following theorem.

Theorem 2.8.4 Let i (t) , j (t) , k (t) be as described. Then there exists a unique vector Ω (t)
such that if u (t) is a vector whose components are constant with respect to i (t) , j (t) , k (t) ,
then
u′ (t) = Ω (t) × u (t) .
Proof: It only remains to prove uniqueness. Suppose Ω1 also works. Then u (t) = Q (t) u
and so u′ (t) = Q′ (t) u and
Q′ (t) u = Ω × Q (t) u = Ω1 × Q (t) u
for all u. Therefore,
(Ω − Ω1 ) × Q (t) u = 0
for all u and since Q (t) is one to one and onto, this implies (Ω − Ω1 ) ×w = 0 for all w and
thus Ω − Ω1 = 0.

Now let R (t) be a position vector and let
r (t) = R (t) + rB (t)
where
rB (t) ≡ x (t) i (t) +y (t) j (t) +z (t) k (t) .
[Figure: the position vectors R (t), rB (t), and r (t) = R (t) + rB (t).]
In the example of the earth, R (t) is the position vector of a point p (t) on the earth’s
surface and rB (t) is the position vector of another point from p (t) , thus regarding p (t)
as the origin. rB (t) is the position vector of a point as perceived by the observer on the
earth with respect to the vectors he thinks of as fixed. Similarly, vB (t) and aB (t) will be
the velocity and acceleration relative to i (t) , j (t) , k (t), and so vB = x′ i + y ′ j + z ′ k and
aB = x′′ i + y ′′ j + z ′′ k. Then
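\[ \mathbf{v} \equiv \mathbf{r}' = \mathbf{R}' + x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k} + x\mathbf{i}' + y\mathbf{j}' + z\mathbf{k}'. \]
By 2.29, if e ∈ {i, j, k} , e′ = Ω × e because the components of these vectors with respect
to i, j, k are constant. Therefore,
\begin{align*}
x\mathbf{i}' + y\mathbf{j}' + z\mathbf{k}' &= x\,\mathbf{\Omega} \times \mathbf{i} + y\,\mathbf{\Omega} \times \mathbf{j} + z\,\mathbf{\Omega} \times \mathbf{k} \\
&= \mathbf{\Omega} \times (x\mathbf{i} + y\mathbf{j} + z\mathbf{k})
\end{align*}
and consequently,
\[ \mathbf{v} = \mathbf{R}' + x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k} + \mathbf{\Omega} \times \mathbf{r}_B = \mathbf{R}' + x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k} + \mathbf{\Omega} \times (x\mathbf{i} + y\mathbf{j} + z\mathbf{k}). \]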
Now consider the acceleration. Quantities which are relative to the moving coordinate
system and quantities which are relative to a fixed coordinate system are distinguished by
using the subscript B on those relative to the moving coordinate system.
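\begin{align*}
\mathbf{a} = \mathbf{v}' &= \mathbf{R}'' + x''\mathbf{i} + y''\mathbf{j} + z''\mathbf{k} + \overbrace{x'\mathbf{i}' + y'\mathbf{j}' + z'\mathbf{k}'}^{\mathbf{\Omega} \times \mathbf{v}_B} + \mathbf{\Omega}' \times \mathbf{r}_B \\
&\quad + \mathbf{\Omega} \times \Big( \overbrace{x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k}}^{\mathbf{v}_B} + \overbrace{x\mathbf{i}' + y\mathbf{j}' + z\mathbf{k}'}^{\mathbf{\Omega} \times \mathbf{r}_B(t)} \Big) \\
&= \mathbf{R}'' + \mathbf{a}_B + \mathbf{\Omega}' \times \mathbf{r}_B + 2\mathbf{\Omega} \times \mathbf{v}_B + \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B).
\end{align*}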
The acceleration aB is that perceived by an observer who is moving with the moving coordinate system and for whom the moving coordinate system is fixed. The term Ω× (Ω × rB )
is called the centripetal acceleration. Solving for aB ,
aB = a − R′′ − Ω′ × rB − 2Ω × vB − Ω× (Ω × rB ) .
(2.30)
Here the term − (Ω× (Ω × rB )) is called the centrifugal acceleration, it being an acceleration
felt by the observer relative to the moving coordinate system which he regards as fixed, and
the term −2Ω × vB is called the Coriolis acceleration, an acceleration experienced by the
observer as he moves relative to the moving coordinate system. The mass multiplied by the
Coriolis acceleration defines the Coriolis force.
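Formula 2.30 is a sum of cross products, so it is easy to evaluate term by term. The following sketch does this for vectors given by their three components in a common orthonormal basis. The function name `relative_acceleration`, its argument names, and the sample values are assumptions made for this illustration.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(*vs):
    return tuple(sum(c) for c in zip(*vs))

def scale(s, v):
    return tuple(s * c for c in v)

def relative_acceleration(a, R_dd, Omega, Omega_d, r_B, v_B):
    """Evaluate a_B from formula 2.30; also return the Coriolis and centrifugal terms."""
    omega_dot_term = scale(-1, cross(Omega_d, r_B))           # -Omega' x r_B
    coriolis = scale(-2, cross(Omega, v_B))                   # -2 Omega x v_B
    centrifugal = scale(-1, cross(Omega, cross(Omega, r_B)))  # -Omega x (Omega x r_B)
    a_B = add(a, scale(-1, R_dd), omega_dot_term, coriolis, centrifugal)
    return a_B, coriolis, centrifugal

# Hypothetical data: steady rotation about the third axis, no applied acceleration.
a_B, coriolis, centrifugal = relative_acceleration(
    a=(0, 0, 0), R_dd=(0, 0, 0),
    Omega=(0, 0, 1), Omega_d=(0, 0, 0),
    r_B=(1, 0, 0), v_B=(0, 1, 0))
print(coriolis, centrifugal, a_B)
```

In this sample the centrifugal term points along rB, away from the axis of rotation, and it is present even when vB = 0, whereas the Coriolis term vanishes when vB = 0.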
There is a ride found in some amusement parks in which the victims stand next to
a circular wall covered with a carpet or some rough material. Then the whole circular
room begins to revolve faster and faster. At some point, the bottom drops out and the
victims are held in place by friction. The force they feel is called centrifugal force and it
causes centrifugal acceleration. It is not necessary to move relative to coordinates fixed with
the revolving wall in order to feel this force and it is pretty predictable. However, if the
nauseated victim moves relative to the rotating wall, he will feel the effects of the Coriolis
force and this force is really strange. The difference between these forces is that the Coriolis
force is caused by movement relative to the moving coordinate system and the centrifugal
force is not.
2.8.2 The Coriolis Acceleration On The Rotating Earth
Now consider the earth. Let i∗ , j∗ , k∗ , be the usual basis vectors fixed in space with k∗
pointing in the direction of the north pole from the center of the earth and let i, j, k be the
unit vectors described earlier with i pointing South, j pointing East, and k pointing away
from the center of the earth at some point of the rotating earth’s surface p. Letting R (t) be
the position vector of the point p, from the center of the earth, observe the coordinates of
R (t) are constant with respect to i (t) , j (t) , k (t) . Also, the earth rotates from West
to East, and the speed of a point on the surface of the earth relative to an observer fixed in
space is ω |R| sin ϕ, where ω is the angular speed of the earth about an axis through the poles
and ϕ is the polar angle measured from the positive z axis down as in spherical coordinates.
It follows from the geometric definition of the cross product that
\[ \mathbf{R}' = \omega \mathbf{k}^* \times \mathbf{R}. \]
Therefore, the vector of Theorem 2.8.4 is Ω = ωk∗ and so
\[ \mathbf{R}'' = \overbrace{\mathbf{\Omega}' \times \mathbf{R}}^{=\mathbf{0}} + \mathbf{\Omega} \times \mathbf{R}' = \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R}) \]
since Ω does not depend on t. Formula 2.30 implies
aB = a − Ω× (Ω × R) − 2Ω × vB − Ω× (Ω × rB ) .
(2.31)
In this formula, you can totally ignore the term Ω× (Ω × rB ) because it is so small whenever you are considering motion near some point on the earth’s surface. To see this, note
\[ \omega \overbrace{(24)(3600)}^{\text{seconds in a day}} = 2\pi, \]
and so $\omega = 7.2722 \times 10^{-5}$ in radians per second. If you are using
seconds to measure time and feet to measure distance, this term is therefore no larger than
\[ \left( 7.2722 \times 10^{-5} \right)^2 |\mathbf{r}_B|. \]
Clearly this is not worth considering in the presence of the acceleration due to gravity which
is approximately 32 feet per second squared near the surface of the earth.
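The size estimate above is a short computation. The sketch below recomputes ω from a 24 hour day and evaluates the bound ω² |rB| for a sample distance; the 1000 foot distance and the helper name `centripetal_bound` are assumptions made for this example.

```python
import math

SECONDS_PER_DAY = 24 * 3600
omega = 2 * math.pi / SECONDS_PER_DAY   # radians per second

def centripetal_bound(r_B_norm_ft):
    """Upper bound omega^2 |r_B| on |Omega x (Omega x r_B)|, in feet per second squared."""
    return omega ** 2 * r_B_norm_ft

G_FT = 32.0                            # acceleration due to gravity, feet per second squared
bound = centripetal_bound(1000.0)      # hypothetical: a point 1000 feet from p
print(omega)
print(bound, bound / G_FT)
```

Even a thousand feet from the reference point, the neglected term is many orders of magnitude smaller than the acceleration due to gravity.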
If the acceleration a is due to gravity, then
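\[ \mathbf{a}_B = \mathbf{a} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R}) - 2\mathbf{\Omega} \times \mathbf{v}_B = \overbrace{-\frac{GM(\mathbf{R} + \mathbf{r}_B)}{|\mathbf{R} + \mathbf{r}_B|^3} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})}^{\equiv \mathbf{g}} - 2\mathbf{\Omega} \times \mathbf{v}_B \equiv \mathbf{g} - 2\mathbf{\Omega} \times \mathbf{v}_B. \]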
Note that
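\[ \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R}) = (\mathbf{\Omega} \cdot \mathbf{R})\,\mathbf{\Omega} - |\mathbf{\Omega}|^2 \mathbf{R} \]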
and so g, the acceleration relative to the moving coordinate system on the earth is not
directed exactly toward the center of the earth except at the poles and at the equator,
although the components of acceleration which are in other directions are very small when
compared with the acceleration due to the force of gravity and are often neglected. Therefore, if the only force acting on an object is due to gravity, the following formula describes
the acceleration relative to a coordinate system moving with the earth’s surface.
aB = g−2 (Ω × vB )
While the vector Ω is quite small, if the relative velocity, vB is large, the Coriolis acceleration
could be significant. This is described in terms of the vectors i (t) , j (t) , k (t) next.
Letting (ρ, θ, ϕ) be the usual spherical coordinates of the point p (t) on the surface
taken with respect to i∗ , j∗ , k∗ the usual way with ϕ the polar angle, it follows the i∗ , j∗ , k∗
coordinates of this point are
\[ \begin{pmatrix} \rho \sin(\phi)\cos(\theta) \\ \rho \sin(\phi)\sin(\theta) \\ \rho \cos(\phi) \end{pmatrix}. \]
It follows,
i = cos (ϕ) cos (θ) i∗ + cos (ϕ) sin (θ) j∗ − sin (ϕ) k∗
j = − sin (θ) i∗ + cos (θ) j∗ + 0k∗
and
k = sin (ϕ) cos (θ) i∗ + sin (ϕ) sin (θ) j∗ + cos (ϕ) k∗ .
It is necessary to obtain k∗ in terms of the vectors, i, j, k. Thus the following equation
needs to be solved for a, b, c to find k∗ = ai+bj+ck
\[ \overbrace{\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}}^{\mathbf{k}^*} = \begin{pmatrix} \cos(\phi)\cos(\theta) & -\sin(\theta) & \sin(\phi)\cos(\theta) \\ \cos(\phi)\sin(\theta) & \cos(\theta) & \sin(\phi)\sin(\theta) \\ -\sin(\phi) & 0 & \cos(\phi) \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} \tag{2.32} \]
The first column is i, the second is j and the third is k in the above matrix. The solution
is a = − sin (ϕ) , b = 0, and c = cos (ϕ) .
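The stated solution of 2.32 can be confirmed by substituting it back. The sketch below builds i, j, k from their i∗ , j∗ , k∗ components for a sample pair of angles and checks that − sin (ϕ) i + 0j + cos (ϕ) k reproduces (0, 0, 1). The function names and the sample angles are assumptions made for this example.

```python
import math

def frame(phi, theta):
    """Components of i, j, k with respect to the fixed vectors i*, j*, k*."""
    i = (math.cos(phi) * math.cos(theta), math.cos(phi) * math.sin(theta), -math.sin(phi))
    j = (-math.sin(theta), math.cos(theta), 0.0)
    k = (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))
    return i, j, k

def k_star_from_frame(phi, theta):
    """Form a i + b j + c k with a = -sin(phi), b = 0, c = cos(phi)."""
    i, j, k = frame(phi, theta)
    a, b, c = -math.sin(phi), 0.0, math.cos(phi)
    return tuple(a * ii + b * jj + c * kk for ii, jj, kk in zip(i, j, k))

k_star = k_star_from_frame(phi=0.9, theta=2.1)   # hypothetical angles
print(k_star)  # (0, 0, 1) up to rounding
```

Because the matrix in 2.32 has orthonormal columns, the same coefficients can also be read off as the dot products of k∗ with i, j, and k.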
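Now the Coriolis acceleration on the earth equals
\[ 2(\mathbf{\Omega} \times \mathbf{v}_B) = 2\omega\,\overbrace{\left( -\sin(\phi)\,\mathbf{i} + 0\,\mathbf{j} + \cos(\phi)\,\mathbf{k} \right)}^{\mathbf{k}^*} \times \left( x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k} \right). \]
This equals
\[ 2\omega\left[ (-y'\cos\phi)\,\mathbf{i} + (x'\cos\phi + z'\sin\phi)\,\mathbf{j} - (y'\sin\phi)\,\mathbf{k} \right]. \tag{2.33} \]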
Remember ϕ is fixed and pertains to the fixed point, p (t) on the earth’s surface. Therefore,
if the acceleration a is due to gravity,
aB = g−2ω [(−y ′ cos ϕ) i+ (x′ cos ϕ + z ′ sin ϕ) j − (y ′ sin ϕ) k]
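where $\mathbf{g} = -\frac{GM(\mathbf{R} + \mathbf{r}_B)}{|\mathbf{R} + \mathbf{r}_B|^3} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})$ as explained above. The term Ω× (Ω × R) is pretty
small and so it will be neglected. However, the Coriolis force will not be neglected.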
Example 2.8.5 Suppose a rock is dropped from a tall building. Where will it strike?
Assume a = −gk and the j component of aB is approximately
−2ω (x′ cos ϕ + z ′ sin ϕ) .
The dominant term in this expression is clearly the second one because x′ will be small.
Also, the i and k contributions will be very small. Therefore, the following equation is
descriptive of the situation.
aB = −gk−2z ′ ω sin ϕj.
z′ = −gt approximately. Therefore, considering the j component, this is
\[ 2gt\omega \sin\phi. \]
Two integrations give $\left( \omega g t^3/3 \right)\sin\phi$ for the j component of the relative displacement at
time t.
This shows the rock does not fall directly towards the center of the earth as expected
but slightly to the east.
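To get a feeling for the size of this eastward displacement, the formula (ωgt³/3) sin ϕ can be evaluated at the time the rock reaches the ground. The sketch below assumes g = 32 feet per second squared, ignores air resistance, and takes the fall time from z = −gt²/2; the 400 foot height and the function name are hypothetical.

```python
import math

OMEGA = 2 * math.pi / (24 * 3600)   # angular speed of the earth, radians per second
G_FT = 32.0                         # feet per second squared

def eastward_deflection(height_ft, polar_angle):
    """j component of the displacement when a rock dropped from height_ft lands."""
    fall_time = math.sqrt(2 * height_ft / G_FT)   # from height = g t^2 / 2
    return OMEGA * G_FT * fall_time ** 3 / 3 * math.sin(polar_angle)

# Hypothetical 400 foot drop, so the fall lasts 5 seconds.
at_equator = eastward_deflection(400.0, math.pi / 2)
at_pole = eastward_deflection(400.0, 0.0)
print(at_equator, at_pole)
```

At the equator the deflection for this drop is a little under a tenth of a foot, and it shrinks to zero at the poles where sin ϕ = 0.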
Example 2.8.6 In 1851 Foucault set a pendulum vibrating and observed the earth rotate
out from under it. It was a very long pendulum with a heavy weight at the end so that it
would vibrate for a long time without stopping2 . This is what allowed him to observe the
earth rotate out from under it. Clearly such a pendulum will take 24 hours for the plane of
vibration to appear to make one complete revolution at the north pole. It is also reasonable
to expect that no such observed rotation would take place on the equator. Is it possible to
predict what will take place at various latitudes?
2 There is such a pendulum in the Eyring building at BYU and to keep people from touching it, there is
a little sign which says Warning! 1000 ohms.
Using 2.33, in 2.31,
aB = a − Ω× (Ω × R)
−2ω [(−y ′ cos ϕ) i+ (x′ cos ϕ + z ′ sin ϕ) j − (y ′ sin ϕ) k] .
Neglecting the small term, Ω× (Ω × R) , this becomes
= −gk + T/m−2ω [(−y ′ cos ϕ) i+ (x′ cos ϕ + z ′ sin ϕ) j − (y ′ sin ϕ) k]
where T, the tension in the string of the pendulum, is directed towards the point at which
the pendulum is supported, and m is the mass of the pendulum bob. The pendulum can be
thought of as the position vector from (0, 0, l) to the surface of the sphere $x^2 + y^2 + (z - l)^2 = l^2$. Therefore,
\[ \mathbf{T} = -T\frac{x}{l}\,\mathbf{i} - T\frac{y}{l}\,\mathbf{j} + T\frac{l - z}{l}\,\mathbf{k} \]
and consequently, the differential equations of relative motion are
\[ x'' = -T\frac{x}{ml} + 2\omega y' \cos\phi, \]
\[ y'' = -T\frac{y}{ml} - 2\omega\left( x'\cos\phi + z'\sin\phi \right), \]
and
\[ z'' = T\frac{l - z}{ml} - g + 2\omega y' \sin\phi. \]
If the vibrations of the pendulum are small so that for practical purposes, z′′ = z = 0, the
last equation may be solved for T to get
\[ gm - 2\omega y' \sin(\phi)\, m = T. \]
Therefore, the first two equations become
\[ x'' = -\left( gm - 2\omega m y' \sin\phi \right)\frac{x}{ml} + 2\omega y' \cos\phi \]
and
\[ y'' = -\left( gm - 2\omega m y' \sin\phi \right)\frac{y}{ml} - 2\omega\left( x'\cos\phi + z'\sin\phi \right). \]
All terms of the form xy′ or y′y can be neglected because it is assumed x and y remain
small. Also, the pendulum is assumed to be long with a heavy weight so that x′ and y′ are
also small. With these simplifying assumptions, the equations of motion become
\[ x'' + g\frac{x}{l} = 2\omega y' \cos\phi \]
and
\[ y'' + g\frac{y}{l} = -2\omega x' \cos\phi. \]
These equations are of the form
\[ x'' + a^2 x = b y', \quad y'' + a^2 y = -b x' \tag{2.34} \]
where $a^2 = \frac{g}{l}$ and $b = 2\omega \cos\phi$. Then it is fairly tedious but routine to verify that for each
constant, c,
\[ x = c \sin\left( \frac{bt}{2} \right) \sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right), \quad y = c \cos\left( \frac{bt}{2} \right) \sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right) \tag{2.35} \]
yields a solution to 2.34 along with the initial conditions,
\[ x(0) = 0, \quad y(0) = 0, \quad x'(0) = 0, \quad y'(0) = \frac{c\sqrt{b^2 + 4a^2}}{2}. \tag{2.36} \]
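The verification that 2.35 solves 2.34 can also be done numerically as a sanity check. The sketch below approximates the derivatives with central differences and evaluates the residuals of both equations at a few times. The values chosen for a, b, c, the step size, and the sample times are assumptions made for this example; they are not physically realistic values for the earth.

```python
import math

A_COEF = 2.0    # hypothetical value of a
B_COEF = 0.5    # hypothetical value of b
C = 1.0         # hypothetical value of c
GAMMA = math.sqrt(B_COEF ** 2 + 4 * A_COEF ** 2) / 2

def x(t):
    return C * math.sin(B_COEF * t / 2) * math.sin(GAMMA * t)

def y(t):
    return C * math.cos(B_COEF * t / 2) * math.sin(GAMMA * t)

def d1(f, t, h=1e-4):
    """Central difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def d2(f, t, h=1e-4):
    """Central difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

def residuals(t):
    """Left side minus right side for each equation in 2.34."""
    r1 = d2(x, t) + A_COEF ** 2 * x(t) - B_COEF * d1(y, t)
    r2 = d2(y, t) + A_COEF ** 2 * y(t) + B_COEF * d1(x, t)
    return r1, r2

worst = max(abs(r) for t in (0.0, 0.5, 1.0, 2.0, 5.0) for r in residuals(t))
print(worst)                      # close to zero
print(d1(y, 0.0), C * GAMMA)      # y'(0) compared with c sqrt(b^2 + 4a^2) / 2
```

A symbolic verification uses the identity (b² + 4a²)/4 − a² = b²/4, which is what makes the sin sin terms cancel.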
It is clear from experiments with the pendulum that the earth does indeed rotate out from
under it causing the plane of vibration of the pendulum to appear to rotate. The purpose
of this discussion is not to establish these self evident facts but to predict how long it takes
for the plane of vibration to make one revolution. Therefore, there will be some instant in
time at which the pendulum will be vibrating in a plane determined by k and j. (Recall
k points away from the center of the earth and j points East. ) At this instant in time,
defined as t = 0, the conditions of 2.36 will hold for some value of c and so the solution to
2.34 having these initial conditions will be those of 2.35 by uniqueness of the initial value
problem. Writing these solutions differently,
\[ \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = c \begin{pmatrix} \sin\left( \frac{bt}{2} \right) \\ \cos\left( \frac{bt}{2} \right) \end{pmatrix} \sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right) \]
This is very interesting! The vector, $c \begin{pmatrix} \sin\left( \frac{bt}{2} \right) \\ \cos\left( \frac{bt}{2} \right) \end{pmatrix}$ always has magnitude equal to |c|
but its direction changes very slowly because b is very small. The plane of vibration is
determined by this vector and the vector k. The term $\sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right)$ changes relatively fast
and takes values between −1 and 1. This is what describes the actual observed vibrations
of the pendulum. Thus the plane of vibration will have made one complete revolution when
t = T for
\[ \frac{bT}{2} \equiv 2\pi. \]
Therefore, the time it takes for the earth to turn out from under the pendulum is
\[ T = \frac{4\pi}{2\omega \cos\phi} = \frac{2\pi}{\omega} \sec\phi. \]
Since ω is the angular speed of the rotating earth, it follows $\omega = \frac{2\pi}{24} = \frac{\pi}{12}$ in radians per
hour. Therefore, the above formula implies
\[ T = 24 \sec\phi. \]
I think this is really amazing. You could actually determine latitude, not by taking readings
with instruments using the North Star but by doing an experiment with a big pendulum.
You would set it vibrating, observe T in hours, and then solve the above equation for ϕ.
Also note the pendulum would not appear to change its plane of vibration at the equator
because limϕ→π/2 sec ϕ = ∞.
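The formula T = 24 sec ϕ is easy to tabulate, and it can be inverted to recover ϕ from an observed T. The sketch below does both. Here ϕ is the polar angle, so a latitude of λ degrees north corresponds to ϕ = 90 − λ degrees; the function names and the sample latitudes are assumptions made for this example.

```python
import math

def foucault_period_hours(polar_angle):
    """T = 24 sec(phi), the time for the plane of vibration to make one revolution."""
    return 24.0 / math.cos(polar_angle)

def polar_angle_from_period(period_hours):
    """Invert T = 24 sec(phi) for phi between 0 and pi/2."""
    return math.acos(24.0 / period_hours)

for latitude_deg in (90, 60, 45, 30):        # hypothetical northern latitudes
    phi = math.radians(90 - latitude_deg)
    print(latitude_deg, foucault_period_hours(phi))

T_pole = foucault_period_hours(0.0)
T_30 = foucault_period_hours(math.pi / 3)    # latitude 30 degrees north
phi_back = polar_angle_from_period(T_30)
```

At the pole the period is 24 hours, at latitude 30 degrees it is 48 hours, and it grows without bound as the equator is approached.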
The Coriolis acceleration is also responsible for the phenomenon of the next example.
Example 2.8.7 It is known that low pressure areas rotate counterclockwise as seen from
above in the Northern hemisphere but clockwise in the Southern hemisphere. Why?
Neglect accelerations other than the Coriolis acceleration and the following acceleration
which comes from an assumption that the point p (t) is the location of the lowest pressure.
a = −a (rB ) rB
where rB = r will denote the distance from the fixed point p (t) on the earth’s surface which
is also the lowest pressure point. Of course the situation could be more complicated but
this will suffice to explain the above question. Then the acceleration observed by a person
on the earth relative to the apparently fixed vectors, i, j, k, is
aB = −a (rB ) (xi+yj+zk) − 2ω [−y ′ cos (ϕ) i+ (x′ cos (ϕ) + z ′ sin (ϕ)) j− (y ′ sin (ϕ) k)]
Therefore, one obtains some differential equations from aB = x′′ i + y ′′ j + z ′′ k by matching
the components. These are
\begin{align*}
x'' + a(r_B)\,x &= 2\omega y' \cos\phi \\
y'' + a(r_B)\,y &= -2\omega x' \cos\phi - 2\omega z' \sin(\phi) \\
z'' + a(r_B)\,z &= 2\omega y' \sin\phi
\end{align*}
Now remember, the vectors, i, j, k are fixed relative to the earth and so are constant vectors.
Therefore, from the properties of the determinant and the above differential equations,
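\[ \left( \mathbf{r}_B' \times \mathbf{r}_B \right)' = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x' & y' & z' \\ x & y & z \end{vmatrix}' = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x'' & y'' & z'' \\ x & y & z \end{vmatrix} \]
\[ = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -a(r_B)\,x + 2\omega y' \cos\phi & -a(r_B)\,y - 2\omega x' \cos\phi - 2\omega z' \sin(\phi) & -a(r_B)\,z + 2\omega y' \sin\phi \\ x & y & z \end{vmatrix} \]
Then the kth component of this cross product equals
\[ \omega \cos(\phi)\left( y^2 + x^2 \right)' + 2\omega x z' \sin(\phi). \]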
The first term will be negative because it is assumed p (t) is the location of low pressure
causing y 2 +x2 to be a decreasing function. If it is assumed there is not a substantial motion
in the k direction, so that z is fairly constant and the last term can be neglected, then the
kth component of $\left( \mathbf{r}_B' \times \mathbf{r}_B \right)'$ is negative provided ϕ ∈ (0, π/2) and positive if ϕ ∈ (π/2, π).
Beginning with a point at rest, this implies r′B × rB = 0 initially and then the above implies
its kth component is negative in the upper hemisphere when ϕ < π/2 and positive in the
lower hemisphere when ϕ > π/2. Using the right hand and the geometric definition of the
cross product, this shows clockwise rotation in the lower hemisphere and counter clockwise
rotation in the upper hemisphere.
Note also that as ϕ gets close to π/2 near the equator, the above reasoning tends to
break down because cos (ϕ) becomes close to zero. Therefore, the motion towards the low
pressure has to be more pronounced in comparison with the motion in the k direction in
order to draw this conclusion.
2.9 Exercises
1. Show the map T : Rn → Rm defined by T (x) = Ax where A is an m × n matrix and
x is an n × 1 column vector is a linear transformation.
2. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of π/3.
3. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of π/4.
4. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of −π/3.
5. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 2π/3.
6. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of π/12. Hint: Note that π/12 = π/3 − π/4.
7. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 2π/3 and then reflects across the x axis.
8. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of π/3 and then reflects across the x axis.
9. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of π/4 and then reflects across the x axis.
10. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of π/6 and then reflects across the x axis followed by a reflection across the
y axis.
11. Find the matrix for the linear transformation which reflects every vector in R2 across
the x axis and then rotates every vector through an angle of π/4.
12. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of π/4 and next reflects every vector across the x axis. Compare with the
above problem.
13. Find the matrix for the linear transformation which reflects every vector in R2 across
the x axis and then rotates every vector through an angle of π/6.
14. Find the matrix for the linear transformation which reflects every vector in R2 across
the y axis and then rotates every vector through an angle of π/6.
15. Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 5π/12. Hint: Note that 5π/12 = 2π/3 − π/4.
16. Find the matrix for proju (v) where $\mathbf{u} = (1, -2, 3)^T$.
17. Find the matrix for proju (v) where $\mathbf{u} = (1, 5, 3)^T$.
18. Find the matrix for proju (v) where $\mathbf{u} = (1, 0, 3)^T$.
19. Give an example of a 2 × 2 matrix A which has all its entries nonzero and satisfies
A2 = A. A matrix which satisfies A2 = A is called idempotent.
20. Let A be an m × n matrix and let B be an n × m matrix where n < m. Show that
AB cannot have an inverse.
21. Find ker (A) for
\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 2 \\ 1 & 4 & 4 & 3 & 3 \\ 0 & 2 & 1 & 1 & 2 \end{pmatrix}. \]
Recall ker (A) is just the set of solutions to Ax = 0.
22. If A is a linear transformation, and Axp = b, show that the general solution to the
equation Ax = b is of the form xp + y where y ∈ ker (A). By this I mean to show that
whenever Az = b there exists y ∈ ker (A) such that xp + y = z. For the definition of
ker (A) see Problem 21.
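23. Using Problem 21, find the general solution to the following linear system.
\[ \begin{pmatrix} 1 & 2 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 2 \\ 1 & 4 & 4 & 3 & 3 \\ 0 & 2 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 11 \\ 7 \\ 18 \\ 7 \end{pmatrix} \]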
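24. Using Problem 21, find the general solution to the following linear system.
\[ \begin{pmatrix} 1 & 2 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 2 \\ 1 & 4 & 4 & 3 & 3 \\ 0 & 2 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 6 \\ 7 \\ 13 \\ 7 \end{pmatrix} \]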
25. Show that the function Tu defined by Tu (v) ≡ v − proju (v) is also a linear transformation.
26. If $\mathbf{u} = (1, 2, 3)^T$, as in Example 2.4.5 and Tu is given in the above problem, find the
matrix Au which satisfies Au x = Tu (x).
27. Suppose V is a subspace of Fn and T : V → Fp is a nonzero linear transformation.
Show that there exists a basis for Im (T ) ≡ T (V )
{T v1 , · · · , T vm }
and that in this situation,
{v1 , · · · , vm }
is linearly independent.
28. ↑In the situation of Problem 27 where V is a subspace of Fn , show that there exists
{z1 , · · · , zr } a basis for ker (T ) . (Recall Theorem 2.6.12. Since ker (T ) is a subspace,
it has a basis.) Now for an arbitrary T v ∈ T (V ) , explain why
T v = a1 T v1 + · · · + am T vm
and why this implies
v − (a1 v1 + · · · + am vm ) ∈ ker (T ) .
Then explain why V = span (v1 , · · · , vm , z1 , · · · , zr ) .
29. ↑In the situation of the above problem, show {v1 , · · · , vm , z1 , · · · , zr } is a basis for V
and therefore, dim (V ) = dim (ker (T )) + dim (T (V )) .
30. ↑Let A be a linear transformation from V to W and let B be a linear transformation
from W to U where V, W, U are all subspaces of some Fp . Explain why
A (ker (BA)) ⊆ ker (B) , ker (A) ⊆ ker (BA) .
[Figure: A maps ker(BA), which contains ker(A), onto A(ker(BA)), which is contained in ker(B).]
31. ↑Let {x1 , · · · , xn } be a basis of ker (A) and let {Ay1 , · · · , Aym } be a basis of A (ker (BA)).
Let z ∈ ker (BA) . Explain why
Az ∈ span {Ay1 , · · · , Aym }
and why there exist scalars ai such that
A (z − (a1 y1 + · · · + am ym )) = 0
and why it follows z − (a1 y1 + · · · + am ym ) ∈ span {x1 , · · · , xn }. Now explain why
ker (BA) ⊆ span {x1 , · · · , xn , y1 , · · · , ym }
and so
dim (ker (BA)) ≤ dim (ker (B)) + dim (ker (A)) .
This important inequality is due to Sylvester. Show that equality holds if and only if
A(ker BA) = ker(B).
32. Generalize the result of the previous problem to any finite product of linear mappings.
33. If W ⊆ V for W, V two subspaces of Fn and if dim (W ) = dim (V ) , show W = V .
34. Let V be a subspace of Fn and let V1 , · · · , Vm be subspaces, each contained in V . Then
V = V1 ⊕ · · · ⊕ Vm
(2.37)
if every v ∈ V can be written in a unique way in the form
v = v1 + · · · + vm
where each vi ∈ Vi . This is called a direct sum. If this uniqueness condition does not
hold, then one writes
V = V1 + · · · + Vm
and this symbol means all vectors of the form
v1 + · · · + vm , vj ∈ Vj for each j.
Show 2.37 is equivalent to saying that if
0 = v1 + · · · + vm , vj ∈ Vj for each j,
then each vj = 0. Next show that in the situation of 2.37, if $\beta_i = \left\{ \mathbf{u}^i_1, \cdots, \mathbf{u}^i_{m_i} \right\}$ is a
basis for Vi , then {β 1 , · · · , β m } is a basis for V .
35. ↑Suppose you have finitely many linear mappings L1 , L2 , · · · , Lm which map V to V
where V is a subspace of Fn and suppose they commute. That is, Li Lj = Lj Li for all
i, j. Also suppose Lk is one to one on ker (Lj ) whenever j ̸= k. Letting P denote the
product of these linear transformations, P = L1 L2 · · · Lm , first show
ker (L1 ) + · · · + ker (Lm ) ⊆ ker (P )
Next show Lj : ker (Li ) → ker (Li ) . Then show
ker (L1 ) + · · · + ker (Lm ) = ker (L1 ) ⊕ · · · ⊕ ker (Lm ) .
Using Sylvester’s theorem, and the result of Problem 33, show
ker (P ) = ker (L1 ) ⊕ · · · ⊕ ker (Lm )
Hint: By Sylvester’s theorem and the above problem,
\[ \dim(\ker(P)) \le \sum_{i} \dim(\ker(L_i)) = \dim\left( \ker(L_1) \oplus \cdots \oplus \ker(L_m) \right) \le \dim(\ker(P)) \]
Now consider Problem 33.
36. Let M (Fn , Fn ) denote the set of all n × n matrices having entries in F. With the usual
operations of matrix addition and scalar multiplications, explain why M (Fn , Fn ) can
be considered as $\mathbb{F}^{n^2}$. Give a basis for M (Fn , Fn ) . If A ∈ M (Fn , Fn ) , explain why
there exists a monic (leading coefficient equals 1) polynomial of the form
\[ \lambda^k + a_{k-1}\lambda^{k-1} + \cdots + a_1 \lambda + a_0 \]
such that
\[ A^k + a_{k-1}A^{k-1} + \cdots + a_1 A + a_0 I = 0. \]
The minimal polynomial of A is the polynomial like the above, for which p (A) = 0
which has smallest degree. I will discuss the uniqueness of this polynomial later. Hint:
Consider the matrices $I, A, A^2, \cdots, A^{n^2}$. There are $n^2 + 1$ of these matrices. Can they
be linearly independent? Now consider all polynomials and pick one of smallest degree
and then divide by the leading coefficient.
37. ↑Suppose the field of scalars is $\mathbb{C}$ and $A$ is an $n \times n$ matrix. From the preceding
problem, and the fundamental theorem of algebra, this minimal polynomial factors
$$(\lambda - \lambda_1)^{r_1} (\lambda - \lambda_2)^{r_2} \cdots (\lambda - \lambda_k)^{r_k}$$
where $r_j$ is the algebraic multiplicity of $\lambda_j$, and the $\lambda_j$ are distinct. Thus
$$(A - \lambda_1 I)^{r_1} (A - \lambda_2 I)^{r_2} \cdots (A - \lambda_k I)^{r_k} = 0$$
and so, letting $P = (A - \lambda_1 I)^{r_1} (A - \lambda_2 I)^{r_2} \cdots (A - \lambda_k I)^{r_k}$ and $L_j = (A - \lambda_j I)^{r_j}$,
apply the result of Problem 35 to verify that
$$\mathbb{C}^n = \ker(L_1) \oplus \cdots \oplus \ker(L_k)$$
and that $A : \ker(L_j) \to \ker(L_j)$. In this context, $\ker(L_j)$ is called the generalized
eigenspace for $\lambda_j$. You need to verify the conditions of the result of this problem hold.
38. In the context of Problem 37, show there exists a nonzero vector $x$ such that
$$(A - \lambda_j I) x = 0.$$
This is called an eigenvector and the $\lambda_j$ is called an eigenvalue. Hint: There must exist
a vector $y$ such that
$$(A - \lambda_1 I)^{r_1} (A - \lambda_2 I)^{r_2} \cdots (A - \lambda_j I)^{r_j - 1} \cdots (A - \lambda_k I)^{r_k} y = z \neq 0.$$
Why? Now what happens if you do $(A - \lambda_j I)$ to $z$?
39. Suppose $Q(t)$ is an orthogonal matrix. This means $Q(t)$ is a real $n \times n$ matrix which
satisfies
$$Q(t) Q(t)^T = I.$$
Suppose also the entries of $Q(t)$ are differentiable. Show $(Q^T)' = -Q^T Q' Q^T$.
40. Remember the Coriolis force was $2\Omega \times v_B$ where $\Omega$ was a particular vector which
came from the matrix $Q(t)$ as described above. Show that
$$Q(t) = \begin{pmatrix} i(t) \cdot i(t_0) & j(t) \cdot i(t_0) & k(t) \cdot i(t_0) \\ i(t) \cdot j(t_0) & j(t) \cdot j(t_0) & k(t) \cdot j(t_0) \\ i(t) \cdot k(t_0) & j(t) \cdot k(t_0) & k(t) \cdot k(t_0) \end{pmatrix}.$$
There will be no Coriolis force exactly when $\Omega = 0$ which corresponds to $Q'(t) = 0$.
When will $Q'(t) = 0$?
41. An illustration used in many beginning physics books is that of firing a rifle horizontally and dropping an identical bullet from the same height above the perfectly
flat ground followed by an assertion that the two bullets will hit the ground at exactly the same time. Is this true on the rotating earth assuming the experiment
takes place over a large perfectly flat field so the curvature of the earth is not an
issue? Explain. What other irregularities will occur? Recall the Coriolis acceleration
is 2ω [(−y ′ cos ϕ) i+ (x′ cos ϕ + z ′ sin ϕ) j − (y ′ sin ϕ) k] where k points away from the
center of the earth, j points East, and i points South.
Chapter 3
Determinants
3.1
Basic Techniques And Properties
Let $A$ be an $n \times n$ matrix. The determinant of $A$, denoted as $\det(A)$, is a number. If the
matrix is a $2 \times 2$ matrix, this number is very easy to find.
Definition 3.1.1 Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then
$$\det(A) \equiv ad - cb.$$
The determinant is also often denoted by enclosing the matrix with two vertical lines. Thus
$$\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$
Example 3.1.2 Find $\det \begin{pmatrix} 2 & 4 \\ -1 & 6 \end{pmatrix}$.
From the definition this is just $(2)(6) - (-1)(4) = 16$.
Assuming the determinant has been defined for k × k matrices for k ≤ n − 1, it is now
time to define it for n × n matrices.
Definition 3.1.3 Let $A = (a_{ij})$ be an $n \times n$ matrix. Then a new matrix called the cofactor
matrix, $\operatorname{cof}(A)$, is defined by $\operatorname{cof}(A) = (c_{ij})$ where to obtain $c_{ij}$ delete the $i$th row and the
$j$th column of $A$, take the determinant of the $(n-1) \times (n-1)$ matrix which results (this
is called the $ij$th minor of $A$), and then multiply this number by $(-1)^{i+j}$. To make the
formulas easier to remember, $\operatorname{cof}(A)_{ij}$ will denote the $ij$th entry of the cofactor matrix.
Now here is the definition of the determinant given recursively.
Theorem 3.1.4 Let $A$ be an $n \times n$ matrix where $n \geq 2$. Then
$$\det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij} \operatorname{cof}(A)_{ij}. \tag{3.1}$$
The first formula consists of expanding the determinant along the $i$th row and the second
expands the determinant along the $j$th column.
Note that for an n × n matrix, you will need n! terms to evaluate the determinant in this
way. If n = 10, this is 10! = 3,628,800 terms. This is a lot of terms.
In addition to the difficulties just discussed, why is the determinant well defined? Why
should you get the same thing when you expand along any row or column? I think you
should regard this claim that you always get the same answer by picking any row or column
with considerable skepticism. It is incredible and not at all obvious. However, it requires
a little effort to establish it. This is done in the section on the theory of the determinant
which follows.
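The recursive description in Theorem 3.1.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are lists of rows, indices are 0-based, the expansion is always taken along the first row, and the names `minor` and `det_laplace` are hypothetical helpers, not notation from the text.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det_laplace(a):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    if n == 2:
        # Definition 3.1.1: ad - cb
        return a[0][0] * a[1][1] - a[1][0] * a[0][1]
    # (-1) ** j equals the cofactor sign (-1)^(i+j) for the first row
    return sum((-1) ** j * a[0][j] * det_laplace(minor(a, 0, j))
               for j in range(n))


print(det_laplace([[2, 4], [-1, 6]]))  # Example 3.1.2: 16
```

The recursion makes the n! growth visible: an n × n call spawns n calls of size n − 1.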
Notwithstanding the difficulties involved in using the method of Laplace expansion,
certain types of matrices are very easy to deal with.
Definition 3.1.5 A matrix $M$ is upper triangular if $M_{ij} = 0$ whenever $i > j$. Thus such
a matrix equals zero below the main diagonal, the entries of the form $M_{ii}$, as shown.
$$\begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix}$$
A lower triangular matrix is defined similarly as a matrix for which all entries above the
main diagonal are equal to zero.
You should verify the following using the above theorem on Laplace expansion.
Corollary 3.1.6 Let M be an upper (lower) triangular matrix. Then det (M ) is obtained
by taking the product of the entries on the main diagonal.
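Proof: The corollary is true if the matrix is $1 \times 1$. Suppose it is $n \times n$ and upper triangular. Then the
matrix is of the form
$$\begin{pmatrix} m_{11} & a \\ 0 & M_1 \end{pmatrix}$$
where $M_1$ is $(n-1) \times (n-1)$ and upper triangular. Then expanding along the first column, you get $m_{11} \det(M_1) + 0$.
Then use the induction hypothesis to obtain that $\det(M_1) = \prod_{i=2}^{n} m_{ii}$. The lower triangular case is similar, expanding along the first row.
Example 3.1.7 Let
$$A = \begin{pmatrix} 1 & 2 & 3 & 77 \\ 0 & 2 & 6 & 7 \\ 0 & 0 & 3 & 33.7 \\ 0 & 0 & 0 & -1 \end{pmatrix}$$
Find $\det(A)$.
From the above corollary, this is −6.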
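As a quick check of Corollary 3.1.6 on this example, here is a minimal Python sketch. The function name `det_triangular` is a hypothetical helper, and the code assumes the caller has already verified that the matrix is triangular.

```python
from math import prod


def det_triangular(m):
    """Product of the main-diagonal entries; valid only for triangular m."""
    return prod(m[i][i] for i in range(len(m)))


# The matrix of Example 3.1.7
A = [[1, 2, 3, 77],
     [0, 2, 6, 7],
     [0, 0, 3, 33.7],
     [0, 0, 0, -1]]

print(det_triangular(A))  # -6
```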
There are many properties satisfied by determinants. Some of the most important are
listed in the following theorem.
Theorem 3.1.8 If two rows or two columns in an $n \times n$ matrix $A$ are switched, the determinant of the resulting matrix equals $(-1)$ times the determinant of the original matrix. If
$A$ is an $n \times n$ matrix in which two rows are equal or two columns are equal then $\det(A) = 0$.
Suppose the $i$th row of $A$ equals $(xa_1 + yb_1, \cdots, xa_n + yb_n)$. Then
$$\det(A) = x \det(A_1) + y \det(A_2)$$
where the $i$th row of $A_1$ is $(a_1, \cdots, a_n)$ and the $i$th row of $A_2$ is $(b_1, \cdots, b_n)$, all other rows
of $A_1$ and $A_2$ coinciding with those of $A$. In other words, det is a linear function of each
row of $A$. The same is true with the word “row” replaced with the word “column”. In addition
to this, if $A$ and $B$ are $n \times n$ matrices, then
$$\det(AB) = \det(A) \det(B),$$
and if $A$ is an $n \times n$ matrix, then
$$\det(A) = \det(A^T).$$
This theorem implies the following corollary which gives a way to find determinants. As
I pointed out above, the method of Laplace expansion will not be practical for any matrix
of large size.
Corollary 3.1.9 Let $A$ be an $n \times n$ matrix and let $B$ be the matrix obtained by replacing the
$i$th row (column) of $A$ with the sum of the $i$th row (column) added to a multiple of another
row (column). Then $\det(A) = \det(B)$. If $B$ is the matrix obtained from $A$ by replacing the
$i$th row (column) of $A$ by $a$ times the $i$th row (column) then $a \det(A) = \det(B)$.
Here is an example which shows how to use this corollary to find a determinant.
Example 3.1.10 Find the determinant of the matrix
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 1 & 2 & 3 \\ 4 & 5 & 4 & 3 \\ 2 & 2 & -4 & 5 \end{pmatrix}$$
Replace the second row by (−5) times the first row added to it. Then replace the third
row by (−4) times the first row added to it. Finally, replace the fourth row by (−2) times
the first row added to it. This yields the matrix
$$B = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -9 & -13 & -17 \\ 0 & -3 & -8 & -13 \\ 0 & -2 & -10 & -3 \end{pmatrix}$$
and from the above corollary, it has the same determinant as $A$. Now using the corollary
some more, $\det(B) = \left(\frac{-1}{3}\right) \det(C)$ where
$$C = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 11 & 22 \\ 0 & -3 & -8 & -13 \\ 0 & 6 & 30 & 9 \end{pmatrix}.$$
The second row was replaced by (−3) times the third row added to the second row and then
the last row was multiplied by (−3). Now replace the last row with 2 times the third added
to it and then switch the third and second rows. Then $\det(C) = -\det(D)$ where
$$D = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -3 & -8 & -13 \\ 0 & 0 & 11 & 22 \\ 0 & 0 & 14 & -17 \end{pmatrix}$$
You could do more row operations or you could note that this can be easily expanded along
the first column followed by expanding the 3 × 3 matrix which results along its first column.
Thus
$$\det(D) = 1 (-3) \begin{vmatrix} 11 & 22 \\ 14 & -17 \end{vmatrix} = 1485$$
and so $\det(C) = -1485$ and $\det(A) = \det(B) = \left(\frac{-1}{3}\right)(-1485) = 495$.
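The row-operation strategy of this example can be automated. The following Python sketch is one possible arrangement, not the exact sequence of operations used above: it reduces the matrix to upper triangular form using only row switches (each multiplies the determinant by −1, Theorem 3.1.8) and additions of a multiple of one row to another (which leave it unchanged, Corollary 3.1.9). The name `det_by_elimination` and the use of exact `Fraction` arithmetic are assumptions made for the illustration.

```python
from fractions import Fraction


def det_by_elimination(a):
    """Determinant via row operations, tracking the effect of each on det."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    sign = 1
    for c in range(n):
        # find a row at or below the diagonal with a nonzero entry in column c
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)          # no pivot available: determinant is 0
        if p != c:
            m[c], m[p] = m[p], m[c]     # a row switch multiplies det by -1
            sign = -sign
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            # adding a multiple of one row to another leaves det unchanged
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]               # triangular: product of the diagonal
    return result


A = [[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 4, 3], [2, 2, -4, 5]]
print(det_by_elimination(A))  # 495
```

This uses on the order of n³ arithmetic operations, in contrast to the n! terms of Laplace expansion.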
The theorem about expanding a matrix along any row or column also provides a way to
give a formula for the inverse of a matrix. Recall the definition of the inverse of a matrix
in Definition 2.1.22 on Page 51. The following theorem gives a formula for the inverse of a
matrix. It is proved in the next section.
Theorem 3.1.11 $A^{-1}$ exists if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$, then $A^{-1} = \left(a_{ij}^{-1}\right)$
where
$$a_{ij}^{-1} = \det(A)^{-1} \operatorname{cof}(A)_{ji}$$
for $\operatorname{cof}(A)_{ij}$ the $ij$th cofactor of $A$.
Theorem 3.1.11 says that to find the inverse, take the transpose of the cofactor matrix
and divide by the determinant. The transpose of the cofactor matrix is called the adjugate
or sometimes the classical adjoint of the matrix A. It is an abomination to call it the adjoint
although you do sometimes see it referred to in this way. In words, $A^{-1}$ is equal to one over
the determinant of A times the adjugate matrix of A.
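Example 3.1.12 Find the inverse of the matrix
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 0 & 1 \\ 1 & 2 & 1 \end{pmatrix}$$
First find the determinant of this matrix. This is seen to be 12. The cofactor matrix of
$A$ is
$$\begin{pmatrix} -2 & -2 & 6 \\ 4 & -2 & 0 \\ 2 & 8 & -6 \end{pmatrix}.$$
Each entry of $A$ was replaced by its cofactor. Therefore, from the above theorem, the inverse
of $A$ should equal
$$\frac{1}{12} \begin{pmatrix} -2 & -2 & 6 \\ 4 & -2 & 0 \\ 2 & 8 & -6 \end{pmatrix}^T = \begin{pmatrix} -\frac{1}{6} & \frac{1}{3} & \frac{1}{6} \\ -\frac{1}{6} & -\frac{1}{6} & \frac{2}{3} \\ \frac{1}{2} & 0 & -\frac{1}{2} \end{pmatrix}.$$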
This way of finding inverses is especially useful in the case where it is desired to find the
inverse of a matrix whose entries are functions.
Example 3.1.13 Suppose
$$A(t) = \begin{pmatrix} e^t & 0 & 0 \\ 0 & \cos t & \sin t \\ 0 & -\sin t & \cos t \end{pmatrix}$$
Find $A(t)^{-1}$.
First note $\det(A(t)) = e^t$. A routine computation using the above theorem shows that
this inverse is
$$\frac{1}{e^t} \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{pmatrix}^T = \begin{pmatrix} e^{-t} & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}.$$
This formula for the inverse also implies a famous procedure known as Cramer’s rule.
Cramer’s rule gives a formula for the solutions, x, to a system of equations, Ax = y.
In case you are solving a system of equations, $Ax = y$ for $x$, it follows that if $A^{-1}$ exists,
$$x = \left(A^{-1} A\right) x = A^{-1} (Ax) = A^{-1} y$$
thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$ given
above. Using this formula,
$$x_i = \sum_{j=1}^{n} a_{ij}^{-1} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{cof}(A)_{ji} y_j.$$
By the formula for the expansion of a determinant along a column,
$$x_i = \frac{1}{\det(A)} \det \begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix},$$
where here the $i$th column of $A$ is replaced with the column vector $(y_1, \cdots, y_n)^T$, and the
determinant of this modified matrix is taken and divided by $\det(A)$. This formula is known
as Cramer’s rule.
Procedure 3.1.14 Suppose $A$ is an $n \times n$ matrix and it is desired to solve the system
$Ax = y$, $y = (y_1, \cdots, y_n)^T$ for $x = (x_1, \cdots, x_n)^T$. Then Cramer’s rule says
$$x_i = \frac{\det A_i}{\det A}$$
where $A_i$ is obtained from $A$ by replacing the $i$th column of $A$ with the column $(y_1, \cdots, y_n)^T$.
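Procedure 3.1.14 translates almost line for line into code. The sketch below is an illustration only: `cramer_solve` is a hypothetical name, the entries are assumed to be integers, and the right-hand side $y = (6, 4, 4)^T$ is an example chosen here (not taken from the text) so that the solution of the system with the matrix of Example 3.1.12 is $(1, 1, 1)^T$.

```python
from fractions import Fraction


def det(a):
    """Laplace expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum(
        (-1) ** j * a[0][j]
        * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


def cramer_solve(a, y):
    """Solve a x = y by Cramer's rule; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule needs a nonzero determinant")
    x = []
    for i in range(len(a)):
        # A_i: the matrix a with its i-th column replaced by y
        a_i = [row[:i] + [y[r]] + row[i + 1:] for r, row in enumerate(a)]
        x.append(Fraction(det(a_i), d))
    return x


A = [[1, 2, 3], [3, 0, 1], [1, 2, 1]]
y = [6, 4, 4]
print(cramer_solve(A, y))
```

Each unknown costs one determinant, so this is useful mainly for small systems or for systems whose entries are symbolic.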
The following theorem is of fundamental importance and ties together many of the ideas
presented above. It is proved in the next section.
Theorem 3.1.15 Let A be an n × n matrix. Then the following are equivalent.
1. A is one to one.
2. A is onto.
3. det (A) ≠ 0.
3.2
Exercises
1. Find the determinants of the following matrices.
(a) $\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 2 \\ 0 & 9 & 8 \end{pmatrix}$ (The answer is 31.)
(b) $\begin{pmatrix} 4 & 3 & 2 \\ 1 & 7 & 8 \\ 3 & -9 & 3 \end{pmatrix}$ (The answer is 375.)
(c) $\begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 3 & 2 & 3 \\ 4 & 1 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{pmatrix}$ (The answer is −2.)
2. If $A^{-1}$ exists, what is the relationship between $\det(A)$ and $\det\left(A^{-1}\right)$? Explain your
answer.
3. Let A be an n × n matrix where n is odd. Suppose also that A is skew symmetric.
This means $A^T = -A$. Show that $\det(A) = 0$.
4. Is it true that det (A + B) = det (A) + det (B)? If this is so, explain why it is so and
if it is not so, give a counter example.
5. Let A be an r × r matrix and suppose there are r − 1 rows (columns) such that all rows
(columns) are linear combinations of these r − 1 rows (columns). Show det (A) = 0.
6. Show $\det(aA) = a^n \det(A)$ where here $A$ is an $n \times n$ matrix and $a$ is a scalar.
7. Suppose $A$ is an upper triangular matrix. Show that $A^{-1}$ exists if and only if all
elements of the main diagonal are nonzero. Is it true that $A^{-1}$ will also be upper
triangular? Explain. Is everything the same for lower triangular matrices?
8. Let $A$ and $B$ be two $n \times n$ matrices. $A \sim B$ ($A$ is similar to $B$) means there exists an
invertible matrix $S$ such that $A = S^{-1} B S$. Show that if $A \sim B$, then $B \sim A$. Show
also that $A \sim A$ and that if $A \sim B$ and $B \sim C$, then $A \sim C$.
9. In the context of Problem 8 show that if A ∼ B, then det (A) = det (B) .
10. Let A be an n × n matrix and let x be a nonzero vector such that Ax = λx for some
scalar, λ. When this occurs, the vector, x is called an eigenvector and the scalar, λ
is called an eigenvalue. It turns out that not every number is an eigenvalue. Only
certain ones are. Why? Hint: Show that if Ax = λx, then (λI − A) x = 0. Explain
why this shows that (λI − A) is not one to one and not onto. Now use Theorem 3.1.15
to argue det (λI − A) = 0. What sort of equation is this? How many solutions does it
have?
11. Suppose $\det(\lambda I - A) = 0$. Show using Theorem 3.1.15 there exists $x \neq 0$ such that
$(\lambda I - A) x = 0$.
12. Let $F(t) = \det \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}$. Verify
$$F'(t) = \det \begin{pmatrix} a'(t) & b'(t) \\ c(t) & d(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) \\ c'(t) & d'(t) \end{pmatrix}.$$
Now suppose
$$F(t) = \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d(t) & e(t) & f(t) \\ g(t) & h(t) & i(t) \end{pmatrix}.$$
Use Laplace expansion and the first part to verify
$$F'(t) = \det \begin{pmatrix} a'(t) & b'(t) & c'(t) \\ d(t) & e(t) & f(t) \\ g(t) & h(t) & i(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d'(t) & e'(t) & f'(t) \\ g(t) & h(t) & i(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d(t) & e(t) & f(t) \\ g'(t) & h'(t) & i'(t) \end{pmatrix}.$$
Conjecture a general result valid for $n \times n$ matrices and explain why it will be true.
Can a similar thing be done with the columns?
13. Use the formula for the inverse in terms of the cofactor matrix to find the inverse of
the matrix
$$A = \begin{pmatrix} e^t & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & e^t \cos t - e^t \sin t & e^t \cos t + e^t \sin t \end{pmatrix}.$$
14. Let $A$ be an $r \times r$ matrix and let $B$ be an $m \times m$ matrix such that $r + m = n$. Consider
the following $n \times n$ block matrix
$$C = \begin{pmatrix} A & 0 \\ D & B \end{pmatrix}$$
where the $D$ is an $m \times r$ matrix, and the 0 is an $r \times m$ matrix. Letting $I_k$ denote the
$k \times k$ identity matrix, tell why
$$C = \begin{pmatrix} A & 0 \\ D & I_m \end{pmatrix} \begin{pmatrix} I_r & 0 \\ 0 & B \end{pmatrix}.$$
Now explain why $\det(C) = \det(A) \det(B)$. Hint: Part of this will require an explanation of why
$$\det \begin{pmatrix} A & 0 \\ D & I_m \end{pmatrix} = \det(A).$$
See Corollary 3.1.9.
15. Suppose Q is an orthogonal matrix. This means Q is a real n×n matrix which satisfies
QQT = I
Find the possible values for det (Q).
16. Suppose $Q(t)$ is an orthogonal matrix. This means $Q(t)$ is a real $n \times n$ matrix which
satisfies
$$Q(t) Q(t)^T = I$$
Suppose $Q(t)$ is continuous for $t \in [a, b]$, some interval. Also suppose $\det(Q(t_0)) = 1$ for some $t_0 \in [a, b]$.
Show that it follows $\det(Q(t)) = 1$ for all $t \in [a, b]$.
3.3
The Mathematical Theory Of Determinants
It is easiest to give a different definition of the determinant which is clearly well defined
and then prove the earlier one in terms of Laplace expansion. Let (i1 , · · · , in ) be an ordered
list of numbers from {1, · · · , n} . This means the order is important so (1, 2, 3) and (2, 1, 3)
are different. There will be some repetition between this section and the earlier section on
determinants. The main purpose is to give all the missing proofs. Two books which give
a good introduction to determinants are Apostol [1] and Rudin [23]. A recent book which
also has a good introduction is Baker [3].
3.3.1
The Function sgn
The following Lemma will be essential in the definition of the determinant.
Lemma 3.3.1 There exists a function, $\operatorname{sgn}_n$, which maps each ordered list of numbers from
$\{1, \cdots, n\}$ to one of the three numbers, 0, 1, or −1 which also has the following properties.
$$\operatorname{sgn}_n(1, \cdots, n) = 1 \tag{3.2}$$
$$\operatorname{sgn}_n(i_1, \cdots, p, \cdots, q, \cdots, i_n) = -\operatorname{sgn}_n(i_1, \cdots, q, \cdots, p, \cdots, i_n) \tag{3.3}$$
In words, the second property states that if two of the numbers are switched, the value of the
function is multiplied by −1. Also, in the case where $n > 1$ and $\{i_1, \cdots, i_n\} = \{1, \cdots, n\}$ so
that every number from $\{1, \cdots, n\}$ appears in the ordered list $(i_1, \cdots, i_n)$,
$$\operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, n, i_{\theta+1}, \cdots, i_n) \equiv (-1)^{n-\theta} \operatorname{sgn}_{n-1}(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_n) \tag{3.4}$$
where $n = i_\theta$ in the ordered list $(i_1, \cdots, i_n)$.
Proof: Define $\operatorname{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$ and 0 if $x = 0$. If $n = 1$, there is only
one list and it is just the number 1. Thus one can define $\operatorname{sgn}_1(1) \equiv 1$. For the general case
where $n > 1$, simply define
$$\operatorname{sgn}_n(i_1, \cdots, i_n) \equiv \operatorname{sign}\left(\prod_{r<s} (i_s - i_r)\right)$$
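This delivers either −1, 1, or 0 by definition. What about the other claims? Suppose you
switch $i_p$ with $i_q$ where $p < q$ so two numbers in the ordered list $(i_1, \cdots, i_n)$ are switched.
Denote the new ordered list of numbers as $(j_1, \cdots, j_n)$. Thus $j_p = i_q$ and $j_q = i_p$ and if
$r \notin \{p, q\}$, $j_r = i_r$. See the following illustration, in which each entry is written above its position.
$$\begin{array}{cccccccc} i_1 & i_2 & \cdots & i_p & \cdots & i_q & \cdots & i_n \\ 1 & 2 & \cdots & p & \cdots & q & \cdots & n \end{array}$$
$$\begin{array}{cccccccc} i_1 & i_2 & \cdots & i_q & \cdots & i_p & \cdots & i_n \\ 1 & 2 & \cdots & p & \cdots & q & \cdots & n \end{array}$$
$$\begin{array}{cccccccc} j_1 & j_2 & \cdots & j_p & \cdots & j_q & \cdots & j_n \\ 1 & 2 & \cdots & p & \cdots & q & \cdots & n \end{array}$$
Then
$$\operatorname{sgn}_n(j_1, \cdots, j_n) \equiv \operatorname{sign}\left(\prod_{r<s} (j_s - j_r)\right)$$
$$= \operatorname{sign}\left( \overbrace{(i_p - i_q)}^{\text{both } p, q} \; \overbrace{\prod_{p<j<q} (i_j - i_q) \prod_{p<j<q} (i_p - i_j)}^{\text{one of } p, q} \; \overbrace{\prod_{r<s,\; r,s \notin \{p,q\}} (i_s - i_r)}^{\text{neither } p \text{ nor } q} \right)$$
(Factors pairing $p$ or $q$ with an index $j < p$ or $j > q$ are the same before and after the switch and are not displayed.)
The last product consists of the product of terms which were in $\prod_{r<s} (i_s - i_r)$ while the
two products in the middle both introduce $q - p - 1$ minus signs. Thus their product is
positive. The first factor is of opposite sign to the $i_q - i_p$ which occurred in $\operatorname{sgn}_n(i_1, \cdots, i_n)$.
Therefore, this switch introduced a minus sign and
$$\operatorname{sgn}_n(j_1, \cdots, j_n) = -\operatorname{sgn}_n(i_1, \cdots, i_n)$$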
Now consider the last claim. In computing $\operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, n, i_{\theta+1}, \cdots, i_n)$ there will
be the product of $n - \theta$ negative terms
$$(i_{\theta+1} - n) \cdots (i_n - n)$$
and the other terms in the product for computing $\operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, n, i_{\theta+1}, \cdots, i_n)$ are
those which are required to compute $\operatorname{sgn}_{n-1}(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_n)$ multiplied by terms
of the form $(n - i_j)$ which are nonnegative. It follows that
$$\operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, n, i_{\theta+1}, \cdots, i_n) = (-1)^{n-\theta} \operatorname{sgn}_{n-1}(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_n)$$
It is obvious that if there are repeats in the list the function gives 0.
Lemma 3.3.2 Every ordered list of distinct numbers from $\{1, 2, \cdots, n\}$ can be obtained
from every other ordered list of distinct numbers by a finite number of switches. Also, $\operatorname{sgn}_n$
is unique.
Proof: This is obvious if $n = 1$ or 2. Suppose then that it is true for sets of $n - 1$
elements. Take two ordered lists of numbers, $P_1$, $P_2$. Make one switch in both to place $n$ at
the end. Call the result $P_1^n$ and $P_2^n$. Then using induction, there are finitely many switches
in $P_1^n$ so that it will coincide with $P_2^n$. Now switch the $n$ in what results to where it was in
$P_2$.
To see $\operatorname{sgn}_n$ is unique, if there exist two functions, $f$ and $g$ both satisfying 3.2 and 3.3,
you could start with $f(1, \cdots, n) = g(1, \cdots, n) = 1$ and applying the same sequence of
switches, eventually arrive at $f(i_1, \cdots, i_n) = g(i_1, \cdots, i_n)$. If any numbers are repeated,
then 3.3 gives both functions are equal to zero for that ordered list.
Definition 3.3.3 When you have an ordered list of distinct numbers from $\{1, 2, \cdots, n\}$,
say
$$(i_1, \cdots, i_n),$$
this ordered list is called a permutation. The symbol for all such permutations is $S_n$. The
number $\operatorname{sgn}_n(i_1, \cdots, i_n)$ is called the sign of the permutation.
A permutation can also be considered as a function from the set
{1, 2, · · · , n} to {1, 2, · · · , n}
as follows. Let f (k) = ik . Permutations are of fundamental importance in certain areas
of math. For example, it was by considering permutations that Galois was able to give a
criterion for solution of polynomial equations by radicals, but this is a different direction
than what is being attempted here.
In what follows sgn will often be used rather than sgnn because the context supplies the
appropriate n.
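The formula used in the proof of Lemma 3.3.1 to define $\operatorname{sgn}_n$ can be evaluated directly. The following Python sketch is a literal transcription under the assumption that an ordered list is given as a tuple of integers; the names `sign` and `sgn` are chosen to mirror the text.

```python
from itertools import combinations


def sign(x):
    """1 if x > 0, -1 if x < 0, 0 if x = 0."""
    return (x > 0) - (x < 0)


def sgn(lst):
    """sgn_n(i_1, ..., i_n): sign of the product of (i_s - i_r) over r < s."""
    p = 1
    # combinations(range(n), 2) yields exactly the index pairs (r, s) with r < s
    for r, s in combinations(range(len(lst)), 2):
        p *= sign(lst[s] - lst[r])
    return p


print(sgn((1, 2, 3)), sgn((2, 1, 3)), sgn((1, 1, 3)))  # 1 -1 0
```

Multiplying the signs of the factors rather than the factors themselves keeps the intermediate values small; the result is the same because the sign of a product is the product of the signs.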
3.3.2
The Definition Of The Determinant
Definition 3.3.4 Let $f$ be a real valued function which has the set of ordered lists of numbers
from $\{1, \cdots, n\}$ as its domain. Define
$$\sum_{(k_1, \cdots, k_n)} f(k_1 \cdots k_n)$$
to be the sum of all the $f(k_1 \cdots k_n)$ for all possible choices of ordered lists $(k_1, \cdots, k_n)$ of
numbers of $\{1, \cdots, n\}$. For example,
$$\sum_{(k_1, k_2)} f(k_1, k_2) = f(1, 2) + f(2, 1) + f(1, 1) + f(2, 2).$$
Definition 3.3.5 Let $(a_{ij}) = A$ denote an $n \times n$ matrix. The determinant of $A$, denoted
by $\det(A)$, is defined by
$$\det(A) \equiv \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, a_{1k_1} \cdots a_{nk_n}$$
where the sum is taken over all ordered lists of numbers from $\{1, \cdots, n\}$. Note it suffices to
take the sum over only those ordered lists in which there are no repeats because if there are,
$\operatorname{sgn}(k_1, \cdots, k_n) = 0$ and so that term contributes 0 to the sum.
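Definition 3.3.5 can be evaluated as written, which is a useful sanity check even though it is far too slow for large matrices. The Python sketch below sums over all ordered lists, including those with repeats, exactly as the definition does; `det_by_definition` is a hypothetical name, and indices are 0-based, which does not change any of the differences $k_s - k_r$ and hence does not change sgn.

```python
from itertools import combinations, product


def sgn(lst):
    """Sign of the product of (lst[s] - lst[r]) over r < s; 0 if a repeat occurs."""
    p = 1
    for r, s in combinations(range(len(lst)), 2):
        d = lst[s] - lst[r]
        p *= (d > 0) - (d < 0)
    return p


def det_by_definition(a):
    n = len(a)
    total = 0
    # every ordered list (k_1, ..., k_n) of numbers from {0, ..., n-1}
    for ks in product(range(n), repeat=n):
        s = sgn(ks)
        if s == 0:
            continue                 # lists with repeats contribute nothing
        term = s
        for row, k in enumerate(ks):
            term *= a[row][k]        # the factor a_{row, k_row}
        total += term
    return total


print(det_by_definition([[2, 4], [-1, 6]]))  # 16
```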
Let $A$ be an $n \times n$ matrix $A = (a_{ij})$ and let $(r_1, \cdots, r_n)$ denote an ordered list of $n$
numbers from $\{1, \cdots, n\}$. Let $A(r_1, \cdots, r_n)$ denote the matrix whose $k$th row is the $r_k$th row
of the matrix $A$. Thus
$$\det(A(r_1, \cdots, r_n)) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n} \tag{3.5}$$
and $A(1, \cdots, n) = A$.
Proposition 3.3.6 Let $(r_1, \cdots, r_n)$ be an ordered list of numbers from $\{1, \cdots, n\}$. Then
$$\operatorname{sgn}(r_1, \cdots, r_n) \det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n} \tag{3.6}$$
$$= \det(A(r_1, \cdots, r_n)). \tag{3.7}$$
Proof: Let $(1, \cdots, n) = (1, \cdots, r, \cdots, s, \cdots, n)$ so $r < s$.
$$\det(A(1, \cdots, r, \cdots, s, \cdots, n)) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_r, \cdots, k_s, \cdots, k_n) \, a_{1k_1} \cdots a_{rk_r} \cdots a_{sk_s} \cdots a_{nk_n}, \tag{3.8}$$
and renaming the variables, calling $k_s$, $k_r$ and $k_r$, $k_s$, this equals
$$= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_s, \cdots, k_r, \cdots, k_n) \, a_{1k_1} \cdots a_{rk_s} \cdots a_{sk_r} \cdots a_{nk_n}$$
$$= \sum_{(k_1, \cdots, k_n)} -\operatorname{sgn}\Big(k_1, \cdots, \overbrace{k_r, \cdots, k_s}^{\text{these got switched}}, \cdots, k_n\Big) \, a_{1k_1} \cdots a_{sk_r} \cdots a_{rk_s} \cdots a_{nk_n}$$
$$= -\det(A(1, \cdots, s, \cdots, r, \cdots, n)). \tag{3.9}$$
Consequently,
$$\det(A(1, \cdots, s, \cdots, r, \cdots, n)) = -\det(A(1, \cdots, r, \cdots, s, \cdots, n)) = -\det(A)$$
Now letting $A(1, \cdots, s, \cdots, r, \cdots, n)$ play the role of $A$, and continuing in this way, switching pairs of numbers,
$$\det(A(r_1, \cdots, r_n)) = (-1)^p \det(A)$$
where it took $p$ switches to obtain $(r_1, \cdots, r_n)$ from $(1, \cdots, n)$. By Lemma 3.3.1, this implies
$$\det(A(r_1, \cdots, r_n)) = (-1)^p \det(A) = \operatorname{sgn}(r_1, \cdots, r_n) \det(A)$$
and proves the proposition in the case when there are no repeated numbers in the ordered
list, $(r_1, \cdots, r_n)$. However, if there is a repeat, say the $r$th row equals the $s$th row, then the
reasoning of 3.8 - 3.9 shows that $\det(A(r_1, \cdots, r_n)) = 0$ and also $\operatorname{sgn}(r_1, \cdots, r_n) = 0$ so the
formula holds in this case also.
Observation 3.3.7 There are n! ordered lists of distinct numbers from {1, · · · , n} .
To see this, consider n slots placed in order. There are n choices for the first slot. For
each of these choices, there are n − 1 choices for the second. Thus there are n (n − 1) ways
to fill the first two slots. Then for each of these ways there are n − 2 choices left for the third
slot. Continuing this way, there are n! ordered lists of distinct numbers from {1, · · · , n} as
stated in the observation.
3.3.3
A Symmetric Definition
With the above, it is possible to give a more symmetric description of the determinant from
which it will follow that $\det(A) = \det(A^T)$.
Corollary 3.3.8 The following formula for $\det(A)$ is valid.
$$\det(A) = \frac{1}{n!} \cdot \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n}. \tag{3.10}$$
And also $\det(A^T) = \det(A)$ where $A^T$ is the transpose of $A$. (Recall that for $A^T = (a^T_{ij})$,
$a^T_{ij} = a_{ji}$.)
Proof: From Proposition 3.3.6, if the $r_i$ are distinct,
$$\det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n}.$$
Summing over all ordered lists, $(r_1, \cdots, r_n)$ where the $r_i$ are distinct, (If the $r_i$ are not
distinct, $\operatorname{sgn}(r_1, \cdots, r_n) = 0$ and so there is no contribution to the sum.)
$$n! \det(A) = \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n}.$$
This proves the corollary since the formula gives the same number for $A$ as it does for $A^T$.
Corollary 3.3.9 If two rows or two columns in an $n \times n$ matrix $A$ are switched, the
determinant of the resulting matrix equals $(-1)$ times the determinant of the original matrix.
If $A$ is an $n \times n$ matrix in which two rows are equal or two columns are equal then $\det(A) = 0$.
Suppose the $i$th row of $A$ equals $(xa_1 + yb_1, \cdots, xa_n + yb_n)$. Then
$$\det(A) = x \det(A_1) + y \det(A_2)$$
where the $i$th row of $A_1$ is $(a_1, \cdots, a_n)$ and the $i$th row of $A_2$ is $(b_1, \cdots, b_n)$, all other rows
of $A_1$ and $A_2$ coinciding with those of $A$. In other words, det is a linear function of each
row of $A$. The same is true with the word “row” replaced with the word “column”.
Proof: By Proposition 3.3.6 when two rows are switched, the determinant of the resulting matrix is $(-1)$ times the determinant of the original matrix. By Corollary 3.3.8 the
same holds for columns because the columns of the matrix equal the rows of the transposed
matrix. Thus if $A_1$ is the matrix obtained from $A$ by switching two columns,
$$\det(A) = \det(A^T) = -\det(A_1^T) = -\det(A_1).$$
If $A$ has two equal columns or two equal rows, then switching them results in the same
matrix. Therefore, $\det(A) = -\det(A)$ and so $\det(A) = 0$.
It remains to verify the last assertion.
$$\det(A) \equiv \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, a_{1k_1} \cdots (x a_{k_i} + y b_{k_i}) \cdots a_{nk_n}$$
$$= x \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, a_{1k_1} \cdots a_{k_i} \cdots a_{nk_n} + y \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, a_{1k_1} \cdots b_{k_i} \cdots a_{nk_n} \equiv x \det(A_1) + y \det(A_2).$$
The same is true of columns because $\det(A^T) = \det(A)$ and the rows of $A^T$ are the columns
of $A$.
3.3.4
Basic Properties Of The Determinant
Definition 3.3.10 A vector, $w$, is a linear combination of the vectors $\{v_1, \cdots, v_r\}$ if there
exist scalars $c_1, \cdots, c_r$ such that $w = \sum_{k=1}^{r} c_k v_k$. This is the same as saying
$$w \in \operatorname{span}(v_1, \cdots, v_r).$$
The following corollary is also of great use.
Corollary 3.3.11 Suppose $A$ is an $n \times n$ matrix and some column (row) is a linear combination of $r$ other columns (rows). Then $\det(A) = 0$.
Proof: Let $A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$ be the columns of $A$ and suppose the condition that
one column is a linear combination of $r$ of the others is satisfied. Then by using Corollary
3.3.9 you may rearrange the columns to have the $n$th column a linear combination of the
first $r$ columns. Thus $a_n = \sum_{k=1}^{r} c_k a_k$ and so
$$\det(A) = \det \begin{pmatrix} a_1 & \cdots & a_r & \cdots & a_{n-1} & \sum_{k=1}^{r} c_k a_k \end{pmatrix}.$$
By Corollary 3.3.9
$$\det(A) = \sum_{k=1}^{r} c_k \det \begin{pmatrix} a_1 & \cdots & a_r & \cdots & a_{n-1} & a_k \end{pmatrix} = 0,$$
since each matrix in the sum has two equal columns.
The case for rows follows from the fact that $\det(A) = \det(A^T)$.
Recall the following definition of matrix multiplication.
Definition 3.3.12 If $A$ and $B$ are $n \times n$ matrices, $A = (a_{ij})$ and $B = (b_{ij})$, $AB = (c_{ij})$
where $c_{ij} \equiv \sum_{k=1}^{n} a_{ik} b_{kj}$.
One of the most important rules about determinants is that the determinant of a product
equals the product of the determinants.
Theorem 3.3.13 Let A and B be n × n matrices. Then
det (AB) = det (A) det (B) .
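A numerical spot check of the product rule is easy to set up. In the Python sketch below, the matrix `A` is the one from Example 3.1.12, while `B` is an arbitrary integer matrix chosen here for illustration; `det` and `matmul` are hypothetical helper names. A check on one pair of matrices is of course not a proof.

```python
def det(a):
    """Laplace expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum(
        (-1) ** j * a[0][j]
        * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


def matmul(x, y):
    """Entry (i, j) of the product is the sum over k of x[i][k] * y[k][j]."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


A = [[1, 2, 3], [3, 0, 1], [1, 2, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]

print(det(matmul(A, B)), det(A) * det(B))  # 60 60
```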
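Proof: Let $c_{ij}$ be the $ij$th entry of $AB$. Then by Proposition 3.3.6,
$$\det(AB) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, c_{1k_1} \cdots c_{nk_n}$$
$$= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \left(\sum_{r_1} a_{1r_1} b_{r_1 k_1}\right) \cdots \left(\sum_{r_n} a_{nr_n} b_{r_n k_n}\right)$$
$$= \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \, b_{r_1 k_1} \cdots b_{r_n k_n} \left(a_{1r_1} \cdots a_{nr_n}\right)$$
$$= \sum_{(r_1, \cdots, r_n)} \operatorname{sgn}(r_1 \cdots r_n) \, a_{1r_1} \cdots a_{nr_n} \det(B) = \det(A) \det(B).$$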
The Binet Cauchy formula is a generalization of the theorem which says the determinant
of a product is the product of the determinants. The situation is illustrated in the following
picture where A, B are matrices.
[Figure: the matrices B and A]
Theorem 3.3.14 Let $A$ be an $n \times m$ matrix with $n \geq m$ and let $B$ be an $m \times n$ matrix. Also
let $A_i$, $i = 1, \cdots, C(n, m)$,
be the $m \times m$ submatrices of $A$ which are obtained by deleting $n - m$ rows and let $B_i$ be the
$m \times m$ submatrices of $B$ which are obtained by deleting corresponding $n - m$ columns. Then
$$\det(BA) = \sum_{k=1}^{C(n,m)} \det(B_k) \det(A_k)$$
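The statement can be checked on a small case before reading the proof. In the Python sketch below, the 3 × 2 matrix `A` and the 2 × 3 matrix `B` are arbitrary examples chosen here, and `binet_cauchy`, `det`, and `matmul` are hypothetical helper names. Each index set `idx` selects the rows of `A` that are kept and the same columns of `B`.

```python
from itertools import combinations


def det(a):
    """Laplace expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum(
        (-1) ** j * a[0][j]
        * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def binet_cauchy(b, a):
    """Sum of det(B_k) det(A_k) over all m-element index sets; b is m x n, a is n x m."""
    m, n = len(b), len(a)
    total = 0
    for idx in combinations(range(n), m):
        a_k = [a[r] for r in idx]                    # keep rows idx of A
        b_k = [[row[c] for c in idx] for row in b]   # keep the same columns of B
        total += det(b_k) * det(a_k)
    return total


A = [[1, 2], [3, 0], [1, 2]]   # n x m with n = 3, m = 2
B = [[1, 0, 2], [0, 1, 1]]     # m x n

print(det(matmul(B, A)), binet_cauchy(B, A))  # -18 -18
```

When n = m there is only one index set and the formula reduces to det(BA) = det(B) det(A).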
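Proof: This follows from a computation. By Corollary 3.3.8 on Page 95, $\det(BA) =$
$$\frac{1}{m!} \sum_{(i_1 \cdots i_m)} \sum_{(j_1 \cdots j_m)} \operatorname{sgn}(i_1 \cdots i_m) \operatorname{sgn}(j_1 \cdots j_m) \, (BA)_{i_1 j_1} (BA)_{i_2 j_2} \cdots (BA)_{i_m j_m}$$
$$= \frac{1}{m!} \sum_{(i_1 \cdots i_m)} \sum_{(j_1 \cdots j_m)} \operatorname{sgn}(i_1 \cdots i_m) \operatorname{sgn}(j_1 \cdots j_m) \cdot \sum_{r_1=1}^{n} B_{i_1 r_1} A_{r_1 j_1} \sum_{r_2=1}^{n} B_{i_2 r_2} A_{r_2 j_2} \cdots \sum_{r_m=1}^{n} B_{i_m r_m} A_{r_m j_m}$$
Now denote by $I_k$ one of the subsets of $\{1, \cdots, n\}$ having $m$ elements. Thus there are $C(n, m)$ of these.
(Choices of $(r_1, \cdots, r_m)$ with a repeated index contribute zero, since the sum over $(j_1 \cdots j_m)$ is then the determinant of a matrix with two equal rows.)
$$= \sum_{k=1}^{C(n,m)} \sum_{\{r_1, \cdots, r_m\} = I_k} \frac{1}{m!} \sum_{(i_1 \cdots i_m)} \sum_{(j_1 \cdots j_m)} \operatorname{sgn}(i_1 \cdots i_m) \operatorname{sgn}(j_1 \cdots j_m) \cdot B_{i_1 r_1} A_{r_1 j_1} B_{i_2 r_2} A_{r_2 j_2} \cdots B_{i_m r_m} A_{r_m j_m}$$
$$= \sum_{k=1}^{C(n,m)} \sum_{\{r_1, \cdots, r_m\} = I_k} \frac{1}{m!} \sum_{(i_1 \cdots i_m)} \operatorname{sgn}(i_1 \cdots i_m) \, B_{i_1 r_1} B_{i_2 r_2} \cdots B_{i_m r_m} \cdot \sum_{(j_1 \cdots j_m)} \operatorname{sgn}(j_1 \cdots j_m) \, A_{r_1 j_1} A_{r_2 j_2} \cdots A_{r_m j_m}$$
$$= \sum_{k=1}^{C(n,m)} \sum_{\{r_1, \cdots, r_m\} = I_k} \frac{1}{m!} \operatorname{sgn}(r_1 \cdots r_m)^2 \det(B_k) \det(A_k)$$
$$= \sum_{k=1}^{C(n,m)} \det(B_k) \det(A_k)$$
since there are $m!$ ways of arranging the indices $\{r_1, \cdots, r_m\}$.
3.3.5
Expansion Using Cofactors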
Lemma 3.3.15 Suppose a matrix is of the form
$$M = \begin{pmatrix} A & * \\ 0 & a \end{pmatrix} \tag{3.11}$$
or
$$M = \begin{pmatrix} A & 0 \\ * & a \end{pmatrix} \tag{3.12}$$
where $a$ is a number and $A$ is an $(n-1) \times (n-1)$ matrix and $*$ denotes either a column
or a row having length $n - 1$ and the 0 denotes either a column or a row of length $n - 1$
consisting entirely of zeros. Then $\det(M) = a \det(A)$.
Proof: Denote $M$ by $(m_{ij})$. Thus in the first case, $m_{nn} = a$ and $m_{ni} = 0$ if $i \neq n$ while
in the second case, $m_{nn} = a$ and $m_{in} = 0$ if $i \neq n$. From the definition of the determinant,
$$\det(M) \equiv \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}_n(k_1, \cdots, k_n) \, m_{1k_1} \cdots m_{nk_n}$$
Letting $\theta$ denote the position of $n$ in the ordered list, $(k_1, \cdots, k_n)$ then using the earlier
conventions used to prove Lemma 3.3.1, $\det(M)$ equals
$$\sum_{(k_1, \cdots, k_n)} (-1)^{n-\theta} \operatorname{sgn}_{n-1}\left(k_1, \cdots, k_{\theta-1}, k_{\theta+1}, \cdots, k_n\right) m_{1k_1} \cdots m_{nk_n}$$
where in the shortened list $k_{\theta+1}$ occupies position $\theta$ and $k_n$ occupies position $n - 1$.
Now suppose 3.12. Then if $k_n \neq n$, the number $n$ occupies some position $\theta < n$, so the term contains the factor $m_{\theta n} = 0$ and equals
zero. Therefore, the only terms which survive are those for which $\theta = n$ or in other words,
those for which $k_n = n$. Therefore, the above expression reduces to
$$a \sum_{(k_1, \cdots, k_{n-1})} \operatorname{sgn}_{n-1}(k_1, \cdots, k_{n-1}) \, m_{1k_1} \cdots m_{(n-1)k_{n-1}} = a \det(A).$$
To get the assertion in the situation of 3.11 use Corollary 3.3.8 and 3.12 to write
$$\det(M) = \det(M^T) = \det \begin{pmatrix} A^T & 0 \\ * & a \end{pmatrix} = a \det(A^T) = a \det(A).$$
In terms of the theory of determinants, arguably the most important idea is that of
Laplace expansion along a row or a column. This will follow from the above definition of a
determinant.
Definition 3.3.16 Let $A = (a_{ij})$ be an $n \times n$ matrix. Then a new matrix called the cofactor
matrix $\operatorname{cof}(A)$ is defined by $\operatorname{cof}(A) = (c_{ij})$ where to obtain $c_{ij}$ delete the $i$th row and the
$j$th column of $A$, take the determinant of the $(n-1) \times (n-1)$ matrix which results (this
is called the $ij$th minor of $A$), and then multiply this number by $(-1)^{i+j}$. To make the
formulas easier to remember, $\operatorname{cof}(A)_{ij}$ will denote the $ij$th entry of the cofactor matrix.
The following is the main result. Earlier this was given as a definition and the outrageous
totally unjustified assertion was made that the same number would be obtained by expanding
the determinant along any row or column. The following theorem proves this assertion.
Theorem 3.3.17 Let A be an n × n matrix where n ≥ 2. Then
\[
\det(A) = \sum_{j=1}^{n} a_{ij}\operatorname{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij}\operatorname{cof}(A)_{ij}. \tag{3.13}
\]
The first formula consists of expanding the determinant along the ith row and the second
expands the determinant along the j th column.
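Here is a small sketch in Python which checks the claim of the theorem on one integer matrix by expanding along every row and every column. The helper names (minor, det, expand_along_row, expand_along_column) are made up for this illustration and indices are 0-based, which does not change the parity of i + j.

```python
def minor(a, i, j):
    """Return a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def expand_along_row(a, i):
    """Sum over j of a[i][j] times the ij cofactor."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for j in range(len(a)))


def expand_along_column(a, j):
    """Sum over i of a[i][j] times the ij cofactor."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for i in range(len(a)))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
row_values = [expand_along_row(A, i) for i in range(3)]
column_values = [expand_along_column(A, j) for j in range(3)]
print(row_values, column_values)  # all six expansions give -3
```

This only illustrates the statement for one matrix; it is of course not a substitute for the proof.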
Proof: Let (ai1 , · · · , ain ) be the ith row of A. Let Bj be the matrix obtained from A by
leaving every row the same except the ith row which in Bj equals (0, · · · , 0, aij , 0, · · · , 0) .
Then by Corollary 3.3.9,
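\[
\det(A) = \sum_{j=1}^{n} \det(B_j).
\]
For example if
\[
A = \begin{pmatrix} a & b & c \\ d & e & f \\ h & i & j \end{pmatrix}
\]
and i = 2, then
\[
B_1 = \begin{pmatrix} a & b & c \\ d & 0 & 0 \\ h & i & j \end{pmatrix},\quad
B_2 = \begin{pmatrix} a & b & c \\ 0 & e & 0 \\ h & i & j \end{pmatrix},\quad
B_3 = \begin{pmatrix} a & b & c \\ 0 & 0 & f \\ h & i & j \end{pmatrix}.
\]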
Denote by $A^{ij}$ the (n − 1) × (n − 1) matrix obtained by deleting the $i$th row and the $j$th
column of A. Thus $\operatorname{cof}(A)_{ij} \equiv (-1)^{i+j}\det\left(A^{ij}\right)$. At this point, recall that from Proposition
3.3.6, when two rows or two columns in a matrix M are switched, this results in multiplying
the determinant of the old matrix by −1 to get the determinant of the new matrix. Therefore,
by Lemma 3.3.15,
\[
\begin{aligned}
\det(B_j) &= (-1)^{n-j}(-1)^{n-i}\det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} \\
&= (-1)^{i+j}\det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} = a_{ij}\operatorname{cof}(A)_{ij}.
\end{aligned}
\]
Therefore,
\[
\det(A) = \sum_{j=1}^{n} a_{ij}\operatorname{cof}(A)_{ij}
\]
which is the formula for expanding det (A) along the ith row. Also,
\[
\det(A) = \det\left(A^T\right) = \sum_{j=1}^{n} a^T_{ij}\operatorname{cof}\left(A^T\right)_{ij} = \sum_{j=1}^{n} a_{ji}\operatorname{cof}(A)_{ji}
\]
which is the formula for expanding det (A) along the $i$th column.
3.3.6 A Formula For The Inverse
Note that this gives an easy way to write a formula for the inverse of an n ×n matrix. Recall
the definition of the inverse of a matrix in Definition 2.1.22 on Page 51.
Theorem 3.3.18 $A^{-1}$ exists if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$, then $A^{-1} = \left(a^{-1}_{ij}\right)$
where
\[
a^{-1}_{ij} = \det(A)^{-1}\operatorname{cof}(A)_{ji}
\]
for $\operatorname{cof}(A)_{ij}$ the $ij$th cofactor of A.
Proof: By Theorem 3.3.17 and letting $(a_{ir}) = A$, if $\det(A) \neq 0$,
\[
\sum_{i=1}^{n} a_{ir}\operatorname{cof}(A)_{ir}\det(A)^{-1} = \det(A)\det(A)^{-1} = 1.
\]
Now in the matrix A, replace the $k$th column with the $r$th column and then expand along
the $k$th column. This yields for $k \neq r$,
\[
\sum_{i=1}^{n} a_{ir}\operatorname{cof}(A)_{ik}\det(A)^{-1} = 0
\]
because there are two equal columns by Corollary 3.3.9. Summarizing,
\[
\sum_{i=1}^{n} a_{ir}\operatorname{cof}(A)_{ik}\det(A)^{-1} = \delta_{rk}.
\]
Using the other formula in Theorem 3.3.17, and similar reasoning,
\[
\sum_{j=1}^{n} a_{rj}\operatorname{cof}(A)_{kj}\det(A)^{-1} = \delta_{rk}.
\]
This proves that if $\det(A) \neq 0$, then $A^{-1}$ exists with $A^{-1} = \left(a^{-1}_{ij}\right)$, where
\[
a^{-1}_{ij} = \operatorname{cof}(A)_{ji}\det(A)^{-1}.
\]
Now suppose $A^{-1}$ exists. Then by Theorem 3.3.13,
\[
1 = \det(I) = \det\left(AA^{-1}\right) = \det(A)\det\left(A^{-1}\right)
\]
so $\det(A) \neq 0$.
The next corollary points out that if an n × n matrix A has a right or a left inverse, then
it has an inverse.
Corollary 3.3.19 Let A be an n × n matrix and suppose there exists an n × n matrix B
such that BA = I. Then A−1 exists and A−1 = B. Also, if there exists C an n × n matrix
such that AC = I, then A−1 exists and A−1 = C.
Proof: Since BA = I, Theorem 3.3.13 implies
\[
\det B \det A = 1
\]
and so $\det A \neq 0$. Therefore from Theorem 3.3.18, $A^{-1}$ exists. Therefore,
\[
A^{-1} = (BA)A^{-1} = B\left(AA^{-1}\right) = BI = B.
\]
The case where AC = I is handled similarly.
The conclusion of this corollary is that left inverses, right inverses and inverses are all
the same in the context of n × n matrices.
Theorem 3.3.18 says that to find the inverse, take the transpose of the cofactor matrix
and divide by the determinant. The transpose of the cofactor matrix is called the adjugate
or sometimes the classical adjoint of the matrix A. It is an abomination to call it the adjoint
although you do sometimes see it referred to in this way. In words, A−1 is equal to one over
the determinant of A times the adjugate matrix of A.
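The recipe just described can be sketched in a few lines of Python using exact rational arithmetic. The function names below are hypothetical, chosen only for this illustration, and the sketch assumes an integer matrix with n ≥ 2. It is meant to show the formula at work, not to suggest this is an efficient way to invert a large matrix.

```python
from fractions import Fraction


def minor(a, i, j):
    """Return a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def cofactor_matrix(a):
    """Matrix whose ij entry is (-1)^(i+j) times the ij minor of a."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)]
            for i in range(n)]


def inverse(a):
    """Transpose of the cofactor matrix divided by the determinant."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0, so A has no inverse")
    cof = cofactor_matrix(a)
    n = len(a)
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
A_inv = inverse(A)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(A, A_inv) == identity, matmul(A_inv, A) == identity)
```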
In case you are solving a system of equations, Ax = y for x, it follows that if $A^{-1}$ exists,
\[
x = \left(A^{-1}A\right)x = A^{-1}(Ax) = A^{-1}y
\]
thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$ given
above. Using this formula,
\[
x_i = \sum_{j=1}^{n} a^{-1}_{ij}y_j = \sum_{j=1}^{n} \frac{1}{\det(A)}\operatorname{cof}(A)_{ji}\,y_j.
\]
By the formula for the expansion of a determinant along a column,
\[
x_i = \frac{1}{\det(A)}\det\begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix},
\]
where here the $i$th column of A is replaced with the column vector $(y_1,\cdots,y_n)^T$, and the
determinant of this modified matrix is taken and divided by det (A). This formula is known
as Cramer’s rule.
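As an illustration, here is a minimal Python sketch of Cramer's rule for a small system with integer entries. The names det, minor and cramer are assumptions made for this example only.

```python
from fractions import Fraction


def minor(a, i, j):
    """Return a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def cramer(a, y):
    """Solve a x = y by replacing column i of a with y, for each i."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0, Cramer's rule does not apply")
    x = []
    for i in range(len(a)):
        replaced = [row[:i] + [y[k]] + row[i + 1:] for k, row in enumerate(a)]
        x.append(Fraction(det(replaced), d))
    return x


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
y = [8, -11, -3]
x = cramer(A, y)
print(x)  # the solution is x = (2, 3, -1)
```

One can confirm the answer by substituting back into Ax = y. For large systems row operations are far cheaper, so this is of theoretical rather than practical interest.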
Definition 3.3.20 A matrix M is upper triangular if $M_{ij} = 0$ whenever i > j. Thus such
a matrix equals zero below the main diagonal, the entries of the form $M_{ii}$, as shown.
\[
\begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix}
\]
A lower triangular matrix is defined similarly as a matrix for which all entries above the
main diagonal are equal to zero.
With this definition, here is a simple corollary of Theorem 3.3.17.
Corollary 3.3.21 Let M be an upper (lower) triangular matrix. Then det (M ) is obtained
by taking the product of the entries on the main diagonal.
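A quick numerical check of this corollary, written as a sketch in Python with made-up helper names, compares the cofactor expansion with the product of the diagonal entries for one upper triangular matrix and its transpose, which is lower triangular.

```python
def minor(a, i, j):
    """Return a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def diagonal_product(a):
    """Product of the entries on the main diagonal."""
    result = 1
    for i in range(len(a)):
        result *= a[i][i]
    return result


upper = [[2, 5, 7], [0, 3, 1], [0, 0, 4]]
lower = [list(col) for col in zip(*upper)]  # transpose of upper
print(det(upper), det(lower), diagonal_product(upper))  # 24 24 24
```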
3.3.7 Rank Of A Matrix
Definition 3.3.22 A submatrix of a matrix A is the rectangular array of numbers obtained
by deleting some rows and columns of A. Let A be an m × n matrix. The determinant
rank of the matrix equals r where r is the largest number such that some r × r submatrix
of A has a non zero determinant. The row rank is defined to be the dimension of the span
of the rows. The column rank is defined to be the dimension of the span of the columns.
Theorem 3.3.23 If A, an m × n matrix has determinant rank r, then there exist r rows of
the matrix such that every other row is a linear combination of these r rows.
Proof: Suppose the determinant rank of A = (aij ) equals r. Thus some r × r submatrix
has non zero determinant and there is no larger square submatrix which has non zero
determinant. Suppose such a submatrix is determined by the r columns whose indices are
j1 < · · · < jr
and the r rows whose indices are
i1 < · · · < ir
I want to show that every row is a linear combination of these rows. Consider the $l$th row
and let p be an index between 1 and n. Form the following (r + 1) × (r + 1) matrix
\[
\begin{pmatrix}
a_{i_1j_1} & \cdots & a_{i_1j_r} & a_{i_1p} \\
\vdots & & \vdots & \vdots \\
a_{i_rj_1} & \cdots & a_{i_rj_r} & a_{i_rp} \\
a_{lj_1} & \cdots & a_{lj_r} & a_{lp}
\end{pmatrix}
\]
Of course you can assume $l \notin \{i_1,\cdots,i_r\}$ because there is nothing to prove if the $l$th
row is one of the chosen ones. The above matrix has determinant 0. This is because if
$p \notin \{j_1,\cdots,j_r\}$ then the above would be a submatrix of A which is too large to have non
zero determinant. On the other hand, if $p \in \{j_1,\cdots,j_r\}$ then the above matrix has two
columns which are equal so its determinant is still 0.
Expand the determinant of the above matrix along the last column. Let $C_k$ denote the
cofactor associated with the entry $a_{i_kp}$. This is not dependent on the choice of p. Remember,
you delete the column and the row the entry is in and take the determinant of what is left
and multiply by −1 raised to an appropriate power. Let C denote the cofactor associated
with $a_{lp}$. This is given to be nonzero, it being the determinant of the matrix
\[
\begin{pmatrix}
a_{i_1j_1} & \cdots & a_{i_1j_r} \\
\vdots & & \vdots \\
a_{i_rj_1} & \cdots & a_{i_rj_r}
\end{pmatrix}
\]
Thus
\[
0 = a_{lp}C + \sum_{k=1}^{r} C_k a_{i_kp}
\]
which implies
\[
a_{lp} = \sum_{k=1}^{r} \frac{-C_k}{C}\, a_{i_kp} \equiv \sum_{k=1}^{r} m_k a_{i_kp}
\]
Since this is true for every p and since $m_k$ does not depend on p, this has shown the $l$th row
is a linear combination of the $i_1, i_2, \cdots, i_r$ rows.
Corollary 3.3.24 The determinant rank equals the row rank.
Proof: From Theorem 3.3.23, every row is in the span of r rows where r is the determinant
rank. Therefore, the row rank (dimension of the span of the rows) is no larger than
the determinant rank. Could the row rank be smaller than the determinant rank? If so,
it follows from Theorem 3.3.23 that there exist p rows for p < r ≡ determinant rank, such
that the span of these p rows equals the row space. But then you could consider the r × r
submatrix which determines the determinant rank and it would follow that each of these
rows would be in the span of the restrictions of the p rows just mentioned. By Theorem
2.6.4, the exchange theorem, the rows of this submatrix would not be linearly independent
and so some row is a linear combination of the others. By Corollary 3.3.11 the determinant
would be 0, a contradiction.
Corollary 3.3.25 If A has determinant rank r, then there exist r columns of the matrix
such that every other column is a linear combination of these r columns. Also the column
rank equals the determinant rank.
Proof: This follows from the above by considering AT . The rows of AT are the columns
of A and the determinant rank of AT and A are the same. Therefore, from Corollary 3.3.24,
column rank of A = row rank of AT = determinant rank of AT = determinant rank of A.
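To see the three notions of rank agree on a concrete example, here is a hedged sketch in Python. The determinant rank is found by brute force over all square submatrices, exactly as in Definition 3.3.22. The row rank is computed by elimination, which assumes the fact (taken up with row operations in the next chapter) that row operations do not change the dimension of the span of the rows. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def determinant_rank(a):
    """Largest r such that some r x r submatrix has nonzero determinant."""
    m, n = len(a), len(a[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if det([[a[i][j] for j in cols] for i in rows]) != 0:
                    return r
    return 0


def row_rank(a):
    """Number of pivots found by Gaussian elimination on the rows."""
    rows = [[Fraction(x) for x in row] for row in a]
    rank = 0
    for c in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][c] / rows[rank][c]
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


A = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
A_T = [list(col) for col in zip(*A)]
ranks = (determinant_rank(A), row_rank(A), row_rank(A_T))
print(ranks)  # determinant rank, row rank, column rank
```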
The following theorem is of fundamental importance and ties together many of the ideas
presented above.
Theorem 3.3.26 Let A be an n × n matrix. Then the following are equivalent.
1. det (A) = 0.
2. A, AT are not one to one.
3. A is not onto.
Proof: Suppose det (A) = 0. Then the determinant rank of A = r < n. Therefore,
there exist r columns such that every other column is a linear combination of these columns
by Corollary 3.3.25. In particular, it follows that for some m, the $m$th column is a linear
combination of all the others. Thus letting $A = \begin{pmatrix} a_1 & \cdots & a_m & \cdots & a_n \end{pmatrix}$ where the
columns are denoted by $a_i$, there exist scalars $\alpha_i$ such that
\[
a_m = \sum_{k \neq m} \alpha_k a_k.
\]
Now consider the column vector $x \equiv \begin{pmatrix} \alpha_1 & \cdots & -1 & \cdots & \alpha_n \end{pmatrix}^T$, which has −1 in the $m$th position. Then
\[
Ax = -a_m + \sum_{k \neq m} \alpha_k a_k = 0.
\]
Since also A0 = 0, it follows A is not one to one. Similarly, $A^T$ is not one to one by the
same argument applied to $A^T$. This verifies that 1.) implies 2.).
Now suppose 2.). Then since $A^T$ is not one to one, it follows there exists $x \neq 0$ such that
\[
A^Tx = 0.
\]
Taking the transpose of both sides yields
\[
x^TA = 0^T
\]
where the $0^T$ is a 1 × n matrix or row vector. Now if Ay = x, then
\[
|x|^2 = x^T(Ay) = \left(x^TA\right)y = 0y = 0
\]
contrary to $x \neq 0$. Consequently there can be no y such that Ay = x and so A is not onto.
This shows that 2.) implies 3.).
Finally, suppose 3.). If 1.) does not hold, then $\det(A) \neq 0$ but then from Theorem 3.3.18
$A^{-1}$ exists and so for every $y \in \mathbb{F}^n$ there exists a unique $x \in \mathbb{F}^n$ such that Ax = y. In fact
$x = A^{-1}y$. Thus A would be onto contrary to 3.). This shows 3.) implies 1.).
Corollary 3.3.27 Let A be an n × n matrix. Then the following are equivalent.
1. det(A) ̸= 0.
2. A and AT are one to one.
3. A is onto.
Proof: This follows immediately from the above theorem.
3.3.8 Summary Of Determinants
In all the following A, B are n × n matrices
1. det (A) is a number.
2. det (A) is linear in each row and in each column.
3. If you switch two rows or two columns, the determinant of the resulting matrix is −1
times the determinant of the unswitched matrix. (This and the previous one say
\[
(a_1 \cdots a_n) \to \det(a_1 \cdots a_n)
\]
is an alternating multilinear function or alternating tensor.)
4. $\det(e_1,\cdots,e_n) = 1$.
5. det (AB) = det (A) det (B)
6. det (A) can be expanded along any row or any column and the same result is obtained.
7. $\det(A) = \det\left(A^T\right)$
8. $A^{-1}$ exists if and only if $\det(A) \neq 0$ and in this case
\[
\left(A^{-1}\right)_{ij} = \frac{1}{\det(A)}\operatorname{cof}(A)_{ji} \tag{3.14}
\]
9. Determinant rank, row rank and column rank are all the same number for any m × n
matrix.
3.4 The Cayley Hamilton Theorem
Definition 3.4.1 Let A be an n × n matrix. The characteristic polynomial is defined as
\[
p_A(t) \equiv \det(tI - A)
\]
and the solutions to $p_A(t) = 0$ are called eigenvalues. For A a matrix and
$p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0$, denote by p (A) the matrix defined by
\[
p(A) \equiv A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I.
\]
The explanation for the last term is that $A^0$ is interpreted as I, the identity matrix.
The Cayley Hamilton theorem states that every matrix satisfies its characteristic equation,
that equation defined by $p_A(t) = 0$. It is one of the most important theorems in linear
algebra$^1$. The proof in this section is not the most general proof, but works well when the
field of scalars is R or C. The following lemma will help with its proof.
Lemma 3.4.2 Suppose for all |λ| large enough,
\[
A_0 + A_1\lambda + \cdots + A_m\lambda^m = 0,
\]
where the $A_i$ are n × n matrices. Then each $A_i = 0$.
Proof: Multiply by $\lambda^{-m}$ to obtain
\[
A_0\lambda^{-m} + A_1\lambda^{-m+1} + \cdots + A_{m-1}\lambda^{-1} + A_m = 0.
\]
Now let |λ| → ∞ to obtain $A_m = 0$. With this, multiply by λ to obtain
\[
A_0\lambda^{-m+1} + A_1\lambda^{-m+2} + \cdots + A_{m-1} = 0.
\]
Now let |λ| → ∞ to obtain $A_{m-1} = 0$. Continue multiplying by λ and letting |λ| → ∞ to
obtain that all the $A_i = 0$.
With the lemma, here is a simple corollary.
Corollary 3.4.3 Let $A_i$ and $B_i$ be n × n matrices and suppose
\[
A_0 + A_1\lambda + \cdots + A_m\lambda^m = B_0 + B_1\lambda + \cdots + B_m\lambda^m
\]
for all |λ| large enough. Then $A_i = B_i$ for all i. If $A_i = B_i$ for each $A_i, B_i$ then one can
substitute an n × n matrix M for λ and the identity will continue to hold.
Proof: Subtract and use the result of the lemma. The last claim is obvious by matching
terms.
With this preparation, here is a relatively easy proof of the Cayley Hamilton theorem.
Theorem 3.4.4 Let A be an n × n matrix and let p (λ) ≡ det (λI − A) be the characteristic
polynomial. Then p (A) = 0.
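Before the proof, it may help to see the statement verified on one matrix. The following Python sketch computes the coefficients of det (tI − A) by cofactor expansion with polynomial entries (a polynomial is stored as a list of coefficients, constant term first) and then evaluates p (A) by Horner's scheme. The representation and all function names are assumptions made for this illustration.

```python
def padd(p, q):
    """Add two polynomials given as coefficient lists, constant term first."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def pmul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def pdet(m):
    """Determinant of a matrix of polynomials, expanding along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = [0]
    for j in range(len(m)):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        term = pmul(m[0][j], pdet(sub))
        if j % 2:
            term = [-x for x in term]
        total = padd(total, term)
    return total


def char_poly(a):
    """Coefficients of det(tI - a), constant term first."""
    n = len(a)
    return pdet([[[-a[i][j], 1] if i == j else [-a[i][j]] for j in range(n)]
                 for i in range(n)])


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def poly_at_matrix(coeffs, a):
    """Evaluate the polynomial at the matrix a, with a^0 read as I."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, a)
        for i in range(n):
            result[i][i] += c
    return result


A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
coeffs = char_poly(A)
p_of_A = poly_at_matrix(coeffs, A)
print(coeffs, p_of_A)  # p_of_A is the 3 x 3 zero matrix
```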
Proof: Let C (λ) equal the transpose of the cofactor matrix of (λI − A) for |λ| large.
(If |λ| is large enough, then λ cannot be in the finite list of eigenvalues of A and so for such
λ, $(\lambda I - A)^{-1}$ exists.) Therefore, by Theorem 3.3.18
\[
C(\lambda) = p(\lambda)(\lambda I - A)^{-1}.
\]
Say
\[
p(\lambda) = a_0 + a_1\lambda + \cdots + \lambda^n
\]
Note that each entry in C (λ) is a polynomial in λ having degree no more than n − 1.
Therefore, collecting the terms,
\[
C(\lambda) = C_0 + C_1\lambda + \cdots + C_{n-1}\lambda^{n-1}
\]
for $C_j$ some n × n matrix. Then
\[
C(\lambda)(\lambda I - A) = \left(C_0 + C_1\lambda + \cdots + C_{n-1}\lambda^{n-1}\right)(\lambda I - A) = p(\lambda)I
\]
Then multiplying out the middle term, it follows that for all |λ| sufficiently large,
\[
\begin{aligned}
a_0I + a_1I\lambda + \cdots + I\lambda^n &= C_0\lambda + C_1\lambda^2 + \cdots + C_{n-1}\lambda^n - \left[C_0A + C_1A\lambda + \cdots + C_{n-1}A\lambda^{n-1}\right] \\
&= -C_0A + (C_0 - C_1A)\lambda + (C_1 - C_2A)\lambda^2 + \cdots + (C_{n-2} - C_{n-1}A)\lambda^{n-1} + C_{n-1}\lambda^n
\end{aligned}
\]
Then, using Corollary 3.4.3, one can replace λ on both sides with A. Then the right side is
seen to equal 0. Hence the left side, p (A) I is also equal to 0.
$^1$ A special case was first proved by Hamilton in 1853. The general case was announced by Cayley some
time later and a proof was given by Frobenius in 1878.
3.5 Block Multiplication Of Matrices
Consider the following problem
\[
\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} E & F \\ G & H \end{pmatrix}
\]
You know how to do this. You get
\[
\begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix}.
\]
Now what if instead of numbers, the entries A, B, C, D, E, F, G, H are matrices of a size such
that the multiplications and additions needed in the above formula all make sense? Would
the formula be true in this case? I will show below that this is true.
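Suppose A is a matrix of the form
\[
A = \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{r1} & \cdots & A_{rm} \end{pmatrix} \tag{3.15}
\]
where $A_{ij}$ is a $s_i \times p_j$ matrix where $s_i$ is constant for j = 1, · · · , m for each i = 1, · · · , r.
Such a matrix is called a block matrix, also a partitioned matrix. How do you get the
block $A_{ij}$? Here is how for A an m × n matrix:
\[
\overbrace{\begin{pmatrix} 0 & I_{s_i \times s_i} & 0 \end{pmatrix}}^{s_i \times m}\; A\; \overbrace{\begin{pmatrix} 0 \\ I_{p_j \times p_j} \\ 0 \end{pmatrix}}^{n \times p_j}. \tag{3.16}
\]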
In the block column matrix on the right, you need to have $c_j - 1$ rows of zeros above the
small $p_j \times p_j$ identity matrix where the columns of A involved in $A_{ij}$ are $c_j, \cdots, c_j + p_j - 1$
and in the block row matrix on the left, you need to have $r_i - 1$ columns of zeros to the left
of the $s_i \times s_i$ identity matrix where the rows of A involved in $A_{ij}$ are $r_i, \cdots, r_i + s_i - 1$. An
important observation to make is that the matrix on the right specifies columns to use in
the block and the one on the left specifies the rows used. Thus the block $A_{ij}$ in this case
is a matrix of size $s_i \times p_j$. There is no overlap between the blocks of A. Thus the
n × n identity matrix corresponding to multiplication on the right of A is of the form
\[
\begin{pmatrix} I_{p_1 \times p_1} & & 0 \\ & \ddots & \\ 0 & & I_{p_m \times p_m} \end{pmatrix}
\]
where these little identity matrices don’t overlap. A similar conclusion follows from
consideration of the matrices $I_{s_i \times s_i}$. Note that in 3.16 the matrix on the right is a block column
matrix for the above block diagonal matrix and the matrix on the left in 3.16 is a block row
matrix taken from a similar block diagonal matrix consisting of the $I_{s_i \times s_i}$.
Next consider the question of multiplication of two block matrices. Let B be a block
matrix of the form
\[
\begin{pmatrix} B_{11} & \cdots & B_{1p} \\ \vdots & \ddots & \vdots \\ B_{r1} & \cdots & B_{rp} \end{pmatrix} \tag{3.17}
\]
and A is a block matrix of the form
\[
\begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{p1} & \cdots & A_{pm} \end{pmatrix} \tag{3.18}
\]
and that for all i, j, it makes sense to multiply $B_{is}A_{sj}$ for all s ∈ {1, · · · , p}. (That is the
two matrices, $B_{is}$ and $A_{sj}$ are conformable.) and that for fixed ij, it follows $B_{is}A_{sj}$ is the
same size for each s so that it makes sense to write $\sum_s B_{is}A_{sj}$.
The following theorem says essentially that when you take the product of two matrices,
you can do it two ways. One way is to simply multiply them forming BA. The other way
is to partition both matrices, formally multiply the blocks to get another block matrix and
this one will be BA partitioned. Before presenting this theorem, here is a simple lemma
which is really a special case of the theorem.
Lemma 3.5.1 Consider the following product.
\[
\begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}\begin{pmatrix} 0 & I & 0 \end{pmatrix}
\]
where the first is n × r and the second is r × n. The small identity matrix I is an r × r matrix
and there are l zero rows above I and l zero columns to the left of I in the right matrix.
Then the product of these matrices is a block matrix of the form
\[
\begin{pmatrix} 0 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
Proof: From the definition of the way you multiply matrices, the product is
\[
\left(
\begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}0 \;\cdots\; \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}0 \quad
\begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}e_1 \;\cdots\; \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}e_r \quad
\begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}0 \;\cdots\; \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix}0
\right)
\]
which yields the claimed result. In the formula $e_j$ refers to the column vector of length r
which has a 1 in the $j$th position.
Theorem 3.5.2 Let B be a q × p block matrix as in 3.17 and let A be a p × n block matrix
as in 3.18 such that $B_{is}$ is conformable with $A_{sj}$ and each product, $B_{is}A_{sj}$ for s = 1, · · · , p
is of the same size so they can be added. Then BA can be obtained as a block matrix such
that the $ij$th block is of the form
\[
\sum_s B_{is}A_{sj}. \tag{3.19}
\]
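Proof: From 3.16
\[
B_{is}A_{sj} = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix}\begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix}
\]
where here it is assumed $B_{is}$ is $r_i \times p_s$ and $A_{sj}$ is $p_s \times q_j$. The product involves the $s$th
block in the $i$th row of blocks for B and the $s$th block in the $j$th column of A. Thus there
are the same number of rows above the $I_{p_s \times p_s}$ as there are columns to the left of $I_{p_s \times p_s}$ in
those two inside matrices. Then from Lemma 3.5.1
\[
\begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix}\begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & I_{p_s \times p_s} & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
Since the blocks of small identity matrices do not overlap,
\[
\sum_s \begin{pmatrix} 0 & 0 & 0 \\ 0 & I_{p_s \times p_s} & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} I_{p_1 \times p_1} & & 0 \\ & \ddots & \\ 0 & & I_{p_p \times p_p} \end{pmatrix} = I
\]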
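and so
\[
\begin{aligned}
\sum_s B_{is}A_{sj} &= \sum_s \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix}\begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \\
&= \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \left[\sum_s \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix}\begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix}\right] A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \\
&= \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} BIA \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} BA \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix}
\end{aligned}
\]
which equals the $ij$th block of BA. Hence the $ij$th block of BA equals the formal
multiplication according to matrix multiplication, $\sum_s B_{is}A_{sj}$.
Example 3.5.3 Let an n × n matrix have the form $A = \begin{pmatrix} a & b \\ c & P \end{pmatrix}$ where P is (n − 1) × (n − 1).
Multiply it by $B = \begin{pmatrix} p & q \\ r & Q \end{pmatrix}$ where B is also an n × n matrix and Q is (n − 1) × (n − 1).
You use block multiplication
\[
\begin{pmatrix} a & b \\ c & P \end{pmatrix}\begin{pmatrix} p & q \\ r & Q \end{pmatrix} = \begin{pmatrix} ap + br & aq + bQ \\ pc + Pr & cq + PQ \end{pmatrix}
\]
Note that this all makes sense. For example, b is 1 × (n − 1) and r is (n − 1) × 1 so br is a
1 × 1. Similar considerations apply to the other blocks.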
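The content of Theorem 3.5.2 can be checked numerically. The following Python sketch partitions a 3 × 4 matrix B and a 4 × 3 matrix A so that the column cuts of B match the row cuts of A, multiplies the blocks formally, and compares with the ordinary product BA. The helpers split, assemble and block_matmul are hypothetical names introduced for this example.

```python
def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def matadd(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def split(a, row_cuts, col_cuts):
    """Partition a into blocks; the cuts list the block boundaries."""
    return [[[row[c0:c1] for row in a[r0:r1]]
             for c0, c1 in zip(col_cuts, col_cuts[1:])]
            for r0, r1 in zip(row_cuts, row_cuts[1:])]


def assemble(blocks):
    """Glue a grid of blocks back into one ordinary matrix."""
    rows = []
    for block_row in blocks:
        for k in range(len(block_row[0])):
            rows.append([x for blk in block_row for x in blk[k]])
    return rows


def block_matmul(bb, ab):
    """The ij block of the result is the sum over s of B_is A_sj."""
    out = []
    for block_row in bb:
        out_row = []
        for j in range(len(ab[0])):
            acc = matmul(block_row[0], ab[0][j])
            for s in range(1, len(block_row)):
                acc = matadd(acc, matmul(block_row[s], ab[s][j]))
            out_row.append(acc)
        out.append(out_row)
    return out


B = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
A = [[1, 0, 2], [0, 1, 3], [4, 5, 6], [7, 8, 9]]
B_blocks = split(B, [0, 1, 3], [0, 2, 4])  # column cuts of B ...
A_blocks = split(A, [0, 2, 4], [0, 1, 3])  # ... equal the row cuts of A
blockwise = assemble(block_matmul(B_blocks, A_blocks))
direct = matmul(B, A)
print(blockwise == direct)
```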
Here is an interesting and significant application of block multiplication. In this theorem,
$p_M(t)$ denotes the characteristic polynomial, det (tI − M ). The zeros of this polynomial will
be shown later to be eigenvalues of the matrix M. First note that from block multiplication,
for the following block matrices consisting of square blocks of an appropriate size,
\[
\begin{pmatrix} A & 0 \\ B & C \end{pmatrix} = \begin{pmatrix} A & 0 \\ B & I \end{pmatrix}\begin{pmatrix} I & 0 \\ 0 & C \end{pmatrix}
\]
so
\[
\det\begin{pmatrix} A & 0 \\ B & C \end{pmatrix} = \det\begin{pmatrix} A & 0 \\ B & I \end{pmatrix}\det\begin{pmatrix} I & 0 \\ 0 & C \end{pmatrix} = \det(A)\det(C)
\]
Theorem 3.5.4 Let A be an m × n matrix and let B be an n × m matrix for m ≤ n. Then
\[
p_{BA}(t) = t^{n-m}p_{AB}(t),
\]
so the eigenvalues of BA and AB are the same including multiplicities except that BA has
n − m extra zero eigenvalues. Here $p_A(t)$ denotes the characteristic polynomial of the matrix
A.
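Here is a hedged numerical check of the theorem with m = 2 and n = 3, written in Python. Characteristic polynomials are stored as coefficient lists with the constant term first, so multiplying by t amounts to putting a zero in front of the list. The function names are invented for this sketch.

```python
def padd(p, q):
    """Add two polynomials given as coefficient lists, constant term first."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def pmul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def pdet(m):
    """Determinant of a matrix of polynomials, expanding along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = [0]
    for j in range(len(m)):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        term = pmul(m[0][j], pdet(sub))
        if j % 2:
            term = [-x for x in term]
        total = padd(total, term)
    return total


def char_poly(a):
    """Coefficients of det(tI - a), constant term first."""
    n = len(a)
    return pdet([[[-a[i][j], 1] if i == j else [-a[i][j]] for j in range(n)]
                 for i in range(n)])


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


A = [[1, 2, 0], [0, 1, 3]]    # 2 x 3
B = [[1, 0], [2, 1], [0, 4]]  # 3 x 2
p_AB = char_poly(matmul(A, B))  # degree 2
p_BA = char_poly(matmul(B, A))  # degree 3
print(p_AB, p_BA)  # p_BA is p_AB with one extra factor of t
```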
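Proof: Use block multiplication to write
\[
\begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix}\begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix}
\]
\[
\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}\begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix}.
\]
Therefore,
\[
\begin{pmatrix} I & A \\ 0 & I \end{pmatrix}^{-1}\begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix}\begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}
\]
Since the two matrices above are similar, it follows that $\begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}$ and $\begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix}$ have
the same characteristic polynomials. See Problem 8 on Page 90. Therefore, noting that BA
is an n × n matrix and AB is an m × m matrix,
\[
t^m\det(tI - BA) = t^n\det(tI - AB)
\]
and so $\det(tI - BA) = p_{BA}(t) = t^{n-m}\det(tI - AB) = t^{n-m}p_{AB}(t)$.
3.6 Exercises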
1. Let m < n and let A be an m × n matrix. Show that A is not one to one. Hint:
Consider the n × n matrix $A_1$ which is of the form
\[
A_1 \equiv \begin{pmatrix} A \\ 0 \end{pmatrix}
\]
where the 0 denotes an (n − m) × n matrix of zeros. Thus $\det A_1 = 0$ and so $A_1$ is
not one to one. Now observe that $A_1x$ is the vector,
\[
A_1x = \begin{pmatrix} Ax \\ 0 \end{pmatrix}
\]
which equals zero if and only if Ax = 0.
2. Let $v_1,\cdots,v_n$ be vectors in $\mathbb{F}^n$ and let $M(v_1,\cdots,v_n)$ denote the matrix whose $i$th
column equals $v_i$. Define
\[
d(v_1,\cdots,v_n) \equiv \det\left(M(v_1,\cdots,v_n)\right).
\]
Prove that d is linear in each variable, (multilinear), that
\[
d(v_1,\cdots,v_i,\cdots,v_j,\cdots,v_n) = -d(v_1,\cdots,v_j,\cdots,v_i,\cdots,v_n), \tag{3.20}
\]
and
\[
d(e_1,\cdots,e_n) = 1 \tag{3.21}
\]
where here $e_j$ is the vector in $\mathbb{F}^n$ which has a zero in every position except the $j$th
position in which it has a one.
3. Suppose $f : \mathbb{F}^n \times \cdots \times \mathbb{F}^n \to \mathbb{F}$ satisfies 3.20 and 3.21 and is linear in each variable.
Show that f = d.
4. Show that if you replace a row (column) of an n × n matrix A with itself added to
some multiple of another row (column) then the new matrix has the same determinant
as the original one.
5. Use the result of Problem 4 to evaluate by hand the determinant
\[
\det\begin{pmatrix} 1 & 2 & 3 & 2 \\ -6 & 3 & 2 & 3 \\ 5 & 2 & 2 & 3 \\ 3 & 4 & 6 & 4 \end{pmatrix}.
\]
6. Find the inverse if it exists of the matrix
\[
\begin{pmatrix} e^t & \cos t & \sin t \\ e^t & -\sin t & \cos t \\ e^t & -\cos t & -\sin t \end{pmatrix}.
\]
7. Let $Ly = y^{(n)} + a_{n-1}(x)y^{(n-1)} + \cdots + a_1(x)y' + a_0(x)y$ where the $a_i$ are given
continuous functions defined on an interval, (a, b) and y is some function which has n
derivatives so it makes sense to write Ly. Suppose $Ly_k = 0$ for k = 1, 2, · · · , n. The
Wronskian of these functions, $y_i$ is defined as
\[
W(y_1,\cdots,y_n)(x) \equiv \det\begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x) \end{pmatrix}
\]
Show that for $W(x) = W(y_1,\cdots,y_n)(x)$ to save space,
\[
W'(x) = \det\begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ \vdots & & \vdots \\ y_1^{(n-2)}(x) & \cdots & y_n^{(n-2)}(x) \\ y_1^{(n)}(x) & \cdots & y_n^{(n)}(x) \end{pmatrix}.
\]
Now use the differential equation, Ly = 0 which is satisfied by each of these functions,
$y_i$ and properties of determinants presented above to verify that $W' + a_{n-1}(x)W = 0$.
Give an explicit solution of this linear differential equation, Abel’s formula, and use
your answer to verify that the Wronskian of these solutions to the equation, Ly = 0
either vanishes identically on (a, b) or never.
8. Two n × n matrices, A and B, are similar if $B = S^{-1}AS$ for some invertible n × n
matrix S. Show that if two matrices are similar, they have the same characteristic
polynomials. The characteristic polynomial of A is det (λI − A).
9. Suppose the characteristic polynomial of an n × n matrix A is of the form
\[
t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0
\]
and that $a_0 \neq 0$. Find a formula for $A^{-1}$ in terms of powers of the matrix A. Show that
$A^{-1}$ exists if and only if $a_0 \neq 0$. In fact, show that $a_0 = (-1)^n\det(A)$.
10. ↑Letting p (t) denote the characteristic polynomial of A, show that $p_\varepsilon(t) \equiv p(t - \varepsilon)$
is the characteristic polynomial of A + εI. Then show that if det (A) = 0, it follows
that $\det(A + \varepsilon I) \neq 0$ whenever |ε| is sufficiently small and nonzero.
11. In constitutive modeling of the stress and strain tensors, one sometimes considers sums
of the form $\sum_{k=0}^{\infty} a_kA^k$ where A is a 3 × 3 matrix. Show using the Cayley Hamilton
theorem that if such a thing makes any sense, you can always obtain it as a finite sum
having no more than n terms.
12. Recall you can find the determinant from expanding along the $j$th column.
\[
\det(A) = \sum_i A_{ij}\left(\operatorname{cof}(A)\right)_{ij}
\]
Think of det (A) as a function of the entries, $A_{ij}$. Explain why the $ij$th cofactor is
really just
\[
\frac{\partial\det(A)}{\partial A_{ij}}.
\]
13. Let U be an open set in $\mathbb{R}^n$ and let $g : U \to \mathbb{R}^n$ be such that all the first partial
derivatives of all components of g exist and are continuous. Under these conditions
form the matrix Dg (x) given by
\[
Dg(x)_{ij} \equiv \frac{\partial g_i(x)}{\partial x_j} \equiv g_{i,j}(x)
\]
The best kept secret in calculus courses is that the linear transformation determined
by this matrix Dg (x) is called the derivative of g and is the correct generalization
of the concept of derivative of a function of one variable. Suppose the second partial
derivatives also exist and are continuous. Then show that
\[
\sum_j \left(\operatorname{cof}(Dg)\right)_{ij,j} = 0.
\]
Hint: First explain why $\sum_i g_{i,k}\operatorname{cof}(Dg)_{ij} = \delta_{jk}\det(Dg)$. Next differentiate with
respect to $x_j$ and sum on j using the equality of mixed partial derivatives. Assume
$\det(Dg) \neq 0$ to prove the identity in this special case. Then explain using Problem 10
why there exists a sequence $\varepsilon_k \to 0$ such that for $g_{\varepsilon_k}(x) \equiv g(x) + \varepsilon_kx$, $\det\left(Dg_{\varepsilon_k}\right) \neq 0$
and so the identity holds for $g_{\varepsilon_k}$. Then take a limit to get the desired result in general.
This is an extremely important identity which has surprising implications. One can
build degree theory on it for example. It also leads to simple proofs of the Brouwer
fixed point theorem from topology. See Evans [9] for example.
14. A determinant of the form
\[
\begin{vmatrix}
1 & 1 & \cdots & 1 \\
a_0 & a_1 & \cdots & a_n \\
a_0^2 & a_1^2 & \cdots & a_n^2 \\
\vdots & \vdots & & \vdots \\
a_0^{n-1} & a_1^{n-1} & \cdots & a_n^{n-1} \\
a_0^n & a_1^n & \cdots & a_n^n
\end{vmatrix}
\]
is called a Vandermonde determinant. Show this determinant equals
\[
\prod_{0 \leq i < j \leq n} (a_j - a_i)
\]
By this is meant to take the product of all terms of the form $(a_j - a_i)$ such that j > i.
Hint: Show it works if n = 1 so you are looking at
\[
\begin{vmatrix} 1 & 1 \\ a_0 & a_1 \end{vmatrix}
\]
Then suppose it holds for n − 1 and consider the case n. Consider the polynomial in
t, p (t) which is obtained from the above by replacing the last column with the column
\[
\begin{pmatrix} 1 & t & \cdots & t^n \end{pmatrix}^T.
\]
Explain why $p(a_j) = 0$ for j = 0, · · · , n − 1. Explain why
\[
p(t) = c\prod_{i=0}^{n-1}(t - a_i).
\]
Of course c is the coefficient of $t^n$. Find this coefficient from the above description of
p (t) and the induction hypothesis. Then plug in $t = a_n$ and observe you have the
formula valid for n.
15. The example in this exercise was shown to me by Marc van Leeuwen and it helped to
correct a misleading proof of the Cayley Hamilton theorem presented in this chapter.
If p (λ) = q (λ) for all λ or for all λ large enough where p (λ), q (λ) are polynomials
having matrix coefficients, then it is not necessarily the case that p (A) = q (A) for A
a matrix of an appropriate size. The proof in question read as though it was using
this incorrect argument. Let
\[
E_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},\quad E_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},\quad N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]
Show that for all λ,
\[
(\lambda I + E_1)(\lambda I + E_2) = \left(\lambda^2 + \lambda\right)I = (\lambda I + E_2)(\lambda I + E_1)
\]
However,
\[
(NI + E_1)(NI + E_2) \neq (NI + E_2)(NI + E_1)
\]
Explain why this can happen. In the proof of the Cayley Hamilton theorem given
in the chapter, show that the matrix A does commute with the matrices $C_i$ in that
argument. Hint: Multiply both sides out with N in place of λ. Does N commute
with $E_i$?
Chapter 4 Row Operations
4.1 Elementary Matrices
The elementary matrices result from doing a row operation to the identity matrix.
Definition 4.1.1 The row operations consist of the following
1. Switch two rows.
2. Multiply a row by a nonzero number.
3. Replace a row by a multiple of another row added to it.
The elementary matrices are given in the following definition.
Definition 4.1.2 The elementary matrices consist of those matrices which result by applying a row operation to an identity matrix. Those which involve switching rows of the identity
are called permutation matrices. More generally, if (i1 , i2 , · · · , in ) is a permutation, a matrix which has a 1 in the ik position in row k and zero in every other position of that row is
called a permutation matrix. Thus each permutation corresponds to a unique permutation
matrix.
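As an example of why these elementary matrices are interesting, consider the following.
\[
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} a & b & c & d \\ x & y & z & w \\ f & g & h & i \end{pmatrix} = \begin{pmatrix} x & y & z & w \\ a & b & c & d \\ f & g & h & i \end{pmatrix}
\]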
A 3 × 4 matrix was multiplied on the left by an elementary matrix which was obtained from
row operation 1 applied to the identity matrix. This resulted in applying the operation 1
to the given matrix. This is what happens in general.
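The same observation can be tried out on a machine. In this Python sketch (the names identity, swap_matrix and matmul are made up for the illustration, and rows are numbered from 0), the elementary matrix is built by switching two rows of the identity, and multiplying on the left by it switches the same two rows of a 3 × 4 matrix.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def swap_matrix(n, i, j):
    """Elementary matrix obtained by switching rows i and j of the identity."""
    p = identity(n)
    p[i], p[j] = p[j], p[i]
    return p


def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
P = swap_matrix(3, 0, 1)
PA = matmul(P, A)
print(PA)  # rows 0 and 1 of A have traded places
```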
Now consider what these elementary matrices look like. First consider the one which
involves switching row i and row j where i < j. This matrix is of the form
\[
\begin{pmatrix}
1 & & & & & & 0 \\
 & \ddots & & & & & \\
 & & 0 & \cdots & 1 & & \\
 & & \vdots & & \vdots & & \\
 & & 1 & \cdots & 0 & & \\
 & & & & & \ddots & \\
0 & & & & & & 1
\end{pmatrix}
\]
The two exceptional rows are shown. The $i$th row was the $j$th and the $j$th row was the $i$th
in the identity matrix. Now consider what this does to a column vector.
\[
\begin{pmatrix}
1 & & & & & & 0 \\
 & \ddots & & & & & \\
 & & 0 & \cdots & 1 & & \\
 & & \vdots & & \vdots & & \\
 & & 1 & \cdots & 0 & & \\
 & & & & & \ddots & \\
0 & & & & & & 1
\end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} v_1 \\ \vdots \\ v_j \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix}
\]
Now denote by $P^{ij}$ the elementary matrix which comes from the identity from switching
rows i and j. From what was just explained consider multiplication on the left by this
elementary matrix.
\[
P^{ij}\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]
From the way you multiply matrices this is a matrix which has the indicated columns.
\[
\left(
P^{ij}\begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{n1} \end{pmatrix},\;
P^{ij}\begin{pmatrix} a_{12} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{n2} \end{pmatrix},\;
\cdots,\;
P^{ij}\begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{np} \end{pmatrix}
\right)
\]
\[
= \left(
\begin{pmatrix} a_{11} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{n1} \end{pmatrix},\;
\begin{pmatrix} a_{12} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{n2} \end{pmatrix},\;
\cdots,\;
\begin{pmatrix} a_{1p} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{np} \end{pmatrix}
\right)
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]
This has established the following lemma.
Lemma 4.1.3 Let $P^{ij}$ denote the elementary matrix which involves switching the $i$th and
the $j$th rows. Then
\[
P^{ij}A = B
\]
where B is obtained from A by switching the $i$th and the $j$th rows.
As a consequence of the above lemma, if you have any permutation $(i_1,\cdots,i_n)$, it
follows from Lemma 3.3.2 that the corresponding permutation matrix can be obtained by
multiplying finitely many permutation matrices, each of which switch only two rows. Now
every such permutation matrix in which only two rows are switched has determinant −1.
Therefore, the determinant of the permutation matrix for $(i_1,\cdots,i_n)$ equals $(-1)^p$ where
the given permutation can be obtained by making p switches. Now p is not unique. There
are many ways to make switches and end up with a given permutation, but what this shows
is that the total number of switches is either always odd or always even. That is, you could
not obtain a given permutation by making 2m switches and 2k + 1 switches. A permutation
is said to be even if p is even and odd if p is odd. This is an interesting result in abstract
algebra which is obtained very easily from a consideration of elementary matrices and of
course the theory of the determinant. Also, this shows that the composition of permutations
corresponds to the product of the corresponding permutation matrices.
To see permutations considered more directly in the context of group theory, you should
see a good abstract algebra book such as [18] or [14].
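The claim that the parity of the number of switches does not depend on how the switches are chosen can be tested by brute force on small cases. The sketch below is an illustration only and assumes permutations are written as tuples of the numbers 0, ..., n − 1; the function names are hypothetical. It computes the sign in two unrelated ways, once by actually sorting with switches and once by counting inversions, and compares them.

```python
from itertools import permutations

def sign_by_switches(perm):
    """Sort perm with switches of two entries; return (-1)**p, p = switches used."""
    work = list(perm)
    switches = 0
    for k in range(len(work)):
        # Keep switching until the entry k sits in slot k.
        while work[k] != k:
            t = work[k]
            work[k], work[t] = work[t], work[k]
            switches += 1
    return (-1) ** switches

def sign_by_inversions(perm):
    """Return (-1)**(number of pairs that appear out of order)."""
    n = len(perm)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if perm[a] > perm[b])
    return (-1) ** inversions

# The two counts differ in general, but their parities always agree.
agree = all(sign_by_switches(p) == sign_by_inversions(p)
            for p in permutations(range(4)))
print(agree)
```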
Next consider the row operation which involves multiplying the ith row by a nonzero
constant, c. The elementary matrix which results from applying this operation to the ith
row of the identity matrix is of the form
\[
\begin{pmatrix}
1 & & & & 0 \\
 & \ddots & & & \\
 & & c & & \\
 & & & \ddots & \\
0 & & & & 1
\end{pmatrix}
\]
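Now consider what this does to a column vector.
\[
\begin{pmatrix}
1 & & & & 0 \\
 & \ddots & & & \\
 & & c & & \\
 & & & \ddots & \\
0 & & & & 1
\end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix}
= \begin{pmatrix} v_1 \\ \vdots \\ c v_i \\ \vdots \\ v_n \end{pmatrix}
\]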
Denote by E (c, i) this elementary matrix which multiplies the ith row of the identity
by the nonzero constant, c. Then from what was just discussed and the way matrices are
multiplied,
\[
E(c,i)\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]
equals a matrix having the columns indicated below.
\[
= \left( E(c,i)\begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{n1} \end{pmatrix},\;
E(c,i)\begin{pmatrix} a_{12} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{n2} \end{pmatrix},\; \cdots,\;
E(c,i)\begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{np} \end{pmatrix} \right)
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
c a_{i1} & c a_{i2} & \cdots & c a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]
This proves the following lemma.
Lemma 4.1.4 Let E (c, i) denote the elementary matrix corresponding to the row operation in which the ith row is multiplied by the nonzero constant, c. Thus E (c, i) involves
multiplying the ith row of the identity matrix by c. Then
E (c, i) A = B
where B is obtained from A by multiplying the ith row of A by c.
Finally consider the third of these row operations. Denote by E (c × i + j) the elementary
matrix which replaces the j th row with the sum of the j th row and c times the ith row. In
case i < j this will be of the form
\[
\begin{pmatrix}
1 & & & & & & 0 \\
 & \ddots & & & & & \\
 & & 1 & & & & \\
 & & \vdots & \ddots & & & \\
 & & c & \cdots & 1 & & \\
 & & & & & \ddots & \\
0 & & & & & & 1
\end{pmatrix}
\]
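Now consider what this does to a column vector.
\[
\begin{pmatrix}
1 & & & & & & 0 \\
 & \ddots & & & & & \\
 & & 1 & & & & \\
 & & \vdots & \ddots & & & \\
 & & c & \cdots & 1 & & \\
 & & & & & \ddots & \\
0 & & & & & & 1
\end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix}
= \begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ c v_i + v_j \\ \vdots \\ v_n \end{pmatrix}
\]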
Now from this and the way matrices are multiplied,
\[
E(c \times i + j)\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]
equals a matrix of the following form having the indicated columns.
\[
\left( E(c \times i + j)\begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{n1} \end{pmatrix},\;
E(c \times i + j)\begin{pmatrix} a_{12} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{n2} \end{pmatrix},\; \cdots,\;
E(c \times i + j)\begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{np} \end{pmatrix} \right)
\]
\[
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} + c a_{i1} & a_{j2} + c a_{i2} & \cdots & a_{jp} + c a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}
\]
The case where i > j is handled similarly. This proves the following lemma.
Lemma 4.1.5 Let E (c × i + j) denote the elementary matrix obtained from I by replacing
the j th row with c times the ith row added to it. Then
E (c × i + j) A = B
where B is obtained from A by replacing the j th row of A with itself added to c times the
ith row of A.
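Lemmas 4.1.4 and 4.1.5 can be checked numerically in the same spirit. The sketch below is not from the text: the helper names `scale_matrix` and `add_matrix` are hypothetical stand-ins for E (c, i) and E (c × i + j), rows are numbered from 0, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def identity(n):
    """n x n identity matrix with exact rational entries."""
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def scale_matrix(n, c, i):
    """Stand-in for E(c, i): the identity with row i multiplied by c."""
    e = identity(n)
    e[i][i] = Fraction(c)
    return e

def add_matrix(n, c, i, j):
    """Stand-in for E(c x i + j): row j of I replaced by row j + c * row i."""
    e = identity(n)
    e[j][i] = Fraction(c)
    return e

def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Multiply row 1 of A by 1/2.
B1 = matmul(scale_matrix(3, Fraction(1, 2), 1), A)
# Replace row 1 of A by row 1 + (-4) * row 0.
B2 = matmul(add_matrix(3, -4, 0, 1), A)

print(B1[1], B2[1])

# Doing the opposite type-three operation afterwards restores the identity.
undo = matmul(add_matrix(3, 4, 0, 1), add_matrix(3, -4, 0, 1))
print(undo == identity(3))
```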
The next theorem is the main result.
Theorem 4.1.6 To perform any of the three row operations on a matrix A it suffices to do
the row operation on the identity matrix obtaining an elementary matrix E and then take
the product, EA. Furthermore, each elementary matrix is invertible and its inverse is an
elementary matrix.
Proof: The first part of this theorem has been proved in Lemmas 4.1.3 - 4.1.5. It
only remains to verify the claim about the inverses. Consider first the elementary matrices
corresponding to row operation of type three.
E (−c × i + j) E (c × i + j) = I
This follows because the first matrix takes c times row i in the identity and adds it to row j.
When multiplied on the left by E (−c × i + j) it follows from the first part of this theorem
that you take the ith row of E (c × i + j) which coincides with the ith row of I since that
row was not changed, multiply it by −c and add to the j th row of E (c × i + j) which was
the j th row of I added to c times the ith row of I. Thus E (−c × i + j) multiplied on the
left, undoes the row operation which resulted in E (c × i + j). The same argument applied
to the product
E (c × i + j) E (−c × i + j)
replacing c with −c in the argument yields that this product is also equal to I. Therefore,
\[
E(c \times i + j)^{-1} = E(-c \times i + j).
\]
Similar reasoning shows that for E (c, i) the elementary matrix which comes from multiplying the ith row by the nonzero constant, c,
\[
E(c, i)^{-1} = E\left(c^{-1}, i\right).
\]
Finally, consider P ij which involves switching the ith and the j th rows.
P ij P ij = I
because by the first part of this theorem, multiplying on the left by P ij switches the ith
and j th rows of P ij which was obtained from switching the ith and j th rows of the identity.
First you switch them to get P ij and then you multiply on the left by P ij which switches
these rows again and restores the identity matrix. Thus $\left(P^{ij}\right)^{-1} = P^{ij}$.
4.2
The Rank Of A Matrix
Recall the following definition of rank of a matrix.
Definition 4.2.1 A submatrix of a matrix A is the rectangular array of numbers obtained
by deleting some rows and columns of A. Let A be an m × n matrix. The determinant
rank of the matrix equals r where r is the largest number such that some r × r submatrix
of A has a non zero determinant. The row rank is defined to be the dimension of the span
of the rows. The column rank is defined to be the dimension of the span of the columns.
The rank of A is denoted as rank (A).
The following theorem is proved in the section on the theory of the determinant and is
restated here for convenience.
Theorem 4.2.2 Let A be an m × n matrix. Then the row rank, column rank and determinant rank are all the same.
So how do you find the rank? It turns out that row operations are the key to the practical
computation of the rank of a matrix.
In rough terms, the following lemma states that linear relationships between columns
in a matrix are preserved by row operations.
Lemma 4.2.3 Let B and A be two m × n matrices and suppose B results from a row
operation applied to A. Then the k th column of B is a linear combination of the i1 , · · · , ir
columns of B if and only if the k th column of A is a linear combination of the i1 , · · · , ir
columns of A. Furthermore, the scalars in the linear combination are the same. (The linear
relationship between the k th column of A and the i1 , · · · , ir columns of A is the same as the
linear relationship between the k th column of B and the i1 , · · · , ir columns of B.)
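Proof: Let A equal the following matrix in which the ak are the columns
\[
\begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}
\]
and let B equal the following matrix in which the columns are given by the bk
\[
\begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}
\]
Then by Theorem 4.1.6 on Page 118 bk = Eak where E is an elementary matrix. Suppose
then that one of the columns of A is a linear combination of some other columns of A. Say
\[
a_k = \sum_{r \in S} c_r a_r .
\]
Then multiplying by E,
\[
b_k = E a_k = \sum_{r \in S} c_r E a_r = \sum_{r \in S} c_r b_r .
\]
Conversely, since E is invertible, ak = E −1 bk for each k, so the same argument with E −1 in place of E shows that a linear relationship among the columns of B yields the same relationship among the columns of A.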
Corollary 4.2.4 Let A and B be two m × n matrices such that B is obtained by applying
a row operation to A. Then the two matrices have the same rank.
Proof: Lemma 4.2.3 says the linear relationships are the same between the columns of
A and those of B. Therefore, the column rank of the two matrices is the same.
This suggests that to find the rank of a matrix, one should do row operations until a
matrix is obtained in which its rank is obvious.
Example 4.2.5 Find the rank of the following matrix and identify columns whose linear
combinations yield all the other columns.
\[
\begin{pmatrix}
1 & 2 & 1 & 3 & 2 \\
1 & 3 & 6 & 0 & 2 \\
3 & 7 & 8 & 6 & 6
\end{pmatrix}
\tag{4.1}
\]
Take (−1) times the first row and add to the second and then take (−3) times the first
row and add to the third. This yields
\[
\begin{pmatrix}
1 & 2 & 1 & 3 & 2 \\
0 & 1 & 5 & -3 & 0 \\
0 & 1 & 5 & -3 & 0
\end{pmatrix}
\]
By the above corollary, this matrix has the same rank as the first matrix. Now take (−1)
times the second row and add to the third row yielding
\[
\begin{pmatrix}
1 & 2 & 1 & 3 & 2 \\
0 & 1 & 5 & -3 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]
At this point it is clear the rank is 2. This is because every column is in the span of the
first two and these first two columns are linearly independent.
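The computation in this example can be mechanized. The following is a minimal sketch, assuming exact rational arithmetic via Python's `fractions` module; the function name `rank` and the elimination order are choices made for illustration rather than anything prescribed by the text. It only uses row switches and operations of the third type, which by Corollary 4.2.4 do not change the rank, and then counts the pivots found.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) found by forward elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    r = 0                                    # number of pivots found so far
    for col in range(n_cols):
        if r == n_rows:
            break
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((k for k in range(r, n_rows) if m[k][col] != 0), None)
        if pivot is None:
            continue                         # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]      # row switch
        for k in range(r + 1, n_rows):       # clear the entries below
            factor = m[k][col] / m[r][col]
            m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
    return r

A = [[1, 2, 1, 3, 2],
     [1, 3, 6, 0, 2],
     [3, 7, 8, 6, 6]]          # the matrix (4.1)

print(rank(A))                 # 2, as found in Example 4.2.5
```

Applying the same function to the transpose gives the same number, in line with Theorem 4.2.2.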
Example 4.2.6 Find the rank of the following matrix and identify columns whose linear
combinations yield all the other columns.
\[
\begin{pmatrix}
1 & 2 & 1 & 3 & 2 \\
1 & 2 & 6 & 0 & 2 \\
3 & 6 & 8 & 6 & 6
\end{pmatrix}
\tag{4.2}
\]
Take (−1) times the first row and add to the second and then take (−3) times the first
row and add to the last row. This yields
\[
\begin{pmatrix}
1 & 2 & 1 & 3 & 2 \\
0 & 0 & 5 & -3 & 0 \\
0 & 0 & 5 & -3 & 0
\end{pmatrix}
\]
Now multiply the second row by 1/5 and add (−5) times it to the last row.
\[
\begin{pmatrix}
1 & 2 & 1 & 3 & 2 \\
0 & 0 & 1 & -3/5 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]
Add (−1) times the second row to the first.
\[
\begin{pmatrix}
1 & 2 & 0 & \frac{18}{5} & 2 \\
0 & 0 & 1 & -3/5 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\tag{4.3}
\]
It is now clear the rank of this matrix is 2 because the first and third columns form a
basis for the column space.
The matrix 4.3 is the row reduced echelon form for the matrix 4.2.
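A reduction like the one in this example can be carried out by a short program. The sketch below is one possible arrangement, not the author's procedure: the function name `rref` is hypothetical, columns are numbered from 0, and exact fractions are used so the entries 18/5 and −3/5 come out exactly. It returns both the reduced matrix and the list of columns in which a leading 1 was produced.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the row reduced echelon form and its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0                                    # row that receives the next pivot
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = next((k for k in range(r, n_rows) if m[k][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]      # switch rows
        lead = m[r][col]
        m[r] = [a / lead for a in m[r]]      # scale to get a leading 1
        for k in range(n_rows):              # clear the rest of the column
            if k != r and m[k][col] != 0:
                factor = m[k][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        pivots.append(col)
        r += 1
    return m, pivots

A = [[1, 2, 1, 3, 2],
     [1, 2, 6, 0, 2],
     [3, 6, 8, 6, 6]]          # the matrix (4.2)

R, pivot_cols = rref(A)
for row in R:
    print([str(x) for x in row])
print(pivot_cols)              # [0, 2]: the first and third columns
```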
4.3
The Row Reduced Echelon Form
The following definition is for the row reduced echelon form of a matrix.
Definition 4.3.1 Let ei denote the column vector which has all zero entries except for the
ith slot which is one. An m×n matrix is said to be in row reduced echelon form if, in viewing
successive columns from left to right, the first nonzero column encountered is e1 and if you
have encountered e1 , e2 , · · · , ek , the next column is either ek+1 or is a linear combination
of the vectors, e1 , e2 , · · · , ek .
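For example, here are some matrices which are in row reduced echelon form.
\[
\begin{pmatrix}
0 & 1 & 3 & 0 & 3 \\
0 & 0 & 0 & 1 & 5 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix},\quad
\begin{pmatrix}
1 & 0 & 3 & -11 & 0 \\
0 & 1 & 4 & 4 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.
\]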
Theorem 4.3.2 Let A be an m × n matrix. Then A has a row reduced echelon form
determined by a simple process.
Proof: Viewing the columns of A from left to right take the first nonzero column. Pick
a nonzero entry in this column and switch the row containing this entry with the top row of
A. Now divide this new top row by the value of this nonzero entry to get a 1 in this position
and then use row operations to make all entries below this entry equal to zero. Thus the
first nonzero column is now e1 . Denote the resulting matrix by A1 . Consider the submatrix
of A1 to the right of this column and below the first row. Do exactly the same thing for it
that was done for A. This time the e1 will refer to $F^{m-1}$. Use this 1 and row operations
to zero out every entry above it in the rows of A1 . Call the resulting matrix A2 . Thus A2
satisfies the conditions of the above definition up to the column just encountered. Continue
this way till every column has been dealt with and the result must be in row reduced echelon
form.
The following diagram illustrates the above procedure. Say the matrix looked something
like the following.
\[
\begin{pmatrix}
0 & * & * & * & * & * & * \\
0 & * & * & * & * & * & * \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & * & * & * & * & * & *
\end{pmatrix}
\]
First step would yield something like
\[
\begin{pmatrix}
0 & 1 & * & * & * & * & * \\
0 & 0 & * & * & * & * & * \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & * & * & * & * & *
\end{pmatrix}
\]
For the second step you look at the lower right corner as described,
\[
\begin{pmatrix}
* & * & * & * & * \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
* & * & * & * & *
\end{pmatrix}
\]
and if the first column consists of all zeros but the next one is not all zeros, you would get
something like this.
\[
\begin{pmatrix}
0 & 1 & * & * & * \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & * & * & *
\end{pmatrix}
\]
Thus, after zeroing out the term in the top row above the 1, you get the following for the
next step in the computation of the row reduced echelon form for the original matrix.
\[
\begin{pmatrix}
0 & 1 & * & 0 & * & * & * \\
0 & 0 & 0 & 1 & * & * & * \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & * & * & *
\end{pmatrix}.
\]
Next you look at the lower right matrix below the top two rows and to the right of the first
four columns and repeat the process.
Definition 4.3.3 The first pivot column of A is the first nonzero column of A. The next
pivot column is the first column after this which is not a linear combination of the columns to
its left. The third pivot column is the next column after this which is not a linear combination
of those columns to its left, and so forth. Thus by Lemma 4.2.3 if a pivot column occurs
as the j th column from the left, it follows that in the row reduced echelon form there will be
one of the ek as the j th column.
There are three choices for row operations at each step in the above theorem. A natural
question is whether the same row reduced echelon matrix always results in the end from
following the above algorithm applied in any way. The next corollary says this is the case.
Definition 4.3.4 Two matrices are said to be row equivalent if one can be obtained from
the other by a sequence of row operations.
Since every row operation can be obtained by multiplication on the left by an elementary
matrix and since each of these elementary matrices has an inverse which is also an elementary
matrix, it follows that row equivalence is an equivalence relation. Thus one can classify matrices
according to which equivalence class they are in. Later in the book, another more profound
way of classifying matrices will be presented.
It has been shown above that every matrix is row equivalent to one which is in row
reduced echelon form. Note
\[
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 e_1 + \cdots + x_n e_n
\]
so to say two column vectors are equal is to say they are the same linear combination of the
special vectors ej .
Corollary 4.3.5 The row reduced echelon form is unique. That is if B, C are two matrices
in row reduced echelon form and both are row equivalent to A, then B = C.
Proof: Suppose B and C are both row reduced echelon forms for the matrix A. Then
they clearly have the same zero columns since row operations leave zero columns unchanged.
If B has the sequence e1 , e2 , · · · , er occurring for the first time in the positions, i1 , i2 , · · · , ir ,
the description of the row reduced echelon form means that each of these columns is not a
linear combination of the preceding columns. Therefore, by Lemma 4.2.3, the same is true of
the columns in positions i1 , i2 , · · · , ir for C. It follows from the description of the row reduced
echelon form, that e1 , · · · , er occur respectively for the first time in columns i1 , i2 , · · · , ir
for C. Thus B, C have the same columns in these positions. By Lemma 4.2.3, the other
columns in the two matrices are linear combinations, involving the same scalars, of the
columns in the i1 , · · · , ir positions. Thus each column of B is identical to the corresponding
column in C.
The above corollary shows that you can determine whether two matrices are row equivalent by simply checking their row reduced echelon forms. The matrices are row equivalent
if and only if they have the same row reduced echelon form.
The following corollary follows.
Corollary 4.3.6 Let A be an m × n matrix and let R denote the row reduced echelon form
obtained from A by row operations. Then there exists a sequence of elementary matrices,
E1 , · · · , Ep such that
(Ep Ep−1 · · · E1 ) A = R.
Proof: This follows from the fact that row operations are equivalent to multiplication
on the left by an elementary matrix.
Corollary 4.3.7 Let A be an invertible n × n matrix. Then A equals a finite product of
elementary matrices.
Proof: Since A−1 is given to exist, it follows A must have rank n because by Theorem
3.3.18 det(A) ≠ 0 which says the determinant rank and hence the column rank of A is n
and so the row reduced echelon form of A is I because the columns of A form a linearly
independent set. Therefore, by Corollary 4.3.6 there is a sequence of elementary matrices,
E1 , · · · , Ep such that
(Ep Ep−1 · · · E1 ) A = I.
But now multiply on the left on both sides by $E_p^{-1}$ then by $E_{p-1}^{-1}$ and then by $E_{p-2}^{-1}$ etc.
until you get
\[
A = E_1^{-1} E_2^{-1} \cdots E_{p-1}^{-1} E_p^{-1}
\]
and by Theorem 4.1.6 each of these in this product is an elementary matrix.
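As a small worked instance of this corollary (the 2 × 2 matrix below is chosen only for illustration and does not come from the text), row reduce
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\xrightarrow{\;E(-3 \times 1 + 2)\;}
\begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix}
\xrightarrow{\;E(-1/2,\, 2)\;}
\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}
\xrightarrow{\;E(-2 \times 2 + 1)\;}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
Thus $E(-2 \times 2 + 1)\, E(-1/2, 2)\, E(-3 \times 1 + 2)\, A = I$, and inverting each factor in turn as in the proof gives
\[
A = E(3 \times 1 + 2)\, E(-2, 2)\, E(2 \times 2 + 1)
= \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & -2 \end{pmatrix}
\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix},
\]
which can be confirmed by multiplying out the three matrices on the right.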
Corollary 4.3.8 The rank of a matrix equals the number of nonzero pivot columns. Furthermore, every column is contained in the span of the pivot columns.
Proof: Write the row reduced echelon form for the matrix. From Corollary 4.2.4 this
row reduced matrix has the same rank as the original matrix. Deleting all the zero rows
and all the columns in the row reduced echelon form which do not correspond to a pivot
column, yields an r × r identity submatrix in which r is the number of pivot columns. Thus
the rank is at least r.
From Lemma 4.2.3 every column of A is a linear combination of the pivot columns since
this is true by definition for the row reduced echelon form. Therefore, the rank is no more
than r.
Here is a fundamental observation related to the above.
Corollary 4.3.9 Suppose A is an m×n matrix and that m < n. That is, the number of rows
is less than the number of columns. Then one of the columns of A is a linear combination
of the preceding columns of A.
Proof: Since m < n, not all the columns of A can be pivot columns. That is, in the
row reduced echelon form say ei occurs for the first time at ri where r1 < r2 < · · · < rp
where p ≤ m. It follows since m < n, there exists some column in the row reduced echelon
form which is a linear combination of the preceding columns. By Lemma 4.2.3 the same is
true of the columns of A.
Definition 4.3.10 Let A be an m×n matrix having rank, r. Then the nullity of A is defined
to be n − r. Also define ker (A) ≡ {x ∈ Fn : Ax = 0} . This is also denoted as N (A) .
Observation 4.3.11 Note that ker (A) is a subspace because if a, b are scalars and x, y are
vectors in ker (A), then
A (ax + by) = aAx + bAy = 0 + 0 = 0
Recall that the dimension of the column space of a matrix equals its rank and since the
column space is just A (Fn ) , the rank is just the dimension of A (Fn ). The next theorem
shows that the nullity equals the dimension of ker (A).
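Before the general statement, the relationship between rank and the dimension of the kernel can be seen on a concrete matrix. The sketch below is an illustration under stated assumptions, not the method of the text: it row reduces with a hypothetical helper `rref`, then builds one kernel vector for each column that is not a pivot column by setting that free entry to 1 and solving for the pivot entries. These vectors are independent because each has a 1 in a slot where all the others have 0.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the row reduced echelon form and its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots, r = [], 0
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = next((k for k in range(r, n_rows) if m[k][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][col]
        m[r] = [a / lead for a in m[r]]
        for k in range(n_rows):
            if k != r and m[k][col] != 0:
                factor = m[k][col]
                m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        pivots.append(col)
        r += 1
    return m, pivots

def kernel_vectors(rows):
    """One kernel vector per free column, plus the rank."""
    R, pivots = rref(rows)
    n = len(rows[0])
    vectors = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        for k, p in enumerate(pivots):
            x[p] = -R[k][free]               # solve row k for its pivot entry
        vectors.append(x)
    return vectors, len(pivots)

A = [[1, 2, 1, 3, 2],
     [1, 2, 6, 0, 2],
     [3, 6, 8, 6, 6]]          # the matrix (4.2), with n = 5 columns

vectors, r = kernel_vectors(A)
products = [[sum(a * v for a, v in zip(row, x)) for row in A] for x in vectors]
print(r, len(vectors))         # the two numbers add up to 5
```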
Theorem 4.3.12 Let A be an m × n matrix. Then rank (A) + dim (ker (A)) = n.
Proof: Since ker (A) is a subspace, there exists a basis for ker (A) , {x1 , · · · , xk } . Also
let {Ay1 , · · · , Ayl } be a basis for A (Fn ). Let u ∈ Fn . Then there exist unique scalars ci
such that
\[
Au = \sum_{i=1}^{l} c_i A y_i
\]
It follows that
\[
A\left( u - \sum_{i=1}^{l} c_i y_i \right) = 0
\]
and so the vector in parenthesis is in ker (A). Thus there exist unique bj such that
\[
u = \sum_{i=1}^{l} c_i y_i + \sum_{j=1}^{k} b_j x_j
\]
Since u was arbitrary, this shows {x1 , · · · , xk , y1 , · · · , yl } spans Fn . If these vectors are
independent, then they will form a basis and the claimed equation will be obtained. Suppose
then that
\[
\sum_{i=1}^{l} c_i y_i + \sum_{j=1}^{k} b_j x_j = 0
\]
Apply A to both sides. This yields
\[
\sum_{i=1}^{l} c_i A y_i = 0
\]
and so each ci = 0. Then the independence of the xj implies each bj = 0.
4.4
Rank And Existence Of Solutions To Linear Systems
Consider the linear system of equations,
Ax = b
(4.4)
where A is an m × n matrix, x is a n × 1 column vector, and b is an m × 1 column vector.
Suppose
\[
A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}
\]
where the ak denote the columns of A. Then $x = (x_1, \cdots, x_n)^T$ is a solution of the system
4.4, if and only if
x1 a1 + · · · + xn an = b
which says that b is a vector in span (a1 , · · · , an ) . This shows that there exists a solution
to the system, 4.4 if and only if b is contained in span (a1 , · · · , an ) . In words, there is a
solution to 4.4 if and only if b is in the column space of A. In terms of rank, the following
proposition describes the situation.
Proposition 4.4.1 Let A be an m × n matrix and let b be an m × 1 column vector. Then
there exists a solution to 4.4 if and only if
\[
\operatorname{rank}\begin{pmatrix} A \mid b \end{pmatrix} = \operatorname{rank}(A).
\tag{4.5}
\]
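To see the criterion 4.5 at work, here is a small illustration; the matrix and right-hand sides are made up for this purpose and do not appear in the text. Let
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix},\quad
b = \begin{pmatrix} 1 \\ 2 \end{pmatrix},\quad
b' = \begin{pmatrix} 1 \\ 3 \end{pmatrix}.
\]
Adding (−2) times the first row to the second in each augmented matrix gives
\[
\begin{pmatrix} A \mid b \end{pmatrix} \to
\left(\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & 0 & 0 \end{array}\right),\qquad
\begin{pmatrix} A \mid b' \end{pmatrix} \to
\left(\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & 0 & 1 \end{array}\right).
\]
Here rank (A) = 1. The first augmented matrix also has rank 1, and indeed $x = (1, 0)^T$ solves Ax = b. The second has rank 2, and its last row corresponds to the impossible equation 0 = 1, so Ax = b′ has no solution.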
Proof: Place $\begin{pmatrix} A \mid b \end{pmatrix}$ and A in row reduced echelon form, respectively B and C. If
the above condition on rank is true, then both B and C have the same number of nonzero
rows. In particular, you cannot have a row of the form
\[
\begin{pmatrix} 0 & \cdots & 0 & \star \end{pmatrix}
\]
where ⋆ ≠ 0 in B. Therefore, there will exist a solution to the system 4.4.
Conversely, suppose there exists a solution. This means there cannot be such a row in
B described above. Therefore, B and C must have the same number of zero rows and so
they have the same number of nonzero rows. Therefore, the rank of the two matrices in 4.5
is the same.
4.5
Fredholm Alternative
There is a very useful version of Proposition 4.4.1 known as the Fredholm alternative.
I will only present this for the case of real matrices here. Later a much more elegant and
general approach is presented which allows for the general case of complex matrices.
The following definition is used to state the Fredholm alternative.
Definition 4.5.1 Let S ⊆ Rm . Then S ⊥ ≡ {z ∈ Rm : z · s = 0 for every s ∈ S} . The funny
exponent, ⊥ is called “perp”.
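Now note
\[
\ker\left(A^T\right) \equiv \left\{ z : A^T z = 0 \right\} = \left\{ z : \sum_{k=1}^{m} z_k a_k = 0 \right\}
\]
where here ak denotes the k th column of AT , that is, the k th row of A written as a column.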
Lemma 4.5.2 Let A be a real m × n matrix, let x ∈ Rn and y ∈ Rm . Then
\[
(Ax \cdot y) = \left(x \cdot A^T y\right)
\]
Proof: This follows right away from the definition of the inner product and matrix
multiplication.
\[
(Ax \cdot y) = \sum_{k,l} A_{kl} x_l y_k = \sum_{k,l} \left(A^T\right)_{lk} x_l y_k = \left(x \cdot A^T y\right).
\]
Now it is time to state the Fredholm alternative. The first version of this is the following
theorem.
Theorem 4.5.3 Let A be a real m × n matrix and let b ∈ Rm . There exists a solution, x
to the equation Ax = b if and only if $b \in \ker\left(A^T\right)^{\perp}$.
Proof: First suppose $b \in \ker\left(A^T\right)^{\perp}$. Then this says that if AT x = 0, it follows that
b · x = 0. In other words, taking the transpose, if
xT A = 0, then xT b = 0.
Thus, if P is a product of elementary matrices such that P A is in row reduced echelon form,
then if P A has a row of zeros, in the k th position, then there is also a zero in the k th position
of P b. Thus $\operatorname{rank}\begin{pmatrix} A \mid b \end{pmatrix} = \operatorname{rank}(A)$. By Proposition 4.4.1, there exists a solution, x
to the system Ax = b. It remains to go the other direction.
Let $z \in \ker\left(A^T\right)$ and suppose Ax = b. I need to verify b · z = 0. By Lemma 4.5.2,
b · z = Ax · z = x · AT z = x · 0 = 0
This implies the following corollary which is also called the Fredholm alternative. The
“alternative” becomes more clear in this corollary.
Corollary 4.5.4 Let A be an m × n matrix. Then A maps Rn onto Rm if and only if the
only solution to AT x = 0 is x = 0.
Proof: If the only solution to AT x = 0 is x = 0, then $\ker\left(A^T\right) = \{0\}$ and so $\ker\left(A^T\right)^{\perp} = R^m$
because every b ∈ Rm has the property that b · 0 = 0. Therefore, Ax = b has a solution
for any b ∈ Rm because the b for which there is a solution are those in $\ker\left(A^T\right)^{\perp}$ by
Theorem 4.5.3. In other words, A maps Rn onto Rm .
Conversely if A is onto, then by Theorem 4.5.3 every b ∈ Rm is in $\ker\left(A^T\right)^{\perp}$ and so if
AT x = 0, then b · x = 0 for every b. In particular, this holds for b = x. Hence if AT x = 0,
then x = 0.
Here is an amusing example.
Example 4.5.5 Let A be an m × n matrix in which m > n. Then A cannot map onto Rm .
The reason for this is that AT is an n × m matrix where m > n and so in the augmented matrix
\[
\begin{pmatrix} A^T \mid 0 \end{pmatrix}
\]
there must be some free variables. Thus there exists a nonzero vector x such that AT x = 0.
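A concrete instance may make this clearer; the matrix below is chosen only for illustration. Take
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix},\qquad
A^T = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.
\]
The equation $A^T z = 0$ says $z_1 + z_3 = 0$ and $z_2 + z_3 = 0$, so $\ker\left(A^T\right)$ is spanned by $z = (1, 1, -1)^T$. By Theorem 4.5.3, Ax = b has a solution exactly when $b \cdot z = b_1 + b_2 - b_3 = 0$. For $b = (1, 2, 3)^T$ this holds and $x = (1, 2)^T$ is a solution. For $b = (1, 2, 4)^T$ one gets $b \cdot z = -1 \neq 0$, so there is no solution, which shows directly that this A does not map onto $R^3$.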
4.6
Exercises
1. Let {u1 , · · · , un } be vectors in Rn . The parallelepiped determined by these vectors
P (u1 , · · · , un ) is defined as
\[
P(u_1, \cdots, u_n) \equiv \left\{ \sum_{k=1}^{n} t_k u_k : t_k \in [0, 1] \text{ for all } k \right\}.
\]
Now let A be an n × n matrix. Show that
{Ax : x ∈ P (u1 , · · · , un )}
is also a parallelepiped.
2. In the context of Problem 1, draw P (e1 , e2 ) where e1 , e2 are the standard basis vectors
for R2 . Thus e1 = (1, 0) , e2 = (0, 1) . Now suppose
\[
E = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\]
where E is the elementary matrix which takes the second row and adds to the first.
Draw
{Ex : x ∈ P (e1 , e2 )} .
In other words, draw the result of doing E to the vectors in P (e1 , e2 ). Next draw the
results of doing the other elementary matrices to P (e1 , e2 ).
3. In the context of Problem 1, either draw or describe the result of doing elementary
matrices to P (e1 , e2 , e3 ). Describe geometrically the conclusion of Corollary 4.3.7.
4. Consider a permutation of {1, 2, · · · , n}. This is an ordered list of numbers taken from
this list with no repeats, {i1 , i2 , · · · , in }. Define the permutation matrix P (i1 , i2 , · · · , in )
as the matrix which is obtained from the identity matrix by placing the j th column
of I as the $i_j$th column of P (i1 , i2 , · · · , in ) . Thus the 1 in the $i_j$th column of this permutation matrix occurs in the j th slot. What does this permutation matrix do to the
column vector $(1, 2, \cdots, n)^T$ ?
5. ↑Consider the 3 × 3 permutation matrices. List all of them and then determine the
dimension of their span. Recall that you can consider an m × n matrix as something
in $F^{nm}$.
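6. Determine which matrices are in row reduced echelon form.
(a) $\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 7 \end{pmatrix}$
(b) $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 5 \\ 0 & 0 & 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 0 & 1 & 3 \end{pmatrix}$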
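7. Row reduce the following matrices to obtain the row reduced echelon form. List the
pivot columns in the original matrix.
(a) $\begin{pmatrix} 1 & 2 & 0 & 3 \\ 2 & 1 & 2 & 2 \\ 1 & 1 & 0 & 3 \end{pmatrix}$
(b) $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & -2 \\ 3 & 0 & 0 \\ 3 & 2 & 1 \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 2 & 1 & 3 \\ -3 & 2 & 1 & 0 \\ 3 & 2 & 1 & 1 \end{pmatrix}$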
8. Find the rank and nullity of the following matrices. If the rank is r, identify r columns
in the original matrix which have the property that every other column may be
written as a linear combination of these.
(a) $\begin{pmatrix} 0 & 1 & 0 & 2 & 1 & 2 & 2 \\ 0 & 3 & 2 & 12 & 1 & 6 & 8 \\ 0 & 1 & 1 & 5 & 0 & 2 & 3 \\ 0 & 2 & 1 & 7 & 0 & 3 & 4 \end{pmatrix}$
(b) $\begin{pmatrix} 0 & 1 & 0 & 2 & 0 & 1 & 0 \\ 0 & 3 & 2 & 6 & 0 & 5 & 4 \\ 0 & 1 & 1 & 2 & 0 & 2 & 2 \\ 0 & 2 & 1 & 4 & 0 & 3 & 2 \end{pmatrix}$
(c) $\begin{pmatrix} 0 & 1 & 0 & 2 & 1 & 1 & 2 \\ 0 & 3 & 2 & 6 & 1 & 5 & 1 \\ 0 & 1 & 1 & 2 & 0 & 2 & 1 \\ 0 & 2 & 1 & 4 & 0 & 3 & 1 \end{pmatrix}$
9. Find the rank of the following matrices. If the rank is r, identify r columns in the
original matrix which have the property that every other column may be written
as a linear combination of these. Also find a basis for the row and column spaces of
the matrices.
(a) $\begin{pmatrix} 1 & 2 & 0 \\ 3 & 2 & 1 \\ 2 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix}$
(b) $\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 1 \\ 2 & 1 & 0 \\ 0 & 2 & 0 \end{pmatrix}$
(c) $\begin{pmatrix} 0 & 1 & 0 & 2 & 1 & 2 & 2 \\ 0 & 3 & 2 & 12 & 1 & 6 & 8 \\ 0 & 1 & 1 & 5 & 0 & 2 & 3 \\ 0 & 2 & 1 & 7 & 0 & 3 & 4 \end{pmatrix}$
(d) $\begin{pmatrix} 0 & 1 & 0 & 2 & 0 & 1 & 0 \\ 0 & 3 & 2 & 6 & 0 & 5 & 4 \\ 0 & 1 & 1 & 2 & 0 & 2 & 2 \\ 0 & 2 & 1 & 4 & 0 & 3 & 2 \end{pmatrix}$
(e) $\begin{pmatrix} 0 & 1 & 0 & 2 & 1 & 1 & 2 \\ 0 & 3 & 2 & 6 & 1 & 5 & 1 \\ 0 & 1 & 1 & 2 & 0 & 2 & 1 \\ 0 & 2 & 1 & 4 & 0 & 3 & 1 \end{pmatrix}$
10. Suppose A is an m × n matrix. Explain why the rank of A is always no larger than
min (m, n) .
11. Suppose A is an m × n matrix in which m ≤ n. Suppose also that the rank of A equals
m. Show that A maps Fn onto Fm . Hint: The vectors e1 , · · · , em occur as columns
in the row reduced echelon form for A.
12. Suppose A is an m × n matrix and that m > n. Show there exists b ∈ Fm such that
there is no solution to the equation
Ax = b.
13. Suppose A is an m × n matrix in which m ≥ n. Suppose also that the rank of A
equals n. Show that A is one to one. Hint: If not, there exists a vector, x ̸= 0 such
that Ax = 0, and this implies at least one column of A is a linear combination of the
others. Show this would require the column rank to be less than n.
14. Explain why an n × n matrix A is both one to one and onto if and only if its rank is
n.
15. Suppose A is an m × n matrix and {w1 , · · · , wk } is a linearly independent set of
vectors in A (Fn ) ⊆ Fm . Suppose also that Azi = wi . Show that {z1 , · · · , zk } is also
linearly independent.
16. Show rank (A + B) ≤ rank (A) + rank (B).
17. Suppose A is an m × n matrix, m ≥ n and the columns of A are independent. Suppose also that {z1 , · · · , zk } is a linearly independent set of vectors in Fn . Show that
{Az1 , · · · , Azk } is linearly independent.
18. Suppose A is an m × n matrix and B is an n × p matrix. Show that
dim (ker (AB)) ≤ dim (ker (A)) + dim (ker (B)) .
Hint: Consider the subspace, B (Fp ) ∩ ker (A) and suppose a basis for this subspace
is {w1 , · · · , wk } . Now suppose {u1 , · · · , ur } is a basis for ker (B) . Let {z1 , · · · , zk }
be such that Bzi = wi and argue that
ker (AB) ⊆ span (u1 , · · · , ur , z1 , · · · , zk ) .
19. Let m < n and let A be an m × n matrix. Show that A is not one to one.
20. Let A be an m × n real matrix and let b ∈ Rm . Show there exists a solution, x to the
system
\[
A^T A x = A^T b
\]
Next show that if x, x1 are two solutions, then Ax = Ax1 . Hint: First show that
$\left(A^T A\right)^T = A^T A$. Next show if $x \in \ker\left(A^T A\right)$, then Ax = 0. Finally apply the Fredholm alternative. Show $A^T b \in \ker\left(A^T A\right)^{\perp}$. This will give existence of a solution.
21. Show that in the context of Problem 20, if x is the solution there, then |b − Ax| ≤
|b − Ay| for every y. Thus Ax is the point of A (Rn ) which is closest to b of every
point in A (Rn ). This is a solution to the least squares problem.
22. ↑Here is a point in R4 : $(1, 2, 3, 4)^T$. Find the point in
\[
\operatorname{span}\left( \begin{pmatrix} 1 \\ 0 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 3 \\ 2 \end{pmatrix} \right)
\]
which is closest to the given point.
23. ↑Here is a point in R4 : $(1, 2, 3, 4)^T$. Find the point on the plane described by x + 2y −
4z + 4w = 0 which is closest to the given point.
24. Suppose A, B are two invertible n × n matrices. Show there exists a sequence of row
operations which when done to A yield B. Hint: Recall that every invertible matrix
is a product of elementary matrices.
25. If A is invertible and n × n and B is n × p, show that AB has the same null space as
B and also the same rank as B.
26. Here are two matrices in row reduced echelon form
\[
A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix},\quad
B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\]
Does there exist a sequence of row operations which when done to A will yield B?
Explain.
27. Is it true that an upper triangular matrix has rank equal to the number of nonzero
entries down the main diagonal?
28. Let {v1 , · · · , vn−1 } be vectors in Fn . Describe a systematic way to obtain a vector vn
which is perpendicular to each of these vectors. Hint: You might consider something
like this
\[
\det \begin{pmatrix}
e_1 & e_2 & \cdots & e_n \\
v_{11} & v_{12} & \cdots & v_{1n} \\
\vdots & \vdots & & \vdots \\
v_{(n-1)1} & v_{(n-1)2} & \cdots & v_{(n-1)n}
\end{pmatrix}
\]
where vij is the j th entry of the vector vi . This is a lot like the cross product.
29. Let A be an m × n matrix. Then ker (A) is a subspace of Fn . Is it true that every
subspace of Fn is the kernel or null space of some matrix? Prove or disprove.
30. Let A be an n×n matrix and let P ij be the permutation matrix which switches the ith
and j th rows of the identity. Show that P ij AP ij produces a matrix which is similar
to A which switches the ith and j th entries on the main diagonal.
31. Recall the procedure for finding the inverse of a matrix on Page 52. It was shown that
the procedure, when it works, finds the inverse of the matrix. Show that whenever
the matrix has an inverse, the procedure works.
Chapter 5
Some Factorizations
5.1
LU Factorization
An LU factorization of a matrix involves writing the given matrix as the product of a
lower triangular matrix which has the main diagonal consisting entirely of ones, L, and an
upper triangular matrix U in the indicated order. The L goes with “lower” and the U with
“upper”. It turns out many matrices can be written in this way and when this is possible,
people get excited about slick ways of solving the system of equations, Ax = y. The method
lacks generality but is of interest just the same.
Example 5.1.1 Can you write $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ in the form LU as just described?
To do so you would need
\[
\begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix}
\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}
= \begin{pmatrix} a & b \\ xa & xb + c \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
\]
Therefore, b = 1 and a = 0. Also, from the bottom rows, xa = 1 which can’t happen and
have a = 0. Therefore, you can’t write this matrix in the form LU. It has no LU factorization.
This is what I mean above by saying the method lacks generality.
Which matrices have an LU factorization? It turns out it is those whose row reduced
echelon form can be achieved without switching rows and which only involve row operations
of type 3 in which row j is replaced with a multiple of row i added to row j for i < j.
5.2 Finding An LU Factorization
There is a convenient procedure for finding an LU factorization. It turns out that it is
only necessary to keep track of the multipliers which are used to row reduce to upper
triangular form. This procedure is described in the following examples and is called the
multiplier method. It is due to Doolittle.


Example 5.2.1 Find an LU factorization for $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & -4 \\ 1 & 5 & 2 \end{pmatrix}$.
Write the matrix next to the identity matrix as shown.
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & -4 \\ 1 & 5 & 2 \end{pmatrix}.
\]
The process involves doing row operations to the matrix on the right while simultaneously
updating successive columns of the matrix on the left. First take −2 times the first row and
add to the second in the matrix on the right.



\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -10 \\ 1 & 5 & 2 \end{pmatrix}
\]
Note the method for updating the matrix on the left. The 2 in the second entry of the first
column is there because −2 times the first row of A added to the second row of A produced
a 0. Now replace the third row in the matrix on the right by −1 times the first row added
to the third. Thus the next step is
\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -10 \\ 0 & 3 & -1 \end{pmatrix}
\]
Finally, add the second row to the bottom row and make the following changes
\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -10 \\ 0 & 0 & -11 \end{pmatrix}.
\]
At this point, stop because the matrix on the right is upper triangular. An LU factorization
is the above.
The justification for this gimmick will be given later.
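As a quick check of Example 5.2.1 (a worked multiplication added here for illustration, not a proof of the method), multiplying the two matrices obtained at the last step does return A:

```latex
\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -10 \\ 0 & 0 & -11 \end{pmatrix}
=
\begin{pmatrix}
1 & 2 & 3 \\
2 \cdot 1 + 0 & 2 \cdot 2 - 3 & 2 \cdot 3 - 10 \\
1 - 0 & 2 + 3 & 3 + 10 - 11
\end{pmatrix}
=
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & -4 \\ 1 & 5 & 2 \end{pmatrix}.
\]
```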


Example 5.2.2 Find an LU factorization for $A = \begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 2 & 0 & 2 & 1 & 1 \\ 2 & 3 & 1 & 3 & 2 \\ 1 & 0 & 1 & 1 & 2 \end{pmatrix}$.
This time everything is done at once for a whole column. This saves trouble. First
multiply the first row by (−1) and then add to the last row. Next take (−2) times the first
and add to the second and then (−2) times the first and add to the third.



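\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 0 & -4 & 0 & -3 & -1 \\ 0 & -1 & -1 & -1 & 0 \\ 0 & -2 & 0 & -1 & 1 \end{pmatrix}.
\]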
This finishes the first column of L and the first column of U. Now take − (1/4) times the
second row in the matrix on the right and add to the third followed by − (1/2) times the
second added to the last.



\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 2 & 1/4 & 1 & 0 \\ 1 & 1/2 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 0 & -4 & 0 & -3 & -1 \\ 0 & 0 & -1 & -1/4 & 1/4 \\ 0 & 0 & 0 & 1/2 & 3/2 \end{pmatrix}
\]
This finishes the second column of L as well as the second column of U. Since the matrix
on the right is upper triangular, stop. The LU factorization has now been obtained. This
technique is called Doolittle’s method.
This process is entirely typical of the general case. The matrix U is just the first upper
triangular matrix you come to in your quest for the row reduced echelon form using only
the row operation which involves replacing a row by itself added to a multiple of another
row. The matrix L is what you get by updating the identity matrix as illustrated above.
You should note that for a square matrix, the number of row operations necessary to
reduce to LU form is about half the number needed to place the matrix in row reduced
echelon form. This is why an LU factorization is of interest in solving systems of equations.
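The multiplier method described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `lu_multiplier_method` is made up here, exact arithmetic is done with `fractions.Fraction`, and the routine simply gives up with an error when a row switch would be needed.

```python
from fractions import Fraction

def lu_multiplier_method(A):
    """Return (L, U) with A = L U, using only the row operation
    'add a multiple of a row to a lower row'.
    Raises ValueError if a row switch would be required."""
    m, n = len(A), len(A[0])
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    r = 0  # current pivot row, which is also the column of L being filled
    for col in range(n):
        if r >= m - 1:
            break  # nothing left below the pivot row
        pivot = U[r][col]
        if pivot == 0:
            if any(U[i][col] != 0 for i in range(r + 1, m)):
                raise ValueError("a row switch is needed")
            continue  # column is already zero from row r down
        for i in range(r + 1, m):
            mult = U[i][col] / pivot
            L[i][r] = mult  # record the negative of the multiplier (-mult) used
            U[i] = [ui - mult * ur for ui, ur in zip(U[i], U[r])]
        r += 1
    return L, U

# Example 5.2.1
L, U = lu_multiplier_method([[1, 2, 3], [2, 1, -4], [1, 5, 2]])
```

For the matrix of Example 5.2.1 this returns the same L and U found by hand above. The matrix of Example 5.1.1 triggers the `ValueError`, in line with the fact that it has no LU factorization.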
5.3 Solving Linear Systems Using An LU Factorization
The reason people care about the LU factorization is it allows the quick solution of systems
of equations. Here is an example.




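Example 5.3.1 Suppose you want to find the solutions to
\[
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 4 & 3 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.
\]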
Of course one way is to write the augmented matrix and grind away. However, this
involves more row operations than the computation of an LU factorization and it turns out
that an LU factorization can give the solution quickly. Here is how. The following is an LU
factorization for the matrix.

 


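\[
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 4 & 3 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix}.
\]
Let Ux = y and consider Ly = b where in this case, $b = (1, 2, 3)^T$. Thus
\[
\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\]
which yields very quickly that $y = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix}$. Now you can find x by solving Ux = y. Thus
in this case,
\[
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
= \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix}
\]
which yields
\[
\mathbf{x} = \begin{pmatrix} -\frac{3}{5} + \frac{7}{5} t \\ \frac{9}{5} - \frac{11}{5} t \\ t \\ -1 \end{pmatrix}, \quad t \in \mathbb{R}.
\]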
Work this out by hand and you will see the advantage of working only with triangular
matrices.
It may seem like a trivial thing but it is used because it cuts down on the number of
operations involved in finding a solution to a system of equations enough that it makes a
difference for large systems.
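The two triangular solves can be sketched as follows. This is a minimal illustration, not a general solver: the names `forward_substitution` and `back_substitution` are chosen here, L is assumed to have ones on its diagonal, and the back substitution handles only a square U with nonzero diagonal (so it does not cover the free variable t of Example 5.3.1). The right side `b2` is a made-up vector chosen so that the answer is (1, 1, 1).

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L y = b where L is lower triangular with ones on the diagonal."""
    y = []
    for i, row in enumerate(L):
        y.append(Fraction(b[i]) - sum(row[j] * y[j] for j in range(i)))
    return y

def back_substitution(U, y):
    """Solve U x = y where U is square, upper triangular, nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - s) / U[i][i]
    return x

# First stage of Example 5.3.1: solve L y = b.
y = forward_substitution([[1, 0, 0], [4, 1, 0], [1, 0, 1]], [1, 2, 3])

# Both stages for the square matrix of Example 5.2.1, with a made-up b2.
L2 = [[1, 0, 0], [2, 1, 0], [1, -1, 1]]
U2 = [[1, 2, 3], [0, -3, -10], [0, 0, -11]]
b2 = [6, -1, 8]
x2 = back_substitution(U2, forward_substitution(L2, b2))
```

Each solve touches every entry of a triangular matrix once, which is where the saving over repeated row reduction comes from when many right sides are used with the same matrix.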
5.4 The PLU Factorization
As indicated above, some matrices don’t have an LU factorization. Here is an example.


\[
M = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 2 & 3 & 0 \\ 4 & 3 & 1 & 1 \end{pmatrix} \tag{5.1}
\]
In this case, there is another factorization which is useful called a PLU factorization. Here
P is a permutation matrix.
Example 5.4.1 Find a PLU factorization for the above matrix in 5.1.
Proceed as before trying to find the row echelon form of the matrix. First add −1 times
the first row to the second row and then add −4 times the first to the third. This yields



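\[
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 4 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 0 & -2 \\ 0 & -5 & -11 & -7 \end{pmatrix}
\]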
There is no way to do only row operations involving replacing a row with itself added to a
multiple of another row to the second matrix in such a way as to obtain an upper triangular
matrix. Therefore, consider M with the bottom two rows switched.


\[
M' = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 4 & 3 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{pmatrix}.
\]
Now try again with this matrix. First take −1 times the first row and add to the bottom
row and then take −4 times the first row and add to the second row. This yields



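\[
\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix}
\]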
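The second matrix is upper triangular and so the LU factorization of the matrix M′ is
\[
\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix}.
\]
Thus M′ = PM = LU where L and U are given above. Therefore, $M = P^2 M = PLU$ and so
\[
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 2 & 3 & 0 \\ 4 & 3 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix}
\]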
This process can always be followed and so there always exists a PLU factorization of a
given matrix even though there isn’t always an LU factorization.
Example 5.4.2 Use a PLU factorization of $M \equiv \begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 2 & 3 & 0 \\ 4 & 3 & 1 & 1 \end{pmatrix}$ to solve the system
Mx = b where $b = (1, 2, 3)^T$.
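Let Ux = y and consider PLy = b. In other words, solve,
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.
\]
Then multiplying both sides by P gives
\[
\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}
\]
and so
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}.
\]
Now Ux = y and so it only remains to solve
\[
\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
\]
which yields
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} \frac{1}{5} + \frac{7}{5} t \\ \frac{9}{10} - \frac{11}{5} t \\ t \\ -\frac{1}{2} \end{pmatrix} : t \in \mathbb{R}.
\]
5.5 Justification For The Multiplier Method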
Why does the multiplier method work for finding an LU factorization? Suppose A is a
matrix which has the property that the row reduced echelon form for A may be achieved
using only the row operations which involve replacing a row with itself added to a multiple
of another row. It is not ever necessary to switch rows. Thus every row which is replaced
using this row operation in obtaining the echelon form may be modified by using a row
which is above it. Furthermore, in the multiplier method for finding the LU factorization,
we zero out the elements below the pivot entry in first column and then the next and so on
when scanning from the left. In terms of elementary matrices, this means the row operations
used to reduce A to upper triangular form correspond to multiplication on the left by lower
triangular matrices having all ones down the main diagonal and the sequence of elementary
matrices which row reduces A has the property that in scanning the list of elementary
matrices from the right to the left, this list consists of several matrices which involve only
changes from the identity in the first column, then several which involve only changes from
the identity in the second column and so forth. More precisely, Ep · · · E1 A = U where U is
upper triangular, Ek having all zeros below the main diagonal except for a single column.
Therefore,
\[
A = \underbrace{E_1^{-1} \cdots E_{p-1}^{-1} E_p^{-1}}_{\text{will be } L} \, U.
\]
You multiply the inverses in the reverse order. Now each
of the $E_i^{-1}$ is also lower triangular with 1 down the main diagonal. Therefore their product
has this property. Recall also that if $E_i$ equals the identity matrix except for having an a
in a single column somewhere below the main diagonal, $E_i^{-1}$ is obtained by replacing the a
in $E_i$ with −a, thus explaining why we replace with −1 times the multiplier in computing
L. In the case where A is a 3 × m matrix, $E_1^{-1} \cdots E_{p-1}^{-1} E_p^{-1}$ is of the form
\[
\begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ b & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & c & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{pmatrix}.
\]
Note that scanning from left to right, the first two in the product involve changes in the
identity only in the first column while in the third matrix, the change is only in the second.
If the entries in the first column had been zeroed out in a different order, the following
would have resulted.



 

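\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ b & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & c & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{pmatrix}
\]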
However, it is important to be working from the left to the right, one column at a time.
A similar observation holds in any dimension. Multiplying the elementary matrices which
involve a change only in the j th column you obtain A equal to an upper triangular, n × m
matrix U which is multiplied by a sequence of lower triangular matrices on its left which is
of the following form, in which the aij are negatives of multipliers used in row reducing to
an upper triangular matrix.


 

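\[
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
a_{11} & 1 & & \vdots \\
\vdots & & \ddots & 0 \\
a_{1,n-1} & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & & \vdots \\
\vdots & \vdots & \ddots & 0 \\
0 & a_{2,n-2} & \cdots & 1
\end{pmatrix}
\cdots
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & a_{n,n-1} & 1
\end{pmatrix}
\]
From the matrix multiplication, this product equals
\[
\begin{pmatrix}
1 & & & \\
a_{11} & 1 & & \\
\vdots & & \ddots & \\
a_{1,n-1} & \cdots & a_{n,n-1} & 1
\end{pmatrix}
\]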
Notice how the end result of the matrix multiplication made no change in the aij . It just
filled in the empty spaces with the aij which occurred in one of the matrices in the product.
This is why, in computing L, it is sufficient to begin with the left column and work column
by column toward the right, replacing entries with the negative of the multiplier used in the
row operation which produces a zero in that entry.
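The "filling in" just described can be checked numerically on Example 5.2.1. The sketch below uses hypothetical helper names (`elementary`, `matmul`) and 0-based indices; it multiplies the inverses of the three elementary matrices in the order in which the row operations were done.

```python
def elementary(n, i, j, a):
    """Identity matrix of size n with the extra entry a in row i, column j (i > j)."""
    E = [[int(r == c) for c in range(n)] for r in range(n)]
    E[i][j] = a
    return E

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Inverses of the row operations of Example 5.2.1, in the order they were applied.
E1_inv = elementary(3, 1, 0, 2)    # undoes "add -2 times row 1 to row 2"
E2_inv = elementary(3, 2, 0, 1)    # undoes "add -1 times row 1 to row 3"
E3_inv = elementary(3, 2, 1, -1)   # undoes "add row 2 to row 3"

L = matmul(matmul(E1_inv, E2_inv), E3_inv)
```

The product is the matrix L of Example 5.2.1, with each off-diagonal entry sitting exactly where it sat in its own elementary matrix.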
5.6 Existence For The PLU Factorization
Here I will consider an invertible n × n matrix and show that such a matrix always has
a P LU factorization. More general matrices could also be considered but this is all I will
present.
Let A be such an invertible matrix and consider the first column of A. If A11 ̸= 0, use
this to zero out everything below it. The entry A11 is called the pivot. Thus in this case
there is a lower triangular matrix L1 which has all ones on the diagonal such that
\[
L_1 P_1 A = \begin{pmatrix} * & * \\ 0 & A_1 \end{pmatrix} \tag{5.2}
\]
Here P1 = I. In case A11 = 0, let r be such that Ar1 ̸= 0 and r is the first entry for which
this happens. In this case, let P1 be the permutation matrix which switches the first row
and the rth row. Then as before, there exists a lower triangular matrix L1 which has all
ones on the diagonal such that 5.2 holds in this case also. In the first column, this L1 has
zeros between the first row and the rth row.
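Go to $A_1$. Following the same procedure as above, there exists a lower triangular matrix
and permutation matrix $L_2'$, $P_2'$ such that
\[
L_2' P_2' A_1 = \begin{pmatrix} * & * \\ 0 & A_2 \end{pmatrix}
\]
Let
\[
L_2 = \begin{pmatrix} 1 & 0 \\ 0 & L_2' \end{pmatrix}, \quad
P_2 = \begin{pmatrix} 1 & 0 \\ 0 & P_2' \end{pmatrix}
\]
Then using block multiplication, Theorem 3.5.2,
\[
\begin{aligned}
\begin{pmatrix} 1 & 0 \\ 0 & L_2' \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & P_2' \end{pmatrix}
\begin{pmatrix} * & * \\ 0 & A_1 \end{pmatrix}
&= \begin{pmatrix} 1 & 0 \\ 0 & L_2' \end{pmatrix}
\begin{pmatrix} * & * \\ 0 & P_2' A_1 \end{pmatrix}
= \begin{pmatrix} * & * \\ 0 & L_2' P_2' A_1 \end{pmatrix} \\
&= \begin{pmatrix} * & \cdots & * \\ 0 & * & * \\ 0 & 0 & A_2 \end{pmatrix}
= L_2 P_2 L_1 P_1 A
\end{aligned}
\]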
and L2 has all the subdiagonal entries equal to 0 except possibly some nonzero entries in
the second column starting with position r2 where P2 switches rows r2 and 2. Continuing
this way, it follows there are lower triangular matrices Lj having all ones down the diagonal
and permutation matrices Pi which switch only two rows such that
\[
L_{n-1} P_{n-1} L_{n-2} P_{n-2} L_{n-3} \cdots L_2 P_2 L_1 P_1 A = U \tag{5.3}
\]
where U is upper triangular. The matrix Lj has all zeros below the main diagonal except
for the j th column and even in this column it has zeros between position j and rj where Pj
switches rows j and rj . Of course in the case where no switching is necessary, you could get
all nonzero entries below the main diagonal in the j th column for Lj .
The fact that Lj is the identity except for the j th column means that each Pk for k > j
almost commutes with Lj . Say Pk switches the k th and the q th rows for q ≥ k > j. When
you place Pk on the right of Lj it just switches the k th and the q th columns and leaves the
j th column unchanged. Therefore, the same result as placing Pk on the left of Lj can be
obtained by placing Pk on the right of Lj and modifying Lj by switching the k th and the q th
entries in the j th column. (Note this could possibly interchange a 0 for something nonzero.)
It follows from 5.3 there exists P, the product of permutation matrices, P = Pn−1 · · · P1
each of which switches two rows, and L a lower triangular matrix having all ones on the
main diagonal, L = L′n−1 · · · L′2 L′1 , where the L′j are obtained as just described by moving a
succession of Pk from the left to the right of Lj and modifying the j th column as indicated,
such that
\[
LPA = U.
\]
Then
\[
A = P^T L^{-1} U
\]
It is customary to write this more simply as
\[
A = PLU
\]
where L is a lower triangular matrix having all ones on the diagonal and P is a permutation
matrix consisting of P1 · · · Pn−1 as described above. This proves the following theorem.
Theorem 5.6.1 Let A be any invertible n × n matrix. Then there exists a permutation
matrix P and a lower triangular matrix L having all ones on the main diagonal and an
upper triangular matrix U such that
A = P LU
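The existence argument is constructive, and a minimal sketch of it for an invertible square matrix might look like this. The function name `plu`, the use of `Fraction`, and the test matrix (columns 1, 2 and 4 of the matrix M in 5.1, chosen here only because it is invertible and needs a row switch) are all assumptions made for illustration. As in the argument above, the pivot row is the first row at or below the diagonal with a nonzero entry.

```python
from fractions import Fraction

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def plu(A):
    """Return (P, L, U) with A = P L U for an invertible square matrix A."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]   # multipliers; diagonal set at the end
    perm = list(range(n))   # perm[i] = index of the row of A now sitting in row i
    for k in range(n):
        r = next((i for i in range(k, n) if U[i][k] != 0), None)
        if r is None:
            raise ValueError("matrix is not invertible")
        if r != k:
            U[k], U[r] = U[r], U[k]
            L[k], L[r] = L[r], L[k]   # multipliers already recorded move with their rows
            perm[k], perm[r] = perm[r], perm[k]
        for i in range(k + 1, n):
            mult = U[i][k] / U[k][k]
            L[i][k] = mult
            U[i] = [a - mult * b for a, b in zip(U[i], U[k])]
    for i in range(n):
        L[i][i] = Fraction(1)
    # Row i of L U equals row perm[i] of A, so A = P (L U) with P[perm[i]][i] = 1.
    P = [[Fraction(0)] * n for _ in range(n)]
    for i, src in enumerate(perm):
        P[src][i] = Fraction(1)
    return P, L, U

A = [[1, 2, 2], [1, 2, 0], [4, 3, 1]]
P, L, U = plu(A)
```

Swapping the recorded multipliers along with the rows is the code counterpart of moving each $P_k$ past the earlier $L_j$ and switching two entries in the j th column.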
5.7 The QR Factorization
As pointed out above, the LU factorization is not a mathematically respectable thing because it does not always exist. There is another factorization which does always exist.
Much more can be said about it than I will say here. At this time, I will only deal with real
matrices and so the inner product will be the usual real dot product.
Definition 5.7.1 An n × n real matrix Q is called an orthogonal matrix if
\[
QQ^T = Q^T Q = I.
\]
Thus an orthogonal matrix is one whose inverse is equal to its transpose.
First note that if a matrix is orthogonal this says
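\[
\sum_j Q^T_{ij} Q_{jk} = \sum_j Q_{ji} Q_{jk} = \delta_{ik}
\]
Thus
\[
\begin{aligned}
|Qx|^2 &= \sum_i \Big( \sum_j Q_{ij} x_j \Big)^2 = \sum_i \sum_r \sum_s Q_{is} x_s Q_{ir} x_r \\
&= \sum_i \sum_r \sum_s Q_{is} Q_{ir} x_s x_r = \sum_r \sum_s \sum_i Q_{is} Q_{ir} x_s x_r \\
&= \sum_r \sum_s \delta_{sr} x_s x_r = \sum_r x_r^2 = |x|^2
\end{aligned}
\]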
This shows that orthogonal transformations preserve distances. You can show that if you
have a matrix which does preserve distances, then it must be orthogonal also.
Example 5.7.2 One of the most important examples of an orthogonal matrix is the so
called Householder matrix. You have v a unit vector and you form the matrix
\[
I - 2vv^T
\]
This is an orthogonal matrix which is also symmetric. To see this, you use the rules of
matrix operations.
\[
\left( I - 2vv^T \right)^T = I^T - \left( 2vv^T \right)^T = I - 2vv^T
\]
so it is symmetric. Now to show it is orthogonal,
\[
\begin{aligned}
\left( I - 2vv^T \right) \left( I - 2vv^T \right) &= I - 2vv^T - 2vv^T + 4vv^T vv^T \\
&= I - 4vv^T + 4vv^T = I
\end{aligned}
\]
because $v^T v = v \cdot v = |v|^2 = 1$. Therefore, this is an example of an orthogonal matrix.
Consider the following problem.
Problem 5.7.3 Given two vectors x, y such that |x| = |y| ̸= 0 but x ̸= y and you want an
orthogonal matrix Q such that Qx = y and Qy = x. The thing which works is the Householder matrix
\[
Q \equiv I - 2 \frac{x - y}{|x - y|^2} (x - y)^T
\]
Here is why this works.
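\[
\begin{aligned}
Q(x - y) &= (x - y) - 2 \frac{x - y}{|x - y|^2} (x - y)^T (x - y) \\
&= (x - y) - 2 \frac{x - y}{|x - y|^2} |x - y|^2 = y - x
\end{aligned}
\]
\[
\begin{aligned}
Q(x + y) &= (x + y) - 2 \frac{x - y}{|x - y|^2} (x - y)^T (x + y) \\
&= (x + y) - 2 \frac{x - y}{|x - y|^2} \big( (x - y) \cdot (x + y) \big) \\
&= (x + y) - 2 \frac{x - y}{|x - y|^2} \left( |x|^2 - |y|^2 \right) = x + y
\end{aligned}
\]
Hence
\[
Qx + Qy = x + y, \qquad Qx - Qy = y - x.
\]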
Adding these equations, 2Qx = 2y and subtracting them yields 2Qy = 2x.
A picture of the geometric significance follows.
[Figure: the vectors x and y on either side of a dotted line.]
The orthogonal matrix Q reflects across the dotted line taking x to y and y to x.
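Here is a small numerical illustration of Problem 5.7.3, a sketch only. The names `householder_swap` and `apply` are made up for this example, floating point arithmetic is used, and the vectors x = (3, 4), y = (5, 0) are chosen here because they have the same length.

```python
def householder_swap(x, y):
    """Return Q = I - 2 (x - y)(x - y)^T / |x - y|^2 as a list of rows.
    Assumes |x| = |y| and x != y."""
    d = [a - b for a, b in zip(x, y)]
    dd = sum(t * t for t in d)          # |x - y|^2
    n = len(x)
    return [[(1.0 if i == j else 0.0) - 2.0 * d[i] * d[j] / dd
             for j in range(n)] for i in range(n)]

def apply(Q, v):
    """Matrix times vector."""
    return [sum(q * t for q, t in zip(row, v)) for row in Q]

x = [3.0, 4.0]
y = [5.0, 0.0]
Q = householder_swap(x, y)
Qx = apply(Q, x)   # should be y, up to rounding
Qy = apply(Q, y)   # should be x, up to rounding
```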
Definition 5.7.4 Let A be an m × n matrix. Then a QR factorization of A consists of two
matrices, Q orthogonal and R upper triangular (right triangular) having all the entries on
the main diagonal nonnegative such that A = QR.
With the solution to this simple problem, here is how to obtain a QR factorization for
any matrix A. Let
\[
A = (a_1, a_2, \cdots, a_n)
\]
where the $a_i$ are the columns. If $a_1 = 0$, let $Q_1 = I$. If $a_1 \neq 0$, let
\[
b \equiv \begin{pmatrix} |a_1| \\ 0 \\ \vdots \\ 0 \end{pmatrix}
\]
and form the Householder matrix
\[
Q_1 \equiv I - 2 \frac{(a_1 - b)}{|a_1 - b|^2} (a_1 - b)^T
\]
(if it happens that $a_1 = b$ already, let $Q_1 = I$). As in the above problem $Q_1 a_1 = b$ and so
\[
Q_1 A = \begin{pmatrix} |a_1| & * \\ 0 & A_2 \end{pmatrix}
\]
where $A_2$ is an (m − 1) × (n − 1) matrix. Now find in the same way as was just done an (m − 1) × (m − 1)
matrix $\hat{Q}_2$ such that
\[
\hat{Q}_2 A_2 = \begin{pmatrix} * & * \\ 0 & A_3 \end{pmatrix}
\]
Let
\[
Q_2 \equiv \begin{pmatrix} 1 & 0 \\ 0 & \hat{Q}_2 \end{pmatrix}.
\]
Then
\[
Q_2 Q_1 A = \begin{pmatrix} 1 & 0 \\ 0 & \hat{Q}_2 \end{pmatrix}
\begin{pmatrix} |a_1| & * \\ 0 & A_2 \end{pmatrix}
= \begin{pmatrix} |a_1| & * & * \\ \vdots & * & * \\ 0 & 0 & A_3 \end{pmatrix}
\]
Continuing this way until the result is upper triangular, you get a sequence of orthogonal
matrices $Q_p Q_{p-1} \cdots Q_1$ such that
\[
Q_p Q_{p-1} \cdots Q_1 A = R \tag{5.4}
\]
where R is upper triangular.
Now if $Q_1$ and $Q_2$ are orthogonal, then from properties of matrix multiplication,
\[
Q_1 Q_2 (Q_1 Q_2)^T = Q_1 Q_2 Q_2^T Q_1^T = Q_1 I Q_1^T = I
\]
and similarly
\[
(Q_1 Q_2)^T Q_1 Q_2 = I.
\]
Thus the product of orthogonal matrices is orthogonal. Also the transpose of an orthogonal
matrix is orthogonal directly from the definition. Therefore, from 5.4
\[
A = (Q_p Q_{p-1} \cdots Q_1)^T R \equiv QR.
\]
This proves the following theorem.
Theorem 5.7.5 Let A be any real m × n matrix. Then there exists an orthogonal matrix
Q and an upper triangular matrix R having nonnegative entries on the main diagonal such
that
A = QR
and this factorization can be accomplished in a systematic manner.
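The systematic procedure behind Theorem 5.7.5 can be sketched with plain Python lists. This is a rough illustration under stated assumptions, not a numerically careful implementation: the helper names are made up, the tolerance `1e-24` is an arbitrary choice, and the 3 × 2 test matrix is invented for the example. Each step reflects the current column onto a nonnegative multiple of the first unit vector, exactly as in the construction above.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def householder_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal, R upper triangular
    with nonnegative diagonal entries."""
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]
    Qt = identity(m)                      # accumulates Q_p ... Q_1
    for k in range(min(m, n)):
        a = [R[i][k] for i in range(k, m)]        # column k, from the diagonal down
        norm = math.sqrt(sum(t * t for t in a))
        d = a[:]                                  # d = a - b, with b = (|a|, 0, ..., 0)
        d[0] -= norm
        dd = sum(t * t for t in d)
        if dd <= 1e-24 * max(1.0, norm * norm):
            continue                              # a is zero or already equals b: use I
        Qk = identity(m)                          # Householder block in the lower right
        for i in range(k, m):
            for j in range(k, m):
                Qk[i][j] -= 2.0 * d[i - k] * d[j - k] / dd
        R = matmul(Qk, R)
        Qt = matmul(Qk, Qt)
    return transpose(Qt), R

A = [[3, 1], [4, 2], [0, 2]]
Q, R = householder_qr(A)
```

Library routines usually pick the sign of the reflection to avoid cancellation, which can make diagonal entries of R negative; the version here follows the text and keeps them nonnegative instead.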
5.8 Exercises

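1. Find an LU factorization of $\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 3 \\ 1 & 2 & 3 \end{pmatrix}$.
2. Find an LU factorization of $\begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 3 & 2 & 1 \\ 5 & 0 & 1 & 3 \end{pmatrix}$.
3. Find a PLU factorization of $\begin{pmatrix} 1 & 2 & 1 \\ 1 & 2 & 2 \\ 2 & 1 & 1 \end{pmatrix}$.
4. Find a PLU factorization of $\begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 2 & 4 & 2 & 4 & 1 \\ 1 & 2 & 1 & 3 & 2 \end{pmatrix}$.
5. Find a PLU factorization of $\begin{pmatrix} 1 & 2 & 1 \\ 1 & 2 & 2 \\ 2 & 4 & 1 \\ 3 & 2 & 1 \end{pmatrix}$.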
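6. Is there only one LU factorization for a given matrix? Hint: Consider the equation
\[
\begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
\]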
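7. Here is a matrix and an LU factorization of it.
\[
A = \begin{pmatrix} 1 & 2 & 5 & 0 \\ 1 & 1 & 4 & 9 \\ 0 & 1 & 2 & 5 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 5 & 0 \\ 0 & -1 & -1 & 9 \\ 0 & 0 & 1 & 14 \end{pmatrix}
\]
Use this factorization to solve the system of equations
\[
Ax = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
\]
8. Find a QR factorization for the matrix
\[
\begin{pmatrix} 1 & 2 & 1 \\ 3 & -2 & 1 \\ 1 & 0 & 2 \end{pmatrix}
\]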
9. Find a QR factorization for the matrix
\[
\begin{pmatrix} 1 & 2 & 1 & 0 \\ 3 & 0 & 1 & 1 \\ 1 & 0 & 2 & 1 \end{pmatrix}
\]
10. If you had a QR factorization, A = QR, describe how you could use it to solve the
equation Ax = b.
11. If Q is an orthogonal matrix, show the columns are an orthonormal set. That is show
that for
\[
Q = \begin{pmatrix} q_1 & \cdots & q_n \end{pmatrix}
\]
it follows that $q_i \cdot q_j = \delta_{ij}$. Also show that any orthonormal set of vectors is linearly
independent.
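12. Show you can’t expect uniqueness for QR factorizations. Consider
\[
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}
\]
and verify this equals
\[
\begin{pmatrix} 0 & 1 & 0 \\ \frac{1}{2}\sqrt{2} & 0 & \frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} & 0 & -\frac{1}{2}\sqrt{2} \end{pmatrix}
\begin{pmatrix} 0 & 0 & \sqrt{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
and also
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}.
\]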
Using Definition 5.7.4, can it be concluded that if A is an invertible matrix it will
follow there is only one QR factorization?
13. Suppose {a1 , · · · , an } are linearly independent vectors in Rn and let
\[
A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}
\]
Form a QR factorization for A.
\[
\begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}
= \begin{pmatrix} q_1 & \cdots & q_n \end{pmatrix}
\begin{pmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & r_{nn}
\end{pmatrix}
\]
Show that for each k ≤ n,
\[
\operatorname{span}(a_1, \cdots, a_k) = \operatorname{span}(q_1, \cdots, q_k)
\]
Prove that every subspace of Rn has an orthonormal basis. The procedure just described is similar to the Gram Schmidt procedure which will be presented later.
14. Suppose Qn Rn converges to an orthogonal matrix Q where Qn is orthogonal and Rn
is upper triangular having all positive entries on the diagonal. Show that then Qn
converges to Q and Rn converges to the identity.
Chapter 6
Linear Programming
6.1 Simple Geometric Considerations
One of the most important uses of row operations is in solving linear programming problems
which involve maximizing a linear function subject to inequality constraints determined
from linear equations. Here is an example. A certain hamburger store has 9000 hamburger
patties to use in one week and a limitless supply of special sauce, lettuce, tomatoes, onions,
and buns. They sell two types of hamburgers, the big stack and the basic burger. It has also
been determined that the employees cannot prepare more than 9000 of either type in one
week. The big stack, popular with the teenagers from the local high school, involves two
patties, lots of delicious sauce, condiments galore, and a divider between the two patties.
The basic burger, very popular with children, involves only one patty and some pickles
and ketchup. Demand for the basic burger is twice what it is for the big stack. What
is the maximum number of hamburgers which could be sold in one week given the above
limitations?
Let x be the number of basic burgers and y the number of big stacks which could be sold
in a week. Thus it is desired to maximize z = x + y subject to the above constraints. The
total number of patties is 9000 and so the number of patties used is x + 2y. This number must
satisfy x + 2y ≤ 9000 because there are only 9000 patties available. Because of the limitation
on the number the employees can prepare and the demand, it follows 2x + y ≤ 9000.
You never sell a negative number of hamburgers and so x, y ≥ 0. In simpler terms the
problem reduces to maximizing z = x + y subject to the two constraints, x + 2y ≤ 9000 and
2x + y ≤ 9000. This problem is pretty easy to solve geometrically. Consider the following
picture in which R labels the region described by the above inequalities and the line z = x+y
is shown for a particular value of z.
[Figure: the region R bounded by the lines 2x + y = 4 and x + 2y = 4, together with the line x + y = z.]
As you make z larger this line moves away from the origin, always having the same slope
and the desired solution would consist of a point in the region, R which makes z as large as
possible or equivalently one for which the line is as far as possible from the origin. Clearly
this point is the point of intersection of the two lines, (3000, 3000) and so the maximum
value of the given function is 6000. Of course this type of procedure is fine for a situation in
which there are only two variables but what about a similar problem in which there are very
many variables? In reality, this hamburger store makes many more types of burgers than
those two and there are many considerations other than demand and available patties. Each
will likely give you a constraint which must be considered in order to solve a more realistic
problem and the end result will likely be a problem in many dimensions, probably many
more than three so your ability to draw a picture will get you nowhere for such a problem.
Another method is needed. This method is the topic of this section. I will illustrate with
this particular problem. Let x1 = x and y = x2 . Also let x3 and x4 be nonnegative variables
such that
x1 + 2x2 + x3 = 9000, 2x1 + x2 + x4 = 9000.
To say that x3 and x4 are nonnegative is the same as saying x1 + 2x2 ≤ 9000 and 2x1 + x2 ≤
9000 and these variables are called slack variables at this point. They are called this because
they “take up the slack”. I will discuss these more later. First a general situation is
considered.
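The geometric solution of the hamburger problem can be imitated by brute force: intersect the boundary lines in pairs, keep the intersection points which satisfy every constraint, and pick the one with the largest x + y. The sketch below is only an illustration for two variables. The function name `corners` and the encoding of each constraint as a triple (a, b, c) meaning ax + by ≤ c are choices made here.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a, b, c) means a*x + b*y <= c.
constraints = [
    (1, 2, 9000),    # x + 2y <= 9000
    (2, 1, 9000),    # 2x + y <= 9000
    (-1, 0, 0),      # x >= 0
    (0, -1, 0),      # y >= 0
]

def corners(cons):
    """Intersection points of pairs of boundary lines which are feasible."""
    pts = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue                      # parallel boundary lines
        x = Fraction(c1 * b2 - c2 * b1, det)   # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in cons):
            pts.add((x, y))
    return pts

best = max(corners(constraints), key=lambda p: p[0] + p[1])
z_max = best[0] + best[1]
```

With many variables the number of candidate corners grows far too quickly for this to be practical, which is one way to see why a more systematic method is wanted.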
6.2 The Simplex Tableau
Here is some notation.
Definition 6.2.1 Let x, y be vectors in Rq . Then x ≤ y means for each i, xi ≤ yi .
The problem is as follows:
Let A be an m × (m + n) real matrix of rank m. It is desired to find x ∈ Rn+m such
that x satisfies the constraints,
x ≥ 0, Ax = b
(6.1)
and out of all such x,
\[
z \equiv \sum_{i=1}^{m+n} c_i x_i
\]
is as large (or small) as possible. This is usually referred to as maximizing or minimizing z
subject to the above constraints. First I will consider the constraints.
Let $A = \begin{pmatrix} a_1 & \cdots & a_{n+m} \end{pmatrix}$. First you find a vector, $x^0 \geq 0$, $Ax^0 = b$ such that n of
the components of this vector equal 0. Letting $i_1, \cdots, i_n$ be the positions of $x^0$ for which
$x^0_i = 0$, suppose also that $\{a_{j_1}, \cdots, a_{j_m}\}$ is linearly independent for $j_i$ the other positions
of $x^0$. Geometrically, this means that $x^0$ is a corner of the feasible region, those x which
satisfy the constraints. This is called a basic feasible solution. Also define
\[
\begin{aligned}
c_B &\equiv (c_{j_1}, \cdots, c_{j_m}), & c_F &\equiv (c_{i_1}, \cdots, c_{i_n}) \\
x_B &\equiv (x_{j_1}, \cdots, x_{j_m}), & x_F &\equiv (x_{i_1}, \cdots, x_{i_n})
\end{aligned}
\]
and
\[
z^0 \equiv z\left( x^0 \right) = \begin{pmatrix} c_B & c_F \end{pmatrix} \begin{pmatrix} x^0_B \\ x^0_F \end{pmatrix} = c_B x^0_B
\]
since $x^0_F = 0$. The variables which are the components of the vector $x_B$ are called the basic
variables and the variables which are the entries of $x_F$ are called the free variables. You
set $x_F = 0$. Now $\left( x^0, z^0 \right)^T$ is a solution to
\[
\begin{pmatrix} A & 0 \\ -c & 1 \end{pmatrix}
\begin{pmatrix} x \\ z \end{pmatrix}
= \begin{pmatrix} b \\ 0 \end{pmatrix}
\]
along with the constraints x ≥ 0. Writing the above in augmented matrix form yields
\[
\begin{pmatrix} A & 0 & b \\ -c & 1 & 0 \end{pmatrix} \tag{6.2}
\]
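Permute the columns and variables on the left if necessary to write the above in the form
\[
\begin{pmatrix} B & F & 0 \\ -c_B & -c_F & 1 \end{pmatrix}
\begin{pmatrix} x_B \\ x_F \\ z \end{pmatrix}
= \begin{pmatrix} b \\ 0 \end{pmatrix} \tag{6.3}
\]
or equivalently in the augmented matrix form keeping track of the variables on the bottom
as
\[
\begin{pmatrix} B & F & 0 & b \\ -c_B & -c_F & 1 & 0 \\ x_B & x_F & 0 & 0 \end{pmatrix}. \tag{6.4}
\]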
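Here B pertains to the variables $x_{j_1}, \cdots, x_{j_m}$ and is an m × m matrix with linearly independent columns, $\{a_{j_1}, \cdots, a_{j_m}\}$, and F is an m × n matrix. Now it is assumed that
\[
\begin{pmatrix} B & F \end{pmatrix} \begin{pmatrix} x^0_B \\ x^0_F \end{pmatrix}
= \begin{pmatrix} B & F \end{pmatrix} \begin{pmatrix} x^0_B \\ 0 \end{pmatrix}
= B x^0_B = b
\]
and since B is assumed to have rank m, it follows
\[
x^0_B = B^{-1} b \geq 0. \tag{6.5}
\]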
This is very important to observe. B −1 b ≥ 0! This is by the assumption that x0 ≥ 0.
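Do row operations on the top part of the matrix
\[
\begin{pmatrix} B & F & 0 & b \\ -c_B & -c_F & 1 & 0 \end{pmatrix} \tag{6.6}
\]
and obtain its row reduced echelon form. Then after these row operations the above becomes
\[
\begin{pmatrix} I & B^{-1} F & 0 & B^{-1} b \\ -c_B & -c_F & 1 & 0 \end{pmatrix}. \tag{6.7}
\]
where $B^{-1} b \geq 0$. Next do another row operation in order to get a 0 where you see a $-c_B$.
Thus
\begin{align}
& \begin{pmatrix} I & B^{-1} F & 0 & B^{-1} b \\ 0 & c_B B^{-1} F - c_F & 1 & c_B B^{-1} b \end{pmatrix} \tag{6.8} \\
= {} & \begin{pmatrix} I & B^{-1} F & 0 & B^{-1} b \\ 0 & c_B B^{-1} F - c_F & 1 & c_B x^0_B \end{pmatrix} \notag \\
= {} & \begin{pmatrix} I & B^{-1} F & 0 & B^{-1} b \\ 0 & c_B B^{-1} F - c_F & 1 & z^0 \end{pmatrix} \tag{6.9}
\end{align}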
The reason there is a $z^0$ on the bottom right corner is that $x_F = 0$ and $\left( x^0_B, x^0_F, z^0 \right)^T$ is a
solution of the system of equations represented by the above augmented matrix because it is
a solution to the system of equations corresponding to the system of equations represented
by 6.6 and row operations leave solution sets unchanged. Note how attractive this is. The $z^0$
is the value of z at the point $x^0$. The augmented matrix of 6.9 is called the simplex tableau
and it is the beginning point for the simplex algorithm to be described a little later. It is
very convenient to express the simplex tableau in the above form in which the variables are
possibly permuted in order to have $\begin{pmatrix} I \\ 0 \end{pmatrix}$ on the left side. However, as far as the simplex
algorithm is concerned it is not necessary to be permuting the variables in this manner.
Starting with 6.9 you could permute the variables and columns to obtain an augmented
matrix in which the variables are in their original order. What is really required for the
simplex tableau?
It is an augmented (m + 1) × (m + n + 2) matrix which represents a system of equations
which has the same set of solutions, $(x, z)^T$, as the system whose augmented matrix is
\[
\begin{pmatrix} A & 0 & b \\ -c & 1 & 0 \end{pmatrix}
\]
(Possibly the variables for x are taken in another order.) There are m linearly independent
columns in the first m + n columns for which there is only one nonzero entry, a 1 in one of
the first m rows, the “simple columns”, the other first m + n columns being the “nonsimple
columns”. As in the above, the variables corresponding to the simple columns are xB ,
the basic variables and those corresponding to the nonsimple columns are xF , the free
variables. Also, the top m entries of the last column on the right are nonnegative. This is
the description of a simplex tableau.
In a simplex tableau it is easy to spot a basic feasible solution. You can see one quickly
by setting the variables, xF corresponding to the nonsimple columns equal to zero. Then
the other variables, corresponding to the simple columns are each equal to a nonnegative
entry in the far right column. Let’s call this an “obvious basic feasible solution”. If a
solution is obtained by setting the variables corresponding to the nonsimple columns equal
to zero and the variables corresponding to the simple columns equal to zero this will be
referred to as an “obvious” solution. Let’s also call the first m + n entries in the bottom
row the “bottom left row”. In a simplex tableau, the entry in the bottom right corner gives
the value of the variable being maximized or minimized when the obvious basic feasible
solution is chosen.
The following is a special case of the general theory presented above and shows how such
a special case can be fit into the above framework. The following example is rather typical
of the sorts of problems considered. It involves inequality constraints instead of Ax = b.
This is handled by adding in “slack variables” as explained below.
The idea is to obtain an augmented matrix for the constraints such that obvious solutions
are also feasible. Then there is an algorithm, to be presented later, which takes you from
one obvious feasible solution to another until you obtain the maximum.
Example 6.2.2 Consider z = x1 −x2 subject to the constraints, x1 +2x2 ≤ 10, x1 +2x2 ≥ 2,
and 2x1 + x2 ≤ 6, xi ≥ 0. Find a simplex tableau for a problem of the form x ≥ 0,Ax = b
which is equivalent to the above problem.
You add in slack variables. These are nonnegative variables, one for each of the first three constraints, which change the first three inequalities into equations. Thus the first three inequalities become x1 + 2x2 + x3 = 10, x1 + 2x2 − x4 = 2, and 2x1 + x2 + x5 = 6, x1 , x2 , x3 , x4 , x5 ≥ 0.
Now it is necessary to find a basic feasible solution. You mainly need to find a nonnegative solution to the equations,
\[
\begin{aligned}
x_1 + 2x_2 + x_3 &= 10 \\
x_1 + 2x_2 - x_4 &= 2 \\
2x_1 + x_2 + x_5 &= 6
\end{aligned}
\]
The solution set for the above system is given by
\[
x_2 = \frac{2}{3} x_4 - \frac{2}{3} + \frac{1}{3} x_5, \quad
x_1 = -\frac{1}{3} x_4 + \frac{10}{3} - \frac{2}{3} x_5, \quad
x_3 = -x_4 + 8.
\]
An easy way to get a basic feasible solution is to let x4 = 8 and x5 = 1. Then a feasible
solution is
\[
(x_1, x_2, x_3, x_4, x_5) = (0, 5, 0, 8, 1).
\]
It follows $z^0 = -5$ and the matrix 6.2, $\begin{pmatrix} A & 0 & b \\ -c & 1 & 0 \end{pmatrix}$, with the variables kept track of on
the bottom is
\[
\begin{pmatrix}
1 & 2 & 1 & 0 & 0 & 0 & 10 \\
1 & 2 & 0 & -1 & 0 & 0 & 2 \\
2 & 1 & 0 & 0 & 1 & 0 & 6 \\
-1 & 1 & 0 & 0 & 0 & 1 & 0 \\
x_1 & x_2 & x_3 & x_4 & x_5 & 0 & 0
\end{pmatrix}
\]
and the first thing to do is to permute the columns so that the list of variables on the bottom
will have x1 and x3 at the end.


\[
\begin{pmatrix}
2 & 0 & 0 & 1 & 1 & 0 & 10 \\
2 & -1 & 0 & 1 & 0 & 0 & 2 \\
1 & 0 & 1 & 2 & 0 & 0 & 6 \\
1 & 0 & 0 & -1 & 0 & 1 & 0 \\
x_2 & x_4 & x_5 & x_1 & x_3 & 0 & 0
\end{pmatrix}
\]
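Next, as described above, take the row reduced echelon form of the top three lines of the
above matrix. This yields
\[
\begin{pmatrix}
1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\
0 & 1 & 0 & 0 & 1 & 0 & 8 \\
0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1
\end{pmatrix}.
\]
Now do row operations to
\[
\begin{pmatrix}
1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\
0 & 1 & 0 & 0 & 1 & 0 & 8 \\
0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1 \\
1 & 0 & 0 & -1 & 0 & 1 & 0
\end{pmatrix}
\]
to finally obtain
\[
\begin{pmatrix}
1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\
0 & 1 & 0 & 0 & 1 & 0 & 8 \\
0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1 \\
0 & 0 & 0 & -\frac{3}{2} & -\frac{1}{2} & 1 & -5
\end{pmatrix}
\]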
and this is a simplex tableau. The variables are x2 , x4 , x5 , x1 , x3 , z.
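The row operations of Example 6.2.2 can be checked with exact rational arithmetic. The following is a minimal sketch, assuming the column order x2 , x4 , x5 , x1 , x3 , z used above; the helper name `pivot` is a hypothetical choice made for this illustration.

```python
from fractions import Fraction

def pivot(T, r, c):
    """Return a copy of T in which row r is divided by T[r][c]
    and column c is cleared in every other row."""
    T = [row[:] for row in T]
    p = T[r][c]
    T[r] = [v / p for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [a - f * b for a, b in zip(T[i], T[r])]
    return T

# Permuted matrix of Example 6.2.2 (columns x2, x4, x5, x1, x3, z, right side)
rows = [
    [2, 0, 0, 1, 1, 0, 10],
    [2, -1, 0, 1, 0, 0, 2],
    [1, 0, 1, 2, 0, 0, 6],
    [1, 0, 0, -1, 0, 1, 0],
]
T = [[Fraction(v) for v in row] for row in rows]

# Pivot on the diagonal of the first three columns to make them simple columns
for k in range(3):
    T = pivot(T, k, k)

for row in T:
    print([str(v) for v in row])
```

Pivoting on all four rows at once combines the row reduction of the top three lines with the clearing of the bottom row, so the printed matrix is the simplex tableau itself.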
It isn’t as hard as it may appear from the above. Let’s not permute the variables and
simply find an acceptable simplex tableau as described above.
Example 6.2.3 Consider z = x1 −x2 subject to the constraints, x1 +2x2 ≤ 10, x1 +2x2 ≥ 2,
and 2x1 + x2 ≤ 6, xi ≥ 0. Find a simplex tableau.
Adding in slack variables, an augmented matrix which is descriptive of the constraints is
\[
\begin{pmatrix}
1 & 2 & 1 & 0 & 0 & 10 \\
1 & 2 & 0 & -1 & 0 & 2 \\
2 & 1 & 0 & 0 & 1 & 6
\end{pmatrix}
\]
The obvious solution is not feasible because of that −1 in the fourth column. When you let
x1 = x2 = 0, you end up having x4 = −2 which is negative. Consider the second column and
select the 2 in the second row as a pivot to zero out that which is above and below the 2.
\[
\begin{pmatrix}
0 & 0 & 1 & 1 & 0 & 8 \\
\frac{1}{2} & 1 & 0 & -\frac{1}{2} & 0 & 1 \\
\frac{3}{2} & 0 & 0 & \frac{1}{2} & 1 & 5
\end{pmatrix}
\]
This one is good. When you let x1 = x4 = 0, you find that x2 = 1, x3 = 8, x5 = 5. The
obvious solution is now feasible. You can now assemble the simplex tableau. The first step
is to include a column and row for z. This yields
\[
\begin{pmatrix}
0 & 0 & 1 & 1 & 0 & 0 & 8 \\
\frac{1}{2} & 1 & 0 & -\frac{1}{2} & 0 & 0 & 1 \\
\frac{3}{2} & 0 & 0 & \frac{1}{2} & 1 & 0 & 5 \\
-1 & 1 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
Now you need to get zeros in the right places so the simple columns will be preserved as
simple columns in this larger matrix. This means you need to zero out the 1 in the second
column on the bottom. A simplex tableau is now
\[
\begin{pmatrix}
0 & 0 & 1 & 1 & 0 & 0 & 8 \\
\frac{1}{2} & 1 & 0 & -\frac{1}{2} & 0 & 0 & 1 \\
\frac{3}{2} & 0 & 0 & \frac{1}{2} & 1 & 0 & 5 \\
-\frac{3}{2} & 0 & 0 & \frac{1}{2} & 0 & 1 & -1
\end{pmatrix}.
\]
Note it is not the same one obtained earlier. There is no reason a simplex tableau should
be unique. In fact, it follows from the above general description that you have one for each
basic feasible point of the region determined by the constraints.
6.3 The Simplex Algorithm
6.3.1 Maximums
The simplex algorithm takes you from one basic feasible solution to another while maximizing or minimizing the function you are trying to maximize or minimize. Algebraically,
it takes you from one simplex tableau to another in which the lower right corner either
increases in the case of maximization or decreases in the case of minimization.
I will continue writing the simplex tableau in such a way that the simple columns having
only one entry nonzero are on the left. As explained above, this amounts to permuting the
variables. I will do this because it is possible to describe what is going on without onerous
notation. However, in the examples, I won’t worry so much about it. Thus, from a basic
feasible solution, a simplex tableau of the following form has been obtained in which the
columns for the basic variables, $x_B$ are listed first and $b \ge 0$.
\[
\begin{pmatrix} I & F & 0 & b \\ 0 & c & 1 & z^0 \end{pmatrix} \tag{6.10}
\]
Let $x_i^0 = b_i$ for $i = 1, \cdots, m$ and $x_i^0 = 0$ for $i > m$. Then $(x^0, z^0)$ is a solution to the above
system and since $b \ge 0$, it follows $(x^0, z^0)$ is a basic feasible solution.
If $c_i < 0$ for some $i$, and if $F_{ji} \le 0$ for every $j$, so that a whole column of
\[
\begin{pmatrix} F \\ c \end{pmatrix}
\]
is $\le 0$ with the bottom entry $< 0$, then letting $x_i$ be the variable corresponding to that column, you could
leave all the other entries of $x_F$ equal to zero but change $x_i$ to be positive. Let the new
vector be denoted by $x_F'$ and letting $x_B' = b - F x_F'$ it follows
\[
(x_B')_k = b_k - \sum_j F_{kj} (x_F')_j = b_k - F_{ki} x_i \ge 0 .
\]
Now this shows $(x_B', x_F')$ is feasible whenever $x_i > 0$ and so you could let $x_i$ become
arbitrarily large and positive and conclude there is no maximum for $z$ because
\[
z = (-c_i) x_i + z^0 \tag{6.11}
\]
If this happens in a simplex tableau, you can say there is no maximum and stop.
What if c ≥ 0? Then z = z 0 − cxF and to satisfy the constraints, you need xF ≥ 0.
Therefore, in this case, z 0 is the largest possible value of z and so the maximum has been
found. You stop when this occurs. Next I explain what to do if neither of the above stopping
conditions hold.
The only case which remains is that some $c_i < 0$ and some $F_{ji} > 0$. You pick a column
in $\begin{pmatrix} F \\ c \end{pmatrix}$ in which $c_i < 0$, usually the one for which $c_i$ is the largest in absolute value.
You pick $F_{ji} > 0$ as a pivot element, divide the $j$th row by $F_{ji}$ and then use it to obtain
zeros above $F_{ji}$ and below $F_{ji}$, thus obtaining a new simple column. This row operation
also makes exactly one of the other simple columns into a nonsimple column. (In terms of
variables, it is said that a free variable becomes a basic variable and a basic variable becomes
a free variable.) Now permuting the columns and variables yields
\[
\begin{pmatrix} I & F' & 0 & b' \\ 0 & c' & 1 & z^{0\prime} \end{pmatrix}
\]
where $z^{0\prime} \ge z^0$ because $z^{0\prime} = z^0 - c_i \left( \frac{b_j}{F_{ji}} \right)$ and $c_i < 0$. If $b' \ge 0$, you are in the same
position you were at the beginning but now $z^0$ is larger. Now here is the important thing.
You don’t pick just any $F_{ji}$ when you do these row operations. You pick the positive one
for which the row operation results in $b' \ge 0$. Otherwise the obvious solution
obtained by letting $x_F' = 0$ will fail to satisfy the constraint that $x \ge 0$.
How is this done? You need
\[
b_k' \equiv b_k - \frac{F_{ki} b_j}{F_{ji}} \ge 0 \tag{6.12}
\]
for each $k = 1, \cdots, m$ or equivalently,
\[
b_k \ge \frac{F_{ki} b_j}{F_{ji}}. \tag{6.13}
\]
Now if $F_{ki} \le 0$ the above holds. Therefore, you only need to check $F_{pi}$ for $F_{pi} > 0$. The
pivot, $F_{ji}$ is the one which makes the quotients of the form
\[
\frac{b_p}{F_{pi}}
\]
for all positive $F_{pi}$ the smallest. This will work because for $F_{ki} > 0$,
\[
\frac{b_p}{F_{pi}} \le \frac{b_k}{F_{ki}} \Rightarrow b_k \ge \frac{F_{ki} b_p}{F_{pi}}
\]
Having gotten a new simplex tableau, you do the same thing to it which was just done
and continue. As long as b > 0, so you don’t encounter the degenerate case, the values
for z associated with setting xF = 0 keep getting strictly larger every time the process is
repeated. You keep going until you find c ≥ 0. Then you stop. You are at a maximum.
Problems can occur in the process in the so called degenerate case when at some stage of
the process some bj = 0. In this case you can cycle through different values for x with no
improvement in z. This case will not be discussed here.
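The rule for selecting the pivot row, namely the smallest quotient among the positive entries of the chosen column, can be written as a short function. This is a minimal sketch; the name `ratio_test` and the sample numbers are assumptions made for illustration only.

```python
from fractions import Fraction

def ratio_test(column, b):
    """Return the index p minimizing b[p] / column[p] over column[p] > 0,
    or None when no entry of the column is positive (no pivot exists)."""
    best, best_ratio = None, None
    for p, (f, bp) in enumerate(zip(column, b)):
        if f > 0:
            ratio = Fraction(bp) / Fraction(f)
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = p, ratio
    return best

# Column (1, -1, 1) with right side (1, 4, 5): quotients 1/1 and 5/1
print(ratio_test([1, -1, 1], [1, 4, 5]))
# Column (-1, 2, 1) with the same right side: quotients 4/2 and 5/1
print(ratio_test([-1, 2, 1], [1, 4, 5]))
# A column with no positive entry has no pivot
print(ratio_test([-1, 0, -3], [1, 2, 3]))
```

Returning `None` corresponds to the first stopping condition: if in addition the bottom entry of that column is negative, there is no maximum.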
Example 6.3.1 Maximize 2x1 + 3x2 subject to the constraints x1 + x2 ≥ 1, 2x1 + x2 ≤
6, x1 + 2x2 ≤ 6, x1 , x2 ≥ 0.
The constraints are of the form
\[
\begin{aligned}
x_1 + x_2 - x_3 &= 1 \\
2x_1 + x_2 + x_4 &= 6 \\
x_1 + 2x_2 + x_5 &= 6
\end{aligned}
\]
where the x3 , x4 , x5 are the slack variables. An augmented matrix for these equations is of
the form
\[
\begin{pmatrix}
1 & 1 & -1 & 0 & 0 & 1 \\
2 & 1 & 0 & 1 & 0 & 6 \\
1 & 2 & 0 & 0 & 1 & 6
\end{pmatrix}
\]
The obvious solution is not feasible. It results in x3 < 0. We need to exchange
basic variables. Let’s just try something.
\[
\begin{pmatrix}
1 & 1 & -1 & 0 & 0 & 1 \\
0 & -1 & 2 & 1 & 0 & 4 \\
0 & 1 & 1 & 0 & 1 & 5
\end{pmatrix}
\]
Now this one is all right because the obvious solution is feasible: letting x2 = x3 = 0
gives x1 = 1, x4 = 4, x5 = 5. Now we add in the objective function as
described above.
\[
\begin{pmatrix}
1 & 1 & -1 & 0 & 0 & 0 & 1 \\
0 & -1 & 2 & 1 & 0 & 0 & 4 \\
0 & 1 & 1 & 0 & 1 & 0 & 5 \\
-2 & -3 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
Then do row operations to leave the simple columns the same. Then
\[
\begin{pmatrix}
1 & 1 & -1 & 0 & 0 & 0 & 1 \\
0 & -1 & 2 & 1 & 0 & 0 & 4 \\
0 & 1 & 1 & 0 & 1 & 0 & 5 \\
0 & -1 & -2 & 0 & 0 & 1 & 2
\end{pmatrix}
\]
Now there are negative numbers on the bottom row to the left of the 1. Let’s pick the first.
(It would be more sensible to pick the second.) The ratios to look at are 5/1, 1/1 so pick for
the pivot the 1 in the second column and first row. This will leave the right column above
the lower right corner nonnegative. Thus the next tableau is
\[
\begin{pmatrix}
1 & 1 & -1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 & 5 \\
-1 & 0 & 2 & 0 & 1 & 0 & 4 \\
1 & 0 & -3 & 0 & 0 & 1 & 3
\end{pmatrix}
\]
There is still a negative number there to the left of the 1 in the bottom row. The new ratios
are 4/2, 5/1 so the new pivot is the 2 in the third column. Thus the next tableau is
\[
\begin{pmatrix}
\frac{1}{2} & 1 & 0 & 0 & \frac{1}{2} & 0 & 3 \\
\frac{3}{2} & 0 & 0 & 1 & -\frac{1}{2} & 0 & 3 \\
-1 & 0 & 2 & 0 & 1 & 0 & 4 \\
-\frac{1}{2} & 0 & 0 & 0 & \frac{3}{2} & 1 & 9
\end{pmatrix}
\]
Still, there is a negative number in the bottom row to the left of the 1 so the process does
not stop yet. The ratios are 3/ (3/2) and 3/ (1/2) and so the new pivot is that 3/2 in the
first column. Thus the new tableau is
\[
\begin{pmatrix}
0 & 1 & 0 & -\frac{1}{3} & \frac{2}{3} & 0 & 2 \\
\frac{3}{2} & 0 & 0 & 1 & -\frac{1}{2} & 0 & 3 \\
0 & 0 & 2 & \frac{2}{3} & \frac{2}{3} & 0 & 6 \\
0 & 0 & 0 & \frac{1}{3} & \frac{4}{3} & 1 & 10
\end{pmatrix}
\]
Now stop. The maximum value is 10. This is an easy enough problem to do geometrically
and so you can easily verify that this is the right answer. It occurs when x4 = x5 = 0, x1 =
2, x2 = 2, x3 = 3.
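The whole procedure for maximums can be sketched as a short loop. The code below is an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the tableau is stored with the z column next to last and the right side last, and the column chosen is always the one with the most negative bottom entry. Example 6.3.1 picks the first negative entry instead, so the intermediate tableaus differ from those above, although the maximum is the same. Degenerate cases (cycling) are not handled.

```python
from fractions import Fraction

def pivot(T, r, c):
    """Divide row r by T[r][c], then clear column c in every other row."""
    T = [row[:] for row in T]
    p = T[r][c]
    T[r] = [v / p for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [a - f * b for a, b in zip(T[i], T[r])]
    return T

def maximize(T):
    """Repeat pivots until the bottom row has no negative entry
    among the variable columns."""
    m = len(T) - 1            # number of constraint rows
    n = len(T[0]) - 2         # number of variable columns
    while True:
        c = min(range(n), key=lambda j: T[m][j])
        if T[m][c] >= 0:
            return T          # maximum reached; it is T[m][-1]
        rows = [i for i in range(m) if T[i][c] > 0]
        if not rows:
            raise ValueError("there is no maximum")
        r = min(rows, key=lambda i: T[i][-1] / T[i][c])   # smallest quotient
        T = pivot(T, r, c)

# Simplex tableau of Example 6.3.1 before the first pivot
start = [
    [1, 1, -1, 0, 0, 0, 1],
    [0, -1, 2, 1, 0, 0, 4],
    [0, 1, 1, 0, 1, 0, 5],
    [0, -1, -2, 0, 0, 1, 2],
]
final = maximize([[Fraction(v) for v in row] for row in start])
print(final[-1][-1])
```

The lower right corner of the final tableau is the maximum value of z.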
6.3.2 Minimums
How does it differ if you are finding a minimum? From a basic feasible solution, a simplex
tableau of the following form has been obtained in which the simple columns for the basic
variables, $x_B$ are listed first and $b \ge 0$.
\[
\begin{pmatrix} I & F & 0 & b \\ 0 & c & 1 & z^0 \end{pmatrix} \tag{6.14}
\]
Let $x_i^0 = b_i$ for $i = 1, \cdots, m$ and $x_i^0 = 0$ for $i > m$. Then $(x^0, z^0)$ is a solution to the above
system and since $b \ge 0$, it follows $(x^0, z^0)$ is a basic feasible solution. So far, there is no
change.
Suppose first that some $c_i > 0$ and $F_{ji} \le 0$ for each $j$. Then let $x_F'$ consist of changing $x_i$
by making it positive but leaving the other entries of $x_F$ equal to 0. Then from the bottom
row,
\[
z = -c_i x_i + z^0
\]
and you let $x_B' = b - F x_F' \ge 0$. Thus the constraints continue to hold when $x_i$ is made
increasingly positive and it follows from the above equation that there is no minimum for
$z$. You stop when this happens.
Next suppose $c \le 0$. Then in this case, $z = z^0 - c x_F$ and from the constraints, $x_F \ge 0$
and so $-c x_F \ge 0$ and so $z^0$ is the minimum value and you stop since this is what you are
looking for.
What do you do in the case where some $c_i > 0$ and some $F_{ji} > 0$? In this case, you use
the simplex algorithm as in the case of maximums to obtain a new simplex tableau in which
$z^{0\prime}$ is smaller. You choose $F_{ji}$ the same way to be the positive entry of the $i$th column such
that $b_p/F_{pi} \ge b_j/F_{ji}$ for all positive entries, $F_{pi}$ and do the same row operations. Now this
time,
\[
z^{0\prime} = z^0 - c_i \left( \frac{b_j}{F_{ji}} \right) < z^0
\]
As in the case of maximums no problem can occur and the process will converge unless
you have the degenerate case in which some bj = 0. As in the earlier case, this is most
unfortunate when it occurs. You see what happens of course. z 0 does not change and the
algorithm just delivers different values of the variables forever with no improvement.
To summarize the geometrical significance of the simplex algorithm, it takes you from one
corner of the feasible region to another. You go in one direction to find the maximum and
in another to find the minimum. For the maximum you try to get rid of negative entries of c
and for minimums you try to eliminate positive entries of c, where the method of elimination
involves the judicious use of an appropriate pivot element and row operations.
Now return to Example 6.2.2. It will be modified to be a maximization problem.
Example 6.3.2 Maximize z = x1 − x2 subject to the constraints,
x1 + 2x2 ≤ 10, x1 + 2x2 ≥ 2,
and 2x1 + x2 ≤ 6, xi ≥ 0.
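Recall this is the same as maximizing z = x1 − x2 subject to
\[
\begin{pmatrix}
1 & 2 & 1 & 0 & 0 \\
1 & 2 & 0 & -1 & 0 \\
2 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
=
\begin{pmatrix} 10 \\ 2 \\ 6 \end{pmatrix},\quad x \ge 0,
\]
the variables, x3 , x4 , x5 being slack variables. Recall the simplex tableau was
\[
\begin{pmatrix}
1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\
0 & 1 & 0 & 0 & 1 & 0 & 8 \\
0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1 \\
0 & 0 & 0 & -\frac{3}{2} & -\frac{1}{2} & 1 & -5
\end{pmatrix}
\]
with the variables ordered as x2 , x4 , x5 , x1 , x3 and so xB = (x2 , x4 , x5 ) and
xF = (x1 , x3 ) .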
Apply the simplex algorithm to the fourth column because −3/2 < 0 and this is the most
negative entry in the bottom row. The pivot is 3/2 because 1/(3/2) = 2/3 < 5/ (1/2) .
Dividing this row by 3/2 and then using this to zero out the other elements in that column,
the new simplex tableau is
\[
\begin{pmatrix}
1 & 0 & -\frac{1}{3} & 0 & \frac{2}{3} & 0 & \frac{14}{3} \\
0 & 1 & 0 & 0 & 1 & 0 & 8 \\
0 & 0 & \frac{2}{3} & 1 & -\frac{1}{3} & 0 & \frac{2}{3} \\
0 & 0 & 1 & 0 & -1 & 1 & -4
\end{pmatrix}.
\]
Now there is still a negative number in the bottom left row. Therefore, the process should
be continued. This time the pivot is the 2/3 in the top of the column. Dividing the top row
by 2/3 and then using this to zero out the entries below it,
\[
\begin{pmatrix}
\frac{3}{2} & 0 & -\frac{1}{2} & 0 & 1 & 0 & 7 \\
-\frac{3}{2} & 1 & \frac{1}{2} & 0 & 0 & 0 & 1 \\
\frac{1}{2} & 0 & \frac{1}{2} & 1 & 0 & 0 & 3 \\
\frac{3}{2} & 0 & \frac{1}{2} & 0 & 0 & 1 & 3
\end{pmatrix}.
\]
Now all the numbers on the bottom left row are nonnegative so the process stops. Now
recall the variables and columns were ordered as x2 , x4 , x5 , x1 , x3 . The solution in terms of
x1 and x2 is x2 = 0 and x1 = 3 and z = 3. Note that in the above, I did not worry about
permuting the columns to keep those which go with the basic variables on the left.
Here is a bucolic example.
Example 6.3.3 Consider the following table.

      iron   protein   folic acid   copper   calcium
F1      1       5          1           2        1
F2      2       3          2           1        1
F3      1       2          2           1        1
F4      3       1          1           1        1

This information is available to a pig farmer and Fi denotes a particular feed. The numbers
in the table contain the number of units of a particular nutrient contained in one pound of
the given feed. Thus F2 has 2 units of iron in one pound. Now suppose the cost of each feed
in cents per pound is given in the following table.

F1   F2   F3   F4
 2    3    2    3

A typical pig needs 5 units of iron, 8 of protein, 6 of folic acid, 7 of copper and 4 of calcium.
(The units may change from nutrient to nutrient.) How many pounds of each feed per pig
should the pig farmer use in order to minimize his cost?
His problem is to minimize C ≡ 2x1 + 3x2 + 2x3 + 3x4 subject to the constraints
\[
\begin{aligned}
x_1 + 2x_2 + x_3 + 3x_4 &\ge 5, \\
5x_1 + 3x_2 + 2x_3 + x_4 &\ge 8, \\
x_1 + 2x_2 + 2x_3 + x_4 &\ge 6, \\
2x_1 + x_2 + x_3 + x_4 &\ge 7, \\
x_1 + x_2 + x_3 + x_4 &\ge 4
\end{aligned}
\]
where each xi ≥ 0. Add in the slack variables,
\[
\begin{aligned}
x_1 + 2x_2 + x_3 + 3x_4 - x_5 &= 5 \\
5x_1 + 3x_2 + 2x_3 + x_4 - x_6 &= 8 \\
x_1 + 2x_2 + 2x_3 + x_4 - x_7 &= 6 \\
2x_1 + x_2 + x_3 + x_4 - x_8 &= 7 \\
x_1 + x_2 + x_3 + x_4 - x_9 &= 4
\end{aligned}
\]
The augmented matrix for this system is
\[
\begin{pmatrix}
1 & 2 & 1 & 3 & -1 & 0 & 0 & 0 & 0 & 5 \\
5 & 3 & 2 & 1 & 0 & -1 & 0 & 0 & 0 & 8 \\
1 & 2 & 2 & 1 & 0 & 0 & -1 & 0 & 0 & 6 \\
2 & 1 & 1 & 1 & 0 & 0 & 0 & -1 & 0 & 7 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & -1 & 4
\end{pmatrix}
\]
How in the world can you find a basic feasible solution? Remember the simplex algorithm
is designed to keep the entries in the right column nonnegative so you use this algorithm a
few times till the obvious solution is a basic feasible solution.
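Consider the first column. The pivot is the 5. Using the row operations described in the
algorithm, you get
\[
\begin{pmatrix}
0 & \frac{7}{5} & \frac{3}{5} & \frac{14}{5} & -1 & \frac{1}{5} & 0 & 0 & 0 & \frac{17}{5} \\
1 & \frac{3}{5} & \frac{2}{5} & \frac{1}{5} & 0 & -\frac{1}{5} & 0 & 0 & 0 & \frac{8}{5} \\
0 & \frac{7}{5} & \frac{8}{5} & \frac{4}{5} & 0 & \frac{1}{5} & -1 & 0 & 0 & \frac{22}{5} \\
0 & -\frac{1}{5} & \frac{1}{5} & \frac{3}{5} & 0 & \frac{2}{5} & 0 & -1 & 0 & \frac{19}{5} \\
0 & \frac{2}{5} & \frac{3}{5} & \frac{4}{5} & 0 & \frac{1}{5} & 0 & 0 & -1 & \frac{12}{5}
\end{pmatrix}
\]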
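Now go to the second column. The pivot in this column is the 7/5. This is in a different
row than the pivot in the first column so I will use it to zero out everything below it. This
will get rid of the zeros in the fifth column and introduce zeros in the second. This yields
\[
\begin{pmatrix}
0 & 1 & \frac{3}{7} & 2 & -\frac{5}{7} & \frac{1}{7} & 0 & 0 & 0 & \frac{17}{7} \\
1 & 0 & \frac{1}{7} & -1 & \frac{3}{7} & -\frac{2}{7} & 0 & 0 & 0 & \frac{1}{7} \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & \frac{2}{7} & 1 & -\frac{1}{7} & \frac{3}{7} & 0 & -1 & 0 & \frac{30}{7} \\
0 & 0 & \frac{3}{7} & 0 & \frac{2}{7} & \frac{1}{7} & 0 & 0 & -1 & \frac{10}{7}
\end{pmatrix}
\]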
Now consider another column, this time the fourth. I will pick this one because it has
some negative numbers in it so there are fewer entries to check in looking for a pivot.
Unfortunately, the pivot is the top 2 and I don’t want to pivot on this because it would
destroy the zeros in the second column. Consider the fifth column. It is also not a good
choice because the pivot is the second element from the top and this would destroy the zeros
in the first column. Consider the sixth column. I can use either of the two bottom entries
as the pivot. The matrix is
\[
\begin{pmatrix}
0 & 1 & 0 & 2 & -1 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & -1 & 1 & 0 & 0 & 0 & -2 & 3 \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & -1 & 1 & -1 & 0 & 0 & -1 & 3 & 0 \\
0 & 0 & 3 & 0 & 2 & 1 & 0 & 0 & -7 & 10
\end{pmatrix}
\]
Next consider the third column. The pivot is the 1 in the third row. This yields
\[
\begin{pmatrix}
0 & 1 & 0 & 2 & -1 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & -2 & 2 \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & 0 & -1 & 0 & 0 & -1 & -1 & 3 & 1 \\
0 & 0 & 0 & 6 & -1 & 1 & 3 & 0 & -7 & 7
\end{pmatrix}.
\]
There are still 5 columns which consist entirely of zeros except for one entry. Four of them
have that entry equal to 1 but one still has a −1 in it, the −1 being in the fourth row.
I need to do the row operations on a nonsimple column which has the pivot in the fourth
row. Such a column is the second to the last. The pivot is the 3. The new matrix is
\[
\begin{pmatrix}
0 & 1 & 0 & \frac{7}{3} & -1 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & \frac{2}{3} \\
1 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} & -\frac{2}{3} & 0 & \frac{8}{3} \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & 0 & -\frac{1}{3} & 0 & 0 & -\frac{1}{3} & -\frac{1}{3} & 1 & \frac{1}{3} \\
0 & 0 & 0 & \frac{11}{3} & -1 & 1 & \frac{2}{3} & -\frac{7}{3} & 0 & \frac{28}{3}
\end{pmatrix}. \tag{6.15}
\]
Now the obvious basic solution is feasible. You let x4 = 0 = x5 = x7 = x8 and x1 =
8/3, x2 = 2/3, x3 = 1, x6 = 28/3, and x9 = 1/3. You don’t need to worry too much about this. It is
the above matrix which is desired. Now you can assemble the simplex tableau and begin
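the algorithm. Remember C ≡ 2x1 + 3x2 + 2x3 + 3x4 . First add the row and column which
deal with C. This yields
\[
\begin{pmatrix}
0 & 1 & 0 & \frac{7}{3} & -1 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & \frac{2}{3} \\
1 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} & -\frac{2}{3} & 0 & 0 & \frac{8}{3} \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & -\frac{1}{3} & 0 & 0 & -\frac{1}{3} & -\frac{1}{3} & 1 & 0 & \frac{1}{3} \\
0 & 0 & 0 & \frac{11}{3} & -1 & 1 & \frac{2}{3} & -\frac{7}{3} & 0 & 0 & \frac{28}{3} \\
-2 & -3 & -2 & -3 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix} \tag{6.16}
\]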
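Now you do row operations to keep the simple columns of 6.15 simple in 6.16. Of course
you could permute the columns if you wanted but this is not necessary.
This yields the following for a simplex tableau. Now it is a matter of getting rid of the
positive entries in the bottom row because you are trying to minimize.
\[
\begin{pmatrix}
0 & 1 & 0 & \frac{7}{3} & -1 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & \frac{2}{3} \\
1 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} & -\frac{2}{3} & 0 & 0 & \frac{8}{3} \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & -\frac{1}{3} & 0 & 0 & -\frac{1}{3} & -\frac{1}{3} & 1 & 0 & \frac{1}{3} \\
0 & 0 & 0 & \frac{11}{3} & -1 & 1 & \frac{2}{3} & -\frac{7}{3} & 0 & 0 & \frac{28}{3} \\
0 & 0 & 0 & \frac{2}{3} & -1 & 0 & -\frac{1}{3} & -\frac{1}{3} & 0 & 1 & \frac{28}{3}
\end{pmatrix}
\]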
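The most positive of them is the 2/3 and so I will apply the algorithm to this one first. The
pivot is the 7/3. After doing the row operation the next tableau is
\[
\begin{pmatrix}
0 & \frac{3}{7} & 0 & 1 & -\frac{3}{7} & 0 & \frac{1}{7} & \frac{1}{7} & 0 & 0 & \frac{2}{7} \\
1 & -\frac{1}{7} & 0 & 0 & \frac{1}{7} & 0 & \frac{2}{7} & -\frac{5}{7} & 0 & 0 & \frac{18}{7} \\
0 & \frac{6}{7} & 1 & 0 & \frac{1}{7} & 0 & -\frac{5}{7} & \frac{2}{7} & 0 & 0 & \frac{11}{7} \\
0 & \frac{1}{7} & 0 & 0 & -\frac{1}{7} & 0 & -\frac{2}{7} & -\frac{2}{7} & 1 & 0 & \frac{3}{7} \\
0 & -\frac{11}{7} & 0 & 0 & \frac{4}{7} & 1 & \frac{1}{7} & -\frac{20}{7} & 0 & 0 & \frac{58}{7} \\
0 & -\frac{2}{7} & 0 & 0 & -\frac{5}{7} & 0 & -\frac{3}{7} & -\frac{3}{7} & 0 & 1 & \frac{64}{7}
\end{pmatrix}
\]
and you see that all the entries in the bottom left row are nonpositive and so the minimum is 64/7 and it occurs when
x1 = 18/7, x2 = 0, x3 = 11/7, x4 = 2/7.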
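It is worth checking the answer against the original statement of the problem. The following sketch, using exact fractions, confirms that the feed amounts found above meet every nutritional requirement and cost 64/7 cents; the variable names are chosen here for illustration only.

```python
from fractions import Fraction as Fr

# Rows: iron, protein, folic acid, copper, calcium; columns: feeds F1..F4
A = [
    [1, 2, 1, 3],
    [5, 3, 2, 1],
    [1, 2, 2, 1],
    [2, 1, 1, 1],
    [1, 1, 1, 1],
]
need = [5, 8, 6, 7, 4]
cost = [2, 3, 2, 3]

# Pounds of each feed read off the final tableau
x = [Fr(18, 7), Fr(0), Fr(11, 7), Fr(2, 7)]

supplied = [sum(a * v for a, v in zip(row, x)) for row in A]
total = sum(c * v for c, v in zip(cost, x))
feasible = all(v >= 0 for v in x) and all(s >= n for s, n in zip(supplied, need))

print([str(s) for s in supplied])
print(feasible, total)
```

The amounts by which `supplied` exceeds `need` are the values of the slack variables x5 , · · · , x9 in the final tableau.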
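There is no maximum for the above problem. However, I will pretend I don’t know this
and attempt to use the simplex algorithm. You set up the simplex tableau the same way.
Recall it is
\[
\begin{pmatrix}
0 & 1 & 0 & \frac{7}{3} & -1 & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & \frac{2}{3} \\
1 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} & -\frac{2}{3} & 0 & 0 & \frac{8}{3} \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & -\frac{1}{3} & 0 & 0 & -\frac{1}{3} & -\frac{1}{3} & 1 & 0 & \frac{1}{3} \\
0 & 0 & 0 & \frac{11}{3} & -1 & 1 & \frac{2}{3} & -\frac{7}{3} & 0 & 0 & \frac{28}{3} \\
0 & 0 & 0 & \frac{2}{3} & -1 & 0 & -\frac{1}{3} & -\frac{1}{3} & 0 & 1 & \frac{28}{3}
\end{pmatrix}
\]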
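Now to maximize, you try to get rid of the negative entries in the bottom left row. The
most negative entry is the −1 in the fifth column. The pivot is the 1 in the third row of this
column. The new tableau is
\[
\begin{pmatrix}
0 & 1 & 1 & \frac{1}{3} & 0 & 0 & -\frac{2}{3} & \frac{1}{3} & 0 & 0 & \frac{5}{3} \\
1 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} & -\frac{2}{3} & 0 & 0 & \frac{8}{3} \\
0 & 0 & 1 & -2 & 1 & 0 & -1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & -\frac{1}{3} & 0 & 0 & -\frac{1}{3} & -\frac{1}{3} & 1 & 0 & \frac{1}{3} \\
0 & 0 & 1 & \frac{5}{3} & 0 & 1 & -\frac{1}{3} & -\frac{7}{3} & 0 & 0 & \frac{31}{3} \\
0 & 0 & 1 & -\frac{4}{3} & 0 & 0 & -\frac{4}{3} & -\frac{1}{3} & 0 & 1 & \frac{31}{3}
\end{pmatrix}.
\]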
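Consider the fourth column. The pivot is the top 1/3. The new tableau is
\[
\begin{pmatrix}
0 & 3 & 3 & 1 & 0 & 0 & -2 & 1 & 0 & 0 & 5 \\
1 & -1 & -1 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 1 \\
0 & 6 & 7 & 0 & 1 & 0 & -5 & 2 & 0 & 0 & 11 \\
0 & 1 & 1 & 0 & 0 & 0 & -1 & 0 & 1 & 0 & 2 \\
0 & -5 & -4 & 0 & 0 & 1 & 3 & -4 & 0 & 0 & 2 \\
0 & 4 & 5 & 0 & 0 & 0 & -4 & 1 & 0 & 1 & 17
\end{pmatrix}
\]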
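There is still a negative in the bottom, the −4. The pivot in that column is the 3. The
algorithm yields
\[
\begin{pmatrix}
0 & -\frac{1}{3} & \frac{1}{3} & 1 & 0 & \frac{2}{3} & 0 & -\frac{5}{3} & 0 & 0 & \frac{19}{3} \\
1 & \frac{2}{3} & \frac{1}{3} & 0 & 0 & -\frac{1}{3} & 0 & \frac{1}{3} & 0 & 0 & \frac{1}{3} \\
0 & -\frac{7}{3} & \frac{1}{3} & 0 & 1 & \frac{5}{3} & 0 & -\frac{14}{3} & 0 & 0 & \frac{43}{3} \\
0 & -\frac{2}{3} & -\frac{1}{3} & 0 & 0 & \frac{1}{3} & 0 & -\frac{4}{3} & 1 & 0 & \frac{8}{3} \\
0 & -\frac{5}{3} & -\frac{4}{3} & 0 & 0 & \frac{1}{3} & 1 & -\frac{4}{3} & 0 & 0 & \frac{2}{3} \\
0 & -\frac{8}{3} & -\frac{1}{3} & 0 & 0 & \frac{4}{3} & 0 & -\frac{13}{3} & 0 & 1 & \frac{59}{3}
\end{pmatrix}
\]
Note how z keeps getting larger. Consider the column having the −13/3 in it. The pivot is
the single positive entry, 1/3. The next tableau is
\[
\begin{pmatrix}
5 & 3 & 2 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & 8 \\
3 & 2 & 1 & 0 & 0 & -1 & 0 & 1 & 0 & 0 & 1 \\
14 & 7 & 5 & 0 & 1 & -3 & 0 & 0 & 0 & 0 & 19 \\
4 & 2 & 1 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 4 \\
4 & 1 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 2 \\
13 & 6 & 4 & 0 & 0 & -3 & 0 & 0 & 0 & 1 & 24
\end{pmatrix}.
\]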
There is a column consisting of all negative entries. There is therefore, no maximum. Note
also how there is no way to pick the pivot in that column.
Example 6.3.4 Minimize z = x1 − 3x2 + x3 subject to the constraints x1 + x2 + x3 ≤
10, x1 + x2 + x3 ≥ 2, x1 + x2 + 3x3 ≤ 8 and x1 + 2x2 + x3 ≤ 7 with all variables nonnegative.
There exists an answer because the region defined by the constraints is closed and
bounded. Adding in slack variables you get the following augmented matrix corresponding
to the constraints.
\[
\begin{pmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 10 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 2 \\
1 & 1 & 3 & 0 & 0 & 1 & 0 & 8 \\
1 & 2 & 1 & 0 & 0 & 0 & 1 & 7
\end{pmatrix}
\]
Of course there is a problem with the obvious solution obtained by setting to zero all
variables corresponding to a nonsimple column because of the simple column which has the
−1 in it. Therefore, I will use the simplex algorithm to make this column nonsimple. The
third column has the 1 in the second row as the pivot so I will use this column. This yields
\[
\begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 0 & 8 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 2 \\
-2 & -2 & 0 & 0 & 3 & 1 & 0 & 2 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 5
\end{pmatrix} \tag{6.17}
\]
and the obvious solution is feasible. Now it is time to assemble the simplex tableau. First
add in the bottom row and second to last column corresponding to the equation for z. This
yields
\[
\begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 8 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 & 2 \\
-2 & -2 & 0 & 0 & 3 & 1 & 0 & 0 & 2 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 5 \\
-1 & 3 & -1 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
Next you need to zero out the entries in the bottom row which are below one of the simple
columns in 6.17. This yields the simplex tableau
\[
\begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 8 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 & 2 \\
-2 & -2 & 0 & 0 & 3 & 1 & 0 & 0 & 2 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 5 \\
0 & 4 & 0 & 0 & -1 & 0 & 0 & 1 & 2
\end{pmatrix}.
\]
The desire is to minimize this so you need to get rid of the positive entries in the left bottom
row. There is only one such entry, the 4. In that column the pivot is the 1 in the second
row of this column. Thus the next tableau is
\[
\begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 8 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 & 2 \\
0 & 0 & 2 & 0 & 1 & 1 & 0 & 0 & 6 \\
-1 & 0 & -1 & 0 & 2 & 0 & 1 & 0 & 3 \\
-4 & 0 & -4 & 0 & 3 & 0 & 0 & 1 & -6
\end{pmatrix}
\]
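There is still a positive number there, the 3. The pivot in this column is the 2. Apply the
algorithm again. This yields
\[
\begin{pmatrix}
\frac{1}{2} & 0 & \frac{1}{2} & 1 & 0 & 0 & -\frac{1}{2} & 0 & \frac{13}{2} \\
\frac{1}{2} & 1 & \frac{1}{2} & 0 & 0 & 0 & \frac{1}{2} & 0 & \frac{7}{2} \\
\frac{1}{2} & 0 & \frac{5}{2} & 0 & 0 & 1 & -\frac{1}{2} & 0 & \frac{9}{2} \\
-\frac{1}{2} & 0 & -\frac{1}{2} & 0 & 1 & 0 & \frac{1}{2} & 0 & \frac{3}{2} \\
-\frac{5}{2} & 0 & -\frac{5}{2} & 0 & 0 & 0 & -\frac{3}{2} & 1 & -\frac{21}{2}
\end{pmatrix}.
\]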
Now all the entries in the left bottom row are nonpositive so the process has stopped. The
minimum is −21/2. It occurs when x1 = 0, x2 = 7/2, x3 = 0.
Now consider the same problem but change the word, minimize to the word, maximize.
Example 6.3.5 Maximize z = x1 − 3x2 + x3 subject to the constraints x1 + x2 + x3 ≤
10, x1 + x2 + x3 ≥ 2, x1 + x2 + 3x3 ≤ 8 and x1 + 2x2 + x3 ≤ 7 with all variables nonnegative.
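The first part of it is the same. You wind up with the same simplex tableau,
\[
\begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 8 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 0 & 2 \\
-2 & -2 & 0 & 0 & 3 & 1 & 0 & 0 & 2 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 5 \\
0 & 4 & 0 & 0 & -1 & 0 & 0 & 1 & 2
\end{pmatrix}
\]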
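but this time, you apply the algorithm to get rid of the negative entries in the left bottom
row. There is a −1. Use this column. The pivot is the 3. The next tableau is
\[
\begin{pmatrix}
\frac{2}{3} & \frac{2}{3} & 0 & 1 & 0 & -\frac{1}{3} & 0 & 0 & \frac{22}{3} \\
\frac{1}{3} & \frac{1}{3} & 1 & 0 & 0 & \frac{1}{3} & 0 & 0 & \frac{8}{3} \\
-\frac{2}{3} & -\frac{2}{3} & 0 & 0 & 1 & \frac{1}{3} & 0 & 0 & \frac{2}{3} \\
\frac{2}{3} & \frac{5}{3} & 0 & 0 & 0 & -\frac{1}{3} & 1 & 0 & \frac{13}{3} \\
-\frac{2}{3} & \frac{10}{3} & 0 & 0 & 0 & \frac{1}{3} & 0 & 1 & \frac{8}{3}
\end{pmatrix}
\]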
There is still a negative entry, the −2/3. This will be the new pivot column. The pivot is
the 2/3 on the fourth row. This yields
\[
\begin{pmatrix}
0 & -1 & 0 & 1 & 0 & 0 & -1 & 0 & 3 \\
0 & -\frac{1}{2} & 1 & 0 & 0 & \frac{1}{2} & -\frac{1}{2} & 0 & \frac{1}{2} \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 5 \\
1 & \frac{5}{2} & 0 & 0 & 0 & -\frac{1}{2} & \frac{3}{2} & 0 & \frac{13}{2} \\
0 & 5 & 0 & 0 & 0 & 0 & 1 & 1 & 7
\end{pmatrix}
\]
and the process stops. The maximum for z is 7 and it occurs when x1 = 13/2, x2 = 0, x3 =
1/2.
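Examples 6.3.4 and 6.3.5 start from the same simplex tableau and differ only in which sign is removed from the bottom row. A single routine with a flag makes this explicit. The sketch below is an illustration under stated assumptions: the names `pivot` and `simplex` are hypothetical, the column with the largest offending entry is always chosen, and the degenerate case is not handled.

```python
from fractions import Fraction

def pivot(T, r, c):
    """Divide row r by T[r][c], then clear column c in every other row."""
    T = [row[:] for row in T]
    p = T[r][c]
    T[r] = [v / p for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [a - f * b for a, b in zip(T[i], T[r])]
    return T

def simplex(T, maximize):
    """Maximizing removes negative bottom entries; minimizing removes
    positive ones. The pivot row is chosen the same way in both cases."""
    m, n = len(T) - 1, len(T[0]) - 2
    sign = 1 if maximize else -1
    while True:
        c = min(range(n), key=lambda j: sign * T[m][j])
        if sign * T[m][c] >= 0:
            return T
        rows = [i for i in range(m) if T[i][c] > 0]
        if not rows:
            raise ValueError("the objective is unbounded")
        r = min(rows, key=lambda i: T[i][-1] / T[i][c])
        T = pivot(T, r, c)

# Common simplex tableau of Examples 6.3.4 and 6.3.5
start = [[Fraction(v) for v in row] for row in [
    [0, 0, 0, 1, 1, 0, 0, 0, 8],
    [1, 1, 1, 0, -1, 0, 0, 0, 2],
    [-2, -2, 0, 0, 3, 1, 0, 0, 2],
    [0, 1, 0, 0, 1, 0, 1, 0, 5],
    [0, 4, 0, 0, -1, 0, 0, 1, 2],
]]
smallest = simplex(start, maximize=False)[-1][-1]
largest = simplex(start, maximize=True)[-1][-1]
print(smallest, largest)
```

Since `pivot` works on a copy, the same starting tableau can be reused for both runs.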
6.4 Finding A Basic Feasible Solution
By now it should be fairly clear that finding a basic feasible solution can create considerable
difficulty. Indeed, given a system of linear inequalities along with the requirement that each
variable be nonnegative, do there even exist points satisfying all these inequalities? If you
have many variables, you can’t answer this by drawing a picture. Is there some other way
to do this which is more systematic than what was presented above? The answer is yes. It
is called the method of artificial variables. I will illustrate this method with an example.
Example 6.4.1 Find a basic feasible solution to the system 2x1 +x2 −x3 ≥ 3, x1 +x2 +x3 ≥
2, x1 + x2 + x3 ≤ 7 and x ≥ 0.
If you write the appropriate augmented matrix with the slack variables,
\[
\begin{pmatrix}
2 & 1 & -1 & -1 & 0 & 0 & 3 \\
1 & 1 & 1 & 0 & -1 & 0 & 2 \\
1 & 1 & 1 & 0 & 0 & 1 & 7
\end{pmatrix} \tag{6.18}
\]
The obvious solution is not feasible. This is why it would be hard to get started with
the simplex method. What is the problem? It is those −1 entries in the fourth and fifth
columns. To get around this, you add in artificial variables to get an augmented matrix of
the form
\[
\begin{pmatrix}
2 & 1 & -1 & -1 & 0 & 0 & 1 & 0 & 3 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 1 & 2 \\
1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 7
\end{pmatrix} \tag{6.19}
\]
Thus the variables are x1 , x2 , x3 , x4 , x5 , x6 , x7 , x8 . Suppose you can find a feasible solution
to the system of equations represented by the above augmented matrix. Thus all variables
are nonnegative. Suppose also that it can be done in such a way that x8 and x7 happen to
be 0. Then it will follow that x1 , · · · , x6 is a feasible solution for 6.18. Conversely, if you can
find a feasible solution for 6.18, then letting x7 and x8 both equal zero, you have obtained a
feasible solution to 6.19. Since all variables are nonnegative, x7 and x8 both equalling zero
is equivalent to saying the minimum of z = x7 + x8 subject to the constraints represented by
the above augmented matrix equals zero. This has proved the following simple observation.
Observation 6.4.2 There exists a feasible solution to the constraints represented by the
augmented matrix of 6.18 and x ≥ 0 if and only if the minimum of x7 + x8 subject to the
constraints of 6.19 and x ≥ 0 exists and equals 0.
Of course a similar observation would hold in other similar situations. Now the point of
all this is that it is trivial to see a feasible solution to 6.19, namely x6 = 7, x7 = 3, x8 = 2
and all the other variables may be set to equal zero. Therefore, it is easy to find an initial
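simplex tableau for the minimization problem just described. First add the column and row
for z
\[
\begin{pmatrix}
2 & 1 & -1 & -1 & 0 & 0 & 1 & 0 & 0 & 3 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 1 & 0 & 2 \\
1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 7 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & 1 & 0
\end{pmatrix}
\]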
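Next it is necessary to make the seventh and eighth columns, those of the artificial variables, into simple columns
by zeroing out their entries in the bottom row. Adding the first two rows to the bottom row yields an initial simplex tableau,
\[
\begin{pmatrix}
2 & 1 & -1 & -1 & 0 & 0 & 1 & 0 & 0 & 3 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 1 & 0 & 2 \\
1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 7 \\
3 & 2 & 0 & -1 & -1 & 0 & 0 & 0 & 1 & 5
\end{pmatrix}
\]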
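Now the algorithm involves getting rid of the positive entries on the left bottom row. Begin
with the first column. The pivot is the 2. An application of the simplex algorithm yields
the new tableau
\[
\begin{pmatrix}
1 & \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & 0 & 0 & \frac{1}{2} & 0 & 0 & \frac{3}{2} \\
0 & \frac{1}{2} & \frac{3}{2} & \frac{1}{2} & -1 & 0 & -\frac{1}{2} & 1 & 0 & \frac{1}{2} \\
0 & \frac{1}{2} & \frac{3}{2} & \frac{1}{2} & 0 & 1 & -\frac{1}{2} & 0 & 0 & \frac{11}{2} \\
0 & \frac{1}{2} & \frac{3}{2} & \frac{1}{2} & -1 & 0 & -\frac{3}{2} & 0 & 1 & \frac{1}{2}
\end{pmatrix}
\]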
Now go to the third column. The pivot is the 3/2 in the second row. An application of the
simplex algorithm yields
\[
\begin{pmatrix}
1 & \frac{2}{3} & 0 & -\frac{1}{3} & -\frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} & 0 & \frac{5}{3} \\
0 & \frac{1}{3} & 1 & \frac{1}{3} & -\frac{2}{3} & 0 & -\frac{1}{3} & \frac{2}{3} & 0 & \frac{1}{3} \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & -1 & 0 & 5 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & 1 & 0
\end{pmatrix} \tag{6.20}
\]
and you see there are only nonpositive numbers on the bottom left row so the process
stops and yields 0 for the minimum of z = x7 + x8 . As for the other variables, x1 = 5/3, x2 =
0, x3 = 1/3, x4 = 0, x5 = 0, x6 = 5. Now as explained in the above observation, this is a
basic feasible solution for the original system 6.18.
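Two steps of Example 6.4.1 are easy to check by machine: the construction of the bottom row for z = x7 + x8 , and the claim that the point found satisfies 6.19 with both artificial variables equal to zero. The following sketch does both with exact fractions; the variable names are chosen here for illustration, and the z column is left out of the bottom row for brevity.

```python
from fractions import Fraction as Fr

# Augmented matrix 6.19: columns x1..x8, then the right side
rows = [
    [2, 1, -1, -1, 0, 0, 1, 0, 3],
    [1, 1, 1, 0, -1, 0, 0, 1, 2],
    [1, 1, 1, 0, 0, 1, 0, 0, 7],
]

# Bottom row for z = x7 + x8 before any row operation
bottom = [0, 0, 0, 0, 0, 0, -1, -1, 0]
# Adding the two rows which contain an artificial variable clears the -1 entries
for r in (0, 1):
    bottom = [b + a for b, a in zip(bottom, rows[r])]
print(bottom)

# The point found by the algorithm, with x7 = x8 = 0
x = [Fr(5, 3), Fr(0), Fr(1, 3), Fr(0), Fr(0), Fr(5), Fr(0), Fr(0)]
residuals = [sum(a * v for a, v in zip(row[:8], x)) - row[8] for row in rows]
print(residuals, all(v >= 0 for v in x))
```

All residuals vanish, so dropping x7 and x8 leaves a nonnegative solution of the equations represented by 6.18.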
Now consider a maximization problem associated with the above constraints.
Example 6.4.3 Maximize x1 − x2 + 2x3 subject to the constraints, 2x1 + x2 − x3 ≥ 3, x1 +
x2 + x3 ≥ 2, x1 + x2 + x3 ≤ 7 and x ≥ 0.
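From 6.20 you can immediately assemble an initial simplex tableau. You begin with the
first 6 columns and top 3 rows in 6.20, along with the right column. Then add in the column and row for z. This yields
\[
\begin{pmatrix}
1 & \frac{2}{3} & 0 & -\frac{1}{3} & -\frac{1}{3} & 0 & 0 & \frac{5}{3} \\
0 & \frac{1}{3} & 1 & \frac{1}{3} & -\frac{2}{3} & 0 & 0 & \frac{1}{3} \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 5 \\
-1 & 1 & -2 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]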
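and you first do row operations to make the first and third columns simple columns. Thus
the next simplex tableau is
\[
\begin{pmatrix}
1 & \frac{2}{3} & 0 & -\frac{1}{3} & -\frac{1}{3} & 0 & 0 & \frac{5}{3} \\
0 & \frac{1}{3} & 1 & \frac{1}{3} & -\frac{2}{3} & 0 & 0 & \frac{1}{3} \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 5 \\
0 & \frac{7}{3} & 0 & \frac{1}{3} & -\frac{5}{3} & 0 & 1 & \frac{7}{3}
\end{pmatrix}
\]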
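You are trying to get rid of negative entries in the bottom left row. There is only one, the
−5/3. The pivot is the 1. The next simplex tableau is then
\[
\begin{pmatrix}
1 & \frac{2}{3} & 0 & -\frac{1}{3} & 0 & \frac{1}{3} & 0 & \frac{10}{3} \\
0 & \frac{1}{3} & 1 & \frac{1}{3} & 0 & \frac{2}{3} & 0 & \frac{11}{3} \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 5 \\
0 & \frac{7}{3} & 0 & \frac{1}{3} & 0 & \frac{5}{3} & 1 & \frac{32}{3}
\end{pmatrix}
\]
and so the maximum value of z is 32/3 and it occurs when x1 = 10/3, x2 = 0 and x3 = 11/3.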
6.5 Duality
You can solve minimization problems by solving maximization problems. You can also go
the other direction and solve maximization problems by minimization problems. Sometimes
this makes things much easier. To be more specific, the two problems to be considered are
A.) Minimize z = cx subject to x ≥ 0 and Ax ≥ b and
B.) Maximize w = yb such that y ≥ 0 and yA ≤ c,
(equivalently $A^T y^T \le c^T$ and $w = b^T y^T$).
In these problems it is assumed A is an m × p matrix.
I will show how a solution of the first yields a solution of the second and then show how
a solution of the second yields a solution of the first. The problems, A.) and B.) are called
dual problems.
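Lemma 6.5.1 Let x be a solution of the inequalities of A.) and let y be a solution of the
inequalities of B.). Then
cx ≥ yb,
and if equality holds in the above, then x is a solution to A.) and y is a solution to B.).
Proof: This follows immediately. Since c ≥ yA and x ≥ 0, cx ≥ yAx, and since Ax ≥ b and y ≥ 0, yAx ≥ yb. If cx = yb and x̃ is any other solution of the inequalities of A.), then cx̃ ≥ yb = cx, so x minimizes. Similarly y maximizes.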
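The chain of inequalities in the proof can be observed numerically on the pig farmer problem of Example 6.3.3, which has the form of A.). In the sketch below, x is the basic feasible point read from 6.15, while y = (0, 0, 0, 1, 0) is simply one convenient point satisfying the inequalities of B.); this y is an assumption chosen for illustration and does not appear in the text.

```python
from fractions import Fraction as Fr

A = [
    [1, 2, 1, 3],
    [5, 3, 2, 1],
    [1, 2, 2, 1],
    [2, 1, 1, 1],
    [1, 1, 1, 1],
]
b = [5, 8, 6, 7, 4]
c = [2, 3, 2, 3]

x = [Fr(8, 3), Fr(2, 3), Fr(1), Fr(0)]   # satisfies Ax >= b, x >= 0
y = [0, 0, 0, 1, 0]                       # satisfies yA <= c, y >= 0

Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
yA = [sum(y[i] * A[i][j] for i in range(5)) for j in range(4)]
x_ok = all(v >= 0 for v in x) and all(s >= t for s, t in zip(Ax, b))
y_ok = all(v >= 0 for v in y) and all(s <= t for s, t in zip(yA, c))

cx = sum(ci * xi for ci, xi in zip(c, x))
yAx = sum(yi * s for yi, s in zip(y, Ax))
yb = sum(yi * bi for yi, bi in zip(y, b))
print(x_ok, y_ok, cx, yAx, yb)
```

Here cx is strictly larger than yb, so the lemma gives no conclusion about optimality of either point; it only bounds the minimum of A.) from below by yb.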
It follows from this lemma that if y satisfies the inequalities of B.) and x satisfies the
inequalities of A.) then if equality holds in the above lemma, it must be that x is a solution
of A.) and y is a solution of B.).
Now recall that to solve either of these problems using the simplex method, you first
add in slack variables. Denote by x′ and y′ the enlarged list of variables. Thus x′ has at
least m entries and so does y′ and the inequalities involving A were replaced by equalities
whose augmented matrices were of the form
\[
\begin{pmatrix} A & -I & b \end{pmatrix}, \text{ and } \begin{pmatrix} A^T & I & c^T \end{pmatrix}
\]
Then you included the row and column for z and w to obtain
\[
\begin{pmatrix} A & -I & 0 & b \\ -c & 0 & 1 & 0 \end{pmatrix}
\text{ and }
\begin{pmatrix} A^T & I & 0 & c^T \\ -b^T & 0 & 1 & 0 \end{pmatrix}. \tag{6.21}
\]
Then the problems have basic feasible solutions if it is possible to permute the first p + m
columns in the above two matrices and obtain matrices of the form
\[
\begin{pmatrix} B & F & 0 & b \\ -c_B & -c_F & 1 & 0 \end{pmatrix}
\text{ and }
\begin{pmatrix} B_1 & F_1 & 0 & c^T \\ -b^T_{B_1} & -b^T_{F_1} & 1 & 0 \end{pmatrix} \tag{6.22}
\]
where $B, B_1$ are invertible $m \times m$ and $p \times p$ matrices and denoting the variables associated
with these columns by $x_B, y_B$ and those variables associated with $F$ or $F_1$ by $x_F$ and $y_F$,
it follows that letting $B x_B = b$ and $x_F = 0$, the resulting vector, $x'$ is a solution to $x' \ge 0$
and $\begin{pmatrix} A & -I \end{pmatrix} x' = b$ with similar constraints holding for $y'$. In other words, it is possible
to obtain simplex tableaus,
\[
\begin{pmatrix} I & B^{-1}F & 0 & B^{-1}b \\ 0 & c_B B^{-1} F - c_F & 1 & c_B B^{-1} b \end{pmatrix},
\quad
\begin{pmatrix} I & B_1^{-1}F_1 & 0 & B_1^{-1}c^T \\ 0 & b^T_{B_1} B_1^{-1} F_1 - b^T_{F_1} & 1 & b^T_{B_1} B_1^{-1} c^T \end{pmatrix} \tag{6.23}
\]
Similar considerations apply to the second problem. Thus as just described, a basic feasible
solution is one which determines a simplex tableau like the above in which you get a feasible
solution by setting all but the first m variables equal to zero. The simplex algorithm takes
you from one basic feasible solution to another till eventually, if there is no degeneracy, you
obtain a basic feasible solution which yields the solution of the problem of interest.
Theorem 6.5.2 Suppose there exists a solution x to A.) where x is a basic feasible solution
of the inequalities of A.). Then there exists a solution y to B.) and cx = yb. It is also
possible to find y from x using a simple formula.
Proof: Since the solution to A.) is basic and feasible, there exists a simplex tableau like
6.23 such that $x'$ can be split into $x_B$ and $x_F$ such that $x_F = 0$ and $x_B = B^{-1} b$. Now since
it is a minimizer, it follows $c_B B^{-1} F - c_F \le 0$ and the minimum value for $cx$ is $c_B B^{-1} b$.
Stating this again, $cx = c_B B^{-1} b$. Is it possible you can take $y = c_B B^{-1}$? From Lemma 6.5.1
this will be so if $c_B B^{-1}$ solves the constraints of problem B.). Is $c_B B^{-1} \ge 0$? Is $c_B B^{-1} A \le c$?
These two conditions are satisfied if and only if
\[
c_B B^{-1} \begin{pmatrix} A & -I \end{pmatrix} \le \begin{pmatrix} c & 0 \end{pmatrix}.
\]
Referring to the process of permuting the columns of the first augmented matrix of 6.21 to get 6.22
and doing the same permutations on the columns of $\begin{pmatrix} A & -I \end{pmatrix}$ and $\begin{pmatrix} c & 0 \end{pmatrix}$, the desired
inequality holds if and only if
\[
c_B B^{-1} \begin{pmatrix} B & F \end{pmatrix} \le \begin{pmatrix} c_B & c_F \end{pmatrix}
\]
which is equivalent to saying
\[
\begin{pmatrix} c_B & c_B B^{-1} F \end{pmatrix} \le \begin{pmatrix} c_B & c_F \end{pmatrix}
\]
and this is true because $c_B B^{-1} F - c_F \le 0$ due to the
assumption that $x$ is a minimizer. The simple formula is just $y = c_B B^{-1}$.
The proof of the following corollary is similar.
Corollary 6.5.3 Suppose there exists a solution, y to B.) where y is a basic feasible solution
of the inequalities of B.). Then there exists a solution, x to A.) and cx = by. It is also
possible to find x from y using a simple formula. In this case, and referring to 6.23, the
simple formula is $x = B_1^{-T} b_{B_1}$.
As an example, consider the pig farmers problem. The main difficulty in this problem
was finding an initial simplex tableau. Now consider the following example and marvel at
how all the difficulties disappear.
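Example 6.5.4 minimize $C \equiv 2x_1 + 3x_2 + 2x_3 + 3x_4$ subject to the constraints
\[
\begin{array}{rcl}
x_1 + 2x_2 + x_3 + 3x_4 & \geq & 5, \\
5x_1 + 3x_2 + 2x_3 + x_4 & \geq & 8, \\
x_1 + 2x_2 + 2x_3 + x_4 & \geq & 6, \\
2x_1 + x_2 + x_3 + x_4 & \geq & 7, \\
x_1 + x_2 + x_3 + x_4 & \geq & 4,
\end{array}
\]
where each $x_i \geq 0$.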
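Here the dual problem is to maximize $w = 5y_1 + 8y_2 + 6y_3 + 7y_4 + 4y_5$ subject to the
constraints
\[
\begin{pmatrix} 1 & 5 & 1 & 2 & 1 \\ 2 & 3 & 2 & 1 & 1 \\ 1 & 2 & 2 & 1 & 1 \\ 3 & 1 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix} \leq \begin{pmatrix} 2 \\ 3 \\ 2 \\ 3 \end{pmatrix}.
\]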
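Adding in slack variables, these inequalities are equivalent to the system of equations whose
augmented matrix is
\[
\left(\begin{array}{cccccccccc}
1 & 5 & 1 & 2 & 1 & 1 & 0 & 0 & 0 & 2 \\
2 & 3 & 2 & 1 & 1 & 0 & 1 & 0 & 0 & 3 \\
1 & 2 & 2 & 1 & 1 & 0 & 0 & 1 & 0 & 2 \\
3 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 3
\end{array}\right)
\]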
Now the obvious solution is feasible so there is no hunting for an initial obvious feasible
solution required. Now add in the row and column for w. This yields
\[
\left(\begin{array}{ccccccccccc}
1 & 5 & 1 & 2 & 1 & 1 & 0 & 0 & 0 & 0 & 2 \\
2 & 3 & 2 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 3 \\
1 & 2 & 2 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 2 \\
3 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 3 \\
-5 & -8 & -6 & -7 & -4 & 0 & 0 & 0 & 0 & 1 & 0
\end{array}\right).
\]
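It is a maximization problem so you want to eliminate the negatives in the bottom left row.
Pick the column having the one which is most negative, the −8. The pivot is the top 5.
Then apply the simplex algorithm to obtain
\[
\left(\begin{array}{ccccccccccc}
\frac{1}{5} & 1 & \frac{1}{5} & \frac{2}{5} & \frac{1}{5} & \frac{1}{5} & 0 & 0 & 0 & 0 & \frac{2}{5} \\
\frac{7}{5} & 0 & \frac{7}{5} & -\frac{1}{5} & \frac{2}{5} & -\frac{3}{5} & 1 & 0 & 0 & 0 & \frac{9}{5} \\
\frac{3}{5} & 0 & \frac{8}{5} & \frac{1}{5} & \frac{3}{5} & -\frac{2}{5} & 0 & 1 & 0 & 0 & \frac{6}{5} \\
\frac{14}{5} & 0 & \frac{4}{5} & \frac{3}{5} & \frac{4}{5} & -\frac{1}{5} & 0 & 0 & 1 & 0 & \frac{13}{5} \\
-\frac{17}{5} & 0 & -\frac{22}{5} & -\frac{19}{5} & -\frac{12}{5} & \frac{8}{5} & 0 & 0 & 0 & 1 & \frac{16}{5}
\end{array}\right).
\]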
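The pivot step just carried out can be reproduced mechanically. The following is a minimal sketch in Python using exact fractions; the function name `pivot` and the zero-based row and column indices are choices made here for illustration and are not notation from the text.

```python
from fractions import Fraction as Fr

def pivot(tableau, r, c):
    """Return a new tableau after pivoting on entry (r, c): scale row r so the
    pivot becomes 1, then clear column c from every other row."""
    p = tableau[r][c]
    new_r = [x / p for x in tableau[r]]
    out = []
    for i, row in enumerate(tableau):
        if i == r:
            out.append(new_r)
        else:
            out.append([x - row[c] * y for x, y in zip(row, new_r)])
    return out

# Initial tableau of Example 6.5.4 (columns y1..y9, w, right-hand side).
T0 = [[Fr(v) for v in row] for row in [
    [1, 5, 1, 2, 1, 1, 0, 0, 0, 0, 2],
    [2, 3, 2, 1, 1, 0, 1, 0, 0, 0, 3],
    [1, 2, 2, 1, 1, 0, 0, 1, 0, 0, 2],
    [3, 1, 1, 1, 1, 0, 0, 0, 1, 0, 3],
    [-5, -8, -6, -7, -4, 0, 0, 0, 0, 1, 0],
]]

# Pivot on the 5 in the top row, second column (indices 0 and 1).
T1 = pivot(T0, 0, 1)
print([str(x) for x in T1[-1]])  # bottom row after the first pivot
```

Calling `pivot` again with the pivot positions named in the text (the 8/5, then the 3/8, then the 7/3) should walk through the remaining tableaus of the example.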
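There are still negative entries in the bottom left row. Do the simplex algorithm to the
column which has the $-\frac{22}{5}$. The pivot is the $\frac{8}{5}$. This yields
\[
\left(\begin{array}{ccccccccccc}
\frac{1}{8} & 1 & 0 & \frac{3}{8} & \frac{1}{8} & \frac{1}{4} & 0 & -\frac{1}{8} & 0 & 0 & \frac{1}{4} \\
\frac{7}{8} & 0 & 0 & -\frac{3}{8} & -\frac{1}{8} & -\frac{1}{4} & 1 & -\frac{7}{8} & 0 & 0 & \frac{3}{4} \\
\frac{3}{8} & 0 & 1 & \frac{1}{8} & \frac{3}{8} & -\frac{1}{4} & 0 & \frac{5}{8} & 0 & 0 & \frac{3}{4} \\
\frac{5}{2} & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & -\frac{1}{2} & 1 & 0 & 2 \\
-\frac{7}{4} & 0 & 0 & -\frac{13}{4} & -\frac{3}{4} & \frac{1}{2} & 0 & \frac{11}{4} & 0 & 1 & \frac{13}{2}
\end{array}\right)
\]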
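and there are still negative numbers. Pick the column which has the $-\frac{13}{4}$. The pivot is
the $\frac{3}{8}$ in the top. This yields
\[
\left(\begin{array}{ccccccccccc}
\frac{1}{3} & \frac{8}{3} & 0 & 1 & \frac{1}{3} & \frac{2}{3} & 0 & -\frac{1}{3} & 0 & 0 & \frac{2}{3} \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 1 \\
\frac{1}{3} & -\frac{1}{3} & 1 & 0 & \frac{1}{3} & -\frac{1}{3} & 0 & \frac{2}{3} & 0 & 0 & \frac{2}{3} \\
\frac{7}{3} & -\frac{4}{3} & 0 & 0 & \frac{1}{3} & -\frac{1}{3} & 0 & -\frac{1}{3} & 1 & 0 & \frac{5}{3} \\
-\frac{2}{3} & \frac{26}{3} & 0 & 0 & \frac{1}{3} & \frac{8}{3} & 0 & \frac{5}{3} & 0 & 1 & \frac{26}{3}
\end{array}\right)
\]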
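which has only one negative entry on the bottom left. The pivot for this first column is the
$\frac{7}{3}$. The next tableau is
\[
\left(\begin{array}{ccccccccccc}
0 & \frac{20}{7} & 0 & 1 & \frac{2}{7} & \frac{5}{7} & 0 & -\frac{2}{7} & -\frac{1}{7} & 0 & \frac{3}{7} \\
0 & \frac{11}{7} & 0 & 0 & -\frac{1}{7} & \frac{1}{7} & 1 & -\frac{6}{7} & -\frac{3}{7} & 0 & \frac{2}{7} \\
0 & -\frac{1}{7} & 1 & 0 & \frac{2}{7} & -\frac{2}{7} & 0 & \frac{5}{7} & -\frac{1}{7} & 0 & \frac{3}{7} \\
1 & -\frac{4}{7} & 0 & 0 & \frac{1}{7} & -\frac{1}{7} & 0 & -\frac{1}{7} & \frac{3}{7} & 0 & \frac{5}{7} \\
0 & \frac{58}{7} & 0 & 0 & \frac{3}{7} & \frac{18}{7} & 0 & \frac{11}{7} & \frac{2}{7} & 1 & \frac{64}{7}
\end{array}\right)
\]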
and all the entries in the left bottom row are nonnegative so the answer is 64/7. This is
the same as obtained before. So what values for x are needed? Here the basic variables are
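$y_1, y_3, y_4, y_7$. Consider the original augmented matrix, one step before the simplex tableau.
\[
\left(\begin{array}{ccccccccccc}
1 & 5 & 1 & 2 & 1 & 1 & 0 & 0 & 0 & 0 & 2 \\
2 & 3 & 2 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 3 \\
1 & 2 & 2 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 2 \\
3 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 3 \\
-5 & -8 & -6 & -7 & -4 & 0 & 0 & 0 & 0 & 1 & 0
\end{array}\right).
\]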
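Permute the columns to put the columns associated with these basic variables first. Thus
\[
\left(\begin{array}{ccccccccccc}
1 & 1 & 2 & 0 & 5 & 1 & 1 & 0 & 0 & 0 & 2 \\
2 & 2 & 1 & 1 & 3 & 1 & 0 & 0 & 0 & 0 & 3 \\
1 & 2 & 1 & 0 & 2 & 1 & 0 & 1 & 0 & 0 & 2 \\
3 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 3 \\
-5 & -6 & -7 & 0 & -8 & -4 & 0 & 0 & 0 & 1 & 0
\end{array}\right)
\]
The matrix B is
\[
B = \begin{pmatrix} 1 & 1 & 2 & 0 \\ 2 & 2 & 1 & 1 \\ 1 & 2 & 1 & 0 \\ 3 & 1 & 1 & 0 \end{pmatrix}
\]
and so $B^{-T}$ equals
\[
\begin{pmatrix}
-\frac{1}{7} & -\frac{2}{7} & \frac{5}{7} & \frac{1}{7} \\
0 & 0 & 0 & 1 \\
-\frac{1}{7} & \frac{5}{7} & -\frac{2}{7} & -\frac{6}{7} \\
\frac{3}{7} & -\frac{1}{7} & -\frac{1}{7} & -\frac{3}{7}
\end{pmatrix}
\]
Also $b_B^T = \begin{pmatrix} 5 & 6 & 7 & 0 \end{pmatrix}$ and so from Corollary 6.5.3,
\[
x = \begin{pmatrix}
-\frac{1}{7} & -\frac{2}{7} & \frac{5}{7} & \frac{1}{7} \\
0 & 0 & 0 & 1 \\
-\frac{1}{7} & \frac{5}{7} & -\frac{2}{7} & -\frac{6}{7} \\
\frac{3}{7} & -\frac{1}{7} & -\frac{1}{7} & -\frac{3}{7}
\end{pmatrix}
\begin{pmatrix} 5 \\ 6 \\ 7 \\ 0 \end{pmatrix}
= \begin{pmatrix} \frac{18}{7} \\ 0 \\ \frac{11}{7} \\ \frac{2}{7} \end{pmatrix}
\]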
which agrees with the original way of doing the problem.
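As a cross-check of the formula from Corollary 6.5.3, here is a short sketch in Python that solves $B^T x = b_B$ exactly rather than forming the inverse. The helper name `solve` is an assumption made for this illustration; the matrix B, the vector $b_B$ and the cost coefficients are the ones from the example.

```python
from fractions import Fraction as Fr

def solve(M, rhs):
    """Solve the square system M z = rhs exactly by Gauss-Jordan elimination.
    Assumes M is invertible."""
    n = len(M)
    aug = [[Fr(v) for v in row] + [Fr(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        # find a row at or below the diagonal with a nonzero entry in this column
        piv = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]

B = [[1, 1, 2, 0], [2, 2, 1, 1], [1, 2, 1, 0], [3, 1, 1, 0]]
b_B = [5, 6, 7, 0]
B_T = [list(col) for col in zip(*B)]   # transpose of B

x = solve(B_T, b_B)                    # x = B^{-T} b_B
cost = 2 * x[0] + 3 * x[1] + 2 * x[2] + 3 * x[3]
print([str(v) for v in x], cost)
```

The value of `cost` should equal the maximum of the dual problem found from the final tableau, which is what Theorem 6.5.2 and its corollary assert.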
Two good books which give more discussion of linear programming are Strang [26] and
Nobel and Daniels [21]. Also listed in these books are other references which may prove
useful if you are interested in seeing more on these topics. There is a great deal more which
can be said about linear programming.
6.6 Exercises
1. Maximize and minimize z = x1 − 2x2 + x3 subject to the constraints x1 + x2 + x3 ≤
10, x1 + x2 + x3 ≥ 2, and x1 + 2x2 + x3 ≤ 7 if possible. All variables are nonnegative.
2. Maximize and minimize the following if possible. All variables are nonnegative.
(a) z = x1 − 2x2 subject to the constraints x1 + x2 + x3 ≤ 10, x1 + x2 + x3 ≥ 1, and
x1 + 2x2 + x3 ≤ 7
(b) z = x1 − 2x2 − 3x3 subject to the constraints x1 + x2 + x3 ≤ 8, x1 + x2 + 3x3 ≥ 1,
and x1 + x2 + x3 ≤ 7
(c) z = 2x1 + x2 subject to the constraints x1 − x2 + x3 ≤ 10, x1 + x2 + x3 ≥ 1, and
x1 + 2x2 + x3 ≤ 7
(d) z = x1 + 2x2 subject to the constraints x1 − x2 + x3 ≤ 10, x1 + x2 + x3 ≥ 1, and
x1 + 2x2 + x3 ≤ 7
3. Consider contradictory constraints, x1 + x2 ≥ 12 and x1 + 2x2 ≤ 5, x1 ≥ 0, x2 ≥ 0.
You know these two contradict but show they contradict using the simplex algorithm.
4. Find a solution to the following inequalities for x, y ≥ 0 if it is possible to do so. If it
is not possible, prove it is not possible.
(a)
6x + 3y ≥ 4
8x + 4y ≤ 5
(b)
6x1 + 4x3 ≤ 11
5x1 + 4x2 + 4x3 ≥ 8
6x1 + 6x2 + 5x3 ≤ 11
(c)
6x1 + 4x3 ≤ 11
5x1 + 4x2 + 4x3 ≥ 9
6x1 + 6x2 + 5x3 ≤ 9
(d)
x1 − x2 + x3 ≤ 2
x1 + 2x2 ≥ 4
3x1 + 2x3 ≤ 7
(e)
5x1 − 2x2 + 4x3 ≤ 1
6x1 − 3x2 + 5x3 ≥ 2
5x1 − 2x2 + 4x3 ≤ 5
5. Minimize z = x1 + x2 subject to x1 + x2 ≥ 2, x1 + 3x2 ≤ 20, x1 + x2 ≤ 18. Change
to a maximization problem and solve as follows: Let yi = M − xi . Formulate in terms
of y1 , y2 .
Chapter 7
Spectral Theory
Spectral Theory refers to the study of eigenvalues and eigenvectors of a matrix. It is of
fundamental importance in many areas. Row operations will no longer be such a useful tool
in this subject.
7.1 Eigenvalues And Eigenvectors Of A Matrix
The field of scalars in spectral theory is best taken to equal C although I will sometimes
refer to it as F when it could be either C or R.
Definition 7.1.1 Let M be an n × n matrix and let $x \in \mathbb{C}^n$ be a nonzero vector for which
\[
Mx = \lambda x \tag{7.1}
\]
for some scalar, λ. Then x is called an eigenvector and λ is called an eigenvalue (characteristic value) of the matrix M.
Eigenvectors are never equal to zero!
The set of all eigenvalues of an n × n matrix M, is denoted by σ (M ) and is referred to as
the spectrum of M.
Eigenvectors are vectors which are shrunk, stretched or reflected upon multiplication by
a matrix. How can they be identified? Suppose x satisfies 7.1. Then
\[
(\lambda I - M)x = 0
\]
for some $x \neq 0$. Therefore, the matrix M − λI cannot have an inverse and so by Theorem 3.3.18
\[
\det(\lambda I - M) = 0. \tag{7.2}
\]
In other words, λ must be a zero of the characteristic polynomial, det (λI − M). Since M is an n×n matrix,
it follows from the theorem on expanding a determinant by cofactors that this is a polynomial
equation of degree n. As such, it has a solution, λ ∈ C. Is it actually an eigenvalue? The
answer is yes and this follows from Theorem 3.3.26 on Page 103. Since det (λI − M ) = 0
the matrix λI − M cannot be one to one and so there exists a nonzero vector, x such that
(λI − M ) x = 0. This proves the following corollary.
Corollary 7.1.2 Let M be an n×n matrix and det (M − λI) = 0. Then there exists a nonzero
vector $x \in \mathbb{C}^n$ such that (M − λI) x = 0.
As an example, consider the following.
Example 7.1.3 Find the eigenvalues and eigenvectors for the matrix
\[
A = \begin{pmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{pmatrix}.
\]
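You first need to identify the eigenvalues. Recall this requires the solution of the equation
\[
\det\left( \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{pmatrix} \right) = 0
\]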
When you expand this determinant, you find the equation is
\[
(\lambda - 5)\left(\lambda^2 - 20\lambda + 100\right) = 0
\]
and so the eigenvalues are
5, 10, 10.
I have listed 10 twice because it is a zero of multiplicity two due to
\[
\lambda^2 - 20\lambda + 100 = (\lambda - 10)^2.
\]
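The factorization can be checked by evaluating det (λI − A) directly. The sketch below is only an illustration; the helper names `det3` and `char_poly` are introduced here and are not from the text.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[5, -10, -5], [2, 14, 2], [-4, -8, 6]]

def char_poly(lam):
    """Evaluate det(lam*I - A) at the number lam."""
    M = [[(lam if r == c else 0) - A[r][c] for c in range(3)] for r in range(3)]
    return det3(M)

# Compare the determinant with the factored form at a few sample values.
for lam in (5, 10, 0, 1):
    print(lam, char_poly(lam), (lam - 5) * (lam - 10) ** 2)
```

Two monic cubic polynomials which agree at more than three points are identical, so agreement at a handful of integers is already a complete check here.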
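Having found the eigenvalues, it only remains to find the eigenvectors. First find the
eigenvectors for λ = 5. As explained above, this requires you to solve the equation,
\[
\left( 5 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]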
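That is you need to find the solution to
\[
\begin{pmatrix} 0 & 10 & 5 \\ -2 & -9 & -2 \\ 4 & 8 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]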
By now this is an old problem. You set up the augmented matrix and row reduce to get the
solution. Thus the matrix you must row reduce is
\[
\left(\begin{array}{cccc} 0 & 10 & 5 & 0 \\ -2 & -9 & -2 & 0 \\ 4 & 8 & -1 & 0 \end{array}\right). \tag{7.3}
\]
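The reduced row echelon form is
\[
\left(\begin{array}{cccc} 1 & 0 & -\frac{5}{4} & 0 \\ 0 & 1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]
and so the solution is any vector of the form
\[
\begin{pmatrix} \frac{5}{4}z \\ -\frac{1}{2}z \\ z \end{pmatrix} = z \begin{pmatrix} \frac{5}{4} \\ -\frac{1}{2} \\ 1 \end{pmatrix}
\]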
where z ∈ F. You would obtain the same collection of vectors if you replaced z with 4z.
Thus a simpler description for the solutions to this system of equations whose augmented
matrix is in 7.3 is
\[
z \begin{pmatrix} 5 \\ -2 \\ 4 \end{pmatrix} \tag{7.4}
\]
where z ∈ F. Now you need to remember that you can’t take z = 0 because this would
result in the zero vector and
Eigenvectors are never equal to zero!
Other than this value, every other choice of z in 7.4 results in an eigenvector. It is a good
idea to check your work! To do so, I will take the original matrix and multiply by this vector
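and see if I get 5 times this vector.
\[
\begin{pmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{pmatrix} \begin{pmatrix} 5 \\ -2 \\ 4 \end{pmatrix} = \begin{pmatrix} 25 \\ -10 \\ 20 \end{pmatrix} = 5 \begin{pmatrix} 5 \\ -2 \\ 4 \end{pmatrix}
\]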
so it appears this is correct. Always check your work on these problems if you care about
getting the answer right.
The variable, z is called a free variable or sometimes a parameter. The set of vectors in
7.4 is called the eigenspace and it equals ker (λI − A) . You should observe that in this case
the eigenspace has dimension 1 because there is one vector which spans the eigenspace. In
general, you obtain the solution from the row echelon form and the number of different free
variables gives you the dimension of the eigenspace. Just remember that not every vector
in the eigenspace is an eigenvector. The vector, 0 is not an eigenvector although it is in the
eigenspace because
Eigenvectors are never equal to zero!
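Next consider the eigenvectors for λ = 10. These vectors are solutions to the equation,
\[
\left( 10 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]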
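That is you must find the solutions to
\[
\begin{pmatrix} 5 & 10 & 5 \\ -2 & -4 & -2 \\ 4 & 8 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]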
which reduces to consideration of the augmented matrix
\[
\left(\begin{array}{cccc} 5 & 10 & 5 & 0 \\ -2 & -4 & -2 & 0 \\ 4 & 8 & 4 & 0 \end{array}\right)
\]
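The row reduced echelon form for this matrix is
\[
\left(\begin{array}{cccc} 1 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]
and so the eigenvectors are of the form
\[
\begin{pmatrix} -2y - z \\ y \\ z \end{pmatrix} = y \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.
\]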
You can’t pick z and y both equal to zero because this would result in the zero vector and
Eigenvectors are never equal to zero!
However, every other choice of z and y does result in an eigenvector for the eigenvalue
λ = 10. As in the case for λ = 5 you should check your work if you care about getting it
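right.
\[
\begin{pmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{pmatrix} \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -10 \\ 0 \\ 10 \end{pmatrix} = 10 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
\]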
so it worked. The other vector will also work. Check it.
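The checks in this example are easy to automate. The following sketch multiplies the matrix of Example 7.1.3 by each of the eigenvectors found above and compares the result with λ times the vector; the helper name `matvec` is a choice made for this illustration.

```python
def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[5, -10, -5], [2, 14, 2], [-4, -8, 6]]

# (eigenvalue, eigenvector) pairs found in Example 7.1.3
pairs = [(5, [5, -2, 4]), (10, [-2, 1, 0]), (10, [-1, 0, 1])]

for lam, v in pairs:
    # True exactly when A v equals lam v
    print(lam, v, matvec(A, v) == [lam * x for x in v])
```

Because the entries are integers, the comparison is exact and there is no rounding to worry about.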
The above example shows how to find eigenvectors and eigenvalues algebraically. You
may have noticed it is a bit long. Sometimes students try to first row reduce the matrix
before looking for eigenvalues. This is a terrible idea because row operations change the
eigenvalues. The row reduced matrix generally has different eigenvalues from the original. The eigenvalue problem is really not about row operations. A
general rule to remember about the eigenvalue problem is this.
If it is not long and hard it is usually wrong!
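To see concretely that a row operation changes the eigenvalues, consider a small sketch with a 2 × 2 matrix which is not from the text. For a 2 × 2 matrix the characteristic polynomial is λ² − (trace)λ + det, so comparing the trace and determinant before and after a row swap is enough. The names `char_coeffs`, `M` and `S` are introduced only for this illustration.

```python
def char_coeffs(m):
    """For a 2x2 matrix m return (trace, det), so that
    det(lam*I - m) = lam**2 - trace*lam + det."""
    (a, b), (c, d) = m
    return a + d, a * d - b * c

M = [[2, 1], [1, 2]]   # lam^2 - 4 lam + 3, eigenvalues 1 and 3
S = [[1, 2], [2, 1]]   # M with its two rows switched

print(char_coeffs(M))
print(char_coeffs(S))  # lam^2 - 2 lam - 3, eigenvalues 3 and -1
```

A single row switch has replaced the eigenvalue 1 with −1, which is why the eigenvalues must be found from the original matrix.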
The eigenvalue problem is the hardest problem in algebra and people still do research on
ways to find eigenvalues. Now if you are so fortunate as to find the eigenvalues as in the
above example, then finding the eigenvectors does reduce to row operations and this part
of the problem is easy. However, finding the eigenvalues is anything but easy because for
an n × n matrix, it involves solving a polynomial equation of degree n and none of us are
very good at doing this. If you only find a good approximation to the eigenvalue, it won’t
work. It either is or is not an eigenvalue and if it is not, the only solution to the equation,
(λI − M ) x = 0 will be the zero solution as explained above and
Eigenvectors are never equal to zero!
Here is another example.
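Example 7.1.4 Let
\[
A = \begin{pmatrix} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{pmatrix}
\]
First find the eigenvalues.
\[
\det\left( \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{pmatrix} \right) = 0
\]
This is $\lambda^3 - 6\lambda^2 + 8\lambda = 0$ and the solutions are 0, 2, and 4.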
0 Can be an Eigenvalue!
Now find the eigenvectors. For λ = 0 the augmented matrix for finding the solutions is
\[
\left(\begin{array}{cccc} 2 & 2 & -2 & 0 \\ 1 & 3 & -1 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right)
\]
and the row reduced echelon form is
\[
\left(\begin{array}{cccc} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]
Therefore, the eigenvectors are of the form
\[
z \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
\]
where $z \neq 0$.
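Next find the eigenvectors for λ = 2. The augmented matrix for the system of equations
needed to find these eigenvectors is
\[
\left(\begin{array}{cccc} 0 & -2 & 2 & 0 \\ -1 & -1 & 1 & 0 \\ 1 & -1 & 1 & 0 \end{array}\right)
\]
and the row reduced echelon form is
\[
\left(\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]
and so the eigenvectors are of the form
\[
z \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
\]
where $z \neq 0$.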
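Finally find the eigenvectors for λ = 4. The augmented matrix for the system of equations
needed to find these eigenvectors is
\[
\left(\begin{array}{cccc} 2 & -2 & 2 & 0 \\ -1 & 1 & 1 & 0 \\ 1 & -1 & 3 & 0 \end{array}\right)
\]
and the row reduced echelon form is
\[
\left(\begin{array}{cccc} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]
Therefore, the eigenvectors are of the form
\[
y \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
\]
where $y \neq 0$.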
Example 7.1.5 Let
\[
A = \begin{pmatrix} 2 & -2 & -1 \\ -2 & -1 & -2 \\ 14 & 25 & 14 \end{pmatrix}.
\]
Find the eigenvectors and eigenvalues.
In this case the eigenvalues are 3, 6, 6 where I have listed 6 twice because it is a zero of
algebraic multiplicity two, the characteristic equation being
\[
(\lambda - 3)(\lambda - 6)^2 = 0.
\]
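It remains to find the eigenvectors for these eigenvalues. First consider the eigenvectors for
λ = 3. You must solve
\[
\left( 3 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 2 & -2 & -1 \\ -2 & -1 & -2 \\ 14 & 25 & 14 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
Using routine row operations, the eigenvectors are nonzero vectors of the form
\[
\begin{pmatrix} z \\ -z \\ z \end{pmatrix} = z \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}
\]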
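Next consider the eigenvectors for λ = 6. This requires you to solve
\[
\left( 6 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 2 & -2 & -1 \\ -2 & -1 & -2 \\ 14 & 25 & 14 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]
and using the usual procedures yields the eigenvectors for λ = 6 are of the form
\[
z \begin{pmatrix} -\frac{1}{8} \\ -\frac{1}{4} \\ 1 \end{pmatrix}
\]
or written more simply,
\[
z \begin{pmatrix} -1 \\ -2 \\ 8 \end{pmatrix}
\]
where z ∈ F.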
Note that in this example the eigenspace for the eigenvalue λ = 6 is of dimension 1
because there is only one parameter which can be chosen. However, this eigenvalue is of
multiplicity two as a root to the characteristic equation.
Definition 7.1.6 If A is an n × n matrix with the property that some eigenvalue has algebraic multiplicity as a root of the characteristic equation which is greater than the dimension
of the eigenspace associated with this eigenvalue, then the matrix is called defective.
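One way to test whether a matrix is defective is to compare the dimension of each eigenspace, which is n minus the rank of λI − A, with the multiplicity of λ. The sketch below does this for Example 7.1.5 with exact arithmetic. The helper names `rank` and `eigenspace_dim` are introduced here for illustration only.

```python
from fractions import Fraction as Fr

def rank(M):
    """Rank of M computed by exact Gaussian elimination on a copy of M."""
    rows = [[Fr(v) for v in row] for row in M]
    rk = 0
    for col in range(len(rows[0])):
        # look for a usable pivot in this column among the unused rows
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def eigenspace_dim(A, lam):
    """Dimension of ker(lam*I - A), computed as n - rank(lam*I - A)."""
    n = len(A)
    M = [[(lam if r == c else 0) - A[r][c] for c in range(n)] for r in range(n)]
    return n - rank(M)

A = [[2, -2, -1], [-2, -1, -2], [14, 25, 14]]   # Example 7.1.5
print(eigenspace_dim(A, 3), eigenspace_dim(A, 6))
```

The eigenvalue 6 has algebraic multiplicity two, so an eigenspace of smaller dimension for λ = 6 means the matrix is defective. Running the same function on the matrix of Example 7.1.3 with λ = 10 gives a contrasting nondefective case.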
There may be repeated roots to the characteristic equation, 7.2 and it is not known
whether the dimension of the eigenspace equals the multiplicity of the eigenvalue. However,
the following theorem is available.
Theorem 7.1.7 Suppose $M v_i = \lambda_i v_i$, $i = 1, \cdots, r$, $v_i \neq 0$, and that if $i \neq j$, then $\lambda_i \neq \lambda_j$.
Then the set of eigenvectors, $\{v_1, \cdots, v_r\}$ is linearly independent.
Proof. Suppose the claim of the theorem is not true. Then there exists a subset of this
set of vectors
\[
\{w_1, \cdots, w_s\} \subseteq \{v_1, \cdots, v_r\}
\]
such that
\[
\sum_{j=1}^{s} c_j w_j = 0 \tag{7.5}
\]
where each $c_j \neq 0$. Say $M w_j = \mu_j w_j$ where
\[
\{\mu_1, \cdots, \mu_s\} \subseteq \{\lambda_1, \cdots, \lambda_r\},
\]
the $\mu_j$ being distinct eigenvalues of M. Out of all such subsets, let this one be such that s
is as small as possible. Then necessarily, s > 1 because otherwise, $c_1 w_1 = 0$ which would
imply $w_1 = 0$, which is not allowed for eigenvectors.
Now apply M to both sides of 7.5.
\[
\sum_{j=1}^{s} c_j \mu_j w_j = 0. \tag{7.6}
\]
Next pick an index m with $\mu_m \neq 0$ and multiply both sides of 7.5 by $\mu_m$. Such a $\mu_m$ exists because s > 1
and the $\mu_j$ are distinct, so at most one of them equals zero. Thus
\[
\sum_{j=1}^{s} c_j \mu_m w_j = 0 \tag{7.7}
\]
Subtract the sum in 7.7 from the sum in 7.6 to obtain
\[
\sum_{j=1}^{s} c_j \left(\mu_j - \mu_m\right) w_j = 0
\]
Now the constant $c_j(\mu_j - \mu_m)$ equals 0 when j = m, while for $j \neq m$ it is nonzero because the $\mu_j$ are distinct. This is a relation of the same sort involving only s − 1 of the vectors. Therefore, s was not as small
as possible after all.
In words, this theorem says that eigenvectors associated with distinct eigenvalues are
linearly independent.
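As an illustration of the theorem, the three eigenvectors found in Example 7.1.4 belong to the distinct eigenvalues 0, 2 and 4, so they should be linearly independent. One way to confirm this, sketched below, is to check that the matrix having these vectors as columns has nonzero determinant. The helper name `det3` is a choice made here.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Eigenvectors from Example 7.1.4 for the eigenvalues 0, 2, 4.
v0, v2, v4 = [1, 0, 1], [0, 1, 1], [1, 1, 0]

# Build the matrix whose columns are the eigenvectors.
S = [list(row) for row in zip(v0, v2, v4)]

d = det3(S)
print(d, d != 0)  # a nonzero determinant means the columns are independent
```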
Sometimes you have to consider eigenvalues which are complex numbers. This occurs in
differential equations for example. You do these problems exactly the same way as you do
the ones in which the eigenvalues are real. Here is an example.
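Example 7.1.8 Find the eigenvalues and eigenvectors of the matrix
\[
A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix}.
\]
You need to find the eigenvalues. Solve
\[
\det\left( \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix} \right) = 0.
\]
This reduces to $(\lambda - 1)\left(\lambda^2 - 4\lambda + 5\right) = 0$. The solutions are λ = 1, λ = 2 + i, λ = 2 − i.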
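There is nothing new about finding the eigenvectors for λ = 1 so consider the eigenvalue
λ = 2 + i. You need to solve
\[
\left( (2 + i) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]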
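In other words, you must consider the augmented matrix
\[
\left(\begin{array}{cccc} 1 + i & 0 & 0 & 0 \\ 0 & i & 1 & 0 \\ 0 & -1 & i & 0 \end{array}\right)
\]
for the solution. Divide the top row by (1 + i) and then take −i times the second row and
add to the bottom. This yields
\[
\left(\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & i & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]
Now multiply the second row by −i to obtain
\[
\left(\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & -i & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]
Therefore, the eigenvectors are of the form
\[
z \begin{pmatrix} 0 \\ i \\ 1 \end{pmatrix}.
\]
You should find the eigenvectors for λ = 2 − i. These are
\[
z \begin{pmatrix} 0 \\ -i \\ 1 \end{pmatrix}.
\]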
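As usual, if you want to get it right you had better check it.
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 0 \\ -i \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 - 2i \\ 2 - i \end{pmatrix} = (2 - i) \begin{pmatrix} 0 \\ -i \\ 1 \end{pmatrix}
\]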
so it worked.
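The same check works for complex eigenvalues with no change in method. Here is a small sketch using Python's built-in complex numbers, where `1j` denotes i; the helper name `matvec` is introduced only for this illustration.

```python
def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 0, 0], [0, 2, -1], [0, 1, 2]]

# (eigenvalue, eigenvector) pairs from Example 7.1.8
pairs = [(1, [1, 0, 0]), (2 + 1j, [0, 1j, 1]), (2 - 1j, [0, -1j, 1])]

for lam, v in pairs:
    # True exactly when A v equals lam v
    print(lam, matvec(A, v) == [lam * x for x in v])
```

The entries here are small integers and Gaussian integers, so the floating point products are exact and the equality test is safe. For general complex data a tolerance would be needed instead.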
7.2 Some Applications Of Eigenvalues And Eigenvectors
Recall that n × n matrices can be considered as linear transformations. If F is a 3 × 3 real
matrix having positive determinant, it can be shown that F = RU where R is a rotation
matrix and U is a symmetric real matrix having positive eigenvalues. An application of
this wonderful result, known to mathematicians as the right polar decomposition, is to
continuum mechanics where a chunk of material is identified with a set of points in three
dimensional space.
The linear transformation, F in this context is called the deformation gradient and
it describes the local deformation of the material. Thus it is possible to consider this
deformation in terms of two processes, one which distorts the material and the other which
just rotates it. It is the matrix U which is responsible for stretching and compressing. This
is why in continuum mechanics, the stress is often taken to depend on U which is known in
this context as the right stretch tensor (its square, $U^2 = F^T F$, is the right Cauchy Green strain tensor). This process of writing a matrix as a
product of two such matrices, one of which preserves distance and the other which distorts,
is also important in applications to geometric measure theory, an interesting field of study
in mathematics, and to the study of quadratic forms which occur in many applications such
as statistics. Here I am emphasizing the application to mechanics in which the eigenvectors
of U determine the principal directions, those directions in which the material is stretched
or compressed to the maximum extent.
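Example 7.2.1 Find the principal directions determined by the matrix
\[
\begin{pmatrix} \frac{29}{11} & \frac{6}{11} & \frac{6}{11} \\ \frac{6}{11} & \frac{41}{44} & \frac{19}{44} \\ \frac{6}{11} & \frac{19}{44} & \frac{41}{44} \end{pmatrix}
\]
The eigenvalues are 3, 1, and $\frac{1}{2}$.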
It is nice to be given the eigenvalues. The largest eigenvalue is 3 which means that in
the direction determined by the eigenvector associated with 3 the stretch is three times as
large. The smallest eigenvalue is 1/2 and so in the direction determined by the eigenvector
for 1/2 the material is compressed, becoming locally half as long. It remains to find these
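directions. First consider the eigenvector for 3. It is necessary to solve
\[
\left( 3 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} \frac{29}{11} & \frac{6}{11} & \frac{6}{11} \\ \frac{6}{11} & \frac{41}{44} & \frac{19}{44} \\ \frac{6}{11} & \frac{19}{44} & \frac{41}{44} \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]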
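Thus the augmented matrix for this system of equations is
\[
\left(\begin{array}{cccc} \frac{4}{11} & -\frac{6}{11} & -\frac{6}{11} & 0 \\ -\frac{6}{11} & \frac{91}{44} & -\frac{19}{44} & 0 \\ -\frac{6}{11} & -\frac{19}{44} & \frac{91}{44} & 0 \end{array}\right)
\]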
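The row reduced echelon form is
\[
\left(\begin{array}{cccc} 1 & 0 & -3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)
\]
and so the principal direction for the eigenvalue 3 in which the material is stretched to the
maximum extent is
\[
\begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}.
\]
A direction vector in this direction is
\[
\begin{pmatrix} 3/\sqrt{11} \\ 1/\sqrt{11} \\ 1/\sqrt{11} \end{pmatrix}.
\]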
You should show that the direction in which the material is compressed the most is in the
direction
\[
\begin{pmatrix} 0 \\ -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}
\]
Note this is meaningful information which you would have a hard time finding without
the theory of eigenvectors and eigenvalues.
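The two principal directions of Example 7.2.1 can be confirmed with exact arithmetic. The sketch below checks that the matrix sends (3, 1, 1) to three times itself and (0, −1, 1) to half of itself, then normalizes the first vector. The names `U`, `matvec` and `unit` are choices made for this illustration.

```python
import math
from fractions import Fraction as Fr

def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

U = [[Fr(29, 11), Fr(6, 11), Fr(6, 11)],
     [Fr(6, 11), Fr(41, 44), Fr(19, 44)],
     [Fr(6, 11), Fr(19, 44), Fr(41, 44)]]

stretch = [3, 1, 1]     # direction for the eigenvalue 3
squeeze = [0, -1, 1]    # direction for the eigenvalue 1/2

print(matvec(U, stretch) == [3 * x for x in stretch])
print(matvec(U, squeeze) == [Fr(1, 2) * x for x in squeeze])

# Unit vector in the stretching direction, as floating point numbers.
norm = math.sqrt(sum(x * x for x in stretch))
unit = [x / norm for x in stretch]
print(unit)
```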
Another application is to the problem of finding solutions to systems of differential
equations. It turns out that vibrating systems involving masses and springs can be studied
in the form
x′′ = Ax
(7.8)
where A is a real symmetric n × n matrix which has nonpositive eigenvalues. This is
analogous to the case of the scalar equation for undamped oscillation, $x'' + \omega^2 x = 0$. The
main difference is that here the scalar $\omega^2$ is replaced with the matrix −A. Consider the
problem of finding solutions to 7.8. You look for a solution which is in the form
\[
x(t) = v e^{\lambda t} \tag{7.9}
\]
and substitute this into 7.8. Thus
\[
x'' = v \lambda^2 e^{\lambda t} = e^{\lambda t} A v
\]
and so
\[
\lambda^2 v = Av.
\]
Therefore, $\lambda^2$ needs to be an eigenvalue of A and v needs to be an eigenvector. Since A
has nonpositive eigenvalues, $\lambda^2 = -a^2$ and so $\lambda = \pm ia$ where $-a^2$ is an eigenvalue of A.
Corresponding to this you obtain solutions of the form
\[
x(t) = v \cos(at), \quad v \sin(at).
\]
Note these solutions oscillate because of the cos (at) and sin (at) in the solutions. Here is
an example.
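Example 7.2.2 Find oscillatory solutions to the system of differential equations, x′′ = Ax
where
\[
A = \begin{pmatrix} -\frac{5}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{13}{6} & \frac{5}{6} \\ -\frac{1}{3} & \frac{5}{6} & -\frac{13}{6} \end{pmatrix}.
\]
The eigenvalues are −1, −2, and −3.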
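According to the above, you can find solutions by looking for the eigenvectors. Consider
the eigenvectors for −3. The augmented matrix for finding the eigenvectors is
\[
\left(\begin{array}{cccc} -\frac{4}{3} & \frac{1}{3} & \frac{1}{3} & 0 \\ \frac{1}{3} & -\frac{5}{6} & -\frac{5}{6} & 0 \\ \frac{1}{3} & -\frac{5}{6} & -\frac{5}{6} & 0 \end{array}\right)
\]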
and its row echelon form is
\[
\left(\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]
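Therefore, the eigenvectors are of the form
\[
v = z \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}.
\]
It follows
\[
\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \cos\left(\sqrt{3}\, t\right), \quad \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \sin\left(\sqrt{3}\, t\right)
\]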
are both solutions to the system of differential equations. You can find other oscillatory
solutions in the same way by considering the other eigenvalues. You might try checking
these answers to verify they work.
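One way to check such an answer numerically is to approximate x′′ by a central difference and compare it with Ax at some time t. The sketch below does this for the cosine solution of Example 7.2.2. The step size `h`, the sample time `t` and the helper names are assumptions made for this illustration, and the agreement is only up to discretization and rounding error.

```python
import math

def matvec(A, v):
    """Multiply the matrix A (a list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[-5 / 3, -1 / 3, -1 / 3],
     [-1 / 3, -13 / 6, 5 / 6],
     [-1 / 3, 5 / 6, -13 / 6]]
v = [0.0, -1.0, 1.0]          # eigenvector for the eigenvalue -3
a = math.sqrt(3)              # lambda^2 = -a^2 = -3

def x(t):
    """The candidate solution x(t) = v cos(a t)."""
    return [vi * math.cos(a * t) for vi in v]

def second_derivative(f, t, h=1e-4):
    """Central difference approximation to f''(t), componentwise."""
    return [(p - 2 * q + r) / h ** 2
            for p, q, r in zip(f(t + h), f(t), f(t - h))]

t = 0.7
lhs = second_derivative(x, t)   # approximates x''(t)
rhs = matvec(A, x(t))           # A x(t)
err = max(abs(l - r) for l, r in zip(lhs, rhs))
print(err)                      # should be small compared with the entries of rhs
```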
This is just a special case of a procedure used in differential equations to obtain closed
form solutions to systems of differential equations using linear algebra. The overall philosophy is to take one of the easiest problems in analysis and change it into the eigenvalue
problem which is the most difficult problem in algebra. However, when it works, it gives
precise solutions in terms of known functions.
7.3 Exercises
1. If A is the matrix of a linear transformation which rotates all vectors in $\mathbb{R}^2$ through
30°, explain why A cannot have any real eigenvalues.
2. If A is an n × n matrix and c is a nonzero constant, compare the eigenvalues of A and
cA.
3. If A is an invertible n × n matrix, compare the eigenvalues of A and A−1 . More
generally, for m an arbitrary integer, compare the eigenvalues of A and Am .
4. Let A, B be invertible n × n matrices which commute. That is, AB = BA. Suppose
x is an eigenvector of B. Show that then Ax must also be an eigenvector for B.
5. Suppose A is an n × n matrix and it satisfies Am = A for some m a positive integer
larger than 1. Show that if λ is an eigenvalue of A then |λ| equals either 0 or 1.
6. Show that if Ax = λx and Ay = λy, then whenever a, b are scalars,
A (ax + by) = λ (ax + by) .
Does this imply that ax + by is an eigenvector? Explain.
7. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} -1 & -1 & 7 \\ -1 & 0 & 4 \\ -1 & -1 & 5 \end{pmatrix}$. Determine
whether the matrix is defective.
8. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} -3 & -7 & 19 \\ -2 & -1 & 8 \\ -2 & -3 & 10 \end{pmatrix}$. Determine
whether the matrix is defective.
9. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} -7 & -12 & 30 \\ -3 & -7 & 15 \\ -3 & -6 & 14 \end{pmatrix}$.
10. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 7 & -2 & 0 \\ 8 & -1 & 0 \\ -2 & 4 & 6 \end{pmatrix}$. Determine
whether the matrix is defective.
11. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 3 & -2 & -1 \\ 0 & 5 & 1 \\ 0 & 2 & 4 \end{pmatrix}$.
12. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 6 & 8 & -23 \\ 4 & 5 & -16 \\ 3 & 4 & -12 \end{pmatrix}$. Determine
whether the matrix is defective.
13. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 5 & 2 & -5 \\ 12 & 3 & -10 \\ 12 & 4 & -11 \end{pmatrix}$. Determine
whether the matrix is defective.
14. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 20 & 9 & -18 \\ 6 & 5 & -6 \\ 30 & 14 & -27 \end{pmatrix}$. Determine
whether the matrix is defective.
15. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 1 & 26 & -17 \\ 4 & -4 & 4 \\ -9 & -18 & 9 \end{pmatrix}$. Determine
whether the matrix is defective.
16. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 3 & -1 & -2 \\ 11 & 3 & -9 \\ 8 & 0 & -6 \end{pmatrix}$. Determine
whether the matrix is defective.
17. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} -2 & 1 & 2 \\ -11 & -2 & 9 \\ -8 & 0 & 7 \end{pmatrix}$. Determine
whether the matrix is defective.
18. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 2 & 1 & -1 \\ 2 & 3 & -2 \\ 2 & 2 & -1 \end{pmatrix}$. Determine whether
the matrix is defective.
19. Find the complex eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 4 & -2 & -2 \\ 0 & 2 & -2 \\ 2 & 0 & 2 \end{pmatrix}$.
20. Find the eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 9 & 6 & -3 \\ 0 & 6 & 0 \\ -3 & -6 & 9 \end{pmatrix}$. Determine
whether the matrix is defective.
21. Find the complex eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 4 & -2 & -2 \\ 0 & 2 & -2 \\ 2 & 0 & 2 \end{pmatrix}$. Determine
whether the matrix is defective.
22. Find the complex eigenvalues and eigenvectors of the matrix $\begin{pmatrix} -4 & 2 & 0 \\ 2 & -4 & 0 \\ -2 & 2 & -2 \end{pmatrix}$.
Determine whether the matrix is defective.
23. Find the complex eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 1 & 1 & -6 \\ 7 & -5 & -6 \\ -1 & 7 & 2 \end{pmatrix}$.
Determine whether the matrix is defective.
24. Find the complex eigenvalues and eigenvectors of the matrix $\begin{pmatrix} 4 & 2 & 0 \\ -2 & 4 & 0 \\ -2 & 2 & 6 \end{pmatrix}$. Determine
whether the matrix is defective.
25. Here is a matrix.
\[
\begin{pmatrix} 1 & a & 0 & 0 \\ 0 & 1 & b & 0 \\ 0 & 0 & 2 & c \\ 0 & 0 & 0 & 2 \end{pmatrix}
\]
Find values of a, b, c for which the matrix is defective and values of a, b, c for which it
is nondefective.
26. Here is a matrix.
\[
\begin{pmatrix} a & 1 & 0 \\ 0 & b & 1 \\ 0 & 0 & c \end{pmatrix}
\]
where a, b, c are numbers. Show this is sometimes defective depending on the choice
of a, b, c. What is an easy case which will ensure it is not defective?
27. Suppose A is an n × n matrix consisting entirely of real entries but a + ib is a complex
eigenvalue having the eigenvector, x + iy. Here x and y are real vectors. Show that
then a − ib is also an eigenvalue with the eigenvector, x − iy. Hint: You should
remember that the conjugate of a product of complex numbers equals the product of
the conjugates. Here a + ib is a complex number whose conjugate equals a − ib.
28. Recall an n × n matrix is said to be symmetric if it has all real entries and if A = AT .
Show the eigenvalues of a real symmetric matrix are real and for each eigenvalue, it
has a real eigenvector.
29. Recall an n × n matrix is said to be skew symmetric if it has all real entries and if
$A = -A^T$. Show that any nonzero eigenvalues must be of the form ib where $i^2 = -1$ and b is real.
In words, the eigenvalues are either 0 or pure imaginary.
30. Is it possible for a nonzero matrix to have only 0 as an eigenvalue?
31. Show that the eigenvalues and eigenvectors of a real matrix occur in conjugate pairs.
32. Suppose A is an n × n matrix having all real eigenvalues which are distinct. Show
there exists S such that $S^{-1}AS = D$, a diagonal matrix. If
\[
D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}
\]
define $e^D$ by
\[
e^D \equiv \begin{pmatrix} e^{\lambda_1} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n} \end{pmatrix}
\]
and define
\[
e^A \equiv S e^D S^{-1}.
\]
Next show that if A is as just described, so is tA where t is a real number and the
eigenvalues of At are $t\lambda_k$. If you differentiate a matrix of functions entry by entry so
that for the $ij^{th}$ entry of $A'(t)$ you get $a_{ij}'(t)$ where $a_{ij}(t)$ is the $ij^{th}$ entry of $A(t)$,
show
\[
\frac{d}{dt}\left(e^{At}\right) = A e^{At}
\]
Next show $\det\left(e^{At}\right) \neq 0$. This is called the matrix exponential. Note I have only
defined it for the case where the eigenvalues of A are real, but the same procedure will
work even for complex eigenvalues. All you have to do is to define what is meant by
$e^{a+ib}$.
33. Find the principal directions determined by the matrix $\begin{pmatrix} \frac{7}{12} & -\frac{1}{4} & \frac{1}{6} \\ -\frac{1}{4} & \frac{7}{12} & -\frac{1}{6} \\ \frac{1}{6} & -\frac{1}{6} & \frac{2}{3} \end{pmatrix}$. The
eigenvalues are $\frac{1}{3}$, 1, and $\frac{1}{2}$ listed according to multiplicity.
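34. Find the principal directions determined by the matrix
\[
\begin{pmatrix} \frac{5}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{7}{6} & \frac{1}{6} \\ -\frac{1}{3} & \frac{1}{6} & \frac{7}{6} \end{pmatrix}
\]
The eigenvalues are 1, 2, and 1. What is the physical interpretation of the repeated eigenvalue?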
35. Find oscillatory solutions to the system of differential equations, x′′ = Ax where
\[
A = \begin{pmatrix} -3 & -1 & -1 \\ -1 & -2 & 0 \\ -1 & 0 & -2 \end{pmatrix}.
\]
The eigenvalues are −1, −4, and −2.
−1 0 −2
36. Let A and B be n × n matrices and let the columns of B be
b1 , · · · , bn
and the rows of A are
aT1 , · · · , aTn .
Show the columns of AB are
Ab1 · · · Abn
and the rows of AB are
aT1 B · · · aTn B.
37. Let M be an n × n matrix. Then define the adjoint of M, denoted by $M^*$ to be the
transpose of the conjugate of M. For example,
\[
\begin{pmatrix} 2 & i \\ 1 + i & 3 \end{pmatrix}^* = \begin{pmatrix} 2 & 1 - i \\ -i & 3 \end{pmatrix}.
\]
A matrix M, is self adjoint if $M^* = M$. Show the eigenvalues of a self adjoint matrix
are all real.
38. Let M be an n × n matrix and suppose x1 , · · · , xn are n eigenvectors which form a
linearly independent set. Form the matrix S by making the columns these vectors.
Show that S −1 exists and that S −1 M S is a diagonal matrix (one having zeros everywhere except on the main diagonal) having the eigenvalues of M on the main diagonal.
When this can be done the matrix is said to be diagonalizable.
39. Show that a n × n matrix M is diagonalizable if and only if Fn has a basis of eigenvectors. Hint: The first part is done in Problem 38. It only remains to show that if the
matrix can be diagonalized by some matrix S giving D = S −1 M S for D a diagonal
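matrix, then it has a basis of eigenvectors. Try using the columns of the matrix S.
40. Let
\[
A = \begin{pmatrix} 1 & 2 & 2 \\ 3 & 4 & 0 \\ 0 & 1 & 3 \end{pmatrix}
\]
and let
\[
B = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \end{pmatrix}
\]
Multiply AB verifying the block multiplication formula. Here $A_{11} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $A_{12} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$, $A_{21} = \begin{pmatrix} 0 & 1 \end{pmatrix}$ and $A_{22} = (3)$.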
41. Suppose A, B are n × n matrices and λ is a nonzero eigenvalue of AB. Show that then
it is also an eigenvalue of BA. Hint: Use the definition of what it means for λ to be
an eigenvalue. That is,
ABx = λx
where x ̸= 0. Maybe you should multiply both sides by B.
42. Using the above problem show that if A, B are n × n matrices, it is not possible that
AB − BA = aI for any a ̸= 0. Hint: First show that if A is a matrix, then the
eigenvalues of A − aI are λ − a where λ is an eigenvalue of A.
43. Consider the following matrix.
\[
C = \begin{pmatrix} 0 & \cdots & 0 & -a_0 \\ 1 & 0 & & -a_1 \\ & \ddots & \ddots & \vdots \\ 0 & & 1 & -a_{n-1} \end{pmatrix}
\]
Show $\det(\lambda I - C) = a_0 + \lambda a_1 + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n$. This matrix is called a companion
matrix for the given polynomial.
44. A discrete dynamical system is of the form
\[
x(k+1) = Ax(k), \quad x(0) = x_0
\]
where A is an n × n matrix and x(k) is a vector in $\mathbb{R}^n$. Show first that
\[
x(k) = A^k x_0
\]
for all k ≥ 1. If A is nondefective so that it has a basis of eigenvectors, $\{v_1, \cdots, v_n\}$
where
\[
Av_j = \lambda_j v_j
\]
you can write the initial condition $x_0$ in a unique way as a linear combination of these
eigenvectors. Thus
\[
x_0 = \sum_{j=1}^{n} a_j v_j
\]
Now explain why
\[
x(k) = \sum_{j=1}^{n} a_j A^k v_j = \sum_{j=1}^{n} a_j \lambda_j^k v_j
\]
which gives a formula for x(k), the solution of the dynamical system.
45. Suppose $A$ is an $n \times n$ matrix and let $v$ be an eigenvector such that $Av = \lambda v$. Also
suppose the characteristic polynomial of $A$ is
\[ \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0. \]
Explain why
\[ \left(A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I\right)v = 0. \]
If $A$ is nondefective, give a very easy proof of the Cayley Hamilton theorem based on
this. Recall this theorem says $A$ satisfies its characteristic equation,
\[ A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = 0. \]
46. Suppose an $n \times n$ nondefective matrix $A$ has only 1 and −1 as eigenvalues. Find $A^{12}$.
47. Suppose the characteristic polynomial of an $n \times n$ matrix $A$ is $1 - \lambda^n$. Find $A^{mn}$ where
$m$ is an integer. Hint: Note first that $A$ is nondefective. Why?
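48. Sometimes sequences come in terms of a recursion formula. An example is the Fibonacci sequence.
\[ x_0 = 1 = x_1, \quad x_{n+1} = x_n + x_{n-1} \]
Show this can be considered as a discrete dynamical system as follows.
\[
\begin{pmatrix} x_{n+1} \\ x_n \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} x_n \\ x_{n-1} \end{pmatrix},
\quad
\begin{pmatrix} x_1 \\ x_0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]
Now use the technique of Problem 44 to find a formula for $x_n$.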
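As a numerical sanity check on Problems 44 and 48, the following minimal sketch compares direct iteration of the recursion with the formula obtained by expanding the initial vector in eigenvectors of the 2 × 2 matrix. The function names are illustrative only, and the closed form in `fib_by_eigenvalues` is one way of writing the result of the eigenvector expansion, assuming the eigenvalues $(1 \pm \sqrt{5})/2$ and eigenvectors $(\lambda_j, 1)$.

```python
import math

def fib_by_iteration(n):
    """Return x_n by applying (x_{k+1}, x_k) = A (x_k, x_{k-1}) n times."""
    hi, lo = 1, 1              # the initial vector (x_1, x_0)
    for _ in range(n):
        hi, lo = hi + lo, hi   # one multiplication by A = [[1, 1], [1, 0]]
    return lo

def fib_by_eigenvalues(n):
    """Return x_n from the eigenvalue expansion a_1 lam1^n v_1 + a_2 lam2^n v_2."""
    sqrt5 = math.sqrt(5.0)
    lam1 = (1 + sqrt5) / 2     # roots of lam^2 - lam - 1 = 0
    lam2 = (1 - sqrt5) / 2
    return (lam1 ** (n + 1) - lam2 ** (n + 1)) / sqrt5

for n in range(10):
    print(n, fib_by_iteration(n), round(fib_by_eigenvalues(n)))
```

The floating point formula is only reliable while the terms stay well within double precision, so the iteration remains the safer choice for large $n$.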
49. Let $A$ be an $n \times n$ matrix having characteristic polynomial
\[ \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0. \]
Show that $a_0 = (-1)^n \det(A)$.
7.4 Schur’s Theorem
Every matrix is related to an upper triangular matrix in a particularly significant way. This
is Schur’s theorem and it is the most important theorem in the spectral theory of matrices.
Lemma 7.4.1 Let {x1 , · · · , xn } be a basis for Fn . Then there exists an orthonormal basis for Fn , {u1 , · · · , un } which has the property that for each k ≤ n, span(x1 , · · · , xk ) =
span (u1 , · · · , uk ) .
Proof: Let {x1 , · · · , xn } be a basis for Fn . Let u1 ≡ x1 / |x1 | . Thus for k = 1,
span (u1 ) = span (x1 ) and {u1 } is an orthonormal set. Now suppose for some k < n, u1 , · · · ,
uk have been chosen such that (uj · ul ) = δ jl and span (x1 , · · · , xk ) = span (u1 , · · · , uk ).
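Then define
\[
u_{k+1} \equiv \frac{x_{k+1} - \sum_{j=1}^k (x_{k+1} \cdot u_j)\, u_j}{\left| x_{k+1} - \sum_{j=1}^k (x_{k+1} \cdot u_j)\, u_j \right|}, \tag{7.10}
\]
where the denominator is not equal to zero because the $x_j$ form a basis and so
\[ x_{k+1} \notin \operatorname{span}(x_1, \cdots, x_k) = \operatorname{span}(u_1, \cdots, u_k). \]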
Thus by induction,
uk+1 ∈ span (u1 , · · · , uk , xk+1 ) = span (x1 , · · · , xk , xk+1 ) .
Also, xk+1 ∈ span (u1 , · · · , uk , uk+1 ) which is seen easily by solving 7.10 for xk+1 and it
follows
span (x1 , · · · , xk , xk+1 ) = span (u1 , · · · , uk , uk+1 ) .
If $l \leq k$, then letting $C$ denote the reciprocal of the denominator in 7.10,
\[
(u_{k+1} \cdot u_l) = C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^k (x_{k+1} \cdot u_j)(u_j \cdot u_l) \right)
= C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^k (x_{k+1} \cdot u_j)\,\delta_{lj} \right)
= C\left( (x_{k+1} \cdot u_l) - (x_{k+1} \cdot u_l) \right) = 0.
\]
The vectors, $\{u_j\}_{j=1}^n$, generated in this way are therefore an orthonormal basis because
each vector has unit length. The process by which these vectors were generated is called the Gram Schmidt process.
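The construction in 7.10 can be sketched in a few lines. This is a minimal illustration rather than a numerically robust routine; it assumes the inner product $(x \cdot y) = \sum_j x_j \overline{y_j}$, and the names `dot`, `norm`, and `gram_schmidt` as well as the sample basis are chosen here for illustration.

```python
def dot(x, y):
    """Inner product, linear in the first argument: sum of x_j * conj(y_j)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return abs(dot(x, x)) ** 0.5

def gram_schmidt(basis):
    """Apply formula 7.10 to each vector of a basis, in order."""
    ortho = []
    for x in basis:
        r = list(x)
        for u in ortho:
            c = dot(x, u)                       # the coefficient (x_{k+1} . u_j)
            r = [ri - c * ui for ri, ui in zip(r, u)]
        length = norm(r)                        # the denominator in 7.10
        if length < 1e-12:
            raise ValueError("input vectors are not linearly independent")
        ortho.append([ri / length for ri in r])
    return ortho

basis = [[1, 1j, 0], [1, 0, 1], [0, 1, 1 + 1j]]
us = gram_schmidt(basis)
for i in range(3):
    print([round(abs(dot(us[i], us[j])), 10) for j in range(3)])
```

The printed table of $|(u_i \cdot u_j)|$ should be the identity up to rounding, which is the statement $(u_j \cdot u_l) = \delta_{jl}$.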
Here is a fundamental definition.
Definition 7.4.2 An n × n matrix U, is unitary if U U ∗ = I = U ∗ U where U ∗ is defined to
be the transpose of the conjugate of U.
Proposition 7.4.3 An n×n matrix is unitary if and only if the columns are an orthonormal
set.
Proof: This follows right away from the way we multiply matrices. If $U$ is an $n \times n$
complex matrix with columns $u_1, \cdots, u_n$, then
\[ (U^* U)_{ij} = u_i^* u_j = (u_i, u_j) \]
and the matrix is unitary if and only if this equals $\delta_{ij}$, which holds if and only if the columns are
orthonormal.
Theorem 7.4.4 Let $A$ be an $n \times n$ matrix. Then there exists a unitary matrix $U$ such that
\[ U^* A U = T, \tag{7.11} \]
where $T$ is an upper triangular matrix having the eigenvalues of $A$ on the main diagonal
listed according to multiplicity as roots of the characteristic equation.
Proof: The theorem is clearly true if $A$ is a $1 \times 1$ matrix. Just let $U = 1$, the $1 \times 1$
matrix which has 1 down the main diagonal and zeros elsewhere. Suppose it is true for
$(n-1) \times (n-1)$ matrices and let $A$ be an $n \times n$ matrix. Then let $v_1$ be a unit eigenvector
for $A$. Then there exists $\lambda_1$ such that
\[ Av_1 = \lambda_1 v_1, \quad |v_1| = 1. \]
Extend $\{v_1\}$ to a basis and then use Lemma 7.4.1 to obtain $\{v_1, \cdots, v_n\}$, an orthonormal
basis in $\mathbb{F}^n$. Let $U_0$ be a matrix whose $i$th column is $v_i$. Then from the above, it follows $U_0$
is unitary. Then $U_0^* A U_0$ is of the form
\[
\begin{pmatrix}
\lambda_1 & * & \cdots & * \\
0 & & & \\
\vdots & & A_1 & \\
0 & & &
\end{pmatrix}
\]
where $A_1$ is an $(n-1) \times (n-1)$ matrix. Now by induction there exists an $(n-1) \times (n-1)$
unitary matrix $\widetilde{U}_1$ such that
\[ \widetilde{U}_1^* A_1 \widetilde{U}_1 = T_{n-1}, \]
an upper triangular matrix. Consider
\[ U_1 \equiv \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix}. \]
This is a unitary matrix and
\[
U_1^* U_0^* A U_0 U_1
= \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1^* \end{pmatrix}
\begin{pmatrix} \lambda_1 & * \\ 0 & A_1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix}
= \begin{pmatrix} \lambda_1 & * \\ 0 & T_{n-1} \end{pmatrix} \equiv T
\]
where $T$ is upper triangular. Then let $U = U_0 U_1$. Since $(U_0 U_1)^* = U_1^* U_0^*$, it follows $A$
is similar to $T$ and that $U_0 U_1$ is unitary. Hence $A$ and $T$ have the same characteristic
polynomials and since the eigenvalues of $T$ are the diagonal entries listed according to
algebraic multiplicity, these are also the eigenvalues of $A$ listed according to multiplicity.

As a simple consequence of the above theorem, here is an interesting lemma.
Lemma 7.4.5 Let $A$ be of the form
\[
A = \begin{pmatrix}
P_1 & \cdots & * \\
\vdots & \ddots & \vdots \\
0 & \cdots & P_s
\end{pmatrix}
\]
where $P_k$ is an $m_k \times m_k$ matrix. Then
\[ \det(A) = \prod_k \det(P_k). \]
Also, the eigenvalues of $A$ consist of the union of the eigenvalues of the $P_j$.
Proof: Let $U_k$ be an $m_k \times m_k$ unitary matrix such that
\[ U_k^* P_k U_k = T_k \]
where $T_k$ is upper triangular. Then it follows that for
\[
U \equiv \begin{pmatrix} U_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s \end{pmatrix},
\quad
U^* = \begin{pmatrix} U_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s^* \end{pmatrix}
\]
and also
\[
\begin{pmatrix} U_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s^* \end{pmatrix}
\begin{pmatrix} P_1 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & P_s \end{pmatrix}
\begin{pmatrix} U_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s \end{pmatrix}
= \begin{pmatrix} T_1 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_s \end{pmatrix}.
\]
Therefore, since the determinant of an upper triangular matrix is the product of the diagonal
entries,
\[ \det(A) = \prod_k \det(T_k) = \prod_k \det(P_k). \]
From the above formula, the eigenvalues of $A$ consist of the eigenvalues of the upper triangular matrices $T_k$, and each $T_k$ has the same eigenvalues as $P_k$.

What if $A$ is a real matrix and you only want to consider real unitary matrices?
Theorem 7.4.6 Let $A$ be a real $n \times n$ matrix. Then there exists a real unitary matrix $Q$
and a matrix $T$ of the form
\[
T = \begin{pmatrix}
P_1 & \cdots & * \\
 & \ddots & \vdots \\
0 & & P_r
\end{pmatrix} \tag{7.12}
\]
where $P_i$ equals either a real $1 \times 1$ matrix or $P_i$ equals a real $2 \times 2$ matrix having as its
eigenvalues a conjugate pair of eigenvalues of $A$ such that $Q^T A Q = T$. The matrix $T$ is
called the real Schur form of the matrix $A$. Recall that a real unitary matrix is also called
an orthogonal matrix.
Proof: Suppose
\[ Av_1 = \lambda_1 v_1, \quad |v_1| = 1 \]
where $\lambda_1$ is real. Then let $\{v_1, \cdots, v_n\}$ be an orthonormal basis of vectors in $\mathbb{R}^n$. Let $Q_0$
be a matrix whose $i$th column is $v_i$. Then $Q_0^* A Q_0$ is of the form
\[
\begin{pmatrix}
\lambda_1 & * & \cdots & * \\
0 & & & \\
\vdots & & A_1 & \\
0 & & &
\end{pmatrix}
\]
where $A_1$ is a real $(n-1) \times (n-1)$ matrix. This is just like the proof of Theorem 7.4.4 up to
this point.
Now consider the case where $\lambda_1 = \alpha + i\beta$ where $\beta \neq 0$. It follows since $A$ is real that
$v_1 = z_1 + iw_1$ and that $\overline{v_1} = z_1 - iw_1$ is an eigenvector for the eigenvalue $\alpha - i\beta$. Here
$z_1$ and $w_1$ are real vectors. Since $v_1$ and $\overline{v_1}$ are eigenvectors corresponding to distinct
eigenvalues, they form a linearly independent set. From this it follows that $\{z_1, w_1\}$ is an
independent set of vectors in $\mathbb{C}^n$, hence in $\mathbb{R}^n$. Indeed, $\{v_1, \overline{v_1}\}$ is an independent set and
also $\operatorname{span}(v_1, \overline{v_1}) = \operatorname{span}(z_1, w_1)$. Now using the Gram Schmidt theorem in $\mathbb{R}^n$, there exists
$\{u_1, u_2\}$, an orthonormal set of real vectors such that $\operatorname{span}(u_1, u_2) = \operatorname{span}(v_1, \overline{v_1})$. For
example,
\[
u_1 = z_1 / |z_1|, \quad
u_2 = \frac{|z_1|^2 w_1 - (w_1 \cdot z_1)\, z_1}{\left| |z_1|^2 w_1 - (w_1 \cdot z_1)\, z_1 \right|}.
\]
Let $\{u_1, u_2, \cdots, u_n\}$ be an orthonormal basis in $\mathbb{R}^n$ and let $Q_0$ be a unitary matrix whose
$i$th column is $u_i$ so $Q_0$ is a real orthogonal matrix. Then $Au_j$ are both in $\operatorname{span}(u_1, u_2)$ for
$j = 1, 2$ and so $u_k^T A u_j = 0$ whenever $k \geq 3$. It follows that $Q_0^* A Q_0$ is of the form
\[
Q_0^* A Q_0 = \begin{pmatrix}
* & * & \cdots & * \\
* & * & & \\
0 & & & \\
\vdots & & A_1 & \\
0 & & &
\end{pmatrix}
= \begin{pmatrix} P_1 & * \\ 0 & A_1 \end{pmatrix}
\]
where $A_1$ is now an $(n-2) \times (n-2)$ matrix and $P_1$ is a $2 \times 2$ matrix. Now this is similar to $A$
and so two of its eigenvalues are $\alpha + i\beta$ and $\alpha - i\beta$.
Now find $\widetilde{Q}_1$, an $(n-2) \times (n-2)$ matrix, to put $A_1$ in an appropriate form as above and
come up with $A_2$, either an $(n-4) \times (n-4)$ matrix or an $(n-3) \times (n-3)$ matrix. Then the only
other difference is to let
\[
Q_1 = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & & & \\
\vdots & \vdots & & \widetilde{Q}_1 & \\
0 & 0 & & &
\end{pmatrix}
\]
thus putting a $2 \times 2$ identity matrix in the upper left corner rather than a one. Repeating this
process with the above modification for the case of a complex eigenvalue leads eventually
to 7.12 where $Q$ is the product of real unitary matrices $Q_i$ above. When the block $P_i$ is
$2 \times 2$, its eigenvalues are a conjugate pair of eigenvalues of $A$ and if it is $1 \times 1$ it is a real
eigenvalue of $A$.
Here is why this last claim is true.
\[
\lambda I - T = \begin{pmatrix}
\lambda I_1 - P_1 & \cdots & * \\
 & \ddots & \vdots \\
0 & & \lambda I_r - P_r
\end{pmatrix}
\]
where $I_k$ is the $2 \times 2$ identity matrix in the case that $P_k$ is $2 \times 2$ and is the number 1 in the
case where $P_k$ is a $1 \times 1$ matrix. Now by Lemma 7.4.5,
\[ \det(\lambda I - T) = \prod_{k=1}^r \det(\lambda I_k - P_k). \]
Therefore, $\lambda$ is an eigenvalue of $T$ if and only if it is an eigenvalue of some $P_k$. This proves
the theorem since the eigenvalues of $T$ are the same as those of $A$ including multiplicity
because they have the same characteristic polynomial due to the similarity of $A$ and $T$.

Corollary 7.4.7 Let $A$ be a real $n \times n$ matrix having only real eigenvalues. Then there
exists a real orthogonal matrix $Q$ and an upper triangular matrix $T$ such that
\[ Q^T A Q = T \]
and furthermore, if the eigenvalues of $A$ are listed in decreasing order,
\[ \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n, \]
$Q$ can be chosen such that $T$ is of the form
\[
\begin{pmatrix}
\lambda_1 & * & \cdots & * \\
0 & \lambda_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & * \\
0 & \cdots & 0 & \lambda_n
\end{pmatrix}
\]
Proof: Most of this follows right away from Theorem 7.4.6. It remains to verify the
claim that the diagonal entries can be arranged in the desired order. However, this follows
from a simple modification of the above argument. When you find $v_1$, the eigenvector for $\lambda_1$,
just be sure $\lambda_1$ is chosen to be the largest eigenvalue. Then in the rest of the argument,
always choose the largest eigenvalue at each step of the construction.

Of course there is a similar conclusion which can be proved exactly the same way in the
case where $A$ has complex eigenvalues.
Corollary 7.4.8 Let $A$ be a real $n \times n$ matrix. Then there exists a real orthogonal matrix
$Q$ and a block upper triangular matrix $T$ such that
\[
Q^T A Q = T = \begin{pmatrix}
P_1 & \cdots & * \\
 & \ddots & \vdots \\
0 & & P_r
\end{pmatrix}
\]
where $P_i$ equals either a real $1 \times 1$ matrix or $P_i$ equals a real $2 \times 2$ matrix having as its
eigenvalues a conjugate pair of eigenvalues of $A$. If $P_k$ corresponds to the two eigenvalues
$\alpha_k \pm i\beta_k \equiv \sigma(P_k)$, $Q$ can be chosen such that
\[ |\sigma(P_1)| \geq |\sigma(P_2)| \geq \cdots \]
where
\[ |\sigma(P_k)| \equiv \sqrt{\alpha_k^2 + \beta_k^2}. \]
The blocks $P_k$ can be arranged in any other order also.
Definition 7.4.9 When a linear transformation A, mapping a linear space V to V has a
basis of eigenvectors, the linear transformation is called non defective. Otherwise it is called
defective. An n × n matrix A, is called normal if AA∗ = A∗ A. An important class of normal
matrices is that of the Hermitian or self adjoint matrices. An n × n matrix A is self adjoint
or Hermitian if A = A∗ .
You can check that an example of a normal matrix which is neither symmetric nor
Hermitian is
\[
\begin{pmatrix}
6i & -(1+i)\sqrt{2} \\
(1-i)\sqrt{2} & 6i
\end{pmatrix}.
\]
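The check suggested above can be carried out numerically. The following is a small sketch using only built-in complex numbers; the helper names (`adjoint`, `transpose`, `matmul`, `close`) are introduced here for illustration.

```python
def adjoint(M):
    """Conjugate transpose M* of a matrix given as nested lists."""
    rows, cols = len(M), len(M[0])
    return [[complex(M[j][i]).conjugate() for j in range(rows)] for i in range(cols)]

def transpose(M):
    rows, cols = len(M), len(M[0])
    return [[M[j][i] for j in range(rows)] for i in range(cols)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def close(X, Y, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s = 2 ** 0.5
A = [[6j, -(1 + 1j) * s],
     [(1 - 1j) * s, 6j]]
A_star = adjoint(A)

is_normal = close(matmul(A, A_star), matmul(A_star, A))   # A A* = A* A
is_hermitian = close(A, A_star)                            # A = A*
is_symmetric = close(A, transpose(A))                      # A = A^T
print(is_normal, is_hermitian, is_symmetric)               # True False False
```

In fact this particular matrix satisfies $A^* = -A$, which is one way to see by hand that it commutes with its adjoint.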
The next lemma is the basis for concluding that every normal matrix is unitarily similar
to a diagonal matrix.
Lemma 7.4.10 If T is upper triangular and normal, then T is a diagonal matrix.
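Proof: This is obviously true if $T$ is $1 \times 1$. In fact, it can’t help being diagonal in this
case. Suppose then that the lemma is true for $(n-1) \times (n-1)$ matrices and let $T$ be an
upper triangular normal $n \times n$ matrix. Thus $T$ is of the form
\[
T = \begin{pmatrix} t_{11} & a^* \\ 0 & T_1 \end{pmatrix}, \quad
T^* = \begin{pmatrix} \overline{t_{11}} & 0^T \\ a & T_1^* \end{pmatrix}.
\]
Then
\[
T T^* = \begin{pmatrix} t_{11} & a^* \\ 0 & T_1 \end{pmatrix}
\begin{pmatrix} \overline{t_{11}} & 0^T \\ a & T_1^* \end{pmatrix}
= \begin{pmatrix} |t_{11}|^2 + a^* a & a^* T_1^* \\ T_1 a & T_1 T_1^* \end{pmatrix}
\]
\[
T^* T = \begin{pmatrix} \overline{t_{11}} & 0^T \\ a & T_1^* \end{pmatrix}
\begin{pmatrix} t_{11} & a^* \\ 0 & T_1 \end{pmatrix}
= \begin{pmatrix} |t_{11}|^2 & \overline{t_{11}}\, a^* \\ a\, t_{11} & a a^* + T_1^* T_1 \end{pmatrix}
\]
Since these two matrices are equal, comparing the upper left entries gives $a^* a = 0$ and so $a = 0$. But now it follows that $T_1^* T_1 = T_1 T_1^*$
and so by induction $T_1$ is a diagonal matrix $D_1$. Therefore,
\[
T = \begin{pmatrix} t_{11} & 0^T \\ 0 & D_1 \end{pmatrix},
\]
a diagonal matrix.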
Now here is a proof which doesn’t involve block multiplication. Since $T$ is normal,
$T^* T = T T^*$. Writing this in terms of components and using the description of the adjoint
as the transpose of the conjugate, yields the following for the $ik$th entry of $T^* T = T T^*$.
\[
\overbrace{\sum_j t_{ij} t^*_{jk} = \sum_j t_{ij} \overline{t_{kj}}}^{T T^*}
= \overbrace{\sum_j t^*_{ij} t_{jk} = \sum_j \overline{t_{ji}}\, t_{jk}}^{T^* T}.
\]
Now use the fact that $T$ is upper triangular and let $i = k = 1$ to obtain the following from
the above.
\[ \sum_j |t_{1j}|^2 = \sum_j |t_{j1}|^2 = |t_{11}|^2 \]
You see, $t_{j1} = 0$ unless $j = 1$ due to the assumption that $T$ is upper triangular. This shows
$T$ is of the form
\[
\begin{pmatrix}
* & 0 & \cdots & 0 \\
0 & * & \cdots & * \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & *
\end{pmatrix}.
\]
Now do the same thing only this time take $i = k = 2$ and use the result just established.
Thus, from the above,
\[ \sum_j |t_{2j}|^2 = \sum_j |t_{j2}|^2 = |t_{22}|^2, \]
showing that $t_{2j} = 0$ if $j > 2$ which means $T$ has the form
\[
\begin{pmatrix}
* & 0 & 0 & \cdots & 0 \\
0 & * & 0 & \cdots & 0 \\
0 & 0 & * & \cdots & * \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & 0 & 0 & *
\end{pmatrix}.
\]
Next let $i = k = 3$ and obtain that $T$ looks like a diagonal matrix in so far as the first 3
rows and columns are concerned. Continuing in this way, it follows $T$ is a diagonal matrix.
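The first step of the componentwise argument can be seen on a small example. The sketch below, with an arbitrarily chosen upper triangular matrix and illustrative helper names, compares the upper left entries of $TT^*$ and $T^*T$; their difference is $\sum_{j>1} |t_{1j}|^2$, which is why normality forces the rest of the first row to vanish.

```python
def adjoint(M):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# An upper triangular matrix that is not diagonal (arbitrary sample entries).
T = [[1, 2],
     [0, 3]]

TTs = matmul(T, adjoint(T))   # T T*
TsT = matmul(adjoint(T), T)   # T* T

# The (1,1) entries are sum_j |t_1j|^2 and sum_j |t_j1|^2 respectively.
gap = TTs[0][0] - TsT[0][0]
print(TTs[0][0], TsT[0][0], gap)
```

Here the gap equals $|t_{12}|^2 = 4$, so this $T$ is not normal, in agreement with Lemma 7.4.10.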
Theorem 7.4.11 Let A be a normal matrix. Then there exists a unitary matrix U such
that U ∗ AU is a diagonal matrix.
Proof: From Theorem 7.4.4 there exists a unitary matrix U such that U ∗ AU equals
an upper triangular matrix. The theorem is now proved if it is shown that the property of
being normal is preserved under unitary similarity transformations. That is, verify that if
A is normal and if B = U ∗ AU, then B is also normal. But this is easy.
\[
B^* B = U^* A^* U U^* A U = U^* A^* A U = U^* A A^* U = U^* A U U^* A^* U = B B^*.
\]
Therefore, U ∗ AU is a normal and upper triangular matrix and by Lemma 7.4.10 it must be
a diagonal matrix. The converse is also true. See Problem 9 below.
Corollary 7.4.12 If A is Hermitian, then all the eigenvalues of A are real and there exists
an orthonormal basis of eigenvectors.
Proof: Since A is normal, there exists unitary, U such that U ∗ AU = D, a diagonal
matrix whose diagonal entries are the eigenvalues of A. Therefore, D∗ = U ∗ A∗ U = U ∗ AU =
D showing D is real.
Finally, let
\[ U = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \]
where the $u_i$ denote the columns of $U$ and
\[
D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.
\]
The equation, $U^* A U = D$ implies
\[
AU = \begin{pmatrix} Au_1 & Au_2 & \cdots & Au_n \end{pmatrix}
= UD = \begin{pmatrix} \lambda_1 u_1 & \lambda_2 u_2 & \cdots & \lambda_n u_n \end{pmatrix}
\]
where the entries denote the columns of $AU$ and $UD$ respectively. Therefore, $Au_i = \lambda_i u_i$
and since the matrix is unitary, the $ij$th entry of $U^* U$ equals $\delta_{ij}$ and so
\[ \delta_{ij} = u_i^* u_j \equiv u_j \cdot u_i. \]
This proves the corollary because it shows the vectors $\{u_i\}$ are orthonormal. Therefore,
they form a basis because every orthonormal set of vectors is linearly independent.

Corollary 7.4.13 If $A$ is a real symmetric matrix, then $A$ is Hermitian and there exists a
real unitary matrix $U$ such that $U^T A U = D$ where $D$ is a diagonal matrix whose diagonal
entries are the eigenvalues of $A$. By arranging the columns of $U$ the diagonal entries of $D$
can be made to appear in any order.
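Proof: This follows from Theorem 7.4.6 and Corollary 7.4.12. Let
\[ U = \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix}. \]
Then $AU = UD$ so
\[
AU = \begin{pmatrix} Au_1 & \cdots & Au_n \end{pmatrix}
= \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} D
= \begin{pmatrix} \lambda_1 u_1 & \cdots & \lambda_n u_n \end{pmatrix}.
\]
Hence each column of $U$ is an eigenvector of $A$. It follows that by rearranging these columns,
the entries of $D$ on the main diagonal can be made to appear in any order. To see this,
consider such a rearrangement resulting in an orthogonal matrix $U'$ given by
\[ U' = \begin{pmatrix} u_{i_1} & \cdots & u_{i_n} \end{pmatrix}. \]
Then
\[
U'^T A U' = U'^T \begin{pmatrix} Au_{i_1} & \cdots & Au_{i_n} \end{pmatrix}
= \begin{pmatrix} u_{i_1}^T \\ \vdots \\ u_{i_n}^T \end{pmatrix}
\begin{pmatrix} \lambda_{i_1} u_{i_1} & \cdots & \lambda_{i_n} u_{i_n} \end{pmatrix}
= \begin{pmatrix} \lambda_{i_1} & & 0 \\ & \ddots & \\ 0 & & \lambda_{i_n} \end{pmatrix}.
\]

7.5 Trace And Determinant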
The determinant has already been discussed. It is also clear that if A = S −1 BS so that
A, B are similar, then
\[
\det(A) = \det\left(S^{-1}\right) \det(S) \det(B) = \det\left(S^{-1} S\right) \det(B) = \det(I) \det(B) = \det(B).
\]
The trace is defined in the following definition.
Definition 7.5.1 Let $A$ be an $n \times n$ matrix whose $ij$th entry is denoted as $a_{ij}$. Then
\[ \operatorname{trace}(A) \equiv \sum_i a_{ii}. \]
In other words it is the sum of the entries down the main diagonal.
Theorem 7.5.2 Let A be an m × n matrix and let B be an n × m matrix. Then
trace (AB) = trace (BA) .
Also if B = S −1 AS so that A, B are similar, then
trace (A) = trace (B) .
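Proof:
\[
\operatorname{trace}(AB) \equiv \sum_i \left( \sum_k A_{ik} B_{ki} \right) = \sum_k \sum_i B_{ki} A_{ik} = \operatorname{trace}(BA).
\]
Therefore,
\[
\operatorname{trace}(B) = \operatorname{trace}\left(S^{-1} A S\right) = \operatorname{trace}\left(A S S^{-1}\right) = \operatorname{trace}(A).
\]

Theorem 7.5.3 Let $A$ be an $n \times n$ matrix. Then $\operatorname{trace}(A)$ equals the sum of the eigenvalues
of $A$ and $\det(A)$ equals the product of the eigenvalues of $A$.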
This is proved using Schur’s theorem and is in Problem 17 below.
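Both trace statements are easy to spot check. The sketch below uses arbitrary integer matrices for Theorem 7.5.2 and, for Theorem 7.5.3, the symmetric matrix of the quadratic form example in Section 7.6, whose eigenvalues are stated there to be 2, −4 and 8. The helper names are illustrative.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Theorem 7.5.2 with a 2 x 3 matrix A and a 3 x 2 matrix B (sample entries).
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(trace(matmul(A, B)), trace(matmul(B, A)))

# Theorem 7.5.3 for the matrix in 7.14 and its stated eigenvalues.
Q = [[3, -4, 1], [-4, 0, -4], [1, -4, 3]]
eigenvalues = [2, -4, 8]
print(trace(Q), sum(eigenvalues))
print(det3(Q), eigenvalues[0] * eigenvalues[1] * eigenvalues[2])
```

Note that $AB$ is $2 \times 2$ while $BA$ is $3 \times 3$, yet the traces agree.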
7.6 Quadratic Forms
Definition 7.6.1 A quadratic form in three dimensions is an expression of the form
\[
\begin{pmatrix} x & y & z \end{pmatrix} A \begin{pmatrix} x \\ y \\ z \end{pmatrix} \tag{7.13}
\]
where $A$ is a $3 \times 3$ symmetric matrix. In higher dimensions the idea is the same except you
use a larger symmetric matrix in place of $A$. In two dimensions $A$ is a $2 \times 2$ matrix.
For example, consider
\[
\begin{pmatrix} x & y & z \end{pmatrix}
\begin{pmatrix} 3 & -4 & 1 \\ -4 & 0 & -4 \\ 1 & -4 & 3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix} \tag{7.14}
\]
which equals $3x^2 - 8xy + 2xz - 8yz + 3z^2$. This is very awkward because of the mixed terms
such as $-8xy$. The idea is to pick different axes such that if $x, y, z$ are taken with respect
to these axes, the quadratic form is much simpler. In other words, look for new variables,
$x'$, $y'$, and $z'$ and a unitary matrix $U$ such that
\[
U \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \tag{7.15}
\]
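and if you write the quadratic form in terms of the primed variables, there will be no mixed
terms. Any symmetric real matrix is Hermitian and is therefore normal. From Corollary
7.4.13, it follows there exists a real unitary matrix $U$ (an orthogonal matrix) such that
$U^T A U = D$, a diagonal matrix. Thus in the quadratic form, 7.13
\[
\begin{pmatrix} x & y & z \end{pmatrix} A \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x' & y' & z' \end{pmatrix} U^T A U \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
= \begin{pmatrix} x' & y' & z' \end{pmatrix} D \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
\]
and in terms of these new variables, the quadratic form becomes
\[ \lambda_1 (x')^2 + \lambda_2 (y')^2 + \lambda_3 (z')^2 \]
where $D = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3)$. Similar considerations apply equally well in any other dimension. For the given example,
\[
\begin{pmatrix}
-\frac{1}{2}\sqrt{2} & 0 & \frac{1}{2}\sqrt{2} \\
\frac{1}{6}\sqrt{6} & \frac{1}{3}\sqrt{6} & \frac{1}{6}\sqrt{6} \\
\frac{1}{3}\sqrt{3} & -\frac{1}{3}\sqrt{3} & \frac{1}{3}\sqrt{3}
\end{pmatrix}
\begin{pmatrix} 3 & -4 & 1 \\ -4 & 0 & -4 \\ 1 & -4 & 3 \end{pmatrix}
\begin{pmatrix}
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\
0 & \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}}
\end{pmatrix}
= \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 8 \end{pmatrix}
\]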
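and so if the new variables are given by
\[
\begin{pmatrix}
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\
0 & \frac{2}{\sqrt{6}} & -\frac{1}{\sqrt{3}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}}
\end{pmatrix}
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix},
\]
it follows that in terms of the new variables the quadratic form is $2(x')^2 - 4(y')^2 + 8(z')^2$.
You can work other examples the same way.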
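The diagonalization in this example can be confirmed numerically. The following sketch forms $U^T A U$ for the matrices of the example and evaluates the quadratic form at one sample point in both coordinate systems; the sample point and helper names are chosen here for illustration.

```python
import math

A = [[3, -4, 1], [-4, 0, -4], [1, -4, 3]]
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
# Columns of U are unit eigenvectors of A for the eigenvalues 2, -4, 8.
U = [[-1 / r2, 1 / r6, 1 / r3],
     [0.0,     2 / r6, -1 / r3],
     [1 / r2,  1 / r6, 1 / r3]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def quad_form(M, v):
    """Evaluate v^T M v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

D = matmul(matmul(transpose(U), A), U)            # should be diag(2, -4, 8)
print([[round(entry, 10) for entry in row] for row in D])

primed = [1.0, -2.0, 0.5]                         # sample (x', y', z')
x = matvec(U, primed)                             # (x, y, z) from 7.15
lhs = quad_form(A, x)
rhs = 2 * primed[0] ** 2 - 4 * primed[1] ** 2 + 8 * primed[2] ** 2
print(lhs, rhs)
```

Both evaluations should agree up to rounding, since $x^T A x = (x')^T U^T A U x'$.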
7.7 Second Derivative Test
Under certain conditions the mixed partial derivatives will always be equal. This astonishing fact was first observed by Euler around 1734. It is also called Clairaut’s theorem.
Theorem 7.7.1 Suppose f : U ⊆ F2 → R where U is an open set on which fx , fy , fxy and
fyx exist. Then if fxy and fyx are continuous at the point (x, y) ∈ U , it follows
fxy (x, y) = fyx (x, y) .
Proof: Since $U$ is open, there exists $r > 0$ such that $B((x, y), r) \subseteq U$. Now let $|t|, |s| < r/2$, $t, s$ real numbers and consider
\[
\Delta(s, t) \equiv \frac{1}{st} \Big\{ \overbrace{f(x+t, y+s) - f(x+t, y)}^{h(t)} - \overbrace{\left( f(x, y+s) - f(x, y) \right)}^{h(0)} \Big\}. \tag{7.16}
\]
Note that $(x+t, y+s) \in U$ because
\[
|(x+t, y+s) - (x, y)| = |(t, s)| = \left( t^2 + s^2 \right)^{1/2} \leq \left( \frac{r^2}{4} + \frac{r^2}{4} \right)^{1/2} = \frac{r}{\sqrt{2}} < r.
\]
As implied above, $h(t) \equiv f(x+t, y+s) - f(x+t, y)$. Therefore, by the mean value theorem
from calculus and the (one variable) chain rule,
\[
\Delta(s, t) = \frac{1}{st} \left( h(t) - h(0) \right) = \frac{1}{st} h'(\alpha t)\, t
= \frac{1}{s} \left( f_x(x + \alpha t, y + s) - f_x(x + \alpha t, y) \right)
\]
for some $\alpha \in (0, 1)$. Applying the mean value theorem again,
\[ \Delta(s, t) = f_{xy}(x + \alpha t, y + \beta s) \]
where $\alpha, \beta \in (0, 1)$.
If the terms $f(x+t, y)$ and $f(x, y+s)$ are interchanged in 7.16, $\Delta(s, t)$ is unchanged
and the above argument shows there exist $\gamma, \delta \in (0, 1)$ such that
\[ \Delta(s, t) = f_{yx}(x + \gamma t, y + \delta s). \]
Letting $(s, t) \to (0, 0)$ and using the continuity of $f_{xy}$ and $f_{yx}$ at $(x, y)$,
\[
\lim_{(s,t) \to (0,0)} \Delta(s, t) = f_{xy}(x, y) = f_{yx}(x, y).
\]

The following is obtained from the above by simply fixing all the variables except for the
two of interest.
Corollary 7.7.2 Suppose $U$ is an open subset of $\mathbb{F}^n$ and $f : U \to \mathbb{R}$ has the property
that for two indices, $k, l$, $f_{x_k}$, $f_{x_l}$, $f_{x_l x_k}$, and $f_{x_k x_l}$ exist on $U$ and $f_{x_k x_l}$ and $f_{x_l x_k}$ are both
continuous at $x \in U$. Then $f_{x_k x_l}(x) = f_{x_l x_k}(x)$.
Thus the theorem asserts that the mixed partial derivatives are equal at $x$ if they are
defined near $x$ and continuous at $x$.
Now recall the Taylor formula with the Lagrange form of the remainder. What follows
is a proof of this important result based on the mean value theorem or Rolle’s theorem.
Theorem 7.7.3 Suppose $f$ has $n + 1$ derivatives on an interval, $(a, b)$ and let $c \in (a, b)$.
Then if $x \in (a, b)$, there exists $\xi$ between $c$ and $x$ such that
\[
f(x) = f(c) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!} (x - c)^k + \frac{f^{(n+1)}(\xi)}{(n+1)!} (x - c)^{n+1}.
\]
(In this formula, the symbol $\sum_{k=1}^0 a_k$ will denote the number 0.)
Proof: If $n = 0$ then the theorem is true because it is just the mean value theorem.
Suppose the theorem is true for $n - 1$, $n \geq 1$. It can be assumed $x \neq c$ because if $x = c$ there
is nothing to show. Then there exists $K$ such that
\[
f(x) - \left( f(c) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!} (x - c)^k + K (x - c)^{n+1} \right) = 0. \tag{7.17}
\]
In fact,
\[
K = \frac{-f(x) + \left( f(c) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!} (x - c)^k \right)}{-(x - c)^{n+1}}.
\]
Now define $F(t)$ for $t$ in the closed interval determined by $x$ and $c$ by
\[
F(t) \equiv f(x) - \left( f(t) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!} (x - t)^k + K (x - t)^{n+1} \right).
\]
The $c$ in 7.17 got replaced by $t$.
Therefore, $F(c) = 0$ by the way $K$ was chosen and also $F(x) = 0$. By the mean value
theorem or Rolle’s theorem, there exists $t_1$ between $x$ and $c$ such that $F'(t_1) = 0$. Therefore,
\[
\begin{aligned}
0 &= f'(t_1) - \sum_{k=1}^n \frac{f^{(k)}(c)}{k!}\, k (x - t_1)^{k-1} - K (n+1) (x - t_1)^n \\
&= f'(t_1) - \left( f'(c) + \sum_{k=1}^{n-1} \frac{f^{(k+1)}(c)}{k!} (x - t_1)^k \right) - K (n+1) (x - t_1)^n \\
&= f'(t_1) - \left( f'(c) + \sum_{k=1}^{n-1} \frac{f'^{(k)}(c)}{k!} (x - t_1)^k \right) - K (n+1) (x - t_1)^n.
\end{aligned}
\]
By induction applied to $f'$, there exists $\xi$ between $x$ and $t_1$ such that the above simplifies
to
\[
\begin{aligned}
0 &= \frac{f'^{(n)}(\xi) (x - t_1)^n}{n!} - K (n+1) (x - t_1)^n \\
&= \frac{f^{(n+1)}(\xi) (x - t_1)^n}{n!} - K (n+1) (x - t_1)^n,
\end{aligned}
\]
therefore,
\[
K = \frac{f^{(n+1)}(\xi)}{(n+1)\, n!} = \frac{f^{(n+1)}(\xi)}{(n+1)!}
\]
and the formula is true for $n$.

The following is a special case and is what will be used.
Theorem 7.7.4 Let $h : (-\delta, 1 + \delta) \to \mathbb{R}$ have $m + 1$ derivatives. Then there exists $t \in [0, 1]$
such that
\[
h(1) = h(0) + \sum_{k=1}^m \frac{h^{(k)}(0)}{k!} + \frac{h^{(m+1)}(t)}{(m+1)!}.
\]
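Now let $f : U \to \mathbb{R}$ where $U \subseteq \mathbb{R}^n$ and suppose $f \in C^m(U)$. Let $x \in U$ and let $r > 0$ be
such that
\[ B(x, r) \subseteq U. \]
Then for $||v|| < r$, consider
\[ f(x + tv) - f(x) \equiv h(t) \]
for $t \in [0, 1]$. Then by the chain rule,
\[
h'(t) = \sum_{k=1}^n \frac{\partial f}{\partial x_k}(x + tv)\, v_k, \quad
h''(t) = \sum_{k=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_j \partial x_k}(x + tv)\, v_k v_j.
\]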
Then from the Taylor formula stopping at the second derivative, the following theorem can
be obtained.
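Theorem 7.7.5 Let $f : U \to \mathbb{R}$ and let $f \in C^2(U)$. Then if
\[ B(x, r) \subseteq U, \]
and $||v|| < r$, there exists $t \in (0, 1)$ such that
\[
f(x + v) = f(x) + \sum_{k=1}^n \frac{\partial f}{\partial x_k}(x)\, v_k + \frac{1}{2} \sum_{k=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_j \partial x_k}(x + tv)\, v_k v_j. \tag{7.18}
\]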
Definition 7.7.6 Define the following matrix.
\[
H_{ij}(x + tv) \equiv \frac{\partial^2 f(x + tv)}{\partial x_j \partial x_i}.
\]
It is called the Hessian matrix. From Corollary 7.7.2, this is a symmetric matrix. Then in
terms of this matrix, 7.18 can be written as
\[
f(x + v) = f(x) + \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x)\, v_j + \frac{1}{2} v^T H(x + tv)\, v.
\]
Then this implies
\[
f(x + v) = f(x) + \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x)\, v_j + \frac{1}{2} v^T H(x)\, v + \frac{1}{2} \left( v^T \left( H(x + tv) - H(x) \right) v \right). \tag{7.19}
\]
Using the above formula, here is the second derivative test.
Theorem 7.7.7 In the above situation, suppose $f_{x_j}(x) = 0$ for each $x_j$. Then if $H(x)$ has
all positive eigenvalues, $x$ is a local minimum for $f$. If $H(x)$ has all negative eigenvalues,
then $x$ is a local maximum. If $H(x)$ has a positive eigenvalue, then there exists a direction
in which $f$ has a local minimum at $x$, while if $H(x)$ has a negative eigenvalue, there exists
a direction in which $f$ has a local maximum at $x$.
Proof: Since $f_{x_j}(x) = 0$ for each $x_j$, formula 7.19 implies
\[
f(x + v) = f(x) + \frac{1}{2} v^T H(x)\, v + \frac{1}{2} \left( v^T \left( H(x + tv) - H(x) \right) v \right)
\]
where $H(x)$ is a symmetric matrix. Thus, by Corollary 7.4.12 $H(x)$ has all real eigenvalues.
Suppose first that $H(x)$ has all positive eigenvalues and that all are larger than $\delta^2 > 0$.
Then $H(x)$ has an orthonormal basis of eigenvectors, $\{v_i\}_{i=1}^n$ and if $u$ is an arbitrary vector,
$u = \sum_{j=1}^n u_j v_j$ where $u_j = u \cdot v_j$. Thus
\[
u^T H(x)\, u = \left( \sum_{k=1}^n u_k v_k^T \right) H(x) \left( \sum_{j=1}^n u_j v_j \right)
= \sum_{j=1}^n u_j^2 \lambda_j \geq \delta^2 \sum_{j=1}^n u_j^2 = \delta^2 |u|^2.
\]
From 7.19 and the continuity of $H$, if $v$ is small enough,
\[
f(x + v) \geq f(x) + \frac{1}{2} \delta^2 |v|^2 - \frac{1}{4} \delta^2 |v|^2 = f(x) + \frac{\delta^2}{4} |v|^2.
\]
This shows the first claim of the theorem. The second claim follows from similar reasoning.
Suppose $H(x)$ has a positive eigenvalue $\lambda^2$. Then let $v$ be an eigenvector for this eigenvalue.
From 7.19,
\[
f(x + tv) = f(x) + \frac{1}{2} t^2 v^T H(x)\, v + \frac{1}{2} t^2 \left( v^T \left( H(x + tv) - H(x) \right) v \right)
\]
which implies
\[
\begin{aligned}
f(x + tv) &= f(x) + \frac{1}{2} t^2 \lambda^2 |v|^2 + \frac{1}{2} t^2 \left( v^T \left( H(x + tv) - H(x) \right) v \right) \\
&\geq f(x) + \frac{1}{4} t^2 \lambda^2 |v|^2
\end{aligned}
\]
whenever $t$ is small enough. Thus in the direction $v$ the function has a local minimum at
$x$. The assertion about the local maximum in some direction follows similarly.

This theorem is an analogue of the second derivative test for higher dimensions. As in
one dimension, when there is a zero eigenvalue, it may be impossible to determine from the
Hessian matrix what the local qualitative behavior of the function is. For example, consider
\[ f_1(x, y) = x^4 + y^2, \quad f_2(x, y) = -x^4 + y^2. \]
Then $Df_i(0, 0) = 0$ and for both functions, the Hessian matrix evaluated at $(0, 0)$ equals
\[ \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix} \]
but the behavior of the two functions is very different near the origin. The second has a
saddle point while the first has a minimum there.
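For a function of two variables the test reduces to inspecting the two eigenvalues of a symmetric $2 \times 2$ matrix. The following minimal sketch classifies a critical point from its Hessian; the function names, the tolerance, and the label "saddle" for one positive and one negative eigenvalue are choices made here for illustration.

```python
import math

def symmetric_eigenvalues_2x2(H):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, c]]."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius

def classify(H, tol=1e-12):
    """Apply the second derivative test to the Hessian H at a critical point."""
    lo, hi = symmetric_eigenvalues_2x2(H)
    if lo > tol:
        return "local minimum"       # all eigenvalues positive
    if hi < -tol:
        return "local maximum"       # all eigenvalues negative
    if lo < -tol and hi > tol:
        return "saddle"              # a minimum in one direction, a maximum in another
    return "inconclusive"            # a zero eigenvalue and no sign change

print(classify([[2, 0], [0, 2]]))    # Hessian of x^2 + y^2 at the origin
print(classify([[2, 0], [0, -2]]))   # Hessian of x^2 - y^2 at the origin
print(classify([[0, 0], [0, 2]]))    # common Hessian of f_1 and f_2 above
```

The last call returns "inconclusive", which matches the remark that the Hessian alone cannot distinguish $f_1$ from $f_2$ at the origin.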
7.8 The Estimation Of Eigenvalues
There are ways to estimate the eigenvalues for matrices. The most famous is known as
Gerschgorin’s theorem. This theorem gives a rough idea where the eigenvalues are just from
looking at the matrix.
Theorem 7.8.1 Let $A$ be an $n \times n$ matrix. Consider the $n$ Gerschgorin discs defined as
\[
D_i \equiv \left\{ \lambda \in \mathbb{C} : |\lambda - a_{ii}| \leq \sum_{j \neq i} |a_{ij}| \right\}.
\]
Then every eigenvalue is contained in some Gerschgorin disc.
This theorem says to add up the absolute values of the entries of the ith row which are
off the main diagonal and form the disc centered at aii having this radius. The union of
these discs contains σ (A) .
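Proof: Suppose $Ax = \lambda x$ where $x \neq 0$. Then for $A = (a_{ij})$ and each $i$,
\[ \sum_{j \neq i} a_{ij} x_j = (\lambda - a_{ii})\, x_i. \]
Therefore, picking $k$ such that $|x_k| \geq |x_j|$ for all $x_j$, it follows that $|x_k| \neq 0$ since $|x| \neq 0$
and, taking $i = k$ in the above,
\[
|x_k| \sum_{j \neq k} |a_{kj}| \geq \sum_{j \neq k} |a_{kj}|\, |x_j| \geq |\lambda - a_{kk}|\, |x_k|.
\]
Now dividing by $|x_k|$, it follows $\lambda$ is contained in the $k$th Gerschgorin disc.

Example 7.8.2 Here is a matrix. Estimate its eigenvalues.
\[
\begin{pmatrix} 2 & 1 & 1 \\ 3 & 5 & 0 \\ 0 & 1 & 9 \end{pmatrix}
\]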
According to Gerschgorin’s theorem the eigenvalues are contained in the disks
D1 = {λ ∈ C : |λ − 2| ≤ 2} , D2 = {λ ∈ C : |λ − 5| ≤ 3} ,
D3 = {λ ∈ C : |λ − 9| ≤ 1}
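It is important to observe that these disks are in the complex plane. In general this is the
case. If you want to find eigenvalues they will be complex numbers.
[Figure: the three Gerschgorin discs in the complex plane, centered at 2, 5, and 9 on the real axis $x$, with vertical axis $iy$.]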
So what are the values of the eigenvalues? In this case they are real. You can compute
them by graphing the characteristic polynomial, $\lambda^3 - 16\lambda^2 + 70\lambda - 66$, and then zooming in on the zeros. If you do this you find the solution is $\{\lambda = 1.2953\}$, $\{\lambda = 5.5905\}$,
$\{\lambda = 9.1142\}$. Of course these are only approximations and so this information is useless
for finding eigenvectors. However, in many applications, it is the size of the eigenvalues
which is important and so these numerical values would be helpful for such applications. In
this case, you might think there is no real reason for Gerschgorin’s theorem. Why not just
compute the characteristic equation and graph and zoom? This is fine up to a point, but
what if the matrix was huge? Then it might be hard to find the characteristic polynomial.
Remember the difficulties in expanding a big matrix along a row or column. Also, what if
the eigenvalues were complex? You don’t see these by following this procedure. However,
Gerschgorin’s theorem will at least estimate them.
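The discs of Example 7.8.2 can be produced mechanically from the rows of the matrix. The sketch below computes each center and radius and then checks that the approximate eigenvalues quoted above are nearly roots of the characteristic polynomial and lie in some disc; the function names are illustrative.

```python
def gerschgorin_discs(A):
    """Return (center, radius) for each row: a_ii and the sum of |a_ij| over j != i."""
    discs = []
    for i, row in enumerate(A):
        radius = sum(abs(entry) for j, entry in enumerate(row) if j != i)
        discs.append((row[i], radius))
    return discs

def char_poly(lam):
    """Characteristic polynomial of the example matrix, as given in the text."""
    return lam ** 3 - 16 * lam ** 2 + 70 * lam - 66

A = [[2, 1, 1],
     [3, 5, 0],
     [0, 1, 9]]
discs = gerschgorin_discs(A)
print(discs)                                   # [(2, 2), (5, 3), (9, 1)]

approx_eigenvalues = [1.2953, 5.5905, 9.1142]  # values quoted in the text
for lam in approx_eigenvalues:
    in_some_disc = any(abs(lam - center) <= radius for center, radius in discs)
    print(lam, in_some_disc, abs(char_poly(lam)) < 0.01)
```

Computing the discs costs only one pass over the entries, which is the point of the remark about huge matrices.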
7.9 Advanced Theorems
More can be said but this requires some theory from complex variables.¹ The following is a
fundamental theorem about counting zeros.
Theorem 7.9.1 Let $U$ be a region and let $\gamma : [a, b] \to U$ be closed, continuous, bounded
variation, and the winding number, $n(\gamma, z) = 0$ for all $z \notin U$. Suppose also that $f$ is
analytic on $U$ having zeros $a_1, \cdots, a_m$ where the zeros are repeated according to multiplicity,
and suppose that none of these zeros are on $\gamma([a, b])$. Then
\[
\frac{1}{2\pi i} \int_\gamma \frac{f'(z)}{f(z)}\, dz = \sum_{k=1}^m n(\gamma, a_k).
\]
Proof: It is given that $f(z) = \prod_{j=1}^m (z - a_j)\, g(z)$ where $g(z) \neq 0$ on $U$. Hence using
the product rule,
\[
\frac{f'(z)}{f(z)} = \sum_{j=1}^m \frac{1}{z - a_j} + \frac{g'(z)}{g(z)}
\]
where $\frac{g'(z)}{g(z)}$ is analytic on $U$ and so
\[
\frac{1}{2\pi i} \int_\gamma \frac{f'(z)}{f(z)}\, dz = \sum_{j=1}^m n(\gamma, a_j) + \frac{1}{2\pi i} \int_\gamma \frac{g'(z)}{g(z)}\, dz = \sum_{j=1}^m n(\gamma, a_j).
\]
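For a polynomial and a circle traversed once counterclockwise, the integral in Theorem 7.9.1 counts the zeros inside the circle, and it can be approximated by a Riemann sum. The sketch below does this for the characteristic polynomial $\lambda^3 - 16\lambda^2 + 70\lambda - 66$ of Example 7.8.2; the function names, the choice of circles, and the number of sample points are assumptions made for illustration.

```python
import cmath
import math

def p(z):
    return z ** 3 - 16 * z ** 2 + 70 * z - 66

def dp(z):
    return 3 * z ** 2 - 32 * z + 70

def count_zeros(center, radius, samples=4000):
    """Approximate (1 / (2 pi i)) * integral of p'(z)/p(z) over the circle |z - center| = radius."""
    total = 0j
    step = 2 * math.pi / samples
    for m in range(samples):
        w = cmath.exp(1j * m * step)
        z = center + radius * w            # point on the circle
        dz = 1j * radius * w * step        # gamma'(theta) d(theta)
        total += dp(z) / p(z) * dz
    return total / (2j * math.pi)

one = count_zeros(5.5, 1.0)    # small circle around the eigenvalue near 5.59
three = count_zeros(5.0, 5.0)  # large circle enclosing all three eigenvalues
print(round(one.real), round(three.real))
```

The sum is only meaningful when no zero lies on the circle, which is the hypothesis that none of the zeros are on $\gamma([a, b])$.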
Now let $A$ be an $n \times n$ matrix. Recall that the eigenvalues of $A$ are given by the zeros
of the polynomial, $p_A(z) = \det(zI - A)$ where $I$ is the $n \times n$ identity. You can argue
that small changes in $A$ will produce small changes in $p_A(z)$ and $p_A'(z)$. Let $\gamma_k$ denote a
very small closed circle which winds around $z_k$, one of the eigenvalues of $A$, in the counter
clockwise direction so that $n(\gamma_k, z_k) = 1$. This circle is to enclose only $z_k$ and is to have no
other eigenvalue on it. Then apply Theorem 7.9.1. According to this theorem
\[
\frac{1}{2\pi i} \int_{\gamma_k} \frac{p_A'(z)}{p_A(z)}\, dz
\]
is always an integer equal to the multiplicity of $z_k$ as a root of $p_A(t)$. Therefore, small
changes in $A$ result in no change to the above contour integral because it must be an integer
and small changes in $A$ result in small changes in the integral. Therefore whenever $B$ is close
enough to $A$, the two matrices have the same number of zeros inside $\gamma_k$, the zeros being
counted according to multiplicity. By making the radius of the small circle equal to $\varepsilon$ where
$\varepsilon$ is less than the minimum distance between any two distinct eigenvalues of $A$, this shows
that if $B$ is close enough to $A$, every eigenvalue of $B$ is closer than $\varepsilon$ to some eigenvalue of
$A$.

¹ If you haven’t studied the theory of a complex variable, you should skip this section because you won’t
understand any of it.
Theorem 7.9.2 If λ is an eigenvalue of A, then if all the entries of B are close enough to
the corresponding entries of A, some eigenvalue of B will be within ε of λ.
Consider the situation that A (t) is an n × n matrix and that t → A (t) is continuous for
t ∈ [0, 1] .
Lemma 7.9.3 Let λ (t) ∈ σ (A (t)) for t < 1 and let Σt = ∪s≥t σ (A (s)) . Also let Kt be the
connected component of λ (t) in Σt . Then there exists η > 0 such that Kt ∩ σ (A (s)) ̸= ∅ for
all s ∈ [t, t + η] .
Proof: Denote by D (λ (t) , δ) the disc centered at λ (t) having radius δ > 0, with other
occurrences of this notation being defined similarly. Thus
D (λ (t) , δ) ≡ {z ∈ C : |λ (t) − z| ≤ δ} .
Suppose δ > 0 is small enough that λ (t) is the only element of σ (A (t)) contained in
D (λ (t) , δ) and that pA(t) has no zeroes on the boundary of this disc. Then by continuity, and
the above discussion and theorem, there exists η > 0, t + η < 1, such that for s ∈ [t, t + η] ,
pA(s) also has no zeroes on the boundary of this disc and A (s) has the same number
of eigenvalues, counted according to multiplicity, in the disc as A (t) . Thus σ (A (s)) ∩
D (λ (t) , δ) ̸= ∅ for all s ∈ [t, t + η] . Now let
∪
H=
σ (A (s)) ∩ D (λ (t) , δ) .
s∈[t,t+η]
It will be shown that H is connected. Suppose not. Then H = P ∪ Q where P, Q are
separated and λ (t) ∈ P. Let s0 ≡ inf {s : λ (s) ∈ Q for some λ (s) ∈ σ (A (s))} . There exists
λ (s0 ) ∈ σ (A (s0 )) ∩ D (λ (t) , δ) . If λ (s0 ) ∉ Q, then from the above discussion there are
λ (s) ∈ σ (A (s)) ∩ Q for s > s0 arbitrarily close to λ (s0 ) . Therefore, λ (s0 ) ∈ Q which shows
that s0 > t because λ (t) is the only element of σ (A (t)) in D (λ (t) , δ) and λ (t) ∈ P. Now
let sn ↑ s0 . Then λ (sn ) ∈ P for any λ (sn ) ∈ σ (A (sn )) ∩ D (λ (t) , δ) and also it follows from
the above discussion that for some choice of sn → s0 , λ (sn ) → λ (s0 ) which contradicts P
and Q separated and nonempty. Since P is nonempty, this shows Q = ∅. Therefore, H is
connected as claimed. But Kt ⊇ H and so Kt ∩ σ (A (s)) ≠ ∅ for all s ∈ [t, t + η] .
Theorem 7.9.4 Suppose A (t) is an n × n matrix and that t → A (t) is continuous for
t ∈ [0, 1] . Let λ (0) ∈ σ (A (0)) and define Σ ≡ ∪t∈[0,1] σ (A (t)) . Let Kλ(0) = K0 denote the
connected component of λ (0) in Σ. Then K0 ∩ σ (A (t)) ̸= ∅ for all t ∈ [0, 1] .
Proof: Let S ≡ {t ∈ [0, 1] : K0 ∩ σ (A (s)) ̸= ∅ for all s ∈ [0, t]} . Then 0 ∈ S. Let t0 =
sup (S) . Say σ (A (t0 )) = λ1 (t0 ) , · · · , λr (t0 ) .
Claim: At least one of these is a limit point of K0 and consequently must be in K0
which shows that S has a last point. Why is this claim true? Let sn ↑ t0 so sn ∈ S.
Now let the discs, D (λi (t0 ) , δ) , i = 1, · · · , r be disjoint with pA(t0 ) having no zeroes on γ i
the boundary of D (λi (t0 ) , δ) . Then for n large enough it follows from Theorem 7.9.1 and
the discussion following it that σ (A (sn )) is contained in $\bigcup_{i=1}^{r} D(\lambda_i(t_0), \delta)$. It follows that
K0 ∩ (σ (A (t0 )) + D (0, δ)) ̸= ∅ for all δ small enough. This requires at least one of the
λi (t0 ) to be in K0 . Therefore, t0 ∈ S and S has a last point.
Now by Lemma 7.9.3, applied with t = t0 , if t0 < 1, then $K_0 \cup K_{t_0}$ would be a strictly larger connected set
containing λ (0) . (The reason this would be strictly larger is that K0 ∩ σ (A (s)) = ∅ for
some s ∈ (t0 , t0 + η) while $K_{t_0}$ ∩ σ (A (s)) ≠ ∅ for all s ∈ [t0 , t0 + η].) Therefore, t0 = 1.
Corollary 7.9.5 Suppose one of the Gerschgorin discs, Di is disjoint from the union of
the others. Then Di contains an eigenvalue of A. Also, if there are n disjoint Gerschgorin
discs, then each one contains an eigenvalue of A.
7.9. ADVANCED THEOREMS
Proof: Denote by A (t) the matrix $(a^t_{ij})$ where if i ≠ j, $a^t_{ij} = t a_{ij}$ and $a^t_{ii} = a_{ii}$. Thus to
get A (t) multiply all non diagonal terms by t. Let t ∈ [0, 1] . Then A (0) = diag (a11 , · · · , ann )
and A (1) = A. Furthermore, the map, t → A (t) is continuous. Denote by $D^t_j$ the Gerschgorin
disc obtained from the j th row for the matrix A (t). Then it is clear that $D^t_j \subseteq D_j$,
the j th Gerschgorin disc for A. It follows aii is the eigenvalue for A (0) which is contained
in the disc, consisting of the single point aii which is contained in Di . Letting K be the
connected component in Σ for Σ defined in Theorem 7.9.4 which is determined by aii ,
Gerschgorin’s theorem implies that
$$K \cap \sigma(A(t)) \subseteq \bigcup_{j=1}^{n} D^t_j \subseteq \bigcup_{j=1}^{n} D_j = D_i \cup \Big( \bigcup_{j \neq i} D_j \Big)$$
and also, since K is connected, there are not points of K in both Di and $\bigcup_{j \neq i} D_j$. Since at least
one point of K is in Di , (aii ), it follows all of K must be contained in Di . Now by Theorem
7.9.4 this shows there are points of K ∩ σ (A) in Di . The last assertion follows immediately.
This can be improved even more. This involves the following lemma.
Lemma 7.9.6 In the situation of Theorem 7.9.4 suppose λ (0) = K0 ∩ σ (A (0)) and that
λ (0) is a simple root of the characteristic equation of A (0). Then for all t ∈ [0, 1] ,
σ (A (t)) ∩ K0 = λ (t)
where λ (t) is a simple root of the characteristic equation of A (t) .
Proof: Let S ≡ {t ∈ [0, 1] : K0 ∩ σ (A (s)) = λ (s) , a simple eigenvalue for all s ∈ [0, t]} .
Then 0 ∈ S so it is nonempty. Let t0 = sup (S) and suppose λ1 ̸= λ2 are two elements of
σ (A (t0 ))∩K0 . Then choosing η > 0 small enough, and letting Di be disjoint discs containing
λi respectively, similar arguments to those of Lemma 7.9.3 can be used to conclude
$$H_i \equiv \bigcup_{s \in [t_0 - \eta, t_0]} \sigma(A(s)) \cap D_i$$
is a connected and nonempty set for i = 1, 2 which would require that Hi ⊆ K0 . But
then there would be two different eigenvalues of A (s) contained in K0 , contrary to the
definition of t0 . Therefore, there is at most one eigenvalue λ (t0 ) ∈ K0 ∩ σ (A (t0 )) . Could
it be a repeated root of the characteristic equation? Suppose λ (t0 ) is a repeated root of
the characteristic equation. As before, choose a small disc, D centered at λ (t0 ) and η small
enough that
$$H \equiv \bigcup_{s \in [t_0 - \eta, t_0]} \sigma(A(s)) \cap D$$
is a nonempty connected set containing either multiple eigenvalues of A (s) or else a single
repeated root to the characteristic equation of A (s) . But since H is connected and contains
λ (t0 ) it must be contained in K0 which contradicts the condition for s ∈ S for all these
s ∈ [t0 − η, t0 ] . Therefore, t0 ∈ S as hoped. If t0 < 1, there exists a small disc centered
at λ (t0 ) and η > 0 such that for all s ∈ [t0 , t0 + η] , A (s) has only simple eigenvalues in
D and the only eigenvalues of A (s) which could be in K0 are in D. (This last assertion
follows from noting that λ (t0 ) is the only eigenvalue of A (t0 ) in K0 and so the others are
at a positive distance from K0 . For s close enough to t0 , the eigenvalues of A (s) are either
close to these eigenvalues of A (t0 ) at a positive distance from K0 or they are close to the
eigenvalue λ (t0 ) in which case it can be assumed they are in D.) But this shows that t0 is
not really an upper bound to S. Therefore, t0 = 1 and the lemma is proved. With this lemma, the conclusion of the above corollary can be sharpened.
Corollary 7.9.7 Suppose one of the Gerschgorin discs, Di is disjoint from the union of
the others. Then Di contains exactly one eigenvalue of A and this eigenvalue is a simple
root to the characteristic polynomial of A.
Proof: In the proof of Corollary 7.9.5, note that aii is a simple root of A (0) since
otherwise the ith Gerschgorin disc would not be disjoint from the others. Also, K, the
connected component determined by aii must be contained in Di because it is connected
and by Gerschgorin’s theorem above, K ∩ σ (A (t)) must be contained in the union of the
Gerschgorin discs. Since all the other eigenvalues of A (0) , the ajj , are outside Di , it follows
that K ∩ σ (A (0)) = aii . Therefore, by Lemma 7.9.6, K ∩ σ (A (1)) = K ∩ σ (A) consists of
a single simple eigenvalue.
Example 7.9.8 Consider the matrix
$$\begin{pmatrix} 5 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$
The Gerschgorin discs are D (5, 1) , D (1, 2) , and D (0, 1) . Observe D (5, 1) is disjoint
from the other discs. Therefore, there should be an eigenvalue in D (5, 1) . The actual
eigenvalues are not easy to find. They are the roots of the characteristic equation,
$t^3 - 6t^2 + 3t + 5 = 0$. The numerical values of these are −0.66966, 1.4231, and 5.24655, verifying the
predictions of Gerschgorin’s theorem.
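The example above can be checked numerically. The following is only a sketch in plain Python: the helper names `gerschgorin_discs`, `char_poly`, and `bisect_root` are made up for this illustration, and bisection on the interval [4, 6] is simply one convenient way to locate the root lying in D (5, 1); it is not the method used in the text.

```python
def gerschgorin_discs(matrix):
    """Return (center, radius) for each row: center a_ii, radius sum of |a_ij| for j != i."""
    discs = []
    for i, row in enumerate(matrix):
        radius = sum(abs(x) for j, x in enumerate(row) if j != i)
        discs.append((row[i], radius))
    return discs


def char_poly(t):
    """Characteristic polynomial of the matrix in Example 7.9.8."""
    return t**3 - 6 * t**2 + 3 * t + 5


def bisect_root(f, lo, hi, steps=60):
    """Bisection, assuming f(lo) and f(hi) have opposite signs."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


A = [[5, 1, 0], [1, 1, 1], [0, 1, 0]]
discs = gerschgorin_discs(A)           # discs for rows 1, 2, 3
root = bisect_root(char_poly, 4.0, 6.0)  # the eigenvalue near 5
print(discs, root)
```

The root found this way lies within distance 1 of 5, so it is inside the isolated disc D (5, 1), as Corollary 7.9.7 predicts.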
7.10 Exercises
1. Explain why it is typically impossible to compute the upper triangular matrix whose
existence is guaranteed by Schur’s theorem.
2. Now recall the QR factorization of Theorem 5.7.5 on Page 141. The QR algorithm
is a technique which does compute the upper triangular matrix in Schur’s theorem.
There is much more to the QR algorithm than will be presented here. In fact, what
I am about to show you is not the way it is done in practice. One first obtains what
is called a Hessenberg matrix for which the algorithm will work better. However,
the idea is as follows. Start with A an n × n matrix having real eigenvalues. Form
A = QR where Q is orthogonal and R is upper triangular. (Right triangular.) This
can be done using the technique of Theorem 5.7.5 using Householder matrices. Next
take A1 ≡ RQ. Show that $A = Q A_1 Q^T$. In other words these two matrices, A, A1 are
similar. Explain why they have the same eigenvalues. Continue by letting A1 play the
role of A. Thus the algorithm is of the form $A_n = QR$ and $A_{n+1} = RQ$, where Q and R
are computed afresh from $A_n$ at each step. Explain
why $A = Q_n A_n Q_n^T$ for some $Q_n$ orthogonal. Thus An is a sequence of matrices each
similar to A. The remarkable thing is that often these matrices converge to an upper
triangular matrix T and $A = Q T Q^T$ for some orthogonal matrix, the limit of the Qn
where the limit means the entries converge. Then the process computes the upper
triangular Schur form of the matrix A. Thus the eigenvalues of A appear on the
diagonal of T. You will see approximately what these are as the process continues.
3. ↑Try the QR algorithm on
$$\begin{pmatrix} -1 & -2 \\ 6 & 6 \end{pmatrix}$$
which has eigenvalues 3 and 2. I suggest you use a computer algebra system to do the
computations.
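If no computer algebra system is at hand, the unshifted iteration described in Problem 2 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the procedure the text prescribes: the QR factorization here uses Gram-Schmidt rather than the Householder matrices of Theorem 5.7.5, it assumes the columns stay linearly independent, and the function names are invented for this sketch.

```python
import math


def qr_gram_schmidt(a):
    """QR factorization of a square matrix (list of rows) by modified Gram-Schmidt.

    Assumes the columns of a are linearly independent.
    """
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        for k, q in enumerate(q_cols):
            r[k][j] = sum(q[i] * w[i] for i in range(n))
            w = [w[i] - r[k][j] * q[i] for i in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in w))
        q_cols.append([x / r[j][j] for x in w])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def qr_iterate(a, steps):
    """Repeat: factor A_n = QR, then set A_{n+1} = RQ."""
    for _ in range(steps):
        q, r = qr_gram_schmidt(a)
        a = matmul(r, q)
    return a


a_final = qr_iterate([[-1.0, -2.0], [6.0, 6.0]], 60)
for row in a_final:
    print(row)
```

For this matrix the entry below the diagonal shrinks at each step, and the diagonal entries approach the eigenvalues 3 and 2.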
4. ↑Now try the QR algorithm on
$$\begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix}$$
Show that the algorithm cannot converge for this example. Hint: Try a few iterations
of the algorithm. Use a computer algebra system if you like.
5. ↑Show the two matrices $A \equiv \begin{pmatrix} 0 & -1 \\ 4 & 0 \end{pmatrix}$ and $B \equiv \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}$ are similar; that is
there exists a matrix S such that $A = S^{-1} B S$ but there is no orthogonal matrix Q such
that $Q^T B Q = A$. Show the QR algorithm does converge for the matrix B although it
fails to do so for A.
6. Let F be an m × n matrix. Show that F ∗ F has all real eigenvalues and furthermore,
they are all nonnegative.
7. If A is a real n × n matrix and λ is a complex eigenvalue λ = a + ib, b ̸= 0, of A having
eigenvector z + iw, show that w ̸= 0.
8. Suppose $A = Q^T D Q$ where Q is an orthogonal matrix and all the matrices are real.
Also D is a diagonal matrix. Show that A must be symmetric.
9. Suppose A is an n × n matrix and there exists a unitary matrix U such that
$$A = U^* D U$$
where D is a diagonal matrix. Explain why A must be normal.
10. If A is Hermitian, show that det (A) must be real.
11. Show that every unitary matrix preserves distance. That is, if U is unitary,
|U x| = |x| .
12. Show that if a matrix does preserve distances, then it must be unitary.
13. ↑Show that a complex normal matrix A is unitary if and only if its eigenvalues have
magnitude equal to 1.
14. Suppose A is an n × n matrix which is diagonally dominant. Recall this means
$$\sum_{j \neq i} |a_{ij}| < |a_{ii}|$$
for each i. Show $A^{-1}$ must exist.
15. Give some disks in the complex plane whose union contains all the eigenvalues of the
matrix
$$\begin{pmatrix} 1 + 2i & 4 & 2 \\ 0 & i & 3 \\ 5 & 6 & 7 \end{pmatrix}$$
16. Show a square matrix is invertible if and only if it has no zero eigenvalues.
17. Using Schur’s theorem, show the trace of an n × n matrix equals the sum of the
eigenvalues and the determinant of an n × n matrix is the product of the eigenvalues.
18. Using Schur’s theorem, show that if A is any complex n × n matrix having eigenvalues
{λi } listed according to multiplicity, then $\sum_{i,j} |A_{ij}|^2 \geq \sum_{i=1}^{n} |\lambda_i|^2$. Show that equality
holds if and only if A is normal. This inequality is called Schur’s inequality. [20]
19. Here is a matrix.
$$\begin{pmatrix} 1234 & 6 & 5 & 3 \\ 0 & -654 & 9 & 123 \\ 98 & 123 & 10{,}000 & 11 \\ 56 & 78 & 98 & 400 \end{pmatrix}$$
I know this matrix has an inverse before doing any computations. How do I know?
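The condition recalled in Problem 14 is easy to test mechanically. The sketch below, in plain Python, checks it row by row for the matrix of Problem 19; the function name `is_strictly_diagonally_dominant` is hypothetical, and the check only tests the hypothesis of Problem 14, leaving the argument for why that hypothesis gives an inverse to the reader.

```python
def is_strictly_diagonally_dominant(matrix):
    """True when |a_ii| exceeds the sum of |a_ij| over j != i in every row."""
    return all(
        abs(row[i]) > sum(abs(x) for j, x in enumerate(row) if j != i)
        for i, row in enumerate(matrix)
    )


M = [
    [1234, 6, 5, 3],
    [0, -654, 9, 123],
    [98, 123, 10000, 11],
    [56, 78, 98, 400],
]
dominant = is_strictly_diagonally_dominant(M)
not_dominant = is_strictly_diagonally_dominant([[1, 2], [3, 4]])
print(dominant, not_dominant)
```

In the language of Gerschgorin discs, strict dominance in every row says that 0 lies outside each disc.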
20. Show the critical points of the following function are
$$(0, -3, 0), \quad (2, -3, 0), \quad \text{and} \quad \left(1, -3, -\tfrac{1}{3}\right)$$
and classify them as local minima, local maxima or saddle points.
$$f(x, y, z) = -\tfrac{3}{2} x^4 + 6x^3 - 6x^2 + zx^2 - 2zx - 2y^2 - 12y - 18 - \tfrac{3}{2} z^2.$$
21. Here is a function of three variables.
$$f(x, y, z) = 13x^2 + 2xy + 8xz + 13y^2 + 8yz + 10z^2$$
Change the variables so that in the new variables there are no mixed terms, terms
involving xy, yz etc. Two eigenvalues are 12 and 18.
22. Here is a function of three variables.
$$f(x, y, z) = 2x^2 - 4x + 2 + 9yx - 9y - 3zx + 3z + 5y^2 - 9zy - 7z^2$$
Change the variables so that in the new variables there are no mixed terms, terms
involving xy, yz etc. The eigenvalues of the matrix which you will work with are
$-\tfrac{17}{2}, \tfrac{19}{2}, -1$.
23. Here is a function of three variables.
$$f(x, y, z) = -x^2 + 2xy + 2xz - y^2 + 2yz - z^2 + x$$
Change the variables so that in the new variables there are no mixed terms, terms
involving xy, yz etc.
24. Show the critical points of the function,
$$f(x, y, z) = -2yx^2 - 6yx - 4zx^2 - 12zx + y^2 + 2yz$$
are points of the form,
$$(x, y, z) = \left(t, 2t^2 + 6t, -t^2 - 3t\right)$$
for t ∈ R and classify them as local minima, local maxima or saddle points.
25. Show the critical points of the function
$$f(x, y, z) = \tfrac{1}{2} x^4 - 4x^3 + 8x^2 - 3zx^2 + 12zx + 2y^2 + 4y + 2 + \tfrac{1}{2} z^2$$
are (0, −1, 0) , (4, −1, 0) , and (2, −1, −12) and classify them as local minima, local
maxima or saddle points.
26. Let $f(x, y) = 3x^4 - 24x^2 + 48 - yx^2 + 4y$. Find and classify the critical points using
the second derivative test.
27. Let $f(x, y) = 3x^4 - 5x^2 + 2 - y^2 x^2 + y^2$. Find and classify the critical points using the
second derivative test.
28. Let $f(x, y) = 5x^4 - 7x^2 - 2 - 3y^2 x^2 + 11y^2 - 4y^4$. Find and classify the critical points
using the second derivative test.
29. Let $f(x, y, z) = -2x^4 - 3yx^2 + 3x^2 + 5x^2 z + 3y^2 - 6y + 3 - 3zy + 3z + z^2$. Find and
classify the critical points using the second derivative test.
30. Let $f(x, y, z) = 3yx^2 - 3x^2 - x^2 z - y^2 + 2y - 1 + 3zy - 3z - 3z^2$. Find and classify
the critical points using the second derivative test.
31. Let Q be orthogonal. Find the possible values of det (Q) .
32. Let U be unitary. Find the possible values of det (U ) .
33. If a matrix is nonzero can it have only zero for eigenvalues?
34. A matrix A is called nilpotent if $A^k = 0$ for some positive integer k. Suppose A is a
nilpotent matrix. Show it has only 0 for an eigenvalue.
35. If A is a nonzero nilpotent matrix, show it must be defective.
36. Suppose A is a nondefective n × n matrix and its eigenvalues are all either 0 or 1.
Show $A^2 = A$. Could you say anything interesting if the eigenvalues were all either
0, 1, or −1? By DeMoivre’s theorem, an nth root of unity is of the form
$$\cos\left(\frac{2k\pi}{n}\right) + i \sin\left(\frac{2k\pi}{n}\right)$$
Could you generalize the sort of thing just described to get $A^n = A$? Hint: Since A
is nondefective, there exists S such that $S^{-1} A S = D$ where D is a diagonal matrix.
37. This and the following problems will present most of a differential equations course.
Most of the explanations are given. You fill in any details needed. To begin with,
consider the scalar initial value problem
$$y' = ay, \quad y(t_0) = y_0$$
When a is real, show the unique solution to this problem is $y = y_0 e^{a(t - t_0)}$. Next
suppose
$$y' = (a + ib)\, y, \quad y(t_0) = y_0 \tag{7.20}$$
where y (t) = u (t) + iv (t) . Show there exists a unique solution and it is given by
$$y(t) = y_0 e^{a(t - t_0)} \left( \cos b(t - t_0) + i \sin b(t - t_0) \right) \equiv e^{(a + ib)(t - t_0)} y_0. \tag{7.21}$$
Next show that for a real or complex there exists a unique solution to the initial value
problem
$$y' = ay + f, \quad y(t_0) = y_0$$
and it is given by
$$y(t) = e^{a(t - t_0)} y_0 + e^{at} \int_{t_0}^{t} e^{-as} f(s)\, ds.$$
Hint: For the first part write as y′ − ay = 0 and multiply both sides by $e^{-at}$. Then
explain why you get
$$\frac{d}{dt}\left( e^{-at} y(t) \right) = 0, \quad y(t_0) = 0.$$
Now you finish the argument. To show uniqueness in the second part, suppose
$$y' = (a + ib)\, y, \quad y(t_0) = 0$$
and verify this requires y (t) = 0. To do this, note
$$\bar{y}' = (a - ib)\, \bar{y}, \quad \bar{y}(t_0) = 0$$
and that $|y|^2(t_0) = 0$ and
$$\frac{d}{dt} |y(t)|^2 = y'(t)\, \bar{y}(t) + \bar{y}'(t)\, y(t) = (a + ib)\, y(t)\, \bar{y}(t) + (a - ib)\, \bar{y}(t)\, y(t) = 2a\, |y(t)|^2.$$
Thus from the first part $|y(t)|^2 = 0 \cdot e^{2a(t - t_0)} = 0$. Finally observe by a simple computation
that 7.20 is solved by 7.21. For the last part, write the equation as
$$y' - ay = f$$
and multiply both sides by $e^{-at}$ and then integrate from t0 to t using the initial
condition.
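The formula for the solution of y′ = ay + f can be sanity-checked numerically before proving it. The sketch below is an assumption-laden illustration in plain Python: the choices a = −0.5, f = sin, t0 = 1, y0 = 2 are arbitrary, Simpson's rule stands in for the integral, and a classical Runge-Kutta integrator serves as the independent comparison. None of these names or choices come from the text.

```python
import math


def closed_form(a, f, t0, y0, t, n=2000):
    """e^{a(t-t0)} y0 + e^{at} * integral from t0 to t of e^{-as} f(s) ds (Simpson, n even)."""
    h = (t - t0) / n

    def g(s):
        return math.exp(-a * s) * f(s)

    total = g(t0) + g(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(t0 + k * h)
    integral = total * h / 3
    return math.exp(a * (t - t0)) * y0 + math.exp(a * t) * integral


def rk4(a, f, t0, y0, t, n=2000):
    """Integrate y' = a*y + f(s) from t0 to t with the classical Runge-Kutta method."""
    h = (t - t0) / n
    y, s = y0, t0

    def rhs(s, y):
        return a * y + f(s)

    for _ in range(n):
        k1 = rhs(s, y)
        k2 = rhs(s + h / 2, y + h * k1 / 2)
        k3 = rhs(s + h / 2, y + h * k2 / 2)
        k4 = rhs(s + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return y


formula_value = closed_form(-0.5, math.sin, 1.0, 2.0, 3.0)
numeric_value = rk4(-0.5, math.sin, 1.0, 2.0, 3.0)
print(formula_value, numeric_value)
```

Agreement of the two numbers is of course not a proof; it only confirms that the formula has been read correctly.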
38. Now consider A an n × n matrix. By Schur’s theorem there exists unitary Q such that
$$Q^{-1} A Q = T$$
where T is upper triangular. Now consider the first order initial value problem
$$\mathbf{x}' = A\mathbf{x}, \quad \mathbf{x}(t_0) = \mathbf{x}_0.$$
Show there exists a unique solution to this first order system. Hint: Let $\mathbf{y} = Q^{-1} \mathbf{x}$
and so the system becomes
$$\mathbf{y}' = T\mathbf{y}, \quad \mathbf{y}(t_0) = Q^{-1} \mathbf{x}_0 \tag{7.22}$$
Now letting $\mathbf{y} = (y_1, \cdots, y_n)^T$, the bottom equation becomes
$$y_n' = t_{nn}\, y_n, \quad y_n(t_0) = \left( Q^{-1} \mathbf{x}_0 \right)_n.$$
Then use the solution you get in this to get the solution to the initial value problem
which occurs one level up, namely
$$y_{n-1}' = t_{(n-1)(n-1)}\, y_{n-1} + t_{(n-1)n}\, y_n, \quad y_{n-1}(t_0) = \left( Q^{-1} \mathbf{x}_0 \right)_{n-1}$$
Continue doing this to obtain a unique solution to 7.22.
39. Now suppose Φ (t) is an n × n matrix of the form
$$\Phi(t) = \begin{pmatrix} \mathbf{x}_1(t) & \cdots & \mathbf{x}_n(t) \end{pmatrix} \tag{7.23}$$
where
$$\mathbf{x}_k'(t) = A \mathbf{x}_k(t).$$
Explain why
$$\Phi'(t) = A \Phi(t)$$
if and only if Φ (t) is given in the form of 7.23. Also explain why if c ∈ Fn , y (t) ≡ Φ (t) c
solves the equation y′ (t) = Ay (t) .
40. In the above problem, consider the question whether all solutions to
$$\mathbf{x}' = A\mathbf{x} \tag{7.24}$$
are obtained in the form Φ (t) c for some choice of c ∈ Fn . In other words, is the
general solution to this equation Φ (t) c for c ∈ Fn ? Prove the following theorem using
linear algebra.
Theorem 7.10.1 Suppose Φ (t) is an n × n matrix which satisfies Φ′ (t) = AΦ (t) .
Then the general solution to 7.24 is Φ (t) c if and only if $\Phi(t)^{-1}$ exists for some t.
Furthermore, if Φ′ (t) = AΦ (t) , then either $\Phi(t)^{-1}$ exists for all t or $\Phi(t)^{-1}$ never
exists for any t.
(det (Φ (t)) is called the Wronskian and this theorem is sometimes called the Wronskian
alternative.)
Hint: Suppose first the general solution is of the form Φ (t) c where c is an arbitrary
constant vector in Fn . You need to verify $\Phi(t)^{-1}$ exists for some t. In fact, show
$\Phi(t)^{-1}$ exists for every t. Suppose then that $\Phi(t_0)^{-1}$ does not exist. Explain why
there exists c ∈ Fn such that there is no solution x to the equation c = Φ (t0 ) x. By
the existence part of Problem 38 there exists a solution to
$$\mathbf{x}' = A\mathbf{x}, \quad \mathbf{x}(t_0) = \mathbf{c}$$
but this cannot be in the form Φ (t) c. Thus for every t, $\Phi(t)^{-1}$ exists. Next suppose
for some t0 , $\Phi(t_0)^{-1}$ exists. Let z′ = Az and choose c such that
$$\mathbf{z}(t_0) = \Phi(t_0)\, \mathbf{c}$$
Then both z (t) , Φ (t) c solve
$$\mathbf{x}' = A\mathbf{x}, \quad \mathbf{x}(t_0) = \mathbf{z}(t_0)$$
Apply uniqueness to conclude z = Φ (t) c. Finally, consider that Φ (t) c for c ∈ Fn
either is the general solution or it is not the general solution. If it is, then $\Phi(t)^{-1}$
exists for all t. If it is not, then $\Phi(t)^{-1}$ cannot exist for any t from what was just
shown.
41. Let Φ′ (t) = AΦ (t) . Then Φ (t) is called a fundamental matrix if $\Phi(t)^{-1}$ exists for all
t. Show there exists a unique solution to the equation
$$\mathbf{x}' = A\mathbf{x} + \mathbf{f}, \quad \mathbf{x}(t_0) = \mathbf{x}_0 \tag{7.25}$$
and it is given by the formula
$$\mathbf{x}(t) = \Phi(t)\, \Phi(t_0)^{-1} \mathbf{x}_0 + \Phi(t) \int_{t_0}^{t} \Phi(s)^{-1} \mathbf{f}(s)\, ds$$
Now these few problems have done virtually everything of significance in an entire undergraduate
differential equations course, illustrating the superiority of linear algebra.
The above formula is called the variation of constants formula.
Hint: Uniqueness is easy. If x1 , x2 are two solutions then let u (t) = x1 (t) − x2 (t) and
argue u′ = Au, u (t0 ) = 0. Then use Problem 38. To verify there exists a solution, you
could just differentiate the above formula using the fundamental theorem of calculus
and verify it works. Another way is to assume the solution in the form
$$\mathbf{x}(t) = \Phi(t)\, \mathbf{c}(t)$$
and find c (t) to make it all work out. This is called the method of variation of
parameters.
42. Show there exists a special Φ such that Φ′ (t) = AΦ (t) , Φ (0) = I, and suppose
$\Phi(t)^{-1}$ exists for all t. Show using uniqueness that
$$\Phi(-t) = \Phi(t)^{-1}$$
and that for all t, s ∈ R
$$\Phi(t + s) = \Phi(t)\, \Phi(s)$$
Explain why with this special Φ, the solution to 7.25 can be written as
$$\mathbf{x}(t) = \Phi(t - t_0)\, \mathbf{x}_0 + \int_{t_0}^{t} \Phi(t - s)\, \mathbf{f}(s)\, ds.$$
Hint: Let Φ (t) be such that the j th column is xj (t) where
$$\mathbf{x}_j' = A \mathbf{x}_j, \quad \mathbf{x}_j(0) = \mathbf{e}_j.$$
Use uniqueness as required.
43. You can see more on this problem and the next one in the latest version of Horn
and Johnson, [17]. Two n × n matrices A, B are said to be congruent if there is an
invertible P such that
$$B = P A P^*$$
Let A be a Hermitian matrix. Thus it has all real eigenvalues. Let n+ be the number
of positive eigenvalues, n− , the number of negative eigenvalues and n0 the number of
zero eigenvalues. For k a positive integer, let Ik denote the k × k identity matrix and
Ok the k × k zero matrix. Then the inertia matrix of A is the following block diagonal
n × n matrix.
$$\begin{pmatrix} I_{n_+} & & \\ & -I_{n_-} & \\ & & O_{n_0} \end{pmatrix}$$
Show that A is congruent to its inertia matrix. Next show that congruence is an equivalence
relation on the set of Hermitian matrices. Finally, show that if two Hermitian
matrices have the same inertia matrix, then they must be congruent. Hint: First
recall that there is a unitary matrix, U such that
$$U^* A U = \begin{pmatrix} D_{n_+} & & \\ & D_{n_-} & \\ & & O_{n_0} \end{pmatrix}$$
where $D_{n_+}$ is a diagonal matrix having the positive eigenvalues of A, $D_{n_-}$ being
defined similarly. Now let $|D_{n_-}|$ denote the diagonal matrix which replaces each entry
of $D_{n_-}$ with its absolute value. Consider the diagonal matrix
$$D = D^* = \begin{pmatrix} D_{n_+}^{-1/2} & & \\ & |D_{n_-}|^{-1/2} & \\ & & I_{n_0} \end{pmatrix}$$
Now consider $D^* U^* A U D$.
44. Show that if A, B are two congruent Hermitian matrices, then they have the same
inertia matrix. Hint: Let $A = S B S^*$ where S is invertible. Show that A, B have the
same rank and this implies that they are each unitarily similar to a diagonal matrix
which has the same number of zero entries on the main diagonal. Therefore, letting
VA be the span of the eigenvectors associated with positive eigenvalues of A and VB
being defined similarly, it suffices to show that these have the same dimensions. Show
that (Ax, x) > 0 for all nonzero x ∈ VA . Next consider $S^* V_A$. For nonzero x ∈ VA , explain why
$$\left( B S^* x, S^* x \right) = \left( S^{-1} A (S^*)^{-1} S^* x, S^* x \right) = \left( S^{-1} A x, S^* x \right) = \left( A x, (S^{-1})^* S^* x \right) = (Ax, x) > 0$$
Next explain why this shows that $S^* V_A$ is a subspace of VB and so the dimension of VB
is at least as large as the dimension of VA . Hence there are at least as many positive
eigenvalues for B as there are for A. Switching A, B you can turn the inequality
around. Thus the two have the same inertia matrix.
45. Let A be an m × n matrix. Then if you unraveled it, you could consider it as a vector
in $\mathbb{C}^{nm}$. The Frobenius inner product on the vector space of m × n matrices is defined
as
$$(A, B) \equiv \operatorname{trace}(A B^*)$$
Show that this really does satisfy the axioms of an inner product space and that it
also amounts to nothing more than considering m × n matrices as vectors in $\mathbb{C}^{nm}$.
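The second claim of Problem 45 can be seen concretely on a small example. The following plain-Python sketch computes trace (AB∗) and, separately, the ordinary inner product of the unraveled matrices, taken here as the sum of $a_{ij} \overline{b_{ij}}$ (conjugate on the second factor, which is an assumption about the convention). The function names and the two sample matrices are invented for the illustration.

```python
def conj_transpose(m):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[m[i][j].conjugate() for i in range(len(m))] for j in range(len(m[0]))]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def frobenius_via_trace(a, b):
    """(A, B) = trace(A B*), where A B* is m x m."""
    p = matmul(a, conj_transpose(b))
    return sum(p[i][i] for i in range(len(p)))


def frobenius_via_entries(a, b):
    """Inner product of the unraveled matrices: sum of a_ij * conj(b_ij)."""
    return sum(x * y.conjugate() for ra, rb in zip(a, b) for x, y in zip(ra, rb))


A = [[1 + 2j, 0, 3], [4j, 5, -1j]]
B = [[2, 1j, 1], [1 - 1j, 0, 6]]
via_trace = frobenius_via_trace(A, B)
via_entries = frobenius_via_entries(A, B)
norm_squared = frobenius_via_trace(A, A)
print(via_trace, via_entries, norm_squared)
```

Both computations give the same number, and (A, A) is the sum of the squared moduli of the entries of A.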
46. ↑Consider the n × n unitary matrices. Show that whenever U is such a matrix, it
follows that
$$|U|_{\mathbb{C}^{nn}} = \sqrt{n}$$
Next explain why if {Uk } is any sequence of unitary matrices, there exists a subsequence
$\{U_{k_m}\}_{m=1}^{\infty}$ such that $\lim_{m \to \infty} U_{k_m} = U$ where U is unitary. Here the limit
takes place in the sense that the entries of $U_{k_m}$ converge to the corresponding entries
of U .
47. ↑Let A, B be two n × n matrices. Denote by σ (A) the set of eigenvalues of A. Define
$$\operatorname{dist}(\sigma(A), \sigma(B)) = \max_{\lambda \in \sigma(A)} \min \{ |\lambda - \mu| : \mu \in \sigma(B) \}$$
Explain why dist (σ (A) , σ (B)) is small if and only if every eigenvalue of A is close
to some eigenvalue of B. Now prove the following theorem using the above problem
and Schur’s theorem. This theorem says roughly that if A is close to B then the
eigenvalues of A are close to those of B in the sense that every eigenvalue of A is close
to an eigenvalue of B.
Theorem 7.10.2 Suppose $\lim_{k \to \infty} A_k = A$. Then
$$\lim_{k \to \infty} \operatorname{dist}(\sigma(A_k), \sigma(A)) = 0$$
48. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ be a 2 × 2 matrix which is not a multiple of the identity. Show
that A is similar to a 2 × 2 matrix which has at least one diagonal entry equal to 0.
Hint: First note that there exists a vector a such that Aa is not a multiple of a. Then
consider
$$B = \begin{pmatrix} \mathbf{a} & A\mathbf{a} \end{pmatrix}^{-1} A \begin{pmatrix} \mathbf{a} & A\mathbf{a} \end{pmatrix}$$
Show B has a zero on the main diagonal.
49. ↑ Let A be a complex n × n matrix which has trace equal to 0. Show that A is similar
to a matrix which has all zeros on the main diagonal. Hint: Use Problem 30 on
Page 130 to argue that you can say that a given matrix is similar to one which has
the diagonal entries permuted in any order desired. Then use the above problem and
block multiplication to show that if A has k nonzero diagonal entries, then it is similar to
a matrix which has k − 1 nonzero diagonal entries. Finally, when A is similar to one which has
at most one nonzero diagonal entry, this one must also be zero because of the condition on the
trace.
50. ↑An n × n matrix X is a commutator if there are n × n matrices A, B such that X =
AB − BA. Show that the trace of any commutator is 0. Next show that if a complex
matrix X has trace equal to 0, then it is in fact a commutator. Hint: Use the above
problem to show that it suffices to consider X having all zero entries on the main
diagonal. Then define
$$A = \begin{pmatrix} 1 & & & 0 \\ & 2 & & \\ & & \ddots & \\ 0 & & & n \end{pmatrix}, \quad B_{ij} = \begin{cases} \dfrac{X_{ij}}{i - j} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases}$$
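The choice of A and B in the hint to Problem 50 can be tried on a concrete matrix. Here is a small sketch in plain Python with exact rational arithmetic; the function names and the sample matrix X (zero diagonal, otherwise arbitrary) are made up for the illustration.

```python
from fractions import Fraction


def commutator_factors(x):
    """Given X with zero diagonal, build A = diag(1, ..., n) and B_ij = X_ij / (i - j)."""
    n = len(x)
    a = [[Fraction(i + 1) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    b = [[Fraction(x[i][j], i - j) if i != j else Fraction(0) for j in range(n)]
         for i in range(n)]
    return a, b


def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


X = [[0, 2, -1], [3, 0, 5], [7, -4, 0]]
A, B = commutator_factors(X)
C = commutator(A, B)
print(C == X)
```

The (i, j) entry of AB − BA is (i − j) B_ij, which is why dividing by i − j recovers X.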
Chapter 8
Vector Spaces And Fields
8.1 Vector Space Axioms
It is time to consider the idea of a Vector space.
Definition 8.1.1 A vector space is an Abelian group of “vectors” satisfying the axioms of
an Abelian group,
v + w = w + v,
the commutative law of addition,
(v + w) + z = v+ (w + z) ,
the associative law for addition,
v + 0 = v,
the existence of an additive identity,
v+ (−v) = 0,
the existence of an additive inverse, along with a field of “scalars”, F which are allowed to
multiply the vectors according to the following rules. (The Greek letters denote scalars.)
α (v + w) = αv + αw, (8.1)
(α + β) v = αv + βv, (8.2)
α (βv) = (αβ) v, (8.3)
1v = v. (8.4)
The field of scalars is usually R or C and the vector space will be called real or complex
depending on whether the field is R or C. However, other fields are also possible. For
example, one could use the field of rational numbers or even the field of the integers mod p
for p a prime. A vector space is also called a linear space.
For example, Rn with the usual conventions is an example of a real vector space and Cn
is an example of a complex vector space. Up to now, the discussion has been for Rn or Cn
and all that is taking place is an increase in generality and abstraction.
There are many examples of vector spaces.
Example 8.1.2 Let Ω be a nonempty set and let V consist of all functions defined on Ω
which have values in some field F. The vector operations are defined as follows.
(f + g) (x) = f (x) + g (x)
(αf ) (x) = αf (x)
Then it is routine to verify that V with these operations is a vector space.
Note that Fn actually fits in to this framework. You consider the set Ω to be {1, 2, · · · , n}
and then the mappings from Ω to F give the elements of Fn . Thus a typical vector can be
considered as a function.
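The remark that a vector in Fn is a function on {1, 2, · · · , n} can be made literal. In the sketch below, written in plain Python with the rationals playing the role of the field F, the names `add`, `scale`, and `from_tuple` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction


def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale(alpha, f):
    """Pointwise scalar multiple: (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)


def from_tuple(entries):
    """View a tuple in F^n as a function on Omega = {1, ..., n}."""
    return lambda i: entries[i - 1]


omega = [1, 2, 3]
v = from_tuple((Fraction(1), Fraction(2), Fraction(3)))
w = from_tuple((Fraction(0), Fraction(-1), Fraction(4)))
combo = add(scale(Fraction(2), v), w)   # the function 2v + w
values = [combo(i) for i in omega]      # its values on Omega
print(values)
```

Evaluating the function 2v + w at 1, 2, 3 gives exactly the components one would get by the usual componentwise operations in F3.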
Example 8.1.3 Generalize the above example by letting V denote all functions defined on
Ω which have values in a vector space W which has field of scalars F. The definitions of
scalar multiplication and vector addition are identical to those of the above example.
8.2 Subspaces And Bases
8.2.1 Basic Definitions
Definition 8.2.1 If {v1 , · · · , vn } ⊆ V, a vector space, then
$$\operatorname{span}(v_1, \cdots, v_n) \equiv \left\{ \sum_{i=1}^{n} \alpha_i v_i : \alpha_i \in F \right\}.$$
A subset, W ⊆ V is said to be a subspace if it is also a vector space with the same field of
scalars. Thus W ⊆ V for W nonempty is a subspace if ax + by ∈ W whenever a, b ∈ F and
x, y ∈ W. The span of a set of vectors as just described is an example of a subspace.
Example 8.2.2 Consider the real valued functions defined on an interval [a, b]. A subspace
is the set of continuous real valued functions defined on the interval. Another subspace is
the set of polynomials of degree no more than 4.
Definition 8.2.3 If {v1 , · · · , vn } ⊆ V, the set of vectors is linearly independent if
$$\sum_{i=1}^{n} \alpha_i v_i = 0$$
implies
α1 = · · · = αn = 0
and {v1 , · · · , vn } is called a basis for V if
span (v1 , · · · , vn ) = V
and {v1 , · · · , vn } is linearly independent. The set of vectors is linearly dependent if it is not
linearly independent.
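For vectors in Qn the definition can be tested by row reduction: the vectors are independent exactly when elimination leaves as many nonzero rows as there are vectors. The sketch below is one possible check in plain Python using exact fractions; the function names `rank` and `is_linearly_independent` are chosen for this illustration and do not come from the text.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots found by Gaussian elimination on the rows."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_linearly_independent(vectors):
    return rank(vectors) == len(vectors)


independent = is_linearly_independent([(1, 0, 0), (1, 1, 0), (1, 1, 1)])
dependent = is_linearly_independent([(1, 2, 3), (2, 4, 6)])
too_many = is_linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)])
print(independent, dependent, too_many)
```

The last case, four vectors in Q3, can never be independent, since the number of pivots is at most the number of columns.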
8.2.2 A Fundamental Theorem
The next theorem is called the exchange theorem. It is very important that you understand
this theorem. It is so important that I have given several proofs of it. Some amount to the
same thing, just worded differently.
Theorem 8.2.4 Let {x1 , · · · , xr } be a linearly independent set of vectors such that each xi
is in the span{y1 , · · · , ys } . Then r ≤ s.
Proof 1: Define span{y1 , · · · , ys } ≡ V. It follows there exist scalars c1 , · · · , cs such
that
$$x_1 = \sum_{i=1}^{s} c_i y_i. \tag{8.5}$$
Not all of these scalars can equal zero because if this were the case, it would follow that
x1 = 0 and so {x1 , · · · , xr } would not be linearly independent. Indeed, if x1 = 0,
$1 x_1 + \sum_{i=2}^{r} 0 x_i = x_1 = 0$ and so there would exist a nontrivial linear combination of the vectors
{x1 , · · · , xr } which equals zero.
Say ck ≠ 0. Then solve 8.5 for yk and obtain
$$y_k \in \operatorname{span}\Big( x_1, \overbrace{y_1, \cdots, y_{k-1}, y_{k+1}, \cdots, y_s}^{s-1 \text{ vectors here}} \Big).$$
Define {z1 , · · · , zs−1 } by
{z1 , · · · , zs−1 } ≡ {y1 , · · · , yk−1 , yk+1 , · · · , ys }
Therefore, span {x1 , z1 , · · · , zs−1 } = V because if v ∈ V, there exist constants c1 , · · · , cs
such that
$$v = \sum_{i=1}^{s-1} c_i z_i + c_s y_k.$$
Now replace the yk in the above with a linear combination of the vectors, {x1 , z1 , · · · , zs−1 }
to obtain v ∈ span {x1 , z1 , · · · , zs−1 } . The vector yk , in the list {y1 , · · · , ys } , has now been
replaced with the vector x1 and the resulting modified list of vectors has the same span as
the original list of vectors, {y1 , · · · , ys } .
Now suppose that r > s and that span {x1 , · · · , xl , z1 , · · · , zp } = V where the vectors,
z1 , · · · , zp are each taken from the set, {y1 , · · · , ys } and l + p = s. This has now been done
for l = 1 above. Then since r > s, it follows that l ≤ s < r and so l + 1 ≤ r. Therefore, xl+1
is a vector not in the list, {x1 , · · · , xl } and since span {x1 , · · · , xl , z1 , · · · , zp } = V there
exist scalars ci and dj such that
$$x_{l+1} = \sum_{i=1}^{l} c_i x_i + \sum_{j=1}^{p} d_j z_j. \tag{8.6}$$
Now not all the dj can equal zero because if this were so, it would follow that {x1 , · · · , xr }
would be a linearly dependent set because one of the vectors would equal a linear combination
of the others. Therefore, (8.6) can be solved for one of the zi , say zk , in terms of xl+1 and
the other zi and just as in the above argument, replace that zk with xl+1 to obtain
$$\operatorname{span}\Big( x_1, \cdots, x_l, x_{l+1}, \overbrace{z_1, \cdots, z_{k-1}, z_{k+1}, \cdots, z_p}^{p-1 \text{ vectors here}} \Big) = V.$$
Continue this way, eventually obtaining
span (x1 , · · · , xs ) = V.
But then xr ∈ span {x1 , · · · , xs } contrary to the assumption that {x1 , · · · , xr } is linearly
independent. Therefore, r ≤ s as claimed.
Proof 2: Let
$$x_k = \sum_{j=1}^{s} a_{jk} y_j$$
If r > s, then the matrix A = (ajk ) has more columns than rows. By Corollary 4.3.9
one of these columns is a linear combination of the others. This implies there exist scalars
c1 , · · · , cr , not all zero such that
$$\sum_{k=1}^{r} a_{jk} c_k = 0, \quad j = 1, \cdots, s$$
Then
$$\sum_{k=1}^{r} c_k x_k = \sum_{k=1}^{r} c_k \sum_{j=1}^{s} a_{jk} y_j = \sum_{j=1}^{s} \left( \sum_{k=1}^{r} c_k a_{jk} \right) y_j = 0$$
which contradicts the assumption that {x1 , · · · , xr } is linearly independent. Hence r ≤ s.
Proof 3: Suppose r > s. Let zk denote a vector of {y1 , · · · , ys } . Thus there exists j as
small as possible such that
$$\operatorname{span}(y_1, \cdots, y_s) = \operatorname{span}(x_1, \cdots, x_m, z_1, \cdots, z_j)$$
where m + j = s. The equation certainly holds with m = 0, corresponding to no vectors of
{x1 , · · · , xm }, and j = s, corresponding to all the yk . If j > 0 then m < s
and so
$$x_{m+1} = \sum_{k=1}^{m} a_k x_k + \sum_{i=1}^{j} b_i z_i$$
Not all the bi can equal 0 and so you can solve for one of the zi in terms of xm+1 , xm , · · · , x1 ,
and the other zk . Therefore, there exists
$$\{z_1, \cdots, z_{j-1}\} \subseteq \{y_1, \cdots, y_s\}$$
such that
$$\operatorname{span}(y_1, \cdots, y_s) = \operatorname{span}(x_1, \cdots, x_{m+1}, z_1, \cdots, z_{j-1})$$
contradicting the choice of j. Hence j = 0 and
$$\operatorname{span}(y_1, \cdots, y_s) = \operatorname{span}(x_1, \cdots, x_s)$$
It follows that
$$x_{s+1} \in \operatorname{span}(x_1, \cdots, x_s)$$
contrary to the assumption the xk are linearly independent. Therefore, r ≤ s as claimed.
Corollary 8.2.5 If {u1 , · · · , um } and {v1 , · · · , vn } are two bases for V, then m = n.
Proof: By Theorem 8.2.4, m ≤ n and n ≤ m.
Definition 8.2.6 A vector space V is of dimension n if it has a basis consisting of n vectors.
This is well defined thanks to Corollary 8.2.5. It is always assumed here that n < ∞ and in
this case, such a vector space is said to be finite dimensional.
Example 8.2.7 Consider the polynomials defined on R of degree no more than 3, denoted
here as P3 . Then show that a basis for P3 is {1, x, x^2 , x^3 }. Here x^k symbolizes the function
x ↦ x^k .
It is obvious that the span of the given vectors yields P3 . Why is this set of vectors
linearly independent? Suppose
$$c_0 + c_1 x + c_2 x^2 + c_3 x^3 = 0$$
where 0 is the zero function which maps everything to 0. Then you could differentiate three
times and obtain the following equations
$$\begin{aligned}
c_1 + 2c_2 x + 3c_3 x^2 &= 0 \\
2c_2 + 6c_3 x &= 0 \\
6c_3 &= 0
\end{aligned}$$
Now this implies c3 = 0. Then from the equations above the bottom one, you find in
succession that c2 = 0, c1 = 0, c0 = 0.
There is a somewhat interesting theorem about linear independence of smooth functions
(those having plenty of derivatives) which I will show now. It is often used in differential
equations.
Definition 8.2.8 Let f1 , · · · , fn be smooth functions defined on an interval [a, b] . The
Wronskian of these functions is defined as follows.
$$W(f_1, \cdots, f_n)(x) \equiv \begin{vmatrix}
f_1(x) & f_2(x) & \cdots & f_n(x) \\
f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\
\vdots & \vdots & & \vdots \\
f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x)
\end{vmatrix}$$
Note that to get from one row to the next, you just differentiate everything in that row. The
notation f (k) (x) denotes the k th derivative.
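As an illustration of Definition 8.2.8, here is a minimal sketch which computes the Wronskian of 1, x, x^2 , x^3 at a point using exact rational arithmetic. The representation of a polynomial as a list of coefficients and the helper names (derivative, evaluate, determinant, wronskian) are assumptions made for this sketch only. Note that a nonzero value at one point is the useful outcome; a zero value by itself does not show dependence for general smooth functions.

```python
from fractions import Fraction

def derivative(p):
    """Derivative of a polynomial given as a coefficient list [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def evaluate(p, t):
    """Evaluate the polynomial p at t by Horner's rule."""
    result = Fraction(0)
    for c in reversed(p):
        result = result * t + c
    return result

def determinant(rows):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, det = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det

def wronskian(polys, t):
    """W(f1, ..., fn)(t): row k holds the k-th derivatives evaluated at t."""
    rows, current = [], [list(p) for p in polys]
    for _ in range(len(polys)):
        rows.append([evaluate(p, t) for p in current])
        current = [derivative(p) for p in current]
    return determinant(rows)

# 1, x, x^2, x^3 as coefficient lists (constant term first)
basis = [[1], [0, 1], [0, 0, 1], [0, 0, 0, 1]]
w = wronskian(basis, Fraction(0))

# 1, x, 2 + 3x is a dependent list, so its Wronskian vanishes at every point
w_dependent = wronskian([[1], [0, 1], [2, 3]], Fraction(5))
```

For the list 1, x, x^2 , x^3 the matrix is upper triangular with diagonal entries 1, 1, 2, 6, so the value does not depend on the point chosen.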
With this definition, the following is the theorem. The interesting theorem involving the
Wronskian has to do with the situation where the functions are solutions of a differential
equation. Then much more can be said and it is much more interesting than the following
theorem.
Theorem 8.2.9 Let {f1 , · · · , fn } be smooth functions defined on [a, b] . Then they are linearly independent if there exists some point t ∈ [a, b] where W (f1 , · · · , fn ) (t) ̸= 0.
Proof: Form the linear combination of these vectors (functions) and suppose it equals
0. Thus
a 1 f1 + a 2 f2 + · · · + a n fn = 0
The question you must answer is whether this requires each aj to equal zero. If they all
must equal 0, then this means these vectors (functions) are independent. This is what it
means to be linearly independent.
Differentiate the above equation n − 1 times yielding the equations
$$\begin{cases}
a_1 f_1 + a_2 f_2 + \cdots + a_n f_n = 0 \\
a_1 f_1' + a_2 f_2' + \cdots + a_n f_n' = 0 \\
\qquad \vdots \\
a_1 f_1^{(n-1)} + a_2 f_2^{(n-1)} + \cdots + a_n f_n^{(n-1)} = 0
\end{cases}$$
Now plug in t. Then the above yields
$$\begin{pmatrix}
f_1(t) & f_2(t) & \cdots & f_n(t) \\
f_1'(t) & f_2'(t) & \cdots & f_n'(t) \\
\vdots & \vdots & & \vdots \\
f_1^{(n-1)}(t) & f_2^{(n-1)}(t) & \cdots & f_n^{(n-1)}(t)
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Since the determinant of the matrix on the left is assumed to be nonzero, it follows this
matrix has an inverse and so the only solution to the above system of equations is to have
each ak = 0. Here is a useful lemma.
Lemma 8.2.10 Suppose v ∉ span (u1 , · · · , uk ) and {u1 , · · · , uk } is linearly independent.
Then {u1 , · · · , uk , v} is also linearly independent.
Proof: Suppose $\sum_{i=1}^{k} c_i u_i + dv = 0$. It is required to verify that each ci = 0 and that
d = 0. But if d ≠ 0, then you can solve for v as a linear combination of the vectors,
{u1 , · · · , uk },
$$v = -\sum_{i=1}^{k} \left( \frac{c_i}{d} \right) u_i$$
contrary to assumption. Therefore, d = 0. But then $\sum_{i=1}^{k} c_i u_i = 0$ and the linear independence of {u1 , · · · , uk } implies each ci = 0 also.
Given a spanning set, you can delete vectors till you end up with a basis. Given a linearly
independent set, you can add vectors till you get a basis. This is what the following theorem
is about, weeding and planting.
Theorem 8.2.11 If V = span (u1 , · · · , un ) then some subset of {u1 , · · · , un } is a basis for
V. Also, if {u1 , · · · , uk } ⊆ V is linearly independent and the vector space is finite dimensional, then the set, {u1 , · · · , uk }, can be enlarged to obtain a basis of V.
Proof: Let
S = {E ⊆ {u1 , · · · , un } such that span (E) = V }.
For E ∈ S, let |E| denote the number of elements of E. Let
m ≡ min{|E| such that E ∈ S}.
Thus there exist vectors
{v1 , · · · , vm } ⊆ {u1 , · · · , un }
such that
span (v1 , · · · , vm ) = V
and m is as small as possible for this to happen. If this set is linearly independent, it follows
it is a basis for V and the theorem is proved. On the other hand, if the set is not linearly
independent, then there exist scalars
c1 , · · · , cm
such that
$$0 = \sum_{i=1}^{m} c_i v_i$$
and not all the ci are equal to zero. Suppose ck ̸= 0. Then the vector, vk may be solved for
in terms of the other vectors. Consequently,
V = span (v1 , · · · , vk−1 , vk+1 , · · · , vm )
contradicting the definition of m. This proves the first part of the theorem.
To obtain the second part, begin with {u1 , · · · , uk } and suppose a basis for V is
{v1 , · · · , vn } .
If
span (u1 , · · · , uk ) = V,
then k = n. If not, there exists a vector,
uk+1 ∉ span (u1 , · · · , uk ) .
Then by Lemma 8.2.10, {u1 , · · · , uk , uk+1 } is also linearly independent. Continue adding
vectors in this way until n linearly independent vectors have been obtained. Then
span (u1 , · · · , un ) = V
because if it did not do so, there would exist un+1 as just described and {u1 , · · · , un+1 }
would be a linearly independent set of vectors having n+1 elements even though {v1 , · · · , vn }
is a basis. This would contradict Theorem 8.2.4. Therefore, this list is a basis.
8.2.3 The Basis Of A Subspace
Every subspace of a finite dimensional vector space is a span of some vectors and in fact it
has a basis. This is the content of the next theorem.
Theorem 8.2.12 Let V be a nonzero subspace of a finite dimensional vector space W of
dimension n. Then V has a basis with no more than n vectors.
Proof: Let v1 ∈ V where v1 ̸= 0. If span {v1 } = V, stop. {v1 } is a basis for V .
Otherwise, there exists v2 ∈ V which is not in span {v1 } . By Lemma 8.2.10 {v1 , v2 } is a
linearly independent set of vectors. If span {v1 , v2 } = V stop, {v1 , v2 } is a basis for V. If
span {v1 , v2 } ≠ V, then there exists v3 ∉ span {v1 , v2 } and {v1 , v2 , v3 } is a larger linearly
independent set of vectors. Continuing this way, the process must stop before n + 1 steps
because if not, it would be possible to obtain n + 1 linearly independent vectors contrary to
the exchange theorem, Theorem 8.2.4.
8.3 Lots Of Fields
8.3.1 Irreducible Polynomials
I mentioned earlier that most things hold for arbitrary fields. However, I have not bothered
to give any examples of other fields. This is the point of this section. It also turns out that
showing the algebraic numbers are a field can be understood using vector space concepts
and it gives a very convincing application of the abstract theory presented earlier in this
chapter.
Here I will give some basic algebra relating to polynomials. This is interesting for its
own sake but also provides the basis for constructing many different kinds of fields. The
first is the Euclidean algorithm for polynomials.
Definition 8.3.1 A polynomial is an expression of the form $p(\lambda) = \sum_{k=0}^{n} a_k \lambda^k$ where as
usual λ^0 is defined to equal 1. Two polynomials are said to be equal if their corresponding
coefficients are the same. Thus, in particular, p (λ) = 0 means each of the ak = 0. An
element of the field λ is said to be a root of the polynomial if p (λ) = 0 in the sense that
when you plug in λ into the formula and do the indicated operations, you get 0. The degree
of a nonzero polynomial is the highest exponent appearing on λ. The degree of the zero
polynomial p (λ) = 0 is not defined.
Example 8.3.2 Consider the polynomial p (λ) = λ^2 + λ where the coefficients are in Z2 . Is
this polynomial equal to 0? Not according to the above definition, because its coefficients are
not all equal to 0. However, p (1) = p (0) = 0 so it sends every element of Z2 to 0. Note the
distinction between saying it sends everything in the field to 0 and having the polynomial be
the zero polynomial.
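The distinction in Example 8.3.2 can be checked by a short computation. The following is only a sketch; the coefficient-list representation (constant term first) and the function name evaluate_mod are assumptions made for the illustration.

```python
def evaluate_mod(coeffs, value, modulus):
    """Evaluate a polynomial (coefficients listed from degree 0 up) in Z_modulus."""
    result = 0
    for c in reversed(coeffs):
        result = (result * value + c) % modulus
    return result

p = [0, 1, 1]  # lambda + lambda^2 with coefficients in Z_2

# Equality of polynomials is decided by coefficients, not by values.
is_zero_polynomial = all(c % 2 == 0 for c in p)

# The values of p at every element of Z_2.
values = [evaluate_mod(p, v, 2) for v in range(2)]
```

Here is_zero_polynomial is False while values consists entirely of zeros, which is exactly the situation described in the example.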
The fundamental result is the division theorem for polynomials. It is Lemma 1.9.10 on
Page 27. We state it here for convenience.
Lemma 8.3.3 Let f (λ) and g (λ) ≠ 0 be polynomials. Then there exist polynomials q (λ)
and r (λ) such that
f (λ) = q (λ) g (λ) + r (λ)
where the degree of r (λ) is less than the degree of g (λ) or r (λ) = 0. These polynomials
q (λ) and r (λ) are unique.
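The quotient and remainder of Lemma 8.3.3 can be found by the usual long division. Here is a minimal sketch for polynomials with rational coefficients; the coefficient-list representation (constant term first) and the names trim and poly_divmod are assumptions of this sketch, not notation from the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and either r = 0 or deg r < deg g."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("g must be a nonzero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)
        factor = r[-1] / g[-1]
        q[shift] = factor
        # subtract factor * x^shift * g, which cancels the leading term of r
        for i, c in enumerate(g):
            r[shift + i] -= factor * c
        r = trim(r)
    return trim(q), r

# f = x^3 + 2x + 1 divided by g = x^2 + 1
q, r = poly_divmod([1, 2, 0, 1], [1, 0, 1])
```

In this instance x^3 + 2x + 1 = x (x^2 + 1) + (x + 1), so q represents x and r represents x + 1.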
Now with this lemma, here is another one which is very fundamental. First here is a
definition. A polynomial is monic means it is of the form
λn + cn−1 λn−1 + · · · + c1 λ + c0 .
That is, the leading coefficient is 1. In what follows, the coefficients of polynomials are in
F, a field of scalars which is completely arbitrary. Think R if you need an example.
Definition 8.3.4 A polynomial f is said to divide a polynomial g if g (λ) = f (λ) r (λ) for
some polynomial r (λ). Let {ϕi (λ)} be a finite set of polynomials. The greatest common
divisor will be the monic polynomial q (λ) such that q (λ) divides each ϕi (λ) and if p (λ)
divides each ϕi (λ) , then p (λ) divides q (λ) . The finite set of polynomials {ϕi } is said to be
relatively prime if their greatest common divisor is 1. A polynomial f (λ) is irreducible if
there is no polynomial with coefficients in F which divides it except nonzero scalar multiples
of f (λ) and constants.
Proposition 8.3.5 The greatest common divisor is unique.
Proof: Suppose both q (λ) and q ′ (λ) work. Then q (λ) divides q ′ (λ) and the other way
around and so
q ′ (λ) = q (λ) l (λ) , q (λ) = l′ (λ) q ′ (λ)
Therefore, the two must have the same degree. Hence l′ (λ) , l (λ) are both constants. However, this constant must be 1 because both q (λ) and q ′ (λ) are monic.
Theorem 8.3.6 Let ψ (λ) be the greatest common divisor of {ϕi (λ)} , not all of which are
zero polynomials. Then there exist polynomials ri (λ) such that
$$\psi(\lambda) = \sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda).$$
Furthermore, ψ (λ) is the monic polynomial of smallest degree which can be written in the
above form.
Proof: Let S denote the set of monic polynomials which are of the form
$$\sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda)$$
where ri (λ) is a polynomial. Then S ≠ ∅ because some ϕi (λ) ≠ 0. Then let the ri be chosen
such that the degree of the expression $\sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda)$ is as small as possible. Letting ψ (λ)
equal this sum, it remains to verify it is the greatest common divisor. First, does it divide
each ϕi (λ)? Suppose it fails to divide ϕ1 (λ) . Then by Lemma 8.3.3
ϕ1 (λ) = ψ (λ) l (λ) + r (λ)
where degree of r (λ) is less than that of ψ (λ). Then dividing r (λ) by the leading coefficient
if necessary and denoting the result by ψ 1 (λ) , it follows the degree of ψ 1 (λ) is less than
the degree of ψ (λ) and ψ 1 (λ) equals
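$$\begin{aligned}
\psi_1(\lambda) &= \left( \phi_1(\lambda) - \psi(\lambda) l(\lambda) \right) a \\
&= \left( \phi_1(\lambda) - \sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda) l(\lambda) \right) a \\
&= \left( (1 - r_1(\lambda) l(\lambda)) \phi_1(\lambda) + \sum_{i=2}^{p} (-r_i(\lambda) l(\lambda)) \phi_i(\lambda) \right) a
\end{aligned}$$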
for a suitable a ∈ F. This is one of the polynomials in S. Therefore, ψ (λ) does not have
the smallest degree after all because the degree of ψ 1 (λ) is smaller. This is a contradiction.
Therefore, ψ (λ) divides ϕ1 (λ) . Similarly it divides all the other ϕi (λ).
If p (λ) divides all the ϕi (λ) , then it divides ψ (λ) because of the formula for ψ (λ) which
equals $\sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda)$.
Lemma 8.3.7 Suppose ϕ (λ) and ψ (λ) are monic polynomials which are irreducible and
not equal. Then they are relatively prime.
Proof: Suppose η (λ) is a nonconstant polynomial. If η (λ) divides ϕ (λ) , then since
ϕ (λ) is irreducible, η (λ) equals aϕ (λ) for some a ∈ F. If η (λ) divides ψ (λ) then it must
be of the form bψ (λ) for some b ∈ F and so it follows
$$\psi(\lambda) = \frac{a}{b} \phi(\lambda)$$
but both ψ (λ) and ϕ (λ) are monic polynomials which implies a = b and so ψ (λ) = ϕ (λ).
This is assumed not to happen. It follows the only polynomials which divide both ψ (λ)
and ϕ (λ) are constants and so the two polynomials are relatively prime. Thus a polynomial
which divides them both must be a constant, and if it is monic, then it must be 1. Thus 1
is the greatest common divisor.
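Theorem 8.3.6 asserts that the greatest common divisor can be written as a combination of the given polynomials, and Lemma 8.3.7 says this combination equals 1 for distinct monic irreducible polynomials. The proof of the theorem is by a minimality argument; the sketch below instead uses the extended Euclidean algorithm for two polynomials with rational coefficients, which is a different (constructive) route to the same kind of formula. All function names and the coefficient-list representation (constant term first) are assumptions of this sketch.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def scale(a, c):
    return trim([c * x for x in a])

def divmod_poly(f, g):
    """Quotient and remainder of f by the nonzero polynomial g."""
    q, r = [Fraction(0)] * max(len(f) - len(g) + 1, 0), list(f)
    while len(r) >= len(g):
        shift, factor = len(r) - len(g), r[-1] / g[-1]
        q[shift] = factor
        for i, c in enumerate(g):
            r[shift + i] -= factor * c
        r = trim(r)
    return trim(q), r

def extended_gcd(f, g):
    """Return (d, s, t) with d monic, d = s*f + t*g, and d the gcd of f and g."""
    r0 = trim([Fraction(c) for c in f])
    r1 = trim([Fraction(c) for c in g])
    if not r0 and not r1:
        raise ValueError("at least one polynomial must be nonzero")
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(mul(q, s1), -1))
        t0, t1 = t1, add(t0, scale(mul(q, t1), -1))
    lead = r0[-1]  # divide by the leading coefficient to make d monic
    return scale(r0, 1 / lead), scale(s0, 1 / lead), scale(t0, 1 / lead)

# x^2 - 1 and x^2 - 3x + 2 share the factor x - 1
d, s, t = extended_gcd([-1, 0, 1], [2, -3, 1])

# x^2 + 1 and x^2 + x + 1 are distinct, monic and irreducible over the rationals
d2, s2, t2 = extended_gcd([1, 0, 1], [1, 1, 1])
```

In the first case d represents x − 1; in the second d2 represents the constant 1, as Lemma 8.3.7 predicts for rational coefficients.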
Lemma 8.3.8 Let ψ (λ) be an irreducible monic polynomial not equal to 1 which divides
$$\prod_{i=1}^{p} \phi_i(\lambda)^{k_i}, \quad k_i \text{ a positive integer,}$$
where each ϕi (λ) is an irreducible monic polynomial. Then ψ (λ) equals some ϕi (λ) .
Proof : Suppose ψ (λ) ̸= ϕi (λ) for all i. Then by Lemma 8.3.7, there exist polynomials
mi (λ) , ni (λ) such that
1 = ψ (λ) mi (λ) + ϕi (λ) ni (λ) .
Hence
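$$\left( \phi_i(\lambda) n_i(\lambda) \right)^{k_i} = \left( 1 - \psi(\lambda) m_i(\lambda) \right)^{k_i}$$
Then, letting $\tilde{g}(\lambda) = \prod_{i=1}^{p} n_i(\lambda)^{k_i}$, and applying the binomial theorem, there exists a
polynomial h (λ) such that
$$\tilde{g}(\lambda) \prod_{i=1}^{p} \phi_i(\lambda)^{k_i} \equiv \prod_{i=1}^{p} n_i(\lambda)^{k_i} \prod_{i=1}^{p} \phi_i(\lambda)^{k_i} = \prod_{i=1}^{p} \left( 1 - \psi(\lambda) m_i(\lambda) \right)^{k_i} = 1 + \psi(\lambda) h(\lambda)$$
Thus, using the fact that ψ (λ) divides $\prod_{i=1}^{p} \phi_i(\lambda)^{k_i}$, for a suitable polynomial g (λ) ,
$$g(\lambda) \psi(\lambda) = 1 + \psi(\lambda) h(\lambda)$$
$$1 = \psi(\lambda) \left( g(\lambda) - h(\lambda) \right)$$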
which is impossible if ψ (λ) is non constant, as assumed. Now here is a simple lemma about canceling monic polynomials.
Lemma 8.3.9 Suppose p (λ) is a monic polynomial and q (λ) is a polynomial such that
p (λ) q (λ) = 0.
Then q (λ) = 0. Also if
p (λ) q1 (λ) = p (λ) q2 (λ)
then q1 (λ) = q2 (λ) .
Proof: Let
$$p(\lambda) = \sum_{j=0}^{k} p_j \lambda^j, \quad q(\lambda) = \sum_{i=0}^{n} q_i \lambda^i, \quad p_k = 1.$$
Then the product equals
$$\sum_{j=0}^{k} \sum_{i=0}^{n} p_j q_i \lambda^{i+j}.$$
Then look at those terms involving λ^{k+n} . This is pk qn λ^{k+n} and is given to be 0. Since
pk = 1, it follows qn = 0. Thus
$$\sum_{j=0}^{k} \sum_{i=0}^{n-1} p_j q_i \lambda^{i+j} = 0.$$
Then consider the term involving λ^{n−1+k} and conclude that since pk = 1, it follows qn−1 = 0.
Continuing this way, each qi = 0. This proves the first part. The second follows from
p (λ) (q1 (λ) − q2 (λ)) = 0.
The following is the analog of the fundamental theorem of arithmetic for polynomials.
Theorem 8.3.10 Let f (λ) be a nonconstant polynomial with coefficients in F. Then there
is some a ∈ F such that $f(\lambda) = a \prod_{i=1}^{n} \phi_i(\lambda)$ where ϕi (λ) is an irreducible nonconstant
monic polynomial and repeats are allowed. Furthermore, this factorization is unique in the
sense that any two of these factorizations have the same nonconstant factors in the product,
possibly in different order and the same constant a.
Proof: That such a factorization exists is obvious. If f (λ) is irreducible, you are done.
Factor out the leading coefficient. If not, then f (λ) = aϕ1 (λ) ϕ2 (λ) where these are monic
polynomials. Continue doing this with the ϕi and eventually arrive at a factorization of the
desired form.
It remains to argue the factorization is unique except for order of the factors. Suppose
$$a \prod_{i=1}^{n} \phi_i(\lambda) = b \prod_{i=1}^{m} \psi_i(\lambda)$$
where the ϕi (λ) and the ψ i (λ) are all irreducible monic nonconstant polynomials and a, b ∈
F. If n > m, then by Lemma 8.3.8, each ψ i (λ) equals one of the ϕj (λ) . By the above
cancellation lemma, Lemma 8.3.9, you can cancel all these ψ i (λ) with appropriate ϕj (λ)
and obtain a contradiction because the resulting polynomials on either side would have
different degrees. Similarly, it cannot happen that n < m. It follows n = m and the two
products consist of the same polynomials. Then it follows a = b. The following corollary will be well used. This corollary seems rather believable but does
require a proof.
Corollary 8.3.11 Let $q(\lambda) = \prod_{i=1}^{p} \phi_i(\lambda)^{k_i}$ where the ki are positive integers and the ϕi (λ)
are irreducible monic polynomials. Suppose also that p (λ) is a monic polynomial which
divides q (λ) . Then
$$p(\lambda) = \prod_{i=1}^{p} \phi_i(\lambda)^{r_i}$$
where ri is a nonnegative integer no larger than ki .
Proof: Using Theorem 8.3.10, let $p(\lambda) = b \prod_{i=1}^{s} \psi_i(\lambda)^{r_i}$ where the ψ i (λ) are each
irreducible and monic and b ∈ F. Since p (λ) is monic, b = 1. Then there exists a polynomial
g (λ) such that
$$p(\lambda) g(\lambda) = g(\lambda) \prod_{i=1}^{s} \psi_i(\lambda)^{r_i} = \prod_{i=1}^{p} \phi_i(\lambda)^{k_i}$$
Hence g (λ) must be monic. Therefore,
$$p(\lambda) g(\lambda) = \overbrace{\prod_{i=1}^{s} \psi_i(\lambda)^{r_i}}^{p(\lambda)} \prod_{j=1}^{l} \eta_j(\lambda) = \prod_{i=1}^{p} \phi_i(\lambda)^{k_i}$$
for η j monic and irreducible. By uniqueness, each ψ i equals one of the ϕj (λ) and the same
holding true of the η i (λ). Therefore, p (λ) is of the desired form.
8.3.2 Polynomials And Fields
When you have a polynomial like x^2 − 3 which has no rational roots, it turns out you can
enlarge the field of rational numbers to obtain a larger field such that this polynomial does
have roots in this larger field. I am going to discuss a systematic way to do this. It will
turn out that for any polynomial with coefficients in any field, there always exists a possibly
larger field such that the polynomial has roots in this larger field. This book has mainly
featured the field of real or complex numbers but this procedure will show how to obtain
many other fields which could be used in most of what was presented earlier in the book.
Here is an important idea concerning equivalence relations which I hope is familiar.
Definition 8.3.12 Let S be a set. The symbol, ∼ is called an equivalence relation on S if
it satisfies the following axioms.
1. x ∼ x for all x ∈ S. (Reflexive)
2. If x ∼ y then y ∼ x. (Symmetric)
3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive)
Definition 8.3.13 [x] denotes the set of all elements of S which are equivalent to x and
[x] is called the equivalence class determined by x or just the equivalence class of x.
Also recall the notion of equivalence classes.
Theorem 8.3.14 Let ∼ be an equivalence relation defined on a set, S and let H denote the
set of equivalence classes. Then if [x] and [y] are two of these equivalence classes, either
x ∼ y and [x] = [y] or it is not true that x ∼ y and [x] ∩ [y] = ∅.
Definition 8.3.15 Let F be a field, for example the rational numbers, and denote by F [x]
the polynomials having coefficients in F. Suppose p (x) is a polynomial. Let a (x) ∼ b (x)
(a (x) is similar to b (x)) when
a (x) − b (x) = k (x) p (x)
for some polynomial k (x) .
Proposition 8.3.16 In the above definition, ∼ is an equivalence relation.
Proof: First of all, note that a (x) ∼ a (x) because their difference equals 0p (x) . If
a (x) ∼ b (x) , then a (x) − b (x) = k (x) p (x) for some k (x) . But then b (x) − a (x) =
−k (x) p (x) and so b (x) ∼ a (x). Next suppose a (x) ∼ b (x) and b (x) ∼ c (x) . Then
a (x) − b (x) = k (x) p (x) for some polynomial k (x) and also b (x) − c (x) = l (x) p (x) for
some polynomial l (x) . Then
a (x) − c (x) = a (x) − b (x) + b (x) − c (x)
= k (x) p (x) + l (x) p (x) = (l (x) + k (x)) p (x)
and so a (x) ∼ c (x) and this shows the transitive law. With this proposition, here is another definition which essentially describes the elements
of the new field. It will eventually be necessary to assume the polynomial p (x) in the above
definition is irreducible so I will begin assuming this.
Definition 8.3.17 Let F be a field and let p (x) ∈ F [x] be a monic irreducible polynomial
of degree greater than 0. Thus there is no polynomial having coefficients in F which divides
p (x) except for itself and constants. For the similarity relation defined in Definition 8.3.15,
define the following operations on the equivalence classes. [a (x)] is an equivalence class
means that it is the set of all polynomials which are similar to a (x).
[a (x)] + [b (x)] ≡ [a (x) + b (x)]
[a (x)] [b (x)] ≡ [a (x) b (x)]
This collection of equivalence classes is sometimes denoted by F [x] / (p (x)).
Proposition 8.3.18 In the situation of Definition 8.3.17, p (x) and q (x) are relatively
prime for any q (x) ∈ F [x] which is not a multiple of p (x). Also the definitions of addition
and multiplication are well defined. In addition, if a, b ∈ F and [a] = [b] , then a = b.
Proof: First consider the claim about p (x) , q (x) being relatively prime. If ψ (x) is the
greatest common divisor, it follows ψ (x) is either equal to p (x) or 1. If it is p (x) , then
q (x) is a multiple of p (x) . If it is 1, then by definition, the two polynomials are relatively
prime.
To show the operations are well defined, suppose
[a (x)] = [a′ (x)] , [b (x)] = [b′ (x)]
It is necessary to show
[a (x) + b (x)] = [a′ (x) + b′ (x)]
[a (x) b (x)] = [a′ (x) b′ (x)]
Consider the second of the two.
$$\begin{aligned}
a'(x) b'(x) - a(x) b(x) &= a'(x) b'(x) - a(x) b'(x) + a(x) b'(x) - a(x) b(x) \\
&= b'(x) \left( a'(x) - a(x) \right) + a(x) \left( b'(x) - b(x) \right)
\end{aligned}$$
Now by assumption (a′ (x) − a (x)) is a multiple of p (x) as is (b′ (x) − b (x)) , so the above
is a multiple of p (x) and by definition this shows [a (x) b (x)] = [a′ (x) b′ (x)]. The case for
addition is similar.
Now suppose [a] = [b] . This means a − b = k (x) p (x) for some polynomial k (x) . Then
k (x) must equal 0 since otherwise the two polynomials a − b and k (x) p (x) could not be
equal because they would have different degree.
Note that from this proposition and math induction, if each ai ∈ F,
$$\left[ a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \right] = [a_n][x]^n + [a_{n-1}][x]^{n-1} + \cdots + [a_1][x] + [a_0] \tag{8.7}$$
With the above preparation, here is a definition of a field in which the irreducible polynomial p (x) has a root.
Definition 8.3.19 Let p (x) ∈ F [x] be irreducible and let a (x) ∼ b (x) when a (x) − b (x) is
a multiple of p (x) . Let G denote the set of equivalence classes as described above with the
operations also described in Definition 8.3.17.
Also here is another useful definition and a simple proposition which comes from it.
Definition 8.3.20 Let F ⊆ K be two fields. Then clearly K is also a vector space over
F. Then also, K is called a finite field extension of F if the dimension of this vector space,
denoted by [K : F ] is finite.
There are some easy things to observe about this.
Proposition 8.3.21 Let F ⊆ K ⊆ L be fields. Then [L : F ] = [L : K] [K : F ].
Proof: Let $\{l_i\}_{i=1}^{n}$ be a basis for L over K and let $\{k_j\}_{j=1}^{m}$ be a basis of K over F . Then
if l ∈ L, there exist unique scalars xi in K such that
$$l = \sum_{i=1}^{n} x_i l_i$$
Now xi ∈ K so there exist fji ∈ F such that
$$x_i = \sum_{j=1}^{m} f_{ji} k_j$$
Then it follows that
$$l = \sum_{i=1}^{n} \sum_{j=1}^{m} f_{ji} k_j l_i$$
It follows that {kj li } is a spanning set. If
$$\sum_{i=1}^{n} \sum_{j=1}^{m} f_{ji} k_j l_i = 0$$
then, since the li are independent, it follows that
$$\sum_{j=1}^{m} f_{ji} k_j = 0$$
and since {kj } is independent, each fji = 0 for each j for a given arbitrary i. Therefore,
{kj li } is a basis of L over F . It has nm elements, so [L : F ] = nm = [L : K] [K : F ].
Theorem 8.3.22 The set of all equivalence classes G ≡ F [x] / (p (x)) described above with
the multiplicative identity given by [1] and the additive identity given by [0] along with the
operations of Definition 8.3.17, is a field and p ([x]) = [0] . (Thus p has a root in this new
field.) In addition to this, [G : F] = n, the degree of p (x) .
Proof: Everything is obvious except for the existence of the multiplicative inverse and
the assertion that p ([x]) = 0. Suppose then that [a (x)] ≠ [0] . That is, a (x) is not a multiple
of p (x). Why does [a (x)]^{−1} exist? By Proposition 8.3.18, a (x) , p (x) are relatively prime and
so by Theorem 8.3.6 there exist polynomials ψ (x) , ϕ (x) such that
1 = ψ (x) p (x) + a (x) ϕ (x)
and so
1 − a (x) ϕ (x) = ψ (x) p (x)
which, by definition implies
[1 − a (x) ϕ (x)] = [1] − [a (x) ϕ (x)] = [1] − [a (x)] [ϕ (x)] = [0]
and so [ϕ (x)] = [a (x)]^{−1} . This shows G is a field.
Now if $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, p ([x]) = 0 by 8.7 and the definition
which says [p (x)] = [0].
Consider the claim about the dimension. It was just shown that [1] , [x] , [x^2 ] , · · · , [x^n ]
is linearly dependent. Also [1] , [x] , [x^2 ] , · · · , [x^{n−1} ] is independent because if not, there
would exist a nonzero polynomial q (x) of degree at most n − 1 which is a multiple of p (x) which is impossible.
Now for [q (x)] ∈ G, you can write
q (x) = p (x) l (x) + r (x)
where the degree of r (x) is less than n or else it equals 0. Either way, [q (x)] = [r (x)] which
is a linear combination of [1] , [x] , [x^2 ] , · · · , [x^{n−1} ] . Thus [G : F] = n as claimed.
Note that if p (x) were not irreducible, then you could find a field extension G such that
[G : F] ≤ n. You could do this by working with an irreducible factor of p (x).
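To see Theorem 8.3.22 in a small case, here is a sketch which builds the classes of Z2 [x] / (x^2 + x + 1). The choice of Z2 and of the polynomial x^2 + x + 1 (which has no root in Z2 and degree 2, hence is irreducible) is an assumption made for this illustration, as are the names reduce_mod and multiply. Each class is stored as the pair (c0, c1) standing for [c0 + c1 x].

```python
from itertools import product

P = 2                  # coefficients are taken in Z_2
MODULUS = [1, 1, 1]    # p(x) = 1 + x + x^2, monic of degree n = 2

def reduce_mod(poly):
    """Reduce a coefficient list modulo p(x) and modulo 2; return a tuple of length n."""
    n = len(MODULUS) - 1
    r = [c % P for c in poly] + [0] * n
    for k in range(len(r) - 1, n - 1, -1):
        c = r[k]
        if c:
            # subtract c * x^(k-n) * p(x); p is monic so this removes the x^k term
            for i, m in enumerate(MODULUS):
                r[k - n + i] = (r[k - n + i] - c * m) % P
    return tuple(r[:n])

def multiply(a, b):
    """Product of two classes, computed as [a(x) b(x)]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return reduce_mod(out)

elements = list(product(range(P), repeat=2))   # all classes [c0 + c1 x]
one = (1, 0)

# every nonzero class has a multiplicative inverse
inverses = {a: next(b for b in elements if multiply(a, b) == one)
            for a in elements if any(a)}

# p([x]) is the class [p(x)], which should be [0]
p_of_x = reduce_mod(MODULUS)
```

There are 4 = 2^2 classes, consistent with [G : F] = 2, and the search for inverses succeeds for each of the three nonzero classes.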
Usually, people simply write b rather than [b] if b ∈ F. Then with this convention,
[bϕ (x)] = [b] [ϕ (x)] = b [ϕ (x)] .
This shows how to enlarge a field to get a new one in which the polynomial has a root.
By using a succession of such enlargements, called field extensions, there will exist a field
in which the given polynomial can be factored into a product of polynomials having degree
one. The field you obtain in this process of enlarging in which the given polynomial factors
in terms of linear factors is called a splitting field.
Theorem 8.3.23 Let $p(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ be a polynomial with coefficients
in a field of scalars F. There exists a larger field G such that there exist {z1 , · · · , zn } listed
according to multiplicity such that
$$p(x) = \prod_{i=1}^{n} (x - z_i)$$
This larger field is called a splitting field. Furthermore,
[G : F] ≤ n!
Proof: From Theorem 8.3.22, there exists a field F1 such that p (x) has a root, z1 (= [x]
if p is irreducible.) Then by the Euclidean algorithm
p (x) = (x − z1 ) q1 (x) + r
where r ∈ F1 . Since p (z1 ) = 0, this requires r = 0. Now do the same for q1 (x) that was
done for p (x) , enlarging the field to F2 if necessary, such that in this new field
q1 (x) = (x − z2 ) q2 (x) .
and so
p (x) = (x − z1 ) (x − z2 ) q2 (x)
After n such extensions, you will have obtained the necessary field G.
Finally consider the claim about dimension. By Theorem 8.3.22, there is a larger field
G1 such that p (x) has a root a1 in G1 and [G1 : F] ≤ n. Then
p (x) = (x − a1 ) q (x)
where q (x) has degree n − 1 and coefficients in G1 . Repeating the argument with q (x) gives a field G2 ⊇ G1 with [G2 : G1 ] ≤ n − 1.
Continue this way until the polynomial equals the product of linear factors. Then by
Proposition 8.3.21 applied multiple times, [G : F] ≤ n (n − 1) · · · 1 = n!.
Example 8.3.24 The polynomial x^2 + 1 is irreducible in R [x] , polynomials having real
coefficients. To see this is the case, suppose ψ (x) divides x^2 + 1. Then
x^2 + 1 = ψ (x) q (x)
If the degree of ψ (x) is less than 2, then it must be either a constant or of the form ax + b.
In the latter case, −b/a must be a zero of the right side, hence of the left but x^2 + 1 has no
real zeros. Therefore, the degree of ψ (x) must be two and q (x) must be a constant. Thus
the only polynomials which divide x^2 + 1 are constants and multiples of x^2 + 1. Therefore,
this shows x^2 + 1 is irreducible. Find the inverse of [x^2 + x + 1] in the space of equivalence
classes, R [x] / (x^2 + 1).
You can solve this with partial fractions.
$$\frac{1}{(x^2 + 1)(x^2 + x + 1)} = -\frac{x}{x^2 + 1} + \frac{x + 1}{x^2 + x + 1}$$
and so
$$1 = (-x)\left( x^2 + x + 1 \right) + (x + 1)\left( x^2 + 1 \right)$$
which implies
$$1 \sim (-x)\left( x^2 + x + 1 \right)$$
and so the inverse is [−x] .
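The claim that [−x] is the inverse can be checked by multiplying and reducing modulo x^2 + 1. The sketch below does this with rational coefficients (a stand-in for real coefficients, chosen so the arithmetic is exact); the names poly_mul and remainder and the coefficient-list representation are assumptions of the sketch.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def remainder(f, p):
    """Remainder of f on division by the monic polynomial p."""
    r = [Fraction(c) for c in f]
    n = len(p) - 1
    for k in range(len(r) - 1, n - 1, -1):
        c = r[k]
        # subtract c * x^(k-n) * p(x), which removes the x^k term
        for i, m in enumerate(p):
            r[k - n + i] -= c * m
    r = r[:n]
    while r and r[-1] == 0:
        r.pop()
    return r

modulus = [1, 0, 1]     # x^2 + 1
a = [1, 1, 1]           # x^2 + x + 1
candidate = [0, -1]     # -x

# the class of the product should be [1]
product_class = remainder(poly_mul(a, candidate), modulus)

# the class of x^2 + x + 1 itself is simply [x]
reduced_a = remainder(a, modulus)
```

Since [x^2 + x + 1] = [x], the result agrees with the familiar fact that the inverse of i is −i.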
The following proposition is interesting. It was essentially proved above but to emphasize
it, here it is again.
Proposition 8.3.25 Suppose p (x) ∈ F [x] is irreducible and has degree n. Then every
element of G = F [x] / (p (x)) is of the form [0] or [r (x)] where the degree of r (x) is less
than n.
Proof: This follows right away from the Euclidean algorithm for polynomials. If k (x)
has degree larger than n − 1, then
k (x) = q (x) p (x) + r (x)
where r (x) is either equal to 0 or has degree less than n. Hence
[k (x)] = [r (x)] .
Example 8.3.26 In the situation of the above example, find [ax + b]^{−1} assuming a^2 + b^2 ≠
0. Note this includes all cases of interest thanks to the above proposition.
You can do it with partial fractions as above.
$$\frac{1}{(x^2 + 1)(ax + b)} = \frac{b - ax}{(a^2 + b^2)(x^2 + 1)} + \frac{a^2}{(a^2 + b^2)(ax + b)}$$
and so
$$1 = \frac{1}{a^2 + b^2} (b - ax)(ax + b) + \frac{a^2}{a^2 + b^2} \left( x^2 + 1 \right)$$
Thus
$$\frac{1}{a^2 + b^2} (b - ax)(ax + b) \sim 1$$
and so
$$[ax + b]^{-1} = \frac{[(b - ax)]}{a^2 + b^2} = \frac{b - a[x]}{a^2 + b^2}$$
You might find it interesting to recall that $(ai + b)^{-1} = \frac{b - ai}{a^2 + b^2}$.
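The formula of Example 8.3.26 can be compared directly with complex arithmetic. In the following sketch a class [u0 + u1 x] is stored as the pair (u0, u1), and the sample values a = 3, b = 4 as well as the function names are assumptions made for the illustration.

```python
from fractions import Fraction

def multiply(u, v):
    """Product of classes [u0 + u1 x][v0 + v1 x], using [x]^2 = [-1]."""
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + u[1] * v[0])

def inverse(a, b):
    """The class [ax + b]^(-1) = (b - a[x]) / (a^2 + b^2) as (constant, x-coefficient)."""
    norm = Fraction(a * a + b * b)
    if norm == 0:
        raise ZeroDivisionError("a^2 + b^2 must be nonzero")
    return (Fraction(b) / norm, Fraction(-a) / norm)

a, b = 3, 4
inv = inverse(a, b)

# [ax + b] is stored as (b, a); the product with its inverse should be [1]
check = multiply((Fraction(b), Fraction(a)), inv)

# the corresponding computation with complex numbers, (ai + b)^(-1)
complex_inverse = 1 / complex(b, a)
```

The pair inv has the same entries as the real and imaginary parts of complex_inverse, which reflects that [x] plays the role of i.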
8.3.3 The Algebraic Numbers
Each polynomial having coefficients in a field F has a splitting field. Consider the case of all
polynomials p (x) having coefficients in a field F ⊆ G and consider all roots which are also
in G. The theory of vector spaces is very useful in the study of these algebraic numbers.
Here is a definition.
Definition 8.3.27 The algebraic numbers A are those numbers which are in G and also
roots of some polynomial p (x) having coefficients in F. The minimal polynomial of a ∈ A
is defined to be the monic polynomial p (x) having smallest degree such that p (a) = 0.
Theorem 8.3.28 Let a ∈ A. Then there exists a unique monic irreducible polynomial p (x)
having coefficients in F such that p (a) = 0. This polynomial is the minimal polynomial.
Proof: Let p (x) be the monic polynomial having smallest degree such that p (a) = 0.
Then p (x) is irreducible because if not, there would exist a polynomial having smaller degree
which has a as a root. Now suppose q (x) is monic and irreducible such that q (a) = 0. Then by Lemma 8.3.3,
q (x) = p (x) l (x) + r (x)
where if r (x) ≠ 0, then it has smaller degree than p (x). But in this case, the equation
implies r (a) = 0 which contradicts the choice of p (x). Hence r (x) = 0 and so, since q (x)
is irreducible and both polynomials are monic, l (x) = 1 showing that p (x) = q (x).
Definition 8.3.29 For a an algebraic number, let deg (a) denote the degree of the minimal
polynomial of a.
Also, here is another definition.
Definition 8.3.30 Let a1 , · · · , am be in A. A polynomial in {a1 , · · · , am } will be an expression of the form
$$\sum_{k_1 \cdots k_m} a_{k_1 \cdots k_m} a_1^{k_1} \cdots a_m^{k_m}$$
where the $a_{k_1 \cdots k_m}$ are in F, each kj is a nonnegative integer, and all but finitely many of the
$a_{k_1 \cdots k_m}$ equal zero. The collection of such polynomials will be denoted by
F [a1 , · · · , am ] .
Now notice that for a an algebraic number, F [a] is a vector space with field of scalars F.
Similarly, for {a1 , · · · , am } algebraic numbers, F [a1 , · · · , am ] is a vector space with field of
scalars F. The following fundamental proposition is important.
Proposition 8.3.31 Let {a1 , · · · , am } be algebraic numbers. Then
$$\dim F[a_1, \cdots, a_m] \le \prod_{j=1}^{m} \deg(a_j)$$
and for an algebraic number a,
dim F [a] = deg (a)
Every element of F [a1 , · · · , am ] is in A and F [a1 , · · · , am ] is a field.
Proof: Let the minimal polynomial of a be
$$p(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0.$$
If q (a) ∈ F [a] , then
q (x) = p (x) l (x) + r (x)
where r (x) has degree less than the degree of p (x) if it is not zero. Since p (a) = 0, it follows q (a) = r (a). Thus F [a] is spanned
by
$$\left\{ 1, a, a^2, \cdots, a^{n-1} \right\}$$
Since p (x) has smallest degree of all polynomials which have a as a root, the above set is
also linearly independent. This proves the second claim.
Now consider the first claim. By definition, F [a1 , · · · , am ] is obtained from all linear
combinations of products $a_1^{k_1} a_2^{k_2} \cdots a_m^{k_m}$ where the ki are nonnegative integers. From the first
part, it suffices to consider only kj < deg (aj ). Therefore, there exists a spanning set for
F [a1 , · · · , am ] which has
$$\prod_{i=1}^{m} \deg(a_i)$$
entries. By Theorem 8.2.4 this proves the first claim.
Finally consider the last claim. Let g (a1 , · · · , am ) be a polynomial in {a1 , · · · , am } in
F [a1 , · · · , am ]. Since
$$\dim F[a_1, \cdots, a_m] \equiv p \le \prod_{j=1}^{m} \deg(a_j) < \infty,$$
it follows
$$1, \; g(a_1, \cdots, a_m), \; g(a_1, \cdots, a_m)^2, \; \cdots, \; g(a_1, \cdots, a_m)^p$$
are dependent. It follows g (a1 , · · · , am ) is the root of some polynomial having coefficients
in F. Thus everything in F [a1 , · · · , am ] is algebraic. Why is F [a1 , · · · , am ] a field? Let
g (a1 , · · · , am ) ≠ 0 be as just mentioned. Then it has a minimal polynomial,
$$p(x) = x^q + a_{q-1} x^{q-1} + \cdots + a_1 x + a_0$$
where the ai ∈ F. Then a0 ≠ 0 or else the polynomial would not be minimal. Therefore,
$$g(a_1, \cdots, a_m) \left( g(a_1, \cdots, a_m)^{q-1} + a_{q-1} g(a_1, \cdots, a_m)^{q-2} + \cdots + a_1 \right) = -a_0$$
and so the multiplicative inverse for g (a1 , · · · , am ) is
$$\frac{g(a_1, \cdots, a_m)^{q-1} + a_{q-1} g(a_1, \cdots, a_m)^{q-2} + \cdots + a_1}{-a_0} \in F[a_1, \cdots, a_m].$$
The other axioms of a field are obvious.
Now from this proposition, it is easy to obtain the following interesting result about the
algebraic numbers.
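The argument in the proof of Proposition 8.3.31 can be followed in a concrete case. The sketch below takes F to be the rational numbers and g = √2 + √3, writes each element of F [√2, √3] in terms of 1, √2, √3, √6 (a standard basis, assumed here rather than proved), looks for the first power of g which depends linearly on the earlier ones, and then forms the inverse by the formula from the proof. All names in the code are assumptions of this sketch.

```python
from fractions import Fraction

ZERO, UNIT = Fraction(0), Fraction(1)
ONE = (UNIT, ZERO, ZERO, ZERO)   # coordinates with respect to 1, sqrt2, sqrt3, sqrt6

def multiply(a, b):
    """Product in Q[sqrt2, sqrt3], using sqrt2*sqrt3 = sqrt6, sqrt2*sqrt6 = 2 sqrt3, etc."""
    return (
        a[0]*b[0] + 2*a[1]*b[1] + 3*a[2]*b[2] + 6*a[3]*b[3],
        a[0]*b[1] + a[1]*b[0] + 3*(a[2]*b[3] + a[3]*b[2]),
        a[0]*b[2] + a[2]*b[0] + 2*(a[1]*b[3] + a[3]*b[1]),
        a[0]*b[3] + a[3]*b[0] + a[1]*b[2] + a[2]*b[1],
    )

def solve(columns, target):
    """Solve sum_i c_i * columns[i] = target exactly; return c, or None if impossible.

    Assumes the given columns are linearly independent.
    """
    rows, n = len(target), len(columns)
    m = [[columns[j][i] for j in range(n)] + [target[i]] for i in range(rows)]
    pivot_row = 0
    for col in range(n):
        pr = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pr is None:
            return None
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        pv = m[pivot_row][col]
        m[pivot_row] = [x / pv for x in m[pivot_row]]
        for r in range(rows):
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
    if any(m[r][n] != 0 for r in range(pivot_row, rows)):
        return None   # target is not in the span of the columns
    return [m[i][n] for i in range(n)]

def minimal_polynomial(g):
    """Monic polynomial of least degree with g as a root, coefficients from degree 0 up."""
    powers = [ONE]
    while True:
        nxt = multiply(powers[-1], g)
        c = solve(powers, nxt)
        if c is not None:
            return [-x for x in c] + [UNIT]
        powers.append(nxt)

def inverse_from_minimal_polynomial(g, poly):
    """-(g^(q-1) + a_(q-1) g^(q-2) + ... + a_1) / a_0, the formula used in the proof."""
    acc, power = (ZERO,) * 4, ONE
    for coeff in poly[1:]:          # a_1, ..., a_(q-1), 1 paired with g^0, ..., g^(q-1)
        acc = tuple(x + coeff * y for x, y in zip(acc, power))
        power = multiply(power, g)
    return tuple(-x / poly[0] for x in acc)

g = (ZERO, UNIT, UNIT, ZERO)        # sqrt2 + sqrt3
poly = minimal_polynomial(g)
inv = inverse_from_minimal_polynomial(g, poly)
```

For this g the polynomial found is x^4 − 10x^2 + 1, whose degree 4 equals deg (√2) deg (√3), and the inverse comes out as √3 − √2.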
Theorem 8.3.32 The algebraic numbers A, those roots of polynomials in F [x] which are
in G, are a field.
Proof: By definition, each a ∈ A has a minimal polynomial. Let a ≠ 0 be an algebraic
number and let p (x) be its minimal polynomial. Then p (x) is of the form
\[ x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \]
where a0 ≠ 0. Otherwise p (x) would not have minimal degree. Then plugging in a yields
\[ a \, \frac{\left( a^{n-1} + a_{n-1} a^{n-2} + \cdots + a_1 \right) (-1)}{a_0} = 1 \]
and so
\[ a^{-1} = \frac{\left( a^{n-1} + a_{n-1} a^{n-2} + \cdots + a_1 \right) (-1)}{a_0} \in F [a] . \]
By the proposition, every element of F [a]
is in A and this shows that for every nonzero element of A, its inverse is also in A. What
about products and sums of things in A? Are they still in A? Yes. If a, b ∈ A, then both
a + b and ab ∈ F [a, b] and from the proposition, each element of F [a, b] is in A.
A typical example of what is of interest here is when the field F of scalars is Q, the
rational numbers and the field G is R. However, you can certainly conceive of many other
examples by considering the integers mod a prime, for example (See Problem 34 on Page
229 for example.) or any of the fields which occur as field extensions in the above.
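The inverse formula in the proof of Theorem 8.3.32 can be checked on a concrete case. The following is a minimal sketch, assuming F = Q and a equal to the real cube root of 2, whose minimal polynomial is x^3 − 2. Elements of Q [a] are stored as coefficient lists with respect to 1, a, a^2. The names `MINPOLY`, `multiply_mod_p`, and `inverse_of_root` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

# Minimal polynomial p(x) = x^3 + c2*x^2 + c1*x + c0 of a = 2**(1/3) over Q,
# stored as [c0, c1, c2]; the leading coefficient 1 is implicit.
MINPOLY = [Fraction(-2), Fraction(0), Fraction(0)]


def multiply_mod_p(u, v, minpoly=MINPOLY):
    """Multiply two elements of Q[a], given as coefficient lists in 1, a, ..., a^(n-1)."""
    n = len(minpoly)
    prod = [Fraction(0)] * (2 * n - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] += ui * vj
    # Reduce with a^n = -(c_{n-1} a^{n-1} + ... + c0), highest power first.
    for k in range(2 * n - 2, n - 1, -1):
        top = prod[k]
        prod[k] = Fraction(0)
        for i in range(n):
            prod[k - n + i] -= top * minpoly[i]
    return prod[:n]


def inverse_of_root(minpoly=MINPOLY):
    """Return a^{-1} = -(a^{n-1} + c_{n-1} a^{n-2} + ... + c1) / c0."""
    c0 = minpoly[0]
    numerator = minpoly[1:] + [Fraction(1)]  # coefficients of 1, a, ..., a^{n-1}
    return [-coef / c0 for coef in numerator]


a = [Fraction(0), Fraction(1), Fraction(0)]  # the element a itself
a_inv = inverse_of_root()
print(a_inv)                     # coefficients of a^{-1} in the basis 1, a, a^2
print(multiply_mod_p(a, a_inv))  # coefficients of the product a * a^{-1}
```

Here the formula gives a^{-1} = a^2 / 2, and multiplying back by a and reducing with a^3 = 2 returns the coefficient list of the number 1. Exact rational arithmetic is used so that the check does not depend on rounding.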
There is a very interesting thing about F [a1 · · · an ] in the case where F is infinite which
says that there exists a single algebraic γ such that F [a1 · · · an ] = F [γ]. In other words,
every field extension of this sort is a simple field extension. I found this fact in an early
version of [5].
Proposition 8.3.33 There exists γ such that F [a1 · · · an ] = F [γ].
Proof: To begin with, consider F [α, β]. Let γ = α + λβ. Then by Proposition 8.3.31 γ
is an algebraic number and it is also clear
F [γ] ⊆ F [α, β]
I need to show the other inclusion. This will be done for a suitable choice of λ. To do this,
it suffices to verify that both α and β are in F [γ].
Let the minimal polynomials of α and β be f (x) and g (x) respectively. Let the distinct
roots of f (x) and g (x) be {α1 , α2 , · · · , αn } and {β 1 , β 2 , · · · , β m } respectively. These roots
are in a field which contains splitting fields of both f (x) and g (x). Let α = α1 and β = β 1 .
Now define
h (x) ≡ f (α + λβ − λx) ≡ f (γ − λx)
so that h (β) = f (α) = 0. It follows (x − β) divides both h (x) and g (x). If (x − η) is a
different linear factor of both g (x) and h (x) then it must be x − β j for some
j > 1 because these are the only factors of g (x) . Therefore, this would require
\[ 0 = h (\beta_j) = f (\alpha_1 + \lambda \beta_1 - \lambda \beta_j) \]
and so it would be the case that α1 + λβ 1 − λβ j = αk for some k. Hence
\[ \lambda = \frac{\alpha_k - \alpha_1}{\beta_1 - \beta_j} . \]
Now there are finitely many quotients of the above form and if λ is chosen to not be any of
them, then the above cannot happen and so in this case, the only linear factor of both g (x)
and h (x) will be (x − β). Choose such a λ.
Let ϕ (x) be the minimal polynomial of β with respect to the field F [γ]. Then this
minimal polynomial must divide both h (x) and g (x) because h (β) = g (β) = 0. However,
the only factor these two have in common is x − β and so ϕ (x) = x − β which requires
β ∈ F [γ] . Now also α = γ − λβ and so α ∈ F [γ] also. Therefore, both α, β ∈ F [γ] which
forces F [α, β] ⊆ F [γ] . This proves the proposition in the case that n = 2. The general result
follows right away by observing that
F [a1 · · · an ] = F [a1 · · · an−1 ] [an ]
and using induction.
When you have a field F, F (a) denotes the smallest field which contains both F and a.
When a is algebraic over F, it follows that F (a) = F [a] . The latter is easier to think about
because it just involves polynomials.
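A small numerical sanity check of Proposition 8.3.33 may help. The sketch below assumes F = Q, α = √2, β = √3 and the choice λ = 1, so γ = √2 + √3. These specific numbers are an illustration and do not come from the text. Expanding γ^3 = 11√2 + 9√3 by hand suggests the two rational polynomial expressions used in the code, and the code confirms them in floating point only, which is evidence rather than a proof.

```python
import math

alpha = math.sqrt(2)
beta = math.sqrt(3)
gamma = alpha + beta  # lambda = 1 in gamma = alpha + lambda * beta

# Polynomial expressions in gamma with rational coefficients,
# obtained by hand from gamma^3 = 11*sqrt(2) + 9*sqrt(3).
alpha_from_gamma = (gamma**3 - 9 * gamma) / 2
beta_from_gamma = (11 * gamma - gamma**3) / 2

print(math.isclose(alpha_from_gamma, alpha))
print(math.isclose(beta_from_gamma, beta))
```

Both α and β are then polynomials in γ with rational coefficients, which is the statement F [α, β] ⊆ F [γ] for this example.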
8.3.4 The Lindemann Weierstrass Theorem And Vector Spaces
As another application of the abstract concept of vector spaces, there is an amazing theorem
due to Weierstrass and Lindemann.
Theorem 8.3.34 Suppose a1 , · · · , an are nonzero algebraic numbers, roots of a polynomial with
rational coefficients, and suppose α1 , · · · , αn are distinct algebraic numbers. Then
\[ \sum_{i=1}^{n} a_i e^{\alpha_i} \neq 0 . \]
In other words, the $\{ e^{\alpha_1}, \cdots, e^{\alpha_n} \}$ are independent as vectors with field of scalars equal to
the algebraic numbers.
There is a proof of this in the appendix. It is long and hard but only depends on
elementary considerations other than some algebra involving symmetric polynomials. See
Theorem G.3.5.
A number is transcendental, as opposed to algebraic, if it is not a root of a polynomial
which has integer (rational) coefficients. Most numbers are this way but it is hard to verify
that specific numbers are transcendental. That π is transcendental follows from
\[ e^0 + e^{i\pi} = 0 . \]
By the above theorem, this could not happen if π were algebraic because then iπ would also
be algebraic. Recall these algebraic numbers form a field and i is clearly algebraic, being
a root of x^2 + 1. This fact about π was first proved by Lindemann in 1882 and then the
general theorem above was proved by Weierstrass in 1885. This fact that π is transcendental
solved an old problem called squaring the circle which was to construct a square with the
same area as a circle using a straight edge and compass. It can be shown that the fact π is
transcendental implies this problem is impossible.1
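8.4 Exercises
1. Let H denote
\[ \operatorname{span} \left( \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right) . \]
Find the dimension of H and determine a basis.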
1 Gilbert, the librettist of the Savoy operas, may have heard about this great achievement. In Princess
Ida which opened in 1884 he has the following lines. “As for fashion they forswear it, so they say - so they
say; and the circle - they will square it some fine day - some fine day.” Of course it had been proved impossible
to do this a couple of years before.
2. Let M = {u = (u1 , u2 , u3 , u4 ) ∈ R^4 : u3 = u1 = 0}. Is M a subspace? Explain.
3. Let M = {u = (u1 , u2 , u3 , u4 ) ∈ R^4 : u3 ≥ u1 }. Is M a subspace? Explain.
4. Let w ∈ R^4 and let M = {u = (u1 , u2 , u3 , u4 ) ∈ R^4 : w · u = 0}. Is M a subspace?
Explain.
5. Let M = {u = (u1 , u2 , u3 , u4 ) ∈ R^4 : ui ≥ 0 for each i = 1, 2, 3, 4}. Is M a subspace?
Explain.
6. Let w, w1 be given vectors in R^4 and define
M = {u = (u1 , u2 , u3 , u4 ) ∈ R^4 : w · u = 0 and w1 · u = 0}.
Is M a subspace? Explain.
7. Let M = {u = (u1 , u2 , u3 , u4 ) ∈ R^4 : |u1 | ≤ 4}. Is M a subspace? Explain.
8. Let M = {u = (u1 , u2 , u3 , u4 ) ∈ R^4 : sin (u1 ) = 1}. Is M a subspace? Explain.
9. Suppose {x1 , · · · , xk } is a set of vectors from Fn . Show that 0 is in span (x1 , · · · , xk ) .
10. Consider the vectors of the form
\[ \left\{ \begin{pmatrix} 2t + 3s \\ s - t \\ t + s \end{pmatrix} : s, t \in \mathbb{R} \right\} . \]
Is this set of vectors a subspace of R3 ? If so, explain why, give a basis for the subspace
and find its dimension.
11. Consider the vectors of the form
\[ \left\{ \begin{pmatrix} 2t + 3s + u \\ s - t \\ t + s \\ u \end{pmatrix} : s, t, u \in \mathbb{R} \right\} . \]
Is this set of vectors a subspace of R4 ? If so, explain why, give a basis for the subspace
and find its dimension.
12. Consider the vectors of the form
\[ \left\{ \begin{pmatrix} 2t + u + 1 \\ t + 3u \\ t + s + v \\ u \end{pmatrix} : s, t, u, v \in \mathbb{R} \right\} . \]
Is this set of vectors a subspace of R4 ? If so, explain why, give a basis for the subspace
and find its dimension.
13. Let V denote the set of functions defined on [0, 1]. Vector addition is defined as
(f + g) (x) ≡ f (x) + g (x) and scalar multiplication is defined as (αf ) (x) ≡ α (f (x)).
Verify V is a vector space. What is its dimension, finite or infinite? Justify your
answer.
14. Let V denote the set of polynomial functions defined on [0, 1]. Vector addition is
defined as (f + g) (x) ≡ f (x) + g (x) and scalar multiplication is defined as (αf ) (x) ≡
α (f (x)). Verify V is a vector space. What is its dimension, finite or infinite? Justify
your answer.
15. Let V be the set of polynomials defined on R having degree no more than 4. Give a
basis for this vector space.
16. Let the vectors be of the form a + b√2 where a, b are rational numbers and let the
field of scalars be F = Q, the rational numbers. Show directly this is a vector space.
What is its dimension? What is a basis for this vector space?
17. Let V be a vector space with field of scalars F and suppose {v1 , · · · , vn } is a basis for
V . Now let W also be a vector space with field of scalars F. Let L : {v1 , · · · , vn } →
W be a function such that Lvj = wj . Explain how L can be extended to a linear
transformation mapping V to W in a unique way.
18. If you have 5 vectors in F5 and the vectors are linearly independent, can it always be
concluded they span F5 ? Explain.
19. If you have 6 vectors in F5 , is it possible they are linearly independent? Explain.
20. Suppose V, W are subspaces of Fn . Show V ∩ W defined to be all vectors which are in
both V and W is a subspace also.
21. Suppose V and W both have dimension equal to 7 and they are subspaces of a vector
space of dimension 10. What are the possibilities for the dimension of V ∩ W ? Hint:
Remember that a linearly independent set can be extended to form a basis.
22. Suppose V has dimension p and W has dimension q and they are each contained in
a subspace, U which has dimension equal to n where n > max (p, q) . What are the
possibilities for the dimension of V ∩ W ? Hint: Remember that a linearly independent
set can be extended to form a basis.
23. If b ̸= 0, can the solution set of Ax = b be a plane through the origin? Explain.
24. Suppose a system of equations has fewer equations than variables and you have found
a solution to this system of equations. Is it possible that your solution is the only one?
Explain.
25. Suppose a system of linear equations has a 2×4 augmented matrix and the last column
is a pivot column. Could the system of linear equations be consistent? Explain.
26. Suppose the coefficient matrix of a system of n equations with n variables has the
property that every column is a pivot column. Does it follow that the system of
equations must have a solution? If so, must the solution be unique? Explain.
27. Suppose there is a unique solution to a system of linear equations. What must be true
of the pivot columns in the augmented matrix?
28. State whether each of the following sets of data are possible for the matrix equation
Ax = b. If possible, describe the solution set. That is, tell whether there exists a
unique solution, no solution, or infinitely many solutions.
(a) A is a 5 × 6 matrix, rank (A) = 4 and rank (A|b) = 4. Hint: This says b is in
the span of four of the columns. Thus the columns are not independent.
(b) A is a 3 × 4 matrix, rank (A) = 3 and rank (A|b) = 2.
(c) A is a 4 × 2 matrix, rank (A) = 4 and rank (A|b) = 4. Hint: This says b is in
the span of the columns and the columns must be independent.
(d) A is a 5 × 5 matrix, rank (A) = 4 and rank (A|b) = 5. Hint: This says b is not
in the span of the columns.
(e) A is a 4 × 2 matrix, rank (A) = 2 and rank (A|b) = 2.
29. Suppose A is an m × n matrix in which m ≤ n. Suppose also that the rank of A equals
m. Show that A maps Fn onto Fm . Hint: The vectors e1 , · · · , em occur as columns
in the row reduced echelon form for A.
30. Suppose A is an m × n matrix in which m ≥ n. Suppose also that the rank of A equals
n. Show that A is one to one. Hint: If not, there exists a vector, x such that Ax = 0,
and this implies at least one column of A is a linear combination of the others. Show
this would require the column rank to be less than n.
31. Explain why an n × n matrix A is both one to one and onto if and only if its rank is
n.
32. If you have not done this already, here it is again. It is a very important result.
Suppose A is an m × n matrix and B is an n × p matrix. Show that
dim (ker (AB)) ≤ dim (ker (A)) + dim (ker (B)) .
Hint: Consider the subspace, B (Fp ) ∩ ker (A) and suppose a basis for this subspace
is {w1 , · · · , wk } . Now suppose {u1 , · · · , ur } is a basis for ker (B) . Let {z1 , · · · , zk }
be such that Bzi = wi and argue that
ker (AB) ⊆ span (u1 , · · · , ur , z1 , · · · , zk ) .
Here is how you do this. Suppose ABx = 0. Then Bx ∈ ker (A) ∩ B (Fp ) and so
$Bx = \sum_{i=1}^{k} c_i w_i = \sum_{i=1}^{k} c_i B z_i$ for some scalars ci , showing that
\[ x - \sum_{i=1}^{k} c_i z_i \in \ker (B) . \]
33. Recall that every positive integer can be factored into a product of primes in a unique
way. Show there must be infinitely many primes. Hint: Show that if you have any
finite set of primes and you multiply them and then add 1, the result cannot be
divisible by any of the primes in your finite set. This idea in the hint is due to Euclid
who lived about 300 B.C.
34. There are lots of fields. This will give an example of a finite field. Let Z denote the set
of integers. Thus Z = {· · · , −3, −2, −1, 0, 1, 2, 3, · · · }. Also let p be a prime number.
We will say that two integers, a, b are equivalent and write a ∼ b if a − b is divisible
by p. Thus they are equivalent if a − b = px for some integer x. First show that
a ∼ a. Next show that if a ∼ b then b ∼ a. Finally show that if a ∼ b and b ∼ c
then a ∼ c. For a an integer, denote by [a] the set of all integers which are equivalent
to a, the equivalence class of a. Show first that it suffices to consider only [a] for
a = 0, 1, 2, · · · , p − 1 and that for 0 ≤ a < b ≤ p − 1, [a] ≠ [b]. That is, [a] = [r] where
r ∈ {0, 1, 2, · · · , p − 1}. Thus there are exactly p of these equivalence classes. Hint:
Recall the Euclidean algorithm. For a > 0, a = mp + r where r < p. Next define the
following operations.
[a] + [b] ≡ [a + b]
[a] [b] ≡ [ab]
Show these operations are well defined. That is, if [a] = [a′ ] and [b] = [b′ ] , then
[a] + [b] = [a′ ] + [b′ ] with a similar conclusion holding for multiplication. Thus for
addition you need to verify [a + b] = [a′ + b′ ] and for multiplication you need to verify
[ab] = [a′ b′ ]. For example, if p = 5 you have [3] = [8] and [2] = [7] . Is [2 × 3] = [8 × 7]?
Is [2 + 3] = [8 + 7]? Clearly so in this example because when you subtract, the result
is divisible by 5. So why is this so in general? Now verify that {[0] , [1] , · · · , [p − 1]}
with these operations is a Field. This is called the integers modulo a prime and is
written Zp . Since there are infinitely many primes p, it follows there are infinitely
many of these finite fields. Hint: Most of the axioms are easy once you have shown
the operations are well defined. The only two which are tricky are the ones which
give the existence of the additive inverse and the multiplicative inverse. Of these, the
first is not hard. − [x] = [−x]. Since p is prime, for any integer k not divisible by p there exist integers x, y such that
1 = px + ky and so 1 − ky = px which says 1 ∼ ky and so [1] = [ky] . Now you finish the
argument. What is the multiplicative identity in this collection of equivalence classes?
Of course you could now consider field extensions based on these fields.
35. Suppose the field of scalars is Z2 described above. Show that
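\[ \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} - \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
Thus the identity is a commutator. Compare this with Problem 50 on Page 206.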
36. Suppose V is a vector space with field of scalars F. Let T ∈ L (V, W ) , the space of
linear transformations mapping V onto W where W is another vector space. Define
an equivalence relation on V as follows. v ∼ w means v − w ∈ ker (T ) . Recall that
ker (T ) ≡ {v : T v = 0}. Show this is an equivalence relation. Now for [v] an equivalence class define T ′ [v] ≡ T v. Show this is well defined. Also show that with the
operations
[v] + [w] ≡ [v + w]
α [v] ≡ [αv]
this set of equivalence classes, denoted by V / ker (T ) is a vector space. Show next that
T ′ : V / ker (T ) → W is one to one, linear, and onto. This new vector space, V / ker (T )
is called a quotient space. Show its dimension equals the difference between the
dimension of V and the dimension of ker (T ).
37. Let V be an n dimensional vector space and let W be a subspace. Generalize the
above problem to define and give properties of V /W . What is its dimension? What
is a basis?
38. If F and G are two fields and F ⊆ G, can you consider G as a vector space with field
of scalars F? Explain.
39. Let A denote the real roots of polynomials in Q [x] . Show A can be considered a
vector space with field of scalars Q. What is the dimension of this vector space, finite
or infinite?
40. As mentioned, for distinct algebraic numbers αi , the complex numbers $\{ e^{\alpha_i} \}_{i=1}^{n}$ are
linearly independent over the field of scalars A where A denotes the algebraic numbers,
those which are roots of a polynomial having integer (rational) coefficients. What is
the dimension of the vector space C with field of scalars A, finite or infinite? If the
field of scalars were C instead of A, would this change? What if the field of scalars
were R?
41. Suppose F is a countable field and let A be the algebraic numbers, those numbers in
G which are roots of a polynomial in F [x]. Show A is also countable.
42. This problem is on partial fractions. Suppose you have
\[ R (x) = \frac{p (x)}{q_1 (x) \cdots q_m (x)} , \quad \text{degree of } p (x) < \text{degree of denominator,} \]
where the polynomials qi (x) are relatively prime and all the polynomials p (x) and
qi (x) have coefficients in a field of scalars F. Thus there exist polynomials ai (x)
having coefficients in F such that
\[ 1 = \sum_{i=1}^{m} a_i (x) q_i (x) . \]
Explain why
\[ R (x) = \frac{p (x) \sum_{i=1}^{m} a_i (x) q_i (x)}{q_1 (x) \cdots q_m (x)} = \sum_{i=1}^{m} \frac{a_i (x) p (x)}{\prod_{j \neq i} q_j (x)} . \]
Now continue doing this on each term in the above sum till finally you obtain an
expression of the form
\[ \sum_{i=1}^{m} \frac{b_i (x)}{q_i (x)} . \]
Using the Euclidean algorithm for polynomials, explain why the above is of the form
\[ M (x) + \sum_{i=1}^{m} \frac{r_i (x)}{q_i (x)} \]
where the degree of each ri (x) is less than the degree of qi (x) and M (x) is a polynomial. Now argue that M (x) = 0. From this explain why the usual partial fractions
expansion of calculus must be true. You can use the fact that every polynomial having
real coefficients factors into a product of irreducible quadratic polynomials and linear
polynomials having real coefficients. This follows from the fundamental theorem of
algebra in the appendix.
43. Suppose {f1 , · · · , fn } is an independent set of smooth functions defined on some interval (a, b). Now let A be an invertible n × n matrix. Define new functions {g1 , · · · , gn }
as follows.
\[ \begin{pmatrix} g_1 \\ \vdots \\ g_n \end{pmatrix} = A \begin{pmatrix} f_1 \\ \vdots \\ f_n \end{pmatrix} \]
Is it the case that {g1 , · · · , gn } is also independent? Explain why.
Chapter 9
Linear Transformations
9.1 Matrix Multiplication As A Linear Transformation
Definition 9.1.1 Let V and W be two finite dimensional vector spaces. A function, L
which maps V to W is called a linear transformation and written L ∈ L (V, W ) if for all
scalars α and β, and vectors v,w,
L (αv+βw) = αL (v) + βL (w) .
An example of a linear transformation is familiar matrix multiplication. Let A = (aij )
be an m × n matrix. Then an example of a linear transformation L : Fn → Fm is given by
\[ (Lv)_i \equiv \sum_{j=1}^{n} a_{ij} v_j . \]
Here
\[ v \equiv \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \in F^n . \]
9.2 L (V, W ) As A Vector Space
Definition 9.2.1 Given L, M ∈ L (V, W ) define a new element of L (V, W ) , denoted by
L + M according to the rule1
(L + M ) v ≡ Lv + M v.
For α a scalar and L ∈ L (V, W ) , define αL ∈ L (V, W ) by
αL (v) ≡ α (Lv) .
You should verify that all the axioms of a vector space hold for L (V, W ) with the
above definitions of vector addition and scalar multiplication. What about the dimension
of L (V, W )?
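The operations of Definition 9.2.1 can be tried out on maps given by matrix multiplication. This is a minimal sketch, assuming V = Q^3 and W = Q^2 with two arbitrarily chosen matrices. The helper names `mat_vec`, `add_maps`, and `scale_map` are hypothetical and are not notation from the text.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Return Av, that is (Av)_i = sum_j A[i][j] * v[j]."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def add_vec(x, y):
    return [xi + yi for xi, yi in zip(x, y)]


def scale_vec(c, x):
    return [c * xi for xi in x]


def add_maps(L, M):
    """The sum L + M, defined pointwise by (L + M) v = L v + M v."""
    return lambda v: add_vec(L(v), M(v))


def scale_map(c, L):
    """The scalar multiple cL, defined by (cL) v = c (L v)."""
    return lambda v: scale_vec(c, L(v))


A = [[1, 2, 0], [0, 1, 3]]  # hypothetical 2 x 3 matrices
B = [[2, 0, 1], [1, 1, 1]]
L = lambda v: mat_vec(A, v)
M = lambda v: mat_vec(B, v)

v, w = [1, 0, 2], [3, 1, -1]
alpha, beta = Fraction(1, 2), Fraction(-3)
S = add_maps(L, scale_map(alpha, M))  # the element L + alpha M of L(V, W)

# Linearity of S: S(alpha v + beta w) == alpha S(v) + beta S(w).
lhs = S(add_vec(scale_vec(alpha, v), scale_vec(beta, w)))
rhs = add_vec(scale_vec(alpha, S(v)), scale_vec(beta, S(w)))
print(lhs == rhs)
```

The check passes for this choice of vectors and scalars, which illustrates, but of course does not prove, that L + αM is again in L (V, W ).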
Before answering this question, here is a useful lemma. It gives a way to define linear
transformations and a way to tell when two of them are equal.
1 Note that this is the standard way of defining the sum of two functions.
Lemma 9.2.2 Let V and W be vector spaces and suppose {v1 , · · · , vn } is a basis for V.
Then if L : V → W is given by Lvk = wk ∈ W and
\[ L \left( \sum_{k=1}^{n} a_k v_k \right) \equiv \sum_{k=1}^{n} a_k L v_k = \sum_{k=1}^{n} a_k w_k \]
then L is well defined and is in L (V, W ) . Also, if L, M are two linear transformations such
that Lvk = M vk for all k, then M = L.
Proof: L is well defined on V because, since {v1 , · · · , vn } is a basis, there is exactly one
way to write a given vector of V as a linear combination. Next, observe that L is obviously
linear from the definition. If L, M are equal on the basis, then if $\sum_{k=1}^{n} a_k v_k$ is an arbitrary
vector of V,
\[ L \left( \sum_{k=1}^{n} a_k v_k \right) = \sum_{k=1}^{n} a_k L v_k = \sum_{k=1}^{n} a_k M v_k = M \left( \sum_{k=1}^{n} a_k v_k \right) \]
and so L = M because they give the same result for every vector in V .
The message is that when you define a linear transformation, it suffices to tell what it
does to a basis.
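Lemma 9.2.2 can be illustrated in a small case. The sketch below assumes V = Q^2 with the basis v1 = (1, 1), v2 = (1, −1) and prescribed images w1, w2 in Q^3. All of these values are made up for the example. The map is evaluated by first finding the unique coordinates of x with respect to the basis and then forming the same combination of the images.

```python
from fractions import Fraction

# Hypothetical basis of Q^2 and prescribed images in Q^3.
v1, v2 = (Fraction(1), Fraction(1)), (Fraction(1), Fraction(-1))
w1, w2 = (1, 0, 2), (0, 3, 1)


def coordinates(x):
    """Solve x = a1*v1 + a2*v2 by Cramer's rule; unique since {v1, v2} is a basis."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    a1 = (x[0] * v2[1] - v2[0] * x[1]) / det
    a2 = (v1[0] * x[1] - x[0] * v1[1]) / det
    return a1, a2


def L(x):
    """L(a1*v1 + a2*v2) = a1*w1 + a2*w2, as in Lemma 9.2.2."""
    a1, a2 = coordinates(x)
    return tuple(a1 * p + a2 * q for p, q in zip(w1, w2))


print(L(v1) == w1, L(v2) == w2)
print(L((3, 1)))  # (3, 1) = 2*v1 + 1*v2, so this is 2*w1 + w2
```

Because the coordinates are unique, L is well defined, and any other linear map agreeing with L on v1 and v2 must produce the same values.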
Theorem 9.2.3 Let V and W be finite dimensional linear spaces of dimension n and m
respectively. Then dim (L (V, W )) = mn.
Proof: Let two sets of bases be
{v1 , · · · , vn } and {w1 , · · · , wm }
for V and W respectively. Using Lemma 9.2.2, let wi vk ∈ L (V, W ) be the linear transformation defined on the basis, {v1 , · · · , vn }, by
\[ w_i v_k (v_j) \equiv w_i \delta_{jk} \]
where δ jk = 1 if j = k and 0 if j ≠ k. I will show that every L ∈ L (V, W ) is a linear combination
of these special linear transformations called dyadics.
Then let L ∈ L (V, W ). Since {w1 , · · · , wm } is a basis, there exist constants, djr such
that
\[ L v_r = \sum_{j=1}^{m} d_{jr} w_j . \]
Now consider the following sum of dyadics.
\[ \sum_{j=1}^{m} \sum_{i=1}^{n} d_{ji} w_j v_i \]
Apply this to vr . This yields
\[ \sum_{j=1}^{m} \sum_{i=1}^{n} d_{ji} w_j v_i (v_r) = \sum_{j=1}^{m} \sum_{i=1}^{n} d_{ji} w_j \delta_{ir} = \sum_{j=1}^{m} d_{jr} w_j = L v_r . \]
Therefore, $L = \sum_{j=1}^{m} \sum_{i=1}^{n} d_{ji} w_j v_i$ showing the span of the dyadics is all of L (V, W ) .
Now consider whether these dyadics form a linearly independent set. Suppose
\[ \sum_{i,k} d_{ik} w_i v_k = 0 . \]
Are all the scalars dik equal to 0?
\[ 0 = \sum_{i,k} d_{ik} w_i v_k (v_l) = \sum_{i=1}^{m} d_{il} w_i \]
and so, since {w1 , · · · , wm } is a basis, dil = 0 for each i = 1, · · · , m. Since l is arbitrary,
this shows dil = 0 for all i and l. Thus these linear transformations form a basis and this
shows that the dimension of L (V, W ) is mn as claimed because there are m choices for the
wi and n choices for the vj .
9.3 The Matrix Of A Linear Transformation
Definition 9.3.1 In Theorem 9.2.3, the matrix of the linear transformation L ∈ L (V, W )
with respect to the ordered bases β ≡ {v1 , · · · , vn } for V and γ ≡ {w1 , · · · , wm } for W is
defined to be [L] where [L]ij = dij . Thus this matrix is defined by $L = \sum_{i,j} [L]_{ij} w_i v_j$. When
it is desired to feature the bases β, γ, this matrix will be denoted as [L]γβ . When there is
only one basis β, this is denoted as [L]β .
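The dyadic expansion behind Theorem 9.2.3 and Definition 9.3.1 can be sketched in code. This is a minimal illustration, assuming V = F^3 and W = F^2 with their standard bases and integer entries, so that the dyadic wj vi sends a vector x to x_i times wj. The coefficient array `D` and the helper names are hypothetical.

```python
def dyadic(w, v_index):
    """The map w v_k for the standard basis of F^n: sends e_k to w and every other e_j to 0."""
    return lambda x: [x[v_index] * wi for wi in w]


def standard_basis(size):
    return [[1 if i == j else 0 for i in range(size)] for j in range(size)]


m, n = 2, 3
D = [[1, 2, 0], [0, 1, 3]]  # hypothetical coefficients d_ji, an m x n array
W = standard_basis(m)


def L(x):
    """L = sum_j sum_i d_ji w_j v_i, a linear combination of the mn dyadics."""
    total = [0] * m
    for j in range(m):
        for i in range(n):
            term = dyadic(W[j], i)(x)
            total = [t + D[j][i] * c for t, c in zip(total, term)]
    return total


# Applying L to e_r returns column r of D, that is L v_r = sum_j d_jr w_j.
for r, e_r in enumerate(standard_basis(n)):
    print(L(e_r), [D[j][r] for j in range(m)])
print(m * n)  # the number of dyadics, which is the dimension of L(V, W)
```

In this setting the array D is exactly the matrix [L] of Definition 9.3.1, and there is one dyadic for each of its mn entries.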
If V is an n dimensional vector space and β = {v1 , · · · , vn } is a basis for V, there exists
a linear map
\[ q_\beta : F^n \to V \]
defined as
\[ q_\beta (a) \equiv \sum_{i=1}^{n} a_i v_i \]
where
\[ a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \sum_{i=1}^{n} a_i e_i , \]
for ei the standard basis vectors for Fn consisting of $\begin{pmatrix} 0 & \cdots & 1 & \cdots & 0 \end{pmatrix}^T$. Thus the 1
is in the ith position and the other entries are 0.
It is clear that qβ defined in this way is one to one, onto, and linear. For v ∈ V, $q_\beta^{-1} (v)$
is a vector in Fn called the component vector of v with respect to the basis {v1 , · · · , vn }.
Proposition 9.3.2 The matrix of a linear transformation with respect to ordered bases β, γ
as described above is characterized by the requirement that multiplication of the components
of v by [L]γβ gives the components of Lv.
Proof: This happens because by definition, if $v = \sum_i x_i v_i$, then
\[ Lv = \sum_i x_i L v_i \equiv \sum_i \sum_j [L]_{ji} x_i w_j = \sum_j \left( \sum_i [L]_{ji} x_i \right) w_j \]
and so the j th component of Lv is $\sum_i [L]_{ji} x_i$, the j th component of the matrix times the
component vector of v. Could there be some other matrix which will do this? No, because if
such a matrix is M, then for any x, it follows from what was just shown that [L] x = M x.
Hence [L] = M .
The above proposition shows that the following diagram determines the matrix of a
linear transformation. Here qβ and qγ are the maps defined above with reference to the
ordered bases, {v1 , · · · , vn } and {w1 , · · · , wm } respectively.
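\[ \begin{array}{ccccc} \beta = \{ v_1, \cdots, v_n \} & V & \xrightarrow{\; L \;} & W & \{ w_1, \cdots, w_m \} = \gamma \\ & q_\beta \uparrow & \circ & \uparrow q_\gamma & \\ & F^n & \xrightarrow{[L]_{\gamma\beta}} & F^m & \end{array} \tag{9.1} \]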
In terms of this diagram, the matrix [L]γβ is the matrix chosen to make the diagram
“commute”. It may help to write the description of [L]γβ in the form
\[ \begin{pmatrix} L v_1 & \cdots & L v_n \end{pmatrix} = \begin{pmatrix} w_1 & \cdots & w_m \end{pmatrix} [L]_{\gamma\beta} \tag{9.2} \]
with the understanding that you do the multiplications in a formal manner just as you
would if everything were numbers. If this helps, use it. If it does not help, ignore it.
Example 9.3.3 Let
V ≡ { polynomials of degree 3 or less},
W ≡ { polynomials of degree 2 or less},
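and L ≡ D where D is the differentiation operator. A basis for V is β = {1, x, x^2 , x^3 } and
a basis for W is γ = {1, x, x^2 }.
What is the matrix of this linear transformation with respect to this basis? Using 9.2,
\[ \begin{pmatrix} 0 & 1 & 2x & 3x^2 \end{pmatrix} = \begin{pmatrix} 1 & x & x^2 \end{pmatrix} [D]_{\gamma\beta} . \]
It follows from this that the first column of [D]γβ is
\[ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]
The next three columns of [D]γβ are
\[ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} , \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} , \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix} \]
and so
\[ [D]_{\gamma\beta} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} . \]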
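The matrix found in Example 9.3.3 can be tested on a sample polynomial. This is a small sketch in which the polynomial p and the helper names `mat_vec` and `derivative_components` are chosen only for illustration. A polynomial is represented by its component vector with respect to β, and the result is read with respect to γ.

```python
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]  # the matrix of differentiation from Example 9.3.3


def mat_vec(A, v):
    """Return Av, that is (Av)_i = sum_j A[i][j] * v[j]."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def derivative_components(coeffs):
    """Differentiate c0 + c1 x + c2 x^2 + ... directly, term by term."""
    return [k * c for k, c in enumerate(coeffs)][1:]


# p(x) = 5 - x + 4x^2 + 2x^3 has components (5, -1, 4, 2) with respect to beta.
p = [5, -1, 4, 2]
print(mat_vec(D, p))             # components of p'(x) with respect to gamma
print(derivative_components(p))  # the same components computed term by term
```

Both computations give the components of p′(x) = −1 + 8x + 6x^2, in agreement with Proposition 9.3.2: multiplying the components of p by the matrix gives the components of Dp.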
Now consider the important case where V = Fn , W = Fm , and the basis chosen is the
standard basis of vectors ei described above.
β = {e1 , · · · , en } , γ = {e1 , · · · , em }
Let L be a linear transformation from Fn to Fm and let A be the matrix of the transformation
with respect to these bases. In this case the coordinate maps qβ and qγ are simply the
identity maps on Fn and Fm respectively, and can be accomplished by simply multiplying
by the appropriate sized identity matrix. The requirement that A is the matrix of the
transformation amounts to
Lb = Ab
What about the situation where different pairs of bases are chosen for V and W ? How
are the two matrices with respect to these choices related? Consider the following diagram
which illustrates the situation.
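\[ \begin{array}{ccc} F^n & \xrightarrow{\; A_2 \;} & F^m \\ q_{\beta_2} \downarrow & \circ & \downarrow q_{\gamma_2} \\ V & \xrightarrow{\; L \;} & W \\ q_{\beta_1} \uparrow & \circ & \uparrow q_{\gamma_1} \\ F^n & \xrightarrow{\; A_1 \;} & F^m \end{array} \]
In this diagram qβ i and qγ i are coordinate maps as described above. From the diagram,
\[ q_{\gamma_1}^{-1} q_{\gamma_2} A_2 q_{\beta_2}^{-1} q_{\beta_1} = A_1 , \]
where $q_{\beta_2}^{-1} q_{\beta_1}$ and $q_{\gamma_1}^{-1} q_{\gamma_2}$ are one to one, onto, and linear maps which may be accomplished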
by multiplication by a square matrix. Thus there exist matrices P, Q such that P : Fn → Fn
and Q : Fm → Fm are invertible and
P A2 Q = A1 .
Example 9.3.4 Let β ≡ {v1 , · · · , vn } and γ ≡ {w1 , · · · , wn } be two bases for V . Let L
be the linear transformation which maps vi to wi . Find [L]γβ . In case V = Fn and letting
δ = {e1 , · · · , en } , the usual basis for Fn , find [L]δ .
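Letting δ ij be the symbol which equals 1 if i = j and 0 if i ≠ j, it follows that $L = \sum_{i,j} \delta_{ij} w_i v_j$ and so [L]γβ = I the identity matrix. For the second part, multiplication by [L]δ must take each vi to wi , so you must have
\[ \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix} = [L]_\delta \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \]
and so
\[ [L]_\delta = \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix} \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}^{-1} \]
where $\begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix}$ is the n × n matrix having ith column equal to wi .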
Definition 9.3.5 In the special case where V = W and only one basis is used for V = W,
this becomes
\[ q_{\beta_1}^{-1} q_{\beta_2} A_2 q_{\beta_2}^{-1} q_{\beta_1} = A_1 . \]
Letting S be the matrix of the linear transformation $q_{\beta_2}^{-1} q_{\beta_1}$ with respect to the standard basis
vectors in Fn ,
\[ S^{-1} A_2 S = A_1 . \tag{9.3} \]
When this occurs, A1 is said to be similar to A2 and A → S −1 AS is called a similarity
transformation.
Recall the following.
Definition 9.3.6 Let S be a set. The symbol ∼ is called an equivalence relation on S if it
satisfies the following axioms.
1. x ∼ x for all x ∈ S. (Reflexive)
2. If x ∼ y then y ∼ x. (Symmetric)
3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive)
Definition 9.3.7 [x] denotes the set of all elements of S which are equivalent to x and [x]
is called the equivalence class determined by x or just the equivalence class of x.
Also recall the notion of equivalence classes.
Theorem 9.3.8 Let ∼ be an equivalence relation defined on a set S and let H denote the set
of equivalence classes. Then if [x] and [y] are two of these equivalence classes, either x ∼ y
and [x] = [y] or it is not true that x ∼ y and [x] ∩ [y] = ∅.
Theorem 9.3.9 In the vector space of n × n matrices, define
A∼B
if there exists an invertible matrix S such that
A = S −1 BS.
Then ∼ is an equivalence relation and A ∼ B if and only if whenever V is an n dimensional
vector space, there exists L ∈ L (V, V ) and bases {v1 , · · · , vn } and {w1 , · · · , wn } such that
A is the matrix of L with respect to {v1 , · · · , vn } and B is the matrix of L with respect to
{w1 , · · · , wn }.
Proof: A ∼ A because S = I works in the definition. If A ∼ B , then B ∼ A, because
A = S −1 BS
implies B = SAS −1 . If A ∼ B and B ∼ C, then
A = S −1 BS, B = T −1 CT
and so
\[ A = S^{-1} T^{-1} C T S = (T S)^{-1} C T S \]
which implies A ∼ C. This verifies the first part of the conclusion.
Now let V be an n dimensional vector space, A ∼ B so A = S −1 BS and pick a basis for
V,
β ≡ {v1 , · · · , vn }.
Define L ∈ L (V, V ) by
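\[ L v_i \equiv \sum_j a_{ji} v_j \]
where A = (aij ) . Thus A is the matrix of the linear transformation L. Consider the diagram
\[ \begin{array}{ccc} F^n & \xrightarrow{\; B \;} & F^n \\ q_\gamma \downarrow & \circ & \downarrow q_\gamma \\ V & \xrightarrow{\; L \;} & V \\ q_\beta \uparrow & \circ & \uparrow q_\beta \\ F^n & \xrightarrow{\; A \;} & F^n \end{array} \]
where qγ is chosen to make the diagram commute. Thus we need $S = q_\gamma^{-1} q_\beta$ which requires
\[ q_\gamma = q_\beta S^{-1} \]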
Then it follows that B is the matrix of L with respect to the basis
{qγ e1 , · · · , qγ en } ≡ {w1 , · · · , wn }.
That is, A and B are matrices of the same linear transformation L. Conversely, if A ∼ B,
let L be as just described. Thus $L = q_\beta A q_\beta^{-1} = q_\beta S^{-1} B S q_\beta^{-1}$. Let $q_\gamma \equiv q_\beta S^{-1}$ and it follows
that B is the matrix of L with respect to $\{ q_\beta S^{-1} e_1, \cdots, q_\beta S^{-1} e_n \}$.
What if the linear transformation consists of multiplication by a matrix A and you want
to find the matrix of this linear transformation with respect to another basis? Is there an
easy way to do it? The next proposition considers this.
Proposition 9.3.10 Let A be an m×n matrix and let L be the linear transformation which
is defined by
\[ L \left( \sum_{k=1}^{n} x_k e_k \right) \equiv \sum_{k=1}^{n} (A e_k) x_k \equiv \sum_{i=1}^{m} \sum_{k=1}^{n} A_{ik} x_k e_i \]
In simple language, to find Lx, you multiply on the left of x by A. (A is the matrix of L
with respect to the standard basis.) Then the matrix M of this linear transformation with
respect to the bases β = {u1 , · · · , un } for Fn and γ = {w1 , · · · , wm } for Fm is given by
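\[ M = \begin{pmatrix} w_1 & \cdots & w_m \end{pmatrix}^{-1} A \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} \]
where $\begin{pmatrix} w_1 & \cdots & w_m \end{pmatrix}$ is the m × m matrix which has wj as its j th column.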
Proof: Consider the following diagram.
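\[ \begin{array}{ccc} F^n & \xrightarrow{\; L \;} & F^m \\ q_\beta \uparrow & \circ & \uparrow q_\gamma \\ F^n & \xrightarrow{\; M \;} & F^m \end{array} \]
Here the coordinate maps are defined in the usual way. Thus
\[ q_\beta \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}^T \equiv \sum_{i=1}^{n} x_i u_i . \]
Therefore, qβ can be considered the same as multiplication of a vector in Fn on the left by
the matrix $\begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix}$. Similar considerations apply to qγ . Thus it is desired to have
the following for an arbitrary x ∈ Fn .
\[ A \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} x = \begin{pmatrix} w_1 & \cdots & w_m \end{pmatrix} M x \]
Therefore, the conclusion of the proposition follows.
In the special case where m = n and F = C or R and {u1 , · · · , un } is an orthonormal
basis and you want M , the matrix of L with respect to this new orthonormal basis, it follows
from the above that
\[ M = \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix}^* A \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} = U^* A U \]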
where U is a unitary matrix. Thus matrices with respect to two orthonormal bases are
unitarily similar.
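The formula of Proposition 9.3.10 can be checked in a 2 × 2 case. The following is a sketch under stated assumptions: m = n = 2, exact rational arithmetic, and matrices A, U, W invented for the example, where the columns of U and W play the roles of the bases β and γ. The helpers `mat_mul` and `inverse_2x2` are hypothetical names.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Matrix product of X (p x q) and Y (q x r), both given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(X):
    """Inverse of an invertible 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = X
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [0, 3]]  # matrix of L in the standard basis (hypothetical)
U = [[1, 1], [0, 1]]  # columns u1, u2 form the basis beta
W = [[1, 0], [1, 1]]  # columns w1, w2 form the basis gamma

M = mat_mul(inverse_2x2(W), mat_mul(A, U))  # M = W^{-1} A U

x = [[4], [-1]]                  # components with respect to beta, as a column
lhs = mat_mul(A, mat_mul(U, x))  # L applied to q_beta(x)
rhs = mat_mul(W, mat_mul(M, x))  # q_gamma applied to M x
print(M, lhs == rhs)
```

The equality of the two sides is the commuting diagram in the proof: applying L to the vector with β components x gives the vector whose γ components are M x.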
Definition 9.3.11 An n × n matrix A, is diagonalizable if there exists an invertible n × n
matrix S such that S −1 AS = D, where D is a diagonal matrix. Thus D has zero entries
everywhere except on the main diagonal. Write diag (λ1 , · · · , λn ) to denote the diagonal
matrix having the λi down the main diagonal.
The following theorem is of great significance.
Theorem 9.3.12 Let A be an n × n matrix. Then A is diagonalizable if and only if Fn has
a basis of eigenvectors of A. In this case, S of Definition 9.3.11 consists of the n × n matrix
whose columns are the eigenvectors of A and D = diag (λ1 , · · · , λn ) .
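Proof: Suppose first that Fn has a basis of eigenvectors, {v1 , · · · , vn } where Avi = λi vi .
Then let S denote the matrix $\begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}$ and let
\[ S^{-1} \equiv \begin{pmatrix} u_1^T \\ \vdots \\ u_n^T \end{pmatrix} \quad \text{where} \quad u_i^T v_j = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} . \]
S^{-1} exists because S has rank n. Then from block multiplication,
\[ S^{-1} A S = \begin{pmatrix} u_1^T \\ \vdots \\ u_n^T \end{pmatrix} \begin{pmatrix} A v_1 & \cdots & A v_n \end{pmatrix} = \begin{pmatrix} u_1^T \\ \vdots \\ u_n^T \end{pmatrix} \begin{pmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = D . \]
Next suppose A is diagonalizable so S^{-1} AS = D ≡ diag (λ1 , · · · , λn ) . Then the columns
of S form a basis because S^{-1} is given to exist. It only remains to verify that these
columns of S are eigenvectors. But letting $S = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}$, AS = SD and so
\[ \begin{pmatrix} A v_1 & \cdots & A v_n \end{pmatrix} = \begin{pmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{pmatrix} \]
which shows that Avi = λi vi .
It makes sense to speak of the determinant of a linear transformation as described in the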
following corollary.
Corollary 9.3.13 Let L ∈ L (V, V ) where V is an n dimensional vector space and let A be
the matrix of this linear transformation with respect to a basis on V. Then it is possible to
define
det (L) ≡ det (A) .
Proof: Each choice of basis for V determines a matrix for L with respect to the basis. If A and B are two such matrices, it follows from Theorem 9.3.9 that
$$A = S^{-1}BS$$
and so
$$\det(A) = \det(S^{-1}) \det(B) \det(S).$$
But
$$1 = \det(I) = \det(S^{-1}S) = \det(S) \det(S^{-1})$$
and so
$$\det(A) = \det(B).$$
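As a small numerical illustration of Theorem 9.3.12 and Corollary 9.3.13, the following sketch diagonalizes a 2 × 2 matrix using exact rational arithmetic and confirms that the determinant is unchanged by the similarity. The matrix A, the helper names `matmul`, `det2`, and `inv2`, and the choice of eigenvectors are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    d = Fraction(det2(X))
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]

# Illustrative matrix with eigenvalues 2 and 5.
A = [[4, 1], [2, 3]]
# Columns of S are eigenvectors: A(1,-2)^T = 2(1,-2)^T and A(1,1)^T = 5(1,1)^T.
S = [[1, 1], [-2, 1]]

# S^{-1} A S should be diag(2, 5).
D = matmul(inv2(S), matmul(A, S))
print(D)
# Similar matrices have the same determinant.
print(det2(A), det2(D))
```

Because the columns of S form a basis of eigenvectors, the product comes out diagonal, with the eigenvalues in the same order as the columns of S.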
Definition 9.3.14 Let A ∈ L (X, Y ) where X and Y are finite dimensional vector spaces.
Define rank (A) to equal the dimension of A (X) .
The following theorem explains how the rank of A is related to the rank of the matrix
of A.
Theorem 9.3.15 Let A ∈ L (X, Y ). Then rank (A) = rank (M ) where M is the matrix of
A taken with respect to a pair of bases for the vector spaces X, and Y.
Proof: Recall the diagram which describes what is meant by the matrix of A. Here the two bases are as indicated.
$$\begin{array}{ccccc} \beta = \{v_1, \cdots, v_n\} & X & \xrightarrow{\;A\;} & Y & \{w_1, \cdots, w_m\} = \gamma \\ & q_\beta \uparrow & \circ & \uparrow q_\gamma & \\ & \mathbb{F}^n & \xrightarrow{\;M\;} & \mathbb{F}^m & \end{array}$$
Let $\{Ax_1, \cdots, Ax_r\}$ be a basis for AX. Thus
$$\{q_\gamma M q_\beta^{-1} x_1, \cdots, q_\gamma M q_\beta^{-1} x_r\}$$
is a basis for AX. It follows that
$$\{M q_\beta^{-1} x_1, \cdots, M q_\beta^{-1} x_r\}$$
is linearly independent and so rank (A) ≤ rank (M). However, one could interchange the roles of M and A in the above argument and thereby turn the inequality around.

The following result is a summary of many concepts.
Theorem 9.3.16 Let L ∈ L (V, V ) where V is a finite dimensional vector space. Then the
following are equivalent.
1. L is one to one.
2. L maps a basis to a basis.
3. L is onto.
4. det (L) ̸= 0
5. If Lv = 0 then v = 0.
Proof: Suppose first L is one to one and let $\beta = \{v_i\}_{i=1}^n$ be a basis. Then if $\sum_{i=1}^n c_i L v_i = 0$ it follows $L\left(\sum_{i=1}^n c_i v_i\right) = 0$ which means that since L (0) = 0, and L is one to one, it must be the case that $\sum_{i=1}^n c_i v_i = 0$. Since $\{v_i\}$ is a basis, each $c_i = 0$ which shows $\{Lv_i\}$ is a linearly independent set. Since there are n of these, it must be that this is a basis.
Now suppose 2.). Then letting $\{v_i\}$ be a basis, and y ∈ V, it follows from part 2.) that there are constants, $\{c_i\}$ such that $y = \sum_{i=1}^n c_i L v_i = L\left(\sum_{i=1}^n c_i v_i\right)$. Thus L is onto. It has been shown that 2.) implies 3.).
Now suppose 3.). Then the operation consisting of multiplication by the matrix of L, [L], must be onto. However, the vectors in $\mathbb{F}^n$ so obtained, consist of linear combinations of the columns of [L]. Therefore, the column rank of [L] is n. By Theorem 3.3.23 this equals the determinant rank and so det ([L]) ≡ det (L) ≠ 0.
Now assume 4.) If Lv = 0 for some v ≠ 0, it follows that [L] x = 0 for some x ≠ 0. Therefore, the columns of [L] are linearly dependent and so by Theorem 3.3.23, det ([L]) = det (L) = 0 contrary to 4.). Therefore, 4.) implies 5.).
Now suppose 5.) and suppose Lv = Lw. Then L (v − w) = 0 and so by 5.), v − w = 0 showing that L is one to one.

Also it is important to note that composition of linear transformations corresponds to
multiplication of the matrices. Consider the following diagram in which [A]γβ denotes the
matrix of A relative to the bases γ on Y and β on X, [B]δγ defined similarly.
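$$\begin{array}{ccccc} X & \xrightarrow{\;A\;} & Y & \xrightarrow{\;B\;} & Z \\ q_\beta \uparrow & \circ & \uparrow q_\gamma & \circ & \uparrow q_\delta \\ \mathbb{F}^n & \xrightarrow{[A]_{\gamma\beta}} & \mathbb{F}^m & \xrightarrow{[B]_{\delta\gamma}} & \mathbb{F}^p \end{array}$$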
where A and B are two linear transformations, A ∈ L (X, Y ) and B ∈ L (Y, Z) . Then
B ◦ A ∈ L (X, Z) and so it has a matrix with respect to bases given on X and Z, the
coordinate maps for these bases being qβ and qδ respectively. Then
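$$B \circ A = q_\delta [B]_{\delta\gamma} q_\gamma^{-1} q_\gamma [A]_{\gamma\beta} q_\beta^{-1} = q_\delta [B]_{\delta\gamma} [A]_{\gamma\beta} q_\beta^{-1}.$$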
But this shows that [B]δγ [A]γβ plays the role of [B ◦ A]δβ , the matrix of B ◦ A. Hence the
matrix of B ◦ A equals the product of the two matrices [A]γβ and [B]δγ . Of course it is
interesting to note that although [B ◦ A]δβ must be unique, the matrices, [A]γβ and [B]δγ
are not unique because they depend on γ, the basis chosen for Y .
Theorem 9.3.17 The matrix of the composition of linear transformations equals the product of the matrices of these linear transformations.
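A concrete check of Theorem 9.3.17 can be sketched as follows. The example is an assumption chosen for illustration: A is differentiation from polynomials of degree at most 2 to polynomials of degree at most 1, B is multiplication by x going back, and all matrices are taken with respect to the monomial bases. Polynomials are stored as coefficient lists, lowest degree first; the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matrix_of(T, n):
    """Matrix whose j-th column is T applied to the j-th standard basis vector."""
    cols = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[col[i] for col in cols] for i in range(len(cols[0]))]

def differentiate(c):
    """A: coefficients of p'(x), given coefficients of p(x)."""
    return [k * c[k] for k in range(1, len(c))]

def times_x(c):
    """B: coefficients of x p(x), given coefficients of p(x)."""
    return [0] + list(c)

def compose(c):
    """B o A: p(x) -> x p'(x)."""
    return times_x(differentiate(c))

MA = matrix_of(differentiate, 3)   # 2 x 3 matrix of A
MB = matrix_of(times_x, 2)         # 3 x 2 matrix of B
MBA = matrix_of(compose, 3)        # 3 x 3 matrix of B o A

print(MBA)
print(matmul(MB, MA))
```

The two printed matrices agree, which is the content of the theorem in this instance. Note the order of the factors: the matrix of B ◦ A is the matrix of B times the matrix of A.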
9.3.1
Rotations About A Given Vector
As an application, I will consider the problem of rotating counter clockwise about a given
unit vector which is possibly not one of the unit vectors in coordinate directions. First
consider a pair of perpendicular unit vectors, u1 and u2 and the problem of rotating in the
counterclockwise direction about u3 where u3 = u1 × u2 so that u1 , u2 , u3 forms a right
handed orthogonal coordinate system. Thus the vector u3 is coming out of the page.
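[Figure: the perpendicular unit vectors u1 and u2 in the plane of the page, each rotated counterclockwise through the angle θ.]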
Let T denote the desired rotation. Then
T (au1 + bu2 + cu3 ) = aT u1 + bT u2 + cT u3
= (a cos θ − b sin θ) u1 + (a sin θ + b cos θ) u2 + cu3 .
Thus in terms of the basis $\gamma \equiv \{u_1, u_2, u_3\}$, the matrix of this transformation is
$$[T]_\gamma \equiv \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
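I want to obtain the matrix of the transformation in terms of the usual basis $\beta \equiv \{e_1, e_2, e_3\}$ because it is in terms of this basis that we usually deal with vectors. From Proposition 9.3.10, if $[T]_\beta$ is this matrix,
$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = (u_1 \; u_2 \; u_3)^{-1} [T]_\beta (u_1 \; u_2 \; u_3)$$
and so you can solve for $[T]_\beta$ if you know the $u_i$.
Recall why this is so.
$$\begin{array}{ccc} \mathbb{R}^3 & \xrightarrow{[T]_\gamma} & \mathbb{R}^3 \\ q_\gamma \downarrow & \circ & q_\gamma \downarrow \\ \mathbb{R}^3 & \xrightarrow{\;T\;} & \mathbb{R}^3 \\ I \uparrow & \circ & I \uparrow \\ \mathbb{R}^3 & \xrightarrow{[T]_\beta} & \mathbb{R}^3 \end{array}$$
The map $q_\gamma$ is accomplished by a multiplication on the left by $(u_1 \; u_2 \; u_3)$. Thus
$$[T]_\beta = q_\gamma [T]_\gamma q_\gamma^{-1} = (u_1 \; u_2 \; u_3) [T]_\gamma (u_1 \; u_2 \; u_3)^{-1}.$$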
Suppose the unit vector u3 about which the counterclockwise rotation takes place is
(a, b, c). Then I obtain vectors, u1 and u2 such that {u1 , u2 , u3 } is a right handed orthonormal system with u3 = (a, b, c) and then use the above result. It is of course somewhat
arbitrary how this is accomplished. I will assume however, that |c| ̸= 1 since otherwise you
are looking at either clockwise or counter clockwise rotation about the positive z axis and
this is a problem which has been dealt with earlier. (If c = −1, it amounts to clockwise
rotation about the positive z axis while if c = 1, it is counter clockwise rotation about the
positive z axis.)
Then let $u_3 = (a, b, c)$ and $u_2 \equiv \frac{1}{\sqrt{a^2+b^2}} (b, -a, 0)$. This one is perpendicular to $u_3$. If $\{u_1, u_2, u_3\}$ is to be a right hand system it is necessary to have
$$u_1 = u_2 \times u_3 = \frac{1}{\sqrt{(a^2+b^2)(a^2+b^2+c^2)}} \left(-ac, -bc, a^2+b^2\right).$$
Now recall that $u_3$ is a unit vector and so the above equals
$$\frac{1}{\sqrt{a^2+b^2}} \left(-ac, -bc, a^2+b^2\right).$$
Then from the above, A is given by
$$\begin{pmatrix} \frac{-ac}{\sqrt{a^2+b^2}} & \frac{b}{\sqrt{a^2+b^2}} & a \\ \frac{-bc}{\sqrt{a^2+b^2}} & \frac{-a}{\sqrt{a^2+b^2}} & b \\ \sqrt{a^2+b^2} & 0 & c \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{-ac}{\sqrt{a^2+b^2}} & \frac{b}{\sqrt{a^2+b^2}} & a \\ \frac{-bc}{\sqrt{a^2+b^2}} & \frac{-a}{\sqrt{a^2+b^2}} & b \\ \sqrt{a^2+b^2} & 0 & c \end{pmatrix}^{-1}$$
Of course the matrix is an orthogonal matrix so it is easy to take the inverse by simply taking the transpose. Then doing the computation and then some simplification yields
$$= \begin{pmatrix} a^2 + (1-a^2)\cos\theta & ab(1-\cos\theta) - c\sin\theta & ac(1-\cos\theta) + b\sin\theta \\ ab(1-\cos\theta) + c\sin\theta & b^2 + (1-b^2)\cos\theta & bc(1-\cos\theta) - a\sin\theta \\ ac(1-\cos\theta) - b\sin\theta & bc(1-\cos\theta) + a\sin\theta & c^2 + (1-c^2)\cos\theta \end{pmatrix}. \tag{9.4}$$
With this, it is clear how to rotate clockwise about the unit vector, (a, b, c). Just rotate counter clockwise through an angle of −θ. Thus the matrix for this clockwise rotation is just
$$= \begin{pmatrix} a^2 + (1-a^2)\cos\theta & ab(1-\cos\theta) + c\sin\theta & ac(1-\cos\theta) - b\sin\theta \\ ab(1-\cos\theta) - c\sin\theta & b^2 + (1-b^2)\cos\theta & bc(1-\cos\theta) + a\sin\theta \\ ac(1-\cos\theta) + b\sin\theta & bc(1-\cos\theta) - a\sin\theta & c^2 + (1-c^2)\cos\theta \end{pmatrix}.$$
In deriving 9.4 it was assumed that c ̸= ±1 but even in this case, it gives the correct
answer. Suppose for example that c = 1 so you are rotating in the counter clockwise
direction about the positive z axis. Then a, b are both equal to zero and 9.4 reduces to 2.24.
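The formula 9.4 is easy to test numerically. The sketch below codes the matrix entry by entry and checks three consequences discussed above: the axis is left fixed, the case (a, b, c) = (0, 0, 1) reduces to the usual rotation about the z axis, and replacing θ by −θ gives the transpose. The function names and the sample axis (1/3, 2/3, 2/3) are assumptions made for illustration.

```python
import math

def rotation_matrix(axis, theta):
    """Matrix of formula 9.4: counter clockwise rotation through theta
    about the unit vector axis = (a, b, c)."""
    a, b, c = axis
    ct, st = math.cos(theta), math.sin(theta)
    k = 1 - ct
    return [
        [a * a + (1 - a * a) * ct, a * b * k - c * st, a * c * k + b * st],
        [a * b * k + c * st, b * b + (1 - b * b) * ct, b * c * k - a * st],
        [a * c * k - b * st, b * c * k + a * st, c * c + (1 - c * c) * ct],
    ]

def matvec(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def close(u, v, tol=1e-12):
    """Entrywise comparison of two vectors up to rounding error."""
    return all(abs(x - y) < tol for x, y in zip(u, v))

axis = (1 / 3, 2 / 3, 2 / 3)      # a unit vector, chosen for illustration
theta = 0.7
R = rotation_matrix(axis, theta)

# The axis of rotation is fixed by the rotation.
axis_fixed = close(matvec(R, axis), axis)

# About (0, 0, 1), a quarter turn sends e1 to e2.
Rz = rotation_matrix((0, 0, 1), math.pi / 2)
quarter_turn = close(matvec(Rz, [1, 0, 0]), [0, 1, 0])

# Clockwise rotation (angle -theta) is the transpose of R.
Rminus = rotation_matrix(axis, -theta)
transpose_ok = all(close(Rminus[i], [R[j][i] for j in range(3)])
                   for i in range(3))

print(axis_fixed, quarter_turn, transpose_ok)
```

The comparisons use a small tolerance because the entries are floating point numbers rather than exact values.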
9.3.2
The Euler Angles
An important application of the above theory is to the Euler angles, important in the
mechanics of rotating bodies. Lagrange studied these things back in the 1700’s. To describe
the Euler angles consider the following picture in which x1 , x2 and x3 are the usual coordinate
axes fixed in space and the axes labeled with a superscript denote other coordinate axes.
Here is the picture.
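[Figure: three panels showing the successive rotations through ϕ about $x_3 = x_3^1$, through θ about $x_1^1 = x_1^2$, and through ψ about $x_3^2 = x_3^3$; the superscript labels the axes obtained after each rotation.]
We obtain ϕ by rotating counter clockwise about the fixed $x_3$ axis. Thus this rotation has the matrix
$$\begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \equiv M_1(\phi)$$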
Next rotate counter clockwise about the $x_1^1$ axis which results from the first rotation through an angle of θ. Thus it is desired to rotate counter clockwise through an angle θ about the unit vector
$$\begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos\phi \\ \sin\phi \\ 0 \end{pmatrix}.$$
Therefore, in 9.4, a = cos ϕ, b = sin ϕ, and c = 0. It follows the matrix of this transformation with respect to the usual basis is
$$\begin{pmatrix} \cos^2\phi + \sin^2\phi\cos\theta & \cos\phi\sin\phi(1-\cos\theta) & \sin\phi\sin\theta \\ \cos\phi\sin\phi(1-\cos\theta) & \sin^2\phi + \cos^2\phi\cos\theta & -\cos\phi\sin\theta \\ -\sin\phi\sin\theta & \cos\phi\sin\theta & \cos\theta \end{pmatrix} \equiv M_2(\phi, \theta)$$
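The substitution a = cos ϕ, b = sin ϕ, c = 0 into 9.4 can be checked numerically. The following sketch, with hypothetical function names and arbitrarily chosen sample angles, compares the general formula with the closed form for M2 (ϕ, θ) and confirms that the axis (cos ϕ, sin ϕ, 0) is left fixed.

```python
import math

def formula_9_4(axis, theta):
    """General rotation matrix of formula 9.4 about the unit vector axis."""
    a, b, c = axis
    ct, st = math.cos(theta), math.sin(theta)
    k = 1 - ct
    return [
        [a * a + (1 - a * a) * ct, a * b * k - c * st, a * c * k + b * st],
        [a * b * k + c * st, b * b + (1 - b * b) * ct, b * c * k - a * st],
        [a * c * k - b * st, b * c * k + a * st, c * c + (1 - c * c) * ct],
    ]

def M2(phi, theta):
    """Closed form for the second Euler rotation as displayed in the text."""
    cp, sp = math.cos(phi), math.sin(phi)
    ct, st = math.cos(theta), math.sin(theta)
    return [
        [cp * cp + sp * sp * ct, cp * sp * (1 - ct), sp * st],
        [cp * sp * (1 - ct), sp * sp + cp * cp * ct, -cp * st],
        [-sp * st, cp * st, ct],
    ]

phi, theta = 0.4, 1.1                      # sample angles, chosen arbitrarily
axis = (math.cos(phi), math.sin(phi), 0.0)
G = formula_9_4(axis, theta)
H = M2(phi, theta)

# Largest entrywise difference between the two matrices.
max_diff = max(abs(G[i][j] - H[i][j]) for i in range(3) for j in range(3))

# M2 should leave the rotation axis fixed.
image = [sum(H[i][j] * axis[j] for j in range(3)) for i in range(3)]
axis_error = max(abs(image[i] - axis[i]) for i in range(3))

print(max_diff, axis_error)
```

Both printed numbers should be at the level of rounding error.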
Finally, we rotate counter clockwise about the positive $x_3^2$ axis by ψ. The vector in the positive $x_3^1$ axis is the same as the vector in the fixed $x_3$ axis. Thus the unit vector in the positive direction of the $x_3^2$ axis is
$$\begin{pmatrix} \cos^2\phi + \sin^2\phi\cos\theta & \cos\phi\sin\phi(1-\cos\theta) & \sin\phi\sin\theta \\ \cos\phi\sin\phi(1-\cos\theta) & \sin^2\phi + \cos^2\phi\cos\theta & -\cos\phi\sin\theta \\ -\sin\phi\sin\theta & \cos\phi\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos^2\phi + \sin^2\phi\cos\theta \\ \cos\phi\sin\phi(1-\cos\theta) \\ -\sin\phi\sin\theta \end{pmatrix}$$
and it is desired to rotate counter clockwise through an angle of ψ about this vector. Thus, in this case,
$$a = \cos^2\phi + \sin^2\phi\cos\theta, \quad b = \cos\phi\sin\phi(1-\cos\theta), \quad c = -\sin\phi\sin\theta,$$
and you could substitute in to formula 9.4 and obtain a matrix which represents the linear transformation obtained by rotating counter clockwise about the positive $x_3^2$ axis, $M_3(\phi, \theta, \psi)$. Then what would be the matrix with respect to the usual basis for the linear transformation which is obtained as a composition of the three just described? By Theorem 9.3.17, this matrix equals the product of these three,
$$M_3(\phi, \theta, \psi) M_2(\phi, \theta) M_1(\phi).$$
I leave the details to you. There are procedures due to Lagrange which will allow you to
write differential equations for the Euler angles in a rotating body. To give an idea how
these angles apply, consider the following picture.
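[Figure: a spinning top whose axis $x_3(t)$ is located by the Euler angles ϕ, θ, ψ relative to the fixed axes $x_1$, $x_2$, $x_3$ and the line of nodes.]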
This is as far as I will go on this topic. The point is, it is possible to give a systematic
description in terms of matrix multiplication of a very elaborate geometrical description of
a composition of linear transformations. You see from the picture it is possible to describe
the motion of the spinning top shown in terms of these Euler angles.
9.4
Eigenvalues And Eigenvectors Of Linear Transformations
Let V be a finite dimensional vector space. For example, it could be a subspace of Cn or Rn .
Also suppose A ∈ L (V, V ) .
Definition 9.4.1 The characteristic polynomial of A is defined as q (λ) ≡ det (λI − A) .
The zeros of q (λ) in F are called the eigenvalues of A.
Lemma 9.4.2 When λ is an eigenvalue of A which is also in F, the field of scalars, then
there exists v ̸= 0 such that Av = λv.
Proof: This follows from Theorem 9.3.16. Since λ ∈ F,
λI − A ∈ L (V, V )
and since it has zero determinant, it is not one to one. Thus there exists v ≠ 0 such that (λI − A) v = 0.

The following lemma gives the existence of something called the minimal polynomial.
Lemma 9.4.3 Let A ∈ L (V, V ) where V is a finite dimensional vector space of dimension
n with arbitrary field of scalars. Then there exists a unique polynomial of the form
$$p(\lambda) = \lambda^m + c_{m-1}\lambda^{m-1} + \cdots + c_1\lambda + c_0$$
such that p (A) = 0 and m is as small as possible for this to occur.
Proof: Consider the linear transformations, $I, A, A^2, \cdots, A^{n^2}$. There are $n^2 + 1$ of these transformations and so by Theorem 9.2.3 the set is linearly dependent. Thus there exist constants, $c_i \in \mathbb{F}$, not all zero, such that
$$c_0 I + \sum_{k=1}^{n^2} c_k A^k = 0.$$
This implies there exists a polynomial, q (λ) which has the property that q (A) = 0. In fact, one example is $q(\lambda) \equiv c_0 + \sum_{k=1}^{n^2} c_k \lambda^k$. Dividing by the leading coefficient, it can be assumed this polynomial is of the form $\lambda^m + c_{m-1}\lambda^{m-1} + \cdots + c_1\lambda + c_0$, a monic polynomial. Now consider all such monic polynomials, q such that q (A) = 0 and pick the one which has the smallest degree m. This is called the minimal polynomial and will be denoted here by p (λ). If there were two minimal polynomials, the one just found and another,
$$\lambda^m + d_{m-1}\lambda^{m-1} + \cdots + d_1\lambda + d_0,$$
then subtracting these would give the following polynomial,
$$\tilde{q}(\lambda) = (d_{m-1} - c_{m-1})\lambda^{m-1} + \cdots + (d_1 - c_1)\lambda + d_0 - c_0.$$
Since $\tilde{q}(A) = 0$, this requires each $d_k = c_k$ since otherwise you could divide by $d_k - c_k$ where k is the largest one which is nonzero. Thus the choice of m would be contradicted.

Theorem 9.4.4 Let V be a nonzero finite dimensional vector space of dimension n with
the field of scalars equal to F. Suppose A ∈ L (V, V ) and for p (λ) the minimal polynomial
defined above, let µ ∈ F be a zero of this polynomial. Then there exists v ̸= 0,v ∈ V such
that
Av = µv.
If F = C, then A always has an eigenvector and eigenvalue. Furthermore, if {λ1 , · · · , λm }
are the zeros of p (λ) in F, these are exactly the eigenvalues of A for which there exists an
eigenvector in V.
Proof: Suppose first µ is a zero of p (λ) . Since p (µ) = 0, it follows
p (λ) = (λ − µ) k (λ)
where k (λ) is a polynomial having coefficients in F. Since p has minimal degree, k (A) ̸= 0
and so there exists a vector, u ̸= 0 such that k (A) u ≡ v ̸= 0. But then
(A − µI) v = (A − µI) k (A) (u) = 0.
The next claim about the existence of an eigenvalue follows from the fundamental theorem of algebra and what was just shown.
It has been shown that every zero of p (λ) is an eigenvalue which has an eigenvector in
V . Now suppose µ is an eigenvalue which has an eigenvector in V so that Av = µv for some
v ∈ V, v ≠ 0. Does it follow µ is a zero of p (λ)? Since $A^k v = \mu^k v$ for each k,
$$0 = p(A) v = p(\mu) v$$
and so, since v ≠ 0, µ is indeed a zero of p (λ).

In summary, the theorem says that the eigenvalues which have eigenvectors in V are
exactly the zeros of the minimal polynomial which are in the field of scalars F.
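The proof of Lemma 9.4.3 suggests a direct, if inefficient, way to compute a minimal polynomial: form I, A, A², ... and stop at the first power which is a linear combination of the earlier ones. The sketch below does this with exact rational arithmetic. The functions `solve` and `minimal_polynomial` and the two sample matrices are illustrative assumptions, not part of the text; coefficients are returned lowest degree first.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(columns, target):
    """Return x with sum_k x[k] * columns[k] == target, or None if there is
    no such x. The columns are assumed to be linearly independent."""
    m = len(columns)
    rows = [[Fraction(col[i]) for col in columns] + [Fraction(target[i])]
            for i in range(len(target))]
    pivot_row = 0
    for j in range(m):
        p = next((r for r in range(pivot_row, len(rows)) if rows[r][j] != 0),
                 None)
        if p is None:
            return None  # does not occur for independent columns
        rows[pivot_row], rows[p] = rows[p], rows[pivot_row]
        piv = rows[pivot_row][j]
        rows[pivot_row] = [v / piv for v in rows[pivot_row]]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][j] != 0:
                f = rows[r][j]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    if any(row[m] != 0 for row in rows[pivot_row:]):
        return None  # inconsistent system: target is not in the span
    return [rows[k][m] for k in range(m)]

def minimal_polynomial(A):
    """Coefficients [c_0, ..., c_{m-1}, 1] of the monic polynomial of least
    degree with p(A) = 0."""
    n = len(A)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    powers = []  # flattened I, A, A^2, ..., kept linearly independent
    while True:
        target = [v for row in power for v in row]
        x = solve(powers, target)
        if x is not None:
            # A^m = sum_k x[k] A^k, so p(t) = t^m - sum_k x[k] t^k.
            return [-c for c in x] + [Fraction(1)]
        powers.append(target)
        power = matmul(power, A)

# diag(2, 2, 3): minimal polynomial (t - 2)(t - 3) = t^2 - 5t + 6.
p1 = minimal_polynomial([[2, 0, 0], [0, 2, 0], [0, 0, 3]])
# A 2 x 2 Jordan block: minimal polynomial (t - 2)^2 = t^2 - 4t + 4.
p2 = minimal_polynomial([[2, 1], [0, 2]])
print(p1, p2)
```

In the first example the zeros of the minimal polynomial are 2 and 3, which are exactly the eigenvalues, in agreement with Theorem 9.4.4, even though the minimal polynomial has degree 2 rather than 3.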
9.5
Exercises
1. If A, B, and C are each n × n matrices and ABC is invertible, why are each of A, B,
and C invertible?
2. Give an example of a 3 × 2 matrix with the property that the linear transformation
determined by this matrix is one to one but not onto.
3. Explain why Ax = 0 always has a solution whenever A is a linear transformation.
4. Review problem: Suppose det (A − λI) = 0. Show using Theorem 3.1.15 there exists
x ̸= 0 such that (A − λI) x = 0.
5. How does the minimal polynomial of an algebraic number relate to the minimal polynomial of a linear transformation? Can an algebraic number be thought of as a linear
transformation? How?
6. Recall the fact from algebra that if p (λ) and q (λ) are polynomials, then there exists
l (λ) , a polynomial such that
q (λ) = p (λ) l (λ) + r (λ)
where the degree of r (λ) is less than the degree of p (λ) or else r (λ) = 0. With this in
mind, why must the minimal polynomial always divide the characteristic polynomial?
That is, why does there always exist a polynomial l (λ) such that p (λ) l (λ) = q (λ)?
Can you give conditions which imply the minimal polynomial equals the characteristic
polynomial? Go ahead and use the Cayley Hamilton theorem.
7. In the following examples, a linear transformation, T is given by specifying its action
on a basis β. Find its matrix with respect to this basis.
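$$\text{(a)} \quad T\begin{pmatrix} 1 \\ 2 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 2 \end{pmatrix} + 1\begin{pmatrix} -1 \\ 1 \end{pmatrix}, \quad T\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}$$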
(
)
)
(
) (
)
−1
−1
0
(b) T
=2
+1
,T
=
1
1
1
)
)
) ( )
(
( )
( )
(
(
1
1
1
1
1
1
(c) T
=2
+1
,T
=1
−
0
2
0
2
0
2
0
1
(
0
1
)
(
8. Let β = {u1 , · · · , un } be a basis for Fn and let T : Fn → Fn be defined as follows.
$$T\left(\sum_{k=1}^n a_k u_k\right) = \sum_{k=1}^n a_k b_k u_k$$
First show that T is a linear transformation. Next show that the matrix of T with respect to this basis, $[T]_\beta$ is
$$\begin{pmatrix} b_1 & & \\ & \ddots & \\ & & b_n \end{pmatrix}$$
Show that the above definition is equivalent to simply specifying T on the basis vectors
of β by
T (uk ) = bk uk .
9. ↑In the situation of the above problem, let γ = {e1 , · · · , en } be the standard basis for
Fn where ek is the vector which has 1 in the k th entry and zeros elsewhere. Show that
$$[T]_\gamma = (u_1 \cdots u_n) [T]_\beta (u_1 \cdots u_n)^{-1} \tag{9.5}$$
10. ↑Generalize the above problem to the situation where T is given by specifying its
action on the vectors of a basis β = {u1 , · · · , un } as follows.
$$T u_k = \sum_{j=1}^n a_{jk} u_j.$$
Letting A = (aij ) , verify that for γ = {e1 , · · · , en } , 9.5 still holds and that [T ]β = A.
11. Let P3 denote the set of real polynomials of degree no more than 3, defined on an
interval [a, b]. Show that P3 is a subspace of the vector space of all functions defined
on this interval. Show that a basis for P3 is $\{1, x, x^2, x^3\}$. Now let D denote the
differentiation operator which sends a function to its derivative. Show D is a linear
transformation which sends P3 to P3 . Find the matrix of this linear transformation
with respect to the given basis.
12. Generalize the above problem to Pn , the space of polynomials of degree no more than
n with basis {1, x, · · · , xn } .
13. In the situation of the above problem, let the linear transformation be T = D2 + 1,
defined as T f = f ′′ + f. Find the matrix of this linear transformation with respect to
the given basis {1, x, · · · , xn }. Write it down for n = 4.
14. In calculus, the following situation is encountered. There exists a vector valued function f :U → Rm where U is an open subset of Rn . Such a function is said to have
a derivative or to be differentiable at x ∈ U if there exists a linear transformation
T : Rn → Rm such that
$$\lim_{v \to 0} \frac{|f(x + v) - f(x) - Tv|}{|v|} = 0.$$
First show that this linear transformation, if it exists, must be unique. Next show that for $\beta = \{e_1, \cdots, e_n\}$, the standard basis, the $k^{th}$ column of $[T]_\beta$ is
$$\frac{\partial f}{\partial x_k}(x).$$
Actually, the result of this problem is a well kept secret. People typically don’t see
this in calculus. It is seen for the first time in advanced calculus if then.
15. Recall that A is similar to B if there exists a matrix P such that A = P −1 BP. Show
that if A and B are similar, then they have the same determinant. Give an example
of two matrices which are not similar but have the same determinant.
16. Suppose A ∈ L (V, W ) where dim (V ) > dim (W ) . Show ker (A) ̸= {0}. That is, show
there exist nonzero vectors v ∈ V such that Av = 0.
17. A vector v is in the convex hull of a nonempty set S if there are finitely many vectors of S, $\{v_1, \cdots, v_m\}$ and nonnegative scalars $\{t_1, \cdots, t_m\}$ such that
$$v = \sum_{k=1}^m t_k v_k, \quad \sum_{k=1}^m t_k = 1.$$
Such a linear combination is called a convex combination. Suppose now that S ⊆ V, a vector space of dimension n. Show that if $v = \sum_{k=1}^m t_k v_k$ is a vector in the convex hull for m > n + 1, then there exist other scalars $\{t_k'\}$ such that
$$v = \sum_{k=1}^{m-1} t_k' v_k.$$
Thus every vector in the convex hull of S can be obtained as a convex combination of at most n + 1 points of S. This incredible result is in Rudin [24]. Hint: Consider $L : \mathbb{R}^m \to V \times \mathbb{R}$ defined by
$$L(a) \equiv \left(\sum_{k=1}^m a_k v_k, \sum_{k=1}^m a_k\right)$$
Explain why ker (L) ≠ {0}. Next, letting a ∈ ker (L) \ {0} and λ ∈ R, note that λa ∈ ker (L). Thus for all λ ∈ R,
$$v = \sum_{k=1}^m (t_k + \lambda a_k) v_k.$$
Now vary λ till some $t_k + \lambda a_k = 0$ for some $a_k \neq 0$.
18. For those who know about compactness, use Problem 17 to show that if S ⊆ Rn and
S is compact, then so is its convex hull.
19. Suppose Ax = b has a solution. Explain why the solution is unique precisely when
Ax = 0 has only the trivial (zero) solution.
20. Let A be an n × n matrix of elements of F. There are two cases. In the first case,
F contains a splitting field of pA (λ) so that p (λ) factors into a product of linear
polynomials having coefficients in F. It is the second case which is of interest here
where pA (λ) does not factor into linear factors having coefficients in F. Let G be a
splitting field of pA (λ) and let qA (λ) be the minimal polynomial of A with respect
to the field G. Explain why qA (λ) must divide pA (λ). Now why must qA (λ) factor
completely into linear factors?
21. In Lemma 9.2.2 verify that L is linear.
Chapter 10
Canonical Forms
10.1
A Theorem Of Sylvester, Direct Sums
The notation is defined as follows.
Definition 10.1.1 Let L ∈ L (V, W ) . Then ker (L) ≡ {v ∈ V : Lv = 0} .
Lemma 10.1.2 Whenever L ∈ L (V, W ) , ker (L) is a subspace.
Proof: If a, b are scalars and v, w are in ker (L), then
$$L(av + bw) = aL(v) + bL(w) = 0 + 0 = 0.$$
Suppose now that A ∈ L (V, W ) and B ∈ L (W, U ) where V, W, U are all finite dimensional vector spaces. Then it is interesting to consider ker (BA). The following theorem of Sylvester is a very useful and important result.
Theorem 10.1.3 Let A ∈ L (V, W ) and B ∈ L (W, U ) where V, W, U are all vector spaces
over a field F. Suppose also that ker (A) and A (ker (BA)) are finite dimensional subspaces.
Then
dim (ker (BA)) ≤ dim (ker (B)) + dim (ker (A)) .
Equality holds if and only if A (ker (BA)) = ker (B).
Proof: If x ∈ ker (BA) , then Ax ∈ ker (B) and so A (ker (BA)) ⊆ ker (B) . The following
picture may help.
[Figure: ker(A) drawn inside ker(BA); the map A carries ker(BA) onto A(ker(BA)), which lies inside ker(B).]
Now let $\{x_1, \cdots, x_n\}$ be a basis of ker (A) and let $\{Ay_1, \cdots, Ay_m\}$ be a basis for A (ker (BA)), where each $y_i$ is in ker (BA). Take any z ∈ ker (BA). Then $Az = \sum_{i=1}^m a_i A y_i$ and so
$$A\left(z - \sum_{i=1}^m a_i y_i\right) = 0$$
which means $z - \sum_{i=1}^m a_i y_i \in \ker(A)$ and so there are scalars $b_i$ such that
$$z - \sum_{i=1}^m a_i y_i = \sum_{i=1}^n b_i x_i.$$
It follows $\mathrm{span}(x_1, \cdots, x_n, y_1, \cdots, y_m) \supseteq \ker(BA)$ and so by the first part, (See the picture.)
$$\dim(\ker(BA)) \le n + m \le \dim(\ker(A)) + \dim(\ker(B)).$$
Now $\{x_1, \cdots, x_n, y_1, \cdots, y_m\}$ is linearly independent because if
$$\sum_i a_i x_i + \sum_j b_j y_j = 0$$
then you could do A to both sides and conclude that $\sum_j b_j A y_j = 0$ which requires that each $b_j = 0$. Then it follows that each $a_i = 0$ also because it implies $\sum_i a_i x_i = 0$. Thus
$$\{x_1, \cdots, x_n, y_1, \cdots, y_m\}$$
is a basis for ker (BA). Then A (ker (BA)) = ker (B) if and only if m = dim (ker (B)) if and only if
$$\dim(\ker(BA)) = m + n = \dim(\ker(B)) + \dim(\ker(A)).$$

Of course this result holds for any finite product of linear transformations by induction. One way this is quite useful is in the case where you have a finite product of linear transformations $\prod_{i=1}^l L_i$ all in L (V, V ). Then
$$\dim\left(\ker \prod_{i=1}^l L_i\right) \le \sum_{i=1}^l \dim(\ker L_i).$$
Definition 10.1.4 Let {Vi }i=1 be subspaces of V. Then
r
∑
Vi = V1 + · · · + Vr
i=1
denotes all sums of the form
∑r
i=1
vi where vi ∈ Vi . If whenever
r
∑
vi = 0, vi ∈ Vi ,
(10.1)
i=1
it follows that vi = 0 for each i, then a special notation is used to denote
notation is
V1 ⊕ · · · ⊕ Vr ,
∑r
i=1
Vi . This
and it is called a direct sum of subspaces.
Now here is a useful lemma which is likely already understood.
Lemma 10.1.5 Let L ∈ L (V, W ) where V, W are n dimensional vector spaces. Then L is one to one if and only if L is also onto. In fact, if L is one to one and $\{v_1, \cdots, v_n\}$ is a basis, then so is $\{Lv_1, \cdots, Lv_n\}$.
Proof: Let $\{v_1, \cdots, v_n\}$ be a basis for V. Then I claim that $\{Lv_1, \cdots, Lv_n\}$ is a basis for W. First of all, I show $\{Lv_1, \cdots, Lv_n\}$ is linearly independent. Suppose
$$\sum_{k=1}^n c_k L v_k = 0.$$
Then
$$L\left(\sum_{k=1}^n c_k v_k\right) = 0$$
and since L is one to one, it follows
$$\sum_{k=1}^n c_k v_k = 0$$
which implies each $c_k = 0$. Therefore, $\{Lv_1, \cdots, Lv_n\}$ is linearly independent. If there exists w not in the span of these vectors, then by Lemma 8.2.10, $\{Lv_1, \cdots, Lv_n, w\}$ would be independent and this contradicts the exchange theorem, Theorem 8.2.4 because it would be a linearly independent set having more vectors than the spanning set $\{v_1, \cdots, v_n\}$.
Conversely, suppose L is onto. Then there exists a basis for W which is of the form $\{Lv_1, \cdots, Lv_n\}$. It follows that $\{v_1, \cdots, v_n\}$ is linearly independent. Hence it is a basis for V by similar reasoning to the above. Then if Lx = 0, it follows that there are scalars $c_i$ such that $x = \sum_i c_i v_i$ and consequently $0 = Lx = \sum_i c_i L v_i$. Therefore, each $c_i = 0$ and so x = 0 also. Thus L is one to one.

Lemma 10.1.6 If $V = V_1 \oplus \cdots \oplus V_r$ and if $\beta_i = \{v_1^i, \cdots, v_{m_i}^i\}$ is a basis for $V_i$, then a basis for V is $\{\beta_1, \cdots, \beta_r\}$. Thus
$$\dim(V) = \sum_{i=1}^r \dim(V_i).$$
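Proof: The vectors in $\{\beta_1, \cdots, \beta_r\}$ span V because every vector of V is a sum of vectors from the $V_i$. Suppose $\sum_{i=1}^r \sum_{j=1}^{m_i} c_{ij} v_j^i = 0$. Then since it is a direct sum, it follows for each i,
$$\sum_{j=1}^{m_i} c_{ij} v_j^i = 0$$
and now since $\{v_1^i, \cdots, v_{m_i}^i\}$ is a basis, each $c_{ij} = 0$.

Here is a fundamental lemma.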
Lemma 10.1.7 Let $L_i$ be in L (V, V ) and suppose for i ≠ j, $L_i L_j = L_j L_i$ and also $L_i$ is one to one on ker ($L_j$) whenever i ≠ j. Then
$$\ker\left(\prod_{i=1}^p L_i\right) = \ker(L_1) \oplus \cdots \oplus \ker(L_p)$$
Here $\prod_{i=1}^p L_i$ is the product of all the linear transformations.
Proof: Note that since the operators commute, $L_j : \ker(L_i) \to \ker(L_i)$. Here is why. If $L_i y = 0$ so that $y \in \ker(L_i)$, then
$$L_i L_j y = L_j L_i y = L_j 0 = 0$$
and so $L_j : \ker(L_i) \to \ker(L_i)$. Next observe that it is obvious that, since the operators commute,
$$\sum_{i=1}^p \ker(L_i) \subseteq \ker\left(\prod_{i=1}^p L_i\right).$$
Suppose
$$\sum_{i=1}^p v_i = 0, \quad v_i \in \ker(L_i),$$
but some $v_i \neq 0$. Then do $\prod_{j \neq i} L_j$ to both sides. Since the linear transformations commute, this results in
$$\prod_{j \neq i} L_j (v_i) = 0$$
which contradicts the assumption that these $L_j$ are one to one on ker ($L_i$) and the observation that they map ker ($L_i$) to ker ($L_i$). Thus if
$$\sum_i v_i = 0, \quad v_i \in \ker(L_i)$$
then each $v_i = 0$. It follows that
$$\ker(L_1) \oplus \cdots \oplus \ker(L_p) \subseteq \ker\left(\prod_{i=1}^p L_i\right) \tag{*}$$
From Sylvester's theorem and the observation about direct sums in Lemma 10.1.6,
$$\sum_{i=1}^p \dim(\ker(L_i)) = \dim(\ker(L_1) \oplus \cdots \oplus \ker(L_p)) \le \dim\left(\ker\left(\prod_{i=1}^p L_i\right)\right) \le \sum_{i=1}^p \dim(\ker(L_i))$$
which implies all these are equal. Now in general, if W is a subspace of V, a finite dimensional vector space and the two have the same dimension, then W = V. This is because W has a basis and if v is not in the span of this basis, then v adjoined to the basis of W would be a linearly independent set so the dimension of V would then be strictly larger than the dimension of W. It follows from * that
$$\ker(L_1) \oplus \cdots \oplus \ker(L_p) = \ker\left(\prod_{i=1}^p L_i\right).$$
10.2
Direct Sums, Block Diagonal Matrices
Let V be a finite dimensional vector space with field of scalars F. Here I will make no
assumption on F. Also suppose A ∈ L (V, V ) .
Recall Lemma 9.4.3 which gives the existence of the minimal polynomial for a linear
transformation A. This is the monic polynomial p which has smallest possible degree such
that p(A) = 0. It is stated again for convenience.
Lemma 10.2.1 Let A ∈ L (V, V ) where V is a finite dimensional vector space of dimension
n with field of scalars F. Then there exists a unique monic polynomial of the form
$$p(\lambda) = \lambda^m + c_{m-1}\lambda^{m-1} + \cdots + c_1\lambda + c_0$$
such that p (A) = 0 and m is as small as possible for this to occur.
Now it is time to consider the notion of a direct sum of subspaces. Recall you can
always assert the existence of a factorization of the minimal polynomial into a product of
irreducible polynomials. This fact will now be used to show how to obtain such a direct
sum of subspaces.
Definition 10.2.2 For A ∈ L (V, V ) where dim (V ) = n, suppose the minimal polynomial is
$$p(\lambda) = \prod_{k=1}^q (\phi_k(\lambda))^{r_k}$$
where the polynomials $\phi_k$ have coefficients in F and are irreducible. Now define the generalized eigenspaces
$$V_k \equiv \ker\left((\phi_k(A))^{r_k}\right)$$
Note that if one of these polynomials $(\phi_k(\lambda))^{r_k}$ is a monic linear polynomial, then the generalized eigenspace would be an eigenspace.
Theorem 10.2.3 In the context of Definition 10.2.2,
$$V = V_1 \oplus \cdots \oplus V_q \tag{10.2}$$
and each $V_k$ is A invariant, meaning $A(V_k) \subseteq V_k$. $\phi_l(A)$ is one to one on each $V_k$ for k ≠ l. If $\beta_i = \{v_1^i, \cdots, v_{m_i}^i\}$ is a basis for $V_i$, then $\{\beta_1, \beta_2, \cdots, \beta_q\}$ is a basis for V.
Proof: It is clear $V_k$ is a subspace which is A invariant because A commutes with $\phi_k(A)^{r_k}$. It is clear the operators $\phi_k(A)^{r_k}$ commute. Thus if $v \in V_k$,
$$\phi_k(A)^{r_k} \phi_l(A)^{r_l} v = \phi_l(A)^{r_l} \phi_k(A)^{r_k} v = \phi_l(A)^{r_l} 0 = 0$$
and so $\phi_l(A)^{r_l} : V_k \to V_k$.
I claim $\phi_l(A)$ is one to one on $V_k$ whenever k ≠ l. The two polynomials $\phi_l(\lambda)$ and $\phi_k(\lambda)^{r_k}$ are relatively prime so there exist polynomials m (λ), n (λ) such that
$$m(\lambda)\phi_l(\lambda) + n(\lambda)\phi_k(\lambda)^{r_k} = 1$$
It follows that the sum of all coefficients of λ raised to a positive power are zero and the constant term on the left is 1. Therefore, using the convention $A^0 = I$ it follows
$$m(A)\phi_l(A) + n(A)\phi_k(A)^{r_k} = I$$
If $v \in V_k$, then from the above,
$$m(A)\phi_l(A) v + n(A)\phi_k(A)^{r_k} v = v$$
Since v is in $V_k$, it follows by definition,
$$m(A)\phi_l(A) v = v$$
and so $\phi_l(A) v \neq 0$ unless v = 0. Thus $\phi_l(A)$ and hence $\phi_l(A)^{r_l}$ is one to one on $V_k$ for every k ≠ l. By Lemma 10.1.7 and the fact that $\ker\left(\prod_{k=1}^q \phi_k(A)^{r_k}\right) = V$, 10.2 is obtained. The claim about the bases follows from Lemma 10.1.6.

You could consider the restriction of A to $V_k$. It turns out that this restriction has minimal polynomial equal to $\phi_k(\lambda)^{r_k}$.
Corollary 10.2.4 Let the minimal polynomial of A be $p(\lambda) = \prod_{k=1}^q \phi_k(\lambda)^{m_k}$ where each $\phi_k$ is irreducible. Let $V_k = \ker(\phi_k(A)^{m_k})$. Then
$$V_1 \oplus \cdots \oplus V_q = V$$
and letting $A_k$ denote the restriction of A to $V_k$, it follows the minimal polynomial of $A_k$ is $\phi_k(\lambda)^{m_k}$.
Proof: Recall the direct sum, $V_1 \oplus \cdots \oplus V_q = V$ where $V_k = \ker(\phi_k(A)^{m_k})$ for $p(\lambda) = \prod_{k=1}^q \phi_k(\lambda)^{m_k}$ the minimal polynomial for A where the $\phi_k(\lambda)$ are all irreducible. Thus each $V_k$ is invariant with respect to A. What is the minimal polynomial of $A_k$, the restriction of A to $V_k$? First note that $\phi_k(A_k)^{m_k}(V_k) = \{0\}$ by definition. Thus if η (λ) is the minimal polynomial for $A_k$ then it must divide $\phi_k(\lambda)^{m_k}$ and so by Corollary 8.3.11 $\eta(\lambda) = \phi_k(\lambda)^{r_k}$ where $r_k \le m_k$. Could $r_k < m_k$? No, this is not possible because then p (λ) would fail to be the minimal polynomial for A. You could substitute for the term $\phi_k(\lambda)^{m_k}$ in the factorization of p (λ) with $\phi_k(\lambda)^{r_k}$ and the resulting polynomial p′ would satisfy p′ (A) = 0. Here is why. From Theorem 10.2.3, a typical x ∈ V is of the form
$$\sum_{i=1}^q v_i, \quad v_i \in V_i$$
Then since all the factors commute,
$$p'(A)\left(\sum_{i=1}^q v_i\right) = \left(\prod_{i \neq k}^q \phi_i(A)^{m_i} \phi_k(A)^{r_k}\right)\left(\sum_{i=1}^q v_i\right)$$
For j ≠ k
$$\prod_{i \neq k}^q \phi_i(A)^{m_i} \phi_k(A)^{r_k} v_j = \prod_{i \neq k, j}^q \phi_i(A)^{m_i} \phi_k(A)^{r_k} \phi_j(A)^{m_j} v_j = 0$$
If j = k,
$$\prod_{i \neq k}^q \phi_i(A)^{m_i} \phi_k(A)^{r_k} v_k = 0$$
which shows p′ (λ) is a monic polynomial having smaller degree than p (λ) such that p′ (A) = 0. Thus the minimal polynomial for $A_k$ is $\phi_k(\lambda)^{m_k}$ as claimed.

How does Theorem 10.2.3 relate to matrices?
Theorem 10.2.5 Suppose V is a vector space with field of scalars F and A ∈ L (V, V ).
Suppose also
V = V1 ⊕ · · · ⊕ Vq
where each Vk is A invariant. (AVk ⊆ Vk ) Also let β k be an ordered basis for Vk and let Ak
denote the restriction of A to $V_k$. Letting $M^k$ denote the matrix of $A_k$ with respect to this basis, it follows the matrix of A with respect to the basis $\{\beta_1, \cdots, \beta_q\}$ is
$$\begin{pmatrix} M^1 & & 0 \\ & \ddots & \\ 0 & & M^q \end{pmatrix}$$
Proof: Let β denote the ordered basis $\{\beta_1, \cdots, \beta_q\}$, $|\beta_k|$ being the number of vectors in $\beta_k$. Let $q_k : \mathbb{F}^{|\beta_k|} \to V_k$ be the usual map such that the following diagram commutes.
$$\begin{array}{ccc} V_k & \xrightarrow{A_k} & V_k \\ q_k \uparrow & \circ & \uparrow q_k \\ \mathbb{F}^{|\beta_k|} & \xrightarrow{M^k} & \mathbb{F}^{|\beta_k|} \end{array}$$
Thus $A_k q_k = q_k M^k$. Then if q is the map from $\mathbb{F}^n$ to V corresponding to the ordered basis β just described,
$$q \begin{pmatrix} 0 & \cdots & x & \cdots & 0 \end{pmatrix}^T = q_k x,$$
where x occupies the positions between $\sum_{i=1}^{k-1} |\beta_i| + 1$ and $\sum_{i=1}^{k} |\beta_i|$. Then M will be the matrix of A with respect to β if and only if a similar diagram to the above commutes. Thus it is required that Aq = qM. However, from the description of q just made, and the invariance of each $V_k$,
$$A q \begin{pmatrix} 0 \\ \vdots \\ x \\ \vdots \\ 0 \end{pmatrix} = A_k q_k x = q_k M^k x = q \begin{pmatrix} M^1 & & & & 0 \\ & \ddots & & & \\ & & M^k & & \\ & & & \ddots & \\ 0 & & & & M^q \end{pmatrix} \begin{pmatrix} 0 \\ \vdots \\ x \\ \vdots \\ 0 \end{pmatrix}$$
It follows that the above block diagonal matrix is the matrix of A with respect to the given ordered basis.

An examination of the proof of the above theorem yields the following corollary.
Corollary 10.2.6 If any β k in the above consists of eigenvectors, then M k is a diagonal
matrix having the corresponding eigenvalues down the diagonal.
It follows that it would be interesting to consider special bases for the vector spaces in
the direct sum. This leads to the Jordan form or more generally other canonical forms such
as the rational canonical form.
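Here is a small worked instance of Theorem 10.2.5 and Corollary 10.2.6. The matrix $A$ and the subspaces $V_1, V_2$ below are chosen only for illustration and do not come from the text. Each $V_k$ is spanned by an eigenvector, so each block $M^k$ is $1 \times 1$ and the matrix of $A$ with respect to $\{\beta_1, \beta_2\}$ is diagonal.

```latex
\[
A=\begin{pmatrix}2&1\\1&2\end{pmatrix},\qquad
V_1=\operatorname{span}\begin{pmatrix}1\\1\end{pmatrix},\qquad
V_2=\operatorname{span}\begin{pmatrix}1\\-1\end{pmatrix},
\]
\[
A\begin{pmatrix}1\\1\end{pmatrix}=3\begin{pmatrix}1\\1\end{pmatrix},\qquad
A\begin{pmatrix}1\\-1\end{pmatrix}=1\begin{pmatrix}1\\-1\end{pmatrix},
\]
\[
S=\begin{pmatrix}1&1\\1&-1\end{pmatrix},\qquad
S^{-1}AS=\frac12\begin{pmatrix}1&1\\1&-1\end{pmatrix}
\begin{pmatrix}3&1\\3&-1\end{pmatrix}
=\begin{pmatrix}3&0\\0&1\end{pmatrix}.
\]
```

The columns of $S$ are the vectors of $\beta_1$ followed by those of $\beta_2$, so $S$ plays the role of the map $q$ in the proof above.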
10.3 Cyclic Sets
It was shown above that for A ∈ L (V, V ) for V a finite dimensional vector space over the
field of scalars F, there exists a direct sum decomposition
V = V1 ⊕ · · · ⊕ Vq
where
\[
V_k = \ker\left(\phi_k(A)^{m_k}\right)
\]
and $\phi_k(\lambda)$ is an irreducible polynomial. Here the minimal polynomial of $A$ was
\[
\prod_{k=1}^{q} \phi_k(\lambda)^{m_k}
\]
Next I will consider the problem of finding a basis for Vk such that the matrix of A
restricted to Vk assumes various forms.
Definition 10.3.1 Letting $x \neq 0$ denote by $\beta_x$ the vectors $\{x, Ax, A^2x, \cdots, A^{m-1}x\}$ where $m$ is the smallest such that $A^m x \in \operatorname{span}\left(x, \cdots, A^{m-1}x\right)$. This is called an $A$ cyclic set. The vectors which result are also called a Krylov sequence. For such a sequence of vectors, $|\beta_x| \equiv m$.
The first thing to notice is that such a Krylov sequence is always linearly independent.
Lemma 10.3.2 Let $\beta_x = \{x, Ax, A^2x, \cdots, A^{m-1}x\}$, $x \neq 0$, where $m$ is the smallest such that $A^m x \in \operatorname{span}\left(x, \cdots, A^{m-1}x\right)$. Then $\beta_x$ is linearly independent.
Proof: Suppose that there are scalars $a_k$, not all zero, such that
\[
\sum_{k=0}^{m-1} a_k A^k x = 0
\]
Then letting $a_r$ be the last nonzero scalar in the sum, you can divide by $a_r$ and solve for $A^r x$ as a linear combination of the $A^j x$ for $j < r \le m-1$, contrary to the definition of $m$. (If $r = 0$, the equation says $a_0 x = 0$, which is contrary to $x \neq 0$.)
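The construction in Definition 10.3.1 can be sketched in a few lines of Python. This is only an illustration over the rationals using exact `Fraction` arithmetic; the helper names `mat_vec`, `rank`, and `cyclic_set` are invented here and are not part of the text.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Return the product A x for a matrix A (list of rows) and a vector x."""
    return [sum(Fraction(a) * xi for a, xi in zip(row, x)) for row in A]


def rank(vectors):
    """Rank of a list of vectors, by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def cyclic_set(A, x):
    """Return beta_x = [x, Ax, ..., A^(m-1) x], with m least such that
    A^m x lies in the span of the earlier vectors."""
    if all(xi == 0 for xi in x):
        raise ValueError("x must be nonzero")
    beta = [[Fraction(xi) for xi in x]]
    while True:
        nxt = mat_vec(A, beta[-1])
        # beta is independent, so nxt is in its span exactly when the rank does not grow
        if rank(beta + [nxt]) == len(beta):
            return beta
        beta.append(nxt)


# Rotation by a quarter turn: A^2 x = -x, so the cyclic set of x = (1, 0) has length 2.
beta = cyclic_set([[0, -1], [1, 0]], [1, 0])
print(len(beta))
```

The loop keeps the invariant that `beta` is linearly independent, which is exactly the content of Lemma 10.3.2.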
For more on the next lemma and the following theorem, see Hoffman and Kunze [15]. I am following the presentation in Friedberg, Insel and Spence [10]. See also Herstein [14] for a different approach. To help organize the ideas in the lemma, here is a diagram.
[Diagram: inside $\ker(\phi(A)^m)$ sit the subspace $W$, with basis $v_1, \ldots, v_s$, and the subspace $U \subseteq \ker(\phi(A))$, containing $x_1, x_2, \ldots, x_p$.]
Lemma 10.3.3 Let $W$ be an $A$ invariant ($AW \subseteq W$) subspace of $\ker(\phi(A)^m)$ for $m$ a positive integer where $\phi(\lambda)$ is an irreducible monic polynomial of degree $d$. Let $U$ be an $A$ invariant subspace of $\ker(\phi(A))$.
If $\{v_1, \cdots, v_s\}$ is a basis for $W$ then if $x \in U \setminus W$,
\[
\{v_1, \cdots, v_s, \beta_x\}
\]
is linearly independent.
There exist vectors $x_1, \cdots, x_p$ each in $U$ such that
\[
\{v_1, \cdots, v_s, \beta_{x_1}, \cdots, \beta_{x_p}\}
\]
is a basis for $U + W$.
Also, if $x \in \ker(\phi(A)^m)$, $|\beta_x| = kd$ where $k \le m$. Here $|\beta_x|$ is the length of $\beta_x$, the degree of the monic polynomial $\eta(\lambda)$ satisfying $\eta(A)x = 0$ with $\eta(\lambda)$ having smallest possible degree.
Proof: Claim: If $x \in \ker \phi(A)$, $x \neq 0$, and $|\beta_x|$ denotes the length of $\beta_x$, then $|\beta_x| = d$ and so
\[
\beta_x = \{x, Ax, A^2x, \cdots, A^{d-1}x\};
\]
also $\operatorname{span}(\beta_x)$ is $A$ invariant, $A(\operatorname{span}(\beta_x)) \subseteq \operatorname{span}(\beta_x)$.
Proof of the claim: Let $m = |\beta_x|$. That is, there exists a monic $\eta(\lambda)$ of degree $m$ with $\eta(A)x = 0$, and $m$ is as small as possible for this to happen. Then from the usual process of division of polynomials, there exist $l(\lambda), r(\lambda)$ such that $r(\lambda) = 0$ or else has smaller degree than that of $\eta(\lambda)$ such that
\[
\phi(\lambda) = \eta(\lambda) l(\lambda) + r(\lambda)
\]
If $r(\lambda) \neq 0$, so that $\deg(r(\lambda)) < \deg(\eta(\lambda))$, then the equation implies $0 = \phi(A)x = r(A)x$ and so $m$ was incorrectly chosen. Hence $r(\lambda) = 0$ and so if $l(\lambda) \neq 1$, then $\eta(\lambda)$ divides $\phi(\lambda)$ contrary to the assumption that $\phi(\lambda)$ is irreducible. Hence $l(\lambda) = 1$ and $\eta(\lambda) = \phi(\lambda)$. The claim about $\operatorname{span}(\beta_x)$ is obvious because $A^d x \in \operatorname{span}(\beta_x)$. This shows the claim.
Suppose now $x \in U \setminus W$ where $U \subseteq \ker(\phi(A))$. Consider
\[
\{v_1, \cdots, v_s, \beta_x\}.
\]
Is this set of vectors independent? Suppose
\[
\sum_{i=1}^{s} a_i v_i + \sum_{j=1}^{d} d_j A^{j-1} x = 0.
\]
If $z \equiv \sum_{j=1}^{d} d_j A^{j-1} x$, then $z \in W \cap \operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right)$. Then also for each $m \le d-1$,
\[
A^m z \in W \cap \operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right)
\]
because $W$ and $\operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right)$ are $A$ invariant. Therefore,
\[
\operatorname{span}\left(z, Az, \cdots, A^{d-1}z\right) \subseteq W \cap \operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right) \subseteq \operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right) \tag{10.3}
\]
Suppose $z \neq 0$. Since $z \in \ker(\phi(A))$, the claim gives $|\beta_z| = d$, and then from Lemma 10.3.2 above, $\{z, Az, \cdots, A^{d-1}z\}$ must be linearly independent. Therefore,
\[
d = \dim\left(\operatorname{span}\left(z, Az, \cdots, A^{d-1}z\right)\right) \le \dim\left(W \cap \operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right)\right) \le \dim\left(\operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right)\right) = d
\]
Thus
\[
W \cap \operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right) = \operatorname{span}\left(x, Ax, \cdots, A^{d-1}x\right)
\]
which would require $x \in W$ but this is assumed not to take place. Hence $z = 0$ and so the linear independence of the $\{v_1, \cdots, v_s\}$ implies each $a_i = 0$. Then the linear independence of $\{x, Ax, \cdots, A^{d-1}x\}$, which follows from Lemma 10.3.2, shows each $d_j = 0$. Thus $\{v_1, \cdots, v_s, x, Ax, \cdots, A^{d-1}x\}$ is linearly independent as claimed.
Let $x \in U \setminus W \subseteq \ker(\phi(A))$. Then it was just shown that $\{v_1, \cdots, v_s, \beta_x\}$ is linearly independent. Let $W_1$ be given by
\[
W_1 \equiv \operatorname{span}\left(v_1, \cdots, v_s, \beta_x\right)
\]
Then $W_1$ is $A$ invariant. If $W_1$ equals $U + W$, then you are done. If not, let $W_1$ play the role of $W$ and pick $x_1 \in U \setminus W_1$ and repeat the argument. Continue till
\[
\operatorname{span}\left(v_1, \cdots, v_s, \beta_{x_1}, \cdots, \beta_{x_n}\right) = U + W
\]
The process stops because $\ker(\phi(A)^m)$ is finite dimensional.
Finally, letting $x \in \ker(\phi(A)^m)$, there is a monic polynomial $\eta(\lambda)$ such that $\eta(A)x = 0$ and $\eta(\lambda)$ is of smallest possible degree, which degree equals $|\beta_x|$. Then
\[
\phi(\lambda)^m = \eta(\lambda) l(\lambda) + r(\lambda)
\]
If $\deg(r(\lambda)) < \deg(\eta(\lambda))$, then $r(A)x = 0$ and $\eta(\lambda)$ was incorrectly chosen. Hence $r(\lambda) = 0$ and so $\eta(\lambda)$ must divide $\phi(\lambda)^m$. Hence by Corollary 8.3.11 $\eta(\lambda) = \phi(\lambda)^k$ where $k \le m$. Thus $|\beta_x| = kd = \deg(\eta(\lambda))$.

With this preparation, here is the main result about a basis for $V$ where $A \in L(V, V)$ and the minimal polynomial for $A$ is $\phi(\lambda)^m$ for $\phi(\lambda)$ an irreducible monic polynomial.
There is a very interesting generalization of this theorem in [15] which pertains to the
existence of complementary subspaces. For an outline of this generalization, see Problem 9
on Page 310.
Theorem 10.3.4 Suppose $A \in L(V, V)$ and the minimal polynomial of $A$ is $\phi(\lambda)^m$ where $\phi(\lambda)$ is a monic irreducible polynomial. Then there exists a basis for $V$ which is of the form $\beta = \{\beta_{x_1}, \cdots, \beta_{x_p}\}$.
Proof: First suppose $m = 1$. Then in Lemma 10.3.3 you can let $W = \{0\}$ and $U = \ker(\phi(A))$. Then by this lemma, there exist $v_1, v_2, \cdots, v_s$ such that $\{\beta_{v_1}, \cdots, \beta_{v_s}\}$ is a basis for $\ker(\phi(A))$. Suppose then that the theorem is true for $m - 1$, $m \ge 2$.
Now let the minimal polynomial for $A$ on $V$ be $\phi(\lambda)^m$ where $\phi(\lambda)$ is monic and irreducible. Then $\phi(A)(V)$ is an invariant subspace of $V$. What is the minimal polynomial of $A$ on $\phi(A)(V)$? Clearly $\phi(A)^{m-1}$ will send everything in $\phi(A)(V)$ to 0. If $\eta(\lambda)$ is the minimal polynomial of $A$ on $\phi(A)(V)$, then
\[
\phi(\lambda)^{m-1} = l(\lambda)\eta(\lambda) + r(\lambda)
\]
and $r(\lambda)$ must equal 0 since otherwise $r(A) = 0$ on $\phi(A)(V)$ and $\eta(\lambda)$ was not minimal. By Corollary 8.3.11, $\eta(\lambda) = \phi(\lambda)^k$ for some $k \le m - 1$. However, it cannot happen that $k < m - 1$ because if so, $\phi(\lambda)^m$ would fail to be the minimal polynomial for $A$ on $V$. By induction, $\phi(A)(V)$ has a basis $\{\beta_{x_1}, \cdots, \beta_{x_p}\}$.
Let $y_j \in V$ be such that $\phi(A)y_j = x_j$. Consider $\{\beta_{y_1}, \cdots, \beta_{y_p}\}$. Are these vectors independent? Suppose
\[
0 = \sum_{i=1}^{p} \sum_{j=1}^{|\beta_{y_i}|} a_{ij} A^{j-1} y_i \equiv \sum_{i=1}^{p} f_i(A) y_i \tag{10.4}
\]
If the sum involved $x_i$ in place of $y_i$, then something could be said because $\{\beta_{x_1}, \cdots, \beta_{x_p}\}$ is a basis. Do $\phi(A)$ to both sides to obtain
\[
0 = \sum_{i=1}^{p} \sum_{j=1}^{|\beta_{y_i}|} a_{ij} A^{j-1} x_i \equiv \sum_{i=1}^{p} f_i(A) x_i
\]
Now $f_i(A)x_i = 0$ for each $i$ since $f_i(A)x_i \in \operatorname{span}(\beta_{x_i})$. Let $\eta_i(\lambda)$ be the monic polynomial of smallest degree such that $\eta_i(A)x_i = 0$. It follows from the usual division algorithm that $\eta_i(\lambda)$ divides $f_i(\lambda)$. Also, $\phi(A)^{m-1}x_i = 0$ and so $\eta_i(\lambda)$ must divide $\phi(\lambda)^{m-1}$. From Corollary 8.3.11, it follows that, since $\phi(\lambda)$ is irreducible, $\eta_i(\lambda) = \phi(\lambda)^k$ for some $k$ with $1 \le k \le m - 1$. Thus $\phi(\lambda)$ divides $\eta_i(\lambda)$ which divides $f_i(\lambda)$. Hence $f_i(\lambda) = \phi(\lambda) g_i(\lambda)$. Now
\[
0 = \sum_{i=1}^{p} f_i(A) y_i = \sum_{i=1}^{p} g_i(A)\phi(A) y_i = \sum_{i=1}^{p} g_i(A) x_i.
\]
By the same reasoning just given, since $g_i(A)x_i \in \operatorname{span}(\beta_{x_i})$, it follows that each $g_i(A)x_i = 0$. Therefore, $f_i(A)y_i = g_i(A)\phi(A)y_i = g_i(A)x_i = 0$. Therefore,
\[
\sum_{j=1}^{|\beta_{y_i}|} a_{ij} A^{j-1} y_i = 0
\]
and by independence of $\beta_{y_i}$, this implies $a_{ij} = 0$.
Next, it follows from the definition that for $W \equiv \operatorname{span}\left(\beta_{y_1}, \cdots, \beta_{y_p}\right)$,
\[
\phi(A)(V) = \operatorname{span}\left(\beta_{x_1}, \cdots, \beta_{x_p}\right) \subseteq \phi(A)\operatorname{span}\left(\beta_{y_1}, \cdots, \beta_{y_p}\right) \equiv \phi(A)(W)
\]
Now $W$ is an $A$ invariant subspace of $V = \ker(\phi(A)^m)$. Use Lemma 10.3.3 again to obtain $\beta_{z_1}, \cdots, \beta_{z_q}$ such that $\{\beta_{z_1}, \cdots, \beta_{z_q}, \beta_{y_1}, \cdots, \beta_{y_p}\}$ is a basis for $\ker(\phi(A)) + W$. From the above, $\phi(A)(W) = \phi(A)(V)$. Let $W' = W + \ker(\phi(A))$. Let $U$ be the restriction of $\phi(A)$ to $W'$ which is also $\phi(A)$ invariant. Then from the above inclusion, it follows that $U(W') = \phi(A)(V)$. Also $\ker(U) = \ker(\phi(A))$. This is because if $x \in \ker(\phi(A))$, then $x \in W'$ and so $Ux = 0$ also. If $Ux = 0$, then $x \in W'$ and $\phi(A)x = 0$. Hence $x \in \ker(\phi(A))$. Thus
\[
\dim(W') = \operatorname{rank}(U) + \dim(\ker(U)) = \operatorname{rank}(\phi(A)) + \dim(\ker(\phi(A))) = \dim(V)
\]
This shows $V = W'$ and so the above yields the desired basis.

10.4 Nilpotent Transformations
Definition 10.4.1 Let $V$ be a vector space over the field of scalars $\mathbb{F}$. Then $N \in L(V, V)$ is called nilpotent if for some $m$, it follows that $N^m = 0$.
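For a concrete matrix, the definition can be checked by forming successive powers. The following is a minimal sketch; the function names `mat_mul` and `nilpotency_index` are invented for this illustration. It relies on the standard fact that a nilpotent $n \times n$ matrix already satisfies $N^n = 0$, so only the first $n$ powers need to be examined.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def nilpotency_index(N):
    """Smallest m >= 1 with N^m = 0, or None if N is not nilpotent."""
    n = len(N)
    P = [row[:] for row in N]          # P holds N^m at the start of pass m
    for m in range(1, n + 1):
        if all(entry == 0 for row in P for entry in row):
            return m
        P = mat_mul(P, N)
    return None


# Ones on the super diagonal: N^2 != 0 but N^3 = 0.
print(nilpotency_index([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))
```

By Lemma 10.4.2 below, the integer returned is also the degree of the minimal polynomial of $N$.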
The following lemma contains some significant observations about nilpotent transformations.
Lemma 10.4.2 Suppose $N^k x \neq 0$. Then $\{x, Nx, \cdots, N^k x\}$ is linearly independent. Also, the minimal polynomial of $N$ is $\lambda^m$ where $m$ is the first such that $N^m = 0$.
Proof: Suppose $\sum_{i=0}^{k} c_i N^i x = 0$ where not all $c_i = 0$. There exists $l$ such that $k \le l < m$ and $N^{l+1}x = 0$ but $N^l x \neq 0$. Then multiply both sides by $N^l$ to conclude that $c_0 = 0$. Next multiply both sides by $N^{l-1}$ to conclude that $c_1 = 0$ and continue this way to obtain that all the $c_i = 0$.
Next consider the claim that $\lambda^m$ is the minimal polynomial. If $p(\lambda)$ is the minimal polynomial, then
\[
p(\lambda) = \lambda^m l(\lambda) + r(\lambda)
\]
where the degree of $r(\lambda)$ is less than $m$ or else $r(\lambda) = 0$. Suppose the degree of $r(\lambda)$ is less than $m$. Then you would have $0 = 0 + r(N)$. If $r(\lambda) = a_0 + a_1\lambda + \cdots + a_s\lambda^s$ for $s \le m - 1$, $a_s \neq 0$, then for any $x \in V$,
\[
0 = a_0 x + a_1 Nx + \cdots + a_s N^s x
\]
If for some $x$, $N^s x \neq 0$, then from the first part of the argument, the above equation could not hold. Hence $N^s x = 0$ for all $x$ and so $N^s = 0$ for some $s < m$, a contradiction to the choice of $m$. It follows that $r(\lambda) = 0$ and so $p(\lambda)$ cannot be the minimal polynomial unless $l(\lambda) = 1$. Hence $p(\lambda) = \lambda^m$ as claimed.

For such a nilpotent transformation, let $\{\beta_{x_1}, \cdots, \beta_{x_q}\}$ be a basis for $\ker(N^m) = V$ where these $\beta_{x_i}$ are cyclic. This basis exists thanks to Theorem 10.3.4. Thus
\[
V = \operatorname{span}\left(\beta_{x_1}\right) \oplus \cdots \oplus \operatorname{span}\left(\beta_{x_q}\right),
\]
each of these subspaces in the above direct sum being $N$ invariant. For $x$ one of the $x_k$, consider $\beta_x$ given by
\[
x, Nx, N^2x, \cdots, N^{r-1}x
\]
where $N^r x$ is in the span of the above vectors. Then by the above lemma, $N^r x = 0$.
By Theorem 10.2.5, the matrix of $N$ with respect to the above basis is the block diagonal matrix
\[
\begin{pmatrix}
M^1 & & 0 \\
 & \ddots & \\
0 & & M^q
\end{pmatrix}
\]
where $M^k$ denotes the matrix of $N$ restricted to $\operatorname{span}\left(\beta_{x_k}\right)$. In computing this matrix, I will order $\beta_{x_k}$ as follows:
\[
\left(N^{r_k-1}x_k, \cdots, x_k\right)
\]
Also the cyclic sets $\beta_{x_1}, \beta_{x_2}, \cdots, \beta_{x_q}$ will be ordered according to length, the length of $\beta_{x_i}$ being at least as large as the length of $\beta_{x_{i+1}}$. Then since $N^{r_k}x_k = 0$, it is now easy to find $M^k$. Using the procedure mentioned above for determining the matrix of a linear transformation,
\[
\begin{pmatrix} 0 & N^{r_k-1}x_k & \cdots & Nx_k \end{pmatrix}
=
\begin{pmatrix} N^{r_k-1}x_k & N^{r_k-2}x_k & \cdots & x_k \end{pmatrix}
\begin{pmatrix}
0 & 1 & & 0 \\
0 & 0 & \ddots & \\
\vdots & \vdots & \ddots & 1 \\
0 & 0 & \cdots & 0
\end{pmatrix}
\]
Thus the matrix $M^k$ is the $r_k \times r_k$ matrix which has ones down the super diagonal and zeros elsewhere. The following convenient notation will be used.
Definition 10.4.3 $J_k(\alpha)$ is a Jordan block if it is a $k \times k$ matrix of the form
\[
J_k(\alpha) =
\begin{pmatrix}
\alpha & 1 & & 0 \\
0 & \ddots & \ddots & \\
\vdots & \ddots & \ddots & 1 \\
0 & \cdots & 0 & \alpha
\end{pmatrix}
\]
In words, there is an unbroken string of ones down the super diagonal and the number α
filling every space on the main diagonal with zeros everywhere else.
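As a small illustration of this notation, the sketch below builds $J_k(\alpha)$ and assembles several blocks into a block diagonal matrix. The function names `jordan_block` and `block_diag` are invented for this example and matrices are represented as plain lists of rows.

```python
def jordan_block(alpha, k):
    """k x k matrix with alpha on the main diagonal and ones on the super diagonal."""
    return [[alpha if i == j else (1 if j == i + 1 else 0) for j in range(k)]
            for i in range(k)]


def block_diag(blocks):
    """Place square blocks down the diagonal of a larger zero matrix."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(b)
    return M


for row in block_diag([jordan_block(0, 2), jordan_block(0, 1)]):
    print(row)
```

With $\alpha = 0$ the blocks are exactly the matrices $M^k$ found above for a nilpotent transformation.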
Then with this definition and the above discussion, the following proposition has been
proved.
Proposition 10.4.4 Let $N \in L(W, W)$ be nilpotent,
\[
N^m = 0
\]
for some $m \in \mathbb{N}$. Here $W$ is a $p$ dimensional vector space with field of scalars $\mathbb{F}$. Then there exists a basis for $W$ such that the matrix of $N$ with respect to this basis is of the form
\[
J =
\begin{pmatrix}
J_{r_1}(0) & & & 0 \\
 & J_{r_2}(0) & & \\
 & & \ddots & \\
0 & & & J_{r_s}(0)
\end{pmatrix}
\]
where $r_1 \ge r_2 \ge \cdots \ge r_s \ge 1$ and $\sum_{i=1}^{s} r_i = p$. In the above, the $J_{r_j}(0)$ is a Jordan block of size $r_j \times r_j$ with 0 down the main diagonal.
In fact, the matrix of the above proposition is unique.
Corollary 10.4.5 Let $J, J'$ both be matrices of the nilpotent linear transformation $N \in L(W, W)$ which are of the form described in Proposition 10.4.4. Then $J = J'$. In fact, if the rank of $J^k$ equals the rank of $(J')^k$ for all nonnegative integers $k$, then $J = J'$.
Proof: Since $J$ and $J'$ are similar, it follows that for each $k$ an integer, $J^k$ and $(J')^k$ are similar. Hence, for each $k$, these matrices have the same rank. Now suppose $J \neq J'$. Note first that
\[
J_r(0)^r = 0, \quad J_r(0)^{r-1} \neq 0.
\]
Denote the blocks of $J$ as $J_{r_k}(0)$ and the blocks of $J'$ as $J_{r_k'}(0)$. Let $k$ be the first such that $J_{r_k}(0) \neq J_{r_k'}(0)$. Suppose that $r_k > r_k'$. By block multiplication and the above observation, it follows that the two matrices $J^{r_k-1}$ and $(J')^{r_k-1}$ are respectively of the forms
\[
\begin{pmatrix}
M_{r_1} & & & & & 0 \\
 & \ddots & & & & \\
 & & M_{r_k} & & & \\
 & & & 0 & & \\
 & & & & \ddots & \\
0 & & & & & 0
\end{pmatrix},
\quad
\begin{pmatrix}
M_{r_1'} & & & & & 0 \\
 & \ddots & & & & \\
 & & M_{r_k'} & & & \\
 & & & 0 & & \\
 & & & & \ddots & \\
0 & & & & & 0
\end{pmatrix}
\]
where $M_{r_j} = M_{r_j'}$ for $j \le k - 1$ but $M_{r_k'}$ is a zero $r_k' \times r_k'$ matrix while $M_{r_k}$ is a larger matrix which is not equal to 0. For example,
\[
M_{r_k} =
\begin{pmatrix}
0 & \cdots & 1 \\
 & \ddots & \vdots \\
0 & & 0
\end{pmatrix}
\]
Thus there are more pivot columns in $J^{r_k-1}$ than in $(J')^{r_k-1}$, contradicting the requirement that $J^k$ and $(J')^k$ have the same rank.
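The corollary says the ranks of the powers determine the block sizes. One way to see this concretely is the standard counting fact, not stated in the text, that for a nilpotent matrix the number of Jordan blocks of size at least $k$ equals $\operatorname{rank}(N^{k-1}) - \operatorname{rank}(N^k)$. The sketch below, with invented helper names `mat_mul`, `rank`, and `jordan_block_sizes`, recovers $r_1 \ge r_2 \ge \cdots$ from the ranks using exact rational arithmetic.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rank(M):
    """Rank by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def jordan_block_sizes(N):
    """Block sizes r1 >= r2 >= ... of a nilpotent matrix N, read off from rank(N^k)."""
    n = len(N)
    ranks = [n]                        # ranks[k] = rank(N^k); N^0 is the identity
    P = [row[:] for row in N]
    while ranks[-1] > 0:
        ranks.append(rank(P))
        if len(ranks) > n + 1:
            raise ValueError("matrix is not nilpotent")
        P = mat_mul(P, N)
    ranks.append(0)
    # at_least[k-1] = number of blocks of size >= k
    at_least = [ranks[k - 1] - ranks[k] for k in range(1, len(ranks))]
    sizes = []
    for k in range(len(at_least), 0, -1):
        larger = at_least[k] if k < len(at_least) else 0
        sizes.extend([k] * (at_least[k - 1] - larger))
    return sizes


print(jordan_block_sizes([[0, 1, 0], [0, 0, 0], [0, 0, 0]]))
```

Because similar matrices have the same ranks in every power, the function returns the same list for any matrix similar to a given $J$, which is the uniqueness statement in computational form.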
10.5 The Jordan Canonical Form
The Jordan canonical form has to do with the case where the minimal polynomial of A ∈
L (V, V ) splits. Thus there exist λk in the field of scalars such that the minimal polynomial
of A is of the form
\[
p(\lambda) = \prod_{k=1}^{r} (\lambda - \lambda_k)^{m_k}
\]
Recall the following which follows from Theorem 9.4.4.
Proposition 10.5.1 Let the minimal polynomial of A ∈ L (V, V ) be given by
\[
p(\lambda) = \prod_{k=1}^{r} (\lambda - \lambda_k)^{m_k}
\]
Then the eigenvalues of A are {λ1 , · · · , λr }.
It follows from Theorem 10.2.3 that
\[
V = \ker\left((A - \lambda_1 I)^{m_1}\right) \oplus \cdots \oplus \ker\left((A - \lambda_r I)^{m_r}\right) \equiv V_1 \oplus \cdots \oplus V_r
\]
where I denotes the identity linear transformation. Without loss of generality, let the
dimensions of the Vk be decreasing from left to right. These Vk are called the generalized
eigenspaces.
It follows from the definition of Vk that (A − λk I) is nilpotent on Vk and clearly each
Vk is A invariant. Therefore from Proposition 10.4.4, and letting Ak denote the restriction
of A to Vk , there exists an ordered basis for Vk , β k such that with respect to this basis, the
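matrix of $(A_k - \lambda_k I)$ is of the form given in that proposition, denoted here by $J^k$. What is the matrix of $A_k$ with respect to $\beta_k$? Letting $\{b_1, \cdots, b_r\} = \beta_k$,
\[
A_k b_j = (A_k - \lambda_k I) b_j + \lambda_k I b_j \equiv \sum_s J^k_{sj} b_s + \sum_s \lambda_k \delta_{sj} b_s = \sum_s \left(J^k_{sj} + \lambda_k \delta_{sj}\right) b_s
\]
and so the matrix of $A_k$ with respect to this basis is $J^k + \lambda_k I$ where $I$ is the identity matrix. Therefore, with respect to the ordered basis $\{\beta_1, \cdots, \beta_r\}$ the matrix of $A$ is in Jordan canonical form. This means the matrix is of the form
\[
\begin{pmatrix}
J(\lambda_1) & & 0 \\
 & \ddots & \\
0 & & J(\lambda_r)
\end{pmatrix}
\tag{10.5}
\]
where $J(\lambda_k)$ is a square matrix, of size $\dim(V_k)$, of the form
\[
\begin{pmatrix}
J_{k_1}(\lambda_k) & & & 0 \\
 & J_{k_2}(\lambda_k) & & \\
 & & \ddots & \\
0 & & & J_{k_r}(\lambda_k)
\end{pmatrix}
\tag{10.6}
\]
where $k_1 \ge k_2 \ge \cdots \ge k_r \ge 1$ and $\sum_{i=1}^{r} k_i = \dim(V_k)$. Here $J_k(\lambda)$ is a $k \times k$ Jordan block of the form
\[
\begin{pmatrix}
\lambda & 1 & & 0 \\
0 & \lambda & \ddots & \\
 & \ddots & \ddots & 1 \\
0 & & 0 & \lambda
\end{pmatrix}
\tag{10.7}
\]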
This proves the existence part of the following fundamental theorem.
Note that if any of the $\beta_k$ consists of eigenvectors, then the corresponding Jordan block will consist of a diagonal matrix having $\lambda_k$ down the main diagonal. This corresponds to $m_k = 1$. The vectors which are in $\ker\left((A - \lambda_k I)^{m_k}\right)$ which are not in $\ker(A - \lambda_k I)$ are called generalized eigenvectors.
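A minimal worked instance may help fix the terminology. The matrix below is chosen for illustration and is not from the text; it is already a single Jordan block $J_2(2)$, its minimal polynomial is $(\lambda - 2)^2$, and $e_2$ is a generalized eigenvector which is not an eigenvector.

```latex
\[
A=\begin{pmatrix}2&1\\0&2\end{pmatrix},\qquad
A-2I=\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad
(A-2I)^2=0,
\]
\[
\ker(A-2I)=\operatorname{span}(e_1),\qquad
\ker\left((A-2I)^2\right)=\mathbb{F}^2,
\]
\[
(A-2I)e_2=e_1\neq 0,\qquad (A-2I)^2e_2=0 .
\]
```

Here the ordered basis $(e_1, e_2) = \left((A - 2I)e_2, e_2\right)$ has the form $\left(N x, x\right)$ used for nilpotent transformations in the previous section, with $N = A - 2I$ and $x = e_2$.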
The following is the main result on the Jordan canonical form.
Theorem 10.5.2 Let V be an n dimensional vector space with field of scalars C or some
other field such that the minimal polynomial of A ∈ L (V, V ) completely factors into powers
of linear factors. Then there exists a unique Jordan canonical form for A as described in
10.5 - 10.7, where uniqueness is in the sense that any two have the same number and size
of Jordan blocks.
Proof: It only remains to verify uniqueness. Suppose there are two, J and J ′ . Then these
are matrices of A with respect to possibly different bases and so they are similar. Therefore,
they have the same minimal polynomials and the generalized eigenspaces have the same
dimension. Thus the size of the matrices J (λk ) and J ′ (λk ) defined by the dimension of
these generalized eigenspaces, also corresponding to the algebraic multiplicity of λk , must
be the same. Therefore, they comprise the same set of positive integers. Thus listing the
eigenvalues in the same order, corresponding blocks J (λk ) , J ′ (λk ) are the same size.
It remains to show that J (λk ) and J ′ (λk ) are not just the same size but also are the
same up to order of the Jordan blocks running down their respective diagonals. It is only
necessary to worry about the number and size of the Jordan blocks making up J (λk ) and
J ′ (λk ) . Since J, J ′ are similar, so are J − λk I and J ′ − λk I.
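Thus the following two matrices are similar
\[
A \equiv
\begin{pmatrix}
J(\lambda_1) - \lambda_k I & & & & 0 \\
 & \ddots & & & \\
 & & J(\lambda_k) - \lambda_k I & & \\
 & & & \ddots & \\
0 & & & & J(\lambda_r) - \lambda_k I
\end{pmatrix}
\]
\[
B \equiv
\begin{pmatrix}
J'(\lambda_1) - \lambda_k I & & & & 0 \\
 & \ddots & & & \\
 & & J'(\lambda_k) - \lambda_k I & & \\
 & & & \ddots & \\
0 & & & & J'(\lambda_r) - \lambda_k I
\end{pmatrix}
\]
and consequently, $\operatorname{rank}(A^m) = \operatorname{rank}(B^m)$ for all $m \in \mathbb{N}$. Also, both $J(\lambda_j) - \lambda_k I$ and $J'(\lambda_j) - \lambda_k I$ are one to one for every $\lambda_j \neq \lambda_k$. Since all the blocks in both of these matrices are one to one except the blocks $J'(\lambda_k) - \lambda_k I$, $J(\lambda_k) - \lambda_k I$, it follows that this requires the two sequences of numbers $\left\{\operatorname{rank}\left((J(\lambda_k) - \lambda_k I)^m\right)\right\}_{m=1}^{\infty}$ and $\left\{\operatorname{rank}\left((J'(\lambda_k) - \lambda_k I)^m\right)\right\}_{m=1}^{\infty}$ must be the same.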
Then
\[
J(\lambda_k) - \lambda_k I \equiv
\begin{pmatrix}
J_{k_1}(0) & & & 0 \\
 & J_{k_2}(0) & & \\
 & & \ddots & \\
0 & & & J_{k_r}(0)
\end{pmatrix}
\]
and a similar formula holds for $J'(\lambda_k)$
\[
J'(\lambda_k) - \lambda_k I \equiv
\begin{pmatrix}
J_{l_1}(0) & & & 0 \\
 & J_{l_2}(0) & & \\
 & & \ddots & \\
0 & & & J_{l_p}(0)
\end{pmatrix}
\]
and it is required to verify that $p = r$ and that the same blocks occur in both. Without loss of generality, let the blocks be arranged according to size with the largest on upper left corner falling to smallest in lower right. Now the desired conclusion follows from Corollary 10.4.5.

Note that if any of the generalized eigenspaces $\ker\left((A - \lambda_k I)^{m_k}\right)$ has a basis of eigenvectors, then it would be possible to use this basis and obtain a diagonal matrix in the block corresponding to $\lambda_k$. By uniqueness, this is the block corresponding to the eigenvalue $\lambda_k$. Thus when this happens, the block in the Jordan canonical form corresponding to $\lambda_k$ is just the diagonal matrix having $\lambda_k$ down the diagonal and there are no generalized eigenvectors.
The Jordan canonical form is very significant when you try to understand powers of a matrix. There exists an $n \times n$ matrix $S$ (the $S$ here is written as $S^{-1}$ in the corollary) such that
\[
A = S^{-1}JS.
\]
Therefore, $A^2 = S^{-1}JSS^{-1}JS = S^{-1}J^2S$ and continuing this way, it follows
\[
A^k = S^{-1}J^kS,
\]
where $J$ is given in the above corollary. Consider $J^k$. By block multiplication,
\[
J^k =
\begin{pmatrix}
J_1^k & & 0 \\
 & \ddots & \\
0 & & J_r^k
\end{pmatrix}.
\]
The matrix $J_s$ is an $m_s \times m_s$ matrix which is of the form
\[
J_s =
\begin{pmatrix}
\alpha & \cdots & * \\
\vdots & \ddots & \vdots \\
0 & \cdots & \alpha
\end{pmatrix}
\tag{10.8}
\]
which can be written in the form
\[
J_s = D + N
\]
for $D$ a multiple of the identity and $N$ an upper triangular matrix with zeros down the main diagonal. Therefore, by the Cayley Hamilton theorem, $N^{m_s} = 0$ because the characteristic equation for $N$ is just $\lambda^{m_s} = 0$. (You could also verify this directly.) Now since $D$ is just a multiple of the identity, it follows that $DN = ND$. Therefore, the usual binomial theorem may be applied and this yields the following equations for $k \ge m_s$.
\[
J_s^k = (D + N)^k = \sum_{j=0}^{k} \binom{k}{j} D^{k-j} N^j = \sum_{j=0}^{m_s} \binom{k}{j} D^{k-j} N^j, \tag{10.9}
\]
the last equality holding because $N^{m_s} = 0$. Thus $J_s^k$ is of the form
\[
J_s^k =
\begin{pmatrix}
\alpha^k & \cdots & * \\
\vdots & \ddots & \vdots \\
0 & \cdots & \alpha^k
\end{pmatrix}.
\]
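The binomial expansion 10.9 can be checked numerically in the special case where $J_s$ is a single Jordan block $J_m(\alpha)$, so that $N^j$ has ones on the $j$th super diagonal and entry $(p, q)$ of the power is $\binom{k}{q-p}\alpha^{k-(q-p)}$. This is a sketch under that assumption; the function names are invented for the illustration and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from math import comb


def jordan_block(alpha, m):
    """m x m Jordan block with alpha on the diagonal."""
    return [[alpha if p == q else Fraction(int(q == p + 1)) for q in range(m)]
            for p in range(m)]


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def power_by_multiplication(J, k):
    """J^k computed by k successive multiplications."""
    n = len(J)
    P = [[Fraction(int(p == q)) for q in range(n)] for p in range(n)]
    for _ in range(k):
        P = mat_mul(P, J)
    return P


def power_by_binomial(alpha, m, k):
    """J_m(alpha)^k from (D + N)^k = sum_j C(k, j) alpha^(k-j) N^j."""
    return [[comb(k, q - p) * alpha ** (k - (q - p)) if 0 <= q - p <= k else Fraction(0)
             for q in range(m)] for p in range(m)]


alpha = Fraction(1, 2)
same = power_by_multiplication(jordan_block(alpha, 3), 5) == power_by_binomial(alpha, 3, 5)
print(same)

# With |alpha| < 1 the polynomial growth of C(k, j) loses to alpha^(k-j).
largest = max(abs(e) for row in power_by_binomial(alpha, 3, 60) for e in row)
print(float(largest))
```

The second computation previews the decay of the entries of $J^k$ when $|\alpha| < 1$, which is the subject of the next lemma.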
Lemma 10.5.3 Suppose $J$ is of the form $J_s$ described above in 10.8 where the constant $\alpha$ on the main diagonal is less than one in absolute value. Then
\[
\lim_{k \to \infty} \left(J^k\right)_{ij} = 0.
\]
Proof: From 10.9, it follows that for large $k$, and $j \le m_s$,
\[
\binom{k}{j} \le \frac{k(k-1)\cdots(k-m_s+1)}{m_s!}.
\]
Therefore, letting $C$ be the largest value of $\left|\left(N^j\right)_{pq}\right|$ for $0 \le j \le m_s$,
\[
\left|\left(J^k\right)_{pq}\right| \le m_s C \left(\frac{k(k-1)\cdots(k-m_s+1)}{m_s!}\right) |\alpha|^{k-m_s}
\]
which converges to zero as $k \to \infty$. This is most easily seen by applying the ratio test to the series
\[
\sum_{k=m_s}^{\infty} \left(\frac{k(k-1)\cdots(k-m_s+1)}{m_s!}\right) |\alpha|^{k-m_s}
\]
and then noting that if a series converges, then the $k$th term converges to zero.

10.6 Exercises
1. In the discussion of nilpotent transformations, it was asserted that if two $n \times n$ matrices $A, B$ are similar, then $A^k$ is also similar to $B^k$. Why is this so? If two matrices are similar, why must they have the same rank?
2. If A, B are both invertible, then they are both row equivalent to the identity matrix.
Are they necessarily similar? Explain.
3. Suppose you have two nilpotent matrices $A, B$ and $A^k$ and $B^k$ both have the same rank for all $k \ge 1$. Does it follow that $A, B$ are similar? What if it is not known that $A, B$ are nilpotent? Does it follow then?
4. When we say a polynomial equals zero, we mean that all the coefficients equal 0. If we
assign a different meaning to it which says that a polynomial p (λ) equals zero when
it is the zero function, (p (λ) = 0 for every λ ∈ F.) does this amount to the same
thing? Is there any difference in the two definitions for ordinary fields like Q? Hint:
Consider for the field of scalars $\mathbb{Z}_2$, the integers mod 2, and consider $p(\lambda) = \lambda^2 + \lambda$.
5. Let A ∈ L (V, V ) where V is a finite dimensional vector space with field of scalars F.
Let p (λ) be the minimal polynomial and suppose ϕ (λ) is any nonzero polynomial such
that ϕ (A) is not one to one and ϕ (λ) has smallest possible degree such that ϕ (A) is
nonzero and not one to one. Show ϕ (λ) must divide p (λ).
6. Let A ∈ L (V, V ) where V is a finite dimensional vector space with field of scalars F.
Let p (λ) be the minimal polynomial and suppose ϕ (λ) is an irreducible polynomial
with the property that ϕ (A) x = 0 for some specific x ̸= 0. Show that ϕ (λ) must
divide p (λ) . Hint: First write p (λ) = ϕ (λ) g (λ) + r (λ) where r (λ) is either 0 or
has degree smaller than the degree of ϕ (λ). If r (λ) = 0 you are done. Suppose it is
not 0. Let η (λ) be the monic polynomial of smallest degree with the property that
η (A) x = 0. Now use the Euclidean algorithm to divide ϕ (λ) by η (λ) . Contradict the
irreducibility of ϕ (λ) .
7. Suppose $A$ is a linear transformation and let the characteristic polynomial be
\[
\det(\lambda I - A) = \prod_{j=1}^{q} \phi_j(\lambda)^{n_j}
\]
where the $\phi_j(\lambda)$ are irreducible. Explain using Corollary 8.3.11 why the irreducible factors of the minimal polynomial are $\phi_j(\lambda)$ and why the minimal polynomial is of the form $\prod_{j=1}^{q} \phi_j(\lambda)^{r_j}$ where $r_j \le n_j$. You can use the Cayley Hamilton theorem if you like.
8. Let
\[
A = \begin{pmatrix}
1 & 0 & 0 \\
0 & 0 & -1 \\
0 & 1 & 0
\end{pmatrix}
\]
Find the minimal polynomial for $A$.
9. Suppose $A$ is an $n \times n$ matrix and let $v$ be a vector. Consider the $A$ cyclic set of vectors $\{v, Av, \cdots, A^{m-1}v\}$ where this is an independent set of vectors but $A^m v$ is a linear combination of the preceding vectors in the list. Show how to obtain a monic polynomial of smallest degree, $m$, $\phi_v(\lambda)$ such that
\[
\phi_v(A)v = 0
\]
Now let $\{w_1, \cdots, w_n\}$ be a basis and let $\phi(\lambda)$ be the least common multiple of the $\phi_{w_k}(\lambda)$. Explain why this must be the minimal polynomial of $A$. Give a reasonably easy algorithm for computing $\phi_v(\lambda)$.
10. Here is a matrix.
\[
\begin{pmatrix}
-7 & -1 & -1 \\
-21 & -3 & -3 \\
70 & 10 & 10
\end{pmatrix}
\]
Using the process of Problem 9 find the minimal polynomial of this matrix. It turns out the characteristic polynomial is $\lambda^3$.
11. Find the minimal polynomial for
\[
A = \begin{pmatrix}
1 & 2 & 3 \\
2 & 1 & 4 \\
-3 & 2 & 1
\end{pmatrix}
\]
by the above technique. Is what you found also the characteristic polynomial?
12. Let A be an n × n matrix with field of scalars C. Letting λ be an eigenvalue, show
the dimension of the eigenspace equals the number of Jordan blocks in the Jordan
canonical form which are associated with λ. Recall the eigenspace is ker (λI − A) .
13. For any n × n matrix, why is the dimension of the eigenspace always less than or
equal to the algebraic multiplicity of the eigenvalue as a root of the characteristic
equation? Hint: Note the algebraic multiplicity is the size of the appropriate block
in the Jordan form.
14. Give an example of two nilpotent matrices which are not similar but have the same
minimal polynomial if possible.
15. Use the existence of the Jordan canonical form for a linear transformation whose
minimal polynomial factors completely to give a proof of the Cayley Hamilton theorem
which is valid for any field of scalars. Hint: First assume the minimal polynomial
factors completely into linear factors. If this does not happen, consider a splitting field
of the minimal polynomial. Then consider the minimal polynomial with respect to
this larger field. How will the two minimal polynomials be related? Show the minimal
polynomial always divides the characteristic polynomial.
16. Here is a matrix. Find its Jordan canonical form by directly finding the eigenvectors and generalized eigenvectors based on these to find a basis which will yield the Jordan form. The eigenvalues are 1 and 2.
\[
\begin{pmatrix}
-3 & -2 & 5 & 3 \\
-1 & 0 & 1 & 2 \\
-4 & -3 & 6 & 4 \\
-1 & -1 & 1 & 3
\end{pmatrix}
\]
Why is it typically impossible to find the Jordan canonical form?
17. People like to consider the solutions of first order linear systems of equations which are of the form
\[
x'(t) = Ax(t)
\]
where here $A$ is an $n \times n$ matrix. From the theorem on the Jordan canonical form, there exist $S$ and $S^{-1}$ such that $A = SJS^{-1}$ where $J$ is a Jordan form. Define $y(t) \equiv S^{-1}x(t)$. Show $y' = Jy$. Now suppose $\Psi(t)$ is an $n \times n$ matrix whose columns are solutions of the above differential equation. Thus
\[
\Psi' = A\Psi
\]
Now let $\Phi$ be defined by $S\Phi S^{-1} = \Psi$. Show
\[
\Phi' = J\Phi.
\]
18. In the above Problem show that
\[
\det(\Psi)' = \operatorname{trace}(A)\det(\Psi)
\]
and so
\[
\det(\Psi(t)) = Ce^{\operatorname{trace}(A)t}
\]
This is called Abel's formula and $\det(\Psi(t))$ is called the Wronskian. Hint: Show it suffices to consider
\[
\Phi' = J\Phi
\]
and establish the formula for $\Phi$. Next let
\[
\Phi = \begin{pmatrix} \phi_1 \\ \vdots \\ \phi_n \end{pmatrix}
\]
where the $\phi_j$ are the rows of $\Phi$. Then explain why
\[
\det(\Phi)' = \sum_{i=1}^{n} \det(\Phi_i) \tag{10.10}
\]
where $\Phi_i$ is the same as $\Phi$ except the $i$th row is replaced with $\phi_i'$ instead of the row $\phi_i$. Now from the form of $J$,
\[
\Phi' = D\Phi + N\Phi
\]
where all the nonzero entries of $N$ lie above the main diagonal. Explain why
\[
\phi_i'(t) = \lambda_i \phi_i(t) + a_i \phi_{i+1}(t)
\]
Now use this in the formula for the derivative of the Wronskian given in 10.10 and use properties of determinants to obtain
\[
\det(\Phi)' = \sum_{i=1}^{n} \lambda_i \det(\Phi).
\]
Obtain Abel's formula
\[
\det(\Phi) = Ce^{\operatorname{trace}(A)t}
\]
and so the Wronskian $\det \Phi$ either vanishes identically or never.
19. Let $A$ be an $n \times n$ matrix and let $J$ be its Jordan canonical form. Recall $J$ is a block diagonal matrix having blocks $J_k(\lambda)$ down the diagonal. Each of these blocks is of the form
\[
J_k(\lambda) = \begin{pmatrix}
\lambda & 1 & & 0 \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda
\end{pmatrix}
\]
Now for $\varepsilon > 0$ given, let the diagonal matrix $D_\varepsilon$ be given by
\[
D_\varepsilon = \begin{pmatrix}
1 & & & 0 \\
 & \varepsilon & & \\
 & & \ddots & \\
0 & & & \varepsilon^{k-1}
\end{pmatrix}
\]
Show that $D_\varepsilon^{-1} J_k(\lambda) D_\varepsilon$ has the same form as $J_k(\lambda)$ but instead of ones down the super diagonal, there is $\varepsilon$ down the super diagonal. That is $J_k(\lambda)$ is replaced with
\[
\begin{pmatrix}
\lambda & \varepsilon & & 0 \\
 & \lambda & \ddots & \\
 & & \ddots & \varepsilon \\
0 & & & \lambda
\end{pmatrix}
\]
Now show that for $A$ an $n \times n$ matrix, it is similar to one which is just like the Jordan canonical form except instead of the blocks having 1 down the super diagonal, it has $\varepsilon$.
20. Let $A$ be in $L(V, V)$ and suppose that $A^p x \neq 0$ for some $x \neq 0$. Show that $A^p e_k \neq 0$ for some $e_k \in \{e_1, \cdots, e_n\}$, a basis for $V$. If you have a matrix which is nilpotent, ($A^m = 0$ for some $m$) will it always be possible to find its Jordan form? Describe how
to do it if this is the case. Hint: First explain why all the eigenvalues are 0. Then
consider the way the Jordan form for nilpotent transformations was constructed in the
above.
21. Suppose A is an n × n matrix and that it has n distinct eigenvalues. How do the minimal polynomial and characteristic polynomials compare? Determine other conditions
based on the Jordan Canonical form which will cause the minimal and characteristic
polynomials to be different.
22. Suppose A is a 3 × 3 matrix and it has at least two distinct eigenvalues. Is it possible
that the minimal polynomial is different than the characteristic polynomial?
23. If A is an n × n matrix of entries from a field of scalars and if the minimal polynomial
of A splits over this field of scalars, does it follow that the characteristic polynomial
of A also splits? Explain why or why not.
24. In proving the uniqueness of the Jordan canonical form, it was asserted that if two $n \times n$ matrices $A, B$ are similar, then they have the same minimal polynomial and also that if this minimal polynomial is of the form $p(\lambda) = \prod_{i=1}^{s} \phi_i(\lambda)^{r_i}$ where the $\phi_i(\lambda)$ are irreducible and monic, then $\ker\left(\phi_i(A)^{r_i}\right)$ and $\ker\left(\phi_i(B)^{r_i}\right)$ have the same dimension. Why is this so? This was what was responsible for the blocks corresponding to an eigenvalue being of the same size.
25. Show that a given complex n × n matrix is non defective (diagonalizable) if and only
if the minimal polynomial has no repeated roots.
26. Describe a straight forward way to determine the minimal polynomial of an n × n
matrix using row operations. Next show that if p (λ) and p′ (λ) are relatively prime,
then p (λ) has no repeated roots. With the above problem, explain how this gives a
way to determine whether a matrix is non defective.
27. In Theorem 10.3.4 show that each cyclic set $\beta_x$ is associated with a monic polynomial $\eta_x(\lambda)$ such that $\eta_x(A)(x) = 0$ and this polynomial has smallest possible degree such that this happens. Show that the cyclic sets $\beta_{x_i}$ can be arranged such that $\eta_{x_{i+1}}(\lambda)$ divides $\eta_{x_i}(\lambda)$.
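28. Show that if $A$ is a complex $n \times n$ matrix, then $A$ and $A^T$ are similar. Hint: Consider a Jordan block. Note that
\[
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
=
\begin{pmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{pmatrix}
\]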
29. Let $A$ be a linear transformation defined on a finite dimensional vector space $V$. Let the minimal polynomial be $\prod_{i=1}^{q} \phi_i(\lambda)^{m_i}$ and let $\left(\beta^i_{v^i_1}, \cdots, \beta^i_{v^i_{r_i}}\right)$ be the cyclic sets such that $\left\{\beta^i_{v^i_1}, \cdots, \beta^i_{v^i_{r_i}}\right\}$ is a basis for $\ker\left(\phi_i(A)^{m_i}\right)$. Let $v = \sum_i \sum_j v^i_j$. Now let $q(\lambda)$ be any polynomial and suppose that
\[
q(A)v = 0
\]
Show that it follows $q(A) = 0$. Hint: First consider the special case where a basis for $V$ is $\{x, Ax, \cdots, A^{n-1}x\}$ and $q(A)x = 0$.
10.7 The Rational Canonical Form
Here one has the minimal polynomial in the form $\prod_{k=1}^{q} \phi_k(\lambda)^{m_k}$ where each $\phi_k(\lambda)$ is an
irreducible monic polynomial. It is not necessarily the case that $\phi_k(\lambda)$ is a linear factor. Thus this case
is completely general and includes the situation where the field is arbitrary. In particular, it
includes the case where the field of scalars is, for example, the rational numbers. This may
be partly why it is called the rational canonical form. As you know, the rational numbers
are notorious for not having roots to polynomial equations which have integer or rational
coefficients.
This canonical form is due to Frobenius. I am following the presentation given in [10]
and there are more details given in this reference. Another good source which has additional
results is [15].
Here is a definition of the concept of a companion matrix.
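Definition 10.7.1 Let
\[ q(\lambda) = a_0 + a_1 \lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n \]
be a monic polynomial. The companion matrix of $q(\lambda)$, denoted as $C(q(\lambda))$, is the matrix
\[
\begin{pmatrix}
0 & \cdots & 0 & -a_0 \\
1 & 0 & & -a_1 \\
 & \ddots & \ddots & \vdots \\
0 & & 1 & -a_{n-1}
\end{pmatrix}
\]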
Proposition 10.7.2 Let q (λ) be a polynomial and let C (q (λ)) be its companion matrix.
Then q (C (q (λ))) = 0.
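Before reading the proof, the definition and the proposition can be spot-checked by machine. The following is a minimal sketch in plain Python using exact rational arithmetic; the helper names `companion`, `matmul` and `poly_at_matrix` are illustrative choices, not notation from the text, and the sample polynomial $\lambda^3 - 24\lambda^2 + 180\lambda - 432$ is just a convenient test case.

```python
from fractions import Fraction

def companion(coeffs):
    """Companion matrix of q(x) = a0 + a1 x + ... + a_{n-1} x^{n-1} + x^n.

    coeffs = [a0, a1, ..., a_{n-1}]; the leading coefficient 1 is implied.
    """
    n = len(coeffs)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)            # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -Fraction(coeffs[i])   # last column is -a0, ..., -a_{n-1}
    return C

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_at_matrix(coeffs, M):
    """Evaluate the monic polynomial with low-order coefficients coeffs at M (Horner's rule)."""
    n = len(M)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    result = identity                        # accounts for the leading coefficient 1
    for a in reversed(coeffs):
        result = matmul(result, M)
        result = [[result[i][j] + a * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result

coeffs = [-432, 180, -24]                    # q(x) = x^3 - 24x^2 + 180x - 432
C = companion(coeffs)
Z = poly_at_matrix(coeffs, C)                # should be the zero matrix
```

Exact fractions are used so that the check works over Q without rounding; the same code would accept any coefficients that `Fraction` can represent.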
Proof: Write C instead of $C(q(\lambda))$ for short. Note that
\[ Ce_1 = e_2,\ Ce_2 = e_3,\ \cdots,\ Ce_{n-1} = e_n \]
Thus
\[ e_k = C^{k-1}e_1,\quad k = 1, \cdots, n \tag{10.11} \]
and so it follows
\[ \left\{ e_1, Ce_1, C^2 e_1, \cdots, C^{n-1}e_1 \right\} \tag{10.12} \]
are linearly independent. Hence these form a basis for $F^n$. Now note that $Ce_n$ is given by
\[ Ce_n = -a_0 e_1 - a_1 e_2 - \cdots - a_{n-1}e_n \]
and from 10.11 this implies
\[ C^n e_1 = -a_0 e_1 - a_1 Ce_1 - \cdots - a_{n-1}C^{n-1}e_1 \]
and so $q(C)e_1 = 0$. Now since 10.12 is a basis, every vector of $F^n$ is of the form $k(C)e_1$
for some polynomial $k(\lambda)$. Therefore, if $v \in F^n$,
\[ q(C)v = q(C)k(C)e_1 = k(C)q(C)e_1 = 0 \]
which shows $q(C) = 0$.
The following theorem is on the existence of the rational canonical form.
Theorem 10.7.3 Let $A \in L(V, V)$ where V is a vector space with field of scalars F and
minimal polynomial $\prod_{i=1}^{q} \phi_i(\lambda)^{m_i}$ where each $\phi_i(\lambda)$ is irreducible and monic. Letting
$V_k \equiv \ker(\phi_k(A)^{m_k})$, it follows
\[ V = V_1 \oplus \cdots \oplus V_q \]
where each $V_k$ is A invariant. Letting $B_k$ denote a basis for $V_k$ and $M^k$ the matrix of the
restriction of A to $V_k$, it follows that the matrix of A with respect to the basis $\{B_1, \cdots, B_q\}$
is the block diagonal matrix of the form
\[
\begin{pmatrix}
M^1 & & 0 \\
 & \ddots & \\
0 & & M^q
\end{pmatrix} \tag{10.13}
\]
If $B_k$ is given as $\{\beta_{v_1}, \cdots, \beta_{v_s}\}$ as described in Theorem 10.3.4 where each $\beta_{v_j}$ is an A
cyclic set of vectors, then the matrix $M^k$ is of the form
\[
M^k = \begin{pmatrix}
C(\phi_k(\lambda)^{r_1}) & & 0 \\
 & \ddots & \\
0 & & C(\phi_k(\lambda)^{r_s})
\end{pmatrix} \tag{10.14}
\]
where the A cyclic sets of vectors may be arranged in order such that the positive integers $r_j$
satisfy $r_1 \geq \cdots \geq r_s$ and $C(\phi_k(\lambda)^{r_j})$ is the companion matrix of the polynomial $\phi_k(\lambda)^{r_j}$.
Proof: By Theorem 10.2.5 the matrix of A with respect to $\{B_1, \cdots, B_q\}$ is of the
form given in 10.13. Now by Theorem 10.3.4 the basis $B_k$ may be chosen in the form
$\{\beta_{v_1}, \cdots, \beta_{v_s}\}$ where each $\beta_{v_k}$ is an A cyclic set of vectors and also it can be assumed the
lengths of these $\beta_{v_k}$ are decreasing. Thus
\[ V_k = \operatorname{span}(\beta_{v_1}) \oplus \cdots \oplus \operatorname{span}(\beta_{v_s}) \]
and it only remains to consider the matrix of A restricted to $\operatorname{span}(\beta_{v_k})$. Then you can
apply Theorem 10.2.5 to get the result in 10.14. Say
\[ \beta_{v_k} = v_k, Av_k, \cdots, A^{d-1}v_k \]
where $\eta(A)v_k = 0$ and the degree of $\eta(\lambda)$ is d, the smallest degree such that this is so, $\eta$
being a monic polynomial. Then by Corollary 8.3.11, $\eta(\lambda) = \phi_k(\lambda)^{r_k}$ where $r_k \leq m_k$. It
remains to consider the matrix of A restricted to $\operatorname{span}(\beta_{v_k})$. Say
\[ \eta(\lambda) = \phi_k(\lambda)^{r_k} = a_0 + a_1\lambda + \cdots + a_{d-1}\lambda^{d-1} + \lambda^d \]
Thus
\[ A^d v_k = -a_0 v_k - a_1 Av_k - \cdots - a_{d-1}A^{d-1}v_k \]
Recall the formalism for finding the matrix of A restricted to this invariant subspace.
\[
\begin{pmatrix} Av_k & A^2 v_k & A^3 v_k & \cdots & -a_0 v_k - a_1 Av_k - \cdots - a_{d-1}A^{d-1}v_k \end{pmatrix}
\]
\[
= \begin{pmatrix} v_k & Av_k & A^2 v_k & \cdots & A^{d-1}v_k \end{pmatrix}
\begin{pmatrix}
0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & & & -a_1 \\
0 & 1 & \ddots & & \vdots \\
\vdots & & \ddots & 0 & -a_{d-2} \\
0 & 0 & \cdots & 1 & -a_{d-1}
\end{pmatrix}
\]
Thus the matrix of the transformation is the above. This is the companion matrix of
$\phi_k(\lambda)^{r_k} = \eta(\lambda)$. In other words, $C = C(\phi_k(\lambda)^{r_k})$ and so $M^k$ has the form claimed in
the theorem.
10.8 Uniqueness
Given A ∈ L (V, V ) where V is a vector space having field of scalars F, the above shows
there exists a rational canonical form for A. Could A have more than one rational canonical
form? Recall the definition of an A cyclic set. For convenience, here it is again.
Definition 10.8.1 Letting $x \neq 0$ denote by $\beta_x$ the vectors $\{x, Ax, A^2 x, \cdots, A^{m-1}x\}$ where
m is the smallest such that $A^m x \in \operatorname{span}(x, \cdots, A^{m-1}x)$.
The following proposition ties these A cyclic sets to polynomials. It is just a review of
ideas used above to prove existence.
Proposition 10.8.2 Let $x \neq 0$ and consider $\{x, Ax, A^2 x, \cdots, A^{m-1}x\}$. Then this is an
A cyclic set if and only if there exists a monic polynomial $\eta(\lambda)$ such that $\eta(A)x = 0$
and among all such polynomials $\psi(\lambda)$ satisfying $\psi(A)x = 0$, $\eta(\lambda)$ has the smallest degree.
If $V = \ker(\phi(A)^m)$ where $\phi(\lambda)$ is monic and irreducible, then for some positive integer
$p \leq m$, $\eta(\lambda) = \phi(\lambda)^p$.
Lemma 10.8.3 Let V be a vector space and $A \in L(V, V)$ has minimal polynomial $\phi(\lambda)^m$
where $\phi(\lambda)$ is irreducible and has degree d. Let the basis for V consist of $\{\beta_{v_1}, \cdots, \beta_{v_s}\}$
where $\beta_{v_k}$ is A cyclic as described above and the rational canonical form for A is the matrix
taken with respect to this basis. Then letting $|\beta_{v_k}|$ denote the number of vectors in $\beta_{v_k}$, it
follows there is only one possible set of numbers $|\beta_{v_k}|$.
Proof: Say $\beta_{v_j}$ is associated with the polynomial $\phi(\lambda)^{p_j}$. Thus, as described above
$|\beta_{v_j}|$ equals $p_j d$. Consider the following table which comes from the A cyclic set
\[ \left\{ v_j, Av_j, \cdots, A^{d-1}v_j, \cdots, A^{p_j d - 1}v_j \right\} \]
\[
\begin{array}{ccccc}
\alpha_0^j & \alpha_1^j & \alpha_2^j & \cdots & \alpha_{d-1}^j \\
v_j & Av_j & A^2 v_j & \cdots & A^{d-1}v_j \\
\phi(A)v_j & \phi(A)Av_j & \phi(A)A^2 v_j & \cdots & \phi(A)A^{d-1}v_j \\
\vdots & \vdots & \vdots & & \vdots \\
\phi(A)^{p_j-1}v_j & \phi(A)^{p_j-1}Av_j & \phi(A)^{p_j-1}A^2 v_j & \cdots & \phi(A)^{p_j-1}A^{d-1}v_j
\end{array}
\]
In the above, $\alpha_k^j$ signifies the vectors below it in the k th column. None of these vectors
below the top row are equal to 0 because the degree of $\phi(\lambda)^{p_j-1}\lambda^{d-1}$ is $dp_j - 1$, which is
less than $p_j d$ and the smallest degree of a nonzero polynomial sending $v_j$ to 0 is $p_j d$. Also,
each of these vectors is in the span of $\beta_{v_j}$ and there are $dp_j$ of them, just as there are $dp_j$
vectors in $\beta_{v_j}$.
Claim: The vectors $\left\{\alpha_0^j, \cdots, \alpha_{d-1}^j\right\}$ are linearly independent.
Proof of claim: Suppose
\[ \sum_{i=0}^{d-1}\sum_{k=0}^{p_j-1} c_{ik}\,\phi(A)^k A^i v_j = 0 \]
Then multiplying both sides by $\phi(A)^{p_j-1}$ this yields
\[ \sum_{i=0}^{d-1} c_{i0}\,\phi(A)^{p_j-1} A^i v_j = 0 \]
Now if any of the ci0 is nonzero this would imply there exists a polynomial having degree
smaller than pj d which sends vj to 0. Since this does not happen, it follows each ci0 = 0.
Thus
\[ \sum_{i=0}^{d-1}\sum_{k=1}^{p_j-1} c_{ik}\,\phi(A)^k A^i v_j = 0 \]
Now multiply both sides by $\phi(A)^{p_j-2}$ and do a similar argument to assert that $c_{i1} = 0$ for
each i. Continuing this way, all the $c_{ik} = 0$ and this proves the claim.
Thus the vectors $\left\{\alpha_0^j, \cdots, \alpha_{d-1}^j\right\}$ are linearly independent and there are $p_j d = |\beta_{v_j}|$
of them. Therefore, they form a basis for $\operatorname{span}(\beta_{v_j})$. Also note that if you list the
columns in reverse order starting from the bottom and going toward the top, the vectors
$\left\{\alpha_0^j, \cdots, \alpha_{d-1}^j\right\}$ yield Jordan blocks in the matrix of $\phi(A)$. Hence, considering all these
vectors $\left\{\alpha_0^j, \cdots, \alpha_{d-1}^j\right\}_{j=1}^{s}$, each listed in the reverse order, the matrix of $\phi(A)$ with respect
to this basis of V is in Jordan canonical form. See Proposition 10.4.4 and Theorem 10.5.2
on existence and uniqueness for the Jordan form. This Jordan form is unique up to order
of the blocks. For a given j, $\left\{\alpha_0^j, \cdots, \alpha_{d-1}^j\right\}$ yields d Jordan blocks of size $p_j$ for $\phi(A)$. The
size and number of Jordan blocks of $\phi(A)$ depends only on $\phi(A)$, hence only on A. Once
A is determined, $\phi(A)$ is determined and hence the number and size of Jordan blocks is
determined, so the exponents $p_j$ are determined and this shows the lengths of the $\beta_{v_j}$, $p_j d$,
are also determined.
Note that if the $p_j$ are known, then so is the rational canonical form because it comes
from blocks which are companion matrices of the polynomials $\phi(\lambda)^{p_j}$. Now here is the main
result.
Theorem 10.8.4 Let V be a vector space having field of scalars F and let A ∈ L (V, V ).
Then the rational canonical form of A is unique up to order of the blocks.
Proof: Let the minimal polynomial of A be $\prod_{k=1}^{q} \phi_k(\lambda)^{m_k}$. Then recall from Corollary
10.2.3
\[ V = V_1 \oplus \cdots \oplus V_q \]
where $V_k = \ker(\phi_k(A)^{m_k})$. Also recall from Corollary 10.2.4 that the minimal polynomial
of the restriction of A to $V_k$ is $\phi_k(\lambda)^{m_k}$. Now apply Lemma 10.8.3 to A restricted to $V_k$.
In the case where two n × n matrices M, N are similar, recall this is equivalent to the
two being matrices of the same linear transformation taken with respect to two different
bases. Hence each is similar to the same rational canonical form.
Example 10.8.5 Here is a matrix.
\[
A = \begin{pmatrix} 5 & -2 & 1 \\ 2 & 10 & -2 \\ 9 & 0 & 9 \end{pmatrix}
\]
Find a similarity transformation which will produce the rational canonical form for A.
The characteristic polynomial is $\lambda^3 - 24\lambda^2 + 180\lambda - 432$. This factors as
\[ (\lambda - 6)^2 (\lambda - 12) \]
It turns out this is also the minimal polynomial. You can see this by plugging in A where
you see λ and observing things don't work if you delete one of the λ − 6 factors. There is
more on this in the exercises. It turns out you can compute the minimal polynomial pretty
easily. Thus $Q^3$ is the direct sum of $\ker\left((A - 6I)^2\right)$ and $\ker(A - 12I)$. Consider the first
of these. You see easily that this is
\[
y \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix},\quad y, z \in Q.
\]
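What about the length of A cyclic sets? It turns out it doesn't matter much. You can start
with either of these and get a cycle of length 2. Let's pick the second one. This leads to the
cycle
\[
\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix},\quad
\begin{pmatrix} -4 \\ -4 \\ 0 \end{pmatrix} = A \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix},\quad
\begin{pmatrix} -12 \\ -48 \\ -36 \end{pmatrix} = A^2 \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
\]
where the last of the three is a linear combination of the first two. Take the first two as
the first two columns of S. To get the third, you need a cycle of length 1 corresponding to
$\ker(A - 12I)$. This yields the eigenvector $\begin{pmatrix} 1 & -2 & 3 \end{pmatrix}^T$. Thus
\[
S = \begin{pmatrix} -1 & -4 & 1 \\ 0 & -4 & -2 \\ 1 & 0 & 3 \end{pmatrix}
\]
Now using Proposition 9.3.10, the rational canonical form for A should be
\[
\begin{pmatrix} -1 & -4 & 1 \\ 0 & -4 & -2 \\ 1 & 0 & 3 \end{pmatrix}^{-1}
\begin{pmatrix} 5 & -2 & 1 \\ 2 & 10 & -2 \\ 9 & 0 & 9 \end{pmatrix}
\begin{pmatrix} -1 & -4 & 1 \\ 0 & -4 & -2 \\ 1 & 0 & 3 \end{pmatrix}
=
\begin{pmatrix} 0 & -36 & 0 \\ 1 & 12 & 0 \\ 0 & 0 & 12 \end{pmatrix}
\]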
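The similarity claimed in this example can be confirmed without computing an inverse: it is enough to check that AS = SR and that S is invertible. The sketch below does this with integer arithmetic in plain Python; the names `R`, `matmul` and `det3` are illustrative choices, not notation from the text.

```python
A = [[5, -2, 1], [2, 10, -2], [9, 0, 9]]
S = [[-1, -4, 1], [0, -4, -2], [1, 0, 3]]
R = [[0, -36, 0], [1, 12, 0], [0, 0, 12]]   # claimed rational canonical form

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

AS = matmul(A, S)
SR = matmul(S, R)
same = (AS == SR)            # AS = SR is equivalent to S^{-1} A S = R when det S != 0
invertible = (det3(S) != 0)
```

Checking AS = SR rather than forming the inverse keeps everything in the integers, so no rounding is involved.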
Example 10.8.6 Here is a matrix.
\[
A = \begin{pmatrix}
12 & -3 & -19 & -14 & 8 \\
-4 & 1 & 1 & 6 & -4 \\
4 & 5 & 5 & -2 & 4 \\
0 & -5 & -5 & 2 & 0 \\
-4 & 3 & 11 & 6 & 0
\end{pmatrix}
\]
Find a basis such that if S is the matrix which has these vectors as columns $S^{-1}AS$ is in
rational canonical form assuming the field of scalars is Q.
First it is necessary to find the minimal polynomial. Of course you can find the characteristic polynomial and then take away factors till you find the minimal polynomial. However,
there is a much better way which is described in the exercises. Leaving out this detail, the
minimal polynomial is
λ3 − 12λ2 + 64λ − 128
This polynomial factors as
\[ (\lambda - 4)\left(\lambda^2 - 8\lambda + 32\right) \equiv \phi_1(\lambda)\,\phi_2(\lambda) \]
where the second factor is irreducible over Q. Consider $\phi_2(\lambda)$ first. Messy computations
yield
\[
\ker(\phi_2(A)) = a \begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ b \begin{pmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ c \begin{pmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
+ d \begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
\]
Now start with one of these basis vectors and look for an A cycle. Picking the first one, you
obtain the cycle
\[
\begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},\quad
\begin{pmatrix} -15 \\ 5 \\ 1 \\ -5 \\ 7 \end{pmatrix}
\]
because the next vector involving $A^2$ yields a vector which is in the span of the above two.
You check this by making the vectors the columns of a matrix and finding the row reduced
echelon form. Clearly this cycle does not span $\ker(\phi_2(A))$, so look for another cycle. Begin
with a vector which is not in the span of these two. The last one works well. Thus another
A cycle is
\[
\begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},\quad
\begin{pmatrix} -16 \\ 4 \\ -4 \\ 0 \\ 8 \end{pmatrix}
\]
It follows a basis for $\ker(\phi_2(A))$ is
\[
\left\{
\begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} -16 \\ 4 \\ -4 \\ 0 \\ 8 \end{pmatrix},
\begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} -15 \\ 5 \\ 1 \\ -5 \\ 7 \end{pmatrix}
\right\}
\]
Finally consider a cycle coming from $\ker(\phi_1(A))$. This amounts to nothing more than
finding an eigenvector for A corresponding to the eigenvalue 4. An eigenvector is
$\begin{pmatrix} -1 & 0 & 0 & 0 & 1 \end{pmatrix}^T$.
Now the desired matrix for the similarity transformation is
\[
S \equiv \begin{pmatrix}
-2 & -16 & -1 & -15 & -1 \\
0 & 4 & 1 & 5 & 0 \\
0 & -4 & 0 & 1 & 0 \\
0 & 0 & 0 & -5 & 0 \\
1 & 8 & 0 & 7 & 1
\end{pmatrix}
\]
Then doing the computations, you get
\[
S^{-1}AS = \begin{pmatrix}
0 & -32 & 0 & 0 & 0 \\
1 & 8 & 0 & 0 & 0 \\
0 & 0 & 0 & -32 & 0 \\
0 & 0 & 1 & 8 & 0 \\
0 & 0 & 0 & 0 & 4
\end{pmatrix}
\]
and you see this is in rational canonical form, the two 2×2 blocks being companion matrices
for the polynomial λ2 −8λ+32 and the 1×1 block being a companion matrix for λ−4. Note
that you could have written this without finding a similarity transformation to produce it.
This follows from the above theory which gave the existence of the rational canonical form.
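The search for A cycles used in this example can be sketched in code. The following is a minimal illustration in plain Python, assuming exact rational arithmetic via `fractions`; the function names `matvec`, `rank` and `cyclic_set` are hypothetical helpers, not notation from the text. A cycle is extended by applying A until the new vector falls in the span of the earlier ones, which is detected by a rank computation (the row reduction mentioned above).

```python
from fractions import Fraction

A = [[12, -3, -19, -14, 8],
     [-4, 1, 1, 6, -4],
     [4, 5, 5, -2, 4],
     [0, -5, -5, 2, 0],
     [-4, 3, 11, 6, 0]]

def matvec(M, v):
    """Matrix times vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def cyclic_set(M, v):
    """Return [v, Mv, ..., M^(m-1) v], stopping once M^m v lies in the span so far."""
    cycle = [v]
    while True:
        nxt = matvec(M, cycle[-1])
        if rank(cycle + [nxt]) == len(cycle):   # nxt adds nothing new
            return cycle
        cycle.append(nxt)

cycle1 = cyclic_set(A, [-1, 1, 0, 0, 0])
cycle2 = cyclic_set(A, [-2, 0, 0, 0, 1])
cycle3 = cyclic_set(A, [-1, 0, 0, 0, 1])        # eigenvector for the eigenvalue 4
basis = cycle2 + cycle1 + cycle3                # same column order as S in the example
```

The starting vectors are the ones chosen in the example; the sketch only automates the cycle construction and does not by itself choose good starting vectors.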
Obviously there is a lot more which could be considered about rational canonical forms.
Just begin with a strange field and start investigating what can be said. One can also derive
more systematic methods for finding the rational canonical form. The advantage of this is
you don’t need to find the eigenvalues in order to compute the rational canonical form and
it can often be computed for this reason, unlike the Jordan form. The uniqueness of this
rational canonical form can be used to determine whether two matrices consisting of entries
in some field are similar.
10.9 Exercises
1. Letting A be a complex n × n matrix, in obtaining the rational canonical form, one
obtains $C^n$ as a direct sum of the form
\[ \operatorname{span}(\beta_{x_1}) \oplus \cdots \oplus \operatorname{span}(\beta_{x_r}) \]
where $\beta_x$ is an ordered cyclic set of vectors, $x, Ax, \cdots, A^{m-1}x$ such that $A^m x$ is in
the span of the previous vectors. Now apply the Gram Schmidt process to the ordered
basis $(\beta_{x_1}, \beta_{x_2}, \cdots, \beta_{x_r})$, the vectors in each $\beta_{x_i}$ listed according to increasing power
of A, thus obtaining an ordered basis $(q_1, \cdots, q_n)$. Letting Q be the unitary matrix
which has these vectors as columns, show that $Q^* AQ$ equals a matrix B which satisfies
$B_{ij} = 0$ if $i - j \geq 2$. Such a matrix is called an upper Hessenberg matrix and this
shows that every n × n matrix is orthogonally similar to an upper Hessenberg matrix.
These are zero below the main sub diagonal, like companion matrices discussed above.
2. In the argument for Theorem 10.2.3 it was shown that $m(A)\phi_l(A)v = v$ whenever
$v \in \ker(\phi_k(A)^{r_k})$. Show that $m(A)$ restricted to $\ker(\phi_k(A)^{r_k})$ is the inverse of the
linear transformation $\phi_l(A)$ on $\ker(\phi_k(A)^{r_k})$.
3. Suppose A is a linear transformation and let the characteristic polynomial be
\[ \det(\lambda I - A) = \prod_{j=1}^{q} \phi_j(\lambda)^{n_j} \]
where the $\phi_j(\lambda)$ are irreducible. Explain using Corollary 8.3.11 why the irreducible
factors of the minimal polynomial are $\phi_j(\lambda)$ and why the minimal polynomial is of
the form $\prod_{j=1}^{q} \phi_j(\lambda)^{r_j}$ where $r_j \leq n_j$. You can use the Cayley Hamilton theorem if
you like.
4. Find the minimal polynomial for
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ -3 & 2 & 1 \end{pmatrix}
\]
by the above technique assuming the field of scalars is the rational numbers. Is what
you found also the characteristic polynomial?
5. Show, using the rational root theorem, the minimal polynomial for A in the above
problem is irreducible with respect to Q. Letting the field of scalars be Q find the
rational canonical form and a similarity transformation which will produce it.
6. Letting the field of scalars be Q, find the rational canonical form for the matrix
\[
\begin{pmatrix}
1 & 2 & 1 & -1 \\
2 & 3 & 0 & 2 \\
1 & 3 & 2 & 4 \\
1 & 2 & 1 & 2
\end{pmatrix}
\]
7. Let $A : Q^3 \to Q^3$ be linear. Suppose the minimal polynomial is $(\lambda - 2)\left(\lambda^2 + 2\lambda + 7\right)$.
Find the rational canonical form. Can you give generalizations of this rather simple
problem to other situations?
8. Find the rational canonical form with respect to the field of scalars equal to Q for the
matrix
\[
A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 1 \end{pmatrix}
\]
Observe that this particular matrix is already a companion matrix of $\lambda^3 - \lambda^2 + \lambda - 1$.
Then find the rational canonical form if the field of scalars equals C or Q + iQ.
9. Let q (λ) be a polynomial and C its companion matrix. Show the characteristic and
minimal polynomial of C are the same and both equal q (λ).
10. ↑Use the existence of the rational canonical form to give a proof of the Cayley Hamilton
theorem valid for any field, even fields like the integers mod p for p a prime. The earlier
proof based on determinants was fine for fields like Q or R where you could let λ → ∞
but it is not clear the same result holds in general.
11. Suppose you have two n×n matrices A, B whose entries are in a field F and suppose G
is an extension of F. For example, you could have F = Q and G = C. Suppose A and
B are similar with respect to the field G. Can it be concluded that they are similar
with respect to the field F? Hint: First show that the two have the same minimal
polynomial over F. Next consider the proof of Lemma 10.8.3 and show that they have
the same rational canonical form with respect to F.
Chapter 11
Markov Processes
11.1 Regular Markov Matrices
The existence of the Jordan form is the basis for the proof of limit theorems for certain
kinds of matrices called Markov matrices.
Definition 11.1.1 An n × n matrix $A = (a_{ij})$ is a Markov matrix if $a_{ij} \geq 0$ for all i, j
and
\[ \sum_i a_{ij} = 1. \]
It may also be called a stochastic matrix or a transition matrix. A Markov or stochastic
matrix is called regular if some power of A has all entries strictly positive. A vector v ∈ Rn ,
is a steady state if Av = v.
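The two conditions in this definition are easy to test mechanically. Here is a minimal sketch in plain Python with exact fractions; the function names `is_markov` and `is_regular` are illustrative, not from the text. One assumption is imported from outside the text: the default cutoff $(n-1)^2 + 1$ on the powers examined is Wielandt's bound for primitive nonnegative matrices, so a caller who prefers not to rely on it can pass `max_power` explicitly.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_markov(A):
    """Nonnegative entries and, for every j, the sum over i of a_ij equals 1."""
    n = len(A)
    nonneg = all(x >= 0 for row in A for x in row)
    columns_ok = all(sum(A[i][j] for i in range(n)) == 1 for j in range(n))
    return nonneg and columns_ok

def is_regular(A, max_power=None):
    """True if some power A^1, ..., A^max_power has all entries strictly positive."""
    n = len(A)
    if max_power is None:
        max_power = (n - 1) ** 2 + 1    # Wielandt's bound; an outside fact, not from the text
    P = A
    for _ in range(max_power):
        if all(x > 0 for row in P for x in row):
            return True
        P = matmul(P, A)
    return False

half = Fraction(1, 2)
A = [[0, half], [1, half]]   # has a zero entry, but A^2 is strictly positive
swap = [[0, 1], [1, 0]]      # powers alternate between swap and the identity
```

Here `A` is a regular Markov matrix even though one entry is zero, while `swap` is a Markov matrix that is not regular.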
Lemma 11.1.2 The property of being a stochastic matrix is preserved by taking products.
It is also true if the sum is of the form $\sum_j a_{ij} = 1$.
Proof: Suppose the sum over a row equals 1 for A and B. Then letting the entries be
denoted by $(a_{ij})$ and $(b_{ij})$ respectively and the entries of AB by $(c_{ij})$,
\[ \sum_i c_{ij} = \sum_i \sum_k a_{ik}b_{kj} = \sum_k \sum_i a_{ik}b_{kj} = \sum_k b_{kj} = 1 \]
It is obvious that when the product is taken, if each $a_{ij}, b_{ij} \geq 0$, then the same will be
true of sums of products of these numbers. Similar reasoning works for the assumption that
$\sum_j a_{ij} = 1$.
The following theorem is convenient for showing the existence of limits.
Theorem 11.1.3 Let A be a real p × p matrix having the properties
1. aij ≥ 0
2. Either $\sum_{i=1}^{p} a_{ij} = 1$ or $\sum_{j=1}^{p} a_{ij} = 1$.
3. The distinct eigenvalues of A are $\{1, \lambda_2, \ldots, \lambda_m\}$ where each $|\lambda_j| < 1$.
Then $\lim_{n\to\infty} A^n = A_\infty$ exists in the sense that $\lim_{n\to\infty} a^n_{ij} = a^\infty_{ij}$, the ij th entry of $A_\infty$.
Here $a^n_{ij}$ denotes the ij th entry of $A^n$. Also, if λ = 1 has algebraic multiplicity r, then
the Jordan block corresponding to λ = 1 is just the r × r identity.
Proof. By the existence of the Jordan form for A, it follows that there exists an invertible
matrix P such that
\[
P^{-1}AP = \begin{pmatrix}
I + N & & & \\
 & J_{r_2}(\lambda_2) & & \\
 & & \ddots & \\
 & & & J_{r_m}(\lambda_m)
\end{pmatrix} = J
\]
where I is r × r for r the multiplicity of the eigenvalue 1 and N is a nilpotent matrix for
which $N^r = 0$. I will show that because of Condition 2, N = 0.
First of all,
\[ J_{r_i}(\lambda_i) = \lambda_i I + N_i \]
where $N_i$ satisfies $N_i^{r_i} = 0$ for some $r_i > 0$. It is clear that $N_i(\lambda_i I) = (\lambda_i I)N_i$ and so
\[ \left(J_{r_i}(\lambda_i)\right)^n = \sum_{k=0}^{n} \binom{n}{k} N_i^k \lambda_i^{n-k} = \sum_{k=0}^{r_i} \binom{n}{k} N_i^k \lambda_i^{n-k} \]
which converges to 0 due to the assumption that $|\lambda_i| < 1$. There are finitely many terms
and a typical one is a matrix whose entries are no larger than an expression of the form
\[ |\lambda_i|^{n-k} C_k\, n(n-1)\cdots(n-k+1) \leq C_k |\lambda_i|^{n-k} n^k \]
which converges to 0 because, by the root test, the series $\sum_{n=1}^{\infty} |\lambda_i|^{n-k} n^k$ converges. Thus
for each $i = 2, \ldots, m$,
\[ \lim_{n\to\infty} \left(J_{r_i}(\lambda_i)\right)^n = 0. \]
By Condition 2, if $a^n_{ij}$ denotes the ij th entry of $A^n$, then either
\[ \sum_{j=1}^{p} a^n_{ij} = 1 \ \text{ or } \ \sum_{i=1}^{p} a^n_{ij} = 1,\quad a^n_{ij} \geq 0. \]
This follows from Lemma 11.1.2. It is obvious each $a^n_{ij} \geq 0$, and so the entries of $A^n$ must
be bounded independent of n.
It follows easily from
\[ \overbrace{P^{-1}AP\,P^{-1}AP\,P^{-1}AP \cdots P^{-1}AP}^{n \text{ times}} = P^{-1}A^n P \]
that
\[ P^{-1}A^n P = J^n \tag{11.1} \]
Hence $J^n$ must also have bounded entries as n → ∞. However, this requirement is
incompatible with an assumption that $N \neq 0$.
If $N \neq 0$, then $N^s \neq 0$ but $N^{s+1} = 0$ for some $1 \leq s \leq r$. Then
\[ (I + N)^n = I + \sum_{k=1}^{s} \binom{n}{k} N^k \]
One of the entries of $N^s$ is nonzero by the definition of s. Let this entry be $n^s_{ij}$. Then this
implies that one of the entries of $(I + N)^n$ is of the form $\binom{n}{s} n^s_{ij}$. This entry dominates the
ij th entries of $\binom{n}{k} N^k$ for all k < s because
\[ \lim_{n\to\infty} \binom{n}{s} \Big/ \binom{n}{k} = \infty \]
Therefore, the entries of $(I + N)^n$ cannot all be bounded. From block multiplication,
\[
P^{-1}A^n P = \begin{pmatrix}
(I + N)^n & & & \\
 & \left(J_{r_2}(\lambda_2)\right)^n & & \\
 & & \ddots & \\
 & & & \left(J_{r_m}(\lambda_m)\right)^n
\end{pmatrix}
\]
and this is a contradiction because entries are bounded on the left and unbounded on the
right.
Since N = 0, the above equation implies $\lim_{n\to\infty} A^n$ exists and equals
\[
P \begin{pmatrix}
I & & & \\
 & 0 & & \\
 & & \ddots & \\
 & & & 0
\end{pmatrix} P^{-1}
\]
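The key step of the proof, that a nonzero nilpotent part forces unbounded powers, can be seen in the smallest possible case. The following worked computation is only an illustration with r = 2 and a specific choice of N; it describes the Jordan block, not a Markov matrix.

```latex
\[
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},\qquad N^2 = 0,\qquad
(I + N)^n = I + \binom{n}{1} N = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}
\]
```

The upper right entry equals n and so is unbounded as n → ∞, whereas the entries of $A^n$, and hence of $P^{-1}A^nP$, stay bounded.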
Are there examples which will cause the eigenvalue condition of this theorem to hold?
The following lemma gives such a condition. It turns out that if aij > 0, not just ≥ 0, then
the eigenvalue condition of the above theorem is valid.
Lemma 11.1.4 Suppose A = (aij ) is a stochastic matrix. Then λ = 1 is an eigenvalue. If
aij > 0 for all i, j, then if µ is an eigenvalue of A, either |µ| < 1 or µ = 1.
Proof: First consider the claim that 1 is an eigenvalue. By definition,
\[ \sum_i 1 \cdot a_{ij} = 1 \]
and so $A^T v = v$ where $v = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}^T$. Since A, $A^T$ have the same eigenvalues, this
shows 1 is an eigenvalue. Suppose then that µ is an eigenvalue. Is |µ| < 1 or µ = 1? Let v
be an eigenvector for $A^T$ and let $|v_i|$ be the largest of the $|v_j|$.
\[ \mu v_i = \sum_j a_{ji} v_j \]
and now multiply both sides by $\overline{\mu v_i}$ to obtain
\[ |\mu|^2 |v_i|^2 = \sum_j a_{ji} v_j \overline{\mu v_i} = \sum_j a_{ji} \operatorname{Re}\left(v_j \overline{\mu v_i}\right) \leq \sum_j a_{ji} |v_i|^2 |\mu| = |\mu| |v_i|^2 \]
Therefore, |µ| ≤ 1. If |µ| = 1, then equality must hold in the above, and so $v_j \overline{v_i \mu}$ must
be real and nonnegative for each j. In particular, this holds for j = i which shows $\bar{\mu}$ is real
and nonnegative. Thus, in this case, µ = 1 because $\bar{\mu} = \mu$ is nonnegative and equal to 1.
The only other case is where |µ| < 1.
Lemma 11.1.5 Let A be any Markov matrix and let v be a vector having all its components
non negative with $\sum_i v_i = c$. Then if w = Av, it follows that $w_i \geq 0$ for all i and $\sum_i w_i = c$.
Proof: From the definition of w,
\[ w_i \equiv \sum_j a_{ij} v_j \geq 0. \]
Also
\[ \sum_i w_i = \sum_i \sum_j a_{ij} v_j = \sum_j \sum_i a_{ij} v_j = \sum_j v_j = c. \]
The following theorem about limits is now easy to obtain.
Theorem 11.1.6 Suppose A is a Markov matrix in which $a_{ij} > 0$ for all i, j and suppose
w is a vector. Then for each i,
\[ \lim_{k\to\infty} \left(A^k w\right)_i = v_i \]
where Av = v. In words, $A^k w$ always converges to a steady state. In addition to this, if
the vector w satisfies $w_i \geq 0$ for all i and $\sum_i w_i = c$, then the vector v will also satisfy the
conditions $v_i \geq 0$, $\sum_i v_i = c$.
Proof: By Lemma 11.1.4, since each $a_{ij} > 0$, the eigenvalues are either 1 or have absolute
value less than 1. Therefore, the claimed limit exists by Theorem 11.1.3. The assertion that
the components are nonnegative and sum to c follows from Lemma 11.1.5. That Av = v
follows from
\[ v = \lim_{n\to\infty} A^n w = \lim_{n\to\infty} A^{n+1} w = A \lim_{n\to\infty} A^n w = Av. \]
It is not hard to generalize the conclusion of this theorem to regular Markov processes.
Corollary 11.1.7 Suppose A is a regular Markov matrix, one for which the entries of $A^k$
are all positive for some k, and suppose w is a vector. Then for each i,
\[ \lim_{n\to\infty} \left(A^n w\right)_i = v_i \]
where Av = v. In words, $A^n w$ always converges to a steady state. In addition to this, if
the vector w satisfies $w_i \geq 0$ for all i and $\sum_i w_i = c$, then the vector v will also satisfy the
conditions $v_i \geq 0$, $\sum_i v_i = c$.
Proof: Let the entries of $A^k$ be all positive for some k. Now suppose that $a_{ij} \geq 0$ for
all i, j and $A = (a_{ij})$ is a Markov matrix. Then if $B = (b_{ij})$ is a Markov matrix with $b_{ij} > 0$
for all ij, it follows that BA is a Markov matrix which has strictly positive entries. This is
because the ij th entry of BA is
\[ \sum_k b_{ik} a_{kj} > 0. \]
Thus, from Lemma 11.1.4, $A^k$ has an eigenvalue equal to 1 for all k sufficiently large, and
all the other eigenvalues have absolute value strictly less than 1. The same must be true of
A. If $v \neq 0$ and $Av = \lambda v$ and |λ| = 1, then $A^k v = \lambda^k v$ and so, by Lemma 11.1.4, $\lambda^m = 1$
if m ≥ k. Thus
\[ 1 = \lambda^{k+1} = \lambda^k \lambda = \lambda \]
By Theorem 11.1.3, $\lim_{n\to\infty} A^n w$ exists. The rest follows as in Theorem 11.1.6.
11.2 Migration Matrices
Definition 11.2.1 Let n locations be denoted by the numbers 1, 2, · · · , n. Also suppose it is
the case that each year $a_{ij}$ denotes the proportion of residents in location j which move to
location i. Also suppose no one escapes or emigrates from without these n locations. This last
assumption requires $\sum_i a_{ij} = 1$. Thus $(a_{ij})$ is a Markov matrix referred to as a migration
matrix.
If $v = (x_1, \cdots, x_n)^T$ where $x_i$ is the population of location i at a given instant, you obtain
the population of location i one year later by computing $\sum_j a_{ij} x_j = (Av)_i$. Therefore, the
population of location i after k years is $\left(A^k v\right)_i$. Furthermore, Corollary 11.1.7 can be used
to predict in the case where A is regular what the long time population will be for the given
locations.
As an example of the above, consider the case where n = 3 and the migration matrix is
of the form
\[
\begin{pmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{pmatrix}.
\]
Now
\[
\begin{pmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{pmatrix}^2
= \begin{pmatrix} .38 & .02 & .15 \\ .28 & .64 & .02 \\ .34 & .34 & .83 \end{pmatrix}
\]
and so the Markov matrix is regular. Therefore, $\left(A^k v\right)_i$ will converge to the ith component
of a steady state. It follows the steady state can be obtained from solving the system
\[
\begin{aligned}
.6x + .1z &= x \\
.2x + .8y &= y \\
.2x + .2y + .9z &= z
\end{aligned}
\]
along with the stipulation that the sum of x, y, and z must equal the constant value present
at the beginning of the process. The solution to this system is
\[ \{y = x,\ z = 4x,\ x = x\}. \]
If the total population at the beginning is 150,000, then you solve the following system
\[ y = x,\quad z = 4x,\quad x + y + z = 150000 \]
whose solution is easily seen to be {x = 25 000, z = 100 000, y = 25 000}. Thus, after a long
time there would be about four times as many people in the third location as in either of
the other two.
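The convergence predicted by Corollary 11.1.7 can also be observed numerically by simply applying the migration matrix many times. The sketch below uses plain Python floats; the starting distribution (everyone in location 1) and the number of steps are arbitrary illustrative choices, since only the total of 150,000 affects the limit.

```python
A = [[0.6, 0.0, 0.1],
     [0.2, 0.8, 0.0],
     [0.2, 0.2, 0.9]]

def matvec(M, v):
    """Matrix times vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def iterate(M, v, steps):
    """Return M^steps v by repeated multiplication."""
    for _ in range(steps):
        v = matvec(M, v)
    return v

start = [150000.0, 0.0, 0.0]        # hypothetical initial populations
limit = iterate(A, start, 200)      # approximately the steady state
```

After 200 steps the result agrees with the steady state (25000, 25000, 100000) found above to well within floating point tolerance.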
11.3 Absorbing States
There is a different kind of Markov process containing so called absorbing states which result
in transition matrices which are not regular. However, Theorem 11.1.3 may still apply. One
such example is the Gambler’s ruin problem. There is a total amount of money denoted by
b. The Gambler starts with an amount j > 0 and gambles till he either loses everything or
gains everything. He does this by playing a game in which he wins with probability p and
loses with probability q. When he wins, the amount of money he has increases by 1 and
when he loses, the amount of money he has decreases by 1. Thus the states are the integers
from 0 to b. Let $p_{ij}$ denote the probability that the gambler has i at the end of a game
given that he had j at the beginning. Let $p^n_{ij}$ denote the probability that the gambler has i
after n games given that he had j initially. Thus
\[ p^{n+1}_{ij} = \sum_k p_{ik}\, p^n_{kj}, \]
and so $p^n_{ij}$ is the ij th entry of $P^n$ where P is the transition matrix. The above description
indicates that this transition probability matrix is of the form
\[
P = \begin{pmatrix}
1 & q & 0 & \cdots & 0 \\
0 & 0 & \ddots & & 0 \\
0 & p & \ddots & q & \vdots \\
\vdots & & \ddots & 0 & 0 \\
0 & \cdots & 0 & p & 1
\end{pmatrix} \tag{11.2}
\]
The absorbing states are 0 and b. In the first, the gambler has lost everything and hence
has nothing else to gamble, so the process stops. In the second, he has won everything and
there is nothing else to gain, so again the process stops.
Consider the eigenvalues of this matrix.
Lemma 11.3.1 Let p, q > 0 and p + q = 1. Then the eigenvalues of
\[
\begin{pmatrix}
0 & q & 0 & \cdots & 0 \\
p & 0 & q & \cdots & 0 \\
0 & p & 0 & \ddots & \vdots \\
\vdots & 0 & \ddots & \ddots & q \\
0 & \vdots & 0 & p & 0
\end{pmatrix}
\]
have absolute value less than 1.
Proof: By Gerschgorin's theorem, (See Page 194) if λ is an eigenvalue, then |λ| ≤ 1.
Now suppose v is an eigenvector for λ. Then
\[
Av = \begin{pmatrix} qv_2 \\ pv_1 + qv_3 \\ \vdots \\ pv_{n-2} + qv_n \\ pv_{n-1} \end{pmatrix}
= \lambda \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_{n-1} \\ v_n \end{pmatrix}.
\]
Suppose |λ| = 1. Let $v_k$ be the first nonzero entry. Then
\[ qv_{k+1} = \lambda v_k \]
and so $|v_{k+1}| > |v_k|$. If $\{|v_j|\}_{j=k}^{m}$ is increasing, then
\[ p|v_{m-1}| + q|v_m| \geq |pv_{m-2} + qv_m| = |\lambda v_{m-1}| = |v_{m-1}| \]
and so $q|v_m| \geq q|v_{m-1}|$. Thus the sequence is increasing. Hence $|v_n| \geq |v_{n-1}| > 0$. However,
the last line states that $p|v_{n-1}| = |v_n|$ which requires that $|v_{n-1}| > |v_n|$, a contradiction.
Now consider the eigenvalues of 11.2. For P given there,
\[
P - \lambda I = \begin{pmatrix}
1 - \lambda & q & 0 & \cdots & 0 \\
0 & -\lambda & \ddots & & 0 \\
0 & p & \ddots & q & \vdots \\
\vdots & & \ddots & -\lambda & 0 \\
0 & \cdots & 0 & p & 1 - \lambda
\end{pmatrix}
\]
and so, expanding the determinant of the matrix along the first column and then along the
last column yields
\[
(1 - \lambda)^2 \det \begin{pmatrix}
-\lambda & q & & \\
p & \ddots & \ddots & \\
 & \ddots & -\lambda & q \\
 & & p & -\lambda
\end{pmatrix}.
\]
The roots of the polynomial after $(1 - \lambda)^2$ have absolute value less than 1 because they are
just the eigenvalues of a matrix of the sort in Lemma 11.3.1. It follows that the conditions
of Theorem 11.1.3 apply and therefore, $\lim_{n\to\infty} P^n$ exists.
Of course, the above transition matrix models many other kinds of problems. It is called
a Markov process with two absorbing states, sometimes a random walk with two absorbing
states.
It is interesting to find the probability that the gambler loses all his money. This is given
by $\lim_{n\to\infty} p^n_{0j}$. From the transition matrix for the gambler's ruin problem, it follows that
\[ p^n_{0j} = \sum_k p^{n-1}_{0k}\, p_{kj} = q\,p^{n-1}_{0(j-1)} + p\,p^{n-1}_{0(j+1)} \ \text{ for } j \in [1, b-1], \]
\[ p^n_{00} = 1, \ \text{ and } \ p^n_{0b} = 0. \]
Assume here that p ≠ q. Now it was shown above that $\lim_{n\to\infty} p^n_{0j}$ exists. Denote by $P_j$
this limit. Then the above becomes much simpler if written as
\[ P_j = qP_{j-1} + pP_{j+1} \ \text{ for } j \in [1, b-1], \tag{11.3} \]
\[ P_0 = 1 \ \text{ and } \ P_b = 0. \tag{11.4} \]
It is only required to find a solution to the above difference equation with boundary
conditions. To do this, look for a solution in the form $P_j = r^j$ and use the difference equation
with boundary conditions to find the correct values of r. Thus you need
\[ r^j = qr^{j-1} + pr^{j+1} \]
and so to find r you need to have $pr^2 - r + q = 0$, and so the solutions for r are
\[ r = \frac{1}{2p}\left(1 + \sqrt{1 - 4pq}\right),\quad \frac{1}{2p}\left(1 - \sqrt{1 - 4pq}\right) \]
Now
\[ \sqrt{1 - 4pq} = \sqrt{1 - 4p(1 - p)} = \sqrt{1 - 4p + 4p^2} = 1 - 2p. \]
Thus the two values of r simplify to
\[ \frac{1}{2p}(1 + 1 - 2p) = \frac{q}{p},\quad \frac{1}{2p}\left(1 - (1 - 2p)\right) = 1 \]
Therefore, for any choice of $C_i$, i = 1, 2,
\[ C_1 + C_2 \left(\frac{q}{p}\right)^j \]
will solve the difference equation. Now choose $C_1, C_2$ to satisfy the boundary conditions
11.4. Thus you need to have
\[ C_1 + C_2 = 1,\quad C_1 + C_2 \left(\frac{q}{p}\right)^b = 0 \]
It follows that
\[ C_2 = \frac{p^b}{p^b - q^b},\quad C_1 = \frac{q^b}{q^b - p^b} \]
Thus
\[
P_j = \frac{q^b}{q^b - p^b} + \frac{p^b}{p^b - q^b}\left(\frac{q}{p}\right)^j
= \frac{q^b}{q^b - p^b} - \frac{p^{b-j}q^j}{q^b - p^b}
= \frac{q^j\left(q^{b-j} - p^{b-j}\right)}{q^b - p^b}
\]
To find the solution in the case of a fair game, one could take the $\lim_{p\to 1/2}$ of the above
solution. Taking this limit, you get
\[ P_j = \frac{b - j}{b}. \]
You could also verify directly in the case where p = q = 1/2 in 11.3 and 11.4 that $P_j = 1$
and $P_j = j$ are two solutions to the difference equation and then proceed as before.
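The closed form for the ruin probability can be compared with the limit it is supposed to describe by iterating the recurrence for $p^n_{0j}$ directly. The following is a minimal sketch in plain Python floats; the function names and the sample values p = 0.6, b = 5 and 2000 steps are illustrative assumptions, not taken from the text.

```python
def ruin_closed_form(j, b, p):
    """P_j from the formula above; uses (b - j)/b in the fair case p = q."""
    q = 1 - p
    if p == q:
        return (b - j) / b
    return (q**b - p**(b - j) * q**j) / (q**b - p**b)

def ruin_by_iteration(b, p, steps=2000):
    """Iterate p^n_{0j} = q p^{n-1}_{0(j-1)} + p p^{n-1}_{0(j+1)} with p^n_{00} = 1, p^n_{0b} = 0."""
    q = 1 - p
    r = [1.0] + [0.0] * b            # n = 0: ruined only if the gambler starts with 0
    for _ in range(steps):
        new = r[:]                   # r[0] = 1 and r[b] = 0 are carried over unchanged
        for j in range(1, b):
            new[j] = q * r[j - 1] + p * r[j + 1]
        r = new
    return r

b = 5
biased = ruin_by_iteration(b, 0.6)
fair = ruin_by_iteration(b, 0.5)
```

Both the biased and the fair game are covered, so the sketch also illustrates the limiting formula $(b - j)/b$.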
11.4 Exercises
1. Suppose the migration matrix for three locations is
\[
\begin{pmatrix} .5 & 0 & .3 \\ .3 & .8 & 0 \\ .2 & .2 & .7 \end{pmatrix}.
\]
Find a comparison for the populations in the three locations after a long time.
2. Show that if $\sum_i a_{ij} = 1$, then if $A = (a_{ij})$, then the sum of the entries of Av equals
the sum of the entries of v. Thus it does not matter whether $a_{ij} \geq 0$ for this to be so.
3. If A satisfies the conditions of the above problem, can it be concluded that $\lim_{n\to\infty} A^n$
exists?
4. Give an example of a non regular Markov matrix which has an eigenvalue equal to
−1.
5. Show that when a Markov matrix is non defective, all of the above theory can be proved
very easily. In particular, prove the theorem about the existence of $\lim_{n\to\infty} A^n$ if the
eigenvalues are either 1 or have absolute value less than 1.
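6. Find a formula for $A^n$ where
\[ A = \begin{pmatrix} \frac{5}{2} & -\frac{1}{2} & 0 & -1 \\ 5 & 0 & 0 & -4 \\ \frac{7}{2} & -\frac{1}{2} & \frac{1}{2} & -\frac{5}{2} \\ \frac{7}{2} & -\frac{1}{2} & 0 & -2 \end{pmatrix}. \]
Does $\lim_{n\to\infty} A^n$ exist? Note that all the rows sum to 1. Hint: This matrix is similar to a diagonal matrix. The eigenvalues are $1, -1, \frac{1}{2}, \frac{1}{2}$.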
7. Find a formula for $A^n$ where
\[ A = \begin{pmatrix} 2 & -\frac{1}{2} & \frac{1}{2} & -1 \\ 4 & 0 & 1 & -4 \\ \frac{5}{2} & -\frac{1}{2} & 1 & -2 \\ 3 & -\frac{1}{2} & \frac{1}{2} & -2 \end{pmatrix}. \]
Note that the rows sum to 1 in this matrix also. Hint: This matrix is not similar to a diagonal matrix but you can find the Jordan form and consider this in order to obtain a formula for this product. The eigenvalues are $1, -1, \frac{1}{2}, \frac{1}{2}$.
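8. Find $\lim_{n\to\infty} A^n$ if it exists for the matrix
\[ A = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & 0 \\ -\frac{1}{2} & \frac{1}{2} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2} & \frac{3}{2} & 0 \\ \frac{3}{2} & \frac{3}{2} & \frac{3}{2} & 1 \end{pmatrix}. \]
The eigenvalues are $\frac{1}{2}, 1, 1, 1$.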
9. Give an example of a matrix $A$ which has eigenvalues which are either equal to $1$, $-1$, or have absolute value strictly less than 1 but which has the property that $\lim_{n\to\infty} A^n$ does not exist.
10. If $A$ is an $n \times n$ matrix such that all the eigenvalues have absolute value less than 1, show $\lim_{n\to\infty} A^n = 0$.
11. Find an example of a $3 \times 3$ matrix $A$ such that $\lim_{n\to\infty} A^n$ does not exist but $\lim_{r\to\infty} A^{5r}$ does exist.
12. If A is a Markov matrix and B is similar to A, does it follow that B is also a Markov
matrix?
13. In Theorem 11.1.3 suppose everything is unchanged except that you assume either $\sum_j a_{ij} \le 1$ or $\sum_i a_{ij} \le 1$. Would the same conclusion be valid? What if you don't insist that each $a_{ij} \ge 0$? Would the conclusion hold in this case?
14. Let $V$ be an $n$ dimensional vector space, let $A \in L(V, V)$, and let $x \in V$ with $x \ne 0$. Consider $\beta_x \equiv x, Ax, \cdots, A^{m-1}x$ where
\[ A^m x \in \mathrm{span}\left(x, Ax, \cdots, A^{m-1}x\right) \]
and $m$ is the smallest such that the above inclusion in the span takes place. Show that $\{x, Ax, \cdots, A^{m-1}x\}$ must be linearly independent. Next suppose $\{v_1, \cdots, v_n\}$ is a basis for $V$. Consider $\beta_{v_i}$ as just discussed, having length $m_i$. Thus $A^{m_i} v_i$ is a linear combination of $v_i, Av_i, \cdots, A^{m_i - 1} v_i$ for $m_i$ as small as possible. Let $p_{v_i}(\lambda)$ be the monic polynomial which expresses this linear combination. Thus $p_{v_i}(A) v_i = 0$ and the degree of $p_{v_i}(\lambda)$ is as small as possible for this to take place. Show that the minimal polynomial for $A$ must be the monic polynomial which is the least common multiple of these polynomials $p_{v_i}(\lambda)$.
15. If A is a complex Hermitian n × n matrix which has all eigenvalues nonnegative, show
that there exists a complex Hermitian matrix B such that BB = A.
16. ↑Suppose $A, B$ are $n \times n$ real Hermitian matrices and they both have all nonnegative eigenvalues. Show that $\det(A + B) \ge \det(A) + \det(B)$. Hint: Use the above problem and the Cauchy Binet theorem. Let $P^2 = A$, $Q^2 = B$ where $P, Q$ are Hermitian and nonnegative. Then
\[ A + B = \begin{pmatrix} P & Q \end{pmatrix} \begin{pmatrix} P \\ Q \end{pmatrix}. \]
17. Suppose
\[ B = \begin{pmatrix} \alpha & c^* \\ b & A \end{pmatrix} \]
is an $(n+1) \times (n+1)$ Hermitian nonnegative matrix where $\alpha$ is a scalar and $A$ is $n \times n$. Show that $\alpha$ must be real, $c = b$, and $A = A^*$, $A$ is nonnegative, and that if $\alpha = 0$, then $b = 0$. Otherwise, $\alpha > 0$.
18. ↑If $A$ is an $n \times n$ complex Hermitian and nonnegative matrix, show that there exists an upper triangular matrix $B$ such that $B^* B = A$. Hint: Prove this by induction. It is obviously true if $n = 1$. Now if you have an $(n+1) \times (n+1)$ Hermitian nonnegative matrix, then from the above problem, it is of the form
\[ \begin{pmatrix} \alpha^2 & \alpha b^* \\ \alpha b & A \end{pmatrix}, \quad \alpha \text{ real}. \]
19. ↑ Suppose $A$ is a nonnegative Hermitian matrix (all eigenvalues are nonnegative) which is partitioned as
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \]
where $A_{11}, A_{22}$ are square matrices. Show that $\det(A) \le \det(A_{11}) \det(A_{22})$. Hint: Use the above problem to factor $A$ getting
\[ A = \begin{pmatrix} B_{11}^* & 0^* \\ B_{12}^* & B_{22}^* \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix}. \]
Next argue that $A_{11} = B_{11}^* B_{11}$, $A_{22} = B_{12}^* B_{12} + B_{22}^* B_{22}$. Use the Cauchy Binet theorem to argue that $\det(A_{22}) = \det(B_{12}^* B_{12} + B_{22}^* B_{22}) \ge \det(B_{22}^* B_{22})$. Then explain why
\[ \det(A) = \det(B_{11}^*) \det(B_{22}^*) \det(B_{11}) \det(B_{22}) = \det(B_{11}^* B_{11}) \det(B_{22}^* B_{22}). \]
20. ↑ Prove the inequality of Hadamard. If $A$ is a Hermitian matrix which is nonnegative (all eigenvalues are nonnegative), then $\det(A) \le \prod_i A_{ii}$.
Chapter 12
Inner Product Spaces
12.1
General Theory
It is assumed here that the field of scalars is either R or C. The usual example of an inner
product space is Cn or Rn as described earlier. However, there are many other inner product
spaces and the topic is of such importance that it seems appropriate to discuss the general
theory of these spaces.
Definition 12.1.1 A vector space X is said to be a normed linear space if there exists a
function, denoted by |·| : X → [0, ∞) which satisfies the following axioms.
1. |x| ≥ 0 for all x ∈ X, and |x| = 0 if and only if x = 0.
2. |ax| = |a| |x| for all a ∈ F.
3. |x + y| ≤ |x| + |y| .
This function |·| is called a norm.
The notation ||x|| is also often used. Not all norms are created equal. There are many
geometric properties which they may or may not possess. There is also a concept called an
inner product which is discussed next. It turns out that the best norms come from an inner
product.
Definition 12.1.2 A mapping (·, ·) : V × V → F is called an inner product if it satisfies the following axioms.
1. $(x, y) = \overline{(y, x)}$.
2. $(x, x) \ge 0$ for all $x \in V$ and equals zero if and only if $x = 0$.
3. $(ax + by, z) = a(x, z) + b(y, z)$ whenever $a, b \in F$.
Note that 1 and 3 imply $(x, ay + bz) = \bar{a}(x, y) + \bar{b}(x, z)$.
Then a norm is given by
\[ (x, x)^{1/2} \equiv |x|. \]
It remains to verify this really is a norm.
Definition 12.1.3 A normed linear space in which the norm comes from an inner product
as just described is called an inner product space.
Example 12.1.4 Let $V = \mathbb{C}^n$ with the inner product given by
\[ (x, y) \equiv \sum_{k=1}^n x_k \overline{y_k}. \]
This is an example of a complex inner product space already discussed.
Example 12.1.5 Let $V = \mathbb{R}^n$,
\[ (x, y) = x \cdot y \equiv \sum_{j=1}^n x_j y_j. \]
This is an example of a real inner product space.
Example 12.1.6 Let $V$ be any finite dimensional vector space and let $\{v_1, \cdots, v_n\}$ be a basis. Decree that
\[ (v_i, v_j) \equiv \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \]
and define the inner product by
\[ (x, y) \equiv \sum_{i=1}^n x_i \overline{y_i} \]
where
\[ x = \sum_{i=1}^n x_i v_i, \quad y = \sum_{i=1}^n y_i v_i. \]
The above is well defined because $\{v_1, \cdots, v_n\}$ is a basis. Thus the components $x_i$ associated with any given $x \in V$ are uniquely determined.
This example shows there is no loss of generality when studying finite dimensional vector
spaces with field of scalars R or C in assuming the vector space is actually an inner product
space. The following theorem was presented earlier with slightly different notation.
Theorem 12.1.7 (Cauchy Schwarz) In any inner product space
\[ |(x, y)| \le |x| |y|, \]
where $|x| \equiv (x, x)^{1/2}$.
Proof: Let $\omega \in \mathbb{C}$, $|\omega| = 1$, and $\bar{\omega}(x, y) = |(x, y)| = \mathrm{Re}(x, y\omega)$. Let
\[ F(t) = (x + t y \omega, x + t \omega y). \]
Then from the axioms of the inner product,
\[ F(t) = |x|^2 + 2t \,\mathrm{Re}(x, \omega y) + t^2 |y|^2 \ge 0. \]
This yields
\[ |x|^2 + 2t |(x, y)| + t^2 |y|^2 \ge 0. \]
If $|y| = 0$, then the inequality requires that $|(x, y)| = 0$ since otherwise, you could pick large negative $t$ and contradict the inequality. If $|y| > 0$, it follows from the quadratic formula that
\[ 4 |(x, y)|^2 - 4 |x|^2 |y|^2 \le 0. \]
∎
Earlier it was claimed that the inner product defines a norm. In this next proposition this claim is proved.
Proposition 12.1.8 For an inner product space, $|x| \equiv (x, x)^{1/2}$ does specify a norm.
Proof: All the axioms are obvious except the triangle inequality. To verify this,
\[ |x + y|^2 \equiv (x + y, x + y) \equiv |x|^2 + |y|^2 + 2\,\mathrm{Re}(x, y) \le |x|^2 + |y|^2 + 2 |(x, y)| \le |x|^2 + |y|^2 + 2 |x| |y| = (|x| + |y|)^2. \]
∎
The best norms of all are those which come from an inner product because of the following
identity which is known as the parallelogram identity.
Proposition 12.1.9 If $(V, (\cdot, \cdot))$ is an inner product space then for $|x| \equiv (x, x)^{1/2}$, the following identity holds.
\[ |x + y|^2 + |x - y|^2 = 2 |x|^2 + 2 |y|^2. \]
It turns out that the validity of this identity is equivalent to the existence of an inner
product which determines the norm as described above. These sorts of considerations are
topics for more advanced courses on functional analysis.
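The three facts just discussed (the Cauchy Schwarz inequality, the triangle inequality, and the parallelogram identity) can be observed numerically in $\mathbb{C}^n$ with the usual inner product. The following is only a sketch in plain Python; the sample vectors and the helper names `inner` and `norm` are hypothetical choices made for illustration.

```python
import math

def inner(x, y):
    """Usual inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm determined by the inner product, |x| = (x, x)^(1/2)."""
    return math.sqrt(inner(x, x).real)

# Two arbitrary sample vectors in C^3.
x = [1 + 2j, -3j, 2 + 0j]
y = [0.5 - 1j, 4 + 0j, -1 + 1j]
x_plus_y = [a + b for a, b in zip(x, y)]
x_minus_y = [a - b for a, b in zip(x, y)]

cauchy_schwarz_holds = abs(inner(x, y)) <= norm(x) * norm(y)
triangle_holds = norm(x_plus_y) <= norm(x) + norm(y)
# Left side minus right side of the parallelogram identity; zero up to rounding.
parallelogram_gap = (norm(x_plus_y) ** 2 + norm(x_minus_y) ** 2
                     - 2 * norm(x) ** 2 - 2 * norm(y) ** 2)
```

A numerical check of this kind illustrates the statements for particular vectors; it is of course no substitute for the proofs above.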
Definition 12.1.10 A basis for an inner product space, $\{u_1, \cdots, u_n\}$ is an orthonormal basis if
\[ (u_k, u_j) = \delta_{kj} \equiv \begin{cases} 1 & \text{if } k = j \\ 0 & \text{if } k \ne j \end{cases}. \]
Note that if a list of vectors satisfies the above condition for being an orthonormal set, then the list of vectors is automatically linearly independent. To see this, suppose
\[ \sum_{j=1}^n c_j u_j = 0. \]
Then taking the inner product of both sides with $u_k$,
\[ 0 = \sum_{j=1}^n c_j (u_j, u_k) = \sum_{j=1}^n c_j \delta_{jk} = c_k. \]
12.2
The Gram Schmidt Process
Lemma 12.2.1 Let $X$ be a finite dimensional inner product space of dimension $n$ whose basis is $\{x_1, \cdots, x_n\}$. Then there exists an orthonormal basis for $X$, $\{u_1, \cdots, u_n\}$, which has the property that for each $k \le n$, $\mathrm{span}(x_1, \cdots, x_k) = \mathrm{span}(u_1, \cdots, u_k)$.
Proof: Let $\{x_1, \cdots, x_n\}$ be a basis for $X$. Let $u_1 \equiv x_1 / |x_1|$. Thus for $k = 1$, $\mathrm{span}(u_1) = \mathrm{span}(x_1)$ and $\{u_1\}$ is an orthonormal set. Now suppose for some $k < n$, $u_1, \cdots, u_k$ have been chosen such that $(u_j, u_l) = \delta_{jl}$ and $\mathrm{span}(x_1, \cdots, x_k) = \mathrm{span}(u_1, \cdots, u_k)$. Then define
\[ u_{k+1} \equiv \frac{x_{k+1} - \sum_{j=1}^k (x_{k+1}, u_j) u_j}{\left| x_{k+1} - \sum_{j=1}^k (x_{k+1}, u_j) u_j \right|}, \tag{12.1} \]
where the denominator is not equal to zero because the $x_j$ form a basis and so
\[ x_{k+1} \notin \mathrm{span}(x_1, \cdots, x_k) = \mathrm{span}(u_1, \cdots, u_k). \]
Thus by induction,
\[ u_{k+1} \in \mathrm{span}(u_1, \cdots, u_k, x_{k+1}) = \mathrm{span}(x_1, \cdots, x_k, x_{k+1}). \]
Also, $x_{k+1} \in \mathrm{span}(u_1, \cdots, u_k, u_{k+1})$ which is seen easily by solving 12.1 for $x_{k+1}$ and it follows
\[ \mathrm{span}(x_1, \cdots, x_k, x_{k+1}) = \mathrm{span}(u_1, \cdots, u_k, u_{k+1}). \]
If $l \le k$, then letting $C$ denote the reciprocal of the denominator in 12.1,
\[ (u_{k+1}, u_l) = C \left( (x_{k+1}, u_l) - \sum_{j=1}^k (x_{k+1}, u_j)(u_j, u_l) \right) = C \left( (x_{k+1}, u_l) - \sum_{j=1}^k (x_{k+1}, u_j) \delta_{lj} \right) = C \left( (x_{k+1}, u_l) - (x_{k+1}, u_l) \right) = 0. \]
The vectors, $\{u_j\}_{j=1}^n$, generated in this way are therefore an orthonormal basis because each vector has unit length. ∎
The process by which these vectors were generated is called the Gram Schmidt process. The following corollary is obtained from the above process.
The following corollary is obtained from the above process.
Corollary 12.2.2 Let X be a finite dimensional inner product space of dimension n whose
basis is {u1 , · · · , uk , xk+1 , · · · , xn } . Then if {u1 , · · · , uk } is orthonormal, then the Gram
Schmidt process applied to the given list of vectors in order leaves {u1 , · · · , uk } unchanged.
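The construction in the proof of Lemma 12.2.1 translates directly into a short procedure. The following is a sketch in plain Python for vectors in $\mathbb{C}^n$ with the usual inner product; the names `inner` and `gram_schmidt` and the tolerance `1e-12` are assumptions made for illustration, not part of the text.

```python
import math

def inner(x, y):
    """Usual inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def gram_schmidt(basis):
    """Return orthonormal u_1, ..., u_n with span(u_1..u_k) = span(x_1..x_k)."""
    ortho = []
    for x in basis:
        # Subtract the components of x along the vectors already built.
        r = list(x)
        for u in ortho:
            c = inner(x, u)
            r = [ri - c * ui for ri, ui in zip(r, u)]
        length = math.sqrt(inner(r, r).real)
        if length < 1e-12:
            # The denominator in the defining formula would vanish.
            raise ValueError("vectors are not linearly independent")
        ortho.append([ri / length for ri in r])
    return ortho

# Example: a basis of R^3 that is not orthogonal.
us = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
```

This follows the formula of the lemma literally (often called classical Gram Schmidt). In floating point arithmetic it can lose orthogonality for nearly dependent vectors, which is one reason numerical software usually prefers other methods such as the QR factorization.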
Lemma 12.2.3 Suppose $\{u_j\}_{j=1}^n$ is an orthonormal basis for an inner product space $X$. Then for all $x \in X$,
\[ x = \sum_{j=1}^n (x, u_j) u_j. \]
Proof: Since $\{u_j\}_{j=1}^n$ is a basis, there exist unique scalars $\{\alpha_j\}$ such that
\[ x = \sum_{j=1}^n \alpha_j u_j. \]
It only remains to identify $\alpha_k$. From the properties of the inner product,
\[ (x, u_k) = \sum_{j=1}^n \alpha_j (u_j, u_k) = \sum_{j=1}^n \alpha_j \delta_{jk} = \alpha_k. \]
∎
The following theorem is of fundamental importance. First note that a subspace of an
inner product space is also an inner product space because you can use the same inner
product.
Theorem 12.2.4 Let $M$ be a subspace of $X$, a finite dimensional inner product space and let $\{x_i\}_{i=1}^m$ be an orthonormal basis for $M$. Then if $y \in X$ and $w \in M$,
\[ |y - w|^2 = \inf \left\{ |y - z|^2 : z \in M \right\} \tag{12.2} \]
if and only if
\[ (y - w, z) = 0 \tag{12.3} \]
for all $z \in M$. Furthermore,
\[ w = \sum_{i=1}^m (y, x_i) x_i \tag{12.4} \]
is the unique element of $M$ which has this property. It is called the orthogonal projection.
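Before the proof, here is a small numerical illustration of formula 12.4 and the orthogonality condition 12.3. It is a sketch only: the subspace, the vector `y`, and the helper names `inner` and `project` are hypothetical choices, with $M$ taken to be the span of two orthonormal vectors in $\mathbb{R}^3$.

```python
import math

def inner(x, y):
    """Usual inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def project(y, orthonormal_basis):
    """Orthogonal projection w = sum_i (y, x_i) x_i onto the span of the basis."""
    w = [0j] * len(y)
    for u in orthonormal_basis:
        c = inner(y, u)
        w = [wi + c * ui for wi, ui in zip(w, u)]
    return w

s = 1 / math.sqrt(2)
basis_M = [[1, 0, 0], [0, s, s]]   # orthonormal basis for a plane M in R^3
y = [3, 1, 5]
w = project(y, basis_M)            # approximately (3, 3, 3)
residual = [yi - wi for yi, wi in zip(y, w)]

# Any other point of M, for comparison of distances.
z = [2, 1, 1]
dist_w = math.sqrt(inner(residual, residual).real)
dist_z = math.sqrt(sum(abs(yi - zi) ** 2 for yi, zi in zip(y, z)))
```

Here `residual` is orthogonal to both basis vectors of $M$, and `dist_w` does not exceed `dist_z`, in agreement with the theorem.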
Proof: Let $t \in \mathbb{R}$. Then from the properties of the inner product,
\[ |y - (w + t(z - w))|^2 = |y - w|^2 + 2t \,\mathrm{Re}(y - w, w - z) + t^2 |z - w|^2. \tag{12.5} \]
If $(y - w, z) = 0$ for all $z \in M$, then letting $t = 1$, the middle term in the above expression vanishes and so $|y - z|^2$ is minimized when $z = w$.
Conversely, if 12.2 holds, then the middle term of 12.5 must also vanish since otherwise, you could choose small real $t$ such that
\[ |y - w|^2 > |y - (w + t(z - w))|^2. \]
Here is why. If $\mathrm{Re}(y - w, w - z) < 0$, then let $t$ be very small and positive. The middle term in 12.5 will then be more negative than the last term is positive and the right side of this formula will then be less than $|y - w|^2$. If $\mathrm{Re}(y - w, w - z) > 0$ then choose $t$ small and negative to achieve the same result.
It follows, letting $z_1 = w - z$ that
\[ \mathrm{Re}(y - w, z_1) = 0 \]
for all $z_1 \in M$. Now letting $\omega \in \mathbb{C}$ be such that $\omega (y - w, z_1) = |(y - w, z_1)|$,
\[ |(y - w, z_1)| = (y - w, \bar{\omega} z_1) = \mathrm{Re}(y - w, \bar{\omega} z_1) = 0, \]
which proves the first part of the theorem since $z_1$ is arbitrary.
It only remains to verify that $w$ given in 12.4 satisfies 12.3 and is the only point of $M$ which does so. To do this, note that if $c_i, d_i$ are scalars, then the properties of the inner product and the fact the $\{x_i\}$ are orthonormal implies
\[ \left( \sum_{i=1}^m c_i x_i, \sum_{j=1}^m d_j x_j \right) = \sum_{i=1}^m c_i \overline{d_i}. \]
By Lemma 12.2.3,
\[ z = \sum_{i=1}^m (z, x_i) x_i \]
and so
\[ \left( y - \sum_{i=1}^m (y, x_i) x_i, z \right) = \left( y - \sum_{i=1}^m (y, x_i) x_i, \sum_{i=1}^m (z, x_i) x_i \right) \]
\[ = \sum_{i=1}^m \overline{(z, x_i)} (y, x_i) - \left( \sum_{i=1}^m (y, x_i) x_i, \sum_{j=1}^m (z, x_j) x_j \right) \]
\[ = \sum_{i=1}^m \overline{(z, x_i)} (y, x_i) - \sum_{i=1}^m (y, x_i) \overline{(z, x_i)} = 0. \]
This shows $w$ given in 12.4 does minimize the function, $z \to |y - z|^2$ for $z \in M$. It only remains to verify uniqueness. Suppose then that $w_i$, $i = 1, 2$ minimizes this function of $z$ for $z \in M$. Then from what was shown above,
\[ |y - w_1|^2 = |y - w_2 + w_2 - w_1|^2 = |y - w_2|^2 + 2\,\mathrm{Re}(y - w_2, w_2 - w_1) + |w_2 - w_1|^2 = |y - w_2|^2 + |w_2 - w_1|^2 \le |y - w_2|^2, \]
the last equal sign holding because $w_2$ is a minimizer and the last inequality holding because $w_1$ minimizes. Hence $|w_2 - w_1|^2 = 0$ and so $w_1 = w_2$. ∎
12.3
Riesz Representation Theorem
The next theorem is one of the most important results in the theory of inner product spaces.
It is called the Riesz representation theorem.
Theorem 12.3.1 Let $f \in L(X, F)$ where $X$ is an inner product space of dimension $n$. Then there exists a unique $z \in X$ such that for all $x \in X$,
\[ f(x) = (x, z). \]
Proof: First I will verify uniqueness. Suppose $z_j$ works for $j = 1, 2$. Then for all $x \in X$,
\[ 0 = f(x) - f(x) = (x, z_1 - z_2) \]
and so $z_1 = z_2$.
It remains to verify existence. By Lemma 12.2.1, there exists an orthonormal basis, $\{u_j\}_{j=1}^n$. Define
\[ z \equiv \sum_{j=1}^n \overline{f(u_j)} u_j. \]
Then using Lemma 12.2.3,
\[ (x, z) = \left( x, \sum_{j=1}^n \overline{f(u_j)} u_j \right) = \sum_{j=1}^n f(u_j) (x, u_j) = f\left( \sum_{j=1}^n (x, u_j) u_j \right) = f(x). \]
∎
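The existence part of the proof is constructive, so it can be tried out in $\mathbb{C}^n$ with the standard basis playing the role of the orthonormal basis $\{u_j\}$. The following is a sketch under that assumption; the functional `f`, its coefficients, and the helper name `riesz_vector` are hypothetical choices for illustration.

```python
def inner(x, y):
    """Usual inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def riesz_vector(f, n):
    """Return z with f(x) = (x, z), built as z = sum_j conj(f(e_j)) e_j."""
    z = []
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]
        z.append(complex(f(e_j)).conjugate())
    return z

# A sample linear functional on C^3 with arbitrarily chosen coefficients.
coeffs = [2 - 1j, 3j, -1 + 0j]

def f(x):
    return sum(a * xk for a, xk in zip(coeffs, x))

z = riesz_vector(f, 3)
x = [1 + 1j, 2 - 2j, 0.5j]
difference = f(x) - inner(x, z)   # zero up to rounding
```

Note that `z` consists of the conjugates of the coefficients of `f`, which is exactly why the conjugates appear in the definition of $z$ in the proof.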
Corollary 12.3.2 Let $A \in L(X, Y)$ where $X$ and $Y$ are two inner product spaces of finite dimension. Then there exists a unique $A^* \in L(Y, X)$ such that
\[ (Ax, y)_Y = (x, A^* y)_X \]
for all $x \in X$ and $y \in Y$. The following formula holds
\[ (\alpha A + \beta B)^* = \bar{\alpha} A^* + \bar{\beta} B^*. \]
Proof: Let $f_y \in L(X, F)$ be defined as
\[ f_y(x) \equiv (Ax, y)_Y. \tag{12.6} \]
Then by the Riesz representation theorem, there exists a unique element of $X$, $A^*(y)$ such that
\[ (Ax, y)_Y = (x, A^*(y))_X. \]
It only remains to verify that $A^*$ is linear. Let $a$ and $b$ be scalars. Then for all $x \in X$,
\[ (x, A^*(a y_1 + b y_2))_X \equiv (Ax, (a y_1 + b y_2))_Y \equiv \bar{a}(Ax, y_1) + \bar{b}(Ax, y_2) \equiv \bar{a}(x, A^*(y_1)) + \bar{b}(x, A^*(y_2)) = (x, a A^*(y_1) + b A^*(y_2)). \]
Since this holds for every $x$, it follows
\[ A^*(a y_1 + b y_2) = a A^*(y_1) + b A^*(y_2) \]
which shows $A^*$ is linear as claimed.
Consider the last assertion that $*$ is conjugate linear.
\[ \left( x, (\alpha A + \beta B)^* y \right) \equiv ((\alpha A + \beta B) x, y) = \alpha (Ax, y) + \beta (Bx, y) = \alpha (x, A^* y) + \beta (x, B^* y) \]
\[ = (x, \bar{\alpha} A^* y) + (x, \bar{\beta} B^* y) = \left( x, \left( \bar{\alpha} A^* + \bar{\beta} B^* \right) y \right). \]
Since $x$ is arbitrary,
\[ (\alpha A + \beta B)^* y = \left( \bar{\alpha} A^* + \bar{\beta} B^* \right) y \]
and since this is true for all $y$,
\[ (\alpha A + \beta B)^* = \bar{\alpha} A^* + \bar{\beta} B^*. \]
∎
Definition 12.3.3 The linear map, $A^*$ is called the adjoint of $A$. In the case when $A : X \to X$ and $A = A^*$, $A$ is called a self adjoint map. Such a map is also called Hermitian.
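For a matrix acting between $\mathbb{C}^n$ and $\mathbb{C}^m$ with the usual inner products, the adjoint is the conjugate transpose, and the defining relation $(Ax, y) = (x, A^* y)$ can be checked directly. The following sketch uses plain Python lists; the matrix, the vectors, and the helper names `matvec` and `adjoint` are hypothetical choices for illustration.

```python
def inner(x, y):
    """Usual inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def matvec(M, x):
    """Matrix times vector, with M stored as a list of rows."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def adjoint(M):
    """Conjugate transpose of M."""
    rows, cols = len(M), len(M[0])
    return [[complex(M[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]

M = [[1 + 2j, 0, 3j],
     [2, -1j, 1 - 1j]]          # a 2 x 3 matrix, mapping C^3 to C^2
x = [1j, 2, -1 + 1j]            # vector in C^3
y = [3 - 1j, 0.5j]              # vector in C^2

lhs = inner(matvec(M, x), y)            # (Mx, y) computed in C^2
rhs = inner(x, matvec(adjoint(M), y))   # (x, M* y) computed in C^3
```

The two inner products are taken in different spaces, which mirrors the subscripts $Y$ and $X$ in the corollary.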
Theorem 12.3.4 Let $M$ be an $m \times n$ matrix. Then $M^* = \left( \overline{M} \right)^T$; in words, the transpose of the conjugate of $M$ is equal to the adjoint.
Proof: Using the definition of the inner product in $\mathbb{C}^n$,
\[ (Mx, y) = (x, M^* y) \equiv \sum_i x_i \overline{\sum_j (M^*)_{ij} y_j} = \sum_{i,j} \overline{(M^*)_{ij}} \, \overline{y_j} \, x_i. \]
Also
\[ (Mx, y) = \sum_j \sum_i M_{ji} \overline{y_j} x_i. \]
Since $x, y$ are arbitrary vectors, it follows that $M_{ji} = \overline{(M^*)_{ij}}$ and so, taking conjugates of both sides,
\[ M^*_{ij} = \overline{M_{ji}}. \]
∎
The next theorem is interesting. You have a $p$ dimensional subspace of $F^n$ where $F = \mathbb{R}$ or $\mathbb{C}$. Of course this might be "slanted". However, there is a linear transformation $Q$ which preserves distances which maps this subspace to $F^p$.
Theorem 12.3.5 Suppose $V$ is a subspace of $F^n$ having dimension $p \le n$. Then there exists a $Q \in L(F^n, F^n)$ such that
\[ QV \subseteq \mathrm{span}(e_1, \cdots, e_p) \]
and $|Qx| = |x|$ for all $x$. Also
\[ Q^* Q = Q Q^* = I. \]
Proof: By Lemma 12.2.1 there exists an orthonormal basis for $V$, $\{v_i\}_{i=1}^p$. By using the Gram Schmidt process this may be extended to an orthonormal basis of the whole space, $F^n$,
\[ \{v_1, \cdots, v_p, v_{p+1}, \cdots, v_n\}. \]
Now define $Q \in L(F^n, F^n)$ by $Q(v_i) \equiv e_i$ and extend linearly. If $\sum_{i=1}^n x_i v_i$ is an arbitrary element of $F^n$,
\[ \left| Q\left( \sum_{i=1}^n x_i v_i \right) \right|^2 = \left| \sum_{i=1}^n x_i e_i \right|^2 = \sum_{i=1}^n |x_i|^2 = \left| \sum_{i=1}^n x_i v_i \right|^2. \]
It remains to verify that $Q^* Q = Q Q^* = I$. To do so, let $x, y \in F^n$. Then
\[ (Q(x + y), Q(x + y)) = (x + y, x + y). \]
Thus
\[ |Qx|^2 + |Qy|^2 + 2\,\mathrm{Re}(Qx, Qy) = |x|^2 + |y|^2 + 2\,\mathrm{Re}(x, y) \]
and since $Q$ preserves norms, it follows that for all $x, y \in F^n$,
\[ \mathrm{Re}(Qx, Qy) = \mathrm{Re}(x, Q^* Q y) = \mathrm{Re}(x, y). \]
Thus
\[ \mathrm{Re}(x, Q^* Q y - y) = 0 \tag{12.7} \]
for all $x, y$. Let $\omega$ be a complex number such that $|\omega| = 1$ and
\[ \omega (x, Q^* Q y - y) = |(x, Q^* Q y - y)|. \]
Then from 12.7,
\[ 0 = \mathrm{Re}(\omega x, Q^* Q y - y) = \mathrm{Re}\, \omega (x, Q^* Q y - y) = |(x, Q^* Q y - y)| \]
and since $x$ is arbitrary, it follows that for all $y$,
\[ Q^* Q y - y = 0. \]
Thus
\[ I = Q^* Q. \]
Similarly $Q Q^* = I$. ∎
Note that it is actually shown that $QV = \mathrm{span}(e_1, \cdots, e_p)$ and that in case $p = n$ one obtains that a linear transformation which maps an orthonormal basis to an orthonormal basis is unitary.
12.4
The Tensor Product Of Two Vectors
Definition 12.4.1 Let X and Y be inner product spaces and let x ∈ X and y ∈ Y. Define
the tensor product of these two vectors, y ⊗ x, an element of L (X, Y ) by
y ⊗ x (u) ≡ y (u, x)X .
This is also called a rank one transformation because the image of this transformation is
contained in the span of the vector, y.
The verification that this is a linear map is left to you. Be sure to verify this! The
following lemma has some of the most important properties of this linear transformation.
Lemma 12.4.2 Let $X, Y, Z$ be inner product spaces. Then for $\alpha$ a scalar,
\[ (\alpha (y \otimes x))^* = \bar{\alpha} \, x \otimes y \tag{12.8} \]
\[ (z \otimes y_1)(y_2 \otimes x) = (y_2, y_1) \, z \otimes x \tag{12.9} \]
Proof: Let $u \in X$ and $v \in Y$. Then
\[ (\alpha (y \otimes x) u, v) = (\alpha (u, x) y, v) = \alpha (u, x)(y, v) \]
and
\[ (u, \bar{\alpha} \, x \otimes y (v)) = (u, \bar{\alpha} (v, y) x) = \alpha (y, v)(u, x). \]
Therefore, this verifies 12.8.
To verify 12.9, let $u \in X$.
\[ (z \otimes y_1)(y_2 \otimes x)(u) = (u, x)(z \otimes y_1)(y_2) = (u, x)(y_2, y_1) z \]
and
\[ (y_2, y_1) \, z \otimes x (u) = (y_2, y_1)(u, x) z. \]
Since the two linear transformations on both sides of 12.9 give the same answer for every $u \in X$, it follows the two transformations are the same. ∎
Definition 12.4.3 Let $X, Y$ be two vector spaces. Then define for $A, B \in L(X, Y)$ and $\alpha \in F$, new elements of $L(X, Y)$ denoted by $A + B$ and $\alpha A$ as follows.
\[ (A + B)(x) \equiv Ax + Bx, \quad (\alpha A) x \equiv \alpha (Ax). \]
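The rank one map $y \otimes x$ of Definition 12.4.1 and the identities of Lemma 12.4.2 are easy to experiment with. The sketch below represents $y \otimes x$ as a Python function $u \mapsto (u, x) y$ on vectors in $\mathbb{C}^n$; the helper name `tensor` and all sample vectors are hypothetical choices for illustration.

```python
def inner(x, y):
    """Usual inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def tensor(y, x):
    """Return the rank one map y (x) x, that is u -> (u, x) y."""
    def apply(u):
        c = inner(u, x)
        return [c * yi for yi in y]
    return apply

x = [1 + 1j, 2]                 # in X = C^2
y1 = [1j, 0, 1]                 # in Y = C^3
y2 = [2, 1 - 1j, 3j]            # in Y = C^3
z = [1, -1j]                    # in Z = C^2
u = [0.5j, -1 + 2j]             # test vector in X
v = [1, 2j, -1]                 # test vector in Y
alpha = 2 - 3j

# Identity 12.9: (z (x) y1)(y2 (x) x) = (y2, y1) z (x) x, applied to u.
lhs_129 = tensor(z, y1)(tensor(y2, x)(u))
rhs_129 = [inner(y2, y1) * c for c in tensor(z, x)(u)]

# Identity 12.8 through the defining relation of the adjoint:
# (alpha (y2 (x) x) u, v) = (u, conj(alpha) (x (x) y2) v).
lhs_128 = inner([alpha * c for c in tensor(y2, x)(u)], v)
rhs_128 = inner(u, [alpha.conjugate() * c for c in tensor(x, y2)(v)])
```

In the standard bases the matrix of $y \otimes x$ has entries $y_i \overline{x_j}$, since $(y \otimes x)(e_j) = (e_j, x) y = \overline{x_j} \, y$.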
Theorem 12.4.4 Let X and Y be finite dimensional inner product spaces. Then L (X, Y )
is a vector space with the above definition of what it means to multiply by a scalar and add.
Let {v1 , · · · , vn } be an orthonormal basis for X and {w1 , · · · , wm } be an orthonormal basis
for Y. Then a basis for L (X, Y ) is
{wj ⊗ vi : i = 1, · · · , n, j = 1, · · · , m} .
Proof: It is obvious that $L(X, Y)$ is a vector space. It remains to verify the given set is a basis. Consider the following:
\[ \left( \left( A - \sum_{k,l} (A v_k, w_l) \, w_l \otimes v_k \right) v_p, w_r \right) = (A v_p, w_r) - \sum_{k,l} (A v_k, w_l)(v_p, v_k)(w_l, w_r) \]
\[ = (A v_p, w_r) - \sum_{k,l} (A v_k, w_l) \delta_{pk} \delta_{rl} = (A v_p, w_r) - (A v_p, w_r) = 0. \]
Letting $A - \sum_{k,l} (A v_k, w_l) \, w_l \otimes v_k = B$, this shows that $B v_p = 0$ since $w_r$ is an arbitrary element of the basis for $Y$. Since $v_p$ is an arbitrary element of the basis for $X$, it follows $B = 0$ as hoped. This has shown $\{w_j \otimes v_i : i = 1, \cdots, n, \ j = 1, \cdots, m\}$ spans $L(X, Y)$.
It only remains to verify the $w_j \otimes v_i$ are linearly independent. Suppose then that
\[ \sum_{i,j} c_{ij} \, w_j \otimes v_i = 0. \]
Then apply both sides to $v_s$. By definition this gives
\[ 0 = \sum_{i,j} c_{ij} w_j (v_s, v_i) = \sum_{i,j} c_{ij} w_j \delta_{si} = \sum_j c_{sj} w_j. \]
Now the vectors $\{w_1, \cdots, w_m\}$ are independent because they form an orthonormal set and so the above requires $c_{sj} = 0$ for each $j$. Since $s$ was arbitrary, this shows the linear transformations, $\{w_j \otimes v_i\}$ form a linearly independent set. ∎
Note this shows the dimension of $L(X, Y)$ equals $nm$. The theorem is also of enormous importance because it shows you can always consider an arbitrary linear transformation as a sum of rank one transformations whose properties are easily understood. The following theorem is also of great interest.
Theorem 12.4.5 Let $A = \sum_{i,j} c_{ij} \, w_i \otimes v_j \in L(X, Y)$ where as before, the vectors, $\{w_i\}$ are an orthonormal basis for $Y$ and the vectors, $\{v_j\}$ are an orthonormal basis for $X$. Then if the matrix of $A$ has entries $M_{ij}$, it follows that $M_{ij} = c_{ij}$.
Proof: Recall
\[ A v_i \equiv \sum_k M_{ki} w_k. \]
Also
\[ A v_i = \sum_{k,j} c_{kj} \, w_k \otimes v_j (v_i) = \sum_{k,j} c_{kj} w_k (v_i, v_j) = \sum_{k,j} c_{kj} w_k \delta_{ij} = \sum_k c_{ki} w_k. \]
Therefore,
\[ \sum_k M_{ki} w_k = \sum_k c_{ki} w_k \]
and so $M_{ki} = c_{ki}$ for all $k$. This happens for each $i$. ∎
12.5
Least Squares
A common problem in experimental work is to find a straight line which approximates as well as possible a collection of points in the plane $\{(x_i, y_i)\}_{i=1}^p$. The usual way of dealing with these problems is by the method of least squares and it turns out that all these sorts of approximation problems can be reduced to $Ax = b$ where the problem is to find the best $x$ for solving this equation even when there is no solution.
Lemma 12.5.1 Let V and W be finite dimensional inner product spaces and let A : V → W
be linear. For each y ∈ W there exists x ∈ V such that
|Ax − y| ≤ |Ax1 − y|
for all x1 ∈ V. Also, x ∈ V is a solution to this minimization problem if and only if x is a
solution to the equation, A∗ Ax = A∗ y.
Proof: By Theorem 12.2.4 there exists a point, $Ax_0$, in the finite dimensional subspace, $A(V)$, of $W$ such that for all $x \in V$, $|Ax - y|^2 \ge |Ax_0 - y|^2$. Also, from this theorem, this happens if and only if $Ax_0 - y$ is perpendicular to every $Ax \in A(V)$. Therefore, the solution is characterized by $(Ax_0 - y, Ax) = 0$ for all $x \in V$ which is the same as saying $(A^* A x_0 - A^* y, x) = 0$ for all $x \in V$. In other words the solution is obtained by solving $A^* A x_0 = A^* y$ for $x_0$. ∎
Consider the problem of finding the least squares regression line in statistics. Suppose you are given points in the plane, $\{(x_i, y_i)\}_{i=1}^n$ and you would like to find constants $m$ and $b$ such that the line $y = mx + b$ goes through all these points. Of course this will be impossible in general. Therefore, try to find $m, b$ such that you do the best you can to solve the system
\[ \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} \]
which is of the form $y = Ax$. In other words try to make
\[ \left| A \begin{pmatrix} m \\ b \end{pmatrix} - \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \right|^2 \]
as small as possible. According to what was just shown, it is desired to solve the following for $m$ and $b$.
\[ A^* A \begin{pmatrix} m \\ b \end{pmatrix} = A^* \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \]
Since $A^* = A^T$ in this case,
\[ \begin{pmatrix} \sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n \end{pmatrix} \begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n x_i y_i \\ \sum_{i=1}^n y_i \end{pmatrix}. \]
Solving this system of equations for $m$ and $b$,
\[ m = \frac{ -\left( \sum_{i=1}^n x_i \right) \left( \sum_{i=1}^n y_i \right) + \left( \sum_{i=1}^n x_i y_i \right) n }{ \left( \sum_{i=1}^n x_i^2 \right) n - \left( \sum_{i=1}^n x_i \right)^2 } \]
and
\[ b = \frac{ -\left( \sum_{i=1}^n x_i \right) \sum_{i=1}^n x_i y_i + \left( \sum_{i=1}^n y_i \right) \sum_{i=1}^n x_i^2 }{ \left( \sum_{i=1}^n x_i^2 \right) n - \left( \sum_{i=1}^n x_i \right)^2 }. \]
One could clearly do a least squares fit for curves of the form $y = ax^2 + bx + c$ in the same way. In this case you solve as well as possible for $a$, $b$, and $c$ the system
\[ \begin{pmatrix} x_1^2 & x_1 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \]
using the same techniques.
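The formulas for $m$ and $b$ can be evaluated directly. The following sketch implements them in plain Python and also computes the quantities $\sum_i r_i x_i$ and $\sum_i r_i$ for the residuals $r_i = y_i - (m x_i + b)$, which the normal equations $A^T A x = A^T y$ force to be zero. The function name `regression_line` and the sample data are hypothetical and chosen only for illustration.

```python
def regression_line(points):
    """Least squares line y = m x + b through a list of (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = sxx * n - sx ** 2
    if det == 0:
        # Happens exactly when all x values coincide (or there is one point).
        raise ValueError("the x values must not all be equal")
    m = (-sx * sy + sxy * n) / det
    b = (-sx * sxy + sy * sxx) / det
    return m, b

# Points lying exactly on y = 2x + 1 are recovered exactly.
m_exact, b_exact = regression_line([(0, 1), (1, 3), (2, 5), (3, 7)])

# For scattered data the residual vector is orthogonal to both columns of A.
data = [(0, 1.0), (1, 2.5), (2, 2.0), (4, 6.5)]
m, b = regression_line(data)
residuals = [y - (m * x + b) for x, y in data]
dot_with_x_column = sum(r * x for r, (x, _) in zip(residuals, data))
dot_with_ones_column = sum(residuals)
```

The two dot products at the end are the components of $A^T (y - Ax)$, so their vanishing is the statement that the residual is perpendicular to $A(V)$.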
12.6
Fredholm Alternative Again
The best context in which to study the Fredholm alternative is in inner product spaces.
This is done here.
Definition 12.6.1 Let S be a subset of an inner product space, X. Define
S ⊥ ≡ {x ∈ X : (x, s) = 0 for all s ∈ S} .
The following theorem also follows from the above lemma. It is sometimes called the
Fredholm alternative.
Theorem 12.6.2 Let $A : V \to W$ where $A$ is linear and $V$ and $W$ are inner product spaces. Then $A(V) = \ker(A^*)^\perp$.
Proof: Let $y = Ax$ so $y \in A(V)$. Then if $A^* z = 0$,
\[ (y, z) = (Ax, z) = (x, A^* z) = 0 \]
showing that $y \in \ker(A^*)^\perp$. Thus $A(V) \subseteq \ker(A^*)^\perp$.
Now suppose $y \in \ker(A^*)^\perp$. Does there exist $x$ such that $Ax = y$? Since this might not be immediately clear, take the least squares solution to the problem. Thus let $x$ be a solution to $A^* A x = A^* y$. It follows $A^*(y - Ax) = 0$ and so $y - Ax \in \ker(A^*)$ which implies from the assumption about $y$ that $(y - Ax, y) = 0$. Also, since $Ax$ is the closest point to $y$ in $A(V)$, Theorem 12.2.4 implies that $(y - Ax, Ax_1) = 0$ for all $x_1 \in V$. In particular this is true for $x_1 = x$ and so
\[ 0 = (y - Ax, y) - \underbrace{(y - Ax, Ax)}_{=0} = |y - Ax|^2, \]
showing that $y = Ax$. Thus $A(V) \supseteq \ker(A^*)^\perp$. ∎
Corollary 12.6.3 Let $A$, $V$, and $W$ be as described above. If the only solution to $A^* y = 0$ is $y = 0$, then $A$ is onto $W$.
Proof: If the only solution to $A^* y = 0$ is $y = 0$, then $\ker(A^*) = \{0\}$ and so every vector from $W$ is contained in $\ker(A^*)^\perp$ and by the above theorem, this shows $A(V) = W$. ∎
12.7
Exercises
1. Find the best solution to the system
x + 2y = 6
2x − y = 5
3x + 2y = 0
2. Find an orthonormal basis for R3 , {w1 , w2 , w3 } given that w1 is a multiple of the
vector (1, 1, 2).
3. Suppose A = AT is a symmetric real n × n matrix which has all positive eigenvalues.
Define
(x, y) ≡ (Ax, y) .
Show this is an inner product on Rn . What does the Cauchy Schwarz inequality say
in this case?
4. Let $||x||_\infty \equiv \max \{|x_j| : j = 1, 2, \cdots, n\}$. Show this is a norm on $\mathbb{C}^n$. Here $x = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}^T$. Show
\[ ||x||_\infty \le |x| \equiv (x, x)^{1/2} \]
where the above is the usual inner product on $\mathbb{C}^n$.
5. Let $||x||_1 \equiv \sum_{j=1}^n |x_j|$. Show this is a norm on $\mathbb{C}^n$. Here $x = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}^T$. Show
\[ ||x||_1 \ge |x| \equiv (x, x)^{1/2} \]
where the above is the usual inner product on $\mathbb{C}^n$. Show there cannot exist an inner product such that this norm comes from the inner product as described above for inner product spaces.
6. Show that if $||\cdot||$ is any norm on any vector space, then $\left| \, ||x|| - ||y|| \, \right| \le ||x - y||$.
7. Relax the assumptions in the axioms for the inner product. Change the axiom about $(x, x) \ge 0$ and equals 0 if and only if $x = 0$ to simply read $(x, x) \ge 0$. Show the Cauchy Schwarz inequality still holds in the following form. $|(x, y)| \le (x, x)^{1/2} (y, y)^{1/2}$.
8. Let $H$ be an inner product space and let $\{u_k\}_{k=1}^n$ be an orthonormal basis for $H$. Show
\[ (x, y) = \sum_{k=1}^n (x, u_k) \overline{(y, u_k)}. \]
9. Let the vector space $V$ consist of real polynomials of degree no larger than 3. Thus a typical vector is a polynomial of the form $a + bx + cx^2 + dx^3$. For $p, q \in V$ define the inner product, $(p, q) \equiv \int_0^1 p(x) q(x) \, dx$. Show this is indeed an inner product. Then state the Cauchy Schwarz inequality in terms of this inner product. Show $\{1, x, x^2, x^3\}$ is a basis for $V$. Finally, find an orthonormal basis for $V$. This is an example of some orthonormal polynomials.
10. Let $P_n$ denote the polynomials of degree no larger than $n - 1$ which are defined on an interval $[a, b]$. Let $\{x_1, \cdots, x_n\}$ be $n$ distinct points in $[a, b]$. Now define for $p, q \in P_n$,
\[ (p, q) \equiv \sum_{j=1}^n p(x_j) q(x_j). \]
Show this yields an inner product on Pn . Hint: Most of the axioms are obvious. The
one which says (p, p) = 0 if and only if p = 0 is the only interesting one. To verify this
one, note that a nonzero polynomial of degree no more than n − 1 has at most n − 1
zeros.
11. Let $C([0, 1])$ denote the vector space of continuous real valued functions defined on $[0, 1]$. Let the inner product be given as
\[ (f, g) \equiv \int_0^1 f(x) g(x) \, dx. \]
Show this is an inner product. Also let $V$ be the subspace described in Problem 9. Using the result of this problem, find the vector in $V$ which is closest to $x^4$.
12. A regular Sturm Liouville problem involves the differential equation, for an unknown function of $x$ which is denoted here by $y$,
\[ (p(x) y')' + (\lambda q(x) + r(x)) y = 0, \quad x \in [a, b] \]
and it is assumed that $p(t), q(t) > 0$ for any $t \in [a, b]$ and also there are boundary conditions,
\[ C_1 y(a) + C_2 y'(a) = 0, \]
\[ C_3 y(b) + C_4 y'(b) = 0, \]
where
\[ C_1^2 + C_2^2 > 0, \text{ and } C_3^2 + C_4^2 > 0. \]
There is an immense theory connected to these important problems. The constant, $\lambda$ is called an eigenvalue. Show that if $y$ is a solution to the above problem corresponding to $\lambda = \lambda_1$ and if $z$ is a solution corresponding to $\lambda = \lambda_2 \ne \lambda_1$, then
\[ \int_a^b q(x) y(x) z(x) \, dx = 0 \tag{12.10} \]
and this defines an inner product. Hint: Do something like this:
\[ (p(x) y')' z + (\lambda_1 q(x) + r(x)) y z = 0, \]
\[ (p(x) z')' y + (\lambda_2 q(x) + r(x)) z y = 0. \]
Now subtract and either use integration by parts or show
\[ (p(x) y')' z - (p(x) z')' y = \left( (p(x) y') z - (p(x) z') y \right)' \]
and then integrate. Use the boundary conditions to show that $y'(a) z(a) - z'(a) y(a) = 0$ and $y'(b) z(b) - z'(b) y(b) = 0$. The formula, 12.10 is called an orthogonality relation. It turns out there are typically infinitely many eigenvalues and it is interesting to write given functions as an infinite series of these "eigenfunctions".
13. Consider the continuous functions defined on $[0, \pi]$, $C([0, \pi])$. Show $(f, g) \equiv \int_0^\pi f g \, dx$ is an inner product on this vector space. Show the functions $\left\{ \sqrt{\frac{2}{\pi}} \sin(nx) \right\}_{n=1}^\infty$ are an orthonormal set. What does this mean about the dimension of the vector space $C([0, \pi])$? Now let $V_N = \mathrm{span}\left( \sqrt{\frac{2}{\pi}} \sin(x), \cdots, \sqrt{\frac{2}{\pi}} \sin(Nx) \right)$. For $f \in C([0, \pi])$ find a formula for the vector in $V_N$ which is closest to $f$ with respect to the norm determined from the above inner product. This is called the $N$th partial sum of the Fourier series of $f$. An important problem is to determine whether and in what way this Fourier series converges to the function $f$. The norm which comes from this inner product is sometimes called the mean square norm.
14. Consider the subspace $V \equiv \ker(A)$ where
\[ A = \begin{pmatrix} 1 & 4 & -1 & -1 \\ 2 & 1 & 2 & 3 \\ 4 & 9 & 0 & 1 \\ 5 & 6 & 3 & 4 \end{pmatrix}. \]
Find an orthonormal basis for $V$. Hint: You might first find a basis and then use the Gram Schmidt procedure.
15. The Gram Schmidt process starts with a basis for a subspace $\{v_1, \cdots, v_n\}$ and produces an orthonormal basis for the same subspace $\{u_1, \cdots, u_n\}$ such that
\[ \mathrm{span}(v_1, \cdots, v_k) = \mathrm{span}(u_1, \cdots, u_k) \]
for each $k$. Show that in the case of $\mathbb{R}^m$ the QR factorization does the same thing. More specifically, if
\[ A = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \]
and if
\[ A = QR \equiv \begin{pmatrix} q_1 & \cdots & q_n \end{pmatrix} R \]
then the vectors $\{q_1, \cdots, q_n\}$ are an orthonormal set of vectors and for each $k$,
\[ \mathrm{span}(q_1, \cdots, q_k) = \mathrm{span}(v_1, \cdots, v_k). \]
16. Verify the parallelogram identity for any inner product space,
\[ |x + y|^2 + |x - y|^2 = 2 |x|^2 + 2 |y|^2. \]
Why is it called the parallelogram identity?
17. Let H be an inner product space and let K ⊆ H be a nonempty convex subset. This
means that if k1 , k2 ∈ K, then the line segment consisting of points of the form
tk1 + (1 − t) k2 for t ∈ [0, 1]
is also contained in K. Suppose for each x ∈ H, there exists P x defined to be a point
of K closest to x. Show that P x is unique so that P actually is a map. Hint: Suppose
z1 and z2 both work as closest points. Consider the midpoint, (z1 + z2 ) /2 and use the
parallelogram identity of Problem 16 in an auspicious manner.
18. In the situation of Problem 17 suppose K is a closed convex subset and that H
is complete. This means every Cauchy sequence converges. Recall from calculus a
sequence {kn } is a Cauchy sequence if for every ε > 0 there exists Nε such that
whenever m, n > Nε , it follows |km − kn | < ε. Let {kn } be a sequence of points of K
such that
\[ \lim_{n\to\infty} |x - k_n| = \inf \{ |x - k| : k \in K \}. \]
This is called a minimizing sequence. Show there exists a unique $k \in K$ such that $\lim_{n\to\infty} |k_n - k| = 0$ and that $k = Px$. That is, there exists a well defined projection map
onto the convex subset of H. Hint: Use the parallelogram identity in an auspicious
manner to show {kn } is a Cauchy sequence which must therefore converge. Since K
is closed it follows this will converge to something in K which is the desired vector.
19. Let H be an inner product space which is also complete and let P denote the projection
map onto a convex closed subset, K. Show this projection map is characterized by
the inequality
Re (k − P x, x − P x) ≤ 0
for all k ∈ K. That is, a point z ∈ K equals P x if and only if the above variational
inequality holds. This is what that inequality is called. This is because k is allowed
to vary and the inequality continues to hold for all k ∈ K.
20. Using Problem 19 and Problems 17 - 18 show the projection map, P onto a closed
convex subset is Lipschitz continuous with Lipschitz constant 1. That is
|P x − P y| ≤ |x − y|
21. Give an example of two vectors x, y in R4 and a subspace V such that x · y = 0 but
P x · P y ̸= 0 where P denotes the projection map which sends x to its closest point on
V.
22. Suppose you are given the data, (1, 2) , (2, 4) , (3, 8) , (0, 0) . Find the linear regression
line using the formulas derived above. Then graph the given data along with your
regression line.
23. Generalize the least squares procedure to the situation in which data is given and you
desire to fit it with an expression of the form y = af (x) + bg (x) + c where the problem
would be to find a, b and c in order to minimize the error. Could this be generalized
to higher dimensions? How about more functions?
24. Let A ∈ L (X, Y ) where X and Y are finite dimensional vector spaces with the dimension of X equal to n. Define rank (A) ≡ dim (A (X)) and nullity(A) ≡ dim (ker (A)) .
Show that nullity(A) + rank (A) = dim (X) . Hint: Let $\{x_i\}_{i=1}^{r}$ be a basis for ker (A)
and let $\{x_i\}_{i=1}^{r} \cup \{y_i\}_{i=1}^{n-r}$ be a basis for X. Then show that $\{Ay_i\}_{i=1}^{n-r}$ is linearly
independent and spans AX.
25. Let A be an m×n matrix. Show the column rank of A equals the column rank of A∗ A.
Next verify column rank of A∗ A is no larger than column rank of A∗ . Next justify the
following inequality to conclude the column rank of A equals the column rank of A∗ .
$$\operatorname{rank}(A) = \operatorname{rank}(A^*A) \le \operatorname{rank}(A^*) = \operatorname{rank}(AA^*) \le \operatorname{rank}(A).$$
Hint: Start with an orthonormal basis, $\{Ax_j\}_{j=1}^{r}$ of A (Fn ) and verify $\{A^*Ax_j\}_{j=1}^{r}$
is a basis for A∗ A (Fn ) .
26. Let A be a real m × n matrix and let A = QR be the QR factorization with Q
orthogonal and R upper triangular. Show that there exists a solution x to the equation
$$R^T R x = R^T Q^T b$$
and that this solution is also a least squares solution defined above such that $A^T A x = A^T b$.
12.8 The Determinant And Volume
The determinant is the essential algebraic tool which provides a way to give a unified treatment of the concept of p dimensional volume of a parallelepiped in RM . Here is the definition
of what is meant by such a thing.
Definition 12.8.1 Let u1 , · · · , up be vectors in RM , M ≥ p. The parallelepiped determined
by these vectors will be denoted by P (u1 , · · · , up ) and it is defined as
$$P(u_1,\cdots,u_p) \equiv \Big\{ \sum_{j=1}^{p} s_j u_j : s_j \in [0,1] \Big\}.$$
The volume of this parallelepiped is defined as
$$\text{volume of } P(u_1,\cdots,u_p) \equiv v(P(u_1,\cdots,u_p)) \equiv \left(\det(u_i \cdot u_j)\right)^{1/2}.$$
If the vectors are dependent, this definition will give the volume to be 0.
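The definition above can be tried numerically. The following is a minimal sketch in plain Python, assuming small real vectors given as lists; the helper names `dot`, `det` and `volume` are illustrative choices and do not come from the text. It forms the matrix of dot products (ui · uj ) and takes the square root of its determinant.

```python
from math import sqrt

def dot(u, v):
    """Dot product of two real vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def volume(vectors):
    """p dimensional volume of P(u_1, ..., u_p) as sqrt(det(u_i . u_j))."""
    gram = [[dot(u, v) for v in vectors] for u in vectors]
    # max(..., 0) guards against a tiny negative value caused by round-off.
    return sqrt(max(det(gram), 0))

unit_square = volume([[1, 0, 0], [0, 1, 0]])   # a unit square sitting in R^3
dependent = volume([[1, 2], [2, 4]])           # second vector is twice the first
slanted = volume([[1, 1, 0], [0, 1, 1]])       # area of a parallelogram in R^3
print(unit_square, dependent, slanted)
```

For the dependent pair the matrix of dot products has proportional rows, so the computed volume is 0, in agreement with the last sentence of the definition.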
First let's observe the last assertion is true. Say $u_i = \sum_{j \neq i} \alpha_j u_j$. Then the ith row of the matrix (ui · uj ) is
a linear combination of the other rows and so from the properties of the determinant, the
determinant of this matrix is indeed zero as it should be.
A parallelepiped is a sort of a squashed box. Here is a picture which shows the relationship between P (u1 , · · · , up−1 ) and P (u1 , · · · , up ).
[Figure: the base P (u1 , · · · , up−1 ), a vector N normal to the base, and the vector up making the angle θ with N.]
In a sense, we can define the volume any way we want but if it is to be reasonable, the
following relationship must hold. The appropriate definition of the volume of P (u1 , · · · , up )
in terms of P (u1 , · · · , up−1 ) is
$$v(P(u_1,\cdots,u_p)) = |u_p|\,|\cos(\theta)|\, v(P(u_1,\cdots,u_{p-1})) \tag{12.11}$$
In the case where p = 1, the parallelepiped P (v) consists of the single vector and the one
dimensional volume should be $|v| = (v^T v)^{1/2}$. Now having made this definition, I will show
that this is the appropriate definition of p dimensional volume for every p.
Definition 12.8.2 Let {u1 , · · · , up } be vectors. Then
$$v(P(u_1,\cdots,u_p)) \equiv \det\left( \begin{pmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_p^T \end{pmatrix} \begin{pmatrix} u_1 & u_2 & \cdots & u_p \end{pmatrix} \right)^{1/2}$$
As just pointed out, this is the only reasonable definition of volume in the case of one
vector. The next theorem shows that it is the only reasonable definition of volume of a
parallelepiped in the case of p vectors because 12.11 holds.
Theorem 12.8.3 With the above definition of volume, 12.11 holds.
Proof: To check whether this is so, it is necessary to find |cos (θ)| . This involves finding
the vector perpendicular to P (u1 , · · · , up−1 ) . Let {w1 , · · · , wp } be an orthonormal basis
for span (u1 , · · · , up ) such that span (w1 , · · · , wk ) = span (u1 , · · · , uk ) for each k ≤ p. Such
an orthonormal basis exists because of the Gram Schmidt procedure. First note that since
{wk } is an orthonormal basis for span (u1 , · · · , up ) ,
$$u_j = \sum_{k=1}^{p} (u_j \cdot w_k)\, w_k$$
and if i, j ≤ k,
$$u_j \cdot u_i = \sum_{r=1}^{k} (u_j \cdot w_r)(u_i \cdot w_r)$$
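Therefore, for each k ≤ p
$$\det\left( \begin{pmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_k^T \end{pmatrix} \begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix} \right)$$
is the determinant of a matrix whose ij th entry is
$$u_i^T u_j = u_i \cdot u_j = \sum_{r=1}^{k} (u_i \cdot w_r)(w_r \cdot u_j)$$
Thus this matrix is the product of the two k × k matrices, one which is the transpose of the
other.
$$\begin{pmatrix} (u_1 \cdot w_1) & (u_1 \cdot w_2) & \cdots & (u_1 \cdot w_k) \\ (u_2 \cdot w_1) & (u_2 \cdot w_2) & \cdots & (u_2 \cdot w_k) \\ \vdots & \vdots & & \vdots \\ (u_k \cdot w_1) & (u_k \cdot w_2) & \cdots & (u_k \cdot w_k) \end{pmatrix} \cdot \begin{pmatrix} (u_1 \cdot w_1) & (u_2 \cdot w_1) & \cdots & (u_k \cdot w_1) \\ (u_1 \cdot w_2) & (u_2 \cdot w_2) & \cdots & (u_k \cdot w_2) \\ \vdots & \vdots & & \vdots \\ (u_1 \cdot w_k) & (u_2 \cdot w_k) & \cdots & (u_k \cdot w_k) \end{pmatrix}$$
It follows
$$\det\left( \begin{pmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_k^T \end{pmatrix} \begin{pmatrix} u_1 & u_2 & \cdots & u_k \end{pmatrix} \right) = \det \begin{pmatrix} (u_1 \cdot w_1) & (u_1 \cdot w_2) & \cdots & (u_1 \cdot w_k) \\ (u_2 \cdot w_1) & (u_2 \cdot w_2) & \cdots & (u_2 \cdot w_k) \\ \vdots & \vdots & & \vdots \\ (u_k \cdot w_1) & (u_k \cdot w_2) & \cdots & (u_k \cdot w_k) \end{pmatrix}^{2}$$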
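and so from the definition,
$$v(P(u_1,\cdots,u_k)) = \left| \det \begin{pmatrix} (u_1 \cdot w_1) & (u_1 \cdot w_2) & \cdots & (u_1 \cdot w_k) \\ (u_2 \cdot w_1) & (u_2 \cdot w_2) & \cdots & (u_2 \cdot w_k) \\ \vdots & \vdots & & \vdots \\ (u_k \cdot w_1) & (u_k \cdot w_2) & \cdots & (u_k \cdot w_k) \end{pmatrix} \right|$$
Now consider the vector
$$N \equiv \det \begin{pmatrix} w_1 & w_2 & \cdots & w_p \\ (u_1 \cdot w_1) & (u_1 \cdot w_2) & \cdots & (u_1 \cdot w_p) \\ \vdots & \vdots & & \vdots \\ (u_{p-1} \cdot w_1) & (u_{p-1} \cdot w_2) & \cdots & (u_{p-1} \cdot w_p) \end{pmatrix}$$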
which results from formally expanding along the top row. Note that from what was just
discussed,
$$v(P(u_1,\cdots,u_{p-1})) = \pm A_{1p}$$
Now it follows from the formula for expansion of a determinant along the top row that for
each j ≤ p − 1
$$N \cdot u_j = \sum_{k=1}^{p} (u_j \cdot w_k)(N \cdot w_k) = \sum_{k=1}^{p} (u_j \cdot w_k) A_{1k}$$
where A1k is the 1k th cofactor of the above matrix. Thus if j ≤ p − 1
$$N \cdot u_j = \det \begin{pmatrix} (u_j \cdot w_1) & (u_j \cdot w_2) & \cdots & (u_j \cdot w_p) \\ (u_1 \cdot w_1) & (u_1 \cdot w_2) & \cdots & (u_1 \cdot w_p) \\ \vdots & \vdots & & \vdots \\ (u_{p-1} \cdot w_1) & (u_{p-1} \cdot w_2) & \cdots & (u_{p-1} \cdot w_p) \end{pmatrix} = 0$$
because the matrix has two equal rows while if j = p, the above discussion shows N · up
equals ±v (P (u1 , · · · , up )). Therefore, N points in the direction of the normal vector in the
above picture or else it points in the opposite direction to this vector. From the geometric
description of the dot product,
$$|\cos(\theta)| = \frac{|N \cdot u_p|}{|u_p|\,|N|}$$
and it follows
$$|u_p|\,|\cos(\theta)|\, v(P(u_1,\cdots,u_{p-1})) = |u_p| \frac{|N \cdot u_p|}{|u_p|\,|N|}\, v(P(u_1,\cdots,u_{p-1})) = \frac{v(P(u_1,\cdots,u_p))}{|N|}\, v(P(u_1,\cdots,u_{p-1}))$$
Now at this point, note that from the construction, wp · uk = 0 whenever k ≤ p − 1 because
uk ∈ span (w1 , · · · , wp−1 ). Therefore, |N| = |A1p | = v (P (u1 , · · · , up−1 )) and so the above
reduces to
$$|u_p|\,|\cos(\theta)|\, v(P(u_1,\cdots,u_{p-1})) = v(P(u_1,\cdots,u_p)).$$
The theorem shows that the only reasonable definition of p dimensional volume of a
parallelepiped is the one given in the above definition.
12.9 Exercises
1. Here are three vectors in R4 : $(1, 2, 0, 3)^T$, $(2, 1, -3, 2)^T$, $(0, 0, 1, 2)^T$. Find the three
dimensional volume of the parallelepiped determined by these three vectors.
2. Here are two vectors in R4 : $(1, 2, 0, 3)^T$, $(2, 1, -3, 2)^T$. Find the volume of the parallelepiped determined by these two vectors.
3. Here are three vectors in R2 : $(1, 2)^T$, $(2, 1)^T$, $(0, 1)^T$. Find the three dimensional
volume of the parallelepiped determined by these three vectors. Recall that from the
above theorem, this should equal 0.
4. Find the equation of the plane through the three points (1, 2, 3) , (2, −3, 1) , (1, 1, 7) .
5. Let T map a vector space V to itself. Explain why T is one to one if and only if T is
onto. It is in the text, but do it again in your own words.
6. ↑Let all matrices be complex with complex field of scalars and let A be an n×n matrix
and B a m × m matrix while X will be an n × m matrix. The problem is to consider
solutions to Sylvester’s equation. Solve the following equation for X
AX − XB = C
where C is an arbitrary n × m matrix. Show there exists a unique solution if and only
if σ (A) ∩ σ (B) = ∅. Hint: If q (λ) is a polynomial, show first that if AX − XB = 0,
then q (A) X − Xq (B) = 0. Next define the linear map T which maps the n × m
matrices to the n × m matrices as follows.
T X ≡ AX − XB
Show that the only solution to T X = 0 is X = 0 so that T is one to one if and only if
σ (A) ∩ σ (B) = ∅. Do this by using the first part for q (λ) the characteristic polynomial
for B and then use the Cayley Hamilton theorem. Explain why $q(A)^{-1}$ exists if and
only if the condition σ (A) ∩ σ (B) = ∅ holds.
7. Compare Definition 12.8.2 with the Binet Cauchy theorem, Theorem 3.3.14. What is
the geometric meaning of the Binet Cauchy theorem in this context?
8. For W a subspace of V, W is said to have a complementary subspace [15] W ′ if
W ⊕ W ′ = V. Suppose that both W, W ′ are invariant with respect to A ∈ L (V, V ).
Show that for any polynomial f (λ) , if f (A) x ∈ W, then there exists w ∈ W such
that f (A) x = f (A) w. A subspace W is called A admissible if it is A invariant and
the condition of this problem holds.
9. ↑ Return to Theorem 10.3.4 about the existence of a basis $\beta = \{\beta_{x_1}, \cdots, \beta_{x_p}\}$ for V
where A ∈ L (V, V ) . Adapt the statement and proof to show that if W is A admissible,
then it has a complementary subspace which is also A invariant. Hint:
The modified version of the theorem is: Suppose A ∈ L (V, V ) and the minimal polynomial of A is $\phi(\lambda)^m$ where ϕ (λ) is a monic irreducible polynomial. Also suppose
that W is an A admissible subspace. Then there exists a basis for V which is of
the form $\beta = \{\beta_{x_1}, \cdots, \beta_{x_p}, v_1, \cdots, v_m\}$ where {v1 , · · · , vm } is a basis of W . Thus
$\operatorname{span}(\beta_{x_1}, \cdots, \beta_{x_p})$ is the A invariant complementary subspace for W . You may want
to use the fact that ϕ (A) (V ) ∩ W = ϕ (A) (W ) which follows easily because W is A
admissible. Then use this fact to show that ϕ (A) (W ) is also A admissible.
10. Let U, H be finite dimensional inner product spaces. (More generally, complete inner
product spaces.) Let A be a linear map from U to H. Thus AU is a subspace of
H. For g ∈ AU, define $A^{-1}g$ to be the unique element of {x : Ax = g} which is
closest to 0. Then define $(h, g)_{AU} \equiv (A^{-1}g, A^{-1}h)_U$. Show that this is a
well defined inner product and that if A is one to one, then $\|h\|_{AU} = \|A^{-1}h\|_U$ and
$\|Ax\|_{AU} = \|x\|_U$.
Chapter 13
Self Adjoint Operators
13.1 Simultaneous Diagonalization
Recall the following definition of what it means for a matrix to be diagonalizable.
Definition 13.1.1 Let A be an n × n matrix. It is said to be diagonalizable if there exists
an invertible matrix S such that
S −1 AS = D
where D is a diagonal matrix.
Also, here is a useful observation.
Observation 13.1.2 If A is an n × n matrix and AS = SD for D a diagonal matrix, then
each column of S is an eigenvector or else it is the zero vector. This follows from observing
that for sk the k th column of S and from the way we multiply matrices,
Ask = λk sk
It is sometimes interesting to consider the problem of finding a single similarity transformation which will diagonalize all the matrices in some set.
Lemma 13.1.3 Let A be an n × n matrix and let B be an m × m matrix. Denote by C the
matrix
$$C \equiv \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}.$$
Then C is diagonalizable if and only if both A and B are diagonalizable.
Proof: Suppose $S_A^{-1} A S_A = D_A$ and $S_B^{-1} B S_B = D_B$ where DA and DB are diagonal
matrices. You should use block multiplication to verify that $S \equiv \begin{pmatrix} S_A & 0 \\ 0 & S_B \end{pmatrix}$ is such that
$S^{-1} C S = D_C$, a diagonal matrix.
Conversely, suppose C is diagonalized by S = (s1 , · · · , sn+m ) . Thus S has columns si .
For each of these columns, write in the form
$$s_i = \begin{pmatrix} x_i \\ y_i \end{pmatrix}$$
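where xi ∈ Fn and where yi ∈ Fm . The result is
$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$$
where S11 is an n × n matrix and S22 is an m × m matrix. Then there is a diagonal matrix
$$D = \operatorname{diag}(\lambda_1, \cdots, \lambda_{n+m}) = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}$$
such that
$$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}$$
Hence by block multiplication
$$AS_{11} = S_{11} D_1, \quad BS_{22} = S_{22} D_2, \quad BS_{21} = S_{21} D_1, \quad AS_{12} = S_{12} D_2$$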
It follows each of the xi is an eigenvector of A or else is the zero vector and that each of the
yi is an eigenvector of B or is the zero vector. If there are n linearly independent xi , then
A is diagonalizable by Theorem 9.3.12.
The row rank of the matrix (x1 , · · · , xn+m ) must be n because if this is not so, the rank
of S would be less than n + m which would mean S −1 does not exist. Therefore, since the
column rank equals the row rank, this matrix has column rank equal to n and this means
there are n linearly independent eigenvectors of A implying that A is diagonalizable. Similar
reasoning applies to B.
The following corollary follows from the same type of argument as the above.
Corollary 13.1.4 Let Ak be an nk × nk matrix and let C denote the block diagonal
$\left(\sum_{k=1}^{r} n_k\right) \times \left(\sum_{k=1}^{r} n_k\right)$ matrix given below.
$$C \equiv \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_r \end{pmatrix}.$$
Then C is diagonalizable if and only if each Ak is diagonalizable.
Definition 13.1.5 A set, F of n × n matrices is said to be simultaneously diagonalizable if
and only if there exists a single invertible matrix S such that for every A ∈ F , S −1 AS = DA
where DA is a diagonal matrix.
Lemma 13.1.6 If F is a set of n × n matrices which is simultaneously diagonalizable, then
F is a commuting family of matrices.
Proof: Let A, B ∈ F and let S be a matrix which has the property that S −1 AS is a
diagonal matrix for all A ∈ F. Then S −1 AS = DA and S −1 BS = DB where DA and DB
are diagonal matrices. Since diagonal matrices commute,
$$AB = S D_A S^{-1} S D_B S^{-1} = S D_A D_B S^{-1} = S D_B D_A S^{-1} = S D_B S^{-1} S D_A S^{-1} = BA.$$
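The computation in this proof can be checked on a small case. The sketch below is only an illustration under assumed data: the matrices S, DA and DB are hypothetical choices with integer entries, the inverse of S is written out by hand, and `matmul` is a helper defined here, not something from the text.

```python
def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical example data: one invertible S shared by both matrices.
S = [[1, 1], [0, 1]]
S_inv = [[1, -1], [0, 1]]      # inverse of S, entered by hand
D_A = [[2, 0], [0, 3]]         # diagonal matrix for A
D_B = [[5, 0], [0, 7]]         # diagonal matrix for B

# A and B are simultaneously diagonalized by S by construction.
A = matmul(matmul(S, D_A), S_inv)
B = matmul(matmul(S, D_B), S_inv)

AB = matmul(A, B)
BA = matmul(B, A)
print(A, B, AB == BA)
```

Neither A nor B is diagonal here, yet the two products agree, because the only place the order matters is in the product of the diagonal matrices.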
Lemma 13.1.7 Let D be a diagonal matrix of the form
$$D \equiv \begin{pmatrix} \lambda_1 I_{n_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{n_2} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_r I_{n_r} \end{pmatrix}, \tag{13.1}$$
where Ini denotes the ni × ni identity matrix and λi ̸= λj for i ̸= j and suppose B is a
matrix which commutes with D. Then B is a block diagonal matrix of the form
$$B = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_r \end{pmatrix} \tag{13.2}$$
where Bi is an ni × ni matrix.
Proof: Let B = (Bij ) be partitioned into blocks in the same way as D, so that Bij is an ni × nj matrix and Bii plays the role of Bi in 13.2.
$$B = \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1r} \\ B_{21} & B_{22} & \cdots & B_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ B_{r1} & B_{r2} & \cdots & B_{rr} \end{pmatrix}$$
Then by block multiplication, since B is given to commute with D,
$$\lambda_j B_{ij} = \lambda_i B_{ij}$$
Therefore, if i ̸= j, Bij = 0.
Lemma 13.1.8 Let F denote a commuting family of n × n matrices such that each A ∈ F
is diagonalizable. Then F is simultaneously diagonalizable.
Proof: First note that if every matrix in F has only one eigenvalue, there is nothing to
prove. This is because for A such a matrix,
S −1 AS = λI
and so
A = λI
Thus all the matrices in F are diagonal matrices and you could pick any S to diagonalize
them all. Therefore, without loss of generality, assume some matrix in F has more than one
eigenvalue.
The significant part of the lemma is proved by induction on n. If n = 1, there is nothing
to prove because all the 1 × 1 matrices are already diagonal matrices. Suppose then that
the theorem is true for all k ≤ n − 1 where n ≥ 2 and let F be a commuting family of
diagonalizable n × n matrices. Pick A ∈ F which has more than one eigenvalue and let
S be an invertible matrix such that S −1 AS = D where D is of the form given in 13.1.
By permuting the columns of S there is no loss of generality in assuming D has this form.
Now denote by $\widetilde{F}$ the collection of matrices, $\{S^{-1}CS : C \in F\}$. Note $\widetilde{F}$ features the single
matrix S.
It follows easily that $\widetilde{F}$ is also a commuting family of diagonalizable matrices. By
Lemma 13.1.7 every $B \in \widetilde{F}$ is of the form given in 13.2 because each of these commutes
with D described above as $S^{-1}AS$ and so by block multiplication, the diagonal blocks Bi
corresponding to different $B \in \widetilde{F}$ commute.
By Corollary 13.1.4 each of these blocks is diagonalizable. This is because B is known to
be so. Therefore, by induction, since all the blocks are no larger than (n − 1) × (n − 1) thanks to
the assumption that A has more than one eigenvalue, there exist invertible ni × ni matrices,
Ti such that $T_i^{-1} B_i T_i$ is a diagonal matrix whenever Bi is one of the matrices making up
the block diagonal of any $B \in \widetilde{F}$. It follows that for T defined by
$$T \equiv \begin{pmatrix} T_1 & 0 & \cdots & 0 \\ 0 & T_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & T_r \end{pmatrix},$$
then $T^{-1}BT$ = a diagonal matrix for every $B \in \widetilde{F}$ including D. Consider ST. It follows
that for all C ∈ F ,
$$T^{-1}\, \overbrace{S^{-1} C S}^{\text{something in } \widetilde{F}}\, T = (ST)^{-1} C\, (ST) = \text{a diagonal matrix.}$$
Theorem 13.1.9 Let F denote a family of matrices which are diagonalizable. Then F is
simultaneously diagonalizable if and only if F is a commuting family.
Proof: If F is a commuting family, it follows from Lemma 13.1.8 that it is simultaneously
diagonalizable. If it is simultaneously diagonalizable, then it follows from Lemma 13.1.6 that
it is a commuting family.
13.2 Schur’s Theorem
Recall that for a linear transformation, L ∈ L (V, V ) for V a finite dimensional inner product
space, it could be represented in the form
$$L = \sum_{ij} l_{ij}\, v_i \otimes v_j$$
where {v1 , · · · , vn } is an orthonormal basis. Of course different bases will yield different
matrices, (lij ) . Schur’s theorem gives the existence of a basis in an inner product space such
that (lij ) is particularly simple.
Definition 13.2.1 Let L ∈ L (V, V ) where V is a vector space. Then a subspace U of V is
L invariant if L (U ) ⊆ U.
In what follows, F will be the field of scalars, usually C but maybe R.
Theorem 13.2.2 Let L ∈ L (H, H) for H a finite dimensional inner product space such
that the restriction of L∗ to every L invariant subspace has its eigenvalues in F. Then there
exist constants, cij for i ≤ j and an orthonormal basis, $\{w_i\}_{i=1}^{n}$ such that
$$L = \sum_{j=1}^{n} \sum_{i=1}^{j} c_{ij}\, w_i \otimes w_j$$
The constants, cii are the eigenvalues of L. Thus the matrix whose ij th entry is cij is upper
triangular.
Proof: If dim (H) = 1, let H = span (w) where |w| = 1. Then Lw = kw for some k.
Then
L = kw ⊗ w
because by definition, w ⊗ w (w) = w. Therefore, the theorem holds if H is 1 dimensional.
Now suppose the theorem holds for n − 1 = dim (H) . Let wn be an eigenvector for L∗ .
Dividing by its length, it can be assumed |wn | = 1. Say L∗ wn = µwn . Using the Gram
Schmidt process, there exists an orthonormal basis for H of the form {v1 , · · · , vn−1 , wn } .
Then
(Lvk , wn ) = (vk , L∗ wn ) = (vk , µwn ) = 0,
which shows
L : H1 ≡ span (v1 , · · · , vn−1 ) → span (v1 , · · · , vn−1 ) .
Denote by L1 the restriction of L to H1 . Since H1 has dimension n − 1, the induction
hypothesis yields an orthonormal basis, {w1 , · · · , wn−1 } for H1 such that
$$L_1 = \sum_{j=1}^{n-1} \sum_{i=1}^{j} c_{ij}\, w_i \otimes w_j. \tag{13.3}$$
Then {w1 , · · · , wn } is an orthonormal basis for H because every vector in
span (v1 , · · · , vn−1 )
has the property that its inner product with wn is 0 so in particular, this is true for the
vectors {w1 , · · · , wn−1 }. Now define cin to be the scalars satisfying
$$Lw_n \equiv \sum_{i=1}^{n} c_{in} w_i \tag{13.4}$$
and let
$$B \equiv \sum_{j=1}^{n} \sum_{i=1}^{j} c_{ij}\, w_i \otimes w_j.$$
Then by 13.4,
$$Bw_n = \sum_{j=1}^{n} \sum_{i=1}^{j} c_{ij} w_i \delta_{nj} = \sum_{i=1}^{n} c_{in} w_i = Lw_n.$$
If 1 ≤ k ≤ n − 1,
$$Bw_k = \sum_{j=1}^{n} \sum_{i=1}^{j} c_{ij} w_i \delta_{kj} = \sum_{i=1}^{k} c_{ik} w_i$$
while from 13.3,
$$Lw_k = L_1 w_k = \sum_{j=1}^{n-1} \sum_{i=1}^{j} c_{ij} w_i \delta_{jk} = \sum_{i=1}^{k} c_{ik} w_i.$$
Since L = B on the basis {w1 , · · · , wn } , it follows L = B.
It remains to verify the constants, ckk are the eigenvalues of L, solutions of the equation,
det (λI − L) = 0. However, the definition of det (λI − L) is the same as
det (λI − C)
where C is the upper triangular matrix which has cij for i ≤ j and zeros elsewhere. This
equals 0 if and only if λ is one of the diagonal entries, one of the ckk .
Now with the above Schur’s theorem, the following diagonalization theorem comes very
easily. Recall the following definition.
Definition 13.2.3 Let L ∈ L (H, H) where H is a finite dimensional inner product space.
Then L is Hermitian if L∗ = L.
Theorem 13.2.4 Let L ∈ L (H, H) where H is an n dimensional inner product space. If
L is Hermitian, then all of its eigenvalues λk are real and there exists an orthonormal basis
of eigenvectors {wk } such that
$$L = \sum_{k} \lambda_k\, w_k \otimes w_k.$$
Proof: By Schur’s theorem, Theorem 13.2.2, there exist lij ∈ F such that
$$L = \sum_{j=1}^{n} \sum_{i=1}^{j} l_{ij}\, w_i \otimes w_j$$
Then by Lemma 12.4.2,
$$\sum_{j=1}^{n} \sum_{i=1}^{j} l_{ij}\, w_i \otimes w_j = L = L^* = \sum_{j=1}^{n} \sum_{i=1}^{j} (l_{ij}\, w_i \otimes w_j)^* = \sum_{j=1}^{n} \sum_{i=1}^{j} \overline{l_{ij}}\, w_j \otimes w_i = \sum_{i=1}^{n} \sum_{j=1}^{i} \overline{l_{ji}}\, w_i \otimes w_j$$
By independence, if i = j, $l_{ii} = \overline{l_{ii}}$ and so these are all real. If i < j, it follows from
independence again that lij = 0 because the coefficients corresponding to i < j are all 0 on
the right side. Similarly if i > j, it follows lij = 0. Letting λk = lkk , this shows
$$L = \sum_{k} \lambda_k\, w_k \otimes w_k$$
That each of these wk is an eigenvector corresponding to λk is obvious from the definition
of the tensor product.
13.3 Spectral Theory Of Self Adjoint Operators
The following theorem is about the eigenvectors and eigenvalues of a self adjoint operator.
Such operators are also called Hermitian as in the case of matrices. The proof given generalizes to the situation of a compact self adjoint operator on a Hilbert space and leads to
many very useful results. It is also a very elementary proof because it does not use the
fundamental theorem of algebra and it contains a way, very important in applications, of
finding the eigenvalues. This proof depends more directly on the methods of analysis than
the preceding material. Recall the following notation.
Definition 13.3.1 Let X be an inner product space and let S ⊆ X. Then
S ⊥ ≡ {x ∈ X : (x, s) = 0 for all s ∈ S} .
Note that even if S is not a subspace, S ⊥ is.
Theorem 13.3.2 Let A ∈ L (X, X) be self adjoint (Hermitian) where X is a finite dimensional inner product space of dimension n. Thus A = A∗ . Then there exists an orthonormal
basis of eigenvectors, $\{u_j\}_{j=1}^{n}$.
Proof: Consider (Ax, x) . This quantity is always a real number because
$$\overline{(Ax, x)} = (x, Ax) = (x, A^* x) = (Ax, x)$$
thanks to the assumption that A is self adjoint. Now define
λ1 ≡ inf {(Ax, x) : |x| = 1, x ∈ X1 ≡ X} .
Claim: λ1 is finite and there exists v1 ∈ X with |v1 | = 1 such that (Av1 , v1 ) = λ1 .
Proof of claim: Let $\{u_j\}_{j=1}^{n}$ be an orthonormal basis for X and for x ∈ X, let (x1 , · · · ,
xn ) be defined as the components of the vector x. Thus,
$$x = \sum_{j=1}^{n} x_j u_j.$$
Since this is an orthonormal basis, it follows from the axioms of the inner product that
$$|x|^2 = \sum_{j=1}^{n} |x_j|^2.$$
Thus
$$(Ax, x) = \Big( \sum_{k=1}^{n} x_k A u_k, \sum_{j=1}^{n} x_j u_j \Big) = \sum_{k,j} x_k \overline{x_j}\, (Au_k, u_j),$$
a real valued continuous function of (x1 , · · · , xn ) which is defined on the compact set
$$K \equiv \Big\{ (x_1, \cdots, x_n) \in \mathbb{F}^n : \sum_{j=1}^{n} |x_j|^2 = 1 \Big\}.$$
Therefore, it achieves its minimum from the extreme value theorem. Then define
$$v_1 \equiv \sum_{j=1}^{n} x_j u_j$$
where (x1 , · · · , xn ) is the point of K at which the above function achieves its minimum.
This proves the claim.
I claim that λ1 is an eigenvalue and v1 is an eigenvector. Letting w ∈ X1 ≡ X, the
function of the real variable, t, given by
$$f(t) \equiv \frac{(A(v_1 + tw), v_1 + tw)}{|v_1 + tw|^2} = \frac{(Av_1, v_1) + 2t \operatorname{Re}(Av_1, w) + t^2 (Aw, w)}{|v_1|^2 + 2t \operatorname{Re}(v_1, w) + t^2 |w|^2}$$
achieves its minimum when t = 0. Therefore, the derivative of this function evaluated at
t = 0 must equal zero. Using the quotient rule, this implies, since |v1 | = 1 that
$$2 \operatorname{Re}(Av_1, w)\, |v_1|^2 - 2 \operatorname{Re}(v_1, w)\, (Av_1, v_1) = 2 \left( \operatorname{Re}(Av_1, w) - \operatorname{Re}(v_1, w)\, \lambda_1 \right) = 0.$$
Thus Re (Av1 − λ1 v1 , w) = 0 for all w ∈ X. This implies Av1 = λ1 v1 . To see this, let w ∈ X
be arbitrary and let θ be a complex number with |θ| = 1 and
$$|(Av_1 - \lambda_1 v_1, w)| = \theta\, (Av_1 - \lambda_1 v_1, w).$$
Then
$$|(Av_1 - \lambda_1 v_1, w)| = \operatorname{Re}\left(Av_1 - \lambda_1 v_1, \overline{\theta} w\right) = 0.$$
Since this holds for all w, Av1 = λ1 v1 .
Continuing with the proof of the theorem, let $X_2 \equiv \{v_1\}^{\perp}$. This is a closed subspace of
X and A : X2 → X2 because for x ∈ X2 ,
$$(Ax, v_1) = (x, Av_1) = \lambda_1 (x, v_1) = 0.$$
Let
$$\lambda_2 \equiv \inf \{ (Ax, x) : |x| = 1, x \in X_2 \}$$
As before, there exists v2 ∈ X2 such that Av2 = λ2 v2 , λ1 ≤ λ2 . Now let $X_3 \equiv \{v_1, v_2\}^{\perp}$
and continue in this way. As long as k < n, it will be the case that $\{v_1, \cdots, v_k\}^{\perp} \neq \{0\}$.
This is because for k < n these vectors cannot be a spanning set and so there exists some
w ∉ span (v1 , · · · , vk ) . Then letting z be the closest point to w from span (v1 , · · · , vk ) , it
follows that $w - z \in \{v_1, \cdots, v_k\}^{\perp}$, and w − z ̸= 0 because z is in the span while w is not. Thus there is an increasing sequence of eigenvalues
$\{\lambda_k\}_{k=1}^{n}$ and a corresponding sequence of orthonormal eigenvectors, {v1 , · · · , vn }.
Contained in the proof of this theorem is the following important corollary.
Corollary 13.3.3 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner
product space. Then all the eigenvalues are real and for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues
of A, there exists an orthonormal set of vectors {u1 , · · · , un } for which
Auk = λk uk .
Furthermore,
λk ≡ inf {(Ax, x) : |x| = 1, x ∈ Xk }
where
$$X_k \equiv \{u_1, \cdots, u_{k-1}\}^{\perp}, \quad X_1 \equiv X.$$
Corollary 13.3.4 Let A ∈ L (X, X) be self adjoint (Hermitian) where X is a finite dimensional inner product space. Then the largest eigenvalue of A is given by
$$\max \{ (Ax, x) : |x| = 1 \} \tag{13.5}$$
and the minimum eigenvalue of A is given by
$$\min \{ (Ax, x) : |x| = 1 \}. \tag{13.6}$$
Proof: The proof of this is just like the proof of Theorem 13.3.2. Simply replace inf
with sup and obtain a decreasing list of eigenvalues. This establishes 13.5. The claim 13.6
follows from Theorem 13.3.2.
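Formulas 13.5 and 13.6 can be illustrated numerically in the real 2 × 2 case, where the unit vectors are exactly (cos t, sin t). The sketch below is a rough check only: the matrix is a hypothetical example whose eigenvalues are 1 and 3, the helper names are invented here, and sampling on a grid gives approximations to the max and min rather than a proof.

```python
from math import cos, sin, pi

def quad_form(a, x):
    """(Ax, x) for a real matrix a and a real vector x."""
    n = len(x)
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(ax[i] * x[i] for i in range(n))

def sampled_extremes(a, samples=3600):
    """Min and max of (Ax, x) over unit vectors (cos t, sin t) on a uniform grid."""
    values = []
    for k in range(samples):
        t = 2 * pi * k / samples
        values.append(quad_form(a, [cos(t), sin(t)]))
    return min(values), max(values)

# Hypothetical symmetric matrix; its eigenvalues are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
smallest, largest = sampled_extremes(A)
print(smallest, largest)
```

For this matrix (Ax, x) = 2 + sin (2t), so the sampled minimum and maximum are close to 1 and 3, the two eigenvalues.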
Another important observation is found in the following corollary.
Corollary 13.3.5 Let A ∈ L (X, X) where A is self adjoint. Then $A = \sum_{i} \lambda_i\, v_i \otimes v_i$ where
Avi = λi vi and $\{v_i\}_{i=1}^{n}$ is an orthonormal basis.
Proof : If vk is one of the orthonormal basis vectors, Avk = λk vk . Also,
$$\sum_{i} \lambda_i\, v_i \otimes v_i (v_k) = \sum_{i} \lambda_i v_i (v_k, v_i) = \sum_{i} \lambda_i \delta_{ik} v_i = \lambda_k v_k.$$
Since the two linear transformations agree on a basis, it follows they must coincide.
By Theorem 12.4.5 this says the matrix of A with respect to this basis $\{v_i\}_{i=1}^{n}$ is the
diagonal matrix having the eigenvalues λ1 , · · · , λn down the main diagonal.
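As an illustration of the formula in Corollary 13.3.5, the following sketch rebuilds a real symmetric matrix from its eigenvalues and orthonormal eigenvectors. It assumes real vectors, for which the matrix of v ⊗ v is the outer product of v with itself; the eigenpairs used are those of the hypothetical matrix with rows (2, 1) and (1, 2), and the function names are invented for this example.

```python
from math import sqrt

def tensor_matrix(v):
    """Matrix of v (x) v for a real vector v, using (v (x) v)(x) = v (x, v)."""
    return [[vi * vj for vj in v] for vi in v]

def spectral_sum(pairs):
    """Sum of lam * (v (x) v) over the given (lam, v) pairs."""
    n = len(pairs[0][1])
    total = [[0.0] * n for _ in range(n)]
    for lam, v in pairs:
        m = tensor_matrix(v)
        for i in range(n):
            for j in range(n):
                total[i][j] += lam * m[i][j]
    return total

s = 1 / sqrt(2)
# Orthonormal eigenvectors of [[2, 1], [1, 2]] with eigenvalues 1 and 3.
pairs = [(1.0, [s, -s]), (3.0, [s, s])]
A_rebuilt = spectral_sum(pairs)
print(A_rebuilt)
```

Up to round-off the result has rows (2, 1) and (1, 2), so the sum of the rank one pieces recovers the original matrix.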
The result of Courant and Fischer which follows resembles Corollary 13.3.3 but is more
useful because it does not depend on a knowledge of the eigenvectors.
Theorem 13.3.6 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner
product space. Then for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues of A, there exist orthonormal
vectors {u1 , · · · , un } for which
Auk = λk uk .
Furthermore,
$$\lambda_k \equiv \max_{w_1, \cdots, w_{k-1}} \Big\{ \min \big\{ (Ax, x) : |x| = 1, x \in \{w_1, \cdots, w_{k-1}\}^{\perp} \big\} \Big\} \tag{13.7}$$
where if k = 1, $\{w_1, \cdots, w_{k-1}\}^{\perp} \equiv X$.
Proof: From Theorem 13.3.2, there exist eigenvalues and eigenvectors with {u1 , · · · , un }
orthonormal and λi ≤ λi+1 . Therefore, by Corollary 13.3.5
$$A = \sum_{j=1}^{n} \lambda_j\, u_j \otimes u_j$$
Fix {w1 , · · · , wk−1 }.
$$(Ax, x) = \sum_{j=1}^{n} \lambda_j (x, u_j)(u_j, x) = \sum_{j=1}^{n} \lambda_j |(x, u_j)|^2$$
Then let $Y = \{w_1, \cdots, w_{k-1}\}^{\perp}$
$$\inf \{ (Ax, x) : |x| = 1, x \in Y \} = \inf \Big\{ \sum_{j=1}^{n} \lambda_j |(x, u_j)|^2 : |x| = 1, x \in Y \Big\}$$
$$\le \inf \Big\{ \sum_{j=1}^{k} \lambda_j |(x, u_j)|^2 : |x| = 1, (x, u_j) = 0 \text{ for } j > k, \text{ and } x \in Y \Big\}. \tag{13.8}$$
The reason this is so is that the infimum is taken over a smaller set. Therefore, the infimum
gets larger. Now 13.8 is no larger than
$$\inf \Big\{ \lambda_k \sum_{j=1}^{k} |(x, u_j)|^2 : |x| = 1, (x, u_j) = 0 \text{ for } j > k, \text{ and } x \in Y \Big\} = \lambda_k$$
because since {u1 , · · · , un } is an orthonormal basis, $|x|^2 = \sum_{j=1}^{n} |(x, u_j)|^2$. It follows since
{w1 , · · · , wk−1 } is arbitrary,
$$\sup_{w_1, \cdots, w_{k-1}} \Big\{ \inf \big\{ (Ax, x) : |x| = 1, x \in \{w_1, \cdots, w_{k-1}\}^{\perp} \big\} \Big\} \le \lambda_k. \tag{13.9}$$
However, for each w1 , · · · , wk−1 , the infimum is achieved so you can replace the inf in the
above with min. In addition to this, it follows from Corollary 13.3.3 that there exists a set,
{w1 , · · · , wk−1 } for which
$$\inf \big\{ (Ax, x) : |x| = 1, x \in \{w_1, \cdots, w_{k-1}\}^{\perp} \big\} = \lambda_k.$$
Pick {w1 , · · · , wk−1 } = {u1 , · · · , uk−1 } . Therefore, the sup in 13.9 is achieved and equals
λk and 13.7 follows.
The following corollary is immediate.
Corollary 13.3.7 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner
product space. Then for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues of A, there exist orthonormal
vectors {u1 , · · · , un } for which
Auk = λk uk .
Furthermore,
$$\lambda_k \equiv \max_{w_1, \cdots, w_{k-1}} \Big\{ \min \Big\{ \frac{(Ax, x)}{|x|^2} : x \neq 0, x \in \{w_1, \cdots, w_{k-1}\}^{\perp} \Big\} \Big\} \tag{13.10}$$
where if k = 1, $\{w_1, \cdots, w_{k-1}\}^{\perp} \equiv X$.
Here is a version of this for which the roles of max and min are reversed.
Corollary 13.3.8 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner
product space. Then for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues of A, there exist orthonormal
vectors {u1 , · · · , un } for which
Auk = λk uk .
Furthermore,
$$\lambda_k \equiv \min_{w_1, \cdots, w_{n-k}} \Big\{ \max \Big\{ \frac{(Ax, x)}{|x|^2} : x \neq 0, x \in \{w_1, \cdots, w_{n-k}\}^{\perp} \Big\} \Big\} \tag{13.11}$$
where if k = n, $\{w_1, \cdots, w_{n-k}\}^{\perp} \equiv X$.
13.4 Positive And Negative Linear Transformations
The notion of a positive definite or negative definite linear transformation is very important
in many applications. In particular it is used in versions of the second derivative test for
functions of many variables. Here the main interest is the case of a linear transformation
which is an n×n matrix but the theorem is stated and proved using a more general notation
because all these issues discussed here have interesting generalizations to functional analysis.
Definition 13.4.1 A self adjoint A ∈ L (X, X) , is positive definite if whenever x ̸= 0,
(Ax, x) > 0 and A is negative definite if for all x ̸= 0, (Ax, x) < 0. A is positive semidefinite or just nonnegative for short if for all x, (Ax, x) ≥ 0. A is negative semidefinite or
nonpositive for short if for all x, (Ax, x) ≤ 0.
The following lemma is of fundamental importance in determining which linear transformations are positive or negative definite.
Lemma 13.4.2 Let X be a finite dimensional inner product space. A self adjoint A ∈
L (X, X) is positive definite if and only if all its eigenvalues are positive and negative definite
if and only if all its eigenvalues are negative. It is positive semidefinite if all the eigenvalues
are nonnegative and it is negative semidefinite if all the eigenvalues are nonpositive.
Proof: Suppose first that A is positive definite and let λ be an eigenvalue. Then for x
an eigenvector corresponding to λ, λ (x, x) = (λx, x) = (Ax, x) > 0. Therefore, λ > 0 as
claimed.
Now suppose all the eigenvalues of A are positive. From Theorem 13.3.2 and Corollary
13.3.5, $A = \sum_{i=1}^n \lambda_i u_i \otimes u_i$ where the λi are the positive eigenvalues and {ui } are an
orthonormal set of eigenvectors. Therefore, letting x ̸= 0,
\[ (Ax, x) = \left( \left( \sum_{i=1}^n \lambda_i u_i \otimes u_i \right) x, x \right) = \left( \sum_{i=1}^n \lambda_i u_i (x, u_i), x \right) = \sum_{i=1}^n \lambda_i (x, u_i)(u_i, x) = \sum_{i=1}^n \lambda_i |(u_i, x)|^2 > 0 \]
because, since {ui } is an orthonormal basis, $|x|^2 = \sum_{i=1}^n |(u_i, x)|^2$.
To establish the claim about negative definite, it suffices to note that A is negative
definite if and only if −A is positive definite and the eigenvalues of A are (−1) times the
eigenvalues of −A. The claims about positive semidefinite and negative semidefinite are
obtained similarly.
The next theorem is about a way to recognize whether a self adjoint n × n complex
matrix A is positive or negative definite without having to find the eigenvalues. In order
to state this theorem, here is some notation.
Definition 13.4.3 Let A be an n × n matrix. Denote by Ak the k × k matrix obtained by
deleting the k + 1, · · · , n columns and the k + 1, · · · , n rows from A. Thus An = A and Ak
is the k × k submatrix of A which occupies the upper left corner of A. The determinants of
these submatrices are called the principal minors.
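The definition can be tried out numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `leading_principal_minors` and the matrix A are illustrative choices, not taken from the text. It computes det (Ak ) for each upper left corner block and, for comparison with Lemma 13.4.2, the eigenvalues of A.

```python
import numpy as np

def leading_principal_minors(A):
    """Return [det(A_1), ..., det(A_n)], where A_k is the upper-left k x k block."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]).real for k in range(1, n + 1)]

# Illustrative self adjoint matrix (an assumption of this sketch).
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

minors = leading_principal_minors(A)   # determinants of the corner blocks
eigenvalues = np.linalg.eigvalsh(A)    # real, because A is self adjoint
```

For this particular A every determinant and every eigenvalue comes out positive, so both tests agree that A is positive definite.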
The following theorem is proved in [8]
Theorem 13.4.4 Let A be a self adjoint n × n matrix. Then A is positive definite if and
only if det (Ak ) > 0 for every k = 1, · · · , n.
Proof: This theorem is proved by induction on n. It is clearly true if n = 1. Suppose
then that it is true for n−1 where n ≥ 2. Since det (A) > 0, it follows that all the eigenvalues
are nonzero. Are they all positive? Suppose not. Then there is some even number of them
which are negative, even because the product of all the eigenvalues is known to be positive,
equaling det (A). Pick two, λ1 and λ2 and let Aui = λi ui where ui ̸= 0 for i = 1, 2 and
(u1 , u2 ) = 0. Now if y ≡ α1 u1 + α2 u2 is a nonzero element of span (u1 , u2 ) , then since these are
eigenvectors and (u1 , u2 ) = 0, a short computation shows
\[ (A(\alpha_1 u_1 + \alpha_2 u_2), \alpha_1 u_1 + \alpha_2 u_2) = |\alpha_1|^2 \lambda_1 |u_1|^2 + |\alpha_2|^2 \lambda_2 |u_2|^2 < 0. \]
Now letting x ∈ Cn−1 , x ̸= 0, the induction hypothesis implies
\[ \begin{pmatrix} x^* & 0 \end{pmatrix} A \begin{pmatrix} x \\ 0 \end{pmatrix} = x^* A_{n-1} x = (Ax, x) > 0. \]
The dimension of {z ∈ Cn : zn = 0} is n − 1 and the dimension of span (u1 , u2 ) = 2 and so
there must be some nonzero x ∈ Cn which is in both of these subspaces of Cn . However,
the first computation would require that (Ax, x) < 0 while the second would require that
(Ax, x) > 0. This contradiction shows that all the eigenvalues must be positive. This proves
the if part of the theorem.
To show the converse, note that, as above, (Ax, x) = xT Ax. Suppose that A is positive
definite. Then this is equivalent to having
\[ x^T A x \geq \delta \|x\|^2 \]
Note that for x ∈ Rk ,
\[ \begin{pmatrix} x^T & 0 \end{pmatrix} A \begin{pmatrix} x \\ 0 \end{pmatrix} = x^T A_k x \geq \delta \|x\|^2 \]
From Lemma 13.4.2, this implies that all the eigenvalues of Ak are positive. Hence from
Lemma 13.4.2, it follows that det (Ak ) > 0, being the product of its eigenvalues.
Corollary 13.4.5 Let A be a self adjoint n × n matrix. Then A is negative definite if and
only if $\det(A_k)(-1)^k > 0$ for every k = 1, · · · , n.
Proof: This is immediate from the above theorem by noting that, as in the proof of
Lemma 13.4.2, A is negative definite if and only if −A is positive definite. Therefore,
if det (−Ak ) > 0 for all k = 1, · · · , n, it follows that A is negative definite. However,
$\det(-A_k) = (-1)^k \det(A_k)$.
13.5
The Square Root
With the above theory, it is possible to take fractional powers of certain elements of L (X, X)
where X is a finite dimensional inner product space. I will give two treatments of this, the
first pertaining to the square root only and the second more generally pertaining to the k th
root of a self adjoint nonnegative matrix.
Theorem 13.5.1 Let A ∈ L (X, X) be self adjoint and nonnegative. Then there exists a
unique self adjoint nonnegative B ∈ L (X, X) such that B 2 = A and B commutes with every
element of L (X, X) which commutes with A.
Proof: By Theorem 13.3.2, there exists an orthonormal basis of eigenvectors of A, say
$\{v_i\}_{i=1}^n$ such that Avi = λi vi . Therefore, by Theorem 13.2.4, $A = \sum_i \lambda_i v_i \otimes v_i$.
Now by Lemma 13.4.2, each λi ≥ 0. Therefore, it makes sense to define
\[ B \equiv \sum_i \lambda_i^{1/2} v_i \otimes v_i. \]
It is easy to verify that
\[ (v_i \otimes v_i)(v_j \otimes v_j) = \begin{cases} 0 & \text{if } i \neq j \\ v_i \otimes v_i & \text{if } i = j \end{cases}. \]
Therefore, a short computation verifies that $B^2 = \sum_i \lambda_i v_i \otimes v_i = A$. If C commutes with
A, then for some cij ,
\[ C = \sum_{ij} c_{ij} v_i \otimes v_j \]
and so since they commute,
\[ \sum_{i,j,k} c_{ij} v_i \otimes v_j \lambda_k v_k \otimes v_k = \sum_{i,j,k} c_{ij} \lambda_k \delta_{jk} v_i \otimes v_k = \sum_{i,k} c_{ik} \lambda_k v_i \otimes v_k \]
\[ = \sum_{i,j,k} c_{ij} \lambda_k v_k \otimes v_k v_i \otimes v_j = \sum_{i,j,k} c_{ij} \lambda_k \delta_{ki} v_k \otimes v_j = \sum_{j,k} c_{kj} \lambda_k v_k \otimes v_j = \sum_{k,i} c_{ik} \lambda_i v_i \otimes v_k \]
Then by independence,
\[ c_{ik} \lambda_i = c_{ik} \lambda_k \]
Therefore, $c_{ik} \lambda_i^{1/2} = c_{ik} \lambda_k^{1/2}$ which amounts to saying that B also commutes with C. It is
clear that this operator is self adjoint. This proves existence.
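The construction of B in the existence argument can be sketched numerically. This is only an illustration under stated assumptions: NumPy is available, `psd_sqrt` is a hypothetical helper name, and the matrix A is an arbitrary example. The routine `np.linalg.eigh` supplies the orthonormal eigenvectors vi as the columns of V.

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a self adjoint nonnegative matrix via its eigenvectors."""
    lam, V = np.linalg.eigh(A)        # A = V diag(lam) V*, columns of V orthonormal
    lam = np.clip(lam, 0.0, None)     # guard against tiny negative rounding errors
    return (V * np.sqrt(lam)) @ V.conj().T   # sum of lam_i^(1/2) v_i (x) v_i

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # illustrative matrix with eigenvalues 1 and 3
B = psd_sqrt(A)
```

One can then check that B @ B reproduces A up to rounding, that B is self adjoint, and that B commutes with a matrix such as A @ A + 3 * A which commutes with A.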
Suppose B1 is another square root which is self adjoint, nonnegative and commutes with
every matrix which commutes with A. Since both B, B1 are nonnegative,
\[ (B(B - B_1)x, (B - B_1)x) \geq 0, \quad (B_1(B - B_1)x, (B - B_1)x) \geq 0 \tag{13.12} \]
Now, adding these together, and using the fact that the two commute,
\[ \left( \left( B^2 - B_1^2 \right) x, (B - B_1)x \right) = ((A - A)x, (B - B_1)x) = 0. \]
It follows that both inner products in 13.12 equal 0. Next use the existence part of this to
take the square root of B and B1 which is denoted by $\sqrt{B}$, $\sqrt{B_1}$ respectively. Then
\[ 0 = \left( \sqrt{B}(B - B_1)x, \sqrt{B}(B - B_1)x \right) \]
\[ 0 = \left( \sqrt{B_1}(B - B_1)x, \sqrt{B_1}(B - B_1)x \right) \]
which implies $\sqrt{B}(B - B_1)x = \sqrt{B_1}(B - B_1)x = 0$. Thus also,
\[ B(B - B_1)x = B_1(B - B_1)x = 0 \]
Hence
\[ 0 = (B(B - B_1)x - B_1(B - B_1)x, x) = ((B - B_1)x, (B - B_1)x) \]
and so, since x is arbitrary, B1 = B.
13.6
Fractional Powers
The main result is the following theorem.
Theorem 13.6.1 Let A be a self adjoint and nonnegative n × n matrix (all eigenvalues
are nonnegative) and let k be a positive integer. Then there exists a unique self adjoint
nonnegative matrix B such that B k = A.
Proof: By Theorem 13.3.2 or Corollary 7.4.12, there exists an orthonormal basis of
eigenvectors of A, say $\{v_i\}_{i=1}^n$ such that Avi = λi vi with each λi real. In particular, there
exists a unitary matrix U such that
\[ U^* A U = D, \quad A = U D U^* \]
where D has nonnegative diagonal entries. Define B in the obvious way.
\[ B \equiv U D^{1/k} U^* \]
Then it is clear that B is self adjoint and nonnegative. Also it is clear that $B^k = A$. What
of uniqueness? Let p (t) be a polynomial whose graph contains the ordered pairs $(\lambda_i, \lambda_i^{1/k})$
where the λi are the diagonal entries of D, the eigenvalues of A. Then
\[ p(A) = U p(D) U^* = U D^{1/k} U^* \equiv B \]
Suppose then that $C^k = A$ and C is also self adjoint and nonnegative.
\[ CB = Cp(A) = Cp\left(C^k\right) = p\left(C^k\right)C = p(A)C = BC \]
and so {B, C} is a commuting family of non defective matrices. By Theorem 13.1.9 this
family of matrices is simultaneously diagonalizable. Hence there exists a single S such that
\[ S^{-1} B S = D_B, \quad S^{-1} C S = D_C \]
where DC , DB denote diagonal matrices. Hence, raising to the power k, it follows that
\[ A = B^k = S D_B^k S^{-1}, \quad A = C^k = S D_C^k S^{-1} \]
Hence
\[ S D_B^k S^{-1} = S D_C^k S^{-1} \]
and so $D_B^k = D_C^k$. Since the entries of the two diagonal matrices are nonnegative, this implies
DB = DC and so $S^{-1} B S = S^{-1} C S$ which shows B = C.
A similar result holds for a general finite dimensional inner product space. See Problem
23 in the exercises.
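The formula for the k th root is easy to try out. Here is a minimal sketch, assuming NumPy; the helper name `psd_kth_root` and the test matrix are illustrative assumptions. It forms the matrix U D^(1/k) U∗ from an eigendecomposition of A.

```python
import numpy as np

def psd_kth_root(A, k):
    """k-th root B = U D^(1/k) U* of a self adjoint nonnegative matrix A."""
    d, U = np.linalg.eigh(A)          # U* A U = D with D = diag(d)
    d = np.clip(d, 0.0, None)         # remove tiny negative rounding errors
    return (U * d ** (1.0 / k)) @ U.conj().T

A = np.array([[5.0, 4.0],
              [4.0, 5.0]])            # illustrative matrix with eigenvalues 1 and 9
B = psd_kth_root(A, 3)
```

Raising B to the third power with `np.linalg.matrix_power(B, 3)` should return A up to rounding error.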
13.7
Polar Decompositions
An application of Theorem 13.3.2, is the following fundamental result, important in geometric measure theory and continuum mechanics. It is sometimes called the right polar
decomposition. The notation used is that which is seen in continuum mechanics, see for
example Gurtin [12]. Don’t confuse the U in this theorem with a unitary transformation.
It is not so. When the following theorem is applied in continuum mechanics, F is normally
the deformation gradient, the derivative of a nonlinear map from some subset of three dimensional space to three dimensional space. In this context, U is called the right stretch tensor and U 2 = F ∗ F is called the right Cauchy
Green strain tensor. U is a measure of how a body is stretched independent of rigid motions.
First, here is a simple lemma.
Lemma 13.7.1 Suppose R ∈ L (X, Y ) where X, Y are inner product spaces and R preserves
distances. Then R∗ R = I.
Proof: Since R preserves distances, |Rx| = |x| for every x. Therefore from the axioms
of the inner product,
\[ |x|^2 + |y|^2 + (x, y) + (y, x) = |x + y|^2 = (R(x + y), R(x + y)) \]
\[ = (Rx, Rx) + (Ry, Ry) + (Rx, Ry) + (Ry, Rx) \]
\[ = |x|^2 + |y|^2 + (R^* R x, y) + (y, R^* R x) \]
and so for all x, y,
\[ (R^* R x - x, y) + (y, R^* R x - x) = 0 \]
Hence for all x, y,
\[ \operatorname{Re}(R^* R x - x, y) = 0 \]
Now for x, y given, choose α ∈ C with |α| = 1 such that
\[ \alpha (R^* R x - x, y) = |(R^* R x - x, y)| \]
Then
\[ 0 = \operatorname{Re}(R^* R x - x, \overline{\alpha} y) = \operatorname{Re} \alpha (R^* R x - x, y) = |(R^* R x - x, y)| \]
Thus |(R∗ Rx − x, y)| = 0 for all x, y because the given x, y were arbitrary. Let y =
R∗ Rx − x to conclude that for all x, R∗ Rx − x = 0 which says R∗ R = I since x is arbitrary.
The decomposition in the following is called the right polar decomposition.
Theorem 13.7.2 Let X be an inner product space of dimension n and let Y be an inner
product space of dimension m ≥ n and let F ∈ L (X, Y ). Then there exists R ∈ L (X, Y )
and U ∈ L (X, X) such that
F = RU, U = U ∗ , (U is Hermitian),
all eigenvalues of U are nonnegative,
U 2 = F ∗ F, R∗ R = I,
and |Rx| = |x| .
Proof: $(F^* F)^* = F^* F$ and so by Theorem 13.3.2, there is an orthonormal basis of
eigenvectors, {v1 , · · · , vn } such that
\[ F^* F v_i = \lambda_i v_i, \quad F^* F = \sum_{i=1}^n \lambda_i v_i \otimes v_i. \]
It is also clear that λi ≥ 0 because
\[ \lambda_i (v_i, v_i) = (F^* F v_i, v_i) = (F v_i, F v_i) \geq 0. \]
Let
\[ U \equiv \sum_{i=1}^n \lambda_i^{1/2} v_i \otimes v_i. \]
Then $U^2 = F^* F$, $U = U^*$, and the eigenvalues of U, $\{\lambda_i^{1/2}\}_{i=1}^n$, are all nonnegative.
Let {U x1 , · · · , U xr } be an orthonormal basis for U (X) . By the Gram Schmidt procedure
there exists an extension to an orthonormal basis for X,
{U x1 , · · · , U xr , yr+1 , · · · , yn } .
Next note that {F x1 , · · · , F xr } is also an orthonormal set of vectors in Y because
\[ (F x_k, F x_j) = (F^* F x_k, x_j) = \left( U^2 x_k, x_j \right) = (U x_k, U x_j) = \delta_{jk}. \]
By the Gram Schmidt procedure, there exists an extension of {F x1 , · · · , F xr } to an orthonormal basis for Y,
{F x1 , · · · , F xr , zr+1 , · · · , zm } .
Since m ≥ n, there are at least as many zk as there are yk . Now for x ∈ X, since
{U x1 , · · · , U xr , yr+1 , · · · , yn }
is an orthonormal basis for X, there exist unique scalars
c1 , · · · , cr , dr+1 , · · · , dn
such that
\[ x = \sum_{k=1}^r c_k U x_k + \sum_{k=r+1}^n d_k y_k \]
Define
\[ Rx \equiv \sum_{k=1}^r c_k F x_k + \sum_{k=r+1}^n d_k z_k \tag{13.13} \]
Thus
\[ |Rx|^2 = \sum_{k=1}^r |c_k|^2 + \sum_{k=r+1}^n |d_k|^2 = |x|^2. \]
Therefore, by Lemma 13.7.1 R∗ R = I.
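Then also there exist scalars bk such that
\[ U x = \sum_{k=1}^r b_k U x_k \tag{13.14} \]
and so from 13.13,
\[ R U x = \sum_{k=1}^r b_k F x_k = F \left( \sum_{k=1}^r b_k x_k \right) \]
Is $F\left(\sum_{k=1}^r b_k x_k\right) = F(x)$?
\[ \left( F\left( \sum_{k=1}^r b_k x_k \right) - F(x), F\left( \sum_{k=1}^r b_k x_k \right) - F(x) \right) \]
\[ = \left( (F^* F)\left( \sum_{k=1}^r b_k x_k - x \right), \left( \sum_{k=1}^r b_k x_k - x \right) \right) \]
\[ = \left( U^2 \left( \sum_{k=1}^r b_k x_k - x \right), \left( \sum_{k=1}^r b_k x_k - x \right) \right) \]
\[ = \left( U\left( \sum_{k=1}^r b_k x_k - x \right), U\left( \sum_{k=1}^r b_k x_k - x \right) \right) \]
\[ = \left( \sum_{k=1}^r b_k U x_k - U x, \sum_{k=1}^r b_k U x_k - U x \right) = 0 \]
because from 13.14, $U x = \sum_{k=1}^r b_k U x_k$. Therefore, $R U x = F\left(\sum_{k=1}^r b_k x_k\right) = F(x)$.
The following corollary follows as a simple consequence of this theorem. It is called the
left polar decomposition.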
Corollary 13.7.3 Let F ∈ L (X, Y ) and suppose n ≥ m where X is an inner product space of
dimension n and Y is an inner product space of dimension m. Then there exists a Hermitian
U ∈ L (Y, Y ) , and an element of L (X, Y ) , R, such that
F = U R, RR∗ = I.
Proof: Recall that L∗∗ = L and $(ML)^* = L^* M^*$. Now apply Theorem 13.7.2 to
F ∗ ∈ L (Y, X). Thus, F ∗ = R∗ U where R∗ and U satisfy the conditions of that theorem.
Then F = U R and RR∗ = R∗∗ R∗ = I.
The following existence theorem for the polar decomposition of an element of L (X, X)
is a corollary.
Corollary 13.7.4 Let F ∈ L (X, X). Then there exists a Hermitian W ∈ L (X, X) , and
a unitary matrix Q such that F = W Q, and there exists a Hermitian U ∈ L (X, X) and a
unitary R, such that F = RU.
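For a square matrix, both factorizations in this corollary can be computed in a few lines. The sketch below is an illustration only and does not follow the construction used in the proof of Theorem 13.7.2: it assumes NumPy and takes a shortcut through the routine `np.linalg.svd`, which returns a factorization F = P diag(s) Vh with P and Vh unitary. The names `polar_decompositions`, P, s, Vh and the matrix F are assumptions of the sketch.

```python
import numpy as np

def polar_decompositions(F):
    """Return (R, U, W, Q) with F = R U = W Q, R and Q unitary, U and W Hermitian."""
    P, s, Vh = np.linalg.svd(F)       # F = P diag(s) Vh, with P and Vh unitary
    V = Vh.conj().T
    R = P @ Vh                        # unitary factor
    U = (V * s) @ Vh                  # U = V diag(s) V*, so U^2 = F* F
    W = (P * s) @ P.conj().T          # W = P diag(s) P*, so W^2 = F F*
    return R, U, W, R                 # the same unitary factor serves as Q

F = np.array([[1.0, 2.0],
              [0.0, 1.0]])            # illustrative matrix
R, U, W, Q = polar_decompositions(F)
```

With these factors, R @ U and W @ Q both reproduce F, and U @ U agrees with F∗ F up to rounding.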
This corollary has a fascinating relation to the question whether a given linear transformation is normal. Recall that an n × n matrix A, is normal if AA∗ = A∗ A. Retain the same
definition for an element of L (X, X) .
Theorem 13.7.5 Let F ∈ L (X, X) . Then F is normal if and only if in Corollary 13.7.4
RU = U R and QW = W Q.
Proof: I will prove the statement about RU = U R and leave the other part as an
exercise. First suppose that RU = U R and show F is normal. To begin with,
\[ U R^* = (RU)^* = (UR)^* = R^* U. \]
Therefore,
\[ F^* F = U R^* R U = U^2 \]
\[ F F^* = R U U R^* = U R R^* U = U^2 \]
which shows F is normal.
Now suppose F is normal. Is RU = U R? Since F is normal,
F F ∗ = RU U R∗ = RU 2 R∗
and
F ∗ F = U R∗ RU = U 2 .
Therefore, RU 2 R∗ = U 2 , and both are nonnegative and self adjoint. Therefore, the square
roots of both sides must be equal by the uniqueness part of the theorem on fractional powers.
It follows that the square root of the first, RU R∗ must equal the square root of the second,
U. Therefore, RU R∗ = U and so RU = U R. This proves the theorem in one case. The other
case in which W and Q commute is left as an exercise.
13.8
An Application To Statistics
A random vector is a function X : Ω → Rp where Ω is a probability space. This means
that there exists a σ algebra of measurable sets F and a probability measure P : F → [0, 1].
In practice, people often don’t worry too much about the underlying probability space and
instead pay more attention to the distribution measure of the random variable. For E a
suitable subset of Rp , this measure gives the probability that X has values in E. There
are often excellent reasons for believing that a random vector is normally distributed. This
means that the probability that X has values in a set E is given by
\[ \int_E \frac{1}{(2\pi)^{p/2} \det(\Sigma)^{1/2}} \exp\left( -\frac{1}{2} (x - m)^* \Sigma^{-1} (x - m) \right) dx \]
The expression in the integral is called the normal probability density function. There are
two parameters, m and Σ where m is called the mean and Σ is called the covariance matrix.
It is a symmetric matrix which has all real eigenvalues which are all positive. While it may
be reasonable to assume this is the distribution, in general, you won’t know m and Σ and
in order to use this formula to predict anything, you would need to know these quantities. I
am following a nice discussion given in Wikipedia which makes use of the existence of square
roots.
What people do to estimate these is to take n independent observations x1 , · · · , xn and
try to predict what m and Σ should be based on these observations. One criterion used for
making this determination is the method of maximum likelihood. In this method, you seek
to choose the two parameters in such a way as to maximize the likelihood which is given as
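\[ \prod_{i=1}^n \frac{1}{\det(\Sigma)^{1/2}} \exp\left( -\frac{1}{2} (x_i - m)^* \Sigma^{-1} (x_i - m) \right). \]
For convenience the term $(2\pi)^{p/2}$ was ignored. Maximizing the above is equivalent to maximizing the ln of the above. So taking ln,
\[ \frac{n}{2} \ln\left( \det\left( \Sigma^{-1} \right) \right) - \frac{1}{2} \sum_{i=1}^n (x_i - m)^* \Sigma^{-1} (x_i - m) \]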
Note that the above is a function of the entries of m. Take the partial derivative with
respect to ml . Since the matrix Σ−1 is symmetric this implies
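\[ \sum_{i=1}^n \sum_r (x_{ir} - m_r) \Sigma^{-1}_{rl} = 0 \text{ for each } l. \]
Therefore,
\[ \sum_{i=1}^n (x_i - m)^* \Sigma^{-1} = 0 \]
and so, multiplying by Σ on the right and then taking adjoints, this yields
\[ nm = \sum_{i=1}^n x_i, \quad m = \frac{1}{n} \sum_{i=1}^n x_i \equiv \bar{x}. \]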
Now that m is determined, it remains to find the best estimate for Σ.
\[ (x_i - m)^* \Sigma^{-1} (x_i - m) = \operatorname{trace}\left( (x_i - m)^* \Sigma^{-1} (x_i - m) \right) = \operatorname{trace}\left( (x_i - m)(x_i - m)^* \Sigma^{-1} \right) \]
Therefore, the thing to maximize is
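\[ n \ln\left( \det\left( \Sigma^{-1} \right) \right) - \sum_{i=1}^n \operatorname{trace}\left( (x_i - m)(x_i - m)^* \Sigma^{-1} \right) \]
\[ = n \ln\left( \det\left( \Sigma^{-1} \right) \right) - \operatorname{trace}\left( \overbrace{\left( \sum_{i=1}^n (x_i - m)(x_i - m)^* \right)}^{S} \Sigma^{-1} \right) \]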
We assume that S has rank p. Thus it is a self adjoint matrix which has all positive eigenvalues. (It would be incredible if this were not the case, especially if n is large.) Therefore,
from the property of the trace, the thing to maximize is
\[ n \ln\left( \det\left( \Sigma^{-1} \right) \right) - \operatorname{trace}\left( S^{1/2} \Sigma^{-1} S^{1/2} \right) \]
Now let B = S 1/2 Σ−1 S 1/2 . Then B is positive and self adjoint also and so there exists U
unitary such that B = U ∗ DU where D is the diagonal matrix having the positive scalars
λ1 , · · · , λp down the main diagonal. Solving for Σ−1 in terms of B, this yields
C (S) + n ln (det (B)) − trace (B)
as the thing to maximize. Of course this yields
\[ C(S) + n \ln\left( \prod_{i=1}^p \lambda_i \right) - \sum_{i=1}^p \lambda_i = C(S) + n \sum_{i=1}^p \ln(\lambda_i) - \sum_{i=1}^p \lambda_i \]
as the quantity to be maximized. To do this, take ∂/∂λk and set equal to 0. This yields
λk = n. Therefore, from the above, B = U ∗ nIU = nI. Also from the above,
\[ B^{-1} = \frac{1}{n} I = S^{-1/2} \Sigma S^{-1/2} \]
and so
\[ \Sigma = \frac{1}{n} S = \frac{1}{n} \sum_{i=1}^n (x_i - m)(x_i - m)^* \]
This has shown that the maximum likelihood estimates are
\[ \Sigma = \frac{1}{n} \sum_{i=1}^n (x_i - m)(x_i - m)^*, \quad m = \bar{x} \equiv \frac{1}{n} \sum_{i=1}^n x_i. \]
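These estimates are simple to compute from data. The following is a minimal sketch, assuming NumPy; the observations are synthetic random vectors generated only for illustration, and the variable names are assumptions of the sketch. The rows of X play the role of the observations x1 , · · · , xn .

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))           # synthetic observations, one per row

m_hat = X.mean(axis=0)                # m = (1/n) sum of the x_i
centered = X - m_hat
S = centered.T @ centered             # S = sum of (x_i - m)(x_i - m)*
Sigma_hat = S / n                     # Sigma = (1/n) S
```

Note the division by n rather than n − 1, so `Sigma_hat` should agree with `np.cov(X, rowvar=False, bias=True)`.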
13.9
The Singular Value Decomposition
In this section, A will be an m × n matrix. To begin with, here is a simple lemma.
Lemma 13.9.1 Let A be an m × n matrix. Then A∗ A is self adjoint and all its eigenvalues
are nonnegative.
Proof: It is obvious that A∗ A is self adjoint. Suppose A∗ Ax = λx. Then $\lambda |x|^2 = (\lambda x, x) = (A^* A x, x) = (Ax, Ax) \geq 0$.
Definition 13.9.2 Let A be an m × n matrix. The singular values of A are the square roots
of the positive eigenvalues of A∗ A.
With this definition and lemma here is the main theorem on the singular value decomposition. In all that follows, I will write the following partitioned matrix
\[ \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]
where σ denotes a k × k diagonal matrix of the form
\[ \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_k \end{pmatrix} \]
and the bottom row of zero matrices in the partitioned matrix, as well as the right columns
of zero matrices are each of the right size so that the resulting matrix is m × n. Either
could vanish completely. However, I will write it in the above form. It is easy to make the
necessary adjustments in the other two cases.
Theorem 13.9.3 Let A be an m × n matrix. Then there exist unitary matrices, U and V
of the appropriate size such that
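\[ U^* A V = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]
where σ is of the form
\[ \sigma = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_k \end{pmatrix} \]
for the σ i the singular values of A, arranged in order of decreasing size.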
Proof: By the above lemma and Theorem 13.3.2 there exists an orthonormal basis,
$\{v_i\}_{i=1}^n$ for Fn such that $A^* A v_i = \sigma_i^2 v_i$ where $\sigma_i^2 > 0$ for i = 1, · · · , k, (σ i > 0) , and equals
zero if i > k. Let the eigenvalues $\sigma_i^2$ be arranged in decreasing order. It is desired to have
\[ A V = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]
and so if $U = \begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix}$, one needs to have for j ≤ k, σ j uj = Avj . Thus let
\[ u_j \equiv \sigma_j^{-1} A v_j, \quad j \leq k \]
Then for i, j ≤ k,
\[ (u_i, u_j) = \sigma_j^{-1} \sigma_i^{-1} (A v_i, A v_j) = \sigma_j^{-1} \sigma_i^{-1} (A^* A v_i, v_j) = \sigma_j^{-1} \sigma_i^{-1} \sigma_i^2 (v_i, v_j) = \delta_{ij} \]
Now extend to an orthonormal basis of Fm , {u1 , · · · , uk , uk+1 , · · · , um } . If i > k,
\[ (A v_i, A v_i) = (A^* A v_i, v_i) = 0 (v_i, v_i) = 0 \]
so Avi = 0. Then for σ given as above in the statement of the theorem, it follows that
\[ A V = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}, \quad U^* A V = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]
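The statement of the theorem can be checked on a small example. This is a hedged sketch, assuming NumPy; it relies on the library routine `np.linalg.svd` rather than on the construction in the proof, and the matrix A is an illustrative choice. With the default `full_matrices=True`, the routine returns a square unitary U, the singular values s in decreasing order, and the matrix Vh = V∗.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 0.0]])            # illustrative 3 x 2 matrix

U, s, Vh = np.linalg.svd(A)           # U is 3 x 3, Vh is 2 x 2, s is decreasing
V = Vh.conj().T

Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)  # partitioned matrix with sigma in the corner

# Eigenvalues of A* A, sorted in decreasing order to line up with s**2.
eig_AstarA = np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1]
```

Here U∗ AV should equal `Sigma`, and the squares of the entries of s should match the eigenvalues of A∗ A.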
The singular value decomposition has as an immediate corollary the following interesting
result.
Corollary 13.9.4 Let A be an m × n matrix. Then the rank of A and A∗ equals the number
of singular values.
Proof: Since V and U are unitary, they are each one to one and onto and so it follows
that
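\[ \operatorname{rank}(A) = \operatorname{rank}(U^* A V) = \operatorname{rank} \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} = \text{number of singular values.} \]
Also since U, V are unitary,
\[ \operatorname{rank}(A^*) = \operatorname{rank}(V^* A^* U) = \operatorname{rank}\left( (U^* A V)^* \right) = \operatorname{rank}\left( \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}^* \right) = \text{number of singular values.} \]
13.10
Approximation In The Frobenius Norm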
The Frobenius norm is one of many norms for a matrix. It is arguably the most obvious of
all norms. Here is its definition.
Definition 13.10.1 Let A be a complex m × n matrix. Then
\[ \|A\|_F \equiv (\operatorname{trace}(A A^*))^{1/2} \]
Also this norm comes from the inner product
\[ (A, B)_F \equiv \operatorname{trace}(A B^*) \]
Thus $\|A\|_F^2$ is easily seen to equal $\sum_{ij} |a_{ij}|^2$ so essentially, it treats the matrix as a vector
in Fm×n .
Lemma 13.10.2 Let A be an m × n complex matrix with singular matrix
\[ \Sigma = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]
with σ as defined above. Then
\[ \|\Sigma\|_F^2 = \|A\|_F^2 \tag{13.15} \]
and the following hold for the Frobenius norm. If U, V are unitary and of the right size,
||U A||F = ||A||F , ||U AV ||F = ||A||F .
(13.16)
Proof: From the definition and letting U, V be unitary and of the right size,
\[ \|U A\|_F^2 \equiv \operatorname{trace}(U A A^* U^*) = \operatorname{trace}(A A^*) = \|A\|_F^2 \]
Also,
\[ \|A V\|_F^2 \equiv \operatorname{trace}(A V V^* A^*) = \operatorname{trace}(A A^*) = \|A\|_F^2. \]
It follows
\[ \|U A V\|_F^2 = \|A V\|_F^2 = \|A\|_F^2. \]
Now consider 13.15. From what was just shown,
\[ \|A\|_F^2 = \|U \Sigma V^*\|_F^2 = \|\Sigma\|_F^2. \]
Of course, this shows that
\[ \|A\|_F^2 = \sum_i \sigma_i^2, \]
the sum of the squares of the singular values of A.
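The identities just obtained can be checked numerically. Here is a small sketch, assuming NumPy; the random complex matrix A and the unitary matrix Q (taken from a QR factorization) are illustrative assumptions. It computes the squared Frobenius norm in three ways and also after multiplication by Q.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))

fro_sq_trace = np.trace(A @ A.conj().T).real      # trace(A A*)
fro_sq_entries = np.sum(np.abs(A) ** 2)           # sum of |a_ij|^2
fro_sq_singular = np.sum(np.linalg.svd(A, compute_uv=False) ** 2)

# A unitary Q from a QR factorization, used to check invariance of the norm.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
fro_sq_rotated = np.linalg.norm(Q @ A, "fro") ** 2
```

All four numbers should agree up to rounding error.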
Why is the singular value decomposition important? It implies
\[ A = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^* \]
where σ is the diagonal matrix having the singular values down the diagonal. Now sometimes
A is a huge matrix, 1000×2000 or something like that. This happens in applications to
situations where the entries of A describe a picture. What also happens is that most of the
singular values are very small. What if you deleted those which were very small, say for all
i > l and got a new matrix
\[ A' \equiv U \begin{pmatrix} \sigma' & 0 \\ 0 & 0 \end{pmatrix} V^* ? \]
Then the entries of A′ would end up being close to the entries of A but there is much less
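information to keep track of. This turns out to be very useful. More precisely, letting
\[ \sigma = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_r \end{pmatrix}, \quad U^* A V = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}, \]
\[ \|A - A'\|_F^2 = \left\| U \begin{pmatrix} \sigma - \sigma' & 0 \\ 0 & 0 \end{pmatrix} V^* \right\|_F^2 = \sum_{k=l+1}^r \sigma_k^2 \]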
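The error formula just displayed can be illustrated as follows. This is a minimal sketch, assuming NumPy; the helper name `truncate_svd`, the random matrix A, and the choice l = 2 are assumptions made for the example. It keeps the l largest singular values and compares the squared Frobenius error with the sum of the squares of the discarded ones.

```python
import numpy as np

def truncate_svd(A, l):
    """Keep the l largest singular values of A and set the rest to zero."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (U[:, :l] * s[:l]) @ Vh[:l, :], s

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 5))
l = 2
A_prime, s = truncate_svd(A, l)

error_sq = np.linalg.norm(A - A_prime, "fro") ** 2
tail_sq = np.sum(s[l:] ** 2)          # sum of sigma_k^2 over the discarded k
```

The two quantities `error_sq` and `tail_sq` should coincide up to rounding, and `A_prime` should have rank l.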
Thus A is approximated by A′ where A′ has rank l < r. In fact, it is also true that out
of all matrices of rank l, this A′ is the one which is closest to A in the Frobenius norm. Here
is why.
Let B be a matrix which has rank l. Then from Lemma 13.10.2
\[ \|A - B\|_F^2 = \|U^* (A - B) V\|_F^2 = \|U^* A V - U^* B V\|_F^2 = \left\| \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} - U^* B V \right\|_F^2 \]
How can you make the last entry small? Clearly you should have the off diagonal entries
of U ∗ BV equal to zero since otherwise, you could set them equal to zero and make the
expression smaller. Since the singular values of A decrease from the upper left to the lower
right, it follows that for B to be closest as possible to A in the Frobenius norm,
\[ U^* B V = \begin{pmatrix} \sigma' & 0 \\ 0 & 0 \end{pmatrix} \]
where the singular values σ k for k > l are set equal to zero to obtain σ ′ . This implies
B = A′ above. This is really obvious if you look at a simple example. Say
\[ \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
for example. Then what rank 1 matrix would be closest to this one in the Frobenius norm?
Obviously
\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
13.11
Least Squares And Singular Value Decomposition
The singular value decomposition also has a very interesting connection to the problem of
least squares solutions. Recall that it was desired to find x such that |Ax − y| is as small as
possible. Lemma 12.5.1 shows that there is a solution to this problem which can be found by
solving the system A∗ Ax = A∗ y. Each x which solves this system solves the minimization
problem as was shown in the lemma just mentioned. Now consider this equation for the
solutions of the minimization problem in terms of the singular value decomposition.
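\[ \overbrace{V \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} U^*}^{A^*} \overbrace{U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^*}^{A} x = \overbrace{V \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} U^*}^{A^*} y. \]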
Therefore, this yields the following upon using block multiplication and multiplying on the
left by V ∗ .
\[ \begin{pmatrix} \sigma^2 & 0 \\ 0 & 0 \end{pmatrix} V^* x = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} U^* y. \tag{13.17} \]
One solution to this equation which is very easy to spot is
\[ x = V \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^* y. \tag{13.18} \]
13.12
The Moore Penrose Inverse
The particular solution of the least squares problem given in 13.18 is important enough that
it motivates the following definition.
Definition 13.12.1 Let A be an m × n matrix. Then the Moore Penrose inverse of A,
denoted by A+ is defined as
\[ A^+ \equiv V \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^*. \]
Here
\[ U^* A V = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]
as above.
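The definition translates directly into a short computation. The sketch below is illustrative only and assumes NumPy; the helper name `moore_penrose`, the tolerance `tol` used to decide which singular values count as positive, and the example A and y are all assumptions. It builds A+ from the singular value decomposition and applies it to a vector y.

```python
import numpy as np

def moore_penrose(A, tol=1e-12):
    """A+ = V [[sigma^-1, 0], [0, 0]] U*, built from the singular value decomposition."""
    U, s, Vh = np.linalg.svd(A)                     # A = U Sigma V*
    k = int(np.sum(s > tol))                        # number of positive singular values
    inv_block = np.zeros((A.shape[1], A.shape[0]))  # n x m, the shape of A+
    inv_block[:k, :k] = np.diag(1.0 / s[:k])
    return Vh.conj().T @ inv_block @ U.conj().T

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])                          # rank one, so A* A is not invertible
y = np.array([1.0, 0.0, 1.0])
x = moore_penrose(A) @ y
```

Even though A∗ A has no inverse here, the vector x satisfies A∗ Ax = A∗ y, and the matrix produced agrees with NumPy's own `np.linalg.pinv(A)`.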
Thus A+ y is a solution to the minimization problem to find x which minimizes |Ax − y| .
In fact, one can say more about this. In the following picture My denotes the set of least
squares solutions x such that A∗ Ax = A∗ y.
[Figure: the set My of least squares solutions, a point x in My , the subspace ker(A∗ A), and the vector A+ (y).]
Then A+ (y) is as given in the picture.
Proposition 13.12.2 A+ y is the solution to the problem of minimizing |Ax − y| for all x
which has smallest norm. Thus
\[ \left| A A^+ y - y \right| \leq |Ax - y| \text{ for all } x \]
and if x1 satisfies |Ax1 − y| ≤ |Ax − y| for all x, then |A+ y| ≤ |x1 | .
Proof: Consider x satisfying 13.17, equivalently A∗ Ax = A∗ y,
\[ \begin{pmatrix} \sigma^2 & 0 \\ 0 & 0 \end{pmatrix} V^* x = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} U^* y \]
which has smallest norm. This is equivalent to making |V ∗ x| as small as possible because
V ∗ is unitary and so it preserves norms. For z a vector, denote by (z)k the vector in Fk
which consists of the first k entries of z. Then if x is a solution to 13.17
\[ \begin{pmatrix} \sigma^2 (V^* x)_k \\ 0 \end{pmatrix} = \begin{pmatrix} \sigma (U^* y)_k \\ 0 \end{pmatrix} \]
and so (V ∗ x)k = σ −1 (U ∗ y)k . Thus the first k entries of V ∗ x are determined. In order to
make |V ∗ x| as small as possible, the remaining n − k entries should equal zero. Therefore,
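\[ V^* x = \begin{pmatrix} (V^* x)_k \\ 0 \end{pmatrix} = \begin{pmatrix} \sigma^{-1} (U^* y)_k \\ 0 \end{pmatrix} = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^* y \]
and so
\[ x = V \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^* y \equiv A^+ y \]
Lemma 13.12.3 The matrix A+ satisfies the following conditions.
\[ A A^+ A = A, \quad A^+ A A^+ = A^+, \quad A^+ A \text{ and } A A^+ \text{ are Hermitian.} \tag{13.19} \]
Proof: This is routine. Recall
\[ A = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^* \]
and
\[ A^+ = V \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^* \]
so you just plug in and verify it works.
A much more interesting observation is that A+ is characterized as being the unique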
matrix which satisfies 13.19. This is the content of the following Theorem. The conditions
are sometimes called the Penrose conditions.
Theorem 13.12.4 Let A be an m × n matrix. Then a matrix A0 , is the Moore Penrose
inverse of A if and only if A0 satisfies
AA0 A = A, A0 AA0 = A0 , A0 A and AA0 are Hermitian.
(13.20)
Proof: From the above lemma, the Moore Penrose inverse satisfies 13.20. Suppose then
that A0 satisfies 13.20. It is necessary to verify that A0 = A+ . Recall that from the singular
value decomposition, there exist unitary matrices, U and V such that
\[ U^* A V = \Sigma \equiv \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}, \quad A = U \Sigma V^*. \]
Let
\[ V^* A_0 U = \begin{pmatrix} P & Q \\ R & S \end{pmatrix} \tag{13.21} \]
where P is k × k.
Next use the first equation of 13.20 to write
\[ \overbrace{U \Sigma V^*}^{A} \overbrace{V \begin{pmatrix} P & Q \\ R & S \end{pmatrix} U^*}^{A_0} \overbrace{U \Sigma V^*}^{A} = U \Sigma V^*. \]
Then multiplying both sides on the left by U ∗ and on the right by V,
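\[ \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} P & Q \\ R & S \end{pmatrix} \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]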
Now this requires
\[ \begin{pmatrix} \sigma P \sigma & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}. \tag{13.22} \]
Therefore, P = σ −1 . From the requirement that AA0 is Hermitian,
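\[ \overbrace{U \Sigma V^*}^{A} \overbrace{V \begin{pmatrix} P & Q \\ R & S \end{pmatrix} U^*}^{A_0} = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} P & Q \\ R & S \end{pmatrix} U^* \]
must be Hermitian. Therefore, it is necessary that
\[ \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} P & Q \\ R & S \end{pmatrix} = \begin{pmatrix} \sigma P & \sigma Q \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} I & \sigma Q \\ 0 & 0 \end{pmatrix} \]
is Hermitian. Then
\[ \begin{pmatrix} I & \sigma Q \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ Q^* \sigma & 0 \end{pmatrix} \]
Thus
\[ Q^* \sigma = 0 \]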
and so multiplying both sides on the right by σ −1 , it follows Q∗ = 0 and so Q = 0.
From the requirement that A0 A is Hermitian, it is necessary that
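\[ \overbrace{V \begin{pmatrix} P & Q \\ R & S \end{pmatrix} U^*}^{A_0} \overbrace{U \Sigma V^*}^{A} = V \begin{pmatrix} P \sigma & 0 \\ R \sigma & 0 \end{pmatrix} V^* = V \begin{pmatrix} I & 0 \\ R \sigma & 0 \end{pmatrix} V^* \]
is Hermitian. Therefore, also
\[ \begin{pmatrix} I & 0 \\ R \sigma & 0 \end{pmatrix} \]
is Hermitian. Thus R = 0 because this equals
\[ \begin{pmatrix} I & 0 \\ R \sigma & 0 \end{pmatrix}^* = \begin{pmatrix} I & \sigma^* R^* \\ 0 & 0 \end{pmatrix} \]
which requires Rσ = 0. Now multiply on right by σ −1 to find that R = 0.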
Use 13.21 and the second equation of 13.20 to write
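\[ \overbrace{V \begin{pmatrix} P & Q \\ R & S \end{pmatrix} U^*}^{A_0} \overbrace{U \Sigma V^*}^{A} \overbrace{V \begin{pmatrix} P & Q \\ R & S \end{pmatrix} U^*}^{A_0} = \overbrace{V \begin{pmatrix} P & Q \\ R & S \end{pmatrix} U^*}^{A_0}. \]
which implies
\[ \begin{pmatrix} P & Q \\ R & S \end{pmatrix} \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} P & Q \\ R & S \end{pmatrix} = \begin{pmatrix} P & Q \\ R & S \end{pmatrix}. \]
This yields from the above in which it was shown that R, Q are both 0
\[ \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & S \end{pmatrix} = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} \tag{13.23} \]
\[ = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & S \end{pmatrix}. \tag{13.24} \]
Therefore, S = 0 also and so
\[ V^* A_0 U \equiv \begin{pmatrix} P & Q \\ R & S \end{pmatrix} = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} \]
which says
\[ A_0 = V \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^* \equiv A^+. \]
The theorem is significant because there is no mention of eigenvalues or eigenvectors in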
the characterization of the Moore Penrose inverse given in 13.20. It also shows immediately
that the Moore Penrose inverse is a generalization of the usual inverse. See Problem 3.
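The four conditions in 13.20 are easy to test on a computer. Here is a minimal sketch, assuming NumPy; the helper name `satisfies_penrose` and the matrices A and M are illustrative assumptions, and `np.linalg.pinv` is NumPy's pseudoinverse routine.

```python
import numpy as np

def satisfies_penrose(A, A0):
    """Check the four conditions in 13.20 numerically."""
    def is_hermitian(M):
        return np.allclose(M, M.conj().T)
    return (np.allclose(A @ A0 @ A, A) and np.allclose(A0 @ A @ A0, A0)
            and is_hermitian(A0 @ A) and is_hermitian(A @ A0))

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])       # illustrative 2 x 3 matrix
A_plus = np.linalg.pinv(A)            # pseudoinverse computed by NumPy

M = np.array([[2.0, 1.0],
              [1.0, 1.0]])            # invertible, so its pseudoinverse is its inverse
```

The pair (A, A_plus) should pass the check, a perturbed candidate such as A_plus + 0.1 should fail it, and `np.linalg.pinv(M)` should agree with `np.linalg.inv(M)`.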
13.13
Exercises
1. Show $(A^*)^* = A$ and $(AB)^* = B^* A^*$.
2. Prove Corollary 13.3.8.
3. Show that if A is an n × n matrix which has an inverse then A+ = A−1 .
4. Using the singular value decomposition, show that for any square matrix A, it follows
that A∗ A is unitarily similar to AA∗ .
5. Let A, B be m × n matrices. Define an inner product on the set of m × n matrices
by
(A, B)F ≡ trace (AB ∗ ) .
Show this is an inner product satisfying all the inner product axioms. Recall for M an
n × n matrix, $\operatorname{trace}(M) \equiv \sum_{i=1}^n M_{ii}$. The resulting norm, ||·||F is called the Frobenius
norm and it can be used to measure the distance between two matrices.
6. Let A be an m × n matrix. Show $\|A\|_F^2 \equiv (A, A)_F = \sum_j \sigma_j^2$ where the σ j are the
singular values of A.
7. If A is a general n × n matrix having possibly repeated eigenvalues, show there is a
sequence {Ak } of n × n matrices having distinct eigenvalues which has the property
that the ij th entry of Ak converges to the ij th entry of A for all ij. Hint: Use Schur’s
theorem.
8. Prove the Cayley Hamilton theorem as follows. First suppose A has a basis of eigenvectors
$\{v_k\}_{k=1}^n$, Avk = λk vk . Let p (λ) be the characteristic polynomial. Show
p (A) vk = p (λk ) vk = 0. Then since {vk } is a basis, it follows p (A) x = 0 for all
x and so p (A) = 0. Next in the general case, use Problem 7 to obtain a sequence {Ak }
of matrices whose entries converge to the entries of A such that Ak has n distinct
eigenvalues and therefore by Theorem 7.1.7 Ak has a basis of eigenvectors. Therefore, from the first part and for pk (λ) the characteristic polynomial for Ak , it follows
pk (Ak ) = 0. Now explain why and the sense in which limk→∞ pk (Ak ) = p (A) .
9. Prove that Theorem 13.4.4 and Corollary 13.4.5 can be strengthened so that the
condition on the Ak is necessary as well as sufficient. Hint: Consider vectors of the
form $\begin{pmatrix} x \\ 0 \end{pmatrix}$ where x ∈ Fk .
10. Show directly that if A is an n × n matrix and A = A∗ (A is Hermitian) then all the
eigenvalues are real and eigenvectors can be assumed to be real and that eigenvectors
associated with distinct eigenvalues are orthogonal, (their inner product is zero).
11. Let v1 , · · · , vn be an orthonormal basis for Fn . Let Q be a matrix whose ith column
is vi . Show
Q∗ Q = QQ∗ = I.
12. Show that an n × n matrix Q is unitary if and only if it preserves distances. This
means |Qv| = |v| . This was done in the text but you should try to do it for yourself.
13. Suppose {v1 , · · · , vn } and {w1 , · · · , wn } are two orthonormal bases for Fn and suppose Q is an n × n matrix satisfying Qvi = wi . Then show Q is unitary. If |v| = 1,
show there is a unitary transformation which maps v to e1 .
14. Finish the proof of Theorem 13.7.5.
15. Let A be a Hermitian matrix so A = A∗ and suppose all eigenvalues of A are larger
than δ 2 . Show
\[ (Av, v) \geq \delta^2 |v|^2 \]
where here, the inner product is $(v, u) \equiv \sum_{j=1}^n v_j \overline{u_j}$.
16. Suppose A + A∗ has all negative eigenvalues. Then show that the eigenvalues of A
have all negative real parts.
17. The discrete Fourier transform maps Cn → Cn as follows.
n−1
1 ∑ −i 2π jk
F (x) = z where zk = √
e n xj .
n j=0
Show that F −1 exists and is given by the formula
n−1
1 ∑ i 2π jk
e n zk
F −1 (z) = x where xj = √
n j=0
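Here is one way to approach this problem. Note z = U x where
$$U = \frac{1}{\sqrt{n}} \begin{pmatrix}
e^{-i\frac{2\pi}{n}0\cdot 0} & e^{-i\frac{2\pi}{n}1\cdot 0} & e^{-i\frac{2\pi}{n}2\cdot 0} & \cdots & e^{-i\frac{2\pi}{n}(n-1)\cdot 0} \\
e^{-i\frac{2\pi}{n}0\cdot 1} & e^{-i\frac{2\pi}{n}1\cdot 1} & e^{-i\frac{2\pi}{n}2\cdot 1} & \cdots & e^{-i\frac{2\pi}{n}(n-1)\cdot 1} \\
e^{-i\frac{2\pi}{n}0\cdot 2} & e^{-i\frac{2\pi}{n}1\cdot 2} & e^{-i\frac{2\pi}{n}2\cdot 2} & \cdots & e^{-i\frac{2\pi}{n}(n-1)\cdot 2} \\
\vdots & \vdots & \vdots & & \vdots \\
e^{-i\frac{2\pi}{n}0\cdot (n-1)} & e^{-i\frac{2\pi}{n}1\cdot (n-1)} & e^{-i\frac{2\pi}{n}2\cdot (n-1)} & \cdots & e^{-i\frac{2\pi}{n}(n-1)\cdot (n-1)}
\end{pmatrix}$$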
Now argue U is unitary and use this to establish the result. To show this verify
each row has length 1 and the inner product of two different rows gives 0. Now
$U_{kj} = \frac{1}{\sqrt{n}} e^{-i\frac{2\pi}{n}jk}$ and so $(U^*)_{kj} = \frac{1}{\sqrt{n}} e^{i\frac{2\pi}{n}jk}$.
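The claim that U is unitary can be checked numerically before attempting the proof. The following is only a sketch using the Python standard library; the helper names (dft_matrix, conjugate_transpose, matmul, max_deviation_from_identity) and the choice n = 8 are illustrative assumptions, not part of the text.

```python
import cmath
import math

def dft_matrix(n):
    """Return the n x n matrix U with U[k][j] = exp(-2*pi*i*j*k/n) / sqrt(n)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)]
            for k in range(n)]

def conjugate_transpose(m):
    """Return the conjugate transpose (adjoint) of a matrix stored as nested lists."""
    rows, cols = len(m), len(m[0])
    return [[m[r][c].conjugate() for r in range(rows)] for c in range(cols)]

def matmul(a, b):
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def max_deviation_from_identity(m):
    """Largest absolute entry of m - I."""
    n = len(m)
    return max(abs(m[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

n = 8  # illustrative size
U = dft_matrix(n)
deviation = max_deviation_from_identity(matmul(U, conjugate_transpose(U)))
print(deviation)  # should be at the level of rounding error
```

A numerical check of this kind does not replace the argument asked for in the exercise; it only confirms that the rows are orthonormal for one value of n.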
18. Let f be a periodic function having period 2π. The Fourier series of f is an expression
of the form
$$\sum_{k=-\infty}^{\infty} c_k e^{ikx} \equiv \lim_{n\to\infty} \sum_{k=-n}^{n} c_k e^{ikx}$$
and the idea is to find $c_k$ such that the above sequence converges in some way to f. If
$$f(x) = \sum_{k=-\infty}^{\infty} c_k e^{ikx}$$
and you formally multiply both sides by $e^{-imx}$ and then integrate from 0 to 2π,
interchanging the integral with the sum without any concern for whether this makes
sense, show it is reasonable from this to expect
$$c_m = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-imx}\,dx.$$
Now suppose you only know f (x) at equally spaced points 2πj/n for j = 0, 1, · · · , n.
Consider the Riemann sum for this integral obtained from using the left endpoint of
the subintervals determined from the partition $\left\{ \frac{2\pi j}{n} \right\}_{j=0}^{n}$. How does this compare with
the discrete Fourier transform? What happens as n → ∞ to this approximation?
19. Suppose A is a real 3 × 3 orthogonal matrix (Recall this means AAT = AT A = I. )
having determinant 1. Show it must have an eigenvalue equal to 1. Note this shows
there exists a vector x ̸= 0 such that Ax = x. Hint: Show first or recall that any
orthogonal matrix must preserve lengths. That is, |Ax| = |x| .
20. Let A be a complex m × n matrix. Using the description of the Moore Penrose inverse
in terms of the singular value decomposition, show that
$$\lim_{\delta \to 0+} (A^*A + \delta I)^{-1} A^* = A^+$$
where the convergence happens in the Frobenius norm. Also verify, using the singular
value decomposition, that the inverse exists in the above formula. Observe that this
shows that the Moore Penrose inverse is unique.
21. Show that $A^+ = (A^*A)^+ A^*$. Hint: You might use the description of $A^+$ in terms of
the singular value decomposition.
22. In Theorem 13.6.1, show that every matrix which commutes with A also commutes
with $A^{1/k}$, the unique nonnegative self adjoint kth root.
23. Let X be a finite dimensional inner product space and let β = {u1 , · · · , un } be an
orthonormal basis for X. Let A ∈ L (X, X) be self adjoint and nonnegative and
let M be its matrix with respect to the given orthonormal basis. Show that M is
nonnegative, self adjoint also. Use this to show that A has a unique nonnegative self
adjoint k th root.
24. Let A be a complex m × n matrix having singular value decomposition
$U^*AV = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}$ as explained above, where σ is k × k. Show that
$$\ker(A) = \operatorname{span}(Ve_{k+1}, \cdots, Ve_n),$$
the last n − k columns of V .
25. The principal submatrices of an n × n matrix A are $A_k$ where $A_k$ consists of those
entries which are in the first k rows and first k columns of A. Suppose A is a real
symmetric matrix and that x → ⟨Ax, x⟩ is positive definite. This means that if x ≠ 0,
then ⟨Ax, x⟩ > 0. Show that each of the principal submatrices is positive definite.
Hint: Consider $\begin{pmatrix} x^T & 0 \end{pmatrix} A \begin{pmatrix} x \\ 0 \end{pmatrix}$ where x consists of k entries.
26. ↑Show that if A is a symmetric positive definite n × n real matrix, then A has an LU
factorization with the property that each entry on the main diagonal in U is positive.
Hint: This is pretty clear if A is 1×1. Assume true for (n − 1) × (n − 1). Then
$$A = \begin{pmatrix} \hat{A} & a \\ a^t & a_{nn} \end{pmatrix}$$
Then as above, $\hat{A}$ is positive definite. Thus it has an LU factorization with all positive
entries on the diagonal of U . Notice that, using block multiplication,
$$A = \begin{pmatrix} LU & a \\ a^t & a_{nn} \end{pmatrix} = \begin{pmatrix} L & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} U & L^{-1}a \\ a^t & a_{nn} \end{pmatrix}$$
Now consider that matrix on the right. Argue that it is of the form $\tilde{L}\tilde{U}$ where $\tilde{U}$
has all positive diagonal entries except possibly for the one in the nth row and nth
column. Now explain why det (A) > 0 and argue that in fact all diagonal entries of $\tilde{U}$
are positive.
27. ↑Let A be a real symmetric n × n matrix and A = LU where L has all ones down the
diagonal and U has all positive entries down the main diagonal. Show that A = LDH
where L is lower triangular and H is upper triangular, each having all ones down the
diagonal and D a diagonal matrix having all positive entries down the main diagonal.
In fact, these are the diagonal entries of U .
28. ↑Show that if $L, L_1$ are lower triangular with ones down the main diagonal and $H, H_1$
are upper triangular with all ones down the main diagonal and $D, D_1$ are diagonal
matrices having all positive diagonal entries, and if $LDH = L_1D_1H_1$, then $L = L_1$,
$H = H_1$, $D = D_1$. Hint: Explain why $D_1^{-1}L_1^{-1}LD = H_1H^{-1}$. Then explain
why the right side is upper triangular and the left side is lower triangular. Conclude
these are both diagonal matrices. However, there are all ones down the diagonal in
the expression on the right. Hence $H = H_1$. Do something similar to conclude that
$L = L_1$ and then that $D = D_1$.
29. ↑Show that if A is a symmetric real matrix such that x → ⟨Ax, x⟩ is positive definite,
then there exists a lower triangular matrix L having all positive entries down the
diagonal such that A = LLt . Hint: From the above, A = LDH where L, H are
respectively lower and upper triangular having all ones down the diagonal and D is a
diagonal matrix having all positive entries. Then argue from the above problem and
symmetry of A that H = Lt . Now modify L by making it equal to LD1/2 . This is
called the Cholesky factorization.
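The chain of Problems 26 through 29 can be traced on a small example. What follows is a sketch under stated assumptions: the 3 × 3 matrix is an arbitrary symmetric positive definite example, the function names are hypothetical, and the LU factorization is done without row exchanges, which Problem 26 says is possible for such matrices.

```python
import math

def lu_no_pivot(a):
    """Doolittle LU factorization without row exchanges: a = l u, l has unit diagonal."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [row[:] for row in a]
    for col in range(n):
        for row in range(col + 1, n):
            factor = u[row][col] / u[col][col]
            l[row][col] = factor
            for k in range(col, n):
                u[row][k] -= factor * u[col][k]
    return l, u

def cholesky_via_ldh(a):
    """Form L D^(1/2), where D is the diagonal of U in a = L U (Problems 27 and 29)."""
    l, u = lu_no_pivot(a)
    n = len(a)
    # Scaling column j of L by sqrt(D_jj) is the same as multiplying by D^(1/2) on the right.
    return [[l[i][j] * math.sqrt(u[j][j]) for j in range(n)] for i in range(n)]

def times_transpose(c):
    """Return c c^t for a square matrix c."""
    n = len(c)
    return [[sum(c[i][k] * c[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Illustrative symmetric positive definite matrix (leading minors 4, 16, 64).
A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
C = cholesky_via_ldh(A)
print(C)                   # lower triangular with positive diagonal
print(times_transpose(C))  # should reproduce A
```

For this matrix the diagonal of U is (4, 4, 4), so the factor is L scaled by 2.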
Chapter 14
Norms
In this chapter, X and Y are finite dimensional vector spaces which have a norm. The
following is a definition.
Definition 14.0.1 A linear space X is a normed linear space if there is a norm defined on
X, ||·|| satisfying
||x|| ≥ 0, ||x|| = 0 if and only if x = 0,
||x + y|| ≤ ||x|| + ||y|| ,
||cx|| = |c| ||x||
whenever c is a scalar. A set, U ⊆ X, a normed linear space is open if for every p ∈ U,
there exists δ > 0 such that
B (p, δ) ≡ {x : ||x − p|| < δ} ⊆ U.
Thus, a set is open if every point of the set is an interior point. Also, limn→∞ xn = x means
limn→∞ ∥xn − x∥ = 0. This is written sometimes as xn → x.
Note first that
∥x∥ = ∥x − y + y∥ ≤ ∥x − y∥ + ∥y∥
so
∥x∥ − ∥y∥ ≤ ∥x − y∥ .
Similarly
∥y∥ − ∥x∥ ≤ ∥x − y∥
and so
|∥x∥ − ∥y∥| ≤ ∥x − y∥ .
(14.1)
To begin with recall the Cauchy Schwarz inequality which is stated here for convenience
in terms of the inner product space, Cn .
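Theorem 14.0.2 The following inequality holds for $a_i$ and $b_i \in \mathbb{C}$.
$$\left| \sum_{i=1}^{n} a_i \overline{b_i} \right| \leq \left( \sum_{i=1}^{n} |a_i|^2 \right)^{1/2} \left( \sum_{i=1}^{n} |b_i|^2 \right)^{1/2}. \tag{14.2}$$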
Let X be a finite dimensional normed linear space with norm ||·|| where the field of
scalars is denoted by F and is understood to be either R or C. Let {v1 ,· · · , vn } be a basis
for X. If x ∈ X, denote by $x_i$ the ith component of x with respect to this basis. Thus
$$x = \sum_{i=1}^{n} x_i v_i.$$
Definition 14.0.3 For x ∈ X and {v1 , · · · , vn } a basis, define a new norm by
$$|x| \equiv \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}$$
where
$$x = \sum_{i=1}^{n} x_i v_i.$$
Similarly, for y ∈ Y with basis {w1 , · · · , wm }, and $y_i$ its components with respect to this
basis,
$$|y| \equiv \left( \sum_{i=1}^{m} |y_i|^2 \right)^{1/2}.$$
For A ∈ L (X, Y ) , the space of linear mappings from X to Y,
$$\|A\| \equiv \sup\{|Ax| : |x| \leq 1\}. \tag{14.3}$$
The first thing to show is that the two norms, ||·|| and |·| , are equivalent. This means
the conclusion of the following theorem holds.
Theorem 14.0.4 Let (X, ||·||) be a finite dimensional normed linear space and let |·| be
described above relative to a given basis, {v1 , · · · , vn } . Then |·| is a norm and there exist
constants δ, ∆ > 0 independent of x such that
δ ||x|| ≤ |x| ≤∆ ||x|| .
(14.4)
Proof: All of the above properties of a norm are obvious except the second, the triangle
inequality. To establish this inequality, use the Cauchy Schwarz inequality to write
$$|x + y|^2 \equiv \sum_{i=1}^{n} |x_i + y_i|^2 \leq \sum_{i=1}^{n} |x_i|^2 + \sum_{i=1}^{n} |y_i|^2 + 2 \operatorname{Re} \sum_{i=1}^{n} x_i \overline{y_i}$$
$$\leq |x|^2 + |y|^2 + 2 \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2} \left( \sum_{i=1}^{n} |y_i|^2 \right)^{1/2}$$
$$= |x|^2 + |y|^2 + 2|x||y| = (|x| + |y|)^2$$
and this proves the second property above.
It remains to show the equivalence of the two norms. By the Cauchy Schwarz inequality
again,
$$\|x\| \equiv \left\| \sum_{i=1}^{n} x_i v_i \right\| \leq \sum_{i=1}^{n} |x_i| \, \|v_i\| \leq |x| \left( \sum_{i=1}^{n} \|v_i\|^2 \right)^{1/2} \equiv \delta^{-1} |x|.$$
This proves the first half of the inequality.
Suppose the second half of the inequality is not valid. Then there exists a sequence
$x^k \in X$ such that
$$|x^k| > k \, \|x^k\|, \quad k = 1, 2, \cdots.$$
Then define
$$y^k \equiv \frac{x^k}{|x^k|}.$$
It follows
$$|y^k| = 1, \quad |y^k| > k \, \|y^k\|. \tag{14.5}$$
Letting $y_i^k$ be the components of $y^k$ with respect to the given basis, it follows the vector
$$(y_1^k, \cdots, y_n^k)$$
is a unit vector in $\mathbb{F}^n$. By the Heine Borel theorem, there exists a subsequence, still denoted
by k such that
$$(y_1^k, \cdots, y_n^k) \to (y_1, \cdots, y_n).$$
It follows from 14.5 and this that for
$$y = \sum_{i=1}^{n} y_i v_i,$$
$$0 = \lim_{k\to\infty} \|y^k\| = \lim_{k\to\infty} \left\| \sum_{i=1}^{n} y_i^k v_i \right\| = \left\| \sum_{i=1}^{n} y_i v_i \right\|$$
but not all the $y_i$ equal zero. The last equation follows easily from 14.1 and
$$\left| \, \left\| \sum_{i=1}^{n} y_i^k v_i \right\| - \left\| \sum_{i=1}^{n} y_i v_i \right\| \, \right| \leq \left\| \sum_{i=1}^{n} (y_i^k - y_i) v_i \right\| \leq \sum_{i=1}^{n} |y_i^k - y_i| \, \|v_i\|.$$
This contradicts the assumption that {v1 , · · · , vn } is a basis and proves the second half of
the inequality.
Definition 14.0.5 Let (X, ||·||) be a normed linear space and let $\{x_n\}_{n=1}^{\infty}$ be a sequence of
vectors. Then this is called a Cauchy sequence if for all ε > 0 there exists N such that if
m, n ≥ N, then
||xn − xm || < ε.
This is written more briefly as
$$\lim_{m,n\to\infty} \|x_n - x_m\| = 0.$$
Definition 14.0.6 A normed linear space, (X, ||·||) is called a Banach space if it is complete. This means that whenever {xn } is a Cauchy sequence, there exists a unique x ∈ X
such that limn→∞ ||x − xn || = 0.
Corollary 14.0.7 If (X, ||·||) is a finite dimensional normed linear space with the field of
scalars F = C or R, then (X, ||·||) is a Banach space.
Proof: Let $\{x^k\}$ be a Cauchy sequence. Then letting the components of $x^k$ with respect
to the given basis be
$$x_1^k, \cdots, x_n^k,$$
it follows from Theorem 14.0.4, that
$$(x_1^k, \cdots, x_n^k)$$
is a Cauchy sequence in $\mathbb{F}^n$ and so
$$(x_1^k, \cdots, x_n^k) \to (x_1, \cdots, x_n) \in \mathbb{F}^n.$$
Thus, letting $x = \sum_{i=1}^{n} x_i v_i$, it follows from the equivalence of the two norms shown above
that
$$\lim_{k\to\infty} |x^k - x| = \lim_{k\to\infty} \|x^k - x\| = 0.$$
Corollary 14.0.8 Suppose X is a finite dimensional linear space with the field of scalars
either C or R and ||·|| and |||·||| are two norms on X. Then there exist positive constants, δ
and ∆, independent of x ∈ X such that
δ |||x||| ≤ ||x|| ≤ ∆ |||x||| .
Thus any two norms are equivalent.
This is very important because it shows that all questions of convergence can be considered relative to any norm with the same outcome.
Proof: Let {v1 , · · · , vn } be a basis for X and let |·| be the norm taken with respect to
this basis which was described earlier. Then by Theorem 14.0.4, there are positive constants
δ 1 , ∆1 , δ 2 , ∆2 , all independent of x ∈X such that
δ 2 |||x||| ≤ |x| ≤ ∆2 |||x||| , δ 1 ||x|| ≤ |x| ≤ ∆1 ||x|| .
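Then
$$\delta_2 |||x||| \leq |x| \leq \Delta_1 \|x\| \leq \frac{\Delta_1}{\delta_1} |x| \leq \frac{\Delta_1 \Delta_2}{\delta_1} |||x|||$$
and so
$$\frac{\delta_2}{\Delta_1} |||x||| \leq \|x\| \leq \frac{\Delta_2}{\delta_1} |||x|||.$$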
Definition 14.0.9 Let X and Y be normed linear spaces with norms ||·||X and ||·||Y respectively. Then L (X, Y ) denotes the space of linear transformations, called bounded linear
transformations, mapping X to Y which have the property that
||A|| ≡ sup {||Ax||Y : ||x||X ≤ 1} < ∞.
Then ||A|| is referred to as the operator norm of the bounded linear transformation A.
It is an easy exercise to verify that ||·|| is a norm on L (X, Y ) and it is always the case
that
||Ax||Y ≤ ||A|| ||x||X .
Furthermore, you should verify that you can replace ≤ 1 with = 1 in the definition. Thus
||A|| ≡ sup {||Ax||Y : ||x||X = 1} .
Theorem 14.0.10 Let X and Y be finite dimensional normed linear spaces of dimension
n and m respectively and denote by ||·|| the norm on either X or Y . Then if A is any linear
function mapping X to Y, then A ∈ L (X, Y ) and (L (X, Y ) , ||·||) is a complete normed
linear space of dimension nm with
||Ax|| ≤ ||A|| ||x|| .
Proof: It is necessary to show the norm defined on linear transformations really is a
norm. Again the first and third properties listed above for norms are obvious. It remains to
show the second and verify ||A|| < ∞. Letting {v1 , · · · , vn } be a basis and |·| defined with
respect to this basis as above, there exist constants δ, ∆ > 0 such that
δ ||x|| ≤ |x| ≤ ∆ ||x|| .
Then,
||A + B|| ≡ sup{||(A + B) (x)|| : ||x|| ≤ 1}
≤ sup{||Ax|| : ||x|| ≤ 1} + sup{||Bx|| : ||x|| ≤ 1} ≡ ||A|| + ||B|| .
Next consider the claim that ||A|| < ∞. This follows from
$$\|A(x)\| = \left\| A \left( \sum_{i=1}^{n} x_i v_i \right) \right\| \leq \sum_{i=1}^{n} |x_i| \, \|A(v_i)\| \leq |x| \left( \sum_{i=1}^{n} \|A(v_i)\|^2 \right)^{1/2} \leq \Delta \|x\| \left( \sum_{i=1}^{n} \|A(v_i)\|^2 \right)^{1/2} < \infty.$$
Thus $\|A\| \leq \Delta \left( \sum_{i=1}^{n} \|A(v_i)\|^2 \right)^{1/2}$.
Next consider the assertion about the dimension of L (X, Y ) . It follows from Theorem
9.2.3. By Corollary 14.0.7 (L (X, Y ) , ||·||) is complete. If x ≠ 0,
$$\|Ax\| \frac{1}{\|x\|} = \left\| A \frac{x}{\|x\|} \right\| \leq \|A\|.$$
Note by Corollary 14.0.8 you can define a norm any way desired on any finite dimensional
linear space which has the field of scalars R or C and any other way of defining a norm on
this space yields an equivalent norm. Thus, it doesn’t much matter as far as notions of
convergence are concerned which norm is used for a finite dimensional space. In particular
in the space of m × n matrices, you can use the operator norm defined above, or some
other way of giving this space a norm. A popular choice for a norm is the Frobenius norm
discussed earlier but reviewed here.
Definition 14.0.11 Make the space of m×n matrices into an inner product space by defining
(A, B) ≡ trace (AB ∗ ) .
Another way of describing a norm for an m × n matrix is as follows.
Definition 14.0.12 Let A be an m × n matrix. Define the spectral norm of A, written as
$\|A\|_2$ to be
$$\max \left\{ \lambda^{1/2} : \lambda \text{ is an eigenvalue of } A^*A \right\}.$$
That is, the largest singular value of A. (Note the eigenvalues of A∗ A are all nonnegative because
if A∗ Ax = λx, then
$$\lambda (x, x) = (A^*Ax, x) = (Ax, Ax) \geq 0.)$$
Actually, this is nothing new. It turns out that $\|\cdot\|_2$ is nothing more than the operator
norm for A taken with respect to the usual Euclidean norm,
$$|x| = \left( \sum_{k=1}^{n} |x_k|^2 \right)^{1/2}.$$
Proposition 14.0.13 The following holds.
||A||2 = sup {|Ax| : |x| = 1} ≡ ||A|| .
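Proof: Note that A∗ A is Hermitian and so by Corollary 13.3.4,
$$\|A\|_2 = \max \left\{ (A^*Ax, x)^{1/2} : |x| = 1 \right\} = \max \left\{ (Ax, Ax)^{1/2} : |x| = 1 \right\} = \max \{ |Ax| : |x| = 1 \} = \|A\|.$$
Here is another proof of this proposition. Recall there are unitary matrices of the right
size U, V such that $A = U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^*$ where the matrix on the inside is as described
in the section on the singular value decomposition. Then since unitary matrices preserve
norms,
$$\|A\| = \sup_{|x| \leq 1} \left| U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^* x \right| = \sup_{|V^*x| \leq 1} \left| U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} V^* x \right|$$
$$= \sup_{|y| \leq 1} \left| U \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} y \right| = \sup_{|y| \leq 1} \left| \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} y \right| = \sigma_1 \equiv \|A\|_2$$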
This completes the alternate proof.
From now on, ||A||2 will mean either the operator norm of A taken with respect to the
usual Euclidean norm or the largest singular value of A, whichever is most convenient.
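To see the two descriptions of $\|A\|_2$ agree in a concrete case, the sketch below compares the largest singular value of a real 2 × 2 matrix, computed from the eigenvalues of $A^TA$ in closed form, with a sampled approximation of sup{|Ax| : |x| = 1}. The matrix, the sample count, and the function names are illustrative assumptions.

```python
import math

def spectral_norm_2x2(a):
    """Largest singular value of a real 2x2 matrix via the eigenvalues of A^T A."""
    (p, q), (r, s) = a
    # Entries of the symmetric matrix A^T A = [[e, f], [f, g]].
    e = p * p + r * r
    f = p * q + r * s
    g = q * q + s * s
    # Larger root of the characteristic polynomial of A^T A.
    largest = (e + g) / 2 + math.sqrt(((e - g) / 2) ** 2 + f * f)
    return math.sqrt(largest)

def sampled_operator_norm_2x2(a, samples=20000):
    """Approximate sup{|Ax| : |x| = 1} by sampling unit vectors on the circle."""
    (p, q), (r, s) = a
    best = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        x, y = math.cos(t), math.sin(t)
        best = max(best, math.hypot(p * x + q * y, r * x + s * y))
    return best

A = [[3.0, 0.0], [4.0, 5.0]]            # here A^T A = [[25, 20], [20, 25]]
exact = spectral_norm_2x2(A)            # square root of the largest eigenvalue of A^T A
sampled = sampled_operator_norm_2x2(A)  # approximate sup of |Ax| over |x| = 1
print(exact, sampled)
```

The sampled value can never exceed the exact one, and the gap shrinks as the number of sample points grows.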
An interesting application of the notion of equivalent norms on Rn is the process of
giving a norm on a finite Cartesian product of normed linear spaces.
Definition 14.0.14 Let $X_i$ , i = 1, · · · , n be normed linear spaces with norms, $\|\cdot\|_i$ . For
$$x \equiv (x_1, \cdots, x_n) \in \prod_{i=1}^{n} X_i$$
define $\theta : \prod_{i=1}^{n} X_i \to \mathbb{R}^n$ by
$$\theta(x) \equiv (\|x_1\|_1, \cdots, \|x_n\|_n).$$
Then if ||·|| is any norm on $\mathbb{R}^n$ , define a norm on $\prod_{i=1}^{n} X_i$ , also denoted by ||·|| by
$$\|x\| \equiv \|\theta x\|.$$
The following theorem follows immediately from Corollary 14.0.8.
Theorem 14.0.15 Let $X_i$ and $\|\cdot\|_i$ be given in the above definition and consider the norms
on $\prod_{i=1}^{n} X_i$ described there in terms of norms on $\mathbb{R}^n$ . Then any two of these norms on
$\prod_{i=1}^{n} X_i$ obtained in this way are equivalent.
For example, define
$$\|x\|_1 \equiv \sum_{i=1}^{n} |x_i|,$$
$$\|x\|_\infty \equiv \max \{ |x_i| , i = 1, \cdots, n \},$$
or
$$\|x\|_2 = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}$$
and all three are equivalent norms on $\prod_{i=1}^{n} X_i$ .
14.1 The p Norms
In addition to ||·||1 and ||·||∞ mentioned above, it is common to consider the so called p
norms for x ∈ Cn .
Definition 14.1.1 Let $x \in \mathbb{C}^n$ . Then define for p ≥ 1,
$$\|x\|_p \equiv \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$
The following inequality is called Holder’s inequality.
Proposition 14.1.2 For $x, y \in \mathbb{C}^n$ ,
$$\sum_{i=1}^{n} |x_i| \, |y_i| \leq \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \left( \sum_{i=1}^{n} |y_i|^{p'} \right)^{1/p'}$$
The proof will depend on the following lemma.
Lemma 14.1.3 If a, b ≥ 0 and p′ is defined by $\frac{1}{p} + \frac{1}{p'} = 1$, then
$$ab \leq \frac{a^p}{p} + \frac{b^{p'}}{p'}.$$
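Proof of the Proposition: If x or y equals the zero vector there is nothing to
prove. Therefore, assume they are both nonzero. Let $A = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$ and
$B = \left( \sum_{i=1}^{n} |y_i|^{p'} \right)^{1/p'}$. Then using Lemma 14.1.3,
$$\sum_{i=1}^{n} \frac{|x_i|}{A} \frac{|y_i|}{B} \leq \sum_{i=1}^{n} \left[ \frac{1}{p} \left( \frac{|x_i|}{A} \right)^p + \frac{1}{p'} \left( \frac{|y_i|}{B} \right)^{p'} \right]$$
$$= \frac{1}{p} \frac{1}{A^p} \sum_{i=1}^{n} |x_i|^p + \frac{1}{p'} \frac{1}{B^{p'}} \sum_{i=1}^{n} |y_i|^{p'} = \frac{1}{p} + \frac{1}{p'} = 1$$
and so
$$\sum_{i=1}^{n} |x_i| \, |y_i| \leq AB = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \left( \sum_{i=1}^{n} |y_i|^{p'} \right)^{1/p'}.$$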
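Holder's inequality is easy to test on sample data. The sketch below is only a numerical illustration: the vectors x and y, the exponents tried, and the function names p_norm and holder_sides are arbitrary choices, and the conjugate exponent is computed from 1/p + 1/p′ = 1.

```python
def p_norm(x, p):
    """The p norm of a vector of complex numbers, for p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def holder_sides(x, y, p):
    """Return both sides of Holder's inequality for exponent p > 1."""
    p_conj = p / (p - 1.0)  # conjugate exponent, so 1/p + 1/p_conj = 1
    left = sum(abs(a) * abs(b) for a, b in zip(x, y))
    right = p_norm(x, p) * p_norm(y, p_conj)
    return left, right

# Illustrative vectors in C^4.
x = [1 + 2j, -3.0, 0.5j, 2.0]
y = [0.5, 1 - 1j, 4.0, -2j]
for p in (1.5, 2.0, 3.0):
    left, right = holder_sides(x, y, p)
    print(p, left, right, left <= right)
```

With p = 2 the inequality reduces to the Cauchy Schwarz inequality 14.2 applied to the absolute values of the entries.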
Theorem 14.1.4 The p norms do indeed satisfy the axioms of a norm.
Proof: It is obvious that ||·||p does indeed satisfy most of the norm axioms. The only
one that is not clear is the triangle inequality. To save notation write ||·|| in place of ||·||p
in what follows. Note also that $\frac{p}{p'} = p - 1$. Then using the Holder inequality,
$$\|x + y\|^p = \sum_{i=1}^{n} |x_i + y_i|^p \leq \sum_{i=1}^{n} |x_i + y_i|^{p-1} |x_i| + \sum_{i=1}^{n} |x_i + y_i|^{p-1} |y_i|$$
$$= \sum_{i=1}^{n} |x_i + y_i|^{p/p'} |x_i| + \sum_{i=1}^{n} |x_i + y_i|^{p/p'} |y_i|$$
$$\leq \left( \sum_{i=1}^{n} |x_i + y_i|^p \right)^{1/p'} \left[ \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^{n} |y_i|^p \right)^{1/p} \right]$$
$$= \|x + y\|^{p/p'} \left( \|x\|_p + \|y\|_p \right)$$
so dividing by $\|x + y\|^{p/p'}$, it follows
$$\|x + y\|^p \, \|x + y\|^{-p/p'} = \|x + y\| \leq \|x\|_p + \|y\|_p$$
$\left( p - \frac{p}{p'} = p \left( 1 - \frac{1}{p'} \right) = p \frac{1}{p} = 1. \right)$
It only remains to prove Lemma 14.1.3.
Proof of the lemma: Let p′ = q to save on notation and consider the following picture:
[Figure: the curve $x = t^{p-1}$, equivalently $t = x^{q-1}$, in the (t, x) plane, with a marked on the t axis and b marked on the x axis.]
$$ab \leq \int_0^a t^{p-1} \, dt + \int_0^b x^{q-1} \, dx = \frac{a^p}{p} + \frac{b^q}{q}.$$
Note equality occurs when $a^p = b^q$.
Alternate proof of the lemma: Let
$$f(t) \equiv \frac{1}{p} (at)^p + \frac{1}{q} \left( \frac{b}{t} \right)^q, \quad t > 0$$
You see right away it is decreasing for a while, having an asymptote at t = 0 and then
reaches a minimum and increases from then on. Take its derivative.
$$f'(t) = (at)^{p-1} a + \left( \frac{b}{t} \right)^{q-1} \left( \frac{-b}{t^2} \right)$$
Set it equal to 0. This happens when
$$t^{p+q} = \frac{b^q}{a^p}. \tag{14.6}$$
Thus
$$t = \frac{b^{q/(p+q)}}{a^{p/(p+q)}}$$
and so at this value of t,
$$at = (ab)^{q/(p+q)}, \quad \frac{b}{t} = (ab)^{p/(p+q)}.$$
Thus the minimum of f is
$$\frac{1}{p} \left( (ab)^{q/(p+q)} \right)^p + \frac{1}{q} \left( (ab)^{p/(p+q)} \right)^q = (ab)^{pq/(p+q)}$$
but recall 1/p + 1/q = 1 and so pq/ (p + q) = 1. Thus the minimum value of f is ab. Letting
t = 1, this shows
$$ab \leq \frac{a^p}{p} + \frac{b^q}{q}.$$
Note that equality occurs when the minimum value happens for t = 1 and this indicates
from 14.6 that $a^p = b^q$.
Now $\|A\|_p$ may be considered as the operator norm of A taken with respect to $\|\cdot\|_p$ . In
the case when p = 2, this is just the spectral norm. There is an easy estimate for ||A||p in
terms of the entries of A.
Theorem 14.1.5 The following holds.
$$\|A\|_p \leq \left( \sum_k \left( \sum_j |A_{jk}|^p \right)^{q/p} \right)^{1/q}$$
Proof: Let $\|x\|_p \leq 1$ and let A = (a1 , · · · , an ) where the $a_k$ are the columns of A. Then
$$Ax = \sum_k x_k a_k$$
and so by Holder’s inequality,
$$\|Ax\|_p \equiv \left\| \sum_k x_k a_k \right\|_p \leq \sum_k |x_k| \, \|a_k\|_p \leq \left( \sum_k |x_k|^p \right)^{1/p} \left( \sum_k \|a_k\|_p^q \right)^{1/q} \leq \left( \sum_k \left( \sum_j |A_{jk}|^p \right)^{q/p} \right)^{1/q}$$
14.2 The Condition Number
Let A ∈ L (X, X) be a linear transformation where X is a finite dimensional vector space
and consider the problem Ax = b where it is assumed there is a unique solution to this
problem. How does the solution change if A is changed a little bit and if b is changed a
little bit? This is clearly an interesting question because you often do not know A and b
exactly. If a small change in these quantities results in a large change in the solution, x,
then it seems clear this would be undesirable. In what follows ||·|| when applied to a linear
transformation will always refer to the operator norm.
Lemma 14.2.1 Let A, B ∈ L (X, X) where X is a normed vector space as above. Then for
||·|| denoting the operator norm,
||AB|| ≤ ||A|| ||B|| .
Proof: This follows from the definition. Letting ||x|| ≤ 1, it follows from Theorem
14.0.10
||ABx|| ≤ ||A|| ||Bx|| ≤ ||A|| ||B|| ||x|| ≤ ||A|| ||B||
and so
$$\|AB\| \equiv \sup_{\|x\| \leq 1} \|ABx\| \leq \|A\| \, \|B\|.$$
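Lemma 14.2.2 Let A, B ∈ L (X, X) , $A^{-1}$ ∈ L (X, X) , and suppose $\|B\| < 1 / \|A^{-1}\|$.
Then $(A + B)^{-1}$ and $(I + A^{-1}B)^{-1}$ exist and
$$\left\| (I + A^{-1}B)^{-1} \right\| \leq \left( 1 - \|A^{-1}B\| \right)^{-1} \tag{14.7}$$
$$\left\| (A + B)^{-1} \right\| \leq \|A^{-1}\| \, \frac{1}{1 - \|A^{-1}B\|}. \tag{14.8}$$
The above formula makes sense because $\|A^{-1}B\| < 1$.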
Proof: By Lemma 14.2.1,
$$\|A^{-1}B\| \leq \|A^{-1}\| \, \|B\| < \|A^{-1}\| \, \frac{1}{\|A^{-1}\|} = 1 \tag{14.9}$$
Then from the triangle inequality,
$$\left\| (I + A^{-1}B) x \right\| \geq \|x\| - \|A^{-1}Bx\| \geq \|x\| - \|A^{-1}B\| \, \|x\| = \left( 1 - \|A^{-1}B\| \right) \|x\|$$
It follows that $I + A^{-1}B$ is one to one because from 14.9, $1 - \|A^{-1}B\| > 0$. Thus if
$(I + A^{-1}B) x = 0$, then x = 0. Thus $(I + A^{-1}B)$ is also onto, taking a basis to a basis. Then
a generic y ∈ X is of the form $y = (I + A^{-1}B) x$ and the above shows that
$$\left\| (I + A^{-1}B)^{-1} y \right\| \leq \left( 1 - \|A^{-1}B\| \right)^{-1} \|y\|$$
which verifies 14.7. Thus $(A + B) = A (I + A^{-1}B)$ is one to one and this with Lemma
14.2.1 implies 14.8.
Proposition 14.2.3 Suppose A is invertible, b ≠ 0, Ax = b, and $(A + B) x_1 = b_1$ where
$\|B\| < 1 / \|A^{-1}\|$. Then
$$\frac{\|x_1 - x\|}{\|x\|} \leq \frac{\|A^{-1}\| \, \|A\|}{1 - \|A^{-1}B\|} \left( \frac{\|b_1 - b\|}{\|b\|} + \frac{\|B\|}{\|A\|} \right)$$
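Proof: This follows from the above lemma.
$$\frac{\|x_1 - x\|}{\|x\|} = \frac{\left\| (I + A^{-1}B)^{-1} A^{-1} b_1 - A^{-1} b \right\|}{\|A^{-1} b\|} \leq \frac{1}{1 - \|A^{-1}B\|} \, \frac{\left\| A^{-1} b_1 - (I + A^{-1}B) A^{-1} b \right\|}{\|A^{-1} b\|}$$
$$\leq \frac{1}{1 - \|A^{-1}B\|} \, \frac{\left\| A^{-1} (b_1 - b) \right\| + \left\| A^{-1} B A^{-1} b \right\|}{\|A^{-1} b\|} \leq \frac{\|A^{-1}\|}{1 - \|A^{-1}B\|} \left( \frac{\|b_1 - b\|}{\|A^{-1} b\|} + \|B\| \right)$$
because $A^{-1}b / \|A^{-1}b\|$ is a unit vector. Now multiply and divide by ∥A∥ . Then
$$\leq \frac{\|A^{-1}\| \, \|A\|}{1 - \|A^{-1}B\|} \left( \frac{\|b_1 - b\|}{\|A\| \, \|A^{-1} b\|} + \frac{\|B\|}{\|A\|} \right) \leq \frac{\|A^{-1}\| \, \|A\|}{1 - \|A^{-1}B\|} \left( \frac{\|b_1 - b\|}{\|b\|} + \frac{\|B\|}{\|A\|} \right).$$
This shows that the number, $\|A^{-1}\| \, \|A\|$ , controls how sensitive the relative change in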
the solution of Ax = b is to small changes in A and b. This number is called the condition
number. It is bad when this number is large because a small relative change in b, for example
could yield a large relative change in x.
Recall that for A an n × n matrix, ||A||2 = σ 1 where σ 1 is the largest singular value. The
largest singular value of A−1 is therefore, 1/σ n where σ n is the smallest singular value of A.
Therefore, the condition number reduces to σ 1 /σ n , the ratio of the largest to the smallest
singular value of A provided the norm is the usual Euclidean norm.
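The effect of a large condition number can be seen on a nearly singular 2 × 2 system. The sketch below is a hypothetical example: the matrix A, the right sides b and b1, and the helper names are chosen only for illustration, A itself is not perturbed (so B = 0 in Proposition 14.2.3), and the Euclidean norm is used so that the condition number is the ratio of the singular values.

```python
import math

def solve_2x2(a, b):
    """Solve a 2x2 real system by Cramer's rule."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [(s * b[0] - q * b[1]) / det, (p * b[1] - r * b[0]) / det]

def norm2(v):
    """Euclidean norm of a vector with two entries."""
    return math.hypot(v[0], v[1])

def condition_number_2x2(a):
    """sigma_1 / sigma_2 for an invertible real 2x2 matrix."""
    (p, q), (r, s) = a
    e, f, g = p * p + r * r, p * q + r * s, q * q + s * s
    sigma1_squared = (e + g) / 2 + math.sqrt(((e - g) / 2) ** 2 + f * f)
    # The product of the singular values equals |det A|, so sigma_2 = |det A| / sigma_1.
    return sigma1_squared / abs(p * s - q * r)

A = [[1.0, 1.0], [1.0, 1.0001]]  # nearly singular
b = [2.0, 2.0001]                # exact solution is (1, 1)
b1 = [2.0, 2.0002]               # b changed in the fourth decimal place

x = solve_2x2(A, b)
x1 = solve_2x2(A, b1)
rel_change_x = norm2([x1[0] - x[0], x1[1] - x[1]]) / norm2(x)
rel_change_b = norm2([b1[0] - b[0], b1[1] - b[1]]) / norm2(b)
cond = condition_number_2x2(A)
bound = cond * rel_change_b      # right side of Proposition 14.2.3 with B = 0
print(cond, rel_change_b, rel_change_x, bound)
```

Here a relative change in b of a few parts in 100,000 moves the solution from roughly (1, 1) to roughly (0, 2), which is consistent with a condition number in the tens of thousands.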
14.3 The Spectral Radius
Even though it is in general impractical to compute the Jordan form, its existence is all that
is needed in order to prove an important theorem about something which is relatively easy
to compute. This is the spectral radius of a matrix.
Definition 14.3.1 Define σ (A) to be the eigenvalues of A. Also,
ρ (A) ≡ max (|λ| : λ ∈ σ (A))
The number, ρ (A) is known as the spectral radius of A.
Recall the following symbols and their meaning.
$$\limsup_{n\to\infty} a_n, \quad \liminf_{n\to\infty} a_n$$
They are respectively the largest and smallest limit points of the sequence {an } where ±∞
is allowed in the case where the sequence is unbounded. They are also defined as
$$\limsup_{n\to\infty} a_n \equiv \lim_{n\to\infty} \left( \sup \{ a_k : k \geq n \} \right),$$
$$\liminf_{n\to\infty} a_n \equiv \lim_{n\to\infty} \left( \inf \{ a_k : k \geq n \} \right).$$
Thus, the limit of the sequence exists if and only if these are both equal to the same real
number.
Lemma 14.3.2 Let J be a p × p Jordan matrix
$$J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_s \end{pmatrix}$$
where each $J_k$ is of the form
$$J_k = \lambda_k I + N_k$$
in which $N_k$ is a nilpotent matrix having zeros down the main diagonal and ones down the
super diagonal. Then
$$\lim_{n\to\infty} \|J^n\|^{1/n} = \rho$$
where $\rho = \max \{ |\lambda_k| , k = 1, \ldots, s \}$. Here the norm is defined to equal
$$\|B\| = \max \{ |B_{ij}| , i, j \}, \quad \|x\| = \max \{ |x_i| \}.$$
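Proof: If ρ = 0, there is nothing to show because then each $J_k$ is nilpotent and so
$J^n = 0$ for all n large enough. Therefore, assume ρ > 0 and consider $J_k^n$. Since $N_k^p = 0$ (in
fact, $N_k^m = 0$ for some m ≤ p), it follows that
$$\frac{1}{\rho^n} J_k^n = \frac{1}{\rho^n} \sum_{i=0}^{p} \binom{n}{i} N_k^i \lambda_k^{n-i} = \frac{\lambda_k^n}{\rho^n} I + \sum_{i=1}^{p} \binom{n}{i} N_k^i \frac{\lambda_k^{n-i}}{\rho^n}$$
Now by the root test or the ratio test, if $|\lambda_k| < \rho$,
$$\lim_{n\to\infty} \left\| \binom{n}{i} N_k^i \frac{\lambda_k^{n-i}}{\rho^n} \right\| \leq \lim_{n\to\infty} C n^i \left( \frac{|\lambda_k|}{\rho} \right)^n = 0$$
Therefore, for such $|\lambda_k| < \rho$, it follows that
$$\lim_{n\to\infty} \frac{J_k^n}{\rho^n} = 0$$
What happens when $|\lambda_k| = \rho$? In this case,
$$\frac{1}{\rho^n} J_k^n = \omega^n I + \sum_{i=1}^{p} \binom{n}{i} N_k^i \omega^{n-i} \frac{1}{\rho^i}$$
where |ω| = 1. Recall that the entries of $N_k^i$ are either 1 or 0. Therefore, an upper bound
to the absolute values of the entries of the matrix on the right is
$$C n^p, \quad C = C(\rho)$$
Since the norm of $J^n$ is the largest of the norms of the blocks $J_k^n$, it follows that
$$\limsup_{n\to\infty} \left\| \frac{1}{\rho^n} J^n \right\|^{1/n} = \limsup_{n\to\infty} \max_k \left\| \frac{1}{\rho^n} J_k^n \right\|^{1/n} \leq 1$$
because $n^{1/n} \to 1$. Therefore,
$$\limsup_{n\to\infty} \|J^n\|^{1/n} \leq \rho.$$
Now let x be a unit eigenvector corresponding to a $\lambda_k$ with $|\lambda_k| = \rho$. Then
$$p \, \|J^n\| \geq \|J^n x\| = \rho^n$$
and so $p^{1/n} \|J^n\|^{1/n} \geq \rho$. Therefore,
$$\liminf_{n\to\infty} \|J^n\|^{1/n} = \liminf_{n\to\infty} p^{1/n} \|J^n\|^{1/n} \geq \rho \geq \limsup_{n\to\infty} \|J^n\|^{1/n}$$
which proves the lemma.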
The following theorem is due to Gelfand around 1941. First note that if A and B are p × p
matrices, and ∥·∥ is the norm just used,
$$\|AB\| \leq p \, \|A\| \, \|B\|$$
This is because
$$\left| \sum_{j=1}^{p} A_{ij} B_{jk} \right| \leq p \, \|A\| \, \|B\|$$
Theorem 14.3.3 (Gelfand) Let A be a complex p × p matrix. Then if ρ is the absolute
value of its largest eigenvalue,
$$\lim_{n\to\infty} \|A^n\|^{1/n} = \rho.$$
Here ||·|| is any norm on $L(\mathbb{C}^p, \mathbb{C}^p)$.
Proof: First assume ||·|| is the special norm of the above lemma. Then letting J denote
the Jordan form of A, $S^{-1}AS = J$, it follows from Lemma 14.3.2
$$\limsup_{n\to\infty} \|A^n\|^{1/n} = \limsup_{n\to\infty} \left\| S J^n S^{-1} \right\|^{1/n} \leq \limsup_{n\to\infty} \left( p \, \|S\| \, \left\| J^n S^{-1} \right\| \right)^{1/n}$$
$$\leq \limsup_{n\to\infty} \left( p^2 \, \|S\| \, \left\| S^{-1} \right\| \, \|J^n\| \right)^{1/n} = \rho$$
Letting λ be the largest eigenvalue of A, |λ| = ρ, and Ax = λx where ∥x∥ = 1,
$$p \, \|A^n\| \geq \|A^n x\| = \rho^n$$
and so
$$\liminf_{n\to\infty} \|A^n\|^{1/n} = \liminf_{n\to\infty} p^{1/n} \|A^n\|^{1/n} \geq \rho \geq \limsup_{n\to\infty} \|A^n\|^{1/n}$$
It follows that $\liminf_{n\to\infty} \|A^n\|^{1/n} = \limsup_{n\to\infty} \|A^n\|^{1/n} = \lim_{n\to\infty} \|A^n\|^{1/n} = \rho$.
Now by equivalence of norms, if |||·||| is any other norm for the set of complex p × p
matrices, there exist constants δ, ∆ such that
$$\delta \, \|A^n\| \leq |||A^n||| \leq \Delta \, \|A^n\|$$
Then raising to the 1/n power and taking a limit,
$$\rho \leq \liminf_{n\to\infty} |||A^n|||^{1/n} \leq \limsup_{n\to\infty} |||A^n|||^{1/n} \leq \rho$$
Example 14.3.4 Consider $\begin{pmatrix} 9 & -1 & 2 \\ -2 & 8 & 4 \\ 1 & 1 & 8 \end{pmatrix}$. Estimate the absolute value of the largest
eigenvalue.
A laborious computation reveals the eigenvalues are 5, and 10. Therefore, the right
answer in this case is 10. Consider $\|A^7\|^{1/7}$ where the norm is obtained by taking the
maximum of all the absolute values of the entries. Thus
$$\begin{pmatrix} 9 & -1 & 2 \\ -2 & 8 & 4 \\ 1 & 1 & 8 \end{pmatrix}^7 = \begin{pmatrix} 8015625 & -1984375 & 3968750 \\ -3968750 & 6031250 & 7937500 \\ 1984375 & 1984375 & 6031250 \end{pmatrix}$$
and taking the seventh root of the largest entry gives
$$\rho(A) \approx 8015625^{1/7} = 9.68895123671.$$
Of course the interest lies primarily in matrices for which the exact roots to the characteristic
equation are not known and in the theoretical significance.
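The computation in Example 14.3.4 can be repeated for larger powers to watch the estimate approach 10. The sketch below uses exact integer arithmetic from the Python standard library; the function names and the particular powers 7, 50, 200 are illustrative choices.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(a, n):
    """Compute a^n by repeated multiplication, starting from the identity."""
    size = len(a)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, a)
    return result

def max_entry_norm(a):
    """The norm of Lemma 14.3.2: the largest absolute value of an entry."""
    return max(abs(entry) for row in a for entry in row)

A = [[9, -1, 2], [-2, 8, 4], [1, 1, 8]]
estimates = {}
for n in (7, 50, 200):
    # n-th root of the norm of A^n, which Theorem 14.3.3 says tends to rho(A)
    estimates[n] = max_entry_norm(matrix_power(A, n)) ** (1.0 / n)
    print(n, estimates[n])
```

The powers are kept at or below 200 so that the largest entry of the power still fits in a floating point number when the root is taken.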
14.4 Series And Sequences Of Linear Operators
Before beginning this discussion, it is necessary to define what is meant by convergence in
L (X, Y ) .
Definition 14.4.1 Let $\{A_k\}_{k=1}^{\infty}$ be a sequence in L (X, Y ) where X, Y are finite dimensional normed linear spaces. Then $\lim_{n\to\infty} A_n = A$ if for every ε > 0 there exists N such
that if n > N, then
$$\|A - A_n\| < \varepsilon.$$
Here the norm refers to any of the norms defined on L (X, Y ) . By Corollary 14.0.8 and
Theorem 9.2.3 it doesn’t matter which one is used. Define the symbol for an infinite sum in
the usual way. Thus
$$\sum_{k=1}^{\infty} A_k \equiv \lim_{n\to\infty} \sum_{k=1}^{n} A_k$$
Lemma 14.4.2 Suppose $\{A_k\}_{k=1}^{\infty}$ is a sequence in L (X, Y ) where X, Y are finite dimensional normed linear spaces. Then if
$$\sum_{k=1}^{\infty} \|A_k\| < \infty,$$
it follows that
$$\sum_{k=1}^{\infty} A_k \tag{14.10}$$
exists. In words, absolute convergence implies convergence.
Proof: For p ≤ m ≤ n,
$$\left\| \sum_{k=1}^{n} A_k - \sum_{k=1}^{m} A_k \right\| \leq \sum_{k=p}^{\infty} \|A_k\|$$
and so for p large enough, this term on the right in the above inequality is less than ε. Since
ε is arbitrary, this shows the partial sums of 14.10 are a Cauchy sequence. Therefore by
Corollary 14.0.7 it follows that these partial sums converge.
As a special case, suppose λ ∈ C and consider
$$\sum_{k=0}^{\infty} \frac{t^k \lambda^k}{k!}$$
where t ∈ R. In this case, $A_k = \frac{t^k \lambda^k}{k!}$ and you can think of it as being in L (C, C). Then the
following corollary is of great interest.
Corollary 14.4.3 Let
$$f(t) \equiv \sum_{k=0}^{\infty} \frac{t^k \lambda^k}{k!} \equiv 1 + \sum_{k=1}^{\infty} \frac{t^k \lambda^k}{k!}$$
Then this function is a well defined complex valued function and furthermore, it satisfies the
initial value problem,
$$y' = \lambda y, \quad y(0) = 1$$
Furthermore, if λ = a + ib,
$$|f(t)| = e^{at}.$$
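Proof: That f (t) makes sense follows right away from Lemma 14.4.2.
$$\sum_{k=0}^{\infty} \left| \frac{t^k \lambda^k}{k!} \right| = \sum_{k=0}^{\infty} \frac{|t|^k |\lambda|^k}{k!} = e^{|t||\lambda|}$$
It only remains to verify f satisfies the differential equation because it is obvious from the
series that f (0) = 1.
$$\frac{f(t + h) - f(t)}{h} = \frac{1}{h} \sum_{k=1}^{\infty} \frac{\left( (t + h)^k - t^k \right) \lambda^k}{k!}$$
and by the mean value theorem, this equals an expression of the following form where $\theta_k$ is
a number between 0 and 1.
$$\sum_{k=1}^{\infty} \frac{k (t + \theta_k h)^{k-1} \lambda^k}{k!} = \sum_{k=1}^{\infty} \frac{(t + \theta_k h)^{k-1} \lambda^k}{(k - 1)!} = \lambda \sum_{k=0}^{\infty} \frac{(t + \theta_k h)^k \lambda^k}{k!}$$
It only remains to verify this converges to
$$\lambda \sum_{k=0}^{\infty} \frac{t^k \lambda^k}{k!} = \lambda f(t)$$
as h → 0.
$$\left| \sum_{k=0}^{\infty} \frac{(t + \theta_k h)^k \lambda^k}{k!} - \sum_{k=0}^{\infty} \frac{t^k \lambda^k}{k!} \right| = \left| \sum_{k=0}^{\infty} \frac{\left( (t + \theta_k h)^k - t^k \right) \lambda^k}{k!} \right|$$
and by the mean value theorem again and the triangle inequality
$$\leq \sum_{k=0}^{\infty} \frac{k \, |t + \eta_k h|^{k-1} |h| \, |\lambda|^k}{k!} \leq |h| \sum_{k=0}^{\infty} \frac{k \, |t + \eta_k h|^{k-1} |\lambda|^k}{k!}$$
where $\eta_k$ is between 0 and 1. Thus for |h| < 1,
$$\leq |h| \sum_{k=0}^{\infty} \frac{k \, (|t| + 1)^{k-1} |\lambda|^k}{k!} = |h| \, C(t)$$
It follows f ′ (t) = λf (t) . This proves the first part.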
Next note that for f (t) = u (t) + iv (t) , both u, v are differentiable. This is because
$$u = \frac{f + \overline{f}}{2}, \quad v = \frac{f - \overline{f}}{2i}.$$
Then from the differential equation,
(a + ib) (u + iv) = u′ + iv ′
and equating real and imaginary parts,
u′ = au − bv, v ′ = av + bu.
Then a short computation shows
$$\left( u^2 + v^2 \right)' = 2uu' + 2vv' = 2u (au - bv) + 2v (av + bu) = 2a \left( u^2 + v^2 \right)$$
$$\left( u^2 + v^2 \right)(0) = |f|^2 (0) = 1$$
Now in general, if
$$y' = cy, \quad y(0) = 1,$$
with c real it follows $y(t) = e^{ct}$. To see this,
$$y' - cy = 0$$
and so, multiplying both sides by $e^{-ct}$ you get
$$\frac{d}{dt} \left( y e^{-ct} \right) = 0$$
and so $y e^{-ct}$ equals a constant which must be 1 because of the initial condition y (0) = 1.
Thus
$$\left( u^2 + v^2 \right)(t) = e^{2at}$$
and taking square roots yields the desired conclusion.
Definition 14.4.4 The function in Corollary 14.4.3 given by that power series is denoted
as
$$\exp(\lambda t) \text{ or } e^{\lambda t}.$$
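The conclusions of Corollary 14.4.3 can be checked numerically with partial sums of the series. This is only a sketch: the values of λ and t, the number of terms, the step h, and the function name exp_series are arbitrary illustrative choices, and the library function cmath.exp is used purely as an independent comparison.

```python
import cmath
import math

def exp_series(lam, t, terms=60):
    """Partial sum of sum_{k>=0} (t*lam)^k / k!, accumulated term by term."""
    total = 0j
    term = 1 + 0j  # the k = 0 term
    for k in range(terms):
        total += term
        term *= (t * lam) / (k + 1)  # turns the k-th term into the (k+1)-th
    return total

lam = -0.5 + 3j  # a = -0.5, b = 3
t = 1.2
f = exp_series(lam, t)

# |f(t)| should agree with e^{at}.
modulus_error = abs(abs(f) - math.exp(lam.real * t))

# A centered difference quotient should be close to lam * f(t).
h = 1e-5
derivative = (exp_series(lam, t + h) - exp_series(lam, t - h)) / (2 * h)
derivative_error = abs(derivative - lam * f)

print(f, cmath.exp(lam * t), modulus_error, derivative_error)
```

Sixty terms are far more than needed here because |tλ| is less than 4, so the terms decay factorially.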
The next lemma is normally discussed in advanced calculus courses but is proved here
for the convenience of the reader. It is known as the root test.
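Definition 14.4.5 For {an } any sequence of real numbers
$$\limsup_{n\to\infty} a_n \equiv \lim_{n\to\infty} \left( \sup \{ a_k : k \geq n \} \right)$$
Similarly
$$\liminf_{n\to\infty} a_n \equiv \lim_{n\to\infty} \left( \inf \{ a_k : k \geq n \} \right)$$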
In case An is an increasing (decreasing) sequence which is unbounded above (below) then it
is understood that limn→∞ An = ∞ (−∞) respectively. Thus either of lim sup or lim inf can
equal +∞ or −∞. However, the important thing about these is that unlike the limit, these
always exist.
It is convenient to think of these as the largest point which is the limit of some subsequence of {an } and the smallest point which is the limit of some subsequence of {an }
respectively. Thus limn→∞ an exists and equals some point of [−∞, ∞] if and only if the
two are equal.
Lemma 14.4.6 Let {ap } be a sequence of nonnegative terms and let
$$r = \limsup_{p\to\infty} a_p^{1/p}.$$
Then if r < 1, it follows the series, $\sum_{k=1}^{\infty} a_k$ converges and if r > 1, then $a_p$ fails to converge
to 0 so the series diverges. If A is an n × n matrix and
$$r = \limsup_{p\to\infty} \|A^p\|^{1/p}, \tag{14.11}$$
then if r > 1, then $\sum_{k=0}^{\infty} A^k$ fails to converge and if r < 1 then the series converges.
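The matrix part of the lemma can be illustrated with a small example in which r < 1. In the sketch below the 2 × 2 matrix, the number of terms, and the helper names are illustrative assumptions. The code also uses a standard fact that is not stated in the text above: when the series converges, its sum S satisfies S (I − A) = I, which follows from the telescoping identity (I + A + · · · + A^m)(I − A) = I − A^{m+1}.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def max_entry_norm(a):
    """Largest absolute value of an entry."""
    return max(abs(entry) for row in a for entry in row)

def neumann_partial_sum(a, m):
    """Return I + A + A^2 + ... + A^m together with the last power A^m."""
    n = len(a)
    total = identity(n)
    power = identity(n)
    for _ in range(m):
        power = matmul(power, a)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total, power

A = [[0.5, 0.4], [0.1, 0.3]]  # eigenvalues are about 0.62 and 0.18
S, A_50 = neumann_partial_sum(A, 50)
root_test_value = max_entry_norm(A_50) ** (1.0 / 50)  # approximates r in 14.11

S, _ = neumann_partial_sum(A, 200)
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(2)] for i in range(2)]
product = matmul(S, I_minus_A)
residual = max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(2) for j in range(2))
print(root_test_value, residual)
```

If every entry of A is multiplied by 2 instead, the larger eigenvalue exceeds 1 and the partial sums grow without bound, which matches the case r > 1.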
Proof: Suppose $r < 1$ and choose $R$ with $r < R < 1$. Then there exists $N$ such that if $p > N$,
$$a_p^{1/p} < R.$$
Therefore, for all such $p$, $a_p < R^p$ and so by comparison with the geometric series $\sum R^p$, it follows $\sum_{p=1}^{\infty} a_p$ converges.
Next suppose $r > 1$. Then letting $1 < R < r$, it follows there are infinitely many values of $p$ at which
$$R < a_p^{1/p}$$
which implies $R^p < a_p$, showing that $a_p$ cannot converge to 0 and so the series cannot converge either.
To see the last claim, if $r > 1$, then $\|A^p\|$ fails to converge to 0 and so $\left\{\sum_{k=0}^{m} A^k\right\}_{m=0}^{\infty}$ is not a Cauchy sequence. Hence $\sum_{k=0}^{\infty} A^k \equiv \lim_{m\to\infty}\sum_{k=0}^{m} A^k$ cannot exist. If $r < 1$, choose $R$ with $r < R < 1$. Then for all $n$ large enough, $\|A^n\|^{1/n} \le R$, so $\|A^n\| \le R^n$. Hence $\sum_n \|A^n\|$ converges and so by Lemma 14.4.2, it follows that $\sum_{k=1}^{\infty} A^k$ also converges.
Now denote by $\sigma(A)^p$ the collection of all numbers of the form $\lambda^p$ where $\lambda \in \sigma(A)$.
Lemma 14.4.7 $\sigma(A^p) = \sigma(A)^p \equiv \{\lambda^p : \lambda \in \sigma(A)\}$.
Proof: In dealing with $\sigma(A^p)$, it suffices to deal with $\sigma(J^p)$ where $J$ is the Jordan form of $A$ because $J^p$ and $A^p$ are similar. Thus if $\lambda \in \sigma(A^p)$, then $\lambda \in \sigma(J^p)$ and so $\lambda = \alpha$ where $\alpha$ is one of the entries on the main diagonal of $J^p$. These entries are of the form $\mu^p$ where $\mu \in \sigma(A)$. Thus $\lambda \in \sigma(A)^p$ and this shows $\sigma(A^p) \subseteq \sigma(A)^p$.
Now take $\alpha \in \sigma(A)$ and consider $\alpha^p$.
$$\alpha^p I - A^p = \left(\alpha^{p-1} I + \cdots + \alpha A^{p-2} + A^{p-1}\right)(\alpha I - A)$$
and so $\alpha^p I - A^p$ fails to be one to one, which shows that $\alpha^p \in \sigma(A^p)$, which shows that $\sigma(A)^p \subseteq \sigma(A^p)$.
14.5
Iterative Methods For Linear Systems
Consider the problem of solving the equation
Ax = b
(14.12)
where A is an n × n matrix. In many applications, the matrix A is huge and composed
mainly of zeros. For such matrices, the method of Gauss elimination (row operations) is
not a good way to solve the system because the row operations can destroy the zeros and
storing all those zeros takes a lot of room in a computer. These systems are called sparse.
To solve them, it is common to use an iterative technique. I am following the treatment
given to this subject by Noble and Daniel [21].
Definition 14.5.1 The Jacobi iterative technique, also called the method of simultaneous corrections, is defined as follows. Let $x^1$ be an initial vector, say the zero vector or some other vector. The method generates a succession of vectors $x^2, x^3, x^4, \cdots$ and hopefully this sequence of vectors will converge to the solution to 14.12. The vectors in this list are called iterates and they are obtained according to the following procedure. Letting $A = (a_{ij})$,
$$a_{ii} x_i^{r+1} = -\sum_{j \ne i} a_{ij} x_j^r + b_i. \tag{14.13}$$
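In terms of matrices, letting
$$A = \begin{pmatrix} * & \cdots & * \\ \vdots & \ddots & \vdots \\ * & \cdots & * \end{pmatrix},$$
the iterates are defined as
$$\begin{pmatrix} * & 0 & \cdots & 0 \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & * \end{pmatrix}\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ \vdots \\ x_n^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & * & \cdots & * \\ * & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ * & \cdots & * & 0 \end{pmatrix}\begin{pmatrix} x_1^r \\ x_2^r \\ \vdots \\ x_n^r \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \tag{14.14}$$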
The matrix on the left in 14.14 is obtained by retaining the main diagonal of A and
setting every other entry equal to zero. The matrix on the right in 14.14 is obtained from A
by setting every diagonal entry equal to zero and retaining all the other entries unchanged.
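Example 14.5.2 Use the Jacobi method to solve the system
$$\begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 2 & 5 & 1 \\ 0 & 0 & 2 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$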
Of course this is solved most easily using row reductions. The Jacobi method is useful when the matrix is very large. This example is just to illustrate how the method works. First let’s solve it using row operations. The exact solution from row reduction is
$$\begin{pmatrix} \frac{6}{29} & \frac{11}{29} & \frac{8}{29} & \frac{25}{29} \end{pmatrix},$$
which in terms of decimals is approximately equal to
$$\begin{pmatrix} 0.207 & 0.379 & 0.276 & 0.862 \end{pmatrix}.$$
In terms of the matrices, the Jacobi iteration is of the form
$$\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 0 \end{pmatrix}\begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.$$
Multiplying by the inverse of the matrix on the left,¹ this iteration reduces to
$$\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ \frac{1}{4} & 0 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & 0 & \frac{1}{5} \\ 0 & 0 & \frac{1}{2} & 0 \end{pmatrix}\begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} \frac{1}{3} \\ \frac{1}{2} \\ \frac{3}{5} \\ 1 \end{pmatrix}. \tag{14.15}$$
Now iterate this starting with
$$x^1 \equiv \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
¹ You certainly would not compute the inverse in solving a large system. This is just to show you how the method works for this simple example. You would use the first description in terms of indices.
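Thus
$$x^2 = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ \frac{1}{4} & 0 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & 0 & \frac{1}{5} \\ 0 & 0 & \frac{1}{2} & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} \frac{1}{3} \\ \frac{1}{2} \\ \frac{3}{5} \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{3} \\ \frac{1}{2} \\ \frac{3}{5} \\ 1 \end{pmatrix}.$$
Then
$$x^3 = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ \frac{1}{4} & 0 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & 0 & \frac{1}{5} \\ 0 & 0 & \frac{1}{2} & 0 \end{pmatrix}\overbrace{\begin{pmatrix} \frac{1}{3} \\ \frac{1}{2} \\ \frac{3}{5} \\ 1 \end{pmatrix}}^{x^2} + \begin{pmatrix} \frac{1}{3} \\ \frac{1}{2} \\ \frac{3}{5} \\ 1 \end{pmatrix} = \begin{pmatrix} 0.166 \\ 0.26 \\ 0.2 \\ 0.7 \end{pmatrix}.$$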
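Continuing this way one finally gets
$$x^6 = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ \frac{1}{4} & 0 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & 0 & \frac{1}{5} \\ 0 & 0 & \frac{1}{2} & 0 \end{pmatrix}\overbrace{\begin{pmatrix} 0.197 \\ 0.351 \\ 0.2566 \\ 0.822 \end{pmatrix}}^{x^5} + \begin{pmatrix} \frac{1}{3} \\ \frac{1}{2} \\ \frac{3}{5} \\ 1 \end{pmatrix} = \begin{pmatrix} 0.216 \\ 0.386 \\ 0.295 \\ 0.871 \end{pmatrix}.$$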
You can keep going like this. Recall the solution is approximately equal to
$$\begin{pmatrix} 0.206 \\ 0.379 \\ 0.275 \\ 0.862 \end{pmatrix}$$
so you see that with no care at all and only 6 iterations, an approximate solution has been
obtained which is not too far off from the actual solution.
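The computation in this example can be reproduced with a short program. The following is only a sketch, using plain Python lists and working directly from the index formula 14.13 rather than from any inverse matrix; the function name `jacobi` and the variable names are illustrative choices, not part of the text.

```python
def jacobi(A, b, x, steps):
    """Apply `steps` Jacobi updates a_ii x_i^{r+1} = -sum_{j != i} a_ij x_j^r + b_i."""
    n = len(A)
    for _ in range(steps):
        # Each component of the new iterate uses only the previous iterate x.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# The system of Example 14.5.2.
A = [[3, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 2, 5, 1],
     [0, 0, 2, 4]]
b = [1, 2, 3, 4]

x1 = [0.0, 0.0, 0.0, 0.0]          # the initial vector x^1
x6 = jacobi(A, b, x1, 5)           # five updates take x^1 to x^6
x_many = jacobi(A, b, x1, 59)      # many more updates, for comparison
exact = [6 / 29, 11 / 29, 8 / 29, 25 / 29]

print([round(v, 3) for v in x6])
print([round(v, 3) for v in x_many])
```

Because every update reads only the previous iterate, all components could be corrected at the same time, which is the reason for the name "simultaneous corrections".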
Definition 14.5.3 The Gauss Seidel method, also called the method of successive corrections, is given as follows. For $A = (a_{ij})$, the iterates for the problem $Ax = b$ are obtained according to the formula
$$\sum_{j=1}^{i} a_{ij} x_j^{r+1} = -\sum_{j=i+1}^{n} a_{ij} x_j^r + b_i. \tag{14.16}$$
In terms of matrices, letting
$$A = \begin{pmatrix} * & \cdots & * \\ \vdots & \ddots & \vdots \\ * & \cdots & * \end{pmatrix},$$
the iterates are defined as
$$\begin{pmatrix} * & 0 & \cdots & 0 \\ * & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ * & \cdots & * & * \end{pmatrix}\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ \vdots \\ x_n^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & * & \cdots & * \\ 0 & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & 0 \end{pmatrix}\begin{pmatrix} x_1^r \\ x_2^r \\ \vdots \\ x_n^r \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \tag{14.17}$$
In words, you set every entry in the original matrix which is strictly above the main
diagonal equal to zero to obtain the matrix on the left. To get the matrix on the right,
you set every entry of A which is on or below the main diagonal equal to zero. Using the
iteration procedure of 14.16 directly, the Gauss Seidel method makes use of the very latest
information which is available at that stage of the computation.
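As a rough sketch of what using 14.16 directly looks like, the following Python function overwrites the components of the current vector one at a time, so that each new component is used as soon as it has been computed. The name `gauss_seidel` is an illustrative choice, and the system is the one from Example 14.5.2.

```python
def gauss_seidel(A, b, x, steps):
    """Apply `steps` Gauss Seidel sweeps based on formula 14.16."""
    n = len(A)
    x = list(x)                     # work on a copy of the starting vector
    for _ in range(steps):
        for i in range(n):
            # For j < i, x[j] already holds the new value x_j^{r+1};
            # for j > i, x[j] still holds the old value x_j^r.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

A = [[3, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 2, 5, 1],
     [0, 0, 2, 4]]
b = [1, 2, 3, 4]

x2 = gauss_seidel(A, b, [0.0] * 4, 1)       # one sweep starting from zero
x_many = gauss_seidel(A, b, [0.0] * 4, 40)  # many sweeps, for comparison
exact = [6 / 29, 11 / 29, 8 / 29, 25 / 29]
print(x2)
print(x_many)
```

Only one vector is stored at any time, in contrast to the Jacobi method, which needs both the old and the new iterate.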
The following example is the same as the example used to illustrate the Jacobi method.
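Example 14.5.4 Use the Gauss Seidel method to solve the system
$$\begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 2 & 5 & 1 \\ 0 & 0 & 2 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$
In terms of matrices, this procedure is
$$\begin{pmatrix} 3 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 2 & 5 & 0 \\ 0 & 0 & 2 & 4 \end{pmatrix}\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.$$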
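Multiplying by the inverse of the matrix on the left,² this yields
$$\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ 0 & -\frac{1}{12} & \frac{1}{4} & 0 \\ 0 & \frac{1}{30} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{60} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix}\begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} \frac{1}{3} \\ \frac{5}{12} \\ \frac{13}{30} \\ \frac{47}{60} \end{pmatrix}$$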
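As before, I will be totally unoriginal in the choice of $x^1$. Let it equal the zero vector. Therefore,
$$x^2 = \begin{pmatrix} \frac{1}{3} \\ \frac{5}{12} \\ \frac{13}{30} \\ \frac{47}{60} \end{pmatrix}.$$
Now
$$x^3 = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ 0 & -\frac{1}{12} & \frac{1}{4} & 0 \\ 0 & \frac{1}{30} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{60} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix}\overbrace{\begin{pmatrix} \frac{1}{3} \\ \frac{5}{12} \\ \frac{13}{30} \\ \frac{47}{60} \end{pmatrix}}^{x^2} + \begin{pmatrix} \frac{1}{3} \\ \frac{5}{12} \\ \frac{13}{30} \\ \frac{47}{60} \end{pmatrix} = \begin{pmatrix} 0.194 \\ 0.343 \\ 0.306 \\ 0.846 \end{pmatrix}.$$
Continuing this way,
$$x^4 = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ 0 & -\frac{1}{12} & \frac{1}{4} & 0 \\ 0 & \frac{1}{30} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{60} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix}\begin{pmatrix} 0.194 \\ 0.343 \\ 0.306 \\ 0.846 \end{pmatrix} + \begin{pmatrix} \frac{1}{3} \\ \frac{5}{12} \\ \frac{13}{30} \\ \frac{47}{60} \end{pmatrix} = \begin{pmatrix} 0.219 \\ 0.36875 \\ 0.2833 \\ 0.85835 \end{pmatrix}$$
and so
$$x^5 = -\begin{pmatrix} 0 & \frac{1}{3} & 0 & 0 \\ 0 & -\frac{1}{12} & \frac{1}{4} & 0 \\ 0 & \frac{1}{30} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{60} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix}\begin{pmatrix} 0.219 \\ 0.36875 \\ 0.2833 \\ 0.85835 \end{pmatrix} + \begin{pmatrix} \frac{1}{3} \\ \frac{5}{12} \\ \frac{13}{30} \\ \frac{47}{60} \end{pmatrix} = \begin{pmatrix} 0.21042 \\ 0.37657 \\ 0.2777 \\ 0.86115 \end{pmatrix}.$$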
This is fairly close to the answer. You could continue doing these iterates and it appears
they converge to the solution. Now consider the following example.
2 As in the case of the Jacobi iteration, the computer would not do this. It would use the iteration
procedure in terms of the entries of the matrix directly. Otherwise all benefit to using this method is lost.
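Example 14.5.5 Use the Gauss Seidel method to solve the system
$$\begin{pmatrix} 1 & 4 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 2 & 5 & 1 \\ 0 & 0 & 2 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$
The exact solution is given by doing row operations on the augmented matrix. When this is done the solution is seen to be
$$\begin{pmatrix} 6.0 & -1.25 & 1.0 & 0.5 \end{pmatrix}.$$
The Gauss Seidel iterations are of the form
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 2 & 5 & 0 \\ 0 & 0 & 2 & 4 \end{pmatrix}\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$$
and so, multiplying by the inverse of the matrix on the left, the iteration reduces to the following in terms of matrix multiplication.
$$x^{r+1} = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & -1 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{5} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix} x^r + \begin{pmatrix} 1 \\ \frac{1}{4} \\ \frac{1}{2} \\ \frac{3}{4} \end{pmatrix}.$$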
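This time, I will pick an initial vector close to the answer. Let
$$x^1 = \begin{pmatrix} 6 \\ -1 \\ 1 \\ \frac{1}{2} \end{pmatrix}$$
This is very close to the answer. Now let’s see what the Gauss Seidel iteration does to it.
$$x^2 = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & -1 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{5} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix}\begin{pmatrix} 6 \\ -1 \\ 1 \\ \frac{1}{2} \end{pmatrix} + \begin{pmatrix} 1 \\ \frac{1}{4} \\ \frac{1}{2} \\ \frac{3}{4} \end{pmatrix} = \begin{pmatrix} 5.0 \\ -1.0 \\ 0.9 \\ 0.55 \end{pmatrix}$$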
It appears that it moved the initial guess far from the solution even though you started
with one which was initially close to the solution. This is discouraging. However, you can’t
expect the method to work well after only one iteration. Unfortunately, if you do multiple
iterations, the iterates never seem to get close to the actual solution. Why is the process
which worked so well in the other examples not working here? A better question might be:
Why does either process ever work at all?
Both iterative procedures for solving
Ax = b
(14.18)
are of the form
$$Bx^{r+1} = -Cx^r + b$$
where A = B + C. In the Jacobi procedure, the matrix C was obtained by setting the
diagonal of A equal to zero and leaving all other entries the same while the matrix B was
obtained by making every entry of A equal to zero other than the diagonal entries which are
left unchanged. In the Gauss Seidel procedure, the matrix B was obtained from A by making
every entry strictly above the main diagonal equal to zero and leaving the others unchanged,
and C was obtained from A by making every entry on or below the main diagonal equal to
zero and leaving the others unchanged. Thus in the Jacobi procedure, B is a diagonal matrix
while in the Gauss Seidel procedure, B is lower triangular. Using matrices to explicitly solve
for the iterates, yields
$$x^{r+1} = -B^{-1}Cx^r + B^{-1}b. \tag{14.19}$$
This is what you would never have the computer do but this is what will allow the statement
of a theorem which gives the condition for convergence of these and all other similar methods.
Recall the definition of the spectral radius of $M$, $\rho(M)$, in Definition 14.3.1 on Page 351.
Theorem 14.5.6 Suppose $\rho\left(B^{-1}C\right) < 1$. Then the iterates in 14.19 converge to the unique solution of 14.18.
I will prove this theorem in the next section. The proof depends on analysis which should
not be surprising because it involves a statement about convergence of sequences.
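The condition in Theorem 14.5.6 can be examined numerically for the two Gauss Seidel examples above. The sketch below is illustrative only: it forms $M = B^{-1}C$ by forward substitution (no inverse is computed) and then estimates $\rho(M)$ by $\|M^k\|^{1/k}$ for one fixed large $k$, relying on Gelfand's theorem, which says this quantity converges to $\rho(M)$. The function names, the choice $k = 200$, and the use of the maximum row sum norm are assumptions made for the illustration.

```python
def gauss_seidel_matrix(A):
    """Return M = B^{-1} C for the Gauss Seidel splitting A = B + C."""
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            # C keeps only the entries of A strictly above the main diagonal.
            c = A[i][col] if col > i else 0.0
            # Forward substitution with the lower triangular matrix B.
            s = sum(A[i][j] * M[j][col] for j in range(i))
            M[i][col] = (c - s) / A[i][i]
    return M

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gelfand_estimate(M, k=200):
    """Estimate rho(M) by ||M^k||^(1/k), using the maximum row sum norm."""
    P = M
    for _ in range(k - 1):
        P = matmul(P, M)
    norm = max(sum(abs(v) for v in row) for row in P)
    return norm ** (1.0 / k)

A4 = [[3, 1, 0, 0], [1, 4, 1, 0], [0, 2, 5, 1], [0, 0, 2, 4]]  # Example 14.5.4
A5 = [[1, 4, 0, 0], [1, 4, 1, 0], [0, 2, 5, 1], [0, 0, 2, 4]]  # Example 14.5.5

est4 = gelfand_estimate(gauss_seidel_matrix(A4))
est5 = gelfand_estimate(gauss_seidel_matrix(A5))
print(est4, est5)
```

For the matrix of Example 14.5.4 the estimate is well below 1, while for the matrix of Example 14.5.5 it exceeds 1. Since $\|M^k\|^{1/k} \ge \rho(M)$ for every $k$, the first value is a genuine upper bound for $\rho(M)$, so the hypothesis of the theorem holds there; in the second case the theorem gives no guarantee, which is consistent with the behaviour of the iterates observed in Example 14.5.5.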
What is an easy to verify sufficient condition which will imply the above holds? It is easy to give one in the case of the Jacobi method. Suppose the matrix $A$ is diagonally dominant. That is, $|a_{ii}| > \sum_{j \ne i} |a_{ij}|$ for each $i$. Then $B$ would be the diagonal matrix consisting of the entries $a_{ii}$. You need to find the size of $\lambda$ where
$$B^{-1}Cx = \lambda x.$$
Thus you need
$$(\lambda B - C)x = 0.$$
Now if |λ| ≥ 1, then the matrix λB − C is diagonally dominant and so this matrix will be
invertible so λ is not an eigenvalue. Hence the only eigenvalues have absolute value less
than 1.
You might try a similar argument in the case of the Gauss Seidel method.
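The diagonal dominance condition is easy to test by machine. The following sketch checks it for the matrices used in the examples of the previous section and also computes, for the Jacobi splitting, the largest row ratio $\sum_{j \ne i} |a_{ij}| / |a_{ii}|$. That ratio is the operator norm of $B^{-1}C$ with respect to $\|\cdot\|_\infty$ and therefore an upper bound for $\rho(B^{-1}C)$. The function names are illustrative.

```python
def is_diagonally_dominant(A):
    """True if |a_ii| > sum_{j != i} |a_ij| holds in every row."""
    n = len(A)
    return all(abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))

def jacobi_row_bound(A):
    """Largest row ratio sum_{j != i} |a_ij| / |a_ii| (assumes nonzero diagonal)."""
    n = len(A)
    return max(sum(abs(A[i][j]) for j in range(n) if j != i) / abs(A[i][i])
               for i in range(n))

A2 = [[3, 1, 0, 0], [1, 4, 1, 0], [0, 2, 5, 1], [0, 0, 2, 4]]  # Examples 14.5.2, 14.5.4
A5 = [[1, 4, 0, 0], [1, 4, 1, 0], [0, 2, 5, 1], [0, 0, 2, 4]]  # Example 14.5.5

dominant2 = is_diagonally_dominant(A2)
dominant5 = is_diagonally_dominant(A5)
bound2 = jacobi_row_bound(A2)
print(dominant2, dominant5, bound2)
```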
14.6
Theory Of Convergence
Definition 14.6.1 A normed vector space $E$ with norm $\|\cdot\|$ is called a Banach space if it is also complete. This means that every Cauchy sequence converges. Recall that a sequence $\{x_n\}_{n=1}^{\infty}$ is a Cauchy sequence if for every $\varepsilon > 0$ there exists $N$ such that whenever $m, n > N$,
$$\|x_n - x_m\| < \varepsilon.$$
Thus whenever $\{x_n\}$ is a Cauchy sequence, there exists $x$ such that
$$\lim_{n\to\infty} \|x - x_n\| = 0.$$
Example 14.6.2 Let E be a Banach space and let Ω be a nonempty subset of a normed
linear space F . Let B (Ω; E) denote those functions f for which
||f || ≡ sup {||f (x)||E : x ∈ Ω} < ∞
Denote by BC (Ω; E) the set of functions in B (Ω; E) which are also continuous.
Lemma 14.6.3 The above ∥·∥ is a norm on B (Ω; E). The subspace BC (Ω; E) with the
given norm is a Banach space.
Proof: It is obvious $\|\cdot\|$ is a norm. It only remains to verify $BC(\Omega; E)$ is complete. Let $\{f_n\}$ be a Cauchy sequence. Since $\|f_n - f_m\| \to 0$ as $m, n \to \infty$, it follows that $\{f_n(x)\}$ is a Cauchy sequence in $E$ for each $x$. Let $f(x) \equiv \lim_{n\to\infty} f_n(x)$. Then for any $x \in \Omega$,
$$\|f_n(x) - f_m(x)\|_E \le \|f_n - f_m\| < \varepsilon$$
whenever $m, n$ are large enough, say as large as $N$. For $n \ge N$, let $m \to \infty$. Then passing to the limit, it follows that for all $x$,
$$\|f_n(x) - f(x)\|_E \le \varepsilon$$
and so for all $x$,
$$\|f(x)\|_E \le \varepsilon + \|f_n(x)\|_E \le \varepsilon + \|f_n\|.$$
It follows that $\|f\| \le \|f_n\| + \varepsilon$ and $\|f - f_n\| \le \varepsilon$.
It remains to verify that $f$ is continuous.
$$\|f(x) - f(y)\|_E \le \|f(x) - f_n(x)\|_E + \|f_n(x) - f_n(y)\|_E + \|f_n(y) - f(y)\|_E$$
$$\le 2\|f - f_n\| + \|f_n(x) - f_n(y)\|_E < \frac{2\varepsilon}{3} + \|f_n(x) - f_n(y)\|_E$$
for all $n$ large enough. Now pick such an $n$. By continuity, the last term is less than $\frac{\varepsilon}{3}$ if $\|x - y\|$ is small enough. Hence $f$ is continuous as well.
The most familiar example of a Banach space is $\mathbb{F}^n$. The following lemma is of great importance so it is stated in general.
Lemma 14.6.4 Suppose $T : E \to E$ where $E$ is a Banach space with norm $|\cdot|$. Also suppose
$$|Tx - Ty| \le r|x - y| \tag{14.20}$$
for some $r \in (0, 1)$. Then there exists a unique fixed point $x \in E$ such that
$$Tx = x. \tag{14.21}$$
Letting $x^1 \in E$, this fixed point $x$ is the limit of the sequence of iterates,
$$x^1, Tx^1, T^2x^1, \cdots. \tag{14.22}$$
In addition to this, there is a nice estimate which tells how close $x^1$ is to $x$ in terms of things which can be computed.
$$\left|x^1 - x\right| \le \frac{1}{1 - r}\left|x^1 - Tx^1\right|. \tag{14.23}$$
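A one dimensional illustration of the lemma may help. In the sketch below, $E = \mathbb{R}$ and $T(x) = \frac{1}{2}\cos x$, which satisfies 14.20 with $r = 1/2$ by the mean value theorem because $|T'(x)| \le 1/2$. The map, the starting point, and the number of iterations are choices made only for this illustration.

```python
import math

def T(x):
    # |T'(x)| = 0.5 * |sin x| <= 0.5, so T is a contraction on R with r = 0.5.
    return 0.5 * math.cos(x)

r = 0.5
x1 = 3.0                       # an arbitrary starting point x^1

x = x1
for _ in range(100):           # the iterates x^1, T x^1, T^2 x^1, ...
    x = T(x)
fixed_point = x

# Right side of estimate 14.23: computable from x^1 and T x^1 alone.
bound = abs(x1 - T(x1)) / (1 - r)
distance = abs(x1 - fixed_point)
print(fixed_point, distance, bound)
```

The quantity `bound` is available before any fixed point has been found, which is what makes the estimate useful.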
Proof: This follows easily when it is shown that the above sequence, $\left\{T^kx^1\right\}_{k=1}^{\infty}$, is a Cauchy sequence. Note that
$$\left|T^2x^1 - Tx^1\right| \le r\left|Tx^1 - x^1\right|.$$
Suppose
$$\left|T^kx^1 - T^{k-1}x^1\right| \le r^{k-1}\left|Tx^1 - x^1\right|. \tag{14.24}$$
Then
$$\left|T^{k+1}x^1 - T^kx^1\right| \le r\left|T^kx^1 - T^{k-1}x^1\right| \le r r^{k-1}\left|Tx^1 - x^1\right| = r^k\left|Tx^1 - x^1\right|.$$
By induction, this shows that for all $k \ge 2$, 14.24 is valid. Now let $k > l \ge N$.
$$\left|T^kx^1 - T^lx^1\right| = \left|\sum_{j=l}^{k-1}\left(T^{j+1}x^1 - T^jx^1\right)\right| \le \sum_{j=l}^{k-1}\left|T^{j+1}x^1 - T^jx^1\right|$$
$$\le \sum_{j=N}^{k-1} r^j\left|Tx^1 - x^1\right| \le \left|Tx^1 - x^1\right|\frac{r^N}{1 - r}$$
which converges to 0 as $N \to \infty$. Therefore, this is a Cauchy sequence so it must converge to $x \in E$. Then
$$x = \lim_{k\to\infty} T^kx^1 = \lim_{k\to\infty} T^{k+1}x^1 = T\lim_{k\to\infty} T^kx^1 = Tx.$$
This shows the existence of the fixed point. To show it is unique, suppose there were another one, $y$. Then
$$|x - y| = |Tx - Ty| \le r|x - y|$$
and so $x = y$.
It remains to verify the estimate.
$$\left|x^1 - x\right| \le \left|x^1 - Tx^1\right| + \left|Tx^1 - x\right| = \left|x^1 - Tx^1\right| + \left|Tx^1 - Tx\right| \le \left|x^1 - Tx^1\right| + r\left|x^1 - x\right|$$
and solving the inequality for $\left|x^1 - x\right|$ gives the estimate desired.
The following corollary is what will be used to prove the convergence condition for the various iterative procedures.
Corollary 14.6.5 Suppose $T : E \to E$, for some constant $C$
$$|Tx - Ty| \le C|x - y|,$$
for all $x, y \in E$, and for some $N \in \mathbb{N}$,
$$\left|T^Nx - T^Ny\right| \le r|x - y|,$$
for all $x, y \in E$ where $r \in (0, 1)$. Then there exists a unique fixed point for $T$ and it is still the limit of the sequence $\left\{T^kx^1\right\}$ for any choice of $x^1$.
Proof: From Lemma 14.6.4 there exists a unique fixed point for T N denoted here as x.
Therefore, T N x = x. Now doing T to both sides,
T N T x = T x.
By uniqueness, T x = x because the above equation shows T x is a fixed point of T N and
there is only one fixed point of T N . In fact, there is only one fixed point of T because a
fixed point of T is automatically a fixed point of T N .
It remains to show $T^kx^1 \to x$, the unique fixed point of $T^N$. If this does not happen, there exists $\varepsilon > 0$ and a subsequence, still denoted by $T^k$, such that
$$\left|T^kx^1 - x\right| \ge \varepsilon.$$
Now $k = j_kN + r_k$ where $r_k \in \{0, \cdots, N - 1\}$ and $j_k$ is a positive integer such that $\lim_{k\to\infty} j_k = \infty$. Then there exists a single $r \in \{0, \cdots, N - 1\}$ such that for infinitely many $k$, $r_k = r$. Taking a further subsequence, still denoted by $T^k$, it follows
$$\left|T^{j_kN + r}x^1 - x\right| \ge \varepsilon. \tag{14.25}$$
However,
$$T^{j_kN + r}x^1 = T^rT^{j_kN}x^1 \to T^rx = x$$
and this contradicts 14.25.
Theorem 14.6.6 Suppose $\rho\left(B^{-1}C\right) < 1$. Then the iterates in 14.19 converge to the unique solution of 14.18.
Proof: Consider the iterates in 14.19. Let $Tx = -B^{-1}Cx + B^{-1}b$. Then
$$\left|T^kx - T^ky\right| = \left|\left(B^{-1}C\right)^kx - \left(B^{-1}C\right)^ky\right| \le \left\|\left(B^{-1}C\right)^k\right\||x - y|.$$
Here $\|\cdot\|$ refers to any of the operator norms. It doesn’t matter which one you pick because they are all equivalent. I am writing the proof to indicate the operator norm taken with respect to the usual norm on $E$. Since $\rho\left(B^{-1}C\right) < 1$, it follows from Gelfand’s theorem, Theorem 14.3.3 on Page 353, there exists $N$ such that if $k \ge N$, then for some $r^{1/k} < 1$,
$$\left\|\left(B^{-1}C\right)^k\right\|^{1/k} < r^{1/k} < 1.$$
Consequently,
$$\left|T^Nx - T^Ny\right| \le r|x - y|.$$
Also $|Tx - Ty| \le \left\|B^{-1}C\right\||x - y|$ and so Corollary 14.6.5 applies and gives the conclusion of this theorem.
14.7
Exercises
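1. Solve the system
$$\begin{pmatrix} 4 & 1 & 1 \\ 1 & 5 & 2 \\ 0 & 2 & 6 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$
using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it using row operations.
2. Solve the system
$$\begin{pmatrix} 4 & 1 & 1 \\ 1 & 7 & 2 \\ 0 & 2 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$
using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it using row operations.
3. Solve the system
$$\begin{pmatrix} 5 & 1 & 1 \\ 1 & 7 & 2 \\ 0 & 2 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$$
using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it using row operations.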
4. If you are considering a system of the form Ax = b and A−1 does not exist, will either
the Gauss Seidel or Jacobi methods work? Explain. What does this indicate about
finding eigenvectors for a given eigenvalue?
5. For ||x||∞ ≡ max {|xj | : j = 1, 2, · · · , n} , the parallelogram identity does not hold.
Explain.
6. A norm $\|\cdot\|$ is said to be strictly convex if whenever $\|x\| = \|y\|$, $x \ne y$, it follows
$$\left\|\frac{x + y}{2}\right\| < \|x\| = \|y\|.$$
Show the norm |·| which comes from an inner product is strictly convex.
7. A norm ||·|| is said to be uniformly convex if whenever ||xn || , ||yn || are equal to 1 for
all n ∈ N and limn→∞ ||xn + yn || = 2, it follows limn→∞ ||xn − yn || = 0. Show the
norm |·| coming from an inner product is always uniformly convex. Also show that
uniform convexity implies strict convexity which is defined in Problem 6.
8. Suppose A : Cn → Cn is a one to one and onto matrix. Define
||x|| ≡ |Ax| .
Show this is a norm.
9. If $X$ is a finite dimensional normed vector space and $A, B \in L(X, X)$ such that $\|B\| < \|A\|$, can it be concluded that $\left\|A^{-1}B\right\| < 1$?
10. Let $X$ be a vector space with a norm $\|\cdot\|$ and let $V = \operatorname{span}(v_1, \cdots, v_m)$ be a finite dimensional subspace of $X$ such that $\{v_1, \cdots, v_m\}$ is a basis for $V$. Show $V$ is a closed subspace of $X$. This means that if $w_n \to w$ and each $w_n \in V$, then so is $w$. Next show that if $w \notin V$,
$$\operatorname{dist}(w, V) \equiv \inf\{\|w - v\| : v \in V\} > 0$$
is a continuous function of $w$ and
$$|\operatorname{dist}(w, V) - \operatorname{dist}(w_1, V)| \le \|w_1 - w\|.$$
Next show that if $w \notin V$, there exists $z$ such that $\|z\| = 1$ and $\operatorname{dist}(z, V) > 1/2$. For those who know some advanced calculus, show that if $X$ is an infinite dimensional vector space having norm $\|\cdot\|$, then the closed unit ball in $X$ cannot be compact. Thus closed and bounded is never compact in an infinite dimensional normed vector space.
11. Suppose $\rho(A) < 1$ for $A \in L(V, V)$ where $V$ is a $p$ dimensional vector space having a norm $\|\cdot\|$. You can use $\mathbb{R}^p$ or $\mathbb{C}^p$ if you like. Show there exists a new norm $|||\cdot|||$ such that with respect to this new norm, $|||A||| < 1$ where $|||A|||$ denotes the operator norm of $A$ taken with respect to this new norm on $V$,
$$|||A||| \equiv \sup\{|||Ax||| : |||x||| \le 1\}.$$
Hint: You know from Gelfand’s theorem that
$$\|A^n\|^{1/n} < r < 1$$
provided $n$ is large enough, this operator norm taken with respect to $\|\cdot\|$. Show there exists $0 < \lambda < 1$ such that
$$\rho\left(\frac{A}{\lambda}\right) < 1.$$
You can do this by arguing the eigenvalues of $A/\lambda$ are the scalars $\mu/\lambda$ where $\mu \in \sigma(A)$. Now let $\mathbb{Z}_+$ denote the nonnegative integers.
$$|||x||| \equiv \sup_{n \in \mathbb{Z}_+}\left\|\frac{A^n}{\lambda^n}x\right\|$$
First show this is actually a norm. Next explain why
$$|||Ax||| \equiv \lambda\sup_{n \in \mathbb{Z}_+}\left\|\frac{A^{n+1}}{\lambda^{n+1}}x\right\| \le \lambda|||x|||.$$
12. Establish a similar result to Problem 11 without using Gelfand’s theorem. Use an
argument which depends directly on the Jordan form or a modification of it.
13. Using Problem 11 give an easier proof of Theorem 14.6.6 without having to use Corollary 14.6.5. It would suffice to use a different norm of this problem and the contraction
mapping principle of Lemma 14.6.4.
14. A matrix $A$ is diagonally dominant if $|a_{ii}| > \sum_{j \ne i} |a_{ij}|$. Show that the Gauss Seidel method converges if $A$ is diagonally dominant.
15. Suppose $f(\lambda) = \sum_{n=0}^{\infty} a_n\lambda^n$ converges if $|\lambda| < R$. Show that if $\rho(A) < R$ where $A$ is an $n \times n$ matrix, then
$$f(A) \equiv \sum_{n=0}^{\infty} a_nA^n$$
converges in $L(\mathbb{F}^n, \mathbb{F}^n)$. Hint: Use Gelfand’s theorem and the root test.
16. Referring to Corollary 14.4.3, for $\lambda = a + ib$ show
$$\exp(\lambda t) = e^{at}(\cos(bt) + i\sin(bt)).$$
Hint: Let $y(t) = \exp(\lambda t)$ and let $z(t) = e^{-at}y(t)$. Show
$$z'' + b^2z = 0,\quad z(0) = 1,\quad z'(0) = ib.$$
Now letting $z = u + iv$ where $u, v$ are real valued, show
$$u'' + b^2u = 0,\quad u(0) = 1,\quad u'(0) = 0$$
$$v'' + b^2v = 0,\quad v(0) = 0,\quad v'(0) = b.$$
Next show $u(t) = \cos(bt)$ and $v(t) = \sin(bt)$ work in the above and that there is at most one solution to
$$w'' + b^2w = 0,\quad w(0) = \alpha,\quad w'(0) = \beta.$$
Thus $z(t) = \cos(bt) + i\sin(bt)$ and so $y(t) = e^{at}(\cos(bt) + i\sin(bt))$. To show there is at most one solution to the above problem, suppose you have two, $w_1, w_2$. Subtract them. Let $f = w_1 - w_2$. Thus
$$f'' + b^2f = 0$$
and $f$ is real valued. Multiply both sides by $f'$ and conclude
$$\frac{d}{dt}\left(\frac{(f')^2}{2} + b^2\frac{f^2}{2}\right) = 0$$
Thus the expression in parenthesis is constant. Explain why this constant must equal 0.
17. Let $A \in L(\mathbb{R}^n, \mathbb{R}^n)$. Show the following power series converges in $L(\mathbb{R}^n, \mathbb{R}^n)$.
$$\Psi(t) \equiv \sum_{k=0}^{\infty}\frac{t^kA^k}{k!}$$
You might want to use Lemma 14.4.2. This is how you can define $\exp(tA)$. Next show that $\Psi'(t) = A\Psi(t)$, $\Psi(0) = I$. To do this, use the techniques of Corollary 14.4.3. Next let $\Phi(t) = \sum_{k=0}^{\infty}\frac{t^k(-A)^k}{k!}$. Show $\Phi(t)$ and $\Psi(t)$ each commute with $A$. Next show that $\Phi(t)\Psi(t) = I$ for all $t$. Finally, solve the initial value problem
$$x' = Ax + f,\quad x(0) = x_0$$
in terms of $\Phi$ and $\Psi$. This yields most of the substance of a typical differential equations course.
18. In Problem 17 $\Psi(t)$ is defined by the given series. Denote by $\exp(t\sigma(A))$ the numbers $\exp(t\lambda)$ where $\lambda \in \sigma(A)$. Show $\exp(t\sigma(A)) = \sigma(\Psi(t))$. This is like Lemma 14.4.7. Letting $J$ be the Jordan canonical form for $A$, explain why
$$\Psi(t) \equiv \sum_{k=0}^{\infty}\frac{t^kA^k}{k!} = S\sum_{k=0}^{\infty}\frac{t^kJ^k}{k!}S^{-1}$$
and you note that in $J^k$, the diagonal entries are of the form $\lambda^k$ for $\lambda$ an eigenvalue of $A$. Also $J = D + N$ where $N$ is nilpotent and commutes with $D$. Argue then that
$$\sum_{k=0}^{\infty}\frac{t^kJ^k}{k!}$$
is an upper triangular matrix which has on the diagonal the expressions $e^{\lambda t}$ where $\lambda \in \sigma(A)$. Thus conclude
$$\sigma(\Psi(t)) \subseteq \exp(t\sigma(A)).$$
Next take $e^{t\lambda} \in \exp(t\sigma(A))$ and argue it must be in $\sigma(\Psi(t))$. You can do this as follows:
$$\Psi(t) - e^{t\lambda}I = \sum_{k=0}^{\infty}\frac{t^kA^k}{k!} - \sum_{k=0}^{\infty}\frac{t^k\lambda^k}{k!}I = \sum_{k=0}^{\infty}\frac{t^k}{k!}\left(A^k - \lambda^kI\right)$$
$$= \left(\sum_{k=0}^{\infty}\frac{t^k}{k!}\sum_{j=0}^{k-1}A^{k-1-j}\lambda^j\right)(A - \lambda I)$$
Now you need to argue
$$\sum_{k=0}^{\infty}\frac{t^k}{k!}\sum_{j=0}^{k-1}A^{k-1-j}\lambda^j$$
converges to something in $L(\mathbb{R}^n, \mathbb{R}^n)$. To do this, use the ratio test and Lemma 14.4.2 after first using the triangle inequality. Since $\lambda \in \sigma(A)$, $\Psi(t) - e^{t\lambda}I$ is not one to one and so this establishes the other inclusion. You fill in the details. This theorem is a special case of theorems which go by the name “spectral mapping theorem”.
19. Suppose $\Psi(t) \in L(V, W)$ where $V, W$ are finite dimensional inner product spaces and $t \to \Psi(t)$ is continuous for $t \in [a, b]$: For every $\varepsilon > 0$ there exists $\delta > 0$ such that if $|s - t| < \delta$ then $\|\Psi(t) - \Psi(s)\| < \varepsilon$. Show $t \to (\Psi(t)v, w)$ is continuous. Here it is the inner product in $W$. Also define what it means for $t \to \Psi(t)v$ to be continuous and show this is continuous. Do it all for differentiable in place of continuous. Next show $t \to \|\Psi(t)\|$ is continuous.
20. If $z(t) \in W$, a finite dimensional inner product space, what does it mean for $t \to z(t)$ to be continuous or differentiable? If $z$ is continuous, define
$$\int_a^b z(t)\,dt \in W$$
as follows.
$$\left(w, \int_a^b z(t)\,dt\right) \equiv \int_a^b (w, z(t))\,dt.$$
Show that this definition is well defined and furthermore the triangle inequality,
$$\left|\int_a^b z(t)\,dt\right| \le \int_a^b |z(t)|\,dt,$$
and fundamental theorem of calculus,
$$\frac{d}{dt}\left(\int_a^t z(s)\,ds\right) = z(t)$$
hold along with any other interesting properties of integrals which are true.
21. For $V, W$ two inner product spaces, define
$$\int_a^b \Psi(t)\,dt \in L(V, W)$$
as follows.
$$\left(w, \int_a^b \Psi(t)\,dt\,(v)\right) \equiv \int_a^b (w, \Psi(t)v)\,dt.$$
Show this is well defined and does indeed give $\int_a^b \Psi(t)\,dt \in L(V, W)$. Also show the triangle inequality
$$\left\|\int_a^b \Psi(t)\,dt\right\| \le \int_a^b \|\Psi(t)\|\,dt$$
where $\|\cdot\|$ is the operator norm and verify the fundamental theorem of calculus holds.
$$\left(\int_a^t \Psi(s)\,ds\right)' = \Psi(t).$$
Also verify the usual properties of integrals continue to hold such as the fact the integral is linear and
$$\int_a^b \Psi(t)\,dt + \int_b^c \Psi(t)\,dt = \int_a^c \Psi(t)\,dt$$
and similar things. Hint: On showing the triangle inequality, it will help if you use the fact that
$$|w|_W = \sup_{|v| \le 1} |(w, v)|.$$
You should show this also.
22. Prove Gronwall’s inequality. Suppose $u(t) \ge 0$ and for all $t \in [0, T]$,
$$u(t) \le u_0 + \int_0^t Ku(s)\,ds,$$
where $K$ is some nonnegative constant. Then
$$u(t) \le u_0e^{Kt}.$$
Hint: Let $w(t) = \int_0^t u(s)\,ds$. Then using the fundamental theorem of calculus, $w(t)$ satisfies the following.
$$u(t) - Kw(t) = w'(t) - Kw(t) \le u_0,\quad w(0) = 0.$$
Now use the usual techniques you saw in an introductory differential equations class. Multiply both sides of the above inequality by $e^{-Kt}$ and note the resulting left side is now a total derivative. Integrate both sides from 0 to $t$ and see what you have got. If you have problems, look ahead in the book. This inequality is proved later in Theorem D.4.3.
23. With Gronwall’s inequality and the integral defined in Problem 21 with its properties listed there, prove there is at most one solution to the initial value problem
$$y' = Ay,\quad y(0) = y_0.$$
Hint: If there are two solutions, subtract them and call the result $z$. Then
$$z' = Az,\quad z(0) = 0.$$
It follows
$$z(t) = 0 + \int_0^t Az(s)\,ds$$
and so
$$\|z(t)\| \le \int_0^t \|A\|\,\|z(s)\|\,ds$$
Now consider Gronwall’s inequality of Problem 22.
24. Suppose $A$ is a matrix which has the property that whenever $\mu \in \sigma(A)$, $\operatorname{Re}\mu < 0$. Consider the initial value problem
$$y' = Ay,\quad y(0) = y_0.$$
The existence and uniqueness of a solution to this equation has been established above in preceding problems, Problem 17 to 23. Show that in this case where the real parts of the eigenvalues are all negative, the solution to the initial value problem satisfies
$$\lim_{t\to\infty} y(t) = 0.$$
Hint: A nice way to approach this problem is to show you can reduce it to the consideration of the initial value problem
$$z' = J_\varepsilon z,\quad z(0) = z_0$$
where $J_\varepsilon$ is the modified Jordan canonical form where instead of ones down the super diagonal, there are $\varepsilon$ down the super diagonal (Problem 19). Then
$$z' = Dz + N_\varepsilon z$$
where $D$ is the diagonal matrix obtained from the eigenvalues of $A$ and $N_\varepsilon$ is a nilpotent matrix commuting with $D$ which is very small provided $\varepsilon$ is chosen very small. Now let $\Psi(t)$ be the solution of
$$\Psi' = -D\Psi,\quad \Psi(0) = I$$
described earlier as
$$\sum_{k=0}^{\infty}\frac{(-1)^kt^kD^k}{k!}.$$
Thus $\Psi(t)$ commutes with $D$ and $N_\varepsilon$. Tell why. Next argue
$$(\Psi(t)z)' = \Psi(t)N_\varepsilon z(t)$$
and integrate from 0 to $t$. Then
$$\Psi(t)z(t) - z_0 = \int_0^t \Psi(s)N_\varepsilon z(s)\,ds.$$
It follows
$$\|\Psi(t)z(t)\| \le \|z_0\| + \int_0^t \|N_\varepsilon\|\,\|\Psi(s)z(s)\|\,ds.$$
It follows from Gronwall’s inequality
$$\|\Psi(t)z(t)\| \le \|z_0\|e^{\|N_\varepsilon\|t}$$
Now look closely at the form of $\Psi(t)$ to get an estimate which is interesting. Explain why
$$\Psi(t) = \begin{pmatrix} e^{-\mu_1t} & & 0 \\ & \ddots & \\ 0 & & e^{-\mu_nt} \end{pmatrix}$$
and now observe that if $\varepsilon$ is chosen small enough, $\|N_\varepsilon\|$ is so small that each component of $z(t)$ converges to 0.
25. Using Problem 24 show that if $A$ is a matrix having the real parts of all eigenvalues less than 0 then if
$$\Psi'(t) = A\Psi(t),\quad \Psi(0) = I$$
it follows
$$\lim_{t\to\infty} \Psi(t) = 0.$$
Hint: Consider the columns of $\Psi(t)$.
26. Let $\Psi(t)$ be a fundamental matrix satisfying
$$\Psi'(t) = A\Psi(t),\quad \Psi(0) = I.$$
Show $\Psi(t)^n = \Psi(nt)$. Hint: Subtract and show the difference satisfies $\Phi' = A\Phi$, $\Phi(0) = 0$. Use uniqueness.
27. If the real parts of the eigenvalues of $A$ are all negative, show that for every positive $t$,
$$\lim_{n\to\infty} \Psi(nt) = 0.$$
Hint: Pick $\operatorname{Re}(\sigma(A)) < -\lambda < 0$ and use Problem 18 about the spectrum of $\Psi(t)$ and Gelfand’s theorem for the spectral radius along with Problem 26 to argue that $\left\|\Psi(nt)/e^{-\lambda nt}\right\| < 1$ for all $n$ large enough.
28. Let $H$ be a Hermitian matrix $(H = H^*)$. Show that $e^{iH} \equiv \sum_{n=0}^{\infty}\frac{(iH)^n}{n!}$ is unitary.
29. Show the converse of the above exercise. If $V$ is unitary, then $V = e^{iH}$ for some $H$ Hermitian.
30. If $U$ is unitary and does not have $-1$ as an eigenvalue so that $(I + U)^{-1}$ exists, show that
$$H = i(I - U)(I + U)^{-1}$$
is Hermitian. Then, verify that
$$U = (I + iH)(I - iH)^{-1}.$$
31. Suppose that $A \in L(V, V)$ where $V$ is a normed linear space. Also suppose that $\|A\| < 1$ where this refers to the operator norm on $A$. Verify that
$$(I - A)^{-1} = \sum_{i=0}^{\infty} A^i$$
This is called the Neumann series. Suppose now that you only know the algebraic condition $\rho(A) < 1$. Is it still the case that the Neumann series converges to $(I - A)^{-1}$?
Chapter 15
Numerical Methods, Eigenvalues
15.1
The Power Method For Eigenvalues
This chapter discusses numerical methods for finding eigenvalues. However, to do this
correctly, you must include numerical analysis considerations which are distinct from linear
algebra. The purpose of this chapter is to give an introduction to some numerical methods
without leaving the context of linear algebra. In addition, some examples are given which
make use of computer algebra systems. For a more thorough discussion, you should see
books on numerical methods in linear algebra like some listed in the references.
Let $A$ be a complex $p \times p$ matrix and suppose that it has distinct eigenvalues
$$\{\lambda_1, \cdots, \lambda_m\}$$
and that $|\lambda_1| > |\lambda_k|$ for all $k > 1$. Also let the Jordan form of $A$ be
$$J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_m \end{pmatrix}$$
with $J_1$ an $m_1 \times m_1$ matrix.
$$J_k = \lambda_kI_k + N_k$$
where $N_k^{r_k} \ne 0$ but $N_k^{r_k + 1} = 0$. Also let
$$P^{-1}AP = J,\quad A = PJP^{-1}.$$
Now fix $x \in \mathbb{F}^p$. Take $Ax$ and let $s_1$ be the entry of the vector $Ax$ which has largest absolute value. Thus $Ax/s_1$ is a vector $y_1$ which has a component of 1 and every other entry of this vector has magnitude no larger than 1. If the scalars $\{s_1, \cdots, s_{n-1}\}$ and vectors $\{y_1, \cdots, y_{n-1}\}$ have been obtained, let $y_n \equiv Ay_{n-1}/s_n$ where $s_n$ is the entry of $Ay_{n-1}$ which has largest absolute value. Thus
$$y_n = \frac{AAy_{n-2}}{s_ns_{n-1}} = \cdots = \frac{A^nx}{s_ns_{n-1}\cdots s_1} \tag{15.1}$$
$$= \frac{1}{s_ns_{n-1}\cdots s_1}P\begin{pmatrix} J_1^n & & \\ & \ddots & \\ & & J_m^n \end{pmatrix}P^{-1}x = \frac{\lambda_1^n}{s_ns_{n-1}\cdots s_1}P\begin{pmatrix} \lambda_1^{-n}J_1^n & & \\ & \ddots & \\ & & \lambda_1^{-n}J_m^n \end{pmatrix}P^{-1}x \tag{15.2}$$
Consider one of the blocks in the Jordan form. First consider the $k$th of these blocks, $k > 1$. It equals
$$\lambda_1^{-n}J_k^n = \sum_{i=0}^{r_k} \binom{n}{i} \lambda_1^{-n}\lambda_k^{n-i}N_k^i$$
which clearly converges to 0 as $n \to \infty$ since $|\lambda_1| > |\lambda_k|$. An application of the ratio test or root test for each term in the sum will show this. When $k = 1$, this block is
$$\lambda_1^{-n}J_1^n = \sum_{i=0}^{r_1} \binom{n}{i} \lambda_1^{-n}\lambda_1^{n-i}N_1^i = \binom{n}{r_1}\left[\lambda_1^{-r_1}N_1^{r_1} + e_n\right]$$
where $\lim_{n\to\infty} e_n = 0$ because it is a sum of bounded matrices which are multiplied by $\binom{n}{i}/\binom{n}{r_1}$. This quotient converges to 0 as $n \to \infty$ because $i < r_1$. It follows that 15.2 is of the form
$$y_n = \frac{\lambda_1^n}{s_n s_{n-1} \cdots s_1}\binom{n}{r_1} P \begin{pmatrix} \lambda_1^{-r_1}N_1^{r_1} + e_n & 0 \\ 0 & E_n \end{pmatrix} P^{-1}x \equiv \frac{\lambda_1^n}{s_n s_{n-1} \cdots s_1}\binom{n}{r_1} P w_n$$
where $E_n \to 0$, $e_n \to 0$. Let $(P^{-1}x)_{m_1}$ denote the first $m_1$ entries of the vector $P^{-1}x$. Unless a very unlucky choice for $x$ was picked, it will follow that $(P^{-1}x)_{m_1} \notin \ker(N_1^{r_1})$. Then for large $n$, $y_n$ is close to the vector
$$\frac{\lambda_1^n}{s_n s_{n-1} \cdots s_1}\binom{n}{r_1} P \begin{pmatrix} \lambda_1^{-r_1}N_1^{r_1} & 0 \\ 0 & 0 \end{pmatrix} P^{-1}x \equiv \frac{\lambda_1^n}{s_n s_{n-1} \cdots s_1}\binom{n}{r_1} P w \equiv z \neq 0$$
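However, this is an eigenvector. Since $z$ is a nonzero scalar multiple of $Pw$, it suffices to note that
$$(A - \lambda_1 I) Pw = \overbrace{P (J - \lambda_1 I) P^{-1}}^{A - \lambda_1 I} \, P \begin{pmatrix} \lambda_1^{-r_1}N_1^{r_1} & 0 \\ 0 & 0 \end{pmatrix} P^{-1}x$$
$$= P \begin{pmatrix} N_1 & & \\ & \ddots & \\ & & J_m - \lambda_1 I \end{pmatrix} \begin{pmatrix} \lambda_1^{-r_1}N_1^{r_1} & 0 \\ 0 & 0 \end{pmatrix} P^{-1}x = P \begin{pmatrix} N_1\lambda_1^{-r_1}N_1^{r_1} & 0 \\ 0 & 0 \end{pmatrix} P^{-1}x = 0$$
Recall $N_1^{r_1+1} = 0$. Now you could recover an approximation to the eigenvalue as follows.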
$$\frac{(Ay_n, y_n)}{(y_n, y_n)} \approx \frac{(Az, z)}{(z, z)} = \lambda_1$$
Here ≈ means “approximately equal”. However, there is a more convenient way to identify the eigenvalue in terms of the scaling factors $s_k$.
$$\left\| \frac{\lambda_1^n}{s_n \cdots s_1}\binom{n}{r_1} (w_n - w) \right\|_\infty \approx 0$$
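Pick the largest nonzero entry of $w$, $w_l$. Then for large $n$, it is also likely the case that the largest entry of $w_n$ will be in the $l$th position because $w_n$ is close to $w$. From the construction,
$$\frac{\lambda_1^n}{s_n \cdots s_1}\binom{n}{r_1} w_{nl} = 1 \approx \frac{\lambda_1^n}{s_n \cdots s_1}\binom{n}{r_1} w_l$$
In other words, for large $n$
$$\frac{\lambda_1^n}{s_n \cdots s_1}\binom{n}{r_1} \approx 1/w_l$$
Therefore, for large $n$,
$$\frac{\lambda_1^n}{s_n \cdots s_1}\binom{n}{r_1} \approx \frac{\lambda_1^{n+1}}{s_{n+1} s_n \cdots s_1}\binom{n+1}{r_1}$$
and so
$$\binom{n}{r_1} \Big/ \binom{n+1}{r_1} \approx \frac{\lambda_1}{s_{n+1}}$$
But $\lim_{n\to\infty} \binom{n}{r_1} / \binom{n+1}{r_1} = 1$ and so, for large $n$ it must be the case that $\lambda_1 \approx s_{n+1}$.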
This has proved the following theorem which justifies the power method.
Theorem 15.1.1 Let $A$ be a complex $p \times p$ matrix such that the eigenvalues are
$$\{\lambda_1, \lambda_2, \cdots, \lambda_r\}$$
with $|\lambda_1| > |\lambda_j|$ for all $j \neq 1$. Then for $x$ a given vector, let
$$y_1 = \frac{Ax}{s_1}$$
where $s_1$ is an entry of $Ax$ which has the largest absolute value. If the scalars $\{s_1, \cdots, s_{n-1}\}$ and vectors $\{y_1, \cdots, y_{n-1}\}$ have been obtained, let
$$y_n \equiv \frac{Ay_{n-1}}{s_n}$$
where $s_n$ is the entry of $Ay_{n-1}$ which has largest absolute value. Then it is probably the case that $\{s_n\}$ will converge to $\lambda_1$ and $\{y_n\}$ will converge to an eigenvector associated with $\lambda_1$. If it doesn’t, you picked an incredibly inauspicious initial vector $x$.
In summary, here is the procedure.
Finding the largest eigenvalue with its eigenvector.
1. Start with a vector $u_1$ which you hope is not unlucky.
2. If $u_k$ is known,
$$u_{k+1} = \frac{Au_k}{s_{k+1}}$$
where $s_{k+1}$ is the entry of $Au_k$ which has largest absolute value.
3. When the scaling factors $s_k$ are not changing much, $s_{k+1}$ will be close to the eigenvalue and $u_{k+1}$ will be close to an eigenvector.
4. Check your answer to see if it worked well. If things don’t work well, try another u1 .
You were miraculously unlucky in your choice.
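The procedure above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `mat_vec` and `power_method`, the tolerance `tol`, and the iteration cap `max_iter` are not from the text, and the stopping rule (successive scaling factors agree to within `tol`) is one possible reading of "not changing much". The matrix used is the one from Example 15.1.2.

```python
def mat_vec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector (a list)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def power_method(A, u, tol=1e-10, max_iter=1000):
    """Scaled power iteration u_{k+1} = A u_k / s_{k+1}.

    s_{k+1} is the entry of A u_k with largest absolute value.
    Stops when two successive scaling factors differ by less than tol
    (an assumed stopping rule) and returns the last (s, u).
    """
    s_old = None
    for _ in range(max_iter):
        v = mat_vec(A, u)
        s = max(v, key=abs)            # entry of largest absolute value
        if s == 0:
            raise ValueError("A u is zero; try another starting vector")
        u = [vi / s for vi in v]
        if s_old is not None and abs(s - s_old) < tol:
            break
        s_old = s
    return s, u


A = [[5, -14, 11], [-4, 4, -4], [3, 6, -3]]
s, u = power_method(A, [1.0, 1.0, 1.0])
print(s, u)   # scaling factor near 12, vector near (1, -0.5, 0)
```

As step 4 recommends, the result should still be checked by comparing $Au$ with $su$.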

Example 15.1.2 Find the largest eigenvalue of
$$A = \begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix}.$$
You can begin with $u_1 = (1, \cdots, 1)^T$ and apply the above procedure. However, you can accelerate the process if you begin with $A^n u_1$ and then divide by the largest entry to get the first approximate eigenvector. Thus
$$\begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix}^{20} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2.5558 \times 10^{21} \\ -1.2779 \times 10^{21} \\ -3.6562 \times 10^{15} \end{pmatrix}$$
Divide by the largest entry to obtain a good approximation.
$$\frac{1}{2.5558 \times 10^{21}} \begin{pmatrix} 2.5558 \times 10^{21} \\ -1.2779 \times 10^{21} \\ -3.6562 \times 10^{15} \end{pmatrix} = \begin{pmatrix} 1.0 \\ -0.5 \\ -1.4306 \times 10^{-6} \end{pmatrix}$$
Now begin with this one.
$$\begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.5 \\ -1.4306 \times 10^{-6} \end{pmatrix} = \begin{pmatrix} 12.000 \\ -6.0000 \\ 4.2918 \times 10^{-6} \end{pmatrix}$$
Divide by 12 to get the next iterate.
$$\frac{1}{12} \begin{pmatrix} 12.000 \\ -6.0000 \\ 4.2918 \times 10^{-6} \end{pmatrix} = \begin{pmatrix} 1.0 \\ -0.5 \\ 3.5765 \times 10^{-7} \end{pmatrix}$$
Another iteration will reveal that the scaling factor is still 12. Thus this is an approximate eigenvalue. In fact, it is the largest eigenvalue and the corresponding eigenvector is $\begin{pmatrix} 1.0 & -0.5 & 0 \end{pmatrix}^T$. The process has worked very well.
15.1.1 The Shifted Inverse Power Method
This method can find various eigenvalues and eigenvectors. It is a significant generalization of the above simple procedure and yields very good results. One can find complex eigenvalues using this method. The situation is this: You have a number $\alpha$ which is close to $\lambda$, some eigenvalue of an $n \times n$ matrix $A$. You don’t know $\lambda$ but you know that $\alpha$ is closer to $\lambda$ than to any other eigenvalue. Your problem is to find both $\lambda$ and an eigenvector which goes with $\lambda$. Another way to look at this is to start with $\alpha$ and seek the eigenvalue $\lambda$ which is closest to $\alpha$, along with an eigenvector associated with $\lambda$. If $\alpha$ is an eigenvalue of $A$, then you have what you want. Therefore, I will always assume $\alpha$ is not an eigenvalue of $A$ and so $(A - \alpha I)^{-1}$ exists. The method is based on the following lemma.
Lemma 15.1.3 Let $\{\lambda_k\}_{k=1}^n$ be the eigenvalues of $A$. If $x_k$ is an eigenvector of $A$ for the eigenvalue $\lambda_k$, then $x_k$ is an eigenvector for $(A - \alpha I)^{-1}$ corresponding to the eigenvalue $\frac{1}{\lambda_k - \alpha}$. Conversely, if
$$(A - \alpha I)^{-1} y = \frac{1}{\lambda - \alpha} y \tag{15.3}$$
and $y \neq 0$, then $Ay = \lambda y$.
Proof: Let $\lambda_k$ and $x_k$ be as described in the statement of the lemma. Then
$$(A - \alpha I) x_k = (\lambda_k - \alpha) x_k$$
and so
$$\frac{1}{\lambda_k - \alpha} x_k = (A - \alpha I)^{-1} x_k.$$
Suppose 15.3. Then $y = \frac{1}{\lambda - \alpha}[Ay - \alpha y]$. Solving for $Ay$ leads to $Ay = \lambda y$. ∎
Now assume $\alpha$ is closer to $\lambda$ than to any other eigenvalue. Then the magnitude of $\frac{1}{\lambda - \alpha}$ is greater than the magnitude of all the other eigenvalues of $(A - \alpha I)^{-1}$. Therefore, the power method applied to $(A - \alpha I)^{-1}$ will yield $\frac{1}{\lambda - \alpha}$. You end up with $s_{n+1} \approx \frac{1}{\lambda - \alpha}$ and solve for $\lambda$.
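As a small worked illustration of the lemma (the specific numbers are chosen only for illustration), suppose a matrix $A$ has eigenvalues $-6$, $0$, and $12$ and you take $\alpha = -7$. Then the eigenvalues of $(A - \alpha I)^{-1} = (A + 7I)^{-1}$ are

$$\frac{1}{-6 + 7} = 1, \qquad \frac{1}{0 + 7} = \frac{1}{7}, \qquad \frac{1}{12 + 7} = \frac{1}{19}.$$

The largest of these in magnitude is $1$, which comes from the eigenvalue of $A$ closest to $\alpha$. If the scaling factors settle at $s \approx 1$, solving $s = \frac{1}{\lambda - \alpha}$ gives $\lambda = \alpha + \frac{1}{s} \approx -7 + 1 = -6$.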
15.1.2 The Explicit Description Of The Method
Here is how you use this method to find the eigenvalue closest to $\alpha$ and the corresponding eigenvector.
1. Find $(A - \alpha I)^{-1}$.
2. Pick $u_1$. If you are not phenomenally unlucky, the iterations will converge.
3. If $u_k$ has been obtained,
$$u_{k+1} = \frac{(A - \alpha I)^{-1} u_k}{s_{k+1}}$$
where $s_{k+1}$ is the entry of $(A - \alpha I)^{-1} u_k$ which has largest absolute value.
4. When the scaling factors $s_k$ are not changing much and the $u_k$ are not changing much, find the approximation to the eigenvalue by solving
$$s_{k+1} = \frac{1}{\lambda - \alpha}$$
for $\lambda$. The eigenvector is approximated by $u_{k+1}$.
5. Check your work by multiplying by the original matrix to see how well what you have found works.
Thus this amounts to the power method for the matrix $(A - \alpha I)^{-1}$ but you are free to pick $\alpha$.
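The steps above might be coded as follows. This is a sketch under assumptions, not a definitive implementation: instead of forming $(A - \alpha I)^{-1}$ explicitly, it solves $(A - \alpha I)v = u_k$ at each step, which gives the same vector. The helper names `solve` and `shifted_inverse_power`, the relative tolerance `tol`, and the shift $\alpha = 11$ are all illustrative choices that do not appear in the text. The matrix is the one from Example 15.1.2.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[p][c] == 0:
            raise ValueError("singular matrix: alpha is an eigenvalue")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0] * n
    for r in reversed(range(n)):
        acc = aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def shifted_inverse_power(A, alpha, u, tol=1e-10, max_iter=500):
    """Power method applied to (A - alpha I)^{-1}; returns (lambda, u)."""
    n = len(A)
    M = [[A[i][j] - (alpha if i == j else 0) for j in range(n)]
         for i in range(n)]
    s_old = None
    for _ in range(max_iter):
        v = solve(M, u)                 # v = (A - alpha I)^{-1} u
        s = max(v, key=abs)             # entry of largest absolute value
        u = [vi / s for vi in v]
        if s_old is not None and abs(s - s_old) < tol * abs(s):
            break
        s_old = s
    return alpha + 1 / s, u             # solve s = 1/(lambda - alpha)


A = [[5, -14, 11], [-4, 4, -4], [3, 6, -3]]
lam, u = shifted_inverse_power(A, 11.0, [1.0, 1.0, 1.0])
print(lam, u)   # eigenvalue near 12, eigenvector near (1, -0.5, 0)
```

Solving a linear system at each step rather than inverting the matrix is the same thing the worked examples below do by hand.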


Example 15.1.4 Find the eigenvalue of
$$A = \begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix}$$
which is closest to $-7$. Also find an eigenvector which goes with this eigenvalue.
In this case the eigenvalues are $-6$, $0$, and $12$ so the correct answer is $-6$ for the eigenvalue. Then from the above procedure, I will start with an initial vector, $u_1 = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}^T$. Then I must solve the following equation.
$$\left( \begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix} + 7 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
Simplifying the matrix on the left, I must solve
$$\begin{pmatrix} 12 & -14 & 11 \\ -4 & 11 & -4 \\ 3 & 6 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
and then divide by the entry which has largest absolute value to obtain
$$u_2 = \begin{pmatrix} 1.0 \\ .184 \\ -.76 \end{pmatrix}$$
Now solve
$$\begin{pmatrix} 12 & -14 & 11 \\ -4 & 11 & -4 \\ 3 & 6 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1.0 \\ .184 \\ -.76 \end{pmatrix}$$
and divide by the largest entry, $1.0515$, to get
$$u_3 = \begin{pmatrix} 1.0 \\ .0266 \\ -.97061 \end{pmatrix}$$
Solve
$$\begin{pmatrix} 12 & -14 & 11 \\ -4 & 11 & -4 \\ 3 & 6 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1.0 \\ .0266 \\ -.97061 \end{pmatrix}$$
and divide by the largest entry, $1.01$, to get
$$u_4 = \begin{pmatrix} 1.0 \\ 3.8454 \times 10^{-3} \\ -.99604 \end{pmatrix}.$$
These scaling factors are pretty close after these few iterations. Therefore, the predicted eigenvalue is obtained by solving the following for $\lambda$.
$$\frac{1}{\lambda + 7} = 1.01$$
which gives $\lambda = -6.01$. You see this is pretty close. In this case the eigenvalue closest to $-7$ was $-6$.
How would you know what to start with for an initial guess? You might apply Gerschgorin’s theorem. However, sometimes you can begin with a better estimate.


Example 15.1.5 Consider the symmetric matrix
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix}.$$
Find the middle eigenvalue and an eigenvector which goes with it.
Since $A$ is symmetric, it follows it has three real eigenvalues which are solutions to
$$p(\lambda) = \det\left( \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} \right) = \lambda^3 - 4\lambda^2 - 24\lambda - 17 = 0$$
If you use your graphing calculator to graph this polynomial, you find there is an eigenvalue somewhere between $-.9$ and $-.8$ and that this is the middle eigenvalue. Of course you could zoom in and find it very accurately without much trouble but what about the eigenvector which goes with it? If you try to solve
$$\left( (-.8) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
there will be only the zero solution because the matrix on the left will be invertible and the same will be true if you replace $-.8$ with a better approximation like $-.86$ or $-.855$. This is because all these are only approximations to the eigenvalue and so the matrix in the above is nonsingular for all of these. Therefore, you will only get the zero solution and
Eigenvectors are never equal to zero!
However, there exists such an eigenvector and you can find it using the shifted inverse power method. Pick $\alpha = -.855$. Then you solve
$$\left( \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} + .855 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
or in other words,
$$\begin{pmatrix} 1.855 & 2.0 & 3.0 \\ 2.0 & 1.855 & 4.0 \\ 3.0 & 4.0 & 2.855 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
and after finding the solution, divide by the largest entry, $-67.944$, to obtain
$$u_2 = \begin{pmatrix} 1.0 \\ -.58921 \\ -.23044 \end{pmatrix}$$
After a couple more iterations, you obtain
$$u_3 = \begin{pmatrix} 1.0 \\ -.58777 \\ -.22714 \end{pmatrix} \tag{15.4}$$
Then doing it again, the scaling factor is $-513.42$ and the next iterate is
$$u_4 = \begin{pmatrix} 1.0 \\ -.58778 \\ -.22714 \end{pmatrix}$$
Clearly the $u_k$ are not changing much. This suggests an approximate eigenvector for this eigenvalue which is close to $-.855$ is the above $u_3$ and an eigenvalue is obtained by solving
$$\frac{1}{\lambda + .855} = -513.42,$$
which yields $\lambda = -0.85695$. Let’s check this.
$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} \begin{pmatrix} 1.0 \\ -.58778 \\ -.22714 \end{pmatrix} = \begin{pmatrix} -0.85698 \\ 0.50366 \\ 0.1946 \end{pmatrix}.$$
$$-0.85695 \begin{pmatrix} 1.0 \\ -.58777 \\ -.22714 \end{pmatrix} = \begin{pmatrix} -0.85695 \\ 0.50369 \\ 0.19465 \end{pmatrix}$$
Thus the vector of 15.4 is very close to the desired eigenvector, just as $-.8569$ is very close to the desired eigenvalue. For practical purposes, I have found both the eigenvector and the eigenvalue.


Example 15.1.6 Find the eigenvalues and eigenvectors of the matrix
$$A = \begin{pmatrix} 2 & 1 & 3 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix}.$$
This is only a $3 \times 3$ matrix and so it is not hard to estimate the eigenvalues. Just get the characteristic equation, graph it using a calculator and zoom in to find the eigenvalues. If you do this, you find there is an eigenvalue near $-1.2$, one near $-.4$, and one near $5.5$. (The characteristic equation is $2 + 8\lambda + 4\lambda^2 - \lambda^3 = 0$.) Of course I have no idea what the eigenvectors are.
Let’s first try to find the eigenvector and a better approximation for the eigenvalue near $-1.2$. In this case, let $\alpha = -1.2$. Then
$$(A - \alpha I)^{-1} = \begin{pmatrix} -25.357143 & -33.928571 & 50.0 \\ 12.5 & 17.5 & -25.0 \\ 23.214286 & 30.357143 & -45.0 \end{pmatrix}.$$
As before, it helps to get things started if you raise to a power and then go from the approximate eigenvector obtained.
$$\begin{pmatrix} -25.357143 & -33.928571 & 50.0 \\ 12.5 & 17.5 & -25.0 \\ 23.214286 & 30.357143 & -45.0 \end{pmatrix}^{7} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -2.2956 \times 10^{11} \\ 1.1291 \times 10^{11} \\ 2.0865 \times 10^{11} \end{pmatrix}$$
Then the next iterate will be
$$\frac{1}{-2.2956 \times 10^{11}} \begin{pmatrix} -2.2956 \times 10^{11} \\ 1.1291 \times 10^{11} \\ 2.0865 \times 10^{11} \end{pmatrix} = \begin{pmatrix} 1.0 \\ -0.49185 \\ -0.90891 \end{pmatrix}$$
Next iterate:
$$\begin{pmatrix} -25.357143 & -33.928571 & 50.0 \\ 12.5 & 17.5 & -25.0 \\ 23.214286 & 30.357143 & -45.0 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.49185 \\ -0.90891 \end{pmatrix} = \begin{pmatrix} -54.115 \\ 26.615 \\ 49.184 \end{pmatrix}$$
Divide by largest entry
$$\frac{1}{-54.115} \begin{pmatrix} -54.115 \\ 26.615 \\ 49.184 \end{pmatrix} = \begin{pmatrix} 1.0 \\ -0.49182 \\ -0.90888 \end{pmatrix}$$
You can see the vector didn’t change much and so the next scaling factor will not be much different than this one. Hence you need to solve for $\lambda$
$$\frac{1}{\lambda + 1.2} = -54.115$$
Then $\lambda = -1.2185$ is an approximate eigenvalue and
$$\begin{pmatrix} 1.0 \\ -0.49182 \\ -0.90888 \end{pmatrix}$$
is an approximate eigenvector. How well does it work?
$$\begin{pmatrix} 2 & 1 & 3 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.49182 \\ -0.90888 \end{pmatrix} = \begin{pmatrix} -1.2185 \\ 0.5993 \\ 1.1075 \end{pmatrix}$$
$$(-1.2185) \begin{pmatrix} 1.0 \\ -0.49182 \\ -0.90888 \end{pmatrix} = \begin{pmatrix} -1.2185 \\ 0.59928 \\ 1.1075 \end{pmatrix}$$
You can see that for practical purposes, this has found the eigenvalue closest to $-1.2$, namely $\lambda \approx -1.2185$, and the corresponding eigenvector.
The other eigenvectors and eigenvalues can be found similarly. In the case of $-.4$, you could let $\alpha = -.4$ and then
$$(A - \alpha I)^{-1} = \begin{pmatrix} 8.0645161 \times 10^{-2} & -9.2741935 & 6.4516129 \\ -.40322581 & 11.370968 & -7.2580645 \\ .40322581 & 3.6290323 & -2.7419355 \end{pmatrix}.$$
Following the procedure of the power method, you find that after about 5 iterations, the scaling factor is $9.7573139$, the scaling factors are not changing much, and
$$u_5 = \begin{pmatrix} -.7812248 \\ 1.0 \\ .26493688 \end{pmatrix}.$$
Thus the approximate eigenvalue is obtained by solving
$$\frac{1}{\lambda + .4} = 9.7573139$$
which shows $\lambda = -.29751278$ is an approximation to the eigenvalue near $-.4$. How well does it work?
$$\begin{pmatrix} 2 & 1 & 3 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} -.7812248 \\ 1.0 \\ .26493688 \end{pmatrix} = \begin{pmatrix} .23236104 \\ -.29751272 \\ -.07873752 \end{pmatrix}.$$
$$-.29751278 \begin{pmatrix} -.7812248 \\ 1.0 \\ .26493688 \end{pmatrix} = \begin{pmatrix} .23242436 \\ -.29751278 \\ -7.8822108 \times 10^{-2} \end{pmatrix}.$$
It works pretty well. For practical purposes, the eigenvalue and eigenvector have now been found. If you want better accuracy, you could just continue iterating. One can find the eigenvector corresponding to the eigenvalue nearest $5.5$ the same way.
15.1.3 Complex Eigenvalues
What about complex eigenvalues? If your matrix is real, you won’t see these by graphing the characteristic equation on your calculator. Will the shifted inverse power method find these eigenvalues and their associated eigenvectors? The answer is yes. However, for a real matrix, you must pick $\alpha$ to be complex. This is because the eigenvalues occur in conjugate pairs, so a real $\alpha$ is the same distance from both members of any conjugate pair and nothing in the above argument for convergence implies you will get convergence to a complex number. Also, with a real $\alpha$ the process of iteration will yield only real vectors and scalars.
Example 15.1.7 Find the complex eigenvalues and corresponding eigenvectors for the matrix
$$\begin{pmatrix} 5 & -8 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$
Here the characteristic equation is $\lambda^3 - 5\lambda^2 + 8\lambda - 6 = 0$. One solution is $\lambda = 3$. The other two are $1 + i$ and $1 - i$. I will apply the process to $\alpha = i$ to find the eigenvalue closest to $i$.
$$(A - \alpha I)^{-1} = \begin{pmatrix} -.02 - .14i & 1.24 + .68i & -.84 + .12i \\ -.14 + .02i & .68 - .24i & .12 + .84i \\ .02 + .14i & -.24 - .68i & .84 + .88i \end{pmatrix}$$
Then let $u_1 = (1, 1, 1)^T$ for lack of any insight into anything better.
$$\begin{pmatrix} -.02 - .14i & 1.24 + .68i & -.84 + .12i \\ -.14 + .02i & .68 - .24i & .12 + .84i \\ .02 + .14i & -.24 - .68i & .84 + .88i \end{pmatrix}^{20} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -0.40000 + 0.8i \\ 0.20000 + 0.6i \\ 0.40000 + 0.2i \end{pmatrix}$$
Now divide by the largest entry to get the next iterate. This yields for an approximate eigenvector approximately
$$\frac{1}{-0.40000 + 0.8i} \begin{pmatrix} -0.40000 + 0.8i \\ 0.20000 + 0.6i \\ 0.40000 + 0.2i \end{pmatrix} = \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix}$$
Now leaving off extremely small terms,
$$\begin{pmatrix} -.02 - .14i & 1.24 + .68i & -.84 + .12i \\ -.14 + .02i & .68 - .24i & .12 + .84i \\ .02 + .14i & -.24 - .68i & .84 + .88i \end{pmatrix} \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix} = \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix}$$
so it appears that an eigenvector is the above and an eigenvalue can be obtained by solving
$$\frac{1}{\lambda - i} = 1, \text{ so } \lambda = 1 + i$$
The method has successfully found the complex eigenvalue closest to $i$ as well as the eigenvector. Note that I used essentially 20 iterations of the method.
This illustrates an interesting topic which leads to many related topics. If you have a polynomial, $x^4 + ax^3 + bx^2 + cx + d$, you can consider it as the characteristic polynomial of a certain matrix, called a companion matrix. In this case,
$$\begin{pmatrix} -a & -b & -c & -d \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
The above example was just a companion matrix for $\lambda^3 - 5\lambda^2 + 8\lambda - 6$. You can see the pattern which will enable you to obtain a companion matrix for any polynomial of the form $\lambda^n + a_1\lambda^{n-1} + \cdots + a_{n-1}\lambda + a_n$. This illustrates that one way to find the complex zeros of a polynomial is to use the shifted inverse power method on a companion matrix for the polynomial. Doubtless there are better ways but this does illustrate how impressive this procedure is. Do you have a better way?
Note that the shifted inverse power method is a way you can begin with something close but not equal to an eigenvalue and end up with something close to an eigenvector.
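To make the companion matrix idea concrete, here is a hedged sketch that builds the companion matrix of $\lambda^3 - 5\lambda^2 + 8\lambda - 6$ and runs the shifted inverse power method with the complex shift $\alpha = i$. The function names `companion`, `solve`, and `zero_near`, the tolerance, and the iteration cap are illustrative assumptions; each step solves a linear system rather than forming the inverse.

```python
def companion(coeffs):
    """Companion matrix of x^n + a1 x^(n-1) + ... + an, coeffs = [a1, ..., an]."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    C[0] = [-a for a in coeffs]          # first row: -a1, ..., -an
    for i in range(1, n):
        C[i][i - 1] = 1                  # ones on the subdiagonal
    return C


def solve(M, b):
    """Gaussian elimination with partial pivoting (works for complex entries)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0] * n
    for r in reversed(range(n)):
        acc = aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def zero_near(coeffs, alpha, tol=1e-10, max_iter=500):
    """Approximate the zero of the monic polynomial closest to alpha."""
    C = companion(coeffs)
    n = len(C)
    M = [[C[i][j] - (alpha if i == j else 0) for j in range(n)]
         for i in range(n)]
    u, s_old = [1] * n, None
    for _ in range(max_iter):
        v = solve(M, u)
        s = max(v, key=abs)              # entry of largest absolute value
        u = [vi / s for vi in v]
        if s_old is not None and abs(s - s_old) < tol * abs(s):
            break
        s_old = s
    return alpha + 1 / s


lam = zero_near([-5, 8, -6], 1j)
print(lam)   # close to 1 + i
```

Substituting the result back into the polynomial is a quick check that it really is close to a zero.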
15.1.4 Rayleigh Quotients And Estimates for Eigenvalues
There are many specialized results concerning the eigenvalues and eigenvectors for Hermitian matrices. Recall a matrix $A$ is Hermitian if $A = A^*$ where $A^*$ means to take the transpose of the conjugate of $A$. In the case of a real matrix, Hermitian reduces to symmetric. Recall also that for $x \in \mathbb{F}^n$,
$$|x|^2 = x^* x = \sum_{j=1}^n |x_j|^2.$$
Recall the following corollary found on Page 187 which is stated here for convenience.
Corollary 15.1.8 If $A$ is Hermitian, then all the eigenvalues of $A$ are real and there exists an orthonormal basis of eigenvectors.
Thus for $\{x_k\}_{k=1}^n$ this orthonormal basis,
$$x_i^* x_j = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
For $x \in \mathbb{F}^n$, $x \neq 0$, the Rayleigh quotient is defined by
$$\frac{x^* A x}{|x|^2}.$$
Now let the eigenvalues of $A$ be $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ and $Ax_k = \lambda_k x_k$ where $\{x_k\}_{k=1}^n$ is the above orthonormal basis of eigenvectors mentioned in the corollary. Then if $x$ is an arbitrary vector, there exist constants $a_i$ such that
$$x = \sum_{i=1}^n a_i x_i.$$
Also,
$$|x|^2 = \sum_{i=1}^n \overline{a_i} x_i^* \sum_{j=1}^n a_j x_j = \sum_{ij} \overline{a_i} a_j x_i^* x_j = \sum_{ij} \overline{a_i} a_j \delta_{ij} = \sum_{i=1}^n |a_i|^2.$$
Therefore,
$$\frac{x^* A x}{|x|^2} = \frac{\left( \sum_{i=1}^n \overline{a_i} x_i^* \right) \left( \sum_{j=1}^n a_j \lambda_j x_j \right)}{\sum_{i=1}^n |a_i|^2} = \frac{\sum_{ij} \overline{a_i} a_j \lambda_j x_i^* x_j}{\sum_{i=1}^n |a_i|^2}$$
$$= \frac{\sum_{ij} \overline{a_i} a_j \lambda_j \delta_{ij}}{\sum_{i=1}^n |a_i|^2} = \frac{\sum_{i=1}^n |a_i|^2 \lambda_i}{\sum_{i=1}^n |a_i|^2} \in [\lambda_1, \lambda_n].$$
In other words, the Rayleigh quotient is always between the largest and the smallest eigenvalues of $A$. When $x = x_n$, the Rayleigh quotient equals the largest eigenvalue and when $x = x_1$ the Rayleigh quotient equals the smallest eigenvalue. Suppose you calculate a Rayleigh quotient. How close is it to some eigenvalue?
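The claim that the Rayleigh quotient lies between the smallest and largest eigenvalues can be checked on a tiny case. The sketch below uses an assumed illustrative matrix, $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose eigenvalues are $1$ and $3$ with eigenvectors $(1, -1)^T$ and $(1, 1)^T$; the function name `rayleigh_quotient` is not from the text.

```python
def rayleigh_quotient(A, x):
    """Return x* A x / |x|^2 for a Hermitian matrix A (list of rows)."""
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    num = sum(complex(xi).conjugate() * yi for xi, yi in zip(x, Ax))
    den = sum(abs(xi) ** 2 for xi in x)
    return (num / den).real          # the quotient is real when A is Hermitian


A = [[2, 1], [1, 2]]                      # eigenvalues 1 and 3
q_min = rayleigh_quotient(A, [1, -1])     # eigenvector for eigenvalue 1
q_max = rayleigh_quotient(A, [1, 1])      # eigenvector for eigenvalue 3
q_mid = rayleigh_quotient(A, [1, 0])      # not an eigenvector
print(q_min, q_mid, q_max)                # 1.0 2.0 3.0
```

The quotient for the non-eigenvector lands strictly between the two eigenvalues, as the computation above predicts.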
Theorem 15.1.9 Let $x \neq 0$ and form the Rayleigh quotient,
$$\frac{x^* A x}{|x|^2} \equiv q.$$
Then there exists an eigenvalue of $A$, denoted here by $\lambda_q$, such that
$$|\lambda_q - q| \leq \frac{|Ax - qx|}{|x|}. \tag{15.5}$$
Proof: Let $x = \sum_{k=1}^n a_k x_k$ where $\{x_k\}_{k=1}^n$ is the orthonormal basis of eigenvectors.
$$|Ax - qx|^2 = (Ax - qx)^* (Ax - qx)$$
$$= \left( \sum_{k=1}^n a_k \lambda_k x_k - q a_k x_k \right)^* \left( \sum_{k=1}^n a_k \lambda_k x_k - q a_k x_k \right)$$
$$= \left( \sum_{j=1}^n (\lambda_j - q) \overline{a_j} x_j^* \right) \left( \sum_{k=1}^n (\lambda_k - q) a_k x_k \right)$$
$$= \sum_{j,k} (\lambda_j - q) \overline{a_j} (\lambda_k - q) a_k x_j^* x_k = \sum_{k=1}^n |a_k|^2 (\lambda_k - q)^2$$
Now pick the eigenvalue $\lambda_q$ which is closest to $q$. Then
$$|Ax - qx|^2 = \sum_{k=1}^n |a_k|^2 (\lambda_k - q)^2 \geq (\lambda_q - q)^2 \sum_{k=1}^n |a_k|^2 = (\lambda_q - q)^2 |x|^2$$
which implies 15.5. ∎

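Example 15.1.10 Consider the symmetric matrix
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix}.$$
Let $x = (1, 1, 1)^T$. How close is the Rayleigh quotient to some eigenvalue of $A$? Find the eigenvector and eigenvalue to several decimal places.
Everything is real and so there is no need to worry about taking conjugates. Therefore, the Rayleigh quotient is
$$\frac{\begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}}{3} = \frac{19}{3}$$
According to the above theorem, there is some eigenvalue of this matrix $\lambda_q$ such that
$$\left| \lambda_q - \frac{19}{3} \right| \leq \frac{\left| \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \frac{19}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right|}{\sqrt{3}} = \frac{1}{\sqrt{3}} \left| \begin{pmatrix} -\frac{1}{3} \\ -\frac{4}{3} \\ \frac{5}{3} \end{pmatrix} \right|$$
$$= \frac{\sqrt{\frac{1}{9} + \left(\frac{4}{3}\right)^2 + \left(\frac{5}{3}\right)^2}}{\sqrt{3}} = 1.2472$$
Could you find this eigenvalue and associated eigenvector? Of course you could. This is what the shifted inverse power method is all about.
Solve
$$\left( \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} - \frac{19}{3} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
In other words solve
$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$
and divide by the entry which is largest, $3.8707$, to get
$$u_2 = \begin{pmatrix} .69925 \\ .49389 \\ 1.0 \end{pmatrix}$$
Now solve
$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} .69925 \\ .49389 \\ 1.0 \end{pmatrix}$$
and divide by the largest entry, $2.9979$, to get
$$u_3 = \begin{pmatrix} .71473 \\ .52263 \\ 1.0 \end{pmatrix}$$
Now solve
$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} .71473 \\ .52263 \\ 1.0 \end{pmatrix}$$
and divide by the largest entry, $3.0454$, to get
$$u_4 = \begin{pmatrix} .7137 \\ .52056 \\ 1.0 \end{pmatrix}$$
Solve
$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} .7137 \\ .52056 \\ 1.0 \end{pmatrix}$$
and divide by the largest entry, $3.0421$, to get
$$u_5 = \begin{pmatrix} .71378 \\ .52073 \\ 1.0 \end{pmatrix}$$
You can see these scaling factors are not changing much. The predicted eigenvalue is then about
$$\frac{1}{3.0421} + \frac{19}{3} = 6.6621.$$
How close is this?
$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} .71378 \\ .52073 \\ 1.0 \end{pmatrix} = \begin{pmatrix} 4.7552 \\ 3.469 \\ 6.6621 \end{pmatrix}$$
while
$$6.6621 \begin{pmatrix} .71378 \\ .52073 \\ 1.0 \end{pmatrix} = \begin{pmatrix} 4.7553 \\ 3.4692 \\ 6.6621 \end{pmatrix}.$$
You see that for practical purposes, this has found the eigenvalue and an eigenvector.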
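The numbers in this example are easy to re-check. The short sketch below recomputes the Rayleigh quotient and the right side of 15.5 for $x = (1, 1, 1)^T$, then compares the bound with the distance to the eigenvalue estimate $6.6621$ obtained above. The variable names are illustrative only.

```python
import math

A = [[1, 2, 3], [2, 2, 1], [3, 1, 4]]
x = [1, 1, 1]

Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
norm_sq = sum(xi * xi for xi in x)                       # |x|^2
q = sum(xi * yi for xi, yi in zip(x, Ax)) / norm_sq      # Rayleigh quotient
residual = [yi - q * xi for yi, xi in zip(Ax, x)]        # Ax - qx
bound = math.sqrt(sum(r * r for r in residual)) / math.sqrt(norm_sq)

lam = 6.6621          # eigenvalue estimate found in the example above
print(q, bound, abs(lam - q))
```

The distance $|6.6621 - 19/3|$ is about $0.33$, comfortably inside the bound of about $1.2472$ given by the theorem.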
15.2 The QR Algorithm
15.2.1 Basic Properties And Definition
Recall the theorem about the QR factorization in Theorem 5.7.5. It says that given an $n \times n$ real matrix $A$, there exists a real orthogonal matrix $Q$ and an upper triangular matrix $R$ such that $A = QR$ and that this factorization can be accomplished by a systematic procedure. One such procedure was given in proving this theorem.
Theorem 15.2.1 Let $A$ be an $n \times n$ complex matrix. Then there exists a unitary $Q$ and upper triangular $R$ such that $A = QR$.
Proof: This is obvious if $n = 1$. Suppose true for $n$ and let
$$A = \begin{pmatrix} a_1 & \cdots & a_n & a_{n+1} \end{pmatrix}$$
be $(n+1) \times (n+1)$. Let $Q_1$ be a unitary matrix such that $Q_1 a_1 = |a_1| e_1$ in case $a_1 \neq 0$. If $a_1 = 0$, let $Q_1 = I$. Thus
$$Q_1 A = \begin{pmatrix} a & b \\ 0 & A_1 \end{pmatrix}$$
where $A_1$ is $n \times n$. By induction, there exists $Q_2'$, an $n \times n$ unitary matrix, such that $Q_2' A_1 = R'$, an upper triangular matrix. Then
$$\begin{pmatrix} 1 & 0 \\ 0 & Q_2' \end{pmatrix} Q_1 A = \begin{pmatrix} a & b \\ 0 & R' \end{pmatrix} = R$$
Since the product of unitary matrices is unitary, there exists $Q$ unitary such that $Q^* A = R$ and so $A = QR$. ∎
The QR algorithm is described in the following definition.
Definition 15.2.2 The QR algorithm is the following. In the description of this algorithm, $Q$ is unitary and $R$ is upper triangular having nonnegative entries on the main diagonal. Starting with $A$ an $n \times n$ matrix, form
$$A_0 \equiv A = Q_1 R_1 \tag{15.6}$$
Then
$$A_1 \equiv R_1 Q_1. \tag{15.7}$$
In general given
$$A_k = R_k Q_k, \tag{15.8}$$
obtain $A_{k+1}$ by
$$A_k = Q_{k+1} R_{k+1}, \quad A_{k+1} = R_{k+1} Q_{k+1} \tag{15.9}$$
This algorithm was proposed by Francis in 1961. The sequence {Ak } is the desired
sequence of iterates. Now with the above definition of the algorithm, here are its properties.
The next lemma shows each of the Ak is unitarily similar to A and the amazing thing about
this algorithm is that often it becomes increasingly easy to find the eigenvalues of the Ak .
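A bare-bones version of the iteration in the definition can be written as follows. This is only a sketch under assumptions: it handles real nonsingular matrices, the QR factorization is computed by modified Gram-Schmidt (one possible systematic procedure, which gives $R$ positive diagonal entries), and the names `qr_gram_schmidt`, `mat_mul`, `qr_algorithm`, the step count, and the test matrix are all illustrative. The test matrix is symmetric with eigenvalues $3 + \sqrt{3}$, $3$, and $3 - \sqrt{3}$.

```python
import math


def qr_gram_schmidt(A):
    """QR factorization of a real nonsingular square matrix.

    Returns (Q, R) with Q orthogonal and R upper triangular with
    positive diagonal, computed by modified Gram-Schmidt on the columns.
    """
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - R[i][j] * qi for vi, qi in zip(v, q)]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        q_cols.append([vi / R[j][j] for vi in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def qr_algorithm(A, steps=100):
    """Iterate A_k = Q_{k+1} R_{k+1},  A_{k+1} = R_{k+1} Q_{k+1}."""
    Ak = [row[:] for row in A]
    for _ in range(steps):
        Q, R = qr_gram_schmidt(Ak)
        Ak = mat_mul(R, Q)
    return Ak


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
T = qr_algorithm(A)
for row in T:
    print([round(t, 6) for t in row])
```

For this matrix the iterates approach a diagonal matrix, so the eigenvalue approximations can be read off the main diagonal of `T`.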
Lemma 15.2.3 Let $A$ be an $n \times n$ matrix and let the $Q_k$ and $R_k$ be as described in the algorithm. Then each $A_k$ is unitarily similar to $A$ and denoting by $Q^{(k)}$ the product $Q_1 Q_2 \cdots Q_k$ and $R^{(k)}$ the product $R_k R_{k-1} \cdots R_1$, it follows that
$$A^k = Q^{(k)} R^{(k)}$$
(The matrix on the left is $A$ raised to the $k$th power.)
$$A = Q^{(k)} A_k Q^{(k)*}, \quad A_k = Q^{(k)*} A Q^{(k)}.$$
Proof: From the algorithm, $R_{k+1} = A_{k+1} Q_{k+1}^*$ and so
$$A_k = Q_{k+1} R_{k+1} = Q_{k+1} A_{k+1} Q_{k+1}^*$$
Now iterating this, it follows
$$A_{k-1} = Q_k A_k Q_k^* = Q_k Q_{k+1} A_{k+1} Q_{k+1}^* Q_k^*$$
$$A_{k-2} = Q_{k-1} A_{k-1} Q_{k-1}^* = Q_{k-1} Q_k Q_{k+1} A_{k+1} Q_{k+1}^* Q_k^* Q_{k-1}^*$$
etc. Thus, after $k - 2$ more iterations,
$$A = Q^{(k+1)} A_{k+1} Q^{(k+1)*}$$
The product of unitary matrices is unitary and so this proves the first claim of the lemma.
Now consider the part about $A^k$. From the algorithm, this is clearly true for $k = 1$ ($A^1 = Q_1 R_1$). Suppose then that
$$A^k = Q_1 Q_2 \cdots Q_k R_k R_{k-1} \cdots R_1$$
What was just shown indicated
$$A = Q_1 Q_2 \cdots Q_{k+1} A_{k+1} Q_{k+1}^* Q_k^* \cdots Q_1^*$$
and now from the algorithm, $A_{k+1} = R_{k+1} Q_{k+1}$ and so
$$A = Q_1 Q_2 \cdots Q_{k+1} R_{k+1} Q_{k+1} Q_{k+1}^* Q_k^* \cdots Q_1^*$$
Then
$$A^{k+1} = A A^k = \overbrace{Q_1 Q_2 \cdots Q_{k+1} R_{k+1} Q_{k+1} Q_{k+1}^* Q_k^* \cdots Q_1^*}^{A} \, Q_1 \cdots Q_k R_k R_{k-1} \cdots R_1$$
$$= Q_1 Q_2 \cdots Q_{k+1} R_{k+1} R_k R_{k-1} \cdots R_1 \equiv Q^{(k+1)} R^{(k+1)} \quad ∎$$
Here is another very interesting lemma.
Lemma 15.2.4 Suppose $Q^{(k)}$, $Q$ are unitary and $R_k$ is upper triangular such that the diagonal entries on $R_k$ are all positive and
$$Q = \lim_{k\to\infty} Q^{(k)} R_k$$
Then
$$\lim_{k\to\infty} Q^{(k)} = Q, \quad \lim_{k\to\infty} R_k = I.$$
Also the QR factorization of $A$ is unique whenever $A^{-1}$ exists.
Proof: Let
$$Q = (q_1, \cdots, q_n), \quad Q^{(k)} = \left( q_1^k, \cdots, q_n^k \right)$$
where the $q$ are the columns. Also denote by $r_{ij}^k$ the $ij$th entry of $R_k$. Thus
$$Q^{(k)} R_k = \left( q_1^k, \cdots, q_n^k \right) \begin{pmatrix} r_{11}^k & & * \\ & \ddots & \\ 0 & & r_{nn}^k \end{pmatrix}$$
It follows
$$r_{11}^k q_1^k \to q_1$$
and so
$$r_{11}^k = \left| r_{11}^k q_1^k \right| \to 1$$
Therefore,
$$q_1^k \to q_1.$$
Next consider the second column.
$$r_{12}^k q_1^k + r_{22}^k q_2^k \to q_2$$
Taking the inner product of both sides with $q_1^k$ it follows
$$\lim_{k\to\infty} r_{12}^k = \lim_{k\to\infty} \left( q_2 \cdot q_1^k \right) = (q_2 \cdot q_1) = 0.$$
Therefore,
$$\lim_{k\to\infty} r_{22}^k q_2^k = q_2$$
and since $r_{22}^k > 0$, it follows as in the first part that $r_{22}^k \to 1$. Hence
$$\lim_{k\to\infty} q_2^k = q_2.$$
Continuing this way, it follows
$$\lim_{k\to\infty} r_{ij}^k = 0$$
for all $i \neq j$ and
$$\lim_{k\to\infty} r_{jj}^k = 1, \quad \lim_{k\to\infty} q_j^k = q_j.$$
Thus $R_k \to I$ and $Q^{(k)} \to Q$. This proves the first part of the lemma.
The second part follows immediately. If $QR = Q'R' = A$ where $A^{-1}$ exists, then
$$Q^* Q' = R (R')^{-1}$$
and I need to show both sides of the above are equal to $I$. The left side of the above is unitary and the right side is upper triangular having positive entries on the diagonal. This is because the inverse of such an upper triangular matrix having positive entries on the main diagonal is still upper triangular having positive entries on the main diagonal and the product of two such upper triangular matrices gives another of the same form having positive entries on the main diagonal. Suppose then that $Q = R$ where $Q$ is unitary and $R$ is upper triangular having positive entries on the main diagonal. Let $Q^{(k)} = I$ and $R_k = R$. It follows
$$I R_k \to R = Q$$
and so from the first part, $R_k \to I$ but $R_k = R$ and so $R = I$. Thus applying this to $Q^* Q' = R (R')^{-1}$ yields both sides equal $I$. ∎
A case of all this is of great interest. Suppose $A$ has a largest eigenvalue $\lambda$ which is real. Then $A^n$ is of the form $\left( A^{n-1} a_1, \cdots, A^{n-1} a_n \right)$ and so likely each of these columns will be pointing roughly in the direction of an eigenvector of $A$ which corresponds to this eigenvalue. Then when you do the QR factorization of this, it follows from the fact that $R$ is upper triangular, that the first column of $Q$ will be a multiple of $A^{n-1} a_1$ and so will end up being roughly parallel to the eigenvector desired. Also this will require the entries below the top in the first column of $A_n = Q^T A Q$ will all be small because they will be of the form $q_i^T A q_1 \approx \lambda q_i^T q_1 = 0$. Therefore, $A_n$ will be of the form
$$\begin{pmatrix} \lambda' & a \\ e & B \end{pmatrix}$$
where $e$ is small. It follows that $\lambda'$ will be close to $\lambda$ and $q_1$ will be close to an eigenvector for $\lambda$. Then if you like, you could do the same thing with the matrix $B$ to obtain approximations for the other eigenvalues. Finally, you could use the shifted inverse power method to get more exact solutions.
15.2.2 The Case Of Real Eigenvalues
With these lemmas, it is possible to prove that for the QR algorithm and certain conditions, the sequence $A_k$ converges pointwise to an upper triangular matrix having the eigenvalues of $A$ down the diagonal. I will assume all the matrices are real here.
This convergence won’t always happen. Consider for example the matrix
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
You can verify quickly that the algorithm will return this matrix for each $k$. The problem here is that, although the matrix has the two eigenvalues $-1, 1$, they have the same absolute value. The QR algorithm works in somewhat the same way as the power method, exploiting differences in the size of the eigenvalues.
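The verification for this example is short enough to write out. The matrix is itself orthogonal, so one QR factorization with positive diagonal entries in $R$ is

$$A_0 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \underbrace{\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}}_{Q_1} \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{R_1},$$

and by the uniqueness part of Lemma 15.2.4 it is the only one, since the matrix is invertible. Hence

$$A_1 = R_1 Q_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = A_0,$$

and repeating the same step gives $A_k = A_0$ for every $k$, which is not upper triangular.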
If A has all real eigenvalues and you are interested in finding these eigenvalues along
with the corresponding eigenvectors, you could always consider A + λI instead where λ is
sufficiently large and positive that A + λI has all positive eigenvalues. (Recall Gerschgorin’s
theorem.) Then if µ is an eigenvalue of A + λI with
(A + λI) x = µx
then
Ax = (µ − λ) x
so to find the eigenvalues of A you just subtract λ from the eigenvalues of A + λI. Thus
there is no loss of generality in assuming at the outset that the eigenvalues of A are all
positive. Here is the theorem. It involves a technical condition which will often hold. The
proof presented here follows [27] and is a special case of that presented in this reference.
Before giving the proof, note that the product of upper triangular matrices is upper
triangular. If they both have positive entries on the main diagonal so will the product.
Furthermore, the inverse of an upper triangular matrix is upper triangular. I will use these
simple facts without much comment whenever convenient.
Theorem 15.2.5 Let $A$ be a real matrix having eigenvalues
$$\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0$$
and let
$$A = SDS^{-1} \tag{15.10}$$
where
$$D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$
and suppose $S^{-1}$ has an LU factorization. Then the matrices $A_k$ in the QR algorithm described above converge to an upper triangular matrix $T'$ having the eigenvalues of $A$, $\lambda_1, \cdots, \lambda_n$, descending on the main diagonal. The matrices $Q^{(k)}$ converge to $Q'$, an orthogonal matrix which equals $Q$ except for possibly having some columns multiplied by $-1$, for $Q$ the unitary part of the QR factorization of $S$,
$$S = QR,$$
and
$$\lim_{k\to\infty} A_k = T' = Q'^T A Q'$$
Proof: From Lemma 15.2.3
Ak = Q(k) R(k) = SDk S −1
(15.11)
Let S = QR where this is just a QR factorization which is known to exist and let S −1 = LU
which is assumed to exist. Thus
Q(k) R(k) = QRDk LU
and so
(15.12)
Q(k) R(k) = QRDk LU = QRDk LD−k Dk U
That matrix in the middle, Dk LD−k satisfies
( k
)
for j ≤ i, 0 if j > i.
D LD−k ij = λki Lij λ−k
j
Thus for j < i the expression converges to 0 because λj > λi when this happens. When
i = j it reduces to 1. Thus the matrix in the middle is of the form I + Ek where Ek → 0.
Then it follows
Ak = Q(k) R(k) = QR (I + Ek ) Dk U
15.2. THE QR ALGORITHM
391
(
)
= Q I + REk R−1 RDk U ≡ Q (I + Fk ) RDk U
where Fk → 0. Then let I + Fk = Qk Rk where this is another QR factorization. Then it
reduces to
Q(k) R(k) = QQk Rk RDk U
This looks really interesting because by Lemma 15.2.4 Qk → I and Rk → I because
Qk Rk = (I + Fk ) → I. So it follows QQk is an orthogonal matrix converging to Q while
(
)−1
Rk RDk U R(k)
is upper triangular, being the product of upper triangular matrices. Unfortunately, it is not
known that the diagonal entries of this matrix are nonnegative because of the U . Let Λ be
just like the identity matrix but having some of the ones replaced with −1 in such a way
that ΛU is an upper triangular matrix having positive diagonal entries. Note Λ2 = I and
also Λ commutes with a diagonal matrix. Thus
Q(k) R(k) = QQk Rk RDk Λ2 U = QQk Rk RΛDk (ΛU )
At this point, one does some inspired massaging to write the above in the form
]
(
) [(
)−1
QQk ΛDk
ΛDk
Rk RΛDk (ΛU )
[(
]
)−1
= Q (Qk Λ) Dk ΛDk
Rk RΛDk (ΛU )
≡Gk
=
z [
}|
{
]
(
)
k
k −1
k
Q (Qk Λ) D
ΛD
Rk RΛD (ΛU )
Now I claim the middle matrix in [·] is upper triangular and has all positive entries on the
diagonal. This is because it is an upper triangular matrix which is similar to the upper
triangular matrix Rk R[and so it has the same
] eigenvalues (diagonal entries) as Rk R. Thus
(
)
k
k −1
k
the matrix Gk ≡ D
ΛD
Rk RΛD (ΛU ) is upper triangular and has all positive
entries on the diagonal. Multiply on the right by G−1
k to get
′
Q(k) R(k) G−1
k = QQk Λ → Q
where Q′ is essentially equal to Q but might have some of the columns multiplied by −1.
This is because Qk → I and so Qk Λ → Λ. Now by Lemma 15.2.4, it follows
Q(k) → Q′ , R(k) G−1
k → I.
It remains to verify Ak converges to an upper triangular matrix. Recall that from 15.11
and the definition below this (S = QR)
A = SDS −1 = (QR) D (QR)
−1
= QRDR−1 QT = QT QT
Where T is an upper triangular matrix. This is because it is the product of upper triangular
matrices R, D, R−1 . Thus QT AQ = T. If you replace Q with Q′ in the above, it still results
in an upper triangular matrix T ′ having the same diagonal entries as T. This is because
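$$T = Q^T A Q = (Q' \Lambda)^T A (Q' \Lambda) = \Lambda Q'^T A Q' \Lambda$$
and considering the iith entry yields
$$\left(Q^T A Q\right)_{ii} \equiv \sum_{j,k} \Lambda_{ij} \left(Q'^T A Q'\right)_{jk} \Lambda_{ki} = \Lambda_{ii} \Lambda_{ii} \left(Q'^T A Q'\right)_{ii} = \left(Q'^T A Q'\right)_{ii}$$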
Recall from Lemma 15.2.3, Ak = Q(k)T AQ(k) . Thus taking a limit and using the first
part,
$$A_k = Q^{(k)T} A Q^{(k)} \to Q'^T A Q' = T'.$$

An easy case is for A symmetric. Recall Corollary 7.4.13. By this corollary, there exists
an orthogonal (real unitary) matrix Q such that
QT AQ = D
where D is diagonal having the eigenvalues on the main diagonal decreasing in size from the
upper left corner to the lower right.
Corollary 15.2.6 Let A be a real symmetric n × n matrix having eigenvalues
λ1 > λ2 > · · · > λn > 0
and let Q be defined by
QDQT = A, D = QT AQ,
(15.13)
where Q is orthogonal and D is a diagonal matrix having the eigenvalues on the main
diagonal decreasing in size from the upper left corner to the lower right. Let QT have an
LU factorization. Then in the QR algorithm, the matrices Q(k) converge to Q′ where Q′ is
the same as Q except having some columns multiplied by (−1) . Thus the columns of Q′ are
eigenvectors of A. The matrices Ak converge to D.
Proof: This follows from Theorem 15.2.5. Here S = Q, S −1 = QT . Thus
Q = S = QR
and R = I. By Theorem 15.2.5 and Lemma 15.2.3,
Ak = Q(k)T AQ(k) → Q′T AQ′ = QT AQ = D.
because formula 15.13 is unaffected by replacing Q with Q′.

When using the QR algorithm, it is not necessary to check the technical condition about
S −1 having an LU factorization. The algorithm delivers a sequence of matrices which are
similar to the original one. If that sequence converges to an upper triangular matrix, then
the algorithm worked. Furthermore, the technical condition is sufficient but not necessary.
The algorithm will work even without the technical condition.
Example 15.2.7 Find the eigenvalues and eigenvectors of the matrix
$$A = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix}$$
It is a symmetric matrix but other than that, I just pulled it out of the air. By Lemma
15.2.3 it follows $A_k = Q^{(k)T} A Q^{(k)}$. And so to get to the answer quickly I could have the
computer raise A to a power and then take the QR factorization of what results to get the
kth iteration using the above formula. Let's pick k = 10.
$$\begin{pmatrix} 5 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix}^{10} = \begin{pmatrix} 4.2273 \times 10^7 & 2.5959 \times 10^7 & 1.8611 \times 10^7 \\ 2.5959 \times 10^7 & 1.6072 \times 10^7 & 1.1506 \times 10^7 \\ 1.8611 \times 10^7 & 1.1506 \times 10^7 & 8.2396 \times 10^6 \end{pmatrix}$$
Now take the QR factorization of this. The computer will do that also.
This yields
$$\begin{pmatrix} 0.79785 & -0.59912 & -6.6943 \times 10^{-2} \\ 0.48995 & 0.70912 & -0.50706 \\ 0.35126 & 0.37176 & 0.85931 \end{pmatrix} \cdot \begin{pmatrix} 5.2983 \times 10^7 & 3.2627 \times 10^7 & 2.338 \times 10^7 \\ 0 & 1.2172 \times 10^5 & 71946 \\ 0 & 0 & 277.03 \end{pmatrix}$$
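Next it follows
$$A_{10} = \begin{pmatrix} 0.79785 & -0.59912 & -6.6943 \times 10^{-2} \\ 0.48995 & 0.70912 & -0.50706 \\ 0.35126 & 0.37176 & 0.85931 \end{pmatrix}^T \cdot \begin{pmatrix} 5 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} 0.79785 & -0.59912 & -6.6943 \times 10^{-2} \\ 0.48995 & 0.70912 & -0.50706 \\ 0.35126 & 0.37176 & 0.85931 \end{pmatrix}$$
and this equals
$$\begin{pmatrix} 6.0571 & 3.698 \times 10^{-3} & 3.4346 \times 10^{-5} \\ 3.698 \times 10^{-3} & 3.2008 & -4.0643 \times 10^{-4} \\ 3.4346 \times 10^{-5} & -4.0643 \times 10^{-4} & -0.2579 \end{pmatrix}$$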
By Gerschgorin's theorem, the eigenvalues are pretty close to the diagonal entries of the
above matrix. Note I didn't use the theorem, just Lemma 15.2.3 and Gerschgorin's theorem
to verify the eigenvalues are close to the above numbers. The eigenvectors are close to
$$\begin{pmatrix} 0.79785 \\ 0.48995 \\ 0.35126 \end{pmatrix}, \begin{pmatrix} -0.59912 \\ 0.70912 \\ 0.37176 \end{pmatrix}, \begin{pmatrix} -6.6943 \times 10^{-2} \\ -0.50706 \\ 0.85931 \end{pmatrix}$$
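Let's check one of these.
$$\left( \begin{pmatrix} 5 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix} - 6.0571 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} 0.79785 \\ 0.48995 \\ 0.35126 \end{pmatrix} = \begin{pmatrix} -2.1972 \times 10^{-3} \\ 2.5439 \times 10^{-3} \\ 1.3931 \times 10^{-3} \end{pmatrix} \approx \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
Now let's see how well the smallest approximate eigenvalue and eigenvector works.
$$\left( \begin{pmatrix} 5 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix} - (-0.2579) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} -6.6943 \times 10^{-2} \\ -0.50706 \\ 0.85931 \end{pmatrix} = \begin{pmatrix} 2.704 \times 10^{-4} \\ -2.7377 \times 10^{-4} \\ -1.3695 \times 10^{-4} \end{pmatrix} \approx \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$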
For practical purposes, this has found the eigenvalues and eigenvectors.
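The computation in Example 15.2.7 can be reproduced with a few lines of code. What follows is only a minimal sketch in Python, assuming the NumPy library is available; the function name qr_iterate is made up for this illustration and is not from the text. It runs the basic unshifted iteration $A_{k+1} = R_k Q_k$ and accumulates $Q^{(k)} = Q_1 \cdots Q_k$, so that $A_k = Q^{(k)T} A Q^{(k)}$ as in Lemma 15.2.3. The signs NumPy picks in its QR factorization may differ from those printed above, which can flip the signs of columns of $Q^{(k)}$ but does not change the diagonal of $A_k$.

```python
import numpy as np


def qr_iterate(A, iterations):
    """Basic unshifted QR algorithm (illustrative helper, not from the text).

    Returns (A_k, Q_acc) where Q_acc = Q_1 Q_2 ... Q_k and
    A_k = Q_acc^T A Q_acc.
    """
    Ak = np.array(A, dtype=float)
    Q_acc = np.eye(Ak.shape[0])
    for _ in range(iterations):
        Q, R = np.linalg.qr(Ak)   # A_k = Q R
        Ak = R @ Q                # A_{k+1} = R Q = Q^T A_k Q
        Q_acc = Q_acc @ Q
    return Ak, Q_acc


A = np.array([[5.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 2.0, 1.0]])

A10, Q10 = qr_iterate(A, 10)
print(np.diag(A10))               # approximate eigenvalues

# Residual A q - lambda q for the first column of Q^(10)
residual = A @ Q10[:, 0] - A10[0, 0] * Q10[:, 0]
print(np.linalg.norm(residual))
```

The diagonal of A10 plays the role of the approximate eigenvalues and the columns of Q10 the role of the approximate eigenvectors, up to sign.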
15.2.3 The QR Algorithm In The General Case
In the case where A has distinct positive eigenvalues it was shown above that under reasonable conditions related to a certain matrix having an LU factorization the QR algorithm
produces a sequence of matrices {Ak } which converges to an upper triangular matrix. What
if A is just an n×n matrix having possibly complex eigenvalues but A is nondefective? What
happens with the QR algorithm in this case? The short answer to this question is that the
Ak of the algorithm typically cannot converge. However, this does not mean the algorithm is not useful in finding eigenvalues. It turns out the sequence of matrices {Ak } have
the appearance of a block upper triangular matrix for large k in the sense that the entries
below the blocks on the main diagonal are small. Then looking at these blocks gives a way
to approximate the eigenvalues. An important example of the concept of a block triangular
matrix is the real Schur form for a matrix discussed in Theorem 7.4.6 but the concept as
described here allows for any size block centered on the diagonal.
First it is important to note a simple fact about unitary diagonal matrices. In what
follows Λ will denote a unitary matrix which is also a diagonal matrix. These matrices
are just the identity matrix with some of the ones replaced with a number of the form eiθ
for some θ. The important property of multiplication of any matrix by Λ on either side
is that it leaves all the zero entries the same and also preserves the absolute values of the
other entries. Thus a block triangular matrix multiplied by Λ on either side is still block
triangular. If the matrix is close to being block triangular this property of being close to a
block triangular matrix is also preserved by multiplying on either side by Λ. Other patterns
depending only on the size of the absolute value occurring in the matrix are also preserved
by multiplying on either side by Λ. In other words, in looking for a pattern in a matrix,
multiplication by Λ is irrelevant.
Now let A be an n × n matrix having real or complex entries. By Lemma 15.2.3 and the
assumption that A is nondefective, there exists an invertible S,
$$A^k = Q^{(k)} R^{(k)} = S D^k S^{-1} \qquad (15.14)$$
where
$$D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$
and by rearranging the columns of S, D can be made such that
|λ1 | ≥ |λ2 | ≥ · · · ≥ |λn | .
Assume S −1 has an LU factorization. Then
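$$A^k = S D^k L U = S D^k L D^{-k} D^k U.$$
Consider the matrix in the middle, $D^k L D^{-k}$. The ijth entry is of the form
$$\left(D^k L D^{-k}\right)_{ij} = \begin{cases} \lambda_i^k L_{ij} \lambda_j^{-k} & \text{if } j < i \\ 1 & \text{if } i = j \\ 0 & \text{if } j > i \end{cases}$$
and these all converge to 0 whenever $|\lambda_i| < |\lambda_j|$. Thus
$$D^k L D^{-k} = (L_k + E_k)$$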
where $L_k$ is a lower triangular matrix which has all ones down the diagonal and some
subdiagonal terms of the form
$$\lambda_i^k L_{ij} \lambda_j^{-k} \qquad (15.15)$$
for which $|\lambda_i| = |\lambda_j|$ while $E_k \to 0$. (Note the entries of $L_k$ are all bounded independent of
k but some may fail to converge.) Then
$$Q^{(k)} R^{(k)} = S (L_k + E_k) D^k U$$
Let
$$S L_k = Q_k R_k \qquad (15.16)$$
where this is the QR factorization of $S L_k$. Then
$$Q^{(k)} R^{(k)} = (Q_k R_k + S E_k) D^k U = Q_k \left(I + Q_k^* S E_k R_k^{-1}\right) R_k D^k U = Q_k (I + F_k) R_k D^k U$$
where $F_k \to 0$. Let $I + F_k = Q_k' R_k'$. Then $Q^{(k)} R^{(k)} = Q_k Q_k' R_k' R_k D^k U$. By Lemma 15.2.4
$$Q_k' \to I \text{ and } R_k' \to I. \qquad (15.17)$$
Now let $\Lambda_k$ be a diagonal unitary matrix which has the property that $\Lambda_k^* D^k U$ is an upper
triangular matrix which has all the diagonal entries positive. Then
$$Q^{(k)} R^{(k)} = Q_k Q_k' \Lambda_k \left(\Lambda_k^* R_k' R_k \Lambda_k\right) \Lambda_k^* D^k U$$
That matrix in the middle has all positive diagonal entries because it is itself an upper
triangular matrix, being the product of such, and is similar to the matrix $R_k' R_k$ which is
upper triangular with positive diagonal entries. By Lemma 15.2.4 again, this time using the
uniqueness assertion,
$$Q^{(k)} = Q_k Q_k' \Lambda_k, \quad R^{(k)} = \left(\Lambda_k^* R_k' R_k \Lambda_k\right) \Lambda_k^* D^k U$$
Note the term $Q_k Q_k' \Lambda_k$ must be real because the algorithm gives all $Q^{(k)}$ as real matrices.
By 15.17 it follows that for k large enough $Q^{(k)} \approx Q_k \Lambda_k$ where ≈ means the two matrices
are close. Recall $A_k = Q^{(k)T} A Q^{(k)}$ and so for large k,
$$A_k \approx (Q_k \Lambda_k)^* A (Q_k \Lambda_k) = \Lambda_k^* Q_k^* A Q_k \Lambda_k$$
As noted above, the form of $\Lambda_k^* Q_k^* A Q_k \Lambda_k$ in terms of which entries are large and small is
not affected by the presence of $\Lambda_k$ and $\Lambda_k^*$. Thus, in considering what form this is in, it
suffices to consider $Q_k^* A Q_k$.
This could get pretty complicated but I will consider the case where
$$\text{if } |\lambda_i| = |\lambda_{i+1}|, \text{ then } |\lambda_{i+2}| < |\lambda_{i+1}|. \qquad (15.18)$$
This is typical of the situation where the eigenvalues are all distinct and the matrix A is real
so the eigenvalues occur as conjugate pairs. Then in this case, $L_k$ above is lower triangular
with some nonzero terms on the diagonal right below the main diagonal but zeros everywhere
else. Thus maybe $(L_k)_{s+1,s} \neq 0$. Recall 15.16 which implies
$$Q_k = S L_k R_k^{-1} \qquad (15.19)$$
where $R_k^{-1}$ is upper triangular. Also recall from the definition of S in 15.14, it follows that
$S^{-1} A S = D$. Thus the columns of S are eigenvectors of A, the ith being an eigenvector for
$\lambda_i$. Now from the form of $L_k$, it follows $L_k R_k^{-1}$ is a block upper triangular matrix denoted
by $T_B$ and so $Q_k = S T_B$. It follows from the above construction in 15.15 and the given
assumption on the sizes of the eigenvalues, there are finitely many 2 × 2 blocks centered
on the main diagonal along with possibly some diagonal entries. Therefore, for large k the
matrix $A_k = Q^{(k)T} A Q^{(k)}$ is approximately of the same form as that of
$$Q_k^* A Q_k = T_B^{-1} S^{-1} A S T_B = T_B^{-1} D T_B$$
which is a block upper triangular matrix. As explained above, multiplication by the various
diagonal unitary matrices does not affect this form. Therefore, for large k, $A_k$ is approximately a block upper triangular matrix.
How would this change if the above assumption on the size of the eigenvalues were relaxed
but the matrix was still nondefective with appropriate matrices having an LU factorization
as above? It would mean the blocks on the diagonal would be larger. This immediately
makes the problem more cumbersome to deal with. However, in the case that the eigenvalues
of A are distinct, the above situation really is typical of what occurs and in any case can be
quickly reduced to this case.
To see this, suppose condition 15.18 is violated and λj , · · · , λj+p are complex eigenvalues
having nonzero imaginary parts such that each has the same absolute value but they are all
distinct. Then let µ > 0 and consider the matrix A+µI. Thus the corresponding eigenvalues
of A+µI are λj +µ, · · · , λj+p +µ. A short computation shows |λj + µ| , · · · , |λj+p + µ|
are all distinct and so the above situation of 15.18 is obtained. Of course, if there are repeated
eigenvalues, it may not be possible to reduce to the case above and you would end up with
large blocks on the main diagonal which could be difficult to deal with.
So how do you identify the eigenvalues? You know Ak and behold that it is close to a
block upper triangular matrix TB′ . You know Ak is also similar to A. Therefore, TB′ has
eigenvalues which are close to the eigenvalues of Ak and hence those of A provided k is
sufficiently large. See Theorem 7.9.2 which depends on complex analysis or the exercise on
Page 205 which gives another way to see this. Thus you find the eigenvalues of this block
triangular matrix TB′ and assert that these are good approximations of the eigenvalues of
Ak and hence to those of A. How do you find the eigenvalues of a block triangular matrix?
This is easy from Lemma 7.4.5. Say
$$T_B' = \begin{pmatrix} B_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & B_m \end{pmatrix}$$
Then forming $\lambda I - T_B'$ and taking the determinant, it follows from Lemma 7.4.5 this equals
$$\prod_{j=1}^{m} \det\left(\lambda I_j - B_j\right)$$
and so all you have to do is take the union of the eigenvalues for each Bj . In the case
emphasized here this is very easy because these blocks are just 2 × 2 matrices.
How do you identify approximate eigenvectors from this? First try to find the approximate eigenvectors for Ak . Pick an approximate eigenvalue λ, an exact eigenvalue for TB′ .
Then find v solving TB′ v = λv. It follows since TB′ is close to Ak that Ak v ≈ λv and so
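$$Q^{(k)T} A Q^{(k)} v = A_k v \approx \lambda v$$
Hence
$$A Q^{(k)} v \approx \lambda Q^{(k)} v$$
and so $Q^{(k)} v$ is an approximation to the eigenvector which goes with the eigenvalue of A
which is close to λ. (Here the formula $A_k = Q^{(k)T} A Q^{(k)}$ of Lemma 15.2.3 is used, and $Q^{(k)}$ is orthogonal.)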
Example 15.2.8 Here is a matrix.
$$\begin{pmatrix} 3 & 2 & 1 \\ -2 & 0 & -1 \\ -2 & -2 & 0 \end{pmatrix}$$
It happens that the eigenvalues of this matrix are 1, 1 + i, 1 − i. Let's apply the QR algorithm
as if the eigenvalues were not known.
Applying the QR algorithm to this matrix yields the following sequence of matrices.
$$A_1 = \begin{pmatrix} 1.2353 & 1.9412 & 4.3657 \\ -0.39215 & 1.5425 & 5.3886 \times 10^{-2} \\ -0.16169 & -0.18864 & 0.22222 \end{pmatrix}$$
$$\vdots$$
$$A_{12} = \begin{pmatrix} 9.1772 \times 10^{-2} & 0.63089 & -2.0398 \\ -2.8556 & 1.9082 & -3.1043 \\ 1.0786 \times 10^{-2} & 3.4614 \times 10^{-4} & 1.0 \end{pmatrix}$$
At this point the bottom two terms on the left part of the bottom row are both very
small so it appears the real eigenvalue is near 1.0. The complex eigenvalues are obtained
from solving
$$\det\left( \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 9.1772 \times 10^{-2} & 0.63089 \\ -2.8556 & 1.9082 \end{pmatrix} \right) = 0$$
This yields
$$\lambda = 1.0 - 0.98828i, \quad 1.0 + 0.98828i$$
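Here is a short sketch of how Example 15.2.8 might be checked on a computer. It is written in Python and assumes NumPy; the variable names are chosen only for this illustration. It runs twelve steps of the basic iteration $A_{k+1} = R_k Q_k$, reads the real eigenvalue from the lower right corner, and applies the quadratic formula to the 2 × 2 block in the upper left corner. Individual entries of the iterates may differ in sign from those printed above, depending on the sign convention of the QR routine, but the block eigenvalues are not affected by this.

```python
import cmath

import numpy as np

A = np.array([[3.0, 2.0, 1.0],
              [-2.0, 0.0, -1.0],
              [-2.0, -2.0, 0.0]])

Ak = A.copy()
for _ in range(12):
    Q, R = np.linalg.qr(Ak)   # A_k = Q R
    Ak = R @ Q                # A_{k+1} = R Q

# The bottom row is nearly (0, 0, real eigenvalue)
real_eigenvalue = Ak[2, 2]

# Eigenvalues of the leading 2 x 2 block via the quadratic formula:
# lambda^2 - (trace) lambda + (determinant) = 0
block = Ak[:2, :2]
trace = block[0, 0] + block[1, 1]
determinant = block[0, 0] * block[1, 1] - block[0, 1] * block[1, 0]
root_term = cmath.sqrt(trace * trace / 4.0 - determinant)
complex_pair = (trace / 2.0 + root_term, trace / 2.0 - root_term)

print(real_eigenvalue)
print(complex_pair)
```

After only twelve steps the answers agree with 1 and 1 ± i to roughly two decimal places, which is about what the printed $A_{12}$ suggests.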
Example 15.2.9 The equation $x^4 + x^3 + 4x^2 + x - 2 = 0$ has exactly two real solutions. You
can see this by graphing it. However, the rational root theorem from algebra shows neither
of these solutions is rational. Also, graphing it does not yield any information about the
complex solutions. Let's use the QR algorithm to approximate all the solutions, real and
complex.
A matrix whose characteristic polynomial is the given polynomial is
$$\begin{pmatrix} -1 & -4 & -1 & 2 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
Using the QR algorithm yields the following sequence of iterates for $A_k$
$$A_1 = \begin{pmatrix} 0.99999 & -2.5927 & -1.7588 & -1.2978 \\ 2.1213 & -1.7778 & -1.6042 & -0.99415 \\ 0 & 0.34246 & -0.32749 & -0.91799 \\ 0 & 0 & -0.44659 & 0.10526 \end{pmatrix}$$
$$\vdots$$
$$A_9 = \begin{pmatrix} -0.83412 & -4.1682 & -1.939 & -0.7783 \\ 1.05 & 0.14514 & 0.2171 & 2.5474 \times 10^{-2} \\ 0 & 4.0264 \times 10^{-4} & -0.85029 & -0.61608 \\ 0 & 0 & -1.8263 \times 10^{-2} & 0.53939 \end{pmatrix}$$
Now this is similar to A and the eigenvalues are close to the eigenvalues obtained from
the two blocks on the diagonal,
$$\begin{pmatrix} -0.83412 & -4.1682 \\ 1.05 & 0.14514 \end{pmatrix}, \quad \begin{pmatrix} -0.85029 & -0.61608 \\ -1.8263 \times 10^{-2} & 0.53939 \end{pmatrix}$$
since $4.0264 \times 10^{-4}$ is small. After routine computations involving the quadratic formula,
these are seen to be
$$-0.85834, \quad 0.54744, \quad -0.34449 - 2.0339i, \quad -0.34449 + 2.0339i$$
When these are plugged in to the polynomial equation, you see that each is close to being
a solution of the equation.
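The companion matrix used in Example 15.2.9 is easy to build for any monic polynomial. The following is a minimal sketch in Python, assuming NumPy; the variable names are made up for this illustration. Rather than repeating the hand iteration, it calls NumPy's general eigenvalue routine on the companion matrix and then plugs the computed roots back into the polynomial, which is the check described above.

```python
import numpy as np

# Coefficients of x^4 + x^3 + 4x^2 + x - 2, highest power first
coefficients = np.array([1.0, 1.0, 4.0, 1.0, -2.0])
n = len(coefficients) - 1

# Companion matrix: negated lower order coefficients across the top row,
# ones on the subdiagonal, zeros elsewhere
companion = np.zeros((n, n))
companion[0, :] = -coefficients[1:] / coefficients[0]
companion[1:, :-1] = np.eye(n - 1)

roots = np.linalg.eigvals(companion)
residuals = np.abs(np.polyval(coefficients, roots))

print(companion)
print(roots)
print(residuals)   # each should be near zero
```

The first row of `companion` is (−1, −4, −1, 2), which matches the matrix written in the example.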
It seems like most of the attention to the QR algorithm has to do with finding ways
to get it to “converge” faster. Great and marvelous are the clever tricks which have been
proposed to do this but my intent is to present the basic ideas, not to go in to the numerous
refinements of this algorithm. However, there is one thing which is usually done. It involves
reducing to the case of an upper Hessenberg matrix which is one which is zero below the
main sub diagonal. To see that every matrix is unitarily similar to an upper Hessenberg
matrix , see Problem 1 on Page 278. What follows is a construction which also proves this.
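Let A be an invertible n × n matrix. Let $Q_1'$ be a unitary matrix such that
$$Q_1' \begin{pmatrix} a_{21} \\ \vdots \\ a_{n1} \end{pmatrix} = \begin{pmatrix} \sqrt{\sum_{j=2}^{n} |a_{j1}|^2} \\ 0 \\ \vdots \\ 0 \end{pmatrix} \equiv \begin{pmatrix} a \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
The vector $Q_1'$ is multiplying is just the bottom n − 1 entries of the first column of A. Then
let $Q_1$ be
$$\begin{pmatrix} 1 & 0 \\ 0 & Q_1' \end{pmatrix}$$
It follows
$$Q_1 A Q_1^* = \begin{pmatrix} 1 & 0 \\ 0 & Q_1' \end{pmatrix} A Q_1^* = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a & & & \\ \vdots & & A_1' & \\ 0 & & & \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & Q_1'^* \end{pmatrix} = \begin{pmatrix} * & * & \cdots & * \\ a & & & \\ \vdots & & A_1 & \\ 0 & & & \end{pmatrix}$$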
Now let $Q_2'$ be the (n − 2) × (n − 2) matrix which does to the first column of $A_1$ the same
sort of thing that the (n − 1) × (n − 1) matrix $Q_1'$ did to the first column of A. Let
$$Q_2 \equiv \begin{pmatrix} I & 0 \\ 0 & Q_2' \end{pmatrix}$$
where I is the 2 × 2 identity. Then applying block multiplication,
$$Q_2 Q_1 A Q_1^* Q_2^* = \begin{pmatrix} * & * & \cdots & * & * \\ * & * & \cdots & * & * \\ 0 & * & & & \\ \vdots & \vdots & & A_2 & \\ 0 & 0 & & & \end{pmatrix}$$
where $A_2$ is now an (n − 2) × (n − 2) matrix. Continuing this way you eventually get a unitary
matrix Q which is a product of those discussed above such that
$$Q A Q^* = \begin{pmatrix} * & * & \cdots & * & * \\ * & * & \cdots & * & * \\ 0 & * & * & & \vdots \\ \vdots & \ddots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * & * \end{pmatrix}$$
(When everything is real, $Q^*$ is just $Q^T$.) This matrix equals zero below the subdiagonal. It is called an upper Hessenberg matrix.
It happens that in the QR algorithm, if $A_k$ is upper Hessenberg, so is $A_{k+1}$. To see this,
note that the matrix is upper Hessenberg means that $A_{ij} = 0$ whenever i − j ≥ 2.
$$A_{k+1} = R_k Q_k$$
where $A_k = Q_k R_k$. Therefore as shown before,
$$A_{k+1} = R_k A_k R_k^{-1}$$
Let the ijth entry of $A_k$ be $a^k_{ij}$. Then if i − j ≥ 2
$$a^{k+1}_{ij} = \sum_{p=i}^{n} \sum_{q=1}^{j} r_{ip} a^k_{pq} r^{-1}_{qj}$$
It is given that $a^k_{pq} = 0$ whenever p − q ≥ 2. However, from the above sum,
$$p - q \geq i - j \geq 2$$
and so the sum equals 0.
Since upper Hessenberg matrices stay that way in the algorithm and it is closer to
being upper triangular, it is reasonable to suppose the QR algorithm will yield good results
more quickly for this upper Hessenberg matrix than for the original matrix. This would be
especially true if the matrix is good sized. The other important thing to observe is that,
starting with an upper Hessenberg matrix, the algorithm will restrict the size of the blocks
which occur to being 2 × 2 blocks which are easy to deal with. These blocks allow you to
identify the complex roots.
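The reduction to upper Hessenberg form described above can be sketched with Householder reflections. The code below is only an illustration in Python for real matrices, assuming NumPy; the helper names hessenberg_reduce and is_upper_hessenberg are hypothetical and not from the text. One difference from the construction above is labeled in the comments: for numerical stability the reflection sends the lower part of each column to a multiple of the first unit vector whose sign may be negative, rather than always to the positive number a. The result is still zero below the subdiagonal. The last lines check the claim that one step of the QR algorithm keeps the matrix upper Hessenberg.

```python
import numpy as np


def hessenberg_reduce(A):
    """Return (H, Q) with Q orthogonal and H = Q A Q^T upper Hessenberg.

    Illustrative helper for real matrices; the sign of each subdiagonal
    entry may differ from the construction in the text.
    """
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for col in range(n - 2):
        x = H[col + 1:, col]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                      # column already has the right form
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])  # stable choice of sign
        v /= np.linalg.norm(v)
        P = np.eye(n)
        P[col + 1:, col + 1:] -= 2.0 * np.outer(v, v)  # Householder reflection
        H = P @ H @ P.T
        Q = P @ Q
    return H, Q


def is_upper_hessenberg(M, tol=1e-10):
    """True if all entries with i - j >= 2 are zero up to tol."""
    return np.allclose(np.tril(M, -2), 0.0, atol=tol)


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

H, Q = hessenberg_reduce(A)
print(is_upper_hessenberg(H))

# One step of the QR algorithm applied to H
Q_step, R_step = np.linalg.qr(H)
H_next = R_step @ Q_step
print(is_upper_hessenberg(H_next))
```

In practice one would likely call a library routine for this reduction; the loop above is written out only to mirror the construction in the text.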
15.3 Exercises
In these exercises which call for a computation, don't waste time on them unless you use a
computer or calculator which can raise matrices to powers and take QR factorizations.
1. In Example 15.1.10 an eigenvalue was found correct to several decimal places along
with an eigenvector. Find the other eigenvalues along with their eigenvectors.
2. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically.
In this case the exact eigenvalues are $\pm\sqrt{3}, 6$. Compare with the exact answers.
3. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically.
The exact eigenvalues are $2, 4 + \sqrt{15}, 4 - \sqrt{15}$. Compare your numerical results with
the exact values. Is it much fun to compute the exact eigenvectors?
4. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically.
I don't know the exact eigenvalues in this case. Check your answers by multiplying
your numerically computed eigenvectors by the matrix.
5. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 0 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically.
I don't know the exact eigenvalues in this case. Check your answers by multiplying
your numerically computed eigenvectors by the matrix.
6. Consider the matrix $A = \begin{pmatrix} 3 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 0 \end{pmatrix}$ and the vector $(1, 1, 1)^T$. Find the shortest
distance between the Rayleigh quotient determined by this vector and some eigenvalue
of A.
7. Consider the matrix $A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 4 \\ 1 & 4 & 5 \end{pmatrix}$ and the vector $(1, 1, 1)^T$. Find the shortest
distance between the Rayleigh quotient determined by this vector and some eigenvalue
of A.
8. Consider the matrix $A = \begin{pmatrix} 3 & 2 & 3 \\ 2 & 6 & 4 \\ 3 & 4 & -3 \end{pmatrix}$ and the vector $(1, 1, 1)^T$. Find the shortest
distance between the Rayleigh quotient determined by this vector and some eigenvalue
of A.
9. Using Gerschgorin's theorem, find upper and lower bounds for the eigenvalues of
$A = \begin{pmatrix} 3 & 2 & 3 \\ 2 & 6 & 4 \\ 3 & 4 & -3 \end{pmatrix}$.
10. Tell how to find a matrix whose characteristic polynomial is a given monic polynomial.
This is called a companion matrix. Find the roots of the polynomial x3 + 7x2 + 3x + 7.
11. Find the roots to x4 + 3x3 + 4x2 + x + 1. It has two complex roots.
12. Suppose A is a real symmetric matrix and the technique of reducing to an upper
Hessenberg matrix is followed. Show the resulting upper Hessenberg matrix is actually
equal to 0 on the top as well as the bottom.
Appendix A Matrix Calculator On The Web
A.1 Use Of Matrix Calculator On Web
There is a really nice service on the web which will do all of these things very easily. It is
www.bluebit.gr/matrix-calculator/ To get to it, you can use the address or google matrix
calculator.
When you go to this site, you enter a matrix row by row, placing a space between each
number. When you come to the end of a row, you press enter on the keyboard to start the
next row. After entering the matrix, you select what you want it to do. You will see that it
also solves systems of equations.
Appendix B Positive Matrices
Earlier theorems about Markov matrices were presented. These were matrices in which all
the entries were nonnegative and either the columns or the rows added to 1. It turns out
that many of the theorems presented can be generalized to positive matrices. When this is
done, the resulting theory is mainly due to Perron and Frobenius. I will give an introduction
to this theory here following Karlin and Taylor [19].
Definition B.0.1 For A a matrix or vector, the notation, A >> 0 will mean every entry
of A is positive. By A > 0 is meant that every entry is nonnegative and at least one is
positive. By A ≥ 0 is meant that every entry is nonnegative. Thus the matrix or vector
consisting only of zeros is ≥ 0. An expression like A >> B will mean A − B >> 0 with
similar modifications for > and ≥.
For the sake of this section only, define the following for $x = (x_1, \cdots, x_n)^T$, a vector.
$$|x| \equiv (|x_1|, \cdots, |x_n|)^T.$$
Thus |x| is the vector which results by replacing each entry of x with its absolute value.¹
Also define for $x \in \mathbb{C}^n$,
$$||x||_1 \equiv \sum_k |x_k|.$$
Lemma B.0.2 Let A >> 0 and let x > 0. Then Ax >> 0.
Proof: $(Ax)_i = \sum_j A_{ij} x_j > 0$ because all the $A_{ij} > 0$ and at least one $x_j > 0$.
Lemma B.0.3 Let A >> 0. Define
S ≡ {λ : Ax > λx for some x >> 0} ,
and let
K ≡ {x ≥ 0 such that ||x||1 = 1} .
Now define
S1 ≡ {λ : Ax ≥ λx for some x ∈ K} .
Then
sup (S) = sup (S1 ) .
1 This notation is just about the most abominable thing imaginable. However, it saves space in the
presentation of this theory of positive matrices and avoids the use of new symbols. Please forget about it
when you leave this section.
Proof: Let λ ∈ S. Then there exists x >> 0 such that Ax > λx. Consider y ≡ x/ ||x||1 .
Then ||y||1 = 1 and Ay > λy. Therefore, λ ∈ S1 and so S ⊆ S1 . Therefore, sup (S) ≤
sup (S1 ) .
Now let λ ∈ S1 . Then there exists x ≥ 0 such that ||x||1 = 1 so x > 0 and Ax > λx.
Letting y ≡ Ax, it follows from Lemma B.0.2 that Ay >> λy and y >> 0. Thus λ ∈ S
and so S1 ⊆ S which shows that sup (S1 ) ≤ sup (S) .

This lemma is significant because the set, {x ≥ 0 such that ||x||1 = 1} ≡ K is a compact
set in Rn . Define
λ0 ≡ sup (S) = sup (S1 ) .
(2.1)
The following theorem is due to Perron.
Theorem B.0.4 Let A >> 0 be an n × n matrix and let λ0 be given in 2.1. Then
1. λ0 > 0 and there exists x0 >> 0 such that Ax0 = λ0 x0 so λ0 is an eigenvalue for A.
2. If Ax = µx where x ≠ 0 and µ ≠ λ0, then |µ| < λ0.
3. The eigenspace for λ0 has dimension 1.
Proof: To see λ0 > 0, consider the vector, $e \equiv (1, \cdots, 1)^T$. Then
$$(Ae)_i = \sum_j A_{ij} > 0$$
and so λ0 is at least as large as
$$\min_i \sum_j A_{ij}.$$
Let {λk } be an increasing sequence of numbers from S1 converging to λ0 . Letting xk be
the vector from K which occurs in the definition of S1 , these vectors are in a compact set.
Therefore, there exists a subsequence, still denoted by xk such that xk → x0 ∈ K and
λk → λ0 . Then passing to the limit,
Ax0 ≥ λ0 x0 , x0 > 0.
If Ax0 > λ0 x0 , then letting y ≡ Ax0 , it follows from Lemma B.0.2 that Ay >> λ0 y and
y >> 0. But this contradicts the definition of λ0 as the supremum of the elements of S
because since Ay >> λ0 y, it follows Ay >> (λ0 + ε) y for ε a small positive number.
Therefore, Ax0 = λ0 x0 . It remains to verify that x0 >> 0. But this follows immediately
from
$$0 < \sum_j A_{ij} x_{0j} = (Ax_0)_i = \lambda_0 x_{0i}.$$
This proves 1.
Next suppose Ax = µx and x ̸= 0 and µ ̸= λ0 . Then |Ax| = |µ| |x| . But this implies
A |x| ≥ |µ| |x| . (See the above abominable definition of |x|.)
Case 1: |x| ̸= x and |x| ̸= −x.
In this case, A |x| > |Ax| = |µ| |x| and letting y = A |x| , it follows y >> 0 and
Ay >> |µ| y which shows Ay >> (|µ| + ε) y for sufficiently small positive ε and verifies
|µ| < λ0 .
Case 2: |x| = x or |x| = −x
In this case, the entries of x are all real and have the same sign. Therefore, A |x| =
|Ax| = |µ| |x| . Now let y ≡ |x| / ||x||1 . Then Ay = |µ| y and so |µ| ∈ S1 showing that
|µ| ≤ λ0 . But also, the fact the entries of x all have the same sign shows µ = |µ| and so
µ ∈ S1 . Since µ ̸= λ0 , it must be that µ = |µ| < λ0 . This proves 2.
It remains to verify 3. Suppose then that Ay = λ0 y and for all scalars α, αx0 ̸= y. Then
A Re y = λ0 Re y, A Im y = λ0 Im y.
If Re y = α1 x0 and Im y = α2 x0 for real numbers, αi ,then y = (α1 + iα2 ) x0 and it is
assumed this does not happen. Therefore, either
t Re y ̸= x0 for all t ∈ R
or
t Im y ̸= x0 for all t ∈ R.
Assume the first holds. Then varying t ∈ R, there exists a value of t such that x0 +t Re y > 0
but it is not the case that x0 +t Re y >> 0. Then A (x0 + t Re y) >> 0 by Lemma B.0.2. But
this implies λ0 (x0 + t Re y) >> 0 which is a contradiction. Hence there exist real numbers,
α1 and α2 such that Re y = α1 x0 and Im y = α2 x0 showing that y = (α1 + iα2 ) x0 . This
proves 3.
It is possible to obtain a simple corollary to the above theorem.
Corollary B.0.5 If A > 0 and Am >> 0 for some m ∈ N, then all the conclusions of the
above theorem hold.
Proof: There exists µ0 > 0 such that Am y0 = µ0 y0 for y0 >> 0 by Theorem B.0.4 and
µ0 = sup {µ : Am x ≥ µx for some x ∈ K} .
Let $\lambda_0^m = \mu_0$. Then
$$(A - \lambda_0 I)\left(A^{m-1} + \lambda_0 A^{m-2} + \cdots + \lambda_0^{m-1} I\right) y_0 = \left(A^m - \lambda_0^m I\right) y_0 = 0$$
and so letting $x_0 \equiv \left(A^{m-1} + \lambda_0 A^{m-2} + \cdots + \lambda_0^{m-1} I\right) y_0$, it follows $x_0 >> 0$ and $Ax_0 = \lambda_0 x_0$.
Suppose now that Ax = µx for x ≠ 0 and µ ≠ λ0. Suppose |µ| ≥ λ0. Multiplying both
sides by A, it follows $A^m x = \mu^m x$ and $|\mu^m| = |\mu|^m \geq \lambda_0^m = \mu_0$ and so from Theorem B.0.4,
since $|\mu^m| \geq \mu_0$, and $\mu^m$ is an eigenvalue of $A^m$, it follows that $\mu^m = \mu_0$. But by Theorem
B.0.4 again, this implies $x = c y_0$ for some scalar, c and hence $Ay_0 = \mu y_0$. Since $y_0 >> 0$,
it follows µ ≥ 0 and so µ = λ0, a contradiction. Therefore, |µ| < λ0.
Finally, if Ax = λ0 x, then $A^m x = \lambda_0^m x$ and so $x = c y_0$ for some scalar, c. Consequently,
$$\left(A^{m-1} + \lambda_0 A^{m-2} + \cdots + \lambda_0^{m-1} I\right) x = c\left(A^{m-1} + \lambda_0 A^{m-2} + \cdots + \lambda_0^{m-1} I\right) y_0 = c x_0.$$
Hence
$$m \lambda_0^{m-1} x = c x_0$$
which shows the dimension of the eigenspace for λ0 is one.

The following corollary is an extremely interesting convergence result involving the powers of positive matrices.
Corollary B.0.6 Let A > 0 and $A^m >> 0$ for some m ∈ N. Then for λ0 given in 2.1,
there exists a rank one matrix P such that $\lim_{m \to \infty} \left\| \left(\frac{A}{\lambda_0}\right)^m - P \right\| = 0$.
Proof: Considering AT , and the fact that A and AT have the same eigenvalues, Corollary
B.0.5 implies the existence of a vector, v >> 0 such that
AT v = λ0 v.
Also let x0 denote the vector such that Ax0 = λ0 x0 with x0 >> 0. First note that $x_0^T v > 0$
because both these vectors have all entries positive. Therefore, v may be scaled such that
$$v^T x_0 = x_0^T v = 1. \qquad (2.2)$$
Define
$$P \equiv x_0 v^T.$$
Thanks to 2.2,
$$\frac{A}{\lambda_0} P = x_0 v^T = P, \quad P\left(\frac{A}{\lambda_0}\right) = x_0 v^T \left(\frac{A}{\lambda_0}\right) = x_0 v^T = P, \qquad (2.3)$$
and
$$P^2 = x_0 v^T x_0 v^T = x_0 v^T = P. \qquad (2.4)$$
(
A
−P
λ0
)2
(
=
(
=
A
λ0
A
λ0
)2
(
−2
)2
A
λ0
)
P + P2
− P.
Continuing this way, using 2.3 repeatedly, it follows
(( )
)m ( )m
A
A
−P
=
− P.
(2.5)
λ0
λ0
( )
The eigenvalues of λA0 − P are of interest because it is powers of this matrix which
( )m
to P. Therefore, let µ be a nonzero eigenvalue of this
determine the convergence of λA0
matrix. Thus
(( )
)
A
− P x = µx
(2.6)
λ0
for x ≠ 0, and µ ≠ 0. Applying P to both sides and using the second formula of 2.3 yields
$$0 = \left(P - P^2\right) x = P\left(\left(\frac{A}{\lambda_0}\right) - P\right) x = \mu P x.$$
Since µ ≠ 0, this gives Px = 0, and so it follows from 2.6 that
$$Ax = \lambda_0 \mu x$$
which implies λ0 µ is an eigenvalue of A. Therefore, by Corollary B.0.5 it follows that either
λ0 µ = λ0 in which case µ = 1, or λ0 |µ| < λ0 which implies |µ| < 1. But if µ = 1, then x is
a multiple of x0 and 2.6 would yield
$$\left(\left(\frac{A}{\lambda_0}\right) - P\right) x_0 = x_0$$
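which says $x_0 - x_0 v^T x_0 = x_0$ and so by 2.2, x0 = 0 contrary to the property that x0 >> 0.
Therefore, |µ| < 1 and so this has shown that the absolute values of all eigenvalues of
$\left(\frac{A}{\lambda_0}\right) - P$ are less than 1. By Gelfand's theorem, Theorem 14.3.3, it follows
$$\left\| \left(\left(\frac{A}{\lambda_0}\right) - P\right)^m \right\|^{1/m} < r < 1$$
whenever m is large enough. Now by 2.5 this yields
$$\left\| \left(\frac{A}{\lambda_0}\right)^m - P \right\| = \left\| \left(\left(\frac{A}{\lambda_0}\right) - P\right)^m \right\| \leq r^m$$
whenever m is large enough. It follows
$$\lim_{m \to \infty} \left\| \left(\frac{A}{\lambda_0}\right)^m - P \right\| = 0$$
as claimed.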
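The convergence in Corollary B.0.6 is easy to watch numerically. The following is a minimal sketch in Python, assuming NumPy; the helper name perron_pair and the particular 3 × 3 matrix are made up for this illustration. It finds λ0 and x0 for A and the vector v for $A^T$ using a general eigenvalue routine, scales v so that $v^T x_0 = 1$ as in 2.2, forms $P = x_0 v^T$, and prints the norm of $(A/\lambda_0)^m - P$ for a few values of m.

```python
import numpy as np


def perron_pair(M):
    """Return (lam0, x) with M x = lam0 x, where lam0 is the eigenvalue of
    largest real part and x is scaled so its entries sum to 1.

    Illustrative helper; it assumes M has all positive entries so that the
    theory above applies and the entries of x all have the same sign.
    """
    eigenvalues, eigenvectors = np.linalg.eig(M)
    i = np.argmax(eigenvalues.real)
    x = eigenvectors[:, i].real
    return eigenvalues[i].real, x / x.sum()   # dividing by the sum fixes the sign


A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

lam0, x0 = perron_pair(A)
_, v = perron_pair(A.T)
v = v / (v @ x0)          # scale so that v^T x0 = 1, as in 2.2
P = np.outer(x0, v)       # the rank one matrix P = x0 v^T

for m in (1, 5, 10, 50):
    error = np.linalg.norm(np.linalg.matrix_power(A / lam0, m) - P)
    print(m, error)
```

The printed errors should shrink roughly geometrically, at a rate governed by the ratio of the second largest eigenvalue in absolute value to λ0.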
What about the case when A > 0 but maybe it is not the case that A >> 0? As before,
K ≡ {x ≥ 0 such that ||x||1 = 1} .
Now define
S1 ≡ {λ : Ax ≥ λx for some x ∈ K}
and
λ0 ≡ sup (S1 )
(2.7)
Theorem B.0.7 Let A > 0 and let λ0 be defined in 2.7. Then there exists x0 > 0 such
that Ax0 = λ0 x0 .
Proof: Let E consist of the matrix which has a one in every entry. Then from Theorem
B.0.4 it follows there exists xδ >> 0 , ||xδ ||1 = 1, such that (A + δE) xδ = λ0δ xδ where
λ0δ ≡ sup {λ : (A + δE) x ≥ λx for some x ∈ K} .
Now if α < δ
{λ : (A + αE) x ≥ λx for some x ∈ K} ⊆
{λ : (A + δE) x ≥ λx for some x ∈ K}
and so λ0δ ≥ λ0α because λ0δ is the sup of the second set and λ0α is the sup of the first. It
follows the limit, λ1 ≡ limδ→0+ λ0δ exists. Taking a subsequence and using the compactness
of K, there exists a subsequence, still denoted by δ such that as δ → 0, xδ → x ∈ K.
Therefore,
Ax = λ1 x
and so, in particular, Ax ≥ λ1 x and so λ1 ≤ λ0 . But also, if λ ≤ λ0 ,
λx ≤ Ax < (A + δE) x
showing that λ0δ ≥ λ for all such λ. But then λ0δ ≥ λ0 also. Hence λ1 ≥ λ0 , showing these
two numbers are the same. Hence Ax = λ0 x.

If $A^m >> 0$ for some m and A > 0, it follows that the dimension of the eigenspace for
λ0 is one and that the absolute value of every other eigenvalue of A is less than λ0 . If it is
only assumed that A > 0, not necessarily >> 0, this is no longer true. However, there is
something which is very interesting which can be said. First here is an interesting lemma.
Lemma B.0.8 Let M be a matrix of the form
$$M = \begin{pmatrix} A & 0 \\ B & C \end{pmatrix} \quad \text{or} \quad M = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}$$
where A is an r × r matrix and C is an (n − r) × (n − r) matrix. Then det (M ) =
det (A) det (C) and σ (M ) = σ (A) ∪ σ (C) .
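Proof: To verify the claim about the determinants, note
$$\begin{pmatrix} A & 0 \\ B & C \end{pmatrix} = \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} I & 0 \\ B & C \end{pmatrix}$$
Therefore,
$$\det \begin{pmatrix} A & 0 \\ B & C \end{pmatrix} = \det \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix} \det \begin{pmatrix} I & 0 \\ B & C \end{pmatrix}.$$
But it is clear from the method of Laplace expansion that
$$\det \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix} = \det A$$
and from the multilinear properties of the determinant and row operations that
$$\det \begin{pmatrix} I & 0 \\ B & C \end{pmatrix} = \det \begin{pmatrix} I & 0 \\ 0 & C \end{pmatrix} = \det C.$$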
The case where M is upper block triangular is similar.
This immediately implies σ (M ) = σ (A) ∪ σ (C) .
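A quick numerical sanity check of Lemma B.0.8 may be reassuring. This is just a sketch in Python, assuming NumPy, with small blocks A, B, C picked arbitrarily for the illustration (they are not from the text). The blocks A and C were chosen to have real eigenvalues so that the two spectra can be compared after sorting.

```python
import numpy as np

# Arbitrary illustrative blocks; A and C are triangular so their
# eigenvalues (2, 3 and -1, 4) can be read off the diagonals.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])
C = np.array([[-1.0, 2.0],
              [0.0, 4.0]])
Z = np.zeros((2, 2))

M_lower = np.block([[A, Z], [B, C]])   # block lower triangular
M_upper = np.block([[A, B], [Z, C]])   # block upper triangular

block_spectrum = np.sort(
    np.concatenate([np.linalg.eigvals(A), np.linalg.eigvals(C)]).real
)

for M in (M_lower, M_upper):
    det_matches = np.isclose(np.linalg.det(M),
                             np.linalg.det(A) * np.linalg.det(C))
    spectrum_matches = np.allclose(np.sort(np.linalg.eigvals(M).real),
                                   block_spectrum)
    print(det_matches, spectrum_matches)
```

Note that B plays no role in either the determinant or the spectrum, which is the content of the lemma.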
Theorem B.0.9 Let A > 0 and let λ0 be given in 2.7. If λ is an eigenvalue for A such
that |λ| = λ0 , then λ/λ0 is a root of unity. Thus $(\lambda/\lambda_0)^m = 1$ for some m ∈ N.
Proof: Applying Theorem B.0.7 to AT , there exists v > 0 such that AT v = λ0 v. In
the first part of the argument it is assumed v >> 0. Now suppose Ax = λx, x ̸= 0 and that
|λ| = λ0 . Then
$$A|x| \geq |\lambda| |x| = \lambda_0 |x|$$
and it follows that if $A|x| > |\lambda||x|$, then since v >> 0,
$$\lambda_0 (v, |x|) < (v, A|x|) = \left(A^T v, |x|\right) = \lambda_0 (v, |x|),$$
a contradiction. Therefore,
$$A|x| = \lambda_0 |x|.$$
It follows that
$$\left| \sum_j A_{ij} x_j \right| = \lambda_0 |x_i| = \sum_j A_{ij} |x_j| \qquad (2.8)$$
and so the complex numbers,
$$A_{ij} x_j, \quad A_{ik} x_k$$
must have the same argument for every k, j because equality holds in the triangle inequality. Therefore, there exists a complex number, $\mu_i$ such that
$$A_{ij} x_j = \mu_i A_{ij} |x_j| \qquad (2.9)$$
and so, letting r ∈ N,
$$A_{ij} x_j \mu_j^r = \mu_i A_{ij} |x_j| \mu_j^r.$$
Summing on j yields
$$\sum_j A_{ij} x_j \mu_j^r = \mu_i \sum_j A_{ij} |x_j| \mu_j^r. \qquad (2.10)$$
Also, summing 2.9 on j and using that λ is an eigenvalue for x, it follows from 2.8 that
$$\lambda x_i = \sum_j A_{ij} x_j = \mu_i \sum_j A_{ij} |x_j| = \mu_i \lambda_0 |x_i|. \qquad (2.11)$$
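From 2.10 and 2.11,
$$\sum_j A_{ij} x_j \mu_j^r = \mu_i \sum_j A_{ij} |x_j| \mu_j^r = \mu_i \sum_j A_{ij} \overbrace{\mu_j |x_j|}^{\text{see 2.11}} \mu_j^{r-1} = \mu_i \sum_j A_{ij} \left(\frac{\lambda}{\lambda_0}\right) x_j \mu_j^{r-1} = \mu_i \left(\frac{\lambda}{\lambda_0}\right) \sum_j A_{ij} x_j \mu_j^{r-1}$$
Now from 2.10 with r replaced by r − 1, this equals
$$\mu_i^2 \left(\frac{\lambda}{\lambda_0}\right) \sum_j A_{ij} |x_j| \mu_j^{r-1} = \mu_i^2 \left(\frac{\lambda}{\lambda_0}\right) \sum_j A_{ij} \mu_j |x_j| \mu_j^{r-2} = \mu_i^2 \left(\frac{\lambda}{\lambda_0}\right)^2 \sum_j A_{ij} x_j \mu_j^{r-2}.$$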
Continuing this way,
∑
(
Aij xj µrj
=
µki
j
λ
λ0
)k ∑
Aij xj µr−k
j
j
and eventually, this shows
∑
(
Aij xj µrj
µri
=
j
(
=
(
and this says
λ
λ0
)r+1
(
is an eigenvalue for
A
λ0
λ
λ0
λ
λ0
)r
)r ∑
Aij xj
j
λ (xi µri )
)
with the eigenvector being
T
(x1 µr1 , · · · , xn µrn ) .
410
APPENDIX B. POSITIVE MATRICES
( )2 ( )3 ( )4
Now recall that r ∈ N was arbitrary and so this has shown that λλ0 , λλ0 , λλ0 , · · ·
( )
are each eigenvalues of λA0 which has only finitely many and hence this sequence must
( )
repeat. Therefore, λλ0 is a root of unity as claimed. This proves the theorem in the case
that v >> 0.
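Now it is necessary to consider the case where v > 0 but it is not the case that v >> 0. Then in this case, there exists a permutation matrix P such that
\[ Pv = (v_1, \cdots, v_r, 0, \cdots, 0)^T \equiv \begin{pmatrix} u \\ 0 \end{pmatrix} \equiv \mathbf{v}_1 \]
where u >> 0. The matrix P can be chosen to be a product of disjoint transpositions, each one exchanging a zero entry among the first r positions with a nonzero entry among the last n − r positions, and for such a choice $P^2 = I$, so that $v = P\mathbf{v}_1$. Then
\[ \lambda_0 v = A^T v = A^T P \mathbf{v}_1. \]
Therefore,
\[ \lambda_0 \mathbf{v}_1 = P A^T P \mathbf{v}_1 = G \mathbf{v}_1. \]
Since $P^2 = I$, the matrix $G \equiv P A^T P$ is similar to $A^T$, which has the same eigenvalues as A. Consequently, G and A have the same eigenvalues and it suffices from now on to consider the matrix G rather than A. Then
\[ \lambda_0 \begin{pmatrix} u \\ 0 \end{pmatrix} = \begin{pmatrix} M_1 & M_2 \\ M_3 & M_4 \end{pmatrix} \begin{pmatrix} u \\ 0 \end{pmatrix} \]
where $M_1$ is r × r and $M_4$ is (n − r) × (n − r). The bottom block of this equation says $M_3 u = 0$, and since u >> 0 and the entries of $M_3$ are nonnegative, this forces $M_3 = 0$. Thus it follows from block multiplication and the assumption that A and hence G are > 0 that
\[ G = \begin{pmatrix} A' & B \\ 0 & C \end{pmatrix}. \]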
Now let λ be an eigenvalue of G such that |λ| = λ0. Then from Lemma B.0.8, either λ ∈ σ (A′) or λ ∈ σ (C). Suppose without loss of generality that λ ∈ σ (A′). Since A′ > 0 it has a largest positive eigenvalue λ′0 which is obtained from 2.7. Thus λ′0 ≤ λ0 but λ, being an eigenvalue of A′, has its absolute value bounded by λ′0 and so λ0 = |λ| ≤ λ′0 ≤ λ0, showing that λ0 ∈ σ (A′). Now if there exists v >> 0 such that $(A')^T v = \lambda_0 v$, then the first part of this proof applies to the matrix A′ and so λ/λ0 is a root of unity. If such a vector v does not exist, then let A′ play the role of A in the above argument and reduce to the consideration of
\[ G' \equiv \begin{pmatrix} A'' & B' \\ 0 & C' \end{pmatrix} \]
where G′ is similar to A′ and λ, λ0 ∈ σ (A′′). Stop if $(A'')^T v = \lambda_0 v$ for some v >> 0. Otherwise, decompose A′′ similar to the above and add another prime. Continuing this way you must eventually obtain the situation where $(A'^{\cdots\prime})^T v = \lambda_0 v$ for some v >> 0. Indeed, this happens no later than when $A'^{\cdots\prime}$ is a 1 × 1 matrix.
Appendix C
Functions Of Matrices
The existence of the Jordan form also makes it possible to define various functions of matrices. Suppose
\[ f(\lambda) = \sum_{n=0}^{\infty} a_n \lambda^n \tag{3.1} \]
for all |λ| < R. There is a formula for $f(A) \equiv \sum_{n=0}^{\infty} a_n A^n$ which makes sense whenever ρ (A) < R. Thus you can speak of sin (A) or $e^A$ for A an n × n matrix. To begin with, define
\[ f_P(\lambda) \equiv \sum_{n=0}^{P} a_n \lambda^n \]
so for k < P
\[ f_P^{(k)}(\lambda) = \sum_{n=k}^{P} a_n\, n \cdots (n-k+1)\, \lambda^{n-k} = \sum_{n=k}^{P} a_n \binom{n}{k} k!\, \lambda^{n-k}. \tag{3.2} \]
Thus
\[ \frac{f_P^{(k)}(\lambda)}{k!} = \sum_{n=k}^{P} a_n \binom{n}{k} \lambda^{n-k}. \tag{3.3} \]
To begin with consider $f(J_m(\lambda))$ where $J_m(\lambda)$ is an m × m Jordan block. Thus $J_m(\lambda) = D + N$ where $N^m = 0$ and N commutes with D. Therefore, letting P > m
\[ \sum_{n=0}^{P} a_n J_m(\lambda)^n = \sum_{n=0}^{P} a_n \sum_{k=0}^{n} \binom{n}{k} D^{n-k} N^k = \sum_{k=0}^{P} \sum_{n=k}^{P} a_n \binom{n}{k} D^{n-k} N^k = \sum_{k=0}^{m-1} N^k \sum_{n=k}^{P} a_n \binom{n}{k} D^{n-k}. \tag{3.4} \]
From 3.3 this equals
\[ \sum_{k=0}^{m-1} N^k \operatorname{diag}\left( \frac{f_P^{(k)}(\lambda)}{k!}, \cdots, \frac{f_P^{(k)}(\lambda)}{k!} \right) \tag{3.5} \]
where for k = 0, · · · , m − 1, define $\operatorname{diag}_k(a_1, \cdots, a_{m-k})$ to be the m × m matrix which equals zero everywhere except on the k th super diagonal where this diagonal is filled with the numbers $\{a_1, \cdots, a_{m-k}\}$ from the upper left to the lower right. With no subscript, it is just the diagonal matrix having the indicated entries. Thus in 4 × 4 matrices, $\operatorname{diag}_2(1, 2)$ would be the matrix
\[ \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]
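Then from 3.5 and 3.2,
\[ \sum_{n=0}^{P} a_n J_m(\lambda)^n = \sum_{k=0}^{m-1} \operatorname{diag}_k\left( \frac{f_P^{(k)}(\lambda)}{k!}, \cdots, \frac{f_P^{(k)}(\lambda)}{k!} \right). \]
Therefore,
\[ \sum_{n=0}^{P} a_n J_m(\lambda)^n = \begin{pmatrix} f_P(\lambda) & \frac{f_P'(\lambda)}{1!} & \frac{f_P^{(2)}(\lambda)}{2!} & \cdots & \frac{f_P^{(m-1)}(\lambda)}{(m-1)!} \\ & f_P(\lambda) & \frac{f_P'(\lambda)}{1!} & \ddots & \vdots \\ & & f_P(\lambda) & \ddots & \frac{f_P^{(2)}(\lambda)}{2!} \\ & & & \ddots & \frac{f_P'(\lambda)}{1!} \\ 0 & & & & f_P(\lambda) \end{pmatrix} \tag{3.6} \]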
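Formula 3.6 can be checked numerically on a single Jordan block. The sketch below takes f to be the exponential function, for which every derivative is again the exponential, and compares a partial sum of the series with the upper triangular matrix of 3.6. The function names, the block size m = 4 and the value λ = 0.5 are assumptions made only for this illustration.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def jordan_block(lam, m):
    """m x m Jordan block: lam on the diagonal, ones on the first super diagonal."""
    return [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(m)]
            for i in range(m)]

def exp_series(X, terms=40):
    """Partial sum of sum_n X^n / n! (the polynomial f_P evaluated at X)."""
    n = len(X)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = mat_mul(term, X)
        term = [[v / k for v in row] for row in term]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def exp_by_formula(lam, m):
    """Entry (i, j) with j >= i is f^(j-i)(lam) / (j-i)!, here with f = exp."""
    return [[math.exp(lam) / math.factorial(j - i) if j >= i else 0.0
             for j in range(m)] for i in range(m)]

lam, m = 0.5, 4
series = exp_series(jordan_block(lam, m))
formula = exp_by_formula(lam, m)
max_gap = max(abs(series[i][j] - formula[i][j])
              for i in range(m) for j in range(m))
print(max_gap)
```

The two matrices agree to rounding error, with the k th super diagonal carrying the constant value $e^{\lambda}/k!$.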
Now let A be an n × n matrix with ρ (A) < R where R is given above. Then the Jordan form of A is of the form
\[ J = \begin{pmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & \ddots & \\ 0 & & & J_r \end{pmatrix} \tag{3.7} \]
where $J_k = J_{m_k}(\lambda_k)$ is an $m_k \times m_k$ Jordan block and $A = S^{-1}JS$. Then, letting $P > m_k$ for all k,
\[ \sum_{n=0}^{P} a_n A^n = S^{-1} \sum_{n=0}^{P} a_n J^n S, \]
and because of block multiplication of matrices,
\[ \sum_{n=0}^{P} a_n J^n = \begin{pmatrix} \sum_{n=0}^{P} a_n J_1^n & & 0 \\ & \ddots & \\ 0 & & \sum_{n=0}^{P} a_n J_r^n \end{pmatrix} \]
and from 3.6 $\sum_{n=0}^{P} a_n J_k^n$ converges as P → ∞ to the $m_k \times m_k$ matrix
\[ \begin{pmatrix} f(\lambda_k) & \frac{f'(\lambda_k)}{1!} & \frac{f^{(2)}(\lambda_k)}{2!} & \cdots & \frac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ 0 & f(\lambda_k) & \frac{f'(\lambda_k)}{1!} & \ddots & \vdots \\ 0 & 0 & f(\lambda_k) & \ddots & \frac{f^{(2)}(\lambda_k)}{2!} \\ \vdots & & \ddots & \ddots & \frac{f'(\lambda_k)}{1!} \\ 0 & 0 & \cdots & 0 & f(\lambda_k) \end{pmatrix} \tag{3.8} \]
There is no convergence problem because |λ| < R for all λ ∈ σ (A) . This has proved the
following theorem.
Theorem C.0.10 Let f be given by 3.1 and suppose ρ (A) < R where R is the radius of convergence of the power series in 3.1. Then the series,
\[ \sum_{n=0}^{\infty} a_n A^n \tag{3.9} \]
converges in the space $L(F^n, F^n)$ with respect to any of the norms on this space and furthermore,
\[ \sum_{n=0}^{\infty} a_n A^n = S^{-1} \begin{pmatrix} \sum_{n=0}^{\infty} a_n J_1^n & & 0 \\ & \ddots & \\ 0 & & \sum_{n=0}^{\infty} a_n J_r^n \end{pmatrix} S \]
where $\sum_{n=0}^{\infty} a_n J_k^n$ is an $m_k \times m_k$ matrix of the form given in 3.8 where $A = S^{-1}JS$ and the Jordan form of A, J is given by 3.7. Therefore, you can define f (A) by the series in 3.9.
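Here is a simple example.
Example C.0.11 Find sin (A) where
\[ A = \begin{pmatrix} 4 & 1 & -1 & 1 \\ 1 & 1 & 0 & -1 \\ 0 & -1 & 1 & -1 \\ -1 & 2 & 1 & 4 \end{pmatrix}. \]
In this case, the Jordan canonical form of the matrix is not too hard to find.
\[ \begin{pmatrix} 4 & 1 & -1 & 1 \\ 1 & 1 & 0 & -1 \\ 0 & -1 & 1 & -1 \\ -1 & 2 & 1 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 0 & -2 & -1 \\ 1 & -4 & -2 & -1 \\ 0 & 0 & -2 & 1 \\ -1 & 4 & 4 & 2 \end{pmatrix} \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{8} & -\frac{3}{8} & 0 & -\frac{1}{8} \\ 0 & \frac{1}{4} & -\frac{1}{4} & \frac{1}{4} \\ 0 & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \end{pmatrix} \]
Then from the above theorem sin (J) is given by
\[ \sin \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} = \begin{pmatrix} \sin 4 & 0 & 0 & 0 \\ 0 & \sin 2 & \cos 2 & \frac{-\sin 2}{2} \\ 0 & 0 & \sin 2 & \cos 2 \\ 0 & 0 & 0 & \sin 2 \end{pmatrix}. \]
Therefore, sin (A) =
\[ \begin{pmatrix} 2 & 0 & -2 & -1 \\ 1 & -4 & -2 & -1 \\ 0 & 0 & -2 & 1 \\ -1 & 4 & 4 & 2 \end{pmatrix} \begin{pmatrix} \sin 4 & 0 & 0 & 0 \\ 0 & \sin 2 & \cos 2 & \frac{-\sin 2}{2} \\ 0 & 0 & \sin 2 & \cos 2 \\ 0 & 0 & 0 & \sin 2 \end{pmatrix} \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{8} & -\frac{3}{8} & 0 & -\frac{1}{8} \\ 0 & \frac{1}{4} & -\frac{1}{4} & \frac{1}{4} \\ 0 & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \end{pmatrix} = M \]
where the columns of M are as follows from left to right,
\[ \begin{pmatrix} \sin 4 \\ \frac{1}{2}\sin 4 - \frac{1}{2}\sin 2 \\ 0 \\ -\frac{1}{2}\sin 4 + \frac{1}{2}\sin 2 \end{pmatrix}, \begin{pmatrix} \sin 4 - \sin 2 - \cos 2 \\ \frac{1}{2}\sin 4 + \frac{3}{2}\sin 2 - 2\cos 2 \\ -\cos 2 \\ -\frac{1}{2}\sin 4 - \frac{1}{2}\sin 2 + 3\cos 2 \end{pmatrix}, \begin{pmatrix} -\cos 2 \\ \sin 2 \\ \sin 2 - \cos 2 \\ \cos 2 - \sin 2 \end{pmatrix}, \begin{pmatrix} \sin 4 - \sin 2 - \cos 2 \\ \frac{1}{2}\sin 4 + \frac{1}{2}\sin 2 - 2\cos 2 \\ -\cos 2 \\ -\frac{1}{2}\sin 4 + \frac{1}{2}\sin 2 + 3\cos 2 \end{pmatrix}. \]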
Perhaps this isn’t the first thing you would think of. Of course the ability to get this nice
closed form description of sin (A) was dependent on being able to find the Jordan form along
with a similarity transformation which will yield the Jordan form.
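The closed form in Example C.0.11 can be compared against a direct partial sum of the sine series $\sum_k (-1)^k A^{2k+1}/(2k+1)!$. The following sketch does this in plain Python; the helper names and the number of terms are assumptions for illustration, and the matrix `closed_form` is typed in from the four columns of M listed in the example.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sin_series(X, terms=40):
    """Partial sum of sum_k (-1)^k X^(2k+1) / (2k+1)!."""
    n = len(X)
    X2 = mat_mul(X, X)
    term = [[float(v) for v in row] for row in X]
    total = [row[:] for row in term]
    for k in range(1, terms):
        # next term = previous term * X^2 * (-1) / ((2k)(2k+1))
        term = mat_mul(term, X2)
        scale = -1.0 / ((2 * k) * (2 * k + 1))
        term = [[scale * v for v in row] for row in term]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

A = [[4, 1, -1, 1],
     [1, 1, 0, -1],
     [0, -1, 1, -1],
     [-1, 2, 1, 4]]

s4, s2, c2 = math.sin(4), math.sin(2), math.cos(2)
closed_form = [
    [s4, s4 - s2 - c2, -c2, s4 - s2 - c2],
    [s4 / 2 - s2 / 2, s4 / 2 + 3 * s2 / 2 - 2 * c2, s2, s4 / 2 + s2 / 2 - 2 * c2],
    [0.0, -c2, s2 - c2, -c2],
    [-s4 / 2 + s2 / 2, -s4 / 2 - s2 / 2 + 3 * c2, c2 - s2, -s4 / 2 + s2 / 2 + 3 * c2],
]

numeric = sin_series(A)
max_gap = max(abs(numeric[i][j] - closed_form[i][j])
              for i in range(4) for j in range(4))
trace_gap = abs(sum(closed_form[i][i] for i in range(4)) - (s4 + 3 * s2))
print(max_gap, trace_gap)
```

The trace comparison reflects the fact that the eigenvalues of sin (A) are sin 4 once and sin 2 three times.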
The following corollary is known as the spectral mapping theorem.
Corollary C.0.12 Let A be an n × n matrix and let ρ (A) < R where for |λ| < R,
\[ f(\lambda) = \sum_{n=0}^{\infty} a_n \lambda^n. \]
Then f (A) is also an n × n matrix and furthermore, σ (f (A)) = f (σ (A)) . Thus the eigenvalues of f (A) are exactly the numbers f (λ) where λ is an eigenvalue of A. Furthermore,
the algebraic multiplicity of f (λ) coincides with the algebraic multiplicity of λ.
All of these things can be generalized to linear transformations defined on infinite dimensional spaces and when this is done the main tool is the Dunford integral along with
the methods of complex analysis. It is good to see it done for finite dimensional situations
first because it gives an idea of what is possible. Actually, some of the most interesting
functions in applications do not come in the above form as a power series expanded about
0. One example of this situation has already been encountered in the proof of the right
polar decomposition with the square root of an Hermitian transformation which had all
nonnegative eigenvalues. Another example is that of taking the positive part of an Hermitian matrix. This is important in some physical models where something may depend on
the positive part of the strain which is a symmetric real matrix. Obviously there is no way
to consider this as a power series expanded about 0 because the function f (r) = r+ is not
even differentiable at 0. Therefore, a totally different approach must be considered. First
the notion of a positive part is defined.
Definition C.0.13 Let A be an Hermitian matrix. Thus it suffices to consider A as an element of $L(F^n, F^n)$ according to the usual notion of matrix multiplication. Then there exists an orthonormal basis of eigenvectors, $\{u_1, \cdots, u_n\}$ such that
\[ A = \sum_{j=1}^{n} \lambda_j u_j \otimes u_j, \]
for $\lambda_j$ the eigenvalues of A, all real. Define
\[ A^+ \equiv \sum_{j=1}^{n} \lambda_j^+ u_j \otimes u_j \]
where $\lambda^+ \equiv \frac{|\lambda| + \lambda}{2}$.
This gives us a nice definition of what is meant but it turns out to be very important in
the applications to determine how this function depends on the choice of symmetric matrix
A. The following addresses this question.
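Theorem C.0.14 If A, B are Hermitian matrices, then for |·| the Frobenius norm,
\[ |A^+ - B^+| \leq |A - B|. \]
Proof: Let $A = \sum_i \lambda_i v_i \otimes v_i$ and let $B = \sum_j \mu_j w_j \otimes w_j$ where $\{v_i\}$ and $\{w_j\}$ are orthonormal bases of eigenvectors.
\[ |A^+ - B^+|^2 = \operatorname{trace}\left( \sum_i \lambda_i^+ v_i \otimes v_i - \sum_j \mu_j^+ w_j \otimes w_j \right)^2 \]
\[ = \operatorname{trace}\left[ \sum_i (\lambda_i^+)^2 v_i \otimes v_i + \sum_j (\mu_j^+)^2 w_j \otimes w_j - \sum_{i,j} \lambda_i^+ \mu_j^+ (w_j, v_i)\, v_i \otimes w_j - \sum_{i,j} \lambda_i^+ \mu_j^+ (v_i, w_j)\, w_j \otimes v_i \right] \]
Since the trace of $v_i \otimes w_j$ is $(v_i, w_j)$, a fact which follows from $(v_i, w_j)$ being the only possibly nonzero eigenvalue,
\[ = \sum_i (\lambda_i^+)^2 + \sum_j (\mu_j^+)^2 - 2 \sum_{i,j} \lambda_i^+ \mu_j^+ |(v_i, w_j)|^2. \tag{3.10} \]
Since these are orthonormal bases,
\[ \sum_i |(v_i, w_j)|^2 = 1 = \sum_j |(v_i, w_j)|^2 \]
and so 3.10 equals
\[ \sum_i \sum_j \left( (\lambda_i^+)^2 + (\mu_j^+)^2 - 2\lambda_i^+ \mu_j^+ \right) |(v_i, w_j)|^2. \]
Similarly,
\[ |A - B|^2 = \sum_i \sum_j \left( (\lambda_i)^2 + (\mu_j)^2 - 2\lambda_i \mu_j \right) |(v_i, w_j)|^2. \]
Now it is easy to check that $(\lambda_i)^2 + (\mu_j)^2 - 2\lambda_i \mu_j \geq (\lambda_i^+)^2 + (\mu_j^+)^2 - 2\lambda_i^+ \mu_j^+$. Indeed, the two sides are $(\lambda_i - \mu_j)^2$ and $(\lambda_i^+ - \mu_j^+)^2$, and $|r^+ - s^+| \leq |r - s|$ for all real r, s.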
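For real symmetric 2 × 2 matrices the positive part can be written down without computing eigenvectors, which gives a quick numerical test of Theorem C.0.14. The sketch below is restricted to that special case and uses the spectral projections $P_1 = (S - \lambda_2 I)/(\lambda_1 - \lambda_2)$ and $P_2 = I - P_1$; the function names and the random test matrices are assumptions made for this illustration.

```python
import math
import random

def positive_part(S):
    """Positive part of a real symmetric 2x2 matrix S = [[a, b], [b, d]]."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = mean + radius, mean - radius       # eigenvalues, lam1 >= lam2
    pos1, pos2 = max(lam1, 0.0), max(lam2, 0.0)     # lam^+ = (|lam| + lam) / 2
    if radius == 0.0:
        return [[pos1, 0.0], [0.0, pos1]]           # S is a multiple of I
    gap = lam1 - lam2
    P1 = [[(a - lam2) / gap, b / gap], [b / gap, (d - lam2) / gap]]
    P2 = [[1.0 - P1[0][0], -P1[0][1]], [-P1[1][0], 1.0 - P1[1][1]]]
    return [[pos1 * P1[i][j] + pos2 * P2[i][j] for j in range(2)]
            for i in range(2)]

def frobenius_distance(X, Y):
    """Frobenius norm of X - Y."""
    return math.sqrt(sum((X[i][j] - Y[i][j]) ** 2
                         for i in range(2) for j in range(2)))

def random_symmetric(rng):
    a, b, d = (rng.uniform(-1.0, 1.0) for _ in range(3))
    return [[a, b], [b, d]]

rng = random.Random(0)
min_slack = float("inf")
for _ in range(2000):
    A, B = random_symmetric(rng), random_symmetric(rng)
    slack = (frobenius_distance(A, B)
             - frobenius_distance(positive_part(A), positive_part(B)))
    min_slack = min(min_slack, slack)

diag_example = positive_part([[1.0, 0.0], [0.0, -1.0]])
print(min_slack, diag_example)
```

In every sampled pair the distance between the positive parts does not exceed the distance between the matrices, up to rounding.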
Appendix D
Differential Equations
D.1 Theory Of Ordinary Differential Equations
Here I will present fundamental existence and uniqueness theorems for initial value problems
for the differential equation,
x′ = f (t, x) .
Suppose that $f : [a, b] \times R^n \to R^n$ satisfies the following two conditions.
\[ |f(t, x) - f(t, x_1)| \leq K |x - x_1|, \tag{4.1} \]
\[ f \text{ is continuous.} \tag{4.2} \]
The first of these conditions is known as a Lipschitz condition.
Lemma D.1.1 Suppose $x : [a, b] \to R^n$ is a continuous function and c ∈ [a, b]. Then x is a solution to the initial value problem,
\[ x' = f(t, x), \quad x(c) = x_0 \tag{4.3} \]
if and only if x is a solution to the integral equation,
\[ x(t) = x_0 + \int_c^t f(s, x(s))\, ds. \tag{4.4} \]
Proof: If x solves 4.4, then since f is continuous, we may apply the fundamental theorem of calculus to differentiate both sides and obtain x′ (t) = f (t, x (t)). Also, letting t = c on both sides, gives x (c) = x0. Conversely, if x is a solution of the initial value problem, we may integrate both sides from c to t to see that x solves 4.4.
Theorem D.1.2 Let f satisfy 4.1 and 4.2. Then there exists a unique solution to the initial value problem, 4.3 on the interval [a, b].
Proof: Let $\|x\|_\lambda \equiv \sup\{ |e^{\lambda t} x(t)| : t \in [a, b] \}$. Then this norm is equivalent to the usual norm on $BC([a, b], F^n)$ described in Example 14.6.2. This means that for ||·|| the norm given there, there exist constants δ and ∆ such that
\[ \delta \|x\| \leq \|x\|_\lambda \leq \Delta \|x\| \]
for all $x \in BC([a, b], F^n)$. In fact, you can take $\delta \equiv e^{\lambda a}$ and $\Delta \equiv e^{\lambda b}$ in case λ > 0 with the two reversed in case λ < 0. Thus $BC([a, b], F^n)$ is a Banach space with this norm, $\|\cdot\|_\lambda$. Then let $F : BC([a, b], F^n) \to BC([a, b], F^n)$ be defined by
\[ F x(t) \equiv x_0 + \int_c^t f(s, x(s))\, ds. \]
Let λ < 0. It follows
\[ e^{\lambda t} |F x(t) - F y(t)| \leq e^{\lambda t} \left| \int_c^t |f(s, x(s)) - f(s, y(s))|\, ds \right| \leq \left| \int_c^t K e^{\lambda(t-s)} |x(s) - y(s)| e^{\lambda s}\, ds \right| \]
\[ \leq \|x - y\|_\lambda \int_a^t K e^{\lambda(t-s)}\, ds \leq \|x - y\|_\lambda \frac{K}{|\lambda|} \]
and therefore,
\[ \|F x - F y\|_\lambda \leq \|x - y\|_\lambda \frac{K}{|\lambda|}. \]
If |λ| is chosen larger than K, this implies F is a contraction mapping on $BC([a, b], F^n)$. Therefore, there exists a unique fixed point. With Lemma D.1.1 this proves the theorem.
D.2 Linear Systems
As an example of the above theorem, consider for t ∈ [a, b] the system
\[ x' = A(t) x(t) + g(t), \quad x(c) = x_0 \tag{4.5} \]
where $A(t) = (a_{ij}(t))$ is an n × n matrix whose entries are continuous functions of t and g (t) is a vector whose components are continuous functions of t. This system satisfies the conditions of Theorem D.1.2 with f (t, x) = A (t) x + g (t). To see this, let $x = (x_1, \cdots, x_n)^T$ and $x_1 = (x_{11}, \cdots, x_{1n})^T$. Then letting $M = \max\{ |a_{ij}(t)| : t \in [a, b],\ i, j \leq n \}$,
\[ |f(t, x) - f(t, x_1)| = |A(t)(x - x_1)| = \left( \sum_{i=1}^n \Big| \sum_{j=1}^n a_{ij}(t)(x_j - x_{1j}) \Big|^2 \right)^{1/2} \leq M \left( \sum_{i=1}^n \Big( \sum_{j=1}^n |x_j - x_{1j}| \Big)^2 \right)^{1/2} \]
\[ \leq M \left( \sum_{i=1}^n n \sum_{j=1}^n |x_j - x_{1j}|^2 \right)^{1/2} = M n \left( \sum_{j=1}^n |x_j - x_{1j}|^2 \right)^{1/2} = M n |x - x_1|. \]
Therefore, let K = M n. This proves
Theorem D.2.1 Let A (t) be a continuous n × n matrix and let g (t) be a continuous vector
for t ∈ [a, b] and let c ∈ [a, b] and x0 ∈ Fn . Then there exists a unique solution to 4.5 valid
for t ∈ [a, b] .
This includes more examples of linear equations than are typically encountered in an
entire differential equations course.
D.3 Local Solutions
Lemma D.3.1 Let $D(x_0, r) \equiv \{ x \in F^n : |x - x_0| \leq r \}$ and suppose U is an open set containing $D(x_0, r)$ such that $f : U \to F^n$ is $C^1(U)$. (Recall this means all partial derivatives of f exist and are continuous.) Then for K = M n, where M denotes the maximum of $\left| \frac{\partial f}{\partial x_i}(z) \right|$ for $z \in D(x_0, r)$, it follows that for all $x, y \in D(x_0, r)$,
\[ |f(x) - f(y)| \leq K |x - y|. \]
Proof: Let $x, y \in D(x_0, r)$ and consider the line segment joining these two points, x + t (y − x) for t ∈ [0, 1]. Letting h (t) = f (x + t (y − x)) for t ∈ [0, 1], then
\[ f(y) - f(x) = h(1) - h(0) = \int_0^1 h'(t)\, dt. \]
Also, by the chain rule,
\[ h'(t) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x + t(y - x)) (y_i - x_i). \]
Therefore,
\[ |f(y) - f(x)| = \left| \int_0^1 \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x + t(y - x)) (y_i - x_i)\, dt \right| \leq \int_0^1 \sum_{i=1}^n \left| \frac{\partial f}{\partial x_i}(x + t(y - x)) \right| |y_i - x_i|\, dt \]
\[ \leq M \sum_{i=1}^n |y_i - x_i| \leq M n |x - y|. \]
Now consider the map, P which maps all of $R^n$ to $D(x_0, r)$ given as follows. For $x \in D(x_0, r)$, P x = x. For $x \notin D(x_0, r)$, P x will be the closest point in $D(x_0, r)$ to x. Such a closest point exists because $D(x_0, r)$ is a closed and bounded set. Taking f (y) ≡ |y − x|, it follows f is a continuous function defined on $D(x_0, r)$ which must achieve its minimum value by the extreme value theorem from calculus.
[Figure: the closed ball $D(x_0, r)$ centered at $x_0$, a point x outside the ball, its projection P x, and a point z in the ball.]
Lemma D.3.2 For any pair of points, x, y ∈ Fn , |P x − P y| ≤ |x − y| .
Proof: The above picture suggests the geometry of what is going on. Letting $z \in D(x_0, r)$, it follows that for all t ∈ [0, 1],
\[ |x - Px|^2 \leq |x - (Px + t(z - Px))|^2 = |x - Px|^2 + 2t \operatorname{Re}((x - Px) \cdot (Px - z)) + t^2 |z - Px|^2. \]
Hence
\[ 2t \operatorname{Re}((x - Px) \cdot (Px - z)) + t^2 |z - Px|^2 \geq 0 \]
and this can only happen if
\[ \operatorname{Re}((x - Px) \cdot (Px - z)) \geq 0. \]
Therefore,
\[ \operatorname{Re}((x - Px) \cdot (Px - Py)) \geq 0 \]
\[ \operatorname{Re}((y - Py) \cdot (Py - Px)) \geq 0 \]
and so
\[ \operatorname{Re}(x - Px - (y - Py)) \cdot (Px - Py) \geq 0 \]
which implies
\[ \operatorname{Re}(x - y) \cdot (Px - Py) \geq |Px - Py|^2. \]
Then using the Cauchy Schwarz inequality it follows
\[ |x - y| \geq |Px - Py|. \]
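For a closed ball the closest point map has a simple explicit form: points inside the ball are left alone and points outside are pulled radially onto the boundary. The sketch below uses that form, in real Euclidean space only, to spot check the inequality of Lemma D.3.2 on random pairs of points. The function name, the dimension and the sampling range are assumptions made for this illustration.

```python
import math
import random

def project_onto_ball(x, center, r):
    """Closest point of the closed ball D(center, r) to x in the Euclidean norm."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= r:
        return list(x)                 # x already lies in the ball
    scale = r / dist                   # pull x radially onto the boundary
    return [ci + scale * d for ci, d in zip(center, diff)]

def distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

rng = random.Random(1)
center, radius = [0.5, -1.0, 2.0], 1.5
worst_excess = 0.0
for _ in range(5000):
    x = [rng.uniform(-6.0, 6.0) for _ in range(3)]
    y = [rng.uniform(-6.0, 6.0) for _ in range(3)]
    px = project_onto_ball(x, center, radius)
    py = project_onto_ball(y, center, radius)
    # excess is positive only if |Px - Py| > |x - y|
    worst_excess = max(worst_excess, distance(px, py) - distance(x, y))

far_point = project_onto_ball([10.5, -1.0, 2.0], center, radius)
print(worst_excess, far_point)
```

The largest observed excess stays at zero up to rounding, in line with the lemma.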
With this here is the local existence and uniqueness theorem.
Theorem D.3.3 Let [a, b] be a closed interval and let U be an open subset of $F^n$. Let $f : [a, b] \times U \to F^n$ be continuous and suppose that for each t ∈ [a, b], the map $x \to \frac{\partial f}{\partial x_i}(t, x)$ is continuous. Also let $x_0 \in U$ and c ∈ [a, b]. Then there exists an interval, I ⊆ [a, b] such that c ∈ I and there exists a unique solution to the initial value problem,
\[ x' = f(t, x), \quad x(c) = x_0 \tag{4.6} \]
valid for t ∈ I.
Proof: Consider the following picture.
[Figure: the open set U drawn as a large dotted circle containing the small solid disk $D(x_0, r)$ centered at $x_0$.]
The large dotted circle represents U and the little solid circle represents $D(x_0, r)$ as indicated. Here r is so small that $D(x_0, r)$ is contained in U as shown. Now let P denote the projection map defined above. Consider the initial value problem
\[ x' = f(t, Px), \quad x(c) = x_0. \tag{4.7} \]
From Lemma D.3.1 and the continuity of $x \to \frac{\partial f}{\partial x_i}(t, x)$, there exists a constant, K such
that if x, y ∈ D (x0 , r) , then |f (t, x) − f (t, y)| ≤ K |x − y| for all t ∈ [a, b] . Therefore, by
Lemma D.3.2
|f (t, P x) − f (t, P y)| ≤ K |P x−P y| ≤ K |x − y| .
It follows from Theorem D.1.2 that 4.7 has a unique solution valid for t ∈ [a, b]. Since x is continuous, it follows that there exists an interval, I containing c such that for t ∈ I, $x(t) \in D(x_0, r)$. Therefore, for these values of t, f (t, P x) = f (t, x) and so there is a unique solution to 4.6 on I.
Now suppose f has the property that for every R > 0 there exists a constant, $K_R$ such that for all $x, x_1 \in B(0, R)$,
\[ |f(t, x) - f(t, x_1)| \leq K_R |x - x_1|. \tag{4.8} \]
Corollary D.3.4 Let f satisfy 4.8 and suppose also that (t, x) → f (t, x) is continuous.
Suppose now that x0 is given and there exists an estimate of the form |x (t)| < R for all
t ∈ [0, T ) where T ≤ ∞ on the local solution to
x′ = f (t, x) , x (0) = x0 .
(4.9)
Then there exists a unique solution to the initial value problem, 4.9 valid on [0, T ).
Proof: Replace f (t, x) with f (t, P x) where P is the projection onto B (0, R). Then by
Theorem D.1.2 there exists a unique solution to the system
x′ = f (t, P x) , x (0) = x0
valid on [0, T1 ] for every T1 < T. Therefore, the above system has a unique solution on [0, T )
and from the estimate, P x = x.
D.4 First Order Linear Systems
Here is a discussion of linear systems of the form
x′ = Ax + f (t)
where A is a constant n × n matrix and f is a vector valued function having all entries
continuous. Of course the existence theory is a very special case of the general considerations
above but I will give a self contained presentation based on elementary first order scalar
differential equations and linear algebra.
Definition D.4.1 Suppose t → M (t) is a matrix valued function of t. Thus $M(t) = (m_{ij}(t))$. Then define
\[ M'(t) \equiv \left( m_{ij}'(t) \right). \]
In words, the derivative of M (t) is the matrix whose entries consist of the derivatives of the entries of M (t). Integrals of matrices are defined the same way. Thus
\[ \int_a^b M(t)\, dt \equiv \left( \int_a^b m_{ij}(t)\, dt \right). \]
In words, the integral of M (t) is the matrix obtained by replacing each entry of M (t) by the integral of that entry.
With this definition, it is easy to prove the following theorem.
Theorem D.4.2 Suppose M (t) and N (t) are matrices for which M (t) N (t) makes sense. Then if M′ (t) and N′ (t) both exist, it follows that
\[ (M(t) N(t))' = M'(t) N(t) + M(t) N'(t). \]
Proof:
\[ \left( (M(t) N(t))' \right)_{ij} \equiv \left( (M(t) N(t))_{ij} \right)' = \left( \sum_k M(t)_{ik} N(t)_{kj} \right)' = \sum_k \left( M(t)_{ik} \right)' N(t)_{kj} + M(t)_{ik} \left( N(t)_{kj} \right)' \]
\[ \equiv \sum_k \left( M(t)' \right)_{ik} N(t)_{kj} + M(t)_{ik} \left( N(t)' \right)_{kj} \equiv \left( M'(t) N(t) + M(t) N'(t) \right)_{ij} \]
In the study of differential equations, one of the most important theorems is Gronwall’s inequality which is next.
Theorem D.4.3 Suppose u (t) ≥ 0 and for all t ∈ [0, T ],
\[ u(t) \leq u_0 + \int_0^t K u(s)\, ds. \tag{4.10} \]
where K is some nonnegative constant. Then
\[ u(t) \leq u_0 e^{Kt}. \tag{4.11} \]
Proof: If K = 0, then 4.11 is the same as 4.10, so assume K > 0. Let $w(t) = \int_0^t u(s)\, ds$. Then using the fundamental theorem of calculus, 4.10 shows that w (t) satisfies the following.
\[ u(t) - K w(t) = w'(t) - K w(t) \leq u_0, \quad w(0) = 0. \tag{4.12} \]
Multiply both sides of this inequality by $e^{-Kt}$ and use the product rule and the chain rule to obtain
\[ e^{-Kt} (w'(t) - K w(t)) = \frac{d}{dt}\left( e^{-Kt} w(t) \right) \leq u_0 e^{-Kt}. \]
Integrating this from 0 to t,
\[ e^{-Kt} w(t) \leq u_0 \int_0^t e^{-Ks}\, ds = u_0 \left( -\frac{e^{-tK} - 1}{K} \right). \]
Now multiply through by $e^{Kt}$ to obtain
\[ w(t) \leq u_0 \left( -\frac{e^{-tK} - 1}{K} \right) e^{Kt} = -\frac{u_0}{K} + \frac{u_0}{K} e^{tK}. \]
Therefore, 4.12 implies
\[ u(t) \leq u_0 + K \left( -\frac{u_0}{K} + \frac{u_0}{K} e^{tK} \right) = u_0 e^{Kt}. \]
With Gronwall’s inequality, here is a theorem on uniqueness of solutions to the initial
value problem,
x′ = Ax + f (t) , x (a) = xa ,
(4.13)
in which A is an n × n matrix and f is a continuous function having values in Cn .
Theorem D.4.4 Suppose x and y satisfy 4.13. Then x (t) = y (t) for all t.
Proof: Let z (t) = x (t + a) − y (t + a). Then for t ≥ 0,
\[ z' = Az, \quad z(0) = 0. \tag{4.14} \]
Note that for $K = \max\{ |a_{ij}| \}$, where $A = (a_{ij})$,
\[ |(Az, z)| = \Big| \sum_{ij} a_{ij} z_j \overline{z_i} \Big| \leq K \sum_{ij} |z_i| |z_j| \leq K \sum_{ij} \left( \frac{|z_i|^2}{2} + \frac{|z_j|^2}{2} \right) = nK |z|^2. \]
(For x and y real numbers, $xy \leq \frac{x^2}{2} + \frac{y^2}{2}$ because this is equivalent to saying $(x - y)^2 \geq 0$.) Similarly, $|(z, Az)| \leq nK |z|^2$. Thus,
\[ |(z, Az)|,\ |(Az, z)| \leq nK |z|^2. \tag{4.15} \]
Now multiplying 4.14 by z and observing that
\[ \frac{d}{dt}\left( |z|^2 \right) = (z', z) + (z, z') = (Az, z) + (z, Az), \]
it follows from 4.15 and the observation that z (0) = 0,
\[ |z(t)|^2 \leq \int_0^t 2nK |z(s)|^2\, ds \]
and so by Gronwall’s inequality, $|z(t)|^2 = 0$ for all t ≥ 0. Thus,
\[ x(t) = y(t) \]
for all t ≥ a.
Now let w (t) = x (a − t) − y (a − t) for t ≥ 0. Then w′ (t) = (−A) w (t) and you can repeat the argument which was just given to conclude that x (t) = y (t) for all t ≤ a.
Definition D.4.5 Let A be an n × n matrix. We say Φ (t) is a fundamental matrix for A if
\[ \Phi'(t) = A\Phi(t), \quad \Phi(0) = I, \tag{4.16} \]
and $\Phi(t)^{-1}$ exists for all t ∈ R.
Why should anyone care about a fundamental matrix? The reason is that such a matrix
valued function makes possible a convenient description of the solution of the initial value
problem,
x′ = Ax + f (t) , x (0) = x0 ,
(4.17)
on the interval, [0, T ] . First consider the special case where n = 1. This is the first order
linear differential equation,
r′ = λr + g, r (0) = r0 ,
(4.18)
where g is a continuous scalar valued function. First consider the case where g = 0.
Lemma D.4.6 There exists a unique solution to the initial value problem,
r′ = λr, r (0) = 1,
(4.19)
and the solution for λ = a + ib is given by
r (t) = eat (cos bt + i sin bt) .
(4.20)
This solution to the initial value problem is denoted as eλt . (If λ is real, eλt as defined here
reduces to the usual exponential function so there is no contradiction between this and earlier
notation seen in Calculus.)
Proof: From the uniqueness theorem presented above, Theorem D.4.4, applied to the case where n = 1, there can be no more than one solution to the initial value problem, 4.19. Therefore, it only remains to verify 4.20 is a solution to 4.19. However, this is an easy calculus exercise. Note the differential equation in 4.19 says
\[ \frac{d}{dt}\left( e^{\lambda t} \right) = \lambda e^{\lambda t}. \tag{4.21} \]
With this lemma, it becomes possible to easily solve the case in which g ≠ 0.
Theorem D.4.7 There exists a unique solution to 4.18 and this solution is given by the formula,
\[ r(t) = e^{\lambda t} r_0 + e^{\lambda t} \int_0^t e^{-\lambda s} g(s)\, ds. \tag{4.22} \]
Proof: By the uniqueness theorem, Theorem D.4.4, there is no more than one solution. It only remains to verify that 4.22 is a solution. But $r(0) = e^{\lambda 0} r_0 + \int_0^0 e^{-\lambda s} g(s)\, ds = r_0$ and so the initial condition is satisfied. Next differentiate this expression to verify the differential equation is also satisfied. Using 4.21, the product rule and the fundamental theorem of calculus,
\[ r'(t) = \lambda e^{\lambda t} r_0 + \lambda e^{\lambda t} \int_0^t e^{-\lambda s} g(s)\, ds + e^{\lambda t} e^{-\lambda t} g(t) = \lambda r(t) + g(t). \]
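Formula 4.22 is easy to evaluate numerically, even for complex λ. The sketch below approximates the integral with the composite Simpson rule and compares the result, for the constant forcing g ≡ 1, with the closed form $e^{\lambda t} r_0 + (e^{\lambda t} - 1)/\lambda$ obtained by integrating by hand. The function names, the quadrature rule and the sample values of λ, $r_0$ and t are assumptions made for this illustration.

```python
import cmath

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def solve_scalar(lam, r0, g, t):
    """Evaluate r(t) = e^{lam t} r0 + e^{lam t} * integral_0^t e^{-lam s} g(s) ds."""
    integral = simpson(lambda s: cmath.exp(-lam * s) * g(s), 0.0, t)
    return cmath.exp(lam * t) * (r0 + integral)

lam, r0, t = complex(-1.0, 2.0), complex(0.5, 0.0), 1.5

# With g = 1 the integral can be done by hand, giving (e^{lam t} - 1) / lam.
numeric = solve_scalar(lam, r0, lambda s: 1.0, t)
closed = cmath.exp(lam * t) * r0 + (cmath.exp(lam * t) - 1.0) / lam
gap = abs(numeric - closed)

initial_value = solve_scalar(lam, r0, lambda s: 1.0, 0.0)
print(gap, initial_value)
```

For other continuous g the same `solve_scalar` call applies; only the closed form used for comparison is specific to g ≡ 1.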
Now consider the question of finding a fundamental matrix for A. When this is done,
it will be easy to give a formula for the general solution to 4.17 known as the variation of
constants formula, arguably the most important result in differential equations.
The next theorem gives a formula for the fundamental matrix 4.16. It is known as
Putzer’s method [1],[22].
Theorem D.4.8 Let A be an n × n matrix whose eigenvalues are $\{\lambda_1, \cdots, \lambda_n\}$ listed according to multiplicity as roots of the characteristic equation. Define
\[ P_k(A) \equiv \prod_{m=1}^{k} (A - \lambda_m I), \quad P_0(A) \equiv I, \]
and let the scalar valued functions, $r_k(t)$ be defined as the solutions to the following initial value problem
\[ \begin{pmatrix} r_0'(t) \\ r_1'(t) \\ r_2'(t) \\ \vdots \\ r_n'(t) \end{pmatrix} = \begin{pmatrix} 0 \\ \lambda_1 r_1(t) + r_0(t) \\ \lambda_2 r_2(t) + r_1(t) \\ \vdots \\ \lambda_n r_n(t) + r_{n-1}(t) \end{pmatrix}, \quad \begin{pmatrix} r_0(0) \\ r_1(0) \\ r_2(0) \\ \vdots \\ r_n(0) \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Note the system amounts to a list of single first order linear differential equations. Now define
\[ \Phi(t) \equiv \sum_{k=0}^{n-1} r_{k+1}(t) P_k(A). \]
Then
\[ \Phi'(t) = A\Phi(t), \quad \Phi(0) = I. \tag{4.23} \]
Furthermore, if Φ (t) is a solution to 4.23 for all t, then it follows $\Phi(t)^{-1}$ exists for all t and Φ (t) is the unique fundamental matrix for A.
Proof: The first part of this follows from a computation. First note that by the Cayley Hamilton theorem, $P_n(A) = 0$. Also $r_0(t) = 0$. Also from the formula, if we define $\prod_{m=1}^{0} (A - \lambda_m I) \equiv I$ to correspond to the above definition, for all k ≥ 1,
\[ P_k(A) = (A - \lambda_k I) \prod_{m=1}^{k-1} (A - \lambda_m I) = (A - \lambda_k I) P_{k-1}(A). \]
Now for the computation:
\[ \Phi'(t) = \sum_{k=0}^{n-1} r_{k+1}'(t) P_k(A) = \sum_{k=0}^{n-1} \left( \lambda_{k+1} r_{k+1}(t) + r_k(t) \right) P_k(A) \]
\[ = \sum_{k=0}^{n-1} \lambda_{k+1} r_{k+1}(t) P_k(A) + \sum_{k=1}^{n} r_k(t) P_k(A) = \sum_{k=0}^{n-1} \lambda_{k+1} r_{k+1}(t) P_k(A) + \sum_{k=0}^{n-1} r_{k+1}(t) P_{k+1}(A) \]
\[ = \sum_{k=0}^{n-1} \lambda_{k+1} r_{k+1}(t) P_k(A) + \sum_{k=0}^{n-1} r_{k+1}(t) (A - \lambda_{k+1} I) P_k(A) = A \sum_{k=0}^{n-1} r_{k+1}(t) P_k(A) = A\Phi(t). \]
That Φ (0) = I follows from
\[ \Phi(0) = \sum_{k=0}^{n-1} r_{k+1}(0) P_k(A) = r_1(0) P_0(A) = I. \]
It remains to verify that if 4.23 holds, then $\Phi(t)^{-1}$ exists for all t. To do so, consider v ≠ 0 and suppose for some $t_0$, $\Phi(t_0) v = 0$. Let $x(t) \equiv \Phi(t_0 + t) v$. Then
\[ x'(t) = A\Phi(t_0 + t) v = A x(t), \quad x(0) = \Phi(t_0) v = 0. \]
But also z (t) ≡ 0 satisfies
\[ z'(t) = A z(t), \quad z(0) = 0, \]
and so by the theorem on uniqueness, it must be the case that z (t) = x (t) for all t, showing that $\Phi(t + t_0) v = 0$ for all t, and in particular for $t = -t_0$. Therefore,
\[ \Phi(-t_0 + t_0) v = Iv = 0 \]
and so v = 0, a contradiction. It follows that Φ (t) must be one to one for all t and so, $\Phi(t)^{-1}$ exists for all t.
It only remains to verify the solution to 4.23 is unique. Suppose Ψ is another fundamental
matrix solving 4.23. Then letting v be an arbitrary vector,
z (t) ≡ Φ (t) v, y (t) ≡ Ψ (t) v
both solve the initial value problem,
x′ = Ax, x (0) = v,
and so by the uniqueness theorem, z (t) = y (t) for all t showing that Φ (t) v = Ψ (t) v for
all t. Since v is arbitrary, this shows that Φ (t) = Ψ (t) for every t.
It is useful to consider the differential equations for the $r_k$ for k ≥ 1. As noted above, $r_0(t) = 0$ and $r_1(t) = e^{\lambda_1 t}$.
\[ r_{k+1}' = \lambda_{k+1} r_{k+1} + r_k, \quad r_{k+1}(0) = 0. \]
Thus
\[ r_{k+1}(t) = \int_0^t e^{\lambda_{k+1}(t-s)} r_k(s)\, ds. \]
Therefore,
\[ r_2(t) = \int_0^t e^{\lambda_2(t-s)} e^{\lambda_1 s}\, ds = \frac{e^{\lambda_1 t} - e^{\lambda_2 t}}{-\lambda_2 + \lambda_1} \]
assuming $\lambda_1 \neq \lambda_2$.
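For a 2 × 2 matrix with distinct eigenvalues, the formulas for $r_1$ and $r_2$ just given turn Putzer's method into a two line computation, $\Phi(t) = r_1(t) I + r_2(t)(A - \lambda_1 I)$. The sketch below implements that special case and compares it with a partial sum of the series $\sum_k (tA)^k/k!$. The function names, the sample matrix with eigenvalues −1 ± i and the value t = 0.7 are assumptions made for this illustration.

```python
import cmath
import math

def putzer_2x2(A, lam1, lam2, t):
    """Phi(t) = r1(t) P0(A) + r2(t) P1(A) for a 2x2 matrix, lam1 != lam2."""
    r1 = cmath.exp(lam1 * t)
    r2 = (cmath.exp(lam1 * t) - cmath.exp(lam2 * t)) / (lam1 - lam2)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    P1 = [[A[0][0] - lam1, A[0][1]], [A[1][0], A[1][1] - lam1]]  # A - lam1 I
    return [[r1 * identity[i][j] + r2 * P1[i][j] for j in range(2)]
            for i in range(2)]

def exp_series(A, t, terms=60):
    """Partial sum of sum_k (tA)^k / k! for a 2x2 matrix A."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[sum(term[i][m] * A[m][j] for m in range(2)) * t / k
                 for j in range(2)] for i in range(2)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

# Sample matrix with characteristic polynomial x^2 + 2x + 2 (roots -1 +/- i)
A = [[0.0, 1.0], [-2.0, -2.0]]
lam1, lam2 = complex(-1.0, 1.0), complex(-1.0, -1.0)
t = 0.7

Phi = putzer_2x2(A, lam1, lam2, t)
E = exp_series(A, t)
gap = max(abs(Phi[i][j] - E[i][j]) for i in range(2) for j in range(2))

# Here r2(t) = e^{-t} sin t, which is the (0, 1) entry of Phi(t).
expected_01 = math.exp(-t) * math.sin(t)
entry_gap = abs(Phi[0][1] - expected_01)
print(gap, entry_gap)
```

Although the eigenvalues are complex, the imaginary parts cancel and Φ (t) comes out real up to rounding.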
Sometimes people define a fundamental matrix to be a matrix Φ (t) such that Φ′ (t) =
AΦ (t) and det (Φ (t)) ̸= 0 for all t. Thus this avoids the initial condition, Φ (0) = I. The
next proposition has to do with this situation.
Proposition D.4.9 Suppose A is an n × n matrix and suppose Φ (t) is an n × n matrix for each t ∈ R with the property that
\[ \Phi'(t) = A\Phi(t). \tag{4.24} \]
Then either $\Phi(t)^{-1}$ exists for all t ∈ R or $\Phi(t)^{-1}$ fails to exist for all t ∈ R.
Proof: Suppose $\Phi(0)^{-1}$ exists and 4.24 holds. Let $\Psi(t) \equiv \Phi(t)\Phi(0)^{-1}$. Then Ψ (0) = I and
\[ \Psi'(t) = \Phi'(t)\Phi(0)^{-1} = A\Phi(t)\Phi(0)^{-1} = A\Psi(t) \]
so by Theorem D.4.8, $\Psi(t)^{-1}$ exists for all t. Therefore, $\Phi(t)^{-1}$ also exists for all t.
Next suppose $\Phi(0)^{-1}$ does not exist. I need to show $\Phi(t)^{-1}$ does not exist for any t. Suppose then that $\Phi(t_0)^{-1}$ does exist. Then let $\Psi(t) \equiv \Phi(t_0 + t)\Phi(t_0)^{-1}$. Then Ψ (0) = I and Ψ′ = AΨ so by Theorem D.4.8 it follows $\Psi(t)^{-1}$ exists for all t and so for all t, $\Phi(t + t_0)^{-1}$ must also exist, even for $t = -t_0$ which implies $\Phi(0)^{-1}$ exists after all.
The conclusion of this proposition is usually referred to as the Wronskian alternative and another way to say it is that if 4.24 holds, then either det (Φ (t)) = 0 for all t or det (Φ (t)) is never equal to 0. The Wronskian is the usual name of the function, t → det (Φ (t)).
The following theorem gives the variation of constants formula.
Theorem D.4.10 Let f be continuous on [0, T ] and let A be an n × n matrix and $x_0$ a vector in $C^n$. Then there exists a unique solution to 4.17, x, given by the variation of constants formula,
\[ x(t) = \Phi(t) x_0 + \Phi(t) \int_0^t \Phi(s)^{-1} f(s)\, ds \tag{4.25} \]
for Φ (t) the fundamental matrix for A. Also, $\Phi(t)^{-1} = \Phi(-t)$ and Φ (t + s) = Φ (t) Φ (s) for all t, s and the above variation of constants formula can also be written as
\[ x(t) = \Phi(t) x_0 + \int_0^t \Phi(t - s) f(s)\, ds \tag{4.26} \]
\[ = \Phi(t) x_0 + \int_0^t \Phi(s) f(t - s)\, ds \tag{4.27} \]
Proof: From the uniqueness theorem there is at most one solution to 4.17. Therefore, if 4.25 solves 4.17, the theorem is proved. The verification that the given formula works is identical with the verification that the scalar formula given in Theorem D.4.7 solves the initial value problem given there. $\Phi(s)^{-1}$ is continuous because of the formula for the inverse of a matrix in terms of the transpose of the cofactor matrix. Therefore, the integrand in 4.25 is continuous and the fundamental theorem of calculus applies. To verify the formula for the inverse, fix s and consider x (t) = Φ (s + t) v, and y (t) = Φ (t) Φ (s) v. Then
\[ x'(t) = A\Phi(t + s) v = A x(t), \quad x(0) = \Phi(s) v \]
\[ y'(t) = A\Phi(t)\Phi(s) v = A y(t), \quad y(0) = \Phi(s) v. \]
By the uniqueness theorem, x (t) = y (t) for all t. Since s and v are arbitrary, this shows Φ (t + s) = Φ (t) Φ (s) for all t, s. Letting s = −t and using Φ (0) = I verifies $\Phi(t)^{-1} = \Phi(-t)$.
Next, note that this also implies Φ (t − s) Φ (s) = Φ (t) and so $\Phi(t - s) = \Phi(t)\Phi(s)^{-1}$. Therefore, this yields 4.26 and then 4.27 follows from changing the variable.
If Φ′ = AΦ and $\Phi(t)^{-1}$ exists for all t, you should verify that the solution to the initial value problem
\[ x' = Ax + f, \quad x(t_0) = x_0 \]
is given by
\[ x(t) = \Phi(t - t_0) x_0 + \int_{t_0}^t \Phi(t - s) f(s)\, ds. \]
Theorem D.4.10 is general enough to include all constant coefficient linear differential
equations of any order. Thus it includes as a special case the main topics of an entire
elementary differential equations class. This is illustrated in the following example. One
can reduce an arbitrary linear differential equation to a first order system and then apply the
above theory to solve the problem. The next example is a differential equation of damped
vibration.
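Example D.4.11 The differential equation is $y'' + 2y' + 2y = \cos t$ with initial conditions, y (0) = 1 and y′ (0) = 0.
To solve this equation, let $x_1 = y$ and $x_2 = x_1' = y'$. Then, writing this in terms of these new variables, yields the following system.
\[ x_2' + 2x_2 + 2x_1 = \cos t \]
\[ x_1' = x_2 \]
This system can be written in the above form as
\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}' = \begin{pmatrix} x_2 \\ -2x_2 - 2x_1 \end{pmatrix} + \begin{pmatrix} 0 \\ \cos t \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -2 & -2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 0 \\ \cos t \end{pmatrix} \]
and the initial condition is of the form
\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}(0) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \]
Now $P_0(A) \equiv I$. The eigenvalues are −1 + i, −1 − i and so
\[ P_1(A) = \begin{pmatrix} 0 & 1 \\ -2 & -2 \end{pmatrix} - (-1 + i) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 - i & 1 \\ -2 & -1 - i \end{pmatrix}. \]
Recall $r_0(t) \equiv 0$ and $r_1(t) = e^{(-1+i)t}$. Then
\[ r_2' = (-1 - i) r_2 + e^{(-1+i)t}, \quad r_2(0) = 0 \]
and so
\[ r_2(t) = \frac{e^{(-1+i)t} - e^{(-1-i)t}}{2i} = e^{-t} \sin(t). \]
Putzer’s method yields the fundamental matrix as
\[ \Phi(t) = e^{(-1+i)t} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + e^{-t} \sin(t) \begin{pmatrix} 1 - i & 1 \\ -2 & -1 - i \end{pmatrix} = \begin{pmatrix} e^{-t}(\cos(t) + \sin(t)) & e^{-t} \sin t \\ -2e^{-t} \sin t & e^{-t}(\cos(t) - \sin(t)) \end{pmatrix}. \]
From variation of constants formula the desired solution is
\[ \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}(t) = \begin{pmatrix} e^{-t}(\cos(t) + \sin(t)) & e^{-t} \sin t \\ -2e^{-t} \sin t & e^{-t}(\cos(t) - \sin(t)) \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \int_0^t \begin{pmatrix} e^{-s}(\cos(s) + \sin(s)) & e^{-s} \sin s \\ -2e^{-s} \sin s & e^{-s}(\cos(s) - \sin(s)) \end{pmatrix} \begin{pmatrix} 0 \\ \cos(t - s) \end{pmatrix} ds \]
\[ = \begin{pmatrix} e^{-t}(\cos(t) + \sin(t)) \\ -2e^{-t} \sin t \end{pmatrix} + \int_0^t \begin{pmatrix} e^{-s} \sin(s) \cos(t - s) \\ e^{-s}(\cos s - \sin s) \cos(t - s) \end{pmatrix} ds \]
\[ = \begin{pmatrix} e^{-t}(\cos(t) + \sin(t)) \\ -2e^{-t} \sin t \end{pmatrix} + \begin{pmatrix} -\frac{1}{5}(\cos t) e^{-t} - \frac{3}{5} e^{-t} \sin t + \frac{1}{5} \cos t + \frac{2}{5} \sin t \\ -\frac{2}{5}(\cos t) e^{-t} + \frac{4}{5} e^{-t} \sin t + \frac{2}{5} \cos t - \frac{1}{5} \sin t \end{pmatrix} \]
\[ = \begin{pmatrix} \frac{4}{5}(\cos t) e^{-t} + \frac{2}{5} e^{-t} \sin t + \frac{1}{5} \cos t + \frac{2}{5} \sin t \\ -\frac{6}{5} e^{-t} \sin t - \frac{2}{5}(\cos t) e^{-t} + \frac{2}{5} \cos t - \frac{1}{5} \sin t \end{pmatrix} \]
Thus $y(t) = x_1(t) = \frac{4}{5}(\cos t) e^{-t} + \frac{2}{5} e^{-t} \sin t + \frac{1}{5} \cos t + \frac{2}{5} \sin t$.
D.5 Geometric Theory Of Autonomous Systems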
Here a sufficient condition is given for stability of a first order system. First of all, here is
a fundamental estimate for the entries of a fundamental matrix.
Lemma D.5.1 Let the functions, rk be given in the statement of Theorem D.4.8 and suppose that A is an n × n matrix whose eigenvalues are {λ1 , · · · , λn } . Suppose that these
eigenvalues are ordered such that
Re (λ1 ) ≤ Re (λ2 ) ≤ · · · ≤ Re (λn ) < 0.
Then if 0 > −δ > Re (λn ) is given, there exists a constant, C such that for each k =
0, 1, · · · , n,
|rk (t)| ≤ Ce−δt
(4.28)
for all t > 0.
Proof: This is obvious for r0 (t) because it is identically equal to 0. From the definition
of the rk , r1′ = λ1 r1 , r1 (0) = 1 and so r1 (t) = eλ1 t which implies
|r1 (t)| ≤ eRe(λ1 )t .
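Suppose for some m ≥ 1 there exists a constant, Cm such that
\[ |r_k(t)| \le C_m t^m e^{\operatorname{Re}(\lambda_m) t} \]
for all k ≤ m for all t > 0. Then
\[ r_{m+1}'(t) = \lambda_{m+1} r_{m+1}(t) + r_m(t), \quad r_{m+1}(0) = 0 \]
and so
\[ r_{m+1}(t) = e^{\lambda_{m+1} t} \int_0^t e^{-\lambda_{m+1} s} r_m(s)\, ds. \]
Then by the induction hypothesis,
\[ \begin{aligned}
|r_{m+1}(t)| &\le e^{\operatorname{Re}(\lambda_{m+1}) t} \int_0^t \left| e^{-\lambda_{m+1} s} \right| C_m s^m e^{\operatorname{Re}(\lambda_m) s}\, ds \\
&\le e^{\operatorname{Re}(\lambda_{m+1}) t} \int_0^t s^m C_m e^{-\operatorname{Re}(\lambda_{m+1}) s} e^{\operatorname{Re}(\lambda_m) s}\, ds \\
&\le e^{\operatorname{Re}(\lambda_{m+1}) t} \int_0^t s^m C_m\, ds = \frac{C_m}{m+1}\, t^{m+1} e^{\operatorname{Re}(\lambda_{m+1}) t}.
\end{aligned} \]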
It follows by induction there exists a constant, C such that for all k ≤ n,
|rk (t)| ≤ Ctn eRe(λn )t
and this obviously implies the conclusion of the lemma.
The proof of the above lemma yields the following corollary.
Corollary D.5.2 Let the functions, rk be given in the statement of Theorem D.4.8 and
suppose that A is an n × n matrix whose eigenvalues are {λ1 , · · · , λn } . Suppose that these
eigenvalues are ordered such that
Re (λ1 ) ≤ Re (λ2 ) ≤ · · · ≤ Re (λn ) .
Then there exists a constant C such that for all k ≤ m
|rk (t)| ≤ Ctm eRe(λm )t .
With the lemma, the following sloppy estimate is available for a fundamental matrix.
Theorem D.5.3 Let A be an n × n matrix and let Φ (t) be the fundamental matrix for A.
That is,
Φ′ (t) = AΦ (t) , Φ (0) = I.
Suppose also the eigenvalues of A are {λ1 , · · · , λn } where these eigenvalues are ordered such
that
Re (λ1 ) ≤ Re (λ2 ) ≤ · · · ≤ Re (λn ) < 0.
Then if 0 > −δ > Re (λn ) is given, there exists a constant, C such that $|\Phi(t)_{ij}| \le C e^{-\delta t}$
for all t > 0. Also
\[ |\Phi(t) x| \le C n^{3/2} e^{-\delta t} |x|. \tag{4.29} \]
Proof: Let
\[ M \equiv \max \left\{ \left| P_k(A)_{ij} \right| \text{ for all } i, j, k \right\}. \]
Then from Putzer’s formula for Φ (t) and Lemma D.5.1, there exists a constant, C such that
\[ \left| \Phi(t)_{ij} \right| \le \sum_{k=0}^{n-1} C e^{-\delta t} M. \]
Let the new C be given by nCM. Next,
\[ \begin{aligned}
|\Phi(t) x|^2 &\equiv \sum_{i=1}^n \Big( \sum_{j=1}^n \Phi_{ij}(t) x_j \Big)^2 \le \sum_{i=1}^n \Big( \sum_{j=1}^n |\Phi_{ij}(t)|\, |x_j| \Big)^2 \\
&\le \sum_{i=1}^n \Big( \sum_{j=1}^n C e^{-\delta t} |x| \Big)^2 = C^2 e^{-2\delta t} \sum_{i=1}^n (n |x|)^2 = C^2 e^{-2\delta t} n^3 |x|^2.
\end{aligned} \]
This proves 4.29 and completes the proof.
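To see estimate 4.29 in a concrete case, the sketch below reuses the fundamental matrix of Example D.4.11, whose eigenvalues −1 ± i have real part −1. The values δ = 1/2 and C = 2 are assumptions chosen by hand for this illustration (each entry of Φ (t) is at most 2e^{−t} in absolute value); they are not constants produced by the proof. The test vector x and the time grid are also arbitrary.

```python
import math

def phi(t):
    """Fundamental matrix of Example D.4.11 (eigenvalues -1 + i and -1 - i)."""
    e, c, s = math.exp(-t), math.cos(t), math.sin(t)
    return [[e * (c + s), e * s], [-2.0 * e * s, e * (c - s)]]

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(x * x for x in v))

def apply(M, v):
    """Matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Assumed constants for this illustration: -delta > Re(lambda) = -1 and
# |Phi_ij(t)| <= 2 e^{-t} <= C e^{-delta t} with C = 2.
n, delta, C = 2, 0.5, 2.0
x = [1.0, -3.0]

bound_holds = True
for k in range(1, 201):          # t = 0.05, 0.10, ..., 10.0
    t = 0.05 * k
    P = phi(t)
    entry_ok = all(abs(P[i][j]) <= C * math.exp(-delta * t)
                   for i in range(n) for j in range(n))
    vector_ok = norm(apply(P, x)) <= C * n ** 1.5 * math.exp(-delta * t) * norm(x)
    bound_holds = bound_holds and entry_ok and vector_ok
print(bound_holds)
```

The bound is far from sharp here, which is consistent with the text calling it a sloppy estimate.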
Definition D.5.4 Let f : U → Rn where U is an open subset of Rn such that a ∈ U and
f (a) = 0. A point, a where f (a) = 0 is called an equilibrium point. Then a is asymptotically
stable if for any ε > 0 there exists r > 0 such that whenever |x0 − a| < r and x (t) is the
solution to the initial value problem,
x′ = f (x) , x (0) = x0 ,
it follows
\[ \lim_{t \to \infty} x(t) = a, \quad |x(t) - a| < \varepsilon. \]
A differential equation of the form x′ = f (x) is called autonomous as opposed to a nonautonomous equation of the form x′ = f (t, x) . The equilibrium point a is stable if for every
ε > 0 there exists δ > 0 such that if |x0 − a| < δ, then if x is the solution of
x′ = f (x) , x (0) = x0 ,
(4.30)
then |x (t) − a| < ε for all t > 0.
Obviously asymptotic stability implies stability.
An ordinary differential equation is called almost linear if it is of the form
x′ = Ax + g (x)
where A is an n × n matrix and
\[ \lim_{x \to 0} \frac{g(x)}{|x|} = 0. \]
Now the stability of an equilibrium point of an autonomous system, x′ = f (x) can
always be reduced to the consideration of the stability of 0 for an almost linear system.
Here is why. If you are considering the equilibrium point, a for x′ = f (x) , you could
define a new variable, y by a + y = x. Then asymptotic stability would involve |y (t)| < ε
and limt→∞ y (t) = 0 while stability would only require |y (t)| < ε. Then since a is an
equilibrium point, y solves the following initial value problem.
y′ = f (a + y) − f (a) , y (0) = y0 ,
where y0 = x0 − a.
Let A = Df (a) . Then from the definition of the derivative of a function,
y′ = Ay + g (y) , y (0) = y0
where
\[ \lim_{y \to 0} \frac{g(y)}{|y|} = 0. \tag{4.31} \]
Thus there is never any loss of generality in considering only the equilibrium point 0 for an
almost linear system.1 Therefore, from now on I will only consider the case of almost linear
systems and the equilibrium point 0.
Theorem D.5.5 Consider the almost linear system of equations,
\[ x' = Ax + g(x) \tag{4.32} \]
where
\[ \lim_{x \to 0} \frac{g(x)}{|x|} = 0 \]
and g is a C 1 function. Suppose that for all λ an eigenvalue of A, Re λ < 0. Then 0 is
asymptotically stable.
Proof: By Theorem D.5.3 there exist constants δ > 0 and K such that for Φ (t) the
fundamental matrix for A,
|Φ (t) x| ≤ Ke−δt |x| .
Let ε > 0 be given and let r be small enough that Kr < ε and for |x| < (K + 1) r, |g (x)| <
η |x| where η is so small that Kη < δ, and let |y0 | < r. Then by the variation of constants
formula, the solution to 4.32, at least for small t satisfies
\[ y(t) = \Phi(t)\, y_0 + \int_0^t \Phi(t - s)\, g(y(s))\, ds. \]
The following estimate holds.
\[ |y(t)| \le K e^{-\delta t} |y_0| + \int_0^t K e^{-\delta(t-s)} \eta\, |y(s)|\, ds < K e^{-\delta t} r + \int_0^t K e^{-\delta(t-s)} \eta\, |y(s)|\, ds. \]
Therefore,
\[ e^{\delta t} |y(t)| < Kr + \int_0^t K \eta\, e^{\delta s} |y(s)|\, ds. \]
By Gronwall’s inequality,
eδt |y (t)| < KreKηt
and so
|y (t)| < Kre(Kη−δ)t < εe(Kη−δ)t
Therefore, |y (t)| < Kr < ε for all t and so from Corollary D.3.4, the solution to 4.32 exists
for all t ≥ 0 and since Kη − δ < 0,
\[ \lim_{t \to \infty} |y(t)| = 0. \]
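The conclusion of Theorem D.5.5 can be observed numerically. The sketch below integrates one almost linear system with a classical fourth order Runge-Kutta step. The specific choices are assumptions made for illustration only: A is the matrix of Example D.4.11 (eigenvalues −1 ± i), the perturbation is g (x) = (0, x1²), which satisfies g (x) / |x| → 0, and the initial point, step size, and final time are arbitrary.

```python
import math

def f(x):
    """Right side of x' = A x + g(x), A = [[0, 1], [-2, -2]], g(x) = (0, x1^2)."""
    x1, x2 = x
    return (x2, -2.0 * x1 - 2.0 * x2 + x1 * x1)

def rk4_step(x, h):
    """One classical Runge-Kutta step of size h."""
    k1 = f(x)
    k2 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
    k3 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
    k4 = f(tuple(xi + h * ki for xi, ki in zip(x, k3)))
    return tuple(xi + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))

def norm(x):
    return math.hypot(x[0], x[1])

x = (0.1, 0.0)                 # small initial condition near the equilibrium 0
h = 0.01
norms = [norm(x)]
for step in range(2000):       # integrate up to t = 20
    x = rk4_step(x, h)
    norms.append(norm(x))
print(norms[0], max(norms), norms[-1])
```

The largest norm along the trajectory stays close to the initial size, and the final norm is tiny, in line with |y (t)| < ε and the limit being 0. A numerical run is of course only an illustration and not a proof.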
D.6
General Geometric Theory
Here I will consider the case where the matrix A has both positive and negative eigenvalues.
First here is a useful lemma.
1 This is no longer true when you study partial differential equations as ordinary differential equations in
infinite dimensional spaces.
Lemma D.6.1 Suppose A is an n × n matrix and there exists δ > 0 such that
0 < δ < Re (λ1 ) ≤ · · · ≤ Re (λn )
where {λ1 , · · · , λn } are the eigenvalues of A, with possibly some repeated. Then there exists
a constant, C such that for all t < 0,
|Φ (t) x| ≤ Ceδt |x|
Proof: I want an estimate on the solutions to the system
Φ′ (t) = AΦ (t) , Φ (0) = I.
for t < 0. Let s = −t and let Ψ (s) = Φ (t) . Then writing this in terms of Ψ,
Ψ′ (s) = −AΨ (s) , Ψ (0) = I.
Now the eigenvalues of −A have real parts less than −δ because these eigenvalues are
obtained from the eigenvalues of A by multiplying by −1. Then by Theorem D.5.3 there
exists a constant, C such that for any x,
|Ψ (s) x| ≤ Ce−δs |x| .
Therefore, from the definition of Ψ,
|Φ (t) x| ≤ Ceδt |x| .
Here is another essential lemma which is found in Coddington and Levinson [6]
Lemma D.6.2 Let pj (t) be polynomials with complex coefficients and let
\[ f(t) = \sum_{j=1}^m p_j(t)\, e^{\lambda_j t} \]
where m ≥ 1, λj ̸= λk for j ̸= k, and none of the pj (t) vanish identically. Let
σ = max (Re (λ1 ) , · · · , Re (λm )) .
Then there exists a positive number, r and arbitrarily large positive values of t such that
e−σt |f (t)| > r.
In particular, |f (t)| is unbounded.
Proof: Suppose the largest exponent of any of the pj is M and let λj = aj + ibj . First
assume each aj = 0. This is convenient because σ = 0 in this case and the largest of the
Re (λj ) occurs in every λj .
Then arranging the above sum as a sum of decreasing powers of t,
f (t) = tM fM (t) + · · · + tf1 (t) + f0 (t) .
Then
\[ t^{-M} f(t) = f_M(t) + O\!\left( \frac{1}{t} \right) \]
where the last term means that $t\, O\!\left( \frac{1}{t} \right)$ is bounded. Then
\[ f_M(t) = \sum_{j=1}^m c_j e^{i b_j t}. \]
It can’t be the case that all the cj are equal to 0 because then M would not be the highest
power exponent. Suppose ck ̸= 0. Then
\[ \lim_{T \to \infty} \frac{1}{T} \int_0^T t^{-M} f(t)\, e^{-i b_k t}\, dt = \lim_{T \to \infty} \sum_{j=1}^m c_j \frac{1}{T} \int_0^T e^{i(b_j - b_k) t}\, dt = c_k \ne 0. \]
Letting r = |ck /2| , it follows $\left| t^{-M} f(t)\, e^{-i b_k t} \right| > r$ for arbitrarily large values of t. Thus it
is also true that |f (t)| > r for arbitrarily large values of t.
Next consider the general case in which σ is given above. Thus
\[ e^{-\sigma t} f(t) = \sum_{j : a_j = \sigma} p_j(t)\, e^{i b_j t} + g(t) \]
where $\lim_{t \to \infty} g(t) = 0$, g (t) being of the form $\sum_s p_s(t)\, e^{(a_s - \sigma + i b_s) t}$ where as − σ < 0. Then
this reduces to the case above in which σ = 0. Therefore, there exists r > 0 such that
\[ \left| e^{-\sigma t} f(t) \right| > r \]
for arbitrarily large values of t.
Next here is a Banach space which will be useful.
Lemma D.6.3 For γ > 0, let
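\[ E_\gamma = \left\{ x \in BC([0, \infty), \mathbb{F}^n) : t \to e^{\gamma t} x(t) \text{ is also in } BC([0, \infty), \mathbb{F}^n) \right\} \]
and let the norm be given by
\[ \|x\|_\gamma \equiv \sup \left\{ \left| e^{\gamma t} x(t) \right| : t \in [0, \infty) \right\} \]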
Then Eγ is a Banach space.
Proof: Let {xk } be a Cauchy sequence in Eγ . Then since BC ([0, ∞), Fn ) is a Banach
space, there exists y ∈ BC ([0, ∞), Fn ) such that eγt xk (t) converges uniformly on [0, ∞) to
y (t). Therefore e−γt eγt xk (t) = xk (t) converges uniformly to e−γt y (t) on [0, ∞). Define
x (t) ≡ e−γt y (t) . Then y (t) = eγt x (t) and by definition,
||xk − x||γ → 0.
D.7
The Stable Manifold
Here assume
\[ A = \begin{pmatrix} A_- & 0 \\ 0 & A_+ \end{pmatrix} \tag{4.33} \]
where A− and A+ are square matrices of size k × k and (n − k) × (n − k) respectively. Also
assume A− has eigenvalues whose real parts are all less than −α while A+ has eigenvalues
whose real parts are all larger than α. Assume also that each of A− and A+ is upper
triangular.
Also, I will use the following convention. For v ∈ Fn ,
\[ v = \begin{pmatrix} v_- \\ v_+ \end{pmatrix} \]
where v− consists of the first k entries of v.
Then from Theorem D.5.3 and Lemma D.6.1 the following lemma is obtained.
Lemma D.7.1 Let A be of the form given in 4.33 as explained above and let Φ+ (t) and
Φ− (t) be the fundamental matrices corresponding to A+ and A− respectively. Then there
exist positive constants, C, α and γ such that
\[ |\Phi_+(t)\, y| \le C e^{\alpha t} |y| \text{ for all } t < 0 \tag{4.34} \]
\[ |\Phi_-(t)\, y| \le C e^{-(\alpha + \gamma) t} |y| \text{ for all } t > 0. \tag{4.35} \]
Also for any nonzero $x \in \mathbb{C}^{n-k}$,
\[ |\Phi_+(t)\, x| \text{ is unbounded.} \tag{4.36} \]
Proof: The first two claims have been established already. It suffices to pick α and γ
such that − (α + γ) is larger than all eigenvalues of A− and α is smaller than all eigenvalues
of A+ . It remains to verify 4.36. From the Putzer formula for Φ+ (t) ,
\[ \Phi_+(t)\, x = \sum_{k=0}^{n-1} r_{k+1}(t)\, P_k(A)\, x \]
where P0 (A) ≡ I. Now each rk is a polynomial (possibly a constant) times an exponential.
This follows easily from the definition of the rk as solutions of the differential equations
\[ r_{k+1}' = \lambda_{k+1} r_{k+1} + r_k. \]
Now by assumption the eigenvalues have positive real parts so
σ ≡ max (Re (λ1 ) , · · · , Re (λn−k )) > 0.
It can also be assumed
Re (λ1 ) ≥ · · · ≥ Re (λn−k )
By Lemma D.6.2 it follows |Φ+ (t) x| is unbounded. This follows because
\[ \Phi_+(t)\, x = r_1(t)\, x + \sum_{k=1}^{n-1} r_{k+1}(t)\, y_k, \quad r_1(t) = e^{\lambda_1 t}. \]
Since x ≠ 0, it has a nonzero entry, say xm ≠ 0. Consider the mth entry of the vector
Φ+ (t) x. By this Lemma the mth entry is unbounded and this is all it takes for x (t) to be
unbounded.
Lemma D.7.2 Consider the initial value problem for the almost linear system
x′ = Ax + g (x) , x (0) = x0 ,
where g is C 1 and A is of the special form
\[ A = \begin{pmatrix} A_- & 0 \\ 0 & A_+ \end{pmatrix} \]
in which A− is a k × k matrix which has eigenvalues for which the real parts are all negative
and A+ is a (n − k) × (n − k) matrix for which the real parts of all the eigenvalues are
positive. Then 0 is not stable. More precisely, there exists a set of points (a− , ψ (a− )) for
a− small such that for x0 on this set,
\[ \lim_{t \to \infty} x(t, x_0) = 0 \]
and for x0 not on this set, there exists a δ > 0 such that |x (t, x0 )| cannot remain less than
δ for all positive t.
Proof: Consider the initial value problem for the almost linear equation,
\[ x' = Ax + g(x), \quad x(0) = a = \begin{pmatrix} a_- \\ a_+ \end{pmatrix}. \]
Then by the variation of constants formula, a local solution has the form
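\[ x(t, a) = \begin{pmatrix} \Phi_-(t) & 0 \\ 0 & \Phi_+(t) \end{pmatrix} \begin{pmatrix} a_- \\ a_+ \end{pmatrix} + \int_0^t \begin{pmatrix} \Phi_-(t - s) & 0 \\ 0 & \Phi_+(t - s) \end{pmatrix} g(x(s, a))\, ds \tag{4.37} \]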
Write x (t) for x (t, a) for short. Let ε > 0 be given and suppose δ is such that if |x| < δ,
then |g± (x)| < ε |x|. Assume from now on that |a| < δ. Then suppose |x (t)| < δ for all
t > 0. Writing 4.37 differently yields
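\[ \begin{aligned}
x(t, a) &= \begin{pmatrix} \Phi_-(t) & 0 \\ 0 & \Phi_+(t) \end{pmatrix} \begin{pmatrix} a_- \\ a_+ \end{pmatrix} + \begin{pmatrix} \int_0^t \Phi_-(t - s)\, g_-(x(s, a))\, ds \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ \int_0^t \Phi_+(t - s)\, g_+(x(s, a))\, ds \end{pmatrix} \\
&= \begin{pmatrix} \Phi_-(t) & 0 \\ 0 & \Phi_+(t) \end{pmatrix} \begin{pmatrix} a_- \\ a_+ \end{pmatrix} + \begin{pmatrix} \int_0^t \Phi_-(t - s)\, g_-(x(s, a))\, ds \\ 0 \end{pmatrix} \\
&\quad + \begin{pmatrix} 0 \\ \int_0^\infty \Phi_+(t - s)\, g_+(x(s, a))\, ds - \int_t^\infty \Phi_+(t - s)\, g_+(x(s, a))\, ds \end{pmatrix}.
\end{aligned} \]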
These improper integrals converge thanks to the assumption that x is bounded and the
estimates 4.34 and 4.35. Continuing the rewriting,
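\[ \begin{pmatrix} x_-(t) \\ x_+(t) \end{pmatrix} = \begin{pmatrix} \Phi_-(t)\, a_- + \int_0^t \Phi_-(t - s)\, g_-(x(s, a))\, ds \\ \Phi_+(t) \left( a_+ + \int_0^\infty \Phi_+(-s)\, g_+(x(s, a))\, ds \right) \end{pmatrix} + \begin{pmatrix} 0 \\ -\int_t^\infty \Phi_+(t - s)\, g_+(x(s, a))\, ds \end{pmatrix}. \]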
It follows from Lemma D.7.1 that if |x (t, a)| is bounded by δ as asserted, then it must be
the case that $a_+ + \int_0^\infty \Phi_+(-s)\, g_+(x(s, a))\, ds = 0$. Consequently, it must be the case that
\[ x(t) = \Phi(t) \begin{pmatrix} a_- \\ 0 \end{pmatrix} + \begin{pmatrix} \int_0^t \Phi_-(t - s)\, g_-(x(s, a))\, ds \\ -\int_t^\infty \Phi_+(t - s)\, g_+(x(s, a))\, ds \end{pmatrix} \tag{4.38} \]
Letting t → 0, this requires that for a solution to the initial value problem to exist and also
satisfy |x (t)| < δ for all t > 0 it must be the case that
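\[ x(0) = \begin{pmatrix} a_- \\ -\int_0^\infty \Phi_+(-s)\, g_+(x(s, a))\, ds \end{pmatrix} \]
where x (t, a) is the solution of
\[ x' = Ax + g(x), \quad x(0) = \begin{pmatrix} a_- \\ -\int_0^\infty \Phi_+(-s)\, g_+(x(s, a))\, ds \end{pmatrix}. \]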
This is because in 4.38, if x is bounded by δ then the reverse steps show x is a solution of
the above differential equation and initial condition.
It follows if I can show that for all a− sufficiently small and $a = (a_-, 0)^T$, there exists a
solution to 4.38 x (s, a) on (0, ∞) for which |x (s, a)| < δ, then I can define
\[ \psi(a) \equiv -\int_0^\infty \Phi_+(-s)\, g_+(x(s, a))\, ds \]
and conclude that |x (t, x0 )| < δ for all t > 0 if and only if $x_0 = (a_-, \psi(a_-))^T$ for some
sufficiently small a− .
Let C, α, γ be the constants of Lemma D.7.1. Let η be a small positive number such
that
\[ \frac{C\eta}{\alpha} < \frac{1}{6}. \]
Note that $\frac{\partial g}{\partial x_i}(0) = 0$. Therefore, by Lemma D.3.1, there exists δ > 0 such that if |x| , |y| ≤ δ,
then
|g (x) − g (y)| < η |x − y|
and in particular,
\[ |g_\pm(x) - g_\pm(y)| < \eta\, |x - y| \tag{4.39} \]
because each $\frac{\partial g}{\partial x_i}(x)$ is very small. In particular, this implies
|g− (x)| < η |x| , |g+ (x)| < η |x| .
For x ∈ Eγ defined in Lemma D.6.3 and $|a_-| < \frac{\delta}{2C}$,
\[ F x(t) \equiv \begin{pmatrix} \Phi_-(t)\, a_- + \int_0^t \Phi_-(t - s)\, g_-(x(s))\, ds \\ -\int_t^\infty \Phi_+(t - s)\, g_+(x(s))\, ds \end{pmatrix}. \]
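I need to find a fixed point of F. Letting ||x||γ < δ, and using the estimates of Lemma D.7.1,
\[ \begin{aligned}
e^{\gamma t} |F x(t)| &\le e^{\gamma t} |\Phi_-(t)\, a_-| + e^{\gamma t} \int_0^t C e^{-(\alpha + \gamma)(t - s)} \eta\, |x(s)|\, ds + e^{\gamma t} \int_t^\infty C e^{\alpha(t - s)} \eta\, |x(s)|\, ds \\
&\le e^{\gamma t} C \frac{\delta}{2C} e^{-(\alpha + \gamma) t} + e^{\gamma t} \|x\|_\gamma C\eta \int_0^t e^{-(\alpha + \gamma)(t - s)} e^{-\gamma s}\, ds + e^{\gamma t} C\eta \int_t^\infty e^{\alpha(t - s)} e^{-\gamma s}\, ds\, \|x\|_\gamma \\
&< \frac{\delta}{2} + \delta C\eta \int_0^t e^{-\alpha(t - s)}\, ds + C\eta\delta \int_t^\infty e^{(\alpha + \gamma)(t - s)}\, ds \\
&< \frac{\delta}{2} + \delta C\eta \frac{1}{\alpha} + \frac{\delta C\eta}{\alpha + \gamma} \le \delta \left( \frac{1}{2} + \frac{C\eta}{\alpha} \right) < \frac{2\delta}{3}.
\end{aligned} \]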
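Thus F maps every x ∈ Eγ having ||x||γ < δ to F x where $\|F x\|_\gamma \le \frac{2\delta}{3}$.
Now let x, y ∈ Eγ where ||x||γ , ||y||γ < δ. Then
\[ \begin{aligned}
e^{\gamma t} |F x(t) - F y(t)| &\le \int_0^t e^{\gamma t} |\Phi_-(t - s)|\, \eta\, e^{-\gamma s} e^{\gamma s} |x(s) - y(s)|\, ds + e^{\gamma t} \int_t^\infty |\Phi_+(t - s)|\, e^{-\gamma s} e^{\gamma s} \eta\, |x(s) - y(s)|\, ds \\
&\le C\eta\, \|x - y\|_\gamma \left( \int_0^t e^{-\alpha(t - s)}\, ds + \int_t^\infty e^{(\alpha + \gamma)(t - s)}\, ds \right) \\
&\le C\eta \left( \frac{1}{\alpha} + \frac{1}{\alpha + \gamma} \right) \|x - y\|_\gamma < \frac{2C\eta}{\alpha} \|x - y\|_\gamma < \frac{1}{3} \|x - y\|_\gamma.
\end{aligned} \]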
It follows from Lemma 14.6.4, for each a− such that $|a_-| < \frac{\delta}{2C}$, there exists a unique solution
to 4.38 in Eγ .
As pointed out earlier, if
\[ \psi(a) \equiv -\int_0^\infty \Phi_+(-s)\, g_+(x(s, a))\, ds \]
then x (t, x0 ), the solution to the initial value problem
x′ = Ax + g (x) , x (0) = x0 ,
has the property that if x0 is not of the form $\begin{pmatrix} a_- \\ \psi(a_-) \end{pmatrix}$, then |x (t, x0 )| cannot be less
than δ for all t > 0.
On the other hand, if $x_0 = \begin{pmatrix} a_- \\ \psi(a_-) \end{pmatrix}$ for $|a_-| < \frac{\delta}{2C}$, then x (t, x0 ), the solution to
4.38, is the unique solution to the initial value problem
x′ = Ax + g (x) , x (0) = x0 ,
and it was shown that ||x (·, x0 )||γ < δ and so in fact,
\[ |x(t, x_0)| \le \delta e^{-\gamma t} \]
showing that
\[ \lim_{t \to \infty} x(t, x_0) = 0. \]
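A small example may help make the function ψ concrete. The system below is a hypothetical illustration and does not come from the text: x′ = −x, y′ = y + x², so that A− = (−1), A+ = (1) and g (x, y) = (0, x²). Since g− = 0, the bounded solution has x (s) = a e^{−s}, and the formula for ψ gives ψ (a) = −∫₀^∞ e^{−s} (a e^{−s})² ds = −a²/3. The sketch evaluates this integral by the trapezoid rule on a truncated interval and then integrates the system from a point on the curve y = −x²/3 and from a nearby point off it. All function names, step sizes, and the truncation at s = 40 are assumptions of the sketch.

```python
import math

def psi(a, upper=40.0, steps=40000):
    """Trapezoid-rule value of -int_0^upper e^{-s} (a e^{-s})^2 ds."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-s) * (a * math.exp(-s)) ** 2
    return -h * total

def f(p):
    """Right side of the illustrative system x' = -x, y' = y + x^2."""
    x, y = p
    return (-x, y + x * x)

def rk4(p, h, steps):
    """Integrate with the classical Runge-Kutta method for the given number of steps."""
    for _ in range(steps):
        k1 = f(p)
        k2 = f((p[0] + 0.5 * h * k1[0], p[1] + 0.5 * h * k1[1]))
        k3 = f((p[0] + 0.5 * h * k2[0], p[1] + 0.5 * h * k2[1]))
        k4 = f((p[0] + h * k3[0], p[1] + h * k3[1]))
        p = (p[0] + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             p[1] + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return p

a = 0.5
on_manifold = rk4((a, -a * a / 3.0), 0.001, 5000)          # start on y = psi(x), run to t = 5
off_manifold = rk4((a, -a * a / 3.0 + 0.01), 0.001, 5000)  # start slightly off the curve
print(psi(a), -a * a / 3.0)
print(on_manifold, off_manifold)
```

The solution started on the curve shrinks toward 0, while the one started a distance 0.01 away from it leaves a neighborhood of 0 through the unstable direction. Because the unstable direction amplifies rounding errors, the run is kept to a short time interval.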
The following theorem is the main result. It involves a use of linear algebra and the
above lemma.
Theorem D.7.3 Consider the initial value problem for the almost linear system
x′ = Ax + g (x) , x (0) = x0
in which g is C 1 and where there are k < n eigenvalues of A which have negative real
parts and n − k eigenvalues of A which have positive real parts. Then 0 is not stable. More
precisely, there exists a set of points (a, ψ (a)) for a small and in a k dimensional subspace
such that for x0 on this set,
\[ \lim_{t \to \infty} x(t, x_0) = 0 \]
and for x0 not on this set, there exists a δ > 0 such that |x (t, x0 )| cannot remain less than
δ for all positive t.
Proof: This involves nothing more than a reduction to the situation of Lemma D.7.2.
From Theorem 10.5.2, A is similar to a matrix of the form described in Lemma D.7.2. Thus
\[ A = S^{-1} \begin{pmatrix} A_- & 0 \\ 0 & A_+ \end{pmatrix} S. \]
Letting y = Sx, it follows
\[ y' = \begin{pmatrix} A_- & 0 \\ 0 & A_+ \end{pmatrix} y + S g\left( S^{-1} y \right). \]
Now $|x| = \left| S^{-1} S x \right| \le \left\| S^{-1} \right\| |y|$ and $|y| = \left| S S^{-1} y \right| \le \|S\|\, |x|$. Therefore,
\[ \frac{1}{\|S\|} |y| \le |x| \le \left\| S^{-1} \right\| |y|. \]
It follows all conclusions of Lemma D.7.2 are valid for this theorem.
The set of points (a, ψ (a)) for a small is called the stable manifold. Much more can be
said about the stable manifold and you should look at a good differential equations book
for this.
Appendix E
Compactness And Completeness
E.1
The Nested Interval Lemma
First, here is the one dimensional nested interval lemma.
Lemma E.1.1 Let Ik = [ak , bk ] be closed intervals, ak ≤ bk , such that Ik ⊇ Ik+1 for all k.
Then there exists a point c which is contained in all these intervals. If limk→∞ (bk − ak ) = 0,
then there is exactly one such point.
Proof: Note that the {ak } are an increasing sequence and that {bk } is a decreasing
sequence. Now note that if m < n, then
am ≤ an ≤ bn
while if m > n,
bn ≥ bm ≥ am .
It follows that am ≤ bn for any pair m, n. Therefore, each bn is an upper bound for all the
am and so if c ≡ sup {ak }, then for each n, it follows that c ≤ bn and so for all n, an ≤ c ≤ bn ,
which shows that c is in all of these intervals.
If the condition on the lengths of the intervals holds, then if c, c′ are in all the intervals,
then if they are not equal, then eventually, for large enough k, they cannot both be contained
in [ak , bk ] since eventually bk − ak < |c − c′ |. This would be a contradiction. Hence c = c′ .
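The lemma can be watched in action with bisection. The sketch below builds nested intervals [ak , bk ] by repeatedly halving [1, 2] and keeping the half that contains √2; the choice of √2 and of 40 steps is only an illustrative assumption. In exact arithmetic the common point is c = sup {ak }; the code can only take the maximum over the finitely many left endpoints it has computed.

```python
def nested_intervals(steps):
    """Bisect [1, 2], always keeping the half that still contains sqrt(2)."""
    a, b = 1.0, 2.0
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2.0
        if mid * mid <= 2.0:
            a = mid          # sqrt(2) lies in [mid, b]
        else:
            b = mid          # sqrt(2) lies in [a, mid]
        intervals.append((a, b))
    return intervals

intervals = nested_intervals(40)
left_ends = [a for a, _ in intervals]
right_ends = [b for _, b in intervals]
c = max(left_ends)           # stands in for sup {a_k} over the computed intervals
print(c, right_ends[-1] - left_ends[-1])
```

The left endpoints increase, the right endpoints decrease, c lies in every computed interval, and the interval lengths shrink to 0, so the common point is pinned down.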
Definition E.1.2 The diameter of a set S, is defined as
diam (S) ≡ sup {|x − y| : x, y ∈ S} .
Thus diam (S) is just a careful description of what you would think of as the diameter.
It measures how stretched out the set is.
Here is a multidimensional version of the nested interval lemma.
Lemma E.1.3 Let $I_k = \prod_{i=1}^p \left[ a_i^k, b_i^k \right] \equiv \left\{ x \in \mathbb{R}^p : x_i \in \left[ a_i^k, b_i^k \right] \right\}$ and suppose that for all
k = 1, 2, · · · ,
Ik ⊇ Ik+1 .
Then there exists a point c ∈ Rp which is an element of every Ik . If limk→∞ diam (Ik ) = 0,
then the point c is unique.
Proof: For each i = 1, · · · , p, $\left[ a_i^k, b_i^k \right] \supseteq \left[ a_i^{k+1}, b_i^{k+1} \right]$ and so, by Lemma E.1.1, there
exists a point $c_i \in \left[ a_i^k, b_i^k \right]$ for all k. Then letting c ≡ (c1 , · · · , cp ) it follows c ∈ Ik for all k.
If the condition on the diameters holds, then the lengths of the intervals satisfy $\lim_{k \to \infty} \left( b_i^k - a_i^k \right) = 0$
and so by the same lemma, each ci is unique. Hence c is unique.
E.2
Convergent Sequences, Sequential Compactness
A mapping f : {k, k + 1, k + 2, · · · } → Rp is called a sequence. We usually write it in the
form {aj } where it is understood that aj ≡ f (j).
Definition E.2.1 A sequence, {ak } is said to converge to a if for every ε > 0 there exists
nε such that if n > nε , then |a − an | < ε. The usual notation for this is limn→∞ an = a
although it is often written as an → a. A closed set K ⊆ Rn is one which has the property
that if $\{k_j\}_{j=1}^\infty$ is a sequence of points of K which converges to x, then x ∈ K.
One can also define a subsequence.
Definition E.2.2 {ank } is a subsequence of {an } if n1 < n2 < · · · .
The following theorem says the limit, if it exists, is unique.
Theorem E.2.3 If a sequence, {an } converges to a and to b then a = b.
Proof: Let ε > 0 be given. There exists nε such that if n > nε then |an − a| < ε/2 and
|an − b| < ε/2. Pick such an n. Then
\[ |a - b| \le |a - a_n| + |a_n - b| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Since ε is arbitrary, this proves the theorem.
The following is the definition of a Cauchy sequence in Rp .
Definition E.2.4 {an } is a Cauchy sequence if for all ε > 0, there exists nε such that
whenever n, m ≥ nε , it follows that |an −am | < ε.
Saying that a sequence is Cauchy means the terms are “bunching up to each other” as m, n get
large.
Theorem E.2.5 The set of terms in a Cauchy sequence in Rp is bounded in the sense that
for all n, |an | < M for some M < ∞.
Proof: Let ε = 1 in the definition of a Cauchy sequence and let n > n1 . Then from the
definition, $|a_n - a_{n_1}| < 1$. It follows that for all n > n1 , $|a_n| < 1 + |a_{n_1}|$. Therefore, for all n,
\[ |a_n| \le 1 + |a_{n_1}| + \sum_{k=1}^{n_1} |a_k|. \]
Theorem E.2.6 If a sequence {an } in Rp converges, then the sequence is a Cauchy sequence. Also, if some subsequence of a Cauchy sequence converges, then the original sequence converges.
Proof: Let ε > 0 be given and suppose an → a. Then from the definition of convergence,
there exists nε such that if n > nε , it follows that |an −a| < ε/2. Therefore, if m, n ≥ nε + 1,
it follows that
\[ |a_n - a_m| \le |a_n - a| + |a - a_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \]
showing that, since ε > 0 is arbitrary, {an } is a Cauchy sequence. It remains to verify the
last claim.
Suppose then that {an } is a Cauchy sequence and $a = \lim_{k \to \infty} a_{n_k}$ where $\{a_{n_k}\}_{k=1}^\infty$
is a subsequence. Let ε > 0 be given. Then there exists K such that if k, l ≥ K, then
|ak − al | < ε/2. Then if k > K, it follows nk > K because n1 , n2 , n3 , · · · is strictly increasing
as the subscript increases. Also, there exists K1 such that if k > K1 , $|a_{n_k} - a| < \varepsilon/2$. Then
letting n > max (K, K1 ), pick k > max (K, K1 ). Then
\[ |a - a_n| \le |a - a_{n_k}| + |a_{n_k} - a_n| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Therefore, the sequence converges.
Definition E.2.7 A set K in Rp is said to be sequentially compact if every sequence in
K has a subsequence which converges to a point of K.
Theorem E.2.8 If $I_0 = \prod_{i=1}^p [a_i, b_i]$ where ai ≤ bi , then I0 is sequentially compact.
Proof: Let $\{a_k\}_{k=1}^\infty \subseteq I_0$ and consider all sets of the form $\prod_{i=1}^p [c_i, d_i]$ where [ci , di ]
equals either $\left[ a_i, \frac{a_i + b_i}{2} \right]$ or $\left[ \frac{a_i + b_i}{2}, b_i \right]$. Thus there are $2^p$ of these sets because
there are two choices for the ith slot for i = 1, · · · , p. Also, if x and y are two points in one
of these sets, $|x_i - y_i| \le 2^{-1} |b_i - a_i|$. Hence, where $\operatorname{diam}(I_0) = \left( \sum_{i=1}^p |b_i - a_i|^2 \right)^{1/2}$,
\[ |x - y| = \left( \sum_{i=1}^p |x_i - y_i|^2 \right)^{1/2} \le 2^{-1} \left( \sum_{i=1}^p |b_i - a_i|^2 \right)^{1/2} \equiv 2^{-1} \operatorname{diam}(I_0). \]
In particular, since d ≡ (d1 , · · · , dp ) and c ≡ (c1 , · · · , cp ) are two such points,
\[ D_1 \equiv \left( \sum_{i=1}^p |d_i - c_i|^2 \right)^{1/2} \le 2^{-1} \operatorname{diam}(I_0). \]
Denote by $\{J_1, \cdots, J_{2^p}\}$ these sets determined above. Since the union of these sets equals
all of I0 ≡ I, it follows that for some Jk , the term ai is contained in Jk for infinitely
many i. Let that one be called I1 . Next do for I1 what was done for I0 to get I2 ⊆ I1
such that the diameter is half that of I1 and I2 contains ak for infinitely many values
of k. Continue in this way obtaining a nested sequence {Ik } such that Ik ⊇ Ik+1 , and if
x, y ∈ Ik , then $|x - y| \le 2^{-k} \operatorname{diam}(I_0)$, and In contains ak for infinitely many values of
k for each n. Then by the nested interval lemma, there exists c such that c is contained in
each Ik . Pick $a_{n_1} \in I_1$. Next pick n2 > n1 such that $a_{n_2} \in I_2$. If $a_{n_1}, \cdots, a_{n_k}$ have been
chosen, let $a_{n_{k+1}} \in I_{k+1}$ and $n_{k+1} > n_k$. This can be done because in the construction, In
contains ak for infinitely many k. Thus the distance between $a_{n_k}$ and c is no larger than
$2^{-k} \operatorname{diam}(I_0)$, and so $\lim_{k \to \infty} a_{n_k} = c \in I_0$.
Corollary E.2.9 Let K be a closed and bounded set of points in Rp . Then K is sequentially
compact.
Proof: Since K is closed and bounded, there exists a closed rectangle, $\prod_{k=1}^p [a_k, b_k]$,
which contains K. Now let {xk } be a sequence of points in K. By Theorem E.2.8, there
exists a subsequence $\{x_{n_k}\}$ such that $x_{n_k} \to x \in \prod_{k=1}^p [a_k, b_k]$. However, K is closed and
each $x_{n_k}$ is in K so x ∈ K.
Theorem E.2.10 Every Cauchy sequence in Rp converges.
Proof: Let {ak } be a Cauchy sequence. By Theorem E.2.5, there is some box $\prod_{i=1}^p [a_i, b_i]$
containing all the terms of {ak }. Therefore, by Theorem E.2.8, a subsequence converges to
a point of $\prod_{i=1}^p [a_i, b_i]$. By Theorem E.2.6, the original sequence converges.
Appendix F
Fundamental Theorem Of
Algebra
The fundamental theorem of algebra states that every non constant polynomial having
coefficients in C has a zero in C. If C is replaced by R, this is not true because of the
example, x2 + 1 = 0. This theorem is a very remarkable result and notwithstanding its title,
all the best proofs of it depend on either analysis or topology. It was proved by Gauss in
1797 then proved with no loose ends by Argand in 1806 although others also worked on it.
For similar proofs and more discussion and references, see Rudin [23] and Hardy [13]. Recall
De Moivre’s theorem on Page 17 which is listed below for convenience.
Theorem F.0.11 Let r > 0 be given. Then if n is a positive integer,
\[ [r (\cos t + i \sin t)]^n = r^n (\cos nt + i \sin nt). \]
Now from this theorem, the following corollary on Page 1.5.5 is obtained.
Corollary F.0.12 Let z be a non zero complex number and let k be a positive integer. Then
there are always exactly k k th roots of z in C.
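The corollary can be checked numerically using the polar form behind De Moivre's theorem: if z = r (cos t + i sin t), the k th roots are r^{1/k} (cos ((t + 2πj)/k) + i sin ((t + 2πj)/k)) for j = 0, · · · , k − 1. The sketch below uses Python's standard `cmath` module; the function name `kth_roots` and the sample value z = −8 are illustrative assumptions.

```python
import cmath
import math

def kth_roots(z, k):
    """Return the k distinct k-th roots of a nonzero complex number z."""
    r, theta = cmath.polar(z)                       # z = r (cos theta + i sin theta)
    return [cmath.rect(r ** (1.0 / k), (theta + 2.0 * math.pi * j) / k)
            for j in range(k)]

roots = kth_roots(complex(-8.0, 0.0), 3)
for w in roots:
    print(w, w ** 3)                                # each cube should be close to -8
```

The three roots lie on the circle of radius 2, spaced evenly by the angle 2π/3.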
Lemma F.0.13 Let ak ∈ C for k = 1, · · · , n and let $p(z) \equiv \sum_{k=1}^n a_k z^k$. Then p is continuous.
Proof:
\[ |a z^n - a w^n| \le |a|\, |z - w|\, \left| z^{n-1} + z^{n-2} w + \cdots + w^{n-1} \right|. \]
Then for |z − w| < 1, the triangle inequality implies |w| < 1 + |z| and so if |z − w| < 1,
\[ |a z^n - a w^n| \le |a|\, |z - w|\, n\, (1 + |z|)^n. \]
If ε > 0 is given, let
\[ \delta < \min \left( 1, \frac{\varepsilon}{|a|\, n\, (1 + |z|)^n} \right). \]
It follows from the above inequality that for |z − w| < δ, |az n − awn | < ε. The function of
the lemma is just the sum of functions of this sort and so it follows that it is also continuous.
Theorem F.0.14 (Fundamental theorem of Algebra) Let p (z) be a nonconstant polynomial.
Then there exists z ∈ C such that p (z) = 0.
Proof. Suppose the nonconstant polynomial $p(z) = a_0 + a_1 z + \cdots + a_n z^n$, $a_n \ne 0$, has
no zero in C. Since $\lim_{|z| \to \infty} |p(z)| = \infty$, there is a z0 with
\[ |p(z_0)| = \min_{z \in \mathbb{C}} |p(z)| > 0. \]
Then let $q(z) = \frac{p(z + z_0)}{p(z_0)}$. This is also a polynomial which has no zeros and the minimum of
|q (z)| is 1 and occurs at z = 0. Since q (0) = 1, it follows $q(z) = 1 + a_k z^k + r(z)$ where r (z)
consists of higher order terms, exponent larger than k. Here ak is the first coefficient which
is nonzero. Choose a sequence, zn → 0, such that $a_k z_n^k < 0$. For example, let $-a_k z_n^k = 1/n$.
Then $|q(z_n)| \le 1 - \left| a_k z_n^k \right| + |r(z_n)| < 1$ for all n large enough because the higher order terms
in r (zn ) converge to 0 faster than $z_n^k$. This is a contradiction.
Appendix G
Fields And Field Extensions
G.1
The Symmetric Polynomial Theorem
First here is a definition of polynomials in many variables which have coefficients in a
commutative ring. A commutative ring would be a field except you don’t know that every
nonzero element has a multiplicative inverse. If you like, let these coefficients be in a field.
It is still interesting. A good example of a commutative ring is the integers. In particular,
every field is a commutative ring.
Definition G.1.1 Let k ≡ (k1 , k2 , · · · , kn ) where each ki is a nonnegative integer. Let
\[ |k| \equiv \sum_i k_i \]
Polynomials of degree p in the variables x1 , x2 , · · · , xn are expressions of the form
\[ g(x_1, x_2, \cdots, x_n) = \sum_{|k| \le p} a_k x_1^{k_1} \cdots x_n^{k_n} \]
where each ak is in a commutative ring. If all ak = 0, the polynomial has no degree. Such
a polynomial is said to be symmetric if whenever σ is a permutation of {1, 2, · · · , n},
\[ g\left( x_{\sigma(1)}, x_{\sigma(2)}, \cdots, x_{\sigma(n)} \right) = g(x_1, x_2, \cdots, x_n) \]
An example of a symmetric polynomial is
\[ s_1(x_1, x_2, \cdots, x_n) \equiv \sum_{i=1}^n x_i \]
Another one is
sn (x1 , x2 , · · · , xn ) ≡ x1 x2 · · · xn
Definition G.1.2 The elementary symmetric polynomial sk (x1 , x2 , · · · , xn ) , k = 1, · · · , n
is the coefficient of $(-1)^k x^{n-k}$ in the following polynomial.
\[ (x - x_1)(x - x_2) \cdots (x - x_n) = x^n - s_1 x^{n-1} + s_2 x^{n-2} - \cdots \pm s_n \]
Thus
\[ s_1 = x_1 + x_2 + \cdots + x_n, \quad s_2 = \sum_{i<j} x_i x_j, \quad s_3 = \sum_{i<j<k} x_i x_j x_k, \quad \ldots, \quad s_n = x_1 x_2 \cdots x_n \]
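Definition G.1.2 can be tested on a concrete list of numbers. The sketch below computes sk directly as the sum of all products of k distinct entries and compares (−1)^k sk with the coefficient of x^{n−k} obtained by multiplying out (x − x1 ) · · · (x − xn ). The function names and the sample values 2, −1, 3, 5 are illustrative assumptions; `math.prod` requires Python 3.8 or later.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(xs, k):
    """s_k(xs): sum of all products of k distinct entries of xs."""
    return sum(prod(c) for c in combinations(xs, k))

def expand_monic(xs):
    """Coefficients of (x - x1)...(x - xn), highest power first."""
    coeffs = [1]
    for r in xs:
        shifted = coeffs + [0]                     # coefficients of x * (current polynomial)
        scaled = [0] + [-r * c for c in coeffs]    # coefficients of -r * (current polynomial)
        coeffs = [u + v for u, v in zip(shifted, scaled)]
    return coeffs

xs = [2, -1, 3, 5]
n = len(xs)
coeffs = expand_monic(xs)
from_definition = [(-1) ** k * elementary_symmetric(xs, k) for k in range(1, n + 1)]
print(coeffs[1:], from_definition)
```

For these values the product is x⁴ − 9x³ + 21x² + x − 30, so s1 = 9, s2 = 21, s3 = −1 and s4 = −30.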
Then the following result is the fundamental theorem in the subject. It is the symmetric
polynomial theorem. It says that these elementary symmetric polynomials are a lot like a
basis for the symmetric polynomials.
Theorem G.1.3 Let g (x1 , x2 , · · · , xn ) be a symmetric polynomial. Then g (x1 , x2 , · · · , xn )
equals a polynomial in the elementary symmetric functions.
\[ g(x_1, x_2, \cdots, x_n) = \sum_k a_k s_1^{k_1} \cdots s_n^{k_n} \]
and the ak are unique.
Proof: If n = 1, it is obviously true because s1 = x1 . Suppose the theorem is true for
n − 1 and g (x1 , x2 , · · · , xn ) has degree d. Let
g ′ (x1 , x2 , · · · , xn−1 ) ≡ g (x1 , x2 , · · · , xn−1 , 0)
By induction, there are unique ak such that
\[ g'(x_1, x_2, \cdots, x_{n-1}) = \sum_k a_k (s_1')^{k_1} \cdots (s_{n-1}')^{k_{n-1}} \]
where s′i is the corresponding symmetric polynomial which pertains to x1 , x2 , · · · , xn−1 .
Note that
sk (x1 , x2 , · · · , xn−1 , 0) = s′k (x1 , x2 , · · · , xn−1 )
Now consider
\[ q(x_1, x_2, \cdots, x_n) \equiv g(x_1, x_2, \cdots, x_n) - \sum_k a_k s_1^{k_1} \cdots s_{n-1}^{k_{n-1}}. \]
This is a symmetric polynomial and it equals 0 when xn equals 0. Since it is symmetric, it is also
0 whenever xi = 0. Therefore,
q (x1 , x2 , · · · , xn ) = sn h (x1 , x2 , · · · , xn )
and it follows that h (x1 , x2 , · · · , xn ) is symmetric of degree no more than d − n and is
uniquely determined. Thus, if g (x1 , x2 , · · · , xn ) is symmetric of degree d,
\[ g(x_1, x_2, \cdots, x_n) = \sum_k a_k s_1^{k_1} \cdots s_{n-1}^{k_{n-1}} + s_n h(x_1, x_2, \cdots, x_n) \]
where h has degree no more than d − n. Now apply the same argument to h (x1 , x2 , · · · , xn )
and continue, repeatedly obtaining a sequence of symmetric polynomials hi , of strictly decreasing degree, obtaining expressions of the form
\[ g(x_1, x_2, \cdots, x_n) = \sum_k b_k s_1^{k_1} \cdots s_{n-1}^{k_{n-1}} s_n^{k_n} + s_n h_m(x_1, x_2, \cdots, x_n) \]
Eventually hm must be a constant or zero. By induction, each step in the argument yields
uniqueness and so, the final sum of combinations of elementary symmetric functions is
uniquely determined.
Here is a very interesting result which I saw claimed in a paper by Steinberg and Redheffer
on Lindemann’s theorem which follows from the above theorem.
Theorem G.1.4 Let α1 , · · · , αn be roots of the polynomial equation
an xn + an−1 xn−1 + · · · + a1 x + a0 = 0
where each ai is an integer. Then any symmetric polynomial in the quantities an α1 , · · · , an αn
having integer coefficients is also an integer. Also any symmetric polynomial in the quantities α1 , · · · , αn having rational coefficients is a rational number.
Proof: Let f (x1 , · · · , xn ) be the symmetric polynomial. Thus
f (x1 , · · · , xn ) ∈ Z [x1 · · · xn ]
From Theorem G.1.3 it follows there are integers $a_{k_1 \cdots k_n}$ such that
\[ f(x_1, \cdots, x_n) = \sum_{k_1 + \cdots + k_n \le m} a_{k_1 \cdots k_n} p_1^{k_1} \cdots p_n^{k_n} \]
where the pi are the elementary symmetric polynomials defined as the coefficients of
\[ \prod_{j=1}^n (x - x_j). \]
Thus
\[ f(a_n \alpha_1, \cdots, a_n \alpha_n) = \sum_{k_1 + \cdots + k_n \le m} a_{k_1 \cdots k_n} p_1^{k_1}(a_n \alpha_1, \cdots, a_n \alpha_n) \cdots p_n^{k_n}(a_n \alpha_1, \cdots, a_n \alpha_n). \]
Now the given polynomial is of the form
\[ a_n \prod_{j=1}^n (x - \alpha_j) \]
and so the coefficient of $x^{n-k}$ is $p_k(\alpha_1, \cdots, \alpha_n)\, a_n = a_{n-k}$. Also
\[ p_k(a_n \alpha_1, \cdots, a_n \alpha_n) = a_n^k\, p_k(\alpha_1, \cdots, \alpha_n) = a_n^k \frac{a_{n-k}}{a_n}. \]
It follows
f (an α1 , · · · , an αn ) =
∑
k1 +···+kn
(
ak1 ···kn
an−1
a1n
an
)k1 (
)k2
(
)kn
2 an−2
n a0
an
· · · an
an
an
which is an integer. To see the last claim follows from this, take the symmetric polynomial
in α1 , · · · , αn and multiply by the product of the denominators of the rational coefficients
to get one which has integer coefficients. Then by the first part, each homogeneous term is
just an integer divided by $a_n$ raised to some power.

G.2 The Fundamental Theorem Of Algebra
This is devoted to a mostly algebraic proof of the fundamental theorem of algebra. It
depends on the interesting results about symmetric polynomials which are presented above.
I found it in the Wikipedia article on the fundamental theorem of algebra, which is easy to locate by searching for “fundamental theorem of algebra”. The article gives several other
proofs in addition to this one. According to this article, the first completely correct proof
of this major theorem is due to Argand in 1806. Gauss and others did it earlier but their
arguments had gaps in them.
You can’t completely escape analysis when you prove this theorem. The necessary analysis is in the following lemma.
Lemma G.2.1 Suppose $p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ where n is odd and the
coefficients are real. Then p (x) has a real root.
Proof: This follows from the intermediate value theorem from calculus.
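The intermediate value theorem argument can be made concrete with bisection. The following is only a minimal sketch: the function names are hypothetical, and the search interval uses the Cauchy root bound $1 + \max |a_i|$, which is my own choice and not something the text relies on.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from highest to lowest degree."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def real_root_odd_degree(coeffs, iterations=200):
    """Bisection for a real root of a monic real polynomial of odd degree."""
    degree = len(coeffs) - 1
    if degree % 2 != 1 or coeffs[0] != 1:
        raise ValueError("expected a monic polynomial of odd degree")
    # Assumption of this sketch: the Cauchy bound, so p(-bound) <= 0 <= p(bound).
    bound = 1.0 + max(abs(c) for c in coeffs[1:])
    lo, hi = -bound, bound
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if poly_eval(coeffs, mid) > 0:
            hi = mid  # sign change is in [lo, mid]
        else:
            lo = mid  # sign change is in [mid, hi]
    return (lo + hi) / 2.0


cubic = [1, 0, -2, -5]  # x^3 - 2x - 5
root = real_root_odd_degree(cubic)
print(root, poly_eval(cubic, root))
```

The loop keeps a sign change inside the interval at every step, which is exactly the situation the intermediate value theorem addresses.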
Next is an algebraic consideration. First recall some notation.
\[ \prod_{i=1}^{m} a_i \equiv a_1 a_2 \cdots a_m \]
Recall a polynomial in $\{z_1, \cdots, z_n\}$ is symmetric if and only if it can be written as a sum of constants times products of elementary symmetric polynomials raised to various powers. This follows
from Proposition G.1.3 or Theorem G.1.3, both of which are versions of the theorem on symmetric
polynomials.
The following is the main part of the theorem. In fact this is one version of the fundamental theorem of algebra which people studied earlier in the 1700’s.
Lemma G.2.2 Let $p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ be a polynomial with real
coefficients. Then it has a complex root.
Proof: It is possible to write
\[ n = 2^k m \]
where m is odd. If n is odd, k = 0. If n is even, keep dividing by 2 until you are left with
an odd number. If k = 0 so that n is odd, it follows from Lemma G.2.1 that p (x) has a
real, hence complex root. The proof will be by induction on k, the case k = 0 being done.
Suppose then that it works for $n = 2^l m$ where m is odd and $l \le k - 1$ and let $n = 2^k m$
where m is odd. Let $\{z_1, \cdots, z_n\}$ be the roots of the polynomial in a splitting field, the
existence of this field being given by the above proposition. Then
\[ p(x) = \prod_{j=1}^{n} (x - z_j) = \sum_{k=0}^{n} (-1)^k p_k\, x^{n-k} \tag{7.1} \]
where $p_k$ is the $k^{th}$ elementary symmetric polynomial. Note this shows
\[ a_{n-k} = p_k (-1)^k. \tag{7.2} \]
There is another polynomial which has coefficients which are sums of real numbers times
the pk raised to various powers and it is
\[ q_t(x) \equiv \prod_{1 \le i < j \le n} \left( x - (z_i + z_j + t z_i z_j) \right), \quad t \in \mathbb{R} \]
I need to verify this is really the case for qt (x). When you switch any two of the zi in qt (x)
the polynomial does not change. For example, let n = 3 when qt (x) is
(x − (z1 + z2 + tz1 z2 )) (x − (z1 + z3 + tz1 z3 )) (x − (z2 + z3 + tz2 z3 ))
and you can observe the assertion about the polynomial is true when you switch two different zi . Thus the coefficients of qt (x) must be symmetric polynomials in the zi with real
coefficients. Hence by Proposition G.1.3 these coefficients are real polynomials in terms
of the elementary symmetric polynomials pk . Thus by 7.2 the coefficients of qt (x) are real
polynomials in terms of the $a_k$ of the original polynomial. Recall these were all real. It
follows, and this is what was wanted, that $q_t(x)$ has all real coefficients.
Note that the degree of $q_t(x)$ is $\binom{n}{2}$ because there are this number of ways to pick
$i < j$ out of $\{1, \cdots, n\}$. Now
\[ \binom{n}{2} = \frac{n(n-1)}{2} = 2^{k-1} m \left( 2^k m - 1 \right) = 2^{k-1}(\text{odd}) \]
and so by induction, for each t ∈ R, qt (x) has a complex root.
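The two facts just used (real coefficients and degree $\binom{n}{2}$) can be checked numerically on a small case. This is a sketch under my own assumptions: the roots are picked by hand so that they are closed under conjugation, which makes p (x) real, and the helper names are hypothetical.

```python
from itertools import combinations


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, highest degree first."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_from_roots(roots):
    """Coefficients of the monic polynomial prod (x - r) over the given roots."""
    coeffs = [1 + 0j]
    for r in roots:
        coeffs = poly_mul(coeffs, [1, -r])
    return coeffs


def q_t(roots, t):
    """Coefficients of q_t(x) = prod over i < j of (x - (z_i + z_j + t z_i z_j))."""
    values = [zi + zj + t * zi * zj for zi, zj in combinations(roots, 2)]
    return poly_from_roots(values)


# n = 4 = 2^2 * 1, so q_t should have degree 4 * 3 / 2 = 6 = 2^1 * 3.
roots = [1 + 2j, 1 - 2j, 3, -1]
p_coeffs = poly_from_roots(roots)
q_coeffs = q_t(roots, t=0.5)
degree_q = len(q_coeffs) - 1
max_imag_p = max(abs(c.imag) for c in p_coeffs)
max_imag_q = max(abs(c.imag) for c in q_coeffs)
print(degree_q, max_imag_p, max_imag_q)
```

The power of 2 dividing the degree drops from $2^2$ for p (x) to $2^1$ for $q_t(x)$, which is what drives the induction.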
There must exist s ̸= t such that for a single pair of indices i, j, with i < j,
(zi + zj + tzi zj ) , (zi + zj + szi zj )
are both complex. Here is why. Let A (i, j) denote those t ∈ R such that (zi + zj + tzi zj ) is
complex. It was just shown that every t ∈ R must be in some A (i, j). There are infinitely
many t ∈ R but only finitely many pairs (i, j), and so some A (i, j) contains two of them.
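Now for that t, s,
\[ z_i + z_j + t z_i z_j = a, \qquad z_i + z_j + s z_i z_j = b \]
where $t \ne s$ and $a, b \in \mathbb{C}$, and so by Cramer's rule,
\[ z_i + z_j = \frac{\begin{vmatrix} a & t \\ b & s \end{vmatrix}}{\begin{vmatrix} 1 & t \\ 1 & s \end{vmatrix}} \in \mathbb{C} \]
and also
\[ z_i z_j = \frac{\begin{vmatrix} 1 & a \\ 1 & b \end{vmatrix}}{\begin{vmatrix} 1 & t \\ 1 & s \end{vmatrix}} \in \mathbb{C} \]
At this point, note that $z_i, z_j$ are both solutions to the equation
\[ x^2 - (z_i + z_j)\, x + z_i z_j = 0, \]
which from the above has complex coefficients. By the quadratic formula the $z_i, z_j$ are both
complex. Thus the original polynomial has a complex root.

With this lemma, it is easy to prove the fundamental theorem of algebra. The difference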
between the lemma and this theorem is that in the theorem, the coefficients are only assumed
to be complex. What this means is that if you have any polynomial with complex coefficients
it has a complex root and so it is not irreducible. Hence the field extension is the same field.
Another way to say this is that for every complex polynomial there exists a factorization
into linear factors or in other words a splitting field for a complex polynomial is the field of
complex numbers.
Theorem G.2.3 Let $p(x) \equiv a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ be any complex polynomial,
$n \ge 1$, $a_n \ne 0$. Then it has a complex root. Furthermore, there exist complex numbers
$z_1, \cdots, z_n$ such that
\[ p(x) = a_n \prod_{k=1}^{n} (x - z_k) \]
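Proof: First suppose $a_n = 1$. Consider the polynomial
\[ q(x) \equiv p(x)\, \overline{p}(x) \]
where $\overline{p}(x)$ denotes the polynomial whose coefficients are the complex conjugates of those of p (x). This is a polynomial and it has real coefficients. This is because it equals
\[ \left( x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \right) \cdot \left( x^n + \overline{a_{n-1}}\, x^{n-1} + \cdots + \overline{a_1}\, x + \overline{a_0} \right) \]
The $x^{j+k}$ term of the above product, for $j \ne k$, is of the form
\[ a_k x^k\, \overline{a_j}\, x^j + \overline{a_k}\, x^k a_j x^j = x^{k+j} \left( a_k \overline{a_j} + \overline{a_k}\, a_j \right) \]
and
\[ a_k \overline{a_j} + \overline{a_k}\, a_j = a_k \overline{a_j} + \overline{a_k \overline{a_j}} \]
so it is of the form of a complex number added to its conjugate. (When j = k the single term $a_k \overline{a_k} = |a_k|^2$ is real as well.) Hence q (x) has real
coefficients as claimed. Therefore, by Lemma G.2.2 it has a complex root z. Hence
either $p(z) = 0$ or $\overline{p}(z) = 0$. In the second case, taking conjugates gives $p(\overline{z}) = 0$. Thus p (x) has a complex root.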
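The conjugation trick is easy to test numerically. The sketch below assumes a monic cubic built from three hand-picked complex roots (my own choice), forms the product of p with its conjugated-coefficient partner, and checks that the product is real and that a root of the product leads back to a root of p. All helper names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, highest degree first."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_from_roots(roots):
    """Coefficients of the monic polynomial with the given roots."""
    coeffs = [1 + 0j]
    for r in roots:
        coeffs = poly_mul(coeffs, [1, -r])
    return coeffs


def poly_eval(coeffs, x):
    """Horner evaluation, highest degree first."""
    result = 0j
    for c in coeffs:
        result = result * x + c
    return result


def conj_poly(coeffs):
    """Polynomial whose coefficients are the conjugates of the given ones."""
    return [complex(c).conjugate() for c in coeffs]


p = poly_from_roots([1 + 1j, 2 - 3j, 0.5j])  # complex coefficients
q = poly_mul(p, conj_poly(p))                # q = p times conj-p
max_imag_q = max(abs(c.imag) for c in q)

w = 1 - 1j  # a root of q that is not a root of p
print(max_imag_q, abs(poly_eval(q, w)), abs(poly_eval(p, w)),
      abs(poly_eval(p, w.conjugate())))
```

Here w is a root of q coming from the conjugated factor, and its conjugate is a root of p, matching the second case in the proof.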
Next suppose an ̸= 0. Then simply divide by it and get a polynomial in which an = 1.
Denote this modified polynomial as q (x). Then by what was just shown and the Euclidean
algorithm, there exists z1 ∈ C such that
q (x) = (x − z1 ) q1 (x)
where q1 (x) has complex coefficients. Now do the same thing for q1 (x) to obtain
q (x) = (x − z1 ) (x − z2 ) q2 (x)
and continue this way. Thus
\[ \frac{p(x)}{a_n} = \prod_{j=1}^{n} (x - z_j) \]
Obviously this is a harder proof than the other proof of the fundamental theorem of
algebra presented earlier. However, this is a better proof. Consider the algebraic numbers A consisting of the real numbers which are roots of some polynomial having rational
coefficients. By Theorem 8.3.32 they are a field. Now consider the field A + iA with the
usual conventions for complex arithmetic. You could repeat the above argument with small
changes and conclude that every polynomial having coefficients in A + iA has a root in
A + iA. Recall from Problem 41 on Page 231 that A is countable and so this is also the case
for A + iA. Thus this gives an algebraically complete field which is countable and so very
different than C. Of course there are other situations in which the above harder proof will
work and yield interesting results.
G.3 Transcendental Numbers
Most numbers are transcendental, that is, not algebraic. Here the algebraic numbers are those which are roots of a
polynomial equation having rational numbers as coefficients. By the fundamental theorem
of algebra, all these numbers are in C. There are only countably many of these algebraic
numbers, (Problem 41 on Page 231). Therefore, most numbers are transcendental. Nevertheless, it is very hard to prove that this or that number is transcendental. Probably the
most famous theorem about this is the Lindemann Weierstrass theorem.
Theorem G.3.1 Let the $\alpha_i$ be distinct nonzero algebraic numbers and let the $a_i$ be nonzero
algebraic numbers. Then
\[ \sum_{i=1}^{n} a_i e^{\alpha_i} \ne 0 \]
I am following the interesting Wikipedia article on this subject. You can also look at the
book by Baker [4], Transcendental Number Theory, Cambridge University Press. There are
also many other treatments which you can find on the web including an interesting article
by Steinberg and Redheffer which appeared in about 1950.
The proof makes use of the following identity. For f (x) a polynomial,
\[ I(s) \equiv \int_0^s e^{s-x} f(x)\, dx = e^s \sum_{j=0}^{\deg(f)} f^{(j)}(0) - \sum_{j=0}^{\deg(f)} f^{(j)}(s). \tag{7.3} \]
where $f^{(j)}$ denotes the $j^{th}$ derivative. In this formula, $s \in \mathbb{C}$ and the integral is defined in
the natural way as
\[ \int_0^1 s f(ts)\, e^{s-ts}\, dt \tag{7.4} \]
The identity follows from integration by parts.
\[ \int_0^1 s f(ts)\, e^{s-ts}\, dt = s e^s \int_0^1 f(ts)\, e^{-ts}\, dt \]
\[ = s e^s \left[ -\frac{e^{-ts}}{s} f(ts) \Big|_0^1 + \int_0^1 \frac{e^{-ts}}{s}\, s f'(st)\, dt \right] \]
\[ = s e^s \left[ -\frac{e^{-s}}{s} f(s) + \frac{1}{s} f(0) + \int_0^1 e^{-ts} f'(st)\, dt \right] \]
\[ = e^s f(0) - f(s) + \int_0^1 s e^{s-ts} f'(st)\, dt \]
\[ \equiv e^s f(0) - f(s) + \int_0^s e^{s-x} f'(x)\, dx \]
Continuing this way establishes the identity.
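Identity 7.3 can be sanity checked numerically for real s. The sketch below is only an illustration under my own assumptions: a sample cubic, s = 1.5, and composite Simpson's rule for the integral. The function names are hypothetical.

```python
import math


def derivative(coeffs):
    """Derivative of a polynomial; coeffs[i] is the coefficient of x**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x**i for i, c in enumerate(coeffs))


def derivative_sum(coeffs, x):
    """Sum of f(x), f'(x), f''(x), ... up to the last nonzero derivative."""
    total = 0.0
    while coeffs:
        total += evaluate(coeffs, x)
        coeffs = derivative(coeffs)
    return total


def integral_I(coeffs, s, steps=2000):
    """Composite Simpson approximation of the integral of e^(s-x) f(x) over [0, s]."""
    h = s / steps  # steps must be even

    def g(x):
        return math.exp(s - x) * evaluate(coeffs, x)

    total = g(0.0) + g(s)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3


f = [1, -2, 0, 1]  # f(x) = 1 - 2x + x^3
s = 1.5
lhs = integral_I(f, s)
rhs = math.exp(s) * derivative_sum(f, 0.0) - derivative_sum(f, s)
print(lhs, rhs)
```

For this f the derivative sums are 5 at 0 and 21.125 at 1.5, so the right side is $5e^{1.5} - 21.125$.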
Lemma G.3.2 If K and c are nonzero integers, and $\beta_1, \cdots, \beta_m$ are the roots of a single
polynomial with integer coefficients,
\[ Q(x) = v x^m + \cdots + u \]
where $v, u \ne 0$, then
\[ K + c \left( e^{\beta_1} + \cdots + e^{\beta_m} \right) \ne 0. \]
Letting
\[ f(x) = \frac{v^{(m-1)p} Q^p(x)\, x^{p-1}}{(p-1)!} \]
and I (s) be defined in terms of f (x) as above, it follows,
\[ \lim_{p \to \infty} \sum_{i=1}^{m} I(\beta_i) = 0 \]
and
\[ \sum_{j=0}^{n} f^{(j)}(0) = v^{p(m-1)} u^p + m_1(p)\, p \]
\[ \sum_{i=1}^{m} \sum_{j=0}^{n} f^{(j)}(\beta_i) = m_2(p)\, p \]
where $m_i(p)$ is some integer.
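Proof: Let p be a prime number. Then consider the polynomial f (x) of degree $n \equiv pm + p - 1$,
\[ f(x) = \frac{v^{(m-1)p} Q^p(x)\, x^{p-1}}{(p-1)!} \]
From 7.3
\[ c \sum_{i=1}^{m} I(\beta_i) = c \sum_{i=1}^{m} \left( e^{\beta_i} \sum_{j=0}^{n} f^{(j)}(0) - \sum_{j=0}^{n} f^{(j)}(\beta_i) \right) \]
\[ = \left( K + c \sum_{i=1}^{m} e^{\beta_i} \right) \sum_{j=0}^{n} f^{(j)}(0) - K \sum_{j=0}^{n} f^{(j)}(0) - c \sum_{i=1}^{m} \sum_{j=0}^{n} f^{(j)}(\beta_i) \tag{7.5} \]
Claim 1: $\lim_{p \to \infty} c \sum_{i=1}^{m} I(\beta_i) = 0$.
Proof: This follows right away from the definition of $I(\beta_j)$ and the definition of f (x).
\[ \left| I(\beta_j) \right| \le \int_0^1 \left| \beta_j f(t\beta_j)\, e^{\beta_j - t\beta_j} \right| dt \le \int_0^1 \left| \beta_j e^{\beta_j - t\beta_j} \right| \frac{|v|^{(m-1)p} \left| Q(t\beta_j) \right|^p t^{p-1} |\beta_j|^{p-1}}{(p-1)!}\, dt \]
which clearly converges to 0 as $p \to \infty$ because of the (p − 1)! in the denominator. This proves the claim.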
The next thing to consider is the term on the end in 7.5,
\[ K \sum_{j=0}^{n} f^{(j)}(0) + c \sum_{i=1}^{m} \sum_{j=0}^{n} f^{(j)}(\beta_i) \tag{7.6} \]
The idea is to show that for large enough p it is always a nonzero integer. When this is done, it
can't happen that $K + c \sum_{i=1}^{m} e^{\beta_i} = 0$ because if this were so, you would have a very small
number equal to a nonzero integer. Now
\[ f(x) = \frac{v^{(m-1)p} \Big( \overbrace{v (x - \beta_1)(x - \beta_2) \cdots (x - \beta_m)}^{Q(x)} \Big)^p x^{p-1}}{(p-1)!} = \frac{v^{mp} \left( (x - \beta_1)(x - \beta_2) \cdots (x - \beta_m) \right)^p x^{p-1}}{(p-1)!} \tag{7.7} \]
It follows that for $j < p - 1$, $f^{(j)}(0) = 0$. This is because of that term $x^{p-1}$. If $j \ge p$, $f^{(j)}(0)$
is an integer multiple of p. Here is why. The terms in this derivative which are nonzero
involve taking p − 1 derivatives of $x^{p-1}$ and this introduces a (p − 1)! which cancels out the
denominator. Then there are some other derivatives of the product of the $(x - \beta_i)$ raised
to the power p. By the chain rule, these all involve a multiple of p. Thus this $j^{th}$ derivative
is of the form
\[ p\, g(x, v\beta_1, \cdots, v\beta_m), \tag{7.8} \]
where $g(x, v\beta_1, \cdots, v\beta_m)$ is a polynomial in x with coefficients which are symmetric polynomials in $\{v\beta_1, \cdots, v\beta_m\}$ having integer coefficients. Then derivatives of g with respect
to x also yield polynomials in x which have coefficients which are symmetric polynomials
in $\{v\beta_1, \cdots, v\beta_m\}$ having integer coefficients. Evaluating g at x = 0 must therefore yield
a polynomial which is symmetric in the $\{v\beta_1, \cdots, v\beta_m\}$ with integer coefficients. Since the
$\{\beta_1, \cdots, \beta_m\}$ are the roots of a polynomial having integer coefficients with leading coefficient v, it follows from Theorem G.1.4 that this last polynomial is an integer and so the $j^{th}$
derivative of f given by 7.8 when evaluated at x = 0 yields an integer times p. Now consider
the case of the $(p-1)^{th}$ derivative of f. The only nonzero term of $f^{(p-1)}(0)$ is the one which
comes from taking p − 1 derivatives of $x^{p-1}$ and so it reduces to
\[ v^{mp} (-1)^{mp} (\beta_1 \beta_2 \cdots \beta_m)^p \]
Now $Q(0) = v (-1)^m (\beta_1 \beta_2 \cdots \beta_m) = u$ and so $v^p (-1)^{mp} (\beta_1 \beta_2 \cdots \beta_m)^p = u^p$ which yields
\[ f^{(p-1)}(0) = v^{mp} u^p v^{-p} = v^{p(m-1)} u^p \]
Note this is not necessarily a multiple of p and in fact will not be so if $p > |u|, |v|$ because p is
a prime number. It follows
\[ \sum_{j=0}^{n} f^{(j)}(0) = v^{p(m-1)} u^p + m(p)\, p \]
where m (p) is some integer.
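These divisibility claims can be verified with exact integer arithmetic on a small case. The sketch below assumes the sample polynomial $Q(x) = 2x^2 + 3x + 5$ and the prime p = 7, both my own choices, and uses the fact that $f^{(j)}(0)$ equals j! times the coefficient of $x^j$ in f. The function names are hypothetical.

```python
from math import factorial


def poly_mul(a, b):
    """Multiply integer polynomials; index i holds the coefficient of x**i."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_pow(a, exponent):
    """Raise a polynomial to a nonnegative integer power."""
    result = [1]
    for _ in range(exponent):
        result = poly_mul(result, a)
    return result


def derivatives_at_zero(Q, p):
    """f^(j)(0) for j = 0..n, where f = v^((m-1)p) Q^p x^(p-1) / (p-1)!."""
    m = len(Q) - 1
    v = Q[-1]
    Qp = poly_pow(Q, p)
    scale = v ** ((m - 1) * p)
    n = p * m + p - 1
    values = []
    for j in range(n + 1):
        if j < p - 1:
            values.append(0)  # killed by the factor x^(p-1)
        else:
            # coefficient of x^j is scale * Qp[j-(p-1)] / (p-1)!; multiply by j!
            values.append(scale * Qp[j - (p - 1)] * (factorial(j) // factorial(p - 1)))
    return values


Q = [5, 3, 2]  # Q(x) = 2x^2 + 3x + 5, so u = 5, v = 2, m = 2
p = 7
u, v, m = Q[0], Q[-1], len(Q) - 1
values = derivatives_at_zero(Q, p)
print(values[p - 1], sum(values) % p)
```

Since p = 7 exceeds both |u| and |v|, the total is not divisible by p, as the text asserts.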
Now consider the other sum in 7.6,
\[ c \sum_{i=1}^{m} \sum_{j=0}^{n} f^{(j)}(\beta_i) \]
Using the formula in 7.7 it follows that for j < p, f (j) (β i ) = 0. This is because for such
derivatives, each term will have that product of the (x − β i ) in it. Next consider the case
where j ≥ p. In this case, the nonzero terms must involve at least p derivatives of the
expression
\[ \left( (x - \beta_1)(x - \beta_2) \cdots (x - \beta_m) \right)^p \]
since otherwise, when evaluated at any β k the result would be 0. Hence the (p − 1)! will
vanish from the denominator and so all coefficients of the polynomials in the β j and x will
be integers and in fact, there will be an extra factor of p left over. Thus the $j^{th}$ derivatives
for $j \ge p$ involve taking the $k^{th}$ derivative, $k \ge 0$ with respect to x of
\[ p v^{mp} g(x, \beta_1, \cdots, \beta_m) \]
where $g(x, \beta_1, \cdots, \beta_m)$ is a polynomial in x having coefficients which are integers times
symmetric polynomials in the $\{\beta_1, \cdots, \beta_m\}$. It follows that the $k^{th}$ derivative for $k \ge 0$
is also a polynomial in x having the same properties. Therefore, taking the $k^{th}$ derivative
where k corresponds to $j \ge p$ and adding, yields
\[ \sum_{i=1}^{m} p v^{mp} g_{,k}(\beta_i, \beta_1, \cdots, \beta_m) = \sum_{i=1}^{m} f^{(j)}(\beta_i) \tag{7.9} \]
where g,k denotes the k th derivative of g taken with respect to x. Now
\[ \sum_{i=1}^{m} g_{,k}(\beta_i, \beta_1, \cdots, \beta_m) \]
is a symmetric polynomial in the $\{\beta_1, \cdots, \beta_m\}$ with no term having degree more than mp
and¹ so by Corollary G.1.3 this is of the form
\[ \sum_{i=1}^{m} g_{,k}(\beta_i, \beta_1, \cdots, \beta_m) = \sum_{k_1, \cdots, k_m} a_{k_1 \cdots k_m}\, p_1^{k_1} \cdots p_m^{k_m} \]
where the ak1 ···km are integers and the pk are the elementary symmetric polynomials in
{β 1 , · · · , β m }. Recall these were roots of vxm + · · · + u and so from the definition of the
elementary symmetric polynomials given in Definition G.1.2, these pk are each an integer
divided by v, the integers being the coefficients of Q (x). Therefore, from 7.9
\[ \sum_{i=1}^{m} f^{(j)}(\beta_i) = p v^{mp} \sum_{i=1}^{m} g_{,k}(\beta_i, \beta_1, \cdots, \beta_m) = p v^{mp} \sum_{k_1, \cdots, k_m} a_{k_1 \cdots k_m}\, p_1^{k_1} \cdots p_m^{k_m} \]
which is pv mp times an expression which consists of integers times products of coefficients
of Q (x) divided by v raised to various powers, the sum of which is always no more than
mp. Therefore, it reduces to an integer multiple of p and so the same is true of
\[ c \sum_{i=1}^{m} \sum_{j=0}^{n} f^{(j)}(\beta_i) \]
which just involves adding up these integer multiples of p. Therefore, 7.6 is of the form
\[ K v^{p(m-1)} u^p + M(p)\, p \]
for some integer M (p). Summarizing, it follows from 7.5 that
\[ c \sum_{i=1}^{m} I(\beta_i) = \left( K + c \sum_{i=1}^{m} e^{\beta_i} \right) \sum_{j=0}^{n} f^{(j)}(0) - \left( K v^{p(m-1)} u^p + M(p)\, p \right) \]
where the left side is very small whenever p is large enough. Let p be larger than $\max(|K|, |v|, |u|)$.
Since p is prime, it follows it cannot divide $K v^{p(m-1)} u^p$ and so the last two terms must sum
to a nonzero integer and so the equation cannot hold unless
\[ K + c \sum_{i=1}^{m} e^{\beta_i} \ne 0 \]

¹ Note the claim about this being a symmetric polynomial is about the sum, not an individual term.

Note this shows π is irrational. If π = k/m where k, m are integers, then both iπ and
−iπ are roots of the polynomial with integer coefficients,
\[ m^2 x^2 + k^2 \]
which would require from what was just shown that
\[ 0 \ne 2 + e^{i\pi} + e^{-i\pi} \]
which is not the case since the sum on the right equals 0.
The following corollary follows from this.
Corollary G.3.3 Let K and $c_i$ for $i = 1, \cdots, n$ be nonzero integers. For each k between 1
and n let $\{\beta(k)_i\}_{i=1}^{m(k)}$ be the roots of a polynomial with integer coefficients,
\[ Q_k(x) \equiv v_k x^{m_k} + \cdots + u_k \]
where $v_k, u_k \ne 0$. Then
\[ K + c_1 \left( \sum_{j=1}^{m_1} e^{\beta(1)_j} \right) + c_2 \left( \sum_{j=1}^{m_2} e^{\beta(2)_j} \right) + \cdots + c_n \left( \sum_{j=1}^{m_n} e^{\beta(n)_j} \right) \ne 0. \]
Proof: Defining $f_k(x)$ and $I_k(s)$ as in Lemma G.3.2, it follows from Lemma G.3.2 that
for each $k = 1, \cdots, n$,
\[ c_k \sum_{i=1}^{m_k} I_k(\beta(k)_i) = \left( K_k + c_k \sum_{i=1}^{m_k} e^{\beta(k)_i} \right) \sum_{j=0}^{\deg(f_k)} f_k^{(j)}(0) - K_k \sum_{j=0}^{\deg(f_k)} f_k^{(j)}(0) - c_k \sum_{i=1}^{m_k} \sum_{j=0}^{\deg(f_k)} f_k^{(j)}(\beta(k)_i) \]
This is exactly the same computation as in the beginning of that lemma except one adds
and subtracts $K_k \sum_{j=0}^{\deg(f_k)} f_k^{(j)}(0)$ rather than $K \sum_{j=0}^{\deg(f_k)} f_k^{(j)}(0)$ where the $K_k$ are chosen
such that their sum equals K. By Lemma G.3.2,
\[ c_k \sum_{i=1}^{m_k} I_k(\beta(k)_i) = \left( K_k + c_k \sum_{i=1}^{m_k} e^{\beta(k)_i} \right) \left( v_k^{(m_k-1)p} u_k^p + N_k p \right) - K_k \left( v_k^{(m_k-1)p} u_k^p + N_k p \right) - c_k N_k' p \]
and so
\[ c_k \sum_{i=1}^{m_k} I_k(\beta(k)_i) = \left( K_k + c_k \sum_{i=1}^{m_k} e^{\beta(k)_i} \right) \left( v_k^{(m_k-1)p} u_k^p + N_k p \right) - K_k v_k^{(m_k-1)p} u_k^p + M_k p \]
for some integer $M_k$. By multiplying each $Q_k(x)$ by a suitable constant, it can be assumed
without loss of generality that all the $v_k^{m_k-1} u_k$ are equal to a constant integer U. Then the
above equals
\[ c_k \sum_{i=1}^{m_k} I_k(\beta(k)_i) = \left( K_k + c_k \sum_{i=1}^{m_k} e^{\beta(k)_i} \right) \left( U^p + N_k p \right) - K_k U^p + M_k p \]
Adding these for all k gives
\[ \sum_{k=1}^{n} c_k \sum_{i=1}^{m_k} I_k(\beta(k)_i) = U^p \left( K + \sum_{k=1}^{n} c_k \sum_{i=1}^{m_k} e^{\beta(k)_i} \right) - K U^p + M p + \sum_{k=1}^{n} N_k p \left( K_k + c_k \sum_{i=1}^{m_k} e^{\beta(k)_i} \right) \tag{7.10} \]
For large p it follows from Lemma G.3.2 that the left side is very small. If
\[ K + \sum_{k=1}^{n} c_k \sum_{i=1}^{m_k} e^{\beta(k)_i} = 0 \]
then $\sum_{k=1}^{n} c_k \sum_{i=1}^{m_k} e^{\beta(k)_i}$ is an integer and so the last term in 7.10 is an integer times p.
Thus for large p it reduces to
\[ \text{small number} = -K U^p + I p \]
where I is an integer. Picking prime $p > \max(|U|, |K|)$ it follows $-K U^p + I p$ is a nonzero
integer and this contradicts the left side being a small number less than 1 in absolute value.

Next is an even more interesting Lemma which follows from the above corollary.
Next is an even more interesting Lemma which follows from the above corollary.
Lemma G.3.4 If $b_0, b_1, \cdots, b_n$ are nonzero integers, and $\gamma_0, \gamma_1, \cdots, \gamma_n$ are distinct algebraic
numbers, then
\[ b_0 e^{\gamma_0} + b_1 e^{\gamma_1} + \cdots + b_n e^{\gamma_n} \ne 0 \]
Proof: Assume
\[ b_0 e^{\gamma_0} + b_1 e^{\gamma_1} + \cdots + b_n e^{\gamma_n} = 0 \tag{7.11} \]
Divide by $e^{\gamma_0}$ and letting $K = b_0$,
\[ K + b_1 e^{\alpha(1)} + \cdots + b_n e^{\alpha(n)} = 0 \tag{7.12} \]
where $\alpha(k) = \gamma_k - \gamma_0$. These are still distinct algebraic numbers none of which is 0 thanks
to Theorem 8.3.32. Therefore, $\alpha(k)$ is a root of a polynomial
\[ v_k x^{m_k} + \cdots + u_k \tag{7.13} \]
having integer coefficients, $v_k, u_k \ne 0$. Recall algebraic numbers were defined as roots of
polynomial equations having rational coefficients. Just multiply by the denominators to get
one with integer coefficients. Let the roots of this polynomial equation be
\[ \left\{ \alpha(k)_1, \cdots, \alpha(k)_{m_k} \right\} \]
and suppose they are listed in such a way that $\alpha(k)_1 = \alpha(k)$. Letting $i_k$ be an integer in
$\{1, \cdots, m_k\}$ it follows from the assumption 7.11 that
\[ \prod_{\substack{(i_1, \cdots, i_n) \\ i_k \in \{1, \cdots, m_k\}}} \left( K + b_1 e^{\alpha(1)_{i_1}} + b_2 e^{\alpha(2)_{i_2}} + \cdots + b_n e^{\alpha(n)_{i_n}} \right) = 0 \tag{7.14} \]
This is because one of the factors is the one occurring in 7.12 when ik = 1 for every k. The
product is taken over all distinct ordered lists (i1 , · · · , in ) where ik is as indicated. Expand
this possibly huge product. This will yield something like the following.
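\[ K' + c_1 \left( e^{\beta(1)_1} + \cdots + e^{\beta(1)_{\mu(1)}} \right) + c_2 \left( e^{\beta(2)_1} + \cdots + e^{\beta(2)_{\mu(2)}} \right) + \cdots + c_N \left( e^{\beta(N)_1} + \cdots + e^{\beta(N)_{\mu(N)}} \right) = 0 \tag{7.15} \]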
These integers cj come from products of the bi and K. The β (i)j are the distinct exponents
which result. Note that a typical term in this product 7.14 would be something like
\[ \overbrace{K^{p+1} b_{k_1} \cdots b_{k_{n-p}}}^{\text{integer}}\; e^{\overbrace{\alpha(k_1)_{i_1} + \alpha(k_2)_{i_2} + \cdots + \alpha(k_{n-p})_{i_{n-p}}}^{\beta(j)_r}} \]
the $k_j$ possibly not distinct and each $i_k \in \{1, \cdots, m_{i_k}\}$. Other terms of this sort are
\[ K^{p+1} b_{k_1} \cdots b_{k_{n-p}}\, e^{\alpha(k_1)_{i_1'} + \alpha(k_2)_{i_2'} + \cdots + \alpha(k_{n-p})_{i_{n-p}'}}, \qquad K^{p+1} b_{k_1} \cdots b_{k_{n-p}}\, e^{\alpha(k_1)_1 + \alpha(k_2)_1 + \cdots + \alpha(k_{n-p})_1} \]
where each $i_k'$ is another index in $\{1, \cdots, m_{i_k}\}$ and so forth. A given j in the sum of 7.15
corresponds to such a choice of $\{b_{k_1}, \cdots, b_{k_{n-p}}\}$ which leads to $K^{p+1} b_{k_1} \cdots b_{k_{n-p}}$ times a
sum of exponentials like those just described. Since the product in 7.14 is taken over all
choices ik ∈ {1, · · · , mk } , it follows that if you switch α (r)i and α (r)j , two of the roots of
the polynomial
vr xmr + · · · + ur
mentioned above, the result in 7.15 would be the same except for permuting the
β (s)1 , β (s)2 , · · · , β (s)µ(s) .
Thus a symmetric polynomial in
β (s)1 , β (s)2 , · · · , β (s)µ(s)
is also a symmetric polynomial in the α (k)1 , α (k)2 , · · · , α (k)mk for each k. Thus for a
given r, β (r)1 , · · · , β (r)µ(r) are roots of the polynomial
\[ (x - \beta(r)_1)(x - \beta(r)_2) \cdots \left( x - \beta(r)_{\mu(r)} \right) \]
whose coefficients are symmetric polynomials in the β (r)j which is a symmetric polynomial
in the α (k)j , j = 1, · · · , mk for each k. Letting g be one of these symmetric polynomials
and writing it in terms of the α (k)i you would have
\[ \sum_{l_1, \cdots, l_{m_n}} A_{l_1 \cdots l_{m_n}}\, \alpha(n)_1^{l_1} \alpha(n)_2^{l_2} \cdots \alpha(n)_{m_n}^{l_{m_n}} \]
where $A_{l_1 \cdots l_{m_n}}$ is a symmetric polynomial in $\alpha(k)_j$, $j = 1, \cdots, m_k$ for each $k \le n - 1$. These
coefficients are in the field (Proposition 8.3.31) $\mathbb{Q}[A(1), \cdots, A(n-1)]$ where A (k) denotes
\[ \left\{ \alpha(k)_1, \cdots, \alpha(k)_{m_k} \right\} \]
and so from Proposition G.1.3, the above symmetric polynomial is of the form
\[ \sum_{(k_1 \cdots k_{m_n})} B_{k_1 \cdots k_{m_n}}\, p_1^{k_1} \left( \alpha(n)_1, \cdots, \alpha(n)_{m_n} \right) \cdots p_{m_n}^{k_{m_n}} \left( \alpha(n)_1, \cdots, \alpha(n)_{m_n} \right) \]
where $B_{k_1 \cdots k_{m_n}}$ is a symmetric polynomial in $\alpha(k)_j$, $j = 1, \cdots, m_k$ for each $k \le n - 1$. Now
do for each $B_{k_1 \cdots k_{m_n}}$ what was just done for g featuring this time
\[ \left\{ \alpha(n-1)_1, \cdots, \alpha(n-1)_{m_{n-1}} \right\} \]
and continuing this way, it must be the case that eventually you have a sum of integer
multiples of products of elementary symmetric polynomials in α (k)j , j = 1, · · · , mk for
each k ≤ n. By Theorem G.1.4, these are each rational numbers. Therefore, each such g is
a rational number and so the $\beta(r)_j$ are algebraic. Now 7.15 contradicts Corollary G.3.3.

Note this lemma is sufficient to prove Lindemann's theorem that π is transcendental.
Here is why. If π is algebraic, then so is iπ and so from this lemma, $e^0 + e^{i\pi} \ne 0$ but this is
not the case because $e^{i\pi} = -1$.
The next theorem is the main result, the Lindemann Weierstrass theorem.
Theorem G.3.5 Suppose $a(1), \cdots, a(n)$ are nonzero algebraic numbers and suppose
\[ \alpha(1), \cdots, \alpha(n) \]
are distinct algebraic numbers. Then
\[ a(1) e^{\alpha(1)} + a(2) e^{\alpha(2)} + \cdots + a(n) e^{\alpha(n)} \ne 0 \]
Proof: Suppose $a(j) \equiv a(j)_1$ is a root of the polynomial
\[ v_j x^{m_j} + \cdots + u_j \]
where $v_j, u_j \ne 0$. Let the roots of this polynomial be $a(j)_1, \cdots, a(j)_{m_j}$. Suppose to the
contrary that
\[ a(1)_1 e^{\alpha(1)} + a(2)_1 e^{\alpha(2)} + \cdots + a(n)_1 e^{\alpha(n)} = 0 \]
Then consider the big product
\[ \prod_{\substack{(i_1, \cdots, i_n) \\ i_k \in \{1, \cdots, m_k\}}} \left( a(1)_{i_1} e^{\alpha(1)} + a(2)_{i_2} e^{\alpha(2)} + \cdots + a(n)_{i_n} e^{\alpha(n)} \right) \tag{7.16} \]
the product taken over all ordered lists $(i_1, \cdots, i_n)$. This product equals
\[ 0 = b_1 e^{\beta(1)} + b_2 e^{\beta(2)} + \cdots + b_N e^{\beta(N)} \tag{7.17} \]
where the $\beta(j)$ are the distinct exponents which result. The $\beta(i)$ are clearly algebraic
because they are sums of the $\alpha(i)$. Since the product in 7.16 is taken for all ordered lists
as described above, it follows that for a given k, if $a(k)_i$ is switched with $a(k)_j$, that is, two
of the roots of $v_k x^{m_k} + \cdots + u_k$ are switched, then the product is unchanged and so 7.17
is also unchanged. Thus each $b_k$ is a symmetric polynomial in the $a(k)_j$, $j = 1, \cdots, m_k$ for
each k. It follows
\[ b_k = \sum_{(j_1, \cdots, j_{m_n})} A_{j_1, \cdots, j_{m_n}}\, a(n)_1^{j_1} \cdots a(n)_{m_n}^{j_{m_n}} \]
and this is symmetric in the $\left\{ a(n)_1, \cdots, a(n)_{m_n} \right\}$, the coefficients $A_{j_1, \cdots, j_{m_n}}$ being in the
field (Proposition 8.3.31) $\mathbb{Q}[A(1), \cdots, A(n-1)]$ where A (k) denotes
\[ a(k)_1, \cdots, a(k)_{m_k} \]
and so from Proposition G.1.3,
\[ b_k = \sum_{(j_1, \cdots, j_{m_n})} B_{j_1, \cdots, j_{m_n}}\, p_1^{j_1} \left( a(n)_1 \cdots a(n)_{m_n} \right) \cdots p_{m_n}^{j_{m_n}} \left( a(n)_1 \cdots a(n)_{m_n} \right) \]
where the $B_{j_1, \cdots, j_{m_n}}$ are symmetric in $\left\{ a(k)_j \right\}_{j=1}^{m_k}$ for each $k \le n - 1$. Now doing to
$B_{j_1, \cdots, j_{m_n}}$ what was just done to $b_k$ and continuing this way, it follows $b_k$ is a finite sum of
integers times elementary polynomials in the various $\left\{ a(k)_j \right\}_{j=1}^{m_k}$ for $k \le n$. By Theorem
G.1.4 this is a rational number. Thus $b_k$ is a rational number. Multiplying by the product
of all the denominators, it follows there exist integers $c_i$ such that
\[ 0 = c_1 e^{\beta(1)} + c_2 e^{\beta(2)} + \cdots + c_N e^{\beta(N)} \]
which contradicts Lemma G.3.4.

This theorem is sufficient to show e is transcendental. If it were algebraic, then
\[ e e^{-1} + (-1) e^0 \ne 0 \]
but this is not the case. If a is algebraic with $a \ne 0$ and $a \ne 1$, then ln (a) is transcendental. To see this, note
that
\[ 1 e^{\ln(a)} + (-1)\, a e^0 = 0 \]
which cannot happen according to the above theorem if ln (a) is algebraic. If a is algebraic and $\sin(a) \ne 0$, then
sin (a) is transcendental because
\[ \frac{1}{2i} e^{ia} - \frac{1}{2i} e^{-ia} + (-1) \sin(a)\, e^0 = 0 \]
which cannot occur if sin (a) is algebraic. There are doubtless other examples of numbers
which are transcendental by this amazing theorem.
G.4 More On Algebraic Field Extensions
The next few sections have to do with fields and field extensions. There are many linear
algebra techniques which are used in this discussion and it seems to me to be very interesting.
However, this is definitely far removed from my own expertise so there may be some parts of
this which are not too good. I am following various algebra books in putting this together.
Consider the notion of splitting fields. It is desired to show that any two are isomorphic,
meaning that there exists a one to one and onto mapping from one to the other which
preserves all the algebraic structure. To begin with, here is a theorem about extending
homomorphisms. [18]
Definition G.4.1 Suppose $F, \bar{F}$ are two fields and that $f : F \to \bar{F}$ is a homomorphism. This
means that
\[ f(xy) = f(x) f(y), \quad f(x + y) = f(x) + f(y) \]
An isomorphism is a homomorphism which is one to one and onto. A monomorphism is
a homomorphism which is one to one. An automorphism is an isomorphism of a single
field. Sometimes people use the symbol ≃ to indicate something is an isomorphism. Then if
$p(x) \in F[x]$, say
\[ p(x) = \sum_{k=0}^{n} a_k x^k, \]
$\bar{p}(x)$ will be the polynomial in $\bar{F}[x]$ defined as
\[ \bar{p}(x) \equiv \sum_{k=0}^{n} f(a_k)\, x^k. \]
Also consider f as a homomorphism of $F[x]$ and $\bar{F}[x]$ in the obvious way.
\[ f(p(x)) = \bar{p}(x) \]
The following is a nice theorem which will be useful.
Theorem G.4.2 Let F be a field and let r be algebraic over F. Let p (x) be the minimal
polynomial of r. Thus p (r) = 0 and p (x) is monic and no nonzero polynomial having
coefficients in F of smaller degree has r as a root. In particular, p (x) is irreducible over F.
Then define $f : F[x] \to F[r]$, the polynomials in r, by
\[ f\left( \sum_{i=0}^{m} a_i x^i \right) \equiv \sum_{i=0}^{m} a_i r^i \]
Then f is a homomorphism. Also, defining $g : F[x]/(p(x)) \to F[r]$ by
g ([q (x)]) ≡ f (q (x)) ≡ q (r)
it follows that g is an isomorphism from the field F [x] / (p (x)) to F [r] .
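To see the statement in a concrete case, here is a minimal sketch for F = Q, r = √2 and p (x) = x² − 2. These choices are my own, as are the helper names. Classes in F [x] / (p (x)) are represented by remainders with exact rational coefficients, and g is approximated by floating point evaluation at r.

```python
from fractions import Fraction
import math


def poly_mul(a, b):
    """Multiply polynomials; index i holds the coefficient of x**i."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def reduce_mod(q, p):
    """Remainder of q on division by the monic polynomial p."""
    q = list(q)
    d = len(p) - 1
    while len(q) - 1 >= d:
        lead = q[-1]
        shift = len(q) - 1 - d
        for i, c in enumerate(p):
            q[shift + i] -= lead * c
        q.pop()  # the leading coefficient is now zero
    return q


def g(q, r):
    """Evaluate the representative q at r (floating point)."""
    return sum(float(c) * r**i for i, c in enumerate(q))


p = [Fraction(-2), Fraction(0), Fraction(1)]   # x^2 - 2, minimal polynomial of sqrt(2)
a = [Fraction(1), Fraction(3)]                 # class of 1 + 3x
b = [Fraction(1, 2), Fraction(-1)]             # class of 1/2 - x
product_class = reduce_mod(poly_mul(a, b), p)  # product in F[x]/(p(x))
r = math.sqrt(2.0)
print(product_class, g(product_class, r), g(a, r) * g(b, r))
```

The product class is −11/2 + x/2, and evaluating it at √2 agrees with multiplying the two values directly, which is the multiplicative part of g being a homomorphism.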
Proof: First of all, consider why f is a homomorphism. The preservation of sums is
obvious. Consider products.
\[ f\left( \sum_i a_i x^i \sum_j b_j x^j \right) = f\left( \sum_{i,j} a_i b_j x^{i+j} \right) = \sum_{i,j} a_i b_j r^{i+j} \]
\[ = \sum_i a_i r^i \sum_j b_j r^j = f\left( \sum_i a_i x^i \right) f\left( \sum_j b_j x^j \right) \]
Thus it is clear that f is a homomorphism.
First consider why g is even well defined. If [q (x)] = [q1 (x)] , this means that
q1 (x) − q (x) = p (x) l (x)
for some l (x) ∈ F [x]. Therefore,
f (q1 (x))
= f (q (x)) + f (p (x) l (x))
= f (q (x)) + f (p (x)) f (l (x))
≡ q (r) + p (r) l (r) = q (r) = f (q (x))
Now from this, it is obvious that g is a homomorphism.
\[ g([q(x)][q_1(x)]) = g([q(x)\, q_1(x)]) = f(q(x)\, q_1(x)) = q(r)\, q_1(r) \]
\[ g([q(x)])\, g([q_1(x)]) \equiv q(r)\, q_1(r) \]
Similarly, g preserves sums. Now why is g one to one? It suffices to show that if g ([q (x)]) = 0,
then [q (x)] = 0. Suppose then that
g ([q (x)]) ≡ q (r) = 0
Then
q (x) = p (x) l (x) + ρ (x)
where the degree of ρ (x) is less than the degree of p (x) or else ρ (x) = 0. If ρ (x) ̸= 0, then
it follows that
ρ (r) = 0
and ρ (x) has smaller degree than that of p (x) which contradicts the definition of p (x) as the
minimal polynomial of r. Hence ρ (x) = 0, so [q (x)] = 0 and g is one to one. Since p (x) is irreducible, F [x] / (p (x)) is a field. It is clear that g
is onto. Therefore, F [r] is a field also. (This was shown earlier by different reasoning.)

Here is a diagram of what the following theorem says.

Extending f to g
\[
\begin{array}{ccc}
F & \xrightarrow{\;f,\ \simeq\;} & \bar{F} \\
p(x) \in F[x] & \xrightarrow{\;f\;} & \bar{p}(x) \in \bar{F}[x] \\
p(x) = \sum_{k=0}^{n} a_k x^k & \to & \sum_{k=0}^{n} f(a_k)\, x^k = \bar{p}(x) \\
p(r) = 0 & & \bar{p}(\bar{r}) = 0 \\
F[r] & \xrightarrow{\;g,\ \simeq\;} & \bar{F}[\bar{r}] \\
r & \xrightarrow{\;g\;} & \bar{r}
\end{array}
\]
One such g for each $\bar{r}$
Theorem G.4.3 Let $f : F \to \bar{F}$ be an isomorphism of the two fields. Let r be algebraic
over F with minimal polynomial p (x) and suppose there exists $\bar{r}$ algebraic over $\bar{F}$ such that
$\bar{p}(\bar{r}) = 0$. Then there exists an isomorphism $g : F[r] \to \bar{F}[\bar{r}]$ which agrees with f on F. If
$g : F[r] \to \bar{F}[\bar{r}]$ is an isomorphism which agrees with f on F and if $\alpha([k(x)]) \equiv k(r)$ is the
homomorphism mapping F [x] / (p (x)) to F [r], then there must exist $\bar{r}$ such that $\bar{p}(\bar{r}) = 0$
and $g = \beta\alpha^{-1}$ where β
\[ \beta : F[x]/(p(x)) \to \bar{F}[\bar{r}] \]
is given by $\beta([k(x)]) = \bar{k}(\bar{r})$. In particular, $g(r) = \bar{r}$.
Proof: From Theorem G.4.2, there exists α, an isomorphism in the following picture,
$\alpha([k(x)]) = k(r)$.
\[ F[r] \xleftarrow{\;\alpha\;} F[x]/(p(x)) \xrightarrow{\;\beta\;} \bar{F}[\bar{r}] \]
where $\beta([k(x)]) \equiv \bar{k}(\bar{r})$. ($\bar{k}(x)$ comes from f as described in the above definition.) This β
is a well defined monomorphism because of the assumption that $\bar{p}(\bar{r}) = 0$. This needs to be
verified. Assume then that it is so. Then just let $g = \beta\alpha^{-1}$.
Why is β well defined? Suppose $[k(x)] = [k'(x)]$ so that $k(x) - k'(x) = l(x)\, p(x)$. Then
since f is a homomorphism,
\[ \bar{k}(x) - \bar{k}'(x) = \bar{l}(x)\, \bar{p}(x), \quad \bar{k}(\bar{r}) - \bar{k}'(\bar{r}) = \bar{l}(\bar{r})\, \bar{p}(\bar{r}) = 0 \]
so β is indeed well defined. It is clear from the definition that β is a homomorphism. Suppose
$\beta([k(x)]) = 0$. Does it follow that $[k(x)] = 0$? By assumption, $\bar{k}(\bar{r}) = 0$ and also,
\[ \bar{k}(x) = \bar{p}(x)\, \bar{l}(x) + \bar{\rho}(x) \]
where the degree of $\bar{\rho}(x)$ is less than the degree of $\bar{p}(x)$ or else it equals 0. But then, since
f is an isomorphism,
\[ k(x) = p(x)\, l(x) + \rho(x) \]
where the degree of ρ (x) is less than the degree of p (x). However, the above shows that
$\bar{\rho}(\bar{r}) = 0$. Since f is an isomorphism, $\bar{p}(x)$ is monic and irreducible, hence the minimal polynomial of $\bar{r}$, so $\bar{\rho}(x) = 0$. Hence ρ (x) = 0 and this implies
that $[k(x)] = 0$. Thus β is one to one and a homomorphism. Hence $g = \beta\alpha^{-1}$ works if it
is also onto. However, it is clear that $\alpha^{-1}$ is onto and that β is onto. Hence the desired
extension exists.
Now suppose such an isomorphism g exists. Then $\bar{r}$ must equal g (r) and
\[ 0 = g(p(r)) = \bar{p}(g(r)) = \bar{p}(\bar{r}) \]
Hence, β can be defined as above as $\beta([k(x)]) \equiv \bar{k}(\bar{r})$ relative to this $\bar{r} \equiv g(r)$ and
\[ \beta\alpha^{-1}(k(r)) \equiv \beta([k(x)]) \equiv \bar{k}(g(r)) = g(k(r)) \]
What is the meaning of the above in simple terms? It says that the monomorphisms
from F [r] to a field $\bar{K}$ containing $\bar{F}$ correspond to the roots of $\bar{p}(x)$ in $\bar{K}$. That is, for each
root of $\bar{p}(x)$, there is a monomorphism and for each monomorphism, there is a root. Also,
for each root $\bar{r}$ of $\bar{p}(x)$ in $\bar{K}$, there is an isomorphism from F [r] to $\bar{F}[\bar{r}]$.
Note that if p (x) is a monic irreducible polynomial, then it is the minimal polynomial
for each of its roots. This is the situation which is about to be considered. It involves the
splitting fields $K, \bar{K}$ of $p(x), \bar{p}(x)$ where η is an isomorphism of F and $\bar{F}$ as described above.
See [18]. Here is a little diagram which describes what this theorem says.
Definition G.4.4 The symbol [K : F] where K is a field extension of F means the dimension
of the vector space K with field of scalars F.
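\[
\begin{array}{ccc}
F & \xrightarrow{\;\eta,\ \simeq\;} & \bar{F} \\
p(x) & \text{``}\eta p(x) = \bar{p}(x)\text{''} & \bar{p}(x) \\
F[r_1, \cdots, r_n] & \xrightarrow{\;\zeta_i,\ \simeq\;} & \bar{F}[\bar{r}_1, \cdots, \bar{r}_n]
\end{array}
\qquad i = 1, \cdots, m, \quad
\begin{cases} m \le [K : F] \\ m = [K : F], & \bar{r}_i \ne \bar{r}_j \end{cases}
\]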
Theorem G.4.5 Let η be an isomorphism from F to $\bar{F}$ and let $K = F[r_1, \cdots, r_n]$, $\bar{K} = \bar{F}[\bar{r}_1, \cdots, \bar{r}_n]$ be splitting fields of p (x) and $\bar{p}(x)$ respectively. Then there exist at most
[K : F] isomorphisms $\zeta_i : K \to \bar{K}$ which extend η. If $\{\bar{r}_1, \cdots, \bar{r}_n\}$ are distinct, then there
exist exactly [K : F] isomorphisms of the above sort. In either case, the two splitting fields
are isomorphic with any of these $\zeta_i$ serving as an isomorphism.
Proof: Suppose [K : F] = 1. Say a basis for K is {r} . Then {1, r} is dependent and so
there exist a, b ∈ F, not both zero such that a + br = 0. Then it follows that r ∈ F and so in
this case F = K. Then the isomorphism which extends η is just η itself and there is exactly
1 isomorphism.
Next suppose [K : F] > 1. Then p (x) has an irreducible factor q (x) over F of degree larger
than 1. If not, you would have
$$p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$$
and it would factor as
$$= (x - r_1)\cdots(x - r_n)$$
with each $r_j \in F$, so F = K contrary to [K : F] > 1. Without loss of generality, let the roots
of q (x) in K be $\{r_1,\cdots,r_m\}$. Thus
$$q(x) = \prod_{i=1}^{m}(x - r_i), \quad p(x) = \prod_{i=1}^{n}(x - r_i)$$
Now $\bar q(x)$, defined analogously to $\bar p(x)$, also has degree at least 2. Furthermore, it divides
$\bar p(x)$ all of whose roots are in $\bar K$. Denote the roots of $\bar q(x)$ in $\bar K$ as $\{\bar r_1,\cdots,\bar r_m\}$ where they
are counted according to multiplicity.
Then from Theorem G.4.3, there exist k ≤ m one to one homomorphisms $\zeta_i$ mapping
$F[r_1]$ to $\bar K$, one for each distinct root of $\bar q(x)$ in $\bar K$. If the roots of $\bar p(x)$ are distinct, then
this is sufficient to imply that the roots of $\bar q(x)$ are also distinct, and k = m. Otherwise,
maybe k < m. (It is conceivable that $\bar q(x)$ might have repeated roots in $\bar K$.) Then
[K : F] = [K : F [r1 ]] [F [r1 ] : F]
and since the degree of q (x) > 1 and q (x) is irreducible, this shows that [F [r1 ] : F] = m > 1
and so
[K : F [r1 ]] < [K : F]
Therefore, by induction, each of these $k \le m = [F[r_1]:F]$ one to one homomorphisms
extends to an isomorphism from K to $\bar K$ and for each of these $\zeta_i$, there are no more than
$[K:F[r_1]]$ of these isomorphisms extending $\zeta_i$. If the roots of $\bar p(x)$ are distinct, then there
are exactly m of these $\zeta_i$ and for each, there are $[K:F[r_1]]$ extensions. Therefore, if the
roots of $\bar p(x)$ are distinct, this has identified
$$[K:F[r_1]]\,m = [K:F[r_1]]\,[F[r_1]:F] = [K:F]$$
isomorphisms of K to $\bar K$ which agree with η on F. If the roots of $\bar p(x)$ are not distinct, then
maybe there are fewer than [K : F] extensions of η.
Is this all of them? Suppose ζ is such an isomorphism of K and $\bar K$. Then consider its
restriction to $F[r_1]$. By Theorem G.4.3, this restriction must coincide with one of the $\zeta_i$
chosen earlier. Then by induction, ζ is one of the extensions of the $\zeta_i$ just mentioned.
Definition G.4.6 Let K be a finite dimensional extension of a field F such that every
element of K is algebraic over F, that is, each element of K is a root of some polynomial
in F [x]. Then K is called a normal extension if for every k ∈ K all roots of the minimal
polynomial of k are contained in K.
So what are some ways to tell a field is a normal extension? It turns out that if K is a
splitting field of f (x) ∈ F [x] , then K is a normal extension. I found this in [18]. This is an
amazing result.
Proposition G.4.7 Let K be a splitting field of f (x) ∈ F [x]. Then K is a normal extension. In fact, if L is an intermediate field between F and K, then L is also a normal
extension of F.
Proof: Let r ∈ K be a root of g (x), an irreducible monic polynomial in F [x]. It is
required to show that every other root of g (x) is in K. Let the roots of g (x) in a splitting
field be {r1 = r, r2 , · · · , rm }. Now g (x) is the minimal polynomial of rj over F because g (x)
is irreducible. Recall why this was. If p (x) is the minimal polynomial of rj ,
g (x) = p (x) l (x) + r (x)
where r (x) either is 0 or it has degree less than the degree of p (x) . However, r (rj ) = 0 and
this is impossible if p (x) is the minimal polynomial. Hence r (x) = 0 and now it follows
that g (x) was not irreducible unless l (x) = 1.
By Theorem G.4.3, there exists an isomorphism η of $F[r_1]$ and $F[r_j]$ which fixes F and
maps $r_1$ to $r_j$. Now $K[r_1]$ and $K[r_j]$ are splitting fields of f (x) over $F[r_1]$ and $F[r_j]$ respectively.
By Theorem G.4.5, the two fields $K[r_1]$ and $K[r_j]$ are isomorphic, the isomorphism
ζ extending η. Hence
$$[K[r_1]:K] = [K[r_j]:K]$$
But $r_1 \in K$ and so $K[r_1] = K$. Therefore, $K = K[r_j]$ and so $r_j$ is also in K. Thus all the
roots of g (x) are actually in K.
Consider the last assertion.
Suppose $r = r_1 \in L$ where the minimal polynomial for r is denoted by q (x). Let
the roots of q (x) in K be $\{r_1,\cdots,r_m\}$. By Theorem G.4.3 applied to the identity
map on L, there exists an isomorphism $\theta : L[r_1] \to L[r_j]$ which fixes L and takes $r_1$ to $r_j$.
But this implies that
$$1 = [L[r_1]:L] = [L[r_j]:L]$$
Hence $r_j \in L$ also. Since r was an arbitrary element of L, this shows that L is normal.
Definition G.4.8 When you have $F[a_1,\cdots,a_m]$ with each $a_i$ algebraic so that $F[a_1,\cdots,a_m]$
is a field, you could consider
$$f(x) \equiv \prod_{i=1}^{m} f_i(x)$$
where fi (x) is the minimal polynomial of ai . Then if K is a splitting field for f (x) , this K
is called the normal closure. It is at least as large as F [a1 , · · · , am ] and it has the advantage
of being a normal extension.
G.5 The Galois Group
In the case where $F = \bar F$, the above suggests the following definition.
Definition G.5.1 When K is a splitting field for a polynomial p (x) having coefficients in
F, we say that K is a splitting field of p (x) over the field F. Let K be a splitting field of
p (x) over the field F. Then G (K, F) denotes the group of automorphisms of K which leave
F fixed. For a finite set S, denote by |S| the number of elements of S. More generally,
when K is a finite extension of L, denote by G (K, L) the group of automorphisms of K
which leave L fixed.
It is shown later that G (K, F) really is a group according to the strict definition of a
group. For right now, just regard it as a set of automorphisms which keeps F fixed. Theorem
G.4.5 implies the following important result.
Theorem G.5.2 Let K be a splitting field of p (x) over the field F. Then
|G (K, F)| ≤ [K : F]
When the roots of p (x) are distinct, equality holds in the above.
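Here is a small sketch in Python which checks the equality case of this theorem for one specific example, p (x) = (x² − 2)(x² − 3) over Q, whose splitting field K = Q[√2, √3] has [K : Q] = 4. The representation of elements of K as coordinate 4-tuples over the basis 1, √2, √3, √6 and the names mul, make_auto, swap are assumptions made only for this illustration. Since the maps considered are linear over Q and multiplication is bilinear, it is enough to test multiplicativity on the basis.

```python
from itertools import product

# An element a + b*sqrt2 + c*sqrt3 + d*sqrt6 of K is stored as the tuple (a, b, c, d).
def mul(x, y):
    """Product in K = Q[sqrt2, sqrt3] in the basis (1, sqrt2, sqrt3, sqrt6)."""
    a, b, c, d = x
    e, f, g, h = y
    return (a*e + 2*b*f + 3*c*g + 6*d*h,
            a*f + b*e + 3*c*h + 3*d*g,
            a*g + c*e + 2*b*h + 2*d*f,
            a*h + d*e + b*g + c*f)

def make_auto(s2, s3):
    """Candidate automorphism sending sqrt2 -> s2*sqrt2 and sqrt3 -> s3*sqrt3."""
    return lambda x: (x[0], s2 * x[1], s3 * x[2], s2 * s3 * x[3])

basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

def is_multiplicative(sigma):
    """Check sigma(uv) = sigma(u) sigma(v) on all pairs of basis elements."""
    return all(sigma(mul(u, v)) == mul(sigma(u), sigma(v))
               for u in basis for v in basis)

autos = [make_auto(s2, s3) for s2, s3 in product((1, -1), repeat=2)]
print(len(autos), all(is_multiplicative(s) for s in autos))

# A linear map which swaps sqrt2 and sqrt3 permutes roots of p(x) but is not
# multiplicative, because sqrt2 * sqrt2 = 2 while sqrt3 * sqrt3 = 3.
swap = lambda x: (x[0], x[2], x[1], x[3])
print(is_multiplicative(swap))
```

The four sign choices give four automorphisms fixing Q, which matches [K : Q] = 4. The last check shows that a rearrangement of the roots need not extend to an automorphism.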
So how large is |G (K, F)| in case p (x) is a polynomial of degree n which has n distinct
roots? Let p (x) be a monic polynomial with roots in K, {r1 , · · · , rn } and suppose that none
of the ri is in F. Thus
$$p(x) = x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_n = \prod_{k=1}^{n}(x - r_k), \quad a_i \in F$$
Thus K consists of all rational functions in the $r_1,\cdots,r_n$. Let σ be a mapping from
$\{r_1,\cdots,r_n\}$ to $\{r_1,\cdots,r_n\}$, say $r_j \to r_{i_j}$. In other words σ produces a permutation of
these roots. Consider the following way of trying to obtain something in G (K, F) from σ. If you
have a typical thing in K, you can try to obtain another thing in K by replacing each $r_j$ with $r_{i_j}$ in
a rational function, a quotient of two polynomials which have coefficients in F. When this
substitution is well defined, meaning that it gives the same answer for every rational expression
which represents the same element of K, the resulting map from K to K is an automorphism,
preserving the operations of multiplication and addition. Does it keep F fixed?
Of course. You don't change the coefficients of the polynomials in the rational function
which are always in F. Thus a permutation of the roots which respects all the relations among
the roots determines an automorphism of K. Now suppose σ is an automorphism of K which
fixes F. Does it determine a permutation of the roots?
$$0 = \sigma(p(r_i)) = p(\sigma(r_i))$$
and so $\sigma(r_i)$ is also a root, say $r_{i_j}$. Thus it is clear that each σ ∈ G (K, F) determines
a permutation of the roots, and since K is generated by the roots, σ is determined by this
permutation. It follows that |G (K, F)| is no larger than the number of permutations of
$\{r_1,\cdots,r_n\}$ which is n! and that there is a one to one correspondence between G (K, F)
and a set of permutations of the roots. Since the roots are distinct, Theorem G.5.2 gives
|G (K, F)| = [K : F], which can be smaller than n!. More will be done on
this later after discussing permutation groups.
This is a good time to make a very important observation about irreducible polynomials.
Lemma G.5.3 Suppose q (x) ≠ p (x) are both monic irreducible polynomials over a field F. Then
for K a field which contains all the roots of both polynomials, there is no root common to
both p (x) and q (x).
Proof: If l (x) is a monic polynomial which divides them both, then l (x) must equal
1. Otherwise, it would equal p (x) and q (x) which would require these two to be equal.
Thus p (x) and q (x) are relatively prime and there exist polynomials a (x) , b (x) having
coefficients in F such that
a (x) p (x) + b (x) q (x) = 1
Now if p (x) and q (x) share a root r, then (x − r) divides both sides of the above in K [x],
but this is impossible.
Now here is an important definition of a class of polynomials which yield equality in the
inequality of Theorem G.5.2.
Definition G.5.4 Let p (x) be a polynomial having coefficients in a field F. Also let K be
a splitting field. Then p (x) is separable if it is of the form
$$p(x) = \prod_{i=1}^{m} q_i(x)^{k_i}$$
where each $q_i(x)$ is irreducible over F and each $q_i(x)$ has distinct roots in K. From the
above lemma, no two $q_i(x)$ share a root. Thus
$$p_1(x) \equiv \prod_{i=1}^{m} q_i(x)$$
has distinct roots in K.
For example, consider the case where F = Q and the polynomial is of the form
$$(x^2 + 1)^2(x^2 - 2)^2 = x^8 - 2x^6 - 3x^4 + 4x^2 + 4$$
Then let K be the splitting field over Q, $Q[i, \sqrt{2}]$. The polynomials $x^2 + 1$ and $x^2 - 2$ are
irreducible over Q and each has distinct roots in K.
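The expansion in this example is easy to check by machine. The following Python sketch multiplies out the factors, with polynomials stored as lists of integer coefficients, lowest degree first. The helper name poly_mul and this list representation are choices made for the illustration only.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

q1 = [1, 0, 1]     # x^2 + 1
q2 = [-2, 0, 1]    # x^2 - 2

p1 = poly_mul(q1, q2)   # product of the distinct irreducible factors
p = poly_mul(p1, p1)    # (x^2 + 1)^2 (x^2 - 2)^2

print(p1)  # coefficients of x^4 - x^2 - 2
print(p)   # coefficients of x^8 - 2x^6 - 3x^4 + 4x^2 + 4
```

Here p1 plays the role of $p_1(x)$ in the definition, the product of the irreducible factors taken once each.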
This is also a convenient time to show that G (K, F) for K a finite extension of F really
is a group. First, here is the definition.
Definition G.5.5 A group G is a nonempty set with an operation, denoted here as · such
that the following axioms hold.
1. For α, β, γ ∈ G, (α · β) · γ = α · (β · γ) . We usually don’t bother to write the ·.
2. There exists ι ∈ G such that αι = ια = α
3. For every α ∈ G, there exists $\alpha^{-1} \in G$ such that $\alpha\alpha^{-1} = \alpha^{-1}\alpha = \iota$.
Then why is G ≡ G (K, F) , where K is a finite extension of F, a group? If you simply
look at the automorphisms of K then it is obvious that this is a group with the operation
being composition. Also, from Theorem G.4.5 |G (K, F)| is finite. Clearly ι ∈ G. It is
just the automorphism which takes everything to itself. The operation in this case is just
composition. Thus the associative law is obvious. What about the existence of the inverse?
Clearly, you can define the inverse of α, but does it fix F? If α = ι, then the inverse is clearly
ι. Otherwise, consider $\alpha, \alpha^2, \cdots$. Since |G (K, F)| is finite, eventually there is a repeat. Thus
$\alpha^m = \alpha^n$, n > m. Simply multiply on the left by $(\alpha^{-1})^m$ to get $\iota = \alpha^{n-m} = \alpha\,\alpha^{n-m-1}$. Hence $\alpha^{-1}$ is
a suitable power of α and so $\alpha^{-1}$ obviously leaves F fixed. Thus G (K, F) which has been
called a group all along, really is a group.
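The argument that the inverse is a power of α works in any finite group, and it can be watched in action on permutations. The following Python sketch is only an illustration under that assumption: permutations of {0, ..., n − 1} are stored as tuples, and the names compose and inverse_as_power are hypothetical helpers, not anything from the text.

```python
def compose(s, t):
    """Return the permutation s∘t (apply t first, then s); permutations are tuples."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse_as_power(alpha):
    """Find the inverse of alpha as a power alpha^e by composing alpha with itself."""
    identity = tuple(range(len(alpha)))
    power, exponent = identity, 0          # invariant: power == alpha^exponent
    while compose(alpha, power) != identity:
        power = compose(alpha, power)
        exponent += 1
    return power, exponent                 # alpha ∘ power == identity

alpha = (1, 2, 0, 4, 3)                    # a 3-cycle together with a transposition
inv, e = inverse_as_power(alpha)
print(inv, e)
```

The loop must stop because the powers of α eventually repeat, which is exactly the finiteness used in the text.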
Then the following corollary is the reason why separable polynomials are so important.
Also, one can show that if F contains a field which is isomorphic to Q then every polynomial
with coefficients in F is separable. This will be done later after presenting the big results.
This is equivalent to saying that the field has characteristic zero. In addition, the property
of being separable holds in other situations which are described later.
Corollary G.5.6 Let K be a splitting field of p (x) over the field F. Assume p (x) is
separable. Then
|G (K, F)| = [K : F]
Proof: Just note that K is also the splitting field of $p_1(x)$, the product of the distinct
irreducible factors and that from Lemma G.5.3, $p_1(x)$ has distinct roots. Thus the conclusion
follows from Theorem G.4.5.
What if L is an intermediate field between F and K? Then $p_1(x)$ still has coefficients in
L and distinct roots in K and so it also follows that
|G (K, L)| = [K : L]
Definition G.5.7 Let G be a group of automorphisms of a field K. Then denote by $K_G$ the
fixed field of G. Thus
$$K_G \equiv \{x \in K : \sigma(x) = x \text{ for all } \sigma \in G\}$$
Thus there are two new things, the fixed field of a group of automorphisms H denoted
by $K_H$ and the Galois group G (K, L). How are these related? First here is a simple lemma
which comes from the definitions.
Lemma G.5.8 Let K be an algebraic extension of L (each element of K is a root of some
polynomial in L [x]) for L, K fields. Then
$$G(K, L) = G\left(K, K_{G(K,L)}\right)$$
Proof: It is clear that $L \subseteq K_{G(K,L)}$ because if r ∈ L then by definition, everything in
G (K, L) fixes r and so r is in $K_{G(K,L)}$. Therefore,
$$G(K, L) \supseteq G\left(K, K_{G(K,L)}\right).$$
Now let σ ∈ G (K, L). Then it is one of the automorphisms of K which fixes everything in
the fixed field of G (K, L). Thus, by definition, $\sigma \in G\left(K, K_{G(K,L)}\right)$ and so the two are the
same.
Now the following says that you can start with L, go to the group G (K, L) and then to
the fixed field of this group and end up back where you started. More precisely,
Proposition G.5.9 If K is a splitting field of p (x) over the field F for separable p (x) ,
and if L is a field between K and F, then K is also a splitting field of p (x) over L and also
$$L = K_{G(K,L)}$$
Proof: By the above lemma, and Corollary G.5.6,
$$|G(K,L)| = [K:L] = \left[K:K_{G(K,L)}\right]\left[K_{G(K,L)}:L\right] = \left|G\left(K,K_{G(K,L)}\right)\right|\left[K_{G(K,L)}:L\right] = |G(K,L)|\left[K_{G(K,L)}:L\right]$$
which shows that $\left[K_{G(K,L)}:L\right] = 1$ and so, since $L \subseteq K_{G(K,L)}$, it follows that $L = K_{G(K,L)}$.
This has shown the following diagram in the context of K being a splitting field of a
separable polynomial over F and L being an intermediate field.
$$L \to G(K, L) \to K_{G(K,L)} = L$$
In particular, every intermediate field is a fixed field of a subgroup of G (K, F). Is every
subgroup of G (K, F) obtained in the form G (K, L) for some intermediate field? This involves
another estimate which is apparently due to Artin. I also found this in [18]. There is more
there about some of these things than what I am including.
Theorem G.5.10 Let K be a field and let G be a finite group of automorphisms of K. Then
$$[K : K_G] \le |G|$$
Proof: Let $G = \{\sigma_1,\cdots,\sigma_n\}$, $\sigma_1 = \iota$ the identity map and suppose $\{u_1,\cdots,u_m\}$ is a
linearly independent set in K with respect to the field $K_G$. Suppose m > n. Then consider
the system of equations
$$\begin{array}{c}
\sigma_1(u_1)x_1 + \sigma_1(u_2)x_2 + \cdots + \sigma_1(u_m)x_m = 0 \\
\sigma_2(u_1)x_1 + \sigma_2(u_2)x_2 + \cdots + \sigma_2(u_m)x_m = 0 \\
\vdots \\
\sigma_n(u_1)x_1 + \sigma_n(u_2)x_2 + \cdots + \sigma_n(u_m)x_m = 0
\end{array} \tag{7.18}$$
which is of the form M x = 0 for $x \in K^m$. Since M has more columns than rows, there
exists a nonzero solution $x \in K^m$ to the above system. Note that this could not happen if
$x \in K_G^m$ because of independence of $\{u_1,\cdots,u_m\}$ and the fact that $\sigma_1 = \iota$. Let the solution
x be one which has the least possible number of nonzero entries. Without loss of generality,
$x_k = 1$ for some k. If $\sigma_r(x_j) = x_j$ for all j and for each r, then the $x_j$ are each
in $K_G$ and so the first equation above would be impossible as just noted. Therefore, there
exists l ≠ k and $\sigma_r$ such that $\sigma_r(x_l) \ne x_l$. For purposes of illustration, say l > k. Now
do $\sigma_r$ to both sides of all the above equations. This yields, after reordering the resulting
equations, a list of equations of the form
$$\begin{array}{c}
\sigma_1(u_1)\sigma_r(x_1) + \cdots + \sigma_1(u_k)1 + \cdots + \sigma_1(u_l)\sigma_r(x_l) + \cdots + \sigma_1(u_m)\sigma_r(x_m) = 0 \\
\sigma_2(u_1)\sigma_r(x_1) + \cdots + \sigma_2(u_k)1 + \cdots + \sigma_2(u_l)\sigma_r(x_l) + \cdots + \sigma_2(u_m)\sigma_r(x_m) = 0 \\
\vdots \\
\sigma_n(u_1)\sigma_r(x_1) + \cdots + \sigma_n(u_k)1 + \cdots + \sigma_n(u_l)\sigma_r(x_l) + \cdots + \sigma_n(u_m)\sigma_r(x_m) = 0
\end{array}$$
This is because σ (1) = 1 if σ is an automorphism. The original system in 7.18 is of the
form
$$\begin{array}{c}
\sigma_1(u_1)x_1 + \cdots + \sigma_1(u_k)1 + \cdots + \sigma_1(u_l)x_l + \cdots + \sigma_1(u_m)x_m = 0 \\
\sigma_2(u_1)x_1 + \cdots + \sigma_2(u_k)1 + \cdots + \sigma_2(u_l)x_l + \cdots + \sigma_2(u_m)x_m = 0 \\
\vdots \\
\sigma_n(u_1)x_1 + \cdots + \sigma_n(u_k)1 + \cdots + \sigma_n(u_l)x_l + \cdots + \sigma_n(u_m)x_m = 0
\end{array}$$
Now replace each equation in the original system with the difference of that equation
and the corresponding one in which $\sigma_r$ was done to both sides of the equations. Since $\sigma_r(x_l) \ne x_l$
while the entries in position k cancel, the
result will be a linear system of the form M y = 0 where y ≠ 0 has fewer nonzero entries
than x, contradicting the choice of x.
With the above estimate, here is another relation between the fixed fields and subgroups
of automorphisms. It doesn't seem to depend on anything being a splitting field of a
separable polynomial.
Proposition G.5.11 Let H be a finite group of automorphisms defined on a field K. Then
for $K_H$ the fixed field,
$$G(K, K_H) = H$$
Proof: If σ ∈ H, then by definition, $\sigma \in G(K, K_H)$. Thus $H \subseteq G(K, K_H)$.
Then by Theorem G.5.10 and Theorem G.5.2,
$$|H| \ge [K : K_H] \ge |G(K, K_H)| \ge |H|$$
and so $H = G(K, K_H)$.
This leads to the following interesting correspondence in the case where K is a splitting
field of a separable polynomial over a field F.
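$$\text{Fixed fields} \quad \begin{array}{c} L \xrightarrow{\ \beta\ } G(K, L) \\ K_H \xleftarrow{\ \alpha\ } H \end{array} \quad \text{Subgroups of } G(K, F) \tag{7.19}$$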
Then αβL = L and βαH = H. Thus there exists a one to one correspondence between the
fixed fields and the subgroups of G (K, F). The following theorem summarizes the above
result.
Theorem G.5.12 Let K be a splitting field of a separable polynomial over a field F. Then
there exists a one to one correspondence between the fixed fields $K_H$ for H a subgroup of
G (K, F) and the intermediate fields as described in the above. $H_1 \subseteq H_2$ if and only if
$K_{H_1} \supseteq K_{H_2}$. Also
$$|H| = [K : K_H]$$
Proof: The one to one correspondence is established above. The claim about the fixed
fields is obvious because if the group is larger, then the fixed field must get smaller because it
is more difficult to fix everything using more automorphisms than with fewer automorphisms.
Consider the estimate. From Theorem G.5.10, $|H| \ge [K : K_H]$. But also, $H = G(K, K_H)$
from Proposition G.5.11 and from Theorem G.5.2,
$$|H| = |G(K, K_H)| \le [K : K_H].$$
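For a concrete picture of this correspondence, consider K = Q[√2, √3] over Q, the splitting field of the separable polynomial (x² − 2)(x² − 3). The following Python sketch is an illustration under stated assumptions: each automorphism is recorded by the pair of signs (s2, s3) it puts on √2 and √3, and the fixed field of a subgroup H is found by counting which of the basis elements 1, √2, √3, √6 are left alone by all of H. This count gives the dimension of $K_H$ over Q because each automorphism only multiplies the basis elements by ±1. The names signs and fixed_dimension are hypothetical.

```python
from itertools import product

# The four automorphisms of K = Q[sqrt2, sqrt3] fixing Q, written as (s2, s3):
# sqrt2 -> s2*sqrt2 and sqrt3 -> s3*sqrt3.
G = list(product((1, -1), repeat=2))
e = (1, 1)

def signs(g):
    """Signs by which g multiplies the basis elements 1, sqrt2, sqrt3, sqrt6."""
    s2, s3 = g
    return (1, s2, s3, s2 * s3)

# All subgroups of this group: trivial, three of order 2, and the whole group.
subgroups = [[e]] + [[e, g] for g in G if g != e] + [G]

def fixed_dimension(H):
    """Dimension over Q of the fixed field K_H (number of basis elements fixed by H)."""
    return sum(all(signs(g)[i] == 1 for g in H) for i in range(4))

# For each subgroup: (|H|, dim of K_H over Q, [K : K_H] = 4 / dim)
table = [(len(H), fixed_dimension(H), 4 // fixed_dimension(H)) for H in subgroups]
for row in table:
    print(row)
```

In every row the first and last entries agree, which is the statement |H| = [K : K_H], and the larger subgroups have the smaller fixed fields.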
Note that from the above discussion, when K is a splitting field of p (x) ∈ F [x] , this
implies that if L is an intermediate field, then it is also a fixed field of a subgroup of G (K, F).
In fact, from the above,
$$L = K_{G(K,L)}$$
If H is a subgroup, then it is also the Galois group
$$H = G(K, K_H).$$
By Proposition G.4.7, each of these intermediate fields L is also a normal extension of F.
Now there is also something called a normal subgroup which will end up corresponding with
these normal field extensions consisting of the intermediate fields between F and K.
G.6 Normal Subgroups
When you look at groups, one of the first things to consider is the notion of a normal
subgroup.
Definition G.6.1 Let G be a group. Then a subgroup N is said to be a normal subgroup
if whenever α ∈ G,
$$\alpha^{-1} N \alpha \subseteq N$$
The important thing about normal subgroups is that you can define the quotient group
G/N .
Definition G.6.2 Let N be a subgroup of G. Define an equivalence relation ∼ as follows.
$$\alpha \sim \beta \text{ means } \alpha^{-1}\beta \in N$$
Why is this an equivalence relation? It is clear that α ∼ α because $\alpha^{-1}\alpha = \iota \in N$ since
N is a subgroup. If α ∼ β, then $\alpha^{-1}\beta \in N$ and so, since N is a subgroup,
$$\left(\alpha^{-1}\beta\right)^{-1} = \beta^{-1}\alpha \in N$$
which shows that β ∼ α. Now suppose α ∼ β and β ∼ γ. Then $\alpha^{-1}\beta \in N$ and $\beta^{-1}\gamma \in N$.
Then since N is a subgroup
$$\alpha^{-1}\beta\beta^{-1}\gamma = \alpha^{-1}\gamma \in N$$
and so α ∼ γ which shows that it is an equivalence relation as claimed. Denote by [α] the
equivalence class determined by α.
Now in the case of N a normal subgroup, you can consider the quotient group.
Definition G.6.3 Let N be a normal subgroup of a group G and define G/N as the set of
all equivalence classes with respect to the above equivalence relation. Also define
[α] [β] ≡ [αβ]
Proposition G.6.4 The above definition is well defined and it also makes G/N into a
group.
Proof: First consider the claim that the definition is well defined. Suppose then that
α ∼ α′ and β ∼ β′. It is required to show that
$$[\alpha\beta] = [\alpha'\beta']$$
But
$$(\alpha\beta)^{-1}\alpha'\beta' = \beta^{-1}\alpha^{-1}\alpha'\beta' = \beta^{-1}\overbrace{\alpha^{-1}\alpha'}^{\in N}\beta' = \overbrace{\beta^{-1}\left(\alpha^{-1}\alpha'\right)\beta}^{\in N}\ \overbrace{\beta^{-1}\beta'}^{\in N} = n_1 n_2 \in N$$
Thus the operation is well defined. Clearly the identity is [ι] where ι is the identity in G
and the inverse is $[\alpha^{-1}]$ where $\alpha^{-1}$ is the inverse for α in G. The associative law is also
obvious.
Note that it was important to have the subgroup be normal in order to have the operation
defined on the quotient group.
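The role of normality can be seen by brute force in a small group. The following Python sketch is only an illustration, with choices that are not from the text: G is the group of all permutations of {0, 1, 2}, N is the subgroup of the three cyclic rotations, and H is a two element subgroup which is not normal. The helper names compose, inverse, is_normal, cls and product_well_defined are hypothetical.

```python
from itertools import permutations

def compose(s, t):
    """Return s∘t (apply t first, then s); permutations are tuples."""
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    """Return the inverse permutation of s."""
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

G = list(permutations(range(3)))            # all permutations of {0, 1, 2}
N = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]       # the rotations
H = [(0, 1, 2), (1, 0, 2)]                  # identity and one transposition

def is_normal(S, G):
    """Check a^{-1} S a ⊆ S for every a in G."""
    return all(compose(compose(inverse(a), n), a) in S for a in G for n in S)

def cls(a, S):
    """Equivalence class [a] = {b : a^{-1} b in S}, which is {a n : n in S}."""
    return frozenset(compose(a, n) for n in S)

def product_well_defined(S, G):
    """Check that [a][b] = [ab] does not depend on the representatives chosen."""
    return all(cls(compose(a2, b2), S) == cls(compose(a, b), S)
               for a in G for b in G
               for a2 in cls(a, S) for b2 in cls(b, S))

print(is_normal(N, G), product_well_defined(N, G))
print(is_normal(H, G), product_well_defined(H, G))
```

For the normal subgroup N the product of classes is well defined and there are two classes, while for H the same recipe gives different answers for different representatives.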
G.7 Normal Extensions And Normal Subgroups
When K is a splitting field of a separable polynomial having coefficients in F, the intermediate
fields are each normal extensions from the above. If L is one of these, what about G (L, F)?
Is this a normal subgroup of G (K, F)? More generally, consider the following diagram which
has now been established in the case that K is a splitting field of a separable polynomial in
F [x].
$$\begin{array}{cccccccccc}
F \equiv L_0 & \subseteq & L_1 & \subseteq & L_2 & \cdots & \subseteq & L_{k-1} & \subseteq & L_k \equiv K \\
G(F,F) = \{\iota\} & \subseteq & G(L_1,F) & \subseteq & G(L_2,F) & \cdots & \subseteq & G(L_{k-1},F) & \subseteq & G(K,F)
\end{array} \tag{7.20}$$
The intermediate fields $L_i$ are each normal extensions of F, each element of $L_i$ being algebraic.
As implied in the diagram, there is a one to one correspondence between the intermediate
fields and the Galois groups displayed. Is $G(L_{j-1},F)$ a normal subgroup of $G(L_j,F)$?
Let $\sigma \in G(L_j,F)$ and let $\eta \in G(L_{j-1},F)$. Then is $\sigma^{-1}\eta\sigma \in G(L_{j-1},F)$? Let $r = r_1$
be something in $L_{j-1}$ and let $\{r_1,\cdots,r_m\}$ be the roots of the minimal polynomial of r
denoted by f (x), a polynomial having coefficients in F. Then $0 = \sigma f(r) = f(\sigma(r))$ and
so $\sigma(r) = r_i$ for some i. Since $L_{j-1}$ is normal, $\sigma(r) \in L_{j-1}$. Therefore, it is fixed by η. It
follows that
$$\sigma^{-1}\eta\sigma(r) = \sigma^{-1}\sigma(r) = r$$
and so $\sigma^{-1}\eta\sigma \in G(L_{j-1},F)$. Thus $G(L_{j-1},F)$ is a normal subgroup of $G(L_j,F)$ as hoped.
This leads to the following fundamental theorem of Galois theory.
Theorem G.7.1 Let K be a splitting field of a separable polynomial p (x) having coefficients
in a field F. Let $\{L_i\}_{i=0}^{k}$ be the increasing sequence of intermediate fields between F and K
as shown above in 7.20. Then each of these is a normal extension of F and the Galois group
$G(L_{j-1},F)$ is a normal subgroup of $G(L_j,F)$. In addition to this,
$$G(L_j,F) \simeq G(K,F)/G(K,L_j)$$
where the symbol ≃ indicates the two spaces are isomorphic.
Proof: All that remains is to check that the above isomorphism is valid. Let
$$\theta : G(K,F)/G(K,L_j) \to G(L_j,F), \quad \theta[\sigma] \equiv \sigma|_{L_j}$$
In other words, this is just the restriction of σ to $L_j$. Is θ well defined? If $[\sigma_1] = [\sigma_2]$, then
by definition, $\sigma_1\sigma_2^{-1} \in G(K,L_j)$ and so $\sigma_1\sigma_2^{-1}$ fixes everything in $L_j$. It follows that the
restrictions of $\sigma_1$ and $\sigma_2$ to $L_j$ are equal. Therefore, θ is well defined. It is obvious that θ
is a homomorphism. Why is θ onto? This follows right away from Theorem G.4.5. Note
that K is the splitting field of p (x) over $L_j$ since $L_j \supseteq F$. Also if $\sigma \in G(L_j,F)$ so it is an
automorphism of $L_j$, then, since it fixes F, $p(x) = \bar p(x)$ in that theorem. Thus σ extends to
ζ, an automorphism of K. Thus $\theta[\zeta] = \sigma$. Why is θ one to one? If θ [σ] = θ [α], this means
σ = α on $L_j$. Thus $\sigma\alpha^{-1}$ is the identity on $L_j$. Hence $\sigma\alpha^{-1} \in G(K,L_j)$ which is what it
means for [σ] = [α].
There is an immediate application to a description of the normal closure of an algebraic
extension F [a1 , a2 , · · · , am ] . To begin with, recall the following definition.
Definition G.7.2 When you have $F[a_1,\cdots,a_m]$ with each $a_i$ algebraic so that $F[a_1,\cdots,a_m]$
is a field, you could consider
$$f(x) \equiv \prod_{i=1}^{m} f_i(x)$$
where fi (x) is the minimal polynomial of ai . Then if K is a splitting field for f (x) , this K
is called the normal closure. It is at least as large as F [a1 , · · · , am ] and it has the advantage
of being a normal extension.
Let $G(K,F) = \{\eta_1, \eta_2, \cdots, \eta_m\}$. The conjugate fields are the fields
$$\eta_j(F[a_1,\cdots,a_m])$$
Thus each of these fields is isomorphic to any other and they are all contained in K. Let K′
denote the smallest field contained in K which contains all of these conjugate fields. Note
that if $k \in F[a_1,\cdots,a_m]$ so that $\eta_i(k)$ is in one of these conjugate fields, then $\eta_j\eta_i(k)$ is
also in a conjugate field because $\eta_j\eta_i$ is one of the automorphisms of G (K, F). Let
$$S = \{k \in K' : \eta_j(k) \in K' \text{ each } j\}.$$
Then from what was just shown, each conjugate field is in S. Suppose k ∈ S. What about
$k^{-1}$?
$$\eta_j(k)\,\eta_j\left(k^{-1}\right) = \eta_j\left(kk^{-1}\right) = \eta_j(1) = 1$$
and so $(\eta_j(k))^{-1} = \eta_j\left(k^{-1}\right)$. Now $(\eta_j(k))^{-1} \in K'$ because K′ is a field. Therefore,
$\eta_j\left(k^{-1}\right) \in K'$. Thus S is closed with respect to taking inverses. It is also closed with
respect to products. Thus it is clear that S is a field which contains each conjugate field.
However, K′ was defined as the smallest field which contains the conjugate fields. Therefore,
S = K′ and so this shows that each η j maps K′ to itself while fixing F. Thus G (K, F) ⊆
G (K′ , F) . However, since K′ ⊆ K, it follows that also G (K′ , F) ⊆ G (K, F) . Therefore,
G (K′ , F) = G (K, F) , and by the one to one correspondence between the intermediate fields
and the Galois groups, it follows that K′ = K. This proves the following lemma.
Lemma G.7.3 Let K denote the normal closure of $F[a_1,\cdots,a_m]$ with each $a_i$ algebraic
so that $F[a_1,\cdots,a_m]$ is a field. Thus K is the splitting field of the product of the minimal
polynomials of the $a_i$. Then K is also the smallest field containing the conjugate fields
$\eta_j(F[a_1,\cdots,a_m])$ for $\{\eta_1, \eta_2, \cdots, \eta_m\} = G(K,F)$.
G.8 Conditions For Separability
So when is it that a polynomial having coefficients in a field F is separable? It turns out
that this is always the case for fields which are enough like the rational numbers. It involves
considering the derivative of a polynomial. In doing this, there will be no analysis used, just
the rule for differentiation which we all learned in calculus. Thus the derivative is defined
as follows.
$$\left(a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0\right)' \equiv na_nx^{n-1} + a_{n-1}(n-1)x^{n-2} + \cdots + a_1$$
This kind of formal manipulation is what most students do anyway, never thinking about
where it comes from. Here $na_n$ means to add $a_n$ to itself n times. With this definition, it
is clear that the usual rules such as the product rule hold. This discussion follows [18].
Definition G.8.1 A field has characteristic 0 if na ≠ 0 for all n ∈ N and a ≠ 0. Otherwise
a field F has characteristic p if p · 1 = 0 for p · 1 defined as 1 added to itself p times and p
is the smallest positive integer for which this takes place.
Note that with this definition, some of the terms of the derivative of a polynomial could
vanish in the case that the field has characteristic p. I will go ahead and write them anyway.
For example, if the field has characteristic p, then
$$\left(x^p - a\right)' = 0$$
because formally it equals $p \cdot 1\,x^{p-1} = 0\,x^{p-1}$, the 1 being the 1 in the field.
Note that the field Zp does not have characteristic 0 because p · 1 = 0. Thus not all fields
have characteristic 0.
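The vanishing of terms in the derivative is easy to see by computing in $Z_p$. The following Python sketch is an illustration for p = 5 only. Polynomials are stored as lists of coefficients reduced mod p, lowest degree first, and the name formal_derivative is a hypothetical helper.

```python
def formal_derivative(coeffs, p):
    """Formal derivative of sum(coeffs[k] * x**k), coefficients taken in Z_p."""
    return [(k * c) % p for k, c in enumerate(coeffs)][1:]

p = 5
a = 3
f = [(-a) % p] + [0] * (p - 1) + [1]    # x^5 - 3 over Z_5
g = [1, 0, 1]                           # x^2 + 1 over Z_5

print(formal_derivative(f, p))          # every coefficient is 0
print(formal_derivative(g, p))          # the coefficients of 2x
```

The derivative of $x^5 - 3$ is the zero polynomial even though $x^5 - 3$ is not constant, which cannot happen in characteristic 0.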
How can you tell if a polynomial has no repeated roots? This is the content of the next
theorem.
Theorem G.8.2 Let p (x) be a monic polynomial having coefficients in a field F, and let
K be a field in which p (x) factors
$$p(x) = \prod_{i=1}^{n}(x - r_i), \quad r_i \in K.$$
Then the $r_i$ are distinct if and only if p (x) and p′ (x) are relatively prime over F.
Proof: Suppose first that p′ (x) and p (x) are relatively prime over F. Since they are not
both zero, there exist polynomials a (x) , b (x) having coefficients in F such that
$$a(x)p(x) + b(x)p'(x) = 1$$
Now suppose p (x) has a repeated root r. Then in K [x] ,
$$p(x) = (x - r)^2 g(x)$$
and so $p'(x) = 2(x - r)g(x) + (x - r)^2 g'(x)$. Then in K [x] ,
$$a(x)(x - r)^2 g(x) + b(x)\left(2(x - r)g(x) + (x - r)^2 g'(x)\right) = 1$$
Then letting x = r, it follows that 0 = 1. Hence p (x) has no repeated roots.
Next suppose there are no repeated roots of p (x). Then
$$p'(x) = \sum_{i=1}^{n}\prod_{j \ne i}(x - r_j)$$
p′ (x) cannot be zero in this case because
$$p'(r_n) = \prod_{j=1}^{n-1}(r_n - r_j) \ne 0$$
because it is the product of nonzero elements of K. Similarly $p'(r_i) \ne 0$ for each i
because
$$p'(r_i) = \prod_{j \ne i}(r_i - r_j) \ne 0.$$
Then if q (x) is a monic polynomial of degree at least 1 which divides both p (x) and p′ (x), then the
roots of q (x) in K are a subset of $\{r_1,\cdots,r_n\}$. Without loss of generality, suppose these
roots of q (x) are $\{r_1,\cdots,r_k\}$, k ≤ n − 1, since q (x) divides p′ (x) which has degree at most
n − 1. Then $q(x) = \prod_{i=1}^{k}(x - r_i)$ but this fails to divide p′ (x) as polynomials in K [x]
because $p'(r_1) \ne 0$, and
so q (x) fails to divide p′ (x) as polynomials in F [x] either. Therefore, the only monic
polynomial which divides both is 1 and so the
two are relatively prime.
The following lemma says that the usual calculus result holds in case you are looking at
polynomials with coefficients in a field of characteristic 0.
Lemma G.8.3 Suppose that F has characteristic 0. Then if f ′ (x) = 0, it follows that f (x)
is a constant.
Proof: Suppose
$$f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$$
where n ≥ 1. Since f′ (x) = 0, taking the derivative n − 1 more times shows that $ma_n = 0$
for the positive integer m = n!. Therefore, $a_n = 0$ because, by assumption $ma_n \ne 0$ if $a_n \ne 0$. Now repeat the
argument with
$$f_1(x) = a_{n-1}x^{n-1} + \cdots + a_1x + a_0$$
and continue this way to find that $f(x) = a_0 \in F$.
Now here is a major result which applies to fields of characteristic 0.
Theorem G.8.4 If F is a field of characteristic 0, then every polynomial p (x) , having
coefficients in F is separable.
Proof: It is required to show that the irreducible factors of p (x) have distinct roots in
K a splitting field for p (x). So let q (x) be an irreducible monic polynomial. If l (x) is a
monic polynomial of positive degree which divides both q (x) and q′ (x) , then since q (x) is
irreducible, it must be the case that l (x) = q (x) which forces q (x) to divide q′ (x) . However,
q′ (x) is not zero by Lemma G.8.3 and
the degree of q′ (x) is less than the degree of q (x) so this is impossible. Hence l (x) = 1 and
so q′ (x) and q (x) are relatively prime which implies, by Theorem G.8.2, that q (x) has distinct roots.
It follows that the above theory all holds for any field of characteristic 0. For example,
if the field is Q then everything holds.
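Over Q, Theorem G.8.2 gives a test for distinct roots which can be carried out without finding any roots: compute the monic greatest common divisor of the polynomial and its derivative with the Euclidean algorithm. The following Python sketch does this with exact rational arithmetic. The coefficient list representation (lowest degree first) and the helper names trim, derivative, poly_rem, poly_gcd and has_distinct_roots are assumptions made for the illustration, and the sketch assumes the input polynomial is not zero.

```python
from fractions import Fraction

def trim(f):
    """Drop zero coefficients of highest degree; the zero polynomial becomes []."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def derivative(f):
    """Formal derivative of a coefficient list, lowest degree first."""
    return trim([k * c for k, c in enumerate(f)][1:])

def poly_rem(f, g):
    """Remainder of f on division by a nonzero g, with rational coefficients."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while len(f) >= len(g):
        factor = f[-1] / g[-1]
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[shift + i] -= factor * c
        f = trim(f)
    return f

def poly_gcd(f, g):
    """Monic greatest common divisor by the Euclidean algorithm (f nonzero)."""
    f, g = trim(list(f)), trim(list(g))
    while g:
        f, g = g, poly_rem(f, g)
    return [Fraction(c) / Fraction(f[-1]) for c in f]

def has_distinct_roots(f):
    """True when f and f' are relatively prime, as in Theorem G.8.2."""
    return len(poly_gcd(f, derivative(f))) == 1

irreducible = [-2, 0, 1]                        # x^2 - 2
repeated = [4, 0, 4, 0, -3, 0, -2, 0, 1]        # (x^2 + 1)^2 (x^2 - 2)^2

print(has_distinct_roots(irreducible))
print(has_distinct_roots(repeated))
print(poly_gcd(repeated, derivative(repeated)))
```

For the second polynomial the greatest common divisor is $x^4 - x^2 - 2 = (x^2 + 1)(x^2 - 2)$, the product of the repeated factors, while the irreducible polynomial $x^2 - 2$ passes the test as the theorem on characteristic 0 says it must.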
Proposition G.8.5 If a field F has characteristic p, then p is a prime.
Proof: First note that n · 1 = 0 if and only if n · a = 0 for all a ≠ 0. This just
follows from the distributive law and the definition of what is meant by n · 1, meaning that
you add 1 to itself n times. Suppose then that the characteristic of the field is nm where n, m
are positive integers, each larger than 1. Then grouping the terms in the sum associated with nm · 1,
it follows that n (m · 1) = 0. Since m < nm, it follows that m · 1 ≠ 0, and so n · a = 0 for the nonzero
a = m · 1. Hence n · 1 = 0 with n < nm, showing that nm is not the characteristic of the field after all.
Definition G.8.6 A field F is called perfect if every polynomial p (x) having coefficients in
F is separable.
The above shows that fields of characteristic 0 are perfect. The above theory about
Galois groups and fixed fields all works for perfect fields. What about fields of characteristic
p where p is a prime? The following interesting lemma has to do with a nonzero a ∈ F
having a pth root in F.
Lemma G.8.7 Let F be a field of characteristic p. Let a ≠ 0 where a ∈ F. Then either
$x^p - a$ is irreducible or there exists b ∈ F such that $x^p - a = (x - b)^p$.
Proof: Suppose that $x^p - a$ is not irreducible. Then $x^p - a = g(x)f(x)$ where the
degree of g (x) , k is less than p and at least as large as 1. Then let b be a root of g (x). Then
$b^p - a = 0$. Therefore,
$$x^p - a = x^p - b^p = (x - b)^p.$$
That is right. $x^p - b^p = (x - b)^p$ just like many beginning calculus students believe. It
happens because of the binomial theorem and the fact that the other terms have a factor of
p. Hence
$$x^p - a = (x - b)^p = g(x)f(x)$$
and so g (x) divides $(x - b)^p$ which requires that $g(x) = (x - b)^k$ since g (x) has degree k.
It follows, since g (x) is given to have coefficients in F, that $b^k \in F$. Also $b^p \in F$. Since k, p
are relatively prime, due to the fact that k < p with p prime, there are integers m, n such
that
$$1 = mk + np$$
Then from what you mean by raising b to an integer power and the usual rules of exponents
for integer powers,
$$b = \left(b^k\right)^m\left(b^p\right)^n \in F.$$
So when is a field of characteristic p perfect? As observed above, for a field of characteristic p,
$$(a + b)^p = a^p + b^p.$$
Also,
$$(ab)^p = a^p b^p$$
It follows that $a \to a^p$ is a homomorphism. This is also one to one because, as mentioned above
$$(a - b)^p = a^p - b^p$$
Therefore, if $a^p = b^p$, it follows that $a = b$. Therefore, this homomorphism is also one to one.
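The identity $(a + b)^p = a^p + b^p$ rests on the fact that p divides every binomial coefficient $\binom{p}{k}$ with 0 < k < p. The following is a minimal sketch which checks this for a few small primes in the integers mod p, and shows that it fails for some composite moduli. The function names are illustrative only and are not taken from the text.

```python
from math import comb


def middle_binomials_vanish(n):
    """True when n divides every binomial coefficient C(n, k) with 0 < k < n."""
    return all(comb(n, k) % n == 0 for k in range(1, n))


def power_map_is_additive(n):
    """Check (a + b)^n == a^n + b^n in the integers mod n, for all residues a, b."""
    return all(
        pow(a + b, n, n) == (pow(a, n, n) + pow(b, n, n)) % n
        for a in range(n)
        for b in range(n)
    )


primes = [2, 3, 5, 7, 11, 13]
composites = [4, 6, 9]

for p in primes:
    print("prime", p, middle_binomials_vanish(p), power_map_is_additive(p))
for n in composites:
    print("composite", n, middle_binomials_vanish(n), power_map_is_additive(n))
```

For the composite moduli listed, both checks fail, which is consistent with the role played by the prime characteristic in the argument above.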
Let Fp be the collection of ap where a ∈ F. Then clearly Fp is a subfield of F because
it is the image of a one to one homomorphism. What follows is the condition for a field of
characteristic p to be perfect.
Theorem G.8.8 Let F be a field of characteristic p. Then F is perfect if and only if F = Fp .
Proof: Suppose $F = F^p$ first. Let $f(x)$ be an irreducible polynomial over F. By Theorem G.8.2, if $f'(x)$ and $f(x)$ are relatively prime over F then $f(x)$ has no repeated roots. Suppose then that the two polynomials are not relatively prime, so that some $d(x)$ of degree at least 1 divides both $f(x)$ and $f'(x)$. Then, since $f(x)$ is irreducible, it follows that $d(x)$ is a multiple of $f(x)$ and so $f(x)$ divides $f'(x)$ which is impossible unless $f'(x) = 0$. But if $f'(x) = 0$, then $f(x)$ must be of the form
$$a_0 + a_1 x^p + a_2 x^{2p} + \cdots + a_n x^{np}$$
since if it had some other nonzero term with exponent not a multiple of p then $f'(x)$ could not equal zero since you would have something surviving in the expression for the derivative after taking out multiples of p which is like
$$k a x^{k-1}$$
where $a \neq 0$ and $k < p$. Thus $ka \neq 0$. Hence the form of $f(x)$ is as indicated above.
If $a_k = b_k^p$ for some $b_k \in F$, for each k, then the expression for $f(x)$ is
$$b_0^p + b_1^p x^p + b_2^p x^{2p} + \cdots + b_n^p x^{np} = \left(b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n\right)^p$$
because of the fact noted earlier that $a \to a^p$ is a homomorphism. However, this says that $f(x)$ is not irreducible after all. It follows that there exists $a_k$ such that $a_k \notin F^p$, contrary to the assumption that $F = F^p$. Hence the greatest common divisor of $f(x)$ and $f'(x)$ must be 1.
Next consider the other direction. Suppose $F \neq F^p$. Then there exists $a \in F \setminus F^p$. Consider the polynomial $x^p - a$. Since a has no pth root in F, Lemma G.8.7 shows that this polynomial is irreducible. As noted above, its derivative equals 0. Therefore, $x^p - a$ and its derivative cannot be relatively prime. In fact, $x^p - a$ would divide both. Thus this irreducible polynomial has a repeated root and so F is not perfect.
Now suppose F is a finite field. If n · 1 is never equal to 0 then, since the field is finite, k · 1 = m · 1 for some k < m, and so (m − k) · 1 = 0 which is a contradiction. Hence F is a field of characteristic p for some prime p, by Proposition G.8.5. The mapping $a \to a^p$ was shown to be a homomorphism which is also one to one. Since F is finite, a one to one mapping of F into itself is also onto, and so $F^p = F$. Then the following corollary is obtained from the above theorem.
Corollary G.8.9 If F is a finite field, then F is perfect.
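As a small illustration of the corollary, here is a sketch which checks that the map $a \to a^p$ is onto for one particular finite field. It assumes the standard model of the field with 9 elements as pairs (a, b) standing for a + bi with coefficients mod 3 and $i^2 = -1$, which is a field because −1 is not a square mod 3. This model and all names in the code are assumptions made for the example, not notation from the text.

```python
P = 3  # the characteristic; x^2 + 1 has no root mod 3, so Z_3[i] is a field


def add(u, v):
    """Add two elements a + b*i, stored as pairs (a, b) with entries mod P."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def mul(u, v):
    """Multiply (a + b*i)(c + d*i) using i^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def frobenius(u):
    """Return u^P, computed by repeated multiplication."""
    result = (1, 0)
    for _ in range(P):
        result = mul(result, u)
    return result


FIELD = [(a, b) for a in range(P) for b in range(P)]

is_onto = {frobenius(u) for u in FIELD} == set(FIELD)
is_additive = all(
    frobenius(add(u, v)) == add(frobenius(u), frobenius(v))
    for u in FIELD for v in FIELD
)
is_multiplicative = all(
    frobenius(mul(u, v)) == mul(frobenius(u), frobenius(v))
    for u in FIELD for v in FIELD
)
print(is_onto, is_additive, is_multiplicative)
```

In this model the map sends a + bi to a − bi, so every element is a pth power, as the corollary requires.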
With this information, here is a convenient version of the fundamental theorem of Galois
theory.
Theorem G.8.10 Let K be a splitting field of any polynomial $p(x) \in F[x]$ where F is either of characteristic 0 or of characteristic p with $F^p = F$. Let $\{L_i\}_{i=0}^{k}$ be the increasing sequence of intermediate fields between F and K. Then each of these is a normal extension of F and the Galois group $G(L_{j-1}, F)$ is a normal subgroup of $G(L_j, F)$. In addition to this,
$$G(L_j, F) \simeq G(K, F) / G(K, L_j)$$
where the symbol ≃ indicates the two spaces are isomorphic.
G.9 Permutations
Let {a1 , · · · , an } be a set of distinct elements. Then a permutation of these elements is
usually thought of as a list in a particular order. Thus there are exactly n! permutations of
a set having n distinct elements. With this definition, here is a simple lemma.
Lemma G.9.1 Every permutation can be obtained from every other permutation by a finite
number of switches.
Proof: This is obvious if n = 1 or 2. Suppose then that it is true for sets of n−1 elements.
Take two permutations of {a1 , · · · , an } , P1 , P2 . To get from P1 to P2 using switches, first
make a switch to obtain the last element in the list coinciding with the last element of P2 .
By induction, there are switches which will arrange the first n − 1 to the right order.
It is customary to consider permutations in terms of the set $I_n \equiv \{1, \cdots, n\}$ to be more specific. Then one can think of a given permutation as a mapping σ from this set $I_n$ to itself which is one to one and onto. In fact, $\sigma(i) \equiv j$ where j is in the ith position. Often people write such a σ in the following form
$$\begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix} \tag{7.21}$$
An easy way to understand the above permutation is through the use of matrix multiplication by permutation matrices. The above vector $(i_1, \cdots, i_n)^T$ is obtained by
$$\begin{pmatrix} e_{i_1} & e_{i_2} & \cdots & e_{i_n} \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ \vdots \\ n \end{pmatrix} \tag{7.22}$$
This can be seen right away from looking at a simple example or by using the definition of
matrix multiplication directly.
Definition G.9.2 The sign of the permutation 7.21 is defined as the determinant of the
above matrix in 7.22.
In other words, the sign of the permutation
$$\begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}$$
equals $\operatorname{sgn}(i_1, \cdots, i_n)$ defined earlier in Lemma 3.3.1.
Note that from the fact that the determinant is well defined and its properties, the sign of
a permutation is 1 if and only if the permutation is produced by an even number of switches
and that the number of switches used to produce a given permutation must be either even
or odd. Of course a switch is a permutation itself and this is called a transposition. Note
also that all these matrices are orthogonal matrices so to take the inverse, it suffices to take
a transpose, the inverse also being a permutation matrix.
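The agreement between the determinant description of the sign and the parity of the number of switches can be checked by brute force for small n. The sketch below compares three computations for every permutation of {1, 2, 3, 4}: the parity of the number of inversions, the parity of an explicit sequence of switches which sorts the list, and the determinant of the matrix whose kth column is $e_{i_k}$. The helper names are hypothetical, and the determinant is computed by a plain cofactor expansion which is only sensible for small matrices.

```python
from itertools import permutations


def sign_by_inversions(perm):
    """Sign from the parity of the number of pairs which are out of order."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def sign_by_switches(perm):
    """Sort the list with explicit switches and return (-1)^(number of switches)."""
    work = list(perm)
    switches = 0
    for pos in range(len(work)):
        wanted = pos + 1
        if work[pos] != wanted:
            other = work.index(wanted)
            work[pos], work[other] = work[other], work[pos]
            switches += 1
    return -1 if switches % 2 else 1


def determinant(matrix):
    """Integer determinant by cofactor expansion along the first row."""
    if not matrix:
        return 1
    total = 0
    for col, entry in enumerate(matrix[0]):
        if entry:
            minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
            total += (-1) ** col * entry * determinant(minor)
    return total


def permutation_matrix(perm):
    """Matrix whose k-th column is the standard basis vector e_{i_k}."""
    n = len(perm)
    return [[1 if perm[col] == row + 1 else 0 for col in range(n)]
            for row in range(n)]


def sign_by_determinant(perm):
    return determinant(permutation_matrix(perm))


all_perms = list(permutations(range(1, 5)))
agree = all(
    sign_by_inversions(p) == sign_by_switches(p) == sign_by_determinant(p)
    for p in all_perms
)
even_count = sum(1 for p in all_perms if sign_by_inversions(p) == 1)
print(agree, even_count)
```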
The resulting group consisting of the permutations of In is called Sn . An important idea
is the notion of a cycle. Let σ be a permutation, a one to one and onto function defined on $I_n$. A cycle is of the form
$$\left(k, \sigma(k), \sigma^2(k), \sigma^3(k), \cdots, \sigma^{m-1}(k)\right), \quad \sigma^m(k) = k.$$
The last condition must hold for some m because $I_n$ is finite. Then a cycle can be considered as a permutation as follows. Let $(i_1, i_2, \cdots, i_m)$ be a cycle. Then define σ by $\sigma(i_1) = i_2, \sigma(i_2) = i_3, \cdots, \sigma(i_m) = i_1$, and if $k \notin \{i_1, i_2, \cdots, i_m\}$, then $\sigma(k) = k$.
Note that if you have two cycles, $(i_1, i_2, \cdots, i_m)$, $(j_1, j_2, \cdots, j_r)$ which are disjoint in the sense that
$$\{i_1, i_2, \cdots, i_m\} \cap \{j_1, j_2, \cdots, j_r\} = \emptyset,$$
then they commute. It is then clear that every permutation can be represented in a unique way by disjoint cycles. Start with 1 and form the cycle determined by 1. Then start with the smallest $k \in I_n$ which was not included and begin a cycle starting with this. Continue this way. Use the convention that (k) is just the identity. This representation is unique up to order of the cycles which does not matter because they commute. Note that a transposition can be written as (a, b).
A cycle can be written as a product of non disjoint transpositions, where the transposition on the right is applied first.
$$(i_1, i_2, \cdots, i_m) = (i_{m-1}, i_m) \cdots (i_2, i_m)(i_1, i_m)$$
Thus if m is odd, the permutation has sign 1 and if m is even, the permutation has sign −1. Also, it is clear that the inverse of the above permutation is $(i_1, i_2, \cdots, i_m)^{-1} = (i_m, \cdots, i_2, i_1)$.
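The statements about cycles can be tried out on small examples. The following sketch stores a permutation of {1, ..., n} as a dictionary, composes from right to left (the factor on the right is applied first, which is the convention under which the displayed product of transpositions holds), and recovers the disjoint cycles by following each number until it returns to its start. The function names are illustrative only.

```python
def compose(f, g):
    """Return the product f g: apply g first, then f."""
    return {k: f[g[k]] for k in g}


def cycle(n, *items):
    """The cycle (i1, ..., im) as a permutation of {1, ..., n}."""
    sigma = {k: k for k in range(1, n + 1)}
    for pos, item in enumerate(items):
        sigma[item] = items[(pos + 1) % len(items)]
    return sigma


def disjoint_cycles(sigma):
    """Disjoint cycles of sigma, each started at its smallest element."""
    seen, cycles = set(), []
    for start in sorted(sigma):
        if start in seen:
            continue
        orbit = [start]
        seen.add(start)
        k = sigma[start]
        while k != start:
            orbit.append(k)
            seen.add(k)
            k = sigma[k]
        if len(orbit) > 1:  # cycles of length one are the identity
            cycles.append(tuple(orbit))
    return cycles


def as_transpositions(n, items):
    """Build (i_{m-1}, i_m) ... (i_2, i_m)(i_1, i_m) as a permutation."""
    last = items[-1]
    product = {k: k for k in range(1, n + 1)}
    for item in items[:-1]:
        # each new transposition is multiplied on the left of the product so far
        product = compose(cycle(n, item, last), product)
    return product


identity = {k: k for k in range(1, 7)}
sigma = compose(cycle(6, 1, 4, 2), cycle(6, 3, 6))
print(disjoint_cycles(sigma))
print(as_transpositions(6, (2, 5, 3, 1)) == cycle(6, 2, 5, 3, 1))
print(compose(cycle(6, 2, 5, 3, 1), cycle(6, 1, 3, 5, 2)) == identity)
```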
Definition G.9.3 An is the subgroup of Sn such that for σ ∈ An , σ is the product of an
even number of transpositions. It is called the alternating group.
The following important result is useful in describing An .
Proposition G.9.4 Let n ≥ 3. Then every permutation in An is the product of 3 cycles
and the identity.
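Proof: In case n = 3, you can list all of the permutations in $A_n$
$$\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$$
In terms of cycles, the last two of these are
$$(1, 2, 3), \ (1, 3, 2)$$
You can easily check that they are inverses of each other. Now suppose n ≥ 4. The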
permutations in An are defined as the product of an even number of transpositions. There
are two cases. The first case is where you have two transpositions which share a number,
(a, c) (c, b) = (a, c, b)
Thus when they share a number, the product is just a 3 cycle. Next suppose you have the
product of two transpositions which are disjoint. This can happen because n ≥ 4. First note
that
(a, b) = (c, b) (b, a, c) = (c, b, a) (c, a)
Therefore,
(a, b) (c, d) = (c, b, a) (c, a) (a, d) (d, c, a)
= (c, b, a) (c, a, d) (d, c, a)
and so every product of disjoint transpositions is the product of 3 cycles.
Lemma G.9.5 If n ≥ 5, then if B is a normal subgroup of $A_n$, and B is not the identity, then B must contain a 3 cycle.
Proof: Let α be the permutation in B which is “closest” to the identity without being
the identity. That is, out of all permutations which are not the identity, this is one which
has the most fixed points or equivalently moves the fewest numbers. Then α is the product
of disjoint cycles. Suppose that the longest cycle is the first one and it has at least four
numbers. Thus
$$\alpha = (i_1, i_2, i_3, i_4, \cdots, m) \gamma_1 \cdots \gamma_p$$
Since B is normal,
$$\alpha_1 \equiv (i_3, i_2, i_1)(i_1, i_2, i_3, i_4, \cdots, m)(i_1, i_2, i_3) \gamma_1 \cdots \gamma_p \in B$$
Then consider $\alpha_1 \alpha^{-1} =$
$$(i_3, i_2, i_1)(i_1, i_2, i_3, i_4, \cdots, m)(i_1, i_2, i_3)(m, \cdots, i_4, i_3, i_2, i_1)$$
Then for this permutation, i1 → i3 , i2 → i2 , i3 → i4 , i4 → i1 . The other numbers not in
{i1 , i2 , i3 , i4 } are fixed, and in addition i2 is fixed which did not happen with α. Therefore,
this new permutation moves only 3 numbers. Since it is assumed that m ≥ 4, this is a
contradiction to α fixing the most points. It follows that
$$\alpha = (i_1, i_2, i_3) \gamma_1 \cdots \gamma_p \tag{7.23}$$
or else
$$\alpha = (i_1, i_2) \gamma_1 \cdots \gamma_p \tag{7.24}$$
In the first case, say γ 1 = (i4 , i5 , · · · ) . Multiply as follows α1 =
(i4 , i2 , i1 ) (i1 , i2 , i3 ) (i4 , i5 , · · · ) γ 2 · · · γ p (i1 , i2 , i4 ) ∈ B
Then form $\alpha_1 \alpha^{-1} \in B$ given by
$$(i_4, i_2, i_1)(i_1, i_2, i_3)(i_4, i_5, \cdots) \gamma_2 \cdots \gamma_p (i_1, i_2, i_4) \gamma_p^{-1} \cdots \gamma_1^{-1} (i_3, i_2, i_1)$$
$$= (i_4, i_2, i_1)(i_1, i_2, i_3)(i_4, i_5, \cdots)(i_1, i_2, i_4)(\cdots, i_5, i_4)(i_3, i_2, i_1)$$
Then i1 → i4 , i2 → i3 , i3 → i5 , i4 → i2 , i5 → i1 and other numbers are fixed. Thus α1 α−1
moves 5 points. However, α moves more than 5 if γ i is not the identity for any i ≥ 2. It
follows that
α = (i1 , i2 , i3 ) γ 1
and γ 1 can only be a transposition. However, this cannot happen because then the above
α would not even be in An . Therefore, γ 1 = ι and so
α = (i1 , i2 , i3 )
Thus in this case, B contains a 3 cycle.
Now consider case 7.24. None of the $\gamma_i$ can be a cycle of length 4 or more since the above argument would eliminate this possibility. If any has length 3 then the above argument implies that α equals this 3 cycle. It follows that each $\gamma_i$ must be a 2 cycle. Say
α = (i1 , i2 ) (i3 , i4 ) γ 2 · · · γ p
Thus it moves at least four numbers, greater than four if any of γ i for i ≥ 2 is not the
identity. As before, α1 ≡
(i4 , i2 , i1 ) (i1 , i2 ) (i3 , i4 ) γ 2 · · · γ p (i1 , i2 , i4 )
= (i4 , i2 , i1 ) (i1 , i2 ) (i3 , i4 ) (i1 , i2 , i4 ) γ 2 · · · γ p ∈ B
Then $\alpha_1 \alpha^{-1} =$
$$(i_4, i_2, i_1)(i_1, i_2)(i_3, i_4)(i_1, i_2, i_4) \gamma_2 \cdots \gamma_p \gamma_p^{-1} \cdots \gamma_2^{-1} (i_3, i_4)(i_1, i_2)$$
$$= (i_4, i_2, i_1)(i_1, i_2)(i_3, i_4)(i_1, i_2, i_4)(i_3, i_4)(i_1, i_2) \in B$$
Then $i_1 \to i_3, i_2 \to i_4, i_3 \to i_1, i_4 \to i_2$ so this moves exactly four numbers. Therefore, none of the $\gamma_i$ is different than the identity for i ≥ 2. It follows that
$$\alpha = (i_1, i_2)(i_3, i_4) \tag{7.25}$$
and α moves exactly four numbers. Then since B is normal, α1 ≡
(i5 , i4 , i3 ) (i1 , i2 ) (i3 , i4 ) (i3 , i4 , i5 ) ∈ B
Then α1 α−1 =
(i5 , i4 , i3 ) (i1 , i2 ) (i3 , i4 ) (i3 , i4 , i5 ) (i3 , i4 ) (i1 , i2 ) ∈ B
Then i1 → i1 , i2 → i2 , i3 → i4 , i4 → i5 , i5 → i3 . Thus this permutation moves only three
numbers and so α cannot be of the form given in 7.25. It follows that case 7.24 does not
occur.
Definition G.9.6 A group G is said to be simple if its only normal subgroups are itself and the identity.
The following major result is due to Galois [18].
Proposition G.9.7 Let n ≥ 5. Then An is simple.
Proof: From Lemma G.9.5, if B is a normal subgroup of An , B ̸= {ι} , then it contains
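a 3 cycle $\alpha = (i_1, i_2, i_3)$,
$$\begin{pmatrix} i_1 & i_2 & i_3 \\ i_2 & i_3 & i_1 \end{pmatrix}$$
Now let $(j_1, j_2, j_3)$ be another 3 cycle.
$$\begin{pmatrix} j_1 & j_2 & j_3 \\ j_2 & j_3 & j_1 \end{pmatrix}$$
Let σ be a permutation which satisfies
$$\sigma(i_k) = j_k$$
Then
$$\sigma\alpha\sigma^{-1}(j_1) = \sigma\alpha(i_1) = \sigma(i_2) = j_2$$
$$\sigma\alpha\sigma^{-1}(j_2) = \sigma\alpha(i_2) = \sigma(i_3) = j_3$$
$$\sigma\alpha\sigma^{-1}(j_3) = \sigma\alpha(i_3) = \sigma(i_1) = j_1$$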
while σασ −1 leaves all other numbers fixed. Thus σασ −1 is the given 3 cycle. It follows that
B contains every 3 cycle. By Proposition G.9.4, this implies B = An . The only problem is
that it is not known whether σ is in An . This is where n ≥ 5 is used. You can modify σ on
two numbers not equal to any of the {i1 , i2 , i3 } by multiplying by a transposition so that
the possibly modified σ is expressed as an even number of transpositions.
G.10 Solvable Groups
Recall the fundamental theorem of Galois theory which established a correspondence between the normal subgroups of G (K, F) and normal field extensions. Also recall that if H
is one of these normal subgroups, then there was an isomorphism between G (KH , F) and
the quotient group G (K, F) /H. The general idea of a solvable group is given next.
Definition G.10.1 A group G is solvable if there exists a decreasing sequence of subgroups $\{H_i\}_{i=0}^{m}$ such that $H_i$ is a normal subgroup of $H_{i-1}$,
$$G = H_0 \supseteq H_1 \supseteq \cdots \supseteq H_m = \{\iota\},$$
and each quotient group $H_{i-1}/H_i$ is Abelian. That is, for $[a], [b] \in H_{i-1}/H_i$,
$$[ab] = [a][b] = [b][a] = [ba]$$
Note that if G is an Abelian group, then it is automatically solvable. In fact you can
just consider H0 = G, H1 = {ι}. In this case H0 /H1 is just the group G which is Abelian.
There is another idea which helps in understanding whether a group is solvable. It
involves the commutator subgroup. This is a very good idea because this subgroup is
defined in terms of the group G.
Definition G.10.2 Let a, b ∈ G a group. Then the commutator is
aba−1 b−1
The commutator subgroup, denoted by G′ , is the smallest subgroup which contains all the
commutators.
The nice thing about the commutator subgroup is that it is a normal subgroup. There
are also many other amazing properties.
Theorem G.10.3 Let G be a group and let G′ be the commutator subgroup. Then G′ is a
normal subgroup. Also the quotient group G/G′ is Abelian. If H is any normal subgroup of
G such that G/H is Abelian, then H ⊇ G′ . If G′ = {ι} , then G must be Abelian.
Proof: The elements of G′ are just finite products of things like $aba^{-1}b^{-1}$. Note that the inverse of something like this is also one of these.
$$\left(aba^{-1}b^{-1}\right)^{-1} = bab^{-1}a^{-1}.$$
Thus the collection of finite products is indeed a subgroup. Now consider h ∈ G. Then
$$haba^{-1}b^{-1}h^{-1} = hah^{-1}\, hbh^{-1}\, ha^{-1}h^{-1}\, hb^{-1}h^{-1} = hah^{-1}\, hbh^{-1} \left(hah^{-1}\right)^{-1} \left(hbh^{-1}\right)^{-1}$$
which is another one of those commutators. Thus for c a commutator and h ∈ G,
$$hch^{-1} = c_1$$
another commutator. If you have a product of commutators $c_1 c_2 \cdots c_m$, then
$$h c_1 c_2 \cdots c_m h^{-1} = \prod_{i=1}^{m} h c_i h^{-1} = \prod_{i=1}^{m} d_i \in G'$$
where the $d_i$ are each commutators. Hence G′ is a normal subgroup.
Consider now the quotient group. Is [g][h] = [h][g]? In other words, is [gh] = [hg]? In other words, is $gh(hg)^{-1} = ghg^{-1}h^{-1} \in G'$? Of course. This is a commutator and G′ consists of products of these things. Thus the quotient group is Abelian.
Now let H be a normal subgroup of G such that G/H is Abelian. Then if g, h ∈ G,
$$[gh] = [hg], \quad gh(hg)^{-1} = ghg^{-1}h^{-1} \in H$$
Thus every commutator is in H and so H ⊇ G′.
The last assertion is obvious because G/{ι} is isomorphic to G. Also, to say that G′ = {ι} is to say that
$$aba^{-1}b^{-1} = \iota$$
which implies that ab = ba.
Let G be a group and let G′ be its commutator subgroup. Then the commutator subgroup of G′ is G′′ and so forth. To save on notation, denote by $G^{(k)}$ the kth commutator subgroup. Thus you have the sequence
$$G^{(0)} \supseteq G^{(1)} \supseteq G^{(2)} \supseteq G^{(3)} \cdots$$
each $G^{(i)}$ being a normal subgroup of $G^{(i-1)}$. In fact each $G^{(i)}$ is also a normal subgroup of G, because conjugation by an element of G sends commutators of a normal subgroup to commutators of that subgroup, but this is not needed here. Then there is a useful criterion for a group to be solvable.
Theorem G.10.4 Let G be a group. It is solvable if and only if G(k) = {ι} for some k.
Proof: If G(k) = {ι} then G is clearly solvable because of Theorem G.10.3. The sequence
of commutator subgroups provides the necessary sequence of subgroups.
Next suppose that you have
G = H0 ⊇ H1 ⊇ · · · ⊇ Hm = {ι}
where each is normal in the preceding and the quotient groups are Abelian. Then from
Theorem G.10.3, $G^{(1)} \subseteq H_1$. Thus $H_1' \supseteq G^{(2)}$. But also, from Theorem G.10.3, since $H_1/H_2$ is Abelian,
$$H_2 \supseteq H_1' \supseteq G^{(2)}.$$
Continuing this way $G^{(k)} = \{\iota\}$ for some k ≤ m.
Theorem G.10.5 If G is a solvable group and if H is a homomorphic image of G, then H is also solvable.
Proof: By the above theorem, it suffices to show that $H^{(k)} = \{\iota\}$ for some k. Let f be the homomorphism. Then $H' = f(G')$. To see this, consider a commutator of H, $f(a) f(b) f(a)^{-1} f(b)^{-1} = f\left(aba^{-1}b^{-1}\right)$. It follows that $H^{(1)} = f\left(G^{(1)}\right)$. Now continue this way, letting $G^{(1)}$ play the role of G and $H^{(1)}$ the role of H. Thus, since G is solvable, some $G^{(k)} = \{\iota\}$ and so $H^{(k)} = \{\iota\}$ also.
Now as an important example of a group which is not solvable, here is a theorem.
Theorem G.10.6 For n ≥ 5, Sn is not solvable.
Proof: It is clear that $A_n$ is a normal subgroup of $S_n$ because if σ is a permutation, then it has the same sign as $\sigma^{-1}$. Thus $\sigma\alpha\sigma^{-1} \in A_n$ if $\alpha \in A_n$. If H is a normal subgroup of $S_n$ for which $S_n/H$ is Abelian, then H contains the commutator subgroup $S_n'$. However, $\alpha\sigma\alpha^{-1}\sigma^{-1} \in A_n$ obviously so $A_n \supseteq S_n'$. Now $S_n'$ is a normal subgroup of $A_n$ which is not the identity because $S_n$ is not Abelian. By Proposition G.9.7, this forces $S_n' = A_n$. So what is $S_n''$? If it is $A_n$, then $S_n^{(k)} \neq \{\iota\}$ for any k and it follows that $S_n$ is not solvable. If $S_n'' = \{\iota\}$, the only other possibility, then $A_n/\{\iota\}$ is Abelian and so $A_n$ is Abelian, but this is obviously false because the cycles (1, 2, 3), (2, 1, 4) are both in $A_n$. However, (1, 2, 3)(2, 1, 4) is
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 1 & 3 \end{pmatrix}$$
while (2, 1, 4)(1, 2, 3) is
$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 4 & 2 \end{pmatrix}$$
Note that the above shows that An is not Abelian for n = 4 also.
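For small n the sequence of commutator subgroups of $S_n$ can be computed by brute force, which gives a concrete check of Theorem G.10.4 and Theorem G.10.6. The sketch below stores a permutation of {0, ..., n − 1} as a tuple, forms all commutators, closes them under multiplication, and repeats until the subgroup is the identity or stops shrinking. It is only an illustration with hypothetical function names, and it is practical only for very small n.

```python
from itertools import permutations


def compose(s, t):
    """The product s t on tuples: apply t first, then s."""
    return tuple(s[t[k]] for k in range(len(t)))


def inverse(s):
    inv = [0] * len(s)
    for k, image in enumerate(s):
        inv[image] = k
    return tuple(inv)


def generated_subgroup(generators, identity):
    """All finite products of the generators (a subgroup, since the group is finite)."""
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            product = compose(g, h)
            if product not in group:
                group.add(product)
                frontier.append(product)
    return group


def commutator_subgroup(group, identity):
    commutators = {
        compose(compose(a, b), compose(inverse(a), inverse(b)))
        for a in group for b in group
    }
    return generated_subgroup(commutators, identity)


def derived_series_orders(n):
    """Orders of S_n, S_n', S_n'', ... until the identity or a repeat is reached."""
    identity = tuple(range(n))
    group = set(permutations(range(n)))
    orders = [len(group)]
    while len(group) > 1:
        derived = commutator_subgroup(group, identity)
        if len(derived) == len(group):
            break  # the series has stabilised above the identity
        orders.append(len(derived))
        group = derived
    return orders


for n in (3, 4, 5):
    print(n, derived_series_orders(n))
```

The series reaches the identity for n = 3 and n = 4, while for n = 5 it stops shrinking at a subgroup of order 60, which is the order of $A_5$.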
G.11 Solvability By Radicals
First of all, there exists a field which has all the nth roots of 1. You could simply define it
to be the smallest sub field of C such that it contains these roots. You could also enlarge
it by including some other numbers. For example, you could include Q. Observe that if $\xi \equiv e^{i2\pi/n}$, then $\xi^n = 1$ but $\xi^k \neq 1$ if 0 < k < n, and that if 0 < k < l < n, $\xi^k \neq \xi^l$. Such a field has characteristic 0 because for m a positive integer, m · 1 ≠ 0. The following is from Herstein [14]. This is the kind of field considered here.
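The observation about ξ is easy to check numerically. Here is a sketch using floating point complex numbers, so equality is tested only up to a small tolerance, which is an assumption of the example. It also checks, for one real a > 0, that the numbers $\xi^k u$ with u the real nth root of a all satisfy $x^n = a$. The names are illustrative only.

```python
import cmath


def powers_of_primitive_root(n):
    """Return [xi, xi^2, ..., xi^n] for xi = exp(2*pi*i/n)."""
    xi = cmath.exp(2j * cmath.pi / n)
    return [xi ** k for k in range(1, n + 1)]


def check_roots_of_unity(n, tol=1e-9):
    powers = powers_of_primitive_root(n)
    returns_to_one = abs(powers[-1] - 1) < tol          # xi^n = 1
    no_earlier_one = all(abs(w - 1) > tol for w in powers[:-1])
    all_distinct = all(
        abs(powers[i] - powers[j]) > tol
        for i in range(n) for j in range(i + 1, n)
    )
    return returns_to_one and no_earlier_one and all_distinct


def roots_of_x_n_minus_a(n, a):
    """The numbers xi^k * u, where u is the positive real n-th root of a > 0."""
    u = a ** (1.0 / n)
    return [w * u for w in powers_of_primitive_root(n)]


print(all(check_roots_of_unity(n) for n in range(2, 13)))
print(all(abs(r ** 5 - 2) < 1e-9 for r in roots_of_x_n_minus_a(5, 2)))
```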
Lemma G.11.1 Suppose a field F has all the nth roots of 1 for a particular n and suppose
there exists ξ such that the nth roots of 1 are of the form ξ k for k = 1, · · · , n, the ξ k being
distinct. Let a ∈ F be nonzero. Let K denote the splitting field of xn − a over F, thus K is
a normal extension of F. Then K = F [u] where u is any root of xn − a. The Galois group
G (K, F) is Abelian.
Proof: Let u be a root of $x^n - a$ and let K equal F[u]. Then let ξ be the nth root of unity mentioned. Then
$$\left(\xi^k u\right)^n = \left(\xi^n\right)^k u^n = a$$
and so each $\xi^k u$ is a root of $x^n - a$ and these are distinct. It follows that $\{u, \xi u, \cdots, \xi^{n-1} u\}$ are the roots of $x^n - a$ and all are in F[u]. Thus F[u] = K. Let σ ∈ G(K, F) and observe that since σ fixes F,
$$0 = \sigma\left(\left(\xi^k u\right)^n - a\right) = \left(\sigma\left(\xi^k u\right)\right)^n - a$$
It follows that σ maps roots of xn − a to roots of xn − a. Therefore, if σ, α are two elements
of G (K, F) , there exist i, j each no larger than n − 1 such that
σ (u) = ξ i u, α (u) = ξ j u
A typical thing in F[u] is p(u) where p(x) ∈ F[x]. Then
$$\sigma\alpha(p(u)) = p\left(\xi^j \xi^i u\right) = p\left(\xi^{i+j} u\right)$$
$$\alpha\sigma(p(u)) = p\left(\xi^i \xi^j u\right) = p\left(\xi^{i+j} u\right)$$
Therefore, G(K, F) is Abelian.
Definition G.11.2 For F a field, a polynomial p(x) ∈ F[x] is solvable by radicals over $F \equiv F_0$ if there is a sequence of fields $F_1 = F[a_1], F_2 = F_1[a_2], \cdots, F_k = F_{k-1}[a_k]$ such that for each i ≥ 1, $a_i^{k_i} \in F_{i-1}$ and $F_k$ contains a splitting field K for p(x) over F.
Lemma G.11.3 In the above definition, you can assume that Fk is a normal extension of
F.
Proof: First note that $F_k = F[a_1, a_2, \cdots, a_k]$. Let G be the normal extension of $F_k$. By Lemma G.7.3, G is the smallest field which contains the conjugate fields
$$\eta_j\left(F[a_1, a_2, \cdots, a_k]\right) = F\left[\eta_j a_1, \eta_j a_2, \cdots, \eta_j a_k\right]$$
for $\{\eta_1, \eta_2, \cdots, \eta_m\} = G(F_k, F)$. Also, $\left(\eta_j a_i\right)^{k_i} = \eta_j\left(a_i^{k_i}\right) \in \eta_j F_{i-1}$, $\eta_j F = F$. Then
$$G = F[\eta_1(a_1), \eta_1(a_2), \cdots, \eta_1(a_k), \eta_2(a_1), \eta_2(a_2), \cdots, \eta_2(a_k) \cdots]$$
and this is a splitting field so is a normal extension. Thus G could be the new $F_k$ with respect to a longer sequence but would now be a splitting field.
At this point, it is a good idea to recall the big fundamental theorem mentioned above
which gives the correspondence between normal subgroups and normal field extensions since
it is about to be used again.
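$$\begin{array}{ccccccccc} F \equiv F_0 & \subseteq F_1 & \subseteq F_2 & \cdots & \subseteq F_{k-1} & \subseteq F_k \equiv K \\ G(F, F) = \{\iota\} & \subseteq G(F_1, F) & \subseteq G(F_2, F) & \cdots & \subseteq G(F_{k-1}, F) & \subseteq G(F_k, F) \end{array} \tag{7.26}$$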
Theorem G.11.4 Let K be a splitting field of any polynomial $p(x) \in F[x]$ where F is either of characteristic 0 or of characteristic p with $F^p = F$. Let $\{F_i\}_{i=0}^{k}$ be the increasing sequence of intermediate fields between F and K. Then each of these is a normal extension of F and the Galois group $G(F_{j-1}, F)$ is a normal subgroup of $G(F_j, F)$. In addition to this,
$$G(F_j, F) \simeq G(K, F) / G(K, F_j)$$
where the symbol ≃ indicates the two spaces are isomorphic.
Theorem G.11.5 Let f (x) be a polynomial in F [x] where F is a field of characteristic 0
which contains all nth roots of unity for each n ∈ N. Let K be a splitting field of f (x) . Then
if f (x) is solvable by radicals over F, then the Galois group G (K, F) is a solvable group.
Proof: Using the definition given above for f (x) to be solvable by radicals, there is a
sequence of fields
F0 = F ⊆ F1 ⊆ · · · ⊆ Fk , K ⊆ Fk ,
where $F_i = F_{i-1}[a_i]$, $a_i^{k_i} \in F_{i-1}$, and each field extension is a normal extension of the preceding one. You can assume that Fk is the splitting field of a polynomial having coefficients
in Fj−1 . This follows from the Lemma G.11.3 above. Then starting the hypotheses of the
theorem at Fj−1 rather than at F, it follows from Theorem G.11.4 that
G (Fj , Fj−1 ) ≃ G (Fk , Fj−1 ) /G (Fk , Fj )
By Lemma G.11.1, the Galois group G (Fj , Fj−1 ) is Abelian and so this requires that
G (Fk , F) is a solvable group.
Of course K is a normal field extension of F because it is a splitting field. By Theorem G.11.4, $G(F_k, K)$ is a normal subgroup of $G(F_k, F)$. Also G(K, F) is isomorphic to $G(F_k, F)/G(F_k, K)$ and so G(K, F) is a homomorphic image of $G(F_k, F)$ which is solvable. Here is why this last assertion is so. Define $\theta : G(F_k, F)/G(F_k, K) \to G(K, F)$ by $\theta[\sigma] \equiv \sigma|_K$. Then this is clearly a homomorphism if it is well defined. If [σ] = [α] this means $\sigma\alpha^{-1} \in G(F_k, K)$ and so $\sigma\alpha^{-1}$ fixes everything in K so that θ is indeed well defined. Therefore, by Theorem G.10.5, G(K, F) must also be solvable.
Now this result implies that you can’t solve the general polynomial equation of degree 5
or more by radicals. Let {a1 , a2 , · · · , an } ⊆ G where G is some field which contains a field
F0 . Let
F ≡ F0 (a1 , a2 , · · · , an )
the field of all rational functions in the numbers a1 , a2 , · · · , an . I am using this notation
because I don’t want to assume the ai are algebraic over F. Now consider the equation
$$p(t) = t^n - a_1 t^{n-1} + a_2 t^{n-2} + \cdots \pm a_n$$
and suppose that p(t) has distinct roots, none of them in F. Let K be a splitting field for p(t) over F so that
$$p(t) = \prod_{i=1}^{n} (t - r_i)$$
Then it follows that
ai = si (r1 , · · · , rn )
where the $s_i$ are the elementary symmetric functions defined in Definition G.1.2. For σ ∈ G(K, F) you can define $\bar{\sigma} \in S_n$ by the rule
$$\bar{\sigma}(k) \equiv j \text{ where } \sigma(r_k) = r_j.$$
Recall that the automorphisms of G(K, F) take roots of p(t) to roots of p(t). This mapping $\sigma \to \bar{\sigma}$ is onto, a homomorphism, and one to one because the symmetric functions $s_i$
are unchanged when the roots are permuted. Thus a rational function in s1 , s2 , · · · , sn is
unaffected when the roots rk are permuted. It follows that G (K, F) cannot be solvable if
n ≥ 5 because Sn is not solvable.
For example, consider $3x^5 - 25x^3 + 45x + 1$ or equivalently $x^5 - \frac{25}{3}x^3 + 15x + \frac{1}{3}$. It clearly
has no rational roots and a graph will show it has 5 real roots. Let F be the smallest field
contained in C which contains the coefficients of the polynomial and all roots of unity. Then
probably none of these roots are in F and they are all distinct. In fact, it appears that the
real numbers which are in F are rational. Therefore, from the above, none of the roots are
solvable by radicals involving numbers from F. Thus none are solvable by radicals using
numbers from the smallest field containing the coefficients either.
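Two of the claims about this example can be confirmed with exact arithmetic. The sketch below lists the candidates allowed by the rational root theorem and evaluates the polynomial at each, then counts sign changes of the polynomial at the integers from −3 to 3. Five sign changes for a polynomial of degree 5 means five distinct real roots. The helper names are hypothetical and nothing here is taken from the text beyond the polynomial itself.

```python
from fractions import Fraction

# 3x^5 - 25x^3 + 45x + 1, coefficients listed from the highest degree down
COEFFS = [3, 0, -25, 0, 45, 1]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def divisors(n):
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Roots among the candidates +-p/q, p | constant term, q | leading coefficient."""
    candidates = {
        Fraction(sign * p, q)
        for p in divisors(coeffs[-1])
        for q in divisors(coeffs[0])
        for sign in (1, -1)
    }
    return sorted(c for c in candidates if evaluate(coeffs, c) == 0)


def sign_changes(coeffs, points):
    """Number of consecutive pairs of points where the polynomial changes sign."""
    values = [evaluate(coeffs, x) for x in points]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)


print(rational_roots(COEFFS))
print(sign_changes(COEFFS, range(-3, 4)))
```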
Bibliography
[1] Apostol T., Calculus Volume II Second edition, Wiley 1969.
[2] Artin M., Algebra, Pearson 2011.
[3] Baker, Roger, Linear Algebra, Rinton Press 2001.
[4] Baker, A. Transcendental Number Theory, Cambridge University Press 1975.
[5] Chahal J.S., Historical Perspective of Mathematics 2000 B.C. - 2000 A.D. Kendrick
Press, Inc. (2007)
[6] Coddington and Levinson, Theory of Ordinary Differential Equations McGraw Hill
1955.
[7] Davis H. and Snider A., Vector Analysis Wm. C. Brown 1995.
[8] Edwards C.H., Advanced Calculus of several Variables, Dover 1994.
[9] Evans L.C. Partial Differential Equations, Berkeley Mathematics Lecture Notes. 1993.
[10] Friedberg S. Insel A. and Spence L., Linear Algebra, Prentice Hall, 2003.
[11] Golub, G. and Van Loan, C.,Matrix Computations, Johns Hopkins University Press,
1996.
[12] Gurtin M., An introduction to continuum mechanics, Academic press 1981.
[13] Hardy G., A Course Of Pure Mathematics, Tenth edition, Cambridge University Press
1992.
[14] Herstein I. N., Topics In Algebra, Xerox, 1964.
[15] Hoffman K. and Kunze R., Linear Algebra, Prentice Hall, 1971.
[16] Householder A. The theory of matrices in numerical analysis , Dover, 1975.
[17] Horn R. and Johnson C., Matrix Analysis, Cambridge University Press, 1985.
[18] Jacobson N. Basic Algebra Freeman 1974.
[19] Karlin S. and Taylor H., A First Course in Stochastic Processes, Academic Press,
1975.
[20] Marcus M., and Minc H., A Survey Of Matrix Theory and Matrix Inequalities, Allyn and Bacon, Inc. Boston, 1964
[21] Noble B. and Daniel J., Applied Linear Algebra, Prentice Hall, 1977.
[22] E. J. Putzer, American Mathematical Monthly, Vol. 73 (1966), pp. 2-7.
[23] Rudin W., Principles of Mathematical Analysis, McGraw Hill, 1976.
[24] Rudin W., Functional Analysis, McGraw Hill, 1991.
[25] Salas S. and Hille E., Calculus One and Several Variables, Wiley 1990.
[26] Strang Gilbert, Linear Algebra and its Applications, Harcourt Brace Jovanovich 1980.
[27] Wilkinson, J.H., The Algebraic Eigenvalue Problem, Clarendon Press Oxford 1965.
Appendix H
Selected Exercises
Exercises 1.6
1 $(5 + i9)^{-1} = \frac{5}{106} - \frac{9}{106} i$
3 $-(1 - i)\sqrt{2}$, $(1 + i)\sqrt{2}$.
5 If $z \neq 0$, let $\omega = \frac{z}{|z|}$
7 $\sin(5x) = 5\cos^4 x \sin x - 10\cos^2 x \sin^3 x + \sin^5 x$
$\cos(5x) = \cos^5 x - 10\cos^3 x \sin^2 x + 5\cos x \sin^4 x$
9 $(x + 2)\left(x - \left(i\sqrt{3} + 1\right)\right)\left(x - \left(1 - i\sqrt{3}\right)\right)$
11 $\left(x - (1 - i)\sqrt{2}\right)\left(x - \left(-(1 + i)\sqrt{2}\right)\right) \cdot \left(x - \left(-(1 - i)\sqrt{2}\right)\right)\left(x - (1 + i)\sqrt{2}\right)$
15 There is no single $\sqrt{-1}$.
Exercises 1.11
1 x = 2 − 4t, y = −8t, z = t.
3 These are invalid row operations.
5 x = 2, y = 0, z = 1.
7 x = 2 − 2t, y = −t, z = t.
9 x = t, y = s + 2, z = −s, w = s
Exercises 1.14
4 This makes no sense at all. You can’t add different size vectors.
Exercises 1.17
3 $\left|\sum_{k=1}^{n} \beta_k a_k b_k\right| \leq \left(\sum_{k=1}^{n} \beta_k |a_k|^2\right)^{1/2} \cdot \left(\sum_{k=1}^{n} \beta_k |b_k|^2\right)^{1/2}$
4 The inequality still holds. See the proof of the inequality.
Exercises 2.2
2 $A = \frac{A + A^T}{2} + \frac{A - A^T}{2}$
3 You know that $A_{ij} = -A_{ji}$. Let j = i to conclude that $A_{ii} = -A_{ii}$ and so $A_{ii} = 0$.
5 0′ = 0 + 0′ = 0.
6 0A = (0 + 0) A = 0A + 0A. Now add the additive inverse of 0A to both sides.
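7 0 = 0A = (1 + (−1)) A = A + (−1) A. Hence, (−1) A is the unique additive inverse of A. Thus −A = (−1) A. The additive inverse is unique because if $A_1$ is an additive inverse, then $A_1 = A_1 + (A + (-A)) = (A_1 + A) + (-A) = -A$.
10 $(Ax, y) = \sum_i (Ax)_i y_i = \sum_i \sum_k A_{ik} x_k y_i$
$\left(x, A^T y\right) = \sum_k x_k \sum_i \left(A^T\right)_{ki} y_i = \sum_k \sum_i x_k A_{ik} y_i$, the same as above. Hence the two are equal.
11 $\left((AB)^T x, y\right) \equiv (x, (AB) y) = \left(A^T x, By\right) = \left(B^T A^T x, y\right)$. Since this holds for every x, y, you have for all y, $\left((AB)^T x - B^T A^T x, y\right) = 0$. Let $y = (AB)^T x - B^T A^T x$. Then since x is arbitrary, the result follows.
13 Give an example of matrices, A, B, C such that B ≠ C, A ≠ 0, and yet AB = AC.
$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
15 It appears that there are 8 ways to do this.
17 $ABB^{-1}A^{-1} = AIA^{-1} = I$
$B^{-1}A^{-1}AB = B^{-1}IB = I$
Then by the definition of the inverse and its uniqueness, it follows that $(AB)^{-1}$ exists and $(AB)^{-1} = B^{-1}A^{-1}$.
19 Multiply both sides on the left by $A^{-1}$.
21 $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$
23 Almost anything works.
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix} = \begin{pmatrix} 5 & 2 \\ 11 & 6 \end{pmatrix}$$
$$\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 7 & 10 \\ 2 & 4 \end{pmatrix}$$
25 $\begin{pmatrix} -z & -w \\ z & w \end{pmatrix}$, z, w arbitrary.
27 $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 1 & 0 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} -2 & 4 & -5 \\ 0 & 1 & -2 \\ 1 & -2 & 3 \end{pmatrix}$
29 Row echelon form: $\begin{pmatrix} 1 & 0 & \frac{5}{3} \\ 0 & 1 & \frac{2}{3} \\ 0 & 0 & 0 \end{pmatrix}$. A has no inverse.
Exercises 2.9
1 Show the map $T : R^n \to R^m$ defined by T(x) = Ax where A is an m × n matrix and x is an n × 1 column vector is a linear transformation.
This follows from matrix multiplication rules.
3 Find the matrix for the linear transformation which rotates every vector in $R^2$ through an angle of π/4.
$$\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \end{pmatrix}$$
5 Find the matrix for the linear transformation which rotates every vector in $R^2$ through an angle of 2π/3.
$$\begin{pmatrix} \cos(2\pi/3) & -\sin(2\pi/3) \\ \sin(2\pi/3) & \cos(2\pi/3) \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} & -\frac{1}{2}\sqrt{3} \\ \frac{1}{2}\sqrt{3} & -\frac{1}{2} \end{pmatrix}$$
7 Find the matrix for the linear transformation which rotates every vector in $R^2$ through an angle of 2π/3 and then reflects across the x axis.
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \cos(2\pi/3) & -\sin(2\pi/3) \\ \sin(2\pi/3) & \cos(2\pi/3) \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} & -\frac{1}{2}\sqrt{3} \\ -\frac{1}{2}\sqrt{3} & \frac{1}{2} \end{pmatrix}$$
9 Find the matrix for the linear transformation which rotates every vector in $R^2$ through an angle of π/4 and then reflects across the x axis.
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \\ -\frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \end{pmatrix}$$
11 Find the matrix for the linear transformation which reflects every vector in $R^2$ across the x axis and then rotates every vector through an angle of π/4.
$$\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \end{pmatrix}$$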
13 Find the matrix for the linear transformation which
reflects every vector in R2 across the x axis and then
rotates every vector through an angle of π/6.
(
)(
)
cos (π/6) − sin (π/6)
1 0
sin (π/6) cos (π/6)
0 −1
( 1√
)
1
2 3
2√
=
1
1
−2 3
2
15 Find the matrix for the linear transformation which
rotates every vector in R2 through an angle of 5π/12.
Hint: Note that 5π/12 = 2π/3 − π/4.
(
)
cos (2π/3) − sin (2π/3)
sin (2π/3) cos (2π/3)
(
)
cos (−π/4) − sin (−π/4)
·
sin (−π/4) cos (−π/4)
√
√ √
√ )
( 1√ √
2√3 − 41 √2 − 14√ 2√ 3 − 14√ 2
4
√
=
1
1
1
1
4 2 3+ 4 2
4 2 3− 4 2
17 Find the matrix for $\mathrm{proj}_u(v)$ where $u = (1, 5, 3)^T$.
\[
\frac{1}{35}\begin{pmatrix} 1 & 5 & 3 \\ 5 & 25 & 15 \\ 3 & 15 & 9 \end{pmatrix}
\]
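As a sanity check on item 17, the projection matrix is $uu^T/(u \cdot u)$. The sketch below builds it with exact rational arithmetic; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def projection_matrix(u):
    """Matrix of v -> proj_u(v), namely u u^T / (u . u), with exact fractions."""
    norm_sq = sum(x * x for x in u)
    return [[Fraction(a * b, norm_sq) for b in u] for a in u]

def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = projection_matrix([1, 5, 3])

# Multiply through by 35 to compare with the integer matrix in the answer.
scaled = [[entry * 35 for entry in row] for row in P]

# A projection satisfies P^2 = P.
P_squared = mat_mul(P, P)
```

Here `scaled` reproduces the integer matrix displayed above, and `P_squared == P` illustrates that a projection is idempotent.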
25 Show that the function Tu defined by Tu (v) ≡ v −
proju (v) is also a linear transformation.
This is the sum of two linear transformations so it
is obviously linear.
33 Let a basis for W be {w1 , · · · , wr }. Then if there
exists v ∈ V \ W, you could add in v to the basis
and obtain a linearly independent set of vectors of
V which implies that the dimension of V is at least
r + 1 contrary to assumption.
41 Obviously not. Because of the Coriolis force experienced by the fired bullet which is not experienced
by the dropped bullet, it will not be as simple as
in the physics books. For example, if the bullet is
fired East, then y ′ sin ϕ > 0 and will contribute to
a force acting on the bullet which has been fired
which will cause it to hit the ground faster than the
one dropped. Of course at the North pole or the
South pole, things should be closer to what is expected in the physics books because there sin ϕ = 0.
Also, if you fire it North or South, there seems to
be no extra force because y ′ = 0.
19 Give an example of a 2×2 matrix A which has all its entries nonzero and satisfies $A^2 = A$. Such a matrix is called idempotent.
You know it can't be invertible. So try this.
\[
\begin{pmatrix} a & a \\ b & b \end{pmatrix}^2
= \begin{pmatrix} a^2 + ba & a^2 + ba \\ b^2 + ab & b^2 + ab \end{pmatrix}
\]
Let $a^2 + ab = a$, $b^2 + ab = b$. A solution which yields a nonzero matrix is
\[
\begin{pmatrix} 2 & 2 \\ -1 & -1 \end{pmatrix}
\]
21 $x_2 = -\frac{1}{2}t_1 - \frac{1}{2}t_2 - t_3$, $x_1 = -2t_1 - t_2 + t_3$ where the $t_i$ are arbitrary.
23
\[
\begin{pmatrix} -2t_1 - t_2 + t_3 \\ -\frac{1}{2}t_1 - \frac{1}{2}t_2 - t_3 \\ t_1 \\ t_2 \\ t_3 \end{pmatrix}
+ \begin{pmatrix} 4 \\ 7/2 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad t_i \in \mathbb{F}
\]
That second vector is a particular solution.
Exercises
3.2
2 $1 = \det(AA^{-1}) = \det(A)\det(A^{-1})$.
3 $\det(A) = \det(A^T) = \det(-A) = \det(-I)\det(A) = (-1)^n \det(A) = -\det(A)$.
6 Each time you take out an a from a row, you multiply by a the determinant of the matrix which remains. Since there are n rows, you do this n times, hence you get $a^n$.
9 $\det A = \det(P^{-1}BP) = \det(P^{-1})\det(B)\det(P) = \det(B)\det(P^{-1}P) = \det(B)$.
11 If that determinant equals 0 then the matrix $\lambda I - A$ has no inverse. It is not one to one and so there exists $x \neq 0$ such that $(\lambda I - A)x = 0$. Also recall the process for finding the inverse.
13
\[
\begin{pmatrix}
e^{-t} & 0 & 0 \\
0 & e^{-t}(\cos t + \sin t) & -(\sin t)e^{-t} \\
0 & -e^{-t}(\cos t - \sin t) & (\cos t)e^{-t}
\end{pmatrix}
\]
15 You have to have $\det(Q)\det(Q^T) = \det(Q)^2 = 1$ and so $\det(Q) = \pm 1$.
APPENDIX H. SELECTED EXERCISES
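Exercises
3.6
2 $E = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$
5
\[
\det\begin{pmatrix} 1 & 2 & 3 & 2 \\ -6 & 3 & 2 & 3 \\ 5 & 2 & 2 & 3 \\ 3 & 4 & 6 & 4 \end{pmatrix} = 5
\]
6
\[
\begin{pmatrix}
\frac{1}{2}e^{-t} & 0 & \frac{1}{2}e^{-t} \\
\frac{1}{2}\cos t + \frac{1}{2}\sin t & -\sin t & \frac{1}{2}\sin t - \frac{1}{2}\cos t \\
\frac{1}{2}\sin t - \frac{1}{2}\cos t & \cos t & -\frac{1}{2}\cos t - \frac{1}{2}\sin t
\end{pmatrix}
\]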
8
\[
\begin{aligned}
\det(\lambda I - A) &= \det(\lambda I - S^{-1}BS) \\
&= \det(\lambda S^{-1}S - S^{-1}BS) \\
&= \det(S^{-1}(\lambda I - B)S) \\
&= \det(S^{-1})\det(\lambda I - B)\det(S) \\
&= \det(S^{-1}S)\det(\lambda I - B) = \det(\lambda I - B)
\end{aligned}
\]
9 From the Cayley Hamilton theorem, $A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = 0$. Also the characteristic polynomial is $\det(tI - A)$ and the constant term is $(-1)^n \det(A)$. Thus $a_0 \neq 0$ if and only if $\det(A) \neq 0$ if and only if A has an inverse. Thus if $A^{-1}$ exists, it follows that
\[
a_0 I = -\left(A^n + a_{n-1}A^{n-1} + \cdots + a_1 A\right)
= A\left(-A^{n-1} - a_{n-1}A^{n-2} - \cdots - a_1 I\right)
\]
and also
\[
a_0 I = \left(-A^{n-1} - a_{n-1}A^{n-2} - \cdots - a_1 I\right)A.
\]
Therefore, the inverse is
\[
\frac{1}{a_0}\left(-A^{n-1} - a_{n-1}A^{n-2} - \cdots - a_1 I\right).
\]
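The inverse formula of item 9 can be tried on a small matrix. This is a sketch under stated assumptions: the coefficients $a_k$ of $\det(tI - A)$ are obtained here by the Faddeev-LeVerrier recursion, which is swapped in for convenience and is not the method of the text, and all function names are hypothetical. Exact fractions are used so that the check is not blurred by rounding.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly_coeffs(A):
    """Return [a_0, ..., a_{n-1}, 1], the coefficients of det(tI - A).

    Uses the Faddeev-LeVerrier recursion (an assumption of this sketch).
    """
    n = len(A)
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    I = identity(n)
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + coeffs[n - k + 1] * I[i][j] for j in range(n)]
             for i in range(n)]
        AMk = mat_mul(A, M)
        coeffs[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return coeffs

def inverse_via_cayley_hamilton(A):
    """A^{-1} = (1/a_0) (-A^{n-1} - a_{n-1} A^{n-2} - ... - a_1 I)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    a = char_poly_coeffs(A)
    if a[0] == 0:
        raise ValueError("a_0 = 0, so the matrix has no inverse")
    I = identity(n)
    # Horner evaluation of A^{n-1} + a_{n-1} A^{n-2} + ... + a_1 I.
    P = identity(n)
    for k in range(n - 1, 0, -1):
        AP = mat_mul(A, P)
        P = [[AP[i][j] + a[k] * I[i][j] for j in range(n)] for i in range(n)]
    return [[-P[i][j] / a[0] for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [2, 1, 4], [1, 0, 2]]
A_inv = inverse_via_cayley_hamilton(A)
check = mat_mul([[Fraction(x) for x in row] for row in A], A_inv)
```

The test matrix is an arbitrary invertible 3 × 3 example; `check` should come out as the identity matrix.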
11 Say the characteristic polynomial is $q(t)$ which is of degree 3. Then if $n \geq 3$, $t^n = q(t)l(t) + r(t)$ where the degree of $r(t)$ is either less than 3 or it equals zero. Thus $A^n = q(A)l(A) + r(A) = r(A)$ and so all the terms $A^n$ for $n \geq 3$ can be replaced with some $r(A)$ where the degree of $r(t)$ is no more than 2. Thus, assuming there are no convergence issues, the infinite sum must be of the form $\sum_{k=0}^{2} b_k A^k$.
Exercises
4.6
1 A typical thing in $\{Ax : x \in P(u_1, \cdots, u_n)\}$ is $\sum_{k=1}^{n} t_k A u_k$ with $t_k \in [0, 1]$ and so it is just $P(Au_1, \cdots, Au_n)$.
[Figure: P(e1, e2) and E(P(e1, e2))]
5 Here they are.
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\]
So what is the dimension of the span of these? One way to systematically accomplish this is to unravel them and then use the row reduced echelon form. Unraveling these yields the column vectors
\[
\begin{pmatrix} 1\\0\\0\\0\\1\\0\\0\\0\\1 \end{pmatrix},
\begin{pmatrix} 0\\0\\1\\1\\0\\0\\0\\1\\0 \end{pmatrix},
\begin{pmatrix} 0\\1\\0\\0\\0\\1\\1\\0\\0 \end{pmatrix},
\begin{pmatrix} 0\\1\\0\\1\\0\\0\\0\\0\\1 \end{pmatrix},
\begin{pmatrix} 1\\0\\0\\0\\0\\1\\0\\1\\0 \end{pmatrix},
\begin{pmatrix} 0\\0\\1\\0\\1\\0\\1\\0\\0 \end{pmatrix}
\]
Then arranging these as the columns of a matrix yields the following along with its row reduced echelon form.
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0
\end{pmatrix},
\text{ row echelon form: }
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]
The dimension is 5.
10 It is because you cannot have more than min (m, n) nonzero rows in the row reduced echelon form. Recall that the number of pivot columns is the same as the number of nonzero rows from the description of this row reduced echelon form.
11 It follows from the fact that $e_1, \cdots, e_m$ occur as columns in row reduced echelon form that the dimension of the column space of A is m and so, since this column space is $A(\mathbb{F}^n)$, it follows that it equals $\mathbb{F}^m$.
12 Since m > n the dimension of the column space of A is no more than n and so the columns of A cannot span $\mathbb{F}^m$.
15 If $\sum_i c_i z_i = 0$, apply A to both sides to obtain $\sum_i c_i w_i = 0$. By assumption, each $c_i = 0$.
19 There are more columns than rows and at most m can be pivot columns so it follows at least one column is a linear combination of the others, hence A is not one to one.
21
\[
\begin{aligned}
|b - Ay|^2 &= |b - Ax + Ax - Ay|^2 \\
&= |b - Ax|^2 + |Ax - Ay|^2 + 2\,(b - Ax, A(x - y)) \\
&= |b - Ax|^2 + |Ax - Ay|^2 + 2\,(A^T b - A^T Ax, (x - y)) \\
&= |b - Ax|^2 + |Ax - Ay|^2
\end{aligned}
\]
and so, Ax is closest to b out of all vectors Ay.
27 No.
\[
\begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 1 & 7 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]
29 Let A be an m × n matrix. Then ker (A) is a subspace of $\mathbb{F}^n$. Is it true that every subspace of $\mathbb{F}^n$ is the kernel or null space of some matrix? Prove or disprove.
Let M be a subspace of $\mathbb{F}^n$. If it equals {0}, consider the matrix I. Otherwise, it has a basis $\{m_1, \cdots, m_k\}$. Consider the matrix
\[
\begin{pmatrix} m_1 & \cdots & m_k & 0 \end{pmatrix}
\]
where 0 is either not there in case k = n or has n − k columns.
30 This is easy to see when you consider that $P^{ij}$ is its own inverse and that $P^{ij}$ multiplied on the right switches the ith and jth columns. Thus you switch the columns and then you switch the rows. This has the effect of switching $A_{ii}$ and $A_{jj}$. For example,
\[
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} a & b & c & d \\ e & f & z & h \\ j & k & l & m \\ n & t & h & g \end{pmatrix}
\cdot
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} a & d & c & b \\ n & g & h & t \\ j & m & l & k \\ e & h & z & f \end{pmatrix}
\]
More formally, the iith entry of $P^{ij}AP^{ij}$ is
\[
\sum_{s,p} P^{ij}_{is} A_{sp} P^{ij}_{pi} = P^{ij}_{ij} A_{jj} P^{ij}_{ji} = A_{jj}
\]
31 If A has an inverse, then it is one to one. Hence the columns are independent. Therefore, they are each pivot columns. Therefore, the row reduced echelon form of A is I. This is what was needed for the procedure to work.
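For item 5 of Exercises 4.6 (the span of the 3 × 3 permutation matrices), the dimension count can be reproduced by unraveling each matrix into a vector of length 9 and row reducing. The sketch below is an illustration only: the function names are hypothetical, the vectors are stored as rows rather than columns (the rank is the same either way), and exact fractions are used.

```python
from fractions import Fraction
from itertools import permutations

def rank(rows):
    """Rank of a matrix (list of rows) by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def unraveled_permutation_matrix(p):
    """Permutation matrix with a 1 in position (i, p[i]), unraveled row by row."""
    n = len(p)
    return [1 if p[i] == j else 0 for i in range(n) for j in range(n)]

vectors = [unraveled_permutation_matrix(p) for p in permutations(range(3))]
span_dimension = rank(vectors)
```

There are six vectors but `span_dimension` is 5, since the three even permutation matrices and the three odd ones both add up to the matrix of all ones.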
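Exercises
5.8
1
\[
\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 3 \\ 1 & 2 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 0 \\ 0 & -3 & 3 \\ 0 & 0 & 3 \end{pmatrix}
\]
3
\[
\begin{pmatrix} 1 & 2 & 1 \\ 1 & 2 & 2 \\ 2 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\cdot
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -3 & -1 \\ 0 & 0 & 1 \end{pmatrix}
\]
5
\[
\begin{pmatrix} 1 & 2 & 1 \\ 1 & 2 & 2 \\ 2 & 4 & 1 \\ 3 & 2 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}
\cdot
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \\ 1 & 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & -4 & -2 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix}
\]
9
\[
\begin{pmatrix} 1 & 2 & 1 & 0 \\ 3 & 0 & 1 & 1 \\ 1 & 0 & 2 & 1 \end{pmatrix}
= \begin{pmatrix}
\frac{1}{11}\sqrt{11} & \frac{1}{11}\sqrt{10}\sqrt{11} & 0 \\
\frac{3}{11}\sqrt{11} & -\frac{3}{110}\sqrt{10}\sqrt{11} & -\frac{1}{10}\sqrt{2}\sqrt{5} \\
\frac{1}{11}\sqrt{11} & -\frac{1}{110}\sqrt{10}\sqrt{11} & \frac{3}{10}\sqrt{2}\sqrt{5}
\end{pmatrix}
\cdot
\begin{pmatrix}
\sqrt{11} & \frac{2}{11}\sqrt{11} & \frac{6}{11}\sqrt{11} & \frac{4}{11}\sqrt{11} \\
0 & \frac{2}{11}\sqrt{10}\sqrt{11} & \frac{1}{22}\sqrt{10}\sqrt{11} & -\frac{2}{55}\sqrt{10}\sqrt{11} \\
0 & 0 & \frac{1}{2}\sqrt{2}\sqrt{5} & \frac{1}{5}\sqrt{2}\sqrt{5}
\end{pmatrix}
\]
Exercises
6.6
1 The maximum is 7 and it occurs when x1 = 7, x2 = 0, x3 = 0, x4 = 3, x5 = 5, x6 = 0.
2 Maximize and minimize the following if possible. All variables are nonnegative.
(a) The minimum is −7 and it happens when x1 = 0, x2 = 7/2, x3 = 0.
(b) The maximum is 7 and it occurs when x1 = 7, x2 = 0, x3 = 0.
(c) The maximum is 14 and it happens when x1 = 7, x2 = x3 = 0.
(d) The minimum is 0 when x1 = x2 = 0, x3 = 1.
4 Find a solution to the following inequalities for x, y ≥ 0 if it is possible to do so. If it is not possible, prove it is not possible.
(a) There is no solution to these inequalities with x1 , x2 ≥ 0.
(b) A solution is x1 = 8/5, x2 = x3 = 0.
(c) There will be no solution to these inequalities for which all the variables are nonnegative.
(d) There is a solution when x2 = 2, x3 = 0, x1 = 0.
(e) There is no solution to this system of inequalities because the minimum value of x7 is not 0.
Exercises
7.3
1 Because the vectors which result are not parallel to the vector you begin with.
3 $\lambda \to \lambda^{-1}$ and $\lambda \to \lambda^m$.
5 Let x be the eigenvector. Then $A^m x = \lambda^m x$, $A^m x = Ax = \lambda x$ and so
\[
\lambda^m = \lambda
\]
Hence if $\lambda \neq 0$, then
\[
\lambda^{m-1} = 1
\]
and so $|\lambda| = 1$.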
7 $\begin{pmatrix} -1 & -1 & 7 \\ -1 & 0 & 4 \\ -1 & -1 & 5 \end{pmatrix}$, eigenvectors: $\{(3, 1, 1)^T\} \leftrightarrow 1$, $\{(2, 1, 1)^T\} \leftrightarrow 2$. This is a defective matrix.
9 $\begin{pmatrix} -7 & -12 & 30 \\ -3 & -7 & 15 \\ -3 & -6 & 14 \end{pmatrix}$, eigenvectors: $\{(-2, 1, 0)^T, (5, 0, 1)^T\} \leftrightarrow -1$, $\{(2, 1, 1)^T\} \leftrightarrow 2$
This matrix is not defective because, even though λ = −1 is a repeated eigenvalue, it has a 2 dimensional eigenspace.
11 $\begin{pmatrix} 3 & -2 & -1 \\ 0 & 5 & 1 \\ 0 & 2 & 4 \end{pmatrix}$, eigenvectors: $\{(1, 0, 0)^T, (0, -\frac{1}{2}, 1)^T\} \leftrightarrow 3$, $\{(-1, 1, 1)^T\} \leftrightarrow 6$
This matrix is not defective.
13 $\begin{pmatrix} 5 & 2 & -5 \\ 12 & 3 & -10 \\ 12 & 4 & -11 \end{pmatrix}$, eigenvectors: $\{(-\frac{1}{3}, 1, 0)^T, (\frac{5}{6}, 0, 1)^T\} \leftrightarrow -1$
This matrix is defective. In this case, there is only one eigenvalue, −1 of multiplicity 3 but the dimension of the eigenspace is only 2.
15 $\begin{pmatrix} 1 & 26 & -17 \\ 4 & -4 & 4 \\ -9 & -18 & 9 \end{pmatrix}$, eigenvectors: $\{(-\frac{1}{3}, \frac{2}{3}, 1)^T\} \leftrightarrow 0$, $\{(-2, 1, 0)^T\} \leftrightarrow -12$, $\{(-1, 0, 1)^T\} \leftrightarrow 18$
17 $\begin{pmatrix} -2 & 1 & 2 \\ -11 & -2 & 9 \\ -8 & 0 & 7 \end{pmatrix}$, eigenvectors: $\{(\frac{3}{4}, \frac{1}{4}, 1)^T\} \leftrightarrow 1$
This is defective.
19 $\begin{pmatrix} 4 & -2 & -2 \\ 0 & 2 & -2 \\ 2 & 0 & 2 \end{pmatrix}$, eigenvectors: $\{(1, -1, 1)^T\} \leftrightarrow 4$, $\{(-i, -i, 1)^T\} \leftrightarrow 2 - 2i$, $\{(i, i, 1)^T\} \leftrightarrow 2 + 2i$
21 $\begin{pmatrix} 4 & -2 & -2 \\ 0 & 2 & -2 \\ 2 & 0 & 2 \end{pmatrix}$, eigenvectors: $\{(1, -1, 1)^T\} \leftrightarrow 4$, $\{(-i, -i, 1)^T\} \leftrightarrow 2 - 2i$, $\{(i, i, 1)^T\} \leftrightarrow 2 + 2i$
23 $\begin{pmatrix} 1 & 1 & -6 \\ 7 & -5 & -6 \\ -1 & 7 & 2 \end{pmatrix}$, eigenvectors: $\{(1, -1, 1)^T\} \leftrightarrow -6$, $\{(-i, -i, 1)^T\} \leftrightarrow 2 - 6i$, $\{(i, i, 1)^T\} \leftrightarrow 2 + 6i$
This is not defective.
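Answers of this kind are easy to confirm by checking $Av = \lambda v$ directly. The sketch below does this for item 23 using Python's built-in complex numbers; the names `mat_vec` and `is_eigenpair` are hypothetical helpers for this illustration, and the tolerance is an arbitrary choice.

```python
def mat_vec(A, v):
    """Matrix times vector, with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenpair(A, v, lam, tol=1e-12):
    """True when A v agrees with lam v entry by entry, within tol."""
    Av = mat_vec(A, v)
    return all(abs(av - lam * x) <= tol for av, x in zip(Av, v))

# Matrix and claimed eigenpairs from item 23.
A = [[1, 1, -6], [7, -5, -6], [-1, 7, 2]]
pairs = [
    ([1, -1, 1], -6),
    ([-1j, -1j, 1], 2 - 6j),
    ([1j, 1j, 1], 2 + 6j),
]
results = [is_eigenpair(A, v, lam) for v, lam in pairs]
```

Since the three eigenvalues are distinct, three confirmed eigenpairs give a basis of eigenvectors, consistent with the statement that the matrix is not defective.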
25 First consider the eigenvalue λ = 1. Then you have
ax2 = 0, bx3 = 0. If neither a nor b = 0 then
λ = 1 would be a defective eigenvalue and the matrix would be defective. If a = 0, then the dimension of the eigenspace is clearly 2 and so the matrix would be nondefective. If b = 0 but a ̸= 0,
then you would have a defective matrix because the
eigenspace would have dimension less than 2. If
c ̸= 0, then the matrix is defective. If c = 0 and
a = 0, then it is non defective. Basically, if a, c ̸= 0,
then the matrix is defective.
27 A (x + iy) = (a + ib) (x + iy) . Now just take complex conjugates of both sides.
29 Let A be skew symmetric. Then if x is an eigenvector for λ,
\[
\lambda x^T \bar{x} = x^T A^T \bar{x} = -x^T A \bar{x} = -x^T \bar{x}\,\bar{\lambda}
\]
and so $\lambda = -\bar{\lambda}$. Thus $a + ib = -(a - ib)$ and so a = 0.
31 This follows from the observation that if $Ax = \lambda x$, then, taking complex conjugates, $\bar{A}\bar{x} = \bar{\lambda}\bar{x}$.
   1     

−2
1
1
33  −1  , 1 ,  12  , 12  ,  1  , 13 
0
1
1


35 $(-1, 1, 1)^T (a\cos(t) + b\sin(t))$, $(0, -1, 1)^T \left(c\sin(\sqrt{2}\,t) + d\cos(\sqrt{2}\,t)\right)$, $(2, 1, 1)^T (e\cos(2t) + f\sin(2t))$ where a, b, c, d, e, f are scalars.
Exercises
7.10
1 To get it, you must be able to get the eigenvalues
and this is typically not possible.
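4
\[
\begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
\]
\[
A_1 = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}
\]
\[
\begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}
\]
\[
A_2 = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix}.
\]
Now it is back to where you started. Thus the algorithm merely bounces between the two matrices $\begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}$ and so it can't possibly converge.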
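The bouncing described in item 4 can be watched directly. This is a minimal sketch, assuming the QR factorization is computed by Gram Schmidt on the columns with positive diagonal entries in R (this choice happens to reproduce the factors shown above); the function names are hypothetical and the code handles only 2 × 2 matrices with independent columns.

```python
import math

def qr_gram_schmidt_2x2(A):
    """QR factorization of a 2x2 matrix with independent columns.

    Gram Schmidt on the columns; the diagonal of R is positive.
    """
    a1 = (A[0][0], A[1][0])
    a2 = (A[0][1], A[1][1])
    r11 = math.hypot(a1[0], a1[1])
    q1 = (a1[0] / r11, a1[1] / r11)
    r12 = q1[0] * a2[0] + q1[1] * a2[1]
    w = (a2[0] - r12 * q1[0], a2[1] - r12 * q1[1])
    r22 = math.hypot(w[0], w[1])
    q2 = (w[0] / r22, w[1] / r22)
    Q = [[q1[0], q2[0]], [q1[1], q2[1]]]
    R = [[r11, r12], [0.0, r22]]
    return Q, R

def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def qr_step(A):
    """One step of the QR algorithm: factor A = QR, then return RQ."""
    Q, R = qr_gram_schmidt_2x2(A)
    return matmul2(R, Q)

A0 = [[0.0, -1.0], [2.0, 0.0]]
A1 = qr_step(A0)
A2 = qr_step(A1)

def max_gap(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))
```

After one step the iterate is the second matrix of the answer, and after two steps it is back at the starting matrix, so the sequence has period two.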
0
2
15 B (1 + 2i, 6) , B (i, 3) , B (7, 11)
19 Gerschgorin's theorem shows that there are no zero eigenvalues and so the matrix is invertible.
21 $6x'^2 + 12y'^2 + 18z'^2$.
23 $(x')^2 + \frac{1}{3}\sqrt{3}\,x' - 2(y')^2 - \frac{1}{2}\sqrt{2}\,y' - 2(z')^2 - \frac{1}{6}\sqrt{6}\,z'$
25 (0, −1, 0) (4, −1, 0) saddle point. (2, −1, −12) local minimum.
27 (1, 1) , (−1, 1) , (1, −1) , (−1, −1) saddle points. $\left(-\frac{1}{6}\sqrt{5}\sqrt{6}, 0\right)$, $\left(\frac{1}{6}\sqrt{5}\sqrt{6}, 0\right)$ Local minimums.
29 Critical points: (0, 1, 0) , Saddle point.
31 ±1
Exercises
8.4
1 The first three vectors form a basis and the dimension is 3.
3 No. Not a subspace. Consider (0, 0, 1, 0) and multiply by −1.
5 No. Multiply something by −1.
7 No. Take something nonzero in M where say u1 = 1. Now multiply by 100.
9 Suppose $\{x_1, \cdots, x_k\}$ is a set of vectors from $\mathbb{F}^n$. Show that 0 is in span $(x_1, \cdots, x_k)$.
$0 = \sum_i 0 x_i$
11 It is a subspace. It is spanned by $(3, 1, 1, 0)^T$, $(2, 1, 1, 0)^T$, $(1, 0, 0, 1)^T$. These are also independent so they constitute a basis.
13 Pick n points $\{x_1, \cdots, x_n\}$. Then let $e_i(x) = 0$ unless $x = x_i$ when it equals 1. Then $\{e_i\}_{i=1}^{n}$ is linearly independent, this for any n.
15 $\{1, x, x^2, x^3, x^4\}$
17 $L\left(\sum_{i=1}^{n} c_i v_i\right) \equiv \sum_{i=1}^{n} c_i w_i$
19 No. There is a spanning set having 5 vectors and this would need to be as long as the linearly independent set.
23 No. It can't. It does not contain 0.
25 No. This would lead to 0 = 1. The last one must not be a pivot column and the ones to the left must each be pivot columns.
43 Suppose $\sum_{i=1}^{n} a_i g_i = 0$. Then $0 = \sum_i a_i \sum_j A_{ij} f_j = \sum_j f_j \sum_i A_{ij} a_i$. It follows that $\sum_i A_{ij} a_i = 0$ for each j. Therefore, since $A^T$ is invertible, it follows that each $a_i = 0$. Hence the functions $g_i$ are linearly independent.
Exercises
9.5
1 This is because ABC is one to one.
7 In the following examples, a linear transformation, T is given by specifying its action on a basis β. Find its matrix with respect to this basis.
(a) $\begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}$
(b) $\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}$
11 $A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
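13 $\begin{pmatrix} 1 & 0 & 2 & 0 & 0 \\ 0 & 1 & 0 & 6 & 0 \\ 0 & 0 & 1 & 0 & 12 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$
8 $\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, $\mathbb{Q}$, $\begin{pmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & -i \end{pmatrix}$, $\mathbb{Q} + i\mathbb{Q}$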
15 You can see these are not similar by noticing that the second has an eigenspace of dimension equal to 1 so it is not similar to any diagonal matrix which is what the first one is.
19 This is because the general solution is $y_p + y$ where $Ay_p = b$ and $Ay = 0$. Now $A0 = 0$ and so the solution is unique precisely when this is the only solution y to $Ay = 0$.
Exercises
11.4
1 $(.6, .9, 1)^T$
Exercises
10.6
2 Consider $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. These are both in Jordan form.
8 $\lambda^3 - \lambda^2 + \lambda - 1$
10 $\lambda^2$
11 $\lambda^3 - 3\lambda^2 + 14$
16 $\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}$
6 
The columns are
  1

n
1
2n − (−1) + 1
2n − 1
 2n − 3 (−1)n + 1   2n − 1 
  2

 21
 n − 2 (−1)n + 1  ,  1n − 1  ,
2
2
n
1
1
2n − 2 (−1) + 1
2n − 1

 

n
(−1) − 22n + 1
0
 0   3 (−1)n − 4n + 1 
2
 1 ,

 n   2 (−1)n − 3n + 1 
2
2
n
0
2 (−1) − 22n + 1


8 $\begin{pmatrix} 0 & -1 & -1 & 0 \\ -1 & 0 & -1 & 0 \\ 1 & 1 & 2 & 0 \\ 3 & 3 & 3 & 1 \end{pmatrix}$
9 $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$
(
) ( 1
)
1/2 1/3
− 2 −1
12 Try
,
5
1/2 2/3
1
3
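Exercises
10.9
4 $\lambda^3 - 3\lambda^2 + 14$
5 $\begin{pmatrix} 0 & 0 & -14 \\ 1 & 0 & 0 \\ 0 & 1 & 3 \end{pmatrix}$
6 $\begin{pmatrix} 0 & 0 & 0 & -3 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -11 \\ 0 & 0 & 1 & 8 \end{pmatrix}$
7 $\begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & -7 \\ 0 & 1 & -2 \end{pmatrix}$
Exercises
12.7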
(
1





2 
17
15
1
45
)
√  
1
6 √6
1
6 √6
1
3 6
√ √   2√ 
1
− 30
5 6
5 5
√
√
, 1 5 6 ,
0√ 
6 √ √
1
1
−5 5
− 15 5 6
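3 $|(Ax, y)| \leq (Ax, x)^{1/2} (Ay, y)^{1/2}$
9 $\left\{1, \sqrt{3}(2x - 1), 6\sqrt{5}\left(x^2 - x + \frac{1}{6}\right), 20\sqrt{7}\left(x^3 - \frac{3}{2}x^2 + \frac{3}{5}x - \frac{1}{20}\right)\right\}$
11 $2x^3 - \frac{9}{7}x^2 + \frac{2}{7}x - \frac{1}{70}$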
14 $\left(-\frac{9}{146}\sqrt{146}, \frac{2}{73}\sqrt{146}, \frac{7}{146}\sqrt{146}, 0\right)^T$
16 $|x + y|^2 + |x - y|^2 = (x + y, x + y) + (x - y, x - y) = |x|^2 + |y|^2 + 2(x, y) + |x|^2 + |y|^2 - 2(x, y)$.
21 Give an example of two vectors x, y in $\mathbb{R}^4$ and a subspace V such that $x \cdot y = 0$ but $Px \cdot Py \neq 0$ where P denotes the projection map which sends x to its closest point on V.
Try this. V is the span of $e_1$ and $e_2$ and $x = e_3 + e_1$, $y = e_4 + e_1$.
$Px = (e_3 + e_1, e_1) e_1 + (e_3 + e_1, e_2) e_2 = e_1$
$Py = (e_4 + e_1, e_1) e_1 + (e_4 + e_1, e_2) e_2 = e_1$
$Px \cdot Py = 1$
22 $y = \frac{13}{5}x - \frac{2}{5}$
Exercises
12.9
1 volume is $\sqrt{218}$
3 0.
Exercises
13.13
13 This is easy because you show it preserves distances.
15 $(Ax, x) = (UDU^* x, x) = (DU^* x, U^* x) \geq \delta^2 |U^* x|^2 = \delta^2 |x|^2$
16 $0 > ((A + A^*)x, x) = (Ax, x) + (A^* x, x) = (Ax, x) + \overline{(Ax, x)}$. Now let $Ax = \lambda x$. Then you get $0 > \lambda |x|^2 + \bar{\lambda} |x|^2 = 2\,\mathrm{Re}(\lambda) |x|^2$
19 If $Ax = \lambda x$, then you can take the norm of both sides and conclude that $|\lambda| = 1$. It follows that the eigenvalues of A are $e^{i\theta}$, $e^{-i\theta}$ and another one which has magnitude 1 and is real. This can only be 1 or −1. Since the determinant is given to be 1, it follows that it is 1. Therefore, there exists an eigenvector for the eigenvalue 1.
Exercises
14.7
1 $(0.09, 0.21, 0.43)^T$
3 $(4.2373 \times 10^{-2}, 7.6271 \times 10^{-2}, 0.71186)^T$
28 You have $H = U^* D U$ where U is unitary and D is a real diagonal matrix. Then you have
\[
e^{iH} = U^* \sum_{n=0}^{\infty} \frac{(iD)^n}{n!}\, U
= U^* \begin{pmatrix} e^{i\lambda_1} & & \\ & \ddots & \\ & & e^{i\lambda_n} \end{pmatrix} U
\]
and this is clearly unitary because each matrix in the product is.
Exercises
15.3
1 $\begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix}$, eigenvectors:
$(0.53491, 0.39022, 0.7494)^T \leftrightarrow 6.662$,
$(0.13016, 0.83832, -0.52942)^T \leftrightarrow 1.6790$,
$(0.83483, -0.38073, -0.39763)^T \leftrightarrow -1.341$
2 $\begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{pmatrix}$, eigenvectors:
$(0.57735, 0.57735, 0.57735)^T \leftrightarrow 6.0$,
$(0.78868, -0.21132, -0.57735)^T \leftrightarrow 1.7321$,
$(0.21132, -0.78868, 0.57735)^T \leftrightarrow -1.7321$
3 $\begin{pmatrix} 3 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 2 \end{pmatrix}$, eigenvectors:
$(0.41601, 0.77918, 0.46885)^T \leftrightarrow 7.8730$,
$(0.90453, -0.30151, -0.30151)^T \leftrightarrow 2.0$,
$(9.3568 \times 10^{-2}, -0.54952, 0.83022)^T \leftrightarrow 0.12702$
4 $\begin{pmatrix} 0 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 2 \end{pmatrix}$, eigenvectors:
$(0.28433, 0.81959, 0.49743)^T \leftrightarrow 7.5146$,
$(0.20984, 0.45306, -0.86643)^T \leftrightarrow 0.18911$,
$(0.93548, -0.35073, 4.3168 \times 10^{-2})^T \leftrightarrow -0.70370$
5 $\begin{pmatrix} 0 & 2 & 1 \\ 2 & 0 & 3 \\ 1 & 3 & 2 \end{pmatrix}$, eigenvectors:
$(0.3792, 0.58481, 0.71708)^T \leftrightarrow 4.9754$,
$(0.81441, 0.15694, -0.55866)^T \leftrightarrow -0.30056$,
$(0.43925, -0.79585, 0.41676)^T \leftrightarrow -2.6749$
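The dominant eigenpair in item 2 of Exercises 15.3 can be approximated by the power method. The following is only a sketch: the function name, the starting vector and the fixed iteration count are assumptions of this illustration, and the eigenvalue estimate is taken to be a Rayleigh quotient at the final iterate.

```python
import math

def mat_vec(A, v):
    """Matrix times vector, with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_method(A, x0, iterations=100):
    """Repeatedly apply A and normalize; return (eigenvalue estimate, unit vector)."""
    x = list(x0)
    for _ in range(iterations):
        y = mat_vec(A, x)
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    Ax = mat_vec(A, x)
    # Rayleigh quotient (Ax, x) / (x, x) as the eigenvalue estimate.
    lam = sum(a * b for a, b in zip(Ax, x)) / sum(b * b for b in x)
    return lam, x

# Symmetric matrix from item 2; the starting vector is an arbitrary choice.
A = [[3.0, 2.0, 1.0], [2.0, 1.0, 3.0], [1.0, 3.0, 2.0]]
lam, v = power_method(A, [1.0, 0.0, 0.0])
```

With this starting vector the iteration settles on the eigenvalue near 6 and a unit eigenvector with all three entries near 0.57735, in agreement with the first eigenpair listed for that matrix.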
6 $|7.3333 - \lambda_q| \leq 0.47141$
7 $|7 - \lambda_q| = 2.4495$
8 $|\lambda_q - 8| \leq 3.2660$
9 $-10 \leq \lambda \leq 12$
10 $x^3 + 7x^2 + 3x + 7.0 = 0$, Solution is: $x = -0.14583 + 1.011i$, $x = -0.14583 - 1.011i$, $x = -6.7083$
11 $-1.4755 + 1.1827i$, $-1.4755 - 1.1827i$, $-0.02444 + 0.52823i$, $-0.02444 - 0.52823i$
12 Let $Q^T A Q = H$ where H is upper Hessenberg. Then take the transpose of both sides. This will show that $H = H^T$ and so H is zero on the top as well.
Index
∩, 11
∪, 11
A close to B
eigenvalues, 196
A invariant, 255
Abel’s formula, 110, 269, 270
absolute convergence
convergence, 354
adjugate, 88, 101
algebraic number
minimal polynomial, 223
algebraic numbers, 223
field, 224
algebraically complete field
countable one, 450
almost linear, 430
almost linear system, 431
alternating group, 477
3 cycles, 477
analytic function of matrix, 413
Archimedean property, 23
assymptotically stable, 430
augmented matrix, 29
automorphism, 459
autonomous, 430
Banach space, 343
basic feasible solution, 145
basic variables, 145
basis, 66, 208
Binet Cauchy
volumes, 310
Binet Cauchy formula, 97
block matrix, 106
multiplication, 107
block multiplication, 106
bounded linear transformations, 344
Cauchy Schwarz inequality, 36, 292, 341
Cauchy sequence, 305, 343, 440
Cayley Hamilton theorem, 105, 269, 279
centrifugal acceleration, 73
centripetal acceleration, 73
characteristic and minimal polynomial, 247
characteristic equation, 165
characteristic polynomial, 104, 246
characteristic value, 165
Cholesky factorization, 339
codomain, 12
cofactor, 85, 98
column rank, 101, 118
commutative ring, 445
commutator, 480
commutator subgroup, 480
companion matrix, 272, 383
complete, 362
completeness axiom, 22
complex conjugate, 16
complex numbers
absolute value, 16
field, 16
complex numbers, 15
complex roots, 17
composition of linear transformations, 242
commutator, 206
condition number, 351
conformable, 46
conjugate fields, 471
conjugate linear, 297
converge, 440
convex combination, 249
convex hull, 249
compactness, 249
coordinate axis, 34
coordinates, 34
Coriolis acceleration, 73
Coriolis acceleration
earth, 75
Coriolis force, 73
counting zeros, 195
Courant Fischer theorem, 318
Cramer’s rule, 89, 101
cyclic set, 258
damped vibration, 427
defective, 170
DeMoivre identity, 17
dense, 24
density of rationals, 24
determinant
block upper triangular matrix, 182
definition, 94
estimate for Hermitian matrix, 290
expansion along a column, 85
expansion along a row, 85
expansion along row, column, 99
Hadamard inequality, 290
inverse of matrix, 88
matrix inverse, 100
partial derivative, cofactor, 111
permutation of rows, 94
product, 96
product of eigenvalues, 188
product of eigenvalues, 199
row, column operations, 87, 95
summary of properties, 104
symmetric definition, 95
transpose, 95
diagonalizable, 240, 311
minimal polynomial condition, 271
basis of eigenvectors, 179
diameter, 439
differentiable matrix, 69
differential equations
first order systems, 202
digraph, 48
dimension of vector space, 211
direct sum, 81, 252
directed graph, 48
discrete Fourier transform, 337
distinct roots
polynomial and its derivative, 472
division of real numbers, 24
Dolittle’s method, 132
domain, 12
dot product, 35
dyadics, 234
dynamical system, 179
eigenspace, 167, 255
eigenvalue, 83, 165
eigenvalues, 104, 195, 246
AB and BA, 109
eigenvector, 83, 165
eigenvectors
distinct eigenvalues independence, 170
elementary matrices, 113
elementary symmetric polynomials, 445
empty set, 12
equality of mixed partial derivatives, 191
equilibrium point, 430
equivalence class, 218, 238
equivalence of norms, 344
equivalence relation, 218, 237
Euclidean algorithm, 24
exchange theorem, 64
existence of a fixed point, 364
field
ordered, 14
field axioms, 13
field extension, 219
dimension, 220
finite, 220
field extensions, 221
fields
characteristic, 473
perfect, 474
fields
perfect, 474
finite dimensional inner product space
closest point, 294
finite dimensional normed linear space
completeness, 343
equivalence of norms, 344
fixed field, 466
fixed fields and subgroups, 468
Foucault pendulum, 75
Fourier series, 304
Fredholm alternative, 125, 302
free variable, 31
Frobenius
inner product, 205
Frobenius norm, 331
singular value decomposition, 331
Frobenius norm, 336
functions, 12
fundamental matrix, 423
fundamental theorem of algebra, 18, 443, 450
fundamental theorem of algebra
plausibility argument, 19
fundamental theorem of arithmetic, 27
fundamental theorem of Galois theory, 470
Galois group, 464
size, 464
Gauss Jordan method for inverses, 52
Gauss Seidel method, 359
Gelfand, 353
generalized eigenspace, 82
generalized eigenspaces, 255, 264
generalized eigenvectors, 265
Gerschgorin’s theorem, 194
Gram Schmidt procedure, 142, 181, 294
Gram Schmidt process, 293, 294
Gram Schmidt process, 181
greatest common divisor, 24, 214
characterization, 25
greatest lower bound, 21
Gronwall’s inequality, 370, 422
group
definition, 466
group
solvable, 480
Hermitian, 185
orthonormal basis eigenvectors, 316
positive definite, 320
real eigenvalues, 187
Hermitian matrix
factorization, 290
positive part, 414
positive part, Lipschitz continuous, 414
Hermitian operator, 297
largest, smallest, eigenvalues, 318
spectral representation, 316
Hessian matrix, 192
Holder’s inequality, 347
homomorphism, 459
Householder
reflection, 139
Householder matrix, 138
idempotent, 79, 489
impossibility of solution by radicals, 483
inconsistent, 31
initial value problem
existence, 417
global solutions, 421
linear system, 418
local solutions, existence, uniqueness, 420
uniqueness, 370, 417
injective, 12
inner product, 35, 291
inner product space, 291
adjoint operator, 296
parallelogram identity, 293
triangle inequality, 293
integers mod a prime, 230
integral
operator valued function, 369
vector valued function, 369
intersection, 11
intervals
notation, 11
invariant, 314
subspace, 255
invariant subspaces
direct sum, block diagonal matrix, 256
inverses and determinants, 100
invertible, 51
invertible matrix
product of elementary matrices, 123
irreducible, 214
relatively prime, 215
isomorphism, 459
extensions, 461
iterative methods
alternate proof of convergence, 367
convergence criterion, 362
diagonally dominant, 367
proof of convergence, 365
Jacobi method, 357
Jordan block, 262, 264
Jordan canonical form
existence and uniqueness, 265
powers of a matrix, 266
ker, 123
kernel, 62
kernel of a product
direct sum decomposition, 253
Krylov sequence, 258
Lagrange form of remainder, 191
Laplace expansion, 98
least squares, 129, 301, 491
least upper bound, 21
Lindemann Weierstrass theorem, 458
linear combination, 43, 63, 96
linear transformation, 57, 233
defined on a basis, 234
dimension of vector space, 234
existence of eigenvector, 246
kernel, 251
matrix, 58
minimal polynomial, 246
rotation, 60
linear transformations
a vector space, 233
commuting, 254
composition, matrices, 242
sum, 233, 299
linearly dependent, 63
linearly independent, 63, 208
linearly independent set
extend to basis, 212
Lipschitz condition, 417
LU factorization
justification for multiplier method, 135
multiplier method, 131
solutions of linear systems, 133
main diagonal, 86
Markov matrix, 281
limit, 284
regular, 284
steady state, 281, 284
mathematical induction, 22
matrices
commuting, 313
notation, 42
transpose, 50
matrix, 41
differentiation operator, 236
injective, 68
inverse, 51
left inverse, 101
lower triangular, 86, 101
Markov, 281
non defective, 185
normal, 185
polynomial, 112
rank and existence of solutions, 124
rank and nullity, 123
right and left inverse, 68
right inverse, 101
right, left inverse, 100
row, column, determinant rank, 101
self adjoint, 178
stochastic, 281
surjective, 68
symmetric, 177
unitary, 181
upper triangular, 86, 101
matrix
positive definite, 339
matrix exponential, 368
matrix multiplication
definition, 44
entries of the product, 46
not commutative, 45
properties, 50
vectors, 43
matrix of linear transformation
orthonormal bases, 239
migration matrix, 284
minimal polynomial, 82, 246, 254
eigenvalues, eigenvectors, 246
finding it, 268
generalized eigenspaces, 255
minimal polynomial
algebraic number, 223
minor, 85, 98
mixed partial derivatives, 190
monic, 214
monomorphism, 459
Moore Penrose inverse, 333
least squares, 333
uniqueness, 338
moving coordinate system, 70
acceleration , 73
negative definite, 320
Neumann
series, 372
nilpotent
block diagonal matrix, 263
Jordan form, uniqueness, 263
Jordan normal form, 263
non defective, 271
non solvable group, 481
nonnegative self adjoint
square root, 322
norm, 291
strictly convex, 366
uniformly convex, 366
normal, 327
diagonalizable, 187
non defective, 185
normal closure, 464, 471
normal extension, 463
normal subgroup, 469, 480
normed linear space, 291, 341
normed vector space, 291
norms
equivalent, 342
null and rank, 306
null space, 62
nullity, 123
one to one, 12
onto, 12
operator norm, 344
orthogonal matrix, 83, 91, 138, 183
orthogonal projection, 294
orthonormal basis, 293
orthonormal polynomials, 303
p norms, 347
axioms of a norm, 348
parallelepiped
volume, 306
partitioned matrix, 106
Penrose conditions, 334
permutation, 93
even, 115
odd, 115
permutation matrices, 113, 476
permutations
cycle, 476
perp, 125
Perron’s theorem, 404
pivot column, 121
PLU factorization, 134
existence, 138
polar decomposition
left, 326
right, 324
polar form complex number, 16
polynomial, 27, 214
addition, 27
degree, 27, 214
divides, 214
division, 27, 214
equal, 214
equality, 27
greatest common divisor, 214
greatest common divisor description, 215
greatest common divisor, uniqueness, 214
irreducible, 214
irreducible factorization, 216
multiplication, 27
relatively prime, 214
root, 214
polynomial
matrix coefficients, 112
polynomials
canceling, 216
factorization, 217
positive definite
positive eigenvalues, 320
principal minors, 321
positive definite matrix, 339
positive definite, 320
power method, 375
prime number, 24
prime numbers
infinity of primes, 229
principal directions, 173
principal minors, 321
product rule
matrices, 69
projection map
convex set, 305
Putzer’s method, 424
QR algorithm, 198, 387
convergence, 390
convergence theorem, 390
non convergence, 199, 394
QR factorization, 139
existence, 141
Gram Schmidt procedure, 142
quadratic form, 189
quotient group, 469
quotient space, 230
quotient vector space, 230
range, 12
rank, 119
number of pivot columns, 123
rank of a matrix, 101, 118
rank one transformation, 299
rational canonical form, 272
uniqueness, 275
Rayleigh quotient, 383
how close?, 384
real numbers, 12
real Schur form, 183
regression line, 301
regular Sturm Liouville problem, 304
relatively prime, 24
Riesz representation theorem, 296
right Cauchy Green strain tensor, 324
right polar decomposition, 325
row equivalence
determination, 122
row equivalent, 122
row operations, 30, 113
inverse, 30
linear relations between columns, 119
row rank, 101, 118
row reduced echelon form
definition, 120
examples, 120
existence, 120
uniqueness, 122
scalar product, 35
scalars, 18, 34, 41
Schur’s theorem, 182, 314
inner product space, 314
second derivative test, 193
self adjoint, 185, 297
self adjoint nonnegative
roots, 323
separable
polynomial, 465
sequential compactness, 441
sequentially compact, 441
set notation, 11
sgn, 92
uniqueness, 93
shifted inverse power method, 377
complex eigenvalues, 382
sign of a permutation, 93
similar
matrix and its transpose, 271
similar matrices, 90, 110, 237
similarity transformation, 237
simple field extension, 225
simple groups, 479
simplex tableau, 146
simultaneous corrections, 357
simultaneously diagonalizable, 312
commuting family, 314
singular value decomposition, 329
singular values, 329
skew symmetric, 51, 177
slack variables, 144, 147
solvable by radicals, 482
solvable group, 480
space of linear transformations
vector space, 299
span, 63, 96
spanning set
restricting to a basis, 212
spectral mapping theorem, 414
spectral norm, 345
spectral radius, 351, 353
spectrum, 165
splitting field, 221
splitting fields
isomorphic, 462
normal extension, 463
stable, 430
stable manifold, 437
stochastic matrix, 281
subsequence, 440
subspace, 63, 208
basis, 67, 213
complementary, 310
dimension, 67
invariant, 255
subspaces
direct sum, 252
direct sum, basis, 253
substituting matrix into polynomial identity, 112
surjective, 12
Sylvester, 81
law of inertia, 204
dimension of kernel of product, 251
Sylvester’s equation, 310
symmetric, 51, 177
symmetric polynomial theorem, 446
symmetric polynomials, 445
system of linear equations, 32
tensor product, 299
the space AU, 310
trace, 188
AB and BA, 188
sum of eigenvalues, 199
transpose, 50
properties, 50
transposition, 476
triangle inequality, 37
trivial, 63
union, 11
Unitary matrix
representation, 372
upper Hessenberg matrix, 278, 398
Vandermonde determinant, 112
variation of constants formula, 203, 426
variational inequality, 305
vector
angular velocity, 71
vector space
axioms, 42, 207
basis, 66
dimension, 67
examples, 207
vector space axioms, 35
vectors, 43
volume
parallelepiped, 306
well ordered, 22
Wronskian, 110, 203, 269, 270, 426
Wronskian alternative, 203, 426