Course Notes for MS4105 Linear Algebra 2

J. Kinsella

October 20, 2014
Contents

Part I    Linear Algebra

1 Vector Spaces
  1.1 Notation
    1.1.1 General Vector Notation
    1.1.2 Notation for Vectors in Rn
  1.2 Definitions
    1.2.1 Exercises
  1.3 Subspaces
    1.3.1 Exercises
  1.4 Linear Independence
    1.4.1 Exercises
  1.5 Basis and Dimension
    1.5.1 Exercises
2 Inner Product Spaces
  2.1 Inner Products
    2.1.1 Exercises
    2.1.2 Length and Distance in Inner Product Spaces
    2.1.3 Unit Sphere in Inner Product Spaces
  2.2 Angles and Orthogonality in Inner Product Spaces
    2.2.1 Exercises
  2.3 Orthonormal Bases
    2.3.1 Calculating Orthonormal Bases
    2.3.2 Exercises
3 Complex Vector and Inner Product Spaces
  3.1 Complex Vector Spaces
  3.2 Complex Inner Product Spaces
    3.2.1 Properties of the Complex Euclidean inner product
    3.2.2 Orthogonal Sets
    3.2.3 Exercises

Part II    Matrix Algebra

4 Matrices and Vectors
  4.1 Properties of Matrices
    4.1.1 Range and Nullspace
    4.1.2 Rank
    4.1.3 Inverse
    4.1.4 Matrix Inverse Times a Vector
  4.2 Orthogonal Vectors and Matrices
    4.2.1 Inner Product on Cn
    4.2.2 Orthogonal vectors
    4.2.3 Unitary Matrices
    4.2.4 Multiplication by a Unitary Matrix
    4.2.5 A Note on the Unitary Property
    4.2.6 Exercises
  4.3 Norms
    4.3.1 Vector Norms
    4.3.2 Inner Product based on p-norms on Rn/Cn
    4.3.3 Unit Spheres
    4.3.4 Matrix Norms Induced by Vector Norms
5 QR Factorisation and Least Squares
  5.1 Projection Operators
    5.1.1 Orthogonal Projection Operators
    5.1.2 Projection with an Orthonormal Basis
    5.1.3 Orthogonal Projections with an Arbitrary Basis
    5.1.4 Oblique (Non-Orthogonal) Projections
    5.1.5 Exercises
  5.2 QR Factorisation
    5.2.1 Reduced QR Factorisation
    5.2.2 Full QR factorisation
    5.2.3 Gram-Schmidt Orthogonalisation
    5.2.4 Instability of Classical G-S Algorithm
    5.2.5 Existence and Uniqueness
    5.2.6 Solution of Ax = b by the QR factorisation
    5.2.7 Exercises
  5.3 Gram-Schmidt Orthogonalisation
    5.3.1 Modified Gram-Schmidt Algorithm
    5.3.2 Example to Illustrate the “Stability” of MGS
    5.3.3 A Useful Trick
    5.3.4 Operation Count
    5.3.5 Gram-Schmidt as Triangular Orthogonalisation
    5.3.6 Exercises
  5.4 Householder Transformations
    5.4.1 Householder and Gram Schmidt
    5.4.2 Triangularising by Introducing Zeroes
    5.4.3 Householder Reflections
    5.4.4 How is Q to be calculated?
    5.4.5 Example to Illustrate the Stability of the Householder QR Algorithm
  5.5 Why is Householder QR So Stable?
    5.5.1 Operation Count
  5.6 Least Squares Problems
    5.6.1 Example: Polynomial Data-fitting
    5.6.2 Orthogonal Projection and the Normal Equations
    5.6.3 Pseudoinverse
    5.6.4 Solving the Normal Equations
  5.7 Project
6 The Singular Value Decomposition
  6.1 Existence of SVD for m × n Matrices
    6.1.1 Some Simple Properties of the SVD
    6.1.2 Exercises
  6.2 Uniqueness of SVD
    6.2.1 Uniqueness of U and V
    6.2.2 Exercises
  6.3 Naive method for computing SVD
  6.4 Significance of SVD
    6.4.1 Changing Bases
    6.4.2 SVD vs. Eigenvalue Decomposition
    6.4.3 Matrix Properties via the SVD
    6.4.4 Low-Rank Approximations
    6.4.5 Application of Low-Rank Approximations
  6.5 Computing the SVD
    6.5.1 Exercises
7 Solving Systems of Equations
  7.1 Gaussian Elimination
    7.1.1 LU Factorisation
    7.1.2 Example
    7.1.3 General Formulas for Gaussian Elimination
    7.1.4 Operation Count
    7.1.5 Solution of Ax = b by LU factorisation
    7.1.6 Instability of Gaussian Elimination without Pivoting
    7.1.7 Exercises
  7.2 Gaussian Elimination with Pivoting
    7.2.1 A Note on Permutations
    7.2.2 Pivots
    7.2.3 Partial pivoting
    7.2.4 Example
    7.2.5 PA = LU Factorisation
    7.2.6 Details of Li to Li′ Transformation
    7.2.7 Stability of GE
    7.2.8 Exercises
8 Finding the Eigenvalues of Matrices
  8.1 Eigenvalue Problems
    8.1.1 Eigenvalue Decomposition
    8.1.2 Geometric Multiplicity
    8.1.3 Characteristic Polynomial
    8.1.4 Algebraic Multiplicity
    8.1.5 Similarity Transformations
    8.1.6 Defective Eigenvalues and Matrices
    8.1.7 Diagonalisability
    8.1.8 Determinant and Trace
    8.1.9 Unitary Diagonalisation
    8.1.10 Schur Factorisation
    8.1.11 Exercises
  8.2 Computing Eigenvalues — an Introduction
    8.2.1 Using the Characteristic Polynomial
    8.2.2 An Alternative Method for Eigenvalue Computation
    8.2.3 Reducing A to Hessenberg Form — the “Obvious” Method
    8.2.4 Reducing A to Hessenberg Form — a Better Method
    8.2.5 Operation Count
    8.2.6 Exercises
9 The QR Algorithm
  9.1 The Power Method
  9.2 Inverse Iteration
  9.3 Rayleigh Quotient Iteration
  9.4 The Un-Shifted QR Algorithm
10 Calculating the SVD
  10.1 An alternative (Impractical) Method for the SVD
    10.1.1 Exercises
  10.2 The Two-Phase Method
    10.2.1 Exercises

Part III    Supplementary Material

A Index Notation and an Alternative Proof for Lemma 1.10
B Proof that under-determined homogeneous linear systems have a non-trivial solution
C Proof of the Jordan von Neumann Lemma for a real inner product space
D Matlab Code for (Very Naive) SVD algorithm
E Matlab Code for simple SVD algorithm
F Example SVD calculation
G Uniqueness of U and V in S.V.D.
H Oblique Projection Operators — the details
I Detailed Discussion of the QR Algorithm
  I.1 Simultaneous Power Method
    I.1.1 A Normalised version of Simultaneous Iteration
    I.1.2 Two Technical Points
  I.2 QR Algorithm with Shifts
    I.2.1 Connection with Shifted Inverse Iteration
J Solution to Ex. 4 in Exercises 5.1.5
K Solution to Ex. 9 in Exercises 5.1.5
L Hint for Ex. 5b in Exercises 5.2.7
M Proof of Gerschgorin’s theorem in Exercises 8.1.11
N Proof of Extended Version of Gerschgorin’s theorem in Exercises 8.1.11
O Backward Stability of Pivot-Free Gauss Elimination
P Instability of Polynomial Root-Finding in Section 8.2
Q Solution to Problem 2 on Slide 279
R Convergence of Fourier Series
S Example of Instability of Classical GS
T Example of Stability of Modified GS
U Proof of Unit Roundoff Formula
V Example to Illustrate the Stability of the Householder QR Algorithm
About the Course
• Lecture times
– (Week 1)
∗ Monday 15:00 A1–052 LECTURE
∗ Wednesday 09:00 SG–17 LECTURE
– (Week 2 and subsequently)
∗ Monday 15:00 A1–052 LECTURE
∗ Wednesday 09:00 SG–15 LECTURE
∗ Thursday 10:00 KBG–11 TUTORIAL (3B)
∗ Thursday 17:00 B1–005A TUTORIAL (3A)
• Office hours: Mondays & Wednesdays at 16:00, in B3-043.
• These notes available at
http://jkcray.maths.ul.ie/ms4105/Slides.pdf
• The main reference text for the course is “Numerical Linear
Algebra” by Lloyd Trefethen and David Bau (shelved at 512.5)
on which much of Part 2 is based.
• To review the basics of Linear Algebra see “Elementary Linear
Algebra” by H. Anton (shelved at 512.5).
• The Notes are divided into two Parts which are in turn divided
into Chapters and Sections.
• The first Part of the course is devoted to Linear Algebra;
Vector Spaces and Inner Product Spaces.
• The second Part focuses on the broad topic of Matrix Algebra
— essentially Applied Linear Algebra.
• Usually only some of the topics in the Notes will be covered in
class — in particular I expect to have to leave out a lot of
material from Chapters 8, 9 & 10 — of course when students
are particularly clever anything is possible . . . ☺
• There are Exercises at the end of each Section — you will
usually be asked to attempt one or more before the next
tutorial.
• There are also statements made in the notes that you are asked
to check.
• By the end of the semester you should aim — maybe in
cooperation with classmates — to have written out the answers
to most/all Exercises and check problems.
• A record of attendance at lectures and tutorials will be kept.
• There will be a mid-semester written test for 10% after Part I
is completed, covering that first part of the course.
• This test will be held during a tutorial class — you will be
given advance notice!
• There will be a Matlab/Octave programming assignment —
also for 10%.
• The Matlab/Octave project description will appear at
http://jkcray.maths.ul.ie/ms4105/Project2014.pdf
• This assignment will be given in class around Week 9 and you
will be given a week to complete it.
• There will be an end-of-semester written examination for 80% of
the marks for the course.
Part I
Linear Algebra
In Linear Algebra 1, many of the following topics were covered:
• Systems of linear equations and their solution by an
elimination method.
• Matrices: matrix algebra, determinants, inverses, methods for
small matrices, extension to larger matrices.
• Vectors in 2 and 3 dimensions: geometric representation of
vectors, vector arithmetic, norm, scalar product, angle,
orthogonality, projections, cross product and its uses in the
study of lines and planes in 3-space.
• Extension to vectors in n dimensions: vector algebra, scalar
product, orthogonality, projections, bases in R2 , R3 and Rn .
• Matrices acting on vectors, eigenvalues and eigenvectors,
particularly in 2 and 3 dimensions.
• Applications such as least-squares fits to data.
• This first Part (Part I) extends these familiar ideas.
• I will be generalising these ideas from R2 , R3 and Rn to general
vector spaces and inner product spaces.
• This will allow us to use geometric ideas (length, distance,
angle etc.) and results (e.g. the Cauchy-Schwarz inequality) in
many useful and unexpected contexts.
1 Vector Spaces
• I begin by reviewing the idea of a Vector Space.
• Rn and Cn are the most important examples.
• I will show that other important mathematical systems are also
vector spaces, e.g. sets of matrices or functions.
– Because they satisfy the same set of “rules”or axioms.
• Studying vector spaces will give us results which will
automatically hold for vectors in Rn and Cn .
• Usually you will already know these results for vectors in Rn and Cn .
• But they will also hold for the more complicated situations
where “geometrical intuition” is less useful.
• This is the real benefit of studying vector spaces.
1.1 Notation
First some (important) points on notation.
1.1.1 General Vector Notation
A variety of notations are used for vectors, both the familiar
vectors in R2 , R3 and Rn and abstract vectors in vector spaces.
• The clearest notation in printed material (as in these slides) is
to write
– vectors u, v, w, a, b, c, 0 in a bold font and
– scalars (numbers) α, β, γ, 1, 2, 3 in a normal (thin) font.
• A widely used convention is to use Roman letters u, v, w for
vectors and Greek letters α, β, γ for scalars (numbers).
• When writing vectors by hand, they are often
– underlined u or
– have an arrow drawn over the symbol.
• In all cases the purpose is to differentiate vectors u, v, w from
scalars (numbers) α, β, γ.
• Very often in these notes, instead of writing αu with the vector
u in a bold font to indicate the product of the scalar (number) α
and the vector u, I will simply write α u in a plain font, as it
will be clear from the context that α is a scalar (number) and
u is a vector.
• The Roman/Greek convention often makes the use of bold
fonts/underlining/arrows unnecessary.
• The purpose of underlining/arrows for vectors in handwritten
material is clarity — use them when necessary to make your
intended meaning clear.
• In each of the following letter pairs check whether the letters
used stand for vectors or scalars (numbers) — N.B. some of
them are deliberately confusing but you should still be able to
“decode” them:
⃗α,  uξ,  ξu,  γu,  ⃗v,  α⃗v,  βw,  az,  za
1.1.2 Notation for Vectors in Rn
It is normal in Linear Algebra to write vectors in Rn as row
vectors, e.g. (1, 2, 3). I will find it useful later in the module when
studying Matrix Algebra to adopt the convention that all vectors
in Rn are column vectors. For that reason I will usually write
vectors in R2 , R3 and Rn as row vectors with a superscript “T”
indicating that I am taking the transpose — turning the row into a
column. So
               [ 1 ]
(1, 2, 3)T  ≡  [ 2 ]
               [ 3 ]
The version on the left takes up less space on the page/screen.
1.2 Definitions
Definition 1.1 A vector space is a non-empty set V together
with two operations; addition and multiplication by a scalar.
• given u, v ∈ V, write the sum as u + v.
• given u ∈ V and α ∈ R (or C) , write the scalar multiple of
u by α as αu.
• The addition and scalar multiplication operations must satisfy a
set of rules or axioms (based on the properties of R, R2 ,
R3 and Rn ), the Vector Space Axioms in Definition 1.2.
• The word “sum” and the + symbol as used above don’t have
to refer to the normal process of addition even if u and v are
numbers.
• I could decide to use an eccentric definition like u + v ≡ √(uv) if
u and v were positive real numbers.
• Or I could “define” αu ≡ sin(αu), again if u is a number.
• These weird definitions won’t work as some of the axioms in
Definition 1.2 below are not satisfied.
• But some “weird” definitions of addition and scalar
multiplication do satisfy the axioms in Definition 1.2.
Example 1.1 (Strange Example) Here’s an even stranger
candidate: let V be the positive real numbers R+ , where 1 is the
“zero vector”, “scalar multiplication” is really numerical
exponentiation, and “addition” is really numerical multiplication.
In other words, x + y ≡ xy and αx ≡ x^α . (Note vector space
notation on the left hand side and ordinary algebraic notation on
the right.)
• Is this combination of a set V and the operations of addition
and scalar multiplication a vector space?
• To answer the question we need to list the defining
rules/axioms for a vector space.
• If the following rules or axioms are satisfied for all u, v, w ∈ V
and for all α, β ∈ R (or C), then call V a vector space and
the elements of V vectors.
Definition 1.2 (Vector Space Axioms) A non-empty set V
together with two operations; addition and multiplication by a
scalar is a vector space if the following 10 axioms are satisfied:
1. If u, v ∈ V, then u + v ∈ V. V is closed under addition
2. u + v = v + u. Addition is commutative.
3. u + (v + w) = (u + v) + w. Addition is associative.
4. There exists a special vector 0 ∈ V, the zero vector for V,
such that u + 0 = 0 + u = u for all u ∈ V.
5. For each u ∈ V, there exists a special vector −u ∈ V, the
negative of u, such that u + (−u) = (−u) + u = 0.
6. If α ∈ R (or C) and u ∈ V then α u ∈ V. V is closed under
scalar multiplication.
7. α(u + v) = αu + αv. Scalar multiplication is distributive.
8. (α + β)u = αu + βu. Scalar multiplication is distributive.
9. α(βu) = (αβ)u. Scalar multiplication is associative.
10. 1u = u. Scalar multiplication by 1 works as expected.
• All ten axioms are “obvious” in the sense that it is very easy to
check that they hold for the familiar examples of R2 and R3 .
• (I’ll do this shortly.)
• The subtle point is that I am now saying that any set V (along
with a definition of addition and scalar multiplication) that
satisfies these properties of R2 and R3 is a vector space.
• This is something people do all the time in mathematics;
generalise from a particular case to a general class of objects
that have the same structure as the original.
• Vector spaces in which the scalars may be complex are called
complex vector spaces; if the scalars are restricted to be real
then they are called real vector spaces.
• The axioms are otherwise identical.
Example 1.2 A short list of familiar examples — you should
check the 10 axioms for each:
1. V = Rn together with the standard operations of addition and
scalar multiplication
2. V = Cn
3. V = the set of all 2 × 2 matrices
[ a  b ]
[ c  d ]
(if restricted to real matrices then V is a real vector space).
4. V = the set of all m × n matrices (real or complex)
[ a11  . . .  a1n ]
[ a21  . . .  a2n ]
[  ⋮    . . .   ⋮ ]
[ am1  . . .  amn ]
5. Let V be any plane in Rn containing the origin,
V = { x ∈ Rn | a1 x1 + a2 x2 + · · · + an xn = 0 }.
6. A “plane” P = { x ∈ Rn | a1 x1 + · · · + an xn = r } with r ≠ 0 does
not contain the zero vector 0. Such a set is more correctly
referred to as an affine space and is not a vector space. Check
which axioms fail.
I can gather together some simple consequences of the axioms for a
vector space (underlining the zero vector 0 for readability)
Theorem 1.1 Let V be a vector space, u ∈ V and α ∈ R (or
C). Then
(i) 0 u = 0.
(ii) α 0 = 0.
(iii) (−1) u = −u.
(iv) If α u = 0 then α = 0 or u = 0.
(v) The zero vector is “unique” — i.e. if two vectors satisfy
Vector Space Axioms 3 & 4 for 0, they must be equal (so 0 is
unique).
Proof:
(i) I can write:
0u + 0u = (0 + 0)u        Axiom 8
        = 0u              Properties of R.
By Axiom 5, the vector 0u has a negative, −0u. Adding this
negative to both sides in the above equation:
[0u + 0u] + (−0u) = 0u + (−0u).
So
0u + [0u + (−0u)] = 0u + (−0u)        Axiom 3
0u + 0 = 0                            Axiom 5
0u = 0                                Axiom 4.
(ii) Check
(iii) Check
(iv) Check
(v) Check
1.2.1 Exercises
1. Check that all of the examples in Example 1.2 above are in
fact vector spaces.
2. Is the “Strange Example” vector space given in Example 1.1
actually a vector space ? (Are the 10 vector space axioms
satisfied?)
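A quick way to get a feel for Exercise 2 is to spot-check the axioms
numerically before attempting a proof. The following Octave/Matlab
sketch (an illustration only — it checks a few random values and
proves nothing) tests several of the axioms for the “Strange
Example”; every displayed value should be (close to) zero.

% Numerical spot-check of some vector space axioms for the "Strange
% Example": V = positive reals, u "+" v := u*v, alpha.u := u^alpha.
u = rand + 0.1; v = rand + 0.1; w = rand + 0.1;  % random "vectors" in R+
a = randn; b = randn;                            % random scalars
add  = @(u, v) u .* v;                           % "addition"
smul = @(a, u) u .^ a;                           % "scalar multiplication"
disp(abs(add(u, v) - add(v, u)))                             % Axiom 2
disp(abs(add(u, add(v, w)) - add(add(u, v), w)))             % Axiom 3
disp(abs(add(u, 1) - u))                                     % Axiom 4: "zero vector" is 1
disp(abs(add(u, 1/u) - 1))                                   % Axiom 5: "negative" of u is 1/u
disp(abs(smul(a, add(u, v)) - add(smul(a, u), smul(a, v))))  % Axiom 7
disp(abs(smul(a + b, u) - add(smul(a, u), smul(b, u))))      % Axiom 8
disp(abs(smul(a, smul(b, u)) - smul(a * b, u)))              % Axiom 9
disp(abs(smul(1, u) - u))                                    % Axiom 10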
1.3 Subspaces
A vector space can be contained within another. For example you
were asked to check that planes in Rn containing the origin are
vector spaces. They are obviously contained in Rn .
Mathematicians use the term subspace to describe this idea.
Definition 1.3 A subset W of a vector space V is called a
subspace of V if W is itself a vector space using the same
addition and scalar multiplication operations as V.
• On the face of it, I need to check that all 10 axioms hold for
every vector in a subset W to confirm that it is a subspace.
• But as W is a subset of V (W ⊆ V), axioms 2, 3, 7, 8, 9 and 10
are “inherited” from the larger set V — I mean that if all
elements of a set S have some property then all elements of any
subset of S do too. (Obvious?)
• So I need only check axioms 1, 4, 5 and 6.
(“Closure” under an operation such as + means that if u and v
are in the set V then so is u + v. In the same way, closure
under scalar multiplication means that if u ∈ V and α ∈ R
then αu is also in V.)
• In fact I don’t need to check axioms 1, 4, 5 and 6, I need only
check “closure” under addition and scalar multiplication
(axioms 1 and 6).
This follows from the following theorem:
Theorem 1.2 If W is a non-empty subset of a vector space V then
W is a subspace of V if and only if the following conditions hold:
(a) if u, v ∈ W, then u + v ∈ W
(closure under addition)
(b) if α ∈ R (or C) and u ∈ W then αu ∈ W
(closure under scalar multiplication.)
Proof:
[→] If W is a subspace of V, then all the vector space axioms are
satisfied; in particular, Axioms 1 & 6 hold — but these are
conditions (a) and (b).
[←] If conditions (a) and (b) hold then I need only prove that W
satisfies the remaining eight axioms. Axioms 2,3,7,8,9 and 10
are “inherited” from V. So I need to check that Axioms 4 & 5
are satisfied by all vectors in W.
• Let u ∈ W.
• By condition (b), αu ∈ W for every scalar α.
• Setting α = 0, it follows from Thm 1.1 that 0u = 0 ∈ W.
• Setting α = −1, it follows from Thm 1.1 that
(−1)u = −u ∈ W.
Example 1.3 Some examples using Thm 1.2 — you should
check each.
1. A line through the origin is a subspace of R3 .
2. The first quadrant (x ≥ 0, y ≥ 0) is not a subspace of R2 .
3. The set of symmetric n × n matrices is a subspace of the
vector space of all n × n matrices.
4. The set of upper triangular m × n matrices is a subspace of
the vector space of all m × n matrices.
5. The set of diagonal m × n matrices is a subspace of the vector
space of all m × n matrices. (Check what do you mean by
“diagonal” for a non-square (m ≠ n) matrix?)
6. Is the set of all m × m matrices with Trace (sum of diagonal
elements) equal to zero a subspace of the vector space of all
m × m matrices?
7. Is the set of all m × m matrices with Trace equal to one a
subspace of the vector space of all m × m matrices?
An important result — solution spaces of homogeneous linear
systems are subspaces of Rn .
Theorem 1.3 If Ax = 0 is a homogeneous linear system of m
equations in n unknowns, then the set of solution vectors is a
subspace of Rn .
Proof: Check that the proof is a simple application of
Thm 1.2.
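As a concrete illustration in Octave/Matlab (a sketch only; the
matrix A below is an arbitrary choice), the built-in function null
returns a basis for the solution space of Ax = 0, and closure under
addition and scalar multiplication is easy to verify numerically.

% The solution set of A x = 0 is a subspace of R^n (here n = 4).
A = [1 2 3 4;
     2 4 6 8];                 % rank 1, so the solution space has dimension 3
N = null(A);                   % columns of N form a basis of the solution space
x = N * randn(size(N, 2), 1);  % a random solution
y = N * randn(size(N, 2), 1);  % another random solution
alpha = randn;                 % a random scalar
disp(norm(A * (x + y)))        % closure under addition: should be ~0
disp(norm(A * (alpha * x)))    % closure under scalar multiplication: ~0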
Another important idea, linear combinations of vectors.
Definition 1.4 A vector w is a linear combination of the
vectors v1 , . . . , vk if it can be written in the form
w = α1 v1 + · · · + αk vk
where α1 , . . . , αk are scalars.
Example 1.4 Any vector in Rn is a linear combination of the
vectors e1 , . . . , en , where ei is the vector whose components are all
zero except for the ith component which is equal to 1. Check .
If S = {v1 , . . . , vk } is a set of vectors in a vector space V then in
general some vectors in V may be linear combinations of the vectors
in S and some may not. The following Theorem shows that if I
construct a set W consisting of all the vectors that can be written
as a linear combination of {v1 , . . . , vk }, then W is a subspace of V.
This subspace is called the span of S and is written
span(S) = { λ1 v1 + λ2 v2 + · · · + λk vk | λi ∈ R }
— the set of all possible linear combinations of the vectors
v1 , . . . , vk .
Theorem 1.4 If S = {v1 , . . . , vk } is a set of vectors in a vector
space V then
(a) span(S) is a subspace of V.
(b) span(S) is the smallest subspace of V that contains S in the
sense that if any other subspace X contains S, then
span(S) ⊆ X.
Proof:
(a) To show that span(S) is a subspace of V, I must show that it is
non-empty and closed under addition and scalar multiplication.
• span(S) is non-empty as 0 ∈ span(S) because
0 = 0v1 + · · · + 0vk .
• If u, v ∈ span(S) then using the definition of span(S), certainly
both u + v and αu are also in span(S) — check.
(b) Each vector vi is a linear combination of v1 , . . . , vk as I can
write
vi = 0v1 + · · · + 1vi + · · · + 0vk .
So each of the vectors vi , i = 1 . . . k is an element of span(S).
Let X be any other vector space that contains v1 , . . . , vk . Since
X must be closed under addition and scalar multiplication, it
must contain all linear combinations of v1 , . . . , vk . So X must
contain every element of span(S) and so span(S) ⊆ X.
The following Definition summarises the result & the notation:
Definition 1.5 If S = {v1 , . . . , vk } is a set of vectors in a vector
space V then the subspace span(S) of V consisting of all linear
combinations of the vectors in S is called the space spanned by S
and I say that the vectors S = {v1 , . . . , vk } span the subspace
span(S).
Example 1.5 Some examples of spanning sets:
• If v1 and v2 are two non-collinear (one is not a multiple of the
other) vectors in R3 then span{v1 , v2 } is the plane through the
origin containing v1 and v2 . (The normal n to this plane is
any multiple of v1 × v2 . This simple method for calculating the
normal only works in R3 .)
• Similarly if v is a non-zero vector in R2 , R3 or Rn , then
span({v}) is just the set of all scalar multiples of v and so is
the line through the origin determined by v, namely
L = {x ∈ Rn |x = αv}.
Example 1.6 A non-geometrical example; the polynomials
(monomials) 1, x, x2 , . . . , xn span the vector space Pn of
polynomials in x (Pn = {p | p(x) = a0 + a1 x + · · · + an xn } where the
coefficients ai are real/complex).
Check that Pn is a vector space as claimed under the ordinary
operations of addition and scalar multiplication, then confirm that
Pn is spanned by the set {1, x, x2 , . . . , xn }.
(Strictly speaking, Pn is the vector space of polynomial functions
from R to R of degree at most n.)
Example 1.7 Do the vectors v1 = (1, 1, 2)T , v2 = (1, 0, 1)T and
v3 = (2, 1, 3)T span R3 ?
Solution:
Can all vectors in R3 be written as a linear
combination of these 3 vectors? If so then for arbitrary b ∈ R3 ,
b = α1 v1 + α2 v2 + α3 v3 . Substituting for v1 , v2 and v3 , I have
[ b1 ]   [ 1 1 2 ] [ α1 ]
[ b2 ] = [ 1 0 1 ] [ α2 ]
[ b3 ]   [ 2 1 3 ] [ α3 ]
But the coefficient matrix has zero determinant so there do not
exist solutions α1 , α2 , α3 for every vector b.
So the vectors v1 , v2 , v3 do not span R3 . It is easy to see that the
vectors must be co-planar. Check that v1 · (v2 × v3 ) = 0 and
explain the significance of the result.
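Both the determinant calculation and the scalar triple product check
suggested above are easy to carry out in Octave/Matlab — a short
sketch:

% Example 1.7: do v1, v2, v3 span R^3?
v1 = [1 1 2]'; v2 = [1 0 1]'; v3 = [2 1 3]';
A = [v1 v2 v3];               % matrix with the three vectors as columns
disp(det(A))                  % 0, so the vectors do not span R^3
disp(dot(v1, cross(v2, v3)))  % scalar triple product: also 0 (coplanar vectors)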
1.3.1 Exercises
1. Which of the following sets are subspaces of Rn ?
(a) {(x1 , x2 , . . . , xn ) : x1 + 2x2 + · · · + nxn = 0}
(b) {(x1 , x2 , . . . , xn ) : x1 x2 . . . xn = 0}
(c) {(x1 , x2 , . . . , xn ) : x1 ≥ x2 ≥ x3 · · · ≥ xn }
(d) {(x1 , x2 , . . . , xn ) : x1 , x2 , . . . , xn are integers}
(e) {(x1 , x2 , . . . , xn ) : x1² + x2² + · · · + xn² = 0}
(f) {(x1 , x2 , . . . , xn ) : xi + xn+1−i = 0, i = 1, . . . , n}
1.4 Linear Independence
• A set S of vectors spans a given vector space V if every vector
in V is expressible as a linear combination of the vectors in S.
• In general there may be more than one way to express a vector
in V as a linear combination of the vectors in S.
• Intuitively this suggests that there is some redundancy in S
and this turns out to be the case.
• A spanning set that is “as small as possible” seems better.
• I will show that “minimal” spanning sets have the property
that each vector in V is expressible as a linear combination of
the spanning vectors in one and only one way.
• Spanning sets with this property play a role for general vector
spaces like that of coordinate axes in R2 and R3 .
Example 1.8 Take the three vectors v1 = (1, 0)T , v2 = (0, 1)T and
v3 = (1, 1)T . They certainly span R2 as any vector (x, y)T in the
plane can be written as xv1 + yv2 + 0v3 . But v3 isn’t needed, it is
“redundant”.
I could (with slightly more effort) write any vector (x, y)T in the
plane as a combination of v1 and v3 or v2 and v3 . Check .
So it looks like I need two vectors in the plane to span R2 ?
But not any two.
For example v1 = (1, 2)T , v2 = (2, 4)T are parallel so (for example)
v = (1, 3)T cannot be written as a combination of the two as it
doesn’t lie on the line through the origin y = 2x.
I need to sort out the ideas underlying this Example. I’ll define the
term “Linear Independence” on the next Slide — I’ll then show
that it means “no redundant vectors”.
Definition 1.6 If S = {v1 , . . . , vk } is a non-empty set of vectors
then the vector equation (note that the vi and the zero vector 0 are
typed in bold here to remind you that they are vectors)
α1 v1 + · · · + αk vk = 0
certainly has the solution α1 = α2 = · · · = αk = 0.
If this is the only solution, I say that the set S is linearly
independent — otherwise I say that S is a linearly dependent
set.
The procedure for determining whether a set of vectors {v1 , . . . , vk }
is linearly independent can be simply stated: check whether the
equation α1 v1 + · · · + αk vk = 0 has any solutions other than
α1 = α2 = · · · = αk = 0.
I’ll often say that I am checking that “the only solution is the
trivial solution”.
Example 1.9 Let v1 = (2, −1, 0, 3)T , v2 = (1, 2, 5, −1)T and
v3 = (7, −1, 5, 8)T . In fact, the set of vectors S = {v1 , v2 , v3 } is
linearly dependent as 3v1 + v2 − v3 = 0. But suppose that I
didn’t know this.
If I want to check for linear dependence I can just apply the
definition. I write α1 v1 + α2 v2 + α3 v3 = 0 and check whether the
set of equations for α1 , α2 , α3 has non-trivial solutions.
2α1 + α2 + 7α3 = 0
−α1 + 2α2 − α3 = 0
0α1 + 5α2 + 5α3 = 0
3α1 − α2 + 8α3 = 0
Check that Gauss Elimination (without pivoting) reduces the
coefficient matrix to:
[ 1 0 3 ]
[ 0 1 1 ]
[ 0 0 0 ]
[ 0 0 0 ]
Finally check that the linear system for α1 , α2 , α3 has infinitely
many solutions — confirming that v1 , v2 , v3 are linearly
dependent.
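The elimination can be checked in Octave/Matlab using the built-in
rref — a small sketch:

% Example 1.9: are v1, v2, v3 linearly dependent?
v1 = [2 -1 0 3]'; v2 = [1 2 5 -1]'; v3 = [7 -1 5 8]';
A = [v1 v2 v3];               % 4 x 3 coefficient matrix of the homogeneous system
rref(A)                       % [1 0 3; 0 1 1; 0 0 0; 0 0 0]: one free variable
% Taking alpha3 = 1 gives alpha1 = -3, alpha2 = -1, i.e. 3 v1 + v2 - v3 = 0:
disp(norm(-3 * v1 - v2 + v3)) % should be 0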
Example 1.10 A particularly simple example; consider the vectors
i = (1, 0, 0)T , j = (0, 1, 0)T ,k = (0, 0, 1)T in R3 . To confirm that
they are linearly independent , write α1 i + α2 j + α3 k = 0. I
immediately have (α1 , α2 , α3 )T = (0, 0, 0)T and so “the only
solution is the trivial solution”. Geometrically, of course, the three
vectors i, j and k are just unit vectors along the x, y and z
directions.
Example 1.11 Find whether v1 = (1, −2, 3)T , v2 = (5, 6, −1)T
and v3 = (3, 2, 1)T are linearly independent . Applying the
definition, write α1 v1 + α2 v2 + α3 v3 = 0 so
α1 + 5α2 + 3α3 = 0
−2α1 + 6α2 + 2α3 = 0
3α1 − α2 + α3 = 0.
Check that the solution to this linear system is α1 = −t/2,
α2 = −t/2 and α3 = t where t is an arbitrary parameter. I conclude
that the three vectors v1 , v2 , v3 are linearly dependent. (Why?)
Example 1.12 Show that the polynomials (monomials)
1, x, x2 , . . . , xn are a linearly independent set of vectors in Pn , the
vector space of polynomials of degree less than or equal to n.
Solution:
Assume as usual that some linear combination of the
vectors 1, x, x2 , . . . , xn is the zero vector. Then
a0 + a1 x + a2 x2 + · · · + an xn = 0     for all values of x.
But it is easy to show that all the ai ’s must be zero.
• Setting x = 0, I see that a0 = 0.
• So x (a1 + a2 x + · · · + an xn−1 ) = 0 for all values of x.
• If x ≠ 0 then a1 + a2 x + · · · + an xn−1 = 0 for all non-zero
values of x.
• Take the limit as x → 0 of a1 + a2 x + · · · + an xn−1 . The result
a1 must be zero as polynomials are continuous functions.
• So x (a2 + · · · + an xn−2 ) = 0 for all values of x.
• The argument can be continued to show that a2 , a3 , . . . , an
must all be zero.
So “the only solution is the trivial solution” and therefore
1, x, x2 , . . . , xn is a linearly independent set of vectors in Pn .
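An alternative, numerical way to see this (a sketch in Octave/Matlab
for the case n = 4): if a0 + a1 x + · · · + an xn = 0 for all x then in
particular it holds at any n + 1 distinct points, giving a square
homogeneous system whose coefficient matrix is a Vandermonde matrix
with non-zero determinant — so all the ai must be zero.

% Monomials 1, x, ..., x^n evaluated at n+1 distinct points.
n = 4;
x = (0:n)';               % n+1 distinct sample points
V = x .^ (0:n);           % V(i,j) = x_i^(j-1): an (n+1) x (n+1) Vandermonde matrix
disp(det(V))              % non-zero, so V*a = 0 forces a = 0
disp(size(null(V), 2))    % 0: no non-trivial coefficient vector a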
Enough examples — the term linearly dependent suggests that the
vectors “depend” on each other. The next theorem confirms this.
Theorem 1.5 A set S of two or more vectors is
(a) Linearly dependent if and only if at least one of the vectors in S
can be expressed as a linear combination of the remaining
vectors in S.
(b) Linearly independent if and only if no vector in S can be
written as a linear combination of the remaining vectors in S.
Proof:
(a) Let S = {v1 , . . . , vk } be a set of two or more vectors. Assume
that S is linearly dependent. I know that there are scalars
α1 , . . . , αk , not all zero, such that
α1 v1 + · · · + αk vk = 0.
Let αp be the first non-zero coefficient. I can solve for vp :
vp = −(1/αp ) (α1 v1 + · · · + αp−1 vp−1 + αp+1 vp+1 + · · · + αk vk ) .
So one of the vectors in S can be expressed as a linear
combination of the remaining vectors in S.
Conversely, assume that one of the vectors in S, vp say, can be
expressed as a linear combination of the remaining vectors in S.
Then
vp = β1 v1 + · · · + βp−1 vp−1 + βp+1 vp+1 + · · · + βk vk
and so I can write
β1 v1 + · · · + βp−1 vp−1 −vp + βp+1 vp+1 + · · · + βk vk = 0.
I have a “non-trivial” solution to the latter equation (coefficient
of vp is −1) so the set {v1 , . . . , vk } is linearly dependent.
(b) Check that the proof of this case is straightforward.
The following Theorem is easily proved.
Theorem 1.6 (a) A finite set of vectors that contains the zero
vector is linearly dependent.
(b) A set with exactly two vectors is linearly independent if and
only if neither vector is a scalar multiple of the other.
Proof: Check that the proof is (very) straightforward.
Example 1.13 Consider the set of real-valued functions on the
real line (written F(R)):
1. Check that F(R) is a vector space with addition and scalar
multiplication defined in the obvious way.
2. Consider the set of functions {x, sin x}. Check that the above
Theorem implies that this is a linearly independent set in F(R).
Some Geometric Comments
• In R2 and R3 , a set of two vectors is linearly independent iff
the vectors are not collinear.
• In R3 , a set of three vectors is linearly independent iff the
vectors are not coplanar.
I now prove an important result about Rn that can be extended to
general vector spaces.
Theorem 1.7 Let S = {v1 , . . . , vk } be a set of vectors in Rn . If
k > n, then S is linearly dependent.
Proof: Write vi = (vi1 , vi2 , . . . , vin )T for each vector vi ,
i = 1, . . . , k. Then if I write as usual
α1 v1 + · · · + αk vk = 0,
I get the linear system
v11 α1 + v21 α2 + · · · + vk1 αk = 0
v12 α1 + v22 α2 + · · · + vk2 αk = 0
  ⋮
v1n α1 + v2n α2 + · · · + vkn αk = 0
This is a homogeneous system of n equations in the k unknowns
α1 , . . . , αk with k > n. In other words; “more unknowns than
equations”. I know from Linear Algebra 1 that such a system must
have nontrivial solutions. (Or see App. B.) So S = {v1 , . . . , vk } is a
linearly dependent set.
I can extend these ideas to vector spaces of functions to get a
useful result. First, two definitions
Definition 1.7 If a function f has n continuous derivatives on R I
write this as f ∈ Cn (R).
and
Definition 1.8 (Wronskian Determinant) Given a set
{f1 , f2 , . . . , fn } of Cn−1 (R) functions, the Wronskian
determinant W(x) which depends on x is the determinant of
the matrix:
W(x) =  | f1 (x)         f2 (x)         . . .   fn (x)        |
        | f1′ (x)        f2′ (x)        . . .   fn′ (x)       |
        |    ⋮              ⋮                      ⋮          |
        | f1(n−1) (x)    f2(n−1) (x)    . . .   fn(n−1) (x)   |
Theorem 1.8 (Wronskian) If the Wronskian of a set of
Cn−1 (R) functions {f1 , f2 , . . . , fn } is not identically zero on R then
the set of functions forms a linearly independent set of vectors in
Cn−1 (R).
Proof: Suppose that the functions f1 , f2 , . . . , fn are linearly
dependent in C(n−1) (R). Then there exist scalars α1 , . . . , αn , not
all zero, such that
α1 f1 (x) + α2 f2 (x) + · · · + αn fn (x) = 0
for all x ∈ R. Now differentiate this equation n − 1 times in
succession, giving us n equations in the n unknowns α1 , . . . , αn :
α1 f1 (x) + α2 f2 (x) + · · · + αn fn (x) = 0
α1 f1′ (x) + α2 f2′ (x) + · · · + αn fn′ (x) = 0
  ⋮
α1 f1(n−1) (x) + α2 f2(n−1) (x) + · · · + αn fn(n−1) (x) = 0
I assumed that the functions are linearly dependent so this
homogeneous linear system must have a nontrivial solution for
every x ∈ R. This means that the corresponding n × n matrix is
not invertible and so its determinant — the Wronskian — is zero
for all x ∈ R.
Taking the contrapositive: if the Wronskian is not identically zero
on R then the given set of functions must be a linearly independent
set of vectors in C(n−1) (R).
Example 1.14 Let F = {x, sin x}. The elements of F are certainly
C1 (R). The Wronskian is
W(x) = | x   sin x |
       | 1   cos x |  =  x cos x − sin x.
Check that this function is not identically zero on R and so the set
F is linearly independent in C1 (R).
Example 1.15 Let G = {1, ex , e2x }. Again it is obvious that
G ⊆ C2 (R). Check that the Wronskian is:
W(x) = | 1   ex    e2x  |
       | 0   ex    2e2x |
       | 0   ex    4e2x |  =  2e3x .
This function is certainly not identically zero on R so the set G is
linearly independent in C2 (R).
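Both Wronskians are easy to evaluate numerically in Octave/Matlab;
a non-zero value at any single test point already settles the
question. A short sketch:

% Wronskians of Examples 1.14 and 1.15 at a sample point x.
x = 0.7;                                  % any convenient test point
W1 = det([x, sin(x); 1, cos(x)]);         % should equal x*cos(x) - sin(x)
disp([W1, x * cos(x) - sin(x)])
W2 = det([1, exp(x),     exp(2*x);
          0, exp(x), 2 * exp(2*x);
          0, exp(x), 4 * exp(2*x)]);      % should equal 2*exp(3*x)
disp([W2, 2 * exp(3 * x)])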
Note: the converse of the theorem is not true, i.e. even if the
Wronskian is identically zero on R, the functions are not necessarily
linearly dependent. Just to settle the point — a counter-example.
Example 1.16 Let
f(x) = 0                      for |x| ≤ 1
f(x) = (x + 1)2 (x − 1)2      for |x| > 1
and
g(x) = (x − 1)2 (x + 1)2 .
Then clearly f and g are in C1 (R):
f ′ (x) = 0                   for |x| ≤ 1
f ′ (x) = 4x(x − 1)(x + 1)    for |x| > 1.
I can show that lim x→±1 f ′ (x) = 0 so f ′ (x) is continuous on R as is f.
Finally, if |x| ≤ 1 then W(x) = 0 clearly as f(x) = 0 on [−1, 1]. If
|x| > 1 then W(x) = 0 clearly as f and g are linearly dependent for
|x| > 1. So W(x) = 0 on R.
However f and g are linearly independent on R because although
f − g = 0 on |x| > 1, f − g ≠ 0 for |x| < 1 — i.e. there is no
non-trivial choice of α and β such that αf(x) + βg(x) = 0 for all
x ∈ R.
1.4.1 Exercises
1. Let A =
[ 1  1  1  ]
[ 0  2  4  ]
[ 0  4  16 ]
Are the columns linearly independent? Are the rows?
2. Let S and T be subsets of a vector space V such that S ⊂ T .
Prove that a) if T is linearly independent, so is S and b) if S
is linearly dependent then so is T .
1.5 Basis and Dimension
We all know intuitively what we mean by dimension — a plane is
2–dimensional, the world that we live in is 3–dimensional, a line is
1–dimensional.
In this section I make the term precise and show that it can be
used when talking about a vector space.
You have already used i and j, the unit vectors along the x and y
directions respectively in R2 . Any vector v in R2 can be written
in one and only one way as a linear combination of i and j — if
v = (v1 , v2 )T ∈ R2 then v = v1 i + v2 j.
Similarly you have used i, j and k, the unit vectors along the x, y
and z directions respectively in R3 and any vector v in R3 can be
written in one and only one way as a linear combination of i, j and
k — if v = (v1 , v2 , v3 )T ∈ R3 then v = v1 i + v2 j + v3 k.
This idea can be extended in an obvious way to Rn ; if
v = (v1 , . . . , . . . , vn )T ∈ Rn then v = v1 e1 + v2 e2 + · · · + vn en
where (for each i = 1, . . . , n) ei is the unit vector whose
components are all zero except for the ith component which is one.
(Each vector ei is just the ith column of the n × n identity matrix
I.)
What is not so obvious is that this idea can be extended to sets of
vectors that are not necessarily lined up along the various standard
coordinate axes or even perpendicular. (Though I haven’t yet said
what I mean by “perpendicular” for a general vector space. )
I need a definition:
Definition 1.9 If V is any vector space and S = {v1 , . . . , vn } is a
set of vectors in V, then I say that S is a basis for V if
(a) S is linearly independent
(b) S spans V.
&
%
MS4105
'
65
$
• The next theorem explains the significance of the term “basis”.
• A basis is the vector space generalisation of a coordinate
system in R2 , R3 or Rn .
• Unlike e1 , . . . , en , the elements of a basis are not necessarily
perpendicular.
• I haven’t yet said formally what “perpendicular” means in a
vector space — I will clarify this idea in the next Chapter.
Theorem 1.9 If S = {v1 , . . . , vn } is a basis for a vector space V
then every vector v ∈ V can be expressed as a linear combination of
the vectors in S in one and only one way.
Proof: S spans the vector space so suppose that some vector
v ∈ V can be written as a linear combination of the vectors v1 , . . . , vn
in two different ways:
v = α1 v1 + · · · + αn vn
and
v = β1 v1 + · · · + βn vn .
Subtracting, I have
0 = (α1 − β1 )v1 + · · · + (αn − βn )vn .
But the set S is linearly independent so each coefficient must be
zero: αi = βi , for i = 1, . . . , n.
&
%
MS4105
'
67
$
I already know how to express a vector in Rn (say) in terms of the
“standard basis” vectors e1 , . . . , en . Let’s look at a less obvious
case.
&
%
MS4105
68
'
$
Example 1.17 Let v1 = (1, 2, 1)T , v2 = (2, 9, 0)T and
v3 = (3, 3, 4)T . Show that S = {v1 , v2 , v3 } is a basis for R3 .
Solution:
I must check that the set S is linearly independent and
that it spans R3 .
[Linear independence:] Write α1 v1 + α2 v2 + α3 v3 = 0 as
usual. Expressing the three vectors in terms of their
components, I find that the problem reduces to showing that the
homogeneous linear system
α1  + 2α2 + 3α3 = 0
2α1 + 9α2 + 3α3 = 0
α1        + 4α3 = 0
has only the trivial solution.
[Spanning:] I need to show that every vector
w = (w1 , w2 , w3 ) ∈ R3 can be written as a linear
combination of v1 , v2 , v3 . In terms of components;
α1  + 2α2 + 3α3 = w1
2α1 + 9α2 + 3α3 = w2
α1        + 4α3 = w3
So I need to show that this linear system has a solution for all
choices of the vector w.
Note that the same coefficient matrix appears in both the check for
linear independence and that for spanning. The condition needed in
both cases is that the coefficient matrix is invertible. It is easy to
check that the determinant of the coefficient matrix is −1 so the
vectors are linearly independent and span R3 .
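The determinant is a one-line check in Octave/Matlab (a sketch; the
value −1 confirms the claim):

% Example 1.17: v1, v2, v3 as the columns of a 3 x 3 matrix.
A = [1 2 3;
     2 9 3;
     1 0 4];
disp(det(A))    % -1 (non-zero): the columns are linearly independent and span R^3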
Example 1.18 Some other examples of bases, check each:
• S = {1, x, . . . , xn } is a basis for the vector space Pn of
polynomials of degree at most n.
• M consists of the four matrices
[ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ]
[ 0 0 ] , [ 0 0 ] , [ 1 0 ] , [ 0 1 ]
and is a basis for the vector space M22 of 2 × 2 matrices.
• The standard basis for Mmn , (the vector space of
m × n matrices) is just the set of mn different m × n matrices
with a single entry equal to 1 and all the others equal to zero.
• Can you find a basis for the “Strange Example” vector space in
Example 1.1?
Definition 1.10 I say that a vector space V is
finite-dimensional if I can find a basis set for V that consists of a
finite number of vectors. Otherwise I say that the vector space is
infinite-dimensional .
Example 1.19 The vector spaces Rn , Pn and Mmn are all finite
dimensional. The vector space F(R) of functions on R is infinite
dimensional. Check that for all positive integers n, I can find n + 1
linearly independent vectors in F(R). Explain why this means that
F(R) cannot be finite dimensional. (Hint: consider polynomials.)
I am still using the term “dimension” in a qualitative way (finite
vs. infinite). I need one more Theorem — then I will be able to
define the term unambiguously. I first prove a Lemma that will
make the Theorem trivial. (I will use the “underline” notation for
vectors to improve readability.)
Lemma 1.10 Let V be a finite-dimensional vector space. Let
L = {l1 , . . . , ln } be a linearly independent set in V and let
S = {s1 , . . . , sm } be a second subset of V which spans V. Then
m ≥ n or “any spanning set in V has at least as many
elements as any linearly independent set in V”.
“|S| ≥ |L|”— alphabetic ordering....
Proof: I will assume that m < n and show this leads to a
contradiction. As S spans V I can write
l1 = a11 s1 + a12 s2 + · · · + a1m sm
l2 = a21 s1 + a22 s2 + · · · + a2m sm
  ⋮                                              (1.1)
ln = an1 s1 + an2 s2 + · · · + anm sm
As m < n, the n × m coefficient matrix A = {aij } for i = 1, . . . , n
and j = 1, . . . , m is “tall and thin” and so the m × n matrix AT is
“wide and short”, so the linear system AT c = 0 where
c = (c1 , . . . , cn )T is a homogeneous linear system with more
unknowns than equations and so must have non-trivial solutions for
which not all of c1 , . . . , cn are zero. (See App. B for a proof of this.)
As AT c = 0, each element of the vector AT c is also zero. So I can
write (multiplying each si by the ith element of AT c):
(a11 c1 + a21 c2 + · · · + an1 cn ) s1
+ (a12 c1 + a22 c2 + · · · + an2 cn ) s2
  ⋮
+ (a1m c1 + a2m c2 + · · · + anm cn ) sm = 0.          (1.2)
Now the tricky bit! If I gather all the c1 ’s, c2 ’s etc. together then:
c1 (a11 s1 + a12 s2 + · · · + a1m sm )
+ c2 (a21 s1 + a22 s2 + · · · + a2m sm )
  ⋮
+ cn (an1 s1 + an2 s2 + · · · + anm sm ) = 0           (1.3)
But the sums in brackets are just l1 , . . . , ln by (1.1) which means
that I can write:
c1 l1 + c2 l2 + · · · + cn ln = 0
with c1 , . . . , cn not all zero. But this contradicts the assumption
that L = {l1 , . . . , ln } is linearly independent. Therefore I must have
n ≤ m as claimed.
In words the Lemma says that a linearly independent set in a
finite-dimensional vector space cannot have more elements than a
spanning set in the same finite-dimensional vector space.
The proof of the Lemma is messy because I am explicitly
writing out the sums.
The proof is much easier if I use “index” notation. See
Appendix A for an explanation of this notation and a proof using
it. Either proof is acceptable.
Now the Theorem — easily proved as promised.
Theorem 1.11 Given a finite-dimensional vector space V, all
its bases have the same number of vectors.
Proof:
• Let Bm and Bn be bases for V with m and n vectors
respectively.
• Obviously as Bm and Bn are bases, both are linearly
independent and span V.
• So I am free to choose either one to be L, (a linearly
independent set) and the other to be S (a spanning set).
• Remember that the Lemma says that “any spanning set in
V has at least as many elements as any linearly
independent set in V”.
• If I choose Bm (with m elements) as S (a spanning set) and Bn
(with n elements) as L (a linearly independent set) I have
m ≥ n.
• The sneaky part is to choose Bn as a spanning set S and Bm as
a linearly independent set L — effectively swapping n and m in
the Lemma.
• Then I must have n ≥ m.
• So I am forced to conclude that m = n. So any two bases for a
given finite-dimensional vector space must have the same
number of elements.
This allows us to define the term dimension:
Definition 1.11 The dimension of a finite dimensional vector
space V, written dim(V) is the number of vectors in any basis for
V. I define the dimension of the zero vector space to be zero.
Example 1.20 Some examples:
• dim(Rn ) = n as the standard basis has n vectors.
• dim(Pn ) = n + 1 as the standard basis {1, x, . . . , xn } has n + 1
vectors.
• dim(Mmn ) = mn as the standard basis discussed above has mn
vectors.
• What is the dimension of the “Strange Example” vector space
in Example 1.1?
Three more theorems to complete our discussion of bases and
dimensionality. The first is almost obvious but has useful
applications. In plain English, the Theorem says that
• adding an external vector to a linearly independent set does
not undo the linear independence property and
• removing a redundant vector from a set does not change the
span of the set.
Theorem 1.12 (Adding/Removing) Let S be a non-empty set
of vectors in a vector space V. Then
(a) If S is a linearly independent set and if v ∈ V is outside
span(S) (i.e. v cannot be expressed as a linear combination of
the vectors in S) then the set S ∪ v is still linearly
independent (i.e. adding v to the list of vectors in S does not
affect the linear independence of S).
(b) If v ∈ S is a vector that is expressible as a linear combination of
other vectors in S and if I write S \ v to mean S with the vector
v removed then S and S \ v span the same space, i.e.
span(S) = span(S \ v).
Proof:
(a) Assume that S = {v1 , . . . , vk } is a linearly independent set of
vectors in a vector space V and that v ∈ V but that v is not in
span(S). RTP that S′ = S ∪ {v} is still linearly independent. As
usual write
α1 v1 + · · · + αk vk + αk+1 v = 0
and try to show that all the αi are zero. But I must have
αk+1 = 0 as otherwise I could write v as a linear
combination of the vectors in S. So I have
α1 v1 + · · · + αk vk = 0.
The vectors v1 , . . . , vk are linearly independent so all the αi
are zero.
(b) Assume that S = {v1 , . . . , vk } is a set of vectors in a vector
space V and to be definite assume that
vk = α1 v1 + · · · + αk−1 vk−1 . Now consider the smaller set
S′ = S \ {vk } = {v1 , . . . , vk−1 }. Check that
span(S) = span(S′ ).
In general, to check that a set of vectors {v1 , . . . , vn } is a basis for a
vector space V, I need to show that they are linearly
independent and span V. But if I know that dim(V) = n then
checking either is enough! This is justified by the theorem:
Theorem 1.13 If V is an n-dimensional vector space and if S is a
set in V with exactly n elements then S is a basis for V if either S
spans V or S is linearly independent .
Proof: Assume that S has exactly n vectors and spans V. RTP
that the set S is linearly independent.
• But if this is not true, then there is some vector v in S which is
a linear combination of the others.
• If I remove this redundant vector v from S then by the
Adding/Removing Theorem 1.12 the smaller set of n − 1
vectors still spans V.
• If this smaller set is not linearly independent then repeat the
process (the process must terminate at k ≥ 1 as a set with only
one vector is certainly linearly independent ) until I have a set
of (say) k linearly independent vectors (k < n) that span V.
• But this is impossible as Thm 1.11 tells us that a set of fewer
than n vectors cannot form a basis for a n-dimensional vector
space.
So S is linearly independent.
Let B be any basis for V. Let S have exactly n vectors and be
linearly independent. RTP that S spans V. Assume not.
• Let C consist of the elements of the basis B not in span(S).
• Add elements of C one at a time to S — by the
Adding/Removing Theorem 1.12, the augmented versions of S
are still linearly independent.
• Continue until the augmented version of S spans V.
• As C is a finite set the process must terminate with the
augmented version of S spanning V and linearly independent.
• But this is impossible as Thm 1.11 tells us that no set of more
than n vectors in an n-dimensional vector space can be
linearly independent.
So S spans V.
Example 1.21 Show that the vectors v1 = (2, 0, −1)T ,
v2 = (4, 0, 7)T and v3 = (−1, 1, 4)T form a basis for R3 . By
Theorem 1.13 I need only check that the vectors are linearly
independent. By inspection v1 and v2 are linearly independent, and
v3 is outside the x–z plane in which v1 and v2 lie, so by the
Adding/Removing Theorem 1.12(a) the set of all three is linearly
independent.
The final theorem in this Chapter is often used in Matrix Algebra;
it says that for a finite-dimensional vector space V, every set that
spans V has a subset that is a basis for V — and that every linearly
independent set in V can be expanded to form a basis for V.
Theorem 1.14 Let S be a finite set of vectors in a
finite-dimensional vector space V.
(a) If S spans V but is not a basis for V then S can be reduced to a
basis for V by discarding some of the redundant vectors in S.
(b) If S is a linearly independent set that is not a basis for V then S
can be expanded to form a basis for V by adding certain
external vectors to S.
Proof:
(a) If S is a set of vectors that spans V but is not a basis for V
then it must be linearly dependent. Some vector v in S must be
expressible as a linear combination of some of the others. By
the Adding/Removing Theorem 1.12(b) I can remove v from S
and the resulting set still spans V. If this set is linearly
independent I am done, otherwise remove another “redundant”
vector.
Let dim(V) = n. If the size of S (written |S|) is less than n then
the process must stop at a linearly independent set of size k,
1 ≤ k < n which spans V. This is impossible (why?).
So |S| > n. Apply the removal process — it must continue until
a set of n vectors that spans V is reached. (Why can there not
be a linearly independent spanning set of size greater than n?)
By Thm. 1.13 this subset of S is linearly independent and a
basis for V as claimed.
&
%
MS4105
'
90
$
(b) Suppose that dim(V) = n. If S is a linearly independent set
that is not a basis for V then S fails to span V and there must
be some vector v ∈ V that is not in span(S). By the
Adding/Removing Theorem 1.12(a) I can add v to S while
maintaining linear independence. If the new set S′ (say) spans
V then S′ is a basis for V. Otherwise I can select another
suitable vector to add to S′ to produce a set S′′ that is still
linearly independent. I can continue adding vectors in this way
till I reach a set with n linearly independent vectors in V. This
set must be a basis by Thm. 1.13.
1.5.1 Exercises
1. Let B = {v1, v2, . . . , vn} be a basis for the vector space V. Let 1 < m < n and let V1 be the subspace spanned by {v1, v2, . . . , vm} and let V2 be the subspace spanned by {vm+1, vm+2, . . . , vn}. Prove that V1 ∩ V2 = {0}.
2. Let W be the set of all 3 × 3 real matrices M with the property
that all matrices in W have equal row & column sums.
(Meaning if I take any vector (matrix) in W all its row sums
are equal to some constant C and all its column sums are equal
to the same constant C.)
(a) Prove that W is a subspace of the vector space of 3 × 3 real
matrices. (Easy.)
(b) Find a basis for this subspace. (Difficult.)
(c) Determine the dimension of the subspace W. (Easy once (b)
completed or just by thinking about the problem.)
(d) Let W0 be the set of all 3 × 3 matrices whose row sums and
column sums are zero. Prove that W0 is a subspace of W
and determine its dimension. (Easy once (b) completed.)
(e) Generalise to the case of n × n real matrices with equal row
and column sums.
2 Inner Product Spaces
You should be familiar with the idea of a “dot product” in R2 ,
R3 and Rn . A reminder: given vectors u and v in R2 , R3 and Rn ;
u · v = Σ_{i=1}^n ui vi.    (2.1)
Example 2.1 Let u = (1, 2, 3)T and v = (−1, 2, −1)T . Then
u · v = (1)(−1) + (2)(2) + (3)(−1) = 0.
You learned in Linear Algebra 1 that if (as in the example) u · v = 0, the two vectors are perpendicular: cos θ = (u · v)/(‖u‖‖v‖) = 0, where ‖u‖ = √(u · u).
I will show in this Chapter how these ideas — length, distance and
angle — can be extended to any vector space (satisfying the 10
vector space axioms in Def. 1.2) if some extra axioms hold.
A vector space satisfying these four extra axioms is called an
Inner Product Space.
2.1 Inner Products
• In Linear Algebra 1 you probably used the standard “dot”
notation for the Euclidean inner product u · v of two vectors in
Rn — (2.1).
• When talking about the general inner product of two vectors from a general inner product space I will use the notation ⟨u, v⟩.
Definition 2.1 An inner product on a real vector space V is a function that assigns a real number ⟨u, v⟩ to every pair of vectors u and v in V so that the following axioms are satisfied for all vectors u, v and w and all scalars α:
1. ⟨u, v⟩ = ⟨v, u⟩    (Symmetry Axiom)
2. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩    (Distributive Axiom)
3. ⟨αu, v⟩ = α⟨u, v⟩    (Homogeneity Axiom)
4. ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 if and only if u = 0.
A real vector space with an inner product as defined above is called
a real inner product space. The slightly more general
complex inner product space will be defined later.
The four axioms for an inner product space are based on the
properties of the Euclidean inner product so the Euclidean inner
product (or “dot product”) automatically satisfies the axioms for
an inner product space :
Example 2.2 If u = (u1, . . . , un)T and v = (v1, . . . , vn)T are vectors in Rn then defining
⟨u, v⟩ = u · v = Σ_{i=1}^n ui vi
defines ⟨u, v⟩ to be the Euclidean inner product on Rn. Check that the four axioms Def. 2.1 hold.
Example 2.3 A slight generalisation of the Euclidean inner
product on Rn is a weighted Euclidean inner product on Rn . If
u = (u1 , . . . , un )T and v = (v1 , . . . , vn )T are vectors in Rn and wi
are a set of positive weights then defining
⟨u, v⟩ = Σ_{i=1}^n wi ui vi
check that ⟨u, v⟩ is an inner product.
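A sketch of the weighted inner product in Octave/Matlab (the particular weights here are only illustrative):

    w  = [1; 2; 3];                 % positive weights (illustrative)
    ip = @(u,v) sum(w .* u .* v);   % <u,v> = w1*u1*v1 + ... + wn*un*vn
    u = [1; 2; 3];  v = [-1; 2; -1];
    ip(u,v)                         % = -1 + 8 - 9 = -2
    ip(u,v) == ip(v,u)              % symmetry holds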
2.1.1 Exercises
1. Verify that the following is an inner product on R2:
⟨(a1, a2), (b1, b2)⟩ = a1b1 − a2b1 − a1b2 + 4a2b2
2. Let u = (3, −1, 0, 1/2)T, v = (−1, 3, −1, 1)T, w = (0, 2, −2, −1)T be vectors in R4. Calculate the inner products ⟨u, v⟩, ⟨v, w⟩ and ⟨w, u⟩.
3. Check that ⟨A, B⟩ = trace(BT A) is an inner product on the vector space of real n × n matrices, where trace(A) = Σ_{i=1}^n aii.
4. Let V be the vector space of continuous real-valued functions on [0, 1]. Define ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt. Verify that this is an inner product on V.
5. In Example 2.3 above; if I weaken the requirement that the weights wi are positive, do I still have an IPS? Why/why not?
6. If (as below) I define ‖x‖² = ⟨x, x⟩, prove that the Parallelogram law
‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²)    (2.2)
holds in any inner product space. (Also draw a sketch to illustrate the result in R2 with the Euclidean Inner Product.)
2.1.2 Length and Distance in Inner Product Spaces
I said that inner product spaces allow us to extend the ideas of
length and distance from Rn to any vector space (once an inner
product is defined). In Rn , the (Euclidean) length of a vector u is
just
‖u‖ = √(u · u)    (2.3)
and the Euclidean distance between two points (or vectors) u = (u1, . . . , un) and v = (v1, . . . , vn) is
d(u, v) = ‖u − v‖ = √((u − v) · (u − v)).    (2.4)
Note In Rn the distinction between a point (defined by a set of
n coordinates) and a vector (defined by a set of n components) is
just a matter of notation or interpretation. A “position vector”
v = (v1 , . . . , vn )T can be interpreted as a translation from the
origin to the “point” v whose coordinates are (v1 , . . . , vn ).
Figure 1: Position Vector/Point Equivalence in R2 (the position vector v = (v1, v2)T ends at the point (v1, v2))
Based on these formulas for Rn (“Euclidean n-space”), I can make
corresponding definitions for a general Inner Product Space.
Definition 2.2 If V is an inner product space then the norm (“length”) of a vector u is written ‖u‖ and defined by:
‖u‖ = √⟨u, u⟩.
The distance between two points/vectors u and v is written d(u, v) and is defined by:
d(u, v) = ‖u − v‖.
Example 2.4 Check that Euclidean n-space is an inner product
space.
Example 2.5 For the weighted Euclidean inner product check that ‖u‖² = Σ_{i=1}^n wi ui². (Note that I often write the formula for ‖u‖² rather than ‖u‖ to avoid having to write the square root symbol.)
Note: I cannot yet prove the list of properties of vector norms given
in Thm. 2.3 below as the proof requires the Cauchy-Schwarz
inequality (2.7) which is proved in the next Section.
2.1.3 Unit Sphere in Inner Product Spaces
If V is an inner product space then the set of vectors that satisfy ‖u‖ = 1 is called the unit sphere in V. A vector on the unit sphere is called a unit vector. In R2 and R3 with the standard Euclidean inner product, the unit sphere is just a unit circle or unit sphere respectively. But even in R2, R3 and Rn (n ≥ 3), different inner products give rise to different geometry.
Example 2.6 Consider R2 with the standard Euclidean inner product. Then the unit sphere ‖u‖ = 1 is just the set of vectors u ∈ R2 such that u1² + u2² = 1, the familiar unit circle. But if I use a weighted inner product, say w1 = 1 and w2 = 1/4, then the unit sphere ‖u‖ = 1 corresponds to the set u1² + u2²/4 = 1, an ellipse whose semi-major axis (length 2) is along the y-direction and semi-minor axis (length 1) along the x-direction.
Example 2.7 An important class of inner products on Rn is the
class of inner products generated by matrices. Let A be an n × n
invertible matrix. Then check that
⟨u, v⟩ = (Au) · (Av)
defines an inner product for Rn .
The exercise is easier if I note that the Euclidean inner product
u · v = vT u where — as noted earlier — I treat all vectors as
column vectors and so vT is a row vector and so vT u is the
matrix product of a 1 × n matrix and a n × 1 matrix. The result
is of course a 1 × 1 matrix — a scalar.
Using this insight, I can write ⟨u, v⟩ = (Av)T(Au) = vT AT A u.
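A sketch of the matrix-generated inner product in Octave/Matlab (A here is just an arbitrary invertible example):

    A  = [2 1; 0 3];              % any invertible matrix
    ip = @(u,v) (A*v)' * (A*u);   % <u,v> = (Av)^T (Au) = v^T A^T A u
    u = [1; 2];  v = [3; -1];
    ip(u,v) - v' * (A'*A) * u     % = 0: the two expressions agree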
Example 2.8 I can define inner products on Pn (the vector space
of polynomials of degree n) in more than one way. Suppose that
vectors p and q in Pn can be written p(x) = p0 + p1 x + . . . pn xn
and q(x) = q0 + q1 x + . . . qn xn . Then define
⟨p, q⟩ = p0q0 + p1q1 + · · · + pnqn
and check that this is an inner product.
The norm of the polynomial p with respect to this inner product is given by ‖p‖² = ⟨p, p⟩ = Σ_{i=0}^n pi² and the unit sphere in Pn with this inner product is the set of coefficients that satisfy the equation Σ_{i=0}^n pi² = 1. So, for example, if I take n = 2 then the quadratic q(x) = (1/√3)(1 + x + x²) is a unit vector in P2, the inner product space of polynomials of degree 2 with the inner product described.
Example 2.9 An important example: given two functions f and g
in C([a, b]) — the vector space of continuous functions on the
interval [a, b] — define
⟨f, g⟩ = ∫_a^b f(x)g(x) dx.
It is easy to check that the four inner product space axioms Def. 2.1 hold.
Example 2.10 This allows us to define a norm on C([a, b]):
‖f‖² = ⟨f, f⟩ = ∫_a^b f²(x) dx.
Some simple properties of inner products:
Theorem 2.1 If u, v and w are vectors in a real inner product space and α ∈ R then (0 is the zero vector, not the number 0)
(a) ⟨0, v⟩ = 0
(b) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩
(c) ⟨u, αv⟩ = α⟨u, v⟩
(d) ⟨u − v, w⟩ = ⟨u, w⟩ − ⟨v, w⟩
(e) ⟨u, v − w⟩ = ⟨u, v⟩ − ⟨u, w⟩
Proof: Check that all are simple consequences of the four inner product space axioms Def. 2.1.
Example 2.11 Doing algebra in inner product spaces is straightforward using the defining axioms and this theorem: try to “simplify” ⟨u − 2v, 3u + 4v⟩.
2.2 Angles and Orthogonality in Inner Product Spaces
I will show that angles can be defined in a general inner product
space. Remember that in Linear Algebra 1 you saw that given two
vectors u and v in Rn , you could write
u · v = ‖u‖‖v‖ cos θ
or equivalently
cos θ = (u · v)/(‖u‖‖v‖).    (2.5)
It would be reasonable to define the cosine of the angle between two vectors in an inner product space to be given by the inner product space version of (2.5):
cos θ = ⟨u, v⟩/(‖u‖‖v‖).    (2.6)
But −1 ≤ cos θ ≤ 1 so for this (2.6) definition to work I need |cos θ| ≤ 1 or:
|⟨u, v⟩| / (‖u‖‖v‖) ≤ 1.
This result is called the Cauchy-Schwarz Inequality and holds in any inner product space. I state it as a theorem:
Theorem 2.2 (Cauchy-Schwarz) If u and v are vectors in a real inner product space then
|⟨u, v⟩| ≤ ‖u‖‖v‖.    (2.7)
Proof: If v = 0 then both sides of the inequality are zero. So assume that v ≠ 0. Consider the vector w = u + tv where t ∈ R. By the non-negativity of ⟨w, w⟩ I have for any real t:
0 ≤ ⟨w, w⟩ = ⟨u + tv, u + tv⟩ = t²⟨v, v⟩ + 2t⟨u, v⟩ + ⟨u, u⟩ = at² + 2bt + c
with a = ⟨v, v⟩, b = ⟨u, v⟩ and c = ⟨u, u⟩. The quadratic coefficient a is non-zero as I have taken v ≠ 0.
The quadratic at² + 2bt + c either has no real roots or a double real root so its discriminant must satisfy b² − ac ≤ 0. Substituting for a, b and c I find that ⟨u, v⟩² − ⟨u, u⟩⟨v, v⟩ ≤ 0 and so
⟨u, v⟩² ≤ ⟨u, u⟩⟨v, v⟩ = ‖u‖²‖v‖².
Taking square roots (using the fact that ⟨u, u⟩ and ⟨v, v⟩ are both non-negative) gives us
|⟨u, v⟩| ≤ √(⟨u, u⟩⟨v, v⟩) = ‖u‖‖v‖.
(Note: as ⟨u, v⟩ is real in a real inner product space, here |⟨u, v⟩| means the absolute value of ⟨u, v⟩.)
An alternative proof that will also work for a complex inner product space — again assume that v ≠ 0:
Proof: Again (let t ∈ R)
0 ≤ ⟨w, w⟩ ≡ ⟨u − tv, u − tv⟩ = t²⟨v, v⟩ − 2t⟨u, v⟩ + ⟨u, u⟩.
The sneaky part: take t = ⟨u, v⟩/⟨v, v⟩ — substituting gives
0 ≤ ⟨u, v⟩²/⟨v, v⟩ − 2⟨u, v⟩²/⟨v, v⟩ + ⟨u, u⟩ = ⟨u, u⟩ − ⟨u, v⟩²/⟨v, v⟩.
Therefore, multiplying across by ⟨v, v⟩, the result follows.
Note The choice t = ⟨u, v⟩/⟨v, v⟩ in this proof means that I have chosen w = u − (⟨u, v⟩/⟨v, v⟩) v. But (⟨u, v⟩/⟨v, v⟩) v = ‖u‖ cos θ (v/‖v‖), the projection of u along v, written Projv u. So w = Projv⊥ u, the projection of u along the direction perpendicular to v, as can be checked by calculating ⟨v, w⟩. Check that the result is zero. I will discuss projections and projection operators in detail later.
Figure 2: Projection of u perpendicular to v
The properties of length and distance in a general inner product
space are the same as those for Rn as the following two Theorems
confirm.
Theorem 2.3 (Properties of Vector Norm) If u and v are vectors in an inner product space V and if α ∈ R then
(a) ‖u‖ ≥ 0
(b) ‖u‖ = 0 if and only if u = 0.
(c) ‖αu‖ = |α|‖u‖
(d) ‖u + v‖ ≤ ‖u‖ + ‖v‖    (the Triangle Inequality)
Proof: Check that (a), (b) and (c) are trivial consequences of the
inner product space axioms.
To prove (d):
‖u + v‖² = ⟨u + v, u + v⟩
= ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩
≤ ⟨u, u⟩ + 2|⟨u, v⟩| + ⟨v, v⟩
≤ ⟨u, u⟩ + 2‖u‖‖v‖ + ⟨v, v⟩    (by the C-S inequality)
= ‖u‖² + 2‖u‖‖v‖ + ‖v‖²
= (‖u‖ + ‖v‖)².
Taking square roots gives the required result.
Note: an important variation on the Triangle Inequality is:
‖u − v‖ ≥ ‖u‖ − ‖v‖    (2.8)
which is often used in calculus proofs. Check it!
Theorem 2.4 (Properties of Distance) If u, v and w are vectors in an inner product space V and if α ∈ R then the distance between u and v (remember that d(u, v) = ‖u − v‖) satisfies the identities:
(a) d(u, v) ≥ 0
(b) d(u, v) = 0 if and only if u = v.
(c) d(u, v) = d(v, u)
(d) d(u, v) ≤ d(u, w) + d(w, v)    (the Triangle Inequality)
Proof: Check that (a), (b) and (c) are trivial consequences of the
inner product space axioms. Exercise: prove (d).
I can now define the angle between two vectors by (2.6) which I repeat here:
Definition 2.3 Given two vectors u and v in an inner product space V, define the angle θ between u and v by
cos θ = ⟨u, v⟩/(‖u‖‖v‖).
Thanks to the Cauchy-Schwarz Inequality (2.7) I know that this definition ensures that |cos θ| ≤ 1, as must be the case for the cosine to be well-defined.
Example 2.12 Let R4 have the Euclidean inner product. Let u = (4, 3, 1, −2)T and v = (−2, 1, 2, 3)T. Check that the angle between u and v is arccos(−3/(2√15)).
When two vectors in Rn have the property u · v = 0 I say that they
are orthogonal. Geometrically this means that the angle between
them is π/2 and so cos θ = 0.
I can now extend this idea to vectors in a general inner product
space.
Definition 2.4 Two vectors in an inner product space are orthogonal if ⟨u, v⟩ = 0.
Example 2.13 Consider the vector space C([−1, 1]) (continuous functions on [−1, 1]) with the inner product ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx. Let f(x) = x and g(x) = x². Then it is easy to check that ⟨f, g⟩ = 0 and so the two vectors are orthogonal.
Example 2.14 Given any odd function f and any even function g,
both in C([−1, 1]) with the inner product of the previous example,
check that f and g are orthogonal.
• As these two examples show, orthogonality in a general inner
product space is a much more abstract idea than that of (say)
two vectors in R2 being perpendicular.
• However I will show that an orthogonal basis is the natural
generalisation of a set of perpendicular coordinate vectors (e.g.
i, j and k in R3 ).
2.2.1 Exercises
1. Prove that if u and v are vectors in a (real) inner product space then:
⟨u, v⟩ = (1/4)‖u + v‖² − (1/4)‖u − v‖².    (2.9)
The question seems to say that if I can define a norm in a vector space then automatically I have an inner product space. The situation is not that simple. In fact there is an important result (the Jordan von Neumann Lemma) which says that if I define an “inner product” using the formula in the previous question then, provided the Parallelogram law (2.2) holds for all u, v ∈ V, ⟨u, v⟩ as defined in (2.9) satisfies all the axioms for an inner product space. Without the Parallelogram law, the inner product axioms do not hold.
2. Show that max_{1≤i≤n}(|ui| + |vi|) ≤ max_{1≤i≤n}|ui| + max_{1≤i≤n}|vi|.
3. The ∞ norm (to be discussed later in Section 4.3.1) is defined by ‖v‖∞ = max_{1≤i≤n}|vi|. Check that it satisfies the norm properties 2.3. (To prove the Triangle Inequality for the ∞-norm, you’ll need the result of the previous Exercise.)
4. Check that the Parallelogram Law does not hold for this norm and so I cannot derive an inner product using (2.9). (Hint: use the vectors u = (1, 1)T and v = (1, 0)T.)
5. (Difficult) Prove the Jordan von Neumann Lemma for a real inner product space:
Lemma 2.5 (Jordan von Neumann) If V is a real vector space and ‖ · ‖ is a norm defined on V then, provided the Parallelogram law (2.2) holds for all u, v ∈ V, the expression (2.9) defines an inner product on V.
See App. C for a proof.
2.3 Orthonormal Bases
I will show that using an orthogonal basis set for an inner product
space simplifies problems — analogous to choosing a set of
coordinate axes in Rn . I formally define some terms:
Definition 2.5 A set of vectors in an inner product space is called
an orthogonal set if each vector in the set is orthogonal to every
other. If all the vectors in an orthogonal set are unit vectors (‖u‖ = 1) the set is called an orthonormal set.
Example 2.15 Let v1 = (0, 1, 0)T, v2 = (1, 0, 1)T and v3 = (1, 0, −1)T. The set S = {v1, v2, v3} can easily be checked to be orthogonal wrt the Euclidean inner product. The vectors can be normalised by dividing each by its norm, e.g. ‖v1‖ = 1 so set u1 = v1, ‖v2‖ = √2 so define u2 = v2/√2 and ‖v3‖ = √2 so define u3 = v3/√2. The set S′ = {u1, u2, u3} is an orthonormal set.
Example 2.16 The most familiar example of an orthonormal set
is the set {e1 , . . . , en } in Rn where as usual ei is a column vector of
all zeroes except for its ith entry which is 1.
An important idea is that of coordinates wrt orthonormal bases.
The following theorem explains why:
Theorem 2.6 If S = {u1 , . . . , un } is an orthonormal basis for an
inner product space V and v is any vector in V then
v = ⟨v, u1⟩u1 + · · · + ⟨v, un⟩un.
Proof: As S is a basis it spans V, so any vector v ∈ V can be
written as a linear combination of u1 , . . . , un :
v = α1 u1 + · · · + αn un .
RTP that αi = ⟨v, ui⟩ for i = 1, . . . , n. For each vector ui ∈ S I have:
⟨v, ui⟩ = ⟨α1u1 + · · · + αnun, ui⟩
= α1⟨u1, ui⟩ + α2⟨u2, ui⟩ + · · · + αn⟨un, ui⟩.
But the vectors ui are orthonormal so that ⟨ui, uj⟩ = 0 unless i = j and ⟨ui, ui⟩ = 1. So the “only term that survives” in the expression for ⟨v, ui⟩ is αi, giving us αi = ⟨v, ui⟩ as required.
Example 2.17 Given the vectors u1 = (0, 1, 0)T, u2 = (−4/5, 0, 3/5)T and u3 = (3/5, 0, 4/5)T it is easy to check that S = {u1, u2, u3} is an orthonormal basis for R3 with the Euclidean inner product. Then express the vector (1, 1, 1)T as a linear combination of the vectors in S. Check that this is easy using Thm. 2.6.
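A sketch of Example 2.17 in Octave/Matlab: by Thm. 2.6 the coefficients are just the inner products ⟨v, ui⟩.

    u1 = [0; 1; 0];  u2 = [-4/5; 0; 3/5];  u3 = [3/5; 0; 4/5];
    v  = [1; 1; 1];
    a  = [u1'*v; u2'*v; u3'*v]      % coefficients <v,u_i> = 1, -1/5, 7/5
    a(1)*u1 + a(2)*u2 + a(3)*u3     % reproduces v = (1,1,1)^T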
I can state a Theorem that summarises many of the nice properties that orthonormal bases have:
Theorem 2.7 If S is an orthonormal basis for an n-dimensional inner product space V and if the vectors u, v ∈ V have components {ui}_{i=1}^n and {vi}_{i=1}^n wrt S then
(a) ‖u‖² = Σ_{i=1}^n ui².
(b) d(u, v) = √(Σ_{i=1}^n (ui − vi)²).
(c) ⟨u, v⟩ = Σ_{i=1}^n ui vi.
Proof: Left as an exercise.
It is easy to check that:
Theorem 2.8 If S is an orthogonal set of (non-zero) vectors in an
inner product space then S is linearly independent.
Proof: Left as an exercise.
Before seeing how to calculate orthonormal bases I need one more
idea — orthogonal projections.
Definition 2.6 (Orthogonal Projection) Given an inner product space V and a finite-dimensional subspace W of V then if {u1, . . . , uk} is an orthogonal basis for W; for any v ∈ V define ProjW v, the orthogonal projection of v onto W, by
ProjW v = Σ_{i=1}^k (⟨v, ui⟩/‖ui‖²) ui    (2.10)
or
ProjW v = (⟨v, u1⟩/‖u1‖²) u1 + · · · + (⟨v, uk⟩/‖uk‖²) uk.
Geometrically, the projection of v onto W is just that component of v that lies in W. A second definition:
Definition 2.7 (Orthogonal Component) Given the preamble to the previous Definition, the component of v orthogonal to W is just
ProjW⊥ v = v − ProjW v.
Notice that by definition
v = ProjW v + ProjW⊥ v
so the vector v is decomposed into a part in W and a second part. I expect from the name that ProjW v is orthogonal to ProjW⊥ v — in other words that ⟨ProjW v, ProjW⊥ v⟩ = 0.
Let’s check:
⟨ProjW v, ProjW⊥ v⟩ = ⟨ProjW v, v − ProjW v⟩
= ⟨ProjW v, v⟩ − ⟨ProjW v, ProjW v⟩
= [ (⟨v, u1⟩/‖u1‖²)⟨u1, v⟩ + · · · + (⟨v, uk⟩/‖uk‖²)⟨uk, v⟩ ]
− [ (⟨v, u1⟩²/‖u1‖⁴)‖u1‖² + · · · + (⟨v, uk⟩²/‖uk‖⁴)‖uk‖² ]
= 0.
To summarise:
• ProjW v is the component of the vector v in the subspace W,
• ProjW ⊥ v is the component of vector v orthogonal to W.
2.3.1 Calculating Orthonormal Bases
I have seen some examples of the simplifications that result from
using orthonormal bases. I will now present an algorithm which
constructs an orthonormal basis for any nonzero finite-dimensional
inner product space — the description of the algorithm forms the
proof of the following Theorem:
Theorem 2.9 (Gram-Schmidt Orthogonalisation) Every
non-zero finite-dimensional inner product space has an
orthonormal basis.
Proof: Let V be any non-zero finite-dimensional inner product
space and suppose that {v1 , . . . , vn } is a basis for V. RTP that V
has an orthogonal basis. The following algorithm is what I need:
Algorithm 2.1
(1) Gram-Schmidt Orthogonalisation Process
(2) begin
(3)   u1 = v1
(4)   W1 = span(u1)
(5)   u2 = ProjW1⊥ v2 ≡ v2 − ProjW1 v2 ≡ v2 − (⟨v2, u1⟩/‖u1‖²) u1
(6)   W2 = span(u1, u2)
(7)   u3 = ProjW2⊥ v3 ≡ v3 − ProjW2 v3 ≡ v3 − (⟨v3, u1⟩/‖u1‖²) u1 − (⟨v3, u2⟩/‖u2‖²) u2
(8)   W3 = span(u1, u2, u3)
(9)   while (i ≤ n) do
(10)    ui = ProjWi−1⊥ vi ≡ vi − ProjWi−1 vi ≡ vi − Σ_{j=1}^{i−1} (⟨vi, uj⟩/‖uj‖²) uj
(11)  end
(12) end
At the first step — (3) — I set u1 = v1 .
At each subsequent step — namely (5) and (7) — I calculate the
component of vi orthogonal to the space spanned by the already
calculated vectors u1 , . . . , ui−1 .
At line (9) I “automate” the algorithm to continue until all the
vectors v4 , . . . , vn have been processed. Note that at each step I
am guaranteed that the new vector ui ≠ 0, as otherwise vi would be a linear combination of u1, . . . , ui−1, each of which is a linear combination of v1, . . . , vi−1; this cannot be, as I am given that v1, . . . , vn are linearly independent.
So the algorithm is guaranteed to generate an orthogonal basis for the inner product space V, and normalising each ui then gives an orthonormal basis.
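A sketch of the process in Octave/Matlab (the function name and the final normalisation step are my additions; the columns of V are the given basis):

    function U = gram_schmidt(V)
      % Columns of V: a linearly independent set v1,...,vk.
      % Columns of U: the corresponding orthonormal set u1,...,uk.
      [n, k] = size(V);
      U = zeros(n, k);
      for i = 1:k
        u = V(:, i);
        for j = 1:i-1
          u = u - (U(:, j)' * V(:, i)) * U(:, j);  % subtract the projection onto u_j
        end
        U(:, i) = u / norm(u);                     % normalise
      end
    end

For instance, gram_schmidt([2 1 0; 1 0 -2; 0 2 1]) reproduces, after normalisation, the vectors found in Example 2.18 below.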
Example 2.18 Consider R3 with the Euclidean inner product.
Apply the G-S orthogonalisation process to transform the linearly
independent vectors: v1 = (2, 1, 0)T , v2 = (1, 0, 2)T and
v3 = (0, −2, 1)T into an orthonormal basis for R3 .
• Set u1 = v1 = (2, 1, 0)T.
• Set
u2 = v2 − (⟨v2, u1⟩/‖u1‖²) u1
= (1, 0, 2)T − (2/5)(2, 1, 0)T = (1/5, −2/5, 2)T
• Set
u3 = v3 − (⟨v3, u1⟩/‖u1‖²) u1 − (⟨v3, u2⟩/‖u2‖²) u2
= (0, −2, 1)T − (−2/5)(2, 1, 0)T − ((14/5)/(21/5))(1/5, −2/5, 2)T
= (0, −2, 1)T + (2/5)(2, 1, 0)T − (2/3)(1/5, −2/5, 2)T
= (10/15, −20/15, −5/15)T = (2/3, −4/3, −1/3)T.
Check that indeed the vectors u1 , u2 and u3 are orthogonal.
2.3.2 Exercises
1. Verify that the set
{(2/3, 2/3, 1/3), (2/3, −1/3, −2/3), (1/3, −2/3, 2/3)} forms an
orthonormal set in R3 . Express the vectors of the standard
basis of R3 as linear combinations of these vectors.
2. The topic of Fourier Series gives a nice illustration of an orthogonal set of functions. Let V be the inner product space of real-valued functions that are integrable on the interval (−π, π) where the inner product is ⟨f, g⟩ = ∫_{−π}^{π} f(t)g(t) dt so that ‖f‖² = ∫_{−π}^{π} f(t)² dt.
(a) Check that the (infinite) set S = {1, cos(t), cos(2t), . . . , sin(t), sin(2t), . . .} is an orthogonal set. (Integrate twice by parts.)
(b) Show that I can normalise the set S to get an orthonormal set
S′ = {1/√(2π), (1/√π) cos(t), (1/√π) cos(2t), . . . , (1/√π) sin(t), (1/√π) sin(2t), . . .}
3. Notice that I cannot say that S′ is an orthonormal basis for V, the inner product space of real-valued functions that are integrable on the interval (−π, π), as I need to show that it spans V; but I can show that it is linearly independent using Thm. 2.8 — why?
4. The Fourier Series for a function f is
F(t) = α0/√(2π) + Σ_{k=1}^∞ αk cos(kt)/√π + Σ_{k=1}^∞ βk sin(kt)/√π
where α0 = ⟨1/√(2π), f⟩, αk = ⟨(1/√π) cos kt, f⟩ and βk = ⟨(1/√π) sin kt, f⟩. Is F(t) = f(t) for all t ∈ R for all real-valued functions that are integrable on the interval (−π, π)? Why/why not?
5. Can you state (without proof) what we can reasonably expect? See Appendix R for a formal statement.
6. In particular S′ is not a basis for V, the inner product space of real-valued functions that are integrable on the interval (−π, π) as F is continuous while f need not be.
7. The Fourier Series is usually written
F(t) = a0/2 + Σ_{k=1}^∞ (ak cos kt + bk sin kt)
where ak = (1/π)∫_{−π}^{π} f(t) cos(kt) dt for k ≥ 0 and bk = (1/π)∫_{−π}^{π} f(t) sin(kt) dt for k ≥ 1. Show that for the “square wave” f(t) = 1, 0 < t ≤ π and f(t) = −1, −π ≤ t < 0 you have ak ≡ 0 and bk = 0 for k even and 4/(kπ) when k is odd.
8. So what is the value of F(0)?
9. The result in Appendix R says that
a0/2 + lim_{N→∞} Σ_{k=1}^N (ak cos kt + bk sin kt) = f(t)
at all points in (−π, π) where f(t) is continuous. Where is f(t) not continuous?
10. If you want to test your matlab or octave skills, try plotting
Σ_{k=1}^N bk sin kt ≡ Σ_{k=1}^N (4/((2k − 1)π)) sin((2k − 1)t)
for a range of increasing values of N. You’ll get something like:
Figure 3: Fourier Series for Square Wave
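One possible sketch of the plotting code in Octave/Matlab (N and the number of plot points are arbitrary choices):

    N = 25;                                 % number of odd-frequency terms
    t = linspace(-pi, pi, 1000);
    F = zeros(size(t));
    for k = 1:N
      F = F + (4/((2*k-1)*pi)) * sin((2*k-1)*t);
    end
    plot(t, F)                              % partial Fourier sum for the square wave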
11. Let V be the vector space of continuous real-valued functions on [0, 1]. Define ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt. I know that this is an inner product on V. Starting with the set {1, x, x², x³}, use the Gram-Schmidt Orthogonalisation process to find a set of four polynomials in V which form an orthonormal set with respect to the chosen inner product.
3 Complex Vector and Inner Product Spaces
In this very short Chapter I extend the previous definitions and
discussion of vector spaces and inner product spaces to the case
where the scalars may be complex. I need to do this as our
discussion of Matrix Algebra will include the case of vectors and
matrices with complex components. The results that I establish
here will apply immediately to these vectors and matrices.
3.1 Complex Vector Spaces
A vector space where the scalars may be complex is a complex
vector space. The axioms in Def. 1.1 are unchanged. The
definitions of linear independence, spanning, basis,
dimension and subspace are unchanged.
The most important example of a complex vector space is of course
Cn . The standard basis e1 , . . . , en is unchanged from Rn .
3.2 Complex Inner Product Spaces
Definition 3.1 If u, v ∈ Cn then the complex Euclidean inner product u · v is defined by
u · v = u1v̄1 + · · · + unv̄n
where for any z ∈ C, z̄ is the complex conjugate of z.
Compare this definition with (2.1) for the real Euclidean inner product. The Euclidean inner product of two vectors in Cn is in general a complex number so the positioning of the complex conjugate “bar” in the Definition is important as in general uiv̄i ≠ ūivi (the first expression is the conjugate of the second).
The rule in Def. 3.1 is that
“The bar is on the second vector”.
This will allow us to reconcile the rules for dot products in complex
Euclidean n-space Cn with general complex inner product spaces
— to be discussed below.
3.2.1 Properties of the Complex Euclidean inner product
It is useful to list the properties of the Complex Euclidean inner product as a Theorem:
Theorem 3.1 If u, v, w ∈ Cn and if α ∈ C then
(a) u · v = \overline{v · u}  (the complex conjugate of v · u)
(b) (u + v) · w = u · w + v · w
(c) (αu) · v = α(u · v)
(d) v · v ≥ 0 and v · v = 0 if and only if v = 0.
Proof: Check.
The first (part (a)) property is the only one that differs from the
corresponding properties of any real inner product such as
Euclidean Rn .
I can now define a complex inner product space based on the properties of Cn.
Definition 3.2 An inner product on a complex vector space V is a mapping from V × V to C that maps each pair of vectors u and v in V into a complex number ⟨u, v⟩ so that the following axioms are satisfied for all vectors u, v and w and all scalars α ∈ C:
1. ⟨u, v⟩ = \overline{⟨v, u⟩}    ((Conjugate) Symmetry Axiom)
2. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩    (Distributive Axiom)
3. ⟨αu, v⟩ = α⟨u, v⟩    (Homogeneity Axiom)
4. ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 if and only if u = 0.
Compare with the axioms for a real inner product space in Def. 2.1
— the only axiom to change is the Symmetry Axiom.
Some simple properties of complex inner product spaces (compare with Thm. 2.1):
Theorem 3.2 If u, v and w are vectors in a complex inner product space and α ∈ C then (0 is the zero vector, not the number 0)
(a) ⟨0, v⟩ = ⟨v, 0⟩ = 0
(b) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩
(c) ⟨u, αv⟩ = ᾱ⟨u, v⟩.
Proof: Check.
N.B. Property (c) tells us that I must take the complex conjugate of the coefficient of the second vector in an inner product. It is very easy to forget this! For example ⟨u, (1 − i)v⟩ = (1 + i)⟨u, v⟩.
3.2.2 Orthogonal Sets
The definitions of orthogonal vectors, orthogonal set,
orthonormal set and orthonormal basis carry over to complex
inner product spaces without change. All our Theorems (see
Slide 156) on real inner product spaces still apply to complex inner
product spaces and the Gram-Schmidt process can be used to
convert an arbitrary basis for a complex inner product space into
an orthonormal basis.
But I need to be careful now that I am dealing with complex scalars and vectors. A (very) simple example illustrates the pitfalls:
Example 3.1 Let v = (1, i)T, u1 = (i, 0)T and u2 = (0, 1)T. The vectors u1 and u2 are clearly orthogonal as u1 · u2 = i·0̄ + 0·1̄ = 0. In fact they are an orthonormal set as u1 · u1 = i·ī + 0·0̄ = 1 and u2 · u2 = 0·0̄ + 1·1̄ = 1.
It is obvious by inspection that v = −iu1 + iu2. Let’s try to derive this.
Using the fact that the set {u1, u2} is orthonormal I know that I can write:
v = α1u1 + α2u2.
To calculate α1 and α2 just compute:
u1 · v = u1 · (α1u1 + α2u2)
= α1(u1 · u1) + α2(u1 · u2)
= α1(1) + α2(0)
= α1.
So α1 = u1 · v = (i, 0) · (1, i) = i·1̄ + 0·ī = i. Similarly I find that α2 = u2 · v = (0, 1) · (1, i) = 0·1̄ + 1·ī = −i.
But I know that v = −iu1 + iu2 so α1 = −i and α2 = i. What’s wrong?
The answer is on the next slide — try to work it out yourself first.
The flaw in the reasoning was as follows:
u1 · v = u1 · (α1u1 + α2u2)
= ᾱ1(u1 · u1) + ᾱ2(u1 · u2).
(The rest of the calculation goes as previously, leading to ᾱ1 = i so α1 = −i — the right answer. Similarly I correctly find that ᾱ2 = −i so α2 = i.)
It is very easy to make this mistake — the result I need to remind ourselves of is
⟨u, αv⟩ = ᾱ⟨u, v⟩,    (3.1)
property (c) in Thm 3.2.
For this reason I will always write an orthonormal expansion for a vector v in a complex inner product space V as
v = Σ_{i=1}^n ⟨v, ui⟩ ui.
Example 3.2 Given the basis vectors v1 = (i, i, i)T, v2 = (0, i, i)T and v3 = (0, 0, i)T, use the G-S algorithm to transform the set S = {v1, v2, v3} into an orthonormal basis.
• Set u1 = v1 = (i, i, i)T.
• Set
u2 = v2 − (⟨v2, u1⟩/‖u1‖²) u1
= (0, i, i)T − (2/3)(i, i, i)T = (−2i/3, i/3, i/3)T
• Set
u3 = v3 − (⟨v3, u1⟩/‖u1‖²) u1 − (⟨v3, u2⟩/‖u2‖²) u2
= (0, 0, i)T − (1/3)(i, i, i)T − ((1/3)/(2/3))(−2i/3, i/3, i/3)T
= (0, −i/2, i/2)T.
I can normalise the three vectors {u1, u2, u3} by dividing each by its norm.
• ‖u1‖² = ⟨u1, u1⟩ = |i|² + |i|² + |i|² = 3. Set u1 = (1/√3)(i, i, i)T.
• ‖u2‖² = ⟨u2, u2⟩ = |2i/3|² + |i/3|² + |i/3|² = 2/3. Set u2 = (√3/√2)(−2i/3, i/3, i/3)T.
• ‖u3‖² = ⟨u3, u3⟩ = |0|² + |i/2|² + |i/2|² = 1/2. Set u3 = √2 (0, −i/2, i/2)T.
The vectors {u1, u2, u3} form an orthonormal basis. The steps worked as for vectors in a real inner product space — I just need to be careful when taking the inner products.
The following list of Theorems for real inner product spaces all still apply for complex inner product spaces:
• Thm. 2.2 (the Cauchy-Schwarz inequality — note that now the expression |⟨u, v⟩| refers to the modulus, not the absolute value). Check that the alternative proof for the Cauchy-Schwarz inequality given in Section 2.2 can easily be adapted to prove the Theorem for a complex inner product space.
• Thm. 2.3 and Thm. 2.4,
• Thm. 2.6 (note that while for a real inner product space, I could write
v = ⟨u1, v⟩u1 + · · · + ⟨un, v⟩un,
this is not correct for a complex inner product space as ⟨ui, v⟩ ≠ ⟨v, ui⟩),
• Thm. 2.7, Thm. 2.8 and Thm. 2.9.
3.2.3 Exercises
1. Consider the set of complex m × n matrices Cmn. Define the complex inner product consisting of the set Cmn where for any two matrices U and V I define the complex inner product
⟨U, V⟩ = Σ_{i,j} Uij V̄ij
where the sum is over all the elements of the matrices.
(a) Find the inner product of the matrices
U = [ 0    i  ]     and     V = [  1    0  ]
    [ 1  1+i  ]                 [ −i   2i  ]
(b) Check that ⟨U, V⟩ is indeed a complex inner product for Cmn.
(c) Find d(U, V) for the vectors U and V defined in part (a).
(d) Which of the following vectors are orthogonal to
A = [ 2i    i  ]
    [ −i   3i  ] ?
(a) [ −3   1−i ]      (b) [ 1    1 ]
    [ 1−i   2  ]          [ 0   −1 ]
(c) [ 0   0 ]         (d) [  0    1 ]
    [ 0   0 ]             [ 3−i   0 ]
2. Let C3 have the standard Euclidean inner product. Use the Gram-Schmidt orthogonalisation process to transform each of the following bases into an orthonormal basis:
(a) u1 = (i, i, i), u2 = (−i, i, 0), u3 = (i, 2i, i)
(b) u1 = (i, 0, 0), u2 = (3i, 7i, −2i), u3 = (0, 4i, i)
3. If α ∈ C and ⟨u, v⟩ is an inner product on a complex vector space, then:
(a) prove that
⟨u − αv, u − αv⟩ = ⟨u, u⟩ − ᾱ⟨u, v⟩ − α⟨v, u⟩ + αᾱ⟨v, v⟩
(b) use the result in (a) to prove that
⟨u, u⟩ + αᾱ⟨v, v⟩ ≥ ᾱ⟨u, v⟩ + α⟨v, u⟩.
(c) prove the Cauchy-Schwarz inequality
|⟨u, v⟩|² ≤ ⟨u, u⟩⟨v, v⟩
by setting α = ⟨u, v⟩/⟨v, v⟩ in (b).
4. Check that the parallelogram law (2.2)
‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²)    (3.2)
still holds in any complex inner product space.
5. Prove that if u and v are vectors in a complex inner product space then:
⟨u, v⟩ = (1/4)‖u + v‖² − (1/4)‖u − v‖² + (i/4)‖u + iv‖² − (i/4)‖u − iv‖².    (3.3)
(This is the more general complex version of (2.9).)
The full (complex) version of the Jordan von Neumann Lemma says that if I define an “inner product” using the formula in the previous question then, provided the parallelogram law (2.2) holds for all u, v ∈ V, ⟨u, v⟩ as defined in Q. 5 satisfies all the axioms for a complex inner product space.
6. Prove that if {u1, . . . , un} is an orthonormal basis for a complex inner product space V then if v and w are any vectors in V;
⟨v, w⟩ = ⟨v, u1⟩\overline{⟨w, u1⟩} + ⟨v, u2⟩\overline{⟨w, u2⟩} + · · · + ⟨v, un⟩\overline{⟨w, un⟩}.
Part II
Matrix Algebra
• This is the second Part of the Course.
• After reviewing the basic properties of vectors in Rn and of
matrices I will introduce new ideas including unitary matrices,
matrix norms and the Singular Value Decomposition.
• Although all the ideas will be set in Rn I will cross-reference
the more general vector space and inner product space ideas
from Part I where appropriate.
4 Matrices and Vectors
• This Chapter covers material that will be for the most part
familiar — but from a more sophisticated viewpoint.
• For the most part I will not use bold fonts for vectors — but
we will keep to the convention that upper case letters represent
matrices and lower case letters represent vectors.
• Let A be an m × n matrix and x an n-dimensional column
vector and suppose that a vector b satisfies Ax = b.
• I will always keep to the convention that vectors are simply
matrices with a single column — so x is n × 1.
• Then Ax = b and the m-dimensional column vector b is given by
bi = Σ_{j=1}^n aij xj.    (4.1)
• If I use the summation convention discussed in Appendix A then I can write (4.1) as bi = aij xj.
• In most of the course the entries of vectors and matrices may be complex so A ∈ Cm×n, x ∈ Cn and b ∈ Cm.
• The mapping x → Ax is linear, so for any x, y ∈ Cn and any α ∈ C
A(x + y) = Ax + Ay
A(αx) = αAx.
• Correspondingly, every linear mapping from Cn to Cm can be
represented as multiplication by an m × n matrix.
• The formula Ax = b is ubiquitous in Linear Algebra.
• It is useful — throughout this course — to interpret it as
saying that
If b = Ax, then the vector b can be written as a linear
combination of the columns of the matrix A.
• This follows immediately from the expression above for a matrix-vector product, which can be re-written
b = Ax = Σ_{j=1}^n xj aj,    (4.2)
where now aj is the jth column of the matrix A.
• Schematically:
b = [ a1 | a2 | · · · | an ] [ x1 ; x2 ; . . . ; xn ] = x1 a1 + x2 a2 + · · · + xn an.    (4.3)
Similarly, in the matrix-matrix product B = AC, each column of B is a linear combination of the columns of A.
If this isn’t obvious, just consider the familiar formula
bij = Σ_{k=1}^m aik ckj    (4.4)
and choose a particular column of B by fixing j. Clearly the jth column of the RHS is formed by taking a linear combination of the columns of A, where the coefficients are ckj with j fixed. The result follows. I can write it (analogous with (4.2)) as
col j of B ≡ bj = Acj = Σ_{k=1}^m ckj ak ≡ a lin. comb. of cols of A.    (4.5)
So bj is a linear combination of the columns ak of A with coefficients ckj.
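Both column interpretations are easy to confirm numerically; a sketch in Octave/Matlab (the matrices are arbitrary examples):

    A = [1 2; 3 4; 5 6];                       % 3 x 2
    x = [10; -1];
    A*x - (x(1)*A(:,1) + x(2)*A(:,2))          % zero: Ax is a combination of the columns of A
    C = [1 0 2; 0 1 1];                        % 2 x 3
    B = A*C;
    B(:,3) - (C(1,3)*A(:,1) + C(2,3)*A(:,2))   % zero: column 3 of B uses the coefficients C(:,3)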
Example 4.1 (Outer Product) A simple special case of a
matrix-matrix product is the outer product of two vectors. Given
an m-dimensional column vector u and an n-dimensional row
vector vT I can (and should) regard u as an m × 1 matrix and vT
as a 1 × n matrix. Then I need no new ideas to define the outer
product by
(uvT)ij = ui1 v1j, or just ui vj.
(I can regard u and v as m × 1 and n × 1 matrices or just as
column vectors.)
The outer product can be written:
u [ v1  v2  . . .  vn ] = [ v1u  v2u  v3u  . . .  vnu ]
which is of course equal to
[ v1u1  . . .  vnu1 ]
[  ...          ... ]
[ v1um  . . .  vnum ].
The columns are all multiples of the same vector u (and check that the rows are all multiples of the same vector vT).
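In Octave/Matlab the outer product is just the matrix product of an m × 1 and a 1 × n matrix; a sketch:

    u = [1; 2; 3];        % m x 1
    v = [4; 5];           % n x 1
    P = u * v.'           % m x n, with P(i,j) = u(i)*v(j)
    rank(P)               % = 1: every column of P is a multiple of u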
Example 4.2 As another example, consider the matrix equation B = AR, where R is the upper-triangular n × n matrix with entries rij = 1 for i ≤ j and rij = 0 for i > j:
R = [ 1  · · ·  1 ]
    [     ⋱    ⋮ ]
    [           1 ]
This product can be written
[ b1 | b2 | · · · | bn ] = [ a1 | a2 | · · · | an ] [ 1  · · ·  1 ]
                                                    [     ⋱    ⋮ ]
                                                    [           1 ]
The column formula (4.5) now gives
bj = Arj = Σ_{k=1}^j ak.
In other words, the jth column of B is the sum of the first j
columns of A.
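A quick numerical sketch of Example 4.2 in Octave/Matlab (A is an arbitrary example; cumsum(A,2) forms the running sums of the columns):

    n = 4;
    A = magic(n);          % any n x n example
    R = triu(ones(n));     % upper triangular matrix of ones
    A*R - cumsum(A, 2)     % zero: column j of A*R is the sum of columns 1..j of A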
4.1 Properties of Matrices
In this Section I review basic properties of matrices; many will be familiar.
4.1.1 Range and Nullspace
Corresponding to any m × n matrix A I can define two sets which
are respectively subspaces of Rm and Rn .
Definition 4.1 (Range) Given an m × n matrix A, the range of
a matrix A — written range(A), is the set of vectors y ∈ Rm that
can be written y = Ax for some x ∈ Rn .
Check that range(A) is a subspace of Rm .
The formula (4.2) leads to the following characterisation of
range(A):
Theorem 4.1 range(A) is the vector space spanned by the columns
of A.
Proof: RTP that range(A) = span{a1, . . . , an}. By (4.2), any Ax is a linear combination of the columns of A. Conversely, any vector y in the space spanned by the columns of A can be written as y = Σ_{j=1}^n xj aj. Forming a vector x from the coefficients xj, I have y = Ax and so y ∈ range(A).
In the light of the result of Thm 4.1, it is reasonable to use the
term column space as an equivalent for the term range.
Exercise 4.1 What is the maximum possible value of
dim range(A)?
Definition 4.2 The nullspace of an m × n matrix A — written
null(A) — is the set of vectors x ∈ Rn that satisfy Ax = 0.
Check that the entries of each vector x ∈ null(A) give the coefficients of an expansion of the zero vector as a linear combination of the columns of A.
Check that null(A) is a subspace of Rn .
4.1.2 Rank
Definition 4.3 The column rank of a matrix is the dimension of
its range (or column space).
Similarly;
Definition 4.4 The row rank of a matrix is the dimension of the
space spanned by its rows.
I will show later (Thm. 6.9) that the row rank and column rank of
an m × n matrix are always equal so I can use the term rank to
refer to either.
Definition 4.5 An m × n matrix has full rank if its rank is the
largest possible, namely the lesser of m and n.
It follows that a full rank m × n matrix with (say) m ≥ n (a “tall
thin” matrix) must have n linearly independent columns.
I can show that the mapping defined by such a matrix is 1–1.
Theorem 4.2 A matrix A ∈ Cm×n with m ≥ n has full rank if
and only if it maps no two distinct vectors to the same vector.
Proof:
[→] If A has full rank then its columns are linearly independent, so
they form a basis for range A. This means that every
b ∈ range(A) has a unique linear expansion in terms of the
columns of A and so, by (4.2), every b ∈ range(A) has a unique
x such that b = Ax.
[←] If A is not full rank then its columns aj are linearly
dependent and there is a non-trivial linear combination s.t.
Pn
j=1 cj aj = 0. The vector c formed from the coefficients cj
therefore satisfies Ac = 0. So A maps distinct vectors to the
same vector as for any x, A(x + c) = Ax.
4.1.3 Inverse
I start with a non-standard definition of invertible matrices (which
does not refer to matrix inverses).
Definition 4.6 A square matrix A is non-singular (invertible) if
A is of full rank.
I will now show that this definition is equivalent to the standard
one.
The n columns aj , j = 1, . . . , n of a non-singular (full rank) n × n
matrix A form a basis for the space Cn . So any vector in Cn has a
unique expression as a linear combination of them. So the standard
unit (basis) vector ei , i = 1, . . . , n can be expanded in terms of the
aj , e.g.
ei = Σ_{j=1}^n zij aj.    (4.6)
Let Z be the matrix whose ij entry is zij and let zj denote the jth column of Z. Then (4.6) can be written ei = Σ_{j=1}^n aj (ZT)ji. I can assemble the ei column vectors into a single matrix:
[ e1 | e2 | · · · | en ] ≡ I = A ZT
and I is of course the n × n identity matrix. So the matrix ZT is the inverse of A. (Any square non-singular matrix A has a unique inverse, written A−1, satisfying AA−1 = A−1A = I.)
If you found the preceding vector treatment confusing, an index notation version is as follows:
• Rewrite ei = Σ_{j=1}^n zij aj so eki = Σ_{j=1}^n zij akj, where eki is the element in the kth row of the (column) vector ei.
• Re-ordering the factors on the RHS we have (taking the transpose of Z) eki = Σ_{j=1}^n akj (ZT)ji, which gives us the matrix equation I = A ZT.
I always have the option of reverting to index notation if it helps to derive a formula. Many algorithms can be explained in a very straightforward way using vector notation.
You should try to become familiar with both.
(Finally, the summation convention allows us to drop the Σ symbols in the above — whether you use it or not is up to you.)
The following Theorem is included for reference purposes — you
will have seen proofs in Linear Algebra 1.
Theorem 4.3 For any real or complex n × n matrix A, the
following conditions are equivalent:
(a) A has an inverse A−1
(b) rank(A) = n
(c) range(A) = Cn
(d) null(A) = {0}
(e) 0 is not an eigenvalue of A
(f) det(A) ≠ 0
Proof: Check as many of these results as you can. (I have just
shown that (b) implies (a).)
Note: I will rarely mention the determinant in the rest of the
course as it is of little importance in Numerical Linear Algebra.
4.1.4 Matrix Inverse Times a Vector
When I write x = A−1 b this is of course a matrix vector product.
However; I will not (except perhaps in a pen and paper exercise)
compute this matrix product.
The inverse matrix A−1 is expensive to compute and is generally a
means to the end of computing the solution to the linear system
Ax = b.
You should think of x here as being the unique vector that satisfies
Ax = b and so x is the vector of coefficients of the (unique) linear
expansion of b in the basis formed by the columns of the full rank
matrix A.
I can regard multiplication by A−1 as a change of basis operation;
switching between:
• regarding the vector b itself as the coefficients of the expansion
of b in terms of the basis {e1 , . . . , en }
• regarding A−1 b (≡ x) as the coefficients of the expansion of b
in terms of the basis {a1 , . . . , an }
4.2 Orthogonal Vectors and Matrices
Orthogonality is crucial in Numerical Linear Algebra — here I
introduce two ideas; orthogonal vectors (an instance of Def. 2.4 for
inner product spaces) and orthogonal (unitary) matrices.
Remember that the complex conjugate of a complex number z = x + iy is written z̄ or z∗ and that z̄ = x − iy. The hermitian conjugate or adjoint of an m × n matrix A, written A∗, is the n × m matrix whose i, j entry is the complex conjugate of the j, i entry of A, so (A∗)ij = Āji, the complex conjugate transpose.
For example
A = [ a11  a12  a13 ]      ⇒      A∗ = [ ā11  ā21 ]
    [ a21  a22  a23 ]                  [ ā12  ā22 ]
                                       [ ā13  ā23 ]
• Obviously (A∗ )∗ = A.
• If A = A∗ I say that A is hermitian.
• And of course, a hermitian matrix must be square.
• If A is real then of course A∗ = AT, the transpose of A.
• A real hermitian matrix satisfies A = AT and is called
symmetric.
• The vectors and matrices we deal with from now on may be
real — but I will use this more general notation to allow for the
possibility that they are not.
(In matlab/octave, A' is the hermitian conjugate A∗ and A.' is the transpose AT.)
Remember our convention that vectors in Rn are by default column vectors — I can now say that vectors in Cn are by default column vectors and that (for example)
[ 1 − i ]∗
[ 1 + i ]   =  [ 1 + i   1 − i   −2i ]
[  2i   ]
so that I can write the second to mean the first to avoid taking up so much space.
4.2.1 Inner Product on Cn
For the dot product (inner product) of two column vectors x and y in Cn to be consistent with Def. 3.1 (the complex Euclidean Inner Product) I need to write:
x · y = y∗x = Σ_{i=1}^n xi ȳi.    (4.7)
This looks strange (the order of x and y are reversed) but remember that I am writing all operations, even vector-vector operations, in matrix notation. The first and last expressions above correspond to Def. 3.1.
The middle expression y∗x is (by definition of the hermitian conjugate or adjoint “∗” operation)
y∗x = [ ȳ1  ȳ2  · · ·  ȳn ] [ x1 ; x2 ; . . . ; xn ]
which I can summarise as:
y∗x = Σ_{i=1}^n xi ȳi.
This is rarely an issue in Numerical Linear Algebra as, rather than
appeal to general results about complex inner product spaces , I
will use matrix algebra to prove the results that I need.
Confusingly, often writers (such as Trefethen) refer to x∗ y as the
complex inner product on Cn . This is clearly not correct but
usually doesn’t matter!
From now on in these notes all products will be matrix products so
the issue will not arise.
The Euclidean length or norm (which I will discuss later in a more general context) is:
‖x‖ = √(x∗x) = (Σ_{i=1}^n x̄i xi)^{1/2} = (Σ_{i=1}^n |xi|²)^{1/2}.    (4.8)
Again, the natural definition of the cosine of the angle θ between x and y is
cos θ = x∗y / (‖x‖‖y‖).    (4.9)
(Strictly speaking this should be cos θ = y∗x/(‖x‖‖y‖) based on (2.6) and (4.7). But, as noted above, the difference is rarely important.)
The Cauchy-Schwarz inequality (2.7) ensures that |cos θ| ≤ 1 but cos θ can be complex when the vectors x and y are complex. For this reason the angle between two vectors in Cn is usually only of interest when the angle is π/2, i.e. the vectors are orthogonal.
It is easy to check that the operation x∗y is bilinear — linear in each vector separately:
(x1 + x2)∗y = x1∗y + x2∗y
x∗(y1 + y2) = x∗y1 + x∗y2
(αx)∗(βy) = ᾱβ x∗y
• I will often need the easily proved (check) formula that for compatibly sized matrices A and B
(AB)∗ = B∗A∗    (4.10)
• A similar formula for the inverse is also easily proved (check)
(AB)−1 = B−1A−1.    (4.11)
• The notation A−∗ is shorthand for (A∗)−1 or (A−1)∗.
• A very convenient fact: these expressions are equal — check.
4.2.2 Orthogonal vectors
• I say that two vectors x and y are orthogonal if x∗ y = 0 —
compare with Def. 2.4.
• Two sets of vectors X and Y are orthogonal if every vector x in
X is orthogonal to every vector y in Y.
• A set of vectors S in Cn is said to be an orthogonal set if for all x, y in S, x ≠ y ⇒ x∗y = 0.
It isn’t necessary but I can re-derive the result Thm. 2.8 that the vectors in an orthogonal set are linearly independent.
Theorem 4.4 The vectors in an orthogonal set S of non-zero vectors in Cn are linearly independent.
Proof: Assume the contrary; then some non-zero vk ∈ S can be expressed as a linear combination of the rest:
vk = Σ_{i=1, i≠k}^n ci vi.
As vk ≠ 0, vk∗vk ≡ ‖vk‖² > 0. But using the bilinearity of the x∗y operation and the orthogonality of S
vk∗vk = Σ_{i=1, i≠k}^n ci vk∗vi = 0
which contradicts the assumption that vk ≠ 0.
So, just as in a real inner product space (including Rn), if an orthogonal set in Cn contains n vectors, it is a basis for Cn.
Before re-stating the definition of the orthogonal projection of a vector in Cn onto a subspace W = span{u1, . . . , uk} ⊆ Cn I need to remind you that while in Rn x′y = y′x, this is no longer the case in Cn: u∗v ≠ v∗u but in fact u∗v = (v∗u)∗ ≡ \overline{v∗u}, as v∗u is a scalar — in general complex.
To simplify the algebra, suppose that Q = {q1, . . . , qk} (k ≤ n) is an orthonormal set of vectors in Cn so that qi∗qi ≡ ‖qi‖² = 1.
Let v be an arbitrary vector in Cn. I can give the definition of the orthogonal projection of the vector v onto Q:
ProjQ v = Σ_{i=1}^k (qi∗v) qi    (4.12)
Notice that qi∗v = ⟨v, qi⟩ so this definition is the same as (2.10).
As noted previously, if the vectors are complex, the order of qi and v in the matrix product/inner product does matter, unlike a real inner product like the Euclidean inner product on Rn.
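A sketch of (4.12) in Octave/Matlab for a small orthonormal set in C3 (the particular vectors are only an illustration; note the use of ' for the conjugate transpose):

    q1 = [1; 1i; 0]/sqrt(2);        % q1, q2: an orthonormal pair in C^3
    q2 = [0; 0; 1i];
    v  = [2; 3; 4+1i];
    p  = (q1'*v)*q1 + (q2'*v)*q2;   % Proj_Q v, with coefficients q_i' * v = <v, q_i>
    [q1'*(v - p), q2'*(v - p)]      % both (numerically) zero: v - p is orthogonal to Q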
(If the set Q = {q1, . . . , qk} is orthogonal but not orthonormal then I write (4.12) as
ProjQ v = Σ_{i=1}^k (qi∗v / ‖qi‖²) qi
which should be compared with (2.10).)
The vector ProjQ⊥ v ≡ v − ProjQ v as in Def. 2.7 and, as I saw for a general complex inner product space, ProjQ v is orthogonal to ProjQ⊥ v.
(Check using Cn notation — not really necessary as whatever holds for a general complex inner product space holds for Cn with the complex Euclidean inner product — but it is good practice.)
If k = n then Q = {qi} is a basis for Cn so v = ProjQ v. Note that I can write (4.12) for ProjQ v as:
v = Σ_{i=1}^k (qi∗v) qi    (4.13)
or
v = Σ_{i=1}^k (qi qi∗) v.    (4.14)
These two expansions are clearly equal as qi∗v is a scalar but have different interpretations. The first expresses v as a linear combination of the vectors qi with coefficients qi∗v. The second expresses v as a sum of orthogonal projections of v onto the directions {q1, . . . , qn}. The ith projection operation is performed by the rank-one matrix qi qi∗. I will return to this topic in the context of the QR factorisation in Chapter 5.2.
4.2.3 Unitary Matrices
Definition 4.7 A square matrix Q ∈ Cn×n is unitary (for real matrices I usually say orthogonal) if Q∗ = Q−1, i.e. if Q∗Q = I. (As the inverse is unique this means that also QQ∗ = I.) In terms of the columns of Q, this product may be written
[ — q1∗ — ]
[ — q2∗ — ] [ q1 | q2 | · · · | qn ] = I.
[    ⋮     ]
[ — qn∗ — ]
In other words, qi∗qj = δij and the columns of a unitary matrix Q form an orthonormal basis for Cn. (The symbol δij is the Kronecker delta, equal to 1 if i = j and to 0 if i ≠ j.)
4.2.4 Multiplication by a Unitary Matrix
Earlier (in Section 4.1.4) I discussed the interpretation of matrix-vector products Ax and A−1b. If A is a unitary matrix Q, these products
become Qx and Q∗ b — the same interpretations are still valid. As
before, Qx is the linear combination of the columns of Q with
coefficients in x. Conversely,
Q∗ b is the vector of coefficients of the expansion of b in the basis
of the columns of Q.
Again I can regard multiplying by Q∗ as a change of basis
operation; switching between:
• regarding b as the coefficients of the expansion of b in terms of
the basis {e1 , . . . , en }
• regarding Q∗ b as the coefficients of the expansion of b in terms
of the basis {q1 , . . . , qn }
4.2.5 A Note on the Unitary Property
I know that for any invertible matrix A, AA−1 = A−1A (check). It follows that if Q is unitary, so is Q∗. The argument is that I know that Q∗ = Q−1 but QQ−1 = Q−1Q, so QQ∗ = Q∗Q = I. The latter equality means that Q∗ is unitary.
The process of multiplying by a unitary matrix or its adjoint (also unitary) preserves inner products. This follows as
(Qx)∗(Qy) = x∗Q∗Qy = x∗y    (4.15)
where I used the identity (4.10). It follows that angles are also preserved and so are lengths:
‖Qx‖ = ‖x‖.    (4.16)
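A small numerical illustration in Octave/Matlab, using the plane rotation matrix from the exercises below as the unitary matrix Q (the angle is arbitrary):

    theta = 0.7;
    Q = [cos(theta) sin(theta); -sin(theta) cos(theta)];
    Q'*Q                     % the 2 x 2 identity (up to rounding)
    x = [3; -4];
    norm(Q*x) - norm(x)      % zero: lengths are preserved, as in (4.16)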
4.2.6 Exercises
1. Show the “plane rotation matrix”
R = [  cos(θ)   sin(θ) ]
    [ −sin(θ)   cos(θ) ]
is orthogonal and unitary.
2. Show that (AB)∗ = B∗ A∗ for any matrices A and B whose
product make sense. (Sometimes called compatible matrices:
two matrices with dimensions arranged so that they may be
multiplied. The number of columns of the first matrix must
equal the number of rows of the second.)
3. Show that the product of two unitary matrices is unitary.
4. (Difficult) Show that if a matrix is triangular and unitary then
it is diagonal.
5. Prove the generalised version of Pythagoras’ Theorem: that for a set of n orthonormal vectors x1, . . . , xn;
‖ Σ_{i=1}^n xi ‖² = Σ_{i=1}^n ‖xi‖².
6. Show that the eigenvalues of a complex n × n hermitian matrix
are real.
7. Show the eigenvectors (corresponding to distinct eigenvalues)
of a complex n × n hermitian matrix are orthogonal.
8. What general properties do the eigenvalues of a unitary matrix
have?
9. Prove that a skew-hermitian (S∗ = −S) matrix has pure
imaginary eigenvalues.
10. Show that if S is skew-hermitian then I − S is non-singular.
11. Show that the matrix Q = (I − S)−1 (I + S) is unitary. (Tricky.)
12. Using the above results, can you write a few lines of
Matlab/Octave that generate a random unitary matrix?
13. If u and v are vectors in Cn then let A = I + uv∗ — a rank-one
perturbation of I. Show that A−1 = I + αuv∗ and find α.
4.3 Norms
Norms are measures of both size and distance. I will study first
vector, then matrix norms.
4.3.1 Vector Norms
In this Chapter I have already informally defined the Euclidean norm in (4.8). Also in Part I I defined the induced norm corresponding to a given inner product in a general inner product space in Def. 2.2. So the following definition contains nothing new — but is useful for reference.
Definition 4.8 A norm on Cn is a function from Cn to R that satisfies (for all x, y in Cn and for all α in C)
1. ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0,
2. ‖x + y‖ ≤ ‖x‖ + ‖y‖,    (Triangle Inequality)
3. ‖αx‖ = |α|‖x‖.
(Compare with the general definition for an inner product space Def. 2.2 and the properties proved in Thm. 2.3 based on that definition.)
A norm need not be defined in terms of an inner product as I will see; any function that satisfies the three requirements in Def. 4.8 qualifies as a measure of size/distance.
The most important class of vector norms are the p-norms:
‖x‖1 = Σ_{i=1}^n |xi|    (4.17)
‖x‖2 = (Σ_{i=1}^n |xi|²)^{1/2} = √(x∗x)    (4.18)
‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p},  0 < p < ∞    (4.19)
‖x‖∞ = max_{1≤i≤n} |xi|.    (4.20)
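Octave/Matlab's built-in norm covers all of these; a quick sketch:

    x = [3; -4; 0];
    [norm(x,1), norm(x,2), norm(x,4), norm(x,Inf)]
    % = [7, 5, 4.28..., 4]: for a fixed x the p-norm decreases as p increases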
Another useful vector norm is the weighted p-norm. For any norm ‖ · ‖,
‖x‖W = ‖Wx‖    (4.21)
where W is an arbitrary non-singular matrix. The most important vector norm is the (unweighted) 2-norm.
4.3.2 Inner Product based on p-norms on Rn/Cn
It is interesting to note that the Parallelogram Law (2.2) does not hold for a p-norm, for p ≠ 2. (For example, check that if p = 1, u = (1, 0)T and v = (0, 1)T, the equality does not hold.)
(I first looked at the issue of whether the Parallelogram Law always holds back in Exercise 4 of Section 2.2.1.)
Exercise 4.2 Show that for any p > 0, when u = (1, 0)T and v = (0, 1)T, then the lhs in (2.2) is 2^{1+2/p} and the rhs is 4. Show that lhs = rhs if and only if p = 2. (What about ‖ · ‖∞?)
So I cannot derive an inner product ⟨u, v⟩ from a p-norm using (2.9) unless p = 2 — the Euclidean norm.
4.3.3 Unit Spheres
For any choice of vector norm, the (closed) unit sphere is just
{x|kxk = 1}. (The term unit ball is used to refer to the set
{x|kxk ≤ 1} — the set bounded by the unit sphere.)
It is interesting to sketch the unit sphere for different vector norms
in R2 — see Fig. 4 on the next Slide.
Figure 4: Unit spheres in R2 in different norms — the curves kxk1 = 1, kxk2 = 1, kxkp = 1 (for p > 1 and for p < 1) and kxk∞ = 1, all passing through (±1, 0) and (0, ±1).
1. kxk1 = 1 ≡ |x1 | + |x2 | = 1. By examining each of the four
possibilities (x1 ≥ 0, x2 ≥ 0), . . . , (x1 ≤ 0, x2 ≤ 0) in turn I see
that the unit sphere for this norm is a diamond-shaped region
with vertices at (0, 1), (1, 0), (0, −1) and (−1, 0).
2. kxk2 = 1 ≡ x1² + x2² = 1, the equation of a (unit) circle, as
expected.
3. kxk∞ = 1 ≡ max{|x1 |, |x2 |} = 1 — “the larger of |x1 | or |x2 | is
equal to 1”.
• If |x1 | ≤ |x2 | then kxk∞ = |x2 | = 1 or x2 = ±1 — two
horizontal lines.
• If |x1 | ≥ |x2 | then kxk∞ = |x1 | = 1 or x1 = ±1 — two
vertical lines.
So the unit sphere with the ∞-norm is a unit square centred at
the origin with vertices at the intersections of the four lines
indicated; (1, 1), (−1, 1), (−1, −1) and (1, −1).
4. kxkp = 1. For p > 1 this is a closed “rounded square” or
“squared circle” — check — see Fig. 4.
5. kxkp = 1. For p < 1 I get a “cross-shaped” figure — check —
see Fig. 4.
(Use Maple/Matlab/Octave to check the plot of kxkp = 1 for p > 1
and p < 1 graphically first if you like but it is easy to justify the
plots using a little calculus. Hint: just check for the first quadrant
and use the symmetry about the x– and y– axes of the definition of
kxk to fill in the other three quadrants.)
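A hedged way of doing the suggested check in Matlab/Octave is to parametrise the first quadrant of kxkp = 1 and then use the symmetry mentioned in the hint; p here is a value you choose.

   p = 0.5;                              % try both p > 1 and p < 1
   t = linspace(0, pi/2, 200);
   x1 = cos(t).^(2/p);  x2 = sin(t).^(2/p);   % first quadrant: |x1|^p + |x2|^p = 1
   plot([x1, -x1, -x1, x1], [x2, x2, -x2, -x2], '.');  % reflect into the other quadrants
   axis equal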
4.3.4 Matrix Norms Induced by Vector Norms
Any m × n matrix can be viewed as a vector in an mn-dimensional
space (each of the mn components treated as a coordinate) and I
could use any p-norm on this vector to measure the size of the
matrix. The main example of this is the “Frobenius norm” which
uses p = 2;
Definition 4.9 (Frobenius Norm)

   kAkF = ( Σ_{i=1}^{m} Σ_{j=1}^{n} |aij |² )^{1/2} .
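A one-line Matlab/Octave check of Definition 4.9 against the built-in Frobenius norm (the matrix A is just a random example):

   A = randn(4,3);                       % an arbitrary example matrix
   sqrt(sum(sum(abs(A).^2))) - norm(A,'fro')   % should be (essentially) zero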
It is usually more useful to use induced matrix norms, defined
in terms of one of the vector norms already discussed.
Definition 4.10 (Induced Matrix Norm — Informal) Given
a choice of norm k · k on Cm and Cn the induced matrix norm of
the m × n matrix A, kAk, is the smallest number b such that the
following inequality holds for all x ∈ Cn .
   kAxk ≤ bkxk.   (4.22)

In other words, of all upper bounds b on the ratio kAxk/kxk, I define
kAk to be the smallest (least) such upper bound.
A Note on the Supremum Property 1
• A (possibly infinite) bounded set S of real numbers obviously
has an upper bound, say b.
• Any real number greater than b is also an upper bound for S.
• How do I know that there is a least upper bound for S?
• It seems obvious ( actually a deep property of the real numbers
R) that any bounded set of real numbers has a least upper
bound or supremum.
• The supremum isn’t always contained in the set S!
• The following three slides summarise what you need to know.
A Note on the Supremum Property 2
• For example the set S = [0, 1) (all x s.t. 0 ≤ x < 1) certainly
has upper bounds such as 1, 2, 3, π, . . . .
• The number 1 is the least such upper bound — the supremum
of S.
• How do I know that 1 is the least upper bound?
– Any number less than 1, say 1 − ε, cannot be an upper
bound for S as the number 1 − ε/2 is in S but greater than
1 − ε!
– Any number greater than 1 cannot be the least upper
bound as 1 is an upper bound.
• Notice that the supremum 1 is not a member of the set
S = [0, 1).
A Note on the Supremum Property 3
• Why not just use the term “maximum” rather than
“supremum”?
• Because every bounded set has a supremum.
• And not all bounded sets have a maximum.
• For example, our set S = [0, 1) has supremum 1 as I have seen.
• What is the “largest element” of S??????
• Finally, it should be obvious that for any set S of real numbers,
while the supremum (s say) may not be an element of S, every
element of S is less than or equal to s.
A Note on the Supremum Property 4
• The payoff from the above is that I now have a general method
for calculating induced matrix norms:
– If (for any vector norm), I can show that
1. kAxk ≤ b for all unit vectors x (in the chosen vector
norm).
2. kAx◦ k = b for some specific unit vector x◦ .
– Then kAk = b as b must be the supremum of kAxk over all
unit vectors x
∗ Because b is an upper bound
∗ And there cannot be a smaller upper bound (why?).
• I often say that “the bound is attained by x◦ , i.e. kAx◦ k = b so
b is the supremum”.
• I will use this technique to find formulas for various induced
p–norms in the following slides.
• So kAk is the supremum (the least upper bound) of the ratios
kAxk/kxk over all non-zero x ∈ Cn — the maximum factor by
which A can stretch a vector x.
• The sloppy definition is
Definition 4.11 (Induced Matrix Norm — Wrong)
Given a choice of norm k · k on Cm and Cn the induced matrix
norm of the m × n matrix A is

   max_{x≠0} kAxk/kxk .   (4.23)

• The subtly different (and correct) definition is
Definition 4.12 (Induced Matrix Norm — Correct)
Given a choice of norm k · k on Cm and Cn the induced matrix
norm of the m × n matrix A is

   sup_{x≠0} kAxk/kxk .   (4.24)
• I say for any m × n matrix A that kAk is the matrix norm
“induced” by the vector norm kxk.
• Strictly speaking I should use a notation that reflects the fact
that the norms in the numerator and the denominator are of
vectors in Cm and Cn respectively.
• This never causes a problem as it is always clear from the
context what vector norm is being used.
• Because kαxk = |α|kxk, the ratio kAxk/kxk does not depend on
the norm of x, so the matrix norm is often defined in terms of
unit vectors.
So finally:
Definition 4.13 (Induced Matrix Norm) For any
m × n matrix A and for a specific choice of norm on Cn and Cm ,
the induced matrix norm kAk is defined by:
   kAk = sup_{x∈Cn , x≠0} kAxk/kxk    (4.25)
       = sup_{x∈Cn , kxk=1} kAxk .    (4.26)
Before I try to calculate some matrix norms, a simple but very
important result:
Lemma 4.5 (Bound for Matrix Norm) For any m × n matrix
A and for any specific choice of norm on Cn and Cm , for all
x ∈ Cn
   kAxk ≤ kAkkxk.   (4.27)

Proof: I simply refer to Def. 4.13: for any non-zero x ∈ Cn ,

   kAxk/kxk ≤ sup_{y≠0} kAyk/kyk ≡ kAk.

So kAxk ≤ kAkkxk.

Example 4.3 The matrix

   A = ( 1  2 )
       ( 0  2 )

maps R2 to R2 .
Using the second version (4.26) of the definition of the induced
matrix norm, let's calculate the effect of A on the unit sphere in
R2 for various p-norms.
First I ask what the effect of A on the unit vectors e1 and e2 is
(they are unit vectors in all norms); obviously

   e1 ≡ (1, 0)T → (1, 0)T ,    e2 ≡ (0, 1)T → (2, 2)T

and of course

   −e1 ≡ (−1, 0)T → (−1, 0)T ,    −e2 ≡ (0, −1)T → (−2, −2)T .
Now consider the unit ball for each norm in turn:
• In the 1-norm, the diamond-shaped unit ball (see Fig. 4) is
mapped into the parallelogram in Fig. 5 with vertices (1, 0),
(2, 2), (−1, 0) and (−2, −2). The unit vector x that is magnified
most by A is (0, 1)∗ or its negative and the magnification factor
is 4 in the 1–norm.
Write

   ( X )  =  A ( x )  =  ( x + 2y )
   ( Y )       ( y )     (   2y   )
Figure 5: Image of the unit ball in the 1-norm under multiplication by A — the parallelogram with vertices (1, 0), (2, 2), (−1, 0) and (−2, −2); its upper edges are the legs
L1 : Y = 2(X − 1) (1 ≤ X ≤ 2), L2 : Y = 2/3(X + 1) (X ≥ 0) and L3 : Y = 2/3(X + 1) (X ≤ 0).
Now examine the parallelogram in the X–Y plane in Fig. 5:
– On leg L1 , X and Y are both non-negative so
kAxk1 = X + Y = X + 2(X − 1) = 3X − 2, 1 ≤ X ≤ 2 so
kAxk1 ≤ 4.
– On leg L2 , X and Y are both non-negative so
kAxk1 = X + Y = X + 2/3(X + 1) = 5X/3 + 2/3, 0 ≤ X ≤ 2
so kAxk1 ≤ 4 again.
– Finally on leg L3 , X ≤ 0 and Y is non-negative so
kAxk1 = −X + Y = −X + 2/3(X + 1) = 2/3 − X/3,
−1 ≤ X ≤ 0 and on this leg, kAxk1 ≤ 1.
By the symmetry of the diagram, the results for the lower part
of the parallelogram will be the same. So kAxk1 ≤ 4.
So as 4 is the least upper bound on kAxk1 over unit vectors x
in the 1–norm, I have kAk1 = 4.
• In the 2-norm, the unit ball (a unit disc) in the x–y plane is
mapped into the ellipse X² − 2XY + (5/4)Y² = 1 in the X–Y
plane, containing the points (1, 0) and (2, 2). It is a nice
exercise to check that the unit vector that is magnified most by
A is the vector (cos θ, sin θ) with tan(2θ) = −4/7. This
equation has multiple solutions as tan is periodic with period π.
Check that θ = 1.3112 corresponds to the largest value for
kAxk. The corresponding point on the unit circle in the x–y
plane is approximately (0.2567, 0.9665). Check that the
corresponding magnification factor is approximately 2.9208. (It
must be at least √8 ≈ 2.8284, as (0, 1)∗ maps to (2, 2)∗ .)
So:

   kAk2 ≈ 2.9208
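The value can be checked numerically in Matlab/Octave; the check below also uses the fact (proved in the next Chapter) that kAk2 is the square root of the largest eigenvalue of A∗ A, and the maximising angle θ found above.

   A = [1 2; 0 2];
   norm(A,2)                             % approx 2.9208
   sqrt(max(eig(A'*A)))                  % the same value (see the next Chapter)
   th = 1.3112;                          % the maximising angle found above
   norm(A*[cos(th); sin(th)], 2)         % the magnification factor, approx 2.9208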
I can plot the image of the unit disc based on the following:

   A = ( 1  2 ) ,    A−1 = ( 1   −1  )
       ( 0  2 )            ( 0   1/2 )

   ( x )  =  A−1 ( X )  =  (  X − Y  )
   ( y )        ( Y )      ( (1/2)Y  )

So the unit circle

   1 = x² + y² = (X − Y)² + (1/4)Y²

transforms into the ellipse X² − 2XY + (5/4)Y² = 1.
On the next Slide I plot this ellipse with (1, 0) and (2, 2) shown.
Figure 6: Image of the unit disk in the 2-norm under multiplication by A — the ellipse X² − 2XY + (5/4)Y² = 1.
That was a lot of algebra to get the norm of a 2 × 2 matrix —
there must be a better way; there is! More later.
• In the ∞-norm, the square “unit ball” with vertices (1, 1),
(−1, 1), (−1, −1) and (1, −1) is transformed into the
parallelogram with vertices (3, 2), (1, 2), (−3, −2) and (−1, −2).
It is easy to check that the points with largest ∞-norm on this
parallelogram are ±(3, 2) with magnification factor equal to 3.
So
kAk∞ = 3.
Example 4.4 (The p-norm of a diagonal matrix) Let
D = diag(d1 , . . . , dn ).
• Then (similar argument to that preceding Fig. 6) the image of
the n-dimensional 2-norm unit sphere Σ_{i=1}^{n} xi² = 1 under D is
just the n-dimensional ellipsoid X^T D^{−2} X = 1, i.e. Σ_{i=1}^{n} Xi²/di² = 1.
• The semiaxis lengths (maximum values of each of the Xi ) are |di |.
• The unit vectors magnified most by D are those that are mapped
to the longest semiaxis of the ellipsoid, of length max{|di |}.
• Therefore, kDk2 = sup_{kxk=1} kDxk = sup_{kxk=1} kXk = max_{1≤i≤n} |di |.
• Check that this result holds not just for the 2-norm but for any
p-norm (p ≥ 1) when D is diagonal.
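A quick Matlab/Octave check of Example 4.4 for a particular diagonal matrix (the entries of d are arbitrary):

   d = [2, -7, 0.5, 3];                  % arbitrary diagonal entries
   D = diag(d);
   [norm(D,1), norm(D,2), norm(D,Inf), max(abs(d))]   % all four values agree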
I will derive a general “formula” for the 2-norm of a matrix in the
next Chapter — the 1– and ∞–norm are easier to analyse.
Example 4.5 (The 1-norm of a matrix) If A is any
m × n matrix then kAk1 is the “maximum column sum” of A. I
argue as follows. First write A in terms of its columns

   A = ( a1  a2  . . .  an )

where each aj is an m-vector. Consider the (diamond-shaped for
n = 2) 1-norm unit ball in Cn .
This is the set B = {x ∈ Cn : Σ_{j=1}^{n} |xj | ≤ 1}.
Any vector Ax in the image of this set must satisfy

   kAxk1 = k Σ_{j=1}^{n} xj aj k1 ≤ Σ_{j=1}^{n} |xj | kaj k1 ≤ max_{1≤j≤n} kaj k1 ,

where the first inequality follows from the Triangle Inequality and
the second from the definition of the unit ball B. So the induced
matrix 1-norm satisfies kAxk1 ≤ max_{1≤j≤n} kaj k1 . By choosing x = eJ ,
where J is the index that maximises kaj k1 , I have kAxk1 = kaJ k1
("I can attain this bound"). So any number less than kaJ k1 cannot
be an upper bound as kAeJ k1 would exceed it. But the matrix norm
is the least upper bound (supremum) on kAxk1 over all unit
vectors x in the 1–norm and so the matrix norm is

   kAk1 = max_{1≤j≤n} kaj k1 ,

the maximum column sum.
(This reasoning is tricky but worth the effort.)
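Example 4.5 is easy to check in Matlab/Octave: the maximum column sum can be computed directly and compared with the built-in 1-norm (the matrix is a random example):

   A = randn(5,3);                       % an arbitrary example matrix
   max(sum(abs(A),1)) - norm(A,1)        % ~0: kAk_1 is the maximum column sum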
Example 4.6 (The ∞-norm of a matrix) Using a similar
argument check that the ∞-norm of an m × n matrix A is the
"maximum row sum" of A:

   kAk∞ = max_{1≤i≤m} ka∗i k1 ,

where a∗i stands for the ith row of A.
(Hint: x = (±1, ±1, . . . , ±1)T attains the relevant upper bound and
kxk∞ = 1 where the ± signs are chosen to be the same as the signs
of the corresponding components of a∗I where I is the index that
maximises ka∗i k1 .)
Check that this sneaky choice of x ensures that kAxk∞ = ka∗I k1 .
(If the matrix A is complex the definition of x is slightly more
complicated but the reasoning is the same, as is the result.)
Stopped here 16:00, Monday 20 October
Exercise 4.3 Check that the max column sum & maximum row
sum formulas just derived for the 1–norm and ∞–norm of a matrix
give the correct results for the matrix

   A = ( 1  2 )
       ( 0  2 )

discussed above.
Exercise 4.4 Repeat the calculations for the 4 × 3 matrix

   B = ( 1   2   7 )
       ( 2   5   1 )
       ( 6   3  −1 )
       ( 9  −2   4 )

and check using Matlab/Octave.
I proved the Cauchy-Schwarz inequality (2.7) for a general real or
complex inner product space. The result for the standard
Euclidean inner product on Cn is just
|x∗ y| ≤ kxk2 kyk2 .
I can apply the C-S inequality to find the 2-norm of some special
matrices.
Example 4.7 (The 2-norm of a row vector) Consider a
matrix A containing a single row. I can write A = a∗ where a is a
column vector. The C-S inequality allows us to find the induced
2-norm. For any x I have kAxk2 = |a∗ x| ≤ kak2 kxk2 . The bound is
“tight” as kAak2 = kak22 . So I have
   kAk2 = sup_{x≠0} {kAxk2 /kxk2 } = kak2 .
Example 4.8 (The 2-norm of an outer product) Let A be the
rank-one outer product uv∗ where u ∈ Cm and v ∈ Cn . For any
x ∈ Cn , I can use the C-S inequality to bound kAxk2 by
kAxk2 = kuv∗ xk2 = kuk2 |v∗ x| ≤ kuk2 kvk2 kxk2 .
Then kAk2 ≤ kuk2 kvk2 . In fact the inequality is an equality
(consider the case x = v/kvk) so kAk2 = kuk2 kvk2 .
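A short Matlab/Octave check of Example 4.8 (u and v are arbitrary random complex vectors):

   u = randn(4,1) + 1i*randn(4,1);       % arbitrary complex vectors
   v = randn(6,1) + 1i*randn(6,1);
   A = u*v';                             % the rank-one outer product u v*
   norm(A,2) - norm(u)*norm(v)           % should be (essentially) zero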
5 QR Factorisation and Least Squares
In this Chapter I will study the QR algorithm and the related
topic of Least Squares problems. The underlying idea is that of
orthogonality.
• I will begin by discussing projection operators.
• I will then develop the QR factorisation, our first matrix
factorisation. (SVD will be the second.)
• Next I revisit the Gram-Schmidt orthogonalisation process
(2.9) in the context of QR.
• The Householder Triangularisation will then be examined as a
more efficient algorithm for implementing the QR factorisation.
• Finally I will apply these ideas to the problem of finding least
squares fits to data.
5.1 Projection Operators
• A projection operator is a square matrix P that satisfies
the simple condition P2 = P. I’ll say that such a matrix is
idempotent.
• A projection operator can be visualised as casting a shadow
or projection Pv of any vector v in Cm onto a particular
subspace.
• If v ∈ range P then “visually” v lies in its own shadow.
• Algebraically if v ∈ range P then v = Px for some x and
Pv = P2 x = Px = v.
• So (not unreasonably) a projection operator maps any vector
in its range into itself.
• The operator (matrix) I − P is sometimes called the
complementary projection operator to P.
• Obviously P(I − P)v = (P − P2 )v = 0 for any v ∈ Cm so I − P
maps vectors into the nullspace of P.
• If P is a projection operator then so is I − P as
(I − P)2 = I − 2P + P2 = I − P.
• It is easy to check that for any projection operator P,
range(I − P) = null(P)
and
null(I − P) = range(P).
• Also null(P) ∩ range(P) = {0} as any vector v in both satisfies
Pv = 0 and v − Pv = 0 and so v = 0.
• So a projection operator separates Cm into two subspaces (all
subspaces must contain the zero vector so the intersection of
any two subspaces will always contain 0).
• On the other hand suppose that I have two subspaces S1 and
S2 of Cm s.t.
– S1 ∩ S2 = { 0 } and
– S1 + S2 = Cm
where S1 + S2 means the span of the two sets, i.e.
S1 + S2 = {x ∈ Cm |x = s1 + s2 , s1 ∈ S1 , s2 ∈ S2 }.
• Such a pair of subspaces are called complementary
subspaces.
• In R3 , you can visualise S1 as a plane aT x = 0 or equivalently
αx + βy + γz = 0 with normal vector a = (α, β, γ)T and
S2 = {s|s = tb, t ∈ R}, the line through the origin parallel to
some vector b that need not be perpendicular to S1 . See the
Figure:
Figure 7: Complementary Subspaces S1 = {x | aT x = 0} and S2 = {x | x = tb, t ∈ R} in R3 ; v = s1 + s2 with s1 = Pv ∈ S1 and s2 = (I − P)v ∈ S2 .
The following Theorem says that given two complementary
subspaces S1 and S2 , I can define a projection operator based on
S1 and S2 .
Theorem 5.1 There is a projection operator P such that
range P = S1 and null P = S2 .
Proof:
• Simply define P by Px = x for all x ∈ S1 and Px = 0 for x ∈ S2 .
• Then if x ∈ S1 , P2 x = P(Px) = Px = x so P2 = P on S1 .
• If x ∈ S2 , P2 x = P(Px) = P0 = 0 = Px (using the fact that
0 ∈ S1 and 0 ∈ S2 ) so P2 = P on S2 also.
• Therefore P2 = P on Cm .
• The range and nullspace properties follow from the definition of
P.
• I say that P is the projection operator onto S1 along S2 .
• The projection operator P and its complement are precisely
the matrices that solve the decomposition problem: “Given v,
find vectors v1 ∈ S1 and v2 ∈ S2 s.t. v1 + v2 = v”.
• Clearly v1 = Pv and v2 = (I − P)v is one solution.
• In fact these vectors are unique as any solution to the
decomposition problem must be of the form
(Pv + v3 ) + ((I − P)v − v3 ) = v
where v3 is in both S1 and S2 so v3 = 0 as S1 ∩ S2 = 0.
• Note that I don’t yet know how to compute the matrix P for a
given pair of complementary subspaces S1 and S2 .
5.1.1 Orthogonal Projection Operators
• An orthogonal projection operator is one that projects
onto a space S1 along a space S2 where S1 and S2 are
orthogonal — i.e. s∗1 s2 = 0 for any s1 ∈ S1 and s2 ∈ S2 .
• A projection operator that is not orthogonal is sometimes
called oblique — the projection illustrated in Fig. 7 is oblique
as bT s1 is not identically zero for all s1 ∈ S1 .
• The Figure on the next Slide illustrates the idea — S1 is the
plane (through the origin) whose normal vector is a and S2 is
the line through the origin parallel to the vector a.
• So any vector s1 in (the plane) S1 is perpendicular to any
vector s2 parallel to the line S2 .
Figure 8: Orthogonal Subspaces S1 = {x | aT x = 0} and S2 = {x | x = ta, t ∈ R} in R3 ; v = s1 + s2 with s1 = Pv and s2 = (I − P)v.
• N.B. an orthogonal projection operator is not an
orthogonal matrix!
• I will show that it is in fact hermitian.
• The following theorem links the geometrical idea of
orthogonal projection operators with the hermitian property of
complex square matrices.
Theorem 5.2 A projection operator P is orthogonal iff P = P∗ .
Proof:
[→] Let P be a hermitian projection operator. RTP that the
vectors Px and (I − P)y are orthogonal for all x, y ∈ Cm . Let
x, y ∈ Cm and let s1 = Px and s2 = (I − P)y. Then
s∗1 s2 = x∗ P∗ (I − P)y = x∗ (P − P2 )y = 0.
So P is orthogonal.
[←] Suppose that P projects onto S1 along S2 where S1 ⊥ S2 and
dim S1 ∩ S2 = 0. I have that range P = S1 and null P = S2 .
RTP that P∗ = P.
Let dim S1 = n. Then I can factor P as follows:
• Let Q = {q1 , . . . , qm } be an orthonormal basis for Cm where
Qn = {q1 , . . . , qn } is an orthonormal basis for S1 and
Qm−n = {qn+1 , . . . , qm } is an orthonormal basis for S2 .
• For j ≤ n I have Pqj = qj and for j > n I have Pqj = 0.
• So

   PQ = ( q1  q2  . . .  qn  0  . . .  0 ).
• Therefore

   Q∗ PQ = Σ = diag(1, . . . , 1, 0, . . . , 0),

an m × m diagonal matrix with ones in the first n entries
and zeroes elsewhere. So I have constructed a factorisation
(an eigenvalue decomposition) for P, P = QΣQ∗ . It follows
that P is hermitian.
(This is also a SVD for P — discussed in detail in Ch. 6.)
5.1.2 Projection with an Orthonormal Basis
I have just seen that an orthogonal projection operator has some
singular values equal to zero (unless P = I) so it is natural to drop
the "silent" columns in Q corresponding to the zero singular values
and write

   P = Q̂Q̂∗   (5.1)

where the columns of Q̂ are orthonormal. The matrix Q̂ can be any
set of n orthonormal vectors — not necessarily from a SVD. Any
product Q̂Q̂∗ is an orthogonal (why?) projection operator onto the
column space of Q̂ as

   Pv = Σ_{i=1}^{n} (qi q∗i ) v,   (5.2)

a linear combination of the vectors qi . Check that this follows
from (5.1) for P.
The complement of an orthogonal projection operator is also an
orthogonal projection operator as I − Q̂Q̂∗ is hermitian.
A special case of orthogonal projection operators is the rank-one
orthogonal projection operator that isolates the component in a
single direction q:

   Pq = qq∗ .

General higher rank orthogonal projection operators are sums of
Pqi 's (see (5.2)). The complement of any Pq is the rank-(m − 1)
orthogonal projection operator that eliminates the component in
the direction of q:

   P⊥q = I − qq∗ .   (5.3)
Finally, q is a unit vector in the above. To project along a non-unit
vector a, just replace q by a/kak giving:

   Pa = aa∗ / (a∗ a) ,        (5.4)
   P⊥a = I − aa∗ / (a∗ a) .   (5.5)
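The formulas (5.4) and (5.5) are easy to experiment with in Matlab/Octave; the short sketch below builds Pa and P⊥a for an arbitrary a and confirms the properties claimed above.

   a  = randn(4,1);                      % an arbitrary (non-unit) vector
   Pa = a*a'/(a'*a);                     % projection onto span{a}, cf. (5.4)
   Pp = eye(4) - Pa;                     % the complement, cf. (5.5)
   norm(Pa*Pa - Pa)                      % ~0: Pa is idempotent
   norm(Pa - Pa')                        % ~0: Pa is hermitian, so orthogonal (Thm. 5.2)
   norm(Pa*Pp)                           % ~0: the two projections annihilate each other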
5.1.3 Orthogonal Projections with an Arbitrary Basis
I can construct orthogonal projection operators that project any
vector onto an arbitrary not necessarily orthonormal basis for a
subspace of Cm .
• Suppose this subspace is spanned by a linearly independent set
{a1 , . . . , an } and let A be the m × n matrix whose jth column is
aj .
• Consider the orthogonal projection vA = Pv of a vector v onto
this subspace — the range of A (the space spanned by the
vectors {a1 , . . . , an }).
• Then the vector vA − v must be orthogonal to the range of A so
a∗j (vA − v) = 0 for each j.
• Since vA ∈ range A, I can write vA = Ax for some vector x and
I have a∗j (Ax − v) = 0 for each j or equivalently A∗ (Ax − v) = 0.
• So A∗ Ax = A∗ v.
• I know that, as A has full rank n (how do I know?), A∗ A is
invertible.
• Therefore x = (A∗ A)−1 A∗ v and finally I can write vA , the
projection of v, as

   vA = Ax = A(A∗ A)−1 A∗ v.

• So the orthogonal projection operator onto the range of A is
given by the formula

   P = A(A∗ A)−1 A∗ .   (5.6)
• Obviously P is hermitian, as predicted by Thm. 5.2.
• Note that (5.6) is a multidimensional generalisation of (5.4).
• In the orthonormal case A = Q̂, the (A∗ A) factor is just the
identity matrix and I recover (5.1).
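A minimal Matlab/Octave sketch of formula (5.6), assuming a full-rank A (a random 5 × 2 example here); it checks that P is idempotent, hermitian and fixes the columns of A. (Using the backslash operator rather than forming (A∗ A)−1 explicitly.)

   A = randn(5,2);                       % full rank (with probability 1)
   P = A*((A'*A)\A');                    % P = A(A*A)^(-1)A*, cf. (5.6)
   norm(P*P - P)                         % ~0: idempotent
   norm(P - P')                          % ~0: hermitian
   norm(P*A - A)                         % ~0: P fixes vectors in range(A)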
5.1.4 Oblique (Non-Orthogonal) Projections
Oblique projections are less often encountered in Numerical Linear
Algebra but are interesting in their own right.
One obvious question is: how do I construct a matrix P for an oblique
projection, analogous to the formula P = A(A∗ A)−1 A∗ for an
orthogonal projection operator?
The details are interesting but involved. See App. H — this
material is optional.
5.1.5 Exercises
1. If P is an orthogonal projection operator then I − 2P is unitary.
Prove this algebraically and give a geometrical explanation.
(See Fig. 9.)
2. Define F to be the m × m matrix that “flips” (reverses the
order of the elements of) a vector (x1 , . . . , xm )∗ to
(xm , . . . , x1 )∗ . Can you write F explicitly? What is F2 ?
3. Let E be the m × m matrix that finds the “even part” of a
vector in Cm so that Ex = (x + Fx)/2 where F was defined in
the previous question. Is E an orthogonal projection operator ?
Is it a projection operator ?
4. Given that A is an m × n (m ≥ n) complex matrix show that
A∗ A is invertible iff A has full rank. (See the discussion before
and after (4.6).) See App. J for a solution.
5. Let S1 and S2 be the subspaces of R3 spanned by:
B1 = (1, −1, −1)T , (0, 1, −2)T
and B2 = (1, −1, 0)T
respectively. Show that S1 and S2 are complementary. Is S2
orthogonal to S1 ? If so, calculate the projection operator onto
S1 along S2 . If not, calculate it using the methods explained in
App. H. (Optional).
6. Consider the matrices

   A = ( 1  0 )         B = ( 1  2 )
       ( 0  1 ) ,           ( 0  1 ) .
       ( 1  0 )             ( 1  0 )
Answer the following by hand calculation:
(a) What is the orthogonal projection operator P onto the
range of A.
(b) What is the image under P of the vector (1, 2, 3)∗ ?
(c) Repeat the calculations for B.
7. (Optional) Can you find a geometrical interpretation for the
Hermitian conjugate P∗ of an oblique (P∗ ≠ P) projection
operator P? Hints: Show that P∗ v ∈ S̄2 by using the fact that
for any v, w ∈ Cm ; w∗ (P∗ v) = (Pw)∗ v which is zero if w ∈ S2 .
Use a similar trick to show that P∗ v = 0 for any v ∈ S̄1 . (S̄1 is
the subspace of vectors orthogonal to all vectors in S1 .) So P∗
projects onto ??? along ???.
Make a sketch to illustrate the above based on the case where
S1 and S2 are both one-dimensional.
8. Show that for any projection operator P, kPk2 = kP∗ k2 . (Hint:
remember/show that for any matrix M, MM∗ and M∗ M have
the same eigenvalues — Thm. 6.12 — and then use the fact
that kPk2² is the largest eigenvalue of P∗ P and kP∗ k2² is the largest
eigenvalue of PP∗ .)
9. (Optional) (Part of the proof requires that you read
Appendix H.) Let P ∈ Cm×m be a non-zero projection
operator. Show that kPk2 ≥ 1 with equality iff P is an
orthogonal projection operator. See App. K for a solution.
5.2 QR Factorisation
The following Section explains one of the most important ideas in
Numerical Linear Algebra — QR Factorisation.
5.2.1
Reduced QR Factorisation
The column spaces of a matrix A are the succession of spaces
spanned by the columns a1 , a2 , . . . of A:
span(a1 ) ⊆ span(a1 , a2 ) ⊆ span(a1 , a2 , a3 ) ⊆ . . .
The geometric idea behind the QR factorisation is the construction
of a sequence of orthonormal vectors qi that span these successive
spaces.
So q1 is just a1 /ka1 k, q2 is a unit vector ⊥ q1 that is a linear
combination of a1 and a2 , etc.
For definiteness suppose that an m × n complex matrix A is “tall
and thin” — i.e (n ≤ m) with full rank n. I want the sequence of
orthonormal vectors q1 , q2 , . . . to have the property that
span(q1 , q2 , . . . , qj ) = span(a1 , a2 , . . . , aj ),
j = 1, . . . , n.
I claim that this is equivalent to A = QR or schematically:

   ( a1  a2  . . .  an ) = ( q1  q2  . . .  qn ) ( r11  r12  . . .  r1n )
                                                 (      r22  . . .  r2n )
                                                 (             ..   ..  )
                                                 (                  rnn )   (5.7)

where the diagonal elements rkk of the upper triangular matrix R
are non-zero.
Argue as follows:
• If (5.7) holds, then each ak can be written as a linear
combination of q1 , . . . qk and therefore the space
span(a1 , . . . ak ) can be written as a linear combination of
q1 , . . . qk and therefore is equal to span(q1 , . . . qk ).
• Conversely span(q1 , q2 , . . . , qj ) = span(a1 , a2 , . . . , aj ) for each
j = 1, . . . , n means that (for some set of coefficients rij with the
rii non-zero)
   a1 = r11 q1
   a2 = r12 q1 + r22 q2
   . . .
   an = r1n q1 + r2n q2 + · · · + rnn qn .

This is (5.7).
If I write (5.7) as a matrix equation I have

   A = Q̂R̂

where Q̂ is m × n with n orthonormal columns and R̂ is n × n and
upper-triangular. This is referred to as a reduced QR
factorisation of A.
5.2.2 Full QR factorisation
A full QR factorisation of an m × n complex matrix A (m ≥ n)
appends m − n extra orthonormal columns to Q̂ so that it becomes
an m × m unitary matrix Q. This is analogous to the "silent"
columns in the SVD — to be discussed in Ch. 6.
In the process of adding columns to Q̂, rows of zeroes are appended
to R̂ so that it becomes an m × n matrix — still upper-triangular.
The extra "silent" columns in Q multiply the zero rows in R so the
product is unchanged.
Note that in the full QR factorisation, the silent columns qj of Q
(for j > n) are orthogonal to the range of A as the range of A is
spanned by the first n columns of Q. If A has full rank n, the
silent columns are an orthonormal basis for the orthogonal
complement of the range of A — the null space of A∗ .
5.2.3 Gram-Schmidt Orthogonalisation
The equations above for ai in terms of qi suggest a method for the
computation of the reduced QR factorisation. Given the columns of
A; a1 , a2 , . . . I can construct the vectors q1 , q2 , . . . and the entries
rij by a process of successive orthogonalisation — the
Gram-Schmidt algorithm (Alg. 2.1).
Applying the G-S algorithm to the problem of calculating the qj
and the rij I have:

   q1 = a1 / r11
   q2 = (a2 − r12 q1 ) / r22
   q3 = (a3 − r13 q1 − r23 q2 ) / r33
   . . .
   qn = (an − Σ_{i=1}^{n−1} rin qi ) / rnn

where the rij are just the components of each aj along the ith
orthonormal vector qi , i.e. rij = q∗i aj for i < j and
rjj = kaj − Σ_{i=1}^{j−1} rij qi k in order to normalise the qj .
Writing the algorithm in pseudo-code:
Algorithm 5.1 Classical Gram-Schmidt Process (unstable)
(1)   for j = 1 to n
(2)     vj = aj
(3)     for i = 1 to j − 1
(4)       rij = q∗i aj
(5)       vj = vj − rij qi
(6)     end
(7)     rjj = kvj k2
(8)     qj = vj /rjj
(9)   end
A matlab/octave implementation of this algorithm can be found
at: http://jkcray.maths.ul.ie/ms4105/qrgs1.m.
5.2.4 Instability of Classical G-S Algorithm
As the note in the title suggests, the above algorithm is
numerically unstable — although algebraically correct.
Let’s see why.
• First I need to explain “N–digit floating point” arithmetic —
nothing new of course but in these examples I’ll need to be
careful about how f.p. arithmetic is done.
• For any real number x define fl(x) (“float of x”) to be the
closest floating point number to x using whatever rounding
rules are selected — to N digits.
• So in 10–digit fp arithmetic, fl(π) = 3.141592653.
• In 3–digit fp arithmetic, fl(π) = 3.14.
• Also of course , fl(1 + 10−3 ) = 1 as 1.001 has to be rounded
down to 1.00.
Example 5.1 (CGS Algorithm Instability) I'll apply the
CGS algorithm 5.1 above to three nearly equal vectors and show
that CGS gives wildly inaccurate answers. I'll work in 3-digit f.p.
arithmetic.
• Take the three vectors

   a1 = (1, 10−3 , 10−3 )T ,   a2 = (1, 10−3 , 0)T   and   a3 = (1, 0, 10−3 )T

as input.
• Check that they are linearly independent.
• See the CGS calculation in App. S.
• You'll find that q∗2 q3 = 0.709. Not even close to
orthogonal!
• The example was deliberately constructed to “break” the CGS
algorithm and of course Matlab/octave use 16–digit f.p.
arithmetic, not three. The three vectors are almost equal (in
3–digit arithmetic) so you might expect problems.
• I’ll show in the next Section that a modified version of the GS
algorithm doesn’t fail, even on this contrived example.
Here’s another example which doesn’t use 3–digit f.p. arithmetic.
In the Example, ε is the smallest positive f.p. number such that
1 + 2ε > 1. So 1 + ε = 1 in f.p. arithmetic and of course 1 + ε² = 1.
In Matlab/Octave ε = (1/2)εM ≈ 1.1102 × 10−16 .
Exercise 5.1 Take a1 = (1, ε, 0, 0)∗ , a2 = (1, 0, ε, 0)∗ and
a3 = (1, 0, 0, ε)∗ . Check that (using CGS) q∗2 q3 = 1/2 — it should of
course be zero.
I will improve on CGS in the next Section — for the present I will
use it to discuss the QR algorithm further.
5.2.5 Existence and Uniqueness
Every m × n complex matrix A has a QR factorisation which is
unique subject to some restrictions. The existence proof:
Theorem 5.3 Every A ∈ Cm×n (m ≥ n) has a full QR
factorisation and so also a reduced QR factorisation.
Proof:
• Suppose that A has full rank n and that I require a reduced
QR factorisation. Then the G-S algorithm provides the proof
as the algorithm generates orthonormal columns for Q̂ and
entries for R̂ so that A = Q̂R̂. The algorithm can fail only if at
some iteration vj = 0 and so cannot be normalised to produce
qj . But this would imply that aj is in the span of q1 , . . . qj−1
contradicting the assumption that A has full rank.
• Now suppose that A does not have full rank. Then at one or
more steps j I will find that vj = 0 as aj can be expressed as a
linear combination of fewer than n qi ’s. Now just pick qj to be
any unit vector orthogonal to q1 , . . . , qj−1 and continue the
G-S algorithm.
• Finally the full rather than reduced QR factorisation of an
m × n matrix A with m > n can be constructed by adding
extra orthonormal vectors after the nth iteration. I just
continue to apply the G-S algorithm for m − n more iterations
to arbitrary vectors orthogonal to the column space of A.
Now for uniqueness. Suppose that A = Q̂R̂ is a reduced QR
factorisation. If the ith column of Q̂ is multiplied by z and the ith
row of R̂ is multiplied by z−1 for any z ∈ C s.t. |z| = 1 then the
product Q̂R̂ is unchanged so I have another QR factorisation for A.
The next theorem states that if A has full rank then this is the
only freedom in our choice of QR factorisations.
Theorem 5.4 Every full rank m × n complex matrix A (m ≥ n)
has a unique reduced QR factorisation A = Q̂R̂ such that the
diagonal elements of R̂ are all positive.
Proof: Again I use the GS algorithm. From A = Q̂R̂, the
orthonormality of the columns of Q̂ and the upper-triangularity of
R̂ it follows that any reduced QR factorisation of A can be
generated by Alg. 5.1 — by the assumption of full rank the rjj are
all non-zero so all the vectors qj , j = 1, . . . , n can be formed. The
one degree of freedom is that in line 7 I made the arbitrary choice
rjj = kvj k2 . As mentioned above, multiplying each qi by a different
complex number zi (with unit modulus) and dividing the
corresponding row of R̂ by the same amount does not change the
product Q̂R̂ as the qi still have unit norm. The restriction rjj > 0
means that the choice in line 7 is unique.
5.2.6 Solution of Ax = b by the QR factorisation
Suppose that I want to solve Ax = b for x where A is complex,
m × m and invertible. If A = QR is a QR factorisation then I can
write QRx = b or Rx = Q∗ b. The RHS is easy to compute once I
know Q and the linear system is easy to solve (by back
substitution) as R is upper triangular.
So a general method for solving linear systems is:
1. Compute a QR factorisation for A; A = QR.
2. Compute y = Q∗ b.
3. Solve Rx = y for x.
This method works well but Gaussian elimination uses fewer
arithmetic operations. I will discuss this topic further in Chapter 7.
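A hedged Matlab/Octave sketch of the three-step method just described, using the built-in qr function and back substitution via the backslash operator applied to the triangular factor:

   m = 5;  A = randn(m);  b = randn(m,1);   % an arbitrary square system
   [Q,R] = qr(A);                        % step 1: A = QR
   y = Q'*b;                             % step 2: y = Q^* b (Q' is the conjugate transpose)
   x = R\y;                              % step 3: back substitution for Rx = y
   norm(A*x - b)                         % ~0 up to rounding error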
5.2.7 Exercises
1. Consider again the matrices A and B in Q. 6 of Exercises 5.1.5.
Calculate by hand a reduced and a full QR factorisation of
both A and B.
2. Let A be a matrix with the property that its odd-numbered
columns are orthogonal to its even-numbered columns. In a
reduced QR factorisation A = Q̂R̂, what particular structure
will R̂ have? See App. Q for a solution.
3. Let A be square m × m and let aj be its jth column. Using the
full QR factorisation, give an algebraic proof of Hadamard's
inequality | det A| ≤ Π_{j=1}^{m} kaj k2 . (Hint: use the fact that
aj = Σ_i rij qi then take norms and use the fact that
kaj k² = Σ_i |rij |² because the vectors qi are orthonormal.)
4. Check Hadamard’s inequality for a random (say) 6 × 6 matrix
using matlab/octave.
5. Let A be complex m × n , m ≥ n and let A = Q̂R̂ be a reduced
QR factorisation.
(a) Show that A has full rank n iff all the diagonal entries of R̂
are non-zero. (Hints: Remember A full rank means that (in
particular) the columns ai of A are linearly independent so
that Σ_{i=1}^{n} αi ai = 0 ⇒ αi = 0, i = 1, . . . , n. Rewriting in
vector notation; Aα = 0 ⇒ α = 0. Note that Q̂∗ Q̂ = In . So
Aα = 0 iff R̂α = 0. The problem reduces to showing that
for an upper triangular matrix R̂, R̂α = 0 ⇒ α = 0 iff all
the diagonal entries of R̂ are non-zero. Use
back-substitution to check this.)
(b) Suppose that R̂ has k non-zero diagonal entries for some k
with 0 ≤ k < n. What can I conclude about the rank of A?
(Is rank A = k? Or rank A = n − 1? Or rank A < n?)
First try the following Matlab experiment:
• Construct a tall thin random matrix A, say 20 × 12.
• Use the built-in Matlab QR function to find the reduced
QR factorisation of A: [q,r]=qr(A,0).
• Check that q ∗ r = A, rank A = 12 and rank r = 12.
• Now set two (or more) of the diagonal elements of r to
zero — say r(4,4)=0 and r(8,8)=0.
• What is the rank of r now?
• And what is the rank of Ã = q ∗ r?
• What happens if you increase the number of zero
diagonal elements of r?
See Appendix L for an answer to the original question.
5.3 Gram-Schmidt Orthogonalisation
The G-S algorithm is the basis for one of the two principal
methods for computing QR factorisations. In the previous Section
I used the conventional G-S algorithm to compute the QR
factorisation. I begin by re-describing the algorithm using
projection operators. Let A be complex, m × n (m ≥ n) and full
rank with n columns aj , j = 1, . . . , n.
Consider the sequence of formulas

   q1 = P1 a1 / kP1 a1 k ,   q2 = P2 a2 / kP2 a2 k ,   . . . ,   qn = Pn an / kPn an k .   (5.8)
Here each Pj is an orthogonal projection operator, namely the
m × m matrix of rank m − (j − 1) that projects from Cm
orthogonally onto the space orthogonal to {q1 , . . . , qj−1 }. (When
j = 1 this formula reduces to the identity P1 = I so q1 = a1 /ka1 k.)
Now I notice that each qj as defined by (5.8) is orthogonal to
{q1 , . . . , qj−1 }, lies in the space spanned by {a1 , . . . , aj } and has
unit norm. So the algorithm (5.8) is equivalent to Alg. 5.1 , our
G-S-based algorithm that computes the QR factorisation of a
matrix.
The projection operators Pj can be written explicitly. Let Q̂j−1 be
the m × (j − 1) matrix containing the first j − 1 columns of Q̂;

   Q̂j−1 = ( q1  q2  . . .  qj−1 ).
Then Pj is given by

   Pj = I − Q̂j−1 Q̂∗j−1 .   (5.9)

Can you see that this is precisely the operator that maps aj into
aj − (q∗1 aj )q1 − (q∗2 aj )q2 − · · · − (q∗j−1 aj )qj−1 , the projection of
aj onto the subspace ⊥ to {q1 , . . . , qj−1 }?
5.3.1 Modified Gram-Schmidt Algorithm
As mentioned in the previous Section, the CGS algorithm is flawed.
I showed you two examples where the vectors q1 , . . . , qn were far
from being orthogonal (mutually perpendicular) when calculated
with CGS.
Although algebraically correct, the CGS algorithm, when implemented
in floating point arithmetic, often produces vectors qi that are not
quite orthogonal, due to rounding errors (subtractive cancellation)
that arise from the succession of subtractions and the order in
which they are performed.
A detailed explanation is beyond the scope of this course.
Fortunately, a simple change is all that is needed. For each value of
j, Alg. 5.1 (or the neater version (5.8) using projection operators)
computes a single orthogonal projection of rank m − (j − 1),

   vj = Pj aj .   (5.10)

The modified G-S algorithm computes exactly the same result (in
exact arithmetic) but does so by a sequence of j − 1 projections,
each of rank m − 1. I showed in (5.3) that P⊥q is the rank m − 1
orthogonal projection operator onto the space orthogonal to a
vector q ∈ Cm . By the definition of Pj above, it is easy to see that
(with P1 ≡ I)

   Pj = P⊥qj−1 . . . P⊥q2 P⊥q1   (5.11)

as by the orthogonality of the qi ,

   Π_{i=j−1}^{1} (I − qi q∗i ) = I − Σ_{i=1}^{j−1} qi q∗i .
Also, using the definition (5.9) for Pj , given any v ∈ Cm ,

   Pj v = v − Q̂j−1 Q̂∗j−1 v
        = v − Q̂j−1 (q∗1 v, q∗2 v, . . . , q∗j−1 v)T
        = v − Σ_{i=1}^{j−1} (q∗i v) qi
        = (I − Σ_{i=1}^{j−1} qi q∗i ) v.
So the equation

   vj = P⊥qj−1 . . . P⊥q2 P⊥q1 aj   (5.12)

is equivalent to (5.10). The modified G-S algorithm is based on
using (5.12) instead of (5.10).
A detailed explanation of why the modified G-S algorithm is better
than the "unstable" standard version would be too technical for
this course. A simplistic explanation — the process of repeated
multiplication is much more numerically stable than repeated
addition/subtraction.
Why? Repeated addition/subtraction of order one terms to a large
sum typically results in loss of significant digits. Repeated
multiplication by order-one (I − qi q∗i ) factors is not subject to this
problem.
I’ll re-do Example 5.1 with MGS in Section 5.3.2 below — you’ll
see that it gives much better results.
Let’s “unpack” the modified G-S algorithm so that I can write
pseudo-code for it: The modified algorithm calculates vj by
performing the following operations (for each j = 1, . . . , n)

   vj(1) = aj
   vj(2) = P⊥q1 vj(1) = vj(1) − (q∗1 vj(1) )q1
   vj(3) = P⊥q2 vj(2) = vj(2) − (q∗2 vj(2) )q2
   . . .
   vj ≡ vj(j) = P⊥qj−1 vj(j−1) = vj(j−1) − (q∗j−1 vj(j−1) )qj−1 .

Of course I don’t need all these different versions of the vj — I just
update each vj over and over again inside a loop.
I can write this in pseudo-code as:
Algorithm 5.2 Modified Gram-Schmidt Process
(1)    for j = 1 to n
(2)      vj = aj
(3)    end
(4)    for j = 1 to n
(5)      for i = 1 to j − 1 (Do nothing if j = 1)
(6)        rij = q∗i vj
(7)        vj = vj − rij qi
(8)      end
(9)      rjj = kvj k
(10)     qj = vj /rjj
(11)   end
In practice, it would be sensible to let the vi overwrite the ai to
save memory.
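A direct Matlab/Octave transcription of Alg. 5.2 is sketched below (no attempt at efficiency, and the name mgs is just a choice for this sketch); note that rij = q∗i vj uses the partially orthogonalised vj , not the original aj .

   function [Q,R] = mgs(A)
     % Modified Gram-Schmidt reduced QR factorisation (a sketch of Alg. 5.2)
     [m,n] = size(A);
     Q = zeros(m,n);  R = zeros(n,n);
     V = A;                              % the working vectors v_j (a copy of A)
     for j = 1:n
       for i = 1:j-1
         R(i,j) = Q(:,i)'*V(:,j);        % r_ij = q_i^* v_j  (note: v_j, not a_j)
         V(:,j) = V(:,j) - R(i,j)*Q(:,i);
       end
       R(j,j) = norm(V(:,j));
       Q(:,j) = V(:,j)/R(j,j);
     end
   end

For example, with A = randn(6,4); [Q,R] = mgs(A); both norm(Q*R - A) and norm(Q'*Q - eye(4)) should be tiny.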
5.3.2 Example to Illustrate the “Stability” of MGS
Example 5.2 I’ll re-use Example 5.1, again working with 3–digit
f.p. arithmetic.
Take the three vectors

   a1 = (1, 10−3 , 10−3 )T ,   a2 = (1, 10−3 , 0)T   and   a3 = (1, 0, 10−3 )T

as input.
See the MGS calculation in App. T.
The good news: q∗1 q2 = −10−3 , q∗2 q3 = 0 and q∗3 q1 = 0. This
is as good as I can expect when working to 3–digit accuracy.
Exercise 5.2 Now check that MGS also deals correctly with the
example in Exercise 5.1.
5.3.3 A Useful Trick
A surprisingly useful trick is the technique of re-ordering sums or
(as I will show) operations. The technique is usually written in
terms of double sums of the particular form

   Σ_{i=1}^{N} Σ_{j=1}^{i} fij

and the “trick” consists in noting that if I label columns in an i–j
and the “trick” consists in noting that if I label columns in an i–j
“grid” by i and rows by j then I am summing elements of the
matrix “partial column by partial column” — i.e. I take only the
first element of column 1, the first two elements of column 2, etc.
Draw a sketch!
But I could get the same sum by summing row-wise; for each row
(j) sum all the elements from column j to column N.
So I have the (completely general) formula:

   Σ_{i=1}^{N} Σ_{j=1}^{i} fij = Σ_{j=1}^{N} Σ_{i=j}^{N} fij .   (5.13)
The usefulness of the formula here is as follows:
The MGS algorithm is:
Algorithm 5.3 Modified Gram-Schmidt Process
(1)    for j = 1 to n
(2)      vj = aj
(3)    end
(4)    for j = 1 to n
(5)      for i = 1 to j − 1 (Do nothing if j = 1)
(6)        rij = q∗i vj
(7)        vj = vj − rij qi
(8)      end
(9)      rjj = kvj k
(10)     qj = vj /rjj
(11)   end
I can (another trick) rewrite this as a simpler double (nested) for
loop similar in structure to the double sum (5.13) above:
Algorithm 5.4 Modified Gram-Schmidt Process
(1)    for j = 1 to n
(2)      vj = aj
(3)    end
(4)    for j = 1 to n
(5)      for i = 1 to j
(6)        if i < j then
(7)          rij = q∗i vj
(8)          vj = vj − rij qi
(9)        fi
(10)       if i = j then
(11)         rjj = kvj k
(12)         qj = vj /rjj
(13)       fi
(14)     end
(15)   end
Lines 1–3 can be left alone but suppose that I apply (5.13) above to
lines 4–15? (Think of this “block”, depending on i & j, as fij .)
I get:
Algorithm 5.5 Modified Gram-Schmidt Process
(1)    for j = 1 to n
(2)      vj = aj
(3)    end
(4)    for i = 1 to n
(5)      for j = i to n
(6)        if i < j then
(7)          rij = q∗i vj
(8)          vj = vj − rij qi
(9)        fi
(10)       if i = j then
(11)         rjj = kvj k
(12)         qj = vj /rjj
(13)       fi
(14)     end
(15)   end
Finally, I can undo the i < j and i = j tricks that I used to make a
block depending on both i & j:
Algorithm 5.6 Modified Gram-Schmidt Process — Alternative Form
(1)    for i = 1 to n (I’m using i instead of j because it looks better. . . )
(2)      vi = ai
(3)    end
(4)    for i = 1 to n
(5)      rii = kvi k
(6)      qi = vi /rii
(7)      for j = i + 1 to n
(8)        rij = q∗i vj
(9)        vj = vj − rij qi
(10)     end
(11)   end
Can you see why lines 5 & 6 now appear before the inner for loop
— and why the dummy variable is now i?
This alternative (and entirely equivalent) form for MGS is often
used in textbooks and on-line & if you had not read the preceding
discussion you might think it was a completely different algorithm.
Exercise 5.3 Perform the same re-ordering on the CGS algorithm
5.1.
5.3.4 Operation Count
When m and n are large, the work in both Alg. 5.1 and Alg. 5.2 is
dominated by the operations in the inner loop:
rij = q∗i vj
vj = vj − rij qi
The first line computes an inner product requiring m
multiplications and m − 1 additions. The second requires m
multiplications and m subtractions. So the total work per inner
iteration is ≈ 4m flops (4 flops per column element). In total, the
number of flops used by the algorithm is approximately

   Σ_{j=1}^{n} Σ_{i=1}^{j−1} 4m = Σ_{j=1}^{n} 4m(j − 1) ≈ 2mn² .   (5.14)
5.3.5 Gram-Schmidt as Triangular Orthogonalisation
It is interesting to interpret the GS algorithm as a process of
multiplying the matrix A on the right by a succession of triangular
matrices; “Triangular Orthogonalisation”.
Each outer j step of Alg. 5.2 can be viewed as a
right-multiplication. Starting with A, the j = 1 step multiplies the
first column a1 by 1/r11 .
This is equivalent to right-multiplying A by the matrix R1 :

   AR1 = ( a1  a2  . . .  an ) ( 1/r11  0  . . .  0 )
                               (   0    1  . . .  0 )
                               (   .        ..      )
                               (   0    0  . . .  1 )
        = ( q1  a2  . . .  an ).                        (5.15)
The j = 2 step subtracts r12 times q1 from a2 and divides the result
by r22 — equivalent to right-multiplying AR1 by the matrix R2 :

   AR1 R2 = ( q1  a2  . . .  an ) ( 1  −r12 /r22  0  . . .  0 )
                                  ( 0   1/r22    0  . . .  0 )
                                  ( 0     0      1  . . .  0 )
                                  ( .                 ..     )
                                  ( 0     0      0  . . .  1 )
          = ( q1  q2  a3  . . .  an ).                          (5.16)
The j = 3 step: a3 ← a3 − r13 q1 − r23 q2 , then divide the result by r33 :

   AR1 R2 R3 = ( q1  q2  a3  . . .  an ) ( 1  0  −r13 /r33  0  . . .  0 )
                                         ( 0  1  −r23 /r33  0  . . .  0 )
                                         ( 0  0   1/r33     0  . . .  0 )
                                         ( 0  0     0       1  . . .  0 )
                                         ( .                    ..      )
                                         ( 0  0     0       0  . . .  1 )
             = ( q1  q2  q3  a4  . . .  an ).                             (5.17)
• I can represent this process by a process of multiplying A on
the right by a sequence of elementary upper triangular matrices
Rj where each Rj only changes the jth column of A.
• After multiplying AR1 R2 . . . Rj−1 on the right by Rj the first j
columns of the matrix AR1 R2 . . . Rj consist of the vectors
q1 , . . . , qj .
• The matrix Rj is just the identity matrix with its jth column replaced:
the entries above the diagonal in column j are −r1j /rjj , −r2j /rjj , . . . , −rj−1,j /rjj ,
the (j, j) entry is 1/rjj , and every other column is the corresponding column
of the identity:

   Rj = ( 1           −r1j /rjj              )
        (    1        −r2j /rjj              )
        (       . .       .                  )
        (          1  −rj−1,j /rjj           )
        (              1/rjj                 )
        (                         1          )
        (                            . .     )
        (                                 1  )

(blank entries are zero).
• At the end of the process I have

   AR1 R2 . . . Rn = Q̂

where Q̂ is the m × n matrix with orthonormal columns
q1 , . . . , qn .
• So the GS algorithm is a process of triangular
orthogonalisation.
• Of course I do not compute the Rj explicitly in practice but
this observation gives us an insight into the structure of the GS
algorithm that will be useful later.
5.3.6 Exercises
1. Let A be a complex m × n matrix. Can you calculate the exact
number of flops involved in computing the QR factorisation
A = Q̂R̂ using Alg. 5.2?
2. Show that a product of upper triangular matrices is upper
triangular.
3. Show that the inverse of an upper triangular matrix is upper
triangular.
5.4 Householder Transformations
The alternative approach to computing QR factorisations is
Householder triangularisation, which is numerically more stable
than the Gram-Schmidt orthogonalisation process. The
Householder algorithm is a process of
“orthogonal triangularisation”, making a matrix triangular by
multiplying it by a succession of unitary (orthogonal if real)
matrices.
5.4.1 Householder and Gram-Schmidt
I showed at the end of Ch. 5.3.5 that the GS algorithm can be
viewed as applying a succession of elementary upper triangular
matrices Rj to the right of A, so that the resulting matrix

   AR1 . . . Rn = Q̂

has orthonormal columns. Check that the product R̂ = Rn^{−1} . . . R1^{−1}
is also upper triangular. So as expected A = Q̂R̂ is a reduced QR
factorisation for A.
On the other hand, I will see that the Householder method applies
a succession of elementary unitary matrices Qk on the left of A so
that the resulting matrix
Qn Qn−1 . . . Q1 A = R
is upper triangular. The product Q = Q∗1 Q∗2 . . . Q∗n is also unitary
so this method generates a full QR factorisation A = QR.
In summary;
• the Gram-Schmidt process uses triangular
orthogonalisation.
• the Householder algorithm uses
orthogonal triangularisation.
5.4.2 Triangularising by Introducing Zeroes
The Householder method is based on a clever way of choosing the
unitary matrices Qk so that Qn Qn−1 . . . Q1 A is upper
triangular.
In the example on the next slide, A is a general 5 × 3 matrix.
The matrix Qk is chosen to introduce zeroes below the diagonal in
the kth column while keeping all the zeroes introduced at previous
iterations. In the diagram, × represents an entry that is not
necessarily zero and a bold font means the entry has just been
changed. Blank entries are zero.
       A              Q1 A           Q2 Q1 A         Q3 Q2 Q1 A

   ( × × × )       ( X X X )       ( × × × )       ( × × × )
   ( × × × )  Q1   ( 0 X X )  Q2   (   X X )  Q3   (   × × )
   ( × × × )  →    ( 0 X X )  →    (   0 X )  →    (     X )
   ( × × × )       ( 0 X X )       (   0 X )       (     0 )
   ( × × × )       ( 0 X X )       (   0 X )       (     0 )

(Entries written X have just been changed; blank entries are zero.)
First Q1 operates on rows 1–5, introducing zeroes in column 1 in
the second and subsequent rows. Next Q2 operates on rows 2–5,
introducing zeroes in column 2 in the third and subsequent rows
but not affecting the zeroes in column 1. Finally Q3 operates
on rows 3–5, introducing zeroes in column 3 in the fourth and fifth
rows, again not affecting the zeroes in columns 1 and 2.
The matrix Q3 Q2 Q1 A is now upper triangular.
In general Qk is designed to operate on rows k to m. At the
beginning of the kth step there is a block of zeroes in the first k − 1
columns of these rows. Applying Qk forms linear combinations of
these rows and the linear combinations of the zero entries remain
zero.
&
%
MS4105
314
'
5.4.3
$
Householder Reflections
How to choose unitary matrices Qk that accomplish the
transformations suggested in the diagram? The standard approach
is to take each Qk to be a unitary matrix of the form:


Ik−1 0


(5.18)
Qk =
0
Hk
where Ik−1 is the (k − 1) × (k − 1) identity matrix and Hk is an
(m − k + 1) × (m − k + 1) unitary matrix. I choose Hk so that
multiplication by Hk introduces zeroes into the kth column.
For k = 1, Q1 = H1 , an m × m unitary matrix.
Notice that the presence of Ik−1 in the top left corner of Qk
ensures that Qk does not have any effect on the first k − 1
rows/columns of A.
&
%
MS4105
'
315
$
The Householder algorithm chooses Hk to be a unitary matrix
with a particular structure — a Householder reflector.
A Householder reflector Hk is designed to introduce zeroes below
the diagonal in column k — H1 introduces zeroes in rows 2–m of
column 1, H2 introduces zeroes in rows 3–m of column 2, etc;
without affecting the zeroes below the diagonal in the
preceding columns.
&
%
MS4105
316
'

∗


















∗
∗
$
∗
...
∗
...
∗
...
∗
...
∗
...
..
.
∗
..
.
...
x1
...
x2
..
.
...
xm−k+1
...
...
...


∗
∗



∗






∗




∗ Hk 

⇒

∗






∗





∗

∗
∗ ∗
∗
...
∗

...
∗

∗


∗


∗










∗
...
∗
...
∗
...
..
.
∗
..
.
...
...
...
0
..
.
...
0
...
...
Referring to the diagram, suppose that at the beginning of step k,
the entries in rows k to m of the kth column are given by the vector
(of length m − (k − 1)) x ∈ Cm−k+1 . After mutiplying by Hk ;
zeroes are introduced in column k under the diagonal, the entries
marked are changed and the entries marked by ∗ are unchanged.
To introduce the required zeroes into the kth column, the
Householder reflector H must transform x into a multiple of e1 —
the vector of the same size that is all zeroes except for the first
element.
For any vector v ∈ Cm−k+1 , the matrix P = vv∗ /(v∗ v) is an
orthogonal projection operator so that

   H = I − 2P = I − 2vv∗ /(v∗ v)   (5.19)

is unitary (check).
For an arbitrary vector v ∈ Cm , the effect on a vector x of a
Householder reflector H = I − 2Pv ≡ P⊥v − Pv is to reflect x
about the normal to v that lies in the x–v plane.
Figure 9: Householder Reflector for arbitrary v — x = Pv x + P⊥v x is reflected to Hx = P⊥v x − Pv x.
For any choice of x, I have

   Hx = x − (2v∗ x / v∗ v) v

so if (as I want) Hx ∈ span{e1 } then I must have v ∈ span{x, e1 }.
Setting v = x + αe1 (taking α real) gives

   v∗ x = x∗ x + αe∗1 x = kxk² + αx1

and

   v∗ v = kxk² + α² + 2αx1 .

So

   Hx = x − [2(kxk² + αx1 ) / (kxk² + α² + 2αx1 )] (x + αe1 )
      = [(α² − kxk²) / (kxk² + α² + 2αx1 )] x − 2α (v∗ x / v∗ v) e1 .
If I choose α = ±kxk then the coefficient of x is zero, so that Hx is a
multiple of e1 as required.
Now I can write

   v = x ± kxke1   (5.20)

and (substituting for v∗ x and v∗ v in Hx) I get the remarkably
simple result:

   Hx = ∓kxke1 .   (5.21)

On the two succeeding Slides I show in diagrams the effect of the
two choices of sign.
Figure 10: Householder reflection using v = x − kxke1 (here Hx = +kxke1 ).
Figure 11: Householder reflection using v = x + kxke1 (here Hx = −kxke1 ).
Example 5.3 If x = (3, 1, 5, 1)T then kxk = 6, α = ±6 and (taking
the positive sign in α) I find that v = (9, 1, 5, 1)T . The Householder
reflector H is given by

   H = I − 2vv∗ /(v∗ v) = (1/54) ( −27   −9  −45   −9 )
                                 (  −9   53   −5   −1 )
                                 ( −45   −5   29   −5 )
                                 (  −9   −1   −5   53 )

It is easy to check that Hx = (−6, 0, 0, 0)T = −6e1 as expected.
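Example 5.3 can be reproduced in a couple of lines of Matlab/Octave:

   x = [3; 1; 5; 1];
   v = x + sign(x(1))*norm(x)*[1; 0; 0; 0];   % v = (9, 1, 5, 1)^T
   H = eye(4) - 2*(v*v')/(v'*v);
   H*x                                        % returns (-6, 0, 0, 0)^T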
An obvious question: which choice of sign should I take in (5.20)?
The short answer is that either will do — algebraically. The full
answer is that I make the choice that avoids subtraction as
subtraction of nearly equal quantities leads to loss of significant
digits in the result.
So I use the following prescription for v:

   v = x + sign(x1 )kxke1   (5.22)

which of course means that (5.21) becomes:

   Hx = −sign(x1 )kxke1 .   (5.23)
(Which of the two Figures 10 and 11 corresponds to this choice?)
I can now write the Householder QR algorithm using a
matlab-style notation. If A is a matrix, define Ai:i 0 ,j:j 0 to be the
(i 0 − i + 1) × (j 0 − j + 1) submatrix of A whose top left corner is aij
and whose lower right corner is ai 0 j 0 — the rectangle in blue in the
diagram below. If the submatrix is a sub-vector of a particular row
or column of A I write Ai,j:j 0 or Ai:i 0 ,j respectively.

[Diagram: the submatrix Ai:i′ ,j:j′ of A — the rectangle with corners aij , aij′ , ai′ j and ai′ j′ .]
• The following algorithm Householder QR Factorisation
(Alg. 5.7) computes the factor R of a QR factorisation of an
m × n matrix A with m ≥ n overwriting A by R.
• The n “reflection vectors” v1 , . . . , vn are also computed.
• I could normalise them but it is better not to, instead I divide
vk v∗k by v∗k vk whenever the former is used!
• Algebraically it makes no difference but numerically it can.
• Why?
• Having made this choice, I must remember to divide vk v∗k by
v∗k vk whenever the former is used.
• See the Algorithms following Alg. 5.7.
Algorithm 5.7 Householder QR Factorisation
(1)   for k = 1 to n
(2)     x = Ak:m,k
(3)     vk = x + sign(x1 )kxke1
(4)     Ak:m,k:n = Ak:m,k:n − 2vk (v∗k Ak:m,k:n ) /(v∗k vk )
(5)   end
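A minimal Matlab/Octave sketch of Alg. 5.7 is given below for illustration only (the built-in qr should be used in practice); the name house_qr and the choice to return the reflection vectors as a cell array are just conveniences of this sketch.

   function [V,R] = house_qr(A)
     % Householder QR (a sketch of Alg. 5.7): A is overwritten by R and the
     % reflection vectors v_1,...,v_n are returned in the cell array V.
     [m,n] = size(A);
     V = cell(n,1);
     for k = 1:n
       x = A(k:m,k);
       s = sign(x(1));  if s == 0, s = 1; end   % the sign convention of (5.22)
       v = x;  v(1) = v(1) + s*norm(x);         % v = x + sign(x1)*kxk*e1
       A(k:m,k:n) = A(k:m,k:n) - 2*v*(v'*A(k:m,k:n))/(v'*v);
       V{k} = v;
     end
     R = triu(A);
   end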
Exercise 5.4 How should I efficiently implement line (3) in the
Algorithm?
Exercise 5.5 In fact I could marginally further improve this
algorithm by using the fact that 2/(v∗k vk ) = 1/(sign(x1 ) kxk v1 ).
This only requires a division and two multiplications. How many
are needed to calculate 2/(v∗k vk ) directly?
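For concreteness, here is a minimal Matlab/Octave sketch of Alg. 5.7 (the
function name houseqr is my own choice; it stores the reflection vectors as
columns of V and overwrites A with R):

function [V, R] = houseqr(A)
% Sketch of Alg. 5.7: Householder QR; the k-th reflection vector
% is stored in V(k:m,k) and A is overwritten by R.
[m, n] = size(A);
V = zeros(m, n);
for k = 1:n
    x = A(k:m, k);
    v = x;
    v(1) = v(1) + sign(x(1))*norm(x);   % v = x + sign(x1)||x|| e1
    % (sign(0)=0 in Matlab; if x(1)==0 one should take the + sign)
    V(k:m, k) = v;
    A(k:m, k:n) = A(k:m, k:n) - 2*v*(v'*A(k:m, k:n))/(v'*v);
end
R = A;
end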
&
%
MS4105
'
328
$
• Note that when A is square (m = n), n − 1 iterations are all
that is needed to make A upper triangular.
• So I have a choice:
– Either a final iteration to calculate vn and also update
A(n, n).
– Or I could choose to define vn = 1.
• Remember that I need v1 , . . . , vn to calculate the unitary
matrix Q.
• Note that the matlab-style notation avoids the necessity of
multiplying A by n × n matrices, instead I update the relevant
Ak:m,k:n submatrix at each iteration which is much more
efficient.
&
%
MS4105
329
'
5.4.4
$
How is Q to be calculated?
On completion of Alg. 5.7, A has been reduced to upper triangular
form — the matrix R in the QR factorisation. I have not
constructed the unitary m × m matrix Q nor its m × n sub-matrix
Q̂. The reason is that forming Q or Q̂ takes extra work
(arithmetic) and I can often avoid this extra work by working
directly with the formula
Q∗ = Qn . . . Q2 Q1                                           (5.24)
or its conjugate
Q = Q1 . . . Qn−1 Qn .                                        (5.25)
&
%
MS4105
330
'
$
N.B.
• At each iteration Hk = I − 2 vk v∗k /(v∗k vk ) is hermitian and
therefore Qk as defined in (5.18) is also — so I can omit the
“stars” in (5.25).
• But Q is not Hermitian in general.
&
%
MS4105
331
'
$
• For example I saw on Slide 278 that a square system of
equations Ax = b can be solved via the QR factorisation of A.
• I only needed Q to compute Q∗ b and by (5.24), I can calculate
Q∗ b by applying a succession of Qk to b.
• Using the fact that each Qk = I − 2 vk v∗k /(v∗k vk ) then (once I
know the n vectors v1 to vn ) I can evaluate Q∗ b by
Algorithm 5.8 Implicit Calculation of Q∗b
(1)  for k = 1 to n
(2)      bk:m = bk:m − 2 vk (v∗k bk:m )/(v∗k vk )
(3)  end
i.e. the same sequence of operations that were applied to A to
make it upper triangular.
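A matching Matlab/Octave sketch (again my own function name, assuming the
matrix V of reflection vectors produced by the houseqr sketch above):

function b = applyQt(V, b)
% Sketch of Alg. 5.8: compute Q'*b implicitly from the stored
% reflection vectors (columns of V).
n = size(V, 2);
for k = 1:n
    v = V(k:end, k);
    b(k:end) = b(k:end) - 2*v*(v'*b(k:end))/(v'*v);
end
end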
&
%
MS4105
332
'
$
• Similarly the computation of a product Qx can be performed
by the same process in the reverse order:
Algorithm 5.9 Implicit Calculation of Qx
(1)  for k = n DOWNTO 1
(2)      xk:m = xk:m − 2 vk (v∗k xk:m )/(v∗k vk )
(3)  end
• If I really need to construct Q I could:
– construct QI using Alg. 5.9 by computing its columns
Qe1 , . . . , Qem .
∗ In fact I can apply Alg. 5.9 to all the columns of the
identity matrix simultaneously, see Alg. 5.10.
– or compute Q∗ I using Alg. 5.8 and conjugate the result.
– or conjugate each step rather than the final product, i.e. to
construct IQ by computing its rows e∗1 Q, . . . , e∗m Q
&
%
MS4105
333
'
$
• The first idea is the best as it begins with operations involving
Qn , Qn−1 etc. that only modify a small part of the vector that
they are applied to.
• This can result in a speed-up.
• Here’s some pseudo-code:
• Algorithm 5.10 Explicit Calculation of Q
(1)  Q = Im                    (initialise Q to the m × m identity matrix)
(2)  for k = n DOWNTO 1
(3)      vk = V(k : m, k)
(4)      T = Q(k : m, k : m)
(5)      w = v∗k T
(6)      T = T − 2 vk w/(v∗k vk )
(7)      Q(k : m, k : m) = T
(8)  end
&
%
MS4105
334
'
5.4.5
$
Example to Illustrate the Stability of the
Householder QR Algorithm
Example 5.4 I’ll re-use the first example, again working with
3–digit f.p. arithmetic. Take the three vectors
a1 = (1, 10^−3 , 10^−3 )T ,  a2 = (1, 10^−3 , 0)T  and  a3 = (1, 0, 10^−3 )T
as input.
&
%
MS4105
335
'
$

So the matrix A to be factored is

A = [ 1       1       1
      10^−3   10^−3   0
      10^−3   0       10^−3 ]

and n = m = 3.
I’ll apply the Householder QR algorithm to A using 3–digit f.p.
arithmetic as previously. For details, see App. V.
I find that

R = [ −1   −1           −1
       0   −1.0 10^−3    0
       0    0            1.0 10^−3 ] .
&
%
MS4105
336
'
$
Check that when Q is computed using Alg. 5.9, I find that

Q = [ −1.0          −1.0 10^−3    1.0 10^−3
      −1.0 10^−3    −5.0 10^−7   −1.0
      −1.0 10^−3     1.0          5.0 10^−7 ]

and that the computed Q satisfies

Q∗Q − I = [ 0.0           0.0   −5.0 10^−10
            0.0           0.0    0.0
           −5.0 10^−10    0.0    0.0        ] .
This is remarkably accurate — for this example at least.
Finally, check that using three-digit f.p. arithmetic, QR is exactly
equal to A.
&
%
MS4105
337
'
5.5
$
Why is Householder QR So Stable?
The answer can be stated easily in the form of a Theorem (which I
will not prove).
First I need some definitions.
Definition 5.1 (Standard Model of F.P. Arithmetic)
Describe floating point arithmetic as follows:
• Let β be the number base (usually 2) and t the number of digits
precision.
• Then f.p. numbers take the form y = ±m × β^(e−t) .
• The exponent e varies between emin and emax .
• In matlab, with β = 2, emax = 1024.
• 21024 ≈ 1.797693134862316 10308 .
&
%
MS4105
'
338
$
• The “significand” (or mantissa) m is an integer between 0 and
β^t − 1.
• So y = ±β^e ( d1 /β + d2 /β^2 + · · · + dt /β^t ) = ±β^e × 0.d1 d2 . . . dt .
• Each digit di satisfies 0 ≤ di ≤ β − 1 and d1 6= 0.
• The first digit d1 is called the “most significant digit” and the
last digit dt is called the “least significant digit”.
• The number u = (1/2) β^(1−t) is called the unit roundoff.
&
%
MS4105
'
339
$
• It can be shown that if a real number x lies in the range of f.p.
numbers F as defined above, then x can be approximated by a
f.p. number f with a relative error no greater than u.
• Formally: fl(x) = x(1 + δ), where |δ| < u. See App. U for a
short proof.
• So the result of a floating point calculation is the exact answer
times a factor within “rounding error” of 1.
&
%
MS4105
'
340
$
I can state (but will not prove) the following Theorem to summarise
the numerical properties of the Householder QR algorithm.
First define γk = ku/(1 − ku) where u is the unit roundoff defined
above and γ̃k = cγk for some small integer constant c whose value is
unimportant given the small size of u.
Check that even for large values of k, γk and therefore γ̃k are very
small.
&
%
MS4105
341
'
$
Theorem 5.5 (Stability of Householder QR Algorithm) Let
A be m × n and let Q̂ and R̂ be the QR factors computed using the
Householder QR Alg. Then
• There is an orthogonal m × m matrix Q s.t. A + ∆A = QR̂
where k∆aj k2 ≤ γmn kaj k2 for j = 1, . . . , n.
• If Q = P1 P2 . . . Pn as usual then Q̂ = Q(Im + ∆I) where
k∆I(:, j)k2 ≤ γmn so kQ − Q̂kF ≤ √n γmn — so Q̂ is very close to
the orthonormal matrix Q.
• Finally,
k(A − Q̂R̂)(:, j)k2 ≡ k(A − QR̂)(:, j) + ((Q − Q̂)R̂)(:, j)k2
                  ≤ γ̃mn kaj k2 + kQ − Q̂kF kR̂(:, j)k2
                  ≤ √n γ̃mn kaj k2
so the columns of the product of the computed matrices Q̂ and
R̂ are very close to the corresponding columns of A.
&
%
MS4105
342
'
5.5.1
$
Operation Count
The work done in Alg. 5.7 is dominated by the (implicit) inner loop
Ak:m,j = Ak:m,j − 2vk (v∗k Ak:m,j )
for j = k, . . . , n. This inner step updates the jth column of the
submatrix Ak:m,k:n . If I write L = m − k + 1 for convenience then
the vectors in this step are of length L. The update requires
4L − 1 ≈ 4L flops. Argue as follows: L flops for the subtractions, L
for the scalar multiple and 2L − 1 for the inner product (L
multiplications and L − 1 additions).
&
%
MS4105
343
'
$
Now the index j ranges from k to n so the inner loop requires
≈ 4L(n − k + 1) = 4(m − k + 1)(n − k + 1) flops. Finally, the outer
loop ranges from k = 1 to k = n so I can write W, the total number
of flops used by the Householder QR algorithm as:

W = Σ_{k=1}^{n} 4(m − k + 1)(n − k + 1)
  = Σ_{k=1}^{n} 4(m − n + k)k
  = 4 [ (m − n) Σ_{k=1}^{n} k + Σ_{k=1}^{n} k^2 ]
  = 4 [ (m − n)n(n + 1)/2 + n(n + 1)(2n + 1)/6 ]
  = 2 mn^2 + 2 mn + 2/3 n − 2/3 n^3
  ≈ 2mn^2 − 2/3 n^3                                           (5.26)
&
%
MS4105
'
344
$
So Householder QR factorisation is more efficient than the
(modified) Gram-Schmidt factorisation algorithm (see 5.14). It is
also more stable — i.e. less prone to accumulated inaccuracies due
to round-off errors — as the Example in Section 5.4.5 suggests.
(The built-in qr command is much faster — this is mainly due to
the fact that it is pre-compiled code which is not interpreted
line-by-line.)
&
%
MS4105
345
'
5.6
$
Least Squares Problems
Suppose that I have a linear system of equations with m equations
and n unknowns with m > n — i.e. I want to solve Ax = b for
x ∈ Cn where A ∈ Cm×n and b ∈ Cm .
In general there is no solution. When can I find a solution? Exactly
when b ∈ range A, which is unlikely to happen by chance. Such
systems of equations with m > n are called overdetermined. The
vector r = Ax − b called the residual may be small for some
choices of x but is unlikely to be zero for any x.
The natural resolution to an insoluble problem is to re-formulate
the problem. I re-define the problem to that of finding the choice of
x that makes the norm (usually the 2-norm) of r as small as
possible. This is referred to as the Least Squares problem as,
when the 2-norm is used, the squared norm of r is a sum of squares.
&
%
MS4105
346
'
5.6.1
$
Example: Polynomial Data-fitting
If I have data (xi , yi ), i = 1, . . . , m where the xi , yi ∈ C then there
is a unique polynomial p(x) = c0 + c1 x + c2 x2 + · · · + cm−1 xm−1 ,
of degree m − 1 that interpolates these m points — p(x) can be
found by solving the square linear system (the matrix is called the
Vandermonde matrix).

[ 1   x1   x1^2   . . .   x1^(m−1)  ] [ c0      ]   [ y1 ]
[ 1   x2   x2^2   . . .   x2^(m−1)  ] [ c1      ]   [ y2 ]
[ .    .     .              .       ] [ c2      ] = [  . ]
[ .    .     .              .       ] [  .      ]   [  . ]
[ 1   xm   xm^2   . . .   xm^(m−1)  ] [ c(m−1)  ]   [ ym ]
&
%
MS4105
'
347
$
This is a system of m linear equations in the m unknowns ci ∈ C,
i = 0, . . . , m − 1 so there is a unique solution provided the
Vandermonde matrix is full rank.
In practice this technique is rarely used as the high-degree
polynomials needed to interpolate large data sets are typically
highly oscillatory. Additionally the Vandermonde matrix is
ill-conditioned for large m — leading to numerical instability. For
this reason it is better to “fit” the data with a relatively low order
polynomial of degree n < m.
Without changing the data points I can easily reformulate the
problem from interpolation to: find coefficients c0 , . . . , cn such that
krk2 is minimised where r = Vc − y and V is the Vandermonde
matrix as above with m rows and n columns with n < m.
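As an illustrative sketch (not from the printed notes), the low-order fit can be
set up directly in Matlab/Octave; the data vectors x and y are assumed given
and n is my choice of the number of coefficients:

% Sketch: least-squares polynomial fit via a rectangular Vandermonde matrix
m = length(x);              % number of data points (assumed given)
n = 4;                      % number of coefficients, n < m
V = ones(m, n);
for j = 2:n
    V(:, j) = V(:, j-1) .* x(:);   % columns 1, x, x.^2, ..., x.^(n-1)
end
c = V \ y(:);               % backslash solves the least squares problem
r = V*c - y(:);             % residual; norm(r) is what is minimised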
&
%
MS4105
348
'
$
I have

[ 1   x1   x1^2   . . .   x1^(n−1)  ] [ c0      ]   [ y1 ]
[ 1   x2   x2^2   . . .   x2^(n−1)  ] [ c1      ]   [ y2 ]
[ .    .     .              .       ] [ c2      ] ≈ [  . ]           (5.27)
[ .    .     .              .       ] [  .      ]   [  . ]
[ 1   xm   xm^2   . . .   xm^(n−1)  ] [ c(n−1)  ]   [ ym ]

and want to choose c0 , . . . , cn to make the norm of the residual
r = Vc − y as small as possible.
&
%
MS4105
349
'
5.6.2
$
Orthogonal Projection and the Normal Equations
How to “solve” (5.27) — i.e. how to choose c0 , . . . , cn to make the
norm of the residual r = Vc − y as small as possible — needs some
consideration.
I want to find the closest point Ax in the range of A to b — so that
the norm of the residual r is minimised. It is plausible that this will
occur provided that Ax = Pb where P is the orthogonal projection
operator that projects Cm onto the range of A — i.e. (5.6)
P = A(A∗ A)−1 A∗ .
This means that r ≡ Ax − b = Pb − b = −(I − P)b — so the
residual r is orthogonal to the range of A.
&
%
MS4105
350
'
$
I bundle these ideas together into a Theorem.
Theorem 5.6 Let A be an m × n complex matrix with m ≥ n and
let b ∈ Cm be given. A vector x ∈ Cn minimises krk2 ≡ kb − Axk2 ,
the norm of the residual (solving the least squares problem) if and
only if r ⊥ range(A) or any one of the following equivalent
equations hold:
A∗ r = 0
(5.28)
A∗ Ax = A∗ b
(5.29)
Pb = Ax
(5.30)
where P ∈ Cm×m is the orthogonal projection operator onto the
range of A. The n × n system of equations (5.29) are called the
normal equations and are invertible iff A has full rank. It follows
that the solution x to the least squares problem is unique iff A has
full rank.
&
%
MS4105
'
351
$
Proof:
• The equivalence of (5.28)–(5.30) is easy to check.
• To show that y = Pb is the unique point in the range of A that
minimises kb − Axk2 , suppose that z 6= y is another point in
range A. Since z − y ∈ range A and b − y = (I − P)b I have
(b − y) ⊥ (z − y) so
kb − zk22 = k(b − y) + (y − z)k22 = kb − yk22 + ky − zk22 > kb − yk22
— the result that I need.
• Finally;
– If A∗ A is singular then A∗ Ax = 0 for some non-zero x so
that x∗ A∗ Ax = 0 and so Ax = 0 meaning that A is not full
rank.
– Conversely if A is rank-deficient then Ax = 0 for some
non-zero x — implying that A∗ Ax = 0 and so A∗ A is
singular.
&
%
MS4105
352
'
5.6.3
$
Pseudoinverse
I have just seen that if A is full rank then x, the solution to the
least squares problem
min kAx − bk2
(5.31)
is unique and that the solution is given by the solution
x = (A∗ A)−1 A∗ b to the normal equations (5.29).
Definition 5.2 The matrix (A∗ A)−1 A∗ is called the
pseudoinverse of A, written A+ .
A+ = (A∗ A)−1 A∗ ∈ Cn×m .
(5.32)
This matrix maps vectors b ∈ Cm to vectors x ∈ Cn — so it has
more columns than rows. If n = m and A is invertible then
A+ = A−1 , hence the name.
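As a quick numerical sketch (my own example data; A is assumed to have full
rank), three equivalent formulations agree in Matlab/Octave:

% Sketch: full-rank least squares solved three equivalent ways
A = [1 0; 1 1; 1 2];  b = [1; 2; 2];
x1 = (A'*A) \ (A'*b);   % normal equations (5.29)
x2 = pinv(A) * b;       % pseudoinverse A+ = inv(A'*A)*A'
x3 = A \ b;             % backslash: QR-based least squares
% x1, x2 and x3 should agree to rounding error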
&
%
MS4105
'
353
$
The (full-rank) least squares problem (5.31) can now be
summarised as that of computing one or both of x = A+ b and
y = Pb. The obvious question: how to solve these equations.
&
%
MS4105
354
'
5.6.4
$
Solving the Normal Equations
The obvious way to solve least squares problems is to solve the
normal equations (5.29) directly. This can give rise to numerical
problems as the matrix A∗ A has eigenvalues equal to the squares of
the singular values of A — so the range of eigenvalues will typically
be great, resulting in a large condition number (the ratio of the
largest to the least eigenvalue). It can be shown that a large
condition number makes the process of solving a linear system
inherently unstable — leading to loss of accuracy.
&
%
MS4105
355
'
$
A better way is to use the reduced QR factorisation. I have seen
that a QR factorisation A = Q̂R̂ can be constructed using
Gram-Schmidt orthogonalisation or more often using Householder
triangularisation.
The orthogonal projection operator P can be written
P = A(A∗ A)−1 A∗
  = Q̂R̂ (R̂∗Q̂∗Q̂R̂)−1 R̂∗Q̂∗
  = Q̂R̂ (R̂∗R̂)−1 R̂∗Q̂∗
  = Q̂R̂ R̂−1 R̂−∗ R̂∗Q̂∗
  = Q̂Q̂∗
so I have y = Pb = Q̂Q̂∗ b.
&
%
MS4105
356
'
$
This is a nice result as it only involves the unitary matrix Q̂ which
is numerically stable (as its eigenvalues are complex numbers with
unit modulus; λi = e^{iφ} — check ).
Since y ∈ range A, the system Ax = y has a unique solution. I can
write
Q̂ R̂ x = Q̂ Q̂∗ b
and left-multiplication by Q̂∗ gives
R̂ x = Q̂∗ b.
This last equation for x is an upper triangular system which is
invertible if A has full rank and can be efficiently solved by back
substitution.
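In Matlab/Octave this is a two-line sketch (the second argument 0 asks qr for
the reduced factorisation; A and b as in the previous sketch):

% Sketch: least squares via the reduced QR factorisation
[Qhat, Rhat] = qr(A, 0);     % A = Qhat*Rhat with Qhat m-by-n
x = Rhat \ (Qhat'*b);        % back substitution on Rhat x = Qhat'*b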
&
%
MS4105
357
'
5.7
&
$
Project
%
MS4105
358
'
6
$
The Singular Value Decomposition
The singular value decomposition (SVD) is a matrix factorisation
which is the basis for many algorithms. It also gives useful insights
into different aspects of Numerical Linear Algebra.
For any complex m × n matrix A I will show that its SVD is
A = UΣV ∗
(6.1)
where U and V are unitary m × m and n × n matrices respectively
and Σ is an m × n diagonal matrix. I do not assume that m is
greater than n so that A can be “short and wide” or “tall and
thin”. The diagonal elements of Σ are called the “singular values”
of A and the number of singular values is just min(m, n).
&
%
MS4105
359
'
$
A point that often causes confusion; how can Σ be diagonal if it is
not square?
• Suppose m < n so that A is “short and wide” — say 2 × 5:

  A = [ a a a a a ]  = UΣV∗
      [ a a a a a ]

    = [ u u ] [ σ1  0   0  0  0 ] [ v v v v v ]
      [ u u ] [ 0   σ2  0  0  0 ] [ v v v v v ]
                                  [ v v v v v ]
                                  [ v v v v v ]
                                  [ v v v v v ]

  i.e. Σ is “augmented” with three extra columns of zeroes so
  that the matrix multiplication works.
&
%
MS4105
'
360
$
Strictly speaking the last three rows of V ∗ (the last 3 columns
of V) are not needed as they are multiplied by the zero entries
in Σ when A is formed. In practice these columns are not
always calculated.
Of course, if these redundant columns are required then they
must be such that V is unitary.
I will return to this point when I discuss the uniqueness or
otherwise of the SVD in Chapter 6.2.
&
%
MS4105
361
'
$
• Alternatively suppose m > n so that A is “tall and thin” —
  say 5 × 2:

  A = [ a a ]  = UΣV∗
      [ a a ]
      [ a a ]
      [ a a ]
      [ a a ]

    = [ u u u u u ] [ σ1  0  ] [ v v ]
      [ u u u u u ] [ 0   σ2 ] [ v v ]
      [ u u u u u ] [ 0   0  ]
      [ u u u u u ] [ 0   0  ]
      [ u u u u u ] [ 0   0  ]
&
%
MS4105
'
362
$
In this case Σ is “augmented” with three extra rows of zeroes
— again so that the matrix multiplication works.
Moreover the last 3 columns of U are not needed as they are
multiplied by the zero entries in Σ when A is formed
Again in practice these columns are not always calculated. And
as for V when A is “short and wide” , here if these redundant
columns of U are required then they must be such that U is
unitary. See Chapter 6.2.
In both cases (2 × 5 or 5 × 2) A has two singular values.
In this Chapter I begin by demonstrating that all m × n matrices
have a SVD and then show that the decomposition is unique in a
certain sense.
&
%
MS4105
363
'
6.1
$
Existence of SVD for m × n Matrices
Suppose that A is an arbitrary (possibly complex) m × n matrix. I
know that
kAk = sup_{x≠0} kAxk/kxk = sup_{kxk=1} kAxk .
Note: in this Chapter unless stated otherwise, all norms are
2-norms. To avoid clutter, the 2–subscript on the norms will be
omitted.
&
%
MS4105
'
364
$
I begin with a Lemma that will allow us to prove the main result.
Lemma 6.1 For any (possibly complex) m × n matrix A, there
exist unitary matrices U (m × m ) and V (n × n ) s.t.

U∗AV = [ σ  0 ]                                               (6.2)
       [ 0  B ]

where σ = kAk and B is an (m − 1) × (n − 1) matrix.
Proof:
• I have σ = kAk.
• The function kAxk is a continuous function of x and the unit
ball is closed and bounded.
• So the supremum in the definition of kAk is attained by some
unit vector x◦ i.e. ∃x◦ |kAx◦ k = kAk.
&
%
MS4105
365
'
$
• Let y◦ = Ax◦ /kAk = Ax◦ /σ .
• Then Ax◦ = σy◦ and ky◦ k = 1.
• By Thm. 1.14 (b) I can select {u2 , . . . , um } and {v2 , . . . , vn } s.t.
{y◦ , u2 , . . . , um } and {x◦ , v2 , . . . , vn } form orthonormal
bases for Cm and Cn respectively.
• Define the matrices U and V by:
  U = [ y◦  U1 ] ,    V = [ x◦  V1 ]
  where the vectors {u2 , . . . , um } are the columns of U1 and the
  vectors {v2 , . . . , vn } are the columns of V1 .
• From these definitions it is easy to see that U and V are
unitary — U∗ U = I and V ∗ V = I so U∗ = U−1 and V ∗ = V −1 .
&
%
MS4105
366
'
$
• Then
  U∗AV = [ y◦∗ ] A [ x◦  V1 ] = [ σ  y◦∗AV1 ]
         [ U∗1 ]                [ 0  B      ]
  where B = U∗1 AV1 and the zero element appears as Ax◦ = σy◦
  and y◦ is orthogonal to the columns of U1 .
• I can write
  U∗AV = [ σ  ω∗ ] = A1 , (say); with ω∗ = y◦∗ AV1 ∈ Cn−1 .
         [ 0  B  ]
&
%
MS4105
'
367
$
I will now show that ω = 0, which means that A1 is block diagonal
as required.
• First note that kA1 k = kAk as kUAk = kAk if U is unitary.
  This follows as kUAk = sup_{kxk=1} kUAxk and, for any vector v,
  kUvk^2 = v∗U∗Uv = kvk^2 . So kUAk = kAk.
• Now for the clever part of the proof:
  k A1 [σ; ω] k^2 = k [σ  ω∗; 0  B][σ; ω] k^2 = k [σ^2 + ω∗ω; Bω] k^2 ≥ (σ^2 + ω∗ω)^2 .
&
%
MS4105
'
368
$
• Also (as for any compatible A and x, kAxk ≤ kAk kxk);
  k A1 [σ; ω] k^2 ≤ kA1 k^2 k [σ; ω] k^2 = kA1 k^2 (σ^2 + ω∗ω) = σ^2 (σ^2 + ω∗ω) .
• Combining the two inequalities: σ^2 (σ^2 + ω∗ω) ≥ (σ^2 + ω∗ω)^2
  which forces ω∗ω = 0 and therefore ω = 0 as required.

So U∗AV = [ σ  0 ]  as claimed.
          [ 0  B ]
&
%
MS4105
369
'
$
Now I prove the main result — that any m × n complex matrix has
a SVD.
Theorem 6.2 (Singular Value Decomposition) For any
(possibly complex) m × n matrix A, there exist unitary matrices U
and V s.t.
A = UΣV ∗
(6.3)
where U and V are unitary m × m and n × n matrices respectively
and Σ is an m × n diagonal matrix with min(m, n) diagonal
elements.
Proof: I will prove the main result by induction on m and n.
&
%
MS4105
'
370
$
[Base Step] This is the case where either n or m is equal to 1.
• If n = 1 then A is a column vector and Lemma 6.1 reduces
  to: U∗AV = [ σ1 ; 0 ] = Σ.
• If m = 1 then A is a row vector and again
  U∗AV = [ σ1  0 ] = Σ.
So in either case (either n or m equal to 1) I have (6.3) where
U and V are unitary m × m and n × n matrices respectively
and Σ is a diagonal matrix of “singular values” .
&
%
MS4105
371
'
$
[Inductive Step] By Lemma 6.1 I can write

U∗AV = [ σ1  0 ] .                                            (6.4)
       [ 0   B ]

The inductive hypothesis: assume that any (m − 1) × (n − 1)
matrix has a SVD. So the (m − 1) × (n − 1) matrix B from
(6.4) can be written as:
B = U2 Σ2 V∗2
where Σ2 is diagonal and padded with rows or columns of
zeroes to make it m × n as usual.
RTP that A has a SVD.
&
%
MS4105
372
'
Using (6.4) and substituting for B I have:

A = U [ σ1   0         ] V∗
      [ 0    U2 Σ2 V∗2 ]

  = U [ 1   0  ] [ σ1  0  ] [ 1   0  ]∗ V∗
      [ 0   U2 ] [ 0   Σ2 ] [ 0   V2 ]

  = U′ Σ V′∗ .

The matrices U′ = U [ 1  0 ; 0  U2 ] and V′ = V [ 1  0 ; 0  V2 ] are
products of unitary matrices and are therefore themselves
unitary.
&
%
MS4105
373
'
6.1.1
$
Some Simple Properties of the SVD
Lemma 6.3 The singular values of a complex m × n matrix are
real and non-negative — and may be ranked in decreasing order:
σ1 ≥ σ2 ≥ · · · ≥ σn .
Proof: I have σ1 ≡ σ ≡ kAk. But I now have A = UΣV∗ so
kAxk^2 ≡ x∗VΣ∗ΣV∗x = kΣzk^2 , where z = V∗x. So (letting
Σ2 = diag(σ2 , . . . , σn ))
σ1 ≡ kAk = sup_{kxk=1} kAxk = sup_{kzk=1} kΣzk ≥ sup_{z=(0,y^T)^T, kyk=1} kΣ(0, y^T)^T k
         = sup_{kyk=1} kΣ2 yk = kΣ2 k = σ2 ,
as the norm of a diagonal matrix is its biggest diagonal entry in
magnitude (see Exercise 4.4) and so σ1 ≥ σ2 . By induction the
result follows.
&
%
MS4105
374
'
$
The following Lemma ties up the relationship between three
important quantities: kAk2 , σ1 (the largest singular value ) and λ1
(the largest eigenvalue of A∗ A).
Lemma 6.4 For any m × n complex matrix A;
kAk2 ≡ σ1 = √λ1 .                                             (6.5)
where λ1 is the largest eigenvalue of A∗A.
In particular, the 2-norm of a matrix is its largest singular value.
Proof: Let q1 , . . . , qn be the (orthonormal) eigenvectors of A∗ A
with corresponding (real and non-negative — why?)
eigenvalues λ1 , . . . , λn .
&
%
MS4105
375
'
$
Then (taking x to be an arbitrary unit vector in the 2-norm on Cn )
kAxk2^2 = (Ax)∗(Ax) = x∗A∗Ax
        = Σ_{i,j=1}^{n} x̄i xj q∗i A∗A qj
        = Σ_{i=1}^{n} x̄i xi λi = Σ_{i=1}^{n} λi |xi |^2
        ≤ λ1 Σ_{i=1}^{n} |xi |^2 = λ1 .
But the inequality is satisfied with equality if x = q1 so for any
m × n complex matrix A;
kAk2 ≡ σ1 = √λ1 .                                             (6.6)
where λ1 is the largest eigenvalue of A∗A.
&
%
MS4105
376
'
$
I have proved that every (real or complex) m × n matrix has a
SVD. I don’t yet know how to calculate the SVD but I can show an
example.
Example 6.1 Take

A = [ 1  2  3  9  11
      2  3  4  8  12 ] .

Then the matlab command [u,s,v]=svd(a) gives
&
%
MS4105
'
377
$


U = [ −0.6904  −0.7235
      −0.7235   0.6904 ]

Σ = [ 21.2308   0       0  0  0
       0        1.5013  0  0  0 ]

V = [ −0.1007   0.4378  −0.2772  −0.4170  −0.7399
      −0.1673   0.4157  −0.3603   0.8162  −0.0564
      −0.2339   0.3936   0.8700   0.1265  −0.1324
      −0.5653  −0.6584   0.0473   0.2094  −0.4483
      −0.7666   0.2172  −0.1852  −0.3163   0.4804 ]
&
%
MS4105
378
'
$
It is easy to check (in matlab!) that UΣV ∗ = A to fourteen
decimal places — which is pretty good.
It is also easy to check that, as expected, the matrices U and V are
unitary (orthogonal as A is real).
Notice also that Σ is padded on the right with three columns of
zeroes as expected — it has 2 = min(2, 5) diagonal elements.
Of course, I can now write down the SVD of A∗ = VΣ∗U∗ . The
matrix

Σ∗ = [ 21.2308   0
        0        1.5013
        0        0
        0        0
        0        0      ]

— again with 2 diagonal elements.
&
%
MS4105
379
'
$
When a matrix A is square, say 2 × 2, things are slightly simpler.
Example 6.2 Take

A = [ 1  2
      2  3 ] .

Then the matlab command [u,s,v]=svd(a) gives

U = [ −0.5257  −0.8507
      −0.8507   0.5257 ]

Σ = [ 4.2361   0
      0        0.2361 ]

V = [ −0.5257   0.8507
      −0.8507  −0.5257 ]
&
%
MS4105
'
380
$
• Again U and V are unitary — in this case this is true by
inspection — this time U = V (apart from the sign of the
second column), I will learn why shortly. Now Σ is a 2 × 2
diagonal matrix and the singular values are 4.2361 and 0.2361.
It is easy to calculate the eigenvalues of A, by hand or with
matlab and I find that λ = 2 ± √5 ≈ 4.2361, −0.2361.
• So the singular values of any matrix A are the absolute values
of the eigenvalues of A? Not quite. For one thing, non-square
matrices do not have eigenvalues!
• When, as in the present example, A∗ = A, A has real
eigenvalues and the matrix A∗ A = A2 has eigenvalues that are
the square of those of A which “explains” why the singular
values for the present example are the absolute values of the
eigenvalues of A. This is not true in general.
&
%
MS4105
'
381
$
The following Theorem should make things clearer.
Theorem 6.5 For any m × n complex matrix A, the singular
values are given by
σi = √λi , where λi are the eigenvalues of A∗A.                (6.7)
Proof: I have A = UΣV ∗ for any m × n matrix A. Then
A∗ A = VΣ∗ ΣV ∗ = VΣT ΣV ∗ as Σ is real. So A∗ AV = VΣT Σ which
means that the non-zero eigenvalues of A∗ A are the σ2i , the squares
of the singular values.
Note: the n × n matrix ΣT Σ has the squares of the singular values
on its main diagonal. If m < n (e.g. Example 6.1) then ΣT Σ has
σ21 , . . . , σ2m on its main diagonal with n − m zeros on the main
diagonal. If m > n then ΣT Σ has σ21 , . . . , σ2n on its main diagonal
(e.g. the final comments in Example 6.1).
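This is easy to check numerically — a small Matlab/Octave sketch (reusing the
matrix of Example 6.1; the variable names are mine):

% Sketch: singular values vs. eigenvalues of A'*A
A = [1 2 3 9 11; 2 3 4 8 12];
s = svd(A);                           % singular values, largest first
lam = sort(eig(A'*A), 'descend');
disp([s, sqrt(abs(lam(1:2)))])        % the two columns should agree;
                                      % the remaining eigenvalues are ~0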
&
%
MS4105
382
'
6.1.2
$
Exercises
1. Find the SVD of the following matrices by any method you
wish. Hint: first find the singular values by finding the
eigenvalues of A∗A then work out U and V as either identity
matrices — possibly with columns swapped — or generic real
unitary matrices.

   [ 3  0 ]   [ 2  0 ]   [ 0  2 ]   [ 1  1 ]   [ 1  1 ]
   [ 0 −2 ]   [ 0  3 ]   [ 0  0 ]   [ 0  0 ]   [ 1  1 ]
                         [ 0  0 ]
%
MS4105
'
383
$
2. Two m × n complex matrices A,B are said to be “unitarily
equivalent” if A = QBQ∗ for some unitary matrix Q. Is the
following statement true or false: “A and B are unitarily
equivalent if and only if they have the same singular values ”?
Hint:
• Write A and B as different singular value decompositions —
assume they are unitarily equivalent, what does this imply
for their singular values ?
• Now assume that they have the same singular values, does
it follow that they are unitarily equivalent?
&
%
MS4105
'
384
$
3. • Every complex m × n matrix A has a SVD A = UΣV ∗ .
• Show that if A is real then A has an SVD with U and V
real.
• Hint: consider AA∗ and A∗ A; show that both matrices are
real and symmetric and so have real eigenvalues.
• Explain why their eigenvectors may be taken to be real.
(Hint: if ui is an eigenvector of AA∗ , what can you say
about ūi , its complex conjugate?)
• (Note the SVD is not unique although the singular values
are, see next Section.)
&
%
MS4105
385
'
$
4. Show that multiplying each column of U by a different complex
number of unit modulus:
U(:, 1) = e^{iφ1} U(:, 1), U(:, 2) = e^{iφ2} U(:, 2), . . . and multiplying
the columns of V by the same phases:
V(:, 1) = e^{iφ1} V(:, 1), V(:, 2) = e^{iφ2} V(:, 2), . . . leaves the product
UΣV∗ unchanged. Hint: write Ũ = U diag(e^{iφ1} , . . . , e^{iφm} ) and
similarly for Ṽ. Show that ŨΣṼ∗ = UΣV∗ = A. (The phases
cancel.)
&
%
MS4105
'
386
$
5. (Matlab/Octave exercise) Re-visit Example 6.1, then add the
following code:
>> ph = rand(5,1);
>> for j = 1:5
>>     v(:,j) = v(:,j)*exp(i*ph(j));
>> end
>> for j = 1:2
>>     u(:,j) = u(:,j)*exp(i*ph(j));
>> end
>> u*s*v' - a
Check that the last line returns a 2 × 5 matrix that is very
close to zero. Explain.
&
%
MS4105
387
'
6.2
$
Uniqueness of SVD
I saw in the Exercises that the SVD is not in fact unique; I could
multiply the columns of U and V by the same phases (complex
number of unit modulus) leaving UΣV ∗ unchanged.
I will prove two Theorems: the first confirms that the diagonal
matrix Σ is unique and the second clarifies exactly how much room
for manoeuvre there is in choosing U and V.
Theorem 6.6 Given an m × n matrix A, the matrix of singular
values Σ is unique.
Proof: Take m ≥ n (the proof for the other case is similar).
Let A = UΣV ∗ and also A = LΩM∗ be two SVD’s for A. RTP that
Σ = Ω.
&
%
MS4105
'
388
$
The product AA∗ = UΣV ∗ VΣ∗ U∗ = UΣΣ∗ U∗ and also
AA∗ = LΩΩ∗ L∗ . As I saw in the discussion on Slide 360 when
m ≥ n both Σ and Ω are also m × n with the last m − n rows
consisting of zeroes.
Then the matrices Σ2 ≡ ΣΣ∗ and Ω2 = ΩΩ∗ both take the form of
an m × m diagonal matrix consisting of an n × n diagonal matrix
in the top left corner with zeroes elsewhere.
&
%
MS4105
389
'
$
So equating the two expressions for AA∗ ;
Σ2 = U∗ LΩ2 (U∗ L)∗
= OΩ2 O∗
where O = U∗ L, an m × m unitary matrix.
The eigenvalues of Σ2 are the roots of the characteristic polynomial
p(λ) = det(λI − Σ2 )
= det(λI − OΩ2 O∗ )
= det(O(λI − Ω2 )O∗ )
= det(O) det(O∗ ) det(λI − Ω2) = det(λI − Ω2)
where the final equality follows as O is unitary. So Σ2 and Ω2 have
the same eigenvalues. But as they are both diagonal by definition I
can conclude (if the same ordering of eigenvalues is used for both)
that they are equal and that therefore so are Σ and Ω.
&
%
MS4105
390
'
6.2.1
$
Uniqueness of U and V
Now I need to clarify whether U and V are unique. They aren’t.
The following Theorem clarifies to what extent U and V are
arbitrary:
Theorem 6.7 If an m × n matrix A has two different SVD’s
A = UΣV ∗ and A = LΣM∗ then
U∗ L = diag(Q1 , Q2 , . . . , Qk , R)
V ∗ M = diag(Q1 , Q2 , . . . , Qk , S)
where Q1 , Q2 , . . . , Qk are unitary matrices whose sizes are given by
the multiplicities of the corresponding distinct non-zero singular
values — and R, S are arbitrary unitary matrices whose size equals
the number of zero singular values. More precisely, if
q = min(m, n) and qi = dim Qi then Σ_{i=1}^{k} qi = r = rank(A) ≤ q.
&
%
MS4105
391
'
$
Before I get bogged down in detail! When, as is usually the case, all
the singular values are different the Theorem simply says that any
alternative to the matrix U, (L, say) satisfies L = UQ where Q is a
diagonal matrix of 1 × 1 unitary matrices. A 1 × 1 unitary matrix
is just a complex number z with unit modulus, |z| = 1 or z = eiφ .
(This is what you were asked to check in Sec. 6.1.2, Exercise 4.)
Just in case you didn’t do the Exercise, let’s translate the Theorem
into simple language in the case when all the singular values are
different.
If an m × n matrix A has two different SVD’s A = UΣV ∗ and
A = LΣM∗ then I must have L = UQ and M = VP where R is an
arbitrary (for m > n) (m − n) × (m − n) unitary matrix:
Q = diag(eiφ1 , eiφ2 , . . . , eiφn , R)
P = diag(eiφ1 , eiφ2 , . . . , eiφn )
&
%
MS4105
392
'
$
Substituting for L and M, LΣM∗ = UQΣP∗ V ∗ so, when all the
singular values are different, it is easy to see that QΣP∗ = Σ as the
phases cancel.
Example 6.3 Let (using Matlab/Octave ’s [u,s,v]=svd(a))

A = [ 1  5            U = [ −0.4550   0.7914   0.4082
      2  6                  −0.5697   0.0936  −0.8165
      3  7 ] ,              −0.6844  −0.6041   0.4082 ]

and

Σ = [ 11.1005   0             V = [ −0.3286  −0.9445
        0       0.8827              −0.9445   0.3286 ]
        0       0      ] ,
&
%
MS4105
'
393
$
The Matlab/Octave command q=diag(exp(i*rand(3,1)*2*pi))
generates a 3 × 3 diagonal matrix Q whose diagonal entries are
random complex numbers of unit modulus:

Q = [ −0.4145 − 0.9100i    0                    0
       0                  −0.9835 + 0.1807i     0
       0                   0                    0.9307 + 0.3659i ]

I have

P = [ −0.4145 − 0.9100i    0
       0                  −0.9835 + 0.1807i ]

and it is easy to check that QΣP∗ = Σ so the matrix A satisfies
A = UΣV∗ = LΣM∗ where L = UQ and M = VP. (In this example
R is the 1 × 1 “unitary matrix” 0.9307 + 0.3659i.)
&
%
MS4105
'
394
$
Construct another example using Matlab/Octave to illustrate the
m < n case.
The proof of Thm. 6.7 may be found in Appendix G.
&
%
MS4105
395
'
6.2.2
$
Exercises
1. Using matlab/octave, construct an example to illustrate the
result of Thm. 6.7.
&
%
MS4105
396
'
6.3
$
Naive method for computing SVD
I still don’t know how to calculate the SVD — the following is a
simple method which might work, let’s see.
It is easy to check that if A = UΣV ∗ then A∗ A = VΣ∗ ΣV ∗ and
AA∗ = UΣΣ∗ U∗ . Both A∗ A and AA∗ are obviously square and
hermitian and so have real eigenvalues and eigenvectors given by:
A∗ AV = VΣ∗ Σ
AA∗ U = UΣΣ∗ .
In other words A∗A has eigenvalues σ1^2 , . . . , σp^2 (p = min(m, n))
and eigenvectors given by the columns of V.
&
%
MS4105
'
397
$
Similarly AA∗ has the same eigenvalues and eigenvectors given by
the columns of U. So one way to compute the SVD of an
m × n complex matrix A is to form the two matrices A∗ A and AA∗
and find their eigenvalues and eigenvectors using the
Matlab/Octave eig command.
Let’s try it. (You can download a simple Matlab/Octave script to
implement the idea from
http://jkcray.maths.ul.ie/ms4105/verynaivesvd.m.) The
listing is in App. D.
The final line calculates the norm of the difference between UΣV ∗
and the initial (randomly generated) matrix A. This norm should
be close to the matlab built-in constant eps, about 10−16 .
&
%
MS4105
'
398
$
>> verynaivesvd
ans =
1.674023336836066
>> verynaivesvd
ans =
1.091351067518971e-15
>> verynaivesvd
ans =
1.125583683053479
>> verynaivesvd
ans =
1.351646685125938e-15
&
%
MS4105
'
399
$
Annoyingly, this Matlab/Octave script sometimes works but
sometimes doesn’t. Why? Remember that the columns of U and V
can be multiplied by arbitrary phases (which are the same for
corresponding columns of U and V). The problem is that this
method has no way to guarantee that the phases are the same as I
am separately finding the eigenvectors of A∗A and AA∗ .
Sometimes (by luck) the script will generate U and V where the
phases are consistent. Usually it will not.
I will see a more successful approach in Sec. 6.5— in the meantime
I will just use the Matlab/Octave code:
>> [u,s,v]=svd(a)
when I need the SVD of a matrix.
&
%
MS4105
400
'
6.4
$
Significance of SVD
In this section I see how the SVD relates to other matrix properties.
6.4.1
Changing Bases
One way of interpreting the SVD is to say that every matrix is
diagonal — if I use the right bases for the domain and range spaces
for the mapping x → Ax. Any vector b ∈ Cm can be expanded in
the space of the columns of U and any x ∈ Cn can be expanded in
the basis of the columns of V.
&
%
MS4105
401
'
$
The coordinate vectors for these expansions are
b′ = U∗b ,    x′ = V∗x .
Using A = UΣV∗ , the equation b = Ax can be written in terms of
these coefficient vectors (b′ and x′ ):
b = Ax
U∗b = U∗Ax = U∗UΣV∗x
b′ = Σx′ .
So whenever b = Ax, I have b′ = Σx′ . A reduces to the diagonal
matrix Σ when the range is expressed in the basis of columns of U
and the domain is expressed in the basis of the columns of V.
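A two-line numerical sketch of this change of basis (reusing the matrix from
Example 6.1; variable names are mine):

% Sketch: in the bases given by U and V, A acts diagonally
A = [1 2 3 9 11; 2 3 4 8 12];  [U, S, V] = svd(A);
x = randn(5, 1);  b = A*x;
norm(U'*b - S*(V'*x))          % should be ~1e-15: b' = Sigma * x'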
&
%
MS4105
402
'
6.4.2
$
SVD vs. Eigenvalue Decomposition
Of course the idea of diagonalisation is fundamental to the study of
eigenvalues — Ch. 8. I will see there that a non-defective square
matrix A can be expressed as a diagonal matrix of eigenvalues Λ if
the domain and range are represented in a basis of eigenvectors.
If an n × n matrix X has as its columns the linearly
independent eigenvectors of an n × n complex matrix A then the
eigenvalue decomposition of A is
A = XΛX−1
(6.8)
where Λ is an n × n diagonal matrix whose entries are the
eigenvalues of the matrix A. (I will see in Ch. 8 that such a
factorisation is not always possible.)
&
%
MS4105
'
403
$
So if I have Ax = b, b ∈ Cn then if b 0 = X−1 b and x 0 = X−1 x then
b 0 = Λx 0 . See Ch. 8 for a full discussion.
• One important difference between the SVD and the eigenvalue
decomposition is that the SVD uses two different bases (the
columns of U and V) while the eigenvalue decomposition uses
just one; the eigenvectors.
• Also the SVD uses orthonormal bases while the eigenvalue
decomposition uses a basis that is not in general orthonormal.
• Finally; not all square matrices (only non-defective ones, see
Thm. 8.5) have an eigenvalue decomposition but all matrices
(not necessarily square) have a SVD.
&
%
MS4105
404
'
6.4.3
$
Matrix Properties via the SVD
The importance of the SVD becomes clear when I look at its
relationship with the rest of Matrix Algebra. In the following let A
be a complex m × n matrix, let p = min(m, n) (the number of
singular values ) and let r ≤ p be the number of non-zero singular
values.
First a simple result — useful enough to deserve to be stated as a
Lemma. (It is the second half of the result on Slide 167 that in the
matrix-matrix product B = AC, each column of B is a linear
combination of the columns of A.)
Lemma 6.8 If A and C are compatible matrices then each row of
B = AC is a linear combination of the rows of C.
&
%
MS4105
405
'
$
Proof: Write B = AC in index notation:
bik = Σ_j aij cjk .
Fixing i corresponds to choosing the ith row of B, b∗i . So
b∗i = Σ_j aij c∗j .
Or check that the result follows from the result on Slide 167 by
taking transposes.
&
%
MS4105
'
406
$
Now an important result that allows us to define the rank of a
matrix in a simple way.
Theorem 6.9 The row and column ranks of any m × n complex
matrix A are equal to r, the number of non-zero singular values.
Proof: Let m ≥ n for the sake of definiteness — a tall thin matrix.
(If m < n then consider A∗ which has more rows than columns.
The proof below shows that the row and column ranks of A∗ are
equal. But the row rank of A∗ is the column rank of A and the
column rank of A∗ is the row rank of A.)
I have that r of the singular values are non-zero. RTP that the row
and column ranks are both equal to r.
&
%
MS4105
407
'
$
I can write the m × n diagonal matrix Σ as

Σ = [ Σ̂  0 ] ,
    [ 0  0 ]

where Σ̂ is the r × r diagonal matrix of non-zero singular values.
Now

ΣV∗ = [ σ1 v∗1 ; σ2 v∗2 ; . . . ; σr v∗r ; 0 ]
&
%
MS4105
408
'
$
and

UΣV∗ = [ u1  u2  . . .  um ] [ σ1 v∗1 ; σ2 v∗2 ; . . . ; σr v∗r ; 0 ]
     = [ u1  u2  . . .  ur  0 ] [ σ1 v∗1 ; σ2 v∗2 ; . . . ; σr v∗r ; 0 ]     (6.9)
&
%
MS4105
'
409
$
So every column of A = UΣV ∗ is a linear combination of the r
linearly independent vectors u1 , . . . , ur and therefore the column
rank of A is r.
But the last equation 6.9 for UΣV ∗ also tells us (by Lemma 6.8)
that every row of A = UΣV ∗ is a linear combination of the r
linearly independent row vectors v∗1 , . . . , v∗r and so the row rank of
A is r.
Theorem 6.10 The range of A is the space spanned by u1 , . . . , ur
and the nullspace of A is the space spanned by vr+1 , . . . , vn .
Proof: Using 6.9 for A = UΣV ∗ ,
&
%
MS4105
410
'
Ax = [ u1  u2  . . .  ur  0 ] [ σ1 v∗1 x ; σ2 v∗2 x ; . . . ; σr v∗r x ; 0 ]
which is a linear combination of u1 , . . . , ur .
Also if a vector z ∈ Cn is a linear combination of vr+1 , . . . , vn then,
again using 6.9 for A = UΣV ∗ and the fact that the vectors
v1 , . . . , vn are orthonormal I have Az = 0.
&
%
MS4105
'
411
$
Theorem 6.11 kAk2 = σ1 , the largest singular value of A, and
kAkF = √(σ1^2 + σ2^2 + · · · + σr^2 ) .
Proof: I already have the first result by (6.5). You should
check that the Frobenius norm satisfies
kAkF^2 = trace(A∗A) = trace(AA∗). It follows that the Frobenius
norm is invariant under multiplication by unitary matrices (check )
so kAkF = kΣkF .
Theorem 6.12 The nonzero singular values of A are the square
roots of the non-zero eigenvalues of AA∗ or A∗ A (the two matrices
have the same eigenvalues ). (I have established this result already
— but re-stated and proved here for clarity.)
Proof: Calculate A∗ A = V(Σ∗ Σ)V ∗ , so A∗ A is unitarily similar to
Σ∗ Σ and so has the same eigenvalues by Thm. 8.2. The
eigenvalues of the diagonal matrix Σ∗Σ are σ1^2 , σ2^2 , . . . , σp^2 together
with n − p additional zero eigenvalues if n > p.
&
%
MS4105
'
412
$
Theorem 6.13 If A is hermitian (A∗ = A) then the singular
values of A are the absolute values of the eigenvalues of A.
Proof: Remember that a hermitian matrix has a full set of
orthonormal eigenvectors and real eigenvalues (Exercises 4.2.6).
But then A = QΛQ∗ so A∗A = QΛ^2 Q∗ and the squares of the
(real) eigenvalues are equal to the squares of the singular values —
i.e. λi^2 = σi^2 and so, as the singular values are non-negative, I have
σi = |λi | .
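A quick sketch to confirm this for a random hermitian matrix (names mine):

% Sketch: for hermitian A the singular values are |eigenvalues|
B = randn(4) + 1i*randn(4);
A = (B + B')/2;                             % A is exactly hermitian
norm(sort(svd(A)) - sort(abs(eig(A))))      % should be ~1e-15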
&
%
MS4105
413
'
$
Theorem 6.14 For any square matrix A ∈ Cn×n , the modulus of
the determinant of A satisfies | det A| = Π_{i=1}^{n} σi .
Proof: The determinant of a product of square matrices is the
product of the determinants of the matrices. Also the determinant
of a unitary matrix has modulus equal to 1 as U∗U = I and
det(U∗) is the complex conjugate of det(U) (as a determinant is a
sum of products of entries in the matrix and det A^T = det A).
Therefore
| det A| = | det UΣV∗ | = | det U| | det Σ| | det V∗ | = | det Σ| = Π_{i=1}^{n} σi .
&
%
MS4105
414
'
6.4.4
$
Low-Rank Approximations
One way to understand and apply the SVD is to notice that the
SVD of a matrix can be re-interpreted as an expansion of A as a
sum of rank-1 matrices. The following result is surprising at first
but easily checked. It is reminiscent of (but completely unrelated
to) the Taylor Series expansion for smooth functions on R. And it
is crucial in the succeeding discussion.
Theorem 6.15 Any m × n matrix A can be written as a sum of
rank-1 matrices:
A = Σ_{j=1}^{r} σj uj v∗j                                      (6.10)
Proof: Just write Σ as a sum of r matrices Σj where each
Σj = diag(0, 0, . . . , σj , 0, . . . , 0). Then (6.10) follows from the
SVD.
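A short sketch of this expansion (any matrix will do; names are mine):

% Sketch: A as a sum of r rank-1 matrices sigma_j * u_j * v_j'
A = magic(4);  [U, S, V] = svd(A);  r = rank(A);    % magic(4) has rank 3
Asum = zeros(size(A));
for j = 1:r
    Asum = Asum + S(j,j)*U(:,j)*V(:,j)';
end
norm(A - Asum)                                       % should be ~1e-14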
&
%
MS4105
415
'
$
There are many ways to write a matrix as a sum of rank one
matrices (for example an expansion into matrices all zero except for
one of the rows of A). The rank-1 matrices σj uj v∗j in the expansion
in (6.10) have the property that a rank-k partial sum Ak of the
σj uj v∗j is the closest rank-k matrix to A — Ak is the “best
low-rank approximation” to A.
First I find an expression for kA − Ak k2 .
Theorem 6.16 For any k such that 1 ≤ k ≤ r define
Ak = Σ_{j=1}^{k} σj uj v∗j .
If k = p ≡ min(m, n), define σk+1 = 0. Then
kA − Ak k2 = σk+1 .
&
%
MS4105
416
'
Proof: First note that A − Ak = Σ_{j=k+1}^{r} σj uj v∗j so

kA − Ak k2^2 = k Σ_{j=k+1}^{r} σj uj v∗j k2^2
  = largest eigenvalue of ( Σ_{j=k+1}^{r} σj uj v∗j )∗ ( Σ_{i=k+1}^{r} σi ui v∗i )
  = largest eigenvalue of ( Σ_{i,j=k+1}^{r} σi σj vj u∗j ui v∗i )
  = largest eigenvalue of ( Σ_{i=k+1}^{r} σi^2 vi v∗i )  =  σk+1^2 ,

using (6.5), orthonormality of the ui and the fact that
Σ_{i=k+1}^{r} σi^2 vi v∗i has eigenvectors vi and corresponding
eigenvalues σi^2 , i = k + 1, . . . , r — the largest of which is σk+1^2 .
&
%
MS4105
417
'
$
I now prove the main result:
Theorem 6.17 With the definitions in the statement of
Thm. 6.16, a rank-k partial sum of the σj uj v∗j is the closest rank-k
matrix to A (inf means “greatest lower bound” or infimum):
kA − Ak k2 = inf_{B ∈ Cm×n , rank B ≤ k} kA − Bk2 = σk+1 .
Proof: Suppose that there is a matrix B with rank B ≤ k “closer
to A than Ak ”, i.e. kA − Bk2 < kA − Ak k2 = σk+1 (the last
equality by Thm. 6.16). As B has rank less than or equal to k, by
the second part of Thm. 6.10 there is a subspace W of dimension at
least (n − k) — the nullspace of B — W ⊆ Cn such that
w ∈ W ⇒ Bw = 0.
So for any w ∈ W, I have Aw = (A − B)w and
kAwk2 = k(A − B)wk2 ≤ kA − Bk2 kwk2 < σk+1 kwk2 .
%
MS4105
418
'
$
• So W is a subspace of Cn of dimension at least (n − k) — and
  kAwk2 < σk+1 kwk2 for any w ∈ W.
• Let W′ = span{v1 , . . . , vk+1 }, the first k + 1 columns of V where
  A = UΣV∗ . Let w ∈ W′ . Then w = Σ_{i=1}^{k+1} αi vi with
  kwk2^2 = Σ_{i=1}^{k+1} |αi |^2 . So
  kAwk2^2 = k Σ_{i=1}^{k+1} αi σi ui k^2 = Σ_{i=1}^{k+1} |αi |^2 σi^2
          ≥ σk+1^2 Σ_{i=1}^{k+1} |αi |^2 = σk+1^2 kwk2^2 .
• Since the sum of the dimensions of W and W′ is greater than n,
  there must be a non-zero vector w that is contained in both.
  Why? check . Contradiction.
&
%
MS4105
419
'
6.4.5
$
Application of Low-Rank Approximations
A nice application of low rank approximations to a matrix is image
compression. Suppose that I have an m × n matrix of numbers (say
each in [0, 1]) representing the grayscale value of a pixel where 0 is
white and 1 is black. Then a low-rank approximation to A is a neat
way of generating a compressed representation of the image.
(There are more efficient methods which are now used in practice.)
&
%
MS4105
'
420
$
I can work as follows:
• Find the SVD of A.
• Find the effective rank of A by finding the number of singular
√
values that are greater than some cutoff, say ε where ε is
matlab’s eps or machine epsilon.
• Calculate a succession of low rank approximations to A.
The following slides illustrate the idea. The original portrait of
Abraham Lincoln on Slide 422 is a greyscale image consisting of
302 rows and 244 columns of dots. Each dot is represented by an
integer between 0 and 255.
This loads into Matlab/Octave as a 302 × 244 matrix A (say) of
8-bit integers (much less storage than representing the dots as
double-precision reals). So the matrix A takes 73, 688 bytes of
storage.
&
%
MS4105
'
421
$
You should check that these need 302r + r + 244r = 547r bytes
(where r is the rank used) .
I will use low-rank approximations to store and display the
portrait, say ranks r = 10, r = 20 and r = 30. In other words, 5, 470
bytes, 10, 940 bytes and 16, 410 bytes of storage respectively —
much less than 73, 688 bytes required for the original portrait.
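The reconstruction itself is only a few lines of Matlab/Octave — a sketch (the
file name lincoln.png is a stand-in of my own; any greyscale image will do):

% Sketch: rank-r reconstruction of a greyscale image
A = double(imread('lincoln.png'));      % hypothetical file name
[U, S, V] = svd(A);
r = 30;
Ar = U(:,1:r)*S(1:r,1:r)*V(:,1:r)';     % best rank-r approximation
imagesc(Ar); colormap(gray); axis image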
The low rank approximations are shown on the succeeding Slides: a
rank-10 reconstruction on Slide 423, a rank-20 reconstruction on
Slide 424 and a rank-30 reconstruction on Slide 425. Even the rank
10 approximation is unmistakeable while the rank-30
approximation is only slightly “fuzzy”.
In the Exercise you will be asked to use the method on a more
colourful portrait!
&
%
MS4105
'
422
$
Figure 12: Original greyscale picture of Abraham Lincoln
&
%
MS4105
423
'
$
Figure 13: Rank-10 reconstruction of Lincoln portrait
&
%
MS4105
424
'
$
Figure 14: Rank-20 reconstruction of Lincoln portrait
&
%
MS4105
425
'
$
Figure 15: Rank-30 reconstruction of Lincoln portrait
&
%
MS4105
'
426
$
Now for some colour! It takes a little more work to create a
low-rank approximation to an m × n pixel colour image as they are
read in and stored by Matlab/Octave as three separate matrices,
one for each of the primary colours; red, green and blue.
To illustrate this the next example picture Fig. 17 has 249 rows
and 250 columns and is stored by Matlab/Octave as a
multi-dimensional array of size
>> size(picture)
ans =
   249   250     3
The three “colour planes” each have rank 249 but when I plot the
“red” singular values in Fig. 16 I see that they decrease rapidly so
a low rank approximation will capture most of the detail.
>> semilogy(diag(s_r))
>> title(’Semi-log Y plot of SVD’’s for red layer of painting’)
&
%
MS4105
'
427
$
Figure 16: Semi-log Y plot of SVD’s for red layer of painting
&
%
MS4105
428
'
$
The details will be explained in the tutorial — for now just have a
look!
&
Figure 17: Original Sailing Painting
%
MS4105
429
'
$
Figure 18: Rank-10 reconstruction of Sailing Painting
&
%
MS4105
430
'
$
Figure 19: Rank-40 reconstruction of Sailing Painting
&
%
MS4105
431
'
$
Figure 20: Rank-70 reconstruction of Sailing Painting
&
%
MS4105
'
432
$
Figure 21: Rank-100 reconstruction of Sailing Painting
&
%
MS4105
433
'
6.5
$
Computing the SVD
As mentioned earlier, in principle I can find the singular values
matrix Σ by finding the eigenvalues of A∗ A.
• A straightforward way to find the matrices U and V is:
1. Solve the eigenvalue problem A∗ AV = VΛ — i.e. find the
eigenvalue decomposition A∗ A = VΛV ∗ .
2. Set Σ to be the m × n diagonal square root of Λ.
3. Solve the system UΣ = AV for a unitary matrix U.
• Step 3 is non-trivial — I cannot just solve the matrix equation
for U by setting U = AVΣ−1 as Σ may not be invertible
(cannot be if m ≠ n).
&
%
MS4105
434
'
$
• However I can (when m ≥ n) solve for the first n columns of
U, (Û say) as follows:
Û = AV Σ̂−1
where Σ̂ is the n × n top left square block of Σ. (A Matlab/Octave
sketch of this appears after this list.)
• This simple algorithm is implemented in
http://jkcray.maths.ul.ie/ms4105/usv.m.
• It works for moderate values of m and n (m ≥ n) when A is
not too ill-conditioned. To illustrate its limitations try running
the Matlab/Octave script:
http://jkcray.maths.ul.ie/ms4105/svd_expt.m. You’ll
find that when the ratio of the largest to the least singular
value is bigger than about 1.0e8, the orthogonality property of
u is only approximately preserved.
• (The script also tests the qrsvd.m code to be discussed below.)
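Here is the sketch referred to above — a minimal version of the idea behind
usv.m (this is my own re-implementation, not the posted script; it assumes
m ≥ n and that A is full rank and not too ill-conditioned):

function [Uhat, Shat, V] = naivesvd(A)
% Sketch: reduced SVD via the eigenvalue decomposition of A'*A.
[~, n] = size(A);
[V, L] = eig(A'*A);
[lam, idx] = sort(real(diag(L)), 'descend');   % order eigenvalues, largest first
V = V(:, idx);
Shat = diag(sqrt(max(lam, 0)));                % n-by-n top left block of Sigma
Uhat = (A*V) / Shat;                           % Uhat = A*V*inv(Shat)
end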
&
%
MS4105
435
'
$
• A better solution is to find the QR factorisation for the matrix
AV.
• The QR factorisation, discussed in Section. 5.2 expresses any
m × n matrix A as
A = QR
(6.11)
where Q is unitary m × m and R is upper triangular.
• In fact the matrix R in the QR factorisation for the matrix AV
is diagonal — see Exercise 2 at the end of this Chapter.
• So I can write
AV = QD
where Q is unitary and D is diagonal.
• I cannot simply set U = Q as some of the diagonal elements of
D may be negative or even complex.
&
%
MS4105
'
436
$
• I have AV = QD so A = QDV ∗ and therefore by the SVD,
UΣV ∗ = QDV ∗ . Solving for D, I have D = OΣ where
O = Q∗ U is unitary. But D is diagonal so check that O must
be also. A diagonal unitary matrix must satisfy |Oii |^2 = 1, i.e.
each Oii = e^{iφi} for some real φi . If I divide each diagonal
element of D by eiφi and multiply each row of the matrix Q by
the corresponding eiφi then the matrix D is real and
non-negative and I have our SVD.
• A slight “wrinkle” is that I use the absolute value of the
diagonal matrix R to calculate the singular value matrix. This
greatly improves the accuracy of the factorisation for the subtle
reason that the process of computing Q and R using the QR
algorithm is much more numerically stable than the matrix
division used in usv.m.
&
%
MS4105
'
437
$
• And the matlab code is at
http://jkcray.maths.ul.ie/ms4105/qrsvd.m
• This algorithm is much more numerically stable , even when A
is ill-conditioned.
• UΣV ∗ is a good approximation to A — though not quite as
good as the usv.m algorithm.
• The new QR SVD algorithm also preserves the orthogonality of
U to very high accuracy which is an improvement. (This is due
to the QR factorisation, discussed in Section 5.2 .)
• You can check qrsvd.m using the test script
http://jkcray.maths.ul.ie/ms4105/svd_expt.m.
• Practical algorithms for calculating the SVD will be briefly
discussed in Ch. 10. They are more numerically stable due to
the fact that U and V are calculated simultaneously without
forming A∗A — they are also much faster.
&
%
MS4105
'
438
$
• Once the SVD is known, the rank can be computed by simply
counting the number of non-zero singular values.
• In fact the number of singular values greater than some small
tolerance is usually calculated instead as this gives a more
robust answer in the presence of numerical rounding effects.
&
%
MS4105
439
'
6.5.1
$
Exercises
1. Show that the Frobenius norm is invariant wrt multiplication
by a unitary matrix.
2. Show that if Q and R are the (unitary and upper triangular
respectively) QR decomposition of the matrix AV, where A is
any m × n complex matrix and V is the unitary matrix that
appears in the SVD for A = UΣV ∗ , then R is diagonal. The
following steps may be helpful:
(a) Show that AV = QR and A = UΣV ∗ imply that
Σ = U∗ QR = OR (say) where O is orthogonal.
(b) Show that Σ = OR implies that Σ∗ Σ = R∗ R and so that R∗ R
is diagonal.
(c) Show (using induction) that if R is upper triangular and
R∗ R is diagonal then R must be diagonal.
&
%
MS4105
'
440
$
3. Consider again the matrix A in Example 4.3. Using the SVD,
work out the exact values of σmin and σmax for this matrix.
4. Consider the matrix (See Appendix F.)


−2 11

.
A=
−10 5
&
%
MS4105
'
441
$
(a) Find an SVD (remember that it isn’t unique) for A that is
real and has as few minus signs as possible in U and V.
(b) List the singular values of A.
(c) Using Maple or otherwise sketch the unit ball (in the
Euclidean 2-norm) in R2 and its image under A together
with the singular vectors.
(d) What are the 1-, 2- and ∞-norms of A?
(e) Find A−1 via the SVD.
(f) Find the eigenvalues λ1 , λ2 of A.
&
%
MS4105
'
7
442
$
Solving Systems of Equations
In this Chapter I will focus on Gaussian Elimination — a familiar
topic from Linear Algebra 1. I will, however, revisit the algorithm
from the now-familiar perspective of matrix products and
factorisations. I will begin by reviewing G.E. together with its
computational cost, then amend the algorithm to include “partial
pivoting” and finally I will briefly discuss the question of numerical
stability.
&
%
MS4105
443
'
7.1
$
Gaussian Elimination
You are probably familiar with Gaussian Elimination in terms of
“applying elementary row operations” to a (usually) square matrix
A in order to reduce it to “reduced row echelon form” — effectively
upper triangular form. I will re-state this process in terms of
matrix, rather than row, operations.
7.1.1
LU Factorisation
I will show that Gaussian Elimination transforms a linear system
Ax = b into an upper triangular one by applying successive linear
transformations to A, multiplying A on the left by a simple matrix
at each iteration. This process is reminiscent of the Householder
triangularisation for computing QR factorisations. The difference
is that the successive transformations applied in GE are not
unitary.
&
%
MS4105
444
'
$
I’ll start with an m × m complex matrix A (I could generalise to
non-square matrices but these are rarely of interest when solving
linear systems ). The idea is to transform A into an m × m upper
triangular matrix U by introducing zeroes below the main
diagonal; first in column 1, then in column 2 and so on — just as
in Householder triangularisation.
Gaussian Elimination effects these changes by subtracting multiples
of each row from the rows beneath. I claim that this “elimination”
process is equivalent to multiplying A by a succession of lower
triangular matrices Lk on the left:
Lm−1 Lm−2 . . . L2 L1 A = U,
(7.1)
and writing L−1 = Lm−1 Lm−2 . . . L2 L1 so that
A = LU
where U is upper triangular and L is lower triangular.
&
(7.2)
%
MS4105
445
'
$
For example start with a 4 × 4 matrix A. The algorithm takes
three steps.

[ × × × ×        [ × × × ×        [ × × × ×        [ × × × ×
  × × × ×    L1    0 X X X    L2    0 × × ×    L3    0 × × ×
  × × × ×    →     0 X X X    →     0 0 X X    →     0 0 × ×
  × × × × ]        0 X X X ]        0 0 X X ]        0 0 0 X ]
     A                L1 A             L2 L1 A           L3 L2 L1 A
&
%
MS4105
'
446
$
So I can summarise:
[Gram-Schmidt] A = QR by triangular orthogonalisation
[Householder ] A = QR by orthogonal triangularisation
[Gaussian Elimination ] A = LU by triangular triangularisation
&
%
MS4105
447
'
7.1.2
$
Example
I’ll start with a numerical example.

A = [ 2  1  1  0
      4  3  3  1
      8  7  9  5
      6  7  9  8 ]

[Gaussian Elimination : Step 1]

L1 A = [  1            ] [ 2  1  1  0     [ 2  1  1  0
          −2  1            4  3  3  1   =   0  1  1  1
          −4     1         8  7  9  5       0  3  5  5
          −3        1 ]    6  7  9  8 ]     0  4  6  8 ]
&
%
MS4105
'
448
$
I subtracted 2 times Row 1 from Row 2, four times Row 1 from
Row 3 and three times Row 1 from Row 4.
&
%
MS4105
449
'
[Gaussian Elimination : Step 2]

L2 L1 A = [ 1              ] [ 2  1  1  0     [ 2  1  1  0
               1               0  1  1  1   =   0  1  1  1
              −3  1            0  3  5  5       0  0  2  2
              −4     1 ]       0  4  6  8 ]     0  0  2  4 ]

I subtracted three times Row 2 from Row 3 and four times
Row 2 from Row 4.

[Gaussian Elimination : Step 3]

L3 L2 L1 A = [ 1              ] [ 2  1  1  0     [ 2  1  1  0
                  1               0  1  1  1   =   0  1  1  1
                     1            0  0  2  2       0  0  2  2
                    −1  1 ]       0  0  2  4 ]     0  0  0  2 ]

I subtracted Row 3 from Row 4.
&
%
MS4105
'
450
$
Now to enable us to write A = LU, I need to calculate the product
L = L1 −1 L2 −1 L3 −1 . This turns out to be much easier than
expected due to the following two properties — I will prove them in
Theorem 7.1 below.
(a) The inverse of each Lk is just Lk with the signs of the
subdiagonal elements in column k reversed. For example

L2 −1 = [ 1
             1
             3  1
             4     1 ]

(b) The product of the Lk −1 in increasing order of k is just the
identity matrix with the non-zero sub-diagonal elements of each
of the Lk −1 inserted in the appropriate places.
&
%
MS4105
451
'
So I can write

[ 2  1  1  0     [ 1              [ 2  1  1  0
  4  3  3  1   =   2  1             0  1  1  1
  8  7  9  5       4  3  1          0  0  2  2               (7.3)
  6  7  9  8 ]     3  4  1  1 ]     0  0  0  2 ]

      A                 L                U
&
%
MS4105
452
'
7.1.3
$
General Formulas for Gaussian Elimination
It is easy to write the matrix Lk for an arbitrary matrix A. Let ak
be the kth column of the matrix at the beginning of Step k. Then
the transformation Lk must be chosen so that

ak = ( a1k , . . . , akk , ak+1,k , . . . , amk )T
  →  Lk ak = ( a1k , . . . , akk , 0 , . . . , 0 )T .
&
%
MS4105
453
'
$
To achieve this I need to add −ℓjk times row k to row j, where ℓjk
is the multiplier
ℓjk = ajk /akk ,    k < j ≤ m.
&
%
MS4105
454
'
The matrix Lk must take the form

Lk = [ 1
          .
            1
           −ℓk+1,k   1
              .          .
           −ℓm,k             1 ]
&
%
MS4105
'
455
$
I can now prove the two useful properties of the Lk matrices
mentioned above:
Theorem 7.1 (a) Each Lk can be inverted by negating its
sub-diagonal elements.
(b) The product of the Lk −1 in increasing order of k is just the
identity matrix with the non-zero sub-diagonal elements of each
of the Lk −1 inserted in the appropriate places.
Proof:
(a) Define `k to be the vector of multipliers for the kth column of
Lk , (with zeroes in the first k rows) so that
&
%
MS4105
456
'
$


ℓk = ( 0, . . . , 0, ℓk+1,k , . . . , ℓm,k )T .
Then Lk = I − `k e∗k where ek is the usual vector in Cm with 1
in the kth row and zeroes elsewhere. Obviously e∗k `k = 0 so
(I − `k e∗k )(I + `k e∗k ) = I − `k e∗k `k e∗k = I proving that
Lk −1 = (I + `k e∗k ), proving (a).
&
%
MS4105
457
'
$
(b) Consider the product Lk −1 Lk+1 −1 . I have
Lk −1 Lk+1 −1 = (I + `k e∗k )(I + `k+1 e∗k+1 )
= I + `k e∗k + `k+1 e∗k+1 + `k e∗k `k+1 e∗k+1
= I + `k e∗k + `k+1 e∗k+1
as e∗k `k+1 = 0 and so `k e∗k `k+1 e∗k+1 = 0.
So the matrix L can be written as

1

 `21


−1
−1
−1
L = L1 L2 . . . Lm−1 = 
 `31
 .
 ..

`m1
&

1
`32
..
.
1
..
`m2
...
.
..
.
`m,m−1




 (7.4)




1
%
MS4105
'
458
$
In practice (analogously to QR factorisation ) the matrices Lk are
never formed explicitly. The multipliers `k are computed and
stored directly into L and the transformations Lk are then applied
implicitly.
Algorithm 7.1 Gaussian Elimination without Pivoting
(1)
(2)
(3)
(4)
(5)
(6)
(7)
U = A, L = I
for k = 1 to m − 1
for j = k + 1 to m
ujk
`jk = ukk
uj,k:m = uj,k:m − `jk uk,k:m
end
end
&
%
MS4105
459
'
7.1.4
$
Operation Count
The work is dominated by the vector operation in the inner loop;
uj,k:m = uj,k:m − `jk uk,k:m which executes one scalar-vector
multiplication and one vector subtraction. If l = m − k + 1 is the
length of the row vectors being worked on then the number of flops
is 2l.
To find W, the total number of flops I just write the two nested
for loops as sums:
&
%
MS4105
460
'
$
W=
m−1
X
m
X
2(m − k + 1)
k=1 j=k+1
=
m−1
X
2(m − k)(m − k + 1)
k=1
=
m−1
X
2j(j + 1)
setting j = m − k
j=1
= 2 (m − 1)m(2(m − 1) + 1)/6 + m(m − 1)/2
= 2/3 m3 − 2/3 m
≈ 2/3 m3
&
(7.5)
%
MS4105
'
461
$
Comparing this result with (5.26) for the Householder QR method
for solving a linear system (when m = n, the latter has
W = 4m3 /3) I have found that Gaussian Elimination has half the
computational cost.
&
%
MS4105
462
'
7.1.5
$
Solution of Ax = b by LU factorisation
If A is factored into L and U, a system of equations Ax = b can be
solved by first solving Ly = b for the unknown y (forward
substitution) then Ux = y for the unknown x (back substitution).
The back substitution algorithm can be coded as:
Algorithm 7.2 Back Substitution
(2)
for j = mDOWNTO1
Pm
xj = bj − k=j+1 xk ujk /ujj
(3)
end
(1)
I can easily compute the cost of back substitution.
At the jth iteration, the cost is m − j multiplys and m − j
subtractions plus one division giving a total of 2(m − j) + 1 flops.
Summing this for j = 1 to m gives a total cost W = m2 .
&
%
MS4105
'
463
$
Exercise 7.1 Write pseudo-code for the Forward Substitution
algorithm and check that the result is also W = m2 .
Putting the pieces together, I have that when I solve a linear
system using LU factorisation, the first step (factoring A) requires
≈ 2/3m3 flops (as discussed earlier), the second and third each
require ≈ m2 flops. So the overall cost of the algorithm is
W ≈ 2/3m3 as the m3 term is the leading term for large m.
&
%
MS4105
464
'
7.1.6
$
Instability of Gaussian Elimination without
Pivoting
Gauss Elimination is clearly faster (a factor of two) than an
algorithm based on Householder factorisation. However it is not
numerically stable as presented. In fact the algorithm will fail
completely for some perfectly well-conditioned matrices as it will
try to divide by zero.
Consider

0

A=
1

1
.
1
The matrix has full rank and is well-conditioned but it is obvious
that GE will fail at the first step due to dividing by zero. Clearly I
need to modify the algorithm to prevent this happening.
&
%
MS4105
465
'
$
Even if our basic GE does not fail due to division by zero it can
still have numerical problems. Now consider a small variation on A
above:

10−20
A=
1

1
.
1
The GE process does not fail. I subtract 1020 times row one from
row two and the following factors are calculated (in exact
arithmetic):




1
0
10−20
1



.
L=
, U=
1020 1
0
1 − 1020
&
%
MS4105
466
'
$
The problems start of course when I try to perform these
calculations in floating point arithmetic, with a machine epsilon
(smallest non-zero floating point number) ≈ 10−16 , say. The
number 1 − 1020 will not be represented exactly, it will be rounded
to the nearest float — suppose that this is −1020 . Then the
floating point matrices produced by the algorithm will be




−20
1
0
10
1
˜
˜
.



L=
, U=
1020 1
0
−1020
The change in L and U is small but if I
˜U
˜ I find that
L

−20
10
˜U
˜ =
L
1
now compute the product

1

0
which is very different to the original A.
&
%
MS4105
467
'
$
In fact
˜Uk
˜ = 1.
kA − L
˜ and U
˜ to find a solution to Ax = b with b = (1, 0)∗ I get
If I use L
˜ = (0, 1)∗ while the exact solution is x ≈ (−1, 1)∗ . (Check .)
x
GE has computed the LU decomposition of A in a “stable” way,
˜ and U
˜ the floating point (rounded) factors of A are close to
i.e. L
the exact factors L and U of a matrix close to A (in fact A itself).
However I have seen that the exact arithmetic solution to Ax = b
differs greatly from the floating point solution to Ax = b.
&
%
MS4105
'
468
$
This is due to the unfortunate fact that the LU algorithm for the
solution of Ax = b, though stable, is not “backward stable” in that
the floating point implementation of the algorithm does not have
the nice property that it gives the exact arithmetic solution to
the same problem with data that differs from the exact data with a
relative error of order machine epsilon.
I will show in the next section that (partial) pivoting largely
eliminates (pun) these difficulties.
(A technical description of the concepts of stability and backward
stability can be found in Appendix O.)
&
%
MS4105
469
'
7.1.7
$
Exercises
1. Let A ∈ Cm×m be invertible. Show that A has an LU
factorisation iff for each k such that 1 ≤ k ≤ m, the upper-left
k × k block A1:k,1:k is non-singular. (Hint: the row operations
of GE leave the determinants det A1:k,1:k invariant.) Prove
that this LU decomposition is unique.
2. The GE algorithm, Alg. 7.1 as presented above has two nested
for loops and a third loop implicit in line 5. Rewrite the
algorithm with just one explicit for loop indexed by k. Inside
this loop, U should be updated at each step by a certain
rank-one outer product. (This version of the algorithm may be
more efficient in matlab as matrix operations are implemented
in compiled code while for loops are interpreted.)
3. Write matlab code to implement Algs. 7.1 and 7.2.
&
%
MS4105
470
'
7.2
$
Gaussian Elimination with Pivoting
I saw in the last section that GE in the form presented is unstable.
The problem can be mitigated by re-ordering the rows of the
matrix being operated on, a process called pivoting.
But first, as I will use the term and the operation frequently in this
section; a note on permutations.
&
%
MS4105
471
'
7.2.1
$
A Note on Permutations
Formally, a permutation is just a re-ordering of the numbers
{1, 2, . . . , n} for any positive integer n. For example p = {3, 1, 2} is a
permutation of the set {1, 2, 3}.
In the following I will use permutation matrices Pi to re-order the
rows/columns of the matrix A.
Some details for reference:
• Let p be the required re-ordering of the rows of a matrix A and
I the n × n identity matrix.
• Then claim that P = I(p, :) is the corresponding permutation
matrix — or equivalently that Pij = δpi ,j .
• Check: (PA)ik = Pij Ajk = δpi ,j Ajk = Api ,k — clearly the rows
of A have been permuted using the permutation vector p.
&
%
MS4105
'
472
$
• What if the columns of the n × n identity matrix I are
permuted?
• Let Q = I(:, p) so that Qij = δi,pj .
• Then (AQ)ik = Aij Qjk = Aij δj,pk = Ai,pk — the columns of
A have been permuted using the permutation vector p.
• Finally, what is the relationship between the (row) permutation
matrix P and the (column) matrix Q?
• Answer: (PQ)ik = Pij Qjk = δpi ,j δj,pk = δpi ,pk ≡ δik !
• So Q = P−1 — i.e. Q is the permutation matrix corresponding
to the “inverse permutation” q that “undoes” p so that
p(q) = [1, 2, . . . , n]T and Qp = Pq = [1, 2, . . . , n]T .
&
%
MS4105
473
'
$
• A Matlab example:
>> n = 9;
>> I = eye(n); % the n × n identity matrix
>> A = rand(n, n)
>> p = [1, 7, 2, 5, 4, 6, 9, 3, 8]
>> P = I(p, :) %The perm matrix corr.
to p.
>> P ∗ A − A(p, :) % zero, P permutes the rows of A.
>> Q = I(:, p) %The perm matrix corr.
to q (below).
>> A ∗ Q − A(:, p) % zero
>> P ∗ Q − I % zero
i.e.
Q = P−1
>> [i, q] = sort(p) % q is "inverse permutation" for p
>> Q − I(q, :) % zero
&
%
MS4105
474
'
7.2.2
$
Pivots
At step k of GE, multiples of row k are subtracted from rows
k + 1, . . . , m of the current (partly processed version of A) matrix X
(say) in order to introduce zeroes into the kth column of these
rows. I call xkk the pivot.
However there is no reason why the kth row and column need have
been chosen. In particular if xkk = 0 I cannot use the kth row as
that choice results in a divide-by-zero error.
&
%
MS4105
'
475
$
It is also better from the stability viewpoint to choose the pivot
row to be the one that gives the largest value possible for the pivot
— equivalently the smallest value possible for the multiplier.
It is much easier to keep track of the successive choices of pivot
rows if I swap rows as neccessary to ensure that at the kth step, the
kth row is still chosen as the pivot row.
&
%
MS4105
476
'
7.2.3
$
Partial pivoting
If every element of Xk:m,k:m is to be considered as a possible pivot
at step k then O((m − k)2 ) entries need to be examined and
summing over all m steps I find that O(m3 ) operations are needed.
This would add significantly to the cost of GE. This strategy is
called complete pivoting and is rarely used.
The standard approach, partial pivoting, searches for the largest
entry in the kth column in rows k to m, the last m − k + 1 elements
of the kth column.
So the GE algorithm is amended by inserting a permutation
operator (matrix) between successive left-multiplications by the Lk .
&
%
MS4105
477
'
$
More precisely, after m − 1 steps, A becomes an upper triangular
matrix U with
Lm−1 Pm−1 . . . L2 P2 L1 P1 A = U.
(7.6)
where the matrices Pk are permutation matrices formed by
swapping the kth row of the identity matrix with a lower row.
Note that although Pk −1 ≡ Pk for all k (because Pk2 = I) I will
usually in the following write Pk −1 where appropriate rather than
Pk to make the argument clearer.
&
%
MS4105
478
'
7.2.4
$
Example
Let’s re-do the numerical example above with partial pivoting.

2

4

A=
8

6
With p.p., interchange rows 1 &


1
2 1


 4 3
 1




 8 7
1


1 6 7
&
1
1
3
3
7
9
7
9

0

1


5

8
3 by left-multiplying
 
1 0
8 7 9
 

3 1
 4 3 3
=
2 1 1
9 5
 
9 8
6 7 9
by P1 :

5

1


0

8
%
MS4105
479
'
The first elimination step:


1
8


 4
− 1 1

 2

 1
−

1 
 2
 4
− 34
1 6
$
left-multiply by L1 ;
 
7 9 5
8 7
 
1

3 3 1
  −2
=
 −3
1 1 0
4
 
7
7 9 8
4
Now swap rows 2 & 4 by left-multiplying by P2 :
 


8 7
1
8 7
9
5
 


3
3
7
1



1  − 2 − 2 − 2  


4
=



3
5
5
3



1 

  −4 −4 −4  −4
7
9
17
1
−
1
4
4
4
2
&
9
5


− 23
− 45
9
4
− 32 


5
−4
17
4
9
9
4
− 45
− 23
5


17 
4 

5
−4
− 32
%
MS4105
480
'
$
The second elimination step: left-multiply by L2 ;

1






1
3
7
2
7

8





1 

1
7
9
7
4
− 34
− 12
9
4
− 54
− 32


5
8
 
17 



4
=


− 54 
 
− 32
7
9
7
4
9
4
− 27
− 67
Now swap rows 3 & 4 by left-multiplying by P3 :
 


8 7
1
8 7
9
5
 


9
17 
7
7

 1




4
4
4 
4
=



2
4 



−
1
7
7 



− 67 − 27
1
&
9
9
4
− 67
− 27
5


17 
4 

4 
7 
− 27
5


17 
4 

2
−7
4
7
%
MS4105
481
'
The final elimination step: left-multiply by L3 ;


 
1
8 7
9
5
8 7


 
9
17 
7
7

 1





4
4
4
4
=



6
2



−
−
1
7
7



4
− 27
− 13 1
7
&
$
9
9
4
− 67
5


17 
4 

2
−7
2
3
%
MS4105
482
'
7.2.5
$
PA = LU Factorisation
Have I just completed an LU factorisation of A? No, I have
computed an LU factorisation of PA, where P is a permutation
matrix:



1
2 1 1 0






1

4 3 3 1



 1
8 7 9 5



1
6 7 9 8
P
A

1

3

= 4
1
2
1
4
&
1
− 72
− 73
L
1
1
3

8






1
7
9
7
4
9
4
− 67
U
5


17 
4 

2
−7
2
3
(7.7)
%
MS4105
483
'
$
Compare this with (7.3). Apart from the presence of fractions in
(7.7) and not in (7.3), the important difference is that all the
subdiagonal elements in L are ≤ 1 in magnitude as a result of the
pivoting strategy.
I need to justify the statement that PA = LU and explain how I
computed L and P. The example elimination just performed took
the form
L3 P3 L2 P2 L1 P1 A = U.
These elementary matrix products can be re-arranged:
−1
−1
−1
L3 P3 L2 P2 L1 P1 = L3 P3 L2 P3
P3 P2 L1 P2 P3
P3 P2 P1
= L30 L20 L10 P3 P2 P1
where obviously
L30
= L3 ,
&
L20
= P3 L2 P3
−1
and L1 = P3 P2 L1 P2
−1
P3
−1
.
%
MS4105
484
'
7.2.6
$
Details of Li to Li0 Transformation
I can check that the above results for Lj , Pj and Lj0 with j = 1, 2, 3
can be extended to j in the range 1, . . . , m − 1.
Define (for j = 1, 2, . . . , m − 1)
Πj
=
Pm−1 Pm−2 . . . Pj
(7.8)
Lj0
=
Πj+1 Lj Π−1
j+1 ,
(7.9)
0
with Πm−1 = Pm−1 and Πm = I so that Lm−1
= Lm−1 .
Then RTP that
0
0
Lm−1
Lm−2
. . . L20 L10
&
Π1 = Lm−1 Pm−1 Lm−2 Pm−2 . . . L2 P2 L1 P1 .
(7.10)
%
MS4105
485
'
$
First notice that
0
Lj+1
Lj0
−1
≡ Πj+2 Lj+1 Π−1
j+2 Πj+1 Lj Πj+1
=
−1
−1
Πj+2 Lj+1 Pj+2
. . . Pm−1
Pm−1 . . . Pj+1 Lj Π−1
j+1
=
Πj+2 Lj+1 Pj+1 Lj Π−1
j+1 .
0
0
So Lm−1
Lm−2
. . . L20 L10 Π1
=
Lm−1 Pm−1 Lm−2 . . . P2 L1 Π−1
2 Π1
=
−1
Lm−1 Pm−1 Lm−2 . . . P2 L1 P2−1 . . . Pm−1
Pm−1 . . . P1
=
Lm−1 Pm−1 Lm−2 . . . P2 L1 P1 ,
as required.
&
%
MS4105
'
486
$
• Since the definition (7.9) of Lj0 only applies permutations Pk
with k > j to Lj I can see that:
Lj0 must have the same structure as Lj .
• This is because each Pk swaps row k with one of the succeeding
rows.
• The effect of Pk with k > j on Lj when I form Pk Lj Pk −1 is just
to permute the sub-diagonal elements in the jth column of Lj
according to the permutation encoded in Pk .
• Left-multiplying Lj by Pk swaps rows k and l (say, for some
l > k) and right-multiplying Lj by Pk swaps columns k and l
which “undoes” the effect of swapping rows k and l, except for
column j.
0
So the matrices L10 , L20 , . . . Lm−1
are just the original Li ,
i = 1, 2, . . . , m − 1 with the sub-diagonal elements in columns
1, 2, . . . , m − 1 respectively appropriately permuted.
&
%
MS4105
487
'
$
In general, for an m × m matrix, the factorisation (7.6) provided by
GE with p.p. can be written (based on Eq. 7.10):
0
0 0
Lm−1 . . . L2 L1 (Pm−1 . . . P2 P1 ) A = U.
where each Lk0 is defined as (based on Eq. 7.9):
Lk0 = Pm−1 . . . Pk+1 Lk Pk+1 −1 . . . Pm−1 −1
The product of the matrices Lk0 is also unit lower triangular (ones
on main diagonal) and invertible by negating the sub-diagonal
entries, just as in GE without pivoting.. If I write
0
0 0 −1
L = Lm−1 . . . L2 L1
and P = (Pm−1 . . . P2 P1 ) I have
PA = LU.
&
(7.11)
%
MS4105
'
488
$
I can now write pseudo-code for the partial-pivoting version of
G.E.:
Algorithm 7.3 Gaussian Elimination with Partial Pivoting
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
U = A, L = I, P = I
for k = 1 to m − 1
Select i ≥ k to maximise |uik |
uk,k:m ↔ ui,k:m (Swap rows k and i)
lk,1:k−1 ↔ li,1:k−1
pk,: ↔ pi,:
for j = k + 1 to m
ujk
`jk = ukk
uj,k:m = uj,k:m − `jk uk,k:m
end
end
&
%
MS4105
'
489
$
To leading order this algorithm requires the same number of flops
as G.E. without pivoting, namely 23 m3 . (See the Exercises).
In practice, P is not computed or stored as a matrix but as a
permutation vector.
&
%
MS4105
490
'
7.2.7
$
Stability of GE
It can be shown that:
• If A is invertible and the factorisation A = LU is computed by
GE without pivoting then (provided that the reasonable
conditions on floating point arithmetic (O.1) and (O.2) hold)
then, provided M is sufficiently small if A has an LU
decomposition, the factorisation will complete successfully and
˜ and U
˜ satisfy
the computed matrices L
˜U
˜ = A + δA,
L
&
kδAk
= O(M ).
kLkkUk
(7.12)
%
MS4105
491
'
$
• In GE with p.p. each pivot selection involves maximisation
over a sub-column so the algorithm produces a matrix L with
entries whose absolute values are ≤ 1 everywhere below the
main diagonal. This mens that kLk = O(1) in any matrix
norm. So for GE with p.p., (7.12) reduces to the condition
kδAk
kUk = O(M ). The algorithm is therefore backward stable
provided that kUk = O(kAk).
• The key question for stability is whether there is growth in the
entries of U during the GE (with p.p.) process. Let the
growth factor for A be defined as:
ρ=
maxi,j |uij |
maxi,j |aij |
(7.13)
• If ρ is O(1) the elimination process is stable. If ρ is bigger I
expect instability.
&
%
MS4105
'
492
$
• As kLk = O(1) and kUk = O(ρkAk), I can conclude from the
definition of ρ that:
• If the factorisation PA = LU is computed by GE with p.p. and
˜ and U
˜
if (O.1) and (O.2) hold then the computed matrices L
satisfy:
kδAk
˜
˜
LU = PA + δA,
= O(ρM ).
(7.14)
kAk
• What’s the difference? Without pivoting L and U can be
unboundedly large. This means that the bound on the error on
A in (7.12) may in fact allow the error to be large. This
difficulty does not arise in (7.14) provided ρ = O(1).
&
%
MS4105
'
493
$
• A final — negative — comment on GE with p.p. There are
matrices that exibit large values of ρ. Consider applying GE
(no pivoting necessary) to :




1
1
1
1




−1 1


1
1
2










A = −1 −1 1
1 , L1 A =  −1 1
2





−1 −1 −1 1 1
 −1 −1 1 2




−1 −1 −1 −1 1
−1 −1 −1 2




1
1
1
1






 1
2
2
 1









L2 L1 A = 
1
4 , L3 L2 L1 A = 
1
4








1 8
−1 1 4



−1 −1 4
−1 8
&
%
MS4105
494
'
$
Finally;

1




L4 L3 L2 L1 A = 




&
1
1
1


2


4


1 8

16
%
MS4105
495
'
$
The final PA = LU (P = I) factorisation is

1

−1 1


−1 −1


−1 −1

−1 −1
1
−1
1

1

1


1
=

1

−1 −1 1

1

−1 1


−1 −1


−1 −1

−1 −1
&
1
−1
1
−1 −1

1

 1







1
1


2


1
4


1 8

16
%
MS4105
'
496
$
For this 5 × 5 matrix, the growth factor is ρ = 16. For any
m × m matrix of the same form, it is ρ = 2m−1 — the upper
bound on ρ (see the Exercises).
• Such a matrix — for fixed m — is still backward stable but if
m were large, in practice GE with p.p. could not be used.
• Fortunately it can be shown that matrices with ρ = O(2m ) are
a vanishingly small minority.
• In practice, GE is backward stable and works well.
&
%
MS4105
497
'
7.2.8
$
Exercises
1. Explain (analogously with the discussion on Slide 462) how the
PA = LU factorisation can be used to solve a linear system.
2. Calculate the extra computational cost of partial pivoting (line
3 in Alg. 7.3).
3. Find the determinant of the matrix A in the numerical
example above using the GE factorisation : (7.3) and the p.p
version (7.7).
4. Can you explain why the maximum value for ρ is ρ = 2m−1 ?
(Hint: w.l.o.g. I can normalise A so that maxi,j |aij | = 1.)
5. Write a matlab m-file to implement Alg. 7.3. Compare its
numerical stability with that of Alg. 7.1.
&
%
MS4105
498
'
8
$
Finding the Eigenvalues of Matrices
I’ll start with a short review of the topic of eigenvalues and
eigenvectors. It should be familiar from Linear Algebra 1.
8.1
Eigenvalue Problems
Let A ∈ Cn×n be a square matrix. A non-zero vector x ∈ Cn is an
eigenvector of A and λ ∈ C its corresponding eigenvalue if
Ax = λx.
(8.1)
The effect on an eigenvector of multiplication is to stretch (or
shrink) it by a scalar (possibly complex) factor. More generally, if
the action of a matrix A on a subspace S of Cn is to stretch all
vectors in the space by the same scalar factor λ, the subspace is
called an eigenspace and any nonzero x ∈ S is an eigenvector.
&
%
MS4105
499
'
$
The set of all the eigenvalues of A is the spectrum of A, a subset
of C denoted by Λ(A).
Eigenvalue problems have the important restriction that the
domain and range spaces must have the same dimension — in other
words A must be square for (8.1) to make sense.
8.1.1
Eigenvalue Decomposition
An eigenvalue decomposition of a square matrix A is a
factorisation
A = XΛX−1 .
(8.2)
Here X is invertible and Λ is diagonal. (I’ll show that such a
factorisation is not always possible.) I can re-write (8.2) as
&
%
MS4105
500
'
$
AX = XΛ or










A






 x1




x2
...






x1








xn 
=



x2
...

 λ1




xn 





λ2
..
.
λn







(8.3)
which of course can be read as Axj = λj xj , j = 1, . . . , n.
&
%
MS4105
501
'
$
So the jth column of X is an eigenvector of A and the jth diagonal
entry of Λ the corresponding eigenvalue. As in previous contexts,
the eigenvalue decomposition expresses a change of basis to
coordinates referred to eigenvectors. If Ax = b and A = XΛX−1 I
have
(X−1 b) = Λ(X−1 x).
(8.4)
So to compute Ax, I can expand x in the basis of columns of X,
apply Λ and interpret the result as a vector of coefficients of an
expansion in the columns of X — remembering that:
y = X−1 x ≡ x = Xy ≡ x is expanded in terms of cols of X
z = X−1 b ≡ b = Xz ≡ b is expanded in terms of cols of X.
&
%
MS4105
502
'
8.1.2
$
Geometric Multiplicity
I saw that the set of eigenvectors corresponding to a single
eigenvalue (plus the zero vector) forms a subspace of Cn called an
eigenspace. If λ is an eigenvalue of A, refer to the corresponding
eigenspace as Eλ .
Typically, I have dim Eλ = 1 — for a given eigenvalue λ there is a
unique eigenvector xλ (or any multiple of xλ ) but this need not be
true.
Definition 8.1 The dimension of Eλ is the size of the largest set
of linearly independent eigenvectors corresponding to the same
eigenvalue λ. This number is referred to as the geometric
multiplicity of λ. Alternatively I can say that the geometric
multiplicity is the dimension of the nullspace of A − λI since this
space is just Eλ .
&
%
MS4105
503
'
8.1.3
$
Characteristic Polynomial
The characteristic polynomial of an n × n complex matrix A
written pA or just p is the polynomial of degree m defined by
pA (z) = det(zI − A).
(8.5)
Note that the coefficient of zn is 1. The following well-known result
follows immediately from the definition:
Theorem 8.1 λ is an eigenvalue of A iff pA (λ) = 0.
Proof: From the definition of an eigenvalue ;
λ is an eigenvalue ⇔ ∃ x 6= 0 s.t. λx − Ax = 0
⇔ λI − A is singular
⇔ det(λI − A) = 0.
&
%
MS4105
'
504
$
Note that even if a matrix is real, some of its eigenvalues may be
complex — as a real polynomial may have complex roots.
&
%
MS4105
505
'
8.1.4
$
Algebraic Multiplicity
By the Fundamental Theorem of Algebra, I can write pA as
pA (z) = (z − λ1 )(z − λ2 ) . . . (z − λn ).
(8.6)
for some λ1 , . . . , λn ∈ C. By Thm. 8.1, each λj is an eigenvalue of
A and all eigenvalues of A appear somewhere in the list. In general
an eigenvalue may appear more than once.
Definition 8.2 Define the algebraic multiplicity of an
eigenvalue of A to be its multiplicity as a root of pA . An eigenvalue
is simple if its algebraic multiplicity is 1.
It follows that a n × n matrix has n eigenvalues, counted with
algebraic multiplicity. I can certainly say that every matrix has at
least one eigenvalue.
&
%
MS4105
'
506
$
• It would be nice if the geometric multiplicity of an
eigenvalue always equalled its algebraic multiplicity.
• Unfortunately this is not true.
• I will prove that the algebraic multiplicity of an eigenvalue is
always greater than or equal to its geometric multiplicity.
• First I need some results about similarity transformations.
&
%
MS4105
507
'
8.1.5
$
Similarity Transformations
Definition 8.3 If X ∈ Cn×n is invertible then the mapping
A → X−1 AX is called a similarity transformation of A. say that
two matrices A and B are similar if there is a similarity
transformation relating one to the other — i.e. if there exists an
invertible matrix X s.t. B = X−1 AX.
As I mentioned in the context of eigenvalue diagonalisation (8.4),
any similarity transformation is a change of basis operation. Many
properties are held in common by matrices that are similar.
Theorem 8.2 If X is invertible then for any compatible matrix A,
A and X−1 AX have the same:
• characteristic polynomial,
• eigenvalues and
• algebraic and geometric multiplicities.
&
%
MS4105
508
'
$
Proof:
• Check that the characteristic polynomials are equal:
pX−1 AX (z) = det(zI − X−1 AX) = det(X−1 (zI − A)X)
= det(X−1 ) det(zI − A) det(X) = det(zI − A) = pA (z).
• From the agreement of the characteristic polynomials, the
agreement of the eigenvalues and the algebraic multiplicities
follow.
• Finally, to prove that the geometric multiplicities are equal —
check that if Eλ is an eigenspace for A then X−1 Eλ is an
eigenspace for X−1 AX and vice versa. (Note that
dim X−1 Eλ = dim Eλ . Check using definition of a basis.)
&
%
MS4105
'
509
$
Now to relate geometric multiplicity to algebraic multiplicity.
Theorem 8.3 The algebraic multiplicity of an eigenvalue λ is
greater than or equal to its geometric multiplicity.
Proof: Let g be the geometric multiplicity of λ for the matrix A.
Form a n × g matrix V^ whose columns form an orthonormal
basis for the eigenspace Eλ . Then if I augment V^ to a square
unitary matrix V check that I can write:


λIg C
∗

.
B = V AV =
0 D
(Exercise: what are the dimensions of Ig , C and D?)
&
%
MS4105
'
510
$
By definition of the determinant, (see Ex. 1 below)
det(zI − B) = det(zIg − λI) det(zI − D) = (z − λ)g det(zI − D).
Therefore the algebraic multiplicity of λ as an eigenvalue of B is at
least g. Since similarity transformations preserve multiplicities, the
result holds for A.
&
%
MS4105
511
'
8.1.6
$
Defective Eigenvalues and Matrices
Usually the algebraic multiplicities and geometric multiplicities of
the eigenvalues of a square matrix are all equal (and equal to 1).
However this is not always true.
Example 8.1 Consider the matrices



2
2





A =  2 ,B = 

2
1


2 1
.
2
Both A and B have the same characteristic polynomial (z − 2)3 so
they have a single eigenvalue λ = 2 of algebraic multiplicity 3. In
the case of A, I can choose three linearly
independent eigenvectors (e.g. e1 , e2 and e3 ) so the geometric
multiplicity is also 3.
&
%
MS4105
'
512
$
However for B check that I can only find a single eigenvector (any
multiple of e1 ) so that the geometric multiplicity of the eigenvalue
is 1.
Definition 8.4 An eigenvalue whose algebraic multiplicity exceeds
its geometric multiplicity is called a defective eigenvalue . A
matrix with one or more defective eigenvalues is called a defective
matrix.
Diagonal matrices are never defective as the algebraic multiplicities
and geometric multiplicities of each of their eigenvalues λi are
precisely the number of occurrences of λi on the matrix’s diagonal.
&
%
MS4105
513
'
8.1.7
$
Diagonalisability
An important result:
Nondefective matrices ≡ Matrices that have an
eigenvalue decomposition (8.2).
I’ll state this as a Theorem shortly but first a technical result:
Lemma 8.4 For any square matrix A, a set of
eigenvectors corresponding to distinct eigenvalues must be linearly
independent.
(This holds automatically if A is Hermitian (A∗ = A) but the
following proof works for any square matrix..)
Proof: Let the matrix A have k distinct eigenvalues λ1 , . . . , λk and
let {xi }ki=1 be a set of eigenvectors corresponding to these distinct
eigenvalues so that Axi = λi xi .
&
%
MS4105
514
'
$
Assume that the set is linearly dependent so that
k
X
αi xi = 0,
not all αi = 0.
i=1
WLOG (re-labelling if necessary) assume that α1 6= 0 so that
x1 =
k
X
βi xi ,
βi = −αi /α1 .
i=2
Not all the βi = 0 for i = 2, . . . , k as if all are zero then x1 = 0.
Now:
k
k
X
X
Ax1 = λ1 x1 =
βi Axi =
βi λi xi .
i=2
i=2
As λ1 6= 0, I can solve for x1 . Now take the difference of the two
equations for x1 :
k
X
λi
0=
βi (1 − )xi .
λ1
i=2
&
%
MS4105
'
515
$
The factors 1 − λλ1i are all non-zero as the λi are all distinct. I know
that the βi are not all zero so one of the xi for i > 1 (say x2 ) can
be expressed in terms of the remaining xi , i > 2. Again, not all the
coefficients in this expression can be zero as this would result in
x2 = 0.
If I again multiply by A I can repeat the argument. So, repeating
the process k − 1 times I conclude that the last xi , say xλk must be
zero and so that all the xi = 0. This gives a contradiction as I
began by taking the {xi } to be a set of eigenvectors.
&
%
MS4105
'
516
$
Now the main Theorem:
Theorem 8.5 An n × n matrix A is nondefective iff it has an
eigenvalue decomposition A = XΛX−1 .
Proof:
[←] Given an eigenvalue decomposition A = XΛX−1 , I know by
Thm. 8.2 that A is similar to Λ — with the same
eigenvalues and multiplicities. Since Λ is a diagonal matrix it is
non-defective and so A must also be.
&
%
MS4105
'
517
$
[→] A non-defective matrix must have n linearly
independent eigenvectors. Argue as follows:
• eigenvectors with different eigenvalues must be linearly
independent by Lemma 8.4.
• as S is non-defective, each eigenvalue contributes as many
linearly independent eigenvectors as its algebraic
multiplicity (the number of repetitions of the eigenvalue )
If I form a matrix X from these n linearly
independent eigenvectors then X is invertible and AX = XΛ as
required.
So the terms nondefective and diagonalisable are equivalent.
&
%
MS4105
518
'
8.1.8
$
Determinant and Trace
A reminder of some definitions.
Definition 8.5 The trace of an n × n matrix A is the sum of its
diagonal elements..
Definition 8.6 The determinant can be defined as the sum over
all signed permutations of products of n elements from A — or
recursively:
• determinant of a 1 × 1 matrix is just the element A11
Pn
• determinant of an n × n matrix is k=1 (−1)k+1 a1k M1k where
M1k is the minor determinant of the (n − 1) × (n − 1) matrix
found by deleting the first row and the kth column of A.
&
%
MS4105
519
'
$
I can easily prove some useful results:
Theorem 8.6 The determinant and trace of a square matrix A are
equal to the product and sum of the eigenvalues of A respectively,
counted with algebraic multiplicity :
det(A) =
n
Y
λi
trace(A) =
i=1
n
X
(8.7)
λi
i=1
Proof: By (8.5) and (8.6) I can calculate
det(A) = (−1)n det(−A) = (−1)n pA (0) =
n
Y
λi
i=1
which is the required result for det A.
From (8.5) it follows that the coefficient of the zn−1 term of pA is
minus the sum of the diagonal elements of A, i.e. − trace A.
&
%
MS4105
'
520
$
(To get a factor of zn−1 in the det(zI − A) I select n − 1 diagonal
elements of zI − A which means that the nth factor for that term
in the determinant is z − aii . There are n ways to select the
“missing” nth factor so the zn−1 term in the determinant is
Pn
− i=1 aii as claimed.)
Pn
n−1
But in (8.6) the z
term is equal to − i=1 λi .
&
%
MS4105
521
'
8.1.9
$
Unitary Diagonalisation
I have shown that a non-defective n × n matrix A has n linearly
independent eigenvectors. In some cases the eigenvectors can be
chosen to be orthogonal.
Definition 8.7 If an n × n matrix A has n linearly
independent eigenvectors and there is a unitary matrix Q s.t.
A = QΛQ∗
I say that A is unitarily diagonalisable.
Note that the diagonal matrix of eigenvalues, Λ may be complex.
However I can state a theorem to clarify this.
Theorem 8.7 A hermitian matrix is unitarily diagonalisable and
its eigenvalues are real.
Proof: Exercise.
&
%
MS4105
'
522
$
Many other classes of matrices are also unitarily diagonalisable.
The list includes skew-hermitian and unitary matrices. There is a
very neat general criterion for unitary diagonalisability. I need a
definition:
Definition 8.8 An n × n matrix A is normal if AA∗ = A∗ A.
(Notice that hermitian, skew-hermitian and unitary matrices are all
normal — check . But normal matrices need not fall into one of
these categories, for example:


1 1 0 1


0 1 1 1


A=

1 1 1 0


1 0 1 1
is neither hermitian, skew-hermitian nor unitary — but is normal.
Check .)
&
%
MS4105
'
523
$
I will state and prove a Theorem (Thm. 8.11) that relates
normality to unitary diagonalisability following the discussion of
Schur Factorisation in the next Section.
&
%
MS4105
524
'
8.1.10
$
Schur Factorisation
I need a general decomposition for any m × m matrix – the Schur
decomposition allows any square matrix to be expressed as unitarily
similar to an upper triangular matrix. It is very widely used in
matrix algebra — and will allow us to prove Thm. 8.11 below.
&
%
MS4105
525
'
$
First I prove a technical lemma.
Lemma 8.8 Any m × m matrix A can be written as


∗
λ
v
1
1
Q∗0 AQ0 = 
for some unitary Q0 .
0 A1
(8.8)
Proof: Let q1 be an eigenvector of A, kq1 k = 1. I can choose
0
0
q20 , . . . , qm
so that {q1 , q20 , . . . , qm
} form an orthonormal basis for
Cm . Let
h
i
Q0 = q1
q20
...
0
,
qm
then Q0 is obviously unitary. (Q∗0 Q0 = I).
&
%
MS4105
526
'
$
I have

q∗1

 
 0∗  h
i
q2 
∗
0
0

Q0 AQ0 = 
 ..  A q1 q2 . . . qm
 . 
 
0∗
qm
 
q∗1
 
 0∗  h
i
q2 
 λ1 q1 Aq 0 . . . Aq 0
=
.
m
 . 
2
 . 
 
0∗
qm


λ1 v∗1
,
=
0 A1
0
where v∗1 = q∗1 A[q20 . . . qm
] and A1 is (m − 1) × (m − 1).
&
%
MS4105
527
'
$
Now for the main Theorem.
Theorem 8.9 For any m × m matrix A, I can write the Schur
factorisation of A as
A = QTQ∗
(8.9)
where Q is unitary and T is upper triangular.
Proof: The proof is reminiscent of the proof of Thm. 6.2 for the
SVD.
I use induction. Take m ≥ 2 as the m = 1 case is trivial.
[Base Step] . Taking m
 = 2, (8.8) in the Lemma just proved gives
∗
λ
v
1
1
— obviously upper triangular.
Q∗0 AQ0 = 
0 a1
&
%
MS4105
528
'
$
[Inductive Step] Assume that any (m − 1) × (m − 1) matrix has a
Schur factorisation. RTP that this holds for any m × m matrix.
By the inductive hypothesis, I can find a Schur decomposition
for A1 in (8.8).
So there exists a unitary (m − 1) × (m − 1) matrix Q1 s.t.


∗
λ
v
2
2
,
Q∗1 A1 Q1 = 
0 T2
where v2 ∈ Cm−2 and T2 is (m − 2) × (m − 2) upper triangular.
Define

1

Q = Q0
0
&
0
Q1

.
%
MS4105
529
'
$
Then (using (8.8) to go from the first to the second line)




1 0
1 0
∗
∗




Q AQ =
Q0 AQ0
0 Q∗1
0 Q1




1 0
λ1 v∗1
1 0






=
0 A1
0 Q1
0 Q∗1



λ1 v∗1 Q1
1 0


=
0 A1 Q1
0 Q∗1


λ1
v∗1 Q1

=
0 Q∗1 A1 Q1


λ1 ∗
∗


∗

=  0 λ2 v2 
 = T.
0
0 T2
&
%
MS4105
530
'
$
Obviously T is upper triangular (as T2 is). So
Q∗ AQ = T
upper triangular and also Q is itself unitary as it is a product of
unitary matrices.
Multiplying on the left by Q and on the right by Q∗ I have
A = QTQ∗ .
I can easily prove a nice consequence of the Schur factorisation —
the result suggests that a Schur decomposition will allow us to
compute the eigenvalues of any square matrix.
Theorem 8.10 Given the Schur factorisation A = QTQ∗ of a
square matrix A, the eigenvalues of A appear on the main diagonal
of T .
&
%
MS4105
531
'
$
Proof: I have A = QTQ∗ . Let x be any eigenvector of A. Then
Ax = QTQ∗ x = λx. Defining y = Q∗ x, I can write Ty = λy so y is
an eigenvector of T iff x = Qy is an eigenvector of A — with the
same corresponding eigenvalue λ. So certainly T has the same
spectrum as A.
Now, given that T is upper triangular I can write for a given
eigenvector ψ corresponding to a given eigenvalue λ:
(Tψ)i =
m
X
Tij ψj = λψi .
(8.10)
i≤j
Use a form of back substitution — start with the last row; i = m.
• I have Tmm ψm = λψm so ψm = 0 or Tmm = λ.
• If Tmm = λ then λ appears on the diagonal as claimed.
&
%
MS4105
'
532
$
• If Tmm 6= λ then I have ψm = 0.
• Move to the previous row, i.e. set i = m − 1.
• Then (8.10) becomes Tm−1,m−1 ψm−1 = λψm−1 .
• This time either Tm−1,m−1 = λ or ψm−1 = 0.
• Repeating this process, eventually one of the Tii = λ as
otherwise ψ = 0 which contradicts ψ an eigenvector.
I can repeat this “back-substitution” for each eigenvalue λ
(remember that T has the same eigenvalues as A).
I conclude that all the eigenvalues of T appear on the main
diagonal.
&
%
MS4105
533
'
$
I can now prove the important result that:
Theorem 8.11 A square matrix is unitarily diagonalisable iff it is
normal.
Proof: I have by Thm. 8.9 that any square matrix A can be
written as A = QTQ∗ . Use the definition of normality Def. 8.8 to
conclude that A normal if and only if T is (TT ∗ = T ∗ T ):
AA∗ = QTQ∗ QT ∗ Q = QT ∗ Q∗ QTQ∗ = A∗ A
QTT ∗ Q = QT ∗ TQ∗
TT ∗ = T ∗ T,
where each line is equivalent to its predecessor using the unitarity
of Q.
So RTP that for any upper triangular matrix T ; TT ∗ = T ∗ T iff T is
diagonal.
&
%
MS4105
534
'
$
Write this matrix identity in subscript notation:
m
X
tij t∗jk =
m
X
j=1
j=1
m
X
m
X
tij¯tTjk =
j=1
m
X
t∗ij tjk
¯tTij tjk
j=1
tij¯tkj =
j≥i,k
m
X
¯tji tjk
j≤i,k
If I consider diagonal elements of each side of TT ∗ = T ∗ T so that
i = k; the latter equality reduces to:
m
X
j≥k
&
|tkj |2 =
m
X
|tjk |2 .
(8.11)
j≤k
%
MS4105
535
'
$
I can now prove the result by induction. I simply substitute
k = 1, 2, . . . , m into the last equality above. So RTP that the kth
row of T has only its diagonal term non-zero, for k = 1, . . . , m.
[Base Step] Let k = 1. RTP that t1j = 0 for all j > 1.
Substituting k = 1 into (8.11):
|t11 |2 + |t12 |2 + . . . |t1m |2 = |t11 |2
So t1j = 0, for j = 2, . . . , m — the first row of T is zero, apart
from t11 .
&
%
MS4105
536
'
$
[Inductive Step] Let k > 1. Assume that for all rows i < k,
tij = 0, for j = i + 1, . . . , m. RTP that tkj = 0 for all
j = k + 1, . . . , m. Then writing out (8.11) explicitly:
|tkk |2 +|tk,k+1 |2 +. . . |tkm |2 = |t1k |2 + |t2k |2 + · · · + |tk−1,k |2 +|tkk |2 .
The struck-out terms in blue are zero by the inductive
hypothesis. So tkj = 0 for j = k + 1, . . . , m — the kth row of T
is zero, apart from tkk .
(A sketch of the upper triangular matrix T makes the inductive
step much easier to “see”.)
So T is diagonal.
&
%
MS4105
537
'
8.1.11
$
Exercises
1. Prove that the determinant of a block triangular matrix is the
product of the determinants of the diagonal blocks. Just use
the general definition of the determinant
X
det A =
sign(i1 , i2 , . . . , im )Aii1 A2i2 . . . Amim ,
i1 6=i2 6=···6=in
where the sum is over all permutations of the columns
1, 2, . . . , n. (Used in proof of Thm. 8.3.)
2. Prove Gerchgorin’s Theorem: for any m × m matrix A (not
necessarily hermitian), every eigenvalue of A lies in at least one
of the m discs in C with centres at aii and radii
P
Ri (A) ≡ j6=i |aij | where i is any one of the values 1 up to n.
See App. M for a proof.
&
%
MS4105
538
'
$
3. Let

8

A=
0.1

0.5

4
Use Gerchgorin’s Theorem to find bounds on the eigenvalues of
A. Are you sure?
4. (Difficult.) Prove the extended version: If k of the discs
Di ≡ {z ∈ C : |z − aii | ≤ Ri (A)} intersect to form a connected
set Uk that does not intersect the other discs then precisely k
of the eigenvalues of A lie in Uk . See App. N for a proof.
5. What can you now say about the eigenvalues of the matrix A
in Q.2?
&
%
MS4105
539
'
$
6. Consider the matrix:

10
2
3




A = −1 0 2

1 −2 1
which has eigenvalues 10.2260, 0.3870 + 2.2216i and
0.3870 − 2.2216i. What does (the extended version of) G’s
Theorem tell us about the eigenvalues of A?
&
%
MS4105
540
'
8.2
8.2.1
$
Computing Eigenvalues — an Introduction
Using the Characteristic Polynomial
The obvious method for computing the eigenvalues of a general
m × m complex matrix is to find the roots of the characteristic
polynomial pA (z).
Two pieces of bad news:
• If a polynomial has degree greater than 4, there is, in general,
no exact formula for the roots. I cannot therefore hope to find
the exact values of the eigenvalues of a matrix — any method
that finds the eigenvalues of a matrix must be
iterative. This is not a problem in practice as any desired
accuracy can be achieved by iterating a suitable algorithm
until the required accuracy is reached.
&
%
MS4105
'
541
$
• Unfortunately the process of finding the roots of a polynomial
is not only not backward stable (O.4) but is not even stable
(O.3). Small variations in the coefficients of a polynomial
p(x)can give rise to large changes in the roots.
So using a root-finding algorithm like Newton’s Method to find
the eigenvalues of a given matrix A is not a good idea.
&
%
MS4105
'
542
$
For a detailed discussion of this interesting topic see App. P.
One example to make the point — suppose that I want to find
the eigenvalues of the 2 × 2 identity matrix. The characteristic
polynomial is just p(x) = (x − 1)2 = x2 − 2x + 1. The roots are
of course both equal to one.
Suppose that I make a small change in the constant term in
˜ (x) = x2 − 2x + 0.9999
p(x) so that the polynomial is now p
corresponding to a change δa0 = −10−4 in the constant term
a0 .
If I solve the perturbed quadratic using matlab I find that the
matlab command
roots([1, -2,0.9999])
returns
the two roots:


1.0100

.
0.9900
&
%
MS4105
'
543
$
So a change of order 10−4 in a coefficient of the characteristic
polynomial results in an error in the eigenvalue estimate of
order 10−2 !
It is interesting to check that this is exactly the result
predicted by Eq. P.4— with r = 1, k = 0 and ak = 1.
&
%
MS4105
544
'
8.2.2
$
An Alternative Method for Eigenvalue
Computation
Many algorithms for computing eigenvalues work by computing a
Schur decomposition Q∗ AQ = T of A. I proved in Thm. 8.10 that
the diagonal elements of a upper triangular matrix are its
eigenvalues and obviously T is unitarily similar to A so once I have
found the Schur decomposition of A I am finished.
These methods are in fact two-stage methods;
• The first phase uses a finite number of Householder reflections
to write Q∗0 AQ0 = H where Q0 is unitary and H is an upper
Hessenberg matrix, a matrix that has zeroes below the first
sub-diagonal. I’ll show that this can be accomplished using
Householder reflections.
&
%
MS4105
545
'
$
• The second phase is to use an iterative method (pre- and
post-multiplying the Hessenberg matrix H by unitary matrices
Q∗j and Qj so that
lim Q∗j Q∗j−1 . . . Q∗1 HQ1 . . . Qj−1 Qj = T,
j→∞
upper triangular.
Obviously I can only apply a finite number of iterations in the
second phase — fortunately the standard method converges very
rapidly to the triangular matrix T .
If A is hermitian then it is certainly normal and I know from
Thm. 8.11 (or directly) that T must be diagonal. So, for a
hermitian matrix, once I have the Schur decomposition A then I
have the eigenvectors as the columns of Q and the eigenvalues as
the diagonal elements of the diagonal matrix T .
On the other hand, if A is not hermitian then the columns of Q,
while orthonormal, are not the eigenvectors of A.
&
%
MS4105
546
'
8.2.3
$
Reducing A to Hessenberg Form — the “Obvious”
Method
Let’s see to see how the first phase of the Schur factorisation is
accomplished.
(I briefly describe the “QR algorithm ” for the second phase in the
very short Chapter 9 below. However it is worth noting that each
iteration of the QR algorithm takes O(m2 ) flops and typically
O(m) iterations will reduce the error to machine precision so that
the typical cost of phase two is O(m3 ). I’ll show below that phase
one also requires O(m3 ) flops so the overall cost of accomplishing a
Schur factorisation and so computing the eigenvalues (and
eigenvectors if A is hermitian) is O(m3 ).)
&
%
MS4105
'
547
$
The “obvious” way to do the two phases at once in order to
compute the Schur factorisation of A is to pre- and post-multiply
A by unitary matrices Q∗k and Qk respectively for k = 1, . . . , m − 1
to introduce zeroes under the main diagonal. The problem is that
this cannot be done with m − 1 operations as while I can design a
Householder matrix Q∗1 that introduces zeroes under the main
diagonal; it will change all the rows of A.
This is not a problem but when I postmultiply by Q1 (≡ Q∗1 as the
Householder mtrices are hermitian) it will change all the columns
of A thus undoing the work done in pre-multiplying A by Q∗1 .
Of course I should have known in advance that this approach had
to fail as I saw at the start of this Section on Slide 540 that no
finite sequence of steps can yield us the eigenvalues of A.
&
%
MS4105
548
'
8.2.4
$
Reducing A to Hessenberg Form — a Better
Method
I need to be less ambitious and settle for reducing A to upper
Hessenberg (zeroes below the first sub-diagonal) rather than upper
triangular form. At the first step I select a Householder reflector
Q∗1 that leaves the first row unchanged. Left-multiplying A by it
forms linear combinations of rows 2 to m to introduce zeroes into
rows 3 to m of the first column. When I right-multiply Q∗1 A by
Q1 , the first column is left unchanged so I do not lose any of the
zeroes already introduced.
So the algorithm will consist at each iteration (i = 1, . . . , m − 2) of
pre- and post-multiplying A by Q∗i and Qi respectively (in fact
Q∗i = Qi ).
&
%
MS4105
'
549
$
The algorithm now can be stated:
Algorithm 8.1 Householder Hessenberg Reduction
(1)
(2)
(3)
(4)
(5)
(6)
(7)
for k = 1 to m − 2
x = Ak+1:m,k
vk = x + sign(x1 )kxke1
vk = vk /kvk k
Ak+1:m,k:m = Ak+1:m,k:m − 2vk (v∗k Ak+1:m,k:m )
A1:m,k+1:m = A1:m,k+1:m − 2 (A1:m,k+1:m vk ) v∗k
end
&
%
MS4105
'
550
$
Some comments:
• In line 2, x is set to the sub-column k, from rows k + 1 to m.
• The Householder formula for v in line 3 ensures that
I − 2vv∗ /v∗ v zeroes the entries in the kth column from row
k + 2 down.
• In line 5, I left-multiply the (m − k) × (m − k + 1) rectangular
block Ak+1:m,k:m by the (m − k) × (m − k) matrix I − 2vv∗
(having normalised v).
• In line 6 I right-multiply the (updated in line 5) m × (m − k)
matrix A1:m,k+1:m by I − 2vv∗ .
&
%
MS4105
551
'
8.2.5
$
Operation Count
The work done in Alg. 8.1 is dominated by the (implicit) inner
loops in lines 5 and 6:
Line 5
Ak+1:m,k:m
= Ak+1:m,k:m − 2vk (v∗k Ak+1:m,k:m )
Line 6
A1:m,k+1:m
= A1:m,k+1:m − 2 (A1:m,k+1:m vk ) v∗k
• Consider Line 5. It applies a Householder reflector on the left.
At the kth iteration, the reflector operates on the last m − k
rows of the full matrix A. When the reflector is applied, these
rows have zeroes in the first k − 1 columns (as these columns
are already upper Hessenberg). So only the last m − k + 1
entries in each row need to be updated — implicitly, for
j = k, . . . , m.
&
%
MS4105
'
552
$
This inner step updates the jth column of the submatrix
Ak+1:m,k:m . If I write L = m − k + 1 for convenience then the
vectors in this step are of length L. The update requires
4L − 1 ≈ 4L flops. Argue as follows: L flops for the
subtractions, L for the scalar multiple and 2L − 1 for the inner
product (L multiplications and L − 1 additions).
Now the index j ranges from k to m so the inner loop requires
≈ 4L(m − k + 1) = 4(m − k + 1)2 flops.
• Consider Line 6. It applies a Householder reflector on the
right. The last m − k columns of A are updated. Again,
implicitly, there is an inner loop, this time with index
j = 1 . . . m that updates the jth row of the sub-matrix. The
inner loop therefore requires≈ 4Lm = 4m(m − k + 1) flops.
• Finally, the outer loop ranges from k = 1 to k = m − 2 so I can
write W, the total number of flops used by the Householder
&
%
MS4105
553
'
$
Hessenberg reduction algorithm as:
W=4
m−2
X
(m − k + 1)2 + m(m − k + 1)
k=1
=4
=4
m−2
X
(m − k + 1)(2m − k + 1)
k=1
m
X
k(m + k)
(k ↔ m − k + 1)
k=3
≈ 4 m(m(m + 1)/2) + m(m + 1)(2m + 1)/6
≈ 10m3 /3.
&
(8.12)
%
MS4105
554
'
8.2.6
$
Exercises
1. Show that for a hermitian matrix, the computational cost
drops to O(8/3m3 ) flops.
2. Show that when the symmetry property is taken into account
the cost drops to O(4/3m3 ) flops.
3. Write a Matlab function m-file for Alg. 8.1 and test it; first on
randomly generated general square matrices then on hermitian
(real symmetric) matrices.
&
%
MS4105
'
9
555
$
The QR Algorithm
As mentioned in Section 8.2.3 the QR algorithm forms the second
phase of the task of computing the eigenvalues (and eigenvectors )
of a square complex matrix. It also is the basis of algorithms for
computing the SVD as I will very briefly see in Ch. 10.
In this Chapter I can only introduce the QR algorithm (not the
QR factorisation already seen ). A full analysis will not be possible
in the time available. I will work toward the QR algorithm, first
considering some simpler related methods.
I’ll start with a very simple and perhaps familiar method for
computing some of the eigenvalues of a square complex matrix.
&
%
MS4105
556
'
9.1
$
The Power Method
For any diagonalisable (or equivalently non-defective)
m × m matrix I have AX = XΛ where the columns of the invertible
matrix X are eigenvectors of A and Λ is a diagonal matrix of
corresponding eigenvalues.
Number the eigenvalues in decreasing order: |λ1 | ≥ · · · ≥ |λm | and
let q(0) ∈ Cm with kq(0) k = 1, an arbitrary unit vector. Then the
Power Method is described by the following simple algorithm :
Algorithm 9.1 Power Method
(1)
(2)
for k = 1, 2, . . .
p(k) = Aq(k−1)
(k)
q
(3)
(4)
(5)
=
p(k)
kp(k) k
(k)∗
λ(k) = q
end
&
Aq(k)
%
MS4105
'
557
$
I will briefly analyse this simple algorithm and show that if A is
hermitian (A∗ = A) then the algorithm converges quadratically to
the largest eigenvalue in magnitude. (For non-hermitian A I only
have linear convergence.)
For the rest of this Chapter I will make the simplifying assumption
that A is hermitian.
It is interesting to note that all the results below still hold
when I weaken this restriction and allow A to be normal.
Remember that a normal matrix is unitarily
diagonalisable and this is the property needed. I will not
discuss this further in this course.
&
%
MS4105
558
'
$
First define the Rayleigh Quotient:
Definition 9.1 ( Rayleigh Quotient) For any m × m hermitian
matrix A and any x ∈ Cm , define the complex-valued function:
x∗ Ax
r(x) = ∗
x x
The properties of the Rayleigh Quotient are important in the
arguments that follow.
&
%
MS4105
559
'
$
Obviously r(qJ ) = λJ for any eigenvector qJ . It is reasonable to
expect that, if x is close to an eigenvector qJ that r(x) will
approximate the corresponding eigenvalue λJ .
How closely?
If I compute the gradient ∇ r(x) of r(x) wrt the vector x (i.e. form
the vector of partial derivatives ∂r(x)
∂xi ) I find that
x∗ x (Ax + A∗ x) − 2(x∗ Ax)x
.
∇ r(x) =
∗
2
(x x)
¯) = 0 if and only
So if, as I are assuming, A is hermitian, then ∇ r(x
¯ satisfies
if x
¯ = r(x
¯ )x
¯,
Ax
¯ is an eigenvector corresponding to the eigenvalue r(x
¯).
i.e. x
&
%
MS4105
560
'
$
If for some eigenvector qJ with eigenvalue λJ , if I do a
¯ I have:
(multi-variate) Taylor series expansion around x
r(x) = r(qJ ) + (x − qJ )∗ ∇ r(qJ ) + O(kx − qJ k2 )
(9.1)
where the struck-out term in blue is zero.
So the error in using r(x) to approximate the eigenvalue λJ
is of order the square of the error in x.
But the arbitrary starting vector q(0) can be expanded in terms of
Pm
the eigenvectors qj ; q(0) = i=1 αj qj . If α1 6= 0 then
X
k (0)
A q =
αj λkj qj
k m
X
αj λj
= α1 λk1 q1 +
qj .
α1 λ1
i=2
&
%
MS4105
561
'
The ratios
$
λj
λ1
k
go to zero as k → ∞ (as λ1 is the largest
eigenvalue in magnitude). But q(k) is just a normalised version of
Ak q(0) , say ck Ak q(0) so
k P
λj
m αj
q(k) = ck Ak q(0) = ck α1 λk1 q1 + ck α1 λk1
qj .
i=2 α1
λ1
Both q(k) and the eigenvectors qk are unit vectors so I can
conclude that the factor ck α1 λk1 → ±1 as k → ∞ if I are working
in Rm or a complex phase if working in Cm .
(As the matrix A is hermitian, I can take the eigenvectors and
eigenvalues to be real.)
This means that the q(k) and λ(k) in the Power method will
converge to the largest eigenvalue in magnitude λ1 and the
corresponding eigenvector q1 .
&
%
MS4105
562
'
$
For k sufficiently large check that kq(k) − (±)q1 k =O(| λλ21 |k ). I also
have that
λ(k) = q(k)∗ Aq(k)
k ∗ k
m
m
X
X
αj λj
αj λj
= (ck λk1 α1 )2 q1 +
qj
λ1 q1 +
λj qj
α1 λ1
α1 λ1
j=2
j=2
2 2k m
X
αj
λj
k
2
= (ck λ1 α1 ) λ1 +
λj
.
α1
λ1
j=2
It follows that the eigenvalue estimates converge quadratically in
the following sense:
2k λ2 (k)
|λ − λ1 | = O = O(kq(k) − (±)q1 k2 ).
(9.2)
λ1
I could have concluded this directly from the sentence following
(9.1).
&
%
MS4105
563
'
9.2
$
Inverse Iteration
But of course I want all the eigenvalues not just the largest in
magnitude.
If µ is not an eigenvalue of A then check that the eigenvectors of
−1
(A − µI) are the same as those of A and that the corresponding
eigenvalues are λj1−µ , j = 1, . . . , m.
Suppose that µ ≈ λj then | λj1−µ | | λk1−µ | for k 6= j.
Here comes the clever idea: apply the power method to the matrix
−1
(A − µI) . The algorithm “should” converge to qj and λj where
λj is the closest eigenvalue to µ.
&
%
MS4105
564
'
$
Algorithm 9.2 Inverse Iteration Method
(1)
(2)
(3)
Choose q(0) arbitrary with kq(0) k = 1.
for k = 1, 2, . . .
Solve (A − µI) p(k−1) = q(k−1) for p(k−1)
(k)
q
(4)
(5)
(6)
=
p(k)
kp(k) k
(k)∗
λ(k) = q
end
Aq(k)
The convergence result for the Power method can be extended to
Inverse Iteration noting that Inverse iteration is the Power method
with a different choice for the matrix A. We’ll state it as a theorem.
&
%
MS4105
565
'
$
Theorem 9.1 Let λj be the closest eigenvalue to µ and λp the next
closest so that |µ − λj | < |µ − λp | ≤ |µ − λi | for i 6= j, p. If
q∗j q(0) 6= 0 (αj 6= 0) then for k sufficiently large
k µ − λj (k)
kq − (±)qj k = O µ − λp 2k µ − λj (k)
|λ − λj | = O .
µλp
Proof: Exercise.
What if µ is (almost) equal to an eigenvalue ? You would
expect the algorithm to fail in this case. In fact it can be shown
that even though the computed p(k) will be far from the correct
value in this case; remarkably, the computed q(k) will be close to
the exact value.
&
%
MS4105
566
'
9.3
$
Rayleigh Quotient Iteration
A second clever idea;
• I have a method (the Rayleigh Quotient) for obtaining an
eigenvalue estimate from an eigenvector estimate q(k) .
• I also have a method (inverse iteration) for obtaining an
eigenvector estimate from an eigenvalue estimate µ.
Why not combine them? This gives the following algorithm.
&
%
MS4105
567
'
$
Algorithm 9.3 Rayleigh Quotient Iteration
(1)
(2)
(3)
(4)
Choose q(0) arbitrary with kq(0) k = 1.
λ(0) = q(0)∗ Aq(0) the corresponding Rayleigh Quotient .
for k = 1, 2, . . .
(k−1)
(k−1)
Solve A − λ
I p
= q(k−1) for p(k−1)
(k)
q
(5)
(6)
(7)
=
p(k)
kp(k) k
(k)∗
λ(k) = q
end
Aq(k)
It can be shown that this algorithm has cubic convergence. I state
the result as a Theorem without proof.
&
%
MS4105
'
568
$
Theorem 9.2 If λj is an eigenvalue of A and q(0) is close enough
to qj the corresponding eigenvector of A then for k sufficiently large
kq(k+1) − (±)qj k = O kq(k) − (±)qj k3
|λ(k+1) − λj | = O |λ(k) − λj |3 .
Proof: Omitted for reasons of time.
(This remarkably fast convergence only holds for normal matrices.)
&
%
MS4105
'
569
$
Operation count for Rayleigh Quotient Iteration
• If A is a full m × m matrix then each step of the Power method
requires a matrix -vector multiplication — O(m2 ) flops.
• Each step of the Inverse iteration method requires solution of a
linear system — as I have seen O(m3 ) flops.
• This reduces to O(m2 ) if the matrix A − µI has already been
factored (using QR or LU factorisation).
• Unfortunately, for Rayleigh Quotient iteration, the matrix to
be (implicitly) inverted changes at each iteration as λ(k)
changes.
• So back to O(m3 ) flops.
• This is not good as the Rayleigh Quotient method has to be
applied m times, once for each eigenvector /eigenvalue pair.
&
%
MS4105
'
570
$
• However, if A is tri-diagonal, it can be shown that all three
methods require only O(m) flops.
• Finally, the good news is that when A is hermitian, the
Householder Hessenberg Reduction Alg. 8.1 (the first stage of
the process of computing the eigenvectors and eigenvalues of
A) has reduced A to a Hessenberg matrix — but the
transformed matrix is still hermitian and a hermitian
Hessenberg matrix is tri-diagonal.
&
%
MS4105
571
'
9.4
$
The Un-Shifted QR Algorithm
I are ready to present the basic QR algorithm — as I’ll show (in an
Appendix), it needs some tweaking to be a practical method.
Algorithm 9.4 Un-shifted QR Algorithm
(1)
(2)
(3)
(4)
(5)
A(0) = A
for k = 1, 2, . . .
Q(k) R(k) = A(k−1) Compute the QR factorisation of A.
A(k) = R(k) Q(k) Combine the factors in reverse order.
end
The algorithm is remarkably simple! Despite this, under suitable
conditions, it converges to a Schur form for the matrix A, upper
triangular for a general square matrix, diagonal if A is hermitian.
&
%
MS4105
'
572
$
Try it:
• a=rand(4,4); a=a+a’; Generate a random hermitian matrix
• while ∼converged [q,r]=qr(a); Calculate QR factors for a
• a=r*q; Combine the factors in reverse order
• end
• It works and coverges fast for this toy problem.
• But if you change to (say) a 20 × 20 matrix convergence is very
slow.
&
%
MS4105
'
573
$
• The QR algorithm can be shown to be closely related to
Alg. 9.3 (Rayleigh Quotient Iteration).
• With small but vital variations, the QR algorithm works very
well.
• Unfortunately the reasons why it works so well are technical.
• See App. I for the full story.
• The “Shifted QR Algorithm” is an improved version of the QR
Algorithm — you’ll find pseudo-code in Alg. I.5 in Appendix I.
• Coding it in Matlab is straightforward with the exception of
the recursive function call at line 9.
&
%
MS4105
574
'
10
$
Calculating the SVD
Some obvious methods for computing the SVD were discussed in
Sec. 6.5. I promised there to return to the subject at the end of the
course.
&
%
MS4105
575
'
10.1
$
An alternative (Impractical) Method for
the SVD
There is an interesting method for computing the SVD of a general
m × n matrix A.
For simplicity assume that A is square (m = n). I have as usual that A = UΣV∗. Form the 2m × 2m matrices

𝒜 = [0  A∗; A  0],   O = (1/√2) [V  V; U  −U]   and   S = [Σ  0; 0  −Σ]

(writing 𝒜 for the block matrix to distinguish it from A). It is easy to check that O is unitary (O∗O = I2m).
&
%
MS4105
576
'
$
Check that the block 2 × 2 equation

𝒜O = OS

is equivalent to AV = UΣ and A∗U = VΣ∗ = VΣ — which in turn are equivalent to A = UΣV∗.
So I have 𝒜 = OSO∗, which is an eigenvalue decomposition of 𝒜, and so the singular values of A are just the absolute values of the eigenvalues of 𝒜, and U and V can be extracted from the eigenvectors of 𝒜.
This approach is numerically stable and works well if eigenvalue and eigenvector code is available. It is not used in practice because the size of the matrix 𝒜 is excessive for large problems.
However it does reassure us that it is possible to compute a SVD to high accuracy.
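A short Matlab sketch of this construction for a square A (it is essentially Exercise 1 below, so treat it as a hint; the variable names are mine):

m = 20; A = rand(m);
Acal = [zeros(m) A'; A zeros(m)];     % the 2m x 2m block matrix (written 𝒜 above)
[O,S] = eig(Acal);                    % eigen-decomposition of the block matrix
s = sort(abs(diag(S)),'descend');     % each singular value appears as +sigma and -sigma
sv = s(1:2:end);                      % so take every second sorted absolute value
norm(sv - svd(A))                     % should be very small

U and V can similarly be read off the two blocks of the (suitably sorted and scaled) eigenvector matrix O.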
&
%
MS4105
577
'
10.1.1
$
Exercises
1. Write Matlab code to implement the above idea (using the
built-in Matlab eig function). Test your code on a random
20 × 20 matrix. Check the results with the built-in svd
function.
&
%
MS4105
578
'
$
2. Can you extend the above construction to transform the
problem of computing the SVD A = UΣV ∗ of a general
m × n matrix into an eigenvalue problem?
Hints: use

𝒜 = [0  A∗; A  0],   O = [V̄  V̄  0; Ū  −Ū  U′]   and   S = diag(Σ̂, −Σ̂, 0),

where Ū = (1/√2) Û and V̄ = (1/√2) V. Here U = [Û | U′], where Û as usual consists of the first n columns of U and U′ consists of the remaining columns. Finally, Σ̂ is as usual the n × n submatrix of Σ consisting of the first n rows and columns of Σ.
&
%
MS4105
579
'
$
3. • Write Matlab code to implement the above idea (using the built-in Matlab eig function).
• You’ll probably need to swap blocks of O around as the built-in Matlab eig command may not produce O with the right block structure.
• The Matlab spy command is very useful in situations like this when you want to examine a matrix structure.
• A useful variation on spy is:
function myspy(A,colour)
% like spy, but treat entries below 1.0e-8 in absolute value as zero
if nargin==1
    spy(abs(A)>1.0e-8)
else
    spy(abs(A)>1.0e-8,colour)
end
end
&
%
MS4105
'
580
$
• Test your code on a random 30 × 20 matrix.
• Check the results with the built-in svd function.
&
%
MS4105
581
'
10.2
$
The Two-Phase Method
The two-phase method for computing the eigenvectors and
eigenvalues of a square (real and symmetric) matrix was introduced
in Ch. 8.2. Its second phase, the QR algorithm, was elaborated at
length in Ch. 9. The two phases were:
1. Using Householder Hessenberg Reduction (Alg. 8.1) to reduce
A to a Hessenberg matrix — symmetric Hessenberg matrices
are tri-diagonal.
2. Using the QR algorithm to reduce the tri-diagonal matrix to a
diagonal one.
&
%
MS4105
'
582
$
Briefly, one standard method for computing the SVD of a general
m × n matrix A is to
1. use a “Golub-Kahn Bidiagonalisation” to reduce A to
bi-diagonal form (only the main diagonal and first
super-diagonal non-zero).
2. Use the QR algorithm to reduce the resulting bi-diagonal to a
diagonal matrix.
I will not discuss the Golub-Kahn Bidiagonalisation algorithm here
— but it is very similar to Householder Hessenberg Reduction and
indeed uses Householder reflections in a very similar way to
introduce zeroes alternately in columns to the left of the main
diagonal and in rows to the right of the first super-diagonal.
&
%
MS4105
583
'
10.2.1
$
Exercises
1. Write pseudo-code (based on Alg. 8.1) to implement
Golub-Kahn Bidiagonalisation.
2. Test your pseudo-code by writing a Matlab script to implement
it.
3. What is the computational cost of the algorithm?
&
%
MS4105
'
584
$
Part III
Supplementary Material
&
%
MS4105
585
'
A
$
Index Notation and an Alternative
Proof for Lemma 1.10
We start with a familiar example — a linear system of n equations in m unknowns:

a11 x1 + a12 x2 + · · · + a1m xm = b1
a21 x1 + a22 x2 + · · · + a2m xm = b2
  ⋮
an1 x1 + an2 x2 + · · · + anm xm = bn        (A.1)

The system can be written much more compactly as

ai1 x1 + ai2 x2 + · · · + aim xm = bi ,   where i = 1, . . . , n.
&
%
MS4105
586
'
$
But we can do better! We can use “index” notation to write:

∑_{j=1}^{m} aij xj = bi ,   where i = 1, . . . , n.

A final twist is the “summation convention”. It very simply says that if, in a formula (or a single term in a formula), the same subscript (index) is repeated then that index is to be summed over the possible range of the index. So for example

Pk = Zkg Fg ,   where k = 1, . . . , N

is a short-hand for

Pk = ∑_{g=1}^{M} Zkg Fg ,   where k = 1, . . . , N.
&
%
MS4105
587
'
$
Another example: the trace of a matrix is the sum of its diagonal elements, so

Tr(A) = Aii

is short-hand for

Tr(A) = ∑_{i=1}^{n} Aii .

A more complicated example:

Aij = Bix Cxy Dyj

is short-hand for

Aij = ∑_{x=1}^{N} ∑_{y=1}^{M} Bix Cxy Dyj

which is just the (familiar??) formula for taking the product of (in this case) three matrices so that A = BCD.
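A quick Matlab check of the last formula (the sizes N and M below are arbitrary illustrative choices of mine):

N = 3; M = 4;
B = rand(N,M); C = rand(M,M); D = rand(M,N);
A = zeros(N,N);
for i = 1:N
  for j = 1:N
    for x = 1:M
      for y = 1:M
        A(i,j) = A(i,j) + B(i,x)*C(x,y)*D(y,j);  % sum over the repeated indices x and y
      end
    end
  end
end
norm(A - B*C*D)    % should be (essentially) zero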
&
%
MS4105
588
'
$
So the linear system (A.1) can be written

aij xj = bi        (A.2)

We do not need to re-write the range of the subscripts i and j as they are completely determined by the number of elements in the vectors b and x — so that we must have i = 1, . . . , n and j = 1, . . . , m.
A final example — Linear Independence:
Definition A.1 (Alternative Linear Independence) If
S = {v1 , . . . , vk } is a non-empty set of vectors then S is linearly
independent if
αj vj = 0
only has the solution αi = 0 , i = 1, . . . , k.
Note that the index notation still works when we are summing over
vectors, not just scalars.
&
%
MS4105
589
'
$
Now for a streamlined version of the proof of Lemma 1.10. It will
be word-for word the same as the original proof except for using
index notation with the summation convention.
Proof: Recall that L = {l1 , . . . , ln } is a linearly independent set in V and S = {s1 , . . . , sm } is a second subset of V which spans V. We will assume that m < n and show this leads to a contradiction.
As S spans V we can re-write (1.1) using index notation and the
summation convention as:
li = aij sj , i = 1, . . . , n.
(A.3)
Now consider the linear system
aji cj = 0, i = 1, . . . , m
— note the sneaky reversal of the order of the subscripts
(equivalent to writing AT c = 0). This is a homogeneous linear
&
(A.4)
%
MS4105
590
'
$
system with more unknowns than equations and so must have
non-trivial solutions for which not all of c1 , . . . , cn are zero.
So we can write (multiplying each aji cj by si and summing over i)
si aji cj = 0.
(A.5)
Re-ordering the factors (OK to do this as each term in the double
sum over i and j is a product of a vector sj and two scalars cj and
aij ):
cj (aji si ) = 0.
(A.6)
But the sums in brackets are just lj by (A.3).
(Note that the roles of i and j are swapped when going from (A.3)
to (A.6).)
So we can write:
cj lj = 0
&
%
MS4105
'
591
$
with cj not all zero. But this contradicts the assumption that
L = {l1 , . . . , ln } is linearly independent. Therefore we must have
n ≤ m as claimed.
&
%
MS4105
592
'
B
$
Proof that under-determined
homogeneous linear systems have a
non-trivial solution
Prove the result:
“For all m ≥ 1, given an m × n matrix A where m < n, the
under-determined homogeneous linear system Ax = 0 has
non-trivial solutions” by induction.
Write the under-determined homogeneous linear system as:

a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
  ⋮
am1 x1 + am2 x2 + · · · + amn xn = 0        (B.1)
%
MS4105
'
593
$
[Base Step] This is the case m = 1 where there is only one
equation in the n unknowns x1 , . . . , xn , namely
a11 x1 + a12 x2 + · · · + a1n xn = 0. Suppose that a11 6= 0. Then
set x1 = −a12 /a11 , x2 = 1 and all the rest to zero. This is a
non-trivial solution as required.
&
%
MS4105
594
'
$
[Induction Step] Now assume that the result holds for all under-determined homogeneous linear systems with m − 1 rows and n − 1 columns (note m − 1 < n − 1 ≡ m < n).
• Suppose that the first column of the coefficient matrix is not all zero. Then one iteration of Gauss Elimination produces a new linear system (with the same solutions as the original) of the form:

1·x1 + â12 x2 + · · · + â1n xn = 0
0·x1 + â22 x2 + · · · + â2n xn = 0
  ⋮
0·x1 + âm2 x2 + · · · + âmn xn = 0        (B.2)
%
MS4105
595
'
$
Rows 2–m of this linear system are an underdetermined
homogeneous linear system with m − 1 rows and n − 1
columns and so by the inductive assumption there is a
non-trivial solution (not all zero) for x2 , x3 , . . . , xn . Finally,
set x1 = 0 to obtain a non-trivial solution to the full linear
system (B.2) and therefore to (B.1).
• The trivial case where the first column of the coefficient
matrix is all zero is left as an exercise.
&
%
MS4105
596
'
C
$
Proof of the Jordan von Neumann
Lemma for a real inner product space
A proof of the Jordan von Neumann Lemma 2.5 mentioned in Q. 5
on Slide 123.
Proof: We define the “candidate” inner product (2.9)

⟨u, v⟩ = (1/4)‖u + v‖² − (1/4)‖u − v‖²        (C.1)

and seek to show that it satisfies the inner product space axioms Def. 2.1 provided that the Parallelogram Law (2.2)

‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²)        (C.2)

holds.
&
%
MS4105
597
'
$
1. The Symmetry Axiom holds as (2.9) is symmetric in u and v.
2. To prove the Distributive Axiom we begin with:

⟨u + v, w⟩ + ⟨u − v, w⟩
  = (1/4)(‖u + v + w‖² − ‖u + v − w‖² + ‖u − v + w‖² − ‖u − v − w‖²)   (C.3)
  = (1/2)((‖u + w‖² + ‖v‖²) − (‖u − w‖² + ‖v‖²))                        (C.4)
  = (1/2)(‖u + w‖² − ‖u − w‖²)                                          (C.5)
  = 2⟨u, w⟩.                                                            (C.6)

We used the Parallelogram Law to derive (C.4) from (C.3).
&
%
MS4105
598
'
$
So

⟨u + v, w⟩ + ⟨u − v, w⟩ = 2⟨u, w⟩        (C.7)

and setting u = v leads to ⟨2u, w⟩ = 2⟨u, w⟩, i.e. the 2 can be “taken out”.
Now replace u by (u + v)/2 and v by (u − v)/2 in (C.7), leading to:

⟨u, w⟩ + ⟨v, w⟩ = 2⟨(u + v)/2, w⟩ = ⟨u + v, w⟩        (C.8)

which is the Distributive Axiom.
&
%
MS4105
599
'
$
3. To prove the Homogeneity Axiom ⟨αu, v⟩ = α⟨u, v⟩, we first note that we already have ⟨2u, w⟩ = 2⟨u, w⟩. The Distributive Axiom with v = 2u gives 3⟨u, w⟩ = ⟨u, w⟩ + 2⟨u, w⟩ = ⟨u, w⟩ + ⟨2u, w⟩ = ⟨u + 2u, w⟩ = ⟨3u, w⟩, and so by induction for any positive integer n:

n⟨u, w⟩ = ⟨u, w⟩ + (n − 1)⟨u, w⟩ = ⟨u, w⟩ + ⟨(n − 1)u, w⟩ = ⟨u + (n − 1)u, w⟩ = ⟨nu, w⟩.   (C.9)

Now set u = (1/n)u in both sides of (C.9), giving n⟨u/n, w⟩ = ⟨u, w⟩ or

⟨u/n, w⟩ = (1/n)⟨u, w⟩.   (C.10)
%
MS4105
600
'
$
Setting n = m (another arbitrary positive integer) in (C.10) and combining with (C.9) we have

⟨(n/m)u, w⟩ = (n/m)⟨u, w⟩.   (C.11)

So for any positive rational number r = n/m,

⟨ru, w⟩ = r⟨u, w⟩.   (C.12)

We can include negative rational numbers by using the Distributive Axiom again with v = −u, so that 0 = ⟨u − u, w⟩ = ⟨u, w⟩ + ⟨−u, w⟩ and so ⟨−u, w⟩ = −⟨u, w⟩.
Finally (and this is a hard result from First Year Analysis) any real number can be approximated arbitrarily closely by a rational number, so we can write

⟨ru, w⟩ = r⟨u, w⟩, for all r ∈ R   (C.13)

which is the Homogeneity Axiom.
&
%
MS4105
601
'
$
4. The Non-Negativity Axiom: the conditions ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 if and only if u = 0 are trivial as ⟨u, u⟩ = (1/4)‖u + u‖² = ‖u‖² ≥ 0 as ‖·‖ is a norm. Again ‖u‖ = 0 if and only if u = 0 as ‖·‖ is a norm.
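A numerical sanity check of the proof (my own illustration): with the Euclidean norm the candidate inner product (C.1) reproduces the usual dot product, and the Parallelogram Law (C.2) holds.

u = randn(5,1); v = randn(5,1);
ip = 0.25*norm(u+v)^2 - 0.25*norm(u-v)^2;                   % candidate inner product (C.1)
abs(ip - dot(u,v))                                          % should be ~0
abs(norm(u+v)^2 + norm(u-v)^2 - 2*(norm(u)^2 + norm(v)^2))  % Parallelogram Law (C.2), ~0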
&
%
MS4105
'
D
602
$
Matlab Code for (Very Naive) SVD
algorithm
The following is a listing of the Matlab code referred to in Sec. 6.3.
You can download it from
http://jkcray.maths.ul.ie/ms4105/verynaivesvd.m.
&
%
MS4105
'
m=5; n=3; a=rand(m,n); asa=a'*a;  % form a*a
[v,s2]=eig(asa);      % find the eigenvectors v and the eigenvalues s2 of a*a
s=sqrt(s2);           % s is a diagonal matrix whose diagonal elements
                      % are the non-zero singular values of a
ds=diag(s);           % ds is the vector of non-zero singular values
[dss,is]=sort(ds,1,'descend');  % sort the singular values
                                % into decreasing order
s=diag(dss);          % s is a diagonal matrix whose diagonal elements
                      % are the non-zero singular values sorted into
                      % decreasing order
s=[s' zeros(n,m-n)]'; % pad s with m-n rows of zeros
v=v(:,is);            % apply the same sort to the columns of v
&
%
MS4105
'
aas=a*a';             % form a a*
[u,s2p]=eig(aas);     % find the eigenvectors u
                      % and the eigenvalues s2p of a a*
sp=sqrt(s2p);         % sp is a diagonal matrix whose diagonal elements
                      % are the singular values of a (incl the zero ones)
dsp=diag(sp);         % dsp is the vector of all the singular values
[dssp,isp]=sort(dsp,1,'descend');  % sort the singular values
                                   % into decreasing order
sp=diag(dssp);        % sp is a diagonal matrix whose diagonal elements
                      % are the singular values sorted into decreasing order
u=u(:,isp);           % apply the same sort to the columns of u
norm(u*sp(:,1:n)*v'-a)  % should be very small
&
%
MS4105
'
E
Matlab Code for simple SVD algorithm

m=70;
n=100;
ar=rand(m,n);
ai=rand(m,n);
a=ar+i*ai;
asa=a'*a;
[v,s2]=eig(asa);
av=a*v;
s=sqrt(s2);
if m>n
    s=[s zeros(n,m-n)]';  % if m>n!!
end;
[q,r]=qr(av);
&
%
MS4105
'
d=diag(r);
dsign=sign(d);
dsign=dsign';
zpad=zeros(1,m-n);
dsign=[dsign zpad]';
dsignmat=diag(dsign);
if m<n
    dsignmat=[dsignmat zeros(m,n-m)];
end;
u=q*dsignmat;
atest=u*s*v';
diffjk=norm(a-atest)
[U,S,V]=svd(a);
diffmatlab=norm(U*S*V'-a)
&
%
MS4105
607
'
F
$
Example SVD calculation

A = [−2  11; −10  5],   Aᵀ = [−2  −10; 11  5],

so

AᵀA = [104  −72; −72  146].

Solve AᵀA V = VΛ:

det(λI − AᵀA) = det [λ − 104   72; 72   λ − 146] = (λ − 104)(λ − 146) − 72² = 0.

So the eigenvalues are λ = 200, 50. Now find the eigenvectors. Let v1 = (x, y)ᵀ; then AᵀA v1 = 200 v1 simplifies to 104x − 72y = 200x, so x = −3y/4, giving v1 = (−3, 4)ᵀ. We can normalise this to v1 = (1/5)(−3, 4)ᵀ. Similarly AᵀA v2 = 50 v2 gives the normalised v2 = (1/5)(4, 3)ᵀ.

So V = (1/5)[−3  4; 4  3]   and   Λ = [200  0; 0  50].

Now find a QR factorisation for AV = [10  5; 10  −5].
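The calculation can be finished and checked in Matlab (a sketch; the sign fix on Q is needed because qr does not guarantee a positive diagonal in R):

A = [-2 11; -10 5];
V = (1/5)*[-3 4; 4 3];
[Q,R] = qr(A*V);                 % AV = [10 5; 10 -5]
D = diag(sign(diag(R)));
U = Q*D;  S = abs(diag(R));      % S should be sqrt([200 50]) = [14.142; 7.071]
norm(U*diag(S)*V' - A)           % ~0, i.e. A = U*Sigma*V'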
&
%
MS4105
609
'
G
$
Uniqueness of U and V in S.V.D.
We re-state Theorem 6.7.
Theorem G.1 If an m × n matrix A has two different SVD’s A = UΣV∗ and A = LΣM∗ then

U∗L = diag(Q1, Q2, . . . , Qk, R)
V∗M = diag(Q1, Q2, . . . , Qk, S)

where Q1, Q2, . . . , Qk are unitary matrices whose sizes are given by the multiplicities of the corresponding distinct non-zero singular values — and R, S are arbitrary unitary matrices whose size equals the number of zero singular values. More precisely, if q = min(m, n) and qi = dim Qi then ∑_{i=1}^{k} qi = r = rank(A) ≤ q.
&
%
MS4105
610
'
$
The proof below investigates what happens in the more
complicated case where one or more singular value is repeated.
We need two preliminary results:
Lemma G.2 If Σ is diagonal then for any compatible matrix A,
AΣ = ΣA if and only if aij = 0 whenever σii 6= σjj .
Proof: We have (AΣ)ij = aij σjj so
AΣ = ΣA ≡ (AΣ)ij = (ΣA)ij ≡ aij σjj = aij σii
This is equivalent to aij (σii − σjj ) = 0. The result follows.
• So for A to commute with Σ, it can have non-zero diagonal
elements but non-zero off-diagonal elements (at aij say) only
when the corresponding elements σii and σjj are the same.
• The diagonal Σ must have repeated diagonal elements for a
matrix A that commutes with it to have off-diagonal elements.
• The following Example illustrates the point.
&
%
MS4105
611
'
$
Example G.1 If Σ is the 7 × 7 diagonal matrix

Σ = diag(1, 2, 2, 3, 3, 3, 4)

then any 7 × 7 matrix A that is block diagonal with blocks of sizes 1, 2, 3 and 1 (matching the multiplicities of the repeated diagonal entries of Σ, with zeroes everywhere outside those blocks) satisfies AΣ = ΣA.
Try experimenting with Matlab to convince yourself.
The Example suggests the following Lemma:
&
%
MS4105
612
'
$
Lemma G.3 If Σ = diag(c1 I1, . . . , cM IM), where the Ik are identity matrices, then for any compatible matrix A, AΣ = ΣA iff A is block diagonal,

A = diag(A1, A2, . . . , AM),

where each Ai is the same size as the corresponding Ii.
Proof: From the first Lemma, as Σ here is diagonal, we have aij = 0 unless σii = σjj. Since the scalars ci are distinct, σii = σjj only within a block. The result follows.
&
%
MS4105
613
'
$
Proof: (Main Theorem) We take m ≥ n — the other case is left as
an exercise. Using the first SVD for A:
AA∗ = UΣV ∗ VΣ∗ U∗ = UΣΣ∗ U∗ . We also have AA∗ = LΣΣ∗ L∗ .
(Note that although Σ is diagonal, it is not hermitian — even if
real — as Σ is m × n .) Then Σ2 ≡ ΣΣ∗ (as previously) is an
m × m matrix with an n × n diagonal block in the top left and
zeroes elsewhere. So equating the two expressions for AA∗ ;
UΣ2 U∗ = LΣ2 L∗ and Σ2 = U∗ LΣ2 L∗ U = PΣ2 P∗ where P = U∗ L, a
unitary m × m matrix. Re-arranging we have
Σ2 P = PΣ2 .
It follows from the second Lemma above that P is block diagonal
with blocks whose sizes match the multiplicities of the singular
values in Σ2 ≡ ΣΣ∗ .
&
%
MS4105
614
'
$
Similarly Σ∗Σ = QΣ∗ΣQ∗ where Q = V∗M. Now Σ∗Σ ≡ Σ̄² (say) is an n × n matrix (equal to the n × n block in the top left of ΣΣ∗) and we have QΣ̄² = Σ̄²Q. Again appealing to the second Lemma above we have that Q is block diagonal with blocks whose sizes match the multiplicities of the singular values in Σ∗Σ.
So

P = diag(P1, P2, . . . , Pk, P̃),   Q = diag(Q1, Q2, . . . , Qk, Q̃),

where each Pi and Qi are the same size as the corresponding block in ΣΣ∗ and Σ∗Σ respectively.
&
%
MS4105
615
'
$
In fact each Pi = Qi . Reason as follows: we have
A = UΣV 0 = LΣM 0 .
So, using P = U∗ L and Q = V ∗ M we have L = UP and M = VQ so
that
UΣV 0 = UPΣQ 0 V 0 .
Multiplying on the left by U 0 and on the right by V gives
Σ = PΣQ 0 .
But Σ is diagonal and each Pi is of the same size as the
corresponding Qi — corresponding to the multiplicities of the
corresponding distinct non-zero singular values. So finally we must
have Pi Qi0 = Ii , for each of the blocks giving Pi = Qi .
˜ is an (m − n) × (m − n) unitary matrix
When m > n check that P
˜ is absent. (State
and the row and column in Q corresponding to Q
the corresponding results when n > m and n = m.)
&
%
MS4105
616
'
H
$
Oblique Projection Operators — the
details
In this Appendix we explain in detail how a matrix expression for a
non-orthogonal (oblique) projection operator may be calculated.
(Or return to Sec. 5.1.4.)
We will explain two different methods. Both use the same
definitions:
Let B1 = {s1^1, . . . , s1^k} form a basis for S1 = range(P) and B2 = {s2^1, . . . , s2^(m−k)} form a basis for S2 = null(P).
&
%
MS4105
617
'
$
1. • We seek to construct an idempotent (P² = P) matrix P s.t. Pv = v for v ∈ S1 and Pv = 0 for v ∈ S2.
• Define the m × k matrix B1 = [s1^1 . . . s1^k] and the m × (m − k) matrix B2 = [s2^1 . . . s2^(m−k)].
• Also define the m × k matrix B̄2 whose columns are a basis for the space orthogonal to S2.
• So every column of B̄2 is orthogonal to every vector in B2 (or equivalently to every column of B2 or every vector in S2 = null(P)).
• So for any v ∈ S2 ≡ null(P), B̄2ᵀ v = 0.
• Any matrix P of the form P = X B̄2ᵀ also has this property — i.e. Pv = 0 for all v ∈ S2.
• We will tie down the choice of X by requiring that Pv = v for v ∈ S1.
• Now consider v ∈ S1 ≡ range P.
• So v = B1 z for some z ∈ Cᵏ.
• Therefore B̄2ᵀ v = B̄2ᵀ B1 z.
• And z = (B̄2ᵀ B1)⁻¹ B̄2ᵀ v.
• As v ∈ S1 ≡ range P we have Pv ≡ v = B1 z = B1 (B̄2ᵀ B1)⁻¹ B̄2ᵀ v.
• This expression also holds when v ∈ null(P) as in that case B̄2ᵀ v = 0 and so B1 (B̄2ᵀ B1)⁻¹ B̄2ᵀ v = 0.
• So we can write P = B1 (B̄2ᵀ B1)⁻¹ B̄2ᵀ.
2
%
MS4105
619
'
$
2. An alternative “formula” for P can be derived as follows:
• As S1 and S2 are complementary subspaces of Cᵐ we have that B1 ∪ B2 is a basis for Cᵐ.
• It follows that the matrix B whose columns are the vectors s1^1, . . . , s1^k, s2^1, . . . , s2^(m−k);

B = [s1^1 . . . s1^k | s2^1 . . . s2^(m−k)] = [B1 | B2]

must be invertible.
• Now as Pv = v for all v in S1 and Pv = 0 for all v in S2 we must have P s1^i = s1^i for i = 1, . . . , k and P s2^i = 0 for i = 1, . . . , m − k and so

PB = P[B1 | B2] = [PB1 | PB2] = [B1 | 0]

• Multiplying on the right by B⁻¹;

P = [B1 | 0] B⁻¹ = B [Ik  0; 0  0] B⁻¹.

• It follows that

I − P = B [0  0; 0  I_{m−k}] B⁻¹.
%
MS4105
'
621
$
Let’s check that each formula works — we’ll use a simple example.
Example H.1 Let P project vectors in R² into the y–axis along the line y = −αx. So S1 is the y–axis and S2 is the line y = −αx.
1. • Use the first formula above (P = B1 (B̄2ᵀ B1)⁻¹ B̄2ᵀ).
   • We have B1 = (0, 1)ᵀ.
   • And B2 = (1, −α)ᵀ.
   • So B̄2 = (α, 1)ᵀ.
   • Substituting into the formula above for P we find:

     P = (0, 1)ᵀ ([α  1](0, 1)ᵀ)⁻¹ [α  1] = (0, 1)ᵀ (1) [α  1] = [0  0; α  1].

2. Using the second recipe above, as before we have B1 = (0, 1)ᵀ and B2 = (1, −α)ᵀ.
   • So B = [0  1; 1  −α] and (check) B⁻¹ = [α  1; 1  0].
   • So (as k = 1),

     P = B [1  0; 0  0] B⁻¹ = [0  0; 1  0][α  1; 1  0] = [0  0; α  1].

So for any vector v = (x, y)ᵀ, Pv = (0, αx + y)ᵀ and as x and y are arbitrary we see that range P is indeed the y–axis and that Pv = 0 precisely when y = −αx.
Which method do you think is better? Why?
(Back to Sec. 5.1.4.)
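Both recipes are easy to check numerically; a small Matlab sketch (the value alpha = 2 is an arbitrary choice of mine):

alpha = 2;
B1 = [0;1];  B2 = [1;-alpha];  B2bar = [alpha;1];   % bases for S1, S2 and the space orthogonal to S2
P1 = B1*((B2bar'*B1)\B2bar');                       % first formula
B  = [B1 B2];
P2 = B*[1 0; 0 0]/B;                                % second formula
[norm(P1 - [0 0; alpha 1])  norm(P2 - [0 0; alpha 1])]   % both ~0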
&
%
MS4105
625
'
I
I.1
$
Detailed Discussion of the QR
Algorithm
Simultaneous Power Method
To show that this method works in general we need to temporarily back-track to the Power Method.
A natural extension of the simple Power Method above is to apply it simultaneously to a matrix Q(0) of random starting vectors. This offers the prospect of calculating all the eigenvalues and eigenvectors of A at once rather than using inverse iteration to compute them one at a time.
We start with a set of n randomly chosen linearly independent unit vectors (n ≤ m): q1(0), . . . , qn(0).
We expect/hope that just as
&
%
MS4105
626
'
$
• A^k q1(0) → (a multiple of) q1
• so the space spanned by {A^k q1(0), . . . , A^k qn(0)}
  – written ⟨A^k q1(0), . . . , A^k qn(0)⟩
should converge to ⟨q1, . . . , qn⟩, the space spanned by the n eigenvectors corresponding to the n largest eigenvalues.
&
%
MS4105
627
'
$
We define the m × n matrix Q(0) = [q1(0)  q2(0)  . . .  qn(0)] and Q(k) = A^k Q(0). (We are not normalising the columns as yet.)
As we are interested in the column space of Q(k), use a reduced QR factorisation to extract a well-behaved (orthonormal) basis for this space:

Q̂(k) R̂(k) = Q(k)    reduced QR factorisation of Q(k)    (I.1)

As usual, Q̂(k) is m × n and R̂(k) is n × n.
We expect/hope that as k → ∞, the columns of Q̂(k) will converge to (±) the eigenvectors of A, q1, . . . , qn.
&
%
MS4105
628
'
$
We can justify this hope!
• Define the m × n matrix whose columns are the first n eigenvectors of A, i.e. Q̂ = [q1, . . . , qn].
• Expand qj(0) and qj(k) in terms of these eigenvectors:

qj(0) = ∑_i αij qi   ↔   Q(0) = Q̂ A
qj(k) = ∑_i αij λi^k qi

We assume that
• the leading n + 1 eigenvalues are distinct in modulus
• the n × n matrix A of expansion coefficients αij is non-singular in the sense that if (as above) Q̂ is the matrix whose columns are the first n eigenvectors q1, . . . , qn then all the leading principal minors of A ≡ Q̂∗ Q(0)
&
%
MS4105
629
'
$
Now the Theorem:
Theorem I.1 If we apply the above un-normalised simultaneous (or block) power iteration to a matrix A and the above assumptions are satisfied then as k → ∞ the matrices Q̂(k) converge linearly to the eigenvectors of A in the sense that

‖qj(k) − (±)qj‖ = O(c^k)   for each j = 1, . . . , n,

where c = max_{1≤k≤n} |λ_{k+1}|/|λ_k| is less than 1.
We include the proof for completeness.
Proof: Extend Q̂ to the full m × m unitary matrix Q of eigenvectors of A. Let Λ be the diagonal matrix of eigenvalues so that A = QΛQᵀ (A real and symmetric). Define Λ̂ to be the top
&
%
MS4105
630
'
$
Then
Q(k) ≡ Ak Q(0) = QΛk QT Q(0)
^Λ
^ kQ
^ T Q(0) + O(|λn+1 |k ) as k → ∞
=Q
^Λ
^ k A + O(|λn+1 |k )
=Q
as k → ∞.
(Check that we can justify the step from the first to the second
line by writing


h
i
^ 0
Λ

¯ ,Λ = 
^ Q
Q= Q
¯
0 Λ
¯ and Λ
¯ are the remaining m − n
^ and Λ
^ are as above and Q
where Q
columns of Q and the bottom right (m − n) × (m − n) block of Λ
respectively.)
&
%
MS4105
'
631
$
^ T Q(0) is non-singular (in terms of
We assumed above that A ≡ Q
its principal minors) so multiply the last equation above on the
right by A−1 A ≡ In giving
T (0)
(k)
k
k
^
^
^ Q .
Q = QΛ + O(|λn+1 | ) Q
^ T Q(0) is non-singular, the column space of Q(k) is the same as
As Q
^Λ
^ k + O(|λn+1 |k ) (as XY is a linear
the column space of Q
^Λ
^ k as
combination of the columns of X). This is dominated by Q
k → ∞.
&
%
MS4105
'
632
$
^ T Q(0) are
We also assumed that all the principal minors of A ≡ Q
non-singular — so the above argument may be applied to leading
^ the first column, the first
subsets of the columns of Q(k) and Q;
and second, and so on. In each case we conclude that the space
spanned by the corresponding columns of Q(k) converge linearly to
^
the space spanned by the corresponding columns of Q.
From this convergence of all the successive column spaces together
^ R(k)
^
with the definition of the QR factorisation, Q(k)
= Q(k) , the
result follows.
&
%
MS4105
633
'
I.1.1
$
A Normalised version of Simultaneous Iteration
To make Simultaneous Iteration useful we must normalise at each
iteration, not just after multiplying by A k times — otherwise
round-off would cause all accuracy to be lost.
Algorithm I.1 Normalised Simultaneous Iteration
(1) Choose an arbitrary Q̂(0), m × n with orthonormal columns.
(2) for k = 1, 2, . . .
(3)     Z = A Q̂(k−1)
(4)     Q̂(k) R̂(k) = Z    reduced QR factorisation of Z
(5) end
Obviously the column spaces of Q̂(k) and Z are the same; both are in the column space of A. (You should check this statement.)
Ignoring the numerical round-off issues, Alg. I.1 converges under
the same assumptions as the original un-normalised version.
&
%
MS4105
'
634
$
We now add an extra Line 5 to the algorithm. It defines A(k), a projected version of the original matrix A. We will see that A(k) converges to the diagonal matrix Λ.
We take Q̄(0) = I for simplicity and drop the hats on Q̂(k) and R̂(k) as we will be using full QR factorisations as A is square and we want all its eigenvectors. We will use Q̄ and R̄ in SI to distinguish the Q(k), R(k) generated by SI from those generated by QR.
Algorithm I.2 Normalised Simultaneous Iteration with extra line 5
(1) Q̄(0) = I
(2) for k = 1, 2, . . .
(3)     Z = A Q̄(k−1)
(4)     Q̄(k) R̄(k) = Z    QR factorisation of Z
(5)     A(k) = Q̄(k)ᵀ A Q̄(k)    allows comparison with QR algorithm
(6) end
&
%
MS4105
'
635
$
We now re-write the QR algorithm, Alg. 9.4, also with an extra line that defines Qπ(k), the product Q(1) . . . Q(k). (Here π stands for “product”.)
Here we will use plain Q and R for the Q(k), R(k) generated by the QR algorithm, to distinguish them from those generated by the SI algorithm.
Algorithm I.3 Un-shifted QR Algorithm with extra line
(1) A(0) = A, Q(0) = I
(2) for k = 1, 2, . . .
(3)     Q(k) R(k) = A(k−1)    Compute the QR factorisation of A(k−1).
(4)     A(k) = R(k) Q(k)      Combine the factors in reverse order.
(5)     Qπ(k) = Q(1) . . . Q(k)
(6) end
&
%
MS4105
'
636
$
We can state a Theorem which establishes that the two algorithms are equivalent.
Theorem I.2 Algorithms I.2 and I.3 are equivalent in that they generate the same sequence of matrices.
• More precisely the Q̄(k) generated by the SI algorithm are equal to the Qπ(k) generated by QR, the R̄(k) generated by SI are equal to the R(k) generated by QR, while the same A(k) are computed by both.
• Define R̄π(k) to be the product of all the R̄(i)’s computed so far in SI and similarly for Rπ(k) in QR. (Again, π stands for “product”.)
• Claim that:

R̄π(k) ≡ R̄(k) . . . R̄(1)   is equal to   Rπ(k) ≡ R(k) . . . R(1)

and that:

Alg. | Q      | R     | Projection of A             | Powers of A
SI   | Q̄(k)   | R̄(k)  | A(k) = Q̄(k)ᵀ A Q̄(k)        | A^k = Q̄(k) R̄π(k)
QR   | Qπ(k)  | R(k)  | A(k) = Qπ(k)ᵀ A Qπ(k)       | A^k = Qπ(k) Rπ(k)
%
MS4105
'
638
$
Proof: The proof is by induction.
• The symbols Qπ (k) and R(k) are only used in the analysis of
the QR algorithm — they have no meaning in the SI
algorithm.
¯ (k) and R
¯ (k) are only used in the analysis of the
• The symbols Q
SI algorithm — they have no meaning in the QR algorithm.
[Base Case k = 0] Compare the outputs of the two algorithms.
¯ (0) = I and A(0) = A.
[S.I.] We have Q
[QR] A(0) = A and Qπ (0) = Q(0) = I.
¯ (0) = I.
[Both] R(0) = R
¯ (0) = Q (0) = I and so A = A(0) = Q
¯ (0)T AQ
¯ (0)
So trivially, Q
π
for both algorithms . (A0 = I by definition so nothing to
check.) X
&
%
MS4105
639
'
$
[Inductive Step k ≥ 1] Compare the outputs of the two
algorithms.
[S.I.] We need to
¯ (k)T AQ
¯ (k) . But this is Line 5 of S.I.X
• prove that A(k) = Q
¯ (k)
¯ (k) R
• prove that Ak = Q
π . Assume
¯ (k−1)
¯ (k−1) R
. (Inductive hypothesis.) Then
Ak−1 = Q
π
¯ (k−1)
¯ (k−1) R
Ak = AQ
π
¯ (k−1)
= ZR
π
Line 3 of SI algorithm
¯ (k) R
¯ (k) R
¯ (k−1)
=Q
π
Line 4 of SI algorithm
¯ (k) R
¯ (k) .X
=Q
π
&
%
MS4105
640
'
$
[QR algorithm ] We need to
• prove that Ak = Qπ (k) R(k)
π .
Inductive hypotheses: assume that
1. Ak−1 = Qπ (k−1) R(k−1)
π
2. A(k−1) = Qπ (k−1)T AQπ (k−1) .
Then
Ak = AQπ (k−1) R(k−1)
π
Inductive hypothesis 1
= Qπ (k−1) A(k−1) R(k−1)
π
= Qπ (k−1) Q(k) R(k) R(k−1)
π
Inductive hypothesis 2
Line 3 of QR alg.
= Qπ (k) R(k)
π .X
&
%
MS4105
641
'
$
• (We need to) prove that A(k) = Qπ (k)T AQπ (k) . We have
A(k) = R(k) Q(k)
Line 4 of QR alg.
= Q(k)T A(k−1) Q(k)
Line 3 of QR alg.
= Q(k)T Qπ (k−1)T AQπ (k−1) Q(k)
Inductive hypothesis 2
= Qπ (k)T AQπ (k) .X
Finally, we have proved that A^k = Q̄(k) R̄π(k) for the S.I. algorithm and that A^k = Qπ(k) Rπ(k) for the QR algorithm. But Q̄(k) and R̄(k) are QR factors (of Z) at each iteration of S.I. and so are respectively unitary and upper triangular. Also Q(k) and R(k) are QR factors (of A(k−1)) at each iteration of QR, so their products Qπ(k) and Rπ(k) are also respectively unitary and upper triangular. Therefore, by the uniqueness of the factors in a QR factorisation, Q̄(k) = Qπ(k) and R̄π(k) = Rπ(k). (The latter equality means that we also have R̄(k) = R(k) for all k.)
&
%
MS4105
642
'
&
$
%
MS4105
'
643
$
So both algorithms
• generate orthonormal bases for successive powers of A, i.e. they
generate eigenvectors of A
• generate eigenvalues of A as the diagonal elements of A(k) are
Rayleigh Quotients of A corresponding to the columns of
Qπ (k) .
As the columns of Qπ (k) converge to eigenvectors, the Rayleigh
Quotients converge to the corresponding eigenvalues as in (9.2).
Also, the off-diagonal elements of A(k) correspond to generalised
Rayleigh Quotients using different approximate eigenvectors on the
left and right. As these approximate eigenvectors must become
orthogonal as they converge to distinct eigenvectors, the
off-diagonal elements of A(k) → 0.
&
%
MS4105
'
644
$
We can summarise the results now established with a Theorem whose proof is implicit in the previous discussion.
Theorem I.3 Let the (unshifted) QR algorithm be applied to a real symmetric m × m matrix A (with |λ1| > |λ2| > · · · > |λm|) whose corresponding eigenvector matrix Q has all non-singular leading principal minors. Then as k → ∞, A(k) converges linearly with constant factor max_k |λ_{k+1}|/|λ_k| to diag(λ1, . . . , λm) and Qπ(k) (with the signs of the columns flipped as necessary) converges at the same rate to Q.
&
%
MS4105
'
645
$
We focus on the QR algorithm and how to make it more efficient. Thanks to the two Theorems we can drop the clumsy Q and R notation.
We repeat the algorithm here for convenience — writing Q(k) and Qπ(k) (and similarly R(k)) without the distinguishing bar from now on:
Algorithm I.4 Un-shifted QR Algorithm with extra line
(1) A(0) = A, Q(0) = I
(2) for k = 1, 2, . . .
(3)     Q(k) R(k) = A(k−1)    Compute the QR factorisation of A(k−1).
(4)     A(k) = R(k) Q(k)      Combine the factors in reverse order.
(5)     Qπ(k) = Qπ(k−1) Q(k) ≡ Q(1) . . . Q(k)
(6) end
&
%
MS4105
646
'
I.1.2
$
Two Technical Points
For the QR method to work, when we reverse the order in Line 4
we should check that the tri-diagonal property is preserved. In
other words that if a tri-diagonal matrix T has a QR factorisation
T = QR then the matrix RQ is also tri-diagonal. (Remember that
phase 1 of the process of finding the eigenvectors and eigenvalues of
a real symmetric matrix A consists of using Householder
Hessenberg Reduction (Alg. 8.1) to reduce A to a Hessenberg
matrix — and symmetric Hessenberg matrices are tri-diagonal.)
This is left as an exercise, you will need to prove the result by
induction.
Another point; for greater efficiency, given that the input matrix is
tri-diagonal, we should use 2 × 2 Householder reflections when
computing the QR factorisations in the QR algorithm — this
greatly increases the speed of the algorithm.
&
%
MS4105
647
'
I.2
$
QR Algorithm with Shifts
In this final section, we tweak the (very simple) un-shifted QR
Alg. I.4 and greatly improve its performance.
We have proved that the un-shifted “pure” QR algorithm is
equivalent to Simultaneous Iteration (S.I.) applied to the Identity
matrix. So in particular, the first column of the result iterates as if
the power method were applied to e1 , the first column of I.
Correspondingly, we claim that “pure” QR is also equivalent to
Simultaneous Inverse Iteration applied to a “flipped” Identity
matrix P (one whose columns have been permuted). In particular
we claim that the mth column of the result iterates as if the
Inverse Iteration method were applied to em .
&
%
MS4105
648
'
$
Justification of the Claim Let Q(k) as above be the
orthogonal factor at the kth step of the pure QR algorithm. We
saw that the accumulated product
Q(k)
π
≡
k
Y
Q
(j)
h
= q(k)
1
|
(k)
q2
| ...
(k)
|qm
i
j=1
is the same orthogonal matrix that appears at the kth step of SI.
(k)
We also saw that Qπ is the orthogonal matrix factor in a QR
(k)
factorisation of Ak , Ak = Qπ Rπ (k) .
Now invert this formula:
(k)−T
A−k = Rπ (k)−1 Q(k)−1
= Rπ (k)−1 Q(k)T
= Q(k)
π
π
π Rπ
(k)
as Qπ is orthogonal and A is taken to be real symmetric (and
tri-diagonalised).
&
%
MS4105
649
'
$
Let P be the m × m permutation matrix that reverses row and column order (check that you know its structure). As P² = I we can write

A^(−k) P = [Qπ(k) P] [P Rπ(k)−ᵀ P]    (I.2)

The first factor on the right is orthogonal. The second is upper triangular:
• start with Rπ(k)−ᵀ lower triangular,
• flip it top to bottom (reverse row order)
• flip it left to right (reverse column order)
(Draw a picture!)
&
%
MS4105
650
'
$
• So (I.2) can be interpreted as a QR factorisation of A−k P.
• We can re-interpret the QR algorithm as carrying out SI on
the matrix A−1 applied to the starting matrix P.
• In other words Simultaneous Inverse Iteration on A.
(k)
• In particular, the first column of Qπ P (the last column of
(k)
Qπ !) is the result of applying k steps of Inverse Iteration to
em .
&
%
MS4105
651
'
I.2.1
$
Connection with Shifted Inverse Iteration
So the pure QR algorithm is both SI and Simultaneous Inverse Iteration — a nice symmetry. But the big difference between the Power Method and Inverse Iteration is that the latter can be accelerated using shifts: the better the estimate µ ≈ λj, the more effective an Inverse Iteration step with the shifted matrix A − µI.
The “practical” QR algorithm on the next Slide shows how to introduce shifts into a step of the QR algorithm. Doing so corresponds exactly to shifts in the corresponding SI and Simultaneous Inverse Iteration — with the same positive effect.
Lines 4 and 5 are all there is to it! (Lines 6–10 implement “deflation”, essentially decoupling A(k) into two sub-matrices whenever an eigenvalue is found — corresponding to very small off-diagonal elements A(k)_{j,j+1} and A(k)_{j+1,j}.)
&
%
MS4105
'
652
$
Algorithm I.5 Shifted QR Algorithm
(1) A(0) = Q(0) A Q(0)ᵀ    A(0) is a tri-diagonalisation of A.
(2) for k = 1, 2, . . .
(3)     Pick a shift µ(k).
(4)     Q(k) R(k) = A(k−1) − µ(k) I    Compute the QR factorisation of A(k−1) − µ(k) I.
(5)     A(k) = R(k) Q(k) + µ(k) I      Combine in reverse order.
(6)     if any off-diagonal element A(k)_{j,j+1} is sufficiently close to zero
(7)     then
(8)         set A(k)_{j,j+1} = A(k)_{j+1,j} = 0
(9)         Apply the algorithm to the two sub-matrices formed.
(10)    end
(11) end
Back to Slide 573.
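A compact Matlab sketch of the idea (my own assumptions: A real symmetric and already tridiagonal, the simple shift µ(k) = A(m,m), a fixed tolerance, and deflation that just peels off the last diagonal entry rather than recursing as in Line 9):

A0 = toeplitz([2 -1 0 0 0]);     % a small symmetric tridiagonal test matrix
T = A0; m = size(T,1); lam = zeros(m,1);
while m > 1
    mu = T(m,m);                           % Rayleigh Quotient shift
    [Q,R] = qr(T - mu*eye(m));
    T = R*Q + mu*eye(m);
    if abs(T(m,m-1)) < 1e-12               % deflate: T(m,m) has converged
        lam(m) = T(m,m); T = T(1:m-1,1:m-1); m = m - 1;
    end
end
lam(1) = T(1,1);
norm(sort(lam) - sort(eig(A0)))            % should be ~0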
&
%
MS4105
653
'
$
Justification of the Shifted QR Algorithm We need to justify Lines 4 and 5. Let µ(k) be the eigenvalue estimate chosen at Line 3 of the algorithm. Examining Lines 4 and 5, the relationship between steps k − 1 and k of the shifted algorithm is:

A(k−1) − µ(k) I = Q(k) R(k)
A(k) = R(k) Q(k) + µ(k) I

So A(k) = Q(k)ᵀ A(k−1) Q(k). (Check that the µ(k) I’s cancel.)
And, by induction, A(k) = Qπ(k)ᵀ A Qπ(k) as in the unshifted algorithm. But it is no longer true that A^k = Q̄(k) R̄π(k). Instead we have the factorisation

(A − µ(k) I)(A − µ(k−1) I) . . . (A − µ(1) I) = Q̄(k) R̄π(k),

a shifted variation on S.I. (The proof is similar to that for the equivalence of S.I and QR.)
&
%
MS4105
654
'
$
From the discussion above re the permutation matrix P, the first column of Qπ(k) is the result of applying the shifted power method to e1 using the shifts µ(k), and the last column is the result of applying shifted inverse iteration to em with the same shifts. If the µ(k)’s are well chosen, the last column of Qπ(k) will converge quickly to an eigenvector.
Finally, we need a way to choose shifts to get fast convergence in the last column of Qπ(k). The obvious choice is the Rayleigh Quotient: to estimate the eigenvalue corresponding to the eigenvector approximated by the last column of Qπ(k), it is natural to apply the Rayleigh Quotient to this last column.
So we take

µ(k) = (qm(k)ᵀ A qm(k)) / (qm(k)ᵀ qm(k)) = qm(k)ᵀ A qm(k).
%
MS4105
655
'
$
If this value for the shift µ(k) is chosen at each iteration then the eigenvalue and eigenvector estimates µ(k) are the same as those chosen by the Rayleigh Quotient algorithm starting with em.
A convenient fact: the shifts are available at no extra cost as the (m, m) element of A(k). To check this:

A(k)_{m,m} = emᵀ A(k) em = emᵀ Qπ(k)ᵀ A Qπ(k) em = qm(k)ᵀ A qm(k) = r(qm(k)).

So the shifted QR algorithm with this natural choice for the shifts µ(k) has cubic convergence in the sense that qm(k) converges to an eigenvector cubically.
&
%
MS4105
656
'
$
Finally!!
(k)
• Why the obsession with the last column of Qπ ?
• What about the others?
• The answer is that the deflation trick decomposes A(k) every
time the off-diagonals get sufficiently small.
• So we recursively apply the shifted QR algorithm to smaller
and smaller matrices — the last column of each converges
cubically to an eigenvector.
Back to Slide 573.
&
%
MS4105
'
J
657
$
Solution to Ex. 4 in Exercises 5.1.5
• Assume A∗A is singular so there is a nonzero x s.t. A∗Ax = 0. If Ax = 0 then a linear combination of the columns of A is zero, contradicting A full rank. So y = Ax ≠ 0. We have A∗y = 0, i.e. y∗A = 0, so y∗y = y∗Ax = 0. So y = 0. Contradiction.
• Now assume that A∗A is invertible and {a1, . . . , an} not linearly independent. Then ∃ x ∈ Cⁿ, x ≠ 0, s.t. ∑ xi ai = 0, i.e. Ax = 0. Multiplying by A∗ gives A∗Ax = 0 and as A∗A is assumed invertible we have x = 0. Contradiction.
&
%
MS4105
658
'
K
$
Solution to Ex. 9 in Exercises 5.1.5
(a) • We have that P² = P so the eigenvalues of P are either 0 or 1.
• The norm of P is just ‖P‖ = sup_{x∈Cⁿ, x≠0} ‖Px‖/‖x‖.
• But if ψ is an eigenvector of P corresponding to λ = 1 then Pψ = ψ and so the ratio is 1 for x = ψ.
• Therefore

‖P‖ = sup_{x∈Cⁿ, x≠0} ‖Px‖/‖x‖ ≥ ‖Pψ‖/‖ψ‖ = 1.

• So ‖P‖ ≥ 1 as required.
&
%
MS4105
659
'
$
(b) Now suppose that P is an orthogonal projection operator so P∗ = P (and P² = P).
Then

‖P‖² = sup_{‖x‖=1} ‖Px‖² = sup_{‖x‖=1} x∗P∗Px = sup_{‖x‖=1} x∗P²x = sup_{‖x‖=1} x∗Px
     ≤ sup_{‖x‖=1} ‖x‖‖Px‖    (C.S. inequality)
     = sup_{‖x‖=1} ‖Px‖ = ‖P‖.

So ‖P‖² ≤ ‖P‖ but we know that ‖P‖ ≥ 1 so ‖P‖² ≥ ‖P‖. We must have ‖P‖ = 1.
&
%
MS4105
660
'
$
(c) Finally, assume that P² = P and that ‖P‖ = 1. RTP that P∗ = P. We have (Exercise 7 on Slide 262) that P∗v ∈ S̄2 for all v ∈ Cᵐ and P∗v = 0 for all v ∈ S̄1. RTP that S̄2 = S1.
(i) • Let u ∈ S̄2, u ≠ 0. Then u is orthogonal to (I − P)u.
• So ‖Pu‖² = ‖u − (I − P)u‖² = ‖u‖² + ‖(I − P)u‖² (by Pythagoras’ Theorem).
• Divide across by ‖u‖² (equivalent to setting ‖u‖ = 1).
• So ‖Pu‖² = 1 + ‖(I − P)u‖².
• But ‖P‖ = 1 so LHS ≤ 1. And RHS ≥ 1.
• So LHS = RHS = 1 and ‖(I − P)u‖² = 0.
• So Pu = u and so u ∈ S1.
• Therefore S̄2 ⊆ S1.
&
%
MS4105
'
661
$
(ii) • Now let u ∈ S1, ‖u‖ = 1. Then u is orthogonal to (I − P∗)u as P∗ is the projection onto S̄2 along S̄1.
• So ‖P∗u‖² = ‖u − (I − P∗)u‖² = ‖u‖² + ‖(I − P∗)u‖² (by Pythagoras’ Theorem).
• And ‖P∗u‖² = 1 + ‖(I − P∗)u‖².
• Again LHS ≤ 1 (as ‖P∗‖ = ‖P‖ by Exercise 8 on Slide 263). And RHS ≥ 1.
• So LHS = RHS = 1 and ‖(I − P∗)u‖² = 0.
• So P∗u = u and so u ∈ S̄2.
• Therefore S1 ⊆ S̄2.
• Therefore S1 = S̄2 (and S2 = S̄1).
• So S1 and S2 are orthogonal — so P = P∗.
&
%
MS4105
'
L
662
$
Hint for Ex. 5b in Exercises 5.2.7
If (say) R̂(i, i) = R̂(j, j) = 0 where 1 ≤ i < j < n then

a1 ∈ span{q1}   non-zero coefficient for q1
a2 ∈ span{q1, q2}   non-zero coefficient for q2
  ⋮
ai−1 ∈ span{q1, q2 . . . qi−1}   non-zero coefficient for qi−1
ai ∈ span{q1, q2 . . . qi−1}   (possibly zero coefficient for qi−1)

So rank [a1 a2 . . . ai] ≤ i − 1 (as {a1, a2, . . . , ai} is spanned by ≤ i − 1 linearly independent vectors).
&
%
MS4105
663
'
$
• Continuing; ai+1 ∈ span{q1 , q2 . . . qi+1 } non-zero
coefficient for qi+1 so itis linearly independent
of
a1 , a2 , . . . , ai and so rank [a1 a2 . . . ai+1 ]
≤ i (one extra
linearly independent column).
• Every subsequent vector ap where i + 2 ≤ p ≤ j − 1 is in
span{q1 , q2 . . . qp } non-zero coefficient for qp and so is
similarly linearly independent of {a1 , a2 , . . . , ap−1 }.
• Therefore the rank of the submatrix [a1 a2 . . . ap ] is just one
more than that of the submatrix [a1 a2 . . . ap−1 ].
• In other words, rank [a1 a2 . . . ap ] ≤ p − 1, where
i + 2 ≤ p ≤ j − 1.
• In particular rank [a1 a2 . . . aj−1 ]
&
≤ j − 2.
%
MS4105
664
'
$
• When we get to aj we have aj ∈ span{q
1 , q2 . . . qj−1 } possibly
zero coefficient for qj−1 so rank [a1 a2 . . . aj ]
≤ j − 1 (up
to j − 1 linearly independent vectors).
• In other words a second (or subsequent) zero diagonal element
^ does not necessarily reduce the rank of A further.
of R
^ has one or more zero diagonal
The conclusion; rank A ≤ n − 1 if R
elements.
&
%
MS4105
665
'
M
$
Proof of of Gerchgorin’s theorem in
Exercises 8.1.11
Let λ be any eigenvalue of A and x a corresponding eigenvector. So Ax = λx and in subscript notation:

∑_{j=1}^{n} aij xj = λ xi ,   i.e.   xi aii + ∑_{j≠i} aij xj = λ xi   for i = 1 . . . n.

Choose p so that |xp| ≥ |xi| for i = 1 . . . n. Then, taking i = p,

λ − app = ∑_{j≠p} apj (xj / xp)
%
MS4105
666
'
$
and

|λ − app| ≤ ∑_{j≠p} |apj|

by definition of p.
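The theorem is easy to illustrate in Matlab (my own quick check):

n = 6; A = randn(n) + 5*diag(randn(n,1));
R = sum(abs(A),2) - abs(diag(A));           % the disc radii R_i(A)
lam = eig(A);
ok = true;
for k = 1:n
    ok = ok && any(abs(lam(k) - diag(A)) <= R + 1e-12);   % lambda_k lies in some disc
end
ok                                          % should be logical 1 (true)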
&
%
MS4105
667
'
N
$
Proof of Extended Version of
Gerchgorin’s theorem in
Exercises 8.1.11
By Gerchgorin’s theorem all the eigenvalues of A are contained in the union Um of the m Gerchgorin discs Di:

Um = ∪_{i=1}^{m} Di ≡ ∪_{i=1}^{m} {z ∈ C : |z − aii| ≤ Ri(A)}.

We can write A = D + E where D is the diagonal part of A and E has zeroes on the main diagonal, and write Aε = D + εE. We will treat ε as a parameter that we can vary in the range 0 ≤ ε ≤ 1. Note that Ri(Aε) = Ri(εE) = εRi(A). It will be convenient to write Di(ε) ≡ Di(Aε).
Now we are told that k of the discs intersect to form a connected set Uk that does not intersect the other discs.
&
%
MS4105
'
668
$
• We can write Uk = ∪_{i=1}^{k} Di(1) and for any ε ∈ [0, 1] also write Uk(ε) = ∪_{i=1}^{k} Di(ε).
• The set Uk(1) is disjoint from the union of the remaining Gerchgorin discs, Vk say, where Vk ≡ Vk(1) = ∪_{i=k+1}^{m} Di(1).
• The set Uk(ε) is a subset of the set Uk(1) (see Fig 22) but for ε sufficiently small, Uk(ε) is not connected and Uk(0) is just the set of distinct points ∪_{i=1}^{k} {aii}.
• For each i = 1, . . . , k, the eigenvalues λi(A0) = λi(D) = aii.
• It is true in general that the eigenvalues of a matrix are continuous functions of the entries — so in particular the eigenvalues λi(Aε) are continuous functions of ε.
• All the λi(Aε) are contained in Uk(ε) by G’s Th.
• For ε sufficiently small the discs Di(ε) must be disjoint.
&
%
MS4105
'
669
$
• So by the continuity property above, as for ε = 0 the eigenvalues λi(A0) = aii, we must have that for ε sufficiently small each eigenvalue λi(Aε) remains in the disc Di(ε).
• As ε increases from 0 to 1, the discs Di(ε) eventually overlap.
• As ε increases from 0 to 1, each eigenvalue λi(Aε) moves along a continuous path (parameterised by ε) starting at aii and ending at λi(A1) ≡ λi(A) (see Fig 22).
• These continuous curves cannot leave Uk(1) to enter the union of the remaining Gerchgorin discs Vk(1), as Vk(1) is disjoint from Uk(1), so we conclude that Uk(1) contains k eigenvalues of A1 = A as claimed.
• Finally, using the same reasoning, none of the remaining eigenvalues can enter Uk(1).
&
%
MS4105
670
'
$
Figure 22: Gerchgorin discs (showing the discs Di(1) of Uk(1), the shrunken discs Di(ε) containing the aii, the eigenvalues λi(A), and the disjoint union Vk(1) of the remaining discs).
%
MS4105
671
'
O
$
Backward Stability of Pivot-Free
Gauss Elimination
In this brief Appendix we formally define for reference the ideas of
stability and backward stability. These ideas were used informally
in Section 7.1.6.
• The system of floating point numbers F is a discrete subset of R such that for all x ∈ R, there exists x′ ∈ F s.t. |x − x′| ≤ εM|x|, where εM (machine epsilon) is “the smallest number in F greater than zero that can be distinguished from zero”.
• In other words

∀x ∈ R, ∃ε s.t. |ε| ≤ εM   and   fl(x) = x(1 + ε).    (O.1)
(O.1)
%
MS4105
672
'
$
• The parameter εM can be estimated by executing the code:
Algorithm O.1 Find Machine Epsilon
(1) εM = 1
(2) while 1 + εM > 1
(3) begin
(4)     εM = εM/2
(5) end
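Running this loop in Matlab (an illustration, not part of the formal definition) gives 2^-53 ≈ 1.11e-16 in IEEE double precision, which agrees with the built-in quantity eps/2:

epsM = 1;
while 1 + epsM > 1
    epsM = epsM/2;
end
[epsM eps/2]     % both approximately 1.1102e-16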
• Let fl : R → F be the operation required to approximately
˜.
represent a real number x as a floating point number x
• The rule
x ~ y = fl(x ∗ y)
is commonly implemented in modern computer hardware.
&
%
MS4105
673
'
$
• A consequence: the “Fundamental Axiom of Floating Point Arithmetic”; for any flop (floating point operation) ~ corresponding to a binary operation ∗ and for all x, y ∈ F there exists a constant ε with |ε| ≤ εM such that

x ~ y = (x ∗ y)(1 + ε)    (O.2)

• Any mathematical problem can be viewed as a function f : X → Y from a vector space X of data to another vector space Y.
• An algorithm can be viewed as a different map f̃ : X → Y.
• The absolute error of a computation is ‖f̃(x) − f(x)‖.
• The relative error of a computation is ‖f̃(x) − f(x)‖ / ‖f(x)‖.
&
%
MS4105
674
'
$
An algorithm f̃ for a problem f is accurate if the relative error is O(εM), i.e. if for each x ∈ X,

‖f̃(x) − f(x)‖ / ‖f(x)‖ = O(εM).

• The goal of accuracy as defined here is often unattainable if the problem is ill-conditioned (very sensitive to small changes in the data). Roundoff will inevitably perturb the data.
• A more useful and attainable criterion to aspire to is stability. We say that an algorithm f̃ for a problem f is stable if for each x ∈ X,

‖f̃(x) − f(x̃)‖ / ‖f(x̃)‖ = O(εM)    (O.3)

for some x̃ ∈ X with ‖x − x̃‖/‖x‖ = O(εM).
• In words: a stable algorithm “gives nearly the right answer to nearly the right question”.
&
%
MS4105
675
'
$
• A stronger condition is satisfied by some (but not all)
algorithms in numerical linear algebra.
• We say that an algorithm f̃ for a problem f is backward stable if for each x ∈ X,

f̃(x) = f(x̃)   for some x̃ ∈ X with ‖x − x̃‖/‖x‖ = O(εM).    (O.4)

• This is a considerable tightening of the definition of stability as the O(εM) in (O.3) has been replaced by zero.
• In words: a backward stable algorithm “gives exactly the right answer to nearly the right question”.
&
%
MS4105
676
'
P
$
Instability of Polynomial
Root-Finding in Section 8.2
Theorem P.1 If p is a polynomial p(x) = ∑_{i=0}^{n} ai x^i and r is one of the roots then, if we make a small change δak in the kth coefficient ak, the first-order change δr in the value of the root r is

δr = − (r^k / p′(r)) δak.    (P.1)

Also the condition number of the problem (the ratio of the relative error in r to the relative error in ak) is given by

κ ≡ (|δr|/|r|) / (|δak|/|ak|) = |ak r^{k−1}| / |p′(r)|.    (P.2)
%
MS4105
677
'
$
Proof: The polynomial p depends on both the coefficients a and the argument x so we can write p(r, a) = 0 as r is a root.
But p(r + δr, ak + δak) is still zero (giving an implicit equation for δr). We can find an approximate value for δr using a (first-order) Taylor series expansion:

0 = δp ≈ (∂p(r)/∂r) δr + (∂p(r)/∂ak) δak = p′(r) δr + r^k δak.

Solving for δr gives the first result and the second follows immediately.
The factor −r^k/p′(r) in (P.1) can be large when |r| is large or when p′(r) is close to zero, or both. A similar comment may be made about the condition number (P.2).
&
%
MS4105
'
678
$
If we are particularly unfortunate and the polynomial p has a double root (r, say), then the situation is even worse — the change in the root is of the order of the square root of the change in the coefficient ak. So even if the roundoff error in ak is δak = O(εM) (machine epsilon, typically ≈ 10⁻¹⁶), the resulting error δr in r could be as large as 10⁻⁸.
We can state this as a theorem.
Theorem P.2 If p is a polynomial p(x) = ∑_{i=0}^{n} ai x^i and r is a double root then, if we make a small change δak in the kth coefficient ak, the resulting change δr in the value of the root r is

δr = O(√|δak|).    (P.3)
&
%
MS4105
679
'
$
Proof:
We still have an implicit equation for δr: p(r + δr, ak + δak) = 0. We have p(r) = p′(r) = 0. We can find an approximate value for δr again, but now we need a second-order Taylor series expansion:

0 = δp ≈ (∂p(r)/∂r) δr + (∂p(r)/∂ak) δak + (1/2)(∂²p(r)/∂r²) δr² + (∂²p(r)/∂r∂ak) δr δak
       = r^k δak + (1/2) p″(r) δr² + k r^{k−1} δr δak.

(Note that there is no ∂²p(r)/∂ak² term as p is linear in the coefficients ak.)
The second equation is a quadratic in δr and we can solve it for δr, giving:

δr = ( −k r^{k−1} δak ± √( (k r^{k−1} δak)² − 2 r^k p″(r) δak ) ) / p″(r).    (P.4)
&
%
MS4105
'
680
$
Recalling that δak is a small (≪ 1) modification in ak, we can see that the dominant term in δr is O(√|δak|), as claimed.
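A short Matlab illustration of Theorem P.2 (my own example): perturbing the constant coefficient of p(x) = (x − 1)² = x² − 2x + 1 by δ moves the double root r = 1 by about √δ, not δ.

delta = 1e-8;
r = roots([1 -2 1+delta]);     % roots of x^2 - 2x + (1 + delta)
abs(r - 1)                     % both ~1e-4 = sqrt(delta), not ~1e-8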
&
%
MS4105
681
'
Q
$
Solution to Problem 2 on Slide 279
The Answer Every second element of R is zero: R12 = R14 = · · · = 0, R23 = R25 = · · · = 0 and, in general, Rij = 0 when i is even and j odd or vice versa.
The Proof We have

A = [u1 v1 u2 v2 . . . un vn],   Aik = (QR)ik = Qij Rjk ,   j ≤ k,

where each of the column vectors v1, . . . , vn is orthogonal to all the column vectors u1, . . . , un and A is m × 2n, still “tall and thin”, i.e. 2n ≤ m.
By definition, up = a_{2p−1}, the (2p − 1)th column of A, and vq = a_{2q}, the (2q)th column of A. So

u_{ip} ≡ a_{i,2p−1} = Qij R_{j,2p−1} ,   j ≤ 2p − 1
v_{iq} ≡ a_{i,2q} = Qij R_{j,2q} ,   j ≤ 2q

and so

up = R_{j,2p−1} qj ,   j ≤ 2p − 1
vq = R_{j,2q} qj ,   j ≤ 2q.

We know that up∗ vq = 0 for all p = 1, . . . , n and q = 1, . . . , n so:

0 = up∗ vq = R_{j,2p−1} R_{i,2q} qj∗ qi ,   j ≤ 2p − 1, i ≤ 2q
           = R_{i,2p−1} R_{i,2q} ,          i ≤ 2p − 1, i ≤ 2q    (Q.1)

where the vectors qj are the jth columns of the unitary matrix Q.
&
%
MS4105
'
683
$
RTP that Ri,2p−1 = 0 for all even i and Ri,2q = 0 for all odd i.
Prove this by induction on p and q. Note that we can take Rkk 6= 0
for k = 1, . . . , r where r is the rank of R, in this case r = 2n. In
other works we assume that R is full rank.
[Base case] Either of p or q equal to one.
[p = 1] So i ≤ 2p − 1 = 1. Then (Q.1) gives R11 R1,2q = 0 so
R1,2q = 0 for all q = 1, . . . , n.
[q = 1] We have 2q = 2 so i ≤ 2 and so
R1,2p−1 R12 + R2,2p−1 R22 = 0. The first term is zero as
R1,2q = 0 for all q so as the diagonal terms are assumed
non-zero we must have R2,2p−1 = 0 for all p.
&
%
MS4105
'
684
$
[Induction Step] Assume that Ri,2p−1 = 0 for all even i ≤ 2k − 1
and Ri,2q = 0 for all odd i ≤ 2k and RTP that R2k+2,2p−1 = 0
and R2k+1,2q = 0.
[Let p = k + 1] Then i ≤ 2k + 1 and also i ≤ 2q. So (Q.1)
gives 0 = R1,2k+1 R1,2q + R2,2k+1 R2,2q + · · · +
R2k,2k+1 R2k,2q + R2k+1,2k+1 R2k+1,2q . But the first and
second factor in each term are alternately zero by the
inductive hypothesis (R1,2q = 0, R2,2k+1 = 0, . . . ,
R2k,2k+1 ). So we conclude that the last term must be zero
and so R2k+1,2q = 0.
&
%
MS4105
685
'
$
[Let q = k + 1] Then 2q = 2k + 2 and i ≤ 2k + 2.
So (Q.1) gives 0 = R1,2p−1 R1,2k+2 + R2,2p−1 R2,2k+2 + · · · +
R2k+1,2p−1 R2k+1,2k+2 + R2k+2,2p−1 R2k+2,2k+2 . Again, by the
inductive hypothesis, the first and second factors in each term
are alternately zero: R1,2k+2 = 0, R2,2p−1 = 0, . . . ,
R2k+1,2k+2 =) so the last term must be zero; R2k+2,2p−1 = 0
as required.
By the Principle of Induction, the result follows.
&
%
MS4105
'
R
686
$
Convergence of Fourier Series
(Back to Slide 136.)
If f(t) is a periodic function with period 2π that is continuous on the interval (−π, π) except at a finite number of points — and if the one-sided limits exist at each point of discontinuity as well as at the end points −π and π — then the Fourier series F(t) converges to f(t) at each t in (−π, π) where f is continuous. If f is discontinuous at t0 but possesses left-hand and right-hand derivatives at t0, then F(t0) converges to the average value

F(t0) = (1/2)(f(t0⁻) + f(t0⁺)),

where f(t0⁻) and f(t0⁺) are the left and right limits at t0 respectively.
&
%
MS4105
'
S
687
$
Example of Instability of Classical GS
(Back to Slide 273.)
Now apply the CGS algorithm for j = 1, 2, 3. Initialise vi = ai, i = 1, 2, 3.
[j = 1] r11 ≡ fl(‖v1‖) = fl(√(1 + 2·10⁻⁶)) = 1. So q1 = v1.
[j = 2] r12 ≡ fl(q1∗ a2) = fl(1 + 10⁻⁶) = 1. So

v2 ← v2 − 1·q1 = (0, 0, −10⁻³)ᵀ   and   r22 ≡ fl(‖v2‖) = 10⁻³.

Normalise: q2 = fl(v2/ fl(‖v2‖)) = (0, 0, −1)ᵀ.
&
%
MS4105
688
'
$
[j = 3] r13 ≡ fl(q∗1 v3 ) = 1 and r23 ≡ fl(q∗2 v3 ) = −10−3 . So reading
the for loop as a running sum,



  


1
0
0
1



  


−3 
−3
−3






v3 ←  0  − 1 10  + 10  0  = −10 
 and
10−3
−1
−10−3
10−3
√
r33 = fl(kv3 k) = fl( 2.10−3 ).
Normalise:

√
0




q3 ← fl(v3 / fl(kv3 k)) = fl(v3 / fl( 2.10 )) = −0.709
.
−0.709
 




0
0
1

 



−3





So q1 = 10 , q2 =  0  and q3 = −0.709
.
−1
−0.709
10−3
&
−3
%
MS4105
689
'
T
$
Example of Stability of Modified GS
(Back to Slide 291.)
• Initialise vi = ai, i = 1, 2, 3.
[j = 1] r11 ≡ fl(‖v1‖) = fl(√(1 + 2·10⁻⁶)) = 1. So

q1 = v1 = (1, 10⁻³, 10⁻³)ᵀ.

[j = 2] r12 ≡ fl(q1∗ v2) = 1. So

v2 = v2 − r12 q1 = (1, 10⁻³, 0)ᵀ − 1·(1, 10⁻³, 10⁻³)ᵀ = (0, 0, −10⁻³)ᵀ.

Normalising, r22 ≡ fl(‖v2‖) = 10⁻³ and
&
%
MS4105
690
'
$

q2 = fl(v2/r22) = (0, 0, −1)ᵀ.

[j = 3] r13 ≡ fl(q1∗ v3) = 1. So v3 = v3 − r13 q1 = (0, −10⁻³, 0)ᵀ.
Now r23 ≡ q2∗ v3 = 0, so v3 is unchanged. Normalising, r33 ≡ fl(‖v3‖) = 10⁻³ so q3 = v3/r33 = (0, −1, 0)ᵀ.
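The two examples can be compared side by side in Matlab (my own sketch; I use ε = 10⁻⁸ so that ε² falls below double-precision roundoff, playing the role that 10⁻³ plays in three-digit arithmetic):

e = 1e-8;
A = [1 1 1; e e 0; e 0 e];
Qc = zeros(3); Qm = zeros(3); V = A;
for j = 1:3                                  % classical GS
    v = A(:,j);
    for i = 1:j-1, v = v - (Qc(:,i)'*A(:,j))*Qc(:,i); end   % coefficients from the original a_j
    Qc(:,j) = v/norm(v);
end
for j = 1:3                                  % modified GS
    v = V(:,j);
    for i = 1:j-1, v = v - (Qm(:,i)'*v)*Qm(:,i); end        % coefficients from the updated v
    Qm(:,j) = v/norm(v);
end
[norm(Qc'*Qc - eye(3))  norm(Qm'*Qm - eye(3))]   % CGS loses orthogonality, MGS does not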
&
%
MS4105
691
'
U
$
Proof of Unit Roundoff Formula
Assume that x > 0. Then writing the real number x in the form

x = µ × β^{e−t} ,   β^{t−1} ≤ µ ≤ β^t ,    (U.1)

obviously x lies between the immediately adjacent f.p. numbers y1 = ⌊µ⌋β^{e−t} and y2 = ⌈µ⌉β^{e−t} (strictly speaking, if ⌈µ⌉ = β^t then y2 = (⌈µ⌉/β)β^{e−t+1}).
So fl(x) = y1 or y2 and I have

|fl(x) − x| ≤ |y1 − y2|/2 ≤ β^{e−t}/2.    (U.2)

So

|fl(x) − x|/x ≤ (β^{e−t}/2)/(µ × β^{e−t}) ≤ (1/2)β^{1−t} = u.    (U.3)

(Back to Slide 337.)
(U.3)
%
MS4105
'
V
692
$
Example to Illustrate the Stability of
the Householder QR Algorithm
(Back to Slide 334.)




[k = 1] x = (1, 10⁻³, 10⁻³)ᵀ and fl(‖x‖) = 1, v1 = (2, 10⁻³, 10⁻³)ᵀ and fl(v1∗ v1) = 4.0.
&
%
MS4105
693
'
$
Now update A ← A − (2/4.0) v1 (v1∗ A):

A ← [1  1  1; 1.0 10⁻³  1.0 10⁻³  0; 1.0 10⁻³  0  1.0 10⁻³]
      − (2/4.0) (2.0, 1.0 10⁻³, 1.0 10⁻³)ᵀ (2.0, 1.0 10⁻³, 1.0 10⁻³) [1  1  1; 1.0 10⁻³  1.0 10⁻³  0; 1.0 10⁻³  0  1.0 10⁻³]
  = [1  1  1; 1.0 10⁻³  1.0 10⁻³  0; 1.0 10⁻³  0  1.0 10⁻³] − [2  2  2; 1.0 10⁻³  1.0 10⁻³  1.0 10⁻³; 1.0 10⁻³  1.0 10⁻³  1.0 10⁻³].

So

A ← [−1  −1  −1; 0  0  −1.0 10⁻³; 0  −1.0 10⁻³  0].
%
MS4105
695
'
$


[k = 2] Now x = A(2 : 3, 2) = (0, −1.0 10⁻³)ᵀ and fl(‖x‖) = 1.0 10⁻³.
• So v2 = (1.0 10⁻³, −1.0 10⁻³)ᵀ (taking sign(x1) = 1).
• Also fl(v2∗ v2) = 2.0 10⁻⁶.
&
%
MS4105
696
'
$
• Now update the lower right 2 × 2 block of A:

A(2 : 3, 2 : 3) = A(2 : 3, 2 : 3) − (2/(2.0 10⁻⁶)) (1.0 10⁻³, −1.0 10⁻³)ᵀ (1.0 10⁻³, −1.0 10⁻³) [0  −1.0 10⁻³; −1.0 10⁻³  0]
               = [−1.0 10⁻³  0; 0  −1.0 10⁻³]

(all in three digit f.p. arithmetic)
%
MS4105
697
'
$
• So now (after two iterations), A is updated to:

[−1  −1  −1; 0  −1.0 10⁻³  0; 0  0  −1.0 10⁻³]

In fact, a slightly improved version of the algorithm sets the subdiagonals to zero and calculates the diagonal terms using ±‖x‖ at each iteration, but I’ll ignore this complication here.
&
%
MS4105
'
698
$
[k = 3] The matrix A is now upper triangular (in three digit f.p. arithmetic) but
• as noted on Slide 328
  – I need to either do one more iteration (a little more work but I’ll take this option)
  – or I could just define v3 = 1 (easier).
• I have x = A(3, 3) = −1.0 10⁻³ and fl(‖x‖) = 1.0 10⁻³.
• So v3 = −2.0 10⁻³ and fl(v3∗ v3) = 4.0 10⁻⁶.
• Finally: A(3, 3) ← A(3, 3) − (2/(4.0 10⁻⁶))(−2.0 10⁻³)(−2.0 10⁻³)(−1.0 10⁻³) = 1.0 10⁻³.
• So the final version of A (the upper triangular matrix R) is

R = [−1  −1  −1; 0  −1.0 10⁻³  0; 0  0  1.0 10⁻³]
&
%
MS4105
'
699
$
• The vectors v1, v2, v3 can be stored as the lower triangular part of

V = [2.0  0  0; 1.0 10⁻³  1.0 10⁻³  0; 1.0 10⁻³  −1.0 10⁻³  −2.0 10⁻³].
&
%