Course Notes for MS4105 Linear Algebra 2
J. Kinsella
October 20, 2014

Contents

I Linear Algebra

1 Vector Spaces
  1.1 Notation
    1.1.1 General Vector Notation
    1.1.2 Notation for Vectors in Rn
  1.2 Definitions
    1.2.1 Exercises
  1.3 Subspaces
    1.3.1 Exercises
  1.4 Linear Independence
    1.4.1 Exercises
  1.5 Basis and Dimension
    1.5.1 Exercises

2 Inner Product Spaces
  2.1 Inner Products
    2.1.1 Exercises
    2.1.2 Length and Distance in Inner Product Spaces
    2.1.3 Unit Sphere in Inner Product Spaces
  2.2 Angles and Orthogonality in Inner Product Spaces
    2.2.1 Exercises
  2.3 Orthonormal Bases
    2.3.1 Calculating Orthonormal Bases
    2.3.2 Exercises

3 Complex Vector and Inner Product Spaces
  3.1 Complex Vector Spaces
  3.2 Complex Inner Product Spaces
    3.2.1 Properties of the Complex Euclidean inner product
    3.2.2 Orthogonal Sets
    3.2.3 Exercises

II Matrix Algebra

4 Matrices and Vectors
  4.1 Properties of Matrices
    4.1.1 Range and Nullspace
    4.1.2 Rank
    4.1.3 Inverse
    4.1.4 Matrix Inverse Times a Vector
  4.2 Orthogonal Vectors and Matrices
    4.2.1 Inner Product on Cn
    4.2.2 Orthogonal vectors
    4.2.3 Unitary Matrices
    4.2.4 Multiplication by a Unitary Matrix
    4.2.5 A Note on the Unitary Property
    4.2.6 Exercises
  4.3 Norms
    4.3.1 Vector Norms
    4.3.2 Inner Product based on p-norms on Rn/Cn
    4.3.3 Unit Spheres
    4.3.4 Matrix Norms Induced by Vector Norms

5 QR Factorisation and Least Squares
  5.1 Projection Operators
    5.1.1 Orthogonal Projection Operators
    5.1.2 Projection with an Orthonormal Basis
    5.1.3 Orthogonal Projections with an Arbitrary Basis
    5.1.4 Oblique (Non-Orthogonal) Projections
    5.1.5 Exercises
  5.2 QR Factorisation
    5.2.1 Reduced QR Factorisation
    5.2.2 Full QR factorisation
    5.2.3 Gram-Schmidt Orthogonalisation
    5.2.4 Instability of Classical G-S Algorithm
    5.2.5 Existence and Uniqueness
    5.2.6 Solution of Ax = b by the QR factorisation
    5.2.7 Exercises
  5.3 Gram-Schmidt Orthogonalisation
    5.3.1 Modified Gram-Schmidt Algorithm
    5.3.2 Example to Illustrate the “Stability” of MGS
    5.3.3 A Useful Trick
    5.3.4 Operation Count
    5.3.5 Gram-Schmidt as Triangular Orthogonalisation
    5.3.6 Exercises
  5.4 Householder Transformations
    5.4.1 Householder and Gram Schmidt
    5.4.2 Triangularising by Introducing Zeroes
    5.4.3 Householder Reflections
    5.4.4 How is Q to be calculated?
    5.4.5 Example to Illustrate the Stability of the Householder QR Algorithm
  5.5 Why is Householder QR So Stable?
    5.5.1 Operation Count
  5.6 Least Squares Problems
    5.6.1 Example: Polynomial Data-fitting
    5.6.2 Orthogonal Projection and the Normal Equations
    5.6.3 Pseudoinverse
    5.6.4 Solving the Normal Equations
  5.7 Project

6 The Singular Value Decomposition
  6.1 Existence of SVD for m × n Matrices
    6.1.1 Some Simple Properties of the SVD
    6.1.2 Exercises
  6.2 Uniqueness of SVD
    6.2.1 Uniqueness of U and V
    6.2.2 Exercises
  6.3 Naive method for computing SVD
  6.4 Significance of SVD
    6.4.1 Changing Bases
    6.4.2 SVD vs. Eigenvalue Decomposition
    6.4.3 Matrix Properties via the SVD
    6.4.4 Low-Rank Approximations
    6.4.5 Application of Low-Rank Approximations
  6.5 Computing the SVD
    6.5.1 Exercises

7 Solving Systems of Equations
  7.1 Gaussian Elimination
    7.1.1 LU Factorisation
    7.1.2 Example
    7.1.3 General Formulas for Gaussian Elimination
    7.1.4 Operation Count
    7.1.5 Solution of Ax = b by LU factorisation
    7.1.6 Instability of Gaussian Elimination without Pivoting
    7.1.7 Exercises
  7.2 Gaussian Elimination with Pivoting
    7.2.1 A Note on Permutations
    7.2.2 Pivots
    7.2.3 Partial pivoting
    7.2.4 Example
    7.2.5 PA = LU Factorisation
    7.2.6 Details of Li to Li′ Transformation
    7.2.7 Stability of GE
    7.2.8 Exercises

8 Finding the Eigenvalues of Matrices
  8.1 Eigenvalue Problems
    8.1.1 Eigenvalue Decomposition
    8.1.2 Geometric Multiplicity
    8.1.3 Characteristic Polynomial
    8.1.4 Algebraic Multiplicity
    8.1.5 Similarity Transformations
    8.1.6 Defective Eigenvalues and Matrices
    8.1.7 Diagonalisability
    8.1.8 Determinant and Trace
    8.1.9 Unitary Diagonalisation
    8.1.10 Schur Factorisation
    8.1.11 Exercises
  8.2 Computing Eigenvalues — an Introduction
    8.2.1 Using the Characteristic Polynomial
    8.2.2 An Alternative Method for Eigenvalue Computation
    8.2.3 Reducing A to Hessenberg Form — the “Obvious” Method
    8.2.4 Reducing A to Hessenberg Form — a Better Method
    8.2.5 Operation Count
    8.2.6 Exercises

9 The QR Algorithm
  9.1 The Power Method
  9.2 Inverse Iteration
  9.3 Rayleigh Quotient Iteration
  9.4 The Un-Shifted QR Algorithm

10 Calculating the SVD
  10.1 An alternative (Impractical) Method for the SVD
    10.1.1 Exercises
  10.2 The Two-Phase Method
    10.2.1 Exercises

III Supplementary Material

A Index Notation and an Alternative Proof for Lemma 1.10
B Proof that under-determined homogeneous linear systems have a non-trivial solution
C Proof of the Jordan von Neumann Lemma for a real inner product space
D Matlab Code for (Very Naive) SVD algorithm
E Matlab Code for simple SVD algorithm
F Example SVD calculation
G Uniqueness of U and V in S.V.D.
H Oblique Projection Operators — the details
I Detailed Discussion of the QR Algorithm
  I.1 Simultaneous Power Method
    I.1.1 A Normalised version of Simultaneous Iteration
    I.1.2 Two Technical Points
  I.2 QR Algorithm with Shifts
    I.2.1 Connection with Shifted Inverse Iteration
J Solution to Ex. 4 in Exercises 5.1.5
K Solution to Ex. 9 in Exercises 5.1.5
L Hint for Ex. 5b in Exercises 5.2.7
M Proof of Gershgorin’s theorem in Exercises 8.1.11
N Proof of Extended Version of Gershgorin’s theorem in Exercises 8.1.11
O Backward Stability of Pivot-Free Gauss Elimination
P Instability of Polynomial Root-Finding in Section 8.2
Q Solution to Problem 2 on Slide 279
R Convergence of Fourier Series
S Example of Instability of Classical GS
T Example of Stability of Modified GS
U Proof of Unit Roundoff Formula
V Example to Illustrate the Stability of the Householder QR Algorithm

About the Course

• Lecture times
  – (Week 1)
    ∗ Monday 15:00 A1–052 LECTURE
    ∗ Wednesday 09:00 SG–17 LECTURE
  – (Week 2 and subsequently)
    ∗ Monday 15:00 A1–052 LECTURE
    ∗ Wednesday 09:00 SG–15 LECTURE
    ∗ Thursday 10:00 KBG–11 TUTORIAL (3B)
    ∗ Thursday 17:00 B1–005A TUTORIAL (3A)
• Office hours: Mondays & Wednesdays 16:00, B3–043.
• These notes are available at http://jkcray.maths.ul.ie/ms4105/Slides.pdf

• The main reference text for the course is “Numerical Linear Algebra” by Lloyd Trefethen and David Bau (shelved at 512.5), on which much of Part II is based.
• To review the basics of Linear Algebra see “Elementary Linear Algebra” by H. Anton (shelved at 512.5).

• The Notes are divided into two Parts which are in turn divided into Chapters and Sections.
• The first Part of the course is devoted to Linear Algebra: Vector Spaces and Inner Product Spaces.
• The second Part focuses on the broad topic of Matrix Algebra — essentially Applied Linear Algebra.
• Usually only some of the topics in the Notes will be covered in class — in particular I expect to have to leave out a lot of material from Chapters 8, 9 & 10 — of course when students are particularly clever anything is possible . . .

• There are Exercises at the end of each Section — you will usually be asked to attempt one or more before the next tutorial.
• There are also statements made in the notes that you are asked to check.
• By the end of the semester you should aim — maybe in cooperation with classmates — to have written out the answers to most/all Exercises and check problems.
• A record of attendance at lectures and tutorials will be kept.

• There will be a mid-semester written test for 10% after Part I is completed, covering that first part of the course.
• This test will be held during a tutorial class — you will be given advance notice!
• There will be a Matlab/Octave programming assignment — also for 10%.
• The Matlab/Octave project description will appear at http://jkcray.maths.ul.ie/ms4105/Project2014.pdf
• This assignment will be given in class around Week 9 and you will be given a week to complete it.
• There will be an end-of-semester written examination for 80% of the marks for the course.

Part I
Linear Algebra

In Linear Algebra 1, many of the following topics were covered:
• Systems of linear equations and their solution by an elimination method.
• Matrices: matrix algebra, determinants, inverses, methods for small matrices, extension to larger matrices.
• Vectors in 2 and 3 dimensions: geometric representation of vectors, vector arithmetic, norm, scalar product, angle, orthogonality, projections, cross product and its uses in the study of lines and planes in 3-space.
• Extension to vectors in n dimensions: vector algebra, scalar product, orthogonality, projections, bases in R2, R3 and Rn.
• Matrices acting on vectors, eigenvalues and eigenvectors, particularly in 2 and 3 dimensions.
• Applications such as least-squares fits to data.

• This first Part (Part I) extends these familiar ideas.
• I will be generalising these ideas from R2, R3 and Rn to general vector spaces and inner product spaces.
• This will allow us to use geometric ideas (length, distance, angle etc.) and results (e.g. the Cauchy-Schwarz inequality) in many useful and unexpected contexts.

1 Vector Spaces

• I begin by reviewing the idea of a Vector Space.
• Rn and Cn are the most important examples.
• I will show that other important mathematical systems are also vector spaces, e.g. sets of matrices or functions.
  – Because they satisfy the same set of “rules” or axioms.
• Studying vector spaces will give us results which will automatically hold for vectors in Rn and Cn.
• We already know most of these results for vectors in Rn and Cn.
• But they will also hold for the more complicated situations where “geometrical intuition” is less useful.
• This is the real benefit of studying vector spaces.

1.1 Notation

First some (important) points on notation.

1.1.1 General Vector Notation

A variety of notations are used for vectors, both the familiar vectors in R2, R3 and Rn and abstract vectors in vector spaces.
• The clearest notation in printed material (as in these slides) is to write
  – vectors u, v, w, a, b, c, 0 in a bold font and
  – scalars (numbers) α, β, γ, 1, 2, 3 in a normal (thin) font.
• A widely used convention is to use Roman letters u, v, w for vectors and Greek letters α, β, γ for scalars (numbers).
• When writing vectors by hand, they are often
  – underlined, or
  – written with an arrow over the symbol.
• In all cases the purpose is to differentiate vectors u, v, w from scalars (numbers) α, β, γ.
• Very often in these notes, instead of writing the product of the scalar (number) α and the vector u with the vector in bold, I will simply write αu, as it will be clear from the context that α is a scalar (number) and u is a vector.
• The Roman/Greek convention often makes the use of bold fonts/underlining/arrows unnecessary.
• The purpose of underlining/arrows for vectors in handwritten material is clarity — use them when necessary to make your intended meaning clear.
• In each of the following letter pairs check whether the letters used stand for vectors or scalars (numbers) — N.B. some of them are deliberately confusing but you should still be able to “decode” them:
  α u,  ξ u,  γ u,  α v (written with an arrow over v),  β w,  a z,  z a

1.1.2 Notation for Vectors in Rn

It is normal in Linear Algebra to write vectors in Rn as row vectors, e.g. (1, 2, 3). I will find it useful later in the module when studying Matrix Algebra to adopt the convention that all vectors in Rn are column vectors. For that reason I will usually write vectors in R2, R3 and Rn as row vectors with a superscript “T” indicating that I am taking the transpose — turning the row into a column. So

  (1, 2, 3)^T ≡ [ 1 ]
                [ 2 ]
                [ 3 ]

The version on the left takes up less space on the page/screen.

1.2 Definitions

Definition 1.1 A vector space is a non-empty set V together with two operations: addition and multiplication by a scalar.
• given u, v ∈ V, write the sum as u + v.
• given u ∈ V and α ∈ R (or C), write the scalar multiple of u by α as αu.
• The addition and scalar multiplication operations must satisfy a set of rules or axioms (based on the properties of R, R2, R3 and Rn), the Vector Space Axioms in Definition 1.2.

• The word “sum” and the + symbol as used above don’t have to denote the normal process of addition, even if u and v are numbers.
• I could decide to use an eccentric definition like u + v ≡ √(uv) if u and v were positive real numbers.
• Or I could “define” αu ≡ sin(αu), again if u is a number.
• These weird definitions won’t work as some of the axioms in Definition 1.2 below are not satisfied.
• But some “weird” definitions of addition and scalar multiplication do satisfy the axioms in Definition 1.2.

Example 1.1 (Strange Example) Here’s an even stranger candidate: let V be the positive real numbers R+, where 1 is the “zero vector”, “scalar multiplication” is really numerical exponentiation, and “addition” is really numerical multiplication. In other words, x + y ≡ xy and αx ≡ x^α. (Note vector space notation on the left hand side and ordinary algebraic notation on the right.)
• Is this combination of a set V and the operations of addition and scalar multiplication a vector space?
• To answer the question we need to list the defining rules/axioms for a vector space.
• If the following rules or axioms are satisfied for all u, v, w ∈ V and for all α, β ∈ R (or C), then V is called a vector space and the elements of V are called vectors.

Definition 1.2 (Vector Space Axioms) A non-empty set V together with two operations, addition and multiplication by a scalar, is a vector space if the following 10 axioms are satisfied:
1. If u, v ∈ V, then u + v ∈ V. (V is closed under addition.)
2. u + v = v + u. (Addition is commutative.)
3. u + (v + w) = (u + v) + w. (Addition is associative.)
4. There exists a special vector 0 ∈ V, the zero vector for V, such that u + 0 = 0 + u = u for all u ∈ V.
5. For each u ∈ V, there exists a special vector −u ∈ V, the negative of u, such that u + (−u) = (−u) + u = 0.
6. If α ∈ R (or C) and u ∈ V then αu ∈ V. (V is closed under scalar multiplication.)
7. α(u + v) = αu + αv. (Scalar multiplication is distributive.)
8. (α + β)u = αu + βu. (Scalar multiplication is distributive.)
9. α(βu) = (αβ)u. (Scalar multiplication is associative.)
10. 1u = u. (Scalar multiplication by 1 works as expected.)

• All ten axioms are “obvious” in the sense that it is very easy to check that they hold for the familiar examples of R2 and R3.
• (I’ll do this shortly.)
• The subtle point is that I am now saying that any set V (along with a definition of addition and scalar multiplication) that satisfies these properties of R2 and R3 is a vector space.
• This is something people do all the time in mathematics; generalise from a particular case to a general class of objects that have the same structure as the original.
• Vector spaces in which the scalars may be complex are called complex vector spaces; if the scalars are restricted to be real then they are called real vector spaces.
• The axioms are otherwise identical.

Example 1.2 A short list of familiar examples — you should check the 10 axioms for each:
1. V = Rn together with the standard operations of addition and scalar multiplication.
2. V = Cn.
3. V = the set of all 2 × 2 matrices
     [ a  b ]
     [ c  d ]
   (if restricted to real matrices then V is a real vector space).
4. V = the set of all m × n matrices (real or complex)
     [ a11  . . .  a1n ]
     [ a21  . . .  a2n ]
     [  .            . ]
     [ am1  . . .  amn ]
5. Let V be any plane in Rn containing the origin:
     V = { x ∈ Rn | a1 x1 + a2 x2 + · · · + an xn = 0 }.
6. A “plane” P = { x ∈ Rn | a1 x1 + · · · + an xn = r }, r ≠ 0, does not contain the zero vector 0. Such a set is more correctly referred to as an affine space and is not a vector space. Check which axioms fail.

I can gather together some simple consequences of the axioms for a vector space (underlining the zero vector 0 for readability).

Theorem 1.1 Let V be a vector space, u ∈ V and α ∈ R (or C). Then
(i) 0 u = 0.
(ii) α 0 = 0.
(iii) (−1) u = −u.
(iv) If α u = 0 then α = 0 or u = 0.
(v) The zero vector is “unique” — i.e. if two vectors satisfy Vector Space Axioms 3 & 4 for 0, they must be equal (so 0 is unique).

Proof:
(i) I can write:
  0u + 0u = (0 + 0)u    (Axiom 8)
          = 0u          (properties of R).
By Axiom 5, the vector 0u has a negative, −0u. Adding this negative to both sides of the above equation:
  [0u + 0u] + (−0u) = 0u + (−0u).
So
  0u + [0u + (−0u)] = 0u + (−0u)    (Axiom 3)
  0u + 0 = 0                        (Axiom 5)
  0u = 0                            (Axiom 4).
(ii) Check. (iii) Check. (iv) Check. (v) Check.

1.2.1 Exercises

1. Check that all of the examples in Example 1.2 above are in fact vector spaces.
2. Is the “Strange Example” vector space given in Example 1.1 actually a vector space? (Are the 10 vector space axioms satisfied?)

1.3 Subspaces

A vector space can be contained within another. For example you were asked to check that planes through the origin in Rn are vector spaces. They are obviously contained in Rn. Mathematicians use the term subspace to describe this idea.

Definition 1.3 A subset W of a vector space V is called a subspace of V if W is itself a vector space using the same addition and scalar multiplication operations as V.

• On the face of it, I need to check that all 10 axioms hold for every vector in a subset W to confirm that it is a subspace.
• But as W is a subset of V (W ⊆ V), axioms 2, 3, 7, 8, 9 and 10 are “inherited” from the larger set V — I mean that if all elements of a set S have some property then all elements of any subset of S do too. (Obvious?)
• So I need only check axioms 1, 4, 5 and 6. (“Closure” under an operation such as + means that if u and v are in the set V then so is u + v. In the same way, closure under scalar multiplication means that if u ∈ V and α ∈ R then αu is also in V.)
• In fact I don’t need to check all four of axioms 1, 4, 5 and 6: I need only check “closure” under addition and scalar multiplication (axioms 1 and 6). This follows from the following theorem:

Theorem 1.2 If W is a non-empty subset of a vector space V then W is a subspace of V if and only if the following conditions hold:
(a) if u, v ∈ W, then u + v ∈ W (closure under addition)
(b) if α ∈ R (or C) and u ∈ W then αu ∈ W (closure under scalar multiplication).

Proof:
[→] If W is a subspace of V, then all the vector space axioms are satisfied; in particular, Axioms 1 & 6 hold — but these are conditions (a) and (b).
[←] If conditions (a) and (b) hold then I need only prove that W satisfies the remaining eight axioms. Axioms 2, 3, 7, 8, 9 and 10 are “inherited” from V. So I need to check that Axioms 4 & 5 are satisfied by all vectors in W.
• Let u ∈ W.
• By condition (b), αu ∈ W for every scalar α.
• Setting α = 0, it follows from Thm 1.1 that 0u = 0 ∈ W.
• Setting α = −1, it follows from Thm 1.1 that (−1)u = −u ∈ W.
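To see Theorem 1.2 in action numerically, here is a short Octave/Matlab sketch (illustrative only; the coefficient vector a, the test vectors and the tolerance are arbitrary choices of mine). It spot-checks the two closure conditions for the plane through the origin from Example 1.2 (item 5) and shows how they fail for the affine plane of item 6.

  % Spot-check of the closure conditions (a) and (b) in Theorem 1.2.
  % W = { x in R^3 : a'*x = 0 } should pass; P = { x : a'*x = r }, r ~= 0, should fail.
  a   = [1; 2; 3];                          % coefficients defining the plane
  tol = 1e-12;
  inW = @(x) abs(a' * x) < tol;             % membership test for the plane through the origin
  u = [2; -1; 0];  v = [3; 0; -1];          % two vectors lying in W
  alpha = -4.7;                             % an arbitrary scalar
  fprintf('u, v in W      : %d %d\n', inW(u), inW(v));
  fprintf('u + v in W     : %d\n', inW(u + v));        % closure under addition
  fprintf('alpha*u in W   : %d\n', inW(alpha * u));    % closure under scalar multiplication
  % The affine plane a'*x = r with r = 1 is NOT closed under addition:
  r = 1;  inP = @(x) abs(a' * x - r) < tol;
  p = [1; 0; 0];  q = [0; 0.5; 0];          % both satisfy a'*x = 1
  fprintf('p, q in P      : %d %d\n', inP(p), inP(q));
  fprintf('p + q in P     : %d\n', inP(p + q));        % 0: the sum leaves P

Of course a finite number of spot-checks proves nothing; the point is only to make conditions (a) and (b) concrete.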
Example 1.3 Some examples using Thm 1.2 — you should check each.
1. A line through the origin is a subspace of R3.
2. The first quadrant (x ≥ 0, y ≥ 0) is not a subspace of R2.
3. The set of symmetric n × n matrices is a subspace of the vector space of all n × n matrices.
4. The set of upper triangular m × n matrices is a subspace of the vector space of all m × n matrices.
5. The set of diagonal m × n matrices is a subspace of the vector space of all m × n matrices. (Check: what do you mean by “diagonal” for a non-square (m ≠ n) matrix?)
6. Is the set of all m × m matrices with Trace (sum of diagonal elements) equal to zero a subspace of the vector space of all m × m matrices?
7. Is the set of all m × m matrices with Trace equal to one a subspace of the vector space of all m × m matrices?

An important result — solution spaces of homogeneous linear systems are subspaces of Rn.

Theorem 1.3 If Ax = 0 is a homogeneous linear system of m equations in n unknowns, then the set of solution vectors is a subspace of Rn.
Proof: Check that the proof is a simple application of Thm 1.2.

Another important idea: linear combinations of vectors.

Definition 1.4 A vector w is a linear combination of the vectors v1, . . . , vk if it can be written in the form
  w = α1 v1 + · · · + αk vk
where α1, . . . , αk are scalars.

Example 1.4 Any vector in Rn is a linear combination of the vectors e1, . . . , en, where ei is the vector whose components are all zero except for the ith component which is equal to 1. Check.

If S = {v1, . . . , vk} is a set of vectors in a vector space V then in general some vectors in V may be linear combinations of the vectors in S and some may not. The following Theorem shows that if I construct a set W consisting of all the vectors that can be written as a linear combination of {v1, . . . , vk}, then W is a subspace of V. This subspace is called the span of S and is written
  span(S) = { λ1 v1 + · · · + λk vk : λi ∈ R }
— the set of all possible linear combinations of the vectors v1, . . . , vk.

Theorem 1.4 If S = {v1, . . . , vk} is a set of vectors in a vector space V then
(a) span(S) is a subspace of V.
(b) span(S) is the smallest subspace of V that contains S in the sense that if any other subspace X contains S, then span(S) ⊆ X.

Proof:
(a) To show that span(S) is a subspace of V, I must show that it is non-empty and closed under addition and scalar multiplication.
• span(S) is non-empty as 0 ∈ span(S) because 0 = 0v1 + · · · + 0vk.
• If u, v ∈ span(S) then, using the definition of span(S), certainly both u + v and αu are also — check.
(b) Each vector vi is a linear combination of v1, . . . , vk as I can write vi = 0v1 + · · · + 1vi + · · · + 0vk. So each of the vectors vi, i = 1 . . . k, is an element of span(S). Let X be any other subspace of V that contains v1, . . . , vk. Since X must be closed under addition and scalar multiplication, it must contain all linear combinations of v1, . . . , vk. So X must contain every element of span(S) and so span(S) ⊆ X.

The following Definition summarises the result & the notation:

Definition 1.5 If S = {v1, . . . , vk} is a set of vectors in a vector space V then the subspace span(S) of V consisting of all linear combinations of the vectors in S is called the space spanned by S and I say that the vectors S = {v1, . . . , vk} span the subspace span(S).
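In Rn the question “is b in span(S)?” can be settled computationally: b is a linear combination of the columns of a matrix exactly when appending b as an extra column does not increase the rank. The following Octave/Matlab sketch is one way to do this (the specific vectors and the helper in_span are my own choices, not part of the notes).

  % Test whether a vector lies in the span of a set of vectors (the columns of S).
  v1 = [1; 1; 2];
  v2 = [1; 0; 1];
  S  = [v1, v2];                        % span(S) is a plane through the origin in R^3
  b1 = 3*v1 - 2*v2;                     % built as a linear combination, so in span(S)
  b2 = [1; 0; 0];                       % not in span(S) (check by hand!)
  in_span = @(S, b) rank([S, b]) == rank(S);
  fprintf('b1 in span(S): %d\n', in_span(S, b1));   % expect 1
  fprintf('b2 in span(S): %d\n', in_span(S, b2));   % expect 0

The vectors v1 and v2 used here are the first two vectors of Example 1.7 below.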
Example 1.5 Some examples of spanning sets:
• If v1 and v2 are two non-collinear (one is not a multiple of the other) vectors in R3 then span{v1, v2} is the plane through the origin containing v1 and v2. (The normal n to this plane is any multiple of v1 × v2. This simple method for calculating the normal only works in R3.)
• Similarly if v is a non-zero vector in R2, R3 or Rn, then span({v}) is just the set of all scalar multiples of v and so is the line through the origin determined by v, namely L = {x ∈ Rn | x = αv}.

Example 1.6 A non-geometrical example: the polynomials (monomials) 1, x, x^2, . . . , x^n span the vector space Pn of polynomials in x (Pn = {p | p(x) = a0 + a1 x + · · · + an x^n} where the coefficients ai are real/complex). Check that Pn is a vector space as claimed under the ordinary operations of addition and scalar multiplication, then confirm that Pn is spanned by the set {1, x, x^2, . . . , x^n}. (Strictly speaking, Pn is the vector space of real-valued polynomial functions of degree at most n on R.)

Example 1.7 Do the vectors v1 = (1, 1, 2)^T, v2 = (1, 0, 1)^T and v3 = (2, 1, 3)^T span R3?
Solution: Can all vectors in R3 be written as a linear combination of these 3 vectors? If so then for arbitrary b ∈ R3, b = α1 v1 + α2 v2 + α3 v3. Substituting for v1, v2 and v3, I have
  [ b1 ]   [ 1  1  2 ] [ α1 ]
  [ b2 ] = [ 1  0  1 ] [ α2 ]
  [ b3 ]   [ 2  1  3 ] [ α3 ]
But the coefficient matrix has zero determinant so there do not exist solutions α1, α2, α3 for every vector b. So the vectors v1, v2, v3 do not span R3. It is easy to see that the vectors must be co-planar. Check that v1 · (v2 × v3) = 0 and explain the significance of the result.

1.3.1 Exercises

1. Which of the following sets are subspaces of Rn?
(a) {(x1, x2, . . . , xn) : x1 + 2x2 + · · · + nxn = 0}
(b) {(x1, x2, . . . , xn) : x1 x2 . . . xn = 0}
(c) {(x1, x2, . . . , xn) : x1 ≥ x2 ≥ x3 ≥ · · · ≥ xn}
(d) {(x1, x2, . . . , xn) : x1, x2, . . . , xn are integers}
(e) {(x1, x2, . . . , xn) : x1^2 + x2^2 + · · · + xn^2 = 0}
(f) {(x1, x2, . . . , xn) : xi + xn+1−i = 0, i = 1, . . . , n}

1.4 Linear Independence

• A set S of vectors spans a given vector space V if every vector in V is expressible as a linear combination of the vectors in S.
• In general there may be more than one way to express a vector in V as a linear combination of the vectors in S.
• Intuitively this suggests that there is some redundancy in S and this turns out to be the case.
• A spanning set that is “as small as possible” seems better.
• I will show that “minimal” spanning sets have the property that each vector in V is expressible as a linear combination of the spanning vectors in one and only one way.
• Spanning sets with this property play a role for general vector spaces like that of coordinate axes in R2 and R3.

Example 1.8 Take the three vectors v1 = (1, 0)^T, v2 = (0, 1)^T and v3 = (1, 1)^T. They certainly span R2 as any vector (x, y)^T in the plane can be written as xv1 + yv2 + 0v3. But v3 isn’t needed, it is “redundant”. I could (with slightly more effort) write any vector (x, y)^T in the plane as a combination of v1 and v3 or v2 and v3. Check. So it looks like I need two vectors in the plane to span R2? But not any two. For example v1 = (1, 2)^T, v2 = (2, 4)^T are parallel so (for example) v = (1, 3)^T cannot be written as a combination of the two as it doesn’t lie on the line through the origin y = 2x. I need to sort out the ideas underlying this Example.
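Before formalising the idea of redundancy, note that the checks requested in Example 1.7 are easy to carry out numerically. The following Octave/Matlab sketch (using the built-ins det, cross, dot and rank) is one possible way; the interpretation in the comments is mine.

  % Numerical check of Example 1.7: do v1, v2, v3 span R^3?
  v1 = [1; 1; 2];  v2 = [1; 0; 1];  v3 = [2; 1; 3];
  A  = [v1, v2, v3];
  fprintf('det(A)        = %g\n', det(A));                    % 0, so the vectors do not span R^3
  fprintf('v1.(v2 x v3)  = %g\n', dot(v1, cross(v2, v3)));    % 0, so the vectors are coplanar
  fprintf('rank(A)       = %d\n', rank(A));                   % 2: they span only a plane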
I’ll define the term “Linear Independence” on the next Slide — I’ll then show that it means “no redundant vectors”.

Definition 1.6 If S = {v1, . . . , vk} is a non-empty set of vectors then the vector equation (note that the vi and the zero vector 0 are typed in bold here to remind you that they are vectors)
  α1 v1 + · · · + αk vk = 0
certainly has the solution α1 = α2 = · · · = αk = 0. If this is the only solution, I say that the set S is linearly independent — otherwise I say that S is a linearly dependent set.

The procedure for determining whether a set of vectors {v1, . . . , vk} is linearly independent can be simply stated: check whether the equation α1 v1 + · · · + αk vk = 0 has any solutions other than α1 = α2 = · · · = αk = 0. I’ll often say that I am checking that “the only solution is the trivial solution”.

Example 1.9 Let v1 = (2, −1, 0, 3)^T, v2 = (1, 2, 5, −1)^T and v3 = (7, −1, 5, 8)^T. In fact, the set of vectors S = {v1, v2, v3} is linearly dependent as 3v1 + v2 − v3 = 0. But suppose that I didn’t know this. If I want to check for linear dependence I can just apply the definition. I write α1 v1 + α2 v2 + α3 v3 = 0 and check whether the set of equations for α1, α2, α3 has non-trivial solutions:
  2α1 + α2 + 7α3 = 0
  −α1 + 2α2 − α3 = 0
  0α1 + 5α2 + 5α3 = 0
  3α1 − α2 + 8α3 = 0
Check that Gauss Elimination (without pivoting) reduces the coefficient matrix to:
  [ 1  0  3 ]
  [ 0  1  1 ]
  [ 0  0  0 ]
  [ 0  0  0 ]
Finally check that the linear system for α1, α2, α3 has infinitely many solutions — confirming that v1, v2, v3 are linearly dependent.

Example 1.10 A particularly simple example: consider the vectors i = (1, 0, 0)^T, j = (0, 1, 0)^T, k = (0, 0, 1)^T in R3. To confirm that they are linearly independent, write α1 i + α2 j + α3 k = 0. I immediately have (α1, α2, α3)^T = (0, 0, 0)^T and so “the only solution is the trivial solution”. Geometrically, of course, the three vectors i, j and k are just unit vectors along the x, y and z directions.

Example 1.11 Find whether v1 = (1, −2, 3)^T, v2 = (5, 6, −1)^T and v3 = (3, 2, 1)^T are linearly independent. Applying the definition, write α1 v1 + α2 v2 + α3 v3 = 0 so
  α1 + 5α2 + 3α3 = 0
  −2α1 + 6α2 + 2α3 = 0
  3α1 − α2 + α3 = 0.
Check that the solution to this linear system is α1 = −(1/2)t, α2 = −(1/2)t and α3 = t where t is an arbitrary parameter. I conclude that the three vectors v1, v2, v3 are linearly dependent. (Why?)

Example 1.12 Show that the polynomials (monomials) 1, x, x^2, . . . , x^n are a linearly independent set of vectors in Pn, the vector space of polynomials of degree less than or equal to n.
Solution: Assume as usual that some linear combination of the vectors 1, x, x^2, . . . , x^n is the zero vector. Then
  a0 + a1 x + a2 x^2 + · · · + an x^n = 0 for all values of x.
But it is easy to show that all the ai’s must be zero.
• Setting x = 0, I see that a0 = 0.
• So x(a1 + a2 x + · · · + an x^(n−1)) = 0 for all values of x.
• If x ≠ 0 then a1 + a2 x + · · · + an x^(n−1) = 0 for all non-zero values of x.
• Take the limit as x → 0 of a1 + a2 x + · · · + an x^(n−1). The result, a1, must be zero as polynomials are continuous functions.
• So x(a2 + · · · + an x^(n−2)) = 0 for all values of x.
• The argument can be continued to show that a2, a3, . . . , an must all be zero.
So “the only solution is the trivial solution” and therefore the set 1, x, x^2, . . . , x^n is a linearly independent set of vectors in Pn.
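The “only the trivial solution” test used in the examples above is easy to automate: k vectors placed as the columns of a matrix A are linearly independent exactly when rank(A) = k. The following Octave/Matlab sketch (my own illustration, not part of the original notes) re-checks Examples 1.9 and 1.11 this way.

  % Linear independence test via rank: the columns of A are independent iff rank(A)
  % equals the number of columns, i.e. A*alpha = 0 has only the trivial solution.
  A9  = [ 2  1  7;        % columns are v1, v2, v3 of Example 1.9
         -1  2 -1;
          0  5  5;
          3 -1  8];
  A11 = [ 1  5  3;        % columns are v1, v2, v3 of Example 1.11
         -2  6  2;
          3 -1  1];
  fprintf('Example 1.9 : rank = %d, columns = %d\n', rank(A9),  size(A9, 2));
  fprintf('Example 1.11: rank = %d, columns = %d\n', rank(A11), size(A11, 2));
  % In both cases rank < number of columns, so both sets are linearly dependent.
  disp(null(A9));         % a multiple of (3, 1, -1)', recovering 3*v1 + v2 - v3 = 0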
Enough examples — the term linearly dependent suggests that the vectors “depend” on each other. The next theorem confirms this.

Theorem 1.5 A set S of two or more vectors is
(a) Linearly dependent if and only if at least one of the vectors in S can be expressed as a linear combination of the remaining vectors in S.
(b) Linearly independent if and only if no vector in S can be written as a linear combination of the remaining vectors in S.

Proof:
(a) Let S = {v1, . . . , vk} be a set of two or more vectors. Assume that S is linearly dependent. I know that there are scalars α1, . . . , αk, not all zero, such that
  α1 v1 + · · · + αk vk = 0.
Let αp be the first non-zero coefficient. I can solve for vp:
  vp = −(1/αp)(α1 v1 + · · · + αp−1 vp−1 + αp+1 vp+1 + · · · + αk vk).
So one of the vectors in S can be expressed as a linear combination of the remaining vectors in S.
Conversely, assume that one of the vectors in S, vp say, can be expressed as a linear combination of the remaining vectors in S. Then
  vp = β1 v1 + · · · + βp−1 vp−1 + βp+1 vp+1 + · · · + βk vk
and so I can write
  β1 v1 + · · · + βp−1 vp−1 − vp + βp+1 vp+1 + · · · + βk vk = 0.
I have a “non-trivial” solution to the latter equation (the coefficient of vp is −1) so the set {v1, . . . , vk} is linearly dependent.
(b) Check that the proof of this case is straightforward.

The following Theorem is easily proved.

Theorem 1.6
(a) A finite set of vectors that contains the zero vector is linearly dependent.
(b) A set with exactly two vectors is linearly independent if and only if neither vector is a scalar multiple of the other.
Proof: Check that the proof is (very) straightforward.

Example 1.13 Consider the set of real-valued functions on the real line (written F(R)):
1. Check that F(R) is a vector space with addition and scalar multiplication defined in the obvious way.
2. Consider the set of functions {x, sin x}. Check that the above Theorem implies that this is a linearly independent set in F(R).

Some Geometric Comments
• In R2 and R3, a set of two vectors is linearly independent iff the vectors are not collinear.
• In R3, a set of three vectors is linearly independent iff the vectors are not coplanar.

I now prove an important result about Rn that can be extended to general vector spaces.

Theorem 1.7 Let S = {v1, . . . , vk} be a set of vectors in Rn. If k > n, then S is linearly dependent.
Proof: Write vi = (vi1, vi2, . . . , vin)^T for each vector vi, i = 1, . . . , k. Then if I write as usual α1 v1 + · · · + αk vk = 0, I get the linear system
  v11 α1 + v21 α2 + · · · + vk1 αk = 0
  v12 α1 + v22 α2 + · · · + vk2 αk = 0
    ...
  v1n α1 + v2n α2 + · · · + vkn αk = 0
This is a homogeneous system of n equations in the k unknowns α1, . . . , αk with k > n. In other words: “more unknowns than equations”. I know from Linear Algebra 1 that such a system must have nontrivial solutions. (Or see App. B.) So S = {v1, . . . , vk} is a linearly dependent set.

I can extend these ideas to vector spaces of functions to get a useful result. First, two definitions.

Definition 1.7 If a function f has n continuous derivatives on R I write this as f ∈ C^n(R).

and

Definition 1.8 (Wronskian Determinant) Given a set {f1, f2, . . . , fn} of C^(n−1)(R) functions, the Wronskian determinant W(x), which depends on x, is the determinant of the matrix:
  [ f1(x)          f2(x)          . . .   fn(x)          ]
  [ f1'(x)         f2'(x)         . . .   fn'(x)         ]
  [   ...            ...                    ...          ]
  [ f1^(n−1)(x)    f2^(n−1)(x)    . . .   fn^(n−1)(x)    ]
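To make Definition 1.8 concrete, here is a small Octave/Matlab sketch of mine (with the derivatives entered by hand) that evaluates the Wronskian of the pair {x, sin x} at a few sample points; this pair reappears in Example 1.14 below.

  % Wronskian of {x, sin x}: W(x) = det([x, sin x; 1, cos x]) = x*cos(x) - sin(x).
  f  = @(x) x;         df = @(x) ones(size(x));   % f1 and its derivative
  g  = @(x) sin(x);    dg = @(x) cos(x);          % f2 and its derivative
  W  = @(x) f(x) .* dg(x) - g(x) .* df(x);        % the 2 x 2 determinant written out
  xs = [0, 1, pi/2, pi, 5];
  disp([xs; W(xs)]);   % second row: W at each sample point (not identically zero)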
Theorem 1.8 (Wronskian) If the Wronskian of a set of C^(n−1)(R) functions {f1, f2, . . . , fn} is not identically zero on R then the set of functions forms a linearly independent set of vectors in C^(n−1)(R).
Proof: Suppose that the functions f1, f2, . . . , fn are linearly dependent in C^(n−1)(R). Then there exist scalars α1, . . . , αn, not all zero, such that
  α1 f1(x) + α2 f2(x) + · · · + αn fn(x) = 0
for all x ∈ R. Now differentiate this equation n − 1 times in succession, giving us n equations in the n unknowns α1, . . . , αn:
  α1 f1(x) + α2 f2(x) + · · · + αn fn(x) = 0
  α1 f1'(x) + α2 f2'(x) + · · · + αn fn'(x) = 0
    ...
  α1 f1^(n−1)(x) + α2 f2^(n−1)(x) + · · · + αn fn^(n−1)(x) = 0
I assumed that the functions are linearly dependent so this homogeneous linear system must have a nontrivial solution for every x ∈ R. This means that the corresponding n × n matrix is not invertible and so its determinant — the Wronskian — is zero for all x ∈ R. Conversely, if the Wronskian is not identically zero on R then the given set of functions must be a linearly independent set of vectors in C^(n−1)(R).

Example 1.14 Let F = {x, sin x}. The elements of F are certainly C^1(R). The Wronskian is
  W(x) = | x    sin x |  = x cos x − sin x.
         | 1    cos x |
Check that this function is not identically zero on R and so the set F is linearly independent in C^1(R).

Example 1.15 Let G = {1, e^x, e^(2x)}. Again it is obvious that G ⊆ C^2(R). Check that the Wronskian is:
  W(x) = | 1    e^x     e^(2x)  |
         | 0    e^x     2e^(2x) |  = 2e^(3x).
         | 0    e^x     4e^(2x) |
This function is certainly not identically zero on R so the set G is linearly independent in C^2(R).

Note: the converse of the theorem is not true, i.e. even if the Wronskian is identically zero on R, the functions are not necessarily linearly dependent. Just to settle the point — a counter-example.

Example 1.16 Let
  f(x) = 0                      for |x| ≤ 1,
  f(x) = (x + 1)^2 (x − 1)^2    for |x| > 1,
and g(x) = (x − 1)^2 (x + 1)^2. Then clearly f and g are in C^1(R):
  f'(x) = 0                     for |x| ≤ 1,
  f'(x) = 4x(x − 1)(x + 1)      for |x| > 1.
I can show that lim_{x→±1} f'(x) = 0 so f' is continuous on R, as is f. Finally, if |x| ≤ 1 then W(x) = 0 clearly as f(x) = 0 on [−1, 1]. If |x| > 1 then W(x) = 0 clearly as f, g are linearly dependent for |x| > 1. So W(x) = 0 on R. However f and g are linearly independent on R because although I have f − g = 0 on |x| > 1, f − g ≠ 0 for |x| < 1 — i.e. there is no non-trivial choice of α and β such that αf(x) + βg(x) = 0 for all x ∈ R.

1.4.1 Exercises

1. Let
      [ 1  1  1  ]
  A = [ 0  2  4  ]
      [ 0  4  16 ]
Are the columns linearly independent? Are the rows?
2. Let S and T be subsets of a vector space V such that S ⊂ T. Prove that (a) if T is linearly independent, so is S, and (b) if S is linearly dependent then so is T.

1.5 Basis and Dimension

We all know intuitively what we mean by dimension — a plane is 2–dimensional, the world that we live in is 3–dimensional, a line is 1–dimensional. In this section I make the term precise and show that it can be used when talking about a vector space. You have already used i and j, the unit vectors along the x and y directions respectively in R2. Any vector v in R2 can be written in one and only one way as a linear combination of i and j — if v = (v1, v2)^T ∈ R2 then v = v1 i + v2 j.
Similarly you have used i, j and k, the unit vectors along the x, y and z directions respectively in R3, and any vector v in R3 can be written in one and only one way as a linear combination of i, j and k — if v = (v1, v2, v3)^T ∈ R3 then v = v1 i + v2 j + v3 k.

This idea can be extended in an obvious way to Rn; if v = (v1, . . . , vn)^T ∈ Rn then v = v1 e1 + v2 e2 + · · · + vn en where (for each i = 1, . . . , n) ei is the unit vector whose components are all zero except for the ith component which is one. (Each vector ei is just the ith column of the n × n identity matrix I.)

What is not so obvious is that this idea can be extended to sets of vectors that are not necessarily lined up along the various standard coordinate axes or even perpendicular. (Though I haven’t yet said what I mean by “perpendicular” for a general vector space.) I need a definition:

Definition 1.9 If V is any vector space and S = {v1, . . . , vn} is a set of vectors in V, then I say that S is a basis for V if
(a) S is linearly independent
(b) S spans V.

• The next theorem explains the significance of the term “basis”.
• A basis is the vector space generalisation of a coordinate system in R2, R3 or Rn.
• Unlike e1, . . . , en, the elements of a basis are not necessarily perpendicular.
• I haven’t yet said formally what “perpendicular” means in a vector space — I will clarify this idea in the next Chapter.

Theorem 1.9 If S = {v1, . . . , vn} is a basis for a vector space V then every vector v ∈ V can be expressed as a linear combination of the vectors in S in one and only one way.
Proof: S spans the vector space so suppose that some vector v ∈ V can be written as a linear combination of the vectors v1, . . . , vn in two different ways:
  v = α1 v1 + · · · + αn vn   and   v = β1 v1 + · · · + βn vn.
Subtracting, I have
  0 = (α1 − β1)v1 + · · · + (αn − βn)vn.
But the set S is linearly independent so each coefficient must be zero: αi = βi, for i = 1, . . . , n.

I already know how to express a vector in Rn (say) in terms of the “standard basis” vectors e1, . . . , en. Let’s look at a less obvious case.

Example 1.17 Let v1 = (1, 2, 1)^T, v2 = (2, 9, 0)^T and v3 = (3, 3, 4)^T. Show that S = {v1, v2, v3} is a basis for R3.
Solution: I must check that the set S is linearly independent and that it spans R3.
[Linear independence:] Write α1 v1 + α2 v2 + α3 v3 = 0 as usual. Expressing the three vectors in terms of their components, I find that the problem reduces to showing that the homogeneous linear system
  α1 + 2α2 + 3α3 = 0
  2α1 + 9α2 + 3α3 = 0
  α1        + 4α3 = 0
has only the trivial solution.
[Spanning:] I need to show that every vector w = (w1, w2, w3)^T ∈ R3 can be written as a linear combination of v1, v2, v3. In terms of components:
  α1 + 2α2 + 3α3 = w1
  2α1 + 9α2 + 3α3 = w2
  α1        + 4α3 = w3
So I need to show that this linear system has a solution for all choices of the vector w.
Note that the same coefficient matrix appears in both the check for linear independence and that for spanning. The condition needed in both cases is that the coefficient matrix is invertible. It is easy to check that the determinant of the coefficient matrix is −1 so the vectors are linearly independent and span R3.

Example 1.18 Some other examples of bases, check each:
• S = {1, x, . . . , x^n} is a basis for the vector space Pn of polynomials of degree at most n.
• The set
  M = { [ 1  0 ]   [ 0  1 ]   [ 0  0 ]   [ 0  0 ] }
      { [ 0  0 ] , [ 0  0 ] , [ 1  0 ] , [ 0  1 ] }
  is a basis for the vector space M22 of 2 × 2 matrices.
• The standard basis for Mmn (the vector space of m × n matrices) is just the set of mn different m × n matrices with a single entry equal to 1 and all the others equal to zero.
• Can you find a basis for the “Strange Example” vector space in Example 1.1?

Definition 1.10 I say that a vector space V is finite-dimensional if I can find a basis set for V that consists of a finite number of vectors. Otherwise I say that the vector space is infinite-dimensional.

Example 1.19 The vector spaces Rn, Pn and Mmn are all finite dimensional. The vector space F(R) of functions on R is infinite dimensional. Check that for all positive integers n, I can find n + 1 linearly independent vectors in F(R). Explain why this means that F(R) cannot be finite dimensional. (Hint: consider polynomials.)

I am still using the term “dimension” in a qualitative way (finite vs. infinite). I need one more Theorem — then I will be able to define the term unambiguously. I first prove a Lemma that will make the Theorem trivial. (I will use the “underline” notation for vectors to improve readability.)

Lemma 1.10 Let V be a finite-dimensional vector space. Let L = {l1, . . . , ln} be a linearly independent set in V and let S = {s1, . . . , sm} be a second subset of V which spans V. Then m ≥ n or “any spanning set in V has at least as many elements as any linearly independent set in V”. (“|S| ≥ |L|” — alphabetic ordering....)

Proof: I will assume that m < n and show this leads to a contradiction. As S spans V I can write
  l1 = a11 s1 + a12 s2 + · · · + a1m sm
  l2 = a21 s1 + a22 s2 + · · · + a2m sm
    ...
  ln = an1 s1 + an2 s2 + · · · + anm sm        (1.1)
As m < n, the n × m coefficient matrix A = {aij}, i = 1, . . . , n, j = 1, . . . , m, is “tall and thin” and so the m × n matrix A^T is “wide and short”, so the linear system A^T c = 0 where c = (c1, . . . , cn)^T is a homogeneous linear system with more unknowns than equations and so must have non-trivial solutions for which not all of c1, . . . , cn are zero. (See App. B for a proof of this.)
As A^T c = 0, each element of the vector A^T c is also zero. So I can write (multiplying each si by the ith element of A^T c):
  (a11 c1 + a21 c2 + · · · + an1 cn) s1
  + (a12 c1 + a22 c2 + · · · + an2 cn) s2
    ...
  + (a1m c1 + a2m c2 + · · · + anm cn) sm = 0.        (1.2)
Now the tricky bit! If I gather all the c1’s, c2’s etc. together then:
  c1 (a11 s1 + a12 s2 + · · · + a1m sm)
  + c2 (a21 s1 + a22 s2 + · · · + a2m sm)
    ...
  + cn (an1 s1 + an2 s2 + · · · + anm sm) = 0        (1.3)
But the sums in brackets are just l1, . . . , ln by (1.1) which means that I can write:
  c1 l1 + c2 l2 + · · · + cn ln = 0
with c1, . . . , cn not all zero. But this contradicts the assumption that L = {l1, . . . , ln} is linearly independent. Therefore I must have n ≤ m as claimed.

In words the Lemma says that a linearly independent set in a finite-dimensional vector space cannot have more elements than a spanning set in the same finite-dimensional vector space. The proof of the Lemma is messy because I am explicitly writing out the sums. The proof is much easier if I use “index” notation. See Appendix A for an explanation of this notation and a proof using it. Either proof is acceptable.

Now the Theorem — easily proved as promised.
Theorem 1.11 Given a finite-dimensional vector space V, all its bases have the same number of vectors.
Proof:
• Let Bm and Bn be bases for V with m and n vectors respectively.
• Obviously as Bm and Bn are bases, both are linearly independent and span V.
• So I am free to choose either one to be L (a linearly independent set) and the other to be S (a spanning set).
• Remember that the Lemma says that “any spanning set in V has at least as many elements as any linearly independent set in V”.
• If I choose Bm (with m elements) as S (a spanning set) and Bn (with n elements) as L (a linearly independent set) I have m ≥ n.
• The sneaky part is to choose Bn as a spanning set S and Bm as a linearly independent set L — effectively swapping n and m in the Lemma.
• Then I must have n ≥ m.
• So I am forced to conclude that m = n.
So any two bases for a given finite-dimensional vector space must have the same number of elements.

This allows us to define the term dimension:

Definition 1.11 The dimension of a finite dimensional vector space V, written dim(V), is the number of vectors in any basis for V. I define the dimension of the zero vector space to be zero.

Example 1.20 Some examples:
• dim(Rn) = n as the standard basis has n vectors.
• dim(Pn) = n + 1 as the standard basis {1, x, . . . , x^n} has n + 1 vectors.
• dim(Mmn) = mn as the standard basis discussed above has mn vectors.
• What is the dimension of the “Strange Example” vector space in Example 1.1?

Three more theorems to complete our discussion of bases and dimensionality. The first is almost obvious but has useful applications. In plain English, the Theorem says that
• adding an external vector to a linearly independent set does not undo the linear independence property and
• removing a redundant vector from a set does not change the span of the set.

Theorem 1.12 (Adding/Removing) Let S be a non-empty set of vectors in a vector space V. Then
(a) If S is a linearly independent set and if v ∈ V is outside span(S) (i.e. v cannot be expressed as a linear combination of the vectors in S) then the set S ∪ {v} is still linearly independent (i.e. adding v to the list of vectors in S does not affect the linear independence of S).
(b) If v ∈ S is a vector that is expressible as a linear combination of other vectors in S and if I write S \ v to mean S with the vector v removed then S and S \ v span the same space, i.e. span(S) = span(S \ v).

Proof:
(a) Assume that S = {v1, . . . , vk} is a linearly independent set of vectors in a vector space V and that v ∈ V but that v is not in span(S). RTP (required to prove) that S′ = S ∪ {v} is still linearly independent. As usual write
  α1 v1 + · · · + αk vk + αk+1 v = 0
and try to show that all the αi are zero. But I must have αk+1 = 0 as otherwise I could write v as a linear combination of the vectors in S. So I have α1 v1 + · · · + αk vk = 0. The vectors v1, . . . , vk are linearly independent so all the αi are zero.
(b) Assume that S = {v1, . . . , vk} is a set of vectors in a vector space V and to be definite assume that vk = α1 v1 + · · · + αk−1 vk−1. Now consider the smaller set S′ = S \ {vk} = {v1, . . . , vk−1}. Check that span(S) = span(S′).

In general, to check that a set of vectors {v1, . . . , vn} is a basis for a vector space V, I need to show that they are linearly independent and span V. But if I know that dim(V) = n then checking either is enough!
This is justified by the theorem:

Theorem 1.13 If V is an n-dimensional vector space and if S is a set in V with exactly n elements then S is a basis for V if either S spans V or S is linearly independent.

Proof: Assume that S has exactly n vectors and spans V. RTP that the set S is linearly independent.
• But if this is not true, then there is some vector v in S which is a linear combination of the others.
• If I remove this redundant vector v from S then by the Adding/Removing Theorem 1.12 the smaller set of n − 1 vectors still spans V.
• If this smaller set is not linearly independent then repeat the process (the process must terminate at k ≥ 1 as a set with only one vector is certainly linearly independent) until I have a set of (say) k linearly independent vectors (k < n) that span V.
• But this is impossible as Thm 1.11 tells us that a set of fewer than n vectors cannot form a basis for an n-dimensional vector space. So S is linearly independent.

Let B be any basis for V. Let S have exactly n vectors and be linearly independent. RTP that S spans V. Assume not.
• Let C consist of the elements of the basis B not in span(S).
• Add elements of C one at a time to S — by the Adding/Removing Theorem 1.12, the augmented versions of S are still linearly independent.
• Continue until the augmented version of S spans V.
• As C is a finite set the process must terminate with the augmented version of S spanning V and linearly independent.
• But this is impossible as Thm 1.11 tells us that no set of more than n vectors in an n-dimensional vector space can be linearly independent. So S spans V.

Example 1.21 Show that the vectors v1 = (2, 0, −1)^T, v2 = (4, 0, 7)^T and v3 = (−1, 1, 4)^T form a basis for R3. I need only check that the vectors are linearly independent. By inspection v1 and v2 are linearly independent, so, as v3 is outside the x–z plane in which v1 and v2 lie, the Adding/Removing Theorem 1.12 shows that the set of all three is linearly independent; Theorem 1.13 then shows that the three vectors form a basis for R3.

The final theorem in this Chapter is often used in Matrix Algebra; it says that for a finite-dimensional vector space V, every set that spans V has a subset that is a basis for V — and that every linearly independent set in V can be expanded to form a basis for V.

Theorem 1.14 Let S be a finite set of vectors in a finite-dimensional vector space V.
(a) If S spans V but is not a basis for V then S can be reduced to a basis for V by discarding some of the redundant vectors in S.
(b) If S is a linearly independent set that is not a basis for V then S can be expanded to form a basis for V by adding certain external vectors to S.

Proof:
(a) If S is a set of vectors that spans V but is not a basis for V then it must be linearly dependent. Some vector v in S must be expressible as a linear combination of some of the others. By the Adding/Removing Theorem 1.12(b) I can remove v from S and the resulting set still spans V. If this set is linearly independent I am done, otherwise remove another “redundant” vector. Let dim(V) = n. If the size of S (written |S|) were less than n then the process would have to stop at a linearly independent set of size k, 1 ≤ k < n, which spans V. This is impossible (why?). So |S| > n (|S| = n is also ruled out: by Thm. 1.13 a spanning set with exactly n elements would already be a basis). Apply the removal process — it must continue until a set of n vectors that spans V is reached. (Why can there not be a linearly independent spanning set of size greater than n?) By Thm. 1.13 this subset of S is linearly independent and a basis for V as claimed.
(b) Suppose that dim(V) = n. If S is a linearly independent set that is not a basis for V then S fails to span V and there must be some vector v ∈ V that is not in span(S). By the Adding/Removing Theorem 1.12(a) I can add v to S while maintaining linear independence. If the new set S′ (say) spans V then S′ is a basis for V. Otherwise I can select another suitable vector to add to S′ to produce a set S′′ that is still linearly independent. I can continue adding vectors in this way till I reach a set with n linearly independent vectors in V. This set must be a basis by Thm. 1.13.

1.5.1 Exercises

1. Let B = {v1, v2, . . . , vn} be a basis for the vector space V. Let 1 < m < n and let V1 be the subspace spanned by {v1, v2, . . . , vm} and let V2 be the subspace spanned by {vm+1, vm+2, . . . , vn}. Prove that V1 ∩ V2 = {0}.
2. Let W be the set of all 3 × 3 real matrices M with the property that all matrices in W have equal row & column sums. (Meaning if I take any vector (matrix) in W all its row sums are equal to some constant C and all its column sums are equal to the same constant C.)
(a) Prove that W is a subspace of the vector space of 3 × 3 real matrices. (Easy.)
(b) Find a basis for this subspace. (Difficult.)
(c) Determine the dimension of the subspace W. (Easy once (b) completed or just by thinking about the problem.)
. , un )T and v = (v1 , . . . , vn )T are vectors in Rn then defining u, v = u · v = n X ui v i i=1 defines u, v to be the Euclidean inner product on Rn . Check that the four axioms Def. 2.1 hold. & % MS4105 98 ' $ Example 2.3 A slight generalisation of the Euclidean inner product on Rn is a weighted Euclidean inner product on Rn . If u = (u1 , . . . , un )T and v = (v1 , . . . , vn )T are vectors in Rn and wi are a set of positive weights then defining u, v = n X wi ui vi i=1 check that u, v is an inner product. & % MS4105 99 ' 2.1.1 $ Exercises 1. Verify that the following is an inner product on R2 . (a1 , a2 ), (b1 , b2 ) = a1 b1 − a2 b1 − a1 b2 + 4a2 b2 2. Let u = (3, −1, 0, 1/2)T , v = (−1, 3, −1, 1)T , w = (0, 2, −2, −1)T 4 be vectors in R . Calculate the inner products u, v , v, w and w, u . 3. Check that A, B = trace(BT A) is an inner product on the vector space of real n × n matrices, where Pn trace(A) = i=1 aii . 4. Let V be the vector space of continuous real-valued functions R1 on [0, 1]. Define f, g = 0 f(t)g(t)dt. Verify that this is an inner product on V. & % MS4105 ' 100 $ 5. In Example 2.3 above; if I weaken the requirement that the weights wi are positive, do I still have an IPS? Why/why not? 2 6. If (as below) I define kxk = x, x , prove that the Parallelogram law 2 2 2 2 ku + vk + ku − vk = 2 kuk + kvk (2.2) holds in any inner product space. (Also draw a sketch to illustrate the result in R2 with the Euclidean Inner Product.) & % MS4105 101 ' 2.1.2 $ Length and Distance in Inner Product Spaces I said that inner product spaces allow us to extend the ideas of length and distance from Rn to any vector space (once an inner product is defined). In Rn , the (Euclidean) length of a vector u is just √ kuk = u · u (2.3) and the Euclidean distance between two points (or vectors) u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) is p d(u, v) = ku − vk = (u − v) · (u − v). & (2.4) % MS4105 102 ' $ Note In Rn the distinction between a point (defined by a set of n coordinates) and a vector (defined by a set of n components) is just a matter of notation or interpretation. A “position vector” v = (v1 , . . . , vn )T can be interpreted as a translation from the origin to the “point” v whose coordinates are (v1 , . . . , vn ). (v1 , v2 ) v = (v1 , v2 )T Figure 1: Position Vector/Point Equivalence in R2 & % MS4105 103 ' $ Based on these formulas for Rn (“Euclidean n-space”), I can make corresponding definitions for a general Inner Product Space. Definition 2.2 If V is an inner product space then the norm (“length”) of a vector u is written kuk and defined by: q kuk = u, u . The distance between two points/vectors u and v is written d(u, v) and is defined by: d(u, v) = ku − vk. Example 2.4 Check that Euclidean n-space is an inner product space. Example 2.5 For the weighted Euclidean inner product check that n X wi u2i . (Note that I often write the formula for kuk2 kuk2 = i=1 & % MS4105 ' 104 $ rather than kuk to avoid having to write the square root symbol.) & % MS4105 ' 105 $ Note: I cannot yet prove the list of properties of vector norms given in Thm. 2.3 below as the proof requires the Cauchy-Schwarz inequality (2.7) which is proved in the next Section. & % MS4105 106 ' 2.1.3 $ Unit Sphere in Inner Product Spaces If V is an inner product space then the set of vectors that satisfy kuk = 1 is called the unit sphere in V. A vector on the unit sphere is called a unit vector. 
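These definitions are easy to experiment with numerically. The following short Octave/Matlab sketch (the vectors and weights below are arbitrary choices for illustration, not taken from the examples above) computes norms and distances from an inner product and checks the Parallelogram law (2.2) for both the Euclidean and a weighted Euclidean inner product.

u = [1; 2; 3];  v = [-1; 2; -1];     % arbitrary test vectors in R^3
w = [2; 1; 4];                       % positive weights, as in Example 2.3
ip  = @(x,y) sum(x .* y);            % Euclidean inner product on R^3
ipw = @(x,y) sum(w .* x .* y);       % weighted Euclidean inner product
nrm  = @(x) sqrt(ip(x,x));           % norm induced by the inner product, Def. 2.2
nrmw = @(x) sqrt(ipw(x,x));
d = nrm(u - v)                       % the distance d(u,v) = ||u - v||
lhs = nrm(u+v)^2 + nrm(u-v)^2;       % left-hand side of the Parallelogram law (2.2)
rhs = 2*(nrm(u)^2 + nrm(v)^2);       % right-hand side of (2.2)
disp([lhs, rhs])                     % the two numbers agree
disp([nrmw(u+v)^2 + nrmw(u-v)^2, 2*(nrmw(u)^2 + nrmw(v)^2)])  % they also agree for the weighted inner product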
In R2 and R3 with the standard Euclidean inner product , the unit sphere is just a unit circle or unit sphere respectively. But even in R2 , R3 and Rn (n ≥ 3), different inner products give rise to different geometry. Example 2.6 Consider R2 with the standard Euclidean inner product. Then the unit sphere kuk = 1 is just the set of vectors u ∈ R2 such that u21 + u22 = 1, the familiar unit circle. But if I use a weighted inner product , say w1 = 1 and w2 = 14 then the unit u21 u2 2 4 sphere kuk = 1 corresponds to the set + = 1, an ellipse whose semi-major axis (length 2) is along the y-direction and semi-minor axis (length 1) along the x-direction. & % MS4105 ' 107 $ Example 2.7 An important class of inner products on Rn is the class of inner products generated by matrices. Let A be an n × n invertible matrix. Then check that u, v = (Au) · (Av) defines an inner product for Rn . The exercise is easier if I note that the Euclidean inner product u · v = vT u where — as noted earlier — I treat all vectors as column vectors and so vT is a row vector and so vT u is the matrix product of a 1 × n matrix and a n × 1 matrix. The result is of course a 1 × 1 matrix — a scalar. Using this insight, I can write u, v = (Av)T Au = vT AT Au. & % MS4105 ' 108 $ Example 2.8 I can define inner products on Pn (the vector space of polynomials of degree n) in more than one way. Suppose that vectors p and q in Pn can be written p(x) = p0 + p1 x + . . . pn xn and q(x) = q0 + q1 x + . . . qn xn . Then define p, q = p0 q0 + p1 q1 + · · · + pn qn and check that this is an inner product. The norm of the polynomial p with respect to this inner product is Pn 2 given by kpk = p, p = i=0 p2i and the unit sphere in Pn with this inner product is the set of coefficients that satisfy the equation Pn 2 i=0 pi = 1. So, for exanple, if I take n = 2 then the quadratic 1 q(x) = √ (1 + x + x2 ) is a unit vector in P2 , the inner product 3 space of polynomials of degree 2 with the inner product described. & % MS4105 109 ' $ Example 2.9 An important example: given two functions f and g in C([a, b]) — the vector space of continuous functions on the interval [a, b] — define Zb f, g = f(x)g(x)dx. a It is easy to check that the four inner product space axioms Def. 2.1 hold. Example 2.10 This allows us to define a norm on C([a, b]) Zb kf k2 = f , f = f2 (x)dx. a & % MS4105 ' 110 $ Some simple properties of inner products : Theorem 2.1 If u,v and w are vectors in a real inner product space and α ∈ R then (0 is the zero vector, not the number 0) (a) 0, v = 0 (b) u, v + w = u, v + u, w (c) u, αv = α u, v (d) u − v, w = u, w − v, w (e) u, v − w = u, v − u, w Proof: Check that all are simple consequences of the four inner product space axioms Def. 2.1. Example 2.11 Doing algebra in inner product spaces is straightforward using the defining axioms and this theorem: try to “simplify” u − 2v, 3u + 4v . & % MS4105 111 ' 2.2 $ Angles and Orthogonality in Inner Product Spaces I will show that angles can be defined in a general inner product space. Remember that in Linear Algebra 1 you saw that given two vectors u and v in Rn , you could write u · v = kukkvk cos θ or equivalently cos θ = u·v kukkvk (2.5) It would be reasonable to define the cosine of the angle between two vectors in an inner product space to be given by the inner product space version of (2.5) & u, v cos θ = . 
kukkvk (2.6) % MS4105 112 ' $ But −1 ≤ cos θ ≤ 1 so for this (2.6) definition to work I need | cos θ| ≤ 1 or: u, v ≤1 kukkvk This result is called the Cauchy-Schwarz Inequality and holds in any inner product space. I state it as a theorem: Theorem 2.2 (Cauchy-Schwarz) If u and v are vectors in a real inner product space then u, v ≤ kukkvk. (2.7) Proof: If v = 0 then both sides of the inequality are zero. So asume that v 6= 0. Consider the vector w = u + tv where t ∈ R. By the non-negativity of w, w I have for any real t: 2 0 ≤ w, w = u + tv, u + tv = t v, v + 2t u, v + u, u = at2 + 2bt + c & % MS4105 113 ' $ with a = v, v , b = u, v and c = u, u . The quadratic coefficient a is non-zero as I have taken v 6= 0. The quadratic at2 + 2bt + c either has no real roots or a double real root so its discriminant must satisfy b2 − ac ≤ 0. Substituting 2 for a, b and c I find that u, v − u, u v, v ≤ 0 and so 2 u, v ≤ u, u v, v = kuk2 kvk2 . Taking square roots (using the fact that u, u and v, v are both non-negative) gives us q u, v ≤ u, u v, v = kukkvk. (Note: as u, v is real in a real inner product space, here | u, v | means the absolute value of u, v .) & % MS4105 114 ' $ An alternative proof that will also work for a complex inner product space — again assume that v 6= 0: Proof: Again (let t ∈ R) 2 0 ≤ w, w ≡ u − tv, u − tv = t v, v − 2t u, v + u, u . u, v — substituting gives The sneaky part: take t = v, v 2 u, v u, v u, v + u, u 0≤ 2 v, v − 2 v, v v, v 2 u, v ≤ u, u − v, v Therfore multiplying across by v, v , the result follows. & % MS4105 115 ' $ u, v in this proof means that I have Note The choice t = v, v u, v v u,v v = kuk cos θ , the chosen w = u − v. But v,v kvk v, v projection of u along v, written Projv u. So w = Projv⊥ u the projection of u along the direction perpendicular to v as can be checked by calculating v, w . Check that the result is zero. I will discuss projections and projection operators in detail later. u w ≡ Proj ⊥ u v θ v Projv u Figure 2: Projection of u perpendicular to v & % MS4105 116 ' $ The properties of length and distance in a general inner product space are the same as those for Rn as the following two Theorems confirm. Theorem 2.3 (Properties of Vector Norm) If u and v are vectors in an inner product space V and if α ∈ R then (a) kuk ≥ 0 (b) kuk = 0 if and only if u = 0. (c) kαuk = |α|kuk (d) ku + vk ≤ kuk + kvk the Triangle Inequality. Proof: Check that (a), (b) and (c) are trivial consequences of the inner product space axioms. & % MS4105 117 ' $ To prove (d); 2 ku + vk = u + v, u + v = u, v + 2 u, v + v, v ≤ u, v + 2| u, v | + v, v ≤ u, v + 2kukkvk + v, v by C-S inequality = kuk2 + 2kukkvk + kvk2 2 = (kuk + kvk) . Taking square roots gives the required result. Note: an important variation on the Triangle Inequality is: ku − vk ≥ kuk − kvk (2.8) which is often used in calculus proofs. Check it! & % MS4105 118 ' $ Theorem 2.4 (Properties of Distance) If u, v and w are vectors in an inner product space V and if α ∈ R then the distance between u and v (remember that d(u, v) = ku − vk) satisfies the identities: (a) d(u, v) ≥ 0 (b) d(u, v) = 0 if and only if u = v. (c) d(u, v) = d(v, u) (d) d(u, v) ≤ d(u, w) + d(w, v) the Triangle Inequality. Proof: Check that (a), (b) and (c) are trivial consequences of the inner product space axioms. Exercise: prove (d). 
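Before moving on, here is a short Octave/Matlab sketch (using random vectors and the Euclidean inner product) that checks the Cauchy-Schwarz inequality (2.7), the Triangle Inequality and the projection decomposition illustrated in Figure 2. It is only a numerical sanity check, not a substitute for the proofs above.

n = 5;
u = randn(n,1);  v = randn(n,1);      % random test vectors
abs(u'*v) <= norm(u)*norm(v)          % Cauchy-Schwarz (2.7): prints ans = 1 (true)
norm(u+v) <= norm(u) + norm(v)        % Triangle Inequality: prints ans = 1 (true)
Pv = ((u'*v)/(v'*v)) * v;             % the projection of u along v
w  = u - Pv;                          % the component of u perpendicular to v
v'*w                                  % essentially zero (rounding error only)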
& % MS4105 ' 119 $ I can now define the angle between two vectors by (2.6) which I repeat here: Definition 2.3 Given two vectors u and v in an inner product space V, define the angle θ between u and v by u, v cos θ = . kukkvk Thanks to the Cauchy-Schwarz Inequality (2.7) I know that this definition ensures that | cos θ| ≤ 1 as must be the case for the Cosine to be well-defined. Example 2.12 Let R4 have the Euclidean inner product. Let u = (4, 3, 1, −2)T and v = (−2, 1, 2, 3)T . Check that the angle 3 between u and v is arccos − √ . 2 15 & % MS4105 ' 120 $ When two vectors in Rn have the property u · v = 0 I say that they are orthogonal. Geometrically this means that the angle between them is π/2 and so cos θ = 0. I can now extend this idea to vectors in a general inner product space. Definition 2.4 Two vectors in an inner product space are orthogonal if u, v = 0. Example 2.13 Consider the vector space C([−1, 1]) (continuous R1 functions on [−1, 1] with the inner product f , g = −1 f(x)g(x)dx. Let f(x) = x and g(x) = x2 . Then it is easy to check that f , g = 0 and so the two vectors are orthogonal. Example 2.14 Given any odd function f and any even function g, both in C([−1, 1]) with the inner product of the previous example, check that f and g are orthogonal. & % MS4105 ' 121 $ • As these two examples show, orthogonality in a general inner product space is a much more abstract idea than that of (say) two vectors in R2 being perpendicular. • However I will show that an orthogonal basis is the natural generalisation of a set of perpendicular coordinate vectors (e.g. i, j and k in R3 ). & % MS4105 122 ' 2.2.1 $ Exercises 1. Prove that if u and v are vectors in a (real) inner product space then: 1 1 2 u, v = ku + vk − ku − vk2 . 4 4 (2.9) The question seems to say that if I can define a norm in a vector space then automatically I have an inner product space. The situation is not that simple. In fact there is an important result (the Jordan von Neumann Lemma) which says that if I define an “inner product” using the formula in the previous question then provided the Parallelogram law (2.2) holds for all u, v ∈ V then u, v as defined in (2.9) satisfies all the axioms for an inner product space. Without the Parallelogram law, the inner product axioms do not hold. & % MS4105 ' 123 $ n n 2. Show that maxn (|u | + |v |) ≤ max |u | + max i i i i=1 i=1 i=1 |vi |. 3. The ∞ norm (to be discussed later in Section 4.3.1) is defined by kvk∞ = maxn i=1 |vi |. Check that it satisfies the norm properties 2.3. (To prove the Triangle Inequality for the ∞-norm, you’ll need the result of the previous Exercise.) 4. Check that the Parallelogram Law does not hold for this norm and so I cannot derive an inner product using (2.9). (Hint: use the vectors u = (1, 1)T and v = (1, 0)T .) 5. (Difficult) Prove the Jordan von Neumann Lemma for a real inner product space Lemma 2.5 (Jordan von Neumann) If V is a real vector space and k · k is a norm defined on V then, provided the Parallelogram law (2.2) holds for all u, v ∈ V, the expression (2.9) defines an inner product on V. See App. C for a proof. & % MS4105 124 ' 2.3 $ Orthonormal Bases I will show that using an orthogonal basis set for an inner product space simplifies problems — analogous to choosing a set of coordinate axes in Rn . I formally define some terms: Definition 2.5 A set of vectors in an inner product space is called an orthogonal set if each vector in the set is orthogonal to every other. 
If all the vectors in an orthogonal set are unit vectors (kuk = 1) the set is called an orthonormal set. Example 2.15 Let v1 = (0, 1, 0)T , v2 = (1, 0, 1)T and v3 = (1, 0, −1)T . The set S = {v1 , v2 , v3 } can easily be checked to be orthogonal wrt the Euclidean inner product. The vectors can be normalised by dividing each by its norm, e.g. kv1 k = 1 so set √ √ v 2 u1 = v1 , kv2 k = 2 so define u2 = √2 and kv3 k = 2 so define v3 0 u3 = √ . The set S = {u1 , u2 , u3 } is an orthonormal set. 2 & % MS4105 125 ' $ Example 2.16 The most familiar example of an orthonormal set is the set {e1 , . . . , en } in Rn where as usual ei is a column vector of all zeroes except for its ith entry which is 1. An important idea is that of coordinates wrt orthonormal bases. The following theorem explains why: Theorem 2.6 If S = {u1 , . . . , un } is an orthonormal basis for an inner product space V and v is any vector in V then v = v, u1 u1 + · · · + v, un un . Proof: As S is a basis it spans V, so any vector v ∈ V can be written as a linear combination of u1 , . . . , un : v = α1 u1 + · · · + αn un . & % MS4105 126 ' $ RTP that αi = v, ui for i = 1, . . . , n. For each vector ui ∈ S I have: v, ui = α1 u1 + · · · + αn un , ui = α1 u1 , ui + α2 u2 , ui + · · · + αn un , ui . But the vectors ui are orthonormal so that ui , uj = 0 unless i = j and ui , ui = 1. So the “only term that survives” in the expression for v, ui is αi giving us αi = v, ui as required. Example 2.17 Given the vectors u1 = (0, 1, 0)T , u2 = (−4/5, 0, 3/5)T and u3 = (3/5, 0, 4/5)T it is easy to check that S = {u1 , u2 , u3 } is an orthonormal basis for R3 with the Euclidean inner product. Then express the vector (1, 1, 1)T are a linear combination of the vectors in S. Check that this is easy using Thm. 2.6. & % MS4105 127 ' $ I can state a Theorem that summarises many of the nice properties that orthonormal bases have: Theorem 2.7 If S is an orthonormal basis for an n-dimensional inner product space V and if the vectors u, v ∈ V have components n {ui }n i=1 and {vi }i=1 wrt S then (a) kuk2 = n X u2i . i=1 qP n 2 (b) d(u, v) = i=1 (ui − vi ) . Pn (c) u, v = i=1 ui vi . Proof: Left as an exercise. & % MS4105 128 ' $ It is easy to check that: Theorem 2.8 If S is an orthogonal set of (non-zero) vectors in an inner product space then S is linearly independent. Proof: Left as an exercise. Before seeing how to calculate orthonormal bases I need one more idea — orthogonal projections. Definition 2.6 (Orthogonal Projection) Given an inner product space V and a finite-dimensional subspace W of V then if {u1 , . . . , uk } is an orthogonal basis for W; for any v ∈ V define ProjW v the orthogonal projection of v onto W by k X v, ui ProjW v = ui (2.10) kui k2 i=1 & % MS4105 129 ' or $ v, u1 v, uk ProjW v = u1 + · · · + uk . kui k2 kuk k2 Geometrically, the projection of v onto W is just that component of v that lies in W. A second definition: Definition 2.7 (Orthogonal Component) Given the preamble to the previous Definition, the component of v orthogonal to W is just ProjW ⊥ v = v − ProjW v. Notice that by definition v = ProjW v + ProjW ⊥ v so the vector v is decomposed into a part in W and a second part. I expect from the name that ProjW v is orthogonal to ProjW ⊥ v — in other words that ProjW v, ProjW ⊥ v = 0. & % MS4105 130 ' $ Let’s check: ProjW v, ProjW ⊥ v = ProjW v, v − ProjW v = ProjW v, v − ProjW v, ProjW v v, uk v, u1 u1 , v + · · · + uk , v = 2 2 ku1 k kuk k ! 2 2 v, u1 v, uk 2 2 − ku k + · · · + ku k 1 k ku1 k4 kuk k4 = 0. 
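A short Octave/Matlab sketch of Definitions 2.6 and 2.7: here W is spanned by the first two (orthogonal) vectors of Example 2.15 and v is an arbitrary vector chosen purely for illustration.

u1 = [0; 1; 0];  u2 = [1; 0; 1];      % an orthogonal (not orthonormal) basis for W
v  = [1; 2; 3];                       % an arbitrary vector in R^3
PW = ((v'*u1)/(u1'*u1))*u1 + ((v'*u2)/(u2'*u2))*u2;   % the projection of v onto W, formula (2.10)
Pperp = v - PW;                                       % the component of v orthogonal to W, Def. 2.7
disp([u1'*Pperp, u2'*Pperp, PW'*Pperp])   % all essentially zero
disp(PW + Pperp)                          % recovers v, as expected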
To summarise: • ProjW v is the component of the vector v in the subspace W, • ProjW ⊥ v is the component of vector v orthogonal to W. & % MS4105 131 ' 2.3.1 $ Calculating Orthonormal Bases I have seen some examples of the simplifications that result from using orthonormal bases. I will now present an algorithm which constructs an orthonormal basis for any nonzero finite-dimensional inner product space — the description of the algorithm forms the proof of the following Theorem: Theorem 2.9 (Gram-Schmidt Orthogonalisation) Every non-zero finite-dimensional inner product space has an orthonormal basis. Proof: Let V be any non-zero finite-dimensional inner product space and suppose that {v1 , . . . , vn } is a basis for V. RTP that V has an orthogonal basis. The following algorithm is what I need: & % MS4105 132 ' $ Algorithm 2.1 (1) Gram-Schmidt Orthogonalisation Process (2) begin (3) u1 = v1 (4) W1 = span(u1 ) (5) (6) (7) (8) (9) (10) v2 , u1 u2 = ProjW1⊥ v2 ≡ v2 − ProjW1 v2 ≡ v2 − u1 ku1 k2 W2 = span(u1 , u2 ) v3 , u1 v3 , u2 u3 = ProjW2⊥ v3 ≡ v3 − ProjW2 v3 ≡ v3 − u1 − u2 2 2 ku1 k ku2 k W3 = span(u1 , u2 , u3 ) while (i ≤ n) do i−1 X vi , uj ui = ProjWi−1 vi ≡ vi − ProjWi−1 vi ≡ vi − uj ⊥ 2 kuj k j=1 end (12) end & (11) % MS4105 ' 133 $ At the first step — (3) — I set u1 = v1 . At each subsequent step — namely (5) and (7) — I calculate the component of vi orthogonal to the space spanned by the already calculated vectors u1 , . . . , ui−1 . At line (9) I “automate” the algorithm to continue until all the vectors v4 , . . . , vn have been processed. Note that at each step I am guaranteed that the new vector ui 6= 0 as otherwise vi is a linear combination of u1 , . . . , ui−1 which set is a linear combination of v1 , . . . , vi−1 which cannot be as I am given that v1 , . . . , vn are linearly independent. So the algorithm is guaranteed to generate an orthonormal basis for the inner product space V. & % MS4105 134 ' $ Example 2.18 Consider R3 with the Euclidean inner product. Apply the G-S orthogonalisation process to transform the linearly independent vectors: v1 = (2, 1, 0)T , v2 = (1, 0, 2)T and v3 = (0, −2, 1)T into an orthonormal basis for R3 . • Set u1 = v1 = (2, 1, 0)T . • Set v2 , u1 u2 = v2 − u1 ku1 k2 2 = (1, 0, 2)T − (2, 1, 0)T = (1/5, −2/5, 2)T 5 & % MS4105 135 ' $ • Set v3 , u1 v3 , u2 u3 = v3 − u1 − u2 2 2 ku1 k ku2 k −2 14/5 T T = (0, −2, 1) − (2, 1, 0) − (1/5, −2/5, 2)T 5 21/5 2 2 T T = (0, −2, 1) + (2, 1, 0) − (1/5, −2/5, 2)T 5 3 = (10/15, −20/15, −1/3) = (2/3, −4/3, −1/3)T . Check that indeed the vectors u1 , u2 and u3 are orthogonal. & % MS4105 136 ' 2.3.2 $ Exercises 1. Verify that the set {(2/3, 2/3, 1/3), (2/3, −1/3, −2/3), (1/3, −2/3, 2/3)} forms an orthonormal set in R3 . Express the vectors of the standard basis of R3 as linear combinations of these vectors. 2. The topic of Fourier Series gives a nice illustration of an orthogonal set of functions. Let V be the inner product space of real-valued functions that are integrable on the interval Rπ (−π, π) where the inner product is u, v = −π f(t)g(t)dt so Rπ 2 that kfk = −π f(t)2 dt. (a) Check that the (infinite) set S = 1, cos(t), cos(2t) . . . , sin(t), sin(2t), . . . is an orthogonal set. (Integrate twice by parts.) & % MS4105 ' 137 $ (b) Show that I can normalise the set S to get an orthonormal 0 set S = √1 , √1 cos(t), √1 cos(2t) . . . , √1 sin(t), √1 sin(2t), . . . π π π π 2π 3. 
Notice that I cannot say that S 0 is an orthonormal basis for V, the inner product space of real-valued functions that are integrable on the interval (−π, π) as I need to show that it spans V but I can show that it is linearly independent using Thm. 2.8 — why? & % MS4105 138 ' $ 4. The Fourier Series for a function f is ∞ ∞ X √ cos kt X sin kt αk √ + βk √ F(t) = α0 / 2π + π π k=1 k=1 √1 , f 2π √1 π cos kt, f and where α0 = , αk = 1 βk = √π sin kt, f . Is F(t) = f(t) for all t ∈ R for all real-valued functions that are integrable on the interval (−π, π)? Why/why not? 5. Can you state (without proof) what we can reasonably expect? See Appendix R for a formal statement. 6. In particular S 0 is not a basis for V, the inner product space of real-valued functions that are integrable on the interval (−π, π) as F is continuous while f need not be. & % MS4105 139 ' $ 7. The Fourier Series is usually written ∞ X F(t) = a0 /2 + ak cos kt + bk sin kt k=1 Rπ where ak = −π f(t) cos(kt) dt for k ≥ 0 and Rπ bk = −π f(t) sin(kt) dt for k ≥ 1. Show that for the “square wave” f(t) = 1, 0 < t ≤ π and f(t) = −1, −π ≤ t < 0 you have ak ≡ 0 and bk = 0 for k even and 4/(kπ) when k is odd. 8. So what is the value of F(0)? 9. The result in Appendix R says that a0 + lim N→∞ N X ak cos kt + bk sin kt = f(t) k=1 at all points in (−π, π) where f(t) is continuous. Where is f(t) not continuous? & % MS4105 140 ' $ 10. If you want to test your matlab or octave skills, try plotting N X k=1 bk sin kt ≡ N X 1 4 sin(2k − 1)t (2k − 1)π for a range of increasing values of N. You’ll get something like: & Figure 3: Fourier Series for Square Wave % MS4105 ' 141 $ 11. Let V be the vector space of continuous real-valued functions R1 on [0, 1]. Define (f, g) = 0 f(t)g(t)dt. I know that this is an inner product on V. Starting with the set {1, x, x2 , x3 }, use the Gram-Schmidt Orthogonalisation process to find a set of four polynomials in V which form an orthonormal set with respect to the chosen inner product. & % MS4105 ' 3 142 $ Complex Vector and Inner Product Spaces In this very short Chapter I extend the previous definitions and discussion of vector spaces and inner product spaces to the case where the scalars may be complex. I need to do this as our discussion of Matrix Algebra will include the case of vectors and matrices with complex components. The results that I establish here will apply immediately to these vectors and matrices. & % MS4105 143 ' 3.1 $ Complex Vector Spaces A vector space where the scalars may be complex is a complex vector space. The axioms in Def. 1.1 are unchanged. The definitions of linear independence, spanning, basis, dimension and subspace are unchanged. The most important example of a complex vector space is of course Cn . The standard basis e1 , . . . , en is unchanged from Rn . & % MS4105 144 ' 3.2 $ Complex Inner Product Spaces Definition 3.1 If u, v ∈ Cn then the complex Euclidean inner product u · v is defined by ¯n u · v = u1 ¯v1 + · · · + un v where for any z ∈ C, z¯ is the complex conjugate of z. & % MS4105 145 ' $ Compare this definition with (2.1) for the real Euclidean inner product . The Euclidean inner product of two vectors in Cn is in general a complex number so the positioning of the complex conjugate “bar” in the Definition is important as in general ¯ i vi (the first expression is the conjugate of the second). ¯ i 6= u ui v The rule in Def. 3.1 is that “The bar is on the second vector”. 
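A small Octave/Matlab illustration of Def. 3.1 may help; the vectors below are arbitrary. In Octave/Matlab the prime operator is the conjugate transpose, so v'*u puts the bar on the second vector of u . v, exactly as the rule requires.

u = [1; 1i; 2];  v = [1-1i; 3; 1i];   % arbitrary vectors in C^3
uv = sum(u .* conj(v))                % u . v as in Def. 3.1 (bar on the second vector)
vu = sum(v .* conj(u))                % v . u
disp([uv, conj(vu)])                  % u . v equals the conjugate of v . u
disp(v'*u)                            % the same value as u . v, via the conjugate transpose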
This will allow us to reconcile the rules for dot products in complex Euclidean n-space Cn with general complex inner product spaces — to be discussed below. & % MS4105 146 ' 3.2.1 $ Properties of the Complex Euclidean inner product It is useful to list the properties of the Complex Euclidean inner product as a Theorem: Theorem 3.1 If u, v, w ∈ Cn and if α ∈ C the (a) u · v = v · u (b) (u + v) · w = u · w + v · w (c) (αu) · v = α(u · v) (d) v · v ≥ 0 and v · v = 0 if and only if v = 0. Proof: Check . The first (part (a)) property is the only one that differs from the corresponding properties of any real inner product such as Euclidean Rn . & % MS4105 ' 147 $ I can now define a complex inner product space based on the properties of Cn . Definition 3.2 An inner product on a complex vector space V is a mapping from V × V to C that maps each pair of vectors u and v in V into a complex number u, v so that the following axioms are satisfied for all vectors u, v and w and all scalars α ∈ C: 1. u, v = v, u Symmetry Axiom 2. u + v, w = u, w + v, w Distributive Axiom 3. αu, v = α u, v Homogeneity Axiom 4. u, u ≥ 0 and u, u = 0 if and only if u = 0. Compare with the axioms for a real inner product space in Def. 2.1 — the only axiom to change is the Symmetry Axiom. & % MS4105 148 ' $ Some simple properties of complex inner product spaces (compare with Thm. 2.1 Theorem 3.2 If u,v and w are vectors in a complex inner product space and α ∈ C then (0 is the zero vector, not the number 0) (a) 0, v = v, 0 = 0 (b) u, v + w = u, v + u, w ¯ u, v . (c) u, αv = α Proof: Check . N.B. Property (c) tells us that I must take the complex conjugate of the coefficient of the second vector in an inner product. It is very easy to forget this! For example u, (1 − i)v = (1 + i) u, v . & % MS4105 149 ' 3.2.2 $ Orthogonal Sets The definitions of orthogonal vectors, orthogonal set, orthonormal set and orthonormal basis carry over to complex inner product spaces without change. All our Theorems (see Slide 156) on real inner product spaces still apply to complex inner product spaces and the Gram-Schmidt process can be used to convert an arbitrary basis for a complex inner product space into an orthonormal basis. & % MS4105 150 ' $ But I need to be careful now that I am dealing with complex scalars and vectors. A (very) simple example illustrates the pitfalls: Example 3.1 Let v = (1, i)T , u1 = (i, 0)T and u2 = (0, 1)T . The vectors u1 and u2 are clearly orthogonal as u1 · u2 = i¯0 + 0¯1 = 0. In fact they are an orthonormal set as u1 · u1 = i.¯i + 0.¯0 = 1 and u2 · u2 = 0.¯0 + 1.¯1 = 1. It is obvious by inspection that v = −iu1 + iu2 . Let’s try to derive this. Using the fact the the set {u1 , u2 } is orthonormal I know that I can write: v = α1 u1 + α2 u2 . & % MS4105 151 ' $ To calculate α1 and α2 just compute: u1 · v = u1 · (α1 u1 + α2 u2 ) = α1 u1 · u1 + α2 u1 · u2 = α1 (1) + α2 (0) = α1 . So α1 = u1 · v = (i, 0) · (1, i) = i.¯1 + 0.¯i = i. Similarly I find that α2 = u2 · v = (0, 1) · (1, i) = 0.¯1 + 1.¯i = −i. But I know that v = −iu1 + iu2 so α1 = −i and α2 = i. What’s wrong? The answer is on the next slide — try to work it out yourself first. & % MS4105 152 ' $ The flaw in the reasoning was as follows: u1 · v = u1 · (α1 u1 + α2 u2 ) ¯ 1 u1 · u1 + α ¯ 2 u1 · u2 . =α ¯ 1 = i so (The rest of the calculation goes as previously leading to α α1 = −i — the right answer. Similarly I correctly find that ¯ 2 = −i so α2 = i. α It is very easy to make this mistake — the result I need to remind ourselves of is ¯ u, v . 
u, αv = α (3.1) property (c) in Thm 3.2. & % MS4105 153 ' $ For this reason I will always write an orthonormal expansion for a vector v in a complex inner product space V as v= n X v, ui ui i=1 & % MS4105 154 ' $ Example 3.2 Given the basis vectors v1 = (i, i, i)T , v2 = (0, i, i)T and v3 = (0, 0, i)T , use the G-S algorithm to transform the set S = {v1 , v2 , v3 } into an orthonormal basis. • Set u1 = v1 = (i, i, i)T . • Set v2 , u1 u1 2 ku1 k 2 T = (0, i, i) − (i, i, i)T = (−2i/3, i/3, i/3)T 3 u2 = v2 − • Set v3 , u1 v3 , u2 u3 = v3 − u1 − u2 2 2 ku1 k ku2 k 1 1/3 T T = (0, 0, i) − (i, i, i) − (−2i/3, i/3, i/3)T 3 2/3 = (0, −i/2, i/2)T . & % MS4105 155 ' $ I can normalise the three vectors {u1 , u2 , u3 } by dividing each by its norm. 2 • ku1 k = u1 , u1 = |i|2 + |i|2 + |i|2 = 3. Set u1 = √13 (i, i, i)T . 2 • ku2 k = u2 , u2 = |2i/3|2 + |i/3|2 + |i/3|2 = 2/3. Set u2 = √ √3 (−2i/3, i/3, i/3)T . 2 • ku3 k = u3 , u3 = |0|2 + |i/2|2 + |i/2|2 = 1/2. Set √ u3 = 2(0, −i/2, i/2)T . 2 The vectors {u1 , u2 , u3 } form an orthonormal basis. The steps worked as for vectors in a real inner product space — I just need to be careful when taking the inner products. & % MS4105 ' 156 $ The following list of Theorems for real inner product spaces all still apply for complex inner product spaces : • Thm. 2.2 (the Cauchy-Schwarz inequality — note that now the expression | u, v | refers to the modulus, not the absolute value). Check that the alternative proof for the Cauchy-Schwarz inequality on Slide 114 can easily be adapted to prove the Theorem for a complex inner product space. • Thm. 2.3 and Thm. 2.4, • Thm. 2.6 (note that while for a real inner product space , I could write v = u1 , v u1 + · · · + un , v un , this is not correct for a complex inner product space as ui , v 6= v, ui ), • Thm. 2.7, Thm. 2.8 and Thm. 2.9 . & % MS4105 157 ' 3.2.3 $ Exercises 1. Consider the set of complex m × n matrices Cmn . Define the complex inner product consisting of the set Cmn where for any two matrices U and V I define the complex inner product X U, V = Uij V¯ij where the sum is over all the elements of the matrices. (a) Find the inner product of the matrices 0 i 1 U= and V = 1 1+i 0 −i 2i (b) Check that U, V is indeed a complex inner product for Cmn . (c) Find d(U, V) for the vectors U and V defined in part (a). & % MS4105 158 ' $ (d) Which of the following vectors are orthogonal to 2i i ? A= −i 3i −3 1 − i (a) 1−i 2 0 0 (c) 0 0 1 1 (b) 0 −1 0 1 . (d) 3−i 0 2. Let C3 have the standard Euclidean inner product. Use the Gram-Schmidt orthogonalisation process to transform each of the following bases into an orthonormal basis : (a) u1 = (i, i, i), u2 = (−i, i, 0), u3 = (i, 2i, i) (b) u1 = (i, 0, 0), u2 = (3i, 7i, −2i), u3 = (0, 4i, i) & % MS4105 159 ' $ 3. If α ∈ C and u, v is an inner product on a complex vector space, then: (a) prove that ¯ v, v ¯ u, v − α u, v + αα u − αv, u − αv = u, u − α (b) use the result in (a) to prove that ¯ v, v ≥ α ¯ u, v + α u, v . u, u + αα (c) prove the Cauchy-Schwarz inequality 2 | u, v | ≤ u, u v, v u,v by setting α = in (b). v,v & % MS4105 160 ' $ 4. Check that the parallelogram law (2.2) 2 2 2 ku + vk + ku − vk = 2 kuk + kvk 2 (3.2) still holds in any complex inner product space. 5. Prove that if u and v are vectors in a complex inner product space then: 1 i i 2 1 2 2 u, v = ku+vk − ku−vk + ku+ivk − ku−ivk2 . (3.3) 4 4 4 4 (This is the more general complex version of Q. 5.) 
The full (complex) version of the Jordan von Neumann Lemma) says that if I define an “inner product” using the formula in the previous question then provided the parallelogram law (2.2) holds for all u, v ∈ V then u, v as defined in Q. 5 satisfies all the axioms for a complex inner product space. & % MS4105 ' 161 $ 6. Prove that if {u1 , . . . , un } is an orthonormal basis for a complex inner product space V then if v and w are any vectors in V; v, w = v, u1 w, u1 + v2 , u2 w, u2 +· · ·+ v, un w, un . & % MS4105 ' 162 $ Part II Matrix Algebra • This is the second Part of the Course. • After reviewing the basic properties of vectors in Rn and of matrices I will introduce new ideas including unitary matrices, matrix norms and the Singular Value Decomposition. • Although all the ideas will be set in Rn I will cross-reference the more general vector space and inner product space ideas from Part I where appropriate. & % MS4105 ' 4 163 $ Matrices and Vectors • This Chapter covers material that will be for the most part familiar — but from a more sophisticated viewpoint. • For the most part I will not use bold fonts for vectors — but we will keep to the convention that upper case letters represent matrices and lower case letters represent vectors. • Let A be an m × n matrix and x an n-dimensional column vector and suppose that a vector b satisfies Ax = b. • I will always keep to the convention that vectors are simply matrices with a single column — so x is n × 1. & % MS4105 164 ' $ • Then Ax = b and the m-dimensional column vector b is given by n X bi = aij xj . (4.1) j=1 • If I use the summation convention discussed in Appendix A then I can write (4.1) as bi = aij xj . • In most of the course the entries of vectors and matrices may be complex so A ∈ Cm×n , x ∈ Cn and b ∈ Cm . • The mapping x → Ax is linear, so for any x, y ∈ Cn and any α∈C A(x + y) = Ax + Ay A(αx) & = αAx. % MS4105 165 ' $ • Correspondingly, every linear mapping from Cn to Cm can be represented as multiplication by an m × n matrix. • The formula Ax = b is ubiquitous in Linear Algebra. • It is useful — throughout this course — to interpret it as saying that If b = Ax, then the vector b can be written as a linear combination of the columns of the matrix A. • This follows immediately from the expression above for a matrix-vector product, which can be re-written b = Ax = n X xj aj . (4.2) j=1 where now aj is the jth column of the matrix A. & % MS4105 166 ' $ • Schematically: b = a1 a2 ... x1 x2 an .. = . xn x1 a1 + x2 a2 + · · · + xn an & (4.3) % MS4105 167 ' $ Similarly, in the matrix-matrix product B = AC, each column of B is a linear combination of the columns of A. If this isn’t obvious, just consider the familiar formula bij = m X aik ckj (4.4) k=1 and choose a particular column of B by fixing j. Clearly the jth column of the RHS is formed by taking a linear combination of the columns of A, where the coefficients are ckj with j fixed. The result follows. I can write it (analogous with (4.2)) as col j of B ≡ bj = Acj = m X ckj ak ≡ a lin. comb. of cols of A. k=1 (4.5) So bj is a linear combination of the columns ak of A with coefficients ckj . & % MS4105 168 ' $ Example 4.1 (Outer Product) A simple special case of a matrix-matrix product is the outer product of two vectors. Given an m-dimensional column vector u and an n-dimensional row vector vT I can (and should) regard u as an m × 1 matrix and vT as a 1 × n matrix. Then I need no new ideas to define the outer product by (uvT )ij = ui1 v1j or just ui vj . 
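The "columns" point of view in (4.2) and (4.5), and the outer product just defined, are easy to check numerically. A short Octave/Matlab sketch, with randomly chosen matrices used purely for illustration:

A = randn(4,3);  x = randn(3,1);  C = randn(3,2);
b = A*x;
disp(norm(b - (x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3))))  % (4.2): essentially zero
B = A*C;
disp(norm(B(:,2) - A*C(:,2)))    % (4.5): column j of B is A times column j of C
u = randn(4,1);  v = randn(3,1);
P = u*v.';                       % the outer product: a 4 x 3 matrix
disp(rank(P))                    % returns 1: every column of P is a multiple of u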
(I can regard u and v as m × 1 and n × 1 matrices or just as column vectors.) & % MS4105 169 ' $ The outer product can be written: h u v1 v2 ... i vn = v1 u which is of course equal to v u 1 1 .. . v 1 um ... ... v2 u vn u1 .. . v n um . v3 u ... v n u . The columns are all multiples of the same vector u (and check the rows are all multiples of the same vector vT ). & % MS4105 ' 170 $ Example 4.2 As another example, consider the matrix equation B = AR, where R is the upper-triangular n × n matrix with entries rij = 1 for i ≤ j and rij = 0 for i > j: 1 ... 1 . . . . .. R= 1 & % MS4105 171 ' $ This product can be written b1 b2 ... bn = a1 a2 ... 1 ... .. an . 1 .. . 1 The column formula (4.5) now gives bj = Arj = j X ak . k=1 In other words, the jth column of B is the sum of the first j columns of A. & % MS4105 172 ' 4.1 $ Properties of Matrices In this Section I review basic properties of matrices, many will be familiar. 4.1.1 Range and Nullspace Corresponding to any m × n matrix A I can define two sets which are respectively subspaces of Rm and Rn . Definition 4.1 (Range) Given an m × n matrix A, the range of a matrix A — written range(A), is the set of vectors y ∈ Rm that can be written y = Ax for some x ∈ Rn . Check that range(A) is a subspace of Rm . & % MS4105 ' 173 $ The formula (4.2) leads to the following characterisation of range(A): Theorem 4.1 range(A) is the vector space spanned by the columns of A. Proof: RTP that range(A) = span{a1 , . . . , an }. By (4.2), any Ax is a linear combination of the columns of A. Conversely, any vector y in the space spanned by the columns of A can be written as Pn y = j=1 xj aj . Forming a vector x from the coefficients xj , I have y = Ax and so y ∈ range(A). In the light of the result of Thm 4.1, it is reasonable to use the term column space as an equivalent for the term range. Exercise 4.1 What is the maximum possible value of dim range(A)? & % MS4105 ' 174 $ Definition 4.2 The nullspace of an m × n matrix A — written null(A) — is the set of vectors x ∈ Rn that satisfy Ax = 0. Check the entries of each vector x ∈ null(A) give the coefficients of an expansion of the zero vector as a linear combination of the columns of A. Check that null(A) is a subspace of Rn . & % MS4105 175 ' 4.1.2 $ Rank Definition 4.3 The column rank of a matrix is the dimension of its range (or column space). Similarly; Definition 4.4 The row rank of a matrix is the dimension of the space spanned by its rows. I will show later (Thm. 6.9) that the row rank and column rank of an m × n matrix are always equal so I can use the term rank to refer to either. & % MS4105 ' 176 $ Definition 4.5 An m × n matrix has full rank if its rank is the largest possible, namely the lesser of m and n. It follows that a full rank m × n matrix with (say) m ≥ n (a “tall thin” matrix) must have n linearly independent columns. I can show that the mapping defined by such a matrix is 1–1. Theorem 4.2 A matrix A ∈ Cm×n with m ≥ n has full rank if and only if it maps no two distinct vectors to the same vector. Proof: [→] If A has full rank then its columns are linearly independent, so they form a basis for range A. This means that every b ∈ range(A) has a unique linear expansion in terms of the columns of A and so, by (4.2), every b ∈ range(A) has a unique x such that b = Ax. & % MS4105 ' 177 $ [←] If A is not full rank then its columns aj are linearly dependent and there is a non-trivial linear combination s.t. Pn j=1 cj aj = 0. The vector c formed from the coefficients cj therefore satisfies Ac = 0. 
So A maps distinct vectors to the same vector as for any x, A(x + c) = Ax. & % MS4105 178 ' 4.1.3 $ Inverse I start with a non-standard definition of invertible matrices (which does not refer to matrix inverses). Definition 4.6 A square matrix A is non-singular (invertible) if A is of full rank. I will now show that this definition is equivalent to the standard one. The n columns aj , j = 1, . . . , n of a non-singular (full rank) n × n matrix A form a basis for the space Cn . So any vector in Cn has a unique expression as a linear combination of them. So the standard unit (basis) vector ei , i = 1, . . . , n can be expanded in terms of the aj , e.g. n X ei = zij aj . (4.6) j=1 & % MS4105 179 ' $ Let Z be the matrix whose ij entry is zij and let zj denote the jth Pn column of Z. Then (4.6) can be written ei = j=1 aj (ZT )ji . I can assemble the ej column vectors into a single matrix: e1 e2 ... T ≡ I = AZ en and I is of course the n × n identity matrix . So the matrix ZT is the inverse of A. (Any square non-singular matrix A has a unique inverse, written A−1 satisfying AA−1 = A−1 A = I.) & % MS4105 ' 180 $ If you found the preceding vector treatment confusing, an index notation version is as follows: Pn Pn • Rewrite ei = j=1 zij aj so eki = j=1 zij akj where eki is the element in the kth row of the (column) vector ei . • Re-ordering the factors on the RHS we have (taking the Pn transpose of Z) eki = j=1 akj ZTji which gives us the matrix equation I = AZT . I always have the option of reverting to index notation if it helps to derive a formula. Many algorithms can be explained in a very straightforward way using vector notation. You should try to become familiar with both. P (Finally, the summation convention allows us to drop the symbols in the above — whether you use it or not is up to you.) & % MS4105 ' 181 $ The following Theorem is included for reference purposes — you will have seen proofs in Linear Algebra 1. Theorem 4.3 For any real or complex n × n matrix A, the following conditions are equivalent: (a) A has an inverse A−1 (b) rank(A) = n (c) range(A) = Cn (d) null(A) = { 0 } (e) 0 is not an eigenvalue of A (f ) det(A) 6= 0 Proof: Check as many of these results as you can. (I have just shown that (b) implies (a).) Note: I will rarely mention the determinant in the rest of the course as it is of little importance in Numerical Linear Algebra. & % MS4105 182 ' 4.1.4 $ Matrix Inverse Times a Vector When I write x = A−1 b this is of course a matrix vector product. However; I will not (except perhaps in a pen and paper exercise) compute this matrix product. The inverse matrix A−1 is expensive to compute and is generally a means to the end of computing the solution to the linear system Ax = b. You should think of x here as being the unique vector that satisfies Ax = b and so x is the vector of coefficients of the (unique) linear expansion of b in the basis formed by the columns of the full rank matrix A. & % MS4105 ' 183 $ I can regard multiplication by A−1 as a change of basis operation; switching between: • regarding the vector b itself as the coefficients of the expansion of b in terms of the basis {e1 , . . . , en } • regarding A−1 b (≡ x) as the coefficients of the expansion of b in terms of the basis {a1 , . . . , an } & % MS4105 184 ' 4.2 $ Orthogonal Vectors and Matrices Orthogonality is crucial in Numerical Linear Algebra — here I introduce two ideas; orthogonal vectors (an instance of Def. 2.4 for inner product spaces) and orthogonal (unitary) matrices. 
Remember that the complex conjugate of a complex number z = x + iy is written z¯ or z∗ and that z¯ = x − iy. The hermitian conjugate or adjoint of an m × n matrix A, written A∗ is the n × m matrix whose i, j entry is the complex conjugate of the j, i ¯ ji ), the complex conjugate transpose. entry of A, so A∗ij = (A For example a11 A= a21 & a12 a22 ¯11 a a13 ⇒ A∗ = a ¯12 a23 ¯13 a ¯21 a ¯22 a ¯23 a % MS4105 ' 185 $ • Obviously (A∗ )∗ = A. • If A = A∗ I say that A is hermitian. • And of course, a hermitian matrix must be square. • If A is real then of course A∗ = AT , the transpose of A.) • A real hermitian matrix satisfies A = AT and is called symmetric. • The vectors and matrices we deal with from now on may be real — but I will use this more general notation to allow for the possibility that they are not. (In matlab/octave, A 0 is the hermitian conjugate A∗ and A· 0 is the transpose AT .) & % MS4105 ' 186 $ Remember our convention that vectors in Rn are by default column vectors — I can now say that vectors in Cn are by default column vectors and that (for example) 1−i i∗ h 1 + i = 1 + i 1 − i −2i 2i so that I can write the second to mean the first to avoid taking up so much space. & % MS4105 187 ' 4.2.1 $ Inner Product on Cn For the dot product (inner product ) of two column vectors x and y in Cn to be consistent with Def. 3.1 (the complex Euclidean Inner Product) I need to write: x · y = y∗ x = n X xi y ¯i . (4.7) i=1 This looks strange (the order of x and y are reversed) but remember that I are writing all operations, even vector-vector operations, in matrix notation. The two terms in blue correspond to Def. 3.1. & % MS4105 188 ' $ The term in green is (by definition of the hermitian conjugate or adjoint “∗” operation) x 1 h i x2 y∗ x = y ¯1 y ¯2 . . . y ¯n . . . xn which I can summarise as: y∗ x = n X xi y ¯i . i=1 & % MS4105 ' 189 $ This is rarely an issue in Numerical Linear Algebra as, rather than appeal to general results about complex inner product spaces , I will use matrix algebra to prove the results that I need. Confusingly, often writers (such as Trefethen) refer to x∗ y as the complex inner product on Cn . This is clearly not correct but usually doesn’t matter! From now on in these notes all products will be matrix products so the issue will not arise. & % MS4105 190 ' $ The Euclidean length or norm (which I will discuss later in a more general context) is: X 12 X 21 n n √ ¯i xi kxk = x∗ x = x = |xi |2 . (4.8) i=1 i=1 Again, the natural definition of the cosine of the angle θ between x and y is x∗ y cos θ = (4.9) kxkkyk y∗ x based on (2.6) (Strictly speaking this should be cos θ = kxkkyk and (4.7). But, as noted above, the difference is rarely important.) The Cauchy-Schwarz inequality (2.7) ensures that | cos θ| ≤ 1 but cos θ can be complex when the vectors x and y are complex. For this reason the angle between two vectors in Cn is usually only of interest when the angle is π/2, i.e. the vectors are orthogonal . & % MS4105 191 ' $ It is easy to check that the operation x∗ y is bilinear — linear in each vector separately (x1 + x2 )∗ y = x∗1 y + x∗2 y x∗ (y1 + y2 ) = x∗ y1 + x∗ y2 ¯ βx∗ y (αx)∗ (βy) = α & % MS4105 192 ' $ • I will often need the easily proved (check ) formula that for compatibly sized matrices A and B (AB)∗ = B∗ A∗ (4.10) • A similar formula for the inverse is also easily proved (check ) (AB)−1 = B−1 A−1 . (4.11) • The notation A−∗ is shorthand for (A∗ )−1 or (A−1 )∗ . • A very convenient fact: these expressions are equal — check . 
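A quick Octave/Matlab check of (4.10), (4.11) and the last bullet point above, using random complex matrices (invertible with probability one); recall that the prime operator is the hermitian conjugate.

n = 4;
A = randn(n) + 1i*randn(n);
B = randn(n) + 1i*randn(n);
disp(norm((A*B)' - B'*A'))             % (4.10): essentially zero
disp(norm(inv(A*B) - inv(B)*inv(A)))   % (4.11): essentially zero
disp(norm(inv(A') - inv(A)'))          % the two meanings of the inverse-adjoint agree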
& % MS4105 193 ' 4.2.2 $ Orthogonal vectors • I say that two vectors x and y are orthogonal if x∗ y = 0 — compare with Def. 2.4. • Two sets of vectors X and Y are orthogonal if every vector x in X is orthogonal to every vector y in Y. • A set of vectors S in Cn is said to be an orthogonal set if for all x, y in S, x 6= y ⇒ x∗ y = 0. & % MS4105 194 ' $ It isn’t necessary but I can re-derive the result Thm. 2.8 that the vectors in an orthogonal set are linearly independent. Theorem 4.4 The vectors in an orthogonal set S in Cn are linearly independent. Proof: Assume the contrary; then some non-zero vk ∈ S can be expressed as a linear combination of the rest: vk = n X ci vi . i=1,i6=k As vk 6= 0, v∗k vk ≡ kvk k2 > 0. But using the bilinearity of the x∗ y operation and the orthogonality of S v∗k vk = n X ci v∗k vi = 0 i=1,i6=k which contradicts the assumption that vk 6= 0. & % MS4105 ' 195 $ So, just as in a real inner product space (including Rn ) if an orthogonal set in Cn contains n vectors, it is a basis for Cn . Before re-stating the definition of the orthogonal projection of a vector in Cn onto a subspace W = {u1 , . . . , uk } ⊆ Cn I need to remind you that while in Rn x 0 y = y 0 x this is no longer the case in Cn : u∗ v 6= v∗ u but in fact u∗ v = (v∗ u)∗ ≡ (v∗ u) as v∗ u is a scalar — in general complex. & % MS4105 196 ' $ To simplify the algebra, suppose that Q = {q1 , . . . , qk } (k ≤ n) is an orthonormal set of vectors in Cn so that q∗i qi ≡ kqi k2 = 1. Let v be an arbitrary vector in Cn . I can give the definition of the orthogonal projection of the vector v onto Q: ProjQ v = k X q∗i v qi (4.12) i=1 q∗i v Notice that = v, qi so this definition is the same as Def. 2.10. As noted previously, if the vectors are complex, the order of qi and v in the matrix product/inner product does matter, unlike a real inner product like the Euclidean inner product on Rn . & % MS4105 197 ' $ (If the set Q = {q1 , . . . , qk } is orthogonal but not orthonormal then I write (4.12) as k X q∗i v ProjQ v = qi 2 kqi k i=1 which should be compared with (2.10).) The vector ProjQ⊥ v ≡ v − ProjQ v as in Def. 2.7 and, as I saw for a general complex inner product space ProjQ v is orthogonal to ProjQ⊥ v. (Check using Cn notation — not really necessary as whatever holds for a general complex inner product space holds for Cn with the complex Euclidean inner product — but it is good practice.) & % MS4105 198 ' $ If k = n then Q = {qi } is a basis for Cn so v = ProjQ v. Note that I can write (4.12) for ProjQ v as: v= k X (q∗i v)qi (4.13) (qi q∗i )v. (4.14) i=1 or v= k X i=1 These two expansions are clearly equal as q∗i v is a scalar but have different interpretations. The first expresses v as a linear combination of the vectors qi with coefficients q∗i v. The second expresses v as a sum of orthogonal projections of v onto the directions {q1 , . . . , qn }. The ith projection operation is performed by the rank-one matrix qi q∗i . I will return to this topic in the context of the QR factorisation in Chapter 5.2. & % MS4105 199 ' 4.2.3 $ Unitary Matrices Definition 4.7 A square matrix Q ∈ Cn×n is unitary (for real matrices I usually say orthogonal) if Q∗ = Q−1 , i.e. if Q∗ Q = I. (As the inverse is unique this means that also QQ∗ = I.) In terms of the columns of Q, this product may be written — q∗1 — 1 — q∗2 — 1 q1 q2 . . . qn = . .. .. . . — q∗n — 1 In other words, q∗i qj = δij and the columns of a unitary matrix Q form an orthonormal basis for Cn . 
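A short Octave/Matlab sketch of Definition 4.7. Here I manufacture a unitary Q by taking the Q factor of a random complex matrix with the built-in qr function (the QR factorisation itself is discussed in detail later in the course); this is just one convenient way to obtain a unitary matrix for experiments.

n = 4;
[Q, R] = qr(randn(n) + 1i*randn(n));   % Q is unitary, R upper triangular
disp(norm(Q'*Q - eye(n)))              % essentially zero, so Q*Q = I
disp(Q(:,1)'*Q(:,2))                   % distinct columns are orthogonal
disp(Q(:,3)'*Q(:,3))                   % each column is a unit vector (equals 1)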
(The symbol δij is the Kronecker delta equal to 1 if i = j and to 0 if i 6= j.) & % MS4105 200 ' 4.2.4 $ Multiplication by a Unitary Matrix On Slide 183 I discussed the interpretation of matrix-vector products Ax and A−1 b. If A is a unitary matrix Q, these products become Qx and Q∗ b — the same interpretations are still valid. As before, Qx is the linear combination of the columns of Q with coefficients in x. Conversely, Q∗ b is the vector of coefficients of the expansion of b in the basis of the columns of Q. Again I can regard multiplying by Q∗ as a change of basis operation; switching between: • regarding b as the coefficients of the expansion of b in terms of the basis {e1 , . . . , en } • regarding Q∗ b as the coefficients of the expansion of b in terms of the basis {q1 , . . . , qn } & % MS4105 201 ' 4.2.5 $ A Note on the Unitary Property I know that for any invertible matrix A, AA−1 = A−1 A (check ). It follows that if Q is unitary, so is Q∗ . The argument is that I know that Q∗ = Q−1 but QQ−1 = Q−1 Q. so QQ∗ = Q∗ Q = I. The latter equality means that Q∗ is unitary. The process of multiplying by a unitary matrix or its adjoint (also unitary) preserves inner products. This follows as (Qx)∗ (Qy) = x∗ Q∗ Qy = x∗ y (4.15) where I used the identity (4.10). It follows that angles are also preserved and so are lengths: kQxk = kxk. & (4.16) % MS4105 202 ' 4.2.6 $ Exercises 1. Show the “plane rotation matrix” cos(θ) sin(θ) R= − sin(θ) cos(θ) is orthogonal and unitary. 2. Show that (AB)∗ = B∗ A∗ for any matrices A and B whose product make sense. (Sometimes called compatible matrices: two matrices with dimensions arranged so that they may be multiplied. The number of columns of the first matrix must equal the number of rows of the second.) 3. Show that the product of two unitary matrices is unitary. 4. (Difficult) Show that if a matrix is triangular and unitary then it is diagonal. & % MS4105 203 ' $ 5. Prove the generalised version of Pythagoras’ Theorem: that for a set of n orthonormal vectors x1 , . . . , xn ; n n X 2 X xi = kxi k2 . i=1 i=1 6. Show that the eigenvalues of a complex n × n hermitian matrix are real. 7. Show the eigenvectors (corresponding to distinct eigenvalues) of a complex n × n hermitian matrix are orthogonal. 8. What general properties do the eigenvalues of a unitary matrix have? & % MS4105 ' 204 $ 9. Prove that a skew-hermitian (S∗ = −S) matrix has pure imaginary eigenvalues. 10. Show that if S is skew-hermitian then I − S is non-singular. 11. Show that the matrix Q = (I − S)−1 (I + S) is unitary. (Tricky.) 12. Using the above results, can you write a few lines of Matlab/Octave that generate a random unitary matrix? 13. If u and v are vectors in Cn then let A = I + uv∗ — a rank-one perturbation of I. Show that A−1 = I + αuv∗ and find α. & % MS4105 205 ' 4.3 $ Norms Norms are measures of both size and distance. I will study first vector, then matrix norms. 4.3.1 Vector Norms In this Chapter I have already informally defined the Euclidean norm in (4.8). Also in Part I I defined the induced norm corresponding to a given inner product in a general inner product space in (2.2). So the following definition contains nothing new — but is useful for reference. & % MS4105 206 ' $ Definition 4.8 A norm on Cn is a function from Cn to R that satisfies (for all x, y in Cn and for all α in C ) 1. kxk ≥ 0 and kxk = 0 if and only if x = 0, 2. kx + yk ≤ kxk + kyk, Triangle Inequality 3. kαxk = |α|kxk. (Compare with the general definition for an inner product space Def. 
2.2 and the properties proved in Thm. 2.3 based on that definition.) & % MS4105 207 ' $ A norm need not be defined in terms of an inner product as I will see; any function that satisfies the three requirements in Def. 4.8 qualifies as a measure of size/distance. The most important class of vector norms are the p-norms: kxk1 = n X |xi | (4.17) i=1 kxk2 = n X ! 12 |xi | 2 = √ x∗ x (4.18) 0 < p... (4.19) i=1 kxkp = n X ! p1 |xi |p i=1 kxk∞ = max |xi |. 1≤i≤n & (4.20) % MS4105 208 ' $ Another useful vector norm is the weighted p-norm. For any norm k · k, kxkW = kWxk (4.21) where W is an arbitrary non-singular matrix. The most important vector norm is the (unweighted) 2-norm. & % MS4105 209 ' 4.3.2 $ Inner Product based on p-norms on Rn /Cn It is interesting to note that the Parallelogram Law (2.2) does not hold for a p–norm, for p 6= 2. (For example, check that if p = 1, u = (1, 0)T and v = (0, 1)T , the equality does not hold.) (I first looked at the issue of whether the Parallelogram Law always holds back in Exercise 4 on Slide 123.) Exercise 4.2 Show that for any p > 0, when u = (1, 0)T and v = (0, 1)T , then the lhs in (2.2) is 21+2/p and the rhs is 4. Show that lhs=rhs if and only if p = 2. (What about k · k∞ ?) So I cannot derive an inner product u, v from a p-norm using (2.9) unless p = 2 — the Euclidean norm. & % MS4105 210 ' 4.3.3 $ Unit Spheres For any choice of vector norm, the (closed) unit sphere is just {x|kxk = 1}. (The term unit ball is used to refer to the set {x|kxk ≤ 1} — the set bounded by the unit sphere.) It is interesting to sketch the unit sphere for different vector norms in R2 — see Fig. 4 on the next Slide. & % MS4105 211 ' $ kxk2 = 1 (0, 1) kxkp = 1, p > 1 kxk∞ = 1 (1, 0) (−1, 0) kxkp = 1, p < 1 kxk1 = 1 (0, −1) Figure 4: Unit spheres in R2 in different norms & % MS4105 ' 212 $ 1. kxk1 = 1 ≡ |x1 | + |x2 | = 1. By examining each of the four possibilities (x1 ≥ 0, x2 ≥ 0), . . . , (x1 ≤ 0, x2 ≤ 0) in turn I see that the unit sphere for this norm is a diamond-shaped region with vertices at (0, 1), (1, 0), (0, −1) and (−1, 0). 2. kxk2 = 1 ≡ x21 + x22 = 1, the equation of a (unit) circle, as expected. 3. kxk∞ = 1 ≡ max{|x1 |, |x2 |} = 1 — “the larger of |x1 | or |x2 | is equal to 1”. • If |x1 | ≤ |x2 | then kxk∞ = |x2 | = 1 or x2 = ±1 — two horizontal lines. • If |x1 | ≥ |x2 | then kxk∞ = |x1 | = 1 or x1 = ±1 — two vertical lines. So the unit sphere with the ∞-norm is a unit square centred at the origin with vertices at the intersections of the four lines indicated; (1, 1), (−1, 1), (−1, −1) and (1, −1). & % MS4105 ' 213 $ 4. kxkp = 1. For p > 1 this is a closed “rounded square” or “squared circle” — check — see Fig. 4. 5. kxkp = 1. For p < 1 I get a “cross-shaped” figure — check — see Fig. 4. (Use Maple/Matlab/Octave to check the plot of kxkp = 1 for p > 1 and p < 1 graphically first if you like but it is easy to justify the plots using a little calculus. Hint: just check for the first quadrant and use the symmetry about the x– and y– axes of the definition of kxk to fill in the other three quadrants.) & % MS4105 214 ' 4.3.4 $ Matrix Norms Induced by Vector Norms Any m × n matrix can be viewed as a vector in an mn-dimensional space (each of the mn components treated as a coordinate) and I could use any p-norm on this vector to measure the size of the matrix. The main example of this is the “Frobenius norm” which uses p = 2; Definition 4.9 (Frobenius Norm) 12 m X n X |aij |2 . 
kAkF = i=1 j=1 & % MS4105 215 ' $ It is usually more useful to use induced matrix norms, defined in terms of one of the vector norms already discussed. Definition 4.10 (Induced Matrix Norm — Informal) Given a choice of norm k · k on Cm and Cn the induced matrix norm of the m × n matrix A, kAk, is the smallest number b such that the following inequality holds for all x ∈ Cn . kAxk ≤ bkxk. (4.22) kAxk In other words, of all upper bounds on the ratio , I define kxk kAk to be the smallest (least) such upper bound. & % MS4105 ' 216 $ A Note on the Supremum Property 1 • A (possibly infinite) bounded set S of real numbers obviously has an upper bound, say b. • Any real number greater than b is also an upper bound for S. • How do I know that there is a least upper bound for S? • It seems obvious ( actually a deep property of the real numbers R) that any bounded set of real numbers has a least upper bound or supremum. • The supremum isn’t always contained in the set S! • The following three slides summarise what you need to know. & % MS4105 ' 217 $ A Note on the Supremum Property 2 • For example the set S = [0, 1) (all x s.t. 0 ≤ x < 1) certainly has upper bounds such as 1, 2, 3, π, . . . . • The number 1 is the least such upper bound — the supremum of S. • How do I know that 1 is the least upper bound? – Any number less than 1, say 1 − ε, cannot be an upper bound for S as the number 1 − ε/2 is in S but greater than 1 − ε! – Any number greater than 1 cannot be the least upper bound as 1 is an upper bound. • Notice that the supremum 1 is not a member of the set S = [0, 1). & % MS4105 ' 218 $ A Note on the Supremum Property 3 • Why not just use the term “maximum” rather than “supremum”? • Because every bounded set has a supremum. • And not all bounded sets have a maximum. • For example, our set S = [0, 1) has supremum 1 as I have seen. • What is the “largest element” of S?????? • Finally, it should be obvious that for any set S of real numbers, while the supremum (s say) may not be an element of S, every element of S is less than or equal to s. & % MS4105 ' 219 $ A Note on the Supremum Property 4 • The payoff from the above is that I now have a general method for calculating vector norms: – If (for any vector norm), I can show that 1. kAxk ≤ b for all unit vectors x (in the chosen vector norm). 2. kAx◦ k = b for some specific unit vector x◦ . – Then kAk = b as b must be the supremum of kAxk over all unit vectors x ∗ Because b is an upper bound ∗ And there cannot be a smaller upper bound (why?). • I often say that “the bound is attained by x◦ , i.e. kAx◦ k = b so b is the supremum”. • I will use this technique to find formulas for various induced p–norms in the following slides. & % MS4105 220 ' $ • So kAk is the supremum (the least upper bound) of the ratios kAxk n over all non-zero x ∈ C — the maximum factor by kxk which A can stretch a vector x. • The sloppy definition is Definition 4.11 (Induced Matrix Norm — Wrong) Given a choice of norm k · k on Cm and Cn the induced matrix norm of the m × n matrix A is kAxk max kxk6=0 kxk (4.23) • The subtly different (and correct) definition is Definition 4.12 (Induced Matrix Norm — Correct) Given a choice of norm k · k on Cm and Cn the induced matrix norm of the m × n matrix A is & kAxk sup kxk6=0 kxk (4.24) % MS4105 ' 221 $ • I say for any m × n matrix A that kAk is the matrix norm “induced” by the vector norm kxk. 
• Strictly speaking I should use a notation that reflects the fact that the norms in the numerator and the denominator are of vectors in Cm and Cn respectively. • This never causes a problem as it is always clear from the context what vector norm is being used. • Because kαxk = |α|kxk, the ratio kAxk kxk is does not depend on the norm of x so the matrix norm is often defined in terms of unit vectors. & % MS4105 222 ' $ So finally: Definition 4.13 (Induced Matrix Norm) For any m × n matrix A and for a specific choice of norm on Cn and Cm , the induced matrix norm kAk is defined by: kAk = = kAxk x∈Cn ,x6=0 kxk sup sup kAxk. (4.25) (4.26) x∈Cn ,kxk=1 & % MS4105 223 ' $ Before I try to calculate some matrix norms, a simple but very important result: Lemma 4.5 (Bound for Matrix Norm) For any m × n matrix A and for any specific choice of norm on Cn and Cm , for all x ∈ Cn kAxk ≤ kAkkxk. (4.27) Proof: I simply refer to Def 4.13: for any non-zero x ∈ Cn , kAxk kAyk ≤ sup ≡ kAk. kxk kyk y6=0 So kAxk ≤ kAkkxk. & % MS4105 224 ' $ 1 Example 4.3 The matrix A = 0 2 maps R2 to R2 . 2 Using the second version (4.26) of the definition of the induced matrix norm , let’s calculate the effect of A on the unit sphere in R2 for various p-norms. First I ask what the effect of A on the unit vectors e1 and e2 is (they are unit vectors in all norms); obviously e1 ≡ (1, 0)∗ → (1, 0)∗ ≡ (1, 0)T e2 ≡ (0, 1)∗ → (2, 2)∗ ≡ (2, 2)T . and of course −e1 ≡ (−1, 0)∗ → (1−, 0)∗ ≡ (−1, 0)T −e2 ≡ (0, −1)∗ → (−2, −2)∗ ≡ (−2, −2)T . & % MS4105 225 ' $ Now consider the unit ball for each norm in turn: • In the 1-norm, the diamond-shaped unit ball (see Fig. 4) is mapped into the parallelogram in Fig. 5:with vertices (1, 0), (2, 2), (−1, 0) and (−2, −2). The unit vector x that is magnified most by A is (0, 1)∗ or its negative and the magnification factor is 4 in the 1–norm. Write X x x + 2y = A = Y y 2y & % MS4105 226 ' $ (2, 2) L2 : Y = 2/3(X + 1), X ≥ 0 Y ↑ L1 : Y = 2(X − 1) (0, 2/3) L3 : Y = 2/3(X + 1), X ≤ 0 (−1, 0) (1, 0) X → (0, −2/3) (−2, −2) Figure 5: Image of unit disc in 1-norm under multiplication by A & % MS4105 ' 227 $ Now examine the parallelogram in the X–Y plane in Fig. 5: – On leg L1 , X and Y are both non-negative so kAxk1 = X + Y = X + 2(X − 1) = 3X − 2, 1 ≤ X ≤ 2 so kAxk1 ≤ 4. – On leg L2 , X and Y are both non-negative so kAxk1 = X + Y = X + 2/3(X + 1) = 5X/3 + 2/3, 0 ≤ X ≤ 2 so kAxk1 ≤ 4 again. – Finally on leg L3 , X ≤ 0 and Y is non-negative so kAxk1 = −X + Y = −X + 2/3(X + 1) = 2/3 − X/3, −1 ≤ X ≤ 0 and on this leg, kAxk1 ≤ 1. By the symmetry of the diagram, the results for the lower part of the parallelogram will be the same. So kAxk1 ≤ 4. So as 4 is the least upper bound on kAxk1 over unit vectors x in the 1–norm, I have kAk1 = 4. & % MS4105 228 ' $ • In the 2-norm, the unit ball (a unit disc) in the x–y plane is mapped into the ellipse X2 − 2XY + 5/4Y 2 = 1 in the X–Y plane. containing the points (1, 0) and (2, 2). It is a nice exercise to check that the unit vector that is magnified most by A is the vector (cos θ, sin θ) with tan(2θ) = −4/7. This equation has multiple solutions as tan is periodic with period π. Check that θ = 1.3112 corresponds to the largest value for kAxk. The corresponding point on the unit circle in the x–y plane is approximately (0.2567, 0.9665). Check that the corresponding magnification factor is approximately 2.9208. (It √ must be at least 8 ≈ 2.8284, as (0, 1)∗ maps to (2, 2)∗ .) 
So: kAk2 ≈ 2.9208 & % MS4105 229 ' $ I can plot the image of the unit disc based on the following: 1 A= 0 2 , 2 −1 A 1 = 0 −1 . 1/2 X−Y X x = A−1 = . 1 Y y 2Y So the unit circle 1 = x 2 + y2 1 = (X − Y)2 + Y 2 4 transforms into the ellipse X2 − 2XY + 5/4Y 2 = 1. On the next Slide I plot this ellipse with (1, 0) and (2, 2) shown. & % MS4105 230 ' $ 2.0 1.6 1.2 Y 0.8 0.4 0.0 −1 −2 X 0 1 2 −0.4 −0.8 −1.2 −1.6 −2.0 Figure 6: Image of unit disk in 2-norm under multiplication by A & % MS4105 ' 231 $ That was a lot of algebra to get the norm of a 2 × 2 matrix — there must be a better way; there is! More later. & % MS4105 232 ' $ • In the ∞-norm, the square “unit ball” with vertices (1, 1), (−1, 1), (−1, −1) and (1, −1) is transformed into the parallelogram with vertices (3, 2), (1, 2), (−3, −2) and (−1, −2). It is easy to check that the points with largest ∞-norm on this parallelogram are ±(3, 2) with magnification factor equal to 3. So kAk∞ = 3. & % MS4105 233 ' $ Example 4.4 (The p-norm of a diagonal matrix) Let D = diag(d1 , . . . , dn ). • Then (similar argument to that preceding Fig. 6) the image of Pn the n-dimensional 2-norm unit sphere i=1 x2i = 1 under D is n X X2i T −2 just the n-dimensional ellipsoid X D X = 1 or = 1. 2 d i=1 i • The semiaxis lengths (maximum values of each of the Xi ) are |di |. • The unit vectors magnified most by D are those that are mapped to the longest semiaxis of the ellipsoid, of length max{|di |}. • Therefore, kDk2 = sup kDxk = sup kXk = max |di |. kxk=1 kxk=1 1≤i≤n • Check that this result holds not just for the 2-norm but for any p-norm (p ≥ 1) when D is diagonal. & % MS4105 234 ' $ I will derive a general “formula” for the 2-norm of a matrix in the next Chapter — the 1– and ∞–norm are easier to analyse. Example 4.5 (The 1-norm of a matrix) If A is any m × n matrix then kAk1 is the “maximum column sum” of A. I argue as follows. First write A in terms of its columns A= a1 a2 ... an where each aj is an m-vector. Consider the (diamond-shaped for n = 2) 1-norm unit ball in Cn . Pn n This is the set B = {x ∈ C : j=1 |xj | ≤ 1}. & % MS4105 235 ' $ Any vector Ax in the image of this set must satisfy X n X n kAxk1 = xj aj |xj |kaj k1 ≤ max kaj k1 . ≤ 1≤j≤n j=1 j=1 1 where the first inequality follows from the Triangle Inequality and the second from the definition of the unit ball B. So the induced matrix 1-norm satisfies kAxk1 ≤ max kaj k1 . By choosing x = eJ , 1≤j≤n where J is the index that maximises kaj k1 I have kAxk1 = kaJ k1 (“I can attain this bound”). So any number less than kaJ k1 cannot be an upper bound as kAeJ k would exceed it. But the matrix norm is the least upper bound (supremum) on kAxk1 over all unit vectors x in the 1–norm and so the matrix norm is kAk1 = max kaj k1 1≤j≤n the maximum column sum. (This reasoning is tricky but worth the effort.) & % MS4105 236 ' $ Example 4.6 (The ∞-norm of a matrix) Using a similar argument check that the ∞-norm of an n × m matrix is the “maximum row sum” of A: kAk∞ = max ka∗i k1 . 1≤i≤n where a∗i stands for the ith row of A. (Hint: x = (±1, ±1, . . . , ±1)T attains the relevant upper bound and kxk∞ = 1 where the ± signs are chosen to be the same as the signs of the corresponding components of a∗I where I is the index that maximises ka∗i k1 . ) Check that this sneaky choice of x ensures that kAxk∞ = ka∗I k1 . ( If the matrix A is complex the definition of x is slightly more complicated but the reasoning is the same, as is the result.) 
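These formulas are easy to check numerically: the built-in norm function in Matlab/Octave computes both vector p-norms and the induced matrix norms, so the max-column-sum and max-row-sum characterisations can be compared directly. (A sketch only — the test matrix below is arbitrary.)

% Compare the built-in induced norms with the formulas of Examples 4.5 and 4.6.
A = randn(4, 3);                           % any test matrix will do
disp(norm(A, 1)   - max(sum(abs(A), 1)))   % 1-norm  = max column sum: ~ 0
disp(norm(A, Inf) - max(sum(abs(A), 2)))   % oo-norm = max row sum:    ~ 0
disp(norm(A, 2))                           % 2-norm: no simple row/column formula
x = randn(3, 1);
disp(norm(A*x) <= norm(A)*norm(x))         % Lemma 4.5 in the 2-norm: prints 1 (true)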
Stopped here 16:00, Monday 20 October & % MS4105 237 ' $ Exercise 4.3 Check that the max column sum & maximum row sum formulas just derived for the 1–norm and of a matrix ∞–norm 1 give the correct results for the matrix A = 0 Exercise 4.4 1 2 2 5 B= 6 3 9 −2 & 2 discussed above. 2 Repeat the calculations for the 4 × 3 matrix 7 1 and check using Matlab/Octave. −1 4 % MS4105 238 ' $ I proved the Cauchy-Schwarz inequality (2.7) for a general real or complex inner product space. The result for the standard Euclidean inner product on Cn is just |x∗ y| ≤ kxk2 kyk2 . I can apply the C-S inequality to find the 2-norm of some special matrices. Example 4.7 (The 2-norm of a row vector) Consider a matrix A containing a single row. I can write A = a∗ where a is a column vector. The C-S inequality allows us to find the induced 2-norm. For any x I have kAxk2 = |a∗ x| ≤ kak2 kxk2 . The bound is “tight” as kAak2 = kak22 . So I have kAk2 = sup {kAxk2 /kxk2 } = kak2 . x6=0 & % MS4105 239 ' $ Example 4.8 (The 2-norm of an outer product) Let A be the rank-one outer product uv∗ where u ∈ Cm and v ∈ Cn . For any x ∈ Cn , I can use the C-S inequality to bound kAxk2 by kAxk2 = kuv∗ xk2 = kuk2 |v∗ x| ≤ kuk2 kvk2 kxk2 . Then kAk2 ≤ kuk2 kvk2 . In fact the inequality is an equality (consider the case x = v/kvk) so kAk2 = kuk2 kvk2 . & % MS4105 ' 5 240 $ QR Factorisation and Least Squares In this Chapter I will study the QR algorithm and the related topic of Least Squares problems. The underlying idea is that of orthogonality. • I will begin by discussing projection operators. • I will then develop the QR factorisation, our first matrix factorisation. (SVD will be the second.) • Next I revisit the Gram-Schmidt orthogonalisation process (2.9) in the context of QR. • The Householder Triangularisation will then be examined as a more efficient algorithm for implementing the QR factorisation. • Finally I will apply these ideas to the problem of finding least squares fits to data. & % MS4105 241 ' 5.1 $ Projection Operators • A projection operator is a square matrix P that satisfies the simple condition P2 = P. I’ll say that such a matrix is idempotent. • A projection operator can be visualised as casting a shadow or projection Pv of any vector v in Cm onto a particular subspace. • If v ∈ range P then “visually” v lies in its own shadow. • Algebraically if v ∈ range P then v = Px for some x and Pv = P2 x = Px = v. • So (not unreasonably) a projection operator maps any vector in its range into itself. & % MS4105 242 ' $ • The operator (matrix) I − P is sometimes called the complementary projection operator to P. • Obviously P(I − P)v = (P − P2 )v = 0 for any v ∈ Cm so I − P maps vectors into the nullspace of P. • If P is a projection operator then so is I − P as (I − P)2 = I − 2P + P2 = I − P. • It is easy to check that for any projection operator P, range(I − P) = null(P) and null(I − P) = range(P). • Also null(P) ∩ range(P) = {0} as any vector v in both satisfies Pv = 0 and v − Pv = 0 and so v = 0. • So a projection operator separates Cn into two spaces (all subspaces must contain the zero vector so the intersection of any two subspaces will always contain 0). & % MS4105 ' 243 $ • On the other hand suppose that I have two subspaces S1 and S2 of Cm s.t. – S1 ∩ S2 = { 0 } and – S1 + S2 = Cm where S1 + S2 means the span of the two sets, i.e. S1 + S2 = {x ∈ Cm |x = s1 + s2 , s1 ∈ S1 , s2 ∈ S2 }. • Such a pair of subspaces are called complementary subspaces. 
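As a small numerical illustration of these definitions (the 2 x 2 matrix below is an arbitrary choice, used only as a sketch): its range and nullspace form a pair of complementary subspaces of R2, and the complementary projection operator recovers the other half of the splitting.

% P = [1 1; 0 0] satisfies P^2 = P (so it is a projection operator),
% although P is not symmetric.  Its range is span{(1,0)'} and its
% nullspace is span{(1,-1)'}.
P = [1 1; 0 0];
disp(norm(P*P - P))            % 0: idempotent
C = eye(2) - P;                % the complementary projection operator
disp(norm(C*C - C))            % 0: also idempotent
v = [3; 2];
disp([P*v, C*v])               % v = Pv + (I - P)v splits v between range P ...
disp(P*(C*v))                  % ... and null P:  P(I - P)v = 0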
& % MS4105 ' 244 $ • In R3 , you can visualise S1 as a plane aT x = 0 or equivalently αx + βy + γz = 0 with normal vector a = (α, β, γ)T and S2 = {s|s = tb, t ∈ R}, the line through the origin parallel to some vector b that need not be perpendicular to S1 . See the Figure: & % MS4105 245 ' $ a s2 = (I − P)v S2 = {x|x = tb, t ∈ R} v = s1 + s2 b s1 = Pv S1 = {x|aT x = 0} Figure 7: Complementary Subspaces S1 and S2 in R3 & % MS4105 ' 246 $ The following Theorem says that given two complementary subspaces S1 and S2 , I can define a projection operator based on S1 and S2 . Theorem 5.1 There is a projection operator P such that range P = S1 and null P = S2 . Proof: • Simply define P by Px = x for all x ∈ S1 and Px = 0 for x ∈ S2 . • Then if x ∈ S1 , P2 x = P(Px) = Px = x so P2 = P on S1 . • If x ∈ S2 , P2 x = P(Px) = P0 = 0 = Px (using the fact that 0 ∈ S1 and 0 ∈ S2 ) so P2 = P on S2 also. • Therefore P2 = P on Cm . • The range and nullspace properties follow from the definition of P. & % MS4105 247 ' $ • I say that P is the projection operator onto S1 along S2 . • The projection operator P and its complement are precisely the matrices that solve the decomposition problem: “Given v, find vectors v1 ∈ S1 and v2 ∈ S2 s.t. v1 + v2 = v”. • Clearly v1 = Pv and v2 = (I − P)v is one solution. • In fact these vectors are unique as any solution to the decomposition problem must be of the form (Pv + v3 ) + ((I − P)v − v3 ) = v where v3 is in both S1 and S2 so v3 = 0 as S1 ∩ S2 = 0. • Note that I don’t yet know how to compute the matrix P for a given pair of complementary subspaces S1 and S2 . & % MS4105 248 ' 5.1.1 $ Orthogonal Projection Operators • An orthogonal projection operator is one that projects onto a space S1 along a space S2 where S1 and S2 are orthogonal — i.e. s∗1 s2 = 0 for any s1 ∈ S1 and s2 ∈ S2 . • A projection operator that is not orthogonal is sometimes called oblique — the projection illustrated in Fig. 7 is oblique as bT s1 is not identically zero for all s1 ∈ S1 . • The Figure on the next Slide illustrates the idea — S1 is the plane (through the origin) whose normal vector is a and S2 is the line through the origin parallel to the vector a. • So any vector s1 in (the plane) S1 is perpendicular to any vector s2 parallel to the line S2 . & % MS4105 249 ' $ S2 = {x|x = ta, t ∈ R} v = s1 + s2 s2 = (I − P)v S1 = {x|aT x = 0} a s1 = Pv Figure 8: Orthogonal Subspaces S1 and S2 in R3 & % MS4105 250 ' $ • N.B. an orthogonal projection operator is not an orthogonal matrix! • I will show that it is in fact hermitian. • The following theorem links the geometrical idea of orthogonal projection operators with the hermitian property of complex square matrices. Theorem 5.2 A projection operator P is orthogonal iff P = P∗ . Proof: [→] Let P be a hermitian projection operator. RTP that the vectors Px and (I − P)y are orthogonal for all x, y ∈ Cm . Let x, y ∈ Cm and let s1 = Px and s2 = (I − P)y. Then s∗1 s2 = x∗ P∗ (I − P)y = x∗ (P − P2 )y = 0. So P is orthogonal. & % MS4105 251 ' $ [←] Suppose that P projects onto S1 along S2 where S1 ⊥ S2 and dim S1 ∩ S2 = 0. I have that range P = S1 and null P = S2 . RTP that P∗ = P. Let dim S1 = n. Then I can factor P as follows: • Let Q = {q1 , . . . , qm } be an orthonormal basis for Cm where Qn = {q1 , . . . , qn } is an orthonormal basis for S1 and Qm−n = {qn+1 , . . . , qm } is an orthonormal basis for S2 . • For j ≤ n I have Pqj = qj and for j > n I have Pqj = 0. • So PQ = q1 & q2 ... qn 0 ... 0 . % MS4105 252 ' $ • Therefore 1 ∗ Q PQ = Σ = .. . 1 0 .. . 
0 an m × m diagonal matrix with ones in the first n entries and zeroes elsewhere. So I have constructed a factorisation (an eigenvalue decomposition) for P, P = QΣQ∗ . It follows that P is hermitian. (This is also a SVD for P — discussed in detail in Ch. 6 .) & % MS4105 253 ' 5.1.2 $ Projection with an Orthonormal Basis I have just seen that an orthogonal projection operator has some singular values equal to zero (unless P = I) so it is natural to drop the “silent” columns in Q corresponding to the zero singular values and write ^Q ^∗ P=Q (5.1) ^ are orthonormal. The matrix Q ^ can be any where the columns of Q set of n orthonormal vectors — not necessarily from a SVD. Any ^Q ^ ∗ is a orthogonal (why?) projection operator onto the product Q ^ as column space of Q Pv = n X (qi q∗i )v, (5.2) i=1 a linear combination of the vectors qi . Check that this follows from (5.1) for P. & % MS4105 254 ' $ The complement of an orthogonal projection operator is also an ^Q ^ ∗ is hermitian. orthogonal projection operator as I − Q A special case of orthogonal projection operators is the rank-one orthogonal projection operator that isolates the component in a single direction q: Pq = qq∗ . General higher rank orthogonal projection operators are sums of Pqi ’s (see (5.2)). The complement of any Pq is the rank-m − 1 orthogonal projection operator that eliminates the component in the direction of q: P⊥q = I − qq∗ . & (5.3) % MS4105 255 ' $ Finally, q is a unit vector in the above. To project along a non-unit vector a, just replace q by a/kak giving: aa∗ Pa = ∗ a a aa∗ P⊥a = I − ∗ . a a & (5.4) (5.5) % MS4105 256 ' 5.1.3 $ Orthogonal Projections with an Arbitrary Basis I can construct orthogonal projection operators that project any vector onto an arbitrary not necessarily orthonormal basis for a subspace of Cm . • Suppose this subspace is spanned by a linearly independent set {a1 , . . . , an } and let A be the m × n matrix whose jth column is aj . • Consider the orthogonal projection vA = Pv of a vector v onto this subspace — the range of A (the space spanned by the vectors {a1 , . . . , an }). • Then the vector vA − v must be orthogonal to the range of A so a∗j (vA − v) = 0 for each j. & % MS4105 257 ' $ • Since vA ∈ range A, I can write vA = Ax for some vector x and I have a∗j (Ax − v) = 0 for each j or equivalently A∗ (Ax − v) = 0. • So A∗ Ax = A∗ v. • I know that, as A has full rank n (how do I know?), A∗ A is invertible. • Therefore x = (A∗ A)−1 A∗ v and finally I can write vA , the projection of v as vA = Ax = A(A∗ A)−1 A∗ v. • So the orthogonal projection operator onto the range of A is given by the formula P = A(A∗ A)−1 A∗ & (5.6) % MS4105 ' 258 $ • Obviously P is hermitian, as predicted by Thm. 5.2. • Note that (5.6) is a multidimensional generalisation of (5.4). ^ the (A∗ A) factor is just the • In the orthonormal case A = Q, identity matrix and I recover (5.1). & % MS4105 259 ' 5.1.4 $ Oblique (Non-Orthogonal) Projections Oblique projections are less often encountered in Numerical Linear Algebra but are interesting in their own right. One obvious question is: how do I construct a matrix P corresponding to the formula P = A(A∗ A)−1 A∗ for an orthogonal projection operator ? The details are interesting but involved. See App. H — this material is optional. & % MS4105 260 ' 5.1.5 $ Exercises 1. If P is an orthogonal projection operator then I − 2P is unitary. Prove this algebraically and give a geometrical explanation. (See Fig. 9.) 2. 
Define F to be the m × m matrix that “flips” (reverses the order of the elements of) a vector (x1 , . . . , xm )∗ to (xm , . . . , x1 )∗ . Can you write F explicitly? What is F2 ? 3. Let E be the m × m matrix that finds the “even part” of a vector in Cm so that Ex = (x + Fx)/2 where F was defined in the previous question. Is E an orthogonal projection operator ? Is it a projection operator ? 4. Given that A is an m × n (m ≥ n) complex matrix show that A∗ A is invertible iff A has full rank. (See the discussion before and after (4.6).) See App. J for a solution. & % MS4105 261 ' $ 5. Let S1 and S2 be the subspaces of R3 spanned by: B1 = (1, −1, −1)T , (0, 1, −2)T and B2 = (1, −1, 0)T respectively. Show that S1 and S2 are complementary. Is S2 orthogonal to S1 ? If so, calculate the projection operator onto S1 along S2 . If not, calculate it using the methods explained in App. H. (Optional). 6. Consider the matrices 1 A= 0 1 0 1 1 , B = 0 0 1 2 1 . 0 Answer the following by hand calculation: (a) What is the orthogonal projection operator P onto the range of A. & % MS4105 ' 262 $ (b) What is the image under P of the vector (1, 2, 3)∗ ? (c) Repeat the calculations for B. 7. (Optional) Can you find a geometrical interpretation for the Hermitian conjugate P∗ of an oblique (P∗ 6= P) projection ¯2 by using the fact that operator P? Hints: Show that P∗ v ∈ S for any v, w ∈ Cm ; w∗ (P∗ v) = (Pw)∗ v which is zero if w ∈ S2 . ¯ 1 . (S ¯1 is Use a similar trick to show that P∗ v = 0 for any v ∈ S the subspace of vectors orthogonal to all vectors in S1 .) So P∗ projects onto ??? along ???. Make a sketch to illustrate the above based on the case where S1 and S2 are both one-dimensional. & % MS4105 ' 263 $ 8. Show that for any projection operator P, kPk = kP∗ k. (Hint: remember/show that for any matrix M, MM∗ and M∗ M have the same eigenvalues — Thm. 6.12 — and then use the fact that kPk2 is largest eigenvalue of P∗ P and kP∗ k2 is largest eigenvalue of PP∗ .) 9. (Optional) (Part of the proof requires that you read Appendix H.) Let P ∈ Cm×m be a non-zero projection operator. Show that kPk2 ≥ 1 with equality iff P is an orthogonal projection operator. See App. K for a solution. & % MS4105 264 ' 5.2 $ QR Factorisation The following Section explains one of the most important ideas in Numerical Linear Algebra — QR Factorisation. 5.2.1 Reduced QR Factorisation The column spaces of a matrix A are the succession of spaces spanned by the columns a1 , a2 , . . . of A: span(a1 ) ⊆ span(a1 , a2 ) ⊆ span(a1 , a2 , a3 ) ⊆ . . . The geometric idea behind the QR factorisation is the construction of a sequence of orthonormal vectors qi that span these successive spaces. So q1 is just a1 /ka1 k, q2 is a unit vector ⊥ q1 that is a linear combination of a1 and a2 , etc. & % MS4105 265 ' $ For definiteness suppose that an m × n complex matrix A is “tall and thin” — i.e (n ≤ m) with full rank n. I want the sequence of orthonormal vectors q1 , q2 , . . . to have the property that span(q1 , q2 , . . . , qj ) = span(a1 , a2 , . . . , aj ), j = 1, . . . , n. I claim that this is equivalent to A = QR or schematically: r11 r12 . . . r22 a1 a2 . . . an = q1 q2 . . . qn .. . where the diagonal elements rkk are non-zero. & r1n .. . rnn (5.7) of the upper triangular matrix R % MS4105 266 ' $ Argue as follows: • If (5.7) holds, then each ak can be written as a linear combination of q1 , . . . qk and therefore the space span(a1 , . . . ak ) can be written as a linear combination of q1 , . . . qk and therefore is equal to span(q1 , . . . 
qk ). • Conversely span(q1 , q2 , . . . , qj ) = span(a1 , a2 , . . . , aj ) for each j = 1, . . . , n means that (for some set of coefficients rij with the rii non-zero) a1 = r11 q1 a2 = r12 q1 + r22 q2 .. . an = r1n q1 + r2n q2 + · · · + rnn qn . This is (5.7). & % MS4105 267 ' $ If I write (5.7) as a matrix equation I have ^R ^ A=Q ^ is m × n with n orthonormal columns and R ^ is n × n and where Q upper-triangular. This is referred to as a reduced QR factorisation of A. & % MS4105 268 ' 5.2.2 $ Full QR factorisation A full QR factorisation of an m × n complex matrix A (m ≥ n) ^ so that it becomes appends m − n extra orthonormal columns to Q an m × m unitary matrix Q. This is analogous to the “silent” columns in the SVD — to be discussed in Ch. 6. ^ rows of zeroes are appended In the process of adding columns to Q, ^ so that it becomes an m × n matrix — still upper-triangular. to R The extra “silent” columns in Q multiply the zero rows in R so the product is unchanged. Note that in the full QR factorisation, the silent columns qj of Q (for j > n) are orthogonal to the range of A as the range of A is spanned by the first n columns of Q. If A has full rank n, the silent columns are an orthonormal basis for the null space of A. & % MS4105 269 ' 5.2.3 $ Gram-Schmidt Orthogonalisation The equations above for ai in terms of qi suggest a method for the computation of the reduced QR factorisation. Given the columns of A; a1 , a2 , . . . I can construct the vectors q1 , q2 , . . . and the entries rij by a process of successive orthogonalisation — the Gram-Schmidt algorithm (Alg. 2.1). & % MS4105 270 ' $ Applying the G-S algorithm to the problem of calculating the qj and the rij I have: a1 r11 a2 − r12 q1 q2 = r22 a3 − r13 q1 − r23 q2 q3 = r33 .. . Pn−1 an − i=1 rin qi qn = rnn q1 = where the rij are just the components of each aj along the ith orthonormal vector qi , i.e. rij = q∗i aj for i < j and Pj−1 rjj = kaj − i=1 rij qi k in order to normalise the qj . & % MS4105 ' 271 $ Writing the algorithm in pseudo-code: Algorithm 5.1 Classical Gram-Schmidt Process (unstable) (1) (2) (3) (4) (5) (6) (7) (8) (9) for j = 1 to n vj = aj for i = 1 to j − 1 rij = q∗i aj vj = vj − rij qi end rjj = kvj k2 qj = vj /rjj end A matlab/octave implementation of this algorithm can be found at: http://jkcray.maths.ul.ie/ms4105/qrgs1.m. & % MS4105 272 ' 5.2.4 $ Instability of Classical G-S Algorithm As the note in the title suggests, the above algorithm is numerically unstable — although algebraically correct. Let’s see why. • First I need to explain “N–digit floating point” arithmetic — nothing new of course but in these examples I’ll need to be careful about how f.p. arithmetic is done. • For any real number x define fl(x) (“float of x”) to be the closest floating point number to x using whatever rounding rules are selected — to N digits. • So in 10–digit fp arithmetic, fl(π) = 3.141592653. • In 3–digit fp arithmetic, fl(π) = 3.14. • Also of course , fl(1 + 10−3 ) = 1 as 1.001 has to be rounded down to 1.00. & % MS4105 ' 273 $ Example 5.1 (CGS Algorithm Instability ) I’ll apply the CGS algorithm 5.1 above to three nearly equal vectors and show that CGS gives wildly inaccurate answers. I’ll work in 3-digit f.p. arithmetic. 1 1 −3 −3 • Take the three vectors a1 = 10 , a2 = 10 and 10−3 0 1 a3 = 0 as input. 10−3 • Check that they are linearly independent. • See the CGS calculation in App. S. • You’ll find that q∗2 q3 = 0.709. Not even close to orthogonal! 
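The failure is not an artefact of 3-digit arithmetic. Here is a sketch of Alg. 5.1 in ordinary double precision, applied to a matrix (chosen arbitrarily here) whose columns are nearly equal; the computed columns are again far from orthogonal.

% Classical G-S (Alg. 5.1) on nearly dependent columns, in double precision.
m = 50;  n = 10;
A = ones(m, n) + 1e-9 * randn(m, n);     % columns almost identical
Q = zeros(m, n);
for j = 1:n
    v = A(:, j);
    for i = 1:j-1
        r = Q(:, i)' * A(:, j);          % r_ij = q_i^* a_j  (note: a_j, not v)
        v = v - r * Q(:, i);
    end
    Q(:, j) = v / norm(v);
end
disp(norm(Q'*Q - eye(n)))                % typically O(1): orthogonality is lost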
& % MS4105 ' 274 $ • The example was deliberately constructed to “break” the CGS algorithm and of course Matlab/octave use 16–digit f.p. arithmetic, not three. The three vectors are almost equal (in 3–digit arithmetic) so you might expect problems. • I’ll show in the next Section that a modified version of the GS algorithm doesn’t fail, even on this contrived example. Here’s another example which doesn’t use 3–digit f.p. arithmetic. In the Example, ε is the smallest positive f.p number such that 1 + 2ε > 1. So 1 + ε = 1 in f.p. arithmetic and of course 1 + ε2 = 1. In Matlab/Octave ε = 12 εM ≈ 1.1102 10−16 . Exercise 5.1 Take a1 = (1, ε, 0, 0)∗ , a2 = (1, 0, ε, 0)∗ and a3 = (1, 0, 0, ε)∗ . Check that (using CGS) q∗2 q3 = 12 — it should of course be zero. I will improve on CGS in the next Section — for the present I will use it to discuss the QR algorithm further. & % MS4105 275 ' 5.2.5 $ Existence and Uniqueness Every m × n complex matrix A has a QR factorisation which is unique subject to some restrictions. The existence proof: Theorem 5.3 Every A ∈ Cm×n (m ≥ n) has a full QR factorisation and so also a reduced QR factorisation. Proof: • Suppose that A has full rank n and that I require a reduced QR factorisation. Then the G-S algorithm provides the proof ^ and as the algorithm generates orthonormal columns for Q ^ so that A = Q ^ R. ^ The algorithm can fail only if at entries for R some iteration vj = 0 and so cannot be normalised to produce qj . But this would imply that aj is in the span of q1 , . . . qj−1 contradicting the assumption that A has full rank. & % MS4105 ' 276 $ • Now suppose that A does not have full rank. Then at one or more steps j I will find that vj = 0 as aj can be expressed as a linear combination of fewer than n qi ’s. Now just pick qj to be any unit vector orthogonal to q1 , . . . , qj−1 and continue the G-S algorithm. • Finally the full rather than reduced QR factorisation of an m × n matrix A with m > n can be constructed by adding extra orthonormal vectors after the nth iteration. I just continue to apply the G-S algorithm for m − n more iterations to arbitrary vectors orthogonal to the column space of A. ^R ^ is a reduced QR Now for uniqueness. Suppose that A = Q ^ is multiplied by z and the ith factorisation. If the ith column of Q ^ is multiplied by z−1 for any z ∈ C s.t. |z| = 1 then the row of R ^R ^ is unchanged so I have another QR factorisation for A. product Q The next theorem states that if A has full rank then this is the only freedom in our choice of QR factorisations. & % MS4105 ' 277 $ Theorem 5.4 Every full rank m × n complex matrix A (m ≥ n) ^R ^ such that the has a unique reduced QR factorisation A = Q diagonal elements of R are all positive. ^ R, ^ the Proof: Again I use the GS algorithm. From A = Q ^ and the upper-triangularity of orthonormality of the columns of Q R it follows that any reduced QR factorisation of A can be generated by Alg. 5.1 — by the assumption of full rank the rjj are all non-zero so all the vectors qj , j = 1, . . . , n can be formed. The one degree of freedom is that in line 7 I made the arbitrary choice rjj = kvj k2 . As mentioned above, multiplying each qi by a different complex number zi (with unit modulus) and dividing the corresponding row of R by the same amount does not change the ^R ^ as the qi still have unit norm. The restriction rjj > 0 product Q means that the choice in line 7 is unique. 
& % MS4105 278 ' 5.2.6 $ Solution of Ax = b by the QR factorisation Suppose that I want to solve Ax = b for x where A is complex, m × m and invertible. If A = QR is a QR factorisation then I can write QRx = b or Rx = Q∗ b. The RHS is easy to compute once I know Q and the linear system is easy to solve (by back substitution) as R is upper triangular. So a general method for solving linear systems is: 1. Compute a QR factorisation for A; A = QR. 2. Compute y = Q∗ b. 3. Solve Rx = y for x. This method works well but Gaussian elimination uses fewer arithmetic operations. I will discuss this topic further in Chapter 7. & % MS4105 279 ' 5.2.7 $ Exercises 1. Consider again the matrices A and B in Q. 6 of Exercises 5.1.5. Calculate by hand a reduced and a full QR factorisation of both A and B. 2. Let A be a matrix with the property that its odd-numbered columns are orthogonal to its even-numbered columns. In a ^ R, ^ what particular structure reduced QR factorisation A = Q ^ have? See App. Q for a solution. will R 3. Let A be square m × m and let aj be its ith column. Using the full QR factorisation, give an algebraic proof of Hadamard’s m Y kaj k2 . (Hint: use the fact that inequality | det A| ≤ P j=1 aj = i rij qi then take norms and use the fact that P 2 kaj k = i |rij |2 because the vectors are orthonormal.) & % MS4105 280 ' $ 4. Check Hadamard’s inequality for a random (say) 6 × 6 matrix using matlab/octave. ^R ^ be a reduced 5. Let A be complex m × n , m ≥ n and let A = Q QR factorisation. ^ (a) Show that A has full rank n iff all the diagonal entries of R are non-zero. (Hints: Remember A full rank means that (in particular) the columns ai of A are linearly independent so Pn that i=1 αi ai = 0 ⇒ αi = 0, i = 1, . . . , n. Rewriting in ^ ∗Q ^ = In . So vector notation; Aα = 0 ⇒ α = 0. Note that Q ^ = 0. The problem reduces to showing that Aα = 0 iff Rα ^ Rα ^ = 0 ⇒ α = 0 iff all for an upper triangular matrix R, ^ are non-zero. Use the diagonal entries of R back-substitution to check this.) & % MS4105 281 ' $ ^ has k non-zero diagonal entries for some k (b) Suppose that R with 0 ≤ k < n. What can I conclude about the rank of A? (Is rank A = k? Or rank A = n − 1? Or rank A < n?) First try the following Matlab experiment: • Construct a tall thin random matrix A, say 20 × 12. • Use the built-in Matlab QR function to find the reduced QR factorisation of A: [q,r]=qr(A,0). • Check that q ∗ r = A, rank A = 12 and rank r = 12. • Now set two (or more) of the diagonal elements of r to zero – say r(4,4)=0 and r(8,8)=0. • What is the rank of r now? ˜ = q ∗ r? • And what is the rank of A • What happens if you increase the number of zero diagonal elements of r? See Appendix L for an answer to the original question. & % MS4105 282 ' 5.3 $ Gram-Schmidt Orthogonalisation The G-S algorithm is the basis for one of the two principal methods for computing QR factorisations. In the previous Section I used the conventional G-S algorithm to compute the QR factorisation. I begin by re-describing the algorithm using projection operators. Let A be complex, m × n (m ≥ n) and full rank with n columns aj , j = 1, . . . , n. Consider the sequence of formulas P2 a2 Pn an P1 a1 , q2 = , . . . , qn = . q1 = kP1 a1 k kP2 a2 k kPn an k (5.8) Here each Pj is an orthogonal projection operator, namely the m × m matrix of rank m − (j − 1) that projects from Cm orthogonally onto the space orthogonal to {q1 , . . . , qj−1 }. (When j = 1 this formula reduces to the identity P1 = I so q1 = a1 /ka1 k.) 
& % MS4105 283 ' $ Now I notice that each qj as defined by (5.8) is orthogonal to {q1 , . . . , qj−1 }, lies in the space spanned by {a1 , . . . , aj } and has unit norm. So the algorithm (5.8) is equivalent to Alg. 5.1 , our G-S-based algorithm that computes the QR factorisation of a matrix. ^ j−1 be The projection operators Pj can be written explicitly. Let Q ^ the m × (j − 1) matrix containing the first j − 1 columns of Q; ^ j−1 Q & = q1 q2 ... qj−1 . % MS4105 284 ' $ Then Pj is given by ^ j−1 Q ^ ∗j−1 . Pj = I − Q (5.9) Can you see that this is precisely the operator that maps aj into ∗ ∗ ∗ aj − (q1 aj )q1 + (q2 aj )q2 + · · · + (qj−1 aj )qj−1 , the projection of aj onto the subspace ⊥ to {q1 , . . . , qj−1 }? & % MS4105 285 ' 5.3.1 $ Modified Gram-Schmidt Algorithm As mentioned in the previous Section, the CGS algorithm is flawed. I showed you two examples where the vectors q1 , . . . , qn were far from being orthogonal (mutually perpendicular) when calculated with CGS. Although algebraically correct, when implemented in floating point arithmetic, when the CGS algorithm is used, the vectors qi are often not quite orthogonal, due to rounding errors (subtractive cancellation) that arise from the succession of subtractions and the order in which they are performed. A detailed explanation is beyond the scope of this course. & % MS4105 286 ' $ Fortunately, a simple change is all that is needed. For each value of j, Alg. 5.1 (or the neater version (5.8) using projection operators ) computes a single orthogonal projection of rank m − (j − 1), (5.10) vj = Pj aj . The modified G-S algorithm computes exactly the same result (in exact arithmetic) but does so by a sequence of j − 1 projections, each of rank m − 1. I showed in (5.3) that P⊥q is the rank m − 1 orthogonal projection operator onto the space orthogonal to a vector q ∈ Cm . By the definition of Pj above, it is easy to see that (with P1 ≡ I) Pj = P⊥qj−1 . . . P⊥q2 P⊥q1 (5.11) as by the orthogonality of the qi , 1 Y i=j−1 & (I − qi q∗i ) = I − j−1 X qi q∗i . i=1 % MS4105 287 ' $ Also, using the definition (5.9) for Pj , given any v ∈ Cm , ^ j−1 Q ^ ∗j−1 v Pj v = v − Q q∗1 v q∗ v 2 ^ j−1 =v−Q ... q∗j−1 v j−1 X =v− (q∗i v)qi i=1 = (I − j−1 X qi q∗i )v. i=1 & % MS4105 288 ' $ So the equation vj = P⊥qj−1 . . . P⊥q2 P⊥q1 aj (5.12) is equivalent to (5.10). The modified G-S algorithm is based on using (5.12) instead of (5.10). A detailed explanation of why the modified G-S algorithm is better that the “unstable” standard version would be too technical for this course. A simplistic explanation — the process of repeated multiplication is much more numerically stable that repeated addition/subtraction. Why? Repeated addition/subtraction of order one terms to a large sum typically results in loss of significant digits. Repeated multiplication by order-one (I − qi q∗i ) factors is not subject to this problem. I’ll re-do Example 5.1 with MGS in Section 5.3.2 below — you’ll see that it gives much better results. & % MS4105 289 ' $ Let’s “unpack” the modified G-S algorithm so that I can write peudo-code for it: The modified algorithm calculates vj by performing the following operations (for each j = 1, . . . , n) (1) = aj (2) = P⊥q1 vj (3) = P⊥q2 vj vj vj vj (1) = vj (1) − (q∗1 vj )q1 (2) = vj (1) (2) − (q∗2 vj )q2 (2) .. . (j) (j−1) vj ≡ vj = P⊥qj−1 vj (j−1) = vj (j−1) − (q∗j−1 vj )qj−1 . Of course I don’t need all these different versions of the vj — I just update each vj over and over again inside a loop. 
& % MS4105 ' 290 $ I can write this in pseudo-code as: Algorithm 5.2 Modified Gram-Schmidt Process (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) for j = 1 to n vj = aj end for j = 1 to n for i = 1 to j − 1 (Do nothing if j = 1) rij = q∗i vj vj = vj − rij qi end rjj = kvj k qj = vj /rjj end In practice, it would be sensible to let the vi overwrite the ai to save memory. & % MS4105 291 ' 5.3.2 $ Example to Illustrate the “Stability” of MGS Example 5.2 I’ll re-use Example 5.1, again working with 3–digit f.p. arithmetic. 1 1 −3 −3 Take the three vectors a1 = 10 , a2 = 10 and 10−3 0 1 a3 = 0 as input. 10−3 See the MGS calculation in App. T. The good news: q∗1 q2 = −10−3 , q∗2 q3 = 0 and q∗3 q1 = 0. This is as good as I can expect when working to 3–digit accuracy. & % MS4105 ' 292 $ Exercise 5.2 Now check that MGS also deals correctly with the example in Exercise 5.1. & % MS4105 293 ' 5.3.3 $ A Useful Trick A surprisingly useful trick is the technique of re-ordering sums or (as I will show) operations. The technique is usually written in terms of double sums of the particular form N X i X fij i=1 j=1 and the “trick” consists in noting that if I label columns in an i–j “grid” by i and rows by j then I am summing elements of the matrix “partial column by partial column” — i.e. I take only the first element of column 1, the first two elements of column 2, etc. Draw a sketch! & % MS4105 294 ' $ But I could get the same sum by summing row-wise; for each row (j) sum all the elements from column j to column N. So I have the (completely general) formula: N X i X i=1 j=1 & fij = N X N X fij . (5.13) j=1 i=j % MS4105 ' 295 $ The usefuness of the formula here is as follows: The MGS algorithm is: Algorithm 5.3 Modified Gram-Schmidt Process (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) for j = 1 to n vj = aj end for j = 1 to n for i = 1 to j − 1 (Do nothing if j = 1) rij = q∗i vj vj = vj − rij qi end rjj = kvj k qj = vj /rjj end & % MS4105 296 ' $ I can (another trick) rewrite this as a simpler double (nested) for loop similar in structure to the double sum (5.13) above: Algorithm 5.4 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) Modified Gram-Schmidt Process for j = 1 to n vj = aj end for j = 1 to n for i = 1 to j if i < j then rij = q∗ i vj vj = vj − rij qi fi if i = j then rjj = kvj k qj = vj /rjj fi end end Lines 1–3 can be left alone but suppose that I apply (5.13) above to lines 4–15? (Think of this “block”, depending on i & j as fij .) & % MS4105 297 ' $ I get: Algorithm 5.5 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) Modified Gram-Schmidt Process for j = 1 to n vj = aj end for i = 1 to n for j = i to n if i < j then rij = q∗ i vj vj = vj − rij qi fi if i = j then rjj = kvj k qj = vj /rjj fi end end Finally, I can undo the i < j and i = j tricks that I used to make a block depending on both i & j: & % MS4105 298 ' $ Algorithm 5.6 (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) Modified Gram-Schmidt Process—Alternative Form for i = 1 to n (I’m using i instead of j because it looks better. . . ^) ¨ vi = ai end for i = 1 to n rii = kvi k qi = vi /rii for j = i + 1 to n rij = q∗ i vj vj = vj − rij qi end end Can you see why lines 5 & 6 now appear before the inner for loop — and why the dummy variable is now i? This alternative (and entirely equivalent) form for MGS is often used in textbooks and on-line & if you had not read the preceding discussion you might think it was a competely different algorithm. 
Exercise 5.3 Perform the same re-ordering on the CGS algorithm 5.1. & % MS4105 299 ' 5.3.4 $ Operation Count When m and n are large, the work in both Alg. 5.1 and Alg. 5.2 is dominated by the operations in the inner loop: rij = q∗i vj vj = vj − rij qi The first line computes an inner product requiring m multiplications and m − 1 additions. The second requires m multiplications and m subtractions. So the total work per inner iteration is ≈ 4m flops (4 flops per column element). In total, the number of flops used by the algorithm is approximately j−1 n X X j=1 i=1 & 4m = n X 4m(j − 1) ≈ 2mn2 . (5.14) j=1 % MS4105 300 ' 5.3.5 $ Gram-Schmidt as Triangular Orthogonalisation It is interesting to interpret the GS algorithm as a process of multiplying the matrix A on the right by a succession of triangular matrices; “Triangular Orthogonalisation”. Each outer j step of Alg. 5.2 can be viewed as a right-multiplication. Starting with A, the j = 1 step multiplies the first column a1 by 1/r11 . & % MS4105 301 ' $ This is equivalent to right-multiplying A by the matrix R1 : AR1 = q1 a2 ... an = a1 & a2 ... 1/r11 0 an .. . 0 0 ... 0 1 ... .. . 0 (5.15) 1 % MS4105 302 ' $ The j = 2 step subtracts r12 times q1 from a2 and divides the result by r22 — equivalent to right-multiplying AR1 by the matrix R2 : AR1 R2 = q1 q2 ... an = ... 1 0 an 0 0 0 q1 & a2 −r12 /r22 ... 0 1/r22 ... 0 .. . ... .. . 0 0 . 1 0 (5.16) % MS4105 303 ' $ The j = 3 step: a3 ← a3 − r13 q1 − r23 q2 , divide result by r33 . AR1 R2 R3 = q1 q2 q1 & q2 a3 ... q3 ... an = 1 0 −r13 /r33 0 1 −r23 /r33 1/r33 0 0 an 0 0 0 .. . 0 ... 0 ... 0 0 (5.17) 0 1 ... ... .. . % MS4105 ' 304 $ • I can represent this process by a process of multiplying A on the right by a sequence of elementary upper triangular matrices Rj where each Rj only changes the jth column of A. • After multiplying AR1 R2 . . . Rj−1 on the right by Rj the first j columns of the matrix AR1 R2 . . . Rj consist of the vectors q1 , . . . , qj . & % MS4105 305 ' • The matrix Rj 1 0 0 0 Rj = 0 0 0 $ is just 0 0 ... 0 −r1j /rjj ... 1 0 ... 0 −r2j /rjj ... 0 1 0 0 −r3j /rjj .. . ... 0 ... .. . 0 0 ... 1 −rj−1,j /rjj 0 0 ... 0 1/rjj ... 0 0 ... 0 0 .. . 1 0 & ... ... ... .. . 0 0 0 0 0 , 0 0 1 % MS4105 306 ' $ • At the end of the process I have ^ AR1 R2 . . . Rn = Q where Q is the m × n orthogonal matrix whose columns are the vectors qj . • So the GS algorithm is a process of triangular orthogonalisation. • Of course I do not compute the Rj explicitly in practice but this observation gives us an insight into the structure of the GS algorithm that will be useful later. & % MS4105 307 ' 5.3.6 $ Exercises 1. Let A be a complex m × n matrix. can you calculate the exact number of flops involved in computing the QR factorisation ^R ^ using Alg. 5.2? A=Q 2. Show that a product of upper triangular matrices is upper triangular. 3. Show that the inverse of an upper triangular matrix is upper triangular. & % MS4105 308 ' 5.4 $ Householder Transformations The alternative approach to computing QR factorisations is Householder triangularisation, which is numerically more stable than the Gram-Schmidt orthogonalisation process. The Householder algorithm is a process of “orthogonal triangularisation”, making a matrix triangular by multiplying it by a succession of unitary (orthogonal if real) matrices. 
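To make the contrast concrete, here is a numerical check (a sketch, with an arbitrary test matrix) of the triangular orthogonalisation identity of Section 5.3.5 — A R1 R2 . . . Rn has orthonormal columns — building each elementary Rj from the entries of the triangular factor exactly as on Slide 305.

% Verify that A R_1 R_2 ... R_n = Q-hat, with each R_j equal to the
% identity except in column j.
A = randn(6, 4);   [m, n] = size(A);
[Qhat, Rhat] = qr(A, 0);
P = A;
for j = 1:n
    Rj = eye(n);
    Rj(1:j-1, j) = -Rhat(1:j-1, j) / Rhat(j, j);  % -r_ij / r_jj above diagonal
    Rj(j, j)     =  1 / Rhat(j, j);               %  1 / r_jj on the diagonal
    P = P * Rj;                                   % only column j of P changes
end
disp(norm(P - Qhat))               % ~ 1e-15: the columns are now orthonormal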
& % MS4105 309 ' 5.4.1 $ Householder and Gram Schmidt I showed at the end of Ch 5.3.5 that the GS algorithm can be viewed as applying a succession of elementary upper triangular matrices Rj to the right of A, so that the resulting matrix ^ AR1 . . . Rn = Q −1 ^ = R−1 has orthonormal columns. Check that the product R . . . R n 1 ^R ^ is a reduced QR is also upper triangular. So as expected A = Q factorisation for A. & % MS4105 310 ' $ On the other hand, I will see that the Householder method applies a succession of elementary unitary matrices Qk on the left of A so that the resulting matrix Qn Qn−1 . . . Q1 A = R is upper triangular. The product Q = Q∗1 Q∗2 . . . Q∗n is also unitary so this method generates a full QR factorisation A = QR. In summary; • the Gram-Schmidt process uses triangular orthogonalisation. • the Householder algorithm uses orthogonal triangularisation. & % MS4105 311 ' 5.4.2 $ Triangularising by Introducing Zeroes The Householder method is based on a clever way of choosing the unitary matrices Qk so that Qn Qn−1 . . . Q1 A is is upper triangular. In the example on the next slide, A is a general 5 × 3 matrix. The matrix Qk is chosen to introduce zeroes below the diagonal in the kth column while keeping all the zeroes introduced at previous iterations. In the diagram, × represents an entry that is not necessarily zero and a bold font means the entry has just been changed. Blank entries are zero. & % MS4105 312 ' × × × × × $ × × × × × A × × Q1 × → × × X X X × × × 0 X X X X Q2 Q3 0 X X → 0 X → 0 X X 0 X 0 X X 0 X Q1 A Q2 Q1 A × × × × × X 0 0 Q3 Q2 Q1 A First Q1 operates on rows 1–5, introducing zeroes in column 1 in the second and subsequent rows. Next Q2 operates on rows 2–5, introducing zeroes in column 2 in the third and subsequent rows but not affecting the zeroes in column 1. Finally Q3 operates on rows 3–5, introducing zeroes in column 3 in the fourth and fifth rows, again not affecting the zeroes in columns 1 and 2. The matrix Q3 Q2 Q1 A is now upper triangular. & % MS4105 ' 313 $ In general Qk is designed to operate on rows k to m. At the beginning of the kth step there is a block of zeroes in the first k − 1 columns of these rows. Applying Qk forms linear combinations of these rows and the linear combinations of the zero entries remain zero. & % MS4105 314 ' 5.4.3 $ Householder Reflections How to choose unitary matrices Qk that accomplish the transformations suggested in the diagram? The standard approach is to take each Qk to be a unitary matrix of the form: Ik−1 0 (5.18) Qk = 0 Hk where Ik−1 is the (k − 1) × (k − 1) identity matrix and Hk is an (m − k + 1) × (m − k + 1) unitary matrix. I choose Hk so that multiplication by Hk introduces zeroes into the kth column. For k = 1, Q1 = H1 , an m × m unitary matrix. Notice that the presence of Ik−1 in the top left corner of Qk ensures that Qk does not have any effect on the first k − 1 rows/columns of A. & % MS4105 ' 315 $ The Householder algorithm chooses Hk to be a unitary matrix with a particular structure — a Householder reflector. A Householder reflector Hk is designed to introduce zeroes below the diagonal in column k — H1 introduces zeroes in rows 2–m of column 1, H2 introduces zeroes in rows 3–m of column 2, etc; without affecting the zeroes below the diagonal in the preceding columns. & % MS4105 316 ' ∗ ∗ ∗ $ ∗ ... ∗ ... ∗ ... ∗ ... ∗ ... .. . ∗ .. . ... x1 ... x2 .. . ... xm−k+1 ... ... ... ∗ ∗ ∗ ∗ ∗ Hk ⇒ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ... ∗ ... ∗ ∗ ∗ ∗ ∗ ... ∗ ... ∗ ... .. . ∗ .. . ... ... ... 0 .. . ... 0 ... ... 
Referring to the diagram, suppose that at the beginning of step k, the entries in rows k to m of the kth column are given by the vector (of length m − (k − 1)) x ∈ Cm−k+1 . After mutiplying by Hk ; zeroes are introduced in column k under the diagonal, the entries marked are changed and the entries marked by ∗ are unchanged. & % MS4105 317 ' $ To introduce the required zeroes into the kth column, the Householder reflector H must transform x into a multiple of e1 — the vector of the same size that is all zeroes except for the first element. For any vector v ∈ Cm−k+1 , the matrix P = orthogonal projection operator so that 2vv∗ H = I − 2P = I − ∗ v v vv∗ v∗ v is an (5.19) is unitary (check ). & % MS4105 318 ' $ For an arbitrary vector v ∈ Cm , the effect on a vector x of a Householder reflector H = I − 2Pv ≡ P⊥v − Pv is to reflect x about the normal to v that lies in the x–v plane. v P⊥v x Pv x x −Pv x P⊥v x − Pv x P⊥v x Figure 9: Householder Reflector for arbitrary v & % MS4105 319 ' $ For any choice of x, I have 2v∗ x Hx = x − ∗ v v v so if (as I want) Hx ∈ span{e1 } then I must have v ∈ span{x, e1 }. Setting v = x + αe1 (taking α real) gives v∗ x = x∗ x + αe∗1 x = kxk2 + αx1 and v∗ v = kxk2 + α2 + 2αx1 . So 2(kxk2 + αx1 ) (x + αe1 ) Hx = x − 2 2 kxk + α + 2αx1 (α2 − kxk2 ) v∗ x x − 2α ∗ e1 . = 2 2 kxk + α + 2αx1 v v & % MS4105 320 ' $ If I chose α = ±kxk then the coefficient of x is zero, so that Hx is a multiple of e1 as required. Now I can write v = x ± kxke1 (5.20) and (substituting for v∗ x and v∗ v in Hx) I get the remarkably simple result: Hx = ∓kxke1 . (5.21) On the two suceeding Slides I show in diagrams the effect of the two choices of sign. & % MS4105 321 ' $ P⊥v x v = x − kxke1 x Pv x −kxke1 e1 kxke1 Hx ≡ P⊥v x − Pv x −Pv x P⊥v x Figure 10: Householder reflection using v = x − kxke1 & % MS4105 322 ' $ x P⊥v x v = x + kxke1 −kxke1 Hx ≡ P⊥v x − Pv x Pv x e1 kxke1 P⊥v x −Pv x Figure 11: Householder reflection using v = x + kxke1 & % MS4105 ' 323 $ Example 5.3 If x = (3, 1, 5, 1)T then kxk = 6, α = ±6 and (taking the positive sign in α) I find that v = (9, 1, 5, 1)T . The Householder reflector H is given by −27 −9 −45 −9 2vv∗ 1 −9 53 −5 −1 H=I− ∗ = . v v 54 −45 −5 29 −5 −9 −1 −5 53 It is easy to check that Hx = (−6, 0, 0, 0)T = −6e1 as expected. & % MS4105 324 ' $ An obvious question: which choice of sign should I take in (5.20)? The short answer is that either will do — algebraically. The full answer is that I make the choice that avoids subtraction as subtraction of nearly equal quantities leads to loss of significant digits in the result. So I use the following prescription for v: v = x + sign(x1 )kxke1 (5.22) which of course means that (5.21) becomes: Hx = −sign(x1 )kxke1 . (5.23) (Which of the two Figures 10 and 11 corresponds to this choice?) & % MS4105 325 ' $ I can now write the Householder QR algorithm using a matlab-style notation. If A is a matrix, define Ai:i 0 ,j:j 0 to be the (i 0 − i + 1) × (j 0 − j + 1) submatrix of A whose top left corner is aij and whose lower right corner is ai 0 j 0 — the rectangle in blue in the diagram below. If the submatrix is a sub-vector of a particular row or column of A I write Ai,j:j 0 or Ai:i 0 ,j respectively. ∗ ∗ ∗ ∗ . . . . . . ∗ ∗ .. .. . . ∗ & ∗ ... ∗ ... ∗ ... ... aij .. . ... aij 0 .. . ... ... ai 0 j .. . ... ai 0 j 0 .. . ... ... ∗ ... ∗ ... ∗ ∗ .. . ∗ .. . ∗ % MS4105 ' 326 $ • The following algorithm Householder QR Factorisation (Alg. 
5.7) computes the factor R of a QR factorisation of an m × n matrix A with m ≥ n overwriting A by R. • The n “reflection vectors” v1 , . . . , vn are also computed. • I could normalise them but it is better not to, instead I divide vk v∗k by v∗k vk whenever the former is used! • Algebraically it makes no difference but numerically it can. • Why? • Having made this choice, I must remember to divide vk v∗k by v∗k vk whenever the former is used. • See the Algorithms following Alg. 5.7. & % MS4105 ' 327 $ Algorithm 5.7 Householder QR Factorisation (1) (2) (3) (4) (5) for k = 1 to n x = Ak:m,k vk = x + sign(x1 )kxke1 Ak:m,k:n = Ak:m,k:n − 2vk (v∗k Ak:m,k:n ) /(v∗k vk ) end Exercise 5.4 How should I efficiently implement line (3) in the Algorithm? Exercise 5.5 In fact I could marginally further improve this 2 1 algorithm by using the fact that ∗ = . This only vk vk sign(x1 )kxkv1 requires a division and two multiplications. How many are needed 2 to calculate ∗ directly? vk vk & % MS4105 ' 328 $ • Note that when A is square (m = n), n − 1 iterations are all that is needed to make A upper triangular. • So I have a choice: – Either a final iteration to calculate vn and also update A(n, n). – Or I could choose to define vn = 1. • Remember that I need v1 , . . . , vn to calculate the unitary matrix Q. • Note that the matlab-style notation avoids the necessity of multiplying A by n × n matrices, instead I update the relevant Ak:m,k:n submatrix at each iteration which is much more efficient. & % MS4105 329 ' 5.4.4 $ How is Q to be calculated? On completion of Alg. 5.7, A has been reduced to upper triangular form — the matrix R in the QR factorisation. I have not constructed the unitary m × m matrix Q nor its m × n sub-matrix ^ The reason is that forming Q or Q ^ takes extra work Q. (arithmetic) and I can often avoid this extra work by working directly with the formula Q∗ = Qn . . . Q 2 Q1 (5.24) Q = Q1 . . . Qn−1 Qn . (5.25) or its conjugate & % MS4105 330 ' $ N.B. 2vk v∗ k ∗ vk vk • At each iteration Hk = I − is hermitian and therefore Qk as defined in (5.18) is also so I can omit the “stars” in (5.25). • But Q is not Hermitian in general. & % MS4105 331 ' $ • For example I saw on Slide 278 that a square system of equations Ax = b can be solved via the QR factorisation of A. • I only needed Q to compute Q∗ b and by (5.24), I can calculate Q∗ b by applying a succession of Qk to b. • Using the fact that each Qk = I − vk v∗k /(v∗k vk ) then (once I know the n vectors v1 to vn ) I can evaluate Q∗ b by Algorithm 5.8 Implicit Calculation of Q∗ b for k = 1 to n bk:m = bk:m − 2vk (v∗k bk:m )/(v∗k vk ) end (1) (2) (3) i.e. the same sequence of operations that were applied to A to make it upper triangular. & % MS4105 332 ' $ • Similarly the computation of a product Qx can be performed by the same process in the reverse order: Algorithm 5.9 Implicit Calculation of Qx for k = n DOWNTO 1 xk:m = xk:m − 2vk (v∗k xk:m )/(v∗k vk ) end (1) (2) (3) • If I really need to construct Q I could: – construct QI using Alg. 5.9 by computing its columns Qe1 , . . . , Qem . ∗ In fact I can apply Alg. 5.9 to all the columns of the identity matrix simultanaeously, see Alg. 5.10. – or compute Q∗ I using Alg. 5.8 and conjugate the result. – or conjugate each step rather than the final product, i.e. to construct IQ by computing its rows e∗1 Q, . . . , e∗m Q & % MS4105 333 ' $ • The first idea is the best as it begins with operations involving Qn , Qn−1 etc. that only modify a small part of the vector that they are applied to. 
• This can result in a speed-up. • Here’s some pseudo-code: • Algorithm 5.10 Explicit Calculation of Q (1) (2) (3) (4) (5) (6) (7) (8) & Q = Im Initialise Q to the m × m identity matrix. for k = n DOWNTO 1 vk = V(k : m, k) T = Q(k : m, k : m) w = v∗k T T = T − 2vk w/(v∗k vk ) Q(k : m, k : m) = T end % MS4105 334 ' 5.4.5 $ Example to Illustrate the Stability of the Householder QR Algorithm Example 5.4 I’ll re-use the first example, again working with 3–digit f.p. arithmetic. 1 1 −3 −3 Take the three vectors a1 = 10 , a2 = 10 and 10−3 0 1 a3 = 0 as input. 10−3 & % MS4105 335 ' $ 1 1 −3 So the matrix A to be factored is A = 10 10−3 n = m = 3. −3 10 0 1 0 and 10−3 I’ll apply the Householder QR algorithm to A using 3–digit f.p. arithmetic as previously. For details, see App. V. I find that −1 −1 −3 R= 0 −1.0 10 0 0 & −1 0 . 1.0 10−3 % MS4105 336 ' $ Check that when Q is computed using Alg. 5.9, I find that −3 −3 −1.0 −1.0 10 1.0 10 −3 −7 Q = −1.0 10 −5.0 10 −1.0 1.0 5.0 10−7 −1.0 10−3 and that the computed Q satisfies 0.0 ∗ Q Q−I= 0.0 −5.0 10−10 0.0 −5.0 10−10 0.0 0.0 . 0.0 0.0 This is remarkably accurate — for this example at least. Finally, check that using three-digit f.p. arithmetic, QR is exactly equal to A. & % MS4105 337 ' 5.5 $ Why is Householder QR So Stable? The answer can be stated easily in the form of a Theorem (which I will not prove). First I need some definitions. Definition 5.1 (Standard Model of F.P. Arithmetic) Describe floating point arithmetic as follows: • Let β be the number base (usually 2) and t the number of digits precision. • Then f.p. numbers take the form y = ±m × βe−t . • The exponent e varies between emin and emax . • In matlab, with β = 2, emax = 1024. • 21024 ≈ 1.797693134862316 10308 . & % MS4105 ' 338 $ • The “significand” (or mantissa) m is an integer between 0 and βt − 1. d2 d1 dt + 2 + · · · + t = ±βe × 0.d1 d2 . . . dt . • So y = ±βe β β β • Each digit di satisfies 0 ≤ di ≤ β − 1 and d1 6= 0. • The first digit d1 is called the “most significant digit” and the last digit dt is called the “least significant digit”. 1 1−t is called the unit roundoff. • The number u = β 2 & % MS4105 ' 339 $ • It can be shown that if a real number x lies in the range of f.p. numbers F as defined above, then x can be approximated by a f.p. number f with a relative error no greater than u. • Formally: fl(x) = x(1 + δ), where |δ| < u. See App. U for a short proof. • So the result of a floating point calculation is the exact answer times a factor within “rounding error” of 1. & % MS4105 ' 340 $ I can state (but will not prove) the following Theorem to summarise the numerical properties of the Householder QR algorithm. First define γk = ku/(1 − ku) where u is the unit roundoff defined ˜ k = cγk for some small integer constant whose value is above and γ unimportant given the small size of u. ˜ k are very Check that even for large values of k, γk and therefore γ small. & % MS4105 341 ' $ Theorem 5.5 (Stability of Householder QR Algorithm) Let ^ and R ^ be the QR factors computed using the A be m × n and let Q Householder QR Alg. Then ^ • There is an orthogonal m × m matrix Q s.t. A + ∆A = QR where k∆aj k2 ≤ γmn kaj k2 for j = 1, . . . , n. ^ = Q(Im + ∆I) where • If Q = P1 P2 . . . Pn as usual then Q √ ^ ^ is very close to k∆I(:, j)k2 ≤ γmn so kQ − QkF ≤ nγmn so Q the orthonormal matrix Q. 
• Finally, ^ R)(:, ^ j)k2 k(A − Q ≡ ≤ ≤ ^ j) + ((Q − Q) ^ R)(:, ^ j)k2 k(A − QR)(:, ^ k^F kkR(:, j)k2 ˜ mn kaj k2 + kQ − Q γ √ ˜ mn kaj k2 nγ ^ and so the columns of the product of the computed matrices Q ^ are very close to the corresponding columns of A. R & % MS4105 342 ' 5.5.1 $ Operation Count The work done in Alg. 5.7 is dominated by the (implicit) inner loop Ak:m,j = Ak:m,j − 2vk (v∗k Ak:m,j ) for j = k, . . . , n. This inner step updates the jth column of the submatrix Ak:m,k:n . If I write L = m − k + 1 for convenience then the vectors in this step are of length L. The update requires 4L − 1 ≈ 4L flops. Argue as follows: L flops for the subtractions, L for the scalar multiple and 2L − 1 for the inner product (L multiplications and L − 1 additions). & % MS4105 343 ' $ Now the index j ranges from k to n so the inner loop requires ≈ 4L(n − k + 1) = 4(m − k + 1)(n − k + 1) flops. Finally, the outer loop ranges from k = 1 to k = n so I can write W, the total number of flops used by the Householder QR algorithm as: W= = 1 X k=n n X 4(m − k + 1)(n − k + 1) 4(m − n + k)k k=1 n n X X = 4 (m − n) k+ k2 k=1 k=1 = 4 (m − n)n(n + 1)/2 + n(n + 1)(2n + 1)/6 = 2 mn2 + 2 mn + 2/3 n − 2/3 n3 ≈ 2mn2 − 2/3n3 & (5.26) % MS4105 ' 344 $ So Householder QR factorisation is more efficient than the (modified) factorisation algorithm (see 5.14). It is also more stable — i.e. less prone to accumulated inaccuracies due to round-off errors — as the Example in Section 5.4.5 suggests. (The built-in qr command is much faster - this is mainly due to the fact that is is pre-compiled code which is not interpreted line-by-line.) & % MS4105 345 ' 5.6 $ Least Squares Problems Suppose that I have a linear system of equations with m equations and n unknowns with m > n — i.e. I want to solve Ax = b for x ∈ Cn where A ∈ Cm×n and b ∈ Cm . In general there is no solution. When can I find a solution? Exactly when b ∈ range A, which is unlikely to happen by chance. Such systems of equations with m > n are called overdetermined. The vector r = Ax − b called the residual may be small for some choices of x but is unlikely to be zero for any x. The natural resolution to an insoluble problem is to re-formulate the problem. I re-define the problem to that of finding the choice of x that makes the norm (usually the 2-norm) of r as small as possible. This is referred to as the Least Squares problem as, when the 2-norm is used, the squared norm of r is a sum of squares. & % MS4105 346 ' 5.6.1 $ Example: Polynomial Data-fitting If I have data (xi , yi ), i = 1, . . . , m where the xi , yi ∈ C then there is a unique polynomial p(x) = c0 + c1 x + c2 x2 + · · · + cm−1 xm−1 , of degree m − 1 that interpolated these m points — p(x) can be found by solving the square linear system (the matrix is called the Vandermonde matrix). 1 1 .. . 1 & x1 x21 ... x2 .. . x22 .. . ... xm x2m ... ... x1m−1 xm−1 2 c0 y1 c1 y1 c2 = y2 .. . . .. .. . xm−1 m cm−1 ym % MS4105 ' 347 $ This is a system of m linear equations in the m unknowns ci ∈ C, i = 0, . . . , m − 1 so there is a unique solution provided the Vandermonde matrix is full rank. In practice this technique is rarely used as the high-degree polynomials needed to interpolate large data sets are typically highly oscillatory. Additionally the Vandermonde matrix is ill-conditioned for large m — leading to numerical instability. For this reason it is better to “fit” the data with a relatively low order polynomial of degree n < m. 
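As a quick illustration (a sketch with made-up data; the least-squares formulation itself is set out on the Slides that follow), here is a cubic fitted to thirty noisy points.

% Fit a cubic (n = 4 coefficients) to m = 30 noisy points by minimising
% ||Vc - y||_2, rather than interpolating with a degree-29 polynomial.
m = 30;  n = 4;
x = linspace(0, 1, m)';
y = cos(4*x) + 0.05*randn(m, 1);             % "data": smooth curve plus noise
V = ones(m, n);
for k = 2:n, V(:, k) = V(:, k-1) .* x; end   % columns 1, x, x.^2, x.^3
c = V \ y;                                   % backslash returns the least-squares
                                             %   solution of an overdetermined system
disp(norm(V*c - y))                          % small residual, but not zero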
Without changing the data points I can easily reformulate the problem from interpolation to: find coefficients c0 , . . . , cn such that krk2 is minimised where r = Vc − y and V is the Vandermonde matrix as above with m rows and n columns with n < m. & % MS4105 348 ' $ I have 1 1 .. . x1 x21 ... x2 .. . x22 .. . ... 1 xm x2m ... ... xn−1 1 xn−1 2 c0 y1 y1 c1 c2 ≈ y2 .. . . .. .. . xn−1 m cn−1 ym (5.27) and want to choose c0 , . . . , cn to make the norm of the residual r = Vc − y as small as possible. & % MS4105 349 ' 5.6.2 $ Orthogonal Projection and the Normal Equations How to “solve” (5.27) — i.e. to to choose c0 , . . . , cn to make the norm of the residual r = Vc − y as small as possible needs some consideration. I want to find the closest point Ax in the range of A to b — so that the norm of the residual r is minimised. It is plausible that this will occur provided that Ax = Pb where P is the orthogonal projection operator that projects Cm onto the range of A — i.e. (5.6) P = A(A∗ A)−1 A∗ . This means that r ≡ Ax − b = Pb − b = −(I − P)b — so the residual r is orthogonal to the range of A. & % MS4105 350 ' $ I bundle these ideas together into a Theorem. Theorem 5.6 Let A be an m × n complex matrix with m ≥ n and let b ∈ Cm be given. A vector x ∈ Cn minimises krk2 ≡ kb − Axk2 , the norm of the residual (solving the least squares problem) if and only if r ⊥ range(A) or any one of the following equivalent equations hold: A∗ r = 0 (5.28) A∗ Ax = A∗ b (5.29) Pb = Ax (5.30) where P ∈ Cm×m is the orthogonal projection operator onto the range of A. The n × n system of equations (5.29) are called the normal equations and are invertible iff A has full rank. It follows that the solution x to the least squares problem is unique iff A has full rank. & % MS4105 ' 351 $ Proof: • The equivalence of (5.28)–(5.30) is easy to check. • To show that y = Pb is the unique point in the range of A that minimises kb − Axk2 , suppose that z 6= y is another point in range A. Since z − y ∈ range A and b − y = (I − P)b I have (b − y) ⊥ (z − y) so kb − zk22 = k(b − y) + (y − z)k22 = kb − yk22 + ky − zk22 > kb − yk22 — the result that I need. • Finally; – If A∗ A is singular then A∗ Ax = 0 for some non-zero x so that x∗ A∗ Ax = 0 and so Ax = 0 meaning that A is not full rank. – Conversely if A is rank-deficient then Ax = 0 for some non-zero x — implying that A∗ Ax = 0 and so A∗ A is singular. & % MS4105 352 ' 5.6.3 $ Pseudoinverse I have just seen that if A is full rank then x, the solution to the least squares problem min kAx − bk2 (5.31) is unique and that the solution is given by the solution x = (A∗ A)−1 A∗ b to the normal equations (5.29). Definition 5.2 The matrix (A∗ A)−1 A∗ is called the pseudoinverse of A, written A+ . A+ = (A∗ A)−1 A∗ ∈ Cn×m . (5.32) This matrix maps vectors b ∈ Cm to vectors x ∈ Cn — so it has more columns than rows. If n = m and A is invertible then A+ = A−1 , hence the name. & % MS4105 ' 353 $ The (full-rank) least squares problem (5.31) can now be summarised as that of computing one or both of x = A+ b and y = Pb. The obvious question: how to solve these equations. & % MS4105 354 ' 5.6.4 $ Solving the Normal Equations The obvious way to solve least squares problems is to solve the normal equations (5.29) directly. This can give rise to numerical problems as the matrix A∗ A has eigenvalues equal to the squares of the singular values of A — so the range of eigenvalues will typically be great, resulting in a large condition number (the ratio of the largest to the least eigenvalue). 
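A minimal Matlab/Octave illustration of this squaring of the condition number (the matrix A is randomly generated with a wide spread of singular values, chosen only for illustration; the exact numbers printed vary from run to run):

% cond(A'*A) = cond(A)^2, so forming the normal equations loses accuracy.
m = 50; n = 10;
A = randn(m, n) * diag(logspace(0, -6, n));   % columns scaled over six orders of magnitude, so cond(A) is large
b = randn(m, 1);
fprintf('cond(A)     = %.3e\n', cond(A));
fprintf('cond(A''*A)  = %.3e\n', cond(A'*A));
x_ne = (A'*A) \ (A'*b);    % least squares via the normal equations (5.29)
x_bs = A \ b;              % backslash solves the same problem via QR
fprintf('relative difference between the two solutions = %.3e\n', ...
        norm(x_ne - x_bs) / norm(x_bs));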
It can be shown that a large condition number makes the process of solving a linear system inherently unstable — leading to loss of accuracy. & % MS4105 355 ' $ A better way is to use the reduced QR factorisation. I have seen ^R ^ can be constructed using that a QR factorisation A = Q Gram-Schmidt orthogonalisation or more often using Householder triangularisation. The orthogonal projection operator P can be written P = A(A∗ A)−1 A∗ ^ R( ^ R ^∗Q ^ ∗Q ^ R) ^ −1 R ^∗Q ^∗ =Q ^ R( ^ R ^ ∗ R) ^ −1 R ^∗Q ^∗ =Q ^R ^R ^ −1 R ^ −∗ R ^∗Q ^∗ =Q ^Q ^∗ =Q ^Q ^ ∗ b. so I have y = Pb = Q & % MS4105 356 ' $ ^ which This is a nice result as it only involves the unitary matrix Q is numerically stable (as its eigenvalues are complex numbers with unit modulus; λi = eiφ — check ). Since y ∈ range A, the system Ax = y has a unique solution. I can write ^ Rx ^ =Q ^Q ^ ∗b Q ^ ∗ gives and left-multiplication by Q ^ =Q ^ ∗ b. Rx This last equation for x is an upper triangular system which is invertible if A has full rank and can be efficiently solved by back substitution. & % MS4105 357 ' 5.7 & $ Project % MS4105 358 ' 6 $ The Singular Value Decomposition The singular value decomposition (SVD) is a matrix factorisation which is the basis for many algorithms. It also gives useful insights into different aspects of Numerical Linear Algebra. For any complex m × n matrix A I will show that its SVD is A = UΣV ∗ (6.1) where U and V are unitary m × m and n × n matrices respectively and Σ is an m × n diagonal matrix. I do not assume that m is greater than n so that A can be “short and wide” or “tall and thin”. The diagonal elements of Σ are called the “singular values” of A and the number of singular values is just min(m, n). & % MS4105 359 ' $ A point that often causes confusion; how can Σ be diagonal if it is not square? • Suppose m < n so that A a a a a A= a a a a is “short and wide” — say 2 × 5: a a = UΣV ∗ u = u u σ 1 u 0 0 σ2 v v 0 0 0 v 0 0 0 v v v v v v v v v v v v v v v v v v v v v v i.e. Σ is “augmented” with three extra columns of zeroes so that the matrix multiplication works. & % MS4105 ' 360 $ Strictly speaking the last three rows of V ∗ (the last 3 columns of V) are not needed as they are multiplied by the zero entries in Σ when A is formed. In practice these columns are not always calculated. Of course, if these redundant columns are required then they must be such that V is unitary. I will return to this point when I discuss the uniqueness or otherwise of the SVD in Chapter 6.2. & % MS4105 361 ' $ • Alternatively suppose m > n so that A is “tall and thin” — say 5 × 2: a a a a A = a a a a a a = UΣV ∗ u u u u = u u u u u & u u u u u u u u u u u u σ1 u 0 u 0 u 0 u 0 0 σ2 v 0 v 0 0 v v % MS4105 ' 362 $ In this case Σ is “augmented” with three extra rows of zeroes — again so that the matrix multiplication works. Moreover the last 3 columns of U are not needed as they are multiplied by the zero entries in Σ when A is formed Again in practice these columns are not always calculated. And as for V when A is “short and wide” , here if these redundant columns of U are required then they must be such that U is unitary. See Chapter 6.2. In both cases (2 × 5 or 5 × 2) A has two singular values. In this Chapter I begin by demonstrating that all m × n matrices have a SVD and then show that the decomposition is unique in a certain sense. & % MS4105 363 ' 6.1 $ Existence of SVD for m × n Matrices Suppose that A is an arbitrary (possibly complex) m × n matrix. 
I know that kAxk kAk = sup x6=0 kxk = sup kAxk kxk=1 Note: in this Chapter unless stated otherwise, all norms are 2-norms. To avoid clutter, the 2–subscript on the norms will be omitted. & % MS4105 ' 364 $ I begin with a Lemma that will allow us to prove the main result. Lemma 6.1 For any (possibly complex) m × n matrix A, there exist unitary matrices U (m × m ) and V (n × n ) s.t. σ 0 ∗ (6.2) U AV = 0 B where σ = kAk and B is an (m − 1) × (n − 1) matrix. Proof: • I have σ = kAk. • The function kAxk is a continuous function of x and the unit ball is closed and bounded. • So the supremum in the definition of kAk is attained by some unit vector x◦ i.e. ∃x◦ |kAx◦ k = kAk. & % MS4105 365 ' • Let y◦ = $ Ax◦ Ax◦ = . kAk σ • Then Ax◦ = σy◦ and ky◦ k = 1. • By Thm. 1.14 (b) I can select {u2 , . . . , um } and {v2 , . . . , vn } s.t. {y◦ , u2 , . . . , um } and {x◦ , v2 , . . . , vn } form orthonormal bases for Cm and Cn respectively. • Define the matrices U and V by: h U = y◦ h V = x◦ i U1 i V1 where the vectors {u2 , . . . , um } are the columns of U1 and the vectors {v2 , . . . , vn } are the columns of V1 . • From these definitions it is easy to see that U and V are unitary — U∗ U = I and V ∗ V = I so U∗ = U−1 and V ∗ = V −1 . & % MS4105 366 ' $ • Then ∗ h i y ◦ A x◦ V1 U∗ AV = U∗1 σ y◦ ∗ AV1 = 0 B where B = U∗1 AV1 and the zero element appears as Ax◦ = σy◦ and y◦ is orthogonal to the columns of U1 . • I can write σ ∗ U AV = 0 & ω∗ B = A1 , (say); with ω∗ = y◦ ∗ AV1 ∈ Cn−1 . % MS4105 ' 367 $ I will now show that ω = 0 which means that A1 is block diagonal. as required. • First note that kA1 k = kAk as kUAk = kAk if A is unitary. kUAxk This follows as kUAk = sup . But for any vector v, kxk kxk=1 kUvk2 = v∗ U∗ Uv = kvk2 . So kUAk = kAk. • Now for the clever part of the proof: 2 2 σ ω∗ σ σ A1 ≡ ω ω 0 B 2 (σ2 + ω∗ ω) = Bω 2 2 ∗ ≥ σ +ω ω . & % MS4105 ' 368 $ • Also (as for any compatible A and x, kAxk ≤ kAkkxk); 2 2 σ σ A1 ≤ kA1 k2 ω ω 2 2 ∗ = kA1 k σ + ω ω 2 2 ∗ =σ σ +ω ω . • Combining the two inequalities: σ2 (σ2 + ω∗ ω) ≥ (σ2 + ω∗ ω)2 which forces ω∗ ω = 0 and therefore ω = 0 as required. σ 0 ∗ as claimed. So U AV = 0 B & % MS4105 369 ' $ Now I prove the main result — that any m × n complex matrix has a SVD. Theorem 6.2 (Singular Value Decomposition) For any (possibly complex) m × n matrix A, there exist unitary matrices U and V s.t. A = UΣV ∗ (6.3) where U and V are unitary m × m and n × n matrices respectively and Σ is an m × n diagonal matrix with min(m, n) diagonal elements. Proof: I will prove the main result by induction on m and n. & % MS4105 ' 370 $ [Base Step] This is the case where either n or m equal to 1. • if n = 1 thenA isa column vector and Lemma 6.1 reduces σ1 ∗ = Σ. to: U AV = 0 • If m = 1 hthen Aiis a row vector and again U∗ AV = σ1 0 = Σ. So in either case ( either n or m equal to 1) I have (6.3) where U and V are unitary m × m and n × n matrices respectively and Σ is a diagonal matrix of “singular values” . & % MS4105 371 ' $ [Inductive Step] By Lemma 6.1 I can write σ1 0 ∗ . U AV = 0 B (6.4) The inductive hypothesis: assume that any (m − 1) × (n − 1) matrix has a SVD. So the (m − 1) × (n − 1) matrix B from (6.4) can be written as: B = U2 Σ2 V2∗ where Σ2 is diagonal and padded with rows or columns of zeroes to make it m × n as usual. RTP that A has a SVD. & % MS4105 372 ' Using (6.4) and substituting for B I have: σ1 0 V∗ A=U 0 U2 Σ2 V2∗ σ1 0 1 1 0 =U 0 Σ2 0 0 U2 $ 0 V2 ∗ V∗ = U 0 ΣV 0∗ . 
1 0 1 0 and V 0 = V are The matrices U 0 = U 0 U2 0 V2 products of unitary matrices and are therefore themselves unitary. & % MS4105 373 ' 6.1.1 $ Some Simple Properties of the SVD Lemma 6.3 The singular values of a complex m × n matrix are real and non-negative — and may be ranked in decreasing order: σ1 ≥ σ2 ≥ · · · ≥ σn . Proof: I have σ1 ≡ σ ≡ kAk. But I now have A = UΣV ∗ so kAxk2 ≡ x∗ VΣ∗ ΣV ∗ x = kΣzk2 , where z = V ∗ x. So (letting Σ2 = diag(σ2 , . . . , σn )) σ1 ≡ kAk = sup kAxk = sup kΣzk ≥ kxk=1 kzk=1 sup kΣ(0, yT )T k z=(0,yT )T ,kyk=1 = sup kΣ2 yk = kΣ2 k = σ2 , kyk=1 as the norm of a diagonal matrix is its biggest diagonal in magnitude (see Exercise 4.4) and so σ1 ≥ σ2 . By induction the result follows. & % MS4105 374 ' $ The following Lemma ties up the relationship between three important quantities: kAk2 , σ1 (the largest singular value ) and λ1 (the largest eigenvalue of A∗ A). Lemma 6.4 For any m × n complex matrix A; p kAk2 ≡ σ1 = λ1 . (6.5) where λ1 is the largest eigenvalue of A∗ A. In particular, the 2-norm of a matrix is its largest singular value. Proof: Let q1 , . . . , qn be the (orthonormal) eigenvectors of A∗ A with corresponding (real and non-negative — why?) eigenvalues λ1 , . . . , λn . & % MS4105 375 ' $ Then (taking x to be an arbitrary unit vector in the 2-norm on Cn ) kAxk22 = (Ax)∗ (Ax) = x∗ A∗ Ax = n X ¯i q∗i A∗ Axj qj x i,j=1 = n X ¯i xi λi = x i=1 ≤ λ1 n X n X λi |xi |2 i=1 |xi |2 = λ1 . i=1 But the inequality is satisfied with equality if x = q1 so for any m × n complex matrix A; p kAk2 ≡ σ1 = λ1 . (6.6) where λ1 is the largest eigenvalue of A∗ A. & % MS4105 376 ' $ I have proved that every (real or complex) m × n matrix has a SVD. I don’t yet know how to calculate the SVD but I can show an example. Example 6.1 Take 1 A= 2 2 3 9 3 4 8 11 12 Then the matlab command [u,s,v]=svd(a) gives & % MS4105 ' 377 $ −0.6904 −0.7235 U= −0.7235 0.6904 21.2308 0 0 0 0 Σ= 0 1.5013 0 0 0 −0.1007 0.4378 −0.2772 −0.4170 −0.7399 −0.1673 0.4157 −0.3603 0.8162 −0.0564 V = −0.2339 0.3936 0.8700 0.1265 −0.1324 −0.5653 −0.6584 0.0473 0.2094 −0.4483 −0.7666 0.2172 −0.1852 −0.3163 0.4804 & % MS4105 378 ' $ It is easy to check (in matlab!) that UΣV ∗ = A to fourteen decimal places — which is pretty good. It is also easy to check that, as expected, the matrices U and V are unitary (orthogonal as A is real). Notice also that Σ is padded on the right with three columns of zeroes as expected — it has 2 = min(2, 5) diagonal elements. ∗ ∗ Of course, I can now write down the SVD of A = VΣ U. The 21.2308 0 0 1.5013 ∗ matrix Σ = 0 0 — again with 2 diagonal elements. 0 0 0 & 0 % MS4105 379 ' $ When a matrix A is square, say 2 × 2, things are slightly simpler. Example 6.2 Take 1 A= 2 2 3 Then the matlab command [u,s,v]=svd(a) gives −0.5257 −0.8507 U= −0.8507 0.5257 4.2361 0 Σ= 0 0.2361 −0.5257 0.8507 V= −0.8507 −0.5257 & % MS4105 ' 380 $ • Again U and V are unitary — in this case this is true by inspection — this time U = V (apart from the sign of the second column), I will learn why shortly. Now Σ is a 2 × 2 diagonal matrix and the singular values are 4.2361 and 0.2361. It is easy to calculate the eigenvalues of A, by hand or with √ matlab and I find that λ = 2 ± 5 ≈ 4.2361, −0.2361. • So the singular values of any matrix A are the absolute values of the eigenvalues of A? Not quite. For one thing, non-square matrices do not have eigenvalues! 
• When, as in the present example, A∗ = A, A has real eigenvalues and the matrix A∗ A = A2 has eigenvalues that are the square of those of A which “explains” why the singular values for the present example are the absolute values of the eigenvalues of A. This is not true in general. & % MS4105 ' 381 $ The following Theorem should make things clearer. Theorem 6.5 For any m × n complex matrix A, the singular values are given by p σi = λi , where λi are the eigenvalues of A∗ A. (6.7) Proof: I have A = UΣV ∗ for any m × n matrix A. Then A∗ A = VΣ∗ ΣV ∗ = VΣT ΣV ∗ as Σ is real. So A∗ AV = VΣT Σ which means that the non-zero eigenvalues of A∗ A are the σ2i , the squares of the singular values. Note: the n × n matrix ΣT Σ has the squares of the singular values on its main diagonal. If m < n (e.g. Example 6.1) then ΣT Σ has σ21 , . . . , σ2m on its main diagonal with n − m zeros on the main diagonal. If m > n then ΣT Σ has σ21 , . . . , σ2n on its main diagonal (e.g. the final comments in Example 6.1). & % MS4105 382 ' 6.1.2 $ Exercises 1. Find the SVD of the following matrices by any method you wish. Hint: first find the singular values by finding the eigenvalues of A∗ A then work out U and V as either identity matrices — possibly with columns swapped — or generic real unitary matrices. 0 2 3 0 2 0 1 1 1 1 0 0 0 −2 0 3 0 0 1 1 0 0 & % MS4105 ' 383 $ 2. Two m × n complex matrices A,B are said to be “unitarily equivalent” if A = QBQ∗ for some unitary matrix Q. Is the following statement true or false: “A and B are unitarily equivalent if and only if they have the same singular values ”? Hint: • Write A and B as different singular value decompositions — assume they are unitarily equivalent, what does this imply for their singular values ? • Now assume that they have the same singular values, does it follow that they are unitarily equivalent? & % MS4105 ' 384 $ 3. • Every complex m × n matrix A has a SVD A = UΣV ∗ . • Show that if A is real then A has an SVD with U and V real. • Hint: consider AA∗ and A∗ A; show that both matrices are real and symmetric and so have real eigenvalues. • Explain why their eigenvectors may be taken to be real. (Hint: if ui is an eigenvector of AA∗ ; what can you say ¯ i , its complex conjugate?) about u • (Note the SVD is not unique although the singular values are, see next Section.) & % MS4105 385 ' $ 4. Show that multiplying each column of U by a different complex number of unit modulus: U(:, 1) = eiφ1 U(:, 1), U(:, 2) = eiφ2 U(:, 2), . . . and multiplying the columns of V by the same phases: V(:, 1) = eiφ1 V(:, 1), V(:, 2) = eiφ2 V(:,2), . . . leaves the product eiφ1 ˜ = U UΣV ∗ unchanged. Hint: write U .. . eiφm and ˜ V˜ = UΣV = A. (The phases similarly for V. Show that UΣ cancel.) & % MS4105 ' 386 $ 5. (Matlab/Octave exercise) Re-visit Example 6.1, then add the following code: >>ph=rand(5,1); >>for j=1:5 >> v(:,j)=v(:,j)*exp(i*ph(j)); >>end >>for j=1:2 >> u(:,j)=u(:,j)*exp(i*ph(j)); >>end >> u*s*v’-a Check that the last line returns a 2 × 5 matrix that is very close to zero. Explain. & % MS4105 387 ' 6.2 $ Uniqueness of SVD I saw in the Exercises that the SVD is not in fact unique; I could multiply the columns of U and V by the same phases (complex number of unit modulus) leaving UΣV ∗ unchanged. I will prove two Theorems: the first confirms that the diagonal matrix Σ is unique and the second clarifies exactly how much room for manoever there is in choosing U and V. Theorem 6.6 Given an m × n matrix A, the matrix of singular values Σ is unique. 
Proof: Take m ≥ n (the proof for the other case is similar). Let A = UΣV ∗ and also A = LΩM∗ be two SVD’s for A. RTP that Σ = Ω. & % MS4105 ' 388 $ The product AA∗ = UΣV ∗ VΣ∗ U∗ = UΣΣ∗ U∗ and also AA∗ = LΩΩ∗ L∗ . As I saw in the discussion on Slide 360 when m ≥ n both Σ and Ω are also m × n with the last m − n rows consisting of zeroes. Then the matrices Σ2 ≡ ΣΣ∗ and Ω2 = ΩΩ∗ both take the form of an m × m diagonal matrix consisting of an n × n diagonal matrix in the top left corner with zeroes elsewhere. & % MS4105 389 ' $ So equating the two expressions for AA∗ ; Σ2 = U∗ LΩ2 (U∗ L)∗ = OΩ2 O∗ where O = U∗ L, an m × m unitary matrix. The eigenvalues of Σ2 are the roots of the charactistic polynomial p(λ) = det(λI − Σ2 ) = det(λI − OΩ2 O∗ ) = det(O(λI − Ω2 )O∗ ) = det(O) det(O∗ ) det(λI − Ω2) = det(λI − Ω2) where the final equality follows as O is unitary. So Σ2 and Ω2 have the same eigenvalues. But as they are both diagonal by definition I can conclude (if the same ordering of eigenvalues is used for both) that they are equal and that therefore so are Σ and Ω. & % MS4105 390 ' 6.2.1 $ Uniqueness of U and V Now I need to clarify whether U and V are unique. They aren’t. The following Theorem clarifies to what extent U and V are arbitrary: Theorem 6.7 If an m × n matrix A has two different SVD’s A = UΣV ∗ and A = LΣM∗ then U∗ L = diag(Q1 , Q2 , . . . , Qk , R) V ∗ M = diag(Q1 , Q2 , . . . , Qk , S) where Q1 , Q2 , . . . , Qk are unitary matrices whose sizes are given by the multiplicities of the corresponding distinct non-zero singular values — and R, S are arbitrary unitary matrices whose size equals the number of zero singular values. More precisely, if Pk q = min(m, n) and qi = dim Qi then i=1 qi = r = rank(A) ≤ q. & % MS4105 391 ' $ Before I get bogged down in detail! When, as is usually the case, all the singular values are different the Theorem simply says that any alternative to the matrix U, (L, say) satisfies L = UQ where Q is a diagonal matrix of 1 × 1 unitary matrices. A 1 × 1 unitary matrix is just a complex number z with unit modulus, |z| = 1 or z = eiφ . (This is what you were asked to check in Sec. 6.1.2, Exercise 4.) Just in case you didn’t do the Exercise, let’s translate the Theorem into simple language in the case when all the singular values are different. If an m × n matrix A has two different SVD’s A = UΣV ∗ and A = LΣM∗ then I must have L = UQ and M = VP where R is an arbitrary (for m > n) (m − n) × (m − n) unitary matrix: Q = diag(eiφ1 , eiφ2 , . . . , eiφn , R) P = diag(eiφ1 , eiφ2 , . . . , eiφn ) & % MS4105 392 ' $ Substituting for L and M, LΣM∗ = UQΣP∗ V ∗ so, when all the singular values are different, it is easy to see that QΣP∗ = Σ as the phases cancel. Example 6.3 Let (using Matlab/Octave ’s [u,s,v]=svd(a)) 1 5 −0.4550 0.7914 0.4082 A= 2 6 , U = −0.5697 0.0936 −0.8165 3 7 −0.6844 −0.6041 0.4082 and 11.1005 0 −0.3286 −0.9445 Σ= 0 0.8827 , V = −0.9445 0.3286 0 0 & % MS4105 ' 393 $ The Matlab/Octave command q=diag(exp(i*rand(3,1)*2*pi)) generates a 3 × 3 diagonal matrix Q whose diagonal entries are random complex numbers of unit phase: −0.4145 − 0.9100i 0 0 I Q= 0 −0.9835 + 0.1807i 0 0 0 0.9307 + 0.3659i −0.4145 − 0.9100i 0 and it is easy to have P = 0 −0.9835 + 0.1807i check that QΣP∗ = Σ so the matrix A satisfies A = UΣV ∗ = LΣM∗ . (In this example R is the 1 × 1 “unitary matrix” 0.9307 + 0.3659i.) & % MS4105 ' 394 $ Construct another example using Matlab/Octave to illustrate the m < n case. The proof of Thm. 6.7 may be found in Appendix G. 
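One such m < n example, written as a hedged Matlab/Octave sketch (the matrix a and the phases are generated randomly, and the variable R plays the role of the arbitrary (n − m) × (n − m) unitary block in Thm. 6.7):

% Non-uniqueness of the SVD in the m < n ("short and wide") case.
m = 2; n = 5;
a = randn(m, n);
[u, s, v] = svd(a);
ph = exp(1i * 2 * pi * rand(m, 1));       % one random phase per singular value
L = u * diag(ph);                         % same phases on the columns of u ...
M = v;
M(:, 1:m) = v(:, 1:m) * diag(ph);         % ... and on the first m columns of v
[R, T] = qr(randn(n-m) + 1i*randn(n-m));  % R is the unitary Q-factor; T is discarded
M(:, m+1:n) = v(:, m+1:n) * R;            % redundant columns: any unitary mix
norm(L * s * M' - a)                      % zero up to rounding, so a = L*s*M' as well
norm(M' * M - eye(n))                     % M is still unitary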
& % MS4105 395 ' 6.2.2 $ Exercises 1. Using matlab/octave, construct an example to illustrate the result of Thm. 6.7. & % MS4105 396 ' 6.3 $ Naive method for computing SVD I still don’t know how to calculate the SVD — the following is a simple method which might work, let’s see. It is easy to check that if A = UΣV ∗ then A∗ A = VΣ∗ ΣV ∗ and AA∗ = UΣΣ∗ U∗ . Both A∗ A and AA∗ are obviously square and hermitian and so have real eigenvalues and eigenvectors given by: A∗ AV = VΣ∗ Σ AA∗ U = UΣΣ∗ . In other words A∗ A has eigenvalues σ21 , . . . , σ2p (p = min(m, n)) and eigenvectors given by the columns of V. & % MS4105 ' 397 $ Similarly AA∗ has the same eigenvalues and eigenvectors given by the columns of U. So one way to compute the SVD af an m × n complex matrix A is to form the two matrices A∗ A and AA∗ and find their eigenvalues and eigenvectors using the Matlab/Octave eig command. Let’s try it. (You can download a simple Matlab/Octave script to implement the idea from http://jkcray.maths.ul.ie/ms4105/verynaivesvd.m.) The listing is in App. D. The final line calculates the norm of the difference between UΣV ∗ and the initial (randomly generated) matrix A. This norm should be close to the matlab built-in constant eps, about 10−16 . & % MS4105 ' 398 $ >> verynaivesvd ans = 1.674023336836066 >> verynaivesvd ans = 1.091351067518971e-15 >> verynaivesvd ans = 1.125583683053479 >> verynaivesvd ans = 1.351646685125938e-15 & % MS4105 ' 399 $ Annoyingly, this Matlab/Octave script sometimes works but sometimes doesn’t. Why? Remember that the columns of U and V can be multiplied by arbitrary phases (which are the same for corresponding columns of U and V). The problem is that this method has no way to guarantee that the phases are the same as I are separately finding the eigenvectors of A∗ A and AA∗ . Sometimes (by luck) the script will generate U and V where the phases are consistent. Usually it will not. I will see a more successful approach in Sec. 6.5— in the meantime I will just use the Matlab/Octave code: >> [u,s,v]=svd(a) when I need the SVD of a matrix. & % MS4105 400 ' 6.4 $ Significance of SVD In this section I see how the SVD relates to other matrix properties. 6.4.1 Changing Bases One way of interpreting the SVD is to say that every matrix is diagonal — if I use the right bases for the domain and range spaces for the mapping x → Ax. Any vector b ∈ Cm can be expanded in the space of the columns of U and any x ∈ Cn can be expanded in the basis of the columns of V. & % MS4105 401 ' $ The coordinate vectors for these expansions are b 0 = U∗ b, x 0 = V ∗ x. Using A = UΣV ∗ , the equation b = Ax can be written in terms of these coefficient vectors (b 0 and x 0 ): b = Ax U∗ b = U∗ Ax = U∗ UΣV ∗ x b 0 = Σx 0 . So whenever b = Ax, I have b 0 = Σx 0 . A reduces to the diagonal matrix Σ when the range is expressed in the basis of columns of U and the domain is expressed in the basis of the columns of V. & % MS4105 402 ' 6.4.2 $ SVD vs. Eigenvalue Decomposition Of course the idea of diagonalisation is fundamental to the study of eigenvalues — Ch. 8. I will see there that a non-defective square matrix A can be expressed as a diagonal matrix of eigenvalues Λ if the domain and range are represented in a basis of eigenvectors. If an n × n matrix X has as its columns the linearly independent eigenvectors of an n × n complex matrix A then the eigenvalue decomposition of A is A = XΛX−1 (6.8) where Λ is an n × n diagonal matrix whose entries are the eigenvalues of the matrix A. (I will see in Ch. 
8 that such a factorisation is not always possible.) & % MS4105 ' 403 $ So if I have Ax = b, b ∈ Cn then if b 0 = X−1 b and x 0 = X−1 x then b 0 = Λx 0 . See Ch. 8 for a full discussion. • One important difference between the SVD and the eigenvalue decomposition is that the SVD uses two different bases (the columns of U and V) while the eigenvalue decomposition uses just one; the eigenvectors. • Also the SVD uses orthonormal bases while the eigenvalue decomposition uses a basis that is not in general orthonormal. • Finally; not all square matrices (only non-defective ones, see Thm. 8.5) have an eigenvalue decomposition but all matrices (not necessarily square) have a SVD. & % MS4105 404 ' 6.4.3 $ Matrix Properties via the SVD The importance of the SVD becomes clear when I look at its relationship with the rest of Matrix Algebra. In the following let A be a complex m × n matrix, let p = min(m, n) (the number of singular values ) and let r ≤ p be the number of non-zero singular values. First a simple result — useful enough to deserve to be stated as a Lemma. (It is the second half of the result on Slide 167 that in the matrix-matrix product B = AC, each column of B is a linear combination of the columns of A.) Lemma 6.8 If A and C are compatible matrices then each row of B = AC is a linear combination of the rows of C. & % MS4105 405 ' $ Proof: Write B = AC in index notation: X bik = aij cjk . j Fixing i corresponds to choosing the ith row of B, b∗i . So X ∗ bi = aij c∗j . j Or check that the result follows from the result on Slide 167 by taking transposes. & % MS4105 ' 406 $ Now an important result that allows us to define the rank of a matrix in a simple way. Theorem 6.9 The row and column ranks of any m × n complex matrix A are equal to r, the number of non-zero singular values. Proof: Let m ≥ n for the sake of definiteness — a tall thin matrix. (If m < n then consider A∗ which has more rows than columns. The proof below shows that the row and column ranks of A∗ are equal. But the row rank of A∗ is the column rank of A and the column rank of A∗ is the row rank of A.) I have that r of the singular values are non-zero. RTP that the row and column ranks are both equal to r. & % MS4105 407 ' $ I can write the m × n diagonal matrix Σ as ^ 0 Σ , Σ= 0 0 ^ is the r × r diagonal matrix of non-zero singular values. where Σ Now ∗ σ1 v1 σ2 v∗ 2 .. ∗ ΣV = . σ v∗ r r 0 & % MS4105 408 ' $ and ∗ h UΣV = u1 u2 ... ∗ σ1 v1 σ2 v∗ 2 i . . um . = σ v∗ r r 0 h u1 & u2 ... ur ∗ σ1 v1 σ2 v∗ 2 i . . 0 . (6.9) σ v∗ r r 0 % MS4105 ' 409 $ So every column of A = UΣV ∗ is a linear combination of the r linearly independent vectors u1 , . . . , ur and therefore the column rank of A is r. But the last equation 6.9 for UΣV ∗ also tells us (by Lemma 6.8) that every row of A = UΣV ∗ is a linear combination of the r linearly independent row vectors v∗1 , . . . , v∗r and so the row rank of A is r. Theorem 6.10 The range of A is the space spanned by u1 , . . . , ur and the nullspace of A is the space spanned by vr+1 , . . . , vn . Proof: Using 6.9 for A = UΣV ∗ , & % MS4105 410 ' h Ax = u1 $ u2 ... ur ∗ σ1 v1 x σ2 v∗ x 2 i . . 0 . which is a linear σ v∗ x r r 0 combination of u1 , . . . , ur . Also if a vector z ∈ Cn is a linear combination of vr+1 , . . . , vn then, again using 6.9 for A = UΣV ∗ and the fact that the vectors v1 , . . . , vn are orthonormal I have Az = 0. & % MS4105 ' 411 $ Theorem q 6.11 kAk2 = σ1 the largest singular value of A and kAkF = σ21 + σ22 + · · · + σ2r . 
Proof: I already have the first result by (6.5). You shold check that the Frobenius norm satisfies kAk2F = trace(A∗ A) = trace(AA∗ ). It follows that the Frobenius norm is invariant under multiplication by unitary matrices (check ) so kAkF = kΣkF . Theorem 6.12 The nonzero singular values of A are the square roots of the non-zero eigenvalues of AA∗ or A∗ A (the two matrices have the same eigenvalues ). (I have established this result already — but re-stated and proved here for clarity.) Proof: Calculate A∗ A = V(Σ∗ Σ)V ∗ , so A∗ A is unitarily similar to Σ∗ Σ and so has the same eigenvalues by Thm. 8.2. The eigenvalues of the diagonal matrix Σ∗ Σ are σ21 , σ22 , . . . , σ2p together with n − p additional zero eigenvalues if n > p. & % MS4105 ' 412 $ Theorem 6.13 If A is hermitian (A∗ = A) then the singular values of A are the absolute values of the eigenvalues of A. Proof: Remember that a hermitian matrix has a full set of orthonormal eigenvectors and real eigenvalues (Exercises 4.2.6). But then A = QΛQ∗ so A∗ A = QΛ2 Q∗ and the squares of the (real) eigenvalues are equal to the squares of the singular values — i.e. λ2i = σ2i and so as the singular values are non-negative I have σi = |λi |. & % MS4105 413 ' $ Theorem 6.14 For any square matrix A ∈ Cn×n , the modulus of Qn the determinant of A, | det A| = i=1 σi . Proof: The determinant of a product of square matrices is the product of the determinants of the matrices. Also the determinant of a unitary matrix has modulus equal to 1 as U∗ U = I and det(U∗ ) = (det(U)) (as a determinant is a sum of products of entries in the matrix and det AT = det A). Therefore | det A| = | det UΣV ∗ | = | det U|| det Σ|| det V ∗ | = | det Σ| = n Y σi . i=1 & % MS4105 414 ' 6.4.4 $ Low-Rank Approximations One way to understand and apply the SVD is to notice that the SVD of a matrix can be re-interpreted as an expansion of A as a sum of rank-1 matrices. The following result is surprising at first but easily checked. It is reminiscent of (but completely unrelated to) the Taylor Series expansion for smooth functions on R. And it is crucial in the succeeding discussion. Theorem 6.15 Any m × n matrix A can be written as a sum of rank-1 matrices: r X A= σj uj v∗j (6.10) j=1 Proof: Just write Σ as a sum of r matrices Σj where each Σj = diag(0, 0, . . . , σj , 0, . . . , 0). Then (6.10) follows from the SVD. & % MS4105 415 ' $ There are many ways to write a matrix as a sum of rank one matrices (for example an expansion into matrices all zero except for one of the rows of A). The rank-1 matrices σj uj v∗j in the expansion in (6.10) have the property that a rank-k partial sum Ak of the σj uj v∗j is the closest rank-k matrix to A — Ak is the “best low-rank approximation” to A. First I find an expression for kA − Ak k2 . Theorem 6.16 For any k such that 1 ≤ k ≤ r define Ak = k X σj uj v∗j . j=1 If k = p ≡ min(m, n), define σk+1 = 0. Then kA − Ak k2 = σk+1 . & % MS4105 416 ' Proof: First note that A − Ak = kA − Ak k22 = k r X $ Pr j=k+1 σj uj v∗j so σj uj v∗j k22 j=k+1 = largest eigenvalue of r X ∗ σj uj v∗j j=k+1 = largest eigenvalue of r X ! σi ui v∗i i=k+1 r X σi σj vj u∗j ui v∗i i,j=k+1 = largest eigenvalue of r X ! σ2i vi v∗i = σ2k+1 . i=k+1 using (6.5), orthonormality of the ui and the fact that Pr 2 ∗ i=k+1 σi vi vi has eigenvectors vi and corresponding eigenvalues σ2i , i = k + 1, . . . r — the largest of which is σ2k+1 . & % MS4105 417 ' $ I now prove the main result: Theorem 6.17 With the definitions in the statement of Thm. 
6.16, a rank-k partial sum of the σj uj v∗j is the closest rank-k matrix to A (inf means “greatest lower bound” or infimum): kA − Ak k2 = inf B∈Cm×n ,rank B≤k kA − Bk2 = σk+1 . Proof: Suppose that there is a matrix B with rank B ≤ k “closer to A than Ak ”, i.e. kA − Bk2 < kA − Ak k2 = σk+1 (the last equality by Thm. 6.16). As B has rank less than or equal to k, by the second part of Thm. 6.10 there is a subspace W of dimension at least (n − k) — the nullspace of B — W ⊆ Cn such that w ∈ W ⇒ Bw = 0. So for any w ∈ W, I have Aw = (A − B)w and kAwk2 = k(A − B)wk2 ≤ kA − Bk2 kwk2 < σk+1 kwk2 . & % MS4105 418 ' $ • So W is a subspace of Cn of dimension at least (n − k) — and kAwk2 < σk+1 kwk2 for any w ∈ W. • Let W = span{v1 , . . . , vk+1 }, the first k + 1 columns of V where Pk+1 ∗ A = UΣV . Let w ∈ W. Then w = i=1 αi vi with Pk+1 2 kwk2 = i=1 |αi |2 . So k+1 2 k+1 X X 2 kAwk2 = αi σi ui = |αi |2 σ2i i=1 ≥ σ2k+1 i=1 k+1 X |αi |2 = σ2k+1 kwk2 . i=1 • Since the sum of the dimensions of W and W is greater than n, there must be a non-zero w vector that is contained in both. Why? check . Contradiction. & % MS4105 419 ' 6.4.5 $ Application of Low-Rank Approximations A nice application of low rank approximations to a matrix is image compression. Suppose that I have an m × n matrix of numbers (say each in [0, 1]) representing the grayscale value of a pixel where 0 is white and 1 is black. Then a low-rank approximation to A is a neat way of generating a compressed representation of the image. (There are more efficient methods which are now used in practice.) & % MS4105 ' 420 $ I can work as follows: • Find the SVD of A. • Find the effective rank of A by finding the number of singular √ values that are greater than some cutoff, say ε where ε is matlab’s eps or machine epsilon. • Calculate a succession of low rank approximations to A. The following slides illustrate the idea. The original portrait of Abraham Lincoln on Slide 422 is a greyscale image consisting of 302 rows and 244 columns of dots. Each dot is represented by an integer between 0 and 255. This loads into Matlab/Octave as a 302 × 244 matrix A (say) of 8-bit integers (much less storage than representing the dots as double-precision reals). So the matrix A takes 73, 688 bytes of storage. & % MS4105 ' 421 $ You should check that these need 302r + r + 244r = 547r bytes (where r is the rank used) . I will use low-rank approximations to store and display the portrait, say ranks r = 10, r = 20 and r = 30. In other words, 5, 470 bytes, 10, 940 bytes and 16, 410 bytes of storage respectively — much less than 73, 688 bytes required for the original portrait. The low rank approximations are shown on the succeeding Slides: a rank-10 reconstruction on Slide 423, a rank-20 reconstruction on Slide 424 and a rank-30 reconstruction on Slide 425. Even the rank 10 approximation is unmistakeable while the rank-30 approximation is only slightly “fuzzy”. In the Exercise you will be asked to use the method on a more colourful portrait! & % MS4105 ' 422 $ Figure 12: Original greyscale picture of Abraham Lincoln & % MS4105 423 ' $ Figure 13: Rank-10 reconstruction of Lincoln portrait & % MS4105 424 ' $ Figure 14: Rank-20 reconstruction of Lincoln portrait & % MS4105 425 ' $ Figure 15: Rank-30 reconstruction of Lincoln portrait & % MS4105 ' 426 $ Now for some colour! 
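The greyscale reconstruction described above takes only a few lines of Matlab/Octave; a minimal sketch follows (the filename is a placeholder for whichever greyscale image is to hand, and the colour examples below essentially repeat this for each of the three colour planes):

% Rank-r reconstruction of a greyscale image via the SVD.
A = double(imread('lincoln.png'));   % placeholder filename; assumed greyscale, so A is m x n
[U, S, V] = svd(A);
sig = diag(S);
for r = [10 20 30]
  Ar = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';    % rank-r partial sum as in (6.10)
  fprintf('r = %2d  ||A - Ar||_2 = %.4e  sigma_(r+1) = %.4e\n', ...
          r, norm(A - Ar), sig(r+1));           % equal up to rounding, by Thm. 6.16
  figure; imagesc(Ar); colormap(gray); axis image;
  title(sprintf('Rank-%d reconstruction', r));
end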
It takes a little more work to create a low-rank approximation to an m × n pixel colour image as they are read in and stored by Matlab/Octave as three separate matrices, one for each of the primary colours; red, greeen and blue. To illustrate this the next example picture Fig. 17 has 249 rows and 250 columns and is stored by Matlab/Octave as a multi-dimensional arrow of size >> size(picture) ans = 249 250 3 The three “colour planes” each have rank 249 but when I plot the “red” singular values in Fig. 16 I see that they decrease rapidly so a low rank approximation will capture most of the detail. >> semilogy(diag(s_r)) >> title(’Semi-log Y plot of SVD’’s for red layer of painting’) & % MS4105 ' 427 $ Figure 16: Semi-log Y plot of SVD’s for red layer of painting & % MS4105 428 ' $ The details will be explained in the tutorial — for now just have a look! & Figure 17: Original Sailing Painting % MS4105 429 ' $ Figure 18: Rank-10 reconstruction of Sailing Painting & % MS4105 430 ' $ Figure 19: Rank-40 reconstruction of Sailing Painting & % MS4105 431 ' $ Figure 20: Rank-70 reconstruction of Sailing Painting & % MS4105 ' 432 $ Figure 21: Rank-100 reconstruction of Sailing Painting & % MS4105 433 ' 6.5 $ Computing the SVD As mentioned earlier, in principle I can find the singular values matrix Σ by finding the eigenvalues of A∗ A. • A straightforward way to find the matrices U and V is: 1. Solve the eigenvalue problem A∗ AV = VΛ — i.e. find the eigenvalue decomposition A∗ A = VΛV ∗ . 2. Set Σ to be the m × n diagonal square root of Λ. 3. Solve the system UΣ = AV for a unitary matrix U. • Step 3 is non-trivial — I cannot just solve the matrix equation for U by setting U = AVΣ−1 as Σ may not be invertible (cannot be if m 6= n). & % MS4105 434 ' $ • However I can (when m ≥ n) solve for the first n columns of ^ say) as follows: U, (U ^ = AV Σ ^ −1 U ^ is the n × n top left square block of Σ. where Σ • This simple algorithm is implemented in http://jkcray.maths.ul.ie/ms4105/usv.m. • It works for moderate values of m and n (m ≥ n) when A is not too ill-conditioned. To illustrate its limitations try running the Matlab/Octave script: http://jkcray.maths.ul.ie/ms4105/svd_expt.m. You’ll find that when the ratio of the largest to the least singular value is bigger than about 1.0e8, the orthogonality property of u is only approximately preserved. • (The script also tests the qrsvd.m code to be discussed below.) & % MS4105 435 ' $ • A better solution is to find the QR factorisation for the matrix AV. • The QR factorisation, discussed in Section. 5.2 expresses any m × n matrix A as A = QR (6.11) where Q is unitary m × m and R is upper triangular. • In fact the matrix R in the QR factorisation for the matrix AV is diagonal — see Exercise 2 at the end of this Chapter. • So I can write AV = QD where Q is unitary and D is diagonal. • I cannot simply set U = Q as some of the diagonal elements of D may be negative or even complex. & % MS4105 ' 436 $ • I have AV = QD so A = QDV ∗ and therefore by the SVD, UΣV ∗ = QDV ∗ . Solving for D, I have D = OΣ where O = Q∗ U is unitary. But D is diagonal so check that O must be also. A diagonal unitary matrix must satisy |Oii |2 = 1, i.e. each Oii = eiφi for some real φi . If I divide each diagonal element of D by eiφi and multiply each row of the matrix Q by the corresponding eiφi then the matrix D is real and non-negative and I have our SVD. • A slight “wrinkle” is that I use the absolute value of the diagonal matrix R to calculate the singular value matrix. 
This greatly improves the accuracy of the factorisation for the subtle reason that the process of computing Q and R using the QR algorithm is much more numerically stable than the matrix division used in usv.m. & % MS4105 ' 437 $ • And the matlab code is at http://jkcray.maths.ul.ie/ms4105/qrsvd.m • This algorithm is much more numerically stable , even when A is ill-conditioned. • UΣV ∗ is a good approximation to A — though not quite as good as the usv.m algorithm. • The new QR SVD algorithm also preserves the orthogonality of U to very high accuracy which is an improvement. (This is due to the QR factorisation, discussed in Section 5.2 .) • You can check qrsvd.m using the test script http://jkcray.maths.ul.ie/ms4105/svd_expt.m. • Practical algorithms for calculating the SVD will be briefly discussed in Ch. 10. They are more numerically stable due to the fact that U and V are calculated simultanaeously without forming A∗ A — they are also much faster. & % MS4105 ' 438 $ • Once the SVD is known, the rank can be computed by simply counting the number of non-zero singular values. • In fact the number of singular values greater than some small tolerance is usually calculated instead as this gives a more robust answer in the presence of numerical rounding effects. & % MS4105 439 ' 6.5.1 $ Exercises 1. Show that the Frobenius norm is invariant wrt multiplication by a unitary matrix. 2. Show that if Q and R are the (unitary and upper triangular respectively) QR decomposition of the matrix AV, where A is any m × n complex matrix and V is the unitary matrix that appears in the SVD for A = UΣV ∗ , then R is diagonal. The following steps may be helpful: (a) Show that AV = QR and A = UΣV ∗ imply that Σ = U∗ QR = OR (say) where O is orthogonal. (b) Show that Σ = OR implies that Σ∗ Σ = R∗ R and so that R∗ R is diagonal. (c) Show (using induction) that if R is upper triangular and R∗ R is diagonal then R must be diagonal. & % MS4105 ' 440 $ 3. Consider again the matrix A in Example 4.3. Using the SVD, work out the exact values of σmin and σmax for this matrix. 4. Consider the matrix (See Appendix F.) −2 11 . A= −10 5 & % MS4105 ' 441 $ (a) Find an SVD (remember that it isn’t unique) for A that is real and has as few minus signs as possible in U and V. (b) List the singular values of A. (c) Using Maple or otherwise sketch the unit ball (in the Euclidean 2-norm) in R2 and its image under A together with the singular vectors. (d) What are the 1-, 2- and ∞-norms of A? (e) Find A−1 via the SVD. (f) Find the eigenvalues λ1 , λ2 of A. & % MS4105 ' 7 442 $ Solving Systems of Equations In this Chapter I will focus on Gaussian Elimination — a familiar topic from Linear Algebra 1. I will, however, revisit the algorithm from the now-familiar perspective of matrix products and factorisations. I will begin by reviewing G.E. together with its computational cost, then amend the algorithm to include “partial pivoting” and finally I will briefly discuss the question of numerical stability. & % MS4105 443 ' 7.1 $ Gaussian Elimination You are probably familiar with Gaussian Elimination in terms of “applying elementary row operations” to a (usually) square matrix A in order to reduce it to “reduced row echelon form” — effectively upper triangular form. I will re-state this process in terms of matrix, rather than row, operations. 
7.1.1 LU Factorisation I will show that Gaussian Elimination transforms a linear system Ax = b into a upper triangular one by applying successive linear transformations to A, multiplying A on the left by a simple matrix at each iteration. This process is reminiscent of the Householder triangularisation for computing QR factorisations. The difference is that the successive transformations applied in GE are not unitary. & % MS4105 444 ' $ I’ll start with an m × m complex matrix A (I could generalise to non-square matrices but these are rarely of interest when solving linear systems ). The idea is to transform A into an m × m upper triangular matrix U by introducing zeroes below the main diagonal; first in column 1, them in column 2 and so on — just as in Householder triangularisation. Gaussian Elimination effects these changes by subtracting multiples of each row from the rows beneath. I claim that this “elimination” process is equivalent to multiplying A by a succession of lower triangular matrices Lk on the left: Lm−1 Lm−2 . . . L2 L1 A = U, (7.1) and writing L−1 = Lm−1 Lm−2 . . . L2 L1 so that A = LU where U is upper triangular and L is lower triangular. & (7.2) % MS4105 445 ' $ For example start with a 4 × 4 matrix A. The algorithm takes three steps. × × × × × × × × × × × × A × × × × L1 0 X → 0 X × × 0 X × × X X L2 → X X X X L1 A × × × × × 0 X 0 X L2 L1 A & × × × × × × × × × L3 → X × × X 0 X L3 L2 L1 A % MS4105 ' 446 $ So I can summarise: [Gram-Schmidt] A = QR by triangular orthogonalisation [Householder ] A = QR by orthogonal triangularisation [Gaussian Elimination ] A = LU by triangular triangularisation & % MS4105 447 ' 7.1.2 $ Example I’ll start with a numerical example. 2 1 4 3 A= 8 7 6 7 [Gaussian Elimination : Step 1] 1 2 4 −2 1 L1 A = −4 1 8 −3 1 6 & 1 3 9 9 0 1 5 8 1 1 3 3 7 9 7 9 2 1 0 1 1 = 5 3 4 8 1 1 5 6 0 1 5 8 % MS4105 ' 448 $ I subtracted 2 times Row 1 from Row 2, four times Row 1 from Row 3 and three times Row 1 from Row 4. & % MS4105 449 ' [Gaussian Elimination : Step 2] 1 2 1 L2 L1 A = −3 1 −4 1 $ 1 1 0 2 1 1 1 1 1 = 3 5 5 4 6 8 1 0 1 1 2 2 2 4 I subtracted three times Row 2 from Row 3 and four times Row 2 from Row 4. [Gaussian Elimination : Step 3] 1 2 1 1 0 2 1 1 1 1 L3 L2 L1 A = = 1 2 2 −1 1 2 4 I subtracted Row 3 from Row 4. & 1 1 0 1 1 1 2 2 2 % MS4105 ' 450 $ Now to enable us to write A = LU, I need to calculate the product L = L1 −1 L2 −1 L3 −1 . This turns out to be much easier than expected due to the following two properties — I will prove them in Theorem 7.1 below. (a) The inverse of each Lk is just Lk with the signs of the subdiagonal elements in column k reversed. 1 1 For example L2 −1 = 3 1 4 1 (b) The product of the Lk −1 in increasing order of k is just the identity matrix with the non-zero sub-diagonal elements of each of the Lk −1 inserted in the appropriate places. & % MS4105 451 ' So I can write 2 4 8 6 $ 1 1 3 3 7 9 7 9 A & 0 1 1 2 = 5 4 8 3 1 3 1 4 1 L 2 1 1 0 1 1 1 2 2 1 2 (7.3) U % MS4105 452 ' 7.1.3 $ General Formulas for Gaussian Elimination It is easy to write the matrix Lk for an arbitrary matrix A. Let ak be the kth column of the matrix at the beginning of Step k. Then the transformation Lk must be chosen so that a1k a1k .. .. . . akk Lk akk → Lk ak = . ak = ak+1,k 0 .. .. . . amk & 0 % MS4105 453 ' $ To achieve this I need to add −`jk times row k to row j, where `jk is the multiplier `jk = & ajk , akk k < j ≤ m. % MS4105 454 ' The matrix Lk must take the form 1 .. . 1 Lk = −`k+1,k .. . −`m,k & $ 1 .. . 
1 % MS4105 ' 455 $ I can now prove the two useful properties of the Lk matrices mentioned above: Theorem 7.1 (a) Each Lk can be inverted by negating its sub-diagonal elements. (b) The product of the Lk −1 in increasing order of k is just the identity matrix with the non-zero sub-diagonal elements of each of the Lk −1 inserted in the appropriate places. Proof: (a) Define `k to be the vector of multipliers for the kth column of Lk , (with zeroes in the first k rows) so that & % MS4105 456 ' $ 0 .. . 0 . `k = `k+1,k .. . `m,k Then Lk = I − `k e∗k where ek is the usual vector in Cm with 1 in the kth row and zeroes elsewhere. Obviously e∗k `k = 0 so (I − `k e∗k )(I + `k e∗k ) = I − `k e∗k `k e∗k = I proving that Lk −1 = (I + `k e∗k ), proving (a). & % MS4105 457 ' $ (b) Consider the product Lk −1 Lk+1 −1 . I have Lk −1 Lk+1 −1 = (I + `k e∗k )(I + `k+1 e∗k+1 ) = I + `k e∗k + `k+1 e∗k+1 + `k e∗k `k+1 e∗k+1 = I + `k e∗k + `k+1 e∗k+1 as e∗k `k+1 = 0 and so `k e∗k `k+1 e∗k+1 = 0. So the matrix L can be written as 1 `21 −1 −1 −1 L = L1 L2 . . . Lm−1 = `31 . .. `m1 & 1 `32 .. . 1 .. `m2 ... . .. . `m,m−1 (7.4) 1 % MS4105 ' 458 $ In practice (analogously to QR factorisation ) the matrices Lk are never formed explicitly. The multipliers `k are computed and stored directly into L and the transformations Lk are then applied implicitly. Algorithm 7.1 Gaussian Elimination without Pivoting (1) (2) (3) (4) (5) (6) (7) U = A, L = I for k = 1 to m − 1 for j = k + 1 to m ujk `jk = ukk uj,k:m = uj,k:m − `jk uk,k:m end end & % MS4105 459 ' 7.1.4 $ Operation Count The work is dominated by the vector operation in the inner loop; uj,k:m = uj,k:m − `jk uk,k:m which executes one scalar-vector multiplication and one vector subtraction. If l = m − k + 1 is the length of the row vectors being worked on then the number of flops is 2l. To find W, the total number of flops I just write the two nested for loops as sums: & % MS4105 460 ' $ W= m−1 X m X 2(m − k + 1) k=1 j=k+1 = m−1 X 2(m − k)(m − k + 1) k=1 = m−1 X 2j(j + 1) setting j = m − k j=1 = 2 (m − 1)m(2(m − 1) + 1)/6 + m(m − 1)/2 = 2/3 m3 − 2/3 m ≈ 2/3 m3 & (7.5) % MS4105 ' 461 $ Comparing this result with (5.26) for the Householder QR method for solving a linear system (when m = n, the latter has W = 4m3 /3) I have found that Gaussian Elimination has half the computational cost. & % MS4105 462 ' 7.1.5 $ Solution of Ax = b by LU factorisation If A is factored into L and U, a system of equations Ax = b can be solved by first solving Ly = b for the unknown y (forward substitution) then Ux = y for the unknown x (back substitution). The back substitution algorithm can be coded as: Algorithm 7.2 Back Substitution (2) for j = mDOWNTO1 Pm xj = bj − k=j+1 xk ujk /ujj (3) end (1) I can easily compute the cost of back substitution. At the jth iteration, the cost is m − j multiplys and m − j subtractions plus one division giving a total of 2(m − j) + 1 flops. Summing this for j = 1 to m gives a total cost W = m2 . & % MS4105 ' 463 $ Exercise 7.1 Write pseudo-code for the Forward Substitution algorithm and check that the result is also W = m2 . Putting the pieces together, I have that when I solve a linear system using LU factorisation, the first step (factoring A) requires ≈ 2/3m3 flops (as discussed earlier), the second and third each require ≈ m2 flops. So the overall cost of the algorithm is W ≈ 2/3m3 as the m3 term is the leading term for large m. 
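As a concrete illustration, here is a minimal Matlab/Octave translation of Alg. 7.1 followed by the two triangular solves (the test matrix is random but weighted towards the diagonal so that no pivot is close to zero; the triangular systems are handed to backslash, which carries out the forward and back substitution of Exercise 7.1 and Alg. 7.2):

% LU factorisation without pivoting (Alg. 7.1), then solve Ax = b.
m = 6;
A = randn(m) + m * eye(m);     % weighted towards the diagonal, so no tiny pivots
b = randn(m, 1);
U = A;  L = eye(m);
for k = 1:m-1
  for j = k+1:m
    L(j, k)   = U(j, k) / U(k, k);                   % the multiplier l_jk
    U(j, k:m) = U(j, k:m) - L(j, k) * U(k, k:m);     % subtract l_jk times row k
  end
end
y = L \ b;        % forward substitution (L is unit lower triangular)
x = U \ y;        % back substitution (U is upper triangular)
norm(L * U - A)   % factorisation error: rounding level
norm(A * x - b)   % residual of the computed solution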
& % MS4105 464 ' 7.1.6 $ Instability of Gaussian Elimination without Pivoting Gauss Elimination is clearly faster (a factor of two) than an algorithm based on Householder factorisation. However it is not numerically stable as presented. In fact the algorithm will fail completely for some perfectly well-conditioned matrices as it will try to divide by zero. Consider 0 A= 1 1 . 1 The matrix has full rank and is well-conditioned but it is obvious that GE will fail at the first step due to dividing by zero. Clearly I need to modify the algorithm to prevent this happening. & % MS4105 465 ' $ Even if our basic GE does not fail due to division by zero it can still have numerical problems. Now consider a small variation on A above: 10−20 A= 1 1 . 1 The GE process does not fail. I subtract 1020 times row one from row two and the following factors are calculated (in exact arithmetic): 1 0 10−20 1 . L= , U= 1020 1 0 1 − 1020 & % MS4105 466 ' $ The problems start of course when I try to perform these calculations in floating point arithmetic, with a machine epsilon (smallest non-zero floating point number) ≈ 10−16 , say. The number 1 − 1020 will not be represented exactly, it will be rounded to the nearest float — suppose that this is −1020 . Then the floating point matrices produced by the algorithm will be −20 1 0 10 1 ˜ ˜ . L= , U= 1020 1 0 −1020 The change in L and U is small but if I ˜U ˜ I find that L −20 10 ˜U ˜ = L 1 now compute the product 1 0 which is very different to the original A. & % MS4105 467 ' $ In fact ˜Uk ˜ = 1. kA − L ˜ and U ˜ to find a solution to Ax = b with b = (1, 0)∗ I get If I use L ˜ = (0, 1)∗ while the exact solution is x ≈ (−1, 1)∗ . (Check .) x GE has computed the LU decomposition of A in a “stable” way, ˜ and U ˜ the floating point (rounded) factors of A are close to i.e. L the exact factors L and U of a matrix close to A (in fact A itself). However I have seen that the exact arithmetic solution to Ax = b differs greatly from the floating point solution to Ax = b. & % MS4105 ' 468 $ This is due to the unfortunate fact that the LU algorithm for the solution of Ax = b, though stable, is not “backward stable” in that the floating point implementation of the algorithm does not have the nice property that it gives the exact arithmetic solution to the same problem with data that differs from the exact data with a relative error of order machine epsilon. I will show in the next section that (partial) pivoting largely eliminates (pun) these difficulties. (A technical description of the concepts of stability and backward stability can be found in Appendix O.) & % MS4105 469 ' 7.1.7 $ Exercises 1. Let A ∈ Cm×m be invertible. Show that A has an LU factorisation iff for each k such that 1 ≤ k ≤ m, the upper-left k × k block A1:k,1:k is non-singular. (Hint: the row operations of GE leave the determinants det A1:k,1:k invariant.) Prove that this LU decomposition is unique. 2. The GE algorithm, Alg. 7.1 as presented above has two nested for loops and a third loop implicit in line 5. Rewrite the algorithm with just one explicit for loop indexed by k. Inside this loop, U should be updated at each step by a certain rank-one outer product. (This version of the algorithm may be more efficient in matlab as matrix operations are implemented in compiled code while for loops are interpreted.) 3. Write matlab code to implement Algs. 7.1 and 7.2. & % MS4105 470 ' 7.2 $ Gaussian Elimination with Pivoting I saw in the last section that GE in the form presented is unstable. 
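The failure just described is easy to reproduce. A short Matlab/Octave check of the 2 × 2 example from Section 7.1.6, with backslash (which pivots) as the reference solution:

% GE without pivoting on the tiny-pivot example of Section 7.1.6.
A = [1e-20 1; 1 1];
b = [1; 0];
l21 = A(2,1) / A(1,1);                          % the huge multiplier 1e20
L = [1 0; l21 1];
U = [A(1,1) A(1,2); 0 A(2,2) - l21*A(1,2)];     % 1 - 1e20 rounds to -1e20
x_ge  = U \ (L \ b);    % gives (0, 1)'
x_ref = A \ b;          % partial pivoting: close to the exact (-1, 1)'
disp([x_ge x_ref])
norm(L*U - A)           % the factorisation error is 1, not O(eps)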
The problem can be mitigated by re-ordering the rows of the matrix being operated on, a process called pivoting. But first, as I will use the term and the operation frequently in this section; a note on permutations. & % MS4105 471 ' 7.2.1 $ A Note on Permutations Formally, a permutation is just a re-ordering of the numbers {1, 2, . . . , n} for any positive integer n. For example p = {3, 1, 2} is a permutation of the set {1, 2, 3}. In the following I will use permutation matrices Pi to re-order the rows/columns of the matrix A. Some details for reference: • Let p be the required re-ordering of the rows of a matrix A and I the n × n identity matrix. • Then claim that P = I(p, :) is the corresponding permutation matrix — or equivalently that Pij = δpi ,j . • Check: (PA)ik = Pij Ajk = δpi ,j Ajk = Api ,k — clearly the rows of A have been permuted using the permutation vector p. & % MS4105 ' 472 $ • What if the columns of the n × n identity matrix I are permuted? • Let Q = I(:, p) so that Qij = δi,pj . • Then (AQ)ik = Aij Qjk = Aij δj,pk = Ai,pk — the columns of A have been permuted using the permutation vector p. • Finally, what is the relationship between the (row) permutation matrix P and the (column) matrix Q? • Answer: (PQ)ik = Pij Qjk = δpi ,j δj,pk = δpi ,pk ≡ δik ! • So Q = P−1 — i.e. Q is the permutation matrix corresponding to the “inverse permutation” q that “undoes” p so that p(q) = [1, 2, . . . , n]T and Qp = Pq = [1, 2, . . . , n]T . & % MS4105 473 ' $ • A Matlab example: >> n = 9; >> I = eye(n); % the n × n identity matrix >> A = rand(n, n) >> p = [1, 7, 2, 5, 4, 6, 9, 3, 8] >> P = I(p, :) %The perm matrix corr. to p. >> P ∗ A − A(p, :) % zero, P permutes the rows of A. >> Q = I(:, p) %The perm matrix corr. to q (below). >> A ∗ Q − A(:, p) % zero >> P ∗ Q − I % zero i.e. Q = P−1 >> [i, q] = sort(p) % q is "inverse permutation" for p >> Q − I(q, :) % zero & % MS4105 474 ' 7.2.2 $ Pivots At step k of GE, multiples of row k are subtracted from rows k + 1, . . . , m of the current (partly processed version of A) matrix X (say) in order to introduce zeroes into the kth column of these rows. I call xkk the pivot. However there is no reason why the kth row and column need have been chosen. In particular if xkk = 0 I cannot use the kth row as that choice results in a divide-by-zero error. & % MS4105 ' 475 $ It is also better from the stability viewpoint to choose the pivot row to be the one that gives the largest value possible for the pivot — equivalently the smallest value possible for the multiplier. It is much easier to keep track of the successive choices of pivot rows if I swap rows as neccessary to ensure that at the kth step, the kth row is still chosen as the pivot row. & % MS4105 476 ' 7.2.3 $ Partial pivoting If every element of Xk:m,k:m is to be considered as a possible pivot at step k then O((m − k)2 ) entries need to be examined and summing over all m steps I find that O(m3 ) operations are needed. This would add significantly to the cost of GE. This strategy is called complete pivoting and is rarely used. The standard approach, partial pivoting, searches for the largest entry in the kth column in rows k to m, the last m − k + 1 elements of the kth column. So the GE algorithm is amended by inserting a permutation operator (matrix) between successive left-multiplications by the Lk . & % MS4105 477 ' $ More precisely, after m − 1 steps, A becomes an upper triangular matrix U with Lm−1 Pm−1 . . . L2 P2 L1 P1 A = U. 
(7.6) where the matrices Pk are permutation matrices formed by swapping the kth row of the identity matrix with a lower row. Note that although Pk −1 ≡ Pk for all k (because Pk2 = I) I will usually in the following write Pk −1 where appropriate rather than Pk to make the argument clearer. & % MS4105 478 ' 7.2.4 $ Example Let’s re-do the numerical example above with partial pivoting. 2 4 A= 8 6 With p.p., interchange rows 1 & 1 2 1 4 3 1 8 7 1 1 6 7 & 1 1 3 3 7 9 7 9 0 1 5 8 3 by left-multiplying 1 0 8 7 9 3 1 4 3 3 = 2 1 1 9 5 9 8 6 7 9 by P1 : 5 1 0 8 % MS4105 479 ' The first elimination step: 1 8 4 − 1 1 2 1 − 1 2 4 − 34 1 6 $ left-multiply by L1 ; 7 9 5 8 7 1 3 3 1 −2 = −3 1 1 0 4 7 7 9 8 4 Now swap rows 2 & 4 by left-multiplying by P2 : 8 7 1 8 7 9 5 3 3 7 1 1 − 2 − 2 − 2 4 = 3 5 5 3 1 −4 −4 −4 −4 7 9 17 1 − 1 4 4 4 2 & 9 5 − 23 − 45 9 4 − 32 5 −4 17 4 9 9 4 − 45 − 23 5 17 4 5 −4 − 32 % MS4105 480 ' $ The second elimination step: left-multiply by L2 ; 1 1 3 7 2 7 8 1 1 7 9 7 4 − 34 − 12 9 4 − 54 − 32 5 8 17 4 = − 54 − 32 7 9 7 4 9 4 − 27 − 67 Now swap rows 3 & 4 by left-multiplying by P3 : 8 7 1 8 7 9 5 9 17 7 7 1 4 4 4 4 = 2 4 − 1 7 7 − 67 − 27 1 & 9 9 4 − 67 − 27 5 17 4 4 7 − 27 5 17 4 2 −7 4 7 % MS4105 481 ' The final elimination step: left-multiply by L3 ; 1 8 7 9 5 8 7 9 17 7 7 1 4 4 4 4 = 6 2 − − 1 7 7 4 − 27 − 13 1 7 & $ 9 9 4 − 67 5 17 4 2 −7 2 3 % MS4105 482 ' 7.2.5 $ PA = LU Factorisation Have I just completed an LU factorisation of A? No, I have computed an LU factorisation of PA, where P is a permutation matrix: 1 2 1 1 0 1 4 3 3 1 1 8 7 9 5 1 6 7 9 8 P A 1 3 = 4 1 2 1 4 & 1 − 72 − 73 L 1 1 3 8 1 7 9 7 4 9 4 − 67 U 5 17 4 2 −7 2 3 (7.7) % MS4105 483 ' $ Compare this with (7.3). Apart from the presence of fractions in (7.7) and not in (7.3), the important difference is that all the subdiagonal elements in L are ≤ 1 in magnitude as a result of the pivoting strategy. I need to justify the statement that PA = LU and explain how I computed L and P. The example elimination just performed took the form L3 P3 L2 P2 L1 P1 A = U. These elementary matrix products can be re-arranged: −1 −1 −1 L3 P3 L2 P2 L1 P1 = L3 P3 L2 P3 P3 P2 L1 P2 P3 P3 P2 P1 = L30 L20 L10 P3 P2 P1 where obviously L30 = L3 , & L20 = P3 L2 P3 −1 and L1 = P3 P2 L1 P2 −1 P3 −1 . % MS4105 484 ' 7.2.6 $ Details of Li to Li0 Transformation I can check that the above results for Lj , Pj and Lj0 with j = 1, 2, 3 can be extended to j in the range 1, . . . , m − 1. Define (for j = 1, 2, . . . , m − 1) Πj = Pm−1 Pm−2 . . . Pj (7.8) Lj0 = Πj+1 Lj Π−1 j+1 , (7.9) 0 with Πm−1 = Pm−1 and Πm = I so that Lm−1 = Lm−1 . Then RTP that 0 0 Lm−1 Lm−2 . . . L20 L10 & Π1 = Lm−1 Pm−1 Lm−2 Pm−2 . . . L2 P2 L1 P1 . (7.10) % MS4105 485 ' $ First notice that 0 Lj+1 Lj0 −1 ≡ Πj+2 Lj+1 Π−1 j+2 Πj+1 Lj Πj+1 = −1 −1 Πj+2 Lj+1 Pj+2 . . . Pm−1 Pm−1 . . . Pj+1 Lj Π−1 j+1 = Πj+2 Lj+1 Pj+1 Lj Π−1 j+1 . 0 0 So Lm−1 Lm−2 . . . L20 L10 Π1 = Lm−1 Pm−1 Lm−2 . . . P2 L1 Π−1 2 Π1 = −1 Lm−1 Pm−1 Lm−2 . . . P2 L1 P2−1 . . . Pm−1 Pm−1 . . . P1 = Lm−1 Pm−1 Lm−2 . . . P2 L1 P1 , as required. & % MS4105 ' 486 $ • Since the definition (7.9) of Lj0 only applies permutations Pk with k > j to Lj I can see that: Lj0 must have the same structure as Lj . • This is because each Pk swaps row k with one of the succeeding rows. • The effect of Pk with k > j on Lj when I form Pk Lj Pk −1 is just to permute the sub-diagonal elements in the jth column of Lj according to the permutation encoded in Pk . 
• Left-multiplying Lj by Pk swaps rows k and l (say, for some l > k) and right-multiplying Lj by Pk swaps columns k and l which “undoes” the effect of swapping rows k and l, except for column j. 0 So the matrices L10 , L20 , . . . Lm−1 are just the original Li , i = 1, 2, . . . , m − 1 with the sub-diagonal elements in columns 1, 2, . . . , m − 1 respectively appropriately permuted. & % MS4105 487 ' $ In general, for an m × m matrix, the factorisation (7.6) provided by GE with p.p. can be written (based on Eq. 7.10): 0 0 0 Lm−1 . . . L2 L1 (Pm−1 . . . P2 P1 ) A = U. where each Lk0 is defined as (based on Eq. 7.9): Lk0 = Pm−1 . . . Pk+1 Lk Pk+1 −1 . . . Pm−1 −1 The product of the matrices Lk0 is also unit lower triangular (ones on main diagonal) and invertible by negating the sub-diagonal entries, just as in GE without pivoting.. If I write 0 0 0 −1 L = Lm−1 . . . L2 L1 and P = (Pm−1 . . . P2 P1 ) I have PA = LU. & (7.11) % MS4105 ' 488 $ I can now write pseudo-code for the partial-pivoting version of G.E.: Algorithm 7.3 Gaussian Elimination with Partial Pivoting (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) U = A, L = I, P = I for k = 1 to m − 1 Select i ≥ k to maximise |uik | uk,k:m ↔ ui,k:m (Swap rows k and i) lk,1:k−1 ↔ li,1:k−1 pk,: ↔ pi,: for j = k + 1 to m ujk `jk = ukk uj,k:m = uj,k:m − `jk uk,k:m end end & % MS4105 ' 489 $ To leading order this algorithm requires the same number of flops as G.E. without pivoting, namely 23 m3 . (See the Exercises). In practice, P is not computed or stored as a matrix but as a permutation vector. & % MS4105 490 ' 7.2.7 $ Stability of GE It can be shown that: • If A is invertible and the factorisation A = LU is computed by GE without pivoting then (provided that the reasonable conditions on floating point arithmetic (O.1) and (O.2) hold) then, provided M is sufficiently small if A has an LU decomposition, the factorisation will complete successfully and ˜ and U ˜ satisfy the computed matrices L ˜U ˜ = A + δA, L & kδAk = O(M ). kLkkUk (7.12) % MS4105 491 ' $ • In GE with p.p. each pivot selection involves maximisation over a sub-column so the algorithm produces a matrix L with entries whose absolute values are ≤ 1 everywhere below the main diagonal. This mens that kLk = O(1) in any matrix norm. So for GE with p.p., (7.12) reduces to the condition kδAk kUk = O(M ). The algorithm is therefore backward stable provided that kUk = O(kAk). • The key question for stability is whether there is growth in the entries of U during the GE (with p.p.) process. Let the growth factor for A be defined as: ρ= maxi,j |uij | maxi,j |aij | (7.13) • If ρ is O(1) the elimination process is stable. If ρ is bigger I expect instability. & % MS4105 ' 492 $ • As kLk = O(1) and kUk = O(ρkAk), I can conclude from the definition of ρ that: • If the factorisation PA = LU is computed by GE with p.p. and ˜ and U ˜ if (O.1) and (O.2) hold then the computed matrices L satisfy: kδAk ˜ ˜ LU = PA + δA, = O(ρM ). (7.14) kAk • What’s the difference? Without pivoting L and U can be unboundedly large. This means that the bound on the error on A in (7.12) may in fact allow the error to be large. This difficulty does not arise in (7.14) provided ρ = O(1). & % MS4105 ' 493 $ • A final — negative — comment on GE with p.p. There are matrices that exibit large values of ρ. 
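Before looking at a worst-case example, here is a short Matlab experiment (an illustrative sketch of my own, not part of the course code) that estimates the growth factor ρ of (7.13) using the built-in lu function, which performs partial pivoting. For random test matrices the measured ρ is typically modest.

    % Estimate the growth factor rho (7.13) for GE with partial pivoting
    % on a few random matrices, using the built-in lu (which pivots).
    m = 100; trials = 20;
    rho = zeros(trials,1);
    for t = 1:trials
        A = randn(m);                                % random test matrix
        [L,U,P] = lu(A);                             % P*A = L*U with partial pivoting
        rho(t) = max(abs(U(:))) / max(abs(A(:)));    % growth factor for this A
    end
    max(rho)                                         % usually a small number

The next example shows, in contrast, that ρ can be as large as its theoretical maximum 2ᵐ⁻¹.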
Consider applying GE (no pivoting necessary) to : 1 1 1 1 −1 1 1 1 2 A = −1 −1 1 1 , L1 A = −1 1 2 −1 −1 −1 1 1 −1 −1 1 2 −1 −1 −1 −1 1 −1 −1 −1 2 1 1 1 1 1 2 2 1 L2 L1 A = 1 4 , L3 L2 L1 A = 1 4 1 8 −1 1 4 −1 −1 4 −1 8 & % MS4105 494 ' $ Finally; 1 L4 L3 L2 L1 A = & 1 1 1 2 4 1 8 16 % MS4105 495 ' $ The final PA = LU (P = I) factorisation is 1 −1 1 −1 −1 −1 −1 −1 −1 1 −1 1 1 1 1 = 1 −1 −1 1 1 −1 1 −1 −1 −1 −1 −1 −1 & 1 −1 1 −1 −1 1 1 1 1 2 1 4 1 8 16 % MS4105 ' 496 $ For this 5 × 5 matrix, the growth factor is ρ = 16. For any m × m matrix of the same form, it is ρ = 2m−1 — the upper bound on ρ (see the Exercises). • Such a matrix — for fixed m — is still backward stable but if m were large, in practice GE with p.p. could not be used. • Fortunately it can be shown that matrices with ρ = O(2m ) are a vanishingly small minority. • In practice, GE is backward stable and works well. & % MS4105 497 ' 7.2.8 $ Exercises 1. Explain (analogously with the discussion on Slide 462) how the PA = LU factorisation can be used to solve a linear system. 2. Calculate the extra computational cost of partial pivoting (line 3 in Alg. 7.3). 3. Find the determinant of the matrix A in the numerical example above using the GE factorisation : (7.3) and the p.p version (7.7). 4. Can you explain why the maximum value for ρ is ρ = 2m−1 ? (Hint: w.l.o.g. I can normalise A so that maxi,j |aij | = 1.) 5. Write a matlab m-file to implement Alg. 7.3. Compare its numerical stability with that of Alg. 7.1. & % MS4105 498 ' 8 $ Finding the Eigenvalues of Matrices I’ll start with a short review of the topic of eigenvalues and eigenvectors. It should be familiar from Linear Algebra 1. 8.1 Eigenvalue Problems Let A ∈ Cn×n be a square matrix. A non-zero vector x ∈ Cn is an eigenvector of A and λ ∈ C its corresponding eigenvalue if Ax = λx. (8.1) The effect on an eigenvector of multiplication is to stretch (or shrink) it by a scalar (possibly complex) factor. More generally, if the action of a matrix A on a subspace S of Cn is to stretch all vectors in the space by the same scalar factor λ, the subspace is called an eigenspace and any nonzero x ∈ S is an eigenvector. & % MS4105 499 ' $ The set of all the eigenvalues of A is the spectrum of A, a subset of C denoted by Λ(A). Eigenvalue problems have the important restriction that the domain and range spaces must have the same dimension — in other words A must be square for (8.1) to make sense. 8.1.1 Eigenvalue Decomposition An eigenvalue decomposition of a square matrix A is a factorisation A = XΛX−1 . (8.2) Here X is invertible and Λ is diagonal. (I’ll show that such a factorisation is not always possible.) I can re-write (8.2) as & % MS4105 500 ' $ AX = XΛ or A x1 x2 ... x1 xn = x2 ... λ1 xn λ2 .. . λn (8.3) which of course can be read as Axj = λj xj , j = 1, . . . , n. & % MS4105 501 ' $ So the jth column of X is an eigenvector of A and the jth diagonal entry of Λ the corresponding eigenvalue. As in previous contexts, the eigenvalue decomposition expresses a change of basis to coordinates referred to eigenvectors. If Ax = b and A = XΛX−1 I have (X−1 b) = Λ(X−1 x). (8.4) So to compute Ax, I can expand x in the basis of columns of X, apply Λ and interpret the result as a vector of coefficients of an expansion in the columns of X — remembering that: y = X−1 x ≡ x = Xy ≡ x is expanded in terms of cols of X z = X−1 b ≡ b = Xz ≡ b is expanded in terms of cols of X. 
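A minimal Matlab sketch of this change of basis (the matrix and vector below are arbitrary, and I assume A is diagonalisable, which holds with probability one for a random matrix):

    % Compute Ax via the eigenvalue decomposition A = X*Lambda*inv(X):
    % expand x in the columns of X, apply Lambda, then map back.
    A = randn(5);                 % random (almost surely diagonalisable) matrix
    x = randn(5,1);
    [X, Lambda] = eig(A);         % A*X = X*Lambda
    y = X \ x;                    % y = inv(X)*x, the coefficients of x in the basis of cols of X
    b = X * (Lambda * y);         % scale by the eigenvalues and re-expand
    norm(b - A*x)                 % tiny: the two ways of forming Ax agree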
& % MS4105 502 ' 8.1.2 $ Geometric Multiplicity I saw that the set of eigenvectors corresponding to a single eigenvalue (plus the zero vector) forms a subspace of Cn called an eigenspace. If λ is an eigenvalue of A, refer to the corresponding eigenspace as Eλ . Typically, I have dim Eλ = 1 — for a given eigenvalue λ there is a unique eigenvector xλ (or any multiple of xλ ) but this need not be true. Definition 8.1 The dimension of Eλ is the size of the largest set of linearly independent eigenvectors corresponding to the same eigenvalue λ. This number is referred to as the geometric multiplicity of λ. Alternatively I can say that the geometric multiplicity is the dimension of the nullspace of A − λI since this space is just Eλ . & % MS4105 503 ' 8.1.3 $ Characteristic Polynomial The characteristic polynomial of an n × n complex matrix A written pA or just p is the polynomial of degree m defined by pA (z) = det(zI − A). (8.5) Note that the coefficient of zn is 1. The following well-known result follows immediately from the definition: Theorem 8.1 λ is an eigenvalue of A iff pA (λ) = 0. Proof: From the definition of an eigenvalue ; λ is an eigenvalue ⇔ ∃ x 6= 0 s.t. λx − Ax = 0 ⇔ λI − A is singular ⇔ det(λI − A) = 0. & % MS4105 ' 504 $ Note that even if a matrix is real, some of its eigenvalues may be complex — as a real polynomial may have complex roots. & % MS4105 505 ' 8.1.4 $ Algebraic Multiplicity By the Fundamental Theorem of Algebra, I can write pA as pA (z) = (z − λ1 )(z − λ2 ) . . . (z − λn ). (8.6) for some λ1 , . . . , λn ∈ C. By Thm. 8.1, each λj is an eigenvalue of A and all eigenvalues of A appear somewhere in the list. In general an eigenvalue may appear more than once. Definition 8.2 Define the algebraic multiplicity of an eigenvalue of A to be its multiplicity as a root of pA . An eigenvalue is simple if its algebraic multiplicity is 1. It follows that a n × n matrix has n eigenvalues, counted with algebraic multiplicity. I can certainly say that every matrix has at least one eigenvalue. & % MS4105 ' 506 $ • It would be nice if the geometric multiplicity of an eigenvalue always equalled its algebraic multiplicity. • Unfortunately this is not true. • I will prove that the algebraic multiplicity of an eigenvalue is always greater than or equal to its geometric multiplicity. • First I need some results about similarity transformations. & % MS4105 507 ' 8.1.5 $ Similarity Transformations Definition 8.3 If X ∈ Cn×n is invertible then the mapping A → X−1 AX is called a similarity transformation of A. say that two matrices A and B are similar if there is a similarity transformation relating one to the other — i.e. if there exists an invertible matrix X s.t. B = X−1 AX. As I mentioned in the context of eigenvalue diagonalisation (8.4), any similarity transformation is a change of basis operation. Many properties are held in common by matrices that are similar. Theorem 8.2 If X is invertible then for any compatible matrix A, A and X−1 AX have the same: • characteristic polynomial, • eigenvalues and • algebraic and geometric multiplicities. & % MS4105 508 ' $ Proof: • Check that the characteristic polynomials are equal: pX−1 AX (z) = det(zI − X−1 AX) = det(X−1 (zI − A)X) = det(X−1 ) det(zI − A) det(X) = det(zI − A) = pA (z). • From the agreement of the characteristic polynomials, the agreement of the eigenvalues and the algebraic multiplicities follow. 
• Finally, to prove that the geometric multiplicities are equal — check that if Eλ is an eigenspace for A then X−1 Eλ is an eigenspace for X−1 AX and vice versa. (Note that dim X−1 Eλ = dim Eλ . Check using definition of a basis.) & % MS4105 ' 509 $ Now to relate geometric multiplicity to algebraic multiplicity. Theorem 8.3 The algebraic multiplicity of an eigenvalue λ is greater than or equal to its geometric multiplicity. Proof: Let g be the geometric multiplicity of λ for the matrix A. Form a n × g matrix V^ whose columns form an orthonormal basis for the eigenspace Eλ . Then if I augment V^ to a square unitary matrix V check that I can write: λIg C ∗ . B = V AV = 0 D (Exercise: what are the dimensions of Ig , C and D?) & % MS4105 ' 510 $ By definition of the determinant, (see Ex. 1 below) det(zI − B) = det(zIg − λI) det(zI − D) = (z − λ)g det(zI − D). Therefore the algebraic multiplicity of λ as an eigenvalue of B is at least g. Since similarity transformations preserve multiplicities, the result holds for A. & % MS4105 511 ' 8.1.6 $ Defective Eigenvalues and Matrices Usually the algebraic multiplicities and geometric multiplicities of the eigenvalues of a square matrix are all equal (and equal to 1). However this is not always true. Example 8.1 Consider the matrices 2 2 A = 2 ,B = 2 1 2 1 . 2 Both A and B have the same characteristic polynomial (z − 2)3 so they have a single eigenvalue λ = 2 of algebraic multiplicity 3. In the case of A, I can choose three linearly independent eigenvectors (e.g. e1 , e2 and e3 ) so the geometric multiplicity is also 3. & % MS4105 ' 512 $ However for B check that I can only find a single eigenvector (any multiple of e1 ) so that the geometric multiplicity of the eigenvalue is 1. Definition 8.4 An eigenvalue whose algebraic multiplicity exceeds its geometric multiplicity is called a defective eigenvalue . A matrix with one or more defective eigenvalues is called a defective matrix. Diagonal matrices are never defective as the algebraic multiplicities and geometric multiplicities of each of their eigenvalues λi are precisely the number of occurrences of λi on the matrix’s diagonal. & % MS4105 513 ' 8.1.7 $ Diagonalisability An important result: Nondefective matrices ≡ Matrices that have an eigenvalue decomposition (8.2). I’ll state this as a Theorem shortly but first a technical result: Lemma 8.4 For any square matrix A, a set of eigenvectors corresponding to distinct eigenvalues must be linearly independent. (This holds automatically if A is Hermitian (A∗ = A) but the following proof works for any square matrix..) Proof: Let the matrix A have k distinct eigenvalues λ1 , . . . , λk and let {xi }ki=1 be a set of eigenvectors corresponding to these distinct eigenvalues so that Axi = λi xi . & % MS4105 514 ' $ Assume that the set is linearly dependent so that k X αi xi = 0, not all αi = 0. i=1 WLOG (re-labelling if necessary) assume that α1 6= 0 so that x1 = k X βi xi , βi = −αi /α1 . i=2 Not all the βi = 0 for i = 2, . . . , k as if all are zero then x1 = 0. Now: k k X X Ax1 = λ1 x1 = βi Axi = βi λi xi . i=2 i=2 As λ1 6= 0, I can solve for x1 . Now take the difference of the two equations for x1 : k X λi 0= βi (1 − )xi . λ1 i=2 & % MS4105 ' 515 $ The factors 1 − λλ1i are all non-zero as the λi are all distinct. I know that the βi are not all zero so one of the xi for i > 1 (say x2 ) can be expressed in terms of the remaining xi , i > 2. Again, not all the coefficients in this expression can be zero as this would result in x2 = 0. 
If I again multiply by A I can repeat the argument. So, repeating the process k − 1 times I conclude that the last xi , say xλk must be zero and so that all the xi = 0. This gives a contradiction as I began by taking the {xi } to be a set of eigenvectors. & % MS4105 ' 516 $ Now the main Theorem: Theorem 8.5 An n × n matrix A is nondefective iff it has an eigenvalue decomposition A = XΛX−1 . Proof: [←] Given an eigenvalue decomposition A = XΛX−1 , I know by Thm. 8.2 that A is similar to Λ — with the same eigenvalues and multiplicities. Since Λ is a diagonal matrix it is non-defective and so A must also be. & % MS4105 ' 517 $ [→] A non-defective matrix must have n linearly independent eigenvectors. Argue as follows: • eigenvectors with different eigenvalues must be linearly independent by Lemma 8.4. • as S is non-defective, each eigenvalue contributes as many linearly independent eigenvectors as its algebraic multiplicity (the number of repetitions of the eigenvalue ) If I form a matrix X from these n linearly independent eigenvectors then X is invertible and AX = XΛ as required. So the terms nondefective and diagonalisable are equivalent. & % MS4105 518 ' 8.1.8 $ Determinant and Trace A reminder of some definitions. Definition 8.5 The trace of an n × n matrix A is the sum of its diagonal elements.. Definition 8.6 The determinant can be defined as the sum over all signed permutations of products of n elements from A — or recursively: • determinant of a 1 × 1 matrix is just the element A11 Pn • determinant of an n × n matrix is k=1 (−1)k+1 a1k M1k where M1k is the minor determinant of the (n − 1) × (n − 1) matrix found by deleting the first row and the kth column of A. & % MS4105 519 ' $ I can easily prove some useful results: Theorem 8.6 The determinant and trace of a square matrix A are equal to the product and sum of the eigenvalues of A respectively, counted with algebraic multiplicity : det(A) = n Y λi trace(A) = i=1 n X (8.7) λi i=1 Proof: By (8.5) and (8.6) I can calculate det(A) = (−1)n det(−A) = (−1)n pA (0) = n Y λi i=1 which is the required result for det A. From (8.5) it follows that the coefficient of the zn−1 term of pA is minus the sum of the diagonal elements of A, i.e. − trace A. & % MS4105 ' 520 $ (To get a factor of zn−1 in the det(zI − A) I select n − 1 diagonal elements of zI − A which means that the nth factor for that term in the determinant is z − aii . There are n ways to select the “missing” nth factor so the zn−1 term in the determinant is Pn − i=1 aii as claimed.) Pn n−1 But in (8.6) the z term is equal to − i=1 λi . & % MS4105 521 ' 8.1.9 $ Unitary Diagonalisation I have shown that a non-defective n × n matrix A has n linearly independent eigenvectors. In some cases the eigenvectors can be chosen to be orthogonal. Definition 8.7 If an n × n matrix A has n linearly independent eigenvectors and there is a unitary matrix Q s.t. A = QΛQ∗ I say that A is unitarily diagonalisable. Note that the diagonal matrix of eigenvalues, Λ may be complex. However I can state a theorem to clarify this. Theorem 8.7 A hermitian matrix is unitarily diagonalisable and its eigenvalues are real. Proof: Exercise. & % MS4105 ' 522 $ Many other classes of matrices are also unitarily diagonalisable. The list includes skew-hermitian and unitary matrices. There is a very neat general criterion for unitary diagonalisability. I need a definition: Definition 8.8 An n × n matrix A is normal if AA∗ = A∗ A. (Notice that hermitian, skew-hermitian and unitary matrices are all normal — check . 
But normal matrices need not fall into one of these categories, for example: 1 1 0 1 0 1 1 1 A= 1 1 1 0 1 0 1 1 is neither hermitian, skew-hermitian nor unitary — but is normal. Check .) & % MS4105 ' 523 $ I will state and prove a Theorem (Thm. 8.11) that relates normality to unitary diagonalisability following the discussion of Schur Factorisation in the next Section. & % MS4105 524 ' 8.1.10 $ Schur Factorisation I need a general decomposition for any m × m matrix – the Schur decomposition allows any square matrix to be expressed as unitarily similar to an upper triangular matrix. It is very widely used in matrix algebra — and will allow us to prove Thm. 8.11 below. & % MS4105 525 ' $ First I prove a technical lemma. Lemma 8.8 Any m × m matrix A can be written as ∗ λ v 1 1 Q∗0 AQ0 = for some unitary Q0 . 0 A1 (8.8) Proof: Let q1 be an eigenvector of A, kq1 k = 1. I can choose 0 0 q20 , . . . , qm so that {q1 , q20 , . . . , qm } form an orthonormal basis for Cm . Let h i Q0 = q1 q20 ... 0 , qm then Q0 is obviously unitary. (Q∗0 Q0 = I). & % MS4105 526 ' $ I have q∗1 0∗ h i q2 ∗ 0 0 Q0 AQ0 = .. A q1 q2 . . . qm . 0∗ qm q∗1 0∗ h i q2 λ1 q1 Aq 0 . . . Aq 0 = . m . 2 . 0∗ qm λ1 v∗1 , = 0 A1 0 where v∗1 = q∗1 A[q20 . . . qm ] and A1 is (m − 1) × (m − 1). & % MS4105 527 ' $ Now for the main Theorem. Theorem 8.9 For any m × m matrix A, I can write the Schur factorisation of A as A = QTQ∗ (8.9) where Q is unitary and T is upper triangular. Proof: The proof is reminiscent of the proof of Thm. 6.2 for the SVD. I use induction. Take m ≥ 2 as the m = 1 case is trivial. [Base Step] . Taking m = 2, (8.8) in the Lemma just proved gives ∗ λ v 1 1 — obviously upper triangular. Q∗0 AQ0 = 0 a1 & % MS4105 528 ' $ [Inductive Step] Assume that any (m − 1) × (m − 1) matrix has a Schur factorisation. RTP that this holds for any m × m matrix. By the inductive hypothesis, I can find a Schur decomposition for A1 in (8.8). So there exists a unitary (m − 1) × (m − 1) matrix Q1 s.t. ∗ λ v 2 2 , Q∗1 A1 Q1 = 0 T2 where v2 ∈ Cm−2 and T2 is (m − 2) × (m − 2) upper triangular. Define 1 Q = Q0 0 & 0 Q1 . % MS4105 529 ' $ Then (using (8.8) to go from the first to the second line) 1 0 1 0 ∗ ∗ Q AQ = Q0 AQ0 0 Q∗1 0 Q1 1 0 λ1 v∗1 1 0 = 0 A1 0 Q1 0 Q∗1 λ1 v∗1 Q1 1 0 = 0 A1 Q1 0 Q∗1 λ1 v∗1 Q1 = 0 Q∗1 A1 Q1 λ1 ∗ ∗ ∗ = 0 λ2 v2 = T. 0 0 T2 & % MS4105 530 ' $ Obviously T is upper triangular (as T2 is). So Q∗ AQ = T upper triangular and also Q is itself unitary as it is a product of unitary matrices. Multiplying on the left by Q and on the right by Q∗ I have A = QTQ∗ . I can easily prove a nice consequence of the Schur factorisation — the result suggests that a Schur decomposition will allow us to compute the eigenvalues of any square matrix. Theorem 8.10 Given the Schur factorisation A = QTQ∗ of a square matrix A, the eigenvalues of A appear on the main diagonal of T . & % MS4105 531 ' $ Proof: I have A = QTQ∗ . Let x be any eigenvector of A. Then Ax = QTQ∗ x = λx. Defining y = Q∗ x, I can write Ty = λy so y is an eigenvector of T iff x = Qy is an eigenvector of A — with the same corresponding eigenvalue λ. So certainly T has the same spectrum as A. Now, given that T is upper triangular I can write for a given eigenvector ψ corresponding to a given eigenvalue λ: (Tψ)i = m X Tij ψj = λψi . (8.10) i≤j Use a form of back substitution — start with the last row; i = m. • I have Tmm ψm = λψm so ψm = 0 or Tmm = λ. • If Tmm = λ then λ appears on the diagonal as claimed. & % MS4105 ' 532 $ • If Tmm 6= λ then I have ψm = 0. 
• Move to the previous row, i.e. set i = m − 1. • Then (8.10) becomes Tm−1,m−1 ψm−1 = λψm−1 . • This time either Tm−1,m−1 = λ or ψm−1 = 0. • Repeating this process, eventually one of the Tii = λ as otherwise ψ = 0 which contradicts ψ an eigenvector. I can repeat this “back-substitution” for each eigenvalue λ (remember that T has the same eigenvalues as A). I conclude that all the eigenvalues of T appear on the main diagonal. & % MS4105 533 ' $ I can now prove the important result that: Theorem 8.11 A square matrix is unitarily diagonalisable iff it is normal. Proof: I have by Thm. 8.9 that any square matrix A can be written as A = QTQ∗ . Use the definition of normality Def. 8.8 to conclude that A normal if and only if T is (TT ∗ = T ∗ T ): AA∗ = QTQ∗ QT ∗ Q = QT ∗ Q∗ QTQ∗ = A∗ A QTT ∗ Q = QT ∗ TQ∗ TT ∗ = T ∗ T, where each line is equivalent to its predecessor using the unitarity of Q. So RTP that for any upper triangular matrix T ; TT ∗ = T ∗ T iff T is diagonal. & % MS4105 534 ' $ Write this matrix identity in subscript notation: m X tij t∗jk = m X j=1 j=1 m X m X tij¯tTjk = j=1 m X t∗ij tjk ¯tTij tjk j=1 tij¯tkj = j≥i,k m X ¯tji tjk j≤i,k If I consider diagonal elements of each side of TT ∗ = T ∗ T so that i = k; the latter equality reduces to: m X j≥k & |tkj |2 = m X |tjk |2 . (8.11) j≤k % MS4105 535 ' $ I can now prove the result by induction. I simply substitute k = 1, 2, . . . , m into the last equality above. So RTP that the kth row of T has only its diagonal term non-zero, for k = 1, . . . , m. [Base Step] Let k = 1. RTP that t1j = 0 for all j > 1. Substituting k = 1 into (8.11): |t11 |2 + |t12 |2 + . . . |t1m |2 = |t11 |2 So t1j = 0, for j = 2, . . . , m — the first row of T is zero, apart from t11 . & % MS4105 536 ' $ [Inductive Step] Let k > 1. Assume that for all rows i < k, tij = 0, for j = i + 1, . . . , m. RTP that tkj = 0 for all j = k + 1, . . . , m. Then writing out (8.11) explicitly: |tkk |2 +|tk,k+1 |2 +. . . |tkm |2 = |t1k |2 + |t2k |2 + · · · + |tk−1,k |2 +|tkk |2 . The struck-out terms in blue are zero by the inductive hypothesis. So tkj = 0 for j = k + 1, . . . , m — the kth row of T is zero, apart from tkk . (A sketch of the upper triangular matrix T makes the inductive step much easier to “see”.) So T is diagonal. & % MS4105 537 ' 8.1.11 $ Exercises 1. Prove that the determinant of a block triangular matrix is the product of the determinants of the diagonal blocks. Just use the general definition of the determinant X det A = sign(i1 , i2 , . . . , im )Aii1 A2i2 . . . Amim , i1 6=i2 6=···6=in where the sum is over all permutations of the columns 1, 2, . . . , n. (Used in proof of Thm. 8.3.) 2. Prove Gerchgorin’s Theorem: for any m × m matrix A (not necessarily hermitian), every eigenvalue of A lies in at least one of the m discs in C with centres at aii and radii P Ri (A) ≡ j6=i |aij | where i is any one of the values 1 up to n. See App. M for a proof. & % MS4105 538 ' $ 3. Let 8 A= 0.1 0.5 4 Use Gerchgorin’s Theorem to find bounds on the eigenvalues of A. Are you sure? 4. (Difficult.) Prove the extended version: If k of the discs Di ≡ {z ∈ C : |z − aii | ≤ Ri (A)} intersect to form a connected set Uk that does not intersect the other discs then precisely k of the eigenvalues of A lie in Uk . See App. N for a proof. 5. What can you now say about the eigenvalues of the matrix A in Q.2? & % MS4105 539 ' $ 6. Consider the matrix: 10 2 3 A = −1 0 2 1 −2 1 which has eigenvalues 10.2260, 0.3870 + 2.2216i and 0.3870 − 2.2216i. 
What does (the extended version of) Gerschgorin's Theorem tell us about the eigenvalues of A?

8.2 Computing Eigenvalues — an Introduction

8.2.1 Using the Characteristic Polynomial

The obvious method for computing the eigenvalues of a general m × m complex matrix is to find the roots of the characteristic polynomial pA(z). Two pieces of bad news:

• If a polynomial has degree greater than 4 there is, in general, no exact formula for its roots. I cannot therefore hope to find the exact values of the eigenvalues of a matrix — any method that finds the eigenvalues of a matrix must be iterative. This is not a problem in practice as any desired accuracy can be achieved by iterating a suitable algorithm until the required accuracy is reached.

• Unfortunately the process of finding the roots of a polynomial is not only not backward stable (O.4) but is not even stable (O.3). Small variations in the coefficients of a polynomial p(x) can give rise to large changes in its roots. So using a root-finding algorithm like Newton's Method to find the eigenvalues of a given matrix A is not a good idea.

For a detailed discussion of this interesting topic see App. P. One example to make the point — suppose that I want to find the eigenvalues of the 2 × 2 identity matrix. The characteristic polynomial is just p(x) = (x − 1)² = x² − 2x + 1 and the roots are of course both equal to one. Suppose that I make a small change in the constant term of p(x) so that the polynomial is now p̃(x) = x² − 2x + 0.9999, corresponding to a change δa0 = −10⁻⁴ in the constant term a0. If I solve the perturbed quadratic using Matlab, the command roots([1, -2, 0.9999]) returns the two roots 1.0100 and 0.9900.

So a change of order 10⁻⁴ in a coefficient of the characteristic polynomial results in an error in the eigenvalue estimate of order 10⁻²! It is interesting to check that this is exactly the result predicted by Eq. P.4 — with r = 1, k = 0 and ak = 1.

8.2.2 An Alternative Method for Eigenvalue Computation

Many algorithms for computing eigenvalues work by computing a Schur decomposition Q∗AQ = T of A. I proved in Thm. 8.10 that the diagonal elements of an upper triangular matrix are its eigenvalues, and T is unitarily similar to A, so once I have found the Schur decomposition of A I am finished. These methods are in fact two-stage methods:

• The first phase uses a finite number of Householder reflections to write Q∗0 AQ0 = H where Q0 is unitary and H is an upper Hessenberg matrix, a matrix that has zeroes below the first sub-diagonal. I'll show that this can be accomplished using Householder reflections.

• The second phase uses an iterative method, pre- and post-multiplying the Hessenberg matrix H by unitary matrices Q∗j and Qj, so that Q∗j Q∗j−1 . . . Q∗1 H Q1 . . . Qj−1 Qj → T, an upper triangular matrix, as j → ∞. Obviously I can only apply a finite number of iterations in the second phase — fortunately the standard method converges very rapidly to the triangular matrix T.

If A is hermitian then it is certainly normal and I know from Thm. 8.11 (or directly) that T must be diagonal. So, for a hermitian matrix, once I have the Schur decomposition of A I have the eigenvectors as the columns of Q and the eigenvalues as the diagonal elements of the diagonal matrix T. On the other hand, if A is not hermitian then the columns of Q, while orthonormal, are not the eigenvectors of A.
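The built-in Matlab schur function already computes such a factorisation, so both cases can be seen directly; the following sketch is only an illustration, not the algorithm developed in this course.

    % Schur factorisation via the built-in schur: T is upper triangular with
    % the eigenvalues of A on its diagonal; for a hermitian matrix, T is diagonal.
    A = randn(6) + 1i*randn(6);                    % a general complex matrix
    [Q, T] = schur(A, 'complex');                  % A = Q*T*Q', T upper triangular
    norm(sort(abs(eig(A))) - sort(abs(diag(T))))   % same eigenvalues, up to rounding

    B = A + A';                                    % a hermitian matrix
    [QB, TB] = schur(B);                           % now TB is diagonal (and real)
    norm(TB - diag(diag(TB)))                      % off-diagonal part is negligible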
8.2.3 Reducing A to Hessenberg Form — the "Obvious" Method

Let's see how the first phase of the Schur factorisation is accomplished. (I briefly describe the "QR algorithm" for the second phase in the very short Chapter 9 below. However it is worth noting that each iteration of the QR algorithm takes O(m²) flops and typically O(m) iterations will reduce the error to machine precision, so that the typical cost of phase two is O(m³). I'll show below that phase one also requires O(m³) flops, so the overall cost of accomplishing a Schur factorisation and so computing the eigenvalues (and eigenvectors if A is hermitian) is O(m³).)

The "obvious" way to do the two phases at once in order to compute the Schur factorisation of A is to pre- and post-multiply A by unitary matrices Q∗k and Qk respectively for k = 1, . . . , m − 1 to introduce zeroes under the main diagonal. The problem is that this cannot be done in m − 1 steps: while I can design a Householder matrix Q∗1 that introduces zeroes under the main diagonal, it will change all the rows of A. That in itself is not a problem, but when I post-multiply by Q1 (≡ Q∗1, as the Householder matrices are hermitian) it will change all the columns of A, thus undoing the work done in pre-multiplying A by Q∗1. Of course I should have known in advance that this approach had to fail, as I saw at the start of this Section (Sec. 8.2.1) that no finite sequence of steps can yield the eigenvalues of A.

8.2.4 Reducing A to Hessenberg Form — a Better Method

I need to be less ambitious and settle for reducing A to upper Hessenberg form (zeroes below the first sub-diagonal) rather than upper triangular form. At the first step I select a Householder reflector Q∗1 that leaves the first row unchanged. Left-multiplying A by it forms linear combinations of rows 2 to m to introduce zeroes into rows 3 to m of the first column. When I right-multiply Q∗1 A by Q1, the first column is left unchanged so I do not lose any of the zeroes already introduced. So the algorithm consists, at each iteration i = 1, . . . , m − 2, of pre- and post-multiplying A by Q∗i and Qi respectively (in fact Q∗i = Qi).

The algorithm can now be stated:

Algorithm 8.1 Householder Hessenberg Reduction
(1) for k = 1 to m − 2
(2)   x = Ak+1:m,k
(3)   vk = x + sign(x1)‖x‖e1
(4)   vk = vk/‖vk‖
(5)   Ak+1:m,k:m = Ak+1:m,k:m − 2vk (v∗k Ak+1:m,k:m)
(6)   A1:m,k+1:m = A1:m,k+1:m − 2 (A1:m,k+1:m vk) v∗k
(7) end

Some comments:
• In line 2, x is set to the kth sub-column of A, from rows k + 1 to m.
• The Householder formula for v in line 3 ensures that I − 2vv∗/v∗v zeroes the entries in the kth column from row k + 2 down.
• In line 5, I left-multiply the (m − k) × (m − k + 1) rectangular block Ak+1:m,k:m by the (m − k) × (m − k) matrix I − 2vv∗ (having normalised v).
• In line 6, I right-multiply the (updated in line 5) m × (m − k) matrix A1:m,k+1:m by I − 2vv∗.

8.2.5 Operation Count

The work done in Alg. 8.1 is dominated by the (implicit) inner loops in lines 5 and 6:
Line 5: Ak+1:m,k:m = Ak+1:m,k:m − 2vk (v∗k Ak+1:m,k:m)
Line 6: A1:m,k+1:m = A1:m,k+1:m − 2 (A1:m,k+1:m vk) v∗k

• Consider Line 5. It applies a Householder reflector on the left. At the kth iteration, the reflector operates on the last m − k rows of the full matrix A. When the reflector is applied, these rows have zeroes in the first k − 1 columns (as these columns are already upper Hessenberg).
So only the last m − k + 1 entries in each row need to be updated — implicitly, for j = k, . . . , m. & % MS4105 ' 552 $ This inner step updates the jth column of the submatrix Ak+1:m,k:m . If I write L = m − k + 1 for convenience then the vectors in this step are of length L. The update requires 4L − 1 ≈ 4L flops. Argue as follows: L flops for the subtractions, L for the scalar multiple and 2L − 1 for the inner product (L multiplications and L − 1 additions). Now the index j ranges from k to m so the inner loop requires ≈ 4L(m − k + 1) = 4(m − k + 1)2 flops. • Consider Line 6. It applies a Householder reflector on the right. The last m − k columns of A are updated. Again, implicitly, there is an inner loop, this time with index j = 1 . . . m that updates the jth row of the sub-matrix. The inner loop therefore requires≈ 4Lm = 4m(m − k + 1) flops. • Finally, the outer loop ranges from k = 1 to k = m − 2 so I can write W, the total number of flops used by the Householder & % MS4105 553 ' $ Hessenberg reduction algorithm as: W=4 m−2 X (m − k + 1)2 + m(m − k + 1) k=1 =4 =4 m−2 X (m − k + 1)(2m − k + 1) k=1 m X k(m + k) (k ↔ m − k + 1) k=3 ≈ 4 m(m(m + 1)/2) + m(m + 1)(2m + 1)/6 ≈ 10m3 /3. & (8.12) % MS4105 554 ' 8.2.6 $ Exercises 1. Show that for a hermitian matrix, the computational cost drops to O(8/3m3 ) flops. 2. Show that when the symmetry property is taken into account the cost drops to O(4/3m3 ) flops. 3. Write a Matlab function m-file for Alg. 8.1 and test it; first on randomly generated general square matrices then on hermitian (real symmetric) matrices. & % MS4105 ' 9 555 $ The QR Algorithm As mentioned in Section 8.2.3 the QR algorithm forms the second phase of the task of computing the eigenvalues (and eigenvectors ) of a square complex matrix. It also is the basis of algorithms for computing the SVD as I will very briefly see in Ch. 10. In this Chapter I can only introduce the QR algorithm (not the QR factorisation already seen ). A full analysis will not be possible in the time available. I will work toward the QR algorithm, first considering some simpler related methods. I’ll start with a very simple and perhaps familiar method for computing some of the eigenvalues of a square complex matrix. & % MS4105 556 ' 9.1 $ The Power Method For any diagonalisable (or equivalently non-defective) m × m matrix I have AX = XΛ where the columns of the invertible matrix X are eigenvectors of A and Λ is a diagonal matrix of corresponding eigenvalues. Number the eigenvalues in decreasing order: |λ1 | ≥ · · · ≥ |λm | and let q(0) ∈ Cm with kq(0) k = 1, an arbitrary unit vector. Then the Power Method is described by the following simple algorithm : Algorithm 9.1 Power Method (1) (2) for k = 1, 2, . . . p(k) = Aq(k−1) (k) q (3) (4) (5) = p(k) kp(k) k (k)∗ λ(k) = q end & Aq(k) % MS4105 ' 557 $ I will briefly analyse this simple algorithm and show that if A is hermitian (A∗ = A) then the algorithm converges quadratically to the largest eigenvalue in magnitude. (For non-hermitian A I only have linear convergence.) For the rest of this Chapter I will make the simplifying assumption that A is hermitian. It is interesting to note that all the results below still hold when I weaken this restriction and allow A to be normal. Remember that a normal matrix is unitarily diagonalisable and this is the property needed. I will not discuss this further in this course. 
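A minimal Matlab sketch of Alg. 9.1 (the symmetric test matrix and the fixed iteration count are arbitrary choices of mine):

    % Power method (Alg. 9.1) on a random hermitian (real symmetric) matrix.
    m = 50;
    A = randn(m); A = A + A';          % hermitian test matrix
    q = randn(m,1); q = q/norm(q);     % arbitrary unit starting vector q(0)
    for k = 1:40
        p = A*q;                       % p(k) = A q(k-1)
        q = p/norm(p);                 % normalise
        lambda = q'*A*q;               % Rayleigh quotient estimate lambda(k)
    end
    ev = eig(A); [~, i] = max(abs(ev));
    abs(lambda - ev(i))                % small: lambda approximates the dominant eigenvalue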
& % MS4105 558 ' $ First define the Rayleigh Quotient: Definition 9.1 ( Rayleigh Quotient) For any m × m hermitian matrix A and any x ∈ Cm , define the complex-valued function: x∗ Ax r(x) = ∗ x x The properties of the Rayleigh Quotient are important in the arguments that follow. & % MS4105 559 ' $ Obviously r(qJ ) = λJ for any eigenvector qJ . It is reasonable to expect that, if x is close to an eigenvector qJ that r(x) will approximate the corresponding eigenvalue λJ . How closely? If I compute the gradient ∇ r(x) of r(x) wrt the vector x (i.e. form the vector of partial derivatives ∂r(x) ∂xi ) I find that x∗ x (Ax + A∗ x) − 2(x∗ Ax)x . ∇ r(x) = ∗ 2 (x x) ¯) = 0 if and only So if, as I are assuming, A is hermitian, then ∇ r(x ¯ satisfies if x ¯ = r(x ¯ )x ¯, Ax ¯ is an eigenvector corresponding to the eigenvalue r(x ¯). i.e. x & % MS4105 560 ' $ If for some eigenvector qJ with eigenvalue λJ , if I do a ¯ I have: (multi-variate) Taylor series expansion around x r(x) = r(qJ ) + (x − qJ )∗ ∇ r(qJ ) + O(kx − qJ k2 ) (9.1) where the struck-out term in blue is zero. So the error in using r(x) to approximate the eigenvalue λJ is of order the square of the error in x. But the arbitrary starting vector q(0) can be expanded in terms of Pm the eigenvectors qj ; q(0) = i=1 αj qj . If α1 6= 0 then X k (0) A q = αj λkj qj k m X αj λj = α1 λk1 q1 + qj . α1 λ1 i=2 & % MS4105 561 ' The ratios $ λj λ1 k go to zero as k → ∞ (as λ1 is the largest eigenvalue in magnitude). But q(k) is just a normalised version of Ak q(0) , say ck Ak q(0) so k P λj m αj q(k) = ck Ak q(0) = ck α1 λk1 q1 + ck α1 λk1 qj . i=2 α1 λ1 Both q(k) and the eigenvectors qk are unit vectors so I can conclude that the factor ck α1 λk1 → ±1 as k → ∞ if I are working in Rm or a complex phase if working in Cm . (As the matrix A is hermitian, I can take the eigenvectors and eigenvalues to be real.) This means that the q(k) and λ(k) in the Power method will converge to the largest eigenvalue in magnitude λ1 and the corresponding eigenvector q1 . & % MS4105 562 ' $ For k sufficiently large check that kq(k) − (±)q1 k =O(| λλ21 |k ). I also have that λ(k) = q(k)∗ Aq(k) k ∗ k m m X X αj λj αj λj = (ck λk1 α1 )2 q1 + qj λ1 q1 + λj qj α1 λ1 α1 λ1 j=2 j=2 2 2k m X αj λj k 2 = (ck λ1 α1 ) λ1 + λj . α1 λ1 j=2 It follows that the eigenvalue estimates converge quadratically in the following sense: 2k λ2 (k) |λ − λ1 | = O = O(kq(k) − (±)q1 k2 ). (9.2) λ1 I could have concluded this directly from the sentence following (9.1). & % MS4105 563 ' 9.2 $ Inverse Iteration But of course I want all the eigenvalues not just the largest in magnitude. If µ is not an eigenvalue of A then check that the eigenvectors of −1 (A − µI) are the same as those of A and that the corresponding eigenvalues are λj1−µ , j = 1, . . . , m. Suppose that µ ≈ λj then | λj1−µ | | λk1−µ | for k 6= j. Here comes the clever idea: apply the power method to the matrix −1 (A − µI) . The algorithm “should” converge to qj and λj where λj is the closest eigenvalue to µ. & % MS4105 564 ' $ Algorithm 9.2 Inverse Iteration Method (1) (2) (3) Choose q(0) arbitrary with kq(0) k = 1. for k = 1, 2, . . . Solve (A − µI) p(k−1) = q(k−1) for p(k−1) (k) q (4) (5) (6) = p(k) kp(k) k (k)∗ λ(k) = q end Aq(k) The convergence result for the Power method can be extended to Inverse Iteration noting that Inverse iteration is the Power method with a different choice for the matrix A. We’ll state it as a theorem. 
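First, a short Matlab sketch of Alg. 9.2 (the shift µ and the iteration count below are arbitrary illustrative choices):

    % Inverse iteration (Alg. 9.2): the power method applied to inv(A - mu*I).
    m = 50;
    A = randn(m); A = A + A';          % hermitian test matrix
    mu = 1.0;                          % shift: expect convergence to the eigenvalue nearest mu
    q = randn(m,1); q = q/norm(q);
    for k = 1:20
        p = (A - mu*eye(m)) \ q;       % solve (A - mu*I) p = q rather than forming an inverse
        q = p/norm(p);
        lambda = q'*A*q;               % Rayleigh quotient estimate
    end
    ev = eig(A); [~, i] = min(abs(ev - mu));
    abs(lambda - ev(i))                % small: lambda approximates the eigenvalue closest to mu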
& % MS4105 565 ' $ Theorem 9.1 Let λj be the closest eigenvalue to µ and λp the next closest so that |µ − λj | < |µ − λp | ≤ |µ − λi | for i 6= j, p. If q∗j q(0) 6= 0 (αj 6= 0) then for k sufficiently large k µ − λj (k) kq − (±)qj k = O µ − λp 2k µ − λj (k) |λ − λj | = O . µλp Proof: Exercise. What if µ is (almost) equal to an eigenvalue ? You would expect the algorithm to fail in this case. In fact it can be shown that even though the computed p(k) will be far from the correct value in this case; remarkably, the computed q(k) will be close to the exact value. & % MS4105 566 ' 9.3 $ Rayleigh Quotient Iteration A second clever idea; • I have a method (the Rayleigh Quotient) for obtaining an eigenvalue estimate from an eigenvector estimate q(k) . • I also have a method (inverse iteration) for obtaining an eigenvector estimate from an eigenvalue estimate µ. Why not combine them? This gives the following algorithm. & % MS4105 567 ' $ Algorithm 9.3 Rayleigh Quotient Iteration (1) (2) (3) (4) Choose q(0) arbitrary with kq(0) k = 1. λ(0) = q(0)∗ Aq(0) the corresponding Rayleigh Quotient . for k = 1, 2, . . . (k−1) (k−1) Solve A − λ I p = q(k−1) for p(k−1) (k) q (5) (6) (7) = p(k) kp(k) k (k)∗ λ(k) = q end Aq(k) It can be shown that this algorithm has cubic convergence. I state the result as a Theorem without proof. & % MS4105 ' 568 $ Theorem 9.2 If λj is an eigenvalue of A and q(0) is close enough to qj the corresponding eigenvector of A then for k sufficiently large kq(k+1) − (±)qj k = O kq(k) − (±)qj k3 |λ(k+1) − λj | = O |λ(k) − λj |3 . Proof: Omitted for reasons of time. (This remarkably fast convergence only holds for normal matrices.) & % MS4105 ' 569 $ Operation count for Rayleigh Quotient Iteration • If A is a full m × m matrix then each step of the Power method requires a matrix -vector multiplication — O(m2 ) flops. • Each step of the Inverse iteration method requires solution of a linear system — as I have seen O(m3 ) flops. • This reduces to O(m2 ) if the matrix A − µI has already been factored (using QR or LU factorisation). • Unfortunately, for Rayleigh Quotient iteration, the matrix to be (implicitly) inverted changes at each iteration as λ(k) changes. • So back to O(m3 ) flops. • This is not good as the Rayleigh Quotient method has to be applied m times, once for each eigenvector /eigenvalue pair. & % MS4105 ' 570 $ • However, if A is tri-diagonal, it can be shown that all three methods require only O(m) flops. • Finally, the good news is that when A is hermitian, the Householder Hessenberg Reduction Alg. 8.1 (the first stage of the process of computing the eigenvectors and eigenvalues of A) has reduced A to a Hessenberg matrix — but the transformed matrix is still hermitian and a hermitian Hessenberg matrix is tri-diagonal. & % MS4105 571 ' 9.4 $ The Un-Shifted QR Algorithm I are ready to present the basic QR algorithm — as I’ll show (in an Appendix), it needs some tweaking to be a practical method. Algorithm 9.4 Un-shifted QR Algorithm (1) (2) (3) (4) (5) A(0) = A for k = 1, 2, . . . Q(k) R(k) = A(k−1) Compute the QR factorisation of A. A(k) = R(k) Q(k) Combine the factors in reverse order. end The algorithm is remarkably simple! Despite this, under suitable conditions, it converges to a Schur form for the matrix A, upper triangular for a general square matrix, diagonal if A is hermitian. 
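A runnable Matlab sketch of Alg. 9.4 is given below; the Frobenius-norm test on the sub-diagonal part and the tolerance are my own stand-ins for the informal convergence test used in the short interactive version that follows.

    % Un-shifted QR algorithm (Alg. 9.4) with a crude stopping test:
    % iterate until the part of the matrix below the diagonal is negligible.
    m = 4; A = rand(m); A = A + A';       % hermitian, so the limit is diagonal
    A0 = A;                               % keep a copy to compare against eig
    tol = 1e-10;
    while norm(tril(A,-1),'fro') > tol*norm(A,'fro')
        [Q, R] = qr(A);                   % QR factorisation of the current iterate
        A = R*Q;                          % combine the factors in reverse order
    end
    norm(sort(diag(A)) - sort(eig(A0)))   % diagonal of the limit = eigenvalues of A0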
& % MS4105 ' 572 $ Try it: • a=rand(4,4); a=a+a’; Generate a random hermitian matrix • while ∼converged [q,r]=qr(a); Calculate QR factors for a • a=r*q; Combine the factors in reverse order • end • It works and coverges fast for this toy problem. • But if you change to (say) a 20 × 20 matrix convergence is very slow. & % MS4105 ' 573 $ • The QR algorithm can be shown to be closely related to Alg. 9.3 (Rayleigh Quotient Iteration). • With small but vital variations, the QR algorithm works very well. • Unfortunately the reasons why it works so well are technical. • See App. I for the full story. • The “Shifted QR Algorithm” is an improved version of the QR Algorithm — you’ll find pseudo-code in Alg. I.5 in Appendix I. • Coding it in Matlab is straightforward with the exception of the recursive function call at line 9. & % MS4105 574 ' 10 $ Calculating the SVD Some obvious methods for computing the SVD were discussed in Sec. 6.5. I promised there to return to the subject at the end of the course. & % MS4105 575 ' 10.1 $ An alternative (Impractical) Method for the SVD There is an interesting method for computing the SVD of a general m × n matrix A. For simplicity assume that A is square (m = n). I have as usual that A = UΣV. Form the 2m × 2m matrices 0 A∗ V Σ 0 1 V , O= √ A= and S = 2 U −U A 0 0 −Σ It is easy to check that O is unitary (O∗ O = I2m ). & % MS4105 576 ' $ Check that the block 2 × 2 equation AO = OS is equivalent to AV = UΣ and A∗ U = VΣ∗ = VΣ — which in turn are equivalent to from A = UΣV ∗ . So I have A = OSO∗ which is an eigenvalue decomposition of A and so the singular values of A are just the absolute values of the eigenvalues of A and U& V can be extracted from the eigenvectors of A. This approach is numerically stable and works well if eigenvalue and eigenvector code is available. It is not used in practice because the size of the matrix A is excessive for large problems. However it does reassure us that it is possible to compute a SVD to high accuracy. & % MS4105 577 ' 10.1.1 $ Exercises 1. Write Matlab code to implement the above idea (using the built-in Matlab eig function). Test your code on a random 20 × 20 matrix. Check the results with the built-in svd function. & % MS4105 578 ' $ 2. Can you extend the above construction to transform the problem of computing the SVD A = UΣV ∗ of a general m × n matrix into an eigenvalue problem? Hints: use ∗ 0 A , A= A 0 ¯ = where U ^ √1 U 2 ^ Σ 0 ¯ −U U0 and S = 0 −Σ ^ V¯ 0 0 0 ^ | U0 where and V¯ = √12 V. Here U = U ¯ U O= V¯ 0 0 , 0 ^ as U usual consists of the first n columns of U and U0 consists of the ^ is as usual the n × n submatrix remaining columns. Finally Σ of Σ consisting of the first n rows and columns of Σ. & % MS4105 579 ' $ 3. • Write Matlab code to implement the above idea (using the built-in Matlab eig function). • You’ll probably need to swap blocks of O around as the built-in Matlab eig command may not produce O with the right block structure. • The Matlab spy command is very useful in situations like this when you want to examine a matrix structure. • A useful variation on spy is: function myspy(A,colour) if nargin==1 spy(abs(A)>1.0e-8) else spy(abs(A)>1.0e-8,colour) end & % MS4105 ' 580 $ • Test your code on a random 30 × 20 matrix. • Check the results with the built-in svd function. & % MS4105 581 ' 10.2 $ The Two-Phase Method The two-phase method for computing the eigenvectors and eigenvalues of a square (real and symmetric) matrix was introduced in Ch. 8.2. 
Its second phase, the QR algorithm, was elaborated at length in Ch. 9. The two phases were: 1. Using Householder Hessenberg Reduction (Alg. 8.1) to reduce A to a Hessenberg matrix — symmetric Hessenberg matrices are tri-diagonal. 2. Using the QR algorithm to reduce the tri-diagonal matrix to a diagonal one. & % MS4105 ' 582 $ Briefly, one standard method for computing the SVD of a general m × n matrix A is to 1. use a “Golub-Kahn Bidiagonalisation” to reduce A to bi-diagonal form (only the main diagonal and first super-diagonal non-zero). 2. Use the QR algorithm to reduce the resulting bi-diagonal to a diagonal matrix. I will not discuss the Golub-Kahn Bidiagonalisation algorithm here — but it is very similar to Householder Hessenberg Reduction and indeed uses Householder reflections in a very similar way to introduce zeroes alternately in columns to the left of the main diagonal and in rows to the right of the first super-diagonal. & % MS4105 583 ' 10.2.1 $ Exercises 1. Write pseudo-code (based on Alg. 8.1) to implement Golub-Kahn Bidiagonalisation. 2. Test your pseudo-code by writing a Matlab script to implement it. 3. What is the computational cost of the algorithm? & % MS4105 ' 584 $ Part III Supplementary Material & % MS4105 585 ' A $ Index Notation and an Alternative Proof for Lemma 1.10 We start with a familiar example — a linear system of n equations in m unknowns: a11 x1 +a12 x2 +... +a1m xm = b1 a21 x1 .. . +a22 x2 .. . +... .. . +a2m xm = b2 .. . an1 x1 +an2 x2 +... +anm xm = bn (A.1) The system can be written much more compactly as ai1 x1 + ai2 x2 + · · · + aim xm = bi where i = 1, . . . , n. & % MS4105 586 ' $ But we can do better! We can use “index” notation to write: m X aij xj = bi , where i = 1, . . . , n. j=1 A final twist is the “summation convention”. It very simply says that if, in a formula, (or a single term in a formula) the same subscript (index) is repeated then that index is to be summed over the possible range of the index. So for example Pk = Zkg Fg , where k = 1, . . . , N is a short-hand for Pk = M X Zkg Fg , where k = 1, . . . , N . g=1 & % MS4105 587 ' $ Another example, the trace of a matrix is the sum of its diagonal elements so: Tr(A) = Aii is short-hand for Tr(A) = n X Aii i=1 A more complicated example: Aij = Bix Cxy Dyj is short-hand for Aij = N X M X Bix Cxy Dyj x=1 y=1 which is just the (familiar??) formula for taking the product of (in this case) three matrices so that A = BCD. & % MS4105 588 ' $ So the linear system (A.1) can be writtem aij xj = bi (A.2) We do not need to re-write the range of the subscripts i and j as they are completely determined by the number of elements in the vectors l and s — so that we must have i = 1, . . . , n and j = 1, . . . , m. A final example — Linear Independence: Definition A.1 (Alternative Linear Independence) If S = {v1 , . . . , vk } is a non-empty set of vectors then S is linearly independent if αj vj = 0 only has the solution αi = 0 , i = 1, . . . , k. Note that the index notation still works when we are summing over vectors, not just scalars. & % MS4105 589 ' $ Now for a streamlined version of the proof of Lemma 1.10. It will be word-for word the same as the original proof except for using index notation with the summation convention. Proof: Recall that L = {l1 , . . . , ln } is a linearly independent set in Vand S = {s1 , . . . , sm } is a second subset of V which spans V. We will assume that m < n and and show this leads to a contradiction. 
As S spans V we can re-write (1.1) using index notation and the summation convention as: li = aij sj , i = 1, . . . , n. (A.3) Now consider the linear system aji cj = 0, i = 1, . . . , m — note the sneaky reversal of the order of the subscripts (equivalent to writing AT c = 0). This is a homogeneous linear & (A.4) % MS4105 590 ' $ system with more unknowns than equations and so must have non-trivial solutions for which not all of c1 , . . . , cn are zero. So we can write (multiplying each aji cj by si and summing over i) si aji cj = 0. (A.5) Re-ordering the factors (OK to do this as each term in the double sum over i and j is a product of a vector sj and two scalars cj and aij ): cj (aji si ) = 0. (A.6) But the sums in brackets are just lj by (A.3). (Note that the roles of i and j are swapped when going from (A.3) to (A.6).) So we can write: cj lj = 0 & % MS4105 ' 591 $ with cj not all zero. But this contradicts the assumption that L = {l1 , . . . , ln } is linearly independent. Therefore we must have n ≤ m as claimed. & % MS4105 592 ' B $ Proof that under-determined homogeneous linear systems have a non-trivial solution Prove the result: “For all m ≥ 1, given an m × n matrix A where m < n, the under-determined homogeneous linear system Ax = 0 has non-trivial solutions” by induction. Write the under-determined homogeneous linear system as: & a11 x1 +a12 x2 +... +a1n xn =0 a21 x1 .. . +a22 x2 .. . +... .. . +a2n xn .. . =0 .. . am1 x1 +am2 x2 +... +amn xn =0 (B.1) % MS4105 ' 593 $ [Base Step] This is the case m = 1 where there is only one equation in the n unknowns x1 , . . . , xn , namely a11 x1 + a12 x2 + · · · + a1n xn = 0. Suppose that a11 6= 0. Then set x1 = −a12 /a11 , x2 = 1 and all the rest to zero. This is a non-trivial solution as required. & % MS4105 594 ' $ [Induction Step] Now assume that the result holds for all under-determined homogeneous linear systems with m − 1 rows and n − 1 columns and m − 1 < n − 1 ≡ m < n. • Suppose that the first column of the coefficient matrix is not all zero. Then one iteration of Gauss Elimination produces a new linear system (with the same solutions as the original) of the form: & 1x1 +^ a12 x2 +... +^ a1n xn =0 0x1 .. . +^ a22 x2 .. . +... .. . +^ a2n xn .. . =0 .. . 0x1 +^ am2 x2 +... +^ amn xn =0 (B.2) % MS4105 595 ' $ Rows 2–m of this linear system are an underdetermined homogeneous linear system with m − 1 rows and n − 1 columns and so by the inductive assumption there is a non-trivial solution (not all zero) for x2 , x3 , . . . , xn . Finally, set x1 = 0 to obtain a non-trivial solution to the full linear system (B.2) and therefore to (B.1). • The trivial case where the first column of the coefficient matrix is all zero is left as an exercise. & % MS4105 596 ' C $ Proof of the Jordan von Neumann Lemma for a real inner product space A proof of the Jordan von Neumann Lemma 2.5 mentioned in Q. 5 on Slide 123. Proof: We define the “candidate” inner product (2.9) 1 1 u, v = ku + vk2 − ku − vk2 4 4 (C.1) and seek to show that it satisfies the inner product space axioms Def. 2.1 provided that the Parallelogram Law (2.2) 2 2 2 2 (C.2) ku + vk + ku − vk = 2 kuk + kvk holds. & % MS4105 597 ' $ 1. The Symmetry Axiom holds as (2.9) is symmetric in u and v. 2. To prove the Distributive Axiom we begin with: 1 u + v, w + u − v, w = ku + v + wk2 − ku + v − wk2 4 + ku − v + wk2 − ku − v − wk2 (C.3) = 1 (ku + wk2 + kvk2 ) 2 − (ku − wk2 + kvk2 ) 1 = ku + wk2 − ku − wk2 2 = 2 u, w . (C.4) (C.5) (C.6) We used the Parallelogram Law to derive (C.4) from (C.3). 
& % MS4105 598 ' $ So u + v, w + u − v, w = 2 u, w (C.7) and setting u = v leads to 2u, w = 2 u, w , i.e. the 2 can be “taken out”. Now replace u by (u + v)/2 and v by (u − v)/2 in (C.7)— leading to: u + v , w = u + v, w u, w + v, w = 2 2 (C.8) which is the Distributive Axiom. & % MS4105 599 ' $ 3. To prove the Homogeneity Axiom αu, v = α u, v , we first note that we already have 2u, w = 2 u, w . The Distributive Axiom with v = 2u gives 3 u, w = u, w + 2 u, w . But this is equal to u, w + 2u, w = u + 2u, w = 3u, w . and so by induction for any positive integer n: n u, w = u, w + (n − 1) u, w = u, w + (n − 1)u, w = u + (n − 1)u, w = nu, w . (C.9) Now set u = (1/n)u in both sides of (C.9) giving n u/n, w = u, w or u 1 ,w = u, w . n n & (C.10) % MS4105 600 ' $ Setting n = m another arbitrary positive integer in (C.10) and combining with (C.9) we have n n u, w = u, w . (C.11) m m So for any positive rational number r = n/m, ru, w = r u, w . (C.12) We can include negative rational numbers by using the Distributive Axiom again with v = −u so that 0 = u − u, w = u, w + −u, w and so −u, w = − u, w . Finally (and this is a hard result from First Year Analysis) any real number can be approximated arbitrarily closely by a rational number so we can write ru, w = r u, w , for all r ∈ R (C.13) which is the Homogeneity Axiom. & % MS4105 601 ' $ 4. The Non-Negativity Axiom The conditions u, u ≥ 0 and u, u = 0 if and only if u = 0 are trivial as 1 u, u = 4 ku + uk2 = kuk ≥ 0 as k · k is a norm. Again kuk = 0 if and only if u = 0 as k · k is a norm. & % MS4105 ' D 602 $ Matlab Code for (Very Naive) SVD algorithm The following is a listing of the Matlab code referred to in Sec. 6.3. You can download it from http://jkcray.maths.ul.ie/ms4105/verynaivesvd.m. & % MS4105 ' 1 2 3 4 5 6 7 8 9 10 11 12 603 $ m=5;n=3;a=rand(m,n); asa=a’∗a; % form a∗a [v,s2]=eig(asa); % find the eigenvectors v and the eigenvalues s2 of a∗a s=sqrt(s2); % s is a diagonal matrix whose diagonal elements % are the non−zero singular values of a ds=diag(s); % ds is the vector of non−zero singular values [dss,is]=sort(ds,1,’descend’); %Sort the singular values % into decreasing order s=diag(dss); % s is a diagonal matrix whose diagonal elements % are the non−zero singular values sorted into % decreasing order s=[s’ zeros(n,m−n)]’; % pad s with m−n rows of zeros v=v(:,is); % apply the same sort to the columns of v & % MS4105 ' 13 14 15 16 17 18 19 20 21 22 23 24 604 $ aas=a∗a’; % form aa∗ [u,s2p]=eig(aas); % find the eigenvectors u % and the eigenvalues s2p of a a∗ sp=sqrt(s2p); % sp is a diagonal matrix whose diagonal elements % are the singular values of a (incl the zero ones) dsp=diag(sp); ; % dsp is the vector of all the singular values [dssp,isp]=sort(dsp,1,’descend’); %Sort the singular values % into decreasing order sp=diag(dssp); % s is a diagonal matrix whose diagonal elements % are the singular values sorted into decreasing order u=u(:,isp);% apply the same sort to the columns of u norm(u∗sp(:,1:n)∗v’−a) % should be very small & % MS4105 ' E 1 2 3 4 5 6 7 8 9 10 11 12 13 605 $ Matlab Code for simple SVD algorithm m=70; n=100; ar=rand(m,n); ai=rand(m,n); a=ar+i∗ai; asa=a’∗a; [v,s2]=eig(asa); av=a∗v; s=sqrt(s2); if m>n s=[s zeros(n,m−n)]’; %if m>n!! 
end; [q,r]=qr(av); & % MS4105 ' 14 15 16 17 18 19 20 21 22 23 24 25 26 27 606 $ d=diag(r); dsign=sign(d); dsign=dsign’; zpad=zeros(1,m−n); dsign=[dsign zpad]’; dsignmat=diag(dsign); if m<n dsignmat=[dsignmat zeros(m,n−m)]; end; u=q∗dsignmat; atest=u∗s∗v’; diffjk=norm(a−atest) [U,S,V]=svd(a); diffmatlab=norm(U∗S∗V’−a) & % MS4105 607 ' F $ Example SVD calculation −2 A= −10 11 −2 −10 , AT = 5 11 5 so 104 −72 T . A A= −72 146 Solve AT AV = VΛ: λ − 104 72 T = (λ−104)(λ−146)−722 = 0. det(I−λA A) = det 72 λ − 146 So the eigenvalues are λ = 200, 50. Now find the eigenvectors . Let & % MS4105 ' 608 $ x , then AT Av1 = 200v1 simplifies to 104x − 72y = 200x so v1 = y x = −3y/4 giving v1 = (−3, 4)T . We can normalise this to v1 = 15 (−3, 4)T . Similarly AT Av2 = 50v2 gives the normalised v2 = 15 (4, 3)T . 200 0 −3 4 1 . and Λ = So V = 5 0 50 4 3 10 5 . Now find a QR factorisation for AV = 10 −5 & % MS4105 609 ' G $ Uniqueness of U and V in S.V.D. We re-state Theorem 6.7. Theorem G.1 If an m × n matrix A has two different SVD’s A = UΣV ∗ and A = LΣM∗ then U∗ L = diag(Q1 , Q2 , . . . , Qk , R) V ∗ M = diag(Q1 , Q2 , . . . , Qk , S) where Q1 , Q2 , . . . , Qk are unitary matrices whose sizes are given by the multiplicities of the corresponding distinct non-zero singular values — and R, S are arbitrary unitary matrices whose size equals the number of zero singular values. More precisely, if Pk q = min(m, n) and qi = dim Qi then i=1 qi = r = rank(A) ≤ q. & % MS4105 610 ' $ The proof below investigates what happens in the more complicated case where one or more singular value is repeated. We need two preliminary results: Lemma G.2 If Σ is diagonal then for any compatible matrix A, AΣ = ΣA if and only if aij = 0 whenever σii 6= σjj . Proof: We have (AΣ)ij = aij σjj so AΣ = ΣA ≡ (AΣ)ij = (ΣA)ij ≡ aij σjj = aij σii This is equivalent to aij (σii − σjj ) = 0. The result follows. • So for A to commute with Σ, it can have non-zero diagonal elements but non-zero off-diagonal elements (at aij say) only when the corresponding elements σii and σjj are the same. • The diagonal Σ must have repeated diagonal elements for a matrix A that commutes with it to have off-diagonal elements. • The following Example illustrates the point. & % MS4105 611 ' $ Example G.1 If Σ is the diagonal matrix on the left below then any 7 × 7 matrix A with the block structure shown satisfies AΣ = ΣA. 1 2 Σ= 2 3 3 3 0 0 0 0 0 0 0 ,A = 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Try experimenting with Matlab to convince yourself. The Example suggests the following Lemma: & % MS4105 612 ' $ Lemma G.3 If Σ = diag(c1 I1 , . . . , cM IM ) where the Ik are identity matrices then for any compatible matrix A, AΣ = ΣA iff A is block diagonal A 0 ... 0 1 0 0 A2 . . . A= .. .. .. .. . . . . 0 0 ... AM where each Ai is the same size as the corresponding Ii . Proof: From the first Lemma, as Σ here is diagonal, we have aij = 0 unless σii = σjj . Since the scalars ci are distinct, σii = σjj only within a block. The result follows. & % MS4105 613 ' $ Proof: (Main Theorem) We take m ≥ n — the other case is left as an exercise. Using the first SVD for A: AA∗ = UΣV ∗ VΣ∗ U∗ = UΣΣ∗ U∗ . We also have AA∗ = LΣΣ∗ L∗ . (Note that although Σ is diagonal, it is not hermitian — even if real — as Σ is m × n .) Then Σ2 ≡ ΣΣ∗ (as previously) is an m × m matrix with an n × n diagonal block in the top left and zeroes elsewhere. So equating the two expressions for AA∗ ; UΣ2 U∗ = LΣ2 L∗ and Σ2 = U∗ LΣ2 L∗ U = PΣ2 P∗ where P = U∗ L, a unitary m × m matrix. 
Re-arranging we have Σ2 P = PΣ2 . It follows from the second Lemma above that P is block diagonal with blocks whose sizes match the multiplicities of the singular values in Σ2 ≡ ΣΣ∗ . & % MS4105 614 ' $ ¯ 2 (say) is Similarly Σ∗ Σ = QΣ∗ ΣQ∗ where Q = V ∗ M. Now Σ∗ Σ ≡ Σ an n × n matrix (equal to the n × n block in the top left of ΣΣ∗ and ¯2 = Σ ¯ 2 Q. Again appealing to the second Lemma above we have QΣ we have that Q is block diagonal with blocks whose sizes match the multiplicities of the singular values in Σ∗ Σ. So P1 0 . . P= . 0 0 0 ... 0 0 0 .. . , 0 ˜ P P2 .. . ... .. . 0 .. . 0 ... Pk 0 ... 0 Q1 0 . . Q= . 0 0 0 ... 0 0 0 .. . , 0 ˜ Q Q2 .. . ... .. . 0 .. . 0 ... Qk 0 ... 0 where each Pi and Qi are the same size as the corresponding block in ΣΣ∗ and Σ∗ Σ respectively. & % MS4105 615 ' $ In fact each Pi = Qi . Reason as follows: we have A = UΣV 0 = LΣM 0 . So, using P = U∗ L and Q = V ∗ M we have L = UP and M = VQ so that UΣV 0 = UPΣQ 0 V 0 . Multiplying on the left by U 0 and on the right by V gives Σ = PΣQ 0 . But Σ is diagonal and each Pi is of the same size as the corresponding Qi — corresponding to the multiplicities of the corresponding distinct non-zero singular values. So finally we must have Pi Qi0 = Ii , for each of the blocks giving Pi = Qi . ˜ is an (m − n) × (m − n) unitary matrix When m > n check that P ˜ is absent. (State and the row and column in Q corresponding to Q the corresponding results when n > m and n = m.) & % MS4105 616 ' H $ Oblique Projection Operators — the details In this Appendix we explain in detail how a matrix expression for a non-orthogonal (oblique) projection operator may be calculated. (Or return to Sec. 5.1.4.) We will explain two different methods. Both use the same definitions: Let B1 = {s11 , . . . , sk1 } form a basis for S1 = range(P) and B2 = {s12 , . . . , sm−k } form a basis for S2 = null(P). 2 & % MS4105 617 ' $ 1. • We seek to construct an idempotent (P2 = P) matrix P s.t. Pv = v for v ∈ S1 and Pv = 0 for v ∈ S2 . h i • Define the m × k matrix B1 = s11 . . . sk1 and the h i m × (m − k) matrix B2 = s12 . . . sm−k . 2 ¯ 2 whose columns are a basis • Also define the m × k matrix B for the space orthogonal to S2 . ¯ 2 is orthogonal to every vector in B2 • So every column of B (or equivalently to every column of B2 or every vector in S2 = null(P)). ¯ T v = 0. • So for any v ∈ S2 ≡ null(P), B 2 ¯ T also has this property • Any matrix P of the form P = XB 2 — i.e. Pv = 0 for all v ∈ S2 . • We will tie down the choice of X by requiring that Pv = v for v ∈ S1 . & % MS4105 618 ' $ • Now consider v ∈ S1 ≡ range P. • So v = B1 z for some z ∈ Ck . ¯T v = B ¯ T B1 z. • Therefore B 2 2 ¯ T v. ¯ T B1 )−1 B • And z = (B 2 2 • As v ∈ S1 ≡ range P we have ¯ T B1 )−1 B ¯ T v. Pv ≡ v = B1 z = B1 (B 2 2 • This expression also holds when v ∈ null(P) as in that case −1 ¯ T ¯ T v = 0 and so B1 (B ¯T B B2 v = 0. 2 B1 ) 2 ¯ T B1 )−1 B ¯T . • So we can write P = B1 (B 2 & 2 % MS4105 619 ' $ 2. An alternative “formula” for P can be derived as follows: • As S1 and S2 are complementary subspaces of Cm we have that B1 ∪ B2 is a basis for Cm . • It follows that the matrix B whose columns are the vectors s11 , . . . , sk1 , s12 , . . . , sm−k ; 2 h i h i B = s11 . . . sk1 | s12 . . . sm−k = B1 | B2 2 must be invertible. • Now as Pv = v for all v in S1 and Pv = 0 for all v in S2 we must have Psi1 = si1 for i = 1, . . . k and Psi2 = 0 for i = 1, . . . 
m − k and so h i h i h i PB = P B1 | B2 = PB1 | PB2 = B1 | 0 & % MS4105 620 ' $ • Multiplying on the right by B−1 ; h P = B1 | i −1 0 B Ik =B 0 0 −1 B . 0 • It follows that 0 I−P =B 0 & 0 Im−k B−1 . % MS4105 ' 621 $ Let’s check that each formula works — we’ll use a simple example. Example H.1 Let P project vectors in R2 into the y–axis along the line y = −αx. So S1 is the y–axis and S2 is the line y = −αx. ¯ T ). ¯ T B1 )−1 B 1. • Use the first formula above ( P = B1 (B 2 2 0 • We have B1 = 1 1 . • And B2 = −α α ¯ . • So B2 = 1 & % MS4105 ' 622 $ • Substituting into the formula above for P we find: −1 h i h i 0 0 P = α 1 α 1 1 1 h i 0 = (1) α 1 1 0 0 = α 1 & % MS4105 623 ' $ 0 2. Using the second recipe above, as before we have B1 = . 1 1 And B2 = −α α 1 0 1 −1 . and (check ) B = • So B = 1 0 1 −α • So (as k = 1), 1 P=B 0 & 0 −1 0 B = 0 1 1 1 −α 0 0 α 1 0 0 = . 0 1 0 α 1 % MS4105 ' 624 $ So for any vector v = (x, y)T , Pv = (0, αx + y)T and as x and y are aribtrary we see that range P is indeed the y–axis and that Pv = 0 precisely when y = −αx. Which method do you think is better? Why? (Back to Sec. 5.1.4.) & % MS4105 625 ' I I.1 $ Detailed Discussion of the QR Algorithm Simultaneous Power Method To show that this method works in general we need to temporarily back-track to the Power Method. A natural extension of the simple Power Method above is to apply it simultaneously to a matrix Q(0) of random starting vectors. This offers the prospect of calculating all the eigenvalues and eigenvectors of A at once rather than using inverse iteration to compute then one at a time. We start with a set of n randomly chosen linearly (0) (0) independent unit vectors (n ≤ m) q1 , . . . , qn . We expect/hope that just as & % MS4105 626 ' $ (0) • Ak q1 → (a multiple of) q1 (0) (0) • so the space spanned by {Ak q1 , . . . , Ak qn } E D (0) (0) – written Ak q1 , . . . , Ak qn should converge to hq1 , . . . , qn i, the space spanned by the n eigenvectors corresponding to the n largest eigenvalues. & % MS4105 627 ' $ h We define the m × n matrix Q(0) = q(0) 1 (0) q2 ... (0) qn i and Q(k) = Ak Q(0) . (We are not normalising the columns as yet.) As we are interested in the column space of Q(k) , use a reduced QR factorisation to extract a well-behaved (orthonormal) basis for this space: ^ (k) R ^ (k) = Q(k) Q reduced QR factorisation of Q(k) (I.1) ^ (k) is m × n and R ^ (k) is n × n . As usual, Q ^ (k) will converge We expect/hope that as k → ∞, the columns of Q to (±) the eigenvectors of A, q1 , . . . , qn . & % MS4105 628 ' $ We can justify this hope! • Define the m × n matrix whose columns are the first n ^ = [q1 , . . . , qn ] . eigenvectors of A, i.e. Q (0) • Expand qj (k) and qj (0) qj in terms of these eigenvectors : X ^ = αij qi ↔ Q(0) = QA i (k) qj = X αij λki qi i We assume that • the leading n + 1 eigenvalues are distinct in modulus • the n × n matrix A of expansion coefficients aij is non-singular ^ is the matrix whose columns in the sense that if (as above) Q are the first n eigenvectors q1 , . . . , qn then all the leading ^ ∗ Q(0) are non-singular. principal minors of A ≡ Q & % MS4105 629 ' $ Now the Theorem: Theorem I.1 If we apply the above un-normalised simultaneous (or block) power iteration to a matrix A and the above assumptions ^ (k) converge linearly to are satisfied then as k → ∞ the matrices Q the eigenvectors of A in the sense that (k) − (±)q)jk = O(ck ) for each j = 1, . . . , n, λk+1 is less than 1. where c = max 1≤k≤n λk kqj We include the proof for completeness. 
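Before the proof, here is a minimal Matlab sketch of the un-normalised simultaneous iteration just described (the test matrix, block size and number of power steps below are my own illustrative choices):

m=6; n=3;
A=randn(m); A=(A+A')/2;             % a random real symmetric test matrix
Q0=randn(m,n);                      % n random (almost surely independent) starting vectors
Qk=A^20*Q0;                         % twenty un-normalised power steps: Q(k) = A^k Q(0)
[Qhat,~]=qr(Qk,0);                  % reduced QR factorisation extracts an orthonormal basis
[V,D]=eig(A);
[~,idx]=sort(abs(diag(D)),'descend');
V=V(:,idx(1:n));                    % eigenvectors for the n largest (in modulus) eigenvalues
norm(Qhat*Qhat'-V*V')               % should be small if |lambda_3| and |lambda_4| are well
                                    % separated; increase the power if they are not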
^ to the full m × m unitary matrix Q of Proof: Extend Q eigenvectors of A. Let Λ be the diagonal matrix of eigenvalues so ^ to be the top that A = QΛQT (A real and symmetric). Define Λ left n × n diagonal block of Λ. & % MS4105 630 ' $ Then Q(k) ≡ Ak Q(0) = QΛk QT Q(0) ^Λ ^ kQ ^ T Q(0) + O(|λn+1 |k ) as k → ∞ =Q ^Λ ^ k A + O(|λn+1 |k ) =Q as k → ∞. (Check that we can justify the step from the first to the second line by writing h i ^ 0 Λ ¯ ,Λ = ^ Q Q= Q ¯ 0 Λ ¯ and Λ ¯ are the remaining m − n ^ and Λ ^ are as above and Q where Q columns of Q and the bottom right (m − n) × (m − n) block of Λ respectively.) & % MS4105 ' 631 $ ^ T Q(0) is non-singular (in terms of We assumed above that A ≡ Q its principal minors) so multiply the last equation above on the right by A−1 A ≡ In giving T (0) (k) k k ^ ^ ^ Q . Q = QΛ + O(|λn+1 | ) Q ^ T Q(0) is non-singular, the column space of Q(k) is the same as As Q ^Λ ^ k + O(|λn+1 |k ) (as XY is a linear the column space of Q ^Λ ^ k as combination of the columns of X). This is dominated by Q k → ∞. & % MS4105 ' 632 $ ^ T Q(0) are We also assumed that all the principal minors of A ≡ Q non-singular — so the above argument may be applied to leading ^ the first column, the first subsets of the columns of Q(k) and Q; and second, and so on. In each case we conclude that the space spanned by the corresponding columns of Q(k) converge linearly to ^ the space spanned by the corresponding columns of Q. From this convergence of all the successive column spaces together ^ R(k) ^ with the definition of the QR factorisation, Q(k) = Q(k) , the result follows. & % MS4105 633 ' I.1.1 $ A Normalised version of Simultaneous Iteration To make Simultaneous Iteration useful we must normalise at each iteration, not just after multiplying by A k times — otherwise round-off would cause all accuracy to be lost. Algorithm I.1 Normalised Simultaneous Iteration ^ (0) m × n with orthonormal columns. (1) Choose an arbitrary Q (2) for k = 1, 2, . . . ^ (k−1) (3) Z = AQ ^ (k) R ^ (k) = Z reduced QR factorisation of Z (4) Q (5) end ^ (k) and Z(k) are the same; both Obviously the column spaces of Q are in the column space of A. (You should check this statement.) Ignoring the numerical round-off issues, Alg. I.1 converges under the same assumptions as the original un-normalised version. & % MS4105 ' 634 $ We now add an extra Line 5 to the algorithm. It defines A(k) , a projected version of the original matrix A. We will see that A(k) converges to the diagonal matrix Λ. ¯^ (k) and R ¯ (0) = I for simplicity and drop the hats on Q ^ (k) We take Q as we will be using full QR factorisations as A is square and we ¯ and R ¯ in SI to distinguish want all its eigenvectors. We will use Q the Q(k) , R(k) generated by SI from those generated by QR. Algorithm I.2 Normalised Simultaneous Iteration with extra line 5 ¯ (0) = I (1) Q (2) for k = 1, 2, . . . ¯ (k−1) (3) Z = AQ ¯ (k) R ¯ (k) = Z reduced QR factorisation of Z (4) Q ¯ (k)T AQ ¯ (k) allows comparison with QR algorithm (5) A(k) = Q (6) end & % MS4105 ' 635 $ We now re-write the QR algorithm, Alg. 9.4, also with an extra line that defines Qπ (k) , the product of Q(1) . . . Q(k) . (Here π stands for “product”.) Here we will use Q and R in SI to distinguish the Q(k) , R(k) generated by the QR algorithm from those generated by the SI algorithm. Algorithm I.3 Un-shifted QR Algorithm with extra line (1) (2) (3) (4) (5) (6) A(0) = A, Q(0) = I for k = 1, 2, . . . Q(k) R(k) = A(k−1) Compute the QR factorisation of A. 
A(k) = R(k) Q(k) Combine the factors in reverse order. Qπ (k) = Q(1) . . . Q(k) end & % MS4105 ' 636 $ We can state a Theorem which establishes that the two algorithms are equivalent. Theorem I.2 Algorithms I.2 and I.3 are equivalent in that they generate the same sequence of matrices. ¯ (k) generated by the SI algorithm are equal • More precisely the Q ¯ (k) generated by SI are to the Qπ (k) generated by the QR, the R equal to the R(k) generated by QR while the same A(k) are computed by both. ¯ (i) ¯ (k) • Define R π to be the product of all the R ’s computed so far in SI and similarly for R(k) π in QR. (Again, π stands for “product”.) & % MS4105 637 ' $ • Claim that: ¯ (k) . . . R ¯ (1) ¯ (k) R ≡ R π and that: Alg. Q ¯ (k) Q SI QR & R ¯ (k) R m m Qπ (k) R(k) is equal to (k) (1) R(k) ≡ R . . . R π Projection of A ¯ (k)T AQ ¯ (k) A(k) = Q Powers of A ¯ (k) R ¯ (k) Ak = Q π A(k) = Qπ (k)T AQπ (k) Ak = Qπ (k) R(k) π % MS4105 ' 638 $ Proof: The proof is by induction. • The symbols Qπ (k) and R(k) are only used in the analysis of the QR algorithm — they have no meaning in the SI algorithm. ¯ (k) and R ¯ (k) are only used in the analysis of the • The symbols Q SI algorithm — they have no meaning in the QR algorithm. [Base Case k = 0] Compare the outputs of the two algorithms. ¯ (0) = I and A(0) = A. [S.I.] We have Q [QR] A(0) = A and Qπ (0) = Q(0) = I. ¯ (0) = I. [Both] R(0) = R ¯ (0) = Q (0) = I and so A = A(0) = Q ¯ (0)T AQ ¯ (0) So trivially, Q π for both algorithms . (A0 = I by definition so nothing to check.) X & % MS4105 639 ' $ [Inductive Step k ≥ 1] Compare the outputs of the two algorithms. [S.I.] We need to ¯ (k)T AQ ¯ (k) . But this is Line 5 of S.I.X • prove that A(k) = Q ¯ (k) ¯ (k) R • prove that Ak = Q π . Assume ¯ (k−1) ¯ (k−1) R . (Inductive hypothesis.) Then Ak−1 = Q π ¯ (k−1) ¯ (k−1) R Ak = AQ π ¯ (k−1) = ZR π Line 3 of SI algorithm ¯ (k) R ¯ (k) R ¯ (k−1) =Q π Line 4 of SI algorithm ¯ (k) R ¯ (k) .X =Q π & % MS4105 640 ' $ [QR algorithm ] We need to • prove that Ak = Qπ (k) R(k) π . Inductive hypotheses: assume that 1. Ak−1 = Qπ (k−1) R(k−1) π 2. A(k−1) = Qπ (k−1)T AQπ (k−1) . Then Ak = AQπ (k−1) R(k−1) π Inductive hypothesis 1 = Qπ (k−1) A(k−1) R(k−1) π = Qπ (k−1) Q(k) R(k) R(k−1) π Inductive hypothesis 2 Line 3 of QR alg. = Qπ (k) R(k) π .X & % MS4105 641 ' $ • (We need to) prove that A(k) = Qπ (k)T AQπ (k) . We have A(k) = R(k) Q(k) Line 4 of QR alg. = Q(k)T A(k−1) Q(k) Line 3 of QR alg. = Q(k)T Qπ (k−1)T AQπ (k−1) Q(k) Inductive hypothesis 2 = Qπ (k)T AQπ (k) .X ¯ (k) R ¯ (k) Finally, we have proved that Ak = Q π for the S.I. algorithm ¯ (k) and and that Ak = Qπ (k) R(k) π for the QR algorithm . But Q ¯ (k) are QR factors (of Z) at each iteration of S.I. and so are R respectively unitary and upper triangular. Also Q(k) and R(k) are QR factors (of A(k−1) ) at each iteration of S.I. so their products Qπ (k) and R(k) π are also respectively unitary and upper triangular. Therefore, by the so by the uniqueness of the factors in a QR (k) ¯ (k) = Q (k) and R ¯ (k) factorisation, Q = R π π . (The latter equality π & % MS4105 642 ' ¯ (k) = R(k) for all k.) means that we also have R & $ % MS4105 ' 643 $ So both algorithms • generate orthonormal bases for successive powers of A, i.e. they generate eigenvectors of A • generate eigenvalues of A as the diagonal elements of A(k) are Rayleigh Quotients of A corresponding to the columns of Qπ (k) . As the columns of Qπ (k) converge to eigenvectors, the Rayleigh Quotients converge to the corresponding eigenvalues as in (9.2). 
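A few lines of Matlab are enough to watch both effects (a toy illustration; the test matrix and iteration count are my own choices):

m=5; A=randn(m); A=(A+A')/2;        % random real symmetric test matrix
Ak=A; Qpi=eye(m);
for k=1:200
  [Q,R]=qr(Ak);                     % QR factorisation of A(k-1)
  Ak=R*Q;                           % recombine the factors in reverse order
  Qpi=Qpi*Q;                        % accumulate the product Q(1)...Q(k)
end
norm(Ak-diag(diag(Ak)))             % off-diagonal part of A(k): should be small
norm(sort(diag(Ak))-sort(eig(A)))   % diagonal of A(k) vs. Matlab's own eigenvalues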
Also, the off-diagonal elements of A(k) correspond to generalised Rayleigh Quotients using different approximate eigenvectors on the left and right. As these approximate eigenvectors must become orthogonal as they converge to distinct eigenvectors, the off-diagonal elements of A(k) → 0. & % MS4105 ' 644 $ We can summarise the results now established with a Theorem whose proof is implicit in the previous discussion. Theorem I.3 Let the (unshifted) QR algorithm be applied to a real symmetric m × m matrix A (with |λ1 | > |λ2 | > · · · > |λm |) whose corresponding eigenvector matrix Q has all non-singular leading principal minors. Then as k → ∞, A(k) converges linearly |λk+1 | with constant factor max to diag(λ1 , . . . , λm ) and Qπ (k) k |λk | (with the signs of the columns flipped as necessary) converges at the same rate to Q. & % MS4105 ' 645 $ We focus on the QR algorithm and how to make it more efficient. Thanks to the two Theorems we can drop the clumsy Q and R notation. We repeat the algorithm here for convenience — dropping the (k) underbar (i.e. writing Q(k) as Q(k) and Qπ (k) as Qπ and similarly for R) from now on: Algorithm I.4 Un-shifted QR Algorithm with extra line (1) (2) (3) (4) (5) (6) A(0) = A, Q(0) = I for k = 1, 2, . . . Q(k) R(k) = A(k−1) Compute the QR factorisation of A. A(k) = R(k) Q(k) Combine the factors in reverse order. (k) (k−1) (k) Qπ = Qπ Q ≡ Q(1) . . . Q(k) end & % MS4105 646 ' I.1.2 $ Two Technical Points For the QR method to work, when we reverse the order in Line 4 we should check that the tri-diagonal property is preserved. In other words that if a tri-diagonal matrix T has a QR factorisation T = QR then the matrix RQ is also tri-diagonal. (Remember that phase 1 of the process of finding the eigenvectors and eigenvalues of a real symmetric matrix A consists of using Householder Hessenberg Reduction (Alg. 8.1) to reduce A to a Hessenberg matrix — and symmetric Hessenberg matrices are tri-diagonal.) This is left as an exercise, you will need to prove the result by induction. Another point; for greater efficiency, given that the input matrix is tri-diagonal, we should use 2 × 2 Householder reflections when computing the QR factorisations in the QR algorithm — this greatly increases the speed of the algorithm. & % MS4105 647 ' I.2 $ QR Algorithm with Shifts In this final section, we tweak the (very simple) un-shifted QR Alg. I.4 and greatly improve its performance. We have proved that the un-shifted “pure” QR algorithm is equivalent to Simultaneous Iteration (S.I.) applied to the Identity matrix. So in particular, the first column of the result iterates as if the power method were applied to e1 , the first column of I. Correspondingly, we claim that “pure” QR is also equivalent to Simultaneous Inverse Iteration applied to a “flipped” Identity matrix P (one whose columns have been permuted). In particular we claim that the mth column of the result iterates as if the Inverse Iteration method were applied to em . & % MS4105 648 ' $ Justification of the Claim Let Q(k) as above be the orthogonal factor at the kth step of the pure QR algorithm. We saw that the accumulated product Q(k) π ≡ k Y Q (j) h = q(k) 1 | (k) q2 | ... (k) |qm i j=1 is the same orthogonal matrix that appears at the kth step of SI. (k) We also saw that Qπ is the orthogonal matrix factor in a QR (k) factorisation of Ak , Ak = Qπ Rπ (k) . 
Now invert this formula: (k)−T A−k = Rπ (k)−1 Q(k)−1 = Rπ (k)−1 Q(k)T = Q(k) π π π Rπ (k) as Qπ is orthogonal and A is taken to be real symmetric (and tri-diagonalised). & % MS4105 649 ' $ Let P be the m × m permutation matrix that reverses row and column order (check that you know its structure). As P2 = I we can write h ih i A−k P = Q(k) π P PRπ (k)−T P (I.2) The first factor on the right is orthogonal. The second is upper triangular: • start with Rπ (k)−T lower triangular, • flip it top to bottom (reverse row order) • flip it left to right (reverse column order) (Draw a picture!) & % MS4105 650 ' $ • So (I.2) can be interpreted as a QR factorisation of A−k P. • We can re-interpret the QR algorithm as carrying out SI on the matrix A−1 applied to the starting matrix P. • In other words Simultaneous Inverse Iteration on A. (k) • In particular, the first column of Qπ P (the last column of (k) Qπ !) is the result of applying k steps of Inverse Iteration to em . & % MS4105 651 ' I.2.1 $ Connection with Shifted Inverse Iteration So the pure QR algorithm is both SI and Simultaneous Inverse Iteration — a nice symmetry. But the big difference between the Power Method and Inverse iteration is that the latter can be accelerated using shifts. the better the estimate µ ≈ λj , the more effective an Inverse Iteration step with the shifted matrix A − µI. The “practical” QR algorithm on the next Slide shows how to introduce shifts into a step of the QR algorithm. Doing so corresponds exactly to shifts in the corresponding SI and Simultaneous Inverse Iteration — with the same positive effect. Lines 4 and 5 are all there is to it! (Lines 6–10 implement “deflation”, essentially decoupling A(k) into two sub-matrices whenever an eigenvalue is found — corresponding to very small (k) (k) off-diagonal elements in Aj,j+1 and Aj+1,j .) & % MS4105 ' 652 $ Algorithm I.5 Shifted QR Algorithm (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) A(0) = Q(0) AQ(0)T A(0) is a tri-diagonalisation of A. for k = 1, 2, . . . Pick a shift µ(k) . Q(k) R(k) = A(k−1) − µ(k) I Comp. QR fact. of A − µ(k) I. A(k) = R(k) Q(k) + µ(k) I Combine in reverse order. (k) if any off-diag. element Aj,j+1 suff. close to zero then (k) (k) set Aj,j+1 = Aj+1,j = 0 Apply the algorithm to the two sub-matrices formed. end end Back to Slide 573. & % MS4105 653 ' $ Justification of the Shifted QR Algorithm We need to justify Lines 4 and 5. Let µ(k) be the eigenvalue estimate chosen at Line 3 of the algorithm. Examining Lines 4 and 5, the relationship between steps k − 1 and k of the shifted algorithm is: A(k−1) − µ(k) I = Q(k) R(k) A(k) = R(k) Q(k) + µ(k) I So A(k) = Q(k)T A(k−1) Q(k) (k)T (Check that the µ(k) I’s cancel.) (k) And, by induction, A(k) = Qπ AQπ as in the unshifted ¯ (k) R ¯ (k) . Instead we algorithm. But it is no longer true that Ak = Q have the factorisation (k) (k−1) (1) ¯ (k) R ¯ (k) , A−µ I A−µ I ... A − µ I = Q a shifted variation on S.I. (Proof is similar to that for the equivalence of S.I and QR.) & % MS4105 654 ' $ From the discussion above re the permutation matrix P, the first (k) column of Qπ is the result of applying the shifted power method to e1 using the shifts µ(k) and the last column is the result of applying shifted inverse iteration to em with the same shifts. If the (k) µ(k) ’s are well chosen, the last column of Qπ will converge quickly to an eigenvector. Finally, we need a way to choose shifts to get fast convergence in (k) the last column of Qπ . The obvious choice is the Rayleigh Quotient. 
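A minimal Matlab sketch of the shifted step (Lines 4 and 5 of Algorithm I.5, deflation omitted) shows how little code is involved; here the shift is taken as the (m, m) entry of the current iterate, which is identified below with the Rayleigh Quotient (the test matrix and iteration count are my own choices):

m=5; A=randn(m); A=(A+A')/2;        % random real symmetric test matrix
Ak=hess(A);                         % phase 1: reduce to (tri-diagonal) Hessenberg form
for k=1:30
  mu=Ak(m,m);                       % Rayleigh-quotient shift (see below)
  [Q,R]=qr(Ak-mu*eye(m));           % Line 4: QR factorisation of the shifted matrix
  Ak=R*Q+mu*eye(m);                 % Line 5: recombine and shift back
end
Ak(m,m), abs(Ak(m,m-1))             % corner entry ~ an eigenvalue; last off-diagonal ~ 0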
To estimate the eigenvalue corresponding to the (k) eigenvector approximated by the last column of Qπ — it is natural to apply the Rayleigh Quotient to this last column. So we take (k)T µ(k) = (k) qm Aqm (k)T (k) qm qm & (k) = q(k)T Aq m m . % MS4105 655 ' $ If this value for the shifts µ(k) is chosen at each iteration then the eigenvalue and eigenvector estimates µ(k) are the same as those chosen by the Rayleigh Quotient algorithm starting with em . A convenient fact: the shifts are available at no extra cost as the (m, m) element of A(k) . To check this: T (k) A(k) = e A em m,m m = eTm Q(k)T AQ(k) π π em (k) = q(k)T m Aqm = r(q(k) m ). So the shifted QR algorithm with this natural choice for the shifts (k) µ(k) has cubic convergence in the sense that qm converges to an eigenvector cubically. & % MS4105 656 ' $ Finally!! (k) • Why the obsession with the last column of Qπ ? • What about the others? • The answer is that the deflation trick decomposes A(k) every time the off-diagonals get sufficiently small. • So we recursively apply the shifted QR algorithm to smaller and smaller matrices — the last column of each converges cubically to an eigenvector. Back to Slide 573. & % MS4105 ' J 657 $ Solution to Ex. 4 in Exercises 5.1.5 • Assume A∗ A is singular so there is a nonzero x s.t. A∗ Ax = 0. If Ax = 0 then a linear combination of the columns of A is zero contradicting A full rank. So y = Ax 6= 0. We have A∗ y = 0 or y∗ A = 0 so y∗ y = y∗ Ax = 0. So y = 0 . Contradiction. • Now assume that A∗ A is invertible and {a, . . . , an } not linearly P n ind. then ∃x ∈ C s.t. xi ai = 0 or Ax = 0. Multiplying by A∗ gives A∗ Ax = 0 and as A∗ A is assumed invertible we have x = 0. Contradiction. & % MS4105 658 ' K $ Solution to Ex. 9 in Exercises 5.1.5 (a) • We have that P2 = P so the eigenvalues of P are either 0 or 1. kPxk . • The norm of P is just kPk = sup n kxk x∈C ,x6=0 • But if ψ is an eigenvector of P corresponding to λ = 1 then Pψ = ψ and so the ratio is 1. • Therefore kPk = kPxk kPψk ≥ = 1. kψk x∈Cn ,x6=0 kxk sup • So kPk ≥ 1 as required. & % MS4105 659 ' $ (b) Now suppose that P is an orthogonal projection operator so P∗ = P (and P2 = P). Then kPk2 = sup kPxk2 = sup x∗ P∗ Px kxk=1 kxk=1 = sup x∗ P2 x = sup x∗ Px kxk=1 ≤ sup kxkkPxk kxk=1 (C.S. inequality) kxk=1 = sup kPxk = kPk. kxk=1 So kPk2 ≤ kPk but we know that kPk ≥ 1 so kPk2 ≥ kPk. We must have kPk = 1. & % MS4105 660 ' $ (c) Finally, assume that P2 = P and that kPk = 1. RTP that ¯2 for all P∗ = P. We have (Exercise 7 on Slide 262) that P∗ v ∈ S ¯1 . RTP that S ¯2 = S1 . v ∈ Cm and P∗ v = 0 for all v ∈ S ¯2 , u 6= 0. Then u is orthogonal to (I − P)u. (i) • Let u ∈ S • So kPuk2 = ku + (I − P)uk2 = kuk2 + k(I − P)uk2 (by Pythagoras’ Theorem). • Divide across by kuk2 (equivalent to setting kuk = 1). • So kPuk2 = 1 + k(I − P)uk2 . • But kPk = 1 so LHS ≤ 1. And RHS ≥ 1. • So LHS = RHS = 1 and k(I − P)uk2 = 0. • So Pu = u and so u ∈ S1 . ¯2 ⊆ S1 . • Thefore S & % MS4105 ' 661 $ (ii) • Now let u ∈ S1 . Then u is orthogonal to (I − P∗ )u as ¯2 along S ¯1 . P∗ is the projection onto S • So kP∗ uk2 = ku + (I − P∗ )uk2 = kuk2 + k(I − P∗ )uk2 (by Pythagoras’ Theorem). • And kP∗ uk2 = 1 + k(I − P∗ )uk2 . • Again LHS ≤ 1. (As kP∗ k = kPk by Exercise 8 on Slide 263). And RHS ≥ 1. • So LHS = RHS = 1 and k(I − P∗ )uk2 = 0. ¯2 . • So P∗ u = u and so u ∈ S ¯2 . • Thefore S1 ⊆ S ¯2 (and S2 = S ¯1 ). • Therefore S1 = S • So S1 and S2 are orthogonal — so P = P∗ . & % MS4105 ' L 662 $ Hint for Ex. 
5b in Exercises 5.2.7 ^ i) = R(j, ^ j) = 0 where 1 ≤ i < j < n then If (say) R(i, a1 ∈ span{q1 } non-zero coefficient for q1 a2 ∈ span{q1 , q2 } non-zero coefficient for q2 .. . ai−1 ∈ span{q1 , q2 . . . qi−1 } non-zero coefficient for qi−1 ai ∈ span{q1 , q2 . . . qi−1 } possibly zero coefficient for qi−1 So rank [a1 a2 . . . ai ] ≤ i − 1 (as {a1 , a2 , . . . , ai } is spanned by ≤ i − 1 linearly independent vectors). & % MS4105 663 ' $ • Continuing; ai+1 ∈ span{q1 , q2 . . . qi+1 } non-zero coefficient for qi+1 so itis linearly independent of a1 , a2 , . . . , ai and so rank [a1 a2 . . . ai+1 ] ≤ i (one extra linearly independent column). • Every subsequent vector ap where i + 2 ≤ p ≤ j − 1 is in span{q1 , q2 . . . qp } non-zero coefficient for qp and so is similarly linearly independent of {a1 , a2 , . . . , ap−1 }. • Therefore the rank of the submatrix [a1 a2 . . . ap ] is just one more than that of the submatrix [a1 a2 . . . ap−1 ]. • In other words, rank [a1 a2 . . . ap ] ≤ p − 1, where i + 2 ≤ p ≤ j − 1. • In particular rank [a1 a2 . . . aj−1 ] & ≤ j − 2. % MS4105 664 ' $ • When we get to aj we have aj ∈ span{q 1 , q2 . . . qj−1 } possibly zero coefficient for qj−1 so rank [a1 a2 . . . aj ] ≤ j − 1 (up to j − 1 linearly independent vectors). • In other words a second (or subsequent) zero diagonal element ^ does not necessarily reduce the rank of A further. of R ^ has one or more zero diagonal The conclusion; rank A ≤ n − 1 if R elements. & % MS4105 665 ' M $ Proof of of Gerchgorin’s theorem in Exercises 8.1.11 Let λ be any eigenvalue of A and x a corresponding eigenvector. So Ax = λx and in subscript notation: n X xj aij = λxi j=1 xi aii + n X xj aij = λxi for i = 1 . . . n. j6=i Choose p so that |xp | ≥ |xi | for i = 1 . . . n. Then, taking i = p, λ − app = n X j6=p & apj xj xp % MS4105 666 ' $ and |λ − app | ≤ n X |apj | j6=p by definition of p. & % MS4105 667 ' N $ Proof of Extended Version of Gerchgorin’s theorem in Exercises 8.1.11 By Gerchgorin’s theorem all the eigenvalues of A are contained in the union Um of the m Gerchgorin discs Di n Um = ∪n i=1 Di ≡ ∪i=1 {z ∈ C : |z − aii | ≤ Ri (A)} We can write A = D + E where D is the diagonal part of A and E has zeroes on the main diagonal and write A = D + E. We will treat as a parameter that we can vary in the range 0 ≤ ≤ 1. Note that Ri (A) = Ri (E) = Ri (A). It will be convenient to write Di () ≡ Di (A ). Now we are told that k of the discs intersect to form a connected set Uk that does not intersect the other discs. & % MS4105 ' 668 $ • We can write Uk = ∪ki=1 Di (1) and for any ∈ [0, 1] also write Uk () = ∪ki=1 D( ). • The set Uk (1) is disjoint from the union of the remaining Gerchgorin disks Vk (say) where Vk ≡ Vk (1) = ∪m i=k+1 Di (1). • The set Uk () is a subset of the set Uk (1) (see Fig 22) but for sufficiently small, Uk () is not connected and Uk (0) is just the set of distinct points ∪ki=1 {aii }. • For each i = 1, . . . , k, the eigenvalues λi (A0 ) = λi (D) = aii . • It is true in general that the eigenvalues of a matrix are continuous functions of the entries — so in particular the eigenvalues λi (A ) are continuous functions of . • All the λi (A ) are contained in Uk () by G’s Th. • For sufficiently small the discs Di () must be disjoint. & % MS4105 ' 669 $ • So by the continuity property above, as for = 0 the eigenvalues λi (A0 ) = aii , we must have that for sufficiently small each eigenvalue λi (A ) remains in the disc Di (). • As increases from 0 to 1, the disks Di () eventually overlap. 
• As increases from 0 to 1, each eigenvalue λi (A ) moves along a continuous path (parameterised by ) starting at aii and ending at λi (A1 ) ≡ λi (A) (see Fig 22). • These continuous curves cannot leave Uk (1) to enter the union of the remaining Gerchgorin disks Vk (1) as Vk (1) is disjoint from Uk (1) so we conclude that Uk (1) contains k eigenvalues of A(1) = A as claimed. • Finally, using the same reasoning, none of the remaining eigenvalues can enter Uk (1). & % MS4105 670 ' $ aii Vk (1) Di (1) Uk (1) Di () λi (A) & Figure 22: Gerchgorin discs % MS4105 671 ' O $ Backward Stability of Pivot-Free Gauss Elimination In this brief Appendix we formally define for reference the ideas of stability and backward stability. These ideas were used informally in Section 7.1.6. • The system of floating point numbers F is a discrete subset of R such that for all x ∈ R, there exists x 0 ∈ F s.t. |x − x 0 | ≤ M |x| where M (machine epsilon) is “the smallest number in F greater than zero that can be distinguished from zero”. • In other words ∀x ∈ R, ∃s.t.|| ≤ M & and fl(x) = x(1 + ). (O.1) % MS4105 672 ' $ • The parameter M can be estimated by executing the code: Algorithm O.1 Find Machine Epsilon M = 1 while 1 + M > 1 begin M = M /2 end (1) (2) (3) (4) (5) • Let fl : R → F be the operation required to approximately ˜. represent a real number x as a floating point number x • The rule x ~ y = fl(x ∗ y) is commonly implemented in modern computer hardware. & % MS4105 673 ' $ • A consequence: the “Fundamental Axiom of Floating point Arithmetic”; for any flop (floating point operation) ~ corresponding to a binary operation ∗ and for all x, y ∈ F there exists a constant with || ≤ M such that x ~ y = (x ∗ y)(1 + ) (O.2) • Any mathematical problem can be viewed as a function f : X → Y from a vector space X of data to another vector space Y. • A algorithm can be viewed as a different map f˜ : X → Y. ˜ − f(x)k. • The absolute error of a computation is kf(x) • The relative error of a computation is ˜ − f(x)k kf(x) . kf(x)k & % MS4105 674 ' $ An algorithm f˜ for a problem f is accurate if the relative error is O(M ), i.e. if for each x ∈ X, ˜ − f(x)k kf(x) = O(M ). kf(x)k • The goal of accuracy as defined here is often unattainable if the problem is ill-conditioned (very sensitive to small changes in the data). Roundoff will inevitably perturb the data. • A more useful and attainable criterion to aspire to is stability. We say that an algorithm f˜ for a problem f is stable if for each x ∈ X, ˜ − f(x ˜)k kf(x) = O(M ). (O.3) ˜)k kf(x ˜ ∈ X with for some x kx−˜ xk kxk = O(M ). • In words: a stable algorithm “ gives nearly the right answer to nearly the right question”. & % MS4105 675 ' $ • A stronger condition is satisfied by some (but not all) algorithms in numerical linear algebra. • We say that an algorithm f˜ for a problem f is backward stable if for each x ∈ X, ˜k kx − x ˜ ˜ ˜ f(x) = f(x) for some x ∈ X with = O(M ). kxk (O.4) • This is a considerable tightening of the definition of stability as the O(M ) in (O.3) has been replaced by zero. • In words: a backward stable algorithm “ gives exactly the right answer to nearly the right question”. & % MS4105 676 ' P $ Instability of Polynomial Root-Finding in Section 8.2 Pn Theorem P.1 If p is a polynomial p(x) = i=0 and r is one of the roots then, if we make a small change δak in the kth coefficient ak , the first-order change δr in the value of the jth root r is rk δr = − 0 δak . 
p (r) (P.1) Also the condition number of the problem (the ratio of the relative error in r to the relative error in ak ) is given by κ≡ & |δr| |r| |δak | |ak | |ak rk−1 | = |p 0 (r)| (P.2) % MS4105 677 ' $ Proof: The polynomial p depends on both the coefficients a and the argument x so we can write p(r, a) = 0 as r is a root. But p(r + δr, ak + δak ) is still zero (giving an implicit equation for δr. We can find an approximate vaue for δr using a (first order) Taylor series expansion: 0 = δp ≈ ∂p(r) ∂p(r) δr + δak ∂r ∂ak = p 0 (r)δr + rk δak . Solving for ∂r gives the first result and the second follows immediately. rk − p 0 (r) The factor in (P.1) can be large when |r| is large or when p 0 (r) is close to zero or both. A similar comment may be made about the condition number (P.2). & % MS4105 ' 678 $ If we are particularly unfortunate and the polynomial p has a double root (r, say), then the situation is even worse — the change in the root is of the order of the square root of the change in the coefficient ak . So even if the roundoff error in ak , δak = O(M ) (machine epsilon, typically ≈ 10−16 ), the resulting error in r, δr could be as large as 10−8 . We can state this as a theorem. Pn Theorem P.2 If p is a polynomial p(x) = i=0 and r is a double root then, if we make a small change δak in the kth coefficient ak , the second-order change δr in the value of the jth root r is p δr = O( |δak |). (P.3) & % MS4105 679 ' $ Proof: We still have an implicit equation for δr; p(r + δr, ak + δak ) = 0. We have p(r) = p 0 (r) = 0. We can find an approximate value for δr again but now we need a second- order Taylor series expansion: ∂p(r) ∂p(r) 1 ∂2 p(r) 2 ∂2 p(r) 0 = δp ≈ δr + δak + δr + δrδak ∂r ∂ak 2 ∂2 r ∂r∂ak 1 = rk δak + p 00 (r)δr2 + krk−1 δrδak . 2 (Note that there is no ak .) ∂2 p(r) ∂2 a k term as p is linear in the coefficients The second equation is a quadratic in δr and we can solve it for δr giving: q 2 −krk−1 δak ± (krk−1 δak ) − 2rk p 00 (r)δak δr = . (P.4) p 00 (r) & % MS4105 ' 680 $ Recalling that δak is a small << 1 modification in ak , we can see p that the dominant term in δr is O( |δak |) as claimed. & % MS4105 681 ' Q $ Solution to Problem 2 on Slide 279 The Answer Every second element of R is zero; R12 = R14 , · · · = 0, R23 = R25 =, · · · = 0 and Rij = 0 when i is even and j odd or vice versa. The Proof We have h Aik = u1 v1 u2 v2 ... un vn i ik = (QR)ik = Qij Rjk , j ≤ k. where each of the column vectors v1 , . . . , vn is orthogonal to all the column vectors u1 , . . . , un and A is m × 2n, still “tall and thin”, i.e. 2n ≤ m. By def, up = a2p−1 , the 2p − 1 column of A and vq = a2q , the 2q & % MS4105 682 ' $ column of A. So uip ≡ ai,2p−1 = Qij Rj,2p−1 , viq ≡ ai,2q = Qij Rj,2q , j ≤ 2p − 1 j ≤ 2q. and so up = Rj,2p−1 qj , vq = Rj,2q qj , j ≤ 2p − 1 j ≤ 2q. We know that u∗p vq = 0 for all p = 1, . . . , n and q = 1, . . . , n so: 0 = u∗p vq = Rj,2p−1 Ri,2q q∗j qi = Ri,2p−1 Ri,2q j ≤ 2p − 1, i ≤ 2q i ≤ 2p − 1, i ≤ 2q (Q.1) where the vectors qj are the jth columns of the unitary matrix Q. & % MS4105 ' 683 $ RTP that Ri,2p−1 = 0 for all even i and Ri,2q = 0 for all odd i. Prove this by induction on p and q. Note that we can take Rkk 6= 0 for k = 1, . . . , r where r is the rank of R, in this case r = 2n. In other works we assume that R is full rank. [Base case] Either of p or q equal to one. [p = 1] So i ≤ 2p − 1 = 1. Then (Q.1) gives R11 R1,2q = 0 so R1,2q = 0 for all q = 1, . . . , n. [q = 1] We have 2q = 2 so i ≤ 2 and so R1,2p−1 R12 + R2,2p−1 R22 = 0. 
The first term is zero as R1,2q = 0 for all q so as the diagonal terms are assumed non-zero we must have R2,2p−1 = 0 for all p. & % MS4105 ' 684 $ [Induction Step] Assume that Ri,2p−1 = 0 for all even i ≤ 2k − 1 and Ri,2q = 0 for all odd i ≤ 2k and RTP that R2k+2,2p−1 = 0 and R2k+1,2q = 0. [Let p = k + 1] Then i ≤ 2k + 1 and also i ≤ 2q. So (Q.1) gives 0 = R1,2k+1 R1,2q + R2,2k+1 R2,2q + · · · + R2k,2k+1 R2k,2q + R2k+1,2k+1 R2k+1,2q . But the first and second factor in each term are alternately zero by the inductive hypothesis (R1,2q = 0, R2,2k+1 = 0, . . . , R2k,2k+1 ). So we conclude that the last term must be zero and so R2k+1,2q = 0. & % MS4105 685 ' $ [Let q = k + 1] Then 2q = 2k + 2 and i ≤ 2k + 2. So (Q.1) gives 0 = R1,2p−1 R1,2k+2 + R2,2p−1 R2,2k+2 + · · · + R2k+1,2p−1 R2k+1,2k+2 + R2k+2,2p−1 R2k+2,2k+2 . Again, by the inductive hypothesis, the first and second factors in each term are alternately zero: R1,2k+2 = 0, R2,2p−1 = 0, . . . , R2k+1,2k+2 =) so the last term must be zero; R2k+2,2p−1 = 0 as required. By the Principle of Induction, the result follows. & % MS4105 ' R 686 $ Convergence of Fourier Series (Back to Slide 136.) If f(t) is a periodic function with period 2π that is continuous on the interval (−π, π) except at a finite number of points — and if the one-sided limits exist at each point of discontinuity as well as at the end points −π and π, then the Fourier series F(t) converges to f(t) at each t in (−π, π) where f is continuous. If f is discontinuous at t0 but possesses left-hand and right-hand derivatives at t0 , then + F(t0 ) converges to the average value F(t0 ) = 12 (f(t− ) + f(t 0 0 )) where + f(t− ) and f(t 0 0 ) are the left and right limits at t0 respectively. & % MS4105 ' S 687 $ Example of Instability of Classical GS (Back to Slide 273.) Now apply the CGS algorithm for j = 1, 2, 3. Initialise vi = ai , i = 1, 2, 3. √ [j = 1] r11 ≡ fl(kv1 k) = fl( 1 + 2.10−6 ) = 1. So q1 = v1 . √ ∗ [j = 2] r12 ≡ fl(q1 a2 ) = fl( 1 + 10−6 ) = 1 So 0 −3 v2 ← v2 − 1q1 = − 0 and r22 ≡ fl(kv2 k) = 10 . 10−3 0 . Normalise: q2 = fl(v2 / fl(kv2 k)) = 0 −1 & % MS4105 688 ' $ [j = 3] r13 ≡ fl(q∗1 v3 ) = 1 and r23 ≡ fl(q∗2 v3 ) = −10−3 . So reading the for loop as a running sum, 1 0 0 1 −3 −3 −3 v3 ← 0 − 1 10 + 10 0 = −10 and 10−3 −1 −10−3 10−3 √ r33 = fl(kv3 k) = fl( 2.10−3 ). Normalise: √ 0 q3 ← fl(v3 / fl(kv3 k)) = fl(v3 / fl( 2.10 )) = −0.709 . −0.709 0 0 1 −3 So q1 = 10 , q2 = 0 and q3 = −0.709 . −1 −0.709 10−3 & −3 % MS4105 689 ' T $ Example of Stability of Modified GS (Back to Slide 291.) • Initialise vi = ai , i = 1, 2, 3. √ [j = 1] r11 ≡ fl(kv1 k) = fl( 1 + 2.10−6 ) = 1. So 1 q1 = v1 = 0 . 10−3 [j = 2] r12 ≡ fl(q∗1 v2 ) = 1. So 1 1 0 −3 −3 v2 = v2 − r12 q1 = 10 − 1. 10 = 0 . 0 10−3 −10−3 Normalising, r22 ≡ fl(kv2 k) = 10−3 and & % MS4105 690 ' $ 0 q2 = fl(v2 /r22 ) = 0 . −1 0 −3 [j = 3] r13 ≡ = 1. So v3 = v3 − r13 q1 = −10 . 0 Now r23 ≡ q∗2 v3 = 0 so v3 unchanged. Normalising, 0 −3 . r33 ≡ fl(kv3 k) = 10 so q3 = v3 /r33 = −1 0 fl(q∗1 v3 ) & % MS4105 691 ' U $ Proof of Unit Roundoff Formula Assume that x > 0. Then writing the real number x in the form x = µ × βe−t , βt−1 ≤ µ ≤ βt , (U.1) obviously x lies between the immediately adjacent f.p. numbers y1 = bµcβe−t and y2 = dµeβe−t (strictly speaking, if dµe = βt then y2 = (dµe/β) βe−t+1 ). So fl x = y1 or y2 and I have | fl(x) − x| ≤ |y1 − y2 |/2 ≤ βe−t /2. So 1 e−t fl(x) − x 1 1−t ≤ 2β ≤ β = u. e−t x µ×β 2 (Back to Slide 337.) 
& (U.2) (U.3) % MS4105 ' V 692 $ Example to Illustrate the Stability of the Householder QR Algorithm (Back to Slide 334.) 1 2 −3 −3 [k = 1] x = 10 and fl(kxk) = 1, v1 = 10 and 10−3 10−3 fl(v∗1 v1 ) = 4.0 . & % MS4105 693 ' $ Now update A: A ← A ← & 2.0 i h −3 −3 A − (2/4.0) 1.0 10−3 1.0 10 2.0 1.0 10 1.0 10−3 1 1 1 1.0 10−3 1.0 10−3 0 1.0 10−3 0 1.0 10−3 1 1 1 1.0 10−3 1.0 10−3 0 1.0 10−3 0 1.0 10−3 2 2 2 −3 −3 −3 − 1.0 10 1.0 10 1.0 10 . 1.0 10−3 1.0 10−3 1.0 10−3 % MS4105 694 ' $ So A & −1 ← 0 0 −1 0 −1.0 10−3 −1 −3 . −1.0 10 0 % MS4105 695 ' $ 0 and fl(kxk) = 1.0 10−3 . [k = 2] Now x = A(2 : 3, 2) = −1.0 10−3 1.0 10−3 (taking sign(x1 ) = 1). • So v2 = −1.0 10−3 • Also fl(v∗2 v2 ) = 2.0 10−3 . & % MS4105 696 ' $ • Now update the lower right 2 × 2 block of A: A(2 : 3, 2 : 3) = A(2 : 3, 2 : 3) −3 h i 1.0 10 1.0 10−3 −1.0 10−3 − 2/(2.0 10−3 ) −1.0 10−3 0 −1.0 10−3 −1.0 10−3 0 −1.0 10−3 0 = 0 −1.0 10−3 (all in three digit f.p. arithmetic) & % MS4105 697 ' $ • So now (after two iterations), A is updated to: −1 −1 −1 0 −1.0 10−3 0 0 0 −1.0 10−3 In fact, a slightly improved version of the algorithm sets the subdiagonals to zero and calculates the diagonal terms using ±kxk at each iteration but I’ll ignore this complication here. & % MS4105 ' 698 $ [k = 3] The matrix A is now upper triangular (in three digit f.p. arithmetic) but • as noted on Slide 328 – I need to either do one more iteration (a little more work but I’ll take this option) – or I could just define v3 = 1 (easier). • I have x = A(3, 3) = −1.0 10−3 and fl(kxk) = 1.0 10−3 . • So v3 = −2.0 10−3 and fl(v∗3 v3 ) = 4.0 10−6 . • Finally; A(3, 3) ← A(3, 3) − 2 −3 −3 −3 −3 (−2.0 10 )(−2.0 10 )(−1.0 10 ) = 1.0 10 . −6 4.0 10 • So the final version of A (the upper triangular matrix R) is −1 −1 −1 −3 R = 0 −1.0 10 0 0 0 1.0 10−3 & % MS4105 ' 699 $ • The vectors v1 , v2 , v3 can be stored as the lower triangular 2.0 0 0 −3 −3 . part of V = 1.0 10 1.0 10 0 1.0 10−3 −1.0 10−3 −2.0 10−3 & %