MA385 (Numerical Analysis I)
Niall Madden
October 28, 2014

Contents

0 MA385: Preliminaries
  0.1 Introduction
  0.2 Taylor's Theorem

1 Solving nonlinear equations
  1.1 Bisection
  1.2 The Secant Method
  1.3 Newton's Method
  1.4 Fixed Point Iteration
  1.5 Matlab 1: the bisection and secant methods
  1.6 Nonlinear equations: Exercises
  1.7 Appendix

2 Initial Value Problems
  2.1 Introduction
  2.2 One step methods
  2.3 Error Analysis
  2.4 Runge-Kutta 2 (RK2)
  2.5 Higher-order methods
  2.6 From IVPs to Linear Systems
  2.7 Matlab 2: Euler's Method
  2.8 Matlab 3: RK methods
  2.9 IVPs: Exercises
  2.10 Appendix

3 Solving Linear Systems
  3.1 Introduction
  3.2 Gaussian Elimination
  3.3 LU-factorisation
  3.4 Putting it all together
  3.5 Matlab 3: LU-factorisations
  3.6 Norms, Condition Numbers, and Eigenvalues
  3.7 Matrix Norms
  3.8 Computing ‖A‖₂
  3.9 Condition Numbers
  3.10 Gerschgorin's theorems
  3.11 Exercises
Chapter 0
MA385: Preliminaries

0.1 Introduction

0.1.1 Welcome to MA385/MA530

This is a Semester 1, two-part, upper level module on numerical analysis. It is often taken in conjunction with MA378/MA531. You may be taking this course if you are enrolled in 3rd year Mathematics, 3rd year Mathematical Science, 4th year Financial Mathematics, 4th year Applied Mathematics, the H.Dip. in Applied Science (Maths), or year one of the two-year MA program. The basic information for the course is as follows:

Lecturer: Dr Niall Madden, School of Maths. My office is in room ADB-1013, Arás de Brún. Email: [email protected]

Lectures: Tuesday and Thursday, 2.00–2.50. See website for venues.

Tutorial/Lab: TBA (to begin during Week 3).

Web site: The on-line content for the course will be hosted at NUIGalway.BlackBoard.com and on the Mathematics webserver at http://www.maths.nuigalway.ie/MA385. There you'll find various pieces of information, including these notes, problem sets, announcements, etc. If you are registered for MA385, you should be automatically enrolled onto the Blackboard site. If you are enrolled in MA530, please send an email to me.

Assessment:
• Three assignments that combine written problem sets with lab work and participation. Each is worth 10%.
• A 2-hour exam in December, worth 70%.

The main text-book is Süli and Mayers, An Introduction to Numerical Analysis, Cambridge University Press [1]. This is available from the library at 519.4 MAY, and there are copies in the book shop. It is very well suited to this course: though it does not over-complicate the material, it approaches topics with a reasonable amount of rigour. There is a good selection of interesting problems. The scope of the book is almost perfect for the course, especially for those students taking both semesters. You should buy this book.

Other useful books include:

• G.W. Stewart, Afternotes on Numerical Analysis, SIAM [3]. In the library at 519.4 STE. Moreover, the full text is freely available online to NUI Galway users! This book is very easy to read, and students who would enjoy a bit more discussion at an easier pace might turn to it.

• Cleve Moler, Numerical Computing with MATLAB [2]. The emphasis is on the implementation of algorithms in Matlab, but the techniques are well explained and there are some nice exercises. Also, it is freely available online.

• James F. Epperson, An Introduction to Numerical Methods and Analysis [5]. There are five copies in the library at 519.4.

• Michelle Schatzman, Numerical Analysis [8].

• Stoer and Bulirsch, Introduction to Numerical Analysis [6]. A very complete reference for this course.

• Quarteroni et al., Numerical Mathematics [7]. Also very thorough, and with sample Matlab code.

These notes are a synopsis of the course material. My aim is to provide these in 4 or 5 sections, and always in advance of the class. They contain most of the main remarks, statements of theorems, results and exercises. However, they will not contain proofs of theorems, examples, solutions to exercises, etc. Please let me know of the typos and mistakes that you spot.

You should try to bring these notes to class. It will make following the lecture easier, and you'll know what notes to take down.

0.1.2 What is Numerical Analysis?
Numerical analysis is the design, analysis and implementation of numerical methods that yield exact or approximate solutions to mathematical problems. It does not involve long, tedious calculations: we won't (usually) implement Newton's Method by hand, or manually do the arithmetic of Gaussian Elimination, etc.

The Design of a numerical method is perhaps the most interesting aspect; it is often about finding a clever way of swapping the problem for one that is easier to solve, but has the same or a similar solution. If the two problems have the same solution, then the method is exact. If they are similar, then it is approximate.

The Analysis is the mathematical part; it usually culminates in proving a theorem that tells us (at least) one of the following:

• the method will work: that our algorithm will yield the solution we are looking for;
• how much effort is required;
• if the method is approximate, how close the approximate solution is to the true one.

A description of this aspect of the course, to quote Epperson [5], is being "rigorously imprecise or approximately precise".

We'll look at the implementation of the methods in labs.

Mathematical Preliminaries

Anyone who can remember their first and second years of analysis and algebra should be able to handle this course. Students who know a little about differential equations (initial value and boundary value) will find certain sections (particularly in Semester II) somewhat easier than those who don't. If it's been a while since you covered basic calculus, you will find it very helpful to revise the following: the Intermediate Value Theorem; Rolle's Theorem; the Mean Value Theorem; Taylor's Theorem; and the triangle inequality: |a + b| ≤ |a| + |b|. You'll find them in any good text book, e.g., Appendix 1 of Süli and Mayers.

You'll also find it helpful to recall some basic linear algebra, particularly relating to eigenvalues and eigenvectors. Consider the statement: "all the eigenvalues of a real symmetric matrix are real". If you are unsure of the meaning of any of the terms used, or if you didn't know that it's true, you should have a look at a book on Linear Algebra.

0.1.3 Topics

1. We'll start with a review of Taylor's theorem. It is central to the algorithms of the following sections.
2. Root-finding and solving non-linear equations.
3. Initial value ordinary differential equations.
4. Matrix Algorithms I: systems of linear equations.
5. Matrix Algorithms II: eigenvalues and eigenvectors.

We'll also see how these methods can be applied to real-world problems, including Financial Mathematics.

Learning outcomes

When you have successfully completed this course, you will be able to demonstrate your factual knowledge of the core topics (root-finding, solving ODEs, solving linear systems, estimating eigenvalues), using appropriate mathematical syntax and terminology. Moreover, you will be able to describe the fundamental principles of the concepts (e.g., Taylor's Theorem) underpinning Numerical Analysis. Then, you will apply these principles to design algorithms for solving mathematical problems, and discover the properties of these algorithms. Students will gain the ability to use Matlab to implement these algorithms, and to adapt the codes for more general problems and for new techniques.

Why take this course?

Many industry and academic environments require graduates who can solve real-world problems using a mathematical model, but these models can often only be resolved using numerical methods.
To quote one Financial Engineer: "We prefer approximate (numerical) solutions to exact models rather than exact solutions to simplified models". Another expert, who leads a group in fund management with DB London, when asked "what sort of graduates would you hire", gave a list of specific skills that included:

• a programming language, and a 4th-generation language such as Matlab (or S-PLUS);
• Numerical Analysis.

Recent graduates of our Financial Mathematics, Computing and Mathematics degrees often report to us that they were hired because they had some numerical analysis background, or were required to go and learn some before they could do proper work. This is particularly true in the financial sector, games development, and the mathematical civil services (e.g., MET office, CSO).

Bibliography

[1] E. Süli and D. Mayers, An Introduction to Numerical Analysis, Cambridge University Press, 2003. 519.4 MAY.
[2] Cleve Moler, Numerical Computing with MATLAB, SIAM. Also available free from http://www.mathworks.com/moler
[3] G.W. Stewart, Afternotes on Numerical Analysis, SIAM, 1996. 519.4 STE.
[4] G.W. Stewart, Afternotes goes to Graduate School, SIAM, 1998. 519.4 STE.
[5] James F. Epperson, An Introduction to Numerical Methods and Analysis. 519.4 EPP.
[6] Stoer and Bulirsch, Introduction to Numerical Analysis, Springer.
[7] Quarteroni, Sacco and Saleri, Numerical Mathematics, Springer.
[8] Michelle Schatzman, Numerical Analysis: a mathematical introduction. 515 SCH.

0.2 Taylor's Theorem

Taylor's theorem is perhaps the most important mathematical tool in Numerical Analysis. Provided we can evaluate the derivatives of a given function at some point, it gives us a way of approximating the function by a polynomial. Working with polynomials, particularly ones of degree 3 or less, is much easier than working with arbitrary functions. For example, polynomials are easy to differentiate and integrate. Most importantly for the next section of this course, their zeros are easy to find.

Our study of Taylor's theorem starts with one of the first theoretical results you learned in university mathematics: the mean value theorem.

Theorem 0.2.1 (Mean Value Theorem). If f is a function that is continuous and differentiable for all a ≤ x ≤ b, then there is a point c ∈ [a, b] such that

  (f(b) − f(a))/(b − a) = f′(c).

This is just a consequence of Rolle's Theorem, and has a few different interpretations. One is that the slope of the line that intersects f at the points a and b is equal to the slope of the tangent to f at some point between a and b.

There are many important consequences of the MVT, some of which we'll return to later. Right now, we are interested in the fact that the MVT tells us that we can approximate the value of a function by a near-by value, with accuracy that depends on f′. Or we can think of it as approximating f by a line.

But what if we wanted a better approximation? We could replace our function with, say, a quadratic polynomial.
Let p_2(x) = b_0 + b_1(x − a) + b_2(x − a)² and solve for the coefficients b_0, b_1 and b_2 so that

  p_2(a) = f(a),  p_2′(a) = f′(a),  p_2″(a) = f″(a).

This gives

  p_2(x) = f(a) + f′(a)(x − a) + (x − a)² f″(a)/2.

Next, we try to construct an approximating cubic of the form

  p_3(x) = b_0 + b_1(x − a) + b_2(x − a)² + b_3(x − a)³ = Σ_{k=0}^{3} b_k (x − a)^k,

so that p_3(a) = f(a), p_3′(a) = f′(a), p_3″(a) = f″(a), and p_3‴(a) = f‴(a). Note: we can write this more succinctly, using the mathematical short-hand

  p_3^{(k)}(a) = f^{(k)}(a)  for k = 0, 1, 2, 3.

Again we find that

  b_k = f^{(k)}(a)/k!  for k = 0, 1, 2, 3.

As you can probably guess, this formula is easily extended for arbitrary k, giving us the Taylor Polynomial.

Definition 0.2.2 (Taylor Polynomial). The Taylor Polynomial of degree k (also called the Truncated Taylor Series¹) that approximates the function f about the point x = a is

  p_k(x) = f(a) + (x − a)f′(a) + ((x − a)²/2) f″(a) + ((x − a)³/3!) f‴(a) + ... + ((x − a)^k/k!) f^{(k)}(a).

¹ Brook Taylor, 1685–1731, England. He (re)discovered this polynomial approximation in 1712, though its importance was not realised for another 50 years.

We'll return to this topic later, with a particular emphasis on quantifying the "error" in the Taylor Polynomial.

Example 0.2.3. Write down the Taylor polynomial of degree k that approximates f(x) = e^x about the point x = 0.

Fig. 0.1: Taylor polynomials for f(x) = e^x about x = 0

As Figure 0.1 suggests, in this case p_3(x) is a more accurate estimate of e^x than p_2(x), which in turn is more accurate than p_1(x). This is demonstrated in Figure 0.2, which shows the difference between f and p_k.

Fig. 0.2: Error in Taylor polynomials for f(x) = e^x about x = 0

0.2.1 The Remainder

We now want to examine the accuracy of the Taylor polynomial as an approximation. In particular, we would like to find a formula for the remainder or error:

  R_k(x) := f(x) − p_k(x).

With a little bit of effort one can prove that

  R_k(x) := ((x − a)^{k+1}/(k + 1)!) f^{(k+1)}(σ),  for some σ ∈ [x, a].

We won't prove this in class, since it is quite standard and features in other courses you have taken. But for the sake of completeness, a proof is included below in Section 0.2.4.

Example 0.2.4. With f(x) = e^x and a = 0, we get that

  R_k(x) = (x^{k+1}/(k + 1)!) e^σ,  for some σ ∈ [0, x].

Example 0.2.5. How many terms are required in the Taylor Polynomial for e^x about x = 0 to ensure that the error at x = 1 is
• no more than 10⁻¹?
• no more than 10⁻²?
• no more than 10⁻⁶?
• no more than 10⁻¹⁰?
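As a concrete (if unofficial) illustration of Example 0.2.5, the following short Matlab sketch, my own and not part of the original notes, adds terms x^j/j! until the error at x = 1 falls below a tolerance:

% Sketch: count the terms needed so that |e^1 - p_k(1)| <= tol.
x = 1;  tol = 1e-10;
term = 1;  pk = 1;  k = 0;       % p_0(x) = 1; the j-th term is x^j/j!
while abs(exp(x) - pk) > tol
    k = k + 1;
    term = term*x/k;             % update x^k/k! from x^(k-1)/(k-1)!
    pk = pk + term;
end
fprintf('k = %d terms give error %.2e\n', k, abs(exp(x) - pk))

Since R_k(1) ≤ e/(k + 1)! here, the count it reports can be checked against the remainder formula above.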
There are other ways of representing the remainder, including the Integral Representation of the Remainder:

  R_n(x) = ∫_a^x (f^{(n+1)}(t)/n!) (x − t)^n dt.   (0.2.1)

Other issues related to Taylor polynomials, and that we will touch on later in the course, include constructing the Taylor series for a function of two variables: f = f(x, y).

0.2.2 Two applications of Taylor's Theorem

The reasons for emphasising Taylor's theorem so early in this course are that
• it introduces us to the concepts of approximation and error estimation, but in a very simple setting;
• it is the basis for deriving methods for solving both nonlinear equations and initial value ordinary differential equations.

With the last point in mind, we'll now outline how to derive Newton's method for nonlinear equations. (This is just a taster: we'll return to this topic in the next section.)

0.2.3 Exercises

Exercise 0.2.1. Prove the Integral Mean Value Theorem: there exists a point c ∈ [a, b] such that

  f(c) = (1/(b − a)) ∫_a^b f(x) dx.

Exercise 0.2.2. Write down the formula for the Taylor Polynomial for
(i) f(x) = √(1 + x) about the point x = 0,
(ii) f(x) = sin(x) about the point x = 0,
(iii) f(x) = log(x) about the point x = 1.

Exercise 0.2.3. The Fundamental Theorem of Calculus tells us that ∫_a^x f′(t) dt = f(x) − f(a). This can be rearranged to get f(x) = f(a) + ∫_a^x f′(t) dt. Use this and integration by parts to deduce (0.2.1) for the case n = 1. (Hint: check Wikipedia!)

0.2.4 A proof of Taylor's Theorem

Here is a proof of Taylor's theorem. It wasn't covered in class. One of the ingredients needed is the Generalised Mean Value Theorem: if the functions F and G are continuous and differentiable, etc., then, for some point c ∈ [a, b],

  (F(b) − F(a))/(G(b) − G(a)) = F′(c)/G′(c).   (0.2.2)

Theorem 0.2.6 (Taylor's Theorem). Suppose we have a function f that is sufficiently differentiable on the interval [a, x], and a Taylor polynomial for f about the point x = a,

  p_n(x) = f(a) + (x − a)f′(a) + ((x − a)²/2) f″(a) + ... + ((x − a)^n/n!) f^{(n)}(a).   (0.2.3)

If the remainder is written as R_n(x) := f(x) − p_n(x), then

  R_n(x) := ((x − a)^{n+1}/(n + 1)!) f^{(n+1)}(σ),   (0.2.4)

for some point σ ∈ [x, a].

Proof. We want to prove that, for any n = 0, 1, 2, ..., there is a point σ ∈ [a, x] such that f(x) = p_n(x) + R_n(x). If x = a then this is clearly the case, because f(a) = p_n(a) and R_n(a) = 0.

For the case x ≠ a, we will use a proof by induction. The Mean Value Theorem tells us that there is some point σ ∈ [a, x] such that

  (f(x) − f(a))/(x − a) = f′(σ).

Using that p_0(a) = f(a) and that R_0(x) = (x − a)f′(σ), we can rearrange to get f(x) = p_0(a) + R_0(x), as required.

Now we assume that (0.2.3)–(0.2.4) are true for the case n = k − 1, and use this to show that they are true for n = k. From the Generalised Mean Value Theorem (0.2.2), there is some point c such that

  R_k(x)/(x − a)^{k+1} = (R_k(x) − R_k(a))/((x − a)^{k+1} − (a − a)^{k+1}) = R_k′(c)/((k + 1)(c − a)^k),

where we have used that R_k(a) = 0. Rearranging, we see that we can write R_k in terms of its own derivative:

  R_k(x) = R_k′(c) (x − a)^{k+1}/((k + 1)(c − a)^k).   (0.2.5)

So now we need an expression for R_k′. This is found by noting that it also happens to be the remainder for the Taylor polynomial of degree k − 1 for the function f′:

  R_k′(x) = f′(x) − d/dx [ f(a) + (x − a)f′(a) + ((x − a)²/2!) f″(a) + ... + ((x − a)^k/k!) f^{(k)}(a) ]
          = f′(x) − [ f′(a) + (x − a)f″(a) + ... + ((x − a)^{k−1}/(k − 1)!) f^{(k)}(a) ].

But the expression in brackets on the last line is the formula for the Taylor Polynomial of degree k − 1 for f′. By our inductive hypothesis,

  R_k′(c) = ((c − a)^k/k!) f^{(k+1)}(σ),

for some σ. Substitute this into (0.2.5) above and we are done.

Chapter 1
Solving nonlinear equations

1.1 Bisection

1.1.1 Introduction

Linear equations are of the form "find x such that ax + b = 0" and are easy to solve. Some nonlinear problems are also easy to solve, e.g., "find x such that ax² + bx + c = 0". Cubic and quartic equations also have solutions for which we can obtain a formula. But in the majority of cases there is no formula, so we need a numerical method.

References
• Süli and Mayers [1, Chapter 1]. We'll follow this pretty closely in lectures.
• Stewart (Afternotes ...) [3, Lectures 1–5]. A well-presented introduction, with lots of diagrams to give an intuitive introduction.
• Moler (Numerical Computing with MATLAB) [2, Chap. 4]. Gives a brief introduction to the methods we study, and a description of MATLAB functions for solving these problems.
• The proof of the convergence of Newton's Method is based on the presentation in [5, Thm 3.2].

Our generic problem is:

  Let f be a continuous function on the interval [a, b]. Find τ ∈ [a, b] such that f(τ) = 0.

Then τ is the solution to f(x) = 0. This leads to two natural questions:
1. How do we know there is a solution?
2. How do we find it?

The following gives sufficient conditions for the existence of a solution:

Proposition 1.1.1. Let f be a real-valued function that is defined and continuous on a bounded closed interval [a, b] ⊂ R. Suppose that f(a)f(b) ≤ 0. Then there exists τ ∈ [a, b] such that f(τ) = 0.

OK, so now we know there is a solution τ to f(x) = 0, but how do we actually solve it? Usually we don't!
Instead we construct a sequence of estimates {x_0, x_1, x_2, x_3, ...} that converges to the true solution. So now we have to answer these questions:

(1) How can we construct the sequence x_0, x_1, ...?
(2) How do we show that lim_{k→∞} x_k = τ?

There are some subtleties here, particularly with part (2). What we would like to say is that at each step the error is getting smaller. That is,

  |τ − x_k| < |τ − x_{k−1}|  for k = 1, 2, 3, ....

But we can't. Usually all we can say is that a bound on the error is getting smaller. That is: if ε_k is a bound on the error at step k,

  |τ − x_k| < ε_k,

then ε_{k+1} < µε_k for some number µ ∈ (0, 1). It is easiest to explain this in terms of an example, so we'll study the simplest method: Bisection.

1.1.2 Bisection

The most elementary algorithm is the "Bisection Method" (also known as "Interval Bisection"). Suppose that we know that f(x) = 0 has a solution, τ, in the interval [a, b]. We divide the interval in two, and check which of these subintervals contains τ (by using the fact that f will have different signs at the end points of that subinterval). Now bisect that subinterval in two and repeat the process. Stop when f is sufficiently small at the end points of the interval. This may be expressed more precisely using some pseudocode.

Method 1.1.2 (Bisection).
  Set eps to be the stopping criterion.
  If |f(a)| ≤ eps, return a. Exit.
  If |f(b)| ≤ eps, return b. Exit.
  Set x_0 = a and x_1 = b.
  Set x_L = x_0 and x_R = x_1.
  Set x_2 = (x_L + x_R)/2. Set k = 2.
  while |f(x_k)| > eps
    if f(x_L) f(x_k) < 0
      x_R = x_k
    else
      x_L = x_k
    end if
    x_{k+1} = (x_L + x_R)/2
    k = k + 1
  end while

Example 1.1.3. Find an estimate for √2 that is correct to 6 decimal places.

Solution: Try to solve the equation f(x) := x² − 2 = 0. Then proceed as shown in Figure 1.1 and Table 1.1.

Fig. 1.1: Solving x² − 2 = 0 with the Bisection Method

Table 1.1: Solving x² − 2 = 0 with the Bisection Method

   k |      x_k | |x_k − τ| | |x_k − x_{k−1}|
   0 | 0.000000 | 1.41     |
   1 | 2.000000 | 5.86e-01 |
   2 | 1.000000 | 4.14e-01 | 1.00
   3 | 1.500000 | 8.58e-02 | 5.00e-01
   4 | 1.250000 | 1.64e-01 | 2.50e-01
   5 | 1.375000 | 3.92e-02 | 1.25e-01
   6 | 1.437500 | 2.33e-02 | 6.25e-02
   7 | 1.406250 | 7.96e-03 | 3.12e-02
   8 | 1.421875 | 7.66e-03 | 1.56e-02
   9 | 1.414062 | 1.51e-04 | 7.81e-03
  10 | 1.417969 | 3.76e-03 | 3.91e-03
   ⋮ |     ⋮    |     ⋮    |     ⋮
  22 | 1.414214 | 5.72e-07 | 9.54e-07

Note that at steps 4 and 10 in Table 1.1 the error actually increases, although the bound on the error is decreasing.

1.1.3 The bisection method works

The main advantages of the Bisection method are:
• It will always work.
• After k steps we know that

Proposition 1.1.4.  |τ − x_k| ≤ (1/2)^{k−1} |b − a|,  for k = 2, 3, 4, ....

A disadvantage of bisection is that it is not as efficient as some other methods we'll investigate later.
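As a worked aside (the arithmetic here is mine, not from the notes), Proposition 1.1.4 tells us how many steps are enough for a given accuracy ε: we need (1/2)^{k−1}|b − a| ≤ ε, i.e.,

  k ≥ 1 + log₂(|b − a|/ε).

For example, with |b − a| = 2 and ε = 10⁻⁶ this gives k ≥ 1 + log₂(2 × 10⁶) ≈ 21.9, so 22 steps suffice, which is consistent with the k = 22 row of Table 1.1. (Compare Exercise 1.6.2 and Q1 of the Matlab lab.)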
1.1.4 Improving upon bisection

The bisection method is not very efficient. Our next goals will be to derive better methods, particularly the Secant Method and Newton's method. We also have to come up with some way of expressing what we mean by "better"; and we'll have to use Taylor's theorem in our analyses.

1.2 The Secant Method

1.2.1 Motivation

Looking back at Table 1.1, we notice that at step 4 the error increases rather than decreases. You could argue that this is because we didn't take into account how close x_3 is to the true solution. We could improve upon the bisection method as described below. The idea is: given x_{k−1} and x_k, take x_{k+1} to be the zero of the line that intersects the points (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). See Figure 1.2.

Fig. 1.2: The Secant Method for Example 1.2.2

Method 1.2.1 (Secant¹). Choose x_0 and x_1 so that there is a solution in [x_0, x_1]. Then define

  x_{k+1} = x_k − f(x_k) (x_k − x_{k−1})/(f(x_k) − f(x_{k−1})).   (1.2.1)

¹ The name comes from the name of the line that intersects a curve at two points. There is a related method called "false position" which was known in India in the 3rd century BC, and in China in the 2nd century BC.

Example 1.2.2. Use the Secant Method to solve the nonlinear problem x² − 2 = 0 in [0, 2]. The results are shown in Table 1.2.

Table 1.2: Solving x² − 2 = 0 using the Secant Method

  k |      x_k | |x_k − τ|
  0 | 0.000000 | 1.41e+00
  1 | 2.000000 | 5.86e-01
  2 | 1.000000 | 4.14e-01
  3 | 1.333333 | 8.09e-02
  4 | 1.428571 | 1.44e-02
  5 | 1.413793 | 4.20e-04
  6 | 1.414211 | 2.12e-06
  7 | 1.414214 | 3.16e-10
  8 | 1.414214 | 4.44e-16

By comparing Tables 1.1 and 1.2, we see that for this example the Secant method is much more efficient than Bisection. We'll return to why this is later.

The Method of Bisection could be written as the weighted average x_{k+1} = (1 − σ_k)x_k + σ_k x_{k−1}, with σ_k = 1/2. We can also think of the Secant method as being a weighted average, but with σ_k chosen to obtain faster convergence to the true solution. Looking at Figure 1.2 above, you could say that we should choose σ_k depending on which is smaller: f(x_{k−1}) or f(x_k). If (for example) |f(x_{k−1})| < |f(x_k)|, then probably |τ − x_{k−1}| < |τ − x_k|. This gives another formulation of the Secant Method:

  x_{k+1} = x_k(1 − σ_k) + x_{k−1}σ_k,  where  σ_k = f(x_k)/(f(x_k) − f(x_{k−1})).

When it is written in this form, it is sometimes called a relaxation method.
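To make formula (1.2.1) concrete, here is a minimal Matlab sketch (my own illustration, not the official lab code, which comes in §1.5) that reproduces the iterates of Example 1.2.2:

% Sketch: the secant iteration (1.2.1) for f(x) = x^2 - 2 on [0, 2].
f = @(x)(x.^2 - 2);
x = [0, 2];                      % x(1) = x_0 = 0, x(2) = x_1 = 2
for k = 2:8                      % x(k+1) holds the iterate x_k
    x(k+1) = x(k) - f(x(k))*(x(k) - x(k-1))/(f(x(k)) - f(x(k-1)));
    fprintf('%2d  %9.6f  %.2e\n', k, x(k+1), abs(x(k+1) - sqrt(2)));
end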
1.2.2 Order of Convergence

To compare different methods, we need the following concept:

Definition 1.2.3 (Linear Convergence). Suppose that τ = lim_{k→∞} x_k. Then we say that the sequence {x_k}_{k=0}^∞ converges to τ at least linearly if there is a sequence of positive numbers {ε_k}_{k=0}^∞, and µ ∈ (0, 1), such that

  lim_{k→∞} ε_k = 0,   (1.2.3a)
  |τ − x_k| ≤ ε_k  for k = 0, 1, 2, ...,   (1.2.3b)
  lim_{k→∞} ε_{k+1}/ε_k = µ.   (1.2.3c)

So, for example, the bisection method converges at least linearly.

The reason for the expression "at least" is that we usually can only show that a set of upper bounds for the errors converges linearly. If (1.2.3b) can be strengthened to the equality |τ − x_k| = ε_k, then {x_k}_{k=0}^∞ converges linearly (not just "at least" linearly).

As we have seen, there are methods that converge more quickly than bisection. We state this more precisely:

Definition 1.2.4 (Order of Convergence). Let τ = lim_{k→∞} x_k. Suppose there exist µ > 0 and a sequence of positive numbers {ε_k}_{k=0}^∞ such that (1.2.3a) and (1.2.3b) both hold. Then we say that the sequence {x_k}_{k=0}^∞ converges with at least order q if

  lim_{k→∞} ε_{k+1}/(ε_k)^q = µ.

Two particular values of q are important to us:
(i) If q = 1, and we further have that 0 < µ < 1, then the rate is linear.
(ii) If q = 2, the rate is quadratic for any µ > 0.

1.2.3 Analysis of the Secant Method

Our next goal is to prove that the Secant Method converges. We'll be a little lazy, and only prove a suboptimal linear convergence rate. Then, in our Matlab class, we'll investigate exactly how rapidly it really converges. One simple mathematical tool that we use is the Mean Value Theorem (Theorem 0.2.1). See also [1, p420].

Proposition 1.2.5. Suppose that f and f′ are real-valued functions, continuous and defined on an interval I = [τ − h, τ + h] for some h > 0. If f(τ) = 0 and f′(τ) ≠ 0, then the sequence (1.2.1) converges at least linearly to τ.

Before we prove this, we note the following:
• We wish to show that |τ − x_{k+1}| < |τ − x_k|.
• From Theorem 0.2.1, there is a point w_k ∈ [x_{k−1}, x_k] such that

  (f(x_k) − f(x_{k−1}))/(x_k − x_{k−1}) = f′(w_k).   (1.2.4)

• Also by the MVT, there is a point z_k ∈ [x_k, τ] such that

  f(x_k)/(x_k − τ) = (f(x_k) − f(τ))/(x_k − τ) = f′(z_k).

Therefore

  f(x_k) = (x_k − τ)f′(z_k).   (1.2.5)

• Suppose that f′(τ) > 0. (If f′(τ) < 0, just tweak the arguments accordingly.) Saying that f′ is continuous in the region [τ − h, τ + h] means that, for any ε > 0, there is a δ > 0 such that |f′(x) − f′(τ)| < ε for any x ∈ [τ − δ, τ + δ]. Take ε = f′(τ)/4. Then |f′(x) − f′(τ)| < f′(τ)/4, and thus

  (3/4)f′(τ) ≤ f′(x) ≤ (5/4)f′(τ)

for any x ∈ [τ − δ, τ + δ]. Then, so long as w_k and z_k are both in [τ − δ, τ + δ],

  f′(z_k)/f′(w_k) ≤ 5/3.

Given enough time and effort we could show that the Secant Method converges faster than linearly. In particular, the order of convergence is q = (1 + √5)/2 ≈ 1.618. This number arises as the only positive root of q² − q − 1. It is called the Golden Mean, and arises in many areas of Mathematics, including finding an explicit expression for the Fibonacci Sequence: f_0 = 1, f_1 = 1, f_{k+1} = f_k + f_{k−1} for k = 2, 3, .... That gives f_0 = 1, f_1 = 1, f_2 = 2, f_3 = 3, f_4 = 5, f_5 = 8, f_6 = 13, ....

A rigorous proof depends on, among other things, an error bound for polynomial interpolation, which is the first topic in MA378. With that, one can show that ε_{k+1} ≤ Cε_k ε_{k−1}. Repeatedly using this we get:
• Let r = |x_1 − x_0|, so that ε_0 ≤ r and ε_1 ≤ r.
• Then ε_2 ≤ Cε_1ε_0 ≤ Cr².
• Then ε_3 ≤ Cε_2ε_1 ≤ C(Cr²)r = C²r³.
• Then ε_4 ≤ Cε_3ε_2 ≤ C(C²r³)(Cr²) = C⁴r⁵.
• Then ε_5 ≤ Cε_4ε_3 ≤ C(C⁴r⁵)(C²r³) = C⁷r⁸.
• And in general, ε_k ≤ C^{f_k − 1} r^{f_k}.
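If ε_{k+1} ≈ Cε_k^q, then taking logarithms of two successive error ratios gives the estimate q ≈ log(ε_{k+1}/ε_k)/log(ε_k/ε_{k−1}). As a rough sketch (mine, not from the notes; Q3 of the Matlab lab does this more carefully), applied to the middle rows of Table 1.2:

% Sketch: estimate the order q from successive secant errors (Table 1.2).
e = [8.09e-2, 1.44e-2, 4.20e-4, 2.12e-6, 3.16e-10];
q = log(e(3:end)./e(2:end-1)) ./ log(e(2:end-1)./e(1:end-2))

The estimates are noisy, but they hover between about 1.5 and 2, consistent with a superlinear order well above 1; round-off quickly pollutes them once the errors approach machine precision.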
1.3 Newton's Method

1.3.1 Motivation

These notes are loosely based on Section 1.4 of [1] (i.e., Süli and Mayers, An Introduction to Numerical Analysis). See also [3, Lecture 2] and [5, §3.5].

The Secant method is often written as

  x_{k+1} = x_k − f(x_k)φ(x_k, x_{k−1}),

where the function φ is chosen so that x_{k+1} is the root of the secant line joining the points (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). A related idea is to construct a method

  x_{k+1} = x_k − f(x_k)λ(x_k),

where we choose λ so that x_{k+1} is the point where the line through (x_k, f(x_k)) with slope f′(x_k) cuts the x-axis. This is shown in Figure 1.3.

We attempt to solve x² − 2 = 0, taking x_0 = 2. Taking x_1 to be the zero of the tangent to f(x) at x = 2, we get x_1 = 1.5. Taking x_2 to be the zero of the tangent to f(x) at x = 1.5, we get x_2 = 1.4167, which is very close to the true solution of τ = 1.4142.

Fig. 1.3: Estimating √2 using Newton's Method

Method 1.3.1 (Newton's Method²).
1. Choose any x_0 in [a, b].
2. For k = 0, 1, 2, ..., set x_{k+1} to be the root of the line through (x_k, f(x_k)) with slope f′(x_k).

² Sir Isaac Newton, 1643–1727, England. Easily one of the greatest scientists of all time. The method we are studying appeared in his celebrated Principia Mathematica in 1687, but it is believed he had used it as early as 1669.

By writing down the equation of the line through (x_k, f(x_k)) with slope f′(x_k), one can show that the formula for the iteration is

  x_{k+1} = x_k − f(x_k)/f′(x_k).   (1.3.6)

Example 1.3.2. Use Newton's Method to solve the nonlinear problem x² − 2 = 0 in [0, 2]. The results are shown in Table 1.3. For this example, the method becomes

  x_{k+1} = x_k − f(x_k)/f′(x_k) = x_k − (x_k² − 2)/(2x_k) = x_k/2 + 1/x_k.

Table 1.3: Solving x² − 2 = 0 using Newton's Method

  k |      x_k | |x_k − τ| | |x_k − x_{k−1}|
  0 | 2.000000 | 5.86e-01 |
  1 | 1.500000 | 8.58e-02 | 5.00e-01
  2 | 1.416667 | 2.45e-03 | 8.33e-02
  3 | 1.414216 | 2.12e-06 | 2.45e-03
  4 | 1.414214 | 1.59e-12 | 2.12e-06
  5 | 1.414214 | 2.34e-16 | 1.59e-12

By comparing Table 1.2 and Table 1.3, we see that for this example Newton's method is more efficient again than the Secant method.

Deriving Newton's method geometrically certainly has an intuitive appeal. However, to analyse the method, we need a more abstract derivation based on a Truncated Taylor Series.

1.3.2 Newton Error Formula

We saw in Table 1.3 that this method can be much more efficient than, say, Bisection: it yields estimates that converge far more quickly to τ. Bisection converges (at least) linearly, whereas Newton's converges quadratically, that is, with at least order q = 2. In order to prove that this is so, we need to:
1. Write down a recursive formula for the error.
2. Show that it converges.
3. Use this to find the limit of |τ − x_{k+1}|/|τ − x_k|².

Step 2 is usually the crucial part. There are two parts to the proof. The first involves deriving the so-called "Newton Error Formula". Then we'll apply this to prove (quadratic) convergence. In all cases we'll assume that the functions f, f′ and f″ are defined and continuous on an interval I_δ = [τ − δ, τ + δ] around the root τ. The proof we'll do in class is essentially the same as Epperson [5, Thm 3.2].

Proposition 1.3.3 (Newton Error Formula). If f(τ) = 0 and

  x_{k+1} = x_k − f(x_k)/f′(x_k),

then there is a point η_k between τ and x_k such that

  τ − x_{k+1} = −((τ − x_k)²/2) · f″(η_k)/f′(x_k).

Example 1.3.4. As an application of Newton's error formula, we'll show that the number of correct decimal digits in the approximation doubles at each step.

1.3.3 Convergence of Newton's Method

We'll now complete our analysis of this section by proving the convergence of Newton's method.

Proposition 1.3.5. Let us suppose that f is a function such that
• f is continuous and real-valued, with continuous f″, defined on some closed interval I_δ = [τ − δ, τ + δ];
• f(τ) = 0 and f″(τ) ≠ 0;
• there is some positive constant A such that

  |f″(x)|/|f′(y)| ≤ A  for all x, y ∈ I_δ.

Let h = min{δ, 1/A}. If |τ − x_0| ≤ h, then Newton's Method converges quadratically.
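For completeness, here is a minimal Matlab sketch of iteration (1.3.6) for Example 1.3.2 (my illustration, not the official lab code):

% Sketch: Newton's method for f(x) = x^2 - 2, i.e. x_{k+1} = x_k/2 + 1/x_k.
x = 2;                            % x_0 = 2, as in Table 1.3
for k = 1:5
    x = x/2 + 1/x;                % one Newton step
    fprintf('%d  %10.6f  %.2e\n', k, x, abs(x - sqrt(2)));
end

The printed errors fall like those in Table 1.3, with the number of correct digits roughly doubling at each step, as promised by Example 1.3.4.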
1.4 Fixed Point Iteration

1.4.1 Introduction

Newton's method can be considered to be a particular instance of a very general approach called Fixed Point Iteration or Simple Iteration. The basic idea is:

  If we want to solve f(x) = 0 in [a, b], find a function g(x) such that, if τ is such that f(τ) = 0, then g(τ) = τ. Choose x_0 and for k ≥ 0 set x_{k+1} = g(x_k).

Example 1.4.1. Suppose that f(x) = e^x − 2x − 1 and we are trying to find a solution to f(x) = 0 in [1, 2]. Then we can take g(x) = ln(2x + 1).

We have reformulated the problem in Example 1.4.1 above, from

  for f(x) = e^x − 2x − 1, find τ ∈ [1, 2] such that f(τ) = 0,

to

  for g(x) = ln(2x + 1), find τ ∈ [1, 2] such that g(τ) = τ.

If we take the initial estimate x_0 = 1, then we get the following sequence of estimates:

   k |    x_k | |τ − x_k|
   0 | 1.0000 | 2.564e-1
   1 | 1.0986 | 1.578e-1
   2 | 1.1623 | 9.415e-2
   3 | 1.2013 | 5.509e-2
   4 | 1.2246 | 3.187e-2
   5 | 1.2381 | 1.831e-2
   ⋮ |   ⋮    |    ⋮
  10 | 1.2558 | 6.310e-4

To make this table, I used a numerical scheme to solve the problem quite accurately, getting τ = 1.256431. In general we don't know τ in advance (otherwise we wouldn't need such a scheme!). I've given the quantities |τ − x_k| here so we can observe that the method is converging, and get an idea of how quickly it is converging.

We have to be quite careful with this method: not every choice of g is suitable. Suppose we want the solution to f(x) = x² − 2 = 0 in [1, 2]. We could choose g(x) = x² + x − 2. Taking x_0 = 1, we get the iterations shown here:

   k | x_k
   0 |  1
   1 |  0
   2 | -2
   3 |  0
   4 | -2
   5 |  0
   ⋮ |  ⋮

This sequence doesn't converge! So we need to refine the method in a way that ensures that it will. Before we do that in a formal way, consider the following...

Example 1.4.2. Use the Mean Value Theorem to show that the fixed point method x_{k+1} = g(x_k) converges if |g′(x)| < 1 for all x near the fixed point.

This is an important example, mostly because it introduces the "tricks" of using that g(τ) = τ and g(x_k) = x_{k+1}. But it is not a rigorous theory. That requires some ideas such as the contraction mapping theorem.
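A minimal Matlab sketch of the iteration in Example 1.4.1 (again my illustration, not the official lab code):

% Sketch: fixed point iteration x_{k+1} = g(x_k) with g(x) = log(2x+1).
g = @(x)( log(2*x + 1) );         % log is the natural logarithm in Matlab
x = 1;                            % x_0 = 1
for k = 1:10
    x = g(x);
end
fprintf('x_10 = %.4f\n', x)       % creeps slowly towards tau = 1.256431...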
1.4.2 A short tour of fixed points and contractions

A variant of the famous Fixed Point Theorem³ is:

  Suppose that g(x) is defined and continuous on [a, b], and that g(x) ∈ [a, b] for all x ∈ [a, b]. Then there exists a point τ ∈ [a, b] such that g(τ) = τ. That is, g(x) has a fixed point in the interval [a, b].

³ L.E.J. Brouwer, 1881–1966, Netherlands.

Try to convince yourself that it is true by sketching the graphs of a few functions that send all points in the interval, say, [1, 2] to that interval, as in Figure 1.4.

Fig. 1.4: Sketch of a function g(x) such that, if a ≤ x ≤ b, then a ≤ g(x) ≤ b

The next ingredient we need is to observe that g is a contraction. That is, g(x) is continuous and defined on [a, b] and there is a number L ∈ (0, 1) such that

  |g(α) − g(β)| ≤ L|α − β|  for all α, β ∈ [a, b].   (1.4.7)

Proposition 1.4.3 (Contraction Mapping Theorem). Suppose that the function g is real-valued, defined and continuous on [a, b], and
• maps every point in [a, b] to some point in [a, b], and
• is a contraction on [a, b].
Then
(i) g(x) has a fixed point τ ∈ [a, b],
(ii) the fixed point is unique, and
(iii) the sequence {x_k}_{k=0}^∞ defined by x_0 ∈ [a, b] and x_k = g(x_{k−1}) for k = 1, 2, ... converges to τ.

Proof:

1.4.3 Convergence of Fixed Point Iteration

We now know how to apply the Fixed-Point Method and to check if it will converge. Of course we can't perform an infinite number of iterations, and so the method will yield only an approximate solution. Suppose we want the solution to be accurate to, say, 10⁻⁶; how many steps are needed? That is, how large must k be so that |x_k − τ| ≤ 10⁻⁶? The answer is obtained by first showing that

  |τ − x_k| ≤ (L^k/(1 − L)) |x_1 − x_0|.   (1.4.8)

Example 1.4.4. If g(x) = ln(2x + 1) and x_0 = 1, and we want |x_k − τ| ≤ 10⁻⁶, then we can use (1.4.8) to determine the number of iterations required.

This calculation only gives an upper bound for the number of iterations. It is correct, but not necessarily sharp. Table 1.4 shows that this level of accuracy is achieved after 23 iterations.

Table 1.4: Applying Fixed Point Iteration for Example 1.4.1

   k |     x_k | |x_k − τ|
   0 | 1.00000 | 2.56e-01
   1 | 1.09861 | 1.58e-01
   2 | 1.16228 | 9.41e-02
   3 | 1.20134 | 5.51e-02
   4 | 1.22456 | 3.19e-02
   5 | 1.23812 | 1.83e-02
   6 | 1.24595 | 1.05e-02
   ⋮ |    ⋮    |    ⋮
  22 | 1.25643 | 1.29e-06
  23 | 1.25643 | 7.32e-07

Even so, 23 iterations is quite a lot for such a simple problem. So we can conclude that this method is not as fast as, say, Newton's Method. However, it is perhaps the most generalizable.

1.4.4 Knowing When to Stop

Suppose you wish to program one of the above methods. You will get your computer to repeat one of the iterative methods until your solution is sufficiently close to the true solution:

  x[0] = 0
  tol = 1e-6
  i = 0
  while (abs(tau - x[i]) > tol)   // This is the stopping criterion
      x[i+1] = g(x[i])            // Fixed point iteration
      i = i+1
  end

All very well, except you don't know τ. If you did, you wouldn't need a numerical method. Instead, we could choose the stopping criterion based on how close successive estimates are:

  while (abs(x[i-1] - x[i]) > tol)

This is fine if the solution is not close to zero. E.g., if it is about 1, we should get roughly 6 accurate figures. But if τ = 10⁻⁷ then it is quite useless: x_k could be ten times larger than τ. The problem is that we are estimating the absolute error. Instead, we usually work with the relative error:

  while (abs( (x[i-1] - x[i])/x[i] ) > tol)

1.4.5 What has Newton's method ever done for me?

When studying a numerical method (or indeed any piece of Mathematics) it is important to ask why you are doing this. Sometimes it is so that you can understand other topics later; because it is interesting/beautiful in its own right; or (most commonly) because it is useful. Here are some examples of each of these:

1. The analyses we have used in this section allowed us to consider some important ideas in a simple setting. Examples include:
  • convergence, including rates of convergence;
  • fixed-point theory, and contractions (we'll be seeing analogous ideas in the next section: Lipschitz conditions);
  • the approximation of functions by polynomials (Taylor's Theorem); this point will reoccur in the next section, and all throughout next semester.

2. Applications come from lots of areas of science and engineering.
Less obvious might be applications to financial mathematics. The celebrated Black-Scholes equation for pricing a put option can be written as

  ∂V/∂t − (1/2)σ²S² ∂²V/∂S² − rS ∂V/∂S + rV = 0,

where
• V(S, t) is the current value of the right (but not the obligation) to buy or sell ("put" or "call") an asset at a future time T;
• S is the current value of the underlying asset;
• r is the current interest rate (because the value of the option has to be compared with what we would have gained by investing the money we paid for it);
• σ is the volatility of the asset's price.

Often one knows S, T and r, but not σ. The method of implied volatility is when we take data from the market and then find the value of σ which would give the data as the solution to the Black-Scholes equation. This is a nonlinear problem, and so Newton's method can be used. See Chapters 13 and 14 of Higham's "An Introduction to Financial Option Valuation" for more details. (We will return to the Black-Scholes problem again at the end of the next section.)

3. Some of these ideas are interesting and beautiful. Consider Newton's method. Suppose that we want to find the complex nth roots of unity: the set of numbers {z_0, z_1, z_2, ..., z_{n−1}} whose nth power is 1. We could just express these in terms of polar coordinates:

  z = e^{iθ}, where θ = 2kπ/n, for some k ∈ Z = {..., −2, −1, 0, 1, 2, ...} and i = √−1.

Example: 1, i, −1 and −i are the 4th roots of unity. Recall that, plotted in the Argand plane, these points form a regular polygon. If we use Newton's method to find a given root, we could try to solve f(z) = 0 with f(z) = z^n − 1. The iteration is:

  z_{k+1} = z_k − (z_k^n − 1)/(n z_k^{n−1}).

However, there are n possible solutions; given a particular starting point, which root will the method converge to? If we take a number of points in a region of space, iterate on each of them, and then colour the points to indicate the ones that converge to the same root, we get the famous Julia⁴ set, an example of a fractal. One such Julia set, generated by the Matlab script Julia.m, which you can download from the course website, is shown below in Figure 1.5.

Fig. 1.5: A contour plot of a Julia set with n = 5

⁴ Gaston Julia, French mathematician, 1893–1978. The famous paper which introduced these ideas was published in 1918, when he was just 25. Interest later waned until the 1970s, when Mandelbrot's computer experiments reinvigorated interest.
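If you would like to experiment before downloading Julia.m, here is a minimal sketch of mine (an assumption about the kind of experiment that script performs; the course's version may differ):

% Sketch: run Newton's method for z^5 - 1 = 0 from a grid of complex
% starting points, and colour each point by the root it converges to.
n = 5;
[X, Y] = meshgrid(linspace(-1.5, 1.5, 400));
Z = X + 1i*Y;
for k = 1:30                                 % 30 Newton steps everywhere
    Z = Z - (Z.^n - 1)./(n*Z.^(n - 1));
end
r = exp(2i*pi*(0:n-1)/n);                    % the five 5th roots of unity
[~, nearest] = min(abs(bsxfun(@minus, Z(:), r)), [], 2);
imagesc(reshape(nearest, size(Z))); axis equal tight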
1.5 Matlab 1: the bisection and secant methods

The goal of this section is to help you gain familiarity with the fundamental tasks that can be accomplished with Matlab: defining vectors, computing functions, and plotting. We'll then see how to implement and analyse the Bisection and Secant schemes in Matlab.

Matlab is an interactive environment for mathematical and scientific computing. It is the standard tool for numerical computing in industry and research. Matlab stands for Matrix Laboratory. It specialises in matrix and vector computations, but includes functions for graphics, numerical integration and differentiation, solving differential equations, etc. Matlab differs most significantly from, say, Maple, by not having a facility for abstract computation.

1.5.1 The Basics

Matlab is an interpretive environment: you type a command and it will execute it immediately. The default data-type is a matrix of double precision floating-point numbers. A scalar variable is an instance of a 1 × 1 matrix. To check this, set, say,
>> t=10
and use the
>> size(t)
command to find the numbers of rows and columns of t.

A vector may be declared as follows:
>> x = [1 2 3 4 5 6 7]
This generates a vector, x, with x_1 = 1, x_2 = 2, etc. However, this could also be done with x=1:7. More generally, a vector may be declared as follows:
>> x = a:h:b;
This sets x_1 = a, x_2 = a + h, x_3 = a + 2h, ..., x_n = b. If h is omitted, it is assumed to be 1.

The ith element of a vector is accessed by typing x(i). The element in row i and column j of a matrix is given by A(i,j).

Most "scalar" functions return a matrix when given a matrix as an argument. For example, if x is a vector of length n, then y = sin(x) sets y to be a vector, also of length n, with y_i = sin(x_i). Matlab has most of the standard mathematical functions: sin, cos, exp, log, etc. In each case, write the function name followed by the argument in round brackets, e.g.,
>> exp(x)
for e^x.

The * operator performs matrix multiplication. For element-by-element multiplication use .*
For example, y = x.*x sets y_i = (x_i)². So does y = x.^2. Similarly, y=1./x sets y_i = 1/x_i.

If you put a semicolon at the end of a line of Matlab, the line is executed, but the output is not shown. (This is useful if you are dealing with large vectors.) If no semicolon is used, the output is shown in the command window.

1.5.2 Plotting functions

Define a vector
>> x=[0 1 2 3]
and then set
>> f = x.^2 -2
To plot these vectors use:
>> plot(x, f)
If the picture isn't particularly impressive, then this might be because Matlab is actually only printing the 4 points that you defined. To make this clearer, use
>> plot(x, f, '-o')
This means to plot the vector f as a function of the vector x, placing a circle at each point, and joining adjacent points with a straight line. Try instead:
>> x=0:0.1:3
and
>> f = x.^2 -2
and plot them again.

To define a function in terms of any variable, type:
>> F = @(x)(x.^2 -2);
Now you can use this function as follows:
>> plot(x, F(x));
Take care to note that Matlab is case sensitive.

In this last case, it might be helpful to also observe where the function cuts the x-axis. That can be done by also plotting the line joining, for example, the points (0, 0) and (3, 0). This can be done by:
>> plot(x,F(x), [0,3], [0,0]);
Also try
>> plot(x,F(x), [0,3], [0,0], '--');
Notice the difference?

Tip: Use the >> help menu to find out what the >> ezplot function is, and how to use it.

1.5.3 Programming the Bisection Method

Revise the lecture notes on the Bisection Method. Suppose we want to find a solution to e^x − (2 − x)³ = 0 in the interval [0, 5] using Bisection.
• Define the function f as:
>> f = @(x)(exp(x) - (2-x).^3);
• Taking x_1 = 0 and x_2 = 5, do 8 iterations of the Bisection method.
• Complete the table below. You may use that the solution is (approximately) τ = 0.7261444658054950.

   k | x_k | |τ − x_k|
   1 |     |
   2 |     |
   3 |     |
   4 |     |
   5 |     |
   6 |     |
   7 |     |
   8 |     |
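As a check to get you started (my arithmetic, not from the lab sheet): f(0) = 1 − 8 = −7 < 0 and f(5) = e⁵ + 27 > 0, so there is a root in [0, 5]; the first bisection point is x_3 = 2.5, and since f(2.5) ≈ 12.3 > 0, the root lies in [0, 2.5].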
Implementing the Bisection method by hand is very tedious. Here is a program that will do it for you. You don't need to type it all in; you can download it from http://www.maths.nuigalway.ie/MA385/lab1/Bisection.m

%% Using the Bisection method to find a zero of f(x)
% Lab 1 of MA385
clear; % Erase all stored variables
fprintf('\n\n---------\n Using Bisection ');

% The function is
f = @(x)(exp(x) - (2-x).^3);
disp('Solving f=0 with the function');
disp(f);

%% The true solution is
tau = 0.72614446580549503614;
fprintf('The true solution is %12.8f', tau);

%% Our initial guesses are x_1 = 0 and x_2 = 5
x(1)=0;
fprintf('\n\n%2d | %1.8e | %1.3e \n', ...
    1, x(1), abs(tau - x(1)));
x(2)=5;
fprintf('%2d | %1.8e | %1.3e \n', 2, x(2), abs(tau-x(2)));

for k=2:10
    x(k+1) = (x(k-1)+x(k))/2;
    if ( f(x(k+1))*f(x(k-1)) < 0)
        x(k)=x(k-1);
    end
    fprintf('%2d | %1.8e | %1.3e\n', k+1, x(k+1), ...
        abs(tau - x(k+1)));
end

Read the code carefully. If there is a line you do not understand, then ask a tutor, or look up the on-line help. For example, find out what the clear on Line 3 does by typing >> doc clear

Q1. Suppose we wanted an estimate x_k for τ so that |τ − x_k| ≤ 10⁻¹⁰.
(i) In §1.1 we saw that |τ − x_k| ≤ (1/2)^{k−1}|x_1 − x_0|. Use this to estimate how many iterations are required in theory.
(ii) Use the program above to find how many iterations are required in practice.

1.5.4 The Secant method

Recall the Secant Method in (1.2.1).

Q2. (a) Adapt the program above to implement the secant method.
(b) Use it to find a solution to e^x − (2 − x)³ = 0 in the interval [0, 5].
(c) How many iterations are required to ensure that the error is less than 10⁻¹⁰?

Q3. Recall from Definition 1.2.4 that the order of convergence of a sequence {ε_0, ε_1, ε_2, ...} is q if

  lim_{k→∞} ε_{k+1}/ε_k^q = µ,

for some constant µ. We would like to verify that, for the secant method, q = (1 + √5)/2 ≈ 1.618. This is difficult to do computationally because, after a relatively small number of iterations, the round-off error becomes significant. But we can still try! Adapt the program above so that at each iteration it displays |τ − x_k|, |τ − x_k|^{1.618} and |τ − x_k|², and so deduce that the order of convergence is greater than 1 (so better than bisection), less than 2, and roughly (1 + √5)/2.

Q4 (Bonus). The bisection method is popular because it is robust: it will always work, subject to minimal constraints. However, it is slow: if the Secant method works, then it converges much more quickly. How can we combine these two algorithms to get a fast, robust method? Consider the following problem: solve

  f(x) = 1 − 2/(x² − 2x + 2) = 0  on [−10, 1].

You should find that the bisection method works (slowly) for this problem, but the Secant method will fail. So write a hybrid algorithm that switches between the bisection method and the secant method as appropriate. Take care to document your code carefully, to show which algorithm is used when. How many iterations are required?

1.5.5 To Finish

Before you leave the class, send an email to [email protected] with your name, ID number, and answers to Questions 1, 2 and 3. For bonus credit, submit your solution to Q4 too. If you have worked in small groups, you should still submit separate emails.

1.6 Nonlinear equations: Exercises

Exercise 1.6.1. Does Proposition 1.1.1 mean that, if there is a solution to f(x) = 0 in [a, b], then f(a)f(b) ≤ 0? That is, is f(a)f(b) ≤ 0 a necessary condition for there being a solution to f(x) = 0? Give an example that supports your answer.
Exercise 1.6.2. Suppose we want to find τ ∈ [a, b] such that f(τ) = 0 for some given f, a and b. Write down an estimate for the number of iterations K required by the bisection method to ensure that, for a given ε, we know |x_k − τ| ≤ ε for all k ≥ K. In particular, how does this estimate depend on f, a and b?

Exercise 1.6.3. How many (decimal) digits of accuracy are gained at each step of the bisection method? (If you prefer: how many steps are needed to gain a single (decimal) digit of accuracy?)

Exercise 1.6.4. ⋆ Let f(x) = e^x − 2x − 2. Show that the problem "find τ ∈ [0, 2] such that f(τ) = 0" has a solution. Taking x_0 = 0 and x_1 = 2, use 6 steps of the bisection method to estimate τ. Give an upper bound for the error |τ − x_6|. (You may use a computer program to do this.)

Exercise 1.6.5.
(i) Write down the equation of the line that intersects (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). Hence show how to derive (1.2.1).
(ii) Can you construct a problem for which the bisection method will work, but the secant method will fail completely? If so, give an example.

Exercise 1.6.6.
(i) Write down Newton's Method as applied to the function f(x) = x³ − 2. Simplify the computation as much as possible. What is achieved if we find the root of this function?
(ii) Do three iterations by hand of Newton's Method applied to f(x) = x³ − 2 with x_0 = 1.

Exercise 1.6.7. (This is taken from [5, Exer 3.5.1].) If f is such that |f″(x)| ≤ 3 and |f′(x)| ≥ 1 for all x, and if the initial error in Newton's Method is less than 1/2, give an upper bound for the error at each of the first 3 steps.

Exercise 1.6.8. ⋆ Here is (yet) another scheme, called Steffenson's Method: choose x_0 ∈ [a, b] and set

  x_{k+1} = x_k − (f(x_k))² / ( f(x_k + f(x_k)) − f(x_k) )  for k = 0, 1, 2, ....

(a) Explain how this method relates to Newton's Method.
(b) Write a Matlab program to implement this method.
(c) Using this program or otherwise, determine x_3 and |τ − x_3| when the method is used to solve e^x = (2 − x)³ with x_0 = 0.

Exercise 1.6.9. ([1, Exer 1.6]) The proof of the convergence of Newton's method given in Prop. 1.3.5 uses that f′(τ) ≠ 0. Suppose that it is the case that f′(τ) = 0.
(i) What can we say about the root, τ?
(ii) Show that the Newton Error formula can be extended to give that

  τ − x_{k+1} = (τ − x_k) f″(η_k)/(2 f″(µ_k)),

for some µ_k between τ and x_k.
(iii) What does the above error formula tell us about the convergence of Newton's method in this case?

Exercise 1.6.10. Is it possible for g to be a contraction on [a, b] but not have a fixed point in [a, b]? Give an example to support your answer.

Exercise 1.6.11. Show that g(x) = ln(2x + 1) is a contraction on [1, 2]. Give an estimate for L. (Hint: use the Mean Value Theorem.)

Exercise 1.6.12. Consider the function g(x) = x²/4 + 5x/4 − 1/2.
1. It has two fixed points: what are they?
2. For each of these, find the largest region around them such that g is a contraction on that region.

Exercise 1.6.13. ⋆ Although we didn't prove it in class, it turns out that, if the fixed point method given by x_{k+1} = g(x_k) converges to the point τ (where g(τ) = τ), and

  g′(τ) = g″(τ) = ··· = g^{(p−1)}(τ) = 0,

then it converges with order p.
(i) Use a Taylor Series expansion to prove this.
(ii) We can think of Newton's Method for the problem f(x) = 0 as fixed point iteration with g(x) = x − f(x)/f′(x). Use this to show that, if Newton's method converges, it does so with order 2, provided that f′(τ) ≠ 0.

1.7 Appendix

Here are the full details of the proof that the Secant Method converges at least linearly (Proposition 1.2.5). Before you read it, take care to review the notes from that section, particularly (1.2.4) and (1.2.5).
Proof. The method is

  x_{k+1} = x_k − f(x_k)(x_k − x_{k−1})/(f(x_k) − f(x_{k−1})).

We'll use this to derive an expression for the error at step k + 1 in terms of the error at step k. In particular,

  τ − x_{k+1} = τ − x_k + f(x_k)(x_k − x_{k−1})/(f(x_k) − f(x_{k−1}))
             = τ − x_k + f(x_k)/f′(w_k)
             = τ − x_k + (x_k − τ)f′(z_k)/f′(w_k)
             = (τ − x_k)(1 − f′(z_k)/f′(w_k)).

Therefore

  |τ − x_{k+1}|/|τ − x_k| ≤ |1 − f′(z_k)/f′(w_k)|.

So it remains to be shown that |1 − f′(z_k)/f′(w_k)| < 1.

Let's first assume that f′(τ) = α > 0. (If f′(τ) = α < 0, the following arguments still hold, just with a few small changes.) Because f′ is continuous in the region [τ − h, τ + h], for any given ε > 0 there is a δ > 0 such that |f′(x) − α| < ε for any x ∈ [τ − δ, τ + δ]. Take ε = α/4. Then |f′(x) − α| < α/4, and thus

  (3/4)α ≤ f′(x) ≤ (5/4)α

for any x ∈ [τ − δ, τ + δ]. Then, so long as w_k and z_k are both in [τ − δ, τ + δ],

  f′(z_k)/f′(w_k) ≤ 5/3.

This gives

  |τ − x_{k+1}|/|τ − x_k| ≤ 2/3,

which is what we needed.

Chapter 2
Initial Value Problems

2.1 Introduction

The first part of this introduction is based on [5, Chap. 6]. The rest of the notes mostly follow [1, Chap. 12].

The growth of some tumours can be modelled as

  R′(t) = −(1/3) S_i R(t) + 2λσ/(µR + √(µ²R² + 4σ)),

subject to the initial condition R(t_0) = a, where R is the radius of the tumour at time t. Clearly, it would be useful to know the value of R at certain times in the future. Though it's essentially impossible to solve for R exactly, we can accurately estimate it. In this section, we'll study techniques for this.

Initial Value Problems (IVPs) are differential equations of the form: find y(t) such that

  dy/dt = f(t, y)  for t > t_0,  and  y(t_0) = y_0.   (2.1.1)

Here y′ = f(t, y) is the differential equation and y(t_0) = y_0 is the initial value.

Some IVPs are easy to solve. For example: y′ = t² with y(1) = 1. Just integrate the differential equation to get that y(t) = t³/3 + C, and use the initial value to find the constant of integration. This gives the solution y(t) = (t³ + 2)/3.

However, most problems are much harder, and some don't have solutions at all. In many cases, it is possible to determine that a given problem does indeed have a solution, even if we can't write it down. The idea is that the function f should be "Lipschitz", a notion closely related to that of a contraction (1.4.7).
2.2 One step methods

2.2.1 Euler's Method

Classical numerical methods for IVPs attempt to generate approximate solutions at a finite set of discrete points t_0 < t_1 < t_2 < · · · < t_n. The simplest is Euler's Method³, and it may be motivated as follows: suppose we know y(t_i) and want to find y(t_{i+1}). From the differential equation we know the slope of the tangent to y at t_i. So, if this is similar to the slope of the line joining (t_i, y(t_i)) and (t_{i+1}, y(t_{i+1})):
  y′(t_i) = f(t_i, y(t_i)) ≈ (y_{i+1} − y_i)/(t_{i+1} − t_i).

[Figure: the tangent to y(t) at t = t_i, compared with the secant line joining the points on y(t) at t_i and t_{i+1}.]

Method 2.2.1 (Euler's Method). Choose equally spaced points t_0, t_1, . . . , t_n so that t_i − t_{i−1} = h = (t_n − t_0)/n for i = 1, . . . , n. We call h the "time step". Let y_i denote the approximation for y(t) at t = t_i. Set
  y_{i+1} = y_i + h f(t_i, y_i), i = 0, 1, . . . , n − 1.  (2.2.3)

Example 2.2.2. Taking h = 1, estimate y(4) where
  y′(t) = y/(1 + t^2), y(0) = 1.  (2.2.4)
The true solution to this is y(t) = e^{arctan(t)}.

If we had chosen h = 4 we would have required only one step: y_n = y_0 + 4 f(t_0, y_0) = 5. However, this would not be very accurate. With a little work one can show that the solution to this problem is y(t) = e^{tan⁻¹(t)}, and so y(4) = 3.7652. Hence the computed solution with h = 1 is much more accurate than the computed solution with h = 4. This is also demonstrated in Figure 2.1 below, and in Table 2.1, where we see that the error seems to be proportional to h.

[Fig. 2.1: Euler's method for Example 2.2.2 with h = 4, h = 2, h = 1 and h = 1/2.]

³ Leonhard Euler, 1707–1783, Switzerland. One of the greatest mathematicians of all time, he made vital contributions to geometry, calculus, number theory, finite differences, special functions, differential equations, continuum mechanics, astronomy, elasticity, acoustics, light, hydraulics, music, cartography, and much more.
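As a preview of the Matlab lab in Section 2.7, a minimal implementation of Method 2.2.1 for Example 2.2.2 could look like the following sketch (the choice of n is illustrative):

n = 4; t0 = 0; tn = 4;           % illustrative choices
h = (tn - t0)/n;
f = @(t,y)( y./(1 + t.^2) );     % right-hand side from (2.2.4)
t = t0:h:tn;
y = zeros(1, n+1); y(1) = 1;     % y(1) stores y_0 (Matlab indexes from 1)
for i = 1:n
   y(i+1) = y(i) + h*f(t(i), y(i));   % the Euler step (2.2.3)
end
fprintf('Estimate for y(4): %f; error: %e\n', ...
        y(n+1), abs(exp(atan(4)) - y(n+1)));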
   n     h     y_n     |y(t_n) − y_n|
   1     4     5.0     1.235
   2     2     4.2     0.435
   4     1     3.960   0.195
   8     1/2   3.881   0.115
   16    1/4   3.831   0.065
   32    1/8   3.800   0.035

Table 2.1: Error in Euler's method for Example 2.2.2

2.3 Error Analysis

2.3.1 General one-step methods

Euler's method is an example of the more general one-step methods, which have the form:
  y_{i+1} = y_i + h Φ(t_i, y_i; h).  (2.3.5)
To get Euler's method, just take Φ(t_i, y_i; h) = f(t_i, y_i).

In the introduction, we motivated Euler's method with a geometrical argument. An alternative, more mathematical way of deriving Euler's Method is to use a truncated Taylor Series. This again motivates formula (2.2.3), and also suggests that at each step the method introduces a (local) error of h^2 y″(η)/2. (More on this later.)

2.3.2 Two types of error

We'll now give an error analysis for general one-step methods, and then look at Euler's Method as a specific example. First, some definitions.

Definition 2.3.1. Global Error: E_i = y(t_i) − y_i.

Definition 2.3.2. Truncation Error:
  T_i := (y(t_{i+1}) − y(t_i))/h − Φ(t_i, y(t_i); h).  (2.3.6)

It can be helpful to think of T_i as representing how much the difference equation (2.2.3) differs from the differential equation. We can also determine the truncation error for Euler's method directly from a Taylor Series.

The relationship between the global error and truncation errors is explained in the following (important!) result, which in turn is closely related to Proposition 2.1.3:

Proposition 2.3.3. Let Φ be Lipschitz with constant L. Then
  |E_n| ≤ T (e^{L(t_n − t_0)} − 1)/L,  (2.3.7)
where T = max_{i=0,1,...,n} |T_i|. (See also Exercise 2.9.3.)

2.3.3 Analysis of Euler's method

For Euler's method, we get
  T = max_{0≤j≤n} |T_j| ≤ (h/2) max_{t_0≤t≤t_n} |y″(t)|.

Example 2.3.4. Given the problem:
  y′ = 1 + t + y/t for t > 1; y(1) = 1,
find an approximation for y(2).
(i) Give an upper bound for the global error taking n = 4 (i.e., h = 1/4).
(ii) What n should you take to ensure that the global error is no more than 0.1?

To answer these questions we need to use (2.3.7), which requires that we find L and an upper bound for T. In this instance, L is easy: (Note: this is a particularly easy example. Often we need to employ the Mean Value Theorem. See [1, Eg 12.2].)

To find T we need an upper bound for |y″(t)| on [1, 2], even though we don't know y(t). However, we do know y′(t)....

Plug these values into (2.3.7) to find E_n ≤ 0.644. In fact, the true answer is 0.43, so we see that (2.3.7) is somewhat pessimistic.

To answer (ii): what n should you take to ensure that the global error is no more than 0.1? (We should get n = 26. This is not that sharp: n = 19 will do.)
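The arithmetic in Example 2.3.4 is easy to script. In the sketch below, the values L = 1 and max|y″| ≤ 3 are the bounds found in class — treat them as assumptions here; the script evaluates (2.3.7) for h = 1/4 and then searches for the smallest n giving a bound below 0.1.

L = 1; M2 = 3;                  % assumed: Lipschitz constant and bound on |y''| on [1,2]
bound = @(h)( (h*M2/2) * (exp(L*(2-1)) - 1)/L );  % right-hand side of (2.3.7)
fprintf('Bound for n = 4: %f\n', bound(1/4));     % should print about 0.644
n = 1;
while bound(1/n) > 0.1          % find the smallest n with bound <= 0.1
   n = n + 1;
end
fprintf('Need n = %d\n', n);    % should print n = 26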
2.3.4 Convergence and Consistency

We are often interested in the convergence of a method. That is, is it true that
  lim_{h→0} y_n = y(t_n)?
Or, equivalently, that lim_{h→0} E_n = 0? Given that the global error for Euler's method can be bounded:
  |E_n| ≤ h (max|y″(t)| / (2L)) (e^{L(t_n − t_0)} − 1) = hK,  (2.3.8)
we can say it converges.

Definition 2.3.5. The order of accuracy of a numerical method is p if there is a constant K so that
  |T_n| ≤ K h^p.
(The term order of convergence is often used instead of order of accuracy.)

In light of this definition, we read from (2.3.8) that Euler's method is first-order.

One of the requirements for convergence is consistency:

Definition 2.3.6. A one-step method y_{n+1} = y_n + h Φ(t_n, y_n; h) is consistent with the differential equation y′(t) = f(t, y(t)) if f(t, y) ≡ Φ(t, y; 0).

It is quite trivial to see that Euler's method is consistent.

In the next section, we'll try to develop methods that are of higher order than Euler's method. That is, we will study methods for which one can show |E_n| ≤ K h^p for some p > 1. For these, showing consistency is a little (but only a little) more work.

2.4 Runge-Kutta 2 (RK2)

The goal of this section is to develop some techniques to help us derive our own methods for accurately solving Initial Value Problems. Rather than using formal theory, the approach will be based on carefully chosen examples.

As motivation, suppose we numerically solve some differential equation and estimate the error. If we think this error is too large, we could redo the calculation with a smaller value of h. Or we could use a better method, for example a Runge-Kutta method. These are higher-order methods that rely on evaluating f(t, y) a number of times at each step in order to improve accuracy. We'll first motivate one such method and then, in the next section, look at the general framework.

2.4.1 Modified Euler Method

Recall the motivation for Euler's method from §2.2.1. We can do something similar to derive what's often called the Modified Euler method or, less commonly, the Midpoint Euler method. In Euler's method, we use the slope of the tangent to y at t_i as an approximation for the slope of the secant line joining the points (t_i, y(t_i)) and (t_{i+1}, y(t_{i+1})). One could argue, given the diagram below, that the slope of the tangent to y at t = (t_i + t_{i+1})/2 = t_i + h/2 would be a better approximation. This would give
  y(t_{i+1}) ≈ y_i + h f(t_i + h/2, y(t_i + h/2)).  (2.4.9)
However, we don't know y(t_i + h/2), but we can approximate it using Euler's Method: y(t_i + h/2) ≈ y_i + (h/2) f(t_i, y_i). Substituting this into (2.4.9) gives
  y_{i+1} = y_i + h f(t_i + h/2, y_i + (h/2) f(t_i, y_i)).  (2.4.10)

[Figure: the secant line joining the points on y(t) at t_i and t_{i+1}, and the tangent to y(t) at t = t_i + h/2.]
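A minimal Matlab sketch of (2.4.10), written here for the test problem (2.2.4) (the problem and the step size are illustrative):

f = @(t,y)( y./(1 + t.^2) );             % test problem (2.2.4)
h = 0.5; t = 0:h:4;                      % illustrative step size
y = zeros(size(t)); y(1) = 1;
for i = 1:length(t)-1
   ymid = y(i) + (h/2)*f(t(i), y(i));    % Euler predictor for y(t_i + h/2)
   y(i+1) = y(i) + h*f(t(i) + h/2, ymid);% the Modified Euler step (2.4.10)
end
fprintf('y(4) approx %f (true value %f)\n', y(end), exp(atan(4)));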
Example 2.4.1. Use the Modified Euler Method to approximate y(1) where
  y(0) = 1, y′(t) = y log(1 + t^2).
This has the solution y(t) = (1 + t^2)^t exp(−2t + 2 tan⁻¹ t).

In Table 2.2 below we compare the error in the solution to this problem using Euler's Method (left) and the Modified Euler Method (right) for various values of n.

          Euler                      Modified
   n      E_n        E_n/E_{n−1}     E_n        E_n/E_{n−1}
   1      3.02e-01                   7.89e-02
   2      1.90e-01   1.59            2.90e-02   2.72
   4      1.11e-01   1.72            8.20e-03   3.54
   8      6.02e-02   1.84            2.16e-03   3.79
   16     3.14e-02   1.91            5.55e-04   3.90
   32     1.61e-02   1.95            1.40e-04   3.95
   64     8.13e-03   1.98            3.53e-05   3.98
   128    4.09e-03   1.99            8.84e-06   3.99

Table 2.2: Errors in solutions to Example 2.4.1 using Euler's and Modified Euler's Methods

Clearly we get a much more accurate result using the Modified Euler Method. Even more importantly, we get a higher order of accuracy: if we halve h, the error in the Modified method is reduced by a factor of four.

2.4.2 General RK2

The "Modified Euler Method" we have just studied is an example of the (large) family of 2nd-order Runge-Kutta (RK2) methods, whose general form is:
  y_{i+1} = y_i + h(a k_1 + b k_2),
  k_1 = f(t_i, y_i),  (2.4.11)
  k_2 = f(t_i + αh, y_i + βh k_1).
If we choose a, b, α and β in the right way, then the error for the method will be bounded by Kh^2, for some constant K. An (uninteresting) example of such a method is a = 1, b = 0, which reduces to Euler's Method. If we choose α = β = 1/2, a = 0 and b = 1, we get the "Modified" method above.

Our aim now is to deduce general rules for choosing a, b, α and β. We'll see that if we pick any one of these four parameters, then the requirement that the method be consistent and second-order determines the other three.

2.4.3 Using consistency

By demanding that RK2 be consistent, we get that a + b = 1.

2.4.4 Ensuring that RK2 is 2nd-order

Next we need to know how to choose α and β. The formal way is to use a two-dimensional Taylor series expansion. It is quite technical, and not suitable for doing in class. Detailed notes on it are given in Section 2.10 below. Instead we'll take a less rigorous, heuristic approach.

Because we expect that, for a second-order accurate method, |E_n| ≤ Kh^2 where K depends on y‴(t), if we choose a problem for which y‴(t) ≡ 0, we expect no error...

In the above example, the right-hand side of the differential equation, f(t, y), depended only on t. Now we'll try the same trick — using a problem with a simple known solution (and zero error) — but for which f depends explicitly on y. Consider the DE
  y(1) = 1, y′(t) = y(t)/t.
It has a simple solution: y(t) = t. We now use the fact that any RK2 method should be exact for this problem to deduce that α = β.

Now we collect all the above results together and show that the second-order Runge-Kutta (RK2) methods are:
  y_{i+1} = y_i + h(a k_1 + b k_2),
  k_1 = f(t_i, y_i),
  k_2 = f(t_i + αh, y_i + βh k_1),
where we choose any b ≠ 0 and then set
  a = 1 − b, α = 1/(2b), β = α.
It is easy to verify that the Modified method satisfies these criteria.
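The whole RK2 family can be coded with b as the only free parameter. Here is a minimal sketch; the function, interval and the choice b = 3/4 are illustrative.

f = @(t,y)( y.*log(1 + t.^2) );  % problem from Example 2.4.1
b = 3/4; a = 1 - b; alpha = 1/(2*b); beta = alpha;  % RK2 parameters
n = 8; h = 1/n; t = 0:h:1;
y = zeros(size(t)); y(1) = 1;
for i = 1:n
   k1 = f(t(i), y(i));
   k2 = f(t(i) + alpha*h, y(i) + beta*h*k1);
   y(i+1) = y(i) + h*(a*k1 + b*k2);   % the general RK2 step (2.4.11)
end
fprintf('y(1) approx %f\n', y(end));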
2.5 Higher-order methods

2.5.1 Runge-Kutta 4

It is possible to construct methods with rates of convergence higher than RK2 methods, such as the RK4 methods, for which |y(t_n) − y_n| ≤ Ch^4. However, writing down the general form of the RK4 method, and then deriving conditions on the parameters, is rather complicated. Therefore, we'll simply state the most popular RK4 method, without proving that it is 4th-order.

4th-Order Runge-Kutta Method (RK4):
  k_1 = f(t_i, y_i),
  k_2 = f(t_i + h/2, y_i + (h/2) k_1),
  k_3 = f(t_i + h/2, y_i + (h/2) k_2),
  k_4 = f(t_i + h, y_i + h k_3),
  y_{i+1} = y_i + (h/6)(k_1 + 2k_2 + 2k_3 + k_4).

It can be interpreted as follows:
  k_1 is the slope of y(t) at t_i;
  k_2 is an approximation for the slope of y(t) at t_i + h/2 (using Euler's Method);
  k_3 is an improved approximation for the slope of y(t) at t_i + h/2;
  k_4 is an approximation for the slope of y(t) at t_{i+1}, computed using the slope at y(t_i + h/2).
Finally, the slope of the secant line joining y(t) at the points t_i and t_{i+1} is approximated using a weighted average of the above values.

Example 2.5.1. Recall the test problem (2.2.4). In Table 2.3 below we give the errors in the solutions computed using various methods and values of n.

                 |y(t_n) − y_n|
   n      Euler       Modified     RK4
   1      1.235       3.653e-01    6.037e-01
   2      4.347e-01   4.526e-02    1.953e-01
   4      1.947e-01   2.108e-02    1.726e-02
   8      1.154e-01   1.146e-02    9.192e-04
   16     6.530e-02   4.274e-03    5.712e-05
   32     3.501e-02   1.315e-03    3.474e-06
   64     1.817e-02   3.650e-04    2.122e-07
   128    9.263e-03   9.619e-05    1.308e-08
   256    4.678e-03   2.469e-05    8.112e-10
   512    2.351e-03   6.254e-06    5.050e-11
   1024   1.178e-03   1.574e-06    3.154e-12
   2048   5.899e-04   3.948e-07    1.839e-13

Table 2.3: Errors in solutions to Example 2.5.1 using Euler's, Modified, and RK4

2.5.2 RK4: consistency and convergence

Although we won't do a detailed analysis of RK4, we can do a little. In particular, we would like to show it is (i) consistent, and (ii) convergent and fourth-order, at least for some examples.

Example 2.5.2. It is easy to see that RK4 is consistent:

Example 2.5.3. In general, showing the rate of convergence is tricky. Instead, we'll demonstrate how the method relates to a Taylor Series expansion for the problem y′ = λy, where λ is a constant.

2.5.3 The (Butcher) Tableau

A great number of RK methods have been proposed and used through the years. A unified approach to representing and studying them was developed by John Butcher (Auckland, New Zealand). In his notation, we write an s-stage method as
  Φ(t_i, y_i; h) = Σ_{j=1}^{s} b_j k_j, where
  k_1 = f(t_i + α_1 h, y_i),
  k_2 = f(t_i + α_2 h, y_i + β_{21} h k_1),
  k_3 = f(t_i + α_3 h, y_i + β_{31} h k_1 + β_{32} h k_2),
  ⋮
  k_s = f(t_i + α_s h, y_i + β_{s1} h k_1 + · · · + β_{s,s−1} h k_{s−1}).

The most convenient way to represent the coefficients is in a tableau:

   α_1 |
   α_2 | β_21
   α_3 | β_31    β_32
   ⋮   |  ⋮               ⋱
   α_s | β_s1    β_s2    · · ·  β_s,s−1
       | b_1     b_2     · · ·  b_{s−1}   b_s

The tableau for the basic Euler method is trivial:

   0 |
     | 1

The two best-known RK2 methods are probably the "Modified method" of Section 2.4.1 and the "Improved method" of Exercise 2.9.6. Their tableaux are:

   0   |                        0 |
   1/2 | 1/2        and         1 | 1
       | 0    1                   | 1/2   1/2

The tableau for the RK4 method above is:

   0   |
   1/2 | 1/2
   1/2 | 0     1/2
   1   | 0     0     1
       | 1/6   2/6   2/6   1/6

You should now convince yourself that these tableaux do indeed correspond to the methods we did in class.
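To connect the tableau notation with code, here is a sketch of a single step of a general explicit RK method driven by a tableau. It would be saved as a function file rkstep.m; the name, argument layout, and lack of input checking are all illustrative choices.

function ynew = rkstep(f, t, y, h, alpha, B, b)
% One explicit RK step from (t,y) with step h, for a tableau given by
% alpha (column of alpha_i), B (strictly lower triangular matrix of betas)
% and b (row vector of weights). A sketch only.
s = length(b);
k = zeros(s,1);
for j = 1:s
   k(j) = f(t + alpha(j)*h, y + h*(B(j,1:j-1)*k(1:j-1)));
end
ynew = y + h*(b*k);

For the Modified method, for instance, one would take alpha = [0; 1/2], B = [0 0; 1/2 0] and b = [0 1].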
2.5.4 Even higher-order methods?

A Runge-Kutta method has s stages if it involves s evaluations of the function f. (That is, its formula features k_1, k_2, . . . , k_s.) We've seen a one-stage method that is 1st-order, a two-stage method that is 2nd-order, and a four-stage method that is 4th-order. It is tempting to think that for any s we can get a method of order s using s stages. However, it can be shown that, for example, to get a 5th-order method you need at least 6 stages, and for a 7th-order method you need at least 9 stages. The theory involved is both intricate and intriguing, and involves aspects of group theory, graph theory, and differential equations. Students in third year might consider this as a topic for their final year project.

2.6 From IVPs to Linear Systems

In this final section, we highlight some of the many important aspects of the numerical solution of IVPs that are not covered in detail in this course:
• Systems of ODEs;
• Higher-order equations;
• Implicit methods; and
• Problems in two dimensions.
We have the additional goal of seeing how these methods relate to the earlier section of the course (nonlinear problems) and the next section (linear equation solving).

2.6.1 Systems of ODEs

So far we have solved only single IVPs. However, most interesting problems are coupled systems: find functions y and z such that
  y′(t) = f_1(t, y, z),
  z′(t) = f_2(t, y, z).
This does not present much of a problem to us. For example, the Euler Method is extended to
  y_{i+1} = y_i + h f_1(t_i, y_i, z_i),
  z_{i+1} = z_i + h f_2(t_i, y_i, z_i).

Example 2.6.1. In pharmacokinetics, the flow of drugs between the blood and major organs can be modelled as
  dy/dt = k_21 z(t) − (k_12 + k_elim) y(t),
  dz/dt = k_12 y(t) − k_21 z(t),
  y(0) = d, z(0) = 0,
where y is the concentration of a given drug in the bloodstream and z is its concentration in another organ. The parameters k_21, k_12 and k_elim are determined from physical experiments. Euler's method for this is:
  y_{i+1} = y_i + h( −(k_12 + k_elim) y_i + k_21 z_i ),
  z_{i+1} = z_i + h( k_12 y_i − k_21 z_i ).

2.6.2 Higher-order problems

So far we've only considered first-order initial value problems: y′(t) = f(t, y); y(t_0) = y_0. However, the methods can easily be extended to higher-order problems:
  y″(t) + a(t) y′(t) = f(t, y); y(t_0) = y_0, y′(t_0) = y_1.
We do this by converting the problem to a system: set z(t) = y′(t). Then:
  z′(t) = −a(t) z(t) + f(t, y), z(t_0) = y_1,
  y′(t) = z(t), y(t_0) = y_0.

Example 2.6.2. Consider the following 2nd-order IVP:
  y″(t) − 3y′(t) + 2y(t) + e^t = 0, y(1) = e, y′(1) = 2e.
Let z = y′. Then
  z′(t) = 3z(t) − 2y(t) − e^t, z(1) = 2e,
  y′(t) = z(t), y(1) = e.
Euler's Method is
  z_{i+1} = z_i + h( 3z_i − 2y_i − e^{t_i} ),
  y_{i+1} = y_i + h z_i.

2.6.3 Implicit methods

Although we won't dwell on the point, there are many problems for which the one-step methods we have seen will give a useful solution only when the step size, h, is small enough. For larger h, the solution can be very unstable. Such problems are called "stiff" problems. They can be solved, but are best done with so-called "implicit methods", the simplest of which is the Implicit Euler Method:
  y_{i+1} = y_i + h f(t_{i+1}, y_{i+1}).
Note that y_{i+1} appears on both sides of the equation. To implement this method, we need to be able to solve this nonlinear problem. The most common method for doing this is Newton's method.
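As a sketch of how this works in practice, one implicit Euler step can be computed by running a few Newton iterations on g(w) = w − y_i − h f(t_{i+1}, w) = 0. The test equation y′ = λy and all numerical values below are illustrative assumptions.

lambda = -10; f = @(t,y)( lambda*y );   % an illustrative test equation
dfdy = @(t,y)( lambda );                % df/dy, needed for Newton's method
h = 0.2; ti1 = 0.2; yi = 1;             % one step from t = 0
w = yi;                                 % initial Newton guess
for k = 1:5                             % a few Newton iterations
   g  = w - yi - h*f(ti1, w);           % we want g(w) = 0
   dg = 1 - h*dfdy(ti1, w);
   w  = w - g/dg;
end
fprintf('Implicit Euler step gives y_1 = %f\n', w);  % here: 1/3 exactly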
2.6.4 Towards Partial Differential Equations

So far we've only considered ordinary differential equations: these are DEs which involve functions of just one variable. In our examples above, this variable was time. But of course many physical phenomena vary in space and time, and so the solutions to the differential equations that model them depend on two or more variables. The derivatives expressed in the equations are partial derivatives, and so they are called partial differential equations (PDEs). We will take a brief look at how to solve these (and how not to solve them). This will motivate the following section, on solving systems of linear equations.

Students of financial mathematics will be familiar with the Black-Scholes equation for pricing an option, which we mentioned in Section 1.4.5:
  ∂V/∂t − (1/2)σ^2 S^2 ∂²V/∂S² − rS ∂V/∂S + rV = 0.
With a little effort (see, e.g., Chapter 5 of "The Mathematics of Financial Derivatives: a student introduction", by Wilmott, Howison, and Dewynne), this can be transformed to the simpler-looking heat equation:
  ∂u/∂t (t, x) = ∂²u/∂x² (t, x), for (x, t) ∈ [0, L] × [0, T],
with the initial and boundary conditions
  u(0, x) = g(x), u(t, 0) = a(t), u(t, L) = b(t).

Example 2.6.3. If L = π, g(x) = sin(x), a(t) = b(t) ≡ 0, then u(t, x) = e^{−t} sin(x) (see Figure 2.2).

[Fig. 2.2: The true solution to the heat equation.]

In general, however, this problem can't be solved explicitly for arbitrary g, a, and b, and so a numerical scheme is used. Suppose we somehow knew ∂²u/∂x²; then we could just use Euler's method:
  u(t_{i+1}, x) = u(t_i, x) + h ∂²u/∂x² (t_i, x).
Although we don't know ∂²u/∂x² (t_i, x), we can approximate it. The algorithm is as follows:
1. Divide [0, T] into N intervals of width h, giving the grid {0 = t_0 < t_1 < · · · < t_{N−1} < t_N = T}, with t_i = t_0 + ih.
2. Divide [0, L] into M intervals of width H, giving the grid {0 = x_0 < x_1 < · · · < x_M = L}, with x_j = x_0 + jH.
3. Denote by u_{i,j} the approximation for u(t, x) at (t_i, x_j).
4. For each i = 0, 1, . . . , N − 1, use the approximation
  δ²_x u_{i,j} = (1/H^2)( u_{i,j−1} − 2u_{i,j} + u_{i,j+1} )
for ∂²u/∂x² (t_i, x_j), for j = 1, 2, . . . , M − 1, and then take
  u_{i+1,j} := u_{i,j} + h δ²_x u_{i,j}.

This scheme is called an explicit method: if we know u_{i,j−1}, u_{i,j} and u_{i,j+1}, then we can explicitly calculate u_{i+1,j}. See Figure 2.3.

[Fig. 2.3: A Finite Difference Grid, with t (time) on the vertical axis and x (space) on the horizontal: if we know u_{i,j−1}, u_{i,j} and u_{i,j+1} we can calculate u_{i+1,j}.]

Unfortunately, as we will see in class, this method is not very stable. Unless we are very careful choosing h and H, huge errors occur in the approximation for larger i (time steps). Instead one might use an implicit method: if we know u_{i−1,j}, we compute u_{i,j−1}, u_{i,j} and u_{i,j+1} simultaneously. More precisely: solve
  u_{i,j} − h δ²_x u_{i,j} = u_{i−1,j}
for i = 1, then i = 2, i = 3, etc. Expanding the δ²_x term we get, for each i = 1, 2, 3, . . . , the set of simultaneous equations
  u_{i,0} = a(t_i),
  α u_{i,j−1} + β u_{i,j} + α u_{i,j+1} = u_{i−1,j}, j = 1, 2, . . . , M − 1,
  u_{i,M} = b(t_i),
where α = −h/H^2 and β = 2h/H^2 + 1. This can be expressed more clearly as the matrix-vector equation Ax = f, where

  A = [ 1   0   0   · · ·           0 ]
      [ α   β   α   0   · · ·       0 ]
      [ 0   α   β   α   · · ·       0 ]
      [           ⋱                   ]
      [ 0   · · ·       α   β   α    ]
      [ 0   · · ·           0   1    ],

  x = (u_{i,0}, u_{i,1}, . . . , u_{i,M})^T and f = (a(t_i), u_{i−1,1}, . . . , u_{i−1,M−1}, b(t_i))^T.

So "all" we have to do now is solve this system of equations. That is what the next section of the course is about.
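A sketch of how one implicit time step might be set up and solved in Matlab (using the backslash solver as a black box for now; the grid sizes and data are illustrative):

M = 10; L = pi; H = L/M; h = 0.01;       % illustrative grid
x = (0:M)'*H;
uold = sin(x);                           % previous time level, from g(x) = sin(x)
alpha = -h/H^2; beta = 2*h/H^2 + 1;
A = diag([1; beta*ones(M-1,1); 1]) ...
  + diag([alpha*ones(M-1,1); 0], -1) ...
  + diag([0; alpha*ones(M-1,1)], 1);     % the tridiagonal matrix above
f = uold; f(1) = 0; f(M+1) = 0;          % boundary values a = b = 0
unew = A\f;                              % solve A*unew = f for the new level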
2.7 Matlab 2: Euler's Method

In this session you'll extend your familiarity with Matlab, and use it to implement some numerical methods for IVPs and study their order of accuracy. You should be able to complete at least Section 2.7.8 today. To verify your participation in the lab, send your results to me by email by the end of today. We'll do Section 2.8 in the next lab.

2.7.1 Four ways to define a vector

We know that the most fundamental object in Matlab is a matrix. The simplest (nontrivial) example of a matrix is a vector. So we need to know how to define vectors. Here are several ways we could define the vector
  x = (0, 0.2, 0.4, . . . , 1.8, 2.0).  (2.7.1)

x = 0:0.2:2 % From 0 to 2 in steps of 0.2

x = linspace(0, 2, 11); % 11 equally spaced
                        % points with x(1)=0, x(11)=2.

x = [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, ...
     1.6, 1.8, 2.0]; % Define points individually

The last way is rather tedious, but this one is worse:

x(1)=0.0; x(2)=0.2, x(3)=0.4; x(4)=0.6; ...

We'll see a less tedious way of doing this last approach in Section 2.7.3 below.

2.7.2 Script files

Matlab is an interpreted environment: if you type a (correct) line of Matlab code and hit return, it will execute it immediately. For example, try
>> exp(1)
to get a decimal approximation of e. However, we usually want to string together a collection of Matlab operations and run them repeatedly. To do that, it is best to store these commands in a script file. This is done by making a file called, for example, Lab2.m and placing in it a series of Matlab commands. A script file is run from the Matlab command window by typing the name of the script, e.g.,
>> Lab2
Try putting some of the above commands for defining a vector into a script file and run it.

2.7.3 for-loops

When we want to run a particular set of operations a fixed number of times we use a for-loop. It works by iterating over a vector; at each iteration the iterand takes each value stored in the vector in turn. For example, here is another way to define the vector in (2.7.1):

for i=0:10 % 0:10 is the vector [0,1,...,10]
   x(i+1) = i*0.2;
end

2.7.4 Functions

In Matlab we can define a function in a way that is quite similar to the mathematical definition of a function. The syntax is
>> Name = @(Var)(Formula);
Examples:

f = @(x)(exp(x) - 2*x -1);
g = @(x)(log(2*x +1));

Now we can call, say, f(1) to compute e − 3. Note also that if x is a vector, so too is f(x). A more interesting example: try
>> xk = 1;
and then repeat the line
>> xk = g(xk)
Try this and observe that the values of xk seem to be converging. This is because we are using Fixed Point Iteration.

Later we'll need to know how to define functions of two variables. This can be done as:

F = @(y,t)(y./(1 + t.^2));

2.7.5 Plotting functions

Matlab has two ways to plot functions. The easiest way is to use a function called ezplot:

ezplot(f, [0, 2]);

plots the function f(x) for 0 ≤ x ≤ 2. A more flexible approach is to use the plot function, which can plot one vector as a function of another. Try the examples below, making sure you first have defined the vector x and the functions f and g:

figure(1); plot(x,f(x));
figure(2); plot(x,f(x), x, g(x), '--', x,x, '-.');

Can you work out what the syntax '--' and '-.' does? If not, ask a tutor. Also try

plot(x,f(x), 'g-o', x, g(x), 'r--x', ...
     x,x, '-.');
2.7.6 How to learn more

These notes are not an encyclopedic guide to Matlab – they have just enough information to get you started. There are many good references online. As an exercise, access Learning MATLAB by Tobin Driscoll through the NUI Galway library portal. Read Section 1.6: "Things about MATLAB that are very nice to know, but which often do not come to the attention of beginners".

2.7.7 Initial Value Problems

The particular example of an IVP that we'll look at in this lab is one that we had earlier in (2.2.4): estimate y(4) given that
  y′(t) = y/(1 + t^2) for t > 0, and y(0) = 1.
The true solution to this is y(t) = e^{arctan(t)}. If you don't want to solve this problem by hand, you could use Maple. The command is:

dsolve({D(y)(t)=y(t)/(1+t^2),y(0)=1},y(t));

2.7.8 Euler's Method

Euler's Method is:
• Choose n, the number of points at which you will estimate y(t). Let h = (t_n − t_0)/n, and t_i = t_0 + ih.
• For i = 0, 1, 2, . . . , n − 1, set y_{i+1} = y_i + h f(t_i, y_i). Then y(t_n) ≈ y_n.

As shown in Section 2.3.3, the global error for Euler's method can be bounded:
  |E_n| := |y(T) − y_n| ≤ Kh,
for some constant K that does not depend on h (or n). That is, if we halve h (i.e., double n), the error is reduced by half.

Download the Matlab script file Euler.m. It can be run in Matlab simply by typing
>> Euler
It implements Euler's method for n = 1. Read the file carefully and make sure you understand it. The program computes a vector y that contains the estimates for y at the time-values specified in the vector t. However, Matlab indexes all vectors from 1, and not 0. So t(1) = t_0, t(2) = t_1, . . . , t(n + 1) = t_n.

By changing the value of n, complete the table below. We want to use this table to verify that Euler's Method is 1st-order accurate. That is: |E_n| ≤ Kh^ρ with ρ = 1.

   n      y_n     E_n          ρ
   2      4.2     4.347e-01
   4      3.96    1.947e-01
   8
   16
   32
   64
   128
   256
   512

A computational technique for verifying the order of the method is to estimate ρ by
  ρ ≈ log_2( |E_n| / |E_{2n}| ).  (2.7.2)
Use the data in the table to verify that ρ ≈ 1 for Euler's Method.

2.7.9 More Matlab: formatted output

When you read the code for the Euler.m script file, you'll see that it uses the disp function to display messages and values. It is not a very flexible function. For example, it only prints one piece of information per line, and it doesn't display in scientific notation where appropriate. A better approach is to use the more flexible, if complicated, fprintf function. Its basic syntax is:

fprintf('Here is a message\n');

where the \n indicates a new line. Using fprintf to display the contents of a variable is a little more involved, and depends on how we want the data displayed. For example:
• To display an integer: fprintf('Using n=%d steps\n', n);
• To display a floating point number: fprintf('y(n)=%f\n', Y(n+1));
• To display in exponential notation: fprintf('Error=%e\n', Error);
• To display 3 decimal places: fprintf('y(n)=%.3f\n', Y(n+1)); or fprintf('Error=%.3e\n', Error);
Use these to improve the formatting of the output from your Euler.m script. To get more information on fprintf, type >> doc fprintf, or ask one of the tutors.
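A few lines of Matlab suffice to apply (2.7.2) once the errors have been collected. In this sketch the vector of errors is a placeholder that you would fill in from your own table:

E = [4.347e-01, 1.947e-01];             % placeholder: your errors for n = 2, 4, 8, ...
rho = log2( E(1:end-1) ./ E(2:end) );   % estimate (2.7.2) for each doubling of n
fprintf('rho = %.3f\n', rho);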
2.8 Matlab 3: RK methods

In this lab, we'll extend the code for Euler's method from Lab 2 to implement higher-order methods.

2.8.1 Runge-Kutta 2 Methods

The RK2 method is:
  y_{i+1} = y_i + h(a k_1 + b k_2),
where k_1 = f(t_i, y_i) and k_2 = f(t_i + αh, y_i + βh k_1). In Section 2.4, we saw that if we pick any b ≠ 0 and let
  a = 1 − b, α = 1/(2b), β = α,  (2.8.3)
then we get a second-order method: |E_n| ≤ Kh^2. If we choose b = 1, we get the so-called Modified or midpoint Euler Method from Section 2.4. Adapt the Euler.m script to implement this method, and complete the table below.

   n      y_n      E_n          ρ
   2      3.720    4.526e-02
   4      3.786    2.108e-02
   8
   16
   32
   64
   128
   256
   512

Important: send a summary of your numerical results, and your code for the Modified method, to [email protected] before the end of the lab.

The following exercise is part of a homework assignment. Come to lectures for more details. Also, have a look at Exercise 2.9.10.

Exercise 2.8.1. ⋆ Here is the tableau for a three-stage Runge-Kutta method, which we'll call RK3-1 (because that is its name).

   0   |
   2/3 | b_21
   α_2 | 1/3   b_32
       | 1/4   0     b_3

(i) (a) Assuming that the method is consistent, determine the value of b_3.
(b) Assuming that the method is exact when used to compute the solution to
  y(0) = 0, y′(t) = 2t, t ≥ 0,
determine α_2.
(c) Consider the initial value problem
  y(0) = 1, y′(t) = λy(t).
Using the fact that the solution is y(t) = e^{λt}, write out a Taylor series for y(t_{i+1}) about y(t_i), up to terms of order h^4 (note: use that h = t_{i+1} − t_i). Using the fact that RK3-1 should agree with the Taylor Series expansion up to terms of order h^3, determine b_21 and b_32.
(ii) Write out the equations for this method in the same form that RK4 was given in Section 2.5.1.
(iii) Write a Matlab program that implements this method. Test it for your own choice of a differential equation for which you know the exact solution. Now use this program to check the order of convergence of the method. If possible, have it compute the error for different values of n (e.g., n = 2, n = 4, n = 8, etc.). Then produce a log-log plot of these n against the corresponding errors. Explain how one should interpret the slope of this line, and why.
(iv) Write a version of the program that can implement a Runge-Kutta method, up to four stages, for any tableau. Find an example of an RK2 and an RK3 method that were not covered in lectures, and verify that your program works for these.

2.9 IVPs: Exercises

Exercise 2.9.1. For the following functions, show that they satisfy a Lipschitz condition on the corresponding domain, and give an upper bound for L:
(i) f(t, y) = 2yt^{−4} for t ∈ [1, ∞),
(ii) f(t, y) = 1 + t sin(ty) for 0 ≤ t ≤ 2.

Exercise 2.9.2. Many textbooks, instead of giving the Lipschitz condition (2.1.2), give the following: there is a finite, positive, real number L such that
  |∂f/∂y (t, y)| ≤ L for all (t, y) ∈ D.
Is this statement stronger than (i.e., more restrictive than), equivalent to, or weaker than (i.e., less restrictive than) statement (2.1.2)? Justify your answer.

Exercise 2.9.3. An important step in the proof of Proposition 2.3.3, which we didn't do in class, requires the observation that if
  |E_{i+1}| ≤ |E_i|(1 + hL) + h|T_i|,
then
  |E_i| ≤ (T/L)( (1 + hL)^i − 1 ), i = 0, 1, . . . , N.
Use induction to show that this is indeed the case.

Exercise 2.9.4. Suppose we use Euler's method to find an approximation for y(2), where y solves
  y(1) = 1, y′ = (t − 1) sin(y).
(i) Give an upper bound for the global error taking n = 4 (i.e., h = 1/4).
(ii) What n should you take to ensure that the global error is no more than 10^{−3}?
Exercise 2.9.5. As a special case in which the error of Euler's method can be analysed directly, consider Euler's method applied to
  y′(t) = y(t), y(0) = 1.
The true solution is y(t) = e^t.
(i) Show that the solution to Euler's method can be written as
  y_i = (1 + h)^{t_i/h}, i ≥ 0.
(ii) Show that
  lim_{h→0} (1 + h)^{1/h} = e.
This then shows that, if we denote by y_n(T) the approximation for y(T) obtained using Euler's method with n intervals between t_0 and T, then
  lim_{n→∞} y_n(T) = e^T.
Hint: let w = (1 + h)^{1/h}, so that log w = (1/h) log(1 + h). Now use l'Hôpital's rule to find lim_{h→0} w.

Exercise 2.9.6. A popular RK2 method, called the Improved Euler Method, is obtained by choosing α = 1 in (2.4.11).
(i) Use the Improved Euler Method to find an approximation for y(4) when y(0) = 1, y′ = y/(1 + t^2), taking n = 2. (If you wish, use Matlab.)
(ii) Using a diagram similar to the one above for the Modified Euler Method, justify the assertion that the Improved Euler Method is more accurate than the basic Euler Method.
(iii) Show that the method is consistent.
(iv) Write out what this method would be for the problem y′(t) = λy for a constant λ. How does this relate to the Taylor series expansion for y(t_{i+1}) about the point t_i?

Exercise 2.9.7. In his seminal paper of 1901, Carl Runge gave the following example of what we now call a Runge-Kutta 2 method:
  y_{n+1} = y_n + (h/4)( f(t_n, y_n) + 3 f(t_n + (2/3)h, y_n + (2/3)h f(t_n, y_n)) ).
(i) Show that it is consistent.
(ii) Show how this method fits into the general framework of RK2 methods.
(iii) Use it to estimate the solution at the point t = 2 to
  y(1) = 1, y′ = 1 + t + y/t,
taking n = 2 time steps.
(iv) Write down what this method is for the problem y′(t) = λy(t). How does this compare with the Taylor series for y(t_{i+1}) about the point t_i?

Exercise 2.9.8. We claim that, for RK4:
  |E_N| = |y(t_N) − y_N| ≤ Kh^4,
for some constant K. How could you verify that the statement is true in this instance (i.e., using the data of Table 2.3)? Give an estimate for K.

Exercise 2.9.9. Recall the problem in Example 2.3.4: estimate y(2) given that
  y(1) = 1, y′ = f(t, y) := 1 + t + y/t.
(i) Show that f(t, y) satisfies a Lipschitz condition and give an upper bound for L.
(ii) Use Euler's method with h = 1/4 to estimate y(2). Using the true solution, calculate the error.
(iii) Repeat this for the RK2 method, taking h = 1/2.
(iv) Use RK4 with h = 1 to estimate y(2).

Exercise 2.9.10. Here is the tableau for a three-stage Runge-Kutta method:

   0   |
   α_2 | 1/2
   1   | b_31  2
       | 1/6   b_2   1/6

1. Use the fact that the method is consistent to determine b_2.
2. The method is exact when used to compute the solution to y(0) = 0, y′(t) = 2t, t ≥ 0. Use this to determine α_2.
3. The method should agree with an appropriate Taylor series for the solution to y′(t) = λy(t), up to terms that are O(h^3). Use this to determine b_31.

Exercise 2.9.11. Write down the Euler Method for the following 3rd-order IVP:
  y‴ − y″ + 2y′ + 2y = x^2 − 1, y(0) = 1, y′(0) = 0, y″(0) = −1.

Exercise 2.9.12. Use a Taylor series to provide a derivation for the formula
  ∂²u/∂x² (t_i, x_j) ≈ (1/H^2)( u_{i,j−1} − 2u_{i,j} + u_{i,j+1} ).
2.10 Appendix

Formal derivation of RK2

This section is based on [1, p. 422]. We won't actually cover this in class, instead opting to deduce the same result in an easier, but unrigorous, way in Section 2.4.4. If we were to do it properly, this is how we would do it.

We will require that the Truncation Error (2.3.6) be second-order. More precisely, we want to be able to say that
  |T_n| = |(y(t_{n+1}) − y(t_n))/h − Φ(t_n, y(t_n); h)| ≤ Ch^2,
where C is some constant that does not depend on n or h. So the problem is to find expressions (using Taylor series) for both (y(t_{n+1}) − y(t_n))/h and Φ(t_n, y(t_n); h) that have only O(h^2) remainders. To do this we need to recall two ideas from 2nd-year calculus:
• To differentiate a function f(a(t), b(t)) with respect to t:
  df/dt = (∂f/∂a)(da/dt) + (∂f/∂b)(db/dt).
• The Taylor Series for a function of 2 variables, truncated at the 2nd term, is:
  f(x + η_1, y + η_2) = f(x, y) + η_1 ∂f/∂x (x, y) + η_2 ∂f/∂y (x, y) + C(max{η_1, η_2})^2,
for some constant C. See [1, p. 422] for details.

To get an expression for y(t_{n+1}) − y(t_n), use a Taylor Series:
  y(t_{n+1}) = y(t_n) + h y′(t_n) + (h^2/2) y″(t_n) + O(h^3)
           = y(t_n) + h f(t_n, y(t_n)) + (h^2/2) f′(t_n, y(t_n)) + O(h^3)
           = y(t_n) + h f(t_n, y(t_n)) + (h^2/2)( ∂f/∂t + f ∂f/∂y )(t_n, y(t_n)) + O(h^3).
This gives
  (y(t_{n+1}) − y(t_n))/h = f(t_n, y(t_n)) + (h/2)( ∂f/∂t + f ∂f/∂y )(t_n, y(t_n)) + O(h^2).  (2.10.4)

Next we expand the expression f(t_i + αh, y_i + βh f(t_i, y_i)) using a (two-dimensional) Taylor Series:
  f(t_n + αh, y_n + βh f(t_n, y_n)) = f(t_n, y_n) + hα ∂f/∂t (t_n, y_n) + hβ f(t_n, y_n) ∂f/∂y (t_n, y_n) + O(h^2).
This leads to the following expansion for Φ(t_n, y(t_n); h):
  Φ(t_n, y(t_n); h) = (a + b) f(t_n, y(t_n)) + h( bα ∂f/∂t + bβ f ∂f/∂y )(t_n, y(t_n)) + O(h^2).  (2.10.5)

So now, if we are to subtract (2.10.5) from (2.10.4) and leave only terms of O(h^2), we have to choose
  a + b = 1, bα = 1/2, bβ = 1/2.
That is: choose α, and let
  β = α, b = 1/(2α), a = 1 − b.  (2.10.6)
(For a more detailed exposition, see [1, Chap 12].)
Chapter 3
Solving Linear Systems

3.1 Introduction

This section of the course is concerned with solving systems of linear equations ("simultaneous equations"). All problems are of the form: find the set of real numbers x_1, x_2, . . . , x_n such that
  a_11 x_1 + a_12 x_2 + · · · + a_1n x_n = b_1
  a_21 x_1 + a_22 x_2 + · · · + a_2n x_n = b_2
  ⋮
  a_n1 x_1 + a_n2 x_2 + · · · + a_nn x_n = b_n,
where the a_ij and b_i are real numbers.

It is natural to rephrase this using the language of linear algebra: find x = (x_1, x_2, . . . , x_n)^T ∈ R^n such that
  Ax = b,  (3.1.1)
where A ∈ R^{n×n} is an n × n matrix and b ∈ R^n is a (column) vector with n entries.

In this section, we'll try to find clever ways of solving this system. In particular:
1. we'll argue that it's unnecessary and (more importantly) expensive to try to compute A^{−1};
2. we'll have a look at Gaussian Elimination, but will study it in the context of LU-factorisation;
3. after a detour to talk about matrix norms, we'll calculate the condition number of a matrix;
4. this last task will require us to be able to estimate the eigenvalues of matrices, so we will finish the module by studying how to do that.

3.1.1 A short review of linear algebra

• A vector x ∈ R^n is an ordered n-tuple of real numbers, x = (x_1, x_2, . . . , x_n)^T.
• A matrix A ∈ R^{m×n} is a rectangular array of n columns of m real numbers.
• We assume you know how to multiply a matrix by a vector, and a matrix by a matrix.
• You can write the scalar product of two vectors x and y as (x, y) = x^T y = Σ_{i=1}^{n} x_i y_i. (This is also called the "(usual) inner product" or the "dot product".)
• Recall that (Ax)^T = x^T A^T, so (x, Ay) = x^T A y = (A^T x)^T y = (A^T x, y).
• Letting I be the identity matrix, if there is a matrix B such that AB = BA = I, then we call B the inverse of A and write B = A^{−1}. If there is no such matrix, we say that A is singular.

Lemma 3.1.1. The following are equivalent:
1. For any b, Ax = b has a solution.
2. If there is a solution to Ax = b, it is unique.
3. For all x ∈ R^n, if Ax = 0 then x = 0.
4. The columns (or rows) of A are linearly independent.
5. There exists a matrix A^{−1} ∈ R^{n×n} such that AA^{−1} = I = A^{−1}A.
6. det(A) ≠ 0.

Item 5 of the above list suggests that we could solve (3.1.1) as follows: find A^{−1} and then compute x = A^{−1}b. However, this is not the best way to proceed: we only need x, and computing A^{−1} requires quite a bit of work.

3.1.2 The time it takes to compute det(A)

In first-year linear algebra courses, we would compute
  A^{−1} = (1/det(A)) A*,
where A* is the adjoint matrix. But it turns out that computing det(A) directly is time-consuming.

Let d_n be the number of multiplications required to compute the determinant of an n × n matrix using the Method of Minors (also known as "Laplace's Formula"). We know that if
  A = [ a  b ]
      [ c  d ]
then det(A) = ad − bc, so d_2 = 2. If
  A = [ a  b  c ]
      [ d  e  f ]
      [ g  h  j ]
then
  det(A) = a det[ e f ; h j ] − b det[ d f ; g j ] + c det[ d e ; g h ],
so d_3 = 3d_2. Next:

3.2 Gaussian Elimination

In the earlier sections of this course, we studied approximate methods for solving a problem: we replaced the problem (e.g., finding a zero of a nonlinear function) with an easier problem (finding the zero of a line) that has a similar solution. The method we'll use here, Gaussian Elimination¹, is exact: it replaces the problem with one that is easier to solve and has the same solution.

¹ Carl Friedrich Gauß, Germany, 1777–1855. Although he produced many very important original ideas, this wasn't one of them: the Chinese knew of "Gaussian Elimination" about 2000 years ago. His actual contributions included major discoveries in the areas of number theory, geometry, and astronomy.

3.2.1 The basic idea

Example 3.2.1. Consider the problem:
  [ −1   3  −1 ] [x_1]   [  5 ]
  [  3   1  −2 ] [x_2] = [  0 ]
  [  2  −2   5 ] [x_3]   [ −9 ]
We can perform a sequence of elementary row operations to yield the system:
  [ −1   3  −1 ] [x_1]   [  5 ]
  [  0   2  −1 ] [x_2] = [  3 ]
  [  0   0   5 ] [x_3]   [ −5 ]
In the latter form, the problem is much easier: from the 3rd row it is clear that x_3 = −1; substitute into row 2 to get x_2 = 1, and then into row 1 to get x_1 = −1.

3.2.2 Elementary row operations as matrix multiplication

Recall that in Gaussian elimination, at each step in the process we perform an elementary row operation, such as
  A = [ a_11  a_12  a_13 ]
      [ a_21  a_22  a_23 ]
      [ a_31  a_32  a_33 ]
being replaced by
  [ a_11              a_12              a_13              ]
  [ a_21 + µ_21 a_11  a_22 + µ_21 a_12  a_23 + µ_21 a_13  ] = A + µ_21 [ 0     0     0    ]
  [ a_31              a_32              a_33              ]            [ a_11  a_12  a_13 ]
                                                                       [ 0     0     0    ],
where µ_21 = −a_21/a_11. Because
  [ 0     0     0    ]   [ 0  0  0 ]
  [ a_11  a_12  a_13 ] = [ 1  0  0 ] A,
  [ 0     0     0    ]   [ 0  0  0 ]
we can write the row operation as (I + µ_21 E^(21))A, where E^(pq) is the matrix of all zeros, except for e_pq = 1. In general, each of the row operations in Gaussian Elimination can be written as
  (I + µ_pq E^(pq))A, where 1 ≤ q < p ≤ n,  (3.2.2)
and I + µ_pq E^(pq) is an example of a Unit Lower Triangular Matrix. As we will see, each step of the process involves multiplying A by a unit lower triangular matrix, eventually resulting in an upper triangular matrix. It turns out that triangular matrices have important properties. So we'll study these, and see how they relate to Gaussian Elimination.
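A quick numerical illustration of (3.2.2), using the matrix from Example 3.2.1 (a sketch — not how one would implement elimination in practice):

A = [-1 3 -1; 3 1 -2; 2 -2 5];   % matrix from Example 3.2.1
E21 = zeros(3); E21(2,1) = 1;    % E^(21): all zeros except e_21 = 1
mu21 = -A(2,1)/A(1,1);
B = (eye(3) + mu21*E21)*A;       % the row operation as matrix multiplication
disp(B)                          % B(2,1) is now zero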
3.2.3 Unit Lower Triangular and Upper Triangular Matrices

Definition 3.2.2. L ∈ R^{n×n} is a Lower Triangular Matrix (LTM) if the only non-zero entries are on or below the main diagonal, i.e., if l_ij = 0 for 1 ≤ i < j ≤ n. It is a Unit Lower Triangular Matrix (ULTM) if, in addition, l_ii = 1. Similarly, U ∈ R^{n×n} is an Upper Triangular Matrix (UTM) if u_ij = 0 for 1 ≤ j < i ≤ n, and a Unit Upper Triangular Matrix (UUTM) if u_ii = 1.

Example 3.2.3.

Triangular matrices have many important properties. One of these is: the determinant of a triangular matrix is the product of the diagonal entries (for a proof, see Exercise 3.11.2). Another is that the eigenvalues of a triangular matrix are just the diagonal entries (see Exercise 3.11.3). Some more properties are noted in Theorem 3.2.6 below. The style of proof used in that theorem recurs throughout this section of the course, and so is worth studying for its own sake. The key idea is to partition the matrix and then apply inductive arguments.

Definition 3.2.4. X is a submatrix of A if it can be obtained by deleting some rows and columns of A.

Definition 3.2.5. The Leading Principal Submatrix of order k of A ∈ R^{n×n} is the matrix A^(k) ∈ R^{k×k} obtained by deleting all but the first k rows and columns of A. (Simply put, it's the k × k matrix in the top left-hand corner of A.)

Next, recall that if A and V are matrices of the same size, and each is partitioned as
  A = [ B  C ]       V = [ W  X ]
      [ D  E ],          [ Y  Z ],
where B is the same size as W, C is the same size as X, etc., then
  AV = [ BW + CY   BX + CZ ]
       [ DW + EY   DX + EZ ].
Armed with this, we are now ready to state our theorem on lower triangular matrices.

Theorem 3.2.6 (Properties of Lower Triangular Matrices). For any integer n ≥ 2:
(i) If L_1 and L_2 are n × n Lower Triangular (LT) matrices, then so too is their product L_1 L_2.
(ii) If L_1 and L_2 are n × n Unit Lower Triangular (ULT) matrices, then so too is their product L_1 L_2.
(iii) An LT matrix L is nonsingular if and only if all the l_ii ≠ 0. In particular, all ULT matrices are nonsingular.
(iv) The inverse of an LTM is an LTM. The inverse of a ULTM is a ULTM.

In class, we'll cover the main ideas needed to prove part (iv) of Theorem 3.2.6, and for LTMs (the arguments for ULTMs are almost identical: if anything, a little easier). The other parts are left to Exercise 3.11.4. We restate part (iv) as follows:

Suppose that L ∈ R^{n×n} is a Lower Triangular Matrix (LTM), with n ≥ 2, and that there is a matrix L^{−1} ∈ R^{n×n} such that L^{−1}L = I_n. Then L^{−1} is also a Lower Triangular Matrix.

Proof. Here are the main ideas. Some more details are given in Section 3.2.4 below.

Theorem 3.2.7. Statements analogous to Theorem 3.2.6 hold for upper triangular and unit upper triangular matrices. (For a proof, see Exercise 3.11.5.)

3.2.4 Extra: some details on the proof of Theorem 3.2.6

The proof is by induction. First show that the statement is true for n = 2 by writing down the inverse of a 2 × 2 nonsingular lower triangular matrix. Next, assume that the result holds for k = 3, 4, . . . , n − 1. For the case where L ∈ R^{n×n}, partition L by the last row and column:
  L = [ L^(n−1)   0    ]
      [ r^T       l_nn ],
where L^(n−1) is the LT matrix with n − 1 rows and columns obtained by deleting the last row and column of L, 0 = (0, 0, . . . , 0)^T, and r ∈ R^{n−1}. Do the same for L^{−1}:
  L^{−1} = [ X    y ]
           [ z^T  β ],
and use the fact that their product is I_n:
  L L^{−1} = [ L^(n−1)X + 0z^T      L^(n−1)y + 0β   ]   [ I_{n−1}  0 ]
             [ r^T X + l_nn z^T     r^T y + l_nn β  ] = [ 0^T      1 ].
This gives:
• L^(n−1)X = I_{n−1}, so, by the inductive hypothesis, X is an LT matrix.
• L^(n−1)y = 0. Since L^(n−1) is invertible, we get that y = (L^(n−1))^{−1} 0 = 0.
• z^T = (−r^T X)/l_nn, which exists because, since det(L) ≠ 0, we know that l_nn ≠ 0.
• Similarly, β = (1 − r^T y)/l_nn.
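Parts (i)–(iv) of Theorem 3.2.6 are easy to sanity-check numerically; for instance (using random matrices, so the output varies from run to run):

L1 = tril(rand(4)); L2 = tril(rand(4));  % two random lower triangular matrices
disp(L1*L2)                              % product is again lower triangular
U = tril(rand(4), -1) + eye(4);          % a random *unit* lower triangular matrix
disp(inv(U))                             % inverse is again unit lower triangular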
3.3 LU-factorisation

In this section we'll see that applying Gaussian elimination to solve the linear system Ax = b is equivalent to factoring A as the product of a unit lower triangular (ULT) and an upper triangular (UT) matrix.

3.3.1 Factorising A

It was mentioned towards the end of Section 3.2 that each elementary row operation in Gaussian Elimination (GE) involves replacing A with (I + µ_rs E^(rs))A. But (I + µ_rs E^(rs)) is a unit lower triangular matrix. Also, when we are finished, we have an upper triangular matrix. So we can write the whole process as
  L_k L_{k−1} L_{k−2} · · · L_2 L_1 A = U,
where each of the L_i is a ULT matrix. But Theorem 3.2.6 tells us that the product of ULT matrices is itself a ULT matrix, so we can write the whole process as
  L̃A = U.
Theorem 3.2.6 also tells us that the inverse of a ULT matrix exists and is a ULT matrix. So we can write
  A = LU,
where L is unit lower triangular and U is upper triangular. This is called...

Definition 3.3.1. The LU-factorisation of a matrix A is a unit lower triangular matrix L and an upper triangular matrix U such that LU = A.

Example 3.3.2. If A = [ 3 2 ; −1 2 ], then:

Example 3.3.3. If A = [ 3 −1 1 ; 2 4 3 ; 0 2 −4 ], then:

3.3.2 A formula for LU-factorisation

In the above examples, we deduced the factorisation by inspection. But the process suggests an algorithm/formula. That is, we need to work out formulae for L and U where
  a_ij = (LU)_ij = Σ_{k=1}^{n} l_ik u_kj, 1 ≤ i, j ≤ n.
Since L and U are triangular:
  if i ≤ j, then a_ij = Σ_{k=1}^{i} l_ik u_kj;
  if j < i, then a_ij = Σ_{k=1}^{j} l_ik u_kj.
The first of these equations can be written as
  a_ij = Σ_{k=1}^{i−1} l_ik u_kj + l_ii u_ij.
But l_ii = 1, so:
  u_ij = a_ij − Σ_{k=1}^{i−1} l_ik u_kj,  i = 1, . . . , j, j = 1, . . . , n.  (3.3.3a)
And from the second:
  l_ij = (1/u_jj)( a_ij − Σ_{k=1}^{j−1} l_ik u_kj ),  i = 2, . . . , n, j = 1, . . . , i − 1.  (3.3.3b)

Example 3.3.4. Find the LU-factorisation of
  A = [ −1   0   1   2 ]
      [ −2  −2   1   4 ]
      [ −3  −4  −2   4 ]
      [ −4  −6  −5   0 ]

Details of Example 3.3.4: First, using (3.3.3a) with i = 1, we have u_1j = a_1j:
  U = [ −1  0     1     2    ]
      [  0  u_22  u_23  u_24 ]
      [  0  0     u_33  u_34 ]
      [  0  0     0     u_44 ].
Then (3.3.3b) with j = 1 gives l_i1 = a_i1/u_11:
  L = [ 1  0     0     0 ]
      [ 2  1     0     0 ]
      [ 3  l_32  1     0 ]
      [ 4  l_42  l_43  1 ].
Next, (3.3.3a) with i = 2 gives u_2j = a_2j − l_21 u_1j:
  U = [ −1   0   1     2    ]
      [  0  −2  −1     0    ]
      [  0   0   u_33  u_34 ]
      [  0   0   0     u_44 ],
and then (3.3.3b) with j = 2 gives l_i2 = (a_i2 − l_i1 u_12)/u_22:
  L = [ 1  0  0     0 ]
      [ 2  1  0     0 ]
      [ 3  2  1     0 ]
      [ 4  3  l_43  1 ].
Etc....

3.3.3 Existence of an LU-factorisation

Question: is this factorisation possible for all matrices? The answer is no. For example, suppose one of the u_ii = 0 in (3.3.3b). We'll now characterise the matrices which can be factorised.

Example 3.3.5.

To prove the next theorem we need the Binet-Cauchy Theorem: det(AB) = det(A) det(B).

Theorem 3.3.6. If n ≥ 2 and A ∈ R^{n×n} is such that every leading principal submatrix A^(k) is nonsingular for 1 ≤ k < n, then A has an LU-factorisation.
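The formulae (3.3.3) translate almost directly into Matlab. Here is a minimal sketch, essentially what the LUFactor.m file of Section 3.5 provides; this version does no pivoting, so it assumes the hypotheses of Theorem 3.3.6 hold.

A = [-1 0 1 2; -2 -2 1 4; -3 -4 -2 4; -4 -6 -5 0];  % matrix of Example 3.3.4
n = length(A);
L = eye(n); U = zeros(n);
for j = 1:n
   for i = 1:j      % build column j of U using (3.3.3a)
      U(i,j) = A(i,j) - L(i,1:i-1)*U(1:i-1,j);
   end
   for i = j+1:n    % build column j of L using (3.3.3b)
      L(i,j) = ( A(i,j) - L(i,1:j-1)*U(1:j-1,j) )/U(j,j);
   end
end
disp(L); disp(U);   % check: L*U should equal A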
3.4 Putting it all together

Now that we know how to construct the LU-factorisation of a matrix, we can use it to solve a linear system. We also need to consider the computational efficiency of the method.

3.4.1 Solving LUx = b

From Theorem 3.3.6 we can factorise A ∈ R^{n×n} (that satisfies the theorem's hypotheses) as A = LU. But our overarching goal is to solve the problem: "find x ∈ R^n such that Ax = b, for some b ∈ R^n". We do this by first solving Ly = b for y ∈ R^n, and then Ux = y. Because L and U are triangular, this is easy. The process is called back-substitution.

Example 3.4.1. Use LU-factorisation to solve
  [ −1   0   1   2 ] [x_1]   [ −2 ]
  [ −2  −2   1   4 ] [x_2] = [ −3 ]
  [ −3  −4  −2   4 ] [x_3]   [ −1 ]
  [ −4  −6  −5   0 ] [x_4]   [  1 ]
Solution: take the LU-factorisation from Example 3.3.4. Then...

3.4.2 Pivoting

Example 3.4.2. Suppose we want to compute the LU-factorisation of
  A = [ 0   2  −4 ]
      [ 2   4   3 ]
      [ 3  −1   1 ].
We can't compute l_21 because u_11 = 0. But if we swap rows 1 and 3, we get the matrix in Example 3.3.3, and so can form an LU-factorisation. This is like changing the order of the linear equations we want to solve. If
  P = [ 0 0 1 ]             [ 3  −1   1 ]
      [ 0 1 0 ], then PA =  [ 2   4   3 ]
      [ 1 0 0 ]             [ 0   2  −4 ].
This is called pivoting, and P is the permutation matrix.

Definition 3.4.3. P ∈ R^{n×n} is a Permutation Matrix if every entry is either 0 or 1 (it is a Boolean matrix) and all of its row and column sums are 1.

Theorem 3.4.4. For any A ∈ R^{n×n} there exists a permutation matrix P such that PA = LU. (For a proof, see p. 53 of the textbook.)

3.4.3 The computational cost

How efficient is the method of LU-factorisation for solving Ax = b? That is, how many computational steps (additions and multiplications) are required? In Section 2.6 of the textbook [1], you'll find a discussion that goes roughly as follows. Suppose we want to compute l_ij. From the formula (3.3.3), we see that this would take j − 2 additions, j − 2 multiplications, 1 subtraction and 1 division: a total of 2j − 2 operations. Recall that
  1 + 2 + · · · + k = k(k + 1)/2, and 1^2 + 2^2 + · · · + k^2 = k(k + 1)(2k + 1)/6.
So the number of operations required for computing L is
  Σ_{i=2}^{n} Σ_{j=1}^{i−1} (2j − 1) = Σ_{i=2}^{n} (i^2 − 2i + 1) = (1/6) n(n + 1)(2n + 1) − n(n + 1) + n ≤ Cn^3
for some C. A similar (slightly smaller) number of operations is required for computing U. (For a slightly different approach that yields cruder estimates but requires a little less work, have a look at Lecture 10 of Afternotes [3].)

This doesn't tell us how long a computer program will take to run, but it does tell us how the execution time grows with n. For example, if n = 100 and the program takes a second to execute, then for n = 1000 we'd expect it to take about a thousand seconds.

3.4.4 Towards an error analysis

Unlike with the other methods we have studied so far in this course, we shouldn't have to do an error analysis, in the sense of estimating the difference between the true solution and our numerical one: the LU-factorisation approach should give us exactly the true solution. However, things are not that simple. Unlike the methods in earlier sections, the effects of (inexact) floating-point computations become very pronounced. In the next section, we'll develop the ideas needed to quantify these effects.
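For completeness, the two triangular solves of §3.4.1 might be coded as follows. This is a sketch that assumes L and U have already been computed, for example by the snippet after Theorem 3.3.6 above.

b = [-2; -3; -1; 1];           % right-hand side from Example 3.4.1
n = length(b);
y = zeros(n,1); x = zeros(n,1);
for i = 1:n                    % forward substitution: solve L*y = b
   y(i) = b(i) - L(i,1:i-1)*y(1:i-1);          % uses l_ii = 1
end
for i = n:-1:1                 % back substitution: solve U*x = y
   x(i) = ( y(i) - U(i,i+1:n)*x(i+1:n) )/U(i,i);
end
disp(x)                        % x solves A*x = b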
3.5 Matlab 3: LU-factorisations

The LU-factorisation of a matrix is typically used to solve a linear system. However, we'll use it to compute the determinant of a matrix, and test whether this approach is more efficient than the "Method of Minors". The files mentioned below can be downloaded from http://www.maths.nuigalway.ie/MA385/lab4

3.5.1 How not to do it

In Section 3.1.2 we considered how to compute the determinant of an N × N matrix using the method of minors (a.k.a. "Laplace's Formula"). The algorithm is recursive:
• If N = 1, then det(A) = a_11. Otherwise
• det(A) = Σ_{i=1}^{N} (−1)^{i−1} a_1i det(A^(1,i)), where A^(1,i) is the submatrix/minor one gets by deleting the first row and i-th column of A.

This is implemented in the code MyDet.m. This is an example of a Matlab function file. A function file begins with the keyword function. It can be invoked within Matlab just by typing its name. Download and test it on some (small!) matrix. As we saw in class, the time it takes to run is proportional to N!. To see how long it takes to run in practice, you could try something like this:

>> A= rand(4) % Create a random 4x4 matrix
>> tic; d = MyDet(A); toc % Start the clock,
                          % compute det(A), stop the clock

The program TimeDet.m gives a more systematic way of doing this. Download it, read it and run it. Based on your results, answer the following questions.

Q1. From the data you have collected, predict how long it would take to compute the determinant of a 16 × 16 matrix.

Q2. What is the size of the largest matrix whose determinant you could compute in 10 minutes?

3.5.2 How to do it

As we've seen in recent lectures, one can factorise a matrix A into L and U, where L is a Unit Lower Triangular Matrix and U is an Upper Triangular Matrix. The algorithm for this is
  u_ij = a_ij − Σ_{k=1}^{i−1} l_ik u_kj,  i = 1, . . . , j, j = 1, . . . , n;
  l_ii = 1; l_ij = (1/u_jj)( a_ij − Σ_{k=1}^{j−1} l_ik u_kj ),  i = 2, . . . , n, j = 1, . . . , i − 1.
This is implemented in the Matlab program LUFactor.m. Download and test it.

Usually we use the LU-factorisation to solve a system of linear equations Ax = b, by first using back-substitution to solve Ly = b and then Ux = y. Instead, we will just use the factorisation to compute det(A). According to the Cauchy-Binet Theorem, det(XY) = det(X) det(Y). So
  det(A) = det(L) det(U).
Furthermore, the determinant of any triangular matrix is just the product of the entries on the diagonal, so
  det(L) = l_11 l_22 · · · l_NN = 1 and det(U) = u_11 u_22 · · · u_NN.
This suggests an algorithm for computing det(A) using its LU-factorisation. Moreover, we showed that the time required to compute the factorisation is bounded by CN^3. And N^3 ≪ N!.

Q3. Write a short Matlab function to compute det(A) by computing det(U).

Q4. Write a script similar to TimeDet.m that verifies that the time required for computing det(A) this way is bounded by CN^3. What is your estimate for C?

Q5. Based on your results from Q4, what is the largest value of N for which you can compute the determinant of an N × N matrix in less than 10 minutes?

Important: send your program from Q3, and your answers to Q2 and Q5, to me by email by the end of the class. ([email protected])

To understand the Matlab code provided, the following may be useful:
• tic starts a stopwatch timer; toc prints the number of seconds elapsed since tic was used.
• sum(x) computes the sum of the entries of the vector x.
• disp('Message') prints the word Message to the command window. disp(x) displays the contents of the variable x.
• [M,N]=size(A); sets M and N to be the number of rows and columns (respectively) in A.
• i == j checks if i = j. It does not set i = j. M~=N means M ≠ N.
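A minimal version of the kind of timing loop used in TimeDet.m might look like this (the sizes are illustrative, and the times will vary from machine to machine):

for N = 4:2:10
   A = rand(N);                    % a random N x N test matrix
   tic; d = MyDet(A); time = toc;  % time the method-of-minors code
   fprintf('N=%2d: det=%e, time=%.3f seconds\n', N, d, time);
end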
3.6 Norms, Condition Numbers, and Eigenvalues

Motivation: error analysis for linear solvers

As mentioned in Section 3.4.4, all computer implementations of algorithms that involve floating-point numbers (roughly, finite decimal approximations of real numbers) contain errors due to round-off. We saw this, for example, in labs when convergence of Newton's method stagnated once the approximation was within about 10⁻¹⁶ of the true solution. It transpires that computer implementations of LU-factorisation, and related methods, lead to these round-off errors being greatly magnified: this phenomenon is the main focus of this section of the course.

You might remember from earlier sections of the course that we had to assume functions were well-behaved, in the sense that

    |f(x) − f(y)| / |x − y| ≤ L

for some number L, so that our numerical schemes (e.g., fixed point iteration, Euler's method, etc.) would work. If a function doesn't satisfy a condition like this, we say it is "ill-conditioned". One of the consequences is that a small error in the inputs gives a large error in the outputs. We'd like to be able to express similar ideas about matrices: that A(u − v) = Au − Av is not too "large" compared to u − v. To do this we use the notion of a "norm" to describe the relative sizes of the vectors u and Au.

3.6.1 Three vector norms

When we want to consider the size of a real number, without regard to sign, we use the absolute value. Important properties of this function are:
1. |x| ≥ 0 for all x.
2. |x| = 0 if and only if x = 0.
3. |λx| = |λ||x|.
4. |x + y| ≤ |x| + |y| (the triangle inequality).
This notion can be extended to vectors and matrices.

Definition 3.6.1. Let R^n be the set of all vectors of length n of real numbers. The function ‖·‖ : R^n → R is called a norm on R^n if, for all u, v ∈ R^n,
1. ‖v‖ ≥ 0,
2. ‖v‖ = 0 if and only if v = 0,
3. ‖λv‖ = |λ|‖v‖ for any λ ∈ R,
4. ‖u + v‖ ≤ ‖u‖ + ‖v‖ (the triangle inequality).

A norm gives us some information about the size of a vector. But there are different ways of measuring that size: you could take the absolute value of the largest entry, you could look at the "distance" from the origin, etc. There are three important examples.

Definition 3.6.2. Let v ∈ R^n: v = (v₁, v₂, ..., v_{n−1}, v_n)^T.
(i) The 1-norm (also known as the taxi cab or Manhattan norm) is

    ‖v‖₁ = Σ_{i=1}^{n} |v_i|.

(ii) The 2-norm (a.k.a. the Euclidean norm) is

    ‖v‖₂ = ( Σ_{i=1}^{n} v_i² )^{1/2}.

Note that, if v is a vector in R^n, then v^T v = v₁² + v₂² + ··· + v_n² = ‖v‖₂².
(iii) The ∞-norm (also known as the max-norm) is

    ‖v‖∞ = max_{i=1,...,n} |v_i|.

Example 3.6.3. If v = (−2, 4, −4)^T then ‖v‖₁ = 10, ‖v‖₂ = 6 and ‖v‖∞ = 4.

In Figure 3.1, the first diagram shows the unit ball in R² given by the 1-norm: the vectors x = (x₁, x₂) with ‖x‖₁ = |x₁| + |x₂| = 1 all lie on the diamond (top left). In the second diagram, the vectors have √(x₁² + x₂²) = 1 and so are arranged in a circle (top right). The bottom diagram gives the unit ball in ‖·‖∞, for which the largest component of each vector is 1.

[Fig. 3.1: The unit vectors in R²: ‖x‖₁ = 1 (diamond), ‖x‖₂ = 1 (circle), ‖x‖∞ = 1 (square). Figure not reproduced here.]
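These values are easy to check in Matlab, whose built-in norm function implements all three vector norms (a quick sanity check of Example 3.6.3):

>> v = [-2; 4; -4];
>> [norm(v,1), norm(v,2), norm(v,inf)]    % returns 10, 6, 4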
It is easy to show that ‖·‖₁ and ‖·‖∞ are norms (see Exercise 3.11.11). And it is not hard to show that ‖·‖₂ satisfies conditions (1), (2) and (3) of Definition 3.6.1. But it takes a little bit of effort to show that ‖·‖₂ satisfies the triangle inequality. First we need:

Lemma 3.6.4 (Cauchy-Schwarz).

    | Σ_{i=1}^{n} u_i v_i | ≤ ‖u‖₂ ‖v‖₂   for all u, v ∈ R^n.

The proof can be found in any textbook on analysis. We can now apply it to show that

Lemma 3.6.5. ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂.

It follows directly that

Corollary 3.6.6. ‖·‖₂ is a norm.

3.7 Matrix Norms

Definition 3.7.1. Given any norm ‖·‖ on R^n, there is a subordinate matrix norm on R^{n×n} defined by

    ‖A‖ = max_{v ∈ R^n_*} ‖Av‖ / ‖v‖,    (3.7.1)

where A ∈ R^{n×n} and R^n_* = R^n \ {0}.

You might wonder why we define a matrix norm like this. The reason is that we like to think of A as an operator on R^n: if v ∈ R^n then Av ∈ R^n. So rather than the norm giving us information about the "size" of the entries of a matrix, it tells us how much the matrix can change the size of a vector.

3.7.1 Computing Matrix Norms

The formula for a subordinate matrix norm in Definition 3.7.1 is sensible, but not much use if we actually want to compute, say, ‖A‖₁, ‖A‖∞ or ‖A‖₂: for a given A, we'd have to calculate ‖Av‖/‖v‖ for all v, and there are rather a lot of them. Fortunately, there are some easier ways of computing the more important norms. We'll see that
• the ∞-norm of a matrix is just the largest absolute-value row sum;
• the 1-norm of a matrix is just the largest absolute-value column sum;
• the 2-norm of the matrix A is the square root of the largest eigenvalue of A^T A.

3.7.2 The max-norm on R^{n×n}

Theorem 3.7.2. For any A ∈ R^{n×n}, the subordinate matrix norm associated with ‖·‖∞ on R^n can be computed as

    ‖A‖∞ = max_{i=1,...,n} Σ_{j=1}^{n} |a_{ij}|.

A similar result holds for the 1-norm; the proof is left until Exercise 3.11.14.

Theorem 3.7.3.

    ‖A‖₁ = max_{j=1,...,n} Σ_{i=1}^{n} |a_{i,j}|.    (3.7.2)
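These row- and column-sum formulas are easy to verify numerically. For instance, using the matrix that appeared in Example 3.3.3 (a quick sketch; any matrix would do):

>> A = [3 -1 1; 2 4 3; 0 2 -4];
>> [max(sum(abs(A),2)), norm(A,inf)]   % largest row sum vs built-in: both 9
>> [max(sum(abs(A),1)), norm(A,1)]     % largest column sum vs built-in: both 8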
3.8 Computing ‖A‖₂

Computing the 2-norm of a matrix is a little harder than computing the 1- or ∞-norms. However, later we'll need estimates not just for ‖A‖, but also for ‖A⁻¹‖. And, unlike the 1- and ∞-norms, we can estimate ‖A⁻¹‖₂ without explicitly forming A⁻¹.

3.8.1 Eigenvalues

We begin by recalling some important facts about eigenvalues and eigenvectors.

Definition 3.8.1. Let A ∈ R^{n×n}. We call λ ∈ C an eigenvalue of A if there is a non-zero vector x ∈ C^n such that Ax = λx. We call any such x an eigenvector associated with λ.

Some properties of eigenvalues:
(i) If A is a real symmetric matrix (i.e., A = A^T), its eigenvalues and eigenvectors are all real-valued.
(ii) If λ is an eigenvalue of a nonsingular matrix A, then 1/λ is an eigenvalue of A⁻¹.
(iii) If x is an eigenvector associated with the eigenvalue λ, then so too is ηx for any non-zero scalar η.
(iv) An eigenvector may be normalised so that ‖x‖₂² = x^T x = 1.
(v) There are n eigenvalues λ₁, λ₂, ..., λ_n associated with the real symmetric matrix A. Let x⁽¹⁾, x⁽²⁾, ..., x⁽ⁿ⁾ be the associated normalised eigenvectors. Then the eigenvectors are linearly independent, and so form a basis for R^n. That is, any vector v ∈ R^n can be written as a linear combination:

    v = Σ_{i=1}^{n} α_i x⁽ⁱ⁾.

(vi) Furthermore, these eigenvectors are orthonormal:

    (x⁽ⁱ⁾)^T x⁽ʲ⁾ = 1 if i = j, and 0 if i ≠ j.

3.8.2 Singular values

The singular values of a matrix A are the square roots of the eigenvalues of A^T A. They play a very important role in matrix analysis and in areas of applied linear algebra, such as image and text processing. Our interest here is in their relationship to ‖A‖₂.

Lemma 3.8.2. For any matrix A, the eigenvalues of A^T A are real and non-negative.

Proof: ...

The importance of this result is that the square root of an eigenvalue of A^T A is real and non-negative. Furthermore, part of the above proof involves showing that, if A^T A x = λx, then

    √λ = ‖Ax‖₂ / ‖x‖₂.

This at the very least tells us that

    ‖A‖₂ := max_{x ∈ R^n_*} ‖Ax‖₂ / ‖x‖₂ ≥ max_{i=1,...,n} √λ_i.

With a bit more work, we can show that if λ₁ ≤ λ₂ ≤ ··· ≤ λ_n are the eigenvalues of B = A^T A, then ‖A‖₂ = √λ_n.

Theorem 3.8.3. Let A ∈ R^{n×n}, and let λ₁ ≤ λ₂ ≤ ··· ≤ λ_n be the eigenvalues of B = A^T A. Then

    ‖A‖₂ = max_{i=1,...,n} √λ_i = √λ_n.

Proof: ...

3.9 Condition Numbers

3.9.1 Consistency of matrix norms

It should be clear from (3.7.1) that, if ‖·‖ is a subordinate matrix norm, then for any u ∈ R^n and A ∈ R^{n×n},

    ‖Au‖ ≤ ‖A‖ ‖u‖.

This is an important result: we'll need it later. There is an analogous statement for the product of two matrices:

Definition 3.9.1. A matrix norm ‖·‖ is consistent if ‖AB‖ ≤ ‖A‖‖B‖ for all A, B ∈ R^{n×n}.

Theorem 3.9.2. Any subordinate matrix norm is consistent.

The proof is left to Exercise 3.11.17. You should note that there are matrix norms which are not consistent; see Exercise 3.11.18.

3.9.2 A short note on computer representation of numbers

This course is concerned with the analysis of numerical methods for solving problems. It is implicit that these methods are implemented on a computer. However, computers are finite machines: there are limits on the amount of information they can store. In particular, there are limits on the number of digits that can be stored for each number, and on the size of the numbers that can be stored. Because of this, just working with a computer introduces an error.

Modern computers don't store numbers in decimal (base 10), but as binary (base 2) "floating point numbers" of the form ±a × 2^{b−1024}. Most often 8 bytes (64 bits, or binary digits) are used to store the three components of this number: the sign, a (the "significand" or "mantissa"), and b. One bit is used for the sign, 52 for a, and 11 for b. This is called double precision. Note that a gives roughly 16 decimal digits. (Some older computer systems use single precision, where a has 23 bits (giving about 8 decimal digits) and b has 8; so too do many new GPU-based systems.)

When we try to store a real number x on a computer, we actually store the nearest floating-point number. That is, we end up storing x + δx, where |δx| is the "round-off" error and |δx|/|x| is the relative error. Since this is not a course on computer architecture, we'll simplify a little and just take it that single and double precision systems lead to relative errors of about 10⁻⁸ and 10⁻¹⁶ respectively.

3.9.3 Condition Numbers

(Please refer to pp. 68–70 of [1] for a thorough development of the concepts of local condition number and relative local condition number.)

Definition 3.9.3. The condition number of a (nonsingular) matrix A is κ(A) = ‖A‖ ‖A⁻¹‖. If κ(A) ≫ 1 then we say A is ill-conditioned.
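Matlab can evaluate Definition 3.9.3 directly for small examples. As a quick illustration (our choice of example: the built-in hilb generates the Hilbert matrices, a standard family of very ill-conditioned matrices):

>> A = hilb(6);                            % the 6x6 Hilbert matrix
>> kappa = norm(A,inf)*norm(inv(A),inf)    % Definition 3.9.3, computed directly
>> cond(A,inf)                             % Matlab's built-in gives the same value

Forming inv(A) is fine for a toy example like this but, as we'll see below, in practice one estimates κ(A) without computing the inverse.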
We now come to the most important theorem of this section. It tells us how the condition number of a matrix is related to (numerical) error.

Theorem 3.9.4. Suppose that A ∈ R^{n×n} is nonsingular and that b, x ∈ R^n are non-zero. If Ax = b and A(x + δx) = b + δb, then

    ‖δx‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖.

Theorem 3.9.4 means that, if we are solving the system Ax = b but, because of (e.g.) round-off error, there is an error in the right-hand side, then the relative error in the solution is bounded by the condition number of A multiplied by the relative error in the right-hand side.²

² The importance of this result is that it shows that the numerical error in solving a linear system can depend strongly on the error introduced just by using floating point approximations of the entries in the right-hand side vector, rather than on the solution process itself. This important observation, and the analysis above, can easily be extended to the case where, instead of solving A(x + δx) = b + δb, we solve (A + δA)(x + δx) = b + δb; that is, where the coefficient matrix also contains errors due to round-off. A famous example where this occurs is the Hilbert Matrix. See the nice blog post by Cleve Moler on the issue: Cleve's Corner, Feb 2nd, 2013.

Example 3.9.5. Suppose we are using a computer to solve Ax = b where

    A = [ 10     12              b = [ 1
           0.08   0.1 ]   and          1 ],

but, due to round-off error, the right-hand side has a relative error (in the ∞-norm) of 10⁻⁶. Then κ∞(A) = ‖A‖∞ ‖A⁻¹‖∞ = 22 × 302.5 = 6655, so the relative error in the computed solution could be as large as 6655 × 10⁻⁶ ≈ 6.7 × 10⁻³.

For every matrix norm we get a different condition number. Consider the following example.

Example 3.9.6. Let A be the n × n matrix

    A = [ 1  0  0  ...  0
          1  1  0  ...  0
          1  0  1  ...  0
          ...
          1  0  0  ...  1 ].

What are κ₁(A) and κ∞(A)?

3.9.4 Estimating κ₂

In the previous example we computed κ(A) by finding the inverse of A. In general, that is not practical. However, we can estimate the condition number of a matrix; in particular, this is usually possible for κ₂(A). Recall that Theorem 3.8.3 tells us that ‖A‖₂ = √λ_n, where λ_n is the largest eigenvalue of B = A^T A. Furthermore, we can show that

    κ₂(A) = (λ_n/λ₁)^{1/2},

where λ₁ and λ_n are, respectively, the smallest and largest eigenvalues of B = A^T A.

Hint: first show that A^T A and A A^T are "similar", i.e., they have the same eigenvalues. This can be done by noting that, if A^T A x = λx, then A(A^T A)x = (A A^T)(Ax) = λ(Ax). That is, if λ is an eigenvalue of A^T A with corresponding eigenvector x, then λ is also an eigenvalue of A A^T with corresponding eigenvector Ax. Now use this, and the fact that (X^T)⁻¹ = (X⁻¹)^T for any nonsingular matrix X, to prove the result.
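For a small matrix, this estimate can be carried out directly in Matlab (a sketch; the 3 × 3 matrix is just a convenient nonsingular example):

>> A = [3 -1 1; 2 4 3; 0 2 -4];
>> lam = eig(A'*A);           % eigenvalues of B = A'A: real and non-negative
>> sqrt(max(lam)/min(lam))    % kappa_2(A) = (lambda_n/lambda_1)^(1/2)
>> cond(A,2)                  % Matlab's built-in agrees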
3.10 Gerschgorin's theorems

The goal of this final section is to learn a simple, but very useful, approach to estimating the eigenvalues of a matrix. It can be used, for example, when computing ‖A‖₂ and κ₂(A). It involves two theorems which together provide quick estimates of the eigenvalues of a matrix.

3.10.1 Gerschgorin's First Theorem

(See Section 5.4 of [1].)

Definition 3.10.1. Given a matrix A ∈ R^{n×n}, the Gerschgorin³ discs D_i are the discs in the complex plane with centre a_{ii} and radius

    r_i = Σ_{j=1, j≠i}^{n} |a_{ij}|.

So D_i = {z ∈ C : |a_{ii} − z| ≤ r_i}.

Theorem 3.10.2 (Gerschgorin's First Theorem). All the eigenvalues of A are contained in the union of the Gerschgorin discs.

Warning: an earlier draft of this proof was mostly nonsense.

Proof. Let λ be an eigenvalue of A, so Ax = λx for the corresponding eigenvector x. Suppose that x_i is the entry of x with the largest absolute value; that is, |x_i| = ‖x‖∞. Looking at the ith entry of the vector Ax, we see that

    (Ax)_i = λx_i   ⟹   Σ_{j=1}^{n} a_{ij} x_j = λx_i.

This can be rewritten as a_{ii}x_i + Σ_{j=1, j≠i}^{n} a_{ij}x_j = λx_i, which gives

    (a_{ii} − λ)x_i = − Σ_{j=1, j≠i}^{n} a_{ij} x_j.

By the triangle inequality,

    |a_{ii} − λ| |x_i| = | Σ_{j=1, j≠i}^{n} a_{ij} x_j | ≤ Σ_{j=1, j≠i}^{n} |a_{ij}| |x_j| ≤ |x_i| Σ_{j=1, j≠i}^{n} |a_{ij}|,

since |x_i| ≥ |x_j| for all j. Dividing by |x_i| gives

    |a_{ii} − λ| ≤ Σ_{j=1, j≠i}^{n} |a_{ij}|,

as required.

³ Semyon Aranovich Gerschgorin, 1901–1933, Belarus. The ideas here appeared in a paper in 1931. They were subsequently popularised by others, most notably Olga Taussky, the pioneering matrix theorist (A recurring theorem on determinants, The American Mathematical Monthly, vol. 56, pp. 672–676, 1949).

Note that we made no assumption on the symmetry of A. However, if A is symmetric, then its eigenvalues are real and so the theorem can be simplified.

Example 3.10.3. Let

    A = [  4   1   0
          -2  -3   2
           1   0  -2 ].

3.10.2 Gerschgorin's 2nd Theorem

The first theorem proves that every eigenvalue is contained in a disc. The converse, that every disc contains an eigenvalue, is not true in general. Instead we have the following result.

Theorem 3.10.4 (Gerschgorin's Second Theorem). If k of the discs are disjoint from the others (i.e., their union has an empty intersection with the remaining discs), then their union contains k eigenvalues.

Proof. We won't do the proof in class, and you are not expected to know it. Here is a sketch of it: let B(ε) be the matrix with entries

    b_{ij} = a_{ij} if i = j,   and   b_{ij} = ε a_{ij} if i ≠ j.

So B(1) = A, and B(0) is the diagonal matrix whose entries are the diagonal entries of A. The eigenvalues of B(0) are exactly its diagonal entries, and so (obviously) coincide with the centres of the Gerschgorin discs of A. The eigenvalues of B(ε) are the zeros of the characteristic polynomial det(B(ε) − λI). Since the coefficients of this polynomial depend continuously on ε, so too do the eigenvalues. Now, as ε varies from 0 to 1, the eigenvalues of B(ε) trace paths in the complex plane, and at the same time the radii of the Gerschgorin discs of B(ε) increase from 0 to the radii of the discs of A. If a particular eigenvalue was in a certain disc for ε = 0, the corresponding eigenvalue is in the corresponding disc for all ε. Thus, if one of the discs of A is disjoint from the others, it must contain an eigenvalue. The same reasoning applies if k of the discs of A are disjoint from the others: their union must contain k eigenvalues.

3.10.3 Using Gerschgorin's theorems

Example 3.10.5. Locate the regions containing the eigenvalues of

    A = [ -3   1   2
           1   4   0
           2   0  -6 ].

(The actual eigenvalues are approximately −7.018, −2.130 and 4.144.)

Example 3.10.6. Use Gerschgorin's theorems to find upper and lower bounds for the singular values of the matrix

    A = [ 4  -1   2
          2   3   1
          1   1   4 ].

Example 3.10.7. Let D ∈ R^{n×n} be the matrix with the same diagonal entries as A, and zeros for all the off-diagonals. That is, D = diag(a_{1,1}, a_{2,2}, ..., a_{n,n}). Some iterative schemes (e.g., Jacobi's and Gauss-Seidel) for solving systems of equations involve successive multiplication by the matrix T = D⁻¹(A − D). Proving that these methods work often involves showing that, if λ is an eigenvalue of T, then |λ| < 1. Using Gerschgorin's theorem, we can show that this is indeed the case if A is strictly diagonally dominant.
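The disc centres and radii are easy to compute in Matlab. For instance, for the matrix of Example 3.10.5 (a quick sketch):

>> A = [-3 1 2; 1 4 0; 2 0 -6];
>> centres = diag(A)
>> radii = sum(abs(A),2) - abs(centres)   % r_i: off-diagonal row sums
>> eig(A)                                 % each eigenvalue lies in some disc

Here the discs are centred at −3, 4 and −6, with radii 3, 1 and 2, and you can check that each computed eigenvalue does indeed lie in their union.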
3.11 Exercises

Exercise 3.11.1. Show that det(σA) = σⁿ det(A) for any A ∈ R^{n×n} and any scalar σ ∈ R. Note: the purpose of this exercise is not just to help develop skills at establishing important properties of matrices. It also gives us another reason to avoid trying to calculate the determinant of the coefficient matrix in order to find the inverse, and thus the solution to the problem. For example, if A ∈ R^{10×10} and the system is rescaled by σ = 0.1, then det(A) is rescaled by 10⁻¹⁰: it appears almost singular!

Exercise 3.11.2. Let L be a lower triangular n × n matrix. Show that

    det(L) = Π_{j=1}^{n} l_{jj}.

Hence give a necessary and sufficient condition for L to be invertible. What does that tell us about unit lower triangular matrices?

Exercise 3.11.3. Let L be a lower triangular n × n matrix. Show that each diagonal entry l_{jj} of L is an eigenvalue of L.

Exercise 3.11.4. Prove Parts (i)–(iii) of Theorem 3.2.6.

Exercise 3.11.5. Prove Theorem 3.2.7.

Exercise 3.11.6. Suppose the n × n matrices A and C are both LTMs, and that there is an n × n matrix B such that AB = C. Must B be an LTM? Suppose A and C are ULTMs. Must B be a ULTM? Why?

Exercise 3.11.7. Construct an alternative proof of Theorem 3.2.6 (iv) as follows: suppose that L is a non-singular lower triangular matrix. Show that, if b ∈ R^n is such that b_i = 0 for i = 1, ..., k ≤ n, and y solves Ly = b, then y_i = 0 for i = 1, ..., k. (Hint: partition L by the first k rows and columns.)

Exercise 3.11.8. Many textbooks and computing systems compute the factorisation A = LDU, where L and U are unit lower and unit upper triangular matrices respectively, and D is a diagonal matrix. Show that such a factorisation exists.

Exercise 3.11.9. Suppose that A ∈ R^{n×n} is nonsingular and symmetric, and that every leading principal submatrix of A is nonsingular. Use Exercise 3.11.8 to show that A can be factorised as A = LDL^T. How would this factorisation be used to solve Ax = b?

The following exercise was added after the notes were originally posted online.

Exercise 3.11.10. In Section 3.5 you were provided with a Matlab function LUFactor which, given a matrix A, returns the LU-factors. Write a Matlab function/script which can be used to solve a linear system. Your function should
(i) take as inputs a matrix A ∈ R^{n×n} and a vector b ∈ R^n,
(ii) compute the LU-factorisation using the provided code,
(iii) use forward substitution to solve Ly = b,
(iv) use back-substitution to solve Ux = y,
(v) return the solution x.
Your program should also:
(a) be accompanied by a piece of code that verifies your approach can solve a 20 × 20 linear system Ax = b;
(b) verify that it works by computing the residual, ‖b − Ax‖. In Matlab, this is norm(b - A*x).

Exercise 3.11.11. Show that ‖·‖₁ and ‖·‖∞ are norms on R^n.

Exercise 3.11.12. Show that ‖·‖₂ on R^n satisfies conditions (1)–(3) in Definition 3.6.1.

Exercise 3.11.13. Show that, for any subordinate matrix norm on R^{n×n}, the norm of the identity matrix is 1.

Exercise 3.11.14. Prove Theorem 3.7.3. Hint: suppose that

    Σ_{i=1}^{n} |a_{ij}| ≤ C,   j = 1, 2, ..., n.

Show that, for any vector x ∈ R^n,

    Σ_{i=1}^{n} |(Ax)_i| ≤ C ‖x‖₁.

Now find a vector x such that Σ_{i=1}^{n} |(Ax)_i| = C‖x‖₁. Deduce (3.7.2).

Exercise 3.11.15. Show that, for any vector x ∈ R^n, ‖x‖∞ ≤ ‖x‖₂ and ‖x‖₂² ≤ ‖x‖₁‖x‖∞. For each of these inequalities, give an example for which equality holds. Deduce that ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁.

Exercise 3.11.16. Show that if x ∈ R^n, then ‖x‖₁ ≤ n‖x‖∞ and that ‖x‖₂ ≤ √n ‖x‖∞.

Exercise 3.11.17. Prove that all subordinate matrix norms are consistent.

Exercise 3.11.18. One might think it intuitive to define the "max" norm of a matrix as follows:

    ‖A‖_max = max_{i,j} |a_{ij}|.

Show that this is indeed a norm on R^{n×n}. Show, however, that it is not consistent.

Exercise 3.11.19. Let A be the matrix

    A = [ 2  0  0  0  0
          1  2  0  0  0
          1  0  2  0  0
          1  0  0  2  0
          1  0  0  0  2 ].

What are κ₁(A) and κ∞(A)? (Hint: you'll need to find A⁻¹. Recall that the inverse of a lower triangular matrix is itself lower triangular.) Note: to do this using Maple:

> with(linalg):
> n := 5;
> f := (i,j) -> piecewise( (i=j), 2, (j=1), 1 );
> A := matrix(5,5, f);
> evalf(cond(A,1)); evalf(cond(A,2)); evalf(cond(A,infinity));
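In Matlab, the analogous check is (a sketch; the exercise itself asks for the exact values, worked by hand):

>> n = 5;
>> A = 2*eye(n);  A(2:n,1) = 1;    % the matrix of Exercise 3.11.19
>> [cond(A,1), cond(A,inf)]        % numerical check of your hand calculation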
Exercise 3.11.20. Let A be the matrix

    A = [  0.1   0     0
          10     0.1  10
           0     0     0.1 ].

Compute κ∞(A). Suppose we wish to solve the system of equations Ax = b on a single-precision computer system (i.e., the relative error in any stored number is approximately 10⁻⁸). Give an upper bound on the relative error in the computed solution x.

Exercise 3.11.21. A real matrix A = {a_{i,j}} is strictly diagonally dominant if

    |a_{ii}| > Σ_{j=1, j≠i}^{n} |a_{i,j}|   for i = 1, ..., n.

Show that all strictly diagonally dominant matrices are nonsingular.

Exercise 3.11.22. Let

    A = [  4  -1   0
          -1   4  -1
           0  -1  -3 ].

Use Gerschgorin's theorems to give an upper bound for κ₂(A).

Exercise 3.11.23. The problem with the matrix in Example 3.9.5 is that the magnitudes of the entries in the first and second rows of A are very different. This means that det(A) is quite small, and thus that ‖A⁻¹‖∞ is quite large. Let D = diag(A); i.e., D is a matrix of the same size as A with

    d_{ij} = a_{ij} if i = j,   and   d_{ij} = 0 if i ≠ j.

For this problem, what is the condition number (in the ∞-norm) of the matrix D⁻¹A?