MA385 (Numerical Analysis I)
Niall Madden
October 28, 2014
Contents

0 MA385: Preliminaries
  0.1 Introduction
  0.2 Taylor's Theorem

1 Solving nonlinear equations
  1.1 Bisection
  1.2 The Secant Method
  1.3 Newton's Method
  1.4 Fixed Point Iteration
  1.5 Matlab 1: the bisection and secant methods
  1.6 Nonlinear equations: Exercises
  1.7 Appendix

2 Initial Value Problems
  2.1 Introduction
  2.2 One step methods
  2.3 Error Analysis
  2.4 Runge-Kutta 2 (RK2)
  2.5 Higher-order methods
  2.6 From IVPs to Linear Systems
  2.7 Matlab 2: Euler's Method
  2.8 Matlab 3: RK methods
  2.9 IVPs: Exercises
  2.10 Appendix

3 Solving Linear Systems
  3.1 Introduction
  3.2 Gaussian Elimination
  3.3 LU-factorisation
  3.4 Putting it all together
  3.5 Matlab 3: LU-factorisations
  3.6 Norms, Condition Numbers, and Eigenvalues
  3.7 Matrix Norms
  3.8 Computing ||A||_2
  3.9 Condition Numbers
  3.10 Gerschgorin's theorems
  3.11 Exercises
Chapter 0
MA385: Preliminaries

0.1 Introduction

0.1.1 Welcome to MA385/MA530

This is a Semester 1, two-part, upper level module on numerical analysis. It is often taken in conjunction with MA378/MA531. You may be taking this course if you are enrolled in 3rd year Mathematics, 3rd year Mathematical Science, 4th year Financial Mathematics, 4th year Applied Mathematics, the H.Dip. in Applied Science (Maths), or year one of the two-year MA program.

The basic information for the course is as follows:

Lecturer: Dr Niall Madden, School of Maths. My office is in room ADB-1013, Áras de Brún. Email: [email protected]

Lectures: Tuesday and Thursday, 2.00–2.50. See website for venues.

Tutorial/Lab: TBA (to begin during Week 3).

Web site: The on-line content for the course will be hosted at NUIGalway.BlackBoard.com and on the Mathematics webserver at http://www.maths.nuigalway.ie/MA385. There you'll find various pieces of information, including these notes, problem sets, announcements, etc. If you are registered for MA385, you should be automatically enrolled onto the Blackboard site. If you are enrolled in MA530, please send me an email.

Assessment:
• Three assignments that combine written problem sets with lab work and participation. Each is worth 10%.
• A 2-hour exam in December, worth 70%.

These notes are a synopsis of the course material. My aim is to provide them in 4 or 5 sections, and always in advance of the class. They contain most of the main remarks, statements of theorems, results and exercises. However, they will not contain proofs of theorems, examples, solutions to exercises, etc. Please let me know of the typos and mistakes that you spot.

The main text-book is Süli and Mayers, An Introduction to Numerical Analysis, Cambridge University Press [1]. This is available from the library at 519.4 MAY, and there are copies in the book shop. It is very well suited to this course: though it does not over-complicate the material, it approaches topics with a reasonable amount of rigour. There is a good selection of interesting problems. The scope of the book is almost perfect for the course, especially for those students taking both semesters. You should buy this book.

Other useful books include:
• G.W. Stewart, Afternotes on Numerical Analysis, SIAM [3]. In the library at 519.4 STE. Moreover, the full text is freely available online to NUI Galway users! This book is very easy to read, and students who would enjoy a bit more discussion at an easier pace might turn to it.
• Cleve Moler, Numerical Computing with MATLAB [2]. The emphasis is on the implementation of algorithms in Matlab, but the techniques are well explained and there are some nice exercises. Also, it is freely available online.
• James F Epperson, An introduction to numerical methods and analysis [5]. There are five copies in the library at 519.4.
• Michelle Schatzman, Numerical Analysis [8].
• Stoer and Bulirsch, Introduction to Numerical Analysis [6]. A very complete reference for this course.
• Quarteroni, et al., Numerical Mathematics [7]. Also very thorough, and with sample Matlab code.

You should try to bring these notes to class. It will make following the lecture easier, and you'll know what notes to take down.
0.1.2 What is Numerical Analysis?

Numerical analysis is the design, analysis and implementation of numerical methods that yield exact or approximate solutions to mathematical problems. It does not involve long, tedious calculations. We won't (usually) implement Newton's Method by hand, or manually do the arithmetic of Gaussian Elimination, etc.

The Design of a numerical method is perhaps the most interesting part; it's often about finding a clever way of swapping the problem for one that is easier to solve, but has the same or a similar solution. If the two problems have the same solution, then the method is exact. If they are similar, then it is approximate.

The Analysis is the mathematical part; it usually culminates in proving a theorem that tells us (at least) one of the following:
• the method will work: that our algorithm will yield the solution we are looking for;
• how much effort is required;
• if the method is approximate, how close the approximate solution will be to the true one. A description of this aspect of the course, to quote Epperson [5], is being "rigorously imprecise or approximately precise".

We'll look at the implementation of the methods in labs.

Topics

1. We'll start with a review of Taylor's theorem. It is central to the algorithms of the following sections.
2. Root-finding and solving non-linear equations.
3. Initial value ordinary differential equations.
4. Matrix Algorithms I: systems of linear equations.
5. Matrix Algorithms II: eigenvalues and eigenvectors.

We will also see how these methods can be applied to real-world problems, including Financial Mathematics.

Learning outcomes

When you have successfully completed this course, you will be able to demonstrate your factual knowledge of the core topics (root-finding, solving ODEs, solving linear systems, estimating eigenvalues), using appropriate mathematical syntax and terminology. Moreover, you will be able to describe the fundamental principles of the concepts (e.g., Taylor's Theorem) underpinning Numerical Analysis. Then, you will apply these principles to design algorithms for solving mathematical problems, and discover the properties of these algorithms. Students will gain the ability to use Matlab to implement these algorithms, and adapt the codes for more general problems, and for new techniques.

Mathematical Preliminaries

Anyone who can remember their first and second years of analysis and algebra should be able to handle this course. Students who know a little about differential equations (initial value and boundary value) will find certain sections (particularly in Semester II) somewhat easier than those who haven't.

If it's been a while since you covered basic calculus, you will find it very helpful to revise the following: the Intermediate Value Theorem; Rolle's Theorem; the Mean Value Theorem; Taylor's Theorem; and the triangle inequality: |a + b| ≤ |a| + |b|. You'll find them in any good text book, e.g., Appendix 1 of Süli and Mayers.

You'll also find it helpful to recall some basic linear algebra, particularly relating to eigenvalues and eigenvectors. Consider the statement: "all the eigenvalues of a real symmetric matrix are real". If you are unsure of the meaning of any of the terms used, or if you didn't know that it's true, you should have a look at a book on Linear Algebra.

0.1.3 Why take this course?

Many industry and academic environments require graduates who can solve real-world problems using a mathematical model, but these models can often only be resolved using numerical methods. To quote one Financial Engineer: "We prefer approximate (numerical) solutions to exact models rather than exact solutions to simplified models".

Another expert, who leads a group in fund management with DB London, when asked "what sort of graduates would you hire", gave a list of specific skills that included:
• a programming language and a 4th-generation language such as Matlab (or S-PLUS);
• Numerical Analysis.

Recent graduates of our Financial Mathematics, Computing and Mathematics degrees often report to us that they were hired because they had some numerical analysis background, or were required to go and learn some before they could do some proper work. This is particularly true in the financial sector, games development, and the mathematics civil services (e.g., Met Office, CSO).
Bibliography

[1] E. Süli and D. Mayers, An Introduction to Numerical Analysis, Cambridge University Press, 2003. 519.4 MAY.
[2] Cleve Moler, Numerical Computing with MATLAB, Cambridge University Press. Also available free from http://www.mathworks.com/moler
[3] G.W. Stewart, Afternotes on Numerical Analysis, SIAM, 1996. 519.4 STE.
[4] G.W. Stewart, Afternotes goes to Graduate School, SIAM, 1998. 519.4 STE.
[5] James F Epperson, An introduction to numerical methods and analysis. 519.4 EPP.
[6] Stoer and Bulirsch, An Introduction to Numerical Analysis, Springer.
[7] Quarteroni, Sacco and Saleri, Numerical Mathematics, Springer.
[8] Michelle Schatzman, Numerical Analysis: a mathematical introduction. 515 SCH.

0.2 Taylor's Theorem

Taylor's theorem is perhaps the most important mathematical tool in Numerical Analysis. Provided we can evaluate the derivatives of a given function at some point, it gives us a way of approximating the function by a polynomial.

Working with polynomials, particularly ones of degree 3 or less, is much easier than working with arbitrary functions. For example, polynomials are easy to differentiate and integrate. Most importantly for the next section of this course, their zeros are easy to find.

Our study of Taylor's theorem starts with one of the first theoretical results you learned in university mathematics: the mean value theorem.

Theorem 0.2.1 (Mean Value Theorem). If f is a function that is continuous and differentiable for all a ≤ x ≤ b, then there is a point c ∈ [a, b] such that

    (f(b) − f(a))/(b − a) = f'(c).

.................................................

This is just a consequence of Rolle's Theorem, and has a few different interpretations. One is that the slope of the line that intersects f at the points a and b is equal to the slope of the tangent to f at some point between a and b.

There are many important consequences of the MVT, some of which we'll return to later. Right now, we are interested in the fact that the MVT tells us that we can approximate the value of a function by a near-by value, with accuracy that depends on f'.

.................................................
Or we can think of it as approximating f by a line.

.................................................

But what if we wanted a better approximation? We could replace our function with, say, a quadratic polynomial. Let p_2(x) = b_0 + b_1(x − a) + b_2(x − a)^2 and solve for the coefficients b_0, b_1 and b_2 so that

    p_2(a) = f(a),    p_2'(a) = f'(a),    p_2''(a) = f''(a).

This gives

    p_2(x) = f(a) + f'(a)(x − a) + (x − a)^2 f''(a)/2.

.................................................

Next, if we try to construct an approximating cubic of the form

    p_3(x) = b_0 + b_1(x − a) + b_2(x − a)^2 + b_3(x − a)^3 = Σ_{k=0}^{3} b_k (x − a)^k,

so that p_3(a) = f(a), p_3'(a) = f'(a), p_3''(a) = f''(a), and p_3'''(a) = f'''(a). Note: we can write this more succinctly, using the mathematical short-hand:

    p_3^{(k)}(a) = f^{(k)}(a)  for k = 0, 1, 2, 3.

Again we find that

    b_k = f^{(k)}(a)/k!  for k = 0, 1, 2, 3.

As you can probably guess, this formula can be easily extended for arbitrary k, giving us the Taylor Polynomial.^1

[1: Brook Taylor, 1685–1731, England. He (re)discovered this polynomial approximation in 1712, though its importance was not realised for another 50 years.]

Definition 0.2.2 (Taylor Polynomial). The Taylor Polynomial of degree k (also called the Truncated Taylor Series) that approximates the function f about the point x = a is

    p_k(x) = f(a) + (x − a)f'(a) + (x − a)^2/2 f''(a) + (x − a)^3/3! f'''(a) + ... + (x − a)^k/k! f^{(k)}(a).

We'll return to this topic later, with a particular emphasis on quantifying the "error" in the Taylor Polynomial.

.................................................

Example 0.2.3. Write down the Taylor polynomial of degree k that approximates f(x) = e^x about the point x = 0.

.................................................

Fig. 0.1: Taylor polynomials for f(x) = e^x about x = 0

As Figure 0.1 suggests, in this case p_3(x) is a more accurate estimate of e^x than p_2(x), which in turn is more accurate than p_1(x). This is demonstrated in Figure 0.2, which shows the difference between f and p_k.

0.2.1 The Remainder

We now want to examine the accuracy of the Taylor polynomial as an approximation. In particular, we would like to find a formula for the remainder or error:

    R_k(x) := f(x) − p_k(x).

With a little bit of effort one can prove that

    R_k(x) = (x − a)^{k+1}/(k + 1)! f^{(k+1)}(σ),  for some σ ∈ [a, x].
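For illustration (this code is an addition to these notes, not part of the original; the plotting interval and degrees are chosen to match Figure 0.1), the polynomials p_1, p_2 and p_3 for f(x) = e^x about a = 0 can be compared in Matlab as follows:

    % Compare f(x) = exp(x) with its Taylor polynomials about a = 0.
    x = linspace(-1, 2, 200);      % plotting interval (an assumption)
    plot(x, exp(x)); hold on;
    for k = 1:3
        p = zeros(size(x));
        for j = 0:k                % p_k(x) = sum_{j=0}^k x^j/j!  (a = 0)
            p = p + x.^j/factorial(j);
        end
        plot(x, p, '--');
    end
    legend('exp(x)', 'p_1', 'p_2', 'p_3'); hold off;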
Fig. 0.2: Error in Taylor polys for f(x) = e^x about x = 0

We won't prove this in class, since it is quite standard and features in other courses you have taken. But for the sake of completeness, a proof is included below in Section 0.2.4.

Example 0.2.4. With f(x) = e^x and a = 0, we get that

    R_k(x) = x^{k+1}/(k + 1)! e^σ,  for some σ ∈ [0, x].

Example 0.2.5. How many terms are required in the Taylor Polynomial for e^x about x = 0 to ensure that the error at x = 1 is
• no more than 10^{-1}?
• no more than 10^{-2}?
• no more than 10^{-6}?
• no more than 10^{-10}?

.................................................
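One way to answer Example 0.2.5 numerically (a sketch added here for illustration, not part of the original notes): since σ ∈ [0, 1], the remainder at x = 1 is bounded by e/(k + 1)! ≤ 3/(k + 1)!, so we can increase k until that bound drops below each tolerance.

    % Smallest k with 3/(k+1)! <= tol, which bounds |R_k(1)| for e^x.
    for tol = [1e-1, 1e-2, 1e-6, 1e-10]
        k = 0;
        while 3/factorial(k+1) > tol
            k = k + 1;
        end
        fprintf('tol = %8.0e needs k = %d\n', tol, k);
    end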
There are other ways of representing the remainder, including the Integral Representation of the Remainder:

    R_n(x) = ∫_a^x f^{(n+1)}(t)/n! (x − t)^n dt.    (0.2.1)

.................................................

Other issues related to Taylor polynomials, which we will touch on later in the course, include constructing the Taylor series for a function of two variables: f = f(x, y).

0.2.2 Two applications of Taylor's Theorem

The reasons for emphasising Taylor's theorem so early in this course are that
• it introduces us to the concepts of approximation and error estimation, but in a very simple setting;
• it is the basis for deriving methods for solving both nonlinear equations, and initial value ordinary differential equations.

With the last point in mind, we'll now outline how to derive Newton's method for nonlinear equations. (This is just a taster: we'll return to this topic in the next section.)
0.2.3 Exercises

Exercise 0.2.1. Prove the Integral Mean Value Theorem: there exists a point c ∈ [a, b] such that

    f(c) = 1/(b − a) ∫_a^b f(x) dx.

Exercise 0.2.2. Write down the formula for the Taylor Polynomial for
(i) f(x) = √(1 + x) about the point x = 0,
(ii) f(x) = sin(x) about the point x = 0,
(iii) f(x) = log(x) about the point x = 1.

Exercise 0.2.3. The Fundamental Theorem of Calculus tells us that ∫_a^x f'(t) dt = f(x) − f(a). This can be rearranged to get f(x) = f(a) + ∫_a^x f'(t) dt. Use this and integration by parts to deduce (0.2.1) for the case n = 1. (Hint: check Wikipedia!)

0.2.4 A proof of Taylor's Theorem

Here is a proof of Taylor's theorem. It wasn't covered in class. One of the ingredients needed is the Generalised Mean Value Theorem: if the functions F and G are continuous and differentiable, etc., then, for some point c ∈ [a, b],

    (F(b) − F(a))/(G(b) − G(a)) = F'(c)/G'(c).    (0.2.2)

Theorem 0.2.6 (Taylor's Theorem). Suppose we have a function f that is sufficiently differentiable on the interval [a, x], and a Taylor polynomial for f about the point x = a,

    p_n(x) = f(a) + (x − a)f'(a) + (x − a)^2/2 f''(a) + (x − a)^3/3! f'''(a) + ... + (x − a)^n/n! f^{(n)}(a).    (0.2.3)

If the remainder is written as R_n(x) := f(x) − p_n(x), then

    R_n(x) = (x − a)^{n+1}/(n + 1)! f^{(n+1)}(σ),    (0.2.4)

for some point σ ∈ [a, x].

Proof. We want to prove that, for any n = 0, 1, 2, ..., there is a point σ ∈ [a, x] such that

    f(x) = p_n(x) + R_n(x),

with R_n given by (0.2.4). If x = a then this is clearly the case, because f(a) = p_n(a) and R_n(a) = 0.

For the case x ≠ a, we will use a proof by induction. The Mean Value Theorem tells us that there is some point σ ∈ [a, x] such that

    (f(x) − f(a))/(x − a) = f'(σ).

Using that p_0(x) = f(a) and that R_0(x) = (x − a)f'(σ), we can rearrange to get

    f(x) = p_0(x) + R_0(x),

as required.

Now we will assume that (0.2.3)–(0.2.4) are true for the case n = k − 1, and use this to show that they are true for n = k. From the Generalised Mean Value Theorem (0.2.2), there is some point c such that

    R_k(x)/(x − a)^{k+1} = (R_k(x) − R_k(a))/((x − a)^{k+1} − (a − a)^{k+1}) = R_k'(c)/((k + 1)(c − a)^k),

where we have used that R_k(a) = 0. Rearranging, we see that we can write R_k in terms of its own derivative:

    R_k(x) = R_k'(c) (x − a)^{k+1}/((k + 1)(c − a)^k).    (0.2.5)

So now we need an expression for R_k'. This is done by noting that it also happens to be the remainder for the Taylor polynomial of degree k − 1 for the function f':

    R_k'(x) = f'(x) − d/dx [ f(a) + (x − a)f'(a) + (x − a)^2/2! f''(a) + ... + (x − a)^k/k! f^{(k)}(a) ]
            = f'(x) − [ f'(a) + (x − a)f''(a) + ... + (x − a)^{k−1}/(k − 1)! f^{(k)}(a) ].

But the bracketed expression on the last line is the formula for the Taylor Polynomial of degree k − 1 for f'. By our inductive hypothesis,

    R_k'(c) = (c − a)^k/k! f^{(k+1)}(σ),

for some σ. Substitute this into (0.2.5) above and we are done.
Chapter 1
Solving nonlinear equations
1.1 Bisection

1.1.1 Introduction

Linear equations are of the form:

    find x such that ax + b = 0,

and are easy to solve. Some nonlinear problems are also easy to solve, e.g.,

    find x such that ax^2 + bx + c = 0.

Cubic and quartic equations also have solutions for which we can obtain a formula. But, in the majority of cases, there is no formula. So we need a numerical method.

References
• Süli and Mayers [1, Chapter 1]. We'll follow this pretty closely in lectures.
• Stewart (Afternotes ...), [3, Lectures 1–5]. A well-presented introduction, with lots of diagrams to give an intuitive introduction.
• Moler (Numerical Computing with MATLAB) [2, Chap. 4]. Gives a brief introduction to the methods we study, and a description of MATLAB functions for solving these problems.
• The proof of the convergence of Newton's Method is based on the presentation in [5, Thm 3.2].

.................................................

Our generic problem is:

    Let f be a continuous function on the interval [a, b]. Find τ ∈ [a, b] such that f(τ) = 0.

Then τ is the solution to f(x) = 0. This leads to two natural questions:

1. How do we know there is a solution?
2. How do we find it?

The following gives sufficient conditions for the existence of a solution:

Proposition 1.1.1. Let f be a real-valued function that is defined and continuous on a bounded closed interval [a, b] ⊂ R. Suppose that f(a)f(b) ≤ 0. Then there exists τ ∈ [a, b] such that f(τ) = 0.

OK – now we know there is a solution τ to f(x) = 0, but how do we actually solve it? Usually we don't! Instead we construct a sequence of estimates {x_0, x_1, x_2, x_3, ...} that converge to the true solution. So now we have to answer these questions:

(1) How can we construct the sequence x_0, x_1, ...?
(2) How do we show that lim_{k→∞} x_k = τ?

There are some subtleties here, particularly with part (2). What we would like to say is that at each step the error is getting smaller; that is,

    |τ − x_k| < |τ − x_{k−1}|  for k = 1, 2, 3, ....

But we can't. Usually all we can say is that the bounds on the error are getting smaller. That is: if ε_k is a bound on the error at step k,

    |τ − x_k| < ε_k,

then ε_{k+1} < µ ε_k for some number µ ∈ (0, 1). It is easiest to explain this in terms of an example, so we'll study the simplest method: Bisection.
1.1.2 Bisection

The most elementary algorithm is the "Bisection Method" (also known as "Interval Bisection"). Suppose that we know that f(x) = 0 has a solution, τ, in the interval [a, b]. We divide the interval in two, and check which of these subintervals contains τ (by using that f will have a different sign at the end points of that interval). Now bisect that interval in two and repeat the process. Stop when f is sufficiently small at the end points of the interval.

This may be expressed more precisely using some pseudocode.

Method 1.1.2 (Bisection).
    Set eps to be the stopping criterion.
    If |f(a)| ≤ eps, return a. Exit.
    If |f(b)| ≤ eps, return b. Exit.
    Set x_0 = a and x_1 = b.
    Set x_L = x_0 and x_R = x_1.
    Set x_2 = (x_L + x_R)/2. Set k = 2.
    while |f(x_k)| > eps
        if f(x_L) f(x_k) < 0
            x_R = x_k
        else
            x_L = x_k
        end if
        x_{k+1} = (x_L + x_R)/2
        k = k + 1
    end while

Example 1.1.3. Find an estimate for √2 that is correct to 6 decimal places.

Solution: Try to solve the equation f(x) := x^2 − 2 = 0. Then proceed as shown in Figure 1.1 and Table 1.1.

Fig. 1.1: Solving x^2 − 2 = 0 with the Bisection Method

     k |      x_k | |x_k − τ|  | |x_k − x_{k−1}|
     0 | 0.000000 | 1.41       |
     1 | 2.000000 | 5.86e-01   |
     2 | 1.000000 | 4.14e-01   | 1.00
     3 | 1.500000 | 8.58e-02   | 5.00e-01
     4 | 1.250000 | 1.64e-01   | 2.50e-01
     5 | 1.375000 | 3.92e-02   | 1.25e-01
     6 | 1.437500 | 2.33e-02   | 6.25e-02
     7 | 1.406250 | 7.96e-03   | 3.12e-02
     8 | 1.421875 | 7.66e-03   | 1.56e-02
     9 | 1.414062 | 1.51e-04   | 7.81e-03
    10 | 1.417969 | 3.76e-03   | 3.91e-03
     : |        : | :          | :
    22 | 1.414214 | 5.72e-07   | 9.54e-07

Table 1.1: Solving x^2 − 2 = 0 with the Bisection Method

Note that at steps 4 and 10 in Table 1.1 the error actually increases, although the bound on the error is decreasing.

1.1.3 The bisection method works

The main advantages of the Bisection method are:
• It will always work.
• After k steps we know the error bound given in the following proposition.

Proposition 1.1.4.

    |τ − x_k| ≤ (1/2)^{k−1} |b − a|,  for k = 2, 3, 4, ....

.................................................

A disadvantage of bisection is that it is not as efficient as some other methods we'll investigate later.

1.1.4 Improving upon bisection

The bisection method is not very efficient. Our next goals will be to derive better methods, particularly the Secant Method and Newton's method. We also have to come up with some way of expressing what we mean by "better"; and we'll have to use Taylor's theorem in our analyses.
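To see what Proposition 1.1.4 gives in practice, here is a small computation (added for illustration; it is not part of the original notes) for the interval [0, 2] of Example 1.1.3:

    % Smallest k with (1/2)^(k-1)*|b - a| <= tol  (Proposition 1.1.4).
    a = 0; b = 2; tol = 1e-6;
    k = ceil(1 + log2((b - a)/tol));
    fprintf('k = %d steps guarantee |tau - x_k| <= %g\n', k, tol);

This gives k = 22, which agrees with Table 1.1, where |x_22 − τ| ≈ 5.72e-07.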
1.2 The Secant Method

1.2.1 Motivation

Looking back at Table 1.1 we notice that, at step 4, the error increases rather than decreases. You could argue that this is because we didn't take into account how close x_3 is to the true solution. We could improve upon the bisection method as described below. The idea is: given x_{k−1} and x_k, take x_{k+1} to be the zero of the line that intersects the points (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). See Figure 1.2.

Fig. 1.2: The Secant Method for Example 1.2.2

Method 1.2.1 (Secant).^1 Choose x_0 and x_1 so that there is a solution in [x_0, x_1]. Then define

    x_{k+1} = x_k − f(x_k) (x_k − x_{k−1})/(f(x_k) − f(x_{k−1})).    (1.2.1)

[1: The name comes from the name of the line that intersects a curve at two points. There is a related method called "false position" which was known in India in the 3rd century BC, and in China in the 2nd century BC.]

Example 1.2.2. Use the Secant Method to solve the nonlinear problem x^2 − 2 = 0 in [0, 2]. The results are shown in Table 1.2.

    k |      x_k | |x_k − τ|
    0 | 0.000000 | 1.41
    1 | 2.000000 | 5.86e-01
    2 | 1.000000 | 4.14e-01
    3 | 1.333333 | 8.09e-02
    4 | 1.428571 | 1.44e-02
    5 | 1.413793 | 4.20e-04
    6 | 1.414211 | 2.12e-06
    7 | 1.414214 | 3.16e-10
    8 | 1.414214 | 4.44e-16

Table 1.2: Solving x^2 − 2 = 0 using the Secant Method

The Method of Bisection could be written as the weighted average x_{k+1} = (1 − σ_k)x_k + σ_k x_{k−1}, with σ_k = 1/2. We can also think of the Secant method as a weighted average, but with σ_k chosen to obtain faster convergence to the true solution. Looking at Figure 1.2 above, you could say that we should choose σ_k depending on which is smaller: f(x_{k−1}) or f(x_k). If (for example) |f(x_{k−1})| < |f(x_k)|, then probably |τ − x_{k−1}| < |τ − x_k|. This gives another formulation of the Secant Method:

    x_{k+1} = x_k(1 − σ_k) + x_{k−1} σ_k,  where  σ_k = f(x_k)/(f(x_k) − f(x_{k−1})).    (1.2.2)

When it's written in this form it is sometimes called a relaxation method.

By comparing Tables 1.1 and 1.2, we see that for this example the Secant method is much more efficient than Bisection. We'll return to why this is later.
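The iteration (1.2.1) is short enough to state directly in code. Here is a minimal Matlab sketch (an added illustration; the lab in §1.5 develops a fuller version) applied to Example 1.2.2:

    % Secant method (1.2.1) for f(x) = x^2 - 2, with x_0 = 0, x_1 = 2.
    f = @(x)(x.^2 - 2);
    x = [0, 2];                    % x(1) holds x_0, x(2) holds x_1
    for k = 2:8
        x(k+1) = x(k) - f(x(k))*(x(k) - x(k-1))/(f(x(k)) - f(x(k-1)));
        fprintf('%2d  %10.6f  %9.2e\n', k, x(k+1), abs(x(k+1) - sqrt(2)));
    end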
1.2.2 Order of Convergence

To compare different methods, we need the following concept:

Definition 1.2.3 (Linear Convergence). Suppose that τ = lim_{k→∞} x_k. Then we say that the sequence {x_k}_{k=0}^∞ converges to τ at least linearly if there is a sequence of positive numbers {ε_k}_{k=0}^∞, and µ ∈ (0, 1), such that

    lim_{k→∞} ε_k = 0,    (1.2.3a)
    |τ − x_k| ≤ ε_k  for k = 0, 1, 2, ...,    (1.2.3b)
    lim_{k→∞} ε_{k+1}/ε_k = µ.    (1.2.3c)

So, for example, the bisection method converges at least linearly.

The reason for the expression "at least" is that we usually can only show that a set of upper bounds for the errors converges linearly. If (1.2.3b) can be strengthened to the equality |τ − x_k| = ε_k, then {x_k}_{k=0}^∞ converges linearly (not just "at least" linearly).

As we have seen, there are methods that converge more quickly than bisection. We state this more precisely:

Definition 1.2.4 (Order of Convergence). Let τ = lim_{k→∞} x_k. Suppose there exist µ > 0 and a sequence of positive numbers {ε_k}_{k=0}^∞ such that (1.2.3a) and (1.2.3b) both hold. Then we say that the sequence {x_k}_{k=0}^∞ converges with at least order q if

    lim_{k→∞} ε_{k+1}/(ε_k)^q = µ.

Two particular values of q are important to us:
(i) If q = 1, and we further have that 0 < µ < 1, then the rate is linear.
(ii) If q = 2, the rate is quadratic for any µ > 0.
1.2.3 Analysis of the Secant Method

Our next goal is to prove that the Secant Method converges. We'll be a little lazy, and only prove a suboptimal linear convergence rate. Then, in our Matlab class, we'll investigate exactly how rapidly it really converges.

One simple mathematical tool that we use is the Mean Value Theorem (Theorem 0.2.1). See also [1, p. 420].

Proposition 1.2.5. Suppose that f and f' are real-valued functions, continuous and defined in an interval I = [τ − h, τ + h] for some h > 0. If f(τ) = 0 and f'(τ) ≠ 0, then the sequence (1.2.1) converges at least linearly to τ.

Before we prove this, we note the following.

• We wish to show that |τ − x_{k+1}| < |τ − x_k|.
• From Theorem 0.2.1, there is a point w_k ∈ [x_{k−1}, x_k] such that

    (f(x_k) − f(x_{k−1}))/(x_k − x_{k−1}) = f'(w_k).    (1.2.4)

• Also by the MVT, there is a point z_k ∈ [x_k, τ] such that

    f(x_k)/(x_k − τ) = (f(x_k) − f(τ))/(x_k − τ) = f'(z_k).

Therefore

    f(x_k) = (x_k − τ) f'(z_k).    (1.2.5)

• Suppose that f'(τ) > 0. (If f'(τ) < 0, just tweak the arguments accordingly.) Saying that f' is continuous in the region [τ − h, τ + h] means that, for any ε > 0 there is a δ > 0 such that

    |f'(x) − f'(τ)| < ε  for any x ∈ [τ − δ, τ + δ].

Take ε = f'(τ)/4. Then |f'(x) − f'(τ)| < f'(τ)/4. Thus

    (3/4) f'(τ) ≤ f'(x) ≤ (5/4) f'(τ)  for any x ∈ [τ − δ, τ + δ].

Then, so long as w_k and z_k are both in [τ − δ, τ + δ],

    f'(z_k)/f'(w_k) ≤ 5/3.

Given enough time and effort we could show that the Secant Method converges faster than linearly. In particular, the order of convergence is q = (1 + √5)/2 ≈ 1.618. This number arises as the only positive root of q^2 − q − 1. It is called the Golden Mean, and arises in many areas of Mathematics, including finding an explicit expression for the Fibonacci Sequence: f_0 = 1, f_1 = 1, f_{k+1} = f_k + f_{k−1} for k = 1, 2, 3, .... That gives f_0 = 1, f_1 = 1, f_2 = 2, f_3 = 3, f_4 = 5, f_5 = 8, f_6 = 13, ....

A rigorous proof depends on, among other things, an error bound for polynomial interpolation, which is the first topic in MA378. With that, one can show that ε_{k+1} ≤ C ε_k ε_{k−1}. Repeatedly using this we get:
• Let r = |x_1 − x_0|, so that ε_0 ≤ r and ε_1 ≤ r.
• Then ε_2 ≤ Cε_1ε_0 ≤ Cr^2.
• Then ε_3 ≤ Cε_2ε_1 ≤ C(Cr^2)r = C^2 r^3.
• Then ε_4 ≤ Cε_3ε_2 ≤ C(C^2 r^3)(Cr^2) = C^4 r^5.
• Then ε_5 ≤ Cε_4ε_3 ≤ C(C^4 r^5)(C^2 r^3) = C^7 r^8.
• And in general, ε_k = C^{f_k − 1} r^{f_k}.
1.3 Newton's Method

1.3.1 Motivation

These notes are loosely based on Section 1.4 of [1] (i.e., Süli and Mayers, An Introduction to Numerical Analysis). See also [3, Lecture 2] and [5, §3.5].

The Secant method is often written as

    x_{k+1} = x_k − f(x_k) φ(x_k, x_{k−1}),

where the function φ is chosen so that x_{k+1} is the root of the secant line joining the points (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). A related idea is to construct a method

    x_{k+1} = x_k − f(x_k) λ(x_k),

where we choose λ so that x_{k+1} is the point where the line through (x_k, f(x_k)) with slope f'(x_k) cuts the x-axis. We attempt to solve x^2 − 2 = 0, taking x_0 = 2. Taking x_1 to be the zero of the tangent to f(x) at x = 2, we get x_1 = 1.5. Taking x_2 to be the zero of the tangent to f(x) at x = 1.5, we get x_2 = 1.4167, which is very close to the true solution τ = 1.4142. This is shown in Figure 1.3.

Fig. 1.3: Estimating √2 using Newton's Method

Method 1.3.1 (Newton's Method^2).
1. Choose any x_0 in [a, b].
2. For k = 0, 1, 2, ..., set x_{k+1} to the root of the line through (x_k, f(x_k)) with slope f'(x_k).

[2: Sir Isaac Newton, 1643–1727, England. Easily one of the greatest scientists of all time. The method we are studying appeared in his celebrated Principia Mathematica in 1687, but it is believed he had used it as early as 1669.]

By writing down the equation of the line through (x_k, f(x_k)) with slope f'(x_k), one can show that the formula for the iteration is

    x_{k+1} = x_k − f(x_k)/f'(x_k).    (1.3.6)

Example 1.3.2. Use Newton's Method to solve the nonlinear problem x^2 − 2 = 0 in [0, 2]. The results are shown in Table 1.3. For this example, the method becomes

    x_{k+1} = x_k − f(x_k)/f'(x_k) = x_k − (x_k^2 − 2)/(2x_k) = x_k/2 + 1/x_k.

    k |      x_k | |x_k − τ| | |x_k − x_{k−1}|
    0 | 2.000000 | 5.86e-01  |
    1 | 1.500000 | 8.58e-02  | 5.00e-01
    2 | 1.416667 | 2.45e-03  | 8.33e-02
    3 | 1.414216 | 2.12e-06  | 2.45e-03
    4 | 1.414214 | 1.59e-12  | 2.12e-06
    5 | 1.414214 | 2.34e-16  | 1.59e-12

Table 1.3: Solving x^2 − 2 = 0 using Newton's Method

By comparing Table 1.2 and Table 1.3, we see that for this example Newton's method is more efficient again than the Secant method.

Deriving Newton's method geometrically certainly has an intuitive appeal. However, to analyse the method, we need a more abstract derivation based on a Truncated Taylor Series.

1.3.2 Newton Error Formula

We saw in Table 1.3 that this method can be much more efficient than, say, Bisection: it yields estimates that converge far more quickly to τ. Bisection converges (at least) linearly, whereas Newton's converges quadratically, that is, with at least order q = 2.

In order to prove that this is so, we need to

1. Write down a recursive formula for the error.
2. Show that it converges.
3. Use this to find the limit of |τ − x_{k+1}|/|τ − x_k|^2.

Step 2 is usually the crucial part.

There are two parts to the proof. The first involves deriving the so-called "Newton Error Formula". Then we'll apply this to prove (quadratic) convergence. In all cases we'll assume that the functions f, f' and f'' are defined and continuous on an interval I_δ = [τ − δ, τ + δ] around the root τ. The proof we'll do in class is essentially the same as Epperson [5, Thm 3.2].

Proposition 1.3.3 (Newton Error Formula). If f(τ) = 0 and

    x_{k+1} = x_k − f(x_k)/f'(x_k),

then there is a point η_k between τ and x_k such that

    τ − x_{k+1} = − (τ − x_k)^2/2 · f''(η_k)/f'(x_k).

Example 1.3.4. As an application of Newton's error formula, we'll show that the number of correct decimal digits in the approximation doubles at each step. (Roughly: if |τ − x_k| ≈ 10^{−d}, the error formula gives |τ − x_{k+1}| ≈ C · 10^{−2d}.)

1.3.3 Convergence of Newton's Method

We'll now complete our analysis of this section by proving the convergence of Newton's method.

Proposition 1.3.5. Suppose that f is a function such that
• f is continuous and real-valued, with continuous f'', defined on some closed interval I_δ = [τ − δ, τ + δ];
• f(τ) = 0 and f''(τ) ≠ 0;
• there is some positive constant A such that

    |f''(x)|/|f'(y)| ≤ A  for all x, y ∈ I_δ.

Let h = min{δ, 1/A}. If |τ − x_0| ≤ h then Newton's Method converges quadratically.
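As a concrete illustration of (1.3.6) (this code is an addition to the notes; both f and its derivative are supplied by hand), the following reproduces the iterates of Table 1.3:

    % Newton's method (1.3.6) for f(x) = x^2 - 2, starting at x_0 = 2.
    f  = @(x)(x.^2 - 2);
    df = @(x)(2*x);                 % f'(x), supplied by hand
    x = 2;
    for k = 1:5
        x = x - f(x)/df(x);         % x_{k+1} = x_k - f(x_k)/f'(x_k)
        fprintf('%2d  %12.9f  %9.2e\n', k, x, abs(x - sqrt(2)));
    end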
1.4 Fixed Point Iteration

1.4.1 Introduction

Newton's method can be considered to be a particular instance of a very general approach called Fixed Point Iteration or Simple Iteration. The basic idea is:

    If we want to solve f(x) = 0 in [a, b], find a function g(x) such that, if τ is such that f(τ) = 0, then g(τ) = τ. Choose x_0 and for k ≥ 0 set x_{k+1} = g(x_k).

Example 1.4.1. Suppose that f(x) = e^x − 2x − 1 and we are trying to find a solution to f(x) = 0 in [1, 2]. Then we can take g(x) = ln(2x + 1).

We have reformulated the problem in Example 1.4.1 above, from

    for f(x) = e^x − 2x − 1, find τ ∈ [1, 2] such that f(τ) = 0,

to:

    for g(x) = ln(2x + 1), find τ ∈ [1, 2] such that g(τ) = τ.

If we take the initial estimate x_0 = 1, then we get the following sequence of estimates:

    k |    x_k | |τ − x_k|
    0 | 1.0000 | 2.564e-1
    1 | 1.0986 | 1.578e-1
    2 | 1.1623 | 9.415e-2
    3 | 1.2013 | 5.509e-2
    4 | 1.2246 | 3.187e-2
    5 | 1.2381 | 1.831e-2
    : |      : | :
   10 | 1.2558 | 6.310e-4

To make this table, I used a numerical scheme to solve the problem quite accurately, to get τ = 1.256431. In general we don't know τ in advance – otherwise we wouldn't need such a scheme! I've given the quantities |τ − x_k| here so we can observe that the method is converging, and get an idea of how quickly it is converging.

.................................................

We have to be quite careful with this method: not every choice of g is suitable. Suppose we want the solution to f(x) = x^2 − 2 = 0 in [1, 2]. We could choose g(x) = x^2 + x − 2. Taking x_0 = 1 we get the iterations shown opposite. This sequence doesn't converge!

    k | x_k
    0 |  1
    1 |  0
    2 | −2
    3 |  0
    4 | −2
    5 |  0
    : |  :

So we need to refine the method, to ensure that it will converge. Before we do that in a formal way, consider the following.

Example 1.4.2. Use the Mean Value Theorem to show that the fixed point method x_{k+1} = g(x_k) converges if |g'(x)| < 1 for all x near the fixed point.

This is an important example, mostly because it introduces the "tricks" of using that g(τ) = τ and g(x_k) = x_{k+1}. But it is not a rigorous theory. That requires some ideas such as the contraction mapping theorem.

.................................................

1.4.2 A short tour of fixed points and contractions
A variant of the famous Fixed Point Theorem^3 is: suppose that g(x) is defined and continuous on [a, b], and that g(x) ∈ [a, b] for all x ∈ [a, b]. Then there exists a point τ ∈ [a, b] such that g(τ) = τ. That is, g(x) has a fixed point in the interval [a, b].

Try to convince yourself that it is true by sketching the graphs of a few functions that send all points in the interval, say, [1, 2] to that interval, as in Figure 1.4.

Fig. 1.4: Sketch of a function g(x) such that, if a ≤ x ≤ b, then a ≤ g(x) ≤ b

[3: LEJ Brouwer, 1881–1966, Netherlands.]
The next ingredient we need is to observe that g is a contraction. That is, g(x) is continuous and defined on [a, b] and there is a number L ∈ (0, 1) such that

    |g(α) − g(β)| ≤ L|α − β|  for all α, β ∈ [a, b].    (1.4.7)

Proposition 1.4.3 (Contraction Mapping Theorem). Suppose that the function g is real-valued, defined and continuous on [a, b], and
• maps every point in [a, b] to some point in [a, b];
• is a contraction on [a, b];
then
(i) g(x) has a fixed point τ ∈ [a, b],
(ii) the fixed point is unique,
(iii) the sequence {x_k}_{k=0}^∞ defined by x_0 ∈ [a, b] and x_k = g(x_{k−1}) for k = 1, 2, ... converges to τ.

Proof:

.................................................

1.4.3 Convergence of Fixed Point Iteration

We now know how to apply the Fixed-Point Method and to check if it will converge. Of course we can't perform an infinite number of iterations, and so the method will yield only an approximate solution. Suppose we want the solution to be accurate to, say, 10^{-6}; how many steps are needed? That is, how large must k be so that

    |x_k − τ| ≤ 10^{-6}?

The answer is obtained by first showing that

    |τ − x_k| ≤ L^k/(1 − L) |x_1 − x_0|.    (1.4.8)

Example 1.4.4. If g(x) = ln(2x + 1) and x_0 = 1, and we want |x_k − τ| ≤ 10^{-6}, then we can use (1.4.8) to determine the number of iterations required.

This calculation only gives an upper bound for the number of iterations. It is correct, but not necessarily sharp. Table 1.4 shows that this level of accuracy is achieved after 23 iterations. Even so, 23 iterations is quite a lot for such a simple problem. So we can conclude that this method is not as fast as, say, Newton's Method. However, it is perhaps the most generalizable.

    k |     x_k | |x_k − τ|
    0 | 1.00000 | 2.56e-01
    1 | 1.09861 | 1.58e-01
    2 | 1.16228 | 9.41e-02
    3 | 1.20134 | 5.51e-02
    4 | 1.22456 | 3.19e-02
    5 | 1.23812 | 1.83e-02
    6 | 1.24595 | 1.05e-02
    : |       : | :
   22 | 1.25643 | 1.29e-06
   23 | 1.25643 | 7.32e-07

Table 1.4: Applying Fixed Point Iteration for E.g. 1.4.1
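Here is a sketch of the calculation in Example 1.4.4 (this code is an added illustration; it takes L = 2/3, which follows from |g'(x)| = 2/(2x + 1) ≤ 2/3 on [1, 2] — compare Exercise 1.6.11):

    % Bound (1.4.8): |tau - x_k| <= L^k/(1-L) * |x_1 - x_0|.
    g = @(x) log(2*x + 1);
    L = 2/3;                   % Lipschitz constant on [1,2] (see above)
    x0 = 1; x1 = g(x0);
    tol = 1e-6;
    k = ceil(log(tol*(1 - L)/abs(x1 - x0)) / log(L));
    fprintf('k = %d iterations suffice by (1.4.8)\n', k);

This returns k = 32: an upper bound consistent with, but larger than, the 23 iterations observed in Table 1.4.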
1.4.4 Knowing When to Stop

Suppose you wish to program one of the above methods. You will get your computer to repeat one of the iterative methods until your solution is sufficiently close to the true solution:

    x[0] = 0
    tol = 1e-6
    i = 0
    while (abs(tau - x[i]) > tol)   // This is the stopping criterion
        x[i+1] = g(x[i])            // Fixed point iteration
        i = i+1
    end

All very well, except you don't know τ. If you did, you wouldn't need a numerical method. Instead, we could choose the stopping criterion based on how close successive estimates are:

    while (abs(x[i-1] - x[i]) > tol)

This is fine if the solution is not close to zero. E.g., if it's about 1, we should get roughly 6 accurate figures. But if τ = 10^{-7} then it is quite useless: x_k could be ten times larger than τ. The problem is that we are estimating the absolute error. Instead, we usually work with the relative error:

    while (abs((x[i-1] - x[i])/x[i]) > tol)
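Putting the pieces together, a runnable version of that loop might look like the following sketch (an added illustration; g, x_0 and the tolerance are taken from Example 1.4.1):

    % Fixed point iteration with a relative-error stopping criterion.
    g = @(x) log(2*x + 1);
    x_old = 1;                 % x_0
    x_new = g(x_old);
    tol = 1e-6;
    while abs((x_new - x_old)/x_new) > tol
        x_old = x_new;
        x_new = g(x_old);
    end
    fprintf('Converged to %10.8f\n', x_new);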
1.4.5 What has Newton's method ever done for me?

When studying a numerical method (or indeed any piece of Mathematics) it is important to ask why you are doing this. Sometimes it is so that you can understand other topics later; because it is interesting/beautiful in its own right; or (most commonly) because it is useful. Here are some examples of each of these.

1. The analyses we have used in this section allowed us to consider some important ideas in a simple setting. Examples include
• convergence, including rates of convergence;
• fixed-point theory, and contractions. We'll be seeing analogous ideas in the next section (Lipschitz conditions);
• the approximation of functions by polynomials (Taylor's Theorem). This point will reoccur in the next section, and all throughout next semester.

.................................................

2. Applications come from lots of areas of science and engineering. Less obvious might be applications to financial mathematics. The celebrated Black–Scholes equation for pricing a put option can be written as

    ∂V/∂t − (1/2)σ^2 S^2 ∂^2V/∂S^2 − rS ∂V/∂S + rV = 0,

where
• V(S, t) is the current value of the right (but not the obligation) to buy or sell ("put" or "call") an asset at a future time T;
• S is the current value of the underlying asset;
• r is the current interest rate (because the value of the option has to be compared with what we would have gained by investing the money we paid for it);
• σ is the volatility of the asset's price.

Often one knows S, T and r, but not σ. The method of implied volatility is when we take data from the market and then find the value of σ which would give the data as the solution to the Black–Scholes equation. This is a nonlinear problem and so Newton's method can be used. See Chapters 13 and 14 of Higham's "An Introduction to Financial Option Valuation" for more details. (We will return to the Black–Scholes problem again at the end of the next section.)

.................................................

3. Some of these ideas are interesting and beautiful. Consider Newton's method. Suppose that we want to find the complex nth roots of unity: the set of numbers {z_0, z_1, z_2, ..., z_{n−1}} whose nth power is 1. We could just express these in terms of polar coordinates:

    z = e^{iθ},  where θ = 2kπ/n,

for some k ∈ Z = {..., −2, −1, 0, 1, 2, ...} and i = √(−1). Example: 1, i, −1 and −i are the 4th roots of unity. Recall that, plotted in the Argand Plane, these points form a regular polygon.

If we use Newton's method to find a given root, we could try to solve f(z) = 0 with f(z) = z^n − 1. The iteration is:

    z_{k+1} = z_k − (z_k^n − 1)/(n z_k^{n−1}).

However, there are n possible solutions; given a particular starting point, which root will the method converge to? If we take a number of points in a region of space, iterate on each of them, and then colour the points to indicate the ones that converge to the same root, we get the famous Julia^4 set, an example of a fractal. One such Julia set, generated by the Matlab script Julia.m, which you can download from the course website, is shown below in Figure 1.5.

Fig. 1.5: A contour plot of a Julia set with n = 5

[4: Gaston Julia, French mathematician, 1893–1978. The famous paper which introduced these ideas was published in 1918 when he was just 25. Interest later waned until the 1970s when Mandelbrot's computer experiments reinvigorated interest.]
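The script Julia.m on the course website produces Figure 1.5; the following is only a minimal sketch of the same idea (the grid, resolution and iteration count here are illustrative choices, not necessarily those of Julia.m):

    % Newton's method for f(z) = z^5 - 1 from a grid of starting points;
    % colouring by the argument of the final iterate shows the basins.
    n = 5;
    [X, Y] = meshgrid(linspace(-1.5, 1.5, 400));
    Z = X + 1i*Y;
    for k = 1:30
        Z = Z - (Z.^n - 1)./(n*Z.^(n-1));   % Newton step for z^n - 1
    end
    contourf(X, Y, angle(Z));    % the n roots have distinct arguments
    axis equal;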
1.5 Matlab 1: the bisection and secant methods
The goal of this section is to help you gain familiarity
with the fundamental tasks that can be accomplished
with Matlab: defining vectors, computing functions, and
plotting. We’ll then see how to implement and analyse
the Bisection and Secant schemes in Matlab.
.................................................
Matlab is an interactive environment for mathematical and scientific computing. It is the standard tool for numerical computing in industry and research.

Matlab stands for Matrix Laboratory. It specialises in matrix and vector computations, but includes functions for graphics, numerical integration and differentiation, solving differential equations, etc.

Matlab differs most significantly from, say, Maple, by not having a facility for abstract computation.
1.5.1 The Basics

Matlab is an interpretive environment – you type a command and it is executed immediately.

The default data-type is a matrix of double precision floating-point numbers. A scalar variable is an instance of a 1 × 1 matrix. To check this, set, say,
>> t=10
and use the
>> size(t)
command to find the numbers of rows and columns of t.

A vector may be declared as follows:
>> x = [1 2 3 4 5 6 7]
This generates a vector, x, with x_1 = 1, x_2 = 2, etc. However, this could also be done with x=1:7.

More generally, a vector may be declared as follows:
>> x = a:h:b;
This sets x_1 = a, x_2 = a + h, x_3 = a + 2h, ..., x_n = b. If h is omitted, it is assumed to be 1.

The ith element of a vector is accessed by typing x(i). The element in row i and column j of a matrix is given by A(i,j).

Most "scalar" functions return a matrix when given a matrix as an argument. For example, if x is a vector of length n, then y = sin(x) sets y to be a vector, also of length n, with y_i = sin(x_i).

Matlab has most of the standard mathematical functions: sin, cos, exp, log, etc. In each case, write the function name followed by the argument in round brackets, e.g.,
>> exp(x)
for e^x.

The * operator performs matrix multiplication. For element-by-element multiplication use .*
For example, y = x.*x sets y_i = (x_i)^2. So does y = x.^2. Similarly, y=1./x sets y_i = 1/x_i.

If you put a semicolon at the end of a line of Matlab, the line is executed, but the output is not shown. (This is useful if you are dealing with large vectors.) If no semicolon is used, the output is shown in the command window.
1.5.2 Plotting functions
Define a vector >> x=[0 1 2 3]
and then set
>> f = x.^2 -2
To plot these vectors use:
>> plot(x, f)
If the picture isn’t particularly impressive, then this might
be because Matlab is actually only printing the 4 points
that you defined. To make this more clear, use
>> plot(x, f, '-o')
This means to plot the vector f as a function of the vector x, placing a circle at each point, and joining adjacent
points with a straight line.
Try instead: >> x=0:0.1:3 and f = x.^2 -2
and plot them again.
To define a function in terms of any variable, type:
>> F = @(x)(x.^2 -2);
Now you can use this function as follows:
>> plot(x, F(x)) ;
Take care to note that Matlab is case sensitive.
In this last case, it might be helpful to also observe
where the function cuts the x-axis. That can be done
by also plotting the line joining, for example, the points
(0, 0), and (3, 0). This can be done by:
>> plot(x,F(x), [0,3], [0,0]);
Also try >> plot(x,F(x), [0,3], [0,0], '--');
Notice the difference?
.................................................
Tip: Use the >> help menu to find out what the
>> ezplot function is, and how to use it.
1.5.3 Programming the Bisection Method
Revise the lecture notes on the Bisection Method.
Suppose we want to find a solution to e^x − (2 − x)^3 = 0 in the interval [0, 5] using Bisection.
• Define the function f as:
>> f = @(x)(exp(x) - (2-x).^3);
• Taking x_1 = 0 and x_2 = 5, do 8 iterations of the Bisection method.
• Complete the table below. You may use that the solution is (approximately) τ = 0.7261444658054950.
    k | x_k | |τ − x_k|
    1 |     |
    2 |     |
    3 |     |
    4 |     |
    5 |     |
    6 |     |
    7 |     |
    8 |     |

.................................................

Implementing the Bisection method by hand is very tedious. Here is a program that will do it for you. You don't need to type it all in; you can download it from
http://www.maths.nuigalway.ie/MA385/lab1/Bisection.m

    %% Using the Bisection method to find a zero of f(x)
    % Lab 1 of MA385
    clear; % Erase all stored variables
    fprintf('\n\n---------\n Using Bisection ');
    % The function is
    f = @(x)(exp(x) - (2-x).^3);
    disp('Solving f=0 with the function');
    disp(f);
    %% The true solution is
    tau = 0.72614446580549503614;
    fprintf('The true solution is %12.8f', tau);
    %% Our initial guesses are x_1 = 0 and x_2 = 5;
    x(1)=0;
    fprintf('\n\n%2d | %1.8e | %1.3e \n', ...
        1, x(1), abs(tau - x(1)));
    x(2)=5;
    fprintf('%2d | %1.8e | %1.3e \n', 2, x(2), abs(tau-x(2)));
    for k=2:10
        x(k+1) = (x(k-1)+x(k))/2;
        if ( f(x(k+1))*f(x(k-1)) < 0)
            x(k)=x(k-1);
        end
        fprintf('%2d | %1.8e | %1.3e\n', k+1, x(k+1), ...
            abs(tau - x(k+1)));
    end

Read the code carefully. If there is a line you do not understand, then ask a tutor, or look up the on-line help. For example, find out what the clear on Line 3 does by typing >> doc clear.

Q1. Suppose we wanted an estimate x_k for τ so that |τ − x_k| ≤ 10^{-10}.
(i) In §1.1 we saw that |τ − x_k| ≤ (1/2)^{k−1} |x_1 − x_0|. Use this to estimate how many iterations are required in theory.
(ii) Use the program above to find how many iterations are required in practice.

1.5.4 The Secant method

Recall the Secant Method in (1.2.1).

Q2 (a) Adapt the program above to implement the Secant method.
(b) Use it to find a solution to e^x − (2 − x)^3 = 0 in the interval [0, 5].
(c) How many iterations are required to ensure that the error is less than 10^{-10}?
Q3 Recall from Definition 1.2.4 that the order of convergence of a sequence {ε_0, ε_1, ε_2, ...} is q if

    lim_{k→∞} ε_{k+1}/(ε_k)^q = µ,

for some constant µ.

We would like to verify that, for the Secant method, q = (1 + √5)/2 ≈ 1.618. This is difficult to do computationally because, after a relatively small number of iterations, the round-off error becomes significant. But we can still try!

Adapt the program above so that at each iteration it displays

    |τ − x_k|,    |τ − x_k|^{1.618},    |τ − x_k|^2,

and so deduce that the order of convergence is greater than 1 (so better than bisection), less than 2, and roughly (1 + √5)/2.
Q4 (Bonus) The bisection method is popular because it is robust: it will always work, subject to minimal constraints. However, it is slow: if the Secant method works, then it converges much more quickly. How can we combine these two algorithms to get a fast, robust method? Consider the following problem:

    Solve f(x) = 1 − 2/(x^2 − 2x + 2)  on [−10, 1].

You should find that the bisection method works (slowly) for this problem, but the Secant method will fail. So write a hybrid algorithm that switches between the bisection method and the secant method as appropriate.

Take care to document your code carefully, to show which algorithm is used when.

How many iterations are required?
1.5.5 To Finish
Before you leave the class send an email to
[email protected] with your name, ID number,
and answers to Questions 1, 2 and 3. For bonus credit,
submit your solution to Q4 too. If you have worked in
small groups, you should still submit separate emails.
1.6 Nonlinear equations: Exercises

Exercise 1.6.1. Does Proposition 1.1.1 mean that, if there is a solution to f(x) = 0 in [a, b], then f(a)f(b) ≤ 0? That is, is f(a)f(b) ≤ 0 a necessary condition for there being a solution to f(x) = 0? Give an example that supports your answer.

Exercise 1.6.2. Suppose we want to find τ ∈ [a, b] such that f(τ) = 0 for some given f, a and b. Write down an estimate for the number of iterations K required by the bisection method to ensure that, for a given ε, we know |x_k − τ| ≤ ε for all k ≥ K. In particular, how does this estimate depend on f, a and b?

Exercise 1.6.3. How many (decimal) digits of accuracy are gained at each step of the bisection method? (If you prefer: how many steps are needed to gain a single (decimal) digit of accuracy?)

Exercise 1.6.4. (⋆) Let f(x) = e^x − 2x − 2. Show that the problem

    find τ ∈ [0, 2] such that f(τ) = 0

has a solution. Taking x_0 = 0 and x_1 = 2, use 6 steps of the bisection method to estimate τ. Give an upper bound for the error |τ − x_6|. (You may use a computer program to do this.)

Exercise 1.6.5.
(i) Write down the equation of the line that intersects (x_{k−1}, f(x_{k−1})) and (x_k, f(x_k)). Hence show how to derive (1.2.1).
(ii) Can you construct a problem for which the bisection method will work, but the secant method will fail completely? If so, give an example.

Exercise 1.6.6.
(i) Write down Newton's Method as applied to the function f(x) = x^3 − 2. Simplify the computation as much as possible. What is achieved if we find the root of this function?
(ii) Do three iterations by hand of Newton's Method applied to f(x) = x^3 − 2 with x_0 = 1.

Exercise 1.6.7. (This is taken from [5, Exer 3.5.1].) If f is such that |f''(x)| ≤ 3 and |f'(x)| ≥ 1 for all x, and if the initial error in Newton's Method is less than 1/2, give an upper bound for the error at each of the first 3 steps.

Exercise 1.6.8. (⋆) Here is (yet) another scheme, called Steffenson's Method: choose x_0 ∈ [a, b] and set

    x_{k+1} = x_k − f(x_k)^2/( f(x_k + f(x_k)) − f(x_k) )  for k = 0, 1, 2, ....

(a) Explain how this method relates to Newton's Method.
(b) Write a Matlab program to implement this method.
(c) Using this program or otherwise, determine x_3 and |τ − x_3| when the method is used to solve e^x = (2 − x)^3 with x_0 = 0.

Exercise 1.6.9. ([1, Exer 1.6]) The proof of the convergence of Newton's method given in Prop. 1.3.5 uses that f'(τ) ≠ 0. Suppose that it is the case that f'(τ) = 0.
(i) What can we say about the root, τ?
(ii) Show that the Newton Error formula can be extended to give

    τ − x_{k+1} = (τ − x_k)/2 · f''(η_k)/f''(µ_k),

for some µ_k between τ and x_k.
(iii) What does the above error formula tell us about the convergence of Newton's method in this case?

Exercise 1.6.10. Is it possible for g to be a contraction on [a, b] but not have a fixed point in [a, b]? Give an example to support your answer.

Exercise 1.6.11. Show that g(x) = ln(2x + 1) is a contraction on [1, 2]. Give an estimate for L. (Hint: use the Mean Value Theorem.)

Exercise 1.6.12. Consider the function g(x) = x^2/4 + 5x/4 − 1/2.
1. It has two fixed points – what are they?
2. For each of these, find the largest region around them such that g is a contraction on that region.

Exercise 1.6.13. (⋆) Although we didn't prove it in class, it turns out that, if the fixed point method given by x_{k+1} = g(x_k) converges to the point τ (where g(τ) = τ), and

    g'(τ) = g''(τ) = ... = g^{(p−1)}(τ) = 0,

then it converges with order p.
(i) Use a Taylor Series expansion to prove this.
(ii) We can think of Newton's Method for the problem f(x) = 0 as fixed point iteration with g(x) = x − f(x)/f'(x). Use this to show that, if Newton's method converges, it does so with order 2, provided that f'(τ) ≠ 0.
Matlab 1: the bisection and secant methods
1.7
Appendix
Here are the full details on the proof of the fact that the
Secant Method converges at least linearly (Proposition
1.2.5). Before you read it, take care to review the notes
from that section, particularly (1.2.4) and (1.2.5).
Proof. The method is

    x_{k+1} = x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1})).

We'll use this to derive an expression for the error at step k + 1 in terms of the error at step k. In particular,

    τ − x_{k+1} = τ − x_k + f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1}))
                = τ − x_k + f(x_k)/f'(w_k)
                = τ − x_k + (x_k − τ) f'(z_k)/f'(w_k)
                = (τ − x_k) (1 − f'(z_k)/f'(w_k)).

Therefore

    |τ − x_{k+1}| / |τ − x_k| ≤ |1 − f'(z_k)/f'(w_k)|.

So it remains to be shown that

    |1 − f'(z_k)/f'(w_k)| < 1.

Let's first assume that f'(τ) = α > 0. (If f'(τ) = α < 0 the following arguments still hold, just with a few small changes.) Because f' is continuous in the region [τ − h, τ + h], for any given ε > 0 there is a δ > 0 such that |f'(x) − α| < ε for any x ∈ [τ − δ, τ + δ]. Take ε = α/4. Then |f'(x) − α| < α/4, and thus

    (3/4)α ≤ f'(x) ≤ (5/4)α  for any x ∈ [τ − δ, τ + δ].

Then, so long as w_k and z_k are both in [τ − δ, τ + δ],

    f'(z_k)/f'(w_k) ≤ 5/3.

This gives

    |τ − x_{k+1}| / |τ − x_k| ≤ 2/3,

which is what we needed.
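To see the convergence in practice, here is a minimal Matlab sketch of the secant method; the choice of f and the starting points are illustrative, not taken from the notes' examples.

% A minimal sketch of the secant method; the choice of f, x0 and x1
% is illustrative. Compare successive residuals to observe convergence.
f = @(x)(exp(x) - 2*x - 1);     % has a root near x = 1.2564
x0 = 1; x1 = 2;                 % two starting points
for k = 1:8
    x2 = x1 - f(x1)*(x1 - x0)/(f(x1) - f(x0));   % the secant step
    fprintf('k=%d, x=%.10f, |f(x)|=%.2e\n', k, x2, abs(f(x2)));
    x0 = x1; x1 = x2;
end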
.................................................
Chapter 2
Initial Value Problems
2.1 Introduction

The first part of this introduction is based on [5, Chap. 6]. The rest of the notes mostly follow [1, Chap. 12].

The growth of some tumours can be modelled as

    R'(t) = −(1/3) Si R(t) + 2λσ / (µR + √(µ²R² + 4σ)),

subject to the initial condition R(t_0) = a, where R is the radius of the tumour at time t. Clearly, it would be useful to know the value of R at certain times in the future. Though it's essentially impossible to solve for R exactly, we can accurately estimate it. In this section, we'll study techniques for this.
.................................................

Initial Value Problems (IVPs) are differential equations of the form: Find y(t) such that

    dy/dt = f(t, y)  for t > t_0,  and  y(t_0) = y_0.   (2.1.1)

Here y' = f(t, y) is the differential equation and y(t_0) = y_0 is the initial value.

Some IVPs are easy to solve. For example:

    y' = t²  with y(1) = 1.

Just integrate the differential equation to get that y(t) = t³/3 + C, and use the initial value to find the constant of integration. This gives the solution y(t) = (t³ + 2)/3. However, most problems are much harder, and some don't have solutions at all.

In many cases, it is possible to determine that a given problem does indeed have a solution, even if we can't write it down. The idea is that the function f should be "Lipschitz", a notion closely related to that of a contraction (1.4.7).

Definition 2.1.1. A function f satisfies a Lipschitz Condition¹ (with respect to its second argument) in the rectangular region D if there is a positive real number L such that

    |f(t, u) − f(t, v)| ≤ L|u − v|   (2.1.2)

for all (t, u) ∈ D and (t, v) ∈ D.

Example 2.1.2. For each of the following functions f, show that it satisfies a Lipschitz condition, and give an upper bound on the Lipschitz constant L.

(i) f(t, y) = y/(1 + t)² for 0 ≤ t < ∞.
(ii) f(t, y) = 4y − e⁻ᵗ for all t.
(iii) f(t, y) = −(1 + t²)y + sin(t) for 1 ≤ t ≤ 2.
.................................................

The reason we are interested in functions satisfying Lipschitz conditions is as follows:

Proposition 2.1.3 (Picard's²). Suppose that the real-valued function f(t, y) is continuous for t ∈ [t_0, t_M] and y ∈ [y_0 − C, y_0 + C]; that |f(t, y_0)| ≤ K for t_0 ≤ t ≤ t_M; and that f satisfies the Lipschitz condition (2.1.2). If

    C ≥ (K/L)(e^{L(t_M − t_0)} − 1),

then (2.1.1) has a unique solution on [t_0, t_M]. Furthermore,

    |y(t) − y(t_0)| ≤ C  for t_0 ≤ t ≤ t_M.

¹ Rudolf Otto Sigismund Lipschitz, Germany, 1832–1903. Made many important contributions to science in areas that include differential equations, number theory, Fourier Series, celestial mechanics, and analytic mechanics.

² Charles Emile Picard, France, 1856–1941. He made important discoveries in the fields of analysis, function theory, differential equations and geometry. He supervised the Ph.D. of Padraig de Brún, Professor of Mathematics (and President) of University College Galway/NUI Galway.

You are not required to know this theorem for this course. However, it's important to be able to determine when a given f satisfies a Lipschitz condition.
2.2 One step methods

2.2.1 Euler's Method
Classical numerical methods for IVPs attempt to generate approximate solutions at a finite set of discrete
points t0 < t1 < t2 < · · · < tn . The simplest is Euler’s Method 3 and may be motivated as follows:
Suppose we know y(ti ), and want to find y(ti+1 ). From
the differential equation we know the slope of the tangent to y at ti . So, if this is similar to the slope of the
line joining (ti , y(ti )) and (ti+1 , y(ti+1 )):
    y'(t_i) = f(t_i, y(t_i)) ≈ (y_{i+1} − y_i)/(t_{i+1} − t_i).
[Figure: sketch of y(t) on [t_i, t_{i+1}], showing the secant line joining the points on y(t) at t_i and t_{i+1}, and the tangent to y(t) at t = t_i; the step has width h.]
Method 2.2.1 (Euler's Method). Choose equally spaced points t_0, t_1, . . . , t_n, so that

    t_i − t_{i−1} = h = (t_n − t_0)/n  for i = 1, . . . , n.

We call h the "time step". Let y_i denote the approximation for y(t) at t = t_i. Set

    y_{i+1} = y_i + h f(t_i, y_i),   i = 0, 1, . . . , n − 1.   (2.2.3)
Example 2.2.2. Taking h = 1, estimate y(4) where

    y'(t) = y/(1 + t²),  y(0) = 1.   (2.2.4)
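For instance, with h = 1 the steps of (2.2.3) can be carried out by hand:

    y_1 = y_0 + y_0/(1 + 0²) = 1 + 1 = 2,
    y_2 = 2 + 2/(1 + 1²) = 3,
    y_3 = 3 + 3/(1 + 2²) = 3.6,
    y_4 = 3.6 + 3.6/(1 + 3²) = 3.96,

so the computed estimate is y(4) ≈ 3.96 (the value recorded for h = 1 in Table 2.1 below).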
The true solution to this is y(t) = e^{arctan(t)}, so y(4) = 3.7652. If we had chosen h = 4 we would have required only one step: y_n = y_0 + 4f(t_0, y_0) = 5. However, this would not be very accurate: the computed solution with h = 1 is much more accurate than the computed solution with h = 4. This is also demonstrated in Figure 2.1 below, and in Table 2.1, where we see that the error seems to be proportional to h.

[Fig. 2.1: Euler's method for Example 2.2.2 with h = 4, h = 2, h = 1 and h = 1/2.]
³ Leonhard Euler, 1707–1783, Switzerland. One of the greatest mathematicians of all time, he made vital contributions to geometry, calculus, number theory, finite differences, special functions, differential equations, continuum mechanics, astronomy, elasticity, acoustics, light, hydraulics, music, cartography, and much more.
    n     h     y_n      |y(t_n) − y_n|
    1     4     5.0      1.235
    2     2     4.2      0.435
    4     1     3.960    0.195
    8     1/2   3.881    0.115
    16    1/4   3.831    0.065
    32    1/8   3.800    0.035

Table 2.1: Error in Euler's method for Example 2.2.2
2.3 Error Analysis

2.3.1 General one-step methods

Euler's method is an example of the more general one-step methods, which have the form:

    y_{i+1} = y_i + h Φ(t_i, y_i; h).   (2.3.5)

To get Euler's method, just take Φ(t_i, y_i; h) = f(t_i, y_i).

In the introduction, we motivated Euler's method with a geometrical argument. An alternative, more mathematical way of deriving Euler's Method is to use a truncated Taylor Series. This again motivates formula (2.2.3), and also suggests that at each step the method introduces a (local) error of h²y''(η)/2. (More on this later.)

2.3.2 Two types of error

We'll now give an error analysis for general one-step methods, and then look at Euler's Method as a specific example. First, some definitions.

Definition 2.3.1. Global Error: E_i = y(t_i) − y_i.

Definition 2.3.2. Truncation Error:

    T_i := (y(t_{i+1}) − y(t_i))/h − Φ(t_i, y(t_i); h).   (2.3.6)

It can be helpful to think of T_i as representing how much the difference equation (2.2.3) differs from the differential equation. We can also determine the truncation error for Euler's method directly from a Taylor Series.

The relationship between the global error and truncation errors is explained in the following (important!) result, which in turn is closely related to Proposition 2.1.3:

Proposition 2.3.3. Let Φ be Lipschitz with constant L. Then

    |E_n| ≤ T (e^{L(t_n − t_0)} − 1)/L,   (2.3.7)

where T = max_{i=0,1,...,n} |T_i|.

(See also Exercise 2.9.3.)

2.3.3 Analysis of Euler's method

For Euler's method, we get

    T = max_{0≤j≤n} |T_j| ≤ (h/2) max_{t_0≤t≤t_n} |y''(t)|.

Example 2.3.4. Given the problem:

    y' = 1 + t + y/t  for t ≥ 1;  y(1) = 1,

find an approximation for y(2).

(i) Give an upper bound for the global error taking n = 4 (i.e., h = 1/4).

(ii) What n should you take to ensure that the global error is no more than 0.1?

To answer these questions we need to use (2.3.7), which requires that we find L and an upper bound for T. In this instance, L is easy. (Note: This is a particularly easy example. Often we need to employ the mean value theorem. See [1, Eg 12.2].)

To find T we need an upper bound for |y''(t)| on [1, 2], even though we don't know y(t). However, we do know y'(t)....
Plug these values into (2.3.7) to find E_n ≤ 0.644. In fact, the true answer is 0.43, so we see that (2.3.7) is somewhat pessimistic.

To answer (ii): What n should you take to ensure that the global error is no more than 0.1? (We should get n = 26. This is not that sharp: n = 19 will do.)

2.3.4 Convergence and Consistency

We are often interested in the convergence of a method. That is, is it true that

    lim_{h→0} y_n = y(t_n)?

Or, equivalently, that

    lim_{h→0} E_n = 0?

Given that the global error for Euler's method can be bounded:

    |E_n| ≤ h (max |y''(t)| / (2L)) (e^{L(t_n − t_0)} − 1) = hK,   (2.3.8)

we can say it converges.

Definition 2.3.5. The order of accuracy of a numerical method is p if there is a constant K so that

    |T_n| ≤ Khᵖ.

(The term order of convergence is often used instead of order of accuracy.)

In light of this definition, we read from (2.3.8) that Euler's method is first-order.

One of the requirements for convergence is Consistency:

Definition 2.3.6. A one-step method y_{n+1} = y_n + hΦ(t_n, y_n; h) is consistent with the differential equation y'(t) = f(t, y(t)) if f(t, y) ≡ Φ(t, y; 0).
.................................................

It is quite trivial to see that Euler's method is consistent. In the next section, we'll try to develop methods that are of higher order than Euler's method. That is, we will study methods for which one can show

    |E_n| ≤ Khᵖ  for some p > 1.

For these, showing consistency is a little (but only a little) more work.
2.4 Runge-Kutta 2 (RK2)
The goal of this section is to develop some techniques
to help us derive our own methods for accurately solving
Initial Value Problems. Rather than using formal theory,
the approach will be based on carefully chosen examples.
As a motivation, suppose we numerically solve some differential equation and estimate the error. If we think this error is too large, we could redo the calculation with a smaller value of h. Or we could use a better method, for example a Runge-Kutta method. These are higher-order methods that rely on evaluating f(t, y) a number of times at each step in order to improve accuracy.
We’ll first motivate one such method and then, in the
next section, look at the general framework.
2.4.1 Modified Euler Method
Recall the motivation for Euler’s method from §2.2.1. We
can do something similar to derive what’s often called the
Modified Euler’s method, or, less commonly, the Midpoint Euler’s method.
In Euler’s method, we use the slope of the tangent to
y at ti as an approximation for the slope of the secant
line joining the points (ti , y(ti )) and (ti+1 , y(ti+1 )).
One could argue, given the diagram below, that the
slope of the tangent to y at t = (ti + ti+1 )/2 = ti + h/2
would be a better approximation. This would give
    y(t_{i+1}) ≈ y_i + h f(t_i + h/2, y(t_i + h/2)).   (2.4.9)

However, we don't know y(t_i + h/2), but we can approximate it using Euler's Method: y(t_i + h/2) ≈ y_i + (h/2) f(t_i, y_i). Substituting this into (2.4.9) gives

    y_{i+1} = y_i + h f( t_i + h/2, y_i + (h/2) f(t_i, y_i) ).   (2.4.10)
[Figure: sketch of y(t) on [t_i, t_{i+1}], showing the secant line joining the points on y(t) at t_i and t_{i+1}, and the tangent to y(t) at t = t_i + h/2; the step has width h.]
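A minimal Matlab sketch of this scheme, applied to Example 2.4.1 below, might read as follows; the choice n = 8 is illustrative.

% A minimal sketch of the Modified Euler method (2.4.10) applied to
% Example 2.4.1; n (and hence h) is an illustrative choice.
f = @(t,y)( y.*log(1 + t.^2) );
n = 8; t0 = 0; tn = 1; h = (tn - t0)/n;
t = t0 + h*(0:n);
y = zeros(1, n+1);  y(1) = 1;
for i = 1:n
    k1 = f(t(i), y(i));
    y(i+1) = y(i) + h*f(t(i) + h/2, y(i) + (h/2)*k1);
end
exact = (1 + tn^2)^tn * exp(-2*tn + 2*atan(tn));
fprintf('n=%d, error=%.3e\n', n, abs(exact - y(n+1)));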
Example 2.4.1. Use the Modified Euler Method to approximate y(1) where

    y(0) = 1,  y'(t) = y log(1 + t²).

This has the solution y(t) = (1 + t²)ᵗ exp(−2t + 2 tan⁻¹ t). In Table 2.2 below we compare the error in the solution to this problem using Euler's Method (left) and the Modified Euler's Method (right) for various values of n.
           Euler                      Modified
    n      E_n        E_n/E_{n−1}     E_n        E_n/E_{n−1}
    1      3.02e-01                   7.89e-02
    2      1.90e-01    1.59           2.90e-02    2.72
    4      1.11e-01    1.72           8.20e-03    3.54
    8      6.02e-02    1.84           2.16e-03    3.79
    16     3.14e-02    1.91           5.55e-04    3.90
    32     1.61e-02    1.95           1.40e-04    3.95
    64     8.13e-03    1.98           3.53e-05    3.98
    128    4.09e-03    1.99           8.84e-06    3.99

Table 2.2: Errors in solutions to Example 2.4.1 using Euler's and Modified Euler's Methods
Clearly we get a much more accurate result using the Modified Euler Method. Even more importantly, we get a higher order of accuracy: if we halve h, the error in the Modified method is reduced by a factor of four.
2.4.2 General RK2
The "Modified Euler Method" we have just studied is an example of one of the (large) family of 2nd-order Runge-Kutta (RK2) methods, whose general form is:

    y_{i+1} = y_i + h(a k_1 + b k_2),
    k_1 = f(t_i, y_i),                              (2.4.11)
    k_2 = f(t_i + αh, y_i + βh k_1).
If we choose a, b, α and β in the right way, then the error for the method will be bounded by Kh², for some constant K.
An (uninteresting) example of such a method is to take a = 1 and b = 0: it reduces to Euler's Method. If we choose α = β = 1/2, a = 0 and b = 1, we get the "Modified" method above.
Our aim now is to deduce general rules for choosing a,
b, α and β. We’ll see that if we pick any one of these four
parameters, then the requirement that the method be
consistent and second-order determines the other three.
2.4.3 Using consistency

By demanding that RK2 be consistent we get that a + b = 1.
2.4.4 Ensuring that RK2 is 2nd-order
Next we need to know how to choose α and β. The
formal way is to use a two-dimensional Taylor series expansion. It is quite technical, and not suitable for doing
in class. Detailed notes on it are given in Section 2.10
below. Instead we’ll take a less rigorous, heuristic approach.
Because we expect that, for a second-order accurate method, |E_n| ≤ Kh² where K depends on y'''(t), if we choose a problem for which y'''(t) ≡ 0, we expect no error...
In the above example, the right-hand side of the differential equation, f(t, y), depended only on t. Now we’ll
try the same trick: using a problem with a simple known
solution (and zero error), but for which f depends explicitly on y.
Consider the DE y(1) = 1, y 0 (t) = y(t)/t. It has a
simple solution: y(t) = t. We now use that any RK2
method should be exact for this problem to deduce that
α = β.
Now we collect the above results together and show that the second-order Runge-Kutta (RK2) methods are:

    y_{i+1} = y_i + h(a k_1 + b k_2),
    k_1 = f(t_i, y_i),  k_2 = f(t_i + αh, y_i + βh k_1),

where we choose any b ≠ 0 and then set

    a = 1 − b,  α = 1/(2b),  β = α.

It is easy to verify that the Modified method satisfies these criteria.
2.5 Higher-order methods

2.5.1 Runge-Kutta 4
It is possible to construct methods that have rates of convergence higher than RK2 methods, such as the RK4 methods, for which

    |y(t_n) − y_n| ≤ Ch⁴.
However, writing down the general form of the RK4 method,
and then deriving conditions on the parameters is rather
complicated. Therefore, we’ll simply state the most popular RK4 method, without proving that it is 4th-order.
4th-Order Runge Kutta Method (RK4):
    k_1 = f(t_i, y_i),
    k_2 = f(t_i + h/2, y_i + (h/2) k_1),
    k_3 = f(t_i + h/2, y_i + (h/2) k_2),
    k_4 = f(t_i + h, y_i + h k_3),

    y_{i+1} = y_i + (h/6)( k_1 + 2k_2 + 2k_3 + k_4 ).
It can be interpreted as follows:

k_1 is the slope of y(t) at t_i;
k_2 is an approximation for the slope of y(t) at t_i + h/2 (using Euler's Method);
k_3 is an improved approximation for the slope of y(t) at t_i + h/2;
k_4 is an approximation for the slope of y(t) at t_{i+1}, computed using the slope at t_i + h/2.

Finally, the slope of the secant line joining y(t) at the points t_i and t_{i+1} is approximated using a weighted average of the above values.
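In code, one step of RK4 is direct to write. Here is a minimal Matlab sketch; the function name is our own choice, not from the lab files.

% A minimal sketch of one step of the RK4 method above.
function ynew = RK4Step(f, t, y, h)
    k1 = f(t, y);
    k2 = f(t + h/2, y + (h/2)*k1);
    k3 = f(t + h/2, y + (h/2)*k2);
    k4 = f(t + h, y + h*k3);
    ynew = y + (h/6)*(k1 + 2*k2 + 2*k3 + k4);
end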
Example 2.5.1. Recall the test problem (2.2.4). In Table 2.3 below we give the errors in the solutions computed using various methods and values of n.

           |y(t_n) − y_n|
    n      Euler        Modified     RK4
    1      1.235        3.653e-01    6.037e-01
    2      4.347e-01    4.526e-02    1.953e-01
    4      1.947e-01    2.108e-02    1.726e-02
    8      1.154e-01    1.146e-02    9.192e-04
    16     6.530e-02    4.274e-03    5.712e-05
    32     3.501e-02    1.315e-03    3.474e-06
    64     1.817e-02    3.650e-04    2.122e-07
    128    9.263e-03    9.619e-05    1.308e-08
    256    4.678e-03    2.469e-05    8.112e-10
    512    2.351e-03    6.254e-06    5.050e-11
    1024   1.178e-03    1.574e-06    3.154e-12
    2048   5.899e-04    3.948e-07    1.839e-13

Table 2.3: Errors in solutions to Example 2.5.1 using Euler's, Modified, and RK4
2.5.2 RK4: consistency and convergence
Although we won’t do a detailed analysis of RK4, we can
do a little. In particular, we would like to show it is
(i) consistent,
(ii) convergent and fourth-order, at least for some examples.
Example 2.5.2. It is easy to see that RK4 is consistent:
Example 2.5.3. In general, showing the rate of convergence is tricky. Instead, we’ll demonstrate how the
method relates to a Taylor Series expansion for the problem y 0 = λy where λ is a constant.
2.5.3 The (Butcher) Tableau
A great number of RK methods have been proposed and used through the years. A unified approach for representing and studying them was developed by John Butcher (Auckland, New Zealand). In his notation, we write an s-stage method as

    Φ(t_i, y_i; h) = Σ_{j=1}^{s} b_j k_j,  where

    k_1 = f(t_i + α_1 h, y_i),
    k_2 = f(t_i + α_2 h, y_i + β_21 h k_1),
    k_3 = f(t_i + α_3 h, y_i + β_31 h k_1 + β_32 h k_2),
    ...
    k_s = f(t_i + α_s h, y_i + β_s1 h k_1 + · · · + β_{s,s−1} h k_{s−1}).
The most convenient way to represent the coefficients is in a tableau:

    α_1 |
    α_2 | β_21
    α_3 | β_31   β_32
    ... | ...
    α_s | β_s1   β_s2   · · ·   β_{s,s−1}
        | b_1    b_2    · · ·   b_{s−1}    b_s

The tableau for the basic Euler method is trivial:

    0 |
      | 1

The two best-known RK2 methods are probably the "Modified method" of Section 2.4, and the "Improved Method" of Exercise 2.9.6. Their tableaux are:

    0   |                          0 |
    1/2 | 1/2        and           1 | 1
        | 0    1                     | 1/2   1/2

The tableau for the RK4 method above is:

    0   |
    1/2 | 1/2
    1/2 | 0     1/2
    1   | 0     0     1
        | 1/6   2/6   2/6   1/6
You should now convince yourself that these tableaux
do indeed correspond to the methods we did in class.
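To see how a tableau maps onto computation, here is a minimal Matlab sketch of one step of an explicit s-stage method driven directly by its tableau. The layout (alpha, beta, b holding the tableau entries) is our own choice, and Exercise 2.8.1 (iv) asks you to build something similar, so treat this only as a starting point.

% A minimal sketch of one explicit RK step read from a tableau:
% alpha is s-by-1, beta is s-by-s strictly lower triangular, b is s-by-1.
% For the Modified method: alpha = [0; 1/2], beta = [0 0; 1/2 0], b = [0; 1].
function ynew = RKTableauStep(f, t, y, h, alpha, beta, b)
    s = length(b);
    k = zeros(s, 1);
    for j = 1:s
        k(j) = f(t + alpha(j)*h, y + h*(beta(j,1:j-1)*k(1:j-1)));
    end
    ynew = y + h*(b(:)'*k);
end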
2.5.4 Even higher-order methods?
A Runge-Kutta method has s stages if it involves s evaluations of the function f. (That is, its formula features k_1, k_2, . . . , k_s.) We've seen a one-stage method that is 1st-order, a two-stage method that is 2nd-order, ..., and a four-stage method that is 4th-order. It is tempting to
think that for any s we can get a method of order s using
s stages. However, it can be shown that, for example, to
get a 5th -order method, you need at least 6 stages; for a
7th -order method, you need at least 9 stages. The theory involved is both intricate and intriguing, and involves
aspects of group theory, graph theory, and differential
equations. Students in third year might consider this as
a topic for their final year project.
2.6 From IVPs to Linear Systems
In this final section, we highlight some of the many important aspects of the numerical solution of IVPs that are not covered in detail in this course:

• Systems of ODEs;
• Higher-order equations;
• Implicit methods; and
• Problems in two dimensions.

We have the additional goal of seeing how these methods relate to the earlier section of the course (nonlinear problems) and the next section (linear equation solving).

2.6.1 Systems of ODEs

So far we have solved only single IVPs. However, most interesting problems are coupled systems: find functions y and z such that

    y'(t) = f_1(t, y, z),
    z'(t) = f_2(t, y, z).

This does not present much of a problem to us. For example, the Euler Method is extended to

    y_{i+1} = y_i + h f_1(t_i, y_i, z_i),
    z_{i+1} = z_i + h f_2(t_i, y_i, z_i).

Example 2.6.1. In pharmacokinetics, the flow of drugs between the blood and major organs can be modelled as

    dy/dt (t) = k_21 z(t) − (k_12 + k_elim) y(t),
    dz/dt (t) = k_12 y(t) − k_21 z(t),
    y(0) = d,  z(0) = 0,

where y is the concentration of a given drug in the bloodstream and z is its concentration in another organ. The parameters k_21, k_12 and k_elim are determined from physical experiments.

Euler's method for this is:

    y_{i+1} = y_i + h( −(k_12 + k_elim) y_i + k_21 z_i ),
    z_{i+1} = z_i + h( k_12 y_i − k_21 z_i ).

A short code sketch is given below.

2.6.2 Higher-Order problems

So far we've only considered first-order initial value problems:

    y'(t) = f(t, y);  y(t_0) = y_0.

However, the methods can easily be extended to higher-order problems:

    y''(t) + a(t) y'(t) = f(t, y);  y(t_0) = y_0, y'(t_0) = y_1.

We do this by converting the problem to a system: set z(t) = y'(t). Then:

    z'(t) = −a(t) z(t) + f(t, y),  z(t_0) = y_1,
    y'(t) = z(t),                  y(t_0) = y_0.

Example 2.6.2. Consider the following 2nd-order IVP:

    y''(t) − 3y'(t) + 2y(t) + eᵗ = 0,  y(1) = e, y'(1) = 2e.

Let z = y'; then

    z'(t) = 3z(t) − 2y(t) − eᵗ,  z(1) = 2e,
    y'(t) = z(t),                y(1) = e.

Euler's Method is

    z_{i+1} = z_i + h( 3z_i − 2y_i − e^{t_i} ),
    y_{i+1} = y_i + h z_i.
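As a minimal Matlab sketch of Euler's method for the drug model of Example 2.6.1; the parameter values (k12, k21, kelim, d) here are made up for illustration, not taken from experiments:

% A minimal sketch of Euler's method for the coupled drug model;
% all parameter values are illustrative.
k12 = 0.3; k21 = 0.1; kelim = 0.2; d = 1;
n = 100; h = 24/n;                 % simulate 24 time units
y = zeros(1, n+1); z = zeros(1, n+1);
y(1) = d; z(1) = 0;
for i = 1:n
    y(i+1) = y(i) + h*( -(k12 + kelim)*y(i) + k21*z(i) );
    z(i+1) = z(i) + h*( k12*y(i) - k21*z(i) );
end
plot(0:h:24, y, 0:h:24, z, '--');  % blood vs organ concentration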
2.6.3 Implicit methods
Although we won't dwell on the point, there are many problems for which the one-step methods we have seen will give a useful solution only when the step size, h, is small enough. For larger h, the solution can be very unstable. Such problems are called "stiff" problems. They can be solved, but are best done with so-called "implicit methods", the simplest of which is the Implicit Euler Method:

    y_{i+1} = y_i + h f(t_{i+1}, y_{i+1}).

Note that y_{i+1} appears on both sides of the equation.
To implement this method, we need to be able to solve
this non-linear problem. The most common method for
doing this is Newton’s method.
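As a minimal sketch (not the notes' code), one implicit Euler step might be implemented as follows; dfdy, the partial derivative ∂f/∂y, is assumed to be supplied.

% A minimal sketch of one implicit Euler step. We must solve
%   g(w) := w - y - h*f(t1, w) = 0   for w = y_{i+1},
% which we do with Newton's method; dfdy = df/dy is assumed given.
function ynew = ImplicitEulerStep(f, dfdy, t1, y, h)
    w = y;                           % initial guess: previous value
    for k = 1:10                     % a few Newton iterations
        g  = w - y - h*f(t1, w);
        dg = 1 - h*dfdy(t1, w);
        w  = w - g/dg;
    end
    ynew = w;
end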
2.6.4 Towards Partial Differential Equations
So far we’ve only considered ordinary differential equations: these are DEs which involve functions of just one
variable. In our examples above, this variable was time.
But of course many physical phenomena vary in space and time, and so the solutions to the differential equations that model them depend on two or more variables.
The derivatives expressed in the equations are partial
derivatives and so they are called partial differential equations (PDEs).
We will take a brief look at how to solve these (and
how not to solve them). This will motivate the following
section, on solving systems of linear equations.
Students of financial mathematics will be familiar with
the Black-Scholes equations for pricing an option, which
we mentioned in Section 1.4.5:
    ∂V/∂t − (1/2)σ²S² ∂²V/∂S² − rS ∂V/∂S + rV = 0.

With a little effort (see, e.g., Chapter 5 of "The Mathematics of Financial Derivatives: a student introduction", by Wilmott, Howison, and Dewynne) this can be transformed to the simpler-looking heat equation:

    ∂u/∂t (t, x) = ∂²u/∂x² (t, x)  for (x, t) ∈ [0, L] × [0, T],

with the initial and boundary conditions

    u(0, x) = g(x),  and  u(t, 0) = a(t), u(t, L) = b(t).

Example 2.6.3. If L = π, g(x) = sin(x), and a(t) = b(t) ≡ 0, then u(t, x) = e⁻ᵗ sin(x) (see Figure 2.2).

[Fig. 2.2: The true solution to the heat equation.]

In general, however, this problem can't be solved explicitly for arbitrary g, a, and b, and so a numerical scheme is used. Suppose we somehow knew ∂²u/∂x²; then we could just use Euler's method:

    u(t_{i+1}, x) = u(t_i, x) + h ∂²u/∂x² (t_i, x).

Although we don't know ∂²u/∂x² (t_i, x), we can approximate it. The algorithm is as follows:

1. Divide [0, T] into N intervals of width h, giving the grid {0 = t_0 < t_1 < · · · < t_{N−1} < t_N = T}, with t_i = t_0 + ih.

2. Divide [0, L] into M intervals of width H, giving the grid {0 = x_0 < x_1 < · · · < x_M = L}, with x_j = x_0 + jH.

3. Denote by u_{i,j} the approximation for u(t, x) at (t_i, x_j).

4. For each i = 0, 1, . . . , N − 1, use the following approximation for ∂²u/∂x² (t_i, x_j):

    δ²_x u_{i,j} = (1/H²)( u_{i,j−1} − 2u_{i,j} + u_{i,j+1} ),

for j = 1, 2, . . . , M − 1, and then take

    u_{i+1,j} := u_{i,j} + h δ²_x u_{i,j}.

This scheme is called an explicit method: if we know u_{i,j−1}, u_{i,j} and u_{i,j+1}, then we can explicitly calculate u_{i+1,j}. See Figure 2.3.

[Fig. 2.3: A Finite Difference Grid, with space x on the horizontal axis and time t on the vertical; if we know u_{i,j−1}, u_{i,j} and u_{i,j+1}, we can calculate u_{i+1,j}.]
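Here is a minimal Matlab sketch of this explicit scheme for Example 2.6.3; the choices of N and M are illustrative (chosen so that h/H² is small, for reasons discussed next).

% A minimal sketch of the explicit scheme for Example 2.6.3
% (L = pi, g(x) = sin(x), a = b = 0); N and M are illustrative.
T = 1; L = pi; N = 400; M = 20;
h = T/N; H = L/M;
x = 0:H:L;
u = sin(x);                          % u(0,x) = g(x)
for i = 1:N
    unew = u;
    for j = 2:M                      % interior points x_1,...,x_{M-1}
        unew(j) = u(j) + (h/H^2)*(u(j-1) - 2*u(j) + u(j+1));
    end
    unew(1) = 0; unew(M+1) = 0;      % boundary conditions
    u = unew;
end
fprintf('max error at t=T: %.3e\n', max(abs(u - exp(-T)*sin(x))));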
Unfortunately, as we will see in class, this method is not very stable. Unless we are very careful choosing h and H, huge errors occur in the approximation for larger i (time steps).

Instead one might use an implicit method: if we know u_{i−1,j}, we compute u_{i,j−1}, u_{i,j} and u_{i,j+1} simultaneously. More precisely: solve u_{i,j} − h δ²_x u_{i,j} = u_{i−1,j} for i = 1, then i = 2, i = 3, etc. Expanding the δ²_x term we get, for each i = 1, 2, 3, . . . , the set of simultaneous equations

    u_{i,0} = a(t_i),
    α u_{i,j−1} + β u_{i,j} + α u_{i,j+1} = u_{i−1,j},  j = 1, 2, . . . , M − 1,
    u_{i,M} = b(t_i),
where α = −h/H² and β = 2h/H² + 1. This could be expressed more clearly as the matrix-vector equation:

    Ax = f,

where

    A = [ 1  0  0  0  · · ·  0  0  0
          α  β  α  0  · · ·  0  0  0
          0  α  β  α  · · ·  0  0  0
                    . . .
          0  0  0  0  · · ·  α  β  α
          0  0  0  0  · · ·  0  0  1 ],

    x = ( u_{i,0}, u_{i,1}, u_{i,2}, . . . , u_{i,M−1}, u_{i,M} )ᵀ,

and

    f = ( a(t_i), u_{i−1,1}, u_{i−1,2}, . . . , u_{i−1,M−1}, b(t_i) )ᵀ.

So "all" we have to do now is solve this system of equations. That is what the next section of the course is about.
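Anticipating that, here is a minimal Matlab sketch of one implicit time step: build the (M+1)-by-(M+1) matrix A above and solve with Matlab's backslash operator. It assumes M, h, H, the previous time level uold, and the boundary values a_ti = a(t_i) and b_ti = b(t_i) are already defined.

% A minimal sketch of one implicit step; uold, a_ti, b_ti, M, h and H
% are assumed to be defined already.
alpha = -h/H^2; beta = 2*h/H^2 + 1;
A = eye(M+1);
for j = 2:M
    A(j, j-1:j+1) = [alpha, beta, alpha];
end
f = uold(:);
f(1) = a_ti; f(M+1) = b_ti;          % boundary values a(t_i), b(t_i)
unew = A\f;                          % solve the linear system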
2.7 Matlab 2: Euler's Method
In this session you'll extend your familiarity with Matlab, use it to implement some numerical methods for IVPs, and study their order of accuracy.
You should be able to complete at least Section 2.7.8
today. To verify your participation in the lab, send your
results to me by email by the end of today. We’ll do
Section 2.7 in the next lab.
2.7.1 Four ways to define a vector
We know that the most fundamental object in Matlab is a matrix. The simplest (nontrivial) example of a matrix is a vector. So we need to know how to define vectors. Here are several ways we could define the vector
vectors. Here are several ways we could define the vector
x = (0, .2, .4, . . . , 1.8, 2.0).
(2.7.1)
x = 0:0.2:2 % From 0 to 2 in steps of 0.2
x = linspace(0, 2, 11); % 11 equally spaced
% points with x(1)=0, x(11)=2.
x = [0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, ...
1.6, 1.8, 2.0]; % Define points individually
The last way is rather tedious, but this one is worse:
x(1)=0.0; x(2)=0.2, x(3)=0.4; x(4)=0.6; ...
We’ll see a less tedious way of doing this last approach
in Section 2.7.3 below.
2.7.2 Script files
Matlab is an interpretative environment: if you type a
(correct) line of Matlab code, and hit return, it will execute it immediately. For example: try >> exp(1) to
get a decimal approximation of e.
However, we usually want to string together a collection of Matlab operations and run them repeatedly. To do that, it is best to store these commands in a script file. This is done by making a file called, for example, Lab2.m
and placing in it a series of Matlab commands. A script
file is run from the Matlab command window by typing
the name of the script, e.g., >> Lab2
Try putting some of the above commands for defining
a vector into a script file and run it.
2.7.3 for–loops
When we want to run a particular set of operations a
fixed number of times we use a for–loop.
It works by iterating over a vector; at each iteration
the iterand takes each value stored in the vector in turn.
For example, here is another way to define the vector
in (2.7.1):
Initial Value Problems
for i=0:10 % 0:10 is the vector [0,1,...,10]
x(i+1) = i*0.2;
end
2.7.4 Functions
In Matlab we can define a function in a way that is quite
similar to the mathematical definition of a function. The
syntax is >> Name = @(Var)(Formula);
Examples:
f = @(x)(exp(x) - 2*x -1);
g = @(x)(log(2*x +1));
Now we can call, say, f(1) to compute e − 3. Note also
that if x is a vector, so too is f(x).
But a more interesting example would be to try
>> xk = 1;
and then repeat the line
>> xk = g(xk)
Try this and observe that the values of xk seem to be
converging. This is because we are using Fixed Point
Iteration.
Later we’ll need to know how to define functions of
two variables. This can be done as:
F = @(y,t)(y./(1 + t.^2));
2.7.5 Plotting functions
Matlab has two ways to plot functions. The easiest way
is to use a function called ezplot:
ezplot(f, [0, 2]);
plots the function f(x) for 0 ≤ x ≤ 2. A more flexible
approach is to use the plot function, which can plot
one vector as a function of another. Try these examples
below, making sure you first have defined the vector x
and the functions f and g:
figure(1);
plot(x, f(x));
figure(2);
plot(x, f(x), x, g(x), '--', x, x, '-.');

Can you work out what the syntax '--' and '-.' does? If not, ask a tutor. Also try

plot(x, f(x), 'g-o', x, g(x), 'r--x', x, x, '-.');
2.7.6 How to learn more
These notes are not an encyclopedic guide to Matlab –
they have just enough information to get started. There
are many good references online. As an exercise, access
Learning MATLAB by Tobin Driscoll through the NUI
Galway library portal. Read Section 1.6: “Things about
MATLAB that are very nice to know, but which often do
not come to the attention of beginners”.
2.7.7 Initial Value Problems

The particular example of an IVP that we'll look at in this lab is one that we had earlier in (2.2.4): estimate y(4) given that

    y'(t) = y/(1 + t²) for t > 0,  and y(0) = 1.

The true solution to this is y(t) = e^{arctan(t)}. If you don't want to solve this problem by hand, you could use Maple. The command is:

dsolve({D(y)(t)=y(t)/(1+t^2),y(0)=1},y(t));

2.7.8 Euler's Method

Euler's Method is

• Choose n, the number of points at which you will estimate y(t). Let h = (t_n − t_0)/n, and t_i = t_0 + ih.

• For i = 0, 1, 2, . . . , n − 1 set y_{i+1} = y_i + h f(t_i, y_i).

Then y(t_n) ≈ y_n. As shown in Section 2.3.3, the global error for Euler's method can be bounded:

    |E_n| := |y(T) − y_n| ≤ Kh,

for some constant K that does not depend on h (or n). That is, if we halve h (i.e., double n), the error is reduced by half.

Download the Matlab script file Euler.m. It can be run in Matlab simply by typing >> Euler

It implements Euler's method for n = 1. Read the file carefully and make sure you understand it.

The program computes a vector y that contains the estimates for y at the time-values specified in the vector t. However, Matlab indexes all vectors from 1, and not 0. So t(1) = t_0, t(2) = t_1, ..., t(n + 1) = t_n.

By changing the value of n, complete the table.

    n      y_n     E_n             ρ
    2      4.2     4.347 × 10⁻¹
    4      3.96    1.947 × 10⁻¹
    8
    16
    32
    64
    128
    256
    512

We want to use this table to verify that Euler's Method is 1st-order accurate. That is:

    |E_n| ≤ Kh  with  ρ = 1.

A computational technique that verifies the order of the method is to estimate ρ by

    ρ ≈ log₂( |E_n| / |E_{2n}| ).   (2.7.2)

Use the data in the table to verify that ρ ≈ 1 for Euler's Method.
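If you cannot download the file, the following minimal sketch behaves similarly; the Euler.m provided for the lab may differ in its details.

% A minimal sketch of what an Euler.m script can look like.
f = @(t,y)( y./(1 + t.^2) );        % right-hand side of (2.2.4)
t0 = 0; tn = 4; y0 = 1;
n = 1;                              % change n to fill in the table
h = (tn - t0)/n;
t = t0 + h*(0:n);                   % t(1) = t_0, ..., t(n+1) = t_n
y = zeros(1, n+1); y(1) = y0;
for i = 1:n
    y(i+1) = y(i) + h*f(t(i), y(i));
end
Error = abs( exp(atan(tn)) - y(n+1) );
fprintf('n=%d, y_n=%.3f, |E_n|=%.3e\n', n, y(n+1), Error);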
2.7.9 More Matlab: formatted output
When you read the code for the Euler.m script file, you’ll
see that it uses the disp function to display messages and
values. It is not a very flexible function. For example,
it only prints one piece of information per line, and it
doesn’t display in scientific notation where appropriate.
A better approach is to use the more flexible, if complicated, fprintf function. Its basic syntax is:
fprintf('Here is a message\n');
where the \n indicates a new line.
Using fprintf to display the contents of a variable
is a little more involved, and depends on how we want
the data displayed. For example:
• To display an integer:
    fprintf('Using n=%d steps\n', n);
• To display a floating point number:
    fprintf('y(n)=%f\n', Y(n+1));
• To display in exponential notation:
    fprintf('Error=%e\n', Error);
• To display 3 decimal places:
    fprintf('y(n)=%.3f\n', Y(n+1));
  or
    fprintf('Error=%.3e\n', Error);
Use these to improve the formatting of the output from
your Euler.m script. To get more information on fprintf,
type >> doc fprintf , or ask one of the tutors.
2.8 Matlab 3: RK methods

In this lab, we'll extend the code for Euler's method from Lab 2 to implement higher-order methods.

2.8.1 Runge-Kutta 2 Methods

The RK2 method is:

    y_{i+1} = y_i + h(a k_1 + b k_2),

where k_1 = f(t_i, y_i) and k_2 = f(t_i + αh, y_i + βh k_1). In Section 2.4, we saw that if we pick any b ≠ 0, and let

    a = 1 − b,  α = 1/(2b),  β = α,   (2.8.3)

then we get a second order method: |E_n| ≤ Kh².

If we choose b = 1, we get the so-called Modified or mid-point Euler Method from Section 2.4.

Adapt the Euler.m script to implement this method, and complete the table below.

    n      y_n      E_n              ρ
    2      3.720    4.526 × 10⁻²
    4      3.786    2.108 × 10⁻²
    8
    16
    32
    64
    128
    256
    512
.................................................

Important: Send a summary of your numerical results, and your code for the Modified method to [email protected] before the end of the lab.

Also, have a look at Exercise 2.8.1.

The following exercise is part of a homework assignment. Come to lectures for more details.

Exercise 2.8.1. ⋆ Here is the tableau for a three stage Runge-Kutta method, which we'll call RK3-1 (because that is its name).

    0    |
    2/3  | b_21
    α_2  | 1/3   b_32
         | 1/4   0     b_3

(i) (a) Assuming that the method is consistent, determine the value of b_3.

    (b) Assuming that the method is exact when used to compute the solution to

        y(0) = 0,  y'(t) = 2t, t > 0,

    determine α_2.

    (c) Consider the initial value problem:

        y(0) = 1,  y'(t) = λy(t).

    Using that the solution is y(t) = e^{λt}, write out a Taylor series for y(t_{i+1}) about y(t_i) up to terms of order h⁴ (note: use that h = t_{i+1} − t_i).

    Using that RK3-1 should agree with the Taylor Series expansion up to terms of order h³, determine b_21 and b_32.

(ii) Write out the equations for this method in the same form that RK4 was given in Section 2.5.1.

(iii) Write a Matlab program that implements this method. Test it for your own choice of a differential equation for which you know the exact solution.

    Now use this program to check the order of convergence of the method. If possible, have it compute the error for different values of n (e.g., n = 2, n = 4, n = 8, etc). Then produce a log-log plot of these n against the corresponding errors.

    Explain how one should interpret the slope of this line, and why.

(iv) Write a version of the program that can implement a Runge-Kutta method, up to four stages, for any tableau. Find an example of an RK2 and RK3 method that were not covered in lectures, and verify that your program works for these.
2.9 IVPs: Exercises
Exercise 2.9.1. For the following functions show that they satisfy a Lipschitz condition on the corresponding domain, and give an upper bound for L:

(i) f(t, y) = 2yt⁻⁴ for t ∈ [1, ∞),

(ii) f(t, y) = 1 + t sin(ty) for 0 ≤ t ≤ 2.
Exercise 2.9.2. Many text books, instead of giving the Lipschitz condition (2.1.2), give the following: There is a finite, positive, real number L such that

    |∂f/∂y (t, y)| ≤ L  for all (t, y) ∈ D.

Is this statement stronger than (i.e., more restrictive than), equivalent to, or weaker than (i.e., less restrictive than) statement (2.1.2)? Justify your answer.
Exercise 2.9.3. An important step in the proof of Proposition 2.3.3, but which we didn't do in class, requires the observation that if |E_{i+1}| ≤ |E_i|(1 + hL) + h|T_i|, then

    |E_i| ≤ (T/L)( (1 + hL)ⁱ − 1 ),  i = 0, 1, . . . , N.

Use induction to show that this is indeed the case.
Exercise 2.9.4. Suppose we use Euler's method to find an approximation for y(2), where y solves

    y(1) = 1,  y' = (t − 1) sin(y).

(i) Give an upper bound for the global error taking n = 4 (i.e., h = 1/4).

(ii) What n should you take to ensure that the global error is no more than 10⁻³?
Exercise 2.9.5. As a special case in which the error of Euler's method can be analysed directly, consider Euler's method applied to

    y'(t) = y(t),  y(0) = 1.

The true solution is y(t) = eᵗ.

(i) Show that the solution to Euler's method can be written as

    y_i = (1 + h)^{t_i/h},  i ≥ 0.

(ii) Show that

    lim_{h→0} (1 + h)^{1/h} = e.

This then shows that, if we denote by y_n(T) the approximation for y(T) obtained using Euler's method with n intervals between t_0 and T, then

    lim_{n→∞} y_n(T) = e^T.

Hint: Let w = (1 + h)^{1/h}, so that log w = (1/h) log(1 + h). Now use l'Hospital's rule to find lim_{h→0} w.

Exercise 2.9.6. A popular RK2 method, called the Improved Euler Method, is obtained by choosing α = 1 in (2.4.11).

(i) Use the Improved Euler Method to find an approximation for y(4) when

    y(0) = 1,  y' = y/(1 + t²),

taking n = 2. (If you wish, use Matlab.)

(ii) Using a diagram similar to the one above for the Modified Euler Method, justify the assertion that the Improved Euler Method is more accurate than the basic Euler Method.

(iii) Show that the method is consistent.

(iv) Write out what this method would be for the problem y'(t) = λy for a constant λ. How does this relate to the Taylor series expansion for y(t_{i+1}) about the point t_i?

Exercise 2.9.7. In his seminal paper of 1901, Carl Runge gave the following example of what we now call a Runge-Kutta 2 method:

    y_{n+1} = y_n + (h/4)( f(t_n, y_n) + 3 f( t_n + (2/3)h, y_n + (2/3)h f(t_n, y_n) ) ).

(i) Show that it is consistent.

(ii) Show how this method fits into the general framework of RK2 methods.

(iii) Use it to estimate the solution at the point t = 2 to y(1) = 1, y' = 1 + t + y/t, taking n = 2 time steps.

(iv) Write down what this method is for the problem y'(t) = λy(t). How does this compare with the Taylor series for y(t_{i+1}) about the point t_i?

Exercise 2.9.8. We claim that, for RK4:

    |E_N| = |y(t_N) − y_N| ≤ Kh⁴,

for some constant K. How could you verify that the statement is true in this instance (i.e., using the data of Table 2.3)? Give an estimate for K.

Exercise 2.9.9. Recall the problem in Example 2.3.4: Estimate y(2) given that

    y(1) = 1,  y' = f(t, y) := 1 + t + y/t.

(i) Show that f(t, y) satisfies a Lipschitz condition and give an upper bound for L.

(ii) Use Euler's method with h = 1/4 to estimate y(2). Using the true solution, calculate the error.
(iii) Repeat this for the RK2 method taking h = 1/2.

(iv) Use RK4 with h = 1 to estimate y(2).

Exercise 2.9.10. Here is the tableau for a three stage Runge-Kutta method:

    0    |
    α_2  | 1/2
    1    | b_31  2
         | 1/6   b_2   1/6

1. Use that the method is consistent to determine b_2.

2. The method is exact when used to compute the solution to

    y(0) = 0,  y'(t) = 2t, t > 0.

Use this to determine α_2.

3. The method should agree with an appropriate Taylor series for the solution to y'(t) = λy(t), up to terms that are O(h³). Use this to determine b_31.

Exercise 2.9.11. Write down the Euler Method for the following 3rd-order IVP:

    y''' − y'' + 2y' + 2y = x² − 1,
    y(0) = 1, y'(0) = 0, y''(0) = −1.

Exercise 2.9.12. Use a Taylor series to provide a derivation for the formula

    ∂²u/∂x² (t_i, x_j) ≈ (1/H²)( u_{i,j−1} − 2u_{i,j} + u_{i,j+1} ).

2.10 Appendix

Formal Derivation of RK2

This section is based on [1, p422]. We won't actually cover this in class, instead opting to deduce the same result in an easier, but unrigorous, way in Section 2.4.4. If we were to do it properly, this is how we would do it.

We will require that the Truncation Error (2.3.6) be second-order. More precisely, we want to be able to say that

    |T_n| = |(y(t_{n+1}) − y(t_n))/h − Φ(t_n, y(t_n); h)| ≤ Ch²,

where C is some constant that does not depend on n or h.

So the problem is to find expressions (using Taylor series) for both (y(t_{n+1}) − y(t_n))/h and Φ(t_n, y(t_n); h) that only have O(h²) remainders. To do this we need to recall two ideas from 2nd year calculus:

• To differentiate a function f(a(t), b(t)) with respect to t:

    df/dt = (∂f/∂a)(da/dt) + (∂f/∂b)(db/dt);

• The Taylor Series for a function of 2 variables, truncated at the 2nd term, is:

    f(x + η_1, y + η_2) = f(x, y) + η_1 ∂f/∂x (x, y) + η_2 ∂f/∂y (x, y) + C(max{η_1, η_2})²,

for some constant C. See [1, p422] for details.

To get an expression for y(t_{n+1}) − y(t_n), use a Taylor Series:

    y(t_{n+1}) = y(t_n) + h y'(t_n) + (h²/2) y''(t_n) + O(h³)
               = y(t_n) + h f(t_n, y(t_n)) + (h²/2) f'(t_n, y(t_n)) + O(h³)
               = y(t_n) + h f(t_n, y(t_n))
                 + (h²/2)[ ∂f/∂t + y'(t_n) ∂f/∂y ](t_n, y(t_n)) + O(h³)
               = y(t_n) + h f(t_n, y(t_n))
                 + (h²/2)[ ∂f/∂t + f ∂f/∂y ](t_n, y(t_n)) + O(h³).

This gives

    (y(t_{n+1}) − y(t_n))/h = f(t_n, y(t_n)) + (h/2)[ ∂f/∂t + f ∂f/∂y ](t_n, y(t_n)) + O(h²).   (2.10.4)

Next we expand the expression f(t_i + αh, y_i + βh f(t_i, y_i)) using a (two dimensional) Taylor Series:

    f(t_n + αh, y_n + βh f(t_n, y_n)) = f(t_n, y_n) + hα ∂f/∂t (t_n, y_n)
        + hβ f(t_n, y_n) ∂f/∂y (t_n, y_n) + O(h²).

This leads to the following expansion for Φ(t_n, y(t_n); h):

    Φ(t_n, y(t_n); h) = (a + b) f(t_n, y(t_n)) + h[ bα ∂f/∂t + bβ f ∂f/∂y ](t_n, y(t_n)) + O(h²).   (2.10.5)

So now, if we are to subtract (2.10.5) from (2.10.4) and leave only terms of O(h²), we have to choose a + b = 1 and bα = bβ = 1/2. That is, choose α and let

    β = α,  b = 1/(2α),  a = 1 − b.   (2.10.6)

(For a more detailed exposition, see [1, Chap 12].)
Chapter 3
Solving Linear Systems
3.1 Introduction
This section of the course is concerned with solving systems of linear equations ("simultaneous equations"). All problems are of the form: find the set of real numbers x_1, x_2, . . . , x_n such that

    a_11 x_1 + a_12 x_2 + · · · + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + · · · + a_2n x_n = b_2
        ...
    a_n1 x_1 + a_n2 x_2 + · · · + a_nn x_n = b_n,

where the a_ij and b_i are real numbers.

It is natural to rephrase this using the language of linear algebra: Find x = (x_1, x_2, . . . , x_n)ᵀ ∈ Rⁿ such that

    Ax = b,   (3.1.1)

where A ∈ R^{n×n} is an n × n matrix and b ∈ Rⁿ is a (column) vector with n entries.

In this section, we'll try to find clever ways of solving this system. In particular,

1. we'll argue that it's unnecessary and (more importantly) expensive to try to compute A⁻¹;

2. we'll have a look at Gaussian Elimination, but will study it in the context of LU-factorisation;

3. after a detour to talk about matrix norms, we calculate the condition number of a matrix;

4. this last task will require us to be able to estimate the eigenvalues of matrices; so we will finish the module by studying how to do that.

3.1.1 A short review of linear algebra

• A vector x ∈ Rⁿ is an ordered n-tuple of real numbers, x = (x_1, x_2, . . . , x_n)ᵀ.

• A matrix A ∈ R^{m×n} is a rectangular array of n columns of m real numbers.

• We assume you know how to multiply a matrix by a vector, and a matrix by a matrix.

• You can write the scalar product of two vectors x and y as (x, y) = xᵀy = Σ_{i=1}^{n} x_i y_i. (This is also called the "(usual) inner product" or the "dot product").

• Recall that (Ax)ᵀ = xᵀAᵀ, so (x, Ay) = xᵀAy = (Aᵀx)ᵀy = (Aᵀx, y).

• Letting I be the identity matrix, if there is a matrix B such that AB = BA = I, then we call B the inverse of A and write B = A⁻¹. If there is no such matrix, we say that A is singular.

Lemma 3.1.1. The following are equivalent:

1. For any b, Ax = b has a solution.
2. If there is a solution to Ax = b, it is unique.
3. For all x ∈ Rⁿ, if Ax = 0 then x = 0.
4. The columns (or rows) of A are linearly independent.
5. There exists a matrix A⁻¹ ∈ R^{n×n} such that AA⁻¹ = I = A⁻¹A.
6. det(A) ≠ 0.

Item 5 of the above list suggests that we could solve (3.1.1) as follows: find A⁻¹ and then compute x = A⁻¹b. However, this is not the best way to proceed. We only need x, and computing A⁻¹ requires quite a bit of work.

3.1.2 The time it takes to compute det(A)

In first-year linear algebra courses, we would compute

    A⁻¹ = (1/det(A)) A*,

where A* is the adjoint matrix. But it turns out that computing det(A) directly is time-consuming.
Let d_n be the number of multiplications required to compute the determinant of an n × n matrix using the Method of Minors (also known as "Laplace's Formula"). We know that if

    A = [ a  b
          c  d ],

then det(A) = ad − bc, so d_2 = 2. If

    A = [ a  b  c
          d  e  f
          g  h  j ],

then

    det(A) = a det[ e  f ; h  j ] − b det[ d  f ; g  j ] + c det[ d  e ; g  h ],

so d_3 = 3d_2. Next, d_4 = 4d_3 and, in general, d_n = n d_{n−1}, so the cost of this approach grows like n!.

3.2 Gaussian Elimination

In the earlier sections of this course, we studied approximate methods for solving a problem: we replaced the problem (e.g., finding a zero of a nonlinear function) with an easier problem (finding the zero of a line) that has a similar solution. The method we'll use here, Gaussian Elimination¹, is exact: it replaces the problem with one that is easier to solve and has the same solution.

¹ Carl Friedrich Gauß, Germany, 1777–1855. Although he produced many very important original ideas, this wasn't one of them. The Chinese knew of "Gaussian Elimination" about 2000 years ago. His actual contributions included major discoveries in the areas of number theory, geometry, and astronomy.

3.2.1 The basic idea

Example 3.2.1. Consider the problem:

    [ −1   3  −1 ] [x_1]   [  5 ]
    [  3   1  −2 ] [x_2] = [  0 ]
    [  2  −2   5 ] [x_3]   [ −9 ]

We can perform a sequence of elementary row operations to yield the system:

    [ −1   3  −1 ] [x_1]   [  5 ]
    [  0   2  −1 ] [x_2] = [  3 ]
    [  0   0   5 ] [x_3]   [ −5 ]

In the latter form, the problem is much easier: from the 3rd row, it is clear that x_3 = −1; substitute into row 2 to get x_2 = 1, and then into row 1 to get x_1 = −1.

3.2.2 Elementary row operations as matrix multiplication

Recall that in Gaussian elimination, at each step in the process, we perform an elementary row operation such as

    A = [ a_11  a_12  a_13
          a_21  a_22  a_23
          a_31  a_32  a_33 ]

being replaced by

    [ a_11              a_12              a_13
      a_21 + µ_21 a_11  a_22 + µ_21 a_12  a_23 + µ_21 a_13
      a_31              a_32              a_33 ]
      = A + µ_21 [ 0     0     0
                   a_11  a_12  a_13
                   0     0     0 ],

where µ_21 = −a_21/a_11. Because

    [ 0     0     0       [ 0  0  0   [ a_11  a_12  a_13
      a_11  a_12  a_13  =   1  0  0     a_21  a_22  a_23
      0     0     0 ]       0  0  0 ]   a_31  a_32  a_33 ],

we can write the row operation as (I + µ_21 E^(21))A, where E^(pq) is the matrix of all zeros, except for e_pq = 1.

In general, each of the row operations in Gaussian Elimination can be written as

    (I + µ_pq E^(pq)) A,  where 1 ≤ q < p ≤ n,   (3.2.2)

and I + µ_pq E^(pq) is an example of a Unit Lower Triangular Matrix.

As we will see, each step of the process will involve multiplying A by a unit lower triangular matrix, resulting in an upper triangular matrix. It turns out that triangular matrices have important properties. So we'll study these, and see how they relate to Gaussian Elimination.
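You can check this mechanically in Matlab. The following is a minimal sketch (our own, not one of the lab files), using the matrix A from Example 3.2.1:

% A minimal sketch checking that I + mu*E performs the row operation.
A = [-1 3 -1; 3 1 -2; 2 -2 5];
E21 = zeros(3); E21(2,1) = 1;        % E^(21)
mu21 = -A(2,1)/A(1,1);
M1 = eye(3) + mu21*E21;              % a unit lower triangular matrix
M1*A                                 % row 2 now has a zero in column 1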
3.2.3 Unit Lower Triangular and Upper Triangular Matrices

Definition 3.2.2. L ∈ R^{n×n} is a Lower Triangular Matrix (LTM) if the only non-zero entries are on or below the main diagonal, i.e., if l_ij = 0 for 1 ≤ i < j ≤ n. It is a Unit Lower Triangular Matrix (ULTM) if, in addition, l_ii = 1.

U ∈ R^{n×n} is an Upper Triangular Matrix (UTM) if u_ij = 0 for 1 ≤ j < i ≤ n. It is a Unit Upper Triangular Matrix (UUTM) if u_ii = 1.
Example 3.2.3.
Triangular matrices have many important properties. One of these is: the determinant of a triangular matrix is the product of the diagonal entries (for proof, see Exercise 3.11.2). Another is that the eigenvalues of a triangular matrix are just the diagonal entries (see Exercise 3.11.3). Some more properties are noted in Theorem 3.2.6 below. The style of proof used in that theorem recurs throughout this section of the course and so is worth studying for its own sake. The key idea is to partition the matrix and then apply inductive arguments.

Definition 3.2.4. X is a submatrix of A if it can be obtained by deleting some rows and columns of A.

Definition 3.2.5. The Leading Principal Submatrix of order k of A ∈ R^{n×n} is A^(k) ∈ R^{k×k}, obtained by deleting all but the first k rows and columns of A. (Simply put, it's the k × k matrix in the top left-hand corner of A.)

Next recall that if A and V are matrices of the same size, and each is partitioned

    A = [ B  C        V = [ W  X
          D  E ],           Y  Z ],

where B is the same size as W, C is the same size as X, etc., then

    AV = [ BW + CY   BX + CZ
           DW + EY   DX + EZ ].

Armed with this, we are now ready to state our theorem on lower triangular matrices.

Theorem 3.2.6 (Properties of Lower Triangular Matrices). For any integer n ≥ 2:

(i) If L_1 and L_2 are n × n Lower Triangular (LT) Matrices, then so too is their product L_1 L_2.

(ii) If L_1 and L_2 are n × n Unit Lower Triangular (ULT) matrices, then so too is their product L_1 L_2.

(iii) An LT matrix L is nonsingular if and only if all the l_ii ≠ 0. In particular, all ULT matrices are nonsingular.

(iv) The inverse of an LTM is an LTM. The inverse of a ULTM is a ULTM.

In class, we'll cover the main ideas needed to prove part (iv) of Theorem 3.2.6, and for LTMs (the arguments for ULTMs are almost identical: if anything, a little easier). The other parts are left to Exercise 3.11.4. We restate Part (iv) as follows:

    Suppose that L ∈ R^{n×n} is a Lower Triangular Matrix (LTM), with n ≥ 2, and that there is a matrix L⁻¹ ∈ R^{n×n} such that L⁻¹L = I_n. Then L⁻¹ is also a Lower Triangular Matrix.

Proof. Here are the main ideas. Some more details are given in Section 3.2.4 below.
Theorem 3.2.7. Statements analogous to Theorem 3.2.6 hold for upper triangular and unit upper triangular matrices. (For proof, see Exercise 3.11.5.)

3.2.4 Extra: some details on the proof of Theorem 3.2.6

The proof is by induction. First show that the statement is true for n = 2 by writing down the inverse of a 2 × 2 nonsingular lower triangular matrix.

Next we assume that the result holds true for k = 3, 4, . . . , n − 1. For the case where L ∈ R^{n×n}, partition L by the last row and column:

    L = [ l_{1,1}    0          · · ·   0              0
          l_{2,1}    l_{2,2}    · · ·   0              0
            ...
          l_{n−1,1}  l_{n−1,2}  · · ·   l_{n−1,n−1}    0
          l_{n,1}    l_{n,2}    · · ·   l_{n,n−1}      l_{n,n} ]

      = [ L^(n−1)  0
          rᵀ       l_{n,n} ],

where L^(n−1) is the LT matrix with n − 1 rows and columns obtained by deleting the last row and column of L, 0 = (0, 0, . . . , 0)ᵀ, and r ∈ R^{n−1}. Do the same for L⁻¹:

    L⁻¹ = [ X   y
            zᵀ  β ],

and use the fact that their product is I_n:

    L L⁻¹ = [ L^(n−1)  0         [ X   y
              rᵀ       l_{n,n} ]   zᵀ  β ]

          = [ L^(n−1) X + 0 zᵀ     L^(n−1) y + 0 β
              rᵀ X + l_{n,n} zᵀ    rᵀ y + l_{n,n} β ]

          = [ I_{n−1}  0
              0ᵀ       1 ].

This gives that:

• L^(n−1) X = I_{n−1} so, by the inductive hypothesis, X is an LT matrix.

• L^(n−1) y = 0. Since L^(n−1) is invertible, we get that y = (L^(n−1))⁻¹ 0 = 0.

• zᵀ = (−rᵀX)/l_{nn}, which exists because, since det(L) ≠ 0, we know that l_{nn} ≠ 0.

• Similarly, β = (1 − rᵀy)/l_{nn}.

3.3 LU-factorisation

In this section we'll see that applying Gaussian elimination to solve the linear system Ax = b is equivalent to factoring A as the product of a unit lower triangular (ULT) and an upper triangular (UT) matrix.

3.3.1 Factorising A

It was mentioned towards the end of Section 3.2 that each elementary row operation in Gaussian Elimination (GE) involves replacing A with (I + µ_rs E^(rs))A. But (I + µ_rs E^(rs)) is a unit lower triangular matrix. Also, when we are finished we have an upper triangular matrix. So we can write the whole process as

    L_k L_{k−1} L_{k−2} · · · L_2 L_1 A = U,

where each of the L_i is a ULT matrix. But Theorem 3.2.6 tells us that the product of ULT matrices is itself a ULT matrix. So we can write the whole process as

    L̃A = U.

Theorem 3.2.6 also tells us that the inverse of a ULT matrix exists and is a ULT matrix. So we can write

    A = LU,

where L is unit lower triangular and U is upper triangular. This is called...

Definition 3.3.1. The LU-factorisation of a matrix A is a unit lower triangular matrix L and an upper triangular matrix U such that LU = A.

Example 3.3.2. If A = [ 3  2
                        −1  2 ], then:

Example 3.3.3. If A = [ 3  −1   1
                        2   4   3
                        0   2  −4 ], then:
3.3.2 A formula for LU-factorisation

In the above examples, we deduced the factorisation by inspection. But the process suggests an algorithm/formula. That is, we need to work out formulae for L and U, where

    a_{i,j} = (LU)_{ij} = Σ_{k=1}^{n} l_{ik} u_{kj},  1 ≤ i, j ≤ n.

Since L and U are triangular:

    if i ≤ j, then a_{i,j} = Σ_{k=1}^{i} l_{ik} u_{kj};
    if j < i, then a_{i,j} = Σ_{k=1}^{j} l_{ik} u_{kj}.

The first of these equations can be written as

    a_{i,j} = Σ_{k=1}^{i−1} l_{ik} u_{kj} + l_{ii} u_{ij}.

But l_{ii} = 1, so:

    u_{i,j} = a_{ij} − Σ_{k=1}^{i−1} l_{ik} u_{kj},   i = 1, . . . , j,  j = 1, . . . , n.   (3.3.3a)

And from the second:

    l_{i,j} = ( a_{ij} − Σ_{k=1}^{j−1} l_{ik} u_{kj} ) / u_{jj},   i = 2, . . . , n,  j = 1, . . . , i − 1.   (3.3.3b)
.................................................

Example 3.3.4. Find the LU-factorisation of

    A = [ −1   0   1   2
          −2  −2   1   4
          −3  −4  −2   4
          −4  −6  −5   0 ]

Details of Example 3.3.4: First, using (3.3.3a) with i = 1, we have u_{1j} = a_{1j}:

    U = [ −1   0     1     2
           0  u_22  u_23  u_24
           0   0    u_33  u_34
           0   0     0    u_44 ]

Then (3.3.3b) with j = 1 gives l_{i1} = a_{i1}/u_{11}:

    L = [ 1    0     0    0
          2    1     0    0
          3   l_32   1    0
          4   l_42  l_43  1 ]

Next, (3.3.3a) with i = 2 gives u_{2j} = a_{2j} − l_{21} u_{1j}:

    U = [ −1   0     1     2
           0  −2    −1     0
           0   0    u_33  u_34
           0   0     0    u_44 ]

Then (3.3.3b) with j = 2 gives l_{i2} = ( a_{i2} − l_{i1} u_{12} ) / u_{22}:

    L = [ 1   0    0    0
          2   1    0    0
          3   2    1    0
          4   3   l_43  1 ]

Etc....

3.3.3 Existence of an LU-factorisation

Question: Is this factorisation possible for all matrices? The answer is no. For example, suppose one of the u_{ii} = 0 in (3.3.3b). We'll now characterise the matrices which can be factorised.

Example 3.3.5.

To prove the next theorem we need the Binet-Cauchy Theorem:

    det(AB) = det(A) det(B).

Theorem 3.3.6. If n ≥ 2 and A ∈ R^{n×n} is such that every leading principal submatrix A^(k) of A, 1 ≤ k < n, is nonsingular, then A has an LU-factorisation.
3.4 Putting it all together

Now that we know how to construct the LU-factorisation of a matrix, we can use it to solve a linear system. We also need to consider the computational efficiency of the method.
3.4.1 Solving LUx = b

From Theorem 3.3.6 we can factorise an A ∈ R^{n×n} (that satisfies the theorem's hypotheses) as A = LU. But our overarching goal is to solve the problem: "find x ∈ Rⁿ such that Ax = b, for some b ∈ Rⁿ". We do this by first solving Ly = b for y ∈ Rⁿ, and then Ux = y. Because L and U are triangular, this is easy. The process is called back-substitution.

Example 3.4.1. Use LU-factorisation to solve

    [ −1   0   1   2 ] [x_1]   [ −2 ]
    [ −2  −2   1   4 ] [x_2] = [ −3 ]
    [ −3  −4  −2   4 ] [x_3]   [ −1 ]
    [ −4  −6  −5   0 ] [x_4]   [  1 ]

Solution: Take the LU-factorisation from Example 3.3.4. Then...
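In Matlab, the two triangular solves can be written in a few lines. This is a minimal sketch, assuming L, U and b are already defined, with L unit lower triangular:

% A minimal sketch of the two triangular solves: Ly = b (forward
% substitution, using l_ii = 1) and Ux = y (back substitution).
n = length(b);
y = zeros(n,1);
for i = 1:n                           % forward substitution
    y(i) = b(i) - L(i,1:i-1)*y(1:i-1);
end
x = zeros(n,1);
for i = n:-1:1                        % back substitution
    x(i) = ( y(i) - U(i,i+1:n)*x(i+1:n) ) / U(i,i);
end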
3.4.2 Pivoting

Example 3.4.2. Suppose we want to compute the LU-factorisation of

    A = [ 0   2  −4
          2   4   3
          3  −1   1 ].

We can't compute l_{2,1} because u_{1,1} = 0. But if we swap rows 1 and 3 we get the matrix in Example 3.3.3, and so can form an LU-factorisation. This is like changing the order of the linear equations we want to solve. If

    P = [ 0 0 1                  [ 3  −1   1
          0 1 0     then   PA =    2   4   3
          1 0 0 ]                  0   2  −4 ].

This is called pivoting, and P is the permutation matrix.

Definition 3.4.3. P ∈ R^{n×n} is a Permutation Matrix if every entry is either 0 or 1 (it is a Boolean Matrix) and all its row and column sums are 1.

Theorem 3.4.4. For any A ∈ R^{n×n} there exists a permutation matrix P such that PA = LU.

For a proof, see p53 of the textbook.
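Matlab's built-in lu function performs this row-swapping automatically. For example, for the matrix A of Example 3.4.2 we can check that PA = LU:

>> A = [0 2 -4; 2 4 3; 3 -1 1];
>> [L, U, P] = lu(A);     % P is a permutation matrix with P*A = L*U
>> norm(P*A - L*U)        % zero (up to round-off)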
3.4.3 The computational cost

How efficient is the method of LU-factorisation for solving Ax = b? That is, how many computational steps (additions and multiplications) are required? In Section 2.6 of the textbook [1], you'll find a discussion that goes roughly as follows.

Suppose we want to compute l_{i,j}. From the formula (3.3.3b), we see that this takes j − 2 additions, j − 1 multiplications, 1 subtraction and 1 division: a total of 2j − 1 operations. Recall that

    1 + 2 + · · · + k = (1/2) k(k + 1),    and    1² + 2² + · · · + k² = (1/6) k(k + 1)(2k + 1).

So the number of operations required for computing L is

    Σ_{i=2}^{n} Σ_{j=1}^{i−1} (2j − 1) = Σ_{i=2}^{n} (i² − 2i + 1) = (1/6)(n − 1) n (2n − 1) ≤ Cn³,

for some constant C. A similar (slightly smaller) number of operations is required for computing U. (For a slightly different approach that yields cruder estimates, but requires a little less work, have a look at Lecture 10 of Afternotes [3].)

This doesn't tell us how long a computer program will take to run, but it does tell us how the execution time grows with n. For example, if n = 100 and the program takes a second to execute, then for n = 1000 we'd expect it to take about a thousand seconds.
3.4.4 Towards an error analysis
Unlike the other methods we have studied so far in this course, we shouldn't have to do an error analysis in the sense of estimating the difference between the true solution and our numerical one: in exact arithmetic, the LU-factorisation approach gives exactly the true solution.

However, things are not that simple. Unlike the methods in earlier sections, the effects of (inexact) floating point computations become very pronounced. In the next section, we'll develop the ideas needed to quantify these effects.
3.5 Matlab 3: LU-factorisations
The LU-factorisation of a matrix is typically used
to solve a linear system. However, we’ll use it to
compute the determinant of a matrix, and test if
the approach is more efficient than the “Method of
Minors”.
The files mentioned below can be downloaded from
http://www.maths.nuigalway.ie/MA385/lab4
3.5.1 How not to do it
In Section 3.1.2 we considered how to compute the determinant of an N × N matrix using the method of minors (a.k.a. "Laplace's Formula"). The algorithm is recursive:

• If N = 1, then det(A) = a_{1,1}. Otherwise

• det(A) = Σ_{i=1}^{N} (−1)^{i−1} a_{1,i} det(A^{(1,i)}), where A^{(1,i)} is the submatrix/minor one gets by deleting the first row and i-th column of A.
This is implemented in the code MyDet.m. This is an
example of a Matlab function file. A function file begins
with the keyword function. It can be invoked within
Matlab just by typing the name.
Download and test it on some (small!) matrix. As we saw in class, the time it takes to run is proportional to N!.
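A stripped-down sketch of the same recursion (ours; the provided MyDet.m may differ in detail) looks like this:

function d = mydet(A)
% MYDET  Determinant by expansion along the first row (method of minors).
% Runs in time proportional to N!, so only use it on small matrices.
N = size(A, 1);
if N == 1
    d = A(1,1);
else
    d = 0;
    for i = 1:N
        minor = A(2:N, [1:i-1, i+1:N]);   % delete row 1 and column i
        d = d + (-1)^(i-1) * A(1,i) * mydet(minor);
    end
end
end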
To see how long it takes to run in practice, you could
try something like this:
>> A= rand(4) % Create a random 4x4 matrix
>> tic; d = MyDet(A); toc % Start the clock
% compute det(A), stop the clock
The program TimeDet.m gives a more systematic way
of doing this. Download it, read it and run it. Based on
your results, answer the following questions.
Q1. From the data you have collected, predict how long it would take to compute the determinant of a 16 × 16 matrix.
Q2. What is the size of the largest matrix whose determinant you could compute in 10 minutes?
3.5.2 How to do it
As we’ve seen in recent lectures, one can factorise a Matrix A into L and U, where L is a Unit Lower Triangular
Matrix, and U is an Upper Triangular Matrix. The algorithm for this is
ui,j = aij −
i−1
X
k=1
lik ukj
i = 1, . . . , j − 1,
j = 2, . . . , n.
li,j
lii = 1,
j−1
X
1
=
aij −
lik ukj
ujj
k=1
i = 2, . . . , n,
j = 1, . . . , i − 1.
This is implemented in the Matlab program LUFactor.m. Download and test it.

Usually we use the LU-factorisation to solve a system of linear equations Ax = b, by first using forward-substitution to solve Ly = b and then back-substitution to solve Ux = y. Instead, we will just use the factorisation to compute det(A).
According to the Cauchy-Binet Theorem, det(XY) = det(X) det(Y). So det(A) = det(L) det(U). Furthermore, the determinant of any triangular matrix is just the product of the entries on the diagonal, so

    det(L) = l_{1,1} l_{2,2} · · · l_{N,N} = 1    and    det(U) = u_{1,1} u_{2,2} · · · u_{N,N}.

This suggests an algorithm for computing det(A) using its LU-factorisation. Moreover, we showed that the time required to compute the factorisation is bounded by CN³, and N³ ≪ N!.
.................................................
Q3. Write a short Matlab function to compute det(A)
by computing det(U).
Q4. Write a script similar to TimeDet.m that verifies that the time required for computing det(A) this way is bounded by CN³. What is your estimate for C?
Q5. Based on your results from Q4, what is the largest
value of N for which you can compute the determinant of an N × N matrix in less than 10 minutes?
Important: Send your program from Q3, and your answers to Q2 and Q5, to me by email by the end of the class. ([email protected])
.................................................
To understand the Matlab code provided, the following may be useful:
• the tic starts a stopwatch timer, toc prints the
number of seconds elapsed since tic was used.
• sum(x) computes the sum of the entries of the
vector x
• disp(’Message’) prints the word Message to the
command window. disp(x) displays the contents
of the variable x.
• [M,N]=size(A); Sets M and N to be the number
of rows and columns (respectively) in A.
• i == j checks whether i = j; it does not set i = j. Similarly, M~=N checks whether M ≠ N.
3.6 Norms, Condition Numbers, and Eigenvalues
Motivation: error analysis for linear solvers

As mentioned in Section 3.4.4, all computer implementations of algorithms that involve floating-point numbers (roughly, finite decimal approximations of real numbers) contain errors due to round-off. We saw this, for example, in labs when the convergence of Newton's method stagnated once the approximation was within about 10⁻¹⁶ of the true solution.

It transpires that computer implementations of LU-factorisation, and related methods, lead to these round-off errors being greatly magnified: this phenomenon is the main focus of this section of the course.
You might remember from earlier sections of the course that we had to assume functions were well-behaved in the sense that

    |f(x) − f(y)| / |x − y| ≤ L,

for some number L, so that our numerical schemes (e.g., fixed point iteration, Euler's method, etc.) would work. If a function doesn't satisfy a condition like this, we say it is "ill-conditioned". One of the consequences is that a small error in the inputs gives a large error in the outputs. We'd like to be able to express similar ideas about matrices: that A(u − v) = Au − Av is not too "large" compared to u − v. To do this we use the notion of a "norm" to describe the relative sizes of the vectors u and Au.
3.6.1 Three vector norms

When we want to consider the size of a real number, without regard to sign, we use the absolute value. Important properties of this function are:

1. |x| ≥ 0 for all x.
2. |x| = 0 if and only if x = 0.
3. |λx| = |λ||x|.
4. |x + y| ≤ |x| + |y| (triangle inequality).

This notion can be extended to vectors and matrices.

Definition 3.6.1. Let Rⁿ be the set of all vectors of length n of real numbers. The function ‖ · ‖ : Rⁿ → R is called a norm on Rⁿ if, for all u, v ∈ Rⁿ,

1. ‖v‖ ≥ 0,
2. ‖v‖ = 0 if and only if v = 0,
3. ‖λv‖ = |λ|‖v‖ for any λ ∈ R,
4. ‖u + v‖ ≤ ‖u‖ + ‖v‖ (triangle inequality).
The norms of a vector give us some information about the size of the vector. But there are different ways of measuring the size: you could take the absolute value of the largest entry, you could look at the "distance" from the origin, etc. There are three important examples.

Definition 3.6.2. Let v ∈ Rⁿ: v = (v₁, v₂, . . . , vₙ₋₁, vₙ)ᵀ.

(i) The 1-norm (also known as the Taxi cab or Manhattan norm) is

    ‖v‖₁ = Σ_{i=1}^{n} |vᵢ|.

(ii) The 2-norm (a.k.a. the Euclidean norm) is

    ‖v‖₂ = ( Σ_{i=1}^{n} vᵢ² )^{1/2}.

Note, if v is a vector in Rⁿ, then vᵀv = v₁² + v₂² + · · · + vₙ² = ‖v‖₂².

(iii) The ∞-norm (also known as the max-norm) is

    ‖v‖∞ = max_{i=1,...,n} |vᵢ|.

Example 3.6.3. If v = (−2, 4, −4)ᵀ then ‖v‖₁ = 10, ‖v‖₂ = √(4 + 16 + 16) = 6, and ‖v‖∞ = 4.
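These three norms are exactly what Matlab's norm function computes for vectors; for the vector of Example 3.6.3:

>> v = [-2; 4; -4];
>> [norm(v,1), norm(v,2), norm(v,inf)]   % returns 10, 6 and 4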
In Figure 3.1, the first diagram shows the unit ball in R² given by the 1-norm: the vectors x = (x₁, x₂) in R² with ‖x‖₁ = |x₁| + |x₂| = 1 all lie on the diamond (top left). In the second diagram, the vectors have √(x₁² + x₂²) = 1 and so are arranged in a circle (top right). The bottom diagram gives the unit ball in ‖ · ‖∞, for which the largest component of each vector is 1.

It is easy to show that ‖ · ‖₁ and ‖ · ‖∞ are norms (see Exercise 3.11.11). And it is not hard to show that ‖ · ‖₂ satisfies conditions (1), (2) and (3) of Definition 3.6.1. But it takes a little bit of effort to show that ‖ · ‖₂ satisfies the triangle inequality. First we need
Lemma 3.6.4 (Cauchy-Schwarz).

    | Σ_{i=1}^{n} uᵢ vᵢ | ≤ ‖u‖₂ ‖v‖₂,    for all u, v ∈ Rⁿ.

The proof can be found in any textbook on analysis. We can now apply it to show that

Lemma 3.6.5. ‖u + v‖₂ ≤ ‖u‖₂ + ‖v‖₂.

It follows directly that

Corollary 3.6.6. ‖ · ‖₂ is a norm.
[Figure here in the original notes.]
Fig. 3.1: The unit vectors in R²: ‖x‖₁ = 1 (diamond), ‖x‖₂ = 1 (circle), ‖x‖∞ = 1 (square).
3.7 Matrix Norms

Definition 3.7.1. Given any norm ‖ · ‖ on Rⁿ, there is a subordinate matrix norm on R^{n×n} defined by

    ‖A‖ = max_{v ∈ Rⁿ*} ‖Av‖ / ‖v‖,    (3.7.1)

where A ∈ R^{n×n} and Rⁿ* = Rⁿ \ {0}.

You might wonder why we define a matrix norm like this. The reason is that we like to think of A as an operator on Rⁿ: if v ∈ Rⁿ then Av ∈ Rⁿ. So rather than the norm giving us information about the "size" of the entries of a matrix, it tells us how much the matrix can change the size of a vector.

3.7.1 Computing Matrix Norms

The formula for a subordinate matrix norm in Definition 3.7.1 is sensible, but not much use if we actually want to compute, say, ‖A‖₁, ‖A‖∞ or ‖A‖₂. For a given A, we'd have to calculate ‖Av‖/‖v‖ for all v, and there are rather a lot of them. Fortunately, there are easier ways of computing the more important norms. We'll see that

• The ∞-norm of a matrix is just the largest absolute-value row sum.

• The 1-norm of a matrix is just the largest absolute-value column sum.

• The 2-norm of the matrix A is the square root of the largest eigenvalue of AᵀA.

3.7.2 The max-norm on R^{n×n}

Theorem 3.7.2. For any A ∈ R^{n×n} the subordinate matrix norm associated with ‖ · ‖∞ on Rⁿ can be computed by

    ‖A‖∞ = max_{i=1,...,n} Σ_{j=1}^{n} |a_{i,j}|.

A similar result holds for the 1-norm, the proof of which is left until Exercise 3.11.14:

Theorem 3.7.3.

    ‖A‖₁ = max_{j=1,...,n} Σ_{i=1}^{n} |a_{i,j}|.    (3.7.2)
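As a quick illustration of our own, the first two characterisations are easy to check in Matlab:

>> A = [1 -2; 3 4];
>> max(sum(abs(A), 2))    % largest absolute-value row sum:    equals norm(A,inf) = 7
>> max(sum(abs(A), 1))    % largest absolute-value column sum: equals norm(A,1)   = 6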
3.8 Computing ‖A‖₂

Computing the 2-norm of a matrix is a little harder than computing the 1- or ∞-norms. However, later we'll need estimates not just for ‖A‖, but also for ‖A⁻¹‖. And, unlike the 1- and ∞-norms, we can estimate ‖A⁻¹‖₂ without explicitly forming A⁻¹.
3.8.1 Eigenvalues

We begin by recalling some important facts about eigenvalues and eigenvectors.

Definition 3.8.1. Let A ∈ R^{n×n}. We call λ ∈ C an eigenvalue of A if there is a non-zero vector x ∈ Cⁿ such that

    Ax = λx.

We call any such x an eigenvector associated with A.

Some properties of eigenvalues:

(i) If A is a real symmetric matrix (i.e., A = Aᵀ), its eigenvalues and eigenvectors are all real-valued.

(ii) If λ is an eigenvalue of A, then 1/λ is an eigenvalue of A⁻¹.

(iii) If x is an eigenvector associated with the eigenvalue λ, then so too is ηx for any non-zero scalar η.

(iv) An eigenvector may be normalised as ‖x‖₂² = xᵀx = 1.

(v) There are n eigenvalues λ₁, λ₂, . . . , λₙ associated with the real symmetric matrix A. Let x⁽¹⁾, x⁽²⁾, . . . , x⁽ⁿ⁾ be the associated normalised eigenvectors. Then the eigenvectors are linearly independent and so form a basis for Rⁿ. That is, any vector v ∈ Rⁿ can be written as a linear combination:

    v = Σ_{i=1}^{n} αᵢ x⁽ⁱ⁾.

(vi) Furthermore, these eigenvectors are orthonormal:

    (x⁽ⁱ⁾)ᵀ x⁽ʲ⁾ = 1 if i = j,  and 0 if i ≠ j.

3.8.2 Singular values

The singular values of a matrix A are the square roots of the eigenvalues of AᵀA. They play a very important role in matrix analysis and in areas of applied linear algebra, such as image and text processing. Our interest here is in their relationship to ‖A‖₂.

Lemma 3.8.2. For any matrix A, the eigenvalues of AᵀA are real and non-negative.

Proof:

The importance of this result is that the square root of an eigenvalue of AᵀA is real and non-negative. Furthermore, part of the above proof involved showing that, if AᵀAx = λx, then

    √λ = ‖Ax‖₂ / ‖x‖₂.

This at the very least tells us that

    ‖A‖₂ := max_{x ∈ Rⁿ*} ‖Ax‖₂ / ‖x‖₂ ≥ max_{i=1,...,n} √λᵢ.

With a bit more work, we can show that if λ₁ ≤ λ₂ ≤ · · · ≤ λₙ are the eigenvalues of B = AᵀA, then ‖A‖₂ = √λₙ.

Theorem 3.8.3. Let A ∈ R^{n×n}. Let λ₁ ≤ λ₂ ≤ · · · ≤ λₙ be the eigenvalues of B = AᵀA. Then

    ‖A‖₂ = max_{i=1,...,n} √λᵢ = √λₙ.

Proof:
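Although the proof was left for class, Theorem 3.8.3 is easy to test numerically; a small check of our own:

>> A = [1 -2; 3 4];
>> lambda = eig(A' * A);             % eigenvalues of B = A'A: real and non-negative
>> [sqrt(max(lambda)), norm(A, 2)]   % the two values agree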
3.9 Condition Numbers

3.9.1 Consistency of matrix norms

It should be clear from (3.7.1) that, if ‖ · ‖ is a subordinate matrix norm then, for any u ∈ Rⁿ and A ∈ R^{n×n},

    ‖Au‖ ≤ ‖A‖ ‖u‖.

It is an important result: we'll need it later. There is an analogous statement for the product of two matrices:

Definition 3.9.1. A matrix norm ‖ · ‖ is consistent if

    ‖AB‖ ≤ ‖A‖ ‖B‖,    for all A, B ∈ R^{n×n}.

Theorem 3.9.2. Any subordinate matrix norm is consistent.

The proof is left to Exercise 3.11.17. You should note that there are matrix norms which are not consistent. See Exercise 3.11.18.
3.9.2 A short note on computer representation of numbers
This course is concerned with the analysis of numerical methods to solve problems. It is implicit that these methods are implemented on a computer. However, computers are finite machines: there are limits on the amount of information they can store. In particular there are limits on the number of digits that can be stored for each number, and there are limits on the size of number that can be stored. Because of this, just working with a computer introduces an error.

Modern computers don't store numbers in decimal (base 10), but in binary (base 2) "floating point numbers" of the form ±a × 2^{b−X}, where X is a fixed bias (X = 1023 in the standard double format). Most often 8 bytes (64 bits, or binary digits) are used to store the three components of this number: the sign, a (the "significand" or "mantissa"), and the exponent b. One bit is used for the sign, 52 for a, and 11 for b. This is called double precision. Note that a has roughly 16 decimal digits.

(Some older computer systems use single precision, where a has 23 bits (giving about 8 decimal digits) and b has 8; so too do many new GPU-based systems.)

When we try to store a real number x on a computer, we actually store the nearest floating-point number. That is, we end up storing x + δx, where |δx| is the "round-off" error and |δx|/|x| is the relative error.

Since this is not a course on computer architecture, we'll simplify a little and just take it that single and double precision systems lead to a relative error of about 10⁻⁸ and 10⁻¹⁶ respectively.
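You can observe the double precision limit directly in Matlab:

>> eps              % gap from 1 to the next larger double: about 2.2e-16
>> 1 + eps/2 == 1   % true: adding eps/2 to 1 is lost in round-off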
3.9.3 Condition Numbers
(Please refer to p68–70 of [1] for a thorough development of the concepts of local condition number and relative local condition number.)

Definition 3.9.3. The condition number of a matrix is

    κ(A) = ‖A‖ ‖A⁻¹‖.

If κ(A) ≫ 1 then we say A is ill-conditioned.
.................................................
We now come to the most important theorem of this section. It tells us how the condition number of a matrix is related to (numerical) error.

Theorem 3.9.4. Suppose that A ∈ R^{n×n} is nonsingular and that b, x ∈ Rⁿ are non-zero. If Ax = b and A(x + δx) = (b + δb), then

    ‖δx‖ / ‖x‖ ≤ κ(A) ‖δb‖ / ‖b‖.

Theorem 3.9.4 means that, if we are solving the system Ax = b but, because of (e.g.) round-off error, there is an error in the right-hand side, then the relative error in the solution is bounded by the condition number of A multiplied by the relative error in the right-hand side.²

² The importance of this result is that it shows that the numerical error in solving a linear system can be quite dependent on the error introduced by just using floating point approximations of the entries in the right-hand side vector, rather than on the solution process itself. This important observation, and the analysis above, can be easily extended to the case where, instead of solving A(x + δx) = (b + δb), we solve (A + δA)(x + δx) = (b + δb); that is, where the coefficient matrix also contains errors due to round-off. A famous example where this occurs is the Hilbert Matrix. See the nice blog post by Cleve Moler on the issue: Cleve's Corner, Feb 2nd, 2013.
Example 3.9.5. Suppose we are using a computer to solve Ax = b where

    A = [ 10    12               [ 1
          0.08  0.1 ]    and b =   1 ],

but, due to round-off error, the right-hand side has a relative error (in the ∞-norm) of 10⁻⁶. Then...
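As a numerical check of our own on Theorem 3.9.4 for this example:

>> A = [10 12; 0.08 0.1];  b = [1; 1];
>> cond(A, inf)                   % kappa_inf(A) = 6655
>> db = 1e-6 * [1; -1];           % perturbation: norm(db,inf)/norm(b,inf) = 1e-6
>> x = A \ b;  dx = A \ (b + db) - x;
>> norm(dx, inf) / norm(x, inf)   % observed relative error; within 6655e-6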
For every matrix norm we get a different condition
number. Consider the following example:
Example 3.9.6. Let A be the n × n matrix

    A = [ 1  0  0  . . .  0
          1  1  0  . . .  0
          1  0  1  . . .  0
          ⋮            ⋱
          1  0  0  . . .  1 ].

What are κ₁(A) and κ∞(A)?
3.9.4 Estimating κ₂

In the previous example we computed κ(A) by finding the inverse of A. In general, that is not practical. However, we can estimate the condition number of a matrix; in particular, this is usually possible for κ₂(A). Recall that Theorem 3.8.3 tells us that ‖A‖₂ = √λₙ, where λₙ is the largest eigenvalue of B = AᵀA. Furthermore, we can show that

    κ₂(A) = (λₙ / λ₁)^{1/2},

where λ₁ and λₙ are, respectively, the smallest and largest eigenvalues of B = AᵀA.

Hint: We first show that AᵀA and AAᵀ are "similar", i.e., they have the same eigenvalues. This can be done by noting that, if AᵀAx = λx, then A(AᵀA)x = (AAᵀ)Ax = λ(Ax). That is, if λ is an eigenvalue of AᵀA, with corresponding eigenvector x, then λ is also an eigenvalue of AAᵀ with corresponding eigenvector Ax. Now use this, and the fact that for any nonsingular matrix X, (Xᵀ)⁻¹ = (X⁻¹)ᵀ, to prove the result.
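In Matlab, then, κ₂(A) can be estimated from the eigenvalues of AᵀA alone; a sketch of our own:

>> A = [10 12; 0.08 0.1];
>> lambda = eig(A' * A);                          % eigenvalues of B = A'A
>> [sqrt(max(lambda) / min(lambda)), cond(A, 2)]  % the two values agree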
3.10 Gerschgorin's theorems
The goal of this final section is to learn a simple but very useful approach to estimating the eigenvalues of matrices. This can be used, for example, when computing ‖A‖₂ and κ₂(A). It involves two theorems which together provide quick estimates of the eigenvalues of a matrix.
3.10.1 Gerschgorin's First Theorem
Gerschgorin’s First Theorem
(See Section 5.4 of [1]).
Definition 3.10.1. Given a matrix A ∈ Rn×n , the Gerschgorin 3 Discs Di are the discs in the complex plane
with centre aii and radius ri :
n
X
ri =
3.10.2
|aij |.
j=1,j6=i
So Di = {z ∈ C : |aii − z| 6 ri }.
Theorem 3.10.2 (Gerschgorin’s First Theorem). All the
eigenvalues of A are contained in the union of the Gerschgorin discs.
Warning: an earlier draft of this proof was mostly
nonsense.
Proof. Let λ be an eigenvalues of A, so Ax = λx for the
corresponding eigenvector x. Suppose that xi is the entry
of x with largest absolute value. That is |xi | = kxk∞ .
Looking at the ith entry of the vector Ax we see that
(Ax)i = λxi =⇒
n
X
aij xj = λxi .
j=1
n
X
This can be rewritten as aii xi +
aij xj = λxi , which
j=0
j6=i
gives
(aii − λ)xi = −
n
X
aij xj
j=0
j6=i
By the triangle inequality,
|aii − λ||xi | = |
n
X
aij xj | 6
j=0
j6=i
n
X
|aij ||xj | 6 |xi |
j=0
j6=i
n
X
|aij |,
j=0
j6=i
since |xi | > |xj | for all j. Dividing by |xi | gives
|aii − λ| 6
n
X
|aij |,
j=0
j6=i
as required.
3 Semyon
Aranovich Gerschgorin, 1901–1933, Belarus. The
ideas here appeared in a paper in 1931. They were subsequently
popularised by others, most notably Olga Taussky, the pioneering
matrix theorist (A recurring theorem on determinants, The American Mathematical Monthly, vol 56, p672–676. 1949.)
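The disc centres and radii take one line each to compute in Matlab; a sketch of our own, using the matrix of Example 3.10.3:

>> A = [4 1 -2; -2 -3 0; 1 0 2];
>> centres = diag(A)                       % disc centres
>> radii = sum(abs(A), 2) - abs(diag(A))   % off-diagonal absolute row sums
>> eig(A)   % every eigenvalue lies in the union of the discs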
Gerschgorin’s 2nd Theorem
The first theorem proves that every eigenvalue is contained in a disc. The converse, that every disc contains
an eigenvalue, is not true in general. However, Instead
we have the following result.
Theorem 3.10.4 (Gerschgorin's Second Theorem). If k of the discs are disjoint from the others (that is, their union has an empty intersection with the remaining discs), then their union contains k eigenvalues.
Proof. We won’t do the proof in class, and you are not
expected to know it. Here is a sketch of it: let B(ε) be
the matrix with entries
aij
i=j
bij =
εaij i 6= j.
So B(1) = 1 and B(0) is the diagonal matrix whose entries are the diagonal entries of A.
Each of the eigenvalues of B(0) correspond to its
diagonal entries and (obviously) coincide with the Gerschgorin discs of B(0) – the centres of the Gerschgorin
discs of A.
The eigenvalues of B are the zeros of the characteristic polynomial det(B(ε) − λI) of B. Since the coefficients
of this polynomial depend continuously on ε, so too do
the eigenvalues.
Now as ε varies from 0 to 1, the eigenvalues of B(ε)
trace a path in the complex plane, and at the same time
the radii of the Gerschgorin discs of A increase from 0 to
the radii of the discs of A. If a particular eigenvalue was
in a certain disc for ε = 0, the corresponding eigenvalue
is in the corresponding disc for all ε.
Thus if one of the discs of A is disjoint from the
others, it must contain an eigenvalue.
The same reasoning applies if k of the discs of A
are disjoint from the others; their union must contain k
eigenvalues.
3.10.3 Using Gerschgorin's theorems
Example 3.10.5. Locate the regions containing the eigenvalues of

    A = [ −3  1   2
           1  4   0
           2  0  −6 ].

(The actual eigenvalues are approximately −7.018, −2.130 and 4.144.)
Example 3.10.6. Use Gerschgorin's theorems to find an upper and lower bound for the singular values of the matrix

    A = [ 4  −1  2
          2   3  1
          1   1  4 ].
Example 3.10.7. Let D ∈ R^{n×n} be the matrix with the same diagonal entries as A, and zeros for all the off-diagonal entries. That is:

    D = [ a_{1,1}   0       . . .   0             0
          0        a_{2,2}  . . .   0             0
          ⋮                  ⋱                    ⋮
          0         0       . . .  a_{n−1,n−1}    0
          0         0       . . .   0            a_{n,n} ].
Some iterative schemes (e.g., Jacobi's and Gauss-Seidel) for solving systems of equations involve successive multiplication by the matrix T = D⁻¹(A − D). Proving that these methods work often involves showing that if λ is an eigenvalue of T then |λ| < 1. Using Gerschgorin's theorem, we can show that this is indeed the case if A is strictly diagonally dominant; see the sketch below.
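Here is a small illustration of our own: for a strictly diagonally dominant A, every Gerschgorin disc of T = D⁻¹(A − D) is centred at 0 with radius strictly less than 1, so every eigenvalue of T satisfies |λ| < 1:

>> A = [4 1 -2; 1 6 2; -1 2 5];    % strictly diagonally dominant
>> D = diag(diag(A));
>> T = D \ (A - D);                % the Jacobi iteration matrix; diag(T) is zero
>> radii = sum(abs(T), 2)          % disc radii: all less than 1
>> max(abs(eig(T)))                % so this is less than 1 too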
3.11 Exercises
Exercise 3.11.1. Show that det(σA) = σⁿ det(A) for any A ∈ R^{n×n} and any scalar σ ∈ R.

Note: the purpose of this exercise is not just to help develop skills at establishing important properties of matrices. It also gives us another reason to avoid trying to calculate the determinant of the coefficient matrix in order to find the inverse, and thus the solution to the problem. For example, if A ∈ R^{10×10} and the system is rescaled by σ = 0.1, then det(A) is rescaled by 10⁻¹⁰: the matrix now appears to be almost singular!
Exercise 3.11.2. Let L be a lower triangular n × n matrix. Show that

    det(L) = Π_{j=1}^{n} l_{j,j}.

Hence give a necessary and sufficient condition for L to be invertible. What does that tell us about Unit Lower Triangular Matrices?
Exercise 3.11.3. Let L be a lower triangular n × n matrix. Show that each diagonal entry l_{j,j} of L is an eigenvalue of L.
Exercise 3.11.4. Prove Parts (i)–(iii) of Theorem 3.2.6.
Exercise 3.11.5. Prove Theorem 3.2.7.
Exercise 3.11.6. Suppose the n × n matrices A and C are both LTMs, and that there is an n × n matrix B such that AB = C. Must B be an LTM?
Suppose A and C are ULTMs. Must B be ULT? Why?
Exercise 3.11.7. Construct an alternative proof of Theorem 3.2.6 (iv) as follows: suppose that L is a non-singular lower triangular matrix. Show that if b ∈ Rⁿ is such that bᵢ = 0 for i = 1, . . . , k ≤ n, and y solves Ly = b, then yᵢ = 0 for i = 1, . . . , k. (Hint: partition L by the first k rows and columns.)
Exercise 3.11.8. Many textbooks and computing systems compute the factorisation A = LDU, where L and U are unit lower and unit upper triangular matrices respectively, and D is a diagonal matrix. Show such a factorisation exists.

Exercise 3.11.9. Suppose that A ∈ R^{n×n} is nonsingular and symmetric, and that every leading principal submatrix of A is nonsingular. Use Exercise 3.11.8 to show that A can be factorised as A = LDLᵀ.
How would this factorisation be used to solve Ax = b?

The following exercise was added after the notes were originally posted online.

Exercise 3.11.10. In 3.5 you were provided with a Matlab function LUFactor which, if given a matrix A, returns the LU-factors. Write a Matlab function/script which can be used to solve a linear system. Your function should

(i) take as inputs a matrix A ∈ R^{n×n} and vector b ∈ Rⁿ,
(ii) compute the LU-factorisation using the provided code,
(iii) use forward-substitution to solve Ly = b,
(iv) use back-substitution to solve Ux = y,
(v) return x.

Your program should also:

(a) be accompanied by a piece of code that verifies your approach can solve a 20 × 20 linear system Ax = b;
(b) verify that it works by computing the residual, ‖b − Ax‖. In Matlab, this is norm(b - A*x).
Exercise 3.11.11. Show that ‖ · ‖₁ and ‖ · ‖∞ are norms on Rⁿ.

Exercise 3.11.12. Show that ‖ · ‖₂ on Rⁿ satisfies conditions (1)–(3) in Definition 3.6.1.

Exercise 3.11.13. Show that, for any subordinate matrix norm on R^{n×n}, the norm of the identity matrix is 1.
Exercise 3.11.14. Prove Theorem 3.7.3. Hint: Suppose that

    Σ_{i=1}^{n} |a_{i,j}| ≤ C,    j = 1, 2, . . . , n.

Show that for any vector x ∈ Rⁿ,

    Σ_{i=1}^{n} |(Ax)ᵢ| ≤ C ‖x‖₁.

Now find a vector x such that Σ_{i=1}^{n} |(Ax)ᵢ| = C ‖x‖₁. Deduce (3.7.2).
Exercise 3.11.15. Show that, for any vector x ∈ Rⁿ, ‖x‖∞ ≤ ‖x‖₂ and ‖x‖₂² ≤ ‖x‖₁ ‖x‖∞. For each of these inequalities, give an example for which equality holds. Deduce that ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁.

Exercise 3.11.16. Show that if x ∈ Rⁿ, then ‖x‖₁ ≤ n‖x‖∞ and that ‖x‖₂ ≤ √n ‖x‖∞.
Exercise 3.11.17. Prove that all subordinate matrix norms
are consistent.
Exercise 3.11.18. One might think it intuitive to define the "max" norm of a matrix as follows:

    ‖A‖_{~∞} = max_{i,j} |a_{i,j}|.

Show that this is indeed a norm on R^{n×n}. Show, however, that it is not consistent.
Exercise 3.11.19. Let A be the matrix

    A = [ 2  0  0  0  0
          1  2  0  0  0
          1  0  2  0  0
          1  0  0  2  0
          1  0  0  0  2 ].

What are κ₁(A) and κ∞(A)? (Hint: you'll need to find A⁻¹. Recall that the inverse of a lower triangular matrix is itself lower triangular.)
Note: to do this using Maple:

> with(linalg):
> n:=5;
> f:= (i,j) -> piecewise( (i=j),2, (j=1),1);
> A := matrix(5,5, f);
> evalf(cond(A,1)); evalf(cond(A,2));
> evalf(cond(A,infinity));
Exercise 3.11.20. Let A be the matrix

    A = [ 0.1  0    0
          10   0.1  10
          0    0    0.1 ].

Compute κ∞(A). Suppose we wish to solve the system of equations Ax = b on a single precision computer system (i.e., the relative error in any stored number is approximately 10⁻⁸). Give an upper bound on the relative error in the computed solution x.
Exercise 3.11.21. A real matrix A = {a_{i,j}} is Strictly Diagonally Dominant if

    |a_{i,i}| > Σ_{j=1, j≠i}^{n} |a_{i,j}|    for i = 1, . . . , n.

Show that all strictly diagonally dominant matrices are nonsingular.
Exercise 3.11.22. Let

    A = [  4  −1   0
          −1   4  −1
           0  −1  −3 ].

Use Gerschgorin's theorems to give an upper bound for κ₂(A).
Exercise 3.11.23. The problem with the matrix in Example 3.9.5 is that the magnitudes of the entries in the first and second rows of A are very different. This means that det(A) is quite small, and thus that ‖A⁻¹‖∞ is quite large. Let D = diag(A), i.e., D is a matrix of the same size as A with

    d_{i,j} = a_{i,j} if i = j,    and    d_{i,j} = 0 if i ≠ j.

For this problem, what is the condition number (in the ∞-norm) of the matrix D⁻¹A?