
On Methods for Solving Symmetric Systems of Linear
Equations Arising in Optimization
TOVE ODLAND
Doctoral Thesis
Stockholm, Sweden 2015
TRITA-MAT-A 2015:05
ISRN KTH/MAT/A-15/05-SE
ISBN 978-91-7595-514-8
Department of Mathematics
Royal Institute of Technology
100 44 Stockholm, Sweden
Academic dissertation which, with the permission of the Royal Institute of Technology (KTH), is presented for public examination for the degree of Doctor of Technology on Friday, June 12, 2015, at 14.00 in lecture hall F3, Lindstedtsvägen 26, Royal Institute of Technology, Stockholm.
© Tove Odland, 2015
Print: Universitetsservice US AB, Stockholm, 2015
To F and S.B.
"You are the sky; everything else is just weather."
-Ani Pema Chödrön
Abstract
In this thesis we present research on mathematical properties of methods for solving symmetric systems of linear equations that arise in various optimization problem
formulations and in methods for solving such problems.
In the first and third paper (Paper A and Paper C), we consider the connection between the method of conjugate gradients and quasi-Newton methods on strictly convex
quadratic optimization problems or equivalently on a symmetric system of linear equations with a positive definite matrix. We state conditions on the quasi-Newton matrix
and the update matrix such that the search directions generated by the corresponding
quasi-Newton method and the method of conjugate gradients respectively are parallel.
In paper A, we derive such conditions on the update matrix based on a sufficient
condition to obtain mutually conjugate search directions. These conditions are shown
to be equivalent to the one-parameter Broyden family. Further, we derive a one-to-one
correspondence between the Broyden parameter and the scaling between the search
directions from the method of conjugate gradients and a quasi-Newton method employing some well-defined update scheme in the one-parameter Broyden family.
In paper C, we give necessary and sufficient conditions on the quasi-Newton matrix and on the update matrix such that equivalence with the method of conjugate gradients holds for the corresponding quasi-Newton method. We show that the set of quasi-Newton schemes admitted by these necessary and sufficient conditions is strictly larger than the one-parameter Broyden family. In addition, we show that this set of quasi-Newton schemes includes an infinite number of symmetric rank-one update schemes.
In the second paper (Paper B), we utilize an unnormalized Krylov subspace framework for solving symmetric systems of linear equations. These systems may be incompatible and the matrix may be indefinite/singular. Such systems of symmetric linear
equations arise in constrained optimization. In the case of an incompatible symmetric
system of linear equations we give a certificate of incompatibility based on a projection
on the null space of the symmetric matrix and characterize a minimum-residual solution. Further we derive a minimum-residual method, give explicit recursions for the
minimum-residual iterates and characterize a minimum-residual solution of minimum
Euclidean norm.
Keywords: symmetric system of linear equations, method of conjugate gradients,
quasi-Newton method, unconstrained optimization, unconstrained quadratic optimization, Krylov subspace method, unnormalized Lanczos vectors, minimum-residual method
Sammanfattning
In this thesis we consider mathematical properties of methods for solving symmetric systems of linear equations that arise in formulations of, and methods for, a variety of optimization problems.
In the first and third papers (Paper A and Paper C), the connection between the method of conjugate gradients and quasi-Newton methods is examined when these are applied to strictly convex quadratic optimization problems without constraints, or equivalently to a symmetric system of linear equations with a positive definite symmetric matrix. We state conditions on the quasi-Newton matrix and the update matrix such that the search direction obtained from the corresponding quasi-Newton method is parallel to the search direction obtained from the method of conjugate gradients.
In the first paper (Paper A), conditions on the update matrix are derived based on a sufficient condition for obtaining mutually conjugate search directions. These conditions on the quasi-Newton method are shown to be equivalent to the update strategy belonging to the one-parameter Broyden family. We also derive a one-to-one correspondence between the Broyden parameter and the scaling between the search directions from the method of conjugate gradients and a quasi-Newton method employing some well-defined update strategy from the one-parameter Broyden family.
In the third paper (Paper C), we give necessary and sufficient conditions on a quasi-Newton method such that the aforementioned equivalence with the method of conjugate gradients is obtained. The set of quasi-Newton strategies that satisfy these conditions is strictly larger than the one-parameter Broyden family. We also show that this set of quasi-Newton strategies contains an infinite number of update strategies in which the update matrix is a symmetric matrix of rank one.
In the second paper (Paper B), a framework of unnormalized Krylov subspace methods is used to solve symmetric systems of linear equations. These systems may lack a solution and the matrix may be indefinite/singular. Symmetric systems of linear equations of this kind arise in a variety of formulations of, and methods for, constrained optimization problems. In the case where the symmetric system of linear equations lacks a solution, we give a certificate of this based on a projection onto the null space of the symmetric matrix and characterize a minimum-residual solution. We also derive a minimum-residual method within this framework and give explicit recursion formulas for this method. In the case where the symmetric system of linear equations lacks a solution, we characterize a minimum-residual solution of minimum Euclidean norm.
Keywords: symmetric systems of linear equations, the method of conjugate gradients, quasi-Newton methods, unconstrained optimization, unconstrained quadratic optimization, Krylov subspace methods, unnormalized Lanczos vectors, the minimum-residual method
Acknowledgments
First of all I would like to express my deepest gratitude to my academic advisor and co-author Professor Anders Forsgren. Your mathematical expertise, attention to detail and
patience have made it a privilege to work with you these last five years. It has been an
inspiration to work with a researcher who sees all of the possibilities and is fearless in the
face of a sometimes conservative community.
Secondly, I would like to thank my academic co-advisor Professor Krister Svanberg
for supporting me and for giving me valuable pointers when writing this thesis.
My officemate Hildur Æsa Oddsdóttir, I have really appreciated sharing a room with
you. You are an expert, not only in your academic field, but also when it comes to the
Stockholm housing situation, where to buy the best electronic equipment and so much more.
We have had so many great discussions on everything from feminism to LaTeX. Thank you
for being a great co-worker and friend.
I would also like to thank the faculty members and postdocs at the Division of Optimization and Systems Theory at KTH and my fellow students, past and present, in the
Applied and Computational Mathematics doctoral program.
I gratefully acknowledge that the research described in this thesis has been partly financially supported by the Swedish Research Council (VR).
Thank you to my family, my partner Jonas and my friends for your love and support.
Stockholm, May 2015
Tove Odland
Table of Contents

Abstract
Acknowledgments
Table of Contents

I Introduction
  I.1 Introduction to optimization
  I.2 Unconstrained optimization
    I.2.1 Linesearch methods for unconstrained optimization
    I.2.2 Quasi-Newton methods
    I.2.3 The method of conjugate gradients
    I.2.4 Connection between the method of conjugate gradients and quasi-Newton methods on quadratic problems
  I.3 Symmetric systems of linear equations arising in constrained optimization
    I.3.1 Solving symmetric systems of linear equations
    I.3.2 Methods based on the symmetric Lanczos process
  I.4 Summary of papers
    I.4.1 Contribution by co-author
  I.5 References

A Paper A
  A.1 Introduction
  A.2 Background
    A.2.1 Conjugate gradient method
    A.2.2 Quasi-Newton methods
    A.2.3 Background results
  A.3 Results
    A.3.1 Update matrices defined by Proposition A.3.3
    A.3.2 Relation between δk and φk
    A.3.3 Remarks on when the one-parameter Broyden family is well-defined
  A.4 Conclusions
  A.5 References

B Paper B
  B.1 Introduction
    B.1.1 Notation
  B.2 Background
    B.2.1 An extended representation of the unnormalized Lanczos vectors
  B.3 Properties of the unnormalized Krylov framework
    B.3.1 Convergence in the unnormalized Krylov framework
    B.3.2 An unnormalized Krylov algorithm
    B.3.3 On the case when normalization is well defined
  B.4 Connection to the minimum-residual method
    B.4.1 Convergence of the minimum-residual method
    B.4.2 A minimum-residual algorithm based on the unnormalized Krylov method
  B.5 Summary and conclusion
  B.6 Appendix
    B.6.1 A result on the Lanczos vectors
    B.6.2 Properties of the sequence {γk}
    B.6.3 The method of conjugate gradients
  B.7 References

C Paper C
  C.1 Introduction
  C.2 Background
    C.2.1 Previous results on a sufficient condition for equivalence for CG and QN
  C.3 Results on the equivalence of CG and QN on (QP)
    C.3.1 Necessary and sufficient conditions for equivalence of CG and QN
    C.3.2 Symmetric rank-one update schemes
    C.3.3 Review of known results for the one-parameter Broyden family
  C.4 Conclusion
  C.5 Appendix
  C.6 References
Introduction
I.1 Introduction to optimization
Optimization is used in a great number of applications, such as telecommunications, intensity-modulated radiation therapy treatment planning, crew scheduling, modeling of metabolic reaction networks, structure/topology design, portfolio theory and traffic modeling, to mention just a few. Depending on the application, the difficulty may lie in the modeling and/or in solving the posed optimization problem.
An optimization problem is a decision problem on the form

\[
\min_{x \in F} f(x), \tag{I.1}
\]

where x ∈ R^n is the vector of decision variables, f, the objective function, is some function from R^n to R, and F is the feasible region for x. The field of optimization can roughly be divided into two major parts.
The first part is posing the problem (I.1), that is, finding a suitable f and F to represent the problem at hand. This requires consideration, since the choices made highly affect how hard the problem will be to solve. We will consider formulations where f is some smooth function over F, which means that f will be continuously differentiable as many times as required.
The second part is to solve the posed problem (I.1) and to determine the quality of the solution obtained. The choices made in the modeling will determine which procedures for solving (I.1) will be adequate. In this thesis we focus on the mathematics behind the second part and consider in great detail methods for solving (I.1) and the symmetric systems of linear equations that arise in various formulations of optimization problems and methods for solving these.
When solving a problem (I.1), the goal is to find a vector x* that is a global minimizer, i.e. x* ∈ F and f(x*) ≤ f(x) for all x ∈ F. In some cases a globally optimal solution cannot be attained, and then a locally optimal solution is sought; such a solution x* satisfies f(x*) ≤ f(x) for all x in some neighborhood of x*.
In order to find x* one can use some method that systematically searches F without evaluating all feasible points. Depending on the characteristics of f and F, there are different methods that are suitable. In general these methods work in the following way: given a starting guess x0, the method generates a sequence of iterates x1, x2, ..., where each new iterate is in some way a better estimate of x* than the previous one. Note that in some methods the intermediate iterates may not be feasible with respect to F. This process continues until
one reaches an iterate that satisfies some stopping criteria based on feasibility and some
conditions that guarantee local or global optimality.
All the work in this thesis is done assuming exact arithmetic. There are many aspects that need to be considered when implementing an algorithm and solving a problem in finite arithmetic that are beyond the scope of the work presented in this thesis. However, understanding the mechanics of the innermost parts of optimization methods is crucial, and we have devoted this thesis to the mathematics behind this.
Notation
The letters i, j and k are used to denote integer indices; other lowercase letters such as g, x, p and c denote column vectors in R^n, possibly with super- and/or subscripts. For a symmetric matrix H ∈ R^{n×n}, H ≻ 0 denotes that H is positive definite, i.e. that v^T H v > 0 for all 0 ≠ v ∈ R^n. Analogously, H ⪰ 0 is used to denote that H is positive semidefinite, i.e. that v^T H v ≥ 0 for all v ∈ R^n. The null space and range space of H are denoted by N(H) and R(H), respectively.
I.2 Unconstrained optimization

Consider an unconstrained optimization problem, i.e. F = R^n in (I.1), given by

\[
\min_{x \in \mathbb{R}^n} f(x), \tag{P}
\]
where the function f : R^n → R is smooth. For such a general unconstrained problem it holds that if
i) ∇f(x*) = 0,
ii) ∇²f(x*) ≻ 0,
then x* is a locally optimal solution. The above conditions are known as second-order sufficient optimality conditions, in that they are sufficient for local optimality, see, e.g., [26, Chapter 11] and [1, Chapter 4].
An optimization problem is convex if the feasible region is convex and the objective function f is a convex function over the feasible region. If an optimization problem is convex, then all locally optimal solutions are globally optimal solutions, and therefore this is a desirable property to have for a problem one wants to solve. See, e.g., [26, Chapter 11] for an introduction to the topic of convexity. It holds that an unconstrained problem (P) is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ R^n. Note that if f is a convex function on R^n then x* is a globally optimal solution if and only if ∇f(x*) = 0, see, e.g., [1, Chapter 4].
For this reason, many methods for solving (P) will search for an iterate where the gradient
of the objective function is zero, a so-called stationary point to the problem. In the next
section a certain class of such methods known as linesearch methods is introduced.
I.2.1 Linesearch methods for unconstrained optimization
For our purposes, methods for unconstrained optimization problems (P) are divided into two classes: methods where second-derivative information is available and used, and methods that only use first-derivative information. It deserves to be mentioned that there are also so-called derivative-free methods that use neither first- nor second-derivative information, see, e.g., [9] for an introduction to such methods.
We consider so-called linesearch methods that take us from x_k to the next iterate x_{k+1} by taking some step θ_k along a non-zero search direction p_k,

\[
x_{k+1} = x_k + \theta_k p_k. \tag{I.2}
\]

Different linesearch methods arise depending on how the search direction p_k is generated at each step k. A general requirement on a linesearch method is that it should have the descent property, i.e. f(x_k + θ_k p_k) < f(x_k) for all k. This property is for example guaranteed if ∇f(x_k)^T p_k < 0, i.e. p_k is a descent direction of f at x_k; then there always exists a positive θ_k such that f(x_k + θ_k p_k) < f(x_k) is satisfied, see, e.g., [22, Chapter 3].
There are many strategies for how to choose the steplength θ_k. Ideally, one would like to find the optimal value of θ_k, corresponding to

\[
\theta_k = \arg\min_{\theta \in \mathbb{R}} f(x_k + \theta p_k),
\]

given x_k and p_k. Using such a value of θ_k is referred to as the linesearch method employing exact linesearch. However, it is not always worthwhile to use exact linesearch, especially for iterates that are far from the optimal solution. See, e.g., [41, Chapter 3] for a discussion on other strategies for choosing θ_k and their benefits.
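To make the generic linesearch iteration (I.2) concrete, here is a minimal sketch in Python with an inexact (backtracking Armijo) steplength rule; the function, starting point and parameter values below are made up for illustration and are not taken from the thesis.

```python
import numpy as np

def backtracking_linesearch(f, grad, x, p, theta0=1.0, c1=1e-4, shrink=0.5):
    """Return a steplength satisfying the Armijo sufficient-decrease condition."""
    theta = theta0
    fx, gTp = f(x), grad(x) @ p
    while f(x + theta * p) > fx + c1 * theta * gTp:
        theta *= shrink
    return theta

def linesearch_method(f, grad, x0, direction, tol=1e-8, max_iter=200):
    """Generic linesearch iteration x_{k+1} = x_k + theta_k p_k, cf. (I.2)."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stopping criterion ||grad f(x)|| < eps
            break
        p = direction(x, g)                  # e.g. steepest descent: p = -g
        theta = backtracking_linesearch(f, grad, x, p)
        x = x + theta * p
    return x

# Example: minimize a simple smooth function with steepest-descent directions.
f = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
x_opt = linesearch_method(f, grad, np.array([5.0, 5.0]), lambda x, g: -g)
```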
A common stopping criterion for linesearch methods is ∇f(x) = 0, i.e. the current iterate is a stationary point. If a linesearch method is implemented and used in finite precision, then this stopping criterion will be ‖∇f(x)‖ < ε, where ε should be chosen as some small number in an appropriate manner. In addition, it should be pointed out that for finite-precision computations there are more stable choices of stopping criteria, and one can also consider using a combination of such choices, see, e.g., [1, Chapter 11] and [22, Chapter 8].
When second-derivative information is available, the perhaps most famous linesearch method is Newton's method, for which the search direction p_k is given by solving

\[
\nabla^2 f(x_k)\, p_k = -\nabla f(x_k). \tag{I.3}
\]

Newton's method makes a quadratic model of f at x_k. If ∇²f(x_k) ≻ 0 then the unit step along p_k takes us to the minimum of this quadratic model. If ∇²f(x_k) is not positive definite then the quadratic model may not have a minimum, and in this case there are strategies in which one uses a related positive-definite matrix instead of ∇²f(x_k) in (I.3), see, e.g., [22, Chapter 4]. If Newton's method is started sufficiently close to x*, then the convergence rate of the method is at least quadratic, see, e.g., [37, Chapter 7]. Note that if f is a strictly convex quadratic function then Newton's method finds the optimal solution in one iteration.
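As a minimal sketch of a Newton step (I.3), assuming the Hessian is available and positive definite at the current iterate; the quadratic test function below is made up, and illustrates the one-step termination mentioned above.

```python
import numpy as np

def newton_step(hess, grad, x):
    """Solve  hess(x) p = -grad(x)  for the Newton direction p, cf. (I.3)."""
    return np.linalg.solve(hess(x), -grad(x))

# On a strictly convex quadratic, one unit Newton step reaches the minimizer.
H = np.array([[3.0, 1.0], [1.0, 2.0]])          # positive definite Hessian
c = np.array([-1.0, 4.0])
grad = lambda x: H @ x + c
hess = lambda x: H
x0 = np.array([10.0, -10.0])
x1 = x0 + newton_step(hess, grad, x0)
assert np.allclose(grad(x1), 0.0)                # stationary after one step
```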
In this thesis we are concerned with methods for problems on the form (P) where
second-derivative information is not available, but for which the gradient, ∇f (x), is available for all x ∈ Rn .
One of the oldest and simplest methods of this kind is the method of steepest descent,
also known as gradient descent, see, e.g., [45] for references to the origin of this method. In
the method of steepest descent, the search direction pk is chosen in the negative direction
of the gradient at x_k, i.e.

\[
p_k = -\nabla f(x_k). \tag{I.4}
\]
Even though the search direction p_k will always be a descent direction when ∇f(x_k) ≠ 0 (since ∇f(x_k)^T p_k = −∇f(x_k)^T ∇f(x_k) < 0), this method may have a very poor convergence rate for some starting guesses, displaying a behavior known as 'zig-zagging'. Further, with the norm of the gradient as a stopping criterion one might run into problems, as this norm tends to oscillate. For a discussion on the magnitude of the oscillations of the gradient norm in the method of steepest descent, see, e.g., [40].
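A small numerical illustration of this behavior: steepest descent with exact linesearch on a made-up ill-conditioned strictly convex quadratic, using the exact steplength that is derived later in (I.9).

```python
import numpy as np

# Ill-conditioned strictly convex quadratic q(x) = 1/2 x^T H x + c^T x.
H = np.diag([1.0, 100.0])
c = np.array([0.0, 0.0])
x = np.array([100.0, 1.0])

for k in range(10):
    g = H @ x + c                        # gradient at the current iterate
    p = -g                               # steepest-descent direction (I.4)
    theta = -(p @ g) / (p @ H @ p)       # exact linesearch steplength
    x = x + theta * p
    print(k, np.linalg.norm(H @ x + c))  # the gradient norm decreases only slowly
```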
Next a class of first-order methods that is studied in two of the appended papers (Paper
A and Paper C), namely quasi-Newton methods, is introduced.
I.2.2 Quasi-Newton methods
The first quasi-Newton method was suggested by Davidon in 1959 [11], using the term variable metric method. Davidon's work caught the attention of the optimization community when Fletcher and Powell [17] presented a slightly modified method that outdid all other existing methods of its time. The paper by Fletcher and Powell marked the start of a period of intense research on quasi-Newton methods. For a thorough treatment of quasi-Newton methods see, e.g., [41, Chapter 8], [16, Chapter 3] and [37, Chapter 9] and the references therein. An interesting note is that Davidon's original paper was not published until 1991.
Quasi-Newton methods comprise a class of methods that mimic Newton's method by using a quasi-Newton matrix B_k that is an approximation of the true Hessian ∇²f(x_k) in (I.3). In effect, in a quasi-Newton method the search direction at step k is generated by solving

\[
B_k p_k = -\nabla f(x_k), \tag{I.5}
\]

usually with B_0 = I, i.e. p_0 is the steepest-descent direction. Different quasi-Newton methods arise depending on the choice of B_k in each iteration. The matrix B_k is usually given by adding an update matrix U_k to the previous approximation B_{k-1}, i.e.

\[
B_k = B_{k-1} + U_k. \tag{I.6}
\]
One often considers updating a Cholesky factorization of B_k; then (I.5) can be solved in the order of n² operations. Further, if the update matrix is of low rank, then one does not need to recompute the Cholesky factorization at each step, see, e.g., [22, Chapter 4].
In many papers, especially the earlier ones, an approximation of the inverse Hessian, M_k, is used, and (I.5) then takes the form p_k = −M_k g_k. From a theoretical point of view
it makes little difference if one works with Bk or Mk . In the work presented in this thesis
we choose to work with quasi-Newton methods on the form given by (I.5).
A famous class of update schemes for quasi-Newton methods is the one-parameter Broyden family [3]. This family of update schemes depends on φ_k, a free parameter known as the Broyden parameter. The one-parameter Broyden family includes the DFP (Davidon-Fletcher-Powell) update scheme, for φ_k = 1, which was the first quasi-Newton scheme to be suggested in [11, 17], the BFGS update scheme, for φ_k = 0, discovered independently by Broyden [4], Fletcher [15], Goldfarb [23] and Shanno [47], and the symmetric rank-one update scheme, SR1. All update schemes in the one-parameter Broyden family are of rank two except SR1, which is of rank one. There is a subclass of the one-parameter Broyden family called the convex Broyden family, which is given by all convex combinations of the BFGS and DFP update schemes. Other parametric formulations of quasi-Newton update schemes are possible, see, e.g., [47, 49].
The update schemes in the one-parameter Broyden family may fail to be well defined for certain values (degenerate values) of the Broyden parameter φ, see, e.g., [19].
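For reference, a sketch of the one-parameter Broyden family update in the standard textbook form, with s = x_{k+1} − x_k and y = ∇f(x_{k+1}) − ∇f(x_k); φ = 0 gives the BFGS update and φ = 1 the DFP update, consistent with the parameter values quoted above. The thesis itself works with the update matrix U_k abstractly, so this explicit formula is included only as an illustration.

```python
import numpy as np

def broyden_family_update(B, s, y, phi):
    """One-parameter Broyden family update of a Hessian approximation B.

    s = x_{k+1} - x_k, y = grad f(x_{k+1}) - grad f(x_k); phi = 0 gives BFGS,
    phi = 1 gives DFP.  The returned matrix is B_{k-1} + U_k in the notation of (I.6).
    """
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    w = y / ys - Bs / sBs
    U = -np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys + phi * sBs * np.outer(w, w)
    return B + U
```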
In many treatments the matrix B_k is assumed to be symmetric; for example, the one-parameter Broyden family has the property of hereditary symmetry, i.e. if B_{k-1} = B_{k-1}^T then B_k = B_k^T. It is also possible to consider nonsymmetric update schemes, e.g. the Huang family of updates, which has three degrees of freedom, see [34]. The symmetric part of the Huang family of updates can be shown to be equivalent to a class of update schemes with two degrees of freedom defined by the self-scaling quasi-Newton algorithm of Oren and Luenberger [42]. In this thesis quasi-Newton methods for which B_k = B_k^T are the main focus.
An interesting characteristic of the SR1 update scheme is that, on a quadratic problem, given any set of linearly independent search directions the optimal solution will be found in at most n steps without the use of linesearches in each step, see, e.g., [10, 15]. In [52], the class of methods presented in [34] is modified to obtain this characteristic.
For very large problems it might be too expensive to compute and store the full n × n
matrix Bk at each step. In this case one can consider a limited-memory quasi-Newton
method that uses an implicit representation of Bk , see, e.g., [41, Chapter 9].
Quasi-Newton methods can be shown to have Q-superlinear convergence, see, e.g., [13]. Note that, for this convergence, it is not required that {B_k} tends to the true Hessian at x*, only that it does so along the search directions {p_k}.
In 1972, Dixon [14] showed that all well-defined update schemes in the one-parameter
Broyden family generate parallel search directions on (P) for any smooth function if perfect
linesearch is used. Perfect linesearch is a generalization of exact linesearch for smooth nonconvex functions. For a strictly convex quadratic function these search directions will in addition be mutually conjugate with respect to the Hessian and hence linearly independent.
Hence, a quasi-Newton method using any well-defined update scheme in the one-parameter
Broyden family has the property of finite termination on the corresponding quadratic
optimization problem, see, e.g., [34].
Further, on a quadratic problem the search directions generated by a quasi-Newton
method, using any well-defined update scheme from the one-parameter Broyden family,
will be parallel with those generated by the method of conjugate gradients (introduced in the next section) provided that both methods employ exact linesearch.
I.2.3 The method of conjugate gradients
Consider an unconstrained quadratic optimization problem

\[
\min_{x \in \mathbb{R}^n} q(x) = \min_{x \in \mathbb{R}^n} \tfrac{1}{2} x^T H x + c^T x, \tag{QP}
\]

where H = H^T ≻ 0. Note that solving (QP) is equivalent to solving a symmetric system of linear equations on the form

\[
Hx + c = 0, \tag{I.7}
\]

where H = H^T ≻ 0. For (QP), the second-order sufficient optimality conditions are given by
i) Hx* + c = 0,
ii) H ≻ 0.
If H ≻ 0 then there exists a unique solution to (I.7), or equivalently a unique optimal solution to (QP), given by x* = −H^{-1}c. For the case H ⪰ 0 there exist an infinite number of solutions to (I.7) if c ∈ R(H) and no solutions if c ∉ R(H).
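A small numerical check of this equivalence, with a made-up positive definite H and vector c: the unique solution of Hx + c = 0 is the unconstrained minimizer of q.

```python
import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])      # symmetric positive definite
c = np.array([1.0, -2.0])
q = lambda x: 0.5 * x @ H @ x + c @ x

x_star = np.linalg.solve(H, -c)              # unique solution of H x + c = 0
# The gradient H x + c vanishes at x_star, so x_star minimizes q.
assert np.allclose(H @ x_star + c, 0.0)
assert q(x_star) <= q(x_star + np.array([0.1, -0.3]))   # spot check
```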
In this section the method of conjugate gradients by Hestenes and Stiefel [33] is considered. This is a method for solving a symmetric system of linear equations on the form (I.7), which can also be expressed as a linesearch method for (QP). The method of conjugate gradients is a first-order method, and hence the matrix H only needs to be available through matrix-vector multiplications.
In a linesearch description of the method of conjugate gradients for solving (QP), the gradient of the objective function at x_k is denoted by g_k, i.e. g_k = Hx_k + c. In accordance with (I.2), the gradient is updated in each step as

\[
g_{k+1} = g_k + \theta_k H p_k. \tag{I.8}
\]
On (QP) it is natural to use exact linesearch for θ_k. Consider θ to be unknown and solve for the optimal θ,

\[
\min_{\theta \in \mathbb{R}} q(x_k + \theta p_k) = \min_{\theta \in \mathbb{R}} \tfrac{1}{2}\theta^2 p_k^T H p_k + \theta p_k^T g_k + q(x_k).
\]

Note that q(x_k) is a constant term, hence

\[
0 = \frac{d}{d\theta}\, q(x_k + \theta p_k) = \theta p_k^T H p_k + p_k^T g_k.
\]

It follows that the exact linesearch step is given by

\[
\theta_k = -\frac{p_k^T g_k}{p_k^T H p_k}. \tag{I.9}
\]
As Hpk is already needed for computation of gk+1 in (I.8), calculating the exact linesearch
step only entails two vector-vector multiplications.
In a linesearch description of the method of conjugate gradients the search directions are given by

\[
p_0^{CG} = -g_0, \qquad
p_k^{CG} = -g_k + \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}\, p_{k-1}^{CG}, \quad k \geq 1. \tag{I.10}
\]
For the method of conjugate gradients it holds that g_i^T g_j = 0 for all i ≠ j and also that g_k^T p_j^{CG} = 0 for all j = 0, ..., k−1. In fact, the gradients {g_k} in the method of conjugate gradients form an orthogonal basis for the Krylov subspaces generated by H and c,

\[
\mathcal{K}_0(c,H) = \{0\}, \qquad
\mathcal{K}_k(c,H) = \operatorname{span}\{c, Hc, H^2 c, \dots, H^{k-1}c\}, \quad k \geq 1, \tag{I.11}
\]

i.e., it holds that g_k ∈ K_{k+1}(c,H) ∩ K_k(c,H)^⊥. Let r be the minimum index k for which K_{k+1}(c,H) = K_k(c,H); then, after generating r mutually orthogonal, and hence linearly independent, gradients, the stopping criterion is reached as g_r = 0. Hence, on (QP) the method of conjugate gradients has the property of finite termination. Further, the search directions given by (I.10) form an H-orthogonal basis for the expanding Krylov subspaces given by (I.11), i.e. the search directions are mutually conjugate with respect to H. See, e.g., [30] for an introduction to Krylov subspaces and Krylov subspace methods for solving systems of linear equations.
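A compact sketch of the linesearch form of the method of conjugate gradients for (QP), i.e. for solving Hx + c = 0 with H symmetric positive definite, following (I.2) and (I.8)-(I.10); the test system below is randomly generated for illustration.

```python
import numpy as np

def conjugate_gradients(H, c, x0=None, tol=1e-12):
    """Method of conjugate gradients for H x + c = 0 with H = H^T positive definite."""
    n = len(c)
    x = np.zeros(n) if x0 is None else x0.copy()
    g = H @ x + c                                    # gradient of q at x
    p = -g                                           # p_0^CG = -g_0
    for _ in range(n):
        if np.linalg.norm(g) <= tol:
            break
        Hp = H @ p
        theta = -(p @ g) / (p @ Hp)                  # exact linesearch step (I.9)
        x = x + theta * p                            # (I.2)
        g_new = g + theta * Hp                       # (I.8)
        p = -g_new + (g_new @ g_new) / (g @ g) * p   # (I.10)
        g = g_new
    return x

# Example on a randomly generated symmetric positive definite system.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = A @ A.T + 5.0 * np.eye(5)
c = rng.standard_normal(5)
x = conjugate_gradients(H, c)
assert np.allclose(H @ x + c, 0.0, atol=1e-8)
```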
There are several other ways to derive the method of conjugate gradients. For example, as a conjugate directions method where the Gram-Schmidt conjugation is based on the orthogonal gradients, as minimization of q(x) where x belongs to the expanding Krylov subspaces, as minimization of ‖x_k − x*‖_H, or equivalently ‖Hx_k + c‖_{H^{-1}}, and as a method for solving a symmetric system of linear equations based on a normalized symmetric Lanczos process. For a variety of treatments of the method of conjugate gradients see, e.g., [12, Chapter 6], [25, Chapter 10], [41, Chapter 5] and [50] and the references therein.
The method of conjugate gradients was first extended to a nonlinear version for solving
general unconstrained problems (P) by Fletcher and Reeves [18]. For a survey on the
Fletcher-Reeves and other nonlinear versions of the method of conjugate gradients see,
e.g., [32]. In [48], the topic of the method of conjugate gradients with inexact line searches
is considered.
Next the connection between the method of conjugate gradients and quasi-Newton
methods on (QP) or equivalently for solving a symmetric system of linear equations with
a positive definite matrix is discussed. This is one of the central themes of the thesis.
Besides being important in their own right, quadratic optimization problems are often at the core of algorithms for more challenging nonlinear optimization problems. Further,
many problems tend to behave quadratically in the region of an optimal solution. In addition, some methods, for example the method of conjugate gradients and BFGS-methods,
are considered effective when one wants to obtain an approximate solution to a quadratic
problem with desirable characteristics in a timely manner, see, e.g. [6] for an article on
intensity-modulated radiation therapy treatment planning where this property is used.
I.2.4 Connection between the method of conjugate gradients and quasi-Newton methods on quadratic problems
Even though the method of conjugate gradients and quasi-Newton methods may seem to be very different kinds of methods, it is well known that, when solving (QP), or equivalently when solving (I.7) with H = H^T ≻ 0, the method of conjugate gradients and a quasi-Newton method using any well-defined update from the one-parameter Broyden family generate parallel search directions provided that exact linesearch is used in both methods, see, e.g., [16, Chapter 3]. This implies that the corresponding x-iterates will be identical in exact arithmetic. In [38], this result is shown for the DFP update scheme, and in [39], Nazareth shows that for the BFGS update scheme the search directions are, in addition to being parallel, of the same length. The connection between the method of conjugate gradients and the BFGS update scheme for a quasi-Newton method is further explored by Buckley [5], where one of the results is that the BFGS update scheme can be carried out using any of the previous approximation matrices and equivalence with the method of conjugate gradients still holds on (QP).
The search direction, p_k, in a quasi-Newton method, given by solving (I.5), is determined by the choice of B_k in each iteration k. One can consider what the precise conditions are on B_k and/or the update matrix U_k such that p_k and p_k^{CG} are parallel, i.e. p_k = δ_k p_k^{CG} for some non-zero scalar δ_k. In two of the appended papers in this thesis (Paper A and Paper C), this topic is investigated.
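This parallelism can be observed numerically. The sketch below, under the assumption of exact linesearch (and exact arithmetic up to rounding), compares the search directions of a BFGS quasi-Newton method with B_0 = I against the CG directions on a made-up strictly convex quadratic; the printed cosines should all be close to one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)                     # H = H^T > 0
c = rng.standard_normal(n)

def exact_step(g, p):
    return -(p @ g) / (p @ H @ p)               # exact linesearch step (I.9)

# Search directions of the method of conjugate gradients, (I.10).
cg_dirs, g = [], c.copy()                       # gradient at x_0 = 0 is c
p = -g
for _ in range(n):
    cg_dirs.append(p)
    theta = exact_step(g, p)
    g_new = g + theta * H @ p                   # (I.8)
    p = -g_new + (g_new @ g_new) / (g @ g) * p
    g = g_new

# Search directions of a BFGS quasi-Newton method (phi = 0) with B_0 = I.
qn_dirs, x, B = [], np.zeros(n), np.eye(n)
g = H @ x + c
for _ in range(n):
    p = np.linalg.solve(B, -g)                  # (I.5)
    qn_dirs.append(p)
    theta = exact_step(g, p)
    s = theta * p
    x = x + s
    g_new = H @ x + c
    y = g_new - g
    B = B - np.outer(B @ s, B @ s) / (s @ B @ s) + np.outer(y, y) / (y @ s)   # BFGS update
    g = g_new

for p_cg, p_qn in zip(cg_dirs, qn_dirs):
    cosine = (p_cg @ p_qn) / (np.linalg.norm(p_cg) * np.linalg.norm(p_qn))
    print(abs(cosine))                          # all values close to 1
```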
Next symmetric systems of linear equations that arise in problem formulations and
methods for constrained optimization problems are considered.
I.3 Symmetric systems of linear equations arising in constrained optimization
Consider a constrained optimization problem on the form

\[
\begin{array}{ll}
\min_{x \in \mathbb{R}^n} & f(x) \\
\text{subject to} & h(x) = 0, \\
& g(x) \geq 0,
\end{array} \tag{I.12}
\]

where f, h and g are smooth functions. Depending on further characteristics of the functions f, h and g, there are different methods that are suitable when one wants to solve (I.12). Some methods in which symmetric systems of linear equations arise are mentioned next.
Consider a problem on the form of (I.12) with a quadratic objective function and linear equality constraints,

\[
\begin{array}{ll}
\min_{x \in \mathbb{R}^n} & \tfrac{1}{2} x^T H x + c^T x, \\
\text{subject to} & Ax = b,
\end{array} \tag{I.13}
\]
where A ∈ R^{m×n} is of rank m. It holds that, if x* is a local minimizer of (I.13), then there exists λ* ∈ R^m such that

\[
\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix}
\begin{bmatrix} x^* \\ -\lambda^* \end{bmatrix}
=
\begin{bmatrix} -c \\ b \end{bmatrix}. \tag{I.14}
\]
The vector λ* contains the Lagrange multipliers corresponding to x*. The symmetric system of linear equations with a symmetric and indefinite matrix given in (I.14) constitutes the first-order necessary optimality conditions for (I.13). Let Z be a matrix whose columns form a basis for the null space of A. It holds that x* is a global minimizer of (I.13) if and only if there exists λ* ∈ R^m such that (I.14) holds and the reduced Hessian Z^T H Z is positive semidefinite.
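A small sketch of assembling and solving the KKT system (I.14) with a dense solver; the problem data below are made up, and in practice one would exploit the symmetry and sparsity of the KKT matrix rather than use a dense factorization.

```python
import numpy as np

# Equality-constrained quadratic program: min 1/2 x^T H x + c^T x  subject to  A x = b.
H = np.array([[6.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 4.0]])
c = np.array([-8.0, -3.0, -3.0])
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 0.0])
n, m = H.shape[0], A.shape[0]

# KKT matrix of (I.14); it is symmetric but indefinite.
K = np.block([[H, A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x_star, lam_star = sol[:n], -sol[n:]             # unknowns ordered as (x, -lambda)

assert np.allclose(A @ x_star, b)                            # feasibility
assert np.allclose(H @ x_star + c - A.T @ lam_star, 0.0)     # stationarity
```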
Solving a symmetric system of linear equations on the form (I.14) is important, not
only for finding an optimal solution to (I.13) itself, but also in methods for solving more
complex problems. An example of this is methods that employ sequential quadratic
programming (SQP). An SQP-method is a method for solving (I.12) where in each step
a search direction is found by solving a quadratic subproblem. For simplicity consider
(I.12) with only equality constraints, then the subproblem will be on the form (I.13) where
the objective function is a quadratic model of the Lagrangian function L(x, λ) = f (x) −
λT h(x) at the current iterates (xk , λk ) and the constraints are given by a linearization of
h around the current iterate xk . The first-order necessary optimality conditions of such
a quadratic subproblem are given by (I.14), a symmetric system of linear equations with a
matrix that is symmetric and indefinite. Such an SQP-method is equivalent to applying
Newton iterations to the first-order necessary optimality conditions of (I.12) itself. Note
that there are SQP-methods for (I.12) with both inequality and equality constraints. The
iterates of an SQP-method will in general not be feasible to the original problem. See,
e.g., [26, Chapter 15] for an introduction to SQP.
Consider a quadratic problem with linear inequality constraints,

\[
\begin{array}{ll}
\min_{x \in \mathbb{R}^n} & \tfrac{1}{2} x^T H x + c^T x, \\
\text{subject to} & Ax \geq b.
\end{array} \tag{I.15}
\]
When it comes to handling inequality constraints we present two approaches.
The first one is to consider an active-set method, in which one keeps track of the
set of active constraints and treats them as equality constraints while ignoring the inactive
constraints. In each iteration one then gets a problem on the form (I.13) and obtains a
search direction to a new iterate at which the set of active constraints is updated. This
implies that in each iteration one solves a symmetric system of linear equations with an
indefinite symmetric matrix. The iterates of an active-set method will always be feasible to
the original problem. See, e.g., [41, Chapter 15] for an introduction to active-set methods.
The second approach entails moving in the interior of the feasible region, and in a sense
letting all constraints be inactive but eventually reaching an optimal solution at which some
constraints are active. Such a method is called an interior-point method. The main idea
of interior-point methods is to perturb the first-order necessary optimality conditions of the
given problem. The perturbed optimality conditions for (I.15), for some µ > 0, are given by

\[
\begin{aligned}
Ax - s &= b, \\
Hx - A^T\lambda &= -c, \\
S\Lambda e &= \mu e,
\end{aligned}
\]

where S and Λ are diagonal matrices with the elements of s and λ on their respective diagonals and e is a vector of ones. The inequalities s ≥ 0 and λ ≥ 0 are kept implicitly in the method. These perturbed optimality conditions are called the primal-dual nonlinear equations. A primal-dual interior-point method is based on applying Newton iterations to the primal-dual nonlinear equations. Such an iteration can be written on the form of a symmetric system of linear equations with a matrix that is symmetric and indefinite, given by

\[
\begin{bmatrix} H & A^T \\ A & -S\Lambda^{-1} \end{bmatrix}
\begin{bmatrix} \Delta x \\ -\Delta\lambda \end{bmatrix}
= -
\begin{bmatrix} Hx + c - A^T\lambda \\ Ax - b - \mu\Lambda^{-1}e \end{bmatrix},
\]

where Δs has been eliminated.
Interior-point methods are illustrated here for solving (I.15); however, these methods can be applied directly to (I.12), where the inequality constraints can be handled by a barrier transformation. There are also interior-point methods for solving linear programming problems on standard form. Symmetric systems of linear equations with an indefinite symmetric matrix arise in all of the above-mentioned cases. See, e.g., [20, 21, 41, 53] for introductions to interior-point methods.
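A sketch of forming and solving the reduced primal-dual system above for a single Newton iteration; the data and the strictly feasible starting point are made up, and this is only the linear-algebra step of an interior-point method, not a complete algorithm.

```python
import numpy as np

# Data for (I.15): minimize 1/2 x^T H x + c^T x subject to A x >= b.
H = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([-2.0, -6.0])
A = np.array([[1.0, 1.0], [1.0, -1.0]])
b = np.array([-2.0, -3.0])

# A strictly feasible primal-dual point (s > 0, lambda > 0) and a barrier parameter mu.
x = np.array([0.0, 0.0])
s = A @ x - b
lam = np.array([1.0, 1.0])
mu = 0.1
S, Lam = np.diag(s), np.diag(lam)
e = np.ones(len(b))

# Reduced Newton system (Delta s eliminated); the unknowns are (Delta x, -Delta lambda).
K = np.block([[H, A.T],
              [A, -S @ np.linalg.inv(Lam)]])
rhs = -np.concatenate([H @ x + c - A.T @ lam,
                       A @ x - b - mu * np.linalg.solve(Lam, e)])
step = np.linalg.solve(K, rhs)
dx, dlam = step[:2], -step[2:]
```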
I.3.1 Solving symmetric systems of linear equations

Symmetric systems of linear equations on the form

\[
Hx + c = 0, \tag{I.16}
\]
with H = H^T arise in many methods for constrained optimization, as indicated in the previous section. In addition, finding a solution to (I.16), if such a solution exists, or in the incompatible case a minimum-residual solution, arg min_{x∈R^n} ‖Hx + c‖²_2, are both important problems in the field of numerical linear algebra. As the only assumption on (I.16) is that H = H^T, it may be the case that H is indefinite/singular and the system of linear equations (I.16) may be incompatible. Note that H and c in (I.16) represent the whole matrix on the left-hand side and the whole right-hand side of a symmetric system of linear equations, not to be confused with the H and c that appear as components in, e.g., the matrix and right-hand side of (I.14).
In this thesis the mathematics behind so-called Krylov subspace methods for solving
(I.16) is considered. These methods are based on generating an orthogonal basis for the
expanding Krylov subspaces generated by H and c given in (I.11). In particular, this
orthogonal basis will be generated by a symmetric version of a process given by Lanczos
in [35, 36] and this process is commonly referred to as a symmetric Lanczos process.
There has been a significant amount of research devoted to the theory on using Lanczos
processes to solve (I.16), both for H symmetric and H nonsymmetric, see, e.g., [24, 29, 43]. The discussion here on solving (I.16) is aimed towards the treatment of this problem in the appended papers; it should be pointed out that solving (I.16) can also be done by applying different matrix-factorization techniques to H, see, e.g., [25, Chapter 5].
The symmetric Lanczos process is given as follows. Choose q_0 = c/‖c‖_2, so that ‖q_0‖_2 = 1, and generate {q_k}_{k=1}^{r} as

\[
\begin{aligned}
-\beta_0 q_1 &= -Hq_0 + \alpha_0 q_0, \\
-\beta_k q_{k+1} &= -Hq_k + \alpha_k q_k + \beta_{k-1} q_{k-1}, \quad k = 1, \dots, r-1,
\end{aligned} \tag{I.17}
\]

where the coefficients α and β are chosen such that one gets orthonormal vectors {q_k}. These choices are given by

\[
\alpha_0 = \frac{q_0^T H q_0}{q_0^T q_0}, \qquad
\alpha_k = \frac{q_k^T H q_k}{q_k^T q_k}, \quad k = 1, \dots, r-1, \tag{I.18}
\]

and

\[
\beta_0 = \frac{q_0^T H q_1}{q_0^T q_0}, \qquad
\beta_{k-1} = \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}}, \quad k = 1, \dots, r-1. \tag{I.19}
\]

Then {q_k}_{k=0}^{r-1} are such that q_k ∈ K_{k+1}(c,H) ∩ K_k(c,H)^⊥ and ‖q_k‖_2 = 1 for all k = 0, ..., r−1, and q_r = 0. The three-term recurrence in (I.17) is due to the fact that H = H^T, see, e.g., [46, Chapter 6].
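A sketch of the normalized symmetric Lanczos process in its standard computational form; here β_k is taken as the norm of the residual vector, which under orthonormality of the q_k agrees with the inner-product expressions in (I.19) up to the indexing convention. The test matrix is made up and may be indefinite.

```python
import numpy as np

def symmetric_lanczos(H, c, tol=1e-12):
    """Generate orthonormal Lanczos vectors q_0, q_1, ... for the Krylov subspaces K_k(c, H)."""
    Q = [c / np.linalg.norm(c)]                 # q_0 = c / ||c||_2
    alphas, betas = [], []
    for k in range(len(c)):
        v = H @ Q[k]
        alpha = Q[k] @ v                        # alpha_k = q_k^T H q_k
        v = v - alpha * Q[k]
        if k > 0:
            v = v - betas[-1] * Q[k - 1]        # three-term recurrence, since H = H^T
        alphas.append(alpha)
        beta = np.linalg.norm(v)
        if beta < tol:                          # the process terminates: q_r = 0
            break
        betas.append(beta)
        Q.append(v / beta)
    return np.array(Q).T, np.array(alphas), np.array(betas)

# The generated vectors are orthonormal, and Q^T H Q is tridiagonal.
rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
H = B + B.T                                     # symmetric, possibly indefinite
c = rng.standard_normal(6)
Q, a, b = symmetric_lanczos(H, c)
assert np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=1e-8)
```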
Note that if one only requires that the generated vectors should be orthogonal, then there is one degree of freedom in each iteration and (I.17) may be expressed as

\[
\begin{aligned}
q_1 &= \theta_0(-Hq_0 + \alpha_0 q_0), \\
q_{k+1} &= \theta_k(-Hq_k + \alpha_k q_k + \beta_{k-1} q_{k-1}), \quad k = 1, \dots, r-1,
\end{aligned} \tag{I.20}
\]

where the scalar θ_k represents the available scaling factor in each iteration k, and α and β are as in (I.18) and (I.19), respectively. The vectors {q_k}_{k=0}^{r} given by (I.20) are referred to as Lanczos vectors.
Let Q_k be the matrix with the Lanczos vectors q_0, q_1, ..., q_k as column vectors; then (I.20) may be written on matrix form as

\[
H Q_k = Q_{k+1} \bar{T}_k = Q_k T_k - \frac{1}{\theta_k}\, q_{k+1} e_{k+1}^T,
\]

where

\[
T_k =
\begin{bmatrix}
\alpha_0 & \beta_0 & & \\
-\frac{1}{\theta_0} & \ddots & \ddots & \\
& \ddots & \ddots & \beta_{k-1} \\
& & -\frac{1}{\theta_{k-1}} & \alpha_k
\end{bmatrix},
\qquad
\bar{T}_k =
\begin{bmatrix}
T_k \\ -\frac{1}{\theta_k} e_{k+1}^T
\end{bmatrix}. \tag{I.21}
\]
For the choice θ_k = −1/β_k one obtains the recursion (I.17), and it holds that ‖q_k‖_2 = ‖q_0‖_2 for all k. Further, for this choice of θ_k, T_k will be a symmetric matrix. Changing the set {θ_k}_{k=0}^{r−1} can be seen as a similarity transform of T_k, see, e.g., [29].

Many methods for solving (I.16) use matrix-factorization techniques on T_k or T̄_k given in (I.21). For an introduction to how Krylov subspace methods are formalized in this way, see, e.g., [8, 43]. In the work presented in this thesis we choose to work with the recursion in (I.20) directly.
As the set of vectors {q_k}_{k=0}^{r−1} form an orthogonal basis for the Krylov subspaces in (I.11), each vector may be expressed as

\[
q_k = \sum_{j=0}^{k} \gamma_k^{(j)} H^j c
    = H\Big(\sum_{j=1}^{k} \gamma_k^{(j)} H^{j-1} c\Big) + \gamma_k^{(0)} c,
\qquad k = 0, \dots, r,
\]

for some scalars {γ_k^{(j)}}_{j=0}^{k}. The following choice

\[
\theta_0 = \frac{1}{\alpha_0}, \qquad
\theta_k = \frac{1}{\alpha_k + \beta_{k-1}}, \quad k = 1, \dots, r-1, \tag{I.22}
\]

of θ_k in (I.20) will give a vector q_{k+1} on the form of a gradient g_{k+1}, i.e. with γ_{k+1}^{(0)} = 1. Requiring γ_k^{(0)} = 1 is usually referred to as the normalization/consistency condition.

We will denote by a normalized symmetric Lanczos process the symmetric Lanczos process where the particular choice of scaling (I.22) is used and the generated vectors are made to satisfy the normalization condition. Note that (I.22) may not always be a well-defined expression, so a normalized symmetric Lanczos process may suffer a so-called pivot breakdown. The idea of utilizing an unnormalized symmetric Lanczos process in order to avoid such breakdowns is due to Gutknecht [27, 28].¹ In an unnormalized setting there is no restriction on the sign or value of γ_k^{(0)}. One can show that this quantity cannot be zero in two consecutive iterations. This property is used in, for example, composite step biconjugate gradient methods and other methods employing look-ahead techniques to avoid breakdowns, see, e.g., [2, 7, 31].

¹ Gutknecht considers the nonsymmetric case and uses the unnormalized framework to avoid pivot breakdowns. When solving (I.16) with a nonsymmetric H there are several possible breakdowns for methods based on a nonsymmetric Lanczos process, see, e.g., [29, 46]. For the symmetric case the pivot breakdown is the only one that can happen.
I.3.2 Methods based on the symmetric Lanczos process
A symmetric Lanczos process (normalized or not) is by itself not sufficient to obtain a solution to (I.16). Hence, a method for solving (I.16) needs to manage additional quantities such that a solution can be extracted when the stopping criterion is reached. We shall mention a few such Krylov subspace methods, but for a thorough treatment of such methods, both for the symmetric and the nonsymmetric case, see, e.g., [46, Chapters 5, 6, 7] and [29].
The method of conjugate gradients may be viewed as a method for solving (I.16) based on the normalized symmetric Lanczos process. In the normalized case search directions
may be defined and the three-term recurrence in (I.20) may be reduced to a two-term
recurrence using these search directions. Note that when viewing the method of conjugate
gradients as a method based on a normalized symmetric Lanczos process it becomes clear
that the search directions are not necessary for the formulation but rather may be viewed as
a product of the normalization itself. The breakdown of the method of conjugate gradients
when solving (I.16) corresponds exactly to the case when normalization is not possible.
For the case H = H^T ≻ 0 no breakdown of the method of conjugate gradients will occur,
and further it holds that if H = H^T ⪰ 0 and c ∈ R(H), normalization will always be
possible.
As mentioned, it is also possible to consider a method based on the unnormalized symmetric Lanczos process in order to avoid pivot breakdowns. Such an unnormalized Krylov subspace method is described in Paper B; in terms of the survey [29], this method is called an inconsistent ORes version of the method of conjugate gradients.

Further, in the case when (I.16) is incompatible one can consider a minimum-residual method that delivers a minimum-residual solution to (I.16). Minimum-residual methods were first suggested in Lanczos' early paper [36] and by Stiefel in [51]. The name, minimum-residual method, is adopted from the implementation of the method, MINRES, by Paige and Saunders, see [44]. In this implementation, which is based on the symmetric Lanczos process, the optimal solution of min_{x∈R^n} ‖Hx + c‖²_2 obtained is in general not of minimum Euclidean norm. In [8], Choi, Paige and Saunders present a MINRES-like algorithm, MINRES-QLP, that does deliver a minimum-residual solution of minimum Euclidean norm.
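For reference, a minimum-residual solution of a symmetric, possibly singular and incompatible, system Hx + c = 0 can be computed with the MINRES implementation available in SciPy; the small singular system below is made up. Note that SciPy's minres solves Hx = b, so b = −c here, and that the returned solution is a minimum-residual solution but in general not the one of minimum Euclidean norm.

```python
import numpy as np
from scipy.sparse.linalg import minres

# A singular symmetric matrix and a right-hand side making H x + c = 0 incompatible.
H = np.diag([2.0, 1.0, 0.0])
c = np.array([-2.0, -1.0, 1.0])       # -c has a component outside R(H)

x, info = minres(H, -c)               # SciPy's minres solves H x = b with b = -c
r = H @ x + c                         # residual of H x + c = 0
print(np.linalg.norm(r))              # roughly 1, the norm of the incompatible part of c
```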
In the second paper of this thesis (Paper B), the mathematical properties of methods based on an unnormalized symmetric Lanczos process are considered, and both a minimum-residual solution and a minimum-residual solution of minimum Euclidean norm in the case when (I.16) is incompatible are characterized.
I.4 Summary of papers
In this section a short summary of each of the appended papers is given, highlighting the
main contributions. The common theme of the three papers is the in-depth study of the
mathematical properties of methods for solving symmetric systems of linear equations
arising in various optimization problem formulations, and the derivation of new theoretical
results.
Paper A
A. Forsgren and T. Odland. On the connection between the conjugate gradient method and
quasi-Newton methods on quadratic problems. Computational Optimization and Applications, 60 (2015), pp 377-392.
We derive conditions on the update matrix Uk , and hence on the quasi-Newton matrix
Bk such that the search directions generated by the corresponding quasi-Newton method
and the method of conjugate gradients are parallel when applied to an unconstrained strictly
convex quadratic optimization problem or equivalently a symmetric system of linear equations with a symmetric positive definite matrix. The conditions state that the range of the
update matrix must lie in the last two dimensions of the Krylov subspace generated by H
and c and that in addition the so-called quasi-Newton condition must be satisfied. These
conditions on Uk , that are based on a sufficient condition to obtain mutually conjugate
search directions, are shown to be equivalent to the quasi-Newton scheme belonging to the
one-parameter Broyden family.
An explicit formula for the one-to-one correspondence between the Broyden parameter and the scaling of the search direction from the corresponding quasi-Newton method
compared to the search direction in the method of conjugate gradients is stated.
Paper B
A. Forsgren and T. Odland. On solving symmetric systems of linear equations in an unnormalized Krylov subspace framework. arXiv:1409.4937 [math.OC], 2014.
We study the mathematical properties of an unnormalized Krylov subspace framework
for solving a symmetric system of linear equations Hx + c = 0. This framework is based
on an unnormalized symmetric Lanczos process for generating an orthogonal basis for the
expanding Krylov subspaces generated by H and c. Using only the three-term recurrence
relations for a triple of quantities associated with each orthogonal vector from the unnormalized symmetric Lanczos process we give conditions on whether the symmetric system
of linear equations is compatible or not. In the compatible case a solution is obtained and
in the incompatible case we obtain a certificate of incompatibility based on a projection on
the null space of H and characterize a minimum-residual solution to the symmetric system
of linear equations.
Further we derive a minimum-residual method in the unnormalized Krylov subspace
framework and give explicit recursions of the minimum residual iterates. In the case of an
incompatible symmetric system of linear equations we characterize a minimum-residual
solution of minimum Euclidean norm.
Paper C
A. Forsgren and T. Odland. On the equivalence of the method of conjugate gradients
and quasi-Newton methods on quadratic problems. arXiv:1503.01892 [math.OC], 2015.
Submitted to Computational Optimization and Applications.
We derive necessary and sufficient conditions on the quasi-Newton matrix Bk and the
update matrix Uk such that the search direction in the corresponding quasi-Newton method
is parallel to the search direction in the method of conjugate gradients on an unconstrained
strictly convex quadratic problem or equivalently a symmetric system of linear equations
with a symmetric positive definite matrix.
We show that the set of quasi-Newton schemes admitted by these necessary and sufficient conditions is strictly larger than the one-parameter Broyden family. Further we show
that this set of quasi-Newton schemes contains an infinite number of symmetric rank-one
update schemes. The necessary and sufficient conditions derived in this paper admit nonsymmetric quasi-Newton schemes as well, however it is shown that symmetry of Bk can
be achieved without effort.
I.4.1 Contribution by co-author
Throughout the work Anders Forsgren has acted as an advisor in the classical sense and is
therefore co-author of all three papers.
I.5 References
[1] N. Andréasson, A. Evgrafov, and M. Patriksson. An Introduction to Continuous Optimization. Professional Publishing Svc., 2007.
[2] R. E. Bank and T. F. Chan. An analysis of the composite step biconjugate gradient method. Numer. Math., 66(3):295–319, 1993.
[3] C. G. Broyden. Quasi-Newton methods and their application to function minimisation. Math. Comp., 21:368–381, 1967.
[4] C. G. Broyden. The convergence of a class of double-rank minimization algorithms. II. The new algorithm. J. Inst. Math. Appl., 6:222–231, 1970.
[5] A. Buckley. Termination and equivalence results for conjugate gradient algorithms. Math. Programming, 29(1):64–76, 1984.
[6] F. Carlsson and A. Forsgren. Iterative regularization in intensity-modulated radiation therapy optimization. Medical Physics, 33(1):225–234, 2006.
[7] T. F. Chan and T. Szeto. Composite step product methods for solving nonsymmetric linear systems. SIAM J. Sci. Comput., 17(6):1491–1508, 1996.
[8] S.-C. T. Choi, C. C. Paige, and M. A. Saunders. MINRES-QLP: a Krylov subspace method for indefinite or singular symmetric systems. SIAM J. Sci. Comput., 33(4):1810–1836, 2011.
[9] A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to derivative-free optimization, volume 8 of MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Programming Society (MPS), Philadelphia, PA, 2009.
[10] J. Cullum and R. K. Brayton. Some remarks on the symmetric rank-one update. J.
Optim. Theory Appl., 29(4):493–519, 1979.
[11] W. C. Davidon. Variable metric method for minimization. SIAM J. Optim., 1(1):1–17,
1991.
[12] J. W. Demmel. Applied numerical linear algebra. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA, 1997.
[13] J. E. Dennis, Jr. and J. J. Moré. A characterization of superlinear convergence and its
application to quasi-Newton methods. Math. Comp., 28:549–560, 1974.
[14] L. C. W. Dixon. Quasi-Newton algorithms generate identical points. Math. Programming, 2:383–387, 1972.
[15] R. Fletcher. A new approach to variable metric algorithms. The Computer Journal,
13(3):317–322, Jan. 1970.
[16] R. Fletcher. Practical methods of optimization. A Wiley-Interscience Publication.
John Wiley & Sons Ltd., Chichester, second edition, 1987.
[17] R. Fletcher and M. J. D. Powell. A rapidly convergent descent method for minimization. Comput. J., 6:163–168, 1963/1964.
[18] R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. Comput. J., 7:149–154, 1964.
[19] R. Fletcher and J. W. Sinclair. Degenerate values for Broyden methods. J. Optim.
Theory Appl., 33(3):311–324, 1981.
[20] A. Forsgren, P. E. Gill, and J. R. Shinnerl. Stability of symmetric ill-conditioned
systems arising in interior methods for constrained optimization. SIAM J. Matrix
Anal. Appl., 17:187–211, 1996.
[21] A. Forsgren, P. E. Gill, and M. H. Wright. Interior methods for nonlinear optimization. SIAM Rev., 44(4):525–597 (2003), 2002.
[22] P. E. Gill, W. Murray, and M. H. Wright. Practical optimization. Academic Press
Inc. [Harcourt Brace Jovanovich Publishers], London, 1981.
[23] D. Goldfarb. A family of variable-metric methods derived by variational means.
Math. Comp., 24:23–26, 1970.
[24] G. H. Golub and D. P. O’Leary. Some history of the conjugate gradient and Lanczos
algorithms: 1948–1976. SIAM Rev., 31(1):50–102, 1989.
[25] G. H. Golub and C. F. Van Loan. Matrix computations, volume 3 of Johns Hopkins
Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore,
MD, 1983.
[26] I. Griva, S. G. Nash, and A. Sofer. Linear and nonlinear optimization. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 2009.
[27] M. H. Gutknecht. The unsymmetric Lanczos algorithms and their relations to Pade
approximation, continued fractions, and the QD algorithm. Preliminary Proceedings
of the Copper Mountain Conference on Iterative methods, April, 1990.
[28] M. H. Gutknecht. A completed theory of the unsymmetric Lanczos process and
related algorithms, part I. SIAM J. Matrix Anal. Appl., 13(2):594–639, 1992.
[29] M. H. Gutknecht. Lanczos-type solvers for nonsymmetric linear systems of equations. In Acta numerica, 1997, volume 6 of Acta Numer., pages 271–397. Cambridge
Univ. Press, Cambridge, 1997.
[30] M. H. Gutknecht. A brief introduction to Krylov space methods for solving linear systems. In Frontiers of Computational Science, pages 53–62. Springer-Verlag Berlin,
2007. International Symposium on Frontiers of Computational Science, Nagoya
Univ, Nagoya, Japan, Dec 12-13, 2005.
[31] M. H. Gutknecht and K. J. Ressel. Look-ahead procedures for Lanczos-type product methods based on three-term Lanczos recurrences. SIAM J. Matrix Anal. Appl.,
21(4):1051–1078, 2000.
[32] W. W. Hager and H. Zhang. A survey of nonlinear conjugate gradient methods. Pac.
J. Optim., 2(1):35–58, 2006.
[33] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear
systems. J. Research Nat. Bur. Standards, 49:409–436 (1953), 1952.
[34] H. Y. Huang. Unified approach to quadratically convergent algorithms for function
minimization. J. Optimization Theory Appl., 5:405–423, 1970.
[35] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear
differential and integral operators. J. Research Nat. Bur. Standards, 45:255–282,
1950.
[36] C. Lanczos. Solution of systems of linear equations by minimized-iterations. J.
Research Nat. Bur. Standards, 49:33–53, 1952.
[37] D. G. Luenberger. Linear and nonlinear programming. Addison-Wesley Pub Co,
Boston, MA, second edition, 1984.
[38] G. E. Myers. Properties of the conjugate-gradient and Davidon methods. J. Optimization Theory Appl., 2:209–219, 1968.
[39] L. Nazareth. A relationship between the BFGS and conjugate gradient algorithms
and its implications for new algorithms. SIAM J. Numer. Anal., 16(5):794–800, 1979.
[40] J. Nocedal, A. Sartenaer, and C. Zhu. On the behavior of the gradient norm in the
steepest descent method. Computational Optimization and Applications, 22(1):5–35,
2002.
[41] J. Nocedal and S. J. Wright. Numerical optimization. Springer Series in Operations
Research. Springer-Verlag, New York, 1999.
[42] S. S. Oren and D. G. Luenberger. Self-scaling variable metric (SSVM) algorithms. I.
Criteria and sufficient conditions for scaling a class of algorithms. Management Sci.,
20:845–862, 1973/74. Mathematical programming.
[43] C. C. Paige. Krylov subspace processes, Krylov subspace methods, and iteration
polynomials. In Proceedings of the Cornelius Lanczos International Centenary Conference (Raleigh, NC, 1993), pages 83–92, Philadelphia, PA, 1994. SIAM.
[44] C. C. Paige and M. A. Saunders. Solutions of sparse indefinite systems of linear
equations. SIAM J. Numer. Anal., 12(4):617–629, 1975.
[45] S. S. Petrova and A. D. Solov’ev. The origin of the method of steepest descent.
Historia Mathematica, 24(4):361 – 375, 1997.
[46] Y. Saad. Iterative methods for sparse linear systems. Society for Industrial and
Applied Mathematics, Philadelphia, PA, second edition, 2003.
[47] D. F. Shanno. Conditioning of quasi-Newton methods for function minimization.
Math. Comp., 24:647–656, 1970.
[48] D. F. Shanno. Conjugate gradient methods with inexact searches. Math. Oper. Res.,
3(3):244–256, 1978.
[49] D. F. Shanno and P. C. Kettler. Optimal conditioning of quasi-Newton methods.
Math. Comp., 24:657–664, 1970.
[50] J. R. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical report, Pittsburgh, PA, USA, 1994.
[51] E. Stiefel. Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme. Comment. Math. Helv., 29:157–179, 1955.
[52] A. Tits. Some investigations about a unified approach to quadratically convergent
algorithms for function minimization. J. Optimization Theory Appl., 20(4):489–496,
1976.
[53] S. J. Wright. Primal-dual interior-point methods. Society for Industrial and Applied
Mathematics (SIAM), Philadelphia, PA, 1997.