Download Report

Research
Matters
How
and Why
to Estimate
Condition February
Numbers
Matrix Functions
25,for
2009
Nick
Higham
Nick
Higham
Director
SchoolofofResearch
Mathematics
The
University
of Manchester
School
of Mathematics
http://www.maths.manchester.ac.uk/~higham
@nhigham, nickhigham.wordpress.com
Joint work with Sam Relton.
Computational Linear Algebra and Optimization
for the Digital Economy
1/6
Edinburgh, October 2013
Sources of Error
f : Cn×n → Cn×n .
Want to compute f (A) but given A + ∆A not A.
∆A may come from
Rounding errors in storing A.
Measurement errors.
Errors from an earlier computation.
Nick Higham
Matrix Function Condition Numbers
2 / 33
Sources of Error
f : Cn×n → Cn×n .
Want to compute f (A) but given A + ∆A not A.
∆A may come from
Rounding errors in storing A.
Measurement errors.
Errors from an earlier computation.
Photocopying errors!
Nick Higham
Matrix Function Condition Numbers
2 / 33
Photocopier
http://en.wikipedia.org/wiki/Photocopier
“Most current photocopiers use a technology called
xerography, a dry process that uses electrostatic
charges on a light sensitive photoreceptor to first attract
and then transfer toner particles (a powder) onto paper
in the form of an image”
“There is an increasing trend for new photocopiers to
adopt digital technology, thus replacing the older
analog technology. With digital copying, the copier
effectively consists of an integrated scanner and laser
printer.”
Nick Higham
Matrix Function Condition Numbers
3 / 33
Xerox WorkCentre 7535, 7556
August 2013: German computer scientist David Kriesel
discovered the machines often change “6” to “8”.
Jbig2 compression algorithm implicated.
Nick Higham
Matrix Function Condition Numbers
4 / 33
Maxim
Every code for solving a numerical problem should return
an error estimate or bound with the computed result.
Nick Higham
Matrix Function Condition Numbers
5 / 33
Conditioning of f (x)
Scalar function f ∈ C 2 , and y = f (x), y + ∆y = f (x + ∆x).
Then
0 ∆y
xf (x) ∆x
=
+ O ∆x 2 .
y
f (x)
x
The relative condition number
0 xf (x) .
c(x) = f (x) Nick Higham
Matrix Function Condition Numbers
6 / 33
Conditioning of f (x)
Scalar function f ∈ C 2 , and y = f (x), y + ∆y = f (x + ∆x).
Then
0 ∆y
xf (x) ∆x
=
+ O ∆x 2 .
y
f (x)
x
The relative condition number
0 xf (x) .
c(x) = f (x) Condition number of the condition number: assuming
xf 0 (x) > 0 and f (x) > 0,
0 00
xc (x) f (x) f 0 (x) [2]
c (x) := = 1+x
−
.
c(x) f 0 (x)
f (x) Nick Higham
Matrix Function Condition Numbers
6 / 33
Fréchet Derivative
Fréchet derivative of f : Cn×n → Cn×n at X ∈ Cn×n
A linear mapping Lf : Cn×n → Cn×n s.t. for all E ∈ Cn×n
f (X + E) − f (X ) − Lf (X , E) = o(kEk).
Nick Higham
Matrix Function Condition Numbers
7 / 33
Fréchet Derivative
Fréchet derivative of f : Cn×n → Cn×n at X ∈ Cn×n
A linear mapping Lf : Cn×n → Cn×n s.t. for all E ∈ Cn×n
f (X + E) − f (X ) − Lf (X , E) = o(kEk).
Examples:
(X + E)2 − X 2 = XE + EX + E 2 ⇒ Lx 2 (X , E) = XE + EX .
(X + E)−1 − X −1 = −X −1 EX −1 + O(E 2 )
⇒ Lx −1 (X , E) = −X −1 EX −1 .
Nick Higham
Matrix Function Condition Numbers
7 / 33
Fréchet Derivative
Fréchet derivative of f : Cn×n → Cn×n at X ∈ Cn×n
A linear mapping Lf : Cn×n → Cn×n s.t. for all E ∈ Cn×n
f (X + E) − f (X ) − Lf (X , E) = o(kEk).
Examples:
(X + E)2 − X 2 = XE + EX + E 2 ⇒ Lx 2 (X , E) = XE + EX .
(X + E)−1 − X −1 = −X −1 EX −1 + O(E 2 )
⇒ Lx −1 (X , E) = −X −1 EX −1 .
Scalar case: Lf (x, e) = f (0 x)e.
Nick Higham
Matrix Function Condition Numbers
7 / 33
Gâteaux Derivative
Gâteaux derivative of matrix function f at A in direction E:
f (A + E) − f (A)
.
→0
Gf (A, E) = lim
Weaker notion of differentiability.
Nick Higham
Matrix Function Condition Numbers
8 / 33
Applications of Fréchet Derivative
Computation of correlated choice probabilities (2013),
registration of MRI images (2013),
Markov models applied to cancer data (2013),
matrix geometric mean computation (2012),
model reduction (2012),
first and second Fréchet derivative in Halley’s method
on Banach space (2003).
Nick Higham
Matrix Function Condition Numbers
9 / 33
Componentwise Sensitivity (1)
Let E = ei ejT . Then
f (X + ei ejT ) − f (X )
lim
= Lf (X , ei ejT ).
→0
(Lf (X , ei ejT ))rs measures the sensitivity of f (X )rs to
perturbations in aij .
Nick Higham
Matrix Function Condition Numbers
10 / 33
Componentwise Sensitivity (2)
Since Lf is a linear operator,
vec(Lf (A, E)) = Kf (A)vec(E)
where Kf (A) ∈ Cn
derivative.
2 ×n2
is the Kronecker form of the Fréchet
a11 a21 a31 . . . ann
f11
f21
f31
..
.
Kf (A) ≡
Lf (A, ei ejT )
fnn
Nick Higham
Matrix Function Condition Numbers
11 / 33
Condition Number
kf (A + E) − f (A)k
.
→0 kEk≤
condabs (f , A) = lim sup
kLf (A)k := max
E6=0
kLf (A, E)k
.
kEk
Lemma
condabs (f , A) = kLf (A)k.
Nick Higham
Matrix Function Condition Numbers
12 / 33
Condition Number
kf (A + E) − f (A)k
.
→0 kEk≤
condabs (f , A) = lim sup
kLf (A)k := max
E6=0
kLf (A, E)k
.
kEk
Lemma
condabs (f , A) = kLf (A)k.
Scalar case: condabs (f , x) = |f 0 (x)|.
Nick Higham
Matrix Function Condition Numbers
12 / 33
Relative Condition Number
kf (A + E) − f (A)k
→0 kEk≤kAk
kf (A)k
condrel (f , A) := lim
sup
= condabs (f , A)
Nick Higham
kAk
.
kf (A)k
Matrix Function Condition Numbers
13 / 33
Computing Lf : Methods for Specific f
exponential Kenney & Laub (1998),
Al-Mohy & H (2009)
logarithm Al-Mohy, H & Relton (2013)
fractional power H & Lin (2013)
Latter three methods obtained by differentiating
alg for f .
Nick Higham
Matrix Function Condition Numbers
14 / 33
Computing Lf : Via 2n × 2n Matrix
Theorem (Mathias, 1996)
If f is 2n − 1 times ctsly diffble,
A E
f (A) Lf (A, E)
f
=
.
0 A
0
f (A)
Note that Lf (A, αE) = αLf (A, E), but α may effect alg
used for the evaluation.
Nick Higham
Matrix Function Condition Numbers
15 / 33
Computing Lf : Complex Step
Assume that f : Rn×n → Rn×n and A, E ∈ Rn×n . Then
f (A + i hE) − f (A) − i hLf (A, E) = o(h).
Thus (Al-Mohy & H, 2010)
f (A) ≈ Re f (A + i hE),
Lf (A, E) ≈ Im
f (A + i hE)
.
h
Errors O(h2 ).
h not restricted by fl pt arith considerations. Can take
h = 10−100 .
f alg must not employ complex arith.
Nick Higham
Matrix Function Condition Numbers
16 / 33
Condition Estimation (1)
Since Lf is a linear operator,
vec(Lf (A, E)) = Kf (A)vec(E)
where Kf (A) ∈ Cn
derivative.
2 ×n2
is the Kronecker form of the Fréchet
Can show
kLf (A)kF = kKf (A)k2 ,
kLf (A)k1
≤ kKf (A)k1 ≤ nkLf (A)k1 .
n
So problem reduces to matrix norm estimation.
Nick Higham
Matrix Function Condition Numbers
17 / 33
Condition Estimation (2)
Use the block 1-norm estimator of H & Tisseur (2000).
For kBk1 it needs Bx and B ∗ y for several x and y .
Let vec(X ) = x. Then Kf (A)x = vec(Lf (A, X )).
Theorem (H & Lin, 2013)
Let f ∈ C 2n−1 and and ef (z) := f (z).
If ef (A)∗ = ef (A∗ ) for all A ∈ Cn×n then
Kf (A)∗ x = vec(Lef (A, X ∗ )∗ ), where vec(X ) = x.
ef = f for most functions of interest.
ef = f ⇒ ef (A∗ )∗ = ef (A∗ ).
Nick Higham
Matrix Function Condition Numbers
18 / 33
Software for 1-Norm Estimator
MATLAB: normest1.
NAG library: F04YD, F04ZD, Mark 24.
Python:
Nick Higham
Matrix Function Condition Numbers
19 / 33
Python: SciPy 0.13.0
linalg/_expm_frechet.py
linalg/_expm_multiply.py
linalg/_sqrtm.py (blocked)
Nick Higham
Matrix Function Condition Numbers
20 / 33
Higher Derivatives
Needed for
level-2 condition number,
understanding accuracy of algorithms for computing
Lf (A, E).
(2)
Second Fréchet derivative Lf (A, E1 , E2 ) is unique
mutlilinear function of E1 , E2 satisfying
(2)
Lf (A+E2 , E1 ) − Lf (A, E1 ) − Lf
Nick Higham
(A, E1 , E2 ) = o(kE2 k).
Matrix Function Condition Numbers
21 / 33
Higher Derivatives
Needed for
level-2 condition number,
understanding accuracy of algorithms for computing
Lf (A, E).
(2)
Second Fréchet derivative Lf (A, E1 , E2 ) is unique
mutlilinear function of E1 , E2 satisfying
(2)
Lf (A+E2 , E1 ) − Lf (A, E1 ) − Lf
(A, E1 , E2 ) = o(kE2 k).
k th Fréchet derivative defined by
(k −1)
Lf
(k −1)
(A+Ek , E1 , . . . , Ek −1 ) − Lf
(k )
− Lf
Nick Higham
(A, E1 , . . . , Ek −1 )
(A, E1 , . . . , Ek ) = o(kEk k).
Matrix Function Condition Numbers
21 / 33
Existing Literature
Large literature on Fréchet derivatives in Banach
space.
Need specialized results for matrix functions.
Nick Higham
Matrix Function Condition Numbers
22 / 33
Existence & Continuity of Fréchet Derivatives
D = open subset of C.
Cn×n (D, p) = matrices with spectrum in D and largest
Jordan block of size p.
Theorem (H & Relton, 2013)
Let f be 2k p − 1 times continuously differentiable on D.
(k )
Then for A ∈ Cn×n (D, p) the kth Fréchet derivative Lf (A)
(k )
exists and Lf (A, E1 , . . . , Ek ) is continuous in A and
E1 , . . . , Ek ∈ Cn×n .
Proof uses Gâteaux derivative.
k = 1: Mathias (1996).
Nick Higham
Matrix Function Condition Numbers
23 / 33
Properties
Assume from now on conditions of theorem satisfied.
Then the Ei are interchangeable:
(2)
(2)
Lf (A, E1 , E2 ) = Lf (A, E2 , E1 ).
Indeed
(k )
Lf (A, E1 , . . . , Ek )
∂
f (A + s1 E1 + · · · + sk Ek ).
=
∂s1 · · · ∂sk s=0
Nick Higham
Matrix Function Condition Numbers
24 / 33
The Setup
Wish to estimate norms and condition numbers.
Knowing how to evaluate exact quantities helps
develop and test the estimates.
In practice we only ever work with n × n matrices.
Nick Higham
Matrix Function Condition Numbers
25 / 33
How to Compute Lf(2)
X1 =
Let
h
i
A E f (A) Lf (A,E)
1
.
Know
f
(X
)
=
.
1
0
f (A)
0 A


A
E
E
0
1
2
 0 A 0 E2 
0 1

X2 = I2 ⊗ X1 +
⊗ E2 = 
 0 0 A E1  .
0 0
0 0 0 A
Then

(2)
f (A) Lf (A, E1 ) Lf (A, E2 ) Lf (A, E1 , E2 )
 0
f (A)
0
Lf (A, E2 ) 
.
f (X2 ) = 
 0
0
f (A)
Lf (A, E1 ) 
0
0
0
f (A)

Nick Higham
Matrix Function Condition Numbers
26 / 33
)
How to Compute L(k
f
i
i
Define Xi ∈ C2 n×2 n by
0 1
Xi = I2 ⊗ Xi−1 +
⊗ I2i−1 ⊗ Ei ,
0 0
X0 = A.
Theorem (H & Relton, 2013)
(k )
The (1, n) block of f (Xk ) is Lf (A, E1 , . . . , Ek ).
Nick Higham
Matrix Function Condition Numbers
27 / 33
Level-2 Condition Number
“Condition number of the condition number”.
Demmel (1987) showed that for
matrix inversion (and D. J. Higham, 1995),
the eigenproblem,
polynomial zero-finding,
pole assignment in linear control problems
(relative) level-1 and level-2 cond no’s are equivalent.
Cheung & Cucker (2005) show same holds when
“condition number = 1 / distance to nearest ill-posed
problem”.
Nick Higham
Matrix Function Condition Numbers
28 / 33
Level-2 Condition Number
[2]
|condabs (f , A + Z ) − condabs (f , A)|
.
→0 kZ k≤
condabs (f , A) = lim sup
Theorem (H & Relton, 2013)
[2]
(2)
condabs (f , A) ≤ kKf (A)k2 ,
(2)
for a Kronecker matrix Kf (A) ∈ Cn
Nick Higham
4 ×n2
.
Matrix Function Condition Numbers
29 / 33
Level-2 Condition Number: Exponential
Theorem
Let A ∈ Cn×n be normal. Then in the 2-norm
[2]
condabs (exp, A) = condabs (exp, A).
Nick Higham
Matrix Function Condition Numbers
30 / 33
Level-2 Condition Number: Exponential
Theorem
Let A ∈ Cn×n be normal. Then in the 2-norm
[2]
condabs (exp, A) = condabs (exp, A).
Theorem
Let A ∈ Cn×n be normal. Then in the 2-norm
[2]
1 ≤ condrel (exp, A) ≤ 2condrel (exp, A) + 1.
Nick Higham
Matrix Function Condition Numbers
30 / 33
Level-2 Condition Number: Matrix Inverse
Theorem
For nonsingular A ∈ Cn×n ,
[2]
condabs (x −1 , A) = 2condabs (x −1 , A)3/2 .
Latter is what is expected from the scalar case:
f (x) = x −1 ⇒ |f 00 | = |2(f 0 )3/2 |.
Have experimental evidence that matrix case can be
similar to scalar case.
Nick Higham
Matrix Function Condition Numbers
31 / 33
Condition Number of the Fréchet Derivative
kLf (A + ∆A, E + ∆E) − Lf (A, E)k
.
→0 k∆Ak≤
condabs (Lf , A, E) = lim sup
k∆Ek≤
Theorem (H & Relton, 2013)
We have
condabs (f , A) ≤ condabs (Lf , A, E)
(2)
≤ max kLf (A, E, Z )k + condabs (f , A).
kZ k=1
We have developed method to estimate the bounds.
Nick Higham
Matrix Function Condition Numbers
32 / 33
Summary and Open Questions
Showed that given an O(n3 ) flop method for f :
(k )
Computing Lf (A, E) costs O(8k n3 ) flops.
(k )
Computing Kf (A) costs O(8k n3+2k ) flops.
Cost of estimates is O(n3 ) flops, given an O(n3 ) flop
method for Lf .
Open questions about relation between level-1 and
level-2 condition numbers.
How can we exploit the symmetry of
(k )
Lf (E1 , E2 , . . . , Ek ) in the Ei ?
Looking at componentwise condition numbers.
Nick Higham
Matrix Function Condition Numbers
33 / 33
References I
S. D. Ahipasaoglu, X. Li, and K. Natarajan.
A convex optimization approach for computing
correlated choice probabilities with many alternatives.
Preprint 4034, Optimization Online, 2013.
26 pp.
A. H. Al-Mohy and N. J. Higham.
Computing the Fréchet derivative of the matrix
exponential, with an application to condition number
estimation.
SIAM J. Matrix Anal. Appl., 30(4):1639–1657, 2009.
Nick Higham
Matrix Function Condition Numbers
1/8
References II
A. H. Al-Mohy and N. J. Higham.
The complex step approximation to the Fréchet
derivative of a matrix function.
Numer. Algorithms, 53(1):133–148, 2010.
A. H. Al-Mohy, N. J. Higham, and S. D. Relton.
Computing the Fréchet derivative of the matrix logarithm
and estimating the condition number.
SIAM J. Sci. Comput., 35(4):C394–C410, 2013.
S. Amat, S. Busquier, and J. M. Gutiérrez.
Geometric constructions of iterative functions to solve
nonlinear equations.
J. Comput. Appl. Math., 157(1):197–205, 2003.
Nick Higham
Matrix Function Condition Numbers
2/8
References III
J. Ashburner and G. R. Ridgway.
Symmetric diffeomorphic modelling of longitudinal
structural MRI.
Frontiers in Neuroscience, 6, 2013.
Article 197.
J. W. Demmel.
On condition numbers and the distance to the nearest
ill-posed problem.
Numer. Math., 51:251–289, 1987.
Nick Higham
Matrix Function Condition Numbers
3/8
References IV
B. García-Mora, C. Santamaría, G. Rubio, and J. L.
Pontones.
Computing survival functions of the sum of two
independent Markov processes. An application to
bladder carcinoma treatment.
Int. Journal of Computer Mathematics, 2013.
D. J. Higham.
Condition numbers and their condition numbers.
Linear Algebra Appl., 214:193–213, 1995.
Nick Higham
Matrix Function Condition Numbers
4/8
References V
N. J. Higham.
Functions of Matrices: Theory and Computation.
Society for Industrial and Applied Mathematics,
Philadelphia, PA, USA, 2008.
ISBN 978-0-898716-46-7.
xx+425 pp.
N. J. Higham and A. H. Al-Mohy.
Computing matrix functions.
Acta Numerica, 19:159–208, 2010.
N. J. Higham and L. Lin.
An improved Schur–Padé algorithm for fractional
powers of a matrix and their Fréchet derivatives.
SIAM J. Matrix Anal. Appl., 34(3):1341–1360, 2013.
Nick Higham
Matrix Function Condition Numbers
5/8
References VI
N. J. Higham and S. D. Relton.
The condition number of the Fréchet derivative of a
matrix function.
MIMS EPrint, Manchester Institute for Mathematical
Sciences, The University of Manchester, UK, 2013.
In preparation.
N. J. Higham and S. D. Relton.
Higher order Fréchet derivatives of matrix functions and
the level-2 condition number.
MIMS EPrint 2013.58, Manchester Institute for
Mathematical Sciences, The University of Manchester,
UK, Nov. 2013.
19 pp.
Nick Higham
Matrix Function Condition Numbers
6/8
References VII
B. Jeuris, R. Vandebril, and B. Vandereycken.
A survey and comparison of contemporary algorithms
for computing the matrix geometric mean.
Electron. Trans. Numer. Anal., 39:379–402, 2012.
C. S. Kenney and A. J. Laub.
A Schur–Fréchet algorithm for computing the logarithm
and exponential of a matrix.
SIAM J. Matrix Anal. Appl., 19(3):640–663, 1998.
R. Mathias.
A chain rule for matrix functions and applications.
SIAM J. Matrix Anal. Appl., 17(3):610–620, 1996.
Nick Higham
Matrix Function Condition Numbers
7/8
References VIII
D. Petersson.
A Nonlinear Optimization Approach to H2 -Optimal
Modeling and Control.
PhD thesis, Department of Electrical Engineering,
Linköping University, Sweden, SE-581 83 Linköping,
Sweden, 2013.
Dissertation No. 1528.
D. Petersson and J. Löfberg.
Model reduction using a frequency-limited H2 -cost.
Technical report, Department of Electrical Engineering,
Linköpings Universitet, SE-581 83, Linköping, Sweden,
Mar. 2012.
13 pp.
Nick Higham
Matrix Function Condition Numbers
8/8