Vector and Matrix Norms
Matrix Computations - CPSC 5006 E
Julien Dompierre
Department of Mathematics and Computer Science, Laurentian University
Sudbury, September 29, 2010

Why Norms (p. 52)

Norms serve the same purpose on vector spaces that absolute values do on the real line: they furnish a measure of distance. More precisely, $\mathbb{R}^n$ together with a norm on $\mathbb{R}^n$ defines a metric space. Therefore, we have the familiar notions of neighborhood, open sets, convergence, and continuity when working with vectors and vector-valued functions.

Reminder: Inner Product

The inner product $(x, y)$ of two vectors $x$ and $y$ in $\mathbb{R}^n$ is given by

$$(x, y) = x^T y = \sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$$

For complex vectors, the inner product $(x, y)$ of two vectors $x$ and $y$ in $\mathbb{C}^n$ is given by

$$(x, y) = \sum_{i=1}^{n} x_i \bar{y}_i = x_1 \bar{y}_1 + x_2 \bar{y}_2 + \cdots + x_n \bar{y}_n = y^H x.$$

Vector Norm on a Vector Space

Definition. A vector norm on a vector space $V$ over a field $K$ (here $K = \mathbb{R}$ or $K = \mathbb{C}$) is a real-valued function $f : V \to \mathbb{R}$ that satisfies the following properties:
1. $f(x) \ge 0$ for all $x \in V$.
2. $f(x) = 0$ if and only if $x = 0$.
3. $f(\alpha x) = |\alpha| f(x)$ for all $\alpha \in K$ and for all $x \in V$.
4. $f(x + y) \le f(x) + f(y)$ for all $x, y \in V$.

Note. Item 4 is called the triangle inequality.

Important Example: Euclidean Norm

Suppose that $V = \mathbb{C}^2$ and $K = \mathbb{C}$; then the function

$$f(x) = \|x\|_2 = \sqrt{x^H x} = (x, x)^{1/2} = \sqrt{|x_1|^2 + |x_2|^2}$$

is a vector norm. More generally, suppose that $V = \mathbb{C}^n$ and $K = \mathbb{C}$; then the function

$$f(x) = \|x\|_2 = \sqrt{x^H x} = (x, x)^{1/2} = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}$$

is a vector norm.

Vector Norm on $\mathbb{R}^n$

An important example is when the vector space $V$ is $\mathbb{R}^n$ and the field $K$ is $\mathbb{R}$.

Definition. A vector norm on the vector space $\mathbb{R}^n$ over the field $\mathbb{R}$ is a function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies the following properties:
1. $f(x) \ge 0$ for all $x \in \mathbb{R}^n$.
2. $f(x) = 0$ if and only if $x = 0$.
3. $f(\alpha x) = |\alpha| f(x)$ for all $\alpha \in \mathbb{R}$ and for all $x \in \mathbb{R}^n$.
4. $f(x + y) \le f(x) + f(y)$ for all $x, y \in \mathbb{R}^n$.

Note. Item 4 is called the triangle inequality.
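The four defining properties can be spot-checked numerically for the Euclidean norm on $\mathbb{C}^2$. The following is a minimal sketch in plain Python, not part of the original slides; the helper names `inner` and `norm2`, the sample vectors, and the tolerance `tol` are illustrative assumptions. A numerical check on a few vectors illustrates the properties but does not prove them.

```python
import math

def inner(x, y):
    """Inner product (x, y) = sum_i x_i * conj(y_i), i.e. y^H x."""
    return sum(complex(xi) * complex(yi).conjugate() for xi, yi in zip(x, y))

def norm2(x):
    """Euclidean norm ||x||_2 = (x, x)^(1/2); (x, x) is real and nonnegative."""
    return math.sqrt(inner(x, x).real)

# Hypothetical sample data, chosen only for illustration.
x = [3, 4j]
y = [1 + 1j, 2]
alpha = 2 - 1j
tol = 1e-12  # slack for floating-point rounding

positivity = norm2(x) >= 0
definiteness = norm2([0, 0]) == 0 and norm2(x) > 0
homogeneity = abs(norm2([alpha * xi for xi in x]) - abs(alpha) * norm2(x)) < tol
triangle = norm2([xi + yi for xi, yi in zip(x, y)]) <= norm2(x) + norm2(y) + tol

print(norm2(x))  # prints 5.0, since |3|^2 + |4i|^2 = 25
print(positivity, definiteness, homogeneity, triangle)
```

Note that `abs(alpha)` is the complex modulus, which is what property 3 requires when the field is $\mathbb{C}$.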
We denote such a function $f$ with a double bar notation: $f(x) = \|x\|$. Subscripts on the double bar are used to distinguish between various norms.

The p-Norms (p. 52)

The norm $\|\cdot\|_2$ is the most common vector norm in linear algebra. It is a special case of the $p$-norms, $p \ge 1$, defined by

$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} = \left( |x_1|^p + |x_2|^p + \cdots + |x_n|^p \right)^{1/p}.$$

Of these, the 1, 2, and $\infty$ norms are the most important in practice:

$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|,$$
$$\|x\|_2 = \left( |x_1|^2 + |x_2|^2 + \cdots + |x_n|^2 \right)^{1/2} = (x^H x)^{1/2},$$
$$\|x\|_\infty = \max(|x_1|, |x_2|, \ldots, |x_n|) = \max_{1 \le i \le n} |x_i|.$$

Note. The infinity norm is the limit of the $p$-norm as $p$ tends to infinity, i.e.

$$\lim_{p \to \infty} \|x\|_p = \max_{1 \le i \le n} |x_i| = \|x\|_\infty.$$

Hölder Inequality (p. 53)

A classic result concerning $p$-norms is the Hölder inequality:

$$|x^T y| \le \|x\|_p \, \|y\|_q, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1.$$

The numbers $p$ and $q$ above are said to be Hölder conjugates of each other. A very important special case of the Hölder inequality ($p = q = 2$) is the Cauchy-Schwarz inequality:

$$|x^T y| \le \|x\|_2 \, \|y\|_2.$$

Norm Equivalence (p. 53)

All norms on $\mathbb{R}^n$ are equivalent, i.e., if $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ are norms on $\mathbb{R}^n$, then there exist positive constants $c_1$ and $c_2$ such that

$$c_1 \|x\|_\alpha \le \|x\|_\beta \le c_2 \|x\|_\alpha \quad \text{for all } x \in \mathbb{R}^n.$$

Norm Equivalence

If $x \in \mathbb{R}^n$, then

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n} \, \|x\|_2,$$
$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n} \, \|x\|_\infty,$$
$$\|x\|_\infty \le \|x\|_1 \le n \, \|x\|_\infty.$$

Unit Circles

What are the unit circles $C_p = \{x \text{ such that } \|x\|_p = 1\}$ associated with the norm $\|\cdot\|_p$ for $p = 1, 2, \infty$, in $\mathbb{R}^2$?

[Figure: plots in the X-Y plane, both axes ranging from -1.5 to 1.5.]

Convergence of Vectors (p. 54)

We say that a sequence $\{x^{(k)}\}$, $k = 1, 2, \ldots$, of $n$-vectors converges to a vector $x$ with respect to the norm $\|\cdot\|$ if

$$\lim_{k \to \infty} \|x^{(k)} - x\| = 0.$$

Note that because of the equivalence of norms, convergence in the $\alpha$-norm implies convergence in the $\beta$-norm and vice versa. We will use the notation

$$\lim_{k \to \infty} x^{(k)} = x.$$

Convergence of Vectors (p. 54)

The sequence $\{x^{(k)}\}$ of 3-vectors

$$\left\{ x^{(k)} \right\} = \left\{ \begin{pmatrix} 1 + 1/k \\ k/(k+2) \\ 1/k \end{pmatrix} \right\} = \begin{pmatrix} 2 \\ 1/3 \\ 1 \end{pmatrix}, \begin{pmatrix} 3/2 \\ 2/4 \\ 1/2 \end{pmatrix}, \begin{pmatrix} 4/3 \\ 3/5 \\ 1/3 \end{pmatrix}, \begin{pmatrix} 5/4 \\ 4/6 \\ 1/4 \end{pmatrix}, \cdots$$

converges to

$$x = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$

Note: Convergence of $x^{(k)}$ to $x$ is the same as the convergence of each individual component $x_i^{(k)}$ of $x^{(k)}$ to the corresponding component $x_i$ of $x$.

Matrix Norm (p. 55)

Since $\mathbb{R}^{m \times n}$ is isomorphic to $\mathbb{R}^{mn}$, i.e. we can consider $m \times n$ matrices as vectors in $\mathbb{R}^{mn}$, the definition of a matrix norm should be equivalent to the definition of a vector norm. In particular, $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a matrix norm if the following four properties hold:
1. $f(A) \ge 0$ for all $A \in \mathbb{R}^{m \times n}$.
2. $f(A) = 0$ if and only if $A = 0$.
3. $f(\alpha A) = |\alpha| f(A)$ for all $\alpha \in \mathbb{R}$ and for all $A \in \mathbb{R}^{m \times n}$.
4. $f(A + B) \le f(A) + f(B)$ for all $A, B \in \mathbb{R}^{m \times n}$.

Note. As with vector norms, we use a double bar notation with subscripts to designate matrix norms, i.e., $f(A) = \|A\|$.

Frobenius Norm (p. 55)

One of the most frequently used matrix norms in numerical linear algebra is the Frobenius norm (also called the Euclidean norm or Schur norm). This norm is the same as the 2-norm of the column vector in $\mathbb{R}^{mn}$ consisting of all the columns (respectively rows) of $A$, i.e.

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}.$$

Submultiplicativity Property of Matrix Norms (p. 55)

A matrix norm $\|\cdot\|$ satisfies the submultiplicativity property (or consistency) if

$$\|AB\| \le \|A\| \, \|B\|.$$

The Frobenius norm is submultiplicative. (Hint: use Cauchy-Schwarz.) However, the Frobenius norm of the identity matrix is not one for $n > 1$:

$$\|I_n\|_F = \sqrt{n} \ne 1.$$

Vector Norms in $\mathbb{R}^{m \times n}$ and Submultiplicativity

In general, vector norms on $\mathbb{R}^{mn}$, used as matrix norms, are not submultiplicative. For example, consider $\|A\|_\Delta = \max_{i,j} |a_{ij}|$, which is the infinity norm $\|\cdot\|_\infty$ in $\mathbb{R}^{mn}$, and let

$$A = B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad \text{so that} \quad AB = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix};$$

then

$$2 = \|AB\|_\Delta > \|A\|_\Delta \, \|B\|_\Delta = 1 \cdot 1 = 1.$$

Induced Matrix Norm by a Vector Norm

Let $\|\cdot\|$ be a vector norm on $\mathbb{R}^n$. From this vector norm, we can define a corresponding matrix norm as follows:

$$\|A\| = \sup_{\substack{x \in \mathbb{R}^n \\ x \ne 0}} \frac{\|Ax\|}{\|x\|}.$$

These norms satisfy the usual properties of vector norms. They are submultiplicative (or consistent), and the identity matrix satisfies $\|I_n\| = 1$.

Again, the important cases are the vector norms $\|\cdot\|_p$ with $p = 1, 2$, and $\infty$:

$$\|A\|_p = \max_{\substack{x \in \mathbb{R}^n \\ x \ne 0}} \frac{\|Ax\|_p}{\|x\|_p}.$$

The matrix norm $\|\cdot\|_p$ is said to be induced by, or subordinate to, the vector norm $\|\cdot\|_p$.

Submultiplicativity of Matrix Norms

A consequence of the submultiplicativity of matrix norms is

$$\|A^k\|_p \le \|A\|_p^k.$$

In particular, if $\|A\|_p < 1$ then $A^k \to 0$. This leads to the following theorem. Let $A \in \mathbb{R}^{n \times n}$. The following four conditions are equivalent:
1. $\|A\| < 1$ for some induced matrix norm $\|\cdot\|$.
2. $\lim_{k \to \infty} A^k = 0$.
3. $\lim_{k \to \infty} A^k x = 0$ for all $x \in \mathbb{R}^n$.
4. $\rho(A) < 1$, where $\rho(A)$ is the spectral radius defined below.

Note. Having $\|A\|_p < 1$ for some $p$ is sufficient for conditions 2 to 4 but not necessary. For $A = \begin{pmatrix} 0.5 & 10 \\ 0 & 0.5 \end{pmatrix}$ we have $\rho(A) = 0.5$, while $\|A\|_p \ge \max_{i,j} |a_{ij}| = 10$ for every $p$.

Maximum on Unit Vectors (p. 55)

It is clear that $\|A\|_p$ is the $p$-norm of the largest vector obtained by applying $A$ to a vector of unit $p$-norm:

$$\|A\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{x \ne 0} \frac{\|Ax\|_p \, \|x\|_p^{-1}}{\|x\|_p \, \|x\|_p^{-1}} = \max_{x \ne 0} \left\| A \frac{x}{\|x\|_p} \right\|_p = \max_{\|x\|_p = 1} \|Ax\|_p.$$

Convenient Expression for $\|\cdot\|_1$ and $\|\cdot\|_\infty$

For $A \in \mathbb{R}^{m \times n}$,

$$\|A\|_1 = \max_{\|x\|_1 = 1} \|Ax\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}|,$$
$$\|A\|_\infty = \max_{\|x\|_\infty = 1} \|Ax\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}|.$$

The 1-norm of a matrix is the maximum absolute column sum and the $\infty$-norm is the maximum absolute row sum.

Spectral Radius of a Matrix

If $A \in \mathbb{R}^{n \times n}$, then the spectral radius of $A$, denoted by $\rho(A)$, is given by

$$\rho(A) = \max_{1 \le i \le n} |\lambda_i(A)|,$$

where $\lambda_i(A)$, $i = 1, 2, \ldots, n$, are the eigenvalues of $A$.

Properties of the Spectral Radius

$\rho(A)$ is not a norm. Indeed, for

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

we have $\rho(A) = 0$ while $A \ne 0$. Also, the triangle inequality is not satisfied for the pair $A$ and $B = A^T$: $\rho(A) = \rho(B) = 0$, while $\rho(A + B) = 1$.

Another property is $\rho(A) \le \|A\|$ for any submultiplicative matrix norm.

$\|\cdot\|_2$ and Spectral Radius

$$\|A\|_2 = \sqrt{\rho(A^H A)} = \sqrt{\rho(A A^H)}.$$

Recall that $\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} \lambda_i(A)$. Then

$$\|A\|_F = \sqrt{\operatorname{tr}(A^H A)} = \sqrt{\operatorname{tr}(A A^H)}.$$

Equivalence of Matrix Norms (p. 56)

The Frobenius and $p$-norms (especially $p = 1, 2, \infty$) satisfy certain inequalities that are frequently used in the analysis of matrix computations. For $A \in \mathbb{R}^{m \times n}$ we have

$$\|A\|_2 \le \|A\|_F \le \sqrt{n} \, \|A\|_2,$$
$$\max_{i,j} |a_{ij}| \le \|A\|_2 \le \sqrt{mn} \, \max_{i,j} |a_{ij}|,$$
$$\frac{1}{\sqrt{n}} \|A\|_\infty \le \|A\|_2 \le \sqrt{m} \, \|A\|_\infty,$$
$$\frac{1}{\sqrt{m}} \|A\|_1 \le \|A\|_2 \le \sqrt{n} \, \|A\|_1.$$
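The column-sum and row-sum expressions, the spectral-radius formula for the 2-norm, and the equivalence inequalities can be tried out on a small matrix. The sketch below uses only the Python standard library and is not part of the original slides. All function names, the sample matrix, and the slack `eps` are illustrative assumptions. The 2-norm is estimated by power iteration on $A^T A$, a technique chosen here for simplicity; it assumes a real matrix and a starting vector that is not orthogonal to the dominant eigenvector.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def norm_1(A):
    """Maximum absolute column sum."""
    return max(sum(abs(a) for a in col) for col in transpose(A))

def norm_inf(A):
    """Maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def norm_fro(A):
    """Square root of the sum of squared absolute entries."""
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

def norm_max(A):
    """Largest absolute entry (the norm written with a Delta subscript)."""
    return max(abs(a) for row in A for a in row)

def norm_2(A, iters=200):
    """Estimate sqrt(rho(A^T A)) by power iteration on G = A^T A (real A)."""
    G = matmul(transpose(A), A)
    n = len(G)
    v = [1.0 / math.sqrt(n)] * n  # unit starting vector (an assumption)
    lam = 0.0
    for _ in range(iters):
        w = [sum(g * vj for g, vj in zip(row, v)) for row in G]
        lam = math.sqrt(sum(wi * wi for wi in w))  # ||G v||_2 for unit v
        if lam == 0.0:
            return 0.0
        v = [wi / lam for wi in w]
    return math.sqrt(lam)

# Hypothetical sample matrix with m = 3 rows and n = 2 columns.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
m, n = len(A), len(A[0])
n1, ninf, n2 = norm_1(A), norm_inf(A), norm_2(A)
nf, nmax = norm_fro(A), norm_max(A)
eps = 1e-9  # slack for the iterative 2-norm estimate

checks = [
    n2 <= nf + eps and nf <= math.sqrt(n) * n2 + eps,
    nmax <= n2 + eps and n2 <= math.sqrt(m * n) * nmax + eps,
    ninf / math.sqrt(n) <= n2 + eps and n2 <= math.sqrt(m) * ninf + eps,
    n1 / math.sqrt(m) <= n2 + eps and n2 <= math.sqrt(n) * n1 + eps,
]
print(n1, ninf, nf, n2)
print(all(checks))

# The max-entry norm is not submultiplicative: 2 > 1 * 1.
B = [[1, 1], [1, 1]]
print(norm_max(matmul(B, B)), norm_max(B) * norm_max(B))  # prints 2 1
```

For the sample matrix, the column sums are 9 and 12 and the row sums are 3, 7, and 11, so the 1-norm is 12 and the $\infty$-norm is 11. In practice a library routine based on the singular value decomposition would normally replace the power iteration.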