
MATH08007
Linear Algebra
S2, 2011/12
Lecture 1
0. Prelude - What is Linear Algebra?
The key idea that underlies the course is the notion of LINEARITY. Roughly speaking, a
property is linear if, whenever objects X and Y have the property, so do X + Y and λX, where
λ is a real or complex scalar. For this to make sense, we need to be able to add our objects X
and Y and multiply them by scalars.
The first key idea we shall meet is that of sets with a linear structure, that is sets in which there
are natural definitions of addition and scalar multiplication. Simple examples are
(a) R3 (3-dimensional space) with
(x1 , x2 , x3 ) + (y1 , y2 , y3 ) = (x1 + y1 , x2 + y2 , x3 + y3 )
and
λ(x1 , x2 , x3 ) = (λx1 , λx2 , λx3 )
for (x1 , x2 , x3 ), (y1 , y2 , y3 ) ∈ R3 and λ ∈ R.
(b) The set CR [a, b] of all continuous functions f : [a, b] → R, with addition f + g and scalar
multiplication λf (λ ∈ R) given by
(f + g)(t) = f (t) + g(t) and (λf )(t) = λf (t)
for a ≤ t ≤ b.
These two examples, though different in concrete terms, have similar ‘linear’ structures; they
are in fact both examples of vector spaces (the precise definition will be given in Chapter 1).
Linear algebra is the study of linear structures (vector spaces) and certain mappings between
them (linear mappings). To illustrate, consider the following simple examples.
(a) A 2 × 2 real matrix
        ( a  b )
    A = ( c  d )
defines a mapping R2 → R2 given by
    T : ( x1 )  →  ( a  b ) ( x1 )  =  ( ax1 + bx2 )
        ( x2 )     ( c  d ) ( x2 )     ( cx1 + dx2 ) .
This has the property that T (x + y) = T x + T y and T (λx) = λ(T x); in other words, T
respects the linear structure in R2 .
(b) For f ∈ CR [a, b], put
T f = ∫_a^b f (t) dt.
Here T maps CR [a, b] to R and satisfies
T (f + g) = T f + T g and T (λf ) = λT f
so that, as in (a), T respects the linear structures in CR [a, b] and R.
Such mappings T that respect or preserve the linear structure are called linear mappings.
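To see the linearity in example (a) concretely, here is a minimal numerical sketch (assuming Python with NumPy is available; the particular matrix is an arbitrary choice, not one from the notes):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # an arbitrary 2 x 2 real matrix
T = lambda v: A @ v             # the mapping x -> Ax

x = np.array([1.0, -1.0])
y = np.array([0.5, 2.0])
lam = 3.0

# T respects the linear structure: T(x + y) = Tx + Ty and T(lam x) = lam Tx.
print(np.allclose(T(x + y), T(x) + T(y)))    # True
print(np.allclose(T(lam * x), lam * T(x)))   # True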
1. Vector spaces
Throughout, R and C denote, respectively, the real and complex numbers, referred to as scalars.
F will denote either R or C when it is unimportant which set (field) of scalars we are using.
Fn is the set of all n-tuples of elements of F and we write a typical element x of Fn either as a
column (vector)
        ( x1 )
        ( x2 )
    x = (  ⋮ )
        ( xn )
or as a row (vector) x = (x1 , x2 , . . . , xn ). In Fn , addition and scalar multiplication are defined
coordinatewise by
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )
and
λ(x1 , x2 , . . . , xn ) = (λx1 , λx2 , . . . , λxn ).
This is the basic example of a vector space over F.
The formal definition of a vector space V over F is as follows. V is a non-empty set with the
following structure.
• Given elements x, y ∈ V , we can add them to obtain an element x + y of V .
• Given x ∈ V and λ ∈ F, we can form the product λx, again an element of V .
• We need these (usually natural) operations to obey the familiar algebraic laws of arithmetic:
(a) x + y = y + x for all x, y ∈ V (addition is commutative);
(b) x + (y + z) = (x + y) + z for all x, y, z ∈ V (addition is associative);
(c) there exists an element 0 ∈ V such that 0 + x = x (= x + 0) for all x ∈ V ;
(d) given x ∈ V , there is an element −x ∈ V such that x + (−x) = 0 (= (−x) + x)
and
(e) λ(x + y) = λx + λy;
(f) (λ + µ)x = λx + µx;
(g) (λµ)x = λ(µx)
for all x, y ∈ V and all λ, µ ∈ F; and finally
(h) 1.x = x for all x ∈ V .
[(a)-(d) say that V is an abelian group under +; (e)-(g) are the natural distributive laws linking
addition and scalar multiplication; (h) is a natural normalization for scalar multiplication.]
We call V a real (resp. complex) vector space when F = R (resp. C).
The examples we shall encounter in the course will comprise either sets of points in Fn , with
algebraic operations defined coordinatewise, or sets of scalar valued functions defined on a set,
with algebraic operations defined pointwise on the set. In both cases, properties (a),(b), (e),
(f), (g) and (h) hold automatically. Also, 0 = 0.x (where the 0 on the l.h.s. is the zero element
of V and the 0 on the r.h.s. is the scalar 0) and −x = (−1).x for every x ∈ V . The crucial
properties to check are
• If x, y ∈ V , is x + y in V ?
• If x ∈ V and λ ∈ F, is λx in V ?
When these two properties hold, we say that V is closed under addition and scalar multiplication.
Examples
1. Fn with addition and scalar multiplication defined coordinatewise as above. Here (writing
elements as row vectors), 0 = (0, 0, . . . , 0) and −(x1 , x2 , . . . , xn ) = (−x1 , −x2 , . . . , −xn ).
2. The set Mm,n (F) of all m × n matrices with entries in F and with addition and scalar
multiplication defined entrywise. Here 0 in Mm,n (F) is just the zero matrix (all entries 0)
and −(ai,j ) = (−ai,j ).
3. The set Pn of all polynomials in a variable t with real coefficients and degree less than or
equal to n and with addition and scalar multiplication defined pointwise in the variable t.
This is a real vector space. (Note that the set of such polynomials of exact degree n is not
a vector space.)
Lecture 2
1. Vector spaces contd.
Subspaces
Let V be a vector space over F and let U ⊆ V .
Definition We say that U is a subspace of V if, with addition and scalar multiplication inherited
from V , U is a vector space in its own right.
(1.1) Proposition Let U be a subset of a vector space V over F. Then U is a subspace of V if
and only if
(a) 0 ∈ U ;
(b) x + y ∈ U for all x, y ∈ U ;
(c) λx ∈ U for all x ∈ U and all λ ∈ F.
Comments
1. The zero element of a subspace of V is the same as the zero element of V .
2. We could replace (a) by just saying that U ≠ ∅.
3. V is a subspace of itself; {0} is also a subspace of V .
Examples
1. {p ∈ Pn : p(0) = 0} is a subspace of Pn .
2. {x ∈ Rn : x1 ≥ 0} is not a subspace of Rn .
3. The subspaces of R2 are just R2 , {0}, and straight lines through 0. Likewise, the subspaces
of R3 are just R3 , {0}, and straight lines or planes through 0.
Spans
Definitions Let V be a vector space over F.
(i) Given vectors v1 , v2 , . . . , vk in V , we call a vector u of the form
u = λ 1 v1 + λ 2 v2 + · · · + λ k vk
for some λ1 , λ2 , . . . , λk ∈ F a linear combination of v1 , . . . , vk .
(ii) Let S be a non-empty subset of V . Then the span (or linear span) of S, denoted by span S,
is the set of all linear combinations of elements of S. Thus
span S = {λ1 v1 + · · · + λk vk : λj ∈ F, vj ∈ S}.
Notes
1. An alternative notation to span S is lin S.
2. k here may be different for different elements of span S.
3. S may be infinite but we only consider finite linear combinations.
4. If S is finite, say S = {v1 , . . . , vm }, then
span S = {λ1 v1 + · · · + λm vm : λj ∈ F}.
(1.2) Theorem Let S be a non-empty subset of a vector space V . Then
(i) span S is a subspace of V ;
(ii) if U is a subspace of V and S ⊆ U , then span S ⊆ U .
[Thus span S is the smallest subspace of V containing S.]
Examples
1. In R2 , the span of a single non-zero vector v = (a, b) is just the straight line through the
point (a, b) and the origin. Similarly, in R3 , the span of a single non-zero vector v = (a, b, c)
is just the straight line through the point (a, b, c) and the origin.
Lecture 3
1. Vector spaces contd.
Examples of spans contd.
2. Let v1 = (1, 1, 2) and v2 = (−1, 3, 1) in R3 . Does (1,0,1) belong to span {v1 , v2 }? Try to
write (1, 0, 1) as λv1 + µv2 for some λ, µ ∈ R; that is, write
(1, 0, 1) = λ(1, 1, 2) + µ(−1, 3, 1).
Equating coordinates, this gives
λ − µ = 1,
λ + 3µ = 0,
2λ + µ = 1.
The first two equations give λ = 3/4, µ = −1/4, but then 2λ + µ ≠ 1 so the third equation
is not satisfied. Thus no such λ, µ exist and (1, 0, 1) does not belong to span {v1 , v2 }.
3. Let P denote the vector space of all real polynomials of arbitrary degree in a variable t,
with the natural definitions of addition and scalar multiplication. Then
span {1, t, t2 , . . . , tn } = Pn ,
the vector space of all such polynomials with degree less than or equal to n.
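Example 2 above can also be checked numerically: (1, 0, 1) lies in span {v1 , v2 } exactly when adjoining it as an extra column does not increase the rank of the matrix with columns v1 , v2 . A minimal sketch (assuming NumPy):

import numpy as np

v1 = np.array([1.0, 1.0, 2.0])
v2 = np.array([-1.0, 3.0, 1.0])
b  = np.array([1.0, 0.0, 1.0])

M = np.column_stack([v1, v2])
print(np.linalg.matrix_rank(M))                          # 2
print(np.linalg.matrix_rank(np.column_stack([M, b])))    # 3 -> the rank increases,
# so (1, 0, 1) does not belong to span{v1, v2}, as found above.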
Spanning sets
Definition A non-empty subset S of a vector space V is a spanning set for V (or spans V ) if
span S = V .
Examples
1. Let v1 = (1, 0, 0), v2 = (0, 1, 0), v3 = (1, 1, 1) in R3 . Then
(x1 , x2 , x3 ) = (x1 − x3 )v1 + (x2 − x3 )v2 + x3 v3
and so {v1 , v2 , v3 } spans R3 .
2. Let v1 = (1, 0), v2 = (0, 2), v3 = (3, 1) in R2 . Then
(x1 , x2 ) = x1 v1 + (x2 /2)v2 + 0.v3
or
(x1 , x2 ) = (x1 /2) v1 + (x2 /2 − x1 /12) v2 + (x1 /6) v3 .
So span {v1 , v2 , v3 } = R2 here.
Note
In Example 1 above, the expression for (x1 , x2 , x3 ) as a linear combination of {v1 , v2 , v3 } is
unique. However, in Example 2, the expression for (x1 , x2 ) as a linear combination of {v1 , v2 , v3 }
is not unique. Indeed, we have
span {v1 , v2 } = R2
and so v3 is not needed to span R2 . This redundancy in a spanning set will lead to the notion
of a basis of a vector space.
Linear dependence and independence
Definition A set of vectors {v1 , v2 , . . . , vk } in a vector space V over F is said to be linearly
dependent if there exist λ1 , λ2 , . . . , λk in F, not all zero, such that
λ1 v1 + λ2 v2 + · · · + λk vk = 0
(i.e. there is a non-trivial linear combination of the vj ’s equal to zero).
The set {v1 , v2 , . . . , vk } is linearly independent if it is not linearly dependent.
(We tacitly assume here that the vectors v1 , v2 , . . . , vk are distinct.)
Comments
1. If some vj = 0, then {v1 , . . . , vk } is linearly dependent since in that case we have
0.v1 + · · · + 0.vj−1 + 1.vj + 0.vj+1 + · · · + 0.vk = 0.
2. The definition of linear independence can be reformulated as: {v1 , . . . , vk } is linearly independent if
λ1 v1 + λ2 v2 + · · · + λk vk = 0 ⇒ λ1 = λ2 = · · · = λk = 0.
3. If v ≠ 0 then the set {v} is linearly independent.
(1.3) Proposition The vectors v1 , . . . , vk are linearly dependent if and only if one of them is a
linear combination of the others.
Examples
1. Let v1 = (1, 2, −1), v2 = (0, 1, 2), v3 = (−1, 1, 7) in R3 . Are the vectors v1 , v2 , v3 linearly
dependent or linearly independent?
Consider λ1 v1 + λ2 v2 + λ3 v3 = 0. Equating coordinates leads to
λ1 − λ3 = 0 ; 2λ1 + λ2 + λ3 = 0 ; −λ1 + 2λ2 + 7λ3 = 0 ;
and then λ1 = λ3 (from the first equation), 3λ1 +λ2 = 0 (from the second) and 6λ1 +2λ2 = 0
(from the third). So we get (taking λ1 = 1 for instance), the non-trivial solution
λ1 = λ3 = 1 and λ2 = −3. Hence
v1 − 3v2 + v3 = 0
and the vectors v1 , v2 , v3 are linearly dependent.
2. The vectors v1 = (1, 1, 1, −1) and v2 = (2, 1, 1, 1) in R4 are linearly independent since
neither is a scalar multiple of the other.
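Both examples can be verified numerically by comparing the rank of the matrix whose columns are the given vectors with the number of vectors (a sketch assuming NumPy):

import numpy as np

# Example 1: v1 - 3*v2 + v3 = 0, so the three columns have rank 2 < 3.
V = np.column_stack([[1, 2, -1], [0, 1, 2], [-1, 1, 7]]).astype(float)
print(np.linalg.matrix_rank(V))   # 2 -> linearly dependent

# Example 2: the two columns have full rank 2.
W = np.column_stack([[1, 1, 1, -1], [2, 1, 1, 1]]).astype(float)
print(np.linalg.matrix_rank(W))   # 2 -> linearly independent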
Comment
Although in this course we shall primarily be concerned with the notions of linear dependence
and independence for finite sets of vectors, these notions extend to sets that may be infinite.
Specifically, given an arbitrary (possibly infinite) set S in a vector space V over F, S is said to
be linearly dependent if, for some k ∈ N, there exist distinct v1 , . . . , vk in S and λ1 , . . . , λk in F,
with some λj ≠ 0, such that
λ1 v1 + · · · + λk vk = 0;
and S is linearly independent if it is not linearly dependent.
Bases and dimension
Definition A set of vectors B in a vector space V is called a basis of V if
(i) span B = V, and (ii) B is linearly independent.
Examples
1. Let v1 = (1, 0) and v2 = (0, 1) in R2 . Then B = {v1 , v2 } is a basis of R2 .
Lecture 4
1. Vector spaces contd.
Examples of bases contd.
2. Let B = {v1 , v2 } where v1 = (3, 1) and v2 = (1, −2) in R2 . Here, if we try to find λ1 and
λ2 such that
(x1 , x2 ) = λ1 (3, 1) + λ2 (1, −2)
we find (solving the two coordinate equations) that we get
(x1 , x2 ) = ((2x1 + x2 )/7) (3, 1) + ((x1 − 3x2 )/7) (1, −2),
giving span B = R2 . Also, {v1 , v2 } is linearly independent (neither vector is a scalar
multiple of the other). So B is a basis of R2 .
3. B = {1, t, t2 , . . . , tn } is a basis of Pn . It is clear that span B = Pn . For linear independence,
suppose that λ0 .1 + λ1 .t + · · · + λn .tn = 0. Differentiate k times (0 ≤ k ≤ n) and put t = 0
to get λk k! = 0. Hence λk = 0 for k = 0, . . . , n and B is linearly independent. Thus B is
a basis of Pn .
(1.4) Theorem Let B = {v1 , . . . , vn } in a vector space V over F. Then B is a basis of V if
and only if each element x ∈ V can be written uniquely as
x = λ1 v1 + · · · + λn vn ,
with λ1 , . . . , λn ∈ F.
Informally, the way to think of a basis B of V is that it has two properties:
(i) span B = V , so B is big enough to generate V ;
(ii) B is linearly independent, so we cannot omit any element of B and still have the span
equal to V (using Proposition (1.3)).
The aim now is to show that, if a particular basis of V has n elements, then every basis will
have exactly n elements and then we are able to define the dimension of V to be n. To do this,
we need the following result.
(1.5) Theorem (Exchange Theorem) Let {u1 , . . . , um } span V and let {v1 , . . . , vk } be a linearly
independent subset of V . Then
(i) k ≤ m;
(ii) V is spanned by v1 , . . . , vk and m − k of the u’s.
To prove this result, we use the following simple lemma.
(1.6) Lemma Let S ⊆ V and let u ∈ S. Suppose that u is a linear combination of the other
vectors in S, i.e. u ∈ span (S\{u}). Then
span S = span (S\{u}).
Corollary of (1.5) Let {v1 , . . . , vn } be a basis of V . Then every basis of V has exactly n
elements.
Lecture 5
1. Vector spaces contd.
Definition Let V be a vector space over F. If V has a basis with n elements, then we define the
dimension of V , denoted by dim V , to be n. [This is a well-defined concept by the Corollary to
Theorem (1.5).] Otherwise, we say that V is infinite dimensional.
Examples of bases and dimension
1. In Rn , let ej = (0, 0, . . . , 0, 1, 0, . . . , 0), where the 1 is in the j th coordinate. Then
{e1 , . . . , en } is a basis of Rn and dim Rn = n. This basis is often referred to as the
standard basis of Rn .
2. From an earlier example, put v1 = (3, 1) and v2 = (1, −2) in R2 . Then {v1 , v2 } spans R2
and {v1 , v2 } is clearly linearly independent since neither vector is a multiple of the other.
Thus {v1 , v2 } is also a basis of R2 . [Note the illustration of the Corollary of Theorem (1.5)
here.]
3. The set {1, t, t2 , . . . , tn } is a basis of Pn ; so dim Pn = n + 1.
4. The vector space, P , of real polynomials of arbitrary degree is infinite dimensional. [If it
had a basis {p1 , . . . , pn }, then every polynomial in P would have degree no more than the
maximum of the degrees of p1 , . . . , pn , contradicting the fact that there are polynomials of
arbitrarily large degree.]
5. Find a basis of
U = {x = (x1 , x2 , x3 , x4 ) ∈ R4 : x1 + x2 + x3 + x4 = 0}.
Putting x4 = −x1 − x2 − x3 leads to the vectors {(1, 0, 0, −1), (0, 1, 0, −1), (0, 0, 1, −1)}
spanning U and these vectors are easily checked to be linearly independent, so form a basis
of U ; dim U = 3.
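Example 5 can be cross-checked symbolically: U is the kernel of the 1 × 4 matrix (1 1 1 1), and a basis of that kernel can be computed directly (a sketch assuming SymPy; it returns a different, but equally valid, basis of U ):

from sympy import Matrix

A = Matrix([[1, 1, 1, 1]])    # U = {x in R^4 : A x = 0}
basis = A.nullspace()         # list of basis column vectors of ker A
print(len(basis))             # 3 = dim U
for v in basis:
    print(list(v))            # e.g. [-1, 1, 0, 0], [-1, 0, 1, 0], [-1, 0, 0, 1]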
Convention
If V = {0}, we say that the empty set ∅ is a basis of V and that dim V = 0.
The following results are simple consequences of the Exchange Theorem (1.5). The simple
lemma below (see Tut. Qn. 3.1) is useful in their proofs.
(1.7) Lemma Let {v1 , . . . , vn } be a linearly independent subset of a vector space V and let
v ∈ V . If v ∉ span{v1 , . . . , vn }, then {v, v1 , . . . , vn } is also linearly independent.
(1.8) Theorem Let dim V = n.
(i) A linearly independent subset of V has at most n elements; if it has n elements then it is a
basis of V .
(ii) A spanning subset of V has at least n elements; if it has n elements then it is a basis of V .
(iii) A subset of V containing at least n + 1 elements is linearly dependent; a subset of V with
at most n − 1 elements cannot span V .
(iv) Let U be a subspace of V . Then U is finite dimensional and dim U ≤ dim V . If dim U =
dim V , then U = V .
(1.9) Theorem (Extension to a basis) Let {v1 , . . . , vk } be a linearly independent subset of a finite
dimensional vector space V . Then there exists a basis of V containing {v1 , . . . , vk } (i.e. vectors
can be added to {v1 , . . . , vk } to obtain a basis).
Lecture 6
1. Vector spaces contd.
Example
Let v1 = (1, 0, 1, 0) and v2 = (1, 1, 0, 1). Extend {v1 , v2 } to obtain a basis of R4 . (The set {v1 , v2 }
is clearly linearly independent since both vectors in it are non-zero and v2 is not a multiple of v1 .)
To do this, apply the procedure detailed in the proof of the Exchange Theorem (1.5), starting
with the linearly independent set {v1 , v2 } and the spanning set comprising the standard basis
{e1 , e2 , e3 , e4 } of R4 . This will yield the basis {v1 , v2 , e3 , e4 } of R4 .
In a similar way, a spanning set can be reduced to a basis.
(1.10) Theorem Let S be a finite set that spans a vector space V . Then some subset of S
forms a basis of V and (hence) V is finite dimensional.
Comment
The proof of this result actually shows that, given a finite set S in a vector space, there is a
linearly independent subset S 0 of S with span S 0 = span S (so that S 0 is a basis of span S).
Example
Let v1 = (1, 0, 1), v2 = (0, 1, 1), v3 = (1, −1, 0) and v4 = (1, 0, 0). Find a linearly independent
subset of {v1 , v2 , v3 , v4 } with the same span. Note that {v1 , v2 } is linearly independent but
v3 = v1 − v2 . So span{v1 , v2 , v3 , v4 } = span{v1 , v2 , v4 }. Now check whether or not v4 is a linear
combination of v1 and v2 . Setting v4 = λ1 v1 + λ2 v2 leads (equating coordinates) to λ1 = 1,
λ2 = 0 and λ1 + λ2 = 0, so no such λ1 , λ2 exist and v4 is not a linear combination of v1 and v2 .
Thus {v1 , v2 , v4 } is linearly independent and has the same span as {v1 , v2 , v3 , v4 }. (This span
will be all of R3 since dim R3 = 3.)
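The procedure used here (keep a vector only if it enlarges the span built up so far) can be written as a short routine; the following is a sketch assuming NumPy:

import numpy as np

def independent_subset(vectors):
    """Return a maximal linearly independent sublist with the same span."""
    kept = []
    for v in vectors:
        cols = kept + [np.asarray(v, dtype=float)]
        # keep v only if it increases the rank, i.e. is not in the span of 'kept'
        if np.linalg.matrix_rank(np.column_stack(cols)) == len(cols):
            kept.append(cols[-1])
    return kept

vs = [(1, 0, 1), (0, 1, 1), (1, -1, 0), (1, 0, 0)]
print([list(v) for v in independent_subset(vs)])
# [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 0.0]]  i.e. v1, v2, v4 as above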
Sums of subspaces
Definition The sum U1 + U2 of two subspaces U1 , U2 of a vector space V is defined as
U1 + U2 = {u1 + u2 : u1 ∈ U1 and u2 ∈ U2 }.
Compare the following result with that of Theorem (1.2).
(1.11) Theorem U1 + U2 is a subspace of V containing U1 and U2 . Further, if W is a subspace
of V with U1 , U2 ⊆ W , then U1 + U2 ⊆ W . Hence U1 + U2 is the smallest subspace of V that
contains both U1 and U2 .
Examples
1. In R2 , let U1 = span{v1 } and U2 = span{v2 }, where v1 , v2 ≠ 0 and neither is a multiple
of the other. Then U1 + U2 = R2 . Note that each v ∈ R2 can be written uniquely as
v = u1 + u2 with u1 ∈ U1 and u2 ∈ U2 .
2. In P4 , let U1 be the subspace of all polynomials of degree less than or equal to 3 and
let U2 be the subspace of all even polynomials of degree less than or equal to 4. Then
U1 + U2 = P4 . Note that, here, the expression of a given p ∈ P4 as p = p1 + p2 , with
p1 ∈ U1 and p2 ∈ U2 , is not unique.
Direct sums
Suppose that U1 ∩ U2 = {0}. Then each u ∈ U1 + U2 can be expressed uniquely as u = u1 + u2 ,
with u1 ∈ U1 and u2 ∈ U2 . In this situation, we write U1 ⊕ U2 for U1 + U2 and call this the
direct sum of U1 and U2 .
Comment
In Example 1 above, R2 = U1 ⊕ U2 whilst, in Example 2, P4 is not the direct sum of U1 and U2 .
(1.12) Theorem (The dimension theorem) Let U1 and U2 be subspaces of a finite dimensional vector space. Then
(i) dim(U1 + U2 ) = dim U1 + dim U2 − dim(U1 ∩ U2 );
(ii) if U1 ∩ U2 = {0}, then dim(U1 ⊕ U2 ) = dim U1 + dim U2 .
Lecture 7
1. Vector spaces contd.
Comment
Compare the formula in Theorem (1.12)(i) with the formula
#(A ∪ B) = #(A) + #(B) − #(A ∩ B)
for finite sets A, B, where # denotes ‘number of elements in’.
Exercise
Verify the results of Theorem (1.12) in the cases of the Examples 1 and 2 of sums of subspaces
in Lecture 6.
Coordinates
Let B = {v1 , . . . , vn } be a basis of a vector space V . Then each v ∈ V can be written uniquely
as
v = λ1 v1 + · · · + λn vn
with the coefficients λj ∈ F. We call λ1 , . . . , λn the coordinates of v with respect to the basis B;
we also call
(a) the row vector (λ1 , . . . , λn ) the coordinate row vector of v w.r.t. B; and
(b) the column vector
        ( λ1 )
        (  ⋮ )
        ( λn )
the coordinate column vector of v w.r.t. B.
Note: The order in which the vectors in B are listed is important when we consider row or
column coordinate vectors.
Examples
1. Let B be the standard basis {e1 , . . . , en } in Fn (ej = (0, 0, . . . , 0, 1, 0, . . . , 0) with the 1 in
the j th place). If x = (x1 , . . . , xn ) ∈ Fn , then the column coordinate vector of x w.r.t. B is just
        ( x1 )
        (  ⋮ )
        ( xn ) .
2. In R2 , consider the basis B = {(1, 0), (2, 3)}. Then the coordinate column vector of (4, 9) ∈ R2 w.r.t. B is
        ( −2 )
        (  3 )
since (4, 9) = −2(1, 0) + 3(2, 3).
Change of basis
Let B1 = {v1 , . . . , vn } and B2 = {w1 , . . . , wn } be two bases of a vector space V . Express each
vector in B2 in terms of B1 thus:
(∗)
w1 = p11 v1 + p21 v2 + · · · + pn1 vn
w2 = p12 v1 + p22 v2 + · · · + pn2 vn
·········
wn = p1n v1 + p2n v2 + · · · + pnn vn .
Note that (∗) can be written as
wj = Σ_{i=1}^{n} pij vi
for j = 1, . . . , n. We call the matrix
    P = ( p11 p12 · · · p1n )
        ( p21 p22 · · · p2n )
        (  ⋮             ⋮ )
        ( pn1 pn2 · · · pnn )
the change of basis matrix from B1 to B2 . It is important to note that the columns of the matrix
P are the rows of the coefficients in (∗) (or, equivalently, the columns of P are the column
coordinate vectors of w1 , . . . , wn w.r.t. B1 ).
Note that, writing B1 and B2 as rows (v1 , . . . , vn ) and (w1 , . . . , wn ) respectively, (∗) can be
written as
(w1 , . . . , wn ) = (v1 , . . . , vn )P.
Example
Let B1 = {e1 , e2 , e3 } be the standard basis of R3 and let B2 = {w1 , w2 , w3 } be the basis given
by
w1 = (1, 0, 1) ; w2 = (1, 1, 0) ; w3 = (1, 1, 1).
Then
w1 = 1.e1 + 0.e2 + 1.e3
w2 = 1.e1 + 1.e2 + 0.e3
w3 = 1.e1 + 1.e2 + 1.e3
and so the change of basis matrix from B1 to B2 is
    P = ( 1 1 1 )
        ( 0 1 1 )
        ( 1 0 1 ) .
We also have
e1 = 1.w1 + 1.w2 − 1.w3
e2 = −1.w1 + 0.w2 + 1.w3
e3 = 0.w1 − 1.w2 + 1.w3 ,
so the change of basis matrix from B2 to B1 is
    Q = (  1 −1  0 )
        (  1  0 −1 )
        ( −1  1  1 ) .
Lecture 8
1. Vector spaces contd.
Comments
1. Notice that, in the example at the end of Lecture 7, the columns of P are just the elements
of B2 written as column vectors. This is true more generally.
Important fact
Let B1 = {e1 , . . . , en } be the standard basis of Fn and let B2 = {v1 , . . . , vn } be another
basis. Suppose that
v1 = (α11 , α21 , . . . , αn1 ) , v2 = (α12 , α22 , . . . , αn2 ) , . . . , vn = (α1n , α2n , . . . , αnn )
(that is, vj = (α1j , α2j , . . . , αnj ) for j = 1, . . . , n). Then the change of basis matrix from
B1 to B2 is the matrix
    ( α11 α12 · · · α1n )
    ( α21 α22 · · · α2n )
    (  ⋮             ⋮ )
    ( αn1 αn2 · · · αnn ) ,
the columns of which are just the vectors vj written as column vectors.
2. Note also that, in this same example, P Q = QP = I3 , the 3 × 3 identity matrix. Hence
the change of basis matrix P from B1 to B2 is invertible and P −1 = Q, the change of basis
matrix from B2 to B1 . This is true in general.
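For the example at the end of Lecture 7, Comment 2 can be confirmed numerically (a sketch assuming NumPy); the columns of P are the vectors of B2 written as columns, and Q is checked to be its inverse:

import numpy as np

P = np.array([[1, 1, 1],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)     # columns are w1, w2, w3
Q = np.array([[ 1, -1,  0],
              [ 1,  0, -1],
              [-1,  1,  1]], dtype=float)

print(np.allclose(P @ Q, np.eye(3)))       # True
print(np.allclose(Q, np.linalg.inv(P)))    # True, i.e. Q = P^(-1)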
(1.13) Theorem Let B1 = {v1 , . . . , vn } and B2 = {w1 , . . . , wn } be two bases of a vector space
V . Let P be the change of basis matrix from B1 to B2 and Q the change of basis matrix from
B2 to B1 . Then P is invertible and Q = P −1 .
The importance of a change of basis matrix is that it gives a way of relating the column
coordinates of a vector relative to two bases. Note that we can express the fact that
    ( λ1 )
    (  ⋮ )
    ( λn )
is the coordinate column vector of a vector v w.r.t. a basis {v1 , . . . , vn } by writing
(∗)
    v = (v1 , . . . , vn ) ( λ1 )
                           (  ⋮ )
                           ( λn ) .
Formally, (∗) denotes v = v1 λ1 + · · · + vn λn , but we interpret this as the usual expression
v = λ1 v1 + · · · + λn vn for v in terms of {v1 , . . . , vn }.
(1.14) Theorem Let B1 = {v1 , . . . , vn } and B2 = {w1 , . . . , wn } be two bases of a vector
space V . Let P be the change of basis matrix from B1 to B2 and Q the change of basis matrix
from B2 to B1 (so that Q = P −1 ). Let v ∈ V have coordinate column vector
    ( λ1 )
    (  ⋮ )
    ( λn )
w.r.t. B1 and coordinate column vector
    ( µ1 )
    (  ⋮ )
    ( µn )
w.r.t. B2 . Then
    ( µ1 )       ( λ1 )           ( λ1 )       ( µ1 )
    (  ⋮ )  =  Q (  ⋮ )    and    (  ⋮ )  =  P (  ⋮ ) .
    ( µn )       ( λn )           ( λn )       ( µn )
Important Comment
Note that, although P expresses B2 in terms of B1 , it is the inverse P −1 that is used to express
B2 -coordinates in terms of B1 -coordinates.
Examples
1. Let B1 be the standard basis of R2 and let B2 be the basis {(3, −1), (1, 1)}. The change
of basis matrix from B1 to B2 is
    P = (  3  1 )
        ( −1  1 )
and that from B2 to B1 is
    P −1 = (1/4) ( 1 −1 )
                 ( 1  3 ) .
[General fact: The 2 × 2 matrix
    ( a  b )
    ( c  d )
is non-singular (invertible) if and only if ad − bc ≠ 0 and in this case its inverse is
    1/(ad − bc) (  d  −b )
                ( −c   a ) .]
Thus the coordinates of (x1 , x2 ) ∈ R2 w.r.t. B2 are given by
    (1/4) ( 1 −1 ) ( x1 )   =   (1/4) ( x1 − x2  )
          ( 1  3 ) ( x2 )             ( x1 + 3x2 ) .
Thus (x1 , x2 ) = (1/4)(x1 − x2 )(3, −1) + (1/4)(x1 + 3x2 )(1, 1).
2. Let B1 be the standard basis of R3 and let B2 be the basis {w1 , w2 , w3 } where
w1 = (1, 0, 1), w2 = (1, 1, 0), w3 = (1, 1, 1). Then the change of basis matrix from B2 to B1 is
    (  1 −1  0 )
    (  1  0 −1 )
    ( −1  1  1 )
(see the example at the end of Lecture 7). Hence the coordinates of the vector (2, 3, 1) ∈ R3
w.r.t. B2 are given by
    (  1 −1  0 ) ( 2 )     ( −1 )
    (  1  0 −1 ) ( 3 )  =  (  1 )
    ( −1  1  1 ) ( 1 )     (  2 ) .
Thus (2, 3, 1) = −w1 + w2 + 2w3 .
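The coordinate calculation in Example 2 amounts to solving P µ = λ (equivalently, multiplying by Q = P −1 ); a sketch assuming NumPy:

import numpy as np

P = np.array([[1, 1, 1],
              [0, 1, 1],
              [1, 0, 1]], dtype=float)   # columns are w1, w2, w3
v = np.array([2.0, 3.0, 1.0])            # coordinates w.r.t. the standard basis

mu = np.linalg.solve(P, v)               # coordinates w.r.t. B2
print(mu)                                # [-1.  1.  2.]  i.e. (2, 3, 1) = -w1 + w2 + 2*w3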
Lecture 9
2. Linear Mappings
Let U and V be two vector spaces over F.
Definition A linear mapping (or linear transformation) from U to V is a map T : U → V such
that
(∗)
T (x + y) = T x + T y and T (λx) = λT x
for all x, y ∈ U and all λ ∈ F.
Notes
1. T is a function defined on U with values in V . We usually write T x rather than T (x) for
its value at x ∈ U .
2. Note that (∗) can be replaced by the single condition
T (λx + µy) = λT x + µT y
for all x, y ∈ U and all λ, µ ∈ F.
Examples
1. Let T : R3 → R2 be defined by
    T : ( x1 )      ( 1 −1 2 ) ( x1 )
        ( x2 )  →   ( 3  1 0 ) ( x2 )
        ( x3 )                 ( x3 ) ,
so that T (x1 , x2 , x3 ) = (x1 − x2 + 2x3 , 3x1 + x2 ). Then T is linear.
2. T : Pn → Pn+1 given by (T p)(t) = tp(t) is linear.
3. T : Pn → Pn given by (T p)(t) = p0 (t) + 2p(t) is linear.
4. The linear mappings Fn → Fm are precisely the mappings of the form
    T : ( x1 )       ( x1 )
        (  ⋮ )  →  A (  ⋮ )
        ( xn )       ( xn ) ,
where A is an m × n matrix (fixed for a given T ).
Technical terms
Let T : U → V be a linear mapping.
(a) If T u = v (u ∈ U, v ∈ V ), then v is the image of u under T .
(b) T is one-one (or injective) if, for u1 , u2 ∈ U , u1 6= u2 ⇒ T u1 6= T u2 (equivalently,
T u1 = T u2 ⇒ u1 = u2 ).
(c) T is onto V (or surjective) if, given v ∈ V , v = T u for some u ∈ U .
(d) The kernel of T is the set
ker T = {u ∈ U : T u = 0}.
(e) The image of T is the set
im T = {T u : u ∈ U };
im T is also called the range of T , sometimes denoted by ran T .
(2.1) Theorem Let T : U → V be linear. Then
(i) T 0 = 0;
(ii) ker T and im T are subspaces of U and V respectively;
(iii) T is one-one (or injective) if and only if ker T = {0}.
Example
Let T : R2 → R3 be given by
    T ( x1 )     (  1 2 ) ( x1 )
      ( x2 )  =  ( −1 0 ) ( x2 ) ,
                 (  1 1 )
so that T (x1 , x2 ) = (x1 + 2x2 , −x1 , x1 + x2 ). Then ker T = {0}, so that T is one-one, whilst
im T = span{(1, −1, 1) , (2, 0, 1)}.
Note
This example illustrates the important fact that, if T : Fn → Fm is a linear mapping defined by
    T ( x1 )       ( x1 )
      (  ⋮ )  =  A (  ⋮ )
      ( xn )       ( xn ) ,
where A is an m × n matrix, then im T equals the span (in Fm ) of the columns of A.
Lecture 10
2. Linear Mappings contd.
The matrix of a linear mapping
Let T : U → V be linear, and suppose also that B1 = {u1 , . . . , un } is a basis of U and that
B2 = {v1 , . . . , vm } is a basis of V . The linearity of T means that T is completely determined by
the values of T u1 , T u2 , . . . , T un . Suppose then that
(∗)
T u1 = a11 v1 + a21 v2 + · · · + am1 vm
T u2 = a12 v1 + a22 v2 + · · · + am2 vm
·········
T un = a1n v1 + a2n v2 + · · · + amn vm .
Note that (∗) can be written as
T uj = Σ_{i=1}^{m} aij vi
for j = 1, . . . , n. We call the matrix
    A = ( a11 a12 · · · a1n )
        ( a21 a22 · · · a2n )
        (  ⋮             ⋮ )
        ( am1 am2 · · · amn )
the matrix of T with respect to B1 and B2 .
Notes
1. The columns of A are coefficients in the rows of (∗).
2. (∗) can be written as
(T u1 , . . . , T un ) = (v1 , . . . , vm ) A.
3. A is an m × n matrix, where n = dim U and m = dim V .
4. If U = V and B1 = B2 = B, we just refer to A as the matrix of T w.r.t. B.
5. Let B1 and B2 be two bases of U , let T be the identity mapping T x = x (x ∈ U ) and
consider T as a linear mapping from U equipped with basis B1 to U equipped with basis
B2 . Then the matrix of T is the change of basis matrix from B2 to B1 .
Examples
1. Let T be the linear mapping Pn → Pn+1 defined by (T p)(t) = tp(t). Then the matrix of T
w.r.t. the bases {1, t, t2 , . . . , tn } and {1, t, t2 , . . . , tn+1 } of Pn and Pn+1 respectively is the
(n + 1) × n matrix
    ( 0 0 0 · · · 0 )
    ( 1 0 0 · · · 0 )
    ( 0 1 0 · · · 0 )
    ( 0 0 1 · · · 0 )
    ( ⋮           ⋮ )
    ( 0 0 · · · 0 1 ) .
For instance, the first two and final columns of this matrix come from noting that, since
T (1) = t, T (t) = t2 and T (tn ) = tn+1 , we have
T (1) = 0.1 + 1.t + 0.t2 + · · · + 0.tn + 0.tn+1
T (t) = 0.1 + 0.t + 1.t2 + · · · + 0.tn + 0.tn+1
T (tn ) = 0.1 + 0.t + 0.t2 + · · · + 0.tn + 1.tn+1 .
2. Let T : Fn → Fm be given by
    T : ( x1 )       ( x1 )
        (  ⋮ )  →  C (  ⋮ )
        ( xn )       ( xn ) ,
where C is an m × n matrix. Then C is the matrix of T w.r.t. the standard bases of Fn
and Fm .
3. Let T : R3 → R3 be defined by T (x1 , x2 , x3 ) = (x2 , x3 , x1 ). Consider the basis
B = {u1 , u2 , u3 } of R3 , where
u1 = (1, 0, 0) ; u2 = (1, 2, 3) ; u3 = (−1, 1, 1).
Then T u1 = (0, 0, 1) = λ1 u1 + λ2 u2 + λ3 u3 , where (after equating the three coordinates
and solving the resulting equations for λ1 , λ2 , λ3 ) λ1 = −3, λ2 = 1 and λ3 = −2. Hence
T u1 = −3u1 + 1.u2 − 2.u3 .
Similarly,
T u2 = (2, 3, 1) = 11u1 − 2u2 + 7u3
T u3 = (1, 1, −1) = 8u1 − 2u2 + 5u3 .
So the matrix of T w.r.t. B is
    ( −3 11  8 )
    (  1 −2 −2 )
    ( −2  7  5 ) .
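Each column of such a matrix is obtained by solving a small linear system (express T uj in terms of u1 , u2 , u3 ); the whole of Example 3 can be reproduced as follows (a sketch assuming NumPy):

import numpy as np

U = np.column_stack([[1, 0, 0], [1, 2, 3], [-1, 1, 1]]).astype(float)  # columns u1, u2, u3

def T(x):
    # T(x1, x2, x3) = (x2, x3, x1)
    return np.array([x[1], x[2], x[0]])

# Column j of the matrix of T w.r.t. B solves  U * (column j) = T(u_j).
cols = [np.linalg.solve(U, T(U[:, j])) for j in range(3)]
print(np.column_stack(cols))
# [[-3. 11.  8.]
#  [ 1. -2. -2.]
#  [-2.  7.  5.]]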
Coordinates
The matrix of a linear mapping is related to the corresponding mapping of coordinates as follows.
(2.2) Theorem Let T : U → V be linear and have matrix A = (aij ) w.r.t. bases
B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vm } of U and V respectively. Let u ∈ U have coordinate
column vector
    λ = ( λ1 )
        (  ⋮ )
        ( λn )
w.r.t. B1 . Then T u has coordinate column vector
    Aλ = ( a11 · · · a1n ) ( λ1 )
         (  ⋮          ⋮ ) (  ⋮ )
         ( am1 · · · amn ) ( λn )
w.r.t. B2 .
Example
Let T : U → V have matrix
    (  1  2 −1 )
    ( −1  3  6 )
w.r.t. bases {u1 , u2 , u3 } of U and {v1 , v2 } of V . Then T (3u1 − u2 + 2u3 ) has coordinate column
vector
    (  1  2 −1 ) (  3 )     ( −1 )
    ( −1  3  6 ) ( −1 )  =  (  6 )
                 (  2 )
w.r.t. {v1 , v2 }, so that T (3u1 − u2 + 2u3 ) = −v1 + 6v2 .
Change of bases
(2.3) Theorem Let T : U → V be linear, with matrix A w.r.t. bases B1 of U and B2 of V .
Let B10 be another basis of U and B20 another basis of V . Then the matrix C of T w.r.t. B10 and
B20 is given by
C = Q−1 AP,
where P is the change of basis matrix from B1 to B10 and Q is the change of basis matrix from
B2 to B20 .
Lecture 11
2. Linear Mappings contd.
The rank theorem
Let T : U → V be linear and recall that ker T = {u ∈ U : T u = 0} is a subspace of U , whilst
im T = {T u : u ∈ U } is a subspace of V . Suppose that U and V are finite dimensional.
Definitions
(i) The nullity of T , null(T ), is defined by
null(T ) = dim ker T.
(ii) The rank of T , rk(T ), is defined by
rk(T ) = dim im T.
[Alternative notation: n(T ) and r(T )]
(2.4) Theorem (The rank theorem) rk(T ) + null(T ) = dim U.
Example
Let T : R3 → R4 be defined by
T (x1 , x2 , x3 ) = (x1 − x2 , x2 − x3 , x3 − x1 , x1 + x2 − 2x3 ).
Then
x = (x1 , x2 , x3 ) ∈ ker T ⇔ x1 − x2 = x2 − x3 = x3 − x1 = x1 + x2 − 2x3 = 0
⇔ x1 = x2 = x3 ⇔ x = x1 (1, 1, 1);
thus ker T = span{(1, 1, 1)} and null(T ) = 1. Writing (x1 − x2 , x2 − x3 , x3 − x1 , x1 + x2 − 2x3 ) as
x1 (1, 0, −1, 1) + x2 (−1, 1, 0, 1) + x3 (0, −1, 1, −2),
we see that
im T = span{(1, 0, −1, 1) , (−1, 1, 0, 1) , (0, −1, 1, −2)}.
The first two of these vectors are clearly linearly independent but
(0, −1, 1, −2) = −(1, 0, −1, 1) − (−1, 1, 0, 1).
Hence
im T = span{(1, 0, −1, 1) , (−1, 1, 0, 1)},
{(1, 0, −1, 1) , (−1, 1, 0, 1)} is a basis of im T and rk(T ) = 2. Thus
rk(T ) + null(T ) = 2 + 1 = 3 = dim R3
as expected.
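The same numbers come directly from the matrix of T w.r.t. the standard bases (a sketch assuming NumPy):

import numpy as np

# Rows give the four output coordinates of T(x1, x2, x3).
A = np.array([[ 1, -1,  0],
              [ 0,  1, -1],
              [-1,  0,  1],
              [ 1,  1, -2]], dtype=float)

rank = np.linalg.matrix_rank(A)
print(rank, A.shape[1] - rank)   # 2 1  i.e. rk(T) = 2 and null(T) = dim R^3 - rk(T) = 1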
Note: dim V does not appear in the statement of the rank theorem; this is not surprising as the
space into which T is mapping could be adjusted (say, enlarged) without changing im T .
Corollary to the rank theorem Let T : U → V be linear, where U and V are finite dimensional. We have the following.
(i) rk(T ) ≤ dim U.
(ii) Suppose that dim U = dim V. Then T is one-one (i.e. is injective) if and only if it maps U
onto V (i.e. is surjective). In this case, T is bijective and the inverse mapping
T −1 : V → U is also linear.
Comments
1. It is clear that rk(T ) ≤ dim V . Result (i) is saying something less obvious.
2. When T is bijective as in (ii), we say that T is invertible or is a linear isomorphism.
Rank and nullity of a matrix
Let A be an m × n matrix. Then there is an associated linear mapping TA : Fn → Fm given by
    TA : ( x1 )       ( x1 )
         (  ⋮ )  →  A (  ⋮ )
         ( xn )       ( xn ) .
The rank of A, rk(A), is defined to be the rank of TA and the nullity of A, null(A), is defined to
be the nullity of TA .
Since im TA is the span in Fm of the columns of A, rk(A) is sometimes referred to as the column
rank of A. It equals the maximum number of linearly independent columns of A. One can also
consider the maximum number of linearly independent rows of A (equivalently, the dimension
of the span in Fn of the rows of A). This is called the row rank of A. It can be shown, though
this is not at all obvious, that the row and column ranks of a matrix are always equal (Tut. Qn.
7.1 will indicate how this is proved).
Notice also that, if A = (aij ), then null(A) is the maximum number of linearly independent
solutions of the system of linear equations
a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
·········
am1 x1 + am2 x2 + · · · + amn xn = 0 .
Sums and products of linear mappings
Let S, T : U → V be two linear mappings. Then we define S + T : U → V by
(S + T )u = Su + T u
(u ∈ U )
and, for λ ∈ F, we define λS : U → V by
(λS)u = λSu
(u ∈ U ).
It is easy to check that S + T and λS are linear mappings and that, if S has matrix A and T
has matrix B w.r.t. some bases of U and V , then S + T has matrix A + B and λS has matrix
λA w.r.t. the same bases.
Comment: Denote by L(U, V ) the set of all linear mappings from U into V . Then, with the
above definitions of addition and scalar multiplication, L(U, V ) is itself a vector space.
Suppose now that T : U → V and S : V → W are linear. We get the composition
R = S ◦ T : U → W , defined by
Ru = S(T u)
(u ∈ U ).
Again it is easy to check that R is linear. For linear mappings, we normally write such a
composition as R = ST and call R the product of S and T .
Example
Let T : R2 → R3 and S : R3 → R2 be defined by
T (x1 , x2 ) = (x1 + x2 , 0, x2 ) and S(x1 , x2 , x3 ) = (x2 , x3 − x1 ).
Then ST : R2 → R2 is given by
(ST )(x1 , x2 ) = S(x1 + x2 , 0, x2 ) = (0, −x1 ).
We also have T S : R3 → R3 given by
(T S)(x1 , x2 , x3 ) = T (x2 , x3 − x1 ) = (x2 + x3 − x1 , 0, x3 − x1 ).
Note: ST and T S are completely different mappings.
Lecture 12
2. Linear Mappings contd.
(2.5) Theorem Let T : U → V and S : V → W be linear mappings. Suppose that T has
matrix A w.r.t. bases B1 of U and B2 of V , and S has matrix C w.r.t. the bases B2 of V and
B3 of W . Then ST has matrix CA w.r.t. B1 and B3 .
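Theorem (2.5) can be checked directly on the mappings S and T of the example in Lecture 11 (a sketch assuming NumPy): the matrix of ST w.r.t. the standard bases is the product of the individual matrices.

import numpy as np

A = np.array([[1, 1],
              [0, 0],
              [0, 1]], dtype=float)       # matrix of T(x1, x2) = (x1 + x2, 0, x2)
C = np.array([[ 0, 1, 0],
              [-1, 0, 1]], dtype=float)   # matrix of S(x1, x2, x3) = (x2, x3 - x1)

print(C @ A)   # [[ 0.  0.]
               #  [-1.  0.]]   i.e. (ST)(x1, x2) = (0, -x1), as found above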
Notes
1. Given S, T ∈ L(U, U ), both products ST and T S are defined. However, when dim U > 1,
it may not be the case that ST equals T S. This is because, when n > 1, there always
exist n × n matrices A, B such that AB ≠ BA.
2. When T ∈ L(U, U ), we can form positive powers T n of T by taking n compositions of
T with itself. It is customary to define T 0 to be the identity mapping Iu = u on U .
Further, when T is invertible, negative powers of T are defined by setting T −n = (T −1 )n
for n = 1, 2, . . . . With these definitions, the expected index laws hold for the range of
indices for which they make sense. For instance,
(T n )(T m ) = T n+m
holds for general T when m, n ≥ 0 and holds for invertible T when m, n ∈ Z.
Singular and non-singular matrices
Let A be an n × n matrix with entries in F and let In be the n × n identity matrix (i.e. it has
1’s on the diagonal and 0’s elsewhere). The following is a consequence of the Corollary to the
rank theorem.
(2.6) Theorem The following statements are equivalent.
(i) There exists an n × n matrix B such that BA = In .
(ii) There exists an n × n matrix B such that AB = In .
(iii) The columns of A are linearly independent.
(iv) Ax = 0 ⇒ x = 0 [where x is the column vector with entries x1 , . . . , xn ].
Definition When (i)-(iv) hold, we say that A is non-singular; otherwise, A is said to be singular.
Thus A is singular precisely when its columns are linearly dependent. It can be shown that A
is singular if and only if det A = 0 (equivalently, A is non-singular if and only if det A ≠ 0).
Linear Equations
Let T : U → V , let v ∈ V be given, and suppose that we are interested in solving the equation
(†) T x = v;
i.e. we wish to find all x ∈ U such that T x = v. The basic simple result is the following.
(2.7) Theorem The equation (†) has a solution if and only if v ∈ im T. Assume that this is
the case and let x = x0 be some solution of (†). Then the general solution of (†) is
x = x0 + u,
where u ∈ ker T .
Comment
This result is illustrated, for instance, in the following.
1. Let A be an m × n matrix
of m linear equations Ax = b in
 and considerthe system

x1
b1
 .. 
 .. 
the variables x =  . , where b =  .  . This has a solution if and only if b
xn
bm
is in the span of the columns of A and, in that case, the general solution has the form
x = x0 + u, where x0 is a particular solution and u is the general solution of the associated
homogeneous equation Ax = 0.
2. Let L be a linear differential operator (e.g. a linear combination of derivatives dk x/dtk )
and consider the equation Lx = f , where f (t) is some given function. Then the general
solution of this equation has the form x(t) = x0 (t) + u(t), where x0 (t) is a solution of the
equation (called, in this context, a particular integral), and u(t) is the general solution of
the associated homogeneous equation Lx = 0 (called the complementary function).
Eigenvalues, eigenvectors and diagonalization
Let T : U → U be a linear mapping, where U is a vector space over F.
Definition A scalar λ ∈ F is an eigenvalue of T if there exists u ≠ 0 such that T u = λu. Such
a (non-zero) vector u is called an eigenvector associated with λ. The set
{u : T u = λu},
which equals ker(T − λI) where I is the identity operator Iu = u on U , is called the eigenspace
corresponding to λ. It is clearly a subspace of U (being the kernel of a linear mapping).
We also have the following related definition.
Definition We say that T is diagonalizable if U has a basis consisting of eigenvectors of T .
We then have the following simple result that explains the terminology.
(2.8) Theorem T is diagonalizable if and only if there exists a basis B = {u1 , . . . , un } of U
w.r.t. which the matrix of T is diagonal, i.e. has the form
    ( λ1  0 · · ·  0 )
    (  0 λ2 · · ·  0 )
    (  ⋮           ⋮ )
    (  0  0 · · · λn ) .
Suppose now that A is an n × n matrix, with an associated linear mapping TA : x → Ax from
Fn to itself. Then TA is diagonalizable if and only if there exists a non-singular n × n matrix
P such that P −1 AP is diagonal. [P will be the change of basis matrix from the standard basis
of Fn to the basis consisting of eigenvectors of TA .] In these circumstances, we say that A is
diagonalizable.
(2.9) Theorem Let λ1 , . . . , λn be distinct eigenvalues of T , with corresponding eigenvectors
u1 , . . . , un . Then {u1 , . . . , un } is linearly independent.
Corollary If T : U → U is linear, dim U = n and T has n distinct eigenvalues, then T is
diagonalizable.
Calculation of eigenvalues
Let T : U → U be linear, suppose that dim U = n and let T have matrix A w.r.t. some basis of
U . Then
λ is an eigenvalue of T ⇔ ker(T − λI) ≠ {0}
                        ⇔ T − λI is not invertible
                        ⇔ A − λIn is singular
                        ⇔ det(A − λIn ) = 0.
Here In denotes the n × n identity matrix.
Notice that det(A−λIn ) is a polynomial in λ of degree n. It is called the characteristic polynomial
of T ; its roots (in F) are precisely the eigenvalues of T . It is easy to check, using the change
of bases formula for the matrix of a linear mapping (Theorem (2.3)), that the characteristic
polynomial depends on T but not the particular matrix A representing T .
The underlying scalar field is important here. If F = C, then T will always have an eigenvalue,
since every complex polynomial has a root in C (the so-called Fundamental Theorem of Algebra).
However, when F = R, the characteristic polynomial may have no real roots if n is even, though
it will always have at least one real root if n is odd. Thus, a linear mapping on an even
dimensional real vector space might not have any eigenvalues, though such a mapping on an
odd dimensional real vector space will always have an eigenvalue.
Comment
Starting with an n × n matrix A with entries in F, the polynomial det(A − λIn ) is also referred
to as the characteristic polynomial of A. Its roots are the eigenvalues of the associated linear
mapping TA : x → Ax on Fn .
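As a numerical illustration (a sketch assuming NumPy; the matrix is an arbitrary example, not one from the notes), the eigenvalues are the roots of the characteristic polynomial, and a basis of eigenvectors diagonalizes the matrix:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# det(A - lambda*I) = lambda^2 - 4*lambda + 3, with roots 3 and 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                 # e.g. [3. 1.] (order may vary)

P = eigenvectors                   # columns are eigenvectors of A
print(np.linalg.inv(P) @ A @ P)    # approximately diag(3, 1)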
Lecture 13
3. Inner product spaces
We now extend the idea of an inner (or scalar) product in R2 and R3 to more general vector
spaces. In this context, it is necessary to distinguish between real and complex vector spaces.
Definition Let V be a vector space over F = R or C. An inner product on V is a map that
associates with each pair of vectors x, y ∈ V a scalar hx, yi ∈ F (more formally, x, y → hx, yi is
a function from V × V to F) such that, for all x, y and z ∈ V and all λ, µ ∈ F,
(i) hy, xi = hx, yi when F = R, whilst hy, xi is the complex conjugate of hx, yi when F = C;
(ii) hλx + µy, zi = λhx, zi + µhy, zi;
(iii) hx, xi ≥ 0, with equality if and only if x = 0.
A real (resp. complex) inner product space is a vector space over R (resp. over C) endowed with
an inner product.
Comments
1. It follows from (i) that, even when F = C, hx, xi ∈ R for all x ∈ V ; thus (iii) both makes
sense and strengthens this. Further, it is easy to see that (ii) implies that h0, 0i = 0, so
that, for (iii), we need hx, xi ≥ 0 and hx, xi = 0 ⇒ x = 0.
2. For a real inner product space V , properties (i) and (ii) together imply that
hx, λy + µzi = λhx, yi + µhx, zi
for all x, y, z ∈ V and all λ, µ ∈ R, and so hx, yi is linear in the second variable y as well
as in the first variable x. However, for a complex inner product space V ,
hx, λy + µzi = λ̄hx, yi + µ̄hx, zi
for all x, y, z ∈ V and all λ, µ ∈ C, where λ̄ and µ̄ denote the complex conjugates of λ and µ. In this case, we say that hx, yi is conjugate linear in
the second variable y (it is linear in the first variable x by (ii)).
Examples
1. In Rn , define
hx, yi = x1 y1 + x2 y2 + · · · + xn yn
for x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ). Then h·, ·i is an inner product, referred to as the
standard inner product on Rn .
2. On R2 , define hx, yi = x1 y1 + 4x2 y2 . Then h·, ·i is an inner product on R2 .
3. In Cn , define
hz, wi = z1 w̄1 + z2 w̄2 + · · · + zn w̄n
for z = (z1 , . . . , zn ) and w = (w1 , . . . , wn ), where w̄j denotes the complex conjugate of wj . Then h·, ·i is an inner product, referred to as
the standard inner product on Cn .
4. hz, wi = z1 w̄1 + 4z2 w̄2 defines an inner product on C2 .
5. On R2 , define hx, yi = 2x1 y1 + x1 y2 + x2 y1 + x2 y2 . Properties (i) and (ii) for an inner
product are easily seen to be satisfied. Also
hx, xi = 2x1² + 2x1 x2 + x2² = 2(x1 + x2 /2)² + x2²/2 ≥ 0,
with equality in the last line if and only if x1 = x2 = 0. Thus h·, ·i is an inner product on R2 .
6. Now define hx, yi = x1 y1 + 2x1 y2 + 2x2 y1 + x2 y2 on R2 . Again it is easy to see that
properties (i) and (ii) of an inner product are satisfied. However,
hx, xi = x1² + 4x1 x2 + x2² = (x1 + 2x2 )² − 3x2² < 0
provided x2 ≠ 0 and x1 = −2x2 . Thus h·, ·i is not an inner product on R2 in this case.
7. Fix a < b and, for p, q ∈ Pn , define
hp, qi = ∫_a^b p(t)q(t) dt.
Then h·, ·i is an inner product on Pn . Properties (i) and (ii) follow from the linearity of
the integral, whilst (iii) follows from the fact that, if ϕ : [a, b] → R is continuous, ϕ(t) ≥ 0
on [a, b] and ∫_a^b ϕ(t) dt = 0, then ϕ ≡ 0 on [a, b]. Thus, with this inner product, Pn is a
real inner product space.
8. Let V = CC [a, b] be the vector space of all continuous complex valued functions defined
on [a, b], where a < b. (Continuity for complex valued functions is defined as the natural
extension of the definition for real valued functions, using the modulus in C to measure
‘nearness’.) For f, g ∈ V , define
hf, gi = ∫_a^b f (t) ḡ(t) dt,
where ḡ(t) denotes the complex conjugate of g(t). (For a complex valued function ϕ on [a, b], the integral ∫_a^b ϕ(t) dt is defined to be
∫_a^b Re ϕ(t) dt + i ∫_a^b Im ϕ(t) dt.)
It is straightforward to check that h·, ·i satisfies (i) and (ii). Also, hf, f i = ∫_a^b |f (t)|² dt ≥ 0,
with equality if and only if f (t) ≡ 0 on [a, b] (using the continuity of the non-negative
function |f (t)|²). Hence h·, ·i also satisfies (iii) and so defines an inner product on the
complex vector space CC [a, b].
Lecture 14
3. Inner product spaces contd.
Norms
Suppose that V, h·, ·i is a real or complex inner product space. Then hx, xi ≥ 0 for all x ∈ V .
Definition The norm of x ∈ V , kxk, is defined as
kxk = hx, xi^(1/2) ,
where the exponent 1/2 denotes the non-negative square root.
(3.1) Theorem (The Cauchy-Schwarz inequality) We have
|hx, yi| ≤ kxkkyk
for all x, y ∈ V .
Applications of the Cauchy-Schwarz inequality
1. Let a1 , . . . , an , b1 , . . . , bn ∈ R. Then
|a1 b1 + · · · + an bn | ≤ (a1² + · · · + an²)^(1/2) (b1² + · · · + bn²)^(1/2) .
2. Let p, q be two real polynomials (thought of as in Pn for some n) and let a < b. Then,
applying the Cauchy-Schwarz inequality with the inner product hp, qi = ∫_a^b p(t)q(t) dt, we get
| ∫_a^b p(t)q(t) dt | ≤ ( ∫_a^b |p(t)|² dt )^(1/2) ( ∫_a^b |q(t)|² dt )^(1/2) .
(3.2) Theorem The norm k · k in an inner product space V satisfies
(i) kxk ≥ 0 with equality if and only if x = 0;
(ii) kλxk = |λ|kxk;
(iii) kx + yk ≤ kxk + kyk [the triangle inequality]
for all x, y ∈ V and all λ ∈ F.
Orthogonal and orthonormal sets
Definitions
(i) Two vectors x, y in an inner product space are orthogonal if hx, yi = 0. We write this as
x ⊥ y.
(ii) A set of vectors {u1 , . . . , uk } in an inner product space is
(a) orthogonal if hui , uj i = 0 when i ≠ j;
(b) orthonormal if hui , uj i = 1 when i = j and hui , uj i = 0 when i ≠ j.
Comments
1. If {u1 , . . . , uk } is orthonormal, then kuj k = 1 and so uj ≠ 0 for all j. If {u1 , . . . , uk } is
orthogonal and uj ≠ 0 for all j, then the set {uj /kuj k : j = 1, . . . , k} is orthonormal. We say
that {u1 , . . . , uk } has been normalized.
2. The term ‘orthonormal’ is often abbreviated to ‘o.n.’.
(3.3) Theorem An orthogonal set {u1 , . . . , uk } of non-zero vectors is linearly independent. In
particular, an o.n. set {u1 , . . . , uk } is linearly independent.
Suppose now that V is an inner product space and dim V = n. Let {u1 , . . . , un } be an orthogonal
set of non-zero vectors in V . Then {u1 , . . . , un } is a basis of V and so each vector x ∈ V can be
expressed uniquely as x = λ1 u1 + · · · + λn un . In this situation, it is easy to compute the coordinates
λj of x ∈ V .
(3.4) Theorem Let V be an n-dimensional inner product space.
(i) Let {u1 , . . . , un } be an orthogonal set of non-zero vectors in V , and hence a basis of V .
Then
x = Σ_{j=1}^{n} ( hx, uj i / kuj k² ) uj
for all x ∈ V .
(ii) Let {e1 , . . . , en } be an orthonormal set of vectors in V , and hence an o.n. basis of V . Then
x = Σ_{j=1}^{n} hx, ej i ej
and
kxk² = Σ_{j=1}^{n} |hx, ej i|²   (Parseval's equality)
for all x ∈ V . Thus, if x = Σ_{j=1}^{n} λj ej , then
kxk² = Σ_{j=1}^{n} |λj |² .
Lecture 15
3. Inner product spaces contd.
The Gram-Schmidt process
(3.5) Theorem Let V be a finite dimensional inner product space. Then V has an orthogonal,
and hence an o.n. basis.
The proof of this result gives a procedure, called the Gram-Schmidt process, for constructing
from a given basis of V an orthogonal basis. Normalization will then yield an o.n. basis.
Example
Let V be the subspace of R4 with basis {v1 , v2 , v3 }, where
v1 = (1, 1, 0, 0), v2 = (1, 0, 1, 1), v3 = (1, 0, 0, 1).
(It is easy to check that these three vectors are linearly independent; for instance, v2 and v3 are
linearly independent since they are not multiples of each other and, by considering the second
coordinate, v1 ∉ span{v2 , v3 }.) An orthogonal basis {u1 , u2 , u3 } of V can be constructed from
{v1 , v2 , v3 } as follows.
Let u1 = v1 = (1, 1, 0, 0). Now put
u2 = v2 − λu1 = (1, 0, 1, 1) − λ(1, 1, 0, 0) = (1 − λ, −λ, 1, 1)
and choose λ so that u2 ⊥ u1 . For this, we need
hu2 , u1 i = 1 − λ − λ = 0 or λ = 1/2,
giving u2 = (1/2, −1/2, 1, 1). Now put
u3 = v3 − λ1 u1 − λ2 u2 = (1, 0, 0, 1) − λ1 (1, 1, 0, 0) − λ2 (1/2, −1/2, 1, 1)
and choose λ1 , λ2 so that u3 ⊥ u1 , u2 . For u3 ⊥ u1 , we need
hu3 , u1 i = h(1, 0, 0, 1), (1, 1, 0, 0)i − λ1 h(1, 1, 0, 0), (1, 1, 0, 0)i = 0
i.e. 1 − 2λ1 = 0 or λ1 = 1/2. (Note that we have used the fact that u1 ⊥ u2 when calculating
hu3 , u1 i here; the effect is that we get an equation in λ1 alone.)
For u3 ⊥ u2 , we need
hu3 , u2 i = h(1, 0, 0, 1), (1/2, −1/2, 1, 1)i − λ2 h(1/2, −1/2, 1, 1), (1/2, −1/2, 1, 1)i = 0,
giving
1/2 + 1 − λ2 (1/4 + 1/4 + 1 + 1) = 0,
i.e. λ2 = (3/2)/(5/2) = 3/5. Hence
u3 = (1, 0, 0, 1) − (1/2)(1, 1, 0, 0) − (3/5)(1/2, −1/2, 1, 1)
   = (1/5, −1/5, −3/5, 2/5)
   = (1/5)(1, −1, −3, 2).
We have thus constructed an orthogonal basis {u1 , u2 , u3 } of V given by
u1 = (1, 1, 0, 0), u2 = (1/2, −1/2, 1, 1) = (1/2)(1, −1, 2, 2), u3 = (1/5)(1, −1, −3, 2).
Stripping out the fractions, we get
(1, 1, 0, 0), (1, −1, 2, 2), (1, −1, −3, 2).
It is useful to check that these three vectors are indeed orthogonal in pairs. Note also that, to
avoid quite so many fractions, we could have replaced u2 = (1/2, −1/2, 1, 1) by (1, −1, 2, 2) earlier
when calculating u3 .
To obtain an o.n. basis {e1 , e2 , e3 }, we normalize to get
e1 = u1 /ku1 k = (1/√2)(1, 1, 0, 0), e2 = u2 /ku2 k = (1/√10)(1, −1, 2, 2), e3 = u3 /ku3 k = (1/√15)(1, −1, −3, 2).
Consider v = v1 + v2 − v3 = (1, 1, 1, 0) ∈ V . We can write v as λ1 e1 + λ2 e2 + λ3 e3 , the coefficients
here being given by
λ1 = hv, e1 i = 2/√2 = √2,   λ2 = hv, e2 i = 2/√10 = √10/5,   λ3 = hv, e3 i = −3/√15 = −√15/5.
Notice that
λ1² + λ2² + λ3² = 2 + 10/25 + 15/25 = 3 = kvk²,
as expected by Parseval's equality.
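The computation above can be condensed into a short Gram-Schmidt routine (a sketch assuming NumPy; it reproduces the orthogonal basis found by hand):

import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize a list of linearly independent vectors (no normalization)."""
    ortho = []
    for v in vectors:
        u = np.asarray(v, dtype=float)
        for w in ortho:
            u = u - (np.dot(u, w) / np.dot(w, w)) * w   # subtract the projection onto w
        ortho.append(u)
    return ortho

vs = [(1, 1, 0, 0), (1, 0, 1, 1), (1, 0, 0, 1)]
for u in gram_schmidt(vs):
    print(u)
# [1. 1. 0. 0.]   [ 0.5 -0.5  1.   1. ]   [ 0.2 -0.2 -0.6  0.4]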
Another example
Find a basis of P2 that is orthogonal w.r.t. the inner product given by hp, qi = ∫_0^1 p(t)q(t) dt.
To do this, start with the basis {v1 , v2 , v3 }, where v1 (t) = 1, v2 (t) = t, v3 (t) = t², and apply the
Gram-Schmidt process.
Put u1 = v1 ; then u2 (t) = v2 (t) − λu1 (t) = t − λ. For u2 ⊥ u1 , we need
∫_0^1 (t − λ).1 dt = 0, i.e. λ = 1/2.
This gives u2 (t) = t − 1/2. Now put u3 = v3 − λ1 u1 − λ2 u2 . For u3 ⊥ u1 , we need
∫_0^1 (v3 (t) − λ1 u1 (t)) u1 (t) dt = 0, i.e. ∫_0^1 t² dt − λ1 ∫_0^1 1 dt = 0.
So λ1 = 1/3.
For u3 ⊥ u2 , we need
∫_0^1 (v3 (t) − λ2 u2 (t)) u2 (t) dt = 0, i.e. ∫_0^1 t²(t − 1/2) dt − λ2 ∫_0^1 (t − 1/2)² dt = 0.
This leads to
λ2 = ( ∫_0^1 t²(t − 1/2) dt ) / ( ∫_0^1 (t − 1/2)² dt ) = (1/12)/(1/12) = 1.
Hence u3 (t) = t² − 1/3 − (t − 1/2) = t² − t + 1/6 and we obtain the orthogonal basis
1, t − 1/2, t² − t + 1/6.
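The same calculation can be checked symbolically using the integral inner product on [0, 1] (a sketch assuming SymPy):

from sympy import symbols, integrate

t = symbols('t')
ip = lambda p, q: integrate(p * q, (t, 0, 1))   # <p, q> = integral of p(t)q(t) over [0, 1]

u1 = 1
u2 = t - ip(t, u1) / ip(u1, u1) * u1                       # t - 1/2
u3 = (t**2 - ip(t**2, u1) / ip(u1, u1) * u1
            - ip(t**2, u2) / ip(u2, u2) * u2)              # t**2 - t + 1/6
print(u2, u3.expand())
print(ip(u1, u2), ip(u1, u3), ip(u2, u3))                  # 0 0 0, confirming orthogonality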
Lecture 16
3. Inner product spaces contd.
Orthogonal complements and orthogonal projections
Let U be a subspace of an inner product space V .
Definition The orthogonal complement of U is the set U ⊥ defined by
U ⊥ = {x ∈ V : hx, ui = 0 for all u ∈ U }.
(3.6) Theorem Let U be a subspace of a finite dimensional inner product space V . Then each
x ∈ V can be written uniquely as
x = y + z,
where y ∈ U and z ∈ U ⊥ . Thus V = U ⊕ U ⊥ .
Terminology Let V and U be as in Theorem (3.6). Given x ∈ V , define the orthogonal projection
of x onto U to be the unique y ∈ U with x = y + z, where z ∈ U ⊥ , and write y = PU x. We can
then think of PU as a mapping V → V . This mapping is called the orthogonal projection of V
onto U . The following are its main properties.
(3.7) Theorem With V, U and PU as above, the following hold.
(i) PU is a linear mapping with im PU = U and ker PU = U ⊥ .
(ii) If {e1 , . . . , ek } is an o.n. basis of U , then PU is given by the formula
PU x = Σ_{j=1}^{k} hx, ej i ej   (x ∈ V ).
If {u1 , . . . , uk } is an orthogonal basis of U , then PU is given by the formula
PU x = Σ_{j=1}^{k} ( hx, uj i / kuj k² ) uj   (x ∈ V ).
(iii) PU ² = PU .
(iv) x ∈ U if and only if x = PU x.
(v) For x ∈ V ,
kxk² = kPU xk² + kx − PU xk² .
[Strictly, we should consider the case when U = {0} separately. Then U ⊥ = V and PU = 0.]
Nearest point to a subspace.
In an inner product space, the norm is used to measure distance, with kx − yk being thought of
as the distance between x and y.
(3.8) Theorem Let U be a subspace of a finite dimensional inner product space V and let
x ∈ V . Then
kx − uk ≥ kx − PU xk
for all u ∈ U , with equality if and only if u = PU x. Thus, with distance measured using the
norm, PU x is the unique nearest point of U to x.
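A small numerical illustration of the projection formula and of Theorem (3.7)(v) (a sketch assuming NumPy; U is taken to be the span of two orthonormal vectors in R3 ):

import numpy as np

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)   # {e1, e2} is an o.n. basis of a plane U

def P_U(x):
    # P_U x = <x, e1> e1 + <x, e2> e2
    return np.dot(x, e1) * e1 + np.dot(x, e2) * e2

x = np.array([1.0, 2.0, 3.0])
y = P_U(x)
print(y)                                              # [1.  2.5 2.5]
print(np.isclose(np.dot(x - y, e1), 0.0),
      np.isclose(np.dot(x - y, e2), 0.0))             # True True: x - y is orthogonal to U
print(np.isclose(np.dot(x, x),
                 np.dot(y, y) + np.dot(x - y, x - y)))  # True: ||x||^2 = ||P_U x||^2 + ||x - P_U x||^2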
Connection with Fourier series
Let V denote the space of 2π-periodic continuous functions f : R → C and give V the inner
product defined by
hf, gi = (1/2π) ∫_{−π}^{π} f (t) ḡ(t) dt.
For n ∈ Z, let en ∈ V be the function en (t) = e^{int} . Then {en : n ∈ Z} is an orthonormal set
and, for f ∈ V , hf, en i is the nth (complex) Fourier coefficient
cn = (1/2π) ∫_{−π}^{π} f (t) e^{−int} dt
of f . Thus the Fourier series of f can be written as
Σ_{n=−∞}^{∞} hf, en i en .
This suggests that, in some sense, the Fourier series of f is its expansion in terms of the o.n.
‘basis’ {en : n ∈ Z}. This is, indeed, a reasonable interpretation but, to make it more rigorous,
one has to decide how to make sense of the convergence of the Fourier series of f ∈ V and also
whether V is the right space to consider these matters. What is true is that, for f ∈ V with
Fourier series Σ_{n=−∞}^{∞} cn en ,
kf − Σ_{n=−N}^{N} hf, en i en k = { (1/2π) ∫_{−π}^{π} |f (t) − Σ_{n=−N}^{N} cn en (t)|² dt }^(1/2) → 0
as N → ∞. Further, fixing N ∈ N, amongst all the functions g ∈ span{en : n = −N, . . . , N },
the N th partial sum Σ_{n=−N}^{N} cn en of the Fourier series of f is the unique function g that minimizes
kf − gk = { (1/2π) ∫_{−π}^{π} |f (t) − g(t)|² dt }^(1/2) .
In fact, V is not quite the correct space to discuss these issues; this will be discussed in
more detail in later years.
Lecture 17
4. Adjoints and self-adjointness
Assume throughout this chapter that V is a finite dimensional inner product space. The aim is
to show that, given a linear mapping T : V → V , there is a (unique) linear mapping T ∗ : V → V
such that hT ∗ x, yi = hx, T yi for all x, y ∈ V and then to study those mappings for which T = T ∗ .
(4.1) Theorem Given a linear mapping T : V → V , there is a unique linear mapping
T ∗ : V → V such that
hT ∗ x, yi = hx, T yi
for all x, y ∈ V . Furthermore, if {e1 , . . . , en } is an o.n. basis of V and T has matrix A = (aij )
with respect to {e1 , . . . , en }, then T ∗ has matrix C = (cij ) with respect to {e1 , . . . , en }, where
cij = aji when F = R, and cij = āji (the complex conjugate of aji ) when F = C.
Notation and terminology
1. The mapping T ∗ is called the adjoint of T .
2. Given a matrix A = (aij ), its transpose is the matrix At = (aji ) and, when the entries are
complex, its conjugate transpose is A∗ = (āji ) = (Ā)t . Thus, for a given linear mapping
T : V → V with matrix A with respect to some o.n. basis of V ,
(a) if V is a real inner product space, T ∗ has matrix At ,
(b) if V is a complex inner product space, T ∗ has matrix A∗
with respect to the same o.n. basis.
Definition A linear mapping T : V → V is said to be self-adjoint if T ∗ = T , i.e. if hT x, yi =
hx, T yi for all x, y ∈ V .
Thus T is self-adjoint if and only if its matrix A w.r.t. any (and hence every) o.n. basis satisfies
(a) At = A if V is a real inner product space;
(b) A∗ = A if V is a complex inner product space.
Definitions A matrix A for which At = A is called symmetric whilst (in the case of complex
entries) it is called hermitian when A∗ = A.
Although a linear mapping on a complex (finite dimensional) vector space always has an eigenvalue, it is possible for a linear mapping on a real vector space not to have any eigenvalues.
However, this phenomenon cannot occur for a self-adjoint mapping. Indeed, more can be said
about eigenvalues and eigenvectors of self-adjoint linear mappings.
(4.2) Theorem Let T : V → V be self-adjoint.
(i) Every eigenvalue of T is real (even when V is a complex inner product space).
(ii) Let λ, µ be distinct eigenvalues of T with corresponding eigenvectors u, v. Then hu, vi = 0.
(iii) T has an eigenvalue (even when V is a real inner product space).
Lecture 18
4. Adjoints and self-adjointness contd.
Our aim now is to show that, given a self-adjoint T on V , V has an o.n. basis consisting of
eigenvectors of T (and so, in particular, T is diagonalizable). The following result will be useful
in doing this, as well as being of interest in its own right.
(4.3) Theorem Let T : V → V be self-adjoint and let U be a subspace of V such that T (U ) ⊆ U .
Then T (U ⊥ ) ⊆ U ⊥ and the restrictions of T to U and U ⊥ are self-adjoint.
We now have the main result of this chapter.
(4.4) Theorem (The finite dimensional spectral theorem) Let V be a finite dimensional
inner product space and let T : V → V be a self-adjoint linear mapping. Then V has an o.n.
basis consisting of eigenvectors of T .
An illustration of the use of the spectral theorem
Suppose that T : V → V is self-adjoint and that every eigenvalue λ of T satisfies λ ≥ 1. Then
hT x, xi ≥ kxk2 for all x ∈ V.
Proof Let {e1 , . . . , en } be an o.n. basis of V with T ej = λj ej for j = 1, . . . , n, where λj ≥ 1 for
all j. Let x ∈ V . We have
x = x1 e1 + · · · + xn en ,
where xj = hx, ej i, and
T x = x1 λ1 e1 + · · · + xn λn en .
Then
hT x, xi = hx1 λ1 e1 + · · · + xn λn en , x1 e1 + · · · + xn en i
         = λ1 |x1 |^2 + · · · + λn |xn |^2      (since hej , ek i = δjk )
         ≥ |x1 |^2 + · · · + |xn |^2            (since each λj ≥ 1)
         = kxk^2                                (by Parseval's equality).
Here δjk denotes the Kronecker delta symbol, defined by δjk = 1 if j = k and δjk = 0 if j ≠ k.
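The inequality just proved can also be tested numerically. In the sketch below, a self-adjoint T
with eigenvalues ≥ 1 is built from a random orthogonal matrix; the construction is an illustrative
assumption, not part of the notes.

import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))    # a random orthogonal matrix
lam = 1 + rng.random(5)                             # eigenvalues, all >= 1
T = Q @ np.diag(lam) @ Q.T                          # self-adjoint with eigenvalues lam

for _ in range(3):
    x = rng.standard_normal(5)
    print(x @ T @ x >= x @ x - 1e-12)               # <Tx, x> >= ||x||^2 (up to rounding)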
Lecture 19
4. Adjoints and self-adjointness contd.
Real symmetric matrices
Let A be a real n × n matrix. This has an associated mapping TA : Rn → Rn given by
TA : x → Ax,
where we are thinking of x as a column vector. Then A is the matrix of TA and At is the matrix
of (TA )∗ w.r.t. the standard basis of Rn .
Suppose now that A is symmetric (i.e. At = A). Then TA is self-adjoint since in this case
(TA )∗ = TAt = TA .
Aside: This can also be seen as follows. Writing vectors in Rn as columns and thinking of these
as n × 1 matrices, we can express the standard inner product of x = (x1 , . . . , xn )t and
y = (y1 , . . . , yn )t as hx, yi = xt y, the matrix product of the row xt = (x1 , . . . , xn ) with the
column y. We then have
hAx, yi = (Ax)t y = (xt At )y = (xt A)y = xt (Ay) = hx, Ayi,
which shows that TA is self-adjoint.
Applying the spectral theorem to TA , we conclude that there is an o.n. basis {u1 , . . . , un } of Rn
consisting of eigenvectors of TA , say TA uj = λj uj for j = 1, . . . , n. Consider the n × n matrix
R = (u1 . . . un ) (i.e. uj is the j th column of R). Then Auj is the j th column of AR and so
AR = (Au1 . . . Aun ) = (λ1 u1 . . . λn un ) = (u1 . . . un )D = RD,

where D is the n × n diagonal matrix with diagonal entries λ1 , λ2 , . . . , λn .
Now Rt is the n × n matrix with j th row utj (where uj is a column and so utj is the vector uj
written as a row). Thus Rt R is the product of the matrix with rows ut1 , . . . , utn and the matrix
(u1 . . . un ),
so the ij th entry of Rt R is uti uj = hui , uj i = δij . Thus
Rt R = In , the n × n identity matrix.
Thus R is invertible and R−1 = Rt . (It could have been noted earlier that R is invertible since
its columns are linearly independent. However, the above calculation shows that its inverse can
be written down without any further work.)
Definition An orthogonal matrix is an n × n real matrix R (for some n) such that Rt R = In
[equivalently, R is invertible and R−1 = Rt or the columns (or rows) of R form an o.n. basis of
Rn ].
Returning to the above equation AR = RD, premultiplication by Rt gives Rt AR = D. It is
easy to show (by considering the characteristic polynomial of A, which equals the characteristic
polynomial of R−1 AR = Rt AR = D) that the diagonal elements of D are precisely the eigenvalues of A, repeated according to their multiplicity as roots of the characteristic polynomial.
(4.5) Theorem (The spectral theorem for real symmetric matrices) Let A be a real
n × n symmetric matrix. Then there is an n × n orthogonal matrix R such that Rt AR = R−1 AR is
the n × n diagonal matrix with diagonal entries λ1 , λ2 , . . . , λn ,
where λ1 , λ2 , . . . , λn are the (necessarily real) eigenvalues of A, repeated according to their multiplicity as roots of the characteristic polynomial of A.
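As an optional numerical check, numpy's eigh routine returns exactly the data promised by
Theorem (4.5): an orthogonal matrix of eigenvectors and the corresponding eigenvalues. The
randomly generated symmetric matrix below is purely illustrative.

import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                                   # a real symmetric matrix

lam, R = np.linalg.eigh(A)                          # columns of R: o.n. eigenvectors of A
print(np.allclose(R.T @ R, np.eye(4)))              # R is orthogonal
print(np.allclose(R.T @ A @ R, np.diag(lam)))       # R^t A R is diagonal (the eigenvalues of A)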
Example
Let

A = ( 2 1 1
      1 2 1
      1 1 2 ) .

Find an orthogonal matrix R such that Rt AR is diagonal.
There are three steps to do this.
1. Find the eigenvalues of A. We have (after some computation)
det(A − λI3 ) = (1 − λ)^2 (4 − λ).
Thus the eigenvalues, which are the roots of det(A − λI3 ) = 0, are 1 and 4. (As a check,
their sum counting multiplicities is 1 + 1 + 4 = 6 and this equals the trace of A, the sum
of the diagonal elements of A. This will always be the case.)
2. Find o.n. bases of the corresponding eigenspaces.
Eigenspace corresponding to λ = 1:
We need to solve (A − 1.I3 )x = 0, i.e. x1 + x2 + x3 = 0, leading to
x = (x1 , x2 , −x1 − x2 ) = x1 (1, 0, −1) + x2 (0, 1, −1).
Thus {v1 , v2 } is a basis of this eigenspace, where v1 = (1, 0, −1) and v2 = (0, 1, −1). An
application of the Gram-Schmidt process to {v1 , v2 } gives the orthogonal basis
{(1, 0, −1) , (−1/2, 1, −1/2)} or (stripping out the fractions) {(1, 0, −1) , (−1, 2, −1)}. Normalizing, we get the o.n. basis {e1 , e2 } where
e1 = (1/√2)(1, 0, −1) ;   e2 = (1/√6)(−1, 2, −1).
Eigenspace corresponding to λ = 4:
For this we need to solve (A − 4I3 )x = 0, leading to the three equations
−2x1 + x2 + x3 = 0
x1 − 2x2 + x3 = 0
x1 + x2 − 2x3 = 0.
The third of these equations is just minus the sum of the first two and so is redundant.
Eliminating x3 from the first two gives x1 = x2 and then x3 = x1 from the first, leading
to the general solution x = x1 (1, 1, 1). Thus an o.n. basis for this eigenspace consists of
the single vector
e3 = (1/√3)(1, 1, 1).
3. Obtain the required orthogonal matrix. Now form the matrix R = (e1 e2 e3 ), with the ej ’s
written as column vectors; i.e. let
 1 −1 1 

R=
√
2
0
−1
√
2
This matrix is orthogonal and
√
√
6
3
√2
√1
6
3
−1 √1
√
6
3


.

1 0 0
Rt AR = A =  0 1 0  .
0 0 4
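The hand computation above can be verified numerically; numpy is used here only as a calculator,
and this check is not part of the course.

import numpy as np

A = np.array([[2.0, 1, 1],
              [1, 2, 1],
              [1, 1, 2]])
e1 = np.array([1, 0, -1]) / np.sqrt(2)
e2 = np.array([-1, 2, -1]) / np.sqrt(6)
e3 = np.array([1, 1, 1]) / np.sqrt(3)
R = np.column_stack([e1, e2, e3])                   # R = (e1 e2 e3)

print(np.allclose(R.T @ R, np.eye(3)))              # R is orthogonal
print(np.round(R.T @ A @ R, 10))                    # should print diag(1, 1, 4)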
Lecture 20
4. Adjoints and self-adjointness contd.
Hermitian matrices
Suppose that A = (aij ) ∈ Mn (C) is hermitian, i.e. A∗ = A, so that aji is the complex conjugate
of aij for all i, j. In this case,
the associated mapping TA : Cn → Cn given by
TA : z → Az,
where, as in the discussion of real symmetric matrices, we are thinking of z as a column vector,
is self-adjoint. We have already seen this, but a direct proof is as follows.
Write a typical vector of Cn as a column vector z = (z1 , . . . , zn )t , thought of as an n × 1 matrix.
Then z ∗ is the row vector whose entries are the complex conjugates z̄1 , . . . , z̄n of z1 , . . . , zn , and
hence

hw, zi = w1 z̄1 + · · · + wn z̄n = z ∗ w.
We then have
hAw, zi = z ∗ (Aw) = (z ∗ A)w = (z ∗ A∗ )w = (Az)∗ w = hw, Azi,
showing that TA is self-adjoint.
Again, we can apply the spectral theorem to TA and conclude that there is an o.n. basis
{u1 , . . . , un } of Cn consisting of eigenvectors of TA , say TA uj = λj uj for j = 1, . . . , n, where each
λj ∈ R since the eigenvalues of self-adjoint mappings are always real. Guided by the discussion
of real symmetric matrices, consider the n × n matrix U = (u1 . . . un ) (i.e. uj is the j th column
of U ). Then Auj is the j th column of AU and so
AU = (Au1 . . . Aun ) = (λ1 u1 . . . λn un ) = (u1 . . . un )D = U D,

where D is, as before, the n × n diagonal matrix with diagonal entries λ1 , λ2 , . . . , λn .
Here U ∗ U is the product of the matrix with rows u∗1 , . . . , u∗n and the matrix (u1 . . . un ), and
its ij th entry is u∗i uj = huj , ui i = δij . Thus

U ∗ U = In , the n × n identity matrix
and U −1 = U ∗ .
Definition A unitary matrix is a complex n × n matrix U (for some n) such that U ∗ U = In
[equivalently, U is invertible and U −1 = U ∗ or the columns (or rows) of U form an o.n. basis of
Cn ].
The spectral theorem then takes on the following form.
(4.6) Theorem (The spectral theorem for hermitian matrices) Let A be an n × n (complex)
hermitian matrix. Then there is an n × n unitary matrix U such that U ∗ AU = U −1 AU is the
n × n diagonal matrix with diagonal entries λ1 , λ2 , . . . , λn ,
where λ1 , λ2 , . . . , λn are the (necessarily real) eigenvalues of A, repeated according to their multiplicity as roots of the characteristic polynomial of A.
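As in the real symmetric case, this can be illustrated numerically; the sketch below (an optional
aside relying on numpy's eigh) diagonalizes a randomly generated hermitian matrix by a unitary
matrix.

import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (B + B.conj().T) / 2                            # hermitian: A* = A

lam, U = np.linalg.eigh(A)                          # lam is real; columns of U are o.n.
print(np.allclose(U.conj().T @ U, np.eye(3)))       # U is unitary
print(np.allclose(U.conj().T @ A @ U, np.diag(lam)))   # U* A U is diagonal with real entries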
Comments about orthogonal and unitary matrices
1. Let R be an n × n orthogonal matrix, so that Rt R = In . Then
hRx, Ryi = (Rx)t (Ry) = xt Rt Ry = xt y = hx, yi
for all x, y ∈ Rn . Thus the mapping x → Rx from Rn to Rn preserves the inner product.
In particular, kRxk = kxk for all x ∈ Rn since
kRxk2 = hRx, Rxi = hx, xi = kxk2 .
Conversely, suppose that A is an n × n real matrix and hAx, Ayi = hx, yi for all x, y ∈ Rn .
Then hAt Ax, yi = hAx, Ayi = hx, yi for all x, y, from which it follows that At A = In , i.e.
A is orthogonal. In fact, it suffices to assume that kAxk = kxk for all x ∈ Rn to conclude
that A is orthogonal. To prove this, note the expression for an inner product in terms of
the associated norm given in Tutorial Qn. 7.7.
2. In much the same way, if U is a unitary n × n matrix, then
hU w, U zi = hw, zi
and, in particular,
kU wk = kwk
for all w, z ∈ Cn . Conversely, if A is an n × n complex matrix such that hAw, Azi = hw, zi
for all w, z ∈ Cn (or just kAwk = kwk for all w ∈ Cn ), then A is unitary.
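These comments can also be checked numerically. In the sketch below the orthogonal matrix is
obtained from a QR factorization; this device, and the random test vectors, are illustrative
assumptions only.

import numpy as np

rng = np.random.default_rng(5)
R, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # QR factorization gives an orthogonal R
x = rng.standard_normal(4)
y = rng.standard_normal(4)

print(np.isclose((R @ x) @ (R @ y), x @ y))         # <Rx, Ry> = <x, y>
print(np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x)))   # ||Rx|| = ||x||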