2. Norm, distance, angle

L. Vandenberghe
EE133A (Fall 2014-15)

Outline
• norm
• distance
• angle
• hyperplanes
• complex vectors
Euclidean norm

(Euclidean) norm of a vector $a \in \mathbf{R}^n$:

    $\|a\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2} = \sqrt{a^T a}$

• if $n = 1$, $\|a\|$ reduces to the absolute value $|a|$
• measures the magnitude of $a$
• sometimes written as $\|a\|_2$ to distinguish it from other norms, e.g.,

    $\|a\|_1 = |a_1| + |a_2| + \cdots + |a_n|$
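Illustration (added; not from the slides): a minimal NumPy sketch of the two equivalent expressions for the norm.

```python
import numpy as np

a = np.array([3.0, 4.0])
norm_def = np.sqrt(np.sum(a**2))   # sqrt(a1^2 + ... + an^2)
norm_ip = np.sqrt(a @ a)           # sqrt(a^T a)
print(norm_def, norm_ip, np.linalg.norm(a))  # all equal 5.0
```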
Properties

Nonnegative definiteness

    $\|a\| \geq 0$ for all $a$; moreover, $\|a\| = 0$ only if $a = 0$

Homogeneity

    $\|\beta a\| = |\beta| \, \|a\|$ for all vectors $a$ and scalars $\beta$

Triangle inequality

    $\|a + b\| \leq \|a\| + \|b\|$ for all vectors $a$ and $b$ of the same size

(proof below, via the Cauchy-Schwarz inequality)
Cauchy-Schwarz inequality

    $|a^T b| \leq \|a\| \, \|b\|$ for all $a, b \in \mathbf{R}^n$

moreover, equality $|a^T b| = \|a\| \|b\|$ holds if and only if one of the following is true:

• $a = 0$ or $b = 0$; in this case $a^T b = 0 = \|a\| \|b\|$
• $a \neq 0$, $b \neq 0$, and $b = \gamma a$ for some $\gamma > 0$; in this case
  $0 < a^T b = \gamma \|a\|^2 = \|a\| \|b\|$
• $a \neq 0$, $b \neq 0$, and $b = -\gamma a$ for some $\gamma > 0$; in this case
  $0 > a^T b = -\gamma \|a\|^2 = -\|a\| \|b\|$
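A quick numerical spot check (added; illustrative, not a proof) of the inequality and of one equality case:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.standard_normal(5), rng.standard_normal(5)

# |a^T b| <= ||a|| ||b|| for any a, b
assert abs(a @ b) <= np.linalg.norm(a) * np.linalg.norm(b)

# equality when b is a positive multiple of a
c = 2.5 * a
assert np.isclose(a @ c, np.linalg.norm(a) * np.linalg.norm(c))
```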
Proof of Cauchy-Schwarz inequality

1. trivial if $a = 0$ or $b = 0$

2. assume $\|a\| = \|b\| = 1$; we show that $-1 \leq a^T b \leq 1$:

    $0 \leq \|a - b\|^2 = (a - b)^T (a - b) = \|a\|^2 - 2 a^T b + \|b\|^2 = 2 (1 - a^T b)$

   with equality only if $a = b$

    $0 \leq \|a + b\|^2 = (a + b)^T (a + b) = \|a\|^2 + 2 a^T b + \|b\|^2 = 2 (1 + a^T b)$

   with equality only if $a = -b$

3. for general nonzero $a$, $b$, apply case 2 to the unit-norm vectors

    $\frac{1}{\|a\|} a, \qquad \frac{1}{\|b\|} b$
RMS value

let $a$ be a real $n$-vector

• the average of the entries of $a$ is

    $\operatorname{avg}(a) = \frac{a_1 + a_2 + \cdots + a_n}{n} = \frac{\mathbf{1}^T a}{n}$

• the root-mean-square value is the root of the average squared entry:

    $\operatorname{rms}(a) = \sqrt{\frac{a_1^2 + a_2^2 + \cdots + a_n^2}{n}} = \frac{\|a\|}{\sqrt{n}}$

Exercise
• show that $|\operatorname{avg}(a)| \leq \operatorname{rms}(a)$
• show that $\operatorname{avg}(b) \leq \operatorname{rms}(a)$ where $b = (|a_1|, |a_2|, \ldots, |a_n|)$
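Numerical sanity check (added; it illustrates the exercise inequalities on one example, it does not prove them):

```python
import numpy as np

def avg(a):
    return np.sum(a) / len(a)                     # (1^T a) / n

def rms(a):
    return np.linalg.norm(a) / np.sqrt(len(a))   # ||a|| / sqrt(n)

a = np.array([1.0, -2.0, 3.0])
assert abs(avg(a)) <= rms(a)          # |avg(a)| <= rms(a)
assert avg(np.abs(a)) <= rms(a)       # avg(b) <= rms(a) for b = (|a1|, ..., |an|)
```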
Triangle inequality from CS inequality

for vectors $a$, $b$ of the same size,

    $\|a + b\|^2 = (a + b)^T (a + b)$
    $\qquad = a^T a + b^T a + a^T b + b^T b$
    $\qquad = \|a\|^2 + 2 a^T b + \|b\|^2$
    $\qquad \leq \|a\|^2 + 2 \|a\| \|b\| + \|b\|^2$   (by Cauchy-Schwarz)
    $\qquad = (\|a\| + \|b\|)^2$

• taking square roots gives the triangle inequality
• the triangle inequality is an equality if and only if $a^T b = \|a\| \|b\|$ (see the equality conditions for the Cauchy-Schwarz inequality above)
• also note from the third line that $\|a + b\|^2 = \|a\|^2 + \|b\|^2$ if $a^T b = 0$
Distance

the (Euclidean) distance between vectors $a$ and $b$ is defined as $\|a - b\|$

• $\|a - b\| \geq 0$ for all $a$, $b$; the distance is equal to zero only if $a = b$
• triangle inequality:

    $\|a - c\| \leq \|a - b\| + \|b - c\|$ for all $a$, $b$, $c$

  [figure: triangle with vertices $a$, $b$, $c$ and edge lengths $\|a-b\|$, $\|b-c\|$, $\|a-c\|$]

• the RMS deviation between $n$-vectors $a$ and $b$ is $\operatorname{rms}(a - b) = \dfrac{\|a - b\|}{\sqrt{n}}$
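Illustration (added): the distance function and a spot check of the triangle inequality.

```python
import numpy as np

def dist(a, b):
    return np.linalg.norm(a - b)    # Euclidean distance ||a - b||

a, b, c = np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([3.0, 4.0])
# triangle inequality: ||a - c|| <= ||a - b|| + ||b - c||
assert dist(a, c) <= dist(a, b) + dist(b, c)
```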
Standard deviation

let $a$ be a real $n$-vector

• the de-meaned vector is the vector of deviations from the average:

    $a - \operatorname{avg}(a) \mathbf{1} = \begin{bmatrix} a_1 - \operatorname{avg}(a) \\ a_2 - \operatorname{avg}(a) \\ \vdots \\ a_n - \operatorname{avg}(a) \end{bmatrix} = \begin{bmatrix} a_1 - (\mathbf{1}^T a)/n \\ a_2 - (\mathbf{1}^T a)/n \\ \vdots \\ a_n - (\mathbf{1}^T a)/n \end{bmatrix}$

• the standard deviation is the RMS deviation from the average:

    $\operatorname{std}(a) = \operatorname{rms}(a - \operatorname{avg}(a) \mathbf{1}) = \frac{\left\| a - ((\mathbf{1}^T a)/n) \mathbf{1} \right\|}{\sqrt{n}}$

• the de-meaned vector in standard units is

    $\frac{1}{\operatorname{std}(a)} \, (a - \operatorname{avg}(a) \mathbf{1})$
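Illustration (added): the definition coincides with NumPy's population standard deviation (np.std with its default ddof=0).

```python
import numpy as np

def std(a):
    demeaned = a - np.mean(a)                        # a - avg(a) 1
    return np.linalg.norm(demeaned) / np.sqrt(len(a))

a = np.array([1.0, 2.0, 6.0, 7.0])
assert np.isclose(std(a), np.std(a))   # same as NumPy's population std
z = (a - np.mean(a)) / std(a)          # de-meaned vector in standard units
print(np.mean(z), std(z))              # 0.0 and 1.0
```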
Exercise

show that

    $\operatorname{avg}(a)^2 + \operatorname{std}(a)^2 = \operatorname{rms}(a)^2$

Solution

    $\operatorname{std}(a)^2 = \frac{\|a - \operatorname{avg}(a)\mathbf{1}\|^2}{n}$
    $\qquad = \frac{1}{n} \left( a - \frac{\mathbf{1}^T a}{n} \mathbf{1} \right)^T \left( a - \frac{\mathbf{1}^T a}{n} \mathbf{1} \right)$
    $\qquad = \frac{1}{n} \left( a^T a - \frac{(\mathbf{1}^T a)^2}{n} - \frac{(\mathbf{1}^T a)^2}{n} + \frac{(\mathbf{1}^T a)^2}{n} \right)$
    $\qquad = \frac{1}{n} \left( a^T a - \frac{(\mathbf{1}^T a)^2}{n} \right)$
    $\qquad = \operatorname{rms}(a)^2 - \left( \frac{\mathbf{1}^T a}{n} \right)^2$
    $\qquad = \operatorname{rms}(a)^2 - \operatorname{avg}(a)^2$
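Numerical check of the identity (added; illustrative only):

```python
import numpy as np

a = np.random.default_rng(1).standard_normal(100)
avg = np.mean(a)
std = np.std(a)
rms = np.linalg.norm(a) / np.sqrt(len(a))
assert np.isclose(avg**2 + std**2, rms**2)   # avg(a)^2 + std(a)^2 = rms(a)^2
```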
Exercise: nearest scalar multiple

given two vectors $a, b \in \mathbf{R}^n$, with $a \neq 0$, find the scalar multiple $ta$ closest to $b$

[figure: vector $b$ and its nearest scalar multiple $ta$ on the line through $a$]

Solution

• the squared distance between $ta$ and $b$ is

    $\|ta - b\|^2 = (ta - b)^T (ta - b) = t^2 a^T a - 2 t \, a^T b + b^T b$

  a quadratic function of $t$ with positive leading coefficient $a^T a$

• its derivative with respect to $t$ is zero for

    $t = \frac{a^T b}{a^T a} = \frac{a^T b}{\|a\|^2}$
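Illustration (added; the function name is just for this sketch): the minimizer, plus a check that the residual is orthogonal to $a$.

```python
import numpy as np

def nearest_multiple(a, b):
    t = (a @ b) / (a @ a)          # t = a^T b / ||a||^2
    return t * a

a, b = np.array([1.0, 2.0]), np.array([3.0, 3.0])
p = nearest_multiple(a, b)
assert np.isclose(a @ (b - p), 0.0)   # residual b - ta is orthogonal to a
```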
Exercise: mean of set of points

given $N$ vectors $x_1, \ldots, x_N \in \mathbf{R}^n$, find the $n$-vector $z$ that minimizes

    $\|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2$

[figure: five points $x_1, \ldots, x_5$ and the minimizing point $z$]

$z$ is also known as the centroid of the points $x_1, \ldots, x_N$
Solution: the sum of squared distances is

    $\|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2$
    $\qquad = \sum_{i=1}^{n} \left( (z_i - x_{1i})^2 + (z_i - x_{2i})^2 + \cdots + (z_i - x_{Ni})^2 \right)$
    $\qquad = \sum_{i=1}^{n} \left( N z_i^2 - 2 z_i (x_{1i} + x_{2i} + \cdots + x_{Ni}) + x_{1i}^2 + \cdots + x_{Ni}^2 \right)$

(here $x_{ji}$ is the $i$th element of the vector $x_j$)

• term $i$ in the sum is minimized by

    $z_i = \frac{x_{1i} + x_{2i} + \cdots + x_{Ni}}{N}$

• the solution $z$ is the componentwise average of the points $x_1, \ldots, x_N$:

    $z = \frac{1}{N} (x_1 + x_2 + \cdots + x_N)$
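Illustration (added): the centroid as a row-wise mean, with a spot check that perturbing it increases the objective.

```python
import numpy as np

# N points stored as the rows of an (N, n) array
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
z = X.mean(axis=0)                     # z = (x_1 + ... + x_N) / N

def total_sq_dist(z, X):
    return np.sum(np.linalg.norm(X - z, axis=1) ** 2)

# the centroid does at least as well as a nearby perturbation
assert total_sq_dist(z, X) <= total_sq_dist(z + 0.1, X)
```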
K-means clustering

a very popular iterative algorithm for partitioning $N$ vectors into $K$ clusters
Algorithm

choose initial 'representatives' $z_1, \ldots, z_K$ for the $K$ clusters and repeat:

1. assign each vector $x_i$ to the nearest representative $z_j$
2. replace each representative $z_j$ by the mean of the vectors assigned to it

• can be shown to converge in a finite number of iterations
• initial representatives are often chosen randomly
• the solution depends on the choice of initial representatives
• in practice, often restarted a few times, with different starting points
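Illustration (added; not the course's reference implementation): a minimal NumPy sketch of the two-step iteration. The fixed sweep count stands in for a proper convergence test, and a cluster that ends up empty simply keeps its old representative.

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    """Plain K-means sketch: X is (N, n); returns labels and representatives."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), K, replace=False)]   # random initial representatives
    for _ in range(iters):
        # step 1: assign each x_i to the nearest representative z_j
        d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step 2: replace each z_j by the mean of the vectors assigned to it
        for j in range(K):
            if np.any(labels == j):
                Z[j] = X[labels == j].mean(axis=0)
    return labels, Z

X = np.random.default_rng(3).standard_normal((100, 2))
labels, Z = kmeans(X, K=3)
```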
Example

[figures: K-means iterations 1, 2, 3, 9, 10, 11, and 12 on example data; each iteration shows the assignment to clusters (left) and the updated representatives (right)]
Angle between vectors

the angle between nonzero real vectors $a$, $b$ is defined as

    $\theta = \arccos \left( \frac{a^T b}{\|a\| \, \|b\|} \right)$

• this is the unique value of $\theta \in [0, \pi]$ that satisfies $a^T b = \|a\| \|b\| \cos\theta$

[figure: vectors $a$ and $b$ with angle $\theta$ between them]

• the Cauchy-Schwarz inequality guarantees that

    $-1 \leq \frac{a^T b}{\|a\| \, \|b\|} \leq 1$
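Illustration (added): computing the angle; the clip guards against round-off pushing the cosine slightly outside $[-1, 1]$.

```python
import numpy as np

def angle(a, b):
    cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))   # theta in [0, pi]

a, b = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(np.degrees(angle(a, b)))   # 45.0
```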
Terminology

    $\theta = 0$                $a^T b = \|a\|\|b\|$     vectors are aligned or parallel
    $0 \leq \theta < \pi/2$     $a^T b > 0$              vectors make an acute angle
    $\theta = \pi/2$            $a^T b = 0$              vectors are orthogonal ($a \perp b$)
    $\pi/2 < \theta \leq \pi$   $a^T b < 0$              vectors make an obtuse angle
    $\theta = \pi$              $a^T b = -\|a\|\|b\|$    vectors are anti-aligned or opposed
Orthogonal decomposition

given a nonzero $a \in \mathbf{R}^n$, every $n$-vector $x$ can be decomposed as

    $x = ta + y$ with $y \perp a$, where

    $t = \frac{a^T x}{\|a\|^2}, \qquad y = x - \frac{a^T x}{\|a\|^2} a$

[figure: $x$ decomposed into its projection $ta$ on the line through $a$ and the orthogonal component $y$]

• proof is by inspection
• the decomposition (i.e., $t$, $y$) exists and is unique for every $x$
• $ta$ is the projection of $x$ on the line through $a$ (see the nearest scalar multiple exercise above)
• since $y \perp a$, we have $\|x\|^2 = \|ta\|^2 + \|y\|^2$
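Illustration (added; the function name is just for this sketch): the decomposition, with checks of orthogonality and the Pythagorean identity.

```python
import numpy as np

def decompose(a, x):
    t = (a @ x) / (a @ a)     # t = a^T x / ||a||^2
    y = x - t * a             # orthogonal component
    return t, y

a, x = np.array([1.0, 1.0]), np.array([3.0, 1.0])
t, y = decompose(a, x)
assert np.isclose(a @ y, 0.0)                              # y is orthogonal to a
assert np.isclose(np.linalg.norm(x)**2,
                  np.linalg.norm(t * a)**2 + np.linalg.norm(y)**2)
```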
Correlation coefficient

the correlation coefficient between non-constant vectors $a$, $b$ is

    $\rho = \frac{\tilde{a}^T \tilde{b}}{\|\tilde{a}\| \, \|\tilde{b}\|}$

where $\tilde{a} = a - \operatorname{avg}(a)\mathbf{1}$ and $\tilde{b} = b - \operatorname{avg}(b)\mathbf{1}$ are the de-meaned vectors

• only defined when $a$ and $b$ are not constant ($\tilde{a} \neq 0$ and $\tilde{b} \neq 0$)
• $\rho$ is the cosine of the angle between the de-meaned vectors
• $\rho$ is the average product of the deviations from the mean in standard units:

    $\rho = \frac{1}{n} \sum_{i=1}^{n} \frac{(a_i - \operatorname{avg}(a))}{\operatorname{std}(a)} \cdot \frac{(b_i - \operatorname{avg}(b))}{\operatorname{std}(b)}$
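Illustration (added): the definition, checked against NumPy's built-in correlation.

```python
import numpy as np

def corr(a, b):
    at, bt = a - np.mean(a), b - np.mean(b)     # de-meaned vectors
    return (at @ bt) / (np.linalg.norm(at) * np.linalg.norm(bt))

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.1, 3.9, 6.2, 8.0])
print(corr(a, b), np.corrcoef(a, b)[0, 1])   # both near 1
```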
Examples

[figures: three scatterplots of points $(a_k, b_k)$ with $\rho = 0.97$, $\rho = -0.99$, and $\rho = 0.004$]
Regression line

• a scatterplot shows two $n$-vectors $a$, $b$ as $n$ points $(a_k, b_k)$
• a straight line shows the affine function $f(x) = c_1 + c_2 x$ with

    $f(a_k) \approx b_k, \qquad k = 1, \ldots, n$
Least-squares regression

use the coefficients $c_1$, $c_2$ that minimize $J = \sum_{k=1}^{n} (f(a_k) - b_k)^2$

• $J$ is a quadratic function of $c_1$ and $c_2$:

    $J = \sum_{k=1}^{n} (c_1 + c_2 a_k - b_k)^2 = n c_1^2 + 2 (\mathbf{1}^T a) c_1 c_2 + \|a\|^2 c_2^2 - 2 (\mathbf{1}^T b) c_1 - 2 (a^T b) c_2 + \|b\|^2$

• to minimize $J$, set the derivatives with respect to $c_1$, $c_2$ to zero:

    $n c_1 + (\mathbf{1}^T a) c_2 = \mathbf{1}^T b, \qquad (\mathbf{1}^T a) c_1 + \|a\|^2 c_2 = a^T b$

• the solution is

    $c_2 = \frac{a^T b - (\mathbf{1}^T a)(\mathbf{1}^T b)/n}{\|a\|^2 - (\mathbf{1}^T a)^2/n}, \qquad c_1 = \frac{\mathbf{1}^T b - (\mathbf{1}^T a) c_2}{n}$
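Illustration (added): the closed-form coefficients, checked against NumPy's polyfit (which returns the slope first, then the offset).

```python
import numpy as np

def regression_line(a, b):
    n = len(a)
    c2 = (a @ b - np.sum(a) * np.sum(b) / n) / (a @ a - np.sum(a)**2 / n)
    c1 = (np.sum(b) - np.sum(a) * c2) / n
    return c1, c2

a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 2.9, 5.2, 6.8])
c1, c2 = regression_line(a, b)
print(c1, c2, np.polyfit(a, b, 1))   # polyfit gives [c2, c1]
```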
Interpretation

the slope $c_2$ can be written in terms of the correlation coefficient $\rho$ of $a$ and $b$:

    $c_2 = \frac{(a - \operatorname{avg}(a)\mathbf{1})^T (b - \operatorname{avg}(b)\mathbf{1})}{\|a - \operatorname{avg}(a)\mathbf{1}\|^2} = \rho \, \frac{\operatorname{std}(b)}{\operatorname{std}(a)}$

and the offset is $c_1 = \operatorname{avg}(b) - \operatorname{avg}(a) c_2$

• hence, the expression for the regression line can be written as

    $f(x) = \operatorname{avg}(b) + \rho \, \frac{\operatorname{std}(b)}{\operatorname{std}(a)} \, (x - \operatorname{avg}(a))$

• the correlation coefficient $\rho$ is the slope after converting to standard units:

    $\frac{f(x) - \operatorname{avg}(b)}{\operatorname{std}(b)} = \rho \, \frac{x - \operatorname{avg}(a)}{\operatorname{std}(a)}$
Examples

[figures: two rows of scatterplots with $\rho = 0.91$, $\rho = -0.89$, and $\rho = 0.25$]

• dashed lines in the top row show the average ± standard deviation
• the bottom row shows the scatterplots of the top row in standard units
Hyperplane

one linear equation in $n$ variables $x_1, x_2, \ldots, x_n$:

    $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b$

in vector notation: $a^T x = b$

let $H$ be the set of solutions: $H = \{ x \in \mathbf{R}^n \mid a^T x = b \}$

• $H$ is empty if $a_1 = a_2 = \cdots = a_n = 0$ and $b \neq 0$
• $H = \mathbf{R}^n$ if $a_1 = a_2 = \cdots = a_n = 0$ and $b = 0$
• $H$ is called a hyperplane if $a = (a_1, a_2, \ldots, a_n) \neq 0$
• for $n = 2$, a hyperplane is a straight line in a plane; for $n = 3$, a plane in 3-D space, . . .
Example

[figure: parallel lines $\{ x \mid a^T x = b \}$ in the $(x_1, x_2)$-plane for $a = (2, 1)$ and $b = -15, -10, -5, 0, 5, 10, 15$]
Geometric interpretation of hyperplane

• recall the formula for the orthogonal decomposition of $x$ with respect to $a$ (see Orthogonal decomposition above):

    $x = \frac{a^T x}{\|a\|^2} a + y$ with $y \perp a$

• $x$ satisfies $a^T x = b$ if and only if

    $x = \frac{b}{\|a\|^2} a + y$ with $y \perp a$

[figure: hyperplane $H$ meeting the line through $a$ at the point $(b/\|a\|^2) a$]

• the point $(b/\|a\|^2) a$ is the intersection of the hyperplane with the line through $a$
• add arbitrary vectors $y \perp a$ to get all other points in the hyperplane
Exercise: projection on hyperplane

• show that the point in $H = \{ x \mid a^T x = b \}$ closest to $c \in \mathbf{R}^n$ is

    $\tilde{x} = c - \frac{a^T c - b}{\|a\|^2} \, a$

• $\|c - \tilde{x}\| = \dfrac{|a^T c - b|}{\|a\|}$ is the distance of $c$ to the hyperplane

[figure: point $c$ and its projection $\tilde{x}$ on the hyperplane $H$, along the direction of $a$]
Solution

we need to find $y$ in the decomposition

    $\tilde{x} = \frac{b}{\|a\|^2} a + y$ with $y \perp a$

• the decomposition of $c$ with respect to $a$ is

    $c = \frac{a^T c}{\|a\|^2} a + d$ with $d = c - \frac{a^T c}{\|a\|^2} a$

• the squared distance between $c$ and $\tilde{x}$ is

    $\|c - \tilde{x}\|^2 = \left\| \frac{a^T c - b}{a^T a} \, a + d - y \right\|^2 = \frac{(a^T c - b)^2}{\|a\|^2} + \|d - y\|^2$

  (the second step follows because $d - y \perp a$)

• the distance is minimized by choosing $y = d$
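Illustration (added; the function name is just for this sketch): the projection formula, with checks that the result lies in $H$ and that the distance matches $|a^T c - b| / \|a\|$.

```python
import numpy as np

def project_on_hyperplane(a, b, c):
    """Projection of c on H = {x : a^T x = b}."""
    return c - ((a @ c - b) / (a @ a)) * a

a, b = np.array([2.0, 1.0]), 5.0
c = np.array([4.0, 4.0])
xt = project_on_hyperplane(a, b, c)
assert np.isclose(a @ xt, b)   # the projected point lies in H
print(np.linalg.norm(c - xt), abs(a @ c - b) / np.linalg.norm(a))  # same value
```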
Kaczmarz algorithm

Problem: find (one) solution of a set of linear equations

    $a_1^T x = b_1, \quad a_2^T x = b_2, \quad \ldots, \quad a_m^T x = b_m$

• here $a_1, a_2, \ldots, a_m$ are nonzero $n$-vectors
• we assume the equations are solvable (have at least one solution)
• $n$ is huge, so we need a very inexpensive algorithm

Algorithm: start at some initial $x$ and repeat the following steps

• pick an index $i \in \{1, \ldots, m\}$, for example, cyclically or randomly
• replace $x$ with its projection on the hyperplane $H_i = \{ \tilde{x} \mid a_i^T \tilde{x} = b_i \}$:

    $x := x - \frac{a_i^T x - b_i}{\|a_i\|^2} \, a_i$
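Illustration (added; not the course's reference code): a cyclic-order sketch of the update, where a fixed number of sweeps stands in for a proper stopping test.

```python
import numpy as np

def kaczmarz(A, b, sweeps=200):
    """Cyclic Kaczmarz sketch: rows of A are a_i^T, b holds the right-hand sides."""
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            ai = A[i]
            # project x on the hyperplane a_i^T x = b_i
            x = x - ((ai @ x - b[i]) / (ai @ ai)) * ai
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
print(kaczmarz(A, b), np.linalg.solve(A, b))   # both close to (1, 3)
```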
Tomography

reconstruct an unknown image from line integrals

[figure: ray $i$ crossing a pixel grid; $a_{ij}$ is the length of its intersection with pixel $j$]

• $x$ represents the unknown image with $n$ pixels
• $a_{ij}$ is the length of the intersection of ray $i$ and pixel $j$
• $b_i$ is a measurement of the line integral $\sum_{j=1}^{n} a_{ij} x_j$ along ray $i$

the Kaczmarz algorithm is also known as the Algebraic Reconstruction Technique (ART)
Norm of complex vectors

norm of a vector $a \in \mathbf{C}^n$:

    $\|a\| = \sqrt{|a_1|^2 + |a_2|^2 + \cdots + |a_n|^2} = \sqrt{a^H a}$

• nonnegative definite: $\|a\| \geq 0$ for all $a$; $\|a\| = 0$ only if $a = 0$
• homogeneous: $\|\beta a\| = |\beta| \, \|a\|$ for all vectors $a$ and complex scalars $\beta$
• triangle inequality: $\|a + b\| \leq \|a\| + \|b\|$ for all vectors $a$, $b$ of the same size
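Illustration (added): in NumPy, `a @ a` does not conjugate, so $a^H a$ needs an explicit conjugate.

```python
import numpy as np

a = np.array([1 + 2j, 3 - 1j])
norm = np.sqrt((np.conj(a) @ a).real)   # sqrt(a^H a); |1+2j|^2 + |3-1j|^2 = 15
print(norm, np.linalg.norm(a))          # both sqrt(15)
```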
Cauchy-Schwarz inequality for complex vectors

    $|a^H b| \leq \|a\| \, \|b\|$ for all $a, b \in \mathbf{C}^n$

moreover, equality $|a^H b| = \|a\| \|b\|$ holds if:

• $a = 0$ or $b = 0$
• $a \neq 0$, $b \neq 0$, and $b = \gamma a$ for some (complex) scalar $\gamma$
• exercise: generalize the proof for real vectors given earlier

• we say $a$ and $b$ are orthogonal if $a^H b = 0$
• we will not need the definitions of angle, correlation coefficient, . . . in $\mathbf{C}^n$