INF 386, V-2003 What is “shape” ? Selected Themes from Digital Image Analysis

INF 386, V-2003
Selected Themes
from Digital Image Analysis
Lecture 5
07.04.2003
Moment Based
Shape Representation
Fritz Albregtsen
Department of Informatics
University of Oslo
INF 386, 2003, Lecture 1, page 1 of 42
What is “shape” ?
• A numerical description of the spatial
configurations in the image.
• There is no generally accepted
methodology of shape description.
• Location and description of high curvature
points give essential information.
• Many useful, application-dependent
heuristics.
• Shape description of 2D planar objects is
“easy”.
• Shape is defined in an image, but its
usefulness in a 3D world depends on how
well the 3D -> 2D mapping is handled.
• Invariance is an important issue.
INF 386, 2003, Lecture 1, page 2 of 42
Considerations
Assumptions
• We have a segmented, labeled image.
• Each object that is to be described has
been identified.
• The image objects can be represented as
— binary image (whole regions)
— contour (region boundaries)
— through a run length code
— through a chain code
— through a quad tree
— in cartesian coordinates
— in polar coordinates
— in some other coordinates
— as coefficients of some transform
— ...
• Input representation form, boundaries or
whole regions?
• Object reconstruction ability?
• Incomplete shape recognition ability?
• Local/global description?
• Mathematical or heuristic techniques?
• Statistical or syntactic object description?
• Robustness of description to translation,
rotation, and scale transformations?
• Shape description properties in different
resolutions?
— description changes discontinously.
• Robustness against
— noise
— geometric sampling (discretization)
— quantization
INF 386, 2003, Lecture 1, page 3 of 42
INF 386, 2003, Lecture 1, page 4 of 42
Area
• Features may be computed both from the
region and its boundary.
Recursive boundary splitting
• Area is given by the number of pixels of
which the region consists.
In its original form (Douglas and Peucker
1973 ), the algorithm is as follows :
• The surface integral over S (contour C)
is given by Green’s theorem:
1) Draw a straight line between the first and
the last point.
A=
Z Z
dxdy =
S
Z
xdy
C
s := 0.0;
n := n + 1;
pkt[n].x := pkt[1].x;
pkt[n].y := pkt[1].y;
for i:=2 step 1 until n do
begin
dy := pkt[i].y - pkt[i-1].y
s := s + (pkt[i].x + pkt[i-1].x)/2 * dy;
end;
area := if (s > 0) then s else -s;
• The region can also be represented
by n polygon vertices
n−1
1 X
A= (ik jk+1 − ik+1jk )
2
2) For each intermediate point, compute the
distance from the line.
- The point lying farthest from the line
becomes a new break point, provided that
its distance from the line is greater than a
given threshold.
1) and 2) are then repeated for the two new
parts of the path.
This is probably the most frequently used
method.
The method is not sequential,
and is not particularly fast.
k=0
where the sign of the sum reflects
the polygon orientation.
INF 386, 2003, Lecture 1, page 5 of 42
INF 386, 2003, Lecture 1, page 6 of 42
Boundary polygonization
• The polygonization method of Wall and Danielsson
(1984) steps from point to point through an ordered
sequence of points, and outputs the previous point
as a new breakpoint if the area deviation per unit
length of the approximating line segment exceeds a
prespecified tolerance, TW D .
• Using the previous breakpoint as the current origin,
the area between the curve f (x) and the
approximating line segment is accumulated by
Freeman computed the perimeter as the length of the chain. The
formula for the perimeter is
√
P = ne + 2no
(1)
where ne is the number of even chain elements and no the
number of odd chain elements. An even chain element indicates
a vertical or horizontal connection between two boundary pixels,
having length 1, while an odd chain
element indicates a diagonal
√
connection, which has length 2.
Vossepoel and Smeulders improved Freeman’s method in
estimating lengths of straight lines by using a corner count n c ,
defined as the number of occurrences of consecutive unequal
chain elements in the Freeman chain code string. The length is
given by
P = 0.980ne + 1.406no − 0.091nc
(2)
f(x)
si
Ai
x
1
Ai = Ai−1 + (yixi−1 − xiyi−1) ,
2
Perimeter from chain codes
q
si = x2i + yi2
• If | Ai | /si < T , i is incremented and (Ai, si) is
recomputed.
• Otherwise, the previous point is a new breakpoint.
The origin is moved, and the previous value of si is
stored.
• This method is purely sequential and very fast.
INF 386, 2003, Lecture 1, page 7 of 42
where the weights were found by a least-square fitting for all
straight lines with ne + no = 1000.
The methods based on the chain coding compute the perimeter
as the length of the chain, and often give an overestimated result.
Kulpa derived a compensation factor for computing the length of
straight lines. With this factor, Eq. (1) becomes
√
√
π
P = (1 + 2)(ne + 2no )
(3)
8
where the factor is approximately 0.948. Kulpa found that this
compensation also gave good results for most of the blob-like
objects met in practice. Dorst and Smeulders proved that Eq. (3)
gave a consistent estimate for the length of a circular arc of π/4.
INF 386, 2003, Lecture 1, page 8 of 42
Invariance of features
Shape invariants
• Shape descriptors depend on viewpoint,
=> object recognition may often be
impossible if object or observer changes
position.
• Shape description invariance is important
– shape invariants represent properties
which remain unchanged under an
appropriate class of transforms.
• Stability of invariants is a crucial property
which affects their applicability. The
robustness of invariants to image noise
and errors introduced by image sensors is
of prime importance.
• Assume that we have an object, and that
we want to extract some features to
describe the object.
• We may wish that the features are:
— Position invariant
independent of the position of the
object within the image.
— Scaling invariant
independent of the size of the object.
— Rotation invariant
independent of the orientation of the
object.
— Warp invariant
independent of a deformation of the
object.
• In most cases we want position invariant
features.
• The other depend on the application.
INF 386, 2003, Lecture 1, page 9 of 42
INF 386, 2003, Lecture 1, page 10 of 42
Statistical moments
Moments are applicable to many aspects of image processing;
invariant pattern recognition, image encoding, pose estimation.
Here we are using them to extract properties that have analogies
in statistics or mechanics.
The general form of a moment of order (p + q), evaluating over the
complete image plane ξ is:
Z Z
mpq =
ψpq (x, y)f (x, y)dxdy
(4)
ξ
where the weighting kernel or basis function is ψpq .
Non-orthogonal moments
The continuous two-dimensional (p + q)-th order
Cartesian moment is defined as:
Z ∞Z ∞
xpy q f (x, y)dxdy
mpq =
−∞
(7)
−∞
It is assumed that f (x, y) is a piecewise continuous,
bounded function and that it can have non-zero values
only in the finite region of the xy plane. Then, moments
of all orders exist and the uniqueness theorem holds:
This produces a weighted description of f (x, y) over ξ.
The basis functions may have a range of useful properties that
may be passed onto the moments, producing descriptions which
can be invariant under rotation, scale, translation and
orientation. To apply this to digital images, Equation 4 needs to
be expressed in discrete form.
The probability density function (of a continuous distribution) is
different from that of the probability of a discrete distribution.
For simplicity we assume that ξ is divided into square pixels of
dimension 1 × 1, with constant intensity I over each square pixel.
So if Px,y is a discrete pixel value then:
Px,y = I(x, y)∆A
where ∆A is the sample or pixel area equal to one. Thus,
XX
ψpq (x, y)Px,y ; p, q = 0, 1, 2, ..., ∞
Mpq =
x
(5)
(6)
y
The moment sequence mpq with basis xpy q is uniquely
defined by f (x, y) and f (x, y) is uniquely defined by mpq .
Thus, the original image can be described and
reconstructed, if sufficiently high order moments are
used.
The discrete version of the Cartesian moment for an
image consisting of pixels Pxy , is:
mpq =
M X
N
X
xpy q Px,y
(8)
x=1 y=1
mpq is a two dimensional Cartesian moment, where M
and N are the image dimensions and the monomial
product xpy q is the basis function.
The choice of basis function depends on the application
and on any desired invariant properties.
INF 386, 2003, Lecture 1, page 11 of 42
INF 386, 2003, Lecture 1, page 12 of 42
Fast computation of moments
Green’s theorem states that if a function g(x, y) is integrable over a
simply connected region O and can be decomposed as a sum of
the derivatives of two functions N and M
Low order moments
g(x, y) =
• The zero order moment m00 is defined as
the total mass (or power) of the image.
• For a binary M × N image of an object, this
gives the number of pixels in the object.
m00 =
N
M X
X
Px,y
(9)
then we have
Z Z
∂N (x, y) ∂M (x, y)
−
∂x
∂y
g(x, y)dxdy =
O
Z
[M (x, y)dx + N (x, y)dy]
(11)
(12)
L
where L is the anticlockwise oriented boundary of the region O.
Replacing g(x, y) by the moment kernel xp y q , we obtain
Z
Z
Z Z
1
1
xpy q dxdy =
xp+1y q dy =
xpy q+1 dx
mpq =
p+1 L
q+1 L
O
(13)
x=1 y=1
• The two first order moments are used to
find the Centre Of Mass (COM) of an
image. If this is applied to a binary image
and the results are normalised by m00, then
the result is the centre co-ordinates of the
object.
m01
m10
y¯ =
(10)
x¯ =
m00
m00
Li and Shen (1991) approximate Eq. 13 by
X p+1 q
X p q+1
1
1
mpq =
xi yi ∆yi =
xi yi ∆xi
p+1
q+1
(xi ,yi )∈L
where ∆yi = yi+1 − yi and ∆xi = xi+1 − xi , and (xi+1, yi+1) is the
point following (xi, yi) on L.
From one edge point to the next, the change of coordinate values
x and y is at most 1. When x increases or decreases by 1, we have
(x + 1)py q =
p
X
k=0
Cpk xk y q ;
(x − 1)py q =
p
X
(−1)p−k Cpk xk y q
(15)
k=0
where Cpk is the binomial coefficient
!
p
p!
k
.
=
Cp =
k!(p − k)!
k
INF 386, 2003, Lecture 1, page 13 of 42
(14)
(xi ,yi )∈L
(16)
INF 386, 2003, Lecture 1, page 14 of 42
Our method
Fast computation .. II
(x ± 1)py q can be computed from xpy q . Li and Shen implement
Eq. 15 by the method shown in the figure below.
q
y
q
xy
(x 1) y
q
c
d
c d
2
(x 1) y
2 q
xy
q
3
(x 1) y
3 q
xy
....
....
p q
Let O be a discrete region, and L be its 8-connected boundary. A
boundary point is a pixel in the object, of which at least one of the
four neighbors is in the background. We then have
X
X
f (x, y) =
(Fx (x, y)DY (x, y) + f (x, y)CY (x, y))
(17)
O
q
L
where
....
p
xy
We have proposed a method which is computationally as effective
as the method of Li and Shen, and achieves better accuracy.
In our method, Green’s theorem is applied in a different way.
....
(x 1) y
q
xp(y ± 1)q can be computed from xpy q in a similar way.
In this way, xpiyiq is updated by only additions.
To compute the 10 moments with order (p + q) ≤ 3, only 20
additions are needed to update xp+1y q when x is changed, and
only 16 additions are needed to update xp+1y q when y is changed.
The method of Li and Shen is simple to implement and effective
in computation. To compute the 10 low order moments, it needs
about 23L additions and 23 multiplications plus contour
following.
The disadvantage of this method is its inaccuracy for small
objects and objects with complex shape. The inaccuracy is due to
the approximation by Eq. 14.
INF 386, 2003, Lecture 1, page 15 of 42
Fx (x, y) =
x
X
f (i, y)
(18)
i=1
and DY (x, y) and CY (x, y) are defined as


 1 (x, y) is only a “right hand point”
DY (x, y) = −1 (x, y) is only a “left hand point”


0 otherwise
(
1 (x, y) is a “left hand point”
CY (x, y) =
0 otherwise.
A “right hand point” is a boundary point at the right hand side of
the object, and a “left hand point” is a boundary point at the left
hand side of the object. A boundary point can simultaneously be
both a “left” and a “right hand point”, in which case DY (x, y) = 0,
CY (x, y) = 1.
We propose to consider background to object transitions instead
of the boundary points. Assuming that a pixel is a square with
four sides, a background to object transition is a side of the
boundary pixel at which the background and the object are
divided. A boundary pixel may have one to four transitions. A
background to object transition has four possible directions, as
illustrated in the next figure.
INF 386, 2003, Lecture 1, page 16 of 42
Our method .. III
Our method .. II
2
1
3
If we let
upq =
0
x i= x
y i= y
dyi = 0
x i= x−1
y i= y
dyi = −1
xi = x
y i= y
dyi = 0
xi = x
y i= y
dyi = 1
X
q
xp+1
i yi ∆yi
(22)
i
Eq. 21 becomes
p
These four directions are enumerated as 0, 1, 2, and 3. A triplet
(xi, yi , ∆yi) can be computed for each transition from the
background to the object. For the i’th transition, (xi, yi) is
recorded as the coordinate of the boundary point, except when
the direction is 1, then xi is x-coordinate of boundary point minus
1, i.e. xi = x − 1. ∆yi is 0 when the direction is 0 or 2, it is −1 when
the direction is 1, and 1 when the direction is 3.
By using the transitions instead of the boundary points, it is not
difficult to rewrite Eq. 17 as
X
X
f (x, y) =
Fx (xi, yi)∆yi
(19)
i
(x,y)∈O
which is a sum over all the transitions from the background to the
object. If a boundary point is a “left hand point”, then a transition
with direction 1 will be recorded; if a boundary point is a “right
hand point”, then a transition with direction 3 will be recorded.
Letting f (x, y) = xp y q , we have
Fx (xi, yi ) =
xi
X
x=1
p
xp yiq =
q
xp+1
xpy q X 1 j−1
i yi
yiq (20)
+ i i +
Cp Bj xp−j+1
i
p+1
2
j
j=2
where
is the combination number, Bj is the j’th Bernoulli
number, and Eq. 19 now becomes
"
p
X xp+1y q xpy q X
1 j−1
p−j+1 q
i
i
i i
+
+
C Bj xi
yi ∆yi.
(21)
mpq =
p+1
2
j p
i
j=2
Cpj−1
INF 386, 2003, Lecture 1, page 17 of 42
mpq =
up−1,q X 1 j−1
upq
+
+
C Bj up−j,q .
p+1
2
j p
j=2
(23)
The moment mpq is a linear combination of ujq for j = 0, 1, 2, · · · p.
The monomials ujq can be computed by the method of Li and
Shen. The 10 low order moments can thereafter be computed as


 

u0q
1
0
0
0
m0q

 m   1/2 1/2 0

0 
 1q  
  u1q 


=
.
 m2q   1/6 1/2 1/3 0   u2q 
0 1/4 1/2 1/4
m3q
u3q
Our algorithm to compute the geometric moments mpq for all the
orders from 0 up to (P + Q) for any integers P and Q is now stated
as follows.
• The contour following is applied, and a triple (xi, yi, ∆yi), as
defined in the above text and illustrated in the figure, is
recorded for each transition from the background to the
object.
q
• xp+1
i yi for all p and q are computed by the updating as
described in Figure 1.
• upq as defined in Eq. 22 is then computed, and the moments
mpq is a linear combination of ujq for j = 0, 1, 2, · · · p as in
Eq. 23.
The result will be exactly the same as the one obtained by the
double summation in Eq. 8.
INF 386, 2003, Lecture 1, page 18 of 42
Central moments
Comparison of methods
• Algorithms for fast, but not exact
computation of moments are given by
B.-C. Li and J. Chen,
Pattern Recognition 24, 807-813, 1991.
• Faster and exact algorithms are given by
L. Yang and F. Albregtsen,
Pattern Recognition 29, 1061-1073, 1996.
L. Yang, F. Albregtsen, and T. Taxt,
Graphical Models and Image Processing,
59, 97-108, 1997.
• The definition of a 2D discrete central
moment is:
XX
(x − x¯)p(y − y¯)q f (x, y)
µp,q =
x
y
where
x¯ =
m10
,
m00
y¯ =
m01
m00
• This corresponds to computing ordinary
Cartesian moments after translating the
object so that center of mass is in origo.
• This means that central moments are
invariant under translation.
• Central moments are not scaling or
rotation invariant.
INF 386, 2003, Lecture 1, page 19 of 42
INF 386, 2003, Lecture 1, page 20 of 42
µpq from mpq
• Central moments µpq can easily be obtained from
ordinary moments mpq .
A translation of an image f (x, y) by (∆x, ∆y) in the
(x, y)-direction gives a new image
0
f (x, y) = f (x − ∆x, y − ∆y)
• If we assume that ∆x = x¯ and ∆y = y¯, then the
moments of p + q ≤ 3 are :
µ00
µ10
µ20
µ02
µ11
µ30
µ12
µ21
µ03
=
=
=
=
=
=
=
=
=
m00
µ01 = 0
m20 − x
¯m10
m02 − y¯m01
m11 − y¯m10
m30 − 3¯
xm20 + 2¯
x2m10
m12 − 2¯
y m11 − x
¯m02 + 2¯
y 2 m10
m21 − 2¯
xm11 − y¯m20 + 2¯
x2m01
m03 − 3¯
y m02 + 2¯
y 2m01
• The general 3D central moments, µpqr , are generally
expressed by the mpqr moments:
µpqr =
p X
q X
r
X
s=0 t=0 u=0
−1
[D−d]
p
s
!
q
t
!
r
u
!
∆xp−s∆y q−t ∆z r−u mstu
where
D = (p + q + r);
d = (s + t + u)
and the binomial coefficients are given by
v
w
!
=
Moments of inertia
• The two second order central moments
µ20 =
XX
(x − x
¯)2f (x, y)
x
µ02 =
y
XX
x
y
(y − y¯)2f (x, y)
correspond to the “moments of inertia”
relative to the coordinate directions,
while the cross moment of inertia
is given by
µ11 =
XX
(x − x¯)(y − y¯)f (x, y)
x
y
• An elongated object having a random
orientation will have moments of inertia
that do not reflect the true shape of the
object, as they are not orientation
invariant.
• The three second order µpq can easily be
made invariant to rotation.
v!
, w<v
w!(v − w)!
INF 386, 2003, Lecture 1, page 21 of 42
INF 386, 2003, Lecture 1, page 22 of 42
Object orientation
• Orientation is defined as the angle (relative
to the x-axis) of the axis through the center
of mass that gives the lowest moment of
inertia.
• Orientation, θ, relative to x-axis is found by minimizing the
sum
XX
2
I(θ) =
β − β¯ f (α, β)
α
β
where the rotated coordinates are given by
α = x cos θ + y sin θ, β = y cos θ − x sin θ
Bounding rectangle
• “Image-oriented bounding box”
The smallest rectangle around the object,
having sides that are parallell to the edges
of the image.
Found by searching for min and max x and
y within the object (xmin , xmax, ymin, ymax).
Then we get
I(θ) =
x
y
We require that
∂I(θ)
∂θ
XX
x
XX
y
[(y − y¯) cos θ − (x − x
¯) sin θ]2 f (x, y)
= 0 and get
2f (x, y) [(y − y¯) cos θ − (x − x¯) sin θ]×[−(y − y¯) sin θ − (x − x
¯) cos θ] = 0
XX
x
=
y
2f (x, y) (x − x
¯)(y − y¯)(cos2 θ − sin2 θ)
XX
x
y
2f (x, y) (x − x
¯)2 − (y − y¯)2 sin θ cos θ
2 sin θ cos θ
2 tan θ
2µ11
=
= tan 2θ
2 =
2
(µ20 − µ02 ) cos θ − sin θ
1 − tan2 θ
• Orientation is then given by
2µ1,1
1
−1
θ = tan
2
µ2,0 − µ0,2
• “Object-oriented bounding box”
Smallest rectangle around the object,
having one side parallell to the orientation
of the object (θ).
The transformation
α = x cos θ + y sin θ
β = −x sin θ + y cos θ
is applied to all pixels in the object
(or its boundary).
Then search for αmin , αmax, βmin , βmax.
where θ ∈ [0, π/2] if µ11 > 0, and θ ∈ [π/2, π] if µ11 < 0.
INF 386, 2003, Lecture 1, page 23 of 42
INF 386, 2003, Lecture 1, page 24 of 42
Orientation invariant features
• The radius of gyration of an object is
defined as the radius of a circle where we
could concentrate all the mass of the
object without altering the moment of
inertia about its center of mass.
• This feature is invariant to orientation.
ˆ=
R
r
µ20 + µ02
µ00
• The object ellipse is also a simple,
invariant object feature. The object ellipse
is defined as the ellipse whose least and
greatest moments of inertia equal those of
the object.
• The semimajor and semiminor axes
v h
i
u
p
u 2 µ20 + µ02 ± (µ20 − µ02 )2 + 4µ2
11
t
(ˆ
a, ˆb) =
µ00
Normalization and invariants
• Changing the scale of f (x, y) by (α, β) in the
(x, y)-direction gives a new image
0
f (x, y) = f (x/α, y/β)
0
• The transformed central moments µpq can
be expressed by the original µpq
0
µpq = α1+pβ 1+q µpq
0
• For β = α we have µpq = α2+p+q µpq .
We get scaling invariant central moments
by the normalization
µpq
p+q
, γ=
+ 1, ∀(p + q) ≥ 2.
ηpq =
γ
(µ00)
2
• The numerical eccentricity of the ellipse
=
r
a2 − b 2
a2
INF 386, 2003, Lecture 1, page 25 of 42
INF 386, 2003, Lecture 1, page 26 of 42
Symmetry properties
• The degree and direction of skewness can be
determined using the two third order moments, µ30
and µ03 .
• Character recognition based on symmetry
properties is possible, using the first seven
scale-normalised central moments
(η11 , η20 , η02 , η21 , η12 , η30 , η03 ).
• Symmetry about the x or y axes will produce η11 = 0.
Symmetry about the y axis ⇐ η12 = 0, η30 = 0.
Symmetry about the x axis ⇐ η03 = 0, η12 > 0.
• Further, the following generalities are true:
ηpq = 0 ∀p = 0, 2, 4, ...; q = 1, 3, 5, ...
for objects symmetric about the x axis.
However, objects asymmetrical about the x axis
produce:
and
ηpq < 0 ∀p = 0, 2, 4, ...; q = 1, 3, 5, ...
ηp0 > 0, η0p > 0 ∀p = 0, 2, 4, ...; f (x, y) > 0.
• The sign of the normalized central moments give
qualitative information about the symmetry.
The magnitude of the central moments gives a
quantitative description.
INF 386, 2003, Lecture 1, page 27 of 42
Hu’s rotation invariance
1 Find principal axes of object, rotate coordinats.
This method can break down when images do not
have unique principal axes.
2 The method of absolute moment invariants.
This is a set of seven combined normalized central
moment invariants, which can be used for scale,
position, and rotation invariant pattern
identification.
φ1 = η20 + η02
2
φ2 = (η20 − η02)2 + 4η11
φ3 = (η30 − 3η12)2 + (3η21 − η03)2
φ4 = (η30 + η12)2 + (η21 + η03)2
φ5 = (η30 − 3η12)(η30 + η12) (η30 + η12)2 − 3(η21 + η03)2
+(3η21 − η03)(η21 + η03) 3(η30 + η12)2 − (η21 + η03)2
φ6 = (η20 − η02) (η30 + η12)2 − (η21 + η03)2 + 4η11(η30 + η12)(η21 + η03)
φ7 = (3η21 − η03)(η30 + η12) (η30 + η12)2 − 3(η21 + η03)2
+(3η12 − η30)(η21 + η03) 3(η30 + η12)2 − (η21 + η03)2
• φ7 is skew invariant, to help distinguish mirror
images.
• These moments are of finite order, therefore, they do
not comprise a complete set of image descriptors.
However, higher order invariants can be derived.
INF 386, 2003, Lecture 1, page 28 of 42
Contrast invariants
• A change in contrast gives a new intensity
0
distribution f (x, y) = cf (x, y).
The transformed moments are then
0
µpq = cµpq
cancels both scaling and contrast.
The normalization is given by
0
ηpq =
0
µpq
µ00
0
µ00
µ20 + µ02
• A set of four moments that are invariant
under general affine transforms
µ20µ02 − µ211
µ400
µ2 µ2 − 6µ30µ21 µ12µ03 + 4µ30µ312 µ03 − 3µ212 µ221
= 30 03
µ10
00
2
µ20(µ21µ30 − µ12 ) − µ11 (µ30µ03 − µ21µ12 + µ02 (µ30µ12 − µ221)
=
µ700
= {µ320µ203 − 6µ220µ02µ21 µ03 + 9µ220µ02 µ212
+12µ20µ211 µ21 µ03 + 6µ20µ11 µ02 µ30µ03 − 18µ20µ11µ02 µ21µ12
−8µ311µ30µ03 − 6µ20µ202 µ30µ12 + 9µ20µ202 µ221
+12µ211µ02 µ30 µ12 − 6µ11µ202 µ30 µ21 + µ302 µ230 }/µ11
00
I1 =
• Abo-Zaid et al. have defined a normalization that
Affine invariants
I2
I3
(p+q)
2
I4
0
• If we use µ00 = cα2 µ00 , µ02 = cα4 µ02 , and µ20 = cα4 µ20 ,
0
we easily see that ηpq is invariant to both scale and
contrast.
• This normalization also reduces the dynamic range
of the moment features, so that we may use higher
order moments without having to resort to
logarithmic representation.
• Abo-Zaid’s normalization cancels the effect of
changes in contrast, but not the effect of changes in
0
intensity: f (x, y) = f (x, y) + b.
In practice, we often experience a combination :
0
f (x, y) = cf (x, y) + b.
INF 386, 2003, Lecture 1, page 29 of 42
• J. Flusser,
Pattern Recognition 33, 1405-1410, 2000
gives an excellent overview of moment invariants.
• O. Trier, A.K. Jain and T. Taxt,
"Feature Extraction Methods for Character
Recognition-A Survey",
Pattern Recognition, Vol. 29, 3, pp. 641-662, April
1996,
gives a very good and useful overview.
INF 386, 2003, Lecture 1, page 30 of 42
Legendre moments
Orthogonal moments
• Moments produced using orthogonal basis sets
have the advantage of needing lower precision to
represent differences to the same accuracy as the
monomials.
• The orthogonality condition simplifies the
reconstruction of the original function from the
generated moments.
• Orthogonality means mutually perpendicular: two
functions ym and ym are orthogonal over an interval
a ≤ x ≤ b if and only if:
Z b
ym (x)yn(x)dx = 0; m 6= n
• The Legendre moments of order (m + n):
Z Z
(2m + 1)(2n + 1) 1 1
Pm(x)Pn(y)f (x, y)dxdy
λmn =
4
−1 −1
(24)
where m, n = 0, 1, 2, ..., ∞,
Pm and Pn are the Legendre polynomials
f (x, y) is the continuous image function.
• For orthogonality to exist in the moments, the image
function f (x, y) is defined over the same interval as
the basis set, where the n-th order Legendre
polynomial is defined as:
Pn(x) =
• Two such (well established) orthogonal moments
are Legendre and Zernike.
anj xj
(25)
j=0
and the Legendre coefficients are given by:
a
• Here we are primarily interested in discrete images,
so the integrals within the moment descriptors are
replaced by summations.
n
X
anj = (−1)(n−j)/2
1
(n + j)!
,
n
(n−j)
2 (
)!( (n+j) )!j!
2
2
n − j = even.
(26)
• For a discrete image with current pixel Pxy , the
Legendre moments of order (m + n) are given by
(2m + 1)(2n + 1) X X
Pm(x)Pn(y)Pxy (27)
λmn =
4
x
y
and x, y are defined over the interval [−1, 1].
INF 386, 2003, Lecture 1, page 31 of 42
INF 386, 2003, Lecture 1, page 32 of 42
Complex Zernike moments
• The Zernike moment of order m and repetition n is
m+1 XX
f (x, y) [Vmn(x, y)]∗ ,
Amn =
π
x
y
where
Orthogonal radial polynomial
• The Zernike polynomials Vmn(x, y), expressed in
polar coordinates are:
Vmn(r, θ) = Rmn(r)ejnθ
where (r, θ) are defined over the unit disc
and Rmn is the orthogonal radial polynomial
(m−|n|)/2
x2 + y 2 ≤ 1
m = 0, 1, 2, ..., ∞; f (x, y) is the image function, ∗
denotes the complex conjugate, and n is an integer
(positive or negative) depicting the angular
dependence or rotation, subject to the conditions
m − |n| = even,
|n| ≤ m
• The Zernike moments are projections of the input
image onto a space spanned by the orthogonal V
functions
Vmn(x, y) = Rmnejnθ
√
where j = −1, and
(m−|n|)/2
Rmn (x, y) =
X
s=0
(−1)s(x2 + y 2 )(m/2)−s (m − s)!
m−|n|
s! m+|n|
−
s
!
−
s
!
2
2
INF 386, 2003, Lecture 1, page 33 of 42
X
Rmn(r) =
(−1)sF (m, n, s, r)
s=0
where
F (m, n, s, r) =
s!
m+|n|
2
(m − s)!
r(m−2s)
m−|n|
−s !
−s !
2
• The first radial polynomials are
R00 = 1 , R11 = r
R20 = 2r2 − 1 , R22 = r2
R31 = 3r3 − 2r , R33 = r3
• So for a discrete image, if P(x,y) is the current pixel,
m +1 XX
P (x, y) [Vmn(x, y)]∗ , x2 + y 2 ≤ 1
Amn =
π
x
y
INF 386, 2003, Lecture 1, page 34 of 42
Relating Zernike and Cartesian
Orthogonal radial polynomial II
• To calculate the Zernike moments, the image
(or region of interest) is first mapped to the unit disc
using polar coordinates, where the centre of the
image is the origin of the unit disc. Those pixels
falling outside the unit disc are not used in the
calculation.
• When mapping from Cartesian to polar coordinates
care must be taken as to which quadrant the
Cartesian coordinates appear in.
• Translation invariance is achieved by moving the
origin to the image’s COM.
• Scale invariance is produced by altering each object
so that its area (or pixel count for a binary image) is
m00 = β, where β is a predetermined value.
• For a binary image, both invariance properties can
be achieved using
r
x
y
β
+ x¯, + y¯ , a =
h(x, y) = f
a
a
m00
As a result, |A11 | is set to zero and |A00| = βπ .
This will be affected by the discrete implementation
where the error in the mapping will decrease as the
object size (or the pixel-resolution) increases.
• Zernike polynomials are given by Vmn (r, θ) = Rmn (r)ejnθ
and the radial polynomials are defined by
P(m−|n|)/2
(−1)sF (m, n, s, r)
Rmn (r) = s=0
• Re-arranging F (m, n, s, r) gives
Fmn(r) =
s!
=
s!
m+|n|
2
(m − s)!
r(m−2s)
m−|n|
−s !
−s !
2
(m − s)!
r(m−2s)
m−2s−|n|
!
!
2
m−2s+|n|
2
• Substituting k = (m − 2s) and rearranging gives
Rmn (r) =
m
X
k=n
Bmnk =
Bmnk rk ,
(m − k)even, n ≥ 0
!
(−1)(m−k)/2 m+k
2
m−k
k+n
k−n
!
!
2
2
2 !
• This produces Zernike moment definitions (in continuous
form):
Z 2π Z 1
m
m+1X
Amn =
Bmnk
rk e−jnθ f (r, θ)rdrdθ; r ≤ 1
π
0
0
k=n
which when translated to Cartesian coordinates is
Z Z
m
m+1X
Bmnk
(x−jy)n(x2+y 2 )(k−n)/2f (x, y)dxdy; x2+y 2 ≤ 1
Amn =
π
x y
k=n
• The double integral can now be expressed in terms of a series
of summed Cartesian moments, e.g.
Z Z
0
1X
1
Z00 =
B00k
(x − jy)0(x2 + y 2 )(k)/2 f (x, y)dxdy = m00
π
π
x y
k=0
INF 386, 2003, Lecture 1, page 35 of 42
INF 386, 2003, Lecture 1, page 36 of 42
Moment noise sensitivity
• Various invariant moment schemes have proved
useful in recognition and reconstruction tests for
images containing very little or no noise.
• In the presence of noise, the Hu invariant moments
begin to degrade.
• Higher order moments are more vulnerable to white
noise, thus making their use undesirable for pattern
recognition.
• When using Hu’s moments, even a slight
discrepancy in the image (i.e. minor shape
deformation or digitisation errors) can cause
considerable confusion if trying to discriminate
between two similar images.
• However, noise simulation is very involved, and is
highly dependent on the type of noise being
simulated, its distribution, how it is applied etc.
Care must be taken when generalising to alternative
noise-related conditions.
INF 386, 2003, Lecture 1, page 37 of 42
Image reconstruction
• The image within the unit circle may be
reconstructed to an arbitrary precision by
f (x, y) = lim
N →∞
N X
X
AnmVnm(x, y)
n=0 m
where the second sum is taken over all |m| ≤ n,
such that n − |m| is even.
• The contribution of the Zernike moment of order m
to the reconstruction is
X
AmnVmn(r, θ)|
|Im(x, y)| = |
n
2
2
where x + y ≤ 1, |n| ≤ m and m − |n| is even.
• Gibbs phenomena may appear in the reconstructed
object. This is caused by the inability of a
continuous function to recreate a step function - no
matter how many finite high order terms are used,
an overshoot of the function will occur. Outside of
the original area of a binary object, “ripples” of
overshoot of the continuous function may be visible.
INF 386, 2003, Lecture 1, page 38 of 42
Contour invariants
Contour description
• Moments can be used to characterize
shapes even if the region is represented by
its boundary.
• Suppose we have an object S and that we
are able to find the length of its contour.
• Assume a closed boundary as an ordered sequence
z(i) of Euclidean distance between the centroid and
all N boundary pixels.
• We partition the contour into M segments
of equal length, and thereby find M
equidistant points along the contour of S.
• No extra processing for spiral or concave contours.
• Translation, rotation, and scale invariant
one-dimensional normalized contour sequence
moments can be estimated as
mr =
1
N
N
X
i=1
[z(i)]r ,
µr =
1
N
N
X
i=1
[z(i) − m1]r
• Normalized ordinary and central contour sequence
moments:
P
r
1N N
mr
i=1 [z(i)]
m
¯ r = (µ2 )r/2 = h
ir/2
PN
2
1
[z(i)
−
m
]
1
i=1
N
PN
1
[z(i) − m1 ]r
µ¯ r = (µµ2 )rr/2 = h N i=1
ir/2
PN
2
1
[z(i)
−
m
]
1
i=1
N
INF 386, 2003, Lecture 1, page 39 of 42
• The coordinates (x, y) of these M points
are then put into a complex vector f
f (k) = x(k) + iy(k), k ∈ [0, M − 1]
• We view the x-axis as the real axis and the
y-axis as the imaginary one for a sequence
of complex numbers (Granlund 1972).
• The description of the object contour is
changed, but all the information is
preserved.
• And we have transformed the contour
problem from 2D to 1D.
INF 386, 2003, Lecture 1, page 40 of 42
Fourier-coefficients
• We perform a forward Fourier transform
M −1
−2πiuk
1 X
f (x) exp
F (u) =
M
M
k=0
for u ∈ [0, M − 1].
• F (0) now contains the center of mass of
the object, and the coefficients
F (1), F (2), F (3), ..., F (M − 1) will describe
the object in increasing detail.
• These features depend on rotation, scaling
and starting point on the contour.
• We do not want to use all coefficients as
features, but terminate at F (N ), N < M .
• This corresponds to setting
F (k) = 0, k > N − 1
INF 386, 2003, Lecture 1, page 41 of 42
Approximation
• When transforming back, we get an
approximation to the original contour
N
−1
X
2πiuk
F (u) exp
fˆ(k) =
M
u=0
defined for k ∈ [0, M − 1].
• We have only used N features to
reconstruct each component of fˆ(k), but k
still runs from 0 to M − 1.
• The number of points in the
approximation is the same (M ), but the
number of coefficients (features) used to
reconstruct each point is smaller (N < M ).
• The first 10 – 15 descriptorsare found to be
sufficient for character description.
• The Fourier descriptors can be invariant to
translation and rotation if the co-ordinate
system is appropriately chosen.
INF 386, 2003, Lecture 1, page 42 of 42