Patch tracking based on comparing its pixels

Tomáš Svoboda
Czech Technical University in Prague, Center for Machine Perception
http://cmp.felk.cvut.cz
Last update: March 23, 2015
Talk Outline
comparing patch pixels
  normalized cross-correlation, ssd . . .
KLT - gradient based optimization
good features to track
Note: the lecture will be accompanied by several sketches and derivations on the blackboard
and a few live interactive demos in Matlab.
What is the problem?
video: CTU campus, door of G building
Tracking of dense sequences — camera motion
T - Template
I - Image
Scene static, camera moves.
Tracking of dense sequences — object motion
T - Template
I - Image
Camera static, object moves.
Alignment of an image (patch)
The goal is to align a template image $T(\mathbf{x})$ to an input image $I(\mathbf{x})$, where $\mathbf{x}$ is a column
vector containing the image coordinates $[x, y]^\top$. $I(\mathbf{x})$ could also be a small
subwindow within an image.
How to measure the alignment?
What is the best criterial function?
How to find the best match, in other words, how to find the extremum of
the criterial function?
Criterial function
What are the desired properties (on a certain domain)?
convex (remember the optimization course?)
discriminative
...
Normalized cross-correlation
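You may know it as the correlation coefficient (from statistics)
\[
\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}
\]
where $\sigma$ means standard deviation.
Having template $T(k, l)$ and image $I(x, y)$,
\[
r(x, y) = \frac{\sum_k \sum_l \left(T(k, l) - \overline{T}\right)\left(I(x + k, y + l) - \overline{I}(x, y)\right)}
{\sqrt{\sum_k \sum_l \left(T(k, l) - \overline{T}\right)^2}\,\sqrt{\sum_k \sum_l \left(I(x + k, y + l) - \overline{I}(x, y)\right)^2}}
\]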
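A minimal NumPy sketch of $r(x, y)$ for a single window position may make the formula concrete. It assumes grayscale arrays, takes $(x, y)$ as the top-left corner of the window, and returns 0 when a denominator vanishes; `ncc_at` and the random test image are illustrative names and data, not part of the lecture.

```python
import numpy as np

def ncc_at(image, template, x, y):
    """Normalized cross-correlation r(x, y) of the template with the image
    window whose top-left corner is at column x, row y."""
    h, w = template.shape
    window = image[y:y + h, x:x + w].astype(float)
    t = template.astype(float) - template.mean()   # T(k, l) minus template mean
    i = window - window.mean()                      # I(x+k, y+l) minus window mean
    denom = np.sqrt((t ** 2).sum()) * np.sqrt((i ** 2).sum())
    if denom == 0:      # constant template or constant window
        return 0.0      # assumption: treat the undefined case as 0
    return float((t * i).sum() / denom)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 50))
template = image[10:18, 20:28].copy()        # patch cut out at x=20, y=10

r_exact = ncc_at(image, template, 20, 10)             # same patch
r_scaled = ncc_at(2 * image + 30, template, 20, 10)   # contrast and brightness changed
r_other = ncc_at(image, template, 5, 25)              # unrelated window
```

Subtracting the means and dividing by the two norms is what makes the score insensitive to an affine change of intensity, so `r_exact` and `r_scaled` both come out as 1 up to rounding.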
Normalized cross-correlation – in picture
[Figure: criterial function ncc]
well, definitely not convex
but the discriminability looks promising
very efficient in computation, see [3]; check also normxcorr2 in Matlab
Sum of squared differences
\[
\mathrm{ssd}(x, y) = \sum_k \sum_l \left(T(k, l) - I(x + k, y + l)\right)^2
\]
[Figure: criterial function ssd]
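The criterial function shown in the figure is the ssd value evaluated at every template position. A brute-force sketch of that map, assuming grayscale NumPy arrays and only placements where the template lies fully inside the image (`ssd_map` is an illustrative name):

```python
import numpy as np

def ssd_map(image, template):
    """ssd(x, y) for every placement of the template fully inside the image."""
    image = image.astype(float)
    template = template.astype(float)
    h, w = template.shape
    rows = image.shape[0] - h + 1
    cols = image.shape[1] - w + 1
    out = np.empty((rows, cols))
    for y in range(rows):
        for x in range(cols):
            diff = template - image[y:y + h, x:x + w]
            out[y, x] = (diff ** 2).sum()
    return out

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(30, 40))
template = image[12:20, 25:33].copy()       # patch cut out at x=25, y=12

scores = ssd_map(image, template)
# The best match is the minimum of the criterial function.
best_y, best_x = np.unravel_index(np.argmin(scores), scores.shape)
```

The double loop is written for readability; it is far slower than FFT-based or integral-image approaches and is only meant to show what the map contains.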
Sum of absolute differences
\[
\mathrm{sad}(x, y) = \sum_k \sum_l \left|T(k, l) - I(x + k, y + l)\right|
\]
[Figure: criterial function sad]
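One way to see how sad differs from ssd is to compare both on a window with small uniform noise and on a window with a single large outlier. The sketch below does this for one fixed window position; the 5x5 patches and the helper names are made up for illustration.

```python
import numpy as np

def sad(template, window):
    """Sum of absolute differences for one window position."""
    return float(np.abs(template.astype(float) - window.astype(float)).sum())

def ssd(template, window):
    """Sum of squared differences for one window position."""
    return float(((template.astype(float) - window.astype(float)) ** 2).sum())

template = np.full((5, 5), 100)
noisy = template + 2             # every pixel off by 2 intensity levels
outlier = template.copy()
outlier[2, 2] = 200              # a single pixel off by 100 levels

sad_noisy, sad_outlier = sad(template, noisy), sad(template, outlier)   # 50.0, 100.0
ssd_noisy, ssd_outlier = ssd(template, noisy), ssd(template, outlier)   # 100.0, 10000.0
```

In this toy case the single outlier raises sad by a factor of 2 over the noisy window but ssd by a factor of 100, which is why sad is usually considered the less outlier-sensitive of the two.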
SAD for the door part
[Figure: criterial function sad]
SAD for the door part – truncated
[Figure: criterial function sad_truncated]
Differences greater than 20 intensity levels are counted as 20.
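The truncation rule can be sketched in a few lines. The limit of 20 intensity levels comes from the slide; the function name, the 5x5 patch, and the simulated occluder are assumptions made for the example.

```python
import numpy as np

def sad_truncated(template, window, limit=20):
    """SAD where each per-pixel difference is capped at `limit`."""
    diff = np.abs(template.astype(float) - window.astype(float))
    return float(np.minimum(diff, limit).sum())

template = np.full((5, 5), 100)
occluded = template.copy()
occluded[:, :2] = 255            # two columns covered by a bright occluder

plain = float(np.abs(template - occluded).sum())    # 10 pixels * 155 = 1550.0
truncated = sad_truncated(template, occluded)       # 10 pixels * 20  = 200.0
```

Capping the per-pixel cost bounds the influence of any single pixel, so a partially occluded but otherwise correct position is penalized much less.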
Normalized cross-correlation: how it works
live demo for various patches
Normalized cross-correlation: tracking
video
What went wrong?
Why did it fail?
Suggestions for improvement?
Tracking as an optimization problem
finding extrema of a criterial function . . . sounds like an optimization problem
Kanade–Lucas–Tomasi (KLT) tracker
Iteratively minimizes the sum of squared differences.
It is a Gauss-Newton gradient algorithm.
First published in 1981 as an image registration method [4].
Improved many times, most importantly by Carlo Tomasi [5, 6].
Free implementation(s) available (footnote 3). Also part of the OpenCV library (footnote 4).
Importance in Computer Vision
After more than two decades, a project (footnote 5) at CMU was dedicated to this
single algorithm, and the results were published in a premium journal [1].
Part of a plethora of computer vision algorithms.
Our explanation mainly follows the paper [1]. It is good reading for those
who are also interested in alternative solutions.
Footnote 3: http://www.ces.clemson.edu/~stb/klt/
Footnote 4: http://opencv.org/
Footnote 5: Lucas-Kanade 20 Years On https://www.ri.cmu.edu/research_project_detail.html?project_id=515&menu_id=261
Original Lucas-Kanade algorithm I
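The goal is to align a template image $T(\mathbf{x})$ to an input image $I(\mathbf{x})$, where $\mathbf{x}$ is a column
vector containing the image coordinates $[x, y]^\top$. $I(\mathbf{x})$ could also be a small
subwindow within an image.
Set of allowable warps $\mathbf{W}(\mathbf{x}; \mathbf{p})$, where $\mathbf{p}$ is a vector of parameters. For
translations
\[
\mathbf{W}(\mathbf{x}; \mathbf{p}) = \begin{bmatrix} x + p_1 \\ y + p_2 \end{bmatrix}
\]
$\mathbf{W}(\mathbf{x}; \mathbf{p})$ can be arbitrarily complex.
The best alignment, $\mathbf{p}^*$, minimizes image dissimilarity
\[
\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) - T(\mathbf{x}) \right]^2
\]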
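To illustrate what a warp does to coordinates, here is a small sketch of the translation warp next to a six-parameter affine warp. The affine parametrization is the one used in [1]; the function names and the sample points are assumptions for this example only.

```python
import numpy as np

def warp_translation(points, p):
    """W(x; p) = [x + p1, y + p2] applied to an N x 2 array of [x, y] rows."""
    return points + np.asarray(p, dtype=float)

def warp_affine(points, p):
    """Affine warp in the parametrization of [1]:
    W(x; p) = [(1 + p1) x + p3 y + p5,  p2 x + (1 + p4) y + p6]."""
    p1, p2, p3, p4, p5, p6 = p
    A = np.array([[1 + p1, p3],
                  [p2, 1 + p4]])
    return points @ A.T + np.array([p5, p6])

points = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 5.0]])
shifted = warp_translation(points, [2.0, -1.0])
# With p1..p4 = 0 the affine warp reduces to the same translation.
same = warp_affine(points, [0.0, 0.0, 0.0, 0.0, 2.0, -1.0])
```

With this parametrization $\mathbf{p} = \mathbf{0}$ is the identity warp, which is convenient when tracking starts from the previous frame's position.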
Original Lucas-Kanade algorithm II
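\[
\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) - T(\mathbf{x}) \right]^2
\]
$I(\mathbf{W}(\mathbf{x}; \mathbf{p}))$ is nonlinear! The warp $\mathbf{W}(\mathbf{x}; \mathbf{p})$ may be linear, but the pixel
values are, in general, nonlinear in $\mathbf{x}$. In fact, they are essentially unrelated to $\mathbf{x}$.
Linearization of the image: it is assumed that some $\mathbf{p}$ is known and the best
increment $\Delta\mathbf{p}$ is sought. The modified problem
\[
\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p} + \Delta\mathbf{p})) - T(\mathbf{x}) \right]^2
\]
is solved with respect to $\Delta\mathbf{p}$. Once it is found, $\mathbf{p}$ gets updated
\[
\mathbf{p} \leftarrow \mathbf{p} + \Delta\mathbf{p}
\]
...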
Original Lucas-Kanade algorithm III
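\[
\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p} + \Delta\mathbf{p})) - T(\mathbf{x}) \right]^2
\]
Linearization by performing a first-order Taylor expansion (detailed explanation on the blackboard):
\[
\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) + \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \Delta\mathbf{p} - T(\mathbf{x}) \right]^2
\]
$\nabla I^\top = \left[ \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right]$ is the gradient image computed at $\mathbf{W}(\mathbf{x}; \mathbf{p})$. The term
$\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ is the Jacobian of the warp.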
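The blackboard derivation is not part of these notes. A short sketch of the step, assuming $I$ is differentiable at $\mathbf{W}(\mathbf{x}; \mathbf{p})$ and using the chain rule, could read as follows; the translation case is added as a sanity check.

```latex
\begin{align*}
I(\mathbf{W}(\mathbf{x}; \mathbf{p} + \Delta\mathbf{p}))
  &\approx I(\mathbf{W}(\mathbf{x}; \mathbf{p}))
   + \frac{\partial I(\mathbf{W}(\mathbf{x}; \mathbf{p}))}{\partial \mathbf{p}} \, \Delta\mathbf{p} \\
  &= I(\mathbf{W}(\mathbf{x}; \mathbf{p}))
   + \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \, \Delta\mathbf{p}
   \qquad \text{(chain rule)}
\end{align*}
% For a pure translation the Jacobian of the warp is the 2x2 identity, so
\[
I(x + p_1 + \Delta p_1,\; y + p_2 + \Delta p_2) \approx
I(x + p_1,\; y + p_2)
+ \frac{\partial I}{\partial x} \Delta p_1
+ \frac{\partial I}{\partial y} \Delta p_2 .
\]
```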
Original Lucas-Kanade algorithm IV
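Differentiate $\sum_{\mathbf{x}} \left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) + \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \Delta\mathbf{p} - T(\mathbf{x}) \right]^2$ with respect to $\Delta\mathbf{p}$
\[
2 \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top
\left[ I(\mathbf{W}(\mathbf{x}; \mathbf{p})) + \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \Delta\mathbf{p} - T(\mathbf{x}) \right]
\]
setting equality to zero yields
\[
\Delta\mathbf{p} = H^{-1} \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top
\left[ T(\mathbf{x}) - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right]
\]
where $H$ is the (Gauss-Newton) approximation of the Hessian matrix.
\[
H = \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top
\left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]
\]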
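For the translation warp the Jacobian is the identity, so a single update only needs the image gradient and the error image. The sketch below computes one such step on a synthetic Gaussian blob; the blob, the sub-pixel shift, and the use of `np.gradient` (central differences) as the gradient estimate are assumptions made for illustration.

```python
import numpy as np

def gauss_newton_step(warped, template):
    """One translation update: dp = H^-1 sum_x grad(I)^T [T(x) - I(W(x; p))]."""
    gy, gx = np.gradient(warped)                      # dI/dy (rows), dI/dx (columns)
    sd = np.stack([gx.ravel(), gy.ravel()], axis=1)   # steepest descent images, N x 2
    error = (template - warped).ravel()               # T(x) - I(W(x; p))
    H = sd.T @ sd                                     # 2 x 2 Gauss-Newton Hessian
    return np.linalg.solve(H, sd.T @ error)

def scene(x, y):
    """Smooth synthetic image (a Gaussian blob, assumed for this sketch)."""
    return np.exp(-((x - 16.0) ** 2 + (y - 14.0) ** 2) / (2 * 6.0 ** 2))

ys, xs = np.mgrid[0:32, 0:32].astype(float)
true_shift = np.array([0.4, -0.3])                          # [p1, p2]
template = scene(xs + true_shift[0], ys + true_shift[1])    # T(x) = I(x + p*)
warped = scene(xs, ys)                                      # I(W(x; p)) at p = 0

dp = gauss_newton_step(warped, template)   # close to true_shift for this small motion
```

Because the shift is well below one pixel and the image is smooth, the linearization is accurate and one step already lands near the true translation; larger motions need the iteration.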
The Lucas-Kanade algorithm—Summary
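Iterate:
1. Warp $I$ with $\mathbf{W}(\mathbf{x}; \mathbf{p})$
2. Warp the gradient $\nabla I^\top$ with $\mathbf{W}(\mathbf{x}; \mathbf{p})$
3. Evaluate the Jacobian $\frac{\partial \mathbf{W}}{\partial \mathbf{p}}$ at $(\mathbf{x}; \mathbf{p})$ and compute the steepest descent
image $\nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}}$
4. Compute the $H = \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]$
5. Compute $\Delta\mathbf{p} = H^{-1} \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top \left[ T(\mathbf{x}) - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right]$
6. Update the parameters $\mathbf{p} \leftarrow \mathbf{p} + \Delta\mathbf{p}$
until $\|\Delta\mathbf{p}\| \le \epsilon$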
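The six steps above can be sketched for the pure translation warp. To stay self-contained the sketch warps an analytic synthetic image (a Gaussian blob) by sampling it at shifted coordinates, so no interpolation is needed; with a real image, steps 1 and 2 would require interpolating $I$ and its gradient. All names and the test motion are assumptions for this example.

```python
import numpy as np

def scene(x, y):
    """Smooth synthetic image I (a Gaussian blob, sigma = 6)."""
    return np.exp(-((x - 16.0) ** 2 + (y - 14.0) ** 2) / (2 * 6.0 ** 2))

def scene_gradient(x, y):
    """Analytic gradient [dI/dx, dI/dy] of the synthetic image (sigma^2 = 36)."""
    value = scene(x, y)
    return -(x - 16.0) / 36.0 * value, -(y - 14.0) / 36.0 * value

def track_translation(template, xs, ys, p, eps=1e-6, max_iter=50):
    p = np.asarray(p, dtype=float).copy()
    for iteration in range(1, max_iter + 1):
        warped = scene(xs + p[0], ys + p[1])               # 1. warp I
        gx, gy = scene_gradient(xs + p[0], ys + p[1])      # 2. warp the gradient
        sd = np.stack([gx.ravel(), gy.ravel()], axis=1)    # 3. Jacobian is the identity
        H = sd.T @ sd                                      # 4. Gauss-Newton Hessian
        error = (template - warped).ravel()
        dp = np.linalg.solve(H, sd.T @ error)              # 5. parameter increment
        p += dp                                            # 6. update
        if np.linalg.norm(dp) <= eps:                      # stop when the step is tiny
            break
    return p, iteration

ys, xs = np.mgrid[0:32, 0:32].astype(float)
true_p = np.array([2.5, -1.5])
template = scene(xs + true_p[0], ys + true_p[1])           # T(x) = I(W(x; p*))

p, iterations = track_translation(template, xs, ys, p=[0.0, 0.0])
```

Here the initial state lies inside the basin of attraction, so the estimate reaches `true_p` in a handful of iterations; starting much farther away on the blob's flat tail can make the same loop diverge.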
Example of convergence
video
Example of convergence
Convergence video: Initial state is within the basin of attraction
Example of divergence
Divergence video: Initial state is outside the basin of attraction
Example – on-line demo
Let's play and see . . .
What are good features (windows) to track?
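How to select good templates $T(\mathbf{x})$ for image registration and object tracking?
\[
\Delta\mathbf{p} = H^{-1} \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top
\left[ T(\mathbf{x}) - I(\mathbf{W}(\mathbf{x}; \mathbf{p})) \right]
\]
where $H$ is the matrix
\[
H = \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top
\left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]
\]
The stability of the iteration is mainly influenced by the inverse of the Hessian.
We can study its eigenvalues. Consequently, the criterion of a good feature
window is $\min(\lambda_1, \lambda_2) > \lambda_{\min}$ (texturedness).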
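A small numerical sketch of why the eigenvalues matter: when one eigenvalue of $H$ is tiny, a small perturbation of the right-hand side is strongly amplified in $\Delta\mathbf{p}$. The two example matrices, the threshold of 1.0, and the helper names are invented for illustration.

```python
import numpy as np

def is_good_feature(H, lambda_min):
    """Texturedness test: the smaller eigenvalue of the symmetric 2x2
    matrix H must exceed lambda_min."""
    eigenvalues = np.linalg.eigvalsh(H)      # returned in ascending order
    return bool(eigenvalues[0] > lambda_min)

def update_sensitivity(H, b, noise):
    """Norm of the change in dp = H^-1 b when b is perturbed by noise."""
    return float(np.linalg.norm(np.linalg.solve(H, b + noise) - np.linalg.solve(H, b)))

b = np.array([1.0, 1.0])
noise = np.array([0.0, 0.01])
H_textured = np.array([[50.0, 5.0], [5.0, 40.0]])         # both eigenvalues large
H_one_direction = np.array([[90.0, 0.0], [0.0, 0.001]])   # one eigenvalue near zero

sens_textured = update_sensitivity(H_textured, b, noise)             # about 2.5e-4
sens_one_direction = update_sensitivity(H_one_direction, b, noise)   # about 10
```

The same perturbation moves the update by a negligible amount for the well-conditioned matrix and by about 10 parameter units for the nearly singular one, which is the instability the criterion is meant to exclude.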
What are good features for translations?
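Consider translation $\mathbf{W}(\mathbf{x}; \mathbf{p}) = \begin{bmatrix} x + p_1 \\ y + p_2 \end{bmatrix}$. The Jacobian is then
$\frac{\partial \mathbf{W}}{\partial \mathbf{p}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
\begin{align*}
H &= \sum_{\mathbf{x}} \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right]^\top \left[ \nabla I^\top \frac{\partial \mathbf{W}}{\partial \mathbf{p}} \right] \\
  &= \sum_{\mathbf{x}} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
     \begin{bmatrix} \frac{\partial I}{\partial x} \\ \frac{\partial I}{\partial y} \end{bmatrix}
     \left[ \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right]
     \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\
  &= \sum_{\mathbf{x}} \begin{bmatrix}
     \left( \frac{\partial I}{\partial x} \right)^2 & \frac{\partial I}{\partial x} \frac{\partial I}{\partial y} \\
     \frac{\partial I}{\partial x} \frac{\partial I}{\partial y} & \left( \frac{\partial I}{\partial y} \right)^2
     \end{bmatrix}
\end{align*}
Good features are image windows with varying derivatives in both directions.
Homogeneous areas are clearly not suitable. Texture oriented mostly in one
direction only would cause instability for this translation.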
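The three cases named above (homogeneous area, texture in one direction, variation in both directions) can be checked numerically by building $H$ from the gradients of small synthetic windows. The 16x16 windows, the sinusoidal patterns, and the use of `np.gradient` are assumptions for this sketch.

```python
import numpy as np

def translation_hessian(window):
    """H for the translation warp: sums of gradient products over the window."""
    gy, gx = np.gradient(window.astype(float))   # dI/dy (rows), dI/dx (columns)
    return np.array([[(gx * gx).sum(), (gx * gy).sum()],
                     [(gx * gy).sum(), (gy * gy).sum()]])

def min_eigenvalue(window):
    """Smaller eigenvalue of H, the quantity compared against lambda_min."""
    return float(np.linalg.eigvalsh(translation_hessian(window))[0])

ys, xs = np.mgrid[0:16, 0:16]
flat = np.full((16, 16), 100.0)                               # homogeneous area
stripes = 100.0 + 50.0 * np.sin(xs / 2.0)                     # texture in x only
blob = 100.0 + 50.0 * np.sin(xs / 2.0) * np.sin(ys / 2.0)     # varies in x and y

lam_flat = min_eigenvalue(flat)
lam_stripes = min_eigenvalue(stripes)
lam_blob = min_eigenvalue(blob)
```

Only the window that varies in both directions gets a large smaller eigenvalue; the flat and striped windows give essentially zero, so the translation along the stripes cannot be recovered from them.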
What are the good points for translations?
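The matrix
\[
H = \sum_{\mathbf{x}} \begin{bmatrix}
\left( \frac{\partial I}{\partial x} \right)^2 & \frac{\partial I}{\partial x} \frac{\partial I}{\partial y} \\
\frac{\partial I}{\partial x} \frac{\partial I}{\partial y} & \left( \frac{\partial I}{\partial y} \right)^2
\end{bmatrix}
\]
should have large eigenvalues. We have seen the matrix already, where?
Harris corner detector [2]! The matrix is sometimes called the Harris matrix.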
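For comparison, the Harris detector scores the same matrix without computing eigenvalues explicitly, using $\det(H) - k\,\mathrm{trace}(H)^2$. The sketch below puts that response next to the smaller-eigenvalue criterion; the constant $k = 0.04$ is a commonly used value assumed here, and the three matrices are made-up examples.

```python
import numpy as np

def harris_response(H, k=0.04):
    """Harris corner response det(H) - k * trace(H)^2 (k is an assumed constant)."""
    return float(np.linalg.det(H) - k * np.trace(H) ** 2)

def min_eigenvalue(H):
    """Smaller eigenvalue of the symmetric 2x2 matrix H."""
    return float(np.linalg.eigvalsh(H)[0])

H_corner = np.array([[50.0, 5.0], [5.0, 40.0]])    # gradients in both directions
H_edge = np.array([[90.0, 0.0], [0.0, 0.001]])     # gradients in one direction
H_flat = np.zeros((2, 2))                          # no gradients at all

responses = {name: harris_response(H)
             for name, H in [("corner", H_corner), ("edge", H_edge), ("flat", H_flat)]}
smallest = {name: min_eigenvalue(H)
            for name, H in [("corner", H_corner), ("edge", H_edge), ("flat", H_flat)]}
```

Both measures single out the corner-like matrix: its Harris response is clearly positive and its smaller eigenvalue is large, while the edge-like matrix gets a negative response and the flat one a response of zero.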
Experiments - no occlusions
video
Experiments - occlusions
video
Experiments - occlusions with dissimilarity
video
Experiments - object motion
video
Experiments – door tracking
video
Experiments – door tracking – smoothed
video
Comparison of ncc vs KLT tracking
video
References
[1] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal
of Computer Vision, 56(3):221–255, 2004.
[2] C. Harris and M. Stephens. A combined corner and edge detector. In M. M. Matthews, editor,
Proceedings of the 4th ALVEY vision conference, pages 147–151, University of Manchester, England,
September 1988. On-line copies available on the web.
[3] J.P. Lewis. Fast template matching. In Vision Interface, pages 120–123, 1995. Extended version
published on-line as "Fast Normalized Cross-Correlation" at
http://scribblethink.org/Work/nvisionInterface/nip.html.
[4] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to
stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 674–679,
August 1981.
[5] Jianbo Shi and Carlo Tomasi. Good features to track. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pages 593–600, 1994.
[6] Carlo Tomasi and Takeo Kanade. Detection and tracking of point features. Technical Report
CMU-CS-91-132, Carnegie Mellon University, April 1991.
End