Local Image Features

Local Image Features
Ali Borji
!
UWM
Many slides from James Hayes, Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial
Overview of Keypoint Matching
1. Find a set of distinctive key-
points
B3
A1
A2
2. Define a region around each keypoint
A3
B2
B1
fA
fB
d ( f A, fB ) < T
K. Grauman, B. Leibe
3. Extract and normalize the region content
4. Compute a local descriptor from the
normalized region
5. Match local descriptors
SIFT
Why local features?
•
Why
Local
Features?
Click
to
edit
title style
LocalFeature=Interest“Point”=Keypoint=F
“Point”= Distinguished Region =
• eature
Local Feature = Interest “Point” = Keypoint = Feature Covariant
Region
“Point”= Distinguished Region = Covariant Region
()
local descriptor
Local : robust to occlusion/clutter
Local
: robust
to occlusion/clutter
Invariant
: to scale/view/illumination
changes
Invariant: to scale/view/illumination changes
Local feature can be discriminant!
3
Local feature can be discriminant!
Slide credit: Liangliang Cao
4
Slide credit: B. Freeman and A. Torralba
Image/patch representations
!
• Templates
– Intensity, gradients, etc.
!
!
• Histograms
– Color, texture, SIFT descriptors, etc.
Development of Low-Level Image Features
1999
Classical
features
• Raw pixel
• Histogram feature
- Color histogram
- Edge histogram
• Frequency analysis
• Image filters
• Texture features
- LBP
• Scene features
- GIST
• Shape descriptors
• Edge detection
• Corner detection
•
•
•
•
•
•
•
•
•
•
•
•
Local Descriptors
SIFT
HOG
SURF
DAISY
BRIEF
…
DoG
Hessian detector
Laplacian of Harris
FAST
ORB
…
6
Slide credit: Liangliang Cao
Raw Pixels and Histograms
Concatenating
Rawtitle
Pixels
As 1D
Click to edit
style
Concatenating
Raw
Pixels
1DVector
Vector
Concatenating
Rawtitle
Pixels
As 1D Vector
Click to edit
style
Credit: The Face Research Lab
Credit: The Face Research Lab
Visual Recognition And Search
Visual Recognition And Search
Columbia University, Spring 2013
7
7
Columbia University, Spring 201
7
Slide credit: Liangliang Cao
Raw Pixels and Histograms
Raw Pixels and Histograms
Concatenated
Raw
Pixels
Concatenated
Raw
Pixels
Click to edit title style
Concatenated
Raw
Pixels
Click
to
edit
title
style
Famous applications (widely used in ML field)
Famous applications (widely used in ML field)
Famous
applications
(widely used in ML field)
•
Face
recognition
• Face recognition
• Face
recognition
!
!
•• Hand
written digitsdigits
Hand-written
• Hand written digits
Pictures courtesy to Sam Roweis
Pictures courtesy to Sam Roweis
Visual Recognition And Search
8
Columbia University, Spring 2013
Visual Recognition
AndtoSearch
Pictures
courtesy
Sam Roweis
8
Columbia University, Spring 2013
8
Slide credit: Liangliang Cao
Raw patches as local descriptors
The simplest way to describe the
neighborhood around an interest
point is to write down the list of
intensities to form a feature
vector.
!
But this is very sensitive to even
small shifts, rotations.
Slide credit: K. Grauman
Raw Pixels and Histograms
Tiny Images
Click to Tiny
editImages
title style
Antonio Torralba et al. proposed to resize
Antonio Torralba et al proposed to resize images to
images
to 32x32 color thumbnails, which are
32x32 color thumbnails, which are called “tiny called
“tiny images”
images”
Related applications
• Related applications
– Scene recognition
- –Scene
recognition
Object recognition
- Object recognition
Visual Recognition And Search
Fast speed with limited accuracy
Fast speed with limited accuracy
Torralba et al, MIT CSAIL report, 200
Columbia University, Spring
10 2013
9
Torralba et al, MIT CSAIL report, 2007Slide credit: Liangliang Cao
Problem
of
raw-pixel
based
Raw Pixels and Histograms
representation
-Rely of
heavily
onbased
good
alignment
Click
to
edit
title
style
Problem
raw-pixel
representation
-Assume
images
are of similar scale – Rely heavilythe
on good
alignment
-Suffer
occlusion
scale
– Assumefrom
the images
are of similar
-Recognition
from different view point will
– Suffer from occlusion
be difficult – Recognition from different view point will be difficult
We
want
moremore
powerful
features for
real-world
problems
We
want
powerful
features
for
reallike the problems
following
world
like the following
11
Slide credit: Liangliang Cao
Image Representations: Histograms
Global histogram
• Represent distribution of features
– Color, texture, depth, …
Images from Dave Kauchak
Space Shuttle
Cargo Bay
Image Representations: Histograms
Histogram: Probability or count of data in each bin
• Joint histogram
– Requires lots of data
– Loss of resolution to avoid empty bins
Images from Dave Kauchak
Marginal histogram
•
•
Requires independent features
More data/bin than joint histogram
Raw Pixels
and Histograms
Color
Histogram
Color
Click to
editHistogram
title
styleof color
Distribution
Each pixel is
Each pixel by
is described
described
a vector
pixel
by aof
vector
of values
pixel values
Distribution
color vectorsby
vectors isofdescribed
a histogram
is described
by a histogram
r
g
b
Note: There are different choices for color space: RGB, HSV, Lab, etc.
Note:
There
are we
different
choices
for color
RGB, HSV,
For gray
images,
usually use
256 or fewer
bins forspace:
histogram.
Lab, etc. For gray images, we usually use 256 or fewer bins
for histogram.
[Swain and Ballard 91]
[Swain and Ballard 91]
14
Slide credit: Liangliang Cao
What kind of things do we compute histograms of?
!
• Color
!
!
!
!
!
!
L*a*b* color space
HSV color space
• Texture (filter banks or HOG over regions)
Raw Pixels and Histograms
Benefits
of
Histogram
Representations
Benefits
of
Histogram
Representation
Click to edit title style
No
longer
sensitive
to alignment,
scale or
No longer
sensitive
to alignment,
scale transform,
transform,
even global rotation
even globalor
rotation.
Similar color histograms
(after normalization).
Similar color histograms
(after normalization
)
Visual Recognition And Search
12
Columbia University, Spring 2013
16
Slide credit: Liangliang Cao
Limitation of Global Histogram
Global histogram has no location information at all
All of these images have the same color histogram!
17
Slide credit: Ali Farhadi
Spatial pyramid representation
• Extension of a bag of features
• Locally orderless representation at several levels of resolution
level 0
Lazebnik, Schmid & Ponce (CVPR 2006)
Slide credit: Ali Farhadi
Spatial pyramid representation
• Extension of a bag of features
• Locally orderless representation at several levels of resolution
level 0
level 1
Lazebnik, Schmid & Ponce (CVPR 2006)
Slide credit: Ali Farhadi
Spatial pyramid representation
• Extension of a bag of features
• Locally orderless representation at several levels of resolution
level 0
level 1
Lazebnik, Schmid & Ponce (CVPR 2006)
level 2
Slide credit: Ali Farhadi
Raw
Pixels Histogram
and Histograms
General
Representation
Raw Pixels and Histograms
General
Histogram
Representation
Click
to
edit
title
style
General
Histogram
Representation
• Color
histogram
Click
Color histogram to edit title style
Color
histogram
! patch
!
!
!
patch
patch
patch
patch
patch
Extract color descriptor
Extract color descriptor
histogram
histogram
accumulation
accumulation
General histogram
General
histogram
• General
histogram
patch
patch
patch
patch
patch
patch
Extract other descriptors
Extract other descriptors
• edge histogram
edge histogram
• shape
context histogram
shapebinary
context
histogram
•• local
patterns
• local binary patterns
Visual Recognition And Search
histogram
histogram
accumulation
accumulation
21
17
Columbia
University,
Spring 2013
Slide
credit:
Liangliang
Cao
Raw Pixels and Histograms
Local
Binary
Pattern
(LBP)
Local
Binary
Pattern
Click to edit title style (LBP)
• For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8,
where bi = 0 if neighbor i has value less than or equal to p’s
value and 1 otherwise.
• Represent the texture in the image (or a region) by the
histogram of these numbers.
1
8
3
100 101 103
40 50 80
50 60 90
7
Visual Recognition And Search
2
6
11111100
4
5
Ojala et al, PR’96, PAMI’02
Columbia University, Spring 2013
18
Ojala et al, PR’96, PAMI’02
22
Slide credit: Liangliang Cao
Raw Pixels and Histograms
LBP
Histogram
LBP Histogram
Click to edit title style
• Divide the examined window to cells (e.g. 16x16 pixels for
each cell).
• Compute the histogram, over the cell, of the frequency of
each "number" occurring.
• Optionally normalize the histogram.
• Concatenate normalized histograms of all cells.
Visual Recognition And Search
19
Columbia University, Spring 2013
23
Slide credit: Liangliang Cao
Steps
Histogram of Oriented (HOG)
descriptor
Histograms of Oriented Gradients for Human Detection [N. Dalal and B. Triggs CVPR 2005]
24
25
26
27
28
SIFT descriptor [Lowe 2004]
Full version
• Divide the 16x16 window into a 4x4 grid of cells
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
!
0
2p
Why subpatches?
Why does SIFT have
some illumination
invariance?
Slide credit: K. Grauman
SIFT descriptor
Full version
•
•
•
•
Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
Compute an orientation histogram for each cell
16 cells * 8 orientations = 128 dimensional descriptor
Threshold normalize the descriptor:
!
!
such that:
0.2
Adapted from slide by David Lowe
Making descriptor rotation invariant
CSE 576: Computer Vision
• Rotate patch according to its dominant gradient
orientation
• This puts the patches into a canonical orientation.
Image from Matthew Brown
Slide credit: K. Grauman
SIFT properties
Extraordinarily robust matching technique
Can handle changes in viewpoint
Up to about 60 degree out of plane rotation
Can handle significant changes in illumination
Sometimes even day vs. night (below)
Fast and efficient—can run in real time
Lots of code available
http://people.csail.mit.edu/albert/ladypack/wiki/index.php/
Known_implementations_of_SIFT
Slide credit: K. Grauman, S. Seitz
Example
NASA Mars Rover images
Slide credit: K. Grauman
Example
NASA Mars Rover images
with SIFT feature matches
Figure by Noah Snavely
Slide credit: K. Grauman
Example: Object Recognition
SIFT is extremely powerful for object instance
recognition, especially for well-textured objects
Lowe, IJCV04
Example: Google Goggle
SIFT properties
• Invariant to
– Scale – Rotation
!
• Partially invariant to
– Illumination changes
– Camera viewpoint
– Occlusion, clutter
Slide credit: K. Grauman
38
Slide credit: R. Urtasun
SIFT Repeatability
Lowe IJCV 2004
SIFT Repeatability
Recognition of specific objects, scenes
Schmid and Mohr 1997
Rothganger et al. 2003
Kristen Grauman
Sivic and Zisserman, 2003
Lowe 2002
When does SIFT fail?
Patches SIFT thought were the same but aren’t:
Other methods: Daisy
Circular gradient binning
SIFT
Daisy
Picking the best DAISY, S. Winder, G. Hua, M. Brown, CVPR 09
Local Descriptors: SURF
(Speeded Up Robust Features)
• Fast approximation of SIFT idea
➢
➢
Efficient computation by 2D box filters &
integral images
⇒ 6 times faster than SIFT
Equivalent quality for object
identification
• GPU implementation available
➢
➢
Feature extraction @ 200Hz
(detector + descriptor, 640×480 img)
http://www.vision.ee.ethz.ch/~surf
[Bay, ECCV’06], [Cornelis, CVGPU’08]
K. Grauman, B. Leibe
Other methods: SURF
For computational efficiency only compute gradient
histogram with 4 bins:
SURF: Speeded Up Robust Features
Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, ECCV 2006
Other methods: BRIEF
Randomly sample pair of pixels a and b.
1 if a > b, else 0. Store binary vector.
Daisy
BRIEF: binary robust independent elementary features, Calonder,
V Lepetit, C Strecha, ECCV 2010
Local Descriptors: Shape Context
Count the number of points
inside each bin, e.g.:
Count = 4
...
Count = 10
Log-polar binning: more
precision for nearby points,
more flexibility for farther
points.
Belongie & Malik, ICCV 2001
K. Grauman, B. Leibe
Shape Context Descriptor
Shape Context Descriptor
• Now in order for a feature descriptor to be useful, it needs to have
certain invariances. In particular it needs to be invariant to
translation, scale, small perturbations, and depending on
application rotation. Translational invariance come naturally to
shape context. Scale invariance is obtained by normalizing all radial
distances by the mean distance \alpha between all the point pairs in
the shape [2][3] although the median distance can also be used.[1]
[4] Shape contexts are empirically demonstrated to be robust to
deformations, noise, and outliers[4] using synthetic point set
matching experiments.[5]
!
• One can provide complete rotation invariance in shape contexts.
One way is to measure angles at each point relative to the direction
of the tangent at that point (since the points are chosen on edges).
This results in a completely rotationally invariant descriptor. But of
course this is not always desired since some local features lose their
discriminative power if not measured relative to the same frame.
Many applications in fact forbid rotation invariance e.g.
49
distinguishing a "6" from a “9". [from WIKIPEDIA]
http://www.acberg.com/gb.html
Local Descriptors: Geometric Blur
Compute
edges at four
orientations
Extract a patch
in each channel
~
Apply spatially varying
blur and sub-sample
Example descriptor
(Idealized signal)
Berg & Malik, CVPR 2001
K. Grauman, B. Leibe
Self-similarity Descriptor
Matching Local Self-Similarities across Images
and Videos, Shechtman and Irani, 2007
Self-similarity Descriptor
Matching Local Self-Similarities across Images
and Videos, Shechtman and Irani, 2007
Self-similarity Descriptor
Matching Local Self-Similarities across Images
and Videos, Shechtman and Irani, 2007
Learning Local Image Descriptors, Winder and
Brown, 2007
Choosing a detector
!
• What do you want it for?
– Precise localization in x-y: Harris
– Good localization in scale: Difference of Gaussian
– Flexible region shape: MSER
!
• Best choice often application dependent
– Harris-/Hessian-Laplace/DoG work well for many natural categories
– MSER works well for buildings and printed things
!
• Why choose?
– Get more points with more detectors
!
• There have been extensive evaluations/comparisons
– [Mikolajczyk et al., IJCV’05, PAMI’05]
– All detectors/descriptors shown here work well
Comparison of Keypoint Detectors
Tuytelaars Mikolajczyk 2008
Local Descriptors
• Most features can be thought of as templates,
histograms (counts), or combinations
• The ideal descriptor should be
–
–
–
–
Robust
Distinctive
Compact
Efficient
• Most available descriptors focus on edge/
gradient information
– Capture texture information
– Color rarely used
K. Grauman, B. Leibe
Things to remember
!
• Keypoint detection:
repeatable and distinctive
– Corners, blobs, stable regions
– Harris, DoG
!
!
• Descriptors: robust and
selective
– spatial histograms of orientation
– SIFT