Local Image Features Ali Borji ! UWM Many slides from James Hayes, Derek Hoiem and Grauman&Leibe 2008 AAAI Tutorial Overview of Keypoint Matching 1. Find a set of distinctive key- points B3 A1 A2 2. Define a region around each keypoint A3 B2 B1 fA fB d ( f A, fB ) < T K. Grauman, B. Leibe 3. Extract and normalize the region content 4. Compute a local descriptor from the normalized region 5. Match local descriptors SIFT Why local features? • Why Local Features? Click to edit title style LocalFeature=Interest“Point”=Keypoint=F “Point”= Distinguished Region = • eature Local Feature = Interest “Point” = Keypoint = Feature Covariant Region “Point”= Distinguished Region = Covariant Region () local descriptor Local : robust to occlusion/clutter Local : robust to occlusion/clutter Invariant : to scale/view/illumination changes Invariant: to scale/view/illumination changes Local feature can be discriminant! 3 Local feature can be discriminant! Slide credit: Liangliang Cao 4 Slide credit: B. Freeman and A. Torralba Image/patch representations ! • Templates – Intensity, gradients, etc. ! ! • Histograms – Color, texture, SIFT descriptors, etc. Development of Low-Level Image Features 1999 Classical features • Raw pixel • Histogram feature - Color histogram - Edge histogram • Frequency analysis • Image filters • Texture features - LBP • Scene features - GIST • Shape descriptors • Edge detection • Corner detection • • • • • • • • • • • • Local Descriptors SIFT HOG SURF DAISY BRIEF … DoG Hessian detector Laplacian of Harris FAST ORB … 6 Slide credit: Liangliang Cao Raw Pixels and Histograms Concatenating Rawtitle Pixels As 1D Click to edit style Concatenating Raw Pixels 1DVector Vector Concatenating Rawtitle Pixels As 1D Vector Click to edit style Credit: The Face Research Lab Credit: The Face Research Lab Visual Recognition And Search Visual Recognition And Search Columbia University, Spring 2013 7 7 Columbia University, Spring 201 7 Slide credit: Liangliang Cao Raw Pixels and Histograms Raw Pixels and Histograms Concatenated Raw Pixels Concatenated Raw Pixels Click to edit title style Concatenated Raw Pixels Click to edit title style Famous applications (widely used in ML field) Famous applications (widely used in ML field) Famous applications (widely used in ML field) • Face recognition • Face recognition • Face recognition ! ! •• Hand written digitsdigits Hand-written • Hand written digits Pictures courtesy to Sam Roweis Pictures courtesy to Sam Roweis Visual Recognition And Search 8 Columbia University, Spring 2013 Visual Recognition AndtoSearch Pictures courtesy Sam Roweis 8 Columbia University, Spring 2013 8 Slide credit: Liangliang Cao Raw patches as local descriptors The simplest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector. ! But this is very sensitive to even small shifts, rotations. Slide credit: K. Grauman Raw Pixels and Histograms Tiny Images Click to Tiny editImages title style Antonio Torralba et al. proposed to resize Antonio Torralba et al proposed to resize images to images to 32x32 color thumbnails, which are 32x32 color thumbnails, which are called “tiny called “tiny images” images” Related applications • Related applications – Scene recognition - –Scene recognition Object recognition - Object recognition Visual Recognition And Search Fast speed with limited accuracy Fast speed with limited accuracy Torralba et al, MIT CSAIL report, 200 Columbia University, Spring 10 2013 9 Torralba et al, MIT CSAIL report, 2007Slide credit: Liangliang Cao Problem of raw-pixel based Raw Pixels and Histograms representation -Rely of heavily onbased good alignment Click to edit title style Problem raw-pixel representation -Assume images are of similar scale – Rely heavilythe on good alignment -Suffer occlusion scale – Assumefrom the images are of similar -Recognition from different view point will – Suffer from occlusion be difficult – Recognition from different view point will be difficult We want moremore powerful features for real-world problems We want powerful features for reallike the problems following world like the following 11 Slide credit: Liangliang Cao Image Representations: Histograms Global histogram • Represent distribution of features – Color, texture, depth, … Images from Dave Kauchak Space Shuttle Cargo Bay Image Representations: Histograms Histogram: Probability or count of data in each bin • Joint histogram – Requires lots of data – Loss of resolution to avoid empty bins Images from Dave Kauchak Marginal histogram • • Requires independent features More data/bin than joint histogram Raw Pixels and Histograms Color Histogram Color Click to editHistogram title styleof color Distribution Each pixel is Each pixel by is described described a vector pixel by aof vector of values pixel values Distribution color vectorsby vectors isofdescribed a histogram is described by a histogram r g b Note: There are different choices for color space: RGB, HSV, Lab, etc. Note: There are we different choices for color RGB, HSV, For gray images, usually use 256 or fewer bins forspace: histogram. Lab, etc. For gray images, we usually use 256 or fewer bins for histogram. [Swain and Ballard 91] [Swain and Ballard 91] 14 Slide credit: Liangliang Cao What kind of things do we compute histograms of? ! • Color ! ! ! ! ! ! L*a*b* color space HSV color space • Texture (filter banks or HOG over regions) Raw Pixels and Histograms Benefits of Histogram Representations Benefits of Histogram Representation Click to edit title style No longer sensitive to alignment, scale or No longer sensitive to alignment, scale transform, transform, even global rotation even globalor rotation. Similar color histograms (after normalization). Similar color histograms (after normalization ) Visual Recognition And Search 12 Columbia University, Spring 2013 16 Slide credit: Liangliang Cao Limitation of Global Histogram Global histogram has no location information at all All of these images have the same color histogram! 17 Slide credit: Ali Farhadi Spatial pyramid representation • Extension of a bag of features • Locally orderless representation at several levels of resolution level 0 Lazebnik, Schmid & Ponce (CVPR 2006) Slide credit: Ali Farhadi Spatial pyramid representation • Extension of a bag of features • Locally orderless representation at several levels of resolution level 0 level 1 Lazebnik, Schmid & Ponce (CVPR 2006) Slide credit: Ali Farhadi Spatial pyramid representation • Extension of a bag of features • Locally orderless representation at several levels of resolution level 0 level 1 Lazebnik, Schmid & Ponce (CVPR 2006) level 2 Slide credit: Ali Farhadi Raw Pixels Histogram and Histograms General Representation Raw Pixels and Histograms General Histogram Representation Click to edit title style General Histogram Representation • Color histogram Click Color histogram to edit title style Color histogram ! patch ! ! ! patch patch patch patch patch Extract color descriptor Extract color descriptor histogram histogram accumulation accumulation General histogram General histogram • General histogram patch patch patch patch patch patch Extract other descriptors Extract other descriptors • edge histogram edge histogram • shape context histogram shapebinary context histogram •• local patterns • local binary patterns Visual Recognition And Search histogram histogram accumulation accumulation 21 17 Columbia University, Spring 2013 Slide credit: Liangliang Cao Raw Pixels and Histograms Local Binary Pattern (LBP) Local Binary Pattern Click to edit title style (LBP) • For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has value less than or equal to p’s value and 1 otherwise. • Represent the texture in the image (or a region) by the histogram of these numbers. 1 8 3 100 101 103 40 50 80 50 60 90 7 Visual Recognition And Search 2 6 11111100 4 5 Ojala et al, PR’96, PAMI’02 Columbia University, Spring 2013 18 Ojala et al, PR’96, PAMI’02 22 Slide credit: Liangliang Cao Raw Pixels and Histograms LBP Histogram LBP Histogram Click to edit title style • Divide the examined window to cells (e.g. 16x16 pixels for each cell). • Compute the histogram, over the cell, of the frequency of each "number" occurring. • Optionally normalize the histogram. • Concatenate normalized histograms of all cells. Visual Recognition And Search 19 Columbia University, Spring 2013 23 Slide credit: Liangliang Cao Steps Histogram of Oriented (HOG) descriptor Histograms of Oriented Gradients for Human Detection [N. Dalal and B. Triggs CVPR 2005] 24 25 26 27 28 SIFT descriptor [Lowe 2004] Full version • Divide the 16x16 window into a 4x4 grid of cells • Compute an orientation histogram for each cell • 16 cells * 8 orientations = 128 dimensional descriptor ! 0 2p Why subpatches? Why does SIFT have some illumination invariance? Slide credit: K. Grauman SIFT descriptor Full version • • • • Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below) Compute an orientation histogram for each cell 16 cells * 8 orientations = 128 dimensional descriptor Threshold normalize the descriptor: ! ! such that: 0.2 Adapted from slide by David Lowe Making descriptor rotation invariant CSE 576: Computer Vision • Rotate patch according to its dominant gradient orientation • This puts the patches into a canonical orientation. Image from Matthew Brown Slide credit: K. Grauman SIFT properties Extraordinarily robust matching technique Can handle changes in viewpoint Up to about 60 degree out of plane rotation Can handle significant changes in illumination Sometimes even day vs. night (below) Fast and efficient—can run in real time Lots of code available http://people.csail.mit.edu/albert/ladypack/wiki/index.php/ Known_implementations_of_SIFT Slide credit: K. Grauman, S. Seitz Example NASA Mars Rover images Slide credit: K. Grauman Example NASA Mars Rover images with SIFT feature matches Figure by Noah Snavely Slide credit: K. Grauman Example: Object Recognition SIFT is extremely powerful for object instance recognition, especially for well-textured objects Lowe, IJCV04 Example: Google Goggle SIFT properties • Invariant to – Scale – Rotation ! • Partially invariant to – Illumination changes – Camera viewpoint – Occlusion, clutter Slide credit: K. Grauman 38 Slide credit: R. Urtasun SIFT Repeatability Lowe IJCV 2004 SIFT Repeatability Recognition of specific objects, scenes Schmid and Mohr 1997 Rothganger et al. 2003 Kristen Grauman Sivic and Zisserman, 2003 Lowe 2002 When does SIFT fail? Patches SIFT thought were the same but aren’t: Other methods: Daisy Circular gradient binning SIFT Daisy Picking the best DAISY, S. Winder, G. Hua, M. Brown, CVPR 09 Local Descriptors: SURF (Speeded Up Robust Features) • Fast approximation of SIFT idea ➢ ➢ Efficient computation by 2D box filters & integral images ⇒ 6 times faster than SIFT Equivalent quality for object identification • GPU implementation available ➢ ➢ Feature extraction @ 200Hz (detector + descriptor, 640×480 img) http://www.vision.ee.ethz.ch/~surf [Bay, ECCV’06], [Cornelis, CVGPU’08] K. Grauman, B. Leibe Other methods: SURF For computational efficiency only compute gradient histogram with 4 bins: SURF: Speeded Up Robust Features Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, ECCV 2006 Other methods: BRIEF Randomly sample pair of pixels a and b. 1 if a > b, else 0. Store binary vector. Daisy BRIEF: binary robust independent elementary features, Calonder, V Lepetit, C Strecha, ECCV 2010 Local Descriptors: Shape Context Count the number of points inside each bin, e.g.: Count = 4 ... Count = 10 Log-polar binning: more precision for nearby points, more flexibility for farther points. Belongie & Malik, ICCV 2001 K. Grauman, B. Leibe Shape Context Descriptor Shape Context Descriptor • Now in order for a feature descriptor to be useful, it needs to have certain invariances. In particular it needs to be invariant to translation, scale, small perturbations, and depending on application rotation. Translational invariance come naturally to shape context. Scale invariance is obtained by normalizing all radial distances by the mean distance \alpha between all the point pairs in the shape [2][3] although the median distance can also be used.[1] [4] Shape contexts are empirically demonstrated to be robust to deformations, noise, and outliers[4] using synthetic point set matching experiments.[5] ! • One can provide complete rotation invariance in shape contexts. One way is to measure angles at each point relative to the direction of the tangent at that point (since the points are chosen on edges). This results in a completely rotationally invariant descriptor. But of course this is not always desired since some local features lose their discriminative power if not measured relative to the same frame. Many applications in fact forbid rotation invariance e.g. 49 distinguishing a "6" from a “9". [from WIKIPEDIA] http://www.acberg.com/gb.html Local Descriptors: Geometric Blur Compute edges at four orientations Extract a patch in each channel ~ Apply spatially varying blur and sub-sample Example descriptor (Idealized signal) Berg & Malik, CVPR 2001 K. Grauman, B. Leibe Self-similarity Descriptor Matching Local Self-Similarities across Images and Videos, Shechtman and Irani, 2007 Self-similarity Descriptor Matching Local Self-Similarities across Images and Videos, Shechtman and Irani, 2007 Self-similarity Descriptor Matching Local Self-Similarities across Images and Videos, Shechtman and Irani, 2007 Learning Local Image Descriptors, Winder and Brown, 2007 Choosing a detector ! • What do you want it for? – Precise localization in x-y: Harris – Good localization in scale: Difference of Gaussian – Flexible region shape: MSER ! • Best choice often application dependent – Harris-/Hessian-Laplace/DoG work well for many natural categories – MSER works well for buildings and printed things ! • Why choose? – Get more points with more detectors ! • There have been extensive evaluations/comparisons – [Mikolajczyk et al., IJCV’05, PAMI’05] – All detectors/descriptors shown here work well Comparison of Keypoint Detectors Tuytelaars Mikolajczyk 2008 Local Descriptors • Most features can be thought of as templates, histograms (counts), or combinations • The ideal descriptor should be – – – – Robust Distinctive Compact Efficient • Most available descriptors focus on edge/ gradient information – Capture texture information – Color rarely used K. Grauman, B. Leibe Things to remember ! • Keypoint detection: repeatable and distinctive – Corners, blobs, stable regions – Harris, DoG ! ! • Descriptors: robust and selective – spatial histograms of orientation – SIFT
© Copyright 2024