High-speed object matching and localization using gradient orientation features

Xinyu Xu*, Peter van Beek, Xiaofan Feng
Sharp Laboratories of America, 5750 NW Pacific Rim Blvd, Camas, WA, USA 98607

ABSTRACT

In many robotics and automation applications, it is often required to detect a given object and determine its pose (position and orientation) from input images with high speed, high robustness to photometric changes, and high pose accuracy. We propose a new object matching method that improves efficiency over existing approaches by decomposing orientation and position estimation into two cascaded steps. In the first step, an initial position and orientation are found by matching with Histogram of Oriented Gradients (HOG) descriptors, reducing the orientation search from 2D template matching to 1D correlation matching. In the second step, a more precise orientation and position are computed by matching based on Dominant Orientation Templates (DOT), using robust edge orientation features. The cascaded combination of HOG and DOT features for high-speed and robust object matching is the key novelty of the proposed method. Experimental evaluation was performed with real-world single-object and multi-object inspection datasets, using software implementations on an Atom CPU platform. Our results show that the proposed method achieves a significant speed improvement over an already accelerated template matching method at comparable accuracy.

Keywords: Object matching, fast object search, fast template matching, histogram of oriented gradients, dominant orientation template, coarse to fine search, factory inspection, machine vision, robotics

1. INTRODUCTION

In this paper, we propose a method for object detection and localization in 2-D images. In particular, given an object of interest in a model image, the goal is to automatically detect the object in an input image, as well as to determine its pose (e.g. position, orientation, and scale).
In the input image, the object of interest may have undergone geometric transforms (e.g. rotation, zoom) and photometric changes (e.g. brightness, contrast, blur, noise). The basic task of detecting the presence of an object in the input image may be referred to as object detection. The problem also includes object localization, referring to the accurate estimation of the object's position, orientation angle, and scaling factor with respect to a reference. Systems for detection and localization may involve a search stage; hence the process may also be called image search. This problem is important in applications such as automated manufacturing and inspection in factory production lines, product order fulfillment in automated warehouses, and service robotics. The canonical set-up is illustrated in Figure 1, along with a sample result of the method proposed in this paper.

Typically, the relevant characteristics of the object of interest may be extracted, modeled or learned in a prior analysis stage, and are assumed known before the main image search stage is performed. This model extraction stage is considered off-line, while the main image search stage is considered on-line. Multiple instances of the object of interest may appear in the input image.

A very important goal is to detect and localize multiple objects with very high speed, i.e. short processing time per input image for the on-line search stage. A second goal is that the system must be capable of handling a wide variety of object types, either with distinguishing features such as sharp corners or significant edges, or with few such features. A third goal is for the system to be robust to non-essential changes in the object's appearance due to lighting or illumination changes, blur, noise, and other degradations due to imaging imperfections. Moreover, the system needs a high degree of repeatability even under varying and noisy conditions.
Another goal is that the system can withstand (partial) occlusion of the object or missing object parts in the input image.

*[email protected]; phone 1 360 834-8766; www.sharplabs.com

Figure 1. Overview of object detection and localization system. Off-line: model image, object model extraction, object model / template database. On-line: input image, object search, object pose.

There is a very wide body of literature on object detection and localization. One well-known class of approaches is based on template matching. In this approach, the object model consists of one or more templates that are extracted from the model image(s). The templates may include pixel-based features such as gray-level or color values, edge locations and orientations, etc. Subsequently, the input image is searched for patterns that are similar to one or more of the templates. This may involve a template matching or pattern matching stage in which the similarity between a template and a pattern from the input image is computed. A major problem with template matching approaches based on exhaustive search is that the processing time is too high.

Another major class of approaches is based on feature point matching, e.g. SIFT [7]. This approach starts with detection of key points (e.g. corner points). The local features of key points are captured using descriptors or a statistical model. Correspondences between individual key points in the model and input image can be found either by matching feature descriptors or by using a classification approach. Subsequently, a global transform describing the object's position, rotation and scale may be robustly determined based on the individual point correspondences. A problem with this class of approaches is that it relies on the presence of stable key points, such as corner points, for which the location can be reliably determined.
Several types of man-made objects common in industrial inspection and service robotics applications may not contain such stable key points.

Another class of approaches includes training-based or learning-based object detection methods [9]. These methods utilize sets of training samples of the object of interest to build a statistical model. Subsequently, a statistical classification approach is used in the on-line search stage. A problem with such methods is that a large number of training samples may be required in order to learn the relevant object characteristics properly, and that the training or learning stage may require significant time. In practical scenarios, new objects may have to be learned frequently, and only a limited number of example images may be available.

In this paper, we propose to combine two successful object detection approaches, namely Dominant Orientation Templates (DOT) [3, 4] and Histogram of Oriented Gradients (HOG) descriptors [1]. Both methods use local gradients and local edge orientations as the main feature, which have been shown to be highly robust, allowing invariance to illumination variations, noise and blur. Both methods also include local pooling of feature responses over small spatial cells or blocks, providing robustness to small changes in edge position or small object deformations.

Our overall framework builds specifically on the DOT concept [3], based on template matching. Template matching considers the global object and works very well for a large variety of object shapes and appearances, including objects with little texture and few or no stable key points. It does not require extraction of precise contour points, which can be fragile in practice. It does not require compilation of a large training set or a training stage. To make template matching successful in practical applications, it is critical to use acceleration techniques to avoid exhaustive search and to use robust features and similarity measures.
A conventional approach for template matching is to use Normalized Cross-Correlation (NCC) between a template and a sliding window in the input image [6]. NCC is robust to linear changes between the signal values in the model template and the input image, but not to non-linear signal changes. A fast method based on a similarity measure that is robust to non-linear signal changes was proposed in [2]. Methods for fast template matching and high-speed object localization have been proposed recently [5, 8]. General techniques for speeding up object matching include coarse-to-fine search (using an image pyramid representation), transform-domain processing (to compute cross-correlation), and the use of integral images.

In object search, multiple templates that represent different views of a single object may need to be used. In the 2-D matching case, multiple templates corresponding to different global orientation angles of a 2-D object can be pre-computed off-line. Hence, multiple templates need to be used during the on-line search stage. Significant acceleration can be achieved by considering the relation between multiple templates that represent different views of the object. In particular, we use a single Histogram of Oriented Gradients descriptor covering the entire 2-D object as a rotation-invariant representation (up to a shift) of the object. This single descriptor can be used in a fast pre-search stage that provides coarse estimates of the position and orientation of candidate objects in the input image. This fast pre-search stage rules out many locations in the input image, as well as many orientation angles for the object at promising locations. This avoids scanning the entire image with multiple templates representing the different possible global orientations of the object. In addition, 1D HOG descriptors can be matched very efficiently using 1D cross-correlation.
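To illustrate why a single descriptor suffices for the orientation pre-search, the following minimal sketch (an illustration, not the authors' implementation) matches two orientation histograms by circular 1D cross-correlation. The 18-bin histogram values and the function names are hypothetical; the only assumption taken from the text is that an in-plane rotation appears as a circular shift of the histogram bins.

```python
def circular_cross_correlation(model_hist, patch_hist):
    """Return the correlation score for every circular shift of the model."""
    n = len(model_hist)
    if len(patch_hist) != n:
        raise ValueError("histograms must have the same number of bins")
    return [
        sum(model_hist[(i - shift) % n] * patch_hist[i] for i in range(n))
        for shift in range(n)
    ]


def best_shift(model_hist, patch_hist):
    """Return (shift, score) maximizing the circular cross-correlation."""
    scores = circular_cross_correlation(model_hist, patch_hist)
    shift = max(range(len(scores)), key=scores.__getitem__)
    return shift, scores[shift]


# Hypothetical 18-bin model histogram (10-degree bins over 0..180 degrees).
model = [9, 1, 0, 0, 4, 0, 0, 0, 0, 7, 0, 0, 2, 0, 0, 0, 0, 1]
# A 30-degree rotation moves every count 3 bins to the right (circularly).
rotated = [model[(i - 3) % 18] for i in range(18)]

shift, score = best_shift(model, rotated)
angle_deg = shift * 10  # bin width of 10 degrees
```

Only 18 shifts of an 18-element vector are evaluated per location, compared with one full 2D template comparison per candidate angle in classical template matching.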
The combination of HOG and DOT as described above serves as a coarse search stage for our overall method. Our application requires accurate estimates of the object positions and orientation angles. In addition, we require searching for multiple object matches. Hence, we utilize additional DOT matching stages for position and orientation refinement. During each stage a list of candidate object matches is maintained; candidate matches are added to this list during the initial search stages, and candidates may be removed or merged after subsequent stages.

In section 2, we describe the proposed object matching and localization method in more detail. In section 3, we provide experimental results demonstrating the high detection performance and robustness of the proposed method, as well as its high pose estimation accuracy and high speed. We conclude in section 4.

2. PROPOSED OBJECT MATCHING AND LOCALIZATION METHOD

2.1 Overview

The algorithm flow of the proposed method is shown in Figure 2. We first discuss offline model image processing, and then online input image processing.

In the offline model image processing phase, we extract both the Histogram of Oriented Gradients (HOG) [1] descriptor and Dominant Orientation Template (DOT) [3] descriptors from the model ROI. The HOG descriptor is computed as a single 1-dimensional histogram of the gradient orientations for the model ROI, and it is used to find the input object orientation by matching the model HOG descriptor with the HOG descriptor of a particular area in the input image in a sliding window manner. The HOG descriptor has a few key advantages over other descriptors for fast and robust object matching. The first advantage is that we can quickly find a coarse object orientation with HOG matching, since a global object orientation change within the 2D image plane is manifested as a shift in the bin index of the local orientation histogram.
This eliminates the need to rotate the model ROI template many times and match each rotated template to the input (as done in classical template matching), leading to large time savings. The second advantage is that, since the HOG descriptor operates on localized cells, it is more robust against geometric changes (except rotation) and photometric changes, including position and scale variations, lighting or illumination change, blur and noise.

However, HOG has some limitations in matching. First, the position estimated by HOG is not accurate because the HOG matching scores are not sufficiently discriminative; second, HOG cannot distinguish an object from the same object rotated by 180°, since their HOG descriptors are the same. To overcome these limitations, we employ matching with DOT. During offline model image processing, we compute DOT descriptors for each rotated model ROI at fine angle resolution. These DOTs are used to find the object position at fine precision. DOT is chosen because it is fast to compute, compact in memory, resilient to photometric transformations (blur, noise, low contrast), robust against occlusion, and able to deal with un-textured objects.

Figure 2. Algorithm flow of the proposed method.

In the online input image processing phase, the orientation and position of the object(s) are determined in three stages. In the first step of the coarse search, we compute a coarse orientation map by matching the model HOG with the input HOG in a sliding window manner. This step also yields a score map where local peaks correspond to candidate object positions. We then compute an updated score map by matching the input DO of each sliding window with the model DOT of the particular orientation given by the coarse orientation map. We then retain the best matches with the highest scores by finding the local peaks in the score map, and remove duplicate/close matches whose orientation and position are similar to each other.
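The removal of duplicate/close matches can be sketched as a greedy suppression over the candidate list. This is only an illustration under stated assumptions: the `Match` record, the distance threshold `min_dist` and the angle threshold `min_angle` are hypothetical, since the paper does not specify how closeness is measured.

```python
import math
from typing import List, NamedTuple


class Match(NamedTuple):
    x: float
    y: float
    angle: float  # degrees
    score: float


def angle_distance(a: float, b: float, period: float = 360.0) -> float:
    """Smallest absolute difference between two angles on a circle."""
    d = abs(a - b) % period
    return min(d, period - d)


def remove_close_matches(matches: List[Match],
                         min_dist: float = 8.0,
                         min_angle: float = 15.0) -> List[Match]:
    """Keep the best-scoring match among candidates that are close in
    both position and orientation (thresholds are assumed values)."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda c: c.score, reverse=True):
        is_duplicate = any(
            math.hypot(m.x - k.x, m.y - k.y) < min_dist
            and angle_distance(m.angle, k.angle) < min_angle
            for k in kept
        )
        if not is_duplicate:
            kept.append(m)
    return kept


candidates = [
    Match(10, 10, 30, 0.90),
    Match(12, 11, 40, 0.80),   # near the first match in position and angle
    Match(100, 50, 30, 0.70),  # far away from the others
    Match(11, 10, 120, 0.60),  # same place, clearly different angle
]
kept = remove_close_matches(candidates)
```

Processing candidates in descending score order means that a weaker candidate can never suppress a stronger one.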
The coarse search provides initial candidate locations at low/block resolution and a rough angular orientation. In the second stage, the middle search, the algorithm refines the spatial position and the angular orientation by matching low/block resolution model DOTs of different orientations with the low/block resolution input DO image in a small spatial neighborhood. In this step we also resolve the 180° orientation ambiguity using an additional round of search in a small neighborhood around an angle that is offset by 180° from the previously refined angle. Next, we retain the best matches resulting from the middle search and remove duplicate/close matches. In the last stage, the fine search, we perform further refinement to obtain a pixel/sub-pixel position estimate and a 1-degree/sub-degree angle estimate, using the full resolution DOT plus sub-pixel and sub-angle interpolation.

The cascaded estimation from coarse to fine and the complementary use of HOG and DOT result in a very efficient and very accurate object matching method. The HOG descriptor reduces the orientation search from 2D template matching (as done in classical template matching) to 1D correlation matching, and only one HOG feature vector is needed to find the rough orientation (e.g. at 10° resolution) of the target object. This eliminates the need to rotate the model template many times and match each rotated template to the input, which leads to large time savings. In the middle search stage, both the orientation and position search take place in a small angular and spatial neighborhood around the orientation and position found by the coarse search, which greatly reduces the search space. The fine search stage improves angular and position estimation precision by matching at pixel/sub-pixel position resolution and 1-degree/sub-degree orientation resolution, leading to more precise localization.
In addition, we use the HOG and DOT features in a complementary fashion so that each compensates for the other's shortcomings: HOG arrives at the object orientation quickly but is imprecise in position estimation, whereas DOT gives good position and orientation estimates but is slow on its own, since it requires a large number of matches when scanning the whole image and rotating the template at the desired angular resolution.

2.2 Feature extraction

We describe the extraction of the HOG and DO features in this section. In the subsequent sections, we use model DOT to refer to the dominant orientation feature image computed for the model image, and input DO to refer to the dominant orientation feature image computed for the input image. HOG and DO share similar processing in the first few steps, which include image pre-processing (smoothing, downsampling), computing the horizontal and vertical gradients at edges, and computing the orientation angle at edges. In our method, the range of the orientation angle in both HOG and DO is 0° to 180° (not 0° to 360°), as we find the former to be more invariant to contrast inversions.

Figure 3. Computation of the HOG descriptor for a ROI.

Figure 4. Computation of the Dominant Orientation feature.

After these steps, for HOG, we select the orientation with the largest gradient magnitude, i.e. the dominant orientation, in each 4x4 block and quantize it into a discrete index. Multiple dominant orientations can also be combined and quantized into one index. The angle quantization interval determines the HOG orientation estimation precision. The smaller the quantization interval, the more precise the estimated orientation, but the longer the HOG 1D correlation matching takes. In our application we quantize the angle to 10° precision, which gives a good balance between speed and accuracy in the coarse search stage. Finally, we compute the histogram of the orientation indices for the 4x4 blocks encapsulated by the ROI under consideration. This results in an 18-dimensional histogram vector for a ROI.
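The HOG steps above can be sketched in a few lines. This is a simplified illustration rather than the authors' code: it assumes central-difference gradients, skips the smoothing and downsampling steps, and uses a hypothetical magnitude threshold `min_mag` in place of an explicit edge detector.

```python
import math


def gradient_orientation(image):
    """Per-pixel gradient magnitude and orientation in [0, 180) degrees,
    using central differences (border pixels are left at zero)."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang


def hog_descriptor(image, block=4, bin_deg=10, min_mag=1e-6):
    """Histogram of the dominant orientation index of each block."""
    mag, ang = gradient_orientation(image)
    n_bins = 180 // bin_deg  # 18 bins for 10-degree quantization
    hist = [0] * n_bins
    h, w = len(image), len(image[0])
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_mag, best_ang = 0.0, 0.0
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    if mag[y][x] > best_mag:
                        best_mag, best_ang = mag[y][x], ang[y][x]
            if best_mag > min_mag:  # skip blocks without an edge
                hist[int(best_ang // bin_deg) % n_bins] += 1
    return hist


# Two synthetic 8x8 test images: a vertical and a horizontal step edge.
vertical_edge = [[0] * 4 + [100] * 4 for _ in range(8)]
horizontal_edge = [[0] * 8 for _ in range(4)] + [[100] * 8 for _ in range(4)]

hist_v = hog_descriptor(vertical_edge)
hist_h = hog_descriptor(horizontal_edge)
```

In this toy case the two histograms differ only by a circular shift of 9 bins, which corresponds to the 90° rotation between the two edges.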
Figure 3 shows the process of computing the HOG descriptor for a ROI.

For DO, we retain the 2D dominant orientation image as opposed to collapsing it into a 1D histogram vector. The orientation angle at each pixel is encoded into a byte: the angle is quantized into 6 levels, and each orientation is indicated by setting the corresponding bit to "1". To improve matching efficiency, we combine the dominant orientations within a 4x4 block into one byte, each bit corresponding to an angular orientation. Furthermore, to reduce the effect of compared angles falling on opposite sides of a quantization boundary, we allow the byte codes of neighboring orientations to overlap. This measure improves the robustness of DO to noise and other small changes between the input and model images. More details can be found in Ref. [3].

We compute two types of DO: the first type is at low/block resolution, and the second type is at normal/pixel resolution. For the model image, we compute the DOs of the model ROI rotated at every 1° for both types and store them as templates for later matching. The first type of model DOT is used during the coarse and middle search stages; the second type is used during the fine search stage. Figure 4 shows the process of computing the DOT descriptor.

2.3 Offline model image processing

During offline model image processing, we compute the HOG of the maximum region of all rotated model ROIs. The purpose of using the maximum region of all rotated ROIs is to guarantee that the object, rotated at any possible angle, is encapsulated by this ROI, so that every edge pixel is counted in the histogram computation. This HOG descriptor is represented as a 1D vector. We then compute the DOT for each rotated model ROI at every 1°. These DOTs are computed at both block resolution and pixel resolution, as described in section 2.2.
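The byte encoding of the DO feature described in section 2.2 can be illustrated as follows. The paper does not specify how the overlap between neighboring orientation codes is implemented, so this sketch assumes one possible scheme: an angle within a hypothetical `margin_deg` of a quantization boundary also sets the bit of the adjacent level. The function names and the magnitude threshold are likewise assumptions.

```python
N_LEVELS = 6                   # orientation levels over 0..180 degrees
LEVEL_DEG = 180.0 / N_LEVELS   # 30 degrees per level


def orientation_code(angle_deg, margin_deg=5.0):
    """Encode an orientation as a bit mask with one bit per level.
    Angles close to a level boundary also set the neighboring bit
    (assumed overlap scheme)."""
    a = angle_deg % 180.0
    if a >= 180.0:  # guard against floating-point wrap-around
        a = 0.0
    level = int(a // LEVEL_DEG)
    code = 1 << level
    offset = a - level * LEVEL_DEG  # position inside the level
    if offset < margin_deg:
        code |= 1 << ((level - 1) % N_LEVELS)
    if offset >= LEVEL_DEG - margin_deg:
        code |= 1 << ((level + 1) % N_LEVELS)
    return code


def block_code(angles, magnitudes, min_mag=1.0, margin_deg=5.0):
    """OR together the codes of all sufficiently strong pixels in a block."""
    code = 0
    for angle, mag in zip(angles, magnitudes):
        if mag >= min_mag:
            code |= orientation_code(angle, margin_deg)
    return code


# 29 and 31 degrees fall on opposite sides of the 30-degree boundary.
with_overlap = orientation_code(29.0) & orientation_code(31.0)
without_overlap = orientation_code(29.0, 0.0) & orientation_code(31.0, 0.0)
```

With the overlap, the two nearly identical angles share a bit and therefore still match under a bitwise AND; without it, they do not.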
2.4 Coarse search

Once the system receives an input image, the method computes the pixel and block resolution orientation index images, in which each pixel holds the orientation index (18 levels) used in the HOG descriptor, and the pixel and block resolution DO images, in which the orientation is quantized into 6 levels. We then perform three cascaded stages of coarse-to-fine search to localize the object pose. This section is devoted to the coarse search stage.

Figure 5. Finding the coarse object orientation by locating the angle that maximizes the cross-correlation score between the model HOG and the input patch HOG.

The coarse search stage is divided into two steps. In the first step, we find the coarse orientation of the object(s) with HOG matching. For each local input patch [x, y, w_m, h_m] (w_m and h_m are the width and height of the maximum model ROI), we compute its HOG at low/block resolution. We then match it with the low/block resolution model HOG using normalized cross-correlation (other histogram comparison metrics such as Chi-square, histogram intersection, Bhattacharyya distance [10] or Kullback-Leibler divergence [11] can also be used). The orientation that yields the highest correlation score is taken as the orientation of the object in the local input patch. When computing the HOG of a local input patch, in order to prevent edge pixels of the neighboring region from being included in the histogram of the current patch, we apply a disk mask to mask out those edge pixels. The radius of the disk mask is computed as half of the maximum of the width and height of the model ROI. Figure 5 illustrates the coarse search step graphically. This step yields a coarse orientation map, where each pixel represents the coarse orientation (i.e. at 10° resolution) of the object encapsulated in the local input patch, and a score map, where local peaks correspond to the estimated object positions at low/block resolution.
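A minimal sketch of this first coarse search step for a single patch is given below, assuming the input is an orientation index image in which non-edge pixels are marked with -1 (an assumed convention). The function names and the toy data are hypothetical; the sketch combines the disk mask, the patch histogram and normalized cross-correlation over all circular shifts.

```python
import math


def disk_offsets(radius):
    """Offsets (dy, dx) inside a disk; pixels outside it are masked out."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def patch_histogram(index_img, cy, cx, offsets, n_bins=18):
    """Histogram of orientation indices inside the disk centered at (cy, cx)."""
    h, w = len(index_img), len(index_img[0])
    hist = [0] * n_bins
    for dy, dx in offsets:
        y, x = cy + dy, cx + dx
        if 0 <= y < h and 0 <= x < w and index_img[y][x] >= 0:
            hist[index_img[y][x]] += 1
    return hist


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                    * sum((q - mean_b) ** 2 for q in b))
    return num / den if den > 0 else 0.0


def coarse_orientation(model_hist, patch_hist, bin_deg=10):
    """Return (angle in degrees, score) of the best circular shift."""
    n = len(model_hist)
    best_shift, best_score = 0, -1.0
    for shift in range(n):
        shifted = [model_hist[(i - shift) % n] for i in range(n)]
        score = ncc(shifted, patch_hist)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift * bin_deg, best_score


# Toy model histogram and a 9x9 orientation index image (-1 = no edge).
model_hist = [0] * 18
model_hist[0], model_hist[5] = 3, 1
index_img = [[-1] * 9 for _ in range(9)]
index_img[4][3] = index_img[4][5] = index_img[3][4] = 2
index_img[5][4] = 7
index_img[6][6] = 11  # inside the square window but outside the disk

patch = patch_histogram(index_img, 4, 4, disk_offsets(2))
angle, score = coarse_orientation(model_hist, patch)
```

Evaluating `coarse_orientation` at every patch center would fill the coarse orientation map with `angle` and the score map with `score`.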
However, it is challenging to reliably estimate object positions from the HOG matching score map, for three reasons: first, the HOG score map is not sparse and thus is not sufficiently discriminative to indicate true object positions; second, high scores tend to appear in large continuous regions; last, in crowded areas the cross-correlation score tends to be higher than in non-crowded areas, even when a non-crowded area contains the more accurate position. In order to reliably estimate object positions, in the second step of the coarse search we employ the DO feature, which is more discriminative since it uses the whole 2D template for matching.

Given the model DOTs rotated in steps of n degrees (n is the angle quantization interval in HOG), the input DO image, and the coarse orientation map and score map estimated by the first step of the coarse search, we refine the object position as follows. For each pixel [x, y] in the input DO image, we first read the coarse orientation from the orientation map at [x, y] and fetch the model DOT of that particular orientation. Next, this model DOT is matched with the local input DO by a per-pixel byte AND. We then count the number of nonzero bytes as the DO matching score. This results in an updated score map. Since both the model DOT and the input DO are still at low/block resolution, the local peaks in the updated score map correspond to object positions at low/block resolution.

2.5 Coarse orientation search with optimized HOG computation

The complexity of computing the histogram at each input patch is O(r), where r is half the length of the longest side of the patch. This cost is too high for the HOG matching approach to be used in real-time applications. In this paper, a new, simple yet much faster histogram computation algorithm is proposed. The proposed algorithm maintains one histogram for each column in the image.
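As a rough illustration of the column-histogram idea introduced here, the sketch below maintains one histogram per image column and builds each window histogram from them. It is a simplified variant under stated assumptions, not the pseudo code of Figure 7: it uses a square window of half-size r (no disk mask), clips the window at the image borders, and marks non-edge pixels with -1. A brute-force version is included only to check the result.

```python
def sliding_histograms(index_img, r, n_bins=18):
    """Window histogram at every pixel, using per-column histograms."""
    h, w = len(index_img), len(index_img[0])

    def add_pixel(hist, y, x, sign):
        if 0 <= y < h and index_img[y][x] >= 0:
            hist[index_img[y][x]] += sign

    # Initialization: each column histogram accumulates the first r rows.
    cols = [[0] * n_bins for _ in range(w)]
    for x in range(w):
        for y in range(min(r, h)):
            add_pixel(cols[x], y, x, +1)

    out = [[None] * w for _ in range(h)]
    for y in range(h):
        kernel = [0] * n_bins
        for enter in range(w + r):
            if enter < w:
                # Lower the entering column histogram by one row ...
                add_pixel(cols[enter], y - r - 1, enter, -1)
                add_pixel(cols[enter], y + r, enter, +1)
                # ... and add it to the kernel (window) histogram.
                for b in range(n_bins):
                    kernel[b] += cols[enter][b]
            leave = enter - 2 * r - 1
            if leave >= 0:
                # Subtract the column that has left the window.
                for b in range(n_bins):
                    kernel[b] -= cols[leave][b]
            center = enter - r
            if center >= 0:
                out[y][center] = list(kernel)
    return out


def brute_force_histograms(index_img, r, n_bins=18):
    """Reference implementation that rescans every window."""
    h, w = len(index_img), len(index_img[0])
    out = []
    for cy in range(h):
        row = []
        for cx in range(w):
            hist = [0] * n_bins
            for y in range(max(0, cy - r), min(h, cy + r + 1)):
                for x in range(max(0, cx - r), min(w, cx + r + 1)):
                    if index_img[y][x] >= 0:
                        hist[index_img[y][x]] += 1
            row.append(hist)
        out.append(row)
    return out


# Deterministic toy index image with values in -1..17.
image = [[(7 * x + 3 * y * y) % 19 - 1 for x in range(7)] for y in range(6)]
fast = sliding_histograms(image, r=2)
slow = brute_force_histograms(image, r=2)
```

Per pixel, the fast version touches two pixels and adds and subtracts one column histogram each, so its cost depends on the number of bins but not on r.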
This set of histograms is preserved across rows for the entirety of the process. Each column histogram accumulates 2r + 1 adjacent pixels and is initially centered on the first row of the image. The kernel histogram (i.e. the histogram of the sliding window) is computed by summing 2r + 1 adjacent column histograms. In effect, the kernel histogram is broken up into the union of its columns, each of which maintains its own histogram. While computing the HOG for the entire input image, all histograms can be kept up to date in constant time with a two-step approach.

Figure 6. The two steps of the fast histogram computation algorithm, panels (a) and (b).

Consider the case of moving to the right from one pixel to the next, as shown in Figure 6. The column histograms to the right of the kernel are yet to be processed for the current row, so they are centered one row above. The first step consists of updating the column histogram to the right of the kernel by subtracting its topmost pixel and adding one new pixel below it (Figure 6a). The effect of this is to lower the column histogram by one row. The second step updates the kernel histogram by adding the column histogram that was just lowered and subtracting the leftmost column histogram, which has left the window (Figure 6b). The initialization consists of accumulating the first r rows in the column histograms and computing the kernel histogram from the first r column histograms. The coarse orientation estimation algorithm with optimized histogram computation is listed as pseudo code in Figure 7.

Figure 7. Coarse orientation search with fast histogram computation.

2.6 Middle search

The coarse search stage uses a large angle quantization interval in both the HOG (10°) and DO (30°) features for improved matching efficiency; as a result, the estimated orientation has only rough angular resolution. We therefore perform a middle search to refine the object orientation to a finer angular resolution. The middle search is performed at low/block resolution. At the start, the input orientation is one of the coarse rotation angles (e.g. 10° resolution). At the end, the output orientation is any of the (fine) rotation angles (e.g.
2° resolution). The positions are always at the low/block resolution level, for both input and output positions. The low/block resolution model DOTs at different orientations around the previously found coarse orientation are matched to the low/block resolution input DO image to find the optimal fine angle. At each rotation angle, there is also a local spatial search across positions around the previous match position.

Figure 8. Method for resolving the HOG orientation estimation ambiguity during middle search.

During middle search, we also resolve the ambiguity that HOG cannot distinguish an object from the same object rotated by 180°. First, an initial, coarse orientation (in the range from 0° to 180°) is identified by the coarse search with HOG (section 2.4). Next, this initial orientation is refined during the middle search stage by matching with the DO feature in a small angular and spatial neighborhood. Then, the middle search is repeated in a small angular neighborhood around an orientation that is offset by 180° from the previously refined angle, again with a small spatial search. The best orientation from the two rounds of middle search is chosen as the correct object orientation, which may subsequently be refined further in the fine search stage. This strategy resolves the orientation estimation ambiguity while remaining efficient, since both rounds of middle search are performed in a small spatial and angular neighborhood. Figure 8 illustrates this process.

2.7 Fine search

The position estimated by the coarse and middle search is at low/block resolution, since low/block resolution HOG and DO features are used. It is hence necessary to perform a fine search to obtain pixel/sub-pixel level position estimates. We also refine the orientation precision from degree to sub-degree resolution during fine search. This refinement stage is performed at the normal/pixel resolution. Both orientation and position are refined by local search.
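The local search used in the middle and fine stages can be sketched as follows, using the AND-based DOT score from section 2.4. The data layout is an assumption made for illustration: `templates` is a hypothetical dictionary mapping an integer angle in degrees to a 2D list of orientation bytes, and the search radii are arbitrary example values.

```python
def dot_score(template, do_image, top, left):
    """Count template cells whose orientation bits overlap the input cell
    (byte AND is nonzero), with the template placed at (top, left)."""
    h, w = len(do_image), len(do_image[0])
    score = 0
    for ty, row in enumerate(template):
        for tx, code in enumerate(row):
            y, x = top + ty, left + tx
            if 0 <= y < h and 0 <= x < w and (code & do_image[y][x]):
                score += 1
    return score


def local_search(templates, do_image, top, left, angle,
                 pos_radius=1, angle_radius=1):
    """Search a small spatial and angular neighborhood around a candidate.
    Returns (score, top, left, angle) of the best match found."""
    best = (-1, top, left, angle)
    for a in range(angle - angle_radius, angle + angle_radius + 1):
        template = templates.get(a % 360)
        if template is None:  # no template stored for this angle
            continue
        for dy in range(-pos_radius, pos_radius + 1):
            for dx in range(-pos_radius, pos_radius + 1):
                s = dot_score(template, do_image, top + dy, left + dx)
                if s > best[0]:
                    best = (s, top + dy, left + dx, a % 360)
    return best


# Toy data: two 2x2 templates and a 4x4 input DO image that contains
# the 2-degree template at row 1, column 2.
templates = {
    0: [[1, 2], [4, 8]],
    2: [[8, 4], [2, 1]],
}
do_image = [
    [0, 0, 0, 0],
    [0, 0, 8, 4],
    [0, 0, 2, 1],
    [0, 0, 0, 0],
]
best = local_search(templates, do_image, top=1, left=1, angle=1)
```

Under the same assumptions, the 180° ambiguity check of section 2.6 would amount to calling `local_search` a second time with `angle + 180` and keeping the result with the higher score.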
At the start, the input position is on the low/block resolution grid. At the end, the output position is on the normal resolution grid. The orientation angle is also refined by a search around the orientation angle estimated by the middle search. For each angle, the pixel resolution model DOT at 0° is first rotated to the proper (candidate) angle before searching the different positions. Then, this pixel resolution model DOT at the correct rotation angle is matched with the input DO image at different positions to yield the optimal position. All matching scores across angles and positions are stored. These can be used later for fractional rotation angle and position estimation if necessary.

3. EXPERIMENTS AND RESULTS

3.1 Datasets and experiment platform

To test the accuracy and speed of the proposed system, we collected image datasets suitable for industrial inspection applications, including images of automobile, metal, food, pharmaceutical, packaging and electronics products. These datasets were captured using a SHARP factory inspection camera. A wide variety of objects are captured in these datasets, including transistors, springs, screws, coins, IC chips, and metal parts from different industries. The objects exhibit large geometric changes (e.g. in-plane rotation) and photometric changes (blur, noise, contrast, object defects, complex background), making this dataset very challenging. A few sample input images with different target objects are shown in Figure 9. We evaluated the proposed method in terms of speed, accuracy and robustness.

Figure 9. Sample input images in the testing dataset. The small image is the model image with the target object in it.

For the accuracy evaluation, we benchmarked the proposed method against a baseline template matching method on a PC platform. The baseline template matching method uses DOT as the main feature and employs a coarse-to-fine search architecture similar to that of the proposed method.
The principal difference between the two methods is that the proposed method employs the HOG feature to compute a coarse orientation during the coarse search stage, whereas the baseline template matching method uses DOT for both orientation and position estimation. In other words, the baseline method does not use HOG to accelerate the coarse search. Figure 10 shows side-by-side object localization results obtained by the proposed method (left panel) and the baseline template matching method (right panel) for the screw, spring and transistor datasets. Both methods exhibit high detection accuracy and repeatability on these challenging multi-object datasets, and their detection accuracy is comparable.

Figure 10. Detection accuracy benchmark comparison between the proposed method (left) and the baseline method (right) on the screw (top), spring (middle) and transistor (bottom) datasets.

For the robustness evaluation, we tested the proposed method on a wide variety of datasets with a large range of in-plane rotation changes (with no or small scale change) and photometric variations including blur, noise, contrast, object defects and complex background. Figure 11 (a) to (f) shows the evaluation results under these different conditions. In Figure 11 (a) to (f), the top row shows the input images, and the bottom row shows the result images obtained by the proposed method. The proposed method is shown to be fairly robust to geometric and photometric changes, which makes it well suited to real-world industrial inspection applications.

Figure 11. Robustness evaluation results of the proposed method. (a) Multi-object. (b) Defective object. (c) Complex background. (d) Object with decreasing contrast. (e) Object with increasing noise level. (f) Object with increasing blur level.

For the speed evaluation, we first benchmarked our implementation of the proposed approach against our implementation of the baseline template matching method.
Both methods are implemented in C/C++ and use the Intel Integrated Performance Primitives (IPP) library for software acceleration. The timing performance of both implementations is measured on a netbook PC with an Intel Atom CPU. The processing time results are shown in the left and middle columns of Table 1. The input image sizes for the spring, transistor and screw datasets are 512x480, 1600x1200, and 1600x1200 respectively, and the model image sizes for these three objects are 61x91, 66x139 and 66x139 respectively. The proposed technique achieved a very large reduction in processing time for the transistor and screw datasets. It did not achieve a reduction in overall processing time for the spring dataset. The reason is that, in this case, the implementation retained a much larger number of candidate objects after the coarse search stage, all of which were passed to the later refinement stages. For the spring dataset, even though the proposed technique was able to speed up the coarse search, this reduction did not make up for the increase in processing time in the refinement stages.

Additional benchmarking was performed by measuring the processing time of an optimized and embedded implementation of the baseline template matching method. This implementation runs on the target embedded inspection camera platform, with the same Atom CPU. The processing time results are shown in the right column of Table 1. The results provide additional evidence of the speed-up provided by the proposed combination of HOG and DOT.

Table 1. Speed benchmark evaluation.
The columns show processing time results for: (left) our implementation of the proposed (HOG+DOT) approach with timing measured on a netbook PC platform; (middle) our implementation of the baseline (DOT) approach with timing measured on a netbook PC platform; (right) an embedded and optimized implementation with timing measured on the inspection camera platform.

4. CONCLUSION

In this paper, we proposed a new template matching method that improves efficiency over existing approaches by decomposing orientation and position estimation into two cascaded steps. Our results show that the proposed method achieves a significant speed improvement at comparable accuracy. In addition, the proposed method achieves high robustness to photometric changes (contrast, background illumination, blur, noise and object defects) and geometric changes (rotation, translation and small scale change).

REFERENCES

[1] Dalal, N. and Triggs, B., “Histograms of oriented gradients for human detection,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 886-893 (2005).
[2] Hel-Or, Y., Hel-Or, H., and David, E., “Fast template matching in non-linear tone-mapped images,” Proc. 13th IEEE International Conference on Computer Vision (ICCV), 1355-1362 (2011).
[3] Hinterstoisser, S., Lepetit, V., Ilic, S., Fua, P., and Navab, N., “Dominant orientation templates for real-time detection of texture-less objects,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2257-2264 (2010).
[4] Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., and Lepetit, V., “Gradient response maps for real-time detection of texture-less objects,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, 876-888 (2012).
[5] Lampert, C. H., Blaschko, M. B., and Hofmann, T., “Beyond sliding windows: Object localization by Efficient Subwindow Search,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-8 (2008).
[6] Lewis, J.
P., “Fast template matching,” Vision Interface 95, Canadian Image Processing and Pattern Recognition Society, 120-123 (1995).
[7] Lowe, D. G., “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, vol. 60, no. 2, 91-110 (2004).
[8] Sibiryakov, A., “Fast and high-performance template matching method,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1417-1424 (2011).
[9] Viola, P., and Jones, M., “Robust Real-time Object Detection,” Int. Journal of Computer Vision, vol. 57, no. 2, 137-154 (2001).
[10] Bhattacharyya, A., “On a measure of divergence between two statistical populations defined by their probability distributions,” Bulletin of the Calcutta Mathematical Society, vol. 35, 99-109 (1943).
[11] Kullback, S. and Leibler, R. A., “On Information and Sufficiency,” Annals of Mathematical Statistics, vol. 22, no. 1, 79-86 (1951).