Multi-scale sub-image search

Multi-Scale Sub-Image Search
Nicu Sebe
Michael S. Lew
D.P. Huijsmans
Leiden Institute of Advanced Computer Science,
Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
fnicu mlew [email protected]
ABSTRACT
If an image should be retrieved by its subregions from
a large image database, an immense number of possible queries will appear. Therefore, the index which encodes the spatial information of an image, should make
only few assumptions about possible queries. In addition, this index has to consider dierent scales of objects in the image. In this paper, we propose a novel
approach using a hierarchical index encoding image regions, gained by a xed partition. The suggested index
uses color features and is easy to implement. The index
is tested on a database with more than 12,000 images.
1 INTRODUCTION
While most image retrieval systems retrieve images
based on overall image comparison, users are typically
interested in "object searching", where they can specify an "interesting" subregion (usually an interesting
object) of an image, as a query. The system should
then retrieve images containing this subregion (according to human perception) from a database. This task
called sub-image query, is made challenging by a wide
variety of eects (dierent viewing positions, camera
noise, object occlusion, etc.) that cause the same object to have dierent appearances in dierent images.
The system should also be able to solve the localization
problem, i.e. it should nd the location of the object in
an image. The lack of a good image segmentation process suitable for large, heterogeneous image databases,
implies that objects have to be located in unsegmented
images, making the location problem dicult.
These simple considerations call for two important
demands of an image index: (1) the index must have a
hierarchical structure to cope with details of the image,
and (2) the index should make nearly no assumptions
about possible queries, since it is not possible to know
in advance if the image should be retrieved by one subregion or by another. This is one of the drawbacks if
the image is segmented explicitly in objects.
To process user queries, some systems use xed grid
segmentation methods [3, 1]. For further improvement
on the eciency and the eectiveness of content-based
retrieval, a multiresolution matching approach [2] has
been proposed. Here, the query image may be a handdrawn sketch or (a potentially low-quality) scan of the
image to be retrieved. However, in many situations,
users are interested in, or can remember only local image contents, therefore, sub-image query processing is
needed. Unfortunately, not many image database management systems can handle arbitrary-size sub-image
queries based on color and spatial similarity. For the
systems that can deal with sub-image queries of arbitrary size, multiresolution matching is not used.
This paper suggests a novel approach which leads to
an index that (1) regards the hierarchical composition
of an image, (2) makes nearly no assumption about possible queries and (3) is small. This is done by indexing
only subregions and no specic objects. The image is
partitioned in overlapping rectangles of dierent sizes.
This allows to deal with the dierent scales of objects
in the image. Each of these regions is represented by
global information and, for eciency reasons, instead of
storing the whole global features, only the dierences of
the feature vectors are stored. The proposed index uses
color features, but these ideas can be easily extended
for other global features.
2 SUB-IMAGE QUERYING
The sub-image querying problem can be dened as follows: given as input query a sub-image Q of an image I
and an image set S , retrieve from S those images Q in
which query Q appears according to human perception
(denoted Q Q ). The problem is made more dicult
than image retrieval by a wide variety of eects (such
as changing viewpoint, camera noise, etc.) that cause
the same object to appear dierent in dierent images.
0
0
2.1 Performance measures
Let I be the correct answer for Q and let rank(I )
be the rank assigned by the retrieval method. We use
the following measures: (1) Average r-measure, the averaged sum over
P all queries of the rank of the correct
answer, 1=n P =1 rank(I ), (2) Average precision of a
method 1=n =1 1=rank(I ), the sum over all queries
of the precision at recall equal to 1. Note that a method
is good if it has a low r-measure and a high precision.
i
i
i
n
i
i
n
i
i
2.2 Location problem
The location problem is the following: given a query
image Q and an image I such that Q I , nd the "location" in I where Q is "present". Eciency is required
because huge amounts of data need to be processed.
The location problem can be solved by using a xed
rectangular partition. Using such partition, we obtain
a precise localization of the sub-images that can be retrieved as matches for the user query. Problems appear
when the desired sub-image covers parts of two or more
neighboring possible locations drawn by the xed grid.
To overcome this problem we chose the rectangles to be
overlapping. Moreover, we consider a partition formed
of dierent sized rectangles in order to account for different scales of regions in the image.
where ( ) are the standard deviations of the respective features over the entire database. Finally, the similarity of two color feature vectors (A) and (B )
is given by j (A) ? (B )j .
i
color
color
color
color
db
4 INTER HIERARCHICAL DISTANCE
Let A be a two-dimensional image pixel array partitioned into L subsets A . A convenient representation
of the image information can be found by taking the
distribution of the local features. This leads to a normalized probability density function represented by a
histogram. Unfortunately, this approach suers from
complete lack of spatial information, which makes it difcult to index image regions properly. One solution is
to extract the global features of image subregions (A ),
but this increases the index size dramatically. However,
if only the dierences of the global features of the image
and its subregions are stored, then the spatial encoding is guaranteed without a major increase of the index
size. We introduce a measure of the distance between
the global features of the image and the features of its
subregions: inter hierarchical distance (IHD) taken between feature vectors of dierent hierarchical levels of
the image partition [1].
In the case of color features, a two dimensional IHD
vector is used. The vector components are the
L1 -norm of the dierences of the mean and covariance
elements, respectively:
l
l
I HD
3 IMAGE FEATURES
Color indexing is one of the dominant methods of the
visual media retrieval methods. In color indexing, histogram methods [6] are often used but, even if they
are relatively insensitive to position and orientation
changes, they do not capture the spatial relationships
of color regions and thus, have limited discriminating
power. Some authors showed [5] that characterizing
one dimensional color distributions with the rst three
moments is more robust and more ecient than working with color histograms. A further improvement can
be achieved by taking the covariance and the mean of
the color distribution in a multidimensional color space
[4]. The color features representing the color distribution are: (1) the average color = ( ; ; ) and, (2)
the covariance matrix [ ] (i; j 2 fL; a; bg) of the color
channels. The La b color space is chosen because it
is perceptually uniform. Since the covariance matrix is
symmetric, only 6 entries have to be stored and considering the 3 mean entries, we obtain a nine dimensional
global color feature (A).
To determine the similarity of two n-dimensional feature vector, a weighted L1-norm, which is dependent of
the database entries, is introduced:
L
a
b
ij
l
I HD;
1
X
(A) =
j (A) ? (A )j
(2)
j (A) ? (A )j
(3)
i
i
l
i2fL;a;bg
l
I HD;
2
(A) =
X
ij
ij
l
i;j2fL;a;bg
where index l refers to the subregion A .
Let B be a region that serves as query, then all images A with regions A , similar to B , should be retrieved. For this, the IHD (A) for the image A
and the region B is computed. This IHD is then compared with each entry of the set f g =1 of the
image A. The minimum distance is used to rank the
images A:
l
l
B
I HD
l
I HD
l
L
color
j (A)j
X
= j (A)j
n
i
db
=1
i
( )
i
(1)
d (A; B ) = =0
min j
reg
l
L
l
I HD
(A) ? B
I HD
(A)j
db
(4)
Because the region B has also to be compared with
the whole image A, the region l = 0 with 0 = 0, is
included in the minimization.
I HD
5 MULTI-SCALE PARTITION
In Eq. (4), when two images are compared, the locations of the subregions A have to be known. To simplify
l
Method
Avg. r-measure Avg. precision
Mean & Cov.
6.4
0.42
Mean
9.2
0.31
Cov.
8.7
0.33
Table 1: Average r-measure and Average precision for the rst experiment.
Rank
15
8
6
3
5
Rank
42
7
2
1
5
Rank
4
10
6
8
673
Figure 1: Variations: Rank of the original image using the combined index with two-dimensional IHD when the
region indicated by the black frame serves as query
the comparison, a xed partition for all images is chosen. The advantage is that the partition is robust and
does not rely on any segmentation algorithm. In addition, the partition (1) has to cope with dierent scales
of objects and regions in the image and (2) should not
make any assumption about possible queries. To fulll
the former task, a partition with dierent sized rectangles is chosen. To accomplish the latter task, the regions
are overlapping, ensuring that all objects of a certain
scale are covered by a region of the same scale. We use
a partition with three multi-scale levels. The highest
level is the image itself. For the second level the image
is sampled with 3x3 rectangles of half the side length
of the image which yields to overlapping regions. The
lowest level is composed by 5x5 rectangle regions which
have a side length of one third of the image dimensions.
Totally, the IHD's of 25+9=34 regions and the global
color feature of the whole image are computed.
The overlapping leads to redundancy information in
the IHD's, because some pixels are included in more
than one region. This causes no problem since each
image region is regarded as independent (Eq. (4)).
6 RESULTS
Due to space limitations we present only some of our
experiments. All tests run on a database containing
12,012, 8 bit color JPEG images. They cover a wide
range of nature scenes, animals, buildings, construction
sites, texture and paintings.
We consider rst a query set consisted of 50 queries,
each with a unique correct answer. The queries represent various situations like dierent views of the
same scene, large changes in appearance, small lighting
changes, spatial translations. The images were retrieved
and ranked by the distance d . Table 1 contains the
average r-measure and average precision obtained when
on the same query the index was run with only one
or the other vector component of the two-dimensional
IHD. It shows that even one component can be used to
achieve a reasonable performance, but the combination
of both is more powerful.
In the second experiment, a variety of images is used
(Figure 1). Each image is retrieved by the subregion
indicated with the black frame. Although the regions
does not correspond to the regions in the image partition, the images were retrieved. As long as the regions
include several colors or saturated colors, the retrieval
results are promising. The index is even able to deal
with objects like re and "sunsets", which would have
been a real challenge for the methods which rely on an
explicit segmentation of the image into objects. An example where the index fails completely, is the one with
reg
Rank
2
6
50
15
156
Figure 2: Stability: Rank of the original image when the region indicated by the black frame serves as query
the face of the lion. The selected region includes colors that are very similar to the color of soil, therefore,
the index retrieves many images which include regions
of soil. Another problem is retrieving simply blue or
green objects. In this case, all images with blue sky
and green forest, respectively, are retrieved.
To illustrate the stability of the index we selected
ve dierent regions from an image with the head of an
animal (Figure 2). We present the ranks of the original image when the regions indicated with the black
frame serve as queries. Note that in the cases where
the region is large enough, the index performs reasonable and seems to be stable. The index fails only in
the cases where the details of the animal head are not
characteristic enough.
In summary, we tried our method in dierent situations. First, we considered the situation where the index is used to retrieve similar images taken under dierent conditions, such as dierent views of the same scene,
large changes in appearance, small lighting changes,
spatial translations. It was proven that the best performances are obtained with the full index but, even
the use of only one component of the index leads to
promising results. Secondly, we considered various images that were retrieved using as queries, regions indicated by user. As long as the regions were characteristic
enough, the index was able to retrieve, on average, the
original images in top 10. Finally, in order to illustrate
the stability of our index, we considered the situation
where the user indicates dierent query regions from
the same image. The index performed reasonable and
seemed to be stable.
7 SUMMARY AND CONCLUSIONS
Not many image database management systems can
handle arbitrary-size sub-image queries based on color
and spatial similarity. For the systems that can deal
with sub-image queries of arbitrary size, multiresolution matching is not used.
Our novel approach suggests an index which is based
on a xed multi-scale partition with overlapping rectangular regions. Each region is represented by global
features and instead of storing them for all regions, only
the global feature of the whole image and the IHD's of
the subregions are stored. The index allows to retrieve
images by relative arbitrary subregions. The proposed
index uses color features. As tests have shown in a
database containing more than 12,000 images, the index is suitable to retrieve images by multicolored subregions. The index is easy to be implemented, which
makes it convenient to add it to existing retrieval systems which use color features.
The advantages of our approach come from the way
the partition was chosen. Because dierent sized rectangles where chosen, the partition can cope with dierent
scales of objects and regions in the image. On the other
hand, the overlapping of the rectangles ensures that all
objects of a certain scale are covered by a region of the
same scale. Furthermore, out index does not rely on
any segmentation algorithm.
8 ACKNOWLEDGMENTS
This research was supported with a grant from Philips
in the Netherlands.
References
[1] A. Dimai. Spatial encoding using dierences of
global features. Proc. SPIE, Storage and Retrieval
for Image/Video Databases IV, 1997.
[2] C. Jacobs, A. Finkelstein, and D. Salesin. Fast
multiresolution image querying. ACM SIGGRAPH,
pages 277{286, 1995.
[3] K-S. Leung and R. Ng. Multiresolution subimage similarity matching for large image databases.
Proc. SPIE, Storage and Retrieval for Image/Video
Databases IV, 1997.
[4] M. Stricker and A. Dimai. Spectral covariance and
fuzzy regions for image indexing. Machine Vision
and Applications, 10:66{73, 1997.
[5] M.A. Stricker and M. Orengo. Similarity of color
images. Proc. SPIE, Storage and Retrieval for Image/Video Databases III, 2420:381{392, 1995.
[6] M.J. Swain and D.H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11{32,
1991.