Universidade de São Paulo
Biblioteca Digital da Produção Intelectual - BDPI
Departamento de Ciências de Computação - ICMC/SCC
Comunicações em Eventos - ICMC/SCC
2014-05
International Symposium on Computer-Based Medical Systems, 27th, 2014, New York.
http://www.producao.usp.br/handle/BDPI/45664
2014 IEEE 27th International Symposium on Computer-Based Medical Systems
Being Similar is Not Enough: How to Bridge Usability Gap Through Diversity in
Medical Images
Lúcio F. D. Santos∗, Marcos V. N. Bedo∗, Marcelo Ponciano-Silva∗†, Agma J. M. Traina∗ and Caetano Traina Jr∗
∗ Department of Computer Science, University of São Paulo, Brazil
† Fed. Institute of Education, Science and Technology of the Triângulo Mineiro, Brazil
{luciodb, bedo, ponciano, agma, caetano}@icmc.usp.br
similarity-based searches, represented in metric spaces [5].
The two best-known similarity-based comparison operators are the similarity range (Rq) and the k-nearest neighbor (k-NNq). While an Rq query retrieves the images within a given distance threshold of the query image, the k-NNq operator retrieves the k elements most similar to it.
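The two operators can be illustrated with a minimal sketch over a linear scan. The function names `range_query` and `knn_query`, the toy 2-D points and the use of the Euclidean distance are assumptions made only for illustration; a real CBMIR system would typically rely on a metric index rather than a full scan.

```python
import heapq
import math


def range_query(dataset, query, radius, dist=math.dist):
    """Rq: every element whose distance to the query is at most `radius`."""
    return [s for s in dataset if dist(s, query) <= radius]


def knn_query(dataset, query, k, dist=math.dist):
    """k-NNq: the k elements closest to the query, nearest first."""
    return heapq.nsmallest(k, dataset, key=lambda s: dist(s, query))


# Toy feature vectors (hypothetical data, not taken from the paper).
points = [(0.0, 1.0), (0.0, 1.1), (3.0, 0.0), (0.0, -2.0)]
sq = (0.0, 0.0)

print(range_query(points, sq, 2.0))  # elements within distance 2.0 of sq
print(knn_query(points, sq, 2))      # the two nearest elements
```

In this toy example both elements returned by `knn_query` lie almost on top of each other, which is the near-duplicate situation discussed in the text.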
Writing a correct query definition is a multidisciplinary
subject, which can help to bridge both the semantic and the
usability gap. According to [2], usability gaps are related
to how easy it is to use a system from the health staff
perspective, allowing them to easily and intuitively formulate
queries and later interact with the result through query
refinement and relevance feedback. However, in a clinical
application, querying a massive database with k-NNq and/or Rq predicates may often retrieve images that are too similar among themselves (i.e., retrieved images that are more similar to each other than to the query center, called near-duplicates) [6].
Figure 1 (a) illustrates the result obtained by the traditional k-NNq algorithm in a search space, where the dotted balls mark the group of near-duplicates. As can be seen, plain k-NNq returns two near-duplicates, which may be unsatisfactory to the expert, since they probably carry the same information when both are in the result set. Near-duplicates require more effort from the specialist to navigate and analyze the elements, often requiring several query reformulations until the desired answer is obtained [5].
In order to present a more useful result and bypass the
problem of the near-duplicates, several research areas from
different domains have introduced “diversity” to similarity
result sets [7], [8]. The intuition behind diversity is to
generate a result set that includes elements not only similar
to the query element, but also diverse among themselves.
Recently, this property has been explored in medical image systems [9] as a bi-criteria optimization problem, where similarity and diversity compete with each other, ruled by a trade-off parameter defined by the user, which results in an NP-hard problem [10]. In spite of the efforts to reduce the computational cost required by the diversity approach, the problem remains costly, and setting the trade-off parameter between similarity and diversity for each query is difficult and
Abstract—In this paper we present a technique developed
to bridge the usability gap in Content-Based Medical Image
Retrieval (CBMIR) systems by exploring both similarity and diversity. Usability gaps are related to how easy a software tool is to use from the radiologist's perspective. Although much has been done to better express similarity queries, the use of CBMIR over massive databases may have drawbacks that impact its usability. We claim that many of the problems derive from the fact that many images returned are closer to each
other than to the query element (near-duplicates). To target this
nuisance, we propose to boost similarity queries with diversity,
using a technique to hierarchically cluster near-duplicates. We
tailored a domain-independent and parameter-free method by
controlling the maximum area reached in the search space. This
novel approach to improving CBMIR systems takes advantage of
diversity expectations. The proposed approach BridGE (Better
result with influence diversification to Group Elements) aims
at adding new relevant information to the analysts, reducing
the need of further query refinement or relevance feedback
cycles. The results are displayed to the specialist as a traditional
CBMIR result whereas the radiologists are able to expand the
clusters and navigate through them. The results support our
claim that a CBMIR system empowered with diversity is able to
bridge the usability gap, grouping near-duplicates and being at
least 2 orders of magnitude faster than its main competitors.
Keywords-Content-Based Medical Image Retrieval; Similarity Queries; Search Result Diversification;
I. INTRODUCTION
When radiologists analyze new cases in the clinical routine, they can be motivated to search for similar past cases
in a historic database that could have had similar known
anomalies. In fact, retrieving similar images has the potential
to help the specialists to interpret medical images, providing
new insights and contributions to the current case. Also,
differential diagnosis techniques may help to increase (or
decrease) the certainty degree of the professionals about their
previous diagnosis hypothesis [1].
The automatic retrieval of similar images has been studied by many researchers [2][3], culminating (among others) in the current components/methods of Content-Based Medical Image Retrieval (CBMIR) systems [4]. CBMIR-based tools retrieve images that are similar to the given query image instead of using traditional data (i.e., text or numerical attributes related to the image). Usually, those systems are supported by operations involving
1063-7125/14 $31.00 © 2014 IEEE
DOI 10.1109/CBMS.2014.21
Figure 1. Element selection for similarity queries in a Euclidean bi-dimensional space. Squares are the elements selected. (a) The solution space for the traditional k-NNq centered at the query element (sq), including near-duplicates. (b) The result set retrieved by an optimization diversity approach. (c) The result diversification retrieved by the proposed diversity on CBMIR: clustered near-duplicates (circles) and their representative elements (pentagons). Triangles denote elements not returned/used to answer the query.
at least generally unintuitive for some users [11]. Figure 1
(b) depicts an example of the diversity provided by the
method developed in [9], which shows that even considering
diversity, it is possible to have near-duplicates.
II. BACKGROUND
Most CBMIR systems rely on the similarity query paradigm [12], [13] to perform image retrieval. Similarity queries over images are performed by comparing pairs of feature vectors extracted from them. The vectors are compared using a function that measures their similarity, and based on a similarity criterion (either Rq or k-NNq) the system retrieves images that are close/similar to the given query image. The problem with those traditional criteria is that they produce a great number of near-duplicates in the result set.
A solution for the near-duplicate problem is to take into
account the diversity among the elements in the result set.
Diversity has been tackled in various ways by different
research areas [7], [8]. The most common method is solving
a bi-criteria optimization problem, where similarity and
diversity compete with each other, ruled by a trade-off
parameter [14], [10], [9], which is defined by an expert.
Generally speaking, those methods receive the result of basic
similarity algorithms configured to retrieve more elements
than requested by the user and thereafter apply the bi-criteria
objective function to re-rank the solution inducing diversity
among the elements sent to the user. Another definition of diversity considers that there exists a minimum distance ξp allowed between each pair of elements [15]. It assumes that if two elements are closer than ξp, then they probably carry the same information and only one should be returned (minimum-distance rule).
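The minimum-distance rule can be sketched as a greedy filter. The code below is only an illustration under stated assumptions: the function name `min_distance_diversify`, the greedy nearest-first scan and the Euclidean distance are choices made here, not details taken from [15]. It also makes visible that the caller has to supply the extra parameter `xi_p`.

```python
import math


def min_distance_diversify(dataset, query, k, xi_p, dist=math.dist):
    """Keep up to k elements, scanning from the nearest to the query outwards.

    An element is kept only if it is at least xi_p away from every element
    already kept; otherwise it is treated as redundant and skipped.
    """
    result = []
    for s in sorted(dataset, key=lambda s: dist(s, query)):
        if all(dist(s, r) >= xi_p for r in result):
            result.append(s)
            if len(result) == k:
                break
    return result


# Hypothetical toy data: (0.0, 1.1) is closer than xi_p to (0.0, 1.0).
points = [(0.0, 1.0), (0.0, 1.1), (3.0, 0.0), (0.0, -2.0)]
print(min_distance_diversify(points, (0.0, 0.0), k=3, xi_p=0.5))
```

Choosing a suitable `xi_p` requires knowing how distances are distributed in the dataset, which is the kind of burden on the user that the text identifies as widening the usability gap.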
Besides the higher computational cost, the main problem in adopting content-based diversity queries in CBMIR systems is the number of additional parameters required for each query, which contributes to widening the usability gap. Notice that a query using those methods may not return the requested k similar images (see Figure 1 (b)) when diversity is considered, showing that they sacrifice similarity to the query element to achieve diversity, imposing a trade-off between similarity and offering more options to the user.
In this paper we explore result diversification on CBMIR systems to group the near-duplicates automatically,
retrieving more information to the specialist right in the
first query (retrieving the answer illustrated in Figure 1(c)).
In addition, the user can hierarchically visualize each near-duplicate group in the result set and proceed interactively, either shifting the query element in an intuitive way or performing relevance feedback through the Rocchio method [1].
The results reported in this paper were obtained using images from two real medical datasets. Our proposed method
highlights the usability difference between the traditional
CBMIR and the CBMIR with diversity. We follow two
strategies to evaluate our proposal: 1) we measured the number of interesting elements returned by our approach and by the traditional k-NNq algorithm, and 2) we measured the time required to execute traditional similarity queries and those improved with diversity. The results showed that our method is at least 2 orders of magnitude faster than the competitors at grouping near-duplicate images and returns 10 times more elements from the solution space, at the cost of being at most 2 times slower than the traditional (non-diverse) k-NNq.
The remainder of this paper is structured as follows. Section II summarizes the main concepts and related work. Section III presents the methodology applied to our technique, while Section IV details experiments performed over two real medical datasets and analyzes the results achieved. Finally, Section V presents the conclusions and future work.
Another recent approach defines diversity without requiring more information from the user: the so-called result diversification based on influence (RDI) [5]. It is based on an automatically defined minimum distance between two elements si and sj that employs only the position of the elements relative to the query element sq. The minimum distance is estimated using the concept of “influence” intensity I, which is defined as the inverse of the distance between two elements, e.g., I(si, sq) = 1/d(si, sq). Thus, an element sj is more influenced by si than by sq if I(si, sj) ≥ I(sj, sq).
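The influence test can be written down directly from this definition. The sketch below assumes the Euclidean distance and treats a zero distance as infinite influence; both choices, as well as the function names, are assumptions made here for illustration and are not specified in the text.

```python
import math


def influence(a, b, dist=math.dist):
    """Influence intensity I(a, b): the inverse of the distance d(a, b)."""
    d = dist(a, b)
    return math.inf if d == 0 else 1.0 / d  # zero-distance handling is assumed


def is_more_influenced_by(s_j, s_i, s_q, dist=math.dist):
    """True when I(s_i, s_j) >= I(s_j, s_q), i.e. s_j is influenced by s_i
    at least as strongly as by the query element s_q."""
    return influence(s_i, s_j, dist) >= influence(s_j, s_q, dist)


s_q = (0.0, 0.0)
s_i = (0.0, 1.0)
print(is_more_influenced_by((0.0, 1.1), s_i, s_q))  # close to s_i
print(is_more_influenced_by((3.0, 0.0), s_i, s_q))  # far from s_i
```

Since the influence is an inverse distance, the test is equivalent to checking d(si, sj) ≤ d(sj, sq), so no user-supplied threshold is involved.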
The BRID (Better Result with Influence Diversification)
technique implements the influence concept for similarity
queries. It considers that all the elements influenced by
element si can be discarded from the result set. Furthermore,
as BRID uses the distance from each element si to the
query center to define the region of influenced elements, the
estimated radius is strictly increasing and may exceed the
context of the query (the nearest elements), which provides
a holistic vision of the dataset from the perspective of the
query element.
In this paper, we extend the BRID technique in two distinct ways. First, we expand the search algorithm to retrieve not only the answer taking the RDI concept into account, but also to retain and group the near-duplicates instead of discarding them. Second, we limit the search space using a heuristic that explores a context more focused on the query result, avoiding traversing the entire dataset. We chose to adopt RDI because it does not require setting parameters, thus being transparent to the user.
The BridGE technique uses the RDI concept to cluster medical images, retrieving for each of the k resulting images its own subset of similar images. The k image subsets allow a new, hierarchical way to present results, showing the neighborhood around the elements with respect to their distance to the query element. It is important to highlight that near-duplicate images are stored to enable the user to ask about them, even though the images are similar among themselves. They may be interesting because of other data associated with them, distinct from the image itself, such as medical records and other metadata.
We also propose a heuristic called Context-Boundary (CB), whereby we expect to restrict and control the size of the visualized solution space. Our approach allows the user to perform query refinements: for instance, to browse the result set and select one of the retrieved images as a new query element. Moreover, BridGE also permits executing relevance feedback cycles by embedding an adequate strategy, for example the well-known Rocchio technique.
A. BridGE: Grouping elements based on Influence
Our proposed method incrementally builds the result set by selecting representative elements for groups of near-duplicates. The first step selects the first element that is not influenced by others in the result set. The main contribution of BridGE over BRID is that it assigns the near-duplicates to the non-influenced elements (called representatives), keeping them hierarchically sorted, forming lists of clusters. Previous techniques simply discard the near-duplicates, considering that they cannot help the user. However, although the images are near-duplicates, the additional information associated with each element can impact the decision support process when analyzing the medical images.
Each cluster list is sorted by the distances between the elements and their representative, which is maintained as the list header. Thus, a cluster is maintained as a list $S_i = \{s_{i1}, s_{i2}, \ldots, s_{in}\} \mid d(s_{i1}, s_{i2}) \le d(s_{i1}, s_{i3}) \le \ldots \le d(s_{i1}, s_{in}),\ n \in \mathbb{N}$. Therefore, in a hierarchical way, the representative image summarizes the entire set of near-duplicate ones regarding the similarity to the query element.
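A cluster list of this kind can be represented as a plain list whose first entry is the representative. The helper below is a hypothetical illustration (the name `build_cluster_list` and the Euclidean distance are assumptions), showing only the ordering invariant described above.

```python
import math


def build_cluster_list(representative, near_duplicates, dist=math.dist):
    """Return [representative, nd_1, nd_2, ...] with the near-duplicates
    ordered by increasing distance to the representative (the list header)."""
    ordered = sorted(near_duplicates, key=lambda s: dist(s, representative))
    return [representative] + ordered


cluster = build_cluster_list((0.0, 0.0), [(0.0, 3.0), (1.0, 0.0), (0.0, 2.0)])
print(cluster)
```

With this layout, a front-end can show only `cluster[0]` by default and reveal `cluster[1:]` when the user expands the group.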
The algorithm processes the search space sequentially. After finding the first cluster, it iteratively continues searching for the next non-influenced element until retrieving k of them. Thus, BridGE always retrieves more information than the traditional approaches, because each element can have an associated cluster list, which can be shown to the user on request to navigate the neighborhood of that element. We call this operation a “local expanded query”. Figure 1 (a) and Figure 1 (c) illustrate the difference between the traditional result produced by a similarity query and by our proposal when searching for the five most similar cases. Using the CBMIR front-end, the radiologist can navigate through the result set, expanding the sorted cluster
III. PROPOSED METHOD
Intuitively, content-based queries should retrieve the most
relevant images close to the query element. Therefore, the
resources that the user can employ to express and to analyze
the results are directly related to the system usability and
its application to the clinical routine. A method intended
to bridge the usability gap takes into account the so-called query usability levels: query statement, query feedback and query refinement. Accordingly, we developed the Better result with influence diversification to Group Elements technique (BridGE), which covers the aforementioned levels. It extends and improves the BRID algorithm, specifically targeting the task of retrieving medical images. It provides a clear way to express queries, facilitates expressing query refinements and allows the user to explore relevance feedback in an intuitive and unobtrusive way. Moreover, the proposed technique allows a clever way to cluster and hierarchically display results, reducing the clutter of presenting too many similar elements without discarding these elements.
We select elements that are not near-duplicates from others in the search space, which keeps the fundamental nature
of the operations that retrieve images similar to the query
element, but internally considers the relationships among the
elements, making the result analysis task more intuitive.
by selecting its representative images and retrieving their
reports and metadata. Moreover, the specialist can perform
relevance feedback or just select a new query element, by
shifting the search space.
$$\xi_i = \frac{1}{i} \sum_{u=1}^{i} d(s_u, s_q) \qquad (1)$$
This process repeats until BridGE finds the required k
representative elements or no other element exists in the
search space. Notice that tying the growth of the influence radius to the relative rank of the result set elements regarding the query element allows BridGE to define a search space focused on the query and to assign only overly similar elements as near-duplicates, without bothering the user to set parameters.
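As a small worked illustration of Equation 1, the influence radius after the i-th representative is the mean of the distances from the representatives selected so far to the query element. The function name `context_radius` and the sample distances are hypothetical.

```python
def context_radius(rep_distances):
    """Influence radius xi_i per Equation 1: the mean of d(s_u, s_q) over the
    i representatives selected so far (zero while the result set is empty)."""
    if not rep_distances:
        return 0.0
    return sum(rep_distances) / len(rep_distances)


# Hypothetical distances from three successive representatives to s_q.
distances = [2.0, 4.0, 6.0]
for i in range(1, len(distances) + 1):
    print(i, context_radius(distances[:i]))
```

Because representatives are selected in order of increasing distance to the query, each new term is at least as large as the previous ones; the mean therefore grows more slowly than the distance of the latest representative, which keeps the radius tied to the query context.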
B. The Context-Boundary to CBMIR with diversity
Algorithms that analyze several elements of the dataset to diversify a query have two main problems: 1) it is hard to keep the search space focused on the query context, that is, around the elements that remain similar to the query element, and 2) it is hard to maintain an acceptable time to retrieve the answer (computational cost). The first problem is related to the semantics of the returned elements, as elements too dissimilar from the query center may induce users to think that the query was misinterpreted. The second problem is related to the system acceptability, because the user will not wait for minutes to receive the answer to a query that may even need to be reformulated or refined.
To overcome such problems, we can consider that there exists a maximum context radius (MCR) (see Definition 1) related to the query element, which avoids returning elements farther than a distance ξmax. Indeed, some of the approaches reviewed in Section II use concepts equivalent to the MCR to reduce computational costs and define the query context. However, defining the distance ξmax to the query element relies on the user's expertise about the distance distribution among the elements in the whole dataset, which impairs the usability of the system.
IV. EXPERIMENTAL EVALUATION
In this section we compare our proposed BridGE algorithm to the traditional (non-diverse) k-NNq, to a diversity algorithm based on the k-medoid clustering algorithm (CLT) [10] and to the diversification algorithm based on the optimization approach (OPT) [9]. The CLT algorithm employs a concept similar to that used in BridGE, as it also groups similar elements and returns a set of representative elements. OPT represents the concept of diversity on similarity queries already applied to medical images.
We follow two strategies to evaluate our proposal. The first one determines which method is more likely to enable a similarity query to recover new information. For this, we measured the number of images retrieved by our BridGE algorithm and by the traditional k-NNq one. The second is a performance test to evaluate the cost of providing more useful answers to a similarity query with diversity.
We evaluate the results by processing two real image datasets: the MRIBalan dataset [9] and the ImageCLEFmed dataset [16]. The former is composed of 704 magnetic resonance imaging (MRI) images obtained from the Clinical Hospital at Ribeirão Preto of the University of São Paulo (USP). The image feature vectors were obtained by the method proposed in [17] and were compared using the Euclidean distance function (L2) evaluated over the 30 features extracted. The ImageCLEFmed dataset is composed of 5,042 biomedical images from 32 manually assigned disjoint global categories, which is a subset of a larger collection of six different datasets used for the medical image retrieval task in ImageCLEFmed 2007 [16]. The feature vector of each image in this dataset was extracted using the SIFT descriptor and modelled using the Bag-of-Features approach with a dictionary of 310 visual words. The Manhattan distance function (L1) is used for this dataset.
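For reference, the two distance functions mentioned above can be written in a few lines over equal-length feature vectors. This is a generic sketch; the function names and the sample vectors are illustrative and unrelated to the actual feature extractors.

```python
def euclidean(a, b):
    """L2 distance: square root of the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


u, v = (0.0, 0.0), (3.0, 4.0)
print(euclidean(u, v))  # 5.0
print(manhattan(u, v))  # 7.0
```

Both functions satisfy the metric properties (non-negativity, symmetry and the triangle inequality), so either can serve as the distance d in the similarity operators discussed in this paper.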
For each evaluated dataset, we randomly chose 100 different elements to be employed as query center elements. The
parameter required for CLT and OPT to balance similarity
and diversity was set to 0.5, which has been reported by their
authors to achieve on average the best diversification [10].
Regarding BridGE, as it is a parameter-free technique, we
Definition 1. Maximum Context Radius (MCR): Given a domain $\mathbb{S}$, a dataset $S \subseteq \mathbb{S}$, a query element $s_q \in \mathbb{S}$, a distance $\xi_{max} \in \mathbb{R}$ and a distance evaluation function $d$, the elements most similar to $s_q$ form the subset $S' \subseteq S \mid \forall s_j \in S',\ d(s_j, s_q) \le \xi_{max}$.
Our approach automatically estimates an influence radius for each element to maintain the representative elements among the most similar ones, which better adapts to the distance distribution around the query element without requesting any information from the user.
The Context-Boundary heuristic is defined based on the relative rank of each element in the result set regarding the query element. BridGE starts by assuming that the influence radius around the query element is zero, since the result set is empty. The element most similar to the query element is inserted as the first representative, and its distance to the query element is used to define the first influence radius, ξ1 = d(s1, sq). All the elements si in the search space at a distance d(si, s1) ≤ ξ1 are grouped as near-duplicates of s1 and removed from the search space. Thereafter, the next element most similar to the query element that is not a near-duplicate of s1 is inserted in the result set and its influence radius is defined following Equation 1.
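The selection procedure described in this section can be summarized in a short sketch. It is written only from the prose description and is not the authors' C++ implementation: the function name `bridge_query`, the linear scans and the Euclidean distance are assumptions, and a production version would presumably use a metric index.

```python
import math


def bridge_query(dataset, query, k, dist=math.dist):
    """Sketch of BridGE with the Context-Boundary heuristic.

    Returns up to k cluster lists. Each list starts with its representative,
    followed by its near-duplicates sorted by distance to the representative.
    """
    # Search space ordered by similarity to the query (nearest first).
    remaining = sorted(dataset, key=lambda s: dist(s, query))
    clusters = []
    rep_dist_sum = 0.0  # running sum of d(s_u, s_q) over the representatives

    while remaining and len(clusters) < k:
        rep = remaining.pop(0)  # nearest element not grouped so far
        rep_dist_sum += dist(rep, query)
        xi = rep_dist_sum / (len(clusters) + 1)  # influence radius, Equation 1

        # Elements within xi of the representative become its near-duplicates
        # and leave the search space.
        near_dups = [s for s in remaining if dist(s, rep) <= xi]
        remaining = [s for s in remaining if dist(s, rep) > xi]

        near_dups.sort(key=lambda s: dist(s, rep))
        clusters.append([rep] + near_dups)
    return clusters


# Hypothetical 2-D feature vectors forming three visible groups.
data = [(0.0, 1.0), (0.0, 1.1), (0.2, 1.0), (3.0, 0.0), (3.1, 0.0), (0.0, -2.0)]
for cluster in bridge_query(data, (0.0, 0.0), k=3):
    print(cluster[0], "near-duplicates:", cluster[1:])
```

On this toy data a plain 3-NN query would return the three points near (0, 1), whereas the sketch returns one representative per group and keeps the remaining points available for expansion. The only query parameter is `k`.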
Figure 2. Result sets for a query over the MRIBalan dataset, centered at the “query element” shown and considering k = 5. (a) The result obtained by BridGE, highlighting the near-duplicate images hierarchically grouped under an element of the result set, and (b) the result set generated by the traditional k-NNq. Notice that the 5 images returned by k-NNq are the same ones grouped as near-duplicates of the third image returned by BridGE.
do not need to set any query parameter besides the number
k of elements to retrieve.
The experiments were performed on a computer with an
Intel Core i7 processor and 8 GB of main memory, under
Ubuntu Linux 11.10. All the algorithms were implemented
in C++, using the same programming framework for all of
them to enable fair comparisons.
each other than the result presented by the plain similarity query shown in Figure 2 (b). In addition, each image in the diversity result also shows the number of other near-duplicate images, indicating to the user that grouped images do exist. Thereafter, if the near-duplicates are interesting to the user, it is possible to expand the query result without resubmitting a query to the system. Notice that we improved the query coverage to naturally include diversity without losing the context of retrieving the most similar images or requesting new parameters from the user.
A. Recovering New Information Experiment
In a massive image database, near-duplicate images usually have a lower probability of adding new information to the query result. Nevertheless, the near-duplicates should be made available to the user on request, since, if they are properly presented, it is possible to extract additional relevant information from them and so improve the understanding of the provided answer.
The search for similar images with a high probability of new information (i.e., that are not near-duplicates of each other) using traditional k-NNq may need a large number of images to be retrieved. Therefore, the user is frequently required to interact with the system through query refinement and relevance feedback until the system has been fed with enough information to retrieve an image set that brings enough information.
For instance, suppose that a user is looking for the 5 cases most similar to the current one, taking into account the similarity of the MRI image of their patient over the MRIBalan dataset. Two possible results for this example are presented in Figure 2. The result obtained by BridGE is shown in Figure 2 (a). As can be seen, the diversity query retrieved 5 elements that are more dissimilar from
In order to evaluate the information retrieval power, we measured the number of images retrieved by BridGE and by the traditional k-NNq. Figure 3 (a) shows the measurements obtained when answering similarity queries over the MRIBalan dataset. Besides the high number of elements retrieved by BridGE, the results are sorted employing sets of similar elements, where the images that are more likely to add new information are shown at the top. Thus, BridGE retrieves the number of representative images defined by the specialist, as can be seen in Figure 3 (b), while k-NNq fetches those images linearly with a slight slope, in general returning less than 50% of images that are not near-duplicates of each other.
We presented our approach to specialists in CBIR to evaluate both alternatives. They performed similarity queries using both the traditional k-NNq and BridGE. They stated that BridGE is indeed capable of presenting more meaningful images than k-NNq, providing more information about the relationships among the elements, while being able to reduce the need for further query refinements. The specialists also highlighted that similarity queries with diversification that
set. However, the experiments showed that our proposed
technique is faster than the other diversity approaches to
execute similarity queries with diversity. In fact, BridGE
is consistently around two orders of magnitude faster than
OPT, and BridGE is consistently around four orders of
magnitude faster than the CLT algorithm, which is based
on the k-medoid clustering algorithm.
The result presented in Figure 4 shows that our approach can be seamlessly integrated into a CBMIR system at the cost of doubling the required time compared to k-NNq (non-diverse), while the competitors are easily 100 times slower. Moreover, BridGE does not require new information from the specialist to retrieve relevant images in a feasible time, making the use of diversity on medical images transparent and intuitive.
Figure 3. Relation between the number of retrieved elements and the number of representative ones in the result set. (a) Total number of images retrieved for a query by BridGE and k-NNq and (b) number of representative images retrieved.
V. CONCLUSIONS
This paper presented an improved version of result diversification on k-NN queries based on influence to enhance the usability of CBMIR systems. This technique is able to automatically group near-duplicate entries and show only the most representative ones for the query, thus being more likely to retrieve new information for the specialist starting from the first query in an interaction session.
The BridGE method that we proposed includes a new way to transparently express similarity queries with diversity, without cluttering the display with too much information, since it shows just the representative image of each cluster (of near-duplicates) and allows the users to go deeper into the information only if they want to. Moreover, BridGE spots the similar images that can bring new useful information to the user. No information is omitted or lost compared to the traditional k-NNq approach. Instead, all the images considered near-duplicates are grouped and sorted, making it easier for the specialist to browse among them.
To validate our method, we performed experiments using
two real medical datasets that span up to 5,042 data elements
and are represented by feature vectors of up to 310 dimensions. The experiments show that diversity techniques can
enhance the use of CBMIR systems by avoiding query refinement and further relevance feedback cycles. Thus, BridGE can retrieve more representative images than k-NNq and group the near-duplicates to increase the usefulness of the data retrieved. Moreover, unlike the competitors, our proposed technique has the desirable feature of not requesting any external parameter from the users, while keeping the context focused on the query element. The performance tests showed that BridGE is at least 2 orders of magnitude faster than the closest competitors, while being able to retrieve up to 10 times more elements than the traditional k-NNq algorithm.
Evaluation performed by health personnel highlighted that
being able to ask for similarity queries with diversification
without setting external parameters is a more adequate
Figure 4. Average running time for the (a) MRIBalan dataset and (b)
ImageCLEFmed dataset for k varying from 3 to 11.
do not require external parameters demand less effort from them, being simpler and more suitable for embedding into CBMIR systems for use in the daily clinical routine.
B. Performance Experiment
In order to evaluate the retrieval performance of BridGE
regarding its competitors, we also performed queries on the
MRIBalan and ImageCLEFmed datasets. For each evaluated dataset, we randomly chose 100 different elements to
be employed as query elements. Each point measured in
the running time graphs represents the average number of
microseconds required to evaluate 100 queries with constant
values for k, but posed at distinct query elements.
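A measurement protocol of this kind can be sketched as a small timing harness. The code below is an assumption-laden illustration: the name `average_query_time_us`, the fixed random seed and the use of `time.perf_counter` are choices made here, and the original experiments were run on C++ implementations rather than Python.

```python
import random
import time


def average_query_time_us(query_fn, dataset, k, n_queries=100, seed=0):
    """Average time, in microseconds, of query_fn(dataset, center, k) over
    n_queries distinct query centers sampled from the dataset."""
    rng = random.Random(seed)
    centers = rng.sample(dataset, min(n_queries, len(dataset)))
    start = time.perf_counter()
    for center in centers:
        query_fn(dataset, center, k)
    elapsed = time.perf_counter() - start
    return elapsed / len(centers) * 1e6


def scan_knn(dataset, center, k):
    """Hypothetical stand-in query: linear-scan k-NN under squared L2."""
    key = lambda s: sum((x - y) ** 2 for x, y in zip(s, center))
    return sorted(dataset, key=key)[:k]


data = [(float(i), float(i % 7)) for i in range(500)]
for k in range(3, 12, 2):
    print(k, round(average_query_time_us(scan_knn, data, k), 1))
```

Keeping `k` fixed within each measurement and varying only the query center, as in the text, separates the effect of `k` from the effect of where in the dataset the query falls.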
Figure 4 shows, in log scale, the average time demanded
by each technique to answer queries when k varies from 3 to
11 for both MRIBalan (Figure 4 (a)) and ImageCLEFmed
(Figure 4 (b)) datasets. For this experiment, we restricted
the search space of CLT and OPT algorithms to the farthest
element retrieved by BridGE, because those algorithms
require this information in advance (see Definition 1) to
reduce the computational cost of using the entire dataset.
Notice that if we do not have BridGE, this information must
be estimated by the users, using their knowledge about the
dataset, which is a factor hurting CLT and OPT usability.
As expected, the traditional (non-diverse) k-NNq is the fastest algorithm. This occurs because it does not consider the relationship among the elements in the result
expression to pose queries in CBMIR tools. Thus, the main contribution of BridGE is to reduce the system's usability gap, by better expressing similarity queries that consider the relationship among the elements in the result set, without increasing the performance gap. Moreover, we believe that BridGE has great potential to be seamlessly integrated into CBMIR tools, facilitating later inroads of those systems into the clinical routine.
As future work, besides incorporating the developed technique into a CBMIR system with traditional queries to perform a user-centered analysis regarding the feasibility of using diversity in the clinical routine, we will explore other ways to control the context boundary to adequately fit similarity queries into the daily clinical practice.
[8] M. Drosou and E. Pitoura, “Search result diversification,”
SIGMOD Record, vol. 39, no. 1, pp. 41–47, Sep. 2010.
[9] R. L. Dias, R. Bueno, and M. X. Ribeiro, “Reducing the complexity of k-nearest diverse neighbor queries in medical image datasets through fractal analysis,” in Proceedings of 26th IEEE Symposium on Computer-Based Medical Systems, ser. CBMS, 2013, pp. 101–106.
[10] M. R. Vieira, H. L. Razente, M. C. N. Barioni, M. Hadjieleftheriou, D. Srivastava, C. Traina Jr, and V. J. Tsotras, “On query result diversification,” in IEEE 27th International Conference on Data Engineering. Hannover, Germany: IEEE, 2011, pp. 1163–1174.
[11] A. Angel and N. Koudas, “Efficient diversity-aware search,”
in ACM SIGMOD International Conference on Management
of Data. Athens, Greece: ACM, 2011, pp. 781–792.
ACKNOWLEDGMENT
The authors would like to thank FAPESP, CNPq, Capes
and SticAMSUD for the financial support.
[12] D. Kaster, P. Bugatti, M. Ponciano-Silva, A. Traina, P. Marques, A. Santos, and J. Traina, Caetano, “Medfmi-sir: A powerful dbms solution for large-scale medical image retrieval,”
in Information Technology in Bio- and Medical Informatics,
ser. Lecture Notes in Computer Science, C. B¨ohm, S. Khuri,
L. Lhotska, and N. Pisanti, Eds. Springer Berlin Heidelberg,
2011, vol. 6865, pp. 16–30.
R EFERENCES
[1] M. Ponciano-Silva, J. P. Souza, P. H. Bugatti, M. V. N.
Bedo, D. S. Kaster, R. T. V. Braga, A. D. Bellucci, P. M.
Azevedo-Marques, C. T. Jr., and A. J. M. Traina, “Does a
cbir system really impact decisions of physicians in a clinical
environment?” in Proceedings of 26th IEEE Symposium on
Computer-Based Medical Systems., ser. CBMS, 2013, pp. 41–
46.
[13] M. V. N. Bedo, M. P. da Silva, D. dos Santos Kaster, P. H.
Bugatti, A. J. M. Traina, and C. T. Jr, “Higiia: A perceptual
medical cbir system applied to mammography classification,”
in Demo and Applications Session of the XXVII Brazilian
Symposium on Databases (SBBD). S˜ao Paulo, SP: SBC Brazilian Computer Society, 2012.
[2] T. M. Deserno, S. Antani, and L. R. Long, “Ontology of gaps
in content-based image retrieval.” Journal of Digital Imaging,
vol. 22, no. 2, pp. 202–215, 2009.
[14] J. Carbonell and J. Goldstein, “The use of MMR, diversitybased reranking for reordering documents and producing
summaries,” in 21st Annual International ACM SIGIR
Conference on Research and Development in Information
Retrieval. New York, NY, USA: ACM, 1998, pp. 335–
336. [Online]. Available: http://doi.acm.org/10.1145/290941.
291025
[3] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of contentbased image retrieval with high-level semantics,” Pattern
Recognition., vol. 40, no. 1, pp. 262–282, Jan. 2007.
[4] M. O. G¨uld, C. Thies, B. Fischer, and T. M. Lehmann, “A
generic concept for the implementation of medical image
retrieval systems,” International Journal of Medical Informatics, vol. 76, no. 2-3, pp. 252–259, 2007.
[15] T. Skopal, V. Dohnal, M. Batko, and P. Zezula, “Distinct
nearest neighbors queries for similarity search in very large
multimedia databases,” in 11th International Workshop on
Web Information and Data Management. Hong Kong, China:
ACM, 2009, pp. 11–14.
[5] L. F. D. Santos, W. D. Oliveira, M. R. P. Ferreira, A. J. M.
Traina, and C. Traina, Jr., “Parameter-free and domainindependent similarity search with diversity,” in Proceedings
of the 25th International Conference on Scientific and Statistical Database Management, ser. SSDBM. New York, NY,
USA: ACM, 2013, pp. 5:1–5:12.
[16] H. M¨uller, T. Deselaers, T. Deserno, J. Kalpathy Cramer,
E. Kim, and W. Hersh, “Overview of the imageclefmed 2007
medical retrieval and medical annotation tasks,” in Advances
in Multilingual and Multimodal Information Retrieval, ser.
Lecture Notes in Computer Science, C. Peters, V. Jijkoun,
T. Mandl, H. M¨uller, D. Oard, A. Pe˜nas, V. Petras, and
D. Santos, Eds. Springer Berlin Heidelberg, 2008, vol. 5152,
pp. 472–491.
[6] J. Banda, M. Schuh, T. Wylie, P. McInerney, and R. Angryk,
“When too similar is bad: A practical example of the solar
dynamics observatory content-based image-retrieval system,”
in New Trends in Databases and Information Systems, ser.
Advances in Intelligent Systems and Computing, B. Catania,
T. Cerquitelli, S. Chiusano, G. Guerrini, M. K¨ampf, A. Kemper, B. Novikov, T. Palpanas, J. Pokorn´y, and A. Vakali, Eds.
Springer International Publishing, 2014, vol. 241, pp. 87–95.
[17] A. Balan, A. J. M. Traina, A. Traina, and P. AzevedoMarques, “Fractal analysis of image textures for indexing and
retrieval by content,” in Proceedings of 18th IEEE Symposium
on Computer-Based Medical Systems., ser. CBMS, June 2005,
pp. 581–586.
[7] L. F. D. Santos, W. D. Oliveira, M. R. P. Ferreira, R. L. F.
Cordeiro, A. J. M. Traina, and C. T. Jr., “Evaluating the diversification of similarity query results,” Journal of Information
and Data Management, vol. 4, no. 3, pp. 188–203, 2013.