Federated search systems

Prepared to support an invited lecture in the framework of a workshop on federated search,
organized by ABD/BVD and hosted by KBR in Brussels, 23 October 2012,
by P. Nieuwenhuysen
2B114
Vrije Universiteit Brussel
B-1050 Brussel
Belgium
These slides should be available from the WWW site
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)
Contents / summary / structure / overview of this presentation
1. Introduction and definition
2. Problem statement
3. Federated search engines as a partial solution
4. Meaning and confusion
5. Advantages / benefits
6. Difficulties / limitations
7. Implementation
8. Putting federated searching in a wider context
9. Combination of merging databases with federated searching
Federated searching
Introduction and definition
Scattering of sources
• Users want to exploit information sources quickly and effectively.
• This is hindered by the fact that digital information sources that may contain relevant information are created on, and scattered over, numerous computers, both on the intranet of the user’s organization AND on the Internet and the WWW.
• In other words: integration / aggregation is still far from perfect.
Scattering of sources: difficulties
• Using many information retrieval systems costs time:
1. They must be used one after the other, which requires many decisions and actions.
2. They offer different user interfaces in the retrieval phase, which is confusing.
3. They return the items found in various data formats, and they display these items in different ways on a computer screen.
Scattering of sources: difficulties
Small = BEAUTIFUL ?
Federated searching
Problem statement
Which methods have been developed and applied to cope with this reality?
Federated searching
Federated search systems as a partial solution
Scattering of sources: difficulties
Solutions?
1. Merging of databases
2. Federated searching
See the published paper:
P. Nieuwenhuysen. So many digital libraries, so little time. In International Conference on Digital Libraries, 2010: ICDL 2010, Shaping the Information Paradigm, New Delhi, 23-26 February 2010. Preconference proceedings. Conference papers, published by TERI, The Energy and Resources Institute, http://www.teriin.org/ and the Indira Gandhi National Open University, 2010, 2 volumes, 1349 pages, pp. 56-71.
Method 1: Merging = aggregating into a searchable database
[Diagram: users query one search engine on top of an aggregated database, which is compiled in advance from several databases, web sites, or other sources.]
Method 2: Federated searching through scattered databases
[Diagram: users query a federated search engine, which passes the query on to the separate search engine of each database.]
Both methods offer benefits to the users
+ They save the users the time that would be needed to send queries to various servers or to browse through various systems.
+ The users have to learn only 1 user interface and only 1 search syntax, instead of a user interface and a search syntax for each database.
+ The system offers a uniform / consistent display of results in the output phase.
Federated searching:
definition
An ideal federated search system
1. allows a user to formulate a query,
2. adapts/transforms this query, so that it can be sent with the proper syntax to the search engine of each database in a chosen set/group of disparate databases,
3. broadcasts this query to those databases,
4. collects the results from each database,
5. (perhaps: consolidates these results into 1 result set)
6. (perhaps: detects and removes duplicate items)
7. shows the final results to the user, in a unified format
8. (allows the user to sort the results by various criteria)
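The eight steps of this definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of any particular product: the function federated_search, the toy targets and their record fields are all hypothetical, each target is represented by a pair of plain functions (one that translates the query, one that searches), and duplicates are detected only by comparing normalized titles.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, targets, sort_key=None):
    """Run one query against several targets and return one unified result list.

    `targets` maps a source name to a pair (translate, search): `translate`
    rewrites the user's query into that target's syntax, and `search`
    returns a list of dicts with at least a "title" key.
    """
    def ask(item):
        name, (translate, search) = item
        # Steps 2-4: adapt the query, send it, collect the results.
        return [dict(rec, source=name) for rec in search(translate(query))]

    with ThreadPoolExecutor() as pool:
        per_target = list(pool.map(ask, targets.items()))

    # Step 5: consolidate everything into 1 result set.
    merged = [rec for results in per_target for rec in results]

    # Step 6: drop duplicates (same title, ignoring case and extra spaces).
    seen, unique = set(), []
    for rec in merged:
        key = " ".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Steps 7-8: unified format, optionally sorted on a chosen field.
    if sort_key is not None:
        unique.sort(key=lambda rec: rec.get(sort_key, ""))
    return unique


# Two toy in-memory "databases" standing in for remote search engines.
def catalog(query):
    return [{"title": "Digital Libraries", "year": "2010"}]


def articles(query):
    return [{"title": "digital  libraries", "year": "2010"},
            {"title": "Broadcast Searching", "year": "2008"}]


targets = {
    "catalog": (str.upper, catalog),      # this target expects upper-case queries
    "articles": (lambda q: q, articles),  # this one takes the query unchanged
}
results = federated_search("digital libraries", targets, sort_key="year")
```

In this toy run the two spellings of "Digital Libraries" collapse into one record, and the remaining records are sorted by year. A real system would also have to cope with slow or failing targets.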
Federated searching
in many domains
• Federated searching is applied in many domains,
where information discovery is important,
NOT only by libraries.
Federated searching
through scattered databases: why?
The perfect trip:
1. A cheap and nice flight
2. A rented car
3. A cheap and nice hotel
4. A visit to a nice museum
5. Something interesting to read (free via your library)
Examples
Federated searching: application: finding a suitable flight
• http://CheapTickets.com/ for the USA
Federated searching: application: finding a car to rent
Federated searching: application: finding a hotel room
Federated searching: application: searching in a museum
Federated searching: application: searching in a library
Federated searching: scheme
[Diagram: end users access a portal for meta-searching (= federated searching = cross-database searching), which connects to the information sources.]
Federated searching: integrating access
[Diagram: a meta-searching system integrates access to the intranet, WWW search engines, articles, journals, publishers, databases (full-text or bibliographic), the local library catalog database(s), and the catalog database(s) of other libraries.]
Federated searching:
produce - distribute - implement
Producers = developers = creators
Intermediate sellers = distributors
Implementers = users (for instance a library)
Federated searching
Meaning and confusion
Federated searching: terminology / vocabulary / synonyms
federated searching
= meta-searching = metasearching
= cross-database searching
= multi-database searching
= multi-threaded searching
= one-stop searching
= poly-searching = polysearching
= broadcast searching
= searching through a portal (but the term “portal” is also used with other meanings)
“Federated searching”
meaning and confusion
Here and in many other contexts,
the term “federated searching” is used
as a synonym for “meta-searching”.
1. Some people use the terms “federated searching” and
“meta-searching” with DIFFERENT meanings.
2. This language problem creates confusion.
Examples:
»“Federated searching” as searching through a database that results from merging several databases.
So this is certainly NOT the same as “meta-searching”.
»“Federated searching” as a subcategory, a more powerful type of meta-searching:
namely meta-searching that is followed by merging (federating) the items retrieved from the various databases into only 1 set, ordered in one way or another.
• Furthermore:
A federated search engine as a software product
is NOT the same as
a federated searching system implemented as a service
that can be available for all on the WWW, to search
»public WWW search engines
»bookshop databases
»library catalogs / holdings
»flight databases
»hotel databases
Federated searching
Advantages / benefits
Federated searching: benefits for the users
+ The benefits mentioned earlier, which are offered both by merging databases and by federated searching
+ The system can help the user to select appropriate sources.
+ The system can help in the process of authentication and authorization, when this involves not just a simple recognition of the IP address of the user’s client computer, but also user IDs and passwords.
+ The need to know which particular database is suitable for a particular search is reduced, because several databases can be searched in one action.
+ Users may search and exploit databases that they would never use otherwise, that is, without a federated search system!
+ Useful, relevant, interesting items/references can be uncovered in unexpected, unknown, unfamiliar databases!
This is mainly beneficial in the case of interdisciplinary subjects/topics.
So far so good!
Federated searching
Difficulties / challenges / problems / limitations
Federated searching:
difficulties / challenges / problems !!
- Portal software tries to cope with several difficulties/challenges/problems/pitfalls that hinder the application of the “good idea”.
The user does not notice most of these problems and shortcomings, because the results from the various databases are merged by the federated search system.
Federated searching:
difficulties / challenges / problems
- Searching in a target database may be restricted by the
federated search engine to a particular field (for example:
a restriction to words occurring in the title, because this is
the default way of searching of that system) while this
restriction is not present in other target databases.
Furthermore, this is perhaps not explained in the user
interface.
This may lead to lower recall, which is of course NOT desirable.
Even worse, the user is perhaps not aware of this.
- How to deduplicate/dedupe/cluster
very similar entries/results/items
= near-duplicates,
from various target sources?
When is similar similar enough?
Which entry/result/item
to choose/select
as the representative of a cluster of similar entries?
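One possible answer to these questions can be sketched as follows. The sketch assumes that records are plain dictionaries with a "title" field, that "similar enough" means a title similarity of at least 0.9 (an arbitrary threshold chosen for illustration), and that the record with the most filled-in fields represents its cluster. The function names are hypothetical; only difflib.SequenceMatcher comes from the Python standard library.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lower-case a title and keep only letters, digits and single spaces."""
    kept = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(kept.split())


def cluster_near_duplicates(records, threshold=0.9):
    """Group records whose normalized titles are at least `threshold` similar."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            ratio = SequenceMatcher(None, normalize(rec["title"]),
                                    normalize(cluster[0]["title"])).ratio()
            if ratio >= threshold:
                cluster.append(rec)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append([rec])
    return clusters


def representative(cluster):
    """Pick the record with the most non-empty fields to represent the cluster."""
    return max(cluster, key=lambda rec: sum(1 for value in rec.values() if value))


records = [
    {"title": "So many digital libraries, so little time", "year": ""},
    {"title": "So Many Digital Libraries, So Little Time.", "year": "2010"},
    {"title": "Federated searching in libraries", "year": "2012"},
]
clusters = cluster_near_duplicates(records)
best = [representative(cluster) for cluster in clusters]
```

Comparing every record with every cluster is slow for large result sets, and titles alone can be misleading (different editions, for instance), so this is a starting point only.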
- How to provide some useful relevance ranking of search
results/entries,
even when the target databases can be quite different in
type and quality, and
even when no index is created in advance, just-in-case,
well before the search action, like Google and other
Internet search engines do.
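When no common index exists, one simple fallback is to trust the ranking of each target and to interleave the ranked lists: first all top hits, then all second hits, and so on. The sketch below shows this round-robin merging; it is named here as one possible technique, not as the method used by any particular federated search product, and the function name is hypothetical.

```python
from itertools import zip_longest


def round_robin_merge(ranked_lists):
    """Interleave ranked result lists: all rank-1 hits, then all rank-2 hits, ..."""
    merged = []
    for tier in zip_longest(*ranked_lists):
        # zip_longest pads the shorter lists with None; skip that padding.
        merged.extend(hit for hit in tier if hit is not None)
    return merged


source_a = ["a1", "a2", "a3"]
source_b = ["b1"]
source_c = ["c1", "c2"]
merged = round_robin_merge([source_a, source_b, source_c])
```

This treats all targets as equally good, which is exactly the weakness mentioned above when the targets differ in type and quality.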
- Powerful / sophisticated / refined forms of searching may not be applicable in a federated search.
Example: limiting a search to a particular type of document, such as documents about therapy in medicine.
This may cause a LOSS of time instead of saving time.
- Differences among target sources in the Internet application protocols that are used by default for connection/communication and retrieval, such as
»(telnet)
»HTTP
»proprietary, non-standard protocols
»Z39.50 (ISO 23950), SRU, and related protocols that were developed for federated searching!
- Even when the target is compatible with a suitable set of protocols for standardised retrieval, such as Z39.50 (ISO 23950), SRU…,
then difficulties can arise due to incomplete
implementations
(the target may lack features supported by the protocol
and by the software for federated searching)
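For targets that do support SRU, the request itself is easy to build, because SRU is carried over plain HTTP. The sketch below builds searchRetrieve URLs with the standard SRU parameters (operation, version, query, startRecord, maximumRecords). The base URLs are made-up examples, and whether a given target accepts the index dc.title in the CQL query is an assumption: as noted above, a target may implement only part of the protocol.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start=1, maximum=10, version="1.1"):
    """Build an SRU searchRetrieve request URL for one target."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,        # the query is written in CQL
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)


# The same CQL query goes to every SRU target; only the base URL differs.
# Both base URLs are hypothetical.
bases = ["https://catalog-a.example.org/sru", "https://catalog-b.example.org/sru"]
urls = [sru_search_url(base, 'dc.title = "federated search"') for base in bases]
```

The XML response of each target still has to be fetched and parsed, and that is where incomplete implementations usually show up.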
- When a suitable protocol can NOT be used and simple
HTTP must be used for connection to the target source,
and when simple HTML is used by the target source to
present results,
then the capture and analysis of the results by the
federating search system is complicated and difficult
and can be hindered by changes with time in the method
of the presentation of results.
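The fragility of capturing results from HTML pages can be illustrated with a small sketch. It assumes a hypothetical target that marks each hit as a link with class "title"; the class names and the two page fragments are invented for this illustration. Only html.parser.HTMLParser comes from the Python standard library.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collect the text of every <a class="title"> element on a result page."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("class") == "title":
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False


def scrape_titles(html):
    """Return the titles found on one result page."""
    parser = TitleScraper()
    parser.feed(html)
    return [title.strip() for title in parser.titles]


# The same hit before and after a (hypothetical) redesign of the target site.
old_page = '<ul><li><a class="title" href="/1">Digital libraries</a></li></ul>'
new_page = '<ul><li><a class="hit-title" href="/1">Digital libraries</a></li></ul>'
before = scrape_titles(old_page)
after = scrape_titles(new_page)
```

After the redesign the scraper silently finds nothing, although the hit is still on the page. The federated search system cannot tell this apart from a search with zero results unless somebody monitors the targets.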
- Various search engines may act in different ways!
For instance:
Is truncation of a word in a search query possible?
Is limitation to a particular field possible?
How can a federated search engine take these differences into account?
- A query with several words and without explicit Boolean
operators can be interpreted in various ways
by the various database retrieval systems.
For instance:
The retrieval software may apply the Boolean operator AND
to combine all the query words, but it may also use OR.
If the federated search system does not handle this well, this may lead to lower recall and
precision.
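One way to reduce this ambiguity is to make the Boolean operator explicit before the query is sent, so that every target combines the words in the same way. The sketch below assumes that the targets accept the upper-case operators AND, OR and NOT, which is not true for every retrieval system; the function name is hypothetical.

```python
def make_explicit(query, operator="AND"):
    """Join the words of an operator-free query with one explicit Boolean operator."""
    words = query.split()
    if any(word in ("AND", "OR", "NOT") for word in words):
        return query  # the user already chose operators; leave them alone
    return f" {operator} ".join(words)


user_query = "federated search libraries"
explicit = make_explicit(user_query)
```

A target with a different syntax (for instance "+" instead of AND) would need its own translation step on top of this.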
- When special, non-standard, dedicated retrieval software is made available for a specific target database, to offer special features that let the user exploit the database better than with a standard retrieval interface, then that source can probably not be exploited as well by the federated search system.
Searches are reduced to the lowest common denominator.
- Differences among target sources in the
formatting/structuring of their database records in fields
hinder
- searching limited to a field
(for instance the author field)
- displaying selected fields only
(such as the retrieved titles)
- sorting of the displayed records on the contents of a
particular selected field
(such as author or publication date)
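A common remedy is to map the fields of every target onto one shared set of field names before the results are displayed or sorted. The sketch below assumes two hypothetical targets with invented field names; building and maintaining such a mapping table for real databases is the hard part.

```python
# Target name -> {field name used by the target: common field name}.
# All names below are hypothetical.
FIELD_MAPS = {
    "catalog": {"main_title": "title", "creator": "author", "date": "year"},
    "articles": {"ti": "title", "au": "author", "py": "year"},
}


def to_common(record, target):
    """Rename a target's fields to the common names; unmapped fields are dropped."""
    mapping = FIELD_MAPS[target]
    return {common: record[local]
            for local, common in mapping.items() if local in record}


raw = [
    ("catalog", {"main_title": "Zebra studies", "creator": "Smith", "date": "2011"}),
    ("articles", {"ti": "Antelope notes", "au": "Jones", "py": "2009"}),
]
unified = [to_common(record, target) for target, record in raw]

# With common field names, sorting on author and displaying titles only is easy.
by_author = sorted(unified, key=lambda record: record["author"])
titles_only = [record["title"] for record in by_author]
```

Fields that exist in only one target, or that are structured differently (one name field versus separate first and last names), cannot be handled by renaming alone.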
Federated searching or merging:
difficulties / challenges / problems !
- In many cases there are differences among sources in the
metadata schemes that are applied in the databases to
improve retrieval, such as
»classifications
»taxonomies
»thesaurus systems
»ontologies
- This hinders the exploitation of the added value of such
metadata.
Federated searching:
difficulties / challenges / problems
- A user of a federated search system may perhaps
incorrectly assume that ALL relevant databases are
covered simply in 1 action, or
that if a database is not included,
then it must not be relevant/important.
However, even a federated search system can only search
a limited number of databases, so that perhaps some
relevant databases are NOT covered.
- Students who rely on a federated search system may
perhaps not learn about the important subject-specific
databases in their field,
so that when they have no access anymore to the same
federated search system, they still do not know which
database may help them in their research and how to use
it well.
- Some databases are accessible only by a limited number of
concurrent/simultaneous users from one organisation, as
agreed in the licence and controlled by the authorization
software of the database.
If such a database were included automatically in all or in many federated searches, then some users who really require access to that particular database might not be able to use it.
- When a database is accessible by an unlimited number of
concurrent/simultaneous users from one organisation,
and when such a database would be included
automatically in all or in many federated searches, from
many organisations (even when the searcher does not
have any particular interest in that database),
then the retrieval system of that database may be
overburdened.
This is mainly a concern for information vendors, who
must maintain servers with sufficient capacity.
- Some databases can NOT be included as a target
database in a federated searching engine,
because their owners/producers do not allow this.
This is a difficulty, because in this way interesting /
valuable databases are perhaps not exploited by users
who rely on federated searching.
- Users may be less impressed by a federated searching
system than by the simple, common, familiar, famous
Internet / WWW search engines, as response time is in
most cases less impressive, due to differences as follows:
- The computer hardware used by the systems
- Slower distributed searching through several computer
systems, versus faster searching through a more centralised
computer database of a priori compiled records
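Because the slowest target determines the response time of a naive federated search, one possible mitigation is to query all targets in parallel and to show whatever has arrived when a deadline passes. The sketch below illustrates this with two toy targets; the function names, the deadline of 0.5 seconds and the simulated delay are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def search_with_deadline(query, targets, deadline=0.5):
    """Query all targets in parallel; return what arrived in time and who was late."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, not_done = wait(list(futures), timeout=deadline)
    pool.shutdown(wait=False)  # do not block on the slow targets
    results = {futures[future]: future.result() for future in done}
    late = sorted(futures[future] for future in not_done)
    return results, late


def fast(query):
    return [query + " (fast hit)"]


def slow(query):
    time.sleep(1.0)  # simulates a slow remote server
    return [query + " (slow hit)"]


results, late = search_with_deadline("water", {"fast": fast, "slow": slow})
```

The price is that the result set becomes incomplete, so the user interface should at least tell the user which targets did not answer in time.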
- The evaluation of the quality of each search result
from a federated search action
may be more difficult than when each database is
searched separately, because the user may be less aware
of the limitations, strengths, selection criteria and aims of
the individual, separate databases that offer each result.
For instance, peer-reviewed articles from reputable scientific
journals may be mixed with more popular and more biased,
unscientific texts from trade literature.
Federated searching:
general remarks
Federated searching
- offers benefits for those end-users who are not keen to work with separate target databases
- is a continuous challenge for the developers of the sophisticated software and for the implementers in libraries and information centers
- does not eliminate the need for access to individual databases
Federated searching
Implementation
Federated searching:
local or remote hosting
• The federated searching system can be developed and maintained
»on a local computer, in-house, or
»on an external, remote computer; this hosting service is offered by some vendors of software for federated searching (= partly outsourcing)
Federated searching: local hosting: scheme
[Diagram: end users access an in-house portal for meta-searching (= federated searching = cross-database searching), which connects to the information sources.]
Federated searching: remote hosting: scheme
[Diagram: end users access an externally hosted portal for meta-searching (= federated searching = cross-database searching), which connects to the information sources.]
Federated searching:
local versus remote hosting
• Remote hosting may require
»a smaller initial investment in computer hardware and skilled personnel
»less time for the installation and maintenance of equipment and software
Federated searching: tasks for the library
• Providing, of course, a computer system for metasearching
• Maintaining a list of target information sources that are appropriate for the particular library:
»the subjects covered by the target databases should be relevant
»the library must have subscriptions that give access to the targets
• Arranging the databases in groups that correspond to subject fields, and offering these groups as pre-selections in the user interface of the federated search system
• Showing the system and its features to potential users
Federated searching in a library WWW site?
- Searching for books
  - Catalog of this library
  - Other catalogs
  - Other book databases
  - Electronic books
  - Federated searching for books
- Searching for articles
  - Databases to find articles
  - Electronic journals
  - Collective catalog of periodicals
  - Repositories of articles on the Internet and WWW
  - Federated searching for articles
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library
Federated searching in a library WWW site!
- Find the information that you need (leads to a federated search engine)
- The catalog
- Databases
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library
Federated searching
Putting federated searching
in a wider context
Federated searching + link generator
[Diagram: through a FEDERATED SEARCH of the information sources, the user obtains a reference; a context-sensitive hyperlink generator, which relies on a “knowledgebase” (a database about the local situation), then offers a menu that leads to the appropriate target information source, for instance the full-text document.]
Federated search system and link resolver compared
Problem to be solved | Federated search or merging into 1 database | Link resolver = link generator
How to allow a user to discover information, by exploiting many information sources in 1 action? | YES ! | no
How to bring a user from some discovered, known information to additional, related information? | no | YES !
Putting the digital tools together in a library system
[Diagram: through the library WWW site, the user reaches the catalogue(s) of local holdings, federated searching, and the context-sensitive hyperlink generator with its “knowledgebase” (a database about the local situation).]
Access to information sources:
tools / methods / systems
In sequence of priority:
1. Online library catalogue
(for hard copy and digital documents)
2. Library web site
3. Link generator + “knowledgebase”
4. Federated search system
5. …
Federated searching
Examples of applications offered free of charge
Example: http://WorldWideScience.org/
• “A global science gateway connecting you to national and international scientific databases and portals.
Accelerates scientific discovery and progress by providing one-stop searching of global science sources.”
Example: http://www.scitopia.org/scitopia/
• Federated searching through various scientific databases
Example: Yippy (federated searching + clustering)
• Adds value by analyzing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map these under headings / classes / categories, to make further selections by the user / searcher easier and faster.
• Can accomplish this on the fly, that is, WITHOUT preprocessing the documents before the search.
Search systems for books that are made available by dealers
[Diagram: the user searches federated book search systems, which search multi-dealer databases (= merged book dealer databases) and book dealer catalog databases; these contain descriptions of books and lead to real books for sale.]
Free federated search systems
for books: examples
• http://www.addall.com/
• Covers many book dealer databases and multi-dealer
databases, including unique databases that are not
covered by competing search systems.
• Can calculate the cost to ship/send a book to you, taking
into account your country and currency.
• Searches only new books;
to find used books, a companion system should be used.
This is inconvenient if the user is interested in both types of
books.
• http://www.bookfinder.com/
• Covers many book dealer databases and multi-dealer
databases, including unique databases that are not
covered by competing search systems.
• New and used books are searched in 1 action, which is efficient;
the results are presented in 2 columns: new | used.
Online catalogues:
simultaneous searching: examples
• Simultaneous access to catalogues of libraries related to
water, organized by IAMSLIC, using the standard Z39.50
Federated searching
offered by a university library
• The main goal of such a system is to offer easy and fast access to various information sources, and NOT sophisticated and complicated searching.
• The user interface is simple, in agreement with the aim of such a system.
Federated search
For libraries and information centres
1. Point your users to existing external federated search services on the WWW, which are available free of charge.
2. Consider implementing a local federated search system for your users, in the hope that the databases that you offer anyway will be used more often and more effectively.
Combination of merging databases with federated searching
Comparison of methods for efficient information retrieval
Method | Pre-search analysis of all data (for better relevance ranking, to eliminate duplicates, etc…) | Size of the coverage | Independent of Internet / WWW | Up-to-date information | Speed of retrieval and display
1. Merging databases | + | -+ | + | - | +-
2. Federated searching | - | +- | - | + | -+
A smaller version of this table with comments has been published:
P. Nieuwenhuysen. So many digital libraries, so little time. In International Conference on Digital Libraries, 2010: ICDL 2010, Shaping the Information Paradigm, New Delhi, 23-26 February 2010. Preconference proceedings. Conference papers, published by TERI, The Energy and Resources Institute, http://www.teriin.org/ and the Indira Gandhi National Open University, 2010, 2 volumes, 1349 pages, pp. 56-71.
Comparison of methods
for efficient information retrieval
Both methods
• have pros and cons
• are used in combination in some systems, to search through many bibliographic databases in 1 action
(for instance, if merging + indexing of a database with other databases is not allowed by the producer, then that database can nevertheless be included through a federated search)
+ The evolution of information and communication
technology makes systems more powerful, easier to
implement and use, and cheaper:
+ Merging information sources is pushed forward mainly by
the decreasing costs of hard disks and of computer
memory in general.
+ Federated searching is pushed mainly by the evolution of
the Internet.
Scattering of sources: difficulties
Islands of information
Questions are welcome
• You are free to copy, distribute, display this work under
the following conditions:
»Attribution:
You must mention the author.
»Noncommercial:
You may not use this work for commercial purposes.
»No Derivative Works:
You may not change, modify, alter, transform, or build
upon this work.
• For any reuse or distribution, you must make clear to
others the license terms of this work.