Federated search systems

Prepared to support an invited lecture in the framework of a workshop on federated search organized by ABD/BVD and hosted by KBR in Brussels, 23 October 2012

by [email protected]
2B114
Vrije Universiteit Brussel
B-1050 Brussel
Belgium

These slides should be available from the WWW site
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)

Contents / summary / structure / overview of this presentation
1. Introduction and definition
2. Problem statement
3. Federated search engines as a partial solution
4. Meaning and confusion
5. Advantages / benefits
6. Difficulties / limitations
7. Implementation
8. Putting federated searching in a wider context
9. Combination of merging databases with federated searching

• Federated search

Federated searching
Introduction and definition

Scattering of sources
• Users want to exploit information sources quickly and effectively.
• This is hindered by the fact that digital, electronic information sources that may contain relevant information are created and scattered, distributed over numerous computers all over the intranet of the user's organization AND over the Internet and the WWW.

Scattering of sources
• In other words: integration / aggregation is still far from perfect.

Scattering of sources: difficulties
• Using many information retrieval systems costs time:
1. They must be used one after the other, which requires many decisions and actions.
2. They offer different user interfaces in the retrieval phase, which is confusing.
3. They deliver the information items they find in various data formats, and they display these items in different ways on a computer screen.

Introduction: scattering of sources: difficulties
Small = BEAUTIFUL ?
Federated searching
Problem statement

Problem statement
Which methods have been developed and applied to cope with this reality?

Federated searching
Federated search systems as a partial solution

Introduction: scattering of sources: difficulties
Solutions?
1. Merging of databases
2. Federated searching
See the published paper: P. Nieuwenhuysen, “So many digital libraries, so little time.” In: International Conference on Digital Libraries 2010 (ICDL 2010): Shaping the Information Paradigm, New Delhi, 23-26 February 2010. Preconference proceedings: conference papers. Published by TERI, The Energy and Resources Institute (http://www.teriin.org/), and the Indira Gandhi National Open University, 2010, 2 volumes, 1349 pages, pp. 56-71.

Method 1: Merging = aggregating into a searchable database
[Diagram: users query one search engine, which searches an aggregated database built from several databases or web sites.]

Method 2: Federated searching through scattered databases
[Diagram: users query one federated search engine, which passes the query on to the search engines of several separate databases.]

Both methods offer benefits to the users
+ They save the users the time that would be needed to run queries against various servers or to browse through various systems.
+ The users have to learn only 1 user interface and only 1 search syntax, instead of a user interface and a search syntax for each database.
+ The system offers a uniform / consistent display of results in the output phase.
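Method 1 can be illustrated with a minimal Python sketch, assuming a few made-up sources whose records are copied and merged into one inverted index before any query arrives. The source names, record fields and the "all query words must occur" matching rule are hypothetical choices for illustration, not a description of any particular product.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real system would also stem, drop stopwords, etc.
    return re.findall(r"\w+", text.lower())


def build_merged_index(sources: dict[str, list[dict]]):
    """Method 1, done once in advance: copy all records into one list and index their titles."""
    records: list[dict] = []
    index: dict[str, set[int]] = defaultdict(set)
    for source_name, source_records in sources.items():
        for rec in source_records:
            doc_id = len(records)
            records.append({**rec, "source": source_name})  # keep track of the origin
            for term in tokenize(rec["title"]):
                index[term].add(doc_id)
    return records, index


def search(records: list[dict], index: dict[str, set[int]], query: str) -> list[dict]:
    """Answer a query from the merged index only (all query words must occur)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [records[i] for i in sorted(hits)]


# Hypothetical sources and records.
sources = {
    "catalog_a": [{"title": "Federated search systems"}, {"title": "Library catalogs"}],
    "catalog_b": [{"title": "Search engines and federated portals"}],
}
records, index = build_merged_index(sources)
for hit in search(records, index, "federated search"):
    print(hit["source"], "-", hit["title"])
```

Because the index exists in advance, a query never has to leave the local system; the price is that the merged copy must be rebuilt or updated whenever a source changes.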
Federated searching: definition
An ideal federated search system
1. allows a user to formulate a query,
2. adapts/transforms this query, so that it can be sent with the proper syntax to the search engine of each database in a chosen set/group of disparate databases,
3. broadcasts this query to those databases,
4. collects the results from each database,
5. (perhaps: consolidates these results into 1 result set)
6. (perhaps: detects and removes duplicate items)
7. shows the final results to the user in a unified format,
8. (allows the user to sort the results by various criteria).

Federated searching in many domains
• Federated searching is applied in many domains where information discovery is important, NOT only by libraries.

Federated searching through scattered databases: why?
The perfect trip:
1. A cheap and nice flight
2. A rented car
3. A cheap and nice hotel
4. A visit to a nice museum
5. Something interesting to read (free via your library)

Example
Federated searching: application: finding a suitable flight
• http://CheapTickets.com/ for the USA

Example
Federated searching: application: finding a car to rent
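The steps of the ideal federated search system defined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two "targets" are hypothetical local functions standing in for remote search engines, each with its own made-up query syntax, and duplicate removal only compares case-folded titles.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Target:
    name: str
    translate: Callable[[list[str]], str]   # step 2: adapt the query to this target's syntax
    search: Callable[[str], list[dict]]     # stands in for the remote search engine


def federated_search(terms: list[str], targets: list[Target], sort_key: str = "year") -> list[dict]:
    if not targets:
        return []

    def run(target: Target) -> list[dict]:
        # Steps 2-4: translate, send, collect; step 7: label each record with its source.
        return [{**rec, "source": target.name} for rec in target.search(target.translate(terms))]

    # Step 3: broadcast to all targets in parallel (results come back in target order).
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        per_target = list(pool.map(run, targets))

    # Step 5: consolidate into one result set.
    merged = [rec for results in per_target for rec in results]

    # Step 6: remove duplicates (here: exact match on a case-folded title only).
    seen: set[str] = set()
    unique = []
    for rec in merged:
        key = rec["title"].casefold().strip()
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Step 8: sort by a user-chosen field, most recent first.
    return sorted(unique, key=lambda r: r.get(sort_key, 0), reverse=True)


# Two hypothetical targets with different query syntaxes.
def engine_a(query: str) -> list[dict]:
    words = query.lower().split(" and ")                       # understands "w1 AND w2"
    data = [{"title": "Federated search in libraries", "year": 2010},
            {"title": "Metasearch engines", "year": 2005}]
    return [r for r in data if all(w in r["title"].lower() for w in words)]


def engine_b(query: str) -> list[dict]:
    words = [w.lstrip("+") for w in query.lower().split()]     # understands "+w1 +w2"
    data = [{"title": "Federated Search in Libraries", "year": 2010},
            {"title": "Federated search and link resolvers", "year": 2012}]
    return [r for r in data if all(w in r["title"].lower() for w in words)]


targets = [
    Target("A", lambda ts: " AND ".join(ts), engine_a),
    Target("B", lambda ts: " ".join("+" + t for t in ts), engine_b),
]
# Step 1: the user formulates the query as a list of words.
for rec in federated_search(["federated", "search"], targets):
    print(rec["year"], rec["source"], rec["title"])
```

The same publication arrives from both targets with different capitalisation and is kept only once; real systems need the more tolerant near-duplicate detection discussed under the difficulties.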
Example
Federated searching: application: finding a hotel room

Example
Federated searching: application: searching in a museum

Example
Federated searching: searching in a library

Federated searching: scheme
[Diagram: end users → portal for meta-searching = federated searching = cross-database searching → information sources]

Federated searching: integrating access
[Diagram: a meta-searching system integrates access to the intranet, WWW search engines, articles, journals, publishers, databases (full-text or bibliographic), local library catalog database(s) and catalog database(s) of other libraries.]

Federated searching: produce - distribute - implement
Producers = developers = creators
Intermediate sellers = distributors
Implementers = users (for instance a library)

Federated searching
Meaning and confusion

Federated searching: terminology / vocabulary / synonyms
federated searching
= meta-searching = metasearching
= cross-database searching
= multi-database searching
= multi-threaded searching
= one-stop searching
= poly-searching = polysearching
= broadcast searching
= searching through a portal (but the term “portal” is also used with other meanings)

“Federated searching”: meaning and confusion
Here, as in many other contexts, the term “federated searching” is used as a synonym for “meta-searching”.

“Federated searching”: meaning and confusion
1. Some people use the terms “federated searching” and “meta-searching” with DIFFERENT meanings.
2. This language problem creates confusion.
Examples:
»“Federated searching” as searching through a database that results from merging several databases. This is certainly NOT equal to “meta-searching”.
»“Federated searching” as a subcategory, a powerful type of meta-searching: namely meta-searching followed by merging (federating) the items retrieved from various databases into only 1 set, ordered in one way or another.

“Federated searching”: meaning and confusion
• Furthermore: a federated search engine as a software product is NOT the same as a federated searching system implemented as a service, which may be available to everyone on the WWW, to search
»public WWW search engines
»bookshop databases
»library catalogs / holdings
»flight databases
»hotel databases

Federated searching
Advantages / benefits

Federated searching: benefits for the users
The benefits mentioned earlier, offered
• by merging databases or
• by federated searching
+ …

Federated searching: benefits for the users
+ The system can help the user to select appropriate sources.
+ The system can help in the process of authentication and authorization, when this involves not only a simple recognition of the IP address of the user's client computer, but also user IDs and passwords.
+ The need to know which particular database is suitable for a particular search is reduced, because several databases can be searched in one action.
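One way a system can help users select appropriate sources is to offer pre-selected groups of databases per subject, restricted to those the library actually subscribes to. A minimal sketch, in which all database names, subjects and subscription flags are hypothetical:

```python
# Hypothetical list of target databases maintained by the library.
TARGETS = {
    "MedDB":    {"subjects": {"medicine"}, "subscribed": True},
    "BioAbs":   {"subjects": {"medicine", "biology"}, "subscribed": True},
    "LawIndex": {"subjects": {"law"}, "subscribed": False},
    "OpenCat":  {"subjects": {"medicine", "law", "biology"}, "subscribed": True},
}


def preselect(subject: str) -> list[str]:
    """Databases offered for one subject group: relevant subject AND licensed by the library."""
    return sorted(name for name, t in TARGETS.items()
                  if subject in t["subjects"] and t["subscribed"])


print(preselect("medicine"))
print(preselect("law"))
```

The user then picks a subject group instead of individual databases; "LawIndex" is never offered because the library has no subscription for it.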
Federated searching: benefits for the users
+ It can make users search and exploit databases that they would never use otherwise, that is, without a federated search system!
+ Useful, relevant, interesting items/references can be found/uncovered in unexpected, unknown, unfamiliar databases! This is mainly beneficial for interdisciplinary subjects/topics.

Federated searching: benefits for the users
So far so good!

Federated searching
Difficulties / challenges / problems / limitations

Federated searching: difficulties / challenges / problems !!
- Portal software tries to cope with several difficulties/challenges/problems/pitfalls that hinder the application of the “good idea”. The user does not notice most of these problems and shortcomings, because results from various databases are merged by the federated search system.

Federated searching: difficulties / challenges / problems
- Searching in a target database may be restricted by the federated search engine to a particular field (for example, to words occurring in the title, because this is that system's default way of searching), while this restriction is not present in other target databases. Furthermore, this is perhaps not explained in the user interface. This may lead to lower recall, which is of course NOT desirable. Even worse, the user is perhaps not aware of this.

Federated searching: difficulties / challenges / problems
- How to deduplicate/dedupe/cluster very similar entries/results/items (near-duplicates) from various target sources? When is similar similar enough? Which entry/result/item should be chosen as the representative of a cluster of similar entries?
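One simple, hedged answer to "when is similar similar enough?" is to compare the sets of title words with the Jaccard coefficient and to treat records above a chosen threshold (and with the same year) as near-duplicates. The sketch below assumes records are dicts with a title and optionally a year; the 0.8 threshold and the "most complete record wins" rule for choosing the representative are arbitrary illustrative choices.

```python
import re

THRESHOLD = 0.8  # "similar enough": an arbitrary, tunable cut-off


def words(title: str) -> set[str]:
    # Case-folded word set; punctuation and word order are ignored.
    return set(re.findall(r"\w+", title.casefold()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the intersection divided by size of the union; two empty sets count as identical.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_near_duplicates(records: list[dict], threshold: float = THRESHOLD) -> list[list[dict]]:
    """Greedy single pass: a record joins the first cluster whose first member is similar enough."""
    clusters: list[list[dict]] = []
    for rec in records:
        for cluster in clusters:
            seed = cluster[0]
            if (seed.get("year") == rec.get("year")
                    and jaccard(words(seed["title"]), words(rec["title"])) >= threshold):
                cluster.append(rec)
                break
        else:  # no similar cluster found: start a new one
            clusters.append([rec])
    return clusters


def representative(cluster: list[dict]) -> dict:
    """Choose the most complete record (most non-empty fields) to stand for the cluster."""
    return max(cluster, key=lambda r: sum(1 for v in r.values() if v))


# Hypothetical results collected from two targets.
results = [
    {"title": "So many digital libraries, so little time", "year": 2010, "source": "A"},
    {"title": "So many digital libraries - so little time.", "year": 2010, "source": "B", "pages": "56-71"},
    {"title": "Federated search systems", "year": 2012, "source": "A"},
]
for cluster in cluster_near_duplicates(results):
    print(len(cluster), representative(cluster)["source"], representative(cluster)["title"])
```

Raising the threshold makes the system more cautious (fewer false merges, more visible duplicates); lowering it does the opposite.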
- How to provide a useful relevance ranking of the search results/entries, even when the target databases differ greatly in type and quality, and even when no index has been created in advance, just in case, well before the search action, as Google and other Internet search engines do?

Federated searching: difficulties / challenges / problems !
- Powerful / sophisticated / refined forms of searching may not be applicable in a federated search. Example: limiting to a particular type of document, such as a therapy in medicine. This may cause a LOSS of time, instead of saving time.

Federated searching: difficulties / challenges / problems
- Differences among target sources in the Internet application protocols that they apply by default for connection/communication and retrieval, such as
»telnet, HTTP
»proprietary, non-standard protocols
»Z39.50 (ISO 23950), SRU and related protocols, which were developed for federated searching!

Federated searching: difficulties / challenges / problems
- Even when the target is compatible with a suitable set of protocols for standardised retrieval (Z39.50 / ISO 23950, SRU, …), difficulties can arise due to incomplete implementations: the target may lack features that are supported by the protocol and by the federated searching software.

Federated searching: difficulties / challenges / problems
- When a suitable protocol can NOT be used, so that simple HTTP must be used to connect to the target source, and when the target source presents its results in simple HTML, then capturing and analysing those results is complicated and difficult for the federated search system, and it can be disrupted whenever the target changes the way it presents its results.
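For a target that does support SRU, the federated search system can build a standard searchRetrieve request instead of scraping HTML. The sketch below uses the parameter names of SRU version 1.2 and a query written in CQL; the base URL is hypothetical, and which indexes (such as dc.title) and record schemas a real server accepts must be checked per target, which is exactly where incomplete implementations show up.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, max_records: int = 10,
                   record_schema: str = "dc") -> str:
    """Build an SRU 1.2 searchRetrieve URL (HTTP GET) for one target."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,                  # the query is expressed in CQL
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,       # e.g. Dublin Core, if the server offers it
    }
    return f"{base_url}?{urlencode(params)}"  # urlencode percent-encodes the CQL query


# Hypothetical SRU endpoint of a library catalogue.
url = sru_search_url("https://sru.example.org/catalog", 'dc.title = "federated search"')
print(url)
```

Parsing the XML response into a common record format is a separate step, not shown here.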
Federated searching: difficulties / challenges / problems !
- Various search engines may act in different ways! For instance: Is truncation of a word in a search query possible? Is limitation to a particular field possible? How can a federated search engine take these differences into account?

Federated searching: difficulties / challenges / problems
- A query with several words and without explicit Boolean operators can be interpreted in various ways by the various database retrieval systems. For instance, the retrieval software may apply the Boolean operator AND to combine all the query words, but it may also use OR. If the federated search system does not handle this well, this may lead to lower recall or lower precision.

Federated searching: difficulties / challenges / problems !
- When a specific target source offers special, non-standard, dedicated retrieval software that lets users exploit the database better than a standard retrieval interface does, then the federated search system can probably not exploit that source as well. Searches are reduced to the lowest common denominator.

Federated searching: difficulties / challenges / problems !
- Differences among target sources in the formatting/structuring of their database records into fields hinder
»searching limited to a field (for instance the author field)
»displaying selected fields only (such as the retrieved titles)
»sorting the displayed records on the contents of a particular selected field (such as author or publication date)
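A federated search engine can only take such differences into account if it has some description of each target's behaviour. The sketch below assumes a hypothetical capability profile per target (default operator, truncation support, searchable fields) and a made-up `field:(...)` syntax; it rewrites one user query per target and returns warnings, so that the user can be told when a feature was dropped rather than silently losing recall or precision.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TargetProfile:
    name: str
    default_operator: str = "AND"   # how the target combines words without explicit operators
    truncation: bool = True         # does the target accept "word*"?
    fields: set[str] = field(default_factory=set)  # fields that can be searched separately


def translate(terms: list[str], profile: TargetProfile,
              field_name: Optional[str] = None) -> tuple[str, list[str]]:
    """Return (query string for this target, warnings to show the user)."""
    warnings = []
    out = []
    for term in terms:
        if term.endswith("*") and not profile.truncation:
            warnings.append(f"{profile.name}: truncation not supported, searched '{term[:-1]}' as a whole word")
            term = term[:-1]
        out.append(term)
    # Make AND explicit for targets that would otherwise combine the words with OR.
    query = " AND ".join(out) if profile.default_operator == "OR" else " ".join(out)
    if field_name:
        if field_name in profile.fields:
            query = f"{field_name}:({query})"   # hypothetical field syntax
        else:
            warnings.append(f"{profile.name}: no '{field_name}' field, searched all fields instead")
    return query, warnings


# Hypothetical capability profiles for two targets.
targets = [
    TargetProfile("catalog", default_operator="AND", truncation=True, fields={"title", "author"}),
    TargetProfile("web_db", default_operator="OR", truncation=False),
]
for profile in targets:
    query, warnings = translate(["librar*", "catalog"], profile, field_name="title")
    print(profile.name, "->", query, warnings)
```

Keeping such profiles up to date is itself part of the maintenance burden of a federated search system.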
- In many cases, in federated searching as well as in merging, there are differences among sources in the metadata schemes that are applied in the databases to improve retrieval, such as
»classifications
»taxonomies
»thesaurus systems
»ontologies
This hinders the exploitation of the added value of such metadata.

Federated searching: difficulties / challenges / problems
- A user of a federated search system may incorrectly assume that ALL relevant databases are covered in 1 simple action, or that a database that is not included must not be relevant/important. However, even a federated search system can search only a limited number of databases, so some relevant databases may NOT be covered.

Federated searching: difficulties / challenges / problems
- Students who rely on a federated search system may not learn about the important subject-specific databases in their field. When they no longer have access to the same federated search system, they still do not know which databases may help them in their research and how to use them well.

Federated searching: difficulties / challenges / problems
- Some databases are accessible only to a limited number of concurrent/simultaneous users from one organisation, as agreed in the licence and controlled by the authorization software of the database. If such a database were included automatically in all or in many federated searches, then some users who really need that particular database might not be able to use it.

Federated searching: difficulties / challenges / problems
- When a database is accessible to an unlimited number of concurrent/simultaneous users from one organisation, and when such a database is included automatically in all or in many federated searches from many organisations (even when the searcher has no particular interest in that database), then the retrieval system of that database may be overburdened.
This is mainly a concern for information vendors, who must maintain servers with sufficient capacity.

Federated searching: difficulties / challenges / problems
- Some databases can NOT be included as a target database in a federated search engine, because their owners/producers do not allow this. This is a difficulty, because interesting / valuable databases may then remain unexploited by users who rely on federated searching.

Federated searching: difficulties / challenges / problems
- Users may be less impressed by a federated search system than by the simple, common, familiar, famous Internet / WWW search engines, because its response time is in most cases less impressive, due to differences in
»the computer hardware used by the systems
»slower distributed searching through several computer systems, versus faster searching through a more centralised database of records compiled a priori

Federated searching: difficulties / challenges / problems
- Evaluating the quality of each result of a federated search may be more difficult than when each database is searched separately, because the user may be less aware of the limitations, strengths, selection criteria and aims of the individual, separate databases that supply each result. For instance, peer-reviewed articles from reputable scientific journals may be mixed with more popular, more biased, unscientific texts from trade literature.
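Two of the problems above, licences that cap the number of simultaneous users and slow targets that delay the whole result list, can be handled partly in the broadcasting step. A minimal asyncio sketch, assuming hypothetical per-database session limits and simulated response times (asyncio.sleep stands in for the remote search): a semaphore per database keeps the number of simultaneous sessions within the licence, and asyncio.wait_for cuts off targets that do not answer in time.

```python
import asyncio


class Broker:
    """Shared by all concurrent user searches, so the per-database caps apply globally."""

    def __init__(self, max_sessions: dict[str, int], timeout_s: float):
        # One semaphore per target: at most n simultaneous sessions (licence limit).
        self.limits = {name: asyncio.Semaphore(n) for name, n in max_sessions.items()}
        self.timeout_s = timeout_s

    async def _query_target(self, name: str, query: str, delay: float) -> str:
        async with self.limits[name]:  # wait here while all licensed sessions are busy
            try:
                # asyncio.sleep stands in for the remote search engine's response time.
                await asyncio.wait_for(asyncio.sleep(delay), self.timeout_s)
                return f"{name}: results for {query!r}"
            except asyncio.TimeoutError:
                return f"{name}: no answer within {self.timeout_s} s"

    async def search(self, query: str, delays: dict[str, float]) -> list[str]:
        # Broadcast to all targets at once; slow ones are cut off by the timeout.
        return list(await asyncio.gather(
            *(self._query_target(n, query, d) for n, d in delays.items())))


async def main() -> list[list[str]]:
    # Hypothetical limits and response times.
    broker = Broker({"licensed_db": 1, "open_db": 50}, timeout_s=0.5)
    delays = {"licensed_db": 0.2, "open_db": 2.0}
    # Two users searching at the same moment: the second one waits for the licensed session.
    return await asyncio.gather(broker.search("water", delays),
                                broker.search("fisheries", delays))


for user_results in asyncio.run(main()):
    print(user_results)
```

The cap only covers sessions opened through this broker; users who access the licensed database directly still count against the same licence, so a real system would need a more conservative limit.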
Federated searching: general remarks
Federated searching
- offers benefits to those end-users who are not keen to work with separate target source databases
- is a continuous challenge for the developers of the sophisticated software and for the implementers in libraries and information centres
- does not eliminate the need for access to the individual databases

Federated searching
Implementation

Federated searching: local or remote hosting
• The federated searching system can be developed and maintained
»on a local, in-house computer, or
»on a more distant, external, remote computer; this hosting service, a form of partial outsourcing, is offered by some vendors of federated searching software

Federated searching: local hosting: scheme
[Diagram: end users → in-house portal for meta-searching = federated searching = cross-database searching → information sources]

Federated searching: remote hosting: scheme
[Diagram: end users → externally hosted portal for meta-searching = federated searching = cross-database searching → information sources]

Federated searching: local versus remote hosting
• Remote hosting perhaps requires
»a smaller initial investment in computer hardware and skilled personnel
»less time invested in the installation and maintenance of equipment and software

Federated searching: tasks for the library
• Of course, providing a computer system for metasearching
• Maintaining a list of target information sources that are appropriate in the framework of the particular library:
»the subjects covered by the target databases should be relevant
»the library must have subscribed to the targets
• Grouping the databases into groups that correspond to subject fields, and offering these as pre-selections in the user interface of the federated search system
• Showing the system and its features to potential users

Federated searching in a library WWW site?
- Searching for books
»Catalog of this library
»Other catalogs
»Other book databases
»Electronic books
»Federated searching for books
- Searching for articles
»Databases to find articles
»Electronic journals
»Collective catalog of periodicals
»Repositories of articles on the Internet and WWW
»Federated searching for articles
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library

Federated searching in a library WWW site!
- Find the information that you need (leads to a federated search engine)
- The catalog
- Databases
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library

Federated searching
Putting federated searching in a wider context

Federated searching + link generator
[Diagram: the user finds a reference through a FEDERATED SEARCH of information sources; a context-sensitive hyperlink generator, using a database about the local situation (“knowledgebase”), offers a menu that leads to the appropriate target information source or full-text document.]

Federated search system and link resolver compared
Problem to be solved | Federated search or merging into 1 database | Link resolver = link generator
How to allow a user to discover information, by exploiting many information sources in 1 action? | YES ! | no
How to bring a user from some discovered, known information to additional, related information? | no | YES !

Putting the digital tools together in a library system
[Diagram: the user works via the library WWW site, which connects the catalogue(s) of local holdings, federated searching, and a context-sensitive hyperlink generator with its database about the local situation (“knowledgebase”).]

Access to information sources: tools / methods / systems
In sequence of priority:
1. Online library catalogue (for hard copy and digital documents)
2. Library web site
3. Link generator + “knowledgebase”
4. Federated search system
5. …

Federated searching
Examples of applications offered free of charge

Example
Federated searching: example
• http://WorldWideScience.org/
• “A global science gateway connecting you to national and international scientific databases and portals. Accelerates scientific discovery and progress by providing one-stop searching of global science sources.”

Example
Federated searching: example
• http://www.scitopia.org/scitopia/
• Federated searching through various scientific databases

Example
Federated searching: example: Yippy

Example
Federated searching + clustering: example: Yippy
• Adds value by analysing the retrieved results / hits / links / WWW documents, in order to cluster / group / categorize / classify / map them under headings / classes / categories, which makes further selection by the user / searcher easier and faster.
• Can accomplish this on the fly, that is, WITHOUT preprocessing the documents before the search.
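The slides do not describe how Yippy clusters its results, so the sketch below is only a toy illustration of clustering on the fly: it labels groups with the title words that occur in at least two of the current hits (query words and a few stopwords excluded), so nothing has to be preprocessed before the search. The stopword list and sample titles are made up.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "to", "with", "through"}


def terms(text: str) -> set[str]:
    # Lowercase alphabetic words minus stopwords; no stemming.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cluster_on_the_fly(titles: list[str], query: str, k: int = 3) -> dict[str, list[str]]:
    query_terms = terms(query)
    # Document frequency of candidate labels among the current hits (query words excluded).
    df = Counter(w for title in titles for w in terms(title) - query_terms)
    # Keep up to k labels that are shared by at least two hits.
    labels = [w for w, n in df.most_common() if n >= 2][:k]
    groups = {label: [t for t in titles if label in terms(t)] for label in labels}
    groups["other"] = [t for t in titles if not any(label in terms(t) for label in labels)]
    return groups


hits = [
    "Federated search in academic libraries",
    "Federated search and library portals",
    "Federated search for travel booking",
    "Hotel booking through federated search",
    "Evaluating federated search in libraries",
]
for label, members in cluster_on_the_fly(hits, "federated search").items():
    print(label, members)
```

Here "library" and "libraries" end up in different groups because no stemming is done; real clustering engines use far richer text analysis.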
Search systems for books that are made available by dealers
[Diagram: the user searches federated book search systems, which search multi-dealer databases (= merged book dealer databases) and book dealer catalogue databases, containing descriptions of books, and real books for sale.]

Free federated search systems for books: examples
• http://www.addall.com/
• Covers many book dealer databases and multi-dealer databases, including unique databases that are not covered by competing search systems.
• Can calculate the cost to ship a book to you, taking into account your country and currency.
• Searches only new books; to find used books, a companion system must be used. This is inconvenient if the user is interested in both types of books.

Free federated search systems for books: examples
• http://www.bookfinder.com/
• Covers many book dealer databases and multi-dealer databases, including unique databases that are not covered by competing search systems.
• Efficiently searches new and used books in 1 action; the results are presented in 2 columns: new | used.

Example
Online catalogues: simultaneous searching: examples
• Simultaneous access to catalogues of libraries related to water, organized by IAMSLIC, using the standard Z39.50

Federated searching offered by a university library
• The main goal of such a system is to offer easy and fast access to various information sources, NOT sophisticated and complicated searching.
• The user interface is simple, in agreement with the aim of such a system.

Federated search for libraries and information centres
1. Point your users to existing external federated search services on the WWW, which are available free of charge.
Federated search for libraries and information centres
2. Consider implementing a local federated search system for your users, in the hope that the databases that you offer anyway will be used more often and more effectively.

• Combination of merging databases with federated searching

Comparison of methods for efficient information retrieval
1. Merging databases
2. Federated searching
A smaller version of this table, with comments, has been published: P. Nieuwenhuysen, “So many digital libraries, so little time.” In: International Conference on Digital Libraries 2010 (ICDL 2010): Shaping the Information Paradigm, New Delhi, 23-26 February 2010. Preconference proceedings: conference papers. Published by TERI, The Energy and Resources Institute (http://www.teriin.org/), and the Indira Gandhi National Open University, 2010, 2 volumes, 1349 pages, pp. 56-71.

Comparison of methods for efficient information retrieval
Method | Pre-search analysis of all the data (for better relevance ranking, to eliminate duplicates, etc.) | Size of the coverage | Independent of Internet / WWW | Up-to-date information | Speed of retrieval and display
1. Merging databases | + | -+ | + | - | +-
2. Federated searching | - | +- | - | + | -+

Comparison of methods for efficient information retrieval
Both methods
• have pros and cons
• are used in combination in some systems, to search through many bibliographic databases in 1 action (for instance, if the producer of a database does not allow merging and indexing it with other databases, then that database can nevertheless be included through a federated search)

Comparison of methods for efficient information retrieval
+ The evolution of information and communication technology makes systems more powerful, easier to implement and use, and cheaper:
+ Merging information sources is pushed forward mainly by the decreasing cost of hard disks and of computer memory in general.
+ Federated searching is pushed forward mainly by the evolution of the Internet.

Introduction: scattering of sources: difficulties
Islands of information

Questions are welcome

• You are free to copy, distribute and display this work under the following conditions:
»Attribution: you must mention the author.
»Noncommercial: you may not use this work for commercial purposes.
»No Derivative Works: you may not change, modify, alter, transform or build upon this work.
• For any reuse or distribution, you must make clear to others the license terms of this work.
© Copyright 2024