Front cover

Patterns: Portal Search Custom Design
Applying the Information Aggregation
patterns to portal search solutions
Hints/tips for using IBM search
technologies
A portal search scenario
William Tworek
Christopher Desforges
Robert Bell
Raghu Krishnaswamy
ibm.com/redbooks
International Technical Support Organization
Patterns: Portal Search Custom Design
April 2004
SG24-6881-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page ix.
First Edition (April 2004)
© Copyright International Business Machines Corporation 2004. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP
Schedule Contract with IBM Corp.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team that wrote this redbook. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Part 1. Introductory material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 1. Patterns for e-business introduction . . . . . . . . . . . . . . . . . . . . . 3
1.1 The IT architect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 The Patterns for e-business layered asset model . . . . . . . . . . . . . . . . . . . . 4
1.3 How to use the Patterns for e-business . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Select a Business, Integration, or Composite pattern, or a Custom
design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Select Application patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.3 Review Runtime patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.4 Review Product mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3.5 Review guidelines and related links . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Chapter 2. Portal composite pattern and custom designs introduction . 17
2.1 Introduction to the Portal composite pattern . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.1 Business drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 Jump-start portal questions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.3 IT drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Understanding the Patterns for e-business . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Portal custom designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 Access Integration pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.2 Self-Service business pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.3 Collaboration business pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.4 Information Aggregation business pattern . . . . . . . . . . . . . . . . . . . . 27
2.3.5 Extended Enterprise business pattern . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.6 Application Integration pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.7 Portal characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.8 The Portal composite pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.9 Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.10 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Part 2. Portal Search custom design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Chapter 3. The Portal Search custom design . . . . . . . . . . . . . . . . . . . . . . . 35
3.1 What is a Custom design? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 The need for portal search capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Technology drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 The Custom design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 4. Application patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1 An overview of the Application patterns . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Application Integration patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2.1 Population: Single Step, Multi-step, and Data Cleansing . . . . . . . . . 46
4.2.2 Population: Index Population application pattern . . . . . . . . . . . . . . . 50
4.2.3 Population: Synchronization application pattern . . . . . . . . . . . . . . . . 54
4.2.4 Federation application pattern. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3 Information Aggregation patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.1 User Information Access application pattern. . . . . . . . . . . . . . . . . . . 57
4.3.2 User Search and Discovery application pattern . . . . . . . . . . . . . . . . 61
4.3.3 Self-Service application patterns compared . . . . . . . . . . . . . . . . . . . 63
4.4 Combining the patterns for search solutions . . . . . . . . . . . . . . . . . . . . . . . 63
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Chapter 5. Runtime patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.1 Runtime node descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Runtime pattern for the Portal composite pattern . . . . . . . . . . . . . . . . . . . 72
5.3 Runtime pattern for Portal Search custom design. . . . . . . . . . . . . . . . . . . 73
5.4 Application Integration Runtime patterns . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.4.1 Population: Index Population Runtime pattern . . . . . . . . . . . . . . . . . 76
5.4.2 Federation Runtime pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5 Information Aggregation Runtime patterns . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5.1 User Search and Discovery Runtime pattern . . . . . . . . . . . . . . . . . . 86
5.5.2 Information Aggregation in business intelligence solutions. . . . . . . . 90
5.6 Combining the Runtime patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Chapter 6. Portal Search product mappings. . . . . . . . . . . . . . . . . . . . . . . . 93
6.1 Mapping the Runtime pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.1.1 Functional mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.1.2 Product mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1.3 Network protocol mappings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.2 Product descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2.1 Lotus Extended Search. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2.2 DB2 Information Integrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.2.3 Lotus Domino . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.2.4 Lotus Discovery Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2.5 WebSphere Application Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.6 WebSphere Portal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.7 WebSphere Portal Search Engine (Juru) . . . . . . . . . . . . . . . . . . . . 104
6.3 Choosing the product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Part 3. Solution guidelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Chapter 7. Technology considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.1 Query syntax support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.2 Support for a common data model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.3 Simple versus advanced index creation . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.4 Honoring the security of data sources. . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.5 Source discovery. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.6 Performance considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.7 Client features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.8 Client technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.8.1 HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.8.2 Dynamic HTML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.8.3 JavaScript . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.8.4 Java applets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.8.5 Java servlets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.8.6 JavaServer Pages (JSPs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.8.7 JavaBeans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.8.8 XML. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.8.9 Web Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Chapter 8. Application design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.2 WebSphere Portal Services architecture diagram . . . . . . . . . . . . . . . . . 135
8.2.1 Single-Tier versus Multi-Tier design . . . . . . . . . . . . . . . . . . . . . . . . 136
8.3 Portal solution guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
8.3.1 Model-View-Controller design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.3.2 Content management guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.3.3 Single sign-on guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.3.4 Collaboration guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
8.3.5 Web services guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
8.5 Where to find more information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Part 4. Technical scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Chapter 9. “Chrisco Books” scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.1 Chrisco Books scenario: story line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.2 Chrisco Books scenario: requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.2.1 Functional requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.2.2 Non-functional requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.2.3 Summary of requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
9.3 Patterns mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
9.3.1 Examining the business requirements . . . . . . . . . . . . . . . . . . . . . . 159
9.3.2 Solution options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.3.3 Integrating the solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.4 Expanding the scenario. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Chapter 10. Technical implementation of the scenario . . . . . . . . . . . . . . 167
10.1 The runtime environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
10.2 The Lotus Domino server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
10.3 The IBM Content Manager server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
10.4 The Lotus Extended Search server . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
10.4.1 Internet and Intranet data source setup . . . . . . . . . . . . . . . . . . . . 179
10.4.2 Domino application data source setup . . . . . . . . . . . . . . . . . . . . . 189
10.4.3 IBM Content Manager data source setup . . . . . . . . . . . . . . . . . . . 190
10.5 The WebSphere Portal server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.6 Putting it all together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Part 5. Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Appendix A. Pattern changes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Appendix B. Understanding the Lotus Extended Search architecture . 207
Extended Search architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Links and translators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Brokers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Configuration database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Appendix C. Using the WebSphere Portal Search Engine . . . . . . . . . . . 219
How to set up Portal Search in WebSphere Portal Server. . . . . . . . . . . . . . . 220
Creating the Search page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
Building a Juru Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Setting up permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Configuring the crawler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Other resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
Referenced Web sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
How to get IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
IBM Redbooks collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions
are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES
THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrates programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy,
modify, and distribute these sample programs in any form without payment to IBM for the purposes of
developing, using, marketing, or distributing application programs conforming to IBM's application
programming interfaces.
Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®
DB2®
DB2 Information Integrator™
DB2 Universal Database™
Domino™
Domino.Doc®
EDMSuite™
Eserver™
IBM®
ibm.com®
ImagePlus®
Informix®
iSeries™
Lotus®
Lotus Discovery Server™
Lotus Notes®
Notes®
OS/390®
Redbooks™
Redbooks (logo)
Sametime®
SmartSuite®
VisualInfo™
WebSphere®
z/OS®
The following terms are trademarks of other companies:
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of others.
Preface
The Patterns for e-business are a group of proven, reusable assets that can
speed the process of developing applications. The Portal Search custom design
builds on the Portal composite pattern, combining Business and Integration
patterns to help implement a portal search solution. This IBM Redbook provides
a technical scenario and guidelines for the Portal Search custom design. It also
shows how the Portal Search custom design works, and documents the tasks
required to build a technical scenario of it.
Part 1 provides introductory material about the IBM Patterns for e-business and the Portal composite pattern on which this custom design is based.
Part 2 guides you through the process of choosing the Business and Integration
patterns of the custom design, and then drills down to the Application and Runtime patterns and Product mappings that deliver the desired functionality.
Part 3 provides a set of guidelines for implementing and building a portal search
solution, including a discussion of search technology selection criteria as well as
application design and development.
Part 4 demonstrates how to implement a portal search solution via a technical
scenario. This technical scenario uses the WebSphere® Portal Extend offering,
combined with Lotus® Extended Search.
Finally, the appendixes of this redbook provide additional technical details about some of the products used in this custom design, including Lotus Extended Search and the WebSphere Portal Search Engine technology.
The team that wrote this redbook
This redbook was produced by a team of specialists from around the world
working at the International Technical Support Organization, Cambridge Center.
William Tworek is a Project Leader with the International Technical Support
Organization, working out of Westford, Massachusetts. He provides
management and technical leadership for projects that produce IBM Redbooks™
on various topics involving IBM and Lotus Software technologies. Prior to joining
the ITSO, he was an IT Architect in the consulting industry working for Andersen
Consulting/Accenture, followed by IBM Software Services for Lotus. His areas of
expertise include collaborative technologies and business portals, system
integration, and systems infrastructure design.
Christopher Desforges is a Consulting IT Architect with IBM Software Services
for Lotus, working out of New York.
Robert Bell is an Advisory IT Specialist with IBM Software Services for Lotus, working out of California.
Raghu Krishnaswamy is a Senior Software Engineer with IBM Global Services
India. He holds a Bachelor's Degree in Electronic and Communication
Engineering, and has experience in Application and Frameworks Architecture.
Thanks to the following people for their contributions to this project:
- Jonathan Adams, Distinguished Engineer, IBM UK, Software Group Technical Strategy
- Michele Galic, WebSphere Specialist, International Technical Support Organization, Raleigh NC
- David Bryant, DB2/Business Intelligence Consultant, IBM UK
- Todd Leyba, Architect, Extended Search Technology and Development, IBM SWG
- Dana Morris, Advisory Software Engineer, Extended Search Technology and Development, IBM SWG
- Yvonne Lyon, Technical Editor, International Technical Support Organization, San Jose CA
Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook
dealing with specific products or solutions, while getting hands-on experience
with leading-edge technologies. You'll team with IBM technical professionals,
Business Partners and/or customers.
Your efforts will help increase product acceptance and customer satisfaction. As
a bonus, you'll develop a network of contacts in IBM development labs, and
increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our Redbooks to be as helpful as possible. Send us your comments
about this or other Redbooks in one of the following ways:
- Use the online Contact us review redbook form found at:
  ibm.com/redbooks
- Send your comments in an Internet note to:
  [email protected]
Part 1. Introductory material
Note: In the first part of this redbook, we introduce you to the IBM Patterns for
e-business, and the Portal composite pattern on which this book is based.
Those already familiar with the IBM Patterns for e-business, or the Portal
composite pattern, may want to skip forward to Part 2, “Portal Search custom
design” on page 33.
Chapter 1. Patterns for e-business introduction
This redbook is part of the Patterns for e-business series. In this introductory
chapter we provide an overview of how IT architects can work effectively with the
Patterns for e-business.
1.1 The IT architect
The role of the IT architect is to evaluate business problems and to build
solutions to solve them. To do this, the architect begins by gathering input on the
problem, an outline of the desired solution, and any special considerations or
requirements that need to be factored into that solution. The architect then takes
this input and designs the solution. This solution can include one or more
computer applications that address the business problems by supplying the
necessary business functions.
To help the architect do this better each time, the experience of IT architects needs to be captured and reused so that future engagements become simpler and faster. These experiences are gathered into a repository of proven assets from which architects can build future solutions. This reuse saves time, money, and effort, and in the process helps ensure delivery of a solid, properly architected solution.
The IBM Patterns for e-business help facilitate this reuse of assets. Their purpose is to capture and publish e-business artifacts that have been used, tested, and proven. The information they capture is assumed to fit the majority of situations (the 80 in the 80/20 rule).
The IBM Patterns for e-business are further augmented with guidelines and related links that make them easier to apply.
The layers of patterns plus their associated links and guidelines allow the
architect to start with a problem and a vision for the solution, and then find a
pattern that fits that vision. Then, by drilling down using the patterns process, the
architect can further define the additional functional pieces that the application
will need to succeed. Finally, he can build the application using coding
techniques outlined in the associated guidelines.
1.2 The Patterns for e-business layered asset model
The Patterns for e-business approach enables architects to implement
successful e-business solutions through the re-use of components and solution
elements from proven successful experiences. The Patterns approach is based
on a set of layered assets that can be exploited by any existing development
methodology. These layered assets are structured in a way that each level of
detail builds on the last. These assets include:
- Business patterns that identify the interaction between users, businesses, and data.
- Integration patterns that tie multiple Business patterns together when a solution cannot be provided based on a single Business pattern.
- Composite patterns that represent commonly occurring combinations of Business patterns and Integration patterns.
- Application patterns that provide a conceptual layout describing how the application components and data within a Business pattern or Integration pattern interact.
- Runtime patterns that define the logical middleware structure supporting an Application pattern. Runtime patterns depict the major middleware nodes, their roles, and the interfaces between these nodes.
- Product mappings that identify proven and tested software implementations for each Runtime pattern.
- Best-practice guidelines for design, development, deployment, and management of e-business applications.
These assets and their relation to each other are shown in Figure 1-1.
Figure 1-1 The Patterns for e-business layered asset model (customer requirements drive Composite, Business, and Integration patterns, which drill down to Application patterns, Runtime patterns, and Product mappings, all supported by best-practice guidelines for application design, application development, systems management, performance, and technology choices, and usable with any methodology)
Patterns for e-business Web site
The Patterns Web site provides an easy way of navigating top down through the
layered Patterns’ assets in order to determine the preferred reusable assets for
an engagement.
For easy reference to Patterns for e-business, refer to the Patterns for
e-business Web site at:
http://www.ibm.com/developerWorks/patterns/
1.3 How to use the Patterns for e-business
As described in the last section, the Patterns for e-business are a layered
structure where each layer builds detail on the last. At the highest layer are
Business patterns. These describe the entities involved in the e-business
solution.
Composite patterns appear in the hierarchy shown in Figure 1-1 on page 5 above
the Business patterns. However, Composite patterns are made up of a number of
individual Business patterns, and at least one Integration pattern. In this section,
we discuss how to use the layered structure of Patterns for e-business assets.
1.3.1 Select a Business, Integration, or Composite pattern, or a
Custom design
When faced with the challenge of designing a solution for a business problem,
the first step is to take a high-level view of the goals you are trying to achieve. A
proposed business scenario should be described and each element should be
matched to an appropriate IBM Pattern for e-business. You may find, for
example, that the total solution requires multiple Business and Integration
patterns, or that it fits into a Composite pattern or Custom design.
For example, suppose an insurance company wants to reduce the amount of
time and money spent on call centers that handle customer inquiries. By allowing
customers to view their policy information and to request changes online, they
will be able to cut back significantly on the resources spent handling this by
phone. The objective is to allow policyholders to view their policy information
stored in legacy databases.
The Self-Service business pattern fits this scenario perfectly. It is meant to be
used in situations where users need direct access to business applications and
data. Let’s take a look at the available Business patterns.
Business patterns
A Business pattern describes the relationship between the users, the business
organizations or applications, and the data to be accessed.
There are four primary Business patterns, explained in Figure 1-2.
- Self-Service (User-to-Business): Applications where users interact with a business via the Internet or intranet. Examples: simple Web site applications.
- Information Aggregation (User-to-Data): Applications where users can extract useful information from large volumes of data, text, images, etc. Examples: business intelligence, knowledge management, Web crawlers.
- Collaboration (User-to-User): Applications where the Internet supports collaborative work between users. Examples: e-mail, community, chat, video conferencing, etc.
- Extended Enterprise (Business-to-Business): Applications that link two or more business processes across separate enterprises. Examples: EDI, supply chain management, etc.

Figure 1-2 The four primary Business patterns
It would be very convenient if all problems fit nicely into these four slots, but
reality says that things will often be more complicated. The patterns assume that
most problems, when broken down into their most basic components, will fit
more than one of these patterns. When a problem requires multiple Business
patterns, the Patterns for e-business provide additional patterns in the form of
Integration patterns.
Integration patterns
Integration patterns allow us to tie together multiple Business patterns to solve a
business problem. The Integration patterns are outlined in Figure 1-3.
- Access Integration: Integration of a number of services through a common entry point. Examples: portals.
- Application Integration: Integration of multiple applications and data sources without the user directly invoking them. Examples: message brokers, workflow managers.

Figure 1-3 Integration patterns
These Business and Integration patterns can be combined to implement
installation-specific business solutions. We call this a Custom design.
We can represent the use of a Custom design to address a business problem through an iconic representation, as shown in Figure 1-4.

Figure 1-4 Patterns representing a Custom design (an iconic view showing the Access Integration, Self-Service, Collaboration, Information Aggregation, Extended Enterprise, and Application Integration blocks)
If any of the Business or Integration patterns are not used in a Custom design,
we can show that with the blocks being lighter than the other ones. For example,
Figure 1-5 shows a Custom design that does not have a Collaboration business
pattern or an Extended Enterprise business pattern for a business problem.
Figure 1-5 Custom design with Self-Service, Information Aggregation, Access Integration, and Application Integration
A Custom design may also be a Composite pattern if it recurs many times across
domains with similar business problems. For example, the iconic view of a
Custom design in Figure 1-5 can also describe a Sell-Side Hub composite
pattern.
Composite patterns
Several common uses of Business and Integration patterns have been identified
and formalized into Composite patterns. The identified Composite patterns are
shown in Figure 1-6.
- Electronic Commerce: User-to-Online-Buying. Examples: www.macys.com, www.amazon.com.
- Portal: Typically designed to aggregate multiple information sources and applications to provide uniform, seamless, and personalized access for its users. Examples: enterprise intranet portals providing self-service functions such as payroll, benefits, and travel expenses; collaboration providers who provide services such as e-mail or instant messaging.
- Account Access: Provides customers with around-the-clock access to their account information. Examples: online brokerage trading apps; telephone company account manager functions; bank, credit card, and insurance company online apps.
- Trading Exchange: Allows buyers and sellers to trade goods and services on a public site. Examples: buyer's side - interaction between the buyer's procurement system and the commerce functions of the e-Marketplace; seller's side - interaction between the procurement functions of the e-Marketplace and its suppliers.
- Sell-Side Hub (Supplier): The seller owns the e-Marketplace and uses it as a vehicle to sell goods and services on the Web. Example: www.carmax.com (car purchase).
- Buy-Side Hub (Purchaser): The buyer of the goods owns the e-Marketplace and uses it as a vehicle to leverage the buying or procurement budget in soliciting the best deals for goods and services from prospective sellers across the Web. Example: www.wre.org (WorldWide Retail Exchange).

Figure 1-6 Composite patterns
The makeup of these patterns is variable in that there will be basic patterns
present for each type, but the Composite can easily be extended to meet
additional criteria. For more information on Composite patterns, refer to Patterns
for e-business: A Strategy for Reuse by Jonathan Adams, Srinivas Koushik,
Guru Vasudeva, and George Galambos.
1.3.2 Select Application patterns
Once the Business pattern is identified, the next step is to define the high-level
logical components that make up the solution and how these components
interact. This is known as the Application pattern. A Business pattern will usually
have multiple possible Application patterns. An Application pattern may have
logical components that describe a presentation tier for interacting with users, an
application tier, and a back-end application tier.
Application patterns break the application down into the most basic conceptual
components, identifying the goal of the application. In our example, the
application falls into the Self-Service business pattern and the goal is to build a
simple application that allows users to access back-end information. The
Application pattern shown in Figure 1-7 fulfills this requirement.
Figure 1-7 Self-Service::Directly Integrated Single Channel (a Presentation node and a Web Application node containing new or modified components communicate synchronously; the Web Application node reads/writes data and connects synchronously or asynchronously to existing back-end applications that need no modification or cannot be changed)
The Application pattern shown consists of a presentation tier that handles the
request/response to the user. The application tier represents the component that
handles access to the back-end applications and data. The multiple application
boxes on the right represent the back-end applications that contain the business
data. The type of communication is specified as synchronous (one request/one
response, then next request/response) or asynchronous (multiple requests and
responses intermixed).
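As a rough, hypothetical illustration of this pattern, the sketch below shows a single Web application tier that handles the presentation (the request/response with the user) and makes one synchronous call to a back-end data source. The servlet name, the JNDI data source name, and the table and column names are assumptions made for illustration only; they do not come from this redbook's scenario or from the Patterns assets.

Example: A minimal Directly Integrated Single Channel sketch (hypothetical names)

import java.io.IOException;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

public class PolicyViewServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String policyId = req.getParameter("policyId");
        try {
            // Application tier: one synchronous request to the back-end data source.
            // The JNDI name jdbc/PolicyDB is an assumption for this sketch.
            DataSource ds = (DataSource) new InitialContext()
                    .lookup("java:comp/env/jdbc/PolicyDB");
            Connection con = ds.getConnection();
            PreparedStatement ps = con.prepareStatement(
                    "SELECT holder, status FROM policy WHERE id = ?");
            ps.setString(1, policyId);
            ResultSet rs = ps.executeQuery();

            // Presentation tier: render the response to the user.
            resp.setContentType("text/html");
            PrintWriter out = resp.getWriter();
            if (rs.next()) {
                out.println("<p>Holder: " + rs.getString("holder")
                        + ", Status: " + rs.getString("status") + "</p>");
            } else {
                out.println("<p>No policy found.</p>");
            }
            rs.close();
            ps.close();
            con.close();
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
}

In a real solution, the presentation markup would normally move to a JSP and the data access into a separate class, but the single synchronous request/response path is the essence of the pattern.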
Suppose that the situation is a little more complicated than that. Let's say that the
automobile policies and the homeowner policies are kept in two separate and
dissimilar databases. The user request would actually need data from multiple,
disparate back-end systems. In this case there is a need to break the request
down into multiple requests (decompose the request) to be sent to the two
different back-end databases, then to gather the information sent back from the
requests, and then put this information into the form of a response (recompose).
In this case the Application pattern shown in Figure 1-8 would be more
appropriate.
Figure 1-8 Self-Service::Decomposition (the Presentation node connects synchronously to a Decomposition/Recomposition node containing new or modified components; that node reads/writes data, holds transient data such as work in progress, cached committed data, and staged data from replication flows, and connects synchronously or asynchronously to the existing back-end applications)
This Application pattern extends the idea of the application tier that accesses the
back-end data by adding decomposition and recomposition capabilities.
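To make the decompose/recompose idea concrete, the following sketch splits one customer request into two queries, one against an automobile policy database and one against a homeowner policy database, and then recomposes the rows into a single list for the presentation tier. This is only a sketch under assumed names: the JNDI data source names, the SQL, and the PolicySummary class are illustrative and are not part of the Patterns assets or this redbook's scenario code.

Example: A minimal decompose/recompose sketch for the Decomposition pattern (hypothetical names)

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class PolicyAggregator {

    /** Simple value object used to recompose results from both back-ends. */
    public static class PolicySummary {
        public final String type;
        public final String description;
        public PolicySummary(String type, String description) {
            this.type = type;
            this.description = description;
        }
    }

    /** Decompose the request into one query per back-end, then recompose. */
    public List<PolicySummary> findPolicies(String customerId) throws Exception {
        List<PolicySummary> result = new ArrayList<PolicySummary>();
        // Decompose: query the automobile policy database...
        result.addAll(query("java:comp/env/jdbc/AutoPolicyDB",
                "SELECT descr FROM auto_policy WHERE cust_id = ?", "auto", customerId));
        // ...and the homeowner policy database.
        result.addAll(query("java:comp/env/jdbc/HomePolicyDB",
                "SELECT descr FROM home_policy WHERE cust_id = ?", "home", customerId));
        // Recompose: return the combined list as one view for the presentation tier.
        return result;
    }

    private List<PolicySummary> query(String dsName, String sql, String type,
                                      String customerId) throws Exception {
        List<PolicySummary> rows = new ArrayList<PolicySummary>();
        DataSource ds = (DataSource) new InitialContext().lookup(dsName);
        Connection con = ds.getConnection();
        try {
            PreparedStatement ps = con.prepareStatement(sql);
            ps.setString(1, customerId);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                rows.add(new PolicySummary(type, rs.getString("descr")));
            }
            rs.close();
            ps.close();
        } finally {
            con.close();
        }
        return rows;
    }
}

In practice, the recomposition step might also merge, sort, or cache the partial results (the transient data shown in Figure 1-8) before a single response is returned to the user.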
1.3.3 Review Runtime patterns
The Application pattern can be further refined with more explicit functions to be
performed. Each function is associated with a runtime node. In reality these
functions, or nodes, can exist on separate physical machines or may co-exist on
the same machine. In the Runtime pattern this is not relevant. The focus is on the
logical nodes required and their placement in the overall network structure.
As an example, let's assume that our customer has determined that his solution
fits into the Self-Service business pattern and that the Directly Integrated Single
Channel pattern is the most descriptive of the situation. The next step is to
determine the Runtime pattern that is most appropriate for his situation.
He knows that he will have users on the Internet accessing his business data and
he will therefore require a measure of security. Security can be implemented at
various layers of the application, but the first line of defense is almost always one
or more firewalls that define who and what can cross the physical network
boundaries into his company network.
He also needs to determine the functional nodes required to implement the
application and security measures. The Runtime pattern shown in Figure 1-9 is
one of his options.
Figure 1-9 Directly Integrated Single Channel application pattern::Runtime pattern (outside-world nodes include the user, domain name server, and public key infrastructure; Internet traffic passes a protocol firewall into the demilitarized zone containing the Web application server; a domain firewall separates the DMZ from the internal network, which holds the directory and security services node and the existing applications and data; the Presentation and Application tiers of the Directly Integrated Single Channel application map onto the Web application server)
By overlaying the Application pattern on the Runtime pattern, you can see the
roles that each functional node will fulfill in the application. The presentation and
application tiers will be implemented with a Web application server, which
combines the functions of an HTTP server and an application server. It handles
both static and dynamic Web pages.
Application security is handled by the Web application server through the use of
a common central directory and security services node.
A characteristic that makes this Runtime pattern different from others is the
placement of the Web application server between the two firewalls. The Runtime
pattern shown in Figure 1-10 is a variation on this. It splits the Web application
server into two functional nodes by separating the HTTP server function from the
application server. The HTTP server (Web server redirector) will serve static
Web pages and redirect other requests to the application server. It moves the
application server function behind the second firewall, adding further security.
Figure 1-10 Directly Integrated Single Channel application pattern::Runtime pattern: Variation 1 (the Web server redirector remains in the demilitarized zone, while the application server moves behind the domain firewall into the internal network, alongside the directory and security services node and the existing applications and data)
These are just two examples of the possible Runtime patterns available. Each
Application pattern will have one or more Runtime patterns defined. These can
be modified to suit the customer’s needs. For example, he/she may want to add
a load-balancing function and multiple application servers.
1.3.4 Review Product mappings
The last step in defining the network structure for the application is to correlate
real products with one or more runtime nodes. The Patterns Web site shows
each Runtime pattern with products that have been tested in that capacity. The
Product mappings are oriented toward a particular platform, though more likely
the customer will have a variety of platforms involved in the network. In this case,
it is simply a matter of mix and match.
For example, the runtime variation in Figure 1-10 on page 14 could be
implemented using the product set depicted in Figure 1-11.
Figure 1-11 Directly Integrated Single Channel application pattern: Windows® 2000 product mapping (Web Server Redirector: Windows 2000 + SP3, IBM HTTP Server 1.3.26, IBM WebSphere Application Server V5.0 HTTP Plug-in; Application Server: Windows 2000 + SP3, IBM WebSphere Application Server V5.0, with a JMS option adding IBM WebSphere MQ 5.3; Directory and Security Services, accessed over LDAP: Windows 2000 + SP3, IBM SecureWay Directory V3.2.1, IBM HTTP Server 1.3.19.1, IBM GSKit 5.0.3, IBM DB2 UDB EE V7.2 + FP5; Database: Windows 2000 + SP3, IBM DB2 UDB ESE V8.1; Existing Applications and Data options: a Web services option on Windows 2000 + SP3 with IBM WebSphere Application Server V5.0, IBM HTTP Server 1.3.26, IBM DB2 UDB ESE 8.1, and a Web service EJB application; a JCA option on z/OS Release 1.3 with IBM CICS Transaction Gateway V5.0, IBM CICS Transaction Server V2.2, and a CICS C-application; and a JMS option on Windows 2000 + SP3 with IBM WebSphere Application Server V5.0, IBM WebSphere MQ 5.3, and a message-driven bean application)
1.3.5 Review guidelines and related links
The Application patterns, Runtime patterns, and Product mappings are intended
to guide you in defining the application requirements and the network layout. The
actual application development has not been addressed yet. The Patterns Web
site provides guidelines for each Application pattern, including techniques for
developing, implementing, and managing the application based on the following
guidelines:
- Design guidelines instruct you on tips and techniques for designing the applications.
- Development guidelines take you through the process of building the application, from the requirements phase all the way through the testing and rollout phases.
- System management guidelines address the day-to-day operational concerns, including security, backup and recovery, application management, etc.
- Performance guidelines give information on how to improve the application and system performance.
1.4 Summary
The IBM Patterns for e-business are a collective set of proven architectures. This
repository of assets can be used by companies to facilitate the development of
Web-based applications. They help an organization understand and analyze
complex business problems and break them down into smaller, more
manageable functions that can then be implemented.
Chapter 2. Portal composite pattern and custom designs introduction
Organizations strive to achieve the best combination of business process
efficiency, deep customer knowledge and mindshare, and product leadership —
a combination that best suits their business goals. To achieve these goals,
organizations leverage role-based portals to provide relevant information to
specific audiences.
The Portal composite pattern assists in the design process for a portal
implementation. Portal custom designs will typically be variations of the Portal
composite pattern, which has been extended to meet a specific customer’s
requirements.
2.1 Introduction to the Portal composite pattern
A Portal composite pattern leverages various mechanisms (for example,
personalization, collaboration, content management, user interface formatting
and display, and data aggregation) to bring together the appropriate information
and existing systems to serve the goals of the business.
For example, when attempting to grow customer mindshare and knowledge, a
portal system can bring together the proper information tailored to the type of
user that the business would like to target. This can be implemented in a number
of ways; however, the key point to remember is that when customers feel that the business truly understands their needs and wants, they will remain customers.
Consequently, the customer's needs and wants may be met through the business achieving product leadership, great customer service, or highly efficient transactional processes that support product leadership and/or customer service.
The components of personalization, collaboration, multi-device access, a presentation rendering mechanism, and a business rules engine are combined with the ability to search and index content (of various types and formats) and to manage content via a workflow process, providing both content aggregation and a collaborative environment.
A portal can help a business gain marketshare, retain existing customers, and
reduce costs through the ability to target the delivery of information to specific
user audiences.
2.1.1 Business drivers
Business drivers are specific goals that the business is trying to achieve. In most
cases, business drivers have an ultimate goal of reducing costs, increasing
revenue, or improving productivity. In fact, a business can be any type of
organization (for example, manufacturing, research, military, etc.) that seeks to
make the best use of its available resources and determine if new resources are
required. The design of a portal can help to clarify these goals, and analysis of
interactions with the portal can further define and enhance these drivers. Various
“paths” that can be followed to achieve the desired results are as follows:
- Deep customer knowledge and mindshare:
This can be thought of as “customer intimacy”. When a business wants to
provide the “best customer service” experience and this is their primary driver
for revenue, they need to understand their customers and market as much as
possible. So it is important to identify these types of customers when
designing a portal, and once implemented, a portal can provide valuable
knowledge about the habits of the targeted audience.
Also, this information can be used to determine if the targeted audience is
helping the business achieve its goals. Thus an organization can increase
customer retention through deep knowledge of that customer, resulting in
increased revenues through more efficient marketing practices.
- Product leadership:
Some organizations want to be the best in their market for the products or
services they provide. These organizations want to achieve leadership from a
quality and/or marketplace mindshare perspective. One of the common
methods for providing product mindshare leadership is by communicating
certain information about upcoming products or enhancements to existing
products to the targeted audience. In addition, if the business can identify
other possible audiences to expand their customer base, this can also
contribute to product leadership.
A portal can assist in disseminating both the technical and marketing
information about the products or services provided, and this information can
be tailored to specific user audiences (as defined by demographic and
“device type” information). In addition, the usage of the portal by these
targeted audiences (customers) can be analyzed to determine if the
marketing efforts are successful.
- Business process efficiency:
Organizations that have identified increased efficiency in their internal
processes want to attain the highest possible efficiency in the transactions
that take place between departments, divisions, employees, and external
partners (for example, external suppliers who supply raw material for the
products or services being offered).
A portal can give people access to the information and business processes
that they need in a single, secure, and dynamic environment. This allows the
user to:
– Access the relevant information in context of the task performed
– Collaborate in context of the business process
– See consolidated information in a single aggregated view
This collaborative portal environment, with its aggregated views of data,
provides just the information necessary for the person or entity to gain
maximum efficiency in how tasks are accomplished. For example, in sales it
is important to present the salesperson with all the information available on a given customer, in context with the products the customer buys and the most recent sales activities with that customer. Often there is also a need for additional information from other salespeople or the project team working on site with the customer. Using collaboration tools can foster better and more responsive interaction within the teams and ultimately lead to faster decision making.
We see that a portal implementation requires the identification of the information
desired, the audience for that information, and an analysis of the usefulness of
that information to fulfill the business drivers of an organization. Organizations
may have only one of these business drivers, or there may be a combination of
these drivers that will help the organization meet their goals.
In addition, concepts such as ease of use (for example, single sign-on), security,
and reduced Total Cost of Ownership (TCO) are all examples of specific, tactical
goals of a portal implementation that will ultimately support the three core
business drivers above. The following are some additional examples of specific
goals that can be used to achieve the ultimate drivers for an organization:
- Time to market
- Improved organizational efficiency
- Faster decision making
- Reduced latency of business events
- Adaptability during mergers and acquisitions
- Integration across multiple delivery channels
- Unified customer view across lines of business
- Support of effective cross selling
- Support of mass customization (reducing the cost of customizing products and services)
2.1.2 Jump-start portal questions
The Patterns for e-business and specifically the Portal composite pattern assist
in the design process for a portal implementation. This allows both the business
and technical groups within an organization to ask the jump-start portal
questions. They are as follows:
1. Where is the information in our organization located?
In order to aggregate information, the location of this information
(applications, databases, external sources) must first be determined.
2. Does the information needed currently exist?
The business drivers will determine what information is needed.
3. Do you want to enable collaboration and human interaction across all areas of
business?
Processes and applications don't make decisions — the users do. A
collaborative portal environment allows you to integrate human interaction
with processes and information. These are capabilities that, for example, let
people get the just-in-time advice, education, consensus, and approval they
need to respond quickly, to any business situation or emergency.
4. Do you want to enable widespread teams to work together efficiently in the
context of the business process?
The portal allows you to make your organization’s people, processes, and
information readily available to individual teams so they can solve everyday
business problems more efficiently.
5. What are the processes by which that information is collected, updated,
managed, and disseminated?
Portals are based on information that has to be managed from a collection,
update, and processing perspective. There are likely existing processes by
which information (that already exists in the organization) is collected, and
these will have to be examined to determine if they must be modified to
support both the business and IT drivers.
6. What defines a portal user?
The definition of a user will impact the types of security, the types of data, and
the types of client devices that need to be supported.
7. What is to be gained by implementing a portal?
This is a direct reference to the business drivers. Once an organization
defines what they want to achieve or improve upon, then a discrete set of
business goals, or drivers, can be identified.
2.1.3 IT drivers
As with all organizations, those concepts that drive the IT organization to make
decisions are ultimately driven by the needs of the organization at the business
or enterprise level. Those items, described in 2.1.1, “Business drivers” on
page 18, can each be supported through the appropriate use of technologies
that help implement the following goals:
򐂰 Minimize application complexity
򐂰 Minimize Total Cost of Ownership (TCO)
򐂰 Are open-standards-based
򐂰 Offer an end-to-end solution
򐂰 Leverage existing skills
򐂰 Leverage legacy investment
򐂰 Integrate back-end applications
򐂰 Minimize enterprise complexity
򐂰 Support maintainability
򐂰 Support scalability
򐂰 Support availability
Many of these IT drivers are focused on cost reduction through minimizing
complexity. These can be further abstracted into six core IT drivers, as follows:
򐂰 Availability:
The IT organization needs to have the solution available as defined in the
business drivers. A portal implementation means having the information that
the customer wants to see in the way he wants to see it. Therefore, the
application needs to be available when the customer wants to see it.
򐂰 Open-standards-based:
An open-standards-based infrastructure provides both a choice of platform
and the ability to integrate into other vendor’s environments. In addition, it
allows the applications that you develop to interact in a much larger
environment.
򐂰 Reusability:
Reusing existing IT assets, such as programming code, existing applications,
and existing data sources, can reduce overall cost. A portal implementation,
and specifically the Portal composite pattern, brings together various existing
and new systems to construct an end-to-end solution.
򐂰 Maintainability:
Maintainability is a goal of the IT organization because shifting business goals
will often require adding or deleting functionality. In addition, the sources of
information available to a portal system may change. Thus, it is vital that the
portal implementation be able to adapt to the changing environment by
isolating different systems so that changes to one type of component will not
affect other components that make up the portal system.
򐂰 Scalability:
The Portal composite pattern is a “best mix” of nodes and components that
lead to the Portal composite Runtime pattern discussed in 5.2, “Runtime
pattern for the Portal composite pattern” on page 72. This Runtime pattern is
a high-level representation of a portal architecture that separates the
components so that each component can be chosen for maximum scalability.
Scalability is also important because the system should be designed and built
only once and should be able to handle increased demands. This supports
the general business driver of reduced cost and operational efficiency.
򐂰 Extensibility:
Extensibility in a system design allows for easier functional enhancement as
the needs of the business change and/or increase. Once again, this IT driver
supports the general business driver of reduced cost by being able to reuse
the same architected solution.
2.2 Understanding the Patterns for e-business
Understanding the Patterns for e-business, and specifically the Composite
patterns, is not always a straightforward process. In interpreting how to use the
Patterns for e-business, it is best to start with how people in different roles might
leverage these to explain and/or justify a particular solution.
In a portal implementation, the Portal composite pattern is the logical starting
point. It identifies the Business and Integration patterns that make sense for the
typical portal implementation.
The following roles are common in the IT industry, and each type of role will use and
leverage the patterns in a different manner.
򐂰 Sales:
The Sales role describes a person who establishes the initial relationship with
those organizations that might benefit from using and understanding the
patterns. The role can be a person within an organization or a person from an
external vendor (for example IBM Global Services) who has the expertise to
understand the business problems and issues that need to be addressed.
The sales person will use the patterns to begin the analysis discussion with
the business level stakeholders to understand the business drivers. They will
start with the Business and Integration patterns and likely continue to the
Application and Runtime patterns, showing how you can move from Business
patterns to Application patterns to Runtime patterns.
During discussions with the business stakeholders, the initial team identifies
high-level goals of the business and makes a determination that no specific
Business or Integration pattern will address all the business drivers. At this
point a Composite pattern makes sense and, specifically when there are
information aggregation and other requirements that are fulfilled by a portal,
the Portal composite pattern is a good starting point.
Refer to Applying Pattern Approaches, SG24-6805, for more details on how
to use the Patterns for e-business in a sales role.
򐂰 Project Manager:
A project manager will need an understanding of those patterns that have
already been chosen so that a set of tasks can be derived. Priorities can be
set because once the Business and Integration patterns have been chosen,
the business and IT drivers are understood.
򐂰 Architect:
The architect is the bridge between the business and technology domains.
Once the business drivers are understood (from the discussions with the
Sales role and the business stakeholders), the architect can decide on likely
IT drivers, namely those goals IT must focus on to fulfill the business drivers.
Combined discussions with both the business and IT stakeholders are
important so that all can participate in the process of determining the final set
of Business and Integration patterns, then decide on the Application patterns,
and finally decide on the set of Runtime patterns. Once this is complete, the
architect can derive an initial architecture (operational architecture and
general architecture overview) for the implemented portal solution. When
design begins, it is the role of the architect to understand the “big picture” of
the system and to make sure the proper components are given priority so that
those business drivers that are most important are given top priority.
A Composite pattern such as the Portal composite pattern saves the architect
time by performing some initial “integration” work, bringing together various
characteristics that are important to a typical portal implementation. Anything
that speeds up the process, such as the Patterns for e-business assets, saves
time and increases the chances for a successful implementation.
򐂰 Developer:
Although developers are generally tasked with very specific programming
level tasks, it is important for those in this role to understand the general
thinking behind how the architecture was originally designed. This allows the
team to leverage the focused technical knowledge of a developer (an expert
Java programmer, for example) to understand how their tasks fit into the
system and to alert the team to how their work might impact other
components being designed. This role works to augment the architect role.
The developer also uses the Application design and development guidelines
provided by the Patterns for e-business to assist and speed up the application
development cycle.
The patterns are used to bring together the business and technical people in an
organization. The intersection point of these two groups is the set of Runtime
patterns that are detailed enough for developers and abstract enough for
business people (because these Runtime patterns are far less complex than a
portal or systems architecture diagram).
The Portal composite pattern is valuable because it performs some of the initial
“integration thoughts” that lead to a typical portal implementation. Of course, the
standard caution that “your mileage may vary” applies to the use of this
Composite pattern, because each implementation will introduce some variation.
Using just one or two Business and/or Integration patterns may not address all
the needs of both the business and IT drivers of the equation. The creation of the
Portal composite pattern has brought together a combination of patterns that can
jump start the design and analysis process. You realize savings in time, people,
and thus money by leveraging reusable assets such as the Patterns for
e-business.
2.3 Portal custom designs
A Custom design, like the Composite patterns, combines Business and
Integration patterns to create advanced, end-to-end e-business applications.
These solutions, however, have not been implemented to the extent of the
Composite patterns.
The Business and Integration patterns that could be combined in any given
Portal Custom design are as follows:
򐂰 Access Integration
򐂰 Self-Service
򐂰 Collaboration
򐂰 Information Aggregation
򐂰 Extended Enterprise
򐂰 Application Integration
Depending on the type of portal solution being deployed, different combinations
are implemented based on the required functionality. Some of these Business
and Integration patterns are more common. However, our premise here is that
these patterns common to any Portal custom design will contribute to the Portal
composite pattern that is the focus of this endeavor.
One of the patterns, Access Integration, can be considered the most distinctive
pattern for a portal, given its focus on improving a user’s access to information
and e-business services. Since the Integration patterns are used to extend the
capabilities of Business patterns, we will also be looking for which of the other
patterns contribute to a specific portal scenario. We will see that Self-Service,
Collaboration, Information Aggregation, and Application Integration can also be
important to portal solutions.
2.3.1 Access Integration pattern
The Access Integration pattern is commonly observed in e-business solutions
that provide users a seamless and consistent user experience that combines
access to multiple applications, databases, and services. It is used as a front-end
integration pattern. The Access Integration pattern does not stand alone in a
solution, but is typically used to combine Business patterns to create custom
designs and Composite patterns used to solve complex business problems.
Access Integration contains many of the characteristics that describe a portal
implementation. It fits well into the Portal composite pattern because it includes
aggregation and management of information, access to information by various
user and group types, and clearly defined business “rules” that determine which
user types can access certain types of data.
For more information on the Access Integration pattern and its services, refer to
the Access Integration Pattern Using WebSphere Portal Server, SG24-6267.
2.3.2 Self-Service business pattern
The Self-Service business pattern describes situations where users are
interacting with a business application to view or update data.
Often an organization not only wants to disseminate information internally but
also wants to make this information available to external users and partners. The
Self-Service business pattern is focused on allowing the end user access to
information from various data sources using a mechanism that allows the user to
access just the specific information that applies.
For more information on the Self-Service business pattern, refer to the following
redbooks:
򐂰 Patterns: Self-Service Application Solutions using WebSphere V5,
SG24-6591
򐂰 Self-Service Applications Using IBM WebSphere V5.0 and WebSphere MQ
Integrator V2.1 Patterns for e-business Series, SG24-6875
2.3.3 Collaboration business pattern
The Collaboration business pattern enables interaction and collaboration
between users including e-mail, virtual team meetings, e-learning, instant
messaging, and workflow processes. This pattern can be observed in solutions
that support small or extended teams who need to work together in order to
achieve a joint goal.
Collaboration can often combine with a workflow engine that provides the ability
to set up and support more complex processes that might involve multiple users
from different workgroups, departments, and organizations. An emerging
capability is the concept of contextual collaboration that incorporates functions
previously found only in knowledge management applications. This includes the
ability to apply context to a piece of content, to discover the experts within the
organization, and to add collaborative functions to transaction-based
applications. Collaboration is a core feature of a portal implementation.
2.3.4 Information Aggregation business pattern
The Information Aggregation business pattern describes situations where users
access and manipulate large amounts of data collected from multiple sources.
There are two broad aspects to consider: 1) populating data stores with
aggregated data, and 2) access to the aggregated data. Population is
accomplished through Application Integration techniques1. How access is
supported is related to the scope of data being accessed.
Access to a small portion of aggregated data is easily handled by the
Self-Service business pattern. Access to a single individual’s account summary
that was aggregated from multiple systems is an example of this. On the other
hand, when access is characterized by analysis and manipulation of large
amounts of the aggregated data, then Information Aggregation applies. This type
of access typically uses sophisticated tools that analyze, summarize and report
on large quantities of aggregated data stored in specially designed databases
optimized for just this type of analysis and reporting.
2.3.5 Extended Enterprise business pattern
The Extended Enterprise business pattern describes the programmatic
interaction between two distinct businesses.
The focus of the Portal composite pattern is to implement a portal within a
business or single enterprise. It does not directly address how two separate
enterprises will interact. In this book, our analysis revealed that treating external
enterprises as just additional “data sources” seems clearer than talking about
enterprise-to-enterprise interaction.
However, this is open to interpretation. Portals are about integrating data and
processes, so this pattern only makes sense when bringing together the data
sources and systems of two enterprises, which implies a more complex
re-architecture of both systems. It is just as effective, and less complex, to simply
treat external systems as additional data sources, the same as local databases or
applications. If these external systems support common communication
methods, this makes the integration that much easier.
1 This reflects a recent change to the alignment of population related Application patterns. Previously
these patterns were grouped together with information access patterns and associated with
Information Aggregation. The population patterns are now considered to be part of the data focused
Application Integration patterns. Information Aggregation now addresses information access, in
particular when that access is characterized by analysis and manipulation of large amounts of
aggregated information, an approach typically associated with business intelligence, content
management, and knowledge management.
2.3.6 Application Integration pattern
The Application Integration pattern provides for the seamless back-end
integration of multiple applications and/or data. Application Integration can be
process focused as well as data focused, and can therefore address integration
requirements for any of the Business patterns.
A portal can act as an integration mechanism for both application services and
information. Self-Service can use a process focused approach to transparently
invoke enterprise application services, and can depend on data focused
application integration to populate a centralized operational data store containing
customer information. Information Aggregation may depend on data focused
application integration to populate data stores that will be used for analysis and
reporting. This means that either or both forms of application integration may be
utilized by the portal, depending on the specific scenario.
2.3.7 Portal characteristics
The diagram shown in Figure 2-1 was part of the process used to identify the
Business, Integration, Application, and Runtime patterns that could be combined
into a Portal composite pattern based on the characteristics we needed. You can
use this diagram as a starting point to help determine the best fit for the particular
solution you need to create.
[Figure 2-1 shows the Business and Integration patterns mapped to their Application patterns: Access Integration (Personalized Delivery, Extended Single Sign-On, Web Single Sign-On, Pervasive Device Access), Self-Service (Directly Integrated Single Channel), Collaboration (Store and Retrieve, Directed Collaboration), and Application Integration (Population: Single-Step, Index Population); several of the Application patterns are marked as optional.]
Figure 2-1 Patterns hierarchy contributing to the Portal composite pattern
2.3.8 The Portal composite pattern
The Business and Integration patterns that we have identified as the building
blocks or the more common patterns of the Portal composite pattern are as
follows:
򐂰 Access Integration pattern
򐂰 Self-Service business pattern
򐂰 Collaboration business pattern
򐂰 Application Integration pattern
Please note that based on your specific requirements, your building blocks of the
Business and Integration patterns for your portal may vary from the Portal
composite pattern. For example, you may find that you have use for the
Extended Enterprise business pattern in addition to the ones we defined, or you
may find that you only need the Access Integration, Collaboration, and
Information Aggregation business patterns for your portal. Based on your specific
requirements, this would then be defined as a Portal custom design.
For this redbook, the visual representation of the Portal composite pattern is
shown in Figure 2-2.
[Figure 2-2 depicts the Portal composite pattern: the Self-Service and Collaboration business patterns, the optional Information Aggregation and Extended Enterprise business patterns, and the Application Integration and Access Integration patterns.]
Figure 2-2 Portal composite pattern showing our mandatory patterns2
2 This composite pattern reflects the re-alignment of data population patterns to Application
Integration from Information Aggregation.
2.3.9 Benefits
The Portal composite pattern is a combination of patterns, technologies, and
products. It allows for an understanding of the business and IT drivers that help
an organization answer these questions:
򐂰 Do I need a portal?
򐂰 What can I achieve with a portal?
Once an organization has determined that it needs to aggregate information,
target that information to specific users, analyze the usage of information, and
collect and manage information, it can use a portal to handle these requirements.
Consequently, using the Portal composite pattern will eventually lead to a choice
of Application patterns and the subsequent combined Runtime pattern. This, in
turn, will drive the creation of a portal architecture. Some specific benefits
include:
򐂰 A single aggregated view of content targeted to specific user types
򐂰 Ability to analyze usage patterns to make marketing efforts more efficient
򐂰 Ability to tailor the user interface to specific groups, enabling a focus on
cultural, language, and nationality-based differences
򐂰 Single sign-on, allowing the user to “save time” and have access to
information while lessening the requirements for direct interaction with the
organization (saves money)
򐂰 Enables collaboration and human interaction in the context of the business
process
򐂰 Enables widespread teams to work together efficiently in the context of the
business process.
2.3.10 Limitations
The creation of a portal can in some cases be a complex undertaking. The
degree of complexity is driven in large part by the scope or range of application
services and aggregated content that will be provided through the portal. As the
number of applications being integrated increases, or the complexity of the
content or aggregated information expands, a portal implementation will likewise
increase in complexity. This translates into an impact to the IT organization within
the enterprise.
Introduction of a portal also can have an impact on the business organization
within the enterprise. Although this should be an intended rather than
unexpected result of a portal implementation, the following are examples of what
should be considered and planned for:
򐂰 Organizational changes
򐂰 Process changes
򐂰 Restructuring of existing data sources
򐂰 Rebuilding some existing applications to support available connectivity
options
򐂰 The detailed analysis of the various user groups that need to be supported
(usually in much more detail than what currently exists)
The Portal composite pattern assumes that there will be impacts in all of these
areas.
2.4 Summary
In summary, the Portal composite pattern includes characteristics from several
Business and Integration patterns that are typically part of a portal
implementation. However, when designing your solution, re-evaluate the chosen
patterns to assure that they contain the characteristics that are important for the
portal solution you are creating. Remember that the design is ultimately based
on the business drivers, and on choosing a pattern and subsequent architecture
that supports those drivers.
Part 2. Portal Search custom design
Chapter 3. The Portal Search custom design
So far in this redbook, we have introduced the key concepts behind the IBM
Patterns for e-business, and we have introduced the Portal composite pattern
and Portal custom designs in general. In this chapter, we will introduce a specific
variation of the Portal composite pattern in the form of a Custom design — the
Portal Search custom design. The goal of this custom design is to provide a
solution for the advanced search requirements that are currently being identified
as organizations deploy portal solutions into their environments, and integrate
them into their business processes.
3.1 What is a Custom design?
As introduced in Chapter 1, “Patterns for e-business introduction” and Chapter 2,
“Portal composite pattern and custom designs introduction” on page 17, Custom
designs are similar to Composite patterns, as they combine Business patterns
and Integration patterns to form an advanced, end-to-end solution. These
solutions, however, have not been implemented to the extent of Composite
patterns, but are instead developed to solve the e-business problems of one
specific company, or perhaps several enterprises with similar problems.
In general, Custom designs do not meet the higher qualifications of a Composite
pattern, and do not give as great a reassurance of reusability, because they have
not been "recurrently employed to solve the problems of businesses across a
wide range of industries."
However, as the Custom designs detailed on the Patterns for e-business Web
site and within this redbook are used more and more by diverse developers, who
are vocal about the benefits and limitations of these solutions, these Custom
designs might eventually achieve the status of Composite patterns.
3.2 The need for portal search capabilities
As businesses have begun deploying and integrating portal solutions, the need
for more extensive search and retrieval capabilities beyond those provided in the
base Portal composite pattern has begun to surface.
Overall, the Portal composite pattern provides data access via simple
self-service capabilities that allow one to receive small views of data. The
inclusion of these self-service capabilities as part of an overall secure and
personalized portal interface means that these “snippets” of data are organized
in a manner that makes them easy to locate. However, most enterprises are still
left with no clear way to access and search across all of their corporate data and
knowledge, especially the unstructured data that is not normally accessible via
typical business intelligence tools. While each key data source may be surfaced
in a self-service manner within the portal as a portlet application, each still
requires its own training on syntax, semantics, and interfaces.
Obviously, in any portal implementation, the ability to locate such data and
information is vital — and thus the next logical step in providing information
location capabilities in a portal is to provide robust search and retrieval
capabilities that can encompass these disparate data sources. This includes
expanding search capabilities outside the context of the portal, so that a single
request can simultaneously search potentially thousands of data repositories,
the Internet, and for people with expert knowledge. These repositories could be of varied
content and structure and, like the experts with whom you need to collaborate,
they might be geographically dispersed throughout the world.
It is the combination of these advanced search needs with the other
personalization, self-service, and collaborative capabilities of a Portal composite
pattern that takes the value of a portal to the next level. Organizations are able to
perform all key business activities from within the portal, resulting in improved
efficiency and reduced latency of business events.
Thus, in defining the need for such advanced search capabilities, we have
ultimately defined the business drivers for such a Portal Search custom design,
as follows:
򐂰 To help streamline current business activities, by improving organizational
efficiency and reducing the latency of business events. When this desire is
applied to search capabilities, one can see that these efficiency
improvements will ultimately come from:
a. Distilling meaningful information from a vast amount of structured and
unstructured data
b. Providing easier access to vast amounts of unstructured data through
indexing, categorization, and other advanced forms of summarization
3.3 Technology drivers
As we described when we introduced the Portal composite pattern earlier in this
redbook, while it is the business drivers that ultimately drive an IT organization,
the appropriate use of technologies can also have an important business impact
in terms of:
򐂰 Minimized application complexity
򐂰 Minimized total cost of ownership (TCO)
򐂰 Leverage of existing skills
򐂰 Leverage of legacy investments
򐂰 Back-end application integration
򐂰 Minimized enterprise complexity
򐂰 Maintainability
򐂰 Scalability
򐂰 Availability
For the advanced portal search needs we have described to this point, there can
be substantial savings in terms of reducing costs, leveraging legacy investments,
minimizing enterprise complexity, and simplifying maintainability — all depending
on the manner in which these search capabilities are built and integrated.
For example, when building a single search interface that accesses multiple data
repositories and legacy systems on the back-end, how does one do this in a cost
effective manner? How does one build interfaces into all of these systems so that
they are easily maintained? What happens when a given repository is upgraded
or replaced? What are the impact and costs for such a change?
By clearly defining IT drivers in regards to cost, simplicity, and maintainability as
key decision points in the solution, we will ensure that any solution meets all of
the wider business needs beyond the purely functional needs. Thus, these
defined IT drivers, along with the business drivers, ultimately feed into the
selection of Application patterns that are appropriate and will be used to
implement this Custom design.
3.4 The Custom design
Now that we have defined the business and IT context for this custom design to
be built on top of the Portal composite pattern, it is time to discuss the specific
Business patterns that apply. However, to be able to clearly distinguish this
Custom design from the Composite pattern, we must go down to the Application
pattern level as well.
When considering the application patterns for any composite pattern or custom
design, both mandatory and optional patterns are described. The mandatory
patterns represent the recurring patterns that should be regularly implemented
by companies in a Portal Search custom design. The optional patterns represent
patterns that are not necessarily implemented with each solution, but may make
sense to include for a specific company requirements. Their inclusion would
result in yet another Portal custom design.
Figure 3-1 depicts the normal mandatory and optional Application patterns for a
Portal solution, as they map to the business and integration patterns already
discussed in Chapter 2, “Portal composite pattern and custom designs
introduction”. As discussed in this previous chapter, these identified Application
patterns are based on the requirements of typical portal implementations.
Note: As mentioned earlier in this redbook, recent changes have been made
to the alignment of the “Population” related Application patterns, as well as to
the naming of some of the Information Aggregation patterns.
The population patterns are now considered to be part of the data focused
Application Integration patterns. Information Aggregation now addresses
information access, in particular when that access is characterized by analysis
and manipulation of large amounts of aggregated information, an approach
typically associated with business intelligence, content management, and
knowledge management.
Details on these recent changes in patterns can be found in Appendix A,
“Pattern changes” on page 205.
[Figure 3-1 shows the typical Application patterns for the Portal composite pattern: Access Integration (Personalized Delivery, Extended Single Sign-On, Web Single Sign-On, Pervasive Device Access), Self-Service (Directly Integrated Single Channel), Collaboration (Store and Retrieve, Directed Collaboration), and Application Integration (Population: Single-Step, Population: Index Population); several of the Application patterns are marked as optional.]
Figure 3-1 Portal composite pattern — typical application patterns
When examining the patterns included in Figure 3-1, and the functionality
provided by these patterns, it is clear that some level of “data access” is included
via the Application Integration::Population and Self-Service::Directly Integrated
Single Channel patterns. Of course, as discussed earlier in this book, how
access is supported is related to the scope of the data access — and only
access to small snippets of information is supported in the base Portal composite
pattern via inclusion of the Self-Service pattern.
However, the business and IT drivers we defined earlier for this Portal Search
custom design clearly show the need for data access characterized by analysis
and manipulation of large amounts of the data. In such cases “Information
Aggregation” applies, and the Information Aggregation business pattern must be
introduced to this design to support this need. The Application pattern specifically
related to search that we thus need to add to our Custom design is the
Information Aggregation::User Search and Discovery application pattern.
Additionally, as the amount and scope of data access increases, the capabilities
required for populating data stores with such data also increase — and
additional Application Integration patterns are required. To join the existing
Population: Single Step and Population: Index Population patterns, we must also
add the Population: Multi-step and Federation patterns to cover the more
advanced data collection needs.
Figure 3-2 shows this updated list of Application patterns that makes up the
Portal Search custom design.
[Figure 3-2 shows the Application patterns for the Portal Search custom design: Access Integration (Personalized Delivery, Extended Single Sign-On, Web Single Sign-On, Pervasive Device Access), Self-Service (Directly Integrated Single Channel), Collaboration (Store and Retrieve, Directed Collaboration), Application Integration (Population: Single Step, Population: Multi-step, Population: Index Population, Federation), and Information Aggregation (User Search & Discovery); several of the Application patterns are marked as optional.]
Figure 3-2 Portal Search custom design — application patterns
As depicted, the changes between the base Portal composite pattern and the
Portal Search custom design are the inclusion of the following additional
patterns:
򐂰 Information Aggregation::User Search and Discovery
򐂰 Application Integration::Population: Multi-step
򐂰 Application Integration::Federation
When combining these three new Application patterns with the previously
existing Application Integration::Population: Single Step and Population: Index
Population patterns, we have a full set of “search”-related capabilities. It is these
five Application Integration and Information Aggregation patterns that will be the
focus of the rest of this redbook.
Figure 3-3 then depicts a higher-level comparison between the Portal composite
pattern and this custom design, in which the fundamental difference is the
inclusion of the Information Aggregation business pattern.
[Figure 3-3 compares the two designs side by side: the Portal composite pattern (Self-Service, Collaboration, Information Aggregation (optional), Extended Enterprise (optional), Application Integration, Access Integration) and the Portal Search custom design (Self-Service, Collaboration, Information Aggregation, Extended Enterprise (optional), Application Integration, Access Integration).]
Figure 3-3 Portal composite pattern compared to Portal Search custom design
However, it is important to note that the remainder of the mandatory and optional
Application patterns from the original Portal composite pattern, which are still
included within the custom design, are all crucial to integrating these search
capabilities into a portal solution. For example, the Access Integration patterns
are required to integrate the search capabilities with the other portal functionality
via Single Sign-On and Personalization, while the Collaboration and Self-Service
patterns are required to provide the common collaborative/teaming benefits of a
robust portal solution.
For a detailed discussion on all of the application patterns associated with the
overall Portal composite pattern, please see the IBM Redbooks:
򐂰 Patterns: A Portal composite pattern using WebSphere Portal V4.1.2,
SG24-6869
򐂰 Patterns: A Portal composite pattern using WebSphere Portal V5, SG24-6087
3.5 Summary
At this point we have introduced the IBM Patterns for e-business, the Portal
composite pattern, and now the Portal Search custom design. While this is a
large amount of introductory material, we felt it important to show the full
“lineage” of this Custom design, so that all readers can clearly understand the
business context for this solution.
It is now time to get into some of the details of this Custom design, and the
technologies one can use to implement it — and we will do so in the remaining
chapters of this redbook.
Chapter 4. Application patterns
After identifying the Business and Integration patterns, the next step in planning
an e-business application is to choose the Application pattern(s) that apply to the
business drivers and objectives. An Application pattern shows the principal layout
of the application, focusing on the shape of the application, the application logic,
and the associated data. Such Application patterns are then taken to the next
level, by creating a set of Runtime patterns. Runtime patterns are discussed
within the next chapter.
This chapter focuses on defining and describing the Application patterns that
apply to typical e-business portals that have been extended for robust search
capabilities. The Application patterns we will discuss fall under the Information
Aggregation business pattern, and the Application Integration pattern.
4.1 An overview of the Application patterns
As identified in Chapter 3, “The Portal Search custom design”, the following five
application patterns differentiate this Custom design from the basic Portal
composite pattern. And it is these Application patterns that we discuss in this
chapter:
򐂰 Information Aggregation::User Search and Discovery
򐂰 Application Integration::Population: Single Step
򐂰 Application Integration::Population: Multi-step
򐂰 Application Integration::Population: Index Population
򐂰 Application Integration::Federation
However, prior to examining these specific application patterns in more detail, it
is important to first understand the relationship between the Information
Aggregation and Application Integration patterns.
Basically, Application patterns use logical tiers to illustrate the various ways to
configure the interaction between users, applications, and data. The focus in
these tiers is on the application layout, shape, and application logic for the
associated data. In some cases though, multiple Application patterns may be
required to define a complete interaction between users, applications, and data.
The results of one Application pattern will feed into another Application pattern,
so that the combination of patterns results in a functioning e-business solution.
Search solutions based on the Application Integration and Information
Aggregation patterns follow this model.
First, the data integration aspects of the Application Integration (also known as
Enterprise Application Integration) patterns serve to integrate the information (or
data) used by multiple applications. In the case of search solutions, existing data
is available, in both structured and unstructured forms, in existing application
data repositories. A proven repeatable pattern is thus needed for combining this
data in one search, and this is where the Application Integration patterns come in.
Next, the Information Aggregation patterns allow users to access and manipulate
data that is aggregated from multiple sources. Thus, these patterns take the data
that is available from the multiple sources and applications via Application
Integration, and provide tools to extract useful information and value from such
large volumes of data.
Figure 4-1 depicts this relationship between these two business and integration
patterns.
[Figure 4-1 shows the user accessing searchable data through Information Aggregation, with Application Integration feeding that searchable data from the original application data.]
Figure 4-1 The relationship between Information Aggregation and Application Integration
4.2 Application Integration patterns
As a whole, the Application Integration pattern (also known as Enterprise
Application Integration) serves to integrate multiple Business patterns, or to
integrate applications and data within an individual Business pattern. The pattern
has two approaches for providing such integration:
򐂰 Process integration: The integration of the functional flow of processing
between the applications
򐂰 Data integration: The integration of the information used by applications
For search related solutions, it is primarily the data integration focused aspects of
Application Integration that are involved to integrate data with an individual
(Information Aggregation) business pattern. Thus, this section will focus on the
Application Integration application patterns that implement such “data
integration”. These “data integration” focused patterns can be broken into two
sub-categories.
򐂰 Data movement application patterns:
– Population: Single Step
• Population: Multi-step
• Population: Data Cleansing
– Population: Index Population
– Population: Synchronization
򐂰 Federated access application patterns:
– Federation
In general, these patterns apply both to “search” and to traditional business
intelligence/data mining types of activities. However, the pattern descriptions in
this redbook will focus more on their usage in the unstructured text/search world
than in the more structured data/business intelligence world — although
information on all aspects of the patterns will be provided whenever feasible.
4.2.1 Population: Single Step, Multi-step, and Data Cleansing
The Population: Single Step, Multi-step, and Data Cleansing patterns all follow a
similar model, and build upon each other. The primary business drivers for these
Population patterns are to reconcile data from multiple data sources. In Single
Step population, the reconciliation is sufficiently simple that it can be conceived
as a single functional entity. In many cases, however, the transformation and
restructuring is rather complex. This leads to the Multi-step variation. Similarly,
extensive analysis and cleansing is emphasized in the Data Cleansing variation.
These patterns are most often applied towards business intelligence related
business problems. However, they can be utilized to provide content feeds into
an e-business portal of more unstructured data. This “content” can then be
accessed via the portal, or even searched via basic portal search capabilities.
Business and IT drivers
Here, we are concerned with the following business and IT drivers:
򐂰 Improve organizational efficiency
򐂰 Reduce the latency of business events
򐂰 Distill meaningful information from a vast amount of structured data
򐂰 Minimize total cost of ownership (TCO)
򐂰 Promote consistency of operational data
򐂰 Maintainability
The primary business driver for choosing these Population: Single Step,
Multi-step, or Data Cleansing patterns is to copy data from the source data store
to a target data store with possible transformation of the data in the process. In
the case of a single step, the main reason for creating a copy of the data is to
avoid manipulating the primary source of a company’s operational data often
maintained by Operational Systems. However, in the case of a multi-step or data
cleansing, the data requires extensive reconciliation, transformation, and
restructuring to improve usability.
Solutions
The Application pattern shown in Figure 4-2 represents the basic single step
data population functionality by a “read dataset – process – write dataset” model.
There can be one or more source data stores that are read by the population
application. These source data stores are created and maintained by other
processes. The target data store is the output from the population application.
These can be the final output from the process, or can be an intermediate data
store used as the source for another step in the process.
[Figure 4-2 shows the Population: Single Step application pattern: a metadata-driven population method reading from a source application's data store and writing to a target data store.]
Figure 4-2 Population: Single Step application pattern
The box around the source data represents the fact that the source data may
need to be accessed by means of a control application via an application API, or
may be accessed directly via a database API.
The metadata contains the rules describing which records from the source are
read, how they are modified (if needed) on their way to the target, and how they
are applied to the target. The rules are depicted in this way to emphasize the
best practice of having a rules-driven application, rather than hard-coding the
rules in the application, to facilitate maintenance.
This logical dataset also holds a variety of metadata describing the output that
the population application produces, such as statistics, timing information, and so
on. In general, both source and target can contain any type of data, including
structured and unstructured data. However, in the majority of the cases, this
Application pattern is used for propagating structured data from one data store to
another.
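To make the “read dataset – process – write dataset” model above more concrete, the following minimal Java sketch illustrates a rules-driven single-step population. It is only a sketch under assumptions: the Record and PopulationRule types, the field names, and the in-memory source and target stores are invented for illustration and are not part of any IBM product API. A real implementation would load its rules from an external metadata store and reach the source and target through JDBC or an application connector, but the overall shape is the same: select records, transform them in flight, and write them to the target.

import java.util.*;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Minimal sketch of the Population: Single Step pattern (hypothetical types). */
public class SingleStepPopulation {

    /** In this sketch a record is simply a map of field names to values. */
    static final class Record extends HashMap<String, Object> {
        Record(Map<String, Object> fields) { super(fields); }
    }

    /** A rule loaded from "metadata": which records to select and how to modify them. */
    static final class PopulationRule {
        final Predicate<Record> selects;          // which source records are read
        final UnaryOperator<Record> transforms;   // how they are modified en route to the target
        PopulationRule(Predicate<Record> selects, UnaryOperator<Record> transforms) {
            this.selects = selects;
            this.transforms = transforms;
        }
    }

    /** Single step: read the source, apply the rule, write the target. */
    static List<Record> populate(List<Record> source, PopulationRule rule) {
        List<Record> target = new ArrayList<>();
        for (Record record : source) {
            if (rule.selects.test(record)) {
                target.add(rule.transforms.apply(new Record(record)));
            }
        }
        return target;
    }

    public static void main(String[] args) {
        // Source data store, normally created and maintained by another application.
        List<Record> source = List.of(
                new Record(Map.of("customerId", "C1", "status", "active", "name", "smith")),
                new Record(Map.of("customerId", "C2", "status", "closed", "name", "jones")));

        // The "metadata": copy only active customers and normalize the name field.
        PopulationRule rule = new PopulationRule(
                record -> "active".equals(record.get("status")),
                record -> { record.put("name", record.get("name").toString().toUpperCase()); return record; });

        // Prints the single populated record for customer C1 with name SMITH.
        System.out.println(populate(source, rule));
    }
}

Keeping the selection and transformation logic in the rule object, rather than hard-coded in populate(), reflects the rules-driven best practice described above: the rule can change without touching the population application itself.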
In providing the capabilities outlined above, this Application pattern uses
common services related to Data-focused integration such as Data replication,
Cleansing, Transformation, and Augmentation. These common services are
further elaborated on the Patterns for e-business Web site, specifically under the
discussion about Application patterns for Application Integration:
http://www.ibm.com/developerWorks/patterns/
Figure 4-3 depicts the common three-step process.
[Figure 4-3 shows the Population: Multi-step variation: metadata-driven extract, transform, and load steps, with intermediate data stores between the steps, moving data from the source application to the target; some elements are marked as optional.]
Figure 4-3 Population: Multi-step variation
In the Multi-step variation of the pattern, the building block provided by the
Population: Single Step application pattern is repeated several times to achieve
the desired results. The intermediate target data created by one step acts as the
source data for the subsequent step.
As shown in Figure 4-3, the application is divided into three logical tiers: extract,
transform, and load. In most best practice implementations, these functional
steps contain additional sub tasks:
򐂰 The Extract Tier extracts data from the source data store. This data store is
typically owned by another application and used in a read/write fashion by
that application. The extraction rules may range from a simple rule such as
including all data, to a more complex rule, prescribing the extraction of only
specific fields from specific records under varying conditions.
򐂰 The Transform Tier transforms data from an input to an output structure
according to the supplied rules. Transformation covers a wide variety of
activities, including reconciling data from many inputs, transforming data in
individual fields based on predefined rules or based on the content of other
fields, and so on. When two or more inputs are involved, there is generally no
guarantee that all inputs will be present when required. The transform step
must be able to handle this situation.
򐂰 The Load Tier loads the input data into the target data store. As with extract,
load can range from a simple process of overwriting the target data store to a
complex process of inserting new records and updating existing records.
The actual implementation can involve fewer or more steps. In
such cases, the diagram in Figure 4-3 must be adjusted accordingly, and
consideration must be given to the placement of any additional tiers. It is also
important to note that this Application pattern has been generalized to cover any
source and target data stores.
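As a rough illustration of how the single-step building block repeats, the following Java sketch chains hypothetical extract, transform, and load stages through intermediate in-memory collections. The stage boundaries mirror the tiers shown in Figure 4-3, but the field names and the transformation rule are assumptions of this sketch, not a documented API; in a real multi-step implementation each intermediate collection would typically be a persisted staging data store.

import java.util.*;
import java.util.stream.Collectors;

/** Minimal sketch of the Population: Multi-step variation (hypothetical stages). */
public class MultiStepPopulation {

    // Extract tier: pull only the fields of interest from each source record.
    static List<Map<String, String>> extract(List<Map<String, String>> source) {
        return source.stream()
                .map(r -> Map.of("id", r.get("id"), "amount", r.getOrDefault("amount", "0")))
                .collect(Collectors.toList());           // intermediate data store #1
    }

    // Transform tier: reconcile and restructure; here, convert amounts to whole cents.
    static List<Map<String, String>> transform(List<Map<String, String>> extracted) {
        return extracted.stream()
                .map(r -> Map.of("id", r.get("id"), "amountCents",
                        String.valueOf(Math.round(Double.parseDouble(r.get("amount")) * 100))))
                .collect(Collectors.toList());           // intermediate data store #2
    }

    // Load tier: insert new records or overwrite existing ones in the target, keyed by id.
    static void load(List<Map<String, String>> transformed, Map<String, Map<String, String>> target) {
        for (Map<String, String> r : transformed) {
            target.put(r.get("id"), r);
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> source = List.of(
                Map.of("id", "A-1", "amount", "12.50", "region", "EMEA"),
                Map.of("id", "A-2", "amount", "3.99", "region", "AP"));

        Map<String, Map<String, String>> target = new HashMap<>();
        load(transform(extract(source)), target);        // each step feeds the next
        System.out.println(target);                      // A-1 -> 1250 cents, A-2 -> 399 cents
    }
}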
Finally, in the new Data Cleansing variation of this pattern shown in Figure 4-4,
the transform and load steps from the Multi-step pattern have been combined into a
single data analysis and cleansing stage. This stage does not so much transform
the data, but rather validates and cleanses the data of errors. Data is extensively
analyzed for such errors, such that the resulting data may include calculated or
deduced information. Additionally, the resulting data in this variation may in fact
be written back to the original source database.
[Figure 4-4 shows the Population: Data Cleansing variation: a metadata-driven extract step feeds intermediate data into a data analysis and cleansing stage that writes to the target.]
Figure 4-4 Population: Data Cleansing variation
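To suggest how this analysis and cleansing stage differs from a pure transformation, the short Java sketch below validates and repairs extracted records, deduces a missing value, rejects records that cannot be repaired, and optionally writes the corrections back to the source records. The specific rules (trimming and upper-casing names, deducing a country from a phone prefix) are purely illustrative assumptions of this sketch.

import java.util.*;

/** Minimal sketch of the Population: Data Cleansing variation (illustrative rules only). */
public class DataCleansing {

    /** Validate and cleanse one extracted record; return null if it cannot be repaired. */
    static Map<String, String> cleanse(Map<String, String> record) {
        Map<String, String> cleaned = new HashMap<>(record);

        // Normalize obvious formatting problems rather than restructuring the data.
        cleaned.put("name", cleaned.getOrDefault("name", "").trim().toUpperCase());

        // Deduce a missing country from the phone prefix (a calculated/deduced value).
        if (cleaned.getOrDefault("country", "").isEmpty()
                && cleaned.getOrDefault("phone", "").startsWith("+44")) {
            cleaned.put("country", "GB");
        }

        // Reject records that are still invalid after cleansing.
        return cleaned.get("name").isEmpty() ? null : cleaned;
    }

    public static void main(String[] args) {
        List<Map<String, String>> extracted = List.of(
                new HashMap<>(Map.of("name", "  ada lovelace ", "phone", "+44 20 7946 0000")),
                new HashMap<>(Map.of("name", "", "phone", "+1 555 0100")));

        List<Map<String, String>> target = new ArrayList<>();
        for (Map<String, String> record : extracted) {
            Map<String, String> cleaned = cleanse(record);
            if (cleaned != null) {
                target.add(cleaned);
                record.putAll(cleaned);   // optionally write the corrections back to the source
            }
        }
        // Prints one cleansed record (ADA LOVELACE with deduced country GB); the invalid record is dropped.
        System.out.println(target);
    }
}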
Guidelines for use of these population patterns
It is highly recommended that the logic that governs the transformation of source
data into target data (including any transformation or cleansing) be implemented
using rules-driven metadata rather than hard coding these rules. This approach
enhances the maintainability of the application and hence this reduces the total
cost of ownership.
Benefits
This architecture is ideal when data must be transformed between a source and
target data store. This can include simplistic transformation of the data (Single
Step) or complex transformation (Multi-step and Data Cleansing) — all levels
are supported by the variations of this pattern.
Limitations
Most of the real world requirements for propagating structured data from one
data store to another are complicated. They require extensive reconciliation,
transformation, restructuring, and merging of data from multiple sources. Under
such circumstances a single-step approach is obviously not advisable, and a
multi-step approach should be undertaken.
Additionally, reconciling data from multiple sources is often a complex
undertaking and requires a considerable amount of effort, time, and resources.
This is especially true when different systems use different semantics.
Putting the Application pattern to use
Consider a Financial Services Company that provides various services, including
checking account, savings account, brokerage account, insurance, and so on.
The company has built this impressive portfolio of services primarily through
mergers and acquisitions. As a result, the company has inherited a number of
product-specific operational systems. The company would like to create a
Business Data Warehouse (BDW) that provides a consolidated view of customer
information. It would like to use this consolidated information for sophisticated
pattern analysis and fraud detection purposes.
Populating such a BDW would require reconciling customer records from
different operational systems that use different identification mechanisms to
identify the same customer. Further, other operational systems record
transactions with different time dependencies. The reconciliation process must
resolve these semantic and time differences and must check for any
inconsistencies and irregularities. Due to the complexity involved, the Financial
Services Company chooses the Population: Multi-step and Population: Data
Cleansing application patterns.
Implementation details
The Population: Single Step application pattern is not documented with
additional Runtime patterns or product mappings. Because Single Step
functionality is essentially a simplified version of the functionality found in the
Population: Multi-step application pattern, the solution designs documented there
can be used as a basis for further understanding implementation details of the
Population: Single Step application pattern.
4.2.2 Population: Index Population application pattern
The Population: Index Population application pattern is a new application pattern
that combines several prior population application patterns. Specifically, it
represents the combination of the previous “Population Crawl and Discovery”
and “Population Summarization” application patterns. As the business world’s
usage and understanding of search solutions has increased, it has become
apparent that a single Application pattern more accurately represents the
solutions being built today. Thus, this single “Index Population” application
pattern has replaced the original multiple search-related Population application
patterns.
Overall, Index Population provides a structure for applications that retrieve and
parse documents and data, and create resulting indices, taxonomies, and other
summarizations of the original data. These result sets may include:
򐂰 A basic index of relevant documents that match a specified selection criteria
򐂰 A categorization or clustering of common documents from the original data
򐂰 An automatically built taxonomy of the original data, to allow for easy
browsing
򐂰 Expertise location, automatically mapping the authors of the original data to
topics of “expertise” based on the contents of the documents and the
categories discovered
Business and IT drivers
Here, we are concerned with the following business and IT drivers:
򐂰 Improve organizational efficiency
򐂰 Reduce the latency of business events
򐂰 Provide easier access to vast amounts of unstructured data through indexing,
categorization, and other advanced forms of summarization.
򐂰 Provide access to corporate/institutional “tacit” knowledge via identification of
experts within the organization.
Note: “Tacit” knowledge is the untapped knowledge still within the human
mind that has not yet made it into documents and formal data.
򐂰 Minimize total cost of ownership
򐂰 Maintainability
Overall, the primary business driver for choosing the Index Population application
pattern is to provide a more usable and relevant organization of documents or
unstructured data, built from a vast set of original documents, and based on a
specified selection criteria. The objective is to provide quick access to useful
information instead of bombarding the user with too much information.
Search engines that crawl the World Wide Web or file systems implement this
Application pattern, as well as the more advanced “discovery” search engines
that perform document clustering/categorization, expertise location (that is,
that perform document clustering/categorization, expertise location (that is,
identify experts), and intelligent analysis of the document contents.
This pattern is best suited for selecting useful information from a huge collection
of unstructured textual data. A variation of this application pattern can be used
for working with other forms of unstructured data such as images, audio, and
video files — in such cases, additional transformation and translation services
are required to parse and analyze the data.
The solution
As shown in Figure 4-5, this Application pattern mainly follows the framework
proposed by the Population: Single Step application pattern. However, in the
case of this Index Population application pattern, the “Search, Discover, and
Indexing” tier crawls through multiple data stores, retrieving documents, parsing
them, and building a result set of all documents that match the selection criteria.
In some cases, such as World Wide Web search engines, the contents of
documents in one data source (that is, URL links) may actually be used to
determine additional data sources to crawl.
[Figure 4-5 shows the Population: Index Population application pattern: a metadata-driven method retrieves and parses documents from multiple source applications, then indexes, summarizes, and categorizes them into an index and taxonomy (in practice, this may be a multi-step method).]
Figure 4-5 Population: Index Population application pattern
When the unstructured data recovered by these activities must be transformed,
cleansed, or manipulated before it can be purposefully used, a Multi-step variant
of this application pattern based on the Population: Multi-step application pattern
might be required. This Multi-step approach is often required for the more
advanced search applications that perform document clustering and expertise
identification. An initial step will often exist to parse the original data from multiple
sources and build a single interim “index” that contains key pieces of document
data and meta-data. This initial step then allows additional steps to summarize,
categorize, create taxonomies, or locate experts from this single normalized
index.
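To illustrate the retrieve-and-parse and index/summarize/categorize tiers shown in Figure 4-5, the following toy Java sketch crawls an in-memory set of documents and builds an inverted index together with a naive keyword-based taxonomy. The document collection, tokenizer, and category rule are assumptions of this sketch; a production index population engine would add crawling of real repositories, relevance ranking, clustering, and expertise location.

import java.util.*;

/** Minimal sketch of the Population: Index Population pattern (toy crawler and indexer). */
public class IndexPopulation {

    public static void main(String[] args) {
        // "Sources" to crawl: document id -> unstructured text.
        Map<String, String> sources = Map.of(
                "doc1", "Configuring the portal search portlet",
                "doc2", "Sales figures and customer account summary",
                "doc3", "Search and discovery across customer data");

        // Target data stores produced by the population run.
        Map<String, Set<String>> invertedIndex = new TreeMap<>();   // term -> document ids
        Map<String, Set<String>> taxonomy = new TreeMap<>();        // category -> document ids

        for (Map.Entry<String, String> doc : sources.entrySet()) {
            // Retrieve and parse: tokenize the document into lower-case terms.
            String[] terms = doc.getValue().toLowerCase().split("\\W+");

            // Index: record which documents contain each term.
            for (String term : terms) {
                invertedIndex.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }

            // Categorize: a naive keyword rule standing in for clustering and taxonomy building.
            String category = doc.getValue().toLowerCase().contains("customer") ? "sales" : "technical";
            taxonomy.computeIfAbsent(category, c -> new TreeSet<>()).add(doc.getKey());
        }

        System.out.println("search -> " + invertedIndex.get("search"));   // [doc1, doc3]
        System.out.println("taxonomy -> " + taxonomy);                    // {sales=[doc2, doc3], technical=[doc1]}
    }
}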
Additionally, this Application pattern can probably be even further decomposed
into two separate Application patterns — one that performs “search” types of
activities, and another that actually populates the index from these searches.
However, in this current book, we will simplify things by treating the Population:
Index Population application pattern as simply another Population pattern. Any
further decomposition of this pattern will be left to future redbooks.
Guidelines for usage
As discussed earlier in this section, in many cases the pattern will be implemented
in a Multi-step approach, utilizing intermediate data stores at each stage. The
resulting data may even be distributed across multiple target data stores. For
example, an indexing engine that produces a basic document index, summary,
and taxonomy may create the index during an initial step, the summary in a
secondary step, and the taxonomy in a final third step, storing the taxonomy
in a second target data store to improve performance when accessing the index or
walking the taxonomy.
Benefits
This is the ideal architecture for extracting useful information from a vast set of
unstructured textual data.
Limitations
This Application pattern is geared towards unstructured text document location
and search needs. This pattern does not apply to business intelligence types of
activities — where the more structured data focused Population: Single Step,
Multi-step, and Data Cleansing application patterns apply.
Usage scenarios
Consider a large software company with a huge array of software products. The
company develops vast amounts of technical documentation to support these
products. Each product line publishes its own documentation on its own
department Web site. As products change, so does the technical support
documentation. Locating a particular piece of information in this sea of ever
changing data can be quite challenging and time consuming.
In order to improve efficiency of information access, the company wants to create
a categorized and federated index of all documents — that can then be searched
or browsed by users as needed to find the required information. Such an index
must be refreshed on a periodic basis to keep it current. To meet these
requirements, the software company chooses to implement the Population:
Index Population application pattern.
4.2.3 Population: Synchronization application pattern
There is one more “data movement” Application Integration pattern that, while it
does not directly relate to a portal search solution in all cases, should be
referenced here for a complete understanding of this group of patterns.
This additional pattern is the Population: Synchronization application pattern,
previously known as the “Replication” pattern. It enables a coordinated
bidirectional update flow of data in a multi-copy database environment. It is
important to highlight the “two-way” synchronization aspect of this pattern, as it is
separate from the “one-way” capabilities provided by the Population patterns
already discussed.
Business and IT drivers
This Application pattern may be required for geographically dispersed
applications using similar database technologies and schemas. It is needed by
mobile workers who cannot have direct access to the central repository. Inherent
support of synchronization by database products makes this an ideal solution for
distributed environments. For homogeneous database environments,
synchronization is very straightforward.
The solution
As shown in Figure 4-6, this pattern is a basic two-way synchronization of data
between separate data stores. The two variations shown represent the fact that
the data may be replicated via a controlling application through an application
API, or may be replicated directly via database APIs.
Figure 4-6 Population: Synchronization application pattern
Applications in this solution design do not necessarily have to be identical, but
underlying database schemas should be. Synchronization processing can be
used for propagation by eliminating the feedback process.
Guidelines for usage
Synchronization conflict resolution needs to be incorporated into the exception
processing design of this solution. If the data is updated by both the source and
target systems, then the synchronization may fail because of timestamp conflicts. If
dual control is required for a synchronization solution, then a conflict resolution
mechanism must be implemented.
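As a minimal illustration of one possible conflict-resolution policy, the Java sketch below applies a “last writer wins” rule based on record timestamps; the record type and method names are hypothetical, and a real synchronization product would typically offer richer, configurable resolution options:

// Hypothetical record as it exists on the source side and on the target side.
class RecordVersion {
    final String key;
    final String value;
    final long lastModified;   // epoch milliseconds
    RecordVersion(String key, String value, long lastModified) {
        this.key = key;
        this.value = value;
        this.lastModified = lastModified;
    }
}

public class ConflictResolver {

    // Returns the version to keep when both copies may have changed since the
    // last sync. The newer timestamp wins; ties are surfaced to exception processing.
    static RecordVersion resolve(RecordVersion source, RecordVersion target, long lastSync) {
        boolean sourceChanged = source.lastModified > lastSync;
        boolean targetChanged = target.lastModified > lastSync;
        if (sourceChanged && targetChanged) {
            if (source.lastModified == target.lastModified) {
                throw new IllegalStateException("Unresolvable conflict for key " + source.key);
            }
            return source.lastModified > target.lastModified ? source : target;
        }
        return sourceChanged ? source : target;
    }
}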
Usage scenarios
As an example of this Application pattern, consider a mobile worker downloading
a work list at the beginning of the day, and uploading updates to this work list at
the end of the day. Another illustration of this Application pattern put to use is the
broadcast of product and price information to multiple lines of business, where all
the LOBs use the same applications.
4.2.4 Federation application pattern
The Federation application pattern, previously known as the Federated
Repository application pattern, creates a unified query interface into isolated
structured and unstructured repositories. It is a key component of an overall
search solution when many complex and diverse sources must be integrated.
Business and IT drivers
This pattern is normally selected for the same business drivers as for the “data
movement” patterns. That is:
򐂰 Improve organizational efficiency
򐂰 Reduce the latency of business events
򐂰 Distill meaningful information from a vast amount of structured data
However, it is the IT drivers that distinguish the selection of this pattern over
the more basic “data movement” patterns, as its “connector/adaptor” design
allows for improved:
򐂰 Maintainability
򐂰 Minimized total cost of ownership
򐂰 Leverage of existing technology investments
򐂰 Reduced deployment and implementation costs
The solution
As shown in Figure 4-7, this pattern provides a real-time query interface into both
structured and unstructured data. Metadata mapping enables the decomposition
of a unified query into requests to each individual repository. The information
model appears as one unified virtual repository to users. Using adapters for each
target repository, multiple disjoint formats can be integrated into a common
federated schema.
Figure 4-7 Federation application Pattern
It is important to note that this pattern is not a user-accessed pattern, but rather
represents the method by which another system or application can access
integrated data.
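To illustrate the adapter idea in the simplest terms, the following Java sketch (with hypothetical interface and class names standing in for whatever connector framework a real federation product provides) decomposes one unified query across several repository adapters and merges the rows into a single virtual result:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical adapter contract: each repository answers a unified query
// expressed as field/value criteria, returning rows already mapped into a
// common federated schema.
interface RepositoryAdapter {
    List<Map<String, String>> query(Map<String, String> criteria);
}

public class FederatedQuery {

    private final List<RepositoryAdapter> adapters;

    FederatedQuery(List<RepositoryAdapter> adapters) {
        this.adapters = adapters;
    }

    // Decompose the unified query to every adapter and merge the results,
    // so callers see one virtual repository.
    List<Map<String, String>> execute(Map<String, String> criteria) {
        List<Map<String, String>> merged = new ArrayList<>();
        for (RepositoryAdapter adapter : adapters) {
            merged.addAll(adapter.query(criteria));
        }
        return merged;
    }
}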
Benefits
This Application pattern is appropriate when an infrastructure for integrating data
sources is needed without the need for propagation and/or additional
repositories. It can be driven by the need for unified information access by portal
projects. It is useful where relational data and text data need to be accessible
through one common Web search interface. It is also applicable for the structured
data-only/business intelligence solutions where the frequency of change of
application data would prohibit an Operational Data Store type of solution.
Limitations
This solution eliminates any duplication of data that exists in other data
integration patterns but does require metadata mapping during the processing of
the federated query. The architecture of this federated real-time query
environment needs to be tuned for optimum performance.
Usage scenarios
As an example of the appropriate use of this Application pattern, consider the
situation in which an insurance agent requires access to a customer's policy
information, policy document, and pictures from an automobile accident claim. A
line of business executive could use this Application pattern to view information
about one of his key customers using an aggregate of sales data, customer
account information, and the latest news from syndicated sources.
As another example, consider when a customer support agent requires
information about a certain product to answer a customer’s support call.
Documents exist in multiple locations (a file system, a knowledge base, a Web
site, and so on) from which an answer could be found. Additionally, search interfaces to
some of these locations have already been created in the past. Rather than
requiring the customer support agent to use multiple search engines to locate the
needed information, a single “federated search” application could be created that
would perform a unified query against all of the existing search engines, and
return one set of normalized results, so that the customer support agent can
quickly find the relevant information to solve the customer’s problem.
4.3 Information Aggregation patterns
As mentioned earlier, the Information Aggregation business pattern, also known
as User-to-Data, can be observed in e-business solutions that allow users to
access and manipulate data that is aggregated from multiple sources. This
Business pattern captures the process of taking large volumes of data, text,
images, video, and so on, and using tools to extract useful information from them.
There are two key application patterns in this area that we will discuss:
򐂰 User Information Access (UIA)
򐂰 User Search and Discovery (US&D)
Overall, these patterns represent similar functionality, with the User Information
Access pattern applying primarily to structured data/business intelligence types of
applications, and the User Search and Discovery pattern applying to
unstructured data/knowledge search applications. Both of these
patterns are discussed in this section.
4.3.1 User Information Access application pattern
The User Information Access application pattern, previously known as the
Information Access application pattern, helps structure a system design that
provides access to aggregated information. It is most often used in conjunction
with one of the “data movement” Application Integration patterns already
discussed, to provide a data access interface for users into an aggregated
repository created by these data movement patterns. For the more data oriented
applications to which this pattern applies, this might also be called a “query”.
There are two forms of this pattern that we will discuss. The first is a basic variation,
in which users view but cannot update information. The second variation is very
similar to the basic read-only variation, except that the data sources may be
updated.
Business and IT drivers
Here, we are concerned with the following business and IT drivers:
򐂰 Improve organizational efficiency
򐂰 Reduce the latency of business events
򐂰 Provide access to distilled information and drill-through capability
򐂰 Minimize total cost of ownership (TCO)
򐂰 Promote consistency of Operational Data
򐂰 Maintainability
The primary business driver for choosing this Application pattern is to provide
efficient access to information that has been aggregated from multiple sources.
This mechanism can access both structured and unstructured data populated by
one or more of the “data movement” application patterns. Internal and/or external
users may use this information for decision-making purposes.
For example, an Executive Information System (EIS) might generate a summary
report on a periodic basis that compares the sales performance of various
divisions of a company with the sales targets of those divisions. In this example,
the User Information Access application pattern is used for accessing information
from structured raw data. In addition, the application may provide drill-through
capability allowing the user to track the performance of individual sales
representatives against their individual targets.
The basic read-only solution
Figure 4-8 shows a diagram of the User Information Access application pattern.
Figure 4-8 User Information Access application pattern
As shown in Figure 4-8, the basic User Information Access application pattern is
broken into three logical tiers:
򐂰 The Presentation Tier is responsible for all the user interface related logic that
includes data formatting and screen navigation. In some cases the
presentation might be as simple as a printout.
򐂰 The Data Integration Tier is responsible for accessing the associated read
only data store and distilling the required information from this data.
򐂰 An additional “drill-through” Application Tier is sometimes provided in this
pattern to allow the ability to drill through to detailed data. However,
drill-through capability is not always required, so it is not depicted in the pattern
diagram. Data necessary to enable drill-through might already exist and be
accessed from an existing information access application, or might be defined
in the scope of this information access application.
Additionally, the box around one of the target sources represents the fact that the
target data may need to be accessed via a controlling application’s API,
or may be accessed directly via a database API.
Two variations
Patterns that include update facilities are needed because many data oriented
systems that provide query and reporting also need to accommodate changes
provided by the end users. For example, a user analyzing financial results may
also wish to include their own budgetary figures.
Thus, a very minor change to this pattern is needed to show the “read/write”
capabilities of the access to the data sources. One read/write variation is shown
in Figure 4-9.
Figure 4-9 User Information Access application pattern: immediate update
There are actually two main methods in which an update can be supported –
immediate or batched.
In an “immediate” variation, any user updates are immediately applied to the
source data. However, when multiple source applications are involved,
propagating changes back to the source applications can be extremely complex,
possibly even requiring two-way synchronization of data. Such an “immediate”
approach is depicted in Figure 4-9 on page 59. Note that in some cases it may
make sense to provide an additional “update” node to handle the update
processing, especially if it is complex.
A “batched” variation may also make sense in some cases in which updates are
not as time critical. Such an approach may involve an additional update “staging”
repository to hold any updates made by the user. Such updates would then be
applied on some set schedule, as a “batch” process. Similar to the immediate
approach, this update process may be implemented via an additional “update”
node — or, even handled by a “population” tier that is external to this pattern.
The choice of a batched or immediate variation will usually be based on various
IT drivers in the environment. There may be existing population methods already
in place and ready to be leveraged for the “batch” updating. Alternatively, the
business controls and rules of an enterprise may require all updates to come
from a single controlling application. The introduction of an additional update
source, as in the “immediate update” variation of this pattern, may introduce data
integrity and reliability issues.
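The contrast between the two variations can be sketched in Java as follows; the staging store here is just an in-memory queue, and all names are hypothetical:

import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical user update to be written back to a source application.
class UserUpdate {
    final String recordKey;
    final String newValue;
    UserUpdate(String recordKey, String newValue) {
        this.recordKey = recordKey;
        this.newValue = newValue;
    }
}

// Hypothetical writer that pushes one change back to a source system.
interface SourceWriter {
    void write(UserUpdate update);
}

public class UpdateHandler {

    private final Queue<UserUpdate> staging = new ArrayDeque<>();

    // "Immediate" variation: apply the change to the source right away.
    void applyImmediately(UserUpdate update, SourceWriter writer) {
        writer.write(update);
    }

    // "Batched" variation: hold the change in a staging store...
    void stage(UserUpdate update) {
        staging.add(update);
    }

    // ...and apply the whole batch later, on a set schedule.
    void applyBatch(SourceWriter writer) {
        UserUpdate update;
        while ((update = staging.poll()) != null) {
            writer.write(update);
        }
    }
}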
Guidelines for use
A clear separation of the presentation logic and the information access logic
increases the maintainability of the application and decreases the total cost of
ownership. This allows the same information to be accessed using various user
interfaces.
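A small Java sketch of what this separation might look like: the information access logic sits behind a single interface (names and result keys are hypothetical), so a Web page, a portlet, or a printed report can all reuse it without change:

import java.util.List;
import java.util.Map;

// Hypothetical information access contract, independent of any user interface.
interface InformationAccess {
    List<Map<String, Object>> fetchSummary(String divisionId);
}

// One of several possible presentation layers reusing the same access logic.
class HtmlReportRenderer {
    private final InformationAccess access;

    HtmlReportRenderer(InformationAccess access) {
        this.access = access;
    }

    String render(String divisionId) {
        StringBuilder html = new StringBuilder("<table>");
        for (Map<String, Object> row : access.fetchSummary(divisionId)) {
            html.append("<tr><td>").append(row.get("name"))
                .append("</td><td>").append(row.get("sales"))
                .append("</td></tr>");
        }
        return html.append("</table>").toString();
    }
}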
Benefits
For the basic pattern, the use of read-only data provides for maximum
consistency in a multi-user analysis or reporting environment. This simple yet
powerful Application pattern meets the majority of the information aggregation
and distillation needs. The simplicity of this pattern reduces implementation risk.
Limitations
In some cases, the data sources utilized in this pattern will be single consolidated
data sources (that is, ODS) taken from multiple original datasets. In such cases,
any updates to the consolidated data would not be propagated back to the
originating data/source applications. One option is to have updates occur via
“drill through” capabilities, in which users drill through to the original data and
make updates via original source application capabilities. Updates then occur to
the original data; however, a time delay could occur until these updates are
available in the consolidated data that is accessed by data queries.
Putting the Application pattern to use
Consider a Personal Portal such as my.yahoo.com that aggregates information
from disparate data sources and allows users to personalize this information to
meet their preferences. These portals aggregate both structured data such as
weather information and stock quotes and unstructured data such as news and
links to other sources of information. Based on the type of the data and the
amount of transformation required, the portal developers may choose one or
more of the Population application patterns. Once the data has been stored in
the optimal format, the portal developers may offer the User Information Access
application pattern to search this information, access this information in a
personalized style, and/or to provide drill-through capabilities.
4.3.2 User Search and Discovery application pattern
The User Search and Discovery pattern is very similar to the User Information
Access pattern. However, as previously mentioned, while User Information
Access applies primarily to structured data/business intelligence types of
applications, the User Search and Discovery pattern normally applies to
unstructured data/knowledge search applications.
This search-focused User Search and Discovery application pattern is shown in
Figure 4-10.
Figure 4-10 User Search and Discovery application pattern
With a quick glance, one can easily see that this pattern is almost a duplicate of
the User Information Access application pattern discussed earlier. The only real
change is that the previous “data integration” tier has been replaced by a
“search” tier. This is done to highlight the “search” aspects of this pattern versus
the normal relational “query” aspects of the User Information Access pattern.
Overall, the search tier supports a unified query mapping across multiple data
indices. Both metadata and query syntax mapping enable the decomposition of
such a unified query into requests understood by each individual data index. This
syntax mapping is an important aspect, as search concepts do not yet have a
unified query language such as the SQL used in normal data querying applications.
In simple cases only one data index is searched, while in other cases multiple
indexes are searched, and the search tier takes on “brokering” capabilities. That
is, the search tier would “broker” search requests to the data indices, and result
sets are then combined and normalized by the search tier’s “brokering”
capabilities.
In fact, in such “brokered” cases the actual “connection” to the data may be
handled by search “adapters” or connectors for each target repository. These
“adapters” then contain the logic to access the data, perform the search, and
then send the results back to the “search” tier where they can be consolidated
and combined with other result sets. However, this is really getting into Runtime
pattern types of details, so we will leave further discussion on the search
“adapters” to 5.5.1, “User Search and Discovery Runtime pattern” on page 86.
A final aspect to consider, when comparing this pattern to the User Information
Access pattern, is the data sources themselves. While in the User Information
Access pattern real data may be accessed directly via normal relational
database/SQL queries, with the User Search and Discovery pattern it is normally
data indices that are searched, not the actual data itself. That is, consolidated
indices of data are built by population methods, which in turn index multiple other
data sources, be they files on a file system, HTML pages on the Internet, or
even “blob” data from a relational repository.
Thus, the actual types of data being “queried” are another distinguishing feature of
this pattern over the User Information Access pattern.
Drivers and guidelines for usage
In general, the same business and IT drivers, and guidelines for usage, apply to
this pattern as applied to the User Information Access pattern.
However, when a “brokered” approach is utilized, additional IT drivers are often at
play in bringing about this design. Specifically, the goals of “leveraging existing
technology investments” and “reducing deployment and implementation costs”
can be met with this approach. The usage of “adapters” can allow one to better
leverage existing legacy interfaces to the data sources — reducing deployment
costs. Additionally, should a data source need to be removed or replaced, only
this adapter/connector logic would need to change, not the entire search
application. Of course, this is again getting into Runtime pattern types of details,
so we will again leave further discussion regarding search “adapters” to 5.5.1,
“User Search and Discovery Runtime pattern” on page 86.
4.3.3 Self-Service application patterns compared
At the outset, one may find architectural similarities between the User
Information Access/User Search and Discovery application patterns and
applications that automate the Self-Service business patterns. It is important to
distinguish the differences as we close our detailed discussion of the Information
Aggregation patterns.
What actually distinguishes these two patterns is the user interaction with data
(User Information Access) versus a business transaction (Self-Service). The
User Information Access application patterns facilitate direct interaction between
users and data, hence providing significant freedom and flexibility in accessing
and manipulating data. Applications that automate the Self-Service business
pattern enable direct interaction between users and business transactions, thus
enabling users to electronically perform a business process. Data may be
involved in these business transactions, but only in smaller snippets.
However, a few of the Application patterns for Self-Service can be used as
front-ends to data stores, thus providing basic information access capabilities as
well — again, for a smaller scope and size of data than supported by the
Information Aggregation patterns.
4.4 Combining the patterns for search solutions
The basic relationship of Application Integration and Information Aggregation
patterns was discussed earlier in this chapter in Section 4.1, “An overview of the
Application patterns” on page 44. Now that these patterns have been described
in additional detail, we can more closely examine this relationship in regard to
search solutions.
In general, it is the Application Integration patterns that allow the Information
Aggregation patterns to search data sets. However, there are multiple ways in which
this can be accomplished, as shown in the multiple Application patterns we have
discussed. For example, taken at the very basic level, a User Search and
Discovery based application could have the ability to directly search a data
source on its own, without requiring a population/indexing step, as depicted in
Figure 4-11.
Figure 4-11 A basic search solution
More commonly, applications based on one of the Population application
patterns, such as the Index population pattern, are utilized to combine multiple
data sources into a single search “index”. A User Search and Discovery based
application would then be created to provide the user interface, and perform the
actual search operations against this “index” of data. This “index” may be a more
advanced categorized and summarized index available via a taxonomy — but
the same model would apply. Population applications populate this data, and
User Search and Discovery based applications then allow users to search. This
is depicted in Figure 4-12.
Figure 4-12 A single index search solution
This basic model can then be expanded. If the User Search and Discovery based
application is built with the optional “brokering” capabilities, then multiple
indices can be searched, bringing the user an aggregated and
normalized result set. This is depicted in Figure 4-13.
"Population"
based
application
User
Searchable
Data
Index
"User
Search &
Discovery"
based
application
Original
Data
"Population"
based
application
Searchable
Data
Index
Original
Data
Figure 4-13 A brokered search solution
To take this a step further, the capabilities of the Federation application pattern
could then be introduced to allow multiple data sources to be searched without
populating a single consolidated index. Moreover, Federation capabilities could
be utilized by a Population application to aid it in creating its index. This
introduction of Federation capabilities is depicted in Figure 4-14.
"Population"
based
application
User
"User
Search &
Discovery"
based
application
Searchable
Data
Index
"Federation"
based
application
"Federation"
based
application
Original
Data
Original
Data
Original
Data
Original
Data
Figure 4-14 A brokered and federated search solution
As shown in these various solutions, the clear separation of information access
logic and population logic allows for the same information to be packaged
differently based on the needs of the business problem at hand. This layered
approach decreases the total cost of ownership, and increases solution flexibility.
4.5 Summary
In this chapter we have introduced the Application patterns that are required for
common portal search solution needs, and are part of the Portal Search custom
design. As with the Portal composite pattern as a whole, a single application
pattern cannot accurately define the problem. Rather, to extend the portal for
search capabilities, Information Aggregation application patterns (User Search
and Discovery) must be introduced to provide the search interface and logic
capabilities, while Application Integration patterns (Population and Federation)
are required to prepare and integrate multiple data sources for searching. The
addition of these patterns to the base Portal composite pattern creates the
Portal Search custom design.
In the following chapter, we will describe the common Runtime patterns and
Product mappings that take these higher level Application patterns down to the
technology level.
Chapter 5. Runtime patterns
After choosing the appropriate Business pattern and Application pattern, it is
time to define the Runtime pattern and map the products used to implement it.
Runtime patterns define functional nodes (logical) that underpin an Application
pattern. The Application pattern exists as an abstract representation of
application functions, whereas the Runtime pattern is a middleware
representation of the functions that must be performed, the network structure to
be used, and the systems management features, such as load balancing and
security. In reality, these functions, or nodes, can exist on separate physical
machines or may co-exist on the same machine. In the Runtime pattern, this is
not relevant. The focus is on the logical nodes required and their placement in
the overall network structure.
This chapter introduces the Runtime patterns for our Portal Search custom
design.
5.1 Runtime node descriptions
A Runtime pattern is represented by logical nodes, where each node has a
specific role in the architecture. It defines the topology of the architecture and
node placement. Most patterns will consist of a core set of common nodes, with
the addition of one or more nodes unique to that pattern. To understand the
Runtime patterns presented in this book, you will need to review the following
common node definitions.
Public Key Infrastructure (PKI) node
PKI is a collection of standards-based technologies and commercial services to
support the secure interaction of two unrelated entities (for example, a public
user and a corporation) over the Internet. In the context of the topologies defined
in this redbook, PKI supports the authentication of the server to the browser
client, using the SSL protocol.
Domain Name Server (DNS) node
The DNS node assists in determining the physical network address associated
with the symbolic address (URL) of the requested information. The DNS is that of
the Internet Service Provider, although for additional security and more efficient
use of network resources, the hosting environment where the portal
implementation is housed can leverage its own DNS node.
User / Internal User node
This node is most frequently a personal computing device (PC, etc.) supporting a
commercial browser, for example, Netscape Navigator or Internet Explorer. The
level of the browser is expected to support SSL and some level of DHTML.
Increasingly, designers should also consider that this node may be a pervasive
computing device, such as a personal digital assistant (PDA).
The internal user accesses the system from within the client’s network and/or via
a VPN connection from the Internet. This user will also use some type of desktop
or mobile based computing device.
Directory and Security Services node
This node supplies information on the location, capabilities and various attributes
(including user ID/password pairs and certificates) of resources and users known
to this Web application system. The node may supply information for various
security services (authentication and authorization) and may also perform the
actual security processing, for example, to verify certificates. The authentication
in most current designs validates the access to the Web application server part
of the Web server, but it can also authenticate for access to the database server.
Protocol and Domain Firewall node
Firewalls provide services that can be used to control access from a less trusted
network to a more trusted network. Traditional implementations of firewall
services include:
򐂰 Screening routers (protocol firewall)
򐂰 Application gateways (domain firewall)
The two firewall nodes provide increasing levels of protection at the expense of
increasing computing resource requirements. The protocol firewall is typically
implemented as an IP router, and the domain firewall is a dedicated server node.
The protocol firewall prevents unauthorized access to servers in the DMZ from
the outside world by filtering incoming requests by protocol, access route, point
of origin, and other data characteristics.
The domain firewall prevents unauthorized access to servers on the internal
network by limiting incoming requests to a tightly controlled list of trusted servers
in the DMZ.
Web Server Redirector node
In order to separate the Web server from the application server, a Web Server
Redirector node (or just redirector for short) is introduced. The Web server
redirector is used in conjunction with a Web server. The Web server serves
HTTP pages and the redirector forwards servlet and JSP requests to the
application servers. The advantage of using a redirector is that you can move the
application server behind the domain firewall into the secure network, where it is
more protected than within the demilitarized zone (DMZ). Static pages can be
served from the DMZ by this node.
The Portal composite Runtime pattern supports this, since the ability to add
additional Web servers and/or additional application servers, without affecting
other portal nodes, is important for supporting scalability in the system and
enhancing maintainability.
The redirector can be implemented, for example, by either a reverse proxy server
or by a Web server plug-in.
Application Server node
The Application Server node provides the execution and communication runtime
environment for the business logic of the application. The business logic may be
self-contained on the application server node. If not, the application server node
is responsible for interacting with back-end applications and retrieving data from
back-end data sources. The application server node typically enables
infrastructure services such as persistence, resource connection pooling,
scalability, failover, administration, and support for Java.
The application server node is often the central mechanism in the systems
architecture to provide access to various back-end data sources and/or
applications and to provide this access to mechanisms that are used to present
data to the end-user.
Presentation Server node
The Presentation Server node provides services to enable a unified user
interface. It is responsible for all presentation-related activity. In its simplest form,
it serves HTML pages and runs servlets and JSPs. For more advanced patterns,
it acts as a portal and provides the access integration services (single sign-on,
for example). It interacts with the Personalization Server node to customize the
presentation based on the individual user preferences or on the user role.
The Presentation Server node manages the presentation of data extracted from
multiple sources. Through the use of user profile information, business rules
(personalization) and a mechanism for aggregating different information sources
(static editorial data, content managed data, data from remote systems), an
aggregated view of data can be displayed. This aggregated view can be tailored
for different device types based on information known about the current user
accessing the portal.
Personalization Server (Rules Engine) node
The Personalization Server node works with the Presentation Server node to
customize the presentation with data that matches a user’s interest. The
Personalization server identifies the type or class of the user based on
information available about the user. Based on this classification, data taken from
a content data store either in the Personalization tier or from back-end sources is
selected for presentation to the user. It provides the mapping function of user
classification to content data.
The Personalization server contains the “rules” that determine what types of
users can have access to certain types of information. These are also referred to
as access control rules and are directly related to business rules and processes.
This is referred to as the Personalized Delivery::Prescriptive Runtime pattern.
The Personalization server also allows the user to design the content and the
layout of the content that they see by explicitly choosing from a selection of
options. This is referred to as the Personalized Delivery::Participator Runtime
pattern. You can use either or both of these patterns for the Portal composite
pattern.
Collaboration node
The Collaboration node provides synchronous and asynchronous modes of
communication within an organization. We call this a community. A community
is empowered by collaboration: collaborative work between users. The
Collaboration node provides interactive discussions (interactive messaging and
chat functionality) and the sharing of documents/ideas (teamroom environment).
Content Management node
The Content Management node provides for the management of digital assets
(for example, images, documents, “pieces” of text) and applies a workflow and
security rules (for example, access control) to each discrete asset. Note that
assets can also be referred to as “resources” (as they are in WebSphere Content
Publisher). The Content Management node will commonly include and/or
leverage these functions:
򐂰 Content Type / Category Identification
򐂰 Workflow (based on a user’s role and/or the type of content)
򐂰 Versioning (including rollback to previous versions)
򐂰 Handling of static or dynamic content
򐂰 Transcoding / reformatting of content (more recently added to handle multiple
end-user channel device types)
򐂰 Storage of content to multiple data source types (for example, DBMS, file
system)
Search and Indexing node
A Search and Indexing node provides a function to catalog and/or index the
content data sources. This will provide the capabilities to locate specific content
(for example, product or catalog information) and to update this search capability
when updates are added (via indexing).
In addition, this information can be “indexed” in a manner that provides the
Presentation and Personalization server an ability to find information that is
“associated” to the actions taken by the end-user. For example, this could
provide for “cross-selling” or “up-selling” on a commerce site, which is a specific
form of Implicit Personalization. For more details, refer to the “Predictive
Personalization” Runtime pattern at:
http://www-106.ibm.com/developerworks/patterns/access/at3-runtime.html
Database Server node
The Database Server node provides a persistent data storage and retrieval
service in support of transactional interactions. The data stored is relevant to the
specific business interaction, for example, bank balance, insurance information,
or current purchase by the user. This node represents a common mechanism to
manage all database management system-based data sources.
Pervasive User node
A Pervasive User node is a “catch all” category of portal user, covering all “mobile”
(non-desktop) connected end-user devices other than a Web browser. In most
current scenarios this includes devices such as mobile phones, personal digital
assistants, and text pagers.
Wireless Gateway node
This node serves the information from the portal via alternative protocols to
wireless devices.
5.2 Runtime pattern for the Portal composite pattern
The overall Portal composite Runtime pattern is represented in Figure 5-1. It
contains nodes that are common in a portal implementation. However, note that
based on specific business drivers, your implementation of a portal will include
some or all of these nodes. In fact, it may contain nodes that are not included in
this diagram and are specific to the implementation.
The Portal composite Runtime pattern combines the characteristics of many
different Runtime patterns. Through this combination of characteristics a
composite picture of those “components” that are generally implemented for, and
add value to, a portal implementation become clear. Below is a list of the
characteristics that generally make up Runtime patterns and the additional nodes
that are included for the Portal composite pattern.
Familiar functionality is included in this pattern, such as:
򐂰 Databases and data sources
򐂰 A data integration and business logic mechanism, via an application server
򐂰 A security and user directory system
򐂰 Browser based access
In addition, there are these portal specific nodes and functions:
򐂰 Content Management
򐂰 Collaboration (the synchronous chat and messaging form)
򐂰 Workflow (part of the application server and/or Content Management)
򐂰 A Presentation Server (generates properly formatted output)
򐂰 A business rules (Personalization) engine
򐂰 Search and Indexing
򐂰 Multi-client device and formats supported (for example, Wireless gateway)
A portal implementation leverages the concept of personalization, multi-device
type access, a presentation rendering mechanism, and a business rules engine.
These are combined with the ability to search and index content (of various types
and formats), provide collaboration, and manage content via a workflow to
provide both content aggregation and a collaborative environment.
The Portal composite Runtime pattern represents a starting point for most portal
implementations, providing a way to identify those functional areas that will likely
need to be addressed when considering this type of implementation.
Consequently, the Portal composite Runtime pattern shown in Figure 5-1
represents a preliminary step towards an operational architecture that can be
implemented in a target environment to provide secure data aggregation,
multi-client access and collaboration.
Figure 5-1 The basic portal composite Runtime pattern
5.3 Runtime pattern for Portal Search custom design
Now that we have defined the Portal composite pattern’s overall Runtime pattern,
it is important now to show the differences for the Portal Search custom design.
This “custom design variation” of the overall Runtime pattern is depicted in
Figure 5-2.
The main differences between these two Runtime patterns center on the
searching and indexing node. The basic Portal composite pattern included a
single node defined as “searching and indexing”. However, this searching and
indexing capability was really focused on data included within the portal itself,
as a companion to the Content Management node, and consequently the content
management capabilities available within the portal. When we bring this Runtime
pattern to the level of a more comprehensive portal search solution, this
searching and indexing capability needs to be separated from the realm of content
management and portal specific data, to allow it to handle diverse and external
data sources.
Furthermore, as described in the Application patterns discussion earlier in this
redbook, there is a need to separate the Information Aggregation (that is, user
information access) and data focused Application Integration (that is, “federation”
and “population”) aspects of search into separate nodes. As these concepts
represent distinctly different application and performance needs, they should
therefore be considered separately in the runtime model. Thus, separate
“search”, “population”, and “federation” nodes are defined within this Custom
design runtime model, as shown in Figure 5-2.
Figure 5-2 The Portal Search custom design Runtime pattern
Within this overall Custom design Runtime pattern, the key nodes that provide for
the search capabilities are the Database Server node, the Presentation Server
node, the Search node, the Federation node, and the Population node. These
nodes interact as follows:
򐂰 The Database server node provides the persistent data storage and retrieval
services as required by all other nodes in the Runtime pattern.
򐂰 The Federation node provides for a unified interface into multiple isolated
data sources. It is a key component of an overall search solution when many
complex and diverse sources must be integrated.
This node represents the high level runtime implementation of the Application
Integration::Federation application pattern.
򐂰 The Population node provides the functionality to communicate with raw data
sources to perform the indexing, and optional categorization/summarization
processing, required to produce searchable data indices of this original data.
The Population node may access multiple data sources directly, or may
access a single “virtual” data source as presented by the Federation node, or
any combination of these two.
This node represents the high level runtime implementation of the Application
Integration::Population: Index Population application pattern.
򐂰 The Presentation node is the same node utilized in the generic Portal
Composite pattern to enable a unified user interface. In terms of search
capabilities, this would involve communications with the Search node to pass
along user requests, receive search responses, and potentially format the
search response. Depending on the search technologies utilized, the search
responses may be sent back to the presentation node in an already formatted
manner (that is, full HTML), or may be passed in a manner that requires the
presentation node to transform the data (that is, XML) into the appropriate
formats such that a single aggregated view of all portal content can be
presented to the user (see the sketch following this list).
򐂰 The Search node provides the core engine that communicates with the
various search indices and data sources, and performs the actual searches
based on user requests. In a portal solution, the portal’s “presentation
server” node handles the user interface with the end user, although it will
usually leverage various UI capabilities of the search node (via product APIs).
This search node will also provide any extended search capabilities in the
solution. This includes search brokering and aggregation to provide for a
single unified search result across multiple disparate data sources.
The combined Presentation and Search nodes represent the high level
runtime implementation of the Information Aggregation::User Search and
Discovery application pattern.
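As noted in the Presentation node description above, the search node may return raw XML that the presentation node must transform into markup. A minimal sketch of such a transformation using the standard javax.xml.transform API follows; the response schema and the stylesheet content are assumptions for illustration only:

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class SearchResultFormatter {

    // Turns an XML search response into an HTML fragment using a supplied
    // XSLT stylesheet (both the response schema and the stylesheet are
    // assumptions for the purposes of this sketch).
    static String toHtml(String resultsXml, String stylesheetXslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheetXslt)));
        StringWriter html = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(resultsXml)),
                              new StreamResult(html));
        return html.toString();
    }
}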
In the next few sections of this chapter we will look at the specific Runtime
patterns for each of these application patterns in more detail, examining more
closely how the key Portal Search custom design application patterns, identified
in Chapter 4, “Application patterns” on page 43, map into this overall solution
Runtime pattern.
For details on the general portal aspects of this Runtime pattern, please refer to
these IBM Redbooks:
򐂰 Patterns: A Portal composite pattern using WebSphere Portal V4.1.2,
SG24-6869
򐂰 Patterns: A Portal composite pattern using WebSphere Portal V5, SG24-6087
5.4 Application Integration Runtime patterns
As discussed earlier in this redbook, the following Application Integration
patterns are important for the understanding of a search solution:
򐂰 Data movement application patterns:
– Population: Single Step
• Population: Multi-step variation
• Population: Data Cleansing variation
– Population: Index Population
– Population: Synchronization
򐂰 Federated access application patterns:
– Federation
As also discussed, it is primarily the “Population: Index Population” and
“Federation” application patterns that come into play in a given search solution. Thus, these
patterns are the ones described down to the runtime level in this chapter. However,
some of the other Population patterns (Multi-step, Synchronization, and Data
Cleansing) are also discussed in relation to various compositions of the
Population: Index Population and Federation Runtime patterns.
5.4.1 Population: Index Population Runtime pattern
Figure 5-3 depicts the basic Runtime pattern for applications implementing the
Population: Index Population application pattern.
Figure 5-3 Application Integration::Population: Index Population - Runtime pattern
This population Runtime pattern is in fact the same Runtime pattern that would
be used to represent many of the other Population patterns as well. For
example, the “Population: Single Step” pattern would follow this same model,
as would the “Population: Data Cleansing” variation. In the case of the Data
Cleansing variation the population node performs the cleansing activities,
while in this index pattern the population node performs indexing activities. The
same runtime structure therefore underlies any overall system design relying on
population capabilities.
In the case of this pattern’s application to the “Population: Index Population”
pattern, the population activity taking place in this Runtime pattern is as follows:
the overall Population application’s indexing and summarization activities begin
processing data from the original data sources. Data sources are usually
processed one at a time, in sequential order. As the data is being processed,
a temporary work queue may be required. This work queue data
can be stored in files, message queues, or databases. The final index of all
documents, including document metadata and summarizations, is written to a
resulting index data file.
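A minimal Java sketch of that flow, with hypothetical names and formats: each source is processed in turn, parsed documents pass through a temporary work queue, and a summary line for each document is appended to the resulting index file:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class IndexPopulationRun {

    // Hypothetical contract for one crawlable data source.
    interface DataSource {
        List<String> fetchDocuments();          // raw document text
    }

    // Process sources one at a time; buffer parsed work in a queue, then
    // write a simple summary for each document to the index file.
    static void populate(List<DataSource> sources, Path indexFile) throws IOException {
        Queue<String> workQueue = new ArrayDeque<>();
        for (DataSource source : sources) {
            workQueue.addAll(source.fetchDocuments());
            String doc;
            while ((doc = workQueue.poll()) != null) {
                String summary = doc.length() > 200 ? doc.substring(0, 200) : doc;
                Files.writeString(indexFile, summary + System.lineSeparator(),
                        StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        }
    }
}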
Advanced categorization
When advanced categorization and taxonomy generation capabilities are
required, a slightly more advanced variation of this Runtime pattern may
sometimes be required. In this variation, the “Population: Index Population”
needs are fulfilled in a “Population: Multi-step” type of approach, as shown in
Figure 5-4.
Figure 5-4 Application Integration::Population: Multi-step (applied to indexing)
By examining the Runtime pattern, one can see that the categorization
capabilities have been split out into a separate runtime node. The Categorization
node takes the index and summarized search data, and performs the
categorization processing to produce a document taxonomy including indexed
and categorized data. This runtime design allows for more flexibility in terms of
scaling the overall system to support the indexing/processing of larger data
sources, and produce larger resulting search indices.
However, there is one other manner in which the need for a more scalable
“Population: Index Population” can be handled. This alternative approach
involves the combination of multiple Population patterns into one overall runtime
solution, as shown in Figure 5-5.
Figure 5-5 Multiple “Population: Index Population” patterns combined
As shown in the diagram, multiple “Index Population” patterns have been
combined into a single solution. This solution allows for the same scaling
flexibility as the “Population: Multi-step” variation discussed earlier. However, this
approach also allows for the separation of the document index data from the
document categorization data. Such a separation of the resulting index and
categorized data would allow the database load from those
users performing searches to be separated from that of users walking the taxonomy of
categorized documents.
Overall, this variation (while still simplistic) represents a high scale enterprise
solution, with maximum flexibility in terms of horizontal scaling.
External data
To take this discussion of combining patterns at the runtime level one step
further, let us examine a situation involving the need for external data to be
included within a population routine. In such a case, one option would be for the
core “population node” processing to continue to function on an internal network,
accessing external data through the enterprise firewall. This approach is shown in
Figure 5-6.
Figure 5-6 “Index Population” applied to external data
In such cases, careful consideration must be taken in regards to the security and
encryption of all communication to the external data sources, so that data
security and integrity is not compromised as the external data flows across the
unsecured external network.
Data replication
Another way to handle the external data at a runtime level would be to implement
a “replication” routine that would handle making an internal “copy” of the
external data. In some cases, this replication node would be placed within the
“DMZ” layer of the network, to allow for a more secure separation of the network
traffic. This approach can be taken such that the internal “replica” of the data is
a one-way “pull” of data only, in which case this solution might really be the
combination of the “Population: Single Step” and “Population: Index Population”
patterns.
Alternatively, the internal replica of the data could be a full two-way
synchronization, in which updates to the internal copy of this data are allowed
and then synchronized back out to the original external copy. In this case, the
approach depicted in Figure 5-7 is actually a combination of the “Population:
Synchronization” and “Population: Index Population” patterns.
Figure 5-7 “Index Population” applied to external data via “synchronization”
This solution does have an advantage in that it may minimize the network traffic
of the system, as the traffic associated with data lookups for an indexing process
would be more intensive than the simple replication/synchronization traffic.
However, this runtime solution has its disadvantages as well. For example, the
security concerns would be similar to a solution without replication. One would
still need to ensure that the communication of data from the external network is
secure during the replication/synchronization process. Additionally, keeping the
schedules of synchronization updates matched up with the schedule for
re-indexing of the data may also become problematic — resulting in slower
updates to the data indices — and more delayed search results to end users.
This approach would also not be feasible when the external data is owned by an
external entity. In such cases, the external entity will probably be hesitant to allow
a full copy of their data to another location, where they would lose all control of
the data security.
Data extraction
One final combination of Runtime patterns to consider concerns the “load” that
accessing and extracting data from the original data sources places on the system.
The logic required for the indexing system to
“speak” the multiple data source languages, and then actually process these
disparate data sources into a common data model, can be quite intensive. This
can result in poor performance of the indexing process, and very long processing
times for the indexing node.
To again allow for more flexibility in terms of performance and horizontal scaling,
an additional population routine can be added to the solution. In such an
approach, an extraction process would run against all data sources. This
extraction node would then process all the data, and transform it into a common
data model — ultimately resulting in a single source of “normalized data” upon
which the indexing focused population node can focus its efforts.
This approach can once again be implemented in more than one way. The data
extraction could be a simple data transformation process, which might be
implemented in a “Population: Multi-step” fashion. Alternatively, the data
extraction could require more advanced cleansing of errors and other issues, in
which case the solution would be a combination of the “Population: Single Step”
and “Population: Data Cleansing” patterns as depicted in Figure 5-8.
Figure 5-8 “Index Population” combined with “Population: Data Cleansing”
5.4.2 Federation Runtime pattern
The Federation application pattern was discussed earlier in this redbook, within
4.2.4, “Federation application pattern” on page 55. It is designed to create a
unified query interface into isolated structured and unstructured repositories.
Figure 5-9 depicts the basic Runtime pattern for applications implementing the
Federation application pattern.
Figure 5-9 Application Integration::Federation — Runtime pattern
The activity taking place in this Runtime pattern is as follows:
A requesting application makes a query for data from the “federated” data source, for example a simple SQL SELECT request. The data integration node processes the request and, utilizing its metadata that defines the data sources, passes the request on to the appropriate data sources.
In many cases, the data integration/federation logic within the Data Integration
node may be logically separate from “data connector” logic. This data connector
logic spreads out the load of making the query to multiple data sources —
allowing the queries to run simultaneously against each database. When
performance is of major concern, multiple logical data connectors may exist to
process queries against a single data source — the idea here being to eliminate
any single node in the process from becoming a bottleneck, if too many requests
run against one data source.
In all cases, the results that are returned from each individual data source must
then be aggregated and normalized by the data integration layer so that these
results appear to be from one “virtual” data source. The results are then sent
back to the requesting application, which has no idea that multiple data sources
were involved.
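As a rough sketch of this behavior, the following Java fragment shows a data integration node fanning a query out to several data connectors in parallel and aggregating the partial results into one list. The connector interface and class names are assumptions used only for illustration; they are not the API of any of the products discussed later.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// One connector per back-end data source (illustrative interface).
interface DataConnector {
    List<String> query(String federatedQuery) throws Exception;
}

public class DataIntegrationNode {
    private final List<DataConnector> connectors;
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public DataIntegrationNode(List<DataConnector> connectors) {
        this.connectors = connectors;
    }

    // Fan the request out to every connector simultaneously, then merge the results
    // so the caller sees a single "virtual" data source.
    public List<String> federatedQuery(String query) throws Exception {
        List<Future<List<String>>> partials = new ArrayList<>();
        for (DataConnector c : connectors) {
            partials.add(pool.submit(() -> c.query(query)));
        }
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : partials) {
            merged.addAll(f.get());   // aggregation/normalization of rows would happen here
        }
        return merged;
    }
}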
External data
As discussed with the Population Runtime patterns, the inclusion of external data
into such a “federated” data source model may also be required. In such a case,
one option would be for the core “federation node” processing to continue to
function on an internal network, accessing external data through the enterprise
firewall. This approach is shown in Figure 5-10.
Figure 5-10 Federation — Runtime pattern — with external data
While this runtime design provides for a clean integration of external data, the
performance of federation queries in a model such as this, which contains both
internal and external data, may be poor. Depending on the speed of links and
firewalls in between the federation interface and the external data source, one
external data source could slow down the entire query — even if internal data
sources responded immediately. Because a “federation” request happens in real time on behalf of a requesting application, putting an external resource and network in the middle of this process can degrade the response time of larger enterprise-wide applications.
Additionally, as with other “external data” Runtime patterns discussed in this
chapter, security of requests to the external data source would also be a
concern. All communication channels to the external data source would need to
be carefully secured and encrypted.
Of course, another alternative would be to introduce the “Population:
Synchronization” pattern combined with the Federation pattern, in a manner
similar to the Synchronization/Index Population combination shown in Figure 5-7
on page 81.
However, the same concerns would exist in this runtime solution as with the
Synchronization/Index Population data runtime solution — namely, concerns
about the scheduling of data replication updates and security concerns of owners
of the external data.
5.5 Information Aggregation Runtime patterns
The Information Aggregation business pattern captures the process of taking
large volumes of data, text, images, video, and so on, and using tools to extract
useful information from them. As discussed earlier in this redbook, Information
Aggregation patterns normally leverage the data focused Application Integration
patterns to provide a “search” interface into an aggregated set of data. As we
have already reviewed the runtime implementation of the Application Integration
patterns, it is now time to focus on the Information Aggregation pattern runtime
implementations.
There are two key application patterns in this Information Aggregation area that
have so far been discussed at the application level:
򐂰 User Information Access (UIA)
򐂰 User Search and Discovery (US&D)
However, while these patterns differ at the application level due to their focus on
structured data (UIA) versus unstructured text (US&D), their runtime
implementations are very similar. Thus, we will focus on the User Search and
Discovery pattern, as search is the focus of this redbook — but the runtime
discussion of this pattern will apply to the UIA pattern as well.
5.5.1 User Search and Discovery Runtime pattern
The User Search and Discovery pattern can be discussed at a basic level,
followed by several more advanced, and more common, variations. To start, the
basic runtime model for this pattern is shown in Figure 5-11.
Figure 5-11 Information Aggregation: User Search and Discovery — Runtime pattern
In this Runtime pattern, client users may be either “thick” clients operating via a
product API (represented by the Win32 node in this Runtime pattern), or “thin”
Web browser based clients interacting via standard Web application models. The
presentation/Web application server node handles the user interaction, and
passes search requests to the search node. This search node then analyzes the search request, processes it according to its configuration, and returns the results to the user through the presentation/Web application server node.
Search adapters
However, especially in a portal implementation, a search technology will often be
expected to have “brokering” capabilities. This is the ability to search multiple
data indices simultaneously, and provide a seamless consolidated result set with
reasonable response time. In most cases, such multi-source brokered search
solutions will be implemented via the inclusion of “search adapter” nodes, as
depicted in Figure 5-12.
Figure 5-12 User Search & Discovery — search adapter variation — Runtime pattern
The search adapters, then, contain the logic to interface with the actual data
indices — either interacting directly with the index at a database level, or
interacting with a controlling application’s search interface (normally via a
product API). In some cases, multiple search adapters may exist to process
queries against a single data index. The idea here is to eliminate any single node
in the process from becoming a bottleneck, if too many search requests run
against one source. The data indices themselves are regularly updated by the
“population” based capabilities discussed in other Runtime patterns earlier in this
chapter.
The search adapters then return the search results from each individual data
index, which the search brokering node must then aggregate and normalize so
that the results appear to be from one “virtual” source. The aggregated results
are sent back to the presentation/Web application server node, which must
format the results for sending back to the client that requested the original
search. Search results may be sent back to the presentation node in an already
formatted manner (that is, full HTML), or may be passed in a manner that
requires the presentation node to transform the data (that is, XML) into the
appropriate formats. Similarly, the presentation/Web application server node
may then pass the search results to the client in either a fully formatted manner,
or may leave the final formatting of the search results up to the client application.
Overall, separating the search capabilities into multiple nodes in the Runtime pattern, where the application pattern view defined them simply as a single “data integration” tier, provides a level of flexibility for maintenance and performance tuning that would be unavailable if a single node provided all of the required capabilities.
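A minimal sketch of this brokering behavior might look like the following; the adapter interface, the simple 0-to-1 score normalization, and the merge step are illustrative assumptions rather than any product's actual behavior.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A single hit returned by one search adapter (illustrative common result model).
class SearchHit {
    final String source, title, url;
    final double score;          // raw score in the source's own scale
    SearchHit(String source, String title, String url, double score) {
        this.source = source; this.title = title; this.url = url; this.score = score;
    }
}

// Each adapter knows how to talk to one index or one controlling application's search interface.
interface SearchAdapter {
    List<SearchHit> search(String query);
}

public class SearchBroker {
    // Aggregate hits from every adapter and normalize scores so the merged
    // list looks like it came from one "virtual" source.
    public List<SearchHit> brokeredSearch(String query, List<SearchAdapter> adapters) {
        List<SearchHit> all = new ArrayList<>();
        for (SearchAdapter adapter : adapters) {
            List<SearchHit> hits = adapter.search(query);
            double max = hits.stream().mapToDouble(h -> h.score).max().orElse(1.0);
            if (max <= 0) max = 1.0;
            for (SearchHit h : hits) {
                // Rescale each source's scores into 0..1 before merging.
                all.add(new SearchHit(h.source, h.title, h.url, h.score / max));
            }
        }
        all.sort(Comparator.comparingDouble((SearchHit h) -> h.score).reversed());
        return all;
    }
}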
Search services
Another variation of this Runtime pattern to consider is support for cases in which the “data” being searched is not raw data at all, but rather a “search service” with a defined API for interaction. In such a case, an additional variation applies to highlight the fact that the search adapters may not directly access a data index, but may instead interact with an existing search service to actually perform the search. This variation is shown in Figure 5-13.
Figure 5-13 User Search & Discovery — search service variation — Runtime pattern
The main difference between this variation and the search adapter variation is the inclusion of another search node, representing the external “search service” to be accessed by the search adapter. As far as the user of the application is concerned, there is no way to tell whether the search is being performed directly against a data index or via such a search service.
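To make the distinction concrete, the fragment below sketches an adapter that does not read a data index directly but instead forwards the query to an existing search service over HTTP. The service URL and the response handling are purely hypothetical.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Adapter variant that delegates to an existing "search service" rather than a data index.
public class SearchServiceAdapter {
    private final String serviceUrl;   // hypothetical, e.g. "http://search.example.com/query"

    public SearchServiceAdapter(String serviceUrl) {
        this.serviceUrl = serviceUrl;
    }

    public String search(String query) throws Exception {
        String url = serviceUrl + "?q=" + URLEncoder.encode(query, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line);
            }
        }
        // The raw response (for example XML) would then be parsed into the common result model.
        return response.toString();
    }
}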
External users and data
The User Search and Discovery application pattern is the first pattern we have discussed down to the runtime level that includes direct user interaction. Thus, it needs to take into account not only the external and internal data sources and systems considered with the other Runtime patterns, but internal and external users as well.
In any application that has direct user interaction, the involvement of internal
“trusted” users, versus external “untrusted” users, must be considered. The basic
runtime model for this pattern already discussed does not truly take this into
account. To represent such external users, we must view this Runtime pattern on
top of a standard three-tier e-business environment. This is the view shown in
Figure 5-14.
Figure 5-14 User Information Access — Runtime pattern — external users and data
To support external users, a “Web application server” node will normally be
placed in the DMZ network layer. This node would service HTTP requests from
the Web browser based thin client, passing each search application request
(normally JSP or servlets) to the presentation server. Actually, in most cases the
Web Application Server node would really be split into two nodes: a Web Server
“redirector” node, which would be placed in the DMZ; and a Web Presentation
Server node, which would remain on the internal network — thus ensuring that
any user interface logic is safe from hacking.
Security is of course important whenever an external user is introduced.
Depending on the sensitivity of the data included within search results, all
communications between the thin client user and Web server must be encrypted.
In most e-business Web based applications, this will be done using SSL encryption, resulting in encrypted HTTP traffic (HTTPS). However, the
introduction of any additional encryption to an existing system can result in a
dramatic performance impact. Therefore, the ability of any system to support the
additional encryption loads should always be considered.
Tip: Of course, to alleviate the impacts of such encryption, special “SSL
appliances” can be utilized to off-load the encryption/decryption processing
from the rest of the system.
Typically, thick client users would only be supported internally, as it can be
problematic to properly secure external API based requests.
External data
There will also be situations in which external and internal data indices will be
included in an “extended search” implementation of this Runtime pattern. In such
situations, the various search nodes will continue to run within the internal
network, and a search connector will be placed within the external network close
to the external data itself. Alternatively, the search connector node may be
placed in the DMZ layer to limit any compromise of the search connector logic.
This support for external data is also shown (along with the external users) in
Figure 5-14 on page 89.
Similar to the impacts of external data in the “federation” Runtime patterns
already discussed, the inclusion of any external data sources within an “external
brokered search” implementation of the User Search and Discovery pattern can
have unexpected performance results. Depending on the speed of links and
firewalls in between the search broker and the external data index, this one
external data query could slow down the entire search query — even if internal
data searches responded immediately.
However, this may have less of a “domino effect” than in the case of the Federation Runtime pattern, as only a single user might experience this delay, versus the application “customers” of pure data federation. Ultimately, users may
consider a small performance hit acceptable in exchange for a single search
result that integrates both internal and external data.
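One common way to contain this risk, sketched below as a generic technique rather than a feature of any specific product, is to give each search source its own time budget so that a slow external index degrades only its own contribution to the result set instead of stalling the whole brokered query.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class TimeBoxedBroker {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Query every adapter in parallel, but drop any source that cannot
    // answer within its time budget instead of stalling the whole search.
    public List<String> search(String query, List<Callable<List<String>>> adapters,
                               long timeoutMillis) {
        List<String> merged = new ArrayList<>();
        List<Future<List<String>>> futures = new ArrayList<>();
        for (Callable<List<String>> adapter : adapters) {
            futures.add(pool.submit(adapter));
        }
        for (Future<List<String>> f : futures) {
            try {
                merged.addAll(f.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException | InterruptedException | ExecutionException e) {
                f.cancel(true);   // slow or failing source: skip it, keep the rest
            }
        }
        return merged;
    }
}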
5.5.2 Information Aggregation in business intelligence solutions
The runtime designs that have been presented so far for each of the Information
Aggregation patterns have been focused specifically on general search related
business problems. Typically, such search capabilities will be enabled in a
knowledge management type of solution, with the goal of helping users make
sense of large volumes of unstructured content.
However, these patterns can also apply to the more structured data analysis
problems common in business intelligence (BI) types of solutions. For more
understanding in how these patterns would function at the runtime level for these
solutions, please reference the IBM patterns for e-business Web site at:
http://www-106.ibm.com/developerworks/patterns/bi/at5-runtime.html
5.6 Combining the Runtime patterns
At this point we have presented a high level view of the overall Portal Search
custom design Runtime pattern in Figure 5-2 on page 74. We have also
presented detailed views to the various Application Integration::Population,
Application Integration::Federation, and Information Aggregation::User Search
and Discovery Runtime patterns.
However, as discussed in 4.4, “Combining the patterns for search solutions” on
page 63, all of these search related patterns must really be combined to result in
a single comprehensive e-business search solution. Thus, it is helpful to view all
three of these patterns in a single consolidated yet detailed Runtime pattern to
clearly understand the interactions of these patterns.
Figure 5-15 depicts one potential solution shown at the Runtime pattern level
combining several of these patterns.
Figure 5-15 “Search” Runtime patterns combined — federation, population, and user search
As can be seen in this combined Runtime pattern, the combination of all of these
application patterns is what truly creates a complete search solution. In this
composition of patterns, we have external data being replicated internally, then
federated with existing internal data to appear as one search source. Then we
have another search source being generated by an instance of the Population:
Index Population pattern. All three of these data sources are ultimately being
searched by external users via one single query. The user is oblivious to the
location and virtualization of the data.
When taking the knowledge gained from this “combined” search Runtime
pattern, and adding its concepts to the overall Portal composite Runtime pattern,
we ultimately end up with the overall Portal Search custom design Runtime
pattern presented in Figure 5-2 on page 74. The “search”, “federation”, and
“population nodes” in this overall custom design Runtime pattern represent the
three key search Runtime patterns we have discussed in this chapter.
5.7 Summary
In this chapter, we have stepped down to the “runtime” level of search solutions.
We have discussed the common runtime nodes typically used, and the variations
of these nodes, depending on the requirements for performance and security,
and integration with external data sources.
In the next chapter we will take these Runtime patterns and map them down to
the product level. That is, we will highlight the specific IBM technologies and
products that can be utilized to implement each runtime node in the real world.
Chapter 6. Portal Search product mappings
After choosing the appropriate Runtime pattern, it is time to map the Runtime pattern to the products used to implement it.
A product mapping maps the logical nodes defined in the Runtime pattern to
specific products that implement the Runtime solution design on a selected
platform. The product mapping identifies the platform, software product name,
and often version numbers as well.
6.1 Mapping the Runtime pattern
In order to expedite the process of implementing any pattern, existing products
can be chosen that already contain the necessary functionality. In many cases,
additional customization of these products will be necessary to meet the
business drivers. The goal is to choose a set of products and technologies that
minimize the need for customization.
Thus, the next step after choosing a Runtime pattern is to determine the actual
products, technologies, and platforms that form a best fit for the desired solution.
In addition to the business drivers, consider these principles when determining a
product and technology mix:
򐂰 Existing systems and platform investments
򐂰 Customer and developer skills available
򐂰 Customer choice
򐂰 Future functional enhancement direction
The products and technologies chosen should fit into the target environment and
ensure quality of service, such as scalability and reliability, so that the solution
can grow along with the e-business.
6.1.1 Functional mappings
The Portal Search custom design Runtime pattern, based on the Portal
composite Runtime pattern, is constructed to be product and technology
agnostic. Figure 6-1 contains those “functions” that various nodes will provide,
and these functions can be mapped to specific products, a group of products, or
multiple products providing functionality to more than one node.
Note: Refer to 5.1, “Runtime node descriptions” on page 68 for additional
information regarding the various nodes identified as part of the Portal Search
custom design, and a detailed description for each node as needed.
(Diagram: the functions listed for the runtime nodes include a search/query engine with search brokering, indexing, categorization, and summarization; data federation/virtualization with a unified query interface; a personalization rules engine mapping portal "resources" to user "groups"; directory and security services backed by a database supporting LDAP with user, group, and organizational data; protocol filtering, circuit level gateway support, and proxy functionality; content aggregation, content transcoding, and access control lists; synchronous interaction (instant messaging) and asynchronous interaction (content management); data source connectivity, business logic, and transaction management; content publishing, creation, editing, approval, workflow, presentation templates, and versioning; an RDBMS or structured file system data source; and an alternative protocol gateway (WAP, UDP, etc.).)
Figure 6-1 Portal search custom design Runtime pattern::Functional mappings
6.1.2 Product mappings
Once the Runtime pattern has been chosen and functions have been identified, a
set of products and technologies must be applied so that detailed design and
implementation can occur. As this is an IBM Redbook, we will focus on the IBM
products and technologies that map to these runtime nodes. However,
technologies from other vendors may also apply, should you have existing
technologies in your environment that you wish to leverage for some of these
capabilities.
There are actually multiple IBM products that have the correct balance of
scalability, maintainability, and extensibility to support this Runtime pattern.
Figure 6-2 provides a list of the IBM products that can be applied to the different
runtime nodes in this custom design, depending on a specific solution’s needs.
(Diagram: the products named for the runtime nodes include the WebSphere Portal (Juru) Search Engine, Lotus Extended Search, Lotus Discovery Server, Lotus Domino, and DB2 Information Integrator for Content (II4C, formerly known as EIP) for the search, population, and federation areas; DB2 Information Integrator and Information Integrator for Content for federation; IBM SecureWay Firewall; IBM SecureWay Directory Server, or Lotus Domino with LDAP service, and/or IBM Policy Director (WebSEAL) for directory and security services; IBM WebSphere Everyplace Suite; IBM WebSphere Personalization Server; IBM HTTPD Server for the Web server redirector; IBM DB2 Universal Database; IBM WebSphere Portal Server, with Lotus Extended Search or DB2 Information Integrator for Content for search presentation; IBM WebSphere Application Server Advanced Edition or Enterprise Edition; Lotus Sametime and/or Lotus Quickplace and/or Lotus Domino for collaboration; and IBM Web Content Publisher, IBM Content Manager, or Lotus Domino.DOC for content management.)
Figure 6-2 Portal Search custom design Runtime pattern::Product mappings
Note: A specific operating system for each node is not listed in the product
mappings, as there are generally several options, because IBM’s products run
on a multitude of platforms (for example, Windows 2000, Linux, AIX®, etc.).
For the scenario implemented in this redbook, Microsoft® Windows 2000
Server with Service Pack 3 was utilized for all servers. More details on the
runtime product mappings chosen for this book’s technical scenario can be
found in 10.1, “The runtime environment” on page 168.
6.1.3 Network protocol mappings
Finally, as shown in Figure 6-3, the network protocols used for a typical
installation of this Runtime pattern are as follows:
򐂰 HTTP/HTTPS: Hypertext Transfer Protocol (HTTP), or Hypertext Transfer
Protocol Secure (HTTPS), is used from the user’s Web browser to the HTTP
server in the Web server redirector node.
HTTP, or HTTPS, is also used from the WebSphere Web server plug-in in the Web server redirector node to the Web container in the Presentation server node, as well as from the collaboration and content management node to the Presentation server node.
Finally, HTTP or HTTPS may be used between the Search node and the Presentation server node.
򐂰 LDAP/LDAPS: The presentation and application servers use Lightweight Directory Access Protocol (LDAP) to access the LDAP server in the Directory and Security Services node. LDAPS is the secure LDAP connection to a directory server using SSL. Since LDAP directories store essential and sensitive application and business information, the communication can use LDAPS to remain secure.
򐂰 JDBC: The application server node and the Directory and Security Services node use a Java Database Connectivity (JDBC) driver to access the database server node.
The Search, Population, and Federation nodes may also use JDBC to communicate with their applicable data sources. The Search node will communicate either directly, or through the Federation node (a minimal JDBC example follows the note after this list).
򐂰 RMI/IIOP: The personalization server node uses Remote Method Invocation
(RMI) over Internet Inter-Orb Protocol (IIOP) to access the EJB container in
the presentation server node and EJB container in the application server
node.
RMI/IIOP is also used from the presentation server node to the EJB container
in the application server node. Additionally, RMI may be the method of
communication to the Search functionality available in the Search node as
well.
Note: Two application servers can also communicate via HTTP with SOAP
using the Web Services technology.
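As an illustration of the JDBC connectivity listed above, the following minimal fragment shows how an application server component might query a DB2 database through the standard java.sql API. The JDBC URL, credentials, and table are placeholders, and in a real WebSphere deployment a container-managed DataSource obtained via JNDI would normally be used instead of DriverManager.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcLookup {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; a managed DataSource would normally supply these.
        String url = "jdbc:db2://dbserver.example.com:50000/PORTALDB";
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT DOC_ID, DOC_TITLE FROM DOCUMENTS WHERE DOC_TITLE LIKE ?")) {
            stmt.setString(1, "%portal%");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("DOC_ID") + " " + rs.getString("DOC_TITLE"));
                }
            }
        }
    }
}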
(Diagram: the protocols shown on the links between nodes include HTTP/HTTPS from the browser through the Web server redirector to the presentation server, LDAP/LDAPS to the directory and security services, JDBC to the database server and the population, federation, and search data sources, and RMI/IIOP (with HTTP where applicable) among the personalization, presentation, application, and search nodes.)
Figure 6-3 Portal Search custom design Runtime pattern — Protocol mapping
6.2 Product descriptions
This section provides some background and details on each IBM product
identified in the product mappings for this Portal Search custom design. This
information is required to more fully understand these product mappings.
6.2.1 Lotus Extended Search
IBM Lotus Extended Search is a scalable, server-based technology that
searches in parallel across many content and data sources, returning integrated
query results into a Web application.
It provides the following capabilities:
򐂰 Single search: Find relevant information from multiple sources with a single
search using only a Web browser.
򐂰 Parallel searching: Search in parallel across structured and unstructured data
stores, including popular Web search sites, Lotus sources, RDBMS, Index
and Directory sources, Content Management applications, Sametime® users,
and more.
򐂰 Single result set: Get aggregated results presented as a single, ranked result
set.
򐂰 Integrate with e-business applications: Easily integrate search capability into
e-business applications via a strong Java based API and SOAP/Web
Services interface.
򐂰 Scalable search: Support scalable enterprise search requirements across
departmental and geographic locations.
򐂰 Save, re-use, share searches: Save, re-use, and share searches.
򐂰 Store and forward search results: Store search results and/or forward search
results to workflow or personalization applications.
򐂰 Identify people: View shared searches to identify people with similar interests.
Extended Search — key components
There are really three key “components” of the Extended Search product: clients, brokers, and links.
Clients:
Extended Search includes several ready-to-use, customizable search
applications — including “portlets” for usage in WebSphere Portal. However, it
also provides a solid API to allow for custom client development.
The Extended Search common API is available in three forms:
򐂰 You can embed data beans and Extended Search Java Knowledge
Management (JKM) tags in HTML pages. If you use a Lotus Domino™ Web
application server, you must use this approach. If you use IBM WebSphere
Application Server, you can choose to use this approach.
򐂰 You can create JavaServer Pages (JSP) and use Extended Search JSP
beans or JSP tags to embed search functionality. This approach is supported
by WebSphere Application Server.
򐂰 You can use the Extended Search Simple Object Access Protocol (SOAP)
interface, which enables you to provide search functionality as a Web service.
Brokers:
Extended search brokers manage and synchronize requests and responses
between multiple clients and back-end systems. They:
򐂰 Distribute queries for efficient, parallel searching
򐂰 Aggregate and filter search results
򐂰 Enable peer-to-peer communication to search across departmental,
corporate, and geographic boundaries
Links:
Extended Search links are the software modules that encapsulate the native API
calls for search and retrieval to a specific data management system. They
contain all of the required data structures, programming objects, and procedural
logic necessary to interface with the back-end data system:
򐂰 They can connect brokers to targeted data stores, including: IBM Lotus
Notes®, IBM DB2, Oracle, Sybase, Microsoft SQL Server, Microsoft Access,
IBM Lotus Discovery Server™, IBM Lotus Instant Messaging (Sametime),
IBM Lotus Team Workplace (Quickplace), IBM Lotus Domino.Doc®, IBM
WebSphere Portal Search Engine, Domino Domain Index, IBM Secureway,
Microsoft Index Server, Microsoft Site Server, Microsoft Exchange, LDAP
Server directories, file systems, and over 18 Web search sites (such as
Hotbot, Excite, Alta Vista, News and User groups, etc).
򐂰 They can translate queries into the native search languages of the target data
stores.
򐂰 They can be created to access additional data stores.
More details on Lotus Extended Search can be found within the various
documents and whitepapers available at:
http://www-10.lotus.com/ldd/notesua.nsf/find/les
Extended Search is also implemented within the technical scenario detailed later in this redbook, and the product architecture is described in more detail within
Appendix B, “Understanding the Lotus Extended Search architecture” on
page 207.
6.2.2 DB2 Information Integrator
The DB2® Information Integrator™ product sets are designed to address
customer requirements for integrating structured, semistructured and
unstructured information effectively and efficiently. The product offering is divided into structured data and unstructured text capabilities:
DB2 Information Integrator V8.1
IBM DB2® Information Integrator V8.1 provides integrated, real-time access to
diverse data as if it were a single database, regardless of where it resides.
The federated server capabilities of this product allow users to:
򐂰 Create an abstract relational view across diverse data
򐂰 Use existing reporting and development tools
򐂰 Rely on leading-edge cost-based optimization
The replication server capabilities of this product allow users to:
򐂰 Manage data movement strategies, including distribution and consolidation
models
򐂰 Monitor synchronization processes
This product supports multiple data sources, including:
򐂰 DB2 Universal Database™
򐂰 Informix®
򐂰 MS SQL Server
򐂰 Oracle
򐂰 Sybase
򐂰 Teradata, ODBC, and others
More details on the DB2 Information Integrator (for data) can be found within the
various documents and whitepapers available at:
http://www-3.ibm.com/software/data/integration/db2ii/support.html
DB2 Information Integrator for Content (EIP)
IBM® DB2® Information Integrator for Content (formerly Enterprise Information
Portal in versions 8.1 and earlier) provides broad information integration and
access to:
򐂰 Unstructured digital content such as text, XML and HTML files, document
images, computer output, audio and video
򐂰 Structured enterprise information via connectors to relational databases
򐂰 Lotus Notes® Domino databases and popular Web search engines (via IBM
Lotus Extended Search)
This product also supports Information Mining and Web Crawling, such that it
provides an API for the automation of information extraction and analysis, and
provides for intranet, extranet, or Internet Web crawling.
More details on the DB2 Information Integrator for content can be found within
the various documents and whitepapers available at:
http://www-3.ibm.com/software/data/Information Integrator for Content/library.html
6.2.3 Lotus Domino
Lotus Domino provides a multiplatform foundation for collaboration and
e-business, driving solutions from corporate messaging to Web based
transactions — and everything in between. This enterprise-class messaging and
collaboration system is built to maximize human productivity by unleashing the
experience and expertise of individuals, teams, and extended communities.
In terms of portal search solutions, Lotus Domino provides a basic search and
indexing engine. Administrators can create full-text indexes to allow users to
quickly search for information in databases. To search in a database, users enter
a word or phrase in the search bar of the database to locate all documents
containing the word or phrase.
Additionally, Domino also supports a capability called “Domain Search”, which supports searching across an entire Domino domain of databases and files. To
support Domain Search, you need to designate a Domino server as the indexing
server, which builds a domain wide index that all Domain Search queries run
against. In order for the indexing server to build the index, you must first create a
Domain Catalog on the server — a database that controls which databases and
file systems get indexed. The indexing server then spiders, or crawls, the servers
that contain the content to be indexed.
When a user submits a query, the results that the indexing server returns contain
only database documents to which that user has appropriate access. If the
indexing server is set up as a Domino Web server, it can support searches from
both Lotus Notes and Web browsers.
More information on Lotus Domino, and the Domino Domain search, can be
found within the Lotus Domino product documents found at:
http://www.lotus.com/ldd/doc
6.2.4 Lotus Discovery Server
The IBM Lotus Discovery Server is a knowledge server that provides advanced
search and expertise location solutions designed to ensure that all of the relevant
knowledge and collective experiences of an organization are readily available to
help individuals and teams solve every day business problems.
To do this, the Lotus Discovery Server extracts, analyzes and categorizes
structured and unstructured information to reveal the relationships between the
content, people, topics and user activity in an organization. It will automatically
generate and maintain a Knowledge Map (K-map) to display relevant content
categories and their appropriate hierarchical mapping that can easily be
searched or browsed by users. The server also generates and maintains user
profiles and tracks relevant end-user activities, identifying those individuals who
may be subject matter experts.
More details on Lotus Discovery Server can be found at:
http://www-3.ibm.com/software/lotus/knowledge/
http://www.lotus.com/products/discserver.nsf
More information can also be found within the IBM Redbook Lotus Discovery
Server 2.0 Deployment, Planning, and Integration, SG246575:
http://www.redbooks.ibm.com/abstracts/sg246575.html
6.2.5 WebSphere Application Server
IBM WebSphere Application Server provides Web and application server
services in an e-business environment. It supports custom-built applications,
based on integrated WebSphere software platform products, or on other
third-party products. Such applications can range from dynamic Web
presentations to sophisticated transaction processing systems.
WebSphere Application Server is leading the way in support for industry open
standards. WebSphere Application Server provides full Java 2 Platform,
Enterprise Edition (J2EE) compliance with a rich set of enterprise Java
open-standards implementations. It also provides built-in support for the key Web
services open standards, making it production-ready for the deployment of
enterprise Web services solutions.
The latest version of WebSphere Application Server is version 5.
More details on WebSphere Application Server can be found at:
http://www.ibm.com/websphere
More information can also be found within the IBM Redbook: WebSphere
Application Server V5.0 System Management and Configuration, SG24-6195:
http://www.redbooks.ibm.com/abstracts/sg246195.html
6.2.6 WebSphere Portal
WebSphere Portal is IBM’s comprehensive portal offering for successful
business-to-employee (B2E), business-to-business (B2B) and
business-to-consumer (B2C) portals. WebSphere Portal:
򐂰 Delivers a single, point of personalized interaction with applications, content,
processes, and people for a unified user experience
򐂰 Allows users to view, search, create, convert, and edit basic documents,
spreadsheets, and presentations from within the portal
򐂰 Provides powerful collaboration capabilities such as instant messaging, team
workplaces, people finder and e-meetings
򐂰 Enables quick portal integration with back-end systems via portlet builders
The latest version of WebSphere Portal is Version 5.0, WebSphere Portal 5.0 for Multiplatforms. It includes two offerings:
򐂰 Portal Enable is the base offering, and provides personalization, content
publishing, document management, productivity functions along with the
scalable portal framework.
򐂰 Portal Extend adds powerful collaborative, extended search (via Lotus
Extended Search) and Web analysis features to enhance portal
effectiveness.
There are also small business focused versions of WebSphere Portal, and versions for IBM eServer zSeries and iSeries platforms:
򐂰 WebSphere Portal — Express
򐂰 WebSphere Portal Enable for iSeries™
򐂰 WebSphere Portal for z/OS® and OS/390®
More details on WebSphere Portal can be found at:
http://www7b.software.ibm.com/wsdd/zones/portal/
6.2.7 WebSphere Portal Search Engine (Juru)
WebSphere Portal Version 5.0 provides a Portal Search Engine to facilitate
indexing and searching of information.
Starting with WebSphere Portal version 4, it has included a built-in search engine
that crawls and indexes Internet and text documents. This consists of a search
portlet that provides fast and precise free-text search (with ranking by relevance
or by date at user’s request) as well as an “admin” portlet for easy configuration.
In addition, it provides a SOAP interface for Web services enablement.
The search technology included in WebSphere Portal is based on a search
technology originally developed by IBM Research, and code-named Juru. Juru is
a full-text search library entirely written in Java that focuses on highly precise
search results. It efficiently applies state-of-the-art search algorithms as well as
unique techniques to produce both effective and efficient results. The use of Java
and Internet technology (servlets, templates, SOAP etc.) allows easy integration
with cross-platform applications. It also enables developers to incorporate new
document types and to easily develop new user interfaces.
Juru's basic search features include: free-text query specification, advanced
query operators, multi-lingual support, summarization, customized thesaurus
(for example, synonyms) and stop list, search results clustering, and index
compression. Some of Juru's unique features include: multiple word indexing
(Lexical Affinities) for disambiguation and high precision, query assistance
word/query completion, utilization of link information for Intranet search, and
lossy index compression to help choose the ideal size of your index.
The Juru based WebSphere Portal Search Engine can scale up to 100 GB worth
of index data on a single server.
As the technical scenario in this redbook is based on WebSphere Portal v4, this
technology was not utilized — due to incompatibilities with some of the other
technologies that were implemented (that is, Lotus Extended Search did not
support WebSphere Portal Search Engine until Lotus Extended Search 4.0).
However, this technology should be considered a key component of any
WebSphere Portal version 5.0 based portal search implementation.
6.3 Choosing the product
For each node in our overall Runtime pattern, a set of products and technologies
are known to provide a correct mix of scalability, maintainability, and extensibility.
In numerous client engagements, multiple products have proven to meet
demands of the target environment. In this section, we will highlight the products
applicable to each node in our “overall” custom design Runtime pattern, by
discussing the product mappings for the specific Runtime patterns that each high
level node represents.
The specific product mappings we will examine are:
򐂰 Population: Index Population — as this maps to our overall “indexing” node
򐂰 Federation — as this maps to our overall “federation” node
򐂰 User Search and Discovery — as this maps to our overall “searching” node
Population choices
Figure 6-4 shows the Population Runtime pattern.
Figure 6-4 Product mappings for Index Population Runtime pattern
As depicted, the Population: Index Population Runtime pattern has multiple
product mapping choices, as multiple IBM products perform basic index creation
capabilities. Ultimately, the product you choose will depend on the environment
into which you are going to deploy the solution, and the types of data you are
trying to index:
򐂰 Lotus Extended Search ships with a basic Web crawler that can be used for small, basic Web indexing needs.
򐂰 DB2 Information Integrator for Content has a Web crawler API, which allows
one to develop their own custom Web crawler.
򐂰 WebSphere Portal has a high quality search engine based on the IBM Juru
technology. It would be most appropriately used in WebSphere Portal
environments.
򐂰 Lotus Domino has always had a robust data indexing engine, which easily
handles the creation of indices of Domino/Notes® data.
򐂰 Lotus Discovery Server provides built-in “spiders” to crawl and perform advanced indexing and categorization of multiple types of sources, as well as an API for building custom “spiders”. It is most often used when advanced taxonomy generation and expertise location capabilities are needed.
Federation choices
Figure 6-5 shows the Federation Runtime pattern.
Figure 6-5 Product mappings for Federation Runtime pattern
As depicted, the Federation Runtime pattern also has multiple product mapping choices, although fewer than with Population. The IBM products that provide the Data Integration capabilities are DB2 Information Integrator and DB2 Information Integrator for Content (II4C). We will focus on the “content” version, as the focus of this book is searching of unstructured content.
When II4C is used, IBM Content Manager and/or DB2 or other relational databases serve as the back-end data service. The front end calling application would then be any application written to the II4C API, be it a native C based application or a Java based application running in WebSphere.
Additionally, Lotus Extended Search provides a connector to allow it to leverage
a federated data source through Information Integrator within one of its queries.
Search choices
When determining which technologies to map to the search node, we need to
first decide which variation of the User Search and Discovery application pattern
to utilize. This can be either a basic single data source search, or a more
“extended/federated” search across multiple data indices. Once this is decided,
then the product mappings shown in Figure 6-6 apply.
Figure 6-6 Product mappings for basic User Search and Discovery
In the base User Search and Discovery pattern, the choices are really defined by the user interface. That is, if the user is in Domino, then traditionally the Domino search capabilities will be utilized against a back-end NSF data store. Alternatively, if the user is in a WebSphere application, or WebSphere Portal, then the WebSphere Portal Search Engine will be used with the Portal Search Engine’s index previously built by PSE in various files on the file system.
Product mappings for User Search and Discovery are shown in Figure 6-7.
Figure 6-7 Product mappings for User Search and Discovery w/Search Adapters
However, when a more extended search version of the User Search and
Discovery pattern is used (with search adapters), there are really two key
technology choices. The selection of one technology over the other is usually
made based on the data to be searched:
򐂰 Lotus Extended Search (LES): This is most often used for search across
more typically “unmanaged” sources of “collaboration” data, such as Lotus
Domino, Domino.Doc, MS Index Server, MS Sharepoint Portal, Quickplace
3.0, Lotus Discovery Server, WebSphere Portal Search Engine, MS Site
Server, MS Exchange, MS SQL Server, MS Access, public Web search sites
and syndicated content providers, and ODBC data.
򐂰 IBM Information Integrator for Content (II4C): This is most often used when
searching is needed across traditional IBM “managed content” sources, such
as IBM DB2 Content Manager, Content Manager OnDemand, ImagePlus®,
EDMSuite™ VisualInfo™, and Lotus Domino.doc.
To make matters more confusing though, both LES and II4C provide connectors
to leverage each other. Thus, one can use LES as the search “broker”, including
more traditionally “managed content” sources in its search via a connector to
II4C. Alternatively, II4C can connect to LES to include more unmanaged file and
collaborative data.
In such cases, where the type of data to be searched does not clearly define the technology to select, other aspects such as the varying APIs for each option may come into play. For example, LES comes with more ready-to-use “out of the box” interfaces, while the II4C toolkit/API must really be leveraged to build a custom interface for usage with II4C.
One other final consideration is the ability to update data from the search results. II4C allows for editing and manipulation of resulting documents within the back-end managed content sources.
Note: In this section we have only discussed the key decision criteria in
choosing the right product mapping for search specific nodes. For more details
on choosing the right product choice for the portal specific nodes of the Portal
Search custom design, please reference the IBM Redbook Patterns: A Portal
composite pattern Using WebSphere Portal V4.1.2, SG24-6869.
6.4 Summary
This chapter has taken our discussion of the Portal Search custom design down
to its final level. The high level business and integration patterns we originally discussed when first introducing the need for portal search have now been taken down to specific product mapping recommendations.
we will now discuss technical guidelines for using and implementing these
technologies.
Part 3. Solution guidelines
Chapter 7. Technology considerations
When selecting any search engine or technology, there are many important
aspects to consider beyond just whether it can support the data sources that you
require it to search. This chapter attempts to define the key questions and
technology aspects that must be considered when determining the technical
implementation of a search solution.
7.1 Query syntax support
Nearly all data management systems employ a grammar or query language of
some kind to express the criteria of a search. These grammars can vary widely
depending on the structure and composition of the data.
In free text systems such as the Web, for example, the search is generally
expressed as a list of keywords. Additional notations are used to express
boolean conditions (and, or, not) or positional information, such as specific words
that must occur within the same sentence or paragraph. However, if the data is
highly codified and structured, the grammar may be more parametric and may
support fielded operations (for example, the value of the Quantity field is greater
than 100).
So, when a search technology is being investigated, the potential power of its
search language should be carefully considered. Does it support boolean
searches, field level searches, exact phrase searches, etc? There are currently
efforts underway in the various standards bodies to investigate standards for a
common search query syntax, similar to the SQL standard utilized in the
relational database world.
Query syntax in an extended search world
When an “extended” search solution is utilized, it is clearly impractical for a user
to know the syntax used by each brokered search source. It is instead much
more practical to let the user express a query in a single common search
language, which is then in turn mapped to the specific native query syntax
utilized by each of the brokered search sources.
As an example, the Lotus Extended Search product offers a “common” language, which it refers to as the Generalized Query Language (GQL). This GQL is basically a superset of search grammars from which most queries can be expressed.
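Purely as an illustration (this is not actual GQL syntax or any product's translation code), the following toy Java fragment renders one simple boolean query both in a keyword-style grammar, as a free text engine might accept it, and in an SQL-style grammar for a relational source:

import java.util.List;

// Illustrative only: one "common" boolean query rendered into two native grammars,
// roughly how a broker maps its common language to each back-end source.
public class QueryTranslation {

    static String toKeywordSyntax(List<String> required, List<String> excluded) {
        StringBuilder q = new StringBuilder();
        for (String t : required) q.append("+").append(t).append(" ");
        for (String t : excluded) q.append("-").append(t).append(" ");
        return q.toString().trim();                 // e.g. "+portal +search -draft"
    }

    static String toSqlSyntax(List<String> required, List<String> excluded) {
        StringBuilder where = new StringBuilder();
        for (String t : required) where.append("BODY LIKE '%").append(t).append("%' AND ");
        for (String t : excluded) where.append("BODY NOT LIKE '%").append(t).append("%' AND ");
        String clause = where.substring(0, where.length() - 5);   // drop trailing " AND "
        return "SELECT DOC_ID FROM DOCS WHERE " + clause;
    }

    public static void main(String[] args) {
        List<String> must = List.of("portal", "search");
        List<String> not = List.of("draft");
        System.out.println(toKeywordSyntax(must, not));
        System.out.println(toSqlSyntax(must, not));
    }
}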
Even with such a common syntax solution, there will be occasions in which a
specific query “expression” will simply not map to a back-end search source, as
that back-end search source does not have a matching search capability. For
example, a field level search may not be supported by all search sources.
Alternatively, a single search source may support more advanced search syntax
expressions than are supported by the extended searches common query
language. In these cases, the ability for the extended search technology to
support the “passthrough” of a query is important, such that it skips the
translation step and runs “as is” against the target data/search source.
Additionally, there may also be occasions in which the “translation” utilized by the extended search product to transform the common query into a specific data source’s query syntax may not be sufficient. One may need to add and revise the
defined translation rules to include enterprise-specific rules for customized
translator programs (for example, to access enterprise-proprietary databases).
Thus, the capability to modify these rules is an important feature to consider.
In the Lotus Extended Search product, the specification of each supported
grammar is recorded in a grammar definition in the product’s configuration
database. Grammar definitions contain an entry that associates a shared library
with a particular Extended Search grammar. The specified library contains the
actual code for translating a GQL statement into grammar native to the link type,
such as rules for translating GQL to SQL — and these definitions can be
modified or created from scratch as required.
7.2 Support for a common data model
Just as search grammars can vary with each dissimilar back-end system, so can
the data models used to organize and store information. The data model used by
a particular data management system is typically designed for the class of
applications it serves. This determines the amount of structure and granularity
found in its information.
For example, free text systems tend to use a loosely structured document model
with low data granularity. A document may consist of a few fields (such as title,
author, and body) but its text remains free in form and unstructured. By
comparison, information can be highly structured, such as that found in relational
databases. Here, data is organized into rows and columns that can be related in
any number of ways, which results in high data granularity.
A search solution that provides for extended searching across multiple data
sources must consider these diverse data models, and determine how to
normalize these models into a single common data model so that search results
can easily be aggregated. A common model should not attempt to achieve a full
union of all the back-end data models but rather provide a flexible form into which
all models can map most of their concepts. One important feature enabled by
such a common data model is field level searches via field mapping.
Field mapping
A common problem encountered when relating data sources of different types is
the mismatch in field labels. For example, an author’s name might be labeled
AUTH_NAME in one data source and CREATOR in another, and yet be
represented as three fields (such as first name, middle initial, and last name) in
another.
An important feature of a common data model is the ability to define mapped
fields. A mapped field is a composite of one or more native fields. To resolve the
ambiguity in our author name example, you could define a single mapped field
with the label AUTHOR. You could then map this field to one or more native
fields in each of the data sources that support the semantic of author’s name.
The benefits of this mapped field feature in an extended search technology are
compelling when used in a search expression. A user needs only to specify the
mapped field in the query, and the search server will automatically associate the
mapped field to the correct native fields on the back-end. This approach greatly
simplifies the query expression, and provides greater benefit as the number of
data sources increases.
Not only do mapped fields help in simplifying the search expression, they can
also be used to simplify the processing of search results by the extended search
engine. For example, if the result came from a personnel record in an LDAP
directory, you might want to return the person’s name, job title, and contact
information in the search results list. On the other hand, if the result came from
an e-mail system, you might want to return the date, subject, and author. These
different pieces of document metadata could both be mapped to the same
common fields, and be requested from the document index with the search
results.
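As a hedged illustration of this idea (the mapping table, source names, and query-rewrite form below are assumptions, not any product's actual configuration), a mapped field could be resolved to native fields roughly as follows:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative mapped-field table: common label -> native fields per source.
public class FieldMapping {

    // AUTHOR resolves to a different native field (or fields) in every source.
    private static final Map<String, Map<String, List<String>>> MAPPED_FIELDS = new HashMap<>();
    static {
        Map<String, List<String>> author = new HashMap<>();
        author.put("hr_database", List.of("AUTH_NAME"));
        author.put("document_store", List.of("CREATOR"));
        MAPPED_FIELDS.put("AUTHOR", author);
    }

    // Rewrite a common field condition into the native fields of one source.
    static String rewrite(String source, String commonField, String value) {
        List<String> nativeFields = MAPPED_FIELDS.get(commonField).get(source);
        StringBuilder condition = new StringBuilder();
        for (int i = 0; i < nativeFields.size(); i++) {
            if (i > 0) condition.append(" OR ");
            condition.append(nativeFields.get(i)).append(" = '").append(value).append("'");
        }
        return condition.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("hr_database", "AUTHOR", "Smith"));      // AUTH_NAME = 'Smith'
        System.out.println(rewrite("document_store", "AUTHOR", "Smith"));   // CREATOR = 'Smith'
    }
}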
7.3 Simple versus advanced index creation
As discussed in the application patterns discussion earlier in this redbook, a
search product that builds search indices can function at multiple levels. It can
perform a basic index creation, which allows for a simple text based search of
source data, or it can provide more advanced capabilities such as summarization
and categorization. It is becoming increasingly common these days for search
technologies to provide more than just index creation, so this is another key criterion in selecting a search technology.
Summarization
Summarization techniques provide for a short descriptive text “summary” of a
document within the resulting index. This allows search results to present a small
summary of text to the user, allowing them to better decide whether a given search
result is applicable to them, without having to click through and view the
actual document.
Basic summarization techniques can extract document summaries that are
already included and marked as such within a document. Obviously, it is easier to
attach a summary to index entries if selected documents contain a summary that
is clearly marked. This is often the case with HTML based documents that may
contain summaries in the HTML headers. However, when a pre-written summary
does not exist, one can also be extracted via other electronic means.
Electronic extraction of document summaries can also occur in a simple fashion,
such as using the first XX characters or sentences in a document as the
summary. In more advanced cases, summarization processes can intelligently
analyze the document to determine important sentences that should be included
within the summary. The importance of a sentence is determined by some
surface clues such as the number of important keywords, the type of sentence
(fact, conjecture, opinion, etc.), rhetorical relations in the context, and the
location in which a sentence exists in a document.
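A minimal sketch of the simplest approach, taking the first few sentences of a document as its summary, is shown below in Java. The class and method names are purely illustrative; production engines add the sentence-scoring techniques described above rather than simply taking the opening text.

import java.text.BreakIterator;
import java.util.Locale;

// Naive summarization: use the first maxSentences sentences of the document
// as its summary. Real engines weight keywords, sentence type, and position.
public class NaiveSummarizer {

    public static String summarize(String documentText, int maxSentences) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.US);
        sentences.setText(documentText);
        StringBuffer summary = new StringBuffer();
        int start = sentences.first();
        int count = 0;
        for (int end = sentences.next();
                end != BreakIterator.DONE && count < maxSentences;
                start = end, end = sentences.next(), count++) {
            summary.append(documentText.substring(start, end));
        }
        return summary.toString().trim();
    }

    public static void main(String[] args) {
        String doc = "Portal search aggregates results from many sources. "
                + "Summaries help users judge relevance. "
                + "Advanced engines also categorize documents.";
        System.out.println(summarize(doc, 2));
    }
}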
Such advanced summarization extraction capabilities can also be taken a step
further in that a summarization method might be changed by text types. For
example, one would use different strategies of summarization for ordinary
articles versus editorial articles in newspapers. This corresponds to a change in
the weight (or the importance) of each surface feature when calculating sentence
importance.
Categorization
When document categorization capabilities are involved, documents are sorted
into a list of categories that group like documents together. Such categories can
allow users to browse to documents that might meet their interest, rather than
just entering search criteria. Such categories can be grouped into a hierarchical
structure, forming an overall “taxonomy” for all indexed documents.
Like summarization techniques, categorization techniques can also vary widely.
At the basic level are techniques that provide a simple rules-based
categorization. An administrator defines the categories, and the rules that
documents must match to fit into a given category. For example, all documents
with the words “Redbook” and “IBM” occurring often in the first part of a
document would map to the “IBM Redbooks” category.
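A minimal sketch of such rules-based matching follows; the class names are hypothetical, and real products express these rules through their administration tools rather than in code.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Minimal rules-based categorizer: a category matches when all of its
// required keywords appear in the opening portion of the document.
public class RuleBasedCategorizer {

    private static class CategoryRule {
        String categoryName;
        String[] requiredKeywords;
        int scanLength;   // how much of the document to examine

        CategoryRule(String name, String[] keywords, int scanLength) {
            this.categoryName = name;
            this.requiredKeywords = keywords;
            this.scanLength = scanLength;
        }

        boolean matches(String documentText) {
            String head = documentText.substring(
                    0, Math.min(scanLength, documentText.length())).toLowerCase();
            for (int i = 0; i < requiredKeywords.length; i++) {
                if (head.indexOf(requiredKeywords[i].toLowerCase()) < 0) {
                    return false;
                }
            }
            return true;
        }
    }

    private List rules = new ArrayList();

    public void addRule(String category, String[] keywords, int scanLength) {
        rules.add(new CategoryRule(category, keywords, scanLength));
    }

    public List categorize(String documentText) {
        List categories = new ArrayList();
        for (Iterator it = rules.iterator(); it.hasNext();) {
            CategoryRule rule = (CategoryRule) it.next();
            if (rule.matches(documentText)) {
                categories.add(rule.categoryName);
            }
        }
        return categories;
    }

    public static void main(String[] args) {
        RuleBasedCategorizer categorizer = new RuleBasedCategorizer();
        categorizer.addRule("IBM Redbooks",
                new String[] { "Redbook", "IBM" }, 500);
        System.out.println(categorizer.categorize(
                "This IBM Redbook describes portal search patterns."));
    }
}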
More advanced categorization and taxonomy generation technologies also exist.
These technologies will attempt to extract key “features” of documents, cluster
documents with common “features” together, and then determine the appropriate
category names for these clusters of documents. The Lotus Discovery Server is
one example of such technology, although the Discovery Server product takes
this a step further by including capabilities to match the authors of documents to
categories as well, providing for an “expertise location” type of capability.
In real life usage, many of these “automatic” taxonomy generating products still
need to be verified and cleaned up by a document specialist who is familiar with
the subject matter. Thus, for such “automatic” taxonomy and category generating
products, strong taxonomy “editing” tools are an important consideration to allow
the resulting taxonomies to be edited and modified as needed.
Multi-language support
When selecting a product due to its more advanced summarization and
categorization capabilities, do not forget to first consider whether any of the data
sources to be searched involve multiple languages. The more advanced “text
mining” features utilized by summarization and categorization engines will not
always support any language beyond English. When they do, they will often have
separate “linguistic analysis” processes that must run depending on the
language involved.
In cases such as this, it is important to identify whether the language of the data is
fixed for each source, or whether the data within a single source is a mix of languages.
For example, all data in one source is in German, while all data in a second source is
in English — versus a mix of English and German within the same database.
When this mix of languages is in place, search technologies must have a
language identification capability that will let them identify the language of a
given document, and then run the appropriate summarization and categorization
tools for that language on that document.
7.4 Honoring the security of data sources
Another important consideration for any search technology is its ability to honor
the security of the data sources that are searched. This becomes particularly
complex when document level security is enabled on the data source. There are
two main areas to consider in terms of search security: security during indexing,
and security during searching. Such “security” features of search technologies
are often overlooked, but are a crucial product selection criterion.
Honoring of data security during indexing
Search technologies that create an index of document data must ensure that
security details of documents are brought with the documents into the index as
part of each document’s metadata. This is the only way that searches of this
index can honor the original document’s security.
A key deployment consideration associated with this is ensuring that the
credentials the indexing process uses to access the data source grant access to
all data that is to be indexed. This is again sometimes a complicated
matter when document, or record, level security is enabled on the data.
Honoring of data security during searching
When data involved in your search solution is of a sensitive nature, it is crucial
that users are never returned results for documents to which they do not have
access. For one, receiving a result set you cannot access is not a quality solution
in terms of user satisfaction. But more importantly, the search results may
contain document summaries or the contents of certain document fields, so that
the user would see in the search results sensitive information to which they
would otherwise not have access.
Thus, the searching capabilities of any search technology should be able to
“impersonate” the user making the search request, and pass these user
credentials on to the back-end data sources being searched. This will ensure that
the only results received are results to which the user has access.
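The sketch below, with purely illustrative interface and class names, shows the general shape of such impersonation in Java: the end user’s credentials travel with every back-end request instead of a shared system account.

import java.util.ArrayList;
import java.util.List;

// Illustrative only: each back-end connector is invoked with the requesting
// user's credentials so that the source itself enforces document security.
interface BackEndConnector {
    List search(String query, Object userCredentials);
}

public class BrokeredSearch {
    private BackEndConnector[] connectors;

    public BrokeredSearch(BackEndConnector[] connectors) {
        this.connectors = connectors;
    }

    public List searchAll(String query, Object userCredentials) {
        List allResults = new ArrayList();
        for (int i = 0; i < connectors.length; i++) {
            // The end user's identity, not a shared system account, is passed
            // through, so no result is returned that the user could not open.
            allResults.addAll(connectors[i].search(query, userCredentials));
        }
        return allResults;
    }
}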
A twist on this same scenario would be a back-end data source that contains
highly personalized data. In this case, the ability of a search engine to
impersonate the user making the request is important not so much for security
reasons but for user satisfaction reasons. If the user is accustomed to viewing
personalized information while browsing, they will probably expect a search
capability over the same information to provide personalized results as well.
7.5 Source discovery
When deploying any search technology, the initial setup of connecting the search
engine to the data sources to be searched can be quite time consuming. This is
especially so when extended search technologies are involved, and thus multiple
diverse data sources may need to be included. In such cases it is helpful if the
search product allows for the automatic “discovery” of details about data
sources, and then automatically configures required settings, any field mappings,
and other parameter information for each new data source.
Any such “discovery processes” should also be able to ascertain whether or not
a particular data source has been previously loaded into the search configuration
— in which case the discovery process should skip already defined sources on
subsequent invocations. This should be true even if the data source name
changes.
7.6 Performance considerations
Performance and scalability of any e-business solution can sometimes be
considered a “black art”. There are any number of network, hardware, software,
or environmental considerations that can affect an overall system’s performance.
However, when considering search technologies to utilize in a search solution,
several key features or metrics of the search engines should be considered.
򐂰 Index size:
Any search engine has a limit on the amount of data it can index, and on the size
of the index itself, before it reaches a poor level of performance.
Additionally, the size of the index also needs to be considered to ensure that
adequate disk resources are available. In many search technologies, the size
of the index can be 50-75% of the size of the original data!
򐂰 Crawl rates:
Search engines will also have limitations in terms of how quickly they can
crawl through source data sets, and perform the necessary indexing and
categorization steps. Obviously, the more advanced the summarization and
categorization techniques applied, the slower the search engine’s crawl rate
will be.
Crawl rates are usually expressed in terms of documents per hour — but it is
important to understand the average document size that manufacturers assume
when determining such statistics.
򐂰 Caching:
Caching of data is another common aspect considered in the performance of
any IT system. When applied to search, caching can fall into several
categories. First, are searches against a data source cached, such that
additional searches for the same search term utilize the cached results? Such
global search caching obviously needs to take into account the update period
for the search indices, to ensure that cached search results do not begin
providing out-of-date information.
However, another equally important caching consideration relates specifically
to the performance of the search client interfaces. If a user receives 1000
search results, it is obviously not ideal to return all of these results
directly to the client if the quickest response to the end user is to be
ensured. Typically, only the first X (10, 50, 100, and so on) results are returned,
and the rest are cached within the presentation module of the search engine.
When the user then asks for the next “page” of search results (for example,
results 51 through 100), the search engine simply serves this next page of
results from cache (a minimal sketch of this paging approach follows this list).
򐂰 Componentized architecture:
Probably the most important aspect of any search technology in terms of
performance is the flexibility and componentization of its overall architecture.
For example, if the search engine has a
relatively slow crawl rate, but has a componentized crawl engine that allows
for multiple “crawlers” to be executing at the same time, then the slow crawl
rate of a single crawler may be acceptable.
Common capabilities that should be available as separate “components” to
aid in performance and scalability are:
– Crawling/indexing — This is necessary so that multiple data sources can
be crawled and indexed at the same time.
– Search engine — Even if a single search engine is efficiently
multi-threaded, supporting multiple instances of the search engine to
handle user search requests increases scalability of the implementation
so that multiple searches can be processed at the same time.
– Client interface — The client interface should ideally be separate from the
search engine. This allows the load of presentation and user formatting
logic to be moved to separate hardware than the hardware utilized for the
search engine itself. This also allows for better maintainability in that the
user interface can be modified without affecting the underlying search
implementation.
򐂰 In the case of an extended search capability, some additional key
components would be:
– Search broker — Support for multiple search broker instances increases
the scalability of the implementation in the same manner as do multiple
search engines in a non-extended search product.
– Search connectors — Search connectors allow for communication
between the search brokers and individual data source search engines —
to allow for modular plug and play of additional search sources. Support
for multiple connectors to the same data source can further distribute the
load of search requests.
For more details on these components, and ideal “runtime” architectures, please
see Chapter 5, “Runtime patterns” on page 67.
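Returning to the caching point above, a minimal sketch of per-user result paging follows. Purely for illustration the full result list is held in the servlet HTTP session; many engines keep this cache inside their own presentation module instead.

import java.util.Collections;
import java.util.List;
import javax.servlet.http.HttpSession;

// Sketch of per-user result paging: the full result list is held in the
// HTTP session and only one "page" is returned to the browser at a time.
public class ResultPager {

    private static final String RESULTS_KEY = "search.results";
    private static final int PAGE_SIZE = 50;

    public static void cacheResults(HttpSession session, List results) {
        session.setAttribute(RESULTS_KEY, results);
    }

    public static List getPage(HttpSession session, int pageNumber) {
        List results = (List) session.getAttribute(RESULTS_KEY);
        if (results == null) {
            return Collections.EMPTY_LIST;
        }
        int from = pageNumber * PAGE_SIZE;
        int to = Math.min(from + PAGE_SIZE, results.size());
        if (from >= to) {
            return Collections.EMPTY_LIST;
        }
        return results.subList(from, to);
    }
}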
7.7 Client features
When considering aspects of a good search client, the flexibility of the client to
provide more than just a basic search result set is another important
consideration. Next we offer some questions to ask when considering the
capabilities of clients provided with a search product.
What level of detail is available in results?
After a user performs a search, the more detail they are provided with in the
search results, the easier it will be to determine the relevancy of any resulting
documents found. For example, is just a basic document title included in the
search result, or is a document summary also included? Even better, can specific
fields from the document metadata be shown (creation date, author, and so on) to
further refine and clarify the search results?
How are results from multiple sources handled?
When an extended search technology is involved, and thus searches are
spanning multiple data sources, how are these results from multiple sources
presented to the user?
One option is for the results to be displayed broken down by data source. In
other words, users would see all results for datasource1, then all results for
datasource2, and so on. Another option that provides additional advantages to
the user is to present results in a single consolidated view, with results aggregated
and ranked into one list. However, such an aggregate solution does introduce
additional complexities, such as how the diverse rankings of each data source
are reconciled.
For example, a user may search on the term “redbook”. The search against the
first data source may return many hits with a ranking of “80%”, while a second
data source may use different criteria in determining its rankings and label
similar results as having an accuracy of 60%. When the results of these sources
are aggregated into a single list, this difference in ranking must be accounted for.
Thus, many extended search technologies will support the assignment of
“weights” to the rankings returned by a source. In our example, the results for the
first data source would be given a lower weighting than those of the second data
source, thus resulting in an aggregated list with similar percentage rankings of
the results.
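A minimal sketch of this weighting step follows; the weight values are illustrative, chosen by an administrator for each source.

// Sketch of weight-based rank normalization when aggregating results from
// sources that score relevance on different scales.
public class RankNormalizer {

    public static double normalize(double nativeRank, double sourceWeight) {
        return nativeRank * sourceWeight;
    }

    public static void main(String[] args) {
        // Source 1 tends to score generously, so its results are discounted.
        double source1 = normalize(0.80, 0.75);   // 0.60
        // Source 2 scores conservatively and is left unchanged.
        double source2 = normalize(0.60, 1.00);   // 0.60
        System.out.println("Aggregated ranks: " + source1 + ", " + source2);
    }
}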
Of course, there may be situations in which support for both methods described
here is required in a given solution! Therefore, the ideal solution would provide
support both for results broken down by source and for results consolidated into a
single list.
Can the original document be “fetched” if needed?
After viewing a set of search results, and selecting a specific document to view in
more detail, how does the user gain access to that original document? For Web
based search clients, users typically select a result item (identified by a URL),
and the Web browser renders the content in accordance with the MIME type set
for the document. For example, the browser might use Microsoft Word to render
documents that have the file extension .doc.
However, in some cases, such as with documents stored in file systems or in
relational databases, the search technology will need to retrieve the requested
document from the back-end data source — and then render it in some format
that is usable by the user.
Does the search client support saved searches?
For searches that a user may need to repeat on a regular basis, the ability to
provide for saved searches is important. This is especially true if the search
technology allows for searches with an advanced search query syntax — as such
searches, if not saved, may be difficult for the user to recreate.
Support for the saving of search queries can be extended even more to allow for
sharing of saved searches with other users, or allowing for the scheduling of
searches to run on a repeated basis — perhaps with search results emailed to
the user.
Another aspect of saving searches to consider is the saving of search results.
The result set may also need to be shared with other users, and providing a
mechanism for users to save search results in multiple formats would allow for
this reuse. Common formats for saved search results are PDF, Microsoft Word, or
even XML. Without such capabilities users may forward around links to
search queries, which would run the search again each time a user follows the
link — potentially impacting the performance of the search server.
There are obvious ways to handle these capabilities programmatically in any
search client that is created. However, having saved search capabilities
supported within the back-end of the search engine itself eases development
effort and time for any search clients.
Can result sets be dynamically used?
One final consideration for features in the search client is whether users have the
ability to utilize the search results to perform further analysis. For example, can
the user search within the search results — thus further refining their search?
7.8 Client technologies
As we are considering portal related search solutions in this redbook, the main
capability required in terms of a client technology is support for standard “Web”
based capabilities that can be “surfaced” within the context of a portal — via a
“portlet” within the portal.
Specific guidelines on portlet development are provided in Chapter 8,
“Application design” on page 133 later in this redbook. However, at a high level, a
“portlet” is itself just a basic Web-based e-business application. Thus, any search
engine should support multiple common Web technologies to provide the
maximum flexibility in integrating the search technology into the portal.
Some common Web technologies that should be supported are:
򐂰 HTML, the basis of any e-business Web application
򐂰 Dynamic HTML, JavaScript, and Java Applets for enhancing the user
experience on the client
򐂰 Java Servlets, Java Beans, and Java Server Pages (JSP) to provide for
server side processing and logic
When these technologies are put together, the common process for a Web
based e-business application is as follows:
򐂰 An HTML client interacts with a Web application server by using the HTTP
protocol.
򐂰 The Web server processes the request via a “server side” technology such as
Java Servlets or Java Server pages.
Any search technology would ideally be integrated at this point, by providing
strong support for standard J2EE components, such as Java Beans and JSP
Tags, to easily access search capabilities from server side logic.
򐂰 The server then returns a new HTML page to the client as a response to the
original request, again responding via HTTP.
򐂰 The new HTML page can contain Java applets, JavaScript, or dynamic HTML
(DHTML) for enhancing the presentation to the user.
An alternative to the foregoing process would be using an Extensible Markup
Language (XML) based “Web Services” solution to facilitate communication with
the server. In this model:
򐂰 Messages are packaged for sending to a server via structured XML, within a
SOAP (Simple Object Access Protocol) “envelope”.
򐂰 The search service is located via the usage of Web Services Description
Language (WSDL) and Universal Description, Discovery and Integration
(UDDI) capabilities.
򐂰 The message is then sent to the Web service over HTTP, within a SOAP
message. The search technology would implement this “Web service” and
would then process the request, returning a response to the client via another
SOAP message.
򐂰 The client would then manipulate this response by parsing the XML, and
using standard XML APIs and tools to present the data to the user.
Ideally, any search technology chosen would support both models of
development (that is, standard HTML/Java and SOAP/XML) for maximum
solution flexibility.
All of these technologies are well known for the development of any e-business
application, and some guidelines for their usage are described within the rest of
this section.
7.8.1 HTML
HTML (HyperText Markup Language) is a document markup language with
support for hyperlinks that is rendered by the browser. It includes tags for simple
form controls. Many e-business applications are assembled strictly using HTML.
This has the advantage that the client-side Web application can be a simple
HTML browser, enabling a less capable client to execute an e-business
application.
The HTML specification defines user interface (UI) elements for text with various
fonts and colors, lists, tables, images, and forms (text fields, buttons,
checkboxes, and radio buttons). These elements are adequate to display the
user interface for most applications. The disadvantage, however, is that these
elements have a generic look and feel, and lack customization. As a result, some
e-business application developers augment HTML with other user-interface
technologies to enhance the visual experience, subject to maintaining access by
the intended user base and compliance with company policy on Web client
technologies.
Because most Web browsers can display HTML V3.2, this is the lowest common
denominator for building the client side of an application. To ensure compatibility,
developers should unit test pages against a validator tool. Free tools, such
as the W3C HTML Validation Service, are available at:
http://validator.w3.org/
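As a simple illustration of these form elements, a minimal HTML 3.2-compatible search form might look like the following; the action URL and field names are hypothetical.

<FORM METHOD="GET" ACTION="/search/SearchServlet">
  <INPUT TYPE="text" NAME="query" SIZE="40">
  <SELECT NAME="source">
    <OPTION VALUE="all" SELECTED>All sources</OPTION>
    <OPTION VALUE="intranet">Intranet</OPTION>
    <OPTION VALUE="documents">Document library</OPTION>
  </SELECT>
  <INPUT TYPE="submit" VALUE="Search">
</FORM>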
7.8.2 Dynamic HTML
DHTML allows a high degree of flexibility in designing and displaying a user
interface. In particular, DHTML includes Cascading Style Sheets (CSS) that
enable different fonts, margins, and line spacing for various parts of the display
to be created. These elements can be accurately positioned using absolute
coordinates.
Another advantage of DHTML is that it increases the level of functionality of an
HTML page through a document object model and event model. The document
object enables scripting languages such as JavaScript to control parts of the
HTML page. For example, text and images can be moved about the window, and
hidden or shown, under the command of a script. Also, scripting can be used to
change the color or image of a link when the mouse is moved over it, or to
validate a text input field of a form without having to send it to the server.
Unfortunately, there are several disadvantages when using DHTML. The
greatest of these is that two different implementations (Netscape and Microsoft)
exist and are found only on the more recent browser versions. A small, basic set
of functionality is common to both, but differences appear in most areas. The
significant difference is that Microsoft allows the content of the HTML page to be
modified by using either JScript or VBScript, while Netscape allows the content
to be manipulated (moved, hidden, shown) using JavaScript only.
Due to varying levels of browser support, cross-browser design strategies must
be used to ensure appropriate presentation and behavior of DHTML elements. In
general, this technology is not recommended unless its features are needed to
meet usability requirements.
Additionally, DHTML is not supported within a “portlet” application.
7.8.3 JavaScript
JavaScript is a cross-platform object-oriented scripting language. It has great
utility in Web applications because of the browser and document objects that the
language supports. Client-side JavaScript provides the capability to interact with
HTML forms. You can use JavaScript to validate user input on the client and help
improve the performance of your Web application by reducing the number of
requests that flow over the network to the server.
ECMA, a European standards body, has published a standard (ECMA-262) that
is based on JavaScript (from Netscape) and JScript (from Microsoft), called
ECMAScript. The ECMAScript standard defines a core set of objects for scripting
in Web browsers. JavaScript and JScript implement a superset of ECMAScript.
To address various client-side requirements, Netscape and Microsoft have
extended their implementations of JavaScript in version 1.2 by adding new
browser objects. Because Netscape's and Microsoft's extensions are different
from each other, any script that uses JavaScript 1.2 extensions must detect the
browser being used, and select the correct statements to run.
One caveat is that users can disable JavaScript on the client browser, but this
can be programmatically detected.
7.8.4 Java applets
Of the user interface (UI) technologies that can be run in a Web browser, the
Java applet offers the most flexibility. Java provides a rich set of UI
elements that include an equivalent for each of the HTML UI elements. In
addition, because Java is a programming language, an infinite set of UI elements
can be built and used. There are many widget libraries available that offer
common UI elements, such as tables, scrolling text, spreadsheets, editors,
graphs, charts, and so on.
You can use either the Java AWT or Swing classes to build a Java applet. But
while designing your applet, you should keep in mind that Swing is supported
only by later browser versions.
A Java applet is a program written in Java that is downloaded from the Web
server and run on the Web browser. The applet to be run is specified in the
HTML page using an APPLET tag:
<APPLET CODEBASE="/mydir" CODE="myapplet.class" width=400 height=100>
<PARAM NAME="myParameter" VALUE="myValue">
</APPLET>
For this example, a Java applet called “myapplet” will run. An effective way to
send data to an applet is by using the PARAM tag. The applet has access to this
parameter data and can easily use it as input to the display logic.
Java can also request a new HTML page from the Web application server. This
provides an equivalent function to the HTML FORM submit function. The
advantage is that an applet can load a new HTML page based upon the obvious
(a button being clicked) or the unique (the editing of a cell in a spreadsheet).
A characteristic of Java applets is that they seldom consist of just one class file.
On the contrary, a large applet may reference hundreds of class files. Making a
request for each of these class files individually can tax any server and also tax
the network capacity. However, packaging all of these class files into one file
reduces the number of requests from hundreds to just one.
This optimization is available in many Web browsers in the form of either a JAR
file or a CAB file. Netscape and HotJava support JAR files simply by adding an
ARCHIVE="myjarfile.jar" variable within the APPLET tag. Internet Explorer uses
CAB files specified as an applet parameter within the APPLET tag. In all cases,
executing an applet contained within a JAR/CAB file exhibits faster load times
than individual class files. While Netscape and Internet Explorer use different
APPLET tags to identify the packaged class files, a single HTML page containing
both tags can be created to support both browsers. Each browser simply ignores
the other's tag.
JavaScript can be used to invoke methods on an applet using the SCRIPT tag in
the applet’s HTML page.
A disadvantage of using Java applets for UI generation is that the required
version of Java must be supported by the Web browser. Thus, when using Java,
the UI part of the application will dictate which browsers can be used for the
client-side application. Note that the leading browsers support variants of the
JDK 1.1 level of Java and have different security models for signed applets.
Using Java plug-ins, you can extend the functionality of your browser to support
a particular version of Java. Java plug-ins are part of the Java Runtime
Environment (JRE) and they are installed when the JRE is installed on the
computer. You can specify certain tags in your Web page, to use a particular
JRE. This will download the particular JRE if it is not found on the local computer.
This can be done in HTML through either of these tags:
򐂰 The conventional APPLET tag
򐂰 The OBJECT tag, instead of the APPLET tag, for Internet Explorer; or the
EMBED tag with the APPLET tag for Netscape.
A second disadvantage of Java applets is that any classes such as widgets and
business logic that are not included as part of the Java support in the browser
must be loaded from the Web server as they are needed. If these additional
classes are large, the initialization of the applet may take from seconds to
minutes, depending upon the speed of the connection to the Internet.
Using HTTP tunneling, an applet can call back on the server without reloading
the HTML page. For users who are behind a restrictive firewall, HTTP tunneling
offers a bidirectional data connection to connect to a system outside the firewall.
Because of the above shortcomings, the use of Java applets is not
recommended in environments where mixed levels and brands of browsers are
present. Small applets may be used in rare cases where HTML UI elements are
insufficient to express the semantics of the client-side Web application user
interface. If it is absolutely necessary to use an applet, care should be taken to
include UI elements that are core Java classes whenever possible.
7.8.5 Java servlets
Servlets are Java-based software components that can respond to HTTP
requests with dynamically generated HTML. Servlets are more efficient than CGI
for Web request processing, since they do not create a new process for each
request.
Servlets run within a Web container as defined by the J2EE Model and therefore
have access to the rich set of Java-based APIs and services. In this model, the
HTTP request is invoked by a client such as a Web browser using the servlet
URL. Parameters associated with the request are passed into the servlet via the
HttpServletRequest, which maintains the data in the form of name/value pairs.
Servlets maintain state across multiple requests by accessing the current
HttpSession object, which is unique per client and remains available throughout
the life of the client session.
Acting as a “controller” component, a servlet delegates the requested tasks to
beans that coordinate the execution of business logic. The results of the tasks
are then forwarded to a “view” component, such as a JSP, to produce formatted
output.
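A hedged sketch of such a controller servlet follows; the SearchBean class, the request parameter, and the JSP path are hypothetical names used only for illustration.

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Controller servlet sketch: delegate the work to a bean, then forward the
// results to a JSP view for formatting.
public class SearchController extends HttpServlet {

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String query = request.getParameter("query");

        // Model: a placeholder bean standing in for the real search service.
        SearchBean searchBean = new SearchBean();
        List results = searchBean.execute(query);

        // View: expose the results to the JSP and forward to it.
        request.setAttribute("results", results);
        RequestDispatcher view = request.getRequestDispatcher("/searchResults.jsp");
        view.forward(request, response);
    }
}

// Placeholder business-logic bean; a real implementation would call the
// search engine's API.
class SearchBean {
    List execute(String query) {
        return Collections.singletonList("Result for: " + query);
    }
}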
One of the attractions of using servlets is that the API is a very accessible one for
a Java programmer to master. The specification of the J2EE 1.3 platform
requires Servlet API 2.3 for support of packaging and installation of Web
applications.
Servlets are a core technology in the Web application programming model. They
are the recommended choice for implementing the logic that handles HTTP
requests received from a Web client.
7.8.6 JavaServer Pages (JSPs)
JSPs were designed to simplify the process of creating Web pages by separating
the Web presentation from Web content. In the page construction logic of a Web
application, the response sent to the client is often a combination of template
data and dynamically generated data. In this situation, it is much easier to work
with JSPs than to do everything with servlets. The JSP acts as the View
component in a standard Model View Controller (MVC) programming model.
The chief advantage JSPs have over standard Java servlets is that they are
closer to the presentation medium. A JavaServer Page is developed as an HTML
page. Once compiled, it runs as a servlet. JSPs can contain all the HTML tags
that Web authors are familiar with. A JSP may contain fragments of Java code
that encapsulate the logic that generates the content for the page. These code
fragments may call out to beans to access reusable components and enterprise
data.
JSP technology uses XML-like tags and scriptlets written in Java programming
language to encapsulate the conditional logic that generates dynamic content for
an HTML page. In the runtime environment, JSPs are compiled into servlets
before being executed on the Web application server. Output is not limited to HTML
but can also include WML, XML, cHTML, and DHTML. The JSP API for J2EE 1.3 is JSP
1.2.
JSPs are the recommended choice for implementing the presentation (the view)
that is sent back to the Web client. For those cases where the code required on
the page is to be a large percentage of the page, and the HTML minimal, writing
a Java servlet will make the Java code much easier to read and maintain.
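Continuing the controller sketch from 7.8.5, a hypothetical searchResults.jsp view that renders the result list placed in request scope might look like the following; the scriptlet style is appropriate to the JSP 1.2 level discussed above.

<%-- searchResults.jsp: renders the result list placed in request scope
     by the controller servlet (names are illustrative). --%>
<%@ page import="java.util.List, java.util.Iterator" %>
<HTML>
<BODY>
<H2>Search results</H2>
<UL>
<%
    List results = (List) request.getAttribute("results");
    if (results != null) {
        for (Iterator it = results.iterator(); it.hasNext();) {
%>
    <LI><%= it.next() %></LI>
<%
        }
    }
%>
</UL>
</BODY>
</HTML>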
7.8.7 JavaBeans
JavaBeans is an architecture developed by Sun Microsystems, Inc., describing
an API and a set of conventions for reusable, Java-based components. Code
written to Sun’s JavaBeans architecture is called JavaBeans or just beans. One
of the design criteria for the JavaBeans API was support for builder tools that can
compose solutions that incorporate beans. Beans may be visual or non-visual.
Beans are recommended for use in conjunction with servlets and JSPs. For
example, the JavaServer Pages specification includes a set of tags for accessing
JavaBeans properties.
7.8.8 XML
XML (Extensible Markup Language) and XSL stylesheets can be used on the
server side to encode content streams and parse them for different clients, thus
enabling you to develop applications for a range of PC browsers and for the
emerging pervasive devices. The content is in XML and an XML parser is used to
transform it to output streams based on XSL stylesheets that use CSS.
This general capability is known as transcoding and is not limited to XML-based
technology. The appropriate design decision here is how much control over the
content transforms you need in your application. You will want to consider when
it is appropriate to use this dynamic content generation and when there are
advantages to having servlets or JSPs specific to certain device types.
XML is also used as a means to specify the content of messages between
servers, whether the two servers are within an enterprise or represent a
business-to-business connection. The critical factor here is the agreement
between parties on the message schema, which is specified as an XML DTD or
Schema. An XML parser is used to extract specific content from the message
stream. Your design will need to consider whether to use an event-based
approach, for which the SAX API is appropriate, or to navigate the tree structure
of the document using the DOM API.
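As a sketch of the transcoding approach described above, the standard JAXP transformation API can apply a device-specific stylesheet to the same XML content; the file names are illustrative only.

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Server-side transcoding sketch: the same XML content is rendered with a
// device-specific XSL stylesheet.
public class ContentTranscoder {

    public static void transform(File content, File stylesheet, File output)
            throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(stylesheet));
        transformer.transform(new StreamSource(content), new StreamResult(output));
    }

    public static void main(String[] args) throws Exception {
        // The same results.xml could equally be paired with results-wml.xsl
        // to target a WAP device.
        transform(new File("results.xml"), new File("results-html.xsl"),
                new File("results.html"));
    }
}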
7.8.9 Web Services
Web Services is the label placed on the latest variation of a service-oriented
architecture (SOA). Basically, when someone now refers to a Web Services
application, they are referring to an application or application component that
makes use of the XML-based standards, SOAP, UDDI, and WSDL:
򐂰 SOAP (Simple Object Access Protocol) is one of the key standards that make
up the foundation for Web services. SOAP is a lightweight XML-based
protocol for exchange of information in a decentralized, distributed
environment.
A SOAP message is composed of several parts. These include the envelope,
header, body and the actual payload. The entire XML message is wrapped
within an envelope (<Soap:Envelope>…</Soap:Envelope>) clause. Headers
are optional, but may contain routing and authentication type of content.
These are found within the (<Soap:Header> </Soap:Header>) clauses. The
body contains the actual payload wrapped in the
(<Soap:Body>…</Soap:Body>) clause. A minimal envelope example is shown after this list.
A SOAP message can select from one of two transmission styles. A message
can be transmitted in a “document-style” or a “remote-procedure-call style”.
򐂰 UDDI (Universal Description, Discovery and Integration) is an XML-based
framework to enable businesses to discover each other, define how they
interact, and share information in a global registry.
Combined with SOAP, the UDDI initiative was created to facilitate discovery
of Web services over the Internet. The Web service developer registers the
service definition and response specifications with the registry.
򐂰 WSDL (Web Services Description Language) is an XML format for the
description of network services as a set of endpoints operating on messages
containing either document-oriented or procedure-oriented information. When
an application “finds” an available Web service from within a UDDI registry, it
then interacts with that service using the information defining the service as
described in the service’s WSDL.
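As a minimal illustration of the envelope structure described above, a document-style SOAP request might look like the following; the search payload element and its namespace are hypothetical.

<Soap:Envelope xmlns:Soap="http://schemas.xmlsoap.org/soap/envelope/">
  <Soap:Header>
    <!-- Optional: routing or authentication information -->
  </Soap:Header>
  <Soap:Body>
    <!-- The payload; the search element and namespace are illustrative -->
    <search xmlns="urn:example:search">
      <query>redbook</query>
      <maxResults>50</maxResults>
    </search>
  </Soap:Body>
</Soap:Envelope>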
There are many existing IBM Redbooks and other sources of documentation that
provide more details about the usage and concepts behind Web services.
However, here are a few important considerations for the usage of Web services:
򐂰 Web Services based architectures provide for interoperability across the two
main development environments in existence today; that is, Microsoft’s .NET
initiative and the Java-based J2EE environment. An application written in
.NET can leverage Web services written in J2EE, and vice versa.
򐂰 Security is currently a concern with Web services environments. However,
there are many standards in development to apply a level of security over
SOAP messages, etc. The support for any of these security standards should
be investigated when a product supporting Web services is purchased.
7.9 Summary
This chapter has provided an overview of the key technological aspects that
should be considered when selecting a search product to implement your search
solution.
Overall, the following key questions should come into play when making such a
search technology decision:
򐂰 What type of query syntax is supported?
򐂰 Is a common data model supported for extended search capabilities?
򐂰 Are capabilities provided for simple index creation, or more advanced
summarization and categorization?
򐂰 Is the security of document and data sources honored throughout the
indexing and searching process?
򐂰 Is the technology built on a componentized architecture that will allow for
deployment flexibility, and ease in addressing any performance issues?
򐂰 Does the product provide a diverse and flexible set of client choices, and/or
APIs for creating custom clients?
By considering the answers to these questions, one should be able to select the
best search product for any given environment.
Chapter 8. Application design
Portal application design presents some unique challenges compared to
traditional application design and development. These challenges apply to any
search solutions built within a portal environment, as they need to adhere to
portal design rules and limitations.
The majority of the challenges are related to the fact that traditional applications
were primarily used by a defined set of internal users, whereas Portal
applications are used by a broad set of internal and external users such as
employees, customers, and partners.
This chapter attempts to define these challenges, and provide some best
practices for dealing with these challenges for any portal/portlet based applications
— including search solutions.
8.1 Introduction
In supporting a multitude of audiences, data from a wide variety of sources must
be captured, managed, aggregated, and targeted to specific groups while also
customizing the display formatting of the information for various client device
types. The following list provides key issues to consider when designing Portal
applications:
򐂰 The user experience, look, and feel of the site need to be constantly
enhanced to leverage emerging technologies, as well as to attract and retain
site users.
򐂰 New features have to be constantly added to the site to meet customer
demands.
򐂰 Such changes and enhancements will have to be delivered at record speed to
avoid losing customers to the competition.
򐂰 Portal applications in essence represent the corporate brand online.
Developers have to work closely with the marketing department to ensure the
digital brand effectively represents the company image. Such intra-group
interactions usually present content management challenges.
򐂰 It is hard to predict the runtime load of Portal applications. Based on the
marketing of the site, the load can increase dramatically over time. If the load
increases, the design must allow such applications to be deployed in various
high volume configurations.
򐂰 Security requirements are significantly higher for Portal applications
compared to traditional applications. In order to execute traditional
applications from the Web, a special set of security-related software may be
needed to access private networks.
򐂰 The emergence of the Personal Digital Assistant (PDA) market and
broadband Internet market will require the same information to be presented
in various user interface formats. PDAs and various other “pervasive”
(mobile) devices will require a lightweight presentation style to accommodate
the low network bandwidth. Broad-band users on the other hand will demand
a highly interactive, rich graphical user interface. The presentation of the
information in the portal must be logically separated from the business logic
and datasources so that it can be changed as required.
򐂰 The domain of e-business application users is typically much more diverse
than that of the user group for traditional applications. Users can be known to
the systems, can remain anonymous, and can come from inside or outside
the enterprise. Web accessible applications must be developed to meet the
varied needs of those different end user types.
򐂰 The diversity in user types and exposure to the outside world significantly
increases security risks to internal systems. The e-business application
security infrastructure and applications must be designed accordingly, and
will likely require dedicated security components.
򐂰 Content for e-business applications can come from many sources: technical
and non-technical, from inside or outside the enterprise. Such diversity in
location, skill levels, and access create significant challenges for content
creation and management.
To meet these challenges, extensibility, maintainability, and scalability are critical
aspects in the design of Web applications. The following sections provide some
suggestions for meeting the diverse challenges of e-business, especially in
regard to using the Portal composite pattern.
8.2 WebSphere Portal Services architecture diagram
IBM’s WebSphere Portal (WP) is essentially a Web application that runs within
the IBM WebSphere Application Server (WAS) environment. This allows
WebSphere Portal to take advantage of the core services in WebSphere
Application Server for connectivity to various data sources and applications
(for example, IBM’s Directory Server for LDAP).
In addition, WebSphere Portal provides a Portlet API that allows the developer to
create compact Java Web applications that “sit on top of” the WebSphere Portal
application and thus have access to all the core services via simple to implement
tag libraries and extended classes. This allows a developer creating end-user
services to avoid having to “re-write” core connectors to WebSphere Application
Server services each time they want to provide functionality.
Figure 8-1 depicts the WebSphere Portal Server 4.1 component architecture: the
portal engine and Portlet API (Java) running on WebSphere Application Server,
with authentication components (WebSphere Security, Tivoli Policy Director
WebSEAL, Netegrity SiteMinder), authorization and enrollment against LDAP
directories (SecureWay, Domino, Netscape, Active Directory), the WebSphere
common user subsystem and credential vault, device-specific aggregators (PC,
WAP, iMode, PDA, voice), content management integration (WPS content
organizer and content integration packs for Interwoven, Vignette, and
Documentum), search portlets (integrated local search based on Juru, Domino
Extended Search and EII, and third parties such as Autonomy and Verity), remote
portlets exposed as Web services via SOAP, site analysis reporting, and the WP
data store (DB2 or Oracle).
Figure 8-1 WebSphere Portal Server 4.1 Component Architecture
For example, WebSphere Portal can leverage the concept of Web Services by
allowing a developer to create their own portlets or use the existing Web services
portlets with tag libraries to enable this communication. They can avoid having to
write their own SOAP based wrappers and use the common wrappers available
to WebSphere Portal via the core services in WebSphere Application Server.
8.2.1 Single-Tier versus Multi-Tier design
There are two perspectives in the application development and systems
architecture realm.
The first perspective describes keeping all functional capability on a single
system and using a single “codebase”. This creates a “single point of failure”
and makes it more difficult to determine the root cause of functional or
infrastructure based system problems. In addition, it means that any functional
modification can have an effect on the rest of the system. There is no
abstraction between the presentation, business logic, and data source tiers.
The second perspective describes the separation of functionality and “functional
concerns”. This is exemplified by an architecture that has the business logic,
data sources, transaction processing/management, presentation layer, and
security mechanisms logically separated but working in concert to provide a
single set of functionality. This identifies several possible points of failure but also
provides for easier problem determination when system difficulties occur.
For example, if the user cannot login and authenticate with the portal, then the
first place to look is the authentication/security mechanism. This, in turn, includes
the “rules” that govern how the portal responds to requests (including user type
definition, group definition and their privileges), and the LDAP directory
(containing user and group profile information and meta-data).
A multi-tier design is preferable because it allows a separation of concerns
between the presentation, business logic, and datasource environments. It uses
the application server, in conjunction with security, directory, and business rules
mechanisms, to be the “integration hub” where data is aggregated. The
presentation layer leverages the aggregated data to provide the end-user display,
and datasources are separated and accessed via a common set of connectors
through the application server’s core services (for example, JDBC, JMS, JCA, or Web
Services).
8.3 Portal solution guidelines
The design of a portal system and architecture adopts many of the general
tenets for e-business application design. In addition, using the Portal composite
pattern as a guide, consider these guidelines when designing a portal
implementation:
򐂰 The application server node is the central mechanism where data is “pulled”
from multiple data sources.
򐂰 In a portal implementation it is sound architectural practice to provide a
central, yet separated (loosely coupled) security mechanism that enhances
maintainability and especially extensibility of the authentication and
authorization mechanisms. New policies can be implemented without the
need to modify other systems in the portal implementation.
򐂰 Use a component based architecture that isolates functionality (for example,
business logic versus data sources versus security versus presentation)
allowing the enhancement of specific areas of functionality with minimal effect
on the whole system. This is often referred to as a “separation of concerns”
and is similar to the MVC based application architecture. It makes sense to
apply these same concepts of MVC based design to the larger environment
where disparate applications and datasources are being “integrated”.
򐂰 The use of a separated directory service allows for the upgrade of user and
organization profile management without affecting the rest of the system.
򐂰 The portal system should be based on a set of functionally separated
components leveraging common connectors for communication.
򐂰 The concept of workflow should be a single mechanism and can be used to
manage both content “assets” that are contributed to the portal system and
inter-application communication. An example of this management of
communication between components is most evident when some type of
queueing mechanism is used (for example, IBM’s WebSphere MQ) to provide
guaranteed message communication between components.
򐂰 Content (including data from applications, databases, external datasources,
and people) should be managed centrally. In some cases, it makes sense to
pre-format the content before providing this content to the presentation or
business logic (application server) mechanisms.
򐂰 Collaboration should leverage common security mechanisms and provide
both asynchronous (for example, content management, discussion forums)
and synchronous interaction (for example, instant messaging and “chat”
facilities). The security of instant messaging is still relatively immature but is
available in products such as Lotus’ Sametime.
򐂰 Leverage the single set of aggregated content and reformat for various device
types. Sometimes, it seems most expedient to just duplicate the content and
reformat for different device types. This places a maintainability burden on the
system. It is more effective to use the same core set of aggregated
information and apply end-user display templates (for example, using XSLT
applied to XML formatted data) to format content. In this way, as new
end-user display formats arise, more templates can be added while leaving
the content storage and management mechanisms and processes relatively
untouched. This saves time and money.
򐂰 Single Sign-On is important to provide seamless access to various
datasources through a single interface. This provides for an “easier to use”
user experience and allows for easier maintainability of user account
information (through leveraging the concept of a central directory and
authentication mechanism).
8.3.1 Model-View-Controller design
In the Model-View-Controller design shown in Figure 8-2, Model represents the
application object that implements the application data and business logic. The
View is responsible for formatting the application results and dynamic page
construction. The Controller is responsible for receiving the client request,
invoking the appropriate business logic, and based on the results, selecting the
appropriate view to be presented to the user.
A number of different types of skills and tools are required to implement various
parts of a Web application. For example, the skills and tools required to design
an HTML page are vastly different from the skills and tools required to design
and develop the business logic part of the application. In order to effectively
leverage these scarce resources and to promote reuse, we recommend
structuring Web applications to follow the Model-View-Controller design pattern:
򐂰 The model represents enterprise data and the business rules that govern
access to and updates of this data. Often the model serves as a software
approximation to a real-world process, so simple real-world modeling
techniques apply when defining the model.
򐂰 A view renders the contents of a model. It accesses enterprise data through
the model and specifies how that data should be presented.
It is the view's responsibility to maintain consistency in its presentation when
the model changes. This can be achieved by using a push model, where the
view registers itself with the model for change notifications, or a pull model,
where the view is responsible for calling the model when it needs to retrieve
the most current data.
򐂰 A controller translates interactions with the view into actions to be performed
by the model. In a stand-alone GUI client, user interactions could be button
clicks or menu selections, whereas in a Web application, they appear as GET
and POST HTTP requests. The actions performed by the model include
activating business processes or changing the state of the model. Based on
the user interactions and the outcome of the model actions, the controller
responds by selecting an appropriate view.
Figure 8-2 summarizes these responsibilities: the model encapsulates application
state, responds to state queries, exposes application functionality, and notifies
views of changes; the view renders the model, requests updates from it, sends
user gestures to the controller, and allows the controller to select the view; the
controller defines application behavior, maps user actions to model updates, and
selects the view for the response.
Figure 8-2 The Model-View-Controller design pattern
MVC and Web Services
An MVC architected application can leverage classes at the action level to invoke
communication via SOAP to access Web Services. In fact, the action class can
be used as the gateway for communication via other methods. Using this
paradigm allows the application and its architecture to remain in its current form,
and it can treat the Web Service as just another data source. In Figure 8-3, an
example of this is diagrammed.
Figure 8-3 An MVC based application communicating with a Web Service via SOAP
Also note that if you couple an MVC architected application with another
business logic and data access layer following something like the
Command-Manager design pattern, the access to Web Services can still be
through the action class, or alternatively through a datasource level Manager
class.
While the Portal composite pattern is at the operational architecture level, the
design of the application at the application architecture level also has impacts on
which connector technologies can be used. In some cases, it makes sense to
use Web Services and in other cases it makes sense to use JMS or JCA.
MVC and portlets
Portlets must be capable of supporting display output to multiple Internet
browsers running on many communications devices. These browsers typically
require different display markup languages. Smaller devices have smaller
display screens and typically have more limited handling of display markup.
Portlets can support multiple browsers and device types by being implemented
with the model-view-controller (MVC) design pattern. This design contains three
entities:
򐂰 The model, the data source to be retrieved for the portlet:
Model data for a portlet is typically retrieved from an external data source and
loaded into Java display beans, or arrives formatted in an XML document.
򐂰 The view or views, the output mechanism used to display the data of the
portlet:
Display views are typically implemented as either JSP's, more typically used
when the data model is implemented in Java beans, or XSLT style sheets
when the incoming data is formatted in an XML document.
򐂰 The controller, which joins the selected view to the data and conducts the
operation of the portlet:
The controller selects the view for display based on the target device or
browser, and then passes the data model to the view. The view extracts the
specific display data, formats the data for the browser and renders its output
to the browser as part of the portal aggregation of portlet outputs.
For portlet development, the MVC pattern has the following characteristics:
򐂰 The portlet is only responsible for calling the right controller, depending on the
markup supported by the client.
򐂰 Connectors are responsible for accessing content sources. Typically, there is
one connector per content source type, for example, one connector for POP3
access and one for file-based cache.
򐂰 Models represent the content as retrieved through the connector. A model is
independent of the presentation.
򐂰 Controllers are responsible for providing the appropriate markup (HTML,
cHTML, or WML) for the content.
In the MVC structure, there is a distinct separation of data from presentation
along with a controller component for managing the interaction between the data
(model) and the presentation or view. The controller knows the environment in
which the application is invoked, gathers information from the data object to be
displayed, and then applies the appropriate view to render the data using the
markup language appropriate for the current device.
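As a minimal sketch of this controller behavior, the following portlet selects a JSP view based on the markup type the client supports. It uses the standard JSR 168 portlet API for brevity; WebSphere Portal V4.x actually uses the earlier IBM portlet API, and the JSP paths shown are hypothetical.

import java.io.IOException;
import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

public class SearchResultsPortlet extends GenericPortlet {
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        // Controller: pick the view according to the markup the requesting device supports.
        String contentType = request.getResponseContentType();
        String view = (contentType.indexOf("wml") >= 0)
                ? "/WEB-INF/jsp/results-wml.jsp"   // small-device view
                : "/WEB-INF/jsp/results-html.jsp"; // desktop browser view

        response.setContentType(contentType);
        // The model (for example, display beans stored as request attributes) is passed to the view.
        getPortletContext().getRequestDispatcher(view).include(request, response);
    }
}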
A portal system is described as all of the business logic, user data, existing data
sources (databases and applications), and supported end-user channels that
contribute to an aggregated view of information targeted to specific user “groups”
or “types”. The fundamental characteristics of a portal are:
򐂰 Information aggregation
򐂰 Targeted and personalized information
򐂰 Managed content
򐂰 Single sign-on
The characteristics and considerations in portal implementation are described in
2.3.7, “Portal characteristics” on page 28.
8.3.2 Content management guidelines
In a portal implementation, content management provides a type of collaboration, along with a mechanism and process for managing how information gets added to and removed from the portal system.
A content management system should provide the following capabilities:
򐂰 Workflow
򐂰 Content creation, approval, deletion, formatting, publishing
򐂰 Access control lists
򐂰 Content “asset level” and “edition level” versioning
The implementation of content management involves these guidelines:
1. Define the types of content:
The content can be defined as “documents” or other binary data, or more
discretely as “pieces” of content that are meant to be combined into a single
view. This identification includes identifying the source, format, and update
schedule of the content.
2. Define the location and/or source of the content:
Knowing the location of existing content and the sources of new content is important.
For example, if the content is a news feed (RSS formatted) from an external
source or if the content is being contributed by users from several
departments, all of this will have to be taken into consideration when defining
the types of users and their access to the content management process and
mechanism.
3. Define the process by which the content will be contributed and eventually
published:
The contribution and publishing of content can have impacts on existing
processes in the organization. Each group will likely have different methods for
collecting and organizing the content they maintain. These processes will
have to be analyzed to determine whether they need to be changed, adopted,
or retired in light of other organizational and process changes that may be
occurring. These decisions can have an impact on what functionality the portal
will provide and how that functionality will leverage other systems. This
analysis should take place before the architecture is completed. In addition,
the publishing of content may be affected by existing security policies in the
organization, and these will have to be taken into account when determining
how content will get from the content management system to the presentation
mechanism (for eventual display to the portal user).
4. Define the versioning scheme for the content:
Versioning provides the ability to revert to previous incarnations of the portal
site. The content and/or display templates can be versioned. Each content
management package handles this differently and with varying success.
5. Define the expiration scheme of the content:
The expiration of content is an important concept because it drives changes
on the production systems. Whatever packages or technologies are used to
enable expiration must take into account how the deletion of content will
affect other systems that may depend on that content. In a properly designed,
loosely coupled architecture, these impacts are mitigated because the content
management process only identifies what changes (or deletions) are
necessary and allows other systems to handle the mechanics of the deletions.
This also implies that, before implementing content management, a complete
understanding of the types of content and of the systems that depend on that
content must be in place.
8.3.3 Single sign-on guidelines
From the user’s perspective, single sign-on (SSO) is the ability to move from one
application or data source to another without being prompted again for a user ID
and password (or certificate). These applications and data sources can be on the
same or different physical servers.
Guidelines for implementing SSO are as follows:
򐂰 Leverage a central user directory
򐂰 Provide a single mechanism for intercepting user requests and passing
security credentials to the various applications and data sources (a minimal
sketch of such an interception filter follows this list)
򐂰 Provide a session or authentication timeout for a user’s logged in session
򐂰 Provide a variation of the SSO interface for various client device types
򐂰 Identify the various user types and link them with the business rules engine
򐂰 Enable encryption of the “data packets” while authentication is being performed
򐂰 Update security policies to enhance the physical security of the SSO
mechanism that intercepts authentication requests for existing applications
and data sources
򐂰 Agree to a common format for usernames and passwords
򐂰 Agree to a process for updating SSO account information for usernames and
passwords (for example, passwords cannot be retrieved if lost, they must be
reset)
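The interception guideline above can be pictured as a servlet filter that sits in front of every application. The sketch below assumes a hypothetical SSO token cookie named SsoToken and a central login URL; it is not the mechanism of any specific SSO product.

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SingleSignOnFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        if (findSsoToken(request) == null) {
            // No credential: send the user to the central authentication point.
            response.sendRedirect("/sso/login");
            return;
        }
        // Credential present: pass the request (and its security credentials) on to the application.
        chain.doFilter(req, res);
    }

    private Cookie findSsoToken(HttpServletRequest request) {
        Cookie[] cookies = request.getCookies();
        for (int i = 0; cookies != null && i < cookies.length; i++) {
            if ("SsoToken".equals(cookies[i].getName())) {
                return cookies[i];
            }
        }
        return null;
    }
}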
8.3.4 Collaboration guidelines
Collaboration is a vital part of a portal implementation. It helps organizations
ease their transition through process change brought on by the data
consolidation efforts resulting from a portal implementation. In addition, it
provides a mechanism for organizations to communicate with their target
audiences so that they can understand the appropriateness and value of the
information provided to these audiences. When implementing collaboration, it is
important to understand these concepts and address them when designing,
implementing, and deploying a portal system:
򐂰 Standards based approach:
The realm of standards for collaboration (and “collaboration services”) is very
young and still evolving. It is worthwhile to develop applications based on
specifications such as JMS, which keep the application vendor independent
and hence portable (a minimal JMS sketch follows this list). However, in
certain areas like instant messaging (JAIN, http://jcp.org/jsr/detail/165.jsp),
the specifications are still in the community process and one may need to
adopt a vendor-specific API. In such circumstances, the use of design patterns
becomes paramount so that the collaborative modules are loosely coupled
from the core application modules. Thus, the use of a loosely coupled
architecture for your portal solution allows the use of “packaged” solutions
that provide both synchronous and asynchronous interaction, and prepares the
solution for incorporating a standards based set of technologies in the future.
򐂰 User Management:
The major concern here is the coupling between the user directory (and/or
discovery) service and the collaborative service.
In most cases, the collaboration service (CS) provider can be configured to
use either its own directory service or another, central one. Using the central
directory is ideal, since it provides a single point of control and extensibility.
However, in some cases the CS provider can only use its own directory
service. In such a scenario, the directory services might require frequent
replication or synchronization, which implies that the two directory services
are either based on the same standards or that a synchronization tool takes
care of the conversion. This is not a best-practice setup. Ideally, the
collaboration service should leverage the central directory where users and
groups are described.
򐂰 Security:
Collaboration essentially involves peer communication and hence each client
is more powerful than in a server-based pattern. It becomes essential from
the systems and application perspective to provide a domain of activity for a
peer.
Most CS providers allow peer management through access control lists that
can be managed through a central directory/discovery server. The CS server
uses these ACLs to allow services to peers.
The authentication and authorization procedures are API-specific and also
depend on application requirements. However, most CS providers, such as
Lotus Sametime, support standards such as SSL and SOCKS proxies through
their APIs. These can be utilized, in conjunction with the user profile/ACL
based security, to provide both user authentication and user authorization.
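As a minimal example of the standards-based approach mentioned above, the following sketch publishes a notification over JMS using the publish/subscribe API; the JNDI names (jms/CollabConnectionFactory, jms/TeamNewsTopic) are assumptions that depend on how the application server is configured.

import javax.jms.JMSException;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import javax.jms.TopicConnection;
import javax.jms.TopicConnectionFactory;
import javax.jms.TopicPublisher;
import javax.jms.TopicSession;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class TeamNewsPublisher {
    public void publish(String text) throws NamingException, JMSException {
        InitialContext jndi = new InitialContext();
        TopicConnectionFactory factory =
                (TopicConnectionFactory) jndi.lookup("jms/CollabConnectionFactory");
        Topic topic = (Topic) jndi.lookup("jms/TeamNewsTopic");

        TopicConnection connection = factory.createTopicConnection();
        try {
            TopicSession session = connection.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
            TopicPublisher publisher = session.createPublisher(topic);
            TextMessage message = session.createTextMessage(text);
            publisher.publish(message); // asynchronous notification to any subscribed portlet or client
        } finally {
            connection.close();
        }
    }
}

Because only the JMS interfaces are used, the underlying messaging provider can be replaced without changing this collaborative module.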
8.3.5 Web services guidelines
Web services constitute a distributed computing architecture made up of different
computers communicating over a network to form one system. At the current
time there are two competing application paradigms being put forward: one by
Sun Microsystems and the other by Microsoft. The Sun model is called the Sun
Open Net Environment (Sun ONE), an open framework that supports “smart”
Web services and in which the J2EE platform plays a fundamental role.
The Microsoft application paradigm is called .NET. Using Web services, either
.NET or Sun ONE services can be accessed by a Web services requester.
However, there are still issues and problems when communicating between
J2EE-based and .NET-based applications. Thus, the first best practice that we
would suggest is to avoid mixing these application paradigms if at all possible.
Having said that, keep in mind that many .NET services are being successfully
accessed by J2EE-based applications, and vice versa.
In this section, we focus on best practices for Web services development and
deployment within a J2EE environment, that is, Web services that are built using
servlets, JSP pages, EJB architecture, and all the other standards that are part of
the J2EE technology.
Apply distributed computing principles
Think of Web services as another technology for developing distributed systems.
All of the best-practice principles used in developing distributed systems apply to
Web services. All of the considerations that would go into any enterprise systems
design apply to Web services, such as high availability, high throughput,
clustering, hardware management, and network topology.
The main difference between most distributed systems and Web services is that
Web services are newer. Most Web services software is less than a year old. So
as a rule, there is not the same level of reliability, security, or performance that
you would find with other distributed systems software that has been around
longer. Another factor is that Web services are built on a set of technologies
(SOAP, XML, WSDL, UDDI) that are still evolving and are being evolved by
separate standards organizations and vendors in parallel. It will be some time
before all these standards will be able to converge (especially given the Sun
versus Microsoft debate). Because of the lack of a solid set of standards,
implementation details are left for individuals. Still, some common principles can
be adopted as best practices at this time:
򐂰 Design systems that are layered:
This is the same principle that you would apply to any distributed, component
architecture. It is especially important in Web services applications where we
do not have control over some components (services) that we access in our
application.
򐂰 Design coarse-grained Web services (see the sketch following this list):
Web services have all the same issues as those of distributed systems when
it comes to requesting a remote service. Requesting a service from a
machine over the network is more expensive than a local operation. With this
in mind, keep the request as coarse grained as possible when requesting a
Web service from a remote machine.
Existing Java beans or EJBs with fine-grained methods or operations should
be aggregated into a single coarse-grained Web service wherever possible.
This technique avoids unnecessary network traffic and overhead on the
communication stack. It also makes it possible to push the transaction
integrity requirements to the Web services provider, making for a cleaner
design. In other words, if a coarse-grained request does not successfully
complete, the Web service provider can roll back that entire transaction.
򐂰 Design for “loosely coupled” components:
A Web service by definition is an interface to a loosely coupled component on
a remote system. Therefore, it is very important to be cognizant of the impact
of integrating loosely coupled components. With this in mind, define clear
contracts between layers and services, but utilize the “Parameter List”
paradigm where possible.
򐂰 Limit dependency on other components:
Managing dependencies is one of the key challenges in utilizing Web
services in an intranet or extranet scenario. Common dependencies that
occur in an application design are:
– Call flow dependency:
Business processes implemented by systems are not typically within the
domain of one business component.
– Object association dependency:
Using object-oriented techniques, it is easy to model a business problem
by associating objects together. However, from an implementation
perspective, doing so increases the linkage from one component to
another. Use interfaces where possible.
򐂰 Implement all cross "domain" business processes in a "control" or "workflow"
layer.
The flexibility of an application is increased if all business processes that
cross multiple business domains are implemented in a workflow layer. In
doing so, the application architecture has more flexibility in what is called,
when it is called, managing the call (such as exception handling), and
performing any translation on the data that is passed in or out.
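The coarse-grained guideline above can be illustrated with the following sketch of two service endpoint interfaces in the JAX-RPC style (java.rmi.Remote); the interface and type names are hypothetical and not part of any product discussed here.

import java.rmi.Remote;
import java.rmi.RemoteException;

// Too fine grained for a Web service: three network round trips to place one order.
interface FineGrainedOrderService extends Remote {
    String createOrder(String customerId) throws RemoteException;
    void addLineItem(String orderId, String sku, int quantity) throws RemoteException;
    void submitOrder(String orderId) throws RemoteException;
}

// Coarse grained: one request carries the whole unit of work, so the provider can treat it as a
// single transaction and roll it back completely if it does not succeed.
interface CoarseGrainedOrderService extends Remote {
    OrderConfirmation placeOrder(OrderRequest completeOrder) throws RemoteException;
}

// Simple serializable value objects passed across the service boundary ("Parameter List" style).
class OrderRequest implements java.io.Serializable {
    public String customerId;
    public String[] skus;
    public int[] quantities;
}

class OrderConfirmation implements java.io.Serializable {
    public String orderId;
    public java.util.Date expectedShipDate;
}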
For more detailed guidelines on Web Services, refer to Patterns: Self-Service:
Connecting to the Enterprise, SG24-6572.
Web services and Microsoft’s .NET
Using IBM WebSphere Studio Application Developer, it is possible to create and
run a Web service and to invoke the Web service from client applications created
using the Microsoft .NET Framework SDK. Using the Web services wizard within
Application Developer, you can generate WSDL proxies for consumption by
these Microsoft .NET clients in both the C# and JScript programming languages,
which are both currently supported by the Microsoft .NET Framework. It is even
possible to combine proxies created in both languages into single executables.
For details on how to create these integrated applications, see the WebSphere
Developer Domain article Developing Microsoft .NET Web Service Clients for
EJB Web Services with IBM WebSphere Studio Application Developer and the
Microsoft .NET Framework SDK:
http://www7b.software.ibm.com/wsdd/techjournal/0204_wosnick/wosnick.html?open&
l=851,t=gr
Note: C#, pronounced "C sharp," is a new Microsoft programming language
similar to Java and C/C++.
8.4 Summary
It is important that the portal design be based on a loosely coupled architecture.
This provides for the separation of components so that they can be replaced,
enhanced, or removed with little effect on other systems. In addition, the use of
common connectors is vital to linking the components together into a single
virtual system. Connection technologies such as JMS, JCA, and Web services
are supported by many vendors, allowing a portal implementation that uses
these technologies to bring together best-of-breed components.
8.5 Where to find more information
In this section we list some information sources for reference:
For information on the IBM Patterns for e-business:
http://www.ibm.com/developerworks/patterns
Web Services Security (WS-Security):
http://www-106.ibm.com/developerworks/webservices/library/ws-secure/
Web services architecture using MVC style:
http://www-106.ibm.com/developerworks/webservices/library/
Developing Web Services:
http://dcb.sun.com/practices/howtos/developing_webserv.jsp
Model-View-Controller design pattern — Sun Microsystems:
http://java.sun.com/blueprints/patterns/MVC-detailed.html
Part 4. Technical scenario
Chapter 9. “Chrisco Books” scenario
To illustrate the Portal Search custom design, this chapter describes a sample
intranet portal scenario for a fictitious technical book publisher called “Chrisco
Books”. The example described is a demonstration of the need for a single
portal-based search capability across Web sites (HTML), file systems, content
management systems, and Lotus Domino databases.
Although this example scenario was created for a publishing company, the same
solution should meet the needs of most companies where unstructured textual
information is contained in various disparate repositories, and where multiple
search technologies have already been deployed in point solutions.
9.1 Chrisco Books scenario: story line
Chrisco Books writes and publishes technical journals and books for the IT
community. The company consists of subject matter experts, technical writers
and editors, and customer service representatives.
The company has recently deployed an employee portal providing employees
the ability to collaborate in team workplaces, and access human resource
information and industry related news feeds. The company would like to extend
the portal implementation in order to provide employees the ability to locate all
the details about any given book project — be it a completed project, legacy
“paper” based project, an in-progress project, or even related materials and
knowledge available on the internet.
In their current environment, the company maintains various data repositories,
including:
򐂰 File systems, where content creators and editors work on journals/books in
progress
򐂰 An intranet site, for published drafts and final work products
򐂰 A Lotus Domino based project tracking application, where all information
pertaining to the status of a particular project is maintained
򐂰 A content management system, containing legacy books and journals that
have been converted to an image format for electronic storage, as well as
other image files for new book projects
In some of the cases, the data repositories have existing “point” search solutions
that were implemented over the years to allow for basic searching. However, a
user must locate the specific search capability, then query each of these multiple
search engines to find the needed information, and then cross-reference the
results to determine what has been found.
Overall, the company has determined that the second phase of their portal rollout
should provide the following business benefits:
1. Help streamline current business activities, by improving organizational
efficiency and reducing the latency of business events. When this desire is
applied to search capabilities, one can see that these efficiency
improvements will ultimately come from:
a. Distilling meaningful information from a vast amount of structured and
unstructured data
b. Providing easier access to vast amounts of unstructured data through
indexing, categorization, and other advanced forms of summarization.
2. Chrisco Books also has a desire to reduce their IT/technology costs by:
a. Reducing spending on maintenance and training of legacy system
interfaces.
b. Reducing deployment and implementation costs for any new systems.
9.2 Chrisco Books scenario: requirements
In any technology effort, one of the most important first steps is to be sure you
understand what it is you are trying to build or implement — and its relationship
to the real business problem at hand. This section describes the requirements for
our Chrisco Books scenario.
9.2.1 Functional requirements
In any fictitious scenario, such as our book publisher’s search needs, we have full
rein over the functionality being provided. Rather than bore you with elaborate
Unified Modeling Language (UML) use cases that go into an exhaustive listing of
pre-conditions, post-conditions, and alternate path scenarios, we are taking the
abbreviated approach of providing very brief one or two sentence descriptions of
the functionality available to the various users in our scenario.
For this scenario, we define several key user roles:
򐂰 Authors and Editors: Are ultimately responsible for the writing and editing of
complete technical documents, based on input from Subject Matter Experts.
This includes validating the source and accuracy of information provided by
such Subject Matter Experts.
򐂰 Subject Matter Experts: Are responsible for researching and drafting key
portions of the technical content, as contribution to the efforts of the authors
and editors. SMEs should not have access to project management
information, as this includes information regarding payment of SMEs, etc.
򐂰 Customer Service Representatives: Are responsible for handling questions
originating from customer calls or instant messages, by pointing them to the
available books, and discussing upcoming book availability. They have other
billing and service responsibilities that are outside the scope of this effort.
򐂰 Administrators: Are responsible for maintaining the portal, search engines,
and back end applications.
򐂰 Customers: Are interested in buying and reading the books produced, as well
as understanding what upcoming books are planned.
򐂰 All users should be able to save their favorite search queries, so that they
may more easily access them in the future.
The user scenarios for the portal search users are shown in Figure 9-1.
Note: These scenarios only cover those related to searching from within the
portal framework, and do not include user scenarios related to the overall
portal solution.
Figure 9-1 User scenarios for new portal search functionality (use case diagram for the PortalSearch system; actors: Customer, Author/Editor, Customer Service Rep, SME, and Administrator; use cases: search published books, search project mgmt/status data, search work-in-progress book files, search the internet and other Web knowledge, utilize saved searches and personalization, and administer the system)
9.2.2 Non-functional requirements
In addition to the business requirements, it is important to clearly articulate the
non-functional requirements associated with this business need as well. This
includes information such as:
򐂰 Target repositories: What data is being searched? The physical location and
data formats, supported queries, platform.
򐂰 Information retrieved: What information should be returned? Documents,
people (that is, expertise).
򐂰 Metrics: Number of concurrent searches, response times.
As with user scenario generation, the creation of full non-functional requirements
is beyond the scope of this book, and is not truly needed for such a fictitious
scenario. However, in this section we will briefly describe some non-functional
aspects.
Target repositories
As described earlier, Chrisco Books makes use of various repositories.
These repositories are located within the US and are based on various content
platforms. However, the user population is spread out across the globe. Thus, a
Web based solution to accessing these repositories is required from a
non-functional standpoint — such a thin client approach would be the only option
for providing performance on such a distributed basis.
File system
This repository contains Adobe Framemaker, Microsoft Word, and IBM Lotus
SmartSuite® files used by writers and editors. The file system used in our
scenario is a shared Windows 2000 Server drive, but could be any generally
accepted file services alternative, such as Novell NetWare or even Samba on
Linux.
“Legacy” search capabilities are provided for this repository in the existing
environment via standard Windows operating system file search capabilities.
Intranet site
This repository contains published books and other documents in Adobe PDF
and/or HTML file formats. Books are presented in a user interface that includes
summary information about the file, including an abstract, table of contents, and
list of authors.
“Legacy” search capabilities are provided for this repository via an existing Web
search engine. Users are presented with a browser based search interface,
which allows searches via simple boolean logic.
Project tracking application
This repository includes project management documents and information
maintained in an IBM Lotus Domino application. The information included
consists of client satisfaction surveys, time lines for the book effort, details on
book contents, as well as a list of people involved in the book’s creation (subject
matter experts, authors, editors, researchers, and so on).
“Legacy” search capabilities are provided for this repository via the built-in
indexing and searching in the IBM Lotus Notes/Domino technology.
Content management
This repository contains older books and journals in various image formats that
would otherwise be unavailable electronically. Additionally, other graphics/image
files that are needed for current projects are also included, all stored within IBM
Content Manager. Metadata has been associated with the files to facilitate the
search and retrieval of information.
Search capabilities are provided via the IBM Content Manager Windows client.
Figure 9-2 depicts these multiple data repositories and search solutions as they
exist today.
Figure 9-2 Existing data repositories, with current search capabilities (four repositories, each with its own point search solution: the WIP file system (Win2k) with an OS file services index and built-in OS file search; project tracking (Notes) with Notes indexing and Notes client search; published books (HTML, PDF) with a legacy Web crawler, HTML index, and custom HTML search interface; and content management (IBM CM) with Content Manager indexing and the Content Manager client)
9.2.3 Summary of requirements
After considering both the business and non-functional requirements, the desires
of Chrisco Books can be summarized as follows:
This “phase 2” portal solution should provide users the ability to search the
various repositories from a single Web interface, which is tightly integrated into
the new portal “workplace” to which their employees are growing accustomed. Results
from the multiple repositories should be combined and normalized into one user
results list. Additionally, the appropriate content should be delivered in this
results list based on the employee’s role and access.
9.3 Patterns mapping
Now that the business requirements, and key technology drivers, have been
clearly identified, it is time to investigate the best practices available within the
industry for such a solution. This is done through the application of, or “mapping”
to, the Patterns for e-business.
9.3.1 Examining the business requirements
As we begin to examine the various “search” oriented application patterns, it is
easy to see that the existing environment described earlier in this chapter
basically maps to the Application Integration::Population: Index Population
and Information Aggregation::User Search and Discovery application patterns
we discussed in Chapter 4.
This mapping of the patterns onto the current environment is shown in
Figure 9-3.
Figure 9-3 Existing environment, mapped to application patterns (each of the four existing repository/search pairs from Figure 9-2 maps to its own Index Population and User Search and Discovery application pattern)
However, this existing environment consists of multiple search interfaces and
their supporting applications, in a manner that does not provide a single business
solution — and it is not working for Chrisco Books today. Therefore, we must next
consider the key business requirements specified in our scenario that are
needed to improve the situation:
򐂰 Improve organizational efficiency.
򐂰 Reduce the latency of business events.
򐂰 Distill meaningful information from a vast amount of structured and
unstructured data.
򐂰 Provide easier access to vast amounts of unstructured data through indexing,
categorization, and other advanced forms of summarization.
While the existing environment may be helping to distill meaningful information, it
is clearly not improving organizational efficiency or reducing the latency of
business events, because the existing environment only provides benefits at the
workgroup/individual task level. Chrisco Books needs these benefits to be
provided at an enterprise level, so that true organizational efficiency can be
achieved.
9.3.2 Solution options
There can be multiple patterns based solutions to any business problem.
Therefore, it is important to analyze each possible option to determine the best fit
for Chrisco Books.
Option 1: Single Index
Based on the analysis so far, one alternative for a solution to Chrisco Books’
needs would be to implement the already identified Information Aggregation and
Application Integration patterns (Index Population and User Search and
Discovery) that meet the business requirements, on a single enterprise-wide
basis, as depicted in Figure 9-4.
Figure 9-4 Solution 1: Single Index (a single crawler/indexer — Index Population — builds one search index across the WIP file system, project tracking, published books, and content management repositories; the user searches it through a single search portlet — User Search and Discovery)
Note: The Application patterns used here are discussed in detail within
Section 4.2, “Application Integration patterns” on page 45 and Section 4.3,
“Information Aggregation patterns” on page 57.
While such a single index solution would meet the business requirements for
Chrisco Books, it might also have a large technology/implementation cost
associated with it. For example, the data storage requirements for the creation of
a single index encompassing all of the data sources might be substantial.
Additionally, it may be difficult to find an “off-the-shelf” product with the ability to
talk to all of the platforms Chrisco utilizes for data collecting/indexing — and thus
substantial development efforts could be required to build such a single index via
custom code.
The question is, are these IT impacts acceptable to Chrisco Books?
Option 2: Search brokering/federated search
By analyzing the first solution option, we have found that looking at the business
requirements alone is not enough, and we must also revisit the IT drivers that
Chrisco Books identified:
򐂰 Reduce spending on maintenance and training of legacy system interfaces.
򐂰 Reduce deployment and implementation costs for any new systems.
When combining these IT drivers with the business requirements, it is clear that
the Search Adapter/Search Service variation of the Information
Aggregation::User Search and Discovery application pattern maps to these IT
drivers.
A solution based on the inclusion of this variation of the User Search and
Discovery pattern is shown in Figure 9-5.
Figure 9-5 Solution 2: Extended search solution (the existing Index Population applications and their indexes remain in place; a search broker/federator — the Search Adapter/Search Service variation of User Search and Discovery — reaches them through connectors to the OS file search API, the Notes client API, the legacy HTML index, and the Content Manager client API, and presents the combined results through a search portlet)
Note: The additional Application pattern variation used in this solution is
discussed in more detail within “User Search and Discovery application
pattern” on page 61.
This solution meets the same business drivers as the “Single Index” solution, as
it includes the same Information Aggregation and Population patterns. The
addition of the extended search federation/brokering technology, via the Search
Adapter/Search Service variation of the User Search and Discovery application
pattern, then meets the cost reduction IT drivers specified by Chrisco Books.
In this solution, the User Search and Discovery based application is the only real
new capability that must be developed and deployed. The existing population
based applications are left intact, with search connectors interfacing with these
existing capabilities. Of the two solution options presented, this “extended
search” option would probably have the lowest deployment and overall IT costs
— as it provides for a large amount of reuse.
Additionally, the maintenance costs associated with the solution should also be
minimized, as any of the data repositories can be pulled out and replaced without
requiring modification to the entire solution. In the worst case, only one of the
“connectors” would need to be modified or replaced.
Based on these IT cost savings, this second option is clearly the correct choice
for Chrisco Books’ needs.
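To show the structure of this option in code, the following conceptual sketch illustrates the broker/connector relationship only; it is not the API of Lotus Extended Search or of any other product, and the interface names are hypothetical.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// One connector per repository type (file system, Notes, Web crawler index, Content Manager).
interface SearchConnector {
    List search(String query); // returns repository-specific hits mapped to a common result type
}

// The broker fans the query out to every registered connector and merges the results, so a
// repository can be replaced by swapping only its connector.
class SearchBroker {
    private final List connectors = new ArrayList();

    void register(SearchConnector connector) {
        connectors.add(connector);
    }

    List federatedSearch(String query) {
        List merged = new ArrayList();
        for (Iterator it = connectors.iterator(); it.hasNext();) {
            merged.addAll(((SearchConnector) it.next()).search(query));
        }
        return merged; // a real broker would also normalize, de-duplicate, and rank these hits
    }
}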
9.3.3 Integrating the solution
For most of this chapter we have focused more directly on the search needs and
requirements of Chrisco Books. However, it is important to highlight the
integration of this solution into the overall portal environment, and the Portal
composite pattern/Portal Search custom design.
Since the solution chosen delivers these search capabilities via a “portlet”, full
integration into the context of the portal is guaranteed. Thus, these Information
Aggregation and Application Integration capabilities will be available right
alongside the other Self-service, Collaboration, and Access Integration
capabilities of the full portal solution.
9.4 Expanding the scenario
So far in this chapter we have described a business problem, Chrisco Books’
search needs, which is probably a bit more simplistic than a real-world problem.
However, this scenario can easily be built upon with additional functionality, by
adding further patterns and concepts as needed.
Here are a few examples of how this scenario can be expanded:
򐂰 The content of the portal itself could be included as one of the repositories
that is searched. Adding further sources is fully supported by the proposed
solution: one of the existing Index Population applications could crawl and
index the additional data source, or another connector would simply be
needed to allow the User Information Access based application to “extend”
its search to the additional source.
Alternatively, additional sources could be integrated by introducing the
Application Integration::Federation application pattern as discussed in 4.2.4,
“Federation application pattern” on page 55. In this case, the additional data
sources would be unified behind a federation tier, which would then present the
“image” of a single data source to the User Information Access application.
򐂰 More advanced search capabilities, such as person and expertise
identification (that is, tacit knowledge), taxonomy, and categorization, could
be added to this scenario. In this case the same application patterns would
apply, as the “Search, Discovery, and Indexing” tier of the Application
Integration::Population: Index Population application pattern supports these
advanced capabilities.
Such advanced search capabilities could also be applied in a more “dynamic”
fashion, by allowing them to be performed in a second step. That is, users
would perform a search and then optionally choose to categorize the search
results into a taxonomy for easier analysis.
򐂰 Other enterprise systems, such as the CRM systems utilized by the Customer
Service Representatives, could leverage the same search capabilities and
interface. Customer Service representatives would then have one location to
both research, and look up details about a given customer’s interaction with
Chrisco Books.
򐂰 The search capabilities could also be expanded to include a more “context
aware” search, that would better leverage the portal environment in which
these search capabilities are deployed. For example, capabilities could be
included such that any text or phrase could be searched on via a simple right
mouse click. The sources searched would be determined by the current
location or portlet being used within the portal. This would be an example of
adding the Application Integration::Direct Connection pattern to the solution.
This pattern is described in more detail on the patterns Web site:
http://www-106.ibm.com/developerworks/patterns/application/at1-runtime.html
9.5 Summary
In this chapter, we introduced our Chrisco Books scenario by describing the
overall business context, including the business and non-functional
requirements. We then examined these requirements, to identify the Application
patterns that clearly address the problem at hand. Finally, we analyzed the
various options for implementing the identified patterns, and discussed some
ways in which this scenario could be expanded further.
In the next chapter, we will detail how we actually implemented the various
Application patterns identified in our solution, by leveraging IBM technologies.
Chapter 10. Technical implementation of the scenario
In this chapter we provide some details on the technical implementation of our
Chrisco Books scenario. Screen-by-screen installation instructions are not
provided for all products, but rather, we attempt to provide some understanding
of the key architectural decisions and product mappings made to implement the
scenario within the Redbooks testlab.
10.1 The runtime environment
Prior to actually implementing anything, we must take the proposed solution
formulated in the previous chapter, and map it to the runtime and product levels.
Figure 10-1 provides a quick review of the solution we are implementing when
viewed from an Application patterns level.
Figure 10-1 The Chrisco Books solution that we have chosen to implement (the extended search solution from Figure 9-5: existing Index Population applications and indexes, a search broker/federator with connectors to each repository’s search API, and a search portlet for the user)
When we map this to the Runtime patterns defined in Chapter 5, “Runtime
patterns” on page 67, and then remove the firewall and other infrastructure
aspects that are not needed in our lab environment, we are left with the runtime
“picture” of our scenario infrastructure shown in Figure 10-2.
Figure 10-2 Chrisco Books scenario — runtime environment (runtime nodes: Presentation Server, Personalization Server, Application Server, Database Server, Search, Federation, Population, Collaboration, Content Management, and Directory and Security Services)
Using this model runtime environment as a guideline, we then choose the
appropriate products for each of these key runtime nodes. However, we know
that Chrisco Books already has some existing technologies in place in our
scenario, specifically:
򐂰 A WebSphere Portal implementation
򐂰 An IBM Content Manager based content management system
򐂰 A Lotus Notes/Domino based collaboration system that also provides some
project tracking and content management capabilities.
These pre-existing technologies make many of our product mapping decisions
for us, leaving only the “search” node as the main product choice required. We
choose Lotus Extended Search (LES) as the product to provide the search node
capabilities, as the scenario solution requires a product that provides the
“extended search” variation of the Information Aggregation::User Information
Access application pattern. To allow for LES to access the IBM Content Manager
product, the IBM Enterprise Information Portal (now known as IBM Information
Integrator for Content) is also required.
Based on these product selections, the products in our scenario environment
would map to the runtime nodes as follows:
򐂰 WebSphere Portal Experience V4.12 (including WebSphere Application
Server v4) = Personalization Server, Presentation Server, Application Server
򐂰 IBM DB2 v7.1 = Database Server
򐂰 IBM Content Manager v7.1 = Content Management, Population
򐂰 Lotus Domino v5.10 = Collaboration, Content Management, Population
򐂰 Lotus Extended Search v3.7 = Search
򐂰 IBM Enterprise Information Portal v7 = Federation
Finally, we choose to utilize the built in LDAP capabilities of Lotus Domino to
provide Directory and Security Services — resulting in a deployment of four
physical nodes (servers) in our environment, as depicted in Figure 10-3.
Figure 10-3 Chrisco Books scenario — physical nodes (four physical servers — Portal Server, Extended Search Server, Domino Server, and Content Manager Server — hosting the runtime nodes: Presentation Server, Personalization Server, Application Server, Database Server, Search, Federation, Population, Collaboration, Content Management, and Directory and Security Services)
All products were installed on IBM series servers, running Windows 2000 Server
(service pack 3) as the base operating system.
10.2 The Lotus Domino server
As the scenario has Chrisco Books with an existing Domino environment, this
server was installed first. We started with a generically installed Lotus Domino
5.10 server, and then made the following configuration changes to support the
WebSphere Portal and other features required for this scenario:
򐂰 The LDAP task was loaded and set up to act as the central authentication
directory for the scenario. This involved setting up the wpsadmin/wpsbind
users, and wpsadmins group, with the appropriate access rights as required
for a WebSphere Portal 4.1 installation.
򐂰 A “redbooks.nsf” database was created, populated with PDF files of
redbooks, and added to the server. This database included a browser
accessible search interface, to simulate the published books HTML site
required for this scenario.
򐂰 A “projects.nsf” project tracking Notes database was created and added to
the server to allow for the searching of Notes based data as required by the
scenario.
Note: Both of these databases were based on IBM ITSO organization
applications, and are thus unavailable for download. However, these
databases represented basic Lotus Notes/Domino applications. Any standard
Notes client based, workflow enabled, database can be utilized in place of the
projects.nsf we used in this scenario, and any standard browser client
accessed database with search capabilities enabled can be utilized in place of
the redbooks.nsf used in this scenario.
10.3 The IBM Content Manager server
The next server installed was the IBM Content Manager server, as this was
another of the technologies that Chrisco Books was supposed to have had in
their environment prior to our scenario.
Software for the CM server was installed in the following order — using the
default installation values:
򐂰 IBM DB2 7.1 EE
򐂰 IBM WebSphere Application Server 3.02
򐂰 Microsoft Visual C++ (Visual C++ is required to compile the database access
libraries; the installation will fail if it is not installed.)
򐂰 Content Manager v7.1
򐂰 IBM Enterprise Information Portal v7.1
Once the InstallShield installation processes were finished, the Content Manager
database was loaded with sample PDF files for simulating the older books and
images used by Chrisco books in our scenario. The PDF files utilized for this
were simply IBM Redbook PDF files.
This content was defined and loaded into the Content Manager database, and
then set up for searching through the Enterprise Information Portal, via the steps
given in the following sections.
Define the metadata
Here are the steps to follow:
1. Load the Content Manager Administration Client (login as frnadmin/password
in our scenario).
2. Expand the LIBSRVRN tree, then expand Fileroom.
3. Create the following key fields to define the content metadata (Figure 10-4):
a. BOOK_ABSTRACT (VARCHAR)
b. BOOK_AUTHOR1_FIRST (VARCHAR)
c. BOOK_AUTHOR1_LAST (VARCHAR)
d. BOOK_AUTHOR2_FIRST (VARCHAR)
e. BOOK_AUTHOR2_LAST (VARCHAR)
f. BOOK_AUTHOR2_FIRST (VARCHAR)
g. BOOK_AUTHOR2_LAST (VARCHAR)
h. BOOK_ISBN (VARCHAR)
i. BOOK_PAGES (INT)
j. BOOK_KEYWORDS (VARCHAR)
k. BOOK_PUBLISH_DATE (DATE)
l. BOOK_PUBLISHER (VARCHAR)
m. BOOK_TITLE (VARCHAR)
Figure 10-4 Create Key Fields
Create the search index and search template
Here are the steps:
4. Create a new index class called BOOK to enable indexing of this content.
5. Assign the key fields created in step 3 to the BOOK index class (Figure 10-5).
Figure 10-5 Assign key fields
6. Load the Information Integrator for Content Administration Client; log in as
cmbadmin/password.
7. Create a new federated entity BOOK with new federated attributes
(Figure 10-6).
a. ABSTRACT (VARCHAR)
b. AUTHOR1_FIRST (VARCHAR)
c. AUTHOR1_LAST (VARCHAR)
d. AUTHOR2_FIRST (VARCHAR)
e. AUTHOR2_LAST (VARCHAR)
f. AUTHOR2_FIRST (VARCHAR)
g. AUTHOR2_LAST (VARCHAR)
h. ISBN (VARCHAR)
i. PAGES (LONG)
j. KEYWORDS (VARCHAR)
k. PUBLISH_DATE (DATE)
l. PUBLISHER (VARCHAR)
m. TITLE (VARCHAR)
Figure 10-6 Create Federated Entity BOOK
8. From the federated entity properties dialog, click Map Federated Entity to
map the key fields created in step 3 to the federated attributes created in
step 7 (Figure 10-7).
Figure 10-7 Map Federated Entity
9. Create a new search template called BooksAuthLast (Figure 10-8).
10.Add the following template criteria; add all operators for each item:
a. Name: Auth1Last - Attribute AUTHOR1_LAST - Default Operator: like
b. Name: Title - Attribute TITLE - Default Operator: like
c. Name: ISBN - Attribute ISBN - Default Operator: equals
Figure 10-8 Create Search Template
Import the PDF files
Here are the steps:
11.Load Content Manager client for Windows; login as frnadmin/password.
12.Open the work baskets list; choose To Be Indexed.
13.From the file menu, choose Import.
14.Choose PDF file type.
15.Click Browse and choose file, then click Import.
16.Repeat for all files to be imported, then click Close (Figure 10-9).
Figure 10-9 Import PDF Files
17.For each PDF imported and listed in the “To Be Indexed” work basket:
a. Double-click the document in the work basket.
b. Press CTRL-I to index the document.
c. Choose the BOOK index class and fill in all the book meta-data attributes.
d. Click OK to save.
10.4 The Lotus Extended Search server
At this point, the Lotus Domino and IBM Content Manager servers have been
installed, and are configured and ready to act as our key data and application
servers for the scenario. Thus, it is now time to install the Lotus Extended Search
server that will be brokering search requests from portal users to the data and
application servers, and then aggregating the results.
To start, Lotus Extended Search requires DB2 for its configuration database
information, and IBM WebSphere Application Server for its administrative
interfaces. So, DB2 7.2 FixPak 7 and WebSphere Application Server 4.04 were
first installed on this server prior to starting the installation of Lotus Extended
Search 3.7 (LES).
A full version of LES was then installed on the server using the RMI option for
communication between the various Extended Search components.
Note: During the installation, one has to decide how to handle communications
between the Extended Search server components. The two options available
are a plain Remote Method Invocation
(RMI) approach, and an Enterprise JavaBeans (EJB) approach.
The main difference is that the RMI approach uses the Java Remote Method
Protocol (JRMP) for communications, while an EJB approach uses the
Internet Inter-ORB Protocol (IIOP). IIOP can be problematic for corporate
firewalls and other security standards, and thus RMI is provided as a more
universally supported option.
When EJB communications are used, EJB support built in to WebSphere
Application Server allows for support of the communication. However, if RMI is
chosen, then the Extended Search RMI server must be started separately
from the WebSphere server to allow communication to take place.
If your Extended Search server is set for EJB, and EJB communication fails,
then the servlets, the ES server, and so on will try to use RMI as a backup.
However, in order for the Domino and Content Manager based data and
applications to be searchable, LES components must be installed on those
servers as well. The LES broker components on the main LES server
communicate with the LES components installed on the Domino and Content
Manager servers, so that the LES components on those servers can access the
servers’ product APIs to perform the searches.
In the case of the Content Manager server, a “server only” install of LES was
executed, as Content Manager requires a local LES broker to interact with the
Enterprise Information Portal capabilities that then search Content Manager. On
the Domino server, just an LES agent was installed.
A “server only” install of extended search is performed using so_setup.exe from
the Lotus Extended Search CDs.
Note: Agents must be running on the same machine where either the Domino
server or the Notes client software is installed (the Notes API permits remote
access). We chose to install the LES agents on our Domino server, but
alternatively we could have installed a Notes client directly on our LES server.
Please see Appendix B, “Understanding the Lotus Extended Search
architecture” on page 207 for more detailed discussions of the various agent
requirements and considerations.
After all of the Extended Search components had been installed, LES “data
sources” were created to allow for searching of the key data components
required for the Chrisco Books scenario. These key search data sources for the
Chrisco Scenario included:
1. A public internet search site, provided by google.com in our implementation of
the scenario
2. An intranet site, provided by the redbooks.nsf database setup on the Domino
server in our implementation of the scenario
3. A project tracking application, provided by the projects.nsf database setup on
the Notes server in our implementation of the scenario
4. Data from a full content management system, provided by the PDF files
loaded into our IBM Content Manager server in our implementation of the
scenario
Each of these data sources was then set up on the LES infrastructure as
described in the rest of this section.
10.4.1 Internet and Intranet data source setup
Both the Google Internet source and the intranet source are set up via standard
LES Web sources. LES is actually preconfigured with data source definitions for
many popular internet sites, including Google — so the Google Web source was
already enabled. However, to make a Web source definition for our intranet site,
we were required to create a Web source definition file for this new Web source.
The first step in creating a Web source definition file is to contact Intelligent
Algorithms Enterprises, Ltd., to obtain an infoGIST toolkit that enables you to
build your own Web source definition file (.sbb file).
http://www.infogist.com/lotus.htm
In order to create the SBB file, the InfoGist SBB Authoring toolkit needs to be
obtained from Infogist and installed on a development machine. After doing this,
we used the steps in the following section to create our custom Web source:
Step 1: Create the SBB file
Here are the steps to follow:
1. Identify the search page URL for the Web source to include. In our scenario,
we utilized the redbooks.nsf URL on the Domino data server.
2. Launch the InfoGist SBB Authoring tool.
Figure 10-10 Sample InfoGist Authoring tool user interface (with existing search targets)
3. Select the menu item SearchBot, Add.
4. Click OK when prompted for the searchBot ID.
5. Enter the SearchBot definition as shown in Figure 10-11.
Figure 10-11 SearchBot Definition
6. Click the SearchBot Forms button and insert a new form. The details of the
form are shown in Figure 10-12.
Figure 10-12 Search Form Definition
7. When complete, click OK until you return to the searchBot definition page.
8. Define the Follow/Skip rules by clicking the Follow/Skip button and entering
the information specified in Figure 10-13.
Figure 10-13 Follow/Skip Rule definition
9. Next, specify the search syntax by clicking the Search Syntax button and
completing it as shown in Figure 10-14.
Figure 10-14 Search Syntax configuration
10.Validate the configuration by performing a search from within the SBB
Authoring tool.
To initiate the search, enter the search parameter and select the Click Here
To Search button located to the right of the search input field. Figure 10-15
shows the results of an example search.
Figure 10-15 Search execution within the InfoGist toolkit
Step 2: Deploy the SBB file to the LES Server
In order to deploy the SBB file to the LES server, copy the file to the server’s
base directory (for example, <drive>:\Program Files\IBM\Extended Search)
Step 3: Discover the Web Sources
Through the ES administrative interface, follow these steps:
1. Click Servers in the administration interface navigator.
2. Discover the new data source, by right-clicking the server icon (where the
SBB file was deployed) and selecting Discover Data Sources. Refer to
Figure 10-16.
Figure 10-16 Discovering a data source in Lotus Extended Search
Refer to Figure 10-17 for the following sequence:
3. Enter the name of the data source.
4. Click the Start Discovery button.
5. Select the source from the list generated/discovered.
6. Click the Add to ES button.
Figure 10-17 Configuring the Web Source discoverer
Step 4: Configure the Web Sources Link
Here are the steps to follow:
7. Select the Links option on the navigator.
8. Right-click the Web Sources link and select Properties (Figure 10-18).
Figure 10-18 Update the Web Source Link
Refer to Figure 10-19 for the following sequence:
9. Click the second tab (Parameters) and enter the name of the discovered
SBB file in the ESWebConfig link parameter value column. Make sure to
separate the different files with a “?”, as shown in Figure 10-19.
10.Click the Apply button.
Figure 10-19 Configure the ESWebConfig Parameter
11.Finally, propagate the changes, and restart the LES server.
At this point, the new Web source based on the custom SBB definition file should
be available.
10.4.2 Domino application data source setup
Domino data sources are easily set up within Lotus Extended Search. To create
the Domino Data source for our projects.nsf Notes application, we performed the
following steps:
1. In the ES Admin applet, choose the primary server. Right-click and choose
Discover Data Sources.
2. Choose Lotus Notes for the type of source to discover.
3. Enter the Domino server name and the Domino hostname.
Note: Include the http port number if using a port other than 80.
4. If the Domino server is not on the same physical machine as Extended
Search server, uncheck the box labeled, “Is this Domino server located on the
local host?”
5. Uncheck “Load these databases with a skeleton set of fields.”
6. Choose Lotus Notes 5.0 for the link name.
7. Click Start Discovery. The LES server will then communicate with the
remote LES “agent” (as installed earlier on the Domino server) and bring back
a list of all available databases on the server to search.
8. Choose the database from the list and click Add to ES to add this source to Extended Search.
10.4.3 IBM Content Manager data source setup
As discussed earlier, Content Manager is actually accessed by LES through the
Enterprise Information Portal (Information Integrator for Content). An LES broker
was installed on the Content Manager/EIP server so that LES can communicate
with EIP. EIP in turn is configured to search Content Manager via its own
federated search capabilities.
Thus, to configure LES to search the IBM Content Manager in our scenario, we
first need to configure the connection to the LES broker installed on the Content
Manager server, and then set up the EIP data sources in LES.
To configure the connection between the LES server and the LES components on the Content Manager server, the following steps are performed. These steps must be performed prior to starting the LES components installed on the Content Manager server:
1. Ensure that the LES server is running, and open the Extended Search
Administration applet.
2. Create a new Extended Search server in the Admin applet (Figure 10-20).
Figure 10-20 Creating a new extended search server
3. Connect the Primary Extended Search server, installed on our LES node, to
the Extended Search server installed on the Content Manager/EIP server
(Figure 10-21).
Figure 10-21 Connecting the LES servers
4. Enable basic authentication on Extended Search servers via the Admin
applet (Figure 10-22).
Figure 10-22 Enabling basic authentication between the servers
5. Map user IDs from within the EIP Admin application:
a. From within the EIP Administration client, create a user to be used by the
LES server as it accesses Content Manager through EIP.
Note: For a production system, each LES user must be mapped to an EIP
user in this manner. This limitation is removed in Extended Search 4.0. We
utilized Extended Search 3.7 in our scenario.
b. Check the box labeled “Allow access from Extended Search” and enter
the user name which you created in Step a. (see Figure 10-23).
Figure 10-23 Enabling LES access for Information Integrator for Content users
c. Close the EIP Administration client.
6. Rename Agents & Brokers on EIP Extended Search Server.
a. In the Extended search admin applet, open the properties for the LES
server installed on the Content Manager/EIP server.
b. Rename the Broker and Agents to identify them as EIP/Content Manager
specific (Figure 10-24).
Figure 10-24 Server Properties
7. Start EIP based Extended Search Server.
At this point, the LES components running on the Content Manager server have
been properly configured and started. The EIP data source can then be created,
as follows:
1. In the Extended Search Admin applet, choose the EIP server, right-click and
choose Discover Data Sources.
2. Select IBM Enterprise Information Portal as the type of source to discover.
3. Enter the Content Manager database name ("cmbdb" in our scenario).
4. Enter the user ID for this database ("cmbadmin" in our scenario).
5. Enter the password.
6. Click Start Discovery.
7. Choose the BOOK federated search object that was set up in EIP during the
Content Manager server install, and choose Add to ES (Figure 10-25).
Figure 10-25 Information Integrator for Content discovery setup
Creating an LES application
After all the data sources have been set up, the final step is to create an
Extended Search “application” that associates all these data sources into a
single “federated” search.
1. In the ES Admin applet, choose Applications in the left pane, right-click and
choose New.
2. In the dialog enter an application name and description.
3. Choose Broker as the entry broker (Figure 10-26).
Figure 10-26 Application Properties
4. Click OK to close the dialog.
5. Add Data Sources to the new search application:
a. In the Extended Search Admin applet, choose Applications in the left
pane.
b. In the right pane is a list of Categories. Expand the section of the tree
labeled, “[All Categories]”.
c. Right-click Domino Sources, and choose Copy.
d. Right-click the Books application and choose Paste. The Domino data
sources are added to the Books application.
e. Do the same for the EIP and Intranet data sources.
10.5 The WebSphere Portal server
Finally, the WebSphere Portal server was installed in our scenario environment to pull all of the other technologies together into a seamless interface. Since the scenario calls for the use of Lotus Extended Search and IBM Enterprise Information Portal, the WebSphere Portal v4.1 "Experience" offering was required.
During the installation “Setup Manager” process, the portal was configured with
the following options:
– Standard Install
– Install Components:
  • WebSphere Portal
  • WebSphere Personalization
  • WebSphere Application Server
  • IBM HTTP Server
– Use Local DB2 Database
– Database & LDAP Directory
– Use Domino LDAP
Once WebSphere Portal was installed and running, a custom portal theme was applied, and search functionality was added to a new page group called "Search".
To create the search page group, the following steps were performed
(Figure 10-27):
1. Log in as a portal administrator.
2. Click the Work With Pages link.
3. Click Manage Places and Pages, then Create Place.
4. Enter the place name Search, choose the ITSOSearch theme, and click OK.
5. Select the Search place and click Manage Place.
6. Choose Create Page, then Create New.
7. Enter the page name Search, and choose a layout. Click OK.
8. In the Manage Places portlet, click Done.
Figure 10-27 Create the portal theme
Next, the Search page group was customized, and the out-of-the-box Advanced Extended Search portlet was added as our search interface.
Note: We could have created a custom search portlet leveraging the Lotus Extended Search API for our scenario. However, we chose to use the out-of-the-box portlet to simplify the scenario.
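For readers who do go on to build such a custom portlet, the sketch below illustrates one possible shape of the search call the portlet could wrap: a simple HTTP request to the Extended Search Web application. It is illustrative only; the servlet path and parameter names are assumptions made for the example, not the documented Extended Search interface, so consult the Extended Search API documentation before building on it.

Example: Hypothetical HTTP search call that a custom portlet could wrap (Java)

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class ExtendedSearchClient {

    private final String serverUrl;   // hypothetical search servlet URL, for example http://lesserver/servlet/ESSearch
    private final String application; // the LES "application" name, for example books

    public ExtendedSearchClient(String serverUrl, String application) {
        this.serverUrl = serverUrl;
        this.application = application;
    }

    // Submits the query over HTTP and returns the raw response for the portlet to render.
    public String search(String query) throws Exception {
        String request = serverUrl
                + "?application=" + URLEncoder.encode(application, "UTF-8")
                + "&query=" + URLEncoder.encode(query, "UTF-8");
        HttpURLConnection conn = (HttpURLConnection) new URL(request).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
        }
        return response.toString();
    }
}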
1. Log in as a portal administrator.
2. Click Work With Pages.
3. Click Edit Layout and Content.
4. In the drop-down list, choose the Search place and Search page.
5. Click Get Portlets.
6. In the “Name Contains” field, enter Search and click Go.
7. In the search results, click the plus symbol next to “Extended Search
Advanced Portlet”.
8. Click OK.
9. Choose Extended Search Advanced Portlet.
10. Click the plus symbol in the page layout frame to add the portlet to the page.
11. Click Activate.
12. Switch to the new Search page.
13. Click the configure icon for the Extended Search portlet, and enter the correct hostname for the Extended Search server.
14. Configure the search portlet (Figure 10-28):
a. Log into the portal as wpsadmin.
b. Go to the Search page.
c. Click the EDIT button for the Advanced Search portlet.
d. Click Change Application name or Server URL.
Figure 10-28 Advanced Extended Search portlet configuration options
e. Enter books for the application name, as this is the LES "application" we set up earlier with our scenario's data sources defined.
f. Edit the URL, and change localhost to the address of the Extended
Search server. Do not change the rest of the URL (Figure 10-29).
Figure 10-29 Setting the LES application name
g. Click OK, then Done.
10.6 Putting it all together
At this point, all of the servers are installed and configured, and the scenario can be verified by accessing the WebSphere Portal and performing a search within the Extended Search portlet. Overall, the Lotus Extended Search server was the real "workhorse" of this portal search solution, as it brokers the search requests out to the other search engines built into Content Manager (that is, Information Integrator for Content/EIP) and Lotus Domino. The results are returned to the Extended Search server, where they are aggregated, ranked, and sorted, and then returned to the portlet as a single hit list of results from all data sources.
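To make the aggregation step more concrete, the following sketch shows, in simplified Java, how a federated broker can merge per-source hit lists, rank them by relevance, and prune the result to a maximum number of hits. It is a conceptual illustration only, not the internal implementation of Lotus Extended Search; the Hit class and its fields are assumptions made for the example.

Example: Conceptual merge-rank-and-prune step of a federated search broker (Java)

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HitListAggregator {

    // One search hit as returned by a back-end source; the fields are illustrative assumptions.
    public static class Hit {
        final String source; // which back-end returned the hit
        final String url;    // where the document can be retrieved
        final double rank;   // relevance score assigned by the back-end

        Hit(String source, String url, double rank) {
            this.source = source;
            this.url = url;
            this.rank = rank;
        }
    }

    // Combines the per-source result lists, sorts by relevance rank, and prunes to the requested maximum.
    public static List<Hit> aggregate(List<List<Hit>> perSourceResults, int maxHits) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> hits : perSourceResults) {
            merged.addAll(hits);
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.rank).reversed());
        return merged.size() > maxHits ? new ArrayList<>(merged.subList(0, maxHits)) : merged;
    }
}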
The out-of-the-box portlet we utilized in our scenario provides a basic interface
for the user to do a search, as shown in Figure 10-30.
Figure 10-30 Basic portlet search UI
The portlet also allows the user to specify which sources to search, by choosing
the “advanced” option, as shown in Figure 10-31. By default, all sources are
searched.
Figure 10-31 Selecting the sources
The search results are returned to the portlet in an aggregated and ranked fashion, as shown in Figure 10-32. These search results show hits from both Google and the internal redbooks.nsf search site aggregated into a single list. However, the search results interface provided by this out-of-the-box portlet is not the most intuitive. A real-world deployment of this scenario would likely use a custom portlet with a more user-friendly view of the search results.
Figure 10-32 Example search results
Part 5. Appendices
Appendix A. Pattern changes
With the publication of this redbook, several changes have occurred to the
Application Integration and Information Aggregation application patterns. Some
patterns have been renamed, some have been discontinued, and new patterns
have been introduced.
In general, these changes were made to more clearly represent the “data
focused application integration” capabilities provided by some application
patterns that were previously considered part of the Information Aggregation
business pattern — by moving these patterns to the more accurate category of
Application Integration patterns. In other words, data-based application integration is more "integration" focused than "business" focused, and thus more correctly belongs within an Integration pattern.
Additionally, in the process of making these changes, names of Application
patterns were also modified in some cases to better identify their capabilities.
To help clarify these changes, Table A-1 provides a mapping of each new
application pattern name to the older application pattern name used in previous
Patterns for e-business IBM Redbooks.
Table A-1 Information Aggregation and Application Integration pattern changes
Old pattern name: Information Aggregation::Information Access
New pattern name: Information Aggregation::User Information Access (UIA)

Old pattern name: Information Aggregation::Information Aggregation plus Limited/Extended Update
New pattern names: Information Aggregation::User Information Access (UIA) - Immediate update variation; Information Aggregation::User Information Access (UIA) - Batched update variation

Old pattern name: No prior name, new pattern
New pattern name: Information Aggregation::User Search and Discovery (US&D)

Old pattern name: Application Integration::Federated Repository
New pattern name: Application Integration::Federation

Old pattern name: Information Aggregation::Population Single Step
New pattern name: Application Integration::Population: Single Step

Old pattern name: Information Aggregation::Population Multi-step
New pattern name: Application Integration::Population: Multi-step variation

Old pattern name: No prior name, new variation
New pattern name: Application Integration::Population: Data Cleansing variation

Old pattern name: Information Aggregation::Replication
New pattern name: Application Integration::Population: Synchronization

Old pattern name: Information Aggregation::Population Crawl and Discovery
New pattern name: Application Integration::Population: Index

Old pattern name: Information Aggregation::Population Summarization
Appendix B. Understanding the Lotus Extended Search architecture
This appendix provides some background details on the Lotus Extended Search architecture that should be helpful to any IT professional deploying this technology as part of a Portal Search Custom Design.
Overall, the distributed component architecture of Extended Search offers the
flexibility to scale a system according to changing requirements. It also allows the
Extended Search components to be arranged in a topology that matches any
environment, enabling a blend of IBM AIX, Sun Solaris, Windows 2000, and
Windows NT® platforms as needed. The architecture supports vertical and
horizontal scalability:
򐂰 Vertically, within a single Extended Search server, you can configure multiple
instances of server processes to influence the number of simultaneous
requests that the server can process.
򐂰 Horizontally, with multiple machines, you can set up additional Extended
Search servers and additional Web servers. For each Extended Search
server, you can determine the types of server tasks you want to run. By
having multiple servers, you can distribute and balance the processing load.
Extended Search architecture
The Extended Search system employs a four-tiered architecture (Figure B-1).
Messages start from search applications in the first tier and proceed
consecutively through subsequent tiers to the back-end. In most cases, the
back-end is a third-party data source to which Extended Search is connected;
but it can also be the Extended Search configuration database (CDB), a private
back-end that is managed by DB2.
Figure B-1 Extended Search tiered architecture (run-time flow: a browser or Notes client in the first tier, through a Web server and application server in the second tier, to the Extended Search server run time of brokers, agents, and links, and on to the back-end data sources; administrative flow: the administration applet, through a Web server and application server, to the RMI server and the configuration database (CDB); a data source discoverer also connects to the back-ends)
Message flows between the tiers can be divided into two basic categories:
򐂰 Run time messages, shown above the dotted line in the preceding diagram,
are messages usually issued by the user community to perform searches and
retrieve documents.
򐂰 Administrative messages, shown below the dotted line, are issued by the
Administrator and result in updates to the configuration database.
Run time messages can be submitted either through a standard Web browser or
a Lotus Notes client program. Administrative messages are always submitted
through the Extended Search Administration interface.
The horizontal bars in Figure B-1 indicate the consecutive components through which each message must flow during its journey from the first tier through the fourth tier and back again. Each of these components is described in the
following sections, starting from the right side of the diagram and moving to the
left.
Links and translators
Extended Search links are the software modules that encapsulate the native API
calls for search and retrieval to a specific data management system. They
contain all of the required data structures, programming objects, and procedural
logic necessary to interface with the back-end data system.
A link module is uniquely assembled to support (at a minimum) four callable
methods that typically exist in all data management systems:
򐂰 Methods to connect to and disconnect from the host system
򐂰 Methods to search content and retrieve data from the system
The link module performs a null operation for those methods that are not
supported by the back-end source. For example, a file system search does not
support the concept of connecting and disconnecting.
Extended Search translators are the software modules responsible for
translating the incoming GQL expression into the native search grammar of the
back-end data system. They, too, contain all of the required data structures,
programming objects, and parsing logic necessary to generate a syntactically
correct search expression.
In some cases, the same translator module applies to several different back-end
systems, as is the case for the SQL translator and the many varied systems that
support the standard SQL grammar.
Extended Search comes with a broad set of link and translator modules that
enable you to connect to most of the industry’s common data management
systems. If your data system is not contained in this standard set, you can
develop a custom link or translator module by using an easy-to-use toolkit
provided with the product.
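To make the division of responsibilities concrete, the following hypothetical Java interfaces capture the operations described above. The real toolkit defines its own contracts (and link modules may be implemented in native code), so the names and signatures here are assumptions for illustration only.

Example: Hypothetical interfaces for a link module and a translator (Java)

import java.util.List;

// Encapsulates the native search and retrieval calls for one back-end data system.
interface LinkModule {
    void connect(String host, String user, String password) throws Exception; // no-op for sources with no session concept
    void disconnect() throws Exception;                                       // no-op where not supported
    List<String> search(String nativeQuery, int maxHits) throws Exception;    // returns document identifiers
    byte[] retrieve(String documentId) throws Exception;                      // fetches one document
}

// Translates an incoming GQL expression into the back-end's native search grammar.
interface Translator {
    String translate(String gqlExpression);
}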
Agents
Extended Search agents are programs that respond to search and retrieval
operations targeted against a particular data source. The agent loads the
appropriate link and translator modules when a request against a specific data
source type is first made. The agent then calls upon these module libraries for
translation (XLAT), connect, disconnect, search, and retrieval operations.
Figure B-2 illustrates the interaction of the agent with a given back-end system.
Figure B-2 Extended Search agents (the agent loads the translator (XLAT) and link libraries for each link type, connects to the back-end search engine (Web, file, and so on), issues the search or fetch request, applies any user exits at the data source, and returns a response sorted by rank and pruned to MaxHits before disconnecting)
For search operations, an agent will sort the results set by relevance rank and
then truncate the set to the maximum number of hits, as specified in the original
search request. This sorting and subsequent pruning of the list of hits is an
important precursor to aggregation, which will be discussed shortly.
Agents can reside on the same machine as the data source (recommended) or
use a data source’s remote APIs for access. More than one copy of an agent can
run on a single computer to handle concurrent search and retrieval requests. An
agent can be dedicated to a single data source, a group of sources of a particular
type, or a range of sources that have a mixture of link requirements.
To be able to discover and search certain types of data sources, the Extended
Search Server component, including an agent, must be installed on the same
machine with the data source being discovered or searched. Note that you are
not required to install the Web server and configuration database components on
a remote machine, but you may need to install the base server software to
ensure that an agent is locally available to the remote target sources.
Table B-1 details the requirements for accessing remote data sources and
identifies those products (for which support is predefined in Extended Search)
that require a local agent for searching.
Table B-1 Agent location requirements
File systems: Agents must be running on the same machine as the directories being searched.
IBM Enterprise Information Portal: Agents must be running on the same machine as the Information Integrator for Content federated server.
LDAP: Agents can search any LDAP server from any Extended Search machine (LDAP is completely remote).
Lotus Connectors: Agents must be running on the same machine as the Domino server that hosts the Connectors software.
Lotus Domain Index: Agents must be running on the same machine where either the Domino server or the Notes client software is installed (the Notes API permits remote access).
Lotus Domino.Doc: Agents must be running on the same machine where the Domino.Doc Desktop Enabler is installed (the Domino.Doc COM API permits remote access).
Lotus Notes: Agents must be running on the same machine where either the Domino server or the Notes client software is installed (the Notes API permits remote access).
Microsoft Access: Agents must be running on the same machine where the Access database (.mdb file) is installed. MDAC 2.5 or higher must also be installed.
Microsoft Exchange Server: Agents must be running on the same machine where the Exchange Server software is installed. MDAC 2.5 or higher must also be installed.
Microsoft Index Server: Agents must be running on the same machine where the Index Server software is installed. MDAC 2.5 or higher must also be installed.
Microsoft Site Server: Agents must be running on the same machine where the Site Server software is installed. MDAC 2.5 or higher must also be installed.
Microsoft SQL Server: Agents can be running on any machine where MDAC 2.5 or higher is installed (the SQL Server API permits remote access).
ODBC — Access: Agents must be running on the same machine where the Access database (.mdb file) is installed. MDAC 2.5 or higher must also be installed.
ODBC — DB2: Agents must be running on the same machine where the DB2 client, at a minimum, is installed (remote access is possible as long as the DB2 client is available).
ODBC — Oracle: Agents must be running on the same machine where the Oracle client, at a minimum, is installed (remote access is possible as long as the Oracle client is available).
ODBC — SQL Server: Agents must be running on the same machine where ODBC 3.0 is installed (access is completely remote).
Brokers
Extended Search brokers are intermediary components that exist between the
requestors of service and the agents that actually perform the service through
the back-end. They function as special purpose resource coordinators designed
to manage the multitude of searches generated from a single request – as
caused by a category search, for example. Figure B-3 illustrates the functionality
performed by an Extended Search broker.
Figure B-3 Extended Search Broker (the broker (1) receives the request and expands categories, (2) distributes search and fetch operations to local and remote data sources through security and logging exits, (3) aggregates the returned hit lists into a single result set, and (4) caches the search results in shared memory for subsequent paging)
A broker typically performs the following tasks:
򐂰 Validates the request.
򐂰 Expands categories to obtain a list of the data sources available to the
application and resolves the source addresses. (Label 1)
򐂰 Distributes queries to agents for efficient, parallel searching. (Label 2)
򐂰 Aggregates and optionally sorts search results that are returned by the
various agents into a single search result set. (Label 3)
򐂰 Caches search results for subsequent paging operations. (Label 4)
򐂰 Issues requests to agents to retrieve source documents for the user (note that
in most cases, the Web browser uses the URL returned in the results list to
retrieve the document).
򐂰 Honors timeouts and response options.
The degree of responsiveness can vary dramatically across a large set of back-end systems contributing to a single request. Some data management systems respond faster than others, and some may not respond at all, possibly due to out-of-service conditions. To account for this situation, brokers were designed to communicate asynchronously with their agents.
This design allows a broker to not be dedicated to any one particular back-end data source, and it enables the user to assign a timeout value to the request. When the timeout threshold has been reached, the broker returns whatever results have been compiled up until that point.
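As an illustration of this asynchronous, deadline-bounded behavior, the following sketch uses standard Java concurrency utilities to query several sources in parallel and return whatever has arrived when the timeout expires. It is a conceptual illustration, not the broker's actual implementation.

Example: Deadline-bounded gathering of results from parallel sources (Java)

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedGather {

    // Queries every source in parallel and keeps whatever arrives before the overall deadline.
    public static List<String> searchAll(List<Callable<List<String>>> sourceQueries,
                                         long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(sourceQueries.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (Callable<List<String>> query : sourceQueries) {
            futures.add(pool.submit(query));
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> results = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            long remaining = Math.max(deadline - System.currentTimeMillis(), 0);
            try {
                // Wait for each source only as long as the overall deadline allows.
                results.addAll(f.get(remaining, TimeUnit.MILLISECONDS));
            } catch (TimeoutException | ExecutionException slowOrFailedSource) {
                // A slow or failed source contributes nothing; partial results are still returned.
            }
        }
        pool.shutdownNow();
        return results;
    }
}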
Additional options let you control how the broker returns results. Two such
options are to return the results when they are available or after they have been
sorted.
򐂰 If you specify the “When available” option, the broker will return the results in
the order that the sources respond to the query. This approach provides a
fast way to see the results of your search, but there is no guarantee that the
first results you see will be the most relevant results.
򐂰 If you specify the “Sorted” option, the Broker will collate all the results, sort
them according to additional options you specify, and eliminate duplicate
references before returning the results to you. This approach usually takes
longer than obtaining results as they are available, but the results may be
more relevant to your query.
Multiple brokers
To support performance and scalability, a given Extended Search domain can contain multiple brokers. This ability to establish a hierarchy of brokers, along with the ability to set up agents co-resident with the sources they support or to dedicate agents to particular sources or types of sources, gives Extended Search considerable flexibility to adapt to changing and expanding environments. Under a multiple-broker scheme, sources are partitioned across all of the brokers, a design that prevents any one broker from being overwhelmed.
In a single broker environment, a search that targets six dozen sources would
result in 72 queries being sent to the remote machines and 72 sets of search
results being returned to the broker. If each result set contains the maximum
number of results, most of the data will be discarded when the broker
consolidates and aggregates the data for the list being returned to the requestor
(the broker prunes the results and keeps only the top items, up to the maximum
number allowed by the search application).
With multiple brokers, an entry broker sends a single message to brokers on
remote machines. The remote brokers then split the message into multiple
requests for the sources (fronted by agents) on their respective machines.
Instead of all result sets being returned to one broker, each broker consolidates,
aggregates, and prunes the results returned by its agents, and then returns just a
single list – containing the top hits – to the entry broker. The entry broker only
needs to create a final results set from its own local sources and the consolidated
lists returned by the remote brokers.
This design enhances overall performance (less bandwidth is needed for
broker-to-broker communication as compared to that needed to communicate
with hosts that lack brokers) and it allows new sources, regardless of location, to
be easily integrated into an existing domain.
Configuration database
The broker obtains information about the resources it is to manage from the
Extended Search configuration database. This database contains information
about data sources and how they should be searched. It also stores network
addresses, saved queries, saved search results, and data that was downloaded
by a Web crawler.
You can easily update information about your network topology, data sources,
and search applications by using an intuitive Administration interface. This
interface also provides the gateway through which you can run discovery
(discussed below), view error messages and event data, schedule queries, and
work with saved queries and search results.
Several wizards facilitate common configuration activities. The wizards enable
you to easily export and import data between domains, design the format and
content of search result sets, specify data source search and retrieval
parameters, and configure mapped fields.
Note that a simple refresh action will disseminate changes you make in the CDB
throughout the Extended Search domain. The only time you need to restart the
server is when you update configuration data for the server itself.
Discovery
To add data sources to your domain, Extended Search provides a collection of
discoverers, programs that load the CDB with default information about a data
source. The discovery process greatly facilitates the configuration process by
automatically configuring field and parameter information for each new data
source. Later, using the Administration interface, you can designate which fields
you want to enable for search and retrieval operations.
The discovery process is also able to ascertain whether or not a particular data
source has been previously loaded into the configuration database. The
discoverer skips already defined sources on subsequent invocation. This is true
even if the data source name changes.
Extended Search comes with a broad set of discoverers that enable you to
quickly incorporate many of the industry’s common data management systems.
Like links and translators, if your data system is not contained in this standard
set, you can develop a custom discoverer by using the Extended Search toolkit.
Monitoring
To help you collect statistics and fine-tune the system for performance, Extended Search includes a Monitor, a tool that enables you to observe server activity through a graphical user interface. The Monitor is packaged as a standalone C++
program and as a Java applet that you can launch from within the Administration
interface. This feature enables you to make adjustments and refresh the system
without having to restart the server.
The Monitor can run independently of the broker, and be started and stopped
any number of times, without affecting work being done by the Extended Search
server. Because it can run remotely, you can quickly check the status of various
servers from a location other than the host console.
Environment
Because Extended Search is designed to use existing software to search for
information and retrieve data from wherever it exists throughout an organization,
it must integrate well with the existing IT infrastructure.
To this end, an Extended Search domain supports a mixed topology. Extended
Search server components (brokers, agents, and so on) can reside on IBM AIX,
Sun Solaris, Microsoft Windows 2000, and Microsoft Windows NT platforms, and
you can mix the component topology as needed to satisfy the requirements of
your operating environment (Figure B-4).
Figure B-4 A typical Extended Search environment
As shown in the preceding illustration, users can submit requests through a Web
browser or a Lotus Notes client — interfaces that they are already familiar and
comfortable with. This design allows Extended Search to provide a distributed
search across many different data repositories through a single, efficient, and
easy to use point of access.
All user requests get sent to the Web server, which in turn forwards the request
to the appropriate Extended Search broker. The broker, in turn, contacts the
agents needed to carry out the request and search the various target sources.
򐂰 When access is through a Web browser, information about the search (what
sources to search, how to search them, and how results should be returned)
is determined by the HTML or JavaServer Pages that define the search
application.
򐂰 When access is through Notes client software, information about the search
is stored in a search application database, which can either exist on a Domino
server or be replicated down to the user’s workstation.
Note that Extended Search uses the Hypertext Transfer Protocol (HTTP) to
invoke the appropriate servlet for processing requests. This approach has some
advantages:
򐂰 It allows the search application to use an industry-standard protocol (HTTP).
This enables the application to use many Web server-related features such
as support for socks, proxies, and secure sockets layer (SSL) technology.
򐂰 It allows servlets to communicate with an Extended Search server that
resides on a machine other than the Web server. This provides for added
flexibility when resource capacity and performance are of concern.
Appendix C. Using the WebSphere Portal Search Engine
The WebSphere Portal Search Engine was not utilized in the search scenario in this redbook. However, it is a capable basic search utility that has gained additional capabilities with each release of WebSphere Portal.
As a guideline for implementing this technology in your own solutions, this
appendix describes the setup of the Portal Search Engine within a WebSphere
Portal 4.12 environment.
Details on the setup and usage of the updated Portal Search Engine in
WebSphere Portal v5 can be found in the Portal v5 Infocenter:
http://publib.boulder.ibm.com/pvc/wp/500/ent/en/InfoCenter/wps/admsrch.html
Additional details on the Portal Search Engine in WebSphere Portal v4.21 can be
found in the Portal v4 Infocenter:
http://publib.boulder.ibm.com/pvc/wp/42/ext/en/InfoCenter/wps/admsrch.html
How to set up Portal Search in WebSphere Portal Server
Setting up Juru search or document search for your portal requires the following steps:
1. Creating the Search page.
2. Building an index.
3. Setting up security.
4. Configuring crawler.properties (optional).
Creating the Search page
You need to create a page that will contain the Document Search and Manage
Search Index portlets. Let us create a sample search page:
1. Log on to the portal as the Administrator (wpsadmin).
2. First we need to create a copy of the Juru Search (or Document Search) portlet, which we can then use on our Search page. Select Portal Administration -> Portlets -> Manage Portlets.
Note: It is recommended to create another instance of the Juru Search portlet, because each portlet instance can search only a single index.
3. From the list of portlets, select Juru Search and then click Copy.
Figure C-1 Create Copy of Juru or Document Search Portlet
4. Provide a name for the new portlet instance; for example, My Juru Search
and then click OK.
5. The new portlet is not activated by default. So, select it from the list of portlets
and then click Activate/Deactivate.
6. Click Modify parameters. This option allows you to specify the search index.
Specify the Index Location parameter, for example, in the case of UNIX,
/var/PortalServer/indices/index1, or in the case of Windows,
C:\temp\index1, depending upon the platform on which the Portal is
installed. This is the name and location of the index that we will create later
on. Now, click Save.
7. Select the Work with Pages option. Click Manage Places and Pages and
then select Create place.
8. Provide a Place name and default locale title for the place; for example,
Juru Search. Then, click OK.
9. From the list of Places you can manage, select the place that you just created (Test in our environment) and then click Manage pages.
10. Click Create page -> Create new.
11. Provide a name for the page, select a Layout, and then click OK.
12. Select Edit Layout and Content, and select the Place as Test and the Page as Search.
13. Click Get portlets. Select either Show all portlets or Search for portlets using the keyword search. Click Go.
14. From the list of portlets returned, select the My Document Search and Manage Search Index portlets by clicking the Add to list (+) button beside them. Then, click OK.
15. You can edit the layout of the Search page and then add the selected portlets to the page. Click Activate.
Building a Juru Index
The Manage Search Index portlet can be used to build and maintain indices of
Web content that will be used by the search portlet. The search index stores key
words and terms and maps them to their source documents, enabling fast
processing of requests from the search portlet. During the build process,
documents are retrieved for indexing through a Web crawler (robot). Searchable
resources can be stored on the local portal server or on remote sites. Users can
search HTML and text documents:
1. Log onto the portal as the Administrator (wpsadmin) and then navigate to the
search page that we created, for example, Test -> Search.
2. On the Manage Search Index Portlet, click the Configure search index
option.
3. Specify the following values for configuring our index:
– Location of the index as /var/PortalServer/indices/index1 or
C:\Temp\SampleIndex
– Task for configuring the index as New Index
– Starting URL as http://w3.itso.ibm.com/ or any URL that would be the
base URL for your index
– The option to enable CJK language support enables support for
Chinese, Japanese, and Korean languages. We do not require this option.
– Document types to be indexed as both HTML and text.
– Levels of linked documents should be at least 1.
– Number of linked documents to index can be retained as 100.
4. Click OK to save the configuration and then click Done.
5. Now click the Manage search index option on the Manage Search Index
Portlet.
6. From the list of indices, select the index that we just configured
(C:\Temp\SampleIndex) and then click Begin index update. Once the index
has been built, if you re-visit the Manage search index option (or click
Refresh on the browser) you will see the statistics for Last update
completed at and Number of active documents updated.
7. Click Done.
Setting up permissions
There are two basic tasks that are required to be completed before the Search
feature can be made available to a portal user:
򐂰 Portal users should be provided View access to the Search page.
򐂰 The Manage Search Index portlet should not be accessible to users other
than the Administrator.
Here are the steps to accomplish these objectives for our Search page:
1. Log onto the portal as the Administrator (wpsadmin) and then click Portal
Administration -> Security.
2. For the Select a group or user to assign permissions field, select Special
groups -> All authenticated users.
3. Select pages for Select the objects for the permissions field. Click Go.
4. Provide View permissions for the Test place and Search page. Click Save.
5. Now, select portlets for Select the objects for the permissions field.
Provide search as the keyword for the Search On -> Name contains field
and then click Go.
6. Provide View access for the My Document Search portlet and None for the
other portlets.
7. You can now log out and then log onto the portal as an ordinary user.
Configuring the crawler
The index build process is optimized for crawling inside an Intranet. If you need
the crawler to fetch documents on the other side of a firewall, you need to update
the crawler.properties file (located in the index directory). You can set either the
name and port of a proxy server or a socks server. For example:
Example: Proxy settings for the crawler
#The name of the socks server to be used <server name>:<port number>
SocksServer=socks.yourco.domain \:1080
#The name of the proxy server to be used <server name>:<port number>
ProxyServer=proxy.yourco.domain \:80
You can specify additional URLs (maximum of nine) crawled into the same index.
Example: Additional sites to be indexed
#OtherRoot1=http \://www.second.site
#OtherRoot2=http \://www.third.site
...
#OtherRoot9=http \://www.last.site
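As a side note on the syntax above, the backslash before each colon is standard Java properties-file escaping, so the values load back as ordinary host:port pairs and URLs. The following snippet is illustrative only and not part of the product; it simply reads the file with java.util.Properties and prints the settings.

Example: Reading crawler.properties with java.util.Properties (Java)

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class CrawlerSettings {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream("crawler.properties")) {
            props.load(in);
        }
        // "\:" in the file is just an escaped colon, so the loaded values are normal strings.
        System.out.println("Proxy: " + props.getProperty("ProxyServer"));
        System.out.println("Socks: " + props.getProperty("SocksServer"));
        // Additional roots OtherRoot1 .. OtherRoot9, if uncommented in the file.
        for (int i = 1; i <= 9; i++) {
            String root = props.getProperty("OtherRoot" + i);
            if (root != null) {
                System.out.println("Extra root " + i + ": " + root);
            }
        }
    }
}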
Related publications
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information on ordering these publications, see “How to get IBM Redbooks”
on page 227.
򐂰 Patterns: A Portal composite pattern using WebSphere Portal V4.1.2,
SG24-6869
򐂰 Patterns: A Portal composite pattern using WebSphere Portal V5, SG24-6087
򐂰 Access Integration Pattern Using WebSphere Portal Server, SG24-6267
򐂰 Applying Pattern Approaches, SG24-6805
򐂰 Patterns: Self-Service: Connecting to the Enterprise, SG24-6572
򐂰 IBM WebSphere Everyplace Server Service Provider and Enable Offerings:
Enterprise Wireless Applications, SG24-6519
򐂰 IBM WebSphere Everyplace Server: A Guide for Architects and Systems
Integrators, SG24-6189
򐂰 Applying the Patterns for e-business to Domino and WebSphere Scenarios,
SG24-6255
򐂰 Mobile Applications with IBM WebSphere Everyplace Access Design and
Development, SG24-6259
򐂰 Self-Service Patterns using WebSphere Application Server V4.0, SG24-6175
򐂰 Self-Service Applications Using IBM WebSphere V4.0 and MQSeries
Integrator, SG24-6160
򐂰 Patterns for the Edge of Network, SG24-6822
򐂰 Web Services Wizardry with WebSphere Studio Application Developer,
SG24-6292
򐂰 IBM WebSphere V4.0 Advanced Edition Handbook, SG24-6176
򐂰 Java Connectors for CICS: Featuring the J2EE Connector Architecture,
SG24-6401
򐂰 MQSeries Programming Patterns, SG24-6506
򐂰 WebSphere Version 4 Application Development Handbook, SG24-6134
򐂰 IBM WebSphere Portal Developers Handbook, SG24-6897
򐂰 IBM WebSphere Portal V4.1 Handbook, Volume 1, SG24-6883
򐂰 IBM WebSphere Portal V4.1 Handbook, Volume 2, SG24-6920
򐂰 IBM WebSphere Portal V4.1 Handbook, Volume 3, SG24-6921
򐂰 WebSphere Portal 4.12 Collaboration Services, REDP0319
򐂰 IBM WebSphere Portal V4.1.2 in a Linux Environment, REDP0310
򐂰 WebSphere Portal V4.1 AIX 5L Installation, REDP3594
򐂰 WebSphere Portal V4.1 Windows 2000 Installation, REDP3593
Other resources
These publications are also relevant as further information sources:
򐂰 Patterns for e-business: A Strategy for Reuse by Jonathan Adams, Srinivas
Koushik, Guru Vasudeva, and George Galambos, ISBN 1-931182-02-7
򐂰 Flanagan, David, JavaScript: The Definitive Guide, Third Edition, O'Reilly &
Associates, Inc., 1998, ISBN 0-596-00048-0
򐂰 Maruyama, Hiroshi, Kent Tamura and Naohiko Uramoto, XML and Java:
Developing Web Applications, Addison-Wesley 1999, ISBN 0-201-77004-0
򐂰 Flanagan, David, Jim Farley, William Crawford and Kris Magnusson, Java
Enterprise in a Nutshell, O’Reilly & Associates, Inc., 1999, ISBN
0-596-00152-5
򐂰 Subrahmanyam Allamaraju et al, Professional Java Server Programming
J2EE Edition, Wrox Press, 2001, ISBN 1-861004-65-6
Referenced Web sites
These Web sites are also relevant as further information sources:
򐂰 Patterns for e-business Web site:
http://www.ibm.com/developerWorks/patterns/
򐂰 IBM Redbooks internal Web site:
http://w3.itso.ibm.com
How to get IBM Redbooks
You can order hardcopy Redbooks, as well as view, download, or search for
Redbooks at the following Web site:
ibm.com/redbooks
You can also download additional materials (code samples or diskette/CD-ROM
images) from that site.
IBM Redbooks collections
Redbooks are also available on CD-ROMs. Click the CD-ROMs button on the
Redbooks Web site for information about all the CD-ROMs offered, as well as
updates and formats.
Index
Symbols
.NET 146, 148
Numerics
80/20 situation 4
A
Access Integration pattern 25
aggregated content 138
APPLET tag 128
Application Integration 44–45
runtime patterns 76
Application Integration pattern 25, 28
Application patterns 5, 11
Application Server node 69
Architect 23
architecture 137
asynchronous 70
authentication 68, 144
authorization 68, 146
B
back-end applications 69
back-end integration 28
Benefits 31
Best practices 5
Web services 146
Best-practices 16
Business drivers 20
Business patterns 4, 7, 23, 25
C
Cascading Style Sheets 125
Categorization 117
certificates 68
Chrisco Books 153
cHTML 129
Collaboration 26, 72, 138, 145
Collaboration business pattern 25–26
Collaboration node 70
Collaboration pattern 25
collaboration services 145
Command-Manager design pattern 141
common data model 115
Community 70
Composite patterns 5, 9, 23, 25
components 23
Content 143
Content management 72, 143
Content Management node 71
controller 139
cross-selling 71
CSS See Cascading Style Sheets
Custom designs
what are 36
D
data aggregation 18, 73
Data integration 45
data sources 69
Database Server node 71
DB2 Information Integrator for Content 100–101
demilitarized zone (DMZ) 69
Developer 24
DHTML 129
Directory and Security Services node 68
documents 71
Domain Name Server node 68
Dynamic HTML
DHTML 126
E
ECMA-262 126
ECMAScript 126
EIP
see DB2 Information Integrator for Content
EMBED tag 128
Enterprise Information Portal
see DB2 Information Integrator for Content
Extended Enterprise business pattern 25, 27
Extensibility 22, 95
Extensible Markup Language 130
F
Federated Repository 55
Federation 41, 44–45, 55, 65, 105
application pattern 56
product mappings 107
runtime pattern 83
with external data 84
federation 74
Field mapping 115
Firewalls 69
functions 94
G
Guidelines 5, 16, 137, 145
H
HTML 70, 126
Validator tools 125
HTTP 69
HTTP tunneling 128
HTTP/HTTPS 97
I
IBM Global Services 23
Images 71
indexed 71
Information Aggregation 44, 57
runtime patterns 85
Information Aggregation business pattern 25, 27
Integration patterns 5, 7, 23, 25
Internet Service Provider 68
IT drivers 21
J
J2EE 129
Java applets 127
Disadvantages 128
Java programmer 24
Java Runtime Environment 128
JavaBeans 130
JavaScript 126
JavaServer Pages 129
JDBC 97
JRE See Java Runtime Environment
JScript 126
JSP 69, 129
Juru
see WebSphere Portal Search Engine
L
Layered design 147
LDAP directory 137
LDAP/LDAPS 97
leveraging legacy investments 37
Limitations 31
Loose coupling 147
Lotus Discovery Server 102
Lotus Domino 101
Lotus Extended Search 98
agents 209
architecture 207
brokers 212
configuration database 215
links 209
M
Maintainability 22, 95
mobile 72
model 139
Model-View-Controller design 139
Model-View-Controller design pattern 139
Multi-client device 72
Multi-Tier Design 136
MVC structure 142
N
network protocols 97
P
Patterns for e-business 3
Application patterns 5, 11
Best practices 5, 16
Business patterns 4, 7
Composite patterns 5, 9
Guidelines 5, 16
Integration patterns 5, 7
Product mappings 5, 15
Runtime patterns 5, 12
Web site 6
performance considerations 119
personal computing device 68
Personal Digital Assistant (PDA) 134
personal digital assistant (PDA) 68
Personalization 72
Personalization Server (Rules Engine) node 70
Pervasive User node 72
platforms 96
Population 47, 65
Data Cleansing 45–46
application pattern 49
Index Population 44–45, 50, 52, 105
application pattern 52
product mappings 106
runtime pattern 76
with Data Cleansing 82
with external data 80
Multi Step 45
application pattern 48
applied to indexing 78
Multi-step 41, 44, 46, 48
Single Step 44–46
application pattern 47
Synchronization 45, 54
application pattern 54
population 74
Population Crawl and Discovery 50
Population Summarization 50
Portal application design 133
Portal applications 133
Portal characteristics 28
Portal composite pattern 18, 24, 27, 30–31, 137,
141
Portal composite runtime pattern 69, 72
portal implementation 23, 68, 72, 137, 143
Portal search
the need for 36
Portal Search custom design 35
a scenario 153
application patterns 41
business drivers 37
compared to composite pattern 42
IT drivers 37
product mappings 96
protocol mappings 98
Runtime pattern 73
runtime pattern 94
portal system 143
Portlet API 135
Portlets 142
Presentation Server 72
Presentation Server node 70
Process integration 45
Product Descriptions 98
Product mappings 5, 15
products 94
Project Manager 23
Protocol and Domain Firewall node 69
Public Key Infrastructure node 68
Q
Query syntax 114
R
Redbooks Web site 227
Contact us xiii
Replication 54
requirements 30
resource connection pooling 69
resources 71
Reuse 22
RMI/IIOP 97
rules 70
Runtime patterns 5, 12, 28, 67, 72, 94–95
S
Sales 23
saved searches 123
Scalability 22, 95
SCRIPT tag 128
search 74
Search & Indexing 72
security 71, 134, 145–146
Security concerns 118
Self-Service business pattern 25–26
Servlets 128
Signed applet 128
Single Sign-On 31, 70, 138
Single Sign-On (SSO) 144
Single-Tier Design 136
SOAP 140
SOCKS proxy 146
SSL protocol 68
Summarization 116
Sun ONE 146
Swing 127
synchronous 70, 72
systems 18, 138
U
User / Internal User node 68
User Information Access 57
application pattern 58–59
immediate update 59
User Management 145
User Search & Discovery
application pattern 61
runtime pattern 86
search adapter variation 87
search service variation 88
with external users and data 89
User Search and Discovery 41, 44, 61, 64, 105
product mappings 107
V
Validator tools
HTML 125
VBScript 126
Versioning 144
View 139
VPN 68
W
Web container 129
Web Server Redirector node 69
Web Services 136, 140
Web services
Best practices 146
WebSphere Application Server 103
WebSphere Content Publisher 71
WebSphere Portal 103
WebSphere Portal Search Engine 104
usage hints and tips 219
WebSphere Portal Server 135
Wireless Gateway node 72
Wireless Markup Language 129
Workflow 26, 72
workflow 138
X
XML 129–130
Back cover
Patterns: Portal Search
Custom Design
Applying the Information Aggregation patterns to portal search solutions
Hints/tips for using IBM search technologies
A portal search scenario
The Patterns for e-business are a group of proven, reusable assets that can speed the process of developing applications. The Portal Search Custom Design builds off the Portal Composite Pattern, combining Business and Integration patterns to help implement a portal search solution.
Part 1 of this IBM Redbook provides introductory material around the IBM Patterns for e-business and the Portal Composite Pattern.
Part 2 guides you through the process of choosing the Business and Integration patterns of the custom design, and then drills down to the Application and Runtime patterns and Product mappings.
Part 3 provides a set of guidelines for implementing and building a portal search solution, including a discussion of search technology selection criteria, as well as application design and development.
Part 4 demonstrates how to implement a portal search solution via a technical scenario. This technical scenario uses the WebSphere Portal Extend offering, combined with Lotus Extended Search.
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE
IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.
For more information:
ibm.com/redbooks
SG24-6881-00
ISBN 0738498289