IBM InfoSphere Discovery
Version 4 Release 5.1

Sample Projects

SC23-9880-04

© Copyright IBM Corporation 2006, 2011.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Chapter 1. Installing IBM InfoSphere Discovery
   Prerequisites
   Supported Data and Database
   Automatic Database Configuration
   Using IBM InfoSphere Discovery with a Different DB2 Database
   Installing Discovery with IBM DB2 Express Edition
   Uninstalling Discovery
Chapter 2. Introduction to demonstrations about IBM InfoSphere Discovery
   Demonstration Project: Overlaps and the Unified Schema Builder
      Start Discovery Studio and create a project
      Create and populate the data sets
      Import tables from the JDBC connection into the data set
      Create the Region data set
      Create the CRM data set
      Run and review column analysis
      Identifying critical elements
      Discover and review PF Keys
      Discover and review data objects
      Overlaps
      Creating a unified customer model
      Unified column analysis
      Perform match and merge analysis
      Create a report to include in your development specifications
   Demonstration Project: Archiving tables by defining business objects
      Start InfoSphere Discovery
      Create the project and the data sets
      Import CIS tables
      Defining option sets for your analysis
      Analyzing and reviewing discovered relationships
      Adjusting the data object
      Export artifacts
Contacting IBM
Product accessibility
Accessing product documentation
Links to non-IBM Web sites
Notices and trademarks
Index

Chapter 1. Installing IBM InfoSphere Discovery

Discovery consists of three components: Discovery Server, Discovery Engine Service, and Discovery Studio. When you choose to install IBM® DB2® Express® Edition during installation, all three Discovery components must be installed on a single host.

The Discovery installer installs IBM DB2 Express Edition and all database tables necessary to run the demo project, along with a completed version of the project that you can use for reference. If you do not install IBM DB2 Express Edition, the demo project will not be installed and you cannot use this IBM InfoSphere Discovery Sample Projects guide or run the demo project.

Prerequisites

To run the demo project, you can either install IBM DB2 9.7 Express Edition along with Discovery or use an existing installation of IBM DB2 9.7 Express Edition on a Windows platform. Prerequisites are described in the IBM InfoSphere Discovery Installation Guide.

Supported Data and Database

The tutorials and demos used here are pre-configured for IBM DB2 9.7 on Windows. The bundled demo project uses a DB2 source, a DB2 repository, and DB2 staging databases. The IBM InfoSphere Discovery Installation Guide lists the operating system requirements, supported databases, and supported ODBC or JDBC drivers for your production environment.

Automatic Database Configuration

You have the option of installing IBM DB2 Express Edition along with Discovery.
If you do, the Discovery installer automatically preconfigures IBM DB2 Express Edition and IBM InfoSphere Discovery for the demo projects by performing the following actions:
• creating the data sources and loading the tables into the database
• creating the required users and JDBC connections
• creating a default staging data source in Discovery Studio
• importing a completed version of one project into Discovery Studio

The Discovery installer catalogs the system JDBC data sources with the same names as the databases.

Note: IBM DB2 Express Edition cannot be installed on a host that already has any existing DB2 version installed (including other versions of DB2 clients or servers).

To install the bundled IBM DB2 Express Edition version and the pre-configured demo project along with Discovery, make sure any previous DB2 packages are completely uninstalled from the host.

Using IBM InfoSphere Discovery with a Different DB2 Database

You may use a different DB2 database with IBM InfoSphere Discovery, but the installer will not automatically preconfigure it or Discovery Studio.

Installing Discovery with IBM DB2 Express Edition

About this task

The following instructions are for installing IBM InfoSphere Discovery with IBM DB2 Express Edition.

Note: If IBM InfoSphere Discovery cannot be installed using these steps, install the product using the instructions in the IBM InfoSphere Discovery Installation Guide.

Procedure

1. Make sure the host meets the hardware and software prerequisites.
2. Close all applications and windows on the machine.
3. In a file explorer window, open the <installation_disk>/CD directory, then double-click the file install.exe. The installation package starts extracting, which can take up to one minute. When it is finished, the installer's Introduction screen appears.
4. Click Next to start the installation.
5. Accept the license agreement and click Next.
6. On the remaining screens, click Next to accept the default options.
7. If any of the following situations occurs during installation, take action as noted below.
   • If an error message states that IBM DB2 Express Edition cannot be installed, you have the following options:
     – Quit installation and completely uninstall any existing DB2 product from the machine (including deleting the DB2 directory), then start Discovery installation again.
     – Uncheck the IBM DB2 Express Edition option in the installer, then continue installation. IBM DB2 Express Edition will not be installed and you will not be able to use this IBM InfoSphere Discovery Sample Projects guide or run the demo project.
   • If the Discovery Server Port screen states that some or all of the required ports are unavailable, change the ports as prompted. Contact your system administrator if needed.
   • If a security notice about blocking Java 2 Platform Standard Edition Binary appears, click Unblock to allow Windows to access Java.
   • If the installer asks to install Microsoft Visual J# 2.0 Redistributable Package, click Next to accept the installation.
   • If an error message states that the installer did not successfully preconfigure IBM DB2 Express Edition or Discovery Studio, you will not be able to use this IBM InfoSphere Discovery Sample Projects guide or run the demo project.
8. In the Discovery Server Host screen, enter the following value:
   • Discovery Server Hostname: localhost
9. When the Start Both Services screen appears, click Next and then Done to close the Discovery installer.

Results

IBM InfoSphere Discovery and IBM DB2 Express Edition are now installed. The appropriate ODBC connections, users, and databases are created, the demo tables are loaded, and you are ready to start Discovery Studio.

Uninstalling Discovery

About this task

The uninstaller automatically uninstalls Discovery Server, Discovery Engine Service, and Discovery Studio.
To uninstall IBM InfoSphere Discovery:

Procedure

1. Stop Discovery Studio, Discovery Server, and Discovery Engine Service. Make sure no Discovery Studio tasks are queued or running.
2. From the Start menu, select Programs > IBM InfoSphere > Discovery > Uninstall IBM InfoSphere Discovery.
3. Accept the default, Full, by clicking Next.
4. The uninstaller stops the selected components, if they are running, and uninstalls them from the machine.
5. If any components or files could not be uninstalled, a message appears. In most cases these are logs, configuration files, or user-created files. These files do not contain any project data and can be deleted.

Results

IBM InfoSphere Discovery is now uninstalled.

Chapter 2. Introduction to demonstrations about IBM InfoSphere Discovery

By using InfoSphere® Discovery, you can find and manipulate relationships. These demonstrations display some of the basic principles of Discovery.

These instructions assume the following things:
• You installed IBM DB2 Express Edition with IBM InfoSphere Discovery.
• IBM DB2 Express Edition and InfoSphere Discovery Studio are successfully preconfigured with the following objects:
  – The necessary data sources are created.
  – The tables are loaded in IBM DB2 Express Edition.
  – The required users and JDBC connections are created.
  – A default staging server is created in IBM InfoSphere Discovery Studio.

As part of this preconfiguration, four completed demonstration projects are imported into Discovery Studio. You can review the completed projects before you run these learning modules, and use them for reference as you work.

Important: If you did not install IBM DB2 Express Edition, your demonstration projects will not be automatically configured. See the IBM InfoSphere Discovery User Guide for instructions on creating projects and executing tasks.
Learning objectives

The objective of the demonstrations is to help you understand how to use InfoSphere Discovery to analyze data. Specifically, you will be able to do the following:
• Create a project.
• Create and populate data sets.
• Run and review column analysis.
• Discover and review primary and foreign keys.
• Discover and review data objects.
• Discover and review overlaps and unified schemas.

Time required

Each demonstration should take approximately 60 minutes to finish. If you explore other concepts related to the demonstrations, it could take longer to complete.

Demonstration Project: Overlaps and the Unified Schema Builder

Consolidating data from multiple systems can be difficult. IBM InfoSphere Discovery enables a 4-step methodology for prototyping the artifacts for the final solution. The four steps are:
1. Inventory the data landscape.
2. Model the target.
3. Map to and analyze the target.
4. Perform match and merge analysis.

The Discover_Data_Consolidation sample project contains three data sets already defined and configured for you:
• CRM
• Region
• Community

Each of the data sets appears as a tab in the Data Sets view. You can view the connection information for any of the data sets by using the following procedure:
1. Click on a data set tab.
2. Right-click on the data set in Database Connections & Tables.
3. Select Edit the selected connection.

You can also view the data content by clicking the Column Analysis tab.

Learning objectives

After completing the lessons in this module you will be able to consolidate data from multiple source systems. This module should take approximately 60 minutes to complete.

Start Discovery Studio and create a project

You can create your own project to start learning to consolidate data from multiple systems. All work in IBM InfoSphere Discovery is done in projects. Begin the lesson by opening Discovery Studio and then creating a project.

1. From the Windows Start menu, select Programs > IBM InfoSphere > Discovery > Discovery Studio.
   Discovery Studio opens and automatically connects to the Discovery Server. The sample projects that were loaded during installation, Discover_Data_Consolidation, Discover_PFKey_DataObject, and Discover_Sensitive_Critical_Data, appear in the project list of the Source Data Discovery tab. There is also a sample project in the Transformation Discovery tab called Discover_Transformation.
   To hide the Error List and Output pane, click the button in the upper right corner of the pane.
2. In the Source Data Discovery tab, click New Project. You can create as many projects as necessary, but only one project can be open at a time.
3. In the Name field, type the name of the project. In this example, type Training - Overlaps and Unified Schema Builder.
4. Clear the Use Password checkbox, or enter a password to protect your project from unauthorized access. Use the default settings for the other fields.
5. Click OK. The project Training - Overlaps and Unified Schema Builder is now created. Discovery automatically opens the next tab, Data Sets.

You can click the Home tab to see the Training - Overlaps and Unified Schema Builder project in the Source Data Discovery project list.

Create and populate the data sets

The new project requires three data sets. Create and name the data sets, specify a JDBC connection for each one, and import tables into each one.

1. In the Data Sets tab, click Rename. In the dialog box, type Community and click OK.
2. Click the Click here to add a new connection link.
3. In the Create Connection window, complete the following fields using the values shown:
   • Connection Name: Community Source
   • Database Server Name: localhost
   • Database Name: ISD_SRC
   • User Name: ISD_MDM
   • Password: ISD_user1
4. In the Create Connection window, click Test Connection to verify the connection parameters.
5. Click OK to save the connection. The Community Source connection is added to the Import Objects list under the Database Connections & Tables section.

You have created a new data set and specified JDBC connection information for that data set.

Import tables from the JDBC connection into the data set

After creating the data set, you need to add tables to begin working with the data.

1. In the Import Objects list of the Data Sets tab, right-click the JDBC connection, Community Source, that you created in the previous lesson. Select Import Tables/File Formats from the drop-down menu.
2. In the Import Table Wizard, click Search Tables.
3. In the Table Name field, type COMMUNITY_ to search for tables that have names that begin with that string.
4. Click Next.
5. The search finds three tables beginning with the string COMMUNITY_. Click Select All and then click Finish to select all three tables to import.

The tables are imported into the Community data set. The physical tables are listed in the Database Connections & Tables list and are appended with _PT. One logical table is created for each physical table, and is listed in Logical Tables.

Create the Region data set

You can now create the second data set that is needed for this demonstration and import tables into it. You follow the same steps that you did for the Community data set.

1. Right-click on the Community tab and select Add Data Set.
   A second, blank data set is added to the project.
2. Rename the second data set by clicking Rename and changing the name to Region.
3. Click the Click here to add a new connection link.
4. In the Create Connection window, complete the connection information by using the following values:
   • Connection Name: Region Source
   • Database Server Name: localhost (same as previous connection)
   • Database Name: ISD_SRC (same as previous connection)
   • User Name: ISD_MDM (same as previous connection)
   • Password: ISD_user1 (same as previous connection)
5. Click OK.
6. In the Import Objects list of the Data Sets tab, right-click the JDBC connection, Region Source, that you just created. Select Import Tables/File Formats from the drop-down menu.
7. In the Import Table Wizard, click Search Tables.
8. In the Table Name field, type Region_ to search for tables that have names that begin with that string.
9. Click Next.
10. The search finds three tables beginning with the string Region_. Click Select All and then click Finish to select all three tables to import. These are the tables to import:
   • ISD_MDM.REGION_ACCT_NAMES
   • ISD_MDM.REGION_ADDR_TYPE
   • ISD_MDM.REGION_BRCH

The tables are imported into the Region data set. The physical tables are listed in the Database Connections & Tables list and are appended with _PT. One logical table is created for each physical table, and is listed in Logical Tables.

Create the CRM data set

Create the third of the three data sets and populate the data set. You follow the same steps that you did for the Region and Community data sets.

1. Right-click the Region tab and select Add Data Set. A third blank data set is added to the project.
2. Rename the third data set by clicking Rename and changing the name to CRM.
3. Click the Click here to add a new connection link.
4. In the Create Connection window, complete the connection information by using the following values:
   • Connection Name: CRM Source
   • Database Server Name: localhost (same as previous connection)
   • Database Name: ISD_SRC (same as previous connection)
   • User Name: ISD_MDM (same as previous connection)
   • Password: ISD_user1 (same as previous connection)
5. In the Import Objects list of the Data Sets tab, right-click the JDBC connection, CRM Source, that you just created. Select Import Tables/File Formats from the drop-down menu.
6. In the Import Table Wizard, click Search Tables.
7. In the Table Name field, type CRM_ to search for tables that have names that begin with that string.
8. Click Next.
9. The search finds three tables beginning with the string CRM_. Click Select All > Finish to select all three tables to import. These are the tables to import:
   • ISD_MDM.CRM_ACCT_TYPE
   • ISD_MDM.CRM_ADDRESS_TYPE
   • ISD_MDM.CRM_BRCH_1A

The tables are imported into the CRM data set. The physical tables are listed in the Database Connections & Tables list and are appended with _PT. One logical table is created for each physical table, and is listed in Logical Tables.

Run and review column analysis

Column analysis is performed individually on each table within each data set. The Column Analysis tab displays information about all columns in the data sets, such as the following information:
• Metadata
• Data types from physical or logical tables (Native Type)
• Data types used in the staging database
• Formats for textual data discovered as number (NUMBERSTRING) or date-time data (DATETIMESTRING)
• Statistics gathered during the profiling step

If necessary, you can manually change the data type of a column and some other metadata. You can use data preview to verify the actual data.
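
The statistics gathered during profiling include measures such as cardinality, selectivity, minimum, maximum, and mode. As a rough illustration of what these measures mean, the following Python sketch computes them for a single column of values. It is not how Discovery computes its statistics; the function name profile_column, the treatment of None as a null, and the sample values are assumptions made for this example only.

```python
from collections import Counter


def profile_column(values):
    """Return basic profile statistics for one column's values.

    Hypothetical helper for illustration only. None entries are
    treated as nulls and excluded from the value statistics.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    total = len(values)
    cardinality = len(counts)  # number of distinct non-null values
    return {
        "rows": total,
        "nulls": total - len(non_null),
        "cardinality": cardinality,
        # Selectivity: distinct values divided by the total number of rows.
        "selectivity": cardinality / total if total else 0.0,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        # Mode: the most common value in the column.
        "mode": counts.most_common(1)[0][0] if counts else None,
    }


# Made-up sample data, not taken from the demo tables.
stats = profile_column(["NY", "CA", "NY", "TX"])
print(stats["cardinality"], stats["selectivity"], stats["mode"])  # 3 0.75 NY
```

A selectivity close to 1 means that almost every row holds a different value, which is typical of key-like columns; a low selectivity is typical of code or category columns.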

Tip: Always re-run Discovery, including Column Analysis, after importing tables or text files, changing a primary sample set, reloading or reimporting tables, or performing any other action that affects the contents of a table, file, or data set.

1. In the Data Sets tab, click Run Next Steps.
2. In the Processing Options window, click Run to accept the defaults and queue the Column Analysis task for processing on the tables in the project.
   As soon as you queue the task, the Column Analysis tab appears. Imported metadata and other information available without discovery is displayed. While the task is queued or running, the project is locked. You can click on other tabs but you cannot make any changes that affect the project, such as adding data sets or tables, while a project is locked.
   Notice in the following Column Analysis figure that Discovery can include textual data types (the SSN column is a NumberString).
3. When processing is complete, review the results in the Column Analysis tab. Review the tables in the data sets by clicking on each data set tab to display its tables in the Tables list, and then clicking each table to display the column information in the center grid.
   To display the actual data in a selected table, click Preview Data. You can sort, filter, and export the data from the preview.
4. Verify that all of the metadata is correct. Imported or discovered metadata is shown in the first nine columns of the center grid in the Metadata category. The Data Type, Length, Precision, Scale, and Formats fields are editable, if necessary. If you change any of those values, click Re-Run Step to re-run Column Analysis.
5. Review the discovered statistics in the remaining columns of the center grid.
   Scroll to the right if necessary to view all columns in the center grid. These statistics can help you identify which columns or tables might be related. Statistics cannot be manually changed.
   For example, Cardinality and Selectivity are used together to identify how unique the values are in a column. Click Value Frequencies in the menu bar for a list of each value in the column and how often it appears. Min and Max display the actual smallest and largest values in the column. Mode is the most common value in the column.

You have done a very basic column analysis to get an understanding of the data in these sample data sets.

Identifying critical elements

In most projects you understand at least one data source better than the others. For the purposes of this lesson, assume that you know the CRM data better than the other sources.

Because you know the CRM source, you first need to mark up the known critical data elements (CDE).

1. Click the Column Analysis tab.
2. Select the CRM data set.
3. Select the table CRM_BRCH_1A.
4. Select the following boxes in the CDE column:
   • FIRST_NAME
   • LAST_NAME
   • TAX_ID
   • ADDRESS_LINE_1
   • CITY
   • STATE
   This process identifies these particular columns with specific attributes that you want to include in your new target schema.
5. Click Run Next Steps to process these attributes.

You can go into the other data sets to mark any data elements that you recognize as critical to retain in the consolidated project. You can also use the Value Frequencies, Pattern Frequencies, or Length Frequencies views to examine the data content of a column that you think might be critical. These CDEs help you focus on the relationships in later discovery steps.

Discover and review PF Keys

PF Key discovery is performed across all columns within each data set. PF Keys are primary-foreign key pairs.
InfoSphere Discovery discovers column matches, which are relationships between the data in two or more columns in different tables in the same data set. Based on the statistics and additional calculations not shown, Discovery promotes certain column matches to the status of PF Keys. The PF Key with the best statistics for each column pair is selected as the primary PF Key for that column pair.

1. In the Column Analysis tab, click Run Next Steps.
2. In the Processing Options window, ensure that the slider is next to PF Keys. Click Run to run PF Key discovery. Processing will take a minute or two to complete.
3. When processing is complete, review the discovered primary-foreign keys by clicking on each data set tab. The Connected Tables and Unconnected Tables in each data set are listed on the left of the screen and are also shown graphically in the center pane. Expand the list and each table in the list to view its PF Keys and column matches. Scroll the display until the PF Keys and column matches of interest are visible in the center pane, dragging tables (boxes) and relationships (lines) to rearrange them as necessary. The Display Mode allows you to filter the center panel to show only column matches, only PF Keys, or only the selected item. Zoom is also useful.
4. Review the statistics for each PF Key by clicking on its connection in the Connected Tables list or on the connecting line in the center pane. The SQL for the selected PF Key and its discovered statistics are displayed in the grid below the center pane.

You now know something about the primary and foreign key relationships and have a better understanding of the data.

The statistics for each relationship are based on the join expression, shown in the Foreign Keys tab. There might be several join expressions discovered for each relationship, each with different statistics.
• Row Hit Rate (RHR) is the total number of table rows that satisfy the PF Key expression.
• Value Hit Rate (VHR) is the number of unique values that satisfy the PF Key expression.
• Cardinality is the number of unique value combinations involved in the PF Key expression.
• Selectivity is the Cardinality divided by the total number of rows.

A strong PF Key relationship has a high RHR, high VHR on the primary and foreign side, and a high Selectivity on the primary side.

In some cases, especially when the statistics for all discovered relationships are similar, you might need to investigate further to determine which relationships are valid and which join expression is the best. The Show Hits, Show Misses, or Show Duplicates drop-down button allows you to preview the actual data in the tables.

Discover and review data objects

Data object discovery is performed across all tables within each data set. A data object is a logical cluster of all tables in a data set that have one or more columns that contain data that is related to the same business entity. Data objects are not maps, but instead represent an object view of related tables. By grouping tables in this way, InfoSphere Discovery can narrow the focus of the analysis to only the tables that are known to be related.

Each table in the data set is represented in at least one data object, and a data object can contain as many tables as necessary. If more than one PF Key was found between a pair of tables, Discovery creates one data object for the tables based on the primary PF Key. A data object with only one table means that no other tables in the data set contain data that is related to that table.

Tip: A table that is not related to any others within its own data set may still be related to a table in another data set. Discovery across data sets is performed in the Target Matches step, which is not included in these lessons.
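
One simplified way to picture how tables cluster into data objects is to treat tables as nodes and primary PF Keys as links, and then collect the groups of linked tables. The Python sketch below does this with a small union-find structure. It is an illustration under stated assumptions, not Discovery's actual algorithm: the function name group_into_data_objects and the table names are hypothetical, and the sketch places each table in exactly one group, whereas in Discovery a table can be represented in more than one data object.

```python
def group_into_data_objects(tables, primary_pf_keys):
    """Cluster tables into groups linked by primary PF Keys.

    Hypothetical helper for illustration only.
    tables: iterable of table names.
    primary_pf_keys: iterable of (table_a, table_b) pairs; both names
        must also appear in tables, otherwise a KeyError is raised.
    Returns a sorted list of sorted table-name lists, one per group.
    """
    parent = {t: t for t in tables}

    def find(t):
        # Walk up to the group representative, shortening the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in primary_pf_keys:
        # Merge the groups that contain the two related tables.
        parent[find(a)] = find(b)

    groups = {}
    for t in parent:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Made-up table names: two tables share a primary PF Key, one is unrelated.
objects = group_into_data_objects(
    ["CUSTOMER", "ORDERS", "HOLIDAYS"],
    [("CUSTOMER", "ORDERS")],
)
print(objects)  # [['CUSTOMER', 'ORDERS'], ['HOLIDAYS']]
```

In this sketch the unrelated table ends up in a group of its own, which corresponds to a data object that contains only one table.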

For example, assume a data set contains three tables. In the PF Keys step, Discovery found several primary-foreign keys between two of the tables and selected one PF Key as primary. In the Data Objects step, Discovery creates two data objects: one for the two tables related by the primary PF Key, and one for the unrelated third table.

1. In the PF Keys tab, click Run Next Steps.
2. In the Processing Options dialog, click Run to execute Data Object processing.
3. When processing is complete, verify that the data objects are sensible and accurate, as measured by the statistics and your knowledge of the data. The data objects discovered within each data set are shown in the Data Objects list on the left of the screen. Expand each data object in the list to display the tables in it. When you click on a data object or one of its tables, the data object is displayed in the center pane. Scroll the center pane display, if necessary, to see all of the tables and relationships within a data object, dragging tables (boxes) and relationships (lines) to rearrange them as necessary. Click on a connecting line in the diagram to display statistics about the PF Key relationships between the two tables.

You have reviewed the PF Key relationships in the data objects and are satisfied with the validity of the relationships.

Overlaps

The main task in the Overlaps tab is to review the discovered overlaps. This includes viewing the column data to verify that the overlaps are useful and valid, deleting incorrect overlaps, and adding overlaps that you know exist but were not discovered. Accurate results provide a clear picture of overlapping data in your data sources.

1. In the Data Objects tab, click Run Next Steps.
2. In the Processing Options window, click Run.
3. When processing is complete, review the overlaps.
   Results are provided separately for each data set, but are combined into Data Set Summary and Data Set Overlaps pages. The graphic on the top-level Data Set Summary page provides a visual summary of the overlap statistics. Each group of columns corresponds to a row in the grid above the graphic.
4. Review the results by clicking on the statistics to drill down into the data. The Data Set Summary pages and Data Set Overlaps pages each have three levels: Data Set Summary, Table Summary, and Column Summary.
   When you examine the CRM data set, you can see that 22 out of the 33 columns overlap with columns from other data sets. These overlaps can provide important insight into the relationships between the CRM data set and other data sets.
   a. Click on the 22 in the table. You see a list of all CRM columns that overlap in data value (exact values) with some columns in Region and Community. Where you see zeroes, which indicate low overlap, the two sets of data do not have much in common.
   b. Examine the overlap on a critical data element such as LAST_NAME, which is an important natural key. A high degree of overlap on LAST_NAME is a good indicator of overlapping customers. In this case, out of 77 last names in CRM, 34 of them appear in Region, and 35 of them appear in Community.
   c. Confirm your conjecture that CRM has common customers with Region by clicking the 34 hits to see the Region overlap details. Now you can review the overlapping names in the Value Overlap Details window.
   d. In the Value Overlap Details window, click the data row and then click Show Hits to see the actual overlapping last names. You can also select Show Misses to see the names that are not common. The Preview Criteria window opens. Click OK to close it and open the Matches Data Preview window. Click Close until you return to the Overlaps view.
   Tip: Overlap displays can help you find more critical data elements. For example, if a column has a very high cardinality and strong overlap, it is likely to be an important natural key that exists in all sources that contain customer data. You can use data views and data profiles, including all types of frequencies, to investigate the nature of these columns. If they are indeed meaningful, you can mark them up as CDEs on the Overlap tab, or on the Column Analysis tab.
5. When you have reviewed all overlaps and deleted any incorrect overlaps, select Project > Save to save the project.

Lesson checkpoint

You have used the Overlaps information to help you further understand the customer information. This is what you have accomplished to this point in the lessons:
• You have marked CDEs and you might have discovered a few CDEs that you were not aware of previously.
• You know that the CRM, Community, and Region data sets have overlapping customer populations.
• You understand the table relationships within each of these data sets.
• You are ready to prototype a canonical customer table.

Creating a unified customer model

Now that you have an inventory of the data sources on hand, you are ready to prototype a table that contains customer data from all relevant sources. You want your consolidated table to account for all the critical data elements that you have marked in previous steps, so that it models critical customer properties.

1. Click the Unified Schema tab.
2. Click the plus (+) symbol in the menu bar under Target Table Navigator to add a new table.
3. Click on the new table to edit the name, and type ALL_CUSTOMERS. You now have a target table to model customers, except that it does not yet contain any columns.
4. Select the new table and then select the Target Table Schema tab. There are currently no columns in the ALL_CUSTOMERS table.
5. On the right source tree, click on the drop-down next to the CDE header and click Checked. The source tree now only displays the CDE elements that you selected earlier.
6. Drag and drop the Table:CRM_BRCH_1A into the empty middle pane. You have adopted all of the CDEs in table CRM_BRCH_1A into the target model. You can modify these definitions. For example, change TAX_ID to SSN by clicking on TAX_ID and typing SSN.
7. Create the source maps. You want to map all three sets of source data to the new target table, ALL_CUSTOMERS. To do this, create a map for each data set.
   a. Click the Source Mapping tab.
   b. Select the CRM data source from the drop-down list. The CRM map is already filled out, because you adopted data elements into the target schema in a prior lesson. Therefore, the CRM source map is complete.
   c. Select the Region data source from the drop-down list.
   d. Click Suggest Transformations to display any suggestions that Discovery might provide.
   e. Select all of the suggestions that are good. Then click OK.
   f. Click Preview Data to review the results of the mapping.
   g. Select the Community data source, click Suggest Transformations, and map the data source as you did for the Region data source.

Unified column analysis

You want to test the combined source maps by using unified column analysis. Profile all of the maps and the combined results.

1. From the Unified Schema view in your new ALL_CUSTOMERS table, select Run Next Steps.
2. In the Processing Options window, click Run.
3. Select the Unified Column Analysis tab to refine the source maps.
4. Expand the SSN column to see the unified profile of this target column. Observe that the data appears in two formats.
5. With SSN still selected, click on the drop-down menu that is labeled Value Frequency and click Pattern Frequency.
   The frequency result shows that the combined SSN (from three maps) contains 51 social security numbers in a no-dash format, all of them coming from the Region map. The other two sources contribute social security numbers in a dashed format. You want to fix this inconsistency. Click OK to close the Pattern Frequency view.
6. Click the Source Mapping tab.
7. Select Region from the drop-down menu.
8. Click the SSN row. In the Expression editor in the lower pane, type a transformation instruction that inserts dashes into the social security numbers, as in the following example:

   substr(REGION_BRCH.SSN,1,3) || '-' || substr(REGION_BRCH.SSN,4,2) || '-' || substr(REGION_BRCH.SSN,6,4)

9. Click Preview Data to verify that the transformation instruction is correct.
10. Click Run Next Steps to run unified column analysis.
11. Ensure that the SSN target column is now mapped consistently by all of the maps.
12. Click Preview Data again. Click LAST_NAME to sort all records on last name, so that you can see the mixture of records in this view. This table has all of the customer information in the correct format, but there are duplicates and discrepancies, which you can analyze in the next lesson.

You have profiled three source maps side by side, and you have used the unified column statistics and unified pattern frequency to identify problems with the diverse maps and to bring the data to consistency. You can use the Preview Data button on the Unified Column Analysis page to see the combined mapping results in the target format.

Perform match and merge analysis

You are going to try to determine the best keys to use for matching duplicate rows, and analyze the potential data conflicts. Match and merge analysis is performed on target table schemas.

1. Click the Match and Merge Analysis tab under the Unified Schema tab.
2. Click the plus (+) symbol on the menu bar in the middle section to add a new matching condition.
3. Highlight the Matching Condition cell, and enter this condition in the window at the bottom of the view:

   DM_ROW1.SSN = DM_ROW2.SSN

   This matching condition means that given any two rows in the table, DM_ROW1 and DM_ROW2, if they have the same SSN value, then they might represent the same customer. If they do, Discovery adds them to the same group and tries to merge all of the records in that group into a single record. For this lesson, you used a simple matching condition that matches on a single column, SSN. You could also use a more complex matching condition. For example:

   DM_ROW1.SSN = DM_ROW2.SSN and DM_ROW1.LAST_NAME = DM_ROW2.LAST_NAME

   Discovery allows you to take advantage of the power of SQL expressions, including user-defined functions (UDFs). For instance, you can use fuzzy matching functions that are provided with the product installation, such as DMCOMPARE_LCS or DMCOMPARE_EDITDST, or any DB2 UDF that you created.
4. Click Re-Run Step.
5. In the Processing Options window, select the checkboxes for both Match and Merge, then click Run.
6. After the run completes, look at the statistics that Discovery produces for the matching condition. You want to assess the correctness of the matching condition by using the Groups views. Now that you have entered and processed a matching condition, determine its accuracy. Does it match what you think it should match? Does it accurately group all records for the same customer into one group? To assess the semantics and strength of a matching condition, use the Groups views to see the following:
   - All Groups
   - Consistent Groups
   - Groups with Discrepancies
   - Exclusive Groups
   - Groups with Source Duplicates
   - Groups without Source Duplicates
7. Select Groups with Discrepancies to view matches with conflicting data. If you grouped the name John with the name Betty, you might need to reconsider the matching condition.
   If the discrepancy level is low and you can explain the existing discrepancies, then the matching condition is good.
8. In the data view that opens, click the Conflict Count column header to sort the values. Find the group with the greatest number of conflicts and select this group.
9. Click the Details icon (the small binoculars) to see the conflicts in this most troubled group. The conflicts are due to non-standard data and other issues that do not invalidate the matching condition by SSN. Therefore, SSN is a good match key.

Create a report to include in your development specifications

Your goal was to prototype the consolidation of three data sources into one customer data set. You now know the guidelines that should be followed when developing the Customer Master. You have also identified potential areas for data cleanup prior to the consolidation. You now need to develop specifications that will allow you to create the Customer Master data set.

1. Click the Match and Merge Analysis tab under the Unified Schema tab.
2. Click Merge Summary in the menu bar.
3. The Merge Summary view summarizes the results of your prototyping. Export this information for your developers by clicking the export icon and selecting Export All Rows.
4. A file browser window opens and you can select a target folder and file name to save the report. You can save the report in several formats, such as csv, html, xls, xml, or tsv. Click Save. You can now share the report with others.

Lesson checkpoint

In this lesson, you have seen that even good matching conditions do not remove conflicts. When there are conflicts, that is, when different data sets provide different information about the same customer, you can use the set of facilities in InfoSphere Discovery that discover a trust index and help you prototype conflict resolution logistics. You have now prototyped the consolidation of three data stores into a new Customer Master.
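As a recap of the matching step in this lesson, the following minimal sketch shows how an equality condition on SSN partitions rows into groups and how conflicts inside a group can be counted. It is an illustration under stated assumptions, not Discovery's algorithm: the helper names, the sample rows, and the definition of a conflict (a compared column with more than one distinct value in a group) are hypothetical. The add_dashes helper mirrors the substr expression that was typed for the Region source map.

```python
from collections import defaultdict


def add_dashes(ssn):
    """Insert dashes into a nine-character SSN, like the substr expression."""
    return f"{ssn[0:3]}-{ssn[3:5]}-{ssn[5:9]}"


def group_rows(rows, key):
    """Put rows with the same key value into the same group.

    This is equivalent to the pairwise condition ROW1.key = ROW2.key because
    equality is transitive; a fuzzy condition would need a different approach.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def conflict_count(group, columns):
    """Count the columns on which the rows of one group disagree (assumed definition)."""
    return sum(1 for column in columns if len({row[column] for row in group}) > 1)


# Hypothetical rows; the Region SSN arrives without dashes and is normalized first.
rows = [
    {"SOURCE": "CRM", "SSN": "000-12-3456", "LAST_NAME": "Smith", "CITY": "Austin"},
    {"SOURCE": "Region", "SSN": add_dashes("000123456"), "LAST_NAME": "Smith", "CITY": "Dallas"},
    {"SOURCE": "Community", "SSN": "000-98-7654", "LAST_NAME": "Lee", "CITY": "Boston"},
]

groups = group_rows(rows, "SSN")
conflicts = {ssn: conflict_count(group, ["LAST_NAME", "CITY"]) for ssn, group in groups.items()}

for ssn, count in conflicts.items():
    print(ssn, "rows:", len(groups[ssn]), "conflicts:", count)
```

Without the dash normalization, the CRM and Region rows would land in different groups, which is why the inconsistent SSN formats had to be fixed before matching.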
- You have prototyped a matching condition.
- You have used InfoSphere Discovery features to assess the correctness of the matching condition.
- You have built the following prototype artifacts:
  - inventory
  - unified model
  - source maps, including profiling and testing across maps
  - match and merge analysis

To accomplish these tasks you inventoried the data landscape, modeled the new target schema, determined how to map each source into the new schema, determined the best keys to use for matching duplicate rows, and analyzed the potential data conflicts.

Demonstration Project: Archiving tables by defining business objects

Use IBM InfoSphere Discovery to create a complete business object for Optim archiving.

To deploy an Optim archiving solution successfully, you must archive business objects, that is, sets of tables that are related to each other. Correctly identifying business objects is often a complex task. A typical data set has a large number of tables and might not have well declared or documented foreign or primary keys. It can be a challenge to work with such a data set to establish the boundary of tables to be archived together and maintain the correct relationships between them. IBM InfoSphere Discovery can help you meet these challenges to create business objects.

After reviewing Optim solutions with IBM sales representatives, Company A decides to pursue the strategy of archiving orders that are older than two years from its Customer Information System, also known as CIS. Before Company A can configure an Optim Access Definition, they need to find out the location and content of the Orders table and the related tables by using InfoSphere Discovery.

Learning objectives

After completing the lessons in this module you will understand that InfoSphere Discovery automatically discovers implicit relationships in a large schema and also clusters tables into business objects. In this module, you complete these tasks:
- Start InfoSphere Discovery.
- Create data sets containing tables from which you want to create business objects.
- Use Discovery to find and review foreign keys.
- Use Discovery to find and review business objects, also called data objects.

Time required

This module should take approximately 60 minutes to complete.

Start InfoSphere Discovery

You can use InfoSphere Discovery to look for meaningful primary and foreign key relationships. A relationship is meaningful if the primary side has a selectivity of one (or close to one) and 100% (or close to 100%) of the values on the foreign side match a value on the primary side.

1. From the Windows Start menu, select Programs > IBM InfoSphere > Discovery > Discovery Studio. Discovery Studio opens and automatically connects to the Discovery Server. You see a sample project that was loaded during installation, Discover_PFKey_DataObject, that you can refer to during these lessons.
2. Click OK in the Server Connection dialog window.

Create the project and the data sets

Create the data sets that contain tables from which you want to create business objects. For this lesson, you will be working with a source data discovery project.

Create the project Training PFKey_DataObjects:

1. Click the Source Data Discovery tab.
2. Click New Project to create a new project.
3. Type Training PFKey_DataObjects in the Name field.
4. Ensure that the Use Default Staging checkbox is selected to use the default staging database.
5. Clear the Use Password checkbox so that you do not require a password for this project.
6. Click OK to create the new project.
7. In the Data Sets window, click Rename to rename the data set from its default name to CIS.
8. In the Import Objects list, click the Click here to add a new connection link. You can now connect to one or more relational databases or add text files into the CIS data set. In this lesson, CIS exists as an Oracle database.
   You connect to the CIS database by providing ODBC information to Discovery in the Create Connection window.
9. In the Create Connection window, complete the following fields using the values shown:
   - Connection Name: CIS
   - Database Server Name: localhost
   - Database Name: ISD_SRC
   - User Name: ISD_ASSETS
   - Password: ISD_user1 (the password is case sensitive)
10. In the Create Connection window, click Test Connection to verify the connection parameters.
11. Click OK to save the connection.

Tip: You can create more than one connection to the database. With multiple connections, you can discover relationships between schemas or even between different databases.

Import CIS tables

After you connect to the database, you need to import tables to prepare for analysis. You can import all tables from this connection, or selectively import the tables that you need.

1. From the Data Sets window, right-click the CIS connection in the Database Connections & Tables view.
2. Click Import Tables/File Formats.
3. In the Import Table Wizard, specify Search Tables.
4. In the Table Name field, type ISD_ASSETS. (the name followed by a period). If you know that something is common to all the relevant tables, such as a common prefix or a common user name, you can search for these tables to import them. In this lesson, you search for all tables owned by CIS.
5. Click Next. The wizard shows the result of the search: all tables with names that begin with ISD_ASSETS. (including the period).
6. Click Select All and then click Finish to import all of the selected tables. For each table that is imported, Discovery displays both a physical table and a logical table. The logical tables are identical to the physical tables. The rest of the analysis in these lessons is performed on logical tables. For these lessons, do not make changes to logical tables; just remember that they are the same as the actual tables from the CIS database.
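With the tables imported, it may help to see the two statistics that this module uses to describe a meaningful relationship: selectivity on the primary side and hit rate on the foreign side. The following is a minimal sketch under stated assumptions, not the product's algorithm. It assumes that selectivity is the ratio of distinct values to non-null rows and that hit rate is the fraction of non-null foreign rows whose value exists on the primary side; the function names and sample columns are hypothetical.

```python
def selectivity(values):
    """Ratio of distinct values to non-null rows (assumed definition)."""
    non_null = [v for v in values if v is not None]
    return len(set(non_null)) / len(non_null) if non_null else 0.0


def foreign_hit_rate(foreign_values, primary_values):
    """Fraction of non-null foreign rows whose value appears on the primary side."""
    primary = set(primary_values)
    non_null = [v for v in foreign_values if v is not None]
    if not non_null:
        return 0.0
    return sum(1 for v in non_null if v in primary) / len(non_null)


def is_candidate_pf_key(primary_values, foreign_values,
                        min_selectivity=1.0, min_hit_rate=1.0):
    """Apply both thresholds; 1.0 and 1.0 describe a perfect key relationship."""
    return (selectivity(primary_values) >= min_selectivity
            and foreign_hit_rate(foreign_values, primary_values) >= min_hit_rate)


# Hypothetical columns: a primary key and two candidate foreign key columns.
customer_ids = [1, 2, 3, 4]
order_customer_ids = [1, 1, 2, 4, 4]      # every value exists in customer_ids
orphaned_customer_ids = [1, 2, 9, 9, 4]   # two rows reference a missing customer

print(is_candidate_pf_key(customer_ids, order_customer_ids))
print(is_candidate_pf_key(customer_ids, orphaned_customer_ids))
print(is_candidate_pf_key(customer_ids, orphaned_customer_ids, min_hit_rate=0.5))
```

Lowering the thresholds, as in the last call, corresponds to accepting imperfect relationships that then need manual review.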
Tip: You can use the logical tables in Discovery to perform more advanced analysis than the analysis that is used in this scenario.

Defining option sets for your analysis

Before you can analyze the tables that you imported in the previous lesson, you need to set the options for the analysis.

InfoSphere Discovery performs sophisticated analysis of data to discover relationships and other data properties. Option sets are a way for you to tell Discovery what you know about the data, so that Discovery can deliver more accurate results. For example, in the CIS discovery, you are interested in perfect foreign key relationships, where the primary keys are unique and foreign keys reference the primary key values 100% of the time. You set the options so that Discovery looks only for this type of relationship. You can also specify that you want Discovery to find almost foreign keys, where the primary key has a selectivity of greater than 0.8 and more than 80% of the foreign key values match the primary key values.

1. Click the Data Sets tab.
2. Click Run Next Steps.
3. In the Processing Options window, click New.
4. In the New Options window, in the Name field, type a name that is meaningful, such as CIS options.
5. In the Step field, select PF Keys.
6. In the set of options under Generate PFKeys, change the following values to 1.0:
   - Min foreign row hit rate to identify column as foreign key.
   - Min selectivity value to identify column as primary key.

   Figure 1. Options for PF Keys step

7. Click OK to save the options for the PF Keys step.
8. Click Edit in the Options block to modify the CIS options that you just created.
9. In the Step field, specify Data Objects from the drop-down menu.
10. In the set of options under Generate Data Objects, change the following values to True:
   - Data Object generation includes Reference tables.
   - Data Object generation includes attribute tables of reference tables.

   Figure 2. Options for Data Objects step

11. Click OK to save the options for the Data Objects step.

Analyzing and reviewing discovered relationships

You are now ready to run the analysis to discover relationships and data objects.

With a large schema, the relationships could be numerous and complex. It is important for an analyst to focus on the purpose of the analysis, in this case the Orders table and related tables. The PF Keys tab provides several facilities for you to work with a large schema with a large number of keys. Use these facilities to review the relationships that matter to your immediate goal of archiving Orders and Details data.

1. In the Processing Options window that is still open from the previous lesson, move the slider to Data Objects.
2. Click Run. After you submit the task, you can use the Activity Viewer to monitor the progress of the task. When the task indicator on the upper right corner of the Studio shows No Activity, processing is completed. While the process is running, the project is locked. You can still browse while the process is running, but you cannot modify anything. If you click the Activity Viewer you can see which steps have completed and which steps are currently running.
3. When the task is complete, select the PF Keys tab.
4. Click the view mode of the selected object, such as Show All PF Keys. There are several display modes that help you focus on relevant tables in a large diagram. For this scenario, you are only using some of the basic functions.
5. In the list of Connected Tables on the left, find the Orders table and double-click it.
   a. Review all of the relationships around the Orders table by selecting each relationship either from the diagram or from the tree view. When you select a relationship, you see the contents and statistics of that table.

      Figure 3. Orders table relationships

   b. Optional: If a relationship is not meaningful, delete the relationship by clicking the X symbol.
   c. Optional: To add a relationship, click the plus (+) symbol and enter the join information. Validate the relationship by clicking Validate Step. Use the statistics to validate the strength of the proposed relationship.
   d. Optional: Change a relationship by clicking the down arrow and changing the join information. Click Validate Step to verify the strength of the proposed change.
   e. For any relationships that you determine to be good, approve the relationship by selecting the Approved checkbox, and optionally, drag the related tables close together.

      Tip: Approving an object such as a key relationship ensures that if the steps are run again, this relationship information is not updated. This process of approving and dragging the tables closer together allows you to explore the diagram and to cluster the related tables together for a better view.
   f. Repeat the review steps until you are satisfied with the valid relationships.
6. Review the data objects for completeness. You can add or change a data object as needed.

Each discovered PF key is presented in the diagram with statistical properties as well as hit or miss data views. For this scenario, you are looking for perfect keys, so these facilities are not as useful as when you examine an imperfect key and try to determine whether it is a real relationship or an accidental one. Discovery can discover perfect keys as well as almost keys.

Adjusting the data object

InfoSphere Discovery analyzes overall relationships to find business objects. Depending on how deep and wide you want an Access Definition to be, you might need to expand or shrink the boundaries of the business objects.
Even in those cases, Discovery provides a critical starting point for you to work with business objects.

The relationships shown in Figure 3 were discovered automatically by InfoSphere Discovery. After you review the topology and all of the links between the tables, you might want to change the data objects. In this lesson, it might make more sense for SALES to be included through its relationship with CUSTOMERS, instead of with ORDERS.

1. In the Data Object window, select DO_ORDERS. In the diagram, right-click the SALES object and click Delete.
2. Right-click CUSTOMERS and select Add child table.
3. In the Add Table window, select SALES from the table list. Discovery automatically inserts the link between the CUSTOMERS table and the SALES table.
4. Save your work by clicking Project > Save from the main menu bar.

Export artifacts

After reviewing the relevant data objects, in this case the Orders data object, you have all that you need to generate the code to archive orders that are older than two years from the Customer Information System. You have the access definitions and the associated objects. The next challenge is to make the data objects available to Optim Designer so that Optim Designer can generate the code. Use the export to Optim feature in Discovery to generate a set of artifacts as XML files.

1. Access the Optim Connector in Discovery by clicking Project > Export > Optim Database Models from the main menu bar.
2. In the file browser, select a folder, or click Make New Folder and name it appropriately.
3. Select the new directory and click OK. The export process begins.
4. When you see the export results window, click OK to close that window. You can examine the XML files when the code generation is complete.
Discovery generates one Physical Data Model (PDM) file for each Discovery data set, and one Logical Data Model (LDM) file for each data object from Discovery. Optim Designer reads the generated files and turns them into access definitions.

What you have learned

You created a business object for Optim archiving.
- You created data sets that contain tables that you examined for relationships.
- You identified relationships between tables and clustered tables together into business objects.
- You discovered and reviewed foreign keys that InfoSphere Discovery found for you.
- You used InfoSphere Discovery to find and review business objects or data objects.
- You exported the data objects that were needed to generate the code to archive orders that are older than two years from the Customer Information System.

Contacting IBM

You can contact IBM for customer support, software services, product information, and general information. You also can provide feedback to IBM about products and documentation.

The following table lists resources for customer support, software services, training, and product and solutions information.

Table 1. IBM resources

Resource | Description and location
IBM Support Portal | You can customize support information by choosing the products and the topics that interest you at www.ibm.com/support/entry/portal/Software/Information_Management/InfoSphere_Information_Server
Software services | You can find information about software, IT, and business consulting services on the solutions site at www.ibm.com/businesssolutions/
My IBM | You can manage links to IBM Web sites and information that meet your specific technical support needs by creating an account on the My IBM site at www.ibm.com/account/
Training and certification | You can learn about technical training and education services designed for individuals, companies, and public organizations to acquire, maintain, and optimize their IT skills at http://www.ibm.com/software/swtraining/
IBM representatives | You can contact an IBM representative to learn about solutions at www.ibm.com/connect/ibm/us/en/

Providing feedback

The following table describes how to provide feedback to IBM about products and product documentation.

Table 2. Providing feedback to IBM

Type of feedback | Action
Product feedback | You can provide general product feedback through the Consumability Survey at www.ibm.com/software/data/info/consumability-survey
Documentation feedback | To comment on the information center, click the Feedback link on the top right side of any topic in the information center. You can also send comments about PDF file books, the information center, or any other documentation in the following ways:
   - Online reader comment form: www.ibm.com/software/data/rcf/
   - E-mail: [email protected]

Product accessibility

You can get information about the accessibility status of IBM products.

The IBM InfoSphere Information Server product modules and user interfaces are not fully accessible.
The installation program installs the following product modules and components:
- IBM InfoSphere Business Glossary
- IBM InfoSphere Business Glossary Anywhere
- IBM InfoSphere DataStage®
- IBM InfoSphere FastTrack
- IBM InfoSphere Information Analyzer
- IBM InfoSphere Information Services Director
- IBM InfoSphere Metadata Workbench
- IBM InfoSphere QualityStage™

For information about the accessibility status of IBM products, see the IBM product accessibility information at http://www.ibm.com/able/product_accessibility/index.html.

Accessible documentation

Accessible documentation for InfoSphere Information Server products is provided in an information center. The information center presents the documentation in XHTML 1.0 format, which is viewable in most Web browsers. XHTML allows you to set display preferences in your browser. It also allows you to use screen readers and other assistive technologies to access the documentation.

For information about the accessibility features of the information center, see Accessibility and keyboard shortcuts in the information center.

The documentation that is in the information center is also provided in PDF files, which are not fully accessible.

IBM and accessibility

See the IBM Human Ability and Accessibility Center for more information about the commitment that IBM has to accessibility:

Accessing product documentation

Documentation is provided in a variety of locations and formats, including in help that is opened directly from the product client interfaces, in PDF files, and in HTML files.

Obtaining the documentation

The documentation is distributed with the product, can be accessed from a Web browser, and is orderable.
- PDF file books are available online and periodically refreshed at www.ibm.com/support/docview.wss?uid=swg27020315
- You can also order IBM publications in hardcopy format online or through your local IBM representative.
  To order publications online, go to the IBM Publications Center at http://www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss.

Providing feedback about the documentation

You can send your comments about documentation in the following ways:
- Online reader comment form: www.ibm.com/software/data/rcf/
- E-mail: [email protected]

Links to non-IBM Web sites

This information center may provide links or references to non-IBM Web sites and resources.

IBM makes no representations, warranties, or other commitments whatsoever about any non-IBM Web sites or third-party resources (including any Lenovo Web site) that may be referenced, accessible from, or linked to any IBM site. A link to a non-IBM Web site does not mean that IBM endorses the content or use of such Web site or its owner. In addition, IBM is not a party to or responsible for any transactions you may enter into with third parties, even if you learn of such parties (or use a link to such parties) from an IBM site. Accordingly, you acknowledge and agree that IBM is not responsible for the availability of such external sites or resources, and is not responsible or liable for any content, services, products or other materials on or available from those sites or resources.

When you access a non-IBM Web site, even one that may contain the IBM-logo, please understand that it is independent from IBM, and that IBM does not control the content on that Web site. It is up to you to take precautions to protect yourself from viruses, worms, trojan horses, and other potentially destructive programs, and to protect your information as you deem appropriate.

Notices and trademarks

This information was developed for products and services offered in the U.S.A.

Notices

IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A. For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to: Intellectual Property Licensing Legal and Intellectual Property Law IBM Japan Ltd. 1623-14, Shimotsuruma, Yamato-shi Kanagawa 242-8502 Japan The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. 
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
J46A/G4
555 Bailey Avenue
San Jose, CA 95141-1003 U.S.A.

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information is for planning purposes only. The information herein is subject to change before the products described become available.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.
Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows:

© (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. _enter the year or years_. All rights reserved.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml.

The following terms are trademarks or registered trademarks of other companies:

Adobe is a registered trademark of Adobe Systems Incorporated in the United States, and/or other countries.

IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

The United States Postal Service owns the following trademarks: CASS, CASS Certified, DPV, LACSLink, ZIP, ZIP + 4, ZIP Code, Post Office, Postal Service, USPS and United States Postal Service. IBM Corporation is a non-exclusive DPV and LACSLink licensee of the United States Postal Service.

Other company, product or service names may be trademarks or service marks of others.

Index

archiving business objects
business objects
creating a project
customer support
data objects
legal notices
non-IBM Web sites, links to
product accessibility
product documentation, accessing
software services
support, customer
trademarks, list of
Web sites, non-IBM

Printed in USA
SC23-9880-04