Mike Cain DB2 for i Center of Excellence Rochester, MN USA [email protected] db2fori.blogspot.com How to Boost Query Performance Using Indexes COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training in·dex Something that serves to guide, point out, or otherwise facilitate reference. COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation Indexing Technology within DB2 for i COMMON Sweden 2013 / Copyright 2013 IBM Corporation IBM Training DB2 for i Two types of indexing technologies are supported – Radix Index – Encoded Vector Index (OmniFind Text Index will be covered later) Each type of index has specific uses and advantages Respective indexing technologies compliment each other Indexes can be used for statistics and implementation Indexes can provide RRNs and/or data Indexes are scanned or probed – Probe can only occur on contiguous, leading key columns – Scan can occur on any key column – Probe and scan can be used together COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Using Indexes - Probe v Scan Index Key Columns (CUSTOMER, ORDER, ITEM) • Probe (key positioning) with leading, n contiguous key columns 1 1+2 1+2+3 • Scan (test) with any other key columns 2 3 2+3 CUSTOMER ORDER ITEM 001 B507 AB-2700 001 B607 CD-2000 002 B100 XY-1005 002 B102 AZ-5000 003 B709 HH-6500 004 B043 HH-6500 ...WHERE ORDER = ‘B102‘ AND CUSTOMER = 002 ...WHERE ITEM = ‘HH-6500’ ...WHERE ITEM = ‘CD-2000’ AND CUSTOMER = 001 …WHERE ITEM = ‘HH-6500’ OR COLOR = ‘RED’ COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Radix Index Index “tree” structure Key values are compressed – Common patterns are stored once – Unique portion stored in “leaf” pages – Positive impact on size and depth of the index tree Algorithm used to find values – Binary search – Modified to fit the data structure Maintenance – Index data is automatically spread across all available disk units – Tree is automatically rebalanced to maintain an efficient structure Temporary indexes – Considered a temporary data structure to assist the DB engine COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Radix Index ROOT Database Table 001 ARKANSAS 002 MISSISSIPPI 003 MISSOURI 004 IOWA 005 ARIZONA … … Test Node AR IZONA 005 ADVANTAGES: Very fast access to a single key value Also fast for small, selected range of key values (low cardinality) Provides order MISS IOWA 004 ISSIPPI 002 OURI 003 KANSAS 001 DISADVANTAGES: Table rows retrieved in order of key values (not physical order) which equates to random I/O’s No way to predict which physical index pages are next when traversing the index for large number of key values COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Index Probe Example EMPLOYEE Table Given an index on table EMPLOYEE keyed on STATE... SELECT * FROM EMPLOYEE WHERE STATE = ‘IOWA’ EMPLOYEE Index STATE Perform a probe into the range using the local selection value(s) RRN ... IOWA (004) IOWA (007) IOWA (010) IOWA (017) KANSAS (011) MISSISSIPPI (002) MISSISSIPPI (013) MISSOURI (003) ... COMMON Sweden 2013 / Copyright 2013 IBM Corporation RRN STATE 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 ARKANSAS MISSISSIPPI MISSOURI IOWA ARIZONA MONTANA IOWA NEBRASKA NEBRASKA IOWA KANSAS WISCONSIN MISSISSIPPI WISCONSIN WISCONSIN ARKANSAS IOWA © 2012 IBM Corporation IBM Training Encoded Vector Index (EVI) Index for delivering fast data access in analytical and reporting environments – Advanced technology from IBM Research – Used to produce dynamic bitmaps and RRN lists – Fast access to statistics to improve query optimizer decision making Not a “tree” structure Can only be created through an SQL interface or System i Navigator GUI CREATE ENCODED VECTOR INDEX SchemaName/IndexName ON SchemaName/TableName (ColumnName) INCLUDE ( SUM(ColumnName)); New in 7.1 Maintained aggregate to support index only access COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Encoded Vector Index (EVI) Symbol Table Key Value Code Count Include Include Sum() Sum() Arizona 1 5000 1500 2005 Arkansas 2 7300 3200 450 … Wisconsin 49 340 575 1200 Wyoming 50 2760 210 0 optional Vector RRN 1 1 17 2 5 3 9 4 2 5 7 6 50 7 49 8 5 9 … … Symbol table contains information for each distinct key value - Each key value is assigned a unique code (key compression) - Code is 1, 2, or 4 bytes depending on number of distinct key values Rather then a bit array for each distinct key value, use one array of codes COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Bitmap / RRN List Example Given an EVI on table EMPLOYEE keyed on STATE... Set bits in bitmap or return RRN list ...WHERE STATE = ‘ILLINOIS’ Vector Symbol Table Scan or binary search symbol table for key(s) and code(s) Key ARIZONA ARKANSAS CALIFORNIA COLORADO ILLINOIS IOWA KANSAS MISSISSIPPI ... Code 1 2 3 4 5 6 7 8 ... Scan vector for code(s) 1 17 5 9 2 7 50 49 5 . . . . . . . COMMON Sweden 2013 / Copyright 2013 IBM Corporation Bitmap 0 0 1 0 0 0 0 0 1 . . . . . . . Row 1 2 3 4 5 6 7 8 9 ... © 2012 IBM Corporation IBM Training Index ANDing Example State Workdept EVI EVI Intermediate RRN list 3 5 10 15 1000 AND 1005 (Merge) 1007 3001 3050 State SELECT * FROM EMPLOYEE WHERE STATE = ‘MINNESOTA' AND WORKDEPT IN ( 'B01', C01, 'E01') Intermediate RRN list Final RRN list 3 7 10 27 1000 1010 2035 3001 4100 3 10 1000 3001 Represents all the local selection Workdept COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Index ORing Example State Workdept EVI Radix Intermediate RRN list 4 1001 1005 OR 3051 (Merge) State SELECT * FROM EMPLOYEE WHERE STATE = ‘IOWA' OR WORKDEPT IN ( 'B01', C01, 'E01') Intermediate RRN list Final RRN list 2 8 1004 4105 2 4 8 1001 1004 1005 3051 4105 Represents all the local selection Workdept COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training EVI Symbol Table Only Example Given an EVI on table EMPLOYEE keyed on STATE... SELECT COUNT(*) FROM EMPLOYEE WHERE STATE = ‘Wisconsin’; SELECT COUNT(DISTINCT STATE) FROM EMPLOYEE; SELECT STATE, SUM(), SUM() FROM EMPLOYEE GROUP BY STATE; Symbol Table Key Value Search / Scan symbol table for key(s) and counts Code Count Arizona 1 Arkansas Include Include Sum() Sum() 5000 1500 2005 2 7300 3200 450 Wisconsin 49 340 575 1200 Wyoming 50 2760 210 0 … COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training DB2 for i cardinality The number of distinct elements in a set. • High cardinality = large distinct number of values • Low cardinality = small distinct number of values selectivity The number of elements matching the criteria. • High selectivity = small number of rows match • Low selectivity = large number of rows match In general… • A radix index is best when choosing a small set of rows and the key cardinality is high • An encoded vector index is best when choosing a set of rows and the key cardinality is low • Understanding the data and query are key COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Transactions table Transaction Number Transaction Type 100 million rows 3 transaction types Cardinality and Selectivity SELECT * FROM TRANSACTIONS WHERE TRANSACTION_NUMBER = 1; SELECT * FROM TRANSACTIONS WHERE TRANSACTION_TYPE = ‘D’ COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Creating SQL Indexes CREATE INDEX SQL statement CREATE INDEX MY_IX on MY_TABLE (KEY1, KEY2) CREATE ENCODED VECTOR INDEX SQL statement CREATE ENCODED VECTOR INDEX MY_EVI on MY_TABLE (KEY1) System i Navigator – Database graphical interface Primary Key, Foreign Key and Unique Key Constraints – CREATE TABLE – ALTER TABLE COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Creating Indexes COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Creating Indexes COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Creating Indexes COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Creating Indexes COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Creating Indexes COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Creation of Index with Derived Keys Creation of indexes with derived keys via SQL CREATE INDEX ORDERPRIORITYUPPER ON T1 (UPPER(ORDERPRIORITY) AS UORDERPRIORITY ASC); CREATE ENCODED VECTOR INDEX YEARQTR ON T1 (YEAR(ORDERDATE) AS ORDYEAR ASC, QUARTER(ORDERDATE) AS ORDQTR ASC); CREATE INDEX TOTALEXTENDEDPRICE ON T1 (QUANTITY * EXTENDEDPRICE AS TOTEXTPRICE ASC); NOTE: There are some restrictions on which indexes can be used by the optimizer COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Create Sparse Indexes from SQL Support of WHERE clause on SQL create index CREATE INDEX FASTDELIVER ON T1 (SHIPMODE ASC) WHERE SHIPMODE = 'NEXTDAYAIR‘ OR SHIPMODE = 'COURIER'; NOTE: Index is NOT used by the query optimizer (CQE and SQE) prior to 7.1 COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training When to a use Derived Index? Derived indexes may be useful for – Case insensitive searches (i.e UPPER(COL1) = ‘ABC’ – Data extracted from a column (i.e. SUBSTR(), YEAR(), MONTH()…) – Results of operations (i.e. COL1+COL2 , QTY * COST) – Might be useful to allow index only access in more cases – Reduce table scans and temporary data structures COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training IBM OmniFind Text Search Server for DB2 for i IBM i licensed product: 5733-OMF – No-charge offering – Requires IBM i 6.1 or 7.1 Delivers common DB2 Family text search technology – Advanced, linguistic high-speed textual searches – Support enabled for any character-based column – Search technology also supports Rich Text document formats • Example: LOB columns containing PDF or Microsoft® Word documents • IFS documents can be indexed with extra programming – Includes support for 26 different languages COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training OmniFind Database Requirements Some of the supported document format types: – Plain text – XML – HTML – Adobe PDF – Rich Text Format (RTF) – JustSystems Ichitaro – Microsoft Excel – Microsoft PowerPoint – Microsoft Word – Lotus® 123 – Lotus Freelance® – Lotus WordPro – Lotus Symphony – OpenOffice 1.1 & 2.0 – OpenOffice Calc – Quattro Pro – StarOffice Calc COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Text Index vs DB2 Index Text indexes stored outside of DB2 in IFS Text indexes have delayed maintenance – Changes logged to staging table – Index maintenance is scheduled Text indexes not protected by IBM i SMAPP Text indexes utilize different indexing methods – Table with VARCHAR(32000) column and 175,000 rows • DB2 Index object size: 1.7 GB • Text Index object size: 0.1 GB COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Interfaces A set of administrative stored procedures for starting and ending the text search server. • SYSTS_START • SYSTS_STOP A set of administrative stored procedures to create, drop, and maintain the text search index. • SYSTS_CREATE • SYSTS_DROP, • SYSTS_UPDATE Two built in functions to query the index • CONTAINS • SCORE System i Navigator 7.1 COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Built in DB2 Text Search functions CONTAINS – Searches a text search index using the criteria that are specified in a search argument and results about whether or not a match was found CONTAINS(column-name, search-argument, options) – column-name is the column over which the text search server is built – search-argument is the term or phrase to search for – options is an optional parameter that can be used to modify the query language or turn synonym matching on – If the term or phrase is found, returns 1 (i.e. true) COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training CONTAINS example Find the exact string “IBM” in the COMMENT column, using a host variable Assume a text search index is built over column COMMENT EXEC SQL DECLARE C1 CURSOR FOR SELECT CUSTKEY FROM CUSTOMERS WHERE CONTAINS(COMMENT, :search_arg) = 1 ORDER BY CUSTKEY; EXEC SQL SET :search_arg = ‘”IBM”'; EXEC SQL OPEN C1; COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Built in search functions SCORE – The SCORE function searches a text search index using criteria that are specified in a search argument and returns a relevance score that measures how well a document matches the search argument SCORE(column-name, search-argument, options) – column-name is the column over which the text search server is built – search-argument is the term or phrase to search for – options is an optional parameter that can be used to modify the query language or turn synonym matching on – The result of SCORE is a value between 0 and 1 (i.e. decimal) – A higher number indicates that a document contains a match for the search argument more frequently COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Score Example The following statement generates a list of employees in the order of how well their resumes match the query "programmer AND (java OR cobol)", along with a relevance value that is normalized between 0 (zero) and 100 SELECT EMPNO, INTEGER(SCORE(RESUME, 'programmer AND (java OR cobol)') * 100)) AS RELEVANCE FROM EMP_RESUME WHERE RESUME_FORMAT = ‘ASCII' AND CONTAINS(RESUME, 'programmer AND (java OR cobol)') = 1 ORDER BY RELEVANCE DESC COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation XML IBM Training Given a table with email messages store in xml format, within a VARCHAR column EMAIL The text index can be created with: CALL sysproc.systs_create('MYSCHEMA', 'XMLI', ’MYSCHEMA.MY_TABLE(EMAIL)', 'FORMAT XML') The “FORMAT XML” clause tells OmniFind that the column should be parsed and indexed as XML data COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation XML IBM Training Suppose the XML data being searched is in a format of: <?xml version="1.0" encoding="ISO-8859-1"?> <note> <to>Nick</to> <from>Tiffany</from> <heading>OmniFind for DB2i</heading> <body> OmniFind is wonderful for working with XML data! </body> </note> COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Xpath in CONTAINS clause Use the CONTAINS function to search for documents that have “Nick” in the “to” section of the xml document SELECT EMAIL FROM MYCHEMA.MYTABLE WHERE CONTAINS(EMAIL, @xpath:''/note//to[.contains("Nick")]''') = 1 COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation Query Optimization (using indexes) COMMON Sweden 2013 / Copyright 2013 IBM Corporation IBM Training Data Access Methods Cost based optimization dictates that the fastest access method for a given table will vary based upon selectivity of the query High Response Time Table Scan Clustered, Skip Seq Probe Low Few Many Number of rows searched / accessed COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Strategy for Query Optimization Optimizing indexes will generally follow this simplified strategy: ? Gather list of indexes for statistics and costing implementation methods Consider the indexes based on how the index can be used Local selection Joining Grouping Ordering Index only access One index may be useful for statistics, and another useful for implementation COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Query Optimization Feedback Visual Explain SQE Plan Cache Indexes Advised SQE Plan Cache Snapshots SQL Monitor Data SQL request Query Optimization COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Indexing Advice from the Optimizer Both CQE and SQE provide index creation advice – QSYS2.SYSIXADV – System i Navigator CQE – – – – – Basic advice Radix index only Based on table scan and local selection columns only Temporary index creation information also provides insight CQE Visual Explain will try and tie pieces together to advice a better index SQE – – – – – Robust advice Radix and EVI indexes Based on all parts of the query Multiple indexes can be advised for the same query Some limitations COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Indexing Advice from the Optimizer… Local selection, Join, Group By, Order By Equals first, followed by one inequality (for probing) No ORed predicates involving different columns No derived keys No EVI include columns COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Autonomic Index Creation Optimizer can have the DB Engine create a temporary index Both full and sparse indexes can be created Temporary indexes are not used for statistics (unless running 7.1 + PTFs) Temporary indexes are maintained CQE – Temporary indexes are not reused and not shared – Usually a bottleneck in query performance – Can impact overall system performance – Can increase the amount of temporary storage used SQE – Temporary indexes are reused and shared across jobs and queries – Creation is based on “watching” the query requests over time – Creation is based on optimizer’s own index advice – Temporary index maintenance is delayed when all associated cursors closed COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Index Evaluator via Catalog Views select * from qsys2.sysindexstat; COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Index Evaluator via Catalog Views SYSINDEXSTAT Contains one row for every SQL index. Use this view when you want to see information for a specific SQL index or set of SQL indexes. The information is similar to that returned via Show Indexes in System i Navigator. SYSPARTITIONINDEXES Contains one row for every index built over a table partition or table member. Use this view when you want to see index information for indexes built on a specified table or set of tables. The information is similar to that returned via Show Indexes in System i Navigator. SYSTABLEINDEXSTAT Contains one row for every index that has at least one partition or member built over a table. If the index is over more than one partition or member, the statistics include all those partitions and members. COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Indexing Strategies COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Create all advised indexes? Nothing, let the system handle it? Monitor, analyze, and tune important tables and queries? Get some help? COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training DB2 for i The goals of creating indexes are: 1. Provide the optimizer the statistics needed to understand the data, based on the query 2. Provide the optimizer implementation choices, based on the selectivity of the query COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training The Process of Identifying Indexes Proactive method • Analyze the data model, application and SQL requests • Database PK, UK, RI constraints are helpful Reactive method • Rely on optimizer feedback and actual implementation methods • Rely on SQE’s ability to auto tune using temporary indexes Understand the data being queried • Column selectivity • Column cardinality COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Indexing Strategy - Basic Approach Radix Indexes • Common local selection columns Minimum • Join columns • Local selection columns + join columns • Local selection columns + grouping columns • Local selection columns + ordering columns Advanced Encoded Vector Indexes • Local selection column for index ANDing/ORing • Join columns (star or snowflake schema) • Index only access Requires knowledge of query optimization and data lifecycle •DISTINCT, COUNT, COUNT DISTINCT, SUM() Note: Columns used with equal conditions are first in key list COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Indexing Strategy – React to Implementatoin If the optimizer feedback indicates: Full table scan Create an index on local selection columns Temporary index Create an index on join columns Create an index on grouping columns Create an index on ordering columns Hash table Create an index on join columns Create an index on grouping columns “Perfect”, multiple key column radix indexes are usually best COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Analyzing the Data Model CUSTOMERS custkey ORDERS orderkey custkey DATES partkey datekey orderdate PARTS shipdate partkey suppkey SUPPLIERS suppkey COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training Examples select from where * orders shipdate = ‘2010-10-01’; select from where and * orders orderdate = ‘2010-10-01’ quantity = 0; select from inner join on where * orders o customers c o.custkey = c.custkey c.customer_number = 102; COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation IBM Training COMMON Sweden 2013 / Copyright 2013 IBM Corporation © 2012 IBM Corporation
© Copyright 2024