Chapter 12: Query Processing Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Basic Steps in Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation Database System Concepts - 6th Edition 12.2 ©Silberschatz, Korth and Sudarshan Basic Steps in Query Processing Parsing and translation translate the query into its internal form. This is then translated into relational algebra. Parser checks syntax, verifies relations Evaluation The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query. Database System Concepts - 6th Edition 12.3 ©Silberschatz, Korth and Sudarshan Basic Steps in Query Processing : Optimization A relational algebra expression may have many equivalent expressions E.g., salary75000(salary(instructor)) is equivalent to salary(salary75000(instructor)) Each relational algebra operation can be evaluated using one of several different algorithms Correspondingly, a relational-algebra expression can be evaluated in many ways. Database System Concepts - 6th Edition 12.4 ©Silberschatz, Korth and Sudarshan Optimization An annotated relational algebra expression specifying the detailed evaluation strategy is called an evaluation primitive Example: We can use an index on salary to find instructors with salary < 75000, or we can perform a complete (serial) relation scan and discard instructors with salary 75000 The sequence of annotated relational algebra expressions is called an evaluation plan. Database System Concepts - 6th Edition 12.5 ©Silberschatz, Korth and Sudarshan Basic Steps: Optimization The goal of Query Optimization: Amongst all equivalent evaluation plans choose the one with the lowest cost. How do we measure cost? Number of tuples in each relation Size of tuples Number of disk access operations CPU time RAM space needed Time needed for communications over a network (LAN, SAN) Etc. Database System Concepts - 6th Edition 12.6 ©Silberschatz, Korth and Sudarshan QUIZ: Query Processing What are the 3 main steps in query processing? What is an evaluation primitive? What is a query evaluation plan? Database System Concepts - 6th Edition 12.7 ©Silberschatz, Korth and Sudarshan QUIZ: Query Processing What are the 3 main steps in query processing? What is an evaluation primitive? What is a query evaluation plan? Database System Concepts - 6th Edition 12.8 ©Silberschatz, Korth and Sudarshan 12.2 Measures of Query Cost Time “cost” = total elapsed time for answering the query Many factors contribute to time cost … Typically disk access is the predominant time cost (also relatively easy to estimate!). Measured by taking into account Number of seeks * average-seek-cost Number of blocks read * average-block-read-cost Number of blocks written * average-block-write-cost Cost to write a block is greater than cost to read a block b/c data is read back after being written to ensure that the write was free of errors! Database System Concepts - 6th Edition 12.9 ©Silberschatz, Korth and Sudarshan 12.2 Measures of Query Cost Cost to write a block is greater than cost to read a block b/c data is read back after being written to ensure that the write was free of errors! Database System Concepts - 6th Edition 12.10 ©Silberschatz, Korth and Sudarshan Measures of Query Cost Simplification: we just use the number of block transfers from disk and the number of seeks as the cost measures tT – time to transfer one block tS – time for one seek 0.1 ms 4 ms Cost for b block transfers plus S seeks b * tT + S * tS Simplification: we ignore CPU costs (time) Real systems do take CPU cost into account Simplification: we do not include cost to writing the output to disk. Why? All execution plans we must choose from end up with the same data … Database System Concepts - 6th Edition 12.11 ©Silberschatz, Korth and Sudarshan QUIZ: Measures of Query Cost ≈ 0.1 ms Cost for ≈ 4 ms b block transfers plus S seeks b * tT + S * tS If the size of a block is 4 KB, and the transfer rate is 200 MB/s, calculate the block transfer time. Database System Concepts - 6th Edition 12.12 ©Silberschatz, Korth and Sudarshan QUIZ: Measures of Query Cost ≈ 0.1 ms Cost for ≈ 4 ms b block transfers plus S seeks b * tT + S * tS If the size of a block is 4 KB, and the transfer rate is 200 MB/s, calculate the block transfer time. 4 KB / 200 MB/s = 4/(200*1024) s = 0.0000195 s = = 0.0192 ms ≈ 0.02 ms Note: Today’s best “enterprise level” HDDs have transfer rates around 200 Mb/s when handling pairs of consecutive blocks (2x4 KB = 8 KB), see for example this test on a Seagate drive. Database System Concepts - 6th Edition 12.13 ©Silberschatz, Korth and Sudarshan QUIZ: Measures of Query Cost ≈ 0.1 ms ≈ 4 ms Cost for b block transfers plus S seeks b * tT + S * tS Evaluation of a query requires 3 disk seeks, each followed by transfer of 50 blocks. What is the total time cost? Database System Concepts - 6th Edition 12.14 ©Silberschatz, Korth and Sudarshan QUIZ: Measures of Query Cost ≈ 0.1 ms ≈ 4 ms Cost for b block transfers plus S seeks b * tT + S * tS Evaluation of a query requires 3 disk seeks, each followed by transfer of 50 blocks. What is the total time cost? 3*50* 0.1 ms + 3*4 ms = 15 ms + 12 ms = 27 ms 3*50*0.02 ms + 3*4 ms = 3 ms + 12 ms = 15 ms If using transfer time from prev. example Database System Concepts - 6th Edition 12.15 ©Silberschatz, Korth and Sudarshan Perspective: Transfer Rates Theoretical maximum for SATA: 600 MB/s (SATA III), but in practice ~100-200 MB/s SATA Express announced in 2014: 2 GB/s (theoretical) PCI Express (PCIe) 3.0 bus: ~1 GB/s per lane, with typical 16x configurations : SSD connected through PCI have surpassed 1 GB/s SATA = Serial ATA (Advanced Technology Attachment) PCI = Periferal Component Interconnect http://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/ Database System Concepts - 6th Edition 12.16 ©Silberschatz, Korth and Sudarshan 12.3 Selection Operation Selection w/o an index is a file scan Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition. Cost estimate: 1 seek + br block transfers br denotes number of blocks containing records from relation r If selection is on a key attribute, can stop on finding record cost = 1 seek + (br /2) block transfers Linear search can be applied regardless of selection condition or ordering of records in the file, or availability of indices Database System Concepts - 6th Edition 12.17 average ©Silberschatz, Korth and Sudarshan Extra-credit Database System Concepts - 6th Edition 12.18 ©Silberschatz, Korth and Sudarshan QUIZ: Linear search A table has 50,000 tuples, each 100 Bytes in length. What is the average time cost using linear search? State all your assumptions! Database System Concepts - 6th Edition 12.19 ©Silberschatz, Korth and Sudarshan QUIZ: Linear search A table has 50,000 tuples, each 100 Bytes in length. What is the average time cost using linear search? State all your assumptions! The table has a total of 50,000*100 = 5 million B 5 million B / 4096 B/block = 1,220.7 blocks 1,220.7 / 2 = 610.3 → 611 blocks on average 611 * 0.1 ms + 4 ms = 65.1 ms Database System Concepts - 6th Edition 12.20 ©Silberschatz, Korth and Sudarshan 12.3 Selection Operation Why not binary search? Generally does not make sense since data is not stored consecutively … except when there is an index available, and binary search requires more seeks than index search Database System Concepts - 6th Edition 12.21 ©Silberschatz, Korth and Sudarshan Selections Using Indices Index scan – search algorithms that use an index selection condition must be on search-key of index! A2 (primary index, equality on key). Retrieve a single record that satisfies the corresponding equality condition Cost = (hi + 1) * (tT + tS) A3 (primary index, equality on nonkey) Retrieve multiple records (duplicates). Records will be in consecutive blocks (Why?) Let b = number of blocks containing matching records Cost = hi * (tT + tS) + tS + tT * b Database System Concepts - 6th Edition 12.22 ©Silberschatz, Korth and Sudarshan Extra-credit Database System Concepts - 6th Edition 12.23 ©Silberschatz, Korth and Sudarshan QUIZ: Linear search A table has 50,000 tuples, each 100 Bytes in length. The index is a B+ tree with n = 100. In our table, there are no more than 1000 duplicates for any nonkey value. What is the average time cost using primary index, equality on nonkey? State all your assumptions! Database System Concepts - 6th Edition 12.24 ©Silberschatz, Korth and Sudarshan Selections Using Indices A4 (secondary index, equality on nonkey). Retrieve a single record if the search-key is a candidate key Cost = (hi + 1) * (tT + tS) Retrieve multiple records if search-key is not a candidate key each of n matching records may be on a different block Cost = (hi + n) * (tT + tS) – Can be very expensive! Database System Concepts - 6th Edition 12.25 ©Silberschatz, Korth and Sudarshan Selections Involving Comparisons Can implement selections of the form AV (r) or A V(r) by using a linear file scan, or by using indices in the following ways: A5 (primary index, comparison). (Relation is sorted on A) For A V(r) use index to find first tuple v and scan relation sequentially from there For AV (r) just scan relation sequentially until first tuple > v; do not use index A6 (secondary index, comparison). For A V(r) use index to find first index entry v and scan index sequentially from there, to find pointers to records. For AV (r) just scan leaf pages of index finding pointers to records, until first entry > v In either case, retrieve records that are pointed to – requires an I/O for each record – Linear file scan may be cheaper Database System Concepts - 6th Edition 12.26 ©Silberschatz, Korth and Sudarshan This is the end of the material covered in our class We stop on p.545, before 12.3.3 Complex Selections Review today in the lab! Database System Concepts - 6th Edition 12.27 ©Silberschatz, Korth and Sudarshan
© Copyright 2024