ch.12

Chapter 12: Query Processing
Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Database System Concepts - 6th Edition
12.2
©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing
 Parsing and translation

translate the query into its internal form. This is then
translated into relational algebra.

Parser checks syntax, verifies relations
 Evaluation

The query-execution engine takes a query-evaluation plan,
executes that plan, and returns the answers to the query.
Database System Concepts - 6th Edition
12.3
©Silberschatz, Korth and Sudarshan
Basic Steps in Query Processing :
Optimization
 A relational algebra expression may have many equivalent
expressions

E.g., salary75000(salary(instructor)) is equivalent to
salary(salary75000(instructor))
 Each relational algebra operation can be evaluated using one of
several different algorithms

Correspondingly, a relational-algebra expression can be
evaluated in many ways.
Database System Concepts - 6th Edition
12.4
©Silberschatz, Korth and Sudarshan
Optimization
An annotated relational algebra
expression specifying the
detailed evaluation strategy is
called an evaluation primitive
Example:

We can use an index on salary to find instructors with salary <
75000,

or we can perform a complete (serial) relation scan and discard
instructors with salary  75000
The sequence of annotated relational algebra expressions is called
an evaluation plan.
Database System Concepts - 6th Edition
12.5
©Silberschatz, Korth and Sudarshan
Basic Steps: Optimization
The goal of Query Optimization: Amongst all equivalent
evaluation plans choose the one with the lowest cost.
How do we measure cost?

Number of tuples in each relation

Size of tuples

Number of disk access operations

CPU time

RAM space needed

Time needed for communications over a network (LAN, SAN)

Etc.
Database System Concepts - 6th Edition
12.6
©Silberschatz, Korth and Sudarshan
QUIZ: Query Processing
 What are the 3 main steps in query processing?
 What is an evaluation primitive?
 What is a query evaluation plan?
Database System Concepts - 6th Edition
12.7
©Silberschatz, Korth and Sudarshan
QUIZ: Query Processing
 What are the 3 main steps in query processing?
 What is an evaluation primitive?
 What is a query evaluation plan?
Database System Concepts - 6th Edition
12.8
©Silberschatz, Korth and Sudarshan
12.2 Measures of Query Cost
 Time “cost” = total elapsed time for answering the query
 Many factors contribute to time cost …
 Typically disk access is the predominant time cost (also
relatively easy to estimate!).
 Measured by taking into account

Number of seeks
* average-seek-cost

Number of blocks read
* average-block-read-cost

Number of blocks written * average-block-write-cost
Cost to write a block is greater than cost to read a block b/c data is
read back after being written to ensure that the write was free of
errors!
Database System Concepts - 6th Edition
12.9
©Silberschatz, Korth and Sudarshan
12.2 Measures of Query Cost
Cost to write a block is greater than cost to read a block b/c data is
read back after being written to ensure that the write was free of
errors!
Database System Concepts - 6th Edition
12.10
©Silberschatz, Korth and Sudarshan
Measures of Query Cost
 Simplification: we just use the number of block transfers
from disk and the number of seeks as the cost measures
 tT
– time to transfer one block
 tS
– time for one seek
0.1 ms
4 ms
 Cost for b block transfers plus S seeks
b * tT + S * tS
 Simplification: we ignore CPU costs (time)

Real systems do take CPU cost into account
 Simplification: we do not include cost to writing the output
to disk. Why? All execution plans we must choose from
end up with the same data …
Database System Concepts - 6th Edition
12.11
©Silberschatz, Korth and Sudarshan
QUIZ: Measures of Query Cost
≈ 0.1 ms
 Cost for
≈ 4 ms
b block transfers plus S seeks
b * tT + S * tS
If the size of a block is 4 KB, and the transfer rate is
200 MB/s, calculate the block transfer time.
Database System Concepts - 6th Edition
12.12
©Silberschatz, Korth and Sudarshan
QUIZ: Measures of Query Cost
≈ 0.1 ms
 Cost for
≈ 4 ms
b block transfers plus S seeks
b * tT + S * tS
If the size of a block is 4 KB, and the transfer rate is
200 MB/s, calculate the block transfer time.
4 KB / 200 MB/s = 4/(200*1024) s = 0.0000195 s =
= 0.0192 ms ≈ 0.02 ms
Note: Today’s best “enterprise level” HDDs have transfer rates around 200 Mb/s when
handling pairs of consecutive blocks (2x4 KB = 8 KB), see for example this test on a
Seagate drive.
Database System Concepts - 6th Edition
12.13
©Silberschatz, Korth and Sudarshan
QUIZ: Measures of Query Cost
≈ 0.1 ms
≈ 4 ms
 Cost for b block transfers plus S seeks
b * tT + S * tS
Evaluation of a query requires 3 disk seeks, each
followed by transfer of 50 blocks.
What is the total time cost?
Database System Concepts - 6th Edition
12.14
©Silberschatz, Korth and Sudarshan
QUIZ: Measures of Query Cost
≈ 0.1 ms
≈ 4 ms
 Cost for b block transfers plus S seeks
b * tT + S * tS
Evaluation of a query requires 3 disk seeks, each
followed by transfer of 50 blocks.
What is the total time cost?
3*50* 0.1 ms + 3*4 ms = 15 ms + 12 ms = 27 ms
3*50*0.02 ms + 3*4 ms = 3 ms + 12 ms = 15 ms
If using transfer time from prev. example
Database System Concepts - 6th Edition
12.15
©Silberschatz, Korth and Sudarshan
Perspective: Transfer Rates
 Theoretical maximum for SATA: 600 MB/s (SATA III), but in
practice ~100-200 MB/s
 SATA Express announced in 2014: 2 GB/s (theoretical)
 PCI Express (PCIe) 3.0 bus: ~1 GB/s per lane, with typical 16x
configurations :

SSD connected through PCI have surpassed 1 GB/s
SATA = Serial ATA (Advanced Technology Attachment)
PCI = Periferal Component Interconnect
http://www.tested.com/tech/457440-theoretical-vs-actual-bandwidth-pci-express-and-thunderbolt/
Database System Concepts - 6th Edition
12.16
©Silberschatz, Korth and Sudarshan
12.3 Selection Operation
Selection w/o an index is a file scan
 Algorithm A1 (linear search). Scan each file block and
test all records to see whether they satisfy the selection
condition.

Cost estimate: 1 seek + br block transfers
 br


denotes number of blocks containing records from relation r
If selection is on a key attribute, can stop on finding record
 cost = 1 seek + (br /2) block transfers
Linear search can be applied regardless of
selection condition or
 ordering of records in the file, or
 availability of indices

Database System Concepts - 6th Edition
12.17
average
©Silberschatz, Korth and Sudarshan
Extra-credit
Database System Concepts - 6th Edition
12.18
©Silberschatz, Korth and Sudarshan
QUIZ: Linear search
A table has 50,000 tuples, each 100 Bytes in length.
What is the average time cost using linear search?
State all your assumptions!
Database System Concepts - 6th Edition
12.19
©Silberschatz, Korth and Sudarshan
QUIZ: Linear search
A table has 50,000 tuples, each 100 Bytes in length.
What is the average time cost using linear search?
State all your assumptions!
The table has a total of 50,000*100 = 5 million B
5 million B / 4096 B/block = 1,220.7 blocks
1,220.7 / 2 = 610.3 → 611 blocks on average
611 * 0.1 ms + 4 ms = 65.1 ms
Database System Concepts - 6th Edition
12.20
©Silberschatz, Korth and Sudarshan
12.3 Selection Operation
Why not binary search?
Generally does not make sense since data is not stored
consecutively …


except when there is an index available,
and binary search requires more seeks than index search
Database System Concepts - 6th Edition
12.21
©Silberschatz, Korth and Sudarshan
Selections Using Indices
Index scan – search algorithms that use an index

selection condition must be on search-key of index!
 A2 (primary index, equality on key). Retrieve a single record
that satisfies the corresponding equality condition

Cost = (hi + 1) * (tT + tS)
 A3 (primary index, equality on nonkey) Retrieve multiple
records (duplicates).

Records will be in consecutive blocks (Why?)
 Let

b = number of blocks containing matching records
Cost = hi * (tT + tS) + tS + tT * b
Database System Concepts - 6th Edition
12.22
©Silberschatz, Korth and Sudarshan
Extra-credit
Database System Concepts - 6th Edition
12.23
©Silberschatz, Korth and Sudarshan
QUIZ: Linear search
A table has 50,000 tuples, each 100 Bytes in length. The
index is a B+ tree with n = 100. In our table, there are no
more than 1000 duplicates for any nonkey value.
What is the average time cost using primary index, equality
on nonkey?
State all your assumptions!
Database System Concepts - 6th Edition
12.24
©Silberschatz, Korth and Sudarshan
Selections Using Indices
 A4 (secondary index, equality on nonkey).

Retrieve a single record if the search-key is a candidate key
 Cost

= (hi + 1) * (tT + tS)
Retrieve multiple records if search-key is not a candidate key
 each
of n matching records may be on a different block
 Cost
= (hi + n) * (tT + tS)
– Can be very expensive!
Database System Concepts - 6th Edition
12.25
©Silberschatz, Korth and Sudarshan
Selections Involving Comparisons
 Can implement selections of the form AV (r) or A  V(r) by using
a linear file scan,
 or by using indices in the following ways:
 A5 (primary index, comparison). (Relation is sorted on A)
 For A  V(r) use index to find first tuple  v and scan relation
sequentially from there
 For AV (r) just scan relation sequentially until first tuple > v; do
not use index
 A6 (secondary index, comparison).
 For A  V(r) use index to find first index entry  v and scan index
sequentially from there, to find pointers to records.
 For AV (r) just scan leaf pages of index finding pointers to
records, until first entry > v
 In either case, retrieve records that are pointed to
– requires an I/O for each record
– Linear file scan may be cheaper

Database System Concepts - 6th Edition
12.26
©Silberschatz, Korth and Sudarshan
This is the end of the material covered in
our class
 We stop on p.545, before 12.3.3 Complex Selections
 Review today in the lab!
Database System Concepts - 6th Edition
12.27
©Silberschatz, Korth and Sudarshan