Query Processing - Department of Computer Science and Engineering

CSL 451 Introduction to Database Systems
Query Processing
Department of Computer Science and Engineering
Indian Institute of Technology Ropar
Narayanan (CK) Chatapuram Krishnan!
Summary
•  Measures of Query Cost
–  time to transfer one block
–  time for one seek
•  Selection Operation
–  linear search/file scan
–  index scan - equality
•  primary index (A2, A3)
•  secondary index (A4)
–  index scan – comparison
•  primary index (A5)
•  secondary index (A6)
–  conjunctive selection (A7,
A8, A9)
–  disjunctive selection (A10)
01/04/15!
•  Sorting
–  external sort-merge
•  Join
– 
– 
– 
– 
nested-loop join
block nested-loop join
indexed nested-loop join
merge-join
•  sort-merge join
•  hybrid merge-join
–  hash-join
•  hybrid hash-join
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
2!
12.1 Assume that only one tuple fits in a block and memory holds at
most 3 blocks. Show the runs created on each pass of the sort-merge
algorithm, when applied to sort the following tuples on the first
attribute
(Kangaroo, 17)
(wallaby, 21)
(emu, 1)
(wombat, 13)
(platypus, 3)
(lion, 8)
(warthog, 4)
(zebra, 11)
(meerkat, 6)
(hyena, 9)
(hornbill, 2)
(baboon, 12)
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
3!
12.2 Consider the bank database, where the primary keys are
underlined, and the following SQL query:
select T.branch_name
from branch T, branch S
where T.assets > S.assets and S.branch_city=‘Brooklyn’
Write an (efficient) relational algebra expression that is
equivalent to the query.
branch (branch_name, branch_city, assets)
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
4!
12.6.a Consider the bank database. Suppose that a B+-tree
index on branch_city is available on relation branch, and that no
other index is available. List different ways to handle the
following selections that involve negation
σ~(branch_city<‘Brooklyn’)(branch)
branch (branch_name, branch_city, assets)
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
5!
12.6.b Consider the bank database. Suppose that a B+-tree
index on branch_city is available on relation branch, and that no
other index is available. List different ways to handle the
following selections that involve negation
σ~(branch_city=‘Brooklyn’)(branch)
branch (branch_name, branch_city, assets)
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
6!
12.6.c Consider the bank database. Suppose that a B+-tree
index on branch_city is available on relation branch, and that no
other index is available. List different ways to handle the
following selections that involve negation
σ~(branch_city<‘Brooklyn’∨assets<5000)(branch)
branch (branch_name, branch_city, assets)
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
7!
12.9 What is the effect on the cost of merging runs if the number
of buffer blocks per run is increased, while keeping overall
memory available for buffering runs fixed?
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
8!
12.10.a Suppose you need to sort a relation of 40GB, with 4KB
blocks, using a memory size of 40MB. Suppose the cost of a
seek is 5ms, while the disk transfer rate is 40MB/s.
Find the cost of sorting the relation, in seconds, with bb=1 and
with bb=100.
number of block transfers –
number of seeks -
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
9!
12.10.b Suppose you need to sort a relation of 40GB, with 4KB
blocks, using a memory size of 40MB. Suppose the cost of a
seek is 5ms, while the disk transfer rate is 40MB/s.
In each case, how many merge passes are required?
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
10!
12.10.c Suppose you need to sort a relation of 40GB, with 4KB
blocks, using a memory size of 40MB. Suppose the cost of a
seek is 5ms, while the disk transfer rate is 40MB/s.
Suppose a flash storage device is used instead of a disk, and it
has a seek time of 1mcs, and a transfer rate of 40MB/s.
Recompute the cost of sorting the relation in seconds, with bb=1
and bb=100 in this setting.
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
11!
12.3.a Let r1(A, B, C) and r2(C, D, E) have the following
properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples
of r1 fit on one block, and 30 tuples of r2 fit on one block.
Estimate the number of block transfers and seeks required,
using nested-loop join.
To compute the theta join
r
θ s
for each tuple tr in r do begin
for each tuple ts in s do begin
test pair (tr,ts) to see if they satisfy
the join condition θ if they do, add tr • ts to the result.
end
end
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
12!
12.3.b Let r1(A, B, C) and r2(C, D, E) have the following
properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples
of r1 fit on one block, and 30 tuples of r2 fit on one block.
Estimate the number of block transfers and seeks required,
using block nested-loop join.
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
13!
12.3.c Let r1(A, B, C) and r2(C, D, E) have the following
properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples
of r1 fit on one block, and 30 tuples of r2 fit on one block.
Estimate the number of block transfers and seeks required,
using merge join.
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
14!
12.3.d Let r1(A, B, C) and r2(C, D, E) have the following
properties: r1 has 20,000 tuples, r2 has 45,000 tuples, 25 tuples
of r1 fit on one block, and 30 tuples of r2 fit on one block.
Estimate the number of block transfers and seeks required,
using hash join.
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
15!
12.11. Describe how to implement the following operation using
sorting
semijoin (⋉θ): r ⋉θ s is defined as ΠR (r ⋈θ s)where R is the set of
attributes in the schema of r; that it selects those tuples in ri in r
for which there is a tuple sj in s such that ri and sj satisfy
predicate θ.
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
16!
12.13 Design a variant of hybrid merge-join algorithm for the
case where both relations are not physically sorted, but both
have a sorted secondary index on the join attributes.
01/04/15!
Department of Computer Science and Engineering !
Indian Institute of Technology Ropar!
17!