CSL 451 Introduction to Database Systems Quiz 8, 8-4-2015 Name: Roll Number: 1. Define/Explain the following a. Hybrid-merge join (2 points) - Algorithm to perform join between two relations, which merges the sorted relation with the leaf entries of the secondary B+-tree index of the unsorted relation. b. Hash join (2 points) - Algorithm to perform join between two relations, which first partitions the relations using a hash function defined on the join attributes. The corresponding pairs of partitions of the two relations are then joined using a index nested-loop join using a separate hash index. c. Equivalence of two relational algebra expressions (1 point) Two relational algebra expressions are equivalent if, on every legal database instance, the two expressions generate the same set of tuples. 2. Give an instance of a relation that shows the expressions not equivalent. (2 points) R(A,B) (a1, b1) !(π β π) and !(π ) β !(π) are S(A,B) (a1, b2) !(π β π) = !(π1, π1) = π1 (π ) β ! π = π1 β π1 = π ! 3. Let relations r and s have the following properties: r has 15,000 tuples and s has 1000 tuples. 10 tuples of r fit in one memory block, while 20 tuples of s fit in one block. The main memory has a total of 100 blocks. Ir, a primary B+-tree index on relation r, occupies 25 blocks and Is, a primary B+-tree index on s, occupies 15 blocks. Let the index attribute of both the relations be the only attributes that participate in a theta join. An index nested-loop join is used to perform the join. Let us also make the assumption that the index of the inner relation is completely loaded into the memory, one block in the memory is used to store a block of the inner relation and the remaining blocks of the memory are equally divided to store blocks of the outer relation and join output. Estimate the cost of the join operation (in terms of number of disk access) between r and s using index nested-loop join. Based on the cost estimate, which of the relations should be used as the outer relation while performing the join? (8 points) case 1: Let r and s be the outer and inner relations respectively for the index nested-loop join 15 blocks of the primary B+-tree index of relation s have to be copied to the main memory. 1 block in the main memory is used to store one block of relation s 42 blocks are available for storing blocks of relation r 42 blocks are available for storing the output of the join There are 15,000/10 = 1,500 blocks of relation r. The blocks are continuous and therefore require 1500/42 number of disk access to copy them into the main memory. For each tuple of relation r loaded in the main memory, perform index lookup for relation s. In the worst case we need 15 disk accesses to load the index of s into the main memory and one additional access to the tuple of relation s that satisfies the join condition. So total number of disk accesses β 1500/42 +15+15000*1 = 15051 case 2: Let s and r be the outer and inner relations respectively for the index nested-loop join. 25 blocks of primary B+-tree index of relation r have to be copied to the main memory 1 block in the main memory is used to store one block of relation r 37 blocks are available for storing blocks of relation s 37 blocks are available for storing the output of the join There are 1000/20 = 50 blocks of relation s. The blocks are continuous and therefore require 50/37 number of disk access to copy them into the main memory. For each tuple in relation s loaded in the main memory, perform index lookup for relation r. In the worst case we need 25 disk accesses to load the index of r into the main memory and one additional access to the tuple of relation r that satisfies the join condition. So the total number of disk accesses β 50/37 +25+1000*1=1027 Relation s should be used as the outer relation in the index nested-loop join algorithm.
© Copyright 2024