Download Report

CS 245: Database System
Principles
Notes 4: Indexing
Hector Garcia-Molina
CS 245
Notes 4
1
Chapter 4
Indexing & Hashing
value
CS 245
record
value
?
Notes 4
2
Topics
• Conventional indexes
• B-trees
• Hashing schemes
CS 245
Notes 4
3
Sequential File
10
20
30
40
50
60
70
80
90
100
CS 245
Notes 4
4
Dense Index
10
20
10
20
30
40
30
40
50
60
70
80
50
60
70
80
90
100
90
100
110
120
CS 245
Sequential File
Notes 4
5
Sparse Index
10
20
10
30
50
70
30
40
90
110
130
150
50
60
70
80
90
100
170
190
210
230
CS 245
Sequential File
Notes 4
6
Sequential File
Sparse 2nd level
10
90
170
250
330
410
490
570
10
20
10
30
50
70
30
40
90
110
130
150
50
60
70
80
90
100
170
190
210
230
CS 245
Notes 4
7
• Comment:
{FILE,INDEX} may be contiguous
or not (blocks chained)
CS 245
Notes 4
8
Question:
• Can we build a dense, 2nd level index
for a dense index?
CS 245
Notes 4
9
Notes on pointers:
(1) Block pointer (sparse index) can be
smaller than record pointer
BP
RP
CS 245
Notes 4
10
Notes on pointers:
(2) If file is contiguous, then we can omit
pointers (i.e., compute them)
CS 245
Notes 4
11
R1
K1
R2
K2
K3
R3
K4
R4
CS 245
Notes 4
12
R1
K1
R2
K2
K3
R3
say:
1024 B
per block
K4
R4
• if we want K3 block:
get it at offset
(3-1)1024
= 2048 bytes
CS 245
Notes 4
13
Sparse vs. Dense Tradeoff
• Sparse: Less index space per record
can keep more of index in memory
• Dense: Can tell if any record exists
without accessing file
(Later:
– sparse better for insertions
– dense needed for secondary indexes)
CS 245
Notes 4
14
Terms
•
•
•
•
•
•
•
Index sequential file
Search key (  primary key)
Primary index (on Sequencing field)
Secondary index
Dense index (all Search Key values in)
Sparse index
Multi-level index
CS 245
Notes 4
15
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
CS 245
Notes 4
16
Duplicate keys
10
10
10
20
20
30
30
30
40
45
CS 245
Notes 4
17
Duplicate keys
Dense index, one way to implement?
10
10
10
10
10
20
10
20
20
30
20
30
30
30
CS 245
30
30
40
45
Notes 4
18
Duplicate keys
Dense index, better way?
10
10
10
20
30
40
10
20
20
30
30
30
40
45
CS 245
Notes 4
19
Duplicate keys
Sparse index, one way?
10
10
10
10
20
30
10
20
20
30
30
30
40
45
CS 245
Notes 4
20
Duplicate keys
careful if looking
for 20 or 30!
Sparse index, one way?
CS 245
10
10
10
10
20
30
10
20
20
30
30
30
40
45
Notes 4
21
Duplicate keys
Sparse index, another way?
– place first new key from block
10
20
30
30
10
10
10
20
20
30
30
30
40
45
CS 245
Notes 4
22
Duplicate keys
Sparse index, another way?
– place first new key from block
should
this be
40?
10
20
30
30
10
10
10
20
20
30
30
30
40
45
CS 245
Notes 4
23
Summary
Duplicate values,
primary index
• Index may point to first instance of
each value only
File
Index
a
a
a
.
.
b
CS 245
Notes 4
24
Deletion from sparse index
10
20
10
30
50
70
30
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
25
Deletion from sparse index
– delete record 40
10
20
10
30
50
70
30
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
26
Deletion from sparse index
– delete record 40
10
20
10
30
50
70
30
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
27
Deletion from sparse index
– delete record 30
10
20
10
30
50
70
30
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
28
Deletion from sparse index
– delete record 30
10
20
10
40 30
50
70
30 40
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
29
Deletion from sparse index
– delete records 30 & 40
10
20
10
30
50
70
30
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
30
Deletion from sparse index
– delete records 30 & 40
10
20
10
30
50
70
30
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
31
Deletion from sparse index
– delete records 30 & 40
10
20
10
50 30
70 50
70
30
40
50
60
90
110
130
150
CS 245
70
80
Notes 4
32
Deletion from dense index
10
20
10
20
30
40
30
40
50
60
50
60
70
80
CS 245
70
80
Notes 4
33
Deletion from dense index
– delete record 30
10
20
10
20
30
40
30
40
50
60
50
60
70
80
CS 245
70
80
Notes 4
34
Deletion from dense index
– delete record 30
10
20
10
20
30
40
30 40
40
50
60
50
60
70
80
CS 245
70
80
Notes 4
35
Deletion from dense index
– delete record 30
10
20
10
20
40 30
40
30 40
40
50
60
50
60
70
80
CS 245
70
80
Notes 4
36
Insertion, sparse index case
10
20
10
30
40
60
30
40
50
60
CS 245
Notes 4
37
Insertion, sparse index case
– insert record 34
10
20
10
30
40
60
30
40
50
60
CS 245
Notes 4
38
Insertion, sparse index case
– insert record 34
10
20
10
30
40
60
30
34
40
50
• our lucky day!
we have free space
where we need it!
CS 245
Notes 4
60
39
Insertion, sparse index case
– insert record 15
10
20
10
30
40
60
30
40
50
60
CS 245
Notes 4
40
Insertion, sparse index case
– insert record 15
10
20 15
10
20 30
40
60
30 20
30
40
50
60
CS 245
Notes 4
41
Insertion, sparse index case
– insert record 15
10
20 15
10
20 30
40
60
30 20
30
40
50
• Illustrated: Immediate reorganization 60
• Variation:
– insert new block (chained file)
– update index
CS 245
Notes 4
42
Insertion, sparse index case
– insert record 25
10
30
40
60
10
20
30
40
50
60
CS 245
Notes 4
43
Insertion, sparse index case
– insert record 25
10
30
40
60
10
20
25
30
overflow blocks
(reorganize later...)
40
50
60
CS 245
Notes 4
44
Insertion, dense index case
• Similar
• Often more expensive . . .
CS 245
Notes 4
45
Secondary indexes
Sequence
field
30
50
20
70
80
40
100
10
90
60
CS 245
Notes 4
46
Secondary indexes
Sequence
field
• Sparse index
30
50
30
20
80
100
20
70
80
40
90
...
100
10
90
60
CS 245
Notes 4
47
Secondary indexes
Sequence
field
• Sparse index
30
50
30
20
80
100
20
70
80
40
90
...
100
10
does not make sense!
CS 245
Notes 4
90
60
48
Secondary indexes
Sequence
field
• Dense index
30
50
20
70
80
40
100
10
90
60
CS 245
Notes 4
49
Secondary indexes
Sequence
field
• Dense index
10
20
30
40
30
50
50
60
70
...
80
40
20
70
100
10
90
60
CS 245
Notes 4
50
Secondary indexes
Sequence
field
• Dense index
10
50
90
...
sparse
high
level
CS 245
10
20
30
40
30
50
50
60
70
...
80
40
20
70
100
10
90
60
Notes 4
51
With secondary indexes:
• Lowest level is dense
• Other levels are sparse
Also: Pointers are record pointers
(not block pointers; not computed)
CS 245
Notes 4
52
Duplicate values & secondary indexes
20
10
20
40
10
40
10
40
30
40
CS 245
Notes 4
53
Duplicate values & secondary indexes
one option...
20
10
10
10
10
20
20
40
20
30
40
40
10
40
10
40
30
40
40
40
...
CS 245
Notes 4
54
Duplicate values & secondary indexes
one option...
Problem:
excess overhead!
• disk space
• search time
CS 245
20
10
10
10
10
20
20
40
20
30
40
40
10
40
10
40
30
40
40
40
...
Notes 4
55
Duplicate values & secondary indexes
another option...
10
20
10
20
20
40
10
40
30
40
CS 245
10
40
30
40
Notes 4
56
Duplicate values & secondary indexes
another option...
Problem:
variable size
records in
index!
CS 245
10
20
10
20
20
40
10
40
30
40
10
40
30
40
Notes 4
57
Duplicate values & secondary indexes
20
10
10
20
30
40
20
40
10
40
50
60
...
Another idea (suggested in class):
Chain records with same key?
CS 245

Notes 4
10
40

30
40


58
Duplicate values & secondary indexes
20
10
10
20
30
40

20
40
10
40
50
60
...
Another idea (suggested in class):
Chain records with same key?
10
40

30
40


Problems:
• Need to add fields to records
• Need to follow chain to know records
CS 245
Notes 4
59
Duplicate values & secondary indexes
20
10
10
20
30
40
20
40
10
40
50
60
...
10
40
30
40
buckets
CS 245
Notes 4
60
Why “bucket” idea is useful
Indexes
Name: primary
Dept: secondary
Floor: secondary
CS 245
Records
EMP (name,dept,floor,...)
Notes 4
61
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index
EMP
Toy
CS 245
Floor index
2nd
Notes 4
62
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index
EMP
Floor index
Toy
2nd
 Intersect toy bucket and 2nd Floor
bucket to get set of matching EMP’s
CS 245
Notes 4
63
This idea used in
text information retrieval
Documents
...the cat is
fat ...
...was raining
cats and dogs...
...Fido the
dog ...
CS 245
Notes 4
64
This idea used in
text information retrieval
Documents
cat
...the cat is
fat ...
dog
...was raining
cats and dogs...
...Fido the
dog ...
Inverted lists
CS 245
Notes 4
65
IR QUERIES
• Find articles with “cat” and “dog”
• Find articles with “cat” or “dog”
• Find articles with “cat” and not “dog”
CS 245
Notes 4
66
IR QUERIES
• Find articles with “cat” and “dog”
• Find articles with “cat” or “dog”
• Find articles with “cat” and not “dog”
• Find articles with “cat” in title
• Find articles with “cat” and “dog”
within 5 words
CS 245
Notes 4
67
Common technique:
more info in inverted list
cat
d1
Title 5
Author 10
Abstract 57
dog
CS 245
Title
Title
d3
d2
100
12
Notes 4
68
Posting: an entry in inverted list.
Represents occurrence of
term in article
Size of a list:
1
Rare words or
miss-spellings
106
Common words
(in postings)
Size of a posting: 10-15 bits
CS 245
Notes 4
(compressed)
69
IR DISCUSSION
•
•
•
•
•
Stop words
Truncation
Thesaurus
Full text vs. Abstracts
Vector model
CS 245
Notes 4
70
Vector space model
w1 w2 w3 w4 w5 w6 w7 …
DOC = <1
0 0 1
1 0 0 …>
Query= <0
CS 245
0
1
Notes 4
1
0
0
0 …>
71
Vector space model
w1 w2 w3 w4 w5 w6 w7 …
DOC = <1
0 0 1
1 0 0 …>
Query= <0
0
1
PRODUCT =
CS 245
1
0
0
0 …>
1 + ……. = score
Notes 4
72
• Tricks to weigh scores + normalize
e.g.: Match on common word not as
useful as match on rare words...
CS 245
Notes 4
73
• How to process V.S. Queries?
w1 w2 w3 w4 w5 w6
Q=<0
0 0 1
1 0
CS 245
Notes 4
…
…>
74
• Try Stanford Libraries
• Try Google, Yahoo, ...
CS 245
Notes 4
75
Summary so far
• Conventional index
– Basic Ideas: sparse, dense, multi-level…
– Duplicate Keys
– Deletion/Insertion
– Secondary indexes
– Buckets of Postings List
CS 245
Notes 4
76
Conventional indexes
Advantage:
- Simple
- Index is sequential file
good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
CS 245
Notes 4
77
Example
Index
(sequential)
10
20
30
continuous
40
50
60
free space
70
80
90
CS 245
Notes 4
78
Example
continuous
Index
(sequential)
10
20
30
33
40
50
60
39
31
35
36
32
38
34
free space
70
80
90
CS 245
overflow area
(not sequential)
Notes 4
79
Outline:
• Conventional indexes
• B-Trees
 NEXT
• Hashing schemes
CS 245
Notes 4
80
• NEXT: Another type of index
– Give up on sequentiality of index
– Try to get “balance”
CS 245
Notes 4
81
CS 245
Notes 4
180
200
150
156
179
120
130
100
101
110
30
35
3
5
11
120
150
180
30
100
B+Tree Example
n=3
Root
82
95
81
57
Sample non-leaf
to keys
to keys
to keys
< 57
57 k<81
81k<95
CS 245
Notes 4
to keys
95
83
Sample leaf node:
CS 245
81
95
To record
with key 81
To record
with key 85
To record
with key 57
57
From non-leaf node
Notes 4
to next leaf
in sequence
84
In textbook’s notation
n=3
Leaf:
30
35
30 35
Non-leaf:
30
30
CS 245
Notes 4
85
Size of nodes:
CS 245
n+1 pointers
(fixed)
n keys
Notes 4
86
Don’t want nodes to be too empty
• Use at least
Non-leaf:
(n+1)/2 pointers
Leaf:
(n+1)/2 pointers to data
CS 245
Notes 4
87
n=3
30
Leaf
30
35
CS 245
Notes 4
counts even if null
Non-leaf
120
150
180
min. node
3
5
11
Full node
88
B+tree rules
tree of order n
(1) All leaves at same lowest level
(balanced tree)
(2) Pointers in leaves point to records
except for “sequence pointer”
CS 245
Notes 4
89
(3) Number of pointers/keys for B+tree
Max Max Min
ptrs keys ptrsdata
Non-leaf
(non-root)
Leaf
(non-root)
Root
CS 245
Min
keys
n+1
n
(n+1)/2 (n+1)/2- 1
n+1
n
(n+1)/2
(n+1)/2
n+1
n
1
1
Notes 4
90
Insert into B+tree
(a) simple case
– space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
CS 245
Notes 4
91
(a) Insert key = 32
CS 245
30
31
3
5
11
30
100
n=3
Notes 4
92
(a) Insert key = 32
CS 245
30
31
32
3
5
11
30
100
n=3
Notes 4
93
(a) Insert key = 7
CS 245
30
31
3
5
11
30
100
n=3
Notes 4
94
(a) Insert key = 7
CS 245
30
31
3
57
11
3
5
30
100
n=3
Notes 4
95
(a) Insert key = 7
CS 245
30
30
31
3
57
11
3
5
7
100
n=3
Notes 4
96
(c) Insert key = 160
150
156
179
180
200
120
150
180
100
n=3
CS 245
Notes 4
97
(c) Insert key = 160
CS 245
Notes 4
180
200
160
179
150
156
179
120
150
180
100
n=3
98
(c) Insert key = 160
CS 245
Notes 4
180
200
180
160
179
150
156
179
120
150
180
100
n=3
99
n=3
CS 245
Notes 4
180
200
180
160
179
150
156
179
120
150
180
160
100
(c) Insert key = 160
100
(d) New root, insert 45
CS 245
Notes 4
30
32
40
20
25
10
12
1
2
3
10
20
30
n=3
101
(d) New root, insert 45
CS 245
Notes 4
40
45
30
32
40
20
25
10
12
1
2
3
10
20
30
n=3
102
(d) New root, insert 45
40
45
40
Notes 4
30
32
40
20
25
10
20
30
10
12
1
2
3
CS 245
n=3
103
(d) New root, insert 45
CS 245
Notes 4
40
45
30
32
40
20
25
10
12
1
2
3
10
20
30
40
30
new root
n=3
104
Deletion from B+tree
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
CS 245
Notes 4
105
(b) Coalesce with sibling
n=4
CS 245
40
50
10
20
30
10
40
100
– Delete 50
Notes 4
106
(b) Coalesce with sibling
n=4
CS 245
40
50
10
20
30
40
10
40
100
– Delete 50
Notes 4
107
(c) Redistribute keys
n=4
CS 245
40
50
10
20
30
35
10
40
100
– Delete 50
Notes 4
108
(c) Redistribute keys
n=4
CS 245
35
40
50
10
20
30
35
10
40 35
100
– Delete 50
Notes 4
109
(d) Non-leaf coalese
n=4
CS 245
Notes 4
40
45
30
37
30
40
25
26
20
22
10
14
1
3
10
20
25
– Delete 37
110
(d) Non-leaf coalese
n=4
CS 245
Notes 4
40
45
30
37
30
40
25
26
30
20
22
10
14
1
3
10
20
25
– Delete 37
111
(d) Non-leaf coalese
n=4
CS 245
Notes 4
40
45
30
37
25
26
30
30
40
40
20
22
10
14
1
3
10
20
25
– Delete 37
112
(d) Non-leaf coalese
n=4
25
– Delete 37
CS 245
Notes 4
40
45
30
37
30
40
25
26
30
20
22
10
14
1
3
10
20
25
40
new root
113
B+tree deletions in practice
– Often, coalescing is not implemented
– Too hard and not worth it!
CS 245
Notes 4
114
Comparison: B-trees vs. static
indexed sequential file
Ref #1: Held & Stonebraker
“B-Trees Re-examined”
CACM, Feb. 1978
CS 245
Notes 4
115
Ref # 1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space
For their comparison:
block = 512 bytes
key = pointer = 4 bytes
4 data records per block
CS 245
Notes 4
116
Example: 1 block static index
k1
k1
127 keys
k2
k2
k3
k3
(127+1)4 = 512 Bytes
-> pointers in index implicit!
CS 245
1 data
block
Notes 4
up to 127
blocks
117
Example: 1 block B-tree
k1
k1
1 data
block
k2
63 keys
k2
...
k63
k3
next
63x(4+4)+8 = 512 Bytes
-> pointers needed in B-tree
blocks because index is
not contiguous
CS 245
Notes 4
up to 63
blocks
118
Size comparison
Ref. #1
Static Index
# data
blocks
height
2 -> 127
128 -> 16,129
16,130 -> 2,048,383
CS 245
2
3
4
B-tree
# data
blocks
height
2 -> 63
64 -> 3968
3969 -> 250,047
250,048 -> 15,752,961
Notes 4
2
3
4
5
119
Ref. #1 analysis claims
• For an 8,000 block file,
after 32,000 inserts
after 16,000 lookups
 Static index saves enough accesses
to allow for reorganization
CS 245
Notes 4
120
Ref. #1 analysis claims
• For an 8,000 block file,
after 32,000 inserts
after 16,000 lookups
 Static index saves enough accesses
to allow for reorganization
Ref. #1 conclusion
CS 245
Static index better!!
Notes 4
121
Ref #2: M. Stonebraker,
“Retrospective on a database
system,” TODS, June 1980
B-trees better!!
Ref. #2 conclusion
CS 245
Notes 4
122
B-trees better!!
Ref. #2 conclusion
• DBA does not know when to reorganize
• DBA does not know how full to load
pages of new index
CS 245
Notes 4
123
B-trees better!!
Ref. #2 conclusion
• Buffering
– B-tree: has fixed buffer requirements
– Static index: must read several overflow
blocks to be efficient
(large & variable size
buffers needed for this)
CS 245
Notes 4
124
• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
CS 245
Notes 4
125
• Speaking of buffering…
Is LRU a good policy for B+tree buffers?
 Of course not!
 Should try to keep root in memory
at all times
(and perhaps some nodes from second level)
CS 245
Notes 4
126
Interesting problem:
For B+tree, how large should n be?
…
n is number of keys / node
CS 245
Notes 4
127
Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.
CS 245
Notes 4
128
Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.
(2) Once block in memory, use binary
search to locate key:
(a + b LOG2 n) msec.
For some constants a,b; Assume a << S
CS 245
Notes 4
129
Sample assumptions:
(1) Time to read node from disk is
(S+Tn) msec.
(2) Once block in memory, use binary
search to locate key:
(a + b LOG2 n) msec.
For some constants a,b; Assume a << S
(3) Assume B+tree is full, i.e.,
# nodes to examine is LOGn N
where N = # records
CS 245
Notes 4
130
Can get:
f(n) = time to find a record
f(n)
nopt
CS 245
Notes 4
n
131
 FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”
(see homework for details)
CS 245
Notes 4
132
 FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”
(see homework for details)
 What happens to nopt as
• Disk gets faster?
• CPU get faster?
CS 245
Notes 4
133
Exercise
S=
14000
T=
0.2
b=
0.002
a=
0
N=
10000000
• f(n)= lognN*[S+T*n+a+b*log2(n)]
CS 245
Notes 4
134
N=10 million records
S=
T=
b=
a=
N=
14000
0.2
0.002
0
10000000
times in microseconds
CS 245
Notes 4
135
N=100 million records
S=
T=
b=
a=
N=
14000
0.2
0.002
0
10000000
times in microseconds
CS 245
Notes 4
136
Variation on B+tree: B-tree (no +)
• Idea:
– Avoid duplicate keys
– Have record pointers in non-leaf nodes
CS 245
Notes 4
137
K1 P1
to keys
< K1
CS 245
K2 P2
K3 P3
to record
to record
to record
with K1
with K2
with K3
to keys
to keys
K1<x<K2
K2<x<k3
Notes 4
to keys
>k3
138
CS 245
Notes 4
170
180
150
160
130
140
110
120
90
100
70
80
50
60
30
40
10
20
145
165
85
105
25
45
65
125
B-tree example
n=2
139
CS 245
Notes 4
170
180
150
160
145
165
110
120
90
100
70
80
50
60
30
40
10
20
25
45
85
105
(but keep space for simplicity)
130
140
• sequence pointers
not useful now!
n=2
65
125
B-tree example
140
Note on inserts
leaf
CS 245
10
20
30
• Say we insert record with key = 25
Notes 4
n=3
141
Note on inserts
• Say we insert record with key = 25
CS 245
25
30
10
–
20
–
• Afterwards:
10
20
30
leaf
n=3
Notes 4
142
So, for B-trees:
MAX
Non-leaf
non-root
Leaf
non-root
Root
non-leaf
Root
Leaf
CS 245
MIN
Tree
Ptrs
Rec Keys
Ptrs
Tree
Ptrs
Rec
Ptrs
Keys
n+1
n
n
1
n
n
1
n/2
n+1
n
n
2
1
1
1
n
n
1
1
1
(n+1)/2 (n+1)/2-1 (n+1)/2-1
Notes 4
n/2
143
Tradeoffs:
 B-trees have faster lookup than B+trees
 in B-tree, non-leaf & leaf different sizes
 in B-tree, deletion more complicated
CS 245
Notes 4
144
Tradeoffs:
 B-trees have faster lookup than B+trees
 in B-tree, non-leaf & leaf different sizes
 in B-tree, deletion more complicated
 B+trees preferred!
CS 245
Notes 4
145
But note:
• If blocks are fixed size
(due to disk and buffering restrictions)
Then lookup for B+tree is
actually better!!
CS 245
Notes 4
146
Example:
- Pointers
- Keys
- Blocks
- Look at full
CS 245
4 bytes
4 bytes
100 bytes (just example)
2 level tree
Notes 4
147
B-tree:
Root has 8 keys + 8 record pointers
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
CS 245
Notes 4
148
B-tree:
Root has 8 keys + 8 record pointers
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
Each of 9 sons: 12 rec. pointers (+12 keys)
= 12x(4+4) + 4 = 100 bytes
CS 245
Notes 4
149
B-tree:
Root has 8 keys + 8 record pointers
+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
Each of 9 sons: 12 rec. pointers (+12 keys)
= 12x(4+4) + 4 = 100 bytes
2-level B-tree, Max # records =
12x9 + 8 = 116
CS 245
Notes 4
150
B+tree:
Root has 12 keys + 13 son pointers
= 12x4 + 13x4 = 100 bytes
CS 245
Notes 4
151
B+tree:
Root has 12 keys + 13 son pointers
= 12x4 + 13x4 = 100 bytes
Each of 13 sons: 12 rec. ptrs (+12 keys)
= 12x(4 +4) + 4 = 100 bytes
CS 245
Notes 4
152
B+tree:
Root has 12 keys + 13 son pointers
= 12x4 + 13x4 = 100 bytes
Each of 13 sons: 12 rec. ptrs (+12 keys)
= 12x(4 +4) + 4 = 100 bytes
2-level B+tree, Max # records
= 13x12 = 156
CS 245
Notes 4
153
So...
8 records
B+
B
ooooooooooooo
ooooooooo
156 records
108 records
Total = 116
CS 245
Notes 4
154
So...
8 records
B+
B
ooooooooooooo
ooooooooo
156 records
108 records
Total = 116
• Conclusion:
– For fixed block size,
– B+ tree is better because it is bushier
CS 245
Notes 4
155
An Interesting Problem...
• What is a good index structure when:
– records tend to be inserted with keys
that are larger than existing values?
(e.g., banking records with growing data/time)
– we want to remove older data
CS 245
Notes 4
156
One Solution: Multiple Indexes
• Example: I1, I2
day
10
11
12
13
days indexed
I1
1,2,3,4,5
11,2,3,4,5
11,12,3,4,5
11,12,13,4,5
days indexed
I2
6,7,8,9,10
6,7,8,9,10
6,7,8,9,10
6,7,8,9,10
•advantage: deletions/insertions from smaller index
•disadvantage: query multiple indexes
CS 245
Notes 4
157
Another Solution (Wave Indexes)
day
10
11
12
13
14
15
16
I1
1,2,3
1,2,3
1,2,3
13
13,14
13,14,15
13,14,15
I2
4,5,6
4,5,6
4,5,6
4,5,6
4,5,6
4,5,6
16
I3
7,8,9
7,8,9
7,8,9
7,8,9
7,8,9
7,8,9
7,8,9
I4
10
10,11
10,11,
10,11,
10,11,
10,11,
10,11,
•advantage: no deletions
•disadvantage: approximate windows
CS 245
Notes 4
158
12
12
12
12
12
Outline/summary
• Conventional Indexes
• Sparse vs. dense
• Primary vs. secondary
• B trees
• B+trees vs. B-trees
• B+trees vs. indexed sequential
• Hashing schemes
CS 245
--> Next
Notes 4
159