Algorithms 3.1 S T

Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
3.1 S YMBOL T ABLES
‣ API
‣ elementary implementations
Algorithms
F O U R T H
E D I T I O N
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ ordered operations
Data structures
“ Smart data structures and dumb code works a lot better
than the other way around. ” – Eric S. Raymond
2
3.1 S YMBOL T ABLES
‣ API
‣ elementary implementations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ ordered operations
Symbol tables
Key-value pair abstraction.
・Insert a value with specified key.
・Given a key, search for the corresponding value.
Ex. DNS lookup.
・Insert domain name with specified IP address.
・Given domain name, find corresponding IP address.
domain name
IP address
www.cs.princeton.edu
128.112.136.11
www.princeton.edu
128.112.128.15
www.yale.edu
130.132.143.21
www.harvard.edu
128.103.060.55
www.simpsons.com
209.052.165.60
key
value
4
Symbol table applications
application
purpose of search
key
value
dictionary
find definition
word
definition
book index
find relevant pages
term
list of page numbers
file share
find song to download
name of song
computer ID
financial account
process transactions
account number
transaction details
web search
find relevant web pages
keyword
list of page names
compiler
find properties of variables
variable name
type and value
routing table
route Internet packets
destination
best route
DNS
find IP address
domain name
IP address
reverse DNS
find domain name
IP address
domain name
genomics
find markers
DNA string
known positions
file system
find file on disk
filename
location on disk
5
Symbol tables: context
Also known as: maps, dictionaries, associative arrays.
Generalizes arrays. Keys need not be between 0 and N – 1.
Language support.
・External libraries: C, VisualBasic, Standard ML, bash, ...
・Built-in libraries: Java, C#, C++, Scala, ...
・Built-in to language: Awk, Perl, PHP, Tcl, JavaScript, Python, Ruby, Lua.
every array is an
associative array
every object is an
associative array
table is the only
primitive data structure
hasNiceSyntaxForAssociativeArrays["Python"] = True
hasNiceSyntaxForAssociativeArrays["Java"]
= False
legal Python code
6
Basic symbol table API
Associative array abstraction. Associate one value with each key.
public class ST<Key, Value>
ST()
void put(Key key, Value val)
Value get(Key key)
boolean contains(Key key)
void delete(Key key)
boolean isEmpty()
int size()
Iterable<Key> keys()
create an empty symbol table
put key-value pair into the table
value paired with key
a[key] = val;
a[key]
is there a value paired with key?
remove key (and its value) from table
is the table empty?
number of key-value pairs in the table
all the keys in the table
7
Conventions
・Values are not null.
・Method get() returns null if key not present.
・Method put() overwrites old value with new value.
Java allows null value
Intended consequences.
・Easy to implement contains().
public boolean contains(Key key)
{ return get(key) != null; }
・Can implement lazy version of delete().
public void delete(Key key)
{ put(key, null); }
8
Keys and values
Value type. Any generic type.
specify Comparable in API.
Key type: several natural assumptions.
・Assume keys are Comparable, use compareTo().
・Assume keys are any generic type, use equals() to test equality.
・Assume keys are any generic type, use equals() to test equality;
use hashCode() to scramble key.
built-in to Java
(stay tuned)
Best practices. Use immutable types for symbol table keys.
・Immutable in Java: Integer, Double, String, java.io.File, …
・Mutable in Java: StringBuilder, java.net.URL, arrays, ...
9
Equality test
All Java classes inherit a method equals().
Java requirements. For any references x, y and z:
・Reflexive:
・Symmetric:
・Transitive:
・Non-null:
x.equals(x)
is true.
x.equals(y)
iff y.equals(x).
equivalence
relation
if x.equals(y) and y.equals(z), then x.equals(z).
x.equals(null)
is false.
do x and y refer to
the same object?
Default implementation. (x == y)
Customized implementations. Integer, Double, String, java.io.File, …
User-defined implementations. Some care needed.
10
Implementing equals for user-defined types
Seems easy.
public
class Date implements Comparable<Date>
{
private final int month;
private final int day;
private final int year;
...
public boolean equals(Date that)
{
if (this.day
!= that.day ) return false;
if (this.month != that.month) return false;
if (this.year != that.year ) return false;
return true;
check that all significant
fields are the same
}
}
11
Implementing equals for user-defined types
Seems easy, but requires some care.
typically unsafe to use equals() with inheritance
(would violate symmetry)
public final class Date implements Comparable<Date>
{
private final int month;
private final int day;
private final int year;
...
public boolean equals(Object y)
{
if (y == this) return true;
must be Object.
Why? Experts still debate.
optimize for true object equality
if (y == null) return false;
check for null
if (y.getClass() != this.getClass())
return false;
objects must be in the same class
(religion: getClass() vs. instanceof)
Date that = (Date) y;
if (this.day
!= that.day ) return false;
if (this.month != that.month) return false;
if (this.year != that.year ) return false;
return true;
cast is guaranteed to succeed
check that all significant
fields are the same
}
}
12
Equals design
"Standard" recipe for user-defined types.
・Optimization for reference equality.
・Check against null.
・Check that two objects are of the same type; cast.
・Compare each significant field:
– if field is a primitive type, use ==
but use Double.compare() with double
(to deal with -0.0 and NaN)
– if field is an object, use equals()
apply rule recursively
– if field is an array, apply to each entry
can use Arrays.deepEquals(a, b)
but not a.equals(b)
Best practices.
e.g., cached Manhattan distance
・No need to use calculated fields that depend on other fields.
・Compare fields mostly likely to differ first.
・Make compareTo() consistent with equals().
x.equals(y) if and only if (x.compareTo(y) == 0)
13
ST test client for traces
Build ST by associating value i with ith string from standard input.
public static void main(String[] args)
{
ST<String, Integer> st = new ST<String, Integer>();
for (int i = 0; !StdIn.isEmpty(); i++)
keys S E A R C H
{
values 0 1 2 3 4 5
String key = StdIn.readString();
st.put(key, i);
output for
basic symbol table
}
(one possibility)
for (String s : st.keys())
L 11
StdOut.println(s + " " + st.get(s));
}
P 10
M
X
keys
S
E
A
R
C
H
E
X
A
M
values 0
1
2
3
4
5
6
7
8
9
output for
basic symbol table
(one possibility)
H
C
P L ER
10 11 12A
E
S
output for
ordered
symbol table
A
8
9
7
5
4
3
8
12
0
E
X
A
M
P
L
6
7
8
9 10 11 1
output
outputfor
ordered
symbol table
A
8
C
E
H
L
M
P
R
S
4
12
5
11
9
10
3
0
X
7
Keys, values, and output for test client
14
ST test client for analysis
Frequency counter. Read a sequence of strings from standard input
and print out one that occurs with highest frequency.
% more
it was
it was
it was
it was
it was
it was
it was
it was
it was
it was
tinyTale.txt
the best of times
the worst of times
the age of wisdom
the age of foolishness
the epoch of belief
the epoch of incredulity
the season of light
the season of darkness
the spring of hope
the winter of despair
% java FrequencyCounter 1 < tinyTale.txt
it 10
tiny example
(60 words, 20 distinct)
% java FrequencyCounter 8 < tale.txt
business 122
real example
(135,635 words, 10,769 distinct)
% java FrequencyCounter 10 < leipzig1M.txt
government 24763
real example
(21,191,455 words, 534,580 distinct)
15
Frequency counter implementation
public class FrequencyCounter
{
public static void main(String[] args)
{
int minlen = Integer.parseInt(args[0]);
ST<String, Integer> st = new ST<String, Integer>();
while (!StdIn.isEmpty())
{
String word = StdIn.readString();
ignore short strings
if (word.length() < minlen) continue;
if (!st.contains(word)) st.put(word, 1);
else
st.put(word, st.get(word) + 1);
}
String max = "";
st.put(max, 0);
for (String word : st.keys())
if (st.get(word) > st.get(max))
max = word;
StdOut.println(max + " " + st.get(max));
}
}
create ST
read string and
update frequency
print a string
with max freq
16
3.1 S YMBOL T ABLES
‣ API
‣ elementary implementations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ ordered operations
Sequential search in a linked list
Data structure. Maintain an (unordered) linked list of key-value pairs.
Search. Scan through all keys until find a match.
Insert. Scan through all keys until find a match; if no match add to front.
key value first
S
0
S 0
red nodes
are new
E
1
E 1
S 0
A
2
A 2
E 1
S 0
R
3
R 3
A 2
E 1
S 0
C
4
C 4
R 3
A 2
E 1
S 0
H
5
H 5
C 4
R 3
A 2
E 1
S 0
E
6
H 5
C 4
R 3
A 2
E 6
S 0
X
7
X 7
H 5
C 4
R 3
A 2
E 6
S 0
A
8
X 7
H 5
C 4
R 3
A 8
E 6
S 0
M
9
M 9
X 7
H 5
C 4
R 3
A 8
E 6
gray nodes
are untouched
S 0
P 10
P 10
M 9
X 7
H 5
C 4
R 3
A 8
E 6
S 0
L 11
L 11
P 10
M 9
X 7
H 5
C 4
R 3
A 8
E 6
S 0
E 12
L 11
P 10
M 9
X 7
H 5
C 4
R 3
A 8
E 12
S 0
black nodes
are accessed
in search
circled entries are
changed values
Trace of linked-list ST implementation for standard indexing client
18
Elementary ST implementations: summary
guarantee
average case
operations
on keys
implementation
sequential search
(unordered list)
search
insert
search hit
insert
N
N
N
N
equals()
Challenge. Efficient implementations of both search and insert.
19
Binary search in an ordered array
Data structure. Maintain an ordered array of key-value pairs.
Rank helper function. How many keys < k ?
keys[]
0 1 2 3 4 5 6 7 8 9
successful search for P
keys[]
lo hi m
A C E H L M P R S X
successful
search
for for
P
entries in black
0
4 M
5 P
6 7
9
successful
0search
9 4P
A 1
C 2
E 3
H keys[]
L
R 8
S X
are a[lo..hi]
lo
hi
m
5 9 7
A
0 C
1 E
2 H
3 L
4 M
5 P
6 R
7 S
8 X
9
successful search for P
entries in black
0
A
R S
S X
5 9
6 4
5
A C
C E
E H
H L
L M
M P
P R
X
are a[lo..hi]
lo hi m
entry in red is a[m]
5
9
7
A
C
E
H
L
M
P
R
S
X
entries in black
6
A
0 6
9 6
4
A C
C E
E H
H L
L M
M P
P R
R S
S X
X
are a[lo..hi]
5 search
6 5
A C E H L M P R S exits
X
keys[m] = P: return 6
unsuccessful
5 9 7for Q A C E H L M P R loop
S X with
entry in red is a[m]
6
6
6
A
C
E
H
L
M
P
R
S
X
lo
5 hi
6 m5
A C E H L M P R S X
exits with
keys[m]
P: return 6
entry
in red is=a[m]
unsuccessful
search4for Q A C E H L M P R loop
0
6 9
6 6
A C E H L M P R S
S X
X
lo
5 hi
9 form7Q
A C E H L M P R loop
S exits
X with keys[m] = P: return 6
unsuccessful
search
unsuccessful search for Q
0
A
R S
S X
5 9
6 4
5
A C
C E
E H
H L
L M
M P
P R
X
lo hi m
5
9 7
A
C E
H L
M P
R S
X
7
A
0 6
9 6
4
A C
C E
E H
H L
L M
M P
P R
R S
S X
X
5 6 5
A C E H L M P R S X
5 9 7 loop exits
A with
C Elo >H hi:
L return
M P7 R S X
7 6 6
A C E H L M P R S X
5 6 5
A C E H L M P R S X
Trace of
binary searchreturn
for rank in an ordered array
7 6 6 loop exits
A with
C Elo >H hi:
L M P7 R S X
loop
exitsof
with
lo >search
hi: return
Trace
binary
for rank7 in an ordered array
Trace of binary search for rank in an ordered array
20
Binary search: Java implementation
public Value get(Key key)
{
if (isEmpty()) return null;
int i = rank(key);
if (i < N && keys[i].compareTo(key) == 0) return vals[i];
else return null;
}
number of keys < key
private int rank(Key key)
{
int lo = 0, hi = N-1;
while (lo <= hi)
{
int mid = lo + (hi - lo) / 2;
int cmp = key.compareTo(keys[mid]);
if
(cmp < 0) hi = mid - 1;
else if (cmp > 0) lo = mid + 1;
else if (cmp == 0) return mid;
}
return lo;
}
21
Elementary symbol tables: quiz 1
Implementing binary search was
A.
Easier than I thought.
B.
About what I expected.
C.
Harder than I thought.
D.
Much harder than I thought.
E.
I don't know. (Well, you should!)
22
FIND THE FIRST 1
Problem. Given an array with all 0s in the beginning and all 1s at the end,
find the index in the array where the 1s start.
input
0
0
0
0
0
…
0
0
0
0
0
1
1
1
…
1
1
1
Variant 1. You are given the length of the array.
Variant 2. You are not given the length of the array.
23
Binary search: trace of standard indexing client
Problem. To insert, need to shift all greater keys over.
key value
0
1
2
keys[]
3 4 5 6
7
8
9
N
0
1
0
1
2
2
2
0
1
1
4
0
3
1
0
3
0
2
2
2
8
8
4
4
4
4
4
1
6
6
6
6
5
5
5
5
5
3
3
3
3
9
8
8
8
8
S
E
A
R
C
0
1
2
3
4
S
E
A
A
A
S
E
E
C
S
R
E
S
R
H
E
X
A
M
5
6
7
8
9
A
A
A
A
A
C
C
C
C
C
E
E
E
E
E
H
H
H
H
H
1
2
entries in red
3
were inserted
4
S
entries in gray 5
did not move 6
R S
R S
6
R S X
7
R S X
7
M R S X
8
P
L
E
10
11
12
A
A
A
C
C
C
E
E
E
H
H
H
M
L
L
P
M
M
R
P
P
S
R
R
X
S
S
X
X
A
C
E
H
L
M
P
R
S
X
9
10
10
2
vals[]
3 4 5 6
7
8
9
entries in black
moved to the right
0
0
0
0
3
circled entries are
changed values
7
7
0
7
4 6
4 6
4 12
5 9 10 3
5 11 9 10
5 11 9 10
0
3
3
7
0
0
7
7
4 12
5 11
3
0
7
9 10
Trace of ordered-array ST implementation for standard indexing client
24
Elementary ST implementations: summary
guarantee
average case
operations
on keys
implementation
search
insert
search hit
insert
sequential search
(unordered list)
N
N
N
N
equals()
binary search
(ordered array)
log N
N
log N
N
compareTo()
Challenge. Efficient implementations of both search and insert.
25
3.1 S YMBOL T ABLES
‣ API
‣ elementary implementations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ ordered operations
Examples of ordered symbol table API
min()
get(09:00:13)
floor(09:05:00)
select(7)
keys(09:15:00, 09:25:00)
ceiling(09:30:00)
max()
keys
values
09:00:00
09:00:03
09:00:13
09:00:59
09:01:10
09:03:13
09:10:11
09:10:25
09:14:25
09:19:32
09:19:46
09:21:05
09:22:43
09:22:54
09:25:52
09:35:21
09:36:14
09:37:44
Chicago
Phoenix
Houston
Chicago
Houston
Chicago
Seattle
Seattle
Phoenix
Chicago
Chicago
Chicago
Seattle
Seattle
Chicago
Chicago
Seattle
Phoenix
size(09:15:00, 09:25:00) is 5
rank(09:10:25) is 7
Examples of ordered symbol-table operations
27
Ordered symbol table API
public class ST<Key extends Comparable<Key>, Value>
...
Key min()
smallest key
Key max()
largest key
Key floor(Key key)
Key ceiling(Key key)
largest key less than or equal to key
smallest key greater than or equal to key
int rank(Key key)
number of keys less than key
Key select(int k)
key of rank k
void deleteMin()
delete smallest key
void deleteMax()
delete largest key
int size(Key lo, Key hi)
Iterable<Key> keys()
Iterable<Key> keys(Key lo, Key hi)
number of keys between lo and hi
all keys, in sorted order
keys between lo and hi, in sorted order
28
Binary search: ordered symbol table operations summary
sequential
search
binary
search
search
N
log N
insert
N
N
min / max
N
1
floor / ceiling
N
log N
rank
N
log N
select
N
1
ordered iteration
N log N
N
order of growth of the running time for ordered symbol table operations
29
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
3.2 B INARY S EARCH T REES
‣ BSTs
‣ ordered operations
Algorithms
F O U R T H
E D I T I O N
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ deletion
3.2 B INARY S EARCH T REES
‣ BSTs
‣ ordered operations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ deletion
Binary search trees
Definition. A BST is a binary tree in symmetric order.
root
a left link
A binary tree is either:
a subtree
・Empty.
・Two disjoint binary trees (left and right).
right child
of root
null links
Anatomy of a binary tree
Symmetric order. Each node has a key,
and every node’s key is:
・
・Smaller than all keys in its right subtree.
Larger than all keys in its left subtree.
parent of A and R
left link
of E
S
E
X
A
R
C
keys smaller than E
key
H
9
value
associated
with R
keys larger than E
Anatomy of a binary search tree
3
Binary search tree demo
Search. If less, go left; if greater, go right; if equal, search hit.
successful search for H
S
E
X
A
R
C
H
M
4
Binary search tree demo
Insert. If less, go left; if greater, go right; if null, insert.
insert G
S
E
X
A
R
C
H
G
M
5
BST representation in Java
Java definition. A BST is a reference to a root Node.
A Node is composed of four fields:
・A Key and a Value.
・A reference to the left and right subtree.
smaller keys
larger keys
private class Node
{
private Key key;
private Value val;
private Node left, right;
public Node(Key key, Value val)
{
this.key = key;
this.val = val;
}
}
BST
Node
key
left
BST with smaller keys
val
right
BST with larger keys
Binary search tree
Key and Value are generic types; Key is Comparable
6
BST implementation (skeleton)
public class BST<Key extends Comparable<Key>, Value>
{
private Node root;
private class Node
{ /* see previous slide */
root of BST
}
public void put(Key key, Value val)
{ /* see next slides */ }
public Value get(Key key)
{ /* see next slides */ }
public void delete(Key key)
{ /* see next slides */ }
public Iterable<Key> iterator()
{ /* see next slides */ }
}
7
BST search: Java implementation
Get. Return value corresponding to given key, or null if no such key.
public Value get(Key key)
{
Node x = root;
while (x != null)
{
int cmp = key.compareTo(x.key);
if
(cmp < 0) x = x.left;
else if (cmp > 0) x = x.right;
else if (cmp == 0) return x.val;
}
return null;
}
Cost. Number of compares = 1 + depth of node.
8
BST insert
Put. Associate value with key.
inserting L
S
E
X
A
H
C
Search for key, then two cases:
・Key in tree ⇒ reset value.
・Key not in tree ⇒ add new node.
R
M
P
search for L ends
at this null link
S
E
X
A
R
H
C
M
create new node
P
L
S
E
X
A
R
C
H
M
reset links
on the way up
L
P
Insertion into a BST
9
BST insert: Java implementation
Put. Associate value with key.
public void put(Key key, Value val)
{ root = put(root, key, val); }
concise, but tricky,
recursive code;
read carefully!
private Node put(Node x, Key key, Value val)
{
if (x == null) return new Node(key, val);
int cmp = key.compareTo(x.key);
if
(cmp < 0) x.left = put(x.left, key, val);
else if (cmp > 0) x.right = put(x.right, key, val);
else if (cmp == 0) x.val = val;
return x;
}
Cost. Number of compares = 1 + depth of node.
10
S
C
Tree shape
R
E
A
X
・Many BSTs correspond to same set of keys. typical case
case
search/insert
= 1 + depth of node.
・Number of compares for best
H
E
S
C
best case
R
X
A
typical case
X
A
R
H
C
E
H
R
H
S
X
S
E
worst case
C
R
C
H
C
X
X
R
A
S
E
S
C
E
X
typical case
H
A
R
E
A
A
S
A
worst case
C
BST possibilities
E
H
R
S
BottomAline. Tree
depends on order of X
insertion.
worstshape
case
C
E
BST possibilities
H
11
BST insertion: random order visualization
Ex. Insert keys in random order.
12
Binary search trees: quiz 1
In what order does the traverse(root) code print out the keys in the BST?
private void traverse(Node x)
{
if (x == null) return;
traverse(x.left);
StdOut.println(x.key);
traverse(x.right);
}
A.
A C E H M R S X
B.
A C E R H M X S
C.
S E A C R H M X
D.
C A M H R E X S
E.
S
E
X
A
R
C
H
M
I don't know.
13
Inorder traversal
・Traverse left subtree.
・Enqueue key.
・Traverse right subtree.
public Iterable<Key> keys()
{
Queue<Key> q = new Queue<Key>();
inorder(root, q);
return q;
}
private void inorder(Node x, Queue<Key> q)
{
if (x == null) return;
inorder(x.left, q);
q.enqueue(x.key);
inorder(x.right, q);
}
BST
key
left
val
right
BST with larger keys
BST with smaller keys
smaller keys, in order
key
larger keys, in order
all keys, in order
Property. Inorder traversal of a BST yields keys in ascending order.
14
Binary search trees: quiz 2
What is the name of this sorting algorithm?
1. Shuffle the keys.
2. Insert the keys into a BST, one at a time.
3. Do an inorder traversal of the BST.
A.
Insertion sort.
B.
Mergesort.
C.
Quicksort.
D.
None of the above.
E.
I don't know.
15
Correspondence between BSTs and quicksort partitioning
P
H
T
D
O
E
A
I
S
U
Y
C
M
L
Remark. Correspondence is 1–1 if array has no duplicate keys.
16
BSTs: mathematical analysis
Proposition. If N distinct keys are inserted into a BST in random order,
the expected number of compares for a search/insert is ~ 2 ln N.
Pf. 1–1 correspondence with quicksort partitioning.
Proposition. [Reed, 2003] If N distinct keys are inserted into a BST
in random order, the expected height is ~ 4.311 ln N.
expected depth of
function-call stack in quicksort
How Tall is a Tree
How Tall is a Tree?
Bruce Reed
Bruce Reed
CNRS, Paris, France
CNRS, Paris, France
[email protected]
[email protected]
ABSTRACT
But… Worst-case height is N – 1.
purpose of th
ABSTRACT
have:
purpose
this note
that
for /3 -- ½
Let H~ be the height of a random
binaryof search
tree toonprove
n
nodes.
Wesearch
show tree
that on
there
constants a = 4.31107...
Let H~ be the height of a random
binary
n existshave:
THEOREM
a
n
d
/
3
=
1.95...
such
that
E(H~)
= c~logn - / 3 1 o g l o g n +
nodes. We show that there exists constants a = 4.31107...
Var(Hn)
We also
O(1).
1. E(H~) = ~ l o g n - / 3 1 o g l=o gOn
a n d / 3 = 1.95... such that E(H~)O(1),
= c~logn
- / 3show
1 o g l othat
g n +Var(H~) =THEOREM
Var(Hn) = O(1) .
O(1), We also show that Var(H~) = O(1).
R e m a r k By
Categories and Subject Descriptors
nition
given
i
R e m a r k By the definition of a, /3 = 7"g~"
3~ T
Categories and Subject Descriptors
we this
will see
E.2 [ D a t a S t r u c t u r e s ] : Trees nition given is more suggestive ofaswhy
val
For more inf
as we will see.
E.2 [ D a t a S t r u c t u r e s ] : Trees
consult
For more information on randommay
binary
searc[
1. THE RESULTS
R
e
m
a
r
k
may consult [6],[7], [1], [2], [9], [4], and [8]. Afte
A binary search tree is a binary tree to each node of which
1. THE RESULTS
an
R e m a r k After I announced these developed
results, Drmo
we have
associated
key; these keys axe drawn from some
O(1)
using
A binary search tree is a binary tree
to each
node of awhich
developed an alternative proof of the fact thatc
totally
ordered
set
and
the
key
at v cannot be larger than
illumin
we have associated a key; these keys axe drawn from some
O(1) using completely different proofs
techniques.
17
the
key
at
its
right
child
nor
smaller
than
the
key
at
its
left
decided
su
totally ordered set and the key at v cannot be larger than
proofs illuminate different aspects of the to
probl
[ exponentially small chance when keys are inserted in random order ]
ST implementations: summary
guarantee
average case
implementation
operations
on keys
search
insert
search hit
insert
sequential search
(unordered list)
N
N
N
N
equals()
binary search
(ordered array)
log N
N
log N
N
compareTo()
BST
N
N
log N
log N
compareTo()
Why not shuffle to ensure a (probabilistic) guarantee of log N?
18
3.2 B INARY S EARCH T REES
‣ BSTs
‣ ordered operations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ deletion
Minimum and maximum
Minimum. Smallest key in table.
Maximum. Largest key in table.
max()max
S
min()
E
min
X
A
R
C
H
M
Examples of BST order queries
Q. How to find the min / max?
20
Floor and ceiling
Floor. Largest key ≤ a given key.
Ceiling. Smallest key ≥ a given key.
floor(G)
max()
S
min()
E
X
A
R
C
H
ceiling(Q)
M
floor(D)
Examples of BST order queries
Q. How to find the floor / ceiling?
21
Computing the floor
finding
finding floor(S)
floor(S)
Floor. Find largest key ≤ k ?
Case 1. [ key in node x = k ]
E
E
A
A
The floor of k is in the left subtree of x.
E
C
AC
R
R
H
H
M
HM
M
C
The floor of k is k.
Case 2. [ key in node x > k ]
S
S
finding floor(S)
R
finding
finding floor(G)
floor(G)
S
S
finding floor(G)
E
E
A
A
E
C
AC
H
H
R
R
on the left
The floor of k can't be in left subtree of x:
it is either in the right subtree of x or
it is the key in node x.
X
SX
X
R greater than G so
S
S is
is
greater than G so
M
floor(G)
H M floor(G) must
must be
be
S is greater
than
G so
on
the
left
on
the
left
M floor(G) must be
C
Case 3. [ key in node x < k ]
X
SX
X
A
A
C
AC
S
S
E
E
E
H
H
M
CG so H M
is
less
than
E
E is less than G so
floor(G)
M
floor(G) could
could be
be
is less
Eon
the
right
on
the than
right G so
floor(G) could be
R
R
X
SX
X
R
22
Computing the floor
finding floor(G)
public Key floor(Key key)
{
Node x = floor(root, key);
if (x == null) return null;
return x.key;
}
private Node floor(Node x, Key key)
{
if (x == null) return null;
int cmp = key.compareTo(x.key);
if (cmp == 0) return x;
if (cmp < 0)
return floor(x.left, key);
Node t = floor(x.right, key);
if (t != null) return t;
else
return x;
S
E
X
A
R
H
C
M
G is less than S so
floor(G) must be
on the left
S
E
X
A
R
H
C
G is greater than E so
floor(G) could be
M
on the right
S
E
X
A
R
H
C
M
floor(G)in left
subtree is null
S
}
E
A
C
result
X
R
H
M
23
Rank and select
Q. How to implement rank() and select() efficiently?
A. In each node, store the number of nodes in its subtree.
subtree count
8
S
6
E
2
A
3
1
2
C
H
1
X
R
1
M
24
BST implementation: subtree counts
private class Node
{
private Key key;
private Value val;
private Node left;
private Node right;
private int count;
}
public int size()
{ return size(root);
}
private int size(Node x)
{
if (x == null) return 0;
ok to call
return x.count;
when x is null
}
number of nodes in subtree
initialize subtree
private Node put(Node x, Key key, Value val)
count to 1
{
if (x == null) return new Node(key, val, 1);
int cmp = key.compareTo(x.key);
if
(cmp < 0) x.left = put(x.left, key, val);
else if (cmp > 0) x.right = put(x.right, key, val);
else if (cmp == 0) x.val = val;
x.count = 1 + size(x.left) + size(x.right);
return x;
}
25
Computing the rank
Rank. How many keys < k ?
node count
8
6
S
Case 1. [ key in node = k ]
All keys in left subtree < k;
no key in right subtree < k.
Case 2. [ key in node x > k ]
E
2
A
3
1
2
C
H
1
X
R
1
M
No key in right subtree < k;
recursively compute rank in left subtree.
Case 3. [ key in node x < k ]
All keys in left subtree < k;
some keys in right subtree may be < k.
26
Rank
Rank. How many keys < k ?
node count
8
6
S
Easy recursive algorithm (3 cases!)
E
2
A
3
1
2
C
H
1
X
R
1
M
public int rank(Key key)
{ return rank(key, root);
}
private int rank(Key key, Node x)
{
if (x == null) return 0;
int cmp = key.compareTo(x.key);
if
(cmp < 0) return rank(key, x.left);
else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right);
else if (cmp == 0) return size(x.left);
}
27
BST: ordered symbol table operations summary
sequential
search
binary
search
BST
search
N
log N
h
insert
N
N
h
min / max
N
1
h
floor / ceiling
N
log N
h
rank
N
log N
h
select
N
1
h
ordered iteration
N log N
N
N
h = height of BST
(proportional to log N
if keys inserted in random order)
order of growth of running time of ordered symbol table operations
28
3.2 B INARY S EARCH T REES
‣ BSTs
‣ ordered operations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ deletion
ST implementations: summary
guarantee
average case
ordered
ops?
implementation
key
interface
search
insert
delete
search hit
insert
delete
sequential search
(unordered list)
N
N
N
N
N
N
binary search
(ordered array)
log N
N
N
log N
N
N
✔
compareTo()
BST
N
N
N
log N
log N
?
✔
compareTo()
equals()
Next. Deletion in BSTs.
30
BST deletion: lazy approach
To remove a node with a given key:
・Set its value to null.
・Leave key in tree to guide search (but don't consider it equal in search).
E
E
A
S
A
S
delete I
C
H
☠
C
I
R
N
H
tombstone
R
N
Cost. ~ 2 ln N' per insert, search, and delete (if keys in random order),
where N' is the number of key-value pairs ever inserted in the BST.
Unsatisfactory solution. Tombstone (memory) overload.
31
Deleting the minimum
To delete the minimum key:
・Go left until finding a node with a null left link.
・Replace that node by its right link.
・Update subtree counts.
go left until
reaching null
left link
A
S
X
E
R
H
C
M
return that
node’s right link
S
X
E
R
A
public void deleteMin()
{ root = deleteMin(root);
H
C
}
private Node deleteMin(Node x)
{
if (x.left == null) return x.right;
x.left = deleteMin(x.left);
x.count = 1 + size(x.left) + size(x.right);
return x;
}
M
available for
garbage collection
update links and node counts
after recursive calls
7
S
5
X
E
R
C
H
M
Deleting the minimum in a BST
32
Hibbard deletion
To delete a node with key k: search for node t containing key k.
Case 0. [0 children] Delete t by setting parent link to null.
deleting C
update counts after
recursive calls
S
S
E
X
A
R
C
E
R
M
node to delete
replace with
null link
1
E
X
A
R
H
H
H
S
5
X
A
7
M
M
available for
garbage
collection
C
33
Hibbard deletion
To delete a node with key k: search for node t containing key k.
Case 1. [1 child] Delete t by replacing parent link.
deleting R
update counts after
recursive calls
S
S
E
X
A
R
C
H
M
node to delete
E
C
X
E
M
X
H
A
replace with
child link
S
5
H
A
7
C
M
available for
garbage
collection
R
34
Hibbard deletion
deleting E
node to delete
S
To delete a node with key k: search for node tEcontainingX key k.
A
R
H
C
M
Case 2. [2 children]
・
・Delete the minimum in t's right subtree.
A
C
・Put x in t's spot.
Find successor x of t.
t
S
E
node to delete
S
E
X
H
C
M
x
search for key E
successor
M
go right, then
go left until
reaching null
left link
min(t.right)
X
deleteMin(t.right)
X
M
7
S
5
H
A
X
R
C
S
E
E
R
R
H
C
min(t.right)
S
H
X
x
M
C
S
E
x
t.left
still a BST
successor
H
A
t
A
R
go right, then
go left until
reaching null
left link
R
x has no left child
but
X don't garbage collect x
x
deleting E
A
search for key E
M update links and
node counts after
recursive calls
Deletion in a BST
35
Hibbard deletion: Java implementation
public void delete(Key key)
{ root = delete(root, key);
}
private Node delete(Node x, Key key) {
if (x == null) return null;
int cmp = key.compareTo(x.key);
if
(cmp < 0) x.left = delete(x.left, key);
else if (cmp > 0) x.right = delete(x.right, key);
else {
if (x.right == null) return x.left;
if (x.left == null) return x.right;
Node t = x;
x = min(t.right);
x.right = deleteMin(t.right);
x.left = t.left;
}
x.count = size(x.left) + size(x.right) + 1;
return x;
search for key
no right child
no left child
replace with
successor
update subtree
counts
}
36
Hibbard deletion: analysis
Unsatisfactory solution. Not symmetric.
Surprising consequence. Trees not random (!) ⇒ √ N per op.
Longstanding open problem. Simple and efficient delete for BSTs.
37
ST implementations: summary
guarantee
average case
ordered
ops?
implementation
key
interface
search
insert
delete
search hit
insert
delete
sequential search
(unordered list)
N
N
N
N
N
N
binary search
(ordered array)
log N
N
N
log N
N
N
✔
compareTo()
BST
N
N
N
log N
log N
√N
✔
compareTo()
equals()
other operations also become √N
if deletions allowed
Next lecture. Guarantee logarithmic performance for all operations.
38