Don’t Thrash: How to Cache Your Hash in Flash

Don’t Thrash:
How to Cache
Your Hash
in Flash
M.A. Bender, M. Farach-Colton, R. Johnson,
B.C. Kuszmaul, D. Medjedovic, P. Montes,
P. Shetty, R. P. Spillane, E. Zadok
Stony Brook U., Rutgers U., MIT, TokuTek
Bloom Filter
Cache (e.g., RAM)
Bit-array
0
1
0
1
Store
A
Elements stored
in the Bloom filter
B
1
Each element is hashed
To K positions in the
bit-array. Here k=2
• A Bloom filter is a bit-array + k hash functions
• Storing a few bits per element lets the BF stay in
RAM, even as the elements are too large
Don't Thrash: How to Cache Your Hash in Flash
2
Bloom Filter Lookups & False Positives
D
lookup
B [ h2 (D)] =1
A
C
1
1
False positive
(C was never inserted)
B [ h1 (D)] = 0
B
0
1
A
0
B
(
)
kn/m k
• False positives unlikely, pFP ( x ) » 1- e
• No false negatives (no means no)
• Allowing false positives is what keeps the BF small
Don't Thrash: How to Cache Your Hash in Flash
3
Flash
RAM
Flash
B
0
1
0
1
1
Store
A
B
• Bigger & cheaper than RAM, faster than disk
• 8TB of 512B keys needs 16GB of RAM for a ~1% BF
• Flash is a good place to cheaply store large BFs
Don't Thrash: How to Cache Your Hash in Flash
4
Thrashing
Random writes
C
Flash
B
0
1
0
1
1
• Setting random bits to 1 causes random writes
• OK in RAM, not in Flash
Don't Thrash: How to Cache Your Hash in Flash
5
Summary of Our Results
• Cascade Filter (CF), a BF replacement opt. for fast
inserts on Flash
• Our performance
– We do 670,000 inserts/sec (40x of other variants)
– We do 530 lookups/sec (1/3x of other variants)
• We use Quotient Filters (QF) instead of Bloom Filters
– They have better access locality
– You can efficiently merge two QFs into a larger QF (w/
same FP rate)
• We use merging techniques to compose multiple QFs
into a CF
Don't Thrash: How to Cache Your Hash in Flash
6
Thrashing is the Problem
C
Random Writes
K
Flash
• Every insert, you write to K Flash pages
• Expensive to write to a Flash page
• We can’t do fast insertions without working
around this issue
Don't Thrash: How to Cache Your Hash in Flash
7
Shaving off K
C
K
Now just one
random write, not K
Flash
• Now you only write one block for each insert
instead of K blocks
• Two-step hash [Canim et. al., 2010]
• This helps a little
Don't Thrash: How to Cache Your Hash in Flash
8
Queue Writes
B
RAM
3
D
A
4
1
3
0
We write 5 bits with
only 2 flash writes
1
Flash
B
1
1
0
1
1
• This helps a lot [Canim et. al. 2010]
• Buffering gives bit-flips a chance to piggy-back
• How others have cached hashes in Flashes
Don't Thrash: How to Cache Your Hash in Flash
9
We Need Help
RAM
Flash
• Buffering works when the queue is large
• Small queues insert ~1 element per flash write
• We’re interested in large datasets, and fast
insertions (i.e., when buffering doesn’t work)
Don't Thrash: How to Cache Your Hash in Flash
10
An Important Problem
• Many companies optimize their DBs for large
data-sets and fast inserts
–
–
–
–
–
–
Bai-Du Hypertable
Facebook Cassandra
Google BigTable
TokuTek TokuDB
Yahoo! HBase
… and more!
• Scaling the trusty Bloom Filter to Flash would be
a powerful tool for tackling these problems
Don't Thrash: How to Cache Your Hash in Flash
11
Several data structures avoid RWs
• A list of the most common methods
– Buffered Repository Trees
– Cassandra
– Cache Oblivious Look-ahead Arrays
– Log-structured Merge Trees
– …and more
• We can try to adapt the general method many
of these structures use
Don't Thrash: How to Cache Your Hash in Flash
12
The General Method
RAM
Store
£ éêlog N ùú
7
2
4
2
4
7
1
3
9
5
6
8
1
3
5
No
No
8
Lookup 8
Found
2 Previously flushed buffers
Buffers are merged to keep
total number of buffers low
6
8
9
• Supports deletes
• Composed of many sorted lists
• We can use this technique to avoid random writes
Don't Thrash: How to Cache Your Hash in Flash
13
Problem: Elements not Bits
• This method is used with sorted lists of
elements, not Bloom filters
• We need a data structure that
– Supports insert + lookup
– Is as space efficient as a Bloom filter
– Can be merged on Flash like a sorted list of
elements
– Bonus: supports always-working deletes
– Bonus: faster than BFs
Don't Thrash: How to Cache Your Hash in Flash
14
Our Proposal: Quotient Filters
•
•
•
•
•
•
Supports insert + lookup
Compact like a Bloom filter
Two QFs can be merged into a larger QF
Supports always-working deletes
Faster
We can use this alternative to replace the
sorted lists of elements in a write-opt. method
Don't Thrash: How to Cache Your Hash in Flash
15
A Quotient Filter
2 or 3 MD bits
per element
r-bit array
r=3
101
000
110
000
00
01
10
11
Q[10]=110
A
h(A)=00:101
B
address:identity
h(B)=10:110
• fingerprints + quotienting to save space
• fingerprint: p-bit hash (p=5)
• Compact, only stores r+MD bits per element
Don't Thrash: How to Cache Your Hash in Flash
16
A Quotient Filter
h(C)=01:010
h(A)=00:101
r-bit array
r=3
A
C
E
101
000
110
111
00
01
10
11
False positive
h(E)=10:110
(E was never inserted)
Soft collision
(push D to the side, use a
few MD bits to remember)
A
B
h(B)=10:110
D
h(D)=10:111
• False positive: fingerprint collision
-1
q
1
size
=
a
r
+
MD
2
p
x
£
a
(
)
(
)
r
FP
•
, or ~1.2x a BF for ~0.1% FP-rate
2 ,
• Quotient Filters also remain small by allowing false positives
Don't Thrash: How to Cache Your Hash in Flash
17
But Will it Merge?
00:101=5
r-bit array
r=3
10:110=22
10:111=23
101
000
110
111
00
01
10
11
A
B
D
• Actually, a compact sorted list of integers
Don't Thrash: How to Cache Your Hash in Flash
18
Merge as Integers, Then Insert
r-bit array
r=2
A
r-bit arrays
r=3
000
001
010
011
100
101
110
111
00
01
00
00
00
10
11
00
001:01=5
00:101=5
B
101 000 111
00
01
10
101:10=22
10:110=22
000
11
101:11=23
10:111=23
000 000
00
01
C
110 000
10
11
• QFs support Plug-n-Play with wrt.-opt. DSes
Don't Thrash: How to Cache Your Hash in Flash
19
Cascade Filter
RAM
Store
£ éêlog N ùú
QF
QF
QF
QF
• Just substitute sorted lists of elements with Quotient
Filters instead
• Now we have fast insertions and a compact
representation in Flash
Don't Thrash: How to Cache Your Hash in Flash
20
Experimental Setup
• Everything was the same (e.g., cache size)
• Inserted 8.4 billion hashes
• Randomly queried them
Don't Thrash: How to Cache Your Hash in Flash
21
Insertion Throughput
Number of Fingerprints Inserted
9E+09
Peak append
thruput: 8.4MB/S
8E+09
7E+09
6E+09
Large Merges
5E+09
4E+09
Thruput much higher:
40x higher than BBF
3000x higher than BF
3E+09
2E+09
1E+09
0
0
2000
4000
6000
8000
10000
12000
14000
Seconds
Don't Thrash: How to Cache Your Hash in Flash
22
Lookup Throughput
1600 lkus/sec
10000
1600 lkus/sec
Lookup Throughput
530 lkus/sec
1000
1/3x
100
10
1
CF
Traditional BF
Elevator BF
Don't Thrash: How to Cache Your Hash in Flash
23
Conclusions
• Quotient Filters outperform BFs in RAM
– 3x faster inserts, same lookups
– Support deletes
– Can be dynamically resized
• Cascade Filters outperform BFs in Flash
– All advantages of Quotient Filters (e.g., deletes)
– 40x faster inserts, 1/3x lookups
– CPU bound
Don't Thrash: How to Cache Your Hash in Flash
24
Future Work
•
•
•
•
Tweak the CF to handle buffering as well
Measure real index workloads
Can a CF help a write-optimized DB?
There are a lot of exciting boulevards to
explore
Don't Thrash: How to Cache Your Hash in Flash
25
And That is How…
• …you Don’t Thrash, when you Cache Your
Hash in Flash
• Thank you for listening, Questions?
– Pablo Montes: [email protected]
– Rick Spillane: [email protected]
Don't Thrash: How to Cache Your Hash in Flash
26
Insertion Throughput
670,000 ins/sec
1000000
40x
Insertion Throughput
100000
17000 ins/sec
3000x
10000
200 ins/sec
1000
100
10
1
CF
Traditional BF
Elevator BF
Don't Thrash: How to Cache Your Hash in Flash
27
Experimental Setup
• Controls:
–
–
–
–
–
~Equal DS cache size, BF given benefit of doubt
Equal RAM in all runs/tests
BF tests run in steady-state for 4+ hours
CF tests run for 8.4 billion insertions (~16GB CF)
Flash partition 60% of Intel X25-Mv2, 90GB
• Machine:
– Quad-core 2.4GHz Xeon E5530 with 8MB cache
– 24GB of RAM (booted with 0.994GB)
– 159.4GB Intel X-25M SSD (second generation)
Don't Thrash: How to Cache Your Hash in Flash
28
Future Work
•
•
•
•
•
Measure CF effectiveness for read-optimized
Measure real index workloads
Can a CF help a write-optimized DB?
Better CPU/GPU optimization
There are a lot of exciting boulevards to
explore
Don't Thrash: How to Cache Your Hash in Flash
29