Parallel String Sorting with Super Scalar String Sample Sort I T

Parallel String Sorting with
Super Scalar String Sample Sort
Timo Bingmann and Peter Sanders | September 4th, 2013 @ ESA’13
I NSTITUTE
OF
T HEORETICAL I NFORMATICS – A LGORITHMICS
KIT – University of the State of Baden-Wuerttemberg and
National Laboratory of the Helmholtz Association
www.kit.edu
Abstract
We present the currently fastest parallel string sorting algorithm
for modern multi-core shared memory architectures.
First, we describe the challenges posed by these new
architectures, and discuss key points to achieving high
performance gains. Then we give an overview of existing
sequential and parallel string sorting algorithms and
implementations. Thereafter, we continue by developing super
scalar string sample sort (S5 ), which is easily parallelizable and
yields higher parallel speedups than all previously known
algorithms.
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
2 / 29
Overview
1
Introduction and Motivation
Parallel Memory Bandwidth Test
2
String Sorting Algorithms
Radix Sort
Multikey Quicksort
Super Scalar String Sample Sort
3
Experimental Results
4
Conclusion
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
3 / 29
String Sorting Algorithms
Theoretical Parallel Algorithms
“Optimal Parallel String Algorithms: . . .”
[Hagerup ’94]
Existing Basic Sequential Algorithms
Radix Sort
Multikey Quicksort
Burstsort
LCP-Mergesort
[McIlroy et al. ’95]
[Bentley, Sedgewick ’97]
[Sinha, Zobel ’04]
[Ng, Kakehi ’08]
Existing Algorithm Library
by Tommi Rantala
(for Radix Sort [Kärkkäinen, Rantala ’09])
http://github.com/rantala/string-sorting
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
4 / 29
String Sorting Algorithms
Theoretical Parallel Algorithms
“Optimal Parallel String Algorithms: . . .”
[Hagerup ’94]
Existing Basic Sequential Algorithms
Radix Sort
Multikey Quicksort
Burstsort
LCP-Mergesort
[McIlroy et al. ’95]
[Bentley, Sedgewick ’97]
[Sinha, Zobel ’04]
[Ng, Kakehi ’08]
Existing Algorithm Library
by Tommi Rantala
(for Radix Sort [Kärkkäinen, Rantala ’09])
http://github.com/rantala/string-sorting
Our Contribution: Practical Parallel Algorithms
Parallel Super Scalar String Sample Sort (pS5 )
[This Work]
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
4 / 29
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L3 Cache
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L2
L3 Cache
RAM
RAM
L3 Cache
RAM
RAM
Modern Multi-Core Architecture
L3 Cache
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
5 / 29
Parallel Memory Bandwidth Test
// ScanRead64IndexUnrollLoop
for (size_t i=0; i<n; i+=16) {
uint64_t x0 = array[i+0];
// ... 14 times
uint64_t x15 = array[i+15];
}
// PermRead64SimpleLoop
uint64_t p = *array;
while((uint64_t*)p != array)
p = *(uint64_t*)p;
n
p1
p2
p3
p4
n
p1
p2
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
6 / 29
ScanRead64IndexUnrollLoop
1,200
p=1
p=2
p=4
p=8
p = 16
p = 32
p = 64
bandwidth [GiB/s]
1,000
800
600
400
200
0
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
7 / 29
ScanRead64IndexUnrollLoop
1,200
p=1
p=2
p=4
p=8
p = 16
p = 32
p = 64
bandwidth [GiB/s]
1,000
1,137 GiB/s
800
600
400
16 GiB/s
200
0
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
7 / 29
ScanRead64IndexUnrollLoop
50
p=1
p=2
p=4
p=8
p = 16
p = 32
p = 64
bandwidth [GiB/s]
40
30
20
10
0
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
8 / 29
ScanRead64IndexUnrollLoop
50
p=1
p=2
p=4
p=8
p = 16
p = 32
p = 64
bandwidth [GiB/s]
40
30
35 GiB/s
20
2.6 GiB/s
10
0
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
8 / 29
PermRead64SimpleLoop
bandwidth [GiB/s]
300
p=1
p=2
p=4
p=8
p = 16
p = 32
p = 64
200
100
0
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
9 / 29
access latency [ns]
PermRead64SimpleLoop
p=1
p=2
p=4
p=8
p = 16
p = 32
p = 64
600
400
200
0
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
10 / 29
access latency [ns]
PermRead64SimpleLoop
p=1
p=2
p=4
p=8
p = 16
p = 32
p = 64
600
400
6 ns
200
0.028 ns
0
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
10 / 29
Log-Log Access Latency
access latency [s]
10−6
Scan p = 1
Scan p = 64
Perm p = 1
Perm p = 64
10−7
10−8
10−9
10−10
10−11
210
215
220
225
area size n [B]
230
235
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
11 / 29
Parallel Memory Bandwidth Test
Scanning an Array
p1
p2
n
p3
p4
Random Access in an Array
n
p1
p
1
16
32
Cache (n = 16 KiB)
Scan
Random
35 GiB/s 4.4 GiB/s
567 GiB/s
69 GiB/s
1131 GiB/s 137 GiB/s
p2
p
1
16
32
RAM (n = 4 GiB)
Scan
Random
2.6 GiB/s
19 MiB/s
10.3 GiB/s 403 MiB/s
10.6 GiB/s 763 MiB/s
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
12 / 29
Sorting Strings
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
g e
l
e n
n
e
u s
t on
a
i c
⇒
a
a
a
a
a
a
k
k
k
k
k
k
k
b
l
r
r
r
r
a
e
i
i
i
i
r
a
p
c
c
r
r
y
r
t
t
t
t
y
c
h
a
a
a
a
a
n
u
a
d
i
n
y
k
e
s
e
c
g e
l
c h e n
e
t e n
p t on
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
13 / 29
Sorting Strings
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
g e
l
e n
n
e
u s
t on
a
i c
⇒
a
a
a
a
a
a
k
k
k
k
k
k
k
b
l
r
r
r
r
a
e
i
i
i
i
r
a
p
c
c
r
r
y
r
t
t
t
t
y
c
h
a
a
a
a
a
n
u
a
d
i
n
y
k
e
s
e
c
g e
l
c h e n
e
t e n
p t on
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
13 / 29
Sorting Strings: Radix Sort
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
g e
0
a
k
σ− 1
0
6
7
0
l
e n
n
e
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
14 / 29
Sorting Strings: Radix Sort
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
0
a
k
σ− 1
g e
0
6
7
0
l
e n
n
e
0
0 6
6 13
13 13
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
14 / 29
Sorting Strings: Radix Sort
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
0
a
k
σ− 1
g e
0
6
7
0
l
e n
n
e
0
0 6
6 13
13 13
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
14 / 29
Sorting Strings: Radix Sort
a
a
a
a
a
a
k
k
k
k
k
k
k
r
r
r
b
l
r
i
a
e
i
i
i
r
r
r
c
a
p
c
t
y
r
t
t
t
y
a
a
a
c
h
a
y
n
d
u
a
i
a
n
c
t
e
p
k
e l
h e n
e n
g e
e
s
0
a
k
σ− 1
0
6
7
0
0
0 6
6 13
13 13
c
t on
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
14 / 29
Sorting Strings: Radix Sort
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
0
a
k
σ− 1
g e
0
6
7
0
l
e n
n
e
0
0 6
6 13
13 13
2
2
2
3
3
1
0
0
0
p1 0
p2 0
p3 0
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
14 / 29
Sorting Strings: Radix Sort
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
0
a
k
σ− 1
g e
0
6
7
0
l
e n
n
e
0
0 6
6 13
13 13
p1 0
p2 0
p3 0
2
2
2
3
3
1
0
0
0
p1 0
p2 0
p3 0
0 6
2 6
4 6
6 13
9 13
12 13
13 13
13
13
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
14 / 29
Sorting Strings: Multikey Quicksort
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
[Bentley, Sedgewick ’97]
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
g e
l
e n
n
e
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
15 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
[Bentley, Sedgewick ’97]
g e
l
e
s
<
c
c h e n
t e n
e
p t on
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
15 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
[Bentley, Sedgewick ’97]
=
g e
l
e
s
<
?
>
=
<
c
c h e n
t e n
e
p t on
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
15 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
[Bentley, Sedgewick ’97]
<
g e
l
e
s
=
>
<
c
c h e n
t e n
e
p t on
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
15 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
[Bentley, Sedgewick ’97]
<
g e
l
e
s
>
[Rantala ’??]
<
partition by w = 8 characters
cache characters
⇒ fewer random accesses
c
c h e n
t e n
e
p t on
=
fastest sequential algorithm
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
15 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
[Bentley, Sedgewick ’97]
<
g e
l
e
s
>
[Rantala ’??]
<
partition by w = 8 characters
cache characters
⇒ fewer random accesses
c
c h e n
t e n
e
p t on
=
fastest sequential algorithm
[This Work]
=
? ? ? ? ? ? ? ? ? ? ? ? ?
>
parallelize using blocks
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
15 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
g e
l
e
s
? ? ? ? ? ? ? ? ? ? ? ??
<
c
c h e n
t e n
e
p t on
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
16 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
g e
l
e
s
? ? ? ? ? ? ? ? ? ? ? ??
<
c
c h e n
t e n
e
p t on
p1
?
?
?
p2
?
?
?
p3
?
?
?
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
16 / 29
Sorting Strings: Multikey Quicksort
a
a
k
k
a
a
a
a
k
k
k
k
k
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
g e
l
e
s
? ? ? ? ? ? ? ? ? ? ? ??
<
<
=? >?
p2 < ? = ?
p3 < ?
c
c h e n
t e n
e
p t on
p1
=
>
>?
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
16 / 29
Sorting Strings: Multikey Quicksort
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
g e
l
e
s
? ? ? ? ? ? ? ? ? ? ? ??
<
?
=? >?
p2 < ? = ?
p3 < ?
c
c h e n
t e n
e
p t on
p1
=
>
Output
a
a
k
k
a
a
a
a
k
k
k
k
k
?
?
>?
<
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
16 / 29
Sorting Strings: Multikey Quicksort
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
g e
l
e
s
? ? ? ? ? ? ? ? ? ? ? ??
<
p2
<
=
>
>
=? >?
p3 < ? = ?
c
c h e n
t e n
e
p t on
p1 < ? = ?
Output
a
a
k
k
a
a
a
a
k
k
k
k
k
>
<
=
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
16 / 29
Sorting Strings: Multikey Quicksort
r
r
a
e
r
b
l
r
i
i
i
i
r
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
g e
l
e
s
? ? ? ? ? ? ? ? ? ? ? ??
<
p2
0
=? >?
p3 < ? = ?
c
c h e n
t e n
e
p t on
p1 < ? = ? ? 0
=
>
Output
a
a
k
k
a
a
a
a
k
k
k
k
k
<
<
0
<
=
>
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
16 / 29
Sorting Strings: Multikey Quicksort
r
r
y
r
c
a
p
c
t
t
t
t
y
a
a
a
n
a
c
h
a
y
n
k
e
d
u
a
i
g e
l
e
s
? ? ? ? ? ? ? ? ? ? ? ??
<
p2 < 0 = 0 > 0
p3 < 0 = 0 > 0
c
c h e n
t e n
e
p t on
p1 < 0 = 0 > 0
=
>
Output
r
r
a
e
r
b
l
r
i
i
i
i
r
<
<
More Output
a
a
k
k
a
a
a
a
k
k
k
k
k
<
=
>
>
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
16 / 29
Super Scalar String Sample Sort (S5 )
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
g e
l
e n
n
e
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
g e
l
e n
n
e
ar
<
>
=
ab
ki
< = >
< = >
b0 b1 b2 b3 b4 b5 b6
u s
t on
a
i c
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
g e
l
e n
n
e
u s
t on
a
i c
ar
<
>
=
ab
ki
< = >
< = >
b0 b1 b2 b3 b4 b5 b6
b0 b1 b2 b3 b4 b5 b6
b0 b1 b2 b3 b4 b5 b6
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
ar
g e
l
e n
n
e
u s
t on
a
i c
<
>
=
ab
ki
< = >
< = >
0
0
0
0
1
0
0
0
1
2
1
1
2
0
0
1
3
0
0
0
1
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
a
k
a
k
k
k
k
a
k
a
k
a
a
r
i
r
a
e
i
i
r
i
b
r
l
r
r
t
r
y
r
t
t
c
t
a
y
p
c
a y
a
a
n
c
t
a
e
c
p
h
a
n
k
e
h
e
d
ar
g e
l
e n
n
e
u s
t on
a
i c
<
>
=
ab
ki
< = >
< = >
0
0
0
0
1
1
1
1
2
4
5
6
8
8
8
9 12
12 12
12 13
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
a
a
a
a
a
a
k
k
k
k
k
k
k
b
l
r
r
r
r
a
e
i
i
i
i
r
a
p
r
r
c
c
y
r
t
t
t
t
y
c
h
a
a
a
a
a
n
u
a
y
n
d
i
k
e
s
ar
<
>
g e
e
c
=
ab
ki
< = >
< = >
l
0
0
0
c h e n
t e n
e
p t on
0
1
1
1
1
2
4
5
6
8
8
8
9 12
12 12
12 13
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
prefix
a b a c u s
2
1
a l ph a
a
a
a
a
k
k
k
k
k
k
k
r
r
r
r
a
e
i
i
i
i
r
r
r
c
c
y
r
t
t
t
t
y
a
a
a
a
a
n
y
n
d
i
k
e
g e
2
e
c
l
0
c h e n
2
t e n
e
p t on 0
increase prefix by
LCP of splitters or
key size.
>
ar
<
1
0
=
ab
ki
< = >
< = >
0
0
0
0
1
1
1
1
2
4
5
6
8
8
8
9 12
12 12
12 13
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
prefix
a b a c u s
2
1
a l ph a
a
a
a
a
k
k
k
k
k
k
k
r
r
r
r
a
e
i
i
i
i
r
r
r
c
c
y
r
t
t
t
t
y
a
a
a
a
a
n
y
n
d
i
k
e
g e
2
e
c
l
0
c h e n
2
t e n
e
p t on 0
increase prefix by
LCP of splitters or
key size.
>
ar
<
1
0
=
ab
ki
< = >
< = >
0
0
0
0
1
1
1
1
2
4
5
6
8
8
8
9 12
12 12
12 13
partitions by w = 8 chars
easy parallelization
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
17 / 29
Super Scalar String Sample Sort (S5 )
predicated instructions
ar
<
>
=
ab
ki
< = >
< = >
1
2
3
ar ab ki
i := 2i + 0/1
b0 b1 b2 b3 b4 b5 b6
partitions by w = 8 chars
easy parallelization
256 KiB L2 cache: 13 levels
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
18 / 29
Super Scalar String Sample Sort (S5 )
predicated instructions
ar
<
>
=
ab
ki
< = >
< = >
b0 b1 b2 b3 b4 b5 b6
partitions by w = 8 chars
1
2
3
ar ab ki
i := 2i + 0/1
equality checking:
1
at each node
2
after full descent
easy parallelization
256 KiB L2 cache: 13 levels
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
18 / 29
Super Scalar String Sample Sort (S5 )
predicated instructions
ar
<
>
=
ab
ki
< = >
< = >
b0 b1 b2 b3 b4 b5 b6
partitions by w = 8 chars
easy parallelization
256 KiB L2 cache: 13 levels
1
2
3
ar ab ki
i := 2i + 0/1
equality checking:
1
at each node
2
after full descent
interleave tree descents:
classify 4 strings at once
⇒ super scalar parallelism
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
18 / 29
Parallel S5 – Sub-Algorithms
|S| ≥
n
p
n
p
> |S| ≥ 216
fully parallel S5
sequential S5
216 > |S| ≥ 64
caching multikey quicksort
64 > |S|
insertion sort
Important: dynamic load balancing with voluntary work freeing
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
19 / 29
URLs – 1.1 G Lines, 70.7 GiB
http://algo2.iti.kit.edu/index.php
http://algo2.iti.kit.edu/english/index.php
http://algo2.iti.kit.edu/1483.php
http://algo2.iti.kit.edu/1484.php
http://www.kit.edu/
http://algo2.iti.kit.edu/286.php
http://algo2.iti.kit.edu/1294.php
http://algo2.iti.kit.edu/research.php
http://algo2.iti.kit.edu/publications.php
http://algo2.iti.kit.edu/members.php
http://algo2.iti.kit.edu/lehre.php
http://algo2.iti.kit.edu/1844.php
http://algo2.iti.kit.edu/294.php
http://algo2.iti.kit.edu/basic-toolbox-page.php
http://map.project-osrm.org?dest=49.0137004,8.419307&destname=%22Am%20Fa...
http://www.uni-karlsruhe.de/fs/Uni/info/campusplan/index.php?id=50.34
http://www.informatik.kit.edu/1158.php
http://algo2.iti.kit.edu/emailform.php?id=eea49c752e4c1329710cba2efae10511
http://algo2.iti.kit.edu/routeplanning.php
http://algo2.iti.kit.edu/sanders.php
http://www.mwk.baden-wuerttemberg.de/forschung/forschungsfoerderung/land...
http://algo2.iti.kit.edu/1996.php
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
20 / 29
URLs – Speedup on 32-core Intel E5
10
speedup
8
pS5
pMultikeyQuicksort
pRadixSort
6
4
2
0
1
8
16
32
48
number of threads
64
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
21 / 29
Wikipedia – Suffixes of 4 GiB
<page>
<title>Karlsruhe Institute of Technology</title>
<ns>0</ns>
<id>5977314</id>
<revision>
<id>491222814</id>
<timestamp>2013-04-24T03:17:12Z</timestamp>
<text xml:space="preserve">{{Infobox university
|name
= Karlsruhe Institute of Technology
|image = [[Image:KIT logo.png|270px|]]
|tagline
= ’’{{lang|de|Forschungsuniversität}}’’ (Research University)
|established
= Fridericiana: 1825 as polytechnical school, 1865 as university;&lt;br...
|president
= Horst Hippler, Eberhard Umbach
|students
= 23,905 (October 2012)
|staff
= 3.423<ref name="kit.edu"/>}}
The ’’’Karlsruhe Institute of Technology’’’ (’’’KIT’’’) is one of the largest research and
educations institution in Germany, resulting from a merger of the university (’’Universität
Karlsruhe (TH)’’) and the research center (’’Forschungszentrum Karlsruhe’’)&lt;ref&gt;[[
Federal Ministry of Education and Research (Germany)]]: http://www.bmbf.de/pub/eck
punktepapier_kit.pdf&lt;/ref&gt; of the city of [[Karlsruhe]]. The university, also known
as ’’’’’Fridericiana’’’’’, was founded in 1825. In 2009, it merged with the former national
nuclear research center founded in 1956 as the ’’Kernforschungszentrum Karlsruhe (KfK)’’.
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
22 / 29
Suffixes – Speedup on 32-core Intel E5
20
speedup
15
10
5
pS5
pMultikeyQuicksort
pRadixSort
0
1
8
16
32
48
number of threads
64
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
23 / 29
GOV2 – 3.1 G Lines in 128 GiB
<p align="left"><font face="Verdana, Arial, Helvetica, sans-serif"><b>BACKGROUND
INFORMATION. </b>The Forest Ecosystem Dynamics (FED) Project is concerned
with modeling and monitoring ecosystem processes and patterns in response
to natural and anthropogenic effects. The project uses coupled ecosystem
models and remote sensing models and measurements to predict and observe
ecosystem change. The overall objective of the FED project is to link
and use models of forest dynamics, soil processes, and canopy energetics
to understand how ecosystem response to change affects patterns and processes
in northern and boreal forests and to assess the implications for global
change. See <a href="http://fedwww.gsfc.nasa.gov/html/conceptdiagram.html">
Conceptual Diagram</a> for model schematic. </font></p>
<p align="left"><font face="Verdana, Arial, Helvetica, sans-serif">The Forest
Ecosystem Dynamics World-Wide-Web server has been online since July 1994.
The FED server was created for the dissemination of project information,
to archive numerous spatial and scientific data sets, and demonstrate
the linking of ecosystem and remote sensing models. </font></p>
</td>
<td valign="top" width="28%" align="left" height="565">
<p><img src="vertrule.gif" alt="vertical rule" width="2" height="580" hspace="5" ...
<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a href="http://fedwww...
Abstract</a> </font></p>
<p><font face="Verdana, Arial, Helvetica, sans-serif" size="2"><a href="http://fedwww...
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
24 / 29
GOV2 – Speedup on 32-core Intel E5
12
10
speedup
8
pS5
pMultikeyQuicksort
pRadixSort
6
4
2
0
1
8
16
32
48
number of threads
64
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
25 / 29
Words – 31.6 M Lines, 382 MiB
stereosto
tocellular
cellularand
andportable
wellas
asdiscussion
ofcomputer
computersecurity
securityissues
issuesin
ingeneral
generalWhile
Whilewe
wewelcome
welcomeall
allmanner
mannerof
ofquestion
questionabout
aboutencrypted
encryptedmessages
messagesHow
doesPGP
PGPencrypt
encrypta
amessage
messageWhat
apublic
publickey
keyencrypted
messagesthemselves
themselvesare
offtopic
topicHost
CamelotMessage
MessageFile
FileBBS
BBSCulver
CulverCity
CityCA
ListGraphics
GraphicsGeneral
GeneralGraphics
Graphics15
15Discussions
aboutall
alltypes
typesof
computergraphics
graphicsand
languageconference
forthe
theexchange
exchangeof
ofideas
ideasin
Learnmore
moreabout
theSpanish
Spanishculture
culturewhile
communicatingin
thelanguage
languageHost
ListFantsySciF
FantsySciFFantasy
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
26 / 29
Words – Speedup on 4-core Intel i7
4
speedup
3
2
pS5
pMultikeyQuicksort
pRadixSort
1
1
2
3
5
4
6
number of threads
7
8
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
27 / 29
Future Work
Non-uniform memory architecture (NUMA) effects
distributed string sorting algorithms
distributed text index construction and query
high-performance middleware?
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
28 / 29
Thank you for your attention!
Questions?
Timo Bingmann and Peter Sanders – Parallel String Sorting with Super Scalar String Sample Sort
Institute of Theoretical Informatics – Algorithmics
September 4th, 2013
29 / 29