A Scalable Multi-Producer Multi-Consumer
Wait-Free Ring Buffer
Andrew Barrington
Steven Feldman
Damian Dechev
Department of Electrical Engineering and Computer Science
University of Central Florida
ACM SAC Presentation, 2015
Outline
1. Background
2. Ring Buffer
3. Design
4. Wait-Freedom
5. Performance
6. Questions
Concurrency and Progress
What's wrong with locks?
Mutual exclusion diminishes parallelism.
Locks are prone to hazards (deadlock, livelock, starvation).
They give no guarantee of progress.
In complex systems it is hard to verify that locking is safe.
Non-blocking algorithms are designed without locks.
Non-Blocking Algorithms
Provide three levels of progress guarantee:
Obstruction-Free: If all other threads are suspended, the remaining
thread makes progress in a finite number of steps.
Lock-Free: At least one thread can always make progress.
Wait-Free: All threads make progress in a finite number of steps.
No need to suspend other threads.
Hard to prove.
Results in safer algorithms:
Obstruction-Free: deadlock-free.
Lock-Free: deadlock- and livelock-free.
Wait-Free: deadlock-, livelock-, and starvation-free.
Background
Hardware Primitives:
cas (compare-and-swap): atomically updates the value at an address if
it matches an expected value.
faa (fetch-and-add): atomically increments the value at an address
by the specified value.
fao (fetch-and-or): atomically bitwise-ORs the value at an address
with the specified value.
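As a rough illustration, the three primitives map onto C++ std::atomic member functions. The wrapper names cas, faa, and fao below simply mirror the slide's abbreviations; they are not taken from the authors' code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// cas: store `desired` only if the current value equals `expected`.
// Returns true when the update happened.
inline bool cas(std::atomic<std::uint64_t>& addr, std::uint64_t expected,
                std::uint64_t desired) {
    return addr.compare_exchange_strong(expected, desired);
}

// faa: atomically add `delta`; returns the value held before the add.
inline std::uint64_t faa(std::atomic<std::uint64_t>& addr, std::uint64_t delta) {
    return addr.fetch_add(delta);
}

// fao: atomically OR in `bits`; returns the value held before the OR.
inline std::uint64_t fao(std::atomic<std::uint64_t>& addr, std::uint64_t bits) {
    return addr.fetch_or(bits);
}
```

Both faa and fao return the previous value, which is what lets a thread learn the seqid it was assigned or the word it just marked.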
Background
Correctness based on linearizability:
Each operation appears to take effect instantaneously at some
moment.
Allows a sequential history to be constructed from a concurrent
execution
Operations ordered by linearization point
What is a Ring Buffer?
In general...
It is a FIFO queue implemented on a fixed-size array.
Operations take O(1) time.
Enqueue(T val): adds the passed value to the end of the buffer.
Dequeue(): removes the oldest value from the buffer.
T top(): returns the oldest value in the buffer.
bool isFull(): returns whether or not the buffer is full.
bool isEmpty(): returns whether or not the buffer is empty.
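The general interface above can be sketched as a single-threaded ring buffer. This is only a baseline for comparison, not the concurrent design presented later. The class name RingBuffer and the bool return values of enqueue and dequeue are assumptions added for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Single-threaded sketch; not safe for concurrent use.
template <typename T, std::size_t N>
class RingBuffer {
public:
    bool isFull() const { return count_ == N; }
    bool isEmpty() const { return count_ == 0; }

    // Adds val at the end of the buffer; returns false if it is full.
    bool enqueue(const T& val) {
        if (isFull()) return false;
        data_[(head_ + count_) % N] = val;
        ++count_;
        return true;
    }

    // Removes the oldest value; returns false if the buffer is empty.
    bool dequeue() {
        if (isEmpty()) return false;
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    // Returns the oldest value; the caller must ensure the buffer is non-empty.
    const T& top() const {
        assert(!isEmpty());
        return data_[head_];
    }

private:
    std::array<T, N> data_{};
    std::size_t head_ = 0;   // index of the oldest value
    std::size_t count_ = 0;  // number of stored values
};
```

Every operation touches a constant number of fields, which is where the O(1) bound comes from.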
Related Work
Tsigas et al.: Lock-free design where threads compete to
place/remove values. Not FIFO.
Krizhanovsky: Claimed lock-free design based on FAA. Requires
shared information about the state. Can be shown to be only
obstruction-free.
Intel TBB: Fine-grained locking and micro-queues.
Our Wait-Free Ring Buffer
Slightly different API:
bool Enqueue(T* v): returns whether or not v was inserted.
bool Dequeue(T** v): returns whether or not a value was removed
and if so assigns it to v.
Restrictions:
We store references to objects instead of objects themselves.
Values in the buffer must contain a sequence identifier (seqid).
The buffer will assign a generated seqid to each object.
We reserve the two least significant bits.
The Structure
Structure:
A fixed-length array of atomic pointers to type T.
An atomic head counter.
An atomic tail counter.
Data Types:
DataNode: a reference to a type T object.
EmptyNode: a pointer-sized integer.
We reserve the two least significant bits (LSBs) of each value for
state/type information.
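One possible way to pack both node types into a single pointer-sized word is sketched below. The slides do not say which reserved bit means what, so the layout here (bit 0 for "EmptyNode", bit 1 for the bit mark) is an assumption, as are the Item type and all helper names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout (an assumption, not taken from the slides).
constexpr std::uintptr_t kEmptyBit = 0x1;  // set on EmptyNode words
constexpr std::uintptr_t kMarkBit  = 0x2;  // the atomic bit mark
constexpr std::uintptr_t kTagMask  = kEmptyBit | kMarkBit;

struct Item {  // hypothetical element type carrying its seqid
    std::uint64_t seqid;
    int payload;
};

// EmptyNode: the seqid shifted past the two reserved bits.
inline std::uintptr_t makeEmpty(std::uint64_t seqid) {
    return (static_cast<std::uintptr_t>(seqid) << 2) | kEmptyBit;
}

// DataNode: the object's address, whose two low bits must be free.
inline std::uintptr_t makeData(Item* p) {
    auto w = reinterpret_cast<std::uintptr_t>(p);
    assert((w & kTagMask) == 0);  // requires at least 4-byte alignment
    return w;
}

inline bool isEmpty(std::uintptr_t w) { return (w & kEmptyBit) != 0; }
inline bool isMarked(std::uintptr_t w) { return (w & kMarkBit) != 0; }
inline Item* asItem(std::uintptr_t w) {
    return reinterpret_cast<Item*>(w & ~kTagMask);
}
inline std::uint64_t seqidOf(std::uintptr_t w) {
    return isEmpty(w) ? (w >> 2) : asItem(w)->seqid;
}
```

Because a slot is one machine word, a single cas or fao can change its type, seqid, or mark atomically.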
Overview
We use seqids to provide FIFO ordering.
The value stored at an address can transition in one of the following
ways:
An EmptyNode to a DataNode, where
EmptyNode.seqid ≤ DataNode.seqid
A DataNode to an EmptyNode, where
EmptyNode.seqid = DataNode.seqid + length
An EmptyNode to an EmptyNode, where
new EmptyNode.seqid > old EmptyNode.seqid
The key is that the seqid at an address only ever increases.
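The three permitted transitions can be written down as a predicate. The Slot struct below is a hypothetical decoded view of one array entry, used only to make the rules checkable. It is not the authors' representation.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind { Empty, Data };

// Hypothetical decoded view of one array entry.
struct Slot {
    Kind kind;
    std::uint64_t seqid;
};

// Returns true if replacing `from` with `to` is one of the three
// transitions listed on the slide, for a buffer of the given length.
inline bool validTransition(const Slot& from, const Slot& to,
                            std::uint64_t length) {
    if (from.kind == Kind::Empty && to.kind == Kind::Data)
        return from.seqid <= to.seqid;
    if (from.kind == Kind::Data && to.kind == Kind::Empty)
        return to.seqid == from.seqid + length;
    if (from.kind == Kind::Empty && to.kind == Kind::Empty)
        return to.seqid > from.seqid;
    return false;  // DataNode to DataNode is not a listed transition
}
```

In every accepted case the seqid stored at the address does not decrease.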
Common Case
Enqueue(T* v):
1. v.seqid = tail.faa(1) and pos = v.seqid % length
2. val = array[pos].load()
3. If val.EmptyNode() and v.seqid == val.seqid then
   array[pos].cas(val, v)
Dequeue(T** v):
1. seqid = head.faa(1) and pos = seqid % length
2. val = array[pos].load()
3. If val.DataNode() and seqid == val.seqid then
   pass = array[pos].cas(val, EmptyNode(seqid + length))
4. If pass then *v = val
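A minimal C++ sketch of one common-case attempt follows. It omits the full/empty checks, the uncommon cases, and the wait-free helping, so it is not the complete algorithm. The word encoding, the Item type, the initial EmptyNode(i) contents, and the names CommonCaseBuffer, tryEnqueue, and tryDequeue are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Item { std::uint64_t seqid; int payload; };  // hypothetical element type

// Assumed encoding: bit 0 set means EmptyNode holding (seqid << 2);
// otherwise the word is an Item*.
inline std::uintptr_t emptyNode(std::uint64_t s) {
    return (static_cast<std::uintptr_t>(s) << 2) | 1u;
}
inline bool isEmptyNode(std::uintptr_t w) { return (w & 1u) != 0; }
inline Item* asItem(std::uintptr_t w) { return reinterpret_cast<Item*>(w); }

template <std::size_t Length>
struct CommonCaseBuffer {
    std::atomic<std::uintptr_t> array[Length];
    std::atomic<std::uint64_t> head{0}, tail{0};

    CommonCaseBuffer() {
        // Assumption: slot i starts as EmptyNode(i) so the first seqids match.
        for (std::size_t i = 0; i < Length; ++i) array[i].store(emptyNode(i));
    }

    // One common-case attempt; false means an uncommon case was hit.
    bool tryEnqueue(Item* v) {
        v->seqid = tail.fetch_add(1);
        std::size_t pos = v->seqid % Length;
        std::uintptr_t val = array[pos].load();
        if (isEmptyNode(val) && (val >> 2) == v->seqid)
            return array[pos].compare_exchange_strong(
                val, reinterpret_cast<std::uintptr_t>(v));
        return false;
    }

    // One common-case attempt; on success *v receives the removed item.
    bool tryDequeue(Item** v) {
        std::uint64_t seqid = head.fetch_add(1);
        std::size_t pos = seqid % Length;
        std::uintptr_t val = array[pos].load();
        if (!isEmptyNode(val) && asItem(val)->seqid == seqid &&
            array[pos].compare_exchange_strong(val, emptyNode(seqid + Length))) {
            *v = asItem(val);
            return true;
        }
        return false;
    }
};
```

Note that a failed attempt in this sketch has already consumed a seqid, which is exactly the situation the uncommon cases have to deal with.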
Example
Enqueue(X) on a buffer of length 8 (a prime marks an EmptyNode):
head=9 tail=13
array = [16', 9, 10, 11, 12, 13', 14', 15']
X.seqid = tail.faa(1) = 13
head=9 tail=14
array = [16', 9, 10, 11, 12, 13', 14', 15']
pos = 13 % 8 = 5
v = array[5]
isEmpty(v) and v.seqid == X.seqid
array[5].cas(v, X)
head=9 tail=14
array = [16', 9, 10, 11, 12, 13, 14', 15']
Example
Dequeue on a buffer of length 8 (a prime marks an EmptyNode):
head=9 tail=14
array = [16', 9, 10, 11, 12, 13, 14', 15']
seqid = head.faa(1) = 9
head=10 tail=14
array = [16', 9, 10, 11, 12, 13, 14', 15']
pos = 9 % 8 = 1
v = array[1]
isData(v) and v.seqid == seqid
array[1].cas(v, EmptyNode(seqid + length))
head=10 tail=14
array = [16', 17', 10, 11, 12, 13, 14', 15']
Uncommon Enqueue Case
What if the current value is an EmptyNode...
with a seqid > the assigned seqid?
Get a new seqid; the assigned one was skipped.
with a seqid < the assigned seqid?
Back off, then re-read.
If unchanged, take the position.
What if the current value is a DataNode...
with a seqid > the assigned seqid?
Get a new seqid; the assigned one was skipped.
with a seqid <= the assigned seqid?
Back off, then re-read.
If unchanged, get a new seqid and try again elsewhere.
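The enqueue cases above can be summarized as a small decision function. This sketch only classifies what an enqueuer should do next and performs no atomic operations. Slot, EnqAction, and the unchangedAfterBackoff flag are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind { Empty, Data };
struct Slot { Kind kind; std::uint64_t seqid; };  // hypothetical decoded entry

enum class EnqAction { TakePosition, GetNewSeqid, BackoffAndReread };

// `unchangedAfterBackoff` is true once the slot has been re-read after a
// backoff and still holds the same value.
inline EnqAction enqueueAction(const Slot& cur, std::uint64_t assigned,
                               bool unchangedAfterBackoff) {
    if (cur.seqid > assigned)
        return EnqAction::GetNewSeqid;  // the assigned seqid was skipped
    if (cur.kind == Kind::Empty) {
        if (cur.seqid == assigned)
            return EnqAction::TakePosition;  // common case
        return unchangedAfterBackoff ? EnqAction::TakePosition
                                     : EnqAction::BackoffAndReread;
    }
    // DataNode with seqid <= assigned: wait briefly, then move elsewhere.
    return unchangedAfterBackoff ? EnqAction::GetNewSeqid
                                 : EnqAction::BackoffAndReread;
}
```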
Uncommon Dequeue Case
What if the current value is an EmptyNode...
with a seqid > the assigned seqid?
Get a new seqid; the assigned one was skipped.
with a seqid < the assigned seqid?
Back off, then re-read.
If unchanged, replace it with an EmptyNode with a higher seqid.
What if the current value is a DataNode...
with a seqid > the assigned seqid?
Get a new seqid; the assigned one was skipped.
with a seqid < the assigned seqid?
Back off, then re-read.
If unchanged, atomically bit-mark the LSB and ...
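The dequeue cases can be classified in the same way. The slide does not say what a dequeuer does when it finds an EmptyNode whose seqid equals its own, so this sketch reports that case as Unspecified rather than guessing. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind { Empty, Data };
struct Slot { Kind kind; std::uint64_t seqid; };  // hypothetical decoded entry

enum class DeqAction {
    RemoveValue,       // common case: DataNode with a matching seqid
    GetNewSeqid,       // the assigned seqid was skipped
    BackoffAndReread,
    AdvanceEmptyNode,  // replace with an EmptyNode with a higher seqid
    BitMark,           // atomically bit-mark the stored value
    Unspecified        // not covered by the slide
};

// `unchangedAfterBackoff` is true once the slot has been re-read after a
// backoff and still holds the same value.
inline DeqAction dequeueAction(const Slot& cur, std::uint64_t assigned,
                               bool unchangedAfterBackoff) {
    if (cur.seqid > assigned) return DeqAction::GetNewSeqid;
    if (cur.seqid == assigned)
        return cur.kind == Kind::Data ? DeqAction::RemoveValue
                                      : DeqAction::Unspecified;
    if (!unchangedAfterBackoff) return DeqAction::BackoffAndReread;
    return cur.kind == Kind::Empty ? DeqAction::AdvanceEmptyNode
                                   : DeqAction::BitMark;
}
```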
Atomic bit mark
How does the atomic bit mark help?
A dequeue can only remove a value if its seqid matches.
What if a dequeuer never removes its value? How can we ensure the next
dequeuer accessing the position progresses?
The bitmark is used to signal this case.
If a dequeuer's value is bitmarked, then it knows a later dequeuer got
a new seqid.
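A small sketch of the marking step, assuming the mark lives in bit 1 of the stored word (the slides only say that two low bits are reserved). It shows that once a value is marked, a removal cas built from the unmarked word fails, which is how a slow dequeuer can notice the mark. The names bitMark, isMarked, and tryRemove are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kMarkBit = 0x2;  // assumed position of the bit mark

// Sets the mark on whatever the slot currently holds and returns the
// word that was there just before the mark was applied.
inline std::uintptr_t bitMark(std::atomic<std::uintptr_t>& slot) {
    return slot.fetch_or(kMarkBit);
}

inline bool isMarked(std::uintptr_t word) { return (word & kMarkBit) != 0; }

// A removal attempt: succeeds only if the slot still holds exactly the
// word the caller read earlier.
inline bool tryRemove(std::atomic<std::uintptr_t>& slot, std::uintptr_t seen,
                      std::uintptr_t replacement) {
    return slot.compare_exchange_strong(seen, replacement);
}
```

Using fetch-and-or rather than cas means the marking step itself cannot fail, no matter what other threads do to the slot.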
How to handle a bit mark
If it is on a DataNode, then replace it with an EmptyNode with a
higher seqid and a bit mark.
If it is on an EmptyNode, then replace it with an EmptyNode with a
higher seqid.
A dequeuer can only get a new seqid if
the current value's seqid is greater than its assigned seqid, or
the current value is a DataNode and it is bitmarked.
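The two replacement rules can be sketched as a function that computes the value to install. The slide says only "a higher seqid"; using seqid + length, so that the new seqid still maps to the same position, is an assumption. Slot and handleBitMark are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class Kind { Empty, Data };

// Hypothetical decoded view of one array entry, including its mark.
struct Slot {
    Kind kind;
    std::uint64_t seqid;
    bool marked;
};

// Computes the replacement for a bitmarked entry.
// Assumption: "higher seqid" is taken to be seqid + length.
inline Slot handleBitMark(const Slot& cur, std::uint64_t length) {
    assert(cur.marked);
    // A marked DataNode becomes a marked EmptyNode;
    // a marked EmptyNode becomes an unmarked EmptyNode.
    return Slot{Kind::Empty, cur.seqid + length, cur.kind == Kind::Data};
}
```

Under these rules a mark placed on a DataNode is cleared after two replacements of that position.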
Is this wait-free?
As described, no: it is susceptible to livelock.
Consider the scenario where just before an enqueuer executes cas,
another enqueuer takes the position.
A similar case occurs when memory management is used. As a result,
the solution must work for both scenarios.
How we made it wait-free
Progress assurance framework composed of
Herlihy’s announcement table.
Kogan's methodology for checking it in O(1/C).
An association model we developed in prior work.
Association Model:
Two descriptors:
An operation record (oprec) with a field atomic<helper*> h.
A helper with fields oprec* op and Value v.
If helper .op.h.load() == helper then they are associated
Otherwise the helper was placed in error.
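The association check can be sketched directly from the two descriptors. The isAssociated function follows the slide's condition. The associate function, which lets the first helper win through a cas from a null pointer, is an assumption about how an association might be established and is not stated on the slide. The Value type is stood in for by std::uintptr_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Helper;

// Operation record: points at the one helper associated with it, if any.
struct OpRecord {
    std::atomic<Helper*> h{nullptr};
};

// Helper: refers back to its operation record and carries a value.
struct Helper {
    OpRecord* op;
    std::uintptr_t v;  // stand-in for the slide's Value type
};

// The slide's condition: helper.op.h.load() == helper.
inline bool isAssociated(const Helper* helper) {
    return helper->op->h.load() == helper;
}

// Assumed association step: the first helper to cas itself into a null
// `h` wins; any later helper for the same oprec was placed in error.
inline bool associate(Helper* helper) {
    Helper* expected = nullptr;
    helper->op->h.compare_exchange_strong(expected, helper);
    return isAssociated(helper);
}
```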
100% Dequeue
[Plot: y-axis from 0 to 16,000,000; x-axis Threads (1, 2, 4, 8, 16, 32, 64);
series: Wait-free, Linux, Tsigas, TBB, Locking, MCAS]
The Linux buffer outperforms us by 18%, TBB by 8%, and we outperform
the others by 25%.
100% Enqueue
[Plot: y-axis from 0 to 16,000,000; x-axis Threads (1, 2, 4, 8, 16, 32, 64);
series: Wait-free, Linux, Tsigas, TBB, Locking, MCAS]
We see performance improvements of about 25%.
80% Enqueue, 20% Dequeue
[Plot: y-axis from 0 to 10,000,000; x-axis Threads (1, 2, 4, 8, 16, 32, 64);
series: Wait-free, Linux, Tsigas, TBB, Locking, MCAS]
We see performance improvements of about 16%.
20% Enqueue, 80% Dequeue
[Plot: y-axis from 0 to 12,000,000; x-axis Threads (1, 2, 4, 8, 16, 32, 64);
series: Wait-free, Linux, Tsigas, TBB, Locking, MCAS]
Linux beats us by 1%; we beat TBB by 2% and the rest by 45%.
50% Enqueue, 50% Dequeue
[Plot: y-axis from 0 to 10,000,000; x-axis Threads (1, 2, 4, 8, 16, 32, 64);
series: Wait-free, Linux, Tsigas, TBB, Locking, MCAS]
Ours and the Linux buffer outperform the rest by about 46%.
Questions?
For more details, please visit cse.eecs.ucf.edu