
Better Never than Late:
Meeting Deadlines in
Datacenter Networks
Peter Peresini
This talk is about ...
networking in a datacenter
Network traffic
two classes
● real-time
● offline
Real-time traffic
user requests
● unpredictable traffic
● fast response
○ an increase in latency = disaster
○ (Amazon: +100 ms latency ≈ 1% of sales lost)
Life of user request
what is going on inside the DC?
1 user request = a storm of traffic
Life of user request (overview)
Life of user request (details)
frontend →(50 ms)→ aggregators →(20 ms)→ workers →(10 ms)→ storage
Real-time requests (summary)
● many flows with tight deadlines
○ a flow is useful if and only if it meets its deadline
● flow explosion
○ bursty traffic pattern
○ incast problem
Current datacenter networking
● TCP fair share
● per-packet QoS (DiffServ, IntServ)
Fair share not "fair"?
● flows with different deadlines
⇒ boost the flow with the tighter deadline
Fair share even harmful?
● several flows
● router with little available bandwidth
⇒ sacrifice a flow so the rest can meet their deadlines
Fair share summary
fair share is not optimal for datacenters!
D3 - deadline-driven delivery
Key idea: make network deadline-aware
● prioritize flows
○ tighter deadline
○ bigger size for same deadline
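A minimal sketch (Python; the Flow fields are illustrative) of the ordering this implies: tighter deadline first, and for equal deadlines the bigger flow first, since by the r = s/d rule on a later slide it needs the higher rate.

    from dataclasses import dataclass

    @dataclass
    class Flow:
        size: int        # bytes still to send (s)
        deadline: float  # seconds until the deadline (d)

    def priority_order(flows):
        # tighter deadline first; bigger size wins ties
        return sorted(flows, key=lambda f: (f.deadline, -f.size))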
Design goals
● stateless core
○ routers cannot maintain per-flow state
● maximize useful throughput
○ minimize #expired deadlines
● tolerate bursts
○ workloads often cause bursts + incast problem
● high utilization
○ long flows should use all remaining bandwidth
Challenges
● per-flow deadlines
○ per-packet schemes not enough
● small flows (<100KB), small RTT
○ reservation schemes too inflexible
○ slow response time => under/over-utilization
D3 prototype
● endhosts
○ rate-limit connections
○ negotiate rate with routers
■ frequently (per RTT)
● routers
○ allocate rates
○ maintain allocations with no per-flow state
Rate control on endhosts
applications expose
● size s
● deadline d
● minimal rate to satisfy the deadline: r = s / d
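A one-function sketch of that request (names are mine), recomputed every RTT as both the remaining size and the time to deadline shrink:

    def requested_rate(remaining_size, time_to_deadline):
        # minimal rate r = s/d (bytes/s) to finish the remaining
        # bytes before the deadline; refreshed once per RTT
        if time_to_deadline <= 0:
            return float("inf")  # deadline already passed
        return remaining_size / time_to_deadline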
Rate allocation on routers
with deadline:
alloc = demanded minimal rate + fair share
without deadline:
alloc = fair share
where
fair share = (capacity - total demand) / #flows
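A sketch of that rule in Python (assuming the router already knows the aggregate demand and flow count; the next slides show how it gets them statelessly):

    def allocate(capacity, total_demand, num_flows, demanded_rate=None):
        # leftover capacity after all minimal rates, split evenly
        fair_share = max(capacity - total_demand, 0.0) / max(num_flows, 1)
        if demanded_rate is not None:   # deadline flow
            return demanded_rate + fair_share
        return fair_share               # non-deadline flow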
Where is the catch?
● do it statelessly
○ need to track #flows, demand, allocation
key idea: endhosts keep the per-flow state
○ and send routers everything needed in each request
Rate alloc on routers (cont.)
● capacity of the router
○ constant
● #flows
○ count SYN, FIN
● current demand
○ include old and new demand in request
○ demand += new - old
● current alloc
○ include old alloc in request
○ alloc += new - old
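Putting those counters together, a sketch of the router side: only aggregates, no per-flow table (class and field names are mine):

    class RouterState:
        def __init__(self, capacity):
            self.capacity = capacity  # constant
            self.num_flows = 0        # counted via SYN / FIN
            self.demand = 0.0         # sum of demanded minimal rates
            self.alloc = 0.0          # sum of granted rates

        def on_syn(self): self.num_flows += 1
        def on_fin(self): self.num_flows -= 1

        def on_request(self, old_demand, new_demand, old_alloc):
            # the endhost echoes its previous values, so the router
            # updates aggregates without remembering the flow
            self.demand += new_demand - old_demand
            fair_share = (max(self.capacity - self.demand, 0.0)
                          / max(self.num_flows, 1))
            grant = new_demand + fair_share
            self.alloc += grant - old_alloc
            return grant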
But things can go wrong ...
(image: http://hospitalnotes.blogspot.com/2009/10/computer-failure.html)
Getting around failures
failures affecting bookkeeping
● packets are lost
● hosts go down
key idea: periodically adjust router capacity
Adjusting router capacity
key idea:
free bandwidth ⇒ increase capacity
long queue ⇒ decrease capacity
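The slide gives only the direction of the adjustment, not the controller; a plausible sketch with hypothetical gains alpha and beta (and units deliberately simplified):

    def adjust_capacity(capacity, spare_bandwidth, queue_len,
                        alpha=0.1, beta=0.1):
        # free bandwidth -> advertise more capacity;
        # a standing queue -> advertise less
        return max(capacity + alpha * spare_bandwidth
                            - beta * queue_len, 0.0)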
D3 wrap-up
endhosts
● rate-limit flows
● periodically request flow allocations
routers
● respond with allocations
○ all needed info lives in aggregate state + request packets
● capacity adjustments
○ to overcome failures
Evaluation
● flow burst microbenchmarks
○ today's worst-case
● typical traffic benchmark
○ expected performance in DC
● flow quenching
○ value of killing flows
Flow burst μ-benchmark
[diagram: single rack with N workers sending to one aggregator]
Flow burst μ-benchmark - flows
flow size
● uniform in [2KB, 50KB]
deadlines
● exponentially distributed, with mean:
○ 20 ms - tight
○ 30 ms - moderate
○ 40 ms - lax
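A sketch of how one could generate this workload (distribution parameters are from the slide; the generator itself is mine):

    import random

    def sample_flow(mean_deadline=0.020):  # 0.020 / 0.030 / 0.040 s
        size = random.uniform(2_000, 50_000)    # uniform in [2KB, 50KB]
        deadline = random.expovariate(1 / mean_deadline)  # mean = mean_deadline
        return size, deadline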
Flow burst μ-benchmark - protocols
● D3
● RCPdc (D3 without deadlines)
● TCPpr (TCP with priority queuing)
● TCP
Flow burst μ-benchmark - results
2 graphs
● max #workers (at 1% deadline misses)
● application throughput (deadline miss rate)
μ-benchmark - #senders 99%tile
number of senders which can be sustained at 99% application throughput
[graph: y-axis = flows meeting deadlines (%)]
μ-benchmark - app throughput
● TCP: 10 senders ⇒ 10% deadline misses
● D3: up to 30 senders
Conclusions
need for deadline awareness
D3 - deadline-driven delivery
● prioritize flows according to deadlines
● designed for datacenters
● sustains 3× as many flows as TCP