Better Never than Late: Meeting Deadlines in Datacenter Networks
Peter Peresini

This talk is about ...
networking in a datacenter

Network traffic
two classes
● realtime
● offline

Realtime traffic
user requests
● unpredictable traffic
● fast response
  ○ increase in latency = disaster
  ○ (Amazon: +100 ms latency = -1% sales)

Life of user request
what is going on inside a DC?
1 user request = storm of traffic

Life of user request (overview)

Life of user request (details)
[Figure: request tree frontend → aggregators → workers → storage, annotated with deadlines of 50 ms, 20 ms and 10 ms]

Real-time requests (summary)
● many flows with tight deadlines
  ○ a flow is useful if and only if it meets its deadline
● flow explosion
  ○ bursty traffic pattern
  ○ incast problem

Current datacenter networking
● TCP fair share
● per-packet QoS (DiffServ, IntServ)

Fair share not "fair"?
● flows with different deadlines get equal shares
  ○ better: boost the flow with the tighter deadline

Fair share even harmful?
● several flows
● router with small available bandwidth
  ○ better: sacrifice one flow so the others meet their deadlines

Fair share summary
fair share is not optimal for datacenters!
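The two fair-share slides can be made concrete with a small fluid model. This is an illustration under stated assumptions, not part of D3: it assumes a single bottleneck link of 10 KB/ms, flows that all start at time 0, and it uses plain earliest-deadline-first (EDF) service as a stand-in for "prioritize the tighter deadline". The names `Flow`, `fair_share_finish`, `edf_finish` and `deadlines_met` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    size: float      # KB to send
    deadline: float  # ms after the flow starts


def fair_share_finish(flows, capacity):
    """Finish times (ms) when all active flows split `capacity` (KB/ms) equally."""
    order = sorted(range(len(flows)), key=lambda j: flows[j].size)
    finish = [0.0] * len(flows)
    t = 0.0
    sent = 0.0            # KB delivered so far to each still-active flow
    active = len(flows)
    for i in order:       # under equal sharing the smallest flow completes first
        t += (flows[i].size - sent) * active / capacity
        sent = flows[i].size
        finish[i] = t
        active -= 1
    return finish


def edf_finish(flows, capacity):
    """Finish times (ms) when flows get the whole link one at a time, tightest deadline first."""
    finish = [0.0] * len(flows)
    t = 0.0
    for i in sorted(range(len(flows)), key=lambda j: flows[j].deadline):
        t += flows[i].size / capacity
        finish[i] = t
    return finish


def deadlines_met(flows, finish):
    """Count flows whose finish time is within their deadline."""
    return sum(t <= f.deadline for f, t in zip(flows, finish))


if __name__ == "__main__":
    C = 10.0  # hypothetical bottleneck capacity, KB/ms
    scenarios = {
        "different deadlines": [Flow(100, 20), Flow(100, 10)],
        "overloaded link": [Flow(100, 20) for _ in range(3)],
    }
    for name, flows in scenarios.items():
        fs = deadlines_met(flows, fair_share_finish(flows, C))
        edf = deadlines_met(flows, edf_finish(flows, C))
        print(f"{name}: fair share meets {fs}, EDF meets {edf}")
```

With two 100 KB flows (deadlines 20 ms and 10 ms), fair share finishes both at 20 ms, so the tight flow misses; EDF meets both. With three 100 KB flows that all have a 20 ms deadline, fair share finishes every flow at 30 ms (none meet), while serving them one at a time meets two and sacrifices the third.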
D3 - deadline-driven delivery
Key idea: make the network deadline-aware
● prioritize flows
  ○ tighter deadline
  ○ bigger size for the same deadline

Design goals
● per-flow stateless core
  ○ routers cannot maintain per-flow state
● maximize useful throughput
  ○ minimize #expired deadlines
● tolerate bursts
  ○ workloads often cause bursts + incast problem
● high utilization
  ○ long flows should use all remaining bandwidth

Challenges
● per-flow deadlines
  ○ per-packet schemes are not enough
● small flows (<100 KB), small RTT
  ○ reservation schemes are too inflexible
  ○ slow response time => under/over-utilization

D3 prototype
● endhosts
  ○ rate-limit connections
  ○ negotiate rate with routers
    ■ frequently (per RTT)
● routers
  ○ allocate rates
  ○ maintain allocations with no per-flow state

Rate control on endhosts
applications expose
● size s
● deadline d
● minimal rate to satisfy the deadline: r = s/d

Rate allocation on routers
with deadline: alloc = demanded minimal rate + fair share
without deadline: alloc = fair share
where fair share = (capacity - total demand) / #flows

Where is the catch?
● do it statelessly
  ○ need to track #flows, demand, allocation
key idea: endhosts maintain per-flow state
  ○ send routers everything needed

Rate alloc on routers (cont.)
● capacity of the router
  ○ constant
● #flows
  ○ count SYN, FIN
● current demand
  ○ include old and new demand in request
  ○ demand += new - old
● current alloc
  ○ include old alloc in request
  ○ alloc += new - old

But things can go wrong ...
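The endhost rate request and the router's stateless bookkeeping from the slides above can be sketched as follows. This is a minimal sketch, not the D3 prototype: `RateRequest`, `StatelessRouter` and their fields are hypothetical names, flows without a deadline are modeled as a demand of 0 (so the same formula yields just the fair share), and two details the slides do not cover are labeled assumptions: the FIN carries the flow's last demand and allocation so the router can release them, and a grant is capped at whatever capacity the other flows have left.

```python
from dataclasses import dataclass


def desired_rate(size, deadline):
    """Endhost side: minimal rate r = s / d that still meets the deadline."""
    return size / deadline


@dataclass
class RateRequest:
    # Hypothetical header fields: the endhost keeps the per-flow state and echoes it back.
    new_demand: float  # rate wanted for the next RTT (0.0 for flows without a deadline)
    old_demand: float  # demand sent in the previous request (0.0 on the first request)
    old_alloc: float   # allocation granted last time (0.0 on the first request)


class StatelessRouter:
    """Keeps three aggregate counters instead of a per-flow table."""

    def __init__(self, capacity):
        self.capacity = capacity  # C, constant in this sketch
        self.num_flows = 0        # N, maintained by counting SYN and FIN
        self.demand = 0.0         # D, sum of current demands
        self.allocated = 0.0      # A, sum of current allocations

    def on_syn(self):
        self.num_flows += 1

    def on_fin(self, req):
        # Assumption: the FIN carries the flow's last demand/allocation to release.
        self.num_flows -= 1
        self.demand -= req.old_demand
        self.allocated -= req.old_alloc

    def allocate(self, req):
        self.demand += req.new_demand - req.old_demand  # demand += new - old
        fair_share = max(0.0, (self.capacity - self.demand) / max(self.num_flows, 1))
        alloc = req.new_demand + fair_share              # only fair_share if no deadline
        # Assumption (not on the slides): never grant more than the other flows left free.
        available = self.capacity - (self.allocated - req.old_alloc)
        alloc = max(0.0, min(alloc, available))
        self.allocated += alloc - req.old_alloc          # alloc += new - old
        return alloc
```

For example, a 20 KB flow with a 20 ms deadline needs `desired_rate(20, 20)` = 1 KB/ms. With C = 100 rate units and two flows, a deadline flow asking for 30 gets 30 + (100 - 30)/2 = 65, and a flow without a deadline gets the fair share of 35, so the link is fully allocated. When the deadline flow repeats its request next RTT (old demand 30, old allocation 65), the aggregate counters do not drift and it again receives 65.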
[Image: http://hospitalnotes.blogspot.com/2009/10/computer-failure.html]

Getting around failures
failures affecting bookkeeping
● packets are lost
● hosts go down
key idea: periodically adjust router capacity

Adjusting router capacity
key idea:
free bandwidth ⇒ increase capacity
long queue ⇒ decrease capacity

D3 wrap-up
endhosts
● rate-limit flows
● periodically request flow allocations
routers
● respond with allocations
  ○ everything in global state + packets
● capacity adjustments
  ○ to overcome failures

Evaluation
● flow burst microbenchmarks
  ○ today's worst case
● typical traffic benchmark
  ○ expected performance in a DC
● flow quenching
  ○ value of killing flows

Flow burst μ-benchmark
[Figure: a rack of workers responding to one aggregator]

Flow burst μ-benchmark - flows
flow size
● uniform in [2 KB, 50 KB]
deadlines
● exponentially distributed around
  ○ 20 ms - tight
  ○ 30 ms - moderate
  ○ 40 ms - lax

Flow burst μ-benchmark - protocols
● D3
● RCPDC (D3 without deadlines)
● TCPpr (priority queuing)
● TCP

Flow burst μ-benchmark - results
2 graphs
● max workers (1% deadline miss)
● application throughput (deadline miss rate)

μ-benchmark - #senders
[Figure: number of senders that can sustain 99% application throughput (99th percentile); axis: #flows meeting deadlines (%)]

μ-benchmark - app throughput
TCP: 10 senders lead to a 10% deadline miss rate
D3: up to 30 senders

Conclusions
need for deadline awareness
D3 - deadline-driven delivery
● prioritizes flows according to deadlines
● designed for datacenters
● sustains triple the #flows
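The capacity-adjustment slide gives only the direction of the correction (free bandwidth raises capacity, a long queue lowers it). A minimal sketch, assuming an RCP-style update in which spare bandwidth pushes the advertised capacity up and the standing queue pushes it down; the function name `adjust_capacity` and the gains `alpha` and `beta` are hypothetical, not values from the talk.

```python
def adjust_capacity(capacity, link_rate, measured_rate, queue, interval,
                    alpha=0.1, beta=1.0):
    """One periodic update of the capacity the router uses when allocating rates.

    alpha, beta: hypothetical RCP-style gains, not D3's published constants.
    Units: rates in KB/ms, queue in KB, interval in ms.
    """
    spare = link_rate - measured_rate  # free bandwidth => raise capacity
    drain = queue / interval           # rate needed to drain the queue in one interval
    return max(0.0, capacity + alpha * spare - beta * drain)
```

If a host died while holding 20 KB/ms of allocation, the link carries only 80 of its 100 KB/ms and `adjust_capacity(100, 100, 80, 0, 1)` returns 102.0, so the leaked bandwidth is gradually handed out again. If lost packets instead left the router over-allocating, a standing queue appears and `adjust_capacity(100, 100, 100, 5, 1)` returns 95.0. Because the correction works only from observed traffic and queue length, it needs no per-flow state, which matches the design goal of a stateless core.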
© Copyright 2025