RR-TCP: A Reordering-Robust TCP with DSACK
Ming Zhang (Princeton), Brad Karp (Intel Research/Carnegie Mellon), Sally Floyd (ICSI), Larry Peterson (Princeton)
7/28/2010 ICNP 2003

Motivation
- TCP performs badly under packet reordering
- In today’s Internet, reordering is considered maladaptive, arising from:
  - Route oscillation [Paxson96]
  - Router software errors [Paxson97]
  - Striping packets across multiple links [Bennett99]
  - Satellite links [Ward95]
- The above are not the main reasons that TCP should be reordering-robust
- Beneficial systems cannot be deployed because of reordering:
  - Multi-path routing
  - Parallel packet forwarding paths [Chen01]
- Goal: improve TCP’s robustness on paths that reorder packets, without increasing aggressiveness under congestion

Outline
- Motivation
- False fast retransmit and dupthresh
- Measuring reordering length and DSACK-FA
- Cost of false fast retransmit and timeout
- Using costs to adapt dupthresh to maximize throughput
- Experimental evaluation
- Conclusion and future work

Problem: False Fast Retransmit
- TCP detects loss with duplicate ACKs (dupacks)
- TCP enters fast retransmit when the number of dupacks reaches a threshold, dupthresh
- dupthresh is 3 per [Jacobson89]
- If a packet’s position is perturbed by 3 or more packets, the sender misinterprets reordering as loss and enters a false fast retransmit
- A false fast retransmit halves the congestion window (cwnd) unnecessarily
- DSACK helps identify false fast retransmits by reporting duplicate data packets to the sender

Increasing dupthresh Has Risks
- Setting dupthresh greater than the maximum reordering length avoids false fast retransmit
- Risks of increasing dupthresh include:
  - Timeouts (one-second minimum): too few dupacks return after a real loss to trigger fast retransmit
  - Increased end-to-end delay for dropped packets: it takes longer for enough dupacks to return
  - Delayed response to congestion: window reduction is postponed until enough dupacks arrive
- The scheme for adapting dupthresh
must balance these opposing goals
  - Avoid both false fast retransmit (too small a dupthresh) and timeout (too large a dupthresh)

Measuring Reordering Length
- When packet 1 is delayed, the receiver generates one dupack for each of packets 2, 3, 4, 5 that arrives before packet 1
- Packet 1 leaves a hole in the SACK sender’s scoreboard
- The returning cumulative ACK (C) or SACK (S) for packet 1 closes the hole in the scoreboard
- Packet 1’s reordering length is the difference between the greatest SACKed/ACKed packet number, 5, and 1

  Receiver (arrival order):  2   3   4   5   1
  Sender (ACKs received):    S2  S3  S4  S5  C5
  Scoreboard: Sequence # 1 2 3 4 5, ACKed?
  Reordering length: 4

Measuring Reordering Length Distribution
- The sender stores timestamped samples of reordering length in a reordering histogram
- A sample expires after a configurable interval
- The histogram summarizes the distribution of reordering for any persistent reordering process
- The histogram stores up to n reordering events, each of which requires a timestamp (4 bytes) and a pointer (4 bytes)
- The reordering length measurement scans the same number of scoreboard entries as standard SACK

DSACK-FA: Avoiding False Fast Retransmit
- To avoid false fast retransmit for X% (the FA ratio) of reorderings, set dupthresh to the X-th percentile of the cumulative distribution of reordering lengths
- The histogram maps an FA ratio to a dupthresh value
- Fixed FA ratio: DSACK-FA (false fast retransmit avoidance)
- Can we adapt the FA ratio dynamically?
- Central idea: choose the FA ratio that maximizes the connection’s throughput

Cost: False Fast Retransmit
- A false fast retransmit has an opportunity cost: packet transmissions that are needlessly missed
- cwnd stays halved until the sender learns that the retransmit was false
- To recover from a false fast retransmit when the DSACK returns, the sender restores the previous window value [Floyd99], [Ludwig00]
- How many packets could we have transmitted but didn't?
- The cost depends on the duration D of the wrongly reduced window
- For fixed cwnd $W$ and $k = D / RTT$:

  \[ C(\text{false fast retransmit}) = \left(W - \frac{W}{2}\right) + \left(W - \left(\frac{W}{2} + 1\right)\right) + \dots + \left(W - \left(\frac{W}{2} + (k - 1)\right)\right) \]

Cost: Timeout
- After a real loss, a timeout has two main costs beyond the fast retransmit cost:
  - The idle period after the sender can no longer send new packets but before the retransmission timer expires
  - Slow start, during which cwnd grows from one to half the previous cwnd
- Extend limited transmit to send k*cwnd additional dupack-clocked packets
- For fixed cwnd $W$, $C(\text{timeout}) = C(\text{idle}) + C(\text{ss})$, where $k$ is now the limited transmit multiple from the previous bullet:

  \[ C(\text{idle}) = \frac{RTO}{RTT}\, W - W (1 + k) \]

  \[ C(\text{ss}) = (W - 1) + (W - 2) + (W - 4) + \dots + \left(W - \frac{W}{2}\right) \]

DSACK-TA: Adapting FA ratio to maximize throughput
- Limited transmit can also introduce an opportunity cost, C(limited transmit), in idle time
- Let S be the fundamental step by which we adapt the FA ratio (0.01 in this work)
- Upon every false fast retransmit, increase the FA ratio by S
- Upon every timeout, decrease the FA ratio by $S \cdot C(\text{timeout}) / C(\text{false fast retransmit})$
- Upon every limited-transmit-induced idle period, decrease the FA ratio by $S \cdot C(\text{limited transmit}) / C(\text{false fast retransmit})$
- Algorithm name: DSACK-TA (timeout avoidance)

Experimental Evaluation
- Use the NS-2 network simulator, extended to delay or drop packets according to various distributions
- One long-lived TCP flow lasting 1000 s
- Initial FA ratio is 90% (the 90th percentile of sampled reordering lengths)
- Limited transmit bound is 1*cwnd
- Reordering length sample lifetime is 80 s
- Compare 4 TCP variants: SACK, DSACK-R, DSACK-FA and DSACK-TA
- Topology: S1 --(10 Mbps, 1 ms)-- R1 --(? Mbps, ? ms)-- R2 --(10 Mbps, 1 ms)-- S2

False Fast Retransmit Avoidance
- Link delay 50 ms, delay time [0, 50 ms], normal distribution
- Figure: Throughput (pkts) vs. % Delayed Pkts (0 to 30); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA, NODELAY
- Figure: False Fast Retransmit Ratio (% of reordering causing FFR) vs. % of Delayed Pkts (0 to 30); curves: DSACK-R, DSACK-FA, DSACK-TA

Timeout Avoidance: Timeout Ratio
- Link delay 100 ms, delay time [0, 400 ms], uniform distribution
- Loss rate 0.6% and delay rate 1.4%
- Figure: % of pkts that timeout vs. FA Ratio % (0% to 100%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA

Timeout Avoidance: Fast Retransmit Ratio
- Figure: % of pkts resent w/ fast rtx vs. FA Ratio % (0% to 100%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA

Timeout Avoidance: Throughput
- Figure: Throughput (pkts) vs. FA Ratio % (0% to 100%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA

Loss and Reordering
- Link delay 50 ms, 5% delayed pkts, mean delay 25 ms, normal distribution
- Figure y-axis: Throughput (pkts)
- Figure x-axis: Drop Rate % (0% to 2%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA, NODELAY

Related Work
- [Ludwig00]: Use TCP timestamp option to identify and recover from false fast retransmit, but no dupthresh adaptation
- [Blanton01]: Increase dupthresh to avoid false fast retransmit, but no explicit timeout avoidance

Conclusion
- RR-TCP improves TCP’s robustness under reordering
- Use reordering histogram to adapt dupthresh to avoid false fast retransmit
- Balance false fast retransmit, timeout and limited transmit according to their relative costs
- Not more aggressive than SACK under congestion, only more slowly responsive [Bansal01]
- Demonstrate the relative importance of loss and reordering to throughput: reordering matters most under low loss rate

Future work
- Receiver-side RR-TCP: store reordering state and dupthresh at clients
- Share reordering state among short-lived flows
- Study RR-TCP together with multi-path routing

URL: http://www.icir.org/bkarp/RR-TCP
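
The reordering length measurement and the DSACK-FA percentile rule described in the slides can be sketched roughly as follows. This is a minimal Python illustration and not the authors' NS-2 code. The function names `reordering_lengths` and `dupthresh_for`, the use of a bare arrival order in place of a real SACK scoreboard, the nearest-rank percentile, and the floor of 3 are all assumptions made here. The sketch also takes the slide literally and sets dupthresh equal to the percentile value.

```python
import math


def reordering_lengths(arrivals):
    """Map each late packet to its reordering length.

    `arrivals` is the order in which sequence numbers reach the receiver.
    A packet is late if a higher sequence number arrived before it; its
    reordering length is that highest number minus its own number.
    """
    lengths = {}
    highest = 0
    for seq in arrivals:
        if seq < highest:
            lengths[seq] = highest - seq
        else:
            highest = seq
    return lengths


def dupthresh_for(samples, fa_ratio, floor=3):
    """Return the fa_ratio percentile of the samples, never below `floor`.

    `floor` (hypothetical parameter) keeps the standard dupthresh of 3 as a
    lower bound when little or no reordering has been observed.
    """
    if not samples:
        return floor
    ordered = sorted(samples)
    # Nearest-rank percentile; round() guards against float noise in the product.
    rank = math.ceil(round(fa_ratio * len(ordered), 9))
    rank = min(len(ordered), max(1, rank))
    return max(floor, ordered[rank - 1])


# The example from the slides: packets arrive in the order 2 3 4 5 1.
print(reordering_lengths([2, 3, 4, 5, 1]))

# Hypothetical histogram contents, for illustration only.
samples = [1, 2, 2, 3, 4, 6, 8, 9, 10, 12]
print(dupthresh_for(samples, 0.5), dupthresh_for(samples, 0.9))
```

With the slide's arrival order, packet 1 gets reordering length 4, matching the worked example. A higher FA ratio selects a larger dupthresh from the same samples, which is the trade-off DSACK-TA later tunes.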
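
To make the cost formulas from the two "Cost" slides concrete, the sketch below evaluates them for one hypothetical operating point. It assumes the series read as W minus the reduced window in each RTT, that k is a whole number of RTTs, and that W is a power of two so the slow-start series ends exactly at W/2. The function names and the sample values (W = 16, RTO = 1000 ms, RTT = 100 ms) are illustrative and are not taken from the talk.

```python
def cost_false_fast_retransmit(w, k):
    """Packets not sent during k RTTs at a wrongly halved window.

    In RTT i the window is W/2 + i instead of W, so the loss is W - (W/2 + i).
    """
    return sum(w - (w / 2 + i) for i in range(k))


def cost_idle(w, rto, rtt, lt_multiple):
    """Packets not sent while waiting for the retransmission timer.

    W * RTO / RTT packets could have been sent during the RTO; the sender
    actually sent the window plus lt_multiple * W limited transmit packets.
    """
    return w * rto / rtt - w * (1 + lt_multiple)


def cost_slow_start(w):
    """Packets not sent while cwnd doubles from 1 up to W/2."""
    total, cwnd = 0, 1
    while cwnd <= w / 2:
        total += w - cwnd
        cwnd *= 2
    return total


def cost_timeout(w, rto, rtt, lt_multiple):
    """C(timeout) = C(idle) + C(ss)."""
    return cost_idle(w, rto, rtt, lt_multiple) + cost_slow_start(w)


# Hypothetical operating point: W = 16 packets, RTO = 1000 ms, RTT = 100 ms.
print(cost_false_fast_retransmit(16, 4))  # 8 + 7 + 6 + 5
print(cost_slow_start(16))                # 15 + 14 + 12 + 8
print(cost_timeout(16, 1000, 100, 1))     # (160 - 32) + 49
```

Under these assumed values a timeout costs several times as much as a false fast retransmit, which is why the adaptation rule lowers the FA ratio by a proportionally larger amount on a timeout than it raises it on a false fast retransmit.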
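
The DSACK-TA update rule can be sketched as a small controller. This is an illustration under stated assumptions, not the authors' implementation: the class name `FaRatioController`, its method names, and the clamping of the FA ratio to [0, 1] are hypothetical, and the cost values are passed in by the caller rather than measured from a live connection.

```python
class FaRatioController:
    """Adapts the FA ratio in steps of S, weighted by relative event costs."""

    def __init__(self, fa_ratio=0.90, step=0.01):
        self.fa_ratio = fa_ratio  # initial FA ratio (90% in the experiments)
        self.step = step          # S, the fundamental step (0.01 in the talk)

    def _move(self, delta):
        # Clamping to [0, 1] is an assumption made for this sketch.
        self.fa_ratio = min(1.0, max(0.0, self.fa_ratio + delta))

    def on_false_fast_retransmit(self):
        """dupthresh was too small: raise the FA ratio by S."""
        self._move(self.step)

    def on_timeout(self, c_timeout, c_ffr):
        """dupthresh was too large: lower by S * C(timeout) / C(FFR)."""
        self._move(-self._scaled(c_timeout, c_ffr))

    def on_limited_transmit_idle(self, c_limited_transmit, c_ffr):
        """Lower by S * C(limited transmit) / C(FFR)."""
        self._move(-self._scaled(c_limited_transmit, c_ffr))

    def _scaled(self, cost, c_ffr):
        if c_ffr <= 0:
            raise ValueError("C(false fast retransmit) must be positive")
        return self.step * cost / c_ffr


ctl = FaRatioController()
ctl.on_false_fast_retransmit()
print(round(ctl.fa_ratio, 4))
ctl.on_timeout(c_timeout=177, c_ffr=26)  # hypothetical cost values
print(round(ctl.fa_ratio, 4))
```

Scaling each decrease by the cost ratio means one expensive timeout can undo many small increases, so the controller settles where the expected costs of the two kinds of error roughly balance.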