RR-TCP: A Reordering-Robust TCP with DSACK

Ming Zhang (Princeton)
Brad Karp (Intel Research/Carnegie Mellon)
Sally Floyd (ICSI)
Larry Peterson (Princeton)
7/28/2010
ICNP 2003
Motivation





TCP performs badly under packet reordering
In today’s Internet, reordering is considered pathological, with causes such as:
- Route oscillation [Paxson96]
- Router software errors [Paxson97]
- Striping packets across multiple links [Bennett99]
- Satellite links [Ward95]
These are not the main reasons that TCP should be reordering-robust
Beneficial systems cannot be deployed because they reorder packets:
- Multi-path routing
- Parallel packet forwarding paths [Chen01]
Goal: improve TCP’s robustness on paths that reorder packets, without increasing aggressiveness under congestion.
Outline







Motivation
False fast retransmit and dupthresh
Measuring reordering length and DSACK-FA
Cost of false fast retransmit and timeout
Using costs to adapt dupthresh to maximize
throughput
Experimental evaluation
Conclusion and future work
Problem: False Fast Retransmit






TCP detects loss with duplicate ACKs (dupacks)
TCP enters fast retransmit when the number of dupacks reaches a threshold, dupthresh
dupthresh is 3 per [Jacobson89]
If a packet is displaced by 3 or more packets, the sender misinterprets reordering as loss and enters a false fast retransmit
False fast retransmit causes the congestion window (cwnd) to be halved unnecessarily
DSACK can help identify false fast retransmits by reporting duplicate data packets to the sender
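A minimal Python sketch of the dupack mechanism described above, not the authors' code. The function names, the integer packet numbering, and the assumption that the sender already holds an ACK for everything before packet 1 are illustrative choices.

```python
def cumulative_acks(arrival_order):
    """Return the cumulative ACK the receiver sends for each arriving packet.

    Packets are numbered 1, 2, 3, ...; each ACK carries the highest packet
    number received in order so far (0 means packet 1 is still missing).
    """
    received = set()
    highest_in_order = 0
    acks = []
    for pkt in arrival_order:
        received.add(pkt)
        while highest_in_order + 1 in received:
            highest_in_order += 1
        acks.append(highest_in_order)
    return acks


def max_dupacks(arrival_order):
    """Largest run of duplicate ACKs the sender observes.

    Assumption: the sender has already seen ACK 0, so the first ACK 0
    produced by an out-of-order arrival counts as a dupack.
    """
    last_ack = 0
    run = 0
    longest = 0
    for ack in cumulative_acks(arrival_order):
        if ack == last_ack:
            run += 1
            longest = max(longest, run)
        else:
            last_ack = ack
            run = 0
    return longest


def enters_fast_retransmit(arrival_order, dupthresh=3):
    """True if the dupack count reaches dupthresh."""
    return max_dupacks(arrival_order) >= dupthresh
```

With arrival order 2 3 4 5 1 and no loss, the default dupthresh of 3 yields a false fast retransmit in this sketch, while a dupthresh of 5 does not.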
Increasing dupthresh has risks




Setting dupthresh greater than the maximum reordering length avoids false fast retransmit
Risks of increasing dupthresh include:
- Timeouts (one-second minimum): after a real loss, too few dupacks may return to trigger fast retransmit
- Increased end-to-end delay for dropped packets: it takes longer for enough dupacks to return
- Delayed response to congestion: window reduction is postponed until enough dupacks arrive
The scheme for adapting dupthresh must balance these opposing goals: avoid both false fast retransmit (dupthresh too small) and timeout (dupthresh too large)
Measuring Reordering Length




When packet 1 is delayed, the receiver generates one dupack for each of packets 2, 3, 4, 5 that arrives before it
Packet 1 creates a hole in the SACK sender’s scoreboard
The returning cumulative ACK (C) or SACK (S) for packet 1 closes the hole in the scoreboard
Packet 1’s reordering length is the difference between the greatest SACKed/ACKed packet number, 5, and its own number, 1
Example:
Receiver (arrival order): 2 3 4 5 1
Sender (ACKs received): S2 S3 S4 S5 C5
[Figure: sender scoreboard, sequence # 1 2 3 4 5 vs. ACKed?]
Reordering length of packet 1: 4
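The measurement above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the event encoding `("S", n)` / `("C", n)`, the function name, and the omission of retransmissions are all hypothetical simplifications.

```python
def reordering_lengths(events):
    """Compute reordering lengths from a stream of ACK events.

    events: list of ("S", n) for a SACK of packet n, or ("C", n) for a
            cumulative ACK covering all packets up to n.
    Returns {packet: length} for every packet that closed a hole, where
    length = greatest SACKed/ACKed packet number so far - packet number.
    Assumption: no retransmissions, so every hole is due to reordering.
    """
    acked = set()   # packets known to have arrived
    cum = 0         # everything <= cum is cumulatively ACKed
    highest = 0     # greatest SACKed/ACKed packet number seen so far
    lengths = {}
    for kind, n in events:
        if kind == "S":
            newly = [n] if n > cum and n not in acked else []
        else:
            newly = [p for p in range(cum + 1, n + 1) if p not in acked]
            cum = max(cum, n)
        for p in newly:
            if p < highest:          # p arrived after a higher packet
                lengths[p] = highest - p
            acked.add(p)
        highest = max(highest, n)
    return lengths
```

For the example on this slide, the events S2 S3 S4 S5 C5 give a reordering length of 4 for packet 1.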
Measuring Reordering Length Distribution





Sender stores timestamped samples of reordering length in a reordering histogram
Each sample expires after a configurable interval
Histogram summarizes the distribution of reordering for any persistent reordering process
Histogram stores up to n reordering events, each of which requires a timestamp (4 bytes) and a pointer (4 bytes)
The reordering length measurement scans the same number of scoreboard entries as standard SACK
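A minimal Python sketch of such a histogram, assuming a bounded queue of (timestamp, length) pairs. The class and method names are hypothetical, and the slide's timestamp-plus-pointer layout is replaced here by plain tuples for brevity.

```python
from collections import deque


class ReorderingHistogram:
    """Bounded store of timestamped reordering-length samples."""

    def __init__(self, max_samples, lifetime):
        self.lifetime = lifetime                    # sample lifetime, same unit as timestamps
        self.samples = deque(maxlen=max_samples)    # oldest sample is dropped when full

    def expire(self, now):
        """Drop samples older than the configured lifetime."""
        while self.samples and now - self.samples[0][0] > self.lifetime:
            self.samples.popleft()

    def add(self, now, length):
        """Record one reordering event observed at time `now`."""
        self.expire(now)
        self.samples.append((now, length))

    def counts(self, now):
        """Return {reordering length: number of live samples}."""
        self.expire(now)
        hist = {}
        for _, length in self.samples:
            hist[length] = hist.get(length, 0) + 1
        return hist
```

Samples are appended in time order, so expiry only needs to inspect the front of the queue.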
DSACK-FA: Avoiding False Fast Retransmit





To avoid false fast retransmit for X% (the FA ratio) of reorderings, set dupthresh to the X-th percentile of the cumulative reordering length distribution
The histogram maps an FA ratio to a dupthresh value
Fixed FA ratio: DSACK-FA (false fast retransmit avoidance)
Can we adapt the FA ratio dynamically?
Central idea: choose the FA ratio to optimize the bandwidth of a connection
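The mapping from FA ratio to dupthresh can be sketched in Python as below. This is an illustration only: the function name, the floor of 3, and the "+ 1" are assumptions. The "+ 1" follows the earlier statement that dupthresh must be greater than the reordering length to avoid a false fast retransmit; the slides themselves say only to use the percentile value.

```python
import math


def dupthresh_for_fa_ratio(lengths, fa_ratio, min_dupthresh=3):
    """Pick a dupthresh that avoids false fast retransmit for roughly
    fa_ratio (0.0 to 1.0) of the sampled reordering lengths.

    lengths: iterable of sampled reordering lengths (integers).
    min_dupthresh: assumed floor, the standard value of 3.
    """
    ordered = sorted(lengths)
    if not ordered:
        return min_dupthresh
    # Rank of the sample at the requested point of the cumulative distribution.
    rank = max(1, math.ceil(fa_ratio * len(ordered)))
    percentile_length = ordered[rank - 1]
    # Assumption: a reordering of length L produces L dupacks,
    # so dupthresh must exceed L to ignore it.
    return max(min_dupthresh, percentile_length + 1)
```

A higher FA ratio selects a longer reordering length and therefore a larger dupthresh, which is the trade-off against timeouts discussed earlier.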
Cost: False Fast Retransmit






False fast retransmit has opportunity cost in needlessly missed
packet transmissions
cwnd is reduced by half until sender learns retransmit is false
To recover from false fast retransmit when DSACK returns,
sender restores previous window value [Floyd99], [Ludwig00]
How many packets could we have transmitted but didn't?
The cost depends on the duration D of wrongly reduced window
For fixed cwnd W and $k = \lceil D / \mathrm{RTT} \rceil$:
$$\mathrm{cost} = \left(W - \frac{W}{2}\right) + \left(W - \frac{W}{2} - 1\right) + \dots + \left(W - \frac{W}{2} - (k - 1)\right)$$
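A direct Python transcription of the cost formula, as a sketch. The function and parameter names are hypothetical; D and RTT must be in the same time unit. The formula is applied as written, so for k greater than W/2 the later terms become negative; whether to clamp them is left open here.

```python
import math


def false_fast_retransmit_cost(W, D, rtt):
    """Packets not sent because cwnd was wrongly halved for duration D.

    W:   cwnd before the false fast retransmit (packets, assumed fixed)
    D:   duration of the wrongly reduced window
    rtt: round-trip time, same unit as D
    """
    k = math.ceil(D / rtt)   # number of RTTs spent with the reduced window
    # In RTT i the window is W/2 + i, so W - W/2 - i packets are missed.
    return sum(W - W / 2 - i for i in range(k))
```

For example, W = 20 and D = 3 RTTs gives 10 + 9 + 8 = 27 missed packets.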
Cost: Timeout



After a real loss, timeouts have two main costs beyond the fast retransmit cost:
- The idle period after the sender can no longer send new packets, but before the retransmission timer expires
- Slow start, during which cwnd grows from one to half the previous cwnd
Extend limited transmit to send k*cwnd additional dupack-clocked packets
For fixed cwnd W, C(timeout) = C(idle) + C(ss), where
$$C(\mathrm{idle}) = W \cdot \frac{\mathrm{RTO}}{\mathrm{RTT}} - W(1 + k)$$
$$C(\mathrm{ss}) = (W - 1) + (W - 2) + (W - 4) + \dots + \left(W - \frac{W}{2}\right)$$
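The two timeout cost terms can be sketched in Python as follows. Function and parameter names are hypothetical, RTO and RTT must share a unit, and the slow-start loop assumes cwnd doubles each RTT starting from 1, so the final term equals W - W/2 exactly only when W/2 is a power of two.

```python
def idle_cost(W, rto, rtt, k):
    """C(idle): packets that could have been sent during the RTO, minus the
    original window and the k*W packets sent under extended limited transmit."""
    return W * rto / rtt - W * (1 + k)


def slow_start_cost(W):
    """C(ss): packets missed while cwnd doubles from 1 up to W/2."""
    cost = 0
    cwnd = 1
    while cwnd <= W / 2:
        cost += W - cwnd
        cwnd *= 2
    return cost


def timeout_cost(W, rto, rtt, k):
    """C(timeout) = C(idle) + C(ss) for a fixed cwnd W."""
    return idle_cost(W, rto, rtt, k) + slow_start_cost(W)
```

For example, with W = 16, RTO = 1000 ms, RTT = 100 ms and k = 1, the idle cost is 160 - 32 = 128 and the slow start cost is 15 + 14 + 12 + 8 = 49.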
DSACK-TA: Adapting FA ratio to maximize throughput


Limited transmit can also introduce an opportunity cost, C(limited transmit), in idle time.
Let S be the fundamental step by which we adapt FA ratio (0.01 in this work)
- Upon every false fast retransmit, increase FA ratio by S
- Upon every timeout, decrease FA ratio by $S \cdot \frac{C(\text{timeout})}{C(\text{false fast retransmit})}$
- Upon every limited-transmit-induced idle period, decrease FA ratio by $S \cdot \frac{C(\text{limited transmit})}{C(\text{false fast retransmit})}$
Algorithm name: DSACK-TA (timeout avoidance)
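The update rules above can be sketched in Python as a small state holder. This is a hedged illustration: the class and method names are hypothetical, the costs are passed in by the caller and assumed positive, and clamping the ratio to [0, 1] is an assumption not stated on the slide.

```python
class FaRatioAdapter:
    """Adapts the FA ratio using cost-weighted steps (DSACK-TA style sketch)."""

    def __init__(self, fa_ratio=0.9, step=0.01):
        self.fa_ratio = fa_ratio   # fraction of reorderings to tolerate
        self.step = step           # S, the fundamental adaptation step

    def _set(self, value):
        # Assumption: keep the ratio a valid fraction.
        self.fa_ratio = min(1.0, max(0.0, value))

    def on_false_fast_retransmit(self):
        """Too small a dupthresh: raise the FA ratio by S."""
        self._set(self.fa_ratio + self.step)

    def on_timeout(self, c_timeout, c_ffr):
        """Too large a dupthresh: lower by S * C(timeout) / C(false fast retransmit)."""
        self._set(self.fa_ratio - self.step * c_timeout / c_ffr)

    def on_limited_transmit_idle(self, c_limited, c_ffr):
        """Lower by S * C(limited transmit) / C(false fast retransmit)."""
        self._set(self.fa_ratio - self.step * c_limited / c_ffr)
```

Weighting each decrease by the cost ratio means one expensive timeout offsets several cheaper false fast retransmits.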
Experimental Evaluation


Use NS-2 network simulator
- Extended to delay or drop packets according to various distributions
- One long-lived TCP flow lasting 1000 s
- Initial FA ratio is 90% (the 90th percentile of sampled reordering lengths)
- Limited transmit bound is 1*cwnd
- Reordering length sample lifetime is 80 s
Compare 4 TCP variants: SACK, DSACK-R, DSACK-FA and DSACK-TA
Topology: S1 --(10 Mbps, 1 ms)-- R1 --(? Mbps, ? ms)-- R2 --(10 Mbps, 1 ms)-- S2
False Fast Retransmit Avoidance

Link delay 50ms, delay time [0, 50ms], normal distribution
[Figure: throughput (pkts, 0 to 500000) vs. % delayed pkts (0 to 30); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA, NODELAY]
False Fast Retransmit Ratio
[Figure: % of reorderings causing false fast retransmit (0% to 80%) vs. % of delayed pkts (0 to 30); curves: DSACK-R, DSACK-FA, DSACK-TA]
Timeout Avoidance – Timeout Ratio

Link delay 100ms, delay time [0, 400ms], uniform distribution
Loss rate 0.6% and delay rate 1.4%
[Figure: % of pkts that time out (0.0% to 0.4%) vs. FA ratio (0% to 100%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA]
Timeout Avoidance – Fast Retransmit Ratio
[Figure: % of pkts resent with fast retransmit (0.0% to 1.6%) vs. FA ratio (0% to 100%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA]
Timeout Avoidance – Throughput
[Figure: throughput (pkts, 39000 to 64000) vs. FA ratio (0% to 100%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA]
Loss and Reordering

Link delay 50ms, 5% delayed pkts, mean delay 25ms, normal distribution
[Figure: throughput (pkts, 0 to 600000) vs. drop rate (0.00% to 2.00%); curves: SACK, DSACK-R, DSACK-FA, DSACK-TA, NODELAY]
Related Work


[Ludwig00] Uses the TCP timestamp option to identify and recover from false fast retransmit, but does not adapt dupthresh
[Blanton01] Increases dupthresh to avoid false fast retransmit, but has no explicit timeout avoidance
Conclusion



RR-TCP improves TCP’s robustness under reordering
- Uses a reordering histogram to adapt dupthresh and avoid false fast retransmit
- Balances false fast retransmit, timeout and limited transmit according to their relative costs
- Not more aggressive than SACK under congestion, only more slowly responsive [Bansal01]
- Demonstrates the relative importance of loss and reordering to throughput: reordering matters most under low loss rates
Future work
- Receiver-side RR-TCP: store reordering state and dupthresh at clients
- Share reordering state among short-lived flows
- Study RR-TCP together with multi-path routing
URL: http://www.icir.org/bkarp/RR-TCP