Modeling TCP Congestion Control Don Towsley UMass Amherst

  • Slides: 41
Modeling TCP Congestion Control
Don Towsley, UMass Amherst
towsley@cs.umass.edu
collaborators: T. Bu, W. Gong, C. Hollot, V. Misra

Outline
• motivation
• TCP primer
• bottleneck invariance principle
  – instability of RED active queue management
  – fixed point approximations of TCP networks
• transient analysis of TCP networks
  – stochastic differential equations
• summary

Properties of TCP
• 90% of Internet traffic
• conservative end-end congestion control (CC)
  – equal bandwidth share
  – additive increase multiplicative decrease CC
• only end-end protocol with congestion control

TCP can be pushed around
[Figure: four TCP flows share a 20 Mb/s link and get B1 = B2 = B3 = B4 = 5 Mb/s each; with an added 12 Mb/s UDP flow, each TCP flow gets 2 Mb/s]

Need to understand TCP in network setting

Additive-Increase Multiplicative-Decrease (AIMD) Congestion Control
$r_i$: rate after the i-th feedback
  $r_{i+1} = r_i + c$ if the i-th feedback indicates no congestion
  $r_{i+1} = a \times r_i$ if the i-th feedback indicates congestion, $a < 1$
• similar algorithms for window-based CC
• basic building block of most congestion control algorithms (e.g., TCP)
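The AIMD update rule can be sketched in a few lines of Python. The function name `aimd_step` and the default values c = 1 and a = 0.5 are illustrative assumptions, not values taken from the slides.

```python
def aimd_step(rate, congested, c=1.0, a=0.5):
    """Apply one AIMD update: add c when uncongested, multiply by a (< 1) otherwise."""
    if not 0 < a < 1:
        raise ValueError("multiplicative factor a must satisfy 0 < a < 1")
    return a * rate if congested else rate + c


# Example: three "no congestion" feedbacks followed by one congestion signal.
rate = 10.0
for congested in (False, False, False, True):
    rate = aimd_step(rate, congested)
print(rate)  # 10 -> 11 -> 12 -> 13 -> 6.5
```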

AIMD and Fair Share
• two sources, rates $r_i^1$, $r_i^2$
• bandwidth C
• initial rates $r^1$ and $r^2$
• as time goes on (i increases), source rates converge to a fair share (Chiu, Jain 89)
[Figure: trajectory of $(r_i^1, r_i^2)$ starting from $(r^1, r^2)$; both axes run up to C]
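A small simulation illustrates why the rates converge. It assumes synchronous feedback (both sources see congestion whenever the summed rate exceeds the bandwidth); the function name and all parameter values are hypothetical.

```python
def simulate_two_sources(r1, r2, capacity, c=1.0, a=0.5, steps=200):
    """Synchronous AIMD for two sources sharing one link of the given capacity."""
    for _ in range(steps):
        if r1 + r2 > capacity:
            # Multiplicative decrease scales the gap r1 - r2 by a.
            r1, r2 = a * r1, a * r2
        else:
            # Additive increase leaves the gap r1 - r2 unchanged.
            r1, r2 = r1 + c, r2 + c
    return r1, r2


r1, r2 = simulate_two_sources(r1=80.0, r2=5.0, capacity=100.0)
print(abs(r1 - r2))  # far smaller than the initial gap of 75
```

Each congestion event shrinks the difference between the two rates while increases preserve it, so the rates drift toward an equal share.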

Generic TCP Behavior
• window algorithm (window W)
  – up to W packets can be in network
  – return of ACK allows sender to send another packet
  – ACKs cumulative
• increase window by one per RTT
  – W <- W + 1/W per ACK
  – W <- W + 1 per RTT
• seeks available network bandwidth

[Figure: sender and receiver, window W]

Generic TCP Behavior
• window algorithm (window W)
• increase window by one per RTT
  – W <- W + 1/W per ACK
• loss is an indication of congestion
• decrease window by half on detection of loss (triple duplicate ACK), W <- W/2

[Figure: sender and receiver, triple-duplicate (TD) loss indication]

Generic TCP Behavior
• window algorithm (window W)
• increase window by one per RTT
  – W <- W + 1/W per ACK
• halve window on detection of loss, W <- W/2
• timeouts due to lack of ACKs -> window reduced to one, W <- 1

[Figure: sender and receiver, timeout (TO)]

Generic TCP Behavior
• window algorithm (window W)
• increase window by one per RTT (or one over window per ACK, W <- W + 1/W)
• halve window on detection of loss, W <- W/2
• timeouts due to lack of ACKs, W <- 1
• successive timeout intervals grow exponentially long
• slow start mechanism
• provides fair share, full use of bandwidth
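The window rules listed above can be written as a handful of Python functions. This is only a sketch: the function names, the floor of one packet after halving, and the cap on the backoff exponent are assumptions added for illustration, and slow start is left out.

```python
def on_ack(w):
    """Congestion avoidance: W <- W + 1/W per ACK (roughly +1 per RTT)."""
    return w + 1.0 / w


def on_triple_duplicate(w):
    """Loss detected by triple duplicate ACK: W <- W/2 (assumed floor of one packet)."""
    return max(w / 2.0, 1.0)


def on_timeout(_w):
    """Timeout: window reduced to one, W <- 1."""
    return 1.0


def next_timeout(t0, consecutive_timeouts, max_backoff=6):
    """Timeout interval doubles per consecutive timeout (assumed cap at 2**max_backoff)."""
    return t0 * 2 ** min(consecutive_timeouts, max_backoff)


# One round trip of ACKs for a window of 10 packets grows W by a little under 1.
w = 10.0
for _ in range(10):
    w = on_ack(w)
print(w)
```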

Early Models
• B(p, R): TCP session throughput
  – p: loss probability
  – R: avg round trip time
• equilibrium analysis of AIMD
  $B(p, R) \propto \frac{1}{R} \left(\frac{1}{p}\right)^{1/2}$
• equilibrium analysis incl. timeouts (PFTK 98)
  $B(p, R) \approx \left[ R \left(\frac{4p}{3}\right)^{1/2} + T_0 \cdot 3 \left(\frac{3p}{4}\right)^{1/2} p \,(1 + 32 p^2) \right]^{-1}$
  – $T_0$: timeout length
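Both throughput formulas are easy to evaluate numerically. The sketch below follows the expressions as they appear on the slide; the function names and sample values are illustrative assumptions, and the square-root law is computed with its constant of proportionality set to 1. The published PFTK formula additionally caps the timeout factor at 1 with a min, which the slide's form omits.

```python
import math


def sqrt_model(p, rtt):
    """Square-root law B ~ (1/R) * (1/p)^(1/2), constant of proportionality set to 1."""
    return math.sqrt(1.0 / p) / rtt


def pftk_model(p, rtt, t0):
    """Throughput with timeouts, in the form quoted on the slide (packets per second)."""
    td_term = rtt * math.sqrt(4.0 * p / 3.0)
    to_term = t0 * 3.0 * math.sqrt(3.0 * p / 4.0) * p * (1.0 + 32.0 * p ** 2)
    return 1.0 / (td_term + to_term)


# Throughput falls as the loss probability grows (R = 100 ms, T0 = 1 s assumed).
for p in (0.001, 0.01, 0.1):
    print(p, sqrt_model(p, 0.1), pftk_model(p, 0.1, 1.0))
```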

Validation
Experiments:
  – 38 traces from 18 hosts
  – unidirectional bulk transfers
  – 100 sec measurements
Observations:
  – many timeout loss events
Conclusions:
  – good validation
  – other studies support model
  – insensitive to TCP version
[Figures: two plots of packets/s comparing measured throughput, the PFTK model, and the $p^{1/2}$ model]

Lessons
• TCP exhibits well defined bandwidth curve
  – decreasing function of R and p
• timeouts important
• little difference between TCP versions
  – AIMD, timeouts
  – Vegas an exception

Bottleneck invariance principle
• bottleneck router
  – loss, high load, utilization ≈ 1
• bottleneck invariance principle (BIP)
  $\sum_i B_i(R_i, p) = C$
  – C: router bandwidth
  – $B_i$: throughput of flow i

Applications of BIP
• provides simple checks of protocol design
  – active queue management, RED
  – new improved congestion control algorithms
• accurate models of networks supporting infinite/finite duration TCP flows
  – throughput, loss rate, avg. queue length, …

Active queue management
• drop tail: drop pkt when buffer fills
• active queue management (AQM)
  – proactively drop/mark packets before buffer overflow
  – example: drop pkt with probability p(x), where x is the avg. queue length

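RED (Random Early Detection)
RED: marking/dropping based on average queue length x(t)
• x(t): smoothed, time-averaged q(t)
  $x(t_{i+1}) = \alpha\, q(t_i + \delta) + (1 - \alpha)\, x(t_i)$
  – α: averaging parameter, δ: sampling interval
[Figure: marking prob. p vs. avg queue length x; marks at $t_{min}$, $t_{max}$, $2 t_{max}$ on the x axis and at $p_{max}$, 1 on the p axis]
[Figure: q(t) and x(t) vs. t]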
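The two ingredients of RED, the smoothed queue average and the marking profile, can be sketched as follows. The profile assumed here is the classic one (zero below the lower threshold, a linear ramp up to pmax at the upper threshold, then a jump to 1); function names and numbers are hypothetical.

```python
def ewma(x_prev, q_sample, alpha):
    """Smoothed average: x_new = alpha * q + (1 - alpha) * x_prev."""
    return alpha * q_sample + (1.0 - alpha) * x_prev


def red_mark_prob(x, tmin, tmax, pmax):
    """Classic RED profile: 0 below tmin, linear ramp to pmax at tmax, then 1."""
    if x < tmin:
        return 0.0
    if x < tmax:
        return pmax * (x - tmin) / (tmax - tmin)
    return 1.0


# The average x lags behind a sudden jump of the instantaneous queue to 100 packets.
x = 0.0
for _ in range(50):
    x = ewma(x, 100.0, alpha=0.02)
print(x, red_mark_prob(x, tmin=20.0, tmax=60.0, pmax=0.1))
```

The lag between q(t) and x(t) visible in this example is the smoothing effect the slide's second figure refers to.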

RED discard function
• RED queue
• N identical TCP sources: $B(R, p) = C/N$
• p increases with N
• once $p > p_{max}$, queue oscillates around $t_{max}$: RED unstable! (Firoiu, Borden, 00)
[Figure: RED discard function with operating points for $N_1 < N_2 < N_3 < N_4$; marks at $p_{max}$ and $t_{max}$]
[Figure: queue length for $N_4$; marks at $t_{max}$ and $p_{max}$]

Improved RED
• discontinuity removed in gentle_ variant
[Figure: marking prob. p vs. avg queue length x; marks at $t_{min}$, $t_{max}$, $2 t_{max}$ on the x axis and at $p_{max}$, 1 on the p axis]
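A sketch of the gentle profile, assuming the usual form in which the marking probability climbs linearly from pmax at tmax to 1 at 2·tmax instead of jumping to 1. The function name and the sample thresholds are illustrative.

```python
def gentle_red_mark_prob(x, tmin, tmax, pmax):
    """Gentle RED: ramp 0 -> pmax on [tmin, tmax), then pmax -> 1 on [tmax, 2*tmax)."""
    if x < tmin:
        return 0.0
    if x < tmax:
        return pmax * (x - tmin) / (tmax - tmin)
    if x < 2.0 * tmax:
        return pmax + (1.0 - pmax) * (x - tmax) / tmax
    return 1.0


# The profile is continuous at tmax: values just below and at tmax nearly coincide.
print(gentle_red_mark_prob(59.999, 20.0, 60.0, 0.1),
      gentle_red_mark_prob(60.0, 20.0, 60.0, 0.1))
```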

Stationary Behavior: single bottleneck
• N infinite-source TCP sessions
• one bottleneck link
  – AQM; avg. queue length q, discard prob. p(q)
  – bandwidth C
• round trip time: $R(q) = R_0 + q/C$
• $B(p(q), R(q)) = C/N$
[Figure: sessions 1, …, N sharing one bottleneck link]

Single bottleneck
• solve a fixed point problem for q
• unique solution provided B is monotonic and continuous
• resulting q can be used to obtain $R_i$ and p
• preliminary evaluation results good
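Because B decreases monotonically in q, the fixed point can be found by bisection. The sketch below is one way to do it, not the authors' solver: it assumes the square-root throughput model with the commonly used constant sqrt(3/2), a linear RED ramp for p(q), and made-up parameter values.

```python
import math


def red_prob(q, tmin, tmax, pmax):
    """Linear RED ramp between tmin and tmax (assumed profile)."""
    return pmax * (q - tmin) / (tmax - tmin)


def throughput(p, rtt):
    """Square-root TCP model; the constant sqrt(3/2) is an assumption of this sketch."""
    return math.sqrt(1.5 / p) / rtt


def solve_queue(n, c, r0, tmin, tmax, pmax, tol=1e-9):
    """Bisection for q in (tmin, tmax) such that B(p(q), R0 + q/C) = C/N."""
    def excess(q):
        return throughput(red_prob(q, tmin, tmax, pmax), r0 + q / c) - c / n

    lo, hi = tmin + 1e-9, tmax
    if excess(lo) < 0 or excess(hi) > 0:
        raise ValueError("no fixed point on the RED ramp for these parameters")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:   # flows still send too fast: queue must be longer
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical setting: 50 flows, C = 1250 packets/s, R0 = 100 ms.
q = solve_queue(n=50, c=1250.0, r0=0.1, tmin=50.0, tmax=150.0, pmax=0.1)
p = red_prob(q, 50.0, 150.0, 0.1)
print(q, p, 0.1 + q / 1250.0)  # queue length, loss probability, round trip time
```

The `ValueError` branch corresponds to the case where the required loss probability exceeds pmax, so no operating point exists on the ramp.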

Results: single bottleneck
• N TCP flows
• two-way prop. delay 18 + 2i ms, i = 1, …, N
• link bandwidth N
• RED: scaled by N
• similar results for short-lived flows
[Figure panels: PFTK Model and $p^{1/2}$ Model; legend labels: RT Model, $p^{1/2}$ Model, Simulation]

Network Setting
• N TCP flows
  – throughputs $\{B_i\}$
• V congested RED routers
  – capacities $\{C_v\}$
  – avg. queue lengths $x = \{x_v\}$
  – discard prob. $p = \{p_v(x_v)\}$
• TCP model: $B_i(R, p)$
• congested router model: $\sum_i B_i(x) = C_v$, v = 1, …, V
• V equations, V unknowns
[Figure: TCP flow $B_i$ passing through AQM routers labeled $C_v$, $p_v(x_v)$]

Results: Tandem Networks
• 8 networks, 5-10 routers
• 2-way propagation delay 20-120 ms
• bandwidth 2-6 Mbps
• error
  – throughput < 10%
  – loss rate < 10%
  – avg. queue length < 15%
• similar results for cyclic networks
[Figure: estimated vs. measured end-to-end loss; legend labels: RT Model, $p^{1/2}$ Model]

Transient Analysis: single router
• one congested router
  – capacity C (packets/sec)
  – queue length q(t)
  – discard prob. p(t)
• N TCP flows
  – window sizes $W_i(t)$
  – round trip time $R_i(t) = A_i + q(t)/C$
  – throughputs $B_i(t) = W_i(t)/R_i(t)$

System of Stochastic Differential Equations
Assumption: Poisson loss process $\{N_i(t)\}$
Window size:
  $dW_i = \frac{dt}{R_i(q(t))} - \frac{W_i}{2}\, dN_i(t - \tau)$
  – first term: additive increase
  – second term: multiplicative decrease
• $\{N_i(t)\}$: time varying, rate $W_i(t)\, p(x(t)) / R_i(q(t))$
• τ: feedback delay (round trip time)
• timeouts ignored

System of SDEs
Window size:
  $dW_i = \frac{dt}{R_i(q(t))} - \frac{W_i}{2}\, dN_i(t - \tau)$
Queue length:
  $dq = -\mathbf{1}\{q(t) > 0\}\, C\, dt + \sum_i \frac{W_i(t)}{R_i(q(t))}\, dt$
  – first term: outgoing traffic
  – second term: incoming traffic

System of Differential Equations
Window size:
  $\frac{dEW_i}{dt} = \frac{1}{R_i(Eq(t))} - \frac{EW_i(t)\, EW_i(t - \tau)}{2\, R_i(Eq(t - \tau))}\, Ep(t - \tau)$
Queue length:
  $\frac{dEq}{dt} = -\mathbf{1}\{Eq(t) > 0\}\, C + \sum_i \frac{EW_i(t)}{R_i(Eq(t))}$
Conjecture: exact as no. flows -> ∞

System of Differential Equations (cont.)
Estimated average queue length:
  $\frac{dEx}{dt} = \frac{\ln(1 - \alpha)}{\delta}\, Ex(t) - \frac{\ln(1 - \alpha)}{\delta}\, Eq(t)$
  – α: averaging parameter of RED
  – δ: sampling interval, ~ 1/C
Loss probability:
  $\frac{dEp}{dt} = \frac{dp}{dx}\, \frac{dEx}{dt}$
  – dp/dx is obtained from the marking profile p(x)

Result: N + 2 coupled equations
  $\frac{dEW_i}{dt} = f_1(Ep, R_i, EW_i)$, i = 1, …, N
  $\frac{dEq}{dt} = f_2(EW_i)$
  $\frac{dEp}{dt} = f_3(Eq)$
Numerical solution using MATLAB
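The slides report a MATLAB solution; the following Python sketch shows what a simple forward-Euler integration of the coupled equations could look like. It makes several assumptions: all N flows are identical (so one window equation suffices), the feedback delay τ is taken as the current round trip time, the RED profile is the classic ramp, p is evaluated directly from x rather than by integrating dp/dt, and all parameter values are invented for illustration.

```python
import math


def red_prob(x, tmin, tmax, pmax):
    """Assumed classic RED profile: 0 below tmin, linear ramp to pmax, then 1."""
    if x < tmin:
        return 0.0
    if x < tmax:
        return pmax * (x - tmin) / (tmax - tmin)
    return 1.0


def simulate_fluid(n, c, a_prop, tmin, tmax, pmax, alpha, t_end, dt=1e-3):
    """Forward-Euler integration of the mean-value equations for n identical flows.

    State: w = EW (window), q = Eq (queue), x = Ex (RED average).
    Returns a list of (t, w, q, x) samples, one per step.
    """
    delta = 1.0 / c                        # sampling interval ~ 1/C
    k = -math.log(1.0 - alpha) / delta     # dx/dt = -k * (x - q), with k > 0
    w, q, x = 1.0, 0.0, 0.0
    hist = [(0.0, w, q, x)]
    for step in range(1, int(round(t_end / dt)) + 1):
        rtt = a_prop + q / c
        # Delayed state at t - tau, with tau taken as the current round trip time.
        lag = int(round(rtt / dt))
        _, w_d, q_d, x_d = hist[max(0, len(hist) - 1 - lag)]
        p_d = red_prob(x_d, tmin, tmax, pmax)
        rtt_d = a_prop + q_d / c
        dw = 1.0 / rtt - w * w_d * p_d / (2.0 * rtt_d)
        dq = n * w / rtt - c               # clamped at zero below (the 1{q > 0} term)
        dx = -k * (x - q)
        w = max(w + dw * dt, 1.0)          # assumed floor of one packet
        q = max(q + dq * dt, 0.0)
        x = x + dx * dt
        hist.append((step * dt, w, q, x))
    return hist


trace = simulate_fluid(n=50, c=1250.0, a_prop=0.1, tmin=50.0, tmax=150.0,
                       pmax=0.1, alpha=0.002, t_end=5.0)
t, w, q, x = trace[-1]
print(f"t={t:.1f}s  EW={w:.2f}  Eq={q:.1f}  Ex={x:.1f}")
```

Forward Euler is chosen only for brevity; the step `dt` must stay small relative to both the round trip time and 1/k for the integration to remain stable.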

Extension to Networked case: V congested AQM routers
• queuing delay: aggregate delay $q(t) = \sum_v q_v(t)$
• loss probability: cumulative loss probability $p(t) = 1 - \prod_v (1 - p_v(t))$
Other extensions to model
• timeouts: leveraged work done in [PFTK Sigcomm 98] to model timeouts
• flow aggregation: represent flows sharing same route by single equation
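The two path-level quantities combine per-router values in a simple way, sketched below. The function names and the sample per-router values are hypothetical.

```python
def path_queuing_delay(router_delays):
    """Aggregate delay: sum of the per-router queuing delays along the path."""
    return sum(router_delays)


def path_loss_prob(router_loss_probs):
    """Cumulative loss: 1 minus the probability of surviving every router."""
    survive = 1.0
    for p_v in router_loss_probs:
        survive *= 1.0 - p_v
    return 1.0 - survive


# A flow crossing two congested routers with 10% and 20% loss sees 28% loss overall.
print(path_loss_prob([0.1, 0.2]), path_queuing_delay([0.01, 0.02]))
```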

Validation scenario
• two RED routers
• one-way ftp flows (traffic sources)
• comparison to ns
• transient queuing performance obtained
Topology
• 5 sets of flows
• 2 RED routers
• set 2 flows through both routers
[Figure: topology showing the flow sets and the RED routers]

Performance of DE method
• link rates 5 Mb/s
• load variation at t=75 and t=150 seconds
• 200 flows simulated
• drops to 120 between t = 75, 120
• fluid model captures transient performance
[Figure: instantaneous queue length vs. time; fluid model compared with ns simulation]

Observations on RED
• “Tuning” of RED is difficult: sensitive to packet sizes, load levels, round trip time, etc.
• oscillations due to
  – exponential smoothing of queue length
  – use of variable sampling interval δ
  – feedback delay
  – queue fill delay
Further Work
• Detailed Control Theoretic Analysis of TCP/RED available at http://gaia.cs.umass.edu/papers.html

Improving congestion control
Working with linearized model
• developed rules for setting RED parameters
• compared Proportional Integral (PI) controller with properly adjusted RED
• ns simulation with time varying http, ftp flows
• PI controller: faster response, decouples queue size and load level
[Figure: queue length vs. time; PI controller compared with RED controller]

Conclusions
• fixed point approach promising for stationary behavior of TCP
• DE approach promising for more detailed transient behavior
• computation cost of methods a fraction of the discrete event simulation cost
• formal representation and analysis yields better understanding of RED/AQM