TCP, TCP Congestion Control and Common AQM Schemes
- Slides: 60
TCP, TCP Congestion Control and Common AQM Schemes: Quick Revision
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute
shivkuma@ecse.rpi.edu, http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part upon slides of Prof. Raj Jain (OSU), Srini Seshan (CMU), J. Kurose (U Mass), I. Stoica (UCB)
Overview
- Quick introduction to TCP services
- TCP reliability model and mechanisms
- TCP congestion control model and mechanisms
- TCP versions: Reno, NewReno, SACK, Vegas, etc.
- AQM schemes: common goals, RED, …
Multiplexing / demultiplexing
[Figure: segment format (32 bits wide): source port #, dest port #, other header fields, application-layer data; processes P1 to P4 sit above the application/transport/network stacks of the hosts, with the receiver in the middle]
Checksum
- Goal: detect "errors" (e.g., flipped bits) in the transmitted segment (i.e., payload + header). Note: IP only has a header checksum.
- Sender:
  - Treat segment contents as a sequence of 16-bit integers
  - Checksum: addition (1's complement sum) of segment contents
  - Sender puts the checksum value into the UDP checksum field
- Receiver:
  - Compute checksum of received segment
  - Check if computed checksum equals checksum field value:
    - NO: error detected
    - YES: no error detected. But maybe errors nonetheless?
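The sender and receiver steps above can be sketched in a few lines of Python. This is a minimal illustration, not a protocol implementation: the function names are hypothetical, the input is assumed to be just the bytes being protected, and, as in the real Internet checksum, the value stored in the field is taken to be the bitwise complement of the 1's complement sum.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over `data` (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # next 16-bit integer
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF                          # complement of the sum


def verify(data: bytes, checksum_field: int) -> bool:
    """Receiver side: recompute the checksum and compare with the field."""
    return internet_checksum(data) == checksum_field
```

A single flipped bit is always caught, but swapping two 16-bit words leaves the sum unchanged, which is one concrete case of "no error detected, but maybe errors nonetheless".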
Introduction to TCP
- Communication abstraction: close equivalent to the UNIX file-system interface => programmer productivity!
  - Reliable
  - Ordered
  - Point-to-point
  - Byte-stream
  - Full duplex
  - Flow and congestion controlled
TCP Header
- Source port | Destination port
- Sequence number
- Acknowledgement
- HdrLen | 0 | Flags | Advertised window
- Checksum | Urgent pointer
- Options (variable)
- Data
- Flags: SYN, FIN, RESET, PUSH, URG, ACK
Principles of Reliable Data Transfer
- Characteristics of the unreliable channel determine the complexity of the reliable data transfer protocol (rdt)
Temporal Redundancy Model
- Packets: sequence numbers, CRC or checksum
- Timeout
- Status reports: ACKs, NAKs, SACKs, bitmaps
- Retransmissions: packets, FEC information
Types of errors and effects
- Forward channel bit-errors (garbled packets)
- Forward channel packet-errors (lost packets)
- Reverse channel bit-errors (garbled status reports)
- Reverse channel packet-errors (lost status reports)
- Protocol-induced effects:
  - Duplicate packets
  - Duplicate status reports
  - Out-of-order packets
  - Out-of-order status reports
  - Out-of-range packets/status reports (in window-based transmissions)
Reliability Mechanisms…
- Mechanisms:
  - Checksum: detects corruption in pkts & acks
  - ACK: "packet correctly received"
  - Duplicate ACK: "packet incorrectly received"
  - Sequence number: identifies packet or ack
  - 1-bit sequence number used both in forward & reverse channel
  - Timeout only at sender
- Provides reliable transmission over:
  - An error-free channel
  - A forward & reverse channel with bit-errors
    - Detects duplicates of packets/acks
    - NAKs eliminated
  - A forward & reverse channel w/ packet-errors (loss)
Example: Three-Way Handshake
- TCP connection establishment: a 3-way handshake is necessary and sufficient for unambiguous setup/teardown even under conditions of loss, duplication, and delay
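Stop-and-Wait (Handshake) Efficiency
- $U = \dfrac{t_{frame}}{2\,t_{prop} + t_{frame}} = \dfrac{1}{2\alpha + 1}$
- $\alpha = \dfrac{t_{prop}}{t_{frame}} = \dfrac{\text{Distance} / \text{Speed of signal}}{\text{Frame size} / \text{Bit rate}} = \dfrac{\text{Distance} \times \text{Bit rate}}{\text{Frame size} \times \text{Speed of signal}}$
- Assumes no loss or bit-errors!
- Speed of signal: light in vacuum = 300 m/µs, light in fiber = 200 m/µs, electricity = 250 m/µs
[Figure: timing diagram of Data and Ack between sender and receiver, marking tprop and tframe]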
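"Sliding Window" Protocols
- $U = \dfrac{N\,t_{frame}}{2\,t_{prop} + t_{frame}} = \dfrac{N}{2\alpha + 1}$, where $\alpha = t_{prop}/t_{frame}$
- $U = 1$ if $N > 2\alpha + 1$
- Note: no loss or bit-errors!
[Figure: timing diagram of N Data frames and the returning Ack, marking tprop and tframe]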
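As a quick numeric check of the utilization formulas, here is a small sketch. The function names and the example link (2000 km of fiber, 10,000-bit frames, 1 Mbit/s) are illustrative assumptions, not values from the slides; stop-and-wait is simply the case N = 1.

```python
def alpha(distance_m, signal_speed_m_per_s, frame_bits, bit_rate_bps):
    """Ratio of propagation time to frame transmission time."""
    t_prop = distance_m / signal_speed_m_per_s
    t_frame = frame_bits / bit_rate_bps
    return t_prop / t_frame


def utilization(window_frames, a):
    """Link utilization with a window of N frames and no loss; N = 1 is stop-and-wait."""
    return min(1.0, window_frames / (2.0 * a + 1.0))


# Hypothetical link: 2000 km of fiber (200 m/us = 2e8 m/s), 10,000-bit frames, 1 Mbit/s.
a = alpha(2_000_000, 2e8, 10_000, 1e6)
stop_and_wait = utilization(1, a)
window_of_three = utilization(3, a)
```

With these numbers the propagation and frame times are both 10 ms, so stop-and-wait keeps the link busy only a third of the time, while a window of three frames is enough to fill the pipe.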
Sliding Window: Details
- Sender window (marked by "Max ACK received" and "Next seqnum"): Sent & Acked | Sent, Not Acked | OK to Send | Not Usable
- Receiver window (marked by "Next expected" and "Max acceptable"): Received & Acked | Acceptable Packet | Not Usable
Window Flow Control: Header
[Figure: TCP headers of the packet sent and the packet received (Source Port, Dest. Port, Sequence Number, Acknowledgment, HL/Flags, Window, D. Checksum, Urgent Pointer, Options) mapped onto the byte stream written by the application: acknowledged | sent | to be sent | outside window]
Go-Back-N Retransmission
- If you hear of packet loss, retransmit the whole window!
- k-bit seq # in pkt header
  - Allows up to $N = 2^k - 1$ packets in flight, unacked
- ACK(n): ACKs all pkts up to and including seq # n ("cumulative ACK")
  - Sender may receive duplicate ACKs
  - Robust to ack losses on the reverse channel
  - Can pinpoint the first packet lost, but cannot identify blocks of lost packets in the window
- One timer for the oldest in-flight pkt
Selective Repeat: Sender, Receiver Windows
Timeout and RTT Estimation
- Problem:
  - Unlike a physical link, the RTT of a logical link can vary quite substantially
  - How long should the timeout be?
    - Too long => underutilization
    - Too short => wasteful retransmissions
- Solution: adaptive timeout, based on a good estimate of the maximum current value of RTT
How to estimate max RTT?
- RTT = prop + queuing delay
  - Queuing delay is highly variable
  - So, different samples of RTT will give different random values of queuing delay
- Chebyshev's Theorem:
  - MaxRTT = AvgRTT + k*Deviation
  - Error probability is less than 1/k^2
  - Result true for ANY distribution of samples
- In TCP:
  - Timeout = AverageRTT + 4*Deviation
  - Rounded up to timer granularity (50-500 ms)
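A minimal sketch of such an adaptive timeout is shown below. The slide only gives Timeout = AverageRTT + 4*Deviation; the running-average gains (1/8 for the mean, 1/4 for the deviation), the initial deviation of half the first sample, and the class and parameter names are assumptions borrowed from common TCP practice rather than from the slides.

```python
import math


class RttEstimator:
    """Tracks a smoothed RTT and mean deviation; all times are in seconds."""

    def __init__(self, first_sample, gain_avg=0.125, gain_dev=0.25, k=4, granularity=0.5):
        self.avg = first_sample
        self.dev = first_sample / 2.0   # assumed starting deviation
        self.gain_avg = gain_avg        # assumed gain for the average
        self.gain_dev = gain_dev        # assumed gain for the deviation
        self.k = k
        self.granularity = granularity  # timer tick; the slide quotes 50-500 ms

    def update(self, sample):
        """Fold one new RTT sample into the running average and deviation."""
        err = sample - self.avg
        self.dev = (1.0 - self.gain_dev) * self.dev + self.gain_dev * abs(err)
        self.avg = self.avg + self.gain_avg * err

    def timeout(self):
        """Average + k * deviation, rounded up to the timer granularity."""
        raw = self.avg + self.k * self.dev
        return math.ceil(raw / self.granularity) * self.granularity
```

For example, starting from a 100 ms sample with a 500 ms tick the timeout is one tick; a single 1 s outlier pushes the deviation up enough to raise it to three ticks.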
Recap: Stability of a Multiplexed System
- Average Input Rate > Average Output Rate => system is unstable!
- How to ensure stability?
  1. Reserve enough capacity so that demand is less than reserved capacity
  2. Dynamically detect overload and adapt either the demand or the capacity to resolve overload
Congestion Problem in Packet Switching
[Figure: hosts A and B on a 10 Mbps Ethernet are statistically multiplexed onto a 1.5 Mbps link, with a queue of packets waiting for the output link; C, D and E sit beyond it, with a 45 Mbps link]
- If capacity is sized to be less than peak demand (statistical muxing!), need to either reserve resources or dynamically detect/adapt to overload for stability
Congestion: A Close-up View
- Knee: point after which
  - throughput increases very slowly
  - delay increases fast
- Cliff: point after which
  - throughput starts to decrease very fast to zero (congestion collapse)
  - delay approaches infinity
- Note (in an M/M/1 queue): delay = 1/(1 − utilization)
[Figure: throughput vs. load and delay vs. load, marking the knee, packet loss, the cliff, and congestion collapse]
Congestion Control vs. Congestion Avoidance
- Congestion control goal
  - stay left of cliff
- Congestion avoidance goal
  - stay left of knee
- Right of cliff:
  - Congestion collapse
  - Increase in network load results in decrease of useful work done
[Figure: throughput vs. load, marking the knee, the cliff, and congestion collapse]
End-to-End Congestion Control
- End-to-end model:
  - End-systems estimate the timing and degree of congestion and reduce their demand appropriately
  - Intermediate nodes are relied upon to send timely and appropriate penalty indications (e.g., packet loss rate) during congestion
  - Enhanced routers could send more accurate congestion signals (e.g., early congestion notifications, i.e., ECNs)
  - Key: trust and complexity reside at end-systems
- Issue: What about misbehaving flows?
Packet Conservation: Self-clocking
[Figure: sender and receiver with packet spacings Pb, Pr and ack spacings Ar, Ab, As]
- Implications of ack-clocking:
  - More batching of acks => bursty traffic
  - Less batching leads to a large fraction of Internet traffic being just acks (overhead)
Additive Increase/Multiplicative Decrease (AIMD) Policy
[Figure: User 2's allocation x2 vs. User 1's allocation x1, with the fairness line, the efficiency line, and the trajectory x0, x1, x2]
- Assumption: decrease policy must (at minimum) reverse the load increase over and above the efficiency line
- Implication: decrease factor should be conservatively set to account for any congestion detection lags etc.
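The convergence toward the fairness line can be illustrated with a toy two-user simulation. This is a sketch under simplifying assumptions that are not in the slides: both users see the same binary congestion signal at the same time, the additive step is 1, the decrease factor is 0.5, and the starting allocations and capacity are made up.

```python
def aimd(x1, x2, capacity, add=1.0, mult=0.5, rounds=200):
    """Two users sharing one link; returns their allocations after `rounds` steps."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Above the efficiency line: both users back off multiplicatively.
            x1, x2 = x1 * mult, x2 * mult
        else:
            # At or below the efficiency line: both users probe additively.
            x1, x2 = x1 + add, x2 + add
    return x1, x2


# Very unequal start: user 1 has 1 unit, user 2 has 80 units, capacity is 100.
final_x1, final_x2 = aimd(1.0, 80.0, capacity=100.0)
```

Additive increase leaves the difference between the two users unchanged, while every multiplicative decrease halves it, so the allocations drift toward equal shares while the total keeps oscillating around the efficiency line.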
TCP Congestion Control
- Maintains three variables:
  - cwnd: congestion window
  - rcv_win: receiver advertised window
  - ssthresh: threshold size (used to update cwnd)
    - Rough estimate of knee point…
- For sending use: win = min(rcv_win, cwnd)
TCP: Slow Start
- Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:
  - Set cwnd = 1
  - Each time a segment is acknowledged, increment cwnd by one (cwnd++)
- Does Slow Start increment slowly? Not really. In fact, the increase of cwnd is exponential!
  - Window increases to W in RTT * log2(W)
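The RTT * log2(W) claim can be checked by counting rounds. The sketch below assumes an idealized connection (no loss, every segment acknowledged individually, window counted in whole segments); the function name is illustrative.

```python
def slow_start_rounds(target_window):
    """Number of round trips slow start needs to grow cwnd from 1 to target_window."""
    cwnd, rounds = 1, 0
    while cwnd < target_window:
        acks_this_round = cwnd      # one ACK comes back per segment sent
        cwnd += acks_this_round     # cwnd++ per ACK, so cwnd doubles each round
        rounds += 1
    return rounds
```

Reaching a window of 8 segments takes 3 round trips, and a window of 1000 segments takes only 10.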
Slow Start Example
- The congestion window size grows very rapidly
- TCP slows down the increase of cwnd when cwnd >= ssthresh
[Figure: cwnd = 1: segment 1, ACK for segment 1; cwnd = 2: segments 2 and 3, ACK for segments 2 + 3; cwnd = 4: segments 4 to 7, ACK for segments 4+5+6+7; cwnd = 8]
Slow Start Sequence Plot
[Figure: sequence number vs. time; the window doubles every round]
Congestion Avoidance
- Goal: maintain the operating point to the left of the cliff
- How?
  - additive increase: starting from the rough estimate (ssthresh), slowly increase cwnd to probe for additional available bandwidth
  - multiplicative decrease: cut the congestion window size aggressively if a loss is detected
- If cwnd >= ssthresh, then each time a segment is acknowledged increment cwnd by 1/cwnd, i.e., cwnd += 1/cwnd
Congestion Avoidance Sequence Plot
[Figure: sequence number vs. time; the window grows by 1 every round]
Slow Start/Congestion Avoidance Example
- Assume that ssthresh = 8
[Figure: cwnd (in segments) vs. round-trip times, with the ssthresh level marked]
Putting Everything Together: TCP Pseudo-code
    Initially:
        cwnd = 1;
        ssthresh = infinite;
    New ack received:
        if (cwnd < ssthresh)
            /* Slow Start */
            cwnd = cwnd + 1;
        else
            /* Congestion Avoidance */
            cwnd = cwnd + 1/cwnd;
    Timeout: (loss detection)
        /* Multiplicative decrease */
        ssthresh = win/2;
        cwnd = 1;
    while (next < unack + win)
        transmit next packet;
    where win = min(cwnd, flow_win);
[Figure: sequence-number line marking unack, next, and the window win]
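The pseudo-code translates almost line for line into Python. The sketch below keeps only the window arithmetic (no packets are actually sent); the function names and the `simulate_round` helper, which applies one ACK per segment of the current window to model a loss-free round trip, are illustrative additions.

```python
import math


def on_ack(cwnd, ssthresh):
    """New ack received: slow start below ssthresh, congestion avoidance otherwise."""
    if cwnd < ssthresh:
        return cwnd + 1.0            # slow start
    return cwnd + 1.0 / cwnd         # congestion avoidance


def on_timeout(cwnd, flow_win):
    """Timeout: multiplicative decrease. Returns the new (cwnd, ssthresh)."""
    win = min(cwnd, flow_win)
    return 1.0, win / 2.0


def simulate_round(cwnd, ssthresh):
    """One loss-free round trip: every segment in the window is acknowledged."""
    for _ in range(int(cwnd)):
        cwnd = on_ack(cwnd, ssthresh)
    return cwnd


cwnd, ssthresh = 1.0, math.inf        # "Initially"
history = [cwnd]
for _ in range(3):
    cwnd = simulate_round(cwnd, ssthresh)
    history.append(cwnd)              # history is now 1, 2, 4, 8
```

Starting from cwnd = 1 with an infinite ssthresh, three rounds give windows of 1, 2, 4, 8; once cwnd has reached ssthresh, a full round of ACKs adds a little under one segment.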
The Big Picture
[Figure: cwnd vs. time, showing slow start, congestion avoidance, and a timeout]
Packet Loss Detection: Timeout Avoidance
- Basic approach: wait for the Retransmission Time Out (RTO)
- What's the problem with this?
  - RTO is a performance killer
- In the BSD TCP implementation, RTO is usually more than 1 second
  - the granularity of the RTT estimate is 500 ms
  - the retransmission timeout is at least two times the RTT
- Solution: don't wait for the RTO to expire
  - Use an alternate mechanism for loss detection
  - Fall back to RTO only if these alternate mechanisms fail
Fast Retransmit
- Resend a segment after 3 duplicate ACKs
- Recall: a duplicate ACK means that an out-of-sequence segment was received
- Notes:
  - duplicate ACKs can also be due to packet reordering!
  - if the window is small, you don't get duplicate ACKs!
[Figure: timeline of segments 1 to 7 and their ACKs as cwnd grows 1, 2, 4; ACK 4 is repeated until 3 duplicate ACKs arrive]
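The duplicate-ACK rule amounts to a small counter at the sender. In the sketch below the function name and the list-of-ACK-numbers input are illustrative assumptions; the function reports which ACK value was seen three extra times, and which segment that maps to depends on the ACK numbering convention in use.

```python
def fast_retransmit_triggers(acks, dup_threshold=3):
    """Return the ACK numbers whose dup_threshold-th duplicate triggers a fast retransmit."""
    triggers = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:   # fire once, on the third duplicate
                triggers.append(ack)
        else:
            last_ack, dup_count = ack, 0     # new data acknowledged: reset the counter
    return triggers
```

A stream such as 1, 2, 3, 4, 4, 4, 4 triggers once on ACK 4, whereas a single duplicate caused by mild reordering (1, 2, 2, 3) triggers nothing.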
Fast Recovery (Simplified)
- After a fast retransmit, set ssthresh to cwnd/2 and continue with cwnd = ssthresh
  - i.e., don't reset cwnd to 1
- But when the RTO expires, still do cwnd = 1
- Fast Retransmit and Fast Recovery are implemented by TCP Reno, the most widely used version of TCP today
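The two loss reactions can be put side by side in code. This is a simplified sketch that assumes the commonly described Reno behaviour, in which the window is halved after three duplicate ACKs and reset to one segment after a timeout; it omits the temporary window inflation during recovery, and the two-segment floor on ssthresh and the function names are assumptions.

```python
def on_triple_dupack(cwnd):
    """Fast retransmit + fast recovery (simplified): halve the window, skip slow start."""
    ssthresh = max(cwnd / 2.0, 2.0)   # assumed floor of two segments
    return ssthresh, ssthresh          # new (cwnd, ssthresh)


def on_rto(cwnd):
    """Retransmission timeout: fall back to slow start from one segment."""
    ssthresh = max(cwnd / 2.0, 2.0)
    return 1.0, ssthresh               # new (cwnd, ssthresh)
```

With cwnd = 20, three duplicate ACKs leave the sender at cwnd = 10 in congestion avoidance, while a timeout drops it to cwnd = 1 with ssthresh = 10.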
Fast Retransmit and Fast Recovery
[Figure: cwnd vs. time, showing slow start followed by congestion avoidance]
- Retransmit after 3 duplicated acks
  - prevent expensive timeouts
- No need to slow start again
- At steady state, cwnd oscillates around the optimal window size
Fast Retransmission
[Figure: sequence number vs. time; a lost packet (X) followed by duplicate acks]
Multiple Losses
[Figure: sequence number vs. time; two lost packets (X X), duplicate acks, and a retransmission]
- Now what? (TCP versions)
TCP Versions: Tahoe
- Set window to 1, and do slow start! No timeout…
[Figure: sequence number vs. time; two lost packets (X X)]
TCP Versions: Reno
- Recovering from 1 packet loss is ok, but multiple losses => timeout
[Figure: sequence number vs. time; two lost packets (X X). Now what? Timeout]
TCP Reno (Jacobson 1990)
[Figure: window vs. time; SS, then CA, with fast retransmission/fast recovery]
- SS: Slow Start
- CA: Congestion Avoidance
TCP Versions: NewReno
- Recover multiple losses in successive RTTs using the notion of a "partial ack". No timeout.
[Figure: sequence number vs. time; two lost packets (X X). Now what? Partial ack recovery]
SACK
- Basic problem is that cumulative acks provide only little information
  - Alternative: selective ack for just the packet received
  - What if selective acks are lost? Carry the cumulative ack also!
  - Implementation: bitmask of packets received
- Selective acknowledgement (SACK)
  - Only provided as an optimization for retransmission
  - Fall back to cumulative acks to guarantee correctness and window updates
SACK
[Figure: sequence number vs. time; two lost packets (X X)]
- Now what? Send retransmissions as soon as losses are detected
TCP Congestion Control Summary
- Sliding window limited by receiver window
- Dynamic windows: slow start (exponential rise), congestion avoidance (additive rise), multiplicative decrease
  - Ack clocking
- Adaptive timeout: need mean RTT & deviation
- Timer backoff and Karn's algorithm during retransmission
- Go-back-N or selective retransmission
- Cumulative and selective acknowledgements
- Timeout avoidance: Fast Retransmit
Queuing Disciplines
- Each router must implement some queuing discipline
- Queuing allocates bandwidth and buffer space:
  - Bandwidth: which packet to serve next (scheduling)
  - Buffer space: which packet to drop next (buffer management)
- Queuing also affects latency
[Figure: traffic sources feed traffic classes A, B and C; buffer management decides drops and scheduling picks the next packet]
Typical Internet Queuing
- FIFO + drop-tail
  - Simplest choice
  - Used widely in the Internet
- FIFO (first-in-first-out)
  - Implies single class of traffic
- Drop-tail
  - Arriving packets get dropped when the queue is full, regardless of flow or importance
- Important distinction:
  - FIFO: scheduling discipline
  - Drop-tail: drop (buffer management) policy
FIFO + Drop-tail Problems
- FIFO issues: in a FIFO discipline, the service seen by a flow is convoluted with the arrivals of packets from all other flows!
  - No isolation between flows: full burden on e2e control
  - No policing: send more packets, get more service
- Drop-tail issues:
  - Routers are forced to have large queues to maintain high utilization
  - Larger buffers => larger steady-state queues/delays
  - Synchronization: end hosts react to the same events because packets tend to be lost in bursts
  - Lock-out: a side effect of burstiness and synchronization is that a few flows can monopolize queue space
Queue Management Ideas
- Synchronization, lock-out:
  - Random drop: drop a randomly chosen packet
  - Drop front: drop packet from head of queue
- High steady-state queuing vs. burstiness:
  - Early drop: drop packets before the queue is full
  - Do not drop packets "too early", because the queue may reflect only burstiness and not true overload
- Misbehaving vs. fragile flows:
  - Drop packets in proportion to the queue occupancy of the flow
  - Try to protect fragile flows from packet loss (e.g., color them or classify them on the fly)
- Drop packets vs. mark packets:
  - Dropping packets interacts w/ reliability mechanisms
  - Mark packets: need to trust end-systems to respond!
Packet Drop Dimensions
- Aggregation: per-connection state | single class | class-based queuing
- Drop position: head | tail | random location
- Early drop vs. overflow drop
Random Early Detection (RED)
[Figure: a queue with min thresh and max thresh marked against the average queue length; below it, P(drop) vs. avg queue length, with maxP and 1.0 on the vertical axis and minth and maxth on the horizontal axis]
Random Early Detection (RED)
- Maintain a running average of the queue length
  - Low-pass filtering
- If avg Q < minth, do nothing
  - Low queuing, send packets through
- If avg Q > maxth, drop packet
  - Protection from misbehaving sources
- Else mark (or drop) packet in a manner proportional to queue length & bias to protect against synchronization
  - $P_b = max_p \cdot (avg - min_{th}) / (max_{th} - min_{th})$
  - Further, bias $P_b$ by the history of unmarked packets
  - $P_a = P_b / (1 - count \cdot P_b)$
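The RED rules above can be sketched as two small functions. The names are illustrative, the averaging weight of 0.002 is an assumed typical value that does not appear in the slides, and the guard for count * Pb reaching 1 is an added safety check rather than part of the stated formula.

```python
def update_average(avg, queue_len, weight=0.002):
    """Low-pass filter (running average) of the instantaneous queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def red_drop_probability(avg, min_th, max_th, max_p, count):
    """Probability of marking/dropping the arriving packet.

    `count` is the number of packets accepted unmarked since the last mark.
    """
    if avg < min_th:
        return 0.0                      # low queuing: let the packet through
    if avg > max_th:
        return 1.0                      # persistent overload: always drop
    pb = max_p * (avg - min_th) / (max_th - min_th)
    remaining = 1.0 - count * pb
    if remaining <= 0.0:
        return 1.0                      # guard: the bias has saturated
    return min(1.0, pb / remaining)     # Pa = Pb / (1 - count * Pb)
```

With min_th = 10, max_th = 40 and max_p = 0.1, an average queue of 25 packets gives Pb = 0.05; after 10 unmarked packets the biased probability Pa has doubled to 0.1, which spreads marks out more evenly over time.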
RED Issues
- Issues:
  - Breaks synchronization well
  - Extremely sensitive to parameter settings
  - Wild queue oscillations upon load changes
  - Fails to prevent buffer overflow as #sources increases
  - Does not help fragile flows (e.g., small-window flows or retransmitted packets)
  - Does not adequately isolate cooperative flows from non-cooperative flows
- Isolation:
  - Fair queuing achieves isolation using per-flow state
  - RED penalty box: monitor history for packet drops, identify flows that use disproportionate bandwidth
REM (Athuraliya & Low 2000)
- Main ideas
  - Decouple congestion & performance measure
  - "Price" adjusted to match rate and clear buffer
  - Marking probability exponential in "price"
[Figure: marking probability (up to 1) vs. avg queue, comparing REM and RED]
Comparison of AQM Performance
- REM: queue = 1.5 pkts, utilization = 92%
- DropTail: queue = 94%
- g = 0.05, = 0.4, f = 1.15
- RED: min_th = 10 pkts, max_th = 40 pkts, max_p = 0.1
What is TCP Throughput?
[Figure: window vs. time t; a sawtooth between 2w/3 and 4w/3, each cycle lasting 2w/3]
- Average window: $w = (4w/3 + 2w/3)/2$
- Area under one cycle $= 2w^2/3$, so each cycle delivers $2w^2/3$ packets
- Assume: each cycle delivers $1/p$ packets $= 2w^2/3$
  - Delivers $1/p$ packets followed by a drop
  - => Loss probability $= p/(1+p) \approx p$ if $p$ is small
- Hence $w = \sqrt{3/(2p)}$
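Written out step by step, the sawtooth argument gives the window and, after dividing by the round-trip time, the throughput. The symbol RTT and the assumption that the window grows by one packet per round trip (so that one cycle lasts 2w/3 round trips) are not stated on the slide; treat this as a sketch of the standard argument.

```latex
\begin{align*}
\text{packets per cycle}
  &= \underbrace{w}_{\text{mean window}} \times
     \underbrace{\tfrac{2w}{3}}_{\text{round trips per cycle}}
   = \frac{2w^{2}}{3} \\[4pt]
\frac{1}{p} = \frac{2w^{2}}{3}
  &\;\Longrightarrow\;
  w = \sqrt{\frac{3}{2p}} \approx \frac{1.22}{\sqrt{p}} \\[4pt]
\text{throughput}
  &\approx \frac{w}{\mathrm{RTT}}
   = \frac{1}{\mathrm{RTT}} \sqrt{\frac{3}{2p}}
   \quad \text{packets per second}
\end{align*}
```

For example, a loss probability of 1% gives an average window of roughly 12 packets.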
Law
- Equilibrium window size: $w = a/\sqrt{p}$
- Equilibrium rate: $x = a/(RTT \cdot \sqrt{p})$
- Empirically, the constant $a \sim 1$
- Verified extensively through simulations and on the Internet
- References
  - T. J. Ott, J. H. B. Kemperman and M. Mathis (1996)
  - M. Mathis, J. Semke, J. Mahdavi, T. Ott (1997)
  - T. V. Lakshman and U. Madhow (1997)
  - J. Padhye, V. Firoiu, D. Towsley, J. Kurose (1998)