Slides: 135


Transport Layer

Transport Layer – Topics
- Review: multiplexing, connection-oriented and connectionless transport, services provided by a transport layer
- UDP
- Tools for the transport layer
  - Error detection, ACK/NACK, ARQ
- Approaches to transport
  - Go-Back-N
  - Selective repeat
- TCP
  - Services
  - Connection setup, ACKs and sequence numbers, timeout and triple-duplicate ACK, slow start, congestion avoidance

Transport layer
- Transfers messages between applications in hosts
  - For FTP you exchange files and directory information.
  - For HTTP you exchange requests and replies/files.
  - For SMTP, mail messages are exchanged.
- Services possibly provided
  - Reliability
  - Error detection/correction
  - Flow/congestion control
  - Multiplexing (supporting several messages being transported simultaneously)

Connection oriented / connectionless
- TCP supports the idea of a connection
  - Once listen and connect complete, there is a logical connection between the hosts.
  - The state of the connection can be determined (whether the connection is cut or not).
    - But TCP does not have a heartbeat message.
- UDP is connectionless
  - Packets are just sent. There is no concept (supported by the transport layer) of a connection.
  - The application can make a connection over UDP. In that case, the application in each host must handle the handshaking and monitor the state of the "connection."
- There are several other transport layer protocols besides TCP and UDP, but TCP and UDP are the most popular.

TCP vs UDP
- TCP
  - Connection oriented
    - Connections must be set up
    - The state of the connection can be determined
  - Flow/congestion control
    - Limits congestion in the network and end hosts
    - Controls how fast data can be sent
  - Larger packet header
  - Retransmits lost packets and reports if packets were not successfully transmitted
  - Checksum for error detection
- UDP
  - Connectionless
    - Connections do not need to be set up
    - The state of the connection is unknown
  - No flow/congestion control
    - Could cause excessive congestion and unfair usage
    - Data can be sent exactly when it needs to be
  - Low overhead
  - No feedback provided as to whether packets were successfully transmitted
  - Checksum for error detection

Applications and Transport Protocols
- SMTP/mail: TCP
- Telnet: TCP
- HTTP: TCP
- FTP: TCP
- NFS: UDP or TCP (why UDP, I do not know)
- Multimedia streaming: UDP or TCP
- Voice over IP: UDP
- Routing: UDP, its own protocol, or TCP
- DNS: UDP

Multiplexing with ports
- Transport layer packet headers always contain source and destination ports.
- IP headers have source and destination IP addresses.

[Figure: clients A (IP A) and B (IP B) send segments to server C (IP C), destination port 80. Segments shown: SP 9157, DP 80, S-IP A, D-IP C; SP 9157, DP 80, S-IP B, D-IP C; SP 5775, DP 80, S-IP B, D-IP C. Processes P1 to P6 run on the three hosts.]
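In the figure, two segments carry the same source port (9157) and the same destination (C, port 80). A connection-oriented (TCP) receiver can still keep them apart because it looks up the socket by the full 4-tuple (source IP, source port, destination IP, destination port). Below is a minimal sketch of that lookup. The connection names `conn-1` to `conn-3` are hypothetical labels, not the processes in the figure.

```python
from typing import Dict, List, Tuple

# (source IP, source port, destination IP, destination port)
FourTuple = Tuple[str, int, str, int]


def demultiplex(segments: List[FourTuple], sockets: Dict[FourTuple, str]) -> List[str]:
    """Map each arriving segment to a socket using its full 4-tuple."""
    # Looking up by destination port alone would merge all three connections.
    return [sockets.get(seg, "no socket") for seg in segments]


# Hypothetical sockets on server C, all on destination port 80
sockets = {
    ("A", 9157, "C", 80): "conn-1",
    ("B", 9157, "C", 80): "conn-2",
    ("B", 5775, "C", 80): "conn-3",
}
arrivals = [("B", 9157, "C", 80), ("A", 9157, "C", 80), ("B", 5775, "C", 80)]
print(demultiplex(arrivals, sockets))  # ['conn-2', 'conn-1', 'conn-3']
```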

Chapter 3 outline
- 3.1 Transport-layer services
- 3.2 Multiplexing and demultiplexing
- 3.3 Connectionless transport: UDP
- 3.4 Principles of reliable data transfer
- 3.5 Connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
- 3.6 Principles of congestion control
- 3.7 TCP congestion control

UDP: User Datagram Protocol [RFC 768]
- "no frills," "bare bones" Internet transport protocol
- "best effort" service; UDP segments may be:
  - lost
  - delivered out of order to the app
- connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others

Why is there a UDP?
- no connection establishment (which can add delay)
- simple: no connection state at sender, receiver
- small segment header
- no congestion control: UDP can blast away as fast as desired

UDP: more
- often used for streaming multimedia apps
  - loss tolerant
  - rate sensitive
- other UDP uses
  - DNS
  - SNMP
- reliable transfer over UDP: add reliability at the application layer
  - application-specific error recovery!

UDP segment format (32 bits per row):
- source port # | dest port #
- length | checksum
- application data (message)

The length field gives the length, in bytes, of the UDP segment, including the header.
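The segment format above is four 16-bit fields followed by the data. The sketch below splits a raw UDP segment along those lines using Python's `struct` module, reading the fields in network (big-endian) byte order. The example segment is made up, and its checksum field is left as 0.

```python
import struct


def parse_udp_header(segment: bytes):
    """Return (src_port, dst_port, length, checksum, payload) of a UDP segment."""
    if len(segment) < 8:
        raise ValueError("a UDP header is 8 bytes long")
    # "!HHHH" = four unsigned 16-bit integers in network byte order
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    # The length field counts header + data, so the payload ends at `length`.
    payload = segment[8:length]
    return src_port, dst_port, length, checksum, payload


seg = struct.pack("!HHHH", 5775, 53, 8 + 5, 0) + b"hello"
print(parse_udp_header(seg))  # (5775, 53, 13, 0, b'hello')
```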

UDP checksum
Goal: detect "errors" (e.g., flipped bits) in the transmitted segment.

Sender:
- treat segment contents as a sequence of 16-bit integers
- checksum: 1's complement of the 1's complement sum of the segment contents
- sender puts the checksum value into the UDP checksum field

Receiver:
- compute the checksum of the received segment
- check if the computed checksum equals the checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless? More later ….

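Internet Checksum Example
- Note: when adding numbers, a carryout from the most significant bit needs to be added to the result.
- Example: add two 16-bit integers
  - 1110011001100110 + 1101010101010101 = 1 1011101110111011 (17 bits, with a carryout)
  - wraparound: add the carryout back in, giving the sum 1011101110111100
  - checksum: 1's complement of the sum = 0100010001000011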
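Here is the arithmetic above as a short Python sketch. The wraparound is done after each 16-bit addition, and the checksum is the bit-inverted sum. The last line shows the receiver-side check: adding the checksum to the data words gives all 1s (0xFFFF) when no error is detected. Function names are illustrative only.

```python
def ones_complement_sum16(words):
    """1's complement sum of 16-bit words (carryout wrapped back into bit 0)."""
    total = 0
    for w in words:
        total += w
        # Wraparound: fold any carry out of bit 15 back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(words):
    """Checksum = 1's complement (bit inversion) of the 1's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF


words = [0b1110011001100110, 0b1101010101010101]
s = ones_complement_sum16(words)
c = internet_checksum(words)
print(f"{s:016b}")  # 1011101110111100
print(f"{c:016b}")  # 0100010001000011
# Receiver check: data words plus checksum sum to all 1s if no error is detected.
print(ones_complement_sum16(words + [c]) == 0xFFFF)  # True
```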

Principles of Reliable data transfer

Reliable data transfer: getting started
Send side:
- rdt_send(): called from above (e.g., by the app). Passed data to deliver to the receiver's upper layer.
- udt_send(): called by rdt to transfer a packet over the unreliable channel to the receiver.

Receive side:
- rdt_rcv(): called when a packet arrives on the receive side of the channel.
- deliver_data(): called by rdt to deliver data to the upper layer.

Reliable data transfer: getting started
We'll:
- incrementally develop the sender and receiver sides of a reliable data transfer protocol (rdt)
- consider only unidirectional data transfer
  - but control info will flow in both directions!
- use finite state machines (FSM) to specify the sender and receiver
  - state: when in this "state," the next state is uniquely determined by the next event
  - [Figure: state 1 → state 2; the transition is labeled with the event causing the state transition and the actions taken on the state transition.]

rdt1.0: reliable transfer over a reliable channel
- underlying channel perfectly reliable
  - no bit errors
  - no loss of packets
- separate FSMs for sender, receiver:
  - sender sends data into the underlying channel
  - receiver reads data from the underlying channel

Sender, state "Wait for call from above":
- rdt_send(data) / packet = make_pkt(data); udt_send(packet)

Receiver, state "Wait for call from below":
- rdt_rcv(packet) / extract(packet, data); deliver_data(data)

rdt2.0: channel with bit errors
- underlying channel may flip bits in packets
  - checksum to detect bit errors
- the question: how to recover from errors:
  - acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
  - negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
  - sender retransmits pkt on receipt of NAK
- new mechanisms in rdt2.0 (beyond rdt1.0):
  - error detection
  - receiver feedback: control msgs (ACK, NAK) rcvr->sender

rdt2.0: FSM specification
Each transition is written "event / actions → next state."

Sender:
- Wait for call from above:
  - rdt_send(data) / sndpkt = make_pkt(data, checksum); udt_send(sndpkt) → Wait for ACK or NAK
- Wait for ACK or NAK:
  - rdt_rcv(rcvpkt) && isNAK(rcvpkt) / udt_send(sndpkt) → Wait for ACK or NAK
  - rdt_rcv(rcvpkt) && isACK(rcvpkt) / no action → Wait for call from above

Receiver:
- Wait for call from below:
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt) / udt_send(NAK)
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) / extract(rcvpkt, data); deliver_data(data); udt_send(ACK)

rdt2.0 has a fatal flaw!
What happens if ACK/NAK is corrupted?
- sender doesn't know what happened at receiver!
- can't just retransmit: possible duplicate

Handling duplicates:
- sender retransmits current pkt if ACK/NAK garbled
- sender adds sequence number to each pkt
- receiver discards (doesn't deliver up) duplicate pkt

Stop and wait: sender sends one packet, then waits for receiver response.

rdt2.1: sender, handles garbled ACK/NAKs
- Wait for call 0 from above:
  - rdt_send(data) / sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt) → Wait for ACK or NAK 0
- Wait for ACK or NAK 0:
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)) / udt_send(sndpkt)
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt) / no action → Wait for call 1 from above
- Wait for call 1 from above:
  - rdt_send(data) / sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt) → Wait for ACK or NAK 1
- Wait for ACK or NAK 1:
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)) / udt_send(sndpkt)
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt) / no action → Wait for call 0 from above

rdt2.1: receiver, handles garbled ACK/NAKs
- Wait for 0 from below:
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt) / extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) → Wait for 1 from below
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt) / sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) / sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
- Wait for 1 from below:
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) / extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt) → Wait for 0 from below
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt) / sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt) / sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)

rdt2.1: discussion
Sender:
- seq # added to pkt
- two seq. #'s (0, 1) will suffice. Why?
- must check if received ACK/NAK corrupted
- twice as many states
  - state must "remember" whether "current" pkt has 0 or 1 seq. #

Receiver:
- must check if received packet is duplicate
  - state indicates whether 0 or 1 is expected pkt seq #
- note: receiver cannot know if its last ACK/NAK was received OK at sender
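The receiver rules above fit in a few lines. The sketch below shows the rdt2.1 receiver: one bit of state for the expected seq #, a NAK for corrupted packets, and an ACK without re-delivery for duplicates. To keep it short, a packet is modeled as a hypothetical `(seq, data, corrupt)` triple in place of a real checksum test, and the feedback is returned as a string instead of being sent with `udt_send`.

```python
class Rdt21Receiver:
    """rdt2.1 receiver: 1-bit seq #, ACK/NAK feedback, duplicate detection."""

    def __init__(self):
        self.expected_seq = 0      # state "Wait for 0 from below"
        self.delivered = []        # data passed up by deliver_data()

    def rdt_rcv(self, seq, data, corrupt):
        """Handle one arriving pkt; return the feedback ("ACK" or "NAK") sent back."""
        if corrupt:
            return "NAK"                                 # make_pkt(NAK, chksum)
        if seq == self.expected_seq:
            self.delivered.append(data)                  # extract + deliver_data
            self.expected_seq = 1 - self.expected_seq    # move to the other "Wait for" state
            return "ACK"
        # Intact but carrying the other seq #: a retransmitted duplicate.
        # Do not deliver it again, but ACK it so the sender can move on.
        return "ACK"


r = Rdt21Receiver()
print([r.rdt_rcv(0, "a", False), r.rdt_rcv(0, "a", False),
       r.rdt_rcv(1, "b", True), r.rdt_rcv(1, "b", False)])  # ['ACK', 'ACK', 'NAK', 'ACK']
print(r.delivered)  # ['a', 'b']
```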

rdt2.2: a NAK-free protocol
- same functionality as rdt2.1, using ACKs only
- instead of NAK, receiver sends ACK for last pkt received OK
  - receiver must explicitly include seq # of pkt being ACKed
- duplicate ACK at sender results in same action as NAK: retransmit current pkt

rdt2.2: sender, receiver fragments
Sender FSM fragment:
- Wait for call 0 from above:
  - rdt_send(data) / sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt) → Wait for ACK0
- Wait for ACK0:
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt, 1)) / udt_send(sndpkt)
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt, 0) / no action

Receiver FSM fragment:
- Wait for 0 from below:
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)) / udt_send(sndpkt)
- Transition into Wait for 0 from below:
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt) / extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)

What happens if a pkt is duplicated?

rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
- checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Approach: sender waits a "reasonable" amount of time for ACK
- retransmits if no ACK received in this time
- if pkt (or ACK) just delayed (not lost):
  - retransmission will be duplicate, but use of seq. #'s already handles this
  - receiver must specify seq # of pkt being ACKed
- requires countdown timer

rdt3.0 sender
- Wait for call 0 from above:
  - rdt_send(data) / sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer → Wait for ACK0
  - rdt_rcv(rcvpkt) / no action
- Wait for ACK0:
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt, 1)) / no action
  - timeout / udt_send(sndpkt); start_timer
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt, 0) / stop_timer → Wait for call 1 from above
- Wait for call 1 from above:
  - rdt_send(data) / sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer → Wait for ACK1
  - rdt_rcv(rcvpkt) / no action
- Wait for ACK1:
  - rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt, 0)) / no action
  - timeout / udt_send(sndpkt); start_timer
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt, 1) / stop_timer → Wait for call 0 from above
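Below is a small simulation of rdt3.0, the alternating-bit protocol, to show why seq #'s plus retransmission on timeout cope with both a lost packet and a lost ACK. It is a sketch under simplifying assumptions. The channel never corrupts or reorders. Losses are scripted by transmission index in a hypothetical `lose` set. Every loss is followed by a timeout, so premature timeouts are not modeled.

```python
def run_rdt30(messages, lose):
    """Alternating-bit (rdt3.0) sender and receiver over a lossy channel.

    `lose` is a set of transmission indices to drop; every data pkt and every
    ACK put on the channel gets the next index (0, 1, 2, ...). A dropped pkt
    or ACK is treated as a timeout followed by a resend of the current pkt.
    """
    sent = 0            # index of the next transmission on the channel
    expected = 0        # receiver state: seq # it is waiting for
    delivered, log = [], []
    for i, msg in enumerate(messages):
        seq = i % 2     # sender alternates 0, 1, 0, 1, ...
        while True:
            log.append(f"send pkt{seq}")
            pkt_lost = sent in lose
            sent += 1
            if pkt_lost:
                log.append("timeout, resend")
                continue
            # Receiver: deliver only the expected seq #; ACK the seq # it got,
            # so a duplicate is re-ACKed but not delivered again.
            if seq == expected:
                delivered.append(msg)
                expected = 1 - expected
            ack_lost = sent in lose
            sent += 1
            if ack_lost:
                log.append("timeout, resend")
                continue
            log.append(f"rec ack{seq}")
            break
    return delivered, log


# Lose transmission 3 (the ACK for pkt1): pkt1 is resent, but "b" is delivered once.
print(run_rdt30(["a", "b", "c"], lose={3}))
```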

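rdt3.0 in action
(a) no loss:
- sender: send pkt0 → receiver: rec pkt0, send ack0
- sender: rec ack0, send pkt1 → receiver: rec pkt1, send ack1
- sender: rec ack1, send pkt2 → receiver: rec pkt2

(b) packet loss:
- sender: send pkt0 → receiver: rec pkt0, send ack0
- sender: rec ack0, send pkt1 (lost)
- sender: timeout (TO), resend pkt1 → receiver: rec pkt1, send ack1
- sender: rec ack1, send pkt2 → receiver: rec pkt2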

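rdt3.0 in action
(c) ACK loss:
- sender: send pkt0 → receiver: rec pkt0, send ack0
- sender: rec ack0, send pkt1 → receiver: rec pkt1, send ack1 (lost)
- sender: timeout (TO), resend pkt1 → receiver: rec pkt1 (duplicate), send ack1
- sender: rec ack1, send pkt2 → receiver: rec pkt2, send ack2

(d) premature timeout:
- sender: send pkt0 → receiver: rec pkt0, send ack0
- sender: rec ack0, send pkt1 → receiver: rec pkt1, send ack1
- sender: timeout (TO) before ack1 arrives, resend pkt1
- sender: rec ack1, send pkt2 → receiver: rec pkt1 (duplicate), send ack1
- sender: rec ack1 again (dup. ACK), send no pkt → receiver: rec pkt2, send ack2
- sender: rec ack2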

Performance of rdt3.0
- rdt3.0 works, but performance stinks
- ex: 1 Gbps link, 15 ms propagation delay, 8000-bit packet and 100-bit ACK:
  - What is the total delay?
    - Data transmission delay: $8000 / 10^9 = 8 \times 10^{-6}$ sec (0.008 ms)
    - ACK transmission delay: $100 / 10^9 = 10^{-7}$ sec (0.0001 ms)
    - Total delay: $2 \times 15 + 0.008 + 0.0001 = 30.0081$ ms
- Utilization
  - time transmitting / total time
  - $0.008 / 30.0081 = 0.00027$
- This is one 8000-bit (1 kB) pkt every 30 msec, or about 33 kB/sec, over a 1 Gbps link!
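The same arithmetic as a tiny function, so the link speed, propagation delay, or packet size can be varied. It assumes the slide's model: one data packet per cycle of two propagation delays plus both transmission delays, ignoring processing and queuing delay.

```python
def stop_and_wait_utilization(L_bits, R_bps, prop_delay_s, ack_bits=0):
    """Fraction of time the sender is busy transmitting under stop-and-wait."""
    d_trans = L_bits / R_bps                      # time to push the data packet onto the link
    d_ack = ack_bits / R_bps                      # time to push the ACK onto the link
    cycle = 2 * prop_delay_s + d_trans + d_ack    # one packet per cycle
    return d_trans / cycle, cycle


u, cycle = stop_and_wait_utilization(8000, 10**9, 0.015, ack_bits=100)
print(f"cycle = {cycle * 1e3:.4f} ms, U = {u:.5f}")          # cycle = 30.0081 ms, U = 0.00027
print(f"throughput = {8000 / 8 / cycle / 1e3:.1f} kB/s")     # throughput = 33.3 kB/s
```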

rdt3.0: stop-and-wait operation
[Figure: sender/receiver timeline]
- t = 0: sender transmits the first packet bit
- t = L / R: sender transmits the last packet bit
- receiver: first packet bit arrives; last packet bit arrives, send ACK
- t = RTT + L / R: ACK arrives at the sender, which sends the next packet

Pipelined protocols
Pipelining: sender allows multiple, "in-flight," yet-to-be-acknowledged pkts
- range of sequence numbers must be increased
- buffering at sender and/or receiver

Two generic forms of pipelined protocols: Go-Back-N, selective repeat

Pipelining: increased utilization
[Figure: sender/receiver timeline with three packets in flight]
- t = 0: sender transmits the first packet bit
- t = L / R: sender transmits the last bit of the first packet, then the 2nd and 3rd packets back-to-back
- receiver: first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK
- t = RTT + L / R: ACK arrives, send next packet

Increase utilization by a factor of 3!
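One way to write the "factor of 3" is a formula for utilization when n packets are sent back-to-back in each RTT + L/R cycle, capped at 100% because the link cannot be more than fully busy. The sketch below assumes the same cycle as the stop-and-wait timeline and ignores ACK transmission time. The RTT of 30 ms is the slide's 2 × 15 ms propagation delay.

```python
def pipelined_utilization(n_pkts, L_bits, R_bps, rtt_s):
    """Sender utilization when n_pkts are sent back-to-back per RTT + L/R cycle."""
    d_trans = L_bits / R_bps
    # Busy time per cycle grows with n_pkts; the link cannot be more than 100% busy.
    return min(1.0, n_pkts * d_trans / (rtt_s + d_trans))


one = pipelined_utilization(1, 8000, 10**9, 0.030)
three = pipelined_utilization(3, 8000, 10**9, 0.030)
print(three / one)   # about 3: three packets per cycle instead of one
```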

Pipelining Protocols
Go-back-N: big picture
- Sender can have up to N unACKed packets in pipeline
- Receiver only sends cumulative ACKs
  - Doesn't ACK a packet if there's a gap
- Sender has a timer for the oldest unACKed packet
  - If the timer expires, retransmit all unACKed packets

Selective Repeat: big picture
- Sender can have up to N unACKed packets in pipeline
- Receiver ACKs individual packets
- Sender maintains a timer for each unACKed packet
  - When a timer expires, retransmit only that unACKed packet

Go-Back-N
Sender:
- k-bit seq # in pkt header
- "window" of up to N unACKed pkts allowed
- ACK(n): ACKs all pkts up to and including seq # n ("cumulative ACK")
  - may receive duplicate ACKs (see receiver)
- timer for each in-flight pkt
- timeout(n): retransmit pkt n and all higher seq # pkts in window

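Go-Back-N: state of pkts
[Figure: sliding window with N = 12 over the sequence of pkts. Legend: ACKed pkts, unACKed pkts, pkts that could be sent, unused pkts. Steps: start with 0 unACKed pkts in the window; send a pkt, giving 1 unACKed pkt; keep sending until there are N unACKed pkts and the next pkt to be sent lies outside the window; an ACK arrives, leaving N-1 unACKed pkts as the window slides forward; send a pkt, again N unACKed pkts.]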

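Go-Back-N
[Figure, continuing the sliding-window example: N unACKed pkts; an ACK arrives, leaving N-1 unACKed pkts; send a pkt, back to N unACKed pkts; no ACK arrives …. timeout: the window now holds 0 unACKed pkts, and every pkt in it is again a pkt that could be sent. Legend: pkt that could be sent, unACKed pkt, unused pkt.]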

GBN: sender extended Activity Diagram
- Waiting for file
- Initialize: Set NextPktToSend = 0; Set LastACKed = -1
- Wait, then react to whichever happens:
  - Fewer than N pkts unACKed (NextPktToSend - LastACKed - 1 < N): send pkt[NextPktToSend] with SeqNum = NextPktToSend; Set Timer(NextPktToSend) = Now + TO; NextPktToSend++
  - Timer expires: Clear Timers(LastACKed+1 to NextPktToSend-1); NextPktToSend = LastACKed+1 (go back and resend from the oldest unACKed pkt)
  - ACK arrived with ACKNum = AN: Clear Timers(LastACKed+1 to AN); LastACKed = AN
  - otherwise: keep waiting

GBN: Receiver Activity Diagram
- start: Set NextPktToRec = 0; Clear ReceiverBuffer; Clear ReceivedPkts; ReceiverBase = 0
- wait; when a pkt with SeqNum arrives: Place Pkt in ReceiverBuffer[SeqNum]; ReceivedPkts[SeqNum] = 1
  - while ReceivedPkts[NextPktToRec] == 1: Give ReceiverBuffer[NextPktToRec] to app; NextPktToRec++
  - otherwise: Send ACK with ACKNum = NextPktToRec - 1, then wait again

GBN: sender extended FSM
- Initial: base = 1; nextseqnum = 1
- Wait:
  - rdt_send(data) /
        if (nextseqnum < base+N) {
            sndpkt[nextseqnum] = make_pkt(nextseqnum, data, chksum)
            udt_send(sndpkt[nextseqnum])
            if (base == nextseqnum)
                start_timer
            nextseqnum++
        }
        else
            refuse_data(data)
  - timeout /
        start_timer
        udt_send(sndpkt[base])
        udt_send(sndpkt[base+1])
        …
        udt_send(sndpkt[nextseqnum-1])
  - rdt_rcv(rcvpkt) && corrupt(rcvpkt) / no action
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) /
        base = getacknum(rcvpkt)+1
        if (base == nextseqnum)
            stop_timer
        else
            start_timer
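The extended FSM above translates fairly directly into a class. This is a sketch, not a full implementation. The single timer is reduced to a `timer_running` flag, packets are seq #'s recorded in a hypothetical `wire` list in place of `udt_send`, and checksums are omitted. One guard was added that the FSM does not show: a stale cumulative ACK is ignored so that `base` never moves backward.

```python
class GbnSender:
    """Go-Back-N sender mirroring the extended FSM (base, nextseqnum, window N)."""

    def __init__(self, N):
        self.N = N
        self.base = 1                 # oldest unACKed seq #
        self.nextseqnum = 1           # next seq # to use
        self.sndpkt = {}              # seq # -> data, kept for retransmission
        self.timer_running = False    # single timer for the oldest unACKed pkt
        self.wire = []                # seq #s handed to udt_send(), for inspection

    def rdt_send(self, data):
        if self.nextseqnum < self.base + self.N:
            self.sndpkt[self.nextseqnum] = data
            self.wire.append(self.nextseqnum)       # udt_send(sndpkt[nextseqnum])
            if self.base == self.nextseqnum:
                self.timer_running = True           # start_timer
            self.nextseqnum += 1
            return True
        return False                                # refuse_data(data)

    def ack_received(self, acknum):
        # Cumulative ACK: everything up to and including acknum is ACKed.
        if acknum + 1 < self.base:
            return                                  # stale ACK: ignore (added guard)
        for s in range(self.base, acknum + 1):
            self.sndpkt.pop(s, None)
        self.base = acknum + 1
        # stop_timer if nothing is outstanding, else (re)start it
        self.timer_running = self.base != self.nextseqnum

    def timeout(self):
        self.timer_running = True                   # start_timer
        # Go back N: resend every pkt in [base, nextseqnum - 1].
        self.wire.extend(range(self.base, self.nextseqnum))


s = GbnSender(N=4)
print([s.rdt_send(x) for x in "abcde"])   # [True, True, True, True, False]
s.ack_received(2)
s.timeout()
print(s.wire)                             # [1, 2, 3, 4, 3, 4]
```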

GBN: receiver extended FSM
- Initial: expectedseqnum = 1; sndpkt = make_pkt(0, ACK, chksum)
- Wait:
  - rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt, expectedseqnum) / extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(expectedseqnum, ACK, chksum); udt_send(sndpkt); expectedseqnum++
  - default / udt_send(sndpkt)
- ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
- out-of-order pkt:
  - discard (don't buffer) -> no receiver buffering!
  - re-ACK pkt with highest in-order seq #
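Here is a matching sketch of the GBN receiver. The only state is `expectedseqnum`. An in-order pkt is delivered and ACKed, and anything else (corrupted or out of order) is dropped and answered with the last ACK again. Two assumptions: packets are passed as `(seq, data, corrupt)` arguments, and the receiver starts out ACKing 0, meaning "nothing received yet."

```python
class GbnReceiver:
    """Go-Back-N receiver: no buffering, only expectedseqnum is remembered."""

    def __init__(self):
        self.expectedseqnum = 1
        self.last_ack = 0             # assumption: ACK 0 means "nothing received yet"
        self.delivered = []

    def rdt_rcv(self, seq, data, corrupt=False):
        """Handle one arriving pkt and return the ACK number sent back."""
        if not corrupt and seq == self.expectedseqnum:
            self.delivered.append(data)             # extract + deliver_data
            self.last_ack = self.expectedseqnum     # make_pkt(expectedseqnum, ACK, ...)
            self.expectedseqnum += 1
        # default: a corrupt or out-of-order pkt is discarded and the last ACK re-sent
        return self.last_ack


r = GbnReceiver()
print([r.rdt_rcv(n, f"p{n}") for n in (1, 2, 4, 5, 3)])  # [1, 2, 2, 2, 3]
print(r.delivered)                                       # ['p1', 'p2', 'p3']
```

Pkts 4 and 5 are thrown away even though they arrived intact. The sender's go-back retransmission has to supply them again, which is the price GBN pays for needing no receiver buffer.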

GBN in Action
- Sender sends pkts 0, 1, 2, …, 7; pkt 6 is lost.
- Receiver: Rec 0 to 5, give each to app, and send ACK = 0, 1, 2, 3, 4, 5.
- Receiver: Rec 7, discard, and send ACK = 5.
- Sender keeps sending pkts 8, 9, …, 13; the receiver discards each one and sends ACK = 5.
- The timer for pkt 6 expires (TO); the sender goes back and resends pkts 6, 7, 8, 9, ….
- Receiver: Rec 6, 7, 8, 9, give each to app, and send ACK = 6, 7, 8, 9.

Optimal size of N in GBN

Selective Repeat
- receiver individually acknowledges all correctly received pkts
  - buffers pkts, as needed, for eventual in-order delivery to upper layer
- sender only resends pkts for which ACK not received
  - sender timer for each unACKed pkt
- sender window
  - N consecutive seq #'s
  - again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat
Sender:
- data from above: if next available seq # in window, send pkt
- timeout(n): resend pkt n, restart timer
- ACK(n) in [sendbase, sendbase+N-1]:
  - mark pkt n as received
  - if n is the smallest unACKed pkt, advance window base to next unACKed seq #

Receiver:
- pkt n in [rcvbase, rcvbase+N-1]:
  - send ACK(n)
  - out-of-order: buffer
  - in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
- pkt n in [rcvbase-N, rcvbase-1]:
  - ACK(n)
- otherwise:
  - ignore
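The receiver's three cases can be sketched directly. This version assumes unbounded seq #'s (no k-bit wraparound) and leaves checksums out. It returns the seq # to ACK, or `None` when the pkt is ignored.

```python
class SrReceiver:
    """Selective-repeat receiver with a window of N seq #'s starting at rcvbase."""

    def __init__(self, N):
        self.N = N
        self.rcvbase = 0
        self.buffer = {}              # seq # -> data for pkts received out of order
        self.delivered = []

    def rdt_rcv(self, n, data):
        """Return the seq # to ACK, or None if the pkt is ignored."""
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer.setdefault(n, data)            # out-of-order: buffer
            # in-order: deliver it and any buffered successors, slide the window
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return n                                   # send ACK(n)
        if self.rcvbase - self.N <= n < self.rcvbase:
            return n      # already delivered: re-ACK so the sender's window can advance
        return None       # otherwise: ignore


r = SrReceiver(N=4)
arrivals = [(0, "a"), (2, "c"), (3, "d"), (1, "b"), (2, "c"), (9, "x")]
print([r.rdt_rcv(n, d) for n, d in arrivals])  # [0, 2, 3, 1, 2, None]
print(r.delivered)                             # ['a', 'b', 'c', 'd']
```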

Selective repeat in action

Summary of transport layer tools used so far
- ACK and NACK
- Sequence numbers (and no NACK)
- Time out
- Sliding window
  - Optimal size = bandwidth-delay product (if no other flows are using the network)
- Cumulative ACK
  - Buffer at the receiver is optional
- Selective ACK
  - Requires buffering at the receiver
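To put a number on "optimal size = bandwidth-delay product": the window, counted in packets, must hold enough bits to keep the link busy for one RTT. The sketch reuses the earlier example (1 Gbps, 30 ms RTT, 8000-bit packets). It takes the RTT in integer milliseconds so the division stays exact.

```python
import math


def bdp_window_packets(R_bps, rtt_ms, pkt_bits):
    """Window (in packets) whose bits just fill the pipe: ceil(R * RTT / L)."""
    bdp_bits = R_bps * rtt_ms / 1000      # bits "in flight" during one RTT
    return math.ceil(bdp_bits / pkt_bits)


print(bdp_window_packets(10**9, 30, 8000))  # 3750
```

A stop-and-wait sender is the special case of a window of 1. That is why it used only about 1/3750 of the link in the rdt3.0 example.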

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
- point-to-point:
  - one sender, one receiver
- reliable, in-order byte stream
- pipelined and time-varying window size:
  - TCP congestion and flow control set window size
- send & receive buffers
- full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control msgs) initializes sender, receiver state before data exchange
- flow controlled:
  - sender will not overwhelm receiver

TCP segment structure
Header layout, 32 bits per row:
- source port # | dest port #
- sequence number
- acknowledgement number
- head len | not used | flags U A P R S F | receive window
- checksum | urg data pointer
- options (variable length)
- application data (variable length)

Field notes:
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection establishment (setup, teardown commands)
- checksum: Internet checksum (as in UDP)
- sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- receive window: # bytes rcvr willing to accept

TCP seq. #'s and ACKs
Seq. #'s:
- byte stream "number" of the first byte in the segment's data
- can be used as a pointer for placing the received data in the receiver buffer

ACKs:
- seq # of next byte expected from the other side
- cumulative ACK

Simple telnet scenario (time runs downward):
- Host A: user types 'C'; A → B: Seq=42, ACK=79, data = 'C'
- Host B ACKs receipt of 'C', echoes back 'C'; B → A: Seq=79, ACK=43, data = 'C'
- Host A ACKs receipt of echoed 'C'; A → B: Seq=43, ACK=80

Seq no and ACKs
Byte numbers 101 to 111 carry the data "HELLO WORLD" (H=101, E=102, L=103, L=104, O=105, space=106, W=107, O=108, R=109, L=110, D=111).
- Seq no: 101, ACK no: 12, Data: "HEL", Length: 3
- Seq no: 12, ACK no: 104, Data: (none), Length: 0
- Seq no: 104, ACK no: 12, Data: "LO W", Length: 4
- Seq no: 12, ACK no: 108, Data: (none), Length: 0

Seq no and ACKs - bidirectional
One direction carries "HELLO WORLD" in bytes 101 to 111. The other carries G=12, O=13, O=14, D=15, B=16, U=17, Y=18.
- Seq no: 101, ACK no: 12, Data: "HEL", Length: 3
- Seq no: 12, ACK no: 104, Data: "GOOD", Length: 4
- Seq no: 104, ACK no: 16, Data: "LO W", Length: 4
- Seq no: 16, ACK no: 108, Data: "BU", Length: 2
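The bookkeeping behind these two examples fits in a short function. Each side's seq no is the byte number of the first byte it sends, and its ACK no is the next byte it expects from the other side. The sketch assumes every earlier segment arrived in order, so the ACK is simply the other side's next byte number. `exchange` is a hypothetical helper, not a socket API.

```python
def exchange(a_first_byte, b_first_byte, segments):
    """segments: list of (sender, data), sender "A" or "B".
    Returns (sender, seq no, ACK no, data) for each segment."""
    next_byte = {"A": a_first_byte, "B": b_first_byte}   # next byte number each side sends
    out = []
    for sender, data in segments:
        other = "B" if sender == "A" else "A"
        seq = next_byte[sender]
        ack = next_byte[other]    # cumulative ACK: next byte expected from the other side
        out.append((sender, seq, ack, data))
        next_byte[sender] += len(data)
    return out


for row in exchange(101, 12, [("A", "HEL"), ("B", "GOOD"), ("A", "LO W"), ("B", "BU")]):
    print(row)
# ('A', 101, 12, 'HEL')
# ('B', 12, 104, 'GOOD')
# ('A', 104, 16, 'LO W')
# ('B', 16, 108, 'BU')
```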

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value (RTO)?
- If RTO is too short: premature timeout
  - unnecessary retransmissions
- If RTO is too long:
  - slow reaction to segment loss
- Can RTT be used?
  - No, RTT varies; there is no single RTT.
  - Why does RTT vary? Because statistical multiplexing results in queuing.
- How about using the average RTT?
  - The average is too small, since roughly half of the RTTs are larger than the average.

Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary; want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout
$$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$$
- Exponential weighted moving average
- influence of past sample decreases exponentially fast
- typical value: $\alpha = 0.125$
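Unrolling the update shows what "decreases exponentially fast" means: a SampleRTT that is k updates old carries weight $\alpha(1-\alpha)^k$ in EstimatedRTT. Below is a minimal sketch of the averaging. It assumes the estimate is seeded with the first sample, which the slide does not specify.

```python
def ewma_rtt(samples, alpha=0.125):
    """EstimatedRTT after each SampleRTT, using an exponentially weighted moving average."""
    est = samples[0]                  # assumption: start from the first sample
    history = [est]
    for s in samples[1:]:
        est = (1 - alpha) * est + alpha * s
        history.append(est)
    return history


print(ewma_rtt([100, 200, 200]))      # [100, 112.5, 123.4375]
```

A sudden jump from 100 ms to 200 ms moves the estimate only 12.5 ms per sample. That is the intended smoothing, and it is also why a safety margin is needed on top of EstimatedRTT.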

Example RTT estimation:

TCP Round Trip Time and Timeout
Setting the timeout (RTO)
- RTO = EstimatedRTT plus "safety margin"
  - large variation in EstimatedRTT -> larger safety margin
- first estimate how much SampleRTT deviates from EstimatedRTT:
  $$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$$
  (typically, $\beta = 0.25$)
- Then set the timeout interval:
  $$\text{RTO} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$$

TCP Round Trip Time and Timeout
- $\text{RTO} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$ might not always work, so a lower bound is applied:
  $$\text{RTO} = \max(\text{MinRTO},\ \text{EstimatedRTT} + 4\cdot\text{DevRTT})$$
- MinRTO = 250 ms for Linux, 500 ms for Windows, 1 sec for BSD
- So in most cases RTO = MinRTO.
- Actually, when RTO > MinRTO, the performance is quite bad; there are many spurious timeouts.
- Note that RTO was computed in an ad hoc way. It is really a signal processing and queuing theory question…
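Putting the three formulas together gives a small estimator. Two choices in this sketch come from RFC 6298 rather than from the slides: the first measurement sets EstimatedRTT = R and DevRTT = R/2, and DevRTT is updated with the *old* EstimatedRTT before EstimatedRTT itself is updated. `min_rto` defaults to 0.25 s, the slide's Linux figure. Times are in seconds.

```python
class RtoEstimator:
    """EstimatedRTT / DevRTT / RTO with a MinRTO floor (times in seconds)."""

    def __init__(self, alpha=0.125, beta=0.25, min_rto=0.25):
        self.alpha, self.beta, self.min_rto = alpha, beta, min_rto
        self.est = None               # EstimatedRTT
        self.dev = None               # DevRTT

    def update(self, sample):
        """Feed one SampleRTT (never from a retransmitted pkt); return the new RTO."""
        if self.est is None:
            # First measurement (RFC 6298 style): EstimatedRTT = R, DevRTT = R / 2
            self.est, self.dev = sample, sample / 2
        else:
            # DevRTT uses the old EstimatedRTT, then EstimatedRTT is updated.
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(sample - self.est)
            self.est = (1 - self.alpha) * self.est + self.alpha * sample
        return self.rto()

    def rto(self):
        return max(self.min_rto, self.est + 4 * self.dev)


e = RtoEstimator()
print(e.update(0.5), e.update(0.5))   # 1.5 1.25

lan = RtoEstimator()
for _ in range(50):
    lan.update(0.01)                  # a 10 ms path
print(lan.rto())                      # 0.25: the MinRTO floor dominates
```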

RTO details
- When a pkt is sent, the timer is started, unless it is already running.
- When a new ACK is received, the timer is restarted.
- Thus, the timer is for the oldest unACKed pkt.
  - Q: if RTO is slightly less than RTT, are there many spurious timeouts?
  - A: Not necessarily (actually, yes).
    - [Figure: each arriving ACK restarts the RTO timer, so the timeout deadline keeps shifting.]
    - This shifting of the RTO means that even if RTO < RTT, there might not be a timeout.
    - However, for the first packet sent, the timer is started. If RTO < RTT of this first packet, then there will be a spurious timeout.
- While it is implementation dependent, some implementations estimate RTT only once per RTT.
  - Not every pkt's RTT is measured. Instead, if no RTT is being measured, then the RTT of the next pkt is measured.
  - The RTT of retransmitted pkts is not measured.
  - Some versions of TCP measure RTT more often.

Loss Detection
Example timeline (pkt 6 is lost; TO marks the point where the sender's retransmission timer expires):
• Sender sends pkts 0–7.
• Receiver gets pkts 0–5 in order, gives each to the app, and sends ACK no = 1, 2, 3, 4, 5, 6.
• Pkt 6 is lost. Receiver gets pkt 7, saves it in the buffer, and sends ACK no = 6.
• Sender continues with pkts 8–13; receiver saves each in the buffer and sends ACK no = 6 for each.
• The timer expires (TO); sender retransmits pkts 6, 7, 8, 9.
• Receiver gets pkt 6, gives pkts 6–13 to the app, and sends ACK no = 14. The retransmitted pkts 7, 8, 9 were already delivered, so for each the receiver just sends ACK no = 14 again.
Observations:
• It took a long time to detect the loss with the RTO.
• But by examining the ACK no, it is possible to determine that pkt 6 was lost.
• Specifically, receiving two ACKs with ACK no = 6 (one duplicate) indicates that segment 6 was lost.
• A more conservative approach is to wait for 4 of the same ACK no (triple-duplicate ACKs) to decide that a packet was lost.
• This is called fast retransmit.
• A triple dup-ACK is like a NACK.
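The receiver behavior in this timeline can be sketched as a cumulative-ACK receiver. This assumes pkts are numbered 0, 1, 2, … as on the slide (real TCP counts bytes), and the function name is hypothetical.

```python
def receiver_acks(arrivals):
    """Return (ACK numbers sent, pkts delivered to the app) for a list of arrivals.

    The ACK no is always the next in-order pkt the receiver expects.
    """
    expected = 0
    buffered = set()
    acks, delivered = [], []
    for pkt in arrivals:
        if pkt >= expected:          # old duplicates (pkt < expected) are not re-buffered
            buffered.add(pkt)
        while expected in buffered:  # deliver the in-order run to the app
            buffered.remove(expected)
            delivered.append(expected)
            expected += 1
        acks.append(expected)
    return acks, delivered

acks, delivered = receiver_acks([0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 6, 7, 8, 9])
```

Here `acks` is 1 to 6, then seven ACKs with ACK no = 6, then four with ACK no = 14, matching the timeline.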

Fast Retransmit
Same scenario (pkt 6 is lost), but now the sender reacts to duplicate ACKs:
• Sender sends pkts 0–7; receiver gets pkts 0–5 in order, gives each to the app, and sends ACK no = 1, 2, 3, 4, 5, 6.
• Receiver gets pkt 7, saves it in the buffer, and sends ACK no = 6 (first dup-ACK).
• Sender sends pkts 8, 9, 10; receiver saves pkts 8 and 9 in the buffer and sends ACK no = 6 for each (second and third dup-ACKs).
• On the third dup-ACK, the sender retransmits pkt 6 without waiting for the timer, and keeps sending new pkts 11–16.
• Receiver saves pkts 10 and 11 in the buffer and sends ACK no = 6 for each.
• The retransmitted pkt 6 arrives: the receiver gives pkts 6–11 to the app and sends ACK no = 12.
• Pkts 12–16 then arrive in order; each is given to the app and ACKed (ACK no = 13, 14, 15, 16, 17).
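A sketch of the sender-side duplicate-ACK counter that triggers fast retransmit, assuming ACK numbers name the next expected pkt (as in the timeline). The function name is hypothetical, and a real sender would also track which data is outstanding.

```python
DUP_ACK_THRESHOLD = 3  # fast retransmit on the third duplicate ACK

def fast_retransmit_points(acks, threshold=DUP_ACK_THRESHOLD):
    """Return the pkt numbers retransmitted, given the ACK numbers received in order."""
    last_ack = None
    dup_count = 0
    retransmits = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmits.append(ack)  # ACK no = the missing pkt
        else:
            last_ack = ack               # new ACK: reset the duplicate count
            dup_count = 0
    return retransmits
```

Feeding it the timeline's ACKs, `[1, 2, 3, 4, 5, 6] + [6] * 5 + [12, 13]`, gives `[6]`; with only two duplicates nothing is retransmitted.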

TCP ACK generation [RFC 1122, RFC 2581]
• Event at receiver: arrival of an in-order segment with the expected seq #; all data up to the expected seq # already ACKed.
  TCP receiver action: delayed ACK. Wait up to 500 ms for the next segment. If there is no next segment, send the ACK.
• Event at receiver: arrival of an in-order segment with the expected seq #; one other segment has an ACK pending.
  TCP receiver action: immediately send a single cumulative ACK, ACKing both in-order segments.
• Event at receiver: arrival of an out-of-order segment with a higher-than-expected seq #; gap detected.
  TCP receiver action: immediately send a duplicate ACK, indicating the seq # of the next expected byte.
• Event at receiver: arrival of a segment that partially or completely fills the gap.
  TCP receiver action: immediately send an ACK, provided that the segment starts at the lower end of the gap.
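The table can be sketched as a decision function. The parameter names are hypothetical, and the final branch (old, already-ACKed data) is outside the table; ACKing it immediately is an assumption.

```python
def ack_action(seg_start, expected, ack_pending, gap_buffered):
    """Receiver action for an arriving segment, following the table above.

    seg_start    -- seq # of the first byte in the arriving segment
    expected     -- seq # of the next expected byte
    ack_pending  -- an in-order segment is already waiting for a delayed ACK
    gap_buffered -- out-of-order data above `expected` is already buffered
    """
    if seg_start > expected:
        return "send duplicate ACK now"       # out of order: gap detected
    if seg_start == expected and gap_buffered:
        return "send ACK now"                 # fills (part of) the gap from its lower end
    if seg_start == expected and ack_pending:
        return "send one cumulative ACK now"  # ACKs both in-order segments
    if seg_start == expected:
        return "delay ACK up to 500 ms"
    return "send ACK now"  # assumption: old duplicate data is re-ACKed (not in the table)
```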

TCP segment structure (each row is 32 bits wide)
• source port # | dest port #
• sequence number
• acknowledgement number
• head len | not used | U A P R S F | receive window
• checksum | urg data pointer
• options (variable length)
• application data (variable length)
Notes:
• Sequence and acknowledgement numbers count bytes of data (not segments!).
• Receive window: # bytes the receiver is willing to accept.
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• Checksum: Internet checksum (as in UDP)
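A sketch that parses the fixed 20-byte part of this header with Python's `struct` module, following the slide's layout (4-bit head len, 6 unused bits, 6 flags). Later RFCs assign some of the "not used" bits (e.g., for ECN); this sketch ignores them. The function name is hypothetical.

```python
import struct

FLAG_NAMES = ["URG", "ACK", "PSH", "RST", "SYN", "FIN"]  # high bit to low bit

def parse_tcp_header(segment: bytes) -> dict:
    """Parse the fixed part of a TCP header (network byte order)."""
    src, dst, seq, ack, off_flags, rwin, checksum, urg_ptr = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4        # head len counts 32-bit words
    flag_bits = off_flags & 0x3F              # U A P R S F are the low 6 bits
    flags = {name for i, name in enumerate(FLAG_NAMES) if flag_bits & (0x20 >> i)}
    return {
        "src_port": src, "dst_port": dst,
        "seq": seq, "ack": ack,               # byte counts, not segment counts
        "header_len": header_len,
        "flags": flags,
        "rwin": rwin,                         # bytes the receiver will accept
        "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[20:header_len],
    }
```

A SYN can be built for testing with `struct.pack("!HHIIHHHH", 1234, 80, 2197, 0, (5 << 12) | 0x02, 65535, 0, 0)`; parsing it yields `flags == {"SYN"}` and `header_len == 20`.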

TCP Flow Control
• The receive side of a TCP connection has a receive buffer.
• Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast.
• The app process may be slow at reading from the buffer.
• Speed-matching service: matching the send rate to the receiving app's drain rate.
• The sender never has more than a receiver window's worth of bytes unACKed.
• This way, the receiver buffer will never overflow.

Flow control – so the receiver doesn't get overwhelmed
Example: the SYN had seq# = 14, so the first data byte has seq# 15. The receive buffer holds 9 bytes (seq# 15–23) and already contains the unread bytes 'Steve' (seq# 15–19).
• Sender → receiver: Seq# = 20, Ack# = 1001, Data = 'Hi', size = 2 (bytes)
• Receiver → sender: Seq# = 1001, Ack# = 22, data size = 0, Rwin = 2
• Sender → receiver: Seq# = 22, Ack# = 1001, Data = 'By', size = 2 (bytes)
• Receiver → sender: Seq# = 1001, Ack# = 24, data size = 0, Rwin = 0. The receive buffer is full.
• The application reads the buffer.
• Receiver → sender: Seq# = 1001, Ack# = 24, data size = 0, Rwin = 9
• Sender → receiver: Seq# = 24, Ack# = 1001, Data = 'e', size = 1 (byte)
The number of unACKed bytes must not exceed the receiver window. As the receiver's buffer fills, the receiver decreases the advertised receiver window.
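A sketch that replays this example, assuming a 9-byte buffer and that 'Steve' arrived earlier as one segment. `ReceiveBuffer` and `sender_may_send` are hypothetical names, and rejecting out-of-order data is a simplification (a real receiver buffers it).

```python
class ReceiveBuffer:
    """Receiver side of flow control: the advertised window is the free space."""

    def __init__(self, size, next_seq):
        self.size = size            # buffer capacity in bytes
        self.next_seq = next_seq    # next expected byte = Ack# to send
        self.unread = bytearray()   # received but not yet read by the app

    def rwin(self):
        return self.size - len(self.unread)

    def receive(self, seq, payload):
        """Accept in-order data and return the (Ack#, Rwin) for the reply."""
        if seq != self.next_seq or len(payload) > self.rwin():
            raise ValueError("out-of-order data or receiver window exceeded")
        self.unread += payload
        self.next_seq += len(payload)
        return self.next_seq, self.rwin()

    def app_read(self):
        data, self.unread = bytes(self.unread), bytearray()
        return data


def sender_may_send(next_seq, last_ack, rwin, nbytes):
    """Sender rule: bytes already unACKed plus the new bytes must fit in Rwin."""
    return (next_seq - last_ack) + nbytes <= rwin


buf = ReceiveBuffer(size=9, next_seq=15)
buf.receive(15, b"Steve")   # (20, 4)
buf.receive(20, b"Hi")      # (22, 2)
buf.receive(22, b"By")      # (24, 0): buffer full, sender must stop
buf.app_read()              # b"SteveHiBy"; Rwin is 9 again
```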

Window probe (same example, with the SYN seq# = 14 and the same 9-byte buffer)
• Sender → receiver: Seq# = 20, Ack# = 1001, Data = 'Hi', size = 2 (bytes)
• Receiver → sender: Seq# = 1001, Ack# = 22, data size = 0, Rwin = 2
• Sender → receiver: Seq# = 22, Ack# = 1001, Data = 'By', size = 2 (bytes)
• Receiver → sender: Seq# = 1001, Ack# = 24, data size = 0, Rwin = 0. The buffer holds 'SteveHiBy' (seq# 15–23).
• The application reads the buffer, and the receiver sends Seq# = 1001, Ack# = 24, data size = 0, Rwin = 9.
• 3 s after it saw Rwin = 0, the sender sends a window probe: Seq# = 24, Ack# = 1001, data size = 0 (bytes).
• Receiver → sender: Seq# = 1001, Ack# = 24, data size = 0, Rwin = 9
• Sender → receiver: Seq# = 24, Ack# = 1001, Data = 'e', size = 1 (byte); 'e' is stored at seq# 24.

Window probe when the buffer is still full (same example)
• The exchange is as before until the receiver sends Seq# = 1001, Ack# = 24, data size = 0, Rwin = 0 (the buffer holds 'SteveHiBy', seq# 15–23).
• After 3 s, the sender sends a window probe: Seq# = 24, Ack# = 1001, data size = 0 (bytes).
• Receiver → sender: Seq# = 1001, Ack# = 24, data size = 0, Rwin = 0. The buffer is still full.
• After 6 s, the sender sends another window probe: Seq# = 24, Ack# = 1001, data size = 0 (bytes).
• The interval between probes grows (3 s, then 6 s, …); the max time between probes is 60 or 64 seconds.

Receiver window
• The receiver window field is 16 bits.
• Default receiver window
  • By default, the receiver window is in units of bytes.
  • Hence 64 KB is the max receiver window for any (default) implementation.
  • Is that enough?
    • Recall that the optimal window size is the bandwidth-delay product.
    • Suppose the bit rate is 100 Mbps = 12.5 MBps. Then 2^16 / 12.5 M ≈ 0.005 s = 5 msec.
    • If the RTT is greater than 5 msec, then the receiver window will force the window to be less than optimal.
  • Windows 2K had a default window size of 12 KB.
• Receiver window scale
  • During the SYN exchange, one option is the receiver window scale.
  • This option provides the amount by which to shift the receiver window.
  • E.g., if the rec win scale = 4 and rec win = 10, then the real receiver window is 10 << 4 = 160 bytes.
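A sketch of the arithmetic above: compute the bandwidth-delay product and the smallest window-scale shift whose maximum window covers it. The function names are hypothetical; the shift limit of 14 comes from RFC 7323, not from the slide.

```python
MAX_RWIN_FIELD = 2**16 - 1   # largest value of the 16-bit receiver window field
MAX_WSCALE = 14              # largest shift allowed by RFC 7323

def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return rate_bps * rtt_s / 8

def min_window_scale(rate_bps, rtt_s):
    """Smallest shift s such that MAX_RWIN_FIELD << s covers the BDP."""
    bdp = bdp_bytes(rate_bps, rtt_s)
    shift = 0
    while (MAX_RWIN_FIELD << shift) < bdp:
        shift += 1
    if shift > MAX_WSCALE:
        raise ValueError("BDP exceeds the largest scaled window")
    return shift
```

At 100 Mbps, an unscaled window lasts `MAX_RWIN_FIELD / 12.5e6` ≈ 5.2 ms of sending; a 5 ms RTT needs no scaling, while a 100 ms RTT (BDP = 1.25 MB) needs a shift of 5.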

TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to:
• initialize TCP variables: seq. #s, buffers, flow control info (e.g., RcvWindow)
• establish options and versions of TCP
Three-way handshake:
Step 1: client host sends a TCP SYN segment to the server
  • specifies the initial seq #
  • no data
Step 2: server host receives the SYN, replies with a SYNACK segment
  • server allocates buffers
  • specifies the server's initial seq. #
Step 3: client receives the SYNACK, replies with an ACK segment, which may contain data


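Connection establishment
• Client sends SYN: Seq no = 2197, ACK no = xxxx, SYN = 1, ACK = 0. Each side starts (resets) its own sequence numbering; since ACK = 0, the ACK no is invalid.
• Server sends SYN-ACK: Seq no = 12, ACK no = 2198, SYN = 1, ACK = 1. Although no new data has arrived, the ACK no is incremented (2197 + 1), because the SYN consumes one sequence number.
• Client sends ACK (for the SYN): Seq no = 2198, ACK no = 13, SYN = 0, ACK = 1.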
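A sketch of the seq/ACK arithmetic of the three segments, assuming 32-bit wraparound of sequence numbers; the function name is hypothetical.

```python
def three_way_handshake(client_isn, server_isn):
    """Seq/ACK numbers of SYN, SYN-ACK, and ACK.

    A SYN consumes one sequence number, so each side ACKs the peer's ISN + 1.
    """
    mod = 2**32
    syn = {"seq": client_isn, "ack": None, "SYN": 1, "ACK": 0}   # ACK no invalid
    syn_ack = {"seq": server_isn, "ack": (client_isn + 1) % mod, "SYN": 1, "ACK": 1}
    ack = {"seq": (client_isn + 1) % mod, "ack": (server_isn + 1) % mod,
           "SYN": 0, "ACK": 1}
    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake(2197, 12)   # the slide's numbers
```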

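Connection with losses
• If no SYN-ACK arrives, the client retransmits the SYN after 3 sec, then waits 2 × 3 = 6 sec, then 12 sec, and so on, up to 64 sec, and then gives up.
• Total waiting time: 3 + 6 + 12 + 24 + 48 + 64 = 157 sec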
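A sketch of this backoff schedule: the wait doubles after each unanswered SYN and is capped at 64 s. The parameters reproduce the slide's numbers; real stacks use different initial values and retry counts.

```python
def syn_retry_waits(initial_s=3, cap_s=64, attempts=6):
    """Wait before each SYN retransmission (and before giving up): doubling, capped."""
    return [min(initial_s * 2**k, cap_s) for k in range(attempts)]

waits = syn_retry_waits()   # [3, 6, 12, 24, 48, 64]
total = sum(waits)          # 157 seconds before the connection attempt is abandoned
```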

SYN Attack
• The attacker sends a SYN and ignores the victim's SYN-ACK.
• For each SYN, the victim reserves memory for a TCP connection. It must reserve enough for the receiver buffer, and that must be large enough to support a high data rate.
• The attacker keeps sending SYNs.
• After 157 sec, the victim gives up on the first SYN-ACK and frees the first chunk of memory.

SYN Attack
• Total memory usage = memory per connection × number of SYNs sent in 157 sec
• Number of SYNs sent in 157 sec (attacker at 10 Mbps):
  157 × 10 Mbps / (SYN size × 8) = 157 × 31,250 ≈ 5 M (31,250 SYNs/sec corresponds to 40-byte SYNs)
• Suppose memory per connection = 20 KB
• Total memory = 20 KB × 5 M = 100 GB … the machine will crash
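The same estimate as a small function, assuming 40-byte SYNs (the size implied by 31,250 SYNs/sec at 10 Mbps); the function name is hypothetical.

```python
def syn_flood_memory(attack_bps, syn_bytes, hold_s, mem_per_conn_bytes):
    """Half-open connections held at once, and the memory they pin down."""
    syns_per_s = attack_bps / (syn_bytes * 8)
    half_open = syns_per_s * hold_s          # each SYN is held for hold_s seconds
    return half_open, half_open * mem_per_conn_bytes

half_open, mem = syn_flood_memory(10e6, 40, 157, 20e3)
# half_open ≈ 4.9 M connections, mem ≈ 98 GB (the slide rounds to 5 M and 100 GB)
```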

Defense from SYN Attack
• If too many SYNs come from the same host, ignore them.
• Better attack: change the source address of each SYN to some random address.

SYN Cookie
• Do not allocate memory when the SYN arrives, but when the ACK for the SYN-ACK arrives.
• The attacker could send fake ACKs.
• But the ACK must contain the correct ACK number.
• Thus, the SYN-ACK must contain a sequence number that is
  • not predictable, and
  • does not require saving any information.
• This is what the SYN cookie method does.
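A minimal sketch of the idea, assuming an HMAC-SHA256 over the connection 4-tuple, the client's ISN, and a coarse time slot; `SECRET`, `SLOT_SECONDS`, and the function names are hypothetical. Real implementations also pack extra information (such as an MSS index) into the cookie bits, which this sketch omits.

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(16)   # hypothetical per-server secret, never sent on the wire
SLOT_SECONDS = 64         # hypothetical lifetime of one time slot

def syn_cookie(src, sport, dst, dport, client_isn, slot):
    """Server ISN for the SYN-ACK: unpredictable without SECRET, nothing stored."""
    msg = f"{src}|{sport}|{dst}|{dport}|{client_isn}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")    # 32-bit sequence number

def ack_is_valid(src, sport, dst, dport, ack_seq, ack_no, now_slot):
    """On the final ACK, recompute the cookie instead of looking anything up.

    The client's ACK carries seq = client_isn + 1 and ack = cookie + 1.
    """
    client_isn = (ack_seq - 1) % 2**32
    for slot in (now_slot, now_slot - 1):       # accept the current or previous slot
        cookie = syn_cookie(src, sport, dst, dport, client_isn, slot)
        if (cookie + 1) % 2**32 == ack_no:
            return True                         # only now allocate connection memory
    return False

current_slot = int(time.time()) // SLOT_SECONDS
```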

TCP Connection Management (cont.)
Closing a connection:
Step 1: the client end system sends a TCP packet with FIN = 1 to the server.
Step 2: the server receives the FIN and replies with an ACK (the ACK no is incremented). The server closes its side of the connection whenever it wants (by sending a pkt with FIN = 1).

TCP Connection Management (cont.)
Step 3: the client receives the FIN and replies with an ACK.
  • The client enters "timed wait": it will respond with an ACK to any received FINs.
Step 4: the server receives the ACK. Connection closed.
Note: with a small modification, TCP can handle simultaneous FINs.
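A sketch of the client side of steps 1 to 4 as a transition table. The state names (FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT) are the standard RFC 793 names, not from the slides; simultaneous FINs (the CLOSING state) are left out, so an unexpected event raises `KeyError`.

```python
# (state, event) -> (next state, segment sent or None)
CLIENT_CLOSE = {
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "FIN"),  # step 1: send FIN
    ("FIN_WAIT_1", "recv_ACK"):   ("FIN_WAIT_2", None),   # step 2: server ACKs our FIN
    ("FIN_WAIT_2", "recv_FIN"):   ("TIME_WAIT", "ACK"),   # step 3: ACK the server's FIN
    ("TIME_WAIT", "recv_FIN"):    ("TIME_WAIT", "ACK"),   # re-ACK a retransmitted FIN
    ("TIME_WAIT", "timer"):       ("CLOSED", None),       # timed wait expires
}

def run_client_close(events, state="ESTABLISHED"):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, segment = CLIENT_CLOSE[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent
```

The timed wait is what lets the client re-ACK the server's FIN if the client's first ACK was lost.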

TCP Connection Management (cont.)
(figures: TCP server lifecycle; TCP client lifecycle)

Principles of Congestion Control
Congestion:
• informally: "too many sources sending too much data too fast for the network to handle"
• different from flow control!
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queueing in router buffers)
• On the other hand, the host should send as fast as possible (to speed up the file transfer).
• a top-10 problem!
  • Low-quality solution in wired networks
  • Big problems in wireless (especially cellular)

Causes/costs of congestion: scenario 1
• two senders (Host A, Host B), two receivers
• one router, infinite buffers (unlimited shared output link buffers)
• no retransmission
• λin: original data; λout: throughput delivered to the receivers
• large delays when congested
• maximum achievable throughput

Causes/costs of congestion: scenario 2
• one router, finite buffers (finite shared output link buffers)
• sender retransmission of lost packets
• Host A sends λin: original data, and λ′in: original data plus retransmitted data; λout: throughput delivered to Host B

Causes/costs of congestion: scenario 3
• four senders
• multihop paths
• timeout/retransmit
• λin: original data; λ′in: original data plus retransmitted data; finite shared output link buffers at each router
Q: What happens as λin and λ′in increase?
• The total data rate is the sending rate plus the retransmission rate.
1. Congestion at router A will cause losses at router A and force host B to increase its sending rate of retransmitted pkts.
2. This will cause congestion at router B and force host C to increase its sending rate.
3. And so on.

Causes/costs of congestion: scenario 3
Another "cost" of congestion:
• when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted!

Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
  • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  • explicit rate the sender should send at (XCP)
Today, the network does not provide help to TCP. But this will likely change with wireless data networking.

TCP congestion control: additive increase, multiplicative decrease (AIMD)
• In go-back-N, the maximum number of unACKed pkts was N.
• In TCP, cwnd is the maximum number of unACKed bytes.
• TCP varies the value of cwnd.
• Approach: increase the transmission rate (window size), probing for usable bandwidth, until loss occurs.
  • additive increase: increase cwnd by 1 MSS every RTT until loss is detected
    • MSS = maximum segment size; it may be negotiated during connection establishment. Otherwise, it defaults to 536 B (a 576 B IP datagram minus 40 B of IP and TCP headers).
  • multiplicative decrease: cut cwnd in half after loss
Saw-tooth behavior: probing for bandwidth (figure: cwnd versus time)

AIMD (approximate description)
• When an ACK arrives:
  • cwnd = cwnd + MSS/floor(cwnd/MSS)
  • so after cwnd/MSS ACKs, cwnd has grown by 1 MSS
• When a drop is detected via triple duplicate ACK:
  • cwnd = MSS × floor((cwnd/MSS)/2)
  • i.e., cwnd ≈ cwnd/2
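The two update rules as functions, assuming MSS = 1000 bytes (the value used in the examples that follow); the function names are hypothetical.

```python
import math

MSS = 1000  # bytes (assumed, matching the example traces)

def on_new_ack(cwnd):
    """Additive increase: +MSS/floor(cwnd/MSS) per ACK, i.e. about +1 MSS per RTT."""
    return cwnd + MSS / math.floor(cwnd / MSS)

def on_triple_dup_ack(cwnd):
    """Multiplicative decrease: MSS * floor((cwnd/MSS)/2), i.e. about cwnd/2."""
    return MSS * math.floor((cwnd / MSS) / 2)

trace = [4000]
for _ in range(4):                   # one RTT's worth of ACKs at cwnd = 4 MSS
    trace.append(on_new_ack(trace[-1]))
# trace == [4000, 4250.0, 4500.0, 4750.0, 5000.0]
```

A drop detected at cwnd = 8375 gives `on_triple_dup_ack(8375) == 4000`.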

Congestion Avoidance (AIMD) (approximate)
• When an ACK arrives: cwnd = cwnd + 1/floor(cwnd) (cwnd in MSS)
• When a drop is detected via triple-dup ACK: cwnd = cwnd/2
Example (MSS = 1000 bytes, so cwnd and inflight are shown in bytes; the sender's data segments carry AN: 30 and Length: 1000, and the receiver's ACKs carry SN: 30):
• cwnd starts at 4000. The sender sends SN: 1000, 2000, 3000, 4000, so inflight = 4000.
• Each returning ACK lowers inflight to 3000 and raises cwnd by 1000/4 = 250 (4250, 4500, 4750, 5000). The sender then sends the next segment (SN: 5000, 6000, …), bringing inflight back to 4000.
• After cwnd reaches 5000, the next ACK lets inflight grow to 5000.

AIMD (approximately)
• When an ACK arrives: cwnd = cwnd + 1/floor(cwnd) (cwnd in MSS)
• When a drop is detected via triple-dup ACK: cwnd = cwnd/2
Example (MSS = 1000 bytes; segments are labeled by SN in MSS units, each with L = 1 MSS):
• cwnd = 8000; the sender sends SN: 1 MSS, 2 MSS, …
• New ACKs (AN = 2 MSS, 3 MSS, …) raise cwnd by 1/8 MSS each: 8125, 8250, 8375.
• One segment is lost, so the following ACKs all carry the same AN (duplicate ACKs).
• On the 3rd dup-ACK, the drop is detected: the sender retransmits the lost packet and sets cwnd = cwnd/2, i.e., 4000 (MSS × floor(8.375/2)).
• 4 pkts are already unACKed, so the sender doesn't send anymore until more of them are ACKed.
• Afterwards, cwnd again grows by 1/4 MSS per new ACK: 4250, 4500, 4750, 5000.

Fast recovery (actual, not approximate)
• Upon the first two dup ACKs, do nothing: don't send any packets (InFlight stays the same).
• Upon the third dup ACK:
  • set SSThresh = cwnd/2
  • set cwnd = cwnd/2 + 3
  • retransmit the requested packet
• Upon every further dup ACK, cwnd = cwnd + 1.
• If InFlight < cwnd, send a packet and increment InFlight.
• When a new ACK arrives, set cwnd = SSThresh (Reno).
• Alternatively, when an ACK arrives that ACKs all packets that were outstanding when the first drop was detected, set cwnd = SSThresh (NewReno).
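A sketch of the Reno variant of these rules with cwnd counted in MSS. The class name is hypothetical; slow start, the InFlight < cwnd send rule, and the NewReno exit condition are left out, and SSThresh is set to exactly cwnd/2 (no rounding to whole MSS).

```python
import math

class RenoSender:
    """Reno fast retransmit / fast recovery, cwnd in MSS."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = None
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmitted = []                   # pkts resent by fast retransmit

    def on_dup_ack(self, ack_no):
        self.dup_acks += 1
        if self.dup_acks == 3:                    # third dup ACK: loss detected
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3
            self.retransmitted.append(ack_no)     # resend the requested pkt
            self.in_recovery = True
        elif self.dup_acks > 3:                   # each further dup ACK
            self.cwnd += 1
        # first and second dup ACK: do nothing

    def on_new_ack(self):
        self.dup_acks = 0
        if self.in_recovery:
            self.cwnd = self.ssthresh             # Reno: deflate on the first new ACK
            self.in_recovery = False
        else:
            self.cwnd += 1 / math.floor(self.cwnd)  # congestion avoidance
```

Starting from cwnd = 8, three dup ACKs give SSThresh = 4 and cwnd = 7; two more raise cwnd to 9; the first new ACK drops it back to 4, and the next one resumes additive increase (4.25).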

Congestion Avoidance (AIMD)
• When an ACK arrives: cwnd = cwnd + 1/floor(cwnd) (cwnd in MSS)
• When a drop is detected via triple-dup ACK: cwnd = cwnd/2
Example: the same trace as in the approximate version (MSS = 1000 bytes): cwnd grows 4000 → 4250 → 4500 → 4750 → 5000 as new ACKs arrive, while inflight alternates between 3000 and 4000. This version adds an ssthresh column, which stays at 0 throughout because no drop occurs.

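Congestion Avoidance (AIMD) (actual, not approximate)
• When an ACK arrives: cwnd = cwnd + 1/floor(cwnd) (cwnd in MSS)
• When a drop is detected via triple-dup ACK: fast recovery, as on the previous slide
Example (MSS = 1000 bytes; cwnd, inflight, and ssthresh in bytes; segments labeled by SN in MSS units, each with L = 1 MSS):
• cwnd = 8000; the sender sends SN: 1 MSS, 2 MSS, …; new ACKs raise cwnd to 8125, 8250, 8375.
• The segment with SN = 4 MSS is lost, so the receiver keeps sending AN = 4 MSS (duplicate ACKs).
• On the 3rd dup-ACK, the sender retransmits SN = 4 MSS and sets ssthresh = 4000 (half of cwnd, in whole MSS) and cwnd = ssthresh + 3 MSS = 7000.
• Each further dup-ACK adds 1 MSS: cwnd = 8000, 9000, 10000. This lets the sender transmit new segments (SN = 12 MSS, 13 MSS) while the loss is being repaired.
• The retransmission is ACKed with AN = 12 MSS, which ACKs everything sent before the drop; cwnd is set to ssthresh = 4000. Only SN = 12 and 13 MSS are in flight (inflight = 2000), so the sender sends SN = 14 MSS and 15 MSS.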

TCP Performance
• Q1: What is the rate at which packets are sent?
  • How many pkts are sent in an RTT? cwnd of them.
  • Rate = cwnd / RTT
• Q2: At what rate does cwnd increase?
  • How often does cwnd increase by 1? Each RTT, cwnd increases by 1.
  • dcwnd/dt = 1/RTT, so dRate/dt = 1/RTT²
(figure: seq# (in MSS) versus time; cwnd grows from 4 to 5 (4.25, 4.5, 4.75, 5) during one RTT and from 5 to 6 (5.2, 5.4, 5.6, 5.8, 6) during the next)
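A sketch that reproduces the figure's numbers, assuming no losses and that every segment sent in one RTT is ACKed before the next RTT starts. Exact fractions are used so that cwnd lands exactly on whole numbers; the function names are hypothetical.

```python
import math
from fractions import Fraction

def cwnd_per_rtt(cwnd0, rtts):
    """cwnd (in MSS) at the start of each RTT under cwnd += 1/floor(cwnd) per ACK."""
    cwnd = Fraction(cwnd0)
    history = [cwnd]
    for _ in range(rtts):
        for _ in range(math.floor(cwnd)):         # one ACK per segment sent this RTT
            cwnd += Fraction(1, math.floor(cwnd))
        history.append(cwnd)
    return history

def send_rate(cwnd_mss, mss_bytes, rtt_s):
    """Rate = cwnd / RTT, in bytes per second."""
    return cwnd_mss * mss_bytes / rtt_s
```

`cwnd_per_rtt(4, 3)` gives 4, 5, 6, 7: one MSS per RTT. With a hypothetical 1000-byte MSS and 100 ms RTT, the rate therefore grows by 10,000 bytes/s every RTT.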

TCP Start Up
• What should the initial value of cwnd be?
  • Option one: large; it should be a rough guess of the steady-state value of cwnd. But this might cause too much congestion.
  • Option two: do it more slowly = slow start
• Slow Start
  • Initially, cwnd = cwnd0 (typically 1, 2, or 3)
  • When a non-dup ACK arrives: cwnd = cwnd + 1
  • When a pkt loss is detected, exit slow start

Slow start
Example (data segments of size 1000):
• Handshake: SYN (Seq# = 20, Ack# = X); SYN-ACK (Seq# = 1000, Ack# = 21); ACK (Seq# = 21, Ack# = 1001)
• cwnd = 1: the sender sends Seq# = 21 (size 1000).
• The ACK (Seq# = 1001, Ack# = 1021, size 0) raises cwnd to 2: the sender sends Seq# = 1021 and Seq# = 2021.
• Each returning ACK raises cwnd by 1 (3, 4, …, 8), so each ACK releases two new segments.
• Later, ACKs with the same Ack# = 1021 repeat: a triple dup ACK signals a loss, and cwnd drops from 8 to 4.

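Slow start, then congestion avoidance
(figure: cwnd versus time; exponential growth in slow start, then drops and the saw-tooth of congestion avoidance)
• After a drop in slow start, TCP switches to AIMD (congestion avoidance).
• How quickly does cwnd increase during slow start? How much does it increase in 1 RTT?
• Each new ACK adds 1, and about cwnd ACKs arrive per RTT, so cwnd roughly doubles each RTT; it grows exponentially: cwnd(t) ≈ cwnd0 × 2^(t/RTT).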
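A sketch of the doubling, assuming no losses and that every segment sent in one RTT is ACKed before the next RTT starts; the function name is hypothetical.

```python
def slow_start_per_rtt(cwnd0, rtts):
    """cwnd (in MSS) at the start of each RTT when every new ACK adds 1 MSS."""
    cwnd = cwnd0
    history = [cwnd]
    for _ in range(rtts):
        acks = cwnd            # one ACK per segment sent in this RTT
        for _ in range(acks):
            cwnd += 1          # slow start: +1 MSS per new ACK
        history.append(cwnd)
    return history

# slow_start_per_rtt(1, 4) -> [1, 2, 4, 8, 16]: cwnd doubles every RTT
```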

Slow start
• The exponential growth of cwnd during slow start can get a bit out of control.
• To tame things:
• Initially:
  • cwnd = 1, 2, or 3
  • SSThresh = SSThresh0 (e.g., 44 MSS)
• When a new ACK arrives:
  • cwnd = cwnd + 1
  • if cwnd >= SSThresh, go to congestion avoidance
  • if a triple dup ACK occurs, cwnd = cwnd/2 and go to congestion avoidance

TCP Behavior
(figure: cwnd versus time; slow start until cwnd = ssthresh, then congestion avoidance, with drops and a return to slow start)

Time out?
• Detecting losses with a timeout is considered to be an indication of severe congestion.
• When a timeout occurs:
  • SSThresh = cwnd/2
  • cwnd = 1
  • RTO = 2 × RTO
  • Enter slow start

Time Out
Example: cwnd = 8 when the RTO expires, so SSThresh = 4 and cwnd = 1.
• Slow start: cwnd = 1, 2, 3, 4
• cwnd = ssthresh ⇒ exit slow start and enter congestion avoidance: cwnd = 4.25, 4.5, 4.75, 5, …

Time out
• Successive timeouts wait RTO, then 2 × RTO, then min(4 × RTO, 64 sec), …
• Give up if there is no ACK for ~120 sec.

Rough view of TCP congestion control
(figure: cwnd versus time; slow start until cwnd = ssthresh, congestion avoidance with drops, and slow start again)

TCP Tahoe (old version of TCP)
• Enter slow start after every loss.
(figure: cwnd versus time; each drop is followed by slow start, then congestion avoidance)

Summary of TCP congestion control
• Theme: probe the system.
  • Slowly increase cwnd until there is a packet drop. A drop must imply that the cwnd size (or the sum of the window sizes) is larger than the BWDP (bandwidth-delay product).
  • Once a packet is dropped, decrease cwnd, and then continue to slowly increase it.
• Two phases:
  • slow start (to get to the ballpark of the correct cwnd)
  • congestion avoidance (to oscillate around the correct cwnd size)
(figure: state diagram with states connection establishment, slow start, congestion avoidance, and connection termination; transitions labeled cwnd > ssthresh, triple dup ack, and timeout)

Slow start state chart

Congestion avoidance state chart

TCP sender congestion control
• State: Slow Start (SS). Event: ACK receipt for previously unACKed data.
  Action: CongWin = CongWin + MSS; if (CongWin > Threshold), set state to "Congestion Avoidance".
  Commentary: results in a doubling of CongWin every RTT.
• State: Congestion Avoidance (CA). Event: ACK receipt for previously unACKed data.
  Action: CongWin = CongWin + MSS × (MSS/CongWin).
  Commentary: additive increase, resulting in an increase of CongWin by 1 MSS every RTT.
• State: SS or CA. Event: loss event detected by triple duplicate ACK.
  Action: Threshold = CongWin/2, CongWin = Threshold, set state to "Congestion Avoidance".
  Commentary: fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
• State: SS or CA. Event: timeout.
  Action: Threshold = CongWin/2, CongWin = 1 MSS, set state to "Slow Start".
  Commentary: enter slow start.
• State: SS or CA. Event: duplicate ACK.
  Action: increment the duplicate ACK count for the segment being ACKed.
  Commentary: CongWin and Threshold are not changed.
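A sketch of this table as an event-driven class, assuming MSS = 1000 bytes and a hypothetical initial Threshold of 64 MSS (the slides' example uses 44 MSS). Only the table's rows are implemented; RTO backoff and retransmission itself are left out.

```python
from dataclasses import dataclass

MSS = 1000  # bytes (assumed)

@dataclass
class CongestionControl:
    cong_win: float = MSS
    threshold: float = 64 * MSS   # hypothetical initial threshold
    state: str = "SS"             # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self):
        self.dup_acks = 0
        if self.state == "SS":
            self.cong_win += MSS                        # doubles CongWin every RTT
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            self.cong_win += MSS * MSS / self.cong_win  # +1 MSS per RTT

    def on_dup_ack(self):
        self.dup_acks += 1                              # CongWin, Threshold unchanged...
        if self.dup_acks == 3:                          # ...until the triple dup ACK
            self.threshold = self.cong_win / 2
            self.cong_win = max(self.threshold, MSS)    # never below 1 MSS
            self.state = "CA"

    def on_timeout(self):
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```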

TCP Performance 1: ACK Clocking
What is the maximum rate at which TCP can send data?
Example: source → 1 Gbps link → router → 10 Mbps link → destination (1500-byte pkts)
• The source puts pkts onto the 1 Gbps link at 1 Gbps/pkt size = 1 pkt each 12 usec.
• The 10 Mbps bottleneck forwards them at 10 Mbps/pkt size = 1 pkt each 1.2 msec.
• The destination sends one ACK per pkt, so ACKs are sent at 10 Mbps/pkt size = 1 ACK every 1.2 msec.
• The source sends 1 pkt for each ACK, so pkts are then sent at 10 Mbps/pkt size = 1 pkt every 1.2 msec.
The sending rate is the correct data rate. No congestion should occur! This is due to ACK clocking: pkts are clocked out as fast as ACKs arrive.
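A sketch of ACK clocking on this example, assuming 1500-byte pkts, a FIFO bottleneck, and a hypothetical fixed 20 ms delay for each ACK to reach the source; the names are hypothetical. A back-to-back burst leaves the bottleneck 1.2 msec apart, and because each ACK releases one new pkt, the next round of sends inherits that spacing.

```python
PKT_BITS = 1500 * 8   # 1500-byte pkts (implied by 1.2 msec at 10 Mbps)

def tx_time(rate_bps, bits=PKT_BITS):
    return bits / rate_bps

def fifo_departures(arrivals, rate_bps):
    """Departure times from a FIFO link that transmits one pkt at a time."""
    departures, free_at = [], 0.0
    for a in arrivals:
        start = max(a, free_at)              # queue if the link is still busy
        free_at = start + tx_time(rate_bps)
        departures.append(free_at)
    return departures

burst = [tx_time(1e9) * (i + 1) for i in range(4)]   # 12 usec apart at 1 Gbps
at_dest = fifo_departures(burst, 10e6)               # 1.2 msec apart after the bottleneck
ACK_DELAY = 0.020                                    # hypothetical fixed ACK return delay
next_sends = [t + ACK_DELAY for t in at_dest]        # one new pkt per arriving ACK
gaps = [b - a for a, b in zip(next_sends, next_sends[1:])]
# every gap is 1.2 msec: the source now sends at the bottleneck rate
```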

TCP throughput

TCP throughput
(figure: saw-tooth cwnd oscillating between w/2 and w)
• Mean window value = (w + w/2)/2 = 3w/4
• Throughput = mean window/RTT = (3/4)·w/RTT

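TCP Throughput
How many packets are sent during one cycle (i.e., one tooth of the saw-tooth)?
The "tooth" starts at w/2 and increments by one each RTT, up to w, so the sum has w/2 + 1 terms:
$$
\begin{aligned}
\frac{w}{2} + \left(\frac{w}{2}+1\right) + \cdots + \left(\frac{w}{2}+\frac{w}{2}\right)
&= \frac{w}{2}\left(\frac{w}{2}+1\right) + \left(0+1+2+\cdots+\frac{w}{2}\right)\\
&= \frac{w}{2}\left(\frac{w}{2}+1\right) + \frac{1}{2}\cdot\frac{w}{2}\left(\frac{w}{2}+1\right)\\
&= \frac{3}{2}\left(\frac{w}{2}\right)^2 + \frac{3}{2}\cdot\frac{w}{2} \approx \frac{3}{8}w^2
\end{aligned}
$$
So one out of every (3/8)w² packets is dropped. This gives a loss probability of
$$p = \frac{1}{\tfrac{3}{8}w^2} \quad\Longrightarrow\quad w = \frac{\sqrt{8/3}}{\sqrt{p}}$$
Combining with the first equation (throughput in pkts per unit time; multiply by MSS for bytes):
$$\text{Throughput} = \frac{\tfrac{3}{4}w}{\text{RTT}} = \frac{\tfrac{3}{4}\sqrt{8/3}}{\text{RTT}\sqrt{p}} = \frac{\sqrt{3/2}}{\text{RTT}\sqrt{p}}$$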
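A numerical check of the derivation, assuming an idealized deterministic saw-tooth (cwnd = w/2, w/2 + 1, …, w, one value per RTT, then exactly one drop). The function names are hypothetical.

```python
import math

def sawtooth_throughput(w, rtt=1.0):
    """Exact throughput (pkts per unit time) and loss rate of an ideal saw-tooth."""
    windows = range(w // 2, w + 1)             # cwnd in each RTT of one tooth
    pkts_per_tooth = sum(windows)
    throughput = pkts_per_tooth / (len(windows) * rtt)
    p = 1 / pkts_per_tooth                     # one drop per tooth
    return throughput, p

def throughput_formula(rtt, p):
    """Throughput = sqrt(3/2) / (RTT * sqrt(p))."""
    return math.sqrt(3 / 2) / (rtt * math.sqrt(p))
```

For w = 100, the exact throughput is 75 pkts per RTT (= 3w/4) and the formula gives about 75.7; the gap comes from dropping the lower-order term in the (3/8)w² approximation and shrinks as w grows.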

TCP Fairness
Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K.
(figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R)

Why is TCP fair?
Two competing sessions:
• Additive increase gives a slope of 1 as throughput increases.
• Multiplicative decrease decreases throughput proportionally.
(figure: connection 2 throughput versus connection 1 throughput, both up to R; the trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by a factor of 2), approaching the equal bandwidth share line)
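A sketch of the two-session picture, assuming both sessions see every loss at the same time (synchronized losses) and that a loss occurs whenever their total exceeds the capacity; the function name and numbers are hypothetical. Additive increase leaves the gap between the sessions unchanged, while each halving cuts it in half, so the rates converge.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Two AIMD sessions sharing one bottleneck, one step per RTT."""
    for _ in range(rounds):
        if x1 + x2 > capacity:        # loss: both decrease multiplicatively
            x1, x2 = x1 / 2, x2 / 2
        else:                         # no loss: both increase additively
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2

x1, x2 = aimd_two_flows(80.0, 10.0, 100.0, 1000)   # very unequal start
# after enough cycles x1 and x2 are essentially equal
```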

RTT unfairness
• Throughput = sqrt(3/2) / (RTT × sqrt(p))
• A shorter RTT will get a higher throughput, even if the loss probability is the same.
(figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R)
Two connections share the same bottleneck, so they share the same critical resources, yet the one with a shorter RTT receives higher throughput, and thus receives a higher fraction of the critical resources.

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control.
• Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss.
• Research area: TCP friendly
Fairness and parallel TCP connections
• Nothing prevents an app from opening parallel connections between 2 hosts.
• Web browsers do this.
• Example: a link of rate R supporting 9 connections:
  • a new app asks for 1 TCP, gets rate R/10
  • a new app asks for 11 TCPs, gets 11R/20 ≈ R/2!
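The parallel-connection arithmetic, assuming every TCP connection gets an equal share of the bottleneck (the idealized fairness goal); the function name is hypothetical.

```python
from fractions import Fraction

def app_share(existing_conns, new_conns):
    """Fraction of the bottleneck rate R that an app opening new_conns gets."""
    return Fraction(new_conns, existing_conns + new_conns)

# app_share(9, 1) -> 1/10 of R; app_share(9, 11) -> 11/20 of R, about R/2
```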

TCP problems: TCP over "long, fat pipes"
• Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• Requires window size W = 83,333 in-flight segments
• Throughput in terms of loss rate:
$$\text{Throughput} = \frac{1.22\cdot\text{MSS}}{\text{RTT}\sqrt{p}} \quad\Longrightarrow\quad p \approx 2\cdot 10^{-10}$$
  • Random loss from bit errors on fiber links may have a higher loss probability
• New versions of TCP for high-speed
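A sketch that reproduces both numbers from the slide's example; the function names are hypothetical, and the 1.22 constant is the one used on the slide.

```python
def required_window(rate_bps, rtt_s, mss_bytes):
    """Segments that must be in flight: rate * RTT / segment size."""
    return rate_bps * rtt_s / (mss_bytes * 8)

def required_loss_rate(rate_bps, rtt_s, mss_bytes):
    """Solve rate = 1.22 * MSS / (RTT * sqrt(p)) for p (MSS converted to bits)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2

W = required_window(10e9, 0.100, 1500)      # about 83,333 segments
p = required_loss_rate(10e9, 0.100, 1500)   # about 2.1e-10
```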

TCP over wireless
• In the simple case, wireless links have random losses.
• These random losses will result in a low throughput, even if there is little congestion.
• However, link-layer retransmissions can dramatically reduce the loss probability.
• Nonetheless, there are several problems:
  • Wireless connections might occasionally break. TCP behaves poorly in this case.
  • The throughput of a wireless link may vary quickly. TCP is not able to react quickly enough to changes in the conditions of the wireless channel.

Chapter 3: Summary
• principles behind transport layer services:
  • multiplexing, demultiplexing
  • reliable data transfer
  • flow control
  • congestion control
• instantiation and implementation in the Internet
  • UDP
  • TCP
Next:
• leaving the network "edge" (application, transport layers)
• into the network "core"