TCP
EE 122: Intro to Communication Networks
Fall 2010 (MW 4-5:30 in 101 Barker)
Scott Shenker
TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson and other colleagues at Princeton and UC Berkeley

Today’s Lecture
• Review some basic concepts from routing lectures
  – Lots of details in previous lectures
  – Today just focus on a few key points
• Basic overview of TCP
  – Service model and header structure
  – Segments and sequence numbers
  – Setting up and tearing down connections
  – Timers and retransmissions
  – Many details ignored
  – Congestion control next lecture

Quick Review

Review of BGP: Simplified Version
• If domains A and B have an interdomain link:
  – Their border routers announce routes to each other
    o One route for each reachable prefix
  – Routes announced whenever changed or withdrawn
    o Route withdrawn when domain no longer offering path to prefix
    o It usually has a path itself, but is choosing to not export that path
• Policies:
  – Import policy: which routes the domain will use
    o Chooses among routes advertised by neighbors
  – Export policy: which routes the domain lets others use
    o Purely a filtering policy
    o Domain can only advertise routes it imported

Review of DVMRP: Simplified Version
• Starts by broadcasting along reverse path tree
  – Tree formed by paths from members to source
  – Why is this a tree?
• Prune tree, to avoid sending wasted messages
  – Leaf networks start by issuing NMR
    o Non-Membership Report
  – If all of a router’s children send NMRs, and it has no local members, then it sends an NMR to its parent
• This builds source-specific trees:
  – Packets from source S to group member m follow same path that packets from m take to reach S (in reverse)

Constructing Source-Specific Tree
[Figure: a source and group members M1, M2, M3]
• Individual paths from members to source
• Union of these paths forms a tree
• Data packets sent in opposite direction down tree

Review of CBT: Simplified Version
• Picks core (root, center, whatever) for each group
• Member sends join message towards core
• This builds a shared spanning tree
  – Later joins are “grafted” onto existing tree
• Packets are delivered over this tree using standard flooding over spanning tree
  – Send out on all but incoming interface

Building and Using Shared Tree
• Group members send joins to core
  – Joins are grafted on to tree
• M1 sends data to group
[Figure: core and members M1, M2, M3; legend: control (join) messages, data]

Review of Fair Sharing
• Given a set of bandwidth demands r_i and a total bandwidth C, the max-min bandwidth allocations are: a_i = min(f, r_i)
• where f is the unique value such that Sum(a_i) = C
• Property:
  – If you don’t get full demand, no one gets more than you
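
The fair-share value f can be found by satisfying the smallest demands first and splitting what is left equally. The sketch below is one way to do that; the function name `max_min_allocation` and the example numbers are illustrative, not from the lecture. When the total demand does not exceed C there is no finite cap, so the sketch returns infinity for f in that case.

```python
def max_min_allocation(demands, capacity):
    """Return (f, allocations) where allocations[i] = min(f, demands[i])."""
    if sum(demands) <= capacity:
        # Every demand fits, so nobody is capped (assumption: report f as infinity).
        return float("inf"), list(demands)
    remaining = capacity
    pending = sorted(demands)
    n = len(pending)
    f = None
    for i, r in enumerate(pending):
        share = remaining / (n - i)  # equal split of what is left
        if r > share:
            f = share                # this demand and all larger ones are capped at f
            break
        remaining -= r               # small demand is fully satisfied
    return f, [min(f, r) for r in demands]


f, alloc = max_min_allocation([1, 3, 5, 8], 10)
print(f, alloc)
```

With demands 1, 3, 5, 8 and C = 10, the two smallest demands are met in full and the remaining 6 units are split between the other two, so f = 3.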

TCP

TCP Service Model
• Reliable, in-order, duplex byte-stream delivery
  – Hopefully with good performance
• Challenges: the network can
  – drop packets
    o Even perhaps a large number
  – delay packets
    o Even perhaps for many seconds
  – deliver packets out-of-order
    o Follows from possibility of arbitrary delay
  – replicate packets
    o Weird, but it does sometimes happen
  – corrupt packets

TCP Support for Reliable Delivery
• Checksum
  – Used to detect corrupted data at the receiver
  – … leading the receiver to drop the packet
• Sequence numbers
  – Used to detect missing data
  – … and for putting the data back in order
• Retransmission
  – Sender retransmits lost or corrupted data
  – Timeout based on estimates of round-trip time
  – Fast retransmit algorithm for rapid retransmission

TCP Header (one 32-bit row per line)
Source port | Destination port
Sequence number
Acknowledgment
Hdr. Len | 0 | Flags | Advertised window
Checksum | Urgent pointer
Options (variable)
Data

TCP Header
• Source port, Destination port: these should be familiar

TCP Header
• Sequence number: starting sequence number (byte offset) of data carried in this segment

TCP Header
• Acknowledgment: gives seq # just beyond highest seq. received in order (“What’s Next”)
  – If sender sends N in-order bytes starting at seq S then ack for it will be S+N

ACKing and Sequence Numbers
• Sender sends packet
  – Data starts with sequence number X
  – Packet contains B bytes
    o X, X+1, X+2, …, X+B-1
• Upon receipt of packet, receiver sends an ACK
  – If all data prior to X already received:
    o ACK acknowledges X+B (because that is next expected byte)
  – If highest byte already received is some smaller value Y
    o ACK acknowledges Y+1
    o Even if this has been ACKed before
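
The receiver-side rule above can be sketched as a small state machine. This is a simplified illustration, not an actual TCP implementation: the class name `Receiver` is made up, segments are assumed never to partially overlap, and sequence-number wraparound is ignored.

```python
class Receiver:
    """Tracks the next expected byte and produces cumulative ACK numbers."""

    def __init__(self, first_byte):
        self.next_expected = first_byte  # first byte not yet received in order
        self.buffered = {}               # out-of-order segments: start -> length

    def on_segment(self, seq, length):
        """Handle bytes seq .. seq+length-1 and return the ACK number to send."""
        if seq == self.next_expected:
            self.next_expected += length
            # Absorb buffered segments that have now become contiguous.
            while self.next_expected in self.buffered:
                self.next_expected += self.buffered.pop(self.next_expected)
        elif seq > self.next_expected:
            self.buffered[seq] = length  # hole before this segment; keep it for later
        # seq < next_expected: old data, nothing new to record
        return self.next_expected        # always ACK the next expected byte


r = Receiver(100)
print(r.on_segment(100, 50))  # in order
print(r.on_segment(200, 50))  # out of order: repeats the previous ACK
print(r.on_segment(150, 50))  # fills the hole
```

Note that the second call returns the same ACK number as the first, which is exactly the "even if this has been ACKed before" case.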

TCP Header
• Advertised window: buffer space available for receiving data
  – Used for TCP’s sliding window
  – Interpreted as offset beyond Acknowledgment field’s value

Flow Control
• Advertised Window: W
  – Can send W bytes beyond the next expected byte
• Receiver uses W to prevent sender from overflowing buffer

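Sliding Window
• Allow a given amount of data “in flight”
[Figure: the sending process writes into the sender’s TCP buffer, marked with Last byte written, Last byte ACKed, and Last byte can send (the window); the receiving process reads from the receiver’s TCP buffer, marked with Last byte read, Next byte needed, and Last byte received]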

Advertised Window Limits Rate
• If the window is W, then sender can send no faster than W/RTT bytes/sec
  – Receiver implicitly limits sender to rate that receiver can sustain
  – If sender is going too fast, window advertisements get smaller & smaller
• In original TCP design, that was the sole protocol mechanism controlling sender’s rate
• What’s missing?
  – Will cover that next time…
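
A quick worked example of the W/RTT bound. The numbers are assumptions chosen for illustration: a 65,535-byte window (the largest value a 16-bit window field can carry without window scaling) and a 100 ms round-trip time.

```python
def max_rate_bytes_per_sec(window_bytes, rtt_ms):
    """Upper bound on throughput: at most one window of data per round trip."""
    return window_bytes * 1000 / rtt_ms


rate = max_rate_bytes_per_sec(65535, 100)
print(rate)                 # bytes per second
print(rate * 8 / 1e6)       # the same bound in megabits per second
```

Under these assumptions the sender is limited to roughly 5 Mbit/s no matter how fast the link is, because it must stop and wait for ACKs once a full window is in flight.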

TCP Header
• Hdr. Len: number of 4-byte words in TCP header; 5 = no options

TCP Header
• 0: “Must Be Zero”, 6 bits reserved

TCP Header
• Flags: we will get to these shortly

TCP Header
• Urgent pointer: used with URG flag to indicate urgent data (not discussed further)

Segments and Sequence Numbers

TCP “Stream of Bytes” Service
[Figure: Host A writes Byte 0, Byte 1, Byte 2, Byte 3, …, Byte 80; Host B reads the same bytes in the same order]

… Provided Using TCP “Segments”
[Figure: Host A’s byte stream (Byte 0 … Byte 80) is carried as TCP Data in segments and delivered as the same byte stream at Host B]
• Segment sent when:
  1. Segment full (Max Segment Size),
  2. Not full, but times out, or
  3. “Pushed” by application.

TCP Segment
[Figure: an IP packet consisting of the IP Hdr followed by IP Data; the IP Data holds the TCP Hdr followed by TCP Data (segment)]
• IP packet
  – No bigger than Maximum Transmission Unit (MTU)
  – E.g., up to 1,500 bytes on an Ethernet
• TCP packet
  – IP packet with a TCP header and data inside
  – TCP header 20 bytes long
• TCP segment
  – No more than Maximum Segment Size (MSS) bytes
  – E.g., up to 1,460 consecutive bytes from the stream
  – MSS = MTU – (IP header) – (TCP header)
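
The MSS formula above is easy to check numerically. This sketch assumes 20-byte IP and TCP headers (that is, no options); the helper names `mss` and `segment_count` are illustrative.

```python
def mss(mtu, ip_header=20, tcp_header=20):
    """MSS = MTU - (IP header) - (TCP header), all in bytes."""
    return mtu - ip_header - tcp_header


def segment_count(stream_bytes, mss_bytes):
    """Number of full-or-partial segments needed to carry stream_bytes."""
    return -(-stream_bytes // mss_bytes)  # ceiling division


ethernet_mss = mss(1500)
print(ethernet_mss)                       # 1460 for a 1,500-byte Ethernet MTU
print(segment_count(4000, ethernet_mss))  # a 4,000-byte write needs 3 segments
```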

Sequence Numbers
[Figure: Host A’s byte stream starts at the ISN (initial sequence number); each segment from A carries a TCP HDR and TCP Data, with sequence number = 1st byte of that data; each ACK from Host B carries ACK sequence number = next expected byte]

Initial Sequence Number (ISN)
• Sequence number for the very first byte
  – E.g., Why not just use ISN = 0?
• Practical issue
  – IP addresses and port #s uniquely identify a connection
  – Eventually, though, these port #s do get used again
  – … small chance an old packet is still in flight
  – … and might be associated with new connection
• TCP therefore requires changing ISN
  – Set from 32-bit clock that ticks every 4 microseconds
  – … only wraps around once every 4.55 hours (the figure quoted in RFC 793; 2^32 ticks × 4 µs works out closer to 4.77 hours)
• To establish a connection, hosts exchange ISNs

Connection Establishment: TCP’s Three-Way Handshake

Establishing a TCP Connection
[Figure: timeline between A and B: SYN from A, SYN ACK from B, ACK from A, then Data. Each host tells its ISN to the other host.]
• Three-way handshake to establish connection
  – Host A sends a SYN (open; “synchronize sequence numbers”) to host B
  – Host B returns a SYN acknowledgment (SYN ACK)
  – Host A sends an ACK to acknowledge the SYN ACK

TCP Header
• Flags: SYN, ACK, FIN, RST, PSH, URG
• See /usr/include/netinet/tcp.h on Unix Systems
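
To make the header layout concrete, here is a minimal sketch that packs and parses the fixed 20-byte part of a TCP header with Python’s `struct` module. The flag bit values follow the standard TCP layout (FIN is the lowest bit, URG the sixth); the function name `parse_tcp_header`, the dictionary keys, and the sample port and sequence numbers are illustrative choices. Options and checksum verification are left out.

```python
import struct

# Bit positions of the six flags within the flags byte.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

HEADER_FORMAT = "!HHIIBBHHH"  # network byte order, 20 bytes in total


def parse_tcp_header(raw):
    """Decode the fixed 20-byte portion of a TCP header into a dict."""
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack(HEADER_FORMAT, raw[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len_bytes": (offset_byte >> 4) * 4,  # Hdr. Len counts 4-byte words
        "flags": [name for name, bit in FLAG_BITS.items() if flag_byte & bit],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }


# A hand-built SYN: Hdr. Len = 5 words, only the SYN bit set.
syn = struct.pack(HEADER_FORMAT, 12345, 80, 1000, 0, 5 << 4, FLAG_BITS["SYN"], 65535, 0, 0)
print(parse_tcp_header(syn))
```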

Step 1: A’s Initial SYN Packet
• Source port: A’s port
• Destination port: B’s port
• Sequence number: A’s Initial Sequence Number
• Acknowledgment: (irrelevant since ACK not set)
• Hdr. Len: 5 (= 20 B)
• Flags: SYN set (of SYN, ACK, FIN, RST, PSH, URG)
• Advertised window, Checksum, Urgent pointer, Options (variable)
A tells B it wants to open a connection…

Step 2: B’s SYN-ACK Packet
• Source port: B’s port
• Destination port: A’s port
• Sequence number: B’s Initial Sequence Number
• Acknowledgment: ACK = A’s ISN plus 1
• Hdr. Len: 5 (= 20 B)
• Flags: SYN and ACK set (of SYN, ACK, FIN, RST, PSH, URG)
• Advertised window, Checksum, Urgent pointer, Options (variable)
B tells A it accepts, and is ready to hear the next byte…
… upon receiving this packet, A can start sending data

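Step 3: A’s ACK of the SYN-ACK
• Source port: A’s port
• Destination port: B’s port
• Sequence number: A’s Initial Sequence Number plus 1 (the SYN itself used up one sequence number)
• Acknowledgment: B’s ISN plus 1
• Hdr. Len: 5 (= 20 B)
• Flags: ACK set (of SYN, ACK, FIN, RST, PSH, URG)
• Advertised window, Checksum, Urgent pointer, Options (variable)
A tells B it’s likewise okay to start sending
… upon receiving this packet, B can start sending data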

Timing Diagram: 3-Way Handshaking
• Server (passive open): listen()
• Client (initiator, active open): connect()
  1. Client → Server: SYN, SeqNum = x
  2. Server → Client: SYN + ACK, SeqNum = y, Ack = x + 1
  3. Client → Server: ACK, Ack = y + 1
• Server: accept()
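
The sequence and acknowledgment numbers in the three handshake messages follow mechanically from the two ISNs. The sketch below just computes them; the function name `three_way_handshake` and the dictionary layout are illustrative, and arithmetic is done modulo 2^32 because sequence numbers are 32 bits wide.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(x, y):
    """Return the three handshake segments for client ISN x and server ISN y."""
    return [
        {"from": "client", "flags": "SYN", "seq": x, "ack": None},
        {"from": "server", "flags": "SYN+ACK", "seq": y, "ack": (x + 1) % SEQ_SPACE},
        {"from": "client", "flags": "ACK", "seq": (x + 1) % SEQ_SPACE, "ack": (y + 1) % SEQ_SPACE},
    ]


for segment in three_way_handshake(1000, 5000):
    print(segment)
```

Each SYN consumes one sequence number, which is why both acknowledgments are "ISN plus 1" even though no data has been sent yet.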

What if the SYN Packet Gets Lost?
• Suppose the SYN packet gets lost
  – Packet is lost inside the network, or:
  – Server discards the packet (e.g., listen queue is full)
• Eventually, no SYN-ACK arrives
  – Sender sets a timer and waits for the SYN-ACK
  – … and retransmits the SYN if needed
• How should the TCP sender set the timer?
  – Sender has no idea how far away the receiver is
  – Hard to guess a reasonable length of time to wait
  – SHOULD (RFCs 1122 & 2988) use default of 3 seconds
    o Other implementations instead use 6 seconds

SYN Loss and Web Downloads
• User clicks on a hypertext link
  – Browser creates a socket and does a “connect”
  – The “connect” triggers the OS to transmit a SYN
• If the SYN is lost…
  – 3-6 seconds of delay: can be very long
  – User may become impatient
  – … and click the hyperlink again, or click “reload”
• User triggers an “abort” of the “connect”
  – Browser creates a new socket and another “connect”
  – Essentially, forces a faster send of a new SYN packet!
  – Sometimes very effective, and the page comes quickly

5 Minute Break
Questions Before We Proceed?

Announcements
• Mini-lecture next Monday by Igor on: A Quick Review of Networking Libraries
• Homework 3b is out

Tearing Down the Connection

Normal Termination, One Side At A Time
[Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, then FIN from A and ACK from B (connection now half-closed), then FIN from B and ACK from A (connection now closed). A then waits for a timeout to avoid reincarnation; B will retransmit FIN if ACK is lost.]
• Finish (FIN) to close and receive remaining bytes
  – FIN occupies one octet in the sequence space
• Other host ack’s the octet to confirm
• Closes A’s side of the connection, but not B’s
  – Until B likewise sends a FIN
  – Which A then acks

Normal Termination, Both Together
[Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, then FIN from A, FIN + ACK from B, and ACK from A (connection now closed). A then waits for a timeout to avoid reincarnation; B can retransmit FIN ACK if ACK lost.]
• Same as before, but B sets FIN with their ack of A’s FIN

Sending/Receiving the FIN Packet
• Sending a FIN: close()
  – Process has finished sending data via the socket
  – Process calls “close()” to close the socket
  – Once TCP has sent all of the outstanding bytes…
  – … then TCP sends a FIN
• Receiving a FIN: EOF
  – Process is reading data from the socket
  – Eventually, the attempt to read returns an EOF
  – All bytes prior to sender calling close() have been delivered
    o Even if bytes not yet ack’d
    o Because FIN has seqno beyond all the bytes …
    o … and thus won’t be ack’d until all bytes are delivered
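
From the application’s point of view, the close()/EOF behavior looks like the sketch below. One caveat: to stay self-contained it uses `socket.socketpair()`, which on most Unix systems gives a pair of connected Unix-domain stream sockets rather than a real TCP connection, so no FIN is actually exchanged here. The read-until-empty pattern is the same one a TCP application would use. The function name `send_then_close` is illustrative.

```python
import socket


def send_then_close(payload):
    """Write payload on one end, close it, and read the other end until EOF."""
    writer, reader = socket.socketpair()
    try:
        writer.sendall(payload)
        writer.close()                 # on a TCP socket this is what triggers the FIN
        chunks = []
        while True:
            chunk = reader.recv(4096)
            if chunk == b"":           # empty read means EOF: the peer has closed
                break
            chunks.append(chunk)
        return b"".join(chunks)        # everything written before close() arrives first
    finally:
        reader.close()
        writer.close()                 # closing an already-closed socket is harmless


print(send_then_close(b"hello"))
```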

Abrupt Termination
[Figure: timeline between A and B: SYN, SYN ACK, ACK, Data, then RST from A]
• A sends a RESET (RST) to B
  – E.g., because app. process on A crashed
• That’s it
  – B does not ack the RST
  – Thus, RST is not delivered reliably
  – And: any data in flight is lost
  – But: if B sends anything more, will elicit another RST

Reliability: TCP Retransmission

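Reasons for Retransmission
[Figure: three sender/receiver timelines]
• Packet lost: the timeout expires and the packet is sent again
• ACK lost: the timeout expires and the resent packet arrives as a DUPLICATE PACKET
• Early timeout: the timeout expires before the ACK arrives, producing DUPLICATE PACKETS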

How Long Should Sender Wait?
• Sender sets a timeout to wait for an ACK
  – Too short: wasted retransmissions
  – Too long: excessive delays when packet lost
• TCP sets retransmission timeout (RTO) as function of RTT
  – Expect ACK to arrive roughly an RTT after data sent
  – … plus slop to allow for variations (e.g., queuing, MAC)
• But: how do we measure RTT?
• And: what is a good estimate for RTT?
• And: what’s a good estimate for “slop”?

Problem: Ambiguous Measurement
• How to differentiate between the real ACK, and ACK of the retransmitted packet?
[Figure: two sender/receiver timelines, each with an original transmission, a retransmission, and an ACK; in one SampleRTT would be measured from the original transmission, in the other from the retransmission, and the sender cannot tell which case applies]

Karn/Partridge Algorithm
• Measure SampleRTT only for original transmissions
  – Once a segment has been retransmitted, do not use it for any further measurements
• Also, employ exponential backoff
  – Every time RTO timer expires, set RTO ← 2·RTO
  – (Up to maximum 60 sec)
  – Every time new measurement comes in (= successful original transmission), collapse RTO back to computed value
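
The two rules above (ignore samples from retransmitted segments, and double the RTO on every expiry) can be sketched as follows. The class name `KarnTimer` and its method names are made up for illustration; the 60-second cap comes from the slide, and the "computed value" is simply passed in rather than derived from RTT samples.

```python
MAX_RTO = 60.0  # seconds; the cap mentioned above


class KarnTimer:
    def __init__(self, computed_rto):
        self.computed_rto = computed_rto  # value derived from RTT estimation
        self.rto = computed_rto           # value currently used for the timer

    def on_timeout(self):
        """RTO timer expired: back off exponentially, up to the cap."""
        self.rto = min(2 * self.rto, MAX_RTO)

    def on_ack(self, was_retransmitted):
        """Return True if this ACK may be used as an RTT sample."""
        if was_retransmitted:
            return False                  # ambiguous ACK: keep the backed-off RTO
        self.rto = self.computed_rto      # clean sample: collapse back
        return True


timer = KarnTimer(3.0)
for _ in range(6):
    timer.on_timeout()
    print(timer.rto)
```

Starting from 3 seconds, repeated expiries give 6, 12, 24, 48 and then stay at 60.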

Next Step
• Turn these individual RTT measurements into an estimate of RTT that we can use to compute RTO
• Challenge:
  – Average RTT, but recent values more important

Exponential Averaging:
• Estimate(n) = α · Estimate(n-1) + (1 - α) · Value(n)
Expanding:
• Estimate(n) = (1 - α) · Sum over k ≥ 0 of { α^k · Value(n-k) }
• Weight on historical data decreases exponentially
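
A small numerical check that the recursive and expanded forms agree. The function names and sample values are illustrative. One detail the expanded formula glosses over: with a finite history, the starting estimate contributes a leftover term α^n · Estimate(0), which the sketch includes explicitly.

```python
def exponential_average(values, alpha, initial=0.0):
    """Recursive form: Estimate(n) = alpha * Estimate(n-1) + (1 - alpha) * Value(n)."""
    estimate = initial
    for value in values:
        estimate = alpha * estimate + (1 - alpha) * value
    return estimate


def expanded_average(values, alpha, initial=0.0):
    """Expanded form: weight (1 - alpha) * alpha**k on the sample k steps back."""
    n = len(values)
    total = sum((1 - alpha) * alpha ** k * values[n - 1 - k] for k in range(n))
    return total + alpha ** n * initial  # leftover weight on the starting estimate


samples = [100.0, 120.0, 80.0, 110.0]
print(exponential_average(samples, 0.875, initial=100.0))
print(expanded_average(samples, 0.875, initial=100.0))
```

The two printed values match up to floating-point rounding, and the weight on each older sample shrinks by a factor of α per step.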

RTT Estimation
• Use exponential averaging:
[Figure: SampleRTT and EstimatedRTT plotted against Time]

Jacobson/Karels Algorithm
• Compute “slop” in terms of observed variability
  – Standard deviation requires expensive square root
  – Use mean deviation instead
• Deviation = | SampleRTT – EstimatedRTT |
• EstimatedDeviation: exp. average of Deviation
• RTO = EstimatedRTT + 4 × EstimatedDeviation
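
Putting the three formulas together gives a small estimator. This is a sketch under stated assumptions, not the lecture’s code: the class name `RttEstimator` is made up, the weights 0.875 (for EstimatedRTT) and 0.75 (for EstimatedDeviation) are the conventional choices rather than values given on the slide, and the deviation starts at 0 for simplicity.

```python
class RttEstimator:
    def __init__(self, first_sample, alpha=0.875, beta=0.75):
        self.alpha = alpha                  # weight on the old RTT estimate (assumed value)
        self.beta = beta                    # weight on the old deviation estimate (assumed value)
        self.estimated_rtt = first_sample
        self.estimated_deviation = 0.0      # simplification: start with no slop

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new RTO."""
        deviation = abs(sample_rtt - self.estimated_rtt)
        self.estimated_deviation = (self.beta * self.estimated_deviation
                                    + (1 - self.beta) * deviation)
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        return self.rto()

    def rto(self):
        return self.estimated_rtt + 4 * self.estimated_deviation


est = RttEstimator(100.0)       # milliseconds
print(est.update(100.0))        # steady RTT: RTO stays at the estimate
print(est.update(180.0))        # one slow sample widens the RTO noticeably
```

A single 180 ms sample moves EstimatedRTT only from 100 to 110, but the deviation term pushes the RTO to 190, which is the point of tracking variability separately.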

This is all very interesting, but…
• Implementations often use a coarse-grained timer
  – 500 msec is typical
• So what?
  – Above algorithms are largely irrelevant
  – Incurring a timeout is expensive

Alternative to Timeouts
• Triple duplicate ACK (“three dups”)
  – Packet n is lost, but packets n+1, n+2, …, arrive
• On each arrival of a packet not in sequence, receiver generates an ACK
  – So as n+1, n+2, … arrive, receiver generates repeated ACKs for seq. no. n
  – “Duplicate” acknowledgments since they all look the same
  – Sender sees 3 of these and immediately retransmits packet n (and only n)
• Termed Fast Retransmission

Fast Retransmission
• Resend a segment after 3 duplicate ACKs
  – Duplicate ACK means that an out-of-sequence segment was received
• Notes:
  – ACKs are for next expected packet
  – Packet reordering can cause duplicate ACKs
  – Window may be too small to generate enough duplicate ACKs
[Figure: sender/receiver timeline with labels segment 1, ACK 2, segment 3, ACK 4; segments 5, 6 and 7 arrive while segment 4 is missing, producing 3 duplicate ACKs (ACK 4), after which segment 4 is resent]
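
On the sender side, fast retransmission amounts to counting repeated ACK numbers. The sketch below numbers packets rather than bytes, as the slides do, and assumes each ACK names the next expected packet; the class name `FastRetransmitSender` is illustrative. It covers only the trigger, not what happens to the window afterwards.

```python
class FastRetransmitSender:
    DUP_THRESHOLD = 3  # "three dups"

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack):
        """Return the packet number to resend right now, or None."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.DUP_THRESHOLD:
                return ack          # the ACK names the missing packet
        else:
            self.last_ack = ack     # new data acknowledged: start counting afresh
            self.dup_count = 0
        return None


sender = FastRetransmitSender()
for ack in [2, 3, 4, 4, 4, 4, 4]:
    print(ack, sender.on_ack(ack))
```

The first ACK 4 is the normal acknowledgment of packet 3. The resend of packet 4 fires on the third repeat after it, and further duplicates do not trigger another resend.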

Summary
• Reliable, in-order, byte-stream delivery
  – Sequence numbers
  – Acknowledgments
  – 3-way handshake to establish
  – 3-way or 4-way handshake to terminate
  – Timer-based retransmission
• What’s missing?
  – Taking network conditions into consideration
• Next lecture
  – Congestion control (K&R 3.6, 3.7)