TCP Session 22 INST 346 Technologies, Infrastructure and Architecture

Improving Support for Learning
• Less lecture, more discussion, slow down
• More examples and lab-style homework
• In-class activities
• Discuss homework and quizzes in class
• Solicit topics in advance for exam review
• More readings (!)
• Jump around less on the slides
• More extensive exam study guide
• More quiz questions (!)

Goals
• TCP
  – Connection Setup
  – Reliable Transfer
  – Timeout Setting
  – Flow Control
  – Disconnection
• Maybe: BGP

TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point:
  – one sender, one receiver
• reliable, in-order byte stream:
  – no “message boundaries”
• pipelined:
  – TCP congestion and flow control set window size
• full duplex data:
  – bi-directional data flow in same connection
  – MSS: maximum segment size
• connection-oriented:
  – handshaking (exchange of control msgs) inits sender, receiver state before data exchange
• flow controlled:
  – sender will not overwhelm receiver

TCP segment structure
[Figure: TCP segment layout, 32 bits wide, with these fields]
• source port #, dest port #
• sequence number, acknowledgement number: counting by bytes of data
• head len
• flag bits: A (ACK: ACK # valid); R, S, F (RST, SYN, FIN: connection estab, i.e. setup and teardown commands)
• receive window: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
• Urg data pointer
• options (variable length)
• application data (variable length)

Connection Management
before exchanging data, sender/receiver “handshake”:
• agree to establish connection (each knowing the other willing to establish connection)
• agree on connection parameters
[Figure: client and server each hold the following state between application and network]
• connection state: ESTAB
• connection variables: seq # client-to-server, server-to-client; rcvBuffer size at server, client
client: Socket clientSocket = new Socket("hostname", "port number");
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
[Figure: client state and server state timelines; server starts in LISTEN]
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); client state becomes SYNSENT
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1); server state becomes SYN RCVD
3. client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data; client state becomes ESTAB
4. server: received ACK(y) indicates client is live; server state becomes ESTAB
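The numbering in the handshake can be traced with a short simulation. This is a minimal sketch, not real TCP: the function name `three_way_handshake` and the dictionary layout are assumptions made for illustration, and only the flag and number relationships (ACKnum = peer's Seq + 1) come from the slide. Sequence numbers are 32-bit, so the sketch wraps them modulo 2^32.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit values


def three_way_handshake(x, y):
    """Return the three handshake segments for initial seq nums x (client) and y (server)."""
    # Step 1: client sends SYN carrying its initial sequence number x.
    syn = {"SYN": 1, "seq": x}
    # Step 2: server replies with SYNACK: its own seq y, acknowledging x + 1.
    synack = {"SYN": 1, "seq": y, "ACK": 1, "acknum": (syn["seq"] + 1) % SEQ_SPACE}
    # Step 3: client ACKs the SYNACK by acknowledging y + 1.
    ack = {"ACK": 1, "acknum": (synack["seq"] + 1) % SEQ_SPACE}
    return [syn, synack, ack]


segments = three_way_handshake(x=1000, y=5000)
for seg in segments:
    print(seg)
```

The values 1000 and 5000 are arbitrary; real stacks pick initial sequence numbers that are hard to predict.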

TCP sender events:
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running
  – think of timer as for oldest unacked segment
  – expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if ACK acknowledges previously unacked segments
  – update what is known to be ACKed
  – start timer if there are still unacked segments
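The three sender events can be sketched as methods on a small class. This is a simplified illustration under stated assumptions: the class and attribute names (`SimpleTcpSender`, `send_base`, `sent_log`) are hypothetical, the timer is a boolean flag rather than a real clock, and sequence-number wraparound, congestion control and flow control are ignored.

```python
class SimpleTcpSender:
    """Toy model of the sender events: data from app, timeout, ACK received."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq    # seq # of the next new byte to send
        self.send_base = initial_seq   # seq # of the oldest unACKed byte
        self.unacked = {}              # seq # -> payload for in-flight segments
        self.timer_running = False     # stands in for a real retransmission timer
        self.sent_log = []             # (seq #, payload) for every transmission

    def _transmit(self, seq, payload):
        self.sent_log.append((seq, payload))

    def data_from_app(self, payload):
        # seq # is the byte-stream number of the first data byte in the segment.
        seq = self.next_seq
        self.unacked[seq] = payload
        self._transmit(seq, payload)
        self.next_seq += len(payload)
        if not self.timer_running:
            self.timer_running = True

    def timeout(self):
        # Retransmit the oldest unACKed segment and restart the timer.
        if self.unacked:
            seq = min(self.unacked)
            self._transmit(seq, self.unacked[seq])
            self.timer_running = True

    def ack_received(self, acknum):
        # Cumulative ACK: everything before acknum is now known to be ACKed.
        if acknum > self.send_base:
            self.send_base = acknum
            done = [s for s, p in self.unacked.items() if s + len(p) <= acknum]
            for s in done:
                del self.unacked[s]
            # Keep the timer running only if segments are still unACKed.
            self.timer_running = bool(self.unacked)


sender = SimpleTcpSender(initial_seq=92)
sender.data_from_app(b"A" * 8)    # Seq=92, 8 bytes
sender.data_from_app(b"B" * 20)   # Seq=100, 20 bytes
sender.timeout()                  # retransmits Seq=92
sender.ack_received(120)          # cumulative ACK covers both segments
print(sender.sent_log, sender.send_base, sender.timer_running)
```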

TCP seq. numbers, ACKs
sequence numbers:
• byte stream “number” of first byte in segment’s data
acknowledgements:
• seq # of next byte expected from other side
• cumulative ACK
Q: how receiver handles out-of-order segments
• A: TCP spec doesn’t say, up to implementor
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space with window size N, divided into: sent ACKed; sent, not-yet ACKed (“in-flight”); usable but not yet sent; not usable]

TCP seq. numbers, ACKs
simple telnet scenario (Host A, Host B):
1. User types ‘C’; Host A sends Seq=42, ACK=79, data = ‘C’
2. Host B ACKs receipt of ‘C’, echoes back ‘C’: Seq=79, ACK=43, data = ‘C’
3. Host A ACKs receipt of echoed ‘C’: Seq=43, ACK=80

TCP: retransmission scenarios
lost ACK scenario (Host A, Host B; SendBase=92):
1. A sends Seq=92, 8 bytes of data
2. B replies ACK=100, which is lost (X)
3. A times out and resends Seq=92, 8 bytes of data
4. B replies ACK=100; SendBase=100
premature timeout (Host A, Host B; SendBase=92):
1. A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
2. B replies ACK=100 and ACK=120
3. A times out before the ACKs arrive and resends Seq=92, 8 bytes of data
4. the ACKs arrive: SendBase=100, then SendBase=120
5. B replies ACK=120 to the retransmission; SendBase=120

TCP: retransmission scenarios
cumulative ACK (Host A, Host B):
1. A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
2. B replies ACK=100, which is lost (X), then ACK=120, which arrives before the timeout
3. A continues with Seq=120, 15 bytes of data

TCP ACK generation [RFC 1122, RFC 2581]
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK

event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

event at receiver: arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte

event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
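The four rows of the ACK generation table can be read as one decision function. The sketch below is an illustration only: `receiver_action`, its parameters and the returned strings are hypothetical names, and segments that start below the expected seq # are not covered by the table, so the function says so rather than guessing.

```python
def receiver_action(seg_seq, expected_seq, ack_pending, gap_open):
    """Map an arriving segment to the receiver action from the ACK generation table.

    seg_seq      -- seq # of the first byte in the arriving segment
    expected_seq -- seq # of the next in-order byte the receiver expects
    ack_pending  -- True if an earlier in-order segment is waiting on a delayed ACK
    gap_open     -- True if out-of-order data is buffered above expected_seq
    """
    if seg_seq > expected_seq:
        # Out-of-order segment: gap detected.
        return "send duplicate ACK"
    if seg_seq == expected_seq:
        if gap_open:
            # Segment starts at the lower end of the gap.
            return "send ACK immediately"
        if ack_pending:
            # Second in-order segment: ACK both at once.
            return "send cumulative ACK immediately"
        # First in-order segment: wait up to 500 ms for another one.
        return "delay ACK"
    return "not covered by the table"


print(receiver_action(100, 100, ack_pending=False, gap_open=False))
print(receiver_action(120, 100, ack_pending=False, gap_open=False))
```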

TCP fast retransmit
• time-out period often relatively long:
  – long delay before resending lost packet
• detect lost segments via duplicate ACKs.
  – sender often sends many segments back-to-back
  – if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit: if sender receives 3 ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
• likely that unacked segment lost, so don’t wait for timeout
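Duplicate-ACK counting can be sketched as a scan over the ACK numbers a sender sees. The function name `fast_retransmit_points` is hypothetical. One assumption needs stating: the sketch counts duplicates after the first ACK for a given number, so the trigger fires on the fourth identical ACK, which is how RFC 5681 defines three duplicate ACKs; the slide's wording ("3 ACKs for same data") could also be read as three identical ACKs in total.

```python
def fast_retransmit_points(acks, dup_threshold=3):
    """Return the ACK numbers at which a fast retransmit would be triggered.

    acks          -- ACK numbers in the order the sender receives them
    dup_threshold -- number of duplicates (beyond the first ACK) that triggers a resend
    """
    last_ack = None
    dup_count = 0
    retransmits = []
    for acknum in acks:
        if acknum == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                # Resend the unACKed segment with the smallest seq #,
                # which starts at byte `acknum`.
                retransmits.append(acknum)
        else:
            # A new ACK number resets the duplicate counter.
            last_ack = acknum
            dup_count = 0
    return retransmits


print(fast_retransmit_points([100, 100, 100, 100, 120]))
```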

TCP fast retransmit
[Figure: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X). Host B keeps replying ACK=100. Host A resends Seq=100, 20 bytes of data before the timeout expires]
fast retransmit after sender receipt of triple duplicate ACK

TCP round trip time, timeout
Q: how to set TCP timeout value?
• longer than RTT
  – but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  – ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”
  – average several recent measurements, not just current SampleRTT

TCP round trip time, timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
• exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
[Figure: RTT (milliseconds) vs. time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr, showing SampleRTT and EstimatedRTT]

TCP round trip time, timeout
• timeout interval: EstimatedRTT plus “safety margin”
  – large variation in EstimatedRTT -> larger safety margin
• estimate SampleRTT deviation from EstimatedRTT:
  DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4*DevRTT
  (EstimatedRTT is the estimated RTT; 4*DevRTT is the “safety margin”)
* Check out the online interactive exercises for more examples: http://gaia.cs.umass.edu/kurose_ross/interactive/
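The EstimatedRTT, DevRTT and TimeoutInterval formulas combine into a small estimator. This is a minimal sketch: the class name `RttEstimator` is hypothetical, and two details are assumptions not stated on the slides. The first sample initializes EstimatedRTT to the sample and DevRTT to half of it (borrowed from RFC 6298), and DevRTT is updated before EstimatedRTT so that the deviation uses the previous estimate.

```python
class RttEstimator:
    """EWMA-based RTT estimator with alpha = 0.125 and beta = 0.25 by default."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        # Assumed initialization (as in RFC 6298), not given on the slides.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt):
        # DevRTT = (1 - beta)*DevRTT + beta*|SampleRTT - EstimatedRTT|
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        # EstimatedRTT = (1 - alpha)*EstimatedRTT + alpha*SampleRTT
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        # TimeoutInterval = EstimatedRTT + 4*DevRTT
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # times in milliseconds
est.update(100.0)
est.update(180.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```

With these inputs a single slow sample (180 ms) moves EstimatedRTT only from 100 to 110 ms, while the larger DevRTT widens the safety margin.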

TCP flow control
application may remove data from TCP socket buffers slower than TCP receiver is delivering (sender is sending)
flow control: receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
[Figure: receiver protocol stack. Segments from sender pass through IP code and TCP code (in the OS) into TCP socket receiver buffers, which the application process reads]

TCP flow control
• receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  – RcvBuffer size set via socket options (typical default is 4096 bytes)
  – many operating systems autoadjust RcvBuffer
• sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
• guarantees receive buffer will not overflow
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); buffered data is passed to application process]
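The rwnd bookkeeping can be written out as two small functions. This is a sketch under assumptions: the function names are hypothetical, and the byte counters (`last_byte_rcvd`, `last_byte_read`, `last_byte_sent`, `last_byte_acked`) follow the usual textbook naming rather than anything on these slides.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises in the rwnd header field."""
    buffered = last_byte_rcvd - last_byte_read  # data received but not yet read by the app
    return rcv_buffer - buffered


def sender_may_send(nbytes, last_byte_sent, last_byte_acked, rwnd):
    """True if sending nbytes more keeps in-flight data within the receiver's rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)
print(sender_may_send(1000, last_byte_sent=5000, last_byte_acked=4000, rwnd=rwnd))
print(sender_may_send(1500, last_byte_sent=5000, last_byte_acked=4000, rwnd=rwnd))
```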

TCP: closing a connection
• client, server each close their side of connection
  – send TCP segment with FIN bit = 1
• respond to received FIN with ACK
  – on receiving FIN, ACK can be combined with own FIN
• simultaneous FIN exchanges can be handled

TCP: closing a connection
[Figure: client state and server state timelines; both start in ESTAB]
1. client: clientSocket.close(); send FINbit=1, seq=x; client state becomes FIN_WAIT_1 (can no longer send but can receive data)
2. server: state becomes CLOSE_WAIT; send ACKbit=1; ACKnum=x+1 (can still send data)
3. client: state becomes FIN_WAIT_2; wait for server close
4. server: send FINbit=1, seq=y; server state becomes LAST_ACK (can no longer send data)
5. client: state becomes TIMED_WAIT; send ACKbit=1; ACKnum=y+1; timed wait for 2*max segment lifetime, then CLOSED
6. server: on receiving the ACK, state becomes CLOSED
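The teardown states can be captured as two transition tables, one per side. This is an illustrative sketch: the state names come from the slide, while the event names (`close`, `recv_fin`, `recv_ack`, `timer_expired`) and the `run` helper are hypothetical, and simultaneous close is not modeled.

```python
# (current state, event) -> next state, for the side that closes first.
CLIENT_TRANSITIONS = {
    ("ESTAB", "close"): "FIN_WAIT_1",          # app calls close(), FIN sent
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",  # peer ACKed our FIN
    ("FIN_WAIT_2", "recv_fin"): "TIMED_WAIT",  # peer's FIN arrived, we ACK it
    ("TIMED_WAIT", "timer_expired"): "CLOSED", # after 2 * max segment lifetime
}

# (current state, event) -> next state, for the side that closes second.
SERVER_TRANSITIONS = {
    ("ESTAB", "recv_fin"): "CLOSE_WAIT",       # peer's FIN arrived, we ACK it
    ("CLOSE_WAIT", "close"): "LAST_ACK",       # app closes, our FIN sent
    ("LAST_ACK", "recv_ack"): "CLOSED",        # peer ACKed our FIN
}


def run(transitions, events, state="ESTAB"):
    """Apply events in order and return the list of states visited."""
    visited = [state]
    for event in events:
        state = transitions[(state, event)]  # KeyError means an unexpected event
        visited.append(state)
    return visited


client_states = run(CLIENT_TRANSITIONS, ["close", "recv_ack", "recv_fin", "timer_expired"])
server_states = run(SERVER_TRANSITIONS, ["recv_fin", "close", "recv_ack"])
print(client_states)
print(server_states)
```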

Inter-AS routing is different
policy:
• intra-AS: single admin, so single consistent policy
• inter-AS: each admin wants control over how its traffic routed and who routes through its AS
performance:
• intra-AS: can focus on performance
• inter-AS: policy may dominate over performance

Inter-AS tasks
• suppose router in AS1 receives datagram destined outside of AS1:
  – router should forward packet to gateway router, but which one?
AS1 must:
1. learn which dests are reachable through AS2, which through AS3
2. propagate this reachability info to all routers in AS1
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c) and AS3 (3a, 3b, 3c); AS2 and AS3 each connect to other networks]

Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto inter-domain routing protocol
  – “glue that holds the Internet together”
• BGP provides each AS a means to:
  – eBGP: obtain subnet reachability information from neighboring ASes
  – iBGP: propagate reachability information to all AS-internal routers.
  – determine “good” routes to other networks based on reachability information and policy
• allows subnet to advertise its existence to rest of Internet: “I am here”

eBGP, iBGP connections
[Figure: AS1 (routers 1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c, 2d) and AS3 (3a, 3b, 3c, 3d), with eBGP connectivity between ASes and iBGP connectivity within each AS]
gateway routers run both eBGP and iBGP protocols

BGP basics
• BGP session: two BGP routers (“peers”) exchange BGP messages over semi-permanent TCP connection:
  – advertising paths to different destination network prefixes (BGP is a “path vector” protocol)
• when AS3 gateway router 3a advertises path AS3,X to AS2 gateway router 2c:
  – AS3 promises to AS2 it will forward datagrams towards X
[Figure: AS1 (1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c, 2d), AS3 (3a, 3b, 3c, 3d) and destination X; BGP advertisement: AS3,X]

Path attributes and BGP routes
• advertised prefix includes BGP attributes
  – prefix + attributes = “route”
• two important attributes:
  – AS-PATH: list of ASes through which prefix advertisement has passed
  – NEXT-HOP: indicates specific internal-AS router to next-hop AS
• Policy-based routing:
  – gateway receiving route advertisement uses import policy to accept/decline path (e.g., never route through AS Y).
  – AS policy also determines whether to advertise path to other neighboring ASes
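A route, an import policy and a re-advertisement can be modeled in a few lines. This is a minimal sketch, not BGP's message format: the `Route` dataclass, the `accept` and `readvertise` functions and the `banned_ases` parameter are hypothetical names, and the only policy shown is the "never route through AS Y" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # advertised destination prefix, e.g. "X"
    as_path: tuple   # AS-PATH: ASes the advertisement has passed through
    next_hop: str    # NEXT-HOP: router used to reach the next-hop AS


def accept(route, banned_ases):
    """Import policy: decline any path that goes through a banned AS."""
    return not any(asn in banned_ases for asn in route.as_path)


def readvertise(route, own_as, own_router):
    """Advertise to a neighboring AS: prepend our AS to AS-PATH, update NEXT-HOP."""
    return Route(route.prefix, (own_as,) + route.as_path, own_router)


learned = Route(prefix="X", as_path=("AS3",), next_hop="3a")
print(accept(learned, banned_ases={"ASY"}))   # path avoids AS Y, so it is accepted
print(accept(learned, banned_ases={"AS3"}))   # path goes through AS3, so it is declined
print(readvertise(learned, own_as="AS2", own_router="2a"))
```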

BGP path advertisement
[Figure: AS1 (1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c, 2d), AS3 (3a, 3b, 3c, 3d) and destination X; advertisements AS3,X and AS2,AS3,X]
• AS2 router 2c receives path advertisement AS3,X (via eBGP) from AS3 router 3a
• Based on AS2 policy, AS2 router 2c accepts path AS3,X, propagates (via iBGP) to all AS2 routers
• Based on AS2 policy, AS2 router 2a advertises (via eBGP) path AS2,AS3,X to AS1 router 1c

BGP path advertisement
[Figure: AS1 (1a, 1b, 1c, 1d), AS2 (2a, 2b, 2c, 2d), AS3 (3a, 3b, 3c, 3d) and destination X; advertisements AS3,X and AS2,AS3,X]
gateway router may learn about multiple paths to destination:
• AS1 gateway router 1c learns path AS2,AS3,X from 2a
• AS1 gateway router 1c learns path AS3,X from 3a
• Based on policy, AS1 gateway router 1c chooses path AS3,X, and advertises path within AS1 via iBGP

BGP: achieving policy via advertisements
[Figure: provider networks A, B, C and customer networks W, X, Y]
Suppose an ISP only wants to route traffic to/from its customer networks (does not want to carry transit traffic between other ISPs)
• A advertises path Aw to B and to C
• B chooses not to advertise BAw to C:
  – B gets no “revenue” for routing CBAw, since none of C, A, w are B’s customers
  – C does not learn about CBAw path
• C will route CAw (not using B) to get to w

BGP: achieving policy via advertisements
[Figure: provider networks A, B, C and customer networks W, X, Y]
Suppose an ISP only wants to route traffic to/from its customer networks (does not want to carry transit traffic between other ISPs)
• A, B, C are provider networks
• X, W, Y are customers (of provider networks)
• X is dual-homed: attached to two networks
• policy to enforce: X does not want to route from B to C via X
  – so X will not advertise to B a route to C

BGP route selection
• router may learn about more than one route to destination AS, selects route based on:
1. local preference value attribute (policy decision)
2. shortest AS-PATH
3. closest NEXT-HOP router (hot potato routing)
4. additional criteria
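The first three selection criteria can be expressed as a sort key. This is a sketch under assumptions: the dictionary fields (`via`, `local_pref`, `as_path`, `igp_cost`), the gateway names and all numbers are invented for illustration, a higher local preference is taken to win, and the "additional criteria" step is left out.

```python
def select_route(routes):
    """Pick a route by: highest local preference, then shortest AS-PATH,
    then lowest intra-domain cost to the NEXT-HOP router (hot potato)."""
    return min(
        routes,
        key=lambda r: (-r["local_pref"], len(r["as_path"]), r["igp_cost"]),
    )


# Hypothetical candidate routes to the same destination prefix.
routes = [
    {"via": "g1", "local_pref": 100, "as_path": ["AS1", "AS3"], "igp_cost": 10},
    {"via": "g2", "local_pref": 100, "as_path": ["AS3"], "igp_cost": 30},
]
best = select_route(routes)
print(best["via"])  # equal local preference, so the shorter AS-PATH wins

# Raising local preference on the longer path overrides AS-PATH length.
routes[0]["local_pref"] = 200
preferred = select_route(routes)
print(preferred["via"])
```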

Hot Potato Routing
[Figure: AS1, AS2 (2a, 2b, 2c, 2d), AS3 and destination X; advertisements AS1,AS3,X and AS3,X reach AS2; OSPF link weights inside AS2: 152, 263, 201, 112]
• 2d learns (via iBGP) it can route to X via 2a or 2c
• hot potato routing: choose local gateway that has least intra-domain cost (e.g., 2d chooses 2a, even though more AS hops to X): don’t worry about inter-domain cost!