Whats the Internet nuts and bolts view r
What’s the Internet: “nuts and bolts” view r millions of connected computing devices: hosts = end systems m examples of hosts? router workstation server mobile local ISP r running network apps m examples of applications? regional ISP company network Transport Layer 1
What’s the Internet: “nuts and bolts” view r communication links m fiber, copper, radio, satellite m transmission rate = bandwidth • typical bandwidth for modem? wireless? router workstation server mobile local ISP regional ISP r routers: forward packets (chunks of data) m what’s in a packet? company network Transport Layer 2
What’s the Internet: “nuts and bolts” view r protocols control sending, receiving of msgs m e. g. , TCP, IP, HTTP, FTP, PPP r Internet: “network of networks” m loosely hierarchical m public Internet versus private intranet r Internet standards m RFC: Request for comments m IETF: Internet Engineering Task Force router workstation server mobile local ISP regional ISP company network Transport Layer 3
What’s the Internet: a service view r communication infrastructure enables distributed applications: m Web, email, other examples? r communication services provided to apps: m connection-oriented reliable • example apps? m Connectionless unreliable • example apps? Transport Layer 4
What’s a protocol? human protocols: r “what’s the time? ” r “I have a question” r introductions network protocols: r machines rather than humans r all communication activity in Internet governed by protocols define format, order of msgs sent and received among network entities, and actions taken on msg transmission, receipt Transport Layer 5
What’s a protocol? a human protocol and a computer network protocol: Hi TCP connection req Hi TCP connection response Got the time? Get http: //www. awl. com/kurose-ross 2: 00 <file> time Q: Why are protocols so important? Transport Layer 6
A closer look at network structure: r network edge: applications and hosts r network core: m m routers network of networks r access networks, physical media: communication links Transport Layer 7
Protocol “Layers” Networks are complex! r many “pieces”: m hosts m routers m links of various media m applications m protocols m hardware, software Question: Is there any hope of organizing structure of network? Or at least our discussion of networks? Transport Layer 8
Organization of air travel ticket (purchase) ticket (complain) baggage (check) baggage (claim) gates (load) gates (unload) runway takeoff runway landing airplane routing r a series of steps Transport Layer 9
Layering of airline functionality ticket (purchase) ticket (complain) ticket baggage (check) baggage (claim baggage gates (load) gates (unload) gate runway (takeoff) runway (land) takeoff/landing airplane routing departure airport airplane routing intermediate air-traffic control centers arrival airport Layers: each layer implements a service m via its own internal-layer actions m relying on services provided by layer below Transport Layer 10
Why layering? Dealing with complex systems: r explicit structure allows identification, relationship of complex system’s pieces m layered reference model for discussion r modularization eases maintenance, updating of system m change of implementation of layer’s service transparent to rest of system m e. g. , change in gate procedure doesn’t affect rest of system r layering considered harmful? Transport Layer 11
Internet protocol stack r application: supporting network applications m FTP, SMTP, STTP r transport: host-host data transfer m TCP, UDP r network: routing of datagrams from source to destination m transport IP, routing protocols r link: data transfer between neighboring network elements m application PPP, Ethernet r physical: bits “on the wire” network link physical Transport Layer 12
source message segment Ht datagram Hn Ht frame Hl Hn Ht M M Encapsulation application transport network link physical Hl Hn Ht M switch destination M Ht M Hn Ht Hl Hn Ht M M application transport network link physical Hn Ht Hl Hn Ht M M router Transport Layer 13
Internet transport protocols services TCP service: r connection-oriented: setup r r required between client and server processes reliable transport between sending and receiving process flow control: sender won’t overwhelm receiver congestion control: throttle sender when network overloaded does not provide: timing, minimum bandwidth guarantees UDP service: r unreliable data transfer between sending and receiving process r does not provide: connection setup, reliability, flow control, congestion control, timing, or bandwidth guarantee Q: why bother? Why is there a UDP? Transport Layer 14
Transport vs. network layer r network layer: logical communication between hosts r transport layer: logical communication between processes m relies on, enhances, network layer services Household analogy: 12 kids sending letters to 12 kids r processes = kids r app messages = letters in envelopes r hosts = houses r transport protocol = Ann and Bill r network-layer protocol = postal service Transport Layer 15
Reliable data transfer: getting started We’ll: r incrementally develop sender, receiver sides of reliable data transfer protocol (rdt) r consider only unidirectional data transfer m but control info will flow on both directions! r use finite state machines (FSM) to specify sender, receiver state: when in this “state” next state uniquely determined by next event state 1 event causing state transition actions taken on state transition event actions state 2 Transport Layer 16
Rdt 1. 0: reliable transfer over a reliable channel r underlying channel perfectly reliable m no bit errors m no loss of packets r separate FSMs for sender, receiver: m sender sends data into underlying channel m receiver read data from underlying channel Wait for call from above rdt_send(data) packet = make_pkt(data) udt_send(packet) sender Wait for call from below rdt_rcv(packet) extract (packet, data) deliver_data(data) receiver Transport Layer 17
Rdt 2. 0: channel with bit errors r underlying channel may flip bits in packet m checksum to detect bit errors r the question: how to recover from errors: m acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK m negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors m sender retransmits pkt on receipt of NAK r new mechanisms in rdt 2. 0 (beyond rdt 1. 0): m m error detection receiver feedback: control msgs (ACK, NAK) rcvr->sender Transport Layer 18
rdt 2. 0: FSM specification rdt_send(data) snkpkt = make_pkt(data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && is. NAK(rcvpkt) Wait for call from ACK or udt_send(sndpkt) above NAK rdt_rcv(rcvpkt) && is. ACK(rcvpkt) L sender receiver rdt_rcv(rcvpkt) && corrupt(rcvpkt) udt_send(NAK) Wait for call from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt, data) deliver_data(data) udt_send(ACK) Transport Layer 19
rdt 2. 0: operation with no errors rdt_send(data) snkpkt = make_pkt(data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && is. NAK(rcvpkt) Wait for call from ACK or udt_send(sndpkt) above NAK rdt_rcv(rcvpkt) && is. ACK(rcvpkt) L rdt_rcv(rcvpkt) && corrupt(rcvpkt) udt_send(NAK) Wait for call from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt, data) deliver_data(data) udt_send(ACK) Transport Layer 20
rdt 2. 0: error scenario rdt_send(data) snkpkt = make_pkt(data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && is. NAK(rcvpkt) Wait for call from ACK or udt_send(sndpkt) above NAK rdt_rcv(rcvpkt) && is. ACK(rcvpkt) L rdt_rcv(rcvpkt) && corrupt(rcvpkt) udt_send(NAK) Wait for call from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) extract(rcvpkt, data) deliver_data(data) udt_send(ACK) Transport Layer 21
rdt 2. 0 has a fatal flaw! r Stop and Wait m Sender sends one packet, then waits for receiver response r What happens if ACK/NAK corrupted? m m sender doesn’t know what happened at receiver! can’t just retransmit: possible duplicate r Handling duplicates: m m m sender adds sequence number to each pkt sender retransmits current pkt if ACK/NAK garbled receiver discards (doesn’t deliver up) duplicate pkt Transport Layer 22
rdt 2. 1: sender, handles garbled ACK/NAKs rdt_send(data) sndpkt = make_pkt(0, data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && is. ACK(rcvpkt) Wait for call 0 from above rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && is. ACK(rcvpkt) L rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || is. NAK(rcvpkt) ) udt_send(sndpkt) Wait for ACK or NAK 0 L Wait for ACK or NAK 1 Wait for call 1 from above rdt_send(data) sndpkt = make_pkt(1, data, checksum) udt_send(sndpkt) Transport Layer 23
rdt 2. 1: receiver, handles garbled ACK/NAKs rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq 0(rcvpkt) rdt_rcv(rcvpkt) && (corrupt(rcvpkt) extract(rcvpkt, data) deliver_data(data) sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && (corrupt(rcvpkt) sndpkt = make_pkt(NAK, chksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq 1(rcvpkt) sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt) sndpkt = make_pkt(NAK, chksum) udt_send(sndpkt) Wait for 0 from below Wait for 1 from below rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq 1(rcvpkt) rdt_rcv(rcvpkt) && not corrupt(rcvpkt) && has_seq 0(rcvpkt) sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt) extract(rcvpkt, data) deliver_data(data) sndpkt = make_pkt(ACK, chksum) udt_send(sndpkt) Transport Layer 24
rdt 2. 1: discussion Sender: r seq # added to pkt r two seq. #’s (0, 1) will suffice. Why? r must check if received ACK/NAK corrupted r twice as many states m state must “remember” whether “current” pkt has 0 or 1 seq. # Receiver: r must check if received packet is duplicate m state indicates whether 0 or 1 is expected pkt seq # r note: receiver can not know if its last ACK/NAK received OK at sender Transport Layer 25
rdt 2. 2: a NAK-free protocol r same functionality as rdt 2. 1, using ACKs only r instead of NAK, receiver sends ACK for last pkt received OK m receiver must explicitly include seq # of pkt being ACKed r duplicate ACK at sender results in same action as NAK: retransmit current pkt Transport Layer 26
rdt 2. 2: sender, receiver fragments rdt_send(data) sndpkt = make_pkt(0, data, checksum) udt_send(sndpkt) rdt_rcv(rcvpkt) && Wait for call 0 from above rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq 1(rcvpkt)) udt_send(sndpkt) Wait for 0 from below ( corrupt(rcvpkt) || is. ACK(rcvpkt, 1) ) udt_send(sndpkt) Wait for ACK 0 sender FSM fragment rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && is. ACK(rcvpkt, 0) receiver FSM fragment L rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq 1(rcvpkt) extract(rcvpkt, data) deliver_data(data) sndpkt = make_pkt(ACK 1, chksum) udt_send(sndpkt) Transport Layer 27
rdt 3. 0: channels with errors and loss New assumption: underlying channel can also lose packets (data or ACKs) m checksum, seq. #, ACKs, retransmissions will be of help, but not enough Approach: sender waits “reasonable” amount of time for ACK r retransmits if no ACK received in this time r if pkt (or ACK) just delayed (not lost): m retransmission will be duplicate, but use of seq. #’s already handles this m receiver must specify seq # of pkt being ACKed r requires countdown timer Transport Layer 28
rdt 3. 0 sender rdt_send(data) sndpkt = make_pkt(0, data, checksum) udt_send(sndpkt) start_timer rdt_rcv(rcvpkt) L rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && is. ACK(rcvpkt, 1) rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || is. ACK(rcvpkt, 0) ) timeout udt_send(sndpkt) start_timer rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && is. ACK(rcvpkt, 0) stop_timer timeout udt_send(sndpkt) start_timer L Wait for ACK 0 Wait for call 0 from above L rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || is. ACK(rcvpkt, 1) ) Wait for ACK 1 Wait for call 1 from above rdt_send(data) rdt_rcv(rcvpkt) L sndpkt = make_pkt(1, data, checksum) udt_send(sndpkt) start_timer Transport Layer 29
rdt 3. 0 in action Transport Layer 30
rdt 3. 0 in action Transport Layer 31
Performance of rdt 3. 0 r rdt 3. 0 works, but performance stinks r Why? Transport Layer 32
rdt 3. 0: stop-and-wait operation sender receiver first packet bit transmitted, t = 0 last packet bit transmitted, t = L / R RTT first packet bit arrives last packet bit arrives, send ACK arrives, send next packet, t = RTT + L / R Transport Layer 33
Pipelined protocols Pipelining: sender allows multiple, “in-flight”, yet-to-beacknowledged pkts m m range of sequence numbers must be increased buffering at sender and/or receiver r Two generic forms of pipelined protocols: go-Back-N, selective repeat Transport Layer 34
Pipelining: increased utilization sender receiver first packet bit transmitted, t = 0 last bit transmitted, t = L / R RTT first packet bit arrives last packet bit arrives, send ACK last bit of 2 nd packet arrives, send ACK last bit of 3 rd packet arrives, send ACK arrives, send next packet, t = RTT + L / R Transport Layer 35
Go-Back-N Sender: r k-bit seq # in pkt header r “window” of up to N, consecutive unack’ed pkts allowed r ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK” may deceive duplicate ACKs (see receiver) r timer for each in-flight pkt r timeout(n): retransmit pkt n and all higher seq # pkts in window m Transport Layer 36
GBN: receiver ACK-only: always send ACK for correctlyreceived pkt with highest in-order seq # m may generate duplicate ACKs m need only remember expectedseqnum r out-of-order pkt: m discard (don’t buffer) -> isn't this bad? m Re-ACK pkt with highest in-order seq # Transport Layer 37
GBN in action Transport Layer 38
Selective Repeat r receiver individually acknowledges all correctly received pkts m buffers pkts, as needed, for eventual in-order delivery to upper layer r sender only resends pkts for which ACK not received m sender timer for each un. ACKed pkt r sender window m N consecutive seq #’s m again limits seq #s of sent, un. ACKed pkts Transport Layer 39
Selective repeat: sender, receiver windows Transport Layer 40
Selective repeat sender data from above : receiver pkt n in [rcvbase, rcvbase+N-1] r if next available seq # in r send ACK(n) timeout(n): r in-order: deliver (also deliver window, send pkt r resend pkt n, restart timer ACK(n) in [sendbase, sendbase+N]: r mark pkt n as received r if n smallest un. ACKed pkt, advance window base to next un. ACKed seq # r out-of-order: buffered, in-order pkts), advance window to next notyet-received pkt n in [rcvbase-N, rcvbase-1] r ACK(n) otherwise: r ignore Transport Layer 41
Selective repeat in action Transport Layer 42
Selective repeat: dilemma Example: r seq #’s: 0, 1, 2, 3 r window size=3 r receiver sees no difference in two scenarios! r incorrectly passes duplicate data as new in (a) Q: what relationship between seq # size and window size? Transport Layer 43
TCP: Overview r point-to-point: m one sender, one receiver r reliable, in-order byte steam: m no “message boundaries” r pipelined: m TCP congestion and flow control set window size r send & receive buffers RFCs: 793, 1122, 1323, 2018, 2581 r full duplex data: m bi-directional data flow in same connection m MSS: maximum segment size r connection-oriented: m handshaking (exchange of control msgs) init’s sender, receiver state before data exchange r flow controlled: m sender will not overwhelm receiver Transport Layer 44
TCP segment structure 32 bits URG: urgent data (generally not used) ACK: ACK # valid PSH: push data now (generally not used) RST, SYN, FIN: connection estab (setup, teardown commands) Internet checksum (as in UDP) source port # dest port # sequence number acknowledgement number head not UA P R S F len used checksum Receive window Urg data pnter Options (variable length) counting by bytes of data (not segments!) # bytes rcvr willing to accept application data (variable length) Transport Layer 45
TCP seq. #’s and ACKs Seq. #’s: m byte stream “number” of first byte in segment’s data ACKs: m seq # of next byte expected from other side m cumulative ACK m piggybacking Q: how receiver handles out-oforder segments m A: TCP spec doesn’t say, - up to implementor Host B Host A User types ‘C’ Seq=4 2 , ACK= 79, da ta = ‘C a , dat 3 4 = CK 9, A 7 Seq= host ACKs receipt of echoed ‘C’ ’ = ‘C’ host ACKs receipt of ‘C’, echoes back ‘C’ Seq=4 3, ACK =80 simple telnet scenario Transport Layer time 46
TCP Round Trip Time and Timeout Q: how to set TCP timeout value? r longer than RTT m but RTT varies r too short: premature timeout m unnecessary retransmissions r too long: slow reaction to segment loss Q: how to estimate RTT? r Sample. RTT: measured time from segment transmission until ACK receipt m ignore retransmissions r Sample. RTT will vary, want estimated RTT “smoother” m average several recent measurements, not just current Sample. RTT Transport Layer 47
Example RTT estimation: Transport Layer 48
TCP reliable data transfer r TCP creates rdt service on top of IP’s unreliable service r Pipelined segments r Cumulative acks r TCP uses single retransmission timer r Retransmissions are triggered by: m m timeout events duplicate acks r Initially consider simplified TCP sender: m m ignore duplicate acks ignore flow control, congestion control Transport Layer 49
TCP sender events: data rcvd from app: r Create segment with seq # r seq # is byte-stream number of first data byte in segment r start timer if not already running (think of timer as for oldest unacked segment) r expiration interval: Time. Out. Interval timeout: r retransmit segment that caused timeout r restart timer Ack rcvd: r If acknowledges previously unacked segments m m update what is known to be acked start timer if there are outstanding segments Transport Layer 50
Next. Seq. Num = Initial. Seq. Num Send. Base = Initial. Seq. Num loop (forever) { switch(event) event: data received from application above create TCP segment with sequence number Next. Seq. Num if (timer currently not running) start timer pass segment to IP Next. Seq. Num = Next. Seq. Num + length(data) event: timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer event: ACK received, with ACK field value of y if (y > Send. Base) { Send. Base = y if (there are currently not-yet-acknowledged segments) start timer } } /* end of loop forever */ TCP sender (simplified) Comment: • Send. Base-1: last cumulatively ack’ed byte Example: • Send. Base-1 = 71; y= 73, so the rcvr wants 73+ ; y > Send. Base, so that new data is acked Transport Layer 51
TCP: retransmission scenarios Host A 2, 8 by tes da Seq=92 timeout ta =100 X ACK loss Seq=9 2, 8 by tes da ta 100 Sendbase = 100 Send. Base = 120 = ACK Send. Base = 100 time Host B Seq=9 Send. Base = 120 lost ACK scenario Seq= 2, 8 by tes da t 100, 2 0 byte a s data 0 10 = K 120 = C K A AC Seq=9 2, 8 by Seq=92 timeout Seq=9 timeout Host A Host B time tes da ta 20 K=1 AC premature timeout Transport Layer 52
TCP retransmission scenarios (more) Host A Host B Seq=9 timeout 2, 8 by Send. Base = 120 tes da ta =100 K C A 00, 20 bytes data Seq=1 X loss 120 = ACK time Cumulative ACK scenario Transport Layer 53
TCP ACK generation [RFC 1122, RFC 2581] Event at Receiver TCP Receiver action Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK Arrival of in-order segment with expected seq #. One other segment has ACK pending Immediately send single cumulative ACK, ACKing both in-order segments Arrival of out-of-order segment higher-than-expect seq. #. Gap detected Immediately send duplicate ACK, indicating seq. # of next expected byte Arrival of segment that partially or completely fills gap Immediate send ACK, provided that segment startsat lower end of gap Transport Layer 54
Fast Retransmit r Time-out period often relatively long: m long delay before resending lost packet r Detect lost segments via duplicate ACKs. m m Sender often sends many segments back-toback If segment is lost, there will likely be many duplicate ACKs. r If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost: m fast retransmit: resend segment before timer expires Transport Layer 55
TCP Flow Control r receive side of TCP connection has a receive buffer: flow control sender won’t overflow receiver’s buffer by transmitting too much, too fast r speed-matching r app process may be service: matching the send rate to the receiving app’s drain rate slow at reading from buffer Transport Layer 56
TCP Flow control: how it works r Rcvr advertises spare (Suppose TCP receiver discards out-of-order segments) r spare room in buffer room by including value of Rcv. Window in segments r Sender limits un. ACKed data to Rcv. Window m guarantees receive buffer doesn’t overflow = Rcv. Window = Rcv. Buffer-[Last. Byte. Rcvd Last. Byte. Read] Transport Layer 57
TCP Connection Management Recall: TCP sender, receiver establish “connection” before exchanging data segments r initialize TCP variables: m seq. #s m buffers, flow control info (e. g. Rcv. Window) r client: connection initiator Socket client. Socket = new Socket("hostname", "port number"); r server: contacted by client Socket connection. Socket = welcome. Socket. accept(); Transport Layer 58
TCP Connection Management Three way handshake: Step 1: client host sends TCP SYN segment to server m specifies initial seq # m no data Step 2: server host receives SYN, replies with SYNACK segment server allocates buffers m specifies server initial seq. # Step 3: client receives SYNACK, replies with ACK segment, which may contain data m Transport Layer 59
TCP Connection Management (cont. ) Closing a connection: client closes socket: client. Socket. close(); client close Step 1: client end system sends TCP FIN control segment to server FIN ACK close FIN replies with ACK. Closes connection, sends FIN. timed wait Step 2: server receives FIN, ACK closed Transport Layer 60
TCP Connection Management (cont. ) Step 3: client receives FIN, replies with ACK. m Enters “timed wait” - will respond with ACK to received FINs client closing FIN timed wait Connection closed. can handle simultaneous FINs. FIN ACK Step 4: server, receives ACK. Note: with small modification, server ACK closed Transport Layer 61
TCP Congestion Control r end-end control (no network assistance) r sender limits transmission: Last. Byte. Sent-Last. Byte. Acked Cong. Win r Roughly, r Cong. Win is dynamic, function of Cong. Win rate = Bytes/sec perceived network RTTcongestion How does sender perceive congestion? r loss event = timeout or 3 duplicate acks r TCP sender reduces rate (Cong. Win) after loss event three mechanisms: m m m AIMD slow start conservative after timeout events Transport Layer 62
TCP AIMD multiplicative decrease: cut Cong. Win in half after loss event additive increase: increase Cong. Win by 1 MSS every RTT in the absence of loss events: probing Long-lived TCP connection Transport Layer 63
TCP Slow Start r When connection begins, Cong. Win = 1 MSS m m Example: MSS = 500 bytes & RTT = 200 msec initial rate = 20 kbps r When connection begins, increase rate exponentially fast until first loss event r available bandwidth may be >> MSS/RTT m desirable to quickly ramp up to respectable rate Transport Layer 64
TCP Slow Start (more) r When connection m m double Cong. Win every RTT done by incrementing Cong. Win for every ACK received RTT begins, increase rate exponentially until first loss event: Host A Host B one segme nt two segme nts four segme nts r Summary: initial rate is slow but ramps up exponentially fast time Transport Layer 65
Refinement Philosophy: r After 3 dup ACKs: Cong. Win is cut in half m window then grows linearly r But after timeout event: m Cong. Win instead set to 1 MSS; m window then grows exponentially m to a threshold, then grows linearly m • 3 dup ACKs indicates network capable of delivering some segments • timeout before 3 dup ACKs is “more alarming” Transport Layer 66
Refinement (more) Q: When should the exponential increase switch to linear? A: When Cong. Win gets to 1/2 of its value before timeout. Implementation: r Variable Threshold r At loss event, Threshold is set to 1/2 of Cong. Win just before loss event Transport Layer 67
Summary: TCP Congestion Control r When Cong. Win is below Threshold, sender in slow-start phase, window grows exponentially. r When Cong. Win is above Threshold, sender is in congestion- avoidance phase, window grows linearly. r When a triple duplicate ACK occurs, Threshold set to Cong. Win/2 and Cong. Win set to Threshold. r When timeout occurs, Threshold set to Cong. Win/2 and Cong. Win is set to 1 MSS. Transport Layer 68
TCP sender congestion control Event State TCP Sender Action Commentary ACK receipt Slow Start for previously (SS) unacked data Cong. Win = Cong. Win + MSS, If (Cong. Win > Threshold) set state to “Congestion Avoidance” Resulting in a doubling of Cong. Win every RTT ACK receipt Congestion for previously Avoidance unacked data (CA) Cong. Win = Cong. Win+MSS * (MSS/Cong. Win) Additive increase, resulting in increase of Cong. Win by 1 MSS every RTT Loss event detected by triple duplicate ACK SS or CA Threshold = Cong. Win/2, Cong. Win = Threshold, Set state to “Congestion Avoidance” Fast recovery, implementing multiplicative decrease. Cong. Win will not drop below 1 MSS. Timeout SS or CA Threshold = Cong. Win/2, Cong. Win = 1 MSS, Set state to “Slow Start” Enter slow start Duplicate ACK SS or CA Increment duplicate ACK count for segment being acked Cong. Win and Threshold not changed Transport Layer 69
Interplay between routing and forwarding routing algorithm local forwarding table header value output link 0100 0101 0111 1001 3 2 2 1 value in arriving packet’s header 0111 1 3 2 Transport Layer 70
Graph abstraction 5 2 u 2 1 Graph: G = (N, E) v x 3 w 3 1 5 z 1 y 2 N = set of routers = { u, v, w, x, y, z } E = set of links ={ (u, v), (u, x), (v, w), (x, y), (w, z), (y, z) } Remark: Graph abstraction is useful in other network contexts Example: P 2 P, where N is set of peers and E is set of TCP connections Transport Layer 71
Graph abstraction: costs 5 2 u v 2 1 x • c(x, x’) = cost of link (x, x’) 3 w 3 1 5 z 1 y - e. g. , c(w, z) = 5 2 • cost could always be 1, or inversely related to bandwidth, or inversely related to congestion Cost of path (x 1, x 2, x 3, …, xp) = c(x 1, x 2) + c(x 2, x 3) + … + c(xp-1, xp) Question: What’s the least-cost path between u and z ? Routing algorithm: algorithm that finds least-cost path Transport Layer 72
Routing Algorithm classification Global or decentralized information? Global: r all routers have complete topology, link cost info r “link state” algorithms Decentralized: r router knows physicallyconnected neighbors, link costs to neighbors r iterative process of computation, exchange of info with neighbors r “distance vector” algorithms Static or dynamic? Static: r routes change slowly over time Dynamic: r routes change more quickly m periodic update m in response to link cost changes Transport Layer 73
A Link-State Routing Algorithm Dijkstra’s algorithm r net topology, link costs known to all nodes m accomplished via “link state broadcast” m all nodes have same info r computes least cost paths from one node (‘source”) to all other nodes m gives forwarding table for that node r iterative: after k iterations, know least cost path to k dest. ’s Notation: r c(x, y): link cost from node x to y; = ∞ if not direct neighbors r D(v): current value of cost of path from source to dest. v r p(v): predecessor node along path from source to v r N': set of nodes whose least cost path definitively known Transport Layer 74
Dijsktra’s Algorithm 1 Initialization: 2 N' = {u} 3 for all nodes v 4 if v adjacent to u 5 then D(v) = c(u, v) 6 else D(v) = ∞ 7 8 Loop 9 find w not in N' such that D(w) is a minimum 10 add w to N' 11 update D(v) for all v adjacent to w and not in N' : 12 D(v) = min( D(v), D(w) + c(w, v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N' Transport Layer 75
Dijkstra’s algorithm: example Step 0 1 2 3 4 5 N' u ux uxyvwz D(v), p(v) D(w), p(w) 2, u 5, u 2, u 4, x 2, u 3, y D(x), p(x) 1, u D(y), p(y) ∞ 2, x D(z), p(z) ∞ ∞ 4, y 5 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 Transport Layer 76
Dijkstra’s algorithm, discussion Algorithm complexity: n nodes r each iteration: need to check all nodes, w, not in N r n(n+1)/2 comparisons: O(n 2) r more efficient implementations possible: O(nlogn) Oscillations possible: r e. g. , link cost = amount of carried traffic D 1 1 0 A 0 0 C e 1+e e initially B 1 2+e A 0 D 1+e 1 B 0 0 C … recompute routing 0 D 1 A 0 0 C 2+e B 1+e … recompute 2+e A 0 D 1+e 1 B e 0 C … recompute Transport Layer 77
Distance Vector Algorithm (1) Bellman-Ford Equation (dynamic programming) Define dx(y) : = cost of least-cost path from x to y Then dx(y) = min {c(x, v) + dv(y) } where min is taken over all neighbors of x Transport Layer 78
Bellman-Ford example (2) 5 2 u v 2 1 x 3 w 3 1 5 z 1 y Clearly, dv(z) = 5, dx(z) = 3, dw(z) = 3 2 B-F equation says: du(z) = min { c(u, v) + dv(z), c(u, x) + dx(z), c(u, w) + dw(z) } = min {2 + 5, 1 + 3, 5 + 3} = 4 Node that achieves minimum is next hop in shortest path ➜ forwarding table Transport Layer 79
Distance Vector Algorithm (3) r Dx(y) = estimate of least cost from x to y r Distance vector: Dx = [Dx(y): y є N ] r Node x knows cost to each neighbor v: c(x, v) r Node x maintains Dx = [Dx(y): y є N ] r Node x also maintains its neighbors’ distance vectors m For each neighbor v, x maintains Dv = [Dv(y): y є N ] Transport Layer 80
Distance vector algorithm (4) Basic idea: r Each node periodically sends its own distance vector estimate to neighbors r When node a node x receives new DV estimate from neighbor, it updates its own DV using B-F equation: Dx(y) ← minv{c(x, v) + Dv(y)} for each node y ∊ N r Under minor, natural conditions, the estimate Dx(y) converge the actual least cost dx(y) Transport Layer 81
Distance Vector Algorithm (5) Iterative, asynchronous: each local iteration caused by: r local link cost change r DV update message from neighbor Distributed: Each node: wait for (change in local link cost of msg from neighbor) r each node notifies neighbors only when its DV changes m neighbors then notify their neighbors if necessary recompute estimates if DV to any dest has changed, notify neighbors Transport Layer 82
Dx(y) = min{c(x, y) + Dy(y), c(x, z) + Dz(y)} = min{2+0 , 7+1} = 2 node x table cost to x y z x ∞∞ ∞ y ∞∞ ∞ z 71 0 from x 0 2 7 y 2 0 1 z 7 1 0 cost to x y z x 0 2 7 y 2 0 1 z 3 1 0 x 0 2 3 y 2 0 1 z 3 1 0 cost to x y z x 0 2 3 y 2 0 1 z 3 1 0 x 2 y 1 7 z cost to x y z from x ∞ ∞ ∞ y 2 0 1 z ∞∞ ∞ node z table cost to x y z x 0 2 3 y 2 0 1 z 7 1 0 cost to x y z from x 0 2 7 y ∞∞ ∞ z ∞∞ ∞ node y table cost to x y z Dx(z) = min{c(x, y) + Dy(z), c(x, z) + Dz(z)} = min{2+1 , 7+0} = 3 x 0 2 3 y 2 0 1 z 3 1 0 time Transport Layer 83
Distance Vector: link cost changes Link cost changes: r node detects local link cost change r updates routing info, recalculates distance vector r if DV changes, notify neighbors “good news travels fast” 1 x 4 y 50 1 z At time t 0, y detects the link-cost change, updates its DV, and informs its neighbors. At time t 1, z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV. At time t 2, y receives z’s update and updates its distance table. y’s least costs do not change and hence y does not send any message to z. Transport Layer 84
Distance Vector: link cost changes Link cost changes: r good news travels fast r bad news travels slow - “count to infinity” problem! r 44 iterations before algorithm stabilizes: see text 60 x 4 y 50 1 z Poissoned reverse: r If Z routes through Y to get to X : m Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z) r will this completely solve count to infinity problem? Transport Layer 85
Comparison of LS and DV algorithms Message complexity r LS: with n nodes, E links, O(n. E) msgs sent r DV: exchange between neighbors only m convergence time varies Speed of Convergence r LS: O(n 2) algorithm requires O(n. E) msgs m may have oscillations r DV: convergence time varies m may be routing loops m count-to-infinity problem Robustness: what happens if router malfunctions? LS: m m DV: m m node can advertise incorrect link cost each node computes only its own table DV node can advertise incorrect path cost each node’s table used by others • error propagate thru network Transport Layer 86
Multiple Access Links and Protocols Two types of “links”: r point-to-point m PPP for dial-up access m point-to-point link between Ethernet switch and host r broadcast (shared wire or medium) m traditional Ethernet m upstream HFC m 802. 11 wireless LAN Transport Layer 87
Multiple Access protocols r single shared broadcast channel r two or more simultaneous transmissions by nodes: interference m collision if node receives two or more signals at the same time multiple access protocol r distributed algorithm that determines how nodes share channel, i. e. , determine when node can transmit r communication about channel sharing must use channel itself! m no out-of-band channel for coordination Transport Layer 88
Ideal Mulitple Access Protocol Broadcast channel of rate R bps 1. When one node wants to transmit, it can send at rate R. 2. When M nodes want to transmit, each can send at average rate R/M 3. Fully decentralized: m m no special node to coordinate transmissions no synchronization of clocks, slots 4. Simple Transport Layer 89
MAC Protocols: a taxonomy Three broad classes: r Channel Partitioning m m divide channel into smaller “pieces” (time slots, frequency, code) allocate piece to node for exclusive use r Random Access m channel not divided, allow collisions m “recover” from collisions r “Taking turns” m Nodes take turns, but nodes with more to send can take longer turns Transport Layer 90
Channel Partitioning MAC protocols: TDMA: time division multiple access r access to channel in "rounds" r each station gets fixed length slot (length = pkt trans time) in each round r unused slots go idle r example: 6 -station LAN, 1, 3, 4 have pkt, slots 2, 5, 6 idle Transport Layer 91
Channel Partitioning MAC protocols: FDMA: frequency division multiple access r channel spectrum divided into frequency bands r each station assigned fixed frequency band r unused transmission time in frequency bands go idle r example: 6 -station LAN, 1, 3, 4 have pkt, frequency bands 2, 5, 6 idle frequency bands time Transport Layer 92
Random Access Protocols r When node has packet to send m transmit at full channel data rate R. m no a priori coordination among nodes r two or more transmitting nodes ➜ “collision”, r random access MAC protocol specifies: m how to detect collisions m how to recover from collisions (e. g. , via delayed retransmissions) r Examples of random access MAC protocols: m slotted ALOHA m CSMA, CSMA/CD, CSMA/CA Transport Layer 93
Slotted ALOHA Assumptions r all frames same size r time is divided into equal size slots, time to transmit 1 frame r nodes start to transmit frames only at beginning of slots r nodes are synchronized r if 2 or more nodes transmit in slot, all nodes detect collision Operation r when node obtains fresh frame, it transmits in next slot r no collision, node can send new frame in next slot r if collision, node retransmits frame in each subsequent slot with prob. p until success Transport Layer 94
Slotted ALOHA Pros r single active node can continuously transmit at full rate of channel r highly decentralized: only slots in nodes need to be in sync r simple Cons r collisions, wasting slots r idle slots r nodes may be able to detect collision in less than time to transmit packet r clock synchronization Transport Layer 95
Slotted Aloha efficiency Efficiency is the long-run fraction of successful slots when there are many nodes, each with many frames to send Suppose N nodes with many frames to send, each transmits in slot with probability p r prob that node 1 has success in a slot = p(1 -p)N-1 r prob that any node has a success = r Np(1 -p)N-1 r For max efficiency with N nodes, find p* that maximizes Np(1 -p)N-1 r For many nodes, take limit of Np*(1 -p*)N-1 as N goes to infinity, gives 1/e =. 37 At best: channel used for useful transmissions 37% of time! Transport Layer 96
Pure (unslotted) ALOHA r unslotted Aloha: simpler, no synchronization r when frame first arrives m transmit immediately r collision probability increases: m frame sent at t 0 collides with other frames sent in [t 0 -1, t 0+1] Transport Layer 97
CSMA (Carrier Sense Multiple Access) CSMA: listen before transmit: If channel sensed idle: transmit entire frame r If channel sensed busy, defer transmission r Human analogy: don’t interrupt others! Transport Layer 98
CSMA collisions spatial layout of nodes collisions can still occur: propagation delay means two nodes may not hear each other’s transmission collision: entire packet transmission time wasted note: role of distance & propagation delay in determining collision probability Transport Layer 99
CSMA/CD (Collision Detection) CSMA/CD: carrier sensing, deferral as in CSMA m collisions detected within short time m colliding transmissions aborted, reducing channel wastage r collision detection: m easy in wired LANs: measure signal strengths, compare transmitted, received signals m difficult in wireless LANs: receiver shut off while transmitting r human analogy: the polite conversationalist Transport Layer 100
CSMA/CD collision detection Transport Layer 101
“Taking Turns” MAC protocols channel partitioning MAC protocols: m share channel efficiently and fairly at high load m inefficient at low load: delay in channel access, 1/N bandwidth allocated even if only 1 active node! Random access MAC protocols m efficient at low load: single node can fully utilize channel m high load: collision overhead “taking turns” protocols look for best of both worlds! Transport Layer 102
“Taking Turns” MAC protocols Polling: r master node “invites” slave nodes to transmit in turn r concerns: m m m polling overhead latency single point of failure (master) Token passing: r control token passed from one node to next sequentially. r token message r concerns: m m m token overhead latency single point of failure (token) Transport Layer 103
Ethernet uses CSMA/CD r No slots r adapter doesn’t transmit if it senses that some other adapter is transmitting, that is, carrier sense r transmitting adapter aborts when it senses that another adapter is transmitting, that is, collision detection r Before attempting a retransmission, adapter waits a random time, that is, random access Transport Layer 104
Ethernet CSMA/CD algorithm 1. Adaptor receives datagram from net 4. If adapter detects another layer & creates frame transmission while transmitting, 2. If adapter senses channel idle, it aborts and sends jam signal starts to transmit frame. If it 5. After aborting, adapter enters senses channel busy, waits until exponential backoff: after the channel idle and then transmits mth collision, adapter chooses a 3. If adapter transmits entire frame K at random from without detecting another {0, 1, 2, …, 2 m-1}. Adapter waits transmission, the adapter is done K·512 bit times and returns to with frame ! Step 2 Transport Layer 105
Ethernet’s CSMA/CD (more) Jam Signal: make sure all other transmitters are aware of collision; 48 bits Bit time: . 1 microsec for 10 Mbps Ethernet ; for K=1023, wait time is about 50 msec See/interact with Java applet on AWL Web site: highly recommended ! Exponential Backoff: r Goal: adapt retransmission attempts to estimated current load m heavy load: random wait will be longer r first collision: choose K from {0, 1}; delay is K· 512 bit transmission times r after second collision: choose K from {0, 1, 2, 3}… r after ten collisions, choose K from {0, 1, 2, 3, 4, …, 1023} Transport Layer 106
- Slides: 106