Networking and TCP/IP
Sam Kumar
CS 162: Operating Systems and System Programming, Lecture 22
https://inst.eecs.berkeley.edu/~cs162/su20
Read: End-to-End Principle
10/21/2021 Kumar CS 162 at UC Berkeley, Summer 2020

Recall: New Technology File System (NTFS)
• Default on modern Windows systems
• Instead of FAT or inode array: Master File Table (MFT)
  • Max 1 KB size for each table entry
• Each entry in MFT contains metadata and:
  • File’s data directly (for small files)
  • A list of extents (start block, size) for file’s data
  • For big files: pointers to other MFT entries with more extent lists

Recall: The mmap System Call
• API provided by OS for a process to alter its memory map (memory regions)
• Also supports anonymous mapping: memory not backed by a file
• Memory regions can be
  • shared (after fork, parent and child see each other's writes)
  • private (after fork, the child gets a copy-on-write copy; writes are not shared)
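
As a small illustration of anonymous mapping, the sketch below uses Python's standard `mmap` module, which wraps the system call. The one-page length and the bytes written are arbitrary choices made for this example.

```python
import mmap

# Passing -1 as the file descriptor requests an anonymous mapping:
# memory that is not backed by any file.
region = mmap.mmap(-1, mmap.PAGESIZE)

# The mapping behaves like a mutable byte buffer.
region[0:5] = b"hello"
first_bytes = bytes(region[0:5])

region.close()
```

A file-backed mapping would pass a real file descriptor instead of -1.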

Recall: File System Buffer Cache
• OS implements a cache of disk blocks
[Figure: the buffer cache in memory (blocks and their state, e.g., free) sits between the file descriptors in the PCB and the disk; reads and writes go through it for data blocks, inodes, directory data blocks, and the free bitmap]

Recall: Buffer Caching vs. Demand Paging
• Replacement Policy?
  • Demand Paging: LRU is infeasible; use approximation (like NRU/Clock)
  • Buffer Cache: LRU is OK
• Eviction Policy?
  • Demand Paging: evict not-recently-used pages when memory is close to full
  • Buffer Cache: write back dirty blocks periodically, even if used recently
    • Why? To minimize data loss in case of a crash

Recall: Important Terminology
• Availability: the probability that the system can accept and process requests
  • Often measured in “nines” of probability (e.g., 99.9% is “3-nines of availability”)
  • Key idea here is independence of failures
• Durability: the ability of a system to recover data despite faults
  • This idea is fault tolerance applied to data
  • Doesn’t necessarily imply availability: information on pyramids was very durable, but could not be accessed until discovery of Rosetta Stone
• Reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time (IEEE definition)
  • Usually stronger than availability: means system is not only “up”, but also working correctly
  • Includes availability, security, fault tolerance/durability
  • Must make sure data survives system crashes, disk crashes, other problems
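
To make the “nines” concrete, here is a short arithmetic sketch. The function names are hypothetical, and the replication formula assumes that replica failures are independent and that any single working replica is enough to serve requests.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def downtime_seconds_per_year(nines: int) -> float:
    """Expected unavailable seconds per year at `nines` nines of availability."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability

def replicated_availability(single: float, replicas: int) -> float:
    """Availability of `replicas` independent copies (assumption: any one suffices)."""
    return 1 - (1 - single) ** replicas

three_nines_hours = downtime_seconds_per_year(3) / 3600
two_replicas = replicated_availability(0.99, 2)
```

Under these assumptions, 3-nines allows roughly 8.76 hours of downtime per year, and two independent 99% replicas together reach about 99.99%.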

Recall: RAID 6 and Erasure Codes

Recall: Two Reliability Approaches
Careful Ordering and Recovery
• FAT & FFS + (fsck)
• Each step builds structure
  • Data block << inode << free << directory
• Last step links it in to rest of FS
• Recovery scans structure looking for incomplete actions
Versioning and Copy-on-Write
• ZFS, …
• Version files at some granularity
• Create new structure linking back to unchanged parts of old
• Last step is to declare that the new version is ready

Recall: Key Systems Concept: Transaction
• A transaction is an atomic sequence of reads and writes that takes the system from one consistent state to another
[Figure: consistent state 1 → transaction → consistent state 2]
• Recall: Code in a critical section appears atomic to other threads
• Transactions extend the concept of atomic updates from memory to persistent storage

Key Systems Concept: Log
• Writing/appending a single item to a log is atomic
  • Like memory load/store in most architectures
• Key idea: append a single item (atomic) to seal the commitment to a whole sequence of actions
[Figure: log contents, oldest entry first]
  Start Tran N
  Get $10 from account A
  Get $7 from account B
  Get $13 from account C
  Put $15 into account X
  Put $15 into account Y
  Commit Tran N

Journaling File Systems
• Don’t modify data structures on disk directly
• Write each update as a transaction recorded in a log
  • Commonly called a journal or intention list
  • Also maintained on disk (allocate blocks for it when formatting)
• Once changes are in the log, they can be safely applied
  • e.g., modify inode pointers and directory mapping
• Garbage collection: once a change is applied, remove its entry from the log

Creating a File (No Journaling Yet)
• Find free data block(s)
• Find free inode entry
• Find dirent insertion point
--------------------
• Write map (i.e., mark used)
• Write inode entry to point to block(s)
• Write dirent to point to inode
[Figure: on-disk free space map, data blocks, inode table, and directory entries]

Creating a File (With Journaling)
• Find free data block(s)
• Find free inode entry
• Find dirent insertion point
--------------------
• [log] Write map (i.e., mark used)
• [log] Write inode entry to point to block(s)
• [log] Write dirent to point to inode
[Figure: on-disk free space map, data blocks, inode table, and directory entries, plus a log in non-volatile storage (Flash or on Disk). The log has head and tail pointers, done and pending regions, and the new transaction's records between its start and commit markers.]

After Commit, Eventually Replay Transaction
• All accesses to the file system first look in the log
  • Actual on-disk data structure might be stale
• Eventually, copy changes to disk and discard transaction from the log
[Figure: the log in non-volatile storage (Flash or on Disk); once the committed transaction is applied to the free space map, data blocks, inode table, and directory entries, the tail advances past it]

Crash Recovery: Discard Partial Transactions
• Upon recovery, scan the log
• Detect transaction start with no commit
• Discard log entries
• Disk remains unchanged
[Figure: the log in non-volatile storage (Flash or on Disk) holds a start marker and pending records but no commit]

Crash Recovery: Keep Complete Transactions
• Scan log, find start
• Find matching commit
• Redo it as usual
  • Or just let it happen later
[Figure: the log in non-volatile storage (Flash or on Disk) holds a start marker, pending records, and a matching commit]
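
The two recovery cases above can be sketched as a single scan of the log. The record format used here (tuples tagged "start", "op", and "commit") and the function name `recover` are hypothetical, chosen only to illustrate the idea.

```python
def recover(log):
    """Return the operations of committed transactions, in log order.

    `log` is a list of tuples: ("start", tid), ("op", tid, description),
    or ("commit", tid). Operations of transactions that never committed
    are discarded, so the disk is left unchanged for them.
    """
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    redo = []
    for entry in log:
        if entry[0] == "op" and entry[1] in committed:
            redo.append(entry[2])
    return redo

log = [
    ("start", 1),
    ("op", 1, "mark block 7 used"),
    ("op", 1, "inode 3 -> block 7"),
    ("commit", 1),
    ("start", 2),
    ("op", 2, "mark block 9 used"),
    # Crash happened here: transaction 2 has no commit record.
]
replayed = recover(log)
```

Here only transaction 1's two operations are redone; transaction 2 is dropped because its commit record never reached the log.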

Journaling Summary
Why go through all this trouble?
• Updates atomic, even if we crash:
  • Update either gets fully applied or discarded
  • All physical operations treated as a logical unit
Isn’t this expensive?
• Yes! We're now writing all data twice (once to log, once to actual data blocks in target file)
• Modern filesystems journal metadata updates only
  • Record modifications to file system data structures
  • But apply updates to a file’s contents directly

Going Further: Log-Structured File Systems
• The log IS what is recorded on disk
  • File system operations logically replay log to get result
  • Create data structures to make this fast
  • On recovery, replay the log
• Index (inodes) and directories are written into the log too
• Large, important portion of the log is cached in memory
• Do everything in bulk: log is collection of large segments
• Each segment contains a summary of all the operations within the segment
  • Fast to determine if segment is relevant or not
• Free space is approached as continual cleaning process of segments
  • Detect what is live or not within a segment
  • Copy live portion to new segment being formed (replay)
  • Garbage collection: entire segment

LFS Paper (see Readings)
• LFS: write file1 block, write inode for file1, write directory page mapping “file1” in “dir1” to its inode, write inode for this directory page. Do the same for “/dir2/file2”. Then write summary of the new inodes that got created in the segment
• Reads are same for LFS and FFS
  • Buffer cache likely to hold information in both cases
  • But disk IOs are very different

Focus of Today: Implementing the Socket Abstraction

Recall: Most Transformative Artifact of Human Civilization…
[Figure: map of the Internet in 1999]

Networking Bottom Line
• All distributed systems boil down to message passing
  • Across a network, the internet
  • Spread across the world
• Things fail
  • Nodes die, software crashes, routers crash, messages get lost or corrupted…
• Dealing with reliability / failure is inherent
  • Limited knowledge of what happened is fundamental
• Question is what steps are taken at what level?
  • What is the “contract” with the infrastructure?
  • What can we reason from and what can we not?

Recall: Sockets
[Figure: the client socket sends a connection request to the server socket; the server creates a new connection socket for that connection]
• 5-Tuple identifies each connection:
  1. Source IP Address
  2. Destination IP Address
  3. Source Port Number
  4. Destination Port Number
  5. Protocol (always TCP here)
• Often, Client Port “randomly” assigned
  • Done by OS during client socket setup
• Server Port often “well known”
  • 80 (web), 443 (secure web), 25 (sendmail), etc.
  • Well-known ports from 0 to 1023

Sockets in Schematic
Server
• Create Server Socket
• Bind it to an Address (host:port)
• Listen for Connection
• Accept syscall() returns a Connection Socket
• read request, write response
• Close Connection Socket
• Close Server Socket
Client
• Create Client Socket
• Connect it to server (host:port)
• write request, read response
• Close Client Socket
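
The schematic maps closely onto the socket API. Below is a minimal loopback sketch using Python's standard `socket` module; the upper-casing "service", the helper name `serve_once`, and the use of a thread are assumptions made for illustration. A single `recv` is only adequate here because the message is tiny.

```python
import socket
import threading

def serve_once(server_sock):
    """Accept one connection, answer one request, then close the connection."""
    conn, _addr = server_sock.accept()   # returns a new connection socket
    with conn:
        request = conn.recv(1024)        # read request
        conn.sendall(request.upper())    # write response

# Server: create, bind, listen. Port 0 asks the OS to pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
host, port = server.getsockname()
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

# Client: create, connect, write request, read response, close.
with socket.create_connection((host, port)) as client:
    client.sendall(b"hello")
    response = client.recv(1024)

worker.join()
server.close()
```

Note that the server ends up with two sockets: the listening socket, and the connection socket returned by `accept()`.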

How does the network work? And what abstractions does it provide the OS?

Networking Challenge
• Many different applications
  • Email, Web, Online Games, etc.
• Many different network types and technologies
  • Wireless, Wired, Optical, etc.
• Inherently distributed, parts can fail, no overall management authority, …
• How do we manage this complexity?
  • Such that the whole is more “reliable” than its parts
  • Available, despite underlying unreliability

Networking Challenge
• Re-implement every application for every technology? No!
[Figure: every application (Skype, SSH, NFS, HTTP) connected directly to every transmission medium (coaxial cable, fiber optic, packet radio)]

Networking Challenge
• Re-implement sockets for every technology? No!
[Figure: applications (Skype, NFS, SSH, HTTP) sit on top of sockets; sockets sit on top of UDP/IPv4, UDP/IPv6, TCP/IPv4, and TCP/IPv6, which run over the transmission media (coaxial cable, fiber optic)]

Layering: Complex Services from Simple Ones
• Physical and Link Layers (Wi-Fi, Ethernet, …)
  • Unreliable, local exchange of limited-size frames
• Network (IP): routing between local networks
  • Unreliable, global exchange of limited-size packets
• Transport (e.g., TCP or UDP): glue
  • Reliable (with retries), ordered stream of bytes (TCP)
  • Not-necessarily reliable datagrams (UDP)
• Application: everything on top of sockets

Internet Architecture: The Five Layers
• Lower three layers implemented everywhere
• Top two layers implemented only at hosts
[Figure: Host A and Host B each run all five layers (Application, Transport, Network, Datalink, Physical), with sockets between Application and Transport; the router between them runs only Network, Datalink, and Physical]

Internet Architecture: Five Layers
• Communication goes down to physical network
• Then from network peer to peer
• Then up to relevant layer
[Figure: the same Host A, router, Host B layer stacks, with the path of a message traced down Host A's stack, across the router, and up Host B's stack]

Layering Analogy: Packets in Envelopes
[Figure: each layer on the sender wraps the data from the layer above in its own header, and each layer on the receiver removes one]
• Application Layer: Data
• Transport Layer: Trans. Hdr. + Data
• Network Layer: Net. Hdr. + Trans. Hdr. + Data
• Datalink Layer: Frame Hdr. + Net. Hdr. + Trans. Hdr. + Data
• Physical Layer: bits on the wire (10100110101110)
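
The envelope analogy can be sketched in a few lines. The header contents below (`[TRANS]`, `[NET]`, `[FRAME]`) are placeholders invented for this example; real headers are binary structures with many fields.

```python
def encapsulate(payload: bytes) -> bytes:
    """Wrap application data in transport, network, and frame headers, in that order."""
    segment = b"[TRANS]" + payload
    packet = b"[NET]" + segment
    frame = b"[FRAME]" + packet
    return frame

def strip_header(data: bytes, header: bytes) -> bytes:
    """Remove one layer's header, checking that it is actually present."""
    if not data.startswith(header):
        raise ValueError("unexpected header")
    return data[len(header):]

def decapsulate(frame: bytes) -> bytes:
    """Undo encapsulate(): each layer removes its own header on the way up."""
    packet = strip_header(frame, b"[FRAME]")
    segment = strip_header(packet, b"[NET]")
    return strip_header(segment, b"[TRANS]")

frame = encapsulate(b"hi")
recovered = decapsulate(frame)
```

The outermost header belongs to the lowest layer, which is why the receiver peels headers in the reverse of the order the sender added them.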

The Internet Hourglass
[Figure: the hourglass model, wide at the top and bottom and narrow at IP; sockets are the OS interface to it]
• Applications: SMTP, HTTP, DNS, NTP
• Transport: TCP, UDP
• Waist: IP
• Data Link: Ethernet, SONET, 802.11
• Physical: Copper, Fiber, Radio

Implications of the Hourglass
• There is only one Network-Layer Protocol: IP
  • Allows networks to interoperate
• Above IP: Applications function on all networks
• Below IP: Change network’s construction without disturbing applications
• One drawback: Changing IP itself (e.g., transitioning to IPv6) is very involved

Internet Protocol (IP)
• Internet Protocol: Internet’s network layer
• Service it provides: “Best-Effort” Packet Delivery
  • Tries its “best” to deliver packet to its destination
  • Packets may be lost
  • Packets may be corrupted
  • Packets may be delivered out of order
[Figure: source and destination hosts connected through an IP network, next to the layer stack (Application, Presentation, Session, Transport, Network, Datalink, Physical)]

Internet: Network of Networks
• Hierarchy of Networks: Scales to billions of hosts
[Figure: subnet 1, subnet 2, and subnet 3 each attach to a router; routers connect to each other (including over a transcontinental link) and to other subnets]

Internet Protocol Features
• Routing: an IP packet goes anywhere
  • Just need the destination IP address
• Fragmentation: split big messages into smaller pieces
  • Think about downloading a file
  • Maximum size 64K
  • Reassemble at destination
  • Hides differences in link/physical layers
• Multiple protocols running on top
  • ICMP, TCP, UDP, …
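
Fragmentation and reassembly can be sketched as follows. This is a simplification: each fragment here is just an (offset, bytes) pair, whereas real IP fragments carry identification, offset, and more-fragments fields in the IP header. The function names are hypothetical.

```python
def fragment(message: bytes, max_payload: int):
    """Split a message into (offset, chunk) pairs of at most max_payload bytes."""
    return [(offset, message[offset:offset + max_payload])
            for offset in range(0, len(message), max_payload)]

def reassemble(fragments):
    """Rebuild the message from fragments received in any order."""
    return b"".join(chunk for _offset, chunk in sorted(fragments))

message = bytes(range(10))
pieces = fragment(message, 4)                 # offsets 0, 4, 8
rebuilt = reassemble(list(reversed(pieces)))  # simulate out-of-order arrival
```

Carrying the offset in each fragment is what lets the destination reassemble correctly even when fragments arrive out of order.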

Internet Protocol Non-Features
• Unreliable Delivery ("Best Effort")
  • IP packet delivery not guaranteed
  • May be lost by underlying physical layer (e.g., radio noise)
  • May be dropped in transit
• Out-of-order/duplicate delivery
  • Tolerance of physical layer retrying transmission
  • Tolerance of multiple paths

End-to-End Principle
• Seen as a guiding principle of the Internet
• Some types of network functionality can only be correctly implemented end-to-end
  • Reliability, security, etc.
• Implementing complex functionality in the network:
  • Doesn’t necessarily reduce complexity on end hosts
  • Does increase network complexity
  • Imposes a cost on all applications, even if they don’t need the functionality

Transport Layer
• Service:
  • Provide end-to-end communication between processes
  • Demultiplexing of communication between hosts
  • Possible other services:
    • Reliability in the presence of errors
    • Timing properties
    • Rate adaptation (flow control, congestion control)
• Interface: send message to “specific process” at given destination; local process receives messages sent to it
  • How to name the process you want to send data to?
• Protocol: port numbers, perhaps implement reliability, flow control, packetization of large messages, framing
• Prime Examples: TCP and UDP

Transport Layer Protocols
• Datagram service (UDP)
  • No-frills extension of “best-effort” IP
  • Multiplexing/Demultiplexing among processes
• Reliable, in-order delivery (TCP)
  • Connection set-up & tear-down
  • Discarding corrupted packets (segments)
  • Retransmission of lost packets (segments)
  • Flow control
  • Congestion control
• Services not available
  • Delay and/or bandwidth guarantees
  • Sessions that survive change-of-IP-address

Application Layer
• Service: any service provided to the end user
• Interface: depends on the application
• Protocol: depends on the application
• Examples: Skype, SMTP (email), HTTP (Web), Halo, BitTorrent, …
• What happened to layers 5 & 6?
  • “Session” and “Presentation” layers
  • Part of OSI architecture, but not Internet architecture
  • Their functionality is provided by application layer

Socket API
[Figure: the Socket API sits between the Application and the Transport layer (TCP, UDP), which runs on the Network layer (IP)]

BSD Socket API
• Created at UC Berkeley (1980s)
• Most popular network API
• Ported to various OSes, various languages
  • Windows Winsock, BSD, OS X, Linux, Solaris, …
  • Socket modules in Java, Python, Perl, …
• Like Unix file I/O API
  • In the form of file descriptor
  • Can share same read()/write()/close() system calls
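
The file-descriptor point can be seen directly from Python on a Unix-like system: `os.read` and `os.write` are thin wrappers over the read() and write() system calls, and they accept a socket's descriptor. This sketch assumes a Unix-like OS; it is not expected to work on Windows, where sockets are not ordinary file descriptors.

```python
import os
import socket

# socketpair() returns two already-connected sockets, which is enough
# to demonstrate plain file-style I/O on socket descriptors.
left, right = socket.socketpair()

os.write(left.fileno(), b"ping")       # write() on a socket descriptor
received = os.read(right.fileno(), 4)  # read() on the peer's descriptor

left.close()
right.close()
```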

Summary of Network Layers
• Physical Layer: bits on a wire
• Link Layer: send frames over a local network
  • Local addresses (MAC address)
• Network Layer: send packets to machines on different networks
  • Global address (IP address)
• Transport Layer: send data between processes on different machines
  • Could be byte stream (the sockets we’ve seen) or raw segments (like packets)
  • OS implements this layer
• Application Layer: do something with the data (RPCs, HTTP, etc.)

Does the Internet have to be this way?
“There is a tendency in our field to believe that everything we currently use is a paragon of engineering, rather than a snapshot of our understanding at the time. We build great myths of spin about how what we have done is the only way to do it to the point that our universities now teach the flaws to students (and professors and textbook authors) who don’t know better.” (John Day)
• The design of the Internet has many flaws
• But one may say that they’ve stood the test of time…

Implementing Reliable Transport Protocol: TCP

Announcements
• Project 3 is released
  • Start on this early
  • Pay particular attention to synchronization
• Reminder: Homework 6 is optional
  • All mandatory assignments have been released…
• Quiz 3 is on Monday

Recall: End-to-End Principle
• Seen as a guiding principle of the Internet
• Some types of network functionality can only be correctly implemented end-to-end
  • Reliability, security, etc.
• Implementing complex functionality in the network:
  • Doesn’t necessarily reduce complexity on end hosts
  • Does increase network complexity
  • Imposes a cost on all applications, even if they don’t need the functionality

TCP: Transmission Control Protocol
• Reliable, in-order, and at-most-once delivery
• Stream oriented: messages can be of arbitrary length
• Provides multiplexing/demultiplexing to IP
• Provides congestion and flow control
• Application examples: file transfer, chat, HTTP

TCP Service
1. Open connection: 3-way handshaking
2. Reliable byte stream transfer from (IPa, TCP_Port1) to (IPb, TCP_Port2)
  • Indication if connection fails: Reset
3. Close (tear-down) connection

Open Connection: 3-Way Handshake
• Goal: agree on a set of parameters, i.e., the start sequence number for each side
  • Starting sequence number: sequence number of the first byte in the stream
  • Starting sequence numbers are random

Open Connection: 3-Way Handshake
• Server calls listen() to wait for a new connection
• Client calls connect() providing server’s IP address and port number
• Each side sends SYN packet proposing an initial sequence number (one for each sender) and ACKs the other
[Figure: timeline between client (initiator) and server]
  1. Server: listen()
  2. Client: connect() sends SYN, SeqNum = x
  3. Server: sends SYN and ACK, SeqNum = y and Ack = x+1; allocates buffer space, connection enqueued
  4. Client: sends ACK, Ack = y+1
  5. Server: accept() dequeues connection
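
The sequence-number arithmetic in the handshake can be sketched as below. The dictionary representation of a packet and the function name are hypothetical; TCP sequence numbers are 32 bits wide, so the arithmetic wraps modulo 2**32.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit

def three_way_handshake(rng):
    """Return the three handshake packets as simple dicts (illustrative format)."""
    x = rng.randrange(SEQ_SPACE)   # client's random initial sequence number
    y = rng.randrange(SEQ_SPACE)   # server's random initial sequence number
    syn = {"flags": {"SYN"}, "seq": x}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y,
               "ack": (syn["seq"] + 1) % SEQ_SPACE}
    ack = {"flags": {"ACK"}, "ack": (syn_ack["seq"] + 1) % SEQ_SPACE}
    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake(random.Random(0))
```

Each acknowledgment number is the other side's initial sequence number plus one, which is how both sides confirm they saw each other's proposal.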

Close Connection: 4-Way Teardown
• Connection is not closed until both sides agree
• If multiple FDs on Host 1 refer to this connection, all of them must be closed
• Same for close() call on Host 2
[Figure: timeline between Host 1 and Host 2]
  1. Host 1: close() sends FIN; Host 2 replies FIN ACK
  2. Data that still arrives at Host 1 is discarded by the OS (no socket to give it to); any calls to read() on Host 2 return 0
  3. Host 2: close() sends FIN; Host 1 replies FIN ACK, and can retransmit the FIN ACK if it is lost
  4. After a timeout, Host 1's OS deallocates connection state

Problem: Packet Reordering
• Want to divide a message into packets / frames
  • Think about downloading a file over IP
  • 64K max packet size
• IP might reorder these packets
  • Imagine receiving the end of a file before the beginning!

Solution: Sequence Numbers and Buffering
• Simulate ordered messages on top of unordered messages
• Assign each packet a sequence number: 0, 1, 2, 3, …
• If packets arrive out of order, hold on to them
  • Deliver their data in order to user (through socket interface)
  • Example: Hold on to #3 until #2 arrives, etc.
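
A receiver-side sketch of this idea follows. The class name `ReorderBuffer` and its interface are hypothetical; the sketch numbers packets from 0, as the slide does, and ignores duplicates of data that was already delivered.

```python
class ReorderBuffer:
    """Hold out-of-order packets and release data strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0   # sequence number the user is waiting for
        self.held = {}      # out-of-order packets, keyed by sequence number

    def receive(self, seq, data):
        """Accept one packet; return the list of data now deliverable in order."""
        if seq >= self.next_seq:        # older packets are duplicates: ignore
            self.held[seq] = data
        delivered = []
        while self.next_seq in self.held:
            delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return delivered

buf = ReorderBuffer()
step1 = buf.receive(0, "a")   # in order: delivered immediately
step2 = buf.receive(2, "c")   # held: #1 has not arrived yet
step3 = buf.receive(3, "d")   # held as well
step4 = buf.receive(1, "b")   # fills the gap: releases b, c, d
```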

Problem: Dropped Packets
• All physical networks can garble or drop packets
  • Physical hardware problems (bad wire, bad signal)
• Therefore, IP can garble or drop packets
  • It doesn't repair this itself (end-to-end principle!)
• Building reliable message delivery
  • Confirm that packets aren't garbled
  • Confirm that packets arrive exactly once

Solution: Acknowledgments
[Figure: left, A sends a packet to B and B returns an ack; right, A's packet is lost, A times out and resends the packet, and B then returns an ack]
• Checksum: Detect garbled packets
• Receiver sends a packet to acknowledge when a packet is received and ungarbled
  • No acknowledgement? Resend after timeout
• What if acknowledgement dropped?
  • Packet is resent (wasteful), second chance to acknowledge
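
The slide does not say which checksum is used. As one concrete example, the sketch below implements the 16-bit one's-complement Internet checksum (the style used by IP, TCP, and UDP, described in RFC 1071); treating it as the checksum meant here is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit word
    while total >> 16:                          # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = b"\x12\x34\x56\x78"
checksum = internet_checksum(payload)
packet = payload + checksum.to_bytes(2, "big")

# The receiver recomputes over payload plus checksum; 0 means "not garbled".
intact = internet_checksum(packet) == 0
corrupted = b"\x12\x35\x56\x78" + checksum.to_bytes(2, "big")
garbled_detected = internet_checksum(corrupted) != 0
```

A checksum like this detects many but not all corruptions, which is one reason applications sometimes add their own end-to-end integrity checks.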

Stop-and-Wait (No Packet Loss)
[Figure: sender/receiver timeline with labels RTT and d. The sender transmits packet 1, waits for ACK 1, then sends packet 2, waits for ACK 2, then sends packet 3.]

Stop-and-Wait with Packet Loss
• Loss recovery relies on timeouts
• How to choose a good timeout?
[Figure: sender/receiver timeline. Packet 1 is lost; after the timeout (longer than one RTT) the sender retransmits packet 1 and the receiver returns ACK 1.]
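
Stop-and-wait with retransmission can be simulated without a real network. In this sketch the loss pattern is supplied explicitly (the set `drop_first_attempt` is a hypothetical parameter), the timeout is implied by the retry loop rather than measured, and ACK loss is not modeled.

```python
def stop_and_wait(packets, drop_first_attempt):
    """Send packets one at a time, retransmitting after a simulated timeout.

    `drop_first_attempt` is a set of packet indices whose first transmission
    is lost. Returns (data delivered to the receiver, total transmissions).
    """
    delivered = []
    transmissions = 0
    for index, data in enumerate(packets):
        attempt = 0
        while True:
            transmissions += 1
            attempt += 1
            lost = index in drop_first_attempt and attempt == 1
            if not lost:
                delivered.append(data)   # receiver gets packet, sends ACK
                break                    # sender sees ACK, moves to next packet
            # No ACK arrives: the timeout fires and the loop retransmits.
    return delivered, transmissions

result, sent = stop_and_wait(["p1", "p2", "p3"], drop_first_attempt={1})
```

With one loss, three packets cost four transmissions, and the sender sits idle for a full timeout before the retry, which is why the choice of timeout matters.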

Moving Away From Stop-and-Wait
• Idea: don’t wait for ACK before sending next packet
• How many packets are in-flight now?
• How long does the sender have to keep the packets around?
• How long does the receiver have to keep the packets’ data?
• What if sender is sending packets faster than the receiver can process the data?
[Figure: sender/receiver timeline with labels RTT and d, showing several packets sent back-to-back before the first ACK returns]

Recall: Communication Between Processes
[Figure: Process A calls write(wfd, wbuf, wlen); data passes through an in-memory queue; Process B calls n = read(rfd, rbuf, rmax);]
• Data written by A is held in memory until B reads it
• Queue has a fixed capacity
  • Writing to the queue blocks if the queue is full
  • Reading from the queue blocks if the queue is empty
• POSIX provides this abstraction in the form of pipes

Buffering in a TCP Connection
[Figure: Process A on Host 1 and Process B on Host 2; each host has a send queue and a receive queue, with data (packets) flowing from each send queue to the other host's receive queue. Separate pair of queues per TCP connection.]
• A single TCP connection needs four in-memory queues:
  • Send buffer: add data on write syscall, remove data when ACK received
  • Receive buffer: add data when packets received, remove data on read syscall

Window Size: Space in Receive Queue
[Figure: the same Host 1 / Host 2 send and receive queues as on the previous slide]
• A host’s window size for a TCP connection is how much remaining space it has in its receive queue
• A host advertises its window size in every TCP packet it sends!
• Sender never sends more than receiver’s advertised window size
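
The relationship between the receive queue and the advertised window can be sketched as follows. The class `ReceiveQueue` and its method names are hypothetical, and dropping bytes that do not fit is a simplification made for this example.

```python
class ReceiveQueue:
    """Fixed-capacity receive buffer whose free space is the advertised window."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffered = bytearray()

    def window(self) -> int:
        """Remaining space: the value advertised to the sender."""
        return self.capacity - len(self.buffered)

    def on_packet(self, data: bytes) -> int:
        """Buffer arriving data, up to the free space; return bytes accepted."""
        accepted = data[:self.window()]
        self.buffered += accepted
        return len(accepted)

    def read(self, max_bytes: int) -> bytes:
        """Application read: removes data, which reopens the window."""
        out = bytes(self.buffered[:max_bytes])
        del self.buffered[:max_bytes]
        return out

queue = ReceiveQueue(capacity=8)
accepted = queue.on_packet(b"abcde")
window_after_packet = queue.window()
data = queue.read(2)
window_after_read = queue.window()
```

The window shrinks as packets arrive and grows again only when the application reads, so a slow reader automatically slows the sender down.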

Sliding Window Protocol
• TCP sender knows receiver’s window size, and aims never to exceed it
• But packets that it previously sent may arrive, filling the window!
Rule: TCP sender ensures that:
  Number of Sent but UnACKed Bytes < Receiver’s Advertised Window Size
• Can send new packets as long as sent-but-unacked packets haven’t already filled the advertised window size
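
On the sender side, the rule reduces to one comparison before each transmission. The function name and parameters below are hypothetical; the sketch allows a send only if the total of unacknowledged bytes would stay within the advertised window.

```python
def can_send(bytes_in_flight: int, segment_len: int, advertised_window: int) -> bool:
    """True if sending segment_len more bytes keeps unACKed data within the window."""
    return bytes_in_flight + segment_len <= advertised_window

# With a 3000-byte window and 1000-byte segments, a third segment still fits...
third_fits = can_send(bytes_in_flight=2000, segment_len=1000, advertised_window=3000)
# ...but a fourth must wait until an ACK reduces the bytes in flight.
fourth_fits = can_send(bytes_in_flight=3000, segment_len=1000, advertised_window=3000)
```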

Sliding Window (No Packet Loss)
[Figure: sender/receiver timeline in which the sender transmits packets 1 through 6]
• Unacked packets that sender sent: {1}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, ...
• Out-of-seq packets in receiver’s window: {}, {}, {}, ...

ACKs and Loss Recovery
• In TCP, the receiver always ACKs the packet before the first loss
• Duplicate ACKs are a signal for packet loss
• Rely on timeouts as well (in case the ACKs, too, are lost…)
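
A sketch of how such cumulative ACKs arise at the receiver is shown below. It numbers packets from 1 and acknowledges the last in-order packet, matching the simplified description on this slide rather than TCP's byte-based, next-expected numbering. The function name is hypothetical.

```python
def cumulative_acks(arrivals):
    """For each arriving packet number, return the ACK the receiver sends.

    Each ACK names the highest packet such that it and every earlier packet
    have arrived (0 if packet 1 is still missing).
    """
    received = set()
    highest_in_order = 0
    acks = []
    for seq in arrivals:
        received.add(seq)
        while highest_in_order + 1 in received:
            highest_in_order += 1
        acks.append(highest_in_order)
    return acks

# Packet 4 is lost at first and arrives last, after a retransmission.
acks = cumulative_acks([1, 2, 3, 5, 6, 4])
```

The repeated ACK 3 values are the duplicate ACKs that hint at the loss of packet 4; once the retransmitted packet 4 arrives, a single ACK covers everything up to packet 6.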

Sliding Window with Packet Loss
• In the actual TCP protocol, you actually ACK the sequence number you expect next
  • But that’s just a minor detail…
[Figure: sender/receiver timeline with window size (W) = 3 packets. The sender transmits packets 1 through 7; assume packet 4 is lost. The receiver keeps answering ACK 3 because 4 is missing, and its out-of-sequence buffer goes {}, {}, {}, {5}, {5, 6}, {6, 7}, {}. The sender retransmits packet 4 only after the timeout for packet 4 fires.]
• Why doesn’t sender retransmit packet 4 here (when the duplicate ACK 3 arrives)?

Congestion
• Too much data trying to flow through some part of the network
• IP’s solution: Drop packets
• What happens to TCP connection?
  • Lots of retransmission: wasted work and wasted bandwidth (when bandwidth is scarce)

Congestion Management
• TCP artificially restricts the window size if it sees packet loss
• Careful control loop to make sure:
  1. We don’t send too fast and overwhelm the network
  2. We utilize most of the bandwidth the network has available
• In general, these are conflicting goals!
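
The slide does not name a specific control loop. One well-known approach is additive increase, multiplicative decrease (AIMD): grow the window slowly while ACKs arrive, and cut it sharply on loss. The sketch below is a simplified illustration of that idea under assumed parameters (window counted in packets, +1 per ACK event, halved on loss), not a description of any particular TCP implementation.

```python
def aimd(events, initial_window=1.0, increase=1.0, decrease_factor=0.5):
    """Return the congestion window after each event ("ack" or "loss")."""
    window = initial_window
    history = []
    for event in events:
        if event == "loss":
            # Multiplicative decrease: back off quickly to relieve congestion.
            window = max(1.0, window * decrease_factor)
        else:
            # Additive increase: probe gently for unused bandwidth.
            window += increase
        history.append(window)
    return history

history = aimd(["ack", "ack", "ack", "loss", "ack"])
```

The asymmetry reflects the two conflicting goals: the slow increase works toward using available bandwidth, while the fast decrease avoids overwhelming the network.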

Summary
Physical Reality: Frames             | OS Abstraction: Sockets (Byte Stream)
Limited size                         | Arbitrary size
No ordering guarantees               | Bytes received in order
Unreliable: packets may be dropped   | Reliable: all sent data is received
Machine-to-machine communication     | Process-to-process communication
Only on Local Area Network           | Routed anywhere, across networks
Asynchronous                         | Synchronous

Summary: TCP
• Use sequence numbers to solve out-of-order delivery problem
• Use acknowledgements to solve reliable delivery problem
• For better utilization, allow a window of unacknowledged packets
• Adjust window size in response to perceived congestion events