Datacenter architectures
V. Arun
College of Computer Science, University of Massachusetts Amherst

Data center networks
• 10's to 100's of thousands of hosts, often closely coupled, in close proximity:
  - e-business (e.g., Amazon)
  - content servers (e.g., YouTube, Akamai, Apple, Microsoft)
  - search engines, data mining (e.g., Google)
• challenges:
  - multiple applications, each serving massive numbers of clients
  - managing/balancing load, avoiding processing, networking, data bottlenecks
[photo: inside a 40-ft Microsoft container, Chicago data center]
University of Massachusetts Amherst • School of Computer Science

Data center networks
load balancer: application-layer routing
• receives external client requests
• directs workload within data center
• returns results to external client (hiding data center internals from client)
[figure: data center hierarchy: Internet, border router, access routers, load balancers, tier-1 switches, tier-2 switches, TOR switches, server racks 1-8; servers labeled A, B, C]

Data center networks
• rich interconnection among switches, racks:
  - increased throughput between racks (multiple routing paths possible)
  - increased reliability via redundancy
[figure: tier-1 switches, tier-2 switches, TOR switches, server racks 1-8]

Broad questions
• How are massive numbers of commodity machines networked inside a data center?
• Virtualization: how to effectively manage physical machine resources across client virtual machines?
• Operational costs:
  - server equipment
  - power and cooling

[figure] Source: NRDC research paper

Breakdown with respect to data center size
[figure] Source: NRDC research paper

Chapter 5: Link Layer
Computer Networking: A Top Down Approach, 6th edition
Jim Kurose, Keith Ross
Addison-Wesley, March 2012
All material copyright 1996-2012 J.F. Kurose and K.W. Ross, All Rights Reserved

Chapter 5: Link layer
our goals:
• understand principles behind link layer services:
  - error detection, correction
  - sharing a broadcast channel: multiple access
  - link layer addressing
  - local area networks: Ethernet, VLANs
• instantiation, implementation of various link layer technologies

Link layer, LANs: outline
5.1 introduction, services
5.2 error detection, correction
5.3 multiple access protocols
5.4 LANs
  - addressing, ARP
  - Ethernet
  - switches
  - VLANs
5.5 link virtualization: MPLS
5.6 data center networking
5.7 a day in the life of a web request

Link layer: introduction
terminology:
• hosts and routers: nodes
• communication channels that connect adjacent nodes along communication path: links
  - wired links
  - wireless links
  - LANs
• layer-2 packet: frame, encapsulates datagram
data-link layer has responsibility of transferring datagram from one node to physically adjacent node over a link
[figure: path from host to host across a global ISP]

Link layer: context
• datagram transferred by different link protocols over different links:
  - e.g., Ethernet on first link, frame relay on intermediate links, 802.11 on last link
• each link protocol provides different services
  - e.g., may or may not provide rdt over link
transportation analogy:
• trip from Amherst to Lausanne
  - limo: Amherst to BOS
  - plane: BOS to Geneva
  - train: Geneva to Lausanne
• tourist = datagram
• transport segment = communication link
• transportation mode = link layer protocol
• travel agent = routing algorithm

Link layer services
• framing, multiple link access:
  - encapsulate datagram into frame, adding header, trailer
  - channel access if shared medium
  - "MAC" addresses used in frame headers to identify source and destination (different from IP address!)
  - Q: why two addresses for the same interface?
• reliable delivery between adjacent nodes
  - we learned how to do this already (chapter 3)!
  - seldom used on low bit-error links (fiber, twisted pair)
  - wireless links: high error rates, need link-layer reliability
  - Q: why both link-level and end-end reliability?

Link layer services (more)
• flow control:
  - pacing between adjacent sending and receiving nodes
• error detection:
  - errors caused by signal attenuation, noise
  - receiver detects presence of errors: signals sender for retransmission or drops frame
• error correction:
  - receiver identifies and corrects bit error(s) without resorting to retransmission
• half-duplex and full-duplex:
  - with half duplex, nodes at both ends of link can transmit, but not at same time

Where is the link layer implemented?
• in every host and router
• link layer implemented in "adaptor" (aka network interface card, NIC) or on a chip
  - Ethernet card, 802.11 card; Ethernet chipset
  - implements link and physical layers
• attaches to host system buses
• combination of hardware, software, firmware
[figure: host with application, transport, network, link layers on CPU and memory; network adapter card whose controller implements link and physical layers, attached via host bus (e.g., PCI) to the physical transmission medium]

Adaptors communicating
[figure: sending host and receiving host controllers exchanging a frame that carries a datagram]
• sending side:
  - encapsulates datagram in link layer frame
  - adds error checking bits, rdt, flow control, etc.
• receiving side:
  - looks for errors, rdt, flow control, etc.
  - extracts datagram, passes to upper layer

Error detection and correction
EDC = Error Detection and Correction bits (redundancy)
D = data protected by error checking, may include header fields
• error detection is not 100% reliable!
  - protocol may miss some errors, but rarely
  - larger EDC field yields better detection and correction

Parity checking
single bit parity:
• detect single bit errors
two-dimensional bit parity:
• detect and correct single bit errors
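The two-dimensional scheme can be sketched in a few lines of Python. This is a minimal illustration, assuming even parity and data bits laid out as a small matrix; the helper names `parity_bits` and `correct_single_error` and the example matrix are hypothetical. A single flipped data bit makes exactly one row parity and one column parity disagree, and their intersection is the bit to flip back.

```python
from typing import List, Tuple

Matrix = List[List[int]]

def parity_bits(bits: Matrix) -> Tuple[List[int], List[int]]:
    """Even parity: one parity bit per row and one per column."""
    row_parity = [sum(row) % 2 for row in bits]
    col_parity = [sum(col) % 2 for col in zip(*bits)]
    return row_parity, col_parity

def correct_single_error(received: Matrix, row_parity: List[int], col_parity: List[int]) -> Matrix:
    """Return a corrected copy of the data bits, assuming at most one bit (data or parity) flipped."""
    new_rows, new_cols = parity_bits(received)
    bad_rows = [i for i, (a, b) in enumerate(zip(new_rows, row_parity)) if a != b]
    bad_cols = [j for j, (a, b) in enumerate(zip(new_cols, col_parity)) if a != b]
    fixed = [row[:] for row in received]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1  # flipped data bit sits at the intersection
    elif len(bad_rows) + len(bad_cols) > 1:
        raise ValueError("error detected, but not a single-bit error: cannot correct")
    # zero mismatches: no error; exactly one mismatch: a parity bit itself was flipped
    return fixed

data = [[1, 0, 1, 0, 1, 1],
        [1, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 1]]
rows, cols = parity_bits(data)      # computed by the sender
received = [row[:] for row in data]
received[1][2] ^= 1                 # one bit flipped in transit
print(correct_single_error(received, rows, cols) == data)  # True
```

Two flipped bits in the same row leave the row parities unchanged but upset two column parities, so the sketch can detect that case but refuses to "correct" it.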

Internet checksum (review)
goal: detect "errors" (e.g., flipped bits) in transmitted packet (note: used at the transport layer, e.g., UDP and TCP, and in the IPv4 header; not at the link layer)
sender:
• treat segment contents as sequence of 16-bit integers
• checksum: 1's complement of the 1's complement sum of segment contents
• sender puts checksum value into UDP checksum field
receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
  - NO: error detected
  - YES: no error detected. But maybe errors nonetheless?
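The sender and receiver steps above can be sketched in Python as follows; the function name and the 8-byte example segment are illustrative assumptions. The sum uses 1's complement arithmetic, so any carry out of bit 15 is wrapped back into the low-order bits before the final inversion.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit 1's complement of the 1's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF

segment = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(segment)
print(hex(csum))  # 0x220d
# Receiver shortcut: checksumming the data together with the checksum gives 0 if no error is detected.
print(internet_checksum(segment + csum.to_bytes(2, "big")))  # 0
```

The "maybe errors nonetheless" caveat is easy to see here: because addition is commutative, swapping two 16-bit words in transit leaves the checksum unchanged, so that error goes undetected.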

Cyclic redundancy check
• more powerful error detection than Internet checksums
• view data bits, D, as a binary number
• choose r+1 bit pattern (generator), G
• goal: choose r CRC bits, R, such that
  - <D,R> exactly divisible by G (modulo 2)
  - receiver knows G, divides <D,R> by G. If non-zero remainder: error detected!
  - can detect all burst errors less than r+1 bits
• widely used in practice (Ethernet, 802.11 WiFi, ATM)

CRC example
want:
$D \cdot 2^r \oplus R = nG$
equivalently:
$D \cdot 2^r = nG \oplus R$
equivalently: if we divide $D \cdot 2^r$ by $G$, we want remainder $R$ to satisfy:
$R = \text{remainder}\left[\dfrac{D \cdot 2^r}{G}\right]$
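A minimal Python sketch of $R = \text{remainder}[D \cdot 2^r / G]$, assuming D and G are held as plain integers whose binary digits are the bit patterns; D = 101110 and G = 1001 (so r = 3) are illustrative values, and the helper names are hypothetical. Modulo-2 division uses XOR in place of subtraction, with no carries or borrows.

```python
def mod2_remainder(value: int, generator: int) -> int:
    """Remainder of modulo-2 (XOR, carry-less) division of value by generator."""
    g_len = generator.bit_length()
    while value.bit_length() >= g_len:
        # align G under the leading 1 of value and "subtract" with XOR
        value ^= generator << (value.bit_length() - g_len)
    return value

def crc_bits(data: int, generator: int) -> int:
    """R = remainder[D * 2^r / G], where G has r+1 bits."""
    r = generator.bit_length() - 1
    return mod2_remainder(data << r, generator)

D, G = 0b101110, 0b1001   # illustrative values; r = 3
R = crc_bits(D, G)
print(format(R, "03b"))   # 011
frame = (D << 3) | R      # <D,R> as sent on the link
print(mod2_remainder(frame, G))            # 0: no error detected
print(mod2_remainder(frame ^ 0b10000, G))  # 2: non-zero, single-bit error detected
```

The receiver runs the same division on <D,R> itself (no shift), which is why a zero remainder means "no error detected".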

Multiple access links, protocols
two types of "links":
• point-to-point
  - PPP for dial-up access
  - point-to-point link between Ethernet switch, host
• broadcast (shared wire or medium)
  - old-fashioned Ethernet
  - upstream HFC
  - 802.11 wireless LAN
[figure: shared wire (e.g., cabled Ethernet); shared RF (e.g., 802.11 WiFi); shared RF (satellite); humans at a cocktail party (shared air, acoustical)]

Multiple access protocols
• single shared broadcast channel
• two or more simultaneous transmissions by nodes interfere: signals received at the same time collide, causing errors
multiple access protocol
• distributed algorithm that determines how nodes share channel, i.e., determines when a node can transmit
• communication about channel sharing must use channel itself!
  - no out-of-band channel for coordination

An ideal multiple access protocol
given: broadcast channel of rate R bps
goal:
1. when one node wants to transmit, it can send at rate R.
2. when M nodes want to transmit, each can send at average rate R/M
3. fully decentralized:
   • no special node to coordinate transmissions
   • no synchronization of clocks, slots
4. simple

MAC protocols: taxonomy
three broad classes:
• channel partitioning
  - divide channel into smaller "pieces" (time slots, frequency, code)
  - allocate piece to node for exclusive use
• random access
  - channel not divided, allow collisions
  - "recover" from collisions
• "taking turns"
  - nodes take turns, but nodes with more to send can take longer turns

Channel partitioning MAC protocols: TDMA
TDMA: time division multiple access
• access to channel in "rounds"
• each station gets fixed length slot (length = packet transmission time) in each round
• unused slots go idle
• example: 6-station LAN, 1, 3, 4 have packets to send, slots 2, 5, 6 idle
[figure: 6-slot frame with slots 1, 3, 4 in use]
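The TDMA example above can be reproduced with a tiny Python sketch, assuming slot k of each round is owned by station k; the function name `tdma_round` is hypothetical. It makes the cost of fixed allocation visible: slots owned by stations with nothing to send are simply wasted.

```python
from typing import List, Optional, Set

def tdma_round(n_stations: int, has_packet: Set[int]) -> List[Optional[int]]:
    """One TDMA round: slot k belongs to station k; an unused slot stays idle (None)."""
    return [s if s in has_packet else None for s in range(1, n_stations + 1)]

frame = tdma_round(6, {1, 3, 4})
print(frame)  # [1, None, 3, 4, None, None]
used = sum(slot is not None for slot in frame)
print(f"utilization: {used}/{len(frame)}")  # utilization: 3/6
```

Even if station 1 were the only active station, it would still get just one slot in six, i.e., rate R/6.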

Channel partitioning MAC protocols: FDMA
FDMA: frequency division multiple access
• channel spectrum divided into frequency bands
• each station assigned fixed frequency band
• unused transmission time in frequency bands goes idle
• example: 6-station LAN, 1, 3, 4 have packets to send, frequency bands 2, 5, 6 idle
[figure: FDM cable; frequency bands plotted against time]

Random access protocols
• when node has packet to send
  - transmit at full channel data rate R
  - no a priori coordination among nodes
• two or more transmitting nodes ➜ "collision"
• random access MAC protocol specifies:
  - how to detect collisions
  - how to recover from collisions (e.g., via delayed retransmissions)
• examples of random access MAC protocols:
  - slotted ALOHA
  - CSMA, CSMA/CD, CSMA/CA

Slotted ALOHA
assumptions:
• all frames same size
• time divided into equal size slots (time to transmit 1 frame)
• nodes start to transmit only at slot beginning
• nodes are synchronized
• if 2 or more nodes transmit in a slot, all nodes detect collision
operation:
• when node obtains fresh frame, transmits in next slot
  - if no collision: node can send new frame in next slot
  - if collision: node retransmits frame in each subsequent slot with probability p until success
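A short Monte Carlo sketch of slotted ALOHA, written in Python under the saturation assumption used for the efficiency analysis: every node always has a frame queued, so in every slot each node transmits independently with probability p. The function name, seed, and parameter values are illustrative.

```python
import random

def simulate_slotted_aloha(n_nodes: int, p: float, n_slots: int, seed: int = 1) -> float:
    """Fraction of slots carrying exactly one transmission (a success)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_slots):
        transmitters = sum(rng.random() < p for _ in range(n_nodes))
        if transmitters == 1:
            successes += 1  # exactly one sender: frame gets through
        # 0 senders: idle slot; 2 or more: collision, frames retried in later slots
    return successes / n_slots

print(simulate_slotted_aloha(n_nodes=20, p=1 / 20, n_slots=20_000))
```

With N = 20 and p = 1/N the measured fraction should land near $(1 - 1/20)^{19} \approx 0.38$, just above the large-N limit of $1/e$.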

Slotted ALOHA
[figure: transmissions of nodes 1, 2, 3 over successive slots, each slot marked C (collision), E (empty), or S (success)]
Pros:
• single active node can continuously transmit at full rate of channel
• highly decentralized: only slots in nodes need to be in sync
• simple
Cons:
• collisions, wasting slots
• idle slots
• nodes may be able to detect collision in less than time to transmit packet
• clock synchronization

Slotted ALOHA: efficiency
efficiency: long-run fraction of successful slots (many nodes, all with many frames to send)
• suppose: N nodes with many frames to send, each transmits in slot with probability p
• probability that a given node has success in a slot = $p(1-p)^{N-1}$
• probability that any node has a success = $Np(1-p)^{N-1}$
• max efficiency: find $p^*$ that maximizes $Np(1-p)^{N-1}$
• for many nodes, take limit of $Np^*(1-p^*)^{N-1}$ as $N$ goes to infinity, gives:
  max efficiency $= 1/e \approx 0.37$
at best: channel used for useful transmissions 37% of time!
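The maximization step can be checked numerically. Setting $\frac{d}{dp}\,p(1-p)^{N-1} = (1-p)^{N-2}(1 - Np) = 0$ gives $p^* = 1/N$, so the efficiency becomes $(1 - 1/N)^{N-1}$, which tends to $1/e$. A minimal Python sketch (the function name is illustrative):

```python
import math

def slotted_aloha_efficiency(n: int, p: float) -> float:
    """Probability that exactly one of n nodes transmits in a slot: N p (1-p)^(N-1)."""
    return n * p * (1 - p) ** (n - 1)

for n in (2, 10, 100, 10_000):
    p_star = 1 / n  # maximizer of N p (1-p)^(N-1)
    print(n, round(slotted_aloha_efficiency(n, p_star), 4))
print("1/e =", round(1 / math.e, 4))
```

The efficiency falls from 0.5 at N = 2 toward about 0.368 as N grows.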

Pure (unslotted) ALOHA
• unslotted ALOHA: simpler, no synchronization
• when frame first arrives
  - transmit immediately
• collision probability increases:
  - frame sent at $t_0$ collides with other frames sent in $[t_0 - 1, t_0 + 1]$

Pure ALOHA efficiency
P(success by given node) = P(node transmits) · P(no other node transmits in $[t_0 - 1, t_0]$) · P(no other node transmits in $[t_0, t_0 + 1]$)
$= p \cdot (1-p)^{N-1} \cdot (1-p)^{N-1} = p \cdot (1-p)^{2(N-1)}$
… choosing optimum $p$ and then letting $N \to \infty$: efficiency $= 1/(2e) \approx 0.18$
even worse than slotted ALOHA!
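The same numerical check works for pure ALOHA. With $N$ nodes the efficiency is $Np(1-p)^{2(N-1)}$; setting its derivative to zero gives $1 - (2N-1)p = 0$, i.e., $p^* = 1/(2N-1)$, and the value tends to $1/(2e)$. A minimal Python sketch (the function name is illustrative):

```python
import math

def pure_aloha_efficiency(n: int, p: float) -> float:
    """N p (1-p)^(2(N-1)): success needs no other start in the window [t0 - 1, t0 + 1]."""
    return n * p * (1 - p) ** (2 * (n - 1))

for n in (10, 100, 10_000):
    p_star = 1 / (2 * n - 1)  # maximizer of N p (1-p)^(2(N-1))
    print(n, round(pure_aloha_efficiency(n, p_star), 4))
print("1/(2e) =", round(1 / (2 * math.e), 4))
```

The vulnerable window is two frame times long instead of one, which is exactly why the exponent doubles and the limit halves compared with slotted ALOHA.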

CSMA (carrier sense multiple access)
CSMA: listen before transmit:
• if channel sensed idle: transmit entire frame
• if channel sensed busy: defer transmission
• human analogy: don't interrupt others!

CSMA collisions
[figure: spatial layout of nodes]
• collisions can still occur: propagation delay means two nodes may not hear each other's transmission
• collision: entire packet transmission time wasted
  - distance & propagation delay play a role in determining collision probability

CSMA/CD (collision detection)
CSMA/CD: carrier sensing, deferral as in CSMA
• collisions detected within short time
• colliding transmissions aborted, reducing channel wastage
collision detection:
• easy in wired LANs: measure signal strengths, compare transmitted, received signals
• difficult in wireless LANs: received signal strength overwhelmed by local transmission strength
human analogy: the polite conversationalist

CSMA/CD (collision detection)
[figure: spatial layout of nodes]

Ethernet CSMA/CD algorithm
1. NIC receives datagram from network layer, creates frame.
2. If NIC senses channel idle, starts frame transmission. If NIC senses channel busy, waits until channel idle, then transmits.
3. If NIC transmits entire frame without detecting another transmission, NIC is done with frame!
4. If NIC detects another transmission while transmitting, aborts and sends jam signal.
5. After aborting, NIC enters binary (exponential) backoff:
   • after $m$th collision, NIC chooses $K$ at random from $\{0, 1, 2, \ldots, 2^m - 1\}$. NIC waits $K \cdot 512$ bit times, returns to Step 2
   • longer backoff interval with more collisions
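Step 5 (binary exponential backoff) can be sketched in Python as below. The function names are hypothetical, and one detail is an assumption not stated on the slide: IEEE 802.3 truncates the exponent at 10, so $K$ never exceeds 1023 however many collisions occur.

```python
import random

BIT_TIMES_PER_UNIT = 512  # each backoff unit is 512 bit times (step 5)
MAX_EXPONENT = 10         # assumption: 802.3 caps the exponent at 10, so K <= 1023

def backoff_bit_times(m: int, rng: random.Random) -> int:
    """After the m-th collision, pick K uniformly from {0, 1, ..., 2^m - 1}; wait K * 512 bit times."""
    exponent = min(m, MAX_EXPONENT)
    k = rng.randrange(2 ** exponent)  # randrange(n) returns an integer in 0..n-1
    return k * BIT_TIMES_PER_UNIT

def backoff_seconds(m: int, bit_rate_bps: float, rng: random.Random) -> float:
    """Convert the chosen backoff into seconds for a given link rate."""
    return backoff_bit_times(m, rng) / bit_rate_bps

rng = random.Random(42)
for m in range(1, 6):
    print(m, backoff_bit_times(m, rng))
```

At 10 Mbps one bit time is 0.1 µs, so one backoff unit of 512 bit times is 51.2 µs; after the first collision the wait is either 0 or 51.2 µs, and the range doubles with each further collision.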

CSMA/CD efficiency
• $t_{prop}$ = max propagation delay between 2 nodes in LAN
• $t_{trans}$ = time to transmit max-size frame
• efficiency goes to 1
  - as $t_{prop}$ goes to 0
  - as $t_{trans}$ goes to infinity
• better performance than ALOHA: and simple, cheap, decentralized!
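The slide states only the two limits. A commonly quoted closed-form approximation (from the Kurose-Ross textbook, not from this slide) is $\text{efficiency} \approx 1 / (1 + 5\,t_{prop}/t_{trans})$, which has exactly those limits. The Python sketch below evaluates it; the frame size, link rate, cable length, and signal speed are illustrative assumptions.

```python
def csma_cd_efficiency(t_prop: float, t_trans: float) -> float:
    """Approximate CSMA/CD efficiency: 1 / (1 + 5 * t_prop / t_trans)."""
    return 1.0 / (1.0 + 5.0 * t_prop / t_trans)

# Illustrative assumptions: 1500-byte frame at 10 Mbps, 2000 m of cable, signal speed 2e8 m/s.
t_trans = 1500 * 8 / 10e6  # 1.2 ms
t_prop = 2000 / 2e8        # 10 microseconds
print(round(csma_cd_efficiency(t_prop, t_trans), 3))
```

Shrinking $t_{prop}$ (a smaller LAN) or growing $t_{trans}$ (longer frames, or a slower link) both push the value toward 1, matching the bullets above.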

“Taking turns” MAC protocols
channel partitioning MAC protocols:
• share channel efficiently and fairly at high load
• inefficient at low load: delay in channel access, 1/N bandwidth allocated even if only 1 active node!
random access MAC protocols:
• efficient at low load: single node can fully utilize channel
• high load: collision overhead
“taking turns” protocols: look for best of both worlds!

“Taking turns” MAC protocols
polling:
• master node "invites" slave nodes to transmit in turn
• typically used with "dumb" slave devices
• concerns:
  - polling overhead
  - latency
  - single point of failure (master)
[figure: master sends poll to each slave in turn; slaves respond with data]

“Taking turns” MAC protocols
token passing:
• control token passed from one node to next sequentially
• token message
• concerns:
  - token overhead
  - latency
  - single point of failure (token)
[figure: token T circulating among nodes; a node with nothing to send passes T on, the node holding T sends data]
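A toy Python simulation of token passing, assuming a fixed circulation order and a per-turn limit on how many frames the token holder may send before passing the token on; the function name, node names, frame labels, and the `max_per_turn` limit are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

def token_ring(queues: Dict[str, List[str]], order: List[str],
               max_per_turn: int, rounds: int) -> List[Tuple[str, str]]:
    """Circulate a token around nodes in a fixed order; only the token holder transmits,
    sending at most max_per_turn frames before passing the token on."""
    pending = {node: deque(frames) for node, frames in queues.items()}
    sent = []
    for _ in range(rounds):
        for node in order:  # token visits nodes sequentially
            q = pending.get(node, deque())
            for _ in range(min(max_per_turn, len(q))):
                sent.append((node, q.popleft()))  # holder transmits while it has the token
            # a node with nothing (left) to send simply passes the token along
    return sent

log = token_ring({"A": ["a1", "a2", "a3"], "C": ["c1"]},
                 order=["A", "B", "C", "D"], max_per_turn=2, rounds=2)
print(log)  # [('A', 'a1'), ('A', 'a2'), ('C', 'c1'), ('A', 'a3')]
```

There are no collisions, and a busy node gets long turns, but idle nodes B and D still cost one token hop each per round: that is the "token overhead" and "latency" concern listed above.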

Cable access network
[figure: cable headend with CMTS (cable modem termination system) connected to ISP; splitter; cable modems at residences]
• downstream: Internet frames, TV channels, control transmitted downstream at different frequencies
• upstream: Internet frames, TV control transmitted upstream at different frequencies, in time slots
• multiple 40 Mbps downstream (broadcast) channels
  - single CMTS transmits into channels
• multiple 30 Mbps upstream channels
  - multiple access: all users contend for certain upstream channel time slots (others assigned)

Cable access network
[figure: CMTS at cable headend sends a MAP frame for interval [t1, t2] on downstream channel i; on upstream channel j, residences with cable modems send request frames in contention minislots and data frames in assigned minislots]
DOCSIS: data over cable service interface spec
• FDM over upstream, downstream frequency channels
• TDM upstream: some slots assigned, some have contention
  - downstream MAP frame: assigns upstream slots
  - request for upstream slots (and data) transmitted random access (binary backoff) in selected slots

Summary of MAC protocols
• channel partitioning, by time, frequency or code
  - Time Division, Frequency Division
• random access (dynamic)
  - ALOHA, S-ALOHA, CSMA/CD
  - carrier sensing: easy in some technologies (wire), hard in others (wireless)
  - CSMA/CD used in Ethernet
  - CSMA/CA used in 802.11
• taking turns
  - polling from central site, token passing
  - Bluetooth, FDDI, token ring

Q1 Error detection/correction
Can these schemes correct bit errors: Internet checksums, two-dimensional parity, cyclic redundancy check (CRC)?
A. Yes, No
B. No, Yes
C. No, Yes, No
D. No, Yes
E. Ho, hum, ha

Q2 CRC vs Internet checksums
Which of these is not true?
A. CRCs are commonly used at the link layer
B. CRCs can detect any bit error of up to r bits with an r-bit EDC.
C. CRCs are more resilient to bursty bit errors
D. CRCs cannot correct bit errors

Q3 Random access
Consider an ALOHA network with N users that transmit with probability p in slots just after a collision. Assuming users have infinite data to send, what is the probability that a slot is successful (no collisions)?
A. $Np$
B. $p(1-p)^{N-1}$
C. $Np(1-p)^{N-1}$
D. $\binom{N}{N/2}\,p(1-p)^{N-1}$
E. $Np/(1-p)$

Q4 Random access
Random access protocols achieve all four of the properties below: True (A) / False (B)?
1. when one node wants to transmit, it can send at rate R.
2. when M nodes want to transmit, each can send at average rate R/M
3. fully decentralized:
   • no special node to coordinate transmissions
   • no synchronization of clocks, slots
4. simple

MAC addresses and ARP
• 32-bit IP address:
  - network-layer address for interface
  - used for layer 3 (network layer) forwarding
• MAC (or LAN or physical or Ethernet) address:
  - function: used "locally" to get frame from one interface to another physically-connected interface (same network, in IP-addressing sense)
  - 48-bit MAC address (for most LANs) burned in NIC ROM, also sometimes software settable
  - e.g.: 1A-2F-BB-76-09-AD, in hexadecimal (base 16) notation (each "number" represents 4 bits)
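The notation above maps directly onto a 48-bit integer: six two-digit hex groups, each hex digit carrying 4 bits. A minimal Python sketch, assuming the dash-separated format used on these slides; `parse_mac` and `format_mac` are hypothetical helper names.

```python
def parse_mac(text: str) -> int:
    """Parse a MAC address like '1A-2F-BB-76-09-AD' into a 48-bit integer."""
    parts = text.split("-")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"not a 6-byte hex MAC address: {text!r}")
    return int("".join(parts), 16)  # 12 hex digits x 4 bits = 48 bits

def format_mac(value: int) -> str:
    """Inverse of parse_mac: 48-bit integer back to dash-separated hex."""
    return "-".join(f"{b:02X}" for b in value.to_bytes(6, "big"))

BROADCAST = parse_mac("FF-FF-FF-FF-FF-FF")  # all 48 bits set

mac = parse_mac("1A-2F-BB-76-09-AD")
print(mac.bit_length() <= 48, format_mac(mac))  # True 1A-2F-BB-76-09-AD
```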

LAN addresses and ARP
each adapter on LAN has unique LAN address
[figure: four adapters on a LAN (wired or wireless) with MAC addresses 1A-2F-BB-76-09-AD, 71-65-F7-2B-08-53, 58-23-D7-FA-20-B0, 0C-C4-11-6F-E3-98]

LAN addresses (more)
• MAC address allocation administered by IEEE
• manufacturer buys portion of MAC address space (to assure uniqueness)
• analogy:
  - MAC address: like Social Security Number
  - IP address: like postal address
• MAC flat address ➜ portability
  - can move LAN card from one LAN to another
• IP hierarchical address not portable
  - address depends on IP subnet to which node is attached

ARP: address resolution protocol
Question: how to determine interface's MAC address, knowing its IP address?
[figure: LAN whose nodes have IP addresses 137.196.7.78, 137.196.7.23, 137.196.7.14, 137.196.7.88 and MAC addresses 1A-2F-BB-76-09-AD, 71-65-F7-2B-08-53, 58-23-D7-FA-20-B0, 0C-C4-11-6F-E3-98]
ARP table: each IP node (host, router) on LAN has table
• IP/MAC address mappings for some LAN nodes: <IP address; MAC address; TTL>
• TTL (Time To Live): time after which address mapping will be forgotten (typically 20 min)

ARP protocol: same LAN
• A wants to send datagram to B
  - B's MAC address not in A's ARP table
• A broadcasts ARP query packet, containing B's IP address
  - destination MAC address = FF-FF-FF-FF-FF-FF
  - all nodes on LAN receive ARP query
• B receives ARP packet, replies to A with its (B's) MAC address
  - frame sent to A's MAC address (unicast)
• A caches (saves) IP-to-MAC address pair in its ARP table until information becomes old (times out)
  - soft state: information that times out (goes away) unless refreshed
• ARP is "plug-and-play":
  - nodes create their ARP tables without intervention from net administrator
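A toy Python model of the ARP table and the same-LAN exchange, assuming the LAN can be stood in for by a dictionary from each node's IP to its MAC; the class, function, and variable names, and the IP-to-MAC pairing used below, are hypothetical. Entries are soft state: each carries an expiry time (20 minutes by default, per the typical TTL above) and is forgotten once it passes.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class ArpCache:
    """Soft-state IP -> MAC table: an entry is forgotten ttl seconds after it was learned."""

    def __init__(self, ttl: float = 20 * 60, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # ip -> (mac, expiry time)

    def learn(self, ip: str, mac: str) -> None:
        self._entries[ip] = (mac, self.clock() + self.ttl)  # (re)learning refreshes the timer

    def lookup(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires = entry
        if self.clock() >= expires:  # timed out: drop the stale mapping
            del self._entries[ip]
            return None
        return mac

def resolve(cache: ArpCache, target_ip: str, lan: Dict[str, str]) -> Optional[str]:
    """Return target_ip's MAC, simulating a broadcast ARP query on a cache miss."""
    mac = cache.lookup(target_ip)
    if mac is not None:
        return mac
    query = {"dst_mac": "FF-FF-FF-FF-FF-FF", "target_ip": target_ip}  # broadcast ARP query
    # every node receives the query; only the owner of target_ip replies (unicast to the sender)
    replies = [node_mac for node_ip, node_mac in lan.items() if node_ip == query["target_ip"]]
    if not replies:
        return None
    cache.learn(target_ip, replies[0])
    return replies[0]

now = [0.0]  # fake clock so expiry can be demonstrated
cache = ArpCache(ttl=20 * 60, clock=lambda: now[0])
lan = {"137.196.7.78": "1A-2F-BB-76-09-AD"}  # illustrative pairing
print(resolve(cache, "137.196.7.78", lan))  # 1A-2F-BB-76-09-AD (learned via broadcast query)
now[0] += 20 * 60
print(cache.lookup("137.196.7.78"))         # None: entry timed out
```

Injecting the clock is a design choice for the sketch: it lets the soft-state expiry be shown without actually waiting 20 minutes.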

Addressing: routing to another LAN
walkthrough: send datagram from A to B via R
• focus on addressing: at IP (datagram) and MAC layer (frame)
• assume A knows B's IP address
• assume A knows IP address of first hop router, R (how?)
• assume A knows R's MAC address (how?)
[figure: A (IP 111, MAC 74-29-9C-E8-FF-55) and a host (IP 111.112, MAC CC-49-DE-D0-AB-7D) on one LAN; router R with interfaces IP 111.110 (MAC E6-E9-00-17-BB-4B) and IP 222.220 (MAC 1A-23-F9-CD-06-9B); B (IP 222, MAC 49-BD-D2-C7-56-2A) and a host (IP 222.221, MAC 88-B2-2F-54-1A-0F) on the other LAN]

Addressing: routing to another LAN
• A creates IP datagram with IP source A, destination B
• A creates link-layer frame with R's MAC address as destination; frame contains A-to-B IP datagram
frame: MAC src: 74-29-9C-E8-FF-55, MAC dest: E6-E9-00-17-BB-4B, IP src: 111, IP dest: 222
[figure: A's protocol stack (IP, Eth, Phy) on the A/R/B topology]

Addressing: routing to another LAN
• frame sent from A to R
• frame received at R, datagram removed, passed up to IP
frame: MAC src: 74-29-9C-E8-FF-55, MAC dest: E6-E9-00-17-BB-4B, IP src: 111, IP dest: 222
[figure: protocol stacks (IP, Eth, Phy) at A and R on the A/R/B topology]

Addressing: routing to another LAN
• R forwards datagram with IP source A, destination B
• R creates link-layer frame with B's MAC address as destination; frame contains A-to-B IP datagram
frame: MAC src: 1A-23-F9-CD-06-9B, MAC dest: 49-BD-D2-C7-56-2A, IP src: 111, IP dest: 222
[figure: protocol stacks (IP, Eth, Phy) at R and B on the A/R/B topology]
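The walkthrough's key point (the IP addresses stay fixed end to end while the MAC addresses are rewritten on every hop) can be sketched with a small Python dataclass. The MAC addresses are the ones in the walkthrough figure; the IP fields reuse the slide's short labels (111 for A, 222 for B); the class and function names are hypothetical, and R's own ARP lookup of B's MAC is not modeled.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    mac_src: str
    mac_dst: str
    ip_src: str
    ip_dst: str

A_MAC = "74-29-9C-E8-FF-55"
R_LEFT_MAC = "E6-E9-00-17-BB-4B"   # R's interface on A's LAN
R_RIGHT_MAC = "1A-23-F9-CD-06-9B"  # R's interface on B's LAN
B_MAC = "49-BD-D2-C7-56-2A"

# Hop 1: the datagram is addressed A -> B, but the frame is addressed A -> R.
hop1 = Frame(mac_src=A_MAC, mac_dst=R_LEFT_MAC, ip_src="111", ip_dst="222")

def router_forward(frame: Frame, out_mac: str, next_hop_mac: str) -> Frame:
    """R removes the datagram from the incoming frame and re-encapsulates it, unchanged,
    in a new frame from its outgoing interface to the next hop's MAC address."""
    return replace(frame, mac_src=out_mac, mac_dst=next_hop_mac)

hop2 = router_forward(hop1, R_RIGHT_MAC, B_MAC)
print(hop1)
print(hop2)
```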

Ethernet
"dominant" wired LAN technology:
• cheap: $20 for NIC
• first widely used LAN technology
• simpler, cheaper than token LANs and ATM
• kept up with speed race: 10 Mbps to 10 Gbps
[figure: Metcalfe's Ethernet sketch]

Ethernet: physical topology
• bus: popular through mid 90s
  - all nodes in same collision domain (can collide with each other)
• star: prevails today
  - active switch in center
  - each "spoke" runs a (separate) Ethernet protocol (nodes do not collide with each other)
[figure: bus (coaxial cable) vs. star (switch at center)]

Ethernet frame structure
sending adapter encapsulates IP datagram (or other network layer protocol packet) in Ethernet frame
[figure: frame fields: preamble | dest. address | source address | type | data (payload) | CRC]
preamble:
• 7 bytes with pattern 10101010 followed by one byte with pattern 10101011
• used to synchronize receiver, sender clock rates

Ethernet frame structure (more)
• addresses: 6 byte source, destination MAC addresses
  - if adapter receives frame with matching destination address, or with broadcast address (e.g., ARP packet), it passes data in frame to network layer protocol
  - otherwise, adapter discards frame
• type: indicates higher layer protocol (mostly IP but others possible, e.g., Novell IPX, AppleTalk)
• CRC: cyclic redundancy check at receiver
  - error detected: frame is dropped
[figure: frame fields: preamble | dest. address | source address | type | data (payload) | CRC]
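The sender's encapsulation and the receiver's accept/discard rules above can be sketched in Python. This is a simplified model under stated assumptions: the preamble is left to the physical layer, minimum-size padding is ignored, and the CRC is Python's `zlib.crc32` (the same CRC-32 polynomial Ethernet uses) appended low byte first, an ordering chosen here for illustration. `build_frame` and `receive` are hypothetical names; 0x0800 is the registered type value for IPv4.

```python
import zlib
from typing import Optional

BROADCAST = bytes.fromhex("FFFFFFFFFFFF")
ETHERTYPE_IPV4 = 0x0800

def build_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """dest. address | source address | type | data | CRC (preamble omitted)."""
    body = dst + src + ethertype.to_bytes(2, "big") + payload
    fcs = zlib.crc32(body).to_bytes(4, "little")  # assumption: CRC appended low byte first
    return body + fcs

def receive(frame: bytes, my_mac: bytes) -> Optional[bytes]:
    """Return the payload to pass up, or None if the adapter drops/discards the frame."""
    body, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "little") != fcs:
        return None  # CRC error detected: frame is dropped
    dst = body[0:6]
    if dst != my_mac and dst != BROADCAST:
        return None  # not addressed to us: adapter discards frame
    return body[14:]  # 6 + 6 + 2 header bytes precede the payload

mac_a = bytes.fromhex("74299CE8FF55")
mac_b = bytes.fromhex("49BDD2C7562A")
f = build_frame(mac_b, mac_a, ETHERTYPE_IPV4, b"IP datagram")
print(receive(f, mac_b))  # b'IP datagram'
print(receive(f, mac_a))  # None: wrong destination
```

Note that the receiver never acknowledges anything: a frame that fails the CRC check simply disappears, which is the "unreliable" behavior described next.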

Ethernet: unreliable, connectionless
• connectionless: no handshaking between sending and receiving NICs
• unreliable: receiving NIC doesn't send ACKs or NAKs to sending NIC
  - data in dropped frames recovered only if initial sender uses higher layer rdt (e.g., TCP), otherwise dropped data lost
• Ethernet's MAC protocol: unslotted CSMA/CD with binary backoff

802.3 Ethernet standards: link & physical layers
• many different Ethernet standards
  - common MAC protocol and frame format
  - different speeds: 2 Mbps, 100 Mbps, 1 Gbps, 10 Gbps
  - different physical-layer media: fiber, cable
[Figure: within the protocol stack (application, transport, network, link, physical), the MAC protocol and frame format are shared, while the physical layer differs: copper (twisted pair), e.g., 100BASE-TX, 100BASE-T2, 100BASE-T4; fiber, e.g., 100BASE-FX, 100BASE-SX, 100BASE-BX]

Link layer, LANs: outline
5.1 introduction, services
5.2 error detection, correction
5.3 multiple access protocols
5.4 LANs
  - addressing, ARP
  - Ethernet
  - switches
  - VLANs
5.5 link virtualization: MPLS
5.6 data center networking
5.7 a day in the life of a web request

Ethernet switch
• link-layer device: takes an active role
  - stores and forwards Ethernet frames
  - examines incoming frame’s MAC address and selectively forwards the frame to one or more outgoing links; when the frame is to be forwarded on a segment, uses CSMA/CD to access that segment
• transparent
  - hosts are unaware of the presence of switches
• plug-and-play, self-learning
  - switches do not need to be configured

Switch: multiple simultaneous transmissions
• hosts have dedicated, direct connections to the switch
• switches buffer packets
• Ethernet protocol used on each incoming link, but no collisions; full duplex
  - each link is its own collision domain
• switching: A-to-A’ and B-to-B’ can transmit simultaneously, without collisions
[Figure: switch with six interfaces (1, 2, 3, 4, 5, 6) connecting hosts A, A’, B, B’, C, C’]

Switch forwarding table
Q: how does the switch know that A’ is reachable via interface 4 and B’ via interface 5?
• A: each switch has a switch table; each entry is:
  - (MAC address of host, interface to reach host, time stamp)
  - looks like a routing table!
Q: how are entries created and maintained in the switch table?
  - something like a routing protocol?
[Figure: switch with six interfaces (1, 2, 3, 4, 5, 6) connecting hosts A, A’, B, B’, C, C’]

Switch: self-learning
• switch learns which hosts can be reached through which interfaces
  - when a frame is received, the switch “learns” the location of the sender: the incoming LAN segment
  - records the sender/location pair in the switch table
Example: a frame with source A and dest A’ arrives on interface 1. Switch table (initially empty) afterwards:
MAC addr | interface | TTL
A        | 1         | 60

Switch: frame filtering/forwarding
when frame received at switch:
1. record incoming link, MAC address of sending host
2. index switch table using MAC destination address
3. if entry found for destination
     then {
       if destination on segment from which frame arrived
         then drop frame
         else forward frame on interface indicated by entry
     }
     else flood  /* forward on all interfaces except arriving interface */
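The three steps of the filtering/forwarding procedure can be sketched as a small class. This is an illustration under stated assumptions: the class and method names are hypothetical, MAC addresses are plain strings, and the entry lifetime is assumed to be 60 seconds (the example tables show TTL 60 without a unit). The clock is injectable so that aging can be tested.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class LearningSwitch:
    """Self-learning switch following steps 1-3 above (a sketch)."""

    def __init__(self, interfaces: List[int], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.interfaces = interfaces
        self.ttl = ttl        # assumed entry lifetime in seconds
        self.clock = clock
        self.table: Dict[str, Tuple[int, float]] = {}  # MAC -> (interface, time stamp)

    def _lookup(self, mac: str) -> Optional[int]:
        entry = self.table.get(mac)
        if entry is None:
            return None
        iface, stamp = entry
        if self.clock() - stamp > self.ttl:  # stale entry: forget it
            del self.table[mac]
            return None
        return iface

    def receive(self, src: str, dst: str, in_iface: int) -> List[int]:
        """Return the interfaces the frame goes out on (empty list = dropped)."""
        # 1. record incoming link and sender MAC address
        self.table[src] = (in_iface, self.clock())
        # 2. index switch table using destination MAC address
        out = self._lookup(dst)
        # 3. filter (same segment), forward selectively, or flood
        if out is not None:
            return [] if out == in_iface else [out]
        return [i for i in self.interfaces if i != in_iface]
```

Replaying the slide's example, the first frame from A to A’ arriving on interface 1 is flooded to interfaces 2 through 6, and A’’s reply on interface 4 goes out on interface 1 only.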

Self-learning, forwarding: example
• frame destination, A’, location unknown: flood
• destination A location known: selectively send on just one link
Example: A (on interface 1) sends a frame to A’, then A’ (on interface 4) replies to A. Switch table (initially empty) afterwards:
MAC addr | interface | TTL
A        | 1         | 60
A’       | 4         | 60

Interconnecting switches
• switches can be connected together
[Figure: four interconnected switches S1, S2, S3, S4 with hosts A through I]
Q: when sending from A to G, how does S1 know to forward the frame destined to G via S4 and S3?
• A: self-learning! (works exactly the same as in the single-switch case!)

Self-learning multi-switch example
Suppose C sends a frame to I, and I responds to C.
[Figure: four interconnected switches S1, S2, S3, S4 with hosts A through I]
• Q: show switch tables and packet forwarding in S1, S2, S3, S4
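One possible worked answer can be sketched as a small simulation, under an assumed wiring that the slide's figure does not spell out: S1 serves A, B, C; S2 serves D, E, F; S3 serves G, H, I; S4 interconnects S1, S2, S3. The port numbers and helper names are hypothetical. The wiring is loop-free, so flooding terminates.

```python
from collections import deque

# Assumed wiring and port numbers, for illustration only.
PORTS = {
    "S1": {1: "A", 2: "B", 3: "C", 4: "S4"},
    "S2": {1: "D", 2: "E", 3: "F", 4: "S4"},
    "S3": {1: "G", 2: "H", 3: "I", 4: "S4"},
    "S4": {1: "S1", 2: "S2", 3: "S3"},
}
tables = {sw: {} for sw in PORTS}  # switch -> {MAC: port}


def port_to(sw, neighbor):
    """Port on switch `sw` that leads to `neighbor`."""
    return next(p for p, n in PORTS[sw].items() if n == neighbor)


def send(src_host, dst_host):
    """Forward one frame with self-learning; return the set of hosts it reaches."""
    first_sw = next(sw for sw, ports in PORTS.items() if src_host in ports.values())
    pending = deque([(first_sw, port_to(first_sw, src_host))])
    reached = set()
    while pending:
        sw, in_port = pending.popleft()
        tables[sw][src_host] = in_port  # learn where the sender is
        out = tables[sw].get(dst_host)
        if out == in_port:
            continue  # destination on arriving segment: filter
        out_ports = [out] if out is not None else [p for p in PORTS[sw] if p != in_port]
        for p in out_ports:
            nxt = PORTS[sw][p]
            if nxt in PORTS:  # neighbor is another switch
                pending.append((nxt, port_to(nxt, sw)))
            else:             # neighbor is a host
                reached.add(nxt)
    return reached
```

Under this wiring, C’s frame is flooded everywhere because no switch knows I yet, and every switch learns C. I’s reply follows the learned entries S3, S4, S1 straight to C, so S2 never learns I.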

Institutional network
[Figure: an institutional network with a mail server, a web server, and an IP subnet, and a router connecting to the external network]

Switches vs. routers
both are store-and-forward:
• routers: network-layer devices (examine network-layer headers)
• switches: link-layer devices (examine link-layer headers)
both have forwarding tables:
• routers: compute tables using routing algorithms, IP addresses
• switches: learn forwarding table using flooding, learning, MAC addresses
[Figure: a datagram travels from a host (application, transport, network, link, physical) through a switch (link, physical) and a router (network, link, physical) to another host; the switch handles frames, the router handles datagrams]

VLANs: motivation
consider:
• a CS user moves office to EE, but wants to connect to the CS switch?
• single broadcast domain:
  - all layer-2 broadcast traffic (ARP, DHCP, unknown location of destination MAC address) must cross the entire LAN
  - security/privacy, efficiency issues
[Figure: separate switches for Computer Science, Electrical Engineering, and Computer Engineering]

VLANs
Virtual Local Area Network: switch(es) supporting VLAN capabilities can be configured to define multiple virtual LANs over a single physical LAN infrastructure.
port-based VLAN: switch ports are grouped (by switch management software) so that a single physical switch operates as multiple virtual switches.
[Figure: a 16-port switch with ports 1-8 in the Electrical Engineering VLAN and ports 9-15 in the Computer Science VLAN, shown as one physical switch and as two virtual switches]

Port-based VLAN
• traffic isolation: frames to/from ports 1-8 can only reach ports 1-8
  - can also define VLAN based on MAC addresses of endpoints, rather than switch port
• dynamic membership: ports can be dynamically assigned among VLANs
• forwarding between VLANs: done via routing (just as with separate switches)
  - in practice, vendors sell combined switches plus routers
[Figure: ports 1-8 form the Electrical Engineering VLAN and ports 9-15 the Computer Science VLAN, with a router between them]
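Traffic isolation and dynamic membership in a port-based VLAN can be sketched as a port-to-VLAN map that restricts where frames may go. The map mirrors the figure (ports 1-8 EE, 9-15 CS). The names are hypothetical, and real switches keep this state in management software, not a Python dict.

```python
from typing import Dict, List

# Hypothetical assignment mirroring the figure: ports 1-8 EE, ports 9-15 CS.
PORT_VLAN: Dict[int, str] = {p: "EE" for p in range(1, 9)}
PORT_VLAN.update({p: "CS" for p in range(9, 16)})


def flood_ports(in_port: int) -> List[int]:
    """Ports a broadcast (or unknown-destination) frame is flooded to."""
    vlan = PORT_VLAN[in_port]
    return [p for p, v in sorted(PORT_VLAN.items()) if v == vlan and p != in_port]


def may_forward(in_port: int, out_port: int) -> bool:
    """Traffic isolation: a frame never leaves its VLAN at layer 2."""
    return PORT_VLAN[in_port] == PORT_VLAN[out_port]


def reassign(port: int, vlan: str) -> None:
    """Dynamic membership: management software moves a port to another VLAN."""
    PORT_VLAN[port] = vlan
```

An EE host’s ARP broadcast reaches only the other EE ports. Reaching a CS host requires going through a router, as the slide notes.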

VLANs spanning multiple switches
• trunk port: carries frames between VLANs defined over multiple physical switches
  - frames forwarded within a VLAN between switches can’t be vanilla Ethernet frames (they must carry VLAN ID info)
  - the 802.1Q protocol adds/removes additional header fields for frames forwarded between trunk ports
[Figure: two switches joined by trunk ports; on the first, ports 1-8 are the Electrical Engineering VLAN and ports 9-15 the Computer Science VLAN; on the second, ports 2, 3, 5 belong to the EE VLAN and ports 4, 6, 7, 8 to the CS VLAN]

802.1Q VLAN frame format
802.1 frame:
preamble | dest. address | source address | type | data (payload) | CRC
802.1Q frame:
preamble | dest. address | source address | Tag Protocol Identifier | Tag Control Information | type | data (payload) | CRC (recomputed)
• Tag Protocol Identifier: 2 bytes (value: 81-00)
• Tag Control Information: 12-bit VLAN ID field, 3-bit priority field (like IP TOS)
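Inserting and reading the 4-byte 802.1Q tag can be sketched with `struct`. In the 2-byte Tag Control Information, the 3-bit priority sits in the top bits, then a 1-bit drop-eligible indicator (left at 0 here), then the 12-bit VLAN ID. The helpers are hypothetical and operate on a frame without preamble or CRC. A real trunk port would also recompute the CRC, as the slide notes.

```python
import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier, on the wire as bytes 81-00


def insert_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the two MAC addresses of an untagged frame.

    `frame` is dest (6) | source (6) | type (2) | payload; no preamble, no CRC.
    """
    if not 0 <= vlan_id < 4096 or not 0 <= priority < 8:
        raise ValueError("VLAN ID is 12 bits, priority is 3 bits")
    # TCI = priority (3 bits) | drop-eligible bit (0 here) | VLAN ID (12 bits)
    tci = (priority << 13) | vlan_id
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def parse_vlan_tag(frame: bytes):
    """Return (vlan_id, priority) for a tagged frame, or None if untagged."""
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF, tci >> 13
```

Because the tag sits where the type field used to be, a receiver can distinguish tagged from untagged frames by checking whether those two bytes equal 81-00.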

Multiprotocol label switching (MPLS)
• initial goal: high-speed IP forwarding using a fixed-length label (instead of IP address)
  - fast lookup using a fixed-length identifier (rather than longest prefix matching)
  - borrowing ideas from the Virtual Circuit (VC) approach
  - but IP datagram still keeps IP address!
PPP or Ethernet header | MPLS header | IP header | remainder of link-layer frame
MPLS header: label (20 bits) | Exp (3 bits) | S (1 bit) | TTL (8 bits)
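The 32-bit MPLS header layout (20-bit label, 3-bit Exp, 1-bit S bottom-of-stack flag, 8-bit TTL) can be packed and unpacked as below. This is a minimal sketch and the function names are hypothetical. The fixed positions are what make the lookup an exact match on one small field, instead of a longest-prefix search over the IP address.

```python
import struct


def pack_mpls(label: int, exp: int, bottom_of_stack: bool, ttl: int) -> bytes:
    """Encode one 32-bit MPLS header: label(20) | Exp(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)


def unpack_mpls(header: bytes):
    """Decode a 4-byte MPLS header into (label, exp, s, ttl)."""
    (word,) = struct.unpack("!I", header[:4])
    return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF
```

For example, label 10 with S = 1 and TTL 64 encodes as the bytes 00 00 A1 40.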

MPLS capable routers
• a.k.a. label-switched routers
• forward packets to outgoing interface based only on label value (don’t inspect IP address)
  - MPLS forwarding table distinct from IP forwarding tables
• flexibility: MPLS forwarding decisions can differ from those of IP
  - use destination and source addresses to route flows to the same destination differently (traffic engineering)
  - re-route flows quickly if a link fails: pre-computed backup paths (useful for VoIP)

MPLS versus IP paths
• IP routing: path to destination determined by destination address alone
[Figure: network of IP routers R2 through R6 with destinations A and D]

MPLS versus IP paths
entry router (R4) can use different MPLS routes to A based, e.g., on source address
• IP routing: path to destination determined by destination address alone
• MPLS routing: path to destination can be based on source and destination address
  - fast reroute: precompute backup routes in case of link failure
[Figure: the same network of routers R2 through R6 with destinations A and D, mixing IP-only routers and MPLS-and-IP routers]

MPLS signaling
• modify OSPF, IS-IS link-state flooding protocols to carry info used by MPLS routing
  - e.g., link bandwidth, amount of “reserved” link bandwidth
• entry MPLS router uses RSVP-TE signaling protocol to set up MPLS forwarding at downstream routers
[Figure: routers R4, R5, R6 and destinations A and D, showing RSVP-TE signaling and modified link-state flooding]

MPLS forwarding tables
R4:
in label | out label | dest | out interface
   -     |    10     |  A   |      0
   -     |    12     |  D   |      0
   -     |     8     |  A   |      1
R3:
in label | out label | dest | out interface
   10    |     6     |  A   |      1
   12    |     9     |  D   |      0
R2:
in label | out label | dest | out interface
    8    |     6     |  A   |      0
R1:
in label | out label | dest | out interface
    6    |     -     |  A   |      0
[Figure: routers R1 through R6 with destinations A and D]
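Label switching through these tables can be sketched as a walk that looks only at the incoming label at each router. The tables are copied from the slide. The wiring in `LINKS` (which router sits behind each outgoing interface) is an assumption inferred from the tables, not given on the slide. Only the two paths to A are followed, because the slide does not show what lies downstream of R3’s interface 0 toward D.

```python
from typing import Dict, List, Optional, Tuple

# R4 (entry router) entries from the slide: (out label, dest, out interface).
R4_ENTRIES = [(10, "A", 0), (12, "D", 0), (8, "A", 1)]
# Core tables from the slide: router -> {in label: (out label, out interface)}.
# An out label of None means "pop the label and deliver".
TABLES: Dict[str, Dict[int, Tuple[Optional[int], int]]] = {
    "R3": {10: (6, 1), 12: (9, 0)},
    "R2": {8: (6, 0)},
    "R1": {6: (None, 0)},
}
# Assumed wiring, inferred from the tables: (router, out interface) -> next node.
LINKS = {("R4", 0): "R3", ("R4", 1): "R2", ("R3", 1): "R1",
         ("R2", 0): "R1", ("R1", 0): "A"}


def label_switch(entry_label: int, entry_iface: int) -> Tuple[List[str], List[Optional[int]]]:
    """Follow a packet labeled by R4; return (nodes visited, label used on each hop)."""
    path: List[str] = ["R4"]
    labels: List[Optional[int]] = [entry_label]
    node, label = LINKS[("R4", entry_iface)], entry_label
    while node in TABLES:
        out_label, out_iface = TABLES[node][label]  # exact match on label, no IP lookup
        path.append(node)
        labels.append(out_label)
        node, label = LINKS[(node, out_iface)], out_label
    path.append(node)  # left the label-switched routers: destination reached
    return path, labels
```

The two R4 entries for destination A (labels 10 and 8) reach A over different paths, via R3 or via R2. This is the kind of source-dependent routing that plain destination-based IP forwarding cannot express.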

Data center networks
• 10’s to 100’s of thousands of hosts, often closely coupled, in close proximity:
  - e-business (e.g., Amazon)
  - content servers (e.g., YouTube, Akamai, Apple, Microsoft)
  - search engines, data mining (e.g., Google)
• challenges:
  - multiple applications, each serving massive numbers of clients
  - managing/balancing load, avoiding processing, networking, data bottlenecks
[Photo: inside a 40-ft Microsoft container, Chicago data center]

Data center networks
load balancer: application-layer routing
• receives external client requests
• directs workload within data center
• returns results to external client (hiding data center internals from client)
[Figure: hierarchy from the Internet through a border router, access router, and load balancers down to tier-1 switches, tier-2 switches, TOR switches, and server racks 1-8]

Data center networks
• rich interconnection among switches, racks:
  - increased throughput between racks (multiple routing paths possible)
  - increased reliability via redundancy
[Figure: tier-1 switches, tier-2 switches, and TOR switches interconnecting server racks 1-8]

Synthesis: a day in the life of a web request
• journey down protocol stack complete!
  - application, transport, network, link
• putting it all together: synthesis!
  - goal: identify, review, understand protocols (at all layers) involved in a seemingly simple scenario: requesting a www page
  - scenario: student attaches laptop to campus network, requests/receives www.google.com

A day in the life: scenario
[Figure: a laptop browser on the school network 68.80.2.0/24, connected to the Comcast network 68.80.0.0/13, which hosts the DNS server; the web server 64.233.169.105, serving the web page, is in Google’s network 64.233.160.0/19]

A day in the life… connecting to the Internet
• connecting laptop needs to get its own IP address, address of first-hop router, address of DNS server: use DHCP
• DHCP request encapsulated in UDP, encapsulated in IP, encapsulated in 802.3 Ethernet
• Ethernet frame broadcast (dest: FF:FF:FF:FF:FF:FF) on LAN, received at router running DHCP server
• Ethernet demuxed to IP, IP demuxed to UDP, UDP demuxed to DHCP
[Figure: DHCP/UDP/IP/Eth/Phy stacks at the laptop and at the router (runs DHCP)]

A day in the life… connecting to the Internet
• DHCP server formulates DHCP ACK containing client’s IP address, IP address of first-hop router for client, name & IP address of DNS server
• encapsulation at DHCP server, frame forwarded (switch learning) through LAN, demultiplexing at client
• DHCP client receives DHCP ACK reply
Client now has IP address, knows name & address of DNS server, IP address of its first-hop router
[Figure: DHCP/UDP/IP/Eth/Phy stacks at the laptop and at the router (runs DHCP)]

A day in the life… ARP (before DNS, before HTTP)
• before sending HTTP request, need IP address of www.google.com: DNS
• DNS query created, encapsulated in UDP, encapsulated in IP, encapsulated in Eth. To send frame to router, need MAC address of router interface: ARP
• ARP query broadcast, received by router, which replies with ARP reply giving MAC address of router interface
• client now knows MAC address of first-hop router, so can now send frame containing DNS query
[Figure: DNS/UDP/IP/ARP/Eth/Phy stack at the laptop and Eth/Phy at the router (runs DHCP), exchanging an ARP query and ARP reply]

A day in the life… using DNS
• IP datagram containing DNS query forwarded via LAN switch from client to first-hop router
• IP datagram forwarded from campus network into Comcast network, routed (tables created by RIP, OSPF, IS-IS and/or BGP routing protocols) to DNS server
• demuxed to DNS server
• DNS server replies to client with IP address of www.google.com
[Figure: DNS query travels from the laptop via the router (runs DHCP) to the DNS server in the Comcast network 68.80.0.0/13]

A day in the life… TCP connection carrying HTTP
• to send HTTP request, client first opens TCP socket to web server
• TCP SYN segment (step 1 in 3-way handshake) interdomain routed to web server
• web server responds with TCP SYNACK (step 2 in 3-way handshake)
• TCP connection established!
[Figure: SYN and SYNACK exchanged between the laptop’s TCP/IP/Eth/Phy stack, via the router (runs DHCP), and the web server 64.233.169.105]

A day in the life… HTTP request/reply
• HTTP request sent into TCP socket
• IP datagram containing HTTP request routed to www.google.com
• web server responds with HTTP reply (containing web page)
• IP datagram containing HTTP reply routed back to client
• web page finally (!!!) displayed
[Figure: HTTP request and reply between the laptop, via the router (runs DHCP), and the web server 64.233.169.105]

Chapter 5: Summary
• principles behind data link layer services:
  - error detection, correction
  - sharing a broadcast channel: multiple access
  - link layer addressing
• instantiation and implementation of various link layer technologies
  - Ethernet
  - switched LANs, VLANs
  - virtualized networks as a link layer: MPLS
• synthesis: a day in the life of a web request

Chapter 5: let’s take a breath
• journey down protocol stack complete (except PHY)
• solid understanding of networking principles, practice
• ... could stop here ... but lots of interesting topics!
  - wireless
  - multimedia
  - security
  - network management

DATACENTER NETWORK DESIGNS
Data Link Layer

Scaling a LAN network
• Self-learning Ethernet switches work great at small scales, but buckle at larger scales
  - Broadcast overhead of self-learning is linear in the total number of interfaces
  - Broadcast storms are possible in non-tree topologies
• Goals
  - Scalability to a very large number of machines
  - Isolation of unwanted traffic from unrelated subnets
  - Ability to accommodate general types of workloads (Web, database, MapReduce, scientific computing, etc.)

Typical DC network components
• rich interconnection among switches, racks:
  - increased throughput between racks (multiple routing paths possible)
  - increased reliability via redundancy
[Figure: tier-1 or core switches, tier-2 or aggregation switches, and TOR switches interconnecting server racks 1-8]

DC network design questions
• Core and aggregation switches are much faster than ToR switches
• How much faster do core and aggregation switches need to be than ToR switches?
• How many ports do core/aggregation switches need to support for a given number of ToR switch ports?
• How many cables need to be run in total for an N-machine datacenter?
• What bisection bandwidth can be achieved?
Q: Why can’t we just build a single BIG switch to interconnect all machines?

DC network topologies
• Fat-tree (used ambiguously to mean Clos as well as a simple hierarchical design)
• Clos family
• Hypercube
• Torus

Why aren’t simpler hierarchies good enough?
• High cost
• High oversubscription (ratio of worst-case aggregate bandwidth among end-hosts to bisection bandwidth)
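A common back-of-the-envelope form of oversubscription applies the idea at a single switch: host-facing capacity divided by the capacity leading to the rest of the network, quoted as x:1. This sketch uses that simplification rather than the full bisection-bandwidth definition, and all numbers below are hypothetical. A ToR with 40 servers at 10 Gbps and four 40 Gbps uplinks is 2.5:1 oversubscribed. In the worst case, when all servers send off-rack at once, each server gets only 1/2.5 = 40% of its link rate.

```python
def oversubscription(hosts: int, host_gbps: float, uplinks: int, uplink_gbps: float) -> float:
    """Host-facing capacity divided by uplink capacity at one switch (x in x:1)."""
    return (hosts * host_gbps) / (uplinks * uplink_gbps)


def worst_case_share(ratio: float) -> float:
    """Fraction of its link rate each host gets when all hosts send off-switch at once."""
    return 1.0 / ratio
```

A value of 1.0 (1:1) means the uplinks never limit hosts, which is the goal of the Clos and fat-tree designs on the following slides.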

Fat tree topology
• Core branches, i.e., those near the top of the hierarchy, are fatter (higher in capacity)

Example: uniform Clos topology
[UCSD] A Scalable Commodity Data Center Network Architecture
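Several of the design questions (switch counts, cable counts) have closed-form answers for the k-ary fat tree built from identical k-port switches in the cited UCSD paper. That construction has k pods, each with k/2 edge and k/2 aggregation switches; (k/2)^2 core switches; and k/2 hosts per edge switch, for k^3/4 hosts in total. The sketch below computes these counts. The function name is hypothetical, and the cable count includes only host-edge, edge-aggregation, and aggregation-core links.

```python
def fat_tree(k: int) -> dict:
    """Element counts for a k-ary fat tree built from identical k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    edge = aggregation = k * half  # k pods, k/2 of each per pod
    core = half * half             # (k/2)^2 core switches
    hosts = k * half * half        # k/2 hosts per edge switch -> k^3/4
    links = {
        "host-edge": hosts,
        "edge-aggregation": k * half * half,     # complete bipartite graph in each pod
        "aggregation-core": aggregation * half,  # k/2 uplinks per aggregation switch
    }
    return {"hosts": hosts, "switches": edge + aggregation + core,
            "core": core, "cables": sum(links.values()), "links": links}
```

With 48-port switches, this gives 27,648 hosts using 2,880 identical switches and 3k^3/4 = 82,944 cables. No switch needs to be faster than the ToR (edge) switches, which is one answer to the first design question.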

Clos family
• Ingress, intermediate, and egress switch stages, where the links between each pair of adjacent stages form a bipartite graph

VL2: Clos case study (Microsoft)

VL2: Addressing and routing

Valiant load balancing
• Randomization for efficient, load-balanced routing
[VLB] Valiant Load-Balancing: Building Networks That Can Support All Traffic Matrices
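The core idea of Valiant load balancing can be sketched in its simplest two-phase form. Each flow is first sent to a uniformly random intermediate node and then on to its destination, so load spreads evenly across intermediates whatever the traffic matrix. The per-flow random choice, the switch names, and the traffic pattern below are assumptions for illustration only, not VL2’s actual mechanism.

```python
import random
from collections import Counter
from typing import List, Sequence


def vlb_path(src: str, dst: str, intermediates: Sequence[str], rng: random.Random) -> List[str]:
    """Two-phase Valiant routing: bounce the flow off a uniformly random intermediate."""
    return [src, rng.choice(intermediates), dst]


rng = random.Random(7)
spines = ["I1", "I2", "I3", "I4"]  # hypothetical intermediate switches
# Adversarial-looking pattern: every flow runs between the same pair of ToRs.
load = Counter(vlb_path("ToR1", "ToR2", spines, rng)[1] for _ in range(10_000))
```

Even though every flow has the same endpoints, each intermediate ends up carrying roughly a quarter of the flows. The price is that each flow takes two hops through the intermediate layer instead of a possibly shorter direct path.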

VL2: Directory for AA<->LA mappings

BCube: relies on more server ports

Other topologies from “supercomputing”
[Figure: hypercube]

Optical in data centers
• Optical switching (100’s of Gbps) is faster than traditional switches (40-160 Gbps)
• Optical is cheaper per 10 Gbps port
• But optical circuit establishment delay is high
  - MEMS (micro-electro-mechanical systems) reconfiguration time is ~10 ms
• Optically enhanced data center designs migrate heavy flows (elephants) to optical pathways

Energy usage numbers
• Typical US household: ~1000 kWh per month, i.e., ~30 kWh per day (an average draw of ~1.4 kW)
• Typical desktop computer: 80-250 W
• Typical 1U rack-mounted server: ~300 W (can be a few thousand W for high-end servers)
• Switches and networking equipment?
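The conversion between a continuous power draw (W) and monthly energy (kWh) behind these numbers can be sketched as below, assuming a 30-day month (720 hours). The helper names are hypothetical.

```python
HOURS_PER_MONTH = 30 * 24  # 720 h; assumes a 30-day month


def kwh_per_month(watts: float) -> float:
    """Monthly energy used by a device drawing `watts` continuously."""
    return watts * HOURS_PER_MONTH / 1000


def average_kw(kwh_month: float) -> float:
    """Average power draw implied by a monthly energy figure."""
    return kwh_month / HOURS_PER_MONTH
```

A ~300 W server running around the clock uses about 216 kWh per month, roughly a fifth of the typical household figure. A ~1000 kWh/month household corresponds to an average draw of about 1.4 kW.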

Switch power consumption
• Generally a small fraction (5-25%) of server power consumption in typical topologies

Techniques to reduce energy
• Dynamic voltage and frequency scaling (DVFS): reduces dynamic power $P = CV^2 f$ by reducing voltage $V$ (and frequency $f$)
  - Generally not power-proportional, i.e., power does not go down proportionally with decreased usage
• Shutting down (“consolidating”) servers and parts of the network: widely studied but cautiously used, if at all, in practice
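The effect of DVFS on the dynamic-power term $P = CV^2 f$ can be checked numerically. This is a minimal sketch that assumes voltage and frequency are scaled by the same factor and that capacitance $C$ is fixed. The helper names are hypothetical. Static and idle power are deliberately not modeled, and their presence is one reason whole-server power falls less than this term suggests.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic (switching) power of CMOS logic: P = C * V^2 * f."""
    return c * v * v * f


def remaining_power_fraction(scale: float) -> float:
    """Fraction of dynamic power left when V and f are both scaled by `scale`."""
    return dynamic_power(1.0, scale, scale) / dynamic_power(1.0, 1.0, 1.0)
```

Scaling both voltage and frequency down by 20% leaves about 0.8^3 ≈ 51% of the dynamic power, while the work done per second drops by only 20%.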