6.829 Computer Networks, Lecture 12: Data Center Network Architectures


6.829: Computer Networks
Lecture 12: Data Center Network Architectures
Mohammad Alizadeh, Fall 2016
Slides adapted from presentations by Albert Greenberg and Changhoon Kim (Microsoft)

What are Data Centers?
Large facilities with 10s of thousands of networked servers
– Compute, storage, and networking in concert
– “Warehouse-Scale Computers”
– Huge investment: ~$0.5 billion for a large datacenter

Data Center Costs
Amortized Cost*   Component               Sub-Components
~45%              Servers                 CPU, memory, disk
~25%              Power infrastructure    UPS, cooling, power distribution
~15%              Power draw              Electrical utility costs
~15%              Network                 Switches, links, transit
*3-yr amortization for servers, 15-yr for infrastructure; 5% cost of money
Source: “The Cost of a Cloud: Research Problems in Data Center Networks.” Greenberg, Hamilton, Maltz, Patel. SIGCOMM CCR 2009.
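To make the footnote concrete, here is a small sketch (not from the lecture; the dollar figures are invented for illustration) of the standard annuity arithmetic behind amortizing capital costs at a given cost of money:

```python
def monthly_amortized_cost(capex, years, annual_rate=0.05):
    """Monthly payment that amortizes `capex` over `years` at `annual_rate`
    (the footnote's assumptions: 3 yr for servers, 15 yr for infrastructure, 5% cost of money)."""
    r = annual_rate / 12            # monthly cost of money
    n = years * 12                  # number of monthly payments
    return capex * r / (1 - (1 + r) ** -n)

# Illustrative (made-up) capital outlays
servers_per_month = monthly_amortized_cost(200e6, years=3)    # $200M of servers
infra_per_month   = monthly_amortized_cost(150e6, years=15)   # $150M of power/cooling infrastructure
print(f"servers: ${servers_per_month/1e6:.1f}M/month, infrastructure: ${infra_per_month/1e6:.1f}M/month")
```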

Server Costs
30% utilization is considered “good” in most data centers!
Uneven application fit
– Each server has CPU, memory, disk: most applications exhaust one resource, stranding the others
Uncertainty in demand
– Demand for a new service can spike quickly
Risk management
– Not having spare servers to meet demand brings failure just when success is at hand

Goal: Agility – Any Service, Any Server
Turn the servers into a single large fungible pool
– Dynamically expand and contract service footprint as needed
Benefits
– Lower cost (higher utilization)
– Increase developer productivity
– Achieve high performance and reliability

Achieving Agility
Workload management
– Means for rapidly installing a service’s code on a server
– Virtual machines, disk images, containers
Storage management
– Means for a server to access persistent data
– Distributed filesystems (e.g., HDFS, blob stores)
Network
– Means for communicating with other servers, regardless of where they are in the data center

Datacenter Networks
Provide the illusion of “One Big Switch” with 10,000s of ports, connecting compute and storage (disk, flash, …)

Datacenter Traffic Growth
Today: Petabits/s in one DC – more than the core of the Internet!
Source: “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network”, SIGCOMM 2015.

Conventional DC Network Problems

Conventional DC Network
[Figure: hierarchical topology. The Internet connects to Core Routers (CR) at DC Layer 3; CRs connect to Access Routers (AR); ARs connect to Ethernet switches (S) at DC Layer 2; pairs of switches serve racks of application servers (A). ~1,000 servers/pod == one IP subnet]
– L2 pros, cons?
– L3 pros, cons?
Key: CR = Core Router (L3), AR = Access Router (L3), S = Ethernet Switch (L2), A = Rack of app servers
Reference: “Data Center: Load Balancing Data Center Services”, Cisco 2004

Conventional DC Network Problems
[Figure: same hierarchy, with oversubscription that grows up the tree: roughly 5:1 at the ToR uplinks, ~40:1 at the aggregation switches, ~200:1 toward the core]
– Dependence on high-cost proprietary routers
– Extremely limited server-to-server capacity

Conventional DC Network Problems (cont.)
[Figure: same hierarchy, with servers partitioned into per-pod IP subnets (VLANs)]
– Dependence on high-cost proprietary routers
– Extremely limited server-to-server capacity
– Resource fragmentation

And More Problems …
[Figure: moving a service between IP subnet (VLAN) #1 and IP subnet (VLAN) #2 requires complicated manual L2/L3 reconfiguration]
– Poor reliability
– Lack of performance isolation

VL2 Paper
Measurements
VL2 Design
– Clos topology
– Valiant LB
– Name/location separation (precursor to network virtualization)
http://research.microsoft.com/en-US/news/features/datacenternetworking-081909.aspx

Measurements

DC Traffic Characteristics
Instrumented a large cluster used for data mining and identified distinctive traffic patterns
Traffic patterns are highly volatile
– A large number of distinctive patterns even in a day
Traffic patterns are unpredictable
– Correlation between patterns very weak
Traffic-aware optimization needs to be done frequently and rapidly

DC Opportunities
DC controller knows everything about hosts
Host OSes are easily customizable
Probabilistic flow distribution would work well enough, because…?
– Flows are numerous and not huge – no elephants
– Commodity switch-to-switch links are substantially thicker (~10x) than the maximum thickness of a flow
DC network can be made simple

Intuition
Higher-speed links improve flow-level load balancing (ECMP)
[Figure: 11× 10 Gbps flows (55% load) hashed by ECMP onto either 20× 10 Gbps uplinks or 2× 100 Gbps uplinks]
– 20× 10 Gbps uplinks: Prob of 100% throughput = 3.27%
– 2× 100 Gbps uplinks: Prob of 100% throughput = 99.95%
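The slide’s numbers can be reproduced with a short simulation; this is my own sketch (not course code), assuming each flow is hashed independently and uniformly to one uplink and that throughput is 100% only if no uplink is asked to carry more than its line rate:

```python
import random

def p_full_throughput(n_links, link_gbps, n_flows, flow_gbps, trials=200_000):
    """Estimate the probability that random (ECMP-style) flow placement
    leaves every uplink at or below its capacity."""
    ok = 0
    for _ in range(trials):
        load = [0.0] * n_links
        for _ in range(n_flows):
            load[random.randrange(n_links)] += flow_gbps   # hash the flow to one uplink
        if max(load) <= link_gbps:
            ok += 1
    return ok / trials

# 11 flows of 10 Gbps (55% offered load) over the two designs on the slide
print(p_full_throughput(20, 10, 11, 10))    # ~0.033, i.e. the slide's 3.27%
print(p_full_throughput(2, 100, 11, 10))    # ~0.999, close to the slide's 99.95%
```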

Virtual Layer 2

VL2 Goals
The Illusion of a Huge L2 Switch
1. L2 semantics
2. Uniform high capacity
3. Performance isolation

Clos Topology
Offer huge capacity via multiple paths (scale out, not up)
[Figure: the VL2 folded-Clos topology: a layer of Intermediate (Int) switches, a layer of Aggregation (Aggr) switches, and ToRs below, with 20 servers per ToR]
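As a rough aid, here is a sketch of the capacity arithmetic for such a topology; the scaling formula (20 · D_A · D_I / 4 servers for D_A-port Aggr and D_I-port Int switches) is my recollection of the VL2 paper, not something stated on the slide:

```python
def vl2_server_count(d_a: int, d_i: int, servers_per_tor: int = 20) -> int:
    """Approximate server capacity of a VL2-style folded Clos, assuming each ToR
    has 2 uplinks to Aggregation switches and `servers_per_tor` servers below it."""
    n_tors = (d_a * d_i) // 4          # ToRs the Aggr/Int mesh can support
    return n_tors * servers_per_tor

# e.g., 144-port merchant-silicon switches at both layers
print(vl2_server_count(144, 144))      # 103,680 servers
```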

Multiple Switching Layers (Why?)
https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/

Building Block: Merchant Silicon Switching Chips
[Images: a switch ASIC, Facebook’s “6-pack” modular switch, and the Facebook Wedge top-of-rack switch]
Image courtesy of Facebook

Long Cables (Fiber)
https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/

VL2 Design Principles
Randomizing to Cope with Volatility
– Tremendous variability in traffic matrices
Separating Names from Locations
– Any server, any service
Embracing End Systems
– Leverage the programmability & resources of servers
– Avoid changes to switches
Building on Proven Networking Technology
– Build with parts shipping today
– Leverage low-cost, powerful merchant silicon ASICs

VL2 Goals and Solutions
Objective 1. Layer-2 semantics
– Approach: Employ flat addressing
– Solution: Name-location separation & resolution service
Objective 2. Uniform high capacity between servers
– Approach: Guarantee bandwidth for hose-model traffic
– Solution: Flow-based random traffic indirection (Valiant LB)
Objective 3. Performance isolation
– Approach: Enforce hose model using existing mechanisms only
– Solution: TCP

Addressing and Routing: Name-Location Separation
VL2 switches run link-state routing and maintain only switch-level topology
• Allows use of low-cost switches
• Protects network from host-state churn
• Obviates host and switch reconfiguration
[Figure: servers use flat names (x, y, z); a Directory Service maps each name to the ToR it currently sits behind (e.g., x → ToR2, y → ToR3, z → ToR3, updated to ToR4 when z moves); the sender’s agent performs a lookup & response exchange, and packets are tunneled to the destination’s ToR]
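A minimal sketch (my own illustration; the only non-invented values are the AA/LA addresses that appear on the next slide) of the kind of name-to-location lookup and update the directory service provides:

```python
# Hypothetical directory: maps a server's flat application address (AA)
# to the locator address (LA) of the ToR switch it currently sits behind.
directory = {
    "10.0.0.4": "20.0.0.1",   # AA of one server -> LA of its ToR (addresses from the next slide)
    "10.0.0.6": "20.0.0.1",
}

def lookup(dst_aa: str) -> str:
    """Resolve an AA to its current ToR LA; host agents cache the result."""
    return directory[dst_aa]

def migrate(aa: str, new_tor_la: str) -> None:
    """When a VM or service moves, only the directory entry changes;
    the server keeps its flat AA, so no host or switch reconfiguration is needed."""
    directory[aa] = new_tor_la

migrate("10.0.0.6", "20.0.0.9")   # the server moves behind a different ToR (made-up LA)
print(lookup("10.0.0.6"))         # senders now tunnel traffic to the new ToR
```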

VL2 Agent in Action
[Figure: the VL2 agent on the source host double-encapsulates each packet. Inner header: src AA → dst AA (10.0.0.4 → 10.0.0.6). Middle header: dst = LA of the destination’s ToR (20.0.0.1). Outer header: dst = anycast LA of an Intermediate switch (10.1.1.1); the src IP of the encapsulation headers carries a hash of the flow’s 5-tuple, H(ft).]
– Why use a hash for the src IP?
– Why anycast & double encap? (VLB + ECMP)
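A rough sketch of the agent’s encapsulation step (my own illustration, not the paper’s code), assuming a `lookup()` directory call like the one sketched above and representing headers as plain dicts:

```python
import hashlib

INTERMEDIATE_ANYCAST_LA = "10.1.1.1"   # anycast LA shared by the Intermediate switches (value from the slide)

def flow_hash(five_tuple: tuple) -> int:
    """Hash of the flow's 5-tuple, carried in the outer source address so that
    ECMP on the switches spreads distinct flows across equal-cost paths."""
    return int(hashlib.md5(repr(five_tuple).encode()).hexdigest(), 16) % (1 << 16)

def encapsulate(src_aa: str, dst_aa: str, payload: bytes, five_tuple: tuple) -> dict:
    """Valiant LB: send the packet to a random Intermediate switch (via the anycast LA),
    which forwards it to the destination's ToR, which decapsulates and delivers it."""
    dst_tor_la = lookup(dst_aa)         # name -> location, resolved via the directory service
    h = flow_hash(five_tuple)
    inner  = {"src": src_aa, "dst": dst_aa, "payload": payload}
    middle = {"src": h, "dst": dst_tor_la, "payload": inner}                 # hop 2: to the destination ToR
    outer  = {"src": h, "dst": INTERMEDIATE_ANYCAST_LA, "payload": middle}   # hop 1: bounce off an Int switch
    return outer
```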

Other Details
– How does L2 broadcast work?
– How does Internet communication work?

VL2 Directory System
– Read-optimized Directory Servers for lookups
– Write-optimized Replicated State Machines for updates
– Stale mappings?
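A toy sketch (heavily simplified, my own) of the two-tier split on the slide: many read-optimized directory servers answer lookups from a cache, while updates funnel through an authoritative write-optimized store standing in for the replicated state machines; reactively invalidating a cached entry when a client reports it stale is one plausible answer to the slide’s question.

```python
class AuthoritativeStore:
    """Stand-in for the write-optimized replicated state machines;
    a real deployment would replicate each update (e.g., via Paxos/Raft)."""
    def __init__(self):
        self.mappings = {}            # AA -> ToR LA
    def update(self, aa, tor_la):
        self.mappings[aa] = tor_la

class DirectoryServer:
    """Read-optimized front end: serves lookups from a local cache and
    refreshes an entry lazily when it is reported stale."""
    def __init__(self, store):
        self.store = store
        self.cache = {}
    def lookup(self, aa):
        if aa not in self.cache:
            self.cache[aa] = self.store.mappings[aa]
        return self.cache[aa]
    def report_stale(self, aa):
        # Reactive correction: drop the cached entry so the next lookup re-reads the store
        self.cache.pop(aa, None)

store = AuthoritativeStore()
store.update("10.0.0.6", "20.0.0.1")
ds = DirectoryServer(store)
print(ds.lookup("10.0.0.6"))          # 20.0.0.1
store.update("10.0.0.6", "20.0.0.9")  # the server moves
ds.report_stale("10.0.0.6")           # e.g., triggered when a sender's traffic bounces
print(ds.lookup("10.0.0.6"))          # 20.0.0.9
```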
