Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers
Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat

Electrical Packet Switch: • $500/port • 10 Gb/s fixed rate • 12 W/port • Requires transceivers • Per-packet switching • For bursty, uniform traffic
Optical Circuit Switch: • $500/port • Rate free • 240 mW/port • No transceivers • 12 ms switching time • For stable, pair-wise traffic
2010-09-02, SIGCOMM, Nathan Farrington
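A back-of-the-envelope sketch of the per-port numbers above: it multiplies the slide's 12 W and 240 mW per-port figures by a port count. The 64-port switch size is a hypothetical input, and transceiver power is left out entirely; only the switch's own per-port figures are compared.

```python
# Per-port figures taken from the slide; the port count is hypothetical.
EPS_WATTS_PER_PORT = 12.0    # electrical packet switch
OCS_WATTS_PER_PORT = 0.240   # optical circuit switch (240 mW)
COST_PER_PORT_USD = 500      # the same for both switch types on the slide


def switch_power_watts(ports: int, watts_per_port: float) -> float:
    """Total switch power from the per-port figure (no transceivers, no cooling)."""
    return ports * watts_per_port


if __name__ == "__main__":
    ports = 64  # hypothetical switch size
    eps = switch_power_watts(ports, EPS_WATTS_PER_PORT)
    ocs = switch_power_watts(ports, OCS_WATTS_PER_PORT)
    print(f"EPS: {eps:.2f} W, OCS: {ocs:.2f} W, ratio {eps / ocs:.0f}x")
    print(f"Either switch costs ${ports * COST_PER_PORT_USD:,}")
```

With 64 ports this prints 768.00 W for the EPS against 15.36 W for the OCS, a 50x difference at the same $500/port price.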

Outline: Intro Technology Analysis Data Plane Control Plane Experimental Setup Evaluation Related Work Conclusion

Optical Circuit Switch
[Figure: light from Input 1 in a glass fiber bundle passes through lenses to mirrors on motors and a fixed mirror; rotating a mirror steers the beam from Output 1 to Output 2.]
1. Full crossbar switch
2. Does not decode packets
3. Needs external scheduler
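The three properties above can be captured in a small toy model: the switch holds a one-to-one mapping from input ports to output ports (a crossbar), never looks at packet contents, and changes only when an external scheduler calls it. The class name, its methods, and the choice to charge the slide's 12 ms switching time once per changed configuration are illustrative assumptions, not a description of any real OCS interface.

```python
from typing import Dict, Optional


class OpticalCircuitSwitch:
    """Toy full-crossbar optical circuit switch (illustrative model only)."""

    # Switching time from the slide; charging it once per changed
    # configuration is a simplifying assumption.
    SWITCHING_TIME_S = 0.012

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.circuits: Dict[int, int] = {}  # input port -> output port

    def configure(self, circuits: Dict[int, int]) -> float:
        """Called by an external scheduler; returns the reconfiguration time."""
        outputs = list(circuits.values())
        if len(set(outputs)) != len(outputs):
            raise ValueError("two inputs cannot share one output port")
        for port in list(circuits) + outputs:
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} is out of range")
        if circuits == self.circuits:
            return 0.0  # nothing moves, so no switching delay
        self.circuits = dict(circuits)
        return self.SWITCHING_TIME_S

    def output_for(self, input_port: int) -> Optional[int]:
        """Where light entering input_port exits; packets are never decoded."""
        return self.circuits.get(input_port)
```

Any input may be connected to any free output, which is what makes it a full crossbar; deciding which mapping to install is left entirely to the caller.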

Wavelength Division Multiplexing
[Figure: eight 10 G WDM optical transceivers (wavelengths 1-8) on an electrical packet switch feed a WDM MUX, which combines them into one 80 G superlink through the optical circuit switch (no transceivers required at the OCS); a WDM DEMUX separates the wavelengths again.]
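The superlink arithmetic is simple: eight 10 G wavelengths multiplexed onto one fiber give an 80 G superlink that occupies a single OCS port. A minimal sketch; the uplink counts used in the usage lines are hypothetical.

```python
import math

WAVELENGTHS = 8   # wavelengths per fiber, from the slide
RATE_GBPS = 10    # per-wavelength rate of each WDM optical transceiver


def superlink_gbps(wavelengths: int = WAVELENGTHS, rate_gbps: int = RATE_GBPS) -> int:
    """Capacity of one multiplexed fiber, i.e. of one OCS port."""
    return wavelengths * rate_gbps


def ocs_ports_for_uplinks(uplinks: int, wavelengths: int = WAVELENGTHS) -> int:
    """OCS ports used when `uplinks` 10 G transceivers are muxed onto fibers."""
    return math.ceil(uplinks / wavelengths)


print(superlink_gbps())            # 80
print(ocs_ports_for_uplinks(80))   # 10 (instead of 80 without WDM)
```

Because the OCS only steers light, the eightfold reduction in OCS ports comes without adding any transceivers at the circuit switch itself.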

Stability Increases with Aggregation
Inter-Data Center, Inter-Pod, Inter-Rack, Inter-Server, Inter-Process, Inter-Thread (from most to least aggregated)
Where is the Sweet Spot? 1. Enough Stability 2. Enough Traffic

Helios Example
Topology: N pods with k ports each. The electrical designs use k core switches with N ports each; Helios uses fewer than k (fewer core switches).
Example: N = 64 pods × k = 1024 hosts/pod = 64K (65,536) hosts total; 8 wavelengths

| Bisection Bandwidth | 10% Electrical (10:1 Oversubscribed) | 100% Electrical | Helios Example (10% Electrical + 90% Optical) | Helios vs. 100% Electrical |
|---|---|---|---|---|
| Cost | $6.3M | $62.2M | $22.1M | 2.8x less |
| Power | 96.5 kW | 950.3 kW | 157.2 kW | 6.0x less |
| Cables | 6,656 | 65,536 | 14,016 | 4.7x less |
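The savings column is the ratio of the 100% electrical figures to the Helios figures. A quick check using only the numbers in the table (the dictionary keys are our own labels):

```python
# Figures from the table: 100% electrical network vs. Helios example.
full_electrical = {"cost_musd": 62.2, "power_kw": 950.3, "cables": 65_536}
helios = {"cost_musd": 22.1, "power_kw": 157.2, "cables": 14_016}


def savings(baseline: dict, design: dict) -> dict:
    """How many times smaller each metric is for `design`, to one decimal."""
    return {key: round(baseline[key] / design[key], 1) for key in baseline}


print(savings(full_electrical, helios))
# {'cost_musd': 2.8, 'power_kw': 6.0, 'cables': 4.7}
print(64 * 1024)  # 65536 hosts: N = 64 pods times k = 1024 hosts per pod
```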

Set Up a Circuit
Pod 1 -> 2: • Capacity = 10 G • Demand = 10 G • Throughput = 10 G
Pod 1 -> 3: • Capacity = 80 G • Demand = 80 G • Throughput = 80 G
[Figure: Pods 1, 2, and 3 each have a 10 G link to the EPS and an 80 G link to the OCS.]

Traffic Patterns Change
Pod 1 -> 2: • Capacity = 10 G • Demand = 10 G • Throughput = 10 G
Pod 1 -> 3: • Capacity = 80 G • Demand = 80 G • Throughput = 80 G

Traffic Patterns Change
Pod 1 -> 2: • Capacity = 10 G • Demand = 80 G (was 10 G) • Throughput = 10 G
Pod 1 -> 3: • Capacity = 80 G • Demand = 10 G (was 80 G) • Throughput = 10 G

Break a Circuit
Pod 1 -> 2: • Capacity = 10 G • Demand = 80 G (was 10 G) • Throughput = 10 G
Pod 1 -> 3: • Capacity = 80 G • Demand = 10 G (was 80 G) • Throughput = 10 G

Set Up a Circuit
Pod 1 -> 2: • Capacity = 10 G • Demand = 80 G (was 10 G) • Throughput = 10 G
Pod 1 -> 3: • Capacity = 80 G • Demand = 10 G (was 80 G) • Throughput = 10 G

Pod 1 -> 2: • Capacity = 80 G • Demand = 80 G • Throughput = 80 G
Pod 1 -> 3: • Capacity = 80 G • Demand = 10 G (was 80 G) • Throughput = 10 G

Pod 1 -> 2: • Capacity = 80 G • Demand = 80 G • Throughput = 80 G
Pod 1 -> 3: • Capacity = 10 G • Demand = 10 G • Throughput = 10 G
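The walkthrough above follows one rule: a pod pair's throughput is the smaller of its demand and the capacity between the two pods. A minimal sketch of that rule, replaying the numbers after the traffic pattern changes. It treats each pair's capacity independently, which is a simplification: a pod's single 10 G EPS link would really be shared among all of its destinations.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]  # (source pod, destination pod)


def throughput(capacity: Dict[Pair, float], demand: Dict[Pair, float]) -> Dict[Pair, float]:
    """Each pod pair gets the smaller of its demand and its capacity."""
    return {pair: min(capacity.get(pair, 0.0), d) for pair, d in demand.items()}


demand = {(1, 2): 80.0, (1, 3): 10.0}   # after the traffic pattern changes
before = {(1, 2): 10.0, (1, 3): 80.0}   # 80 G circuit still on Pod 1 -> 3
after = {(1, 2): 80.0, (1, 3): 10.0}    # 80 G circuit moved to Pod 1 -> 2

print(sum(throughput(before, demand).values()))  # 20.0
print(sum(throughput(after, demand).values()))   # 90.0
```

Moving the 80 G circuit from Pod 1 -> 3 to Pod 1 -> 2 raises total throughput from 20 G to 90 G, which is why one circuit is broken and another set up.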

[Figure: a Topology Manager coordinates a Circuit Switch Manager for the OCS and a Pod Switch Manager for each pod switch; Pods 1, 2, and 3 connect to the EPS at 10 G and to the OCS at 80 G.]

Outline of Control Loop
1. Estimate traffic demand
2. Compute optimal topology for maximum throughput
3. Program the pod switches and circuit switches
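Two pieces of glue that the loop implies but the slide does not spell out: host-level demand estimates have to be summed into inter-pod demand before a pod-level topology can be computed, and programming the circuit switches amounts to breaking circuits that are no longer wanted and setting up new ones. The sketch below assumes circuits are represented as a dict from source pod to destination pod; the function names are hypothetical, and reprogramming only the changed circuits is an assumption here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pod_demand(host_demand: Dict[Tuple[str, str], float],
               pod_of: Dict[str, int]) -> Dict[Tuple[int, int], float]:
    """Between steps 1 and 2: sum host-to-host demand into inter-pod demand."""
    agg: Dict[Tuple[int, int], float] = defaultdict(float)
    for (src, dst), d in host_demand.items():
        sp, dp = pod_of[src], pod_of[dst]
        if sp != dp:  # intra-pod traffic stays on the pod switch
            agg[(sp, dp)] += d
    return dict(agg)


def circuit_diff(old: Dict[int, int],
                 new: Dict[int, int]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Step 3: (src_pod, dst_pod) circuits to break and circuits to set up."""
    old_c, new_c = set(old.items()), set(new.items())
    return sorted(old_c - new_c), sorted(new_c - old_c)
```

For example, `circuit_diff({1: 3}, {1: 2})` returns `([(1, 3)], [(1, 2)])`: break Pod 1 -> 3, then set up Pod 1 -> 2, as in the data plane example.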

1. Estimate Traffic Demand
Question: Will this flow use more bandwidth if we give it more capacity?
1. Identify elephant flows (mice don't grow)
Problem: Measurements are biased by the current topology
2. Pretend all hosts are connected to an ideal crossbar switch
3. Compute the max-min fair bandwidth fixpoint
Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, and Amin Vahdat. Hedera: Dynamic Flow Scheduling for Data Center Networks. In NSDI '10.
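Steps 2 and 3 can be sketched in the style of the Hedera demand estimator cited above: each sender splits its capacity equally among flows whose demand is not yet settled, each oversubscribed receiver caps its flows at the max-min fair share, and the two passes repeat until nothing changes. This is a simplified re-implementation under stated assumptions, not the Hedera or Helios code: host NIC capacity is normalized to 1.0, the input is assumed to contain only elephant flows, and the names `Flow` and `estimate_demands` are ours.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    src: str
    dst: str
    demand: float = 0.0          # fraction of a host NIC's capacity (1.0 = full rate)
    converged: bool = False
    receiver_limited: bool = False


def _est_src(flows: List[Flow], src: str) -> None:
    """Split the sender's spare capacity equally among its unconverged flows."""
    out = [f for f in flows if f.src == src]
    fixed = sum(f.demand for f in out if f.converged)
    free = [f for f in out if not f.converged]
    for f in free:
        f.demand = (1.0 - fixed) / len(free)


def _est_dst(flows: List[Flow], dst: str) -> None:
    """If the receiver is oversubscribed, cap its flows at the max-min fair share."""
    inc = [f for f in flows if f.dst == dst]
    if sum(f.demand for f in inc) <= 1.0 + 1e-9:
        return
    for f in inc:
        f.receiver_limited = True
    sender_limited, share, changed = 0.0, 1.0 / len(inc), True
    while changed:
        changed, n_rl = False, 0
        for f in inc:
            if not f.receiver_limited:
                continue
            if f.demand < share:  # limited by its sender, not by this receiver
                sender_limited += f.demand
                f.receiver_limited = False
                changed = True
            else:
                n_rl += 1
        if n_rl == 0:
            return
        share = (1.0 - sender_limited) / n_rl
    for f in inc:
        if f.receiver_limited:
            f.demand = share
            f.converged = True


def estimate_demands(flows: List[Flow], max_iters: int = 100) -> List[Flow]:
    """Alternate sender and receiver passes until demands stop changing."""
    for _ in range(max_iters):
        before = [f.demand for f in flows]
        for h in sorted({f.src for f in flows}):
            _est_src(flows, h)
        for h in sorted({f.dst for f in flows}):
            _est_dst(flows, h)
        if all(abs(b - f.demand) < 1e-9 for b, f in zip(before, flows)):
            break
    return flows
```

With flows A -> X, B -> X, C -> X, and A -> Y, the three senders share receiver X at 1/3 each, and host A's remaining 2/3 goes to its flow to Y. Helios would then sum such host-level estimates into inter-pod demand.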

2. Compute Optimal Topology
1. Formulate as an instance of the max-weight perfect matching problem on a bipartite graph
2. Solve with Edmonds' algorithm
[Figure: bipartite graph with source pods 1-4 on one side and destination pods 1-4 on the other.]
a) Pods do not send traffic to themselves
b) Edge weights represent inter-pod demand
c) The algorithm is run iteratively for each circuit switch, making use of the previous results
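Edmonds' algorithm finds the matching in polynomial time. For a handful of pods, the same max-weight perfect matching can be found by brute force over permutations, which makes rules (a) to (c) easy to see. The sketch below swaps in that brute-force search (exponential, so a toy only). The names, the 0-based pod numbering, and the rule of subtracting a fixed per-circuit capacity from the residual demand between rounds are illustrative assumptions about how "making use of the previous results" might work.

```python
from itertools import permutations
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # demand[src_pod][dst_pod], e.g. in Gb/s


def max_weight_matching(demand: Matrix) -> Optional[Tuple[int, ...]]:
    """Brute-force max-weight perfect matching; result[src] = dst."""
    n = len(demand)
    best_weight, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        if any(perm[src] == src for src in range(n)):
            continue  # (a) pods do not send traffic to themselves
        weight = sum(demand[src][perm[src]] for src in range(n))  # (b)
        if weight > best_weight:
            best_weight, best_perm = weight, perm
    return best_perm


def schedule(demand: Matrix, num_circuit_switches: int,
             circuit_capacity: float) -> List[Tuple[int, ...]]:
    """(c) One matching per circuit switch, each on the residual demand."""
    residual = [row[:] for row in demand]
    configs = []
    for _ in range(num_circuit_switches):
        perm = max_weight_matching(residual)
        if perm is None:
            break  # fewer than two pods: no valid matching exists
        configs.append(perm)
        for src, dst in enumerate(perm):
            residual[src][dst] = max(0.0, residual[src][dst] - circuit_capacity)
    return configs


demand = [
    [0, 80, 10, 0],
    [10, 0, 0, 70],
    [0, 0, 0, 60],
    [50, 0, 20, 0],
]
print(schedule(demand, num_circuit_switches=2, circuit_capacity=80))
# [(1, 2, 3, 0), (1, 3, 0, 2)]
```

The first circuit switch serves the heaviest pairs (total weight 190); the second works on whatever demand is left.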

Example: Compute Optimal Topology

[Figure panels: Traditional Network; Helios Network]
100% bisection bandwidth (240 Gb/s)

Hardware
• 24 servers
  – HP DL380
  – 2-socket (E5520) Nehalem
  – Dual Myricom 10G NICs
• 7 switches
  – One Dell 1G 48-port
  – Three Fulcrum 10G 24-port
  – One Glimmerglass 64-port optical circuit switch
  – Two Cisco Nexus 5020 10G 52-port

Traditional Network: 190 Gb/s peak, 171 Gb/s avg (annotations: hash collisions, TCP/IP overhead)

Helios Network (Baseline): 160 Gb/s peak, 43 Gb/s avg

Port Debouncing
1. Layer 1 PHY signal locked (bits are detected)
2. Switch thread wakes up and polls for PHY status • Makes note to enable link after 2 seconds
3. Switch thread enables Layer 2 link
[Figure: timeline of steps 1-3, time axis from 0.0 to 2.0 s]
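The cost of debouncing follows from the three steps: the link comes up only at the first poll after the PHY locks, plus the 2 s hold-off. The poll period below is a hypothetical parameter; the slide gives only the 2 s debounce delay.

```python
import math


def link_up_time(phy_lock_s: float, poll_period_s: float,
                 debounce_s: float = 2.0) -> float:
    """Time at which the Layer 2 link is enabled after the PHY locks.

    Step 2: the switch thread notices the lock at its next poll.
    Step 3: with debouncing it waits a further debounce_s before enabling
    the link; with debounce_s=0 the link is enabled at that poll.
    """
    first_poll = math.ceil(phy_lock_s / poll_period_s) * poll_period_s
    return first_poll + debounce_s
```

With a hypothetical 0.25 s poll period and a PHY lock at 0.1 s, `link_up_time(0.1, 0.25)` gives 2.25 s, while `link_up_time(0.1, 0.25, debounce_s=0.0)` gives 0.25 s.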

Without Debouncing: 160 Gb/s peak, 87 Gb/s avg

Without EDC (electronic dispersion compensation): 160 Gb/s peak, 142 Gb/s avg (annotations: software limitation, 27 ms gaps)

Bidirectional Circuits
[Figure: three pod switches, each with TX and RX ports, connected through the optical circuit switch.]

Unidirectional Circuits
[Figure: three pod switches, each with TX and RX ports, connected through the optical circuit switch.]

Unidirectional Circuits
• Unidirectional scheduler: 142 Gb/s avg
• Bidirectional scheduler: 100 Gb/s avg
Daisy chain needed for good performance for arbitrary traffic patterns
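The difference between the two schedulers comes down to one constraint. With bidirectional circuits, a circuit from pod A to pod B also carries B's traffic back to A, so a configuration must pair pods up; with three pods, one pod is left without a circuit. With unidirectional circuits, each pod's TX only has to reach some other pod's RX, so a daisy chain that uses every TX and RX is allowed. A minimal sketch of that constraint, with hypothetical names and circuits represented as a dict from source pod to destination pod; how the Helios scheduler uses the daisy chain in detail is not modeled.

```python
from typing import Dict


def is_bidirectional(circuits: Dict[int, int]) -> bool:
    """True if every circuit src -> dst is matched by dst -> src."""
    return all(circuits.get(dst) == src for src, dst in circuits.items())


def daisy_chain(num_pods: int) -> Dict[int, int]:
    """Unidirectional ring: pod i's TX feeds pod (i + 1)'s RX."""
    return {i: (i + 1) % num_pods for i in range(num_pods)}


print(daisy_chain(3))                    # {0: 1, 1: 2, 2: 0}
print(is_bidirectional(daisy_chain(3)))  # False
print(is_bidirectional({0: 1, 1: 0}))    # True: pods 0 and 1 paired, pod 2 idle
```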

Traffic Stability and Throughput

| | Link Technology | Modifications Required | Working Prototype |
|---|---|---|---|
| Helios (SIGCOMM '10) | Optics w/ WDM: 10G-180G (CWDM), 10G-400G (DWDM) | Switch Software | Glimmerglass, Fulcrum |
| c-Through (SIGCOMM '10) | Optics (10G) | Host OS | Emulation |
| Flyways (HotNets '09) | Wireless (1G, 10 m) | Unspecified | |
| IBM System-S (GLOBECOM '09) | Optics (10G) | Host Application; Specific to Stream Processing | Calient, Nortel |
| HPC (SC '05) | Optics (10G) | Host NIC Hardware | |

“Why Packet Switching?”
“The conventional wisdom [of 1985 is] that packet switching is poorly suited to the needs of telephony...”
Jonathan Turner. “Design of an Integrated Services Packet Network”. IEEE J. on Selected Areas in Communications, SAC-4(8), Nov 1986.

Conclusion
• Helios: a scalable, energy-efficient network architecture for modular data centers
• Large cost, power, and cabling complexity savings
• Dynamically and automatically provisions bisection bandwidth at runtime
• Does not require end-host modifications or switch hardware modifications
• Deployable today using commercial components
• Uses the strengths of circuit switching to compensate for the weaknesses of packet switching, and vice versa