High Performance Switches and Routers Theory and Practice

High Performance Switches and Routers: Theory and Practice. Sigcomm 99, August 30, 1999, Harvard University. Nick McKeown (nickm@stanford.edu) and Balaji Prabhakar (balaji@isl.stanford.edu), Departments of Electrical Engineering and Computer Science. Copyright 1999. All Rights Reserved

Tutorial Outline • Introduction: What is a Packet Switch? • Packet Lookup and Classification: Where does a packet go next? • Switching Fabrics: How does the packet get there? • Output Scheduling: When should the packet leave? Copyright 1999. All Rights Reserved 2

Introduction What is a Packet Switch? • Basic Architectural Components • Some Example Packet Switches • The Evolution of IP Routers Copyright 1999. All Rights Reserved 3

Basic Architectural Components. Control: Admission Control, Policing, Congestion Control, Reservation, Routing. Datapath (per-packet processing): Switching, Output Scheduling. Copyright 1999. All Rights Reserved 4

Basic Architectural Components 1. Datapath: per-packet processing Forwarding Table 2. Interconnect 3. Output Scheduling Forwarding Decision Forwarding Table Forwarding Decision Copyright 1999. All Rights Reserved 5

Where high performance packet switches are used Carrier Class Core Router ATM Switch Frame Relay Switch The Internet Core Edge Router Copyright 1999. All Rights Reserved Enterprise WAN access & Enterprise Campus Switch 6

Introduction What is a Packet Switch? • Basic Architectural Components • Some Example Packet Switches • The Evolution of IP Routers Copyright 1999. All Rights Reserved 7

ATM Switch • Lookup cell VCI/VPI in VC table. • Replace old VCI/VPI with new. • Forward cell to outgoing interface. • Transmit cell onto link. Copyright 1999. All Rights Reserved 8

Ethernet Switch • Lookup frame DA in forwarding table. – If known, forward to correct port. – If unknown, broadcast to all ports. • Learn SA of incoming frame. • Forward frame to outgoing interface. • Transmit frame onto link. Copyright 1999. All Rights Reserved 9
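The learn-and-flood behaviour above is easy to sketch. The following is an illustrative Python model (class and method names are invented here, not from the tutorial):

```python
# Sketch of an Ethernet switch's forwarding logic: learn the source
# address of every frame, forward known destinations, flood unknown ones.

class LearningSwitch:
    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.table = {}                # MAC address -> port

    def handle_frame(self, src, dst, in_port):
        self.table[src] = in_port      # learn SA of incoming frame
        if dst in self.table:
            return [self.table[dst]]   # known DA: forward to correct port
        # unknown DA: broadcast to all ports except the arrival port
        return [p for p in range(self.num_ports) if p != in_port]
```

After one frame in each direction, subsequent frames are forwarded to a single port rather than flooded.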

IP Router • Lookup packet DA in forwarding table. – If known, forward to correct port. – If unknown, drop packet. • Decrement TTL, update header checksum. • Forward packet to outgoing interface. • Transmit packet onto link. Copyright 1999. All Rights Reserved 10
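The "decrement TTL, update header checksum" step need not recompute the whole checksum: since TTL sits in the high byte of one 16-bit header word, an RFC 1141-style incremental update suffices. A sketch in Python (the tutorial gives no code; function names are illustrative):

```python
def ipv4_checksum(words):
    # words: 16-bit header words with the checksum field set to 0
    s = sum(words)
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)   # fold end-around carries
    return ~s & 0xFFFF

def decrement_ttl(words, cksum):
    # TTL is the high byte of word 4 (TTL/protocol); decrementing it
    # lowers that word by 0x0100, so the checksum rises by 0x0100
    # with end-around carry (incremental update, RFC 1141 style).
    words = list(words)
    words[4] -= 0x0100
    c = cksum + 0x0100
    c = (c & 0xFFFF) + (c >> 16)
    return words, c
```

The incremental result matches a full recomputation over the updated header.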

Introduction What is a Packet Switch? • Basic Architectural Components • Some Example Packet Switches • The Evolution of IP Routers Copyright 1999. All Rights Reserved 11

First Generation IP Routers: Shared Backplane. (Figure: CPU and buffer memory on a shared bus, with line interfaces (DMA, MAC) attached.) Copyright 1999. All Rights Reserved 12

Second Generation IP Routers Buffer Memory CPU DMA DMA Line Card Local Buffer Memory MAC MAC Copyright 1999. All Rights Reserved 13

Third Generation Switches/Routers: Switched Backplane. (Figure: line cards with local buffer memory and MAC, plus a CPU card, interconnected by a switched backplane.) Copyright 1999. All Rights Reserved 14

Fourth Generation Switches/Routers: Clustering and Multistage. (Figure: multiple switch elements clustered into a multistage fabric interconnecting 32 ports.) Copyright 1999. All Rights Reserved 15

Packet Switches References • J. Giacopelli, M. Littlewood, W. D. Sincoskie, “Sunshine: A high performance self-routing broadband packet switch architecture”, ISS '90. • J. S. Turner, “Design of a Broadcast packet switching network”, IEEE Trans Comm, June 1988, pp. 734-743. • C. Partridge et al., “A Fifty Gigabit per second IP Router”, IEEE Trans Networking, 1998. • N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, “The Tiny Tera: A Packet Switch Core”, IEEE Micro Magazine, Jan-Feb 1997. Copyright 1999. All Rights Reserved 16

Tutorial Outline • Introduction: What is a Packet Switch? • Packet Lookup and Classification: Where does a packet go next? • Switching Fabrics: How does the packet get there? • Output Scheduling: When should the packet leave? Copyright 1999. All Rights Reserved 17

Basic Architectural Components 1. Datapath: per-packet processing Forwarding Table 2. Interconnect 3. Output Scheduling Forwarding Decision Forwarding Table Forwarding Decision Copyright 1999. All Rights Reserved 18

Forwarding Decisions • ATM and MPLS switches – Direct Lookup • Bridges and Ethernet switches – Associative Lookup – Hashing – Trees and tries • IP Routers – Caching – CIDR – Patricia trees/tries – Other methods • Packet Classification Copyright 1999. All Rights Reserved 19

ATM and MPLS Switches Direct Lookup Memory Data Copyright 1999. All Rights Reserved Address VCI (Port, VCI) 20

Forwarding Decisions • ATM and MPLS switches – Direct Lookup • Bridges and Ethernet switches – Associative Lookup – Hashing – Trees and tries • IP Routers – Caching – CIDR – Patricia trees/tries – Other methods • Packet Classification Copyright 1999. All Rights Reserved 21

Bridges and Ethernet Switches Associative Lookups. (Figure: a CAM compares the 48-bit search key against all entries in parallel; on a hit it returns the associated data at a log2 N-bit address.) Advantages: • Simple. Disadvantages: • Slow • High power • Small • Expensive. Copyright 1999. All Rights Reserved 22

Bridges and Ethernet Switches Hashing. (Figure: a hashing function maps the 48-bit search key to a 16-bit memory address holding the associated data; a hit is declared if the stored key matches.) Copyright 1999. All Rights Reserved 23

Lookups Using Hashing An example. (Figure: a CRC-16 hash of the 48-bit search key selects one of 2^16 buckets; colliding entries are chained in linked lists, which are walked to find a hit.) Copyright 1999. All Rights Reserved 24
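The hashed lookup with chaining can be sketched as follows (a Python dict of buckets stands in for the memory, and a simple XOR-fold stands in for the CRC-16; names are illustrative, not from the tutorial):

```python
def hash16(addr48):
    # fold a 48-bit address down to a 16-bit bucket index
    return (addr48 ^ (addr48 >> 16) ^ (addr48 >> 32)) & 0xFFFF

class HashLookup:
    def __init__(self):
        self.buckets = {}
    def insert(self, addr, data):
        self.buckets.setdefault(hash16(addr), []).append((addr, data))
    def lookup(self, addr):
        # walk the linked list (here: a Python list) for this bucket
        for key, data in self.buckets.get(hash16(addr), []):
            if key == addr:
                return data        # hit
        return None                # miss
```

Two addresses that fold to the same bucket share a chain, which is why the lookup time is non-deterministic.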

Lookups Using Hashing Performance of simple example Copyright 1999. All Rights Reserved 25

Lookups Using Hashing Advantages: • Simple • Expected lookup time can be small. Disadvantages: • Non-deterministic lookup time • Inefficient use of memory. Copyright 1999. All Rights Reserved 26

Trees and Tries Binary Search Tree < > > < N entries Copyright 1999. All Rights Reserved > log 2 N < Binary Search Trie 0 0 1 1 010 0 1 111 27

Trees and Tries Multiway tries 16 ary Search Trie 0000, ptr 0000, 0 1111, ptr 000011110000 Copyright 1999. All Rights Reserved 1111, ptr 0000, 0 1111, ptr 111111 28

Trees and Tries Multiway tries. Table produced from 2^15 randomly generated 48-bit addresses. Copyright 1999. All Rights Reserved 29

Forwarding Decisions • ATM and MPLS switches – Direct Lookup • Bridges and Ethernet switches – Associative Lookup – Hashing – Trees and tries • IP Routers – Caching – CIDR – Patricia trees/tries – Other methods • Packet Classification Copyright 1999. All Rights Reserved 30

Caching Addresses Slow Path Buffer Memory CPU Fast Path DMA DMA Line Card Local Buffer Memory MAC MAC Copyright 1999. All Rights Reserved 31

Caching Addresses. LAN: average flow < 40 packets. WAN: huge number of flows. (Figure: cache hit rate with a cache sized at 10% of the full table.) Copyright 1999. All Rights Reserved 32

IP Routers Class-based addresses. (Figure: the IP address space divided into Classes A, B, C, D.) Routing table: exact match, e.g. address 212.17.9.4 matches entry 212.17.9.0 → Port 4. Copyright 1999. All Rights Reserved 33

IP Routers CIDR. Class-based: the address space 0 to 2^32-1 is carved into Classes A, B, C, D. Classless: arbitrary prefixes, e.g. 65/8, 128.9/16, 142.12/19; address 128.9.16.14 falls inside 128.9/16. Copyright 1999. All Rights Reserved 34

IP Routers CIDR. Prefixes can nest: 128.9.19/24 and 128.9.25/24 lie inside 128.9.16/20, and 128.9.16/20 and 128.9.176/20 lie inside 128.9/16. Address 128.9.16.14 matches several prefixes; the most specific route is the “longest matching prefix”. Copyright 1999. All Rights Reserved 35

IP Routers Metrics for Lookups. Example table (lookup of 128.9.16.14), prefix → port: 65/8 → 3; 128.9/16 → 5; 128.9.16/20 → 2; 128.9.19/24 → 7; 128.9.25/24 → 10; 128.9.176/20 → 1; 142.12/19 → 3. Metrics: • Lookup time • Storage space • Update time • Preprocessing time. Copyright 1999. All Rights Reserved 36
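A naive longest-prefix match over a few of the example routes makes the rule concrete: scan the table and keep the most specific (longest) matching prefix. This Python sketch is purely illustrative (the tutorial gives no code):

```python
def ip(s):
    # dotted quad -> 32-bit integer
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def lpm(table, addr):
    # table: list of (prefix_as_int, length, port)
    addr = ip(addr)
    best = None
    for prefix, length, port in table:
        mask = ~((1 << (32 - length)) - 1) & 0xFFFFFFFF
        if (addr & mask) == prefix and (best is None or length > best[0]):
            best = (length, port)      # keep the longest match
    return best[1] if best else None

table = [(ip("128.9.0.0"), 16, 5),
         (ip("128.9.16.0"), 20, 2),
         (ip("128.9.19.0"), 24, 7)]
```

128.9.16.14 matches both 128.9/16 and 128.9.16/20, and the /20's port wins. Linear scan is of course far too slow for a real router; the schemes that follow exist to avoid it.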

IP Router Lookup H E A D E R Dstn Addr Forwarding Engine Next Hop Computation Next Hop Forwarding Table Destination Next Hop Incoming Packet IPv 4 unicast destination address based lookup Copyright 1999. All Rights Reserved 37

Need more than IPv4 unicast lookups • Multicast – PIM-SM: Longest Prefix Matching on the source and group address; try (S, G) followed by (*, *, RP); check incoming interface. – DVMRP: incoming interface check followed by (S, G) lookup. • IPv6 – 128-bit destination address field – Exact address architecture not yet known. Copyright 1999. All Rights Reserved 38

Lookup Performance Required. Line rate, lookups/s at 40 B packets / at 240 B packets: T1 (1.5 Mb/s): 4.68 Kpps / 0.78 Kpps. OC3 (155 Mb/s): 480 Kpps (40 B). OC12 (622 Mb/s): 1.94 Mpps / 323 Kpps. OC48 (2.5 Gb/s): 7.81 Mpps / 1.3 Mpps. OC192 (10 Gb/s): 31.25 Mpps / 5.21 Mpps. Gigabit Ethernet (84 B packets): 1.49 Mpps. Copyright 1999. All Rights Reserved 39

Size of the Routing Table Source: http: //www. telstra. net/ops/bgptable. html Copyright 1999. All Rights Reserved 40

Ternary CAMs. (Figure: each associative memory entry holds a value and a mask, e.g. value 10.0.0.0 with mask 255.0.0.0; all entries R1-R4 are compared in parallel and a priority encoder selects the highest-priority match to pick the next hop.) Copyright 1999. All Rights Reserved 41

Binary Tries. (Figure: a binary trie storing the prefixes below; labeled nodes a-j mark where each prefix ends.) Example Prefixes: a) 00001 b) 00010 c) 00011 d) 001 e) 0101 f) 011 g) 100 h) 1010 i) 1100 j) 11110000. Copyright 1999. All Rights Reserved 42
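The binary trie above can be sketched directly: one node per bit, remembering the last labeled node seen on the way down, which is the longest match. An illustrative Python model (names invented here):

```python
class TrieNode:
    def __init__(self):
        self.child = [None, None]
        self.label = None              # set if a prefix ends at this node

def insert(root, bits, label):
    node = root
    for b in bits:
        i = int(b)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.label = label

def lookup(root, bits):
    node, best = root, None
    for b in bits:
        node = node.child[int(b)]
        if node is None:
            break
        if node.label is not None:
            best = node.label          # longest match so far
    return best

root = TrieNode()
for bits, label in [("00001", "a"), ("00010", "b"), ("00011", "c"),
                    ("001", "d"), ("0101", "e"), ("011", "f"),
                    ("100", "g"), ("1010", "h"), ("1100", "i"),
                    ("11110000", "j")]:
    insert(root, bits, label)
```

Lookup cost is one memory access per address bit, which motivates the Patricia and multi-bit variants that follow.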

Patricia Tree. (Figure: the same prefixes stored in a Patricia tree; chains of single-child nodes are collapsed and annotated with a skip count, e.g. Skip=5.) Example Prefixes: a) 00001 b) 00010 c) 00011 d) 001 e) 0101 f) 011 g) 100 h) 1010 i) 1100 j) 11110000. Copyright 1999. All Rights Reserved 43

Patricia Tree Disadvantages • Many memory accesses • May need backtracking • Pointers take up a lot of space. Advantages • General solution • Extensible to wider fields. Avoid backtracking by storing the intermediate best matched prefix (Dynamic Prefix Tries). 40 K entries: 2 MB data structure with 0.3-0.5 Mpps [O(W)]. Copyright 1999. All Rights Reserved 44

Binary search on trie levels Level 0 Level 8 Level 29 Copyright 1999. All Rights Reserved P 45

Binary search on trie levels. Store a hash table for each prefix length to aid search at a particular trie level. (Figure: hash tables at lengths 8, 12, 16, 24; e.g. length 16 holds 10.1 and 10.2, length 24 holds 10.1.1, 10.1.2, 10.2.3.) Example Prefixes: 10.0/8, 10.1.0.0/16, 10.1.1.0/24, 10.1.2.0/24, 10.2.3.0/24. Example Addrs: 10.1.1.4, 10.4.4.3, 10.2.3.9, 10.2.4.8. Copyright 1999. All Rights Reserved 46
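A sketch of the scheme, using the example prefixes and addresses above. To let binary search over prefix lengths work, shorter-length "marker" entries are planted for longer prefixes, each annotated with the best real match so far (this avoids backtracking when a marker leads the search astray). This Python model is illustrative only, and plants markers at every shorter length for simplicity:

```python
def ip(s):
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def build(prefixes):
    # prefixes: list of (prefix_as_int, length, nexthop)
    lengths = sorted({l for _, l, _ in prefixes})
    tables = {l: {} for l in lengths}

    def best_match(addr, maxlen):
        # longest real prefix of length <= maxlen covering addr
        best = None
        for q, ql, qnh in prefixes:
            if ql <= maxlen and addr >> (32 - ql) == q >> (32 - ql):
                if best is None or ql > best[0]:
                    best = (ql, qnh)
        return best[1] if best else None

    for p, plen, nh in prefixes:
        for m in lengths:
            if m < plen:   # marker, annotated with the best shorter match
                tables[m].setdefault(p >> (32 - m), best_match(p, m))
        tables[plen][p >> (32 - plen)] = nh
    return lengths, tables

def search(lengths, tables, addr):
    lo, hi, best = 0, len(lengths) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        l = lengths[mid]
        key = addr >> (32 - l)
        if key in tables[l]:
            if tables[l][key] is not None:
                best = tables[l][key]
            lo = mid + 1   # hit or marker: longer prefixes may match
        else:
            hi = mid - 1   # miss: only shorter prefixes can match
    return best
```

Address 10.2.4.8 exercises the marker logic: the search hits the 10.2 marker at length 16, misses at 24, and correctly falls back to 10/8.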

Binary search on trie levels. Disadvantages • Multiple hashed memory accesses • Updates are complex. Advantages • Scalable to IPv6. 33 K entries: 1.4 MB data structure with 1.2-2.2 Mpps [O(log W)]. Copyright 1999. All Rights Reserved 47

Compacting Forwarding Tables. (Figure: the trie is encoded as a bit vector marking where stored prefixes begin.) Copyright 1999. All Rights Reserved 48

Compacting Forwarding Tables. (Figure: the bit vector is split into chunks summarized by a codeword array (R1,0; R2,3; R3,7; R4,9; R5,0) and a base index array, from which the next-hop index is computed.) Copyright 1999. All Rights Reserved 49

Compacting Forwarding Tables Disadvantages • Scalability to larger tables? • Updates are complex. Advantages • Extremely small data structure: can fit in cache. 33 K entries: 160 KB data structure with average 2 Mpps [O(W/k)]. Copyright 1999. All Rights Reserved 50

Multi bit Tries 16 ary Search Trie 0000, ptr 0000, 0 1111, ptr 000011110000 Copyright 1999. All Rights Reserved 1111, ptr 0000, 0 1111, ptr 111111 51

Compressed Tries Only 3 memory accesses L 8 L 16 L 24 Copyright 1999. All Rights Reserved 52

Routing Lookups in Hardware. (Figure: histogram of the number of prefixes by prefix length.) Most prefixes are 24 bits or shorter. Copyright 1999. All Rights Reserved 53

Routing Lookups in Hardware Prefixes up to 24 bits. 2^24 = 16 M entries: the top 24 bits of the address (e.g. 142.19.6 of 142.19.6.14) directly index a table whose entry holds the next hop. Copyright 1999. All Rights Reserved 54

Routing Lookups in Hardware Prefixes above 24 bits. The first-table entry for 128.3.72 holds a pointer (base) instead of a next hop; the low 8 bits of the address (offset, e.g. 44 of 128.3.72.44) index a second table at that base to find the next hop. Copyright 1999. All Rights Reserved 55
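A small software model of the two-table scheme above (Python dicts stand in for the DRAM arrays; class and method names are invented for illustration):

```python
LONG = 1 << 15   # top bit of a 16-bit entry: 0 = next hop, 1 = pointer

class TwoTableLookup:
    def __init__(self):
        self.tbl24 = {}     # top 24 bits of address -> 16-bit entry
        self.tbl_long = {}  # (base << 8) | low 8 bits -> next hop
        self.next_base = 0

    def add_short(self, prefix24, nexthop):
        # a prefix of length <= 24, already expanded to one 24-bit index
        self.tbl24[prefix24] = nexthop

    def add_long(self, prefix24, entries, default):
        # entries: list of (last octet, next hop) for prefixes > 24 bits
        base = self.next_base
        self.next_base += 1
        self.tbl24[prefix24] = LONG | base      # pointer, not a next hop
        for off in range(256):
            self.tbl_long[(base << 8) | off] = default
        for off, nh in entries:
            self.tbl_long[(base << 8) | off] = nh

    def lookup(self, addr):
        e = self.tbl24.get(addr >> 8)           # first memory access
        if e is None:
            return None
        if e & LONG:                            # second memory access
            return self.tbl_long[((e & (LONG - 1)) << 8) | (addr & 0xFF)]
        return e
```

Every lookup takes at most two memory accesses, which is what makes the pipelined DRAM implementation on the next slides fast.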

Routing Lookups in Hardware, generalized: a first table of 2^N entries indexed by the top N bits; prefixes longer than N bits point to second-level tables indexed by the next M bits, covering prefixes up to N+M bits. Copyright 1999. All Rights Reserved 56

Routing Lookups in Hardware Disadvantages • Large memory required (9-33 MB) • Depends on prefix length distribution. Advantages • 20 Mpps with 50 ns DRAM • Easy to implement in hardware. Various compression schemes can be employed to decrease the storage requirements, e.g. carefully chosen variable-length strides, bitmap compression, etc. Copyright 1999. All Rights Reserved 57

IP Router Lookups References • A. Brodnik, S. Carlsson, M. Degermark, S. Pink, “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp. 3-14. • B. Lampson, V. Srinivasan, G. Varghese, “IP lookups using multiway and multicolumn search”, Infocom 1998, pp. 1248-1256, vol. 3. • M. Waldvogel, G. Varghese, J. Turner, B. Plattner, “Scalable high speed IP routing lookups”, Sigcomm 1997, pp. 25-36. • P. Gupta, S. Lin, N. McKeown, “Routing lookups in hardware at memory access speeds”, Infocom 1998, pp. 1241-1248, vol. 3. • S. Nilsson, G. Karlsson, “Fast address lookup for Internet routers”, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1998. • V. Srinivasan, G. Varghese, “Fast IP lookups using controlled prefix expansion”, Sigmetrics, June 1998. Copyright 1999. All Rights Reserved 58

Forwarding Decisions • ATM and MPLS switches – Direct Lookup • Bridges and Ethernet switches – Associative Lookup – Hashing – Trees and tries • IP Routers – Caching – CIDR – Patricia trees/tries – Other methods • Packet Classification Copyright 1999. All Rights Reserved 59

Providing Value Added Services Some examples • Differentiated services – Regard traffic from Autonomous System #33 as `platinum grade' • Access Control Lists – Deny udp host 194.72.33 194.72.6.64 0.0.0.15 eq snmp • Committed Access Rate – Rate limit WWW traffic from sub-interface #739 to 10 Mbps • Policy-based Routing – Route all voice traffic through the ATM network Copyright 1999. All Rights Reserved 60

Packet Classification H E A D E R Incoming Packet Copyright 1999. All Rights Reserved Forwarding Engine Packet Classification Action Classifier (Policy Database) Predicate Action 61

Multi field Packet Classification Given a classifier with N rules, find the action associated with the highest priority rule matching an incoming packet. Copyright 1999. All Rights Reserved 62

Geometric Interpretation in 2D. (Figure: rules R1-R7 are rectangles in the (Field #1, Field #2) plane and a packet is a point, e.g. P1, P2; a fully specified rule such as (144.24/16, 64/24) is a small rectangle, while (128.16.46.23, *) is a line.) Copyright 1999. All Rights Reserved 63
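The geometric view corresponds to the simplest possible classifier: scan the rules in priority order and return the first whose prefixes cover the packet's point in both fields. A hedged Python sketch (illustrative names; real classifiers avoid the linear scan):

```python
def ip(s):
    return int.from_bytes(bytes(int(x) for x in s.split(".")), "big")

def matches(value, prefix, length):
    # does the top `length` bits of value equal those of prefix?
    # (length 0 means wildcard: both sides shift to 0)
    return value >> (32 - length) == prefix >> (32 - length)

def classify(rules, f1, f2):
    # rules: (f1_prefix, f1_len, f2_prefix, f2_len, action),
    # ordered highest priority first
    for p1, l1, p2, l2, action in rules:
        if matches(f1, p1, l1) and matches(f2, p2, l2):
            return action
    return "default"

rules = [(ip("144.24.0.0"), 16, 0, 0, "R1"),       # (144.24/16, *)
         (0, 0, ip("64.0.0.0"), 24, "R2")]          # (*, 64.0.0/24)
```

A packet matching both rectangles gets R1, the higher-priority rule, which is exactly the "highest priority rule matching" problem stated above.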

Proposed Schemes Copyright 1999. All Rights Reserved 64

Proposed Schemes (Contd. ) Copyright 1999. All Rights Reserved 65

Proposed Schemes (Contd. ) Copyright 1999. All Rights Reserved 66

Grid of Tries 0 Dimension 1 1 0 0 0 1 R 1 0 1 1 0 R 3 R 2 Copyright 1999. All Rights Reserved R 4 0 0 R 5 0 R 6 0 Dimension 2 1 R 7 67

Grid of Tries Disadvantages • Static solution • Not easy to extend to higher dimensions. Advantages • Good solution for two dimensions. 20 K entries: 2 MB data structure with 9 memory accesses [at most 2W]. Copyright 1999. All Rights Reserved 68

Classification using Bit Parallelism 0 1 1 1 0 R 4 R 3 R 1 R 2 0 Copyright 1999. All Rights Reserved 69

Classification using Bit Parallelism Disadvantages • Large memory bandwidth • Hardware optimized. Advantages • Good solution in multiple dimensions for small classifiers. 512 rules: 1 Mpps with a single FPGA and five 128 KB SRAM chips. Copyright 1999. All Rights Reserved 70

Classification Using Multiple Fields Recursive Flow Classification. (Figure: the packet header, a point in 2^S = 2^128 possibilities, is reduced in stages through successive memories, 2^128 → 2^64 → 2^24 → 2^T = 2^12, ending in the action.) Copyright 1999. All Rights Reserved 71

Packet Classification References • T. V. Lakshman, D. Stiliadis, “High speed policy-based packet forwarding using efficient multi-dimensional range matching”, Sigcomm 1998, pp. 191-202. • V. Srinivasan, S. Suri, G. Varghese, M. Waldvogel, “Fast and scalable layer 4 switching”, Sigcomm 1998, pp. 203-214. • V. Srinivasan, G. Varghese, S. Suri, “Fast packet classification using tuple space search”, to be presented at Sigcomm 1999. • P. Gupta, N. McKeown, “Packet classification using hierarchical intelligent cuttings”, Hot Interconnects VII, 1999. • P. Gupta, N. McKeown, “Packet classification on multiple fields”, Sigcomm 1999. Copyright 1999. All Rights Reserved 72

Tutorial Outline • Introduction: What is a Packet Switch? • Packet Lookup and Classification: Where does a packet go next? • Switching Fabrics: How does the packet get there? • Output Scheduling: When should the packet leave? Copyright 1999. All Rights Reserved 73

Switching Fabrics • Output and Input Queueing • Output Queueing • Input Queueing – Scheduling algorithms – Combining input and output queues – Other non blocking fabrics – Multicast traffic Copyright 1999. All Rights Reserved 74

Basic Architectural Components 1. Datapath: per-packet processing Forwarding Table 2. Interconnect 3. Output Scheduling Forwarding Decision Forwarding Table Forwarding Decision Copyright 1999. All Rights Reserved 75

Interconnects Two basic techniques Input Queueing Usually a non-blocking switch fabric (e. g. crossbar) Copyright 1999. All Rights Reserved Output Queueing Usually a fast bus 76

Interconnects Output Queueing. Individual output queues: memory b/w = (N+1)R per queue. Centralized shared memory: memory b/w = 2NR. Copyright 1999. All Rights Reserved 77

Output Queueing The “ideal” 2 1 1 2 1 2 11 2 2 1 Copyright 1999. All Rights Reserved 78

Output Queueing How fast can we make centralized shared memory? 5 ns SRAM, 200-byte bus. • 5 ns per memory operation • Two memory operations per packet • Therefore, up to 160 Gb/s • In practice, closer to 80 Gb/s. Copyright 1999. All Rights Reserved 79
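The arithmetic behind the 160 Gb/s figure, written out (a sketch of the slide's back-of-envelope calculation, not a vendor spec): one 200-byte bus transfer every 5 ns gives the raw memory bandwidth, and since every packet is written once and read once, the usable switch bandwidth is half of that.

```python
access_time_ns = 5       # one SRAM operation
bus_bytes = 200          # bus width per operation

raw_bw_gbps = bus_bytes * 8 / access_time_ns   # bits per ns = Gb/s
switch_bw_gbps = raw_bw_gbps / 2               # write + read per packet
```

This yields 320 Gb/s raw and 160 Gb/s of switching capacity, matching the bullet above.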

Switching Fabrics • Output and Input Queueing • Output Queueing • Input Queueing – Scheduling algorithms – Other non blocking fabrics – Combining input and output queues – Multicast traffic Copyright 1999. All Rights Reserved 80

Interconnects Input Queueing with Crossbar Memory b/w = 2 R Data In Scheduler configuration Data Out Copyright 1999. All Rights Reserved 81

Input Queueing Head of Line Blocking. (Figure: delay vs. load; with FIFO input queues, delay diverges at 58.6% load, well short of 100%.) Copyright 1999. All Rights Reserved 82
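The 58.6% figure is the classical head-of-line blocking limit for large switches, 2 - sqrt(2) (Karol et al., 1987). It can be checked numerically with a toy saturated-input simulation; this Python sketch (invented function names, fixed seed) serves each contended output one randomly chosen head-of-line cell per slot:

```python
import math
import random

def hol_sim(n, slots, seed=1):
    # saturated inputs: every input always has a HOL cell with a
    # uniformly random destination output
    rng = random.Random(seed)
    hol = [rng.randrange(n) for _ in range(n)]   # HOL cell's output port
    served = 0
    for _ in range(slots):
        contenders = {}
        for i in range(n):
            contenders.setdefault(hol[i], []).append(i)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)          # one cell per output per slot
            hol[winner] = rng.randrange(n)       # next cell in the FIFO
            served += 1
    return served / (n * slots)

hol_limit = 2 - math.sqrt(2)                     # asymptotic limit, ~0.586
```

For a 32-port switch the simulated throughput lands close to the 0.586 asymptote.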

Head of Line Blocking Copyright 1999. All Rights Reserved 83

Copyright 1999. All Rights Reserved 84

Copyright 1999. All Rights Reserved 85

Input Queueing Virtual output queues Copyright 1999. All Rights Reserved 86

Input Queueing Virtual Output Queues. (Figure: delay vs. load; with virtual output queues, delay diverges only near 100% load.) Copyright 1999. All Rights Reserved 87

Input Queueing. Memory b/w = 2R. The scheduler can be quite complex! Copyright 1999. All Rights Reserved 88

Input Queueing Scheduling Copyright 1999. All Rights Reserved 89

Input Queueing Scheduling. (Figure: a request graph between inputs 1-4 and outputs 1-4, with queue occupancies as edge weights, and a bipartite matching of weight 18.) Question: maximum weight or maximum size? Copyright 1999. All Rights Reserved 90

Input Queueing Scheduling • Maximum Size – Maximizes instantaneous throughput – Does it maximize long-term throughput? • Maximum Weight – Can clear the most backlogged queues – But does it sacrifice long-term throughput? Copyright 1999. All Rights Reserved 91
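The tension between the two objectives shows up already on a 2x2 example. The brute-force sketch below (illustrative Python, not a practical scheduler) enumerates every input-to-output matching of a small weight matrix, where weight 0 means no request:

```python
from itertools import permutations

def best_matchings(w):
    # w[i][j]: queue length for VOQ (input i, output j); 0 = no request
    n = len(w)
    best_size, best_weight = 0, 0
    for perm in permutations(range(n)):        # output assigned per input
        edges = [(i, perm[i]) for i in range(n) if w[i][perm[i]] > 0]
        best_size = max(best_size, len(edges))
        best_weight = max(best_weight, sum(w[i][j] for i, j in edges))
    return best_size, best_weight
```

With w = [[10, 1], [2, 0]], the maximum size matching serves two cells (weight only 3), while the maximum weight matching serves just one cell from the long queue (weight 10): the two criteria pick different matchings.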

Input Queueing Scheduling Copyright 1999. All Rights Reserved 1 1 2 2 92

Input Queueing Longest Queue First or Oldest Cell First Weight 1 2 3 4 1 1 1 ={ Queue Length Waiting Time 1 10 10 1 2 3 4 Copyright 1999. All Rights Reserved } Maximum weight 100% 1 2 3 4 93

Input Queueing. Why is serving long/old queues better than serving the maximum number of queues? • When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput. • When traffic is non-uniform, some queues become longer than others. • A good algorithm keeps the queue lengths matched, and services a large number of queues. (Figure: average VOQ occupancy under uniform and non-uniform traffic.) Copyright 1999. All Rights Reserved 94

Input Queueing Practical Algorithms • Maximal Size Algorithms – Wave Front Arbiter (WFA) – Parallel Iterative Matching (PIM) – iSLIP • Maximal Weight Algorithms – Fair Access Round Robin (FARR) – Longest Port First (LPF) Copyright 1999. All Rights Reserved 95

Wave Front Arbiter Requests Match 1 1 2 2 3 3 4 4 Copyright 1999. All Rights Reserved 96

Wave Front Arbiter Requests Copyright 1999. All Rights Reserved Match 97

Wave Front Arbiter Implementation Copyright 1999. All Rights Reserved 1, 1 1, 2 1, 3 1, 4 2, 1 2, 2 2, 3 2, 4 3, 1 3, 2 3, 3 3, 4 4, 1 4, 2 4, 3 4, 4 Combinational Logic Blocks 98

Wave Front Arbiter Wrapped WFA (WWFA): N steps instead of 2N-1. (Figure: requests and the resulting match.) Copyright 1999. All Rights Reserved 99

Input Queueing Practical Algorithms • Maximal Size Algorithms – Wave Front Arbiter (WFA) – Parallel Iterative Matching (PIM) – iSLIP • Maximal Weight Algorithms – Fair Access Round Robin (FARR) – Longest Port First (LPF) Copyright 1999. All Rights Reserved 100

Parallel Iterative Matching Random selection. (Figure: iteration #1 - requests, random grants, random accepts; unmatched ports retry in iteration #2.) Copyright 1999. All Rights Reserved 101
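The three-phase iteration (request, random grant, random accept, repeated until no new matches form) can be sketched in a few lines of Python. This is an illustrative model with invented names, not the hardware algorithm:

```python
import random

def pim(requests, n_iter=3, seed=0):
    # requests: set of (input, output) pairs with queued cells
    rng = random.Random(seed)
    matched_in, matched_out, match = set(), set(), []
    for _ in range(n_iter):
        # grant: each unmatched output grants one unmatched requester at random
        asking = {}
        for inp, out in sorted(requests):
            if inp not in matched_in and out not in matched_out:
                asking.setdefault(out, []).append(inp)
        # accept: each input accepts one of the outputs that granted it, at random
        offers = {}
        for out, inps in asking.items():
            offers.setdefault(rng.choice(inps), []).append(out)
        for inp, outs in offers.items():
            out = rng.choice(outs)
            match.append((inp, out))
            matched_in.add(inp)
            matched_out.add(out)
    return sorted(match)
```

Matched pairs drop out of later iterations, so each iteration can only grow the match, converging to a maximal (though not necessarily maximum) matching.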

Parallel Iterative Matching Maximal is not Maximum 1 2 3 4 4 Requests Copyright 1999. All Rights Reserved Accept/Match 1 2 3 4 4 102

Parallel Iterative Matching Analytical Results. Number of iterations to converge: O(log N) on average. Copyright 1999. All Rights Reserved 103

Parallel Iterative Matching Copyright 1999. All Rights Reserved 104

Parallel Iterative Matching Copyright 1999. All Rights Reserved 105

Parallel Iterative Matching Copyright 1999. All Rights Reserved 106

Input Queueing Practical Algorithms • Maximal Size Algorithms – Wave Front Arbiter (WFA) – Parallel Iterative Matching (PIM) – iSLIP • Maximal Weight Algorithms – Fair Access Round Robin (FARR) – Longest Port First (LPF) Copyright 1999. All Rights Reserved 107

iSLIP Round-Robin Selection. (Figure: like PIM, but grant and accept use round-robin pointers instead of random selection; iteration #1 requests, grants, accepts, then iteration #2.) Copyright 1999. All Rights Reserved 108

iSLIP Properties • Random under low load • TDM under high load • Lowest priority to MRU • 1 iteration: fair to outputs • Converges in at most N iterations; on average <= log2 N • Implementation: N priority encoders • Up to 100% throughput for uniform traffic Copyright 1999. All Rights Reserved 109
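A sketch of a single-iteration iSLIP in Python makes the "TDM under high load" property visible. Pointers advance only when a grant is accepted, which desynchronizes the outputs (illustrative code, invented names; the real arbiter is N priority encoders in hardware):

```python
def islip(requests, n, grant_ptr, accept_ptr, iterations=1):
    # requests: set of (input, output); grant_ptr/accept_ptr: per-port
    # round-robin pointers, mutated in place across calls (time slots)
    match, matched_in, matched_out = {}, set(), set()
    for it in range(iterations):
        # grant: each free output picks the requesting input at or after
        # its pointer, round-robin
        grants = {}
        for out in range(n):
            if out in matched_out:
                continue
            for k in range(n):
                inp = (grant_ptr[out] + k) % n
                if (inp, out) in requests and inp not in matched_in:
                    grants.setdefault(inp, []).append(out)
                    break
        # accept: each input picks the granting output at or after its pointer
        for inp, outs in grants.items():
            for k in range(n):
                out = (accept_ptr[inp] + k) % n
                if out in outs:
                    match[inp] = out
                    matched_in.add(inp)
                    matched_out.add(out)
                    if it == 0:   # pointers move only on first-iteration matches
                        grant_ptr[out] = (inp + 1) % n
                        accept_ptr[inp] = (out + 1) % n
                    break
    return match
```

With a fully loaded 2x2 switch, the first slot matches only one pair, but from the second slot onward the desynchronized pointers yield a full (TDM-like) match every slot.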

i. SLIP Copyright 1999. All Rights Reserved 110

i. SLIP Copyright 1999. All Rights Reserved 111

i. SLIP Programmable Priority Encoder N N Implementation 1 Grant 1 Accept log 2 N 2 2 log 2 N Grant Accept State Decision N N Grant Copyright 1999. All Rights Reserved N Accept log 2 N 112

Input Queueing References • M. Karol et al., “Input vs Output Queueing on a Space Division Packet Switch”, IEEE Trans Comm., Dec 1987, pp. 1347-1356. • Y. Tamir, “Symmetric Crossbar arbiters for VLSI communication switches”, IEEE Trans Parallel and Dist Sys., Jan 1993, pp. 13-27. • T. Anderson et al., “High Speed Switch Scheduling for Local Area Networks”, ACM Trans Comp Sys., Nov 1993, pp. 319-352. • N. McKeown, “The iSLIP scheduling algorithm for Input Queued Switches”, IEEE Trans Networking, April 1999, pp. 188-201. • C. Lund et al., “Fair prioritized scheduling in an input buffered switch”, Proc. of IFIP-IEEE Conf., April 1996, pp. 358-369. • A. Mekkittikul et al., “A Practical Scheduling Algorithm to Achieve 100% Throughput in Input Queued Switches”, IEEE Infocom 98, April 1998. Copyright 1999. All Rights Reserved 113

Switching Fabrics • Output and Input Queueing • Output Queueing • Input Queueing – Scheduling algorithms – Other non blocking fabrics – Combining input and output queues – Multicast traffic Copyright 1999. All Rights Reserved 114

Other Non Blocking Fabrics Clos Network Copyright 1999. All Rights Reserved 115

Other Non-Blocking Fabrics Clos Network. Expansion factor required = 2 - 1/N (but still blocking for multicast). Copyright 1999. All Rights Reserved 116

Other Non Blocking Fabrics Self-Routing Networks 000 001 010 011 100 101 110 111 Copyright 1999. All Rights Reserved 117

Other Non-Blocking Fabrics Self-Routing Networks. The non-blocking Batcher-Banyan network. (Figure: a Batcher sorter followed by a self-routing banyan network delivering sorted cells to outputs 000-111.) • Fabric can be used as scheduler. • The Batcher-Banyan network is blocking for multicast. Copyright 1999. All Rights Reserved 118

Switching Fabrics • Output and Input Queueing • Output Queueing • Input Queueing – Scheduling algorithms – Other non blocking fabrics – Combining input and output queues – Multicast traffic Copyright 1999. All Rights Reserved 119

Speedup • Context – input queued switches – output queued switches – the speedup problem • Early approaches • Algorithms • Implementation considerations Copyright 1999. All Rights Reserved 120

Speedup: Context. (Figure: a generic switch with memory at the inputs and/or outputs.) The placement of memory gives: output queued switches, input queued switches, and combined input and output queued switches. Copyright 1999. All Rights Reserved 121

Output queued switches. Best delay and throughput performance; possible to erect “bandwidth firewalls” between sessions. Main problem: requires high fabric speedup (S = N), making it unsuitable for high speed switching. Copyright 1999. All Rights Reserved 122

Input queued switches. Big advantage: a speedup of one is sufficient. Main problem: can't guarantee delay due to input contention. Overcoming input contention: use higher speedup. Copyright 1999. All Rights Reserved 123

A Comparison. Memory speeds for a 32x32 switch (memory b/w and access time per cell). Line rate 100 Mb/s: output queued 3.3 Gb/s, 128 ns; input queued 200 Mb/s, 2.12 us. 1 Gb/s: OQ 33 Gb/s, 12.8 ns; IQ 2 Gb/s, 212 ns. 2.5 Gb/s: OQ 82.5 Gb/s, 5.12 ns; IQ 5 Gb/s, 84.8 ns. 10 Gb/s: OQ 330 Gb/s, 1.28 ns; IQ 20 Gb/s, 21.2 ns. Copyright 1999. All Rights Reserved 124
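The table's numbers follow from simple rules (assuming 53-byte ATM cells, which reproduces the figures): an output-queued memory absorbs N writes plus 1 read per cell time, an input-queued memory only 1 write plus 1 read. A hedged arithmetic sketch:

```python
CELL_BITS = 53 * 8                   # one ATM cell

def oq_mem_bw(line_rate, n=32):
    return (n + 1) * line_rate       # N writes + 1 read per cell time

def iq_mem_bw(line_rate):
    return 2 * line_rate             # 1 write + 1 read

def access_time_ns(mem_bw):
    return CELL_BITS / mem_bw * 1e9  # time budget per memory operation
```

At 100 Mb/s line rate this gives 3.3 Gb/s and roughly 128 ns for output queueing versus 200 Mb/s and 2.12 us for input queueing, matching the first row of the table.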

The Speedup Problem. Find a compromise, 1 < speedup << N, to get the performance of an OQ switch at close to the cost of an IQ switch. Essential for high speed QoS switching. Copyright 1999. All Rights Reserved 125

Some Early Approaches Probabilistic Analyses assume traffic models (Bernoulli, Markov modulated, non uniform loading, “friendly correlated”) obtain mean throughput and delays, bounds on tails analyze different fabrics (crossbar, multistage, etc) Numerical Methods use actual and simulated traffic traces run different algorithms set the “speedup dial” at various values Copyright 1999. All Rights Reserved 126

The findings Very tantalizing. . . under different settings (traffic, loading, algorithm, etc) and even for varying switch sizes A speedup of between 2 and 5 was sufficient! Copyright 1999. All Rights Reserved 127

Using Speedup 1 2 1 Copyright 1999. All Rights Reserved 128

Intuition. Bernoulli IID inputs, speedup = 1: fabric throughput = 0.58. Speedup = 2: fabric throughput = 1.16, input efficiency = 1/1.16, average input queue = 6.25. Copyright 1999. All Rights Reserved 129

Intuition (continued). Speedup = 3: fabric throughput = 1.74, input efficiency = 1/1.74, average input queue = 1.35. Speedup = 4: fabric throughput = 2.32, input efficiency = 1/2.32, average input queue = 0.75. (Bernoulli IID inputs throughout.) Copyright 1999. All Rights Reserved 130

Issues Need hard guarantees exact, not average Robustness realistic, even adversarial, traffic not friendly Bernoulli IID Copyright 1999. All Rights Reserved 131

The Ideal Solution. (Figure: inputs, a fabric with speedup << N, outputs, compared against speedup = N.) Question: can we find a simple and good algorithm that exactly mimics output queueing regardless of switch size and traffic pattern? Copyright 1999. All Rights Reserved 132

What is exact mimicking? Apply same inputs to an OQ and a CIOQ switch packet by packet Obtain same outputs packet by packet Copyright 1999. All Rights Reserved 133

Algorithm MUCF. Key concept: urgency value. urgency = departure time - present time. Copyright 1999. All Rights Reserved 134

MUCF The algorithm. Outputs try to get their most urgent packets. Inputs grant to the output whose packet is most urgent, ties broken by port number. Loser outputs try for their next most urgent packet. The algorithm terminates when no more matchings are possible. Copyright 1999. All Rights Reserved 135
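The matching pass just described can be sketched in Python (illustrative only; function names invented, and urgencies are given rather than derived from the emulated OQ schedule):

```python
def mucf_match(cells):
    # cells: dict (input, output) -> urgency (smaller = leaves sooner)
    match = {}
    free_in = {i for i, _ in cells}
    free_out = sorted({o for _, o in cells})
    progress = True
    while progress:                            # losers retry each round
        progress = False
        bids = {}
        for out in free_out:
            want = [(u, i) for (i, o), u in cells.items()
                    if o == out and i in free_in]
            if want:
                u, i = min(want)               # output's most urgent cell
                bids.setdefault(i, []).append((u, out))
        for i, offers in sorted(bids.items()):
            u, out = min(offers)               # grant the most urgent;
            if out in free_out:                # ties -> lowest port number
                match[i] = out
                free_in.discard(i)
                free_out.remove(out)
                progress = True
    return match
```

In the test below, output 0 first bids for input 0's cell, loses to output 1's more urgent bid, and then falls back to its next most urgent cell at input 1, which is exactly the "loser outputs try again" step.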

Stable Marriage Problem Men = Outputs Bill John Pedro Women = Inputs Hillary Copyright 1999. All Rights Reserved Monica Maria 136

An example. Observation: there are only two reasons a packet doesn't get to its output - input contention and output contention. This is why a speedup of 2 works!! Copyright 1999. All Rights Reserved 137

What does this get us? A speedup of 4 is sufficient for exact emulation of FIFO OQ switches with MUCF. What about non-FIFO OQ switches, e.g. WFQ, strict priority? Copyright 1999. All Rights Reserved 138

Other results. To exactly emulate an NxN OQ switch, a speedup of 2 - 1/N is necessary and sufficient (hence a speedup of 2 is sufficient for all N). Input traffic patterns can be absolutely arbitrary. The emulated OQ switch may use any “monotone” scheduling policy, e.g. FIFO, LIFO, strict priority, WFQ, etc. Copyright 1999. All Rights Reserved 139

What gives? Complexity of the algorithms Extra hardware for processing Extra run time (time complexity) What is the benefit? Reduced memory bandwidth requirements Tradeoff: Memory for processing Moore’s Law supports this tradeoff Copyright 1999. All Rights Reserved 140

Implementation a closer look Main sources of difficulty Estimating urgency, etc info is distributed (and communicating this info among I/ps and O/ps) Matching process too many iterations? Estimating urgency depends on what is being emulated Like taking a ticket to hold a place in a queue FIFO, Strict priorities no problem WFQ, etc problems Copyright 1999. All Rights Reserved 141

Implementation (contd). The matching process is a variant of the stable marriage problem. Worst case number of iterations for SMP = N^2; worst case number of iterations in switching = N; with high probability, and on average, approximately log(N). Copyright 1999. All Rights Reserved 142

Other Work Relax stringent requirement of exact emulation Least Occupied O/p First Algorithm (LOOFA) Keeps outputs always busy if there are packets By time stamping packets, it also exactly mimics Disallow arbitrary inputs E. g. leaky bucket constrained Obtain worst case delay bounds Copyright 1999. All Rights Reserved 143

References for speedup • Y. Oie et al., “Effect of speedup in nonblocking packet switch”, ICC '89. • A. L. Gupta, N. D. Georganas, “Analysis of a packet switch with input and output buffers and speed constraints”, Infocom '91. • S.-T. Chuang et al., “Matching output queueing with a combined input and output queued switch”, IEEE JSAC, vol 17, no 6, 1999. • B. Prabhakar, N. McKeown, “On the speedup required for combined input and output queued switching”, Automatica, vol 35, 1999. • P. Krishna et al., “On the speedup required for work-conserving crossbar switches”, IEEE JSAC, vol 17, no 6, 1999. • A. Charny, “Providing QoS guarantees in input buffered crossbar switches with speedup”, PhD Thesis, MIT, 1998. Copyright 1999. All Rights Reserved 144

Switching Fabrics • Output and Input Queueing • Output Queueing • Input Queueing – Scheduling algorithms – Other non blocking fabrics – Combining input and output queues – Multicast traffic Copyright 1999. All Rights Reserved 145

Multicast Switching • The problem • Switching with crossbar fabrics • Switching with other fabrics Copyright 1999. All Rights Reserved 146

Multicasting 2 1 Copyright 1999. All Rights Reserved 3 5 4 6 147

Crossbar fabrics: Method 1 Copy network + unicast switching Copy networks Increased hardware, increased input contention Copyright 1999. All Rights Reserved 148

Method 2. Use the copying properties of the crossbar fabric itself. No fanout splitting: easy, but low throughput. Fanout splitting: higher throughput, but not as simple - it leaves “residue”. Copyright 1999. All Rights Reserved 149

The effect of fanout splitting Performance of an 8 x 8 switch with and without fanout splitting under uniform IID traffic Copyright 1999. All Rights Reserved 150

Placement of residue Key question: How should outputs grant requests? (and hence decide placement of residue)

Residue and throughput Result: Concentrating residue brings more new work forward, and hence leads to higher throughput. But there are fairness problems to deal with. This and other problems can be studied in a unified way by mapping the multicasting problem onto a variation of Tetris.

Multicasting and Tetris [figure: Tetris-like diagram of input ports 1–5 against output ports 1–5, with the residue scattered across many outputs]

Multicasting and Tetris [figure: the same diagram with the residue concentrated on as few outputs as possible]

Replication by recycling Main idea: Make two copies at a time using a binary tree with the input at the root and all possible destination outputs at the leaves. [figure: binary copy tree with internal recycling nodes x, y and destination leaves a–e]
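A hedged sketch of the binary-copy process (the function `recycle` and the list representation are illustrative assumptions): on each pass through the fabric, every copy that still carries more than one destination is split into two copies, each carrying half of the remaining leaves, until every copy is a unicast cell. A fanout of F therefore needs ⌈log₂ F⌉ passes.

```python
def recycle(cell_outputs):
    """Split a multicast cell into unicast copies, two copies per pass.

    Returns (number of passes, final list of single-destination copies)."""
    in_flight = [sorted(cell_outputs)]
    passes = 0
    while any(len(c) > 1 for c in in_flight):
        passes += 1
        nxt = []
        for c in in_flight:
            if len(c) == 1:
                nxt.append(c)                    # already unicast; kept to show the result
            else:
                mid = len(c) // 2
                nxt.extend([c[:mid], c[mid:]])   # make two copies, recycle both
        in_flight = nxt
    return passes, in_flight

# A fanout of 5, as in the slide's tree, resolves in 3 passes.
print(recycle({1, 2, 3, 4, 5}))
```

The variable number of passes is exactly why the real scheme needs resequencing at the outputs, as the next slide notes.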

Replication by recycling (cont’d) [figure: datapath — receive, output table lookup, network, recycle path, resequencer, transmit] Scalable to large fanouts, but needs resequencing at the outputs and introduces variable delays.

References for Multicasting • J. Hayes et al., “Performance analysis of a multicast switch”, IEEE Trans. on Communications, vol 39, April 1991. • B. Prabhakar et al., “Tetris models for multicast switches”, Proc. of the 30th Annual Conference on Information Sciences and Systems, 1996. • B. Prabhakar et al., “Multicast scheduling for input queued switches”, IEEE JSAC, 1997. • J. Turner, “An optimal nonblocking multicast virtual circuit switch”, INFOCOM, 1994.

Tutorial Outline • Introduction: What is a Packet Switch? • Packet Lookup and Classification: Where does a packet go next? • Switching Fabrics: How does the packet get there? • Output Scheduling: When should the packet leave?

Output Scheduling • What is output scheduling? • How is it done? • Practical Considerations

Output Scheduling • Allocating output bandwidth • Controlling packet delay [figure: per-flow queues feeding a scheduler at the output link]

Output Scheduling [figure: a single FIFO queue contrasted with per-flow Fair Queueing at the output]

Motivation • FIFO is natural but gives poor QoS – bursty flows increase delays for others – hence it cannot guarantee delays • Need round-robin scheduling of packets – Fair Queueing – Weighted Fair Queueing, Generalized Processor Sharing

Fair queueing: Main issues • Level of granularity – packet by packet? (favors long packets) – bit by bit? (ideal, but very complicated) • Packet Generalized Processor Sharing (PGPS) – serves packet by packet – and imitates the bit-by-bit schedule within a tolerance

How does WFQ work? [figure: three flows sharing one output link with weights WR = 1, WG = 5, WP = 2]
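A hedged sketch of the finish-tag computation behind WFQ for a case like the figure's three weighted flows (the helper `wfq_order` and the traffic are illustrative assumptions; real WFQ also maintains a virtual clock to handle packets arriving at an idle system). With all packets backlogged from time 0, each packet's finish tag advances by length/weight within its flow, and packets are served in increasing tag order:

```python
from collections import Counter

def wfq_order(flows):
    """flows: dict name -> (weight, [packet lengths]), all backlogged at t=0.
    Returns the WFQ service order as (flow name, packet index) pairs."""
    tagged = []
    for name, (weight, pkts) in flows.items():
        finish = 0.0
        for seq, length in enumerate(pkts):
            finish += length / weight   # finish tag grows by length/weight
            tagged.append((finish, name, seq))
    tagged.sort()                       # serve smallest finish tag first
    return [(name, seq) for _, name, seq in tagged]

# Weights from the slide: R=1, G=5, P=2; equal 100-byte packets.
flows = {"R": (1, [100] * 2), "G": (5, [100] * 10), "P": (2, [100] * 4)}
order = wfq_order(flows)
# Over the first 8 service slots, the flows are served in proportion 5:2:1.
print(Counter(name for name, _ in order[:8]))
```

The point of the example is that while all three flows are backlogged, each flow's share of the link tracks its weight, which is exactly the GPS behavior WFQ approximates.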

Delay guarantees • Theorem: If flows are leaky-bucket constrained and all nodes employ GPS (WFQ), then the network can guarantee worst-case delay bounds to sessions.
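The single-node form of the Parekh–Gallager bound behind this theorem can be sketched as follows (stated from memory of the cited papers; here \(\sigma_i, \rho_i\) are session \(i\)'s leaky-bucket parameters, \(\phi_i\) its GPS weight, and \(r\) the link rate). If the session's guaranteed rate is at least its token rate, its worst-case delay is bounded by its burst size over that rate:

```latex
g_i = \frac{\phi_i}{\sum_j \phi_j}\, r \;\ge\; \rho_i
\quad\Longrightarrow\quad
D_i^{*} \;\le\; \frac{\sigma_i}{g_i}
```

For the packetized version (PGPS/WFQ), an additional term of at most one maximum-length packet's transmission time per node is incurred relative to ideal GPS.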

Practical considerations • For every packet, the scheduler needs to – classify it into the right flow queue and maintain a linked list for each flow – schedule it for departure • Complexities of both are O(log [# of flows]) – the first is hard to overcome – the second can be overcome by DRR

Deficit Round Robin [figure: per-flow queues holding variable-size packets, served round-robin with a quantum size of 500] • Good approximation of FQ • Much simpler to implement
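A hedged Python sketch of the DRR round (the function name and example queues are illustrative; the quantum of 500 matches the slide's figure): each non-empty queue earns one quantum of byte credit per round and sends head-of-line packets for as long as its accumulated deficit covers them, so bandwidth evens out across flows regardless of packet size.

```python
from collections import deque

def drr(queues, quantum):
    """queues: list of deques of packet sizes.
    Returns the service order as (queue index, packet size) pairs."""
    deficit = [0] * len(queues)
    order = []
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                deficit[i] = 0        # idle queues accumulate no credit
                continue
            deficit[i] += quantum     # one quantum of credit per round
            while q and q[0] <= deficit[i]:
                pkt = q.popleft()     # send packets the deficit can cover
                deficit[i] -= pkt
                order.append((i, pkt))
    return order

# Three flows with variable-size packets, quantum = 500 bytes.
qs = [deque([200, 700]), deque([750, 100]), deque([500, 500])]
print(drr(qs, 500))
```

Note how the 750-byte packet waits one round until its queue has banked enough deficit, which is how DRR avoids the long-packet bias of plain round robin in O(1) work per packet.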

But… • WFQ is still very hard to implement – classification is a problem – it needs to maintain too much state information – it doesn’t scale well

Strict Priorities and DiffServ • Classify flows into priority classes – maintain only per-class queues – perform FIFO within each class – avoid the “curse of dimensionality”

DiffServ • A framework for providing differentiated QoS – set Type of Service (ToS) bits in packet headers – this classifies packets into classes – routers maintain per-class queues – condition traffic at network edges to conform to class requirements • May still need queue management inside the network

References for O/p Scheduling • A. Demers et al., “Analysis and simulation of a fair queueing algorithm”, ACM SIGCOMM, 1989. • A. Parekh, R. Gallager, “A generalized processor sharing approach to flow control in integrated services networks: the single node case”, IEEE/ACM Trans. on Networking, June 1993. • A. Parekh, R. Gallager, “A generalized processor sharing approach to flow control in integrated services networks: the multiple node case”, IEEE/ACM Trans. on Networking, August 1993. • M. Shreedhar, G. Varghese, “Efficient Fair Queueing using Deficit Round Robin”, ACM SIGCOMM, 1995. • K. Nichols, S. Blake (eds), “Differentiated Services: Operational Model and Definitions”, Internet Draft, 1998.

Active Queue Management • Problems with traditional queue management – tail drop • Active Queue Management – goals – an example – effectiveness

Tail Drop Queue Management [figure: a queue at its maximum length exhibiting lock-out, with arriving packets dropped at the tail]

Tail Drop Queue Management • Drop packets only when the queue is full – long steady-state delay – global synchronization – bias against bursty traffic

Global Synchronization [figure: the queue repeatedly filling to its maximum length as many flows back off and ramp up in lockstep]

Bias Against Bursty Traffic [figure: a burst arriving at a queue near its maximum length loses many consecutive packets]

Alternative Queue Management Schemes • Drop from front on full queue • Drop at random on full queue – both solve the lock-out problem – but both still have the full-queue problem

Active Queue Management Goals • Solve the lock-out and full-queue problems – no lock-out behavior – no global synchronization – no bias against bursty flows • Provide better QoS at a router – low steady-state delay – less packet dropping

Active Queue Management • Problems with traditional queue management – tail drop • Active Queue Management – goals – an example – effectiveness

Random Early Detection (RED) [figure: drop probability as a function of the average queue length qavg, with thresholds minth and maxth] • if qavg < minth: admit every packet • else if qavg <= maxth: drop an incoming packet with p = (qavg - minth)/(maxth - minth) • else (qavg > maxth): drop every incoming packet
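The slide's drop rule can be sketched directly in Python (a hedged illustration: the slide's linear ramp reaches probability 1 at maxth, whereas the published RED algorithm scales the ramp by a max_p parameter and also spaces out drops):

```python
import random

def red_drop(qavg, minth, maxth, rng=random.random):
    """Return True if the incoming packet should be dropped, per the slide."""
    if qavg < minth:
        return False                          # admit every packet
    if qavg > maxth:
        return True                           # drop every packet
    p = (qavg - minth) / (maxth - minth)      # linear ramp between thresholds
    return rng() < p

# qavg is an EWMA of the instantaneous queue length, e.g.
#   qavg = (1 - w) * qavg + w * q
# so short bursts pass through, as the next slides discuss.
```

Using the average rather than the instantaneous queue length is what lets RED absorb bursts, and the randomness is what desynchronizes the flows.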

Effectiveness of RED: Lock Out • Packets are randomly dropped • Each flow has the same probability of being discarded

Effectiveness of RED: Full Queue • Drop packets probabilistically in anticipation of congestion (not when queue is full) • Use qavg to decide packet dropping probability: allow instantaneous bursts • Randomness avoids global synchronization

What QoS does RED Provide? • Lower buffer delay: good interactive service – qavg is controlled to be small • Given responsive flows: packet dropping is reduced – early congestion indication allows traffic to throttle back before congestion • Given responsive flows: fair bandwidth allocation

Unresponsive or aggressive flows • Don’t properly back off during congestion • Take away bandwidth from TCP compatible flows • Monopolize buffer space

Control Unresponsive Flows • Some active queue management schemes – RED with penalty box – Flow RED (FRED) – Stabilized RED (SRED) • These identify and penalize unresponsive flows with a bit of extra work

Active Queue Management References • B. Braden et al., “Recommendations on queue management and congestion avoidance in the Internet”, RFC 2309, 1998. • S. Floyd, V. Jacobson, “Random early detection gateways for congestion avoidance”, IEEE/ACM Trans. on Networking, 1(4), Aug. 1993. • D. Lin, R. Morris, “Dynamics of random early detection”, ACM SIGCOMM, 1997. • T. Ott et al., “SRED: Stabilized RED”, INFOCOM, 1999. • S. Floyd, K. Fall, “Router mechanisms to support end-to-end congestion control”, LBL technical report, 1997.

Tutorial Outline • Introduction: What is a Packet Switch? • Packet Lookup and Classification: Where does a packet go next? • Switching Fabrics: How does the packet get there? • Output Scheduling: When should the packet leave?

Basic Architectural Components Admission Control Policing Congestion Control Routing Switching Reservation Output Scheduling Control Datapath: per packet processing

Basic Architectural Components 1. Datapath: per-packet processing Forwarding Table 2. Interconnect 3. Output Scheduling Forwarding Decision Forwarding Table Forwarding Decision