High Performance Switches and Routers Theory and Practice
High Performance Switches and Routers: Theory and Practice
Sigcomm 99, August 30, 1999, Harvard University
Nick McKeown (nickm@stanford.edu) and Balaji Prabhakar (balaji@isl.stanford.edu)
Departments of Electrical Engineering and Computer Science
Copyright 1999. All Rights Reserved

Tutorial Outline
• Introduction: What is a Packet Switch?
• Packet Lookup and Classification: Where does a packet go next?
• Switching Fabrics: How does the packet get there?
• Output Scheduling: When should the packet leave?

Introduction: What is a Packet Switch?
• Basic Architectural Components
• Some Example Packet Switches
• The Evolution of IP Routers

Basic Architectural Components
[Figure: the components are admission control, policing, congestion control, routing, switching, reservation, and output scheduling, divided between control and the datapath (per-packet processing).]

Basic Architectural Components
Datapath: per-packet processing
1. Forwarding Decision (uses the forwarding table)
2. Interconnect
3. Output Scheduling

Where High Performance Packet Switches Are Used
[Figure: carrier class core routers, ATM switches, and frame relay switches in the Internet core; edge routers; enterprise WAN access and enterprise campus switches.]

ATM Switch
• Lookup cell VCI/VPI in VC table.
• Replace old VCI/VPI with new.
• Forward cell to outgoing interface.
• Transmit cell onto link.

Ethernet Switch
• Lookup frame DA in forwarding table.
  – If known, forward to correct port.
  – If unknown, broadcast to all ports.
• Learn SA of incoming frame.
• Forward frame to outgoing interface.
• Transmit frame onto link.

IP Router
• Lookup packet DA in forwarding table.
  – If known, forward to correct port.
  – If unknown, drop packet.
• Decrement TTL, update header checksum.
• Forward packet to outgoing interface.
• Transmit packet onto link.

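The "decrement TTL, update header checksum" step can be sketched as follows. This is a minimal illustration, assuming a 20-byte IPv4 header held in a `bytearray` and using the incremental update form from RFC 1624 rather than a full recomputation; the function names are hypothetical and not taken from the slides.

```python
def _fold(total: int) -> int:
    """Fold carries back in to get a 16-bit one's-complement sum."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """Checksum over all 16-bit words; zero the checksum field first to compute it."""
    words = (int.from_bytes(header[i:i + 2], "big") for i in range(0, len(header), 2))
    return ~_fold(sum(words)) & 0xFFFF


def decrement_ttl(header: bytearray) -> None:
    """Decrement TTL (byte 8) and patch the checksum (bytes 10-11) incrementally."""
    if header[8] == 0:
        raise ValueError("TTL already zero; a router would drop this packet")
    old_word = int.from_bytes(header[8:10], "big")   # TTL and protocol share a word
    header[8] -= 1
    new_word = int.from_bytes(header[8:10], "big")
    old_ck = int.from_bytes(header[10:12], "big")
    # RFC 1624: HC' = ~(~HC + ~m + m')
    new_ck = ~_fold((~old_ck & 0xFFFF) + (~old_word & 0xFFFF) + new_word) & 0xFFFF
    header[10:12] = new_ck.to_bytes(2, "big")
```

The incremental form touches only two header words, which is why it suits per-packet processing in the datapath.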
First Generation IP Routers
[Figure: a CPU with buffer memory and several line interfaces (DMA, MAC) attached to a shared backplane.]

Second Generation IP Routers
[Figure: a CPU with buffer memory, and line cards that each have DMA, local buffer memory, and a MAC.]

Third Generation Switches/Routers
[Figure: a switched backplane connecting line cards (local buffer memory, MAC) and a CPU card.]

Fourth Generation Switches/Routers: Clustering and Multistage
[Figure: ports 1 to 32 grouped into clusters and interconnected through a multistage fabric.]

Packet Switches: References
• J. Giacopelli, M. Littlewood, W. D. Sincoskie, “Sunshine: A high performance self-routing broadband packet switch architecture”, ISS ’90.
• J. S. Turner, “Design of a broadcast packet switching network”, IEEE Trans. Comm., June 1988, pp. 734-743.
• C. Partridge et al., “A Fifty Gigabit per second IP Router”, IEEE Trans. Networking, 1998.
• N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, “The Tiny Tera: A Packet Switch Core”, IEEE Micro Magazine, Jan-Feb 1997.

Forwarding Decisions
• ATM and MPLS switches
  – Direct Lookup
• Bridges and Ethernet switches
  – Associative Lookup
  – Hashing
  – Trees and tries
• IP Routers
  – Caching
  – CIDR
  – Patricia trees/tries
  – Other methods
• Packet Classification

ATM and MPLS Switches: Direct Lookup
[Figure: the VCI is used directly as the memory address; the data read out is the (Port, VCI) pair.]

Bridges and Ethernet Switches: Associative Lookups
[Figure: associative memory or CAM. The 48-bit network address is the search data; the outputs are a hit signal and the log2 N-bit address of the associated data.]
Advantages:
• Simple
Disadvantages:
• Slow
• High power
• Small
• Expensive

Bridges and Ethernet Switches: Hashing
[Figure: a hashing function reduces the 48-bit address to a 16-bit memory address; the memory holds the search data and associated data, and returns a hit signal and a log2 N-bit address.]

Lookups Using Hashing: An example
[Figure: a CRC-16 hashing function maps the 48-bit search data to a 16-bit memory index; entries that hash to the same index (#1, #2, #3, #4) are kept in linked lists.]

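The hashing example above can be sketched as a table of buckets with chaining. This is an illustrative sketch only: the slide says CRC-16 without naming a polynomial, so `binascii.crc_hqx` (the CRC-CCITT variant in the Python standard library) is used here as an assumption, Python lists stand in for the linked lists, and the class name is hypothetical.

```python
import binascii


class HashedMacTable:
    """48-bit address lookup: a 16-bit CRC selects a bucket, a chain resolves collisions."""

    def __init__(self, index_bits: int = 16):
        self.mask = (1 << index_bits) - 1
        self.buckets = [[] for _ in range(1 << index_bits)]

    def _index(self, mac: bytes) -> int:
        # Assumed hash: CRC-CCITT over the 6 address bytes, truncated to index_bits.
        return binascii.crc_hqx(mac, 0) & self.mask

    def learn(self, mac: bytes, port: int) -> None:
        bucket = self.buckets[self._index(mac)]
        for k, (stored, _) in enumerate(bucket):
            if stored == mac:            # address already known: update its port
                bucket[k] = (mac, port)
                return
        bucket.append((mac, port))       # new address: append to the chain

    def lookup(self, mac: bytes):
        # The chain walk is what makes the lookup time non-deterministic.
        for stored, port in self.buckets[self._index(mac)]:
            if stored == mac:
                return port
        return None                      # miss: an Ethernet switch would flood
```

Shrinking `index_bits` makes chains longer, which shows the trade between memory use and expected lookup time.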
Lookups Using Hashing: Performance of simple example
[Figure]

Lookups Using Hashing
Advantages:
• Simple
• Expected lookup time can be small
Disadvantages:
• Non-deterministic lookup time
• Inefficient use of memory

Trees and Tries
[Figure: a binary search tree over N entries, with depth log2 N, beside a binary search trie that branches on 0 or 1 at each level (example entries 010 and 111).]

Trees and Tries: Multiway tries
[Figure: a 16-ary search trie; each node holds entries from (0000, ptr) to (1111, ptr), with example keys 000011110000 and 111111.]

Trees and Tries: Multiway tries
[Table: produced from 2^15 randomly generated 48-bit addresses.]

Caching Addresses
[Figure: the slow path goes through the CPU and its buffer memory; the fast path stays on the line cards (DMA, local buffer memory, MAC).]

Caching Addresses
• LAN: average flow < 40 packets
• WAN: huge number of flows
[Figure: cache hit rate, with cache = 10% of full table.]

IP Routers: Class-based addresses
[Figure: the IP address space divided into Class A, Class B, Class C, and D. The address 212.17.9.4 falls in Class C. Routing table: exact match, 212.17.9.0 → Port 4.]

IP Routers: CIDR
[Figure: the address line from 0 to 2^32 − 1. Class based: regions A, B, C, D. Classless: prefixes 65/8, 128.9/16, and 142.12/19 drawn as ranges, with the address 128.9.16.14 inside 128.9/16.]

IP Routers: CIDR
[Figure: inside 128.9/16 lie 128.9.16/20 and 128.9.176/20; inside 128.9.16/20 lie 128.9.19/24 and 128.9.25/24. The address 128.9.16.14 is marked on the line from 0 to 2^32 − 1.]
Most specific route = “longest matching prefix”

IP Routers: Metrics for Lookups
Lookup address: 128.9.16.14

Prefix          Port
65/8            3
128.9/16        5
128.9.16/20     2
128.9.19/24     7
128.9.25/24     10
128.9.176/20    1
142.12/19       3

• Lookup time
• Storage space
• Update time
• Preprocessing time

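A linear-scan longest prefix match over the example table can be sketched as below. It is meant only to pin down what "most specific route" means, not as a fast lookup scheme; the prefixes are written out in full dotted form (for example 65/8 as 65.0.0.0/8), and the function name is an assumption.

```python
import ipaddress

# Example prefixes and ports from the Metrics for Lookups slide.
TABLE = [
    ("65.0.0.0/8", 3),
    ("128.9.0.0/16", 5),
    ("128.9.16.0/20", 2),
    ("128.9.19.0/24", 7),
    ("128.9.25.0/24", 10),
    ("128.9.176.0/20", 1),
    ("142.12.0.0/19", 3),
]
ROUTES = [(ipaddress.ip_network(prefix), port) for prefix, port in TABLE]


def longest_prefix_match(addr: str):
    """Return (prefix, port) of the most specific matching route, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, port in ROUTES:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return best
```

For 128.9.16.14 both 128.9/16 and 128.9.16/20 match, and the /20 wins, so the packet leaves on port 2. The scan is O(N) per lookup, which is what the schemes on the following slides try to avoid.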
IP Router Lookup
[Figure: the forwarding engine takes the destination address from the header of the incoming packet; the next hop computation consults the forwarding table (destination → next hop) and returns the next hop.]
IPv4 unicast destination address based lookup

Need More Than IPv4 Unicast Lookups
• Multicast
  – PIM-SM: longest prefix matching on the source and group address; try (S, G) followed by (*, *, RP); check incoming interface
  – DVMRP: incoming interface check followed by (S, G) lookup
• IPv6
  – 128-bit destination address field
  – Exact address architecture not yet known

Lookup Performance Required

Line    Rate        Pkt size = 40 B    Pkt size = 240 B
T1      1.5 Mbps    4.68 Kpps          0.78 Kpps
OC3     155 Mbps    480 Kpps
OC12    622 Mbps    1.94 Mpps          323 Kpps
OC48    2.5 Gbps    7.81 Mpps          1.3 Mpps
OC192   10 Gbps     31.25 Mpps         5.21 Mpps

Gigabit Ethernet (84 B packets): 1.49 Mpps

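The table entries follow from dividing the line rate by the packet size in bits. A small sketch of that arithmetic (the function name is an assumption; framing overheads are ignored except where the slide folds them into the packet size, as with the 84 B Gigabit Ethernet figure):

```python
def packets_per_second(line_rate_bps: float, packet_bytes: int) -> float:
    """Lookups per second needed to keep up with back-to-back packets."""
    return line_rate_bps / (packet_bytes * 8)


oc48_40b = packets_per_second(2.5e9, 40)     # 7.8125e6, the 7.81 Mpps entry
oc192_40b = packets_per_second(10e9, 40)     # 3.125e7, the 31.25 Mpps entry
gige_84b = packets_per_second(1e9, 84)       # about 1.49e6
```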
Size of the Routing Table
[Figure]
Source: http://www.telstra.net/ops/bgptable.html

Ternary CAMs
[Figure: an associative memory whose rows R1 to R4 each store a value and a mask (values such as 10.1.1.0 and 10.1.3.1, masks such as 255.0.0.0); a priority encoder selects among the matching rows to give the next hop.]

Binary Tries
[Figure: binary trie branching on 0 and 1, with nodes a to j.]
Example prefixes:
a) 00001
b) 00010
c) 00011
d) 001
e) 0101
f) 011
g) 100
h) 1010
i) 1100
j) 11110000

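A binary trie over the example prefixes can be sketched as follows. The lookup remembers the last labelled node it passed, so it returns the longest matching prefix without backtracking; class and function names are assumptions for illustration.

```python
class TrieNode:
    __slots__ = ("children", "label")

    def __init__(self):
        self.children = [None, None]   # child for bit 0, child for bit 1
        self.label = None              # set only where a prefix ends


def insert(root: TrieNode, prefix: str, label: str) -> None:
    node = root
    for bit in prefix:
        idx = int(bit)
        if node.children[idx] is None:
            node.children[idx] = TrieNode()
        node = node.children[idx]
    node.label = label


def lookup(root: TrieNode, bits: str):
    """Walk one bit per step; return the label of the longest matching prefix."""
    node, best = root, None
    for bit in bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.label is not None:
            best = node.label
    return best


PREFIXES = {"a": "00001", "b": "00010", "c": "00011", "d": "001", "e": "0101",
            "f": "011", "g": "100", "h": "1010", "i": "1100", "j": "11110000"}
ROOT = TrieNode()
for name, prefix in PREFIXES.items():
    insert(ROOT, prefix, name)
```

Each step is one memory access, so the worst case is one access per address bit, which motivates the path compression and multi-bit strides on the later slides.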
Patricia Tree
[Figure: the binary trie for the same example prefixes with one-way branches compressed; the path to j is a single edge with Skip=5.]
Example prefixes:
a) 00001
b) 00010
c) 00011
d) 001
e) 0101
f) 011
g) 100
h) 1010
i) 1100
j) 11110000

Patricia Tree
Disadvantages:
• Many memory accesses
• May need backtracking
• Pointers take up a lot of space
Advantages:
• General solution
• Extensible to wider fields
Avoid backtracking by storing the intermediate best matched prefix (Dynamic Prefix Tries).
40K entries: 2 MB data structure with 0.3-0.5 Mpps [O(W)]

Binary Search on Trie Levels
[Figure: a trie with Level 0, Level 8, and Level 29 marked, and a prefix P.]

Binary Search on Trie Levels
Store a hash table for each prefix length to aid search at a particular trie level.

Length   Hash
8        10
12
16       10.1, 10.2
24       10.1.1, 10.1.2, 10.2.3

Example prefixes: 10.0/8, 10.1.0.0/16, 10.1.1.0/24, 10.1.2.0/24, 10.2.3.0/24
Example addresses: 10.1.1.4, 10.4.4.3, 10.2.3.9, 10.2.4.8

Binary Search on Trie Levels
Disadvantages:
• Multiple hashed memory accesses
• Updates are complex
Advantages:
• Scalable to IPv6
33K entries: 1.4 MB data structure with 1.2-2.2 Mpps [O(log W)]

Compacting Forwarding Tables
[Figure: a trie level represented as a bit vector.]

Compacting Forwarding Tables
[Figure: the bit vector is split into 8-bit masks 10001010, 11100010, 10000010, 10110100, 11000000. Codeword array, indices 0 to 4: (R1, 0), (R2, 3), (R3, 7), (R4, 9), (R5, 0). Base index array, indices 0 and 1: 0, 13.]

Compacting Forwarding Tables
Disadvantages:
• Scalability to larger tables?
• Updates are complex
Advantages:
• Extremely small data structure that can fit in cache
33K entries: 160 KB data structure with average 2 Mpps [O(W/k)]

Multi-bit Tries
[Figure: a 16-ary search trie; each node holds entries from (0000, ptr) to (1111, ptr), with example keys 000011110000 and 111111.]

Compressed Tries
[Figure: trie compressed to levels L8, L16, and L24.]
Only 3 memory accesses

Routing Lookups in Hardware
[Figure: number of prefixes versus prefix length.]
Most prefixes are 24 bits or shorter

Routing Lookups in Hardware
Prefixes up to 24 bits
[Figure: the top 24 bits of the address (142.19.6 of 142.19.6.14) index a table of 2^24 = 16M entries, which returns the next hop.]

Routing Lookups in Hardware
Prefixes up to 24 bits and prefixes above 24 bits
[Figure: for 128.3.72.44, the first-table entry for 128.3.72 is flagged as a pointer and holds the base of a block in a second table; the low 8 bits of the address (44) are the offset of the next hop within that block. Unflagged entries hold the next hop directly.]

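The two-table scheme on these slides can be sketched as follows. This is a simplified software model under stated assumptions: a dictionary stands in for the 2^24-entry first memory so the example stays small, routes are loaded once in order of increasing prefix length, and the class and method names are hypothetical.

```python
import ipaddress


class TwoLevelTable:
    """First table indexed by the top 24 bits; second table for longer prefixes."""

    def __init__(self):
        self.tbl24 = {}      # 24-bit index -> (is_pointer, next hop or block number)
        self.tbl_long = []   # blocks of 256 next hops, indexed by the low 8 bits

    def build(self, routes):
        # Shorter prefixes first, so longer ones overwrite them where they overlap.
        ordered = sorted(routes, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
        for prefix, hop in ordered:
            net = ipaddress.ip_network(prefix)
            first = int(net.network_address)
            if net.prefixlen <= 24:
                span = 1 << (24 - net.prefixlen)
                for idx in range(first >> 8, (first >> 8) + span):
                    self.tbl24[idx] = (False, hop)
            else:
                idx = first >> 8
                is_ptr, value = self.tbl24.get(idx, (False, None))
                if not is_ptr:
                    # Allocate a block pre-filled with the covering shorter route.
                    self.tbl_long.append([value] * 256)
                    value = len(self.tbl_long) - 1
                    self.tbl24[idx] = (True, value)
                low = first & 0xFF
                for off in range(low, low + (1 << (32 - net.prefixlen))):
                    self.tbl_long[value][off] = hop

    def lookup(self, addr: str):
        ip = int(ipaddress.ip_address(addr))
        is_ptr, value = self.tbl24.get(ip >> 8, (False, None))
        if is_ptr:
            return self.tbl_long[value][ip & 0xFF]   # second memory access
        return value                                 # resolved in one access
```

A lookup costs one memory access for prefixes up to 24 bits and two otherwise, at the price of expanding every short prefix into many first-table entries.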
Routing Lookups in Hardware
[Figure: generalization of the scheme. A first table of 2^N entries handles prefixes up to N bits; further tables handle longer prefixes, including prefixes longer than N+M bits, and return the next hop.]

Routing Lookups in Hardware
Disadvantages:
• Large memory required (9-33 MB)
• Depends on prefix length distribution
Advantages:
• 20 Mpps with 50 ns DRAM
• Easy to implement in hardware
Various compression schemes can be employed to decrease the storage requirements, e.g. carefully chosen variable-length strides, bitmap compression, etc.

IP Router Lookups: References
• A. Brodnik, S. Carlsson, M. Degermark, S. Pink, “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp. 3-14.
• B. Lampson, V. Srinivasan, G. Varghese, “IP lookups using multiway and multicolumn search”, Infocom 1998, pp. 1248-56, vol. 3.
• M. Waldvogel, G. Varghese, J. Turner, B. Plattner, “Scalable high speed IP routing lookups”, Sigcomm 1997, pp. 25-36.
• P. Gupta, S. Lin, N. McKeown, “Routing lookups in hardware at memory access speeds”, Infocom 1998, pp. 1241-1248, vol. 3.
• S. Nilsson, G. Karlsson, “Fast address lookup for Internet routers”, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.
• V. Srinivasan, G. Varghese, “Fast IP lookups using controlled prefix expansion”, Sigmetrics, June 1998.

Providing Value-Added Services: Some examples
• Differentiated services
  – Regard traffic from Autonomous System #33 as “platinum grade”
• Access Control Lists
  – Deny udp host 194.72.33 194.72.6.64 0.0.0.15 eq snmp
• Committed Access Rate
  – Rate limit WWW traffic from sub-interface #739 to 10 Mbps
• Policy-based Routing
  – Route all voice traffic through the ATM network

Packet Classification
[Figure: the forwarding engine passes the header of the incoming packet to packet classification, which consults the classifier (policy database) of predicate and action pairs and returns an action.]

Multi-field Packet Classification
Given a classifier with N rules, find the action associated with the highest priority rule matching an incoming packet.

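The problem statement can be made concrete with a linear search over an ordered rule list. This is a baseline sketch only, not one of the proposed schemes; the rule fields, the example rules, and the convention that earlier rules have higher priority are all assumptions chosen for illustration.

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    src: str                # source prefix
    dst: str                # destination prefix
    proto: Optional[str]    # None acts as a wildcard
    action: str


# Hypothetical classifier; position in the list is the priority.
RULES = [
    Rule("10.1.0.0/16", "0.0.0.0/0", "udp", "deny"),
    Rule("10.0.0.0/8", "192.168.0.0/16", None, "rate-limit"),
    Rule("0.0.0.0/0", "0.0.0.0/0", None, "permit"),
]


def classify(rules, src: str, dst: str, proto: str):
    """Return the action of the first (highest priority) rule matching all fields."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in rules:
        if (s in ipaddress.ip_network(rule.src)
                and d in ipaddress.ip_network(rule.dst)
                and rule.proto in (None, proto)):
            return rule.action
    return None
```

The cost is O(N) per packet with every field checked per rule, which is the baseline the geometric and trie-based schemes on the following slides improve on.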
Geometric Interpretation in 2D
[Figure: rules R1 to R7 drawn as rectangles in the plane of Field #1 and Field #2, with packets P1 and P2 as points. Example entries (Field #1, Field #2): (144.24/16, 64/24) and (128.16.46.23, *).]

Proposed Schemes
[Table]

Proposed Schemes (contd.)
[Table]

Proposed Schemes (contd.)
[Table]

Grid of Tries
[Figure: a trie on Dimension 1 whose nodes point to tries on Dimension 2; rules R1 to R7 are stored at the Dimension 2 nodes.]

Grid of Tries
Disadvantages:
• Static solution
• Not easy to extend to higher dimensions
Advantages:
• Good solution for two dimensions
20K entries: 2 MB data structure with 9 memory accesses [at most 2W]

Classification Using Bit Parallelism
[Figure: rules R1 to R4 projected onto each dimension, with a bit vector per interval.]

Classification Using Bit Parallelism
Disadvantages:
• Large memory bandwidth
• Hardware optimized
Advantages:
• Good solution for multiple dimensions for small classifiers
512 rules: 1 Mpps with a single FPGA and five 128 KB SRAM chips.

Classification Using Multiple Fields: Recursive Flow Classification
[Figure: the packet header fields F1, F2, F3, F4, ..., Fn (2^S = 2^128 possible values) are reduced through stages of memory lookups, for example 2^128 → 2^64 → 2^24 → 2^12, to an action (2^T = 2^12 possible values).]

Packet Classification: References
• T. V. Lakshman, D. Stiliadis, “High speed policy based packet forwarding using efficient multi-dimensional range matching”, Sigcomm 1998, pp. 191-202.
• V. Srinivasan, S. Suri, G. Varghese and M. Waldvogel, “Fast and scalable layer 4 switching”, Sigcomm 1998, pp. 203-214.
• V. Srinivasan, G. Varghese, S. Suri, “Fast packet classification using tuple space search”, to be presented at Sigcomm 1999.
• P. Gupta, N. McKeown, “Packet classification using hierarchical intelligent cuttings”, Hot Interconnects VII, 1999.
• P. Gupta, N. McKeown, “Packet classification on multiple fields”, Sigcomm 1999.

Switching Fabrics
• Output and Input Queueing
• Output Queueing
• Input Queueing
  – Scheduling algorithms
  – Combining input and output queues
  – Other non-blocking fabrics
  – Multicast traffic

Interconnects: Two basic techniques
• Input Queueing: usually a non-blocking switch fabric (e.g. crossbar)
• Output Queueing: usually a fast bus

Interconnects: Output Queueing
• Individual output queues: memory b/w = (N+1)·R
• Centralized shared memory: memory b/w = 2N·R

Output Queueing: The “ideal”
[Figure: packets from inputs 1 and 2 queued at the outputs.]

Output Queueing: How fast can we make centralized shared memory?
[Figure: ports 1 to N attached over a 200-byte bus to a shared memory built from 5 ns SRAM.]
• 5 ns per memory operation
• Two memory operations per packet
• Therefore, up to 160 Gb/s
• In practice, closer to 80 Gb/s

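The 160 Gb/s figure follows from the bus width and the access time: a 200-byte bus moves 1600 bits per 5 ns operation, and each packet needs one write and one read. A small sketch of that arithmetic (the function and parameter names are assumptions):

```python
def shared_memory_throughput_bps(bus_bytes: int, access_ns: float,
                                 ops_per_packet: int = 2) -> float:
    """Peak switch throughput of a shared memory of the given width and speed."""
    raw_bps = bus_bytes * 8 * 1e9 / access_ns   # bits moved per second by the memory
    return raw_bps / ops_per_packet             # each packet is written, then read


peak = shared_memory_throughput_bps(200, 5)     # 1.6e11 bits/s, i.e. 160 Gb/s
```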
Interconnects: Input Queueing with Crossbar
[Figure: input queues feed a crossbar (data in, data out) whose configuration is set by a scheduler.]
Memory b/w = 2R

Input Queueing: Head of Line Blocking
[Figure: delay versus load; delay grows without bound as load approaches 58.6%, well short of 100%.]

Head of Line Blocking
[Figure]

Input Queueing: Virtual output queues
[Figure]

Input Queues: Virtual Output Queues
[Figure: delay versus load; with virtual output queues the load can approach 100%.]

Input Queueing
[Figure: input queues, crossbar, and scheduler.]
Memory b/w = 2R
The scheduler can be quite complex!

Input Queueing: Scheduling
[Figure]

Input Queueing: Scheduling
[Figure: a request graph between inputs 1 to 4 and outputs 1 to 4 with edge weights 7, 2, 4, 2, 5, 2, and a bipartite matching of weight 18.]
Question: Maximum weight or maximum size?

Input Queueing: Scheduling
• Maximum Size
  – Maximizes instantaneous throughput
  – Does it maximize long-term throughput?
• Maximum Weight
  – Can clear most backlogged queues
  – But does it sacrifice long-term throughput?

Input Queueing: Scheduling
[Figure: a 2x2 example.]

Input Queueing: Longest Queue First or Oldest Cell First
Weight = queue length (Longest Queue First) or waiting time (Oldest Cell First)
[Figure: a weighted request graph between inputs 1 to 4 and outputs 1 to 4, and its maximum weight matching.]
Maximum weight matching gives 100% throughput.

Input Queueing: Why is serving long/old queues better than serving the maximum number of queues?
• When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.
• When traffic is non-uniform, some queues become longer than others.
• A good algorithm keeps the queue lengths matched, and services a large number of queues.
[Figure: average occupancy versus VOQ number, for uniform traffic and for non-uniform traffic.]

Input Queueing: Practical Algorithms
• Maximal Size Algorithms
  – Wave Front Arbiter (WFA)
  – Parallel Iterative Matching (PIM)
  – iSLIP
• Maximal Weight Algorithms
  – Fair Access Round Robin (FARR)
  – Longest Port First (LPF)

Wave Front Arbiter
[Figure: requests and the resulting match between inputs 1 to 4 and outputs 1 to 4.]

Wave Front Arbiter
[Figure: requests and match.]

Wave Front Arbiter: Implementation
[Figure: a 4x4 array of combinational logic blocks, labelled (1,1) through (4,4).]

Wave Front Arbiter: Wrapped WFA (WWFA)
N steps instead of 2N−1
[Figure: requests and match.]

Parallel Iterative Matching: Random Selection
[Figure: iterations #1 and #2 on a 4x4 switch, each with three phases: Requests, Grant, Accept/Match.]

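The request, grant, and accept phases can be sketched in software as below. This is a sequential model of what the hardware does in parallel, assuming requests are given as a mapping from each input to the set of outputs it has cells for; the function name and data layout are illustrative choices.

```python
import random


def pim_match(requests, iterations=4, rng=random):
    """Parallel Iterative Matching sketch; returns a dict of input -> output."""
    match = {}
    matched_outputs = set()
    for _ in range(iterations):
        # Request: every unmatched input requests every unmatched output it wants.
        requests_by_output = {}
        for inp, outs in requests.items():
            if inp in match:
                continue
            for out in outs:
                if out not in matched_outputs:
                    requests_by_output.setdefault(out, []).append(inp)
        if not requests_by_output:
            break
        # Grant: each output picks one requesting input uniformly at random.
        grants_by_input = {}
        for out, inputs in requests_by_output.items():
            grants_by_input.setdefault(rng.choice(inputs), []).append(out)
        # Accept: each input picks one of its grants uniformly at random.
        for inp, outs in grants_by_input.items():
            out = rng.choice(outs)
            match[inp] = out
            matched_outputs.add(out)
    return match
```

Every iteration with outstanding requests adds at least one pair, so N iterations always reach a maximal match, though not necessarily a maximum one.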
Parallel Iterative Matching: Maximal is not Maximum
[Figure: a set of requests on a 4x4 switch and an accepted match that is maximal but not maximum.]

Parallel Iterative Matching: Analytical Results
Number of iterations to converge:

Parallel Iterative Matching
[Figure]

Parallel Iterative Matching
[Figure]

Parallel Iterative Matching
[Figure]

iSLIP: Round Robin Selection
[Figure: iterations #1 and #2 on a 4x4 switch, each with three phases: Requests, Grant, Accept/Match.]

iSLIP: Properties
• Random under low load
• TDM under high load
• Lowest priority to MRU
• 1 iteration: fair to outputs
• Converges in at most N iterations. On average <= log2 N
• Implementation: N priority encoders
• Up to 100% throughput for uniform traffic

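The round-robin grant and accept pointers behind these properties can be sketched as follows. This is a simplified model, assuming `requests[i]` is the set of outputs input `i` has cells for and that pointers move only for grants accepted in the first iteration; the class name and layout are assumptions.

```python
class ISlip:
    """iSLIP sketch: round-robin pointers replace PIM's random choices."""

    def __init__(self, n: int):
        self.n = n
        self.grant_ptr = [0] * n    # one pointer per output
        self.accept_ptr = [0] * n   # one pointer per input

    def schedule(self, requests, iterations: int = 1):
        n = self.n
        in_match, out_match = {}, {}
        for it in range(iterations):
            grants = {}
            for out in range(n):
                if out in out_match:
                    continue
                # Grant the first requesting unmatched input at or after the pointer.
                for k in range(n):
                    inp = (self.grant_ptr[out] + k) % n
                    if inp not in in_match and out in requests[inp]:
                        grants.setdefault(inp, []).append(out)
                        break
            if not grants:
                break
            for inp, outs in grants.items():
                # Accept the first granting output at or after the accept pointer.
                out = min(outs, key=lambda o: (o - self.accept_ptr[inp]) % n)
                in_match[inp] = out
                out_match[out] = inp
                if it == 0:
                    # Move pointers one past the match: most recently served goes last.
                    self.grant_ptr[out] = (inp + 1) % n
                    self.accept_ptr[inp] = (out + 1) % n
        return in_match
```

With every queue backlogged on a 2x2 switch, the first slot matches only one pair, after which the pointers desynchronize and each slot yields a full match, the TDM-like behaviour listed above.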
iSLIP
[Figure]

iSLIP
[Figure]

iSLIP: Implementation
[Figure: N grant arbiters and N accept arbiters, each a programmable priority encoder with N request inputs, log2 N bits of pointer state, and a log2 N-bit decision output.]

Input Queueing: References
• M. Karol et al., “Input vs Output Queueing on a Space Division Packet Switch”, IEEE Trans. Comm., Dec 1987, pp. 1347-1356.
• Y. Tamir, “Symmetric Crossbar arbiters for VLSI communication switches”, IEEE Trans. Parallel and Dist. Sys., Jan 1993, pp. 13-27.
• T. Anderson et al., “High Speed Switch Scheduling for Local Area Networks”, ACM Trans. Comp. Sys., Nov 1993, pp. 319-352.
• N. McKeown, “The iSLIP scheduling algorithm for Input Queued Switches”, IEEE Trans. Networking, April 1999, pp. 188-201.
• C. Lund et al., “Fair prioritized scheduling in an input buffered switch”, Proc. of IFIP-IEEE Conf., April 1996, pp. 358-69.
• A. Mekkittikul et al., “A Practical Scheduling Algorithm to Achieve 100% Throughput in Input Queued Switches”, IEEE Infocom 98, April 1998.

Other Non-Blocking Fabrics: Clos Network
[Figure]

Other Non-Blocking Fabrics: Clos Network
Expansion factor required = 2 − 1/N (but still blocking for multicast)

Other Non-Blocking Fabrics: Self-Routing Networks
[Figure: a self-routing network with outputs 000 to 111.]

Other Non-Blocking Fabrics: Self-Routing Networks
The non-blocking Batcher-Banyan network
[Figure: a Batcher sorter followed by a self-routing network; cells tagged with destinations 0 to 7 are sorted and then routed to outputs 000 to 111.]
• Fabric can be used as scheduler.
• Batcher-Banyan network is blocking for multicast.

Speedup
• Context
  – input queued switches
  – output queued switches
  – the speedup problem
• Early approaches
• Algorithms
• Implementation considerations

Speedup: Context
[Figure: a generic switch with memory.]
The placement of memory gives:
• Output queued switches
• Input queued switches
• Combined input and output queued switches

Output Queued Switches
• Best delay and throughput performance
  – Possible to erect “bandwidth firewalls” between sessions
• Main problem
  – Requires high fabric speedup (S = N)
• Unsuitable for high speed switching

Input Queued Switches
• Big advantage
  – Speedup of one is sufficient
• Main problem
  – Can’t guarantee delay due to input contention
• Overcoming input contention: use higher speedup

A Comparison
Memory speeds for a 32x32 switch

             Output queued                        Input queued
Line Rate    Memory BW    Access Time Per Cell    Memory BW    Access Time
100 Mb/s     3.3 Gb/s     128 ns                  200 Mb/s     2.12 µs
1 Gb/s       33 Gb/s      12.8 ns                 2 Gb/s       212 ns
2.5 Gb/s     82.5 Gb/s    5.12 ns                 5 Gb/s       84.8 ns
10 Gb/s      330 Gb/s     1.28 ns                 20 Gb/s      21.2 ns

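The comparison table can be reproduced from the memory bandwidth expressions given earlier, (N+1)·R for individual output queues and 2R for input queues. The sketch below assumes a 53-byte (424-bit) cell; that cell size is an inference from the table's numbers rather than something the slide states, and the function names are hypothetical.

```python
CELL_BITS = 53 * 8   # assumed ATM-sized cell; inferred from the table, not stated


def output_queued(n_ports: int, line_rate_bps: float):
    """Memory bandwidth (N+1)R and the resulting access time per cell."""
    bandwidth = (n_ports + 1) * line_rate_bps
    return bandwidth, CELL_BITS / bandwidth


def input_queued(line_rate_bps: float):
    """Memory bandwidth 2R (one write and one read per cell time)."""
    bandwidth = 2 * line_rate_bps
    return bandwidth, CELL_BITS / bandwidth


oq_bw, oq_time = output_queued(32, 100e6)   # 3.3 Gb/s, about 128 ns
iq_bw, iq_time = input_queued(100e6)        # 200 Mb/s, 2.12 microseconds
```

The output queued memory has to run (N+1)/2 times faster than the input queued one, which is 16.5 times for N = 32.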
The Speedup Problem
Find a compromise: 1 < Speedup << N
• to get the performance of an OQ switch
• close to the cost of an IQ switch
Essential for high speed QoS switching

Some Early Approaches
• Probabilistic analyses
  – assume traffic models (Bernoulli, Markov-modulated, non-uniform loading, “friendly correlated”)
  – obtain mean throughput and delays, bounds on tails
  – analyze different fabrics (crossbar, multistage, etc.)
• Numerical methods
  – use actual and simulated traffic traces
  – run different algorithms
  – set the “speedup dial” at various values

The Findings
Very tantalizing: under different settings (traffic, loading, algorithm, etc.) and even for varying switch sizes, a speedup of between 2 and 5 was sufficient!

Using Speedup
[Figure]

Intuition
• Bernoulli IID inputs, speedup = 1: fabric throughput = 0.58
• Bernoulli IID inputs, speedup = 2: fabric throughput = 1.16, input efficiency = 1/1.16, average input queue = 6.25

Intuition (continued)
• Bernoulli IID inputs, speedup = 3: fabric throughput = 1.74, input efficiency = 1/1.74, average input queue = 1.35
• Bernoulli IID inputs, speedup = 4: fabric throughput = 2.32, input efficiency = 1/2.32, average input queue = 0.75

Issues
• Need hard guarantees: exact, not average
• Robustness: realistic, even adversarial, traffic, not friendly Bernoulli IID

The Ideal Solution
[Figure: an output queued switch with speedup = N beside a switch with speedup << N between the same inputs and outputs.]
Question: Can we find a simple and good algorithm that exactly mimics output queueing regardless of switch sizes and traffic patterns?

What is Exact Mimicking?
• Apply the same inputs to an OQ and a CIOQ switch, packet by packet
• Obtain the same outputs, packet by packet

Algorithm: MUCF (Most Urgent Cell First)
Key concept: urgency value
urgency = departure time − present time

MUCF: The algorithm
• Outputs try to get their most urgent packets.
• Inputs grant to the output whose packet is most urgent, with ties broken by port number.
• Losing outputs try for their next most urgent packet.
• The algorithm terminates when no more matchings are possible.

Stable Marriage Problem
Men = Outputs: Bill, John, Pedro
Women = Inputs: Hillary, Monica, Maria

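The mapping of outputs to proposers and inputs to acceptors can be illustrated with the classic Gale-Shapley procedure. This sketch uses generic port labels and made-up preference lists; in MUCF the preferences would come from packet urgencies, which this example does not model.

```python
def stable_matching(output_prefs, input_prefs):
    """Outputs propose in preference order; inputs keep the best proposal so far."""
    rank = {inp: {out: r for r, out in enumerate(prefs)}
            for inp, prefs in input_prefs.items()}
    next_choice = {out: 0 for out in output_prefs}
    engaged = {}                      # input -> output currently held
    free = list(output_prefs)
    while free:
        out = free.pop()
        if next_choice[out] >= len(output_prefs[out]):
            continue                  # this output has no one left to ask
        inp = output_prefs[out][next_choice[out]]
        next_choice[out] += 1
        current = engaged.get(inp)
        if current is None:
            engaged[inp] = out
        elif rank[inp][out] < rank[inp][current]:
            engaged[inp] = out        # input trades up; the loser tries again
            free.append(current)
        else:
            free.append(out)          # rejected; try the next preference
    return {out: inp for inp, out in engaged.items()}


# Hypothetical preference lists for a 3x3 switch.
OUTPUT_PREFS = {"O1": ["I1", "I2", "I3"], "O2": ["I1", "I3", "I2"], "O3": ["I2", "I1", "I3"]}
INPUT_PREFS = {"I1": ["O2", "O1", "O3"], "I2": ["O1", "O3", "O2"], "I3": ["O1", "O2", "O3"]}
```

The result is stable in the sense that no output and input both prefer each other to their assigned partners, which mirrors the "losing outputs try for their next most urgent packet" step.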
An Example
Observation: there are only two reasons a packet doesn’t get to its output:
• Input contention
• Output contention
This is why a speedup of 2 works!

What Does This Get Us?
• A speedup of 4 is sufficient for exact emulation of FIFO OQ switches, with MUCF.
• What about non-FIFO OQ switches, e.g. WFQ, strict priority?

Other Results
To exactly emulate an NxN OQ switch:
• A speedup of 2 − 1/N is necessary and sufficient (hence a speedup of 2 is sufficient for all N).
• Input traffic patterns can be absolutely arbitrary.
• The emulated OQ switch may use any “monotone” scheduling policy, e.g. FIFO, LIFO, strict priority, WFQ, etc.

What Gives?
• Complexity of the algorithms
  – Extra hardware for processing
  – Extra run time (time complexity)
• What is the benefit?
  – Reduced memory bandwidth requirements
• Tradeoff: memory for processing
  – Moore’s Law supports this tradeoff

Implementation: A closer look
Main sources of difficulty:
• Estimating urgency, etc.: the information is distributed (and must be communicated among inputs and outputs)
• Matching process: too many iterations?
Estimating urgency depends on what is being emulated:
• Like taking a ticket to hold a place in a queue
• FIFO, strict priorities: no problem
• WFQ, etc.: problems

Implementation (contd.)
Matching process:
• A variant of the stable marriage problem
• Worst case number of iterations for SMP = N^2
• Worst case number of iterations in switching = N
• With high probability, and on average, approximately log(N)

Other Work
• Relax the stringent requirement of exact emulation
  – Least Occupied O/p First Algorithm (LOOFA): keeps outputs always busy if there are packets; by time-stamping packets, it also exactly mimics output queueing
• Disallow arbitrary inputs
  – e.g. leaky bucket constrained
  – Obtain worst-case delay bounds

References for Speedup
• Y. Oie et al., “Effect of speedup in nonblocking packet switch”, ICC 89.
• A. L. Gupta, N. D. Georganas, “Analysis of a packet switch with input and output buffers and speed constraints”, Infocom 91.
• S.-T. Chuang et al., “Matching output queueing with a combined input and output queued switch”, IEEE JSAC, vol. 17, no. 6, 1999.
• B. Prabhakar, N. McKeown, “On the speedup required for combined input and output queued switching”, Automatica, vol. 35, 1999.
• P. Krishna et al., “On the speedup required for work conserving crossbar switches”, IEEE JSAC, vol. 17, no. 6, 1999.
• A. Charny, “Providing QoS guarantees in input buffered crossbar switches with speedup”, Ph.D. Thesis, MIT, 1998.

Multicast Switching • The problem • Switching with crossbar fabrics • Switching with other fabrics Copyright 1999. All Rights Reserved 146
Multicasting 2 1 Copyright 1999. All Rights Reserved 3 5 4 6 147
Crossbar fabrics: Method 1 Copy network + unicast switching Copy networks Increased hardware, increased input contention Copyright 1999. All Rights Reserved 148
Method 2 Use copying properties of crossbar fabric No fanout splitting: Easy, but low throughput Fanout splitting: higher throughput, but not as simple. Leaves “residue”. Copyright 1999. All Rights Reserved 149
The effect of fanout splitting [Figure: performance of an 8 x 8 switch with and without fanout splitting under uniform IID traffic]
Placement of residue • Key question: How should outputs grant requests (and hence decide placement of residue)?
Residue and throughput • Result: Concentrating residue brings more new work forward, and hence leads to higher throughput. • But there are fairness problems to deal with. • This and other problems can be looked at in a unified way by mapping the multicasting problem onto a variation of Tetris.
Multicasting and Tetris [Figure: input ports 1 to 5 and output ports 1 to 5, with the residue marked]
Multicasting and Tetris [Figure: input ports 1 to 5 and output ports 1 to 5, with the residue concentrated]
Replication by recycling • Main idea: Make two copies at a time using a binary tree with the input at the root and all possible destination outputs at the leaves. [Figure: binary tree with node labels a, b, c, d, e, x, y]
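A rough sketch of the doubling idea, assuming each pass through the network splits a cell's destination list into two halves, that a half with a single destination is delivered in that pass, and that larger halves are recycled. The function name and the way the list is halved are illustrative assumptions. The point is that a fanout of f needs about log2(f) passes, and that copies can leave in different passes.

```python
def replicate_by_recycling(dests):
    """Return a dict mapping each destination to the pass in which it is delivered.

    Each pass makes two copies of every pending cell; a copy carrying one
    destination goes to its output, a copy carrying several is recycled.
    """
    dests = list(dests)
    if not dests:
        return {}
    if len(dests) == 1:
        return {dests[0]: 1}         # unicast: one pass, no recycling
    delivered = {}
    pending = [dests]                # every pending group has >= 2 destinations
    passes = 0
    while pending:
        passes += 1
        recycled = []
        for group in pending:
            mid = len(group) // 2
            for half in (group[:mid], group[mid:]):
                if len(half) == 1:
                    delivered[half[0]] = passes
                else:
                    recycled.append(half)
        pending = recycled
    return delivered


print(replicate_by_recycling("abcde"))
```

With five destinations, some copies leave after two passes and others after three, which is one way to see why the outputs need resequencing.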
Replication by recycling (cont’d) [Figure labels: Receive, Reseq, Transmit, Output Table, Network, Recycle] • Scalable to large fanouts. • Needs resequencing at outputs and introduces variable delays.
References for Multicasting • J. Hayes et al., “Performance analysis of a multicast switch”, IEEE Trans. on Communications, vol. 39, April 1991. • B. Prabhakar et al., “Tetris models for multicast switches”, Proc. of the 30th Annual Conference on Information Sciences and Systems, 1996. • B. Prabhakar et al., “Multicast scheduling for input queued switches”, IEEE JSAC, 1997. • J. Turner, “An optimal nonblocking multicast virtual circuit switch”, INFOCOM, 1994.
Tutorial Outline • Introduction: What is a Packet Switch? • Packet Lookup and Classification: Where does a packet go next? • Switching Fabrics: How does the packet get there? • Output Scheduling: When should the packet leave?
Output Scheduling • What is output scheduling? • How is it done? • Practical Considerations
Output Scheduling • Allocating output bandwidth • Controlling packet delay [Figure label: scheduler]
Output Scheduling [Figure: FIFO compared with Fair Queueing]
Motivation • FIFO is natural but gives poor QoS – bursty flows increase delays for others – hence cannot guarantee delays • Need round robin scheduling of packets – Fair Queueing – Weighted Fair Queueing, Generalized Processor Sharing
Fair queueing: Main issues • Level of granularity – packet by packet? (favors long packets) – bit by bit? (ideal, but very complicated) • Packet Generalized Processor Sharing (PGPS) – serves packet by packet – and imitates bit by bit schedule within a tolerance
How does WFQ work? [Figure: three flows with weights WR = 1, WG = 5, WP = 2]
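A simplified sketch of how the weights translate into a service order, using the weights from the slide (R = 1, G = 5, P = 2). It assumes all packets are already queued at time 0, so each packet's finish tag is just the previous tag in its flow plus length / weight; a full WFQ implementation also has to track virtual time as flows become idle and busy. The packet lengths and the tie-break by flow name are illustrative assumptions.

```python
import heapq


def wfq_order(queues, weights):
    """Return the order in which flows are served, smallest finish tag first.

    queues:  flow name -> list of packet lengths, all present at time 0
    weights: flow name -> WFQ weight
    """
    heap = []
    for flow, lengths in queues.items():
        finish = 0.0
        for seq, length in enumerate(lengths):
            finish += length / weights[flow]   # larger weight -> smaller tag
            heapq.heappush(heap, (finish, flow, seq))
    order = []
    while heap:
        _finish, flow, _seq = heapq.heappop(heap)
        order.append(flow)
    return order


weights = {"R": 1, "G": 5, "P": 2}
queues = {"R": [10, 10], "G": [10] * 5, "P": [10, 10]}
print(wfq_order(queues, weights))
# ['G', 'G', 'P', 'G', 'G', 'G', 'P', 'R', 'R']
```

While all three flows are backlogged, G receives roughly five transmissions for every one that R receives, matching the 5 : 2 : 1 weights.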
Delay guarantees • Theorem: If flows are leaky bucket constrained and all nodes employ GPS (WFQ), then the network can guarantee worst case delay bounds to sessions.
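As an illustration of what such a bound looks like, here is the single-node case from the Parekh and Gallager analysis. The symbols are not on the slide and are introduced here as assumptions: session i has leaky bucket parameters (sigma_i, rho_i), GPS weight phi_i, the link rate is r, and L_max is the largest packet size.

```latex
% Guaranteed rate of session i at a GPS server of rate r
g_i = \frac{\phi_i}{\sum_j \phi_j}\, r , \qquad g_i \ge \rho_i

% Worst-case delay under (fluid) GPS
D_i^{\mathrm{GPS}} \le \frac{\sigma_i}{g_i}

% Packet-by-packet GPS (WFQ) finishes each packet at most one
% maximum-size packet time later than GPS
D_i^{\mathrm{PGPS}} \le \frac{\sigma_i}{g_i} + \frac{L_{\max}}{r}
```

For example, a burst allowance of 10 kbit served at a guaranteed rate of 1 Mbit/s gives a fluid GPS delay bound of 10 ms at that node.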
Practical considerations • For every packet, the scheduler needs to – classify it into the right flow queue and maintain a linked list for each flow – schedule it for departure • Complexities of both are O(log [# of flows]) – first is hard to overcome – second can be overcome by DRR
Deficit Round Robin [Figure: DRR example; numeric labels 700, 50, 250, 400, 200, 600, 500, 250, 750, 500, 100, 400, 500; quantum size 500] • Good approximation of FQ • Much simpler to implement
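A minimal sketch of Deficit Round Robin with the quantum of 500 from the slide. The flow names and the packet sizes in each queue are illustrative assumptions; the arrangement in the original figure is not recoverable. Each backlogged flow gains one quantum per round, sends head-of-line packets while its deficit covers them, and loses its deficit when its queue empties.

```python
from collections import deque


def drr(queues, quantum, rounds):
    """Run DRR for a number of rounds.

    queues: flow name -> list of packet sizes (head of line first)
    Returns (sent, deficit): the (flow, size) transmissions in order, and
    each flow's deficit counter after the last round.
    """
    queues = {flow: deque(pkts) for flow, pkts in queues.items()}
    deficit = {flow: 0 for flow in queues}
    sent = []
    for _ in range(rounds):
        for flow, q in queues.items():
            if not q:
                continue                  # idle flows earn no credit
            deficit[flow] += quantum
            while q and q[0] <= deficit[flow]:
                size = q.popleft()
                deficit[flow] -= size
                sent.append((flow, size))
            if not q:
                deficit[flow] = 0         # an emptied queue forfeits leftover credit
    return sent, deficit


example = {"A": [700, 50], "B": [250, 400, 200], "C": [600, 500]}
sent, deficit = drr(example, quantum=500, rounds=2)
print(sent)
print(deficit)
```

The per-packet work is constant once the packet is in its flow queue, which is why DRR avoids the O(log [# of flows]) scheduling cost mentioned earlier; classification is still needed.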
But... • WFQ is still very hard to implement – classification is a problem – needs to maintain too much state information – doesn’t scale well
Strict Priorities and DiffServ • Classify flows into priority classes – maintain only per class queues – perform FIFO within each class – avoid “curse of dimensionality”
DiffServ • A framework for providing differentiated QoS – set Type of Service (ToS) bits in packet headers – this classifies packets into classes – routers maintain per class queues – condition traffic at network edges to conform to class requirements • May still need queue management inside the network
References for Output Scheduling • A. Demers et al., “Analysis and simulation of a fair queueing algorithm”, ACM SIGCOMM, 1989. • A. Parekh, R. Gallager, “A generalized processor sharing approach to flow control in integrated services networks: the single node case”, IEEE/ACM Trans. on Networking, June 1993. • A. Parekh, R. Gallager, “A generalized processor sharing approach to flow control in integrated services networks: the multiple node case”, IEEE/ACM Trans. on Networking, April 1994. • M. Shreedhar, G. Varghese, “Efficient Fair Queueing using Deficit Round Robin”, ACM SIGCOMM, 1995. • K. Nichols, S. Blake (eds), “Differentiated Services: Operational Model and Definitions”, Internet Draft, 1998.
Active Queue Management • Problems with traditional queue management – tail drop • Active Queue Management – goals – an example – effectiveness
Tail Drop Queue Management [Figure: lock-out; max queue length marked]
Tail Drop Queue Management • Drop packets only when queue is full – long steady state delay – global synchronization – bias against bursty traffic
Global Synchronization [Figure: max queue length marked]
Bias Against Bursty Traffic [Figure: max queue length marked]
Alternative Queue Management Schemes • Drop from front on full queue • Drop at random on full queue • Result: both solve the lock out problem, but both still have the full queues problem
Active Queue Management Goals • Solve lock out and full queue problems – no lock out behavior – no global synchronization – no bias against bursty flow • Provide better QoS at a router – low steady state delay – lower packet dropping
Active Queue Management • Problems with traditional queue management – tail drop • Active Queue Management – goals – an example (current topic) – effectiveness
Random Early Detection (RED) [Figure: queue holding packets P1, P2, ..., Pk, with thresholds minth and maxth and the average queue length qavg marked] • if qavg < minth: admit every packet • else if qavg <= maxth: drop an incoming packet with probability p = (qavg - minth)/(maxth - minth) • else (qavg > maxth): drop every incoming packet
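The rule above can be sketched as follows. Two details are not on the slide and are taken from the Floyd and Jacobson RED paper: qavg is an exponentially weighted moving average of the instantaneous queue length, and the linear ramp may be scaled by a maximum probability `max_p` (set to 1.0 here so that the ramp matches the slide). The class name, the `capacity` limit, and the default `weight` are illustrative assumptions.

```python
import random
from collections import deque


def drop_probability(qavg, minth, maxth, max_p=1.0):
    """Drop probability as a function of the average queue length."""
    if qavg < minth:
        return 0.0
    if qavg <= maxth:
        return max_p * (qavg - minth) / (maxth - minth)
    return 1.0


class RedQueue:
    def __init__(self, minth, maxth, capacity, weight=0.002, rng=None):
        self.minth, self.maxth = minth, maxth
        self.capacity = capacity          # hard limit, tail drop beyond it
        self.weight = weight              # EWMA gain for qavg
        self.rng = rng or random.Random()
        self.queue = deque()
        self.qavg = 0.0

    def enqueue(self, packet):
        """Return True if the packet is admitted, False if it is dropped."""
        # Update the average from the instantaneous length, so short bursts
        # move qavg only slightly when weight is small.
        self.qavg = (1 - self.weight) * self.qavg + self.weight * len(self.queue)
        p = drop_probability(self.qavg, self.minth, self.maxth)
        if len(self.queue) >= self.capacity or self.rng.random() < p:
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        return self.queue.popleft() if self.queue else None


print(drop_probability(10, minth=5, maxth=15))   # 0.5
```

Because the decision uses qavg rather than the instantaneous length, a short burst arriving at a lightly loaded queue is admitted, while sustained load raises the drop probability gradually.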
Effectiveness of RED: Lock Out • Packets are randomly dropped • Each flow has the same probability of being discarded
Effectiveness of RED: Full Queue • Drop packets probabilistically in anticipation of congestion (not when queue is full) • Use qavg to decide packet dropping probability: allow instantaneous bursts • Randomness avoids global synchronization
What QoS does RED Provide? • Lower buffer delay: good interactive service – qavg is controlled to be small • Given responsive flows: packet dropping is reduced – early congestion indication allows traffic to throttle back before congestion • Given responsive flows: fair bandwidth allocation
Unresponsive or aggressive flows • Don’t properly back off during congestion • Take away bandwidth from TCP compatible flows • Monopolize buffer space
Control Unresponsive Flows • Some active queue management schemes – RED with penalty box – Flow RED (FRED) – Stabilized RED (SRED) • These identify and penalize unresponsive flows with a bit of extra work
Active Queue Management References • B. Braden et al., “Recommendations on queue management and congestion avoidance in the Internet”, RFC 2309, 1998. • S. Floyd, V. Jacobson, “Random early detection gateways for congestion avoidance”, IEEE/ACM Trans. on Networking, 1(4), Aug. 1993. • D. Lin, R. Morris, “Dynamics of random early detection”, ACM SIGCOMM, 1997. • T. Ott et al., “SRED: Stabilized RED”, INFOCOM, 1999. • S. Floyd, K. Fall, “Router mechanisms to support end to end congestion control”, LBL technical report, 1997.
Tutorial Outline • Introduction: What is a Packet Switch? • Packet Lookup and Classification: Where does a packet go next? • Switching Fabrics: How does the packet get there? • Output Scheduling: When should the packet leave?
Basic Architectural Components [Figure labels: Admission Control, Policing, Congestion Control, Routing, Switching, Reservation, Output Scheduling; Control; Datapath: per-packet processing]
Basic Architectural Components 1. Datapath: per-packet processing 2. Interconnect 3. Output Scheduling [Figure labels: Forwarding Table, Forwarding Decision]