Introduction to High Performance Internet Switches and Routers

- Slides: 140

Introduction to High Performance Internet Switches and Routers
COMP 680E, by M. Hamdi

Network Architecture
[Figure: long-haul network (DWDM) with 10 GbE core routers; metropolitan network with 10 GbE edge routers and edge switches; campus/residential network with GbE access routers and access switches.]

[Figure: points of presence (POPs).]

How the Internet really is: Current Trend
[Figure: access over modems and DSL; transport over SONET/SDH and DWDM.]

The Internet is a mesh of routers, mostly interconnected by (ATM and) SONET (and DWDM): TDM circuit-switched crossconnects, DWDM, etc.

Typical (But Not All) IP Backbone (Late 1990s)
[Figure: chain of Core Router, ATM Switch, MUX, SONET/SDH ADM, SONET/SDH DCS, SONET/SDH ADM, MUX, ATM Switch, Core Router.]

Points of Presence (POPs)
[Figure: hosts A to F attached to a mesh of POP 1 through POP 8.]

Where High Performance Routers are Used
[Figure: mesh of routers R1 to R16 interconnected by 2.5 Gb/s links.]

Hierarchical arrangement
• End hosts (1000s per mux) attach to an access multiplexer, then to edge routers, then to core routers inside a Point of Presence (POP).
• POPs are richly interconnected by a mesh of long-haul links (10 Gb/s, “OC192”).
• Typically: 40 POPs per national network operator; 10-40 core routers per POP.

Typical POP Configuration
• Transport network: DWDM/SONET terminal
• Backbone routers (core routers), attached by 10G WAN transport links
• 10G router-to-router intra-office links
• Aggregation switches/routers (edge switches)
• > 50% of high-speed interfaces are router-to-router

Today’s Network Equipment
• Layer 3: routers (Internet Protocol)
• Layer 2: switches (FR & ATM)
• Layer 1: SONET
• Layer 0: DWDM

Functions in a packet switch
• Ingress linecard: framing, route lookup, TTL processing, buffering
• Interconnect, with interconnect scheduling
• Egress linecard: buffering, QoS scheduling, framing
• Control plane
• Paths through the switch: control path, data path, scheduling path

Functions in a circuit switch
• Ingress linecard: framing
• Interconnect, with interconnect scheduling
• Egress linecard: framing
• Control plane
• Paths through the switch: control path, data path

Our emphasis for now is on packet switches (IP, ATM, Ethernet, frame relay, etc.).

What a Router Looks Like
• Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW
• Juniper M160: capacity 80 Gb/s, power 2.6 kW
[Figure: photos of both chassis with dimension labels 19”, 6 ft, 3 ft, and 2.5 ft.]

A Router Chassis
[Figure: chassis showing fans/power supplies and linecards.]

Backplane
• A circuit board with connectors for line cards
• High-speed electrical traces connecting line cards to the fabric
• Usually passive
• Typically 30-layer boards

Line Card Picture
[Figure: photo of a line card.]

What do these two have in common?
[Figure: Cisco Catalyst 3750G and Cisco CRS-1.]

What do these two have in common?
CRS-1 linecard
• 20” x (18”+11”) x 1 RU
• 40 Gbps, 80 MPPS
• State-of-the-art 0.13 µm silicon
• Full IP routing stack, including IPv4 and IPv6 support
• Distributed IOS
• Multi-chassis support
Cat 3750G switch
• 19” x 16” x 1 RU
• 52 Gbps, 78 MPPS
• State-of-the-art 0.13 µm silicon
• Full IP routing stack, including IPv4 and IPv6 support
• Distributed IOS
• Multi-chassis support

What is different between them?
[Figure: Cisco Catalyst 3750G and Cisco CRS-1.]

A lot…
CRS-1 linecard
• Up to 1024 linecards
• Fully programmable forwarding
• MPLS support
• 2M prefix entries and 512K ACLs
• 46 Tbps 3-stage switching fabric
• H-A non-stop routing protocols
Cat 3750G switch
• Up to 9 stack members
• Hardwired ASIC forwarding
• Re-startable routing applications
• 11K prefix entries and 1.5K ACLs
• 32 Gbps shared stack ring
• L2 switching support

Other packet switches
[Figure: Cisco 7500 “edge” routers; Lucent GX 550 core ATM switch; DSL router.]

What is Routing?
[Figure: hosts A to F interconnected by routers R1 to R5; a packet addressed to D is forwarded hop by hop.]
Forwarding table shown in the figure:
Destination   Next Hop
D             R3
E             R3
F             R5

What is Routing?
[Figure: the same topology and forwarding table, with the 20-byte IPv4 header overlaid. Header fields (32 bits per row): Ver, HLen, T.Service, Total Packet Length; Fragment ID, Flags, Fragment Offset; TTL, Protocol, Header Checksum; Source Address; Destination Address; Options (if any); Data.]

What is Routing?
[Figure: the same topology of hosts A to F and routers R1 to R5.]

Basic Architectural Elements of a Router
Control plane (“typically in software”)
• Routing
• Routing table update (OSPF, RIP, IS-IS)
• Admission control
• Congestion control
• Reservation
Switching: per-packet processing (“typically in hardware”)
• Routing lookup
• Packet classifier
• Switching
• Arbitration
• Scheduling

Basic Architectural Components
Datapath: per-packet processing
1. Forwarding table
2. Interconnect
3. Output scheduling
[Figure: a forwarding decision block with its forwarding table on each input.]

Per-packet processing in a Switch/Router
1. Accept the packet arriving on an ingress line.
2. Look up the packet's destination address in the forwarding table to identify the outgoing interface(s).
3. Manipulate the packet header: e.g., decrement the TTL and update the header checksum.
4. Send the packet to the outgoing interface(s).
5. Queue until the line is free.
6. Transmit the packet onto the outgoing line.
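
Step 3 (decrement TTL, update header checksum) can be sketched in a few lines. This is only an illustration in software of work that the slides place in hardware; the function names are hypothetical, and the full recomputation of the checksum is an assumption made for clarity (routers commonly update it incrementally).

```python
import struct
from typing import Optional


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, inverted.

    Called on a header whose checksum field is zero, it yields the checksum;
    called on a header with a valid checksum in place, it yields 0.
    """
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_header(header: bytes) -> Optional[bytes]:
    """Decrement the TTL and refresh the checksum; None means 'drop'."""
    hdr = bytearray(header)
    if hdr[8] <= 1:                         # byte 8 is the TTL field
        return None                         # TTL would reach zero: drop
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"                # bytes 10-11 hold the checksum
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)
```

A header passed through `forward_header` should verify again, that is, `ipv4_checksum` over the returned bytes is zero.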

ATM Switch
• Look up the cell's VCI/VPI in the VC table.
• Replace the old VCI/VPI with the new one.
• Forward the cell to the outgoing interface.
• Transmit the cell onto the link.

Ethernet Switch
• Look up the frame's DA in the forwarding table.
– If known, forward to the correct port.
– If unknown, broadcast to all ports.
• Learn the SA of the incoming frame.
• Forward the frame to the outgoing interface.
• Transmit the frame onto the link.
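
The learn-and-forward behaviour of an Ethernet switch can be modelled with a dictionary. The sketch below is a toy model under stated assumptions: the class and method names are hypothetical, addresses are plain strings, table entries never age out, and "broadcast to all ports" is taken to mean every port except the one the frame arrived on.

```python
class LearningSwitch:
    """Toy model of Ethernet address learning and forwarding."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.table = {}                     # MAC address -> port it was last seen on

    def receive(self, in_port: int, src: str, dst: str) -> list:
        """Return the list of ports the frame is sent out on."""
        self.table[src] = in_port           # learn the source address
        out_port = self.table.get(dst)
        if out_port is None:                # unknown destination: flood
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:             # destination is on the arrival segment
            return []
        return [out_port]
```

The first frame towards an unseen destination is flooded; once the destination has sent a frame of its own, later traffic to it goes out on a single port.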

IP Router
• Look up the packet's DA in the forwarding table.
– If known, forward to the correct port.
– If unknown, drop the packet.
• Decrement the TTL, update the header checksum.
• Forward the packet to the outgoing interface.
• Transmit the packet onto the link.

Special per packet/flow processing
• The router can be equipped with additional capabilities to provide special services on a per-packet or per-class basis.
• The router can perform some additional processing on the incoming packets:
– Classifying the packet (IPv4, IPv6, MPLS, ...)
– Delivering packets according to a pre-agreed service: absolute service or relative service (e.g., send a packet within a given deadline, or give a packet better service than another packet; IntServ, DiffServ)
– Filtering packets for security reasons
– Treating multicast packets differently from unicast packets

Per packet Processing Must be Fast !!!
Year   Aggregate line rate   Arriving rate of 40 B POS packets (million pkts/sec)
1997   622 Mb/s              1.56
1999   2.5 Gb/s              6.25
2001   10 Gb/s               25
2003   40 Gb/s               100
2006   80 Gb/s               200
1. Packet processing must be simple and easy to implement.
2. Memory access time is the bottleneck.
200 Mpps × 2 lookups/pkt = 400 Mlookups/sec → 2.5 ns per lookup
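
The figures in this table can be reproduced with a short calculation. One assumption is needed: the listed rates correspond to 400 bits on the wire per packet, i.e. the 40-byte packet plus roughly 10 bytes of POS framing overhead. That overhead value is inferred from the numbers, not stated on the slide, and the function names are illustrative.

```python
WIRE_BYTES_PER_PACKET = 50   # assumption: 40 B packet + ~10 B POS overhead


def packet_rate_mpps(line_rate_bps: float) -> float:
    """Back-to-back minimum-size packets per second, in millions."""
    return line_rate_bps / (WIRE_BYTES_PER_PACKET * 8) / 1e6


def lookup_budget_ns(rate_mpps: float, lookups_per_packet: int = 2) -> float:
    """Time available per memory lookup, in nanoseconds."""
    return 1e3 / (rate_mpps * lookups_per_packet)


for year, rate in [(1997, 622e6), (1999, 2.5e9), (2001, 10e9),
                   (2003, 40e9), (2006, 80e9)]:
    mpps = packet_rate_mpps(rate)
    print(year, f"{mpps:.2f} Mpps", f"{lookup_budget_ns(mpps):.2f} ns per lookup")
```

At 80 Gb/s the budget comes out at 2.5 ns per lookup, which is the figure the slide compares against memory access time.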

First Generation Routers
[Figure: a shared backplane connecting a CPU (with route table and buffer memory) to line interfaces (MAC).]
Typically <0.5 Gb/s aggregate capacity

Bus-based Router Architectures with Single Processor
• The first generation of IP routers
• Based on software implementations on a single general-purpose CPU
• Limitations:
– Serious processing bottleneck in the central processor
– Memory-intensive operations (e.g. table lookup and data movement) limit the effectiveness of processor power
– The input/output (I/O) bus severely limits overall router throughput

Second Generation Routers
[Figure: a shared bus connecting a CPU (with route table and buffer memory) to line cards, each with its own buffer memory, forwarding cache, and MAC.]
Typically <5 Gb/s aggregate capacity

Bus-based Router Architectures with Multiple Processors
• Architectures with route caching
– Second-generation IP routers
– Distribute packet forwarding operations
– Network interface cards carry processors and route caches
– Packets are transmitted once over the shared bus
– Limitations:
» The central routing table is a bottleneck at high speeds
» Throughput is traffic dependent
» The shared bus is still a bottleneck

Limitation of IP Packet Forwarding based on Route Caching
• Routing changes invalidate existing cache entries, which must then be re-established
• The performance depends on:
– a. how big the cache is
– b. how the cache is maintained
– c. what the performance of the slow path is
• Solution:
– Use a forwarding database in each network interface
• Benefit:
– Performance, scalability, network resilience, and functionality

Third Generation Routers
[Figure: a switched backplane connecting a CPU card (routing table) to line cards, each with local buffer memory, a forwarding table, and MAC.]
Typically <50 Gb/s aggregate capacity

Switch-based Router Architectures with Fully Distributed Processors
• To avoid bottlenecks in:
– Processing power
– Memory bandwidth
– Internal bus bandwidth
• Each network interface is equipped with appropriate processing power and buffer space.

Fourth Generation Routers/Switches
• Optics inside a router for the first time
• Linecards connect to the switch core over optical links 100s of metres long
• 0.3 - 10 Tb/s routers in development

[Figure: Alcatel 7670 RSP; Juniper TX8/T640; Avici TSR; Chiaro.]

Next Gen. Backbone Network Architecture: one backbone, multiple access networks
[Figure: a (G)MPLS-based multiservice intelligent packet backbone with PE routers at service POPs. CE routers attach dual-stack IPv4-IPv6 cable, enterprise, and DSL/FTTH/dial access networks; other attachments include residential users, GGSN/SGSN, ISPs, telecom networks, and an IPv6 IX for ISPs offering native IPv6 services.]
• One backbone network
• Maximizes speed, flexibility and manageability

Current Generation: Generic Router Architecture
• Header processing: look up the IP address in the address table (~1M prefixes, off-chip DRAM) to obtain the next hop, then update the header
• Queue the packet in buffer memory (~1M packets, off-chip DRAM)

Current Generation: Generic Router Architecture (IQ)
[Figure: ports 1 to N each perform header processing (lookup IP address in the address table, update header) and queue the packet in buffer memory on the input side; a scheduler controls transfers to the outputs.]

Current Generation: Generic Router Architecture (OQ)
[Figure: ports 1 to N each perform header processing (lookup IP address in the address table, update header); packets are queued in buffer memory on the output side.]

Basic Architectural Elements of a Current Router
Typical IP router linecard:
• Physical layer, framing & maintenance
• Packet processing with lookup tables
• Buffer management & scheduling, with buffer & state memory, in each direction
• Connection over the backplane to a scheduler and a buffered or bufferless fabric (e.g. crossbar, bus)
OC192c linecard:
• ~10-30M gates
• ~2 Gbits of memory
• ~2 square feet
• >$10k cost; price $100K

Performance metrics
1. Capacity
– “maximize C, s.t. volume < 2 m^3 and power < 5 kW”
2. Throughput
– Operators like to maximize usage of expensive long-haul links.
3. Controllable delay
– Some users would like predictable delay.
– This is feasible with output queueing plus weighted fair queueing (WFQ).

Why do we Need Faster Routers?
1. To prevent routers from becoming the bottleneck in the Internet.
2. To increase POP capacity, and to reduce cost, size and power.

Why we Need Faster Routers 1: To prevent routers from being the bottleneck
Growth rates:
• Line capacity: 2x / 7 months
• User traffic: 2x / 12 months
• Router capacity: 2.2x / 18 months
• Moore’s Law: 2x / 18 months
• DRAM random access time: 1.1x / 18 months

Why we Need Faster Routers 1: To prevent routers from being the bottleneck
Disparity between traffic and router growth
[Figure: traffic versus router capacity over time, showing a 5-fold disparity.]

Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs
• Big POPs need big routers
[Figure: a POP with large routers versus a POP with smaller routers.]
• Interfaces: price >$200k, power > 400 W
• About 50-60% of interfaces are used for interconnection within the POP.
• The industry trend is towards a large, single router per POP.

A Case study: UUNET Internet Backbone Build Up
1999 view (4Q)
• 8 OC-48 links between POPs (not parallel)
2000 view (4Q)
• 52 OC-48 links between POPs: many parallel links
• 3 OC-192 Super POP links: multiple parallel interfaces between POPs (D.C. – Chicago; NYC – D.C.)
To meet the traffic growth, higher-performance routers with higher port speeds are required.

Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs
• Further reduces CapEx and operational cost
• Further increases network stability

Ideal POP
[Figure: existing carrier equipment (gigabit routers, VoIP gateways, SONET, ATM, digital subscriber line aggregation, Gigabit Ethernet, cable modem aggregation) attached to a carrier optical transport layer of DWDM and optical switches.]

Why are Fast Routers Difficult to Make?
1. Big disparity between line rates and memory access speed

Problem: Fast Packet Buffers
Example: 40 Gb/s packet buffer
• Size = RTT × BW = 10 Gb; 64-byte packets
• Write rate R: 1 packet every 12.8 ns
• Read rate R: 1 packet every 12.8 ns
• Use SRAM? + Fast enough random access time, but − too low density to store 10 Gb of data.
• Use DRAM? + High density means we can store the data, but − too slow (50 ns random access time).
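
The two numbers on this slide follow from simple arithmetic. The sketch below assumes a round-trip time of 0.25 s, which is the value that makes RTT × BW equal 10 Gb at 40 Gb/s; the slide does not state the RTT, and the function names are illustrative.

```python
def buffer_size_bits(rtt_s: float, line_rate_bps: float) -> float:
    """Rule-of-thumb buffer size: round-trip time times line rate."""
    return rtt_s * line_rate_bps


def packet_time_ns(packet_bytes: int, line_rate_bps: float) -> float:
    """Time between back-to-back packets of the given size, in ns."""
    return packet_bytes * 8 / line_rate_bps * 1e9


ASSUMED_RTT_S = 0.25          # assumption, not stated on the slide
print(buffer_size_bits(ASSUMED_RTT_S, 40e9) / 1e9, "Gb of buffering")
print(packet_time_ns(64, 40e9), "ns per 64-byte packet")
```

Because a write and a read must both fit into each 12.8 ns packet time, a memory with 50 ns random access cannot keep up on its own.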

Memory Technology (2006)
Technology        Max single-chip density   $/chip ($/MByte)          Access speed   Watts/chip
Networking DRAM   64 MB                     $30-$50 ($0.50-$0.75)     40-80 ns       0.5-2 W
SRAM              8 MB                      $50-$60 ($5-$8)           3-4 ns         2-3 W
TCAM              2 MB                      $200-$250 ($100-$125)     4-8 ns         15-30 W

How fast can a buffer be made?
[Figure: external line feeding a buffer memory over a 64-byte-wide bus.]
Rough estimate:
– ~5 ns for SRAM, ~50 ns for DRAM: 5/50 ns per memory operation.
– Two memory operations per packet.
– Therefore, maximum ~50/5 Gb/s (SRAM/DRAM).
Aside: buffers need to be large for TCP to work well, so DRAM is usually required.
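
The rough estimate can be written as a one-line formula: a bus of W bytes that needs two memory operations of t nanoseconds per packet sustains at most 8W / 2t Gb/s. A minimal sketch follows; the function name is illustrative, and the two operations are taken to be one write and one read per packet.

```python
def max_line_rate_gbps(bus_bytes: int, access_ns: float,
                       ops_per_packet: int = 2) -> float:
    """Upper bound on line rate for a single memory; bits per ns equals Gb/s."""
    return bus_bytes * 8 / (access_ns * ops_per_packet)


print(max_line_rate_gbps(64, 5))    # SRAM, ~5 ns: about 51 Gb/s
print(max_line_rate_gbps(64, 50))   # DRAM, ~50 ns: about 5 Gb/s
```

Widening the bus raises the bound proportionally, but only if every packet fills the full bus width.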

Packet Caches
[Figure: a buffer manager with small SRAM caches holding the heads and tails of Q FIFO queues, in front of a DRAM buffer memory; arriving and departing packets pass through the SRAM caches, and data moves between SRAM and DRAM b >> 1 packets at a time.]

Why are Fast Routers Difficult to Make?
Packet processing gets harder
[Figure: instructions per arriving byte over time, contrasting “what we’d like” (more features: QoS, multicast, security, …) with “what will happen”.]

Why are Fast Routers Difficult to Make?
[Figure: clock cycles per minimum-length packet since 1996.]

Options for packet processing
• General purpose processor
– MIPS
– PowerPC
– Intel
• Network processor
– Intel IXA and IXP processors
– IBM Rainier
– Control plane processors: SiByte (Broadcom), QED (PMC-Sierra)
• FPGA
• ASIC

General Observations
• Up until about 2000:
– Low-end packet switches used general purpose processors.
– Mid-range packet switches used FPGAs for the datapath and general purpose processors for the control plane.
– High-end packet switches used ASICs for the datapath and general purpose processors for the control plane.
• More recently:
– Third-party network processors are now used in many low- and mid-range datapaths.
– Home-grown network processors are used in the high end.

Why are Fast Routers Difficult to Make?
Demand for router performance exceeds Moore’s Law
Growth in capacity of commercial routers (per rack):
– Capacity 1992 ~ 2 Gb/s
– Capacity 1995 ~ 10 Gb/s
– Capacity 1998 ~ 40 Gb/s
– Capacity 2001 ~ 160 Gb/s
– Capacity 2003 ~ 640 Gb/s
Average growth rate: 2.2x / 18 months.
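
The quoted average can be checked from the first and last data points. The sketch below assumes compound growth between 1992 and 2003; the function name is illustrative.

```python
def growth_per_period(start: float, end: float, years: float,
                      period_months: float = 18) -> float:
    """Compound growth factor per period between two capacity figures."""
    periods = years * 12 / period_months
    return (end / start) ** (1 / periods)


# 2 Gb/s in 1992 to 640 Gb/s in 2003: a factor of 320 over 11 years
print(round(growth_per_period(2, 640, 2003 - 1992), 2))
```

The result is close to 2.2 per 18 months, slightly ahead of the 2x per 18 months usually quoted for Moore's Law.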

Maximizing the throughput of a router: the engine of the whole router
• Operators increasingly demand throughput guarantees:
– To maximize use of expensive long-haul links
– For predictability and planning
– To serve as many customers as possible
– To increase the lifetime of the equipment
• Despite lots of effort and theory, no commercial router today has a throughput guarantee.

Maximizing the throughput of a router: the engine of the whole router
[Figure: packet switch functions: ingress linecard (framing, route lookup, TTL processing, buffer), interconnect with interconnect scheduling, egress linecard (buffer, QoS scheduling, framing), and control plane; control, data, and scheduling paths.]

Maximizing the throughput of a router: the engine of the whole router
• This depends on the switching architecture:
– Input queued
– Output queued
– Shared memory
• It depends on the arbitration/scheduling algorithms within the specific architecture.
• This is key to the overall performance of the router.

Why are Fast Routers Difficult to Make?
Power: it is exceeding the limit
[Figure: router power consumption over time.]

Switching Architectures

Generic Router Architecture
[Figure: output-queued router. Ports 1 to N each perform header processing (lookup IP address in the address table, update header); packets are queued in buffer memory at the outputs, which must run at N times the line rate.]

Generic Router Architecture
[Figure: input-queued router. Ports 1 to N each perform header processing and queue packets in buffer memory at the inputs; a scheduler controls transfers to the outputs.]

Interconnects
Two basic techniques:
• Input queueing: usually a non-blocking switch fabric (e.g. crossbar)
• Output queueing: usually a fast bus

Simple model of output queued switch
[Figure: four links, each with an ingress and an egress side running at link rate R; every ingress can write to every egress queue.]

How an OQ Switch Works
[Figure: output queued (OQ) switch.]

Characteristics of an output queued (OQ) switch
• Arriving packets are immediately written into the output queue, without intermediate buffering.
• The flow of packets to one output does not affect the flow to another output.
• An OQ switch has the highest throughput and lowest delay.
• The rate of individual flows and the delay of packets can be controlled (QoS).

The shared memory switch
[Figure: links 1 to N, each with ingress and egress at rate R, all attached to a single physical memory device.]

Characteristics of a shared memory switch

Memory bandwidth
Basic OQ switch:
• Consider an OQ switch with N different physical memories, and all links operating at rate R bits/s.
• In the worst case, packets may arrive continuously from all inputs, destined to just one output.
• Maximum memory bandwidth requirement for each memory is (N+1)R bits/s.
Shared memory switch:
• Maximum memory bandwidth requirement for the memory is 2NR bits/s.

How fast can we make a centralized shared memory switch?
[Figure: ports 1 to N attached to a 5 ns SRAM shared memory over a 200-byte bus.]
• 5 ns per memory operation
• Two memory operations per packet
• Therefore, up to 160 Gb/s (200 × 8 bits / 10 ns)
• In practice, closer to 80 Gb/s

Output Queueing
The “ideal”
[Figure: packets for outputs 1 and 2 queued directly at their outputs.]

How to Solve the Memory Bandwidth Problem? Use Input Queued Switches
• In the worst case, one packet is written to and one packet is read from an input buffer per packet time.
• Maximum memory bandwidth requirement for each memory is 2R bits/s.
• However, using FIFO input queues can result in what is called “Head-of-Line (HoL)” blocking.

Input Queueing
Head of Line Blocking
[Figure: delay versus load; with FIFO input queues the delay grows without bound as the load approaches 58.6%, well short of 100%.]
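
The 58.6% limit is the classical saturation throughput of FIFO input queueing under uniform random traffic, 2 − √2 ≈ 0.586 for a large switch. A small Monte Carlo sketch makes the effect visible; it assumes every input always has a packet waiting, destinations are uniform and independent, and each output picks one contending head-of-line packet at random. The function name and parameters are illustrative.

```python
import random


def hol_saturation_throughput(n_ports: int, slots: int, seed: int = 0) -> float:
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line packet at each input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)          # one packet per output per slot
            hol[winner] = rng.randrange(n_ports) # next packet moves to the head
            delivered += 1
        # Losing inputs keep the same head-of-line packet: HoL blocking.
    return delivered / (n_ports * slots)


print(hol_saturation_throughput(n_ports=32, slots=2000))
```

For a 32-port switch the measured value lands near 0.59, and it approaches 2 − √2 as the port count grows.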

Head of Line Blocking
[Figure: a packet at the head of an input FIFO, waiting for a busy output, blocks packets behind it that are destined to idle outputs.]

Virtual Output Queues (VoQ)
• Virtual output queues:
– At each input port, there are N queues, each associated with an output port
– Only one packet can go from an input port at a time
– Only one packet can be received by an output port at a time
• It retains the scalability of FIFO input-queued switches
• It eliminates the HoL problem of FIFO input queues
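
The structure described above maps onto an N × N array of queues plus a scheduler that picks a conflict-free set of (input, output) pairs each cell time. The sketch below uses a simple greedy pass over the inputs as the scheduler; that choice is an assumption for illustration only (the slides do not name an algorithm here), and the class and method names are hypothetical.

```python
from collections import deque


class VoqSwitch:
    """Input-queued switch with one virtual output queue per (input, output)."""

    def __init__(self, n_ports: int):
        self.n = n_ports
        self.voq = [[deque() for _ in range(n_ports)] for _ in range(n_ports)]

    def enqueue(self, inp: int, out: int, packet) -> None:
        self.voq[inp][out].append(packet)

    def schedule(self) -> list:
        """Pick at most one packet per input and per output for this cell time."""
        busy_outputs = set()
        transfers = []
        for inp in range(self.n):
            for out in range(self.n):
                if out not in busy_outputs and self.voq[inp][out]:
                    busy_outputs.add(out)
                    transfers.append((inp, out, self.voq[inp][out].popleft()))
                    break                   # one packet per input per cell time
        return transfers


sw = VoqSwitch(2)
sw.enqueue(0, 0, "a")
sw.enqueue(1, 0, "b")    # contends with "a" for output 0
sw.enqueue(1, 1, "c")    # would sit behind "b" in a single FIFO
print(sw.schedule())
```

In the example, input 1 loses the contention for output 0 but still sends packet "c" to output 1 in the same cell time, which a single FIFO per input would not allow.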

Input Queueing
Virtual output queues
[Figure: each input holds a separate queue for every output.]

Input Queues
Virtual Output Queues
[Figure: delay versus load; with virtual output queues the load can approach 100%.]

Input Queueing (VoQ)
• Memory bandwidth = 2R
• The scheduler can be quite complex!

Combined IQ/SQ Architecture
Can be a good compromise
[Figure: inputs 1 to N feed a routing fabric whose N output queues sit in one shared memory; packets (data) flow forward and flow control flows back to the inputs.]

A Comparison
Memory speeds for a 32 x 32 switch, cell size = 64 bytes

              Shared-memory                 Input-queued
Line rate     Memory BW    Access time      Memory BW    Access time
                           per cell                      per cell
100 Mb/s      6.4 Gb/s     80 ns            200 Mb/s     2.56 µs
1 Gb/s        64 Gb/s      8 ns             2 Gb/s       256 ns
2.5 Gb/s      160 Gb/s     3.2 ns           5 Gb/s       102.4 ns
10 Gb/s       640 Gb/s     0.8 ns           20 Gb/s      25.6 ns
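
Every entry in this comparison follows from two formulas given earlier in the deck: a shared memory needs 2NR bits/s, an input queue needs 2R bits/s, and the access time per cell is the cell size divided by that bandwidth. A minimal sketch that regenerates the table (function names are illustrative):

```python
CELL_BITS = 64 * 8   # 64-byte cells
N_PORTS = 32


def shared_memory(rate_bps: float, n_ports: int = N_PORTS):
    """(memory bandwidth in b/s, access time per cell in ns) for shared memory."""
    bandwidth = 2 * n_ports * rate_bps
    return bandwidth, CELL_BITS / bandwidth * 1e9


def input_queued(rate_bps: float):
    """(memory bandwidth in b/s, access time per cell in ns) for input queueing."""
    bandwidth = 2 * rate_bps
    return bandwidth, CELL_BITS / bandwidth * 1e9


for rate in (100e6, 1e9, 2.5e9, 10e9):
    sm_bw, sm_t = shared_memory(rate)
    iq_bw, iq_t = input_queued(rate)
    print(f"{rate / 1e9:g} Gb/s: shared {sm_bw / 1e9:g} Gb/s, {sm_t:g} ns; "
          f"input-queued {iq_bw / 1e9:g} Gb/s, {iq_t:g} ns")
```

The factor of N = 32 between the two access-time columns is the whole argument for input queueing at high line rates.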

Scalability of Switching Fabrics

Shared Bus
• It is the simplest interconnect possible
• Protocols are very well established
• Multicasting and broadcasting are natural
• It has a scalability problem, as we cannot have multiple transmissions concurrently
• Its maximum bandwidth is around 100 Gbps, which limits the maximum number of I/O ports and/or the line rates
• It is typically used for “small” shared memory switches or output-queued switches; a very good choice for Ethernet switches

Crossbars
• The crossbar is becoming the preferred interconnect for high-speed switches
• It has very high throughput, and supports QoS and multicast
• It needs N^2 crosspoints, but this is no longer the real limitation
[Figure: crossbar with data in, data out, and configuration inputs.]

Limiting factors
Crossbar switch:
– N^2 crosspoints per chip
– It’s not obvious how to build a crossbar from multiple chips
– Capacity of “I/O”s per chip
• State of the art: about 200 pins each operating at 3.125 Gb/s ~= 600 Gb/s per chip.
• About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
• Crossbar chips today are limited by the “I/O” capacity.

Limitations to Building Large Crossbar Switches: I/O pins
• Maximum practical bit rate per pin ~ 3.125 Gb/s
– At this speed you need between 2 and 4 pins per single bit
– To achieve a 10 Gb/s (OC-192) line rate, you need around 4 parallel data lines (4-bit parallel transmission)
– For example, consider a 4-bit data-parallel 64-input crossbar that is designed to support OC-192 line rates per port.
– Each port interface would require 4 x 3 = 12 pins in each direction. Hence a 64-port crossbar would need 12 x 64 x 2 = 1536 pins just for the I/O data lines
– Hence, the real problem is I/O pin limitations
• How to solve the problem?

Scaling: Trying to build a crossbar from multiple chips
[Figure: a 16 x 16 crossbar switch assembled from building blocks with 4 inputs and 4 outputs; each block ends up needing eight inputs and eight outputs.]

How to build a scalable crossbar
1. Use bit slicing: parallel crossbars
• For example, the 4-bit crossbar of the previous example can be split into 4 parallel 1-bit crossbars.
• Each port interface would then require 1 x 3 = 3 pins in each direction. Hence a 64-port crossbar would need 3 x 64 x 2 = 384 pins for the I/O data lines, which is reasonable (but we need 4 chips here).
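
The pin counts in the monolithic and bit-sliced designs come from the same product. The sketch below uses 3 pins per bit, the value used in the slide's own arithmetic (within the stated range of 2 to 4); the function name is illustrative.

```python
def data_pins(ports: int, bits_per_port: int,
              pins_per_bit: int = 3, directions: int = 2) -> int:
    """I/O data pins needed by one crossbar chip."""
    return ports * bits_per_port * pins_per_bit * directions


print(data_pins(ports=64, bits_per_port=4))   # one 4-bit-wide chip
print(data_pins(ports=64, bits_per_port=1))   # each of 4 bit-sliced chips
```

Bit slicing leaves the total pin count across the four chips unchanged; what it reduces is the count per chip, which is the quantity limited by packaging.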

Scaling: Bit-slicing
[Figure: a linecard stripes each cell across planes 1 to N under one scheduler.]
• The cell is “striped” across multiple identical planes.
• Crossbar-switched “bus”.
• The scheduler makes the same decision for all slices.

Scaling: Time-slicing
[Figure: a linecard sends successive cells to planes 1 to N in turn under one scheduler.]
• Each cell goes over one plane and takes N cell times.
• The scheduler is unchanged.
• The scheduler makes a decision for each slice in turn.

HKUST 10 Gb/s 256 x 256 Crossbar Switch Fabric Design
• Our overall switch fabric is an OC-192 256 x 256 crossbar switch
• The system is composed of 8 256 x 256 crossbar chips, each running at 2 Gb/s (to compensate for the overhead and to provide a switch speedup)
• The deserializer (DES) converts the OC-192 10 Gb/s data on the fiber link into 8 low-speed signals, while the serializer (SER) serializes the low-speed signals back onto the fiber link

Architecture of the Crossbar Chip
• Crossbar switch core: fulfills the switch functions
• Control: configures the crossbar core
• High-speed data link: communicates between this chip and the SER/DES
• PLL: provides a precise on-chip clock

Technical Specification of our Core-Crossbar Chip
Full crossbar core: 256 x 256 (embedded with 2 bit-slices)
Technology: TSMC 0.25 µm SCN5M Deep (lambda = 0.12 µm)
Layout size: 14 mm x 8 mm
Transistor count: 2000k
Supply voltage: 2.5 V
Clock frequency: 1 GHz
Power: 40 W

Layout of a 256 x 256 crossbar switch core
[Figure: chip layout.]

HKUST Crossbar Chip in the News
Researchers offer alternative to typical crossbar design
http://www.eetimes.com/story/OEG20020820S0054
By Ron Wilson, EE Times, August 21, 2002 (10:56 a.m. ET)
PALO ALTO, Calif. In a technical paper presented at the Hot Chips conference here Monday (Aug. 19), researchers Ting Wu, Chi-Ying Tsui and Mounir Hamdi from Hong Kong University of Science and Technology (China) offered an alternative pipeline approach to crossbar design. Their approach has yielded a 256-by-256 signal switch with a 2-GHz input bandwidth, simulated in a 0.25-micron, 5-metal process. The growing importance of crossbar switch matrices, now used for on-chip interconnect as well as for switching fabric in routers, has led to increased study of the best ways to build these parts.

Scaling a crossbar
• Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).
• In each scheme so far, the number of ports stays the same, but the speed of each port is increased.
• What if we want to increase the number of ports?
• Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
• If so, what properties should it have?

Multi-Stage Switches

Basic Switch Element
• This is equivalent to a crosspoint in the crossbar (no longer a good argument)
• Two inputs (0, 1) and two outputs (0, 1)
• Two states: cross and through
• Optional buffering

Example of Multistage Switch
• It needs N log N internal switches (crosspoints), fewer than the crossbar
[Figure: 8 x 8 network, inputs 0 to 7 and outputs 000 to 111, built from stages of 2 x 2 elements joined by perfect shuffles (the inputs are split into two halves of the deck and interleaved).]

Packet Routing
• The bits of the destination address provide the required routing tags.
• The digits of the destination address are used to set the state of the stages: one bit controls the switch setting in each stage.
[Figure: cells with destination port addresses 011 and 101 routed through stage 1, perfect shuffle, stage 2, perfect shuffle, and stage 3 to outputs 011 and 101.]

Internal blocking
• Internal link blocking as well as output blocking can happen in a multistage switch. The following example illustrates internal blocking for the connections from input 0 to output 3 and from input 4 to output 2.
[Figure: cells with destinations 011 and 010 contend for the same internal link (the blocking link) after stage 1 of the three-stage network.]
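
Destination-tag routing and the blocking example can be reproduced with a few bit operations. The sketch below assumes the standard omega (shuffle-exchange) arrangement, in which a perfect shuffle precedes every stage, including the first; the slide figure is not fully recoverable, so that wiring is an assumption. The function names are illustrative.

```python
def omega_path(src: int, dst: int, n_bits: int) -> list:
    """Link position occupied by a cell after each stage of the network."""
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the position's bits left by one.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The 2x2 element sends the cell to its upper (0) or lower (1) output,
        # chosen by the destination bit for this stage, most significant first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def first_conflict(conn_a: tuple, conn_b: tuple, n_bits: int):
    """First stage (1-based) after which both cells need the same link, or None."""
    path_a = omega_path(conn_a[0], conn_a[1], n_bits)
    path_b = omega_path(conn_b[0], conn_b[1], n_bits)
    for stage, (a, b) in enumerate(zip(path_a, path_b), start=1):
        if a == b:
            return stage
    return None


print(omega_path(0, 3, 3))                 # ends at output 3
print(first_conflict((0, 3), (4, 2), 3))   # the slide's blocking pair
```

Under this wiring, inputs 0 and 4 meet in the same first-stage element and both need its upper output, so the pair from the slide collides right after stage 1.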

Output Blocking
The following example illustrates output blocking for the connections from input 1 to output 6 and from input 3 to output 6.
[Figure: two cells with destination 110 traverse stages 1 to 3 and contend for output 110.]

A Solution: Batcher Sorter
• One solution to the contention problem is to sort the cells into monotonically increasing order based on the desired destination port
• This is done using a bitonic sorter called a Batcher sorter
• It places the M cells into a gap-free increasing sequence on the first M input ports
• Sorting places cells with duplicate destinations next to each other, so that duplicates can be eliminated
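
A bitonic sorter is a fixed pattern of compare-exchange elements, which is why it suits hardware. The recursive sketch below models that pattern in software for a power-of-two number of ports; it is an illustration of the general technique, not a description of any particular switch, and the function names are chosen for this example.

```python
def bitonic_merge(values: list, ascending: bool) -> list:
    """Sort a bitonic sequence with one column of compare-exchange per level."""
    n = len(values)
    if n <= 1:
        return list(values)
    half = n // 2
    vals = list(values)
    for i in range(half):
        # Compare-exchange element between positions i and i + half.
        if (vals[i] > vals[i + half]) == ascending:
            vals[i], vals[i + half] = vals[i + half], vals[i]
    return (bitonic_merge(vals[:half], ascending)
            + bitonic_merge(vals[half:], ascending))


def bitonic_sort(values: list, ascending: bool = True) -> list:
    """Batcher's bitonic sort; the length must be a power of two."""
    n = len(values)
    if n <= 1:
        return list(values)
    half = n // 2
    first = bitonic_sort(values[:half], True)      # ascending half
    second = bitonic_sort(values[half:], False)    # descending half
    return bitonic_merge(first + second, ascending)


print(bitonic_sort([6, 3, 7, 1, 4, 0, 5, 2]))
```

Idle inputs can be modelled by giving them a destination larger than any real port, which pushes them to the end and leaves the active cells gap-free on the first M positions.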

Batcher-Banyan Example
[Figure sequence (seven slides): cells step through an 8-port Batcher sorter followed by a banyan network, with the destination addresses shown at each step.]

Simple Sort & Route Network
[Figure: cells pass through five blocks in sequence: sort, filter, add, concentrate, route.]
• Simple components with no buffering:
– The filter eliminates duplicates by comparing consecutive addresses and returns an ack to the inputs
– The adder computes and inserts the “rank” of each cell
– The concentrator uses the rank as the output address
– The routing network delivers cells to the outputs
• The adder, concentrator and routing network all have log2 n stages

3-stage Clos Network
[Figure: N = n x m inputs and outputs; first stage of m switches of size n x k, middle stage of k switches of size m x m, third stage of m switches of size k x n.]
N = n x m, k >= n

Clos-network Blocking • Blocking – When a connection is made it can exclude the possibility of certain other connections being made • Non-blocking – A new connection can always be made without disturbing the existing connections • Rearrangeably non-blocking – A new connection can be made but it might be necessary to reconfigure some other connections on the switch COMP 680 E by M. Hamdi 125

1 2 3 4 Connection cannot be set up between input 4 and output 1 A connection request from input 4 to output 1 is blocked 1 2 3 4 Connection can now be set up between input 4 and output 1 Same connection request can be satisfied by rearranging the existing connection from input 2 to output 2 COMP 680 E by M. Hamdi 126

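Clos-network Properties Expansion factors • Strictly nonblocking iff m >= 2n - 1 • Rearrangeably nonblocking iff m >= n (here m is the number of middle-stage switches, written k in the preceding 3-stage figure, and n is the number of inputs per first-stage switch)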
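
The two conditions can be turned into a small classifier. The bound 2n - 1 follows from the worst case: the other n - 1 inputs of the source switch may occupy n - 1 middle switches, the other n - 1 outputs of the destination switch may occupy n - 1 different ones, so one more is needed. In the sketch below, `clos_class` is a hypothetical helper, with n the number of inputs per first-stage switch and m the number of middle-stage switches.

```python
def clos_class(n, m):
    """Classify a symmetric 3-stage Clos network.

    n: inputs per first-stage switch
    m: number of middle-stage switches
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    if m >= 2 * n - 1:
        return "strictly nonblocking"
    if m >= n:
        return "rearrangeably nonblocking"
    return "blocking"


examples = {(n, m): clos_class(n, m) for n, m in [(2, 3), (32, 48), (32, 63), (4, 3)]}
```

For instance, n = 32 with m = 48 satisfies m >= n but not m >= 63, so that network is rearrangeably but not strictly nonblocking.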

3-stage Fabrics (Basic building block – a crossbar) Clos Network

3-Stage Fabrics Clos Network Expansion factor required = 2 - 1/N (but still blocking for multicast)

4-Port Clos Network Strictly Non-blocking

Construction example [figure: inputs 1-32, 33-64, ..., 993-1024 feed input modules 32 x 48 #1 to #32; central modules 48 x 48 #1 to #48; output modules 48 x 32 #1 to #32] • Switch size 1024 x 1024 • Construction module – Input switch thirty-two 32 x 48 – Central switch forty-eight 48 x 48 – Output switch thirty-two 48 x 32 – Expansion 48/32 = 1.5
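
A quick crosspoint count shows why the multistage construction is attractive. The sketch below assumes that every module is a full crossbar with inputs x outputs crosspoints and uses the module counts and sizes exactly as listed on the slide; `clos_crosspoints` is a hypothetical helper.

```python
def clos_crosspoints(stages):
    """stages: iterable of (module_count, inputs, outputs) for each stage."""
    return sum(count * inputs * outputs for count, inputs, outputs in stages)


# Module counts and sizes as listed in the construction example.
stages = [
    (32, 32, 48),  # input stage: thirty-two 32 x 48 modules
    (48, 48, 48),  # central stage: forty-eight 48 x 48 modules
    (32, 48, 32),  # output stage: thirty-two 48 x 32 modules
]

ports = 32 * 32                       # 32 modules with 32 external inputs each
clos_total = clos_crosspoints(stages)
crossbar_total = ports * ports        # single-stage crossbar of the same size
ratio = clos_total / crossbar_total
expansion = 48 / 32
```

Under these assumptions the three-stage fabric needs roughly one fifth of the crosspoints of a single 1024 x 1024 crossbar.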

Lucent Architecture Buffers

MSM Architecture

Cisco’s 46 Tbps Switch System [figure: Line Card Chassis LCC(1) to LCC(72), holding line cards LC (1) to LC (1152) and S1/S3 18 x 18 elements (1) to (576); Fabric Card Chassis FCC(1) to FCC(8), holding S2 72 x 72 elements (1) to (144); links labeled 12.5 G and 40 G] • total 80 chassis • 8 sw planes • speedup 2.5 • 1152 LICs • 1296 x 1296 switch fabric • 3-stage Benes sw • multicast in the sw • 1:N fabric redundancy • 40 Gbps packet processor (188 RISCs)
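
The headline numbers on this slide can be cross-checked with simple arithmetic. The sketch below is one reading of the figure labels, not a statement of the vendor's design: it assumes 16 line cards and 8 S1/S3 elements per line card chassis, 18 S2 elements per fabric card chassis, and that the 46 Tbps figure is the number of line cards times 40 Gbps. The function name is hypothetical.

```python
def switch_system_totals(lcc=72, fcc=8, lc_per_lcc=16,
                         s13_per_lcc=8, s2_per_fcc=18, lc_rate_gbps=40):
    """Derive system-wide totals from per-chassis counts (all values assumed from the figure)."""
    line_cards = lcc * lc_per_lcc
    return {
        "chassis": lcc + fcc,
        "line_cards": line_cards,
        "s1_s3_elements": lcc * s13_per_lcc,
        "s2_elements": fcc * s2_per_fcc,
        "capacity_tbps": line_cards * lc_rate_gbps / 1000,
    }


totals = switch_system_totals()
```

Under these assumptions the totals agree with the 80 chassis, 1152 line cards, 576 S1/S3 elements and 144 S2 elements shown, and the aggregate line-card rate comes to about 46 Tbps.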

Massively Parallel Switches • Instead of using tightly coupled fabrics like a crossbar or a bus, they use massively parallel interconnects such as hypercube, 2D torus, and 3D torus. • Few companies use this design architecture for their core routers • These fabrics are generally scalable • However: – It is very difficult to guarantee QoS and to include value-added functionalities (e.g., multicast, fair bandwidth allocation) – They consume a lot of power – They are relatively costly

Massively Parallel Switches

3D Switching Fabric: Avici • Three components – Topology: 3D torus – Routing: source routing with randomization – Flow control: virtual channels and virtual networks • Maximum configuration: 14 x 8 x 5 = 560 • Channel speed is 10 Gbps
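
The 3D torus topology is easy to model: each node has a coordinate (x, y, z) and is wired to the next and previous node along each axis, with wrap-around at the edges. The sketch below uses the 14 x 8 x 5 maximum configuration quoted above; `torus_neighbors` is a hypothetical helper, and routing and flow control are not modelled.

```python
from itertools import product

DIMS = (14, 8, 5)  # maximum configuration quoted on the slide


def torus_neighbors(node, dims=DIMS):
    """Return the neighbours of a node: one step back and forward on each axis, with wrap-around."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(coord))
    return neighbors


nodes = list(product(*(range(d) for d in DIMS)))
node_count = len(nodes)
degrees = {len(set(torus_neighbors(v))) for v in nodes}
```

Because every dimension has at least three nodes, the forward and backward neighbours on each axis are distinct, so every node in this configuration has exactly six neighbours.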

Packaging • Uniformly short wires between adjacent nodes – Can be built in passive backplanes – Run at high speed Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)

Avici: Velociti™ Switch Fabric • Toroidal direct connect fabric (3D torus) • Scales to 560 active modules • Each element adds switching & forwarding capacity • Each module connects to 6 other modules

Switch fabric chips comparison