Introduction to High Performance Internet Switches and Routers
- Slides: 140
COMP 680E by M. Hamdi
Network Architecture
[Figure labels: Long Haul Network (DWDM core, 10 GbE core routers); Metropolitan (10 GbE edge routers); Campus / Residential (10 GbE edge switch, GbE access routers, access switches)]
How the Internet really is: Current Trend
[Figure labels: modems, DSL; SONET/SDH; DWDM]
The Internet is a mesh of routers, mostly interconnected by (ATM and) SONET (and DWDM)
[Figure labels: TDM; circuit-switched crossconnects, DWDM, etc.]
Typical (BUT NOT ALL) IP Backbone (Late 1990s)
[Figure: Core Router, ATM Switch, MUX, SONET/SDH ADM, SONET/SDH DCS, SONET/SDH ADM, MUX, ATM Switch, Core Router, connected in a chain]
Points of Presence (POPs)
[Figure: POP 1 to POP 8, with end points A to F]
Where High Performance Routers are Used
[Figure: routers R1 to R16 interconnected by 2.5 Gb/s links]
Hierarchical arrangement
[Figure labels: end hosts (1000s per mux), access multiplexer, edge routers, core routers, POP, 10 Gb/s "OC192" links]
• POP: Point of Presence.
• Richly interconnected by a mesh of long-haul links.
• Typically: 40 POPs per national network operator; 10-40 core routers per POP.
Typical POP Configuration
[Figure labels: transport network, DWDM/SONET terminal, backbone routers (core routers), 10G WAN transport links, 10G router-to-router intra-office links, aggregation switches/routers (edge switches)]
• More than 50% of high-speed interfaces are router-to-router.
Today's Network Equipment
Layer 3: Routers (Internet Protocol)
Layer 2: Switches (FR & ATM)
Layer 1: SONET
Layer 0: DWDM
Functions in a packet switch
• Ingress linecard: framing, route lookup, TTL processing, buffering
• Interconnect, with interconnect scheduling
• Egress linecard: buffering, QoS scheduling, framing
• Control plane
[Figure legend: control path, data path, scheduling path]
Functions in a circuit switch
• Ingress linecard: framing
• Interconnect, with interconnect scheduling
• Egress linecard: framing
• Control plane
[Figure legend: control path, data path]
Our emphasis for now is on packet switches (IP, ATM, Ethernet, frame relay, etc.)
What a Router Looks Like
• Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW (dimension labels: 19", 6 ft)
• Juniper M160: capacity 80 Gb/s, power 2.6 kW (dimension labels: 3 ft, 2.5 ft)
A Router Chassis
[Figure labels: fans/power supplies, linecards]
Backplane
• A circuit board with connectors for line cards
• High-speed electrical traces connecting line cards to the fabric
• Usually passive
• Typically 30-layer boards
Line Card Picture
What do these two have in common?
[Figure: Cisco Catalyst 3750G and Cisco CRS-1]
What do these two have in common?
CRS-1 linecard:
• 20" x (18"+11") x 1 RU
• 40 Gbps, 80 MPPS
• State-of-the-art 0.13 µm silicon
• Full IP routing stack, including IPv4 and IPv6 support
• Distributed IOS
• Multi-chassis support
Cat 3750G switch:
• 19" x 16" x 1 RU
• 52 Gbps, 78 MPPS
• State-of-the-art 0.13 µm silicon
• Full IP routing stack, including IPv4 and IPv6 support
• Distributed IOS
• Multi-chassis support
What is different between them?
[Figure: Cisco Catalyst 3750G and Cisco CRS-1]
A lot…
CRS-1 linecard:
• Up to 1024 linecards
• Fully programmable forwarding
• 2M prefix entries and 512K ACLs
• 46 Tbps 3-stage switching fabric
• MPLS support
• H-A non-stop routing protocols
• Re-startable routing applications
Cat 3750G switch:
• Up to 9 stack members
• Hardwired ASIC forwarding
• 11K prefix entries and 1.5K ACLs
• 32 Gbps shared stack ring
• L2 switching support
Other packet switches
[Figure: Cisco 7500 "edge" routers; Lucent GX 550 core ATM switch; DSL router]
What is Routing?
[Figure: example network of hosts A to F and routers R1 to R5; a packet addressed to D is forwarded hop by hop]
Example forwarding table:
Destination   Next Hop
D             R3
E             R3
F             R5
[Figure: IPv4 header (20 bytes without options), 32 bits wide: Ver, HLen, T.Service, Total Packet Length; Fragment ID, Flags, Fragment Offset; TTL, Protocol, Header Checksum; Source Address; Destination Address; Options (if any); Data]
Basic Architectural Elements of a Router
Control plane ("typically in software"):
• Routing: routing table update (OSPF, RIP, IS-IS)
• Admission control
• Congestion control
• Reservation
Switch, per-packet processing ("typically in hardware"):
• Routing lookup
• Packet classifier
• Switching: arbitration, scheduling
Basic Architectural Components
Datapath: per-packet processing
1. Forwarding Table
2. Interconnect
3. Output Scheduling
[Figure labels: forwarding decision and forwarding table at each input]
Per-packet processing in a Switch/Router
1. Accept the packet arriving on an ingress line.
2. Look up the packet destination address in the forwarding table to identify the outgoing interface(s).
3. Manipulate the packet header: e.g., decrement TTL, update header checksum.
4. Send the packet to the outgoing interface(s).
5. Queue until the line is free.
6. Transmit the packet onto the outgoing line.
ATM Switch
• Look up cell VCI/VPI in the VC table.
• Replace old VCI/VPI with new.
• Forward cell to outgoing interface.
• Transmit cell onto link.
Ethernet Switch
• Look up frame DA in the forwarding table.
  – If known, forward to correct port.
  – If unknown, broadcast to all ports.
• Learn SA of incoming frame.
• Forward frame to outgoing interface.
• Transmit frame onto link.
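The learn-and-forward behaviour on this slide can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the class name `LearningSwitch`, the method `receive`, and integer port numbers are assumptions made for the example, as are two standard refinements the slide does not spell out (not flooding back out of the arrival port, and filtering frames whose destination sits on the arrival port).

```python
class LearningSwitch:
    """Toy Ethernet switch: learns source addresses, floods unknown destinations."""

    def __init__(self, n_ports):
        self.ports = list(range(n_ports))
        self.table = {}  # MAC address -> port on which it was last seen

    def receive(self, in_port, src, dst):
        """Return the list of ports on which the frame is sent out."""
        self.table[src] = in_port          # learn SA of the incoming frame
        out = self.table.get(dst)          # look up DA in the forwarding table
        if out is None:
            # Unknown DA: broadcast (assumption: skip the arrival port).
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []                      # assumption: filter local traffic
        return [out]                       # known DA: forward to the one port
```

For example, on a 4-port switch a first frame from "aa" to "bb" arriving on port 0 is flooded to ports 1, 2 and 3; once "bb" has replied from port 1, later frames to "bb" go to port 1 only.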
IP Router
• Look up packet DA in the forwarding table.
  – If known, forward to correct port.
  – If unknown, drop packet.
• Decrement TTL, update header checksum.
• Forward packet to outgoing interface.
• Transmit packet onto link.
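The IP router steps can be sketched as below. This is an illustration under stated assumptions rather than how a real line card works: the function names (`ipv4_checksum`, `lookup`, `forward`) and the list-of-prefixes table are hypothetical, the lookup is a linear longest-prefix scan where hardware would use a trie or TCAM, and the TTL-expiry drop is standard router behaviour that the slide does not list.

```python
import ipaddress
import struct

def ipv4_checksum(header):
    """One's-complement checksum over the header's 16-bit words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", bytes(header)))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def lookup(table, addr):
    """Longest-prefix match by linear scan over (prefix, next_hop) pairs."""
    best_len, best_hop = -1, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop

def forward(header, table):
    """Process a 20-byte IPv4 header (bytearray) in place.

    Returns the next hop, or None if the packet is dropped.
    """
    dst = ipaddress.ip_address(bytes(header[16:20]))
    next_hop = lookup(table, dst)
    if next_hop is None:
        return None                         # unknown destination: drop
    if header[8] <= 1:
        return None                         # TTL expired (not on the slide)
    header[8] -= 1                          # decrement TTL
    header[10:12] = b"\x00\x00"             # zero the checksum field ...
    struct.pack_into("!H", header, 10, ipv4_checksum(header))  # ... recompute
    return next_hop
```

A receiver can validate the result by running `ipv4_checksum` over the whole header, checksum field included; a correct header yields 0.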
Special per-packet/per-flow processing
• The router can be equipped with additional capabilities to provide special services on a per-packet or per-class basis.
• The router can perform additional processing on incoming packets:
  – Classifying the packet: IPv4, IPv6, MPLS, ...
  – Delivering packets according to a pre-agreed service, absolute or relative (e.g., send a packet within a given deadline, or give one packet better service than another; IntServ, DiffServ)
  – Filtering packets for security reasons
  – Treating multicast packets differently from unicast packets
Per-packet Processing Must be Fast!
Year   Aggregate line rate   Arrival rate of 40 B POS packets (million pkts/sec)
1997   622 Mb/s              1.56
1999   2.5 Gb/s              6.25
2001   10 Gb/s               25
2003   40 Gb/s               100
2006   80 Gb/s               200
1. Packet processing must be simple and easy to implement.
2. Memory access time is the bottleneck:
   200 Mpps x 2 lookups/pkt = 400 Mlookups/sec → 2.5 ns per lookup
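The table and the 2.5 ns budget follow from simple arithmetic, sketched below. One assumption is needed to reproduce the slide's figures: every row corresponds to 400 bits on the wire per packet (a 40 B packet plus framing overhead). That value is inferred from the numbers, not stated on the slide, and the function names are illustrative.

```python
def packet_rate_mpps(line_rate_bps, bits_per_packet=400):
    """Packet arrival rate in million packets per second.

    bits_per_packet=400 is an inferred assumption (40 B packet plus overhead).
    """
    return line_rate_bps / bits_per_packet / 1e6

def lookup_budget_ns(rate_mpps, lookups_per_packet=2):
    """Time available per memory lookup, in nanoseconds."""
    lookups_per_second = rate_mpps * 1e6 * lookups_per_packet
    return 1e9 / lookups_per_second

for year, rate in [(1997, 622e6), (1999, 2.5e9), (2001, 10e9),
                   (2003, 40e9), (2006, 80e9)]:
    mpps = packet_rate_mpps(rate)
    print(year, f"{mpps:.2f} Mpps", f"{lookup_budget_ns(mpps):.2f} ns per lookup")
```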
First Generation Routers
[Figure: shared backplane connecting a CPU (with route table and buffer memory) to line interfaces (MAC)]
Typically <0.5 Gb/s aggregate capacity
Bus-based Router Architectures with Single Processor
• The first generation of IP routers
• Based on software implementations on a single general-purpose CPU
• Limitations:
  – Serious processing bottleneck in the central processor
  – Memory-intensive operations (e.g., table lookup and data movement) limit the effectiveness of processor power
  – The input/output (I/O) bus severely limits overall router throughput
Second Generation Routers
[Figure: CPU with route table and buffer memory; line cards, each with buffer memory, forwarding cache and MAC]
Typically <5 Gb/s aggregate capacity
Bus-based Router Architectures with Multiple Processors
• Architectures with route caching
  – Second generation IP routers
  – Distribute packet forwarding operations to the network interface cards, which carry processors and route caches
  – Packets are transmitted once over the shared bus
  – Limitations:
    » The central routing table is a bottleneck at high speeds
    » Throughput is traffic dependent
    » The shared bus is still a bottleneck
Limitation of IP Packet Forwarding based on Route Caching
• Routing changes invalidate existing cache entries, which then need to be re-established
• The performance depends on:
  – a. how big the cache is
  – b. how the cache is maintained
  – c. what the performance of the slow path is
• Solution:
  – Use a forwarding database in each network interface
• Benefit:
  – Performance, scalability, network resilience, and functionality
Third Generation Routers
[Figure: switched backplane connecting a CPU card (routing table) to line cards, each with local buffer memory, forwarding table and MAC]
Typically <50 Gb/s aggregate capacity
Switch-based Router Architectures with Fully Distributed Processors
• To avoid bottlenecks in:
  – Processing power
  – Memory bandwidth
  – Internal bus bandwidth
• Each network interface is equipped with appropriate processing power and buffer space.
Fourth Generation Routers/Switches
Optics inside a router for the first time
[Figure: linecards connected to a switch core by optical links, 100s of metres long]
0.3 - 10 Tb/s routers in development
[Figures: Alcatel 7670 RSP; Juniper TX8/T640; Avici TSR; Chiaro]
Next Gen. Backbone Network Architecture: One backbone, multiple access networks
[Figure labels: (G)MPLS-based multiservice intelligent packet backbone network with PE routers and service POPs; access networks attached through CE routers: dual-stack IPv4-IPv6 cable network, dual-stack IPv4-IPv6 enterprise network, residential DSL/FTTH/dial access, GGSN/SGSN, ISP's telecomm dual-stack IPv4-IPv6 DSL/FTTH/dial access network, IPv6 IX, ISP offering native IPv6 services]
• One backbone network
• Maximizes speed, flexibility and manageability
Current Generation: Generic Router Architecture
[Figure: a packet (data + header) passes through header processing (look up IP address, update header), then is queued in buffer memory. The address table holds ~1M prefixes in off-chip DRAM; the buffer memory holds ~1M packets in off-chip DRAM.]
Current Generation: Generic Router Architecture (IQ)
[Figure: ports 1 to N, each with header processing (look up IP address in address table, update header) followed by a packet queue in buffer memory at the input; a scheduler arbitrates transfers to the outputs.]
Current Generation: Generic Router Architecture (OQ)
[Figure: ports 1 to N, each with header processing (look up IP address in address table, update header) at the input; packets are queued in buffer memory at the outputs.]
Basic Architectural Elements of a Current Router
Typical IP router linecard:
[Figure labels: physical layer; framing & maintenance; packet processing with lookup tables; buffer management & scheduling with buffer & state memory (in each direction); backplane; scheduler; buffered or bufferless fabric (e.g., crossbar, bus)]
OC192c linecard:
• ~10-30M gates
• ~2 Gbits of memory
• ~2 square feet
• >$10k cost; price $100K
Performance metrics
1. Capacity
   – "maximize C, s.t. volume < 2 m³ and power < 5 kW"
2. Throughput
   – Operators like to maximize usage of expensive long-haul links.
3. Controllable delay
   – Some users would like predictable delay.
   – This is feasible with output queueing plus weighted fair queueing (WFQ).
Why do we Need Faster Routers?
1. To prevent routers from becoming the bottleneck in the Internet.
2. To increase POP capacity, and to reduce cost, size and power.
Why we Need Faster Routers 1: To prevent routers from being the bottleneck
Line capacity:             2x / 7 months
User traffic:              2x / 12 months
Router capacity:           2.2x / 18 months
Moore's Law:               2x / 18 months
DRAM random access time:   1.1x / 18 months
Why we Need Faster Routers 1: To prevent routers from being the bottleneck
[Figure: traffic and router capacity over time, showing a 5-fold disparity between traffic growth and router capacity growth]
Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs
[Figure: POP with large routers versus POP with smaller routers]
• Big POPs need big routers.
• Interfaces: price > $200k, power > 400 W.
• About 50-60% of interfaces are used for interconnection within the POP.
• The industry trend is towards a large, single router per POP.
A Case Study: UUNET Internet Backbone Build-Up
1999 view (4Q):
• 8 OC-48 links between POPs (not parallel)
2000 view (4Q):
• 52 OC-48 links between POPs: many parallel links
• 3 OC-192 Super POP links: multiple parallel interfaces between POPs (D.C. – Chicago; NYC – D.C.)
To meet the traffic growth, higher performance routers with higher port speeds are required.
Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs
• Further reduces CapEx and operational cost
• Further increases network stability
Ideal POP
[Figure: carrier optical transport (DWDM and optical switches) connecting gigabit routers to existing carrier equipment: VoIP gateways, SONET, digital subscriber line aggregation, ATM, Gigabit Ethernet, cable modem aggregation]
Why are Fast Routers Difficult to Make?
1. Big disparity between line rates and memory access speed
Problem: Fast Packet Buffers
Example: 40 Gb/s packet buffer
• Size = RTT x BW = 10 Gb; 64-byte packets
• Write rate R: 1 packet every 12.8 ns
• Read rate R: 1 packet every 12.8 ns
[Figure: buffer manager in front of the buffer memory]
Use SRAM?
+ fast enough random access time, but
- too low density to store 10 Gb of data.
Use DRAM?
+ high density means we can store the data, but
- too slow (50 ns random access time).
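The two numbers in this example can be checked with a short calculation. The sketch below assumes a round-trip time of 0.25 s, which is the value implied by 10 Gb at 40 Gb/s; the slide does not state the RTT itself, and the function names are illustrative.

```python
def buffer_size_bits(rtt_s, line_rate_bps):
    """Rule-of-thumb buffer size: round-trip time times line rate."""
    return rtt_s * line_rate_bps

def packet_time_ns(packet_bytes, line_rate_bps):
    """Time one packet occupies the line, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9

LINE_RATE = 40e9                                  # 40 Gb/s
print(buffer_size_bits(0.25, LINE_RATE) / 1e9)    # buffer size in Gb (RTT assumed 0.25 s)
print(packet_time_ns(64, LINE_RATE))              # ns between 64-byte packets
```

Because one write and one read must complete within each packet time, the memory has roughly half of that interval per operation, which is why a 50 ns DRAM access does not fit.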
Memory Technology (2006)
Technology        Max single-chip density   $/chip ($/MByte)          Access speed   Watts/chip
Networking DRAM   64 MB                     $30-$50 ($0.50-$0.75)     40-80 ns       0.5-2 W
SRAM              8 MB                      $50-$60 ($5-$8)           3-4 ns         2-3 W
TCAM              2 MB                      $200-$250 ($100-$125)     4-8 ns         15-30 W
How fast can a buffer be made?
[Figure: external line feeding a buffer memory over a 64-byte-wide bus; ~5 ns access for SRAM, ~50 ns for DRAM]
Rough estimate:
– 5/50 ns per memory operation (SRAM/DRAM).
– Two memory operations per packet.
– Therefore, maximum ~50/5 Gb/s (SRAM/DRAM).
Aside: Buffers need to be large for TCP to work well, so DRAM is usually required.
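The rough estimate is the bus width divided by the time for one write plus one read. A minimal sketch of that calculation follows; the function name and parameters are illustrative, not from the slides.

```python
def max_buffer_rate_gbps(bus_bytes, access_ns, ops_per_packet=2):
    """Upper bound on line rate for a single buffer memory.

    Each packet-sized word crosses the bus once per memory operation,
    and needs ops_per_packet operations (one write, one read).
    Bits per nanosecond are numerically equal to Gb/s.
    """
    return bus_bytes * 8 / (access_ns * ops_per_packet)

print(max_buffer_rate_gbps(64, 5))    # SRAM, 64-byte bus
print(max_buffer_rate_gbps(64, 50))   # DRAM, 64-byte bus
print(max_buffer_rate_gbps(200, 5))   # SRAM, 200-byte bus
```

The first two calls give 51.2 and 5.12 Gb/s, matching the "~50/5 Gb/s" estimate; the third gives the 160 Gb/s quoted later in the deck for a shared memory switch with a 200-byte bus.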
Packet Caches
[Figure: arriving packets enter a small ingress SRAM cache of FIFO tails; a buffer manager moves b >> 1 packets at a time between the SRAM and the DRAM buffer memory, which holds queues 1 to Q; departing packets leave from a small SRAM cache of FIFO heads]
Why are Fast Routers Difficult to Make?
Packet processing gets harder
[Figure: instructions per arriving byte over time; "what we'd like" (more features: QoS, multicast, security, …) versus "what will happen"]
Why are Fast Routers Difficult to Make?
[Figure: clock cycles per minimum-length packet since 1996]
Options for packet processing
• General purpose processor
  – MIPS
  – PowerPC
  – Intel
• Network processor
  – Intel IXA and IXP processors
  – IBM Rainier
  – Control plane processors: SiByte (Broadcom), QED (PMC-Sierra)
• FPGA
• ASIC
General Observations
• Up until about 2000:
  – Low-end packet switches used general purpose processors.
  – Mid-range packet switches used FPGAs for the datapath and general purpose processors for the control plane.
  – High-end packet switches used ASICs for the datapath and general purpose processors for the control plane.
• More recently:
  – Third-party network processors are used in many low- and mid-range datapaths.
  – Home-grown network processors are used in the high end.
Why are Fast Routers Difficult to Make?
Demand for router performance exceeds Moore's Law
Growth in capacity of commercial routers (per rack):
– Capacity 1992 ~ 2 Gb/s
– Capacity 1995 ~ 10 Gb/s
– Capacity 1998 ~ 40 Gb/s
– Capacity 2001 ~ 160 Gb/s
– Capacity 2003 ~ 640 Gb/s
Average growth rate: 2.2x / 18 months.
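The 2.2x figure can be recovered from the first and last data points as a compound growth rate. A small sketch, with an illustrative function name:

```python
def growth_per_period(c_start, c_end, years, period_months=18):
    """Compound growth factor per period, given start and end capacity."""
    periods = years * 12 / period_months
    return (c_end / c_start) ** (1 / periods)

# 2 Gb/s in 1992 to 640 Gb/s in 2003: a factor of 320 over 11 years.
rate = growth_per_period(2, 640, 2003 - 1992)
print(f"{rate:.2f}x per 18 months")
```

This comes out at roughly 2.2x per 18 months, compared with 2x per 18 months for Moore's Law.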
Maximizing the throughput of a router: the engine of the whole router
• Operators increasingly demand throughput guarantees:
  – To maximize use of expensive long-haul links
  – For predictability and planning
  – To serve as many customers as possible
  – To increase the lifetime of the equipment
• Despite lots of effort and theory, no commercial router today has a throughput guarantee.
Maximizing the throughput of a router: the engine of the whole router
[Figure: functions in a packet switch. Ingress linecard (framing, route lookup, TTL processing, buffer); interconnect with interconnect scheduling; egress linecard (buffer, QoS scheduling, framing); control plane; control, data and scheduling paths]
Maximizing the throughput of a router: the engine of the whole router
• This depends on the switching architecture:
  – Input queued
  – Output queued
  – Shared memory
• It depends on the arbitration/scheduling algorithms within the specific architecture.
• This is key to the overall performance of the router.
Why are Fast Routers Difficult to Make?
Power: it is exceeding the limit
Switching Architectures
Generic Router Architecture
[Figure: output-queued router. Ports 1 to N, each with header processing (look up IP address in address table, update header); packets are queued in buffer memory at the outputs; the path into each output queue runs at N times line rate]
Generic Router Architecture
[Figure: input-queued router. Ports 1 to N, each with header processing and a packet queue in buffer memory at the input; a scheduler arbitrates transfers to the outputs]
Interconnects
Two basic techniques:
• Input queueing: usually a non-blocking switch fabric (e.g., crossbar)
• Output queueing: usually a fast bus
Simple model of output queued switch
[Figure: ingress links 1 to 4 and egress links 1 to 4, all at link rate R]
How an OQ Switch Works
[Figure: Output Queued (OQ) switch]
Characteristics of an output queued (OQ) switch
• Arriving packets are immediately written into the output queue, without intermediate buffering.
• The flow of packets to one output does not affect the flow to another output.
• An OQ switch has the highest throughput and the lowest delay.
• The rate of individual flows and the delay of packets can be controlled (QoS).
The shared memory switch
[Figure: ingress links 1 to N and egress links 1 to N, all at rate R, sharing a single physical memory device]
Characteristics of a shared memory switch
Memory bandwidth
Basic OQ switch:
• Consider an OQ switch with N different physical memories, and all links operating at rate R bits/s.
• In the worst case, packets may arrive continuously from all inputs, destined to just one output.
• The maximum memory bandwidth requirement for each memory is (N+1)R bits/s: N writes plus one read per packet time.
Shared memory switch:
• The maximum memory bandwidth requirement for the memory is 2NR bits/s.
How fast can we make a centralized shared memory switch?
[Figure: ports 1 to N attached to a 5 ns SRAM shared memory over a 200-byte bus]
• 5 ns per memory operation
• Two memory operations per packet
• Therefore, up to 160 Gb/s (200 x 8 bits / 10 ns)
• In practice, closer to 80 Gb/s
Output Queueing: the "ideal"
[Figure: packets queued at outputs 1 and 2]
How to Solve the Memory Bandwidth Problem? Use Input Queued Switches
• In the worst case, one packet is written to and one packet is read from an input buffer per packet time.
• The maximum memory bandwidth requirement for each memory is 2R bits/s.
• However, using FIFO input queues can result in what is called "Head-of-Line (HoL)" blocking.
Input Queueing: Head of Line Blocking
[Figure: delay versus load; with FIFO input queues the delay grows without bound as the load approaches 58.6%, well short of 100%]
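The 58.6% limit is the saturation throughput of a FIFO input-queued switch under uniform random traffic, which tends to 2 − √2 ≈ 0.586 as the number of ports grows. The following Monte Carlo sketch illustrates the effect under stated assumptions: every input always has a packet at the head of its queue, destinations are uniform and independent, and each output picks one contending input at random. The function name and parameters are illustrative.

```python
import random

def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line packet at each input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(n_slots):
        # Group inputs by the output their head-of-line packet wants.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves exactly one contender; losers stay blocked.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)   # next packet moves to the head
            served += 1
    return served / (n_ports * n_slots)

print(hol_saturation_throughput(16, 2000))
```

For a 16-port switch the result should land around 0.6; it drifts towards 0.586 as the port count increases, while small switches do somewhat better.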
Head of Line Blocking
Virtual Output Queues (VOQ)
• Virtual output queues:
  – At each input port there are N queues, each associated with an output port
  – Only one packet can go from an input port at a time
  – Only one packet can be received by an output port at a time
• It retains the scalability of FIFO input-queued switches
• It eliminates the HoL problem of FIFO input queues
Input Queueing: virtual output queues
Input Queues: Virtual Output Queues
[Figure: delay versus load; with virtual output queues the load can approach 100%]
Input Queueing (VOQ)
• Memory bandwidth = 2R
• The scheduler can be quite complex!
Combined IQ/SQ Architecture: can be a good compromise
[Figure: inputs 1 to N feed a routing fabric; N output queues are held in one shared memory; packets (data) flow forward and flow control flows back]
A Comparison
Memory speeds for a 32 x 32 switch, cell size = 64 bytes
              Shared-memory                        Input-queued
Line rate     Memory BW    Access time per cell    Memory BW    Access time
100 Mb/s      6.4 Gb/s     80 ns                   200 Mb/s     2.56 µs
1 Gb/s        64 Gb/s      8 ns                    2 Gb/s       256 ns
2.5 Gb/s      160 Gb/s     3.2 ns                  5 Gb/s       102.4 ns
10 Gb/s       640 Gb/s     0.8 ns                  20 Gb/s      25.6 ns
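Every entry in this table follows from the memory bandwidth formulas given earlier in the deck: 2NR for a shared memory and 2R for an input queue, with the access time being the time to move one 64-byte cell at that bandwidth. A small sketch that regenerates the rows (function names are illustrative):

```python
def shared_memory(n_ports, line_rate_bps, cell_bytes=64):
    """Return (memory bandwidth in b/s, access time per cell in s)."""
    bandwidth = 2 * n_ports * line_rate_bps   # N writes + N reads per cell time
    return bandwidth, cell_bytes * 8 / bandwidth

def input_queued(line_rate_bps, cell_bytes=64):
    """Return (memory bandwidth in b/s, access time per cell in s)."""
    bandwidth = 2 * line_rate_bps             # one write + one read per cell time
    return bandwidth, cell_bytes * 8 / bandwidth

for rate in (100e6, 1e9, 2.5e9, 10e9):
    sm_bw, sm_t = shared_memory(32, rate)
    iq_bw, iq_t = input_queued(rate)
    print(f"{rate / 1e9:5.1f} Gb/s | {sm_bw / 1e9:6.1f} Gb/s {sm_t * 1e9:6.1f} ns"
          f" | {iq_bw / 1e9:5.1f} Gb/s {iq_t * 1e9:7.1f} ns")
```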
Scalability of Switching Fabrics
Shared Bus
• It is the simplest interconnect possible.
• Protocols are very well established.
• Multicasting and broadcasting are natural.
• It has a scalability problem, as multiple transmissions cannot take place concurrently.
• Its maximum bandwidth is around 100 Gbps, which limits the maximum number of I/O ports and/or the line rates.
• It is typically used for "small" shared memory switches or output-queued switches; a very good choice for Ethernet switches.
Crossbars
[Figure: crossbar with data in, data out and configuration inputs]
• The crossbar is becoming the preferred interconnect for high-speed switches.
• Crossbars have very high throughput, and support QoS and multicast.
• N² crosspoints, but this is no longer the real limitation.
Limiting factors
Crossbar switch:
– N² crosspoints per chip
– It's not obvious how to build a crossbar from multiple chips
– Capacity of "I/O"s per chip
  • State of the art: about 200 pins, each operating at 3.125 Gb/s ~= 600 Gb/s per chip.
  • About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
  • Crossbar chips today are limited by the "I/O" capacity.
Limitations to Building Large Crossbar Switches: I/O pins
• Maximum practical bit rate per pin ~ 3.125 Gb/s
  – At this speed you need between 2 and 4 pins per single bit.
  – To achieve a 10 Gb/s (OC-192) line rate, you need around 4 parallel data lines (4-bit parallel transmission).
  – For example, consider a 4-bit data-parallel 64-input crossbar that is designed to support OC-192 line rates per port.
  – With 3 pins per bit, each port interface would require 4 x 3 = 12 pins in each direction. Hence a 64-port crossbar would need 12 x 64 x 2 = 1536 pins just for the I/O data lines.
  – Hence, the real problem is the I/O pin limitation.
• How to solve the problem?
Scaling: Trying to build a crossbar from multiple chips
[Figure: a 16 x 16 crossbar switch assembled from building blocks with 4 inputs and 4 outputs]
Eight inputs and eight outputs required!
How to build a scalable crossbar
1. Use bit slicing: parallel crossbars
• For example, the 4-bit-wide crossbar of the previous example can be implemented as 4 parallel 1-bit crossbars.
• Each port interface would require 1 x 3 = 3 pins in each direction. Hence a 64-port crossbar would need 3 x 64 x 2 = 384 pins for the I/O data lines, which is reasonable (but we need 4 chips here).
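The pin counts in the monolithic and bit-sliced designs come from the same product. A short sketch of the arithmetic, using the 3 pins per bit from the example above (the function name is illustrative):

```python
def data_pins(ports, bits_wide, pins_per_bit=3):
    """I/O data pins for one crossbar chip: every port has an input and an output side."""
    return ports * bits_wide * pins_per_bit * 2

print(data_pins(64, 4))   # single chip carrying all 4 bits of every port
print(data_pins(64, 1))   # one of 4 bit-sliced chips, each carrying 1 bit
```

Bit slicing leaves the total pin count across the four chips unchanged at 1536; it only divides it so that each chip stays within a practical package.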
Scaling: Bit-slicing
[Figure: linecards 1 to N send each cell across 8 parallel crossbar planes controlled by one scheduler]
• Cell is "striped" across multiple identical planes.
• Crossbar switched "bus".
• Scheduler makes the same decision for all slices.
Scaling: Time-slicing
[Figure: linecards 1 to N send successive cells to different planes (1 to 8) controlled by one scheduler]
• Cell goes over one plane; takes N cell times.
• Scheduler is unchanged.
• Scheduler makes a decision for each slice in turn.
HKUST 10 Gb/s 256 x 256 Crossbar Switch Fabric Design
• Our overall switch fabric is an OC-192 256 x 256 crossbar switch.
• The system is composed of 8 256 x 256 crossbar chips, each running at 2 Gb/s (to compensate for the overhead and to provide a switch speedup).
• The deserializer (DES) converts the OC-192 10 Gb/s data on the fiber link into 8 low-speed signals, while the serializer (SER) serializes the low-speed signals back onto the fiber link.
Architecture of the Crossbar Chip
• Crossbar switch core: fulfills the switch functions
• Control: configures the crossbar core
• High-speed data link: communicates between this chip and the SER/DES
• PLL: provides a precise on-chip clock
Technical Specification of our Core-Crossbar Chip
Full crossbar core   256 x 256 (embedded with 2 bit-slices)
Technology           TSMC 0.25 µm SCN5M Deep (lambda = 0.12 µm)
Layout size          14 mm x 8 mm
Transistor count     2000k
Supply voltage       2.5 V
Clock frequency      1 GHz
Power                40 W
Layout of a 256 x 256 crossbar switch core
HKUST Crossbar Chip in the News
Researchers offer alternative to typical crossbar design
http://www.eetimes.com/story/OEG20020820S0054
By Ron Wilson, EE Times, August 21, 2002 (10:56 a.m. ET)
PALO ALTO, Calif. In a technical paper presented at the Hot Chips conference here Monday (Aug. 19), researchers Ting Wu, ChiYing Tsui and Mounir Hamdi from Hong Kong University of Science and Technology (China) offered an alternative pipeline approach to crossbar design. Their approach has yielded a 256-by-256 signal switch with a 2-GHz input bandwidth, simulated in a 0.25-micron, 5-metal process. The growing importance of crossbar switch matrices, now used for on-chip interconnect as well as for switching fabric in routers, has led to increased study of the best ways to build these parts.
Scaling a crossbar
• Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).
• In each scheme so far, the number of ports stays the same, but the speed of each port is increased.
• What if we want to increase the number of ports?
• Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
• If so, what properties should it have?
Multi-Stage Switches
Basic Switch Element
• This is equivalent to a crosspoint in the crossbar (no longer a good argument).
• Two inputs (0, 1) and two outputs (0, 1)
• Two states: cross and through
• Optional buffering
Example of Multistage Switch
• It needs N log N internal switches (crosspoints), fewer than the crossbar.
[Figure: 8 x 8 multistage switch with inputs 0 to 7 and outputs 000 to 111; columns of 2 x 2 elements are connected by a perfect shuffle, which interleaves one half of the deck with the other half]
Packet Routing
• The bits of the destination address provide the required routing tags: each bit of the destination port address controls the switch setting in one stage.
[Figure: 8 x 8 network, stage 1, perfect shuffle, stage 2, perfect shuffle, stage 3; two cells with destination port addresses 011 and 101 are routed to outputs 011 and 101]
Internal blocking
• Internal link blocking as well as output blocking can happen in a multistage switch. The following example illustrates internal blocking for connections from input 0 to output 3 and from input 4 to output 2.
[Figure: cells with destinations 011 and 010 contend for the same internal link (the blocking link) between stages]
Output Blocking
• The following example illustrates output blocking for the connections from input 1 to output 6 and from input 3 to output 6.
[Figure: two cells with destination 110 contend for output 110 after stage 3]
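The self-routing rule and both kinds of blocking can be reproduced with a small model. The sketch below assumes the topology as drawn on these slides: three columns of 2 x 2 elements with a perfect shuffle between consecutive columns, a routing tag of 0 selecting the upper output of an element and 1 the lower. The function names are illustrative.

```python
def shuffle(pos, n_bits):
    """Perfect shuffle: rotate the n-bit link index left by one position."""
    mask = (1 << n_bits) - 1
    return ((pos << 1) | (pos >> (n_bits - 1))) & mask

def route(src, dst, n_bits=3):
    """Return the link index a cell occupies at the output of each stage.

    Destination bits are consumed most significant first, one per stage.
    """
    pos, links = src, []
    for stage in range(n_bits):
        tag = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | tag            # 2x2 element: 0 = upper, 1 = lower output
        links.append(pos)
        if stage < n_bits - 1:
            pos = shuffle(pos, n_bits)    # wiring to the next column
    return links

def first_conflict(conn_a, conn_b, n_bits=3):
    """Return the 1-based stage at whose output two (src, dst) connections collide."""
    path_a, path_b = route(*conn_a, n_bits), route(*conn_b, n_bits)
    for stage, (a, b) in enumerate(zip(path_a, path_b), start=1):
        if a == b:
            return stage
    return None

print(first_conflict((0, 3), (4, 2)))   # internal blocking example
print(first_conflict((1, 6), (3, 6)))   # output blocking example
```

In this model the first pair collides at the output of stage 2, an internal link, while the second pair collides only at the output of stage 3, which is the output port itself. A conflict at any stage before the last is internal blocking.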
A Solution: Batcher Sorter
• One solution to the contention problem is to sort the cells into monotonically increasing order based on the desired destination port.
• This is done using a bitonic sorter called a Batcher sorter.
• It places the M cells in a gap-free increasing sequence on the first M input ports.
• Duplicate destinations end up next to each other, so they can be detected and eliminated.
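A bitonic sorting network of the kind used in a Batcher sorter can be sketched as follows. This is a software illustration of the compare-exchange pattern, not a hardware description; the function name is illustrative, and modelling idle ports with a key larger than any real port number (8 here) is an assumption made for the example.

```python
def bitonic_sort(keys):
    """Sort a power-of-two-length list with Batcher's bitonic network.

    Returns (sorted_keys, number_of_compare_exchange_stages).
    """
    n = len(keys)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(keys)
    stages = 0
    k = 2
    while k <= n:                 # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:              # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending and a[i] != a[partner]:
                        a[i], a[partner] = a[partner], a[i]
            stages += 1
            j //= 2
        k *= 2
    return a, stages

# Four cells for ports 3, 6, 1 and 4 on an 8-port switch; 8 marks an idle input.
print(bitonic_sort([3, 8, 6, 8, 1, 8, 8, 4]))
```

All comparisons within one stage are independent, which is what makes the network suitable for hardware: an 8-input sorter needs 6 stages, in general log2(n) x (log2(n) + 1) / 2. With idle inputs given the largest key, the real cells come out gap-free on the first ports.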
Batcher-Banyan Example
[Figure: seven-step animation of cells passing through an 8-port Batcher sorter followed by a banyan network, ending with every cell at its destination port]
Simple Sort & Route Network
[Figure: cells pass through Sort, Filter, Add, Conc. (concentrate) and Route stages]
• Simple components with no buffering:
  – The filter eliminates duplicates by comparing consecutive addresses, and returns an ack to the inputs.
  – The adder computes and inserts the "rank" of cells.
  – The concentrator uses the rank as the output address.
  – The routing network delivers cells to the outputs.
• The adder, concentrator and routing network all have log2 n stages.
3-stage Clos Network
[Figure: N = n x m inputs; m first-stage switches of size n x k, k middle-stage switches of size m x m, and m third-stage switches of size k x n; k >= n]
Clos-network Blocking
• Blocking
  – When a connection is made, it can exclude the possibility of certain other connections being made.
• Non-blocking
  – A new connection can always be made without disturbing the existing connections.
• Rearrangeably non-blocking
  – A new connection can be made, but it might be necessary to reconfigure some other connections on the switch.
[Figure, left: a connection cannot be set up between input 4 and output 1; the connection request from input 4 to output 1 is blocked]
[Figure, right: a connection can now be set up between input 4 and output 1; the same request is satisfied by rearranging the existing connection from input 2 to output 2]
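Clos-network Properties
Expansion factors (n = inputs per first-stage switch, m = number of middle-stage switches; the middle-stage count is written k on the 3-stage Clos diagram)
• Strictly non-blocking iff m >= 2n - 1
• Rearrangeably non-blocking iff m >= n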
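The two conditions translate directly into a classification of a 3-stage Clos network. In this sketch the function names are illustrative; `n` and `m` follow the definitions above.

```python
def clos_class(n, m):
    """Classify a 3-stage Clos network for unicast connections.

    n: inputs per first-stage switch; m: number of middle-stage switches.
    """
    if m >= 2 * n - 1:
        return "strictly non-blocking"
    if m >= n:
        return "rearrangeably non-blocking"
    return "blocking"

def expansion_factor(n, m):
    """Ratio of middle-stage paths to inputs on a first-stage switch."""
    return m / n

print(clos_class(32, 48), expansion_factor(32, 48))
print(clos_class(32, 63), expansion_factor(32, 63))
```

A fabric with n = 32 and m = 48, for instance, has an expansion of 1.5 and is rearrangeably but not strictly non-blocking; strict non-blocking with n = 32 would need m = 63, an expansion of 2 - 1/32.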
3 -stage Fabrics (Basic building block – a crossbar) Clos Network COMP 680 E by M. Hamdi 128
3-Stage Fabrics
Clos Network [figure]
Expansion factor required = 2 - 1/N (but still blocking for multicast)
4-Port Clos Network [figure]
Strictly Non-blocking
Construction example
[figure: input switches 32 x 48 #1 to #32 (ports 1 to 32 on #1, 33 to 64 on #2, ..., 993 to 1024 on #32), central switches 48 x 48 #1 to #48, output switches 48 x 32 #1 to #32]
• Switch size 1024 x 1024
• Construction module
– Input switch: thirty-two 32 x 48
– Central switch: forty-eight 48 x 48 (each central switch links to 32 input and 32 output switches, so only 32 x 32 of each module is used)
– Output switch: thirty-two 48 x 32
– Expansion: 48/32 = 1.5
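
As a rough check on the construction example, the sketch below counts crosspoints, assuming every module is a full crossbar. The slide does not give this comparison: the helper `clos_crosspoints` and its parameter names are invented here, and the figures are plain arithmetic on the module sizes listed.

```python
def clos_crosspoints(n, m, r, central_ports=None):
    """Crosspoints in a 3-stage Clos network built from crossbar modules.

    n: inputs per input switch, m: number of central switches,
    r: number of input (and output) switches.
    central_ports: port count of each central module (defaults to r).
    """
    if central_ports is None:
        central_ports = r
    input_stage = r * (n * m)
    central_stage = m * (central_ports * central_ports)
    output_stage = r * (m * n)
    return input_stage + central_stage + output_stage


ports = 32 * 32                       # thirty-two input switches, 32 ports each
single_crossbar = ports * ports       # one monolithic 1024 x 1024 crossbar
as_listed = clos_crosspoints(32, 48, 32, central_ports=48)   # 48 x 48 modules
minimal = clos_crosspoints(32, 48, 32)                       # 32 x 32 would suffice

print(ports, single_crossbar, as_listed, minimal)
```

Under these assumptions the three-stage design needs roughly a fifth of the crosspoints of a single 1024 x 1024 crossbar.
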
Lucent Architecture [figure; visible label: Buffers]
MSM Architecture [figure]
Cisco’s 46 Tbps Switch System
[figure: Line Card Chassis LCC(1) to LCC(72), each holding 16 line cards (LC(1) to LC(1152) overall) and 8 S1/S3 18 x 18 elements (S1/S3(1) to S1/S3(576) overall); Fabric Card Chassis FCC(1) to FCC(8), each holding 18 S2 72 x 72 elements (S2(1) to S2(144) overall); link labels 40 G and 12.5 G]
• total 80 chassis
• 8 sw planes
• speedup 2.5
• 1152 LICs
• 1296 x 1296 switch fabric
• 3-stage Benes sw
• multicast in the sw
• 1:N fabric redundancy
• 40 Gbps packet processor (188 RISCs)
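
The headline figures on this slide can be cross-checked with simple arithmetic. The constants below are read off the figure labels; the variable names are invented for this sketch, and the slide does not say how the 1296 x 1296 fabric size is derived, so the last line only notes that 18 x 72 gives the same number.

```python
LINE_CARD_CHASSIS = 72       # LCC(1) .. LCC(72)
LINE_CARDS_PER_LCC = 16      # LC(1) .. LC(16) in the first chassis
S13_PER_LCC = 8              # S1/S3(1) .. S1/S3(8) in the first chassis
FABRIC_CARD_CHASSIS = 8      # FCC(1) .. FCC(8)
S2_PER_FCC = 18              # S2(1) .. S2(18) in the first chassis
LINE_CARD_GBPS = 40          # 40 G per line card

line_cards = LINE_CARD_CHASSIS * LINE_CARDS_PER_LCC
s13_elements = LINE_CARD_CHASSIS * S13_PER_LCC
s2_elements = FABRIC_CARD_CHASSIS * S2_PER_FCC
total_chassis = LINE_CARD_CHASSIS + FABRIC_CARD_CHASSIS
capacity_tbps = line_cards * LINE_CARD_GBPS / 1000

print(line_cards, s13_elements, s2_elements, total_chassis, capacity_tbps)
print(18 * 72)   # equals the quoted fabric size; the derivation is an assumption
```

The totals agree with the slide: 1152 line cards, 80 chassis, and about 46 Tbps of line-card capacity.
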
Massively Parallel Switches
• Instead of using tightly coupled fabrics like a crossbar or a bus, they use massively parallel interconnects such as hypercube, 2D torus, and 3D torus.
• Few companies use this design architecture for their core routers
• These fabrics are generally scalable
• However:
– It is very difficult to guarantee QoS and to include value-added functionalities (e.g., multicast, fair bandwidth allocation)
– They consume a lot of power
– They are relatively costly
Massively Parallel Switches [figure]
3D Switching Fabric: Avici
• Three components
– Topology: 3D torus
– Routing: source routing with randomization
– Flow control: virtual channels and virtual networks
• Maximum configuration: 14 x 8 x 5 = 560
• Channel speed is 10 Gbps
Packaging
• Uniformly short wires between adjacent nodes
– Can be built in passive backplanes
– Run at high speed
Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)
Avici: Velociti™ Switch Fabric
• Toroidal direct connect fabric (3D Torus)
• Scales to 560 active modules
• Each element adds switching & forwarding capacity
• Each module connects to 6 other modules
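
The "6 other modules" figure follows from the 3D torus topology: each node has two neighbours along each of the three axes, with wrap-around at the edges. The sketch below enumerates the 14 x 8 x 5 maximum configuration quoted on the earlier slide. The coordinate scheme and the function name `torus_neighbors` are assumptions for illustration, not Avici's actual addressing.

```python
from itertools import product

DIMS = (14, 8, 5)   # maximum configuration from the slides


def torus_neighbors(node, dims=DIMS):
    """Return the set of nodes directly wired to `node` in a 3D torus."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size   # wrap around the ring
            neighbors.add(tuple(coord))
    return neighbors


nodes = list(product(*(range(size) for size in DIMS)))
print(len(nodes))                                   # number of modules
print(sorted(torus_neighbors((0, 0, 0))))           # neighbours of one corner
print(all(len(torus_neighbors(n)) == 6 for n in nodes))
```

Because every dimension has at least three nodes, the two neighbours along each axis are distinct, so every module has exactly six links regardless of where it sits.
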
Switch fabric chips comparison [table or figure not included in the extracted text]