ECE 526 – Network Processing Systems Design
Network Processor Tradeoffs and Examples
Chapter: D. E. Comer
Ning Weng

Outline
• Network Processor design tradeoffs
• Sample Network Processors

NP Architecture
• Numerous different design goals
─ Performance
─ Cost
─ Functionality
─ Programmability
• Numerous different system choices
─ Use of parallelism
─ Types of memories
─ Types of interfaces
─ Etc.
• We consider
─ Design tradeoffs at a high level (qualitative tradeoffs)
─ Commercial network processors

Processor Topologies
• How can processors be arranged on an NP?
─ Consider the heterogeneity of processing resources and workload
• Multiprocessor
─ Parallel processors with shared interconnect
─ Problems?
• Pipeline
─ Multiple processors per data path
─ Problems?
• Data flow architecture
─ Extreme form of pipelining
─ Problems?
• Heterogeneous architectures
(The parallel and pipelined arrangements are sketched in code below.)
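A minimal conceptual sketch of the two basic arrangements, written as plain C (hypothetical model, not any vendor's API): in a multiprocessor pool each processing engine performs all work for a packet, while in a pipeline every packet passes through a fixed sequence of partial-work stages.

    /* Minimal sketch contrasting the two topologies (hypothetical model,
     * not a real NP API). Each "processor" is just a function here.      */
    #include <stdio.h>

    typedef struct { int id; } packet_t;

    /* one full-function processor: does all the work for one packet */
    static void process_complete(packet_t *p) {
        printf("packet %d: parsed, classified, forwarded by one PE\n", p->id);
    }

    /* pipeline stages: each PE does only part of the work */
    static void stage_parse(packet_t *p)    { printf("packet %d: parse\n", p->id); }
    static void stage_classify(packet_t *p) { printf("packet %d: classify\n", p->id); }
    static void stage_forward(packet_t *p)  { printf("packet %d: forward\n", p->id); }

    int main(void) {
        packet_t pkts[4] = {{0}, {1}, {2}, {3}};

        /* Multiprocessor: any free PE takes the next packet; throughput scales
         * with the number of PEs, but interconnect/memory contention and
         * packet ordering become the problems.                              */
        for (int i = 0; i < 4; i++)
            process_complete(&pkts[i]);   /* conceptually: dispatched to any free PE */

        /* Pipeline: every packet visits every stage in order; throughput is
         * set by the slowest stage, and unbalanced stages leave PEs idle.   */
        for (int i = 0; i < 4; i++) {
            stage_parse(&pkts[i]);
            stage_classify(&pkts[i]);
            stage_forward(&pkts[i]);
        }
        return 0;
    }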

Design Tradeoffs (1)
• Low development cost vs. performance
─ ASICs give higher performance, but take time to develop
─ NPs allow faster development, but might give lower performance
• Programmability vs. processing speed
─ Similar to the tradeoff between ASIC and NP
─ Co-processors pose the same tradeoffs
─ Complexity of the instruction set
• Performance: packet rate, data rate, and bursts
─ Difficult to assess the performance of a system
─ Even more difficult to compare different systems
• Per-interface rate vs. aggregate data rate
─ An NP is usually limited to one port

Design Tradeoffs (2)
• NP speed vs. bandwidth
─ How much processing power per unit of bandwidth is necessary?
─ Depends on application complexity
• Coprocessor design: look-aside vs. flow-through
─ Look-aside: "called" from the main processor, needs state transfer
─ Flow-through: all traffic streams through the coprocessor
• Pipelining: uniform vs. synchronized
─ Pipeline stages can take different times
─ Tradeoff between slowing all stages down and synchronizing with buffers (see the throughput sketch below)
• Explicit parallelism vs. cost and programmability
─ Hidden parallelism is easier to program
─ Explicit parallelism is cheaper to implement
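A back-of-the-envelope model shows why unbalanced stages matter (the stage times below are invented for illustration, not measurements): whether the pipeline runs in lock-step or with synchronization buffers, its throughput is set by the slowest stage, while a single processor doing all the work is limited by the sum of the stage times.

    /* Back-of-the-envelope pipeline model. The stage times are assumptions
     * for illustration; they are not figures from any real NP.            */
    #include <stdio.h>

    int main(void) {
        /* per-packet time of each pipeline stage, in nanoseconds (assumed) */
        double stage_ns[] = { 20.0, 55.0, 30.0 };
        int n = (int)(sizeof stage_ns / sizeof stage_ns[0]);

        double sum = 0.0, max = 0.0;
        for (int i = 0; i < n; i++) {
            sum += stage_ns[i];
            if (stage_ns[i] > max) max = stage_ns[i];
        }

        /* A single processor handles one packet per `sum` ns. A pipeline,
         * uniform or synchronized, is limited by its slowest stage: one
         * packet per `max` ns. Synchronization only changes whether faster
         * stages idle in lock-step or queue packets in buffers.            */
        printf("single PE : %.1f Mpps\n", 1000.0 / sum);
        printf("pipeline  : %.1f Mpps\n", 1000.0 / max);
        return 0;
    }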

Design Tradeoffs (3)
• Parallelism: scale vs. packet ordering
─ Why is packet order important?
─ Giving up the packet-order constraint gives better throughput (a reordering sketch follows this slide)
• Parallelism: speed vs. stateful classification
─ Shared state requires synchronization
─ Limits parallelism
• Memory: speed vs. programmability
─ Using different types of memories improves performance
─ But increases the difficulty of programming
• I/O performance vs. pin count
─ Packaging can be a major cost factor
─ More pins give higher performance
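When packets of one flow are handled by parallel engines, they can finish out of order. A common remedy, sketched below with assumed sizes (an illustrative mechanism, not a specific NP feature), is to stamp each packet with a sequence number on ingress and release packets at the output only in sequence.

    /* Minimal reorder-buffer sketch: packets tagged with sequence numbers on
     * ingress are released strictly in order on egress, even if parallel
     * engines complete them out of order. Sizes are illustrative.          */
    #include <stdio.h>
    #include <stdbool.h>

    #define WINDOW 8                       /* assumed reorder window size */

    static bool     done[WINDOW];          /* has this slot finished processing? */
    static unsigned next_to_send = 0;      /* next sequence number to release    */

    /* called by any processing engine when it finishes a packet */
    static void packet_done(unsigned seq) {
        done[seq % WINDOW] = true;
        /* release everything that is now in order */
        while (done[next_to_send % WINDOW]) {
            done[next_to_send % WINDOW] = false;
            printf("transmit packet %u\n", next_to_send);
            next_to_send++;
        }
    }

    int main(void) {
        /* engines finish in scrambled order; output still comes out 0,1,2,3,4 */
        packet_done(2);
        packet_done(0);
        packet_done(1);
        packet_done(4);
        packet_done(3);
        return 0;
    }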

Design Tradeoffs (4)
• Programming languages
─ Ease of programming vs. functionality vs. speed
• Multithreading: throughput vs. programmability
─ Threads improve performance (a latency-hiding estimate follows this slide)
─ Threads require more complex programs and synchronization
• Traffic management vs. blind forwarding at low cost
─ Traffic management is desirable but requires processing
• Generality vs. a specific architectural role
─ NPs can be specialized for access, edge, or core
─ NPs can be specialized towards certain protocols
• Memory type: special-purpose vs. general-purpose
─ SRAM and DRAM vs. CAM
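The throughput benefit of hardware threads comes from hiding memory latency: while one thread waits on a memory reference, another computes. A rough utilization model is sketched below; all the numbers are assumptions chosen for illustration, not figures for a particular NP.

    /* Rough model of latency hiding with hardware multithreading. The cycle
     * counts are illustrative assumptions.                                  */
    #include <stdio.h>

    int main(void) {
        double compute_cycles = 50.0;    /* work per packet between memory refs */
        double memory_cycles  = 150.0;   /* stall per memory reference          */

        for (int threads = 1; threads <= 8; threads *= 2) {
            /* With T threads, other threads can run during one thread's memory
             * stall; the engine stays busy once the combined compute of all
             * threads covers the stall time.                                  */
            double busy = threads * compute_cycles;
            double util = busy / (compute_cycles + memory_cycles);
            if (util > 1.0) util = 1.0;
            printf("%d thread(s): engine utilization %.0f%%\n", threads, util * 100.0);
        }
        return 0;
    }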

Design Tradeoffs (5)
• Backward compatibility vs. architectural advances
─ At the component level: e.g., memories (DDR DRAM)
─ At the system level: the NP needs to fit into the overall router system
• Parallelism vs. pipelining
─ Depends on the usage of the NP
• Summary:
─ Lots of choices
─ Most decisions require some insight into expected NP usage
─ Tradeoffs are all qualitative
• Let's look at commercial designs

Novel Areas of NP Use
• TCP/IP offloading on high-performance servers
• Security processing: SSL offloading
• Storage area networks
• Many others: IDSs, etc.

Performance Bottlenecks
• Memory
─ Bandwidth is available, but access time is too slow
─ Increasing delay for off-chip memory
• I/O
─ High-speed interfaces are available
─ Cost problem with optical interfaces
─ Otherwise not a problem
• Processing power
─ Individual cores are getting more complex
─ Problems with access to shared resources
─ The control processor can become a bottleneck

Limitations on Scalability
• What are the limitations on how fast NPs need to get?
─ Link rates (optical bandwidth limits)
─ Application complexity (core vs. edge)
• What are the limitations on how fast NPs can get?
─ Parallelism in networks
─ Power consumption
─ Chip area
(A per-packet time-budget calculation follows.)
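How tight the per-packet budget is follows directly from the link rate: a minimum-size 64-byte Ethernet frame occupies 84 bytes on the wire (with preamble and inter-frame gap), so at 10 Gbps a new packet can arrive roughly every 67 ns. The quick calculation below uses these standard Ethernet constants; the set of link rates is just an example.

    /* Per-packet time budget vs. link rate. A 64-byte minimum Ethernet frame
     * plus 20 bytes of preamble and inter-frame gap = 84 bytes on the wire.  */
    #include <stdio.h>

    int main(void) {
        double rates_gbps[] = { 1.0, 2.5, 10.0, 40.0 };   /* example link rates */
        double wire_bits = 84.0 * 8.0;                    /* minimum-size frame */

        for (int i = 0; i < 4; i++) {
            double pps = rates_gbps[i] * 1e9 / wire_bits; /* packets per second */
            double ns  = 1e9 / pps;                       /* budget per packet  */
            printf("%5.1f Gbps: %6.1f ns per minimum-size packet (%.2f Mpps)\n",
                   rates_gbps[i], ns, pps / 1e6);
        }
        return 0;
    }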

Commercial Network Processors
• Commercial NPs
─ Large variety of architectures
─ Different application and performance spaces
─ Lots of implementation details and practical issues
• General themes
─ Type and number of processors
  • Homogeneous vs. heterogeneous
─ Type and size of memories
─ Internal and external communication channels
─ Mechanisms of scalability: parallelism and pipelining
─ Generality vs. specialization

Intel IXP 1200: external connection (figure)

Intel IXP 1200: internal architecture (figure)

Cisco PXF (figure)

Motorola C-Port: conceptual design (figure)

Motorola C-Port: internal architecture (figure)

Motorola C-Port: channel processor (figure)

IXP 2400
• XScale (ARM-compliant) embedded control processor
─ Instruction and data caches
• 8 microengines
─ 400 or 600 MHz
• 8 threads per microengine
• Multiple instruction stores with 4K instructions
• 256 general-purpose registers
• 512 transfer registers
• 2 GB addressable DDR-DRAM memory (19.2 Gbps)
• 32 MB addressable QDR-SRAM memory (12 Gbps r+w)
• 16 words of Next Neighbor Registers
• 16 KB scratchpad
(An illustrative cycle-budget calculation follows.)
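One way to read these numbers is to convert them into a per-packet cycle budget. The sketch below uses the clock rate and microengine count from this slide; the 2.5 Gbps line rate and minimum-size Ethernet framing are assumptions for illustration.

    /* Illustrative cycle-budget calculation using the figures on this slide
     * (8 microengines at 600 MHz); the 2.5 Gbps link rate is an assumption. */
    #include <stdio.h>

    int main(void) {
        double clock_hz   = 600e6;           /* microengine clock (from slide) */
        int    engines    = 8;               /* microengines (from slide)      */
        double link_bps   = 2.5e9;           /* assumed line rate              */
        double frame_bits = 84.0 * 8.0;      /* minimum Ethernet frame on wire */

        double pps            = link_bps / frame_bits;
        double ns_per_pkt     = 1e9 / pps;
        double cycles_per_pkt = clock_hz / pps;          /* on one engine     */

        printf("arrival spacing : %.1f ns\n", ns_per_pkt);
        printf("cycles/packet   : %.0f on one engine, %.0f aggregate\n",
               cycles_per_pkt, cycles_per_pkt * engines);
        return 0;
    }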

IXP 2400
• Interconnects
─ Coprocessor bus added (incl. access to T-CAM)
─ Flow control bus for two-chip configurations (e.g., ingress and egress)
• Switch fabrics
─ No IX bus
─ Utopia 1, 2, 3
─ CSIX-L1
─ SPI-3 (POS-PHY 2/3)

Two-Chip Configurations
• Flow control needed between the ingress and egress chips
─ 1 Gbps over the flow control bus (not shown)

IXP 2400 Internal Architecture (figure)

IXP 2400 Microengine
• Enhancements over the IXP 1200 microengines:
─ Multiplier unit
─ Pseudo-random number generator
─ CRC calculator (a software equivalent is sketched after this slide)
─ Four 32-bit timers and timer signaling
─ 16-entry CAM for inter-thread communication
─ Time-stamping unit
─ Generalized thread signaling
─ 640 words of local memory
─ Simultaneous access to packet queues without mutual exclusion
─ Functional units for ATM segmentation and reassembly
─ Automated byte alignment
─ Microengines divided into two clusters with independent command/SRAM buses
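The CRC calculator offloads a computation that is expensive to do per byte in software. For reference, a minimal, unoptimized bit-at-a-time software CRC-32 looks roughly like the sketch below; this is the kind of loop the hardware unit replaces. The polynomial shown is the standard one used by Ethernet and AAL5.

    /* Bit-at-a-time CRC-32 (reflected, polynomial 0xEDB88320, as used by
     * Ethernet and AAL5). A hardware CRC unit replaces this per-byte loop. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static uint32_t crc32(const uint8_t *data, size_t len) {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return crc ^ 0xFFFFFFFFu;
    }

    int main(void) {
        const char *msg = "123456789";
        /* the standard CRC-32 check value for "123456789" is 0xCBF43926 */
        printf("crc32 = 0x%08X\n", crc32((const uint8_t *)msg, strlen(msg)));
        return 0;
    }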

Software
• Support for software pipelining
─ "Reflector Mode Pathways" for communication
─ Next Neighbor Registers as a programming abstraction (a ring-buffer analogy is sketched below)
• SDK 4.0
─ Simulator, debugger, profiler, traffic generator
─ Portable modules
─ Better infrastructure support
─ C compiler
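Conceptually, a software pipeline hands packets from one stage to the next through a small dedicated buffer, much as neighboring microengines pass data through Next Neighbor Registers. The sketch below models that with a tiny ring per stage; it is purely illustrative single-threaded C, not the Intel microengine API or SDK code.

    /* Software-pipelining sketch: each stage passes packet handles to the
     * next stage through a small ring, analogous to next-neighbor registers.
     * Illustrative single-threaded model, not the IXP SDK API.             */
    #include <stdio.h>

    #define RING_SIZE 4   /* assumed ring depth */

    typedef struct {
        int head, tail;
        int slot[RING_SIZE];
    } ring_t;

    static int ring_put(ring_t *r, int pkt) {
        if ((r->tail + 1) % RING_SIZE == r->head) return -1;  /* full: back-pressure */
        r->slot[r->tail] = pkt;
        r->tail = (r->tail + 1) % RING_SIZE;
        return 0;
    }

    static int ring_get(ring_t *r, int *pkt) {
        if (r->head == r->tail) return -1;                    /* empty */
        *pkt = r->slot[r->head];
        r->head = (r->head + 1) % RING_SIZE;
        return 0;
    }

    int main(void) {
        ring_t rx_to_classify = {0}, classify_to_tx = {0};
        int pkt;

        /* stage 1 (receive) hands three packets to stage 2 */
        for (int i = 0; i < 3; i++)
            (void)ring_put(&rx_to_classify, i);

        /* stage 2 (classify) drains its ring and feeds stage 3 */
        while (ring_get(&rx_to_classify, &pkt) == 0)
            (void)ring_put(&classify_to_tx, pkt);

        /* stage 3 (transmit) */
        while (ring_get(&classify_to_tx, &pkt) == 0)
            printf("transmit packet %d\n", pkt);
        return 0;
    }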

Summary
• The network processor design space is big due to
─ Varying design goals
─ Varying implementation choices
• Qualitative tradeoffs
• Survey of commercial NPs
• Network processors are getting more features
• The main architectural characteristic is still parallelism
• Software support is becoming more important