
CS 213 Computer Architecture
Lecture 7: Introduction to Network Processors
Instructor: L. N. Bhuyan
www.cs.ucr.edu/~bhuyan/CS213
2003 ©UCR

Outline
° Introduction to NP Systems
° Relevant Applications
° Design Issues and Challenges
° Relevant Software and Benchmarks
° A case study: Intel IXP network processors

What Are Network Processors?
° Any device that executes programs to handle packets in a data network
° Examples
  • Processors on router line cards
  • Processors in network access equipment

Why Network Processors?
° Current situation
  • Data rates are increasing
  • Protocols are becoming more dynamic and sophisticated
  • Protocols are being introduced more rapidly
° Processing elements
  • GP (general-purpose processor)
    - Programmable, but not optimized for networking applications
  • ASIC (application-specific integrated circuit)
    - High processing capacity, but long development time and little flexibility
  • NP (network processor)
    - Achieves high processing performance
    - Offers programming flexibility
    - Cheaper than GP

Typical NP Architecture
[Figure: network processor block diagram. Components: input ports, output ports, bus, multithreaded processing elements, co-processor; SDRAM (packet buffer) and SRAM (routing table).]

TCP/IP Model
OSI layer         TCP/IP layer
7 Application     Application
6 Presentation    (not present)
5 Session         (not present)
4 Transport       TCP
3 Network         IP
2 Data Link       Host-to-Net
1 Physical        Host-to-Net
° ISO OSI (Open Systems Interconnection) is not fully implemented
° The Presentation and Session layers are not present in TCP/IP

Processing Tasks
° Control plane: Policy Applications, Network Management, Signaling, Topology Management
° Data plane: Queuing / Scheduling, Data Transformation, Classification, Data Parsing, Media Access Control
° Physical Layer
Source: Network Processor Tutorial at MICRO-34, Mangione-Smith & Memik

Application Categorization
° Control-plane tasks
  • Less time-critical
  • Control and management of device operation
    - Table maintenance, port states, etc.
° Data-plane tasks
  • Operations occurring in real time on the "packet path"
  • Core device operations
    - Receive, process, and transmit packets

Data Plane Tasks
° Media Access Control
  • Low-level protocol implementation
    - Ethernet, SONET framing, ATM cell processing, etc.
° Data Parsing
  • Parsing cell or packet headers for address or protocol information
° Classification
  • Matching packets against criteria (filtering/forwarding decisions, QoS, accounting, etc.)
° Data Transformation
  • Transforming packet data between protocols
° Traffic Management
  • Queuing, scheduling, and policing packet data

Applications: IPv4 Routing
[Figure: a router forwarding packets (P) among nodes A, B, and C.]
° Routers determine the next hop for each packet and forward it

URL-based Switching – My NSF Project
[Figure: a switch in front of www.yahoo.com receives requests from the Internet and dispatches them to an image server, an application server, or an HTML server. Example packet: IP | TCP | APP. DATA, carrying "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…"]
° Increases efficiency
° Tasks
  • Traverse the packet data (request) of each arriving packet and classify it:
    - Contains '.jpg' -> image server
    - Contains 'cgi-bin/' -> application server

Organizing Processor Resources
° Design decisions:
  • High-level organization
  • ISA and microarchitecture
  • Memory and I/O integration
° Today's commercial NPs:
  • Chip multiprocessors
  • Most are multithreaded
  • Exploit little ILP (Cisco does)
  • No cache
  • Micro-programmed

Architectural Comparisons
° High-level organizations
  • Aggressive superscalar (SS)
  • Fine-grained multithreaded (FGMT)
  • Chip multiprocessor (CMP)
  • Simultaneous multithreaded (SMT)

Multithreading
° Basic idea
  • Multiple register sets in the processor
  • Fast context switch
  • Switch threads on a cache access (How is this different from a non-blocking cache?)
  • Tolerating local latency vs. remote latency in CC-NUMA multiprocessors
  • Hybrids
    - Switch on notice
    - Simultaneous multithreading

Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for superscalar, fine-grained multithreading, coarse-grained multithreading, multiprocessing, and simultaneous multithreading; each slot is marked by thread (threads 1 to 5) or as an idle slot.]

Tasks and Services
[Figure: the three benchmarks used in the experiment.]

Some Challenges
° Intelligent design
  • Given a selection of programs and a target network link speed, find the 'best' processor design:
    - Least area
    - Least power
    - Most performance
° Writing efficient multithreaded programs
  • NPs have
    - Heterogeneous compute resources
    - Non-uniform memory
    - Multiple interacting threads of execution
    - Real-time constraints
  • Making use of resources
    - How to use special instructions and hardware assists: compilers or hand-coding
  • Multithreaded programs
    - Managing access to shared state
    - Synchronization between threads

Benchmarks for Network Processors
• NetBench
  - 10 applications
  - http://cares.icsl.ucla.edu/NetBench
• CommBench
  - 8 networking and communications applications
  - http://ccrc.wustl.edu/~wolf/cb/
• EEMBC
  - http://www.eembc.org/benchmark
• MediaBench
  - Transcoders
  - Some communications applications

IXP1200 Block Diagram
° StrongARM processing core
° Microengines introduce a new ISA
° I/O
  • PCI
  • SDRAM
  • SRAM
  • IX: PCI-like packet bus
° On-chip FIFOs
  • 16 entries, 64 B each

IXP1200 Microengine
° 4 hardware contexts
  • Single-issue processor
  • Explicit, optional context switch on SRAM access
° Registers
  • All are single-ported
  • Separate GPRs
  • 256 × 6 = 1536 registers total (256 per microengine, 6 microengines)
° 32-bit ALU
  • Can access GPR or XFER registers
° Shared hash unit
  • Hashes 1, 2, or 3 values; 48-bit or 64-bit
  • Used for IP routing hashing
° Standard 5-stage pipeline
° 4 KB SRAM instruction store – not a cache!
° Barrel shifter

IXP2400 Block Diagram
[Figure: XScale core, DDR DRAM controller, QDR SRAM controller, PCI, Scratch/Hash/CSR unit, MSF unit, and microengines ME 0 to ME 7.]
° XScale core replaces StrongARM
° Microengines
  • Faster
  • More: 2 clusters of 4 microengines each
° Local memory
° Next-neighbor routes added between microengines
° Hardware to accelerate CRC operations and random number generation
° 16-entry CAM

Different Types of Memory
Type             Width (bytes)  Size (bytes)  Approx. unloaded latency (cycles)  Notes
Local            4              2560          1                                  Indexed addressing, post incr/decr
On-chip Scratch  4              16 K          60                                 Atomic ops
SRAM             4              256 M         150                                Atomic ops
DRAM             8              2 G           300                                Direct path to/from MSF

IXA Software Framework
[Figure: layered software stack. External processors: control plane protocol stack, Control Plane PDK. XScale core (C/C++ language): core components, Core Component Library, Resource Manager Library. Microengine pipeline (Microengine C language): microblocks, Microblock Library, Protocol Library, Utility Library, Hardware Abstraction Library.]

Example Toaster System: Cisco 10000
° Almost all data-plane operations execute on the programmable XMC
° Pipeline stages are assigned tasks, e.g., classification, routing, firewall, MPLS
  • Classic SW load-balancing problem
° External SDRAM shared by common pipe stages

IBM PowerNP
° 16 pico-processors and 1 PowerPC
° Each pico-processor
  • Supports 2 hardware threads
  • 3-stage pipeline: fetch/decode/execute
° Dyadic Processing Unit
  • Two pico-processors
  • 2 KB shared memory
  • Tree search engine
° Focus is layers 2-4
° PowerPC 405 for control-plane operations
  • 16 K I and D caches
° Target is OC-48

Motorola C-Port C-5 Chip Architecture
References
° [NPT] W. H. Mangione-Smith, G. Memik, "Network Processor Technologies"
° [NPRD] Patrick Crowley, Raj Yavatkar, "An Introduction to Network Processor Research & Design," HPCA-9 Tutorial