CS 213 Computer Architecture
Lecture 7: Introduction to Network Processors
Instructor: L. N. Bhuyan
www.cs.ucr.edu/~bhuyan/CS213
2003 ©UCR

Outline
° Introduction to NP Systems
° Relevant Applications
° Design Issues and Challenges
° Relevant Software and Benchmarks
° A case study: Intel IXP network processors

What are Network Processors?
° Any device that executes programs to handle packets in a data network
° Examples
  • Processors on router line cards
  • Processors in network access equipment

Why Network Processors?
° Current Situation
  • Data rates are increasing
  • Protocols are becoming more dynamic and sophisticated
  • Protocols are being introduced more rapidly
° Processing Elements
  • GP (General-purpose Processor)
    - Programmable, but not optimized for networking applications
  • ASIC (Application-Specific Integrated Circuit)
    - High processing capacity, but long development time and little flexibility
  • NP (Network Processor)
    - Achieves high processing performance
    - Offers programming flexibility
    - Cheaper than GP

Typical NP Architecture
[Figure: block diagram of a network processor. Labels: input ports, output ports, multi-threaded processing elements, co-processor, bus, SDRAM (packet buffer), SRAM (routing table).]

TCP/IP Model
Layer  OSI            TCP/IP
7      Application    Application
6      Presentation   (not present)
5      Session        (not present)
4      Transport      TCP
3      Network        IP
2      Data Link      Host-to-Net
1      Physical       Host-to-Net
° ISO OSI (Open Systems Interconnection) not fully implemented
° Presentation and Session layers not present in TCP/IP

Processing Tasks
° Control Plane
  • Policy Applications
  • Network Management
  • Signaling
  • Topology Management
° Data Plane
  • Queuing / Scheduling
  • Data Transformation
  • Classification
  • Data Parsing
  • Media Access Control
  • Physical Layer
Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik

Application Categorization
° Control-Plane tasks
  • Less time-critical
  • Control and management of device operation
    - Table maintenance, port states, etc.
° Data-Plane tasks
  • Operations occurring in real time on the "packet path"
  • Core device operations
    - Receive, process and transmit packets

Data Plane Tasks
° Media Access Control
  • Low-level protocol implementation
    - Ethernet, SONET framing, ATM cell processing, etc.
° Data Parsing
  • Parsing cell or packet headers for address or protocol information
° Classification
  • Identify a packet against a set of criteria (filtering / forwarding decision, QoS, accounting, etc.)
° Data Transformation
  • Transformation of packet data between protocols
° Traffic Management
  • Queuing, scheduling and policing packet data
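
The parsing and classification steps listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of any particular NP: the names `parse_ipv4_header`, `classify` and `RULES` are hypothetical, and the rule table is made up. Only the fixed 20-byte IPv4 header layout is taken as given.

```python
import socket
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Data parsing: extract the fields needed later from a raw IPv4 header."""
    if len(packet) < 20:
        raise ValueError("packet shorter than the minimum IPv4 header")
    # Fixed 20-byte part of the IPv4 header, network byte order.
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        "tos": tos,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

# Hypothetical classification rules: IP protocol number -> traffic class.
RULES = {6: "tcp-best-effort", 17: "udp-low-latency"}

def classify(header: dict) -> str:
    """Classification: map parsed fields to a forwarding / QoS class."""
    if header["version"] != 4 or header["ttl"] == 0:
        return "drop"
    return RULES.get(header["protocol"], "default")

# Sample header: 10.0.0.1 -> 10.0.0.2, UDP (protocol 17), TTL 64.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
hdr = parse_ipv4_header(sample)
print(hdr["src"], hdr["dst"], classify(hdr))
```

On a real NP these two steps run once per packet on the data path, so they are usually written to touch only the header bytes they need.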

Applications: IPv4 Routing
[Figure: a router connected to A, B and C, with packets (P) in transit between them.]
° Routers determine next hop and forward packets
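
Determining the next hop usually means finding the most specific routing-table prefix that contains the destination address (longest-prefix match). A minimal sketch, assuming a made-up three-entry table whose next hops are named A, B and C after the figure; `ROUTES` and `next_hop` are hypothetical names, and the linear scan is chosen for clarity rather than speed.

```python
import ipaddress

# Hypothetical routing table: (prefix, next hop).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "A"),
    (ipaddress.ip_network("10.1.0.0/16"), "B"),
    (ipaddress.ip_network("0.0.0.0/0"), "C"),   # default route
]

def next_hop(dst: str) -> str:
    """Return the next hop of the longest prefix that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    # The most specific route is the one with the largest prefix length.
    return max(matches, key=lambda m: m[0])[1]

for dst in ("10.1.2.3", "10.2.0.1", "192.168.1.1"):
    print(dst, "->", next_hop(dst))
```

Production routers replace the linear scan with a trie, a hash-based scheme or a TCAM, since the lookup must complete within the per-packet time budget.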

URL-based switching – My NSF Project
[Figure: a request for www.yahoo.com arrives from the Internet at a switch, which forwards it to an image server, an application server or an HTML server. Packet layout: IP | TCP | APP. DATA, where the application data is the HTTP request "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…".]
° Increase efficiency
° Tasks
  • Traverse the packet data (request) for each arriving packet and classify it:
    - Contains '.jpg' -> to image server
    - Contains 'cgi-bin/' -> to application server
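
The classification rule above can be sketched as a small function over the application data. This is a simplified illustration, assuming the whole HTTP request line arrives in one payload; `select_server` and the returned server names are hypothetical.

```python
def select_server(payload: bytes) -> str:
    """Pick a back-end server from the HTTP request line in the payload."""
    request_line = payload.split(b"\r\n", 1)[0]
    parts = request_line.split()
    if len(parts) < 2:
        return "html"            # not a recognizable request; fall back
    url = parts[1]
    if b".jpg" in url:
        return "image"           # Contains '.jpg' -> image server
    if b"cgi-bin/" in url:
        return "application"     # Contains 'cgi-bin/' -> application server
    return "html"

request = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
print(select_server(request))
```

Unlike the header-only tasks of IP routing, this scan reads the packet payload, which is why URL-based switching is a demanding workload for an NP.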

Organizing Processor Resources
° Design decisions:
  • High-level organization
  • ISA and microarchitecture
  • Memory and I/O integration
° Today's commercial NPs:
  • Chip multiprocessors
  • Most are multithreaded
  • Exploit little ILP (Cisco does)
  • No cache
  • Micro-programmed

Architectural Comparisons
° High-level organizations
  • Aggressive superscalar (SS)
  • Fine-grained multithreaded (FGMT)
  • Chip multiprocessor (CMP)
  • Simultaneous multithreaded (SMT)

Multithreading
° Basic idea
  • Multiple register sets in the processor
  • Fast context switch
  • Switch thread on a cache access (How is this different from a non-blocking cache?)
  • Tolerating local latency vs. remote latency in CC-NUMA multiprocessors
  • Hybrids
    - Switch on notice
    - Simultaneous multithreading
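
A rough way to see why multiple register sets and a fast context switch help is to simulate a single-issue processor that switches to another ready thread whenever the running thread starts a memory access. The model below is an assumption-laden sketch, not a description of any NP: context switches are free, every thread does `compute` cycles of work between accesses, every access takes `latency` cycles, and the function name `utilization` and all parameter values are hypothetical.

```python
def utilization(contexts: int, compute: int, latency: int,
                cycles: int = 100_000) -> float:
    """Fraction of cycles in which the processor does useful work."""
    ready_at = [0] * contexts          # cycle at which each thread can run again
    remaining = [compute] * contexts   # compute cycles left before the next access
    busy = 0
    current = 0
    for now in range(cycles):
        # Look for a ready thread, starting with the current one (round-robin).
        for offset in range(contexts):
            t = (current + offset) % contexts
            if ready_at[t] <= now:
                current = t
                break
        else:
            continue                   # every thread waits on memory: idle cycle
        busy += 1
        remaining[t] -= 1
        if remaining[t] == 0:          # thread issues a memory access and blocks
            ready_at[t] = now + 1 + latency
            remaining[t] = compute
            current = (t + 1) % contexts
    return busy / cycles

for n in (1, 4, 10):
    print(n, "contexts:", utilization(n, compute=10, latency=90))
```

Under these assumptions utilization grows roughly linearly with the number of contexts until there are enough threads to cover the memory latency, after which extra contexts add nothing.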

Architectural Comparisons (cont.)
[Figure: issue-slot occupancy over time (processor cycles) for superscalar, fine-grained multithreading, coarse-grained multithreading, multiprocessing and simultaneous multithreading. Legend: Thread 1 to Thread 5, idle slot.]

Tasks and Services
Three benchmarks used in the experiment

Some Challenges
° Intelligent Design
  • Given a selection of programs and a target network link speed, find the 'best' design for the processor
    - Least area
    - Least power
    - Most performance
° Write efficient multithreaded programs
  • NPs have
    - Heterogeneous compute resources
    - Non-uniform memory
    - Multiple interacting threads of execution
    - Real-time constraints
  • Make use of resources
    - How to use special instructions and hardware assists
      – Compilers
      – Hand-coded
  • Multithreaded programs
    - Manage access to shared state
    - Synchronization between threads
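
The last two points, managing shared state and synchronizing threads, can be illustrated with a per-flow packet counter updated by several packet-processing threads. This is a generic sketch using Python's `threading` module, not NP code; the flow names, the trace and the names `flow_packets`, `flow_lock` and `worker` are hypothetical.

```python
import threading
from collections import Counter

flow_packets = Counter()          # shared state: packets seen per flow
flow_lock = threading.Lock()      # guards every update to flow_packets

def worker(flows):
    """One packet-processing thread: count each packet against its flow."""
    for flow in flows:
        with flow_lock:           # the read-modify-write must not interleave
            flow_packets[flow] += 1

# Four threads each process 10,000 packets spread over three flows.
trace = ["flow-a", "flow-b", "flow-c", "flow-a"] * 2500
threads = [threading.Thread(target=worker, args=(trace,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(flow_packets.values()), dict(flow_packets))
```

The lock serializes every update, which is correct but costly; on a data path the same effect is usually sought with finer-grained or hardware-assisted synchronization.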

Benchmarks for Network Processors
• NetBench
  - 10 applications
  - http://cares.icsl.ucla.edu/NetBench
• CommBench
  - 8 networking and communications applications
  - http://ccrc.wustl.edu/~wolf/cb/
• EEMBC
  - http://www.eembc.org/benchmark
• MediaBench
  - Transcoders
  - Some communications applications

IXP1200 Block Diagram
° StrongARM processing core
° Microengines introduce new ISA
° I/O
  • PCI
  • SDRAM
  • SRAM
  • IX: PCI-like packet bus
° On-chip FIFOs
  • 16 entries, 64 B each

IXP1200 Microengine
° 4 hardware contexts
  • Single-issue processor
  • Explicit optional context switch on SRAM access
° Registers
  • All are single-ported
  • Separate GPR
  • 256*6 = 1536 registers total
° 32-bit ALU
  • Can access GPR or XFER registers
° Shared hash unit
  • 1/2/3 values – 48 b/64 b
  • For IP routing hashing
° Standard 5-stage pipeline
° 4 KB SRAM instruction store – not a cache!
° Barrel shifter

IXP2400 Block Diagram
[Figure: IXP2400 block diagram. Labels: XScale core, ME0 to ME7, DDR DRAM controller, QDR SRAM controller, Scratch/Hash/CSR, PCI, MSF unit.]
° XScale core replaces StrongARM
° Microengines
  • Faster
  • More: 2 clusters of 4 microengines each
° Local memory
° Next-neighbor routes added between microengines
° Hardware to accelerate CRC operations and random number generation
° 16-entry CAM
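
CRC is a natural target for hardware acceleration because a software implementation touches every bit of the data. As a point of reference, here is a minimal bitwise sketch of the widely used CRC-32 (reflected polynomial 0xEDB88320), checked against Python's `zlib.crc32`. Which CRC variants the IXP2400 hardware supports is not stated in these slides, so the choice of CRC-32 is an assumption, and `crc32_bitwise` is a hypothetical name.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 computed one bit at a time (reflected polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF                       # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):                 # 8 shift/xor steps per input byte
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                # final inversion

frame = b"example packet payload"
print(hex(crc32_bitwise(frame)), crc32_bitwise(frame) == zlib.crc32(frame))
```

The inner loop costs eight shift/xor steps per byte, which is the work a dedicated CRC unit removes from the microengines.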

Different Types of Memory
Type             Width (bytes)  Size (bytes)  Approx. unloaded latency (cycles)  Notes
Local            4              2560          1                                  Indexed addressing, post incr/decr
On-chip Scratch  4              16 K          60                                 Atomic ops
SRAM             4              256 M         150                                Atomic ops
DRAM             8              2 G           300                                Direct path to/from MSF

IXA Software Framework
[Figure: layered software stack.
 External processors: control plane protocol stack, control plane PDK.
 XScale core (C/C++ language): core components, core component library, resource manager library.
 Microengine pipeline (Microengine C language): microblocks, microblock library, protocol library, utility library, hardware abstraction library.]

Example Toaster System: Cisco 10000
° Almost all data plane operations execute on the programmable XMC
° Pipeline stages are assigned tasks – e.g. classification, routing, firewall, MPLS
  • Classic SW load balancing problem
° External SDRAM shared by common pipe stages
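
The load balancing problem arises because a pipeline runs only as fast as its slowest stage. A minimal sketch, assuming the tasks must stay in a fixed order and have the made-up per-packet costs below; `best_partition` is a hypothetical helper that tries every way of cutting the task list into contiguous stages, and it does not describe how Cisco assigns tasks.

```python
from itertools import combinations

def best_partition(costs, stages):
    """Split an ordered list of task costs into `stages` contiguous groups
    so that the most loaded stage is as light as possible (brute force)."""
    n = len(costs)
    best = None
    for cuts in combinations(range(1, n), stages - 1):
        bounds = (0, *cuts, n)
        groups = [costs[a:b] for a, b in zip(bounds, bounds[1:])]
        bottleneck = max(sum(g) for g in groups)
        if best is None or bottleneck < best[0]:
            best = (bottleneck, groups)
    return best

# Hypothetical per-packet costs in cycles, in execution order.
tasks = {"classification": 40, "routing": 60, "firewall": 30, "MPLS": 20}
bottleneck, groups = best_partition(list(tasks.values()), stages=3)
print(bottleneck, groups)
```

With these numbers the best three-stage split leaves routing alone in the middle stage and merges firewall with MPLS; any other split makes some stage slower and lowers the packet rate of the whole pipeline.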

IBM PowerNP
° 16 pico-processors and 1 PowerPC
° Each pico-processor
  • Supports 2 hardware threads
  • 3-stage pipeline: fetch/decode/execute
° Dyadic Processing Unit
  • Two pico-processors
  • 2 KB shared memory
  • Tree search engine
° Focus is layers 2-4
° PowerPC 405 for control plane operations
  • 16 K I and D caches
° Target is OC-48
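
A target line rate translates directly into a per-packet processing budget. The sketch below works this out for OC-48 (line rate 48 x 51.84 Mb/s = 2488.32 Mb/s). The 40-byte minimum packet size and the 133 MHz clock are assumptions chosen for illustration, not figures from the slides, and using the raw line rate ignores SONET framing overhead; `packet_budget` is a hypothetical helper.

```python
OC48_LINE_RATE_BPS = 2_488_320_000   # OC-48 line rate: 48 x 51.84 Mb/s

def packet_budget(packet_bytes: int, processors: int, clock_hz: float):
    """Return (packet arrival interval in ns, cycles available per packet)."""
    interval_s = packet_bytes * 8 / OC48_LINE_RATE_BPS
    # With `processors` working on different packets in parallel, each one
    # can spend `processors` arrival intervals on its own packet.
    cycles = interval_s * clock_hz * processors
    return interval_s * 1e9, cycles

# Assumed inputs: 40-byte packets, 133 MHz clock; 16 pico-processors as listed.
interval_ns, cycles = packet_budget(40, processors=16, clock_hz=133e6)
print(f"{interval_ns:.1f} ns between packets, about {cycles:.0f} cycles per packet")
```

Under these assumptions a new minimum-size packet arrives roughly every 129 ns, leaving each processor a few hundred cycles per packet, memory stalls included.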

Motorola C-Port C-5 Chip Architecture

References
° [NPT] W. H. Mangione-Smith, G. Memik, Network Processor Technologies
° [NPRD] Patrick Crowley, Raj Yavatkar, An Introduction to Network Processor Research & Design, HPCA-9 Tutorial