PAPR Network Processor Architecture and Programming Daniela GENIUS


PAPR - Network Processor Architecture and Programming
Daniela GENIUS, LIP6 ASIM
Journée Informatique Embarquée, 13 May 2005

PAPR Team
PAPR started as a collaboration between the ASIM and RP departments of LIP6.
• Etienne Faure
• Daniela Genius
• Alain Greiner
• Eric Horlait
• Kave Salamatian

Contents
• Introduction
• The PAPR Generic Platform
• Hardware/Software Codesign
• Application Specification Language
• MWMR Communication Channels
• Memory Management
• Hardware Coprocessors
• Performance Evaluation
• Research Perspectives

Introduction
[Figure: link bandwidth (axis labelled Mbyte/sec, logarithmic scale from 0.1 to 100,000) versus year, 1980 to 2005. Each step is annotated with its multiple of the previous one.]
• DS0: 64 K
• DS1: 1.5 M (x 24)
• DS3: 44 Mb (x 28)
• OC-12: 622 Mb (x 12)
• OC-192: 10 Gb (x 16)
• OC-768: 40 Gb (x 4)
DS = Digital Signal, OC = Optical Carrier

Introduction
• In recent years, diverse architectures have been proposed for network processors:
  • Intel IXP
  • IBM PowerNP
  • Motorola
  • AMCC
  • ...
• The bottlenecks are memory access and on-chip communication.
• Our aim is to propose a design method for embedded telecom applications, using a generic multiprocessor hardware platform.
• Related work: StepNP from STMicroelectronics

Introduction
• Data rates keep increasing
• Protocols and applications keep evolving
• System design and test are slow and expensive
• Special-purpose hardware can hardly be reused
• Flexibility and adaptation to new standards (UMTS, IPv6, MPLS, ...) are a must.
=> The key is programmability
We will use general-purpose processors and a “classical” shared-memory multiprocessor architecture.

LIP6 Contributions
• Generic network processor architecture: LIP6 proposes PAPR, a generic and flexible hardware platform that can adapt to different (networking) applications.
• Application specification language/environment: LIP6 proposes an environment that lets the system designer describe and validate multi-threaded applications, and that facilitates their mapping onto the generic platform.
• Design space exploration: LIP6 proposes a general method to ease the migration from software (running on programmable processors) to hardware (dedicated coprocessors), in order to optimize performance or minimize power consumption.

The PAPR Generic Platform
• Based on the SoCLib component library
• Shared address space
• VCI compliant
• MIPS R3000 microprocessors
• External RAM controller
• DSPIN network-on-chip
• Dedicated hardware coprocessor for I/Os
• Optional (synthesized) hardware coprocessors

The PAPR Generic Platform
[Figure: the PAPR generic platform]

Hardware/Software Codesign
The system designer must be able to:
• choose the hardware or software implementation of each task
• map the software tasks onto the programmable processors (and the hardware tasks onto synthesized or existing coprocessors)
• map the communication channels onto the physical memory banks
[Figure: application mapping, from the multi-thread application to the multi-processor architecture]

Application Specification Language
The parallel software application is described as a task graph with two types of nodes: tasks and communication channels.
Tasks communicate through Multi-Writer/Multi-Reader (MWMR) FIFOs.
Tasks can be hardware or software.
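
The slides do not show the syntax of the specification language, so the following is only a minimal C sketch of the underlying data model: a graph whose nodes are either tasks or channels. All type and function names (`graph_node`, `graph_edge`, `graph_is_well_formed`) are hypothetical. The check assumes that tasks never connect to each other directly, which is how "tasks communicate through MWMR FIFOs" is read here.

```c
#include <stddef.h>

/* Hypothetical data model; the slides only state that the graph has
 * two node types (tasks and channels) and that tasks are hardware or
 * software. */
enum node_kind { NODE_TASK, NODE_CHANNEL };
enum task_impl { IMPL_NONE, IMPL_SOFTWARE, IMPL_HARDWARE };

struct graph_node {
    const char    *name;
    enum node_kind kind;
    enum task_impl impl;   /* IMPL_NONE for channel nodes */
};

struct graph_edge {
    size_t from;   /* index of the producing node */
    size_t to;     /* index of the consuming node */
};

/* Returns 1 if every edge links a task to a channel (in either
 * direction), 0 if an edge is out of range or links two nodes of the
 * same kind. */
int graph_is_well_formed(const struct graph_node *nodes, size_t n_nodes,
                         const struct graph_edge *edges, size_t n_edges)
{
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].from >= n_nodes || edges[i].to >= n_nodes)
            return 0;
        if (nodes[edges[i].from].kind == nodes[edges[i].to].kind)
            return 0;
    }
    return 1;
}
```

A channel node with several incoming and several outgoing edges models the multi-writer/multi-reader case directly.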

MWMR Communication Channels
• Each MWMR channel is implemented as a software FIFO and is characterized by two parameters: a width and a depth.
• Each MWMR channel is protected by a lock, in order to guarantee exclusive access.
• The read and write communication primitives are non-blocking:
  - int mwmr_read(channel_id, *buffer, nb_bytes)
  - int mwmr_write(channel_id, *buffer, nb_bytes)
• As any task can be implemented in hardware or software, MWMR channels can be accessed by both hardware and software controllers.
• The software versions of the communication primitives are built upon the POSIX API: the software application can therefore be executed on any UNIX workstation before being mapped onto the SoC.
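
As an illustration of the points above, here is a minimal sketch of a lock-protected software FIFO with non-blocking read and write, written against POSIX mutexes. It is not the PAPR implementation. Several details are assumptions: the channel is passed as a pointer rather than a `channel_id`, `mwmr_init` and the field names are hypothetical, the width is taken to be the item size in bytes and the depth the capacity in items, and the return value is taken to be the number of bytes actually transferred.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical channel layout: 'width' is assumed to be the item size
 * in bytes and 'depth' the capacity in items. */
struct mwmr_channel {
    pthread_mutex_t lock;    /* guarantees exclusive access */
    unsigned char  *data;    /* storage of depth * width bytes */
    size_t          width;
    size_t          depth;
    size_t          head;    /* index of the oldest item */
    size_t          count;   /* number of items currently stored */
};

int mwmr_init(struct mwmr_channel *ch, void *storage,
              size_t width, size_t depth)
{
    if (width == 0 || depth == 0)
        return -1;
    ch->data = storage;
    ch->width = width;
    ch->depth = depth;
    ch->head = 0;
    ch->count = 0;
    return pthread_mutex_init(&ch->lock, NULL) == 0 ? 0 : -1;
}

/* Non-blocking: copies as many whole items as fit and returns the
 * number of bytes written (possibly 0) instead of waiting for room. */
int mwmr_write(struct mwmr_channel *ch, const void *buffer, size_t nb_bytes)
{
    const unsigned char *src = buffer;
    size_t items = nb_bytes / ch->width;
    size_t done = 0;

    pthread_mutex_lock(&ch->lock);
    while (done < items && ch->count < ch->depth) {
        size_t tail = (ch->head + ch->count) % ch->depth;
        memcpy(ch->data + tail * ch->width, src + done * ch->width, ch->width);
        ch->count++;
        done++;
    }
    pthread_mutex_unlock(&ch->lock);
    return (int)(done * ch->width);
}

/* Non-blocking: copies as many whole items as are available and
 * returns the number of bytes read (possibly 0). */
int mwmr_read(struct mwmr_channel *ch, void *buffer, size_t nb_bytes)
{
    unsigned char *dst = buffer;
    size_t items = nb_bytes / ch->width;
    size_t done = 0;

    pthread_mutex_lock(&ch->lock);
    while (done < items && ch->count > 0) {
        memcpy(dst + done * ch->width, ch->data + ch->head * ch->width, ch->width);
        ch->head = (ch->head + 1) % ch->depth;
        ch->count--;
        done++;
    }
    pthread_mutex_unlock(&ch->lock);
    return (int)(done * ch->width);
}
```

Because neither primitive waits for data or room, a caller that needs a complete transfer has to retry with the remaining bytes. Any number of writer and reader threads can share the same channel, since every access goes through the lock.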

Application Specification Language
[Figure: task graph of the IPv4 routing application]

Application Specification Language
[Figure: task graph of the classification application]

Memory Management
• The network processor must have the largest possible storage capacity (several thousand packets).
• In networking applications, the relevant information is usually located within the first few bytes of a packet.
• On-chip memory is limited
=> External RAM is mandatory, with a careful allocation/free policy.
=> Only the packet descriptors are stored in the on-chip RAM.

Memory Management
“Slot” Data Structure
• Descriptor (128 bits): MWMR channels
• First slot (128 bytes): on-chip RAM
• Following slots (128 bytes): external RAM
[Figure: chain of slots linked by addresses (@), terminated by NULL]
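
The slide gives only the sizes (a 128-bit descriptor, 128-byte slots) and the fact that slots are chained by address. The sketch below shows one layout that fits those sizes, assuming 32-bit addresses as on the MIPS R3000. The field names, the choice of descriptor fields and the 4-byte link at the start of each slot are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE    128u                             /* from the slide */
#define SLOT_PAYLOAD (SLOT_SIZE - sizeof(uint32_t))   /* assumes a 4-byte link */

/* Hypothetical descriptor fields; only the 128-bit total is given. */
struct packet_descriptor {
    uint32_t first_slot;    /* address of the first slot (on-chip RAM) */
    uint32_t length;        /* total packet length in bytes */
    uint32_t reserved[2];   /* unspecified in the slides */
};

/* Hypothetical slot layout: a link followed by packet bytes. */
struct slot {
    uint32_t next;                    /* address of the next slot, 0 for NULL */
    uint8_t  payload[SLOT_PAYLOAD];
};

_Static_assert(sizeof(struct packet_descriptor) == 16, "descriptor is 128 bits");
_Static_assert(sizeof(struct slot) == SLOT_SIZE, "slot is 128 bytes");

/* Number of chained slots needed to hold a packet of packet_len bytes. */
size_t slots_needed(size_t packet_len)
{
    return (packet_len + SLOT_PAYLOAD - 1) / SLOT_PAYLOAD;
}
```

Under these assumptions a 1500-byte Ethernet payload occupies 13 slots, of which only the first sits in on-chip RAM, which matches the observation that the relevant information is usually in the first few bytes of a packet.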

Memory Management
• Incoming packets: the slots are allocated by the Input Engine coprocessor.
• Outgoing packets: after processing, once the Output Engine has read them, their slots are de-allocated.
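
The slides do not describe the allocation policy itself. One common way to manage fixed-size slots is a free list, sketched below in C as an assumption rather than as the PAPR mechanism. The pool size and all names (`slot_pool`, `pool_alloc`, `pool_free`) are hypothetical, and the sketch ignores the synchronization that concurrent Input and Output Engines would need.

```c
#define POOL_SLOTS 8   /* hypothetical, tiny pool for illustration */

struct slot_pool {
    int next_free[POOL_SLOTS];   /* next free slot after slot i, or -1 */
    int head;                    /* first free slot, or -1 when exhausted */
};

/* Chain all slots into the free list: 0 -> 1 -> ... -> POOL_SLOTS-1. */
void pool_init(struct slot_pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        p->next_free[i] = (i + 1 < POOL_SLOTS) ? i + 1 : -1;
    p->head = 0;
}

/* Input Engine side: take one slot, or return -1 if none is left. */
int pool_alloc(struct slot_pool *p)
{
    int s = p->head;
    if (s >= 0)
        p->head = p->next_free[s];
    return s;
}

/* Output Engine side: give a slot back once its content has been sent. */
void pool_free(struct slot_pool *p, int s)
{
    p->next_free[s] = p->head;
    p->head = s;
}
```

Both operations run in constant time, which matters when slots are allocated and released at packet rate.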

Hardware Coprocessors
Both the Input Engine and Output Engine coprocessors are configured by software and use an MWMR hardware controller.
• Input Engine
  • Its aim is to copy the packets coming from the Gigabit Ethernet link into system memory.
  • It implements the management of the slot structure.
• Output Engine
  • Its aim is to reconstitute the packets from their slots in order to copy them to the outgoing Ethernet link.
  • The Output Engine works symmetrically to the Input Engine.
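
To make the Output Engine's job concrete, the following C sketch rebuilds a contiguous packet by walking a chain of slots. It is a host-side software model, not the coprocessor's design. The names `host_slot` and `packet_gather` are hypothetical, slots are linked with ordinary pointers instead of memory addresses, and the 124-byte payload assumes a 128-byte slot with a 4-byte link.

```c
#include <stddef.h>
#include <string.h>

#define SLOT_PAYLOAD 124   /* assumption: 128-byte slot minus a 4-byte link */

/* Host-side stand-in for a slot. With native pointers this struct is
 * not 128 bytes on a 64-bit machine; it only mimics the chaining. */
struct host_slot {
    struct host_slot *next;                    /* NULL ends the chain */
    unsigned char     payload[SLOT_PAYLOAD];
};

/* Copies up to packet_len bytes from the slot chain into out.
 * Returns the number of bytes copied, which is smaller than packet_len
 * if the chain ends early or out_cap is too small. */
size_t packet_gather(const struct host_slot *first, size_t packet_len,
                     unsigned char *out, size_t out_cap)
{
    size_t want = packet_len < out_cap ? packet_len : out_cap;
    size_t copied = 0;

    for (const struct host_slot *s = first;
         s != NULL && copied < want; s = s->next) {
        size_t chunk = want - copied;
        if (chunk > SLOT_PAYLOAD)
            chunk = SLOT_PAYLOAD;
        memcpy(out + copied, s->payload, chunk);
        copied += chunk;
    }
    return copied;
}
```

The Input Engine would perform the reverse operation, cutting an incoming packet into slot-sized pieces and linking them.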

The Hardware MWMR Controller
The VCI_MWMR_initiator wrapper is a generic hardware MWMR controller that has a DMA capability. It implements a variable number of read or write MWMR communication channels.
[Figure: the hardware coprocessor (IP core) is attached to the VCI_MWMR_INITIATOR wrapper, which reaches the VCI_RAM (containing the MWMR software FIFO) through the VCI interconnect]

Packet Moves
[Figure: packet moves]

Performance Evaluation
• We use SoCLib models for the hardware part of the platform.
• The abstraction level is CABA (cycle accurate, bit accurate).
• In a class project, we developed a suite of small benchmarks (IPv4, classification, NAT, firewall, ...).
• We analyze and compare the output files generated by the simulation (chronograms and Ethernet flows).
• The throughput of incoming packets can be varied for performance and maximal-load measurements.

Ongoing Research
• Multi-cluster communication architecture: we want to optimize performance by exploiting the locality of applications. Clusterization further complicates the mapping of tasks onto processors and of communication channels onto memory banks.
• Hardware-managed synchronization: by using hardware queues, we try to minimize the cost of taking locks for exclusive FIFO access.
• Macro-pipeline for packet processing: the two applications shown exhibit packet-level parallelism (one packet per task). We try to parallelize further by decomposing the processing into several threads.
• Packet reordering: MWMR communication channels allow packets to arrive out of order at the Output Engine. We want to analyze several packet reordering strategies.
• KPN communication mode on the MWMR channels: other classes of applications, whose task graphs are not necessarily of the task-farm type, are currently being studied: MJPEG and JPEG 2000.