ECE 526 Network Processing Systems Design Programming Model




















ECE 526 – Network Processing Systems Design
Programming Model
Chapter 21: D. E. Comer

Overview
• Recall
  ─ Network processors have complicated, heterogeneous architectures
  ─ They are hard to program
    • Programmers need to understand fine details of the architecture
    • Current approach: assembly or a subset of the C language
• Programming model
  ─ Fills the gap between application and architecture
  ─ Natural interface (e.g. a domain-specific language for the programmer)
  ─ Abstraction of the underlying hardware
    • Enough architectural detail to write efficient code
    • Not too complicated for the programmer
• Two models
  ─ Hardware-specific model: IXP programming model
  ─ General models: NP-Click and ADAG
Ning Weng, ECE 526

IXP Programming Model
• What kind of software abstractions are used on the IXP?
• Active Computing Element (ACE):
  ─ Fundamental software building block
  ─ Used to construct packet processing systems
  ─ Runs on the XScale, a microengine (uE), or the host
  ─ Handles control plane and fast or slow path packet processing
  ─ Coordinates and synchronizes with other ACEs
  ─ Can have multiple outputs
  ─ Can serve as part of a pipeline
• Protocol processing is implemented by combining multiple ACEs

ACE Terminology
• Library ACE:
  ─ ACE provided by Intel for basic functions
• Conventional ACE or Standard ACE:
  ─ ACE built by the customer
  ─ Might make use of Intel’s Action Service Libraries
• Micro ACE:
  ─ ACE with two components:
    • Core component (runs on the XScale)
    • Microblock component (runs on a microengine, uE)
• Terminology for microblocks:
  ─ Source microblock: initial point that receives packets
  ─ Transform microblock: intermediate point that accepts and forwards packets
  ─ Sink microblock: last point that sends packets

ACE Parts
• An ACE contains four conceptual parts:
• Initialization:
  ─ Sets up data structures and variables before code execution
• Classification:
  ─ The ACE classifies each packet on arrival
  ─ The classification can be chosen by the programmer or left at the default
• Actions:
  ─ An action is invoked based on the classification
• Message and event management:
  ─ The ACE can generate or handle messages
  ─ Used to communicate with another ACE or with hardware

ACE Binding
• ACEs can be bound together to implement protocol processing:
• Binding happens when an ACE is loaded into the NP
• Bindings can be changed dynamically
• Unbound targets silently discard packets
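
The binding rules above can be illustrated with a toy model. This is only a sketch, not Intel's ACE API: the `Ace` class, its `bind` and `send` methods, and the target names are all hypothetical and exist purely to show dynamic rebinding and the silent-discard behavior of unbound targets.

```python
class Ace:
    """Toy stand-in for an ACE with named output targets (hypothetical API)."""

    def __init__(self, name, targets=("default",)):
        self.name = name
        self.bindings = {t: None for t in targets}  # None means unbound
        self.received = []
        self.discarded = 0

    def bind(self, target, downstream):
        """Bind (or rebind) an output target to another ACE."""
        self.bindings[target] = downstream

    def send(self, target, packet):
        """Forward a packet; an unbound target drops it without any error."""
        downstream = self.bindings[target]
        if downstream is None:
            self.discarded += 1
            return
        downstream.receive(packet)

    def receive(self, packet):
        self.received.append(packet)


ingress = Ace("ingress", targets=("ip", "arp"))
ip = Ace("ip")
ingress.bind("ip", ip)
ingress.send("ip", "pkt-1")    # delivered to the IP ACE
ingress.send("arp", "pkt-2")   # "arp" is unbound: silently discarded

arp = Ace("arp")
ingress.bind("arp", arp)       # binding changed at run time
ingress.send("arp", "pkt-3")   # now delivered
```

The `discarded` counter is there only so the drop is observable in the example; the slide says nothing about whether real ACEs count discards.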

ACE Division

Microengine Assignment
• Packet processing involves several microblocks
• How should microblocks be allocated to microengines?
  ─ One microblock per microengine
  ─ Multiple microblocks per microengine (in a pipeline)
  ─ Multiple pipelines on multiple microengines
• What are the pros and cons?
  ─ Passing packets between microengines incurs overhead
  ─ Pipelining causes inefficiencies if blocks are not equal in size
  ─ Multiple blocks per microengine cause contention and require more instruction storage
• Intel terminology: “microblock group”
  ─ Set of microblocks running on one microengine
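
A back-of-the-envelope comparison makes the trade-off concrete. The sketch below is a deliberately simplified model under stated assumptions: the cycle counts and the handoff cost are made-up numbers, a pipeline is assumed to be limited by its slowest stage plus one handoff, and the replicated case ignores contention and instruction-store limits, both of which the slide lists as real costs.

```python
def pipeline_throughput(stage_cycles, handoff_cycles=0):
    """Packets per cycle for one block per engine; the slowest stage is the bottleneck."""
    return 1.0 / (max(stage_cycles) + handoff_cycles)


def replicated_throughput(stage_cycles, engines):
    """Packets per cycle when every engine runs all blocks back to back."""
    return engines / sum(stage_cycles)


# Hypothetical per-packet cycle counts for three unequal microblocks.
blocks = [100, 400, 100]

pipelined = pipeline_throughput(blocks, handoff_cycles=20)  # 1 / 420
parallel = replicated_throughput(blocks, engines=3)         # 3 / 600
```

With these assumed numbers the unbalanced pipeline is limited by the 400-cycle block, so three engines deliver less than half of what three replicated engines would under the idealized model.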

Microblock Groups
• Microblock groups can be replicated to increase parallelism

Microblock Group Replication
• Performance-critical groups can be replicated

Control of Packet Flow
• Packets require different processing blocks
  ─ IP requires different microblocks than ARP
  ─ Special packets get handed off to the core
• A “dispatch loop” controls packet flow among microblocks
  ─ Each thread runs its own dispatch loop
  ─ Infinite loop that grabs packets and hands them to microblocks
  ─ Return value from a microblock determines the next step
• Invocation of a microblock is similar to a function call

Dispatch Loop
• Example:
  ─ Three microblocks
  ─ Ingress, IP, egress
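
The three-microblock example might look roughly like the following. This is a Python sketch of the control structure only, not IXP microengine code: the packet fields, the return codes, and the rule that non-IP packets go to the core are assumptions made for illustration, and the loop ends when the queue is empty instead of running forever.

```python
from collections import deque

# Hypothetical return codes that tell the dispatch loop where to go next.
DROP, TO_IP, TO_EGRESS, TO_CORE = "drop", "ip", "egress", "core"


def ingress(pkt):
    """Source microblock: classify the packet."""
    return TO_IP if pkt["type"] == "ip" else TO_CORE


def ip_block(pkt):
    """Transform microblock: decrement TTL, drop expired packets."""
    pkt["ttl"] -= 1
    return TO_EGRESS if pkt["ttl"] > 0 else DROP


def egress(pkt, sent):
    """Sink microblock: transmit the packet."""
    sent.append(pkt["id"])


def dispatch_loop(rx_queue):
    sent, to_core = [], []
    while rx_queue:  # a real dispatch loop would be infinite
        pkt = rx_queue.popleft()
        nxt = ingress(pkt)          # each block is called like a function
        if nxt == TO_IP:
            nxt = ip_block(pkt)     # its return value picks the next step
        if nxt == TO_EGRESS:
            egress(pkt, sent)
        elif nxt == TO_CORE:
            to_core.append(pkt["id"])
        # DROP: nothing further to do
    return sent, to_core


packets = deque([
    {"id": 1, "type": "ip", "ttl": 4},
    {"id": 2, "type": "arp", "ttl": 0},
    {"id": 3, "type": "ip", "ttl": 1},
])
sent, to_core = dispatch_loop(packets)
```

Packet 1 is forwarded, packet 2 is handed to the core, and packet 3 is dropped because its TTL reaches zero.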

Click Model of IPv4
Source: "NP-Click: A Programming Model for the Intel IXP1200" by Niraj Shah et al., UC Berkeley

My Approach: ADAG
• Architecture-independent workload representation
• ADAG (Annotated Directed Acyclic Graph)
  ─ Node: a processing task
    • Annotated with a 3-tuple: the number of instructions, the number of memory reads, and the number of memory writes
  ─ Edge: a dependency between tasks
    • Edge weight: the amount of data communicated between the two nodes
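
One possible in-memory form of an ADAG is sketched below. The class and field names are hypothetical, the node numbers are invented, and treating the edge weight as a byte count is an assumption; the slide only says "amount of data".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """The 3-tuple annotation attached to each processing task."""
    instructions: int
    mem_reads: int
    mem_writes: int


@dataclass
class Adag:
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)  # (src, dst) -> data volume

    def add_node(self, name, instructions, mem_reads, mem_writes):
        self.nodes[name] = Node(instructions, mem_reads, mem_writes)

    def add_edge(self, src, dst, data):
        self.edges[(src, dst)] = data

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph has a cycle."""
        indeg = {n: 0 for n in self.nodes}
        for (_, dst) in self.edges:
            indeg[dst] += 1
        ready = sorted(n for n, d in indeg.items() if d == 0)
        order = []
        while ready:
            n = ready.pop(0)
            order.append(n)
            for (src, dst) in sorted(self.edges):
                if src == n:
                    indeg[dst] -= 1
                    if indeg[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError("graph has a cycle")
        return order


g = Adag()
g.add_node("parse", instructions=50, mem_reads=2, mem_writes=0)
g.add_node("lookup", instructions=120, mem_reads=6, mem_writes=0)
g.add_node("rewrite", instructions=30, mem_reads=1, mem_writes=2)
g.add_edge("parse", "lookup", data=4)
g.add_edge("lookup", "rewrite", data=8)
order = g.topological_order()
```

The topological sort doubles as the acyclicity check that the "A" in ADAG requires.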
![Profiling: Trace Generation](https://slidetodoc.com/presentation_image_h2/ac70e4b8fc2f07406fdefafc86e7ca25/image-15.jpg)
Profiling: Trace Generation
• PacketBench [Ramaswamy 2003]
• Data dependencies between registers and memories
• Control dependencies for conditional branches
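
To show what "data dependencies between registers and memories" means on a trace, here is a minimal sketch that extracts read-after-write pairs. It does not use or imitate the PacketBench interface: the trace format (a list of written and read locations per instruction) and the function name are assumptions, and control dependencies are left out.

```python
def raw_dependencies(trace):
    """Return (producer, consumer) index pairs for read-after-write dependencies."""
    last_writer = {}  # location -> index of the most recent instruction writing it
    deps = []
    for idx, (writes, reads) in enumerate(trace):
        for loc in reads:
            if loc in last_writer:
                deps.append((last_writer[loc], idx))
        for loc in writes:
            last_writer[loc] = idx
    return deps


# Hypothetical four-instruction trace: (locations written, locations read).
trace = [
    (["r1"], ["mem[0x10]"]),          # 0: load  r1 <- mem[0x10]
    (["r2"], ["r1"]),                 # 1: add   r2 <- r1 + 1
    (["mem[0x20]"], ["r2"]),          # 2: store mem[0x20] <- r2
    (["r3"], ["mem[0x20]", "r1"]),    # 3: add   r3 <- mem[0x20] + r1
]
deps = raw_dependencies(trace)
```

Each pair could become an ADAG edge once instructions are grouped into processing tasks.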
![Clustering Algorithm](https://slidetodoc.com/presentation_image_h2/ac70e4b8fc2f07406fdefafc86e7ca25/image-16.jpg)
Clustering Algorithm
• Ratio Cut [Wei 1991]
  ─ Identifies the natural clusters without a priori knowledge of the final number of clusters
  ─ Clusters nodes together such that r_ij is minimized
  ─ Top-down approach
  ─ NP-complete
• MLRC (Maximum Local Ratio Cut)
  ─ Bottom-up approach
  ─ Merges the nodes that should be least separated and recursively applies the process
  ─ Computational complexity: O(n^3)
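
The bottom-up merging idea can be sketched as follows. Several things here are assumptions and not taken from the slide: the local ratio is computed as r_ij = c_ij / (|C_i| * |C_j|), where c_ij is the edge weight between clusters i and j; merging stops at a caller-supplied `target_clusters`; and ties are broken by iteration order. The sketch favors clarity over the stated O(n^3) bound.

```python
from itertools import combinations


def local_ratio(ci, cj, weights):
    """Assumed metric: traffic between two clusters divided by the product of their sizes."""
    cut = sum(w for (a, b), w in weights.items()
              if (a in ci and b in cj) or (a in cj and b in ci))
    return cut / (len(ci) * len(cj))


def mlrc(nodes, weights, target_clusters):
    """Repeatedly merge the pair of clusters with the largest local ratio."""
    clusters = [frozenset([n]) for n in nodes]
    while len(clusters) > target_clusters:
        ci, cj = max(combinations(clusters, 2),
                     key=lambda pair: local_ratio(pair[0], pair[1], weights))
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci | cj]
    return clusters


# Hypothetical 4-node chain: heavy traffic a-b and c-d, light traffic b-c.
weights = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 8}
result = mlrc(["a", "b", "c", "d"], weights, target_clusters=2)
```

The pairs with the heaviest mutual traffic are merged first, so the light b-c edge ends up as the cut between the two final clusters.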

ADAG Mapping onto NPs
• Goal: generate a high-performance schedule
• Mapping is an NP-complete problem
• Use randomized mapping to approach this NP-complete problem
• Evaluate each randomized mapping with an analytical performance model
[Figure labels: PE, ADAG node]
Reference: B. A. Malloy, E. L. Lloyd, and M. L. Soffa. Scheduling DAG’s for asynchronous multiprocessor execution. IEEE Transactions on Parallel and Distributed Systems, 5(5):498–508, May 1994.
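
A randomized mapper can be very small. In the sketch below the cost function is a stand-in, not the analytical performance model from the lecture: it simply adds the work on the busiest PE to the traffic that crosses PE boundaries. The function names, the `comm_cost` weight, and the example numbers are all hypothetical.

```python
import random


def cost(mapping, work, edges, comm_cost=1.0):
    """Assumed model: busiest PE's work plus traffic crossing PE boundaries."""
    load = {}
    for node, pe in mapping.items():
        load[pe] = load.get(pe, 0) + work[node]
    crossing = sum(w for (a, b), w in edges.items() if mapping[a] != mapping[b])
    return max(load.values()) + comm_cost * crossing


def randomized_mapping(work, edges, num_pes, trials, seed=0):
    """Draw random node-to-PE assignments and keep the cheapest one seen."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = {n: rng.randrange(num_pes) for n in work}
        c = cost(candidate, work, edges)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost


work = {"a": 100, "b": 100, "c": 100, "d": 100}   # instructions per node
edges = {("a", "b"): 5, ("c", "d"): 5}            # data between nodes
best, best_cost = randomized_mapping(work, edges, num_pes=2, trials=200)
```

More trials raise the chance of hitting a good mapping but never guarantee the optimum, which is the usual trade-off for a randomized search on an NP-complete problem.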

Mapping Quality I
• Simulation setup: pipeline depth 1, width 8
• Performance model of ideal mapping:

Mapping Quality II
• Exhaustive search: enumerates all possible mappings
• Randomized search: randomly chooses a mapping
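
The reason exhaustive search is only a reference point is the size of the search space: N nodes on P processing engines give P^N candidate mappings. The sketch below enumerates them for a tiny instance. The max-load cost function and the 20-node figure are assumptions chosen for illustration; only the width of 8 comes from the simulation setup.

```python
from itertools import product


def exhaustive_best(work, num_pes):
    """Try every node-to-PE assignment; cost is the load of the busiest PE (assumed)."""
    nodes = list(work)
    best, best_cost, visited = None, float("inf"), 0
    for assignment in product(range(num_pes), repeat=len(nodes)):
        visited += 1
        load = [0] * num_pes
        for node, pe in zip(nodes, assignment):
            load[pe] += work[node]
        if max(load) < best_cost:
            best, best_cost = dict(zip(nodes, assignment)), max(load)
    return best, best_cost, visited


work = {"a": 4, "b": 3, "c": 2, "d": 1}
best, best_cost, visited = exhaustive_best(work, num_pes=2)

# Search space for a hypothetical 20-node ADAG on 8 PEs.
search_space = 8 ** 20
```

Four nodes on two engines need only 16 evaluations, while the 20-node case already exceeds 10^18, which is why a randomized search is compared against the exhaustive result only on small inputs.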

Summary
• Programming NPs for high performance is a hard problem
• Programming models are a solution
  ─ Intel ACE
  ─ NP-Click
  ─ ADAGs

For Next Class and Reminder
• Read Chapter 23
• Lab 3
• Project