
Parallel computing in embedded systems
Massimo Bocchi
Architecture of Integrated Systems course, academic year 2002/2003

Outline
- Data-Flow Graph (DFG) representations
- Pipelining and Parallel Processing transformations on a DFG
- Multiprocessor classification
- Bus-based multiprocessor systems
- Multiprocessor systems based on the dataflow mechanism
- Message-passing multiprocessor systems
- MPI: a message-passing interface standard
- Application: a message-passing multiprocessor SoC based on the Multi-layer AHB architecture

Massimo Bocchi, 07/02/2003, ARCES - University of Bologna

Data-flow graph representations
A data-flow graph (DFG) contains nodes and directed edges:
- NODE: represents a computation or task; each node has an execution time associated with it
- DIRECTED EDGE: represents the communication between two nodes (data path); each edge has a number of delays associated with it
[Figure: example DFG; node execution times are shown in parentheses and edge delays are marked D]
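As a minimal sketch of the definition above, a DFG can be stored as a table of node execution times plus a list of edges annotated with their delay counts. The graph below is hypothetical (it is not the graph in the slide figure), and the names `exec_time`, `edges` and `critical_path` are chosen here for illustration only. The function computes the critical path, taken here as the longest total execution time along any path made only of zero-delay edges.

```python
from functools import lru_cache

# Hypothetical DFG: node -> execution time (time units).
exec_time = {"A": 2, "B": 4, "C": 5, "D": 2}

# Directed edges as (source, destination, number_of_delays).
edges = [
    ("A", "B", 0),
    ("B", "C", 0),
    ("C", "D", 1),
    ("D", "A", 2),
]


def critical_path(exec_time, edges):
    """Longest execution time over paths that use only zero-delay edges."""
    succ = {node: [] for node in exec_time}
    for src, dst, delays in edges:
        if delays == 0:
            succ[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Assumes the zero-delay subgraph has no cycles.
        tail = max((longest_from(n) for n in succ[node]), default=0)
        return exec_time[node] + tail

    return max(longest_from(node) for node in exec_time)


print(critical_path(exec_time, edges))
```

In this hypothetical graph the only zero-delay path is A, B, C, so the delays on the other two edges are what keep the loop from forming a zero-delay cycle.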


Pipelining and Parallel Processing
- Pipelining and parallel processing techniques can be used to transform a DFG
- These DFG transformations can be used to:
  - increase the clock speed or the data throughput
  - reduce the power consumption if the clock speed is not increased

Example
[Figure: the same two-node DFG (nodes A and B, execution times (2) and (4)) drawn three ways: the basic DFG from x[n] to y[n]; a 2-level pipelined structure with a delay D, producing y[n-1]; and a 2-level parallel processing structure operating on x[2m] and x[2m+1]]
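The effect of the two transformations on throughput can be sketched with a few lines of arithmetic. This assumes, reading the figure labels, that A takes 2 time units and B takes 4, that the pipelining delay sits between A and B, and that the parallel structure uses two complete copies of the A-B chain; the variable names are illustrative.

```python
# Assumed execution times (time units), read from the figure labels.
T_A, T_B = 2, 4

# Basic DFG: A and B run back to back for every sample.
basic_period = T_A + T_B

# 2-level pipelining: a delay between A and B lets both stages work at once,
# so the slower stage sets the sample period.
pipelined_period = max(T_A, T_B)

# 2-level parallel processing: two copies of the A-B chain handle x[2m] and
# x[2m+1] together, so two samples complete every T_A + T_B time units.
parallel_period = (T_A + T_B) / 2

print(basic_period, pipelined_period, parallel_period)
```

Under these assumptions the pipelined version is limited by its slowest stage, while the parallel version keeps the original clock period and gains throughput by processing two samples per period.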

Processor-time diagram
[Figure: processor-time diagrams for the basic DFG, the 2-level pipelined structure and the 2-level parallel processing structure, showing how tasks A, B and C (the figure is labelled with execution times (3), (1) and (5)) occupy processors P1 and P2 over time, from input x[n] to output y[n]]

Scheduling
- Static scheduling: the execution order of the tasks is determined at compile time from an estimate of their execution times
- Dynamic scheduling: the execution order of the tasks is resolved at run time, based on the actual execution times of the tasks
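The difference between the two approaches can be illustrated with a small sketch. It assumes independent tasks, two processors and a simple greedy rule (each task goes to the processor that becomes free first); the task set, the timings and the helper names `assign` and `makespan` are hypothetical. The static plan is built from estimates and then run with the actual times, while the dynamic plan is built from the actual times, as if each decision were taken when a processor becomes free.

```python
import heapq


def assign(tasks, durations, n_procs):
    """Greedy list scheduling: each task goes to the processor that frees up first.

    Returns {task: processor_index}; `durations` drives the decisions.
    """
    free_at = [(0, p) for p in range(n_procs)]
    heapq.heapify(free_at)
    placement = {}
    for task in tasks:
        t, p = heapq.heappop(free_at)
        placement[task] = p
        heapq.heappush(free_at, (t + durations[task], p))
    return placement


def makespan(tasks, placement, durations, n_procs):
    """Total time when each processor runs its assigned tasks back to back."""
    busy = [0] * n_procs
    for task in tasks:
        busy[placement[task]] += durations[task]
    return max(busy)


tasks = ["A", "B", "C", "D"]
estimated = {"A": 3, "B": 3, "C": 3, "D": 3}   # compile-time estimates
actual = {"A": 6, "B": 1, "C": 1, "D": 1}      # hypothetical run-time values

static_plan = assign(tasks, estimated, 2)      # decided before execution
dynamic_plan = assign(tasks, actual, 2)        # decided as processors free up

print(makespan(tasks, static_plan, actual, 2))
print(makespan(tasks, dynamic_plan, actual, 2))
```

With these numbers the static plan keeps its original placement even though task A overruns its estimate, whereas the dynamic plan moves the short tasks to the idle processor.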


Multiprocessor classification
In 1966 Flynn proposed a multiprocessor classification:
- SISD (single instruction stream, single data stream): a conventional single processor based on the von Neumann model
- SIMD (single instruction stream, multiple data stream): a single instruction stream is generated by a single control unit but used by several processors; each processor executes the same instruction stream but acts upon different data, generating multiple data streams
- MISD (multiple instruction stream, single data stream): no existing implementation can be attributed to this model
- MIMD (multiple instruction stream, multiple data stream): one instruction stream is generated for each processor; each instruction stream acts upon different data

MIMD multiprocessor classification
A MIMD multiprocessor system needs a mechanism to load the instruction and data memories and to pass information between processors. There are two classes of solution:
- shared-memory multiprocessor systems
  - bus-based multiprocessor systems
- multiprocessor systems without shared memory
  - dataflow multiprocessor systems
  - message-passing multiprocessor systems


Shared memory model
- All the processors have access to a common memory space
- Cache memories create consistency problems on the shared data
[Figure: processors connected to memories through an interconnection network]

Bus-based multiprocessor systems
- A typical implementation of the shared memory model is a bus-based multiprocessor system (e.g. an AMBA AHB-based system)
- These systems are not easily expandable to a large number of bus masters
- Synchronization mechanisms are necessary to maintain consistency on the shared data
- The shared bus and memory cause contention between bus masters, reducing the system speed and the data throughput
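The need for synchronization on shared data can be sketched in software, with threads standing in for bus masters and a lock standing in for whatever mechanism the hardware provides. This is an analogy under those assumptions, not a model of AMBA AHB arbitration; the names `counter` and `worker` are illustrative.

```python
import threading

counter = 0                      # shared data, visible to every thread
lock = threading.Lock()          # synchronization mechanism guarding it


def worker(increments):
    global counter
    for _ in range(increments):
        with lock:               # only one thread updates the counter at a time
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

The lock also shows the cost named in the last bullet: while one thread holds it, the others wait, much as bus masters wait for a contended bus or memory.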


Dataflow model
- This model derives directly from DFG representations
- An instruction is executed within a module when the required operands are available
- The computation is data driven, since the data stream activates the nodal operations
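The firing rule described above can be sketched as a loop that executes any node whose operands are all present, regardless of the order in which the nodes were declared. The two-node program, the token names and the function `run_dataflow` are hypothetical, chosen only to illustrate data-driven execution.

```python
import operator

# Hypothetical program: each node lists its operation and its input names.
# "prod" is declared first on purpose; it still fires second.
nodes = {
    "prod": (operator.mul, ("sum", "c")),
    "sum": (operator.add, ("a", "b")),
}


def run_dataflow(nodes, initial_tokens):
    """Fire any node whose operands are all available until nothing can fire."""
    tokens = dict(initial_tokens)
    fired_order = []
    pending = dict(nodes)
    progress = True
    while pending and progress:
        progress = False
        for name, (op, inputs) in list(pending.items()):
            if all(i in tokens for i in inputs):
                tokens[name] = op(*(tokens[i] for i in inputs))
                fired_order.append(name)
                del pending[name]
                progress = True
    return tokens, fired_order


tokens, order = run_dataflow(nodes, {"a": 2, "b": 3, "c": 4})
print(order, tokens["prod"])
```

Execution order here is set by data availability alone: "prod" cannot fire until "sum" has produced its token.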


Message-passing model
- Direct links are used to pass data between the processors
- There are only local memories; each memory can be accessed by only one processor (multicomputers)
[Figure: processors, each with its own memory, connected through an interconnection network]

Elements of a message-passing system
- Node: an independent subsystem containing a processor, local memories and peripherals
- Link: a physical communication path between two nodes
- Channel: a communication path between two processes, either on the same processor or on different processors

Message communication mechanisms
- Two processes can communicate through a bidirectional channel
- Basic communication operations are divided into three categories:
  - Send
  - Receive
  - Test

Message communication mechanisms
[Figure: message exchange among processes P1, P2 and P3, showing the operations send(m1, P2), receive(m1, P1), send(m2, P3) and receive(m2, P3)]
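The three operation categories can be sketched on top of a FIFO buffer. This is a single-direction channel under stated assumptions: the class `Channel` and its method names are hypothetical, and "test" is interpreted here as a non-blocking check for a pending message, which the slides do not define.

```python
import queue


class Channel:
    """One direction of a channel between two processes, backed by a FIFO buffer."""

    def __init__(self):
        self._buffer = queue.Queue()

    def send(self, message):
        # Deposits the message in the buffer and returns.
        self._buffer.put(message)

    def receive(self):
        # Blocks until a message is available.
        return self._buffer.get()

    def test(self):
        # Non-blocking probe: True if a message is waiting in the buffer.
        return not self._buffer.empty()


p1_to_p2 = Channel()
before = p1_to_p2.test()         # nothing sent yet
p1_to_p2.send("m1")
after = p1_to_p2.test()          # a message is now pending
message = p1_to_p2.receive()
print(before, after, message)
```

A bidirectional channel, as in the slides, could be modelled as a pair of such objects, one per direction.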

Physical support for message passing
- A link is necessary when processes are executed on different processors
- The physical connection may include message buffers to improve communication efficiency
[Figure: P1 connected to P2 through a buffer]


MPI standard
- Message Passing Interface (MPI) is a standard for writing message-passing programs
- It is not targeted to a specific platform
- Both Fortran 77 and C are supported
- The main MPI features are intended to improve performance on scalable parallel computers with specialized interprocessor communication hardware
- MPICH is a free, portable implementation of the MPI standard


Application
- An AMBA AHB-compliant version of the XiRisc processor has been designed
- An AHB layer containing the processor, a local memory and other peripherals can be replicated to create a Multi-layer AHB multiprocessor SoC
- A small set of message-passing primitives can be implemented on the proposed architecture

Application
[Figure: two AHB layers, one per XiRisc processor, each with a local memory and a DMA on its AHB bus; the labels also show an I/O channel, and send and receive buffers between XiRisc processor 1 and XiRisc processor 2]
1) P2 performs a receive operation; execution is stopped since no message has been sent yet.
2) P1 sends a message to P2.
3) P2 receives the message and can continue execution.
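The three steps can be mimicked in software with two threads and a queue standing in for the hardware message buffer. This is only a behavioural sketch under those assumptions, not the XiRisc primitives themselves; the names `buffer_p1_to_p2`, `receive_posted` and `log` are illustrative.

```python
import queue
import threading

buffer_p1_to_p2 = queue.Queue()      # stands in for the hardware message buffer
receive_posted = threading.Event()
log = []


def p2():
    log.append("P2: receive posted, waiting")
    receive_posted.set()
    message = buffer_p1_to_p2.get()   # blocks while the buffer is empty
    log.append(f"P2: received {message}, continuing")


def p1():
    receive_posted.wait()             # make sure P2 asked first, as in step 1
    log.append("P1: send")
    buffer_p1_to_p2.put("hello")


t2 = threading.Thread(target=p2)
t1 = threading.Thread(target=p1)
t2.start()
t1.start()
t1.join()
t2.join()

print("\n".join(log))
```

The event only enforces the ordering of the steps for the demonstration; the blocking behaviour itself comes from the receive on an empty buffer.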
