The Microprocessor Is No Longer General Purpose

Design Gap

Problems with Fine Grained Approach FPGAs
• Area inefficient
  – Percentage of chip area for wiring far too high
• Too slow
  – Unavoidable critical paths too long
• Routing and Placement are very complex

Problems with Fine Grained FPGAs

Coarse Grained Reconfigurable computing
• Uses reconfigurable arrays with path-widths greater than 1 bit
• More area-efficient
• Massive reduction in configuration memory and configuration time
• Drastic reduction in complexity of Placement & Routing

Coarse Grained Architectures Classification
• Mesh-based
• Linear Arrays based
• Cross-bar based

Mesh Based Architectures
• Arranges PEs in a 2-D array
• Encourages nearest neighbor links between adjacent PEs
• E.g. KressArray, Matrix, RAW, CHESS
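
The nearest neighbor idea can be illustrated with a minimal sketch. It assumes a 4-neighbor (north, south, east, west) pattern with no wrap-around at the array edges; the function name `mesh_neighbors` and the 4 x 4 size are illustrative and not taken from any of the architectures listed.

```python
def mesh_neighbors(row, col, rows, cols):
    """Return coordinates of the PEs directly adjacent to (row, col).

    Assumption: 4-neighbor connectivity without wrap-around.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


# A corner PE in a hypothetical 4 x 4 mesh has two neighbors, an interior PE has four.
corner = mesh_neighbors(0, 0, 4, 4)
interior = mesh_neighbors(1, 1, 4, 4)
```

Real mesh architectures often add longer or diagonal links on top of this basic pattern, so the sketch only covers the adjacent-PE case.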

Matrix – Mesh based Architecture

Matrix – Mesh Based Architecture

Architectures based on Linear Arrays
• Aimed at mapping pipelines on linear arrays
• If the pipeline has forks, longer lines spanning the whole or part of the array are used
• E.g. RaPiD, PipeRench
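
As a rough illustration of the pipeline mapping described above, the sketch below places stages on consecutive PEs and separates the edges that fit on neighbor links from those that need a longer line. The stage names, the `map_pipeline` helper and the placement rule (stage order equals PE order) are assumptions made for the example, not the mapping algorithm of RaPiD or PipeRench.

```python
def map_pipeline(stages, edges):
    """Place stages on consecutive PEs and classify each edge by the wire it needs.

    Assumption: stage i is simply placed on PE i of the linear array.
    """
    position = {name: index for index, name in enumerate(stages)}
    local, long_lines = [], []
    for src, dst in edges:
        span = abs(position[dst] - position[src])
        if span == 1:
            local.append((src, dst))            # fits on a neighbor link
        else:
            long_lines.append((src, dst, span))  # needs a line spanning `span` PEs
    return position, local, long_lines


# Hypothetical pipeline with one fork: "load" feeds both "scale" and "add".
stages = ["load", "scale", "offset", "add"]
edges = [("load", "scale"), ("scale", "offset"), ("offset", "add"), ("load", "add")]
position, local, long_lines = map_pipeline(stages, edges)
```

Here the forked edge from "load" to "add" is the only one that cannot use a neighbor link, which is the case the longer lines are meant to handle.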

PipeRench – Linear Array based architecture

PipeRench – Linear Array Based Architecture

Cross-bar based Architectures
• Communication Network is easy to route
• Uses restricted cross-bars with hierarchical interconnect to save area
• E.g. PADDI-1, PADDI-2, Pleiades

PADDI-2 – Cross-bar based architecture

PADDI-2 Cross-bar based Architecture

Coarse Grained Architectures

EGRA
• Architectural template to enable design space exploration
• Executes expressions as opposed to operations
• Supports heterogeneous cells and various memory interfaces

EGRA

Evolution of fine grained and coarse grained architectures

EGRA – at Cell Level

Architectural Exploration

Architectural exploration

EGRA vs CGRA vs FPGA

EGRA – at array level
• Organized as a mesh of cells of three types
  – RACs
  – Memories
  – Multipliers
• Cells are connected using both nearest neighbor and horizontal-vertical buses
• Each cell has an I/O interface, context memory and core
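
The array organization above can be sketched as a small data model. Everything in it is an assumption made for illustration: the 3 x 3 layout, the names `CellKind`, `Cell` and `link_type`, and in particular the rule that a bus reaches every cell in the same row or column. The slides do not specify the bus span.

```python
from dataclasses import dataclass, field
from enum import Enum


class CellKind(Enum):
    RAC = "rac"
    MEMORY = "memory"
    MULTIPLIER = "multiplier"


@dataclass
class Cell:
    kind: CellKind
    # Configuration words held in the cell's context memory (contents hypothetical).
    context_memory: list = field(default_factory=list)


def link_type(a, b):
    """Classify how cells at coordinates a and b could be connected.

    Assumption: buses span a whole row or column of the mesh.
    """
    (r1, c1), (r2, c2) = a, b
    if a == b:
        return None
    if abs(r1 - r2) + abs(c1 - c2) == 1:
        return "nearest-neighbor"
    if r1 == r2:
        return "horizontal-bus"
    if c1 == c2:
        return "vertical-bus"
    return None  # no direct link in this simplified model


# Hypothetical heterogeneous 3 x 3 mesh.
layout = [
    [CellKind.RAC, CellKind.RAC, CellKind.MULTIPLIER],
    [CellKind.RAC, CellKind.MEMORY, CellKind.RAC],
    [CellKind.MULTIPLIER, CellKind.RAC, CellKind.RAC],
]
mesh = {(r, c): Cell(kind) for r, row in enumerate(layout) for c, kind in enumerate(row)}
```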

Control Unit

EGRA Operation
• DMA mode
  – Used to transfer data in bursts to EGRA
  – To program cells and to read/write from scratchpad memories
• Execution mode
  – Control unit orchestrates data flow between cells
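
The split between the two modes can be pictured with a toy model. The class `EgraModel`, its method names and the rule that programming and scratchpad access are rejected outside DMA mode are assumptions for illustration; this is not the actual EGRA control interface.

```python
class EgraModel:
    """Toy model of the two operating modes (hypothetical interface)."""

    def __init__(self, scratchpad_words):
        self.mode = "dma"
        self.scratchpad = [0] * scratchpad_words
        self.cell_contexts = {}

    def _require(self, mode):
        if self.mode != mode:
            raise RuntimeError(f"operation requires {mode} mode, current mode is {self.mode}")

    def set_mode(self, mode):
        if mode not in ("dma", "execute"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def program_cell(self, cell_id, context_words):
        """DMA mode: load configuration words into a cell."""
        self._require("dma")
        self.cell_contexts[cell_id] = list(context_words)

    def write_scratchpad(self, address, burst):
        """DMA mode: write a burst of words into the scratchpad."""
        self._require("dma")
        if address < 0 or address + len(burst) > len(self.scratchpad):
            raise IndexError("burst does not fit in the scratchpad")
        self.scratchpad[address:address + len(burst)] = burst

    def read_scratchpad(self, address, length):
        """DMA mode: read a burst of words back from the scratchpad."""
        self._require("dma")
        return self.scratchpad[address:address + length]


model = EgraModel(scratchpad_words=8)
model.program_cell("rac_0", [0x1, 0x2])
model.write_scratchpad(2, [10, 20, 30])
readback = model.read_scratchpad(2, 3)
model.set_mode("execute")  # from here on the control unit would drive the cells
```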

EGRA – at array level

Experimental Results

Experimental Results

Experimental Results

EGRA Memory Interface
• Data register at the output of computational cells
• Memory cells can be scattered around in the array
• A scratchpad memory outside reconfigurable mesh

Architectural exploration - Area

Architectural exploration - Delay

MORA

The reconfigurable Cell

Operating modes of RC

Interconnection Topology
• Hierarchical
  – Level 1 used within 4 x 4 quadrant to provide nearest neighbor connectivity
  – Interleaved Horizontal and Vertical connectivity of length two
  – Each RC can receive data from at most two other RCs and send data to at most four other RCs
  – Data and control across quadrants is guaranteed over Level 2 interconnection
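
The constraints listed above can be checked mechanically. The sketch below is a hedged illustration: it takes the 4 x 4 quadrant size and the fan-in and fan-out limits from the slide, but the function names are hypothetical and it does not model which specific Level 1 links exist inside a quadrant.

```python
from collections import Counter

QUADRANT = 4     # quadrant edge length, from the slide
MAX_FAN_IN = 2   # an RC receives data from at most two other RCs
MAX_FAN_OUT = 4  # an RC sends data to at most four other RCs


def interconnect_level(src, dst):
    """Return 1 if both RCs lie in the same 4 x 4 quadrant, otherwise 2."""
    same_quadrant = (src[0] // QUADRANT, src[1] // QUADRANT) == (
        dst[0] // QUADRANT,
        dst[1] // QUADRANT,
    )
    return 1 if same_quadrant else 2


def fan_violations(links):
    """Return (RCs exceeding the fan-in limit, RCs exceeding the fan-out limit)."""
    unique_links = set(links)  # count each source/destination pair once
    fan_in = Counter(dst for _, dst in unique_links)
    fan_out = Counter(src for src, _ in unique_links)
    bad_in = sorted(rc for rc, n in fan_in.items() if n > MAX_FAN_IN)
    bad_out = sorted(rc for rc, n in fan_out.items() if n > MAX_FAN_OUT)
    return bad_in, bad_out


# Hypothetical mapping in which RC (1, 1) is fed by three other RCs.
links = [((0, 0), (1, 1)), ((0, 1), (1, 1)), ((1, 0), (1, 1))]
bad_in, bad_out = fan_violations(links)
```

A mapping tool could run a check like this before placement to reject dataflow graphs that no routing on the array can satisfy.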

Interconnection Topology

Computational Strategies
• Temporal computational load balancing
• Spatial computational load balancing
