Chapter 7: Function-Architecture Codesign Paradigm

Function Architecture Co-design Methodology
• System-level design methodology
• Top-down (synthesis)
• Bottom-up (constraint-driven)

Co-design Process Methodology
[Figure labels: Function, Architecture, Mapping, Trade-off, Synthesis, Verification, Abstraction, Refinement, HW, SW]

System Level Design Vision
[Figure labels: Function casts a shadow; Architecture sheds light; Abstraction; Refinement; Constrained Optimization and Co-design]

Main Concepts
• Decomposition
• Abstraction and successive refinement
• Target architectural exploration and estimation

Decomposition
• Top-down flow
• Find an optimal match between the application function and the architectural application constraints (size, power, performance).
• Use a separation-of-concerns approach to decompose a function into architectural units.

Abstraction & Successive Refinement
• A formal Function/Architecture trade-off is applied for mapping function onto architecture
• Co-design and trade-off evaluation from the highest level down to the lower levels
• Successive refinement to add details to the earlier abstraction level

Target Architectural Exploration and Estimation
• Synthesized target architecture is analyzed and estimated
• Architecture constraints are derived
• An adequate model of target architecture is built

Architectural Exploration in POLIS

Main Steps in Co-design and Synthesis
• Function architecture co-design and trade-off
  – Fully synthesize the architecture?
  – Co-simulation in trade-off evaluation
    • Functional debugging
    • Constraint satisfaction and missed deadlines
    • Processor utilization and task scheduling charts
    • Cost of implementation
• Mapping function on the architecture
  – Architecture organization can be a pre-designed collection of components with various degrees of flexibility
  – Matching the optimal function to the best architecture

Function/Architecture Co-design vs. HW/SW Co-design
• Design problem over-simplified
• Must use Fun./Arch. Optimization & Co-design to match the optimal Function to the best Architecture
  1. Fun./Arch. Co-design and Trade-off
  2. Mapping Function onto Architecture

Reactive System Co-synthesis (1)
[Figure labels: Control-dominated Design, Decompose, EFSM Representation, Map, CDFG Representation, Map, HW/SW]
EFSM: Extended Finite State Machine
CDFG: Control Data Flow Graph (a directed acyclic graph)

Reactive System Co-synthesis (2)
[Figure: an EFSM with states S0, S1, S2 and operations a := 5, a := a + 1, emit(a), mapped to a CDFG of the form BEGIN, Case (state) with branches S0, S1, S2 containing a := 5, a := a + 1, state := S2, emit(a), END]
A CDFG is suitable for describing EFSM reactive behavior, but:
  – Some of the control flow is hidden
  – Data cannot be propagated
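The case-on-state shape of the CDFG can be sketched as a single step function. This is only an illustration: the transition order S0 -> S1 -> S2 -> S0 is an assumption (the arrows of the figure are not preserved), and the names `step` and `run` are hypothetical.

```python
# Hypothetical three-state EFSM: S0 sets a, S1 increments it, S2 emits it.
# The transition order S0 -> S1 -> S2 -> S0 is an assumption.

def step(state, a):
    """One reaction of the EFSM, written as a case-on-state CDFG.

    Returns (next_state, a, emitted); emitted is None when the
    reaction emits nothing.
    """
    if state == "S0":
        return "S1", 5, None
    if state == "S1":
        return "S2", a + 1, None
    if state == "S2":
        return "S0", a, a  # emit(a)
    raise ValueError(f"unknown state {state!r}")


def run(reactions):
    """Run the EFSM for a number of reactions and collect emitted values."""
    state, a, emitted = "S0", 0, []
    for _ in range(reactions):
        state, a, out = step(state, a)
        if out is not None:
            emitted.append(out)
    return emitted
```

Each call to `step` sees only the branch for the current state, so the fact that `a` is always 5 on entry to S1 is not visible inside that branch. That is one way to read the remark that data cannot be propagated across the case statement.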

Data Flow Optimization
[Figure: EFSM Representation and Optimized EFSM Representation over states S0, S1, S2; recoverable labels: a := 5, a := a + 1, and the constant 6]
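A minimal sketch of the kind of data flow optimization this slide appears to illustrate: propagating the constant 5 through a := a + 1 so that the value 6 is known statically. The operation encoding and the function name `fold_constants` are assumptions made for this example, and only straight-line code with `+` is handled.

```python
def fold_constants(ops):
    """Propagate and fold integer constants through straight-line operations.

    Each op is (dest, expr), where expr is an int, a variable name,
    or a tuple ("+", lhs, rhs) whose operands are ints or variable names.
    """
    known = {}   # variable -> constant value known at this point
    result = []
    for dest, expr in ops:
        if isinstance(expr, tuple):
            _, lhs, rhs = expr
            lhs, rhs = known.get(lhs, lhs), known.get(rhs, rhs)
            if isinstance(lhs, int) and isinstance(rhs, int):
                expr = lhs + rhs          # fold the addition
            else:
                expr = ("+", lhs, rhs)
        else:
            expr = known.get(expr, expr)  # replace a variable by its constant
        if isinstance(expr, int):
            known[dest] = expr
        else:
            known.pop(dest, None)         # dest is no longer a known constant
        result.append((dest, expr))
    return result


optimized = fold_constants([("a", 5), ("a", ("+", "a", 1)), ("out", "a")])
```

After folding, the first assignment `a = 5` is no longer used by anything, so a later dead operation elimination pass could remove it.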

Optimization and Co-design Approach
• Architecture-independent phase
  – The task function is considered on its own, and control and data flow analysis is performed
  – Redundant information and computations are removed
• Architecture-dependent phase
  – Architectural information is used to perform additional guided optimizations tuned to the target platform

Concrete Co-design Flow
[Figure labels: Function; Architecture; Constraints; Specification (Esterel, Reactive VHDL, Graphical EFSM); Decomposition; FFG; Functional Optimization; AFFG; Macro-level Optimization; Micro-level Optimization; SHIFT (CFSM Network); Behavioral Optimization; AUX; Modeling; Resource Pool; Cost-guided Optimization; Estimation and Validation; HW/SW RTOS/Interface Co-synthesis; Processor, RTOS, BUS, Interface, SW Partition, HW Partition, HW1 to HW5]

Function/Architecture Co-Design Representation

Abstract Co-design Flow
[Figure labels: Design; Application Decomposition; IDR (fsm, data, control, i/o, I, O, f.); ASICs; processors; Function/Architecture Optimization and Co-design; Mapping; Hardware/Software Co-synthesis; Processor, Interface, BUS, RTOS, HW Partition, SW Partition, HW1 to HW5]

Unifying Intermediate Design Representation for Co-design
[Figure labels: Design; Functional Decomposition; Constraints; Intermediate Design Representation (IDR); Architecture Independent; Architecture Dependent; SW; HW]

Platform-Based Design
Source: ASV

Models and System
• Models of computation
  – Petri-net model (graphical language for system design)
  – FSM (Finite-State Machine) models
  – Hierarchical Concurrent FSM models
• POLIS system
  – CFSM (Co-design FSM)
  – EFSM (Extended FSM): support for data handling and asynchronous communication

CFSM
• Includes
  – Finite state machine
  – Data computation
  – Locally synchronous behavior
  – Globally asynchronous behavior
• Semantics: GALS (Globally Asynchronous and Locally Synchronous communication model)

CFSM Network MOC
[Figure: network of CFSM1, CFSM2, and CFSM3 exchanging events A, B, C, F, G; transition labels include B=>C, C=>F, C=>G, C=>A, C=>B, (A==0)=>B, F^(G==1)]
Communication between CFSMs is by means of events.
MOC: Model of Computation

System Specification Language
• "Esterel"
  – as "front-end" for functional specification
  – Synchronous programming language for specifying reactive real-time systems
• Reactive VHDL
• Graphical EFSM

Intermediate Design Representation (IDR)
• Most current optimization and synthesis are performed at the low abstraction level of a DAG (Directed Acyclic Graph).
• The Function Flow Graph (FFG) is an IDR that has the notion of I/O semantics.
• The textual interchange format of the FFG is called C-Like Intermediate Format (CLIF).
• An FFG is generated from an EFSM description and can be in Tree Form or DAG Form.

(Architecture) Function Flow Graph
[Figure labels: Design; Functional Decomposition; Constraints; Refinement; Restriction; I/O Semantics; EFSM Semantics; FFG (Architecture Independent); AFFG (Architecture Dependent); SW; HW]

FFG/CLIF
• Develop Function Flow Graph (FFG) / C-Like Intermediate Format (CLIF)
  – Able to capture EFSM
  – Suitable for control and data flow analysis
[Figure labels: EFSM, FFG, Optimized FFG, CDFG; Data Flow/Control Optimizations]

Function Flow Graph (FFG)
An FFG is a triple $G = (V, E, N_0)$ where
• $V$ is a finite set of nodes.
• $E = \{(x, y)\}$ is a subset of $V \times V$; $(x, y)$ is an edge from $x$ to $y$, where $x \in \mathrm{Pred}(y)$, the set of predecessor nodes of $y$.
• $N_0 \in V$ is the start node, corresponding to the EFSM initial state.
• An unordered set of operations is associated with each node $N$.
• Operations consist of TESTs performed on the EFSM inputs and internal variables, and ASSIGNs of computations on the input alphabet (inputs and internal variables) to the EFSM output alphabet (outputs and internal (state) variables).
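The definition above can be mirrored in a small data structure. This is a sketch under assumptions: the class and field names (`Node`, `FFG`, `tests`, `assigns`, `is_well_formed`) are hypothetical, and the three-node graph is invented only to exercise `pred`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One FFG node with its unordered TEST and ASSIGN operations."""
    name: str
    tests: list = field(default_factory=list)    # conditions on inputs/variables
    assigns: list = field(default_factory=list)  # (dest, expression) pairs


@dataclass
class FFG:
    """G = (V, E, N0): nodes, directed edges, and the start node."""
    nodes: dict  # V, keyed by node name
    edges: set   # E, a set of (x, y) name pairs
    start: str   # N0

    def pred(self, y):
        """Pred(y): names of all nodes with an edge into y."""
        return {x for (x, target) in self.edges if target == y}

    def is_well_formed(self):
        """Check that N0 is in V and every edge connects nodes of V."""
        return self.start in self.nodes and all(
            x in self.nodes and y in self.nodes for x, y in self.edges
        )


# Invented example graph: S0 -> S1 -> S2 -> S1.
g = FFG(
    nodes={n: Node(n) for n in ("S0", "S1", "S2")},
    edges={("S0", "S1"), ("S1", "S2"), ("S2", "S1")},
    start="S0",
)
```

Storing E as a set of pairs keeps the structure close to the mathematical definition; a real tool would more likely keep adjacency lists for speed.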

C-Like Intermediate Format (CLIF)
• Import/Export Function Flow Graph (FFG)
• "Un-ordered" list of TEST and ASSIGN operations
  – [if (condition)] goto label
  – dest = op(src)
    • op = {not, minus, …}
  – dest = src1 op src2
    • op = {+, *, /, ||, &&, |, &, …}
  – dest = func(arg1, arg2, …)

Preserving I/O Semantics
  input inp;
  output outp;
  int a = 0;
  int CONST_0 = 0;
  int T11 = 0;
  int T13 = 0;
  S1: goto S2;
  S2: a = inp;
      T13 = a + 1 CONST_0;
      T11 = a + a;
      outp = T11;
      goto S3;
  S3: outp = T13;
      goto S3;

FFG / CLIF Example
Legend: constant, output flow, dead operation; S# = State, S#L# = Label in State S#
Function Flow Graph:
[Figure: node S1 with operations x = x + y, a = b + c, a = x, cond1 = (y == cst1), cond2 = !cond1; the label y = 1; edges (cond2 == 1) / output(b) and (cond2 == 0) / output(a)]
CLIF Textual Representation:
  S1:   x = x + y;
        a = b + c;
        a = x;
        cond1 = (y == cst1);
        cond2 = !cond1;
        if (cond2) goto S1L0;
        output = a;
        goto S1; /* Loop */
  S1L0: output = b;
        goto S1;

Tree-Form FFG

Function/Architecture Co-Design
Function/Architecture Optimizations

Function Optimization
• Architecture-independent optimization objective:
  – Eliminate redundant information in the FFG.
  – Represent the information in an optimized FFG that has a minimal number of nodes and associated operations.

FFG Optimization Algorithm
FFG Optimization algorithm(G)
begin
  while changes to FFG do
    Variable Definition and Uses
    FFG Build
    Reachability Analysis
    Normalization
    Available Elimination
    False Branch Pruning
    Copy Propagation
    Dead Operation Elimination
  end while
end
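The outer "while changes to FFG" loop is a fixpoint iteration over a list of passes. The driver below is a sketch of that loop shape only; the two toy passes (`drop_self_copies`, `drop_adjacent_duplicates`) are hypothetical stand-ins that work on a flat list of copy operations, not the passes named on the slide.

```python
def optimize(ffg, passes):
    """Apply every pass in order, repeating until a full round changes nothing.

    Each pass takes a representation and returns a new (possibly equal) one.
    Returns the final representation and the number of rounds executed.
    """
    changed, rounds = True, 0
    while changed:
        changed = False
        for run_pass in passes:
            new = run_pass(ffg)
            if new != ffg:
                ffg, changed = new, True
        rounds += 1
    return ffg, rounds


# Toy passes over a list of (dest, src) copy operations (illustrative only).
def drop_self_copies(ops):
    return [(d, s) for d, s in ops if d != s]


def drop_adjacent_duplicates(ops):
    out = []
    for op in ops:
        if not out or out[-1] != op:
            out.append(op)
    return out


result, rounds = optimize(
    [("a", "a"), ("b", "c"), ("b", "c")],
    [drop_self_copies, drop_adjacent_duplicates],
)
```

The loop terminates only if the passes cannot undo each other indefinitely; passes that strictly shrink the representation, as here, satisfy that trivially.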

Optimization Approach
• Develop optimizer for FFG (CLIF) intermediate design representation
• Goal: Optimize for speed and size by reducing
  – ASSIGN operations
  – TEST operations
  – variables
• Reach goal by solving a sequence of data flow problems for analysis and information gathering, using an underlying Data Flow Analysis (DFA) framework
• Optimize by information redundancy elimination

Sample DFA Problem: Available Expressions Example
• Goal is to eliminate re-computations
  – Formulate Available Expressions Problem
  – Forward Flow (meet) Problem
[Figure: flow graph annotated with available-expression sets. S1: t := a + 1. S2: t1 := a + 1; t2 := b + 2. S3: a := a * 5; t3 = a + 2. Annotations: AE = ∅, AE = {a+1}, AE = {a+1, b+2}, AE = {a+1}, AE = {a+2}]
AE = Available Expression

Data Flow Problem Instance
• A particular (problem) instance of a monotone data flow analysis framework is a pair $I = (G, M)$, where $M: N \to F$ is a function that maps each node $N$ in $V$ of the FFG $G$ to a function in $F$ on the node label semilattice $L$ of the framework $D$.

Data Flow Analysis Framework
• A monotone data flow analysis framework $D = (L, \wedge, F)$ is used to manipulate the data flow information by interpreting the node labels on $N$ in $V$ of the FFG $G$ as elements of an algebraic structure, where
  – $L$ is a bounded semilattice with meet $\wedge$, and
  – $F$ is a monotone function space associated with $L$.

Solving Data Flow Problems
Data Flow Equations
[Figure: flow graph annotated with available-expression sets. S1: t := a + 1. S2: t1 := a + 1; t2 := b + 2. S3: a := a * 5; t3 = a + 2. Annotations: AE = ∅, AE = {a+1}, AE = {a+1, b+2}, AE = {a+1}, AE = {a+2}]
AE = Available Expression
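The slide title mentions data flow equations without the transcript preserving them. For available expressions, the textbook formulation is given below; the names $\mathrm{gen}$ and $\mathrm{kill}$ are the usual ones and are an assumption here, not taken from the slides.

```latex
\begin{align*}
AE_{\mathrm{in}}(N_0) &= \emptyset \\
AE_{\mathrm{in}}(n)   &= \bigcap_{p \in \mathrm{Pred}(n)} AE_{\mathrm{out}}(p),
                         \qquad n \neq N_0 \\
AE_{\mathrm{out}}(n)  &= \mathrm{gen}(n) \cup
                         \bigl(AE_{\mathrm{in}}(n) \setminus \mathrm{kill}(n)\bigr)
\end{align*}
```

For node S3 of the figure, $AE_{\mathrm{in}} = \{a+1\}$, the assignment to $a$ puts $a+1$ in $\mathrm{kill}$, and $\mathrm{gen} = \{a+2\}$, which gives $AE_{\mathrm{out}} = \{a+2\}$, matching the annotation.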

Solving Data Flow Problems
• Solve data flow problems using the iterative method
  – General: does not depend on the flow graph
  – Optimal for a class of data flow problems
  – Reaches fixpoint in polynomial time ($O(n^2)$)
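A minimal sketch of the iterative method applied to the available-expressions example from the earlier slides. The edge set S1 -> S2, S1 -> S3, S2 -> S3 is an assumption chosen to be consistent with the annotated sets (the arrows are not preserved), and the encoding of operations as `(dest, expr, operands)` is hypothetical.

```python
def available_expressions(blocks, edges, entry):
    """Solve the available-expressions problem by round-robin iteration.

    blocks: node name -> list of (dest, expr, operands) in execution order
    edges:  set of (pred, succ) pairs
    Returns (ae_in, ae_out), each mapping a node name to a set of expressions.
    """
    universe = {e for ops in blocks.values() for _, e, _ in ops}
    operands = {e: set(v) for ops in blocks.values() for _, e, v in ops}
    preds = {n: {p for p, s in edges if s == n} for n in blocks}

    def transfer(node, incoming):
        avail = set(incoming)
        for dest, expr, _ in blocks[node]:
            avail.add(expr)  # the expression is computed here
            # Redefining dest invalidates every expression that reads it.
            avail = {e for e in avail if dest not in operands[e]}
        return avail

    ae_in = {n: set() for n in blocks}
    # Optimistic start: assume everything is available until disproved.
    ae_out = {n: set(universe) for n in blocks}
    changed = True
    while changed:
        changed = False
        for n in blocks:
            if n == entry or not preds[n]:
                ae_in[n] = set()
            else:
                ae_in[n] = set.intersection(*(ae_out[p] for p in preds[n]))
            new_out = transfer(n, ae_in[n])
            if new_out != ae_out[n]:
                ae_out[n], changed = new_out, True
    return ae_in, ae_out


blocks = {
    "S1": [("t", "a+1", ("a",))],
    "S2": [("t1", "a+1", ("a",)), ("t2", "b+2", ("b",))],
    "S3": [("a", "a*5", ("a",)), ("t3", "a+2", ("a",))],
}
edges = {("S1", "S2"), ("S1", "S3"), ("S2", "S3")}  # assumed edges
ae_in, ae_out = available_expressions(blocks, edges, "S1")
```

Under these assumptions the solver reproduces the sets annotated in the figure: `{a+1}` after S1, `{a+1, b+2}` after S2, `{a+1}` on entry to S3, and `{a+2}` after S3.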

FFG Optimization Algorithm
• Solve the following problems in order to improve the design:
  – Reaching Definitions and Uses
  – Normalization
  – Available Expression Computation
  – Copy Propagation and Constant Folding
  – Reachability Analysis
  – False Branch Pruning
• Code improvement techniques
  – Dead Operation Elimination
  – Computation sharing through normalization
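Two of the listed techniques, copy propagation and dead operation elimination, can be sketched on straight-line code. The operations below resemble those of the FFG/CLIF example (x = x + y; a = b + c; a = x; output = a); the tuple encoding and function names are assumptions for illustration, and branches are not handled.

```python
def copy_propagate(ops):
    """Replace uses of a copied variable by its source (straight-line code).

    Each op is (dest, operator, args); operator is None for a plain copy.
    """
    copies = {}  # dest -> source for copies that are still valid
    out = []
    for dest, operator, args in ops:
        args = tuple(copies.get(a, a) for a in args)
        # dest is redefined: forget copies that mention it on either side.
        copies = {d: s for d, s in copies.items() if dest not in (d, s)}
        if operator is None and args[0] != dest:
            copies[dest] = args[0]
        out.append((dest, operator, args))
    return out


def eliminate_dead(ops, outputs):
    """Drop assignments whose result is never used (backward liveness scan)."""
    live = set(outputs)
    kept = []
    for dest, operator, args in reversed(ops):
        if dest in live:
            live.discard(dest)
            live.update(args)
            kept.append((dest, operator, args))
    return kept[::-1]


ops = [
    ("x", "+", ("x", "y")),
    ("a", "+", ("b", "c")),   # overwritten before use: dead
    ("a", None, ("x",)),
    ("out", None, ("a",)),
]
propagated = copy_propagate(ops)
cleaned = eliminate_dead(propagated, outputs={"out"})
```

Copy propagation rewrites `out = a` into `out = x`, which leaves both assignments to `a` unused, so the backward scan removes them.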

Function/Architecture Co-design

Function Architecture Optimizations
• Fun./Arch. Representation:
  – Attributed Function Flow Graph (AFFG) is used to represent architectural constraints impressed upon the functional behavior of an EFSM task.

Architecture Dependent Optimizations
[Figure labels: EFSM, FFG, OFFG, AFFG, CDFG; Architecture Independent; Architectural Information; lib; Sum]

EFSM in AFFG (State Tree) Form
[Figure labels: states S0, S1, S2; nodes F0, F1, F3, F4, F5, F6, F7, F8]

Architecture Dependent Optimization Objective
• Optimize the AFFG task representation for speed of execution and size, given a set of architectural constraints
• Size: area of hardware, code size of software

Motivating Example
[Figure: flow graph with nodes 1 to 10 inside a reactivity loop; operations include y = a + b, a = c, x = a + b, z = a + b]
Eliminate the redundant, needless run-time re-evaluation of the a + b operation.

Cost-guided Relaxed Operation Motion (ROM)
• For safely moving operations from heavily executed portions of a design task to less-visited segments
• Relaxed-Operation-Motion (ROM):
  begin
    Data Flow and Control Optimization
    Reverse Sweep (dead operation addition, normalization and available operation elimination, dead operation elimination)
    Forward Sweep (optional, minimize the lifetime)
    Final Optimization Pass
  end

Cost-Guided Operation Motion
[Figure labels: Cost Estimation; User Input; Profiling; Inference Engine; Design Optimization; FFG (back-end); Attributed FFG; Relaxed Operation Motion]

Function Architecture Co-design in the Micro-Architecture
[Figure labels: System Constraints; System Specs; Decomposition; ASICs; processors; fsm, data, control, i/o, I, O, f.; AFFG; Instruction Selection; Operator Strength Reduction; sample operations t1 = 3*b, t2 = t1 + a, emit x(t2)]

Operator Strength Reduction
Before:
  t1 = 3*b
  t2 = t1 + a
  x = t2
After:
  expr1 = b + b;
  t1 = expr1 + b;
  t2 = t1 + a;
  x = t2;
Reducing the multiplication operator
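The rewrite of t1 = 3*b into two additions can be sketched as a small function over three-address operations. The function name, the `exprN` temporaries beyond `expr1`, and the `max_const` threshold are assumptions; whether the rewrite pays off depends on the relative cost of multiplication and addition on the target.

```python
def reduce_strength(dest, const, var, max_const=3):
    """Rewrite dest = const * var as a chain of additions for small constants.

    Returns a list of (dest, lhs, op, rhs) three-address operations.
    The multiplication is kept when const is outside 2..max_const.
    """
    if not 2 <= const <= max_const:
        return [(dest, const, "*", var)]
    ops, prev = [], var
    for i in range(1, const):
        # The last addition writes the original destination.
        target = dest if i == const - 1 else f"expr{i}"
        ops.append((target, prev, "+", var))
        prev = target
    return ops


reduced = reduce_strength("t1", 3, "b")
```

For `3*b` this produces `expr1 = b + b` followed by `t1 = expr1 + b`, the same two operations shown on the slide.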

Architectural Optimization
• Abstract Target Platform
  – Macro-architectures of the HW or SW system design tasks
• CFSM (Co-design FSM): FSM with reactive behavior
  – A reactive block
  – A set of combinational data-flow functions
• Software Hardware Intermediate Format (SHIFT)
  – SHIFT = CFSMs + Functions

Macro-Architectural Organization
[Figure labels: Processor, RTOS, BUS, Interface, SW Partition, HW Partition, HW1 to HW5]

Architectural Organization of a Single CFSM Task
[Figure label: CFSM]

Task Level Control and Data Flow Organization
[Figure labels: Reactive Controller; blocks EQ, INC, MUX, RESET; signals a, b, c, s, y, a_EQ_b, INC_a, RESET_a; constants 0 and 1]

CFSM Network Architecture
• Software Hardware Intermediate FormaT (SHIFT) for describing a network of CFSMs
• It is a hierarchical netlist of
  – Co-design finite state machines
  – Functions: state-less arithmetic, Boolean, or user-defined operations

SHIFT: CFSMs + Functions

Architectural Modeling
• Using an AUXiliary specification (AUX)
• AUX can describe the following information
  – Signal and variable type-related information
  – Definition of the value of constants
  – Creation of hierarchical netlist, instantiating and interconnecting the CFSMs described in SHIFT

Mapping AFFG onto SHIFT
• Synthesis through mapping AFFG onto SHIFT and AUX (Auxiliary Specification)
• Decompose each AFFG task behavior into a single reactive control part and a set of data-path functions.
Mapping AFFG onto SHIFT Algorithm (G, AUX)
begin
  foreach state s belonging to G do
    build_trel(s.trel, s, s.start_node, G, AUX);
  end foreach
end

Architecture Dependent Optimizations
• Additional architecture information leads to an increased level of macro- (or micro-) architectural optimization
• Examples of macro-arch. optimization
  – Multiplexing computation inputs
  – Function sharing
• Example of micro-arch. optimization
  – Data type optimization

Distributing the Reactive Controller
Move some of the control into the data path as an ITE assign expression.
[Figure labels: Reactive Controller; MUX; ITE blocks; signals a, b, c, d, e, s, out, Tout]
ITE: if-then-else

Multiplexing Inputs
[Figure: adder with multiplexed inputs selected by Control {1, 2}; recoverable labels: T = b + c, T(b+c), T(b+a), inputs a, b, c]

Micro-Architectural Optimization
• Available Expressions cannot eliminate T2
• But if variables are registered (additional architectural information) we can share T1 and T2
  S1: T1 = a + b; x = T1; a = c;
  S2: T2 = a + b; Out = T(a+b); emit(Out)
[Figure labels: adder (+), a, b, x, T(a+b), Out]

Function/Architecture Co-Design
Hardware/Software Co-Synthesis and Estimation

Co-Synthesis Flow
[Figure labels: EFSM, FFG, AFFG, CDFG, SHIFT; FFG Interpreter (Simulation); Software Compilation; Object Code (.o); Hardware Synthesis; Netlist]

POLIS Co-design Environment
[Figure labels: Graphical EFSM; ESTEREL; Compilers; CFSMs; Partitioning; SW Synthesis; SW Estimation; SW Code + RTOS; HW Synthesis; HW Estimation; Logic Netlist; Performance/trade-off Evaluation; Physical Prototyping; Programmable Board (µP of choice, FPGAs, FPICs)]

POLIS Co-design Environment
• Specification: FSM-based languages (Esterel, ...)
• Internal representation: CFSM network
• Validation:
  – High-level co-simulation
  – FSM-based formal verification
  – Rapid prototyping
• Partitioning: based on co-simulation estimates
• Scheduling
• Synthesis:
  – S-graph (based on a CDFG) based code synthesis for software
  – Logic synthesis for hardware
• Main emphasis on unbiased verifiable specification

Hardware/Software Co-Synthesis
• Functional GALS CFSM model for hardware and software
  + initially unbounded delays, refined after architecture mapping
• Automatic synthesis of:
  – Hardware
  – Software
  – Interfaces
  – RTOS

RTOS Synthesis and Evaluation in Polis
[Figure labels: Resource Pool; CFSM Network; HW/SW Synthesis; RTOS Synthesis; Physical Prototyping]
1. Provide communication mechanisms among CFSMs implemented in SW, and between the processor the OS is running on and the HW partitions.
2. Schedule the execution of the SW tasks.

Estimation on the Synthesis CDFG
[Figure: CDFG from BEGIN to END with nodes detect(c), a<b (T/F branches), a := 0, a := a + 1, emit(y), annotated with the cost figures 26, 40, 41, 63, 9, 18, 14]
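One way to use per-node cost annotations like those in the figure is to compute best-case and worst-case totals over all BEGIN-to-END paths of the acyclic graph. The sketch below shows that idea under assumptions: the graph shape and the cost values are invented for illustration (they are not the figures from the slide), and the function name is hypothetical.

```python
from functools import lru_cache


def path_cost_bounds(cost, succ, begin, end):
    """Return (min, max) total cost over all paths from begin to end in a DAG.

    cost: node -> cost of executing that node
    succ: node -> list of successor nodes
    """
    @lru_cache(maxsize=None)
    def bounds(node):
        if node == end:
            return cost[node], cost[node]
        lo = min(bounds(s)[0] for s in succ[node])
        hi = max(bounds(s)[1] for s in succ[node])
        return cost[node] + lo, cost[node] + hi

    return bounds(begin)


# Invented costs and edges, loosely shaped like the figure.
cost = {"BEGIN": 0, "detect": 4, "test": 5, "inc": 3, "reset": 2, "emit": 6, "END": 0}
succ = {
    "BEGIN": ["detect"],
    "detect": ["test", "END"],    # event present / absent
    "test": ["inc", "reset"],     # a<b true / false
    "inc": ["emit"],
    "reset": ["emit"],
    "emit": ["END"],
}
best, worst = path_cost_bounds(cost, succ, "BEGIN", "END")
```

Memoizing `bounds` keeps the traversal linear in the number of edges even when many paths share a suffix.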

Architecture Evaluation Problem
[Figure labels: System Behavior; System Architecture; Refine; HDL; Out of Spec; High Cost; Time and Money]

Proper Architectural Evaluation
[Figure labels: System Behavior; System Architecture; Refine; Implementation; In Spec; Low Cost; Time and Money]

Estimation-Based Co-simulation
[Figure labels: Network of EFSMs; HW/SW Partitioning; SW Estimation; HW/SW Co-Simulation; Performance/trade-off Evaluation]

Co-simulation Approach (1)
• Fills the "validation gap" between fast and slow models
  – Performs performance simulation based on software and hardware timing estimates
• Outputs behavioral VHDL code
  – Generated from CDFG describing EFSM reactive function
  – Annotated with clock cycles required on target processors
• Can incorporate VHDL models of pre-existing components

Co-simulation Approach (2)
• Models of mixed hardware, software, RTOS and interfaces
• Mimics the RTOS I/O monitoring and scheduling
  – Hardware CFSMs are concurrent
  – Only one software CFSM can be active at a time
• Future Work
  – Architectural view instead of component view

Research Directions in F-A Codesign
• Functional decomposition, cross-"block" optimization ~ hardware/software partitioning techniques
• Task and system level algorithm manipulations ~ performing user-guided algorithmic manipulations