COE 561 Digital System Design & Synthesis: Architectural Synthesis

  • Slides: 52

COE 561 Digital System Design & Synthesis
Architectural Synthesis
Dr. Aiman H. El-Maleh
Computer Engineering Department
King Fahd University of Petroleum & Minerals
[Adapted from slides of Prof. G. De Micheli: Synthesis & Optimization of Digital Circuits]

Outline
• Motivation
• Dataflow graphs
• Sequencing graphs
• Compilation and behavioral optimization
• Resources
• Constraints
• Synthesis in temporal domain: Scheduling
• Synthesis in spatial domain: Binding

Synthesis
• Transform behavioral into structural view.
• Architectural-level synthesis
  • Architectural abstraction level.
  • Determine macroscopic structure.
  • Example: major building blocks like adder, register, mux.
• Logic-level synthesis
  • Logic abstraction level.
  • Determine microscopic structure.
  • Example: logic gate interconnection.

Synthesis and Optimization [figure]

Architectural Design Space Example … [figure]

Different Design Solutions
• 1 Multiplier, 1 ALU
• 2 Multipliers, 2 ALUs

Example of Structures [figure]

Area vs. Latency Tradeoffs
• Multiplier area: 5
• Adder area: 1
• Other logic area: 1

Architectural-Level Synthesis Motivation
• Raise input abstraction level.
  • Reduce specification of details.
  • Extend designer base.
  • Self-documenting design specifications.
  • Ease modifications and extensions.
• Reduce design time.
• Explore and optimize macroscopic structure
  • Series/parallel execution of operations.

Architectural-Level Synthesis
• Translate HDL models into sequencing graphs.
• Behavioral-level optimization
  • Optimize abstract models independently from the implementation parameters.
• Architectural synthesis and optimization
  • Create macroscopic structure: data-path and control-unit.
  • Consider area and delay information of the implementation.

Dataflow Graphs…
• Behavioral views of architectural models.
• Useful to represent datapaths.
• Graph
  • Vertices = operations.
  • Edges = dependencies.
• Dependencies arise due to
  • Input to an operation is result of another operation.
  • Serialization constraints in specification.
  • Two tasks share the same resource.

…Dataflow Graphs
• Assumes the existence of variables that store the information required and generated by operations.
• Each variable has a lifetime, which is the interval from birth to death.
• Variable birth is the time at which the value is generated.
• Variable death is the latest time at which the value is referenced as input to an operation.
• Values must be preserved during their lifetime.
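The lifetime definition can be sketched in a few lines. This is only an illustration: the variable names, the step numbers, and the convention that an unread value dies at its birth step are assumptions, not part of the slides.

```python
def lifetimes(birth, uses):
    """Return {variable: (birth, death)} for each variable.

    birth maps a variable to the step at which its value is generated.
    uses maps a variable to the steps at which it is read as an input.
    """
    result = {}
    for var, born in birth.items():
        reads = uses.get(var, [])
        # Assumption: a value that is never read dies at its birth step.
        death = max(reads) if reads else born
        result[var] = (born, death)
    return result


# Hypothetical schedule: x is generated at step 1 and read at steps 2 and 4.
birth = {"x": 1, "y": 2, "z": 3}
uses = {"x": [2, 4], "y": [3]}
life = lifetimes(birth, uses)
print(life)
```

A register must hold each value over its whole (birth, death) interval, which is why lifetimes matter later for register sharing.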

Sequencing Graphs
• Useful to represent data-path and control.
• Extended dataflow graphs
  • Control Data Flow Graphs (CDFGs).
  • Operation serialization.
  • Hierarchy.
  • Control-flow commands: branching and iteration.
  • Polar: source and sink.
• Paths in the graph represent concurrent streams of operations.

Example of Hierarchy
• Two kinds of vertices
  • Operations
  • Links: linking sequencing graph entities in the hierarchy
    • Model call
    • Branching
    • Iteration
• Vertex vi is a predecessor of vertex vj if there is a path with tail vi and head vj.
• Vertex vi is a successor of vertex vj if there is a path with head vi and tail vj.

Example of Branching…
• Branching modeled by
  • Branching clause
  • Branching body
    • Set of tasks selected according to value of branching clause.
• Several branching bodies
  • Mutually exclusive execution.
• A sequencing graph entity associated with each branch body.
• Link vertex models
  • Branching clause.
  • Operation of evaluating clause and taking branch decision.

…Example of Branching
• x = a * b
• y = x * c
• z = a + b
• if (z ≥ 0) {p = m + n; q = m * n}

Iterative Constructs
• Iterative constructs modeled by
  • Iteration clause
  • Iteration body
• Iteration body is a set of tasks repeated as long as iteration clause is true.
• Iteration modeled through use of hierarchy.
• Iteration represented as repeated model call to sequencing graph entity modeling iteration body.
• Link vertex models the operation of evaluating the iteration clause.

Example of Iteration… [figure]

…Example of Iteration [figure: loop body]

Semantics of Sequencing Graphs
• Marking of vertices
  • Waiting for execution.
  • Executing.
  • Have completed execution.
• Firing an operation means starting its execution.
• Execution semantics
  • An operation can be fired as soon as all its immediate predecessors have completed execution.
• Model can be reset by marking all operations as waiting for execution.
• Model can be fired (executed) by firing the source vertex.
• Model completes execution when sink completes execution.

Vertex Attributes
• Area cost.
• Delay cost
  • Propagation delay.
  • Execution delay.
• Data-dependent execution delays
  • Bounded (e.g. branching).
    • Maximum and minimum delays can be computed.
    • E.g. floating-point data normalization requiring conditional data alignment.
  • Unbounded (e.g. iteration, synchronization).

Properties of Sequencing Graphs
• Computed by visiting hierarchy bottom-up.
• Area estimate
  • Sum of the area attributes of all vertices.
  • Worst-case -- no sharing.
• Delay estimate (latency)
  • Bounded-latency graphs.
  • Length of longest path.
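Both estimates can be sketched for a single, non-hierarchical graph. The graph, area values, and delays below are hypothetical, and the sketch assumes a bounded-latency, acyclic graph given as a map from each vertex to its immediate predecessors.

```python
from graphlib import TopologicalSorter


def area_estimate(area):
    """Worst-case area: sum of all vertex areas, i.e. no resource sharing."""
    return sum(area.values())


def latency_estimate(preds, delay):
    """Length of the longest path, weighted by vertex execution delays."""
    finish = {}
    # static_order() yields every vertex after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[v]), default=0)
        finish[v] = start + delay[v]
    return max(finish.values())


# Hypothetical graph: two multiplications feed an addition, then a second addition.
preds = {"m1": [], "m2": [], "a1": ["m1", "m2"], "a2": ["a1"]}
area = {"m1": 5, "m2": 5, "a1": 1, "a2": 1}
delay = {v: 1 for v in preds}

total_area = area_estimate(area)
latency = latency_estimate(preds, delay)
print(total_area, latency)
```

In a hierarchical graph the same two functions would be applied bottom-up, with each link vertex taking the area and latency computed for the entity it calls.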

Compilation and Behavioral Optimization
• Software compilation
  • Compile program into intermediate form.
  • Optimize intermediate form.
  • Generate target code for an architecture.
• Hardware compilation
  • Compile HDL model into sequencing graph.
  • Optimize sequencing graph.
  • Generate gate-level interconnection for a cell library.

Hardware and Software Compilation [figure]

Compilation
• Front-end
  • Lexical and syntax analysis.
  • Parse-tree generation.
  • Macro-expansion.
  • Expansion of meta-variables.
• Semantic analysis
  • Data-flow and control-flow analysis.
  • Type checking.
  • Resolve arithmetic and relational operators.

Parse Tree Example
• a = p + q * r
[figure: parse tree]

Behavioral-Level Optimization
• Semantic-preserving transformations aiming at simplifying the model.
• Applied to parse-trees or during their generation.
• Taxonomy
  • Data-flow based transformations.
  • Control-flow based transformations.

Data-Flow Based Transformations
• Tree-height reduction.
• Constant and variable propagation.
• Common subexpression elimination.
• Dead-code elimination.
• Operator-strength reduction.
• Code motion.

Tree-Height Reduction
• Applied to arithmetic expressions.
• Goal
  • Split into two-operand expressions to make the best use of hardware parallelism.
• Techniques
  • Balance the expression tree.
  • Exploit commutativity, associativity and distributivity.

Example of Tree-Height Reduction using Commutativity and Associativity
• x = a + b * c + d => x = (a + d) + b * c
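The effect of this rewrite can be checked with a small sketch that represents each expression as a nested tuple (operator, left, right). The tuple encoding and the sample operand values are assumptions made for illustration; the left-to-right parse of the original expression is the usual one for two-operand operators.

```python
def height(expr):
    """Height of an expression tree; leaves (variable names) have height 0."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + max(height(left), height(right))


def evaluate(expr, env):
    """Evaluate a tree built from '+' and '*' over the variables in env."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


# x = a + b * c + d, parsed left to right: ((a + (b * c)) + d)
original = ("+", ("+", "a", ("*", "b", "c")), "d")
# x = (a + d) + b * c
balanced = ("+", ("+", "a", "d"), ("*", "b", "c"))

env = {"a": 1, "b": 2, "c": 3, "d": 4}
print(height(original), height(balanced))
print(evaluate(original, env), evaluate(balanced, env))
```

The balanced tree is one level shorter because a + d and b * c no longer depend on each other, so with one adder and one multiplier available they can execute in the same step.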

Example of Tree-Height Reduction using Distributivity
• x = a * (b * c * d + e) => x = a * b * c * d + a * e

Examples of Propagation & Subexpression Elimination
• Constant propagation
  • a = 0; b = a+1; c = 2 * b;
  • a = 0; b = 1; c = 2;
• Variable propagation
  • a = x; b = a+1; c = 2 * a;
  • a = x; b = x+1; c = 2 * x;
• Subexpression elimination
  • Search isomorphic patterns in the parse trees.
  • Example
    • a = x+y; b = a+1; c = x+y;
    • a = x+y; b = a+1; c = a;
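Constant propagation over straight-line code can be sketched as follows. The statement encoding (target, expression), with an expression being an integer, a variable name, or a tuple (operator, operand, operand), is an assumption made for this illustration and not a format used in the slides.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def propagate_constants(stmts):
    """Replace variables with known constant values and fold constant operations."""
    known = {}  # variable -> constant value seen so far
    out = []
    for target, expr in stmts:
        if isinstance(expr, tuple):
            op, a, b = expr
            a = known.get(a, a)
            b = known.get(b, b)
            if isinstance(a, int) and isinstance(b, int):
                expr = OPS[op](a, b)  # both operands constant: fold
            else:
                expr = (op, a, b)
        else:
            expr = known.get(expr, expr)
        if isinstance(expr, int):
            known[target] = expr
        else:
            known.pop(target, None)  # target no longer holds a known constant
        out.append((target, expr))
    return out


# a = 0; b = a + 1; c = 2 * b;
program = [("a", 0), ("b", ("+", "a", 1)), ("c", ("*", 2, "b"))]
optimized = propagate_constants(program)
print(optimized)
```

The result corresponds to the slide's a = 0; b = 1; c = 2.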

Examples of Other Transformations
• Dead-code elimination
  • a = x; b = x+1; c = 2 * x;
  • a = x; can be removed if not referenced.
• Operator-strength reduction
  • a = x^2; b = 3 * x;
  • a = x * x; t = x << 1; b = x+t.
• Code motion
  • for (i = 1; i < a * b) { } ;
  • t = a * b; for (i = 1; i < t) { }.
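The operator-strength reduction above replaces a multiplication by 3 with a shift and an addition. A quick check of the equivalence on integers, with hypothetical function names:

```python
def times3(x):
    """Original form: b = 3 * x."""
    return 3 * x


def times3_reduced(x):
    """Reduced form: t = x << 1; b = x + t."""
    t = x << 1  # shift left by one bit, i.e. 2 * x
    return x + t


same = all(times3(x) == times3_reduced(x) for x in range(-100, 101))
print(same)
```

In hardware the shift is only wiring, so the reduced form needs an adder instead of a multiplier.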

Control-Flow Based Transformations
• Model expansion.
• Conditional expansion.
• Loop expansion.
• Block-level transformations.

Model Expansion
• Expand subroutine -- flatten hierarchy.
• Useful to expand scope of other optimization techniques.
• Problematic when routine is called more than once.
• Example
  • x = a+b; y = a * b; z = foo(x, y);
  • foo(p, q) { t = q - p; return(t); }
  • By expanding foo:
  • x = a+b; y = a * b; z = y - x

Conditional Expansion
• Transform conditional into parallel execution with test at the end.
• Useful when test depends on late signals.
• May preclude hardware sharing.
• Always useful for logic expressions.
• Example
  • If (A>B) {Y = A-B} Else {Y = B-A}.
  • y = ab; if (a) {x = b + d;} else {x = bd;}
  • can be expanded to: x = a (b+d) + a'bd
  • and simplified as: y = ab; x = y + d (a+b)
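The logic-expression example can be verified exhaustively. The sketch below reads ab as AND and + as OR, as in the slide, and compares the conditional form with the simplified form over all eight input combinations; the function names are hypothetical.

```python
from itertools import product


def conditional_form(a, b, d):
    """y = ab; if (a) {x = b + d} else {x = bd}"""
    y = a and b
    x = (b or d) if a else (b and d)
    return y, x


def expanded_form(a, b, d):
    """y = ab; x = y + d (a + b)"""
    y = a and b
    x = y or (d and (a or b))
    return y, x


equivalent = all(
    conditional_form(a, b, d) == expanded_form(a, b, d)
    for a, b, d in product([False, True], repeat=3)
)
print(equivalent)
```

The simplification follows from ab + ad + a'bd = ab + d(a + a'b) = ab + d(a + b).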

Loop Expansion
• Applicable to loops with data-independent exit conditions.
• Useful to expand scope of other optimization techniques.
• Problematic when loop has many iterations.
• Example
  • x = 0; for (i = 1; i ≤ 3; i++) {x = x+i;}
  • Expanded to:
  • x = 0; x = x+1; x = x+2; x = x+3
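A short check of the loop-expansion example. It assumes the loop body accumulates the loop index (x = x + i) over i = 1, 2, 3, which is what the expanded form on the slide implies.

```python
def rolled():
    """Loop form: the exit condition does not depend on data."""
    x = 0
    for i in range(1, 4):  # i = 1, 2, 3
        x = x + i
    return x


def unrolled():
    """Expanded form: one copy of the body per iteration, index substituted."""
    x = 0
    x = x + 1
    x = x + 2
    x = x + 3
    return x


print(rolled(), unrolled())
```

Once expanded, constant propagation can reduce the whole sequence to x = 6, which illustrates how loop expansion widens the scope of other optimizations.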

Architectural Synthesis and Optimization
• Synthesize macroscopic structure in terms of building-blocks.
• Explore area/performance trade-offs
  • maximum performance implementations subject to area constraints.
  • minimum area implementations subject to performance constraints.
• Determine an optimal implementation.
• Create logic model for data-path and control.

Design Space and Objectives
• Design space
  • Set of all feasible implementations.
• Implementation parameters
  • Area.
  • Performance
    • Cycle-time,
    • Latency,
    • Throughput (for pipelined implementations).
  • Power consumption.

Design Evaluation Space [figure]

Circuit Specification for Architectural Synthesis
• Circuit behavior
  • Sequencing graphs.
• Building blocks
  • Resources.
    • Functional resources: process data (e.g. ALU).
    • Memory resources: store data (e.g. register).
    • Interface resources: support data transfer (e.g. MUX and buses).
• Constraints
  • Interface constraints
    • Format and timing of I/O data transfers.
  • Implementation constraints
    • Timing and resource usage.
      • Area
      • Cycle-time and latency

Resources
• Functional resources: perform operations on data.
  • Example: arithmetic and logic blocks.
  • Standard resources
    • Existing macro-cells.
    • Well characterized (area/delay).
    • Example: adders, multipliers, ALUs, shifters, ...
  • Application-specific resources
    • Circuits for specific tasks.
    • Yet to be synthesized.
    • Example: instruction decoder.
• Memory resources: store data.
  • Example: memory and registers.
• Interface resources
  • Example: buses and ports.

Resources and Circuit Families
• Resource-dominated circuits.
  • Area and performance depend on few, well-characterized blocks.
  • Example: DSP circuits.
• Non resource-dominated circuits.
  • Area and performance are strongly influenced by sparse logic, control and wiring.
  • Example: some ASIC circuits.

Synthesis in the Temporal Domain: Scheduling
• Goal
  • Associate a start-time with each operation.
  • Satisfy all the sequencing (timing and resource) constraints.
  • Determine area/latency trade-off.
  • Determine latency and parallelism of the implementation.
• Scheduled sequencing graph
  • Sequencing graph with start-time annotation.
• Unconstrained scheduling.
• Scheduling with timing constraints
  • Latency.
  • Detailed timing constraints.
• Scheduling with resource constraints.
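As a minimal sketch of unconstrained scheduling, the function below assigns each operation the earliest start time allowed by its predecessors (an as-soon-as-possible schedule). ASAP is used here as one simple instance of unconstrained scheduling; the graph, the delays, and the convention that the first step is numbered 1 are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def asap(preds, delay):
    """Earliest start time of each operation, ignoring resource limits."""
    start = {}
    for v in TopologicalSorter(preds).static_order():
        # An operation may start once every predecessor has finished.
        start[v] = max((start[p] + delay[p] for p in preds[v]), default=1)
    return start


# Hypothetical graph: m3 multiplies the results of m1 and m2; s1 uses m3.
preds = {"m1": [], "m2": [], "m3": ["m1", "m2"], "s1": ["m3"]}
delay = {v: 1 for v in preds}

schedule = asap(preds, delay)
latency = max(schedule[v] + delay[v] for v in preds) - 1
print(schedule, latency)
```

The start-time map is the annotation that turns a sequencing graph into a scheduled sequencing graph.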

Scheduling…
• 4 Multipliers, 2 ALUs
• 1 Multiplier, 1 ALU

…Scheduling
• 2 Multipliers, 3 ALUs
• 2 Multipliers, 2 ALUs

Synthesis in the Spatial Domain: Binding
• Associate a resource with each operation with the same type.
• Determine area of the implementation.
• Sharing
  • Bind a resource to more than one operation.
  • Operations must not execute concurrently.
• Bound sequencing graph
  • Sequencing graph with resource annotation.
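A greedy sketch of binding with sharing, assuming a given schedule with unit execution delays, so that two operations conflict only if they start in the same step. The schedule and operation names are hypothetical; the areas (multiplier 5, ALU 1) are the ones quoted elsewhere in these slides. This is one simple heuristic, not the method prescribed by the course.

```python
AREA = {"MUL": 5, "ALU": 1}


def bind(start, op_type):
    """Map each operation to (type, instance) so that no instance is used twice in a step."""
    binding = {}
    busy = {}  # (type, instance) -> set of steps in which the instance is in use
    for op in sorted(start, key=start.get):
        kind = op_type[op]
        instance = 1
        # Take the first instance of the right type that is free in this step.
        while start[op] in busy.setdefault((kind, instance), set()):
            instance += 1
        busy[(kind, instance)].add(start[op])
        binding[op] = (kind, instance)
    return binding


# Hypothetical schedule: four multiplications and three additions.
start = {"m1": 1, "m2": 1, "m3": 2, "m4": 2, "a1": 2, "a2": 3, "a3": 4}
op_type = {op: ("MUL" if op.startswith("m") else "ALU") for op in start}

binding = bind(start, op_type)
area = sum(AREA[kind] for kind, _ in set(binding.values()))
print(binding, area)
```

The binding is the resource annotation of the bound sequencing graph, and the resource area is the sum over the distinct instances it uses: here two multipliers and one ALU.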

Example: Bound Sequencing Graph [figure]

Binding Specification
• Mapping from the vertex set to the set of resource instances, for each given type.
• Partial binding
  • Partial mapping, given as design constraint.
• Compatible binding
  • Binding satisfying the constraints of the partial binding.

Performance and Area Estimation
• Resource-dominated circuits
  • Area = sum of the area of the resources bound to the operations.
    • Determined by binding.
  • Latency = start time of the sink operation (minus start time of the source operation).
    • Determined by scheduling.
• Non resource-dominated circuits
  • Area also affected by
    • registers, steering logic, wiring and control.
  • Cycle-time also affected by
    • steering logic, wiring and (possibly) control.

Approaches to Architectural Optimization
• Multiple-criteria optimization problem
  • area, latency, cycle-time.
• Determine Pareto optimal points
  • Implementations such that no other has all parameters with inferior values.
• Draw trade-off curves
  • discontinuous and highly nonlinear.
• Area/latency trade-off
  • for some values of the cycle-time.
• Cycle-time/latency trade-off
  • for some binding (area).
• Area/cycle-time trade-off
  • for some schedules (latency).
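Pareto-optimal points can be filtered from a set of candidate implementations as sketched below. The (area, latency) pairs are made-up values, and the sketch uses the standard dominance test (no worse in every parameter and different in at least one), which is an assumption about how the slide's definition is meant to be read.

```python
def dominates(p, q):
    """True if p is no worse than q in every parameter and differs from q."""
    return all(a <= b for a, b in zip(p, q)) and p != q


def pareto_points(designs):
    """Keep the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical implementations as (area, latency) pairs; smaller is better.
designs = [(12, 4), (7, 6), (13, 5), (17, 4), (7, 7)]
front = pareto_points(designs)
print(front)
```

Plotting only the surviving points gives the kind of discontinuous trade-off curve mentioned above.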

Area/Latency Trade-off…
• Rationale
  • Cycle-time dictated by system constraints.
• Resource-dominated circuits
  • Area is determined by resource usage.
• General circuits
  • Area and delay affected by registers, steering logic, wiring and control logic.
  • Complex dependency of area and delay on circuit structure.
• Scheduling and binding are deeply interrelated.
  • Most approaches perform scheduling before binding (fits well for CPU and DSP designs).
  • Performing binding before scheduling fits control-dominated designs.
• Approaches
  • Schedule for minimum latency under resource constraints.
  • Schedule for minimum resource usage under latency constraints.
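The first approach, scheduling for low latency under resource constraints, can be sketched with a simple list scheduler. This is a heuristic and is not guaranteed to reach the minimum latency. It assumes unit execution delays, at least one resource of every type, and a priority order equal to the order in which operations are listed. The graph is hypothetical.

```python
def list_schedule(preds, op_type, limit):
    """Unit-delay list scheduling; returns {operation: start step}."""
    start = {}
    step = 0
    while len(start) < len(preds):
        step += 1
        used = {}  # resource type -> instances taken in this step
        # Ready: unscheduled operations whose predecessors started in earlier steps.
        ready = [v for v in preds if v not in start
                 and all(p in start and start[p] < step for p in preds[v])]
        for v in ready:
            kind = op_type[v]
            if used.get(kind, 0) < limit[kind]:
                start[v] = step
                used[kind] = used.get(kind, 0) + 1
    return start


# Hypothetical graph: (m1 + m2) + (m3 + m4), where m1..m4 are products.
preds = {"m1": [], "m2": [], "m3": [], "m4": [],
         "a1": ["m1", "m2"], "a2": ["m3", "m4"], "a3": ["a1", "a2"]}
op_type = {op: ("MUL" if op.startswith("m") else "ALU") for op in preds}

lat_1_1 = max(list_schedule(preds, op_type, {"MUL": 1, "ALU": 1}).values())
lat_2_1 = max(list_schedule(preds, op_type, {"MUL": 2, "ALU": 1}).values())
lat_4_2 = max(list_schedule(preds, op_type, {"MUL": 4, "ALU": 2}).values())
print(lat_1_1, lat_2_1, lat_4_2)
```

Sweeping the resource limits and recording the resulting latency for each setting is one way to generate points on the area/latency trade-off curve.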

…Area/Latency Trade-off
• Areas smaller than 20 units.
• Latency less than 8 cycles.
• ALU area = 1 unit.
• MUL area = 5 units.
• Overhead area = 1 unit.
• ALU propagation delay 25 ns.
• MUL propagation delay 35 ns.
• Cycle time = 40 ns
  • Resources have unit execution delay.
• Cycle time = 30 ns
  • MUL has 2 unit execution delay.
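The execution delays quoted for the two cycle times follow from rounding each propagation delay up to a whole number of cycles. A small sketch, with a hypothetical helper name:

```python
import math

PROPAGATION_NS = {"ALU": 25, "MUL": 35}


def execution_delay(prop_delay_ns, cycle_time_ns):
    """Number of clock cycles needed to cover a propagation delay."""
    return math.ceil(prop_delay_ns / cycle_time_ns)


delays_40 = {r: execution_delay(d, 40) for r, d in PROPAGATION_NS.items()}
delays_30 = {r: execution_delay(d, 30) for r, d in PROPAGATION_NS.items()}
print(delays_40, delays_30)
```

With a 40 ns cycle both resources finish within one cycle; with a 30 ns cycle the 35 ns multiplier needs two.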