COE 561 Digital System Design & Synthesis: Architectural Synthesis


COE 561 Digital System Design & Synthesis
Architectural Synthesis
Dr. Aiman H. El-Maleh
Computer Engineering Department
King Fahd University of Petroleum & Minerals
[Adapted from slides of Prof. G. De Micheli: Synthesis & Optimization of Digital Circuits]

Outline
• Motivation
• Dataflow graphs & sequencing graphs
• Resources
• Synthesis in temporal domain: Scheduling
• Synthesis in spatial domain: Binding
• Scheduling models
  • Unconstrained scheduling
  • Scheduling with timing constraints
  • Scheduling with resource constraints
• Algorithmic solution to the optimum binding problem
• Register binding problem

Synthesis
• Transform behavioral into structural view.
• Architectural-level synthesis
  • Architectural abstraction level.
  • Determine macroscopic structure.
  • Example: major building blocks like adder, register, mux.
• Logic-level synthesis
  • Logic abstraction level.
  • Determine microscopic structure.
  • Example: logic gate interconnection.

Synthesis and Optimization

Architectural-Level Synthesis Motivation
• Raise input abstraction level.
  • Reduce specification of details.
  • Extend designer base.
  • Self-documenting design specifications.
  • Ease modifications and extensions.
• Reduce design time.
• Explore and optimize macroscopic structure.
  • Series/parallel execution of operations.

Architectural-Level Synthesis
• Translate HDL models into sequencing graphs.
• Behavioral-level optimization
  • Optimize abstract models independently from the implementation parameters.
• Architectural synthesis and optimization
  • Create macroscopic structure: data-path and control-unit.
  • Consider area and delay information of the implementation.

Dataflow Graphs…
• Behavioral views of architectural models.
• Useful to represent datapaths.
• Graph
  • Vertices = operations.
  • Edges = dependencies.
• Dependencies arise due to
  • Input to an operation is the result of another operation.
  • Serialization constraints in specification.
  • Two tasks share the same resource.

…Dataflow Graphs
• Assumes the existence of variables that store the information required and generated by operations.
• Each variable has a lifetime, which is the interval from birth to death.
• Variable birth is the time at which the value is generated.
• Variable death is the latest time at which the value is referenced as input to an operation.
• Values must be preserved during their lifetime.

Sequencing Graphs
• Useful to represent data-path and control.
• Extended dataflow graphs
  • Control Data Flow Graphs (CDFGs).
  • Polar: source and sink.
  • Operation serialization.
  • Hierarchy.
  • Control-flow commands: branching and iteration.
• Paths in the graph represent concurrent streams of operations.

Behavioral-level optimization
• Tree-height reduction using commutativity and associativity
  • x = a + b * c + d => x = (a + d) + b * c
• Tree-height reduction using distributivity
  • x = a * (b * c * d + e) => x = a * b * c * d + a * e
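The two rewrites above can be checked with a small sketch. It assumes a left-to-right parse of the original expressions and a balanced grouping of the four-way product after distribution; the tuple encoding, the helper names `height` and `evaluate`, and the test values are illustrative choices, not part of the slides.

```python
# Expression trees as nested tuples (op, left, right); leaves are variable names.
def height(expr):
    """Number of operation levels on the longest leaf-to-root path."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + max(height(left), height(right))


def evaluate(expr, env):
    """Evaluate a tree for given variable values, to check equivalence."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


# x = a + b * c + d, read left to right: (a + (b * c)) + d
sum_before = ("+", ("+", "a", ("*", "b", "c")), "d")
# Commutativity and associativity: (a + d) + (b * c)
sum_after = ("+", ("+", "a", "d"), ("*", "b", "c"))

# x = a * (b * c * d + e), with b * c * d read as (b * c) * d
dist_before = ("*", "a", ("+", ("*", ("*", "b", "c"), "d"), "e"))
# Distributivity, then a balanced product: (a * b) * (c * d) + a * e
dist_after = ("+", ("*", ("*", "a", "b"), ("*", "c", "d")), ("*", "a", "e"))

env = {"a": 2, "b": 3, "c": 5, "d": 7, "e": 11}  # arbitrary test values
print(height(sum_before), height(sum_after))    # 3 2
print(height(dist_before), height(dist_after))  # 4 3
```

In both cases the tree height drops by one level while the computed value is unchanged; the distributive rewrite pays for this with an extra multiplication.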

Architectural Synthesis and Optimization
• Synthesize macroscopic structure in terms of building blocks.
• Explore area/performance trade-offs
  • Maximum performance implementations subject to area constraints.
  • Minimum area implementations subject to performance constraints.
• Determine an optimal implementation.
• Create logic model for data-path and control.

Circuit Specification for Architectural Synthesis
• Circuit behavior
  • Sequencing graphs.
• Building blocks
  • Resources.
    • Functional resources: process data (e.g. ALU).
    • Memory resources: store data (e.g. register).
    • Interface resources: support data transfer (e.g. MUX and buses).
• Constraints
  • Interface constraints
    • Format and timing of I/O data transfers.
  • Implementation constraints
    • Timing and resource usage.
      • Area
      • Cycle-time and latency

Resources
• Functional resources: perform operations on data.
  • Example: arithmetic and logic blocks.
  • Standard resources
    • Existing macro-cells.
    • Well characterized (area/delay).
    • Example: adders, multipliers, ALUs, shifters, ...
  • Application-specific resources
    • Circuits for specific tasks.
    • Yet to be synthesized.
    • Example: instruction decoder.
• Memory resources: store data.
  • Example: memory and registers.
• Interface resources
  • Example: busses and ports.

Resources and Circuit Families
• Resource-dominated circuits.
  • Area and performance depend on few, well-characterized blocks.
  • Example: DSP circuits.
• Non resource-dominated circuits.
  • Area and performance are strongly influenced by sparse logic, control and wiring.
  • Example: some ASIC circuits.

Synthesis in the Temporal Domain: Scheduling
• Scheduling
  • Associate a start-time with each operation.
  • Satisfy all the sequencing (timing and resource) constraints.
• Goal
  • Determine area/latency trade-off.
  • Determine latency and parallelism of the implementation.
• Scheduled sequencing graph
  • Sequencing graph with start-time annotation.
• Unconstrained scheduling.
• Scheduling with timing constraints.
• Scheduling with resource constraints.

Scheduling…
[Figure: two schedules, one using 4 multipliers and 2 ALUs, the other using 1 multiplier and 1 ALU]

… Scheduling
[Figure: two schedules, one using 2 multipliers and 3 ALUs, the other using 2 multipliers and 2 ALUs]

Synthesis in the Spatial Domain: Binding
• Binding
  • Associate a resource with each operation with the same type.
  • Determine area of the implementation.
• Sharing
  • Bind a resource to more than one operation.
  • Operations must not execute concurrently.
• Bound sequencing graph
  • Sequencing graph with resource annotation.

Example: Bound Sequencing Graph

Performance and Area Estimation
• Resource-dominated circuits
  • Area = sum of the area of the resources bound to the operations.
    • Determined by binding.
  • Latency = start time of the sink operation (minus start time of the source operation).
    • Determined by scheduling.
• Non resource-dominated circuits
  • Area also affected by registers, steering logic, wiring and control.
  • Cycle-time also affected by steering logic, wiring and (possibly) control.
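For the resource-dominated case, the two estimates reduce to a sum and a difference. The sketch below is illustrative only: the area values, the binding, the vertex names `v0`/`vn` for source and sink, and the function names are all hypothetical.

```python
def estimate_area(binding, area_of_type):
    """Sum the area of every distinct resource instance used by the binding."""
    instances = set(binding.values())  # (resource type, instance index) pairs
    return sum(area_of_type[rtype] for rtype, _ in instances)


def estimate_latency(start, source, sink):
    """Latency = start time of the sink minus start time of the source."""
    return start[sink] - start[source]


# Hypothetical data: two multipliers and two ALUs shared by eleven operations.
area_of_type = {"mult": 5, "alu": 1}
binding = {
    "v1": ("mult", 1), "v2": ("mult", 2), "v3": ("mult", 1), "v6": ("mult", 2),
    "v7": ("mult", 1), "v8": ("mult", 2), "v10": ("alu", 1), "v11": ("alu", 1),
    "v4": ("alu", 1), "v5": ("alu", 1), "v9": ("alu", 2),
}
start = {"v0": 1, "vn": 5}  # only the source and sink start times are needed

print(estimate_area(binding, area_of_type))  # 12
print(estimate_latency(start, "v0", "vn"))   # 4
```

Sharing shows up directly in the area figure: eleven operations are bound, but only four resource instances are counted.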

Scheduling
• Circuit model
  • Sequencing graph.
  • Cycle-time is given.
  • Operation delays expressed in cycles.
• Scheduling
  • Determine the start times for the operations.
  • Satisfy all the sequencing (timing and resource) constraints.
• Goal
  • Determine area/latency trade-off.
• Scheduling affects
  • Area: maximum number of concurrent operations of same type is a lower bound on required hardware resources.
  • Performance: concurrency of resulting implementation.

Scheduling Models
• Unconstrained scheduling.
• Scheduling with timing constraints
  • Latency.
  • Detailed timing constraints.
• Scheduling with resource constraints.
• Simplest scheduling model
  • All operations have bounded delays.
  • All delays are in cycles.
    • Cycle-time is given.
  • No constraints: no bounds on area.
  • Goal
    • Minimize latency.

Minimum-Latency Unconstrained Scheduling Problem
• Given a set of operations $V$ with integer delays $D$ and a partial order on the operations $E$.
• Find an integer labeling of the operations $\varphi : V \to Z^+$, such that
  • $t_i = \varphi(v_i)$,
  • $t_i \ge t_j + d_j \quad \forall i, j$ s.t. $(v_j, v_i) \in E$,
  • and $t_n$ is minimum.
• Unconstrained scheduling used when
  • Dedicated resources are used.
  • Operations differ in type.
  • Operations cost is marginal when compared to that of steering logic, registers, wiring, and control logic.
  • Binding is done before scheduling: resource conflicts solved by serializing operations sharing same resource.
  • Deriving bounds on latency for constrained problems.

ASAP Scheduling Algorithm
• Denote by $t^S$ the start times computed by the as soon as possible (ASAP) algorithm.
• Yields minimum values of start times.
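A minimal sketch of ASAP scheduling follows. The algorithm listing itself did not survive in this transcript, so this is one straightforward way to compute the start times, not necessarily the slide's formulation. The graph is an assumption as well: an eleven-operation sequencing graph with unit delays, reconstructed to be consistent with the examples in these slides; `PREDS`, `DELAY`, and `asap` are illustrative names. The source and sink vertices are left implicit.

```python
# Assumed sequencing graph: PREDS[v] lists the direct predecessors of operation v.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}  # unit delays, in cycles


def asap(preds, delay):
    """Start each operation as soon as all of its predecessors have finished."""
    start = {}

    def visit(v):
        if v not in start:
            # Operations without predecessors start in step 1.
            start[v] = max((visit(p) + delay[p] for p in preds[v]), default=1)
        return start[v]

    for v in preds:
        visit(v)
    return start


t_s = asap(PREDS, DELAY)
# Latency: the (implicit) sink starts when the last operation has finished.
latency = max(t_s[v] + DELAY[v] for v in t_s) - 1
print(t_s)
print(latency)  # 4
```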

ALAP Scheduling Algorithm
• Denote by $t^L$ the start times computed by the as late as possible (ALAP) algorithm.
• Yields maximum values of start times.
• Latency upper bound $\bar{\lambda}$.
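ALAP scheduling can be sketched as the mirror image of ASAP: work backwards from the latency bound. As before, the graph is an assumed eleven-operation example with unit delays, the names `PREDS`, `DELAY`, and `alap` are illustrative, and the implicit sink is taken to start at `latency_bound + 1`.

```python
# Assumed sequencing graph: PREDS[v] lists the direct predecessors of operation v.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}  # unit delays, in cycles


def alap(preds, delay, latency_bound):
    """Start each operation as late as its successors and the bound allow."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    start = {}

    def visit(v):
        if v not in start:
            # An operation must finish before its earliest successor starts;
            # operations without successors must finish before the sink.
            deadline = min((visit(s) for s in succs[v]), default=latency_bound + 1)
            start[v] = deadline - delay[v]
        return start[v]

    for v in preds:
        visit(v)
    return start


t_l = alap(PREDS, DELAY, latency_bound=4)
print(t_l)
```

If the bound is smaller than the ASAP latency, some start times come out below 1, which signals that the bound cannot be met.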

Latency-Constrained Scheduling
• ALAP solves a latency-constrained problem.
• Latency bound can be set to latency computed by ASAP algorithm.
• Mobility
  • Defined for each operation.
  • Difference between ALAP and ASAP schedule.
  • Zero mobility implies that an operation can be started only at one given time step.
  • Mobility greater than 0 measures span of time interval in which an operation may start.
• Slack on the start time.

Example
• Operations with zero mobility
  • $\{v_1, v_2, v_3, v_4, v_5\}$.
  • Critical path.
• Operations with mobility one
  • $\{v_6, v_7\}$.
• Operations with mobility two
  • $\{v_8, v_9, v_{10}, v_{11}\}$.
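The mobility sets above can be reproduced with a short sketch that computes both schedules and subtracts them. The graph is an assumed reconstruction (the figure is not part of this transcript), chosen so that the result matches the three sets listed; the names are illustrative. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Assumed sequencing graph: PREDS[v] lists the direct predecessors of operation v.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}  # unit delays, in cycles

order = list(TopologicalSorter(PREDS).static_order())  # predecessors first

# ASAP start times: earliest step after all predecessors have finished.
t_s = {}
for v in order:
    t_s[v] = max((t_s[p] + DELAY[p] for p in PREDS[v]), default=1)

# Latency bound set to the latency of the ASAP schedule.
bound = max(t_s[v] + DELAY[v] for v in order) - 1

# ALAP start times: latest step that still lets every successor start in time.
t_l = {}
for v in reversed(order):
    successor_starts = [t_l[s] for s in order if v in PREDS[s]]
    t_l[v] = min(successor_starts, default=bound + 1) - DELAY[v]

mobility = {v: t_l[v] - t_s[v] for v in order}
by_mobility = {}
for v in sorted(mobility):
    by_mobility.setdefault(mobility[v], []).append(v)
print(by_mobility)
```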

Minimum Latency Resource-Constrained Scheduling Problem
• Given a set of operations $V$ with integer delays $D$, a partial order on the operations $E$, and upper bounds $\{a_k;\ k = 1, 2, \ldots, n_{res}\}$.
• Find an integer labeling of the operations $\varphi : V \to Z^+$, such that
  • $t_i = \varphi(v_i)$,
  • $t_i \ge t_j + d_j \quad \forall i, j$ s.t. $(v_j, v_i) \in E$,
  • operation types are given by a mapping $V \to \{1, 2, \ldots, n_{res}\}$,
  • and $t_n$ is minimum.
• Number of operations of any given type in any schedule step does not exceed bound.
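The two families of constraints in this problem, precedence and resource bounds, can be checked mechanically for any candidate schedule. The sketch below is a feasibility checker only, not a solver; the graph, the type assignment, and the names `is_feasible`, `OP_TYPE`, and `bound` are assumptions made for illustration.

```python
from collections import Counter

# Assumed sequencing graph with unit delays; six multiplications, five ALU operations.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}
OP_TYPE = {v: ("mult" if v in {1, 2, 3, 6, 7, 8} else "alu") for v in PREDS}


def is_feasible(start, preds, delay, op_type, bound):
    """Check precedence constraints and per-step resource bounds."""
    # t_i >= t_j + d_j for every edge (v_j, v_i)
    for v, ps in preds.items():
        if any(start[v] < start[p] + delay[p] for p in ps):
            return False
    # Count the operations of each type executing in each schedule step.
    usage = Counter()
    for v, t in start.items():
        for step in range(t, t + delay[v]):
            usage[(op_type[v], step)] += 1
    return all(count <= bound[k] for (k, _), count in usage.items())


bound = {"mult": 2, "alu": 2}
asap_schedule = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 1, 7: 2, 8: 1, 9: 2, 10: 1, 11: 2}
constrained = {1: 1, 2: 1, 10: 1, 3: 2, 6: 2, 11: 2, 7: 3, 8: 3, 4: 3, 5: 4, 9: 4}

print(is_feasible(asap_schedule, PREDS, DELAY, OP_TYPE, bound))  # False
print(is_feasible(constrained, PREDS, DELAY, OP_TYPE, bound))    # True
```

The ASAP schedule fails because four multiplications start in step 1, which exceeds the bound of two multipliers.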

List Scheduling Algorithms
• Heuristic method for
  • Minimum latency subject to resource bound.
  • Minimum resource subject to latency bound.
• Greedy strategy.
• Priority list heuristics.
  • Assign a weight to each vertex indicating its scheduling priority
    • Longest path to sink.
    • Longest path to timing constraint.

List Scheduling Algorithm for Minimum Latency …

… List Scheduling Algorithm for Minimum Latency
• Candidate operations $U_{l,k}$
  • Operations of type $k$ whose predecessors are scheduled and completed at a time step before $l$.
• Unfinished operations $T_{l,k}$ are operations of type $k$ that started at earlier cycles and whose execution is not finished at time $l$.
  • Note that when execution delays are 1, $T_{l,k}$ is empty.

Example
• Assumptions
  • $a_1 = 2$ multipliers with delay 1.
  • $a_2 = 2$ ALUs with delay 1.
• First step
  • $U_{1,1} = \{v_1, v_2, v_6, v_8\}$
  • Select $\{v_1, v_2\}$
  • $U_{1,2} = \{v_{10}\}$; selected
• Second step
  • $U_{2,1} = \{v_3, v_6, v_8\}$
  • Select $\{v_3, v_6\}$
  • $U_{2,2} = \{v_{11}\}$; selected
• Third step
  • $U_{3,1} = \{v_7, v_8\}$
  • Select $\{v_7, v_8\}$
  • $U_{3,2} = \{v_4\}$; selected
• Fourth step
  • $U_{4,2} = \{v_5, v_9\}$; selected
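The four steps above can be reproduced with a compact sketch of list scheduling for minimum latency, using longest path to the sink as the priority. The slide's algorithm listing is not available in this transcript, so this is one plausible rendering rather than the original; the graph and all names are assumptions, with ties broken by operation index.

```python
# Assumed sequencing graph with unit delays; six multiplications, five ALU operations.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}
OP_TYPE = {v: ("mult" if v in {1, 2, 3, 6, 7, 8} else "alu") for v in PREDS}


def list_schedule(preds, delay, op_type, bound):
    """Greedy list scheduling under resource bounds (assumes every bound >= 1)."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    prio = {}  # priority = longest path (in cycles) from the operation to the sink

    def longest(v):
        if v not in prio:
            prio[v] = delay[v] + max((longest(s) for s in succs[v]), default=0)
        return prio[v]

    for v in preds:
        longest(v)

    start, step = {}, 1
    while len(start) < len(preds):
        for k, a_k in bound.items():
            # T_{l,k}: operations of type k started earlier and still executing.
            busy = sum(1 for v, t in start.items()
                       if op_type[v] == k and t < step < t + delay[v])
            # U_{l,k}: unscheduled operations of type k whose predecessors are done.
            ready = [v for v in preds
                     if v not in start and op_type[v] == k
                     and all(p in start and start[p] + delay[p] <= step
                             for p in preds[v])]
            ready.sort(key=lambda v: (-prio[v], v))
            for v in ready[:max(0, a_k - busy)]:
                start[v] = step
        step += 1
    return start


schedule = list_schedule(PREDS, DELAY, OP_TYPE, {"mult": 2, "alu": 2})
latency = max(schedule[v] + DELAY[v] for v in schedule) - 1
print(schedule)
print(latency)  # 4
```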

Example
• Assumptions
  • $a_1 = 3$ multipliers with delay 2.
  • $a_2 = 1$ ALU with delay 1.

List Scheduling Algorithm for Minimum Resource Usage

Example
• Assume latency bound $\bar{\lambda} = 4$.
• Let $a = [1, 1]^T$.
• First step
  • $U_{1,1} = \{v_1, v_2, v_6, v_8\}$
  • Operations with zero slack: $\{v_1, v_2\}$
  • $a = [2, 1]^T$
  • $U_{1,2} = \{v_{10}\}$
• Second step
  • $U_{2,1} = \{v_3, v_6, v_8\}$
  • Operations with zero slack: $\{v_3, v_6\}$
  • $U_{2,2} = \{v_{11}\}$
• Third step
  • $U_{3,1} = \{v_7, v_8\}$
  • Operations with zero slack: $\{v_7, v_8\}$
  • $U_{3,2} = \{v_4\}$
• Fourth step
  • $U_{4,2} = \{v_5, v_9\}$
  • Both have zero slack; $a = [2, 2]^T$
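The trace above can be reproduced with a sketch of list scheduling for minimum resource usage: start with one resource per type, schedule zero-slack operations unconditionally (growing the allocation when needed), then fill any remaining free resources. This is a plausible rendering under stated assumptions, not the slide's listing: the graph and names are assumed, slack is taken as the ALAP start time minus the current step, and the latency bound is assumed to be feasible.

```python
# Assumed sequencing graph with unit delays; six multiplications, five ALU operations.
PREDS = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [4, 7], 6: [], 7: [6],
         8: [], 9: [8], 10: [], 11: [10]}
DELAY = {v: 1 for v in PREDS}
OP_TYPE = {v: ("mult" if v in {1, 2, 3, 6, 7, 8} else "alu") for v in PREDS}


def list_schedule_min_resources(preds, delay, op_type, latency_bound):
    """Returns (start times, allocation); assumes latency_bound is achievable."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    t_l = {}  # ALAP start times under the latency bound

    def alap(v):
        if v not in t_l:
            t_l[v] = min((alap(s) for s in succs[v]),
                         default=latency_bound + 1) - delay[v]
        return t_l[v]

    for v in preds:
        alap(v)

    a = {k: 1 for k in sorted(set(op_type.values()))}  # one resource per type
    start, step = {}, 1
    while len(start) < len(preds):
        for k in a:
            busy = sum(1 for v, t in start.items()
                       if op_type[v] == k and t < step < t + delay[v])
            ready = [v for v in preds
                     if v not in start and op_type[v] == k
                     and all(p in start and start[p] + delay[p] <= step
                             for p in preds[v])]
            # Zero-slack operations must start now; allocate more resources if needed.
            urgent = [v for v in ready if t_l[v] - step == 0]
            for v in urgent:
                start[v] = step
            a[k] = max(a[k], busy + len(urgent))
            # Remaining candidates only use resources that are already allocated.
            free = a[k] - busy - len(urgent)
            rest = sorted((v for v in ready if v not in urgent),
                          key=lambda v: (t_l[v], v))
            for v in rest[:free]:
                start[v] = step
        step += 1
    return start, a


schedule, allocation = list_schedule_min_resources(PREDS, DELAY, OP_TYPE, 4)
print(allocation)  # {'alu': 2, 'mult': 2}
```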

Allocation and Binding
• Allocation
  • Determine number of resources needed.
• Binding
  • Mapping between operations and resources.
• Sharing
  • Assignment of a resource to more than one operation.
• Optimum binding/sharing
  • Minimize the resource usage.

Compatibility and Conflicts
• Operation compatibility
  • Same resource type.
  • Non concurrent.
• Compatibility graph
  • Vertices: operations.
  • Edges: compatibility relation.
• Conflict graph
  • Complement of compatibility graph.
[Figure: graphs for the multiplier and ALU operations]
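Given a schedule, both graphs follow directly from the definitions above. The sketch below builds them for the operations of a single resource type, so the conflict graph is exactly the complement of the compatibility graph within that type. The schedule and the names `build_graphs`, `START`, and `DELAY` are assumptions for illustration.

```python
from itertools import combinations

# Assumed schedule with unit delays (start step per operation index).
START = {1: 1, 2: 1, 10: 1, 3: 2, 6: 2, 11: 2, 7: 3, 8: 3, 4: 3, 5: 4, 9: 4}
DELAY = {v: 1 for v in START}


def build_graphs(ops, start, delay):
    """ops: operations of ONE resource type.

    Returns (compatibility edges, conflict edges) as sets of (u, v) with u < v.
    """
    compat, conflict = set(), set()
    for u, v in combinations(sorted(ops), 2):
        # Two operations are concurrent when their execution intervals overlap.
        concurrent = (start[u] < start[v] + delay[v]
                      and start[v] < start[u] + delay[u])
        (conflict if concurrent else compat).add((u, v))
    return compat, conflict


alu_compat, alu_conflict = build_graphs([4, 5, 9, 10, 11], START, DELAY)
mult_compat, mult_conflict = build_graphs([1, 2, 3, 6, 7, 8], START, DELAY)
print(sorted(alu_conflict))   # [(5, 9)]
print(sorted(mult_conflict))  # [(1, 2), (3, 6), (7, 8)]
```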

Algorithmic Solution to the Optimum Binding Problem
• Compatibility graph.
  • Partition the graph into a minimum number of cliques.
  • Find clique cover number.
• Conflict graph.
  • Color the vertices by a minimum number of colors.
  • Find chromatic number.
• NP-complete problems: heuristic algorithms.
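As an illustration of the heuristic route, the sketch below colors a conflict graph greedily: each color stands for one resource instance. Greedy coloring is a generic heuristic chosen here for brevity; the slides do not say which heuristic is intended. It always produces a valid binding, but not necessarily one with the minimum number of colors. The five-operation conflict graph is hypothetical (a simple chain).

```python
from itertools import count


def greedy_coloring(vertices, conflict_edges):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    neighbors = {v: set() for v in vertices}
    for u, v in conflict_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    color = {}
    for v in vertices:  # the visiting order influences the result
        used = {color[n] for n in neighbors[v] if n in color}
        color[v] = next(c for c in count(1) if c not in used)
    return color


# Hypothetical conflict graph: operation i conflicts with operation i + 1.
operations = [1, 2, 3, 4, 5]
conflicts = [(1, 2), (2, 3), (3, 4), (4, 5)]

binding = greedy_coloring(operations, conflicts)
resources = {}
for op, c in binding.items():
    resources.setdefault(f"ALU {c}", []).append(op)
print(resources)  # {'ALU 1': [1, 3, 5], 'ALU 2': [2, 4]}
```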

Example
[Figure: graph with vertices 1 to 5]
• ALU 1: 1, 3, 5
• ALU 2: 2, 4

Register Binding Problem
• Given a schedule
  • Lifetime intervals for variables.
  • Lifetime overlaps.
• Conflict graph (interval graph).
  • Vertices: variables.
  • Edges: overlaps.
• Find minimum number of registers storing all the variables.
• Compatibility graph.
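Because the conflict graph of lifetimes is an interval graph, a simple greedy pass over the intervals sorted by birth time finds a minimum register count. The sketch below uses this left-edge-style approach, which the slides do not name; it is offered as one standard way to solve the problem. The lifetimes are hypothetical, and a register is assumed to be reusable in the step where its previous value dies.

```python
def left_edge(lifetimes):
    """lifetimes: variable name -> (birth, death). Returns one list of names per register."""
    registers = []  # variables assigned to each register
    free_at = []    # death time of the last variable placed in each register
    for name, (birth, death) in sorted(lifetimes.items(),
                                       key=lambda kv: (kv[1], kv[0])):
        for r, t in enumerate(free_at):
            if t <= birth:  # previous value is dead: the register can be reused
                registers[r].append(name)
                free_at[r] = death
                break
        else:
            # Every existing register still holds a live value: allocate a new one.
            registers.append([name])
            free_at.append(death)
    return registers


# Hypothetical lifetimes (birth, death) for six intermediate variables.
lifetimes = {"z1": (1, 2), "z2": (2, 3), "z3": (3, 4),
             "z4": (1, 3), "z5": (3, 5), "z6": (4, 5)}

assignment = left_edge(lifetimes)
print(assignment)       # [['z1', 'z2', 'z3', 'z6'], ['z4', 'z5']]
print(len(assignment))  # 2
```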

Example
• Six intermediate variables that need to be stored in registers: $\{z_1, z_2, z_3, z_4, z_5, z_6\}$.
• Six variables can be stored in two registers.

Example
• 7 intermediate variables, 3 loop invariants.
• 5 registers suffice to store 10 intermediate loop variables.