ECE 667 Synthesis and Verification of Digital Systems

Technology Mapping for FPGAs
D. Chen, J. Cong, "DAOMap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs," ICCAD 2004

FPGA Mapping (LUT-based)
• How is it different from ASIC (standard-cell) mapping?
  – Structural in nature, simpler
  – Any function with k inputs can be mapped into a k-LUT
  – Typically implemented by cut mapping
• FPGA architecture: k-LUT
  – Example: F = x1'x2' + x1x2 on a 2-input LUT

    x1  x2 | F
    0   0  | 1
    0   1  | 0
    1   0  | 0
    1   1  | 1

[Figure: 2-input LUT; programming bits P (each 0/1) are selected by x1 and x2 to produce F]
ECE 667 Synthesis & Verification - FPGA Mapping
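The claim that any k-input function fits in a k-LUT can be illustrated with a small sketch. It models the LUT as a table of 2^k programming bits addressed by the inputs. The function names `program_lut` and `eval_lut` and the choice of the first input as the most significant address bit are assumptions made for illustration, not part of the slides.

```python
from itertools import product

def program_lut(func, k):
    """Return the 2**k programming bits for a k-input Boolean function."""
    # Rows are generated in truth-table order: (0,...,0) first, (1,...,1) last.
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

def eval_lut(bits, inputs):
    """Read the LUT output; the inputs form the address into the bit table."""
    address = 0
    for x in inputs:  # assumption: first input is the most significant bit
        address = (address << 1) | x
    return bits[address]

# F = x1'x2' + x1x2, the 2-input example from the slide
xnor_bits = program_lut(lambda x1, x2: (not x1 and not x2) or (x1 and x2), 2)
outputs = [eval_lut(xnor_bits, row) for row in product((0, 1), repeat=2)]
```

Changing the function only changes the stored bits, never the structure, which is why LUT mapping is structural rather than library-driven.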

FPGA Mapping - example
A possible mapping onto 3-LUTs: each block has at most 3 inputs.
[Figure: network with nodes a, b, c, d, e, f, g, h covered by 3-LUT blocks]

Definitions
• DAG: Boolean network
• Cone Cv: sub-network rooted on node v
• K-feasible cone: |input(Cv)| ≤ K
• Fanin cone Fv: the largest Cv
• k-feasible cut: a k-feasible Cv
• Unit delay model:
  – Each LUT contributes one unit delay
• Cut rooted on node C: cut with output C
[Figure: fanin cone Fv extending from the PIs (a, b, c, d, e) to node v, with a 3-feasible cone Cv inside it; delay of 2]
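The feasibility test in these definitions can be sketched as follows. A cone is represented as a set of node names, and its inputs are the nodes outside the cone that drive a node inside it. The network below and the helper names `cone_inputs` and `is_k_feasible` are hypothetical and only loosely follow the figure.

```python
def cone_inputs(fanins, cone):
    """Inputs of a cone: nodes outside the cone that feed a node inside it."""
    return {u for v in cone for u in fanins.get(v, ()) if u not in cone}

def is_k_feasible(fanins, cone, k):
    """A cone is k-feasible when it has at most k inputs."""
    return len(cone_inputs(fanins, cone)) <= k

# Hypothetical network: a, b, c, d, e are PIs; v is the root.
fanins = {"x": ["a", "b"], "y": ["c", "d"], "v": ["x", "y", "e"]}

small_cone = {"v"}            # inputs: x, y, e
larger_cone = {"v", "x"}      # inputs: a, b, y, e
fanin_cone = {"v", "x", "y"}  # inputs: all five PIs
```

Here only `small_cone` fits a 3-LUT; absorbing x trades one input for two and pushes the cone to four inputs.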

Problem Formulation
• Delay-optimal Area Optimization problem
  – Given: a Boolean network; an integer k (LUT size)
  – Goal: cover the network with k-feasible cones (k-LUTs), such that
    • Mapping depth (delay) is minimum
    • Area (number of LUTs) is minimized
• Area minimization is an NP-hard problem
• A two-step process
  – Cut enumeration + evaluation (delay, area)
  – Cut selection to minimize delay
  – Possible iteration to remap nodes on non-critical paths (area recovery)
  – Takes into consideration node duplication

Cut Enumeration
[Figure: sub-cuts of the fanin nodes (over nodes a, b, c, d) merged into a new cut; nodes w, x, y, z]
• Process nodes in topological order from PIs to POs
• Combine sub-cuts of the fanin nodes to create a new cut
• If the size of the cut exceeds k (LUT size), discard the cut
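The three steps above can be sketched as a short routine. This is a minimal illustration, not the DAOMap implementation: it keeps every k-feasible cut without pruning dominated ones, and the network, the name `enumerate_cuts`, and the dictionary-of-fanins representation are assumptions.

```python
from itertools import product

def enumerate_cuts(fanins, topo_order, k):
    """Return {node: set of cuts}; each cut is a frozenset of at most k leaves."""
    cuts = {}
    for v in topo_order:                 # PIs first, POs last
        cuts[v] = {frozenset([v])}       # trivial cut: the node itself
        ins = fanins.get(v, [])
        if not ins:                      # primary input, nothing to merge
            continue
        # Pick one sub-cut per fanin and merge them into a candidate cut.
        for combo in product(*(cuts[u] for u in ins)):
            merged = frozenset().union(*combo)
            if len(merged) <= k:         # discard cuts larger than the LUT
                cuts[v].add(merged)
    return cuts

# Hypothetical network: a, b, c are PIs; z is the only PO.
fanins = {"x": ["a", "b"], "y": ["b", "c"], "z": ["x", "y"]}
order = ["a", "b", "c", "x", "y", "z"]
cuts3 = enumerate_cuts(fanins, order, k=3)
cuts2 = enumerate_cuts(fanins, order, k=2)
```

With k = 3 node z gets five cuts, including {a, b, c}; with k = 2 only the trivial cut and {x, y} survive. The number of candidate cuts grows quickly with k, which is why practical mappers prune or rank them.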

Delay Propagation (k = 3)
[Figure: example network (nodes a through g and w, x, y, z) annotated with the delay of each cut and the optimal delay of each node]
• Delay is computed using dynamic programming.
• The largest optimal delay among the POs is the optimal mapping delay.
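A minimal sketch of this dynamic program under the unit delay model: the delay of a cut is one LUT level plus the largest optimal delay among its leaves, and a node's optimal delay is the minimum over its non-trivial cuts. The cut lists below are written by hand for a hypothetical three-input network, and `optimal_delays` is an illustrative name.

```python
def optimal_delays(cuts, topo_order, primary_inputs):
    """Return (delay, best_cut) per node under the unit delay model."""
    delay, best_cut = {}, {}
    for v in topo_order:
        if v in primary_inputs:
            delay[v] = 0                      # PIs arrive at time 0
            continue
        # The trivial cut {v} cannot implement v, so it is skipped.
        candidates = [c for c in cuts[v] if c != frozenset([v])]
        best_cut[v] = min(candidates, key=lambda c: max(delay[u] for u in c))
        delay[v] = 1 + max(delay[u] for u in best_cut[v])
    return delay, best_cut

# Hand-written 3-feasible cuts; a, b, c are PIs and z is the PO.
cuts = {
    "x": [frozenset("x"), frozenset("ab")],
    "y": [frozenset("y"), frozenset("bc")],
    "z": [frozenset("z"), frozenset("xy"), frozenset("xbc"),
          frozenset("aby"), frozenset("abc")],
}
delay, best_cut = optimal_delays(cuts, ["a", "b", "c", "x", "y", "z"],
                                 {"a", "b", "c"})
mapping_depth = max(delay[po] for po in ["z"])
```

In this example the cut {x, y} would give z a delay of 2, while {a, b, c} absorbs x and y into one LUT and achieves the optimal delay of 1.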

Area Estimation
Estimates area while accounting for the fanout effect:

$A_C = \sum_{i \in \mathrm{input}(C)} \left[ A_i / f(i) \right] + U_C$

• A_i: estimated area of the fanin cone of signal i
• f(i): fanout number of input i
• U_C: area of the cut itself
• Can underestimate area due to node duplication
[Figure: cuts Ct, C and Cu over nodes m through u; input p has f(p) = 2 and estimated area Ap; input s contributes As / 2]
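The estimate can be written as a one-line function. The numbers below are made up for illustration, and the default of one LUT for the area of the cut itself (`unit_area`) is an assumption.

```python
def cut_area(cut, area, fanout, unit_area=1.0):
    """Estimated area of a cut: shared fanin areas plus the cut's own LUT."""
    # Each input's cone area is divided among its fanouts.
    return sum(area[i] / fanout[i] for i in cut) + unit_area

# Hypothetical values: p feeds two nodes, q feeds one.
area = {"p": 2.0, "q": 1.0}
fanout = {"p": 2, "q": 1}
estimate = cut_area({"p", "q"}, area, fanout)
```

Input p contributes only half of its cone area because another cut is expected to pay for the other half. If that other cut instead duplicates p's logic, the estimate is too low, which is the underestimation noted above.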

Duplication Cost Adjustment
• Considers potential node duplications
• Check the sub-cuts for multiple fanouts
• Area adjusted by addition of duplication cost
• Duplication cost parameters:
  – NCf: number of nodes contained by subcut Cf
  – IC: cutsize of C
  – fi: fanout number of subcut
[Figure: sub-cuts Cf1 and Cf2 over nodes m through s merged into new cut C with IC = 4; NCf2 = 1; a sub-cut with multiple fanouts is marked]

Cost (Area) Function of a Cut
Some key parameters:
• IC: cutsize of C
• NC: number of nodes covered by C
• f(v): fanout number of the root node v
• Pf: duplication cost
[Figure: root node v with fanin 1 and fanin 2, and cuts C1, C2, C3 over nodes a through e]

Cut Selection
• Once cuts are generated, traverse the network from POs to PIs and select the cuts that are mapped into LUTs
• Select cuts such that timing is met and the area is minimized
• Iterative cut selection procedure
  – Local cost adjustment
    • Input sharing
    • Slack distribution
    • Cut probing
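The PO-to-PI traversal can be sketched as below. This shows only the basic covering step, assuming one cut has already been chosen per node; it does not model the local cost adjustments (input sharing, slack distribution, cut probing) listed above. The name `select_cuts` and the example network are hypothetical.

```python
def select_cuts(best_cut, primary_outputs, primary_inputs):
    """Walk from POs toward PIs and return {LUT root: selected cut}."""
    mapping, pending = {}, list(primary_outputs)
    while pending:
        v = pending.pop()
        if v in primary_inputs or v in mapping:
            continue                   # PIs need no LUT; skip visited roots
        mapping[v] = best_cut[v]       # v becomes the root of one LUT
        pending.extend(best_cut[v])    # its cut leaves must be produced next
    return mapping

# Hypothetical chosen cuts: y is covered inside z's cone and needs no LUT.
best_cut = {"x": frozenset("ab"), "y": frozenset("bc"), "z": frozenset("xc")}
luts = select_cuts(best_cut, ["z"], {"a", "b", "c"})
```

Only nodes reached as cut leaves become LUT roots, so the LUT count is the size of the returned mapping (two in this example).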

Local Cost Adjustment – Slack Distribution
• $\mathrm{Slack}_C = \mathrm{Req}_v - 1 - \max_{i \in \mathrm{input}(C)} \mathrm{Arr}_i$
• If Slack_C < 0, C is not a timing-feasible cut
• The larger Slack_C is, the better C is in terms of the slack distribution effect
[Figure: cut C rooted at node d over nodes a, b, c, w, x, y, z; annotations mark the largest arrival time among the inputs and Reqd, the required time of the root]
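The slack formula translates directly into code. The arrival and required times below are invented for illustration, and `cut_slack` is a hypothetical helper name.

```python
def cut_slack(required_root, arrival, cut):
    """Slack of a cut: root required time, minus one LUT level,
    minus the largest arrival time among the cut's inputs."""
    return required_root - 1 - max(arrival[i] for i in cut)

arrival = {"a": 0, "b": 1, "c": 2}      # hypothetical arrival times
slack_ab = cut_slack(3, arrival, {"a", "b"})
slack_bc = cut_slack(3, arrival, {"b", "c"})
slack_tight = cut_slack(2, arrival, {"b", "c"})
```

With a required time of 3 at the root, both cuts are timing-feasible and {a, b} leaves one unit of slack to distribute; tightening the required time to 2 makes {b, c} infeasible.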

Algorithm Recap
• Generation of k-feasible cuts
• Area propagation under timing constraints
  – The optimal area at a node is the minimum area among the cuts that give minimum delay
• More accurate representation of the cost function for a cut
• Global duplication cost adjustment
• Cut selection involving local cost adjustment