ESE 535: Electronic Design Automation
Day 7: February 4, 2013
Clustering (LUT Mapping, Delay)
Penn ESE 535 Spring 2013 -- DeHon
Today
• How do we map to LUTs?
• What happens when
  – IO dominates
  – Delay dominates?
• Lessons…
  – for non-LUTs
  – for delay-oriented partitioning
[Design flow figure: Behavioral (C, MATLAB, …), Arch. Select, Schedule, RTL, FSM assign, Two-level, Multilevel opt., Covering, Retiming, Gate Netlist, Placement, Routing, Layout, Masks]
LUT Mapping
• Problem: Map logic netlist to LUTs
  – minimizing area
  – minimizing delay
• Old problem?
  – Technology mapping? (Day 2)
  – How big is the library for K-input LUT?
    • 2^(2^K) gates in library
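To make the library-size point concrete, here is a minimal sketch that counts the distinct Boolean functions of K inputs. The function name `lut_library_size` is an illustrative choice, not something from the slides.

```python
def lut_library_size(k: int) -> int:
    """Number of distinct Boolean functions of k inputs: 2^(2^k).

    A k-input truth table has 2^k rows, and each row can independently
    be 0 or 1, giving 2^(2^k) possible tables.
    """
    return 2 ** (2 ** k)


if __name__ == "__main__":
    # The count grows far too quickly to enumerate as a gate library.
    for k in range(1, 7):
        print(f"K={k}: {lut_library_size(k)} functions")
```

Even at K=4 the count is 65536, which is why a library-based technology mapper is a poor fit for LUTs.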
Simplifying Structure
• K-LUT can implement any K-input function
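One way to see why a K-LUT can implement any K-input function is to model it as a truth table indexed by the input bits. The sketch below assumes that model; `make_lut` and `lut_eval` are hypothetical helper names.

```python
def make_lut(func, k):
    """Build the 2^k-entry truth table of func, a callable taking k bits."""
    table = []
    for index in range(2 ** k):
        # Bit i of the table index is the value of input i.
        bits = [(index >> i) & 1 for i in range(k)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, bits):
    """Look up the output for a list of input bits (input 0 first)."""
    index = sum(b << i for i, b in enumerate(bits))
    return table[index]


# Two unrelated 4-input functions cost the same: one 16-entry table each.
and4 = make_lut(lambda a, b, c, d: a & b & c & d, 4)
maj_xor = make_lut(lambda a, b, c, d: ((a & b) | (b & c) | (a & c)) ^ d, 4)
```

The internal gate structure of the function does not matter; only the number of inputs does.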
Preclass: Cover in 4-LUT? [figure sequence]
Cost Function
• Delay: number of LUTs in critical path
  – doesn’t say delay in LUTs or in wires
  – does assume uniform interconnect delay
• Area: number of LUTs
  – Assumes adequate interconnect to use LUTs
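The two cost measures can be computed directly from a mapped network. This is a minimal sketch, assuming the mapped design is given as a dictionary from each LUT name to its input names, with any name that is not a key treated as a primary input; `lut_costs` is an illustrative name.

```python
def lut_costs(luts):
    """Return (area, delay) for a mapped LUT network.

    luts: dict mapping LUT name -> list of input names.
    Area is the LUT count; delay is the largest number of LUTs on any
    input-to-output path (uniform interconnect delay assumed).
    """
    depth = {}

    def level(name):
        if name not in luts:  # primary input contributes no LUT delay
            return 0
        if name not in depth:
            depth[name] = 1 + max((level(i) for i in luts[name]), default=0)
        return depth[name]

    delay = max((level(n) for n in luts), default=0)
    return len(luts), delay


# Hypothetical mapped design: y depends on x, z is independent.
mapped = {"x": ["a", "b", "c", "d"], "y": ["x", "e", "f"], "z": ["g", "h"]}
area, delay = lut_costs(mapped)
```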
LUT Mapping
• NP-hard in general
• Fanout-free: can solve optimally given decomposition
  – (but which one?)
• Delay-optimal mapping achievable in polynomial time
• Area w/ fanout NP-complete
Preliminaries
• What matters/makes this interesting?
  – Area / Delay target
  – Decomposition
  – Fanout
    • replication
    • reconvergent
Costs: Area vs. Delay [figure]
Decomposition [figures]
Fanout: Replication [figures]
Fanout: Reconvergence [figures]
What makes it hard?
• Cost does not monotonically increase as we cover more of the graph.
• Not clear when to stop.
• We say cost does not have a monotone property
Preclass Revisited [figure]
Definition
• Cone: set of nodes in the recursive fanin of a node
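The cone definition translates into a simple backward traversal. The sketch below assumes the netlist is a dictionary from each node to its list of fanins and, as a convention chosen here, includes the root node itself in its cone.

```python
def cone(fanins, root):
    """Return the set of nodes in the recursive fanin of root (root included).

    fanins: dict mapping node -> list of fanin nodes; primary inputs may
    be absent or map to an empty list.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(fanins.get(node, []))
    return seen


# Hypothetical netlist: b fans out to both g1 and g2 and reconverges at g3.
netlist = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]}
```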
Example Cones [figure]
Delay [figure]
Delay of Preclass Circuit?
• Poll: Delay of preclass circuit
Dynamic Programming
• Optimal covering of a logic cone is:
  – Minimum cost (all possible coverings)
• Evaluate costs of each node based on:
  – cover node
  – cones covering each fanin to node cover
• Evaluate node costs in topological order
• Key: we are calculating optimal solutions to subproblems
  – only have to evaluate covering options at each node
Flowmap
• Key Idea:
  – LUT holds anything with K inputs
  – Use network flow to find cuts
    • logic can pack into LUT including reconvergence
    • …allows replication
  – Optimal depth arises from optimal-depth solutions to subproblems
Max-Flow / Min-Cut
• The maximum flow in a network is equal to the minimum cut
  – …the bottleneck
• We can find the mincut by computing the maxflow.
• Conceptually, how would we determine the maximum flow?
Max. Flow
• Set all edge flows to zero
  – f[u, v]=0
• While there is a path from s to t
  – (breadth-first-search)
  – for each edge in path f[u, v]=f[u, v]+1
  – f[v, u]=-f[u, v]
  – When c[v, u]=f[v, u] remove edge from search
• O(|E|*cutsize)
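The steps above can be sketched as a unit-augmenting max-flow routine. This is a minimal sketch, assuming integer capacities given as a nested dictionary `capacity[u][v]`; it tracks residual capacity instead of signed flow, which is an equivalent bookkeeping choice made here, and the function name `max_flow` is illustrative.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Max flow from s to t by repeated unit augmentation along BFS paths."""
    # Residual capacities, with a zero-capacity reverse edge for every edge.
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})[v] = residual.get(u, {}).get(v, 0) + c
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(s, {})
    residual.setdefault(t, {})

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow is maximum
        # Push one unit along the path; saturated edges drop out of later
        # searches because their residual capacity reaches zero.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= 1
            residual[v][u] += 1
            v = u
        flow += 1


# Hypothetical network used only for illustration.
example = {"s": {"a": 2, "b": 1}, "a": {"t": 1, "b": 1}, "b": {"t": 2}}
```

Each BFS costs O(|E|) and each augmentation adds one unit, which matches the O(|E|*cutsize) bound on the slide.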
Flowmap (examples are K=4)
• Delay objective:
  – minimum height, K-feasible cut
  – i.e. cut no more than K edges
  – start by bounding fanin ≤ K
• Height of node will be:
  – height of predecessors or
  – one greater than height of predecessors
• Check shorter first
Flowmap
• Construct flow problem
  – sink: target node being mapped
  – source: start set (primary inputs)
  – flow infinite into start set
  – flow of one on each link
  – to see if height same as predecessors
    • collapse all predecessors of maximum height into sink (single node, cut must be above)
    • height +1 case is trivially true
Example Subgraph (Target: K=4) [figure]
Trivial: Height +1 [figure]
Collapse at max height [figures]
Augmenting Flows [figure sequence]
Collapse at max height: works for K=4 [figure]
Collapse does not work (K still 4) (different/larger graph): forced to label height+1 [figure]
Reconvergent fanout (yet different graph): can label at height 2 [figure]
Flowmap
• Max-flow Min-cut algorithm to find cut
• Use augmenting paths until discover max flow > K
• O(K|e|) time to discover K-feasible cut
  – (or that one does not exist)
• Depth identification: O(KN|e|)
Mincut may not be unique [figure]
Flowmap
• Min-cut may not be unique
• To minimize area achieving delay optimum
  – find max volume min-cut
• Compute max flow, find min cut
• Remove edges consumed by max flow
• DFS from source
• Complement set is max volume set
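The last three steps can be sketched on their own, given the residual graph left behind by a max-flow computation. This sketch assumes a nested dictionary `residual[u][v]` of remaining capacities (saturated edges have 0); the name `max_volume_sink_side` and the tiny hand-built residual graph are illustrative assumptions.

```python
def max_volume_sink_side(residual, source):
    """Return the sink side of the min cut that lies closest to the source.

    Nodes still reachable from the source through unsaturated edges form
    the smallest possible source side, so the complement is the largest
    sink side (the max-volume cluster).
    """
    reached = {source}
    stack = [source]
    while stack:  # depth-first search over edges with spare capacity
        u = stack.pop()
        for v, c in residual.get(u, {}).items():
            if c > 0 and v not in reached:
                reached.add(v)
                stack.append(v)
    all_nodes = set(residual) | {v for edges in residual.values() for v in edges}
    return all_nodes - reached


# Chain s -> a -> t with unit capacities after one unit of flow was pushed.
# Both s->a and a->t are min cuts of size 1; the search selects s->a.
after_flow = {"s": {"a": 0}, "a": {"s": 1, "t": 0}, "t": {"a": 1}}
cluster = max_volume_sink_side(after_flow, "s")
```

In the chain example the alternative cut a->t would leave only t on the sink side, so the cut nearest the source packs more logic into the LUT.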
Collapse at max height: works for K=4 [figure]
BFS from Source [figure sequence]: does not find rest
Max-Volume Mincut [figure]
Flowmap
• Covering from labeling is straightforward
  – process in reverse topological order
  – allocate identified K-feasible cut to LUT
  – remove node
  – postprocess to minimize LUT count
• Notes:
  – replication implicit (covered multiple places)
  – nodes purely internal to one or more covers may not get their own LUTs
Flowmap Roundup
• Label
  – Work from inputs to outputs
  – Find max label of predecessors
  – Collapse new node with all predecessors at this label
  – Can find flow cut ≤ K?
    • Yes: mark with label (find max-volume cut extent)
    • No: mark with label+1
• Cover
  – Work from outputs to inputs
  – Allocate LUT for identified cluster/cover
  – Recurse covering selection on inputs to identified LUT
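The Label phase can be sketched as follows. This is a hedged illustration, not the course's reference code: it assumes the netlist is a dictionary listed in topological order (relying on Python 3.7+ insertion order), and it counts LUT inputs as a node cut by splitting each node into an in/out pair of capacity 1, which is a modeling choice made here. All function and variable names are hypothetical.

```python
from collections import deque

INF = float("inf")


def flow_exceeds(cap, s, t, k):
    """Unit-augment with BFS; return True once more than k units fit."""
    flow = 0
    while flow <= k:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return False  # max flow reached with flow <= k
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap.setdefault(v, {})
            cap[v][u] = cap[v].get(u, 0) + 1
            v = u
        flow += 1
    return True


def flowmap_labels(fanins, k):
    """fanins: dict node -> fanin list, in topological order; inputs map to []."""
    label = {}
    for t, ins in fanins.items():
        if not ins:
            label[t] = 0  # primary input
            continue
        cone, stack = set(), [t]
        while stack:  # collect the cone of t
            n = stack.pop()
            if n not in cone:
                cone.add(n)
                stack.extend(fanins[n])
        p = max(label[i] for i in ins)
        # Collapse t and every cone node already at label p into the sink.
        sink = {n for n in cone if n == t or label[n] == p}
        cap = {}

        def add(u, v, c):
            cap.setdefault(u, {})
            cap[u][v] = cap[u].get(v, 0) + c

        def end(n, side):
            return "SINK" if n in sink else (n, side)

        for n in cone:
            if n not in sink:
                add((n, "in"), (n, "out"), 1)  # cutting a node costs 1
            if not fanins[n]:
                add("SRC", end(n, "in"), INF)  # infinite flow into start set
            for i in fanins[n]:
                u, v = end(i, "out"), end(n, "in")
                if u != v:
                    add(u, v, INF)
        label[t] = p + 1 if flow_exceeds(cap, "SRC", "SINK", k) else p
    return label


# Hypothetical circuit: g2 = (a op b) op c.
chain = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "g2": ["g1", "c"]}
```

With K=3 the whole chain fits in one LUT, so g2 keeps label 1; with K=2 the collapse fails and g2 is forced to label 2. The Cover phase is not shown.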
Area
Changing Cost Functions Now (previous was delay)
DF-Map
• Duplication Free Mapping
  – can find optimal area under this constraint
  – (but optimal area may not be duplication free)
[Cong+Ding, IEEE TR VLSI Sys. v2n2p137]
Maximum Fanout Free Cones
MFFC: bit more general than trees
MFFC
• Follow cone backward
• end at node that fans out (has output) outside the cone
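The backward walk can be sketched as a fixed-point computation: a fanin joins the cone only when every one of its fanouts is already inside. The sketch assumes a node-to-fanins dictionary, leaves primary inputs out of the cone, and does not model primary outputs on internal nodes; these are simplifications chosen here, and `mffc` is an illustrative name.

```python
def mffc(fanins, root):
    """Return the maximum fanout-free cone rooted at root (gates only)."""
    # Build the fanout relation from the fanin lists.
    fanouts = {}
    for node, ins in fanins.items():
        for i in ins:
            fanouts.setdefault(i, set()).add(node)

    cone = {root}
    changed = True
    while changed:
        changed = False
        for node in list(cone):
            for i in fanins.get(node, []):
                is_gate = bool(fanins.get(i))
                # Stop at any node that fans out to something outside the cone.
                if is_gate and i not in cone and fanouts[i] <= cone:
                    cone.add(i)
                    changed = True
    return cone


# Hypothetical netlist: g2 feeds both g3 and g4, so it ends g3's cone.
net = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "d"],
}
```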
MFFC example: Identify FFC [figures; nodes A through I]
DF-Map
• Partition graph into MFFCs
• Optimally map each MFFC
• In dynamic programming
  – for each node
    • examine each K-feasible cut
      – note: this is very different than flowmap where only had to examine a single cut
      – Example to follow
    • pick cut to minimize cost
      – 1 + cones for fanins
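The dynamic program can be sketched for the simplest case, a fanout-free tree. This is a minimal sketch under that assumption (shared nodes would be double-counted) and it also assumes every gate has at most K fanins; cut enumeration by merging fanin cuts is one common approach chosen here, and `map_tree` is a hypothetical name.

```python
from itertools import product


def map_tree(fanins, root, k):
    """Minimum LUT count to cover the tree rooted at root with k-input LUTs."""
    cuts, cost = {}, {}

    def visit(node):
        if node in cost:
            return
        ins = fanins.get(node, [])
        if not ins:  # primary input: free, and its only cut is itself
            cuts[node] = [frozenset([node])]
            cost[node] = 0
            return
        for i in ins:
            visit(i)
        # Every cut of node is a union of one cut from each fanin.
        merged = {
            frozenset().union(*combo)
            for combo in product(*(cuts[i] for i in ins))
        }
        feasible = [c for c in merged if len(c) <= k]
        # One LUT for this cut plus the best covers of the cut's nodes.
        cost[node] = min(1 + sum(cost[m] for m in c) for c in feasible)
        cuts[node] = [frozenset([node])] + feasible

    visit(root)
    return cost[root]


# Hypothetical tree: a 4-input function built from three 2-input gates.
tree = {"g1": ["a", "b"], "g2": ["c", "d"], "g3": ["g1", "g2"]}
```

For this tree the result is 1 LUT at K=4, 2 LUTs at K=3, and 3 LUTs at K=2, since each smaller K rules out the larger cuts.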
DF-Map Example [figure sequence with per-node cost labels: identify cones, start mapping cone, later cones similar to previous]
Composing
• Don’t need minimum delay off the critical path
• Don’t always want/need minimum delay
• Composite:
  – map with flowmap
  – Greedy decomposition of “most promising” non-critical nodes
  – DF-map these nodes
Variations on a Theme
Applicability to Non-LUTs?
• E.g. LUT Cascade
  – can handle some functions of K inputs
• How to apply?
Adaptable to Non-LUTs
• Sketch:
  – Initial decomposition to nodes that will fit
  – Find max volume, min-height K-feasible cut
  – ask if logic block will cover
    • yes: done
    • no: exclude one (or more) nodes from block and repeat
      – exclude == collapse into start set nodes
      – this makes it heuristic
Partitioning?
• Effectively partitioning logic into clusters
  – LUT cluster
    • unlimited internal “gate” capacity
    • limited I/O (K)
    • simple delay cost model
      – 1 cross between clusters
      – 0 inside cluster
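The 0/1 delay model is easy to evaluate for a given clustering. The sketch below assumes a node-to-fanins dictionary and a `cluster` dictionary assigning every node (primary inputs included) to a cluster id; both the representation and the name `clustered_delay` are illustrative.

```python
def clustered_delay(fanins, cluster):
    """Worst-case delay when crossing clusters costs 1 and staying inside costs 0."""
    arrival = {}

    def arrive(node):
        if node not in arrival:
            arrival[node] = max(
                (
                    arrive(i) + int(cluster[i] != cluster[node])
                    for i in fanins.get(node, [])
                ),
                default=0,  # primary inputs arrive at time 0
            )
        return arrival[node]

    return max(arrive(n) for n in cluster)


# Hypothetical chain of three gates fed by four inputs.
logic = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "d"]}
pads = {"a": "io", "b": "io", "c": "io", "d": "io"}
split = dict(pads, g1=0, g2=0, g3=1)   # g3 in its own cluster
merged = dict(pads, g1=0, g2=0, g3=0)  # all gates in one cluster
```

Splitting g3 away from g1 and g2 adds one crossing to the critical path, so `split` has delay 2 while `merged` has delay 1.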
Partitioning
• Clustering
  – if strongly I/O limited, same basic idea works for partitioning to components
    • typically: partitioning onto multiple FPGAs
    • assumption: inter-FPGA delay >> intra-FPGA delay
  – w/ area constraints
    • similar to non-LUT case
      – make min-cut
      – will it fit?
      – Exclude some LUTs and repeat
Clustering for Delay
• W/ no IO constraint
• area is monotone property
• DP-label forward with delays
  – grab up largest labels (greatest delays) until fill cluster size
• Work backward from outputs creating clusters as needed
Area and IO?
• Real problem:
  – FPGA/chip partitioning
• Doing both optimally is NP-hard
• Heuristic around IO cut first should do well
  – (e.g. non-LUT slide)
  – [Yang and Wong, FPGA’94]
Partitioning
• To date:
  – primarily used for 2-level hierarchy
    • i.e. intra-FPGA, inter-FPGA
• Open/promising
  – adapt to multi-level for delay-optimized partitioning/placement on fixed-wire schedule
    • localize critical paths to smallest subtree possible?
Summary
• Optimal LUT mapping NP-hard in general
  – fanout, replication, …
• K-LUTs make delay-optimal mapping feasible
  – single constraint: IO capacity
  – technique: max-flow/min-cut
• Heuristic adaptations of basic idea to capacity constrained problem
  – promising area for interconnect delay optimization
Today’s Big Ideas:
• IO may be a dominant cost
  – limiting capacity, delay
• Exploit structure: K-LUTs
• Mixing dominant modes
  – multiple objectives
• Define optimally solvable subproblem
  – duplication free mapping
Admin
• Reading Wednesday on web
• Assignment 2a was due at beginning of class
• Assignment 2b due next Monday