Placement 1
Placement
• Problem
  § Given a netlist and fixed-shape cells (small, standard cells), find the exact location of each cell so as to minimize area and wire length
  § Consistent with the standard-cell design methodology
    o Row-based, no hard macros
  § Modules:
    o Usually fixed, equal height (exception: double-height cells)
    o Some fixed (I/O pads)
    o Connected by edges or hyperedges
• Objectives
  § Cost components: area, wire length
    o Additional cost components: timing, congestion
[Bazargan]
Problem formulation
• Input:
  § Blocks (standard cells and macros) B1, ..., Bn
  § Shapes and pin positions for each block Bi
  § Nets N1, ..., Nm
• Output:
  § Coordinates (xi, yi) for each block Bi
  § No overlaps between blocks
  § The total wire length is minimized
  § The area of the resulting block is minimized, or a fixed die is given
• Other considerations: timing, routability, clock, buffering, and interaction with physical synthesis
Wire Length Objective 4
Placement Cost Components
• Area
  § Would like to pack all the modules very tightly
• Wire length (half-perimeter of the net bounding box)
  § Minimize average wire length
  § Would result in tight packing of modules with high connectivity
• Overlap
  § Could be prohibited by the moves, or used as a penalty
  § Keeps the cells from overlapping (moves cells apart)
• Timing
  § Not a 1-1 correspondence with wire length minimization, but consistent on average
• Congestion
  § Measure of routability
  § Tends to move cells apart
[Bazargan]
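The half-perimeter wire length named above can be sketched in a few lines. This is a minimal illustration, assuming each net is given as a list of (x, y) pin coordinates; the function names `hpwl` and `total_hpwl` are made up for this example.

```python
def hpwl(net_pins):
    """Half-perimeter of the bounding box of one net's pin coordinates."""
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets):
    """Sum of the half-perimeter wire lengths over all nets."""
    return sum(hpwl(pins) for pins in nets)


# Two hypothetical nets: a 2-pin net and a 3-pin net.
nets = [[(0, 0), (3, 4)], [(1, 1), (2, 5), (6, 2)]]
print(total_hpwl(nets))
```

The estimate is exact for 2- and 3-pin nets routed with rectilinear wires and a lower bound for larger nets, which is one reason it is cheap enough to evaluate inside an optimization loop.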
Importance of Placement
• Placement: fundamental problem in physical design
• Glue of the physical synthesis
• Became very active again in recent years:
  § 9 new academic placers for WL minimization since 2000
  § Many other publications to handle timing, routability, etc.
• Reasons:
  § Serious interconnect issues (delay, routability, noise) in deep-submicron design
    o Placement determines interconnect to the first order
    o Need placement information even in early design stages (e.g., logic synthesis)
    o Need to have a good placement solution
  § Placement problem becomes significantly larger
  § Cong et al. [ASPDAC-03, ISPD-03, ICCAD-03] point out that existing placers are far from optimal, not scalable, and not stable; subsequent rejoinder from other studies
[Bazargan] [© He]
Placement can Make A Difference
• MCNC benchmark circuit e64 (contains 230 4-LUTs), placed on an FPGA.
[Figure: random initial placement; final placement; after detailed routing]
[Bazargan] [© He]
Design Types
• ASICs
  § Lots of fixed I/Os, few macros, millions of standard cells
  § Placement densities: 40-80% (IBM)
  § Flat and hierarchical designs
• SoCs
  § Many more macro blocks, cores
  § Datapaths + control logic
  § Can have very low placement densities: < 20%
• Microprocessor (µP) Random Logic Macros (RLM)
  § Hierarchical partitions are placement instances (5-30K)
  § High placement densities: 80%-98% (low whitespace)
  § Many fixed I/Os, relatively few standard cells
  § Recall "Partitioning w Terminals" DAC'99, ISPD'99, ASPDAC'00
[Bazargan] [© He]
Requirements for Placers
• Must handle 4-10M cells, 1000s of macros
  § 64 bits + near-linear asymptotic complexity
  § Scalable/compact design database (OpenAccess)
• Accept fixed ports/pads/pins + fixed cells
• Place macros, esp. with variable aspect ratios
  § Non-trivial heights and widths (e.g., height = 2 rows)
• Honor targets and limits for net length
• Respect floorplan constraints
• Handle a wide range of placement densities (from <25% to 100% occupied), ICCAD'02
[Bazargan] [© He]
Optimal Relative Order: A B C 10
To spread. . . A B C 11
. . or not to spread A B C 12
Place to the left A B C 13
… or to the right A B C 14
Optimal Relative Order: A B C Without “free” space the problem is dominated by order 15
Placement Footprints: Standard Cell: Data Path: IP - Floorplanning [Bazargan] 16
Placement Footprints: Core Reserved areas IO Control Mixed Data Path & sea of gates: [Bazargan] 17
Placement Footprints: Perimeter IO Area IO [Bazargan] 18
Unconstrained Placement [Bazargan] 19
Floor planned Placement [Bazargan] 20
VLSI Global Placement Examples bad placement good placement [Bazargan] 21
Placement Algorithms
• Top-down
  § Partitioning-based placement
  § Recursive bi-partitioning or quadrisection
    o Cut direction?
    o Partition vs. physical location
• Iterative
  § Simulated annealing, or force-directed/analytic
  § Start with an initial placement, iteratively improve wire length / area
• Constructive
  § Start with a few cells in the center, and place highly connected adjacent modules around them
[Bazargan]
Analytic Placement 23
Analytic Placement
• Also referred to as force-directed or quadratic placement
• Model
  § Wires simulated as springs: Force_ij = Weight_ij × distance_ij
  § Cell sizes as repellent forces
• Algorithm
  § Solve a set of linear equations to find an intermediate solution (module locations)
  § Repeat the process until equilibrium
[Bazargan]
Force-Directed Placement (cont. ) • Model (details): § Cell distances: either § OR: § Forces: § Objective: find x, y coordinates for all cells such that total force exerted on each cell is zero. [Bazargan] 25
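The formulas on this slide did not survive extraction. As a hedged reconstruction, a common textbook form of the spring model is given below; the symbols $w_{ij}$ (net weight between cells $i$ and $j$) and $d_{ij}$ are assumptions for illustration, not necessarily the exact notation of the original slide.

```latex
% Cell distance: Euclidean, or alternatively Manhattan
d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}
\qquad\text{or}\qquad
d_{ij} = |x_i - x_j| + |y_i - y_j|

% Spring forces on cell i in each direction
F_i^x = \sum_j w_{ij}\,(x_j - x_i), \qquad
F_i^y = \sum_j w_{ij}\,(y_j - y_i)

% Objective: equilibrium for every movable cell i
F_i^x = 0, \qquad F_i^y = 0
```

Setting the forces to zero gives one linear equation per movable cell and direction, which is why the algorithm reduces to solving linear systems.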
Force-Directed Placement (cont.)
• How to avoid overlaps, or all cells collapsing into one point?
  § Use fixed boundary I/O cells
  § Use a repelling force between cells that are not connected by a net
  § Do not allow a move that results in overlap
  § Use repelling "field" forces from congested areas to sparse ones [Eisenmann, DAC'98]
• Problems with force-directed placement:
  § Overlap still might occur (cell sizes are modeled artificially)
  § Flat design, no hierarchy
[Bazargan]
Analytic Placement (Contd. ) • Write down the placement problem as an analytical mathematical problem • Solve the placement problem directly • Example: § Sum of squared wire length is quadratic in the cell coordinates. § So the wirelength minimization problem can be formulated as a quadratic program. § It can be proved that the quadratic program is convex, hence polynomial time solvable 27
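Toy Example: two movable cells x1 and x2 connected in a chain between fixed pins at x=100 and x=200
$$\text{Cost} = (x_1 - 100)^2 + (x_1 - x_2)^2 + (x_2 - 200)^2$$
$$\frac{\partial\,\text{Cost}}{\partial x_1} = 2(x_1 - 100) + 2(x_1 - x_2), \qquad \frac{\partial\,\text{Cost}}{\partial x_2} = -2(x_1 - x_2) + 2(x_2 - 200)$$
Setting the partial derivatives to 0, we solve for the minimum cost, Ax + B = 0:
$$\begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -200 \\ -400 \end{bmatrix} = 0 \;\Longrightarrow\; \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} -100 \\ -200 \end{bmatrix} = 0$$
$$x_1 = 400/3, \qquad x_2 = 500/3$$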
Example (fixed pins at x=100 and x=200, movable cells x1 and x2): interpretation of matrices A and B
• The diagonal values A[i, i] correspond to the number of connections to xi
• The off-diagonal values A[i, j] are -1 if object i is connected to object j, 0 otherwise
• The values B[i] correspond to the negated sum of the locations of fixed objects connected to object i (in the toy example, B = [-100, -200])
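The interpretation of A and B above can be checked on the toy example with a short script. This is a sketch under stated assumptions: all net weights are 1, the helper names `build_system` and `solve` are invented for the illustration, and exact fractions with plain Gauss-Jordan elimination stand in for the sparse iterative solvers a real placer would use.

```python
from fractions import Fraction


def build_system(n_movable, movable_edges, fixed_edges):
    """Build A and B of A x + B = 0 for unit-weight 2-pin connections.

    movable_edges: (i, j) pairs of connected movable cells.
    fixed_edges:   (i, pos) pairs, movable cell i connected to a fixed pin at pos.
    """
    A = [[Fraction(0)] * n_movable for _ in range(n_movable)]
    B = [Fraction(0)] * n_movable
    for i, j in movable_edges:
        A[i][i] += 1
        A[j][j] += 1
        A[i][j] -= 1
        A[j][i] -= 1
    for i, pos in fixed_edges:
        A[i][i] += 1   # a fixed connection still counts on the diagonal
        B[i] -= pos    # negated location of the fixed pin
    return A, B


def solve(A, B):
    """Solve A x + B = 0 by Gauss-Jordan elimination on exact fractions."""
    n = len(A)
    M = [row[:] + [-b] for row, b in zip(A, B)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


# Toy example: pin(100) -- x1 -- x2 -- pin(200)
A, B = build_system(2, [(0, 1)], [(0, 100), (1, 200)])
x = solve(A, B)
print(A, B, x)
```

The script reproduces the reduced system from the toy example (diagonal 2, off-diagonal -1, B = [-100, -200]) and its solution.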
Why formulate the problem this way?
• Because we can
• Because it is trivial to solve
• Because there is only one solution
• Because the solution is a global optimum
• Because the solution conveys "relative order" information
• Because the solution conveys "global position" information
Gordian: A Quadratic Placement Approach
  § Global optimization: solves a sequence of quadratic programming problems
  § Partitioning: enforces the non-overlap constraints
Ref. 1: J. M. Kleinhans, G. Sigl, F. M. Johannes, K. J. Antreich, "Gordian: VLSI Placement by Quadratic Programming and Slicing Optimization", IEEE Trans. on CAD, March 1991, pp. 356-365
Ref. 2: G. Sigl, K. Doll, F. M. Johannes, "Analytical Placement: A Linear or a Quadratic Objective Function?", DAC'91, pp. 427-432
Quadratic Placement Formulation
• Quadratic placement framework:
    repeat
        Solve the convex quadratic program
        Spread the cells
    until the cells are evenly distributed
Solution of the Original QP 33
Partitioning • Find a good cut direction and position. • Improve the cut value using FM. 34
Applying the Idea Recursively
• Before every level of partitioning, do the global optimization again with the additional constraints that the center of gravity of each partition's cells should be at the center of its region.
• Always solve a single QP (i.e., global).
[Figure: regions with their centers of gravity]
Center of Gravity Constraints 36
Mathematical formulation 37
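The formula on this slide was lost in extraction. As a hedged reminder of how the Gordian problem is usually written (the symbols below follow the common presentation of Ref. 1 and should be treated as assumptions here), the global optimization at partitioning level $l$ is a linearly constrained quadratic program:

```latex
\min_{x}\; \Phi(x) = \tfrac{1}{2}\, x^{T} C\, x + d^{T} x
\qquad \text{subject to} \qquad A^{(l)} x = u^{(l)}
```

Here $C$ collects the net connectivity between movable cells, $d$ collects the contributions of fixed pins, each row of $A^{(l)}$ averages the coordinates of the cells assigned to one region (weighted by cell area), and $u^{(l)}$ holds the region centers, so the constraint expresses the center-of-gravity condition. The y-direction is handled by an identical problem.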
Process of Gordian (a) Global placement with 1 region (b) Global placement with 4 region (c) Final placements 38
Star Model for Wire Length 39
Other Details • Sometimes, there is too much overlapping between modules (indicating a bad partitioning). In that case, it is necessary to merge 2 subpartitions and repartition them. • When the number of cells in each partition is small enough, can stop partitioning. Then do a final placement of the sub-circuits according to the design style. 40
Repartitioning 41
Quadratic Techniques
Pros:
  - mathematically well behaved
  - efficient solution techniques find the global optimum
  - great quality
Cons:
  - the solution of Ax + B = 0 is not a legal placement, so generally some additional partitioning techniques are required
  - the solution of Ax + B = 0 is that of the "mapped" problem, i.e., nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length, unless additional methods are deployed
  - fixed I/Os are required for these techniques to work well
Linear vs. Quadratic Objective Function
Differences between the linear and the quadratic objective function
[Figure: a movable cell B connected to fixed cells A and C, placed under (a) the quadratic objective function and (b) the linear objective function]
Linear vs. Quadratic Objective Function (Cont'd)
  § The quadratic objective function tends to make very long nets shorter than the linear objective function does, and lets short nets become slightly longer
[Figure: placement of cells A and B over rows 1-4 under the linear objective function and under the quadratic objective function]
Kraftwerk Placement Tool Hans Eisenmann and F. Johannes, “Generic Global Placement and Floorplanning”, DAC-98, p. 269 - 274 45
Approach • Iteratively solve the quadratic formulation: // equivalent to spring force // equilibrium • Spread cells by additional forces: • Density-based force proposed § Push cells away from dense region to sparse region 46
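The two equations on this slide were lost in extraction. A hedged reconstruction in the notation commonly used for Kraftwerk is sketched below; the symbols $C$, $p$, $d$, and $e$ are assumptions for illustration.

```latex
% Quadratic formulation: equivalent to spring-force equilibrium
C\,p + d = 0

% Spreading: the same system with an additional force vector e
C\,p + d + e = 0
```

Here $p$ is the vector of cell positions, $C$ and $d$ come from the quadratic wire-length objective, and $e$ stacks the additional density-based forces $f_i$ that push cells from dense regions toward sparse ones.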
Some Details
• Let fi be the additional force applied to cell i
• The proportionality constant is chosen so that the maximum of all fi is the same as the force of a net with length k(W+H)
• k is a user-defined parameter
  § k = 0.2 for standard operation
  § k = 1.0 for fast operation
• Can be extended to handle timing, mixed block placement and floorplanning, congestion, heat-driven placement, incremental changes, etc.
Some Potential Problems of Kraftwerk
1. Convergence is difficult to control
  • Large k → oscillation
  • Small k → slow convergence
  • Example: layout of a multiplier
2. Density-based force is expensive to compute
A "discrete" penalty function
• Used in APlace, Kahng et al., ISPD 05
  § Proposed by Naylor (U.S. Pat. 6301693)
  § Divide the placement area into grids
  § Equalize cell area over all grids
  § Straightforward overlap function
  § Bell-shaped function is continuous and differentiable
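To make the last bullet concrete, here is a sketch of a bell-shaped potential. It assumes the piecewise-quadratic form usually associated with this approach; the radius parameter `r` and the exact constants are assumptions for the illustration and are not taken from the slide.

```python
def bell(d, r):
    """Bell-shaped potential of a cell at distance d from a grid point.

    r is the (assumed) radius of influence. The two quadratic pieces meet
    at d = r/2 with equal value and equal slope, so the function is
    continuous and differentiable, unlike a hard 0/1 overlap function.
    """
    d = abs(d)
    if d <= r / 2:
        return 1 - 2 * d * d / (r * r)
    if d <= r:
        return 2 * (d - r) ** 2 / (r * r)
    return 0.0


# Sample the potential across its support.
for d in (0, 1, 2, 3, 4, 5):
    print(d, bell(d, r=4))
```

Because the potential is smooth, the density penalty built from it can be minimized with gradient-based methods instead of discrete moves.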
FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model. Natarajan Viswanathan and Chris Chu, ISPD-04
FastPlace Approach
• FastPlace framework (roughly):
    repeat
        Solve the convex quadratic program
        Reduce wirelength by iterative heuristic
        Spread the cells
    until the cells are evenly distributed
• Special features of FastPlace:
  § Cell Shifting
    o Easy-to-compute technique
    o Enables fast convergence
  § Hybrid Net Model
    o Speeds up solving of the convex QP
  § Iterative Local Refinement
    o Minimizes wirelength based on a linear objective
Cell Shifting
1. Shifting of bin boundaries
[Figure: uniform bin structure → non-uniform bin structure]
2. Shifting of cells linearly within each bin
• Apply to all rows and all columns independently
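The two steps above can be sketched for a single row of bins. The boundary rule below weights the two neighboring boundaries by the utilizations of the bins on either side; this form, the smoothing constant `delta`, and the function names are assumptions for illustration rather than a verified transcription of the FastPlace formulas.

```python
def shift_boundaries(old_bounds, utilization, delta=1.5):
    """Step 1: move each interior bin boundary toward the sparser neighbor.

    old_bounds:  n+1 boundary coordinates of the n bins in one row.
    utilization: n bin utilizations (cell area in the bin / bin area).
    Boundary i separates bin i-1 (left) from bin i (right).
    """
    new_bounds = list(old_bounds)
    for i in range(1, len(old_bounds) - 1):
        u_left, u_right = utilization[i - 1], utilization[i]
        new_bounds[i] = (
            old_bounds[i - 1] * (u_right + delta)
            + old_bounds[i + 1] * (u_left + delta)
        ) / (u_left + u_right + 2 * delta)
    return new_bounds


def shift_cell(x, lo_old, hi_old, lo_new, hi_new):
    """Step 2: map x linearly from the old bin extent to the new bin extent."""
    t = (x - lo_old) / (hi_old - lo_old)
    return lo_new + t * (hi_new - lo_new)


# Two bins on [0, 20]; the left bin is three times as full as the right one.
new_bounds = shift_boundaries([0, 10, 20], [3.0, 1.0])
new_x = shift_cell(5, 0, 10, new_bounds[0], new_bounds[1])
print(new_bounds, new_x)
```

The overfull left bin grows, so the cells inside it are stretched over a wider interval and its utilization drops; equal utilizations leave the boundary at the midpoint.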
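Cell Shifting: Animation
[Animation: the old boundary OBi between bin i and bin i+1 moves to its new position NBi, determined by the neighboring boundaries OBi-1 and OBi+1 and the bin utilizations Ui and Ui+1; cells j, k, and l shift along with it]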
Iterative Local Refinement
• Iteratively go through all the cells one by one
• For each cell, consider moving it in each of the four directions by a certain distance
• Compute a score for each direction based on
  § Half-perimeter wirelength (HPWL) reduction
  § Cell density at the source and destination regions
• Move in the direction with the highest positive score (do not move if no score is positive)
• The distance moved (H horizontally or V vertically) decreases over iterations
• Detailed placement is handled by the same heuristic
Pseudo pin and Pseudo net
• Need to add forces to prevent cells from collapsing back
• Done by adding pseudo pins and pseudo nets
• Only the diagonal and linear terms of the quadratic system need to be updated
• Horizontal and vertical problems have the same connectivity matrix Q
[Figure: a pseudo net connects the cell to a pseudo pin; the additional force pulls the cell from its original position toward the target position]
Effect of Net Model on Runtime
• Need to replace each multi-pin net by 2-pin nets
• Then the placement problem (even with pseudo nets) can be formulated as a convex QP
• Can be solved by any convex QP algorithm
  § Use Incomplete Cholesky Conjugate Gradient (ICCG)
• Runtime is proportional to the number of non-zero entries in Q
• Each non-zero entry in Q corresponds to one 2-pin net
• Traditionally, placers model each multi-pin net by a clique
• High-degree nets will generate a lot of 2-pin nets
• This slows down convex QP algorithms significantly
Clique, Star and Hybrid Net Models
• The star model was introduced by Mo et al. [ICCAD-00] for macro placement
• It introduces a star node even for 2-pin nets
[Figure: clique model vs. star model (with star node)]
[Table: net model chosen by the hybrid model for # pins = 2, 3, 4, 5, 6, …: clique for low pin counts, star for higher pin counts]
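The runtime argument can be made concrete by counting the 2-pin connections each model generates for a k-pin net. This is a small sketch; the switch-over point `threshold=3` for the hybrid model is an assumption for illustration, since the table on the slide did not survive extraction cleanly.

```python
def clique_edges(k):
    """A clique connects every pair of the k pins."""
    return k * (k - 1) // 2


def star_edges(k):
    """A star connects each of the k pins to one extra star node."""
    return k


def hybrid_edges(k, threshold=3):
    """Hybrid model: clique for small nets, star for larger ones.

    threshold is a hypothetical parameter: nets with up to this many
    pins use the clique model.
    """
    return clique_edges(k) if k <= threshold else star_edges(k)


for k in (2, 3, 4, 10, 100):
    print(k, clique_edges(k), star_edges(k), hybrid_edges(k))
```

For 2-pin nets the clique is cheaper (one connection instead of two plus an extra variable), at 3 pins the counts tie, and beyond that the clique grows quadratically while the star stays linear.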
Comparison
• FastPlace is fast:
  § Compared to Capo 8.8:
    o 13x faster
    o 1% longer in wirelength
  § Compared to Dragon 2.2.3:
    o 97.4x faster
    o 1.6% longer in wirelength
  § Compared to Kraftwerk (based on published data):
    o 20-25x faster
    o 10% better in wirelength
Detailed Placement 59
Objectives • Major: Legalization § Make placement feasible with as little perturbation as possible • Minor: Further Improvement § Wirelength § Timing § Routability 60
Representative Approaches
• Constructive methods
  § Network flow [K. Doll et al, TCAD'94]
  § Linear placement [A. Kahng et al, ASPDAC'99]
  § Window interleaving [S. Hur et al, ICCAD'00]
• Iterative method
  § Simulated annealing [W. Sun & C. Sechen, ICCAD'93]
• Hybrid method
  § Network flow + dynamic programming + linear programming [J. Vygen, DATE'98] [Brenner et al, ISPD'04]
• Placement migration/spreading
  § Diffusion-based [Ren et al, DAC'05]
  § Delaunay [Luo et al, ICCAD'05]
Constructive Method [K. Doll et al, TCAD’ 94] • Overlapped initial placement • Improve placement • Divide chip into overlapping regions • Assign cells within each region • Iterated if necessary 62
Constructive Method: Network Flow • Network transformation for regions C 1 C 2 Region (capacity, cost) C 3 Subcells Subregion Final Result 63
Constructive Method: Network Flow
• Within each region
  § Chop cells into uniformly sized subcells
    o All subcells from the same cell have the same cost to each candidate location
    o All subcells tend to be pulled toward the cheapest location and, as a result, tend to lie side by side
    o In case of separation, move the subcells to the column holding the majority
    o Determine the y coordinate using the gravity center of the subcells
  § Transform into a min-cost max-flow problem
    o Introduce source Q and sink S
    o Capacity from Q to any cell µ is sµ = # subcells in µ
    o One edge from each cell to each candidate location with capacity …
    o One edge from each candidate location to the sink S with capacity 1
  § Solve by a flow-augmentation algorithm
Constructive Method [K. Doll et al, TCAD’ 94] Pseudo-code for the algorithm 65
Constructive Method [A. Kahng et al, ASPDAC’ 99] • Determine the order of cells on a row from global placement • Fix cells outside the row, and place the cells within the row optimally according to cell order • Feasible location of cells are discrete, e. g. , grid points • Consider white space on the row, i. e. , unused space between cells • Three algorithms are given to exploit different properties of cost function: details omitted • Key point: concept of optimal regions 66
Constructive Method: Linear Placement
• Input
  § A single row with n movable cells ci, i = 1, 2, …, n
  § Fixed cells in other rows
  § Discrete feasible locations S1, S2, …, Sk for the cells
  § m nets N1, N2, …, Nm between the movable and fixed cells
• Output
  § Overlap-free placement of the ci within the row with minimum wirelength
• Constraint (fixed order)
  § x(c1) ≤ x(c2) ≤ … ≤ x(cn)
Constructive Method: Linear Placement
[Figure: a net N with its fixed cells spanning fl(N) to fr(N) (fixed_span(N)) and its movable cells spanning ml(N) to mr(N); the quantity to minimize is span(N)]
Constructive Method: Linear Placement
• Consider the contribution of cell ci
• Piecewise linear and convex
  § Increasing (decreasing) when fr(N) to the left of corresponding li(N) is less (more) than fl(N) to the right of corresponding ri(N)
[Figure: cost contribution over candidate positions 1-7]
Optimal region for detailed placement • Generalization of Kahng, ASPDAC 99 (key idea proposed earlier by Goto in 1981); used in Pan, ICCAD 05 • Optimal region = target region for moving a cell in detailed placement • Given by the x and y medians of all nets § Given nets j=1…n, with bounding box coordinates xj(l), xj(r), yj(l), yj(r), order all x’s and find the median; order all y’s and find the median 70
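The median rule above is short enough to sketch directly. The code assumes each net's bounding box is computed without the cell being moved and is passed as a tuple (xl, xr, yl, yr); the function names are made up for this illustration.

```python
def optimal_interval(lows, highs):
    """Median interval of the 2n sorted bounding-box coordinates.

    Anywhere between the two middle values, the summed distance from the
    cell to all n net bounding boxes (in this dimension) is minimal.
    """
    values = sorted(list(lows) + list(highs))
    n = len(lows)
    return values[n - 1], values[n]


def optimal_region(net_boxes):
    """net_boxes: one (xl, xr, yl, yr) per net incident to the cell."""
    xs = optimal_interval([b[0] for b in net_boxes], [b[1] for b in net_boxes])
    ys = optimal_interval([b[2] for b in net_boxes], [b[3] for b in net_boxes])
    return xs, ys


# Three hypothetical nets incident to the cell being moved.
boxes = [(0, 4, 0, 2), (2, 6, 1, 5), (3, 8, 4, 9)]
region = optimal_region(boxes)
print(region)
```

In this example all three x-intervals overlap on [3, 4], so placing the cell there adds no horizontal wirelength to any of its nets, which is what the median rule finds.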
Constructive Legalization in Mongrel [Hur and Lillis, ICCAD’ 00] • Start from an overlap-free detailed placement and further improve the wirelength • Pick an arbitrarily sized interval (window) on a row § Fix the cells outside the window, and interleave the cells optimally § Repeat by sliding the window right across each row 71
1. Wirelength improvement • All movements monotonic from a source bin S with violated capacity to a destination bin T with available capacity • Gain = wire length reduction by moving a single cell to a neighbor • Create a gain graph (DAG), and find maximum gain path 72
2. Window Interleaving
• Input
  § An arbitrarily sized interval W within a row
  § An arbitrary ordered subsequence of W, A = {a1, a2, …, am}, and B = W - A = {b1, b2, …, bn}
  § Fixed cells outside W
  § n nets, N1, N2, …, Nn, between cells in W and fixed cells
• Output
  § Overlap-free placement within W with minimum wirelength
• Constraint
  § The cell order within A and within B is preserved
  § Wirelength of nets connecting cells within W is minimized
2. Window Interleaving Ordered sequence A and B are chosen arbitrarily The cell order of A and B is preserved after interleaving 74
2. Window Interleaving
• Dynamic programming
  § Subproblem
    o Denote by Si,j the interleaving of {a1, a2, …, ai} and {b1, b2, …, bj} that minimizes C(Si,j), where C(Si,j) denotes the cost of Si,j
  § Recursion
  § Complexity O(nm + p(n+m)), where p is the number of pins on incident nets, and m and n are the numbers of cells in A and B respectively
2. Window Interleaving • Dynamic programming § Key point: the optimal interleaving of a prefix Si, j is independent of the ordering of subsequent cells in the window § Separability => Dynamic Programming • Points to ponder § Is it optimal? § What is the difference compared to the DP in Linear Placement [Kahng et al]? 76
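The separability argument can be illustrated with a deliberately simplified cost model. In the sketch below each cell has a width and a preferred x position (standing in for its connections to fixed cells), cells are packed left to right without whitespace, and the cost of a cell is the distance from its center to its preferred position. The real method evaluates net wirelength instead, so this cost, the tuple layout, and the function name are assumptions for the illustration only.

```python
def interleave(a, b, start=0.0):
    """Optimally interleave two ordered cell lists by dynamic programming.

    a, b: lists of (name, width, target_x); the order inside each list is
    preserved. Returns (minimum cost, merged order of cell names).
    """
    m, n = len(a), len(b)
    wa = [0.0] * (m + 1)  # prefix sums of widths in a
    wb = [0.0] * (n + 1)  # prefix sums of widths in b
    for i in range(m):
        wa[i + 1] = wa[i] + a[i][1]
    for j in range(n):
        wb[j + 1] = wb[j] + b[j][1]

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            # Right edge of the packed prefix; it depends only on (i, j),
            # not on how the prefix was interleaved.
            right = start + wa[i] + wb[j]
            if i > 0:
                _, w, t = a[i - 1]
                c = cost[i - 1][j] + abs((right - w / 2) - t)
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "a"
            if j > 0:
                _, w, t = b[j - 1]
                c = cost[i][j - 1] + abs((right - w / 2) - t)
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, "b"

    order, i, j = [], m, n
    while i > 0 or j > 0:
        if back[i][j] == "a":
            i -= 1
            order.append(a[i][0])
        else:
            j -= 1
            order.append(b[j][0])
    order.reverse()
    return cost[m][n], order


seq_a = [("a1", 2, 1), ("a2", 2, 7)]
seq_b = [("b1", 2, 3), ("b2", 2, 5)]
best_cost, best_order = interleave(seq_a, seq_b)
print(best_cost, best_order)
```

The table has (m+1)(n+1) states and each is filled in constant time here, which mirrors the O(nm) term of the stated complexity; the p(n+m) term comes from the net bookkeeping that this simplified cost omits.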
Other approaches to legalization
• Not discussed in class, for your reference
  § FengShui
    o Dynamic programming [ICCAD'03]
    o Greedy algorithm [ISPD'04]
  § Tetris, Dwight Hill (US Patent 6,370,673)
    o Also used in the Kahng/Wang APlace paper [ISPD 04]
Simulated annealing 78
Simulated Annealing Placement
• Cost
  § Area (usually fixed # of rows, variable row width)
  § Wirelength (Euclidean or Manhattan)
  § Cell overlap (penalty increases with temperature)
• Moves
  § Exchange two cells within a radius R (R temperature dependent?)
  § Displace a cell within a row
  § Flip a cell horizontally
• Low vs. high temperature
  § If used as post-processing, start with a low temperature
• Post-processing?
  § Might be needed if there are still overlaps
[Bazargan]
Case Study: TimberWolf
• "The TimberWolf Placement and Routing Package", Sechen, Sangiovanni; IEEE Journal of Solid-State Circuits, vol. SC-20, No. 2 (1985), 510-522
• "TimberWolf 3.2: A New Standard Cell Placement and Global Routing Package", Sechen, Sangiovanni; 23rd DAC, 1986, 432-439
TimberWolf
• Stage 1
  § Modules are moved between different rows as well as within the same row
  § Module overlaps are allowed
  § When the temperature is reduced below a certain value, stage 2 begins
• Stage 2
  § Remove overlaps
  § Annealing process continues, but only interchanges adjacent modules within the same row
[Bazargan] [© He]
Solution Space All possible arrangements of modules into rows possibly with overlaps [Bazargan] [© He] 81
Neighboring Solutions
Three types of moves:
  M1: Displace a module to a new location
  M2: Interchange two modules
  M3: Change the orientation of a module
[Figure: examples of the three moves on modules 1-4, with the axis of reflection for M3]
[Bazargan] [© He]
Move Selection
• TimberWolf first tries to select a move between M1 (displacement) and M2 (interchange)
    o Prob(M1) = 4/5
    o Prob(M2) = 1/5
• If a move of type M1 is chosen (for a certain module) and it is rejected, then a move of type M3 (reflection, for the same module) will be chosen with probability 1/10
• Restrictions on:
  § How far a module can be displaced
  § What pairs of modules can be interchanged
[Bazargan] [© He]
Move Restriction
• Range limiter: rectangular window R
  § At the beginning, R is very large, big enough to contain the whole chip
  § The window size shrinks slowly as the temperature decreases. In fact, the height and width of R ∝ log(T)
  § Stage 2 begins when the window size is so small that no inter-row module interchanges are possible
[Bazargan] [© He]
Cost Function
• Cost = C1 + C2 + C3
  § $C_1 = \sum_i (a_i w_i + b_i h_i)$, where $w_i$ and $h_i$ are the width and height of the bounding box of net i
  § ai, bi are horizontal and vertical weights, respectively
  § ai = 1, bi = 1 gives the half-perimeter of the bounding box
  § Critical nets: increase both ai and bi
  § Double-metal technology: over-the-cell routing is possible, so fewer feed-through cells are needed
  § If vertical wirings are "cheaper" than horizontal wirings, use smaller vertical weights, i.e., bi < ai
[Bazargan] [© He]
Cost Function (Cont'd)
C2: Penalty function for module overlaps
  O(i, j) = amount of overlap in the X-dimension between modules i and j
  $$C_2 = \sum_{i \ne j} \bigl(O(i,j) + \alpha\bigr)^2$$
  α: offset parameter to ensure C2 → 0 when T → 0
C3: Penalty function that controls the row lengths
  Desired row length = d(r)
  l(r) = sum of the widths of the modules in row r
  $$C_3 = \sum_r \beta\,\lvert l(r) - d(r) \rvert$$
[Bazargan] [© He]
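The three cost terms can be put together in a short sketch. The data layout and function names are invented for the illustration, and one assumption is made explicit in the code: C2 is summed only over pairs that actually overlap, so that non-overlapping pairs do not contribute the constant α².

```python
def cost_c1(nets):
    """C1: weighted bounding-box size summed over nets.

    nets: list of (w, h, a, b) = bounding-box width and height of a net
    and its horizontal and vertical weights.
    """
    return sum(a * w + b * h for w, h, a, b in nets)


def cost_c2(overlaps, alpha):
    """C2: overlap penalty.

    overlaps: x-overlap amounts O(i, j) for the pairs that overlap
    (assumption: pairs with no overlap are left out of the sum).
    """
    return sum((o + alpha) ** 2 for o in overlaps)


def cost_c3(row_lengths, desired_lengths, beta):
    """C3: penalty for rows whose length deviates from the desired length."""
    return beta * sum(abs(l - d) for l, d in zip(row_lengths, desired_lengths))


def total_cost(nets, overlaps, alpha, row_lengths, desired_lengths, beta):
    return (cost_c1(nets)
            + cost_c2(overlaps, alpha)
            + cost_c3(row_lengths, desired_lengths, beta))


# Hypothetical numbers: two nets, two overlapping pairs, two rows.
example = total_cost(
    nets=[(3, 4, 1, 1), (2, 1, 2, 1)],
    overlaps=[1, 2], alpha=1,
    row_lengths=[10, 12], desired_lengths=[11, 11], beta=5,
)
print(example)
```

An annealer would re-evaluate only the terms touched by a move (the nets on the moved modules, their overlaps, and the affected rows) rather than recomputing the full sum.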
Annealing Schedule
  § Tk = r(k) · Tk-1, k = 1, 2, 3, …
  § r(k) increases from 0.8 to a maximum value of 0.94 and then decreases to 0.1
  § At each temperature, a total of K · n attempts is made
  § n = number of modules
  § K = user-specified constant
[Bazargan] [© He]
Dragon 2000: Standard-Cell Placement Tool for Large Industry Circuits M. Wang, X. Yang, and M. Sarrafzadeh, ICCAD-2000 pages 260 -263 88
Main Idea
• Simulated annealing based
  § 1.9x faster than iTools 1.4.0 (commercial version of TimberWolf)
  § Comparable wirelength to iTools (i.e., very good)
  § Performs better for larger circuits
  § Still very slow compared with other approaches
  § Also shown to have good routability
• Top-down hierarchical approach
  § hMetis to recursively quadrisect into 4^h bins at level h
  § Swapping of bins at each level by SA to minimize WL
  § Terminates when each bin contains < 7 cells
  § Then swap single cells locally to further minimize WL
• Detailed placement is done by a greedy algorithm
Partitioning 90
Partitioning-based Approach • Try to group closely connected modules together. • Repetitively divide a circuit into sub-circuits such that the cut value is minimized. • Also, the placement region is partitioned (by cutlines) accordingly. • Each sub-circuit is assigned to one partition of the placement region. Note: Also called min-cut placement approach. 91
An Example Cutline Circuit Placement 92
Variations • There are many variations in the partitioningbased approach. They are different in: § The objective function used. § The partitioning algorithm used. § The selection of cutlines. 93
Partitioning: Objective: Given a set of interconnected blocks, produce two sets that are of equal size, and such that the number of nets connecting the two sets is minimized. 94
FM Partitioning:
    list_of_sets = entire_chip;
    while (any_set_has_2_or_more_objects(list_of_sets)) {
        for_each_set_in(list_of_sets) {
            partition_it();
        }
        /* each time through this loop the number of */
        /* sets in the list doubles. */
    }
[Figure: initial random placement; after cut 1; after cut 2]
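The loop above can be turned into a small runnable sketch. The bisection step here simply splits a list in half by position; that placeholder is an assumption for the illustration, and a real placer would call a min-cut partitioner such as FM at that point.

```python
def partition_it(cells):
    """Placeholder bisection: split by list order.

    A real implementation would choose the two halves to minimize the
    number of nets cut (for example with FM).
    """
    mid = len(cells) // 2
    return cells[:mid], cells[mid:]


def recursive_bisection(cells):
    """Bisect every set with 2 or more objects until only singletons remain."""
    list_of_sets = [list(cells)]
    while any(len(s) >= 2 for s in list_of_sets):
        next_sets = []
        for s in list_of_sets:
            if len(s) >= 2:
                next_sets.extend(partition_it(s))
            else:
                next_sets.append(s)
        list_of_sets = next_sets
    return list_of_sets


leaves = recursive_bisection(["a", "b", "c", "d", "e"])
print(leaves)
```

With balanced cuts the number of sets roughly doubles per pass, so about log2(n) passes are needed for n cells.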
Partitioning-based Placement
• Simultaneously perform:
  § Circuit partitioning
  § Chip area partitioning
  § Assignment of circuit partitions to chip slots
• Problem:
  § Circuit partitioning is unaware of the physical location
  § Solution: terminal propagation (add dummy terminals)
[Figure: sub-circuits A and B assigned to slots with and without terminal propagation; She 99, p. 239]
[Bazargan]
Terminal Propagation Algorithm by Dunlop and Kernighan “A Procedure for Placement of Standard-Cell VLSI Circuits”, TCAD, 4(1): 92 -98, Jan. 1985. 97
Problem of Partitioning Subcircuits
[Figure: two ways of assigning sub-circuits A and B to the two halves of a region]
The costs of these 2 partitionings are not the same.
Terminal Propagation
• Need to consider nets connecting to external terminals or other modules as well.
• Do partitioning in a breadth-first manner (i.e., finish all higher-level partitioning first).
[Figure: sub-circuits A and B with a dummy terminal; the dummy terminal will try to pull B to the top partition]
Terminal Propagation 100
Partitioning-based Placement
• More problems:
  § Direction of the cut? [Yildiz, DAC'01]
    [Figure: four cut sequences (a)-(d) with numbered cutlines]
  § How to handle fixed blocks? (the area assigned to a partition might not be enough)
  § How to correct a bad decision made at a higher level?
• Advantages:
  § Hierarchical, scalable
  § Inherently apt for congestion minimization, easily extendable to timing optimization
[Bazargan]
Can Recursive Bisection Alone Produce Routable Placement? (Name of placer: Capo) Andrew Caldwell, Andrew Kahng, and Igor Markov DAC-2000 102
Capo Overview • Standard cell placement, Fixed-die context • Pure recursive bisectioning placer § Several minor techniques to produce good bisections • Produce good results mainly because: § Improvement in mincut bisection using multi-level idea in the past few years § Pay attention to details in implementation • Implementation with good interface (LEF/DEF and GSRC bookshelf) available on web 103
Capo Approach
• Recursive bisection framework:
  § Multi-level FM for instances with >200 cells
  § Flat FM for instances with 35-200 cells
  § Branch-and-bound for instances with <35 cells
• Careful handling of partitioning tolerance:
  § Uncorking: prevent large cells from blocking the movement of smaller cells
  § Repartitioning: several FM calls with decreasing tolerance
  § Block splitting heuristics: higher tolerance for vertical cuts
  § Hierarchical tolerance computation: an instance with more whitespace can have a bigger partitioning tolerance
Handling Macrocells • Macrocells: large blocks § Shred into smaller blocks of regular height [Doll, 1994] § Add a number of “fake”nets between these to keep them together (or equivalently, use a larger net weight) § Use the centroid of the placed blocks as the centroid of the macroblock after placement • Can be used for “floorplacement” (floorplanning+placement) 105
Partitioning with Capo
Pros:
  § fast
  § scales nearly linearly with problem size
Cons:
  § non-trivial to implement
  § very directed algorithm, but this limits the ability to deal with miscellaneous constraints
  § not stable (if there is a minor change)
    o Concept of placement stability: if the input netlist experiences a small change, the placement should not change substantially
Summary for Partition Based Placement • Improvement in mincut partitioning are conducive to better wirelength and congestion • Routable placements can be produced in most cases without explicit congestion management § Explicit congestion control may still be useful in some cases • Better weighted wirelength often implies better routed wirelength, but not always 107
Other Placement Metrics 108
Timing Analysis
[Figure: a netlist with primary inputs PI1-PI3, primary outputs PO1-PO3, and a delay for each gate; arrival times are 0 at the primary inputs and are propagated forward through the gates, reaching 22, 18, and 18 at the outputs]
Timing Analysis
• Each node is annotated with arrival time / required time
• slack = required time - arrival time
[Figure: the same netlist annotated with arrival/required times (e.g., 22/22 at PO1, 18/22 at PO2 and PO3) and the resulting slack on each node]
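The arrival/required/slack computation can be sketched as two passes over a topologically ordered gate graph. The sketch assumes gate delays only (no wire delay), arrival time 0 at nodes without fan-in, and times measured at gate outputs; the small netlist and all names are hypothetical and are not the circuit from the slide.

```python
from collections import defaultdict, deque


def static_timing(delay, edges, required_at_outputs):
    """Return (arrival, required, slack) dictionaries for a gate DAG.

    delay: node -> gate delay (0 for primary inputs).
    edges: (driver, sink) pairs.
    required_at_outputs: required time for every node without fan-out.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Topological order (Kahn's algorithm).
    indeg = {n: len(pred[n]) for n in delay}
    queue = deque(n for n in delay if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for v in succ[n]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Forward pass: latest arrival at each gate output.
    arrival = {}
    for n in order:
        arrival[n] = delay[n] + max((arrival[p] for p in pred[n]), default=0)

    # Backward pass: earliest required time at each gate output.
    required = {}
    for n in reversed(order):
        if succ[n]:
            required[n] = min(required[s] - delay[s] for s in succ[n])
        else:
            required[n] = required_at_outputs[n]

    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack


delay = {"pi1": 0, "pi2": 0, "g1": 3, "g3": 1, "g2": 4}
edges = [("pi1", "g1"), ("pi2", "g3"), ("g1", "g2"), ("g3", "g2")]
arrival, required, slack = static_timing(delay, edges, {"g2": 10})
print(arrival, required, slack)
```

Nodes on the path through g1 share the smallest slack, which identifies that path as the critical one for this small example.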
Another example with interconnect delay: same timing analysis
[Figure: a path between two latches with gate and interconnect delays annotated]
Timing Driven Placement Approaches • Path-based § Most accurate information § Very slow • Budgeting § Inaccurate information § Hard to budget § Fast • Net-based approach § Net-weighting 112
Net-Weighting • Basic approach § For more timing critical nets (i. e. , smaller slack), assign higher net weights § Minimize where 113
Congestion Minimization • Traditional placement problem is to minimize interconnection length (wirelength) • A valid placement has to be routable • Congestion is important because it represents routability (lower congestion implies better routability) • There is not yet enough research work on the congestion minimization problem 114
Definition of Congestion
[Figure: an edge with routing demand = 3]
Assume the routing supply is 1; then overflow = 3 - 1 = 2 on this edge.
$$\text{overflow}(e) = \begin{cases} \text{demand}(e) - \text{supply}(e) & \text{if } \text{demand}(e) > \text{supply}(e) \\ 0 & \text{otherwise} \end{cases}$$
$$\text{Overflow} = \sum_{\text{all edges } e} \text{overflow}(e)$$
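The overflow definition maps directly onto a one-line sum. This is a minimal sketch; the dictionaries keyed by edge name are an assumed data layout for the example.

```python
def total_overflow(demand, supply):
    """Sum over all edges of max(0, routing demand - routing supply)."""
    return sum(max(0, demand[e] - supply[e]) for e in demand)


# Edge e1 reproduces the example above (demand 3, supply 1);
# edge e2 is under capacity and contributes nothing.
demand = {"e1": 3, "e2": 1}
supply = {"e1": 1, "e2": 2}
print(total_overflow(demand, supply))
```

Because under-used edges contribute zero rather than a negative value, spare capacity in one area cannot cancel overflow elsewhere.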
Correlation between Wirelength and Congestion Total Wirelength = Total Routing Demand 116
Wirelength ≠ Congestion
[Figure: a congestion-minimized placement vs. a wirelength-minimized placement]
Congestion Map of a Wirelength Minimized Placement Congested Spots 118
Congestion Reduction Postprocessing Reduce congestion globally by minimizing the traditional wirelength Post process the wirelength optimized placement using the congestion objective 119
An Effective Congestion Driven Placement Framework André Rohe University of Bonn, Germany joint work with Ulrich Brenner ISPD 2002 (Best Paper) 120
A Dense Placement • good wirelength • impossible to route 121
Possible Solution • easy to route • bad wirelength/timing 122
Congestion Driven Placement • easy to route + good wirelength almost no extra computation effort ! 123
Overall Algorithm: Bonn Place • Partitioning based approach • Solves QP in each level, followed by partitioning • Partitioning is done by quadrisection: circuits are partitioned with minimum movement (Vygen) 124
Methods used for congestion driven placement • Very fast congestion calculation • Inflate circuits in congested regions • Spreading inflated cells 125
Congestion calculation • Calculate Steiner Tree for each net • Probability estimation for each 2 -point connection (similar to Hung & Flynn, Lou et al. ) 126
Inflation of circuits (used previously by Hou et al. ) • Initial inflation (based on pin density) • Given a circuit c in Region R, c is inflated by up to 100% • The inflation is based on the congestion in R and the surrounding regions & the pin density in R • Deflation is possible if the circuit is no longer critical. 127
Placement Step 0 128
Placement Step 1 129
Placement Step 2 130
Placement Step 3 131
Placement Step 4 132
Placement Step 5 133
Placement Step 6 134
Placement Step 7 135
Spreading inflated cells • Repartitioning considers 2 x 2 windows in placement grid to optimize netlength • Use extra repartitioning step to move cells away from overloaded regions 136
Placement Migration • From an initial placement (either legal or illegal), migrate to another placement solution • It happens all the time during physical synthesis for multiobjective design closure • Post/detailed placement § Legalization due to buffer insertion, gate sizing, Engineering Change Order (ECO), or decoupling capacitor insertion. § Congestion/noise mitigation § Thermal hot spots removal § … • Global placement § Cell spreading is a KEY step in any analytical engine 137
Prior Works
• There are many previous works on legalization/spreading
  § Greedy approach [Viswanathan & Chu, ISPD'04]
  § Force-directed [Eisenmann & Johannes, DAC'98]
  § Network flow [Brenner et al, ISPD'04]
  § Dynamic programming [Kahng et al, ASPDAC'99; Hur & Lillis, ICCAD'00]
  § Global re-placement from a late cut [Ren et al, ICCAD'04]
  § …
• However, none of them attempted to PRESERVE the relative geometric order in a continuous manner, which is important for design closure!
Diffusion Based Placement Migration [Ren et al, DAC’ 05] Initial placement (illegal) Legal placement from greedy alg. Legal placement from Diffusion Based Placement [Ren+, DAC’ 05] 139
What is Diffusion? • Diffusion equation • Physical diffusion is a natural process to diffuse material from high density to low density while maintaining the relative order 140
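The equation itself did not survive on the slide. For reference, the standard diffusion equation with unit diffusivity, written for a density field d over the placement area (the symbol d is an assumption here), is:

```latex
\frac{\partial d_{x,y}(t)}{\partial t} = \nabla^{2} d_{x,y}(t)
  = \frac{\partial^{2} d_{x,y}(t)}{\partial x^{2}}
  + \frac{\partial^{2} d_{x,y}(t)}{\partial y^{2}}
```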
Diffusion Flow • The velocity of the diffusion flow is proportional to the local density gradient. • A cell's final location can be computed by the velocity integral. (Figure labels: A, B) 141
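In symbols, one way to write these two statements is shown below. The normalization by the local density is an assumption (it is a common choice in diffusion-based placement); the slide only states proportionality to the gradient.

```latex
% Velocity field: points from high density to low density
v^{H}_{x,y}(t) = -\frac{\partial d_{x,y}(t)/\partial x}{d_{x,y}(t)}, \qquad
v^{V}_{x,y}(t) = -\frac{\partial d_{x,y}(t)/\partial y}{d_{x,y}(t)}

% Cell location as the integral of the velocity along its path
x(t) = x(0) + \int_{0}^{t} v^{H}_{x(t'),\,y(t')}(t')\,dt', \qquad
y(t) = y(0) + \int_{0}^{t} v^{V}_{x(t'),\,y(t')}(t')\,dt'
```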
Diffusion-Based Placement • Flow: Initialize density → Compute cell velocity → Compute next cell location → Update density → Max(density) < 1? • If no, repeat from the velocity computation; if yes, stop 142
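The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: bins are unit squares, the density is advanced with a forward-time centered-space (FTCS) step, boundary bins mirror their own value so no density leaves the grid, and each cell simply takes the velocity of the bin it sits in (the later velocity-interpolation step is omitted). All function names are hypothetical.

```python
def _at(d, j, k):
    """Density at bin (row j, column k); out-of-range indices mirror the edge bin."""
    rows, cols = len(d), len(d[0])
    return d[min(max(j, 0), rows - 1)][min(max(k, 0), cols - 1)]

def diffuse_step(d, dt):
    """One FTCS step of the diffusion equation on the bin grid."""
    rows, cols = len(d), len(d[0])
    return [[d[j][k]
             + dt / 2 * (_at(d, j + 1, k) + _at(d, j - 1, k) - 2 * d[j][k])
             + dt / 2 * (_at(d, j, k + 1) + _at(d, j, k - 1) - 2 * d[j][k])
             for k in range(cols)] for j in range(rows)]

def bin_velocity(d, j, k):
    """(vx, vy) of bin (j, k): minus the central-difference gradient over density."""
    if d[j][k] <= 0:
        return 0.0, 0.0
    vx = -(_at(d, j, k + 1) - _at(d, j, k - 1)) / (2 * d[j][k])
    vy = -(_at(d, j + 1, k) - _at(d, j - 1, k)) / (2 * d[j][k])
    return vx, vy

def diffusion_placement(d, cells, d_max=1.0, dt=0.2, max_steps=1000):
    """Move cells (x, y) along the diffusion flow until max(density) < d_max."""
    rows, cols = len(d), len(d[0])
    cells = list(cells)
    for _ in range(max_steps):
        if max(max(row) for row in d) < d_max:
            break
        moved = []
        for x, y in cells:
            j, k = min(int(y), rows - 1), min(int(x), cols - 1)
            vx, vy = bin_velocity(d, j, k)
            # Clamp so that cells stay inside the placement area.
            moved.append((min(max(x + vx * dt, 0.0), cols),
                          min(max(y + vy * dt, 0.0), rows)))
        cells = moved
        d = diffuse_step(d, dt)
    return d, cells

density, cells = diffusion_placement([[2.0, 0.0], [0.0, 0.0]], [(0.5, 0.5)])
```

The loop can only terminate through the density test when the average density is below `d_max`; otherwise the `max_steps` cap ends it. Note that the density is evolved by the equation itself rather than recomputed from the moved cells, which is a simplification chosen for this sketch.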
Initialize Bin Density (Figure values: 0.4 1.0 0.2 0.6 1.2 0.4 0.8 0.6 1.4 1.0 0.4 0.8 0.5 1.6 0.4) 143
Compute Horizontal Bin Velocity (Figure values: 1.4 1.0 0.4) 144
Compute Vertical Bin Velocity (Figure values: 0.4 1.0 1.6) 145
Compute Bin Velocity 146
Velocity Interpolation 147
Move Cell 148
Update Bin Density 149
Cell Movement Example x(9), y(9) x(0), y(0) 150
Computational Geometry Based Placement [Luo+, ICCAD’05] • Another view for stable placement migration to maintain the “relative” geometry order § Bin-based spreading o Iterative bin stretching o Cell interpolation § Delaunay triangulation based legalization o Delaunay triangulation of the placement region o Fine-grain density distribution and overlap removal § Geometric placement stability metrics 151
Bin Stretching • Compute the bin density Dk,l(n) in iteration n • Compute εxk,l and εyk,l, the stretching/shrinking amounts in the horizontal and vertical directions. (Figure labels: bin of width W and height H with density Dk,l(n), target density Dtarget, stretch amounts εxk,l and εyk,l) 152
Bin Stretching: Corner Stretching • Stretching each bin individually would generate bin overlaps § Stretch the bin corners pxk,l, pyk,l instead § If a corner is on a fixed boundary, take the 0.5 factor off § δ controls the smoothness of the spreading process (Figure labels: corner p4,4(n) shared by bins D3,3, D4,3, D3,4, D4,4) 153
Cell Location Interpolation • Linearly interpolate cell coordinates x(n+1), y(n+1) from x(n), y(n) inside the new stretched bin § Before stretching, the cell sits at offsets α*(pk,l-1(n)-pk-1,l-1(n)) and β*(pk-1,l(n)-pk-1,l-1(n)) from corner pk-1,l-1(n) § After stretching, the same α and β are applied along the new edges, e.g. α*(pk,l-1(n+1)-pk-1,l-1(n+1)) and β*(pk,l(n+1)-pk,l-1(n+1)) (Figure: bin corners pk-1,l-1, pk,l-1, pk-1,l, pk,l at iterations n and n+1) 154
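The interpolation can be sketched as follows, assuming the old bin is an axis-aligned rectangle and the stretched bin is a general quadrilateral given by its four new corners. The function names and the corner ordering are assumptions for illustration.

```python
def lerp(p, q, t):
    """Point at fraction t of the way from p to q."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def interpolate_cell(cell, old_bin, new_corners):
    """Map a cell from its old rectangular bin into the stretched bin.

    old_bin: (x0, y0, x1, y1), lower-left and upper-right corners at iteration n
    new_corners: (lower_left, lower_right, upper_left, upper_right) at iteration n+1
    """
    x, y = cell
    x0, y0, x1, y1 = old_bin
    alpha = (x - x0) / (x1 - x0)   # relative horizontal offset inside the old bin
    beta = (y - y0) / (y1 - y0)    # relative vertical offset inside the old bin
    ll, lr, ul, ur = new_corners
    bottom = lerp(ll, lr, alpha)   # same alpha along the new bottom edge
    top = lerp(ul, ur, alpha)      # same alpha along the new top edge
    return lerp(bottom, top, beta)

# A cell in the middle of a unit bin that is stretched to twice its width.
new_xy = interpolate_cell((0.5, 0.5), (0.0, 0.0, 1.0, 1.0),
                          ((0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)))
```

Because every cell keeps its relative offsets (α, β), cells inside one bin keep their relative order after stretching, which is the stability property this method is after.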
Hierarchical Iterative Spreading • In every iteration § The bin boundary is restored after cells are moved to new locations § Re-compute the bin density and repeat previous steps § Stops once all bin densities are under Dtarget. • Multilevel spread § Use large bin in the beginning § Divide large bin into smaller bins, selectively do regional spreading. … 155
Delaunay Triangulation • What is Delaunay triangulation? § The dual of the Voronoi diagram § Given a point set S, the Voronoi diagram of S is the partition of space into regions such that each point of S owns the subspace closest to it. • Delaunay triangulation captures the geometric ordering § Each point is connected to its nearest neighbors by Delaunay edges. 156
Delaunay Triangulate a Placement 157
Delaunay Triangulation Based Legalizer • Delaunay-triangulate the placement region • Delaunay-based cell spreading • Snap cells to closest rows • Row balancing (multiple iterations) § Triangulate the region • Final legalization (multiple iterations) 158
Delaunay Spreading Order • Cell spreading ordering § Pick a root cell in the center of congestion area § BFS walking through Delaunay edges to add cells into the tree § Spreading stops at low congestion area … § [For details, see the paper] 159
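The ordering step can be sketched as a breadth-first search. This assumes the Delaunay edges are already available as an adjacency map and that some predicate tells whether a cell sits in a congested area; both, and the function name, are hypothetical stand-ins for what the paper defines.

```python
from collections import deque

def spreading_order(root, delaunay_adj, is_congested):
    """BFS order of cells starting from `root`, walking Delaunay edges.

    delaunay_adj: dict mapping each cell to its Delaunay neighbors
    is_congested: function(cell) -> bool; the walk does not expand past
                  cells that lie in a low-congestion area
    """
    order, seen, queue = [], {root}, deque([root])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        if not is_congested(cell):
            continue  # spreading stops at low-congestion areas
        for neighbor in delaunay_adj[cell]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
order = spreading_order("a", adj, lambda cell: cell in {"a", "b"})
```

Cells on the rim of the congested area still enter the tree, but their neighbors beyond the rim do not.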
Multilevel placement (Note: this is a full placer, not a detailed placement method!) 160
Motivation and Related Work: Multilevel Methods in Scientific Computation • Originally developed to solve boundary-value partial differential equation (PDE) problems on continuous fields • A discretized elliptic PDE is a structured, positive-definite system of linear equations § Geometric/algebraic multigrid methods • Successfully applied to the hypergraph partitioning problem: § hMETIS [G. Karypis 1998] 161
Multigrid methods • Structure of analysis problem is similar to solution of partial differential equations (PDEs) • Borrow the idea of multigrid solution techniques [Taken from U. Meier Yang, ATCS Toolkits Workshop] 162
Outline of the approach • Original grid → Reduced (coarsened) grid • Solve coarsened grid • Map solution back to original fine grid 163
Illustration of a multilevel algorithm (for graph drawing) Coarse solution Finer solution (in various iterations) Final solution Flat (single-level) solution [Walshaw] 164
Main Components in Multilevel Framework 1. Create coarser problems [aggregation/coarsening/clustering] 2. Optimize coarser problems [relaxation/smoothing] 3. Transform coarser problem solution to finer level [interpolation/declustering] 165
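The three components fit together as a recursive V-cycle. The skeleton below is a generic sketch: the callbacks `coarsen`, `relax`, `interpolate`, and `is_coarsest` are hypothetical placeholders, and the toy versions at the bottom exist only to make the example runnable.

```python
def v_cycle(problem, coarsen, relax, interpolate, is_coarsest):
    """One multilevel V-cycle: coarsen down, solve, interpolate and relax back up."""
    if is_coarsest(problem):
        return relax(problem, None)            # solve the coarsest problem directly
    coarse, mapping = coarsen(problem)         # 1. aggregation / clustering
    coarse_solution = v_cycle(coarse, coarsen, relax, interpolate, is_coarsest)
    initial = interpolate(coarse_solution, mapping)   # 3. declustering
    return relax(problem, initial)             # 2. relaxation at this level

# Toy callbacks (assumptions): a problem is a list of cell names, a solution
# is a dict of 1-D positions, and clusters are pairs of adjacent cells.
def coarsen(cells):
    clusters = [tuple(cells[i:i + 2]) for i in range(0, len(cells), 2)]
    return clusters, {cluster: list(cluster) for cluster in clusters}

def relax(cells, initial):
    if initial is None:
        return {c: float(i) for i, c in enumerate(cells)}
    return dict(initial)  # placeholder: a real placer would optimize here

def interpolate(coarse_solution, mapping):
    return {m: coarse_solution[cl] for cl, members in mapping.items() for m in members}

placement = v_cycle(["a", "b", "c", "d"], coarsen, relax, interpolate,
                    lambda cells: len(cells) <= 2)
```

Here interpolation is the simplest possible one: every cell inherits its cluster's position.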
mPL: Multilevel Placement Framework (Figure: V-cycle. Aggregation steps lead from the initial fine-grain problem through intermediate levels to a good coarse-grain problem solution; interpolation steps, each followed by relaxation (optimization) at the intermediate level, lead back up to the final fine-grain problem, which receives thorough relaxation and detailed placement.) 166
Solving the Subproblem • Problem formulation (horizontal case): • Iteratively solve the weighted quadratic minimization problem, using the current solution to determine the weights (Gordian-L). 167
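The reweighting idea can be sketched in one dimension. This is a minimal sketch under assumptions not stated on the slide: all nets are two-pin, each net weight is the reciprocal of its current length (so the weighted quadratic cost approximates linear wirelength), and the quadratic problem is solved by Gauss-Seidel sweeps. The function names and the `eps` floor are hypothetical.

```python
def solve_quadratic(x, nets, weights, movable, sweeps=50):
    """Minimize sum of w * (x[a] - x[b])**2 over the movable cells, in place."""
    for _ in range(sweeps):
        for i in movable:
            num = den = 0.0
            for (a, b), w in zip(nets, weights):
                if i == a:
                    num, den = num + w * x[b], den + w
                elif i == b:
                    num, den = num + w * x[a], den + w
            if den > 0:
                x[i] = num / den   # weighted average of connected pins
    return x

def reweighted_placement(x, nets, movable, outer=20, eps=1e-3):
    """Alternate a quadratic solve with reweighting by 1 / current net length."""
    x = dict(x)
    weights = [1.0] * len(nets)
    for _ in range(outer):
        solve_quadratic(x, nets, weights, movable)
        # eps keeps the weight finite when two pins coincide.
        weights = [1.0 / max(abs(x[a] - x[b]), eps) for a, b in nets]
    return x

# Cell "c" has two connections to the pad at 0 and one to the pad at 10.
pads = {"p0": 0.0, "p1": 10.0, "c": 5.0}
result = reweighted_placement(pads, [("c", "p0"), ("c", "p0"), ("c", "p1")], ["c"])
```

With unit weights the cell would settle at the weighted mean of its pins; with the length-based weights it drifts toward the pad it is connected to more often, which is where the linear wirelength is smallest.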
AMG-based Linear Interpolation [A. Brandt 1986] • Within each cluster, select the cell with maximum degree as the C-point; the others are considered F-points (Figure labels: next finer level, cells, cluster, constant interpolation, AMG interpolation, C-point) 168
AMG-based Interpolation • Use the clique-model graph to define connectivity weights (connectivity matrix) • Within each cluster, select the cell with maximum degree as a C-point • Each C-point is placed at the cluster’s position. • Each F-point is placed at the weighted average of the C-points and F-points to which it is strongly connected • The F-points’ positions can be iteratively improved. 169
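The bullets above can be sketched as follows, in one dimension for brevity. Assumptions made for this illustration: the clique-model weights are already given as a nested dict, "degree" means the sum of a cell's connection weights, every weighted neighbor counts as strongly connected, and F-points start at their cluster's position before being improved. The name `amg_interpolate` is hypothetical.

```python
def amg_interpolate(clusters, cluster_pos, w, sweeps=5):
    """Place finer-level cells from cluster positions.

    clusters: dict cluster -> list of member cells
    cluster_pos: dict cluster -> position of the cluster (1-D here)
    w: dict cell -> {neighbor: connection weight}
    """
    pos, c_points = {}, set()
    for cluster, members in clusters.items():
        # C-point: the member with the largest (weighted) degree.
        c_point = max(members, key=lambda v: sum(w.get(v, {}).values()))
        c_points.add(c_point)
        for v in members:
            pos[v] = cluster_pos[cluster]   # C-point stays here; F-points start here
    for _ in range(sweeps):                 # iteratively improve the F-points
        for v in pos:
            if v in c_points:
                continue
            total = sum(w.get(v, {}).values())
            if total > 0:
                pos[v] = sum(wt * pos[u] for u, wt in w[v].items()) / total
    return pos, c_points

clusters = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
weights = {"a1": {"a2": 1.0, "b1": 2.0}, "a2": {"a1": 1.0, "b1": 1.0},
           "b1": {"a1": 2.0, "a2": 1.0, "b2": 1.0}, "b2": {"b1": 1.0}}
positions, c_points = amg_interpolate(clusters, {"A": 0.0, "B": 10.0}, weights)
```

In contrast to constant interpolation, where every cell would sit at its cluster's position, an F-point such as `a2` is pulled between the clusters it connects to.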
Iterated Multilevel Flow • Make use of the placement solution from the 1st V-cycle (Figure labels: First Choice (FC) clustering; geometric-based FC clustering) 170
Comparison on Standard Cell Designs (Caveat: this is outdated) Experiments carried out on ISPD 2004 FastPlace IBM benchmarks. 171
Placement: Summary 172
Global Placement • Objectives § HPWL (conventional) o Metric: linear vs. quadratic wirelength (net weighting) § Timing (net weighting) § Congestion (cell inflation) • Various classes of algorithms § Simulated annealing (TimberWolf, Dragon) § Partitioning (Capo) § Analytic/quadratic/force-directed § Multilevel (mPL) • Types of spreading forces § Partitioning based (Gordian) § Density based (Kraftwerk, APlace) § Cell shifting (FastPlace) 173
Detailed Placement • Objectives § Legalization § Improving wirelength, timing, congestion • Techniques § Network flows (Domino) § Restricted network flows by longest paths (Hur/Lillis-Mongrel) § Optimal regions (Kahng-ASPDAC 99; FastPlace: Pan-ICCAD 05) § Grid distortion/Delaunay triangulations (Luo-ICCAD 05) § Diffusion analogy (Ren-DAC 05) § Greedy § Dynamic programming (Kahng-ASPDAC 99; Hur/Lillis-Mongrel, FengShui) 174