Scalable Don’t-Care-Based Logic Optimization and Resynthesis
Alan Mishchenko, University of California, Berkeley
Robert Brayton, University of California, Berkeley
Jie-Hong Roland Jiang, National Taiwan University
Stephen Jang, Xilinx, Inc.
Outline
• Motivation
• Brief history of don’t-cares
• Algorithm overview
• Algorithm components
• Experimental results
• Conclusion
Motivation
[Figure: network after mapping, transformed by resynthesis of node f into an optimized network]
Applications
• tech-independent synthesis
• post-mapping delay/area optimization
• placement-aware resynthesis
Requirements
• substantial logic restructuring
• flexibility to solve many optimization tasks
• reasonable runtime for large designs
Our solution
• SAT-based re-synthesis with don’t-cares using resubstitution
Brief History of Don’t-Cares
• Previous century work (1960-2000)
  – Permissible functions (Saburo Muroga, 1989)
  – Compatible observability don’t-cares (Hamid Savoj, 1992)
• Complete rather than compatible don’t-cares (2002)
• SAT-based don’t-care computation (2005)
• Interpolation-based optimization with don’t-cares without explicitly computing don’t-cares (this talk)
Background Summary
• Assuming familiarity with
  – Networks and nodes
  – Cuts and cones
  – Don’t-cares and resubstitution
  – SAT-based interpolation
Resubstitution with Don’t-Cares
Consider all or some nodes in the Boolean network. For each node:
• Create window
• Select possible fanin nodes (divisors)
• For each candidate subset of divisors
  – Rule out some subsets using simulation
  – Check resubstitution feasibility using SAT
  – Compute resubstitution function using interpolation
    • A low-cost by-product of completed SAT proofs
• Update the network, if there is an improvement
Resubstitution considers a node in a Boolean network and expresses it using a different set of fanin nodes.
Computation can be enhanced by don’t-cares.
Windowing a Node in the Network for Don’t-Care Computation
• Definition
  – A window for a node in the network is the context in which the don’t-cares are computed
• A window includes
  – n levels of the TFI
  – m levels of the TFO
  – all re-convergent paths captured in this scope
• Window with its PIs and POs can be considered as a separate network
[Figure: Boolean network (k-LUT mapped circuit) with a window of n = 3 TFI levels and m = 3 TFO levels, bounded by window PIs and window POs]
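A minimal sketch of the level-bounded part of windowing, assuming the network is given as a dictionary from each node to its list of fanins. The names `within_levels` and `node_window` are hypothetical, and this simplified version does not add the re-convergent paths mentioned on the slide; it only collects n TFI levels and m TFO levels and reports the resulting window PIs and POs.

```python
from collections import deque

def within_levels(start, edges, depth):
    """Nodes reachable from start by following edges at most depth times."""
    level = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if level[node] == depth:
            continue  # do not expand past the level limit
        for nxt in edges.get(node, ()):
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return set(level)

def node_window(fanins, pivot, n, m):
    """Return (window nodes, window PIs, window POs) for the pivot node."""
    # Build the fanout map by reversing the fanin map.
    fanouts = {}
    for node, ins in fanins.items():
        for src in ins:
            fanouts.setdefault(src, []).append(node)
    tfo = within_levels(pivot, fanouts, m)
    inside = within_levels(pivot, fanins, n) | tfo
    # Window PIs: nodes feeding the window from outside, plus primary inputs inside it.
    pis = set()
    for v in inside:
        if not fanins[v]:
            pis.add(v)
        pis.update(u for u in fanins[v] if u not in inside)
    # Window POs: TFO nodes with no fanout or with a fanout outside the window.
    pos = {v for v in tfo
           if not fanouts.get(v) or any(w not in inside for w in fanouts[v])}
    return inside, pis, pos

# Hypothetical toy network: f is the pivot.
network = {
    "a": [], "b": [], "c": [],
    "g1": ["a", "b"], "g2": ["b", "c"],
    "f": ["g1", "g2"], "o1": ["f", "c"], "o2": ["o1", "a"],
}
inside, pis, pos = node_window(network, "f", n=1, m=1)
```

In this toy network the window around f with n = 1 and m = 1 contains f, g1, g2, and o1; side inputs of TFO nodes (here c feeding o1) become window PIs.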
Don’t-Care Representation
Miter for don’t-care computation
• If output is 1, input is a care
• If output is 0, input is a don’t-care
[Figure: miter comparing the window containing node f with the same window in which f is followed by an inverter]
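The miter idea can be illustrated by brute-force enumeration on a small window. This is only a sketch under assumptions: the window is described by two hypothetical Python callables, `pivot(x)` for the node function and `outputs(x, f)` for the window POs given the node value, and exhaustive enumeration stands in for the SAT-based computation used in the talk.

```python
from itertools import product

def care_set(num_pis, pivot, outputs):
    """Window PI assignments for which flipping the pivot changes some window PO."""
    care = set()
    for x in product((0, 1), repeat=num_pis):
        f = pivot(x)
        # Miter: compare the window with f against the same window with f inverted.
        if outputs(x, f) != outputs(x, 1 - f):
            care.add(x)
    return care

# Hypothetical window: PIs (a, b, c), pivot f = a AND b, single PO = f OR c.
pivot = lambda x: x[0] & x[1]
outputs = lambda x, f: (f | x[2],)

care = care_set(3, pivot, outputs)
dont_care = set(product((0, 1), repeat=3)) - care
```

Here every assignment with c = 1 is a don’t-care, because the PO is 1 regardless of the value of f.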
Resubstitution with Don’t-Cares
• Given:
  – node function F(x) to be replaced
  – care set C(x) for the node
  – candidate set of divisors {g_i(x)} for expressing F(x)
• Find:
  – A resubstitution function h(y) such that F(x) = h(g(x)) on the care set C(x)
• SPFD Theorem:
  – Function h exists if and only if each pair of care minterms, x1 and x2, distinguished by F(x), is also distinguished by g_i(x) for some i
[Figure: node F(x) with care set C(x), and its replacement h(g) over divisors g1, g2, g3]
Checking Resubstitution using SAT
[Figure: miter for resubstitution check (SPFD Theorem in practice)]
Comments:
• Note use of care set, C.
• Resubstitution function exists if and only if the SAT problem is unsatisfiable
• Function h(g) is computed using interpolation
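One way to write the SAT instance behind this check, consistent with the SPFD theorem on the preceding slide, uses two copies of the window inputs. This is a sketch of the formulation; the names x_1 and x_2 for the two copies are assumptions, not notation taken from the slide.

```latex
\[
C(x_1) \wedge C(x_2) \wedge F(x_1) \wedge \neg F(x_2) \wedge
\bigwedge_{i} \bigl( g_i(x_1) \equiv g_i(x_2) \bigr)
\]
```

A satisfying assignment is a pair of care minterms that F distinguishes but no divisor does, so h exists exactly when the formula is unsatisfiable.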
Experimental Setup
• Implemented in ABC (command “mfs”)
• The SAT solver is a modified version of MiniSat-1.14C, by Niklas Eén and Niklas Sörensson
• The algorithm was applied to a mapped network, attempting resubstitution for each LUT to reduce (a) area and (b) the number of fanins
• Experiments targeted networks after FPGA mapping into 6-LUTs and were run on an Intel Xeon 2-CPU machine with 8 GB of RAM
• The resulting networks were verified using the equivalence checker in ABC (command “cec”)
• Optimization scripts used
  – Baseline: result of (dc2 -l; if -C 12)^1
  – Choices: best result of (st; dch; if -C 12)^4
  – Mfs: best result of (st; dch; if -C 12; mfs -W 4)^4
Results for Academic Benchmarks

                    Profile               Baseline               Choices                 Mfs
Example             PI    PO    FF     LUT  Level   Time     LUT  Level   Time     LUT  Level    Time
alu4                14     8     0     845     5    0.46     786     5    2.23     499     5    15.53
apex2               39     3     0     987     6    0.53     922     6    5.80     674     6    33.71
apex4                9    19     0     821     5    0.41     798     5    2.10     786     5    16.41
bigkey             263   197   224     567     3    0.60     567     3    0.86     455     3     1.68
clma               383    82    33    3309    10    1.80    2910     9   16.23     701     7   122.24
des                256   245     0     880     5    0.62     872     4    2.90     638     4     7.88
diffeq              64    39   377     712     7    0.37     690     7    0.80     645     7     2.77
dsip               229   197   224     682     3    0.50     681     3    0.58     677     2     1.65
elliptic           131   114  1122    1877    10    0.85    1914    10    2.20    1813    10     4.80
ex1010              10    10     0    2934     6    1.48    2712     6   17.14    1342     6   101.13
ex5p                 8    63     0     593     5    0.37     521     5    1.58     119     3     4.57
frisc               20   116   886    1777    12    1.06    1749    12    7.43    1757    11    16.64
i10                257   224     0     595     9    0.39     554     9    1.37     545     8     9.35
misex3              14    14     0     772     5    0.43     701     5    2.19     368     5    12.11
pdc                 16    40     0    2113     7    1.35    1959     6   15.36     128     5    25.91
s38417              28   106  1636    2257     6    1.33    2271     6    7.09    2206     6    26.11
s38584              12   278  1452    2319     7    1.47    2373     7    8.41    2250     6    14.01
seq                 41    35     0     872     5    0.50     834     5    4.64     684     5    17.73
spla                16    46     0    1622     6    1.08    1417     6   11.58     161     4    19.12
tseng               52   122   385     717     7    0.30     690     7    0.63     639     7     2.35
Ratio vs. Baseline                   1.000 1.000   1.000   0.952 0.976   4.831   0.550 0.878   17.101
Ratio vs. Choices                                          1.000 1.000   1.000   0.578 0.900    3.540
Results for Industrial Benchmarks

                    Profile                 Baseline               Choices                 Mfs
Example             PI     PO     FF      LUT   Lev    Time     LUT   Lev    Time     LUT   Lev     Time
Design 01         1332   5064   5625    15453     8   10.08   14830     8   62.17   13793     7   104.91
Design 02         1559   5701  10373    28091    10   21.50   26972     9  134.89   24997     9   312.14
Design 03          993   5533   6430    15033    10    7.43   14428    10   40.69   14010    10   118.00
Design 04          974   1301    940     2841    31    2.09    2723    30    7.82    2697    30   121.33
Design 05          101    198   1177     2649     6    1.60    2554     5   10.86    2222     5    20.80
Design 06           68     85   1355     3624    19    2.53    3385    16   27.58    3192    15   102.77
Design 07         6598  11151  22382    71637    17   61.73   69747    15  475.84   63116    13  1154.14
Design 08         2126   6451   7075    20504    15   12.27   19860    14   70.61   18943    12   150.09
Design 09         2450   4798   3725     9951     4    3.13    9718     4    9.50    9374     3    21.67
Design 10         1032   1767   1124     4447    10    2.24    4299    10   15.13    4105     9    44.32
Design 11         4040   9406  35654    83113    16   71.99   81601    16  472.68   73478    14  1748.12
Design 12          115    264   2293     5413     7    3.53    5209     6   24.07    4576     6    49.35
Design 13           56     87    465     1756    12    1.19    1311     8    8.19    1162     8    27.44
Design 14           14     60    426     1448     9    0.91    1455     8    8.79    1382     7    34.77
Ratio vs. Baseline                      1.000                 0.949  0.90   6.310   0.882  0.83   18.801
Ratio vs. Choices                                             1.000                 0.930  0.92    2.979
Conclusion
• Introduced a new SAT-based logic optimization engine
  – uses rugged windowing scheme without previous limitations
  – uses SAT solver for all aspects of functional manipulation
  – designed for scalability and applicable to large industrial circuits
• Showed promising experimental results
  – academic benchmarks (10-40% in area, 10% in delay)
  – industrial benchmarks (7% in area, 8% in delay)
  – improvements can be made even on top of strong synthesis
• Future work
  – improving runtime by fine-tuning simulation and SAT
  – experimenting with timing-driven and power-aware resynthesis
  – extending don’t-care computation to work with white-boxes
  – global circuit restructuring using interpolation
The End
Algorithm Overview
nodeSatBasedResynthesis( node, parameters )
{
    window = nodeWindow( node, parameters );
    divisors = nodeDivisors( node, window, parameters );
    cands = nodeResubCandsFilter( node, window, parameters );
    best_cand = NULL;
    for each candidate set c in cands {
        if ( best_cand != NULL && resubCost(best_cand) < resubCost(c) )
            continue;
        if ( !resubFeasible( node, window, c ) )
            continue;
        best_cand = c;
    }
    if ( best_cand != NULL ) {
        best_func = nodeInterpolate( sat_solver, node );
        nodeUpdate( node, best_cand, best_func );
    }
}
Divisor Selection
• A divisor is a candidate fanin of the pivot node after resubstitution
• Divisor computation:
  – Partition window PIs into (a) those in the TFI of the pivot node and (b) the remaining window PIs
  – Add nodes between the pivot and window PIs of type (a), excluding the pivot node and its MFFC
  – Add nodes in the window if their structural support has no window PIs of type (b)
  – Do not collect divisors whose level exceeds a limit
  – Do not collect more than a given number of divisors
[Figure: window around the pivot node (k = 3, m = 3) showing window POs and window PIs of type (a) and type (b)]
Resubstitution
• Resubstitution of F(x) with care set C(x) and candidate functions {g_i(x)} exists iff every pair of care minterms, x1 and x2, distinguished by F(x), is also distinguished by g_i(x) for some i
  – That is, if the information of F(x) does not exceed that of {g_i(x)}
Example:
  Given F = (a ⊕ b)(b ∨ c), C = 1
  Two candidate sets: {y1 = a’b, y2 = ab’c}, {y3 = a ∨ b, y4 = bc}
  Set {y1, y2} is feasible
  Set {y3, y4} is infeasible
  Counter-example: x1 = 100, x2 = 101

  abc   F   y1  y2  y3  y4
  000   0   0   0   0   0
  001   0   0   0   0   0
  010   1   1   0   1   0
  011   1   1   0   1   1
  100   0   0   0   1   0
  101   1   0   1   1   0
  110   0   0   0   1   0
  111   0   0   0   1   1
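The feasibility condition can be checked directly on the slide’s example by enumerating pairs of care minterms. This is a sketch that assumes small functions given as Python callables; `resub_feasible` is a hypothetical name, the Boolean operators in F and y3 are reconstructed from the truth table, and exhaustive enumeration stands in for the SAT check used in the talk.

```python
from itertools import combinations, product

def resub_feasible(F, C, divisors, num_vars):
    """Return (True, None) if every care pair split by F is split by some divisor,
    otherwise (False, offending pair)."""
    care = [x for x in product((0, 1), repeat=num_vars) if C(x)]
    for x1, x2 in combinations(care, 2):
        if F(x1) != F(x2) and all(g(x1) == g(x2) for g in divisors):
            return False, (x1, x2)
    return True, None

# Example from the slide, with x = (a, b, c) and C = 1.
F  = lambda x: (x[0] ^ x[1]) & (x[1] | x[2])      # (a XOR b)(b OR c)
C  = lambda x: 1
y1 = lambda x: (1 - x[0]) & x[1]                  # a'b
y2 = lambda x: x[0] & (1 - x[1]) & x[2]           # ab'c
y3 = lambda x: x[0] | x[1]                        # a OR b
y4 = lambda x: x[1] & x[2]                        # bc

ok_12, _ = resub_feasible(F, C, [y1, y2], 3)
ok_34, witness = resub_feasible(F, C, [y3, y4], 3)
```

The function returns the first offending pair it encounters, which need not be the pair shown on the slide; the slide’s pair (100, 101) fails the same condition.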
Computing Dependency Function
• Definition of the interpolant:
  – Consider A(x, y) and B(y, z), such that A(x, y) ∧ B(y, z) = 0, where x and z appear only in the clauses of A and of B, respectively, and y are the variables common to A and B.
  – An interpolant of function A(x, y) w.r.t. function B(y, z) is a Boolean function, I(y), depending only on the common variables y, such that A(x, y) ⇒ I(y) and I(y) ⇒ ¬B(y, z).
• Problem:
  – Find function h(g), such that h(g(x)) can replace f(x) on care set C(x), that is, C(x) ⇒ [h(g(x)) ≡ f(x)]. The dependency function h(g) expresses the node, f(x), in terms of {g_i}.
• Solution:
  – Prove the corresponding SAT problem “unsatisfiable”
  – Derive unsatisfiability proof [Goldberg/Novikov, DATE ’03]
  – Derive interpolant from the unsatisfiability proof using McMillan’s procedure [CAV ’03] (assume A and B as shown on previous slide)
  – Use interpolant as the dependency function, h(g)
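For intuition about what h(g) looks like, the sketch below derives it by truth-table enumeration rather than by interpolation. This is a deliberately swapped-in technique that only works for tiny examples: each care minterm is mapped to its divisor image, and h is defined on the images that occur. The name `dependency_function` and the example functions are illustrative assumptions.

```python
from itertools import product

def dependency_function(f, C, divisors, num_vars):
    """Partial truth table of h over divisor images, or None if no h exists."""
    table = {}
    for x in product((0, 1), repeat=num_vars):
        if not C(x):
            continue  # minterms outside the care set impose no constraint
        y = tuple(g(x) for g in divisors)
        # Two care minterms with the same image must agree on f.
        if table.setdefault(y, f(x)) != f(x):
            return None
    return table

# Hypothetical example with x = (a, b, c) and C = 1.
f  = lambda x: (x[0] ^ x[1]) & (x[1] | x[2])
C  = lambda x: 1
g1 = lambda x: (1 - x[0]) & x[1]
g2 = lambda x: x[0] & (1 - x[1]) & x[2]
g3 = lambda x: x[0] | x[1]
g4 = lambda x: x[1] & x[2]

h = dependency_function(f, C, [g1, g2], 3)
no_h = dependency_function(f, C, [g3, g4], 3)
```

Divisor images that never occur on the care set, such as (1, 1) here, are left out of the table and act as don’t-cares of h; in this example h can be implemented as g1 OR g2.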
Resynthesis Heuristics
• Resynthesis is attempted for each node
• Window, divisors, and resubstitution candidates are computed
• Heuristics for different minimization criteria:
  – Area
    • Try replacing each fanin whose reference counter is 1
  – Fanin count
    • Try replacing each fanin
  – Delay
    • Try replacing each fanin that is on the critical path
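The three criteria can be read as a filter on which fanins of a node to try replacing. The sketch below is illustrative only: the function name, the mode strings, and the `ref_count` and `critical` inputs are assumptions, not part of ABC.

```python
def fanins_to_replace(fanins, mode, ref_count=None, critical=None):
    """Select the fanins of a node that the given heuristic would try to replace."""
    if mode == "area":
        # A fanin referenced only by this node may become removable once replaced.
        return [f for f in fanins if ref_count[f] == 1]
    if mode == "fanin":
        return list(fanins)
    if mode == "delay":
        return [f for f in fanins if f in critical]
    raise ValueError("unknown mode: " + mode)

# Hypothetical node with three fanins.
fanins = ["p", "q", "r"]
ref_count = {"p": 1, "q": 3, "r": 1}
critical = {"q"}

area_targets = fanins_to_replace(fanins, "area", ref_count=ref_count)
fanin_targets = fanins_to_replace(fanins, "fanin")
delay_targets = fanins_to_replace(fanins, "delay", critical=critical)
```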
Previous Work
• Optimization and mapping with internal flexibilities
  – S. Muroga, Y. Kambayashi, H. C. Lai, and J. N. Culliney, “The transduction method-design of logic networks based on permissible functions”, IEEE Trans. Comp., Vol. 38(10), pp. 1404-1424, Oct. 1989.
  – H. Savoj, “Don’t cares in multi-level network optimization”, Ph.D. Dissertation, UC Berkeley, May 1992.
  – V. N. Kravets and P. Kudva, “Implicit enumeration of structural changes in circuit optimization”, Proc. DAC ’04, pp. 438-441.
  – A. Mishchenko and R. Brayton, “SAT-based complete don’t-care computation for network optimization”, Proc. DATE ’05, pp. 418-423.
  – K. McMillan, “Don’t-care computation using k-clause approximation”, Proc. IWLS ’05, pp. 153-160.
• Equivalence under don’t-cares
  – Q. Zhu, N. Kitchen, A. Kuehlmann, and A. L. Sangiovanni-Vincentelli, “SAT sweeping with local observability don’t-cares”, Proc. DAC ’06, pp. 229-234.
  – S. Plaza, K.-H. Chang, I. L. Markov, and V. Bertacco, “Node mergers in the presence of don’t cares”, Proc. ASP-DAC ’07, pp. 414-419.
• Maximal reduction resynthesis without don’t-cares
  – K.-C. Chen and J. Cong, “Maximal reduction of lookup-table-based FPGAs”, Proc. DATE ’92, pp. 224-229.
• Computing dependency functions using interpolation
  – C.-C. Lee, J.-H. R. Jiang, C.-Y. Huang, and A. Mishchenko, “Scalable exploration of functional dependency by interpolation and incremental SAT solving”, Proc. IWLS ’07.
Experimental Results
• Implementation of SAT-based resynthesis
  – ABC: logic synthesis and verification system developed at UC Berkeley
  – SAT solver used is MiniSat-C_v1.14.1 by Niklas Eén and Niklas Sörensson
• Outline of experiments
  – Perform technology-independent synthesis: resyn; if
  – Perform high-quality FPGA mapping: if
  – Perform resynthesis
    • without choices: imfs -W 66; imfs -a -W 66; imfs -W 66
    • with choices (script is more complicated)
  – Measure gain in area, delay, net count
• Commands used in the scripts
  – if is a new efficient FPGA mapper based on priority cuts
  – imfs is the new logic optimization and resynthesis engine described in the present paper
  – resyn is a fast logic synthesis script that performs 5 iterations of AIG rewriting
  – choice is a logic synthesis script that performs 15 passes of AIG rewriting and collects three snapshots of the current network: the original, the final, and an intermediate AIG saved after the first 5 rewriting passes
• Computer used
  – ?
  – Runtime is several minutes for the largest designs in the tables
Academic Benchmarks
Academic Benchmarks (PLAs)