Enumeration of Irredundant Circuit Structures
Alan Mishchenko
Department of EECS, UC Berkeley

Overview
• Logic synthesis is an important and challenging task
• Boolean decomposition is a way to do logic synthesis
• Drawbacks
  - Incomplete algorithms: suboptimal results
  - Computationally expensive algorithms: high runtime
• Our goal is to overcome these drawbacks
  - Several algorithms, many heuristics
  - Perform exhaustive enumeration offline
  - Use pre-computed results online, to get good quality of results (QoR) and low runtime
• Practical discoveries
  - The number of unique functions up to 16 inputs is not too high
  - The number of unique decompositions of a function is not too high

Background
• And-Inverter Graphs
• Structural cuts and mapping
• Small practical functions (SPFs)
• Boolean decomposition
  - Disjoint-support decomposition
  - Non-disjoint-support decomposition
• NPN classification
• Boolean matching
• LUT mapping and LUT structure mapping

AIG Definition and Examples
AIG is a Boolean network composed of two-input ANDs and inverters.
[Figure: Karnaugh map of F over ab and cd, with two AIG structures for the same function]
F(a, b, c, d) = ab + d(ac’+bc)
  6 nodes, 4 levels
F(a, b, c, d) = ac’(b’d’)’ + bc(a’d’)’ = ac’(b+d) + bc(a+d)
  7 nodes, 3 levels
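The two expressions given for F can be checked for equivalence by exhaustive simulation over all 16 input assignments. The following is a minimal Python sketch; the names f1, f2, and equivalent are illustrative and do not come from the slides.

```python
from itertools import product

def f1(a, b, c, d):
    # F = ab + d(ac' + bc)
    return (a & b) | (d & ((a & (1 - c)) | (b & c)))

def f2(a, b, c, d):
    # F = ac'(b + d) + bc(a + d)
    return (a & (1 - c) & (b | d)) | (b & c & (a | d))

def equivalent(f, g, n):
    """Compare two n-input functions on every 0/1 input assignment."""
    return all(f(*x) == g(*x) for x in product((0, 1), repeat=n))

print(equivalent(f1, f2, 4))
```

Both forms expand to ab + ac’d + bcd, so the AIGs differ only in structure (node count and depth), not in function.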

Mapping in a Nutshell
• AIGs represent logic functions
  - A good subject graph for mapping
• Technology mapping expresses logic functions to be implemented
  - Uses a description of a technology
• Technology
  - Primitives with delay, area, etc
• Structural mapping
  - Computes a cover of AIG using primitives of the technology
• Cut-based structural mapping
  - Computes cuts for each AIG node
  - Associates each cut with a primitive
  - Selects a cover with a minimum cost
• Structural bias
  - Good mapping cannot be found because of the poor AIG structure
• Overcoming structural bias
  - Need to map over a number of AIG structures (leads to choice nodes)
[Figure: an AIG with a choice node and the mapped network of LUTs, from primary inputs a, b, c, d, e to primary outputs f]
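The step "computes cuts for each AIG node" can be sketched as a bottom-up merge of fanin cuts. This is a simplified illustration in Python, not the mapper's implementation: the AIG is assumed to be a dictionary listed in topological order, inverters are ignored because they do not affect cuts, and no priority-cut limit is applied. The function name enumerate_cuts and the node names are hypothetical.

```python
from itertools import product

def enumerate_cuts(aig, inputs, k):
    """Return all k-feasible cuts of every node.

    aig    : dict mapping an AND node to its two fanins, in topological order
    inputs : iterable of primary input names
    k      : maximum number of leaves in a cut
    """
    cuts = {pi: {frozenset([pi])} for pi in inputs}
    for node, (fanin0, fanin1) in aig.items():
        merged = {frozenset([node])}  # the trivial cut of the node itself
        for c0, c1 in product(cuts[fanin0], cuts[fanin1]):
            leaves = c0 | c1
            if len(leaves) <= k:
                merged.add(leaves)
        # Drop dominated cuts: those that strictly contain another cut.
        cuts[node] = {c for c in merged if not any(o < c for o in merged)}
    return cuts

# Hypothetical AIG for (a AND b) AND (c AND d).
aig = {"n1": ("a", "b"), "n2": ("c", "d"), "n3": ("n1", "n2")}
cuts = enumerate_cuts(aig, "abcd", k=4)
for cut in sorted(cuts["n3"], key=lambda c: (len(c), sorted(c))):
    print(sorted(cut))
```

Each cut is a candidate for matching against one primitive (for example, one LUT whose inputs are the cut leaves); a mapper then selects a subset of cuts that covers the AIG at minimum cost.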

Small Practical Functions
• Classifications of Boolean functions
  - Random functions
  - Special function classes (symmetric, unate, etc)
• Logic synthesis and technology mapping deal with
  - Functions appearing in the designs
  - Functions with small support (up to 16 variables)
• These functions are called small practical functions (SPFs)
• We will concentrate on SPFs and study their properties
• In particular, we will ask
  - How many different SPFs exist?
  - How many different irredundant logic structures do they have?

Ashenhurst-Curtis Decomposition
Z(X) = H(G(B), F), X = B ∪ F
B: bound set; F: free set
[Figure: Z computed from X directly, and as H fed by G(B) and the free set F]
If B ∩ F = ∅, this is disjoint-support decomposition (DSD)
If B ∩ F ≠ ∅, this is non-disjoint-support decomposition
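For the disjoint case with a single-output G, the classical Ashenhurst test counts the distinct columns of the decomposition chart: the decomposition Z = H(G(B), F) exists if and only if the bound set B produces at most two distinct columns. The sketch below illustrates that test in Python on functions given as callables; the helper name column_multiplicity is an assumption made for this example.

```python
from itertools import product

def column_multiplicity(f, variables, bound):
    """Count distinct columns of the decomposition chart of f.

    Each assignment of the bound variables selects one column; the column
    is the resulting function of the free variables, stored as a tuple.
    """
    free = [v for v in variables if v not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        env = dict(zip(bound, b_vals))
        column = []
        for f_vals in product((0, 1), repeat=len(free)):
            env.update(zip(free, f_vals))
            column.append(f(**env))
        columns.add(tuple(column))
    return len(columns)

variables = ["a", "b", "c", "d"]

def g1(a, b, c, d):
    # F = ab + cd
    return (a & b) | (c & d)

def g2(a, b, c, d):
    # F = ab + d(ac' + bc)
    return (a & b) | (d & ((a & (1 - c)) | (b & c)))

print(column_multiplicity(g1, variables, ["a", "b"]))
print(column_multiplicity(g2, variables, ["a", "b"]))
```

For F = ab + cd the bound set {a, b} gives two columns (the constant 1 and cd), so G = ab can be extracted. For F = ab + d(ac’+bc) the same bound set gives four columns, so no single-output disjoint decomposition exists on {a, b}.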

Example of Deriving DSD
Bound Set = {a, b}, Free Set = {c, d}
[Figure: Karnaugh map of F with columns 00, 01, 11, 10, and the incompatibility graph of its columns]
F(a, b, c, d) = (a b + ab)c + (a b+ ab )(cd+c d )
G(a, b)= a b +ab
H(G, c, d) = Gc + G (cd+c d )

DSD Structure
• DSD structure is a tree of nodes derived by applying DSD recursively until remaining nodes are not decomposable
• DSD is full if the resulting tree consists of only simple gates (AND/XOR/MUX)
• DSD is partial if the resulting tree has non-decomposable nodes (called prime nodes)
• DSD does not exist if the tree is composed of one node
[Figure: examples of full DSD, partial DSD, and no DSD over inputs a, b, c, d, e, f]
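The three cases can be expressed as a small classifier over a DSD tree. This is only an illustrative Python sketch: the Node class, the kind labels, and the reading of "one node" as a single prime node over the inputs are assumptions made for the example, and inverters are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "VAR", "AND", "XOR", "MUX", or "PRIME"
    fanins: list = field(default_factory=list)

def classify(root):
    """Return "full", "partial", or "none" for a DSD tree."""
    def kinds(node):
        yield node.kind
        for fanin in node.fanins:
            yield from kinds(fanin)

    gates = [k for k in kinds(root) if k != "VAR"]
    if gates == ["PRIME"]:
        return "none"      # the whole function is one non-decomposable node
    if "PRIME" in gates:
        return "partial"   # some simple gates, at least one prime node
    return "full"          # only AND/XOR/MUX gates

a, b, c, d = (Node("VAR") for _ in range(4))
print(classify(Node("AND", [Node("XOR", [a, b]), c])))
print(classify(Node("AND", [Node("PRIME", [a, b, c]), d])))
print(classify(Node("PRIME", [a, b, c])))
```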

Computing DSD
• The input is a Boolean function
• The output is a DSD structure
  - The structure is unique up to several normalizations, for example
    · Placement of inverters
    · Factoring of multi-input AND/XOR gates
    · Ordering of fanins of AND/XOR gates
    · Ordering of data inputs of MUXes
    · NPN representative of prime nodes
• This computation is fast and reliable
  - Originally implemented with BDDs (Bertacco et al)
  - In a limited form, re-implemented with truth tables
    · Detects about 95% of DSDs of cut functions
• In 8-LUT mapping, it takes roughly the same time
  - to compute structural cuts
  - to derive their truth tables
  - to compute DSDs of the truth tables
[Figure: DSD structure of F(a, b, c, d) = ab + cd over inputs a, b, c, d]

Pre-computing Non-Disjoint-Support Decompositions
• Enumerate bound sets while increasing size
• Enumerate shared sets while increasing size
• If the bound+shared set is irredundant
  - Add it to the computed set
• Bound+shared set is redundant
  - If a variable can be removed and the resulting set is still decomposable
  - Ex: (abCD) is redundant if (abcD) or (abD) is valid
[Figure: decompositions H over G for the sets (abCD), (abcD), and (abD), with remaining inputs e and c]
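The enumeration can be sketched in Python as follows. The sketch rests on several assumptions that are not stated on the slide: decomposability is tested by requiring at most two distinct columns in every cofactor with respect to the shared variables; only support-reducing sets are kept (at least two bound-only variables and at least one free variable); and redundancy is checked only by demoting or dropping one shared variable, as in the (abCD) example. All function names are hypothetical.

```python
from itertools import combinations, product

def decomposable(f, variables, bound, shared):
    """Check F = H(G(bound, shared), shared, free) with single-output G."""
    free = [v for v in variables if v not in bound and v not in shared]
    for s_vals in product((0, 1), repeat=len(shared)):
        columns = set()
        for b_vals in product((0, 1), repeat=len(bound)):
            env = dict(zip(shared, s_vals))
            env.update(zip(bound, b_vals))
            column = []
            for f_vals in product((0, 1), repeat=len(free)):
                env.update(zip(free, f_vals))
                column.append(f(**env))
            columns.add(tuple(column))
        if len(columns) > 2:
            return False
    return True

def irredundant_sets(f, variables, max_shared=1):
    """List (bound_only, shared) pairs that are decomposable and irredundant."""
    found = []
    for size in range(2, len(variables)):            # bound+shared size, increasing
        for n_shared in range(max_shared + 1):       # shared size, increasing
            for group in combinations(variables, size):
                for shared in combinations(group, n_shared):
                    bound = tuple(v for v in group if v not in shared)
                    if len(bound) < 2:               # would not reduce support
                        continue
                    if not decomposable(f, variables, bound, shared):
                        continue
                    redundant = False
                    for s in shared:
                        fewer = tuple(v for v in shared if v != s)
                        if (decomposable(f, variables, bound + (s,), fewer)
                                or decomposable(f, variables, bound, fewer)):
                            redundant = True
                    if not redundant:
                        found.append((bound, shared))
    return found

def mux41(a, b, c, d, e, f):
    # 4:1 MUX with select inputs a, b and data inputs c, d, e, f
    return (c, d, e, f)[2 * b + a]

sets = irredundant_sets(mux41, ("a", "b", "c", "d", "e", "f"))
for bound, shared in sets:
    print("bound-only:", "".join(bound), "shared:", "".join(shared))
```

For the 4:1 MUX every reported set has a shared variable, which agrees with the function having no disjoint-support decomposition. Raising max_shared to 2 extends the search to sets with two shared variables.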

Example of Non-DS Decomposition: Mapping 4:1 MUX into two 4-LUTs
The complete set of support-reducing bound-sets for Boolean function of 4:1 MUX:
Set  0 : S = 1  D = 3  C = 5  x=Acd    y=xAbef
Set  1 : S = 1  D = 3  C = 5  x=Bce    y=xaBdf
Set  2 : S = 1  D = 3  C = 5  x=Ade    y=xAbcf
Set  3 : S = 1  D = 3  C = 5  x=Bde    y=xaBcf
Set  4 : S = 1  D = 3  C = 5  x=Acf    y=xAbde
Set  5 : S = 1  D = 3  C = 5  x=Bcf    y=xaBde
Set  6 : S = 1  D = 3  C = 5  x=Bdf    y=xaBce
Set  7 : S = 1  D = 3  C = 5  x=Aef    y=xAbcd
Set  8 : S = 1  D = 4  C = 4  x=aBcd   y=xBef
Set  9 : S = 1  D = 4  C = 4  x=Abce   y=xAdf
Set 10 : S = 1  D = 4  C = 4  x=Abdf   y=xAce
Set 11 : S = 1  D = 4  C = 4  x=aBef   y=xBcd
Set 12 : S = 2  D = 5  C = 4  x=ABcde  y=xABf
Set 13 : S = 2  D = 5  C = 4  x=ABcdf  y=xABe
Set 14 : S = 2  D = 5  C = 4  x=ABcef  y=xABd
Set 15 : S = 2  D = 5  C = 4  x=ABdef  y=xABc

Application to LUT Structure Mapping: Matching 6-input function with LUT structure “44”
[Figure: Case 1, Case 2, and Case 3, each showing the function decomposed into LUTs G and H over inputs a, b, c, d, e, f]

Application to Standard Cell Mapping
• Enumerate decomposable bound sets
• Enumerate decomposition structures for each bound set
• Use them as choice nodes
• Use choice nodes to improve quality of Boolean matching
Property: When non-disjoint-support decomposition is applied, there are exactly M = 2^((2^k)-1) pairs of different NPN classes of decomposition/composition functions, G and H, where k is the number of shared variables
[Figure: F decomposed into G and H]
k   M
0   1
1   2
2   8
3   128
4   32768
5   2147483648
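The table of M values follows directly from the stated formula. A short Python check (the function name num_pairs is illustrative):

```python
def num_pairs(k):
    """M = 2^((2^k) - 1) for k shared variables."""
    return 2 ** ((2 ** k) - 1)

for k in range(6):
    print(k, num_pairs(k))
```

The count grows doubly exponentially in k, which suggests why the number of shared variables has to stay small for the enumeration to remain practical.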

Example of a Typical SPF
abc 01> rt 000A115F
abc 02> pk
Truth table: 000a115f
[Figure: Karnaugh map of the function printed by pk]

NOTATIONS:
!a is complementation NOT(a)
(ab) is AND(a, b)
[ab] is XOR(a, b)
<abc> is MUX(a, b, c) = ab + !ac
<truth_table>{abc} is PRIME node

abc 01> rt 000A115F
abc 02> print_dsd -d
F = 0505003F(a, b, c, d, e)
This 5-variable function has 10 decomposable variable sets:
Set 0 : S = 1 D = 3 C = 4 x=abC y=xCde
    0 : <cba> 011D{decf}
    1 : <c!ba> 110D{decf}
Set 1 : S = 1 D = 3 C = 4 x=bCd y=xaCe
    0 : !(!d!(cb)) <e(!c!a)!f>
    1 : 1C{bdc} 3407{aecf}
Set 2 : S = 1 D = 3 C = 4 x=abE y=xcdE
    0 : <eab> 0153{cdef}
    1 : <e!ab> 5103{cdef}
Set 3 : S = 1 D = 3 C = 4 x=acE y=xbdE
    0 : !(!c!(ea)) 01F3{bdef}
    1 : 1C{ace} F103{bdef}
Set 4 : S = 1 D = 3 C = 4 x=bcE y=xadE
    0 : (c!(!e!b)) (!f<e!a!d>)
    1 : 38{bce} 5003{adef}
Set 5 : S = 1 D = 3 C = 4 x=bCe y=xaCd
    0 : !(!e!(cb)) <f(!c!a)!d>
    1 : 1C{bec} 3503{adcf}
Set 6 : S = 1 D = 3 C = 4 x=adE y=xbcE
    0 : <ead> (!f!(c!(!e!b)))
    1 : <e!ad> 3007{bcef}
Set 7 : S = 1 D = 4 C = 3 x=abcE y=xdE
    0 : FAC0{abce} (!f!(!ed))
    1 : 05C0{abce} C1{def}
Set 8 : S = 1 D = 4 C = 3 x=aCde y=xbC
    0 : <e!(!c!a)d> (!f!(cb))
    1 : 03AC{adec} 43{bcf}
Set 9 : S = 1 D = 4 C = 3 x=bcdE y=xaE
    0 : CCF8{bcde} (!f!(ea))
    1 : 33F8{bcde} 43{aef}

Statistics of DSD Manager
abc 01> pub12_16.dsd; dsd_ps
Total number of objects     = 3567880
Externally used objects     = 3060774
Non-DSD objects (max =12)   = 479945
Non-DSD structures          = 3220044
Prime objects               = 1405170
Memory used for objects     = 100.04 MB.
Memory used for functions   = 238.01 MB.
Memory used for hash table  = 40.83 MB.
Memory used for bound sets  = 79.98 MB.
Memory used for array       = 27.22 MB.

Support size   All
 0                   1
 1                   1
 2                   2
 3                  10
 4                 229
 5                3823
 6               22273
 7               77959
 8              200088
 9              396307
10              661620
11              972333
12             1233234
Total          3567880

abc 01> time
elapse: 3.00 seconds, total: 3.00 seconds

This DSD manager was created using cut enumeration applied to *all* MCNC, ISCAS, and ITC benchmarks circuits (the total of about 835K AIG nodes).
This involved computing 16 priority 12-input cuts at each node.
Binary file “pub12_16.dsd” has size 177 MB. Gzipped archive has size 42 MB. Reading it into ABC takes 3 sec.
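The per-size counts on this slide can be cross-checked against the reported total. The Python sketch below assumes that the "All" column lists object counts for support sizes 0 through 12 in order, followed by the total; that reading is an assumption about the flattened slide text.

```python
# Object counts by support size 0..12, as listed in the "All" column.
counts = [1, 1, 2, 10, 229, 3823, 22273, 77959,
          200088, 396307, 661620, 972333, 1233234]
reported_total = 3567880

total = sum(counts)
print(total == reported_total)

# Share of objects at each support size, in percent.
for size, count in enumerate(counts):
    print(f"{size:2d} {count:8d} {100.0 * count / total:6.2f} %")
```

Under that reading the counts add up exactly to the reported total, and 12-input objects are the largest single group.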

Typical DSD Structures
NOTATIONS:
!a is complementation NOT(a)
(ab) is AND(a, b)
[ab] is XOR(a, b)
<abc> is MUX(a, b, c) = ab + !ac
<truth_table>{abc} is a PRIME node with hexadecimal <truth_table>
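The notation is compact enough to evaluate with a short recursive-descent parser. The Python sketch below handles negation, AND, XOR, and MUX only; prime nodes are left out because the bit ordering of their truth tables is not specified here. The function name eval_dsd is hypothetical.

```python
def eval_dsd(expr, env):
    """Evaluate a DSD string such as "!(!d!(cb))" under a 0/1 assignment.

    Supports !x, (xy...), [xy...], and <sxy>; variables are single
    lower-case letters looked up in env.
    """
    expr = expr.replace(" ", "")
    pos = 0
    closers = {"(": ")", "[": "]", "<": ">"}

    def parse():
        nonlocal pos
        ch = expr[pos]
        if ch == "!":
            pos += 1
            return 1 - parse()
        if ch in closers:
            pos += 1
            args = []
            while expr[pos] != closers[ch]:
                args.append(parse())
            pos += 1  # skip the closing bracket
            if ch == "(":
                return int(all(args))      # AND of all fanins
            if ch == "[":
                return sum(args) % 2       # XOR of all fanins
            if len(args) != 3:
                raise ValueError("MUX needs exactly three fanins")
            select, then_val, else_val = args
            return then_val if select else else_val   # ab + !ac
        if ch.isalpha() and ch.islower():
            pos += 1
            return env[ch]
        raise ValueError(f"unexpected character {ch!r} at position {pos}")

    value = parse()
    if pos != len(expr):
        raise ValueError("trailing characters in expression")
    return value

print(eval_dsd("<abc>", {"a": 1, "b": 1, "c": 0}))
print(eval_dsd("!(!d!(cb))", {"b": 1, "c": 1, "d": 0}))
```

The second expression is d OR (c AND b) written with AND and inverters only, which mirrors how an AIG represents OR.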

Support-Reducing Decompositions
For each support size (S) of NPN classes of non-DSD-decomposable functions
- the columns are ranges of counts of irredundant decompositions
- the entries are percentages of functions in each range
- the last two columns are the maximum and average decomposition counts

LUT Structure Mapping
LUT: LUT count
Level: LUT level count
Time, s: Runtime, in seconds
The last two columns:
- with online DSD computations
- with offline DSD computations (based on pre-computed data)

LUT Level Minimization
6-LUT mapping: Standard mapping into 6-LUTs
LUTB: DSD-based LUT balancing proposed in this work
SOPB+LUTB: SOP balancing followed by LUT balancing (ICCAD’11)
LMS+LUTB: Lazy Man’s Logic Synthesis followed by LUT balancing (ICCAD’12)

Conclusions
• Introduced Boolean decomposition
• Proposed exhaustive enumeration of decomposable sets
• Discussed applications to Boolean matching
• Experimented with benchmarks to find a 3x speedup in LUT structure mapping
• Future work will focus on
  - Improving implementation
  - Extending to standard cells
  - Use in technology-independent synthesis

Abstract
A new approach to Boolean decomposition and matching is proposed. It uses enumeration of all support-reducing decompositions of Boolean functions up to 16 inputs. The approach is implemented in a new framework that compactly stores multiple circuit structures. The method makes use of pre-computations performed offline, before the framework is started by the calling application. As a result, the runtime of the online computations is substantially reduced. For example, the runtime of matching Boolean functions against an interconnected LUT structure during technology mapping is reduced to the extent that it no longer dominates the runtime of the mapper. Experimental results indicate that this work has promising applications in CAD tools for both FPGAs and standard cells.