Reconfigurable Computing
Dr. Christophe Bobda
CSCE Department, University of Arkansas
© C. Bobda, C. R. K. Prasad

Chapter 3: FPGA Synthesis

Agenda
1. Brief tour of logic synthesis
2. LUT-based technology mapping
   1. The Chortle algorithm
   2. The FlowMap approach

1. Goal
• A structured system is built from a set of combinatorial parts separated by memory elements.
• The goal of logic synthesis is to provide an implementation of a structured system for a given platform or a given target library.
• FPGA goal: generation of configuration data.
• The implementation must be optimized according to factors such as area, delay, power consumption, and testability.
[Figure: a structured digital system]

1. Two-level logic
• There are two approaches to logic synthesis; the first is two-level logic synthesis.
• Two-level logic synthesis targets designs represented in two-level logic, that is, as a sum of product terms. Counting levels from the output, the sum is implemented on the first level and the products on the second level.
• Advantages:
  • Natural representation of Boolean functions
  • Well understood and easy to manipulate
• Drawbacks:
  • Not representative of the logic complexity, and therefore a bad estimator of complexity during logic optimization
• Initially developed for PALs and PLAs.
[Figure: two-level logic]

1. Multi-level logic
• Multi-level logic synthesis targets multi-level designs: many Boolean functions lie on the path from the inputs to the outputs.
• Advantages:
  • Smaller
  • Faster
  • Consumes less power in most cases
  • Representative of the logic complexity
• Drawbacks:
  • Difficult to manipulate
  • Few manipulation algorithms exist
• Appropriate for mask-programmable or field-programmable devices.
• Multi-level logic will therefore be considered in this course.
[Figure: multi-level logic]

1. Boolean networks
• Multi-level logic is usually represented using Boolean networks.
• A Boolean network is a directed acyclic graph (DAG) in which:
  • A node represents an arbitrary Boolean function
  • An edge represents the (data) dependency between nodes
• A viable node representation is required for manipulation. Important factors are:
  • Memory efficiency
  • Correlation with the final representation

1. Node representation
• The choices usually made for node representation are:
  • Sum-of-products (SOP)
  • Factored form (FF)
  • Binary decision diagram (BDD)
• Sum-of-products: a sum of product terms.
• Factored form (FF): defined recursively as follows:
  • (FF = product) or (FF = sum)
  • (product = literal) or (product = FF1 * FF2)
  • (sum = literal) or (sum = FF1 + FF2)
• Example: […] is a product of the factored forms […], which in turn is a sum of the factored forms […].

1. Node representation
• Binary decision diagram (BDD): a BDD is a rooted DAG used to represent a Boolean function. Two kinds of nodes exist:
  • Variable nodes: a variable node $v$ is a non-terminal node with the following attributes:
    • An index $index(v) = i$ ($i$ defines a variable $x_i$)
    • Two children $low(v)$ and $high(v)$
  • Constant nodes: a constant node $v$ is a terminal node with a value $value(v) \in \{0, 1\}$
• OBDD: a BDD is ordered if an ordering relation exists between its nodes.
• Example: ordering the nodes from the root to the terminals:
  • For each non-terminal $v$, if $low(v)$ is non-terminal, then $index(v) < index(low(v))$
  • Similarly, if $high(v)$ is non-terminal, then $index(v) < index(high(v))$

1. Node representation
• Correspondence between a BDD with root $v$ and a Boolean function $f_v$:
  • The root represents the Boolean function $f_v$.
  • If $v$ is terminal, then $f_v = value(v)$.
  • If $v$ is a non-terminal node with index $i$, the Shannon expansion theorem is used: $f_v = \bar{x}_i \cdot f_{low(v)} + x_i \cdot f_{high(v)}$.
• The value of $f_v$ for a given assignment is obtained by traversing the graph from the root to a terminal according to the assignment values.
[Figure: optimal BDD representation of a function]
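The traversal rule above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the `Node` class, the `evaluate` helper, and the example function f = x1*x2 + x3 are all assumptions chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A BDD node (hypothetical layout): variable nodes carry index/low/high,
    constant nodes carry only a value."""
    index: Optional[int] = None    # variable index i of a non-terminal node
    low: Optional["Node"] = None   # child followed when x_i = 0
    high: Optional["Node"] = None  # child followed when x_i = 1
    value: Optional[int] = None    # 0 or 1 for a terminal node


ZERO = Node(value=0)
ONE = Node(value=1)


def evaluate(v, assignment):
    """Walk from the root to a terminal, choosing low/high per the assignment."""
    while v.value is None:
        v = v.high if assignment[v.index] else v.low
    return v.value


# Assumed example: f = x1*x2 + x3 with the ordering x1 < x2 < x3.
n3 = Node(index=3, low=ZERO, high=ONE)
n2 = Node(index=2, low=n3, high=ONE)
root = Node(index=1, low=n3, high=n2)
```

Each step of the loop applies one Shannon expansion: at a node with index i, the walk continues in the cofactor selected by the value of x_i.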

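1. Node representation
• ROBDD (Reduced Ordered BDD): an ROBDD is an OBDD in which, for every node $v$:
  • $low(v) \neq high(v)$
  • The subgraph rooted at $v$ and the one rooted at any other node $v'$ are not isomorphic
• Two BDDs $G_1$ and $G_2$ are isomorphic iff there exists a bijective function $\sigma$ from the nodes of $G_1$ to the nodes of $G_2$ such that:
  • For a terminal node $v$ in $G_1$, $\sigma(v)$ is a terminal node in $G_2$ with $value(v) = value(\sigma(v))$
  • For a non-terminal node $v$ in $G_1$, $\sigma(v)$ is a non-terminal node in $G_2$ with $index(v) = index(\sigma(v))$, $\sigma(low(v)) = low(\sigma(v))$ and $\sigma(high(v)) = high(\sigma(v))$
[Figure: optimal BDD representation of a function]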
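The two reduction rules can be illustrated with a small builder that applies Shannon expansion and keeps a table of already created nodes. This is only a sketch under stated assumptions: `build_robdd`, the integer node ids, and the brute-force expansion (exponential in the number of variables) are choices made for the illustration, not a production BDD package.

```python
def build_robdd(f, n):
    """Build a reduced ordered BDD for f(x1, ..., xn) with ordering x1 < ... < xn.

    Node ids 0 and 1 are the terminals; every other id is a key of `nodes`,
    which maps id -> (index, low_id, high_id).
    """
    nodes = {}
    unique = {}  # (index, low_id, high_id) -> id, so equal subgraphs are shared

    def mk(i, low, high):
        if low == high:          # rule 1: a node with identical children is dropped
            return low
        key = (i, low, high)
        if key not in unique:    # rule 2: isomorphic subgraphs are created only once
            unique[key] = len(nodes) + 2
            nodes[unique[key]] = key
        return unique[key]

    def expand(i, partial):
        if i > n:                # all variables assigned: return a terminal id
            return int(bool(f(*partial)))
        low = expand(i + 1, partial + (0,))   # cofactor with x_i = 0
        high = expand(i + 1, partial + (1,))  # cofactor with x_i = 1
        return mk(i, low, high)

    return expand(1, ()), nodes


# Assumed example: f = x1*x2 + x3 needs only three variable nodes.
root, nodes = build_robdd(lambda a, b, c: (a and b) or c, 3)
```

Because children are built before their parent, two subgraphs are isomorphic exactly when their `(index, low_id, high_id)` keys are equal, which is why a dictionary lookup is enough here.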

1. Node manipulation
• Given a suitable node representation, operations are performed on the Boolean network. The goal is to generate an equivalent, simplified, and cost-effective function.
• The operations usually applied for the reduction of Boolean networks are:
  • Decomposition: replace a Boolean expression with a collection of new expressions. A Boolean function […] is decomposable if we can find a […] such that […].
    Example: […] (12 literals). Decomposition: […] (representation with 8 literals).
  • Extraction: used to identify common intermediate sub-functions from a set of given functions.
    Example: […] and […] can be rewritten as […] and […] with […].

1. Node manipulation
• Factoring: transformation of SOP expressions into factored form.
  Example: […] can be rewritten as […].
• Substitution: replace an expression within a function with an equivalent expression.
  Example: […] can be rewritten as […] with […].
• Collapsing or elimination: the reverse operation of substitution. It is used to eliminate levels in order to meet timing constraints.
  Example: […] with […] will be replaced by […].
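To make the effect of factoring concrete, the sketch below counts literals and checks equivalence by exhaustive truth-table comparison. The expression encoding (nested tuples), the helper names, and the example function ac + ad + bc + bd + e are assumptions chosen for illustration; negated literals are left out to keep the sketch short.

```python
from itertools import product


def var(name):
    return ("var", name)


def literals(e):
    """Number of literal occurrences in an expression tree."""
    return 1 if e[0] == "var" else sum(literals(c) for c in e[1:])


def evaluate(e, env):
    if e[0] == "var":
        return env[e[1]]
    vals = [evaluate(c, env) for c in e[1:]]
    return all(vals) if e[0] == "and" else any(vals)


def equivalent(e1, e2, names):
    """Compare two expressions on every assignment of the given variables."""
    for bits in product([False, True], repeat=len(names)):
        env = dict(zip(names, bits))
        if evaluate(e1, env) != evaluate(e2, env):
            return False
    return True


a, b, c, d, e = (var(x) for x in "abcde")
# SOP form: ac + ad + bc + bd + e
sop = ("or", ("and", a, c), ("and", a, d), ("and", b, c), ("and", b, d), e)
# Factored form: (a + b)(c + d) + e
ff = ("or", ("and", ("or", a, b), ("or", c, d)), e)
```

For this assumed example the SOP form has 9 literals and the factored form has 5, while both compute the same function.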

2. LUT-technology mapping
• Technology mapping implements the optimized nodes of the Boolean network with the elements of the target device library.
• In the FPGA case, the library elements are LUTs. The process is therefore called LUT-based technology mapping.
• LUT-based technology mapping is an optimization process whose goal is usually:
  • Minimizing the number of LUTs used (device area)
  • Minimizing the signal delay (speed)
  • Optimizing routability or minimizing power (very little work so far)
• In this chapter we study two LUT-technology mapping algorithms:
  • Chortle-crf for area minimization
  • FlowMap for delay minimization

2. LUT-technology mapping: definitions
• Given a Boolean network:
  • The fan-in of a node $v$ is the set of nodes whose outputs are inputs of $v$.
  • The fan-out of a node $v$ is the set of nodes that use the output of $v$ as an input.
  • A primary input (PI) is a node with no predecessor.
  • A primary output (PO) is a node with no successor.
  • The level of a node is the length of the longest path from a primary input to that node.
  • The depth of a graph is the largest level of a node in the graph.
  • For a node $v$, $input(v)$ is defined as the set of nodes that are fan-ins of $v$.
  • A Boolean network is K-bounded if $|input(v)| \le K$ for all nodes $v$ in the graph.
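These definitions translate directly into code. The sketch below assumes a network stored as a dictionary from each node to the list of its fan-in nodes; the node names and helper functions are hypothetical and serve only to illustrate the definitions.

```python
# Assumed example network: node -> list of fan-in nodes (primary inputs have none).
network = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b", "c"],
    "g2": ["c", "d"],
    "out": ["g1", "g2"],
}


def fan_out(net, v):
    """Nodes that use the output of v as an input."""
    return {u for u, ins in net.items() if v in ins}


def level(net, v):
    """Length of the longest path from a primary input to v."""
    ins = net[v]
    return 0 if not ins else 1 + max(level(net, u) for u in ins)


def depth(net):
    """Largest level of any node in the network."""
    return max(level(net, v) for v in net)


def is_k_bounded(net, k):
    """True if every node has at most k fan-in nodes."""
    return all(len(ins) <= k for ins in net.values())
```

In this example the network has depth 2 and is 3-bounded but not 2-bounded, because `g1` has three fan-in nodes.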

2. LUT-technology mapping: definitions
• A tree, or fan-out-free circuit, is one in which each node has a fan-out of at most one.
• A forest is a set of independent trees.
• A leaf-DAG is a combinatorial circuit in which the only nodes with a fan-out greater than one are the primary inputs.

2. LUT-technology mapping: definitions
• A cone $C_v$ at a node $v$ is the tree with root $v$ that spans from $v$ towards the primary inputs.
• A cone $C_v$ at a node $v$ is K-feasible if:
  • $|input(C_v)| \le K$, where $input(C_v)$ is the set of nodes outside $C_v$ that feed nodes in $C_v$
  • Any path connecting a node in $C_v$ and $v$ lies entirely in $C_v$
• LUT-technology mapping can be defined as the problem of covering a Boolean network with a set of K-feasible cones.
[Figure: a K-feasible cone at v; graph covering with cones; LUT mapping]
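A check for K-feasibility could look as follows. This is a sketch under assumptions: the network is a dictionary from each node to its fan-in list, a cone is given as a plain set of node names, and `is_k_feasible_cone` is a hypothetical helper.

```python
def cone_inputs(net, cone):
    """Nodes outside the cone that drive some node inside the cone."""
    return {u for w in cone for u in net[w] if u not in cone}


def predecessors(net, v):
    """v together with every node that has a path to v."""
    seen, stack = {v}, [v]
    while stack:
        for u in net[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def is_k_feasible_cone(net, cone, v, k):
    reach = predecessors(net, v)
    if v not in cone or not cone <= reach:
        return False
    # Every path from a cone node to v must stay inside the cone: no cone node
    # other than v may drive a node that reaches v but lies outside the cone.
    for u in cone - {v}:
        for w, ins in net.items():
            if u in ins and w in reach and w not in cone:
                return False
    return len(cone_inputs(net, cone)) <= k


# Assumed example network: node -> list of fan-in nodes.
net = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b", "c"],
    "g2": ["c", "d"],
    "out": ["g1", "g2"],
}
```

Here the cone `{"out", "g2"}` at `out` has the inputs `g1`, `c` and `d`, so it is 3-feasible but not 2-feasible.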

2.1 The Chortle-crf algorithm
• Developed by Francis et al., University of Toronto, in 1991.
• Two-step approach:
  • 1st step:
    • Partition the circuit into a set of trees
    • Separately map the trees into circuits of K-input LUTs
  • 2nd step:
    • Assemble the circuits implementing the trees to produce the final circuit
• The two main goals are:
  • Minimizing the number of LUTs and therefore the device area
  • Minimizing the number of used pins at the output LUTs
• Transformation of the original graph into trees:
  • Partitioning through duplication of nodes with a fan-out greater than 1
  • Leaf-DAGs are converted to trees by duplicating common inputs

2.1 The Chortle-crf algorithm
• Mapping the trees into a LUT netlist:
  • Bin-packing approach that traverses the nodes from the PIs to the POs
  • At each node $v$, the best circuit implementing the K-feasible cone at $v$ is searched for
• Best circuit:
  • The tree rooted at $v$ should contain the minimum number of LUTs
  • The output LUT, i.e. the one at the root of the cone at $v$, should contain the maximum number of unused inputs
  • The second objective thus amounts to minimizing the number of used inputs of the output LUT
• Approach: at each node, construct a tree of LUTs that implements:
  • The function of the fan-in LUTs
  • The decomposition of the node
• The construction of the tree is done in two steps.

2.1 Chortle-crf algorithm
• First step: two-level decomposition
  • The two levels consist of a single first-level node and several second-level nodes (the fan-in).
  • Each second-level node implements the operation of the node being decomposed over a set of fan-in LUTs.
  • The first-level node will be implemented in the second phase.
• The construction is done using a bin-packing approach.
  • Bin packing: find the minimum number of bins of a given capacity that hold a set of boxes.
  • In this case:
    • The boxes are the second-level or fan-in LUTs
    • The bins are the resulting LUTs
    • The capacity of a bin is the number, K, of LUT inputs

2.1 Chortle-crf algorithm
• Packing consists of combining two fan-in LUTs, LUT1 (implementing the function f1) and LUT2 (implementing the function f2), into a new LUT, LUTr, that implements the function f = f1 Ø f2, where Ø is the operation implemented in the fan-out node.

2.1 Chortle-crf algorithm
• Two-level decomposition uses the first-fit decreasing heuristic:

Algorithm Two-Level-Decomposition {
    start with an empty list of LUTs
    while there are unpacked fan-in LUTs do {
        if the largest unpacked fan-in LUT will not fit within any LUT in the list {
            create an empty LUT and add it to the end of the LUT list
        }
        pack the largest unpacked fan-in LUT into the first LUT it will fit within
    }
}
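The first-fit decreasing loop above can be sketched in Python as follows. The sketch assumes each fan-in LUT is described only by its number of used inputs; the function name and the example sizes are hypothetical.

```python
def two_level_decomposition(fanin_sizes, k):
    """First-fit decreasing: pack fan-in LUTs (boxes) into K-input LUTs (bins).

    fanin_sizes: number of used inputs of each fan-in LUT.
    Returns a list of bins; each bin lists the box sizes packed into it.
    """
    bins = []
    for size in sorted(fanin_sizes, reverse=True):  # largest box first
        if size > k:
            raise ValueError("a fan-in LUT cannot use more than K inputs")
        for b in bins:
            if sum(b) + size <= k:   # first bin in which the box fits
                b.append(size)
                break
        else:                        # no bin fits: open a new one at the end
            bins.append([size])
    return bins


# Assumed example with K = 5 and fan-in LUTs using 3, 2, 2, 2 and 1 inputs.
packed = two_level_decomposition([2, 3, 1, 2, 2], 5)
# packed == [[3, 2], [2, 2, 1]]
```

First-fit decreasing is a heuristic for bin packing in general; it does not guarantee the minimum number of bins for every input.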

2.1 The Chortle-crf algorithm
• Second step: multi-level decomposition
  • The first-level node is implemented with a tree of LUTs.
  • The number of LUTs is reduced by using unused pins of the second-level LUTs to implement a portion of the first-level node.

Algorithm MultiLevel {
    while there is more than one unconnected LUT do {
        if there are no free inputs among the remaining unconnected LUTs {
            create an empty LUT and add it to the end of the LUT list
        }
        connect the most filled unconnected LUT to the next unconnected LUT with a free input
    }
}
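A possible reading of the MultiLevel loop is sketched below. It assumes that LUTs are described only by their number of used inputs, that "next" means the first other unconnected LUT in list order, and that K is at least 2; `multi_level` and its return format are hypothetical.

```python
def multi_level(used_inputs, k):
    """Connect second-level LUTs into a tree (sketch, assumes k >= 2).

    used_inputs: used input count of each LUT after two-level decomposition.
    Returns (input counts of all LUTs, list of (source, destination) edges,
    index of the root LUT).
    """
    luts = list(used_inputs)
    unconnected = list(range(len(luts)))
    edges = []
    while len(unconnected) > 1:
        src = max(unconnected, key=lambda i: luts[i])   # most filled LUT
        others = [i for i in unconnected if i != src]
        dst = next((i for i in others if luts[i] < k), None)
        if dst is None:                # no free input left: add an empty LUT
            luts.append(0)
            dst = len(luts) - 1
            unconnected.append(dst)
        luts[dst] += 1                 # the output of src occupies one input of dst
        edges.append((src, dst))
        unconnected.remove(src)
    return luts, edges, unconnected[0]


# Assumed example with K = 5: one LUT has a free pin, so no extra LUT is needed.
result = multi_level([4, 3], 5)
```

With the inputs `[4, 3]` the second LUT absorbs the first-level node through a free pin; with `[5, 5]` both LUTs are full and a third LUT has to be created to join them.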

2.1 The Chortle-crf algorithm
• Improvement: a preprocessing step before the creation of the forest
  • Ensures that inverted edges occur only at the leaves
  • Ensures that there are no consecutive ORs and no consecutive ANDs
• Improvement: exploiting reconvergent paths
  • A reconvergent path is caused by a node with fan-out > 1
  • It creates two paths in the graph that terminate at the same node
  • Reconvergent paths caused by an input are packed into just one LUT

2.1 The Chortle-crf algorithm
• Improvement: logic replication at fan-out nodes reduces the number of LUTs.

2.2 The FlowMap algorithm
• The FlowMap algorithm is a network-flow-based method aimed at minimizing the signal delay of mapped designs. We first recall some basics of network flow.
• Given a network $N = (V, E)$ (a graph with node set $V$ and edge set $E$) with a source $s$ and a sink $t$:
  • A cut $(X, \bar{X})$ is a partition of $V$ with $s \in X$ and $t \in \bar{X}$.
  • The cut-size $n(X, \bar{X})$ of a cut is the number of nodes in $X$ adjacent to some node in $\bar{X}$.
  • A cut is K-feasible iff $n(X, \bar{X}) \le K$.
  • The edge cut-size $e(X, \bar{X})$ of a cut is the weighted sum of the crossing edges.
  • For each node $t$, the label $l(t)$ is defined as the depth of the optimal LUT that implements $t$ in an optimal mapping of the subgraph $N_t$ (where $N_t$ is the cone at $t$).
  • The height $h(X, \bar{X})$ of a cut is the maximum label in $X$.
  • The volume $vol(X, \bar{X})$ of a cut is the number of nodes in $\bar{X}$.

2.2 The FlowMap algorithm
• The objective of the FlowMap algorithm is the minimization of the signal delay, which is determined by:
  • The delay in the LUTs
  • The interconnection delay
• LUT placement is not known during the technology mapping step, so only the LUT delay is considered. The interconnection delay is assumed to be the same for all signals.
• The delay of a signal is therefore the number of LUTs that the signal traverses on a path from input to output. The objective becomes the minimization of the depth of the resulting DAG.
• The FlowMap algorithm is a two-step method:
  • Node labelling phase
  • Node mapping phase

2.2 The FlowMap algorithm
• The first phase of the algorithm computes the labels of the nodes in topological order: each node is processed after all its predecessors.
• The labelling is done as follows:
  • Each primary input is assigned the label 0.
  • For a given node $t$ to be processed, the cone $N_t$ is transformed into a network by inserting a source node $s$ whose output is connected to all inputs of $N_t$.
  • Assume that $LUT(t)$ implements $t$ in an optimal mapping of the nodes in $N_t$. With $X = N_t - \bar{X}$, where $\bar{X}$ is the set of nodes in $LUT(t)$, the cut $(X, \bar{X})$ is K-feasible.
  • The label of $t$ is then given by: $l(t) = \min_{(X, \bar{X}) \text{ K-feasible}} h(X, \bar{X}) + 1$
• Lemma 1: If $p$ is the maximum label in $input(t)$, then $l(t) = p$ or $l(t) = p + 1$.
[Figure: network transformation]

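2.2 The FlowMap algorithm
• Lemma 1: If $p$ is the maximum label in $input(t)$, then $l(t) = p$ or $l(t) = p + 1$.
• Proof (lower bound, $l(t) \ge p$):
  • Let $t' \in input(t)$. Then for any K-feasible cut $(X, \bar{X})$ in $N_t$, either $t' \in X$, or $(X, \bar{X})$ also determines a K-feasible cut $(X', \bar{X}')$ in $N_{t'}$, where $X' = X \cap N_{t'}$ and $\bar{X}' = \bar{X} \cap N_{t'}$. Take $(X, \bar{X})$ of minimum height, so that $l(t) = h(X, \bar{X}) + 1$.
  • In the first case, we have $h(X, \bar{X}) \ge l(t')$ with $t' \in X$, and therefore $l(t) = h(X, \bar{X}) + 1 > l(t')$.
  • In the second case, we have $h(X', \bar{X}') \le h(X, \bar{X})$, and therefore $l(t') \le h(X', \bar{X}') + 1 \le h(X, \bar{X}) + 1 = l(t)$.
  • Hence the label of a node cannot be smaller than that of its predecessor, i.e. $l(t) \ge p$.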

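2.2 The FlowMap algorithm
• Proof of Lemma 1 (upper bound, $l(t) \le p + 1$, where $p$ is the maximum label in $input(t)$):
  • $(N_t - \{t\}, \{t\})$ is a K-feasible cut, because $N$ is K-bounded ($|input(t)| \le K$).
  • Because each node in $N_t - \{t\}$ is either in $input(t)$ or is a predecessor of some node in $input(t)$, the maximum label of the nodes in $N_t - \{t\}$ is $p$, i.e. $l(t) \le p + 1$.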

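2.2 The FlowMap algorithm
• Lemma 2: Let $N_t'$ be the network obtained from $N_t$ by collapsing all the nodes with the maximum label $p$ in $N_t$, together with $t$, into a single node $t'$. $N_t$ has a K-feasible cut of height $p - 1$ iff $N_t'$ has a K-feasible cut.
• Proof ($H_t$ is the set of collapsed nodes):
  • If $N_t'$ has a K-feasible cut $(X', \bar{X}')$, set $X = X'$ and $\bar{X} = (\bar{X}' - \{t'\}) \cup H_t$. Then $(X, \bar{X})$ is a K-feasible cut in $N_t$. No node in $X$ has a label $\ge p$, so $h(X, \bar{X}) \le p - 1$; according to Lemma 1 we have $l(t) \ge p$, so $h(X, \bar{X}) \ge p - 1$. Hence $h(X, \bar{X}) = p - 1$.
  • If $N_t$ has a K-feasible cut $(X, \bar{X})$ of height $p - 1$, then $X$ cannot contain a node with label $p$, i.e. $H_t \subseteq \bar{X}$. Then $(X, (\bar{X} - H_t) \cup \{t'\})$ forms a K-feasible cut in $N_t'$.
[Figure: network collapsing]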

2.2 The FlowMap algorithm
• Testing whether a K-feasible cut of height $p - 1$ exists in $N_t$ can be done by first transforming $N_t$ into $N_t'$.
• A second transformation turns $N_t'$ into a new network $N_t''$:
  • For each node $v$ in $N_t'$ other than $s$ and $t'$, two new nodes $v_1$ and $v_2$ are introduced and connected by a bridging edge $(v_1, v_2)$.
  • The source $s$ and the sink $t'$ are also inserted in $N_t''$. For each edge $(s, v)$ (respectively $(v, t')$) in $N_t'$, an edge $(s, v_1)$ (respectively $(v_2, t')$) is inserted in $N_t''$.
  • For each edge $(u, v)$ in $N_t'$ with $u \neq s$ and $v \neq t'$, a new edge $(u_2, v_1)$ is introduced in $N_t''$.
  • The capacity of each bridging edge is set to 1 and that of each non-bridging edge is set to $\infty$.
• The goal of this step is:
  • To reduce the node cut-size problem in $N_t'$ to an edge cut-size problem in $N_t''$
  • To apply well-known methods to solve the edge cut-size problem in $N_t''$
  • To finally derive the equivalent solution in $N_t'$
• This is done using the following lemma.
[Figure: second transformation]

2.2 The FlowMap algorithm
• Lemma 3: $N_t'$ has a K-feasible cut iff $N_t''$ has a cut whose edge cut-size is no more than K.
• Testing whether such a cut exists in $N_t''$ is done using the max-flow min-cut theorem (the value of a maximum flow between source and sink equals the capacity of a minimum cut).
• The augmenting path method is then used to detect incrementally whether the value of a maximum flow in $N_t''$ is more than K.
[Figure: second transformation and derived solution]

2.2 The FlowMap algorithm
• Flow in a network: a stream of data from the source to the sink.
• Residual capacity = capacity − flow: the amount that can still be added to the flow on an edge in order to saturate that edge.
• Capacity of a cut = sum of the capacities of the edges crossing from the source side to the sink side (edges crossing in the opposite direction do not contribute).
• Residual network = residual edges + associated nodes.
• Augmenting path = path from source to sink in the residual network.
• Max-flow min-cut theorem (Ford and Fulkerson):
  • The value of any flow is bounded by the capacity of any cut in the network.
  • The maximum flow is bounded from above by the minimum cut capacity, and the two are equal.
  • A K-feasible cut exists iff the maximum flow value is no more than K.
• Approach: since we are only interested in testing whether the value of a minimum cut is no more than K, we use the augmenting path method.
• Augmenting path approach:
  • Increase the flow along an augmenting path in the residual network.
  • Test the value of the resulting flow. If it is no more than K, continue. If it is more than K, stop.
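The node-splitting transformation and the bounded augmenting-path test can be combined in one short sketch. Everything here is an assumption made for illustration: the function name `has_k_feasible_cut`, the edge-list input format, the use of breadth-first search to find augmenting paths, and the choice to push exactly one unit of flow per path (sufficient because only the comparison with K matters).

```python
from collections import defaultdict, deque

INF = float("inf")


def has_k_feasible_cut(edges, s, t, k):
    """Return True if the DAG given by `edges` (pairs (u, v)) has a node cut
    of size at most k separating source s from sink t."""
    cap = defaultdict(dict)  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)          # reverse edge for the residual network

    def head(v):                         # node where edges entering v arrive
        return v if v in (s, t) else (v, 1)

    def tail(v):                         # node where edges leaving v start
        return v if v in (s, t) else (v, 2)

    for v in {x for e in edges for x in e} - {s, t}:
        add_edge((v, 1), (v, 2), 1)      # bridging edge, unit capacity
    for u, v in edges:
        add_edge(tail(u), head(v), INF)  # non-bridging edge, infinite capacity

    flow = 0
    while flow <= k:
        parent = {s: None}               # breadth-first search for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return True                  # maximum flow reached with value <= k
        v = t
        while parent[v] is not None:     # push one unit along the path found
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
    return False                         # flow exceeded k: no K-feasible cut
```

At most K + 1 augmenting paths are searched, so the test stops as soon as the flow exceeds K instead of computing the full maximum flow.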

2.2 The FlowMap algorithm
• In the second phase of the FlowMap algorithm, nodes are mapped to K-LUTs. The algorithm works on a set $L$ of nodes to be implemented, starting from the outputs of the Boolean network.
  • Initially, $L$ contains all primary outputs.
  • For each node $t \in L$, it is assumed that a minimum-height K-feasible cut $(X_t, \bar{X}_t)$ has been computed in the first phase.
  • A K-LUT $LUT(t)$ is created to implement the function of $t$ as well as that of all nodes in $\bar{X}_t$.
  • $L$ is then updated to $(L - \{t\}) \cup input(\bar{X}_t)$.
  • Nodes belonging to two different cut-sets $\bar{X}_u$ and $\bar{X}_v$ are automatically duplicated.
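The mapping phase can be sketched as a simple worklist loop. The sketch assumes that the cut-sets from the labelling phase are already available as a dictionary `xbar`; the function name, the network encoding (node to fan-in list), and the example network are hypothetical.

```python
def map_luts(net, primary_outputs, xbar):
    """net: node -> list of fan-in nodes; xbar[t]: set of nodes covered by LUT(t).

    Returns a dict t -> (covered nodes, LUT inputs).
    """
    pending = list(primary_outputs)      # the set L of nodes still to implement
    luts = {}
    while pending:
        t = pending.pop()
        if t in luts or not net[t]:      # already mapped, or a primary input
            continue
        cover = xbar[t]
        # Inputs of LUT(t): nodes outside the cover that feed a covered node.
        inputs = {u for w in cover for u in net[w] if u not in cover}
        luts[t] = (cover, inputs)
        pending.extend(inputs)           # L := (L - {t}) U input(cover)
    return luts


# Assumed example with K = 3: node g feeds both outputs and lies in both cut-sets.
net = {
    "a": [], "b": [], "c": [], "d": [],
    "g": ["b", "c"],
    "o1": ["g", "a"],
    "o2": ["g", "d"],
}
xbar = {"o1": {"o1", "g"}, "o2": {"o2", "g"}}
luts = map_luts(net, ["o1", "o2"], xbar)
```

In this example `g` is covered by both LUTs, so its logic is duplicated, and no separate LUT is created for it.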