Mapping into LUT Structures
Sayak Ray, Alan Mishchenko, Niklas Een, Robert Brayton
Department of EECS, UC Berkeley
Stephen Jang, Chao Chen
Agate Logic Inc.

Contributions (in a nutshell)
• New mapping algorithm for FPGAs, which maps into LUT structures instead of LUTs
• It has two applications:
  (1) Improving the quality of mapping into LUTs
    – Area improves by 7.4% on average
    – Delay improves by 11.3% on average
  (2) Improving delay for specialized hardware, which supports non-routable connections
    – Delay improves by 40% on average
    – With some area penalty

LUT Structure
• LUT-structure – a group of LUTs connected by direct, non-routable wires
[Figure: a 7-input LUT structure "44" and a 10-input LUT structure "444"; the LUTs in each structure are joined by non-routable wires]
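The input counts in the figure follow from simple arithmetic, sketched below. The sketch assumes that each digit in a structure name is the size of one LUT and that a structure of n LUTs uses n - 1 non-routable wires, each occupying one LUT input. The function name `structure_inputs` is hypothetical and used for illustration only.

```python
def structure_inputs(name: str) -> int:
    """Number of external inputs of a LUT structure such as "44" or "444".

    Assumption: every digit is one LUT size, and the LUTs are joined by
    len(name) - 1 non-routable wires, each consuming one LUT input.
    """
    sizes = [int(ch) for ch in name]
    wires = len(sizes) - 1
    return sum(sizes) - wires


print(structure_inputs("44"))   # 4 + 4 - 1 = 7
print(structure_inputs("444"))  # 4 + 4 + 4 - 2 = 10
```

Under these assumptions the result matches the "7-input" and "10-input" labels on the slide.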

Some Terminology
• Let f(X) be a Boolean function
• Let X1 ⊆ X be a subset of its support
• Suppose {q1(X), q2(X), …, qμ(X)} is the set of distinct cofactors of f w.r.t. X1
  • μ is called the column multiplicity of f w.r.t. X1
• Given a partition of X into two disjoint subsets X1 and X2, we say that an Ashenhurst-Curtis decomposition of f(X) exists if f(X) can be expressed as
  f(X) = h(g1(X1), g2(X1), …, gk(X1), X2)
  • X1 : bound set
  • X2 : free set
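The definition of column multiplicity can be made concrete with a small brute-force sketch. It enumerates every assignment of the bound set X1, records the resulting cofactor as a tuple of output values over the free set X2, and counts the distinct tuples. The function names and the representation of f as a Python callable are assumptions made for illustration; a real mapper would work on packed truth tables. The helper `encoding_functions_needed` uses the standard result that k = ⌈log2 μ⌉ functions g suffice.

```python
from itertools import product


def column_multiplicity(f, n, bound):
    """Count distinct cofactors of f (a callable on n 0/1 arguments)
    with respect to the bound-set variable indices in `bound`."""
    bound = list(bound)
    free = [v for v in range(n) if v not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        x = [0] * n
        for var, val in zip(bound, b_vals):
            x[var] = val
        column = []
        # Tabulate the cofactor over all free-set assignments.
        for f_vals in product((0, 1), repeat=len(free)):
            for var, val in zip(free, f_vals):
                x[var] = val
            column.append(f(*x))
        columns.add(tuple(column))
    return len(columns)


def encoding_functions_needed(mu):
    """Smallest k with 2**k >= mu, i.e. ceil(log2(mu)) for mu >= 1."""
    return (mu - 1).bit_length()


xor_of_and = lambda a, b, c: (a & b) ^ c
mux = lambda s, a, b: b if s else a

print(column_multiplicity(xor_of_and, 3, [0, 1]))  # 2
print(column_multiplicity(mux, 3, [1, 2]))         # 4
```

For the first function, the bound set {a, b} yields only the cofactors c and NOT c, so one function g(a, b) is enough. For the multiplexer with the data inputs as bound set, all four cofactors differ and two functions are needed.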

Flow of performLutMatchingXY
1 SupportMinimize: removes vacuous variables
2 findOutputDecomposition: checks for f = x G
3 findGoodBoundSet
4 checkSpecialNonDisjoint
5 reverseVariableOrder
6 findGoodBoundSet
7 checkSpecialNonDisjoint
• Variable reordering in truth table
• Allows cases μ = 2, 3, 4
• For μ = 3, 4, consider special decomposition with one shared variable only
• A heuristic to find suitable decomposition
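Step 1 of the flow can be illustrated with a minimal sketch of support minimization on a truth table. A variable is vacuous when flipping it never changes the output. The names `vacuous_variables` and `minimize_support`, and the tuple-of-bits truth table in which bit i of the index holds variable i, are assumptions for illustration and not the authors' implementation.

```python
def vacuous_variables(tt, n):
    """Indices of variables that f does not depend on.

    tt is a sequence of 2**n output bits; bit i of the index is variable i.
    """
    return [
        i for i in range(n)
        if all(tt[m] == tt[m ^ (1 << i)] for m in range(1 << n))
    ]


def minimize_support(tt, n):
    """Drop vacuous variables; return (smaller truth table, kept variables)."""
    vacuous = set(vacuous_variables(tt, n))
    keep = [i for i in range(n) if i not in vacuous]
    new_tt = []
    for m in range(1 << len(keep)):
        # Expand the compact index m to an index over all n variables,
        # leaving the vacuous variables at 0.
        full = 0
        for j, var in enumerate(keep):
            if (m >> j) & 1:
                full |= 1 << var
        new_tt.append(tt[full])
    return tuple(new_tt), keep


# f(x0, x1, x2) = x0 AND x2, so x1 is vacuous.
tt = tuple((m & 1) & ((m >> 2) & 1) for m in range(8))
print(vacuous_variables(tt, 3))  # [1]
print(minimize_support(tt, 3))   # ((0, 0, 0, 1), [0, 2])
```

Shrinking the support first means the later bound-set search runs on fewer variables.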

Checking for XYZ decomposition
• X, Y, and Z are sizes of the main/fanin LUTs
• Two-step process
  • Checking for XW where W = Y + Z – 2
  • If it exists, then check the remainder function G for YZ
• The priority-cut-based technology mapper is modified to accommodate the algorithm for XY and XYZ
• The results of decomposition checking are cached
  • This substantially reduces runtime on large designs

Experiment 1
Ray, Mishchenko, Een, Brayton, Jang, Chen – DATE 2012

Experiment 2

Experiment 3

Experiment 4 – Delay Optimization

Experiment 5 – Delay Optimization

Experiment 6 – Delay Optimization

Experiment 7: industrial design

Experiment 8: industrial design

Future Work
• Improving implementation
  – Handling delay-driven decomposition
    • Currently we ignore arrival time, and just care about detecting any decomposition
  – Using semi-canonical form to increase the number of hits in the hash table of computed results
  – Making truth-table based decomposition even faster
• Combining Boolean decomposition into LUT structures with structural mapping of LUTs into clusters
• Evaluating results after place and route
  – This will be especially interesting when specialized hardware is available
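One possible reading of the semi-canonical form idea is sketched below: before looking up a truth table in the cache, normalize its output phase and input phases so that functions differing only by inversions share one key. This is an illustrative assumption, not the form the authors have in mind; the name `semi_canonical` is hypothetical, and ties and input permutations are deliberately left unresolved, which is why the form is only "semi" canonical.

```python
def semi_canonical(tt, n):
    """Normalize output and input polarities of a truth table.

    tt is a sequence of 2**n bits; bit i of the index is variable i.
    """
    tt = tuple(tt)
    # Output phase: make sure ones do not outnumber zeros.
    if sum(tt) > (1 << n) // 2:
        tt = tuple(1 - b for b in tt)
    # Input phase: for every variable, the negative cofactor must not
    # contain more ones than the positive cofactor.
    for v in range(n):
        ones_neg = sum(tt[m] for m in range(1 << n) if not (m >> v) & 1)
        ones_pos = sum(tt[m] for m in range(1 << n) if (m >> v) & 1)
        if ones_neg > ones_pos:
            tt = tuple(tt[m ^ (1 << v)] for m in range(1 << n))
    return tt


and2 = (0, 0, 0, 1)       # x0 AND x1
not_x0_and = (0, 0, 1, 0)  # (NOT x0) AND x1
nand2 = (1, 1, 1, 0)       # NOT (x0 AND x1)

print(semi_canonical(and2, 2))        # (0, 0, 0, 1)
print(semi_canonical(not_x0_and, 2))  # (0, 0, 0, 1)
print(semi_canonical(nand2, 2))       # (0, 0, 0, 1)
```

Since a LUT can absorb an inversion on any input or on its output, whether a function fits a LUT structure does not change under these normalizations, so the three functions above could share one cache entry.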

Questions