Flex-Cell Optimization: A Paradigm Shift in High-Performance Cell-Based Design
Slide 1 Dec 1, 2003 Copyright, 1999 - 2003 © Zenasis Technologies, Inc.

The Power-User Dilemma
[Figure: design styles plotted by Cost / TTM against Speed, Power, Area, with the callouts "Takes too long!" and "Results aren't good enough!"]
• ASIC/COT, FPGA: Team = 10, 400 MHz, 9 months
• Custom: Team = 400, 3 GHz, 3 years
• Flex-Cell Opt: Team = 10, 520 MHz, 6 months

The Timing Dilemma
• Design team clock target
  – 350 MHz
• Post-logic-synthesis / post-placement STA
  – Only 300 MHz: problem!!
• Options
  – Design change
    • Rewrite RTL: tapeout delay!!
  – Better technology
    • Smaller geometry: tapeout delay and NRE cost!!
    • Low-k technology: yield hit!!
  – Better tools
    • Flex-Cell Optimization: custom-design benefits in a std-cell flow
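
To make the size of the gap concrete, the 350 MHz target and the 300 MHz STA result can be converted to clock periods. The sketch below is plain arithmetic on the two figures from this slide; the helper name `period_ns` is illustrative and not part of any tool.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

target_period = period_ns(350.0)    # period the design team is aiming for
achieved_period = period_ns(300.0)  # period implied by the STA result

# Delay that must be removed from the critical path to hit the target.
shortfall_ns = achieved_period - target_period

# Same shortfall as a fraction of the current critical-path delay.
required_delay_cut = shortfall_ns / achieved_period

print(f"target period      : {target_period:.3f} ns")
print(f"achieved period    : {achieved_period:.3f} ns")
print(f"shortfall          : {shortfall_ns:.3f} ns")
print(f"required delay cut : {required_delay_cut:.1%}")
```

The shortfall is about 0.48 ns, which is roughly a 14% cut in critical-path delay.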

Root of the Problem
• Various past studies, including a special session at DAC 2000
• Std-cell-based design is "an order of magnitude" lower in performance than custom at the same process node
  – Architecture
  – Fixed cell library
  – Layout
• The fixed cell library can account for as much as 25% of the performance shortfall

Rich vs Smart
• Simply creating a "richer" cell library does not solve the problem
  – Too many cells hinder automated optimization
  – Missing design-specific context information
  – Well-known matching problems for larger cells
• Custom-crafted cells, for a specific design, can inject large timing gains late in the design cycle
• Compute-intensive process
  – Transistor netlist optimization
  – Cell layout creation
  – View generation

Flex-Cell Optimization -- Concept
[Figure: Flex-Cell Opt shown together with the Logical Level, Physical Level and Transistor Level]
• Optimization at Gate, Transistor & Physical Levels

Prior Work
• Manual custom-crafting of cells is well established
  – Tactical cells: every high-performance design project uses some
• Automated transistor-level netlist creation/optimization
  – Fishburn, Dunlop (1985): TILOS, transistor sizing
  – Gavrilov et al (1997): Library-less synthesis
  – Kanecko, Tian (1998): Concurrent cell generation and mapping of digital logic
  – Liu, Abraham (1999): Transistor-level synthesis of combinational logic

Flex-Cell Optimization Targets
• Eliminate deficiency due to fixed cell library
  – Boost performance by 15% - 25%
• Close aggressive timing in days
• Retain proven existing cell-based design flow
• Use high-yield process, still get performance
• Minimal increase in die-size or power
• Get custom-design performance from std-cell-based flow

Key Steps
• Post-synthesis netlist
• STA
• Cluster formation
• Flex-cell (custom-crafted) creation
• Gate-level optimization
[Figure: critical paths through a gate cluster with inputs a, b, c, d. Before: 4 cells, 22 transistors, 9 wires. After: 1 cell, 13 transistors, 6 wires.]
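
The STA and cluster-formation steps can be illustrated with a toy longest-path computation. This is only a sketch under stated assumptions: the slides do not describe the actual clustering algorithm, the gate names and delays are hypothetical, and a real STA would also model wire delay, slew, and separate rise/fall arcs. Here the gates on the worst path are simply treated as the candidate cluster to replace with a flex-cell.

```python
from collections import defaultdict

def critical_path(gate_delay, edges):
    """Return (gates on the longest-delay path, total delay).

    gate_delay: gate name -> delay in ns (hypothetical values)
    edges: (driver, load) pairs forming a combinational DAG
    """
    fanin, fanout = defaultdict(list), defaultdict(list)
    indegree = {g: 0 for g in gate_delay}
    for driver, load in edges:
        fanin[load].append(driver)
        fanout[driver].append(load)
        indegree[load] += 1

    # Topological order (Kahn's algorithm) so fan-ins are timed first.
    order, ready = [], [g for g in gate_delay if indegree[g] == 0]
    while ready:
        g = ready.pop()
        order.append(g)
        for load in fanout[g]:
            indegree[load] -= 1
            if indegree[load] == 0:
                ready.append(load)

    # Arrival time = own delay + latest arriving fan-in.
    arrival, worst_fanin = {}, {}
    for g in order:
        latest = max(fanin[g], key=lambda d: arrival[d], default=None)
        arrival[g] = gate_delay[g] + (arrival[latest] if latest else 0.0)
        worst_fanin[g] = latest

    # Walk back from the latest endpoint to recover the path.
    node = max(arrival, key=arrival.get)
    total = arrival[node]
    path = []
    while node is not None:
        path.append(node)
        node = worst_fanin[node]
    return path[::-1], total

# Hypothetical four-gate cluster; delays are made up for illustration.
delays = {"g1": 0.10, "g2": 0.12, "g3": 0.08, "g4": 0.05}
wires = [("g1", "g3"), ("g2", "g3"), ("g3", "g4")]
cluster, delay = critical_path(delays, wires)
print(cluster, f"{delay:.2f} ns")
```

For this made-up cluster the worst path runs through g2, g3 and g4, so those three gates would be the first candidates for merging.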

Flex-Cell Optimization with Physicals
• Physically-aware STA
  – Placement aware
    • Congestion
    • Blockage
  – Multiple levels of accuracy for route info
    • Steiner estimates
    • Global route
    • Detailed route**
• Physically-driven optimization
  – Physically-aware clustering and mapping
  – Physically-aware gate-level optimizations
  – Low disturbance to existing placement
  – Incremental legalization of placement
  – Incremental re-computation of routes/estimates

Sample Flex-Cell
[Figure: gate-level cluster (inputs a, b, c, d; output y) and its transistor-level view, before and after transistor-level optimization]
• Before: gate-level cluster, 4 cells, 9 nets, 22 transistors, path depth = 3 levels
  – Critical path: a -> y
  – Rise = 0.26 ns; Fall = 0.31 ns
• After: custom-crafted flex-cell, 1 cell, 7 nets, 13 transistors, path depth = 2 levels
  – Critical path: a -> y
  – Rise = 0.12 ns; Fall = 0.10 ns

                   Before     After
Rise (critical)    0.26 ns    0.12 ns
Fall (critical)    0.31 ns    0.10 ns
# Cells            4          1
# Transistors      22         13
Path depth         3          2
# Nets             9          7
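
The before/after figures for this sample cluster can be turned into relative reductions. The sketch below only restates the slide's numbers; the dictionary keys and the `reduction_pct` helper are illustrative names.

```python
# Figures for the sample cluster, as quoted on the slide.
before = {"rise_ns": 0.26, "fall_ns": 0.31, "cells": 4,
          "transistors": 22, "path_depth": 3, "nets": 9}
after = {"rise_ns": 0.12, "fall_ns": 0.10, "cells": 1,
         "transistors": 13, "path_depth": 2, "nets": 7}

def reduction_pct(old: float, new: float) -> float:
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old

reductions = {name: round(reduction_pct(before[name], after[name]), 1)
              for name in before}

for name, pct in reductions.items():
    print(f"{name:12s} {before[name]:>6} -> {after[name]:>6}  ({pct}% lower)")
```

Both critical delays drop by more than half (about 54% for rise and 68% for fall) while the transistor count falls by about 41%.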

Transistor-Level Optimization

Key Issues
• Judicious mix of gate-level and transistor-level optimization
• Judicious mix of discrete and continuous transistor sizing
• Effective use of transistor-level restructuring
• Fast and accurate transistor-level simulation
  – 50x to 100x faster than SPICE
• Accurate estimation of parasitics given transistor-level netlist

Impact on a Sample Critical Path
[Figure: original and optimized critical paths with delay labels. Labels as extracted for the original path: 0.29, 0.25, 0.07, 0.11, 0.18, 0.14, 0.04, 0.15, 0.20, 0.24; for the optimized path, where Flex-Cell 1 replaces part of the logic: 0.36, 0.07, 0.04, 0.20]
• Original critical path: 1.04
• Optimized path: 0.82
• 21% improvement
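
The 21% figure follows from the two path totals on this slide. The sketch below checks it and also converts the delay reduction into the equivalent speed-up on this one path; it uses only the totals 1.04 and 0.82, in whatever unit the slide intends, and the variable names are illustrative.

```python
original_delay = 1.04   # total delay of the original critical path
optimized_delay = 0.82  # total delay after inserting the flex-cell

# Delay removed, as a percentage of the original path delay.
improvement_pct = 100.0 * (original_delay - optimized_delay) / original_delay

# A shorter path can be clocked faster by the inverse ratio.
path_speedup = original_delay / optimized_delay

print(f"delay reduction : {improvement_pct:.1f}%")
print(f"path speed-up   : {path_speedup:.3f}x")
```

A delay reduction of about 21% on this path corresponds to a speed-up of roughly 1.27x for the path taken in isolation; the gain for a whole design depends on its other paths as well.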

Results (ZenTime)
• 38K+ instance design
• 16% performance boost
  – 297 MHz --> 344 MHz
• Implemented in a 0.13 µm process
• Added 132 flex-cells, 5,927 instances
• Without increasing power or area

Impact on Global Timing
• Initial frequency: 297 MHz
• Final frequency: 344 MHz

Timing Optimization Results
[Figure: two result panels, "with physicals (def, sdf, …)" and "with wire loads"]

I/O & Design Flow
[Figure: Flex-Cell Opt placed between front-end and back-end design. Labels as extracted, in source order:]
• Input groups: Design, Library, Constraints
• Input files: library.lef, library.cdl, netlist.v, netlist.def, constr.sdc, netlist.set_load, netlist.sdf, tech.bsim3
• Flow labels: Front-end Design, Physical Synthesis Interface, Flex-Cell Opt, Detailed Route, Extraction & Verification, GDSII, Back-end Design
• Inside Flex-Cell Opt: Flex-Cell Factory, Clustering, Cont. Sizing, Discrete Sizing, Physical Timing, Gate-level Opt.
• Output files: flex-cell.cdl, flex-cell.est.lib, flex-cell.est.lef, opt_netlist.v, opt_netlist.def

Automated Flex-Cell Generation
[Figure: Tool Suite and Flow. Labels as extracted:]
• Inputs: sized SPICE netlists, cell architecture
• Layout: gds, lef, ant.lef
• Spice: lumped.C.sp, distr.RC.sp
• Functional: eqn.v, mos.v
• Reports
• Timing, Power: .lib, .db, .tlf
• Noise/glitch: .lib ??

Summary
• New dimension in optimization of cell-based designs
• Essential to find the "right balance" between gate-level and transistor-level optimization
  – Better design quality, higher runtime
• Timing, Area, Power no longer a simple tradeoff
  – Possible to improve more than one, simultaneously
• Many challenges
  – Lots of research opportunities!!

The History of Methodology Shifts
[Figure: progression of design methodologies. Labels as extracted: Netlist schematic, Logic synthesis, Netlist optimization, Physical synthesis, Physical optimization, Flex-cell synthesis, Flex-cell optimization]