An Improved “Soft” eFPGA Design and Implementation Strategy

An Improved “Soft” eFPGA Design and Implementation Strategy
Victor Aken’Ova, Guy Lemieux, Resve Saleh
SoC Research Lab, University of British Columbia, Vancouver, BC, Canada

Overview
• Introduction and Motivation
  – Embedded FPGA (eFPGA)
• Soft Embedded FPGAs
  – Configurable Architecture
• Improving Soft eFPGAs
  – Tactical Standard Cells
  – Structured eFPGA layout
• Results
• Summary and Conclusions

Introduction
• SoC designs are getting more complex and costly
• Programmability can be built into SoCs to amortize costs by reducing chip re-spins
[Figure labels: Software Flexibility; No Flexibility; Hardware Flexibility; eFPGAs]

Applications for eFPGA Fabrics
1. An eFPGA for CPU acceleration
2. eFPGA for product differentiation
3. An eFPGA for revisions
[Figure labels: CPU; eFPGA fabrics numbered 1 to 3]

Motivation
• Shortcomings of existing eFPGA design approaches
  – Hard eFPGA
    • Highly efficient full-custom layouts but inflexible
  – Soft eFPGA
    • Very flexible but inefficient standard cell layouts
• Alternative approach: flexible + efficient

“Hard” eFPGA Approach with a Library of 3 Cores
[Figure labels: user circuit RTL; cores 1, 2, 3, each marked “?”]
• Restrictive! Overcapacity increases area and delay overheads

The “Soft” eFPGA Approach
[Figure labels: eFPGA RTL Generator; auto-generated eFPGA; ASIC flow; Generic Standard Cells]
• Much less logic and routing overcapacity
• 7x area and 2x delay versus full-custom

Some Solutions to Problems of Existing Approaches
• Retain eFPGA generator idea for flexibility
But…
• Use structured approach for efficiency
• Use tactical cells to reduce area + delay

Our Improved Design Approach “Soft++”
[Figure labels: eFPGA RTL Generator; auto-generated eFPGA; Structured ASIC flow; Tactical + Generic Cells]
• Goal: combine best of soft and hard approaches

Island-style eFPGA Architecture
• Used island-style architecture because
  – Mainstream: existing FPGA CAD tools can be leveraged
  – Can exploit its regular structure to improve design efficiency
• Created parameterized eFPGA in VHDL

Island-style eFPGA Architecture
• L: Left Edge Tile
• B: Bottom Edge Tile
• C: Corner Tile
[Figure: (a) Island-style eFPGA; (b) eFPGA Tile Layout]
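The tile naming on this slide suggests that a whole array can be described with only a handful of distinct tile types. The Python sketch below illustrates that idea under stated assumptions: the corner tile C is placed at the bottom-left, L fills the rest of the left column, B fills the rest of the bottom row, and all remaining positions use a regular tile written here with the hypothetical label T. The function names `tile_map` and `tile_counts` are illustrative and do not come from the authors' VHDL generator.

```python
from collections import Counter


def tile_map(cols, rows):
    """Return one string of tile-type letters per grid row, top row first.

    Assumed convention (not stated on the slide): C is the bottom-left
    corner, L the rest of the left column, B the rest of the bottom row,
    and T (a hypothetical label) every other position.
    """
    if cols < 1 or rows < 1:
        raise ValueError("cols and rows must be positive")
    grid = []
    for y in range(rows - 1, -1, -1):  # y == 0 is the bottom row
        row = ""
        for x in range(cols):
            if x == 0 and y == 0:
                row += "C"
            elif x == 0:
                row += "L"
            elif y == 0:
                row += "B"
            else:
                row += "T"
        grid.append(row)
    return grid


def tile_counts(cols, rows):
    """Count how many instances of each tile type the array needs."""
    return Counter("".join(tile_map(cols, rows)))


if __name__ == "__main__":
    for line in tile_map(4, 3):
        print(line)
    print(dict(tile_counts(4, 3)))
```

Under these assumptions the number of distinct tile types stays at four however large the array grows, which is the kind of regularity a structured layout can reuse.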

Unstructured vs. Structured eFPGA Design Approach
[Figure: (a) unstructured eFPGA layout (labels: Fixed Logic, Soft eFPGA); (b) structured eFPGA layout (labels: tiles 1 to 4)]

Measured Impact of Structure on eFPGA Quality
• Significant improvements in logic capacity
  – Result of a more efficient CAD methodology
• Wire-only critical path delay reduced by 21%
• CAD design time cut by as much as 6X

Architecture-specific Tactical Cells – The Concept
• Improve quality by creating a few tactical standard cells to replace generic cells
• Detailed analysis of the design profile should reveal areas that yield significant gains

Standard Cell Area Breakdown for Island-Style Architecture
[Figure: pie charts. Overall breakdown: flip-flops 46%, muxes 42%, other 12%. Second chart labels: switch 16%, LUT input 30%, mux 13%, LUT mux 39%]
• Flip-flops and multiplexers dominate eFPGA area
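One way to see why this breakdown points at flip-flops and multiplexers is an Amdahl-style estimate: if a category holding a given share of the area shrinks by some factor, the total shrinks by the weighted sum. The Python sketch below uses the three overall shares from the slide. The per-category improvement factors (2.5 for flip-flops, 6.1 for muxes) are illustrative assumptions borrowed from the cell-level figures elsewhere in the deck, and the function name `overall_area_factor` is hypothetical. The estimate ignores routing, whitespace, and cells that are not replaced, so it should not be expected to reproduce the measured results.

```python
def overall_area_factor(shares, factors):
    """Amdahl-style estimate of the overall area reduction factor.

    shares  -- fraction of total area per category (must sum to 1)
    factors -- area improvement factor per category; categories that
               are missing are assumed unchanged (factor 1.0)
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("area shares must sum to 1")
    new_area = sum(share / factors.get(name, 1.0)
                   for name, share in shares.items())
    return 1.0 / new_area


if __name__ == "__main__":
    # Shares taken from the slide's overall breakdown.
    shares = {"flip-flops": 0.46, "muxes": 0.42, "other": 0.12}
    # Assumed per-category factors, for illustration only.
    factors = {"flip-flops": 2.5, "muxes": 6.1}
    print(f"estimated overall factor: {overall_area_factor(shares, factors):.2f}X")
```

The point of the sketch is the shape of the calculation: the untouched 12% quickly becomes the limiting term once the two dominant categories shrink.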

Architecture-specific Tactical Cells – Flip-Flop vs. SRAM
[Figure: (a) typical D flip-flop; (b) typical SRAM cell]
• ~2:1 area ratio!
• An SRAM circuit has fewer transistors = less area

Custom Layout of Standard Cell – Flip-Flop vs. SRAM
[Figure: cell layouts between vdd and gnd rails. Standard Cell Flip-flop: 2.5X; Tactical SRAM Cell: 1X]

Architecture-specific Tactical Cells – CMOS vs. Pass Gate
[Figure: multiplexer with inputs A, B, C, D, selects S0, S1 and output O, drawn as CMOS gates and as a pass tree. Labels: VDD; decompose into NAND, INV; after extra output inverter]
• ~4:1 area ratio!
• Pass tree logic uses fewer transistors and is faster
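To make the pass-tree idea concrete, the Python sketch below models a multiplexer as a binary tree of 2:1 selection stages and counts the pass transistors such a tree would need. It is a behavioral illustration only, under the assumption that each 2:1 stage uses two NMOS pass transistors. The count leaves out select-line inverters, the output inverter, and any level restoration, and the function names are hypothetical rather than taken from the authors' cell library.

```python
def pass_tree_mux(inputs, selects):
    """Select one input through a binary tree of 2:1 stages.

    selects[0] controls the level nearest the inputs, so for inputs
    [A, B, C, D] and selects (S0, S1) the chosen index is S0 + 2*S1.
    """
    if len(inputs) != 2 ** len(selects):
        raise ValueError("need exactly 2**len(selects) inputs")
    level = list(inputs)
    for s in selects:
        # Each pair collapses to one value: the odd element if s is set.
        level = [level[i + 1] if s else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


def pass_transistor_count(n_inputs):
    """Pass transistors in a full binary tree: two per 2:1 stage.

    A tree with n inputs has n - 1 stages, hence 2 * (n - 1) devices.
    Inverters and buffers are deliberately not counted.
    """
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two >= 2")
    return 2 * (n_inputs - 1)


if __name__ == "__main__":
    print(pass_tree_mux(["A", "B", "C", "D"], (1, 0)))
    print(pass_transistor_count(4), pass_transistor_count(16))
```

The linear growth of the transistor count with the number of inputs is what makes the tree attractive for the wide multiplexers in an eFPGA.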

Layout Technique for Pass-Tree Multiplexers
[Figure labels: vdd; gnd; n-well cutout; underutilized region; extra NMOS (denser cell)]
• N-well cut-outs allow denser pass transistor tree layouts

Architecture-specific Tactical Cells – Cell Area

Cell        Equivalent standard    Custom tactical     Improvement
            cell area (um²)        cell area (um²)     factor
1-SRAM      61                     24                  2.5
16:1 MUX    899                    146                 6.1
32:1 MUX    2228                   293                 7.6
4-LUT       1875                   530                 3.5
5-LUT       4180                   1061                3.9
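The improvement factors on this slide can be checked directly from the two area columns. The short Python sketch below recomputes each factor as standard-cell area divided by tactical-cell area. The pairing of areas to cells is a reconstruction from the slide's numbers, and the helper names are illustrative. The slide's factors appear to be truncated rather than rounded to one decimal place (899 / 146 is about 6.16, shown as 6.1), so the sketch truncates as well.

```python
import math

# (standard cell area, tactical cell area) in um^2, as read from the slide.
CELL_AREAS_UM2 = {
    "1-SRAM": (61, 24),
    "16:1 MUX": (899, 146),
    "32:1 MUX": (2228, 293),
    "4-LUT": (1875, 530),
    "5-LUT": (4180, 1061),
}


def improvement_factor(standard_area, tactical_area):
    """Ratio of generic standard-cell area to tactical-cell area."""
    return standard_area / tactical_area


def truncate_1dp(value):
    """Drop everything past the first decimal place (no rounding)."""
    return math.floor(value * 10) / 10


if __name__ == "__main__":
    for cell, (std, tac) in CELL_AREAS_UM2.items():
        factor = improvement_factor(std, tac)
        print(f"{cell:9s} {std:5d} / {tac:4d} = {truncate_1dp(factor)}")
```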

Area Impact of Tactical Standard Cells – eFPGA Area
[Figure: relative eFPGA area. (a) soft eFPGA; (b) soft++ eFPGA: -58%; (c) full-custom eFPGA: -85%]
• soft++ is ~2.4X smaller than soft = 58% area savings
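The slide equates "~2.4X smaller" with "58% area savings". The conversion behind that is savings = 1 - 1/factor, which the Python sketch below spells out. The same formula applied to the 7x area gap between soft and full-custom quoted earlier gives roughly 85%, in line with the -85% label. A second helper shows the other common convention, factor - 1, which is one reading consistent with the deck pairing a 1.4X delay improvement with 40%. Both function names are illustrative, and which convention the authors used for delay is an assumption.

```python
def savings_from_factor(factor):
    """Fractional reduction when a quantity shrinks by `factor`.

    Example: an area 2.4X smaller is 1/2.4 of the original,
    so the saving is 1 - 1/2.4.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 - 1.0 / factor


def gain_from_factor(factor):
    """Fractional gain under the 'factor - 1' convention.

    Example: something 1.4X better is 40% better under this reading.
    """
    return factor - 1.0


if __name__ == "__main__":
    print(f"2.4X smaller area -> {savings_from_factor(2.4):.1%} saved")
    print(f"7X smaller area   -> {savings_from_factor(7):.1%} saved")
    print(f"1.4X better delay -> {gain_from_factor(1.4):.0%} gain, "
          f"or {savings_from_factor(1.4):.1%} reduction")
```

Keeping the two conventions apart matters when comparing the area and delay numbers side by side, since the same factor yields different percentages.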

Graphs of Area and Delay Savings
[Figure: two plots over benchmarks. Area: 2.4X better; 1.6–2.8X full-custom area. Delay: 1.4X better; 1.1X of full-custom delay]

Fabricated Chip Designs with eFPGAs (180 nm process)
[Figure: (a) gradual architecture; (b) island-style architecture]

Summary
• eFPGA area improved 58% (on average)
  – 2 to 2.8X larger than full-custom equivalent (worst case)
• eFPGA delay improved 40% (average)
  – Within 10% of delay of full-custom versions
• Exploited the regularity of island-style architecture to increase logic capacity

End of Talk

Question and Answer Slide
[Figure: Area vs. Logic Capacity, with labels soft, soft++, hard, custom]
• soft++ fills some of performance gap left by hard