Fully Pipelined FPU for OR1200
Eric Zhang
Electrical & Computer Engineering

Introduction & Motivation
• Floating Point Unit:
  – Performs floating-point operations such as add/sub, multiplication, division, sine, cosine, and FMA (fused multiply-add)
  – Wide dynamic range and high precision
  – Required by many algorithms and applications, e.g. Hotspot, SRAD
  – High performance and low power consumption

FPU in OR1200
• Arithmetic, conversion, comparison

FPU in OR1200
• Serial implementation with long stalls
[Figure: serial operation timelines, annotated "10 cycles total", "38 cycles total", and "37 cycles total"]

Goals and Objectives
• Pipeline the current floating-point multiplication and division
• Reduce the number of clock cycles
• Eliminate the stalls caused by the serial implementation
• Synthesize the pipelined FPU and obtain its physical layout using the Synopsys top-down design flow
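To see why removing the serial stalls matters, here is a minimal cycle-count model. It is an illustrative assumption, not derived from the OR1200 RTL: a serial unit accepts a new operation only after the previous one finishes, while a fully pipelined unit accepts one independent operation per cycle (no data hazards). The latencies plugged in are the FPMUL cycle counts from the Specification Results slide.

```python
# Simplified model (assumption): independent operations, no data hazards,
# and a fully pipelined unit that can issue one new operation every cycle.

def serial_cycles(n_ops: int, latency: int) -> int:
    """Total cycles for n_ops back-to-back ops when each must finish before the next starts."""
    return n_ops * latency


def pipelined_cycles(n_ops: int, latency: int) -> int:
    """Total cycles when one op issues per cycle: fill the pipeline once, then one result per cycle."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


if __name__ == "__main__":
    # FPMUL latency: 38 cycles (original, serial) vs. 13 cycles (pipelined).
    for n in (1, 10, 100):
        print(f"{n:4d} ops: serial {serial_cycles(n, 38):5d} cycles, "
              f"pipelined {pipelined_cycles(n, 13):4d} cycles")
```

Under this model a single multiply benefits only from the shorter latency, but a stream of independent multiplies approaches one result per cycle, which is the main payoff of eliminating the stalls.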

Methodology
• Analyze the existing floating-point implementation
  – Identify serial logic that can be pipelined
• Pipeline the FPU multiplier and divider using the Synopsys register retiming design flow
• Design Compiler (DC) for synthesis, VCS for functional simulation and verification, and IC Compiler for physical layout and for power and area measurement

Register Retiming

Register Retiming
1. Library setup
2. Constraint setup
3.
4. Compile
5. New constraint
6. Retiming
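Retiming moves registers across combinational logic so that stage delays are balanced and the clock period shrinks, without changing the circuit's function. The sketch below is a toy illustration on a single linear chain of logic blocks, not Design Compiler's algorithm (which works on a general netlist graph): given hypothetical block delays and a fixed number of pipeline registers, it picks register positions that minimize the slowest stage.

```python
from functools import lru_cache
from typing import List, Tuple


def best_register_placement(delays: List[float], n_regs: int) -> Tuple[float, List[int]]:
    """Place n_regs registers on a linear chain of combinational blocks.

    delays[i] is the delay of block i. A register at cut position c sits between
    block c-1 and block c. Returns (clock period, sorted cut positions), where the
    clock period is the delay of the slowest resulting stage.
    """
    n = len(delays)
    if not 0 <= n_regs < n:
        raise ValueError("need 0 <= n_regs < number of blocks")

    # prefix[i] = total delay of blocks 0..i-1, so a stage [a, b) costs prefix[b] - prefix[a].
    prefix = [0.0]
    for d in delays:
        prefix.append(prefix[-1] + d)

    @lru_cache(maxsize=None)
    def solve(start: int, regs_left: int) -> Tuple[float, Tuple[int, ...]]:
        # No registers left: all remaining blocks form the final stage.
        if regs_left == 0:
            return prefix[n] - prefix[start], ()
        best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
        # Each remaining stage needs at least one block, so the cut cannot go too far right.
        for cut in range(start + 1, n - regs_left + 1):
            stage = prefix[cut] - prefix[start]
            rest, cuts = solve(cut, regs_left - 1)
            period = max(stage, rest)
            if period < best[0]:
                best = (period, (cut,) + cuts)
        return best

    period, cuts = solve(0, n_regs)
    return period, list(cuts)


if __name__ == "__main__":
    chain = [2, 3, 1, 4, 2, 3]  # hypothetical block delays in ns
    print(best_register_placement(chain, 0))  # unpipelined: one 15 ns stage
    print(best_register_placement(chain, 2))  # three balanced 5 ns stages
```

The exhaustive dynamic program is fine for a handful of blocks; real tools use graph-based retiming formulations that also respect register initial states and multi-fanout paths.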

Register Retiming Flow

Register Retiming Timing Report

Schematic Before Retiming

Schematic After Retiming

VCS Functional Simulation: 1.6 * 4.0 = 6.4

VCS Functional Simulation: 1.6 / 4.0 = 0.4
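When checking simulation output it helps to have golden bit patterns rather than decimal values. Here is a small reference model, assuming the FPU operates on IEEE 754 single-precision (binary32) operands with round-to-nearest-even. It computes the expected 32-bit results with Python's `struct` module; the function names are illustrative and not part of any testbench in this project.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def f32_bits(x: float) -> int:
    """Round a Python float to binary32 and return its raw 32-bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def expected_mul(a: float, b: float) -> int:
    # The binary64 product of two binary32 values is exact (24 + 24 bits fit in 53),
    # so a single final rounding gives the correctly rounded binary32 product.
    return f32_bits(f32(a) * f32(b))


def expected_div(a: float, b: float) -> int:
    # binary64 carries enough extra precision that rounding its quotient to binary32
    # yields the correctly rounded binary32 quotient (no double-rounding error).
    return f32_bits(f32(a) / f32(b))


if __name__ == "__main__":
    print(f"1.6       = 0x{f32_bits(1.6):08X}")
    print(f"4.0       = 0x{f32_bits(4.0):08X}")
    print(f"1.6 * 4.0 = 0x{expected_mul(1.6, 4.0):08X}")
    print(f"1.6 / 4.0 = 0x{expected_div(1.6, 4.0):08X}")
```

Because 4.0 is a power of two, both results keep the mantissa of 1.6 and only the exponent field changes, which makes these cases easy to eyeball in a waveform viewer.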

Physical Layout

Specification Results

Spec                Pipelined    Original
Frequency           222 MHz      222 MHz
VDD                 1.05 V       1.05 V
Metal layers        9            9
# of input pins     143
# of output pins    80           80
Area                0.5 mm^2     0.45 mm^2
FPMUL cycles        13           38
FPDIV cycles        11           37
Dynamic power       3.79 mW      0.65 mW
Leakage power       1.33 mW      0.69 mW
Total power         5.13 mW      1.34 mW
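The table mixes cycle counts, power, and area, so a quick back-of-the-envelope conversion makes the trade-off easier to read. This sketch assumes both designs actually run at the listed 222 MHz and treats the reported totals as whole-FPU figures; it only rearranges numbers from the table and measures nothing new.

```python
FREQ_HZ = 222e6  # both designs are listed at 222 MHz

# Values copied from the Specification Results table.
PIPELINED = {"FPMUL": 13, "FPDIV": 11, "total_power_mw": 5.13, "area_mm2": 0.50}
ORIGINAL = {"FPMUL": 38, "FPDIV": 37, "total_power_mw": 1.34, "area_mm2": 0.45}


def latency_ns(cycles: int, freq_hz: float = FREQ_HZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / freq_hz * 1e9


def summarize() -> dict:
    """Per-operation latency and speedup, plus power and area ratios (pipelined / original)."""
    out = {}
    for op in ("FPMUL", "FPDIV"):
        out[op] = {
            "pipelined_ns": latency_ns(PIPELINED[op]),
            "original_ns": latency_ns(ORIGINAL[op]),
            "speedup": ORIGINAL[op] / PIPELINED[op],
        }
    out["power_ratio"] = PIPELINED["total_power_mw"] / ORIGINAL["total_power_mw"]
    out["area_ratio"] = PIPELINED["area_mm2"] / ORIGINAL["area_mm2"]
    return out


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Read this way, the pipelined FPMUL latency drops from roughly 171 ns to 59 ns (about 2.9x), at roughly 3.8x the total power and 1.1x the area.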

DesignWare IP
• Technology-independent
• Microarchitecture-level library
• Synthesizable for ASIC, SoC, and FPGA design
• IPs include:
  – Arithmetic components: multiplier, divider, adder, etc.
    • DW01_add, DW02_mult, DW_fp_mult
  – DSP, AMBA bus, memory controller
    • DW_fir
  – etc.

DesignWare IP
• To use DesignWare IP:
  1. set synthetic_library dw_foundation.sldb
  2. set link_library "* $target_library $synthetic_library"
  3. License: DesignWare
• Instantiation in the Verilog file:
    DW02_mult #(8, 8) U1 (A, B, TC, PRODUCT);
• Synthesize using the normal flow
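A behavioral reference model is handy for checking the instantiated multiplier in simulation. The sketch below assumes the usual DW02_mult semantics as we understand them (confirm against the DesignWare datasheet): operand widths set by the two parameters (8 and 8 above), `TC = 0` treats A and B as unsigned, `TC = 1` as two's complement, and PRODUCT is A_width + B_width bits wide. The function name is hypothetical.

```python
def dw02_mult_model(a: int, b: int, tc: int, a_width: int = 8, b_width: int = 8) -> int:
    """Behavioral sketch of a DW02_mult-style multiplier.

    a and b are raw bit patterns; tc=0 multiplies them as unsigned numbers,
    tc=1 as two's-complement numbers. Returns the PRODUCT bit pattern,
    which is a_width + b_width bits wide.
    """

    def to_signed(value: int, width: int) -> int:
        # Interpret the top bit as the two's-complement sign bit.
        return value - (1 << width) if value & (1 << (width - 1)) else value

    # Keep only the bits that fit on the input ports.
    a &= (1 << a_width) - 1
    b &= (1 << b_width) - 1

    if tc:
        product = to_signed(a, a_width) * to_signed(b, b_width)
    else:
        product = a * b

    # Wrap to the output width so negative results appear as two's-complement bit patterns.
    return product & ((1 << (a_width + b_width)) - 1)


if __name__ == "__main__":
    print(hex(dw02_mult_model(0xFF, 0xFF, tc=0)))  # 255 * 255 unsigned
    print(hex(dw02_mult_model(0xFF, 0xFF, tc=1)))  # (-1) * (-1) signed
    print(hex(dw02_mult_model(0x80, 0x01, tc=1)))  # (-128) * 1 signed
```

The same 8-bit patterns give very different PRODUCT values depending on TC, which is why the TC input is worth driving explicitly in a testbench rather than leaving it to a default.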

DesignWare IP
• Benefits of using DesignWare IP
  – Increased productivity: parameterized, pre-verified
  – Better quality of results (QoR): optimized by Synopsys
  – Design reusability

Improved Scripts for Design Flow
• Automatically set up all necessary folders and scripts
• Automatically set up scratch storage for synthesis results
• Scripts common to different projects are created as symbolic links
  – E.g. setup.tcl
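The project-setup idea can be sketched in a few lines. This is not the author's script: the subfolder names, the `common/` directory for shared scripts, and the scratch location are hypothetical choices, since the actual folder layout is only shown in screenshots. It creates a project folder, links a scratch directory in for bulky synthesis output, and symlinks shared scripts such as `setup.tcl` instead of copying them.

```python
from pathlib import Path

# Hypothetical layout; the real folder names from the slides are not recoverable.
PROJECT_SUBDIRS = ("rtl", "scripts", "reports")
COMMON_SCRIPTS = ("setup.tcl",)


def _link(dest: Path, target: Path, is_dir: bool = False) -> None:
    """Create a symlink at dest pointing to target, unless something is already there."""
    if not (dest.is_symlink() or dest.exists()):
        dest.symlink_to(target, target_is_directory=is_dir)


def create_project(top: Path, name: str, scratch_root: Path) -> Path:
    """Create top/name with standard subfolders, a scratch link, and shared-script links."""
    project = top / name
    for sub in PROJECT_SUBDIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)

    # Synthesis results live on scratch storage and are linked into the project.
    scratch = (scratch_root / name / "synthesis").resolve()
    scratch.mkdir(parents=True, exist_ok=True)
    _link(project / "synthesis", scratch, is_dir=True)

    # Scripts shared by all projects live once under top/common and are symlinked in,
    # so editing the common copy updates every project.
    for script in COMMON_SCRIPTS:
        _link(project / "scripts" / script, (top / "common" / script).resolve())
    return project
```

Re-running `create_project` on an existing project is safe: directories are created with `exist_ok=True` and existing links are left alone.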

Improved Scripts for Design Flow
• Top-level folder without any projects:
• Create a project called "test":

Improved Scripts for Design Flow
• Top-level folder after creating "test":
• Folder layout of project "test":
• Other useful scripts:
  – timing_closure.sh: binary search for the minimum delay
  – project_init.tcl: project-specific information, such as the top-level design name and language
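The binary search in `timing_closure.sh` can be sketched as follows, assuming "minimum delay" means the smallest clock-period constraint at which synthesis still meets timing, and that timing is monotonic (if a period passes, every longer period passes too). In a real flow, `meets_timing` would rerun synthesis with the new constraint and check for non-negative worst slack; here it is a hypothetical callback, and the stand-in below simply pretends the critical path is 4.37 ns.

```python
from typing import Callable


def min_passing_period(meets_timing: Callable[[float], bool],
                       lo_ns: float, hi_ns: float, tol_ns: float = 0.01) -> float:
    """Binary-search the smallest clock period (ns) for which meets_timing(period) is True.

    hi_ns must meet timing; lo_ns is expected to fail. The returned period always
    passes and is within tol_ns of the true minimum, given monotonic timing.
    """
    if not meets_timing(hi_ns):
        raise ValueError("upper bound does not meet timing")
    while hi_ns - lo_ns > tol_ns:
        mid = (lo_ns + hi_ns) / 2
        if meets_timing(mid):
            hi_ns = mid  # passes: try a tighter constraint
        else:
            lo_ns = mid  # fails: relax the constraint
    return hi_ns


if __name__ == "__main__":
    # Hypothetical stand-in for a synthesis run: passes whenever period >= 4.37 ns.
    period = min_passing_period(lambda p: p >= 4.37, lo_ns=1.0, hi_ns=10.0)
    print(f"minimum passing period ~ {period:.3f} ns")
```

Each iteration halves the search interval, so narrowing a 9 ns window to 10 ps takes about ten synthesis runs instead of hundreds with a linear sweep.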

Thank you!