SPEED-INDEPENDENT FUSED MULTIPLY ADD AND SUBTRACT UNIT
Yuri Stepchenkov, Victor Zakharov, Yuri Rogdestvenski, Yuri Diachenko, Nickolaj Morozov and Dmitri Stepchenkov
Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences (IPI RAS), Moscow, Russian Federation
OUTLINE
What circuits do we design?
Block diagram of the Fused Multiply-Add & Subtract (SI-FMAS) unit
Simplified indication
SI-FMAS implementation
Testing SI-FMAS
Conclusions
IPI FRC CSC RAS, EWDTS-2016
CIRCUITS CLASSIFICATION
All circuits:
Synchronous (clock!)
Asynchronous (no clocks!): Speed-Independent, Others
ADVANTAGES OF SI CIRCUITS
Their operation does not depend on the delays of their cells
They are free of hazards
They remain operable over an extremely wide range of supply voltage and ambient temperature
They detect constant (stuck-at) faults and stop working
SPEED-INDEPENDENT PRINCIPLES
Two-phase operation: work and spacer (pause)
Each circuit cell can switch only once during the circuit's transition from spacer to the next work state
Full indication of all cells in the circuit in each phase of operation
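The two-phase discipline can be illustrated with a minimal Python sketch. It assumes dual-rail coding (which the implementation slide names as the signal discipline outside the multiplier) and a zero spacer; the names Phase, encode, spacer and indicate are hypothetical and are not taken from the design.

```python
from enum import Enum


class Phase(Enum):
    SPACER = "spacer"
    WORK = "work"
    TRANSIENT = "transient"


SPACER_PAIR = (0, 0)            # assumption: zero spacer on both rails
WORK_PAIRS = ((1, 0), (0, 1))   # (true_rail, false_rail) for logic 1 and 0


def encode(bits):
    """Encode ordinary bits as dual-rail pairs (true_rail, false_rail)."""
    return [(b, 1 - b) for b in bits]


def spacer(width):
    """Return a dual-rail word that is entirely in the spacer state."""
    return [SPACER_PAIR] * width


def indicate(word):
    """Classify a dual-rail word by checking every pair.

    The phase is reported only when all pairs agree; a mixed word
    (or an illegal (1, 1) pair under a zero spacer) is TRANSIENT.
    """
    if all(pair == SPACER_PAIR for pair in word):
        return Phase.SPACER
    if all(pair in WORK_PAIRS for pair in word):
        return Phase.WORK
    return Phase.TRANSIENT
```

A word such as encode([1, 0]) + spacer(2) is classified as TRANSIENT: part of it has reached the work state while the rest is still in spacer, so a following stage must wait.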
BLOCK DIAGRAM OF SI-FMAS
[Figure: block diagram. Blocks: I/O Manager, SI-FMAS Core-1, SI-FMAS Core-2, SI-MX. Signals: Clk, Start, Rst, Stop, X, Y, Z, R, Wr1, Wr2, Rd1, Rd2, Stop1, Stop2, Rd, Ready, FMA, FMS, RFA, RFS. Bus widths shown: 195, 135, 64, 3.]
BLOCK DIAGRAM OF SI-FMAS CORE
[Figure: four pipeline stages (1st to 4th) between an Input SI-FIFO and an Output SI-FIFO. Blocks: Ternary multiplier X*Y; Exponent estimation, Z alignment; X*Y+ZA, normalization, rounding; X*Y−ZA, normalization, rounding. Data labels: X, Y, Z, R, MX, MY, MZ, EX, EY, EZ, E, ZA. Results: FMA, FMS, RFA, RFS. Control signals: Wr, Rd, Rst, Stop, Ready.]
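The arithmetic that the core's add and subtract branches deliver can be sketched in Python. This is only a reference model of the fused semantics (exact product, one rounding per result), assuming round-to-nearest-even and finite operands; it says nothing about the pipeline, the ternary multiplier, or the rounding modes the unit actually supports. The function name fmas is hypothetical.

```python
from fractions import Fraction


def fmas(x: float, y: float, z: float) -> tuple[float, float]:
    """Return (x*y + z, x*y - z), each rounded once to the nearest double.

    Assumption: operands are finite (Fraction rejects inf and NaN).
    """
    product = Fraction(x) * Fraction(y)   # exact product, no rounding
    addend = Fraction(z)                  # exact value of z
    # float(Fraction) performs a single correctly rounded conversion.
    return float(product + addend), float(product - addend)


x = 1.0 + 2.0**-30
y = 1.0 - 2.0**-30
fused_add, fused_sub = fmas(x, y, -1.0)
unfused_add = x * y + (-1.0)   # product is rounded to 1.0 before the add
```

With these operands the exact product is 1 − 2^−60, so fused_add keeps the term −2^−60, while unfused_add loses it and yields 0.0.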
SIMPLIFIED INDICATION (1)
Why is it possible?
The first work state that appears at the circuit's outputs during the transition from spacer to the work phase is a stationary state
A CMOS cell stops switching toward the opposite state if the input combination that caused the transition has disappeared
SIMPLIFIED INDICATION (2)
How can we optimize indication?
Full indication in the spacer phase and simplified indication in the work phase
Bitwise indication in combinational circuits
Taking into account the bitwise indicators in the input register of the following pipeline stage
SIMPLIFIED INDICATION (3)
Unit spacer
$Q^{+} = I_0 \cdot I_1 + Q \cdot (I_0 + I_1)$
[Figure: circuit variants. CMOS transistors: 92, 68, 60.]
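The next-state equation above has the form of a standard two-input Muller C-element: the output follows the inputs when they agree and holds its value when they differ. A minimal Python sketch of that equation follows; the function name c_element is hypothetical.

```python
def c_element(i0: int, i1: int, q: int) -> int:
    """Next state Q+ = I0*I1 + Q*(I0 + I1) for one-bit inputs (0 or 1)."""
    return (i0 & i1) | (q & (i0 | i1))


# Enumerate the full truth table: (i0, i1, q) -> next q
truth_table = {
    (i0, i1, q): c_element(i0, i1, q)
    for i0 in (0, 1)
    for i1 in (0, 1)
    for q in (0, 1)
}
```

The table confirms the hold behaviour: for inputs (1, 0) or (0, 1) the next state equals the current state q.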
SIMPLIFIED INDICATION (4)
Zero spacer
[Figure: circuit variants. CMOS transistors: 92, 76.]
OPTIMIZED PIPELINE INDICATION
[Figure: pipeline stages (i-1), (i) and (i+1). Blocks per stage: RG, Logic. Signals between stages: Data, BWI, R-A.]
IMPLEMENTATION BASIS
Traditional CMOS circuitry with dual-rail signals everywhere except the multiplier, which uses ternary coding
65-nm CMOS process with 6 metal layers
Standard cell library (Dolphin)
Self-timed cell library (IPI 65 D, 108 cells) designed at IPI FRC CSC RAS
FEATURES OF SI-FMAS
Parameter                          Synchronous counterpart   SI-FMAS
Die size, mm²                      0.312                     1.12
Performance, Gflops                2.06                      3.15
Latency, ns                        10.8                      1.84
Die size efficiency, mm²/Gflops    0.151                     0.321
Workability range on VDD           ±10%                      Vth...VBD
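The trade-off in this table can be put into ratios with a few lines of Python. The numbers are copied from the table; the dictionary keys and variable names are hypothetical.

```python
# Values taken from the feature table.
sync = {"area_mm2": 0.312, "gflops": 2.06, "latency_ns": 10.8}
si = {"area_mm2": 1.12, "gflops": 3.15, "latency_ns": 1.84}

perf_gain = si["gflops"] / sync["gflops"]              # throughput ratio
latency_gain = sync["latency_ns"] / si["latency_ns"]   # latency reduction
area_cost = si["area_mm2"] / sync["area_mm2"]          # die-size ratio

print(f"performance x{perf_gain:.2f}, "
      f"latency /{latency_gain:.2f}, "
      f"area x{area_cost:.2f}")
# prints "performance x1.53, latency /5.87, area x3.59"
```

In other words, the SI unit buys roughly 1.5 times the throughput and nearly 6 times lower latency at about 3.6 times the die area.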
LAYOUT OF SI-FMAS
[Figure: die layout. Labelled blocks: Input FIFO, Fraction Multiplier, Exponent calculator, Adder-Subtractor, Normalizer, Round & Postnormalization, Output FIFO, Multiplexer.]
GOALS OF TESTING: SYNCHRONOUS
Logical level: functional verification
Electrical level: eliminating hazards and signal races over the full range of supply voltage and temperature
GOALS OF TESTING: SPEED-INDEPENDENT
Logical level: functional verification; self-timed analysis (ASPECT, FAZAN, FIESTA)
Electrical level: nothing
HARDWARE TEST ENVIRONMENT
[Figure: test bench. Blocks: SI-FMAS, SI Reference Register, SI Comparator, SI Control. Signals: Input Data, Result, Clk, Start, Mode, OK. Varied conditions: Supply (labels Vth, VBD) and Temperature, °C (labels 63, +125).]
TEST ORDER
1. Apply the nominal supply voltage
2. Set fixed operands at the SI-FMAS inputs and set Mode=0
3. Run the clock generator and set Start=1
4. Change the Mode input to Mode=1
5. Observe periodic pulses at the OK output
6. Change the supply voltage and/or temperature until the OK pulses disappear
7. Repeat the experiment for other operands
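The supply-voltage part of this procedure amounts to a sweep that stops when the OK pulses vanish. The Python sketch below shows only that control loop. Everything in it is an assumption for illustration: the function name find_lower_limit_mv, the callback ok_observed (a stand-in for observing the OK output on the bench), and the millivolt values, none of which come from the slides.

```python
def find_lower_limit_mv(ok_observed, v_nominal_mv, v_step_mv, v_floor_mv=0):
    """Lower VDD from nominal in fixed steps until OK pulses disappear.

    ok_observed(vdd_mv) is a hypothetical callback that returns True
    while periodic OK pulses are seen at the given supply voltage.
    Returns the last voltage (in mV) at which OK was seen, or None.
    """
    last_good = None
    for vdd_mv in range(v_nominal_mv, v_floor_mv - 1, -v_step_mv):
        if not ok_observed(vdd_mv):
            break              # OK disappeared: boundary found
        last_good = vdd_mv
    return last_good


# Stand-in for the bench: pretend the unit works down to 330 mV.
def fake_bench(vdd_mv):
    return vdd_mv >= 330


limit = find_lower_limit_mv(fake_bench, v_nominal_mv=1000, v_step_mv=50)
```

With the stand-in bench and a 50 mV step, the sweep reports 350 mV as the last working point; a finer step would localize the boundary more tightly. The same loop can be repeated for each operand set and temperature.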
WORKABILITY RANGE
[Figure: workability regions of the synchronous analog and of SI-FMAS. Vertical axis: VDD (labels Vth, Vnom, VBD). Horizontal axis: T, °C (labels 63, +125).]
SUMMARY
The designed speed-independent (SI) pipelined 64-bit FMAS unit, conforming to IEEE 754 and implemented in a standard 65-nm CMOS process, demonstrates high average performance (up to 3.15 Gflops), low latency (less than 2 ns), and a wide workability range
The developed test environment shows that the proposed unit is a true SI unit whose functionality does not depend on the actual parameters of its components
Future research will focus on decomposing the multiplier so that the SI-FMAS unit reaches the same performance with one computing channel instead of two identical channels
Thank You!
CONTACTS
Director: academician Sokolov I. A.
Address: Institute of Informatics Problems of the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences (IPI RAS), Moscow, Russian Federation, 119333, Vavilova str., 44, b. 2
Tel: +7 (495) 137 34 94
Fax: +7 (495) 930 45 05
E-mail: ISokolov@ipiran.ru
Stepchenkov Y. A., tel. +7 (495) 671 15 20, Ystepchenkov@ipiran.ru