Integer Multipliers

Multipliers
• A must-have circuit in most DSP applications
• A variety of multipliers exists, and one is chosen based on its performance
• Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...

[Figure: 16 x 16 multiplier block diagram with converters, registers RA, RB, and RC, and en/reset signals]

Multiplication Algorithm

Multiplicand: X = X_{n-1} X_{n-2} ... X_0
Multiplier:   Y = Y_{n-1} Y_{n-2} ... Y_0

Partial products (each row is shifted one position to the left of the row above it):

Y_{n-1}X_0      Y_{n-2}X_0      Y_{n-3}X_0      ...  Y_1X_0      Y_0X_0
Y_{n-1}X_1      Y_{n-2}X_1      Y_{n-3}X_1      ...  Y_1X_1      Y_0X_1
Y_{n-1}X_2      Y_{n-2}X_2      Y_{n-3}X_2      ...  Y_1X_2      Y_0X_2
...
Y_{n-1}X_{n-2}  Y_{n-2}X_{n-2}  Y_{n-3}X_{n-2}  ...  Y_1X_{n-2}  Y_0X_{n-2}
Y_{n-1}X_{n-1}  Y_{n-2}X_{n-1}  Y_{n-3}X_{n-1}  ...  Y_1X_{n-1}  Y_0X_{n-1}
--------------------------------------------------------------------
P_{2n-1}  P_{2n-2}  P_{2n-3}  ...  P_2  P_1  P_0

1. Multiplication Algorithms
Implementing the multiplication of binary numbers boils down to how the additions are done. Consider two 8-bit numbers A and B that generate the 16-bit product P: first generate the 64 partial products, then add them up.
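The generate-then-add view can be sketched in a few lines of Python. This is only an illustration, not the lecture's circuit: the function names `partial_products` and `sum_partial_products` are hypothetical, and the operands 0x89 and 0xAB are used purely as sample values.

```python
def partial_products(a: int, b: int, n: int = 8) -> list[list[int]]:
    """Return the n x n matrix of AND terms; entry [i][j] is a_j AND b_i."""
    return [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)]
            for i in range(n)]


def sum_partial_products(pp: list[list[int]]) -> int:
    """Add every partial-product bit at its weight 2**(i + j)."""
    total = 0
    for i, row in enumerate(pp):
        for j, bit in enumerate(row):
            total += bit << (i + j)
    return total


pp = partial_products(0x89, 0xAB)          # sample 8-bit operands
count = sum(len(row) for row in pp)        # 8 * 8 partial-product bits
product = sum_partial_products(pp)         # fits in 16 bits
print(count, hex(product))
```

Every multiplier in the rest of the deck computes this same sum; they differ only in how the additions are scheduled in time and hardware.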

Multiplier Design

[Figure: block diagram with Storage, input register (REG IN), Multiplier Unit (MU), output register (REG OUT), and Control Unit]

Serial Multiplier

X: x3 x2 x1 x0
Y: y3 y2 y1 y0

Input Sequence for G1: 00 x3 x2 x1 x 00 x3 x2 x1 x0 0 x3 x2 x1 x0 00 y3 y3 0 y2 y2 0 y1 y1 0 y0 y0
Reset: 01000010000

Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

[Animation: Slide 1 to Slide 21 step through the serial multiplication]

Serial / Parallel Multiplier

Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i

[Animation: Slide 1 to Slide 8 step through the serial/parallel multiplication]

Shift and Add Multiplier

[Figure: datapath showing INPUT Ain (7 downto 0), REGA, 0, MUX, 8-bit Adder, INPUT Bin (7 downto 0), REGC with Result (15 downto 8), REGB with Result (7 downto 0), and CLOCK]

Synchronous Shift and Add Multiplier Controller

• Multiplication process:
– 5 states: Idle, Init, Test, Add, and Shift&Count.
– Idle: The process starts when the Start signal is received;
– Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively;
– Test: The LSB of the shift register, which contains the multiplier, is tested to decide the next state;
– Add: If the LSB is '1', the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state;
– Shift&Count: If the LSB is '0', this state is entered directly. The two shift registers shift their contents one bit to the right, and the counter counts up by one. After that, the state machine returns to the Test state;
– When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state;
– Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
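The five-state controller can be modelled behaviourally as follows. This is a hedged software sketch rather than the lecture's hardware: the function name `shift_add_fsm`, the variable names, and the choice to return the list of visited states are all assumptions made for illustration.

```python
def shift_add_fsm(multiplicand: int, multiplier: int, n: int = 8):
    """Simulate the controller; return (product, list of visited states)."""
    state, trace = "Idle", []
    acc = q = m = count = 0
    start = True                       # models the Start signal
    while True:
        trace.append(state)
        if state == "Idle":
            if not start:
                break                  # back in Idle: Done is asserted
            start = False
            state = "Init"
        elif state == "Init":
            m, q, acc, count = multiplicand, multiplier, 0, 0
            state = "Test"
        elif state == "Test":
            state = "Add" if q & 1 else "Shift&Count"
        elif state == "Add":
            acc += m                   # accumulator may need n + 1 bits here
            state = "Shift&Count"
        else:                          # Shift&Count
            q = (q >> 1) | ((acc & 1) << (n - 1))   # acc LSB moves into q MSB
            acc >>= 1
            count += 1
            state = "Idle" if count == n else "Test"   # Stop when count == N
    return (acc << n) | q, trace


product, states = shift_add_fsm(0b1011, 0b1101, n=4)
print(product, states)
```

Modelling the accumulator and the multiplier register as one double-length shift register is a common arrangement; the slides do not say how the two shift registers are connected, so that detail is an assumption here.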

n-bit Multiplier:
• Q0 = 1: The multiplicand is added to register A and the result is stored in register A; then registers C, A, and Q are shifted one bit to the right
• Q0 = 0: Registers C, A, and Q are shifted one bit to the right

Example: 4-bit Multiplier
[Figures: one per step]
• Initial Values
• First Cycle: Add
• First Cycle: Shift
• Second Cycle: Shift
• Third Cycle: Add
• Third Cycle: Shift
• Fourth Cycle: Add
• Fourth Cycle: Shift
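The register contents for such a 4-bit example can be traced with a short Python sketch. The operand values are assumptions, since the slide figures are not available here: the multiplier 1101 is chosen because it reproduces the add/shift pattern of the four cycles listed, and the multiplicand 1011 is simply a sample value. The function name `caq_trace` is hypothetical.

```python
def caq_trace(m: int, q: int, n: int = 4):
    """Trace registers C (carry), A (accumulator), Q (multiplier) step by step."""
    c, a = 0, 0
    mask = (1 << n) - 1
    rows = [("init", c, a, q)]
    for cycle in range(1, n + 1):
        if q & 1:                              # Q0 = 1: add multiplicand to A
            s = a + m
            c, a = s >> n, s & mask
            rows.append((f"cycle {cycle} add", c, a, q))
        # shift C, A, Q one bit to the right as a single long register
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
        rows.append((f"cycle {cycle} shift", c, a, q))
    return rows


rows = caq_trace(0b1011, 0b1101)               # assumed operands: 11 x 13
for label, c, a, q in rows:
    print(f"{label:14s} C={c} A={a:04b} Q={q:04b}")
```

After the last shift the product is read from A followed by Q.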

4*4 Synchronous Shift and Add Multiplier Design
[Figure: layout design and floor plan of the 4*4 Synchronous Shift and Add Multiplier]

Comparison between Synchronous and Asynchronous Approaches

Example (simulated by Ovais Ahmed):
Multiplicand = 10001001_2 = 89_16
Multiplier = 10101011_2 = AB_16
Expected Result = 101101110000011_2 = 5B83_16

Array Multiplier
· Regular structure based on the add-and-shift algorithm.
· Addition is mainly done by the carry-save algorithm.
· Sign-bit extension results in a higher capacitive load and slows down the circuit.
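Carry-save addition can be illustrated on whole integers: three operands are reduced to a sum word and a carry word without propagating carries, and only one carry-propagate addition is needed at the end. The sketch below is a behavioural model under that assumption; the names `carry_save_add` and `csa_multiply` are hypothetical, and the row-by-row accumulation order is an illustrative choice, not the lecture's array.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 reduction: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z                                   # bitwise sum
    c = ((x & y) | (x & z) | (y & z)) << 1          # majority, shifted left
    return s, c


def csa_multiply(a: int, b: int, n: int = 8) -> int:
    """Accumulate the shifted partial products in carry-save form."""
    s, c = 0, 0
    for i in range(n):
        pp = (a << i) if (b >> i) & 1 else 0        # partial product row i
        s, c = carry_save_add(s, c, pp)
    return s + c                                    # final carry-propagate add


print(hex(csa_multiply(0x89, 0xAB)))
```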

Addition with CLA

Array Multiplier with CSA

Critical Path with Array Multipliers

[Figure: two of the possible paths through the FA and HA cells of the ripple-carry based 4*4 multiplier]

Area = (N*N) AND gates + (N-1)N full adders
Delay = τ_HA + (2N-1) τ_FA
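The two formulas are easy to evaluate for a given word length. In the sketch below the function name `array_multiplier_cost` is hypothetical and the unit delays passed in (1.0 for a half adder, 2.0 for a full adder) are placeholder values, not figures from the lecture.

```python
def array_multiplier_cost(n: int, t_ha: float, t_fa: float):
    """Evaluate Area and Delay of the ripple-carry array multiplier."""
    and_gates = n * n                       # one AND gate per partial-product bit
    full_adders = (n - 1) * n               # adder cells in the array
    delay = t_ha + (2 * n - 1) * t_fa       # critical path
    return and_gates, full_adders, delay


# Placeholder gate delays in arbitrary units
for n in (4, 8, 16, 32):
    print(n, array_multiplier_cost(n, t_ha=1.0, t_fa=2.0))
```

The area grows quadratically with N while the delay grows linearly, which is the trade-off the tree-based multipliers later in the deck try to improve on the delay side.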

Wallace Tree

Array Multiplier + Wallace Tree

Baugh-Wooley Algorithm
• Convert negative partial products to positive representation
• No sign-extension required
2/23/2021 Concordia VLSI Lab

Examples of 5-by-5 Baugh-Wooley
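One widely used way to make all partial products positive is the form often called modified Baugh-Wooley: the partial products that involve exactly one sign bit are complemented, and two constant 1s are added at bit positions n and 2n-1. The sketch below implements that form as an assumption; the slides' 5-by-5 figures are not available here, so it may differ in detail from the arrangement shown in the lecture. The function name `baugh_wooley` is hypothetical.

```python
def baugh_wooley(a: int, b: int, n: int = 5) -> int:
    """Signed n x n multiply using only positively weighted partial products."""
    ua, ub = a & ((1 << n) - 1), b & ((1 << n) - 1)   # two's complement patterns
    abit = [(ua >> i) & 1 for i in range(n)]
    bbit = [(ub >> i) & 1 for i in range(n)]
    total = 0
    for i in range(n - 1):                 # magnitude bits x magnitude bits
        for j in range(n - 1):
            total += (abit[i] & bbit[j]) << (i + j)
    total += (abit[n - 1] & bbit[n - 1]) << (2 * n - 2)   # sign x sign
    for i in range(n - 1):                 # complemented sign x magnitude terms
        total += (1 - (abit[n - 1] & bbit[i])) << (n - 1 + i)
        total += (1 - (abit[i] & bbit[n - 1])) << (n - 1 + i)
    total += (1 << n) + (1 << (2 * n - 1))                # correction constants
    total &= (1 << (2 * n)) - 1            # keep 2n bits
    # interpret the 2n-bit pattern as a signed number
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


print(baugh_wooley(-7, 11), baugh_wooley(-16, -16))
```

No row is sign-extended: every term is a single bit at a fixed weight, which is what keeps the array regular.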

Squarer using Baugh-Wooley Algorithm

  a7 a6 a5 a4 a3 a2 a1 a0
* a7 a6 a5 a4 a3 a2 a1 a0
-------------------------
Partial products (row j is shifted left by j positions):
a7*a0  a6*a0  a5*a0  a4*a0  a3*a0  a2*a0  a1*a0  a0*a0
a7*a1  a6*a1  a5*a1  a4*a1  a3*a1  a2*a1  a1*a1  a0*a1
a7*a2  a6*a2  a5*a2  a4*a2  a3*a2  a2*a2  a1*a2  a0*a2
a7*a3  a6*a3  a5*a3  a4*a3  a3*a3  a2*a3  a1*a3  a0*a3
a7*a4  a6*a4  a5*a4  a4*a4  a3*a4  a2*a4  a1*a4  a0*a4
a7*a5  a6*a5  a5*a5  a4*a5  a3*a5  a2*a5  a1*a5  a0*a5
a7*a6  a6*a6  a5*a6  a4*a6  a3*a6  a2*a6  a1*a6  a0*a6
a7*a7  a6*a7  a5*a7  a4*a7  a3*a7  a2*a7  a1*a7  a0*a7

Example of an 8 bit squarer

Array Multiplier: 32 bits by 32 bits multiplier

Booth (Radix-4) Multiplier
· Radix-4 (3-bit recoding) reduces the number of partial products to be added by half.
· Great saving in area and increased speed.

A = -a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + a_{n-3} 2^{n-3} + ... + a_1 2 + a_0
B = -b_{n-1} 2^{n-1} + b_{n-2} 2^{n-2} + b_{n-3} 2^{n-3} + ... + b_1 2 + b_0

· The base-4 redundant signed-digit representation of B is
B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i
· K_i is calculated by the following equation:
K_i = -2 b_{2i+1} + b_{2i} + b_{2i-1},   i = 0, 1, 2, ..., (n-2)/2
· Three bits of the multiplier B (b_{2i+1}, b_{2i}, b_{2i-1}) are examined and the corresponding K_i is calculated.
· B is always appended on the right with a zero (b_{-1} = 0), and n is always even (B is sign-extended if needed).
· The product A B is then obtained by adding n/2 partial products:
A B = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A
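The recoding equation translates directly into code. The sketch below assumes the standard radix-4 digit K_i = -2 b_{2i+1} + b_{2i} + b_{2i-1} with b_{-1} = 0; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth digits K_i (least significant first) of an n-bit signed B."""
    if n % 2:
        raise ValueError("n must be even (sign-extend B first)")
    ub = b & ((1 << n) - 1)                  # two's complement bit pattern of B

    def bit(k: int) -> int:
        return 0 if k < 0 else (ub >> k) & 1     # b_{-1} = 0

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """P = sum over i of 4**i * K_i * A, i.e. n/2 partial products."""
    return sum(k * a * 4 ** i for i, k in enumerate(booth_radix4_digits(b, n)))


print(booth_radix4_digits(0b011101, 6))      # digits of 29
print(booth_multiply(25, -35, 8))
```

Each digit is in {-2, -1, 0, 1, 2}, so every partial product is obtained from A by at most a shift and a negation.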

Booth Algorithm
Decoding of the multiplier to generate signals for hardware use
[Table: for each combination of X_{i+1}, X_i, X_{i-1}, the operation OP and the control signals NEG, ZERO, and TWO]

Booth Algorithm
A Booth-recoded multiplier examines three bits of the multiplier at a time. It determines whether to add 0, +1, -1, +2, or -2 times the multiplicand at that rank. The operation to be performed is based on the current two bits of the multiplier and the previous bit.

X_{i+1}  X_i  X_{i-1}  Z_{i/2}
0        0    0         0
0        0    1         1
0        1    0         1
0        1    1         2
1        0    0        -2
1        0    1        -1
1        1    0        -1
1        1    1         0

BIT (weights 2^1, 2^0, 2^-1)
X_i  X_{i+1}  X_{i+2}   OPERATION                                               M is multiplied by
0    0        0         add zero (no string)                                    +0
0    0        1         add multiplicand (end of string)                        +X
0    1        0         add multiplicand (a string)                             +X
0    1        1         add twice the multiplicand (end of string)              +2X
1    0        0         subtract twice the multiplicand (beginning of string)   -2X
1    0        1         subtract the multiplicand (-2X and +X)                  -X
1    1        0         subtract the multiplicand (beginning of string)         -X
1    1        1         subtract zero (center of string)                        -0

Booth Algorithm: dot notation
[Figure: dot diagram with multiplicand A, multiplier B grouped in pairs of bits, partial product bits (B1 B0)_2 A 4^0 and (B3 B2)_2 A 4^1, and product P]

Example
The following example shows that the calculation is done properly.

Multiplicand X = 000011
Multiplier   Y = 011101, with a 0 appended on the right: 0 1 1 1 0 1 0

After Booth decoding, Y multiplies X by +2, -1, and +1 separately; each partial product is shifted two bits relative to the previous one, and they are added together.

X * (+1)              000000000011
X * (-1), shifted 2   111111110100
X * (+2), shifted 4   000001100000
----------------------------------
                      000001010111
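The example can be replayed in a fixed 12-bit word to show where the sign extension of the negative partial product comes from. This is an illustrative sketch: the function name `booth_example` and the 12-bit width (twice the 6-bit operand width) are assumptions, and the Booth digits +1, -1, +2 are taken from the example, least significant digit first.

```python
def booth_example(x: int = 0b000011, digits=(1, -1, 2), width: int = 12):
    """Return the sign-extended partial-product rows and their sum as bit strings."""
    mask = (1 << width) - 1
    acc = 0
    rows = []
    for i, k in enumerate(digits):
        # Masking a negative Python int yields its two's complement pattern,
        # which is exactly the sign-extended partial product.
        pp = ((k * x) << (2 * i)) & mask
        rows.append(format(pp, f"0{width}b"))
        acc = (acc + pp) & mask
    return rows, format(acc, f"0{width}b")


rows, total = booth_example()
for r in rows:
    print(r)
print(total, int(total, 2))
```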

Sign Extension
• Traditional sign-extension scheme
– Segment the input operands based on the size of the embedded blocks
– Multiply the segmented inputs and extend the sign bit of each partial product
– Sum all partial products
[Figure: segmented input operands are multiplied, the partial products are sign-extended, and their sum gives the final result]

Booth Algorithm: Example 1

Booth Algorithm: Example 2
Notice the sign extensions

Booth Algorithm: Example 3
Notice the sign extensions

Comparison of Booth and parallel multiplier shift and add

Template to reduce sign extensions for Booth Algorithm
Please note that each operand is 17 bits, i.e. the 17th bit is the sign bit. Also, negative numbers are entered as 1's complement, which is why the S must be added on the right-hand side of the diagram. If 2's complement is used, the S's on the right side of the diagram can be removed.

Comparison of Template and the sign extension

[Figure: partial product matrix generated for a 16 * 16 bit multiplication using Booth and the template given in the previous slide; bit positions 31 down to 0, with rows A0 through A8 and sign bits S]

Example of using the template
25 * -35, with -35 as the multiplier, using 8-bit representation.

25 = 0 0 0 1 1 0 0 1 (sign bit 0)
Partial products, least significant first: 25 * 1, 25 * -1, 25 * 2, 25 * -1
[Figure: partial-product rows with the template annotations "Add SS", "Add inverted S", "Add inverted sign and add 1", "Add inverted sign bit", and "No sign bit"]

The sum is a negative number. Converting it gives 512 + 256 + 64 + 32 + 8 + 2 + 1 = 875, so the product is -875.

Booth Multiplier Components
[Figure: the Multiplier feeds the Booth Encoder; the Multiplicand and the encoder outputs feed the PPU (partial products unit), followed by the PPA (partial products adding unit), which produces the Product]

Wallace Tree and Ripple Carry Adder Structure of 8*8 Multiplier with Pipeline

Hardware implementation of Booth with shift and add

Simulation Plan

Testing the Design

Simulation for Parallel Multipliers
[Figures: simulation for signed numbers and for unsigned numbers]

Simulation for Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the D flip-flop delays.

FPGA after implementation, with the programmed areas shown clearly

Another implementation of the above after pipelining; place and route has placed the design in different locations.

Spartacus FPGA board

Testing the multiplication system

Comparison of Multipliers
Table 7. Performance comparison for two's complement multipliers (by Chen Yaoquan, M. Eng. 2005)

A = area, total CLBs (#); D = maximum delay (ns); P = total dynamic power (W); DP = delay-power product (ns W); AP = area-power product (# W); AD = area-delay product (# ns); AD2 = area-delay^2 product (# ns^2)

Multiplier                    A        D                   P     DP      AP        AD        AD2
Array                         3076.50  35.78               7.52  268.98  23128.20  1.10E+05  3.94E+06
Modified Booth                2649.50  24.43               6.33  154.64  16771.60  6.47E+04  1.58E+06
Wallace Tree                  3325.50  18.93               7.46  141.14  24793.93  6.30E+04  1.19E+06
Modified Booth-Wallace Tree   2672.50  18.53               6.41  118.76  17127.79  4.95E+04  9.18E+05
Twin Pipe Serial-Parallel     490.00   107.52 (3.36 x 32)  0.28  30.62   139.54    5.27E+04  5.66E+06
Behavioral                    2993.50  49.33               6.24  307.58  18665.07  1.48E+05  7.28E+06

Comparison of Multipliers
Table 7. Performance comparison for unsigned multipliers (by Chen Yaoquan, M. Eng. 2005)

A = area, total CLBs (#); D = maximum delay (ns); P = total dynamic power (W); DP = delay-power product (ns W); AP = area-power product (# W); AD = area-delay product (# ns); AD2 = area-delay^2 product (# ns^2)

Multiplier                    A        D       P     DP      AP        AD        AD2
Array                         3280.50  37.23   7.57  281.88  24837.98  1.22E+05  4.55E+06
Modified Booth                2800.00  25.33   6.66  168.77  18656.40  7.09E+04  1.80E+06
Wallace Tree                  3321.50  18.93   7.32  138.60  24319.36  6.29E+04  1.19E+06
Modified Booth-Wallace Tree   2845.50  18.33   6.66  122.13  18959.57  5.22E+04  9.56E+05
Twin Pipe Serial-Parallel     487.00   107.52  0.29  30.66   138.89    5.24E+04  5.63E+06
Behavioral                    3003.00  44.50   6.26  278.53  18795.78  1.34E+05  5.95E+06
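The derived columns follow from area, delay, and power by simple products. The sketch below recomputes them for one row; the function name `figures_of_merit` is hypothetical, and the inputs are the rounded Wallace-tree values from the two's complement comparison, so the results agree with the tabulated ones only to within rounding.

```python
def figures_of_merit(area: float, delay_ns: float, power_w: float) -> dict:
    """Delay-power, area-power, area-delay and area-delay^2 products."""
    return {
        "DP": delay_ns * power_w,          # ns W
        "AP": area * power_w,              # # W
        "AD": area * delay_ns,             # # ns
        "AD2": area * delay_ns ** 2,       # # ns^2
    }


# Wallace-tree multiplier, two's complement case (rounded table inputs)
fom = figures_of_merit(area=3325.50, delay_ns=18.93, power_w=7.46)
for name, value in fom.items():
    print(f"{name}: {value:.4g}")
```

Which product matters depends on the design goal: AD2 weights speed most heavily, while AP favours the small serial-parallel design.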

Comparison of Multipliers
[Table: area (#), power (W), and delay (ns) of the behavioral multiplier as the value of "set_max_delay" in the script file is changed over 0, 10, 20, 30, 40, 50, 60, and >60 ns]
The relation of area and delay for the behavioral multiplier: the "banana curve"

Comparison of Multipliers (by Chen Yaoquan, M. Eng. 2005)
Columns: Array Multiplier, Modified Booth Multiplier, Wallace Tree Multiplier, Modified Booth-Wallace Tree Multiplier, Twin Pipe Serial-Parallel Multiplier, Behavioral Multiplier
Area: Medium Small Large Smallest Medium
Critical Delay: Medium Fast Very Fastest Very Large
Power Consumption: Large Medium Smallest Medium
Complexity: Simple Complex More Complex Simplest
Implement: Easy Medium Difficult Easy Easiest

Pipelining Simulation

Synthesis for Signed Multipliers
[Figures: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]

Synthesis for Unsigned Multipliers
[Figures: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]

Conclusion
• Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
• Wallace Tree has the best performance, but it is hard to implement.
• Booth-algorithm-based multipliers have the lowest area among the parallel multipliers.
• For behavioral multipliers, the area increases as the delay decreases.

Comparison

A = area, total CLBs (#); D = maximum delay (ns); P = power consumption at highest speed (mW); DP = delay-power product (ns mW); AP = area-power product (# mW); AD = area-delay product (# ns); AD2 = area-delay^2 product (# ns^2)

Multiplier                     A     D                 P                       DP       AP             AD              AD2
Array                          1165  187.87            16.6506 (at 188 ns)     3128.15  19.397 x 10^3  218.868 x 10^3  41.119 x 10^6
Modified Booth                 1292  139.41            23.136 (at 140 ns)      3225.39  29.891 x 10^3  180.118 x 10^3  25.110 x 10^6
Wallace Tree                   1659  101.14            30.95 (at 101.14 ns)    3130.28  51.346 x 10^3  167.791 x 10^3  16.970 x 10^6
Modified Booth & Wallace Tree  1239  101.43            30.862 (at 101.43 ns)   3130.33  38.238 x 10^3  125.671 x 10^3  12.747 x 10^6
Twin Pipe Serial-Parallel      133   22.58 (722.56)    2.089 (at 722.56 ns)    1509.42  277.837        96.101 x 10^3   69.438 x 10^6

NOTICE
· The rest of these slides are for extra information only and are not part of the lecture

Array Addition

Addition of 8 binary numbers using the Wallace tree principle

Baugh-Wooley two's complement multiplier

Cluster Multipliers
[Figure: the multiplier divided into smaller multipliers]
[Figure: the circuit used to generate the enable signal]
[Figure: 8-bit cluster low-power multiplier]
• Dividing the multiplication circuit into clusters (blocks) of smaller multipliers
• Applying clock gating techniques to disable the blocks that are producing a zero result
• Features
– Low power (claims 13.4% savings)
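The clustering idea can be sketched behaviourally by building an 8 x 8 multiplication from four 4 x 4 blocks and counting how many blocks could be disabled. Everything specific here is an assumption for illustration: the slides do not give the enable logic, so the sketch simply treats a block as gated whenever one of its 4-bit inputs is zero, and the names `mul4` and `cluster_mul8` are hypothetical.

```python
def mul4(a: int, b: int, stats: dict) -> int:
    """4 x 4 block; counted as gated (disabled) when its result must be zero."""
    if a == 0 or b == 0:
        stats["gated"] += 1        # assumed enable rule: a zero input nibble
        return 0
    stats["active"] += 1
    return a * b


def cluster_mul8(a: int, b: int):
    """8 x 8 unsigned multiply from four 4 x 4 clusters; returns (product, stats)."""
    stats = {"active": 0, "gated": 0}
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    product = ((mul4(ah, bh, stats) << 8)
               + ((mul4(ah, bl, stats) + mul4(al, bh, stats)) << 4)
               + mul4(al, bl, stats))
    return product, stats


print(cluster_mul8(0x89, 0xAB))
print(cluster_mul8(0x0F, 0x03))    # small operands leave three blocks idle
```

The power saving in such a scheme depends on how often operands have zero upper parts, which is why it suits data with small magnitudes.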

Multiplexer-Based Array Multipliers
[Figure: multiplier array built from the terms Z_j and x_j y_j]

Two types of cells:
• Cell 1: produces the terms Z_ij 2^j and includes a full adder of the carry-save adder array
• Cell 2: produces the terms x_j y_j 2^j and includes a full adder of the carry-save adder array

• Characteristics
– Faster than Modified Booth
– Unlike Booth, does not require encoding logic
– Requires approximately N^2/2 cells
– Has a zigzag shape, thus not layout-friendly

• Improvement
– More rectangular layout
– Saves up to 40 percent area without penalties
– Outperforms the modified Booth multiplier in both speed and power by 13% to 26%

Gray-Encoded Array Multiplier

Dec  Hyb     Dec  Hyb     Dec  Hyb     Dec  Hyb
0    0000    4    0100    -8   1100    -4   1000
1    0001    5    0101    -7   1101    -3   1001
2    0011    6    0111    -6   1111    -2   1011
3    0010    7    0110    -5   1110    -1   1010

• 2's complement Hybrid Coding
– Consecutive values within each group of four differ in a single bit
– This reduces the number of transitions, and thus the power (for highly correlated streams)
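The hybrid code table is consistent with Gray-coding each 2-bit group of the two's complement word separately. That reading is inferred from the table values rather than stated in the slides, so the sketch below should be taken as an assumption; the function name `hybrid_encode` is hypothetical.

```python
def hybrid_encode(value: int, n: int = 4) -> int:
    """Gray-code each 2-bit group of the n-bit two's complement pattern."""
    u = value & ((1 << n) - 1)               # two's complement bit pattern
    out = 0
    for shift in range(0, n, 2):
        group = (u >> shift) & 0b11
        out |= (group ^ (group >> 1)) << shift   # 2-bit binary-to-Gray
    return out


for v in list(range(0, 8)) + list(range(-8, 0)):
    print(f"{v:3d} {hybrid_encode(v):04b}")
```

Because the Gray conversion is applied per group, the single-bit-change property holds inside a group of four values but not across group boundaries such as 3 to 4.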

Gray-Encoded Array Multiplier
[Figure: an 8-bit wide 2's complement radix-4 array multiplier]

• Characteristics
– Uses Gray code to reduce the switching activity of the multiplier
– Saves 45.6% power compared with Modified Booth
– Uses 26.4% more area than Modified Booth

Ultra-high Speed Parallel Multiplier
• How is ultra-high speed achieved?
– Based on the Modified Booth Algorithm and a tree structure (column compression)
– Chooses efficient counters (3:2 and 5:3)
– Uses the new compressor (20% faster)
– Uses the First Partial product Addition (FPA) algorithm (reducing the bits of the CLA by 50%)
• Divide into 3 rows or 5 rows only (most efficient).
• Calculate the partial products as soon as possible.
• The final CLA is only 16 bits instead of 32 bits.
[Figure: calculation process using parallel counters in the case of 16 x 16]
• In total, the delay is reduced by about 30%

ULLRLF Multiplier
• ULLRLF stands for Upper/Lower Left-to-Right Leapfrog.
• Combines the following techniques:
– Signal flow optimization in the [3:2] adder array for partial product reduction,
– Left-to-right leapfrog (LRLF) signal flow,
– Splitting of the reduction array into upper/lower parts.

1) Signal flow optimization in the [3:2] adder array
– PPij is always connected to pin A
– Sin/Cin are connected to B/C; most Sin signals are connected to C
– For n = 32, the delay is reduced by 30 percent.
– Power is also saved.

2) Left-to-Right Leapfrog (LRLF) Structure
– The sum signals skip over alternate rows.
– The signal delays are more balanced.
– Low power.

3) Upper/Lower Split Structure
– Only n+2 bits
– The long data path is broken into parallel short paths, which saves power.
– The delay of the partial product reduction is reduced.

• ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area.
• With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.
[Figure: floorplan of ULLRLF (n = 32)]

Signed Array Multiplier

Unsigned Array Multiplier

Signed Modified Booth Multiplier

Unsigned Modified Booth Multiplier

Wallace Tree multipliers

Wallace Tree multipliers
• Use the 3:2 counters and 2:2 counters
• Number of levels for 32 partial products: 8 (the estimate log (32/2) / log (3/2) ≈ 6.8 is a lower bound)
• Irregular structure
• Fast
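The level count can be checked by simulating the row reduction: at each level every group of three rows is compressed to two by 3:2 counters, and leftover rows pass through. The sketch below assumes that standard reduction rule; the function names `wallace_levels` and `log_estimate` are hypothetical.

```python
import math


def wallace_levels(rows: int) -> int:
    """Number of 3:2 reduction levels needed to reach two rows."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3    # each triple of rows becomes a pair
        levels += 1
    return levels


def log_estimate(rows: int) -> float:
    """Lower bound from the best-case shrink factor of 3/2 per level."""
    return math.log(rows / 2) / math.log(3 / 2)


for n in (16, 32):
    print(n, wallace_levels(n), round(log_estimate(n), 2))
```

Halving the number of partial products with Modified Booth recoding (32 rows down to 16) therefore removes two reduction levels.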

Wallace Tree multipliers: 2-level hierarchical

Modified Booth-Wallace Tree Multipliers
• Use the 3:2 counters and 2:2 counters
• Number of levels for 16 partial products: 6 (the estimate log (16/2) / log (3/2) ≈ 5.1 is a lower bound)
• Irregular structure
• Fast
• Less area

Twin pipe serial-parallel multipliers

Signed twin pipe serial-parallel multipliers
• "Sign" control line and the sign-change hardware

Unsigned twin pipe serial-parallel multipliers
• Don't need the "Sign" control line and the sign-change hardware