CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture

  • Slides: 23

CMPEN 411 VLSI Digital Circuits, Spring 2009
Lecture 20: Multiplier Design
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, © 2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp 09 CMPEN 411 L 20 S. 1


Review: Basic Building Blocks
• Datapath
  - Execution units
    - Adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
• Control
  - Finite state machines (PLA, ROM, random logic)
• Interconnect
  - Switches, arbiters, buses
• Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers


The Binary Multiplication


Multiply Operation
• Multiplication is just a lot of additions
[Figure: an N-bit multiplicand and an N-bit multiplier form a partial product array of N rows, which can be formed in parallel; the double precision product is 2N bits wide]
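The idea that multiplication is a sum of partial product rows can be sketched in Python. This is an illustrative software model for unsigned operands, not the lecture's circuit; the function names `partial_products` and `multiply` are made up for this example.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list:
    """Return the N rows of the partial product array (unsigned operands).

    Row i is the multiplicand ANDed with multiplier bit i, shifted left
    by i positions so that it lines up with its weight 2**i.
    """
    rows = []
    for i in range(n):
        bit = (multiplier >> i) & 1          # multiplier bit i selects the row
        rows.append((multiplicand * bit) << i)
    return rows


def multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Add up all rows; the result fits in 2N bits."""
    return sum(partial_products(multiplicand, multiplier, n))
```

For example, `partial_products(9, 7, 4)` gives the rows 9, 18, 36 and 0, which sum to 63.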


Multiplication Approaches
• Right shift and add
  - Partial product array rows are accumulated from top to bottom on an N-bit adder
    - After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
  - Time for N bits: T_serial_mult = O(N T_adder) = O(N^2) for an RCA
• Making it faster
  - Use a faster adder
  - Use higher radix (e.g., base 4) multiplication: O(N/2 T_adder)
    - Use multiplier recoding to simplify multiple formation (Booth)
  - Form the partial product array in parallel and add it in parallel
• Making it smaller (i.e., slower)
  - Use serial-parallel mult
  - Use an array multiplier
    - Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI
    - Can be easily and efficiently pipelined
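The right shift and add scheme can be modeled in a few lines of Python. This is a minimal sketch assuming unsigned operands and a register pair (`acc`, `low`) that starts with the multiplier in the low half; the names are illustrative and do not come from the slides.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Unsigned N-bit multiply using one N-bit add and one right shift per step."""
    mask = (1 << n) - 1
    acc = 0                    # upper half of the running product
    low = multiplier & mask    # lower half; initially holds the multiplier
    for _ in range(n):
        if low & 1:
            # One pass through the N-bit adder; acc may briefly need N+1 bits.
            acc += multiplicand & mask
        # Right shift the (acc, low) pair by one bit to align with the next row.
        low = (low >> 1) | ((acc & 1) << (n - 1))
        acc >>= 1
    return (acc << n) | low
```

Each loop iteration costs one adder delay, so N iterations on a ripple carry adder give the O(N^2) total noted above.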


Serial-parallel multiplier structure


The Array Multiplier


Booth multiplier
• Encoding scheme to reduce the number of stages in multiplication.
• Performs two bits of multiplication at once, so it requires half the stages.
• Each stage is slightly more complex than in a simple multiplier, but an adder/subtractor is almost as small and fast as an adder.


Booth encoding
• Two’s-complement form of multiplier (first bit is the sign bit):
  - y = -2^n y_n + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + ...
  - (example: y = 18 = 010010, y = -18 = 101110)
• Rewrite using 2^a = 2^{a+1} - 2^a:
  - y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + ...
• Consider first two terms: by looking at three bits of y, we can determine whether to add x or 2x to the partial product.


Booth actions
• y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + ...
• Consider first two terms: by looking at three bits of y, we can determine whether to add x or 2x to the partial product.

  y_i y_{i-1} y_{i-2}   increment
  0 0 0                  0
  0 0 1                  x
  0 1 0                  x
  0 1 1                  2x
  1 0 0                  -2x
  1 0 1                  -x
  1 1 0                  -x
  1 1 1                  0
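The action table can be turned directly into a recoder. The sketch below assumes an even bit width n, a two's-complement multiplier passed as an ordinary Python integer, and an implicit 0 to the right of bit 0; `BOOTH_ACTION` and `booth_digits` are names chosen for this example only.

```python
# Booth action for each three-bit window, written as a multiple of x.
BOOTH_ACTION = {
    (0, 0, 0): 0,  (0, 0, 1): 1,  (0, 1, 0): 1,  (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(y: int, n: int) -> list:
    """Recode an n-bit two's-complement multiplier into n/2 digits.

    Digits are returned least significant first; digit k has weight 4**k.
    """
    def bit(i: int) -> int:
        # The bit to the right of bit 0 is taken to be 0.
        return 0 if i < 0 else (y >> i) & 1

    # Windows overlap by one bit: (y1 y0 y-1), (y3 y2 y1), ...
    return [BOOTH_ACTION[(bit(i + 1), bit(i), bit(i - 1))]
            for i in range(0, n, 2)]
```

For y = 7 with n = 4 this returns `[-1, 2]`, that is 7 = -1 + 2·4, so only two additions are needed instead of four.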


Booth example
• x = 1001 (9 in decimal), y = 0111 (7 in decimal)
• P0 = 0000
• y1 y0 y-1 = 11(0) = 110, P1 = P0 - (1001) = 11110111
• x is shifted left by 2 bits to become 100100
• y3 y2 y1 = 011, P2 = P1 + (10 × 100100) = 11110111 + 01001000 = 00111111 (63 in decimal), keeping 8 bits and discarding the carry out
• An array multiplier needs N additions; a Booth multiplier needs only N/2 additions
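The example can be replayed step by step in Python. This is a sketch under two assumptions taken from the worked numbers: x is treated as an unsigned value, and partial products are kept as 2n-bit patterns. The function name `booth_trace` is hypothetical.

```python
def booth_trace(x: int, y: int, n: int) -> list:
    """Return the partial products P1, P2, ... as 2n-bit patterns.

    x is the multiplicand (unsigned here), y an n-bit two's-complement
    multiplier, and n is assumed to be even.
    """
    mask = (1 << (2 * n)) - 1

    def bit(i: int) -> int:
        return 0 if i < 0 else (y >> i) & 1

    p = 0
    trace = []
    for i in range(0, n, 2):
        # Same value as the action table: -2*y(i+1) + y(i) + y(i-1).
        digit = -2 * bit(i + 1) + bit(i) + bit(i - 1)
        # x << i is the multiplicand aligned to this pair of multiplier bits.
        p = (p + digit * (x << i)) & mask
        trace.append(p)
    return trace


if __name__ == "__main__":
    for step, p in enumerate(booth_trace(0b1001, 0b0111, 4), start=1):
        print(f"P{step} = {p:08b}")
```

Running it prints `P1 = 11110111` and `P2 = 00111111`, matching the two steps above.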


Review: A 64-bit Adder/Subtractor
• Ripple Carry Adder (RCA) built out of 64 FAs
• Subtraction: complement all subtrahend bits (xor gates) and set the low order carry-in
• RCA
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: chain of 64 1-bit FAs; stage i takes A_i, B_i and C_i and produces S_i and C_{i+1}, with C_0 = C_in and C_64 = C_out; an add/subt control input selects the operation]
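A bit-level Python model of the adder/subtractor may help make the subtraction trick concrete. It is a sketch only: `full_adder` and `ripple_add_sub` are illustrative names, and the single `subtract` flag stands in for the add/subt control that drives both the xor gates and the low order carry-in.

```python
def full_adder(a: int, b: int, cin: int):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_add_sub(a: int, b: int, subtract: bool, n: int = 64):
    """n-bit ripple carry add or subtract: returns (result, carry_out)."""
    ctrl = 1 if subtract else 0
    carry = ctrl                        # low order carry-in is set for subtraction
    result = 0
    for i in range(n):                  # the carry ripples through all n stages
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ ctrl     # xor gate complements the subtrahend bit
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result, carry
```

With n = 8, `ripple_add_sub(5, 3, True, 8)` returns `(2, 1)` and `ripple_add_sub(3, 5, True, 8)` returns `(254, 0)`, the two's-complement pattern for -2.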


Booth structure


Wallace-Tree Multiplier


Wallace-Tree Multiplier
• Full adder = (3,2) compressor


Making it Faster: Tree Multiplier Structure
[Figure: multiple forming circuits (muxes choosing between 0 and the multiplicand D (‘icand), controlled by the multiplier Q (‘ier)) form the partial product array; a reduction tree and a fast carry propagate adder (CPA) then produce P (product); the blocks are joined by interconnect]
• Delay: mux + reduction tree (log N) + CPA (log N)
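The logarithmic depth of the reduction tree can be seen in a word-level Python sketch. It assumes unsigned operands and uses rows of full adders as (3,2) compressors, grouping rows three at a time at each level; this grouping is one simple choice, not necessarily the one drawn on the slide, and the function names are illustrative.

```python
def carry_save_add(a: int, b: int, c: int):
    """One row of full adders: three rows in, a sum row and a carry row out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries move one position up
    return s, carry


def tree_multiply(multiplicand: int, multiplier: int, n: int):
    """Return (product, levels), where levels counts the reduction stages."""
    # Form all partial product rows in parallel.
    rows = [(multiplicand if (multiplier >> i) & 1 else 0) << i
            for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for k in range(0, full, 3):
            reduced.extend(carry_save_add(rows[k], rows[k + 1], rows[k + 2]))
        reduced.extend(rows[full:])              # leftover rows pass through
        rows = reduced
        levels += 1
    return sum(rows), levels                     # final carry propagate add
```

With this grouping, 4 rows need 2 levels and 8 rows need 4 levels, so the depth grows much more slowly than the number of rows.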


(4,2) Counter
• Built out of two (3,2) counters (just FA’s!)
  - all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
  - the internal carry output is fed to the next higher weight position (indicated in the figure)
• Note: Two carry outs, one “internal” and one “external”
[Figure: a (4,2) counter drawn as two (3,2) counters]
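A bit-level sketch of the (4,2) counter built from two full adders is given below. The wiring shown (the first full adder producing the internal carry, the second producing the external one) is the usual arrangement and is assumed here; the function and variable names are illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, c: int):
    """(3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """Return (sum, carry_external, carry_internal) for one bit position."""
    # The internal carry depends only on external inputs, so it cannot
    # ripple along a row of neighboring counters.
    s1, carry_internal = full_adder(x1, x2, x3)
    s, carry_external = full_adder(s1, x4, cin)
    return s, carry_external, carry_internal


def holds_for_all_inputs() -> bool:
    """Check that the five equal-weight inputs are counted correctly."""
    for bits in product((0, 1), repeat=5):
        s, c_ext, c_int = counter_4_2(*bits)
        if sum(bits) != s + 2 * (c_ext + c_int):
            return False
    return True
```

Both carry outputs have twice the weight of the sum output, which is why the check compares against `s + 2 * (c_ext + c_int)`.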


Tiling (4,2) Counters
• Reduces columns four high to columns only two high
  - Tiles with neighboring (4,2) counters
  - Internal carry in at same “level” (i.e., bit position weight) as the internal carry out
[Figure: each (4,2) counter drawn as two (3,2) counters]
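Tiling can be modeled at word level, with every bit position of four rows handled by its own (4,2) counter. The sketch below is an assumption-laden software model (Python integers stand in for rows, and `compress_4_to_2` is a hypothetical name); a left shift by one represents a carry moving to the next higher weight position.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise carry function of a full adder."""
    return (a & b) | (a & c) | (b & c)


def compress_4_to_2(r0: int, r1: int, r2: int, r3: int):
    """Reduce four rows to two using one row of tiled (4,2) counters."""
    s1 = r0 ^ r1 ^ r2                          # first full adder in every position
    # Internal carry out of position i becomes the internal carry in of i+1.
    c_internal = majority(r0, r1, r2) << 1
    total = s1 ^ r3 ^ c_internal               # second full adder in every position
    c_external = majority(s1, r3, c_internal) << 1
    return total, c_external
```

The two output rows always add up to the sum of the four input rows. Because `c_internal` is computed only from `r0`, `r1` and `r2`, no carry chain runs along the row, so the delay does not depend on the word width.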


4x4 Partial Product Array Reduction
• Fast 4x4 multiplication using (4,2) counters
• How would you lay it out?
[Figure: multiplicand and multiplier form the partial product array; five (4,2) counters produce the reduced pp array (to CPA); a 5-bit CPA produces the 8-bit double precision product]


8x8 Partial Product Array Reduction
• Wallace tree multiplier
[Figure: ‘icand and ‘ier form the partial product array; two rows of nine (4,2) counters produce the reduced partial product array; one row of thirteen (4,2) counters feeds a 13-bit fast CPA]
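The two-level structure (eight rows to four, four rows to two, then a CPA) can be sketched at word level in Python. This model assumes unsigned operands and does not reproduce the exact counter counts or the CPA width from the figure; `compress_4_to_2` and `multiply_8x8` are illustrative names.

```python
def compress_4_to_2(r0: int, r1: int, r2: int, r3: int):
    """One row of (4,2) counters: four rows in, two rows out."""
    def majority(a, b, c):
        return (a & b) | (a & c) | (b & c)

    s1 = r0 ^ r1 ^ r2
    c_internal = majority(r0, r1, r2) << 1
    return s1 ^ r3 ^ c_internal, majority(s1, r3, c_internal) << 1


def multiply_8x8(multiplicand: int, multiplier: int) -> int:
    """Unsigned 8x8 multiply with two levels of 4-to-2 reduction."""
    rows = [(multiplicand if (multiplier >> i) & 1 else 0) << i
            for i in range(8)]
    # First level: two rows of counters, each reducing four rows to two.
    upper = compress_4_to_2(*rows[0:4])
    lower = compress_4_to_2(*rows[4:8])
    # Second level: one row of counters reduces the remaining four rows.
    s, c = compress_4_to_2(*upper, *lower)
    # Final carry propagate add produces the 16-bit product.
    return (s + c) & 0xFFFF
```

Only the last step involves carry propagation; both reduction levels have a fixed delay regardless of operand width.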


An 8x8 Multiplier Layout
• How should it be laid out?
[Figure: multiplicand, multiplier, nine (4,2) counters, thirteen (4,2) counters, 13-bit CPA]


Multipliers: Summary


Next Lecture and Reminders
• Next lecture
  - Shifters, decoders, and multiplexers
    - Reading assignment: Rabaey, et al, 11.5-11.6