CSE 477 VLSI Digital Circuits, Fall 2003
Lecture 21: Multiplier Design
Mary Jane Irwin (www.cse.psu.edu/~mji)
www.cse.psu.edu/~cg477
[Adapted from Rabaey's Digital Integrated Circuits, Second Edition, © 2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
CSE 477 L21 Multiplier Design.1, Irwin&Vijay, PSU, 2003

Review: Basic Building Blocks
• Datapath
  - Execution units
    - Adder, multiplier, divider, shifter, etc.
  - Register file and pipeline registers
  - Multiplexers, decoders
• Control
  - Finite state machines (PLA, ROM, random logic)
• Interconnect
  - Switches, arbiters, buses
• Memory
  - Caches (SRAMs), TLBs, DRAMs, buffers

Review: Adder Comparisons
[Figure: adder comparison chart with the following labels]
  - MCC, RCA: best area*power*speed (N)
  - CSkip: best area*speed (√N)
  - KS PPA: best speed (log N)

Multiply Operation
• Multiplication is just a lot of additions
[Figure: an N-bit multiplicand and an N-bit multiplier form a partial product array of N rows, which can be formed in parallel; the double precision product is 2N bits wide]
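To make the "lot of additions" concrete, here is a minimal Python sketch (not from the lecture) that forms the partial product array for unsigned operands and sums its rows. The function name `partial_products` and the choice to model each row as a shifted integer are assumptions made for illustration.

```python
def partial_products(multiplicand, multiplier, n):
    """Return the n rows of the partial product array (hypothetical helper).

    Row i is the multiplicand shifted left by i if bit i of the
    multiplier is 1, otherwise 0. Every row depends only on the two
    operands, so all rows can be formed in parallel.
    """
    return [(multiplicand << i) if (multiplier >> i) & 1 else 0
            for i in range(n)]


rows = partial_products(0b1011, 0b1101, 4)  # 11 * 13
print(rows)       # [11, 0, 44, 88]
print(sum(rows))  # 143, which fits in 2N = 8 bits
```

Adding the N rows is where the cost lies; the rest of the lecture is about doing that addition quickly.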

Multiplication Approaches
• Right shift and add
  - Partial product array rows are accumulated from top to bottom on an N-bit adder
    - After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
  - Time for N bits: T_serial_mult = O(N T_adder) = O(N^2) for an RCA
• Making it faster
  - Use a faster adder
  - Use higher radix (e.g., base 4) multiplication: O(N/2 T_adder)
    - Use multiplier recoding to simplify multiple formation
  - Form the partial product array in parallel and add it in parallel
• Making it smaller (i.e., slower)
  - Use an array multiplier
    - Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI
    - Can be easily and efficiently pipelined
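The right shift and add scheme can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the lecture's hardware: operands are unsigned, the names `hi` and `lo` stand for the upper and lower halves of a 2N-bit product register, and `shift_add_multiply` is a hypothetical function name.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Serial right-shift-and-add multiplication of two unsigned n-bit values."""
    hi, lo = 0, multiplier        # lo initially holds the multiplier
    for _ in range(n):            # one adder pass per multiplier bit
        if lo & 1:                # current multiplier bit selects the row
            hi += multiplicand    # n-bit add; hi may briefly need n+1 bits
        # Right shift the combined (hi, lo) register by one bit:
        # the low bit of hi moves into the top bit of lo.
        lo = (lo >> 1) | ((hi & 1) << (n - 1))
        hi >>= 1
    return (hi << n) | lo


print(shift_add_multiply(0b1011, 0b1101, 4))  # 143
```

The loop runs N times and each pass costs one adder delay, which is where the O(N T_adder) figure comes from.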

Making it Faster: Tree Multiplier Structure
[Figure: tree multiplier block diagram. The multiplier Q ('ier) drives muxes with inputs 0 and D, where D is the multiplicand ('icand); these multiple forming circuits feed, through the interconnect, a partial product array reduction tree, followed by a fast carry propagate adder (CPA) that produces P (product)]
mux + reduction tree (log N) + CPA (log N)

(4,2) Counter
• Built out of two (3,2) counters (just FA's!)
  - all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
  - the internal carry output is fed to the next higher weight position (indicated in the figure)
[Figure: two stacked (3,2) counters forming one (4,2) counter]
Note: Two carry outs - one "internal" and one "external"
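A bit-level Python sketch of this cell may help. It assumes the common arrangement in which the first full adder takes three of the external inputs and produces the internal carry out, and the second full adder takes the intermediate sum, the fourth external input, and the internal carry in. The names `full_adder` and `counter_4_2` are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """A (3,2) counter: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1, x2, x3, x4, c_in):
    """A (4,2) counter built from two full adders.

    Returns (sum, external_carry, internal_carry_out). Both carries
    have twice the weight of the inputs.
    """
    s1, c_int_out = full_adder(x1, x2, x3)  # does not depend on c_in
    s, c_ext = full_adder(s1, x4, c_in)
    return s, c_ext, c_int_out


# Check that the cell conserves the arithmetic value for every input pattern.
for x1, x2, x3, x4, c_in in product((0, 1), repeat=5):
    s, c_ext, c_int_out = counter_4_2(x1, x2, x3, x4, c_in)
    assert x1 + x2 + x3 + x4 + c_in == s + 2 * (c_ext + c_int_out)
```

In this arrangement the internal carry out is a function of the external inputs only, so a row of such cells does not ripple from end to end.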

Tiling (4,2) Counters
• Reduces columns four high to columns only two high
  - Tiles with neighboring (4,2) counters
  - Internal carry in at same "level" (i.e., bit position weight) as the internal carry out
[Figure: a row of (4,2) counters, each drawn as two (3,2) counters, tiled side by side]
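The tiling can be sketched column by column in Python. This is a minimal model, assuming four unsigned rows of a known width and the two-full-adder cell arrangement; `reduce_4_to_2` and the helper names are hypothetical, and one extra column is processed so that the last internal carry is absorbed.

```python
def full_adder(a, b, c):
    """A (3,2) counter: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_4_2(x1, x2, x3, x4, c_in):
    """A (4,2) counter: returns (sum, external_carry, internal_carry_out)."""
    s1, c_int_out = full_adder(x1, x2, x3)
    s, c_ext = full_adder(s1, x4, c_in)
    return s, c_ext, c_int_out


def reduce_4_to_2(rows, width):
    """Reduce four rows (columns four high) to two rows (columns two high)."""
    sum_row, carry_row, c_in = 0, 0, 0
    for j in range(width + 1):                  # one (4,2) counter per column
        bits = [(r >> j) & 1 for r in rows]
        s, c_ext, c_int_out = counter_4_2(*bits, c_in)
        sum_row |= s << j                       # same weight as the inputs
        carry_row |= c_ext << (j + 1)           # next higher weight
        c_in = c_int_out                        # feeds the neighboring counter
    return sum_row, carry_row


rows = [0b1011, 0b0110, 0b1111, 0b0101]
s, c = reduce_4_to_2(rows, 4)
assert s + c == sum(rows)   # value is preserved; a CPA would add s and c
```

Only the final addition of the two output rows needs a carry propagate adder.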

4 x 4 Partial Product Array Reduction
• Fast 4 x 4 multiplication using (4,2) counters
• How would you lay it out?
[Figure: multiplicand and multiplier form the partial product array; five (4,2) counters produce the reduced pp array (to CPA); a 5-bit CPA produces the 8-bit double precision product]

8 x 8 Partial Product Array Reduction
• Wallace tree multiplier
[Figure: 'icand and 'ier form the partial product array; two rows of nine (4,2) counters produce the reduced partial product array; one row of thirteen (4,2) counters feeds a 13-bit fast CPA]
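A word-level Python sketch of the two-level reduction is given below. It is an illustration under assumptions: operands are unsigned, each row is modeled as a full-width integer rather than a trimmed column slice, and so the sketch does not reproduce the counter counts or the CPA width shown on the slide. The names `carry_save_add`, `compress_4_to_2`, and `tree_multiply_8x8` are hypothetical.

```python
def carry_save_add(a, b, c):
    """A row of (3,2) counters: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def compress_4_to_2(a, b, c, d):
    """A row of (4,2) counters: four rows in, two rows out, same total."""
    s1, internal_carries = carry_save_add(a, b, c)
    return carry_save_add(s1, d, internal_carries)


def tree_multiply_8x8(multiplicand, multiplier):
    """8 x 8 unsigned multiply: 8 rows -> 4 rows -> 2 rows -> final add."""
    rows = [(multiplicand << i) if (multiplier >> i) & 1 else 0
            for i in range(8)]
    r0, r1 = compress_4_to_2(*rows[0:4])   # first level, upper four rows
    r2, r3 = compress_4_to_2(*rows[4:8])   # first level, lower four rows
    s, c = compress_4_to_2(r0, r1, r2, r3)  # second level
    return s + c                           # stands in for the fast CPA


print(tree_multiply_8x8(200, 123))  # 24600
```

Each level halves the number of rows, so the reduction depth grows with log N rather than N.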

An 8 x 8 Multiplier Layout
• How should it be laid out?
[Figure: multiplicand, multiplier, nine (4,2) counters, thirteen (4,2) counters, 13-bit CPA]

A Better 8 x 8 Multiplier Layout
• A better layout that focuses on interconnect
[Figure: multiplicand, multiplier, multiple generators, nine (4,2) counters, thirteen (4,2) counters, nine (4,2) counters, CPA]

A 16 x 16 Multiplier Layout
[Figure: multiplicand, multiple generators, (4,2) counter slice, 2 multiple selection signals ('ier), (4,2) counter slice, CPA]

Why Not Recode?
• Multiplier recoding (modified Booth's, canonical, …) recodes the multiplier to allow base 4 multiplication with simple multiple formation
  - without recoding, the base 4 multiplier digit set is 0, 1, 2, 3
  - with recoding, the base 4 multiplier digit set is -2, -1, 0, 1, 2
• Thus, with recoding the initial partial product array is only N/2 high
• But, the first level of (4,2) counters also reduces the partial product array to N/2 high
• Which is better depends on the logic delay (recoding wins) and interconnect complexity (counters win big)
[Figure: partial product arrays with dimension labels N, N/2, and 2N]
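The recoded digit set can be illustrated with a small Python sketch of modified Booth (radix 4) recoding. Several assumptions apply: the multiplier is treated as unsigned with an even bit width and is zero-extended, which yields N/2 + 1 digits (the N/2 figure applies to a two's complement multiplier); the function names are hypothetical.

```python
def booth_radix4_digits(multiplier, n):
    """Recode an unsigned n-bit multiplier (n even) into digits in {-2..2}.

    Digit k is b[2k] + b[2k-1] - 2*b[2k+1], with b[-1] = 0 and the
    multiplier zero-extended by two bits. Least significant digit first.
    """
    digits, prev = [], 0
    for i in range(0, n + 2, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, n):
    """Sum one partial product per recoded digit, each weighted by 4**k."""
    # Multiples -2, -1, 0, 1, 2 need only a shift and a negation in hardware.
    return sum(d * multiplicand * 4 ** k
               for k, d in enumerate(booth_radix4_digits(multiplier, n)))


print(booth_radix4_digits(0b1101, 4))     # [1, -1, 1], i.e. 1 - 4 + 16 = 13
print(booth_multiply(0b1011, 0b1101, 4))  # 143
```

Without recoding, the base 4 digit 3 would require forming 3 times the multiplicand, which needs an extra addition; the recoded digit set avoids that multiple.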

Next Lecture and Reminders
• Next lecture
  - Shifters, decoders, and multiplexers
    - Reading assignment: Rabaey, et al, 11.5-11.6
• Reminders
  - HW#4 due today
  - HW#5 (optional) due November 20th
  - Project final reports due December 4th
  - Final grading negotiations/corrections (except for the final exam) must be concluded by December 10th
  - Final exam scheduled
    - Tuesday, December 16th from 10:10 to noon in 118 and 113 Thomas