Low-Power, High-Speed Multiplier Architectures
Shawn Nicholl
ELEC-5705
March 7, 2005

Agenda/Overview
- Design Abstraction
- Numbering Systems
- Addition and Subtraction
- Adder Architectures
- Multiplication
- Traditional Multiplier Architectures
- Advanced Multiplier Architectures

Levels of Abstraction in Digital ICs
- Low-power, high-speed techniques can be used at many levels of abstraction (listed from most to least abstract):
  - Systems
  - Modules (e.g. multiplier architectures)
  - Logic Gates
  - Circuits
  - Devices
- Higher levels of abstraction have a greater effect on overall system performance

Numbering Systems – A Quick Review
- Some common numbering systems (n digits):
  - Decimal: range $0$ to $10^n - 1$
  - Unsigned binary: range $0$ to $2^n - 1$
  - Two’s complement: range $-2^{n-1}$ to $+(2^{n-1} - 1)$
- The same values in each system:
  Decimal (sign, value)   Unsigned binary (sign, magnitude)   Two’s complement (no separate sign)
  + 10                    + 0000 1010                         0000 1010
  - 45                    - 0010 1101                         1101 0011
- E.g. $45_d = 0 + 0 + 2^5 + 0 + 2^3 + 2^2 + 0 + 2^0$, i.e. 0010 1101b; its two’s complement, 1101 0011b, represents -45d
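
The conversions on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original deck; the helper names `to_twos_complement` and `from_twos_complement` and the default 8-bit width are assumptions chosen to match the slide's examples.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's-complement bit string of `value` using `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking maps a negative value onto 2**bits + value.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a two's-complement number."""
    raw = int(pattern, 2)
    # The top bit carries weight -2**(bits-1) instead of +2**(bits-1).
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(to_twos_complement(10))            # 00001010
print(to_twos_complement(-45))           # 11010011
print(from_twos_complement("11010011"))  # -45
```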

Adding and Subtracting
- The two’s-complement algorithm is consistent:
  - Addition and subtraction behave the same
  - Negative numbers are treated the same as positive numbers
- Example: add -45d to 10d
- Signed decimal method:
  - Step 1) Initialize: 10d + (-45d)
  - Step 2) Compare, so that the augend holds the number with the larger magnitude: -45d + 10d
  - Step 3) Treat as a subtraction: 45d - 10d
  - Step 4) Do the subtraction (borrows may be required): 45d - 10d = 35d
  - Step 5) Negate the result (knowing that the augend was negative): -35d
- Two’s-complement method:
  - Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
  - Step 2) Add (no special rules): 0000 1010b + 1101 0011b = 1101 1101b
  - Converting two’s complement back to decimal: 1101 1101b = -35d

Adding and Subtracting (Example 2)
- Example 2: subtract -45d from 10d
- Signed decimal method:
  - Step 1) Initialize: 10d - (-45d)
  - Step 2) The subtrahend is negative, so negate it and do an addition: 10d + 45d = 55d
- Two’s-complement method:
  - Step 1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
  - Step 2) Invert the subtrahend and set CIN = 1: 0000 1010b + 0010 1100b + 1b = 0011 0111b
  - Converting two’s complement back to decimal: 0011 0111b = 55d
- Subtraction logic can be shared with addition logic!
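
The shared add/subtract datapath from the two examples can be modelled as below. This is a minimal behavioural sketch, not the author's circuit; the names `add_sub`, `signed` and `MASK`, and the fixed 8-bit width, are assumptions made for the illustration.

```python
MASK = 0xFF  # assumed 8-bit datapath, matching the slide examples


def add_sub(a: int, b: int, subtract: bool) -> int:
    """Add or subtract two 8-bit patterns using one adder.

    For subtraction the second operand is inverted and the carry-in is
    set to 1, which is the "invert and add one" negation of two's complement.
    """
    if subtract:
        b = ~b & MASK          # invert the subtrahend
    cin = 1 if subtract else 0
    return (a + b + cin) & MASK


def signed(pattern: int) -> int:
    """Read an 8-bit pattern as a two's-complement value."""
    return pattern - 256 if pattern & 0x80 else pattern


total = add_sub(0b00001010, 0b11010011, subtract=False)
print(format(total, "08b"), signed(total))   # 11011101 -35
diff = add_sub(0b00001010, 0b11010011, subtract=True)
print(format(diff, "08b"), signed(diff))     # 00110111 55
```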

Adder Building Blocks
- Half Adder
  - $S_n = A_n \oplus B_n$
  - $CO_n = A_n \cdot B_n$
- Full Adder
  - $S_n = A_n \oplus B_n \oplus CIN_n$
  - $COUT_n = A_n \cdot B_n + (A_n \oplus B_n) \cdot CIN_n$
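
As a quick check of the building blocks, the sketch below models a half adder and builds a full adder from two of them. It is an illustrative bit-level model only; the function names are assumptions, and the full-adder carry-out used here is the standard one (a carry is produced when at least two of the three inputs are 1).

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for three input bits, built from two half adders."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2


# Print the full truth table: the outputs always encode a + b + cin.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    print(a, b, cin, "->", cout, s)
```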

Adder Architectures (CRA)
- Carry Ripple Adder (CRA)
  - Gate count $\propto N$
  - Area $\propto N$
  - Delay $\propto N$
  - Power $\propto N$
  - Layout friendly (low fan-in/fan-out; regular structure)
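
The linear delay comes from the carry having to pass through every bit position in turn. A minimal behavioural sketch, assuming an 8-bit width by default and a hypothetical helper name `ripple_carry_add`:

```python
def ripple_carry_add(a: int, b: int, n: int = 8, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit patterns one bit at a time; return (sum_pattern, carry_out)."""
    result = 0
    carry = cin
    for i in range(n):  # the carry ripples through all n stages in sequence
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (carry & (ai ^ bi))
        result |= s << i
    return result, carry


print(ripple_carry_add(0b00001010, 0b11010011))  # (221, 0), i.e. 1101 1101b
```

Each loop iteration stands for one full-adder stage, so the loop count mirrors the N-stage carry chain.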

Adder Architectures (CLA)
- Carry Lookahead Adder (CLA)
  - Generate: $G_n = A_n \cdot B_n$
  - Propagate: $P_n = A_n + B_n$
  - Recursive relationship: $CIN_n = G_{n-1} + P_{n-1} \cdot CIN_{n-1}$
  - Expanded: $CIN_n = G_{n-1} + P_{n-1} G_{n-2} + \dots + P_{n-1} P_{n-2} \cdots P_1 G_0 + P_{n-1} P_{n-2} \cdots P_0 \, CIN_0$
- CLA:
  - Delay $\propto \log_2 N$ (if built right)
  - Gate count and power are greater than for the CRA
  - Not layout friendly (high fan-in; difficult to route)
(Figure: carry-lookahead logic with generate and propagate signals. Source: Patterson and Hennessy, Figure A.14)
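
The expanded carry equation can be evaluated directly from the generate and propagate bits, without waiting for a ripple. The sketch below is a software model under assumed names (`cla_carries`, `cla_add`) and an assumed 8-bit default width; it computes each carry from the expanded sum-of-products form rather than modelling any particular gate-level tree.

```python
def cla_carries(a: int, b: int, n: int = 8, cin0: int = 0) -> list[int]:
    """Return [CIN_0, ..., CIN_n], each computed from G, P and CIN_0 only."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate bits
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # propagate bits
    carries = [cin0]
    for k in range(1, n + 1):
        c = 0
        prefix = 1  # running product P_{k-1} ... P_{j+1}
        for j in range(k - 1, -1, -1):
            c |= prefix & g[j]   # term P_{k-1} ... P_{j+1} G_j
            prefix &= p[j]
        c |= prefix & cin0       # final term P_{k-1} ... P_0 CIN_0
        carries.append(c)
    return carries


def cla_add(a: int, b: int, n: int = 8, cin0: int = 0) -> tuple[int, int]:
    """Add two n-bit patterns using the lookahead carries; return (sum, carry_out)."""
    c = cla_carries(a, b, n, cin0)
    total = 0
    for i in range(n):
        total |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return total, c[n]


print(cla_add(0b00001010, 0b11010011))  # (221, 0)
```

In hardware all the product terms for a given carry are evaluated in parallel, which is where the logarithmic delay (and the high fan-in) comes from; the nested loops here only reproduce the arithmetic.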

Adder Architectures (CSA)
- Carry Save Adder (CSA)
  - Adders work independently, so it is very fast
  - Pipelined architecture results in flops and control logic, which increase area and latency

Unsigned Multiplication
- Shift-and-Add Algorithm
- Example: multiply 118d by 99d
- Decimal method:
  - Step 1) Initialize: multiplicand 118d, multiplier 99d
  - Step 2) Find partial products: 1062d (ones digit) and 1062d (tens digit, shifted one place left)
  - Step 3) Sum up the shifted partial products: 11682d
- Two’s-complement method:
  - Step 1) Initialize: 118d = 0111 0110b, 99d = 0110 0011b
  - Step 2) Find partial products, one per multiplier bit (low-order bit first): 0111 0110b, 0111 0110b, 0, 0, 0, 0111 0110b, 0111 0110b, 0
  - Step 3) Sum up the shifted partial products: 0010 1101 1010 0010b
  - Converting two’s complement back to decimal: 0010 1101 1010 0010b = 11682d

Shift-and-Add Multiplier
(Figure: shift-and-add datapath with multiplicand B, multiplier A and product P)
- Shift-and-Add Multiplier
  - Takes N cycles to complete: $T_{Lat} = (T_{N\text{-bit ADD}} + T_{shift}) \times N$
  - Requires minimal logic (most logic is in the adder)
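
A behavioural sketch of the N-cycle shift-and-add loop follows. It is not the datapath from the figure: for simplicity it shifts the multiplicand left each cycle instead of shifting the product register right, and the name `shift_and_add_multiply` and the 8-bit default are assumptions.

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int, n: int = 8) -> int:
    """Multiply two unsigned n-bit values in n add-and-shift cycles."""
    product = 0
    for _ in range(n):                 # one cycle per multiplier bit
        if multiplier & 1:             # low-order multiplier bit selects the add
            product += multiplicand    # the N-bit add of this cycle
        multiplicand <<= 1             # align the next partial product
        multiplier >>= 1               # expose the next multiplier bit
    return product


p = shift_and_add_multiply(118, 99)
print(p, format(p, "016b"))  # 11682 0010110110100010
```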

Basic Signed Multiplication
- Basic idea:
  1. Convert to unsigned
  2. Use the shift-and-add multiplier
  3. Convert to signed
- Extra hardware!

Signed Multiplication
- Booth Recoding
  - Reduces the number of partial products by re-coding the multiplier operand
  - Works for signed numbers
- Radix-2 Booth recoding ($A_n$ is the low-order bit, $A_{n-1}$ is the last bit shifted out):
  A_n   A_(n-1)   Partial product
  0     0         0
  0     1         +B
  1     0         -B
  1     1         0
- Example: multiply -118d by -99d
  - Recall 99d = 0110 0011b; inverting gives 1001 1100b, and adding 1b gives -99d = 1001 1101b

Radix-2 Booth Multiplication
- Example: multiply -118d by -99d
  - B = -118d = 1000 1010b
  - -B = 118d = 0111 0110b
  - A = -99d = 1001 1101b
- Step 1) Initialize: multiplicand B = 1000 1010b, multiplier A = 1001 1101b
- Step 2) Find partial products by recoding A from the low-order bit up: -B, +B, -B, 0, 0, +B, 0, -B
  - Each partial product is shifted left by its bit position and sign-extended (e.g. +B appears as 1 1000 1010b)
- Step 3) Sum up the shifted partial products: 0010 1101 1010 0010b
- Converting two’s complement back to decimal: 0010 1101 1010 0010b = 11682d
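
The recoding in this example can be reproduced with a short sketch. It is a software model only: Python integers stand in for sign-extended partial products, and the names `booth_radix2_digits` and `booth_radix2_multiply` and the 8-bit default are assumptions.

```python
def booth_radix2_digits(a: int, n: int = 8) -> list[int]:
    """Recode an n-bit two's-complement multiplier into digits in {-1, 0, +1}.

    Digits are returned low-order first; digit i has weight 2**i.
    """
    a &= (1 << n) - 1
    digits = []
    prev = 0                          # the "last bit shifted out" starts at 0
    for i in range(n):
        cur = (a >> i) & 1
        digits.append(prev - cur)     # 01 -> +1 (+B), 10 -> -1 (-B), 00/11 -> 0
        prev = cur
    return digits


def booth_radix2_multiply(b: int, a: int, n: int = 8) -> int:
    """Multiply signed b by signed a (a must fit in n bits) via Booth recoding."""
    return sum(d * b * (1 << i) for i, d in enumerate(booth_radix2_digits(a, n)))


print(booth_radix2_digits(-99))          # [-1, 1, -1, 0, 0, 1, 0, -1]
print(booth_radix2_multiply(-118, -99))  # 11682
```

The digit list matches the slide's sequence -B, +B, -B, 0, 0, +B, 0, -B.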

Array Multiplier
(Figure: the radix-2 Booth partial products of -118d x -99d, in the order -B, +B, -B, 0, 0, +B, 0, -B, mapped onto the rows of an array multiplier; the sum is 0010 1101 1010 0010b)
- Array Multiplier
  - Combinatorial, so it is very fast: delay $\propto N$
  - Can be pipelined
  - Very regular structure

Array Multiplier Structure
(Figure: array multiplier structure. Source: J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, 1999)

Radix-4 Booth Multiplication
- Similar to Radix-2, but looks at two low-order bits at a time (instead of one)
- Radix-4 Booth recoding ($A_{2n+1}$ and $A_{2n}$ are the low-order bits, $A_{2n-1}$ is the last bit shifted out):
  A_(2n+1)   A_(2n)   A_(2n-1)   Partial product
  0          0        0          0
  0          0        1          +B
  0          1        0          +B
  0          1        1          +2B
  1          0        0          -2B
  1          0        1          -B
  1          1        0          -B
  1          1        1          0
- Recall 99d = 0110 0011b; inverting gives 1001 1100b, and adding 1b gives -99d = 1001 1101b

Radix-4 Booth Multiplication
- Example: multiply -118d by -99d
  - B = -118d = 1000 1010b
  - -B = 118d = 0111 0110b
  - 2B = -236d = 1 0001 0100b
  - -2B = 236d = 0 1110 1100b
  - A = -99d = 1001 1101b
- Step 1) Initialize: multiplicand B = 1000 1010b, multiplier A = 1001 1101b
- Step 2) Find partial products by recoding A two bits at a time, from the low-order bits up: +B, -B, +2B, -2B
  - With sign extension the rows are 111 1000 1010b, 0111 0110b, 111 0001 0100b and 0 1110 1100b, shifted left by 0, 2, 4 and 6 bits
- Step 3) Sum up the shifted partial products: 0010 1101 1010 0010b
- Converting two’s complement back to decimal: 0010 1101 1010 0010b = 11682d
- Reduces the number of partial products by half!
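
The radix-4 recoding can be sketched the same way, scanning overlapping groups of three bits. Again this is only a software model; `booth_radix4_digits`, `booth_radix4_multiply` and the even 8-bit default width are assumptions for the illustration.

```python
def booth_radix4_digits(a: int, n: int = 8) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into digits in {-2..+2}.

    Digits are returned low-order first; digit i has weight 4**i.
    """
    a &= (1 << n) - 1
    padded = a << 1                       # append the implicit bit A_(-1) = 0
    digits = []
    for k in range(0, n, 2):
        triple = (padded >> k) & 0b111    # bits A_(k+1), A_k, A_(k-1)
        hi, mid, lo = (triple >> 2) & 1, (triple >> 1) & 1, triple & 1
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_radix4_multiply(b: int, a: int, n: int = 8) -> int:
    """Multiply signed b by signed a (a must fit in n bits) with n/2 partial products."""
    return sum(d * b * 4 ** i for i, d in enumerate(booth_radix4_digits(a, n)))


print(booth_radix4_digits(-99))          # [1, -1, 2, -2]
print(booth_radix4_multiply(-118, -99))  # 11682
```

Four digits replace the eight of the radix-2 recoding, which is the halving of partial products noted on the slide.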

Tree Multiplier
(Figure: original structure compared with tree structure. Source: J. Kuo, et al., Low-Voltage CMOS VLSI Circuits, 1999)
- Wallace Tree
  - Reduces the total number of full adders
  - Uses 3:2 compressors (aka full adders)
  - Delay $\propto \log_{3/2} N$
  - Irregular structure is difficult to lay out
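
A rough software analogue of the tree reduction is shown below: layers of word-wide 3:2 compression shrink the partial-product list until two rows remain, and a single ordinary addition finishes the job. This is a simplified sketch for unsigned operands, not the bit-level Wallace wiring from the figure; all function names are assumptions.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor applied bitwise: returns (s, c) with s + c == x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_reduce(operands: list[int]) -> tuple[list[int], int]:
    """Reduce non-negative operands to at most two rows; return (rows, levels)."""
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):        # each group of three becomes two rows
            nxt.extend(carry_save_add(*rows[i:i + 3]))
        nxt.extend(rows[full:])            # leftover rows pass straight through
        rows = nxt
        levels += 1
    return rows, levels


def tree_multiply(b: int, a: int, n: int = 8) -> tuple[int, int]:
    """Unsigned n-bit multiply; returns (product, number of compressor levels)."""
    partials = [(b << i) if (a >> i) & 1 else 0 for i in range(n)]
    rows, levels = wallace_reduce(partials)
    return sum(rows), levels               # final carry-propagate addition


print(tree_multiply(118, 99))  # (11682, 4)
```

Eight partial products shrink as 8, 6, 4, 3, 2 rows, so four compressor levels are needed instead of a chain of seven additions.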

Twin Pipe Serial-Parallel Multiplier
(Figure: one operand is fed in parallel and the other serially; even data bits are processed on the rising clock edge and odd data bits on the falling clock edge. Source: S. Shah, et al., “Comparison of 32-bit Multipliers for Various Performance Measures”, 2000.)
- Features
  - Low power
  - Low area
  - High latency

Cluster Multiplication
- Divide the circuit into clusters of nibble-wide multiplications
  - If all bits in a nibble are zeroes, then use clock-gating to gate the multiplication for that nibble
- Features
  - Low power (claims 13% savings)
Source: A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, 2001.

Multiplexer-Based Array Multiplier
- Characteristics
  - Fast (because it is array-based)
  - Unlike Booth, does not require encoding logic
  - Processes 1 bit of the multiplier and 1 bit of the multiplicand at a time, thus it is symmetric
  - Has a zigzag shape, thus not layout-friendly
Source: K. Pekmestzi, “Multiplexer-Based Array Multipliers”, 1999.

Area-Efficient Multiplexer-Based Multiplier
- Characteristics
  - Increases each row to have N+1 cells (instead of N)
  - Depth is cut in half (increases “squareness”)
Source: Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, 2001.

Low Latency Booth-Encoding-based Pipeline Multiplier
- Features
  - Delay $\propto N/4$
  - Needs an (N+N/2)-bit addition at the end
  - Uses CLAs instead of CSAs, because the longest stage (i.e. the adder at the end) determines the fastest operating frequency
Source: X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, 2001.

Two’s Complement Gray-Encoded Array Multiplier
- Characteristics
  - Uses Gray code to reduce the switching activity of the multiplier
  - Claims that traditional Booth uses 45% more power
  - Greater area than traditional Booth
Source: E. Costa, et al., “A New Architecture for 2’s Complement Gray Encoded Array Multiplier”, 2002.
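
Why Gray code can lower switching activity is easy to see on a counting sequence: consecutive Gray codewords differ in exactly one bit. The sketch below only illustrates that property; it does not model the cited multiplier architecture, and the helper names are assumptions.

```python
def binary_to_gray(x: int) -> int:
    """Standard reflected-binary Gray code of x."""
    return x ^ (x >> 1)


def toggles(sequence: list[int]) -> int:
    """Total number of bit flips between consecutive words of a sequence."""
    return sum(bin(p ^ q).count("1") for p, q in zip(sequence, sequence[1:]))


counting = list(range(16))
print(toggles(counting))                               # 26 flips in plain binary
print(toggles([binary_to_gray(v) for v in counting]))  # 15 flips in Gray code
```

Real operand streams are not simple counters, so the savings in an actual multiplier depend on the data and on the architecture.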

Project Plan
  Start   End     Task
  -       03/05   Research multiplier circuits
  03/06   03/12   Code multipliers in Verilog HDL
  03/13   03/19   Synthesize all multiplier circuits
  03/20   03/26   Analyze results (delay/power/area)
  03/27   04/02   Prepare report
  04/03   04/09   Prepare for final exam
  04/10   04/16   Complete report and submit

References
- S. Shah, A. J. Al-Khalili, D. Al-Khalili, “Comparison of 32-bit Multipliers for Various Performance Measures”, Proc. 2000 Int’l Conf. Microelectronics, pp. 75-80, 2000.
- D. Patterson, J. Hennessy, Computer Architecture – A Quantitative Approach, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1996.
- X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, Proc. 2001 Int’l Conf. on ASIC, pp. 551-554, 2001.
- J. Wakerly, Digital Design – Principles and Practices, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1994.
- J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, New York, NY: John Wiley & Sons, Inc., 1999.
- K. Pekmestzi, “Multiplexer-Based Array Multipliers”, IEEE Trans. on Computers, vol. 48, pp. 15-23, 1999.
- A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, Proc. 2001 IEEE Computer Society Workshop on VLSI, pp. 149-154, 2001.
- Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, Proc. 2001 IEEE Int’l Conf. on Electronics, Circuits and Systems, vol. 3, pp. 1429-1432, 2001.