14:332:331 Computer Architecture and Assembly Language, Spring 2006
Week 7: ALU Design
[Adapted from Dave Patterson's UCB CS 152 slides and Mary Jane Irwin's PSU CSE 331 slides]

Heads Up
- This week's material
  - MIPS logic and multiply instructions
    - Reading assignment: PH 3.1-3.4
  - MIPS ALU design
    - Reading assignment: PH B.5, B.6

Review: MIPS Arithmetic Instructions

R-type:  op (31-26) | Rs (25-21) | Rt (20-16) | Rd (15-11) | funct (5-0)
I-type:  op (31-26) | Rs (25-21) | Rt (20-16) | Immed 16 (15-0)

- Expand immediates to 32 bits before the ALU
- 10 operations, so the operation can be encoded in 4 bits

[Figure: ALU with 32-bit inputs A and B, a 4-bit operation input m, a 32-bit result output, and zero and ovf outputs]

m (operation):
  0 add    1 addu   2 sub    3 subu   4 and
  5 or     6 xor    7 nor    a slt    b sltu

Type   op   funct       Type   op   funct
ADD    00   100000      -      00   101000
ADDU   00   100001      -      00   101001
SUB    00   100010      SLT    00   101010
SUBU   00   100011      SLTU   00   101011
AND    00   100100      -      00   101100
OR     00   100101
XOR    00   100110
NOR    00   100111

Review: A 32-bit Adder/Subtractor
- Built out of 32 full adders (FAs)
  - S = A xor B xor carry_in
  - carry_out = A•B v A•carry_in v B•carry_in (majority function)
- Small but slow!

[Figure: 32 1-bit FAs in a ripple chain, controlled by add/subt; FA i takes Ai, Bi and ci and produces Si and ci+1; c0 = carry_in, c32 = carry_out]
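
The ripple structure can be sketched in software. This is a minimal bit-level model, not the course's hardware description: the names `full_adder` and `add_sub` are hypothetical, and subtraction is assumed to work the usual two's complement way, by inverting every B bit and setting c0 = 1.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # majority function
    return s, carry_out


def add_sub(a, b, subtract=0, width=32):
    """Ripple-carry add (subtract=0) or subtract (subtract=1) on width-bit values.

    Returns (result, carry_out), where carry_out is the carry out of the last FA.
    """
    carry = subtract  # c0 = 1 completes the two's complement of B
    result = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ subtract  # invert each B bit when subtracting
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result, carry
```

Each loop iteration needs the carry from the previous one, which is the software analogue of the carry rippling through all 32 cells.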

Minimal Implementation of a Full Adder
- Gate library: inverters, 2-input nands, or-and-inverters

    architecture concurrent_behavior of full_adder is
      signal t1, t2, t3, t4, t5: std_logic;
    begin
      t1   <= not A after 1 ns;                             -- inverter
      t2   <= not cin after 1 ns;                           -- inverter
      t4   <= not((A or cin) and B) after 2 ns;             -- or-and-inverter
      t3   <= not((t1 or t2) and (A or cin)) after 2 ns;    -- A xnor cin
      t5   <= t3 nand B after 2 ns;                         -- 2-input nand
      S    <= not((B or t3) and t5) after 2 ns;             -- A xor B xor cin
      cout <= not((t1 or t2) and t4) after 2 ns;            -- majority(A, B, cin)
    end concurrent_behavior;

- Can you create the equivalent schematic?
- Can you determine the worst case delay (the worst case timing path through the circuit)?
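
One way to explore both questions is to model the signal assignments directly. The sketch below is an illustration, not part of the original slides: it reads the cout assignment as not((t1 or t2) and t4), and it assumes all inputs change at t = 0 and that each gate's output settles after its slowest input plus the stated delay (a topological estimate). The function names are hypothetical.

```python
from itertools import product


def full_adder_gates(a, b, cin):
    """Evaluate the gate network on 0/1 inputs; returns (S, cout)."""
    t1 = 1 - a
    t2 = 1 - cin
    t4 = 1 - ((a | cin) & b)
    t3 = 1 - ((t1 | t2) & (a | cin))
    t5 = 1 - (t3 & b)
    s = 1 - ((b | t3) & t5)
    cout = 1 - ((t1 | t2) & t4)
    return s, cout


def arrival_times():
    """Latest settling time (ns) of each signal when all inputs change at t = 0."""
    t = {"A": 0, "B": 0, "cin": 0}
    t["t1"] = t["A"] + 1
    t["t2"] = t["cin"] + 1
    t["t4"] = max(t["A"], t["cin"], t["B"]) + 2
    t["t3"] = max(t["t1"], t["t2"], t["A"], t["cin"]) + 2
    t["t5"] = max(t["t3"], t["B"]) + 2
    t["S"] = max(t["B"], t["t3"], t["t5"]) + 2
    t["cout"] = max(t["t1"], t["t2"], t["t4"]) + 2
    return t


# Compare against the textbook full-adder equations for all 8 input combinations.
for a, b, cin in product((0, 1), repeat=3):
    expected = (a ^ b ^ cin, (a & b) | (a & cin) | (b & cin))
    assert full_adder_gates(a, b, cin) == expected
```

Under these assumptions the longest path runs through t1 (or t2), t3 and t5 to S, while cout settles earlier.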

Logic Operations
- Logic operations operate on individual bits of the operand.

    $t2 = 0…0 0000 1101 0000
    $t1 = 0…0 0011 1100 0000

    and $t0, $t1, $t2    $t0 =
    or  $t0, $t1, $t2    $t0 =
    xor $t0, $t1, $t2    $t0 =
    nor $t0, $t1, $t2    $t0 =

- How do we expand our FA design to handle the logic operations and, or, xor, nor?
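
To check your answers for the four blanks, the operations can be evaluated on the slide's operand values. This is a small sketch using Python integers masked to 32 bits; the `results` dictionary and `MASK32` are names chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF

t2 = 0b0000_1101_0000  # $t2 from the slide (upper bits are zero)
t1 = 0b0011_1100_0000  # $t1 from the slide (upper bits are zero)

results = {
    "and": t1 & t2,
    "or": t1 | t2,
    "xor": t1 ^ t2,
    "nor": ~(t1 | t2) & MASK32,  # mask keeps the result to 32 bits
}

for name, value in results.items():
    print(f"{name:3} $t0 = {value:032b}")
```

Note that nor is the only one of the four whose upper bits come out as ones, because it inverts the zero-filled upper bits of the or result.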

A Simple ALU Cell
[Figure: 1-bit ALU cell built around a 1-bit FA; inputs A, B, carry_in, add/subt and op; outputs result and carry_out]

An Alternative ALU Cell
[Figure: 1-bit ALU cell built around a 1-bit FA; control inputs s2, s1, s0; data inputs A, B and carry_in; outputs result and carry_out]

The Alternative ALU Cell's Control Codes

s2  s1  s0  c_in   result      function
0   0   0   0      A           transfer A
0   0   0   1      A + 1       increment A
0   0   1   0      A + B       add
0   0   1   1      A + B + 1   add with carry
0   1   0   0      A - B - 1   subt with borrow
0   1   0   1      A - B       subtract
0   1   1   0      A - 1       decrement A
0   1   1   1      A           transfer A
1   0   0   x      A or B      or
1   0   1   x      A xor B     xor
1   1   0   x      A and B     and
1   1   1   x      !A          complement A
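
The control codes can be exercised with a small behavioral model. This is a sketch under an assumption the slide does not spell out: for the arithmetic codes (s2 = 0), s1 s0 are taken to select the adder's second operand Y (0, B, not B, or all ones), so that every arithmetic row is A + Y + c_in. The function name `alt_alu` and the default 4-bit width are hypothetical.

```python
def alt_alu(s2, s1, s0, c_in, a, b, width=4):
    """Behavioral model of the alternative ALU on width-bit operands."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    if s2 == 0:
        # Arithmetic group: result = A + Y + c_in (assumed operand selection).
        y = {(0, 0): 0, (0, 1): b, (1, 0): ~b & mask, (1, 1): mask}[(s1, s0)]
        return (a + y + c_in) & mask
    # Logic group: c_in is a don't-care.
    return {
        (0, 0): a | b,
        (0, 1): a ^ b,
        (1, 0): a & b,
        (1, 1): ~a & mask,
    }[(s1, s0)]
```

For example, with A = 6 and B = 3, the code 0 1 0 with c_in = 1 yields 3 (subtract), and the same code with c_in = 0 yields 2 (subtract with borrow).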

Tailoring the ALU to the MIPS ISA
- Need to support the set-on-less-than instruction (slt)
  - remember: slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a - b) < 0 implies a < b
- Need to support test for equality (beq)
  - use subtraction: (a - b) = 0 implies a = b
- Need to add the overflow detection hardware

Modifying the ALU Cell for slt
[Figure: the simple ALU cell (1-bit FA with inputs A, B, carry_in, add/subt and op; outputs result and carry_out) extended with an additional less input]

Modifying the ALU for slt
- First perform a subtraction
- Make the result 1 if the subtraction yields a negative result
- Make the result 0 if the subtraction yields a zero or positive result

[Figure: 32 ALU cells, cell i with inputs Ai, Bi and less and output result i]

Modifying the ALU for Zero
- First perform subtraction
- Insert additional logic to detect when all result bits are zero

[Figure: 32 ALU cells sharing the op and add/subt controls; cell i has inputs Ai, Bi and less and output result i; the less inputs of cells 1 through 31 are tied to 0; cell 31 also produces the set output]
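
Both modifications reuse the subtractor, which a short model makes concrete. This is a simplified sketch with hypothetical names (`alu_sub`, `slt`, `zero`). It takes set directly from the sign bit of the 32-bit difference, so it assumes the subtraction does not overflow; handling overflow correctly is outside this sketch.

```python
MASK32 = 0xFFFFFFFF


def alu_sub(a, b):
    """32-bit two's complement subtraction: a + (not b) + 1."""
    return (a + (~b & MASK32) + 1) & MASK32


def slt(a, b):
    """1 if a < b (signed), else 0, using the sign bit of a - b.

    Assumes a - b does not overflow.
    """
    return (alu_sub(a, b) >> 31) & 1


def zero(a, b):
    """1 when every bit of a - b is 0, i.e. when a == b (as needed for beq)."""
    return int(alu_sub(a, b) == 0)
```

Here 0xFFFFFFFF represents -1, so `slt(0xFFFFFFFF, 1)` returns 1, and `zero(7, 7)` returns 1.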

Review: Overflow Detection
- Overflow: the result is too large to represent in the number of bits allocated
- Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtracting a negative from a positive gives a negative
  - or, subtracting a positive from a negative gives a positive
- On your own: prove you can detect overflow by:
  - Carry into MSB xor Carry out of MSB

      0 1 1 1     7            1 1 0 0    -4
    + 0 0 1 1     3          + 1 0 1 1    -5
      1 0 1 0    -6            0 1 1 1     7

    carry into MSB = 1         carry into MSB = 0
    carry out of MSB = 0       carry out of MSB = 1
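
A brute-force check is not a proof, but it can build confidence before attempting one. The sketch below, with hypothetical helper names, compares the carry-based rule against a direct range check for every pair of 4-bit operands.

```python
def add_with_carries(a, b, width=4):
    """Add two width-bit patterns; return (sum, carry into MSB, carry out of MSB)."""
    mask = (1 << width) - 1
    low_mask = mask >> 1  # every bit below the MSB
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = (a & mask) + (b & mask)
    carry_out_of_msb = total >> width
    return total & mask, carry_into_msb, carry_out_of_msb


def to_signed(x, width=4):
    """Interpret a width-bit pattern as a two's complement number."""
    return x - (1 << width) if x >> (width - 1) else x


def overflow_by_carries(a, b, width=4):
    _, carry_in, carry_out = add_with_carries(a, b, width)
    return carry_in ^ carry_out


def overflow_by_range(a, b, width=4):
    true_sum = to_signed(a, width) + to_signed(b, width)
    limit = 1 << (width - 1)
    return int(not (-limit <= true_sum < limit))


# Exhaustive comparison over all 4-bit operand pairs.
assert all(
    overflow_by_carries(a, b) == overflow_by_range(a, b)
    for a in range(16)
    for b in range(16)
)
```

The two slide examples, 7 + 3 and (-4) + (-5), both make `overflow_by_carries` return 1.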

Modifying the ALU for Overflow
- Modify the most significant cell to determine the overflow output setting
- Disable overflow bit setting for unsigned arithmetic

[Figure: 32 ALU cells sharing the op and add/subt controls; cell i has inputs Ai, Bi and less and output result i; the less inputs of cells 1 through 31 are tied to 0; outputs zero, overflow and set]

Example:
When do the result outputs settle at their final values for the inputs:
    add/subt = 0
    op = 000
    A = 1111
    B = 0001

Example: cont'd
When do the result outputs settle at their final values for the inputs:
    add/subt = 0
    op = 100
    A = 1111
    B = 0001

Example: cont'd
When do the result outputs settle at their final values for the inputs:
    add/subt = 1
    op = 101
    A = 1111
    B = 0001
What is the zero output for these inputs?

Example: cont'd
With the ALU design described in class, we assumed that a subtraction operation had to be performed as part of the beq instruction. When do the outputs settle? Is there a faster alternative?

But What about Performance?
- Critical path of an n-bit ripple-carry adder is n*CP

[Figure: four 1-bit ALUs in a ripple chain; ALU i takes Ai, Bi and CarryIn i and produces Result i and CarryOut i, which feeds CarryIn i+1]

- Design trick: throw hardware at it (Carry Lookahead)

Fast carry using "infinite" hardware (Parallel)
- cout = b•cin + a•cin + a•b

    c1 = (b0+a0)•c0 + a0•b0
       = a0•b0 + a0•c0 + b0•c0
    c2 = (b1+a1)•c1 + a1•b1
       = (b1+a1)•((b0+a0)•c0 + a0•b0) + a1•b1
       = a1•a0•b0 + a1•a0•c0 + a1•b0•c0 + b1•a0•b0 + b1•a0•c0 + b1•b0•c0 + b1•a1
    c3 = a2•a1•a0•b0 + a2•a1•a0•c0 + a2•b1•a0•b0 + a2•a1•b0•c0 + a2•b1•a1 + …
    …

- Outputs settle much faster
  - D_c3 = 2*D_and + D_or (best case)
  - …
  - D_c31 = 5*D_and + D_or (best case)
- Problem: prohibitively expensive

Hierarchical Solution I
- Hierarchical solution I
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry look ahead
  - Use the 4-bit group as a building block, and connect the groups in ripple carry fashion

First Level: Propagate and generate

    ci+1 = (ai•bi) + (ai+bi)•ci
    gi = ai•bi
    pi = (ai+bi)

- ci+1 = gi + pi•ci
- ci+1 = 1 if
  - gi = 1, or
  - pi and ci = 1

    c1 = g0 + (p0•c0)
    c2 = g1 + (p1•g0) + (p1•p0•c0)
    c3 = g2 + (p2•g1) + (p2•p1•g0) + (p2•p1•p0•c0)
    c4 = g3 + (p3•g2) + (p3•p2•g1) + (p3•p2•p1•g0) + (p3•p2•p1•p0•c0)
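
The generate and propagate equations translate almost line for line into code. The following is a minimal sketch of a 4-bit carry-lookahead add; the name `cla4` is hypothetical, and the sum bits are assumed to be formed as ai xor bi xor ci, as in the full adder.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead add; returns (4-bit sum, c4)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate:  gi = ai•bi
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate: pi = ai+bi

    # Every carry depends only on g, p and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (a_bits[i] ^ b_bits[i] ^ carries[i]) << i
    return total, c4
```

Because c1 through c4 are written purely in terms of the inputs, hardware can compute them in parallel rather than waiting for a ripple.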

Hierarchical Solution I (16 bit)
[Figure: 4-bit carry look-ahead ALUs connected in ripple fashion; ALU0 takes A0-A3, B0-B3 and c0 = carry_in and produces result 0-3; its carry out c4 is the carry_in of ALU1, which takes A4-A7, B4-B7 and produces result 4-7; and so on]

Delay = 4 * Delay(4-bit carry look-ahead ALU)

Hierarchical Solution II
- Hierarchical solution I
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry look ahead
  - Use the 4-bit group as a building block, and connect the groups in ripple carry fashion
- Hierarchical solution II
  - Group 32 bits into 8 4-bit groups
  - Within each group, use carry look ahead
  - Another level of carry look ahead is used to connect these 4-bit groups

Hierarchical Solution II
[Figure: four 4-bit ALUs (inputs A0-A3/B0-B3, A4-A7/B4-B7, A8-A11/B8-B11, A12-A15/B12-B15; outputs result 0-3, 4-7, 8-11, 12-15) connected to a carry-lookahead unit; ALU k sends its group signals Pk and Gk to the unit, which takes cin and returns the group carries C1, C2, C3 and cout]

- Input a0-a15, b0-b15
- Calculate P0-P3, G0-G3
- Calculate C1-C4
- Each 4-bit ALU calculates its results

Fast Carry using the second level abstraction

    P0 = p3•p2•p1•p0
    P1 = p7•p6•p5•p4
    P2 = p11•p10•p9•p8
    P3 = p15•p14•p13•p12

    G0 = g3 + (p3•g2) + (p3•p2•g1) + (p3•p2•p1•g0)
    G1 = g7 + (p7•g6) + (p7•p6•g5) + (p7•p6•p5•g4)
    G2 = g11 + (p11•g10) + (p11•p10•g9) + (p11•p10•p9•g8)
    G3 = g15 + (p15•g14) + (p15•p14•g13) + (p15•p14•p13•g12)

    C1 = G0 + (P0•c0)
    C2 = G1 + (P1•G0) + (P1•P0•c0)
    C3 = G2 + (P2•G1) + (P2•P1•G0) + (P2•P1•P0•c0)
    C4 = G3 + (P3•G2) + (P3•P2•G1) + (P3•P2•P1•G0) + (P3•P2•P1•P0•c0)
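
The two levels can be combined in a short behavioral model of a 16-bit adder. This is an illustrative sketch with hypothetical names (`bits`, `lookahead`, `cla16`); it reuses one lookahead function for both levels, since the group equations for C1-C4 have the same form as the bit-level equations for c1-c4.

```python
def bits(x, n):
    """Little-endian list of the low n bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def lookahead(g, p, c0):
    """Carries 1..4 from four generate/propagate pairs and an incoming carry."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla16(a, b, c0=0):
    """16-bit add with two levels of carry lookahead; returns (sum, cout)."""
    a_bits, b_bits = bits(a, 16), bits(b, 16)
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x | y for x, y in zip(a_bits, b_bits)]

    # Second-level signals: one (P, G) pair per 4-bit group.
    P, G = [], []
    for k in range(4):
        g0, g1, g2, g3 = g[4 * k:4 * k + 4]
        p0, p1, p2, p3 = p[4 * k:4 * k + 4]
        P.append(p3 & p2 & p1 & p0)
        G.append(g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0))

    C = [c0] + lookahead(G, P, c0)  # C[k] enters group k; C[4] is cout

    total = 0
    for k in range(4):
        group_g, group_p = g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]
        inner = [C[k]] + lookahead(group_g, group_p, C[k])
        for j in range(4):
            i = 4 * k + j
            total |= (a_bits[i] ^ b_bits[i] ^ inner[j]) << i
    return total, C[4]
```

The order of the statements mirrors the calculation sequence: bit-level p and g first, then the group P and G, then the group carries, and finally each group's result bits.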

Shift Operations
- Also need operations to pack and unpack 8-bit characters into 32-bit words
- Shifts move all the bits in a word left or right

    sll $t2, $s0, 8     # $t2 = $s0 << 8 bits

    op      rs     rt     rd     shamt  funct
    000000  00000  10000  01010  01000  000000

    srl $t2, $s0, 8     # $t2 = $s0 >> 8 bits

    op      rs     rt     rd     shamt  funct
    000000  00000  10000  01010  01000  000010

- Such shifts are logical because they fill with zeros
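
The logical shifts and their R-type encodings can be reproduced in a few lines. This is a sketch, with hypothetical function names: it assumes the standard MIPS register numbers $s0 = 16 and $t2 = 10, and that the rs field is unused (zero) for shift instructions.

```python
MASK32 = 0xFFFFFFFF


def sll(x, shamt):
    """Shift left logical: zeros enter on the right; bits past bit 31 are lost."""
    return (x << shamt) & MASK32


def srl(x, shamt):
    """Shift right logical: zeros enter on the left."""
    return (x & MASK32) >> shamt


def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6, 5, 5, 5, 5 and 6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


S0, T2 = 16, 10  # assumed register numbers for $s0 and $t2

sll_word = encode_r(0, 0, S0, T2, 8, 0b000000)  # sll $t2, $s0, 8
srl_word = encode_r(0, 0, S0, T2, 8, 0b000010)  # srl $t2, $s0, 8
print(f"{sll_word:032b}")
print(f"{srl_word:032b}")
```

Unpacking the second-lowest character of a word, for instance, is `srl(word, 8) & 0xFF`.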

Shift Operations, con't
- An arithmetic shift (sra) maintains the arithmetic correctness of the shifted value (i.e., a number shifted right one bit should be ½ of its original value; a number shifted left should be 2 times its original value)
  - so sra uses the most significant bit (sign bit) as the bit shifted in
  - note that there is no need for a sla when using two's complement number representation

    sra $t2, $s0, 8     # $t2 = $s0 >> 8 bits

    op      rs     rt     rd     shamt  funct
    000000  00000  10000  01010  01000  000011

- The shift operation is implemented by hardware (usually a barrel shifter) outside the ALU
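
The difference between sra and srl shows up only for negative values, which a small model makes visible. This is a sketch on 32-bit patterns held in Python integers; the function name `sra` and the `MASK32` constant are choices made here.

```python
MASK32 = 0xFFFFFFFF


def sra(x, shamt):
    """Shift right arithmetic on a 32-bit pattern: the sign bit is shifted in."""
    x &= MASK32
    sign = x >> 31
    shifted = x >> shamt
    if sign:
        # Fill the vacated upper shamt bits with ones.
        shifted |= (MASK32 << (32 - shamt)) & MASK32
    return shifted
```

For example, 0xFFFFFF00 represents -256, and `sra(0xFFFFFF00, 8)` gives 0xFFFFFFFF, which represents -1, whereas a logical shift would give the large positive value 0x00FFFFFF.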

Multiplication
- More complicated than addition
  - accomplished via shifting and addition

          0010      (multiplicand)
        x 1011      (multiplier)
          ----
          0010
         0010       (partial product
        0000         array)
       0010
      --------
      00010110      (product)

- Double precision product produced
- More time and more area to compute
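
The paper-and-pencil method maps directly onto a loop of shifts and adds. This is a minimal sketch for unsigned operands; the name `shift_add_multiply` and the returned list of partial products are illustrative choices, not a description of the MIPS multiplier hardware.

```python
def shift_add_multiply(multiplicand, multiplier, width=4):
    """Unsigned multiply by shifting and adding; returns (product, partial products)."""
    product = 0
    partials = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # Each partial product is the multiplicand or zero, shifted left i places.
        partial = (multiplicand if bit else 0) << i
        partials.append(partial)
        product += partial
    return product, partials


product, partials = shift_add_multiply(0b0010, 0b1011)
print(f"{product:08b}")
```

Two width-bit operands can need up to 2 * width bits for the product, which is why the slide's 4-bit example is written with an 8-bit result.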

MIPS Multiply Instruction

    mult $s0, $s1     # hi||lo = $s0 * $s1

    op      rs     rt     rd     shamt  funct
    000000  10000  10001  00000  00000  011000

- Low-order word of the product is left in processor register lo and the high-order word is left in register hi
- Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
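
How the 64-bit product is split across hi and lo can be illustrated with a small model. This is a sketch assuming signed (two's complement) operands, as for mult; the names `to_signed32` and `mult` are hypothetical helpers, not MIPS APIs.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def mult(rs, rt):
    """Signed 32 x 32 -> 64-bit multiply; returns (hi, lo) as 32-bit patterns."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    hi = product >> 32     # high-order word, what mfhi would read
    lo = product & MASK32  # low-order word, what mflo would read
    return hi, lo
```

For example, `mult(0x10000, 0x10000)` returns `(1, 0)`: the product 2^32 does not fit in lo alone, so its only set bit lands in hi.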

Review: MIPS ISA, so far

Category       Instr               Op Code    Example               Meaning
Arithmetic     add                 0 and 32   add  $s1, $s2, $s3    $s1 = $s2 + $s3
(R & I         add unsigned        0 and 33   addu $s1, $s2, $s3    $s1 = $s2 + $s3
format)        subtract            0 and 34   sub  $s1, $s2, $s3    $s1 = $s2 - $s3
               subt unsigned       0 and 35   subu $s1, $s2, $s3    $s1 = $s2 - $s3
               add immediate       8          addi $s1, $s2, 6      $s1 = $s2 + 6
               add imm. unsigned   9          addiu $s1, $s2, 6     $s1 = $s2 + 6
               multiply            0 and 24   mult $s1, $s2         hi || lo = $s1 * $s2
               multiply unsigned   0 and 25   multu $s1, $s2        hi || lo = $s1 * $s2
               divide              0 and 26   div  $s1, $s2         lo = $s1/$s2, rem. in hi
               divide unsigned     0 and 27   divu $s1, $s2         lo = $s1/$s2, rem. in hi
Logical        and                 0 and 36   and  $s1, $s2, $s3    $s1 = $s2 & $s3
(R & I         or                  0 and 37   or   $s1, $s2, $s3    $s1 = $s2 | $s3
format)        xor                 0 and 38   xor  $s1, $s2, $s3    $s1 = $s2 xor $s3
               nor                 0 and 39   nor  $s1, $s2, $s3    $s1 = !($s2 | $s3)
               and immediate       12         andi $s1, $s2, 6      $s1 = $s2 & 6
               or immediate        13         ori  $s1, $s2, 6      $s1 = $s2 | 6
               xor immediate       14         xori $s1, $s2, 6      $s1 = $s2 xor 6

Review: MIPS ISA, so far con't

Category        Instr                Op Code    Example             Meaning
Shift           sll                  0 and 0    sll $s1, $s2, 4     $s1 = $s2 << 4
(R format)      srl                  0 and 2    srl $s1, $s2, 4     $s1 = $s2 >> 4
                sra                  0 and 3    sra $s1, $s2, 4     $s1 = $s2 >> 4
Data Transfer   load word            35         lw  $s1, 24($s2)    $s1 = Memory($s2+24)
(I format)      store word           43         sw  $s1, 24($s2)    Memory($s2+24) = $s1
                load byte            32         lb  $s1, 25($s2)    $s1 = Memory($s2+25)
                load byte unsigned   36         lbu $s1, 25($s2)    $s1 = Memory($s2+25)
                store byte           40         sb  $s1, 25($s2)    Memory($s2+25) = $s1
                load upper imm       15         lui $s1, 6          $s1 = 6 * 2^16
                move from hi         0 and 16   mfhi $s1            $s1 = hi
                move to hi           0 and 17   mthi $s1            hi = $s1
                move from lo         0 and 18   mflo $s1            $s1 = lo
                move to lo           0 and 19   mtlo $s1            lo = $s1

Review: MIPS ISA, so far con't

Category        Instr                            Op Code    Example              Meaning
Cond. Branch    br on equal                      4          beq $s1, $s2, L      if ($s1==$s2) go to L
(I & R          br on not equal                  5          bne $s1, $s2, L      if ($s1 !=$s2) go to L
format)         set on less than                 0 and 42   slt $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
                set on less than unsigned        0 and 43   sltu $s1, $s2, $s3   if ($s2<$s3) $s1=1 else $s1=0
                set on less than immediate       10         slti $s1, $s2, 6     if ($s2<6) $s1=1 else $s1=0
                set on less than imm. unsigned   11         sltiu $s1, $s2, 6    if ($s2<6) $s1=1 else $s1=0
Uncond. Jump    jump                             2          j 2500               go to 10000
(J & R          jump and link                    3          jal 2500             go to 10000; $ra=PC+4
format)         jump register                    0 and 8    jr $s1               go to $s1
                jump and link reg                0 and 9    jalr $s1, $s2        go to $s1, $s2=PC+4