14:332:331 Computer Architecture and Assembly Language
Fall 2003
Week 6
[Adapted from Dave Patterson's UCB CS 152 slides and Mary Jane Irwin's PSU CSE 331 slides]
331 W06.1 Fall 2003
Heads Up
- This week's material
  - VHDL modeling
    - Reading assignment: Y, Chapters 4 and 5
  - MIPS arithmetic operations
    - Reading assignment: PH 4.1 through 4.3
- Next week's material
  - MIPS logic and multiply instructions
    - Reading assignment: PH 4.4
  - MIPS ALU design
    - Reading assignment: PH 4.5
Review: Entity-Architecture Features
Design Entity-Architecture == Hardware Component
  Entity == External Characteristics
  Architecture (Body) == Internal Behavior or Structure
- Entity defines externally visible characteristics
  - Ports: channels of communication
- Architecture defines the internal behavior or structure
  - Declaration of internal signals
  - Description of behavior
    - concurrent behavioral description: collection of concurrent signal assignments (CSAs)
    - process behavioral description: CSAs and variable assignment statements within a process description
    - structural description: system described in terms of the interconnections of its components
Review: Model of Execution
- CSAs are executed concurrently
  - the textual order of the statements is irrelevant to correct operation
- Two-stage model of circuit execution
  - first stage
    - all CSAs with events occurring at the current time on signals on their right-hand side (RHS) are evaluated
    - all future events generated by this evaluation are scheduled
  - second stage
    - time is advanced to the time of the next event
- The VHDL programmer specifies
  - events: with CSAs
  - delays: with CSAs with delay annotation
  - concurrency: by having a distinct CSA for each signal
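The two-stage model can be sketched as a tiny discrete-event simulator in Python. This is a simplified model under stated assumptions: signals are 0/1 integers, each CSA is a function of its RHS signals plus a fixed delay, scheduled events are never cancelled (no inertial-delay pulse rejection), and the initial values are assumed consistent, so the initialization pass in which VHDL evaluates everything once is skipped. `simulate` and the tuple layout are hypothetical names for illustration.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple

# A CSA is modeled as (target signal, RHS signals, function of the RHS values, delay in ns).
CSA = Tuple[str, List[str], Callable[..., int], int]


def simulate(csas: List[CSA], init: Dict[str, int],
             stimuli: List[Tuple[int, str, int]], t_end: int) -> Dict[str, List[Tuple[int, int]]]:
    """Event-driven simulation; returns each signal's list of (time, new value) events."""
    values = dict(init)
    trace: Dict[str, List[Tuple[int, int]]] = {name: [] for name in values}
    seq = itertools.count()  # tie-breaker: same-time events pop in scheduling order
    queue = [(t, next(seq), sig, val) for t, sig, val in stimuli]
    heapq.heapify(queue)
    while queue and queue[0][0] <= t_end:
        now = queue[0][0]  # advance time to the next scheduled event
        changed = set()
        while queue and queue[0][0] == now:
            _, _, sig, val = heapq.heappop(queue)
            if values[sig] != val:  # only a change of value counts as an event
                values[sig] = val
                trace[sig].append((now, val))
                changed.add(sig)
        # Evaluate every CSA with an event on its RHS; schedule its output after its delay.
        for target, inputs, fn, delay in csas:
            if changed.intersection(inputs):
                new = fn(*(values[s] for s in inputs))
                heapq.heappush(queue, (now + delay, next(seq), target, new))
    return trace


# x <= a and b after 5 ns;  y <= not x after 2 ns;  (list order does not matter)
csas: List[CSA] = [
    ("y", ["x"], lambda x: 1 - x, 2),
    ("x", ["a", "b"], lambda a, b: a & b, 5),
]
trace = simulate(csas, {"a": 0, "b": 1, "x": 0, "y": 1}, [(10, "a", 1)], 100)
print(trace)  # {'a': [(10, 1)], 'b': [], 'x': [(15, 1)], 'y': [(17, 0)]}
```

Listing the `y` CSA before the `x` CSA does not change the result, which is the point of "textual order is irrelevant".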
Review: Signal Resolution
- Resolving values of pairs of std_logic type signals
  - When a signal has multiple drivers (e.g., a bus), the value of the resulting signal is determined by a resolution function
- std_logic values: U uninitialized, X forcing unknown, 0 forcing 0, 1 forcing 1, Z high impedance, W weak unknown, L weak 0, H weak 1, - don't care

         U  X  0  1  Z  W  L  H  -
      U  U  U  U  U  U  U  U  U  U
      X  U  X  X  X  X  X  X  X  X
      0  U  X  0  X  0  0  0  0  X
      1  U  X  X  1  1  1  1  1  X
      Z  U  X  0  1  Z  W  L  H  X
      W  U  X  0  1  W  W  W  W  X
      L  U  X  0  1  L  W  L  W  X
      H  U  X  0  1  H  W  W  H  X
      -  U  X  X  X  X  X  X  X  X
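A Python sketch of the resolution function, assuming each driver value is a one-character string. The table is the one defined in the IEEE 1164 `std_logic_1164` package; like that package's `resolved` function, the sketch returns a lone driver unchanged and otherwise folds the drivers pairwise starting from 'Z' (no driver). The names `resolve_pair` and `resolve` are illustrative.

```python
# Order of the nine std_logic values, as in IEEE 1164.
ORDER = "UX01ZWLH-"

# RESOLUTION[i][j] is the resolved value when drivers ORDER[i] and ORDER[j] meet.
RESOLUTION = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]


def resolve_pair(a: str, b: str) -> str:
    return RESOLUTION[ORDER.index(a)][ORDER.index(b)]


def resolve(drivers: list[str]) -> str:
    """Fold all driver values through the table, starting from 'Z'."""
    if len(drivers) == 1:
        return drivers[0]
    result = "Z"
    for d in drivers:
        result = resolve_pair(result, d)
    return result


print(resolve(["0", "Z"]), resolve(["0", "1"]), resolve(["L", "H"]))  # 0 X W
```

Because the table is symmetric, the order in which drivers are listed does not affect the result.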
Motivation for Process Construct
- How would you build the logic (and the VHDL code) for a 32-bit-wide, 2-input (32 by 2) multiplexor given inverters and 2-input NANDs?
[Figure: 2-to-1 multiplexor with data inputs A (select 0) and B (select 1), select input SEL, and output DOUT]
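One possible answer, sketched bit-parallel in Python: each bit computes DOUT = (A nand (not SEL)) nand (B nand SEL), which by De Morgan equals (A and not SEL) or (B and SEL). That is three 2-input NANDs per bit plus one inverter on SEL. Python integers stand in for 32-bit vectors; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def inv(x: int) -> int:
    return ~x & MASK


def nand2(x: int, y: int) -> int:
    return ~(x & y) & MASK


def mux32x2(a: int, b: int, sel: int) -> int:
    """DOUT = A when SEL = 0, B when SEL = 1, built only from inverters and NANDs."""
    s = MASK if sel else 0  # replicate SEL across all 32 bit positions
    return nand2(nand2(a, inv(s)), nand2(b, s))


print(hex(mux32x2(0x12345678, 0xDEADBEEF, 0)))  # 0x12345678
print(hex(mux32x2(0x12345678, 0xDEADBEEF, 1)))  # 0xdeadbeef
```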
MUX CSA Description
[Figure: 2-to-1 multiplexor with data inputs A (select 0) and B (select 1), select input SEL, and output DOUT]

    entity MUX32X2 is
      port(A, B: in std_logic_vector(31 downto 0);
           DOUT: out std_logic_vector(31 downto 0);
           SEL: in std_logic);
    end MUX32X2;

- How can we describe the circuit in VHDL if we don't know what primitive gates we will be designing with?
Mux Process Description
[Figure: 2-to-1 multiplexor with A on input 0, B on input 1, select SEL, and output DOUT]

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity MUX32X2 is
      port(A, B: in std_logic_vector(31 downto 0);
           DOUT: out std_logic_vector(31 downto 0);
           SEL: in std_logic);
    end MUX32X2;

    architecture process_behavior of MUX32X2 is
    begin
      mux32x2_process: process(A, B, SEL)
      begin
        if (SEL = '0') then
          DOUT <= A after 5 ns;
        else
          DOUT <= B after 4 ns;
        end if;
      end process mux32x2_process;
    end process_behavior;

- The process fires whenever a signal in the "sensitivity list" changes
VHDL Process Features
- Process body is executed sequentially to completion in zero (simulation) time
- Delays are associated only with assignment of values to signals
  - marked by the <= (signal assignment) operator
- Variable assignments take effect immediately
  - marked by the := operator
- Upon initialization, all processes are executed once
- After initialization, processes are data-driven
  - activated by events on signals in the sensitivity list
  - or waiting for the occurrence of specific events using wait statements
Process Programming Constructs
- if-then-else
  - Boolean-valued expressions are evaluated sequentially until the first true one is encountered
        if (expression1 = 'value1') then . . .
        elsif (expression2 = 'value2') then . . .
        end if;
- case
  - branches must cover all possible values for the case expression
        case (expression) is
          when 'value0' => . . .
        end case;
- for loop
  - loop index declared (locally) by virtue of its use in the loop statement
  - loop index cannot be assigned a value or altered in the loop body
        for index in value1 to value2 loop
- while loop
  - condition may involve variables modified within the loop
        while (condition) loop
Behavioral Description of a Register File
[Figure: register file of 32 words x 32 bits with inputs src1_addr, src2_addr, dst_addr, write_data and write_cntrl, and outputs src1_data and src2_data]

    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.std_logic_arith.all;

    entity regfile is
      port(write_data: in std_logic_vector(31 downto 0);
           dst_addr, src1_addr, src2_addr: in UNSIGNED(4 downto 0);
           write_cntrl: in std_logic;
           src1_data, src2_data: out std_logic_vector(31 downto 0));
    end regfile;
Behavioral Description of a Register File, con't

    architecture process_behavior of regfile is
      type reg_array is array(0 to 31) of std_logic_vector(31 downto 0);
    begin
      regfile_process: process(src1_addr, src2_addr, write_cntrl)
        variable data_array: reg_array := (
          (X"00000000"),
          . . .
          (X"00000000"));
        variable addrofsrc1, addrofsrc2, addrofdst: integer;
      begin
        addrofsrc1 := conv_integer(src1_addr);
        addrofsrc2 := conv_integer(src2_addr);
        addrofdst := conv_integer(dst_addr);
        if write_cntrl = '1' then
          data_array(addrofdst) := write_data;
        end if;
        src1_data <= data_array(addrofsrc1) after 10 ns;
        src2_data <= data_array(addrofsrc2) after 10 ns;
      end process regfile_process;
    end process_behavior;
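A Python sketch of what one activation of `regfile_process` computes, assuming addresses are already converted to integers (the role of `conv_integer`) and ignoring the 10 ns output delay. Because `data_array` is a variable, a write and a read of the same register within one activation return the new value. Like the VHDL, this sketch does not hardwire register 0 to zero. The class and method names are hypothetical.

```python
class RegFile:
    """Behavioral sketch of the regfile process: 32 words of 32 bits."""

    def __init__(self) -> None:
        self.data_array = [0] * 32  # like the zero-initialized process variable

    def evaluate(self, src1_addr: int, src2_addr: int, dst_addr: int,
                 write_data: int, write_cntrl: int) -> tuple[int, int]:
        # Variable assignment: the write takes effect immediately ...
        if write_cntrl == 1:
            self.data_array[dst_addr] = write_data & 0xFFFFFFFF
        # ... so reads in the same activation already see the new value.
        return self.data_array[src1_addr], self.data_array[src2_addr]


rf = RegFile()
print(rf.evaluate(src1_addr=3, src2_addr=0, dst_addr=3, write_data=0xCAFE, write_cntrl=1))
# (51966, 0)
```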
Process Construct with Wait Statement
[Figure: positive edge-triggered D flip-flop dff with inputs D and clk, outputs Q and Qbar]

    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.std_logic_arith.all;

    entity dff is
      port(D, clk: in std_logic;
           Q, Qbar: out std_logic);
    end dff;

    architecture dff_behavior of dff is
    begin
      output: process
      begin
        wait until (clk'event and clk = '1');
        Q <= D after 5 ns;
        Qbar <= not D after 5 ns;
      end process output;
    end dff_behavior;
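The `wait until (clk'event and clk = '1')` condition can be mimicked in Python by remembering the previous clock value and updating Q only when clk changes to 1. This sketch assumes 0/1 integers, ignores the 5 ns output delay, and assumes Q starts at 0 (in VHDL simulation an uninitialized std_logic output starts as 'U'). `DFF` and `step` are hypothetical names.

```python
class DFF:
    """Positive edge-triggered D flip-flop, fed one (D, clk) sample at a time."""

    def __init__(self) -> None:
        self.clk = 0
        self.q = 0

    def step(self, d: int, clk: int) -> tuple[int, int]:
        # clk'event and clk = '1': clk changed and its new value is 1 (a rising edge)
        if clk != self.clk and clk == 1:
            self.q = d
        self.clk = clk
        return self.q, 1 - self.q  # (Q, Qbar)


ff = DFF()
outs = [ff.step(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]]
print(outs)  # [(0, 1), (1, 0), (1, 0), (1, 0), (0, 1)]
```

D changes while clk stays at 1 (third sample) and on the falling edge (fourth) are ignored; only the rising edges capture D.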
Wait Statement Types
- Wait statements specify conditions under which a process may resume execution after suspension
  - wait for time expression
        wait for (20 ns);
    - suspends the process for a period of time defined by the time expression
  - wait on signal
        wait on clk, reset, status;
    - suspends the process until an event occurs on one (or more) of the signals
  - wait until condition
        wait until (clk'event and clk = '1');
    - suspends the process until the condition evaluates to true
  - wait
    - suspends the process indefinitely
- The process resumes execution at the first statement following the wait statement
Signal Attributes
- Attributes are used to return various types of information about a signal
    Function attribute          Function
    signal_name'event           Boolean value signifying a change in value on this signal
    signal_name'active          Boolean value signifying an assignment made to this signal (may not be a new value!)
    signal_name'last_event      Time since the last event on this signal
    signal_name'last_active     Time since the signal was last active
    signal_name'last_value      Previous value of this signal
Things to Remember About Processes
- A process must have either a sensitivity list or at least one wait statement
- A process cannot have both a sensitivity list and a wait statement
- Remember, all processes are executed once when the simulation is started
- Don't confuse signals and variables.
  - Signals are declared either in the port definitions in the entity description or as internal signals in the architecture description. They are used in CSAs. Signals are updated only after the next simulation cycle.
  - Variables exist only inside architecture process descriptions. They are used in variable assignment statements. Variables are updated immediately.
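The difference can be sketched in Python by modeling one pass through a process body. The statements `x := a + 1; y := x * 2;` (variables) versus `x <= a + 1; y <= x * 2;` (signals) are hypothetical examples, not from the slides; the signal version reads every right-hand side from the values that were current when the process was activated, and its results become visible only in the next simulation cycle.

```python
def with_variables(a: int) -> int:
    # x := a + 1;  y := x * 2;   -- each variable update is visible immediately
    x = a + 1
    y = x * 2
    return y


def with_signals(current: dict[str, int]) -> dict[str, int]:
    # x <= a + 1;  y <= x * 2;   -- both RHSs read the values from before this run
    scheduled = dict(current)
    scheduled["x"] = current["a"] + 1
    scheduled["y"] = current["x"] * 2
    return scheduled  # these values become visible only in the next simulation cycle


print(with_variables(3))                       # 8
print(with_signals({"a": 3, "x": 0, "y": 0}))  # {'a': 3, 'x': 4, 'y': 0}
```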
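Finite State Machine "Structure"
[Figure: combinational block comb with inputs a, b and output z; its next-state outputs D(0), D(1) feed two clocked dff flip-flops whose outputs Q(0), Q(1) feed back into comb. Example state sequence: Fetch (PC = PC+4) -> Decode -> Exec]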
Structural VHDL Model
- System is described by its component interconnections
  - assumes we have previously designed entity-architecture descriptions for both comb and dff with behavioral models
[Figure: seq_circuit; inputs in1 and in2 drive comb ports a and b, and comb output z drives out1; comb outputs nxt_state(0) and nxt_state(1) drive D(0) and D(1) of two dff instances clocked by clk; their outputs Q(0) and Q(1) return to comb as c_state(0) and c_state(1); Qbar(0) and Qbar(1) are unconnected]
Finite State Machine Structural VHDL

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity seq_circuit is
      port(in1, in2, clk: in std_logic;
           out1: out std_logic);
    end seq_circuit;

    architecture structural of seq_circuit is
      component comb
        port(a, b: in std_logic;
             z: out std_logic;
             c_state: in std_logic_vector(1 downto 0);
             nxt_state: out std_logic_vector(1 downto 0));
      end component;
      component dff
        port(D, clk: in std_logic;
             Q, Qbar: out std_logic);
      end component;
      for all: comb use entity work.comb(comb_behavior);
      for all: dff use entity work.dff(dff_behavior);
      signal s1, s2: std_logic_vector(1 downto 0);
    begin
      C0: comb port map(a=>in1, b=>in2, c_state=>s1, z=>out1, nxt_state=>s2);
      D0: dff port map(D=>s2(0), clk=>clk, Q=>s1(0), Qbar=>open);
      D1: dff port map(D=>s2(1), clk=>clk, Q=>s1(1), Qbar=>open);
    end structural;
Summary
- Introduction to VHDL
  - A language to describe hardware
    - entity = symbol, architecture ~ schematic, signals = wires
  - Inherently concurrent (parallel)
  - Has time as a concept
  - Behavioral descriptions of a component
    - can be specified using CSAs
    - can be specified using one or more processes and sequential statements
  - Structural descriptions of a system are specified in terms of its interconnections
    - behavioral models of each component must be provided
"Because ease of use is the purpose, this ratio of function to conceptual complexity is the ultimate test of system design. Neither function alone nor simplicity alone defines a good design."
The Mythical Man-Month, Brooks, pg. 43
Review: MIPS ISA
Category          Instr                        Op Code    Example               Meaning
Arithmetic        add                          0 and 32   add $s1, $s2, $s3     $s1 = $s2 + $s3
(R & I format)    subtract                     0 and 34   sub $s1, $s2, $s3     $s1 = $s2 - $s3
                  add immediate                8          addi $s1, $s2, 6      $s1 = $s2 + 6
                  or immediate                 13         ori $s1, $s2, 6       $s1 = $s2 v 6
Data Transfer     load word                    35         lw $s1, 24($s2)       $s1 = Memory($s2+24)
(I format)        store word                   43         sw $s1, 24($s2)       Memory($s2+24) = $s1
                  load byte                    32         lb $s1, 25($s2)       $s1 = Memory($s2+25)
                  store byte                   40         sb $s1, 25($s2)       Memory($s2+25) = $s1
                  load upper imm               15         lui $s1, 6            $s1 = 6 * 2^16
Cond. Branch      br on equal                  4          beq $s1, $s2, L       if ($s1==$s2) go to L
(I & R format)    br on not equal              5          bne $s1, $s2, L       if ($s1 != $s2) go to L
                  set on less than             0 and 42   slt $s1, $s2, $s3     if ($s2<$s3) $s1=1 else $s1=0
                  set on less than immediate   10         slti $s1, $s2, 6      if ($s2<6) $s1=1 else $s1=0
Uncond. Jump      jump                         2          j 2500                go to 10000
(J & R format)    jump register                0 and 8    jr $t1                go to $t1
                  jump and link                3          jal 2500              go to 10000; $ra=PC+4
Review: MIPS Organization, so far
[Figure: Processor and Memory. Processor: Register File of 32 registers ($zero - $ra), 32 bits each, with 5-bit src1 addr, src2 addr and dst addr inputs, a 32-bit write data input, and 32-bit src1 data and src2 data outputs; a 32-bit ALU; the PC with its Fetch (PC = PC+4), Decode, Exec cycle; one adder for PC + 4 and one for the branch (br) offset. Memory: 2^30 words of 32 bits, with read/write addr, read data and write data; byte addresses (big Endian) within each word, and word addresses (binary) 0…0000, 0…0100, 0…1000, 0…1100]
Arithmetic
- Where we've been:
  - Abstractions:
    - Instruction Set Architecture (ISA)
    - Assembly and machine language
- What's up ahead:
  - Implementing the architecture (in VHDL)
[Figure: ALU with 32-bit inputs A and B, 4-bit operation select m, 32-bit result, and 1-bit zero and ovf outputs]
ALU VHDL Representation

    entity ALU is
      port(A, B: in std_logic_vector(31 downto 0);
           m: in std_logic_vector(3 downto 0);
           result: out std_logic_vector(31 downto 0);
           zero: out std_logic;
           ovf: out std_logic);
    end ALU;

    architecture process_behavior of ALU is
      . . .
    begin
      ALU: process
      begin
        . . .
        result <= A + B;
        . . .
      end process ALU;
    end process_behavior;
Number Representation
- Bits are just bits (they have no inherent meaning)
  - conventions define the relationships between bits and numbers
- Binary numbers (base 2): integers
    0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 . . .
  - in decimal from 0 to 2^n - 1 for n bits
- Of course, it gets more complicated
  - storage locations (e.g., register file words) are finite, so we have to worry about overflow (i.e., when the number is too big to fit into 32 bits)
  - we have to be able to represent negative numbers, e.g., how do we specify -8 in
        addi $sp, $sp, -8     #$sp = $sp - 8
  - in real systems we have to provide for more than just integers, e.g., fractions and real numbers (and floating point)
Possible Representations
  Sign Mag.     Two's Comp.   One's Comp.
                1000 = -8
  1111 = -7     1001 = -7     1000 = -7
  1110 = -6     1010 = -6     1001 = -6
  1101 = -5     1011 = -5     1010 = -5
  1100 = -4     1100 = -4     1011 = -4
  1011 = -3     1101 = -3     1100 = -3
  1010 = -2     1110 = -2     1101 = -2
  1001 = -1     1111 = -1     1110 = -1
  1000 = -0                   1111 = -0
  0000 = +0     0000 = 0      0000 = +0
  0001 = +1     0001 = +1     0001 = +1
  0010 = +2     0010 = +2     0010 = +2
  0011 = +3     0011 = +3     0011 = +3
  0100 = +4     0100 = +4     0100 = +4
  0101 = +5     0101 = +5     0101 = +5
  0110 = +6     0110 = +6     0110 = +6
  0111 = +7     0111 = +7     0111 = +7
- Issues:
  - balance
  - number of zeros
  - ease of operations
- Which one is best? Why?
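To compare the three encodings on "balance" and "number of zeros", here is a Python sketch that decodes every 4-bit pattern under each convention (function names are illustrative). Python's -0 equals 0, so both zero patterns of sign-magnitude and one's complement are counted as zeros.

```python
def sign_magnitude(bits: int, n: int) -> int:
    magnitude = bits & ((1 << (n - 1)) - 1)
    return -magnitude if (bits >> (n - 1)) & 1 else magnitude


def ones_complement(bits: int, n: int) -> int:
    # a negative value is the bitwise inverse of its magnitude
    return -(~bits & ((1 << n) - 1)) if (bits >> (n - 1)) & 1 else bits


def twos_complement(bits: int, n: int) -> int:
    # the MSB carries weight -2^(n-1)
    return bits - (1 << n) if (bits >> (n - 1)) & 1 else bits


for name, decode in [("sign mag", sign_magnitude),
                     ("one's comp", ones_complement),
                     ("two's comp", twos_complement)]:
    values = [decode(b, 4) for b in range(16)]
    print(f"{name}: range {min(values)}..{max(values)}, zeros: {values.count(0)}")
# sign mag: range -7..7, zeros: 2
# one's comp: range -7..7, zeros: 2
# two's comp: range -8..7, zeros: 1
```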
MIPS Representations
- 32-bit signed numbers (2's complement), binary = decimal:
    0000 0000 0000 0000 0000 0000 0000 0000 = 0
    0000 0000 0000 0000 0000 0000 0000 0001 = + 1
    0000 0000 0000 0000 0000 0000 0000 0010 = + 2
    . . .
    0111 1111 1111 1111 1111 1111 1111 1110 = + 2,147,483,646
    0111 1111 1111 1111 1111 1111 1111 1111 = + 2,147,483,647   (maxint)
    1000 0000 0000 0000 0000 0000 0000 0000 = - 2,147,483,648   (minint)
    1000 0000 0000 0000 0000 0000 0000 0001 = - 2,147,483,647
    1000 0000 0000 0000 0000 0000 0000 0010 = - 2,147,483,646
    . . .
    1111 1111 1111 1111 1111 1111 1111 1101 = - 3
    1111 1111 1111 1111 1111 1111 1111 1110 = - 2
    1111 1111 1111 1111 1111 1111 1111 1111 = - 1
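A short Python sketch (the helper name is hypothetical) that reads a 32-bit pattern as a two's complement number. It reproduces the endpoints of the listing and shows that adding 1 to maxint in a 32-bit register wraps around to minint.

```python
def to_signed32(bits: int) -> int:
    """Interpret the low 32 bits as a two's complement number."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


MAXINT = to_signed32(0x7FFFFFFF)
MININT = to_signed32(0x80000000)
print(MAXINT, MININT)                # 2147483647 -2147483648
print(to_signed32(0xFFFFFFFD))       # -3
print(to_signed32(0x7FFFFFFF + 1))   # -2147483648: maxint + 1 wraps to minint
```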
Review: Signed Binary Representation
  2's comp   decimal
  1000       -8        = -2^3
  1001       -7        = -(2^3 - 1)
  1010       -6
  1011       -5
  1100       -4
  1101       -3
  1110       -2
  1111       -1
  0000        0
  0001        1
  0010        2
  0011        3
  0100        4
  0101        5
  0110        6
  0111        7        = 2^3 - 1
- Example: complement all the bits of 0101 (5) to get 1010, then add a 1 to get 1011 (-5)
Two's Complement Operations
- Negating a two's complement number: complement all the bits and add a 1
  - remember: "negate" and "invert" are quite different!
- Converting n-bit numbers into numbers with more than n bits:
  - the MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - copy the most significant bit (the sign bit) into the other bits
        0010 -> 0000 0010
        1010 -> 1111 1010
  - sign extension versus zero extension (lb vs. lbu)
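Negation and the two extension rules can be sketched in Python on n-bit patterns held in ints (helper names are illustrative). The last line widens the 16-bit immediate 0xFFF8 (-8) to 32 bits, as MIPS does for addi.

```python
def invert(bits: int, n: int) -> int:
    return ~bits & ((1 << n) - 1)                    # complement all the bits


def negate(bits: int, n: int) -> int:
    return (invert(bits, n) + 1) & ((1 << n) - 1)    # ... and add a 1


def sign_extend(bits: int, n: int, m: int) -> int:
    """Widen an n-bit pattern to m bits by copying the sign bit."""
    if (bits >> (n - 1)) & 1:
        return bits | (((1 << m) - 1) ^ ((1 << n) - 1))  # fill the new high bits with 1s
    return bits


def zero_extend(bits: int, n: int) -> int:
    """Widen an n-bit pattern by filling with 0s (as lbu does); any wider width works."""
    return bits & ((1 << n) - 1)


print(format(negate(0b0101, 4), "04b"))          # 1011  (-5)
print(format(invert(0b0101, 4), "04b"))          # 1010  (-6, not -5)
print(format(sign_extend(0b1010, 4, 8), "08b"))  # 11111010
print(format(zero_extend(0b1010, 4), "08b"))     # 00001010
print(hex(sign_extend(0xFFF8, 16, 32)))          # 0xfffffff8, the -8 in addi $sp, $sp, -8
```

Note that `negate(0b1000, 4)` returns 1000 again: the most negative value has no positive counterpart in 4 bits.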
Goal: Design an ALU for the MIPS ISA
- Must support the Arithmetic/Logic operations of the ISA
- Tradeoffs of cost and speed based on frequency of occurrence, hardware budget
MIPS Arithmetic and Logic Instructions
- Field layout (bit positions):
    R-type: op (31-26) | Rs (25-21) | Rt (20-16) | Rd (15-11) | shamt (10-6) | funct (5-0)
    I-type: op (31-26) | Rs (25-21) | Rt (20-16) | Immed16 (15-0)

    Type    op      funct    Type    op      funct    Type    op      funct
    ADDI    001000  xx       ADD     000000  100000           000000  101000
    ADDIU   001001  xx       ADDU    000000  100001           000000  101001
    SLTI    001010  xx       SUB     000000  100010   SLT     000000  101010
    SLTIU   001011  xx       SUBU    000000  100011   SLTU    000000  101011
    ANDI    001100  xx       AND     000000  100100           000000  101100
    ORI     001101  xx       OR      000000  100101
    XORI    001110  xx       XOR     000000  100110
    LUI     001111  xx       NOR     000000  100111

- Signed arithmetic generates overflow, but no carry out
Design Trick: Divide & Conquer
- Break the problem into simpler problems, solve them, and glue together the solution
- Example: assume the immediates have been taken care of before the ALU
  - now down to 10 operations
  - can encode in 4 bits
        00 add    01 addu   02 sub    03 subu   04 and
        05 or     06 xor    07 nor    12 slt    13 sltu
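A behavioral Python sketch of the resulting 10-operation ALU. It assumes the 4-bit codes above are written in octal (so slt = 12 octal = 1010 and sltu = 13 octal = 1011, matching the low four funct bits in the instruction table) and that operands arrive as 32-bit patterns. The outputs mirror the `result`, `zero` and `ovf` ports of the ALU entity; the helper names are hypothetical.

```python
MASK = 0xFFFFFFFF


def to_signed(x: int) -> int:
    return x - (1 << 32) if x & 0x80000000 else x


def fits32(v: int) -> bool:
    return -(1 << 31) <= v < (1 << 31)


def alu(a: int, b: int, m: int) -> tuple[int, int, int]:
    """Return (result, zero, ovf) for 32-bit patterns a, b and 4-bit operation code m."""
    sa, sb = to_signed(a), to_signed(b)
    ovf = 0
    if m in (0b0000, 0b0001):          # add, addu
        result = (a + b) & MASK
        ovf = int(m == 0b0000 and not fits32(sa + sb))   # only add flags overflow
    elif m in (0b0010, 0b0011):        # sub, subu
        result = (a - b) & MASK
        ovf = int(m == 0b0010 and not fits32(sa - sb))   # only sub flags overflow
    elif m == 0b0100:                  # and
        result = a & b
    elif m == 0b0101:                  # or
        result = a | b
    elif m == 0b0110:                  # xor
        result = a ^ b
    elif m == 0b0111:                  # nor
        result = ~(a | b) & MASK
    elif m == 0b1010:                  # slt: signed compare
        result = int(sa < sb)
    elif m == 0b1011:                  # sltu: unsigned compare
        result = int(a < b)
    else:
        raise ValueError(f"unused operation code {m:04b}")
    return result, int(result == 0), ovf


print(alu(0x7FFFFFFF, 1, 0b0000))   # (2147483648, 0, 1): add overflows
print(alu(0x7FFFFFFF, 1, 0b0001))   # (2147483648, 0, 0): addu does not flag it
print(alu(0xFFFFFFFF, 1, 0b1010))   # (1, 0, 0): -1 < 1 as signed
print(alu(0xFFFFFFFF, 1, 0b1011))   # (0, 1, 0): 0xFFFFFFFF > 1 as unsigned
```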
Addition & Subtraction
- Just like in grade school (carry/borrow 1s)
        0111        0110
      + 0110      - 0101
- Two's complement operations are easy
  - subtraction using addition of negative numbers
        0111            0111
      - 0110   ->     + 1010
- Overflow (result too large for finite computer word):
  - e.g., adding two n-bit numbers may not yield an n-bit number
        0111
      + 0001
Building a 1-bit Binary Adder
[Figure: 1-bit full adder with inputs A, B and carry_in, and outputs S and carry_out]
    A  B  carry_in | carry_out  S
    0  0  0        | 0          0
    0  0  1        | 0          1
    0  1  0        | 0          1
    0  1  1        | 1          0
    1  0  0        | 0          1
    1  0  1        | 1          0
    1  1  0        | 1          0
    1  1  1        | 1          1
  S = A xor B xor carry_in
  carry_out = A·B v A·carry_in v B·carry_in (majority function)
- How can we use it to build a 32-bit adder?
- How can we modify it easily to build an adder/subtractor?
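The two equations can be checked against the truth table with a few lines of Python (the function name is illustrative): carry_out and S together encode A + B + carry_in as a 2-bit number.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    s = a ^ b ^ carry_in                                   # S = A xor B xor carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # majority function
    return carry_out, s


# Print the truth table in the slide's column order: A B carry_in | carry_out S
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            co, s = full_adder(a, b, c)
            print(a, b, c, "|", co, s)
```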
Building 32-bit Adder
[Figure: chain of 32 1-bit full adders (FA); c0 = carry_in enters the bit-0 FA (inputs A0, B0, output S0), each FA's carry-out c(i+1) feeds the next FA, and the bit-31 FA (A31, B31, S31) produces c32 = carry_out]
- Just connect the carry-out of the least significant bit FA to the carry-in of the next more significant bit and connect . . .
- Ripple Carry Adder (RCA)
  - advantage: simple logic, so small (low cost)
  - disadvantage: slow and lots of glitching (so lots of energy consumption)
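A bit-serial Python model of the ripple carry adder (names are illustrative). The loop makes the slide's disadvantage visible: bit i cannot finish until the carry from bit i - 1 arrives, so the worst-case delay grows with the word width.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return carry_out, s


def ripple_carry_add(a: int, b: int, carry_in: int = 0, width: int = 32) -> tuple[int, int]:
    """Chain `width` 1-bit full adders; returns (sum, carry_out)."""
    carry = carry_in                   # c0
    total = 0
    for i in range(width):             # bit 0 first: each carry-out feeds the next bit up
        carry, s = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry                # carry is c32 when width = 32


print(ripple_carry_add(1234, 5678))     # (6912, 0)
print(ripple_carry_add(0xFFFFFFFF, 1))  # (0, 1)
```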
Building 32-bit Adder/Subtractor
- Remember 2's complement is just
  - complement all the bits
    - control (0 = add, 1 = subt): each FA receives B0 if control = 0, !B0 if control = 1
  - add a 1 in the least significant bit
        A     0111            0111
        B   - 0110   ->     + 1010
[Figure: ripple carry chain of 1-bit FAs (A0/B0 -> S0 through A31/B31 -> S31, c0 = carry_in, c32 = carry_out), with each Bi passed through the add/subt control before entering its FA]
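A Python sketch of the adder/subtractor, assuming the usual arrangement in which each Bi is XORed with the control line (giving Bi or !Bi) and the same control line drives c0 (supplying the "+1"), so subtraction becomes A + (not B) + 1. The function name and signature are hypothetical.

```python
def add_sub(a: int, b: int, control: int, width: int = 32) -> tuple[int, int]:
    """control = 0: a + b; control = 1: a - b, computed as a + (not b) + 1."""
    carry = control                      # c0 = control supplies the "+1" for subtraction
    result = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ control    # Bi if control = 0, !Bi if control = 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry


print(add_sub(0b0111, 0b0110, 1, width=4))  # (1, 1): 0111 + 1001 + 1 = 1 0001
print(add_sub(0b0111, 0b0110, 0, width=4))  # (13, 0): 0111 + 0110 = 1101
```

Ignoring the final carry_out, the 4-bit result 0001 is exactly 7 - 6.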
Overflow Detection and Effects
- Overflow: the result is too large to represent in the number of bits allocated
- When adding operands with different signs, overflow cannot occur!
- Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtracting a negative from a positive gives a negative
  - or, subtracting a positive from a negative gives a positive
- On overflow, an exception (interrupt) occurs
  - control jumps to a predefined address for the exception
  - the interrupted address (address of the instruction causing the overflow) is saved for possible resumption
- We don't always want to detect (interrupt on) overflow
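The four overflow cases reduce to sign checks on the operands and result. Here is a Python sketch on width-bit patterns (names are illustrative). A common hardware equivalent, not modeled here, is to XOR the carry into the most significant bit with the carry out of it.

```python
def add_overflows(a: int, b: int, width: int = 32) -> tuple[int, bool]:
    """Return (sum, overflow) for two's complement patterns a and b."""
    mask, sign = (1 << width) - 1, 1 << (width - 1)
    s = (a + b) & mask
    # operands have the same sign, but the sum's sign differs from it
    return s, (a & sign) == (b & sign) and (s & sign) != (a & sign)


def sub_overflows(a: int, b: int, width: int = 32) -> tuple[int, bool]:
    """Return (difference, overflow) for a - b."""
    mask, sign = (1 << width) - 1, 1 << (width - 1)
    d = (a - b) & mask
    # operands have different signs, and the result's sign differs from a's
    return d, (a & sign) != (b & sign) and (d & sign) != (a & sign)


print(add_overflows(0b0111, 0b0001, width=4))  # (8, True): 7 + 1 gives 1000 = -8
print(add_overflows(0b0111, 0b1010, width=4))  # (1, False): 7 + (-6), different signs
print(sub_overflows(0b1000, 0b0001, width=4))  # (7, True): -8 - 1 gives 0111 = +7
```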
New MIPS Instructions
Category          Instr                            Op Code    Example               Meaning
Arithmetic        add unsigned                     0 and 33   addu $s1, $s2, $s3    $s1 = $s2 + $s3
(R & I format)    subt unsigned                    0 and 35   subu $s1, $s2, $s3    $s1 = $s2 - $s3
                  add imm. unsigned                9          addiu $s1, $s2, 6     $s1 = $s2 + 6
Data Transfer     load byte unsigned               36         lbu $s1, 25($s2)      $s1 = Memory($s2+25)
Cond. Branch      set on less than unsigned        0 and 43   sltu $s1, $s2, $s3    if ($s2<$s3) $s1=1 else $s1=0
(I & R format)    set on less than imm. unsigned   11         sltiu $s1, $s2, 6     if ($s2<6) $s1=1 else $s1=0
- Sign extend: addiu, sltiu
- Zero extend: lbu
- No overflow detected: addu, subu, addiu, sltiu
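The extension and comparison rules can be sketched in Python on 32-bit patterns. The helper names are hypothetical, and only the data manipulation is modeled (no memory access, no register file). The last line shows the perhaps surprising effect of sltiu sign-extending its immediate before an unsigned compare.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Widen a `bits`-wide pattern to 32 bits by copying its sign bit."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value | (MASK32 ^ ((1 << bits) - 1))
    return value


def to_signed(x: int) -> int:
    return x - (1 << 32) if x & 0x80000000 else x


def lb(byte: int) -> int:
    return sign_extend(byte, 8)              # load byte: sign extend


def lbu(byte: int) -> int:
    return byte & 0xFF                       # load byte unsigned: zero extend


def slt(rs: int, rt: int) -> int:
    return int(to_signed(rs) < to_signed(rt))


def sltu(rs: int, rt: int) -> int:
    return int(rs < rt)                      # compare as unsigned 32-bit patterns


def sltiu(rs: int, imm16: int) -> int:
    return sltu(rs, sign_extend(imm16, 16))  # immediate is sign extended, then compared unsigned


print(hex(lb(0xF0)), hex(lbu(0xF0)))         # 0xfffffff0 0xf0
print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))  # 1 0
print(sltiu(5, 0xFFFF))                      # 1: 0xFFFF becomes 0xFFFFFFFF, the largest unsigned value
```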
Conclusion
- We can build an ALU to support the MIPS ISA
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
- Important points about hardware
  - all of the gates are always working (concurrent)
  - the speed of a gate is affected by the number of inputs to the gate (fan-in) and the number of gates that the output is connected to (fan-out)
  - the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "number of levels of logic")
- Our primary focus: comprehension, however,
  - clever changes to organization can improve performance (similar to using better algorithms in software)