ECE 232 Hardware Organization and Design Lecture 4

Performance, Design with VHDL
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
Adapted from Patterson 97 ©UCB. Copyright 1998 Morgan Kaufmann Publishers

Outline
° Performance, evaluation
  • Metrics: MIPS, CPI, execution time
  • Amdahl's law
° VHDL basics
  • Combinational logic
  • Examples
° Instruction formats, cont'd
  • Addressing classes, modes
  • Examples
  • MIPS assembly

Two notions of "performance"

  Plane        NY to Paris   Speed      Passengers   Throughput (pmph)
  Boeing 747   6.5 hours     610 mph    470          286,700
  Concorde     3 hours       1350 mph   132          178,200

Throughput here is passengers x speed (passenger-miles per hour, pmph).
Which has higher performance?
° Time to do the task (Execution Time)
  • execution time, response time, latency
° Tasks per day, hour, week, sec, ns, ... (Performance)
  • throughput, bandwidth
Response time and throughput are often in opposition

Performance - Example
Performance: in units of things/time_unit; bigger is better
• Time of Concorde vs. Boeing 747?
  - Concorde is 1350 mph / 610 mph = 2.2 times faster (6.5 hours / 3 hours)
• Throughput of Concorde vs. Boeing 747?
  - Concorde is 178,200 pmph / 286,700 pmph = 0.62 "times faster"
  - Boeing is 286,700 pmph / 178,200 pmph = 1.6 "times faster"
• Boeing is 1.6 times ("60%") faster in terms of throughput
• Concorde is 2.2 times ("120%") faster in terms of flying time
We will focus primarily on execution time for a single job
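
The ratios on this slide can be checked with a short calculation. The sketch below uses Python, which is not part of the lecture; the dictionary layout and function name are illustrative, and throughput is taken to be passengers times speed, as in the table on the previous slide.

```python
# Figures from the slide's table: flight time (hours), speed (mph), passengers.
planes = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput(plane):
    """Passenger-miles per hour (pmph): passengers x speed."""
    return plane["passengers"] * plane["mph"]


b747 = planes["Boeing 747"]
concorde = planes["Concorde"]

# Execution-time view: smaller is better, so compare slower time / faster time.
time_ratio = b747["hours"] / concorde["hours"]

# Throughput view: bigger is better, so compare larger / smaller throughput.
throughput_ratio = throughput(b747) / throughput(concorde)

print(f"Concorde is {time_ratio:.1f}x faster in flying time")
print(f"Boeing 747 has {throughput_ratio:.1f}x the throughput")
```

The same two machines rank differently depending on whether latency or throughput is the metric, which is why the metric has to be stated before claiming one is "faster".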

Metrics of performance

  Level                        Metric
  Application                  Answers per month, useful operations per second
  Programming language
  Compiler
                               (Millions of) instructions per second: MIPS
                               (Millions of) F.P. operations per second: MFLOP/s
  ISA
  Datapath, control            Megabytes per second
  Function units
  Transistors, wires, pins     Cycles per second (clock rate)

Each metric has a place and a purpose, and each can be misused

Review: Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                 Instr count   CPI   Clock rate
  Program             X
  Compiler            X         X
  Instr. Set          X         X
  Organization                  X        X
  Technology                             X

MIPS, CPI
• MIPS = millions of instructions executed per second

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$

• CPI = average number of cycles per instruction

$$\text{CPI} = \frac{\text{Clock cycles}}{\text{Instruction count}} = \frac{\text{CPU time} \times \text{Clock rate}}{\text{Instruction count}}$$

With $\text{CPI}_i$ the cycles per instruction for instruction class $i$ and $\text{Instr}_i$ the number of instructions of that class:

$$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times \text{Instr}_i$$

$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \quad \text{where } F_i = \frac{\text{Instr}_i}{\text{Instruction count}} \text{ is the "instruction frequency"}$$

° Invest resources where time is spent!
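
As a rough illustration of how these formulas fit together, the Python sketch below computes CPU time, average CPI, and MIPS for a made-up workload. The instruction counts, per-class CPI values, and the 100 MHz clock are assumptions chosen for easy arithmetic; they do not come from the lecture.

```python
def cpu_time(instr_counts, cpi_per_class, clock_rate_hz):
    """CPU time = ClockCycleTime x sum over classes of CPI_i x Instr_i."""
    cycles = sum(cpi_per_class[c] * n for c, n in instr_counts.items())
    return cycles / clock_rate_hz


def average_cpi(instr_counts, cpi_per_class):
    """CPI = sum over classes of CPI_i x F_i, with F_i = Instr_i / total."""
    total = sum(instr_counts.values())
    return sum(cpi_per_class[c] * n / total for c, n in instr_counts.items())


def mips(instr_counts, exec_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return sum(instr_counts.values()) / (exec_time_s * 1e6)


# Hypothetical workload (assumed values, not from the slides).
counts = {"alu": 6_000_000, "load": 3_000_000, "branch": 1_000_000}
cpi = {"alu": 1, "load": 4, "branch": 2}
clock_rate = 100e6  # 100 MHz, assumed

t = cpu_time(counts, cpi, clock_rate)
print(f"CPU time: {t:.3f} s")
print(f"Average CPI: {average_cpi(counts, cpi):.2f}")
print(f"MIPS: {mips(counts, t):.1f}")
```

Note that MIPS also equals clock rate / (CPI x 10^6), so a machine can raise its MIPS rating by running more, simpler instructions without finishing the program any sooner.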

Evaluating Instruction Sets
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static metrics:
° How many bytes does the program occupy in memory?
Dynamic metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best metric: time to execute the program!
NOTE: this depends on the instruction set, processor organization, and compilation techniques.

Example (RISC processor)
Base machine (Reg / Reg), typical mix:

  Op       Freq   Cycles   CPI(i)   % Time
  ALU      50%    1        0.5      23%
  Load     20%    5        1.0      45%
  Store    10%    3        0.3      14%
  Branch   20%    2        0.4      18%
  Total                    2.2

• How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
• How does this compare with using branch prediction to shave a cycle off the branch time?
• What if two ALU instructions could be executed at once?
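
One way to work through the three questions is to recompute CPI for each modified mix. The Python sketch below assumes that instruction count and clock cycle time are unchanged by each enhancement, so speedup is simply the ratio of CPIs. It also assumes that every ALU instruction can be paired with another one in the third case, which halves the effective ALU cycle count; that is an idealization, not something the slide states.

```python
# Instruction mix from the slide: class -> (frequency, cycles).
base_mix = {
    "alu": (0.5, 1),
    "load": (0.2, 5),
    "store": (0.1, 3),
    "branch": (0.2, 2),
}


def cpi(mix):
    """Average CPI = sum of frequency x cycles over all classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Return a copy of the mix with one class's cycle count replaced."""
    new_mix = dict(mix)
    new_mix[op] = (mix[op][0], cycles)
    return new_mix


base = cpi(base_mix)

# Speedup = old CPI / new CPI (same instruction count and clock assumed).
faster_loads = base / cpi(with_cycles(base_mix, "load", 2))
branch_prediction = base / cpi(with_cycles(base_mix, "branch", 1))
dual_alu = base / cpi(with_cycles(base_mix, "alu", 0.5))  # ideal pairing assumed

print(f"Base CPI: {base:.2f}")
print(f"Loads in 2 cycles:  {faster_loads:.3f}x")
print(f"Branch prediction:  {branch_prediction:.3f}x")
print(f"Two ALU ops at once: {dual_alu:.3f}x")
```

Under these assumptions the load improvement helps most, because loads account for the largest share of execution time even though ALU operations are the most frequent instructions.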

Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{Exec\_time w/o } E}{\text{Exec\_time with } E} = \frac{\text{Performance with } E}{\text{Performance w/o } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

$$\text{Exec\_time(with } E) = \left( (1 - F) + \frac{F}{S} \right) \times \text{Exec\_time(w/o } E)$$

$$\text{Speedup(with } E) = \frac{1}{(1 - F) + F/S}$$
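
A minimal Python sketch of the formula follows; the function name and the input checks are illustrative choices. The example values reuse the RISC mix from the previous slide, where loads account for 1.0 of the 2.2 cycles per instruction and the better cache speeds them up from 5 cycles to 2.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the time is sped up by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Loads take 1.0 / 2.2 of the execution time and get 5 / 2 = 2.5x faster.
f_load = 1.0 / 2.2
s_load = 5 / 2
load_speedup = amdahl_speedup(f_load, s_load)

# Upper bound: even an infinitely fast load unit cannot beat 1 / (1 - F).
load_limit = 1.0 / (1.0 - f_load)

print(f"Speedup with faster loads: {load_speedup:.3f}")
print(f"Limit as S grows without bound: {load_limit:.3f}")
```

The bound 1 / (1 - F) is the practical message of the law: the unaffected part of the task limits the overall gain no matter how large S becomes.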

Review: Summary of the Design Process
• Hierarchical design to manage complexity
• Top down vs. bottom up vs. successive refinement
• Importance of design representations:
  - Block diagrams
  - Decomposition into bit slices
  - Truth tables, K-maps
  - Circuit diagrams
  - Other descriptions: state diagrams, timing diagrams, register transfer, ...
• Optimization criteria:
  - Area, gate count, [package count], pin out
  - Logic levels, fan-in/fan-out, delay
  - Cost, power, design time

Hardware Representation Languages
° Block diagrams: FUs, registers, and dataflows
° Register transfer diagrams: choice of busses to connect FUs, regs
° Flowcharts and state diagrams: two different ways to describe sequencing and microoperations
° Hardware description languages (Verilog HDL, VHDL): HW modules described like programs, with I/O ports, internal state, and parallel execution of assignment statements
Descriptions in these languages can be used as input to:
  • simulation systems ("software breadboard")
  • synthesis systems (generate HW from a high-level description)
"To Design is to Represent"

VHDL (VHSIC Hardware Description Language)
° Goals:
  • Support design, documentation, and simulation of hardware
  • Digital system level to gate level
  • "Technology insertion"
° Concepts:
  • Design entity
  • Time-based execution model
Design entity = hardware component
  • Interface = external characteristics
  • Architecture (body) = internal behavior or structure

VHDL Example: nand Gate

    -- "nand" is a reserved word in VHDL, so the entity is named nand2 here
    ENTITY nand2 IS
      PORT (a, b: IN BIT;    -- input ports
            y: OUT BIT);     -- output port
    END nand2;

    ARCHITECTURE behavioral OF nand2 IS
    BEGIN
      y <= a NAND b;         -- concurrent signal assignment
    END behavioral;

[Figure: nand gate with inputs a, b and output y]
° Entity describes the interface
° Architecture gives the behavior (function)
° y is a signal, not a variable
  • it changes whenever the inputs change
  • the NAND process is in an infinite loop
° BIT is 0 or 1. Can also use STD_LOGIC (0, 1, Z, X)

Modeling Delays

    -- "nand" is a reserved word in VHDL, so the entity is named nand2 here
    ENTITY nand2 IS
      PORT (a, b: IN BIT;
            y: OUT BIT);
    END nand2;

    ARCHITECTURE behavioral OF nand2 IS
    BEGIN
      y <= a NAND b AFTER 1 ns;   -- output updates 1 ns after an input changes
    END behavioral;

° Model temporal, as well as functional, behavior with delays in signal assignment statements. Time is one difference from programming languages
° Output y changes 1 ns after a or b changes
° Delay statements are not supported by synthesis tools (non-synthesizable)

Bit-vector Operators

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;   -- needed for STD_LOGIC_VECTOR

    ENTITY nand32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            y: OUT STD_LOGIC_VECTOR (31 DOWNTO 0));
    END nand32;

    ARCHITECTURE behavioral OF nand32 IS
    BEGIN
      y <= a NAND b;               -- bitwise NAND over all 32 bits
    END behavioral;

[Figure: nand32 block with inputs a[31:0], b[31:0] and output y[31:0]]
° A STD_LOGIC_VECTOR can be converted to a 32-bit integer

Simple Operators

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;   -- must precede the entity that uses STD_LOGIC

    ENTITY mux2to1 IS
      PORT (a, b, sel: IN STD_LOGIC;
            y: OUT STD_LOGIC);
    END mux2to1;

    ARCHITECTURE logic OF mux2to1 IS
    BEGIN
      WITH sel SELECT
        y <= a WHEN '0',
             b WHEN OTHERS;
    END logic;

[Figure: mux2to1 with data inputs a (0) and b (1), select input sel, and output y]
° You can also use other constructs: IF ... THEN, WHEN, etc.
° Must use "others", since sel = {0, 1, Z, X} (std_logic)

Arithmetic Operations

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;   -- needed for STD_LOGIC_VECTOR

    ENTITY add32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            y: OUT STD_LOGIC_VECTOR (31 DOWNTO 0));
    END add32;

    ARCHITECTURE behavioral OF add32 IS
    BEGIN
      -- addum is not part of std_logic_1164; it is assumed to be defined elsewhere
      y <= addum(a, b);
    END behavioral;

° "addum" adds two n-bit vectors to produce an n+1 bit vector
° Alternatively, you can declare a, b, y as INTEGERs and use y <= a + b

Control Constructs

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;   -- needed for STD_LOGIC_VECTOR

    ENTITY mux32 IS
      PORT (A, B: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            DOUT: OUT STD_LOGIC_VECTOR (31 DOWNTO 0);
            SEL: IN BIT);
    END mux32;

    ARCHITECTURE behavior OF mux32 IS
    BEGIN
      mux32_process: PROCESS (A, B, SEL)   -- sensitivity list
      BEGIN
        IF (SEL = '0') THEN                -- BIT literals are written '0' and '1'
          DOUT <= A;
        ELSE
          DOUT <= B;
        END IF;
      END PROCESS;
    END behavior;

° A process fires whenever a signal in its "sensitivity list" changes
° It evaluates the body sequentially
° VHDL provides case statements as well