Computer Organization EECC 550
Weeks 1-8:
• Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Notation (RTN). [Chapters 1, 2]
• Instruction Set Architecture (ISA) Characteristics and Classifications: CISC vs. RISC. [Chapter 2]
• MIPS: An Example RISC ISA. Syntax, Instruction Formats, Addressing Modes, Encoding & Examples. [Chapter 2]
• Central Processor Unit (CPU) & Computer System Performance Measures. [Chapter 1] (3rd Edition Ch. 4)
• CPU Organization: Datapath & Control Unit Design. [Chapter 4] (3rd Edition Ch. 5)
  – MIPS Single Cycle Datapath & Control Unit Design.
  – MIPS Multicycle Datapath and Finite State Machine Control Unit Design. (3rd Edition Ch. 5, not in 4th Edition)
  – Microprogrammed Control Unit Design. (3rd Edition Ch. 5, not in 4th Edition) Microprogramming Project.
• Midterm Review and Midterm Exam.
• CPU Pipelining. [Chapter 4] (3rd Edition Ch. 6)
• The Memory Hierarchy: Cache Design & Performance. [Chapter 5] (3rd Edition Ch. 7)
• The Memory Hierarchy: Main & Virtual Memory. [Chapter 5]
Week 9:
• Input/Output Organization & System Performance Evaluation. [Chapter 7] (3rd Edition Ch. 8)
Week 10:
• Computer Arithmetic & ALU Design. [Chapter 3] If time permits.
Week 11:
• Final Exam.
EECC 550 - Shaaban, Lec # 1, Winter 2009, 12-1-2009

Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals
• Computing Element Choices:
  – Computing Element Programmability
  – Spatial vs. Temporal Computing
  – Main Processor Types/Applications
• General Purpose Processor Generations
• The Von Neumann Computer Model
• CPU Organization (Design)
• Recent Trends in Computer Design/Performance
• Hierarchy of Computer Architecture
• Hardware Description: Register Transfer Notation (RTN)
• Computer Architecture vs. Computer Organization
• Instruction Set Architecture (ISA):
  – Definition and purpose
  – ISA Specification Requirements
  – Main General Types of Instructions
  – ISA Types and characteristics
  – Typical ISA Addressing Modes
• Instruction Set Encoding
• Instruction Set Architecture Tradeoffs
• Complex Instruction Set Computer (CISC)
• Reduced Instruction Set Computer (RISC)
• Evolution of Instruction Set Architectures
Chapters 1, 2 (both editions)

Computing Element Choices
• General Purpose Processors (GPPs): Intended for general purpose computing (desktops, servers, clusters ...)
• Application-Specific Processors (ASPs): Processors with ISAs and architectural features tailored towards specific application domains
  – E.g. Digital Signal Processors (DSPs), Network Processors (NPs), Media Processors, Graphics Processing Units (GPUs), Vector Processors (?) ...
• Co-Processors: A hardware (hardwired) implementation of specific algorithms with a limited programming interface (augment GPPs or ASPs)
• Configurable Hardware:
  – Field Programmable Gate Arrays (FPGAs)
  – Configurable array of simple processing elements
• Application Specific Integrated Circuits (ASICs): A custom VLSI hardware solution for a specific computational task
• The choice of one or more depends on a number of factors including:
  – Type and complexity of computational algorithm (general purpose vs. specialized)
  – Desired level of flexibility/programmability
  – Performance requirements
  – Development cost/time
  – System cost
  – Power requirements
  – Real-time constraints
The main goal of this course is the study of fundamental design techniques for General Purpose Processors.

Computing Element Choices
[Figure: spectrum of computing elements. Programmability/flexibility is highest at the top of the list; specialization, development cost/time, performance, and computational efficiency (performance/chip area/watt) increase toward the bottom.]
• General Purpose Processors (GPPs)
• Application-Specific Processors (ASPs)
• Configurable Hardware
• Co-Processors
• Application Specific Integrated Circuits (ASICs)
Processor: Programmable computing element that runs programs written using a pre-defined set of instructions.
Selection factors:
  – Type and complexity of computational algorithms (general purpose vs. specialized)
  – Desired level of flexibility
  – Performance
  – Development cost
  – System cost
  – Power requirements
  – Real-time constraints

Computing Element Choices: Computing Element Programmability
• Fixed Function (hardware):
  – Computes one function (e.g. FP-multiply, divider, DCT)
  – Function defined at fabrication time
  – e.g. hardware (ASICs)
• Programmable (software running on a processor):
  – Computes "any" computable function (e.g. processors)
  – Function defined after fabrication
• Parameterizable Hardware: Performs a limited "set" of functions, e.g. Co-Processors

Computing Element Choices: Spatial vs. Temporal Computing
• Spatial (using hardware): Defined by fixed functionality and connectivity of hardware elements (shown as a hardware block diagram).
• Temporal (using software/program running on a processor): Defined by the sequence of processor instructions (the program).

Main Processor Types/Applications
• General Purpose Computing & General Purpose Processors (GPPs) (64 bit):
  – High performance: In general, faster is always better.
  – RISC or CISC: Intel P4, IBM Power4, SPARC, PowerPC, MIPS ...
  – Used for general purpose software
  – End-user programmable
  – Real-time performance may not be fully predictable (due to dynamic arch. features)
  – Heavy weight, multi-tasking OS: Windows, UNIX
  – Normally, low cost and power not a requirement (changing)
  – Servers, Workstations, Desktops (PCs), Notebooks, Clusters …
• Embedded Processing: Embedded processors and processor cores (16-32 bit):
  – Cost, power, code-size and real-time requirements and constraints
  – Once real-time constraints are met, a faster processor may not be better
  – e.g. Intel XScale, ARM, 486SX, Hitachi SH7000, NEC V800 ...
  – Often require Digital Signal Processing (DSP) support or other application-specific support (e.g. network, media processing)
  – Single or few specialized programs, known at system design time
  – Not end-user programmable
  – Real-time performance must be fully predictable (avoid dynamic arch. features)
  – Lightweight, often real-time OS or no OS
  – Examples: Cellular phones, consumer electronics ...
• Microcontrollers (8 bit):
  – Extremely code size/cost/power sensitive
  – Single program
  – Small word size: 8 bit common
  – Usually no OS
  – Highest volume processors by far
  – Examples: Control systems, automobiles, industrial control, thermostats ...
Embedded processors and microcontrollers are examples of Application-Specific Processors (ASPs). Cost/complexity increases toward GPPs; volume increases toward microcontrollers.

The Processor Design Space
[Figure: performance (vertical axis) vs. processor cost, i.e. chip area, power, complexity (horizontal axis), with the following regions.]
• Microcontrollers: Cost is everything.
• Embedded processors: Real-time constraints, specialized applications, low power/cost constraints.
• Microprocessors (GPPs): Performance is everything & software rules.
• Application specific architectures for performance.

General Purpose Processor/Computer System Generations
Classified according to implementation technology:
• The First Generation, 1946-59: Vacuum Tubes, Relays, Mercury Delay Lines:
  – ENIAC (Electronic Numerical Integrator and Computer): First electronic computer, 18000 vacuum tubes, 1500 relays, 5000 additions/sec (1946).
  – First stored program computer: EDSAC (Electronic Delay Storage Automatic Calculator), 1949.
• The Second Generation, 1959-64: Discrete Transistors.
  – e.g. IBM mainframes
• The Third Generation, 1964-75: Small and Medium-Scale Integrated (MSI) Circuits.
  – e.g. mainframes (IBM 360), minicomputers (DEC PDP-8, PDP-11).
• The Fourth Generation, 1975-Present: The Microcomputer. VLSI-based Microprocessors (single-chip processor)
  – First microprocessor: Intel's 4-bit 4004 (2300 transistors), 1971.
  – Personal Computers (PCs), laptops, PDAs, servers, clusters …
  – Reduced Instruction Set Computer (RISC), 1984
Common factor among all generations: All target the Von Neumann Computer Model or paradigm.

The Von Neumann Computer Model
• Partitioning of the programmable computing engine into components:
  – Central Processing Unit (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic and logic unit, connections, buses …).
  – Memory: Instruction (program) and operand (data) storage.
  – Input/Output (I/O) sub-system: I/O bus, interfaces, devices.
  – The stored program concept: Instructions from an instruction set are fetched from a common memory and executed one at a time.
• AKA Program Counter (PC) Based Architecture: The Program Counter (PC) points to the next instruction to be processed.
[Figure: computer system block diagram. The CPU (Control; Datapath with registers, ALU, buses) connects to Memory (instructions, data) and to Input and Output (I/O devices).]
• Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution, one instruction at a time.
• Another Performance Limitation: Separation of CPU and memory (the Von Neumann memory bottleneck).

Generic CPU Machine Instruction Processing Steps (Implied by The Von Neumann Computer Model)
• Instruction Fetch: Obtain instruction from program storage (memory). The Program Counter (PC) points to the next instruction to be processed.
• Instruction Decode: Determine required actions and instruction size.
• Operand Fetch: Locate and obtain operand data.
• Execute: Compute result value or status.
• Result Store: Deposit results in storage for later use.
• Next Instruction: Determine successor or next instruction (i.e. update PC to fetch the next instruction to be processed).
Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution, one instruction at a time.
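The processing steps above can be sketched as a small interpreter loop. This is only an illustration under stated assumptions: the one-accumulator machine, its four opcodes (LOAD, ADD, STORE, HALT), and the names `run`, `program`, and `memory` are hypothetical and do not come from the slides.

```python
# Hypothetical accumulator machine, used only to illustrate the six steps.
def run(program, memory):
    pc = 0   # Program Counter: index of the next instruction
    acc = 0  # single accumulator register
    while True:
        instr = program[pc]                  # 1. Instruction fetch
        opcode, address = instr              # 2. Instruction decode
        if opcode == "HALT":
            return acc
        if opcode in ("LOAD", "ADD"):
            operand = memory[address]        # 3. Operand fetch
        if opcode == "LOAD":
            acc = operand                    # 4. Execute, 5. result kept in acc
        elif opcode == "ADD":
            acc = acc + operand              # 4. Execute, 5. result kept in acc
        elif opcode == "STORE":
            memory[address] = acc            # 5. Result store to memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc += 1                              # 6. Next instruction (sequential)


memory = {0: 2, 1: 3, 2: 0}
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
result = run(program, memory)
```

Because the loop handles exactly one instruction per iteration, it also makes the sequential-execution limitation visible: nothing in the model overlaps the work of two instructions.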

Hardware Components of Computer Systems
Five classic components of all computers:
1. Control Unit
2. Datapath
3. Memory
4. Input
5. Output
Components 1 and 2 form the processor, or Central Processing Unit (CPU); components 4 and 5 form I/O.
[Figure: computer block diagram.]
• Processor (active): Control Unit, Datapath
• Memory (passive): where programs and data live when running
• Devices: Input (keyboard, mouse, etc.), Output (display, printer, etc.), Disk (both input and output)

CPU Organization (Design)
• Datapath Design: Components & their connections needed by ISA instructions
  – Components: Capabilities & performance characteristics of principal Functional Units (FUs) needed by ISA instructions (e.g. registers, ALU, shifters, logic units, ...).
  – Connections: Ways in which these components are interconnected (bus connections, multiplexors, etc.).
  – How information flows between components.
• Control Unit Design: Control/sequencing of operations of datapath components to realize ISA instructions
  – Logic and means by which such information flow is controlled.
  – Control and coordination of FU operation to realize the targeted Instruction Set Architecture (can be implemented using either a finite state machine or a microprogram).
• Hardware description with a suitable language, possibly using Register Transfer Notation (RTN).
ISA = Instruction Set Architecture. The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers.

A Typical Microprocessor Layout: The Intel Pentium Classic (1993-1997, 60 MHz - 233 MHz)
[Figure: chip layout with the Control Unit, the Datapath, and the First Level of Memory (Cache) labeled.]

Computer System Components
• CPU: Recently 1, 2, or 4 processor cores per chip, 1 GHz - 3.8 GHz.
• CPU core:
  – 4-way superscalar
  – RISC or RISC-core (x86)
  – Deep instruction pipelines
  – Dynamic scheduling
  – Multiple FP, integer FUs
  – Dynamic branch prediction
  – Hardware speculation
• Caches (all non-blocking):
  – L1: 16-128K, 1-2 way set associative (on chip), separate or unified
  – L2: 256K-2M, 4-32 way set associative (on chip), unified
  – L3: 2-16M, 8-32 way set associative (off or on chip), unified
• Front Side Bus (FSB) examples:
  – Alpha, AMD K7: EV6, 200-400 MHz
  – Intel PII, PIII: GTL+ 133 MHz
  – Intel P4: 800 MHz
• Memory (via memory controller and memory bus, off or on chip):
  – SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide, 2-way interleaved, ~900 MBytes/sec (64 bit)
  – Current standard, Double Data Rate (DDR) SDRAM PC3200: 64-128 bits wide, 4-way interleaved, ~3.2 GBytes/sec (one 64 bit channel), ~6.4 GBytes/sec (two 64 bit channels)
  – RAMbus DRAM (RDRAM): 400 MHz DDR, 16 bits wide (32 banks), ~1.6 GBytes/sec
• Chipset: North Bridge, South Bridge; adapters off or on chip.
• I/O buses, example:
  – PCI: 33-66 MHz, 32-64 bits wide, 133-528 MBytes/sec
  – PCI-X: 133 MHz, 64 bit, 1024 MBytes/sec
• I/O subsystem: Controllers and NICs connecting I/O devices (memory, disks, displays, keyboards, networks).

Microprocessor Performance Increase 1984-2000
[Figure: SPEC CPU 2000 performance vs. year.]
> 100x performance increase in one decade.

Microprocessor Transistor Count Growth Rate
[Figure: transistors per chip vs. year, from the Intel 4004 (2300 transistors, circa 1970) to ~2 billion currently.]
• Moore's Law: 2X transistors/chip every 1.5-2 years. Still holds today.
• ~800,000x transistor density increase in the last 38 years.
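As a rough consistency check on the figures above, the doubling period implied by two transistor counts can be computed directly. This is a back-of-the-envelope sketch: the function name `doubling_period_years` is hypothetical, and the inputs are the rounded numbers quoted on the slide (2300 transistors, ~2 billion transistors, 38 years).

```python
import math


def doubling_period_years(start_count, end_count, years):
    """Years per doubling implied by growth from start_count to end_count."""
    doublings = math.log2(end_count / start_count)
    return years / doublings


# Rounded slide figures: Intel 4004 (2300 transistors) to ~2 billion in 38 years.
growth_factor = 2_000_000_000 / 2300
period = doubling_period_years(2300, 2_000_000_000, 38)
print(f"growth factor: about {growth_factor:,.0f}x")
print(f"implied doubling period: about {period:.2f} years")
```

With these inputs the growth factor comes out a little under 900,000x and the doubling period a little under 2 years, which is in line with the quoted "2X every 1.5-2 years".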

A Current Multi-core Microprocessor Example
• The increase in transistor chip density allows integrating more than one processor core per chip (a benefit of Moore's Law).
• AMD Barcelona (Opteron X4): 4 processor cores on one chip.

Increase of Capacity of VLSI Dynamic RAM (DRAM) Memory Chips
[Figure: DRAM chip capacity vs. year.]
• 1.55X/yr, or doubling every 1.6 years (also follows Moore's Law).
• ~17,000x DRAM chip capacity increase in 20 years.
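The relationship between an annual growth rate and a doubling time can be checked with a few lines of arithmetic. This is a sketch using the rounded slide figures; the helper names `doubling_time_years` and `implied_annual_rate` are hypothetical.

```python
import math


def doubling_time_years(annual_rate):
    """Years needed to double at a constant annual growth factor."""
    return math.log(2) / math.log(annual_rate)


def implied_annual_rate(total_growth, years):
    """Constant annual growth factor that yields total_growth over years."""
    return total_growth ** (1 / years)


dram_doubling = doubling_time_years(1.55)    # slide: 1.55X per year
dram_rate = implied_annual_rate(17_000, 20)  # slide: ~17,000x in 20 years
print(f"doubling time at 1.55X/yr: about {dram_doubling:.2f} years")
print(f"annual rate implied by 17,000x in 20 years: about {dram_rate:.2f}X")
```

The first result matches the quoted "doubling every 1.6 years". The second comes out slightly above 1.55X, which suggests the slide's figures are rounded approximations rather than exact values.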

Computer Technology Trends: Evolutionary but Rapid Change
• Processor:
  – 1.5-1.6X performance improvement every year; over 100X performance in the last decade.
• Memory:
  – DRAM capacity: > 2X every 1.5 years; 1000X size in the last decade.
  – Cost per bit: Improves about 25% or more per year.
  – Only 15-25% performance improvement per year.
• Disk:
  – Capacity: > 2X in size every 1.5 years; 200X size in the last decade.
  – Cost per bit: Improves about 60% per year.
  – Only 10% performance improvement per year, due to mechanical limitations.
• The performance gap of memory and disk compared to CPU performance causes system performance bottlenecks.
• Expected state-of-the-art PC, fourth quarter 2009:
  – Processor clock speed: ~3000 MegaHertz (3 GigaHertz), with 2-4 processor cores on a single chip
  – Memory capacity: ~8000 MegaBytes (8 GigaBytes)
  – Disk capacity: > 1000 GigaBytes (1 TeraByte)

A Simplified View of The Software/Hardware Hierarchical Layers

Hierarchy of Computer Architecture
[Figure: layered hierarchy, from software at the top to hardware at the bottom.]
• Software: Application, Operating System, Compiler. Program forms: High-Level Language Programs, Assembly Language Programs, Machine Language Program.
• Software/Hardware Boundary: Instruction Set Architecture (ISA); Instr. Set Proc., I/O system. Firmware, e.g. BIOS (Basic Input/Output System).
• Hardware:
  – Datapath & Control: described by Register Transfer Notation (RTN); Microprogram
  – Digital Design: Logic Diagrams
  – Circuit Design: Circuit Diagrams
  – Layout: VLSI placement & routing

Levels of Program Representation
• High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
• Compiler produces the Assembly Language Program (MIPS assembly code):
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
• Assembler produces the Machine Language Program (the ISA level, the software/hardware boundary), shown on the slide as illustrative bit patterns:
    0000 1010 1100 0101 1001 1111 0110 1000 1100 0101 1010 0000 0110 1000 1111 1001 1010 0000 0101 1100
    1111 1000 0110 0101 1100 0000 1010 1000 0110 1001 1111
• Machine Interpretation produces the Control Signal Specification (microprogram), described in Register Transfer Notation (RTN):
    ALUOP[0:3] <= InstReg[9:11] & MASK
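The assembler step can be made concrete by encoding the four MIPS instructions above by hand. The sketch below assumes the standard MIPS I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) with opcode 0x23 for lw and 0x2B for sw; the helper name `encode_i_type` is hypothetical. The resulting words are real encodings and therefore differ from the illustrative bit patterns shown on the slide.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack a MIPS I-type instruction: opcode(6) | rs(5) | rt(5) | imm(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW, SW = 0x23, 0x2B  # standard MIPS opcodes for load word / store word

# For lw/sw, rs is the base register and rt is the data register.
words = [
    encode_i_type(LW, rs=2, rt=15, imm=0),  # lw $15, 0($2)
    encode_i_type(LW, rs=2, rt=16, imm=4),  # lw $16, 4($2)
    encode_i_type(SW, rs=2, rt=16, imm=0),  # sw $16, 0($2)
    encode_i_type(SW, rs=2, rt=15, imm=4),  # sw $15, 4($2)
]

for word in words:
    print(f"{word:032b}  0x{word:08X}")
```

Each printed line is one 32-bit machine language word, which is the form the hardware actually fetches and decodes.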

A Hierarchy of Computer Design
Level | Name | Modules | Primitives | Descriptive Media
Low Level - Hardware:
1 | Electronics | Gates, FF's | Transistors, Resistors, etc. | Circuit Diagrams
2 | Logic | Registers, ALU's ... | Gates, FF's ... | Logic Diagrams
3 | Organization | Processors, Memories | Registers, ALU's ... | Register Transfer Notation (RTN)
Firmware:
4 | Microprogramming | Assembly Language | Microinstructions | Microprogram
High Level - Software:
5 | Assembly language programming | OS Routines | Assembly language Instructions | Assembly Language Programs
6 | Procedural Programming | Applications, Drivers ... | OS Routines, High-level Languages | High-level Language Programs
7 | Application | Systems | Procedural Constructs | Problem-Oriented Programs

Hardware Description
• Hardware visualization:
  – Block diagrams (spatial visualization): Two-dimensional representations of functional units and their interconnections.
  – Timing charts (temporal visualization): Waveforms where events are displayed vs. time.
• Register Transfer Notation (RTN), AKA Register Transfer Language (RTL):
  – A way to describe microoperations capable of being performed by the data flow (data registers, data buses, functional units) at the register transfer (RT) level of design.
  – Also describes the conditional information in the system that causes operations to come about.
  – A "shorthand" notation for microoperations.
• Hardware Description Languages:
  – Examples: VHDL (VHSIC, Very High Speed Integrated Circuits, Hardware Description Language), Verilog.

Register Transfer Notation (RTN)
• Independent RTN:
  – No predefined data flow is assumed (i.e. no datapath design yet).
  – Describes actions on registers and memory locations without regard to nonexistence of direct paths or intermediate registers.
  – Useful to describe the functionality of instructions of a given ISA.
• Dependent RTN:
  – When RTN is used after the data flow (datapath design) is assumed to be frozen.
  – No data transfer can take place over a path that does not exist.
  – No RTN statement implies a function the data flow hardware is incapable of performing.
• The general format of an RTN statement:
    Conditional information: Action 1; Action 2; …
• The conditional statement is often an AND of literals (status and control signals) in the system (a p-term). The p-term is said to imply the action.
• Possible actions include transfer of data to/from registers/memory, data shifting, functional unit operations, etc.

RTN Statement Examples
• A ← B, or R[A] ← R[B], where R[X] means the content of register X
  – A copy of the data in entity B (typically a register) is placed in register A.
  – If the destination register has fewer bits than the source, the destination accepts only the lowest-order bits.
  – If the destination has more bits than the source, the value of the source is sign extended to the left.
• CTL · T0: A = B
  – The contents of B are presented to the input of combinational circuit A.
  – The action to the right of ":" takes place when control signal CTL is active and signal T0 is active.
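The width rules for A ← B can be illustrated with a short sketch. It assumes registers hold unsigned bit patterns of a fixed width with two's complement sign extension; the function name `transfer` and its parameters are hypothetical.

```python
def transfer(src_value, src_bits, dst_bits):
    """Model A <- B for registers of possibly different widths.

    Returns the bit pattern stored in the dst_bits-wide destination.
    """
    src_value &= (1 << src_bits) - 1               # keep only the source's bits
    if dst_bits <= src_bits:
        # Narrower (or equal) destination: accept only the lowest-order bits.
        return src_value & ((1 << dst_bits) - 1)
    # Wider destination: sign extend the source to the left.
    sign_bit = (src_value >> (src_bits - 1)) & 1
    if sign_bit:
        extension = ((1 << (dst_bits - src_bits)) - 1) << src_bits
        return src_value | extension
    return src_value


narrowed = transfer(0xABCD, src_bits=16, dst_bits=8)  # keeps low byte 0xCD
negative = transfer(0x80, src_bits=8, dst_bits=16)    # sign bit set: 0xFF80
positive = transfer(0x7F, src_bits=8, dst_bits=16)    # sign bit clear: 0x007F
```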

RTN Statement Examples
• MD ← M[MA], or MD ← Mem[MA]
  – Means the memory data (MD) register receives the contents of the main memory (M or Mem) location addressed by the Memory Address (MA) register.
• AC(0), AC(1), AC(2), AC(3)
  – Register fields are indicated by parentheses.
  – The concatenation operation is indicated by a comma.
  – Bit AC(0) is bit 0 of the accumulator AC.
  – The above expression means AC bits 0, 1, 2, 3.
  – More commonly represented by AC(0-3).
• E · T3: CLRWRITE
  – The control signal CLRWRITE is activated when the condition E · T3 is active.
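The three statements above can be mimicked in software as a rough sketch. The assumptions here are not from the slides: registers are kept in a dictionary, memory is a list, bit 0 is taken to be the least significant bit, and the helper names `read_memory`, `bit_field`, and `clrwrite` are hypothetical.

```python
def read_memory(regs, memory):
    """MD <- M[MA]: load MD from the memory location addressed by MA."""
    regs["MD"] = memory[regs["MA"]]


def bit_field(value, low, high):
    """Return bits low..high (inclusive), assuming bit 0 is the LSB."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def clrwrite(signals):
    """E . T3: CLRWRITE, active only when both E and T3 are active."""
    return signals["E"] and signals["T3"]


regs = {"MA": 2, "MD": 0, "AC": 0b10110110}
memory = [10, 20, 30, 40]

read_memory(regs, memory)                 # MD now holds memory[2]
low_nibble = bit_field(regs["AC"], 0, 3)  # AC(0-3)
active = clrwrite({"E": True, "T3": True})
```

The AND in `clrwrite` plays the role of the p-term: the action on the right of the colon happens only when every literal on the left is active.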

Computer Architecture vs. Computer Organization
• The term computer architecture is sometimes erroneously restricted to computer instruction set design, with other aspects of computer design called implementation.
• More accurate definitions:
  – Instruction Set Architecture (ISA): The actual programmer-visible instruction set, which serves as the boundary or interface between the software and hardware.
  – Implementation of a machine has two components:
    • Organization (CPU microarchitecture, CPU design): Includes the high-level aspects of a computer's design such as the memory system, the bus structure, and the internal CPU unit, which includes implementations of arithmetic, logic, branching, and data transfer operations.
    • Hardware (hardware design and implementation): Refers to the specifics of the machine such as detailed logic design and packaging technology.
• In general, Computer Architecture refers to the above three aspects:
  1. Instruction set architecture
  2. Organization
  3. Hardware

Instruction Set Architecture (ISA)
"... the attributes of a [computing] system as seen by the programmer (assembly programmer or compiler), i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." – Amdahl, Blaauw, and Brooks, 1964.
The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers.
The instruction set architecture is concerned with:
• Organization of programmable storage (memory & registers): Includes the amount of addressable memory and number of available registers.
• Data Types & Data Structures: Encodings & representations.
• Instruction Set: What operations are specified.
• Instruction formats and encoding.
• Modes of addressing and accessing data items and instructions.
• Exceptional conditions.

Evolution of Instruction Set Architectures
[Figure: evolution tree, listed here from earliest to latest.]
• Single Accumulator (EDSAC 1949) (no ISA)
• Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
• Separation of Programming Model from Implementation:
  – High-level Language Based (B5000 1963)
  – Concept of an ISA Family (IBM 360 1964)
• General Purpose Register (GPR) Machines:
  – Complex Instruction Sets (CISC) (Vax, Motorola 68000, Intel x86 1977-80)
  – Load/Store Architecture (CDC 6600, Cray 1 1963-76)
• Reduced Instruction Set Computer (RISC) (MIPS, SPARC, HP-PA, PowerPC, ... 1984 ...)

Computer Instruction Sets
• Regardless of computer type, CPU structure, or hardware organization, every machine instruction must specify the following:
  – Opcode (Operation Code): Which operation to perform. Examples: add, load, and branch.
  – Where to find the operand or operands, if any: Operands may be contained in CPU registers, main memory, or I/O ports. Operand locations can be explicitly specified in the instruction or implied.
  – Where to put the result, if there is a result: May be explicitly mentioned or implicit in the opcode.
  – Where to find the next instruction: Without any explicit branches, the instruction to execute is the next instruction in the sequence; in case of jump or branch instructions, it is at a specified address.

Instruction Set Architecture (ISA) Instruction Specification Requirements
Processing steps: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.
• Instruction format or encoding:
  – How is it decoded?
• Location of operands and result (addressing modes):
  – Where other than memory?
  – How many explicit operands?
  – How are memory operands located?
  – Which can or cannot be in memory?
• Data type and size.
• Operations:
  – What are supported?
• Successor instruction:
  – Jumps, conditions, branches.
• Fetch-decode-execute is implicit.

Main General Types of Instructions
1. Data Movement Instructions, possible variations:
  – Memory-to-memory.
  – Memory-to-CPU register.
  – CPU-to-memory.
  – Constant-to-CPU register.
  – CPU-to-output.
  – etc.
2. Arithmetic Logic Unit (ALU) Instructions:
  – Logic instructions
  – Integer arithmetic instructions
  – Floating point arithmetic instructions
3. Branch (Control) Instructions:
  – Unconditional jumps.
  – Conditional branches.

Examples of Data Movement Instructions
Instruction | Meaning | Machine
MOV A, B | Move 16-bit data from memory loc. A to loc. B | VAX11
lwz R3, A | Move 32-bit data from memory loc. A to register R3 | PPC601
li $3, 455 | Load the 32-bit integer 455 into register $3 | MIPS R3000
MOV AX, BX | Move 16-bit data from register BX into register AX | Intel X86
LEA.L (A0), A2 | Load the address pointed to by A0 into A2 | MC68000

Examples of ALU Instructions
Instruction       Meaning                                                                                      Machine
MULF A, B, C      Multiply the 32-bit floating point values at mem. locations A and B, store result in loc. C  VAX11
nabs r3, r1       Store the negative absolute value of register r1 in r3                                       PPC601
ori $2, $1, 255   Store the logical OR of register $1 with 255 into $2                                         MIPS R3000
SHL AX, 4         Shift the 16-bit value in register AX left by 4 bits                                         Intel X86
ADD.L D0, D1      Add the 32-bit values in registers D0, D1 and store the result in register D1                MC68000

Examples of Branch Instructions
Instruction       Meaning                                                                                           Machine
BLBS A, Tgt       Branch to address Tgt if the least significant bit at location A is set                           VAX11
bun r2            Branch to location in r2 if the previous comparison signaled that one or more values was not a number  PPC601
beq $2, $1, 32    Branch to location PC+4+32 if contents of $1 and $2 are equal                                     MIPS R3000
JCXZ Addr         Jump to Addr if contents of register CX = 0                                                       Intel X86
BVS next          Branch to next if overflow flag in CC is set                                                      MC68000

Operation Types in The Instruction Set
Operator Type            Examples
Arithmetic and logical   Integer arithmetic and logical operations: add, or
Data transfer            Loads-stores (move on machines with memory addressing)
Control                  Branch, jump, procedure call and return, traps
System                   Operating system call/return, virtual memory management instructions
Floating point           Floating point operations: add, multiply
Decimal                  Decimal add, decimal multiply, decimal to character conversion
String                   String move, string compare, string search
Media                    The same operation performed on multiple data (e.g. Intel MMX, SSE)

Instruction Usage Example: Top 10 Intel X86 Instructions
Rank   Instruction              Integer average (percent of total executed)
1      load                     22%
2      conditional branch       20%
3      compare                  16%
4      store                    12%
5      add                      8%
6      and                      6%
7      sub                      5%
8      move register-register   4%
9      call                     1%
10     return                   1%
       Total                    96%
Observation: Simple instructions dominate instruction usage frequency. (CISC to RISC observation)

Types of Instruction Set Architectures According To Operand Addressing Fields
(Machine = ISA, or a CPU targeting a specific ISA type)
Memory-To-Memory Machines:
– Operands are obtained from memory and results are stored back in memory by any instruction that requires operands.
– No local CPU registers are used in the CPU datapath.
– Include:
• The 4-address Machine.
• The 3-address Machine.
• The 2-address Machine.
The 1-address (Accumulator) Machine:
– A single local CPU special-purpose register (accumulator) is used as the source of one operand and as the result destination.
The 0-address or Stack Machine:
– A push-down stack is used in the CPU.
General Purpose Register (GPR) Machines:
– The CPU datapath contains several local general-purpose registers which can be used as operand sources and as result destinations.
– A large number of possible addressing modes.
– Load-Store or Register-To-Register Machines: GPR machines where only data movement instructions (loads, stores) can obtain operands from memory and store results to memory. CISC to RISC observation (load-store simplifies CPU design).

Types of Instruction Set Architectures
Memory-To-Memory Machines: The 4-Address Machine
• No program counter (PC) or other CPU registers are used.
• Instruction encoding has four address fields to specify:
– Location of first operand.
– Location of second operand.
– Place to store the result.
– Location of next instruction.
Instruction: add Res, Op1, Op2, Nexti
Meaning: Res ← Op1 + Op2
or more precise RTN: M[ResAddr] ← M[Op1Addr] + M[Op2Addr]
Instruction Format (encoding):
Field:   Opcode (add)      ResAddr               Op1Addr, Op2Addr          NextiAddr
Bits:    8                 24                    24 each                   24
Role:    Which operation   Where to put result   Where to find operands    Where to find next instruction
Instruction Size: 13 bytes
Can address 2^24 bytes = 16 MBytes
[Figure: memory holding Op1, Op2, Res, and Nexti at addresses Op1Addr, Op2Addr, ResAddr, and NextiAddr, connected to an adder in the CPU.]

Types of Instruction Set Architectures
Memory-To-Memory Machines: The 3-Address Machine
• A program counter (PC) is included within the CPU which points to the next instruction.
• No CPU storage (general-purpose registers).
Instruction: add Res, Op1, Op2
Meaning: Res ← Op1 + Op2
or more precise RTN: M[ResAddr] ← M[Op1Addr] + M[Op2Addr]
PC ← PC + 10 (increment PC)
Instruction Format (encoding):
Field:   Opcode (add)      ResAddr               Op1Addr, Op2Addr
Bits:    8                 24                    24 each
Role:    Which operation   Where to put result   Where to find operands
Where to find next instruction: the 24-bit Program Counter (PC).
Instruction Size: 10 bytes
Can address 2^24 bytes = 16 MBytes
[Figure: memory holding Op1, Op2, Res, and Nexti, connected to an adder and the PC in the CPU.]

Types of Instruction Set Architectures
Memory-To-Memory Machines: The 2-Address Machine
• The 2-address Machine: Result is stored in the memory address of one of the operands.
Instruction: add Op2, Op1
Meaning: Op2 ← Op1 + Op2
or more precise RTN: M[Op2Addr] ← M[Op1Addr] + M[Op2Addr]
PC ← PC + 7 (increment PC)
Instruction Format (encoding):
Field:   Opcode (add)      Op2Addr                                        Op1Addr
Bits:    8                 24                                             24
Role:    Which operation   Where to find operand and where to put result  Where to find operand
Where to find next instruction: the 24-bit Program Counter (PC).
Instruction Size: 7 bytes
Can address 2^24 bytes = 16 MBytes
[Figure: memory holding Op1, Op2 (also the result), and Nexti, connected to an adder and the PC in the CPU.]

Types of Instruction Set Architectures
The 1-address (Accumulator) Machine
• A single accumulator in the CPU is used as the source of one operand and as the result destination.
Instruction: add Op1
Meaning: Acc ← Acc + Op1
or more precise RTN: Acc ← Acc + M[Op1Addr]
PC ← PC + 4 (increment PC)
Instruction Format (encoding):
Field:   Opcode (add)      Op1Addr
Bits:    8                 24
Role:    Which operation   Where to find operand 1
Where to find operand 2, and where to put result: the Accumulator.
Where to find next instruction: the 24-bit Program Counter (PC).
Instruction Size: 4 bytes
Can address 2^24 bytes = 16 MBytes
[Figure: memory holding Op1 and Nexti, connected to an adder, the accumulator, and the PC in the CPU.]

Types of Instruction Set Architectures
The 0-address (Stack) Machine
• A push-down stack is used in the CPU.
TOS = Top Entry in Stack, SOS = Second Entry in Stack
Instruction: push Op1
Meaning: TOS ← M[Op1Addr]
Instruction Format: Opcode (8 bits), Op1Addr (24 bits, where to find operand). Size: 4 bytes.
Instruction: add
Meaning: TOS ← TOS + SOS
Instruction Format: Opcode (8 bits). Size: 1 byte.
Instruction: pop Res
Meaning: M[ResAddr] ← TOS
Instruction Format: Opcode (8 bits), ResAddr (24 bits, memory destination). Size: 4 bytes.
Where to find next instruction: the 24-bit Program Counter (PC).
Can address 2^24 bytes = 16 MBytes
[Figure: memory holding Op1, Op2, Res, and Nexti; push and pop move data between memory and the CPU stack (TOS, SOS, etc.), which feeds an adder.]
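The push/add/pop semantics above can be traced with a small simulation. This is a minimal sketch, not part of the original slides: the function name, the sample values 7 and 5, and the choice that add removes both TOS and SOS before pushing their sum are assumptions made for illustration. The PC increments use the instruction sizes given on the slide (4 bytes for push and pop, 1 byte for add).

```python
def run_stack_program(program, memory):
    """Execute push/add/pop on a toy 0-address machine; return the final PC."""
    stack = []   # the last list element plays the role of TOS
    pc = 0       # byte address of the next instruction
    for opcode, *operand in program:
        if opcode == "push":        # TOS <- M[Op1Addr], 4-byte instruction
            stack.append(memory[operand[0]])
            pc += 4
        elif opcode == "add":       # TOS <- TOS + SOS, 1-byte instruction
            tos = stack.pop()
            sos = stack.pop()
            stack.append(tos + sos)
            pc += 1
        elif opcode == "pop":       # M[ResAddr] <- TOS, 4-byte instruction
            memory[operand[0]] = stack.pop()
            pc += 4
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return pc


# Sample data values are arbitrary.
memory = {"Op1": 7, "Op2": 5, "Res": 0}
program = [("push", "Op1"), ("push", "Op2"), ("add",), ("pop", "Res")]
final_pc = run_stack_program(program, memory)
print(memory["Res"], final_pc)
```

Computing Res = Op1 + Op2 takes four instructions here, but only 13 bytes of code, the same as a single 4-address add.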

Types of Instruction Set Architectures
General Purpose Register (GPR) Machines
• CPU contains several general-purpose registers which can be used as operand sources and result destinations.
• Eight general purpose registers (GPRs) are assumed here: R1-R8.
Instruction: load R8, Op1
Meaning: R8 ← M[Op1Addr]
PC ← PC + 5
Instruction Format: Opcode (8 bits), R8 (3 bits), Op1Addr (24 bits, where to find operand 1). Size = 4.375 bytes rounded up to 5 bytes.
Instruction: add R2, R4, R6
Meaning: R2 ← R4 + R6
PC ← PC + 3
Instruction Format: Opcode (8 bits), destination R2 (3 bits), operands R4 and R6 (3 bits each). Size = 2.125 bytes rounded up to 3 bytes.
Instruction: store R2, Op2
Meaning: M[Op2Addr] ← R2
PC ← PC + 5
Instruction Format: Opcode (8 bits), R2 (3 bits), Op2Addr (24 bits, destination). Size = 4.375 bytes rounded up to 5 bytes.
Here the add instruction has three register specifier fields, while the load and store instructions have one register specifier field and one memory address specifier field.
Where to find next instruction: the 24-bit Program Counter (PC).
[Figure: memory holding Op1 and Nexti; load and store move data between memory and registers R1-R8, which feed an adder in the CPU.]
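The instruction sizes quoted for each machine type follow directly from the field widths used on these slides: an 8-bit opcode, 24 bits per memory address field, and 3 bits per register specifier, rounded up to whole bytes. The sketch below recomputes them; the function and dictionary names are illustrative assumptions.

```python
OPCODE_BITS = 8      # opcode field width
ADDRESS_BITS = 24    # memory address field width (2^24 bytes = 16 MBytes)
REGISTER_BITS = 3    # register specifier width (eight GPRs)


def instruction_bytes(address_fields=0, register_fields=0):
    """Instruction size in bytes, rounded up to a whole number of bytes."""
    bits = (OPCODE_BITS
            + ADDRESS_BITS * address_fields
            + REGISTER_BITS * register_fields)
    return (bits + 7) // 8


# (address fields, register fields) for each format described above
FORMATS = {
    "4-address add": (4, 0),
    "3-address add": (3, 0),
    "2-address add": (2, 0),
    "accumulator add": (1, 0),
    "stack push/pop": (1, 0),
    "stack add": (0, 0),
    "GPR load/store": (1, 1),
    "GPR add": (0, 3),
}

for name, (addresses, registers) in FORMATS.items():
    print(f"{name}: {instruction_bytes(addresses, registers)} bytes")
```

The GPR load, for example, needs 8 + 3 + 24 = 35 bits, which is 4.375 bytes and therefore occupies 5 bytes.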

Expression Evaluation Example with 3-, 2-, 1-, 0-Address, And GPR Machines
For the expression A = (B + C) * D - E where A-E are in memory:

3-Address:
add A, B, C
mul A, A, D
sub A, A, E
3 instructions. Code size: 30 bytes. 9 memory accesses for data.

2-Address:
load A, B
add A, C
mul A, D
sub A, E
4 instructions. Code size: 28 bytes. 11 memory accesses for data.

1-Address (Accumulator):
load B
add C
mul D
sub E
store A
5 instructions. Code size: 20 bytes. 5 memory accesses for data.

0-Address (Stack):
push B
push C
add
push D
mul
push E
sub
pop A
8 instructions. Code size: 23 bytes. 5 memory accesses for data.

GPR, Register-Memory:
load R1, B
add R1, C
mul R1, D
sub R1, E
store A, R1
5 instructions. Code size: 25 bytes. 5 memory accesses for data.

GPR, Load-Store:
load R1, B
load R2, C
add R3, R1, R2
load R1, D
mul R3, R1
load R1, E
sub R3, R1
store A, R3
8 instructions. Code size: 34 bytes. 5 memory accesses for data.
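The instruction counts, code sizes, and data memory access counts for this example can be rechecked mechanically. The sketch below is one way to do so and is not part of the original slides. It assumes the instruction sizes from the preceding machine descriptions, and it assumes that on the memory-to-memory machines an ALU instruction makes three data accesses (two reads and one write) while a move makes two; all names are illustrative.

```python
MEMORY_NAMES = set("ABCDE")
MEMORY_TO_MEMORY = {"3-address", "2-address"}
FIXED_SIZE = {"3-address": 10, "2-address": 7, "1-address": 4,
              "GPR register-memory": 5}

PROGRAMS = {
    "3-address": ["add A, B, C", "mul A, A, D", "sub A, A, E"],
    "2-address": ["load A, B", "add A, C", "mul A, D", "sub A, E"],
    "1-address": ["load B", "add C", "mul D", "sub E", "store A"],
    "0-address": ["push B", "push C", "add", "push D", "mul",
                  "push E", "sub", "pop A"],
    "GPR register-memory": ["load R1, B", "add R1, C", "mul R1, D",
                            "sub R1, E", "store A, R1"],
    "GPR load-store": ["load R1, B", "load R2, C", "add R3, R1, R2",
                       "load R1, D", "mul R3, R1", "load R1, E",
                       "sub R3, R1", "store A, R3"],
}


def parse(instruction):
    """Split 'add A, B, C' into ('add', ['A', 'B', 'C'])."""
    opcode, _, rest = instruction.partition(" ")
    operands = [field.strip() for field in rest.split(",") if field.strip()]
    return opcode, operands


def size_in_bytes(machine, operands):
    uses_memory = any(o in MEMORY_NAMES for o in operands)
    if machine == "0-address":        # push/pop: 4 bytes, ALU: 1 byte
        return 4 if uses_memory else 1
    if machine == "GPR load-store":   # load/store: 5 bytes, ALU: 3 bytes
        return 5 if uses_memory else 3
    return FIXED_SIZE[machine]


def data_accesses(machine, opcode, operands):
    if machine in MEMORY_TO_MEMORY:   # move: 1 read + 1 write, ALU: 2 reads + 1 write
        return 2 if opcode == "load" else 3
    return sum(o in MEMORY_NAMES for o in operands)


def summarize(machine):
    """Return (instruction count, code size in bytes, data memory accesses)."""
    count = size = accesses = 0
    for instruction in PROGRAMS[machine]:
        opcode, operands = parse(instruction)
        count += 1
        size += size_in_bytes(machine, operands)
        accesses += data_accesses(machine, opcode, operands)
    return count, size, accesses


for machine in PROGRAMS:
    print(machine, summarize(machine))
```

The tally makes the tradeoff visible: the 3-address code has the fewest instructions but the most bytes per instruction, while the accumulator code is the smallest overall.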

Typical GPR ISA Memory Addressing Modes
Addressing Mode    Sample Instruction       Meaning
Register           Add R4, R3               R4 ← R4 + R3
Immediate          Add R4, #3               R4 ← R4 + 3
Displacement       Add R4, 10(R1)           R4 ← R4 + Mem[10 + R1]
Indirect           Add R4, (R1)             R4 ← R4 + Mem[R1]
Indexed            Add R3, (R1 + R2)        R3 ← R3 + Mem[R1 + R2]
Absolute           Add R1, (1001)           R1 ← R1 + Mem[1001]
Memory indirect    Add R1, @(R3)            R1 ← R1 + Mem[Mem[R3]]
Autoincrement      Add R1, (R2)+            R1 ← R1 + Mem[R2]; R2 ← R2 + d
Autodecrement      Add R1, -(R2)            R2 ← R2 - d; R1 ← R1 + Mem[R2]
Scaled             Add R1, 100(R2)[R3]      R1 ← R1 + Mem[100 + R2 + R3*d]
(d = operand size in bytes)
CISC to RISC observation (fewer addressing modes simplify CPU design)
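The Meaning column above can be read as a recipe for fetching the second operand. The sketch below expresses each mode as a Python branch; it is an illustration only, with registers and memory modeled as dictionaries, the function name and parameters invented for the example, and the operand size d assumed to be 4 bytes.

```python
def fetch_operand(mode, regs, mem, reg=None, index=None, const=0, d=4):
    """Return the operand selected by one addressing mode.

    regs maps register names to values, mem maps addresses to values,
    and d is the operand size in bytes (assumed to be 4 here).
    """
    if mode == "register":
        return regs[reg]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return mem[const + regs[reg]]
    if mode == "indirect":
        return mem[regs[reg]]
    if mode == "indexed":
        return mem[regs[reg] + regs[index]]
    if mode == "absolute":
        return mem[const]
    if mode == "memory indirect":
        return mem[mem[regs[reg]]]
    if mode == "autoincrement":      # use the address, then advance the register
        value = mem[regs[reg]]
        regs[reg] += d
        return value
    if mode == "autodecrement":      # step the register back, then use the address
        regs[reg] -= d
        return mem[regs[reg]]
    if mode == "scaled":
        return mem[const + regs[reg] + regs[index] * d]
    raise ValueError(f"unknown addressing mode: {mode}")


# Arbitrary sample state for demonstration.
regs = {"R1": 100, "R2": 8, "R3": 2}
mem = {11: 55, 96: 77, 100: 11, 108: 22, 110: 33, 116: 88, 1001: 44}

print(fetch_operand("displacement", regs, mem, reg="R1", const=10))          # Mem[10 + R1]
print(fetch_operand("scaled", regs, mem, reg="R2", index="R3", const=100))  # Mem[100 + R2 + R3*d]
print(fetch_operand("memory indirect", regs, mem, reg="R1"))                # Mem[Mem[R1]]
```

Only the autoincrement and autodecrement modes modify a register as a side effect, which is part of what makes them harder to implement in a simple pipeline.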

Addressing Modes Usage Example
For 3 programs running on VAX, ignoring direct register mode:
Displacement:                   42% avg, 32% to 55%
Immediate:                      33% avg, 17% to 43%
Register deferred (indirect):   13% avg, 3% to 24%
Scaled:                         7% avg, 0% to 16%
Memory indirect:                3% avg, 1% to 6%
Misc:                           2% avg, 0% to 3%
75% displacement & immediate.
88% displacement, immediate & register indirect.
Observation: In addition to register direct, the displacement, immediate, and register indirect addressing modes are important.
CISC to RISC observation (fewer addressing modes simplify CPU design)

Displacement Address Size Example
Avg. of 5 SPECint92 programs v. avg. of 5 SPECfp92 programs.
[Figure: distribution of displacement address bits needed.]
For displacement addressing mode:
• 1% of addresses > 16 bits.
• 12-16 bits of displacement needed.
CISC to RISC observation

Instruction Set Encoding
Considerations affecting instruction set encoding:
– The number of registers and addressing modes supported by the ISA.
– The impact of the size of the register and addressing mode fields on the average instruction size and on the average program size.
– The need to encode instructions into lengths that will be easy to handle in the implementation. At a minimum, lengths should be a multiple of bytes.
• Instruction Encoding Classification:
1. Fixed length encoding: Faster and easiest to implement in hardware, e.g. simplifies design of pipelined CPUs. (CISC to RISC observation)
2. Variable length encoding: Produces smaller instructions (to reduce code size).
3. Hybrid encoding.

Three Examples of Instruction Set Encoding
Variable Length Encoding: VAX (1-53 bytes)
Operation & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
Fixed Length Encoding: MIPS, PowerPC, SPARC (all instructions are 4 bytes each)
Operation | Address field 1 | Address field 2 | Address field 3
Hybrid Encoding: IBM 360/370, Intel 80x86
Operation | Address specifier | Address field
Operation | Address specifier 1 | Address specifier 2 | Address field
Operation | Address specifier | Address field 1 | Address field 2
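One practical consequence of the encoding choice is how the hardware finds where each instruction starts. With a fixed length, boundaries are known in advance; with a variable length, each instruction must be at least partly decoded before the next one can be located. The sketch below illustrates this with the stack machine formats described earlier (4-byte push and pop, 1-byte add). The opcode values 0x01, 0x02, and 0x03 are hypothetical, chosen only for this example.

```python
# Hypothetical opcode values for the toy stack machine (not from the slides).
PUSH, POP, ADD = 0x01, 0x02, 0x03
LENGTH_OF = {PUSH: 4, POP: 4, ADD: 1}   # bytes per instruction, by opcode


def variable_length_starts(code):
    """Instruction start offsets when each opcode determines its own length."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += LENGTH_OF[code[pc]]       # must inspect the opcode to advance
    return starts


def fixed_length_starts(code, size=4):
    """Instruction start offsets when every instruction is `size` bytes."""
    return list(range(0, len(code), size))


# push 0x000010; push 0x000014; add; pop 0x000018
stack_code = bytes([PUSH, 0x00, 0x00, 0x10,
                    PUSH, 0x00, 0x00, 0x14,
                    ADD,
                    POP, 0x00, 0x00, 0x18])

print(variable_length_starts(stack_code))
print(fixed_length_starts(bytes(12)))
```

The variable-length walk is inherently sequential, while the fixed-length offsets can all be computed independently, which is one reason fixed-length encoding suits pipelined CPUs.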

Instruction Set Architecture Tradeoffs
(Machine = CPU or ISA)
• 3-address machine: Shortest code sequence; a large number of bits per instruction; large number of memory accesses.
• 0-address (stack) machine: Longest code sequence; shortest individual instructions; more complex to program.
• General purpose register machine (GPR):
– Operands are addressed by selecting among a small set of registers using a short register address (all new ISAs since 1975).
– Advantages of GPR:
• Low number of memory accesses. Faster, since register access is currently still much faster than memory access.
• Registers are easier for compilers to use.
• Shorter, simpler instructions.
• Load-Store Machines: GPR machines where memory addresses are only included in data movement instructions (loads/stores) between memory and registers (all new ISAs designed after 1980). CISC to RISC observation (load-store simplifies CPU design).

ISA Examples
Machine            Number of General Purpose Registers   Architecture                      Year
EDSAC              1                                     accumulator                       1949
IBM 701            1                                     accumulator                       1953
CDC 6600           8                                     load-store                        1963
IBM 360            16                                    register-memory                   1964
DEC PDP-8          1                                     accumulator                       1965
DEC PDP-11         8                                     register-memory                   1970
Intel 8008         1                                     accumulator                       1972
Motorola 6800      1                                     accumulator                       1974
DEC VAX            16                                    register-memory, memory-memory    1977
Intel 8086         1                                     extended accumulator              1978
Motorola 68000     16                                    register-memory                   1980
Intel 80386        8                                     register-memory                   1985
MIPS               32                                    load-store                        1985
HP PA-RISC         32                                    load-store                        1986
SPARC              32                                    load-store                        1987
PowerPC            32                                    load-store                        1992
DEC Alpha          32                                    load-store                        1992
HP/Intel IA-64     128                                   load-store                        2001
AMD64 (EMT64)      16                                    register-memory                   2003

Examples of GPR Machines For Arithmetic/Logic (ALU) Instructions
Number of memory addresses   Max. number of operands allowed   Examples (ISAs)
0                            3                                 SPARC, MIPS, PowerPC, ALPHA
1                            2                                 Intel 80386, Motorola 68000
2 or 3                       2 or 3                            VAX

Complex Instruction Set Computer (CISC) ISAs
• Emphasizes doing more with each instruction:
– Thus fewer instructions per program (more compact code).
• Why? Motivated by the high cost of memory and hard disk capacity when original CISC architectures were proposed:
– When M6800 was introduced: 16K RAM = $500, 40M hard disk = $55,000.
– When MC68000 was introduced (circa 1980): 64K RAM = $200, 10M HD = $5,000.
• Original CISC architectures evolved with faster, more complex CPU designs, but backward instruction set compatibility had to be maintained (e.g. X86).
• Wide variety of addressing modes:
– 14 in MC68000, 25 in MC68020.
• A number of instruction modes for the location and number of operands:
– The VAX has 0- through 3-address instructions.
• Variable-length instruction encoding.

Example CISC ISAs: Motorola 680X0
GPR ISA (Register-Memory)
18 addressing modes:
• Data register direct.
• Address register direct.
• Immediate.
• Absolute short.
• Absolute long.
• Address register indirect.
• Address register indirect with postincrement.
• Address register indirect with predecrement.
• Address register indirect with displacement.
• Address register indirect with index (8-bit).
• Address register indirect with index (base).
• Memory indirect postindexed.
• Memory indirect preindexed.
• Program counter indirect with index (8-bit).
• Program counter indirect with index (base).
• Program counter indirect with displacement.
• Program counter memory indirect postindexed.
• Program counter memory indirect preindexed.
Operand size:
• Range from 1 to 32 bits, 1, 2, 4, 8, 10, or 16 bytes.
Instruction Encoding:
• Instructions are stored in 16-bit words.
• The smallest instruction is 2 bytes (one word).
• The longest instruction is 5 words (10 bytes) in length.

Example CISC ISA: Intel 80386 (Intel X86 or IA-32)
GPR ISA (Register-Memory)
12 addressing modes:
• Register.
• Immediate.
• Direct.
• Base.
• Base + Displacement.
• Index + Displacement.
• Scaled Index + Displacement.
• Based Index.
• Based Scaled Index.
• Based Index + Displacement.
• Based Scaled Index + Displacement.
• Relative.
Operand sizes:
• Can be 8, 16, 32, 48, 64, or 80 bits long.
• Also supports string operations.
Instruction Encoding:
• The smallest instruction is one byte.
• The longest instruction is 12 bytes long.
• The first bytes generally contain the opcode, mode specifiers, and register fields.
• The remaining bytes are for address displacement and immediate data.

Reduced Instruction Set Computer (RISC) ISAs (~1984)
RISC: Simplify ISA, which simplifies CPU design, which gives better CPU performance.
• Focuses on reducing the number and complexity of instructions of the ISA.
– Motivated by simplifying the ISA and its requirements to (RISC goals):
• Reduce CPU design complexity.
• Improve CPU performance.
– CPU Performance Goal: Reduced number of cycles needed per instruction. At least one instruction completed per clock cycle.
• Simplified addressing modes supported.
– Usually limited to immediate, register indirect, register displacement, indexed.
• Load-Store GPR: Only load and store instructions access memory.
– (Thus more instructions are usually executed than CISC.)
• Fixed-length instruction encoding.
– (Designed with CPU instruction pipelining in mind.)
• Support of delayed branches.
• Examples: MIPS, HP PA-RISC, SPARC, Alpha, POWER, PowerPC.

Example RISC ISA: PowerPC
Load-Store GPR
8 addressing modes:
• Register direct.
• Immediate.
• Register indirect.
• Register indirect with immediate index (loads and stores).
• Register indirect with register index (loads and stores).
• Absolute (jumps).
• Link register indirect (calls).
• Count register indirect (branches).
Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding:
• Instruction set has 15 different formats with many minor variations.
• All are 32 bits in length.

Example RISC ISA: HP Precision Architecture (HP PA-RISC)
Load-Store GPR
7 addressing modes:
• Register.
• Immediate.
• Base with displacement.
• Base with scaled index and displacement.
• Predecrement.
• Postincrement.
• PC-relative.
Operand sizes:
• Five operand sizes ranging in powers of two from 1 to 16 bytes.
Instruction Encoding:
• Instruction set has 12 different formats.
• All are 32 bits in length.

Example RISC ISA: SPARC
Load-Store GPR
5 addressing modes:
• Register indirect with immediate displacement.
• Register indirect indexed by another register.
• Register direct.
• Immediate.
• PC relative.
Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding:
• Instruction set has 3 basic instruction formats with 3 minor variations.
• All are 32 bits in length.

Example RISC ISA: DEC Alpha AXP
Load-Store GPR
4 addressing modes:
• Register direct.
• Immediate.
• Register indirect with displacement.
• PC-relative.
Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding:
• Instruction set has 7 different formats.
• All are 32 bits in length.

RISC ISA Example: MIPS R3000 (32-bit)
Load-Store GPR
Instruction Categories:
• Load/Store.
• Computational.
• Jump and Branch.
• Floating Point (using coprocessor).
• Memory Management.
• Special.
5 Addressing Modes:
• Register direct (arithmetic).
• Immediate (arithmetic).
• Base register + immediate offset (loads and stores).
• PC relative (branches).
• Pseudodirect (jumps).
Registers: R0 - R31, PC, HI, LO.
Operand Sizes:
• Memory accesses in any multiple between 1 and 4 bytes.
Instruction Encoding: 3 Instruction Formats, all 32 bits wide.
R:   OP | rs | rt | rd | sa | funct
I:   OP | rs | rt | immediate
J:   OP | jump target
MIPS is the target ISA for CPU design in this course.
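To make the three 32-bit formats concrete, the sketch below packs fields into instruction words. The slide names the fields but not their widths; the widths used here (6-bit OP, 5-bit rs/rt/rd/sa, 6-bit funct, 16-bit immediate, 26-bit jump target) and the opcode and funct values for add, ori, and j are the standard MIPS ones, stated here as background rather than taken from the slide. The function names are illustrative.

```python
def encode_r(op, rs, rt, rd, sa, funct):
    """R format: OP(6) | rs(5) | rt(5) | rd(5) | sa(5) | funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct


def encode_i(op, rs, rt, immediate):
    """I format: OP(6) | rs(5) | rt(5) | immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(op, target):
    """J format: OP(6) | jump target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


# add $3, $1, $2   (OP = 0, funct = 0x20)
add_word = encode_r(0x00, rs=1, rt=2, rd=3, sa=0, funct=0x20)
# ori $2, $1, 255  (OP = 0x0D)
ori_word = encode_i(0x0D, rs=1, rt=2, immediate=255)
# j with a 26-bit word target of 0x100  (OP = 0x02)
j_word = encode_j(0x02, target=0x100)

for word in (add_word, ori_word, j_word):
    print(f"{word:08X}")
```

Because every format is 32 bits wide and the OP field is always in the same position, a decoder can start examining the opcode before it knows which format it is looking at.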