
Chapter 11
WAN MOKHDZANI WAN NOR HAIMI MOHD NATASHA NORIZAN
Reduced Instruction Set Computers (RISC)

Major Advances of the Computer
• The family concept
  • IBM System/360 1964
  • DEC PDP-8
  • Separates architecture from implementation
• Microprogrammed control unit
  • Idea by Wilkes 1951
  • Produced by IBM S/360 1964
  • Eases task of designing and implementing the control unit
• Cache memory
  • IBM S/360 model 85 1969
  • Improves performance dramatically

Major Advances of the Computer (cont)
• Solid State RAM
  • (See memory notes)
• Microprocessors
  • Intel 4004 1971
  • Intel 8086
  • x86 architecture
• Pipelining
  • Introduces parallelism into fetch execute cycle
• Multiple processors

THE NEXT STEP, RISC
• Reduced instruction set computing, or RISC, is a CPU design strategy based on the insight that simplified (as opposed to complex) instructions can provide higher performance if this simplicity enables much faster execution of each instruction.
• The opposing architecture is known as complex instruction set computing, or CISC.

Reduced Instruction Set Computer
• Key features
  • Large number of general purpose registers
  • Applies compiler technology to optimize register use
  • Limited and simple instruction set
  • Emphasis on optimizing the instruction pipeline

RISC vs CISC
RISC                                        | CISC
Simple instructions, few in number          | Many complex instructions
Fixed length instructions                   | Variable length instructions
Only Load/Store instructions access memory  | Many instructions can access memory
Few addressing modes                        | Many addressing modes
Complexity in compiler                      | Complexity in microcode (coding)

RISC vs CISC
• RISC systems shorten execution time by reducing the clock cycles per instruction (simple instructions take less time to interpret)
• CISC systems shorten execution time by reducing the number of instructions per program
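The trade-off above can be made concrete with the usual decomposition of execution time into instruction count, cycles per instruction (CPI), and clock cycle time. The following is a minimal sketch; the function name and every number in it are hypothetical, chosen only for illustration, and are not taken from the slides or from any measured machine.

```python
def execution_time_ns(instruction_count, cycles_per_instruction, cycle_time_ns):
    """Execution time = instructions x cycles per instruction x time per cycle."""
    return instruction_count * cycles_per_instruction * cycle_time_ns


# Hypothetical program: the CISC version needs fewer instructions,
# the RISC version needs fewer cycles per instruction.
cisc_time = execution_time_ns(instruction_count=100, cycles_per_instruction=4.0, cycle_time_ns=1.0)
risc_time = execution_time_ns(instruction_count=150, cycles_per_instruction=1.5, cycle_time_ns=1.0)

print(f"CISC: {cisc_time} ns, RISC: {risc_time} ns")
```

With these made-up figures the RISC version runs more instructions yet finishes sooner, because the drop in CPI outweighs the growth in instruction count. Different figures can reverse the outcome, which is why the comparison has to be made on real workloads.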

RISC vs CISC
Example: multiplying 10 by 5

CISC (one complex multiply instruction):
    mov ax, 10
    mov bx, 5
    mul bx, ax

RISC (repeated simple additions):
    mov ax, 0
    mov bx, 10
    mov cx, 5
Begin:
    add ax, bx
    loop Begin

Table 15.1 Characteristics of Some CISCs, RISCs, and Superscalar Processors

CHARACTERISTICS AND DESIGN PHILOSOPHY
• INSTRUCTION SET
  • A common misunderstanding of the phrase "reduced instruction set computer" is the mistaken idea that instructions are simply eliminated, resulting in a smaller set of instructions.
  • The term "reduced" instead means that the amount of work any single instruction accomplishes is reduced: at most a single data memory cycle, compared to the "complex instructions" of CISC CPUs, which may require dozens of data memory cycles to execute a single instruction.
  • In particular, RISC processors typically have separate instructions for I/O and data processing.

CHARACTERISTICS AND DESIGN PHILOSOPHY (Cont)
• HARDWARE UTILIZATION
  • Uniform instruction format, using a single word with the opcode in the same bit positions in every instruction, demanding less decoding
  • Identical general purpose registers, allowing any register to be used in any context, simplifying compiler design (although normally there are separate floating point registers)
  • Simple addressing modes, with complex addressing performed via sequences of arithmetic and/or load-store operations
  • Few data types in hardware: some CISCs have byte string instructions or support complex numbers; these are so far unlikely to be found on a RISC

Instruction Execution Characteristics
• Execution sequencing
  • Determines the control and pipeline organization
• Operands used
  • The types of operands and the frequency of their use determine the memory organization for storing them and the addressing modes for accessing them
• High-level languages (HLLs)
  • Allow the programmer to express algorithms more concisely
  • Allow the compiler to take care of details that are not important in the programmer's expression of algorithms
  • Often support naturally the use of structured programming and/or object-oriented design
• Semantic gap
  • The difference between the operations provided in HLLs and those provided in computer architecture
• Operations performed
  • Determine the functions to be performed by the processor and its interaction with memory

Table 15.2 Weighted Relative Dynamic Frequency of HLL Operations [PATT82a]

Table 15.3 Dynamic Percentage of Operands

Table 15.4 Procedure Arguments and Local Scalar Variables

Implications
• HLLs can best be supported by optimizing performance of the most time-consuming features of typical HLL programs
• Three elements characterize RISC architectures:
  • Use a large number of registers or use a compiler to optimize register usage
  • Careful attention needs to be paid to the design of instruction pipelines
  • Instructions should have predictable costs and be consistent with a high-performance implementation

The Use of a Large Register File
• Software Solution
  • Requires compiler to allocate registers
  • Allocates based on most used variables in a given time
  • Requires sophisticated program analysis
• Hardware Solution
  • More registers
  • Thus more variables will be in registers

Overlapping Register Windows

Circular Buffer Organization of Overlapped Windows


Circular Buffer Organization of Overlapped Windows
• When a call is made, a current window pointer is moved to show the currently active register window
• If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
• A saved window pointer indicates where the next saved window should be restored
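The call and return behaviour described above can be sketched as a small simulation. This is a simplified model under stated assumptions: the class and attribute names are hypothetical, only window indices are tracked (not register contents), and every window is allowed to hold one activation, whereas real overlapped-window designs usually reserve one window so that N windows hold N-1 activations.

```python
class RegisterWindows:
    """Toy model of a circular buffer of register windows (hypothetical names)."""

    def __init__(self, n_windows):
        self.n_windows = n_windows
        self.cwp = 0          # current window pointer: index of the active window
        self.resident = 1     # activations currently held in registers (main starts in window 0)
        self.saved = []       # indices of windows spilled to memory, oldest first
        self.overflows = 0
        self.underflows = 0

    def call(self):
        if self.resident == self.n_windows:
            # All windows in use: trap and spill the oldest resident window.
            oldest = (self.cwp - self.resident + 1) % self.n_windows
            self.saved.append(oldest)
            self.resident -= 1
            self.overflows += 1
        self.cwp = (self.cwp + 1) % self.n_windows
        self.resident += 1

    def ret(self):
        if self.resident == 1:
            # The caller's window is not in registers: trap and restore it.
            if not self.saved:
                raise RuntimeError("return from the outermost procedure")
            self.saved.pop()
            self.underflows += 1
        else:
            self.resident -= 1
        self.cwp = (self.cwp - 1) % self.n_windows


rw = RegisterWindows(n_windows=4)
for _ in range(5):   # nest five calls deep: more activations than windows
    rw.call()
spilled_at_deepest = list(rw.saved)
for _ in range(5):   # unwind completely
    rw.ret()
print(spilled_at_deepest, rw.overflows, rw.underflows, rw.cwp)
```

In this model, memory traffic occurs only when the call depth swings by more than the number of windows. A program whose call depth stays within a narrow band never spills, which is the behaviour the register-window design relies on.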

Global Variables
• Variables declared as global in an HLL can be assigned memory locations by the compiler, and all machine instructions that reference these variables will use memory reference operands
  • However, for frequently accessed global variables this scheme is inefficient
• Alternative is to incorporate a set of global registers in the processor
  • These registers would be fixed in number and available to all procedures
  • A unified numbering scheme can be used to simplify the instruction format
• There is an increased hardware burden to accommodate the split in register addressing
• In addition, the linker must decide which global variables should be assigned to registers

Table 15.5 Characteristics of Large-Register-File and Cache Organizations

Referencing a Scalar


Graph Coloring Approach
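Compiler-based register allocation is commonly framed as coloring an interference graph: each node is a symbolic register, an edge joins two symbolic registers that are live at the same time, and each color is a physical register. The sketch below uses a simple greedy heuristic (most-constrained node first). That heuristic is an assumption made for illustration and is not necessarily the procedure shown in the slide's figure. The graph, the node names, and the function name are all hypothetical.

```python
def color_registers(interference, n_registers):
    """Greedily assign physical registers; nodes that cannot be colored are spilled."""
    assignment = {}
    spilled = []
    # Visit nodes with the most neighbors first, breaking ties by name.
    order = sorted(interference, key=lambda node: (-len(interference[node]), node))
    for node in order:
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [r for r in range(n_registers) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)   # would live in memory instead of a register
    return assignment, spilled


# Hypothetical interference graph over symbolic registers A..E.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

with_three, spilled_three = color_registers(graph, n_registers=3)
with_two, spilled_two = color_registers(graph, n_registers=2)
print(with_three, spilled_three)
print(with_two, spilled_two)
```

With three physical registers every symbolic register in this graph gets a color. With two, the triangle B, C, D cannot be colored and some nodes are spilled to memory.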

Why CISC? (Complex Instruction Set Computer)
• There is a trend to richer instruction sets which include a larger number of instructions and more complex instructions
• Two principal reasons for this trend:
  • A desire to simplify compilers
  • A desire to improve performance
• There are two advantages to smaller programs:
  • The program takes up less memory
  • Should improve performance
    • Fewer instructions means fewer instruction bytes to be fetched
    • In a paging environment smaller programs occupy fewer pages, reducing page faults
    • More instructions fit in cache(s)

Table 15.6 Code Size Relative to RISC I

Characteristics of Reduced Instruction Set Architectures
• One machine instruction per machine cycle
  • Machine cycle: the time it takes to fetch two operands from registers, perform an ALU operation, and store the result in a register
• Register-to-register operations
  • Only simple LOAD and STORE operations accessing memory
  • This simplifies the instruction set and therefore the control unit
• Simple addressing modes
  • Simplifies the instruction set and the control unit
• Simple instruction formats
  • Generally one or a few formats are used
  • Instruction length is fixed and aligned on word boundaries
  • Opcode decoding and register operand accessing can occur simultaneously
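The benefit of a fixed format is that every field sits at a known bit position, so the fields can be extracted independently of one another. As a concrete instance, the sketch below decodes the 32-bit MIPS R-type layout (opcode, rs, rt, rd, shamt, funct). The choice of MIPS as the example and the function name are assumptions of this sketch, not something the slide specifies.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,  # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,  # bits 15..11, destination register
        "shamt":  (word >> 6) & 0x1F,   # bits 10..6, shift amount
        "funct":  word & 0x3F,          # bits 5..0, ALU function
    }


# add $t0, $t1, $t2 ($t0 = register 8, $t1 = 9, $t2 = 10)
fields = decode_r_type(0x012A4020)
print(fields)
```

Because no field's position depends on the value of another, hardware can read the register numbers while the opcode is still being decoded. A variable-length format has to examine earlier bytes before it knows where later fields begin.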

Comparison of Register-to-Register and Memory-to-Memory Approaches


Table 15.7 Characteristics of Some Processors

The Effects of Pipelining


Optimization of Pipelining
• Delayed branch
  • Does not take effect until after execution of following instruction
  • This following instruction is the delay slot
• Delayed load
  • Register to be target is locked by processor
  • Continue execution of instruction stream until register required
  • Idle until load is complete
  • Re-arranging instructions can allow useful work while loading
• Loop unrolling
  • Replicate body of loop a number of times
  • Iterate loop fewer times
  • Reduces loop overhead
  • Increases instruction parallelism
  • Improved register, data cache, or TLB locality

Table 15.8 Normal and Delayed Branch

Use of the Delayed Branch

Loop Unrolling Twice Example

    do i=2, n-1
        a[i] = a[i] + a[i-1] * a[i+1]
    end do

Becomes

    do i=2, n-2, 2
        a[i] = a[i] + a[i-1] * a[i+1]
        a[i+1] = a[i+1] + a[i] * a[i+2]
    end do
    if (mod(n-2, 2) = 1) then
        a[n-1] = a[n-1] + a[n-2] * a[n]
    end if
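The unrolled form can be checked against the original by running both on the same data. The sketch below is a direct Python transcription that keeps the slide's 1-based indexing by leaving list element 0 unused. The function names and the test data are hypothetical.

```python
def rolled(a, n):
    """Original loop: i = 2 .. n-1, one element per iteration."""
    a = list(a)
    for i in range(2, n):
        a[i] = a[i] + a[i - 1] * a[i + 1]
    return a


def unrolled_twice(a, n):
    """Loop unrolled twice: i = 2, 4, .. up to n-2, two elements per iteration."""
    a = list(a)
    for i in range(2, n - 1, 2):
        a[i] = a[i] + a[i - 1] * a[i + 1]
        a[i + 1] = a[i + 1] + a[i] * a[i + 2]
    if (n - 2) % 2 == 1:
        # Odd iteration count: one leftover element handled after the loop.
        a[n - 1] = a[n - 1] + a[n - 2] * a[n]
    return a


# Element 0 is padding so that a[1] .. a[n] match the 1-based pseudocode.
results_match = all(
    rolled([0] + list(range(1, n + 1)), n) == unrolled_twice([0] + list(range(1, n + 1)), n)
    for n in range(3, 12)
)
print(results_match)
```

The unrolled version evaluates the loop test about half as often. The clean-up statement after the loop is what keeps the result correct when the original iteration count is odd.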

MIPS R4000
• One of the first commercially available RISC chip sets was developed by MIPS Technology Inc.
• Inspired by an experimental system developed at Stanford
• Has substantially the same architecture and instruction set as the earlier MIPS designs (R2000 and R3000)
• Uses 64 bits for all internal and external data paths and for addresses, registers, and the ALU
• Is partitioned into two sections, one containing the CPU and the other containing a coprocessor for memory management
• Supports thirty-two 64-bit registers
• Provides for up to 128 Kbytes of high-speed cache, half each for instructions and data

RISC versus CISC Controversy
• Quantitative
  • Compare program sizes and execution speeds of programs on RISC and CISC machines that use comparable technology
• Qualitative
  • Examine issues of high level language support and use of VLSI real estate
• Problems with comparisons:
  • No pair of RISC and CISC machines that are comparable in lifecycle cost, level of technology, gate complexity, sophistication of compiler, operating system support, etc.
  • No definitive set of test programs exists
  • Difficult to separate hardware effects from compiler effects
  • Most comparisons done on "toy" rather than commercial products
  • Most commercial devices advertised as RISC possess a mixture of RISC and CISC characteristics

Summary
Chapter 11 Reduced Instruction Set Computers (RISC)
• Instruction execution characteristics
  • Operations
  • Operands
  • Procedure calls
  • Implications
• The use of a large register file
  • Register windows
  • Global variables
  • Large register file versus cache
• Reduced instruction set architecture
  • Characteristics of RISC
  • CISC versus RISC characteristics
• RISC pipelining
  • Pipelining with regular instructions
  • Optimization of pipelining
• Compiler-based register optimization
• RISC versus CISC controversy