RISC Architecture (FK Boachie)

Architecture Review: Design Point
Design point: a set of design considerations and their relative importance
◦ leads to tradeoffs in both the ISA and the microarchitecture (uarch)
Levels of abstraction (top to bottom): Problem, Algorithm, Program, ISA, Microarchitecture, Circuits, Electrons
Considerations:
◦ Cost
◦ Performance
◦ Maximum power consumption
◦ Energy consumption (battery life, which matters in mobile devices etc.)
◦ Availability
◦ Reliability and correctness (trustworthiness)
◦ Time to market (a practical engineering consideration)
The design point is determined by the "Problem" space (application space). Discuss.

Tradeoffs: Soul of Computer Architecture
• ISA-level tradeoffs
• Microarchitecture-level tradeoffs
• System and task-level tradeoffs
  ◦ How to divide the labor between hardware and software is the art of designers and developers.
• Computer architecture is the science and art of making the appropriate tradeoffs to meet a design point.
  ◦ Why do you think this is an art? (And why are tradeoffs the soul?)

REDUCED INSTRUCTION SET COMPUTERS (RISC)
1. Why do we need RISCs?
2. Some Typical Features of Current Programs
3. Main Characteristics of RISC Architectures
4. Are RISCs Really Better than CISCs?
5. Examples

What are RISCs and why do we need them?
• RISC architectures represent an important innovation in the area of computer organization.
• The RISC architecture is an attempt to produce more CPU power by simplifying the instruction set architecture of the CPU.
• The opposite trend to RISC is that of complex instruction set computers (CISC).
Both RISC and CISC architectures have been developed as attempts to cover the semantic gap. (What is the semantic gap?)

The Semantic Gap
[Figure: high-level language (HLL) at the top, machine language (machine architecture fully visible) at the bottom; the semantic gap between them widens as the abstraction level grows.]
• In order to improve the efficiency of software development, new and powerful programming languages and platforms have been developed (Ada, C++, Java, PHP, Python, Joomla, .NET, etc.; what else?). They provide a high level of abstraction, conciseness and power.
• By this evolution the semantic gap grows.

The Semantic Gap (cont’d)
Problem: How should new HLL programs be compiled and executed efficiently on a processor architecture?
Two possible answers:
1. The CISC approach: design very complex architectures including a large number of instructions and addressing modes; also include instructions close to those present in HLLs.
2. The RISC approach: simplify the instruction set and adapt it to the real requirements of user programs.
Which option is the best one and why?

Small versus Large Semantic Gap: CISC vs. RISC
◦ Complex instruction set computer: complex instructions
  - Initially motivated by "not good enough" code generation
◦ Reduced instruction set computer: simple instructions
  - John Cocke, mid-1970s, IBM 801
  - Goal: enable better compiler control and optimization
RISC motivated by
◦ Memory stalls (no work is done inside a complex instruction while it waits on a memory stall; remember the pipeline issues?). When is this correct?
◦ Simplifying the hardware: lower cost, higher frequency
◦ Enabling the compiler to optimize the code better (remember the performance equation?)
  - Find fine-grained parallelism to reduce stalls (can you see how this is useful in pipelining?)

Evaluation of Program Execution, or: What are Programs Doing Most of the Time?
• Several studies have been conducted to determine the execution characteristics of machine instruction sequences generated from HLL programs.
• Aspects of interest:
  1. The frequency of operations performed.
  2. The types of operands and their frequency of use.
  3. Execution sequencing (frequency of jumps, loops, subprogram calls).

Frequency of Instructions Executed
• Frequency distribution of executed machine instructions:
  - moves: 33%
  - conditional branches: 20%
  - arithmetic/logic: 16%
  - others: between 0.1% and 10% each
• Addressing modes: the overwhelming majority of instructions use simple addressing modes, in which the address can be calculated in a single cycle (register, register indirect, displacement); complex addressing modes (memory indirect, indexed+indirect, displacement+indexed, stack) are used by only ~18% of the instructions.

Frequency of Instructions Executed: Operand Types
• 74 to 80% of the operands are scalars (integers, reals, characters, etc.), which can be held in registers;
• the rest (20-26%) are arrays/structures; 90% of them are global variables;
• 80% of the scalars are local variables.
This implies that the majority of operands are local variables of scalar type, which can be stored in registers.
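The "majority" claim can be checked with a line of arithmetic. The sketch below simply multiplies the two percentages quoted on the slide; the function and constant names are made up for illustration.

```python
# Share of all operand references that are local scalars,
# computed from the ranges quoted on the slide.
SCALAR_SHARE = (0.74, 0.80)      # 74-80% of operands are scalars
LOCAL_SHARE_OF_SCALARS = 0.80    # 80% of the scalars are local variables


def local_scalar_share(scalar_share, local_share=LOCAL_SHARE_OF_SCALARS):
    """Fraction of all operands that are both scalar and local."""
    return scalar_share * local_share


low, high = (local_scalar_share(s) for s in SCALAR_SHARE)
print(f"local scalars: {low:.1%} to {high:.1%} of all operands")
```

Roughly 59 to 64% of all operand references are therefore local scalars, which is why keeping local variables in registers pays off.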

Procedure Calls
• Investigations have also been performed into the percentage of total execution time spent executing each kind of HLL instruction.
• It turned out that, among HLL instructions, most of the time is spent executing CALLs and RETURNs.
• Even though only 15% of the executed HLL instructions are CALLs or RETURNs, they account for most of the execution time because of their complexity.
• A CALL or RETURN is compiled into a relatively long sequence of machine instructions with a lot of memory references.

Procedure Calls (cont’d)

    int x, y; float z;

    void proc3(int a, int b, int c) {
        /* ... */
    }

    void proc2(int k) {
        int j, q;
        /* ... */
        proc3(j, 5, q);
    }

    void proc1() {
        int i;
        /* ... */
        proc2(i);
    }

    int main() {
        proc1();
    }

Some statistics concerning procedure calls:
• Only 1.25% of called procedures have more than six parameters.
• Only 6.7% of called procedures have more than six local variables.
• Chains of nested procedure calls are usually short and only very seldom longer than 6.

Conclusions from Evaluation of Program Execution
• An overwhelming preponderance of simple (ALU and move) operations over complex operations.
• Preponderance of simple addressing modes.
• Large frequency of operand accesses; on average each instruction references 1.9 operands.
• Most of the referenced operands are scalars (so they can be stored in a register) and are local variables or parameters.
• Optimizing the procedure CALL/RETURN mechanism promises large benefits in speed.
These conclusions were the starting point for the Reduced Instruction Set Computer (RISC) approach.

Main Characteristics of RISC Architectures
• The instruction set is limited and includes only simple instructions.
  - The goal is to create an instruction set containing instructions that execute quickly; most RISC instructions are executed in a single machine cycle (after being fetched and decoded).
  - Pipeline operation (without memory reference): FI DI EI (fetch instruction, decode instruction, execute instruction)
  - RISC instructions, being simple, are hard-wired, while CISC architectures have to use microprogramming in order to implement complex instructions.

Main Characteristics of RISC Architectures (cont’d)
  - Having only simple instructions results in reduced complexity of the control unit and the data path; as a consequence, the processor can work at a high clock frequency.
  - Pipelines are used efficiently if instructions are simple and of similar execution time.
  - Complex operations on RISCs are executed as a sequence of simple RISC instructions. On CISCs they are executed as a single complex instruction or a few complex instructions.

Main Characteristics of RISC Architectures (cont’d)
Let's look at a small example. Assume:
  - we have a program with 80% of executed instructions being simple and 20% complex;
  - on a CISC machine simple instructions take 4 cycles and complex instructions take 8 cycles; cycle time is 100 ns (10^-7 s);
  - on a RISC machine simple instructions are executed in one cycle; complex operations are implemented as a sequence of instructions; we assume on average 14 instructions (14 cycles) for a complex operation; cycle time is 75 ns (0.75 * 10^-7 s).

Main Characteristics of RISC Architectures (cont’d)
How long does a program of 1 000 000 instructions take to execute on each machine?
  CISC: (10^6 * 0.80 * 4 + 10^6 * 0.20 * 8) * 10^-7 = 0.48 s
  RISC: (10^6 * 0.80 * 1 + 10^6 * 0.20 * 14) * 0.75 * 10^-7 = 0.27 s
• complex operations take more time on the RISC, but their number is small;
• because of its simplicity, the RISC works with a shorter cycle time; on the CISC, simple instructions are slowed down by the increased data path length and the increased control complexity.
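The comparison can be reproduced, and stretched a little, with a short script. This is only a sketch using the numbers assumed on the slides; `run_time` and `break_even_expansion` are illustrative names, and the break-even question (how long the RISC sequence for a complex operation may become before the advantage disappears) is an added exercise, not part of the original example.

```python
def run_time(n_instr, simple_frac, simple_cycles, complex_cycles, cycle_time_s):
    """Execution time in seconds for a mix of simple and complex operations."""
    complex_frac = 1.0 - simple_frac
    cycles = n_instr * (simple_frac * simple_cycles + complex_frac * complex_cycles)
    return cycles * cycle_time_s


N = 1_000_000
cisc = run_time(N, 0.80, 4, 8, 100e-9)    # CISC: 4 / 8 cycles, 100 ns cycle
risc = run_time(N, 0.80, 1, 14, 75e-9)    # RISC: 1 / 14 cycles, 75 ns cycle


def break_even_expansion(cisc_time, n_instr, simple_frac, cycle_time_s):
    """Cycles per complex operation at which the RISC time equals the CISC time."""
    complex_frac = 1.0 - simple_frac
    cycles_per_op = cisc_time / (n_instr * cycle_time_s)
    return (cycles_per_op - simple_frac * 1) / complex_frac


k = break_even_expansion(cisc, N, 0.80, 75e-9)
print(f"CISC {cisc:.2f} s, RISC {risc:.2f} s, speedup {cisc / risc:.2f}x")
print(f"break-even: about {k:.0f} RISC cycles per complex operation")
```

Under these assumptions the RISC stays ahead until a complex operation costs about 28 simple instructions, twice the 14 assumed on the slide.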

Main Characteristics of RISC Architectures (cont’d)
• Load-and-store architecture
  - Only LOAD and STORE instructions reference data in memory; all other instructions operate only on registers (they are register-to-register instructions); thus, only the few instructions accessing memory need more than one cycle to execute (after being fetched and decoded).
  - Pipeline operation with memory reference: FI DI CA TR
    CA: compute address
    TR: transfer

Main Characteristics of RISC Architectures (cont’d)
• Instructions use only a few addressing modes
  - Addressing modes are usually register, direct, register indirect, displacement.
• Instructions are of fixed length and uniform format
  - This makes the loading and decoding of instructions simple and fast; there is no need to wait until the length of an instruction is known in order to start decoding the following one;
  - Decoding is simplified because opcode and address fields are located in the same position for all instructions.
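To see why a fixed length and a uniform format simplify decoding, consider the sketch below. The 32-bit layout (6-bit opcode, two 5-bit register fields, 16-bit immediate) is a hypothetical format chosen only for illustration; it is not the encoding of any processor discussed in these slides.

```python
# Hypothetical 32-bit fixed format (an assumption for illustration only):
#   bits 31-26 opcode | bits 25-21 rd | bits 20-16 rs | bits 15-0 immediate
WORD_BYTES = 4


def encode(opcode, rd, rs, imm):
    """Pack the four fields into one 32-bit instruction word."""
    return (((opcode & 0x3F) << 26) | ((rd & 0x1F) << 21)
            | ((rs & 0x1F) << 16) | (imm & 0xFFFF))


def decode(word):
    """Every field sits at the same bit position in every instruction."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


def split_stream(code):
    """Instruction i starts at byte 4*i, so boundaries are known before decoding."""
    return [int.from_bytes(code[i:i + WORD_BYTES], "big")
            for i in range(0, len(code), WORD_BYTES)]


words = [encode(0x23, 1, 2, 100), encode(0x08, 3, 1, 7)]
stream = b"".join(w.to_bytes(WORD_BYTES, "big") for w in words)
decoded = [decode(w) for w in split_stream(stream)]
print(decoded)
```

With variable-length instructions, `split_stream` could not be written this way: the start of instruction i+1 would depend on first decoding instruction i.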

Main Characteristics of RISC Architectures (cont’d)
• A large number of registers is available
  - Variables and intermediate results can be stored in registers and do not require repeated loads and stores from/to memory.
  - All local variables of procedures and the passed parameters can be stored in registers (see the procedure-call statistics earlier for the typical number of variables and parameters).
What happens when a new procedure is called?
  - Normally the registers have to be saved in memory (they contain values of variables and parameters for the calling procedure); on return to the calling procedure, the values have to be loaded again from memory. This takes a lot of time.
  - If a large number of registers is available, a new set of registers can be allocated to the called procedure, and the register set assigned to the calling one remains untouched.

Main Characteristics of RISC Architectures (cont’d)
Is the above strategy (a new register set for each procedure call) realistic?
  - The strategy is realistic, because the number of local variables in procedures is not large. Chains of nested procedure calls are only exceptionally longer than 6.
  - If the chain of nested procedure calls becomes long, at a certain call there will be no registers left to assign to the called procedure; in this case local variables and parameters have to be stored in memory.
Why is a large number of registers typical for RISC architectures?
  - Because of the reduced complexity of the processor, there is enough space on the chip to be allocated to a large number of registers. This is usually not the case with CISCs.
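The effect of allocating a fresh register set per call can be illustrated with a small counting model. This is a sketch under stated assumptions: the class name, the default of 6 register sets, and the policy of spilling the oldest set to memory when none is free are all choices made for the illustration (the slide only says that values must go to memory once the registers run out).

```python
class WindowedRegisterFile:
    """Counts memory traffic caused by calls when each call gets its own register set."""

    def __init__(self, n_windows=6):
        self.n_windows = n_windows
        self.depth = 0       # current nesting depth of procedure calls
        self.resident = 0    # register sets currently held on chip
        self.spills = 0      # register sets written out to memory
        self.fills = 0       # register sets read back from memory

    def call(self):
        self.depth += 1
        if self.resident == self.n_windows:
            self.spills += 1         # no free set: the oldest one goes to memory
        else:
            self.resident += 1

    def ret(self):
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0 and self.depth > 0:
            self.fills += 1          # caller's set was spilled: reload it
            self.resident = 1


def simulate(chain_length, n_windows=6):
    """Nest `chain_length` calls, then return from all of them."""
    rf = WindowedRegisterFile(n_windows)
    for _ in range(chain_length):
        rf.call()
    for _ in range(chain_length):
        rf.ret()
    return rf.spills, rf.fills


print(simulate(3))   # proc1 -> proc2 -> proc3: fits in the register sets
print(simulate(8))   # a chain longer than the number of register sets
```

A chain of three nested calls, as in the proc1/proc2/proc3 example, causes no memory traffic at all in this model; only the rare chain longer than the number of register sets pays for saving and restoring.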

Main Characteristics of RISC Architectures (cont’d)
The delayed load problem
• LOAD instructions (like STORE instructions) require a memory access, so their execution cannot be completed in a single clock cycle. However, in the next cycle a new instruction is started by the processor.

    LOAD R1, X    FI  DI  CA  TR
    ADD  R2, R1       FI  DI  stall  EI
    ADD  R4, R3           FI  DI     stall  EI

Two possible solutions:
1. The hardware delays the execution of the instruction following the LOAD if this instruction needs the loaded value.
2. A more efficient, compiler-based solution, which has similarities with delayed branching, is the delayed load.

The delayed load problem (cont’d)
• With delayed load the processor always executes the instruction following a LOAD, without a stall.
• It is the programmer's (compiler's) responsibility to ensure that this instruction does not need the loaded value.
The following sequence is not correct on such a processor, and a compiler will not generate something like this:

    LOAD R1, X     loads from address X into R1
    ADD  R2, R1    R2 ← R2 + R1
    ADD  R4, R3    R4 ← R4 + R3
    SUB  R2, R4    R2 ← R2 - R4

The value in R1 is not yet available when ADD R2, R1 executes; this is the situation that caused the stall on the previous slide.

The delayed load problem (cont’d)
The following one is the correct sequence:

    LOAD R1, X     loads from address X into R1
    ADD  R4, R3    R4 ← R4 + R3
    ADD  R2, R1    R2 ← R2 + R1
    SUB  R2, R4    R2 ← R2 - R4

For the following sequence the compiler has generated a NOP after the LOAD because there is no instruction to fill the load-delay slot:

    LOAD  R1, X    loads from address X into R1
    NOP
    ADD   R2, R1   R2 ← R2 + R1
    ADD   R4, R2   R4 ← R4 + R2
    STORE R4, X    stores from R4 to address X
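What the compiler does in these two cases can be sketched as a tiny scheduling pass. This is a deliberately simplified illustration, not how any particular compiler works: the `Instr` representation is made up, and the pass only considers the single instruction after the dependent one as a candidate for the delay slot, inserting a NOP otherwise.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dst: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]     # registers read


NOP = Instr("NOP", None, ())


def can_move_before(first, second):
    """True if `second` may safely be executed before `first`."""
    if second.op in ("LOAD", "STORE", "NOP"):
        return False                      # conservative: only move ALU operations
    if first.dst is not None and first.dst in second.srcs:
        return False                      # second reads what first writes
    if second.dst in first.srcs:
        return False                      # second overwrites an input of first
    if second.dst == first.dst:
        return False                      # both write the same register
    return True


def fill_load_delay_slots(program):
    """Ensure the instruction right after a LOAD never uses the loaded register."""
    out = list(program)
    i = 0
    while i < len(out) - 1:
        load, nxt = out[i], out[i + 1]
        if load.op == "LOAD" and load.dst in nxt.srcs:
            cand = out[i + 2] if i + 2 < len(out) else None
            if (cand is not None and can_move_before(nxt, cand)
                    and load.dst not in cand.srcs and cand.dst != load.dst):
                out[i + 1], out[i + 2] = cand, nxt    # reorder into the slot
            else:
                out.insert(i + 1, NOP)                # nothing useful: pad
        i += 1
    return out


prog1 = [Instr("LOAD", "R1", ()), Instr("ADD", "R2", ("R2", "R1")),
         Instr("ADD", "R4", ("R4", "R3")), Instr("SUB", "R2", ("R2", "R4"))]
prog2 = [Instr("LOAD", "R1", ()), Instr("ADD", "R2", ("R2", "R1")),
         Instr("ADD", "R4", ("R4", "R2")), Instr("STORE", None, ("R4",))]
fixed1 = fill_load_delay_slots(prog1)
fixed2 = fill_load_delay_slots(prog2)
for ins in fixed1 + fixed2:
    print(ins.op, ins.dst, ins.srcs)
```

For the first program the pass moves `ADD R4, R3` into the delay slot; for the second, `ADD R4, R2` depends on the result of `ADD R2, R1`, so a NOP is inserted, matching the two sequences above.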

Are RISCs Really Better than CISCs?
• RISC architectures have several advantages, which were discussed throughout this lecture. However, a definitive answer to the above question is difficult to give.
• Many performance comparisons have shown that benchmark programs really do run faster on RISC processors than on processors with CISC characteristics.
• However, it is difficult to identify which feature of a processor produces the higher performance. Some "CISC fans" argue that the higher speed is produced not by the typical RISC features but by technology, better compilers, etc.
• An argument in favour of the CISC: the simpler instruction set of RISC processors results in a larger memory requirement than for the same program compiled for a CISC architecture.

Are RISCs Really Better than CISCs? (cont’d)
Most current processors are neither pure RISCs nor pure CISCs but try to combine the advantages of both approaches. Why? Discuss!

Some Processor Examples

                               CISC Architectures          RISC Architectures
                               VAX 11/780    Pentium       Sun SPARC    PowerPC
    Number of instructions     303           235           52           206
    Instruction size (bytes)   2-57          1-11          4            4
    Instruction format         not fixed     not fixed     fixed        not fixed (but small differences)
    Addressing modes           22            11            2            2
    General purpose registers  16            8             up to 520    32

Summary
• Both RISCs and CISCs try to solve the same problem: to cover the semantic gap. They do it in different ways. CISCs go the traditional way of implementing more and more complex instructions. RISCs try to simplify the instruction set.
• Innovations in RISC architectures are based on a close analysis of a large set of widely used programs.
• The main features of RISC architectures are: a reduced number of simple instructions, few addressing modes, load-store architecture, instructions of fixed length and format, and a large number of registers.
• One of the main concerns of RISC designers was to maximise the efficiency of pipelining.
• Present architectures often include both RISC and CISC features.

RISC vs. CISC Cont’d
The CISC Approach
The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. The task considered here is multiplying two numbers held in main memory, at the locations written 2:3 and 5:2. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:

    MULT 2:3, 5:2

MULT is what is known as a "complex instruction." It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher-level language. For instance, if we let "a" represent the value at 2:3 and "b" represent the value at 5:2, then this command is identical to the C statement "a = a * b."

RISC vs. CISC Cont’d
One of the primary advantages of the CISC approach is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

RISC vs. CISC Cont’d
However, the RISC strategy also brings some very important advantages. Because each instruction requires only one clock cycle to execute, the sequence of simple instructions that replaces "MULT" (separate loads, a multiply and a store) will execute in approximately the same amount of time as the multi-cycle "MULT" command. These RISC "reduced instructions" require fewer transistors than the complex instructions, leaving more room for general purpose registers. Because all of the instructions execute in a uniform amount of time (i.e. one clock cycle), pipelining is possible.

RISC vs. CISC Cont’d
Separating the "LOAD" and "STORE" instructions actually reduces the amount of work that the computer must perform. After a CISC-style "MULT" command is executed, the processor automatically erases the registers. If one of the operands needs to be used for another computation, the processor must re-load the data from the memory bank into a register. In RISC, the operand will remain in the register until another value is loaded in its place. RISC has this flexibility because of its large number of registers.
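The reuse of operands can be made concrete with a toy load/store machine. Everything here is an assumption made for illustration: the mnemonics LOAD, PROD and STORE, the register names A and B, and the initial memory contents 6 and 7 do not come from the slides, which only name the CISC-style MULT.

```python
def run(program, memory):
    """Execute a list of (op, x, y) tuples on a toy load/store machine."""
    regs = {}
    loads = 0
    for op, x, y in program:
        if op == "LOAD":            # LOAD reg, addr: memory -> register
            regs[x] = memory[y]
            loads += 1
        elif op == "PROD":          # PROD r1, r2: r1 <- r1 * r2 (registers only)
            regs[x] = regs[x] * regs[y]
        elif op == "STORE":         # STORE addr, reg: register -> memory
            memory[x] = regs[y]
        else:
            raise ValueError(f"unknown instruction {op}")
    return regs, loads


memory = {"2:3": 6, "5:2": 7}       # hypothetical initial values
program = [
    ("LOAD", "A", "2:3"),
    ("LOAD", "B", "5:2"),
    ("PROD", "A", "B"),
    ("STORE", "2:3", "A"),          # a = a * b
    ("PROD", "A", "B"),             # reuse: both operands are still in registers
    ("STORE", "2:3", "A"),          # a = a * b again, with no further LOAD
]
regs, loads = run(program, memory)
print(memory["2:3"], loads)
```

The second multiplication needs no memory reads because A and B are still in registers; with the memory-to-memory MULT described above, both operands would be fetched from memory again.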

RISC vs. CISC Cont’d
RISC Roadblocks
Despite the advantages of RISC-based processing, RISC chips took over a decade to gain a foothold in the commercial world. This was largely due to a lack of software support. Although Apple's Power Macintosh line featured RISC-based chips and Windows NT was RISC compatible, Windows 3.1 and Windows 95 were designed with CISC processors in mind. Many companies were unwilling to take a chance with the emerging RISC technology. Without commercial interest, processor developers were unable to manufacture RISC chips in large enough volumes to make their price competitive.
Another major setback was the presence of Intel. Although their CISC chips were becoming increasingly unwieldy and difficult to develop, Intel had the resources to plow through development and produce powerful processors. Although RISC chips might surpass Intel's efforts in specific areas, the differences were not great enough to persuade buyers to change technologies.

Recall Processor Performance
\[
\text{Processor performance} = \frac{\text{Time}}{\text{Program}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}}
\times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}}
\times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}
\]
The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.
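As a worked instance, the three factors can be filled in with the numbers assumed in the earlier example (10^6 operations, 80% simple and 20% complex; CISC at 4 and 8 cycles with a 100 ns cycle; RISC at 1 cycle per instruction, 14 instructions per complex operation, and a 75 ns cycle). The symbols IC and T_cycle are shorthand introduced here for instruction count and cycle time.

```latex
\begin{align*}
\text{CISC:}\quad & IC = 10^{6}, \qquad CPI = 0.8 \cdot 4 + 0.2 \cdot 8 = 4.8, \qquad T_{cycle} = 100\,\text{ns} \\
                  & \text{Time} = 10^{6} \times 4.8 \times 100\,\text{ns} = 0.48\,\text{s} \\[4pt]
\text{RISC:}\quad & IC = 10^{6}\,(0.8 + 0.2 \cdot 14) = 3.6 \times 10^{6}, \qquad CPI = 1, \qquad T_{cycle} = 75\,\text{ns} \\
                  & \text{Time} = 3.6 \times 10^{6} \times 1 \times 75\,\text{ns} = 0.27\,\text{s} \\[4pt]
\frac{\text{Time}_{RISC}}{\text{Time}_{CISC}} &= \frac{3.6 \times 10^{6}}{10^{6}} \times \frac{1}{4.8} \times \frac{75}{100} = 3.6 \times 0.208 \times 0.75 \approx 0.56
\end{align*}
```

The RISC version executes 3.6 times as many instructions, but the lower CPI and the shorter cycle more than compensate under these assumptions.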

RISC vs. CISC: The Overall RISC Advantage
Today, the Intel x86 is arguably the only chip which retains CISC architecture. The shift towards RISC is primarily due to advancements in other areas of computer technology. The price of RAM has decreased dramatically. In 1977, 1 MB of DRAM cost about $5,000. By 1994, the same amount of memory cost only $6 (when adjusted for inflation). What is the price today? Find out! Compiler technology has also become more sophisticated, so that the RISC use of RAM and emphasis on software has become ideal. Above all, prices of RAM/SDRAM etc. have continued to fall, making RISC-based solutions economical.

A Note on RISC vs. CISC
Usually, …
RISC
◦ Simple instructions
◦ Fixed length
◦ Uniform decode
◦ Few addressing modes
CISC
◦ Complex instructions
◦ Variable length
◦ Non-uniform decode
◦ Many addressing modes