CISC versus RISC Does this comparison still make any sense?

ISA definition in 1964
“Instruction set architecture is the structure of a computer that a machine language programmer (or a compiler) must understand to write a correct (timing independent) program for that machine” – IBM, introducing the System/360 (1964)
An instruction set specifies a processor’s functionality:
• what operations it supports
• what storage mechanisms it has and how they are accessed
• how the programmer/compiler communicates programs to the processor
Instruction set architecture (ISA): the “architecture” part of this course; the rest is micro-architecture.

What makes a good ISA?
• Implementability
  • Supports a (performance/cost) range of implementations
• Programmability
  • Easy to express programs (for human and/or compiler)
• Backward/forward compatibility
  • Implementability and programmability across generations
  • Business reality: software cost is greater than hardware cost
  • e.g., x86 generations: 8086, 286, 386, 486, Pentium III, Core i3, ...

Pre-1975: Human Programmability
• Focus: instruction sets that were easy for humans to program
  • ISA semantically close to a high-level language (HLL)
  • Semantically heavy (CISC-like) instructions
  • Implicit saves/restores on procedure calls
• People thought computers would someday execute HLLs directly
  • Never materialized: “multiple HLLs” = “semantic clash”

Pre-1975: Human Programmability
• The earliest machines were programmed in assembly language, and memory was slow and expensive
  • Bigger program → more storage → more $
  • Need to reduce the number of instructions per program
• A single instruction can stand for multiple operations
• Variable instruction length → unpredictable fetch-decode-execute time (CPI > 1)
• Hardware handles the complexity
• E.g.: VAX, x86

1975-: Compiler Programmability
• Focus: instruction sets that are easy for compilers to compile to
• Primitive instructions from which solutions are synthesized
  • Provide primitives (not solutions) [1]
  • Hard for the compiler to tell if a complex instruction fits the situation
• Regularity: do things the same way, consistently
• Orthogonality, composability
  • all combinations of operation, data type, addressing mode possible
  • e.g., ADD and SUB should have the same addressing modes
• Few modes/obvious choices
  • compilers do complicated case analysis; don’t add more cases
[1] William Wulf, “Compilers and Computer Architecture”, IEEE Computer, 14(8), 1981.

CISC – Complex Instruction Set Computer
• Language development
  • More operations increase instruction size
  • Design ever more complex instructions
  • Provide more addressing modes
  • Implement some high-level language constructs in the ISA
• Simpler compiler back end = hardware handles all the complexity

CISC – Complex Instruction Set Computer
• Intel 8086, 80286, 80386, 80486, Pentium
• Each instruction is hard-wired into the control unit
• Difficult and expensive to design and build new instructions
• This motivates the use of microprogramming

CISC – Complex Instruction Set Computer
• In the last 20 years: performance optimization
  • Changes in software and hardware technology have forced a re-examination of CISC, and many modern CISC processors are hybrids that implement many RISC principles.
• The rise of microprogramming technology
  • Easy to add new instructions and maintain backward compatibility
  • Fewer hard-wired instructions
  • Microprogrammed instruction sets can be written to match the constructs of high-level languages
  • The compiler does not have to be as complicated
  • Does not rule out hardware implementations of some high-level functions

CISC – Complex Instruction Set Computer
• Microprogramming
  • Complex instructions are split into a series of simpler instructions
  • Complex instruction = small microprogram stored in a control memory (ROM) and executed by the CPU
• Simplifies the design of the processor
  • allows the addition of new instructions
  • allows bug fixes after the processor is released to the market
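
The idea of a complex instruction as a small microprogram can be sketched as a table lookup. This is a toy model only: the `ADDM` instruction, the micro-operation names and the `CONTROL_STORE` layout are all hypothetical and do not describe any real processor's microcode.

```python
# Toy control store: each machine opcode maps to a list of micro-operations.
# ADDM dst, src is a made-up memory-to-memory add: mem[dst] = mem[dst] + mem[src].
CONTROL_STORE = {
    "ADDM": [
        ("load", "t0", "dst"),   # t0 <- mem[dst]
        ("load", "t1", "src"),   # t1 <- mem[src]
        ("add", "t0", "t1"),     # t0 <- t0 + t1
        ("store", "t0", "dst"),  # mem[dst] <- t0
    ],
}


def execute(opcode, operands, memory):
    """Run the microprogram for one machine instruction; return micro-ops executed."""
    temps = {}
    steps = 0
    for micro_op, a, b in CONTROL_STORE[opcode]:
        if micro_op == "load":
            temps[a] = memory[operands[b]]
        elif micro_op == "add":
            temps[a] = temps[a] + temps[b]
        elif micro_op == "store":
            memory[operands[b]] = temps[a]
        steps += 1
    return steps


memory = {0x10: 5, 0x14: 7}
steps = execute("ADDM", {"dst": 0x10, "src": 0x14}, memory)
print(memory[0x10], steps)  # one machine instruction, four micro-operations
```

Adding a new instruction in this model means adding a new entry to the table, which is the flexibility the slide attributes to microprogramming.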

CISC – Complex Instruction Set Computer
• Disadvantages
  • More complex hardware: translation from a high level to control signals, and optimization, must be done by hardware
  • Earlier generations of a processor family are generally contained as a subset in every new version, so the instruction set and chip hardware become more complex with each generation.
  • Instructions of different lengths
    • different instructions take different numbers of clock cycles to execute, slowing down the overall performance of the machine.
  • Many specialized instructions aren't used frequently enough to justify their existence: approximately 20% of the available instructions are used in a typical program.
• Complexity
  • Pipelining bottlenecks → lower clock rates
• Marketing
  • Prolonged design time and frequent microcode errors hurt competitiveness

Intel x86 Processors
• Dominate laptop/desktop/server market
• Evolutionary design
  • Backwards compatible all the way back to the 8086, introduced in 1978
  • Added more features as time goes on
  • Nowadays, about 5,000 pages of documentation
• CISC architecture
  • Many different instructions with many different formats

Intel x86 Evolution: Milestones

Name | Date | Transistors | MHz
8086 | 1978 | 29K | 5-10
386 | 1985 | 275K | 16-33
Pentium 4E | 2004 | 125M | 2800-3800
Core 2 | 2006 | 291M | 1060-3333
Core i7 | 2008 | 731M | 1600-4400

• 8086: First 16-bit Intel processor. Basis for IBM PC & DOS. 1 MB address space
• 386: First 32-bit Intel processor, referred to as IA32. Added “flat addressing”, capable of running Unix
• Pentium 4E: First 64-bit Intel x86 processor, referred to as x86-64
• Core 2: First multi-core Intel processor
• Core i7: Four cores

Intel x86 Processors
• Machine Evolution

Machine | Year | Transistors
386 | 1985 | 0.3M
Pentium | 1993 | 3.1M
Pentium/MMX | 1997 | 4.5M
PentiumPro | 1995 | 6.5M
Pentium III | 1999 | 8.2M
Pentium 4 | 2000 | 42M
Core 2 Duo | 2006 | 291M
Core i7 | 2008 | 731M

• Added Features
  • Instructions to support multimedia operations
  • Instructions to enable more efficient conditional operations
  • Transition from 32 bits to 64 bits
  • More cores

Intel x86 Processors
• Past Generations

Generation | Year | Process technology
1st Pentium Pro | 1995 | 600 nm
1st Pentium III | 1999 | 250 nm
1st Pentium 4 | 2000 | 180 nm
1st Core 2 Duo | 2006 | 65 nm

• Recent Generations

Generation | Year | Process technology
1. Nehalem | 2008 | 45 nm
2. Sandy Bridge | 2011 | 32 nm
3. Ivy Bridge | 2012 | 22 nm
4. Haswell | 2013 | 22 nm
5. Broadwell | 2014 | 14 nm
6. Skylake | 2015 | 14 nm
7. Kaby Lake | 2016 | 14 nm

• Upcoming Generations

Generation | Year | Process technology
Cannonlake | 2017? | 10 nm

Process technology dimension = width of narrowest wires (10 nm ≈ 100 atoms wide)

2017 State of the Art: Skylake
• Mobile Model: Core i7
  • 2.6-2.9 GHz
  • 45 W
• Desktop Model: Core i7
  • Integrated graphics
  • 2.8-4.0 GHz
  • 35-91 W
• Server Model: Xeon
  • Integrated graphics
  • Multi-socket enabled
  • 2-3.7 GHz
  • 25-80 W

x86 Clones: Advanced Micro Devices (AMD)
• Historically
  • AMD has followed just behind Intel
  • A little bit slower, a lot cheaper
• Then
  • Recruited top circuit designers from Digital Equipment Corp. and other downward-trending companies
  • Built Opteron: tough competitor to Pentium 4, Xeon
  • Developed x86-64, their own extension to 64 bits
• Recent Years
  • Intel got its act together
    • Leads the world in semiconductor technology
  • AMD has fallen behind
    • Relies on external semiconductor manufacturer

Intel’s 64-Bit History
• 2001: Intel Attempts Radical Shift from IA32 to IA64
  • Totally different architecture (Itanium)
  • Executes IA32 code only as legacy
  • Performance disappointing
• 2003: AMD Steps in with Evolutionary Solution
  • x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
  • Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
  • Extended Memory 64-bit Technology
  • Almost identical to x86-64!
• All but low-end x86 processors support x86-64
  • But, lots of code still runs in 32-bit mode

“Simplicity is the ultimate sophistication” – Leonardo da Vinci

RISC – Reduced Instruction Set Computer
John Cocke
• IBM 801, 1980 (started in 1975)
• The name 801 came from the building that housed the project
• Idea: possible to make a very small and very fast core
• Influences: known as “the father of RISC architecture”. Turing Award recipient and National Medal of Science.

RISC – Reduced Instruction Set Computer
Dave Patterson
• RISC Project, 1982
• UC Berkeley
• RISC-I: ½ the transistors & 3x faster
• Influences: Sun SPARC
John L. Hennessy
• MIPS, 1981
• Stanford
• Simple pipelining, keep the pipeline full
• Influences: MIPS computer systems, PlayStation, Nintendo

RISC – Reduced Instruction Set Computer
• Low complexity
  • Generally results in overall speedup
  • Less error-prone implementation by hardwired logic or simple microcode
• VLSI implementation advantages
  • Fewer transistors
  • Extra space: more registers, cache
• Marketing
  • Reduced design time, fewer errors, and more options increase competitiveness

RISC – Reduced Instruction Set Computer
• Principle: sacrifice everything for speed
  • reduce the number of instructions: make the CPU simpler
  • get rid of complex instructions, which may slow down the CPU
  • use simple addressing modes: less time spent computing the address of an operand
  • limit the number of memory accesses
  • if a given operation cannot be executed in one clock period, then do not implement it as an instruction
  • extensive use of pipelining in order to reach CPI <= 1 (at least one instruction per clock period)
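
The CPI target in the last bullet can be illustrated with the textbook model of an ideal pipeline. The sketch below assumes no stalls or hazards and one instruction issued per cycle; the function names and the 5-stage depth are illustrative choices, not figures from the slides.

```python
def pipeline_cycles(n_instructions, n_stages):
    """Cycles for an ideal pipeline: fill it once, then retire one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def cpi(n_instructions, n_stages):
    """Average cycles per instruction over the whole run."""
    return pipeline_cycles(n_instructions, n_stages) / n_instructions


for n in (1, 10, 1000):
    print(n, pipeline_cycles(n, 5), round(cpi(n, 5), 3))
```

In this single-issue model the CPI approaches 1 from above as the instruction count grows. Getting below 1 requires completing more than one instruction per cycle, which this sketch does not model.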

RISC – Reduced Instruction Set Computer
• The compilers themselves
  • Computationally more complex
  • More portable
• The compiler writer
  • Fewer instructions → probably an “easier” job
  • Simpler instructions → probably fewer bugs
  • Can reuse optimization techniques

RISC – Reduced Instruction Set Computer
• MIPS Design Principles
  • Simplicity favors regularity
    • 32-bit instructions
  • Smaller is faster
    • Small register file
  • Make the common case fast
    • Include support for constants
  • Good design demands good compromises
    • Support for different types of interpretations/classes
• E.g.: ARM, POWER

RISC – Reduced Instruction Set Computer
• MIPS (RISC)
  • ≈ 200 instructions, 32 bits each, 3 formats
  • all operands in registers
    • almost all are 32 bits each
  • ≈ 1 addressing mode: Mem[reg + imm]
• x86 (CISC)
  • > 1000 instructions, 1 to 15 bytes each
  • operands in dedicated registers, general purpose registers, memory, on stack, …
    • can be 1, 2, 4, 8 bytes, signed or unsigned
  • 10 types of addressing modes
    • e.g. Mem[segment + reg*scale + offset]
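
The two addressing-mode expressions above can be compared as plain arithmetic. This is a minimal sketch: the register names, the dictionary-based register file and the 32-bit wrap-around are assumptions made for illustration, and segment handling is reduced to adding a base value.

```python
MASK32 = 0xFFFFFFFF


def mips_address(regs, base, imm):
    """MIPS-style effective address: Mem[reg + imm]."""
    return (regs[base] + imm) & MASK32


def x86_style_address(regs, segment_base, index, scale, offset):
    """Effective address in the form Mem[segment + reg*scale + offset]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (segment_base + regs[index] * scale + offset) & MASK32


regs = {"r1": 0x1000, "r2": 3}
print(hex(mips_address(regs, "r1", 8)))
print(hex(x86_style_address(regs, 0x2000, "r2", 4, 16)))
```

On a machine with only the first mode, the second address would be built from separate shift and add instructions before the load, which is the "primitives, not solutions" trade-off.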

RISC x CISC
RISC Philosophy
• Regularity & simplicity
• Leaner means faster
• Optimize the common case
Energy efficiency: embedded systems, phones/tablets
CISC Rebuttal
• Compilers can be smart
• Transistors are plentiful
• Legacy is important
• Code size counts
• Micro-code!
Desktops/servers

RISC x CISC (half-truth)
[Diagram comparing the RISC and CISC paths from machine instructions to processing; labels: machine instructions, µ-code translation, µinstructions, processing]

ARM versus x86
Android OS on ARM processor
• High-end processor
• < 100 instructions
• 13 pipeline stages
• 13 B produced (2013)
Windows OS on Intel (x86) processor
• High-end processor
• ~ 2000 instructions
• 14 pipeline stages
• 100 M produced (2013)

ARM
• ARM is a British semiconductor and software-tools development company, founded in 1990
• ARM is the leading RISC architecture, used in a wide variety of products (mobile devices, peripherals, computers, HD/SSD controllers, automotive applications, IoT devices, wearables, etc.)
• During 2014, 12 billion ARM-based chips shipped, with 20% annual growth and 17-37% market share [Investopedia]
• In August 2016, SoftBank Group (SW, Information Revolution) purchased ARM Holdings (HW, technology), aiming at IoT.

ARM – family history

Architecture | Bit width | Cores designed by ARM Holdings | Cores designed by third parties | Cortex profile
ARMv2 | 32/26 | ARM2, ARM3 | Amber, STORM Open Soft Core |
ARMv3 | 32 | ARM6, ARM7 | |
ARMv4 | 32 | ARM8 | StrongARM, FA526 |
ARMv5 | 32 | ARM7EJ, ARM9E, ARM10E | XScale, FA626TE, Feroceon, PJ1/Mohawk |
ARMv6 | 32 | ARM11 | |
ARMv6-M | 32 | ARM Cortex-M0, ARM Cortex-M0+, ARM Cortex-M1 | | Microcontroller
ARMv7-M | 32 | ARM Cortex-M3 | | Microcontroller
ARMv7E-M | 32 | ARM Cortex-M4 | | Microcontroller
ARMv7-R | 32 | ARM Cortex-R4, ARM Cortex-R5, ARM Cortex-R7 | | Real-time
ARMv7-A | 32 | ARM Cortex-A5, ARM Cortex-A7, ARM Cortex-A8, ARM Cortex-A9, ARM Cortex-A12, ARM Cortex-A15, ARM Cortex-A17 | Krait, Scorpion, PJ4/Sheeva, Apple A6/A6X | Application
ARMv8-A | 64/32 | ARM Cortex-A53, ARM Cortex-A57 | X-Gene, Denver, Apple A7 (Cyclone), K12 | Application

ARM Profiles
• ARMv8-A (Application): architecture profile for high-performance markets such as mobile and enterprise
• ARMv8-R (Real-time): architecture profile for embedded applications in automotive and industrial control
• ARMv8-M (Microcontroller): architecture profile for embedded and IoT applications
Application
• 32-bit and 64-bit
• A32, T32 and A64 instruction sets
• Virtual memory system
• Supporting rich operating systems
Real-time
• 32-bit
• A32 and T32 instruction sets
• Protected memory system (optional virtual memory)
• Optimized for real-time systems
Microcontroller
• 32-bit
• T32 / Thumb® instruction set only
• Protected memory system
• Optimized for microcontroller applications

ARM Architecture
• Harvard architecture
  • Different buses for instructions and data
• RISC machine
  • Pipelining (single-cycle operation for many instructions)
  • Thumb-2 configuration for both 16- and 32-bit instructions

ARM ISA: Instruction size
• Variable-length instructions
  • ARM instructions are a fixed length of 32 bits
  • Thumb instructions are a fixed length of 16 bits
  • Thumb-2 instructions can be either 16-bit or 32-bit
• Thumb-2 gives approximately 26% improvement in code density over ARM
• Thumb-2 gives approximately 25% improvement in performance over Thumb
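
How a decoder tells a 16-bit Thumb-2 instruction from a 32-bit one can be sketched from the first halfword alone. The rule used here (top five bits equal to 0b11101, 0b11110 or 0b11111 mean a 32-bit instruction) is the one given in the ARM architecture manuals for Thumb-2; the function names and the sample halfwords are illustrative.

```python
def thumb2_length(first_halfword):
    """Return the instruction length in bytes, given its first 16-bit halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_stream(halfwords):
    """Group a sequence of halfwords into instructions, scanning front to back."""
    instructions = []
    i = 0
    while i < len(halfwords):
        n = thumb2_length(halfwords[i]) // 2  # halfwords in this instruction
        instructions.append(tuple(halfwords[i:i + n]))
        i += n
    return instructions


stream = [0x4770, 0xF000, 0xF800, 0xE7FE]
print([[hex(h) for h in ins] for ins in split_stream(stream)])
```

Only the first halfword of each instruction is inspected; the second halfword of a 32-bit instruction is skipped rather than re-interpreted.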

ARM: state of the art

POWER

Complex vs. Simple Instructions
• Complex instruction: an instruction does a lot of work, e.g. many operations
  • Insert in a doubly linked list
  • Compute FFT
  • String copy
• Simple instruction: an instruction does a small amount of work; it is a primitive from which complex operations can be built
  • Add
  • XOR
  • Multiply

Complex vs. Simple Instructions
• Early 80s: RISC movement challenges the “CISC establishment”
• RISC (reduced instruction set computer)
  • Berkeley RISC-I (Patterson), Stanford MIPS (Hennessy), IBM 801
• CISC (complex instruction set computer)
  • VAX, x86
  • The word “CISC” did not exist before the word “RISC” came along

Complex vs. Simple Instructions
• RISC argument [Dave Patterson]
  • CISC is fundamentally handicapped
  • For a given technology, a RISC implementation will be faster
    • current VLSI technology enables single-chip RISC
    • when technology enables single-chip CISC, RISC will be pipelined
    • when technology enables pipelined CISC, RISC will have caches
    • when technology enables CISC with caches, RISC will have ...
• CISC rebuttal [Bob Colwell]
  • CISC flaws not fundamental (fixed with more transistors)
  • Moore’s Law will narrow the RISC/CISC gap (true)
  • software costs will dominate (very true)

Complex vs. Simple Instructions
• Argues
  • RISCs are fundamentally better than CISCs
  • implementation effects and compilers are second order
• Unfair because it compares specific implementations
  • VAX advantages: big immediates, not-taken branches
  • MIPS advantages: more registers, FPU, instruction scheduling, TB

RISC curiosities
• Most commercially successful ISA is x86
  • also: PentiumPro was the first out-of-order microprocessor
  • good RISC pipeline, 100K transistors
  • good CISC pipeline, 300K transistors
  • by 1995: 2M+ transistors evened the pipeline playing field
  • rest of the transistors used for caches (diminishing returns)
• Intel’s other trick?
  • decoder translates CISC into sequences of RISC μops
  • internally (micro-architecture) it is actually RISC!

ISA-level Tradeoffs: Semantic Gap
• Where to place the ISA? Semantic gap
  • Closer to high-level language (HLL) → small semantic gap, complex instructions
  • Closer to hardware control signals? → large semantic gap, simple instructions
• RISC vs. CISC machines
  • RISC: reduced instruction set computer
  • CISC: complex instruction set computer
    • FFT, QUICKSORT, POLY, FP instructions?
    • VAX INDEX instruction (array access with bounds checking)

ISA-level Tradeoffs: Semantic Gap
• Simple compiler, complex hardware vs. complex compiler, simple hardware
  • Caveat: translation (indirection) can change the tradeoff!
• Burden of backward compatibility
• Performance?
  • Optimization opportunity: example of the VAX INDEX instruction: who (compiler vs. hardware) puts more effort into optimization?
  • Instruction size, code size
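
The optimization question around INDEX can be made concrete with a rough model. The sketch below assumes INDEX behaves as a bounds check followed by `(index_in + subscript) * size`; the function names, the exception type and the loop shape are illustrative and not taken from the VAX manuals.

```python
class SubscriptRangeError(Exception):
    """Raised when a subscript falls outside [low, high]."""


def index_instruction(subscript, low, high, size, index_in):
    """Model of one complex instruction: bounds check plus index arithmetic."""
    if subscript < low or subscript > high:
        raise SubscriptRangeError(subscript)
    return (index_in + subscript) * size


def offsets_complex(first, last, low, high, size):
    """One INDEX per access: the bounds check is repeated on every iteration."""
    return [index_instruction(s, low, high, size, 0) for s in range(first, last + 1)]


def offsets_primitive(first, last, low, high, size):
    """With primitives, a compiler can check the loop range once, then only multiply."""
    if first < low or last > high:
        raise SubscriptRangeError((first, last))
    return [s * size for s in range(first, last + 1)]


print(offsets_complex(0, 3, 0, 9, 4))
print(offsets_primitive(0, 3, 0, 9, 4))
```

Both versions produce the same offsets; the difference is where the checking effort goes, in hardware on every access or in the compiler once per loop.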

Small versus Large Semantic Gap
• CISC vs. RISC
  • Complex instruction set computer → complex instructions
    • Initially motivated by “not good enough” code generation
  • Reduced instruction set computer → simple instructions
    • John Cocke, mid 1970s, IBM 801
    • Goal: enable better compiler control and optimization
• RISC motivated by
  • Memory stalls (no work done in a complex instruction when there is a memory stall?)
    • When is this correct?
  • Simplifying the hardware → lower cost, higher frequency
  • Enabling the compiler to optimize the code better
    • Find fine-grained parallelism to reduce stalls

A Note on ISA Evolution
• ISAs have evolved to reflect/satisfy the concerns of the day
• Examples:
  • Limited on-chip and off-chip memory size
  • Limited compiler optimization technology
  • Limited memory bandwidth
  • Need for specialization in important applications (e.g., MMX)
• Use of translation (in HW and SW) enabled underlying implementations to be similar, regardless of the ISA
  • Concept of dynamic/static interface
  • Contrast it with hardware/software interface

RISC x CISC
execution time = no_instructions × CPI × clock period (clock period = 1/freq)

     | no_instructions | CPI | clock period
CISC | fewer | 4-100 | long
RISC | more | 1 | short

• Hard to tell which is best
• A combination of CISC and RISC may be the solution:
  • RISC inside, CISC outside: see Pentium processors
  • Complex instructions translated into simple (RISC) instructions
  • BUT there is a translation cost in CISC
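
A quick calculation shows why the three factors have to be weighed together. The instruction counts, CPI values and clock rates below are made-up numbers chosen only to exercise the formula; they are not measurements of any real processor.

```python
def execution_time(n_instructions, cpi, clock_hz):
    """Seconds = instructions x cycles per instruction x seconds per cycle."""
    return n_instructions * cpi / clock_hz


# Hypothetical program compiled for two hypothetical machines.
cisc_time = execution_time(1_000_000, 6.0, 100e6)  # fewer instructions, higher CPI, slower clock
risc_time = execution_time(2_500_000, 1.2, 200e6)  # more instructions, lower CPI, faster clock

print(f"CISC-like: {cisc_time:.3f} s")
print(f"RISC-like: {risc_time:.3f} s")
```

With different assumed numbers the comparison can go the other way, which is the "hard to tell which is best" point.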

ISA-level Tradeoffs: Instruction Length
• Fixed length: length of all instructions the same
  + Easier to decode a single instruction in hardware
  + Easier to decode multiple instructions concurrently
  -- Wasted bits in instructions
  -- Harder-to-extend ISA (how to add new instructions?)
• Variable length: length of instructions different (determined by opcode and sub-opcode)
  + Compact encoding
    Intel 432: Huffman encoding (sort of). 6 to 321 bit instructions.
  -- More logic to decode a single instruction
  -- Harder to decode multiple instructions concurrently
• Tradeoffs
  • Code size (memory space, bandwidth, latency) vs. hardware complexity
  • ISA extensibility and expressiveness
  • Performance? Smaller code vs. imperfect decode
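
The concurrent-decode point comes down to how instruction boundaries are found. The sketch below uses a toy variable-length encoding, assumed purely for illustration, in which the first byte of each instruction holds its total length in bytes; no real ISA is encoded this way.

```python
def fixed_boundaries(code, width=4):
    """Fixed length: every start offset is known without reading any byte."""
    return list(range(0, len(code), width))


def variable_boundaries(code):
    """Variable length: each start depends on decoding the previous instruction."""
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]  # toy encoding: first byte is the instruction length
        if length == 0:
            raise ValueError(f"invalid length byte at offset {pc}")
        starts.append(pc)
        pc += length
    return starts


fixed_code = bytes(12)                                   # three 4-byte instructions
variable_code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])    # lengths 1, 3, 2, 1

print(fixed_boundaries(fixed_code))
print(variable_boundaries(variable_code))
```

The fixed-length offsets can all be computed independently, so several decoders can work at once; the variable-length scan is inherently sequential unless extra hardware guesses or pre-marks the boundaries.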

ISA-level Tradeoffs: Decode type
• Uniform decode: same bits in each instruction correspond to the same meaning
  • Opcode is always in the same location
  • Ditto operand specifiers, immediate values, …
  • Many “RISC” ISAs: Alpha, MIPS, SPARC
  + Easier decode, simpler hardware
  + Enables parallelism: generate target address before knowing the instruction is a branch
  -- Restricts instruction format (fewer instructions?) or wastes space
• Non-uniform decode
  • E.g., opcode can be the 1st-7th byte in x86
  + More compact and powerful instruction format
  -- More complex decode logic
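
Uniform decode can be sketched with the classic 32-bit MIPS field layout, where every field sits at a fixed bit position. The function name and dictionary output are illustrative; the two sample words are the standard encodings of `add $t0, $s1, $s2` and `lw $t0, 32($s3)`.

```python
def decode_fields(word):
    """Extract every field of a 32-bit MIPS word without knowing its format first."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (R-format)
        "shamt": (word >> 6) & 0x1F,    # bits 10..6  (R-format)
        "funct": word & 0x3F,           # bits 5..0   (R-format)
        "imm": word & 0xFFFF,           # bits 15..0  (I-format)
    }


add_fields = decode_fields(0x02324020)  # add $t0, $s1, $s2
lw_fields = decode_fields(0x8E680020)   # lw  $t0, 32($s3)
print(add_fields)
print(lw_fields)
```

All fields are pulled out at once, and the opcode then selects which ones are meaningful. That is what lets hardware read registers or compute a branch target before it has finished identifying the instruction.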

ISA Wars: Systematic study
ACM Transactions on Computer Systems, Vol. 33, No. 1, Article 3, Publication date: March 2015.

ISA Wars: Systematic study
“Role of ISA: Although our study shows that RISC and CISC ISA traits are irrelevant to power and performance characteristics of modern cores, ISAs continue to evolve to better support exposing workload-specific semantic information to the execution substrate. On x86, such changes include the transition to Intel64 (larger word sizes, optimized calling conventions, and shared code support), …, architectural support for transactions in the form of HLE. Similarly, ARM ISA has introduced shorter fixed-length instructions for low-power targets (Thumb), vector extensions (NEON), DSP and bytecode execution extensions (Jazelle DBX), Trustzone security, and hardware virtualization support. Thus, although ISA…”
• CONCLUSION:
  • CISC x RISC DISCUSSION IS IRRELEVANT
  • In both cases performance and power efficiency are the same
  • Microarchitecture and design methodology are the main factors weighing on performance and power consumption.