Computer Architecture: A Quantitative Approach, Sixth Edition Chapter

Introduction: Single Processor Performance
Copyright © 2019, Elsevier Inc. All rights reserved.

Trends in Technology: Bandwidth and Latency
[Figure: log-log plot of bandwidth and latency milestones]

Principles of Computer Design: The Processor Performance Equation

Principles of Computer Design: Different instruction types having different CPIs
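The equations on these two slides were images and did not survive in this transcript. As a hedged sketch, the form usually given in the textbook is CPU time = instruction count x CPI x clock cycle time, with total cycles summed per instruction type. The instruction mix, CPI values, and clock rate below are hypothetical and chosen only for illustration.

```python
def total_cycles(instruction_counts, cpis):
    """Sum of (instruction count x CPI) over all instruction types."""
    return sum(instruction_counts[kind] * cpis[kind] for kind in instruction_counts)


def average_cpi(instruction_counts, cpis):
    """Overall CPI: total cycles divided by total instructions."""
    return total_cycles(instruction_counts, cpis) / sum(instruction_counts.values())


def cpu_time(instruction_counts, cpis, clock_rate_hz):
    """CPU time in seconds: total cycles divided by the clock rate."""
    return total_cycles(instruction_counts, cpis) / clock_rate_hz


# Hypothetical instruction mix and per-type CPIs (not from the slides).
counts = {"alu": 500_000, "load_store": 300_000, "branch": 200_000}
cpis = {"alu": 1.0, "load_store": 2.0, "branch": 3.0}

print("average CPI:", average_cpi(counts, cpis))
print("CPU time (s) at 1 GHz:", cpu_time(counts, cpis, 1e9))
```

Lowering the CPI of a frequent instruction type reduces CPU time more than the same change to a rare one, which is the point of tracking CPI per type.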

Introduction: Computer Technology
- Performance improvements:
  - Improvements in semiconductor technology
    - Feature size, clock speed
  - Improvements in computer architectures
    - Enabled by HLL compilers, UNIX
    - Led to RISC architectures
  - Together have enabled:
    - Lightweight computers
    - Productivity-based managed/interpreted programming languages

Introduction: Current Trends in Architecture
- Cannot continue to leverage instruction-level parallelism (ILP)
  - Single processor performance improvement ended in 2003
- New models for performance:
  - Data-level parallelism (DLP)
  - Thread-level parallelism (TLP)
  - Request-level parallelism (RLP)
- These require explicit restructuring of the application

Classes of Computers
- Personal Mobile Device (PMD)
  - e.g. smart phones, tablet computers
  - Emphasis on energy efficiency and real-time
- Desktop Computing
  - Emphasis on price-performance
- Servers
  - Emphasis on availability, scalability, throughput
- Clusters / Warehouse Scale Computers
  - Used for “Software as a Service (SaaS)”
  - Emphasis on availability and price-performance
  - Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
- Internet of Things/Embedded Computers
  - Emphasis: price

Classes of Computers: Parallelism
- Classes of parallelism in applications:
  - Data-Level Parallelism (DLP)
  - Task-Level Parallelism (TLP)
- Classes of architectural parallelism:
  - Instruction-Level Parallelism (ILP)
  - Vector architectures/Graphic Processor Units (GPUs)
  - Thread-Level Parallelism
  - Request-Level Parallelism

Classes of Computers: Flynn’s Taxonomy
- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data streams (SIMD)
  - Vector architectures
  - Multimedia extensions
  - Graphics processor units
- Multiple instruction streams, single data stream (MISD)
  - No commercial implementation
- Multiple instruction streams, multiple data streams (MIMD)
  - Tightly-coupled MIMD
  - Loosely-coupled MIMD

Defining Computer Architecture
- “Old” view of computer architecture:
  - Instruction Set Architecture (ISA) design
  - i.e. decisions regarding: registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
- “Real” computer architecture:
  - Specific requirements of the target machine
  - Design to maximize performance within constraints: cost, power, and availability
  - Includes ISA, microarchitecture, hardware

Defining Computer Architecture: Instruction Set Architecture
- Class of ISA
  - General-purpose registers
  - Register-memory vs load-store
- RISC-V registers
  - 32 g.p., 32 f.p.

Register   Name      Use               Saver
x0         zero      constant 0        n/a
x1         ra        return addr       caller
x2         sp        stack ptr         callee
x3         gp        gbl ptr
x4         tp        thread ptr
x5-x7      t0-t2     temporaries       caller
x8         s0/fp     saved/frame ptr   callee
x9         s1        saved             callee
x10-x17    a0-a7     arguments         caller
x18-x27    s2-s11    saved             callee
x28-x31    t3-t6     temporaries       caller
f0-f7      ft0-ft7   FP temps          caller
f8-f9      fs0-fs1   FP saved          callee
f10-f17    fa0-fa7   FP arguments      caller
f18-f27    fs2-fs11  FP saved          callee
f28-f31    ft8-ft11  FP temps          caller

Defining Computer Architecture: Instruction Set Architecture
- Memory addressing
  - RISC-V: byte addressed, aligned accesses faster
- Addressing modes
  - RISC-V: Register, immediate, displacement (base+offset)
  - Other examples: autoincrement, indexed, PC-relative
- Types and size of operands
  - RISC-V: 8-bit, 32-bit, 64-bit

Defining Computer Architecture: Instruction Set Architecture
- Operations
  - RISC-V: data transfer, arithmetic, logical, control, floating point
  - See Fig. 1.5 in text
- Control flow instructions
  - Use content of registers (RISC-V) vs. status bits (x86, ARMv7, ARMv8)
  - Return address in register (RISC-V, ARMv7, ARMv8) vs. on stack (x86)
- Encoding
  - Fixed (RISC-V, ARMv7/v8 except compact instruction set) vs. variable length (x86)

Trends in Technology
- Integrated circuit technology (Moore’s Law)
  - Transistor density: 35%/year
  - Die size: 10-20%/year
  - Integration overall: 40-55%/year
- DRAM capacity: 25-40%/year (slowing)
  - 8 Gb (2014), 16 Gb (2019), possibly no 32 Gb
- Flash capacity: 50-60%/year
  - 8-10X cheaper/bit than DRAM
- Magnetic disk capacity: recently slowed to 5%/year
  - Density increases may no longer be possible, maybe increase from 7 to 9 platters
  - 8-10X cheaper/bit than Flash
  - 200-300X cheaper/bit than DRAM
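To make the annual growth rates above easier to compare, the sketch below converts a compound annual growth rate into a doubling time. It assumes the rate stays constant, which the slide itself says is no longer true for DRAM and disks, so treat the results as rough orientation only. The function name is an illustrative choice.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Rates taken from the slide: 35%/year and 5%/year.
for label, rate in [("transistor density", 0.35), ("magnetic disk capacity", 0.05)]:
    years = doubling_time_years(rate)
    print(f"{label}: doubles roughly every {years:.1f} years")
```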

Trends in Technology: Bandwidth and Latency
- Bandwidth or throughput
  - Total work done in a given time
  - 32,000-40,000X improvement for processors
  - 300-1200X improvement for memory and disks
- Latency or response time
  - Time between start and completion of an event
  - 50-90X improvement for processors
  - 6-8X improvement for memory and disks

Trends in Technology: Transistors and Wires
- Feature size
  - Minimum size of transistor or wire in x or y dimension
  - 10 microns in 1971 to 0.011 microns in 2017
  - Transistor performance scales linearly
    - Wire delay does not improve with feature size!
  - Integration density scales quadratically
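As a rough illustration of the linear and quadratic scaling claims, the sketch below applies them to the two feature sizes quoted on the slide. It assumes ideal scaling, where density grows with the square of the linear shrink. Real chips deviate from this, so the numbers are an upper-bound style estimate rather than measured data.

```python
def scaling_factors(old_feature_um, new_feature_um):
    """Return (linear shrink factor, ideal density factor) between two feature sizes."""
    linear = old_feature_um / new_feature_um
    density = linear ** 2  # area per device shrinks in both x and y
    return linear, density


# Feature sizes from the slide: 10 microns (1971) and 0.011 microns (2017).
linear, density = scaling_factors(10.0, 0.011)
print(f"linear shrink: about {linear:.0f}x")
print(f"ideal density gain: about {density:.0f}x")
```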

Trends in Power and Energy
- Problem: Get power in, get power out
- Thermal Design Power (TDP)
  - Characterizes sustained power consumption
  - Used as target for power supply and cooling system
  - Lower than peak power (peak is 1.5X higher), higher than average power consumption
- Clock rate can be reduced dynamically to limit power consumption
- Energy per task is often a better measurement

Trends in Power and Energy: Dynamic Energy and Power
- Dynamic energy
  - Transistor switch from 0 -> 1 or 1 -> 0
  - ½ x Capacitive load x Voltage²
- Dynamic power
  - ½ x Capacitive load x Voltage² x Frequency switched
- Reducing clock rate reduces power, not energy
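The two formulas above can be checked numerically. The sketch below uses hypothetical values for capacitive load, voltage, and frequency (none come from the slides) and compares two cases: halving only the clock rate, and lowering both voltage and frequency by 15%.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy per transition: 1/2 x capacitive load x voltage^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency):
    """Power: 1/2 x capacitive load x voltage^2 x frequency switched."""
    return dynamic_energy(capacitive_load, voltage) * frequency


# Hypothetical operating point, for illustration only.
C, V, F = 1e-9, 1.0, 3.3e9

base_energy = dynamic_energy(C, V)
base_power = dynamic_power(C, V, F)

# Case 1: halve the clock rate only. Power halves, energy per transition does not change.
half_clock_power_ratio = dynamic_power(C, V, F / 2) / base_power

# Case 2: reduce voltage and frequency by 15% each.
scaled_energy_ratio = dynamic_energy(C, 0.85 * V) / base_energy
scaled_power_ratio = dynamic_power(C, 0.85 * V, 0.85 * F) / base_power

print("half clock, power ratio:", half_clock_power_ratio)
print("15% lower V, energy ratio:", scaled_energy_ratio)
print("15% lower V and f, power ratio:", scaled_power_ratio)
```

Because voltage enters squared, lowering it cuts energy per transition, while frequency only changes how quickly that energy is spent.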

Trends in Power and Energy: Power
- Intel 80386 consumed ~2 W
- 3.3 GHz Intel Core i7 consumes 130 W
- Heat must be dissipated from 1.5 x 1.5 cm chip
- This is the limit of what can be cooled by air

Trends in Power and Energy: Reducing Power
- Techniques for reducing power:
  - Do nothing well
  - Dynamic Voltage-Frequency Scaling
  - Low power state for DRAM, disks
  - Overclocking, turning off cores

Trends in Power and Energy: Static Power
- Static power consumption
  - 25-50% of total power
  - Current_static x Voltage
  - Scales with number of transistors
  - To reduce: power gating

Trends in Cost
- Cost driven down by learning curve
  - Yield
- DRAM: price closely tracks cost
- Microprocessors: price depends on volume
  - 10% less for each doubling of volume
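The "10% less for each doubling of volume" rule of thumb can be written as a small function. This is a minimal sketch that assumes the rule holds smoothly for any volume ratio. The base price and volumes in the usage line are hypothetical.

```python
import math


def price_at_volume(base_price, base_volume, volume, drop_per_doubling=0.10):
    """Price after scaling volume, assuming a fixed fractional drop per doubling."""
    doublings = math.log2(volume / base_volume)
    return base_price * (1 - drop_per_doubling) ** doublings


# Hypothetical part: 100 currency units at 1,000 units of volume.
print(price_at_volume(100.0, 1_000, 4_000))  # two doublings
```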

Trends in Cost: Integrated Circuit Cost
- Integrated circuit [cost formula image not preserved]
- Bose-Einstein formula: [formula image not preserved]
  - Defects per unit area = 0.016-0.057 defects per square cm (2010)
  - N = process-complexity factor = 11.5-15.5 (40 nm, 2010)
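The formula images for this slide are missing from the transcript. As a hedged reconstruction, the Bose-Einstein die yield model is usually stated in the textbook as Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N. The sketch below evaluates that form. The die area, the specific defect density, the value of N, and the wafer yield of 1.0 are assumptions picked from within or alongside the ranges on the slide.

```python
def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Bose-Einstein die yield model (form assumed from the textbook)."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


# Hypothetical die: 1.0 cm^2, 0.03 defects/cm^2, N = 13.5.
small_die = die_yield(0.03, 1.0, 13.5)
# Same process, die four times larger.
large_die = die_yield(0.03, 4.0, 13.5)

print(f"yield, 1 cm^2 die: {small_die:.2f}")
print(f"yield, 4 cm^2 die: {large_die:.2f}")
```

Yield falls faster than linearly with die area, which is one reason large dies cost disproportionately more.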

Dependability
- Module reliability
  - Mean time to failure (MTTF)
  - Mean time to repair (MTTR)
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Availability = MTTF / MTBF
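A short worked example of the two definitions above. The MTTF and MTTR values are hypothetical and only illustrate how availability responds to repair time.

```python
def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / MTBF."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


# Hypothetical module: fails every 1,000,000 hours on average, takes 24 hours to repair.
print(f"availability: {availability(1_000_000, 24):.6f}")
```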

Measuring Performance
- Typical performance metrics:
  - Response time
  - Throughput
- Speedup of X relative to Y
  - Execution time_Y / Execution time_X
- Execution time
  - Wall clock time: includes all system overheads
  - CPU time: only computation time
- Benchmarks
  - Kernels (e.g. matrix multiply)
  - Toy programs (e.g. sorting)
  - Synthetic benchmarks (e.g. Dhrystone)
  - Benchmark suites (e.g. SPEC06fp, TPC-C)
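The distinction between wall clock time and CPU time, and the speedup ratio, can be sketched with Python's standard `time` module. The helper names `measure` and `speedup` are illustrative. `time.perf_counter` reports elapsed wall clock time and `time.process_time` reports CPU time of the current process.

```python
import time


def measure(fn):
    """Run fn once and return (wall clock seconds, CPU seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def speedup(time_y, time_x):
    """Speedup of X relative to Y = execution time of Y / execution time of X."""
    return time_y / time_x


# A sleep adds wall clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.05))
print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")

# If Y takes 10 s and X takes 5 s, X is 2 times faster than Y.
print(speedup(10.0, 5.0))
```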

Principles of Computer Design
- Take Advantage of Parallelism
  - e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of Locality
  - Reuse of data and instructions
- Focus on the Common Case
  - Amdahl’s Law
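The slide names Amdahl's Law without stating it. In its usual form, overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time that benefits from an enhancement and s is the speedup of that fraction. The sketch below evaluates it. The parameter names and the example values are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Speeding up 40% of the work by 10x gives far less than 10x overall.
print(amdahl_speedup(0.4, 10.0))

# Even an enormous speedup of that 40% is capped near 1 / (1 - 0.4).
print(amdahl_speedup(0.4, 1e9))
```

This is why focusing on the common case matters: the unenhanced fraction sets a ceiling on the overall gain.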

Fallacies and Pitfalls
- All exponential laws must come to an end
  - Dennard scaling (constant power density)
    - Stopped by threshold voltage
  - Disk capacity
    - 30-100% per year to 5% per year
  - Moore’s Law
    - Most visible with DRAM capacity
    - ITRS disbanded
    - Only four foundries left producing state-of-the-art logic chips
    - 11 nm, 3 nm might be the limit

Fallacies and Pitfalls
- Microprocessors are a silver bullet
  - Performance is now a programmer’s burden
- Falling prey to Amdahl’s Law
- A single point of failure
- Hardware enhancements that increase performance also improve energy efficiency, or are at worst energy neutral
- Benchmarks remain valid indefinitely
  - Compiler optimizations target benchmarks

Fallacies and Pitfalls
- The rated mean time to failure of disks is 1,200,000 hours or almost 140 years, so disks practically never fail
  - MTTF values from manufacturers assume regular replacement
- Peak performance tracks observed performance
- Fault detection can lower availability
  - Not all operations are needed for correct execution
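The disk MTTF fallacy is easier to see with arithmetic. The sketch below converts the rated MTTF to years and then estimates failures per year across a population of disks. It assumes a constant failure rate and that failed disks are replaced, which is a simplification. The population size of 1,000 is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mttf_in_years(mttf_hours):
    """Rated MTTF expressed in years of continuous operation."""
    return mttf_hours / HOURS_PER_YEAR


def expected_failures_per_year(num_disks, mttf_hours):
    """Expected failures per year in a population, assuming a constant failure rate."""
    return num_disks * HOURS_PER_YEAR / mttf_hours


rated_mttf = 1_200_000  # hours, from the slide
print(f"MTTF: about {mttf_in_years(rated_mttf):.0f} years")
print(f"failures per year among 1,000 disks: {expected_failures_per_year(1_000, rated_mttf):.1f}")
```

A single disk rated this way looks as if it would outlive its owner, yet a large installation should still expect several failures every year.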