ECE 721 Overview, Spring ’19, Prof. Eric Rotenberg
Performance Strategies
• Sequential programs (721 focus)
  – Nature of parallelism: Instruction-Level Parallelism (ILP); irregular and fine-grained
  – Example applications: most ordinary apps, operating systems, etc.
  – Architecture approach: general-purpose superscalar or VLIW; speculation is a key theme
• Data-parallel programs
  – Nature of parallelism: Data-Level Parallelism (DLP); regular and fine-grained
  – Example applications: multimedia, games, network processing, scientific apps, etc.
  – Architecture approach: vector, SIMD, SIMT (GPGPU), special-purpose, ASICs, FPGAs, etc.
• Thread-parallel programs
  – Nature of parallelism: Thread-Level Parallelism (TLP); regular and coarse-grained
  – Example applications: data center apps, multimedia, games, network processing, scientific apps, etc.
  – Architecture approach: parallel computers, multi-core
The Big Five ILP Techniques (ECE 463/563 Review)
• Pipelining
  – Overlap instructions for higher throughput
• Caches, prefetching
  – Bridge processor/memory speed gap
• Branch prediction
  – Remove control dependencies for effective pipelining
• Out-of-order execution
  – Mitigate data dependencies and latencies by decoupling future independent instructions from earlier stalled instructions
• Superscalar / VLIW
  – Exceed scalar (1 instr./cycle) performance via multiple-instruction issue (N instr./cycle)
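Of the Big Five, branch prediction is the easiest to show in a few lines. The sketch below is a minimal bimodal predictor, assuming a power-of-two table of 2-bit saturating counters indexed by PC bits above a 4-byte instruction alignment. The class name, table size, and indexing are hypothetical choices for illustration, not details from the course.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bimodal branch predictor: a table of 2-bit saturating counters.
// Counter values 0-1 predict not-taken, 2-3 predict taken.
class BimodalPredictor {
public:
    explicit BimodalPredictor(std::size_t entries)
        : table_(entries, 1) {  // every counter starts weakly not-taken
        assert(entries > 0 && (entries & (entries - 1)) == 0);  // power of two
    }

    // Predict the direction of the branch at this PC.
    bool predict(std::uint64_t pc) const { return table_[index(pc)] >= 2; }

    // Train with the actual outcome once the branch resolves.
    void update(std::uint64_t pc, bool taken) {
        std::uint8_t &ctr = table_[index(pc)];
        if (taken && ctr < 3) ++ctr;
        if (!taken && ctr > 0) --ctr;
    }

private:
    std::size_t index(std::uint64_t pc) const {
        // Drop the 2 low bits (assumes 4-byte instructions), keep log2(entries) bits.
        return static_cast<std::size_t>(pc >> 2) & (table_.size() - 1);
    }
    std::vector<std::uint8_t> table_;
};
```

With 2-bit counters, a strongly biased branch survives a single contrary outcome (such as a loop exit) without flipping its prediction; that hysteresis is the main reason to use two bits rather than one.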
ILP Scaling in Commercial Processors
Processor Generation   Pipeline Depth   Issue Width              In-flight Instructions (fetch to execute)
Pentium                5                1 instr.                 ~5
Pentium-III            10               3 µ-ops                  ~40
Pentium-IV             20               3 µ-ops                  126
IBM Power4             12               5 instr.                 200
IBM Power8                              10 (issue), 8 (retire)   224
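One rough way to read the table is Little's law, applied here as a back-of-envelope assumption rather than a claim from the slide: to sustain W instructions per cycle while each instruction spends about D cycles between fetch and execute (approximating that latency by the pipeline depth), roughly W × D instructions must be in flight.

```latex
N_{\text{in-flight}} \approx W \times D
\qquad
\begin{aligned}
\text{Pentium:}     &\; 1 \times 5  = 5  &&(\text{table: } {\sim}5)\\
\text{Pentium-III:} &\; 3 \times 10 = 30 &&(\text{table: } {\sim}40)\\
\text{Pentium-IV:}  &\; 3 \times 20 = 60 &&(\text{table: } 126)\\
\text{IBM Power4:}  &\; 5 \times 12 = 60 &&(\text{table: } 200)
\end{aligned}
```

The in-order Pentium matches W × D almost exactly. The out-of-order cores hold well beyond it because their windows must also cover instructions waiting on long-latency events such as cache misses, which is the latency-tolerance role of out-of-order execution.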
ECE 721 Topics
1. Modern Superscalar Processors
   – Contemporary organization
     • Physical register file
     • Memory dependencies
     • Canonical superscalar pipeline
   – Implementation details
   – Superscalar complexity
   – Case studies
2. Possible Next-Generation Superscalars
   – Large-window processors
   – High-ILP processors
3. Specialization
   – New source of performance, as speed and power benefits of technology scaling decrease
   – Efficiency
   – Forms: reconfigurable processor, heterogeneous multi-core processor, accelerators
Topic 1: Modern Superscalar Processors
Style 1 (ECE 563)
[Figure: pipeline diagram. Labeled blocks: branch predictor; instruction fetch (I$); decode, rename, register read, dispatch; architectural register file (ARF); issue queue (IQ); OOO issue; function units (FUs) and D$; execution complete; reorder buffer (ROB); retire.]
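A minimal C++ sketch of the Style 1 organization, assuming a simple model in which each ROB entry holds its instruction's result until retirement, register read takes the youngest in-flight writer's value from the ROB before falling back to the ARF, and retirement copies completed head entries into the ARF in program order. `Style1Core` and its methods are hypothetical names, not 721sim code; real designs also bound retire width and squash only younger instructions on a branch misprediction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical Style 1 model: results wait in the ROB and are copied into the
// architectural register file (ARF) only when they retire in program order.
class Style1Core {
public:
    explicit Style1Core(std::size_t numArchRegs) : arf_(numArchRegs, 0) {}

    // Dispatch: allocate a ROB entry at the tail; the returned tag names it.
    std::uint64_t dispatch(int destReg) {
        rob_.push_back({destReg, std::nullopt});
        return headSeq_ + rob_.size() - 1;
    }

    // Execution complete: write the result into the instruction's ROB entry.
    void complete(std::uint64_t tag, std::int64_t value) {
        assert(tag >= headSeq_ && tag < headSeq_ + rob_.size());
        rob_[tag - headSeq_].value = value;
    }

    // Register read: the youngest in-flight writer supplies the operand
    // (empty optional = not produced yet); otherwise read the ARF.
    std::optional<std::int64_t> read(int reg) const {
        for (auto it = rob_.rbegin(); it != rob_.rend(); ++it)
            if (it->dest == reg) return it->value;
        return arf_.at(reg);
    }

    // Retire: copy completed entries from the ROB head into the ARF, in order.
    int retire() {
        int retired = 0;
        while (!rob_.empty() && rob_.front().value.has_value()) {
            arf_.at(rob_.front().dest) = *rob_.front().value;
            rob_.pop_front();
            ++headSeq_;
            ++retired;
        }
        return retired;
    }

    // Exception-style flush: the ARF is already precise, so drop the whole ROB.
    void flush() {
        headSeq_ += rob_.size();
        rob_.clear();
    }

    std::int64_t archValue(int reg) const { return arf_.at(reg); }

private:
    struct RobEntry {
        int dest;                           // architectural destination register
        std::optional<std::int64_t> value;  // empty until execution completes
    };
    std::vector<std::int64_t> arf_;
    std::deque<RobEntry> rob_;
    std::uint64_t headSeq_ = 0;  // tag of the current ROB head
};
```

Because the ARF only ever holds retired values, `flush()` restores precise state just by discarding the ROB. The cost is that every result is written twice (ROB, then ARF), and register read must locate values in the ROB (searched here; real designs typically keep a table pointing at ROB entries).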
Style 2 (ECE 721)
[Figure: pipeline diagram. Labeled blocks: branch predictor; instruction fetch (I$); decode, rename, dispatch, using the rename Map and Free List; Shadow Maps (misp. branch recovery); Arch. Map (exception recovery); Issue Queue (IQ); OOO issue; execution in Function Units (FUs) with the Physical RF; Active List (head, tail) for complete and retire.]
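A minimal C++ sketch of Style 2 renaming, assuming a rename map (RMT) and an Arch. Map (AMT) that both index one physical register file, a FIFO free list, an active list recording each renamed destination, and shadow maps saved at branches. `Renamer` and all member names are hypothetical. Retirement frees the physical register that previously held the committed value of the same logical register, misprediction recovery restores a shadow map and returns squashed registers to the free list, and exception recovery copies the Arch. Map back into the rename map. As a simplification, checkpoints are discarded only by recovery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical Style 2 renamer: all values live in one physical register file;
// the rename map (RMT) is speculative, the Arch. Map (AMT) is committed state.
class Renamer {
public:
    Renamer(int archRegs, int physRegs) : rmt_(archRegs), amt_(archRegs) {
        assert(archRegs > 0 && physRegs > archRegs);
        for (int r = 0; r < archRegs; ++r) rmt_[r] = amt_[r] = r;  // identity map at reset
        for (int p = archRegs; p < physRegs; ++p) freeList_.push_back(p);
    }

    // Source operand: read the current speculative mapping.
    int renameSrc(int arch) const { return rmt_.at(arch); }

    // Destination operand: allocate a free physical register, record it in the
    // active list, and update the RMT. Returns -1 (rename stalls) if none free.
    int renameDst(int arch) {
        if (freeList_.empty()) return -1;
        int phys = freeList_.front();
        freeList_.pop_front();
        rmt_.at(arch) = phys;
        activeList_.push_back({arch, phys});
        return phys;
    }

    // Branch: save a shadow map plus the active-list tail position.
    std::size_t checkpoint() {
        shadow_.push_back({rmt_, headSeq_ + activeList_.size()});
        return shadow_.size() - 1;
    }

    // Mispredicted branch: squash younger active-list entries (returning their
    // physical registers to the free list) and restore the shadow map.
    void recover(std::size_t id) {
        assert(id < shadow_.size());
        const std::uint64_t tail = shadow_[id].tailSeq;
        while (headSeq_ + activeList_.size() > tail) {
            freeList_.push_front(activeList_.back().phys);
            activeList_.pop_back();
        }
        rmt_ = shadow_[id].rmt;
        shadow_.resize(id);  // this checkpoint and all younger ones are discarded
    }

    // Retire the active-list head: commit its mapping in the AMT and free the
    // register that previously held the committed value of the same arch reg.
    void retireHead() {
        assert(!activeList_.empty());
        Entry e = activeList_.front();
        activeList_.pop_front();
        ++headSeq_;
        freeList_.push_back(amt_.at(e.arch));
        amt_.at(e.arch) = e.phys;
    }

    // Exception: squash everything in flight and restore the RMT from the AMT.
    void flushToArch() {
        while (!activeList_.empty()) {
            freeList_.push_front(activeList_.back().phys);
            activeList_.pop_back();
        }
        rmt_ = amt_;
        shadow_.clear();
    }

    std::size_t freeCount() const { return freeList_.size(); }
    int archMap(int arch) const { return amt_.at(arch); }

private:
    struct Entry { int arch; int phys; };
    struct Checkpoint { std::vector<int> rmt; std::uint64_t tailSeq; };
    std::vector<int> rmt_, amt_;
    std::deque<int> freeList_;
    std::deque<Entry> activeList_;
    std::vector<Checkpoint> shadow_;
    std::uint64_t headSeq_ = 0;  // sequence number of the active-list head
};
```

Unlike Style 1, each value is written once into the physical RF and never copied at retirement; retirement only updates the Arch. Map and recycles one register.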
Superscalar Complexity
Case Studies: ARM Cortex-A15
Case Studies: IBM Power8
[Figure: microarchitecture block diagram. Image credit: The Linley Group]
Topic 2: Possible Next-Gen. Superscalars
a. Large-window processors
b. High-ILP processors
Large-Window Processors
• Checkpoint processing and recovery (CPR)
  – Large virtual window (e.g., 1K to 8K in-flight instructions!) with small physical resources
• Continual Flow Pipelines (CFP)
  – Program continues executing for 100s of cycles while L2-miss instructions are deferred
• Run-Ahead Execution
  – Efficiently exploits memory-level parallelism
High-ILP Processors
• E.g., 16-way superscalar
• Explored in the 1990s
• Why revisit today?
  – Frequency has peaked due to reaching the air-cooled power limit
  – Increase performance through parallelism, including ILP
High-ILP Processors (cont.)
• Two challenges
  – ILP bottlenecks
  – Logic complexity in all pipeline stages increases cycle time and power
• Two sets of solutions
  – Need advanced speculation techniques to overcome ILP bottlenecks
  – Need a complexity-effective microarchitecture
ILP Bottlenecks
ILP bottleneck                        Advanced speculation techniques
control-flow: branch mispredictions   multipath execution, control independence, others
control-flow: fetch bandwidth         trace cache
data-flow                             value prediction, instr./trace reuse
Advanced Speculation
• Value prediction
  – Predict values and execute dependent instructions in parallel
• Trace-level reuse
  – Collapse 10s of instructions into a few cycles
• Control independence
  – Don’t squash all instructions after mispredicted branch
• Confidence, multipath
  – Execute both paths of a branch if not confident of prediction
Value Prediction
[Figure: value prediction example with predicted values p1, p2, p3 and comparison (=) checks.]
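Value prediction is easiest to picture with a concrete predictor. A minimal sketch of a stride value predictor, swapped in here as an assumption because the slide does not say which predictor it uses: each static instruction, keyed by PC, remembers its last result, the last stride, and a confidence counter, and a prediction is offered only once the same stride has repeated enough times. `StrideValuePredictor`, the threshold, and the counter limit are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical stride value predictor. Each static instruction (PC) keeps its
// last value, the last observed stride, and a small confidence counter.
class StrideValuePredictor {
public:
    explicit StrideValuePredictor(int threshold = 2) : threshold_(threshold) {}

    // Return a predicted value only when confidence has reached the threshold,
    // so dependents are speculatively issued only on likely-correct values.
    std::optional<std::int64_t> predict(std::uint64_t pc) const {
        auto it = table_.find(pc);
        if (it == table_.end() || it->second.confidence < threshold_) return std::nullopt;
        return it->second.last + it->second.stride;
    }

    // Train with the actual result when the instruction executes.
    void update(std::uint64_t pc, std::int64_t actual) {
        auto [it, inserted] = table_.try_emplace(pc, Entry{actual, 0, 0});
        if (inserted) return;  // first sighting: just remember the value
        Entry &e = it->second;
        std::int64_t newStride = actual - e.last;
        if (newStride == e.stride) {
            if (e.confidence < kMaxConf) ++e.confidence;  // prediction would have been right
        } else {
            e.confidence = 0;  // wrong: retrain on the new stride
            e.stride = newStride;
        }
        e.last = actual;
    }

private:
    struct Entry { std::int64_t last; std::int64_t stride; int confidence; };
    static constexpr int kMaxConf = 3;
    int threshold_;
    std::unordered_map<std::uint64_t, Entry> table_;
};
```

Gating predictions on confidence matters because a wrong value forces every dependent instruction that consumed it to re-execute, so an unconfident prediction can cost more than it saves.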
Control Independence
[Figure: after a mispredicted branch, save control-independent, data-independent (CIDI) instructions.]
Multipath Execution
[Figure: an “unconfident” branch (from the confidence estimator) splits execution into a 1st thread and a 2nd thread, one per path; branches on each thread are labeled “confident”.]
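Multipath execution hinges on the confidence estimator that labels a branch “unconfident”. A minimal sketch, assuming a table of resetting counters (a counter counts consecutive correct predictions and is cleared on a misprediction) and a hypothetical policy that forks the alternate path only when a spare hardware thread context, such as an SMT context, is free. Table size, threshold, and all names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical confidence estimator built from resetting counters: each counter
// counts consecutive correct predictions and resets to zero on a misprediction.
class ConfidenceEstimator {
public:
    ConfidenceEstimator(std::size_t entries, unsigned threshold, unsigned maxCount)
        : table_(entries, 0), threshold_(threshold), max_(maxCount) {
        assert(entries > 0 && threshold <= maxCount);
    }

    // "Confident" once enough consecutive correct predictions have been seen.
    bool confident(std::uint64_t pc) const { return table_[index(pc)] >= threshold_; }

    // Train after the branch resolves: was the direction prediction correct?
    void update(std::uint64_t pc, bool predictionCorrect) {
        unsigned &c = table_[index(pc)];
        if (!predictionCorrect) c = 0;
        else if (c < max_) ++c;
    }

private:
    std::size_t index(std::uint64_t pc) const {
        return static_cast<std::size_t>((pc >> 2) % table_.size());
    }
    std::vector<unsigned> table_;
    unsigned threshold_, max_;
};

// Hypothetical multipath policy: fork the other path onto a spare thread
// context only for unconfident branches; otherwise follow the prediction alone.
bool shouldForkOtherPath(const ConfidenceEstimator &ce, std::uint64_t pc, int spareContexts) {
    return spareContexts > 0 && !ce.confident(pc);
}
```

Reserving the second path for unconfident branches keeps the extra fetch and execution bandwidth focused on the branches most likely to mispredict.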
Simultaneous Multithreading (SMT)
• Run multiple independent threads on a wide processor at the same time
  – Naturally increases ILP (more independent instructions)
• SMT hardware can also be repurposed for other microarchitecture techniques
  – Multipath execution
  – Pre-execution, helper threads, etc.
Complexity-effective: Hierarchical Processors
[Figure: labeled blocks: trace predictor, trace cache, global registers, local registers, function units.]
Topic 3: Specialization
Specialization
• Past: Generic superscalar microarchitecture
  – Generic means not as efficient as possible for individual tasks
  – Didn’t care:
    • Exponential performance improvements: technology + microarchitecture (5 ILP techniques) = frequency
    • Vdd scaling kept power in check
• Future: Specialize
  – Need to specialize hardware to tasks because the scaling “gravy train” is over
Forms of Specialization
• Reconfigurable processor
  – Adaptive core: e.g., adjustable width, depth, or structure sizes
  – Core fusion: aggregate narrow cores to form a wide processor
• Single-ISA heterogeneous multi-core processor
  – Multiple core “types”, each with different superscalar dimensions
  – Basic form commercialized: big and little cores (e.g., ARM’s big.LITTLE)
  – More advanced form: non-monotonic cores (can’t be performance-ranked)
• Accelerators
  – CPU + accelerators
  – Many variants: programmable loop accelerators, compound circuits, reconfigurable arrays of ALUs, conservation cores, ASICs, FPGAs, GPGPU (now mainstream in data centers)
Single-ISA Heterogeneous Multi-core Processor
• Include many differently-designed cores on a chip
• Fundamentally change what a “general-purpose processor” looks like
• Overcome a barrier to deploying new microarchitecture ideas: general applicability is not an issue, since other core types can serve applications that a new design does not benefit
Project Frameworks:
• 721sim (C++)
  – Required
  – Cycle-level, execute-at-execute simulator of a superscalar processor
  – Projects 1 and 2 are the training ground
• FabScalar (Verilog)
  – Optional
  – Highly-parameterized synthesizable RTL design of a superscalar core
  – Width, depth, and size are configurable
• Other
  – Must be approved by instructor
  – If needed by your custom research project
  – Compilers, other simulators, etc.
FabScalar-based Chips from NCSU
H3 (3D Heterogeneous Processor)
Technology       IBM 8RF (130 nm)
Dimensions       5.25 mm x 5.25 mm
Area             27.6 mm²
Transistors      14.6 million
Cells            1.1 million
Nets             721 thousand
Memory macros    56
Clock domains    10
FabScalar-based Chips from NCSU
AnyCore: A Width and Size Adaptive Superscalar Core