Instruction Flow Techniques
Prof. Mikko H. Lipasti
University of Wisconsin-Madison
Lecture notes based on notes by John P. Shen
Updated by Mikko Lipasti

Instruction Flow Techniques
• Goal of Instruction Flow and Impediments
• Branch Types and Implementations
• What’s So Bad About Branches?
• What are Control Dependences?
• Impact of Control Dependences on Performance
• Improving I-Cache Performance

Instruction Flow in Context

Goal and Impediments
• Goal of Instruction Flow
  – Supply processor with maximum number of useful instructions every clock cycle
• Impediments
  – Branches and jumps
  – Finite I-Cache
    • Capacity
    • Bandwidth restrictions

Branch Types and Implementation
1. Types of Branches
   A. Conditional or Unconditional
   B. Save PC?
   C. How is target computed?
      • Single target (immediate, PC+immediate)
      • Multiple targets (register)
2. Branch Architectures
   A. Condition code or condition registers
   B. Register

Branch Types and Implementation
[Figure: 32-bit condition register, bits 0 to 31, divided into fields CR0 through CR7]
1. PowerPC
   • 32-bit condition register – eight 4-bit fields (CR0-CR7)
   • CR0 can be implicit result of integer op
   • CR1 can be implicit result of FP op
   • Compare ops set explicit CR field
   • Special CR ops manipulate bits
   • Conditional branch instructions test CR bits
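The field layout above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the helper names are hypothetical, and it assumes PowerPC's usual bit numbering (bit 0 is the most significant bit, so CR0 is the top nibble) with each field holding LT, GT, EQ, and SO bits in that order.

```python
# Bit positions inside one 4-bit CR field (assumption: integer-compare layout).
LT, GT, EQ, SO = 0b1000, 0b0100, 0b0010, 0b0001


def get_cr_field(cr: int, n: int) -> int:
    """Return the 4-bit field CRn from a 32-bit condition register value."""
    return (cr >> (28 - 4 * n)) & 0xF


def set_cr_field(cr: int, n: int, field: int) -> int:
    """Return a copy of cr with field CRn replaced by the given 4-bit value."""
    shift = 28 - 4 * n
    cleared = cr & ~(0xF << shift) & 0xFFFFFFFF
    return cleared | ((field & 0xF) << shift)


def compare(a: int, b: int) -> int:
    """Model a compare op: produce the LT/GT/EQ bits for a versus b."""
    if a < b:
        return LT
    if a > b:
        return GT
    return EQ


def branch_taken(cr: int, n: int, bit: int, want_set: bool = True) -> bool:
    """Model a conditional branch that tests one bit of field CRn."""
    return bool(get_cr_field(cr, n) & bit) == want_set


# A compare writes an explicit field (CR7 here); a later branch tests it.
cr = set_cr_field(0, 7, compare(3, 5))
print(branch_taken(cr, 7, LT))
```

Because the compare names its destination field, several comparison results can be live at once, which is what lets the compare be scheduled well ahead of the branch that consumes it.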

Branches – PowerPC
12 Types of Branches
• Branch (unconditional, no save PC, PC+imm)
• Branch absolute (uncond, no save PC, imm)
• Branch and link (uncond, save PC, PC+imm)
• Branch abs and link (uncond, save PC, imm)
• Branch conditional (conditional, no save PC, PC+imm)
• Branch cond abs (cond, no save PC, imm)
• Branch cond and link (cond, save PC, PC+imm)
• Branch cond abs and link (cond, save PC, imm)
• Branch cond to link register (cond, don’t save PC, reg)
• Branch cond to link reg and link (cond, save PC, reg)
• Branch cond to count reg (cond, don’t save PC, reg)
• Branch cond to count reg and link (cond, save PC, reg)
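The twelve entries are combinations of the three axes introduced earlier (conditional or not, save PC or not, how the target is computed). The sketch below enumerates them under that reading. The assembler mnemonics are an addition for illustration and do not appear on the slide, and the function name is hypothetical.

```python
from itertools import product


def powerpc_branch_types():
    """Enumerate (mnemonic, conditional, saves_pc, target) for the 12 types."""
    types = []
    # Immediate-target forms: the "absolute" and "link" options combine freely.
    for stem, conditional in (("b", False), ("bc", True)):
        for absolute, link in product((False, True), repeat=2):
            name = stem + ("l" if link else "") + ("a" if absolute else "")
            target = "imm" if absolute else "PC+imm"
            types.append((name, conditional, link, target))
    # Register-target forms (link register or count register): link optional.
    for stem in ("bclr", "bcctr"):
        for link in (False, True):
            types.append((stem + ("l" if link else ""), True, link, "reg"))
    return types


for name, conditional, saves_pc, target in powerpc_branch_types():
    print(f"{name:7s} cond={conditional!s:5s} save_pc={saves_pc!s:5s} target={target}")
```

Only the register-target forms can have multiple targets per static branch, which matters later for target prediction.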

Branches – DEC Alpha
2. Alpha
3 Types of Branches
• Conditional branch (cond, no save PC, PC+imm): Bxx Ra, disp
• Unconditional branch (uncond, save PC, PC+imm): Br Ra, disp
• Jumps (uncond, save PC, register): J Ra

Branches – MIPS
3. MIPS
6 Types of Branches
• Jump (uncond, no save PC, imm)
• Jump and link (uncond, save PC, imm)
• Jump register (uncond, no save PC, register)
• Jump and link register (uncond, save PC, register)
• Branch (conditional, no save PC, PC+imm)
• Branch and link (conditional, save PC, PC+imm)

What’s So Bad About Branches?
• Effects of Branches
  – Fragmentation of I-Cache lines
  – Need to determine branch direction
  – Need to determine branch target
  – Use up execution resources
    • Pipeline drain/fill

What’s So Bad About Branches?
Problem: Fetch stalls until direction is determined
Solutions:
• Minimize delay
  – Move instructions determining branch condition away from branch (CC architecture)
• Make use of delay
  – Non-speculative:
    • Fill delay slots with useful safe instructions
    • Execute both paths (eager execution)
  – Speculative:
    • Predict branch direction

What’s So Bad About Branches?
Problem: Fetch stalls until branch target is determined
Solutions:
• Minimize delay
  – Generate branch target early
• Make use of delay: Predict branch target
  – Single target
  – Multiple targets
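One common way to predict a target is a small table indexed by the branch's PC that remembers the last target seen, usually called a branch target buffer. The slides do not describe a specific design, so the following is only a sketch under assumptions: a direct-mapped table, 4-byte instructions, and hypothetical class and method names.

```python
class BranchTargetBuffer:
    """Direct-mapped table mapping a branch PC to its last observed target."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        self.table = [None] * entries  # each slot holds (tag, target) or None

    def _index_tag(self, pc: int):
        word = pc >> 2  # assumption: 4-byte, word-aligned instructions
        return word % self.entries, word // self.entries

    def predict(self, pc: int):
        """Return the predicted target, or None if this PC has no entry."""
        index, tag = self._index_tag(pc)
        slot = self.table[index]
        if slot is not None and slot[0] == tag:
            return slot[1]
        return None

    def update(self, pc: int, target: int) -> None:
        """Record the actual target once the branch resolves."""
        index, tag = self._index_tag(pc)
        self.table[index] = (tag, target)


btb = BranchTargetBuffer()
btb.update(0x1000, 0x2000)   # single-target branch: PC+imm never changes
print(btb.predict(0x1000) == 0x2000)
btb.update(0x1040, 0x3000)   # aliases to the same slot, evicting 0x1000
print(btb.predict(0x1000))
```

A single-target branch is predicted correctly every time after its first execution, as long as its entry is not evicted. A register-indirect branch with multiple targets is only right when it repeats its previous target, which is why the two cases are listed separately above.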

Control Dependences
• Control Flow Graph
  – Shows possible paths of control flow through basic blocks

Control Dependences
• Control Dependence
  – Node B is control-dependent (CD) on Node A if A determines whether B executes
  – If path 1 from A to exit includes B, and path 2 does not, then B is control-dependent on A
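The path-based rule above can be checked mechanically on a small control flow graph. The sketch below applies the slide's informal definition literally, using hypothetical function names and a made-up diamond-shaped CFG. Note that this literal reading also reports transitive dependences; compiler textbooks usually give a stricter definition based on post-dominators.

```python
def reachable(cfg, start, goal, removed=None):
    """Depth-first search: can goal be reached from start without visiting `removed`?"""
    if start == removed:
        return False
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for succ in cfg.get(node, ()):
            if succ != removed and succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return False


def control_dependent(cfg, a, b, exit_node):
    """True if some path from a to exit includes b and some other path avoids b."""
    if a == b or b == exit_node:
        return False
    path_through_b = reachable(cfg, a, b) and reachable(cfg, b, exit_node)
    path_around_b = reachable(cfg, a, exit_node, removed=b)
    return path_through_b and path_around_b


# Hypothetical if/else diamond: A branches to B or C, both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["EXIT"], "EXIT": []}
for node in ("B", "C", "D"):
    print(node, control_dependent(cfg, "A", node, "EXIT"))
```

In the diamond, B and C depend on A's branch, while the join block D does not, because every path from A to the exit passes through D.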

Limits on Instruction Level Parallelism (ILP)

Study                          ILP    Note
Weiss and Smith [1984]         1.58
Sohi and Vajapeyam [1987]      1.81
Tjaden and Flynn [1970]        1.86   (Flynn’s bottleneck)
Tjaden and Flynn [1973]        1.96
Uht [1986]                     2.00
Smith et al. [1989]            2.00
Jouppi and Wall [1988]         2.40
Johnson [1991]                 2.50
Acosta et al. [1986]           2.79
Wedig [1982]                   3.00
Butler et al. [1991]           5.8
Melvin and Patt [1991]         6
Wall [1991]                    7      (Jouppi disagreed)
Kuck et al. [1972]             8
Riseman and Foster [1972]      51     (no control dependences)
Nicolau and Fisher [1984]      90     (Fisher’s optimism)

Riseman and Foster’s Study
• 7 benchmark programs on CDC-3600
• Assume infinite machines
  – Infinite memory and instruction stack
  – Infinite register file
  – Infinite functional units
  – True dependencies only at dataflow limit
• If bounded to single basic block, speedup is 1.72 (Flynn’s bottleneck)
• If one can bypass n branches (hypothetically), then:

Branches Bypassed    Speedup
0                    1.72
1                    2.72
2                    3.62
8                    7.21
32                   14.8
128                  24.4
∞                    51.2

Speculative Execution
• Riseman & Foster showed potential
  – But no idea how to reap benefit
• 1979: Jim Smith patents branch prediction at Control Data
  – Predict current branch based on past history
• Today: virtually all processors use branch prediction
© 2005 Mikko Lipasti
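To make "predict current branch based on past history" concrete, here is a minimal sketch of a table of 2-bit saturating counters, a widely taught history-based scheme. The slides do not say which scheme the patent covers, so treat this as an illustration only. The class name, table size, and the loop-branch trace are all assumptions.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.counters = [1] * entries  # start in the weakly not-taken state

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries  # assumption: 4-byte instructions

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly, updating after each branch."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through, repeated 10 times.
trace = ([True] * 9 + [False]) * 10
print(accuracy(TwoBitPredictor(), 0x400, trace))
```

The second counter bit provides hysteresis: a single loop exit moves the counter only from strongly taken to weakly taken, so the next entry into the loop is still predicted correctly.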

Instruction Flow in Context

Improving I-Cache Performance
• Larger cache size
  – Code compression
  – Instruction registers
• Increased associativity
  – Conflict misses less of a problem than in data caches
• Larger line size
  – Spatial locality inherent in sequential program I-stream
• Code layout
  – Maximize instruction stream’s spatial locality
• Cache prefetching
  – Next-line, streaming buffer
  – Branch target (even if not taken)
• Other types of I-cache organization
  – Trace cache [Ch. 9]
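The effect of line size and next-line prefetching on a sequential instruction stream can be seen with a toy miss counter. This is a rough sketch under strong assumptions: a direct-mapped cache, prefetched lines that arrive instantly, and a hypothetical function name and parameter set. It ignores bandwidth and timing entirely.

```python
def count_misses(fetch_addrs, num_lines=64, line_bytes=32, next_line_prefetch=False):
    """Count instruction-fetch misses in a direct-mapped I-cache model."""
    cache = [None] * num_lines  # one resident line address per set

    def fill(line_addr):
        cache[line_addr % num_lines] = line_addr

    misses = 0
    for addr in fetch_addrs:
        line_addr = addr // line_bytes
        if cache[line_addr % num_lines] != line_addr:
            misses += 1
            fill(line_addr)
        if next_line_prefetch:
            # Idealized: the following line is brought in with no delay.
            fill(line_addr + 1)
    return misses


# Straight-line code: 256 sequential 4-byte instructions.
sequential = list(range(0, 1024, 4))
print(count_misses(sequential))
print(count_misses(sequential, line_bytes=64))
print(count_misses(sequential, next_line_prefetch=True))
```

For straight-line code, doubling the line size halves the compulsory misses, and next-line prefetching removes all but the first. Taken branches break this pattern, which is where code layout and branch-target prefetching come in.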

Recap
• Branch types
• Control dependences
• Improving instruction cache performance