Computer Organization and Architecture
Chapter 13: Instruction Level Parallelism and Superscalar Processors

What is Superscalar?
  • Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently
  • Equally applicable to RISC & CISC
  • In practice usually RISC

Why Superscalar?
  • Most operations are on scalar quantities (see RISC notes)
  • Improve these operations to get an overall improvement

General Superscalar Organization (Diagram)

Superpipelined
  • Many pipeline stages need less than half a clock cycle
  • Doubling the internal clock speed gets two tasks done per external clock cycle
  • Superscalar allows fetch and execute to proceed in parallel

Superscalar v Superpipeline (Diagram)
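
The comparison can be made concrete with a rough timing model. The sketch below is an idealized illustration, not taken from the slides: it assumes no stalls or dependencies, one clock cycle per stage in the base pipeline, and the function names are hypothetical.

```python
import math

def base_pipeline_time(n_instr, n_stages):
    """Cycles to finish n_instr when one instruction is issued per cycle."""
    return n_stages + (n_instr - 1)

def superpipelined_time(n_instr, n_stages, degree):
    """Each stage is split into `degree` sub-stages, so a new instruction
    can start every 1/degree of a base clock cycle."""
    return n_stages + (n_instr - 1) / degree

def superscalar_time(n_instr, n_stages, degree):
    """`degree` instructions start together in every base clock cycle."""
    return n_stages + math.ceil(n_instr / degree) - 1

# Example: 6 instructions, 4-stage pipeline, degree 2 (assumed values).
n, k = 6, 4
print("base:          ", base_pipeline_time(n, k))       # 9
print("superpipelined:", superpipelined_time(n, k, 2))   # 6.5
print("superscalar:   ", superscalar_time(n, k, 2))      # 6
```

Under these assumptions both designs issue at the same steady-state rate; the superpipelined machine trails slightly because its second instruction starts half a cycle after the first rather than alongside it.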

Limitations
  • Instruction level parallelism
  • Compiler based optimisation
  • Hardware techniques
  • Limited by
      • True data dependency
      • Procedural dependency
      • Resource conflicts
      • Output dependency
      • Antidependency

True Data Dependency
  • ADD r1, r2 (r1 := r1 + r2;)
  • MOVE r3, r1 (r3 := r1;)
  • Can fetch and decode second instruction in parallel with first
  • Can NOT execute second instruction until first is finished
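
To see how a true data dependency limits parallel execution, the following minimal sketch computes the earliest cycle in which each instruction could execute. It assumes unit latency and unlimited functional units, and it tracks only true (read-after-write) dependencies; the function name and the tuple encoding of instructions are illustrative assumptions.

```python
def earliest_execute_cycles(program):
    """program: list of (dest, sources) in program order.

    Returns the earliest 1-based execute cycle of each instruction,
    considering only true data dependencies.
    """
    ready = {}   # register -> cycle in which its latest value is produced
    cycles = []
    for dest, sources in program:
        # An instruction can execute one cycle after its last producer.
        start = 1 + max((ready.get(src, 0) for src in sources), default=0)
        cycles.append(start)
        ready[dest] = start
    return cycles

program = [
    ("r1", ("r1", "r2")),   # ADD  r1, r2   (r1 := r1 + r2)
    ("r3", ("r1",)),        # MOVE r3, r1   (r3 := r1)
]
print(earliest_execute_cycles(program))   # [1, 2]
```

Even with unlimited hardware, MOVE lands in cycle 2 because it reads the r1 value that ADD produces in cycle 1.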

Procedural Dependency
  • Cannot execute instructions after a branch in parallel with instructions before a branch
  • Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
  • This prevents simultaneous fetches

Resource Conflict
  • Two or more instructions requiring access to the same resource at the same time
      • e.g. two arithmetic instructions
  • Can duplicate resources
      • e.g. have two arithmetic units

Dependencies (Diagram)

Design Issues
  • Instruction level parallelism
      • Instructions in a sequence are independent
      • Execution can be overlapped
      • Governed by data and procedural dependency
  • Machine parallelism
      • Ability to take advantage of instruction level parallelism
      • Governed by number of parallel pipelines

Instruction Issue Policy
  • Order in which instructions are fetched
  • Order in which instructions are executed
  • Order in which instructions change registers and memory

In-Order Issue, In-Order Completion
  • Issue instructions in the order they occur
  • Not very efficient
  • May fetch more than one instruction
  • Instructions must stall if necessary

In-Order Issue, In-Order Completion (Diagram)

In-Order Issue, Out-of-Order Completion
  • Output dependency
      • R3 := R3 + R5; (I1)
      • R4 := R3 + 1; (I2)
      • R3 := R5 + 1; (I3)
      • I2 depends on the result of I1: true data dependency
      • If I3 completes before I1, I1's later write overwrites I3's result and R3 is left with the wrong value: output (write-write) dependency

In-Order Issue, Out-of-Order Completion (Diagram)

Out-of-Order Issue, Out-of-Order Completion
  • Decouple decode pipeline from execution pipeline
  • Can continue to fetch and decode until this pipeline is full
  • When a functional unit becomes available an instruction can be executed
  • Since instructions have been decoded, processor can look ahead

Out-of-Order Issue, Out-of-Order Completion (Diagram)

Antidependency
  • Read-write (write-after-read) dependency
      • R3 := R3 + R5; (I1)
      • R4 := R3 + 1; (I2)
      • R3 := R5 + 1; (I3)
      • R7 := R3 + R4; (I4)
      • I3 cannot complete before I2 starts, as I2 needs the value in R3 and I3 changes R3
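
The three kinds of register dependency in the I1 to I4 sequence can be found mechanically. The sketch below is one possible way to do it, scanning in program order and remembering the last writer and the readers of each register; the function name and instruction encoding are assumptions made for illustration.

```python
def find_dependencies(program):
    """program: list of (name, dest, sources) in program order.

    Returns (earlier, later, register, kind) tuples, where kind is
    'true' (read after write), 'output' (write after write) or
    'anti' (write after read).
    """
    last_writer = {}          # register -> instruction that last wrote it
    readers_since_write = {}  # register -> readers since that last write
    deps = []
    for name, dest, sources in program:
        for reg in sorted(sources):
            if reg in last_writer:
                deps.append((last_writer[reg], name, reg, "true"))
        if dest in last_writer:
            deps.append((last_writer[dest], name, dest, "output"))
        for reader in readers_since_write.get(dest, []):
            deps.append((reader, name, dest, "anti"))
        # Record this instruction's reads first, then its write.
        for reg in sources:
            readers_since_write.setdefault(reg, []).append(name)
        last_writer[dest] = name
        readers_since_write[dest] = []
    return deps

program = [
    ("I1", "R3", {"R3", "R5"}),   # R3 := R3 + R5
    ("I2", "R4", {"R3"}),         # R4 := R3 + 1
    ("I3", "R3", {"R5"}),         # R3 := R5 + 1
    ("I4", "R7", {"R3", "R4"}),   # R7 := R3 + R4
]
for dep in find_dependencies(program):
    print(dep)
```

For this sequence the scan reports the true dependency of I2 on I1, the output dependency between I1 and I3, the antidependency between I2 and I3, and the true dependencies of I4 on I3 (R3) and I2 (R4).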

Register Renaming
  • Output and antidependencies occur because register contents may not reflect the correct ordering from the program
  • May result in a pipeline stall
  • Registers allocated dynamically
      • i.e. registers are not specifically named

Register Renaming example
  • R3b := R3a + R5a (I1)
  • R4b := R3b + 1 (I2)
  • R3c := R5a + 1 (I3)
  • R7b := R3c + R4b (I4)
  • Without subscript refers to logical register in instruction
  • With subscript is hardware register allocated
  • Note: R3a, R3b and R3c are distinct hardware registers
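
The renaming in this example can be reproduced with a few lines of code. This is a minimal sketch that only labels register versions with letters, assuming every register starts at version "a"; real hardware allocates from a pool of physical registers, and the function name and encoding here are hypothetical.

```python
from string import ascii_lowercase

def rename(program):
    """program: list of (dest, sources); sources may include constants.

    Every write to a logical register allocates a fresh version, so later
    writes never overwrite a value that earlier instructions still need.
    Supports at most 26 versions per register (letters a to z).
    """
    version = {}  # logical register -> index of its current version letter

    def current(reg):
        return reg + ascii_lowercase[version.setdefault(reg, 0)]

    renamed = []
    for dest, sources in program:
        # Sources read the current version; constants pass through unchanged.
        new_sources = [current(s) if s.startswith("R") else s for s in sources]
        # The destination gets a brand-new version.
        version[dest] = version.get(dest, 0) + 1
        renamed.append((current(dest), new_sources))
    return renamed

program = [
    ("R3", ["R3", "R5"]),   # I1
    ("R4", ["R3", "1"]),    # I2
    ("R3", ["R5", "1"]),    # I3
    ("R7", ["R3", "R4"]),   # I4
]
for dest, sources in rename(program):
    print(f"{dest} := {' + '.join(sources)}")
```

After renaming, I3 writes R3c while I2 still reads R3b, so the output dependency and antidependency on R3 disappear and only the true dependencies remain.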

Machine Parallelism
  • Duplication of resources
  • Out-of-order issue
  • Renaming
  • Not worth duplicating functional units without register renaming
  • Need a large enough instruction window (more than 8)

Branch Prediction
  • 80486 fetches both the next sequential instruction after a branch and the branch target instruction
  • Gives two cycle delay if branch taken

RISC - Delayed Branch
  • Calculate result of branch before unusable instructions are pre-fetched
  • Always execute single instruction immediately following branch
  • Keeps pipeline full while fetching new instruction stream
  • Not as good for superscalar
      • Multiple instructions need to execute in delay slot
      • Instruction dependence problems
  • Revert to branch prediction

Superscalar Execution (Diagram)

Superscalar Implementation
  • Simultaneously fetch multiple instructions
  • Logic to determine true dependencies involving register values
  • Mechanisms to communicate these values
  • Mechanisms to initiate multiple instructions in parallel
  • Resources for parallel execution of multiple instructions
  • Mechanisms for committing process state in correct order
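
The last point, committing state in the correct order, can be illustrated with a toy model. The slides do not name a specific mechanism; the sketch below assumes a reorder-buffer style queue, which is one common approach, and the function name is hypothetical.

```python
from collections import deque

def commit_order(program_order, completion_order):
    """Toy reorder-buffer model.

    Instructions may finish executing in any order, but an instruction is
    retired (its result made architecturally visible) only when every
    earlier instruction in program order has already retired.

    Returns a list of (finished_instruction, retired_at_that_point).
    """
    buffer = deque(program_order)   # entries are held in program order
    done = set()
    log = []
    for finished in completion_order:
        done.add(finished)
        retired = []
        # Retire from the head for as long as the oldest entry is finished.
        while buffer and buffer[0] in done:
            retired.append(buffer.popleft())
        log.append((finished, retired))
    return log

for step in commit_order(["I1", "I2", "I3", "I4"], ["I3", "I1", "I2", "I4"]):
    print(step)
```

In this run I3 finishes first but is held until I1 and I2 have retired, so registers and memory change in program order even though execution did not follow it.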

Required Reading
  • Stallings chapter 13
  • Manufacturers' web sites
  • IMPACT web site
      • research on predicated execution