Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank (New) Competency Area 6: Introduction to Pipelining

Basic Pipelining Concepts: P&H 3rd ed., Chapter 6; H&P 3rd ed., §A.1

Pipelining - The Basic Concept
• In early CPUs, deep combinational logic networks were used in between state updates.
  – Signal delays may vary widely across different paths.
  – New input cannot be provided to the network until the slowest paths have finished.
  – Slow clock speed, slow overall processing rates.
• In pipelined design, deep logic networks are subdivided into relatively shallow slices (pipeline stages).
  – Delays through the network are made uniform.
  – A new input can be provided to each slice as soon as its quick, shallow network has finished.
  – Multiple inputs are processed simultaneously across stages.
  – Clock cycle is only as long as the slowest pipeline stage.
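
The timing argument above can be checked with a little arithmetic. The following is only a sketch: the stage delays are hypothetical numbers, and the function names and the optional latch_overhead parameter are illustrative assumptions, not something taken from the slides.

```python
def unpipelined_time(stage_delays, n_items):
    """Total time when each item must cross the whole network before the next enters."""
    return n_items * sum(stage_delays)


def pipelined_time(stage_delays, n_items, latch_overhead=0.0):
    """Total time when the network is cut into stages.

    The clock period is set by the slowest stage (plus an assumed latch
    overhead); the first item needs one cycle per stage, and every later
    item finishes one cycle after its predecessor.
    """
    cycle = max(stage_delays) + latch_overhead
    n_stages = len(stage_delays)
    return (n_stages + n_items - 1) * cycle


# Hypothetical per-stage delays in nanoseconds (not from the slides).
delays_ns = [2.0, 1.0, 3.0, 2.0, 2.0]

print(unpipelined_time(delays_ns, 100))  # 100 items * 10 ns each
print(pipelined_time(delays_ns, 100))    # (5 + 99) cycles * 3 ns per cycle
```

With unbalanced stages the speedup falls short of the stage count, which is why making delays uniform matters.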

Generic Pipelining Illustration
• Let each gate symbol in the figure represent any of a variety of logic gates.
• Initial, non-pipelined design for some arbitrary block of complex logic:
[Figure: block of logic gates between latches]

Pipelining Illustration cont.
• Aggressively pipelined version of same logic:
  – Insert extra “pipeline registers” periodically
    • Here, after every 1-2 logic layers
  – This design can process 5x as much data at once!
[Figure: the same logic network with pipeline latches inserted]

Another View of Pipelining
• Space-time diagrams:
  – Here, each colored area shows which parts of the logic network are occupied with data computed from a given input item, at which times.
[Figure: space-time diagram; axes: depth in logic network, time; regions: Data 1, Data 2; panel: Non-Pipelined (depth 6)]

Simple Multicycle RISC Datapath
[Figure: datapath divided into IF, ID, EX, MEM, and WB stages; labels: Next PC, Program Counter, Inst. Reg., Load fr. Mem. Data]

Basic RISC Execution Pipeline
• Basic idea of instruction-execution pipelining:
  – Each instruction spends 1 clock cycle in each of the execution stages (in our example, there are 5).
  – During 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions simultaneously!
[Figure: pipeline diagram; axes: time, stage]
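
To make the overlap concrete, here is a minimal sketch that prints an ideal (stall-free) pipeline diagram for the five stages named above. The function names pipeline_diagram and format_diagram are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c (0-based)."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = ["-"] * n_cycles
        # Instruction i enters the pipe in cycle i and advances one stage per cycle.
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def format_diagram(rows):
    """Render the rows as text, with cycle numbers (1-based) across the top."""
    header = "cycle " + " ".join(f"{c + 1:>3}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"i+{i:<3} " + " ".join(f"{cell:>3}" for cell in row))
    return "\n".join(lines)


rows = pipeline_diagram(5)
print(format_diagram(rows))

# Reading down the column for cycle 5 shows all five stages busy at once,
# each working on a different instruction.
print([row[4] for row in rows])
```

Reading along a row follows one instruction through time; reading down a column shows what every stage is doing in one cycle.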

Different Visualizations
[Figure: pipeline diagrams with labels: Same Time, Different Places; Same Instruction, Different Steps; Same Time, Different Data Item / Instruction; Same Place, Different Times; Skew]

More Graphical Detail

Adding Pipeline Registers

Description of Pipe Stages

Dependences (from H&P 3rd ed., §3.1)

Dependences
• A dependence is a way in which one instruction can depend on (be impacted by) another for scheduling purposes.
• Three major dependence types:
  – Data dependence
  – Name dependence
  – Control dependence
• I’ll sometimes use the word dependency for a particular instance of one instruction depending on another.
  – The instructions can’t be effectively (as opposed to just syntactically) fully parallelized, or reordered.

Data Dependence
• Recursive definition:
  – Instruction B is data dependent on instruction A iff:
    • B uses a data result produced by instruction A, or
    • There is another instruction C such that B is data dependent on C, and C is data dependent on A.
• When a data dependence is present, there is a potential RAW hazard.
• Direct data dependencies in a simple example code fragment:
    Loop: LD    F0, 0(R1)
          ADDD  F4, F0, F2
          SD    0(R1), F4
          SUBI  R1, R1, #8
          BNEZ  R1, Loop
[Figure: arrows marking the direct dependences between instructions]
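
The recursive definition can be sketched as a direct-dependence pass followed by a transitive closure. This is an illustration under simplifying assumptions: only register operands within a single loop iteration are tracked, memory is ignored, SUBI is assumed to both read and write R1, and the tuple encoding and function names are invented for the example.

```python
def direct_dependences(program):
    """Pairs (a, b): instruction b reads a register whose latest writer is a."""
    last_writer = {}
    deps = set()
    for j, (_name, dest, srcs) in enumerate(program):
        for reg in srcs:
            if reg in last_writer:
                deps.add((last_writer[reg], j))
        if dest is not None:
            last_writer[dest] = j
    return deps


def data_dependences(program):
    """Transitive closure: b depends on a if b depends on c and c depends on a."""
    closure = set(direct_dependences(program))
    changed = True
    while changed:
        changed = False
        for (a, c) in list(closure):
            for (c2, b) in list(closure):
                if c == c2 and (a, b) not in closure:
                    closure.add((a, b))
                    changed = True
    return closure


# The loop body from the slide, encoded as (mnemonic, dest register, source registers).
loop = [
    ("LD",   "F0", ("R1",)),
    ("ADDD", "F4", ("F0", "F2")),
    ("SD",   None, ("R1", "F4")),   # store: writes memory, no register result
    ("SUBI", "R1", ("R1",)),
    ("BNEZ", None, ("R1",)),
]

print(sorted(direct_dependences(loop)))
print(sorted(data_dependences(loop)))
```

The closure adds the pair (LD, SD), which is not a direct dependence but follows from LD to ADDD to SD.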

Name Dependence
• When two instructions access the same data storage location, but are not data dependent.
  – Also, at least one of the accesses must be a write.
• Two sub-types (for inst. B after inst. A):
  – Antidependence: A reads, then B writes.
    • Potential for a WAR hazard.
  – Output dependence: A writes, then B writes.
    • Potential for a WAW hazard.
• Note: Name dependencies can be avoided by changing instructions to use different locations
  – (Rather than reusing 1 location for 2 purposes.)
  – This fix is called renaming.
[Figure: timelines showing A before B for each sub-type]
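
A small sketch of renaming follows. The four-instruction program is hypothetical, instructions are reduced to (dest, sources) pairs, the fresh names T0, T1, ... are invented, and the pairwise check is deliberately conservative (it reports every earlier/later pair that shares a name).

```python
def name_dependences(program):
    """Return (kind, a, b) for every earlier/later pair that reuses a register name."""
    found = set()
    for b, (b_dest, _b_srcs) in enumerate(program):
        for a in range(b):
            a_dest, a_srcs = program[a]
            if b_dest in a_srcs:
                found.add(("anti", a, b))      # A reads, then B writes
            if b_dest == a_dest:
                found.add(("output", a, b))    # A writes, then B writes
    return found


def rename(program):
    """Give every write a fresh destination; later reads follow the newest name."""
    current = {}
    renamed = []
    for n, (dest, srcs) in enumerate(program):
        new_srcs = tuple(current.get(reg, reg) for reg in srcs)
        new_dest = f"T{n}"
        current[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


# Hypothetical sequence in which R1 is reused for two unrelated values.
program = [
    ("R1", ("R2", "R3")),
    ("R4", ("R1", "R5")),
    ("R1", ("R6", "R7")),
    ("R8", ("R1", "R4")),
]

print(sorted(name_dependences(program)))
print(rename(program))
print(name_dependences(rename(program)))
```

After renaming, only the true data dependences remain, so the third instruction no longer has to wait for the first two.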

Control Dependence
• Occurs when the execution of an instruction (as in, will it be executed, or not?) depends on the outcome of some earlier, conditional branch instruction.
• We generally can’t easily change which branches an instruction depends on w/o ruining the program’s functional behavior.
• However, there are exceptions.

Hazards, Stalls, & Forwarding: H&P 3rd ed., §A.2-3

Hazards
• Hazards are circumstances which may lead to stalls in the pipeline if not addressed.
  – Stalls are delays, and may be called “bubbles”
• There are three major types of hazards:
  – Structural hazards:
    • Not enough HW resources to keep all instrs. moving.
  – Data hazards:
    • Data results of earlier instrs. not yet avail. when needed.
  – Control hazards:
    • Control decisions resulting from earlier instrs. (branches) not yet made; don’t know which new instrs. to execute.

Structural Hazard Example
• Suppose you had a combined instruction+data memory with only 1 read port

Hazards Produce “Bubbles”
[Figure: axes: progress through pipe, time; labels: Bubble rises, Unskew]

Textual View
• A pipeline stalled for a structural hazard: a load with only one memory port
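
A textual view of this kind can be generated with a short sketch. The assumptions here are mine: a 5-stage pipe, a single memory port shared by IF and MEM, MEM winning any conflict, and no other hazards. The function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule_fetches(uses_data_memory):
    """Return the cycle (1-based) in which each instruction is fetched.

    uses_data_memory[k] is True if instruction k is a load or store. A fetch
    is delayed while an earlier instruction's MEM stage holds the one port.
    """
    fetch_cycles = []
    port_busy = set()          # cycles claimed by some MEM stage
    cycle = 1
    mem_offset = STAGES.index("MEM")
    for is_mem_op in uses_data_memory:
        while cycle in port_busy:
            cycle += 1         # structural hazard: bubble enters the pipe
        fetch_cycles.append(cycle)
        if is_mem_op:
            port_busy.add(cycle + mem_offset)
        cycle += 1
    return fetch_cycles


def render(fetch_cycles):
    """Render one text row per instruction, marking stalled fetch slots."""
    last_cycle = fetch_cycles[-1] + len(STAGES) - 1
    lines = []
    for k, start in enumerate(fetch_cycles):
        cells = []
        for c in range(1, last_cycle + 1):
            if start <= c < start + len(STAGES):
                cells.append(STAGES[c - start])
            elif k > 0 and fetch_cycles[k - 1] < c < start:
                cells.append("stall")
            else:
                cells.append("")
        lines.append(f"instr {k}: " + " ".join(f"{cell:>5}" for cell in cells))
    return "\n".join(lines)


# A load followed by four instructions that do not touch data memory.
fetches = schedule_fetches([True, False, False, False, False])
print(fetches)
print(render(fetches))
```

The fourth instruction cannot be fetched in cycle 4 because the load is using the port for its MEM stage, so it and everything behind it slip by one cycle.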

Example Data Hazards

Forwarding for Data Hazards

Another Forwarding Example

Three Types of Data Hazards
• Let i be an earlier instruction, j a later one.
• RAW (read after write)
  – j is supposed to Read a value After i Writes it,
    • But instead j tries to read the value before i has written it.
• WAW (write after write)
  – j should Write to a given place After i Writes there,
    • But they end up writing in the wrong order.
  – Only occurs if >1 pipeline stage can write.
• WAR (write after read)
  – j should Write a new value After i Reads the old,
    • But instead j writes the new value before i has read the old one.
  – Only occurs if writes can happen before reads in pipeline.
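
As a rough sketch, the three cases reduce to set intersections over the registers each instruction reads and writes. The function below only reports which hazards are possible in principle; whether one actually occurs depends on the pipeline, as the bullets note. The name potential_hazards and the example operands are assumptions for illustration.

```python
def potential_hazards(i_reads, i_writes, j_reads, j_writes):
    """Classify hazards between an earlier instruction i and a later instruction j."""
    hazards = set()
    if set(i_writes) & set(j_reads):
        hazards.add("RAW")   # j reads what i writes
    if set(i_writes) & set(j_writes):
        hazards.add("WAW")   # j writes where i writes
    if set(i_reads) & set(j_writes):
        hazards.add("WAR")   # j writes what i reads
    return hazards


# Hypothetical pair: i = ADD R1, R2, R3 ; j = SUB R4, R1, R5
print(potential_hazards({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}))

# Hypothetical pair: i = ADD R1, R2, R3 ; j = SUB R2, R4, R5
print(potential_hazards({"R2", "R3"}, {"R1"}, {"R4", "R5"}, {"R2"}))
```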

An Unavoidable Stall

Stalling in midst of instruction

Data Hazard Prevention
• A clever compiler can often reschedule instructions to avoid a stall.
  – A simple example:
    • Original code:
        lw  r2, 0(r4)
        add r1, r2, r3    (Note: Stall happens here!)
        lw  r5, 4(r4)
    • Transformed code:
        lw  r2, 0(r4)
        lw  r5, 4(r4)
        add r1, r2, r3    (No stall needed!)
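
The effect of the rescheduling can be counted with a small sketch. It assumes a 5-stage pipe with full forwarding, so the only remaining stall is a one-cycle bubble when an instruction uses a load result in the very next slot. The tuple encoding and the name load_use_stalls are invented here.

```python
def load_use_stalls(program):
    """Count one-cycle stalls: a load immediately followed by a user of its result."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _prev_srcs = prev
        _cur_op, _cur_dest, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


# The two orderings from the slide, as (opcode, dest, sources).
original = [
    ("lw",  "r2", ("r4",)),
    ("add", "r1", ("r2", "r3")),
    ("lw",  "r5", ("r4",)),
]
transformed = [
    ("lw",  "r2", ("r4",)),
    ("lw",  "r5", ("r4",)),
    ("add", "r1", ("r2", "r3")),
]

print(load_use_stalls(original))
print(load_use_stalls(transformed))
```

Moving the independent second load into the slot after the first load gives the add its operand one cycle later, which is exactly when forwarding can supply it.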

Simple RISC Pipeline Stall Statistics
• Note that ~1 in 5 loads causes a stall in many programs!
[Figure: bar chart; axes: benchmark, percentage of loads that cause a stall]

Data Hazard Detection

Hazard Detection Logic
• Example: Detecting whether an instruction that has just been fetched needs to be stalled 1 cycle because of an immediately preceding load.
[Figure: pipeline stages IF, ID, EX, ME, WB with pipeline registers IF/ID, ID/EX, EX/ME, ME/WB]
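
One plausible form of that check, written over the two pipeline registers involved, is sketched below. The field names (is_load, dest, src1, src2) and the class names are assumptions made for this example; the slide's actual comparison table is in the figure and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IDEXRegister:
    """Assumed contents of ID/EX: the instruction one step ahead in the pipe."""
    is_load: bool
    dest: Optional[int]      # destination register number, or None


@dataclass
class IFIDRegister:
    """Assumed contents of IF/ID: the instruction that was just fetched."""
    src1: Optional[int]      # source register numbers, or None if unused
    src2: Optional[int]


def must_stall(id_ex: IDEXRegister, if_id: IFIDRegister) -> bool:
    """True if the just-fetched instruction must wait one cycle for a load result."""
    if not id_ex.is_load or id_ex.dest is None:
        return False
    if id_ex.dest == 0:
        return False         # R0 is hardwired to zero in DLX/MIPS, so no real dependence
    return id_ex.dest in (if_id.src1, if_id.src2)


# Load into R2 followed by an instruction reading R2 and R3: stall.
print(must_stall(IDEXRegister(is_load=True, dest=2), IFIDRegister(src1=2, src2=3)))
# Same load followed by an instruction reading R4 and R5: no stall.
print(must_stall(IDEXRegister(is_load=True, dest=2), IFIDRegister(src1=4, src2=5)))
```

A real implementation would also consult the opcode in IF/ID, since not every instruction format actually reads both source fields.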

Forwarding Situations in DLX

Implementing Forwarding in HW

Control Hazards, Branch Prediction, Delayed Branches: H&P 3rd ed., §§A.2-3 & §4.2

Control Hazards
• Suppose the new PC value was not computed until the MEM stage (like orig. RISC design).
• Then we must stall 3 clocks after every branch!
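
The cost of that 3-clock stall is easy to estimate. The sketch below assumes an ideal base CPI of 1 and a hypothetical 15% branch frequency; both numbers and the function name are illustrative choices, not figures from this slide.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    """Average cycles per instruction when every branch costs a fixed stall."""
    return base_cpi + branch_fraction * stall_cycles


# Hypothetical workload: 15% of dynamic instructions are branches.
late = cpi_with_branch_stalls(0.15, 3)    # PC resolved in MEM: 3 stall cycles
early = cpi_with_branch_stalls(0.15, 1)   # PC resolved earlier: 1 stall cycle

print(round(late, 2), round(early, 2))
print(round(late / early, 2))             # slowdown from late branch resolution
```

Under these assumed numbers, resolving branches late costs roughly a quarter of the machine's throughput, which motivates moving branch resolution earlier in the pipe.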

Early Branch Resolution

New Pipeline Logic

Control Instruction Statistics
• ~10% of dynamic insts. are fwd. cond. branches
• only ~3% are backwards cond. branches
• similar percentage are unconditional branches

Stats on Taken Branches
• ~67% of cond. branches are taken

Predict-Not-Taken

Delayed Branches
• Machine code sequence:
  – Branch instruction
  – Delay slot instruction(s)
  – (Branch is taken, if taken, at this point)
  – Post-branch instructions

Filling the Branch-Delay Slot

Static Branch Prediction
• Earlier we discussed predict-taken, predict-not-taken static prediction strategies
  – Applied uniformly across all branches in program
• Static analysis in compiler may be able to do better, if it can non-uniformly predict whether each specific branch is likely to be taken or not
  – One way: Backwards taken, forwards not taken.
• If we can do better, it can help with static code scheduling to reduce data hazard stalls…
  – Also may assist later dynamic prediction
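
The "backwards taken, forwards not taken" rule needs only the branch's own address and its target, as the sketch below shows. The function name and the example addresses are hypothetical.

```python
def predict_backward_taken(branch_pc, target_pc):
    """Predict taken for backward branches (typically loop back-edges), else not taken."""
    return target_pc <= branch_pc


# Hypothetical addresses: a loop-closing branch and a forward skip.
print(predict_backward_taken(branch_pc=0x40, target_pc=0x20))  # backward
print(predict_backward_taken(branch_pc=0x40, target_pc=0x80))  # forward
```

The appeal is that the prediction is a pure function of the instruction encoding, so it needs no run-time state and no profiling run.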

Prediction Helps Static Scheduling
• Some data dependences; code movements to consider: if case, else case.
            LD    R1, 0(R2)       (Potential load delay to fill)
            DSUBU R1, R1, R3
            BEQZ  R1, else        (Which way will this branch go?)
            OR    R4, R5, R6
            DADDU R10, R4, R3
            J     after
    else:   DADDU R7, R8, R9
    after:  …
[Figure: if-then-else control flow]

Some Static Prediction Schemes
• Always predict taken
  – 34% mispredict rate on SPEC (range 9%-54%)
• Backwards predict taken, forwards not taken
  – In SPEC, more than ½ of forwards are taken!
    • This does worse than “always predict taken” strategy
  – Usu. not better than 30-40% misprediction rate
• Better than either: Use profile information!
  – Collect statistics on earlier program runs.
  – Works well because individual branches tend to be strongly biased (taken or not) given average data
    • Bias tends to remain stable across multiple runs
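
Profile-based prediction can be sketched in a few lines: record each branch's majority direction on one run, then use it as a fixed prediction on a later run. The two traces below are synthetic, invented only to show the mechanics; they are not SPEC data, and the function names are illustrative.

```python
from collections import defaultdict


def build_profile(trace):
    """Map each branch address to its majority direction (True = taken)."""
    counts = defaultdict(lambda: [0, 0])   # pc -> [not-taken count, taken count]
    for pc, taken in trace:
        counts[pc][1 if taken else 0] += 1
    return {pc: c[1] >= c[0] for pc, c in counts.items()}


def mispredict_rate(trace, predict):
    """Fraction of branches in the trace for which predict(pc) is wrong."""
    wrong = sum(1 for pc, taken in trace if predict(pc) != taken)
    return wrong / len(trace)


# Synthetic traces of (branch address, taken?) with two strongly biased branches.
training_run = ([(0x10, True)] * 9 + [(0x10, False)]
                + [(0x20, False)] * 8 + [(0x20, True)] * 2)
later_run = ([(0x10, True)] * 18 + [(0x10, False)] * 2
             + [(0x20, False)] * 17 + [(0x20, True)] * 3)

profile = build_profile(training_run)

always_taken = mispredict_rate(later_run, lambda pc: True)
# Branches never seen during profiling fall back to predict-taken.
profiled = mispredict_rate(later_run, lambda pc: profile.get(pc, True))

print(always_taken, profiled)
```

The profiled predictor wins on this toy input for the reason given above: each branch is strongly biased, and the bias carries over from the training run to the later run.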

Profile-Based Predictor Statistics
[Figure: chart of profile-based predictor statistics; label: Floating-Point]

Predict-Taken vs. Profile-Based
[Figure: instructions executed in between mispredictions; label: Floating-point; log scale]