Chapter 6 Pipelining and Superscalar Techniques

Linear pipeline processors • A linear pipeline processor is a cascade of processing stages which are linearly connected to perform a fixed function over a stream of data flowing from one end to the other. • Depending on how data flow is controlled along the pipeline, pipelines fall into two categories: the asynchronous model and the synchronous model.

• Asynchronous model: data flow between adjacent stages is controlled by a handshaking protocol. • When a stage is ready to transmit, it sends a ready signal to the next stage; the next stage receives the data and returns an acknowledge signal. • Different stages may therefore experience different delays.

• Synchronous model: clocked latches are used. Upon arrival of the clock signal, all latches transfer data to the next stage simultaneously. • The utilization pattern of successive stages in a synchronous pipeline is specified by a reservation table.

• Clock cycle: let t be the clock cycle of the pipeline, ti the time delay of stage Si, and d the time delay of a latch. • The slowest stage bounds the clock: t = max{ti} + d. • Pipeline frequency: f = 1/t.

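A minimal Python sketch of this relation; the stage and latch delays below are made-up numbers, not values from the slides:

```python
# Clock cycle of a synchronous pipeline: t = max{ti} + d, f = 1/t.
stage_delays_ns = [10, 12, 8, 11]   # ti for each stage Si (illustrative)
latch_delay_ns = 1                  # d, delay of the interface latch

t = max(stage_delays_ns) + latch_delay_ns   # slowest stage bounds the clock
f_ghz = 1 / t                               # frequency in GHz, since t is in ns

print(f"clock cycle t = {t} ns, frequency f = {f_ghz:.3f} GHz")
```
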
Total time to complete n tasks: Tk = [k + (n-1)]t, where k = number of stages, n = number of tasks, and t = clock cycle (the first task takes k cycles; each of the remaining n-1 tasks completes one cycle later). An equivalent non-pipelined processor takes T1 = nkt. Speed-up factor: Sk = T1/Tk = nkt / [k + (n-1)]t = nk / [k + (n-1)].

• Performance/cost ratio (PCR), given by Larson: PCR = f / (c + kh) = 1 / [(t/k + d)(c + kh)], where t = total flow-through delay of the computation (so the clock period is t/k + d and f = 1/(t/k + d)), c = cost of all logic stages, and h = cost of each latch. • The PCR is maximized at the optimal number of stages k0 = √(t·c / (d·h)).

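A small Python sketch of this trade-off; all delay and cost figures below are illustrative assumptions:

```python
import math

# Larson's performance/cost ratio: PCR = f / (c + k*h), with f = 1/(t/k + d).
t = 100.0   # total flow-through delay of the computation (ns)
d = 1.0     # latch delay per stage (ns)
c = 50.0    # cost of all logic stages (arbitrary cost units)
h = 5.0     # cost of one latch

def pcr(k):
    # clock period is t/k + d, so f = 1/(t/k + d)
    return 1.0 / ((t / k + d) * (c + k * h))

k0 = math.sqrt(t * c / (d * h))     # optimal stage count: k0 = sqrt(t*c/(d*h))
print(f"k0 = {k0:.1f}")             # ~31.6 for these numbers
for k in (8, 16, 32, 64):
    print(f"k = {k:2d}: PCR = {pcr(k):.6f}")   # peaks near k0
```
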
• Efficiency: Ek = Sk/k = n / [k + (n-1)]. • Throughput: the number of tasks performed per unit time: Hk = n / ([k + (n-1)]t) = nf / [k + (n-1)].

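Putting the speed-up, efficiency, and throughput formulas together in a short Python sketch (k, n, and t are made-up values):

```python
# Speed-up, efficiency, and throughput of a k-stage linear pipeline
# processing n tasks with clock cycle t.
k = 5        # number of stages
n = 100      # number of tasks
t = 2.0      # clock cycle (ns)

Tk = (k + (n - 1)) * t    # pipelined time: first task takes k cycles
T1 = n * k * t            # equivalent non-pipelined time
Sk = T1 / Tk              # speed-up  = nk / (k + n - 1), tends to k as n grows
Ek = Sk / k               # efficiency = n / (k + n - 1)
Hk = n / Tk               # throughput = nf / (k + n - 1), in tasks per ns

print(f"Sk = {Sk:.2f}, Ek = {Ek:.2f}, Hk = {Hk:.4f} tasks/ns")
```
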
Dynamic/non-linear pipeline • Linear pipelines are static pipelines. • Dynamic pipelines can be reconfigured to perform variable functions at different times. • A dynamic pipeline allows feedforward and feedback connections in addition to the streamline connections.

• Reservation table: multiple check marks in a row mean repeated usage of the same stage in different cycles. • Latency: the number of time units between two initiations of a pipeline. A latency of k means that the two initiations are separated by k clock cycles. • Any attempt by two or more initiations to use the same pipeline stage at the same time will cause a collision.

• Some latencies cause collisions and some do not. Latencies that cause collisions are called forbidden latencies.

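The forbidden latencies can be read directly off a reservation table: a latency L is forbidden if some stage is marked at two times that differ by L. A minimal Python sketch, using a made-up three-stage table:

```python
# Reservation table: each stage maps to the set of clock cycles it is used in.
# Two initiations separated by latency L collide iff some stage has marks
# at times t1 and t2 with t2 - t1 == L.
reservation_table = {
    "S1": {0, 4},   # illustrative table, not from the slides
    "S2": {1, 3},
    "S3": {2},
}

forbidden = set()
for marks in reservation_table.values():
    for t1 in marks:
        for t2 in marks:
            if t2 > t1:
                forbidden.add(t2 - t1)   # this latency reuses a stage

print("forbidden latencies:", sorted(forbidden))   # here: [2, 4]
```
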
Instruction pipeline design • Instruction execution phases. • Mechanisms for instruction pipelining: three types of buffers can be used to match the instruction fetch rate to the pipeline consumption rate. • Prefetch buffers: sequential buffers and target buffers. • Loop buffer: holds sequential instructions contained in a small loop.

• Multiple functional units: a certain pipeline stage can become a bottleneck. This problem can be alleviated by using multiple copies of the same stage simultaneously.

Mechanisms for Instruction pipelining

Internal data forwarding • The throughput of a pipelined processor can be further improved with internal data forwarding among multiple functional units: some memory-access operations can be replaced by register transfer operations, as sketched after this list. • 1. Store-load forwarding • 2. Load-load forwarding • 3. Store-store forwarding

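A toy Python sketch of the first case, store-load forwarding, as a peephole rewrite over a hypothetical instruction encoding (the tuple format, register names, and address are invented for illustration):

```python
# ("store", reg, addr) writes reg to memory; ("load", reg, addr) reads memory
# into reg; ("move", dst, src) is a register-to-register transfer.
# A load that reads the address just written by a store becomes a move.

def forward_store_load(code):
    out = []
    for ins in code:
        if (ins[0] == "load" and out and out[-1][0] == "store"
                and out[-1][2] == ins[2]):              # same memory address
            out.append(("move", ins[1], out[-1][1]))    # load -> register move
        else:
            out.append(ins)
    return out

code = [("store", "R1", "0x100"), ("load", "R2", "0x100")]
print(forward_store_load(code))
# -> [('store', 'R1', '0x100'), ('move', 'R2', 'R1')]
```
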
Hazard avoidance • The read and write of shared variables by different instructions in a pipeline may lead to different results if these instructions are executed out of order. • 1. RAW hazard (read after write): flow dependence. • 2. WAW hazard (write after write): output dependence. • 3. WAR hazard (write after read): anti-dependence.

• R(i) ∩ D(j) ≠ ∅ : RAW (flow dependence) • R(i) ∩ R(j) ≠ ∅ : WAW (output dependence) • D(i) ∩ R(j) ≠ ∅ : WAR (anti-dependence) • Where D = domain (the input set of an instruction) and R = range (its output set); a minimal check of these conditions is sketched below.

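These set intersections translate directly into code. A minimal Python sketch (the example instructions and register names are illustrative):

```python
# Hazards between instruction i and a later instruction j, given their
# domains D (input sets) and ranges R (output sets).

def hazards(Di, Ri, Dj, Rj):
    found = []
    if Ri & Dj:
        found.append("RAW (flow dependence)")
    if Ri & Rj:
        found.append("WAW (output dependence)")
    if Di & Rj:
        found.append("WAR (anti-dependence)")
    return found

# i: R1 = R2 + R3  ->  D(i) = {R2, R3}, R(i) = {R1}
# j: R4 = R1 * R2  ->  D(j) = {R1, R2}, R(j) = {R4}
print(hazards({"R2", "R3"}, {"R1"}, {"R1", "R2"}, {"R4"}))
# -> ['RAW (flow dependence)']
```
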
Instruction scheduling • Three methods for scheduling instructions through an instruction pipeline: • 1. Static scheduling scheme. • 2. Dynamic scheduling: Tomasulo's register-tagging scheme. • 3. Dynamic scheduling: the scoreboarding scheme.