Pipeline Design Problems Job Sequencing and Collision Prevention
Pipeline Design Problems
Job Sequencing and Collision Prevention for the Design of Static Pipeline
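Job Sequencing and Collision Prevention • Consider the reservation table given below, with the first initiation at t = 0. [Figure: reservation table for function A, stages Sa, Sb, Sc against time units 0 to 5, used cells marked A]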
Job Sequencing and Collision Prevention • Consider the next initiation made at t = 1. [Figure: reservation table for t = 0 to 7, with markings A1 (first initiation) and A2 (second initiation) on stages Sa, Sb, Sc] • The second initiation fits into the reservation table without conflict.
Job Sequencing and Collision Prevention • Now consider the case where the first initiation is made at t = 0 and the second at t = 2. [Figure: reservation table for t = 0 to 7, with markings A1 and A2 on stages Sa, Sb, Sc] • Here the markings A1 and A2 fall in the same stage in the same time unit. This is called a collision, and it must be avoided.
Terminologies
Terminologies • Latency: Time difference between two initiations in units of clock period • Forbidden Latency: Latencies resulting in collision • Forbidden Latency Set: Set of all forbidden latencies
General Method of finding Latency • Considering all initiations: [Figure: reservation table for t = 0 to 10, with markings A1 to A6 on stages Sa, Sb, Sc] • Forbidden latencies are 2 and 5
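Shortcut Method of finding Latency • For each stage, the distance between any two markings in its row is a forbidden latency; the forbidden latency set is the union over all stages • Forbidden Latency Set = {0, 5} ∪ {0, 2} = {0, 2, 5}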
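The shortcut method can be sketched in Python as follows. The reservation table used here is an assumption: the marking positions are not given in the text, so this is just one hypothetical table that yields the forbidden latency set {0, 2, 5}. The names `RESERVATION_TABLE` and `forbidden_latencies` are illustrative only.

```python
from itertools import combinations

# Hypothetical reservation table (assumed for illustration):
# stage name -> time units in which function A occupies that stage.
RESERVATION_TABLE = {
    "Sa": [0, 5],
    "Sb": [1, 3],
    "Sc": [2, 4],
}


def forbidden_latencies(table):
    """Union, over all stages, of the distances between markings in a row."""
    forbidden = {0}  # two initiations in the same time unit always collide
    for columns in table.values():
        for t1, t2 in combinations(sorted(columns), 2):
            forbidden.add(t2 - t1)
    return forbidden


print(sorted(forbidden_latencies(RESERVATION_TABLE)))  # [0, 2, 5]
```

Row Sa contributes {0, 5}, and rows Sb and Sc each contribute {0, 2}, matching the union shown above.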
Terminologies • Initiation Sequence: Sequence of time units at which initiations can be made without causing a collision • Example: {0, 1, 3, 4, …} • Latency Sequence: Sequence of latencies between successive initiations • Example: {1, 2, 1, …} • For a reservation table (RT), the number of valid initiation sequences and latency sequences is infinite
Terminologies • Initiation Rate (IR): – The average number of initiations made per unit time – It is a positive fraction, and the maximum value of IR is 1 • Average Latency (AL): The average of the latencies in a given latency sequence • IR = 1/AL
Terminologies • Latency Cycle: • Among the infinitely many possible latency sequences, the periodic ones are significant, e.g. {2, 3, 4, 2, 3, 4, …} • The subsequence that repeats itself is called the latency cycle, e.g. {2, 3, 4}
Terminologies • Period of cycle: The sum of the latencies in a latency cycle (2 + 3 + 4 = 9) • Average Latency: The average taken over the latency cycle (AL = 9/3 = 3) • To design a pipeline, we need a control strategy that maximizes throughput (number of results per unit time) • Maximizing throughput is equivalent to minimizing AL
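As a small illustration of these definitions, the period, average latency and initiation rate of a latency cycle can be computed directly. This is a minimal sketch; the function name `cycle_metrics` is hypothetical.

```python
from fractions import Fraction


def cycle_metrics(latency_cycle):
    """Return (period, average latency, initiation rate) of a latency cycle."""
    period = sum(latency_cycle)
    average_latency = Fraction(period, len(latency_cycle))
    initiation_rate = 1 / average_latency  # IR = 1/AL
    return period, average_latency, initiation_rate


period, al, ir = cycle_metrics((2, 3, 4))
print(period, al, ir)  # 9 3 1/3
```

Exact fractions are used so that cycles with non-integer average latency can be compared without rounding.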
Terminologies • Control Strategy – Initiate the pipeline as specified by a latency sequence. – A control strategy for a latency sequence that is aperiodic in nature is impossible to design. • Thus the design problem is to arrive at a latency cycle having minimal average latency.
Terminologies • Stage Utilization Factor (SUF): • The SUF of a particular stage is the fraction of time units in which the stage is used while following a latency sequence. • Example: Consider 5 initiations of function A as below. [Figure: reservation table for t = 0 to 13, with markings A1 to A5 on stages Sa, Sb, Sc]
Terminologies • The SUF of stage Sa is the number of markings present along Sa divided by the time interval over which the markings are counted. • SUF(Sa) = SUF(Sb) = SUF(Sc) = 10/14
Terminologies • Let SU(i) be the stage utilization factor of stage i • Let N(i) be the number of markings against stage i in the reservation table • Suppose we initiate the pipeline with initiation rate IR; then SU(i) = IR × N(i)
Terminologies • Minimum Average Latency (MAL) • SU(i) = IR × N(i) and SU(i) ≤ 1 • So IR × N(i) ≤ 1, hence N(i) ≤ 1/IR, that is, N(i) ≤ AL for every stage i • Therefore MAL ≥ max over all stages i of N(i)
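The SUF figure of 10/14 and the lower bound on MAL can be checked with a short sketch. Both the reservation table and the initiation times below are assumptions (neither is given explicitly in the text); they are chosen so that five initiations span time units 0 to 13. The names `TABLE`, `INITIATIONS`, `stage_utilization` and `mal_lower_bound` are hypothetical.

```python
from fractions import Fraction

# Hypothetical reservation table: stage -> time units used by function A.
TABLE = {"Sa": [0, 5], "Sb": [1, 3], "Sc": [2, 4]}
# Assumed initiation times of A1..A5 (latencies 1, 3, 3, 1).
INITIATIONS = [0, 1, 4, 7, 8]


def stage_utilization(table, initiations):
    """SUF per stage over the window from t = 0 to the last completion."""
    table_length = 1 + max(max(cols) for cols in table.values())
    window = max(initiations) + table_length
    suf = {}
    for stage, cols in table.items():
        # Set of time units in which this stage is busy (collision-free case).
        busy = {start + col for start in initiations for col in cols}
        suf[stage] = Fraction(len(busy), window)
    return suf


def mal_lower_bound(table):
    """MAL is at least the largest number of markings in any row."""
    return max(len(cols) for cols in table.values())


print(stage_utilization(TABLE, INITIATIONS))
print(mal_lower_bound(TABLE))  # 2
```

Under these assumptions every stage is busy in 10 of 14 time units, and no latency cycle can have an average latency below 2.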
State Diagram • Suppose the pipeline is initially empty and an initiation is made at t = 0. • We now need to check whether another initiation is possible at t = i for i > 0. • The bit bi records whether an initiation is possible: • bi = 1: initiation not possible • bi = 0: initiation possible
State Diagram • i: 0 1 2 3 4 5 • bi: 1 0 1 0 0 1
State Diagram • The above binary representation (binary vector) is called the collision vector (CV) • The collision vector obtained at the first initiation is called the initial collision vector (ICV): ICV_A = (101001) • The state diagram is the graphical representation of the states (CVs) that a pipeline can reach and the transitions between them
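The ICV can be derived mechanically from the forbidden latency set. The sketch below follows the convention used here, in which bit b0 is the leftmost bit and latency 0 is included; the function name `collision_vector` is hypothetical.

```python
def collision_vector(forbidden, length):
    """Bit b_i (leftmost bit is b_0) is 1 when latency i is forbidden."""
    return "".join("1" if i in forbidden else "0" for i in range(length))


print(collision_vector({0, 2, 5}, 6))  # 101001
```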
State Diagram • States (CVs) are denoted by nodes • The node representing CVt-1 is connected to the node for CVt by a directed arc from CVt-1 to CVt, and similarly to the node for CVt*, with a * on the arc
Procedure to draw state diagram 1. Start with ICV 2. For each unprocessed state, say CVt-1, do as follows: a) Find CVt from CVt-1 by the following steps 1. Left shift CVt-1 by 1 bit 2. Drop the leftmost bit 3. Append the bit 0 at the right-hand end
Procedure to draw state diagram b) If the 0th (leftmost) bit of CVt is 0, obtain CV* by logically ORing CVt with the ICV. c) If the state CVt does not already exist, make a new node for it; join CVt-1 to CVt with an arc. d) If CV* was obtained, repeat step (c) for CV*, but mark the arc with a *.
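The procedure above can be sketched as a breadth-first construction. This is an illustrative implementation, not a definitive one: collision vectors are held as bit strings with b0 leftmost, and the function name `build_state_diagram` and the `shift` / `star` keys are hypothetical.

```python
from collections import deque


def build_state_diagram(icv):
    """Map each state to its plain successor and its * successor (or None)."""
    n = len(icv)
    icv_bits = int(icv, 2)
    mask = (1 << n) - 1

    def to_string(value):
        return format(value, f"0{n}b")

    diagram = {}
    queue = deque([icv])
    while queue:
        state = queue.popleft()
        if state in diagram:
            continue  # already processed
        # Left shift by one bit, drop the leftmost bit, append 0 on the right.
        shifted = (int(state, 2) << 1) & mask
        cv_t = to_string(shifted)
        star = None
        if cv_t[0] == "0":  # bit 0 is 0: an initiation is possible now
            star = to_string(shifted | icv_bits)
        diagram[state] = {"shift": cv_t, "star": star}
        queue.append(cv_t)
        if star is not None:
            queue.append(star)
    return diagram


diagram = build_state_diagram("101001")
print(len(diagram))       # 14
print(diagram["101001"])  # {'shift': '010010', 'star': '111011'}
```

For ICV = 101001 this yields 14 states, including the all-zeros state whose plain arc is a self-loop.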
State Diagram • Construction for ICV = 101001, step by step:
• 101001: left shift gives 010010; bit 0 is zero, so CV* = 010010 OR 101001 = 111011
• 010010: left shift gives 100100; no CV*
• 111011: left shift gives 110110; no CV*
• 100100: left shift gives 001000; CV* = 001000 OR 101001 = 101001
• 110110: left shift gives 101100; no CV*
• 001000: left shift gives 010000; CV* = 010000 OR 101001 = 111001
• 101100: left shift gives 011000; CV* = 011000 OR 101001 = 111001
• 010000: left shift gives 100000; no CV*
• 111001: left shift gives 110010; no CV*
• 011000: left shift gives 110000; no CV*
• 110010: left shift gives 100100; no CV*
• 110000: left shift gives 100000; no CV*
• 100000: left shift gives 000000; CV* = 000000 OR 101001 = 101001
• 000000: left shift gives 000000 (self-loop); CV* = 101001
State Diagram • In the above diagram, closed loops can be identified as latency cycles. • To find the latencies of a loop, start just after any * arc and count the arcs traversed up to and including the next * arc; each such count is one latency of the cycle. Continue until the loop returns to the starting point.
State Diagram • Latency cycles identified from the diagram: (3), (1, 3, 3), (4, 3), (1, 6), (1, 7), (4), (6), (7)
State Diagram • The all-zeros state has a self-loop, which corresponds to an empty pipeline; it is possible to wait there indefinitely, giving latency cycles of the form (1, 8), (1, 9), (1, 10), etc. • Simple Cycle: a latency cycle in which each state is encountered only once. • Complex Cycle: a cycle that consists of more than one simple cycle. • It is enough to look for simple cycles.
State Diagram • In the above example, the cycle that offers the MAL is (1, 3, 3), with AL = 7/3 • A cycle arrived at by always choosing the smallest permissible latency from each state is called a greedy cycle; it minimizes the latency between successive initiations
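A greedy cycle can be traced by starting at the ICV and always taking the smallest permissible latency until a state repeats. The sketch below assumes the bit-string convention used here (b0 leftmost); the function name `greedy_cycle` is hypothetical. Note that a greedy cycle is not guaranteed to achieve the MAL in general, although it does in this example.

```python
from fractions import Fraction


def greedy_cycle(icv):
    """Follow minimum-latency initiations from the ICV until a state repeats."""
    n = len(icv)
    icv_bits = int(icv, 2)
    mask = (1 << n) - 1
    state = icv_bits
    entered_at = {}  # state -> index into latencies at which it was entered
    latencies = []
    while state not in entered_at:
        entered_at[state] = len(latencies)
        bits = format(state, f"0{n}b")
        # Smallest i >= 1 with b_i = 0; if none, wait for the pipeline to drain.
        latency = next((i for i in range(1, n) if bits[i] == "0"), n)
        latencies.append(latency)
        state = ((state << latency) & mask) | icv_bits
    return tuple(latencies[entered_at[state]:])


cycle = greedy_cycle("101001")
print(cycle)                             # (1, 3, 3)
print(Fraction(sum(cycle), len(cycle)))  # 7/3
```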
Modified State Diagram • The state diagram becomes cumbersome for longer ICVs. • In the modified state diagram, we represent only the states reached immediately after an initiation.
Modified State Diagram • The procedure is as follows: 1. Start with the ICV 2. For each unprocessed state CV, and for each bit position i of that CV whose bit is 0, do the following: a. Shift the CV left by i bits b. Drop the i leftmost bits
Modified State Diagram c. Append i zeros on the right d. Logically OR the result with the ICV e. If step (d) results in a new state, form a new node for it; in either case, join the node of the current CV to the resulting state by an arc marked i. f. Join every node to the node of the ICV with an arc marked ≥ d, where d is the length of the ICV
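The modified state diagram procedure can be sketched as below. This is a minimal illustration under the same bit-string convention (b0 leftmost); the function name `build_modified_state_diagram` and the `">=d"` style arc label are hypothetical choices.

```python
def build_modified_state_diagram(icv):
    """Map each state to {latency label: next state} for the modified diagram."""
    n = len(icv)
    icv_bits = int(icv, 2)
    mask = (1 << n) - 1
    arcs = {}
    pending = [icv]
    while pending:
        state = pending.pop()
        if state in arcs:
            continue  # already processed
        arcs[state] = {}
        for i in range(1, n):
            if state[i] == "0":  # latency i is permitted from this state
                shifted = (int(state, 2) << i) & mask
                nxt = format(shifted | icv_bits, f"0{n}b")
                arcs[state][i] = nxt
                pending.append(nxt)
        # Any latency of at least the ICV length leads back to the ICV.
        arcs[state][f">={n}"] = icv
    return arcs


for state, out in build_modified_state_diagram("101001").items():
    print(state, out)
```

For ICV = 101001 the diagram has three states: 101001, 111011 and 111001.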
Modified State Diagram • Construction for ICV = 101001 (d = 6):
• From 101001: i = 1 gives 010010 OR 101001 = 111011; i = 3 gives 001000 OR 101001 = 101001 (self-loop); i = 4 gives 010000 OR 101001 = 111001
• From 111011: i = 3 gives 011000 OR 101001 = 111001
• From 111001: i = 3 gives 001000 OR 101001 = 101001; i = 4 gives 010000 OR 101001 = 111001 (self-loop)
• Every state also has an arc marked ≥ 6 leading to 101001
Dynamic Pipeline and Reconfigurability • Two methods to improve throughput of dynamic pipeline: – Insertion of non-compute delays – Use of Internal Buffers
End