CS 704 Advanced Computer Architecture
Lecture 12
Instruction Level Parallelism (Introduction to multi-cycle pipelined datapath)
Prof. Dr. M. Ashraf Chughtai
Today’s Topics
- Recap: Pipelining Basics
- Longer Pipelines – FP Instructions
- Loop Level Parallelism
- FP Loop Hazards
- Summary
MAC/VU-Advanced Computer Architecture Lecture 12 –Instruction Level Parallelism (1) 2
Recap: Pipelined Datapath and Control
In the previous lecture we reviewed the pipelined datapath to understand the basics of ILP: overlapping the execution of instructions to enhance performance.
- Key components of the pipelined datapath
- Performance enhancement due to pipelining: pipelining improves instruction bandwidth (throughput) but not the latency of an individual instruction
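A minimal sketch of the bandwidth-versus-latency point, assuming an ideal k-stage pipeline (one cycle per stage, no hazards, no pipeline-register overhead) compared with an unpipelined machine that spends k cycles on each instruction. The function names are hypothetical.

```python
def unpipelined_cycles(n: int, k: int) -> int:
    """Total cycles when each of n instructions uses all k cycles before the next starts."""
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    """Total cycles on an ideal k-stage pipeline: k cycles to fill, then one completion per cycle."""
    return k + (n - 1)


def instruction_latency(k: int) -> int:
    """Cycles from fetch to write-back for a single instruction, pipelined or not."""
    return k


if __name__ == "__main__":
    n, k = 100, 5
    print(unpipelined_cycles(n, k), pipelined_cycles(n, k), instruction_latency(k))
    # prints: 500 104 5
```

Total time drops from 500 to 104 cycles, yet every instruction still takes 5 cycles from start to finish.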
Recap: Pipeline Hazards
- Structural hazards: two instructions need the same hardware resource in the same clock cycle
Recap: Pipeline Hazards (cont'd)
- Data hazards: an instruction depends on the result of an earlier instruction that is still in the pipeline
Recap: Three Generic Data Hazards
Read After Write (RAW), a true dependence: instruction j tries to read an operand before instruction i writes it.
i: add r1, r2, r3
j: sub r4, r1, r3
Recap: Three Generic Data Hazards (cont'd)
Write After Read (WAR), an anti-dependence: instruction j tries to write an operand before instruction i reads it.
i: sub r4, r1, r3
j: add r1, r2, r3
Also called a name dependence; it can be removed by register renaming.
Recap: Three Generic Data Hazards (cont'd)
Write After Write (WAW), an output dependence: instruction j tries to write an operand before instruction i writes it.
i: sub r1, r4, r3
j: add r1, r2, r3
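The three cases can be told apart purely from register names. A minimal sketch, assuming each instruction is reduced to one destination register and a list of source registers; the `Instr` type and `dependences` function are hypothetical. Whether a dependence actually turns into a hazard still depends on the pipeline's timing.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    dest: str          # register written
    srcs: List[str]    # registers read


def dependences(i: Instr, j: Instr) -> List[str]:
    """Classify the dependences of j on an earlier instruction i (i precedes j in program order)."""
    found = []
    if i.dest in j.srcs:
        found.append("RAW")   # true dependence: j reads what i writes
    if j.dest in i.srcs:
        found.append("WAR")   # anti-dependence: j writes what i reads
    if i.dest == j.dest:
        found.append("WAW")   # output dependence: both write the same register
    return found


# The three examples from the recap
print(dependences(Instr("r1", ["r2", "r3"]), Instr("r4", ["r1", "r3"])))  # ['RAW']
print(dependences(Instr("r4", ["r1", "r3"]), Instr("r1", ["r2", "r3"])))  # ['WAR']
print(dependences(Instr("r1", ["r4", "r3"]), Instr("r1", ["r2", "r3"])))  # ['WAW']
```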
Recap: Pipeline Hazards (cont'd)
- Control hazards: caused by branches and other instructions that change the PC
How to overcome hazards? The simplest approach is to stall the pipeline.
Recap: How to Remove Hazards?
- Structural hazards: multiple functional units
- Data hazards: forwarding or bypassing
- Control hazards: branch prediction, delayed branch
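A sketch of how much forwarding saves, assuming the classic five-stage MIPS pipeline (IF, ID, EX, MEM, WB) whose register file is written in the first half of a cycle and read in the second half. `raw_stalls_5stage` is a hypothetical helper. Even with forwarding, a load followed immediately by a use of its result still stalls for one cycle.

```python
def raw_stalls_5stage(producer_is_load: bool, distance: int, forwarding: bool) -> int:
    """Stall cycles for a RAW-dependent consumer in the classic IF-ID-EX-MEM-WB pipeline.

    distance: how many instructions after the producer the consumer comes
    (1 = the very next instruction).
    """
    if forwarding:
        # Consumer needs the value at the start of its EX stage: an ALU result is
        # forwarded from the end of EX, load data only from the end of MEM.
        needed = 2 if producer_is_load else 1
    else:
        # Consumer reads registers in ID; the producer writes them in WB
        # (write in the first half of the cycle, read in the second half).
        needed = 3
    return max(0, needed - distance)


print(raw_stalls_5stage(False, 1, forwarding=False))  # 2
print(raw_stalls_5stage(False, 1, forwarding=True))   # 0
print(raw_stalls_5stage(True, 1, forwarding=True))    # 1 (load-use)
```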
Instruction Level Parallelism
Performance can be improved by increasing:
- the clock speed
- the number of instructions that can execute in parallel, i.e., increasing ILP
How to Achieve Instruction Level Parallelism?
A superscalar processor can:
- Pre-fetch and decode instructions
- Start several branch instruction streams
- Finally, discard all but the correct stream
Superscalar Design
MIPS Longer Pipelines – FP Instructions
MIPS Longer Pipelines – FP Instructions (cont'd)
For example, adding two FP numbers takes a minimum of four steps, performed in the following sequence:
Flow Diagram of the MIPS FP Adder
[Figure: flow diagram of FP addition; see p. 284 of the textbook]
Steps for FP Addition
Step 1: Compare the exponents of the two numbers; shift the smaller number's significand to the right until its exponent matches the larger exponent.
Step 2: Add the significands.
Step 3: Normalize the sum, either by shifting right and incrementing the exponent or by shifting left and decrementing the exponent.
Step 4: If there is no overflow or underflow, round the significand to the appropriate number of bits. Stop if no further normalization is required; otherwise, go to Step 3.
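A minimal sketch of the four steps on a toy format, not IEEE 754: positive numbers only, an 8-bit significand `sig` with value `sig * 2**exp`, two guard bits kept during the addition, and round-half-up with no sticky bit and no overflow or underflow checks. `fp_add`, `P` and `G` are hypothetical names.

```python
P = 8  # toy format (assumption): P-bit significand, value = sig * 2**exp
G = 2  # guard bits kept below the significand while adding (assumption)


def fp_add(a, b):
    """Add two positive, normalized toy FP numbers given as (significand, exponent)."""
    (sa, ea), (sb, eb) = a, b
    if ea < eb:  # let a be the operand with the larger exponent
        (sa, ea), (sb, eb) = (sb, eb), (sa, ea)
    # Step 1: shift the smaller number's significand right until the exponents match
    sa <<= G
    sb = (sb << G) >> (ea - eb)  # bits shifted past the guard bits are lost
    exp = ea                     # the running value is s * 2**(exp - G)
    # Step 2: add the significands
    s = sa + sb
    while True:
        # Step 3: normalize into [2**(P+G-1), 2**(P+G))
        while s >= 1 << (P + G):
            s >>= 1
            exp += 1
        while 0 < s < 1 << (P + G - 1):  # left shifts: never taken for positive operands
            s <<= 1
            exp -= 1
        # Step 4: round off the guard bits (round half up, not IEEE round-to-even)
        sig = (s + (1 << (G - 1))) >> G
        if sig < 1 << P:  # still normalized: stop
            return sig, exp
        s = sig << G      # rounding carried out of the significand: back to Step 3


print(fp_add((255, 0), (128, -8)))  # 255 + 0.5 rounds up to 256, i.e. (128, 1)
```

The example exercises the loop back to Step 3: rounding 255.5 produces a 9-bit significand, which must be renormalized.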
MIPS Longer Pipelines (cont'd)
- The latency of a functional unit is the number of intervening cycles between an instruction that produces a result and an instruction that uses that result.
- The initiation (or repeat) interval is the number of cycles that must elapse between issuing two operations of the same type.
MIPS Longer Pipelines (cont'd)
Latency and initiation (repeat) interval of each functional unit:
- Integer ALU: latency 0, initiation interval 1
- Data memory (integer/FP load): latency 1, initiation interval 1
- FP add: latency 3, initiation interval 1
- FP/integer multiply: latency 6, initiation interval 1
- FP/integer divide: latency 24, initiation interval 25
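A small sketch of how the two numbers are used, assuming the latencies listed for this pipeline and an initiation interval of 1 for the pipelined adder and multiplier. It models a RAW stall as the latency minus the number of independent instructions placed between producer and consumer, and ignores structural hazards and the fact that a store needs its data later than an ALU operation does. The dictionary and function names are hypothetical.

```python
LATENCY = {
    "integer ALU": 0,
    "data memory (load)": 1,
    "FP add": 3,
    "FP/integer multiply": 6,
    "FP/integer divide": 24,
}
INITIATION_INTERVAL = {
    "integer ALU": 1,
    "data memory (load)": 1,
    "FP add": 1,
    "FP/integer multiply": 1,
    "FP/integer divide": 25,
}


def raw_stalls(producer: str, independent_between: int = 0) -> int:
    """Stalls seen by a dependent instruction when `independent_between` unrelated
    instructions sit between it and its producer (forwarding assumed)."""
    return max(0, LATENCY[producer] - independent_between)


def next_issue_same_unit(unit: str, issue_cycle: int) -> int:
    """Earliest cycle another operation of the same type may issue."""
    return issue_cycle + INITIATION_INTERVAL[unit]


print(raw_stalls("FP/integer multiply"))              # 6
print(raw_stalls("FP add", independent_between=2))    # 1
print(next_issue_same_unit("FP/integer divide", 10))  # 35
```

Latency limits dependent instructions; the initiation interval limits back-to-back independent operations on the same unit.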
Typical MIPS FP Pipeline
Let us consider a typical MIPS FP pipeline with three unpipelined FP functional units.
[Insert Fig. A.29 (page A-48); explained on the next slide]
Typical MIPS FP Pipeline
MIPS FP Pipeline with Pipelined FUs
The previous FP pipeline can be extended by adding pipeline stages within the functional units.
[Insert Fig. A.31 (page A-50); explained on the next slide]
Working of the Extended FP Pipeline
Working of the Extended FP Pipeline (cont'd)
Note that additional pipeline registers have been inserted between the intervening stages (e.g., A1/A2, A2/A3, …). Furthermore, the ID/EX register must be expanded to connect ID to the A1, M1, EX, and DIV functional units. Here, the FP divider is not pipelined; it requires 24 clock cycles to complete.
FP Pipeline Timing: Example
Hazards in Longer-Latency Pipelines
- Not all functional units are fully pipelined, so structural hazards may occur.
- Instructions have varying running times, so more than one register write may be required in the same clock cycle.
- Instructions no longer reach the WB stage in order, so WAW data hazards may occur.
- WAR hazards are not possible, since registers are always read in the ID stage.
- Stalls for RAW data hazards may be more frequent because of the longer latency of operations.
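FP Pipeline Hazards - RAW
Code sequence (each instruction uses the result of the one before it):
L.D F4, 0(R2)
MUL.D F0, F4, F6
ADD.D F2, F0, F8
S.D F2, 0(R2)
[Timing chart for clock cycles 1 to 6 showing IF, ID, EX, MEM, WB, M1 to M3 and stall (st) cycles; the cell layout is not recoverable from the source]
MUL.D waits for F4 from L.D, ADD.D waits for F0 from the long-latency MUL.D, and S.D waits for F2 from ADD.D, so the stalls accumulate down the chain.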
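FP Pipeline Structural Hazard
Cycle         1    2    3    4    5    6    7    8    9    10   11
FP multiply   IF   ID   M1   M2   M3   M4   M5   M6   M7   MEM  WB
Integer unit       IF   ID   EX   MEM  WB
Integer unit            IF   ID   EX   MEM  WB
FP add                       IF   ID   A1   A2   A3   A4   MEM  WB
Integer unit                      IF   ID   EX   MEM  WB
Integer unit                           IF   ID   EX   MEM  WB
Integer unit                                IF   ID   EX   MEM  WB
In clock cycle 11, three instructions reach WB together and compete for the register-file write port, a structural hazard (they also reach MEM together in cycle 10).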
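The timing in this example can be reproduced with a small model, assuming every instruction is independent, one instruction issues per cycle with no stalls, every instruction writes a register, and the stage counts of the extended pipeline (EX = 1 cycle, A1 to A4, M1 to M7; the divider is left out). The helper names are hypothetical. The model also shows out-of-order completion: instructions issued after the multiply finish before it.

```python
from collections import defaultdict

# Execute-stage cycles per unit in the extended pipeline (EX, A1-A4, M1-M7)
EXEC_CYCLES = {"integer": 1, "load": 1, "FP add": 4, "FP multiply": 7}


def wb_cycles(units):
    """WB cycle of each instruction, assuming one issue per cycle and no stalls.

    Instruction k (0-based) is fetched in cycle k+1, decoded in k+2, executes for
    EXEC_CYCLES cycles, then spends one cycle in MEM and one in WB.
    """
    return [k + 4 + EXEC_CYCLES[u] for k, u in enumerate(units)]


def wb_conflicts(units):
    """Map each cycle with more than one register write to the instructions writing then."""
    by_cycle = defaultdict(list)
    for k, cycle in enumerate(wb_cycles(units)):
        by_cycle[cycle].append(k)
    return {c: ks for c, ks in by_cycle.items() if len(ks) > 1}


seq = ["FP multiply", "integer", "integer", "FP add", "integer", "integer", "load"]
print(wb_cycles(seq))     # [11, 6, 7, 11, 9, 10, 11]
print(wb_conflicts(seq))  # {11: [0, 3, 6]}
```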
Conclusion about the FP Pipeline
Before an instruction issues from ID:
1. Check for structural hazards: wait until the required functional unit is available.
2. Check for RAW data hazards: wait until the source registers are no longer listed as pending destinations whose results will not be available when this instruction needs them.
3. Check for WAW data hazards: determine whether any instruction in A1, A2, …, D, M1, M2, … has the same destination register as this instruction.
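The three checks can be sketched as a single function, assuming the ID stage knows which functional units are busy and, for each instruction already in flight, its destination register and how many cycles remain until its result can be forwarded. The `InFlight` record and `issue_hazards` are hypothetical, the RAW test is simplified to "result not yet available", and the register write-port check is left out.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class InFlight:
    """An instruction already issued into A1..A4, M1..M7, DIV or EX."""
    dest: str       # destination register it will write
    ready_in: int   # cycles until its result can be forwarded (assumed to be tracked)


def issue_hazards(unit: str, dest: str, srcs: List[str],
                  busy_units: Set[str], in_flight: List[InFlight]) -> List[str]:
    """Return the hazards that keep an instruction in ID; an empty list means it may issue."""
    hazards = []
    # 1. Structural: the required functional unit must not be busy
    if unit in busy_units:
        hazards.append("structural")
    # 2. RAW: a source register is a pending destination whose value is not yet available
    if any(f.dest in srcs and f.ready_in > 0 for f in in_flight):
        hazards.append("RAW")
    # 3. WAW: an instruction in flight has the same destination register
    if any(f.dest == dest for f in in_flight):
        hazards.append("WAW")
    return hazards


pending = [InFlight(dest="F0", ready_in=5)]                         # e.g. a MUL.D writing F0
print(issue_hazards("FP add", "F2", ["F0", "F8"], set(), pending))  # ['RAW']
print(issue_hazards("load", "F0", ["R2"], set(), pending))          # ['WAW']
print(issue_hazards("FP divide", "F4", ["F6", "F8"], {"FP divide"}, pending))  # ['structural']
```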
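Precise Exceptions: Out-of-Order Completion!
In the program:
DIV.D F0, F2, F4
ADD.D F10, F8
SUB.D F12, F14
DIV.D takes far longer than ADD.D and SUB.D, so the later instructions can complete before it; if DIV.D then raises an exception, instructions that follow it in program order have already written their results.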
Overcoming Data Hazards by Scheduling
- Static scheduling: compiler-based
- Dynamic scheduling: hardware-based
Statically scheduled pipeline: instructions are issued in program order; when a data dependence cannot be hidden by forwarding, the pipeline stalls, so the compiler tries to reorder instructions to avoid such stalls.
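A minimal sketch of what static scheduling buys, assuming an in-order, single-issue pipeline in which a dependent instruction must wait for the producer's latency (0 integer, 1 load, 3 FP add, 6 FP multiply) and ignoring structural hazards, WAW checks and the special timing of stores. The code sequence uses hypothetical registers F8, F10 and F12; moving the independent FP add between the load and its use hides the load-use stall.

```python
# Latencies from the MIPS FP table (stall cycles for an immediately dependent instruction)
LATENCY = {"integer": 0, "load": 1, "FP add": 3, "FP multiply": 6}


def total_stalls(program):
    """Count RAW stall cycles for an in-order, single-issue pipeline.

    program: list of (unit, dest, srcs) in issue order; dest may be None (e.g. a store).
    """
    ready = {}   # register -> earliest cycle a consumer of it may issue
    cycle = 0    # issue cycle of the previous instruction
    stalls = 0
    for unit, dest, srcs in program:
        earliest = max([cycle + 1] + [ready.get(r, 0) for r in srcs])
        stalls += earliest - (cycle + 1)
        cycle = earliest
        if dest is not None:
            ready[dest] = cycle + 1 + LATENCY[unit]
    return stalls


# Hypothetical code: the FP add depends on neither the load nor the multiply
before = [("load", "F4", ["R2"]), ("FP multiply", "F0", ["F4", "F6"]), ("FP add", "F10", ["F8", "F12"])]
after = [("load", "F4", ["R2"]), ("FP add", "F10", ["F8", "F12"]), ("FP multiply", "F0", ["F4", "F6"])]
print(total_stalls(before), total_stalls(after))  # 1 0
```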
Dynamic Scheduling: Overcoming Data Hazards
Dynamically scheduled pipeline: the hardware rearranges the order of instruction execution to reduce stalls.
Advantages:
- It handles cases where dependences are unknown at compile time
- It allows code compiled for one pipeline to run efficiently on a different pipeline
Concept of Dynamic Scheduling (cont'd)
In the program:
DIV.D F0, F2, F4
ADD.D F10, F8
SUB.D F12, F8, F14
SUB.D does not use F0, the result of the long-running DIV.D, so a dynamically scheduled pipeline can execute it without waiting for the instructions ahead of it.
Problems of Out-of-Order Execution: WAR and WAW
In the program:
DIV.D F0, F2, F4
ADD.D F6, F0, F8
SUB.D F8, F10, F14
MUL.D F6, F10, F8
- WAR: if SUB.D executes before ADD.D (which is waiting for DIV.D), it may overwrite F8 before ADD.D reads it.
- WAW: if MUL.D completes before ADD.D, the later write by ADD.D leaves the wrong value in F6.
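Both name dependences in this program (on F8 between ADD.D and SUB.D, and on F6 between ADD.D and MUL.D) disappear if every destination gets a fresh register. A minimal sketch, assuming an unlimited supply of fresh registers named T1, T2, and so on (hypothetical names); real hardware renames with a finite pool, for example reservation stations or a physical register file.

```python
def rename(program):
    """Give every destination a fresh name (T1, T2, ...) and rewrite later sources to match.

    program: list of (opcode, dest, src1, src2) in program order.
    """
    current = {}   # architectural register -> name holding its latest value
    renamed = []
    for n, (op, dest, *srcs) in enumerate(program, start=1):
        new_srcs = [current.get(s, s) for s in srcs]  # read sources before redefining dest
        new_dest = f"T{n}"                            # hypothetical fresh register
        current[dest] = new_dest
        renamed.append((op, new_dest, *new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]
for instr in rename(program):
    print(instr)
```

After renaming, SUB.D writes T3 instead of F8, so ADD.D can still read the old F8 whenever it runs, and ADD.D and MUL.D write different registers (T2 and T4); only the true dependences (ADD.D on DIV.D, MUL.D on SUB.D) remain.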
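Exceptions Due to Out-of-Order Execution
When an instruction raises an exception, the pipeline may hold:
- Already completed instructions, including some that come later in program order than the faulting instruction
- Not yet completed instructions, including some that come earlier in program order
In either case, the exception is imprecise.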
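The condition for a precise exception can be written as a one-line check. A minimal sketch, assuming instructions are identified by their program-order index and that we know which ones have already updated processor state; `is_precise` is a hypothetical helper. The usage mirrors the DIV.D, ADD.D, SUB.D program on the out-of-order completion slide.

```python
def is_precise(completed, faulting):
    """Check whether processor state is precise for an exception raised by `faulting`.

    completed: program-order indices of instructions that have already updated state.
    Precise means every earlier instruction has completed and neither the faulting
    instruction nor any later one has.
    """
    earlier_done = all(k in completed for k in range(faulting))
    nothing_later = all(k < faulting for k in completed)
    return earlier_done and nothing_later


# DIV.D (index 0) faults after the shorter ADD.D (1) and SUB.D (2) have finished
print(is_precise({1, 2}, faulting=0))  # False
print(is_precise({0}, faulting=1))     # True
```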
Overcoming Exceptions
Split the ID pipe stage into two stages:
- Issue: decode instructions and check for structural hazards
- Read operands: wait until there are no data hazards, then read the operands
Summary
We have talked about longer FP pipelines.
Assalam-u-Alaikum and Allah Hafiz (peace be upon you, and goodbye)