Chapter 4: The Processor

Revised Forwarding Condition
§ MEM hazard
§ if (MEM/WB.RegWrite
     and (MEM/WB.RegisterRd ≠ 31)
     and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 31)
             and (EX/MEM.RegisterRd = ID/EX.RegisterRn1))
     and (MEM/WB.RegisterRd = ID/EX.RegisterRn1)) ForwardA = 01
§ if (MEM/WB.RegWrite
     and (MEM/WB.RegisterRd ≠ 31)
     and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 31)
             and (EX/MEM.RegisterRd = ID/EX.RegisterRm2))
     and (MEM/WB.RegisterRd = ID/EX.RegisterRm2)) ForwardB = 01
§ The not(...) term gives the EX hazard priority: when the instructions in both EX/MEM and MEM/WB write the register being read, the more recent result in EX/MEM is the one forwarded.

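The priority rule can be checked with a small Python sketch of the forwarding decision for one ALU operand. It assumes the standard EX-hazard condition (select 10 when EX/MEM.RegWrite is set, EX/MEM.RegisterRd ≠ 31, and EX/MEM.RegisterRd equals the source register). `Stage` and `forward_select` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

XZR = 31  # register 31 always reads as zero, so a write to it is never forwarded


@dataclass
class Stage:
    """Hypothetical view of a pipeline register's RegWrite bit and RegisterRd field."""
    reg_write: bool
    rd: int


def forward_select(ex_mem: Stage, mem_wb: Stage, src: int) -> str:
    """Mux select for one ALU operand whose register number is `src`
    (ID/EX.RegisterRn1 for ForwardA, ID/EX.RegisterRm2 for ForwardB)."""
    # EX hazard: the most recent result, still in EX/MEM, takes priority
    if ex_mem.reg_write and ex_mem.rd != XZR and ex_mem.rd == src:
        return "10"
    # MEM hazard: reached only when the EX hazard did not match, which is
    # exactly what the not(...) term in the revised condition expresses
    if mem_wb.reg_write and mem_wb.rd != XZR and mem_wb.rd == src:
        return "01"
    return "00"  # no hazard: the operand comes from the register file
```

Consider three consecutive instructions that each read and write X1, such as ADD X1, X1, X2 followed by ADD X1, X1, X3 and ADD X1, X1, X4. When the third one reaches EX, both pipeline registers hold a write to X1. In that case `forward_select(Stage(True, 1), Stage(True, 1), 1)` picks "10", so the newer value is used.
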
Datapath with Forwarding
§ The signed-immediate input to the ALU, needed by loads and stores, is missing from this datapath

Datapath with Forwarding
§ A multiplexor chooses between the ForwardB multiplexor output and the signed immediate

Load-Use Hazard Detection
§ Check for the hazard when the using instruction is decoded in the ID stage
§ The ALU operand register numbers in the ID stage are given by
  § IF/ID.RegisterRn1, IF/ID.RegisterRm2
§ Load-use hazard:
  if (ID/EX.MemRead and
      ((ID/EX.RegisterRd = IF/ID.RegisterRn1) or
       (ID/EX.RegisterRd = IF/ID.RegisterRm2)))
  stall the pipeline
§ If the instruction in the ID stage is stalled, then the instruction in the IF stage must also be stalled; otherwise, the fetched instruction would be lost.
  § Prevent the PC register and the IF/ID pipeline register from changing.

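The detection condition maps directly onto a boolean function. Below is a minimal Python sketch. The classes `IdEx` and `IfId` are hypothetical stand-ins for the relevant fields of the ID/EX and IF/ID pipeline registers.

```python
from dataclasses import dataclass


@dataclass
class IdEx:
    mem_read: bool  # ID/EX.MemRead: true only for a load in the EX stage
    rd: int         # ID/EX.RegisterRd: the load's destination register


@dataclass
class IfId:
    rn1: int  # IF/ID.RegisterRn1: first source of the instruction being decoded
    rm2: int  # IF/ID.RegisterRm2: second source of the instruction being decoded


def load_use_hazard(id_ex: IdEx, if_id: IfId) -> bool:
    """True when the instruction in ID reads the register a load in EX will write."""
    return id_ex.mem_read and (id_ex.rd == if_id.rn1 or id_ex.rd == if_id.rm2)


# LDUR X2, [X1,#20] in EX, followed by AND X4, X2, X5 in ID: must stall
print(load_use_hazard(IdEx(mem_read=True, rd=2), IfId(rn1=2, rm2=5)))  # True
```
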
How to Stall the Pipeline
§ Deasserting all eight control signals (setting them to 0) in the EX, MEM, and WB stages creates a “do nothing” or nop instruction.
§ By identifying the hazard in the ID stage, we can insert a bubble into the pipeline by changing the EX, MEM, and WB control fields of the ID/EX pipeline register to 0.
§ Prevent update of the PC and the IF/ID register
  § The using instruction is decoded again
  § The following instruction is fetched again
§ The 1-cycle stall allows MEM to read the data for LDUR
  § The data can subsequently be forwarded to the EX stage

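One clock edge of the stall can be sketched in Python. This is a hypothetical model. The pipeline registers are reduced to three pieces: the PC, the instruction word in IF/ID, and the control fields in ID/EX. The packed-integer encoding of the control fields is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Control:
    """EX, MEM, and WB control fields, each packed into an int (hypothetical encoding)."""
    ex: int
    mem: int
    wb: int


BUBBLE = Control(ex=0, mem=0, wb=0)  # every control signal deasserted: a nop


@dataclass(frozen=True)
class Pipe:
    pc: int
    if_id: int      # instruction word latched in IF/ID
    id_ex: Control  # control fields latched in ID/EX


def clock_edge(p: Pipe, fetched: int, decoded: Control, stall: bool) -> Pipe:
    """Next pipeline state; `stall` is the hazard detection unit's output."""
    if stall:
        # PC and IF/ID are not written: the using instruction is decoded again
        # next cycle and the following instruction is fetched again.
        # Zeroing the ID/EX control fields sends a bubble down the pipeline.
        return replace(p, id_ex=BUBBLE)
    return Pipe(pc=p.pc + 4, if_id=fetched, id_ex=decoded)
```
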
Load-Use Data Hazard
§ (Figure: stall inserted here)

Datapath with Hazard Detection

Stalls and Performance (The BIG Picture)
§ Stalls reduce performance
  § But are required to get correct results
§ The compiler can arrange code to avoid hazards and stalls
  § Requires knowledge of the pipeline structure

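The following Python sketch shows what the compiler gains. It counts cycles for a hypothetical sequence computing a = b + e; c = b + f, before and after the third load is moved earlier. It assumes full forwarding, so the only stall is the 1-cycle load-use stall described above.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]    # register written, or None for a store
    srcs: Tuple[str, ...]  # registers read


def cycles(prog):
    """Cycles to run `prog` on a 5-stage pipeline with full forwarding, assuming
    the only stall is 1 cycle when an instruction uses the result of the load
    immediately before it."""
    stalls = sum(
        1
        for prev, cur in zip(prog, prog[1:])
        if prev.op == "LDUR" and prev.dest in cur.srcs
    )
    return 4 + len(prog) + stalls  # 4 cycles to fill the pipeline, then 1 per instruction


# a = b + e; c = b + f, with b, e, f loaded from memory addressed by X0
original = [
    Instr("LDUR", "X1", ("X0",)),       # b
    Instr("LDUR", "X2", ("X0",)),       # e
    Instr("ADD", "X3", ("X1", "X2")),   # uses X2 right after its load: stall
    Instr("STUR", None, ("X3", "X0")),  # a
    Instr("LDUR", "X4", ("X0",)),       # f
    Instr("ADD", "X5", ("X1", "X4")),   # uses X4 right after its load: stall
    Instr("STUR", None, ("X5", "X0")),  # c
]
# the compiler moves the third load up so no load is followed directly by its use
scheduled = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(cycles(original), cycles(scheduled))  # 13 11
```

The reordering does not change what the program computes. It only separates each load from its first use by at least one other instruction, which removes both stalls.
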
§4.8 Control Hazards

Branch Hazards
§ If the branch outcome is determined in MEM, predict the branch not taken
§ If the branch is taken, flush the instructions fetched after it (set their control values to 0) and fetch from the branch target PC

Reducing Branch Delay
§ Moving the conditional branch execution earlier in the pipeline means fewer instructions need to be flushed. Two actions must occur earlier:
  § computing the branch target address
  § evaluating the branch decision
§ Move the hardware that determines the outcome from the EX stage to the ID stage
  § Target address adder
  § Register comparator that tests whether the register is zero
§ This requires additional forwarding and hazard detection hardware
  § Results must be forwarded to the zero-test logic that operates during ID
§ To flush the instruction in the IF stage, add a control line, called IF.Flush, that zeros the instruction field of the IF/ID pipeline register
  § Clearing the register transforms the fetched instruction into a nop

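The benefit can be counted directly. Under predict-not-taken, a taken branch wastes one fetched instruction for every stage it passes before its outcome is known. Below is a minimal Python sketch that assumes the five-stage pipeline and no other stalls.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def flushed_on_taken_branch(resolve_stage):
    """Wrong-path instructions fetched behind a taken branch under predict-not-taken:
    one per stage between IF and the stage where the branch outcome is known."""
    return STAGES.index(resolve_stage)


print(flushed_on_taken_branch("MEM"), flushed_on_taken_branch("ID"))  # 3 1
```
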
Reducing Branch Delay
§ Example: branch taken, assuming the pipeline is optimized for branches that are not taken and that branch execution has been moved to the ID stage:
  36: SUB  X10, X4, X8
  40: CBZ  X3, 8             // PC-relative branch to 40+8*4=72
  44: AND  X12, X2, X5
  48: ORR  X13, X2, X6
  52: ADD  X14, X4, X2
  56: SUB  X15, X6, X7
      . . .
  72: LDUR X4, [X7,#50]

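The target arithmetic in the comment and the decision made by the ID-stage zero test can be sketched in Python. `cbz_next_pc` is a hypothetical helper, and `reg_value` stands for the value of the tested register, which may have been forwarded.

```python
def cbz_next_pc(pc, reg_value, offset):
    """Next PC for CBZ resolved in ID: branch if the register is zero.
    `offset` counts instructions, so it is scaled by 4 bytes (it may be negative)."""
    if reg_value == 0:
        return pc + offset * 4  # taken: PC-relative target
    return pc + 4               # not taken: fall through


print(cbz_next_pc(40, 0, 8))  # 72, so the AND already fetched from 44 is flushed
print(cbz_next_pc(40, 5, 8))  # 44, the predicted (not-taken) path was correct
```
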
Example: Branch Taken

Dynamic Branch Prediction
§ In deeper and superscalar pipelines, the branch penalty is more significant
§ Use dynamic prediction
  § Branch prediction buffer (aka branch history table)
  § Indexed by the low-order bits of recent branch instruction addresses
  § Stores the outcome (taken/not taken)
§ To execute a branch
  § Check the table and expect the same outcome
  § Start fetching from the fall-through or the target
  § If wrong, flush the pipeline and flip the prediction

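Below is a minimal Python sketch of a 1-bit branch prediction buffer. The table size is an assumption, and so is indexing by low-order word-address bits. Because the index drops the upper address bits, two branches can share an entry, so a prediction may actually have been left by a different branch.

```python
class BranchHistoryTable:
    """Hypothetical 1-bit-per-entry branch prediction buffer."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.taken = [False] * (1 << index_bits)  # every entry starts as "not taken"

    def _index(self, pc: int) -> int:
        # drop the 2 always-zero bits of a word address, keep the low-order bits
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.taken[self._index(pc)]

    def update(self, pc: int, outcome: bool) -> None:
        # on a misprediction the stored bit flips to the actual outcome
        self.taken[self._index(pc)] = outcome
```
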
1-Bit Predictor: Shortcoming
§ Inner loop branches are mispredicted twice!
  outer: …
         …
  inner: …
         …
         CBZ …, …, inner
         …
         CBZ …, …, outer
§ Mispredict as taken on the last iteration of the inner loop
§ Then mispredict as not taken on the first iteration of the inner loop next time around

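The double misprediction is easy to reproduce. This Python sketch assumes the inner-loop branch is taken to repeat the loop. It runs a hypothetical 4-iteration inner loop three times against a single 1-bit entry.

```python
def mispredictions_1bit(outcomes, prediction=False):
    """Count mispredictions of a single 1-bit entry over a sequence of outcomes."""
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # 1-bit scheme: remember only the last outcome
    return misses


visit = [True, True, True, False]  # inner-loop branch: taken 3 times, then exits
print(mispredictions_1bit(visit * 3))  # 6, i.e. 2 per visit of the inner loop
```
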
2-Bit Predictor
§ Only change the prediction after two successive mispredictions

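One common way to implement this is a 2-bit saturating counter per entry. That choice is an assumption here, since other 2-bit state machines exist. The sketch below runs it on the same hypothetical 4-iteration loop, starting from "strongly taken".

```python
def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # move one step toward the outcome, saturating at 0 and 3, so a strongly
        # held prediction changes only after two successive mispredictions
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


visit = [True, True, True, False]
print(mispredictions_2bit(visit * 3))  # 3, i.e. 1 per visit
```

The loop exit only weakens the prediction from 3 to 2. The first branch of the next visit is therefore still predicted taken, which removes the second misprediction that the 1-bit scheme suffers.
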
Calculating the Branch Target
§ Even with a predictor, we still need to calculate the target address
  § 1-cycle penalty for a taken branch
§ Branch target buffer
  § Cache of target addresses (destination PC) or destination instructions
  § Indexed by the PC when the instruction is fetched
    § If it hits and the instruction is a branch predicted taken, the target can be fetched immediately
§ Correlating predictor
  § A branch predictor that combines the local behavior of a particular branch with global information about the behavior of some recent number of executed branches
§ Tournament branch predictor
  § A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch

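Below is a minimal Python sketch of a branch target buffer used during fetch. The direct-mapped organization and the table size are hypothetical choices. The full PC is stored as a tag so that a different branch mapping to the same entry counts as a miss.

```python
class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB caching the targets of taken branches."""

    def __init__(self, index_bits=6):
        self.mask = (1 << index_bits) - 1
        self.entries = {}  # index -> (tag_pc, target)

    def _index(self, pc):
        return (pc >> 2) & self.mask

    def next_fetch_pc(self, pc, predict_taken):
        entry = self.entries.get(self._index(pc))
        if entry is not None and entry[0] == pc and predict_taken:
            return entry[1]  # hit on a branch predicted taken: fetch the target next
        return pc + 4        # miss, or predicted not taken: fall through

    def record(self, pc, target):
        self.entries[self._index(pc)] = (pc, target)
```
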
§4.9 Exceptions and Interrupts

§ Control is the most challenging aspect of processor design: it is both the hardest part to get right and the toughest part to make fast.
§ One of the demanding tasks of control is implementing exceptions and interrupts
  § “Unexpected” events requiring a change in the flow of control
  § Different ISAs use the terms differently
§ Exception
  § Arises within the CPU
  § e.g., undefined opcode, overflow, syscall, …
§ Interrupt
  § From an external I/O controller
§ Detecting exception conditions and taking the appropriate action is often on the critical timing path of a processor, which determines the clock cycle time and thus performance.
  § Dealing with them without sacrificing performance is hard

Handling Exceptions
§ Save the PC of the offending (or interrupted) instruction
  § In LEGv8: Exception Link Register (ELR)
§ Transfer control to the operating system at some specified address
§ For the operating system to handle the exception, it must know the reason for the exception
  § Communicate the reason for an exception through a register
  § In LEGv8: Exception Syndrome Register (ESR)
  § We'll assume 1 bit
    § 0 for undefined opcode, 1 for overflow

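What the hardware does on an exception comes down to three register writes. Below is a minimal Python sketch that assumes the 1-bit ESR encoding above. For the "specified address" it uses the single LEGv8 entry point 0000 1C09 0000 (hex) given later in this section. `Cpu` and `raise_exception` are hypothetical. The sketch saves the PC itself, as this slide describes; the pipelined version saves PC + 4 instead.

```python
from dataclasses import dataclass

UNDEFINED_OPCODE, OVERFLOW = 0, 1   # the 1-bit ESR encoding assumed on this slide
EXCEPTION_ENTRY = 0x0000_1C09_0000  # LEGv8 single exception entry point


@dataclass
class Cpu:
    pc: int
    elr: int = 0  # Exception Link Register
    esr: int = 0  # Exception Syndrome Register


def raise_exception(cpu: Cpu, cause: int) -> None:
    cpu.elr = cpu.pc          # save the address of the offending instruction
    cpu.esr = cause           # record why control is being transferred
    cpu.pc = EXCEPTION_ENTRY  # transfer control to the operating system
```
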
An Alternate Mechanism
§ Vectored interrupts
  § Handler address determined by the cause
§ Exception vector address to be added to a vector table base register:
  Unknown reason                          00 0000two
  Floating-point arithmetic exception     10 1100two
  System error (hardware malfunction)     11 1111two
§ Instructions either
  § Deal with the interrupt, or
  § Jump to the real handler

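The vectored lookup is one addition. The offsets in this Python sketch come from the table above. The base register value 0x8000 used in the example is hypothetical.

```python
# Exception vector addresses from the table above
VECTOR_OFFSETS = {
    "unknown reason": 0b00_0000,
    "floating-point arithmetic exception": 0b10_1100,
    "system error": 0b11_1111,
}


def vectored_handler_address(vector_table_base, cause):
    """Handler address = vector table base register + exception vector address."""
    return vector_table_base + VECTOR_OFFSETS[cause]


print(hex(vectored_handler_address(0x8000, "floating-point arithmetic exception")))  # 0x802c
```
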
Handler Actions
§ Read the cause, and transfer to the relevant handler
§ Determine the action required
§ If restartable
  § Take corrective action
  § Use ELR to return to the program
§ Otherwise
  § Terminate the program
  § Report the error using ESR, cause, …

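The handler's decision can be sketched in Python. The `fixes` table maps an ESR cause to a corrective action that reports whether it succeeded. It is a hypothetical stand-in for real operating system code. Returning ELR unchanged assumes it already holds the address to resume at; see Exception Properties for the PC + 4 adjustment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Outcome:
    resume_pc: Optional[int]  # where to return, or None if the program is terminated
    report: Optional[str]     # error report for a terminated program


def os_handler(elr: int, esr: int, fixes: Dict[int, Callable[[], bool]]) -> Outcome:
    """Read the cause from ESR and run the matching corrective action, if any."""
    fix = fixes.get(esr)
    if fix is not None and fix():  # restartable, and the corrective action succeeded
        return Outcome(resume_pc=elr, report=None)
    return Outcome(resume_pc=None, report=f"terminated: cause {esr} at {elr:#x}")
```
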
Exception Handling in LEGv8
§ Exceptions are not vectored in LEGv8
  § A single interrupt entry point for all exceptions: 0000 1C09 0000 (hex)
  § The operating system decodes the status register to find the cause
§ Two additional registers are added to our current LEGv8 implementation:
  § ELR: a 64-bit register used to hold the address of the affected instruction
  § ESR: a register used to record the cause of the exception. In the LEGv8 architecture, this register is 32 bits, although some bits are currently unused.

Exceptions in a Pipeline
§ In a pipelined implementation, exceptions are another form of control hazard
§ Consider a hardware malfunction on an add in the EX stage: ADD X1, X2, X1
  § Flush the add and subsequent instructions
  § Prevent X1, the destination register, from being clobbered
    § An EX.Flush signal prevents the instruction in the EX stage from writing its result in the WB stage
  § Many exceptions require that we complete the previous instructions
    § Flush the offending instruction and restart it from the beginning after the exception is handled
  § Set the ESR and ELR register values
  § Transfer control to the handler
§ Similar to a mispredicted branch
  § Uses much of the same hardware

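The flush can be modeled as turning the faulting instruction and every younger instruction into a nop, while older instructions drain normally. This Python sketch assumes an in-order five-stage pipeline. The instruction names other than the ADD are hypothetical placeholders.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def flush_on_exception(pipeline, faulting_stage):
    """Turn the faulting instruction and all younger ones (earlier stages) into nops;
    older instructions in later stages complete normally."""
    cut = STAGES.index(faulting_stage)
    return {s: ("nop" if i <= cut else pipeline[s]) for i, s in enumerate(STAGES)}


before = {"IF": "LDUR", "ID": "SUB", "EX": "ADD X1, X2, X1", "MEM": "ORR", "WB": "AND"}
print(flush_on_exception(before, "EX"))
# {'IF': 'nop', 'ID': 'nop', 'EX': 'nop', 'MEM': 'ORR', 'WB': 'AND'}
```

Because the ADD becomes a nop before reaching WB, X1 keeps its old value. The restarted ADD therefore reads the same operands it saw the first time.
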
Pipeline with Exceptions
§ LEGv8 exception address: 0000 1C09 0000 (hex)

Exception Properties
§ Restartable exceptions
  § The pipeline can flush the instruction
  § The handler executes, then returns to the instruction
    § Refetched and executed from scratch
§ PC saved in the ELR register
  § Identifies the causing instruction
  § Actually PC + 4 is saved
    § The handler must adjust

Exception Example
§ Exception on ADD in (addresses in hex)
  40  SUB  X11, X2, X4
  44  AND  X12, X2, X5
  48  ORR  X13, X2, X6
  4C  ADD  X1,  X2, X1
  50  SUB  X15, X6, X7
  54  LDUR X16, [X7,#100]
  …
§ Assume the instructions invoked on an exception begin like this:
  80000180  STUR X26, [X0,#1000]
  80000184  STUR X27, [X0,#1008]
  …

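The example's bookkeeping can be reproduced with a short Python sketch. It assumes the ADD overflows in its EX stage and that the pipeline had no earlier stalls. The sketch lists which instructions are flushed and which complete. It also gives the ELR value, the restart address the handler must compute, and the first handler address from this slide.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")
PROGRAM = {0x40: "SUB", 0x44: "AND", 0x48: "ORR", 0x4C: "ADD", 0x50: "SUB", 0x54: "LDUR"}
HANDLER_START = 0x80000180


def overflow_in_ex(fault_pc):
    """Pipeline contents and saved state when the instruction at fault_pc
    raises an exception in EX, assuming no earlier stalls."""
    ex = STAGES.index("EX")
    # each later stage holds the instruction one word (4 bytes) earlier
    at = {s: fault_pc + 4 * (ex - i) for i, s in enumerate(STAGES)}
    flushed = [at[s] for s in STAGES[: ex + 1]]    # IF, ID, EX become nops
    completed = [at[s] for s in STAGES[ex + 1 :]]  # MEM, WB finish normally
    elr = fault_pc + 4                             # the hardware saves PC + 4
    return flushed, completed, elr


flushed, completed, elr = overflow_in_ex(0x4C)
print([PROGRAM[a] for a in flushed])    # ['LDUR', 'SUB', 'ADD']
print([PROGRAM[a] for a in completed])  # ['ORR', 'AND']
print(hex(elr), hex(elr - 4), hex(HANDLER_START))  # 0x50 0x4c 0x80000180
```
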
Multiple Exceptions
§ Pipelining overlaps multiple instructions
  § Could have multiple exceptions at once
§ Simple approach: deal with the exception from the earliest instruction
  § Flush subsequent instructions
  § “Precise” exceptions: always associating the proper exception with the correct instruction
§ Imprecise exceptions: interrupts or exceptions in pipelined computers that are not associated with the exact instruction that caused the interrupt or exception
§ In complex pipelines
  § Multiple instructions issued per cycle
  § Out-of-order completion
  § Maintaining precise exceptions is difficult!

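The simple approach comes down to picking the exception raised by the oldest instruction. This Python sketch assumes an in-order five-stage pipeline, where the oldest instruction is always in the latest stage. The stage-to-cause dictionary is a hypothetical interface.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def exception_to_service(pending):
    """`pending` maps stage -> cause for every instruction raising an exception this
    cycle. The earliest instruction in program order is the deepest in the pipeline."""
    if not pending:
        return None
    stage = max(pending, key=STAGES.index)
    return stage, pending[stage]


print(exception_to_service({"ID": "undefined opcode", "EX": "overflow"}))
# ('EX', 'overflow')
```

Every instruction in an earlier stage is then flushed. With out-of-order completion, stage order no longer matches program order, which is one reason precise exceptions become harder to maintain.
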