CS 7810 Lecture 11 Delaying Physical Register Allocation


CS 7810 Lecture 11
Delaying Physical Register Allocation Through Virtual-Physical Registers
T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals
Proceedings of MICRO-32, November 1999

Register File Design Considerations
• Number of ports = 3 x issue width (two source reads and one result write per instruction)
• Number of entries = window size + number of logical registers
• Multiple threads → more registers (more power)
• Wire delays, clock speeds → multiple-cycle access
• Pipelining a RAM structure is hard
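The two sizing rules above can be checked with a quick calculation. This is a minimal sketch; the function name and the example machine (4-wide issue, 128-entry window, 32 logical registers) are illustrative assumptions, not figures from the paper.

```python
def regfile_requirements(issue_width, window_size, logical_regs):
    """Apply the slide's two rules of thumb (hypothetical helper)."""
    ports = 3 * issue_width               # 2 reads + 1 write per issued instruction
    entries = window_size + logical_regs  # in-flight results + committed state
    return ports, entries


# Assumed example machine, chosen only for illustration.
ports, entries = regfile_requirements(issue_width=4, window_size=128, logical_regs=32)
print(f"ports={ports}, entries={entries}")
```

Doubling the issue width doubles the port count, which is why wide machines push the register file toward multiple-cycle access.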

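Register Allocation
Lifetime of pr 7 in a conventional pipeline:
• Fetch
• Rename: assign pr 7 (cycle 4)
• Issue (cycle 15)
• Complete: write pr 7 (cycle 30)
• Wake-up: read pr 7 (cycle 50)
• Commit: release pr 7 (cycle 80)
• Cycles 4 to 30: no result (26 cyc)
• Cycles 30 to 50: useful time (20 cyc)
• Cycles 50 to 80: no activity (30 cyc)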
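The interval lengths on this slide follow directly from the event cycles. The short sketch below redoes that arithmetic; the dictionary keys and function name are labels chosen here for illustration.

```python
# Event cycles for pr 7, taken from the slide.
events = {"rename": 4, "issue": 15, "complete": 30, "read": 50, "release": 80}


def register_intervals(ev):
    """Split the time a register is held into the slide's three phases."""
    no_result = ev["complete"] - ev["rename"]   # allocated, value not produced yet
    useful = ev["read"] - ev["complete"]        # value present and still needed
    no_activity = ev["release"] - ev["read"]    # value no longer read, not yet freed
    return no_result, useful, no_activity


no_result, useful, no_activity = register_intervals(events)
held = events["release"] - events["rename"]
print(no_result, useful, no_activity)           # 26 20 30
print(f"useful fraction of held time: {useful / held:.2f}")
```

Delaying allocation until completion targets the first interval: the cycles between rename and write-back during which the register holds no value.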

Two-Level Register File
[Figure: base regfile compared with a two-level regfile]

Virtual-Physical Registers
• Register map table: lr 3 → vr 7
• Virtual map table: (no mapping for vr 7 yet)

Virtual-Physical Registers
• Register map table: lr 3 → vr 7
• Virtual map table: (no mapping for vr 7 yet)
• Instruction issues

Virtual-Physical Registers
• Instruction completes and is assigned pr 9
• Register map table: lr 3 → vr 7, pr 9
• Virtual map table: vr 7 → pr 9
• Figure labels: vr 7 (pr 9); vr 7, pr 9

Virtual-Physical Registers
• Register map table: lr 3 → vr 7, pr 9
• Virtual map table: vr 7 → pr 9
• Figure label: vr 7 (pr 9)
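The sequence on these slides (rename hands out a virtual tag, completion binds a physical register) can be sketched as two small tables. This is a minimal model under stated assumptions: the class `VPRenamer`, its method names, and the free-list handling are hypothetical, and the starting values are chosen only to reproduce the lr 3, vr 7, pr 9 example.

```python
class VPRenamer:
    """Toy model of virtual-physical renaming (illustrative, not the paper's design)."""

    def __init__(self, first_virtual, free_physical):
        self.reg_map = {}                  # logical reg -> virtual tag
        self.virt_map = {}                 # virtual tag -> physical reg
        self.next_virtual = first_virtual  # virtual tags are plentiful
        self.free_physical = list(free_physical)

    def rename(self, logical):
        """At rename, only a virtual tag is assigned; no storage is allocated."""
        vr = self.next_virtual
        self.next_virtual += 1
        self.reg_map[logical] = vr
        return vr

    def complete(self, vr):
        """At completion, bind a physical register if one is free, else return None."""
        if not self.free_physical:
            return None
        pr = self.free_physical.pop(0)
        self.virt_map[vr] = pr
        return pr

    def lookup(self, logical):
        """Return (virtual tag, physical reg or None) for a logical register."""
        vr = self.reg_map[logical]
        return vr, self.virt_map.get(vr)


renamer = VPRenamer(first_virtual=7, free_physical=[9, 10])
vr = renamer.rename(3)        # lr 3 -> vr 7
print(renamer.lookup(3))      # physical part still unbound
renamer.complete(vr)          # vr 7 -> pr 9
print(renamer.lookup(3))
```

The `None` return from `complete` is the case the following slides deal with: an instruction that finishes while no physical register is free.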

Lack of Registers
• An instruction finishes, has no register, and keeps re-executing
[Figure: in-flight window; legend: has physical register / has no physical register]

Lack of Registers
• An instruction finishes, has no register, and keeps re-executing
• At cycle t+1 an instruction commits, and the waiting instruction gets a register
[Figure: in-flight window; legend: has physical register / has no physical register]

Deadlock
• An instruction finishes, has no register, and keeps re-executing
• Who will generate a register for this instr?
• Solution: reserve a register for the oldest instruction
[Figure: in-flight window; legend: has physical register / has no physical register]
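The reservation rule can be expressed as a one-line allocation check. The sketch below is an assumption about how such a check might look; the function name and the `reserved` parameter are hypothetical, with `reserved=1` matching the single register the slide sets aside for the oldest instruction.

```python
def can_allocate(free_regs, is_oldest, reserved=1):
    """Decide whether a finishing instruction may take a physical register.

    The oldest in-flight instruction may take any free register, including
    the reserved one. Every other instruction must leave `reserved` free.
    """
    if is_oldest:
        return free_regs > 0
    return free_regs > reserved


# With one free register left, only the oldest instruction may take it.
print(can_allocate(free_regs=1, is_oldest=True))
print(can_allocate(free_regs=1, is_oldest=False))
```

Because the oldest instruction can always obtain a register, it can always complete and commit, which guarantees forward progress.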

Sequential Execution
• Oldest instr has reserved register
[Figure: in-flight window; legend: has physical register / has no physical register]

Sequential Execution
• Instr commits and releases another reg, which is then reserved for the new oldest instr
[Figure: in-flight window; legend: has physical register / has no physical register]

Sequential Execution
• Behaves like an in-order processor
• Instr commits and releases another reg, which is then reserved for the new oldest instr
[Figure: in-flight window; legend: has physical register / has no physical register]

Reserving All Registers
• Allows quick progress, but behaves almost like a conventional processor
[Figure legend: has physical register / has no physical register]

Register Stealing
• Instr finishes; steals register from the youngest finished instr
• No reservation of regs
• The younger instrs may have to execute twice
• Note the pre-execution effect
[Figure: in-flight window; legend: has physical register / has no physical register]
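A rough sketch of the stealing policy follows. The `Instr` record and the `finish` function are hypothetical names. The restriction that a victim must be younger than the thief is an assumption made here, based on the slide's remark that the younger instructions may have to execute twice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    seq: int                     # program order; larger means younger
    finished: bool = False
    preg: Optional[int] = None   # physical register held, if any


def finish(instr: Instr, window: List[Instr], free_regs: List[int]) -> Optional[Instr]:
    """Try to give a finishing instruction a register; return the victim if one was robbed."""
    if free_regs:
        instr.preg = free_regs.pop()
        instr.finished = True
        return None
    # Assumption: only finished instructions younger than the thief can be victims.
    candidates = [i for i in window
                  if i.finished and i.preg is not None and i.seq > instr.seq]
    if not candidates:
        return None              # nothing to steal: instr must re-execute later
    victim = max(candidates, key=lambda i: i.seq)   # youngest finished instr
    instr.preg, victim.preg = victim.preg, None
    instr.finished = True
    victim.finished = False      # victim lost its result and must execute again
    return victim


window = [Instr(0), Instr(1, True, 5), Instr(2, True, 6)]
victim = finish(window[0], window, free_regs=[])
print(window[0].preg, victim.seq)
```

The caller distinguishes success from failure by checking `instr.finished` afterwards, since both the free-register case and the nothing-to-steal case return `None`.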

Implementation
• Finished instructions have to remain in the issueq in case they have to re-execute
• Issued dependents of the victim instruction need not re-execute
• The VP tag of the victim has to be broadcast so that unissued dependents can reset the ready bit
• Can benefit from an instruction reuse buffer?
• Pre-execution without explicitly attempting it
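The tag broadcast in the third bullet can be sketched as a scan over the issue queue. The data layout below (`Source`, `QueueEntry`, `broadcast_steal`) is a hypothetical simplification; real hardware would do this with parallel tag comparators rather than a loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    tag: int             # VP tag of the producer this operand waits on
    ready: bool = False


@dataclass
class QueueEntry:
    name: str
    sources: List[Source]
    issued: bool = False


def broadcast_steal(victim_tag: int, issue_queue: List[QueueEntry]) -> List[str]:
    """Reset ready bits of unissued consumers of the victim; return their names."""
    reset = []
    for entry in issue_queue:
        if entry.issued:
            continue     # issued dependents already read the value: no re-execution
        hit = False
        for src in entry.sources:
            if src.tag == victim_tag and src.ready:
                src.ready = False
                hit = True
        if hit:
            reset.append(entry.name)
    return reset


queue = [
    QueueEntry("a", [Source(7, True)], issued=True),
    QueueEntry("b", [Source(7, True), Source(8, True)]),
    QueueEntry("c", [Source(8, True)]),
]
print(broadcast_steal(7, queue))
```

Only entry "b" is affected in this example: "a" has already issued, and "c" does not depend on the victim's tag.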

Results
• Improves the base case by 5% (int programs) and 24% (FP programs)
• FP programs have more ILP, better branch prediction, and are more limited by cache misses
• Re-executions: 10% (int), 58% (FP)
• Steals: 5% (int), 12% (FP)
• For the same IPC, VP registers employ 25% fewer registers
