8. Microarchitecture of Superscalars (6): Register renaming
Dezső Sima, Fall 2006
D. Sima, 2006

Overview
  • 1 The principle of register renaming
  • 2 Design space
      • 2.1 Overview
      • 2.2 Types of rename buffers
  • 3 Operation of register renaming
  • 4 Design parameters of register renaming
  • 5 Implementation of renaming in superscalars
      • 5.1 The chronology of introducing register renaming
      • 5.2 Basic implementation schemes of register renaming
  • 6 Examples

1. Principle of register renaming (1)

Aim:
  • Eliminating false data dependencies to relieve the issue bottleneck

False data dependencies

WAR: Write After Read (anti-dependency). Example:
  I1: mul r1, r2, r3
  I2: add r2, r4, r5
(I2 writes r2, which I1 still has to read.)

WAW: Write After Write (output dependency). Example:
  I1: mul r1, r2, r3
  I2: add r1, r4, r5
(I1 and I2 both write r1.)

1. Principle of register renaming (2)

Basic principle to eliminate false data dependencies:

False data dependencies are eliminated by writing generated results temporarily to buffers, called rename buffers (RB), instead of to the referenced architectural registers (AR). Then
  • during dispatching, a new rename buffer needs to be allocated to each instruction whose destination register causes a false data dependency,
  • referenced source operands need to be fetched from the RB file if they are actually renamed, else from the AR file,
  • during retirement, buffered results need to be transferred from the RB file to the AR file.

Note: Usually, processors allocate a rename buffer to each dispatched instruction without checking for the existence of false data dependencies, in order to reduce logic complexity.

[Figure 1.1: The principle of register renaming. Labels: source register numbers, RB, AR, operands (Ops.), execution units (EU), results, retirement.]
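The three steps above can be illustrated with a minimal Python sketch. It is not taken from the slides: the `Instr` type, the `rename` function and the buffer names `rb0`, `rb1`, ... are hypothetical, and, as in the note above, a buffer is allocated to every instruction without checking for false dependencies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    src1: str
    src2: str

def rename(program):
    """Give every destination a fresh rename buffer and redirect later reads."""
    mapping = {}   # architectural register -> most recently allocated buffer
    renamed = []
    for n, ins in enumerate(program):
        # Sources are looked up first: the RB if the register is renamed,
        # otherwise the architectural register itself.
        s1 = mapping.get(ins.src1, ins.src1)
        s2 = mapping.get(ins.src2, ins.src2)
        # Assumption: one new buffer per instruction, no dependency check.
        rb = f"rb{n}"
        mapping[ins.dst] = rb
        renamed.append(Instr(ins.op, rb, s1, s2))
    return renamed

# The WAR and WAW examples from the previous slide.
war = [Instr("mul", "r1", "r2", "r3"), Instr("add", "r2", "r4", "r5")]
waw = [Instr("mul", "r1", "r2", "r3"), Instr("add", "r1", "r4", "r5")]
war_renamed = rename(war)
waw_renamed = rename(waw)

for ins in war_renamed + waw_renamed:
    print(ins.op, ins.dst, ins.src1, ins.src2)
```

After renaming, the second instruction of each pair writes its own buffer, so it no longer conflicts with the register that the first instruction reads or writes. True (read-after-write) dependencies are preserved, because a later read of a renamed register is redirected to the buffer.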

2. Design space of register renaming

2.1 Overview

Register renaming:
  • Scope of register renaming
  • Layout of the rename buffers
  • Type of rename buffers
  • Layout of the register mapping
  • Rename rate

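2.2 Types of rename buffers

  • Rename register file (RR)

[Figure: a rename register file (RR) next to the architectural register file (AR). Labels: register numbers (Reg. nrs.), results (Res.), retirement (Ret.), operands (Ops.).]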

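Rename register file: states of an RR entry

  • Initialized: Available
  • Available → Allocated, not valid: allocate, if instruction is dispatched
  • Allocated, not valid → Allocated, valid: update, if instruction is finished
  • Allocated, valid → Available: reclaim, if instruction is retired
  • Allocated → Available: reclaim, if instruction is canceled

[Figure labels: Reg. nrs., Res., RR, Ret., AR, Ops.]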
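The state diagram above can be written down as a small transition table. This is only a sketch: the event names (`dispatch`, `finish`, `retire`, `cancel`) and the helper functions are hypothetical labels chosen here, and cancellation is assumed to be possible from both allocated states.

```python
from enum import Enum

class State(Enum):
    AVAILABLE = "available"
    ALLOCATED_NOT_VALID = "allocated, not valid"
    ALLOCATED_VALID = "allocated, valid"

# (current state, event) -> next state, following the slide's diagram.
TRANSITIONS = {
    (State.AVAILABLE, "dispatch"): State.ALLOCATED_NOT_VALID,
    (State.ALLOCATED_NOT_VALID, "finish"): State.ALLOCATED_VALID,
    (State.ALLOCATED_VALID, "retire"): State.AVAILABLE,
    # Assumption: a canceled instruction frees its entry from either
    # allocated state.
    (State.ALLOCATED_NOT_VALID, "cancel"): State.AVAILABLE,
    (State.ALLOCATED_VALID, "cancel"): State.AVAILABLE,
}

def step(state, event):
    """Return the next state, or raise ValueError for an illegal event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal event {event!r} in state {state.value!r}"
        ) from None

def run(events, state=State.AVAILABLE):
    """Apply a sequence of events to one rename register entry."""
    for event in events:
        state = step(state, event)
    return state

print(run(["dispatch", "finish", "retire"]).value)
```

A normal instruction lifetime returns the entry to the available pool, while an out-of-order event such as retiring an entry that was never allocated is rejected.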

2.2 Types of rename buffers

  • Rename register file (RR)
    Examples: PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), POWER3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
  • Future file (FF)

Future file (FF)

The FF has as many entries as the AR and holds the most recent register values.

States of an FF entry (the diagram also marks an "Initialized" entry point):
  • Not valid → Valid: update, if instruction is finished
  • Valid → Not valid: invalidate by referring to the same register as destination

[Figure labels: Reg. nrs., Res., FF, Ret., AR, Ops.]
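A rough Python sketch of these two transitions follows. Several details are assumptions that the slide does not state: entries start out valid with the architectural values, each entry remembers a tag of its latest writer so that an older finishing instruction does not overwrite it, and the class and method names are hypothetical.

```python
class FutureFile:
    def __init__(self, n_regs):
        self.ar = [0] * n_regs           # architectural register file
        self.ff = [0] * n_regs           # future file, one entry per AR entry
        self.valid = [True] * n_regs     # assumption: entries start valid
        self.last_writer = [None] * n_regs

    def dispatch(self, tag, dst):
        # Referring to a register as destination invalidates its FF entry.
        self.valid[dst] = False
        self.last_writer[dst] = tag

    def finish(self, tag, dst, value):
        # Assumption: only the most recent writer may update the entry.
        if self.last_writer[dst] == tag:
            self.ff[dst] = value
            self.valid[dst] = True

    def retire(self, dst, value):
        # The architectural state is updated in program order at retirement.
        self.ar[dst] = value

    def read(self, reg):
        """Return the most recent value, or None while it is not yet valid."""
        return self.ff[reg] if self.valid[reg] else None

ff = FutureFile(8)
ff.dispatch("i1", 1)
ff.dispatch("i2", 1)        # r1 is renamed again before i1 finishes
ff.finish("i1", 1, 10)      # older writer: the entry stays not valid
ff.finish("i2", 1, 20)      # latest writer: the entry becomes valid
print(ff.read(1), ff.ar[1])
```

The architectural file lags behind until the instructions retire, while the future file already supplies the latest value to dependent instructions.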

2.2 Types of rename buffers

  • Rename register file (RR)
    Examples: PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), POWER3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
  • Future file (FF)
    Examples: UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
  • Merged architectural and rename register file (AR, RR)

Merged architectural and rename register file

States of an entry (the diagram also marks an "Initialized" entry point):
  • Available → RB, not valid: entry is allocated to a dispatched instruction
  • RB, not valid → RB, valid: instruction is finished
  • RB, valid → AR: instruction is completed
  • AR → Available: the architectural register is reclaimed when this architectural register is renamed anew
  • RB → Available: instruction is canceled

It needs a large number of physical registers.

During completion, no physical transfer is needed from the rename buffer to the referenced architectural register; instead, the former rename buffer changes its state and becomes the referenced architectural register.

[Figure labels: Reg. nrs., Res., AR, RR, Ops.]
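The key point, that completion changes a state instead of copying a value, can be sketched as follows. The class, its methods and the initial layout (the first `n_arch` physical entries act as architectural registers) are hypothetical. As a further assumption, the old architectural entry is freed when the renaming instruction completes, which is one conservative reading of "renamed anew".

```python
class MergedRegisterFile:
    def __init__(self, n_arch, n_phys):
        assert n_phys > n_arch, "needs more physical than architectural registers"
        self.value = [0] * n_phys
        # Assumption: initially entries 0..n_arch-1 hold the architectural state.
        self.state = ["AR"] * n_arch + ["available"] * (n_phys - n_arch)
        self.arch_map = list(range(n_arch))   # arch reg -> physical entry

    def dispatch(self):
        """Allocate an available entry; raises ValueError if none is left."""
        p = self.state.index("available")
        self.state[p] = "RB, not valid"
        return p

    def finish(self, p, value):
        self.value[p] = value
        self.state[p] = "RB, valid"

    def complete(self, arch, p):
        # No data is copied: the rename buffer simply becomes the AR entry.
        old = self.arch_map[arch]
        self.state[old] = "available"   # assumption: reclaim at completion
        self.state[p] = "AR"
        self.arch_map[arch] = p

    def cancel(self, p):
        self.state[p] = "available"

    def read_arch(self, arch):
        return self.value[self.arch_map[arch]]

rf = MergedRegisterFile(n_arch=4, n_phys=6)
p = rf.dispatch()       # entry allocated for an instruction writing r2
rf.finish(p, 99)
rf.complete(2, p)
print(p, rf.read_arch(2), rf.state)
```

After `complete`, architectural register 2 is served by the physical entry that used to be the rename buffer, and the entry that previously held r2 is available again.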

2.2 Types of rename buffers

  • Rename register file (RR)
    Examples: PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), POWER3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
  • Future file (FF)
    Examples: UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
  • Merged architectural and rename register file (AR, RR)
    Examples: POWER1 (1990), POWER2 (1993), R10000 (1996), R12000 (1999), Alpha 21264 (1998), Pentium 4 (FP) (2000), K7 (FP) (1999), K8 (FP) (2003)
  • Holding renamed values in the ROB

Holding renamed values in the ROB

ROB entries are extended to hold results as well. During dispatching, a new ROB entry with its result field is allocated to each dispatched instruction. (The result field serves as the allocated rename buffer.)

States of an ROB entry:
  • Initialized: Available
  • Available → Allocated, not valid: allocate, if instruction is dispatched
  • Allocated, not valid → Allocated, valid: update, if instruction is finished
  • Allocated, valid → Available: reclaim, if instruction is retired
  • Allocated → Available: reclaim, if instruction is canceled

[Figure labels: Reg. nrs., Res., ROB, Ret., AR, Ops.]
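A minimal sketch of a ROB whose entries carry a result field is given below. The class name `ROB`, the dictionary layout of an entry and the method names are hypothetical; retirement from the head of the queue in program order is the usual ROB behavior and is assumed here.

```python
from collections import deque

class ROB:
    def __init__(self, size, n_regs):
        self.size = size
        self.entries = deque()      # oldest instruction at the left
        self.ar = [0] * n_regs      # architectural register file

    def dispatch(self, dst):
        """Allocate an entry; its result field acts as the rename buffer."""
        if len(self.entries) == self.size:
            raise RuntimeError("ROB full")
        entry = {"dst": dst, "result": None, "valid": False}
        self.entries.append(entry)
        return entry

    def finish(self, entry, result):
        entry["result"] = result
        entry["valid"] = True

    def retire(self):
        """Retire finished instructions from the head, in program order."""
        retired = 0
        while self.entries and self.entries[0]["valid"]:
            entry = self.entries.popleft()
            self.ar[entry["dst"]] = entry["result"]   # transfer ROB -> AR
            retired += 1
        return retired

rob = ROB(size=4, n_regs=8)
a = rob.dispatch(dst=1)
b = rob.dispatch(dst=2)
rob.finish(b, 7)            # the younger instruction finishes first
print(rob.retire())         # nothing retires: the head is not finished yet
rob.finish(a, 5)
print(rob.retire(), rob.ar[1], rob.ar[2])
```

The younger instruction's result waits in its ROB entry until the older one has finished, and only then are both values transferred to the AR.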

2.2 Types of rename buffers

  • Rename register file (RR)
    Examples: PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), POWER3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
  • Future file (FF)
    Examples: UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
  • Merged architectural and rename register file (AR, RR)
    Examples: POWER1 (1990), POWER2 (1993), R10000 (1996), R12000 (1999), Alpha 21264 (1998), Pentium 4 (FP) (2000), K7 (FP) (1999), K8 (FP) (2003)
  • Holding renamed values in the ROB
    Examples: K5 (1995), K6 (1997), Pentium Pro (1995), Pentium II (1997), Pentium III (1999), Pentium 4 (FX) (2000), Pentium M (2003), Core (2006)

3. Operation of register renaming (1)

The actual rename process depends on both the rename technique implemented and the underlying microarchitecture.

Assumptions:
  • Rename technique: using rename registers and mapping tables

Rename registers

Rename registers provide buffer space to temporarily hold instruction results.

[Figure: rename register file (RR) with a Valid bit (V) per entry]

During dispatching, the Valid bit of the allocated rename register is reset (V = 0). When the instruction finishes, its result is transferred to the allocated rename buffer entry and the Valid bit is set (V = 1) to indicate that the corresponding value is available.

Mapping table

The mapping table includes an entry for each architectural register. Each entry has an "Entry valid" bit that indicates whether or not the corresponding architectural register is renamed; if it is renamed, the entry also holds the index of the associated rename buffer (RB index).

[Figure: mapping table with entries 0 to n-1 and the columns "Entry valid" and "RB index". Entries shown: 6: not valid; 7: valid, RB index 12; 8: valid, RB index 14. A look-up for r7 returns "12" (RB index = 12).]

A new entry is created while an instruction is dispatched
  • by setting the "Entry valid" bit and
  • by writing the index of the allocated rename buffer ("RB index") to the entry that corresponds to the destination register of the dispatched instruction.

A valid mapping is updated by writing a new "RB index" into it when the architectural register belonging to that entry is renamed again.

An entry is invalidated when the instruction that actually belongs to that entry is retired.

In this way the mapping table continuously holds the latest allocations.
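The create, update, look-up and invalidate operations described above might be sketched as follows. The class and method names are hypothetical. The check in `retire` is one way to read "the instruction that actually belongs to that entry": the entry is invalidated only if the retiring instruction's rename buffer is still the one recorded in it.

```python
class MappingTable:
    def __init__(self, n_arch):
        self.entry_valid = [False] * n_arch
        self.rb_index = [None] * n_arch

    def rename_destination(self, arch_reg, rb):
        """Create a mapping, or overwrite it if the register is renamed again."""
        self.entry_valid[arch_reg] = True
        self.rb_index[arch_reg] = rb

    def lookup(self, arch_reg):
        """Return where a source operand has to be fetched from."""
        if self.entry_valid[arch_reg]:
            return ("RB", self.rb_index[arch_reg])
        return ("AR", arch_reg)

    def retire(self, arch_reg, rb):
        # Invalidate only if the retiring instruction still owns the entry;
        # a later renaming of the same register must stay in place.
        if self.entry_valid[arch_reg] and self.rb_index[arch_reg] == rb:
            self.entry_valid[arch_reg] = False

mt = MappingTable(n_arch=16)
mt.rename_destination(7, 12)
print(mt.lookup(7))      # the slide's look-up for r7
print(mt.lookup(6))      # r6 is not renamed, so it is read from the AR
```

With this reading, retiring an older writer of r7 leaves a newer mapping untouched, so the table keeps pointing at the latest allocation.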

3. Operation of register renaming (1)

The actual rename process depends on both the rename technique implemented and the underlying microarchitecture.

Assumptions:
  • Rename technique: using rename registers and mapping tables
  • Underlying microarchitecture:
      • in-order dispatching
      • dynamic instruction issue
      • split FX and FP register files
      • operand fetch policy: both alternatives are discussed

3. Operation of register renaming (2)

Considered part of the microarchitecture for both dispatch-bound and issue-bound operand fetching:
  • it executes only FX instructions,
  • it consists of an architectural register file (AR) and a single execution unit (EU).

3. Operation of register renaming (3)

[Figure 3.1: An FX core assuming buffered issue and dispatch-bound operand fetching. Blocks: mapping table, rename register file (RR) with valid bits (V), architectural register file (AR), reservation station (RS), execution unit (EU) with bypassing.]

Steps annotated in the figure:
  • Dispatch: decoded instructions (OC, Rd, Rs1, Rs2) arrive; the destination and source registers are renamed through the mapping table (Rd', Rs1', Rs2').
  • Operand fetch: the valid bits are checked; operands are fetched if valid, else their tags. An RS entry holds OC, Rd', Op1/Rs1' with V1, and Op2/Rs2' with V2.
  • Issue: an instruction is issued when its operands are ready; OC, Rd', Op1 and Op2 are passed to the EU.
  • Finish: after the instruction is executed, the result and Rd' update the RS and the RR.
  • Retire: when the instruction is retired, the AR is updated.
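The dispatch-bound scheme of Figure 3.1 can be approximated in a few lines of Python. This is a sketch under assumptions: the data layout (dictionaries for the mapping table, RR and AR, a list for the RS) and all function names are hypothetical, and retirement is left out. An operand is represented as a pair `(value_or_tag, valid)`.

```python
def fetch_operand(src, mapping, rr, ar):
    """At dispatch: return (value, True) if available, else (tag, False)."""
    if src in mapping:
        rb = mapping[src]
        value, valid = rr[rb]
        return (value, True) if valid else (rb, False)
    return (ar[src], True)

def dispatch(instr, mapping, rr, ar, rs, free_rbs):
    oc, rd, rs1, rs2 = instr
    # Sources are renamed and fetched before the destination is remapped.
    op1 = fetch_operand(rs1, mapping, rr, ar)
    op2 = fetch_operand(rs2, mapping, rr, ar)
    rd_new = free_rbs.pop(0)
    rr[rd_new] = (None, False)      # V = 0 until the instruction finishes
    mapping[rd] = rd_new
    rs.append({"oc": oc, "rd": rd_new, "op1": op1, "op2": op2})

def broadcast(rd_tag, result, rr, rs):
    """After execution: update the RR and every RS entry waiting on the tag."""
    rr[rd_tag] = (result, True)
    for entry in rs:
        for key in ("op1", "op2"):
            tag, valid = entry[key]
            if not valid and tag == rd_tag:
                entry[key] = (result, True)

def ready(entry):
    return entry["op1"][1] and entry["op2"][1]

ar = {"r1": 0, "r2": 3, "r3": 4, "r4": 0}
mapping, rr, rs, free_rbs = {}, {}, [], [0, 1, 2]
dispatch(("mul", "r1", "r2", "r3"), mapping, rr, ar, rs, free_rbs)
dispatch(("add", "r4", "r1", "r2"), mapping, rr, ar, rs, free_rbs)
print(ready(rs[0]), ready(rs[1]))   # the add waits for the tag of r1
broadcast(0, 12, rr, rs)            # the mul finishes with result 12
print(ready(rs[1]), rs[1]["op1"])
```

The reservation station therefore holds operand values wherever they were available at dispatch, and tags that are replaced by values when the producing instruction finishes.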

3. Operation of register renaming (4)

[Figure 3.2: An FX core assuming buffered issue and issue-bound operand fetching. Blocks: mapping table, reservation station (RS), rename register file (RR) with valid bits (V), architectural register file (AR), execution unit (EU) with bypassing.]

Steps annotated in the figure:
  • Dispatch: decoded instructions (OC, Rd, Rs1, Rs2) arrive; the destination and source registers are renamed through the mapping table (Rd', Rs1', Rs2'); the instructions are dispatched into the RS. An RS entry holds OC, Rd', Rs1' and Rs2'.
  • Issue: the availability of (Rs1') and (Rs2') is checked; the instruction is issued when its operands are valid, and the operands (Op1, Op2) are fetched.
  • Execute: OC, Rd' and the operands are passed to the EU; the RR is updated with the result and Rd' when the instruction is finished.
  • Retire: the AR is updated when the instruction retires.
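For comparison, the issue-bound scheme of Figure 3.2 can be sketched in the same style. Again, the data layout and function names are hypothetical and retirement is omitted. Here the RS keeps only register identifiers, written as `("AR", name)` or `("RR", index)`, and the values are read at issue time.

```python
def dispatch(instr, mapping, rr_valid, rs, free_rbs):
    oc, rd, rs1, rs2 = instr
    # Only register identifiers are stored; no values are read at dispatch.
    src1 = mapping.get(rs1, ("AR", rs1))
    src2 = mapping.get(rs2, ("AR", rs2))
    rb = free_rbs.pop(0)
    rr_valid[rb] = False
    mapping[rd] = ("RR", rb)
    rs.append({"oc": oc, "rd": rb, "src1": src1, "src2": src2})

def available(src, rr_valid):
    kind, idx = src
    return kind == "AR" or rr_valid[idx]

def try_issue(entry, rr, rr_valid, ar):
    """Issue if both operands are valid, fetching their values now."""
    if not (available(entry["src1"], rr_valid)
            and available(entry["src2"], rr_valid)):
        return None

    def read(src):
        kind, idx = src
        return ar[idx] if kind == "AR" else rr[idx]

    return entry["oc"], entry["rd"], read(entry["src1"]), read(entry["src2"])

def finish(rd, result, rr, rr_valid):
    rr[rd] = result
    rr_valid[rd] = True

ar = {"r1": 0, "r2": 3, "r3": 4, "r4": 0}
mapping, rr, rr_valid, rs, free_rbs = {}, {}, {}, [], [0, 1, 2]
dispatch(("mul", "r1", "r2", "r3"), mapping, rr_valid, rs, free_rbs)
dispatch(("add", "r4", "r1", "r2"), mapping, rr_valid, rs, free_rbs)
first = try_issue(rs[0], rr, rr_valid, ar)
blocked = try_issue(rs[1], rr, rr_valid, ar)
finish(0, 12, rr, rr_valid)
second = try_issue(rs[1], rr, rr_valid, ar)
print(first, blocked, second)
```

Compared with the dispatch-bound sketch, the RS entries are narrower, since they hold identifiers rather than values, and no result needs to be written back into the RS.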

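4. Design parameters of register renaming (1)

RISC processors

Columns: processor type (year of volume shipment) | type of rename buffers | number of rename buffers, FX | number of rename buffers, FP | dispatch rate | width of the issue window (wdw) | total number of rename buffers (nr) | reorder width (nROB)

PowerPC 603 (1993) | ren. reg. file | n.a. | 4 | 3 | 3 | n.a. | 5
PowerPC 604 (1995) | ren. reg. file | 12 | 8 | 4 | 12 | 20 | 16
PowerPC 620 (1996) | ren. reg. file | 8 | 8 | 4 | 15 | 16 | 16

For the remaining processors the values are given column by column, in the order listed; some columns list fewer values than there are processors:

Processors: POWER3 (1998), POWER4 (2001), POWER5 (2004), R10000 (1996), R12000 (1998), Alpha 21264 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999), PM1 (1996)
Type of rename buffers: ren. reg. file, merged, merged, ren. reg. file, merged
Number of rename buffers, FX: 16, 80, 120, 32, 32, 48, 56, 56, 56, 38
Number of rename buffers, FP: 24, 72, 120, 32, 32, 41, 56, 56, 56, 24
Dispatch rate: 4, 5, 5, 4, 4, 4, 4
Width of the issue window (wdw): 23, 78, 82, 48, 48, 35, 56, 56, 56, 36
Total number of rename buffers (nr): 40, 152, 240, 64, 64, 89, 112, 112, 62
Reorder width (nROB): 32, 20*5, 32, 48, 80, 56, 56, 56, 62

Source: Sima, D., "Register Renaming Techniques", Computer Engineering Handbook, CRC Press, 2006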

4. Design parameters of register renaming (2)

CISC (x86) processors

Columns: processor type (year of volume shipment) | type of rename buffer | number of rename buffers, FX | number of rename buffers, FP | dispatch rate | width of the issue window (wdw) | total number of rename buffers (nr) | reorder width (nROB)

Values are given per row in the order listed:

Pentium Pro (1995) | in the ROB | 40, 32, 20, 40, 40
Pentium II (1997), Pentium III (1999) | in the ROB | 40, 40, 32, 32, 20, 20, 40, 40
Pentium 4 (2000) (Willamette) | merged | 128, 32, n.a., 128, 126
Pentium 4 (2002) (Northwood) | merged | 128, 3, n.a., 256?, 2*126?
Pentium 4 (2004) (Prescott) | merged | 256, 3, n.a., 512?, 4*128?
Pentium M (2003), Core (2006), K5 (1995), K6 (1996) | in the ROB | 40, 96, 16, 24, 3, 4, 42, 32, 24, 32, 11(?), 24, 40, 96, 16, 24
K7 (1999) | in the ROB/merged | 72, n.a., 32, 54, 88, 24*3
K8 (2003) | in the ROB/merged | 32, 60, 192, 24*3, 72, 120

Source: Sima, D., "Register Renaming Techniques", Computer Engineering Handbook, CRC Press, 2006

5. Implementation of renaming in superscalars

5.1 The chronology of introducing register renaming

[Figure 5.1: Chronology of introducing register renaming]

Source: Sima, D., "Register Renaming Techniques", Computer Engineering Handbook, CRC Press, 2006

5.2 The basic implementation schemes of register renaming

[Figure: classification of the basic implementation schemes by type of rename buffer and operand fetch policy, with proposals and examples for each scheme.]

Types of rename buffers, each combined with an operand fetch policy (dispatch bound or issue bound):
  • Rename register file
  • Future file
  • Merged architectural and rename register file
  • Holding renamed values in the ROB

Proposals: Smith, Pleszkun (85); Johnson (87); Keller (75); Sohi, Vajapeyam (87)

Examples (in the order listed): PowerPC 603 (93), PowerPC 604 (95), PowerPC 620 (96), POWER3 (98), PA 8000 (96), PA 8200 (97), PA 8500 (99), K7 (FX) (99), K8 (FX) (03), UltraSPARC III (99), PM1 (95) (SPARC64), ES/9000 (92), Pentium Pro (95), POWER1 (90), Pentium II (97), POWER2 (93), Pentium III (99), Pentium M (03), P2SC (96), Core (06), POWER4 (01), POWER5 (04), Nx586 (94), Am29000 (95), R10000 (96), K5 (95), R12000 (99), Lightning* (91), Pentium 4 (00), K6* (97), K7 (FP) (99), K8 (FP) (03)

6. Examples (1): Rename register file

[Figure 6.1: The microarchitecture of the POWER3]

Source: Song, P., "IBM's Power3 to Replace P2SC", Microprocessor Report, Nov. 17, 1997

6. Examples (2): Future file

WARF: Working and Architectural Register File (future file)

[Figure 6.2: The microarchitecture of the UltraSPARC-III]

Source: Horel, T., "UltraSPARC-III", IEEE Micro, May-June 1999, pp. 73-95

6. Examples (3): Merged architectural and rename register file

[Figure 6.3: The microarchitecture of the Alpha 21264]

Source: Kessler, R. E. et al., "The Alpha 21264 Microprocessor Architecture", h18002.www1.hp.com/alphaserver

6. Examples (4): Holding renamed values in the ROB

[Figure 6.4: The microarchitecture of the Core processor]

Source: Kanter, D., "Intel's next Generation Microarchitecture Unveiled", Real World Tech., March 9, 2006