Register Renaming & Value Prediction
Overview
► Need for Post-RISC
► Register Renaming vs. Allocation Strategies
► How to compile for Post-RISC machines
► Dynamic Register Renaming through Virtual-Physical Registers
Software Outlives Hardware
► How to make old software run faster?
  • Faster CPU clock and memory hierarchy
  • Adapt CPUs to actual software (profiling/tuning)
  • More instructions per cycle
► Today’s software will run on tomorrow’s CPUs
  • Need to keep software interface stable
  • More functional units and registers
Compile-time vs. Run-time
► Little is known about software at compile-time
► Space/time trade-offs
  • Memory speeds cannot keep up with CPU speeds
  • When to apply optimizations that increase code size
Solutions
► New scalable architecture (IA-64)
  • Decouple physical/virtual registers using register windows
  • More explicit parallelism allows for more functional units
  • Explicit speculative instructions
► Post-RISC architecture
  • Remove limits in superscalar implementation of existing architectures
  • Extract even more parallelism out of existing software
True Data Dependencies
► Also called read-after-write (RAW) hazards
► An instruction may use a result produced by the previous instruction
  • The two instructions cannot execute simultaneously in multiple pipelines
  • The second instruction must typically be stalled
► Anti- (WAR) and output (WAW) dependencies are name dependencies; register renaming removes them
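The three register hazards can be told apart mechanically. A minimal sketch, assuming each instruction is reduced to one destination register and a tuple of source registers (the `Instr` type and `hazards` function are illustrative names, not from the slides):

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]      # register written, or None if none is written
    srcs: Tuple[str, ...]    # registers read


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Return the register hazards that `second` has on the earlier `first`."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")     # true dependency: second reads first's result
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")     # anti-dependency: second overwrites a source of first
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")     # output dependency: both write the same register
    return found
```

Only RAW reflects a real flow of data; WAR and WAW arise purely from reusing a register name.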
Structural Dependencies
► Stalls result in less than optimal performance
  • We may have single-issue cycles, which process only a single instruction
  • Worse, we may have zero-issue cycles, which initiate no new instructions
► Data dependencies can also limit performance for a scalar machine
  • Two-cycle memory load/write
  • Intra-instruction dependencies
Scheduling
► Scheduling can remove stalls
► Intra-instruction dependencies cannot be removed by scheduling (CISC)
Need for Post-RISC
► Superscalar has diminishing returns in IPC (instructions per clock), shown with utilization of the issue width
  • 2-way: 1.6-1.8 (85%)
  • 4-way: 2.6 (65%)
  • 8-way: ???
► More parallelism needed
► Look beyond set of 4 instructions
Post-RISC characteristics
► Out-of-order execution
  • (Existed 20 years ago on IBM and CDC)
  • Innovative for single chip
  • Branch history bits
► Precise interrupts
► Fetch/Flow Prediction
► More caching
  • Instruction cache becomes CPU scratch space
► Register renaming
  • First in IBM 360/91 FPU
SPECint92 Trends
► SPECint92 numbers are increasing
  • DEC has historically been the champ
► SPECint92/clock rates
  • DEC low (21164@300 => 1.14, 10/95)
  • IBM strong early (580H@55 => 1.76, 9/93)
  • HP (PA-8000@133 => 2.7, 10/95)
The Post-RISC Architecture
Post-RISC CPUs
► Traditional RISC
  • DEC Alpha 21164
  • Sun UltraSPARC 1
► (partially) Post-RISC
  • PowerPC 604
  • MIPS R10000
  • HP PA-8000
  • Intel Pentium Pro
  • DEC Alpha 21264
  • HAL SPARC64
Automatic Register Renaming
► Every write to a register allocates a new register instance R
► The register name A is an alias for the last R allocated by a write to A
► An instruction that both reads and writes a register allocates a new R too
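The aliasing rule can be sketched in a few lines. This assumes instructions are given as `(dest, srcs)` pairs, that any name starting with `r` is a register, and that instances are numbered per name (`rA1`, `rA2`, ...); all of these are conventions chosen for the illustration:

```python
def rename(program, is_reg=lambda name: name.startswith("r")):
    """Rewrite register names so that every register write gets a fresh instance.

    program: list of (dest, srcs) tuples in program order.
    Non-register names (memory locations, constants) pass through unchanged.
    """
    version = {}   # register name -> number of the last instance allocated
    renamed = []
    for dest, srcs in program:
        # Sources read the instance the name currently aliases
        # (instance 0 stands for the value held before the program starts).
        new_srcs = tuple(
            f"{s}{version.get(s, 0)}" if is_reg(s) else s for s in srcs
        )
        if is_reg(dest):
            # Every write allocates a new instance, even when the same
            # instruction also reads the register.
            version[dest] = version.get(dest, 0) + 1
            dest = f"{dest}{version[dest]}"
        renamed.append((dest, new_srcs))
    return renamed
```

Applied to code that reuses the single name `rA` for every value, this yields the instances `rA1` through `rA4`.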
Advantages over More ISA Registers
► Smaller instructions
► Allow same software to run on range of implementations
  • Compare the same program running on Pentium or AMD Ath
► Less state to save
  • Faster function calls
  • Faster context switches
  • Lifetimes can be optimized
Renaming Implementation
► Rename Storage Locations
  • Reorder Buffer
  • Physical Register File
► Similarities:
  • Allocate at decode
  • Release at commit
Renaming using Reorder buffer
► Results are kept in reorder buffer
► Source operands are read either from
  • the register file, or
  • a reorder buffer entry
► Not-yet-ready results are forwarded to instruction queue
► Used by Intel Pentium III, PowerPC 604, SPARC64
Renaming on Pentium III
► All register classes can be renamed (general-purpose, floating-point, status)
► Renaming uses a reorder buffer with 40 entries
  • FPU control/status cannot be renamed
  • Max 2 renamings per instruction
Register Allocation Example
► Minimal number of named registers
► Scheduling is limited
► Strictly serial execution
Source:
    Mem2 := Mem1 * Mem1;
    Mem4 := Mem3 + 1;
Allocated code:
    rA := Mem1;
    rA := rA * rA;
    Mem2 := rA;
    rA := Mem3;
    rA := rA + 1;
    Mem4 := rA;
Renaming using Physical Register File
► Register file contains more registers than defined in ISA (logical registers)
► Map logical registers to physical registers during decode
► Operands are always read from the physical register file
► Used by MIPS R10000 and DEC 21264
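A minimal sketch of this scheme, assuming a simple map table plus free list and in-order commit. The class and method names are made up for the illustration, and real designs such as the R10000 add checkpointing and recovery that are omitted here:

```python
from collections import deque


class PhysicalRenamer:
    """Map table and free list: allocate at decode, release at commit."""

    def __init__(self, logical, n_physical):
        assert n_physical > len(logical), "need spare physical registers"
        # Initially logical register i lives in physical register i.
        self.map = {name: i for i, name in enumerate(logical)}
        self.free = deque(range(len(logical), n_physical))
        # Per in-flight instruction: the mapping it replaced, freed at commit.
        self.in_flight = deque()

    def decode(self, dest, srcs):
        """Rename one instruction; returns (dest physical, source physicals)."""
        phys_srcs = [self.map[s] for s in srcs]      # read the current mapping
        if not self.free:
            raise RuntimeError("stall: no free physical register")
        new = self.free.popleft()                    # allocate at decode
        self.in_flight.append(self.map[dest])        # remember old mapping
        self.map[dest] = new
        return new, phys_srcs

    def commit(self):
        """Commit the oldest instruction: its predecessor's register is dead."""
        self.free.append(self.in_flight.popleft())   # release at commit
```

The register replaced by a write can only be released when that write commits, because older instructions may still need to read it until then.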
Virtual-Physical Registers
► Motivation: better utilization of physical registers
  • Important in presence of long latency instructions
► Conventional scheme “wastes” a register for each:
  • Decoded instruction that has not finished execution
  • Committed instruction whose result is dead (can be eliminated by maintaining a reference counter)
Example:
    load  f2, 0(r6)
    fdiv  f2, f10
    fmul  f2, f12
    fadd  f2, 1
Virtual-Physical Register Renaming
► General Map Table (GMT)
  • Indexed by logical register L
  • VP register: last virtual-physical register that L has been mapped to
  • P register: last physical register that L and VP have been mapped to
  • V bit: indicates whether P is valid
► Physical Map Table (PMT)
  • Has an entry for each VP
  • Contains the last physical register that VP has been mapped to
Functional Description
► For each logical source register S do a GMT lookup
  • If the V bit is set, rename S to P
  • Otherwise, rename S to VP
► Rename the logical destination register to a new VP
► Update GMT: set VP to the new mapping and reset the V bit
► Save the previous VP in the reorder buffer to be able to roll back
Functional Description
► Instruction Queue Fields:
  • Operation code
  • Destination VP
  • Source operands
  • Ready bits for source operands (when ready, the source operand contains a physical register number)
► Reorder Buffer Entry
  • Destination logical register
  • Completion bit
  • VP mapping of last instruction with same logical destination
Functional Description
► When source operands are ready, the instruction is issued
► When the instruction completes:
  • A new physical register R is allocated for the result
  • The PMT is updated to reflect the new mapping
  • The VP number of the destination is broadcast to all entries in the instruction queue, together with the physical register identifier
  • The GMT is updated: the entry corresponding to the logical destination is checked for a match with the VP and, if it matches, the physical register number is copied to the P register field and the V flag is set
  • As a result, a new instruction using the same logical register will find the corresponding physical register in the GMT
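The decode and completion steps described above can be sketched as follows. This is an illustration under simplifying assumptions, not the hardware design: tables are Python dicts, VP numbers are never recycled, commit and rollback are left out, and the queue entry also carries the logical destination for brevity. All class, method and field names are hypothetical.

```python
from collections import deque
from itertools import count


class VPRenamer:
    """Virtual-physical renaming: a physical register is allocated at completion."""

    def __init__(self, logical, n_physical):
        self.free = deque(range(n_physical))   # free physical registers
        self.pmt = {}                          # PMT: VP -> physical register
        self.gmt = {}                          # GMT: logical -> VP, P, V bit
        self.queue = []                        # instruction queue entries
        self.rob = []                          # reorder buffer entries
        self._next_vp = count()
        for name in logical:                   # initial architectural state
            vp, p = next(self._next_vp), self.free.popleft()
            self.gmt[name] = {"vp": vp, "p": p, "v": True}
            self.pmt[vp] = p

    def decode(self, op, dest, srcs):
        operands = []
        for s in srcs:                         # GMT lookup per source
            e = self.gmt[s]
            operands.append(("P", e["p"]) if e["v"] else ("VP", e["vp"]))
        new_vp = next(self._next_vp)           # destination gets a VP only
        # Keep the previous VP so the mapping could be rolled back.
        self.rob.append({"dest": dest, "done": False,
                         "prev_vp": self.gmt[dest]["vp"]})
        self.gmt[dest] = {"vp": new_vp, "p": None, "v": False}
        entry = {"op": op, "dest": dest, "dest_vp": new_vp, "srcs": operands}
        self.queue.append(entry)
        return entry

    @staticmethod
    def ready(entry):
        """An instruction may issue once every source names a physical register."""
        return all(kind == "P" for kind, _ in entry["srcs"])

    def complete(self, entry):
        p = self.free.popleft()                # allocate the physical register now
        vp = entry["dest_vp"]
        self.pmt[vp] = p
        for q in self.queue:                   # broadcast (VP, P) to the queue
            q["srcs"] = [("P", p) if s == ("VP", vp) else s for s in q["srcs"]]
        g = self.gmt[entry["dest"]]
        if g["vp"] == vp:                      # still the latest mapping?
            g["p"], g["v"] = p, True
        return p
```

With the `load`/`fdiv` pair from the earlier example, decoding both instructions consumes no physical register; the `fdiv` becomes ready only after the `load` completes and its VP is broadcast.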
Register Allocation Example
► Uses more named registers
► Scheduling more effective
► 2-way superscalar execution
Source:
    Mem2 := Mem1 * Mem1;
    Mem4 := Mem3 + 1;
2-way schedule (one column per pipeline):
    rA := Mem1;        rB := Mem3;
    rA := rA * rA;     rB := rB + 1;
    Mem2 := rA;        Mem4 := rB;
Effect of Register Renaming
► Schedule uses 4 hardware registers
► 2-way superscalar execution
    rA1 := Mem1;         rB1 := Mem3;
    rA2 := rA1 * rA1;    rB2 := rB1 + 1;
    Mem2 := rA2;         Mem4 := rB2;
Effect of Register Renaming
► Schedule uses 4 hardware registers
► Can hide memory-write latency
► Still no full use of multiple pipelines
    rA1 := Mem1;
    rA2 := rA1 * rA1;
    Mem2 := rA2;
    rA3 := Mem3;
    rA4 := rA3 + 1;
    Mem4 := rA4;
Renaming and O-O-O execution
► Instructions wait for:
  • Availability of execution unit
  • Input dependencies
  • Older instructions have priority
  • Load instructions have priority
► Instructions do NOT wait for:
  • Program order
  • Branch resolution
  • Output dependencies (use “rename register”)
Renaming and O-O-O execution
► Schedule uses 4 hardware registers
► Can hide memory-write latency
► “Bad” schedule uses both pipelines
► Only one register name used
    rA1 := Mem1;
    rA2 := rA1 * rA1;
    Mem2 := rA2;
    rA3 := Mem3;
    rA4 := rA3 + 1;
    Mem4 := rA4;
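To see why the serial-looking schedule still fills both pipelines, here is a small issue model. It assumes the program is already renamed (each name written once), every instruction has unit latency, and the only issue rules are input dependencies, issue width and oldest-first priority; the `schedule` function is a hypothetical helper, not a model of any specific CPU.

```python
def schedule(program, width=2):
    """Greedy out-of-order issue of a renamed program.

    program: list of (dest, srcs) in program order, each dest written once.
    Returns a list of cycles, each a list of instruction indices issued.
    """
    produced = {dest for dest, _ in program}   # names written by the program
    ready = set()                              # results already available
    waiting = list(range(len(program)))
    cycles = []
    while waiting:
        issued = []
        for i in waiting:                      # oldest instructions first
            if len(issued) == width:
                break
            _, srcs = program[i]
            # A source is available if it was never pending or is now ready.
            if all(s in ready or s not in produced for s in srcs):
                issued.append(i)
        if not issued:
            raise ValueError("no instruction can issue: unsatisfiable source")
        for i in issued:                       # results visible next cycle
            ready.add(program[i][0])
        waiting = [i for i in waiting if i not in issued]
        cycles.append(issued)
    return cycles


renamed = [
    ("rA1", ("Mem1",)),
    ("rA2", ("rA1", "rA1")),
    ("Mem2", ("rA2",)),
    ("rA3", ("Mem3",)),
    ("rA4", ("rA3",)),
    ("Mem4", ("rA4",)),
]
print(schedule(renamed))   # [[0, 3], [1, 4], [2, 5]]
```

Under these assumptions the six instructions issue in three cycles, pairing each step of the first computation with the matching step of the second, even though the compiler used a single register name.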
Renaming-aware scheduling?
► Use Register Renaming in allocator
  • minimal number of named registers
  • maximal number of register instances
► Do not do scheduling that the CPU can do
  • over-scheduling can be worse than no scheduling at all