IA-64 Microarchitecture: Itanium Processor
Jun Feng, Jun Xie, Huafeng Lü

Outline
  • Introduction
  • Pipeline
  • Issue
  • Performance Comparison
  • Summary

Itanium Processor
  • First implementation of IA-64
  • Compiler-based exploitation of ILP
  • Also has many features of a superscalar processor

10-stage Pipeline
  • Front-end
  • Instruction delivery
  • Operand delivery
  • Execution

Front-end
  • IPG, Fetch, Rotate
  • Prefetches up to 32 bytes per cycle (2 bundles) into a prefetch buffer that holds up to 8 bundles
  • Branch prediction is done using a multilevel adaptive predictor
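The "32 bytes per cycle (2 bundles)" figure follows from the IA-64 bundle format: a bundle is 128 bits, made of a 5-bit template and three 41-bit instruction slots. The sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here models the actual fetch hardware.

```python
# Illustrative sketch of the IA-64 bundle layout (not a model of the real front-end).
BUNDLE_BYTES = 16      # 128 bits per bundle
TEMPLATE_BITS = 5      # bits 0-4: template field
SLOT_BITS = 41         # three 41-bit instruction slots follow the template


def decode_bundle(raw: bytes):
    """Split one 16-byte little-endian bundle into (template, [slot0, slot1, slot2])."""
    if len(raw) != BUNDLE_BYTES:
        raise ValueError("a bundle is exactly 16 bytes")
    word = int.from_bytes(raw, "little")
    template = word & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (word >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(3)
    ]
    return template, slots


def fetch_window(stream: bytes, offset: int, width: int = 32):
    """Decode the whole bundles covered by one fetch of `width` bytes."""
    chunk = stream[offset:offset + width]
    return [
        decode_bundle(chunk[i:i + BUNDLE_BYTES])
        for i in range(0, len(chunk) - BUNDLE_BYTES + 1, BUNDLE_BYTES)
    ]
```

A 32-byte fetch therefore yields two bundles, or up to six instructions, which matches the six-instruction width of the instruction delivery stages.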

Instruction delivery
  • EXP and REN
  • Distributes up to 6 instructions to the 9 functional units
  • Implements register renaming for both rotation and register stacking
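The renaming for rotation and register stacking can be pictured as simple modular arithmetic on register numbers. The following is a minimal sketch under stated assumptions: 32 static plus 96 stacked general registers, a rotating region that starts at r32, and hypothetical parameter names (`bof` for the physical offset of the current frame, `sor` for the size of the rotating region, `rrb` for the rotating register base). It is not a description of the actual REN-stage logic.

```python
NUM_STATIC = 32    # r0-r31 are static and never renamed
NUM_STACKED = 96   # assumed size of the stacked physical register file


def rename(virt: int, bof: int, sor: int, rrb: int) -> int:
    """Map a virtual general register number to a physical one (simplified)."""
    if virt < NUM_STATIC:
        return virt
    idx = virt - NUM_STATIC
    if idx < sor:
        # Rotation: offset by the rotating register base, modulo the region size.
        idx = (idx + rrb) % sor
    # Stacking: offset by the bottom of the current frame, wrapping around.
    return NUM_STATIC + (bof + idx) % NUM_STACKED
```

With `sor = 8`, a value written to r32 while `rrb = 0` is read back as r33 once `rrb` has been decremented to 7 (that is, -1 modulo 8), which is the effect software-pipelined loops rely on.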

Operand delivery
  • WLD and REG
  • Accesses the register file
  • Performs register bypassing
  • Accesses and updates a register scoreboard
  • Checks predicate dependences
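As a rough illustration of what a register scoreboard does, the sketch below tracks which registers still have a result in flight and stalls any instruction that reads one of them. The class and method names are hypothetical, and the predicate handling is a simplifying assumption (an instruction whose qualifying predicate is false is treated as nullified and never stalls); the real Itanium logic is more involved.

```python
class Scoreboard:
    """Toy register scoreboard: one pending flag per register name."""

    def __init__(self):
        self.pending = set()  # registers whose result has not been written yet

    def must_stall(self, sources, predicate=True):
        """Return True if an instruction reading `sources` has to wait."""
        if not predicate:
            # Assumption: a predicated-off instruction is nullified, so it
            # does not need its operands and does not stall.
            return False
        return any(reg in self.pending for reg in sources)

    def issue(self, dest):
        """Mark `dest` as pending when its producer is issued."""
        self.pending.add(dest)

    def complete(self, dest):
        """Clear the pending flag when the result becomes available."""
        self.pending.discard(dest)
```

For example, after `issue("r4")` an instruction reading r4 must stall until `complete("r4")` is called, unless its predicate is false.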

Execution
  • EXE, DET and WRB
  • Executes instructions through ALUs and load/store units
  • Detects exceptions and posts NaTs
  • Retires instructions and performs write-back

Integer Performance
SPECint benchmark: considerably slower
  • Itanium is considerably slower than Alpha 21264 and Pentium 4
  • Only 60% of P4, 68% of Alpha

Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3 cache
Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache
Pentium 4: Compaq Precision 330, 2 GHz, 256 KB on-chip L2 cache

Floating Point Performance
SPECfp benchmarks: a different story
  • Itanium is quicker than Alpha 21264 and Pentium 4
  • 108% of P4, 120% of Alpha

Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3 cache
Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache
Pentium 4: Compaq Precision 330, 2 GHz, on-chip L2 cache
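Because the three systems run at very different clock rates, it can help to normalize the quoted SPEC ratios per clock. The sketch below does that arithmetic using only the percentages and clock rates listed on the integer and floating-point slides; the helper name is hypothetical, and per-clock ratios are a rough indicator only, since they ignore memory systems and compilers.

```python
def per_clock_ratio(relative_perf: float, own_mhz: float, other_mhz: float) -> float:
    """Scale a performance ratio (own / other) to a per-clock-cycle ratio."""
    return relative_perf * other_mhz / own_mhz


ITANIUM_MHZ, ALPHA_MHZ, P4_MHZ = 800, 1000, 2000

# SPECint: Itanium at 60% of P4 and 68% of Alpha.
int_vs_p4 = per_clock_ratio(0.60, ITANIUM_MHZ, P4_MHZ)
int_vs_alpha = per_clock_ratio(0.68, ITANIUM_MHZ, ALPHA_MHZ)

# SPECfp: Itanium at 108% of P4 and 120% of Alpha.
fp_vs_p4 = per_clock_ratio(1.08, ITANIUM_MHZ, P4_MHZ)
fp_vs_alpha = per_clock_ratio(1.20, ITANIUM_MHZ, ALPHA_MHZ)
```

Per clock, Itanium comes out at roughly 1.5 times the P4 and 0.85 times the Alpha on integer code, and roughly 2.7 times the P4 and 1.5 times the Alpha on floating point.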

Discussion on SPECfp
  • Floating-point applications: competitive
      • Higher degrees of ILP
      • Aggressive memory system
  • art benchmark: 4 times the performance of Pentium 4
  • Alpha: outperforms when tuned
  • In terms of power: worse than P4, with 56% of its floating-point performance per watt
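The power figure can be combined with the SPECfp ratio to estimate relative power draw. This is a back-of-the-envelope sketch that assumes the 56% performance-per-watt figure is measured against the Pentium 4 on the same SPECfp workload as the 108% performance figure; the variable names are illustrative.

```python
# Ratios of Itanium to Pentium 4, taken from the slides.
fp_perf_ratio = 1.08           # SPECfp performance
fp_perf_per_watt_ratio = 0.56  # SPECfp performance per watt

# perf_per_watt = perf / power, so power = perf / perf_per_watt.
implied_power_ratio = fp_perf_ratio / fp_perf_per_watt_ratio
```

Under that assumption, Itanium would draw a little under twice the power of the Pentium 4 to deliver about 8% more floating-point performance.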

Summary By Us
  • Good floating-point performance
  • Poor integer performance
  • Overall: not as good as Intel has advertised

Conclusion
  • Large code size
  • Only static instruction-level parallelism
  • Cannot manage cache misses/hits flexibly
  • Lack of applications