IA-64 Microarchitecture: Itanium Processor
Jun Feng Jun Xie Huafeng Lü
Outline
- Introduction
- Pipeline
- Issue
- Performance Comparison
- Summary
Itanium Processor
- First implementation of IA-64
- Compiler-based exploitation of ILP
- Also has many features of a superscalar processor
10-stage Pipeline
- Front-end
- Instruction delivery
- Operand delivery
- Execution
Front-end
- Stages: IPG, Fetch, Rotate
- Prefetches up to 32 bytes per cycle (2 bundles) into a prefetch buffer that holds up to 8 bundles
- Branch prediction is done using a multilevel adaptive predictor
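The fetch figures above follow from the IA-64 bundle format: a bundle is 128 bits (16 bytes) and carries a 5-bit template plus three 41-bit instruction slots, so 32 bytes per cycle is two bundles, or six instructions. A minimal sketch of that layout follows; the names `split_bundle` and `bundles_per_fetch` are hypothetical and chosen for illustration only, and this is not a decoder.

```python
BUNDLE_BITS = 128     # one IA-64 bundle is 16 bytes
TEMPLATE_BITS = 5     # the template field occupies the low 5 bits
SLOT_BITS = 41        # three instruction slots of 41 bits each

def split_bundle(bundle):
    """Split a 128-bit bundle value into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(3)
    ]
    return template, slots

def bundles_per_fetch(fetch_bytes=32):
    """Number of whole bundles delivered by one fetch of fetch_bytes bytes."""
    return fetch_bytes // (BUNDLE_BITS // 8)
```

With these constants, 5 + 3 × 41 = 128 bits, and the 8-bundle prefetch buffer corresponds to 128 bytes of buffered instructions.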
Instruction delivery
- Stages: EXP and REN
- Distributes up to 6 instructions to the 9 functional units
- Implements register renaming for both rotation and register stacking
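Register rotation can be pictured as adding a rename base to the logical register number, modulo the size of the rotating region. The sketch below is a simplified illustration under stated assumptions: it covers only general registers from r32 upward, ignores register stacking, predicates and floating-point registers, and uses hypothetical names (`physical_register`, `rrb`, `rotating_size`).

```python
FIRST_ROTATING = 32   # general registers r32 and up can rotate

def physical_register(logical, rrb, rotating_size):
    """Map a logical register number to a physical one (simplified)."""
    if logical < FIRST_ROTATING or logical >= FIRST_ROTATING + rotating_size:
        return logical  # outside the rotating region: unchanged in this sketch
    offset = (logical - FIRST_ROTATING + rrb) % rotating_size
    return FIRST_ROTATING + offset

# A value written to r32 in one iteration is read as r33 in the next,
# because the rename base is decremented at each loop-closing branch.
rrb = 0
written = physical_register(32, rrb, rotating_size=8)
rrb = (rrb - 1) % 8   # effect of the loop branch on the rename base
read_next = physical_register(33, rrb, rotating_size=8)
```

Here `written` and `read_next` name the same physical register, which is what lets software-pipelined loops pass values between iterations without copy instructions.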
Operand delivery
- Stages: WLD and REG
- Accesses the register file
- Performs register bypassing
- Accesses and updates a register scoreboard
- Checks predicate dependences
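As a rough illustration of what a register scoreboard does, the sketch below tracks which registers still have a result in flight and refuses to issue an instruction that reads or writes one of them. This is a generic textbook scoreboard under simplifying assumptions, not Itanium's actual logic; the class and method names are hypothetical.

```python
class Scoreboard:
    """Tracks registers whose results have not been written back yet."""

    def __init__(self):
        self.pending = set()  # registers with a result still in flight

    def can_issue(self, sources, dest):
        """True if no source or destination register is still pending."""
        return not (self.pending & (set(sources) | {dest}))

    def issue(self, dest):
        """Mark dest as pending when an instruction producing it issues."""
        self.pending.add(dest)

    def complete(self, dest):
        """Clear dest once its result is available."""
        self.pending.discard(dest)
```

A load into `r4` would call `issue("r4")`; a dependent add reading `r4` stalls until `complete("r4")` is called.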
Execution
- Stages: EXE, DET and WRB
- Executes instructions through ALUs and load/store units
- Detects exceptions and posts NaTs
- Retires instructions and performs write-back
Integer Performance
- SPECint benchmark: considerably slower
- Itanium is considerably slower than Alpha 21264 and Pentium 4: only 60% of Pentium 4 and 68% of Alpha
- Test systems:
  - Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3 cache
  - Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache
  - Pentium 4: Compaq Precision 330, 2 GHz, 256 KB on-chip L2 cache
Floating Point Performance
- SPECfp benchmarks: a different story
- Itanium is quicker than Alpha 21264 and Pentium 4: 108% of Pentium 4 and 120% of Alpha
- Test systems:
  - Itanium: HP rx4610, 800 MHz, 4 MB off-chip L3 cache
  - Alpha 21264: Compaq GS320, 1 GHz, on-chip L2 cache
  - Pentium 4: Compaq Precision 330, 2 GHz, on-chip L2 cache
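The three machines run at very different clock rates, so it can help to divide the clock out of the relative scores. The sketch below does that arithmetic with the figures quoted on the slides. It assumes performance scales linearly with clock frequency, which real systems do not do, so the result is only a rough per-cycle view; the dictionary and function names are hypothetical.

```python
CLOCK_MHZ = {"Itanium": 800, "Alpha 21264": 1000, "Pentium 4": 2000}

# Itanium score as a fraction of each rival's score, from the slides.
ITANIUM_RELATIVE = {
    "SPECint": {"Pentium 4": 0.60, "Alpha 21264": 0.68},
    "SPECfp": {"Pentium 4": 1.08, "Alpha 21264": 1.20},
}

def per_clock_ratio(suite, rival):
    """Itanium's score relative to rival, normalized to equal clock rates."""
    ratio = ITANIUM_RELATIVE[suite][rival]
    return ratio * CLOCK_MHZ[rival] / CLOCK_MHZ["Itanium"]

for suite, rivals in ITANIUM_RELATIVE.items():
    for rival in rivals:
        value = per_clock_ratio(suite, rival)
        print(f"{suite} vs {rival}: {value:.2f}x per clock")
```

Under that assumption, the integer figure against Pentium 4 becomes 0.60 × 2000 / 800 = 1.5 per clock, while against Alpha it is 0.68 × 1000 / 800 = 0.85.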
Discussion on SPECfp
- Floating-point applications: competitive
  - Higher degrees of ILP
  - Aggressive memory system
- Art benchmark: 4 times the performance of Pentium 4
- Alpha: outperforms Itanium when tuned
- In terms of power: worse than Pentium 4, with 56% of its floating-point performance per watt
Summary By Us
- Good floating-point performance
- Poor integer performance
- Overall: not as good as Intel has advertised
Conclusion
- Large code size
- Only static instruction-level parallelism
- Cannot manage cache misses/hits flexibly
- Lack of applications