Announcing the IA-64 Architecture (Hans Mulder, Jerry Huck)

  • Slides: 34

Announcing the IA-64 Architecture
Hans Mulder, Lead Architect, Intel Corporation
Jerry Huck, Manager and Lead Architect, Hewlett Packard Co.
Introduction by: Albert Yu, Senior Vice President and General Manager, Microprocessor Products Group, Intel Corporation

Agenda
• Introduction
• IA-64 Architecture Announcement
• IA-64 - Inside the Architecture
• Features for E-business
• Features for Technical Computing
• Summary

IA-64: A New Computing Era
• Most significant architecture advancement since 32-bit computing with the 80386
  – 80386: multi-tasking, advances from 16 bit to 32 bit
  – Merced: explicit parallelism, advances from 32 bit to 64 bit
• Application Instruction Set Architecture Guide
  – Complete disclosure of IA-64 application architecture
• Result of the successful collaboration between Intel and HP

Creating Complete IA-64 Solutions
[Diagram: programs surrounding "Internet, Enterprise, and Workstation IA-64 Solutions"]
• Intel 64 Fund
• Enterprise Technology Centers
• High-end Platform Initiatives
• Application Instruction Set Architecture Guide
• Development Systems
• Operating Systems
• Intel Developer Forum
• Application Solution Centers
• Tools
• Software Enabling Programs
Industry wide IA-64 development

IA Server/Workstation Roadmap
[Roadmap diagram. Labels: IA-64 Perf (Madison), IA-64 Price/Performance (Deerfield), McKinley, Merced, Foster, ... Future IA-32 ..., Pentium® III Xeon™ Proc., Pentium® II Xeon™ Processor. Time axis from ’98 through ’03, with process markers .25µ, .18µ, .13µ.]
IA-64 starts with Merced processor
All dates specified are target dates provided for planning purposes only and are subject to change.

IA-64 Architecture Announcement

IA Changing the Face of High End Computing
“Vertical Market Structure” [Diagram: vendors A, B, C, D, each with its own distribution, applications, system software, systems, CPUs]
• Limited Compatibility
• Few Choices
• Proprietary business
“Horizontal Market Structure” [Diagram: channel choices, application choices, OS choices, system choices, Intel Architecture]
• Highly Interoperable
• Many Choices
• Volume economics
Unifying high end computing with a common infrastructure

Merced Industry Rollout
[Timeline diagram, 1999 to 2000]
• Intel 64 Fund
• Merced Prototype Systems
• IA-64 Architecture Public Release
• Production Solutions
• Beta OSs and apps
• Prototypes to ISVs
• Open source software enabling
• Key apps running on simulator
• Compilers/Development tools shipping
• OEM board / systems development
IA-64 application architecture an integral part of a comprehensive plan

IA-64 Application Architecture
• Application instructions and opcodes
  – Instructions available to an application programmer
  – Machine code for these instructions
• Unique architecture features & enhancements
  – Explicit parallelism and templates
  – Predication, speculation, memory support, and others
  – Floating-point and multimedia architecture
• IA-64 resources available to applications
  – Large, application visible register set
  – Rotating registers, register stack engine
• IA-32 & PA-RISC compatibility models
Details now available to the broad industry

Today’s Architecture Challenges
• Performance barriers:
  – Memory latency
  – Branches
  – Loop pipelining and call / return overhead
• Headroom constraints:
  – Hardware-based instruction scheduling
  – Unable to efficiently schedule parallel execution
  – Resource constrained
  – Too few registers
  – Unable to fully utilize multiple execution units
• Scalability limitations:
  – Memory addressing efficiency
IA-64 addresses these limitations

IA-64 Mission
• Overcome the limitations of today’s architectures
• Provide world-class floating-point performance
• Support large memory needs with 64-bit addressability
• Protect existing investments
  – Full binary compatibility with existing IA-32 instructions in hardware
  – Full binary compatibility with PA-RISC instructions through software translation
• Support growing high-end application workloads
  – E-business and internet applications
  – Scientific analysis and 3D graphics
Define the next generation computer architecture

IA-64 Architecture: Explicit Parallelism
[Diagram: Original Source Code, Compiler, Parallel Machine Code, Hardware (multiple functional units ...)]
• IA-64 Compiler Views Wider Scope
• More efficient use of execution resources
Fundamental design philosophy enables new levels of headroom

IA-64: Explicitly Parallel Architecture
128 bits (bundle): Instruction 2 (41 bits) | Instruction 1 (41 bits) | Instruction 0 (41 bits) | Template (5 bits)
Example (MMI): Memory (M), Memory (M), Integer (I)
M=Memory, F=Floating-point, I=Integer, L=Long Immediate, B=Branch
• IA-64 template specifies
  – The type of operation for each instruction
    – MFI, MMI, MII, MLI, MIB, MMF, MFB, MMB, MBB, BBB
  – Intra-bundle relationship
    – M / MI or MI / I
  – Inter-bundle relationship
• Most common combinations covered by templates
  – Headroom for additional templates
• Simplifies hardware requirements
• Scales compatibly to future generations
Basis for increased parallelism
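
The bundle arithmetic on this slide (3 x 41 instruction bits + 5 template bits = 128 bits) can be checked with a minimal sketch. The function names are hypothetical, and placing the template in the lowest 5 bits with Instruction 0 directly above it is an assumption read off the slide's right-to-left layout, not a statement from the deck.

```python
# Sketch of the 128-bit bundle from the slide: one 5-bit template plus
# three 41-bit instruction slots. Bit positions are an assumption here:
# template in bits 0..4, slot 0 above it, then slot 1, then slot 2.

TEMPLATE_BITS = 5
SLOT_BITS = 41
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three instruction slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    for slot in (slot0, slot1, slot2):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction slot must fit in 41 bits")
    return (template
            | (slot0 << TEMPLATE_BITS)
            | (slot1 << (TEMPLATE_BITS + SLOT_BITS))
            | (slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS)))


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, slot0, slot1, slot2)."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    )
    return (template,) + slots
```

The template value passed in is just a placeholder number in this sketch; the actual encoding of MMI, MFI and the other templates is defined in the architecture guide, not here.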

Full Binary IA-32 Instruction Compatibility
[Diagram: IA-32 Instruction Set on IA-64 Hardware (IA-32 Mode) and IA-64 Instruction Set on IA-64 Hardware (IA-64 Mode), connected by "Jump to IA-64", "Branch to IA-32", and "Intercepts, Exceptions, Interrupts"; shared Registers, Execution Units, System Resources]
• IA-32 instructions supported through shared hardware resources
• Performance similar to volume IA-32 processors
Preserves existing software investments

Full Binary Compatibility for PA-RISC
• Transparency:
  – Dynamic object code translator in HP-UX automatically converts PA-RISC code to native IA-64 code
  – Translated code is preserved for later reuse
• Correctness:
  – Has passed the same tests as the PA-8500
• Performance:
  – Close PA-RISC to IA-64 instruction mapping
  – Translation on average takes 1-2% of the time; native instruction execution takes 98-99%
  – Optimization done for wide instructions, predication, speculation, large register sets, etc.
  – PA-RISC optimizations carry over to IA-64

High Performance Computing Applications
• E-business servers
  – Large number of users
  – Large databases
  – High availability
  – Secure environment
• Workstations and high performance technical computing
  – Digital content creation
  – Design engineering (EDA, MDA, etc.)
  – Scientific / financial analysis
IA-64 architecture optimized for these high growth applications

E-Business Environment
[Network diagram. Labels: Front End (Web), IP Services, Mid-tier (Applications), Back-end (Data), IA-64 focus area, E-Commerce, Mail, Security, ERP, DNS, News, Network Hub, Intelligent Storage Server, CSU/DSU, ISDN, ADSL, Cable ..., Production Databases (Failover Cluster), Data Warehouse, DSS (Scalability Cluster), Systems/Network Management]
E-business is compute-intensive, requiring security and support for large databases

IA-64 for High Performance Databases
• Number of branches in large server apps overwhelm traditional processors
  – IA-64 predication removes branches, avoids mispredicts
• Environments with a large number of users require high performance
  – IA-64 uses speculation to reduce impact of memory latency
  – Significant benefit to large databases with many cache accesses
  – 64-bit addressing enables systems with very large virtual and physical memory
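
How predication removes a branch can be illustrated with a toy interpreter. This is only a sketch: the instruction tuples, the op names `cmp.gt` and `sub`, and the function `run_predicated` are hypothetical stand-ins, not IA-64 syntax. A compare writes a pair of complementary predicates, and each later instruction takes effect only if its qualifying predicate is true, so the instruction stream stays straight-line.

```python
def run_predicated(program, regs):
    """Execute a straight-line list of predicated instructions.

    Each instruction is (qualifying_predicate, op, dest, src1, src2).
    An instruction whose predicate is false is skipped in place; control
    never jumps, so there is nothing for a branch predictor to get wrong.
    """
    preds = {"p0": True}  # modeled as always true
    for qp, op, dst, x, y in program:
        if not preds.get(qp, False):
            continue  # squashed: no architectural effect
        if op == "cmp.gt":
            p_true, p_false = dst
            preds[p_true] = regs[x] > regs[y]
            preds[p_false] = not preds[p_true]
        elif op == "sub":
            regs[dst] = regs[x] - regs[y]
        else:
            raise ValueError("unknown op: " + op)
    return regs


# |a - b| without an if/else in the instruction stream (hypothetical program).
ABS_DIFF = [
    ("p0", "cmp.gt", ("p1", "p2"), "a", "b"),
    ("p1", "sub", "r", "a", "b"),  # takes effect only when a > b
    ("p2", "sub", "r", "b", "a"),  # takes effect only when a <= b
]
```

Calling `run_predicated(ABS_DIFF, {"a": 3, "b": 10})` leaves the absolute difference in `regs["r"]` whichever way the comparison goes.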

Middle Tier Application Needs
• Mid-tier applications (ERP, etc.) have diverse code requirements
  – Integer code with many small loops
  – Significant call / return requirements (C++, Java)
• IA-64’s unique register model supports these various requirements
  – Large register file provides significant resources for optimized performance
  – Rotating registers enable efficient loop execution
  – Register stack to handle call-intensive code
IA-64 resources enable optimization for a variety of application requirements

IA-64’s Large Register File
[Register diagram]
• Integer Registers: GR 0 to GR 127, bits 63..0 plus a NaT bit; GR 0 shown as 0; 32 static (GR 0 to GR 31), 96 stacked, rotating (GR 32 to GR 127)
• Floating-Point Registers: bits 81..0; constant 0.0 shown; 96 rotating
• Branch Registers: BR 0 to BR 7, bits 63..0
• Predicate Registers: PR 0 to PR 63, 1 bit each; PR 0 shown as 1; 16 static (PR 0 to PR 15), 48 rotating (PR 16 to PR 63)
Large number of registers enables flexibility and performance

Software Pipelining via Rotating Registers
• Software pipelining - improves performance by overlapping execution of different loop iterations - execute more iterations in the same amount of time
[Diagram: Sequential Loop Execution vs. Software Pipelining Loop Execution, plotted against Time]
• Traditional architectures need complex software loop unrolling for pipelining
  – Results in code expansion --> Increases cache misses --> Reduces performance
• IA-64 utilizes rotating registers to achieve software pipelining
  – Avoids code expansion --> Reduces cache misses --> Higher performance
IA-64 rotating registers enable optimized loop execution
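
The rotating-register idea can be sketched as a small simulation. The class `RotatingRegs` and the function `pipelined_double` are hypothetical names for illustration; the model assumes that rotation shifts the logical-to-physical mapping by one on every pass, so a value written to logical register 0 in one pass is read as logical register 1 in the next. One loop body then serves both pipeline stages, with no unrolled copies.

```python
class RotatingRegs:
    """Toy rotating register file: logical index n maps to (n + base) mod size."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0  # rotating register base

    def _index(self, n):
        return (n + self.base) % len(self.phys)

    def __getitem__(self, n):
        return self.phys[self._index(n)]

    def __setitem__(self, n, value):
        self.phys[self._index(n)] = value

    def rotate(self):
        # After this, what was logical register n is visible as n + 1.
        self.base = (self.base - 1) % len(self.phys)


def pipelined_double(xs):
    """Two-stage pipeline (load, then compute) expressed as a single loop body."""
    regs = RotatingRegs(3)
    out = []
    n = len(xs)
    for i in range(n + 1):  # one extra pass drains the pipeline
        if i < n:
            regs[0] = xs[i]          # stage 1: load for iteration i
        if i >= 1:
            out.append(regs[1] * 2)  # stage 2: compute for iteration i - 1
        regs.rotate()
    return out
```

The two `if` guards play the role of stage enables during pipeline fill and drain; in this sketch they are ordinary Python conditions rather than hardware predicates.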

Traditional Register Models
[Diagram: procedures A, B, C, D sharing one register block, with memory alongside]
• Procedure A calls procedure B
• Procedures must share space in register
• Performance penalty due to register save / restore
Traditional Register Stacks
[Diagram: procedures A, B, C, D each assigned a fixed register block]
• Eliminate the need for save / restore by reserving fixed blocks in register
• However, fixed blocks waste resources
IA-64 significantly improves upon this

IA-64 Register Stack
Traditional Register Stacks
[Diagram: procedures A, B, C, D each assigned a fixed register block]
• Eliminate the need for save / restore by reserving fixed blocks in register
• However, fixed blocks waste resources
IA-64 Register Stack
[Diagram: procedures A, B, C, D each assigned a register block sized to its needs]
• IA-64 able to reserve variable block sizes
• No wasted resources
IA-64 combines high performance and high efficiency
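
The difference between fixed and variable block sizes can be put in numbers with a toy model. `RegisterStack` and `fixed_block_usage` are hypothetical names; the 96-register default is taken from the "96 Stacked" figure on the register file slide, and spilling to memory when the stack overflows is deliberately not modeled.

```python
class RegisterStack:
    """Toy model: each call reserves exactly the registers it asks for."""

    def __init__(self, physical_registers=96):
        self.physical_registers = physical_registers
        self.frames = []  # sizes of the live frames, innermost last

    def in_use(self):
        return sum(self.frames)

    def call(self, frame_size):
        if self.in_use() + frame_size > self.physical_registers:
            raise OverflowError("out of registers (spilling not modeled)")
        self.frames.append(frame_size)

    def ret(self):
        self.frames.pop()


def fixed_block_usage(call_depth, block_size):
    """Registers reserved when every procedure gets the same fixed block."""
    return call_depth * block_size
```

For a call chain A, B, C, D needing, say, 4, 10, 3 and 7 registers, the variable-size model reserves 24 registers, while fixed 16-register blocks would reserve 64 for the same chain. The frame sizes are made-up numbers for illustration.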

IA-64 Security Performance for E-Business
Achieved thru 64-bit Integer Multiply-Add
[Chart: IA-64 Security Performance, RSA Algorithm, estimated performance*, comparing Pentium® Processor, Future 32-bit Processor, Merced Processor]
IA-64 delivers secure transactions to more users
* Intel estimates
* All third party marks, brands, and names are the property of their respective owners
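
Why a 64-bit integer multiply-add matters for RSA can be sketched with big-number multiplication, the core of RSA's modular arithmetic. The code below is an illustrative schoolbook multiply over 64-bit limbs, not the method used for the estimates on the slide; all function names are hypothetical. The inner step is exactly one multiply-add: `a * b + c + carry`, split into a 64-bit low half and a 64-bit high half.

```python
MASK64 = (1 << 64) - 1


def mul_add_64(a, b, c, carry):
    """Return (high, low) 64-bit halves of a*b + c + carry.

    With all inputs below 2**64 the sum is at most 2**128 - 1,
    so it always fits in two 64-bit words.
    """
    total = a * b + c + carry
    return total >> 64, total & MASK64


def to_limbs(n):
    """Split a non-negative integer into 64-bit limbs, least significant first."""
    limbs = []
    while n:
        limbs.append(n & MASK64)
        n >>= 64
    return limbs or [0]


def from_limbs(limbs):
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def multiply(x, y):
    """Schoolbook multiplication of two limb lists using mul_add_64."""
    result = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            carry, result[i + j] = mul_add_64(xi, yj, result[i + j], carry)
        result[i + len(y)] = carry
    return result
```

With 64-bit limbs a 1024-bit operand is 16 limbs, versus 32 limbs at 32 bits, so the inner loop of the schoolbook method runs roughly a quarter as many times.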

Delivery of Streaming Media
• Audio and video functions regularly perform the same operation on arrays of data values
  – IA-64 manages its resources to execute these functions efficiently
  – Able to manage general registers as 8 x 8, 4 x 16, or 2 x 32 bit elements
  – Multimedia operands/results reside in general registers
• IA-64 accelerates compression / decompression algorithms
  – Parallel ALU, Multiply, Shifts
  – Pack/Unpack; converts between different element sizes
• Fully compatible with IA-32 MMX™ technology, Streaming SIMD Extensions and PA-RISC MAX2
IA-64 resources and parallelism enable efficient delivery of rich web content
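
Treating one 64-bit general register as 8 x 8, 4 x 16, or 2 x 32 bit elements can be sketched as a lane-wise add. `parallel_add` is a hypothetical helper, and wraparound (modular) arithmetic per lane is an assumption of this sketch; it says nothing about which saturating or modular forms the architecture provides.

```python
def parallel_add(a, b, width):
    """Add two 64-bit values lane by lane, with lanes of `width` bits.

    Carries never cross a lane boundary: each lane wraps around on overflow.
    """
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16, or 32")
    lane_mask = (1 << width) - 1
    result = 0
    for shift in range(0, 64, width):
        lane = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane & lane_mask) << shift
    return result
```

The same two operands give different results depending on the element size: with 8-bit lanes `0xFF + 0x01` wraps to `0x00` inside its lane, while with 16-bit lanes `0x00FF + 0x0001` becomes `0x0100`.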

Technical Computing Environment
• DCC: Rendering, Editing, 3D Animation
• EDA: Verification, Synthesis, DRC
• MDA: FEA, Modeling, Hi-end CAE
• Finance: Equity, Treasury, Risk Analysis
• Scientific Analysis: CFD, GIS, Molecular
High performance floating-point is key

IA-64 for Scientific Analysis
• Variety of software optimizations supported
  – Load double pair: doubles bandwidth between L1 & registers
  – Full predication and speculation support
  – NaT Value to propagate deferred exceptions
  – Alternate IEEE flag sets allow preserving architectural flags
  – Software pipelining for large loop calculations
• High precision & range internal format: 82 bits
  – Mixed operations supported: single, double, extended, and 82-bit
  – Interfaces easily with memory formats
  – Simple promotion/demotion on loads/stores
  – Iterative calculations converge faster
  – Ability to handle numbers much larger than RISC competition without overflow
High performance & High precision
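
The idea of propagating a deferred exception can be sketched with a sentinel value. Everything here is a hypothetical model: `NAT`, `speculative_load`, `spec_add` and `check` are illustrative names, memory is a plain dictionary, and a missing key stands in for a faulting access. A speculative load that would fault returns the sentinel instead of raising, arithmetic passes the sentinel along, and the exception is only dealt with if the result turns out to be needed.

```python
NAT = object()  # sentinel standing in for "not a thing"


def speculative_load(memory, address):
    """Load early; a faulting access yields NAT instead of raising."""
    return memory.get(address, NAT)


def spec_add(x, y):
    """Arithmetic on speculative values propagates NAT to the result."""
    return NAT if x is NAT or y is NAT else x + y


def check(value, recover):
    """At the point of real use, run recovery code only if NAT arrived."""
    return recover() if value is NAT else value
```

A load can therefore be hoisted above the branch that guards it: if the guarded path is never taken, a `NAT` result is simply discarded and no exception is ever raised.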

IA-64 Floating-Point Architecture
(82 bit floating point numbers)
[Diagram: Memory connected to a 128 FP Register File with multiple read ports and multiple write ports, feeding FMAC #1, FMAC #2, ... FMAC ...; each FMAC computes A x B + C = D]
• 128 registers
  – Allows parallel execution of multiple floating-point operations
• Simultaneous Multiply - Accumulate (FMAC)
  – 3-input, 1-output operation: a * b + c = d
  – Shorter latency than independent multiply and add
  – Greater internal precision and single rounding error
Resourced for scientific analysis and 3D graphics
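
The "single rounding error" point can be demonstrated numerically. This sketch works in ordinary double precision rather than the 82-bit format, and it emulates the fused operation with exact rational arithmetic followed by one rounding; the function names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """a * b + c computed exactly, then rounded once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a, b, c):
    """a * b rounded to a double, then + c rounded again."""
    return a * b + c


# a * b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
```

With these inputs the separate version loses the answer entirely (the product rounds to `1.0`, so the sum is `0.0`), while the fused version returns the exact result `-2**-60`.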

IA-64 3D Graphics Capabilities
• Many geometric calculations (transforms and lighting) use 32-bit floating-point numbers
• IA-64 configures registers for maximum 32-bit floating-point performance
  – Floating-point registers treated as 2 x 32 bit single precision registers
  – Able to execute fast divide
  – Achieves up to 2X performance boost in 32-bit data floating-point operations
• Full support for Pentium® III processor Streaming SIMD Extensions (SSE)
IA-64 enables world-class GFLOPs performance
* estimated

Memory Support for High Performance Technical Computing
• Scientific analysis, 3D graphics and other technical workloads tend to be predictable & memory bound
• IA-64 data pre-fetching operations allow for fast access of critical information
  – Reduces memory latency impact
• IA-64 able to specify cache allocation
  – Cache hints from load / store operations allow data to be placed at specific cache level
  – Efficient use of caches, efficient use of bandwidth
Reduces the memory bottleneck

IA-64: Next Generation Architecture
(IA-64 Features, Function, Benefits)
• Explicit Parallelism: compiler / hardware synergy
  – Function: Executes more instructions in the same amount of time
  – Benefits: Maximizes headroom for the future
• Register Model: large register file, rotating registers, register stack engine
  – Function: Able to optimize for scalar and object oriented applications
  – Benefits: World-class performance for complex applications
• Floating Point Architecture: extended precision calculations, 128 registers, FMAC, SIMD
  – Function: High performance 3D graphics and scientific analysis
  – Benefits: Enables more complex scientific analysis; faster digital content creation and rendering
• Multimedia Architecture: parallel arithmetic, parallel shift, data arrangement instructions
  – Function: Improves calculation throughput for multimedia data
  – Benefits: Efficient delivery of rich Web content
• Memory Management: 64-bit addressing, speculation, memory hierarchy control
  – Function: Manages large amounts of memory, efficiently organizes data from / to memory
  – Benefits: Increased architecture & system scalability
• Compatibility: full binary compatibility with existing IA-32 instructions in hardware, PA-RISC through software translation
  – Function: Existing software runs seamlessly
  – Benefits: Preserves investment in existing software

IA-64 Details Made Public
• IA-64 Application ISA Guide (AIG)
  – Application instructions and machine code
  – Application programming model
  – Unique architecture features & enhancements
• Provides understanding of IA-64 for the broad industry
  – Features and benefits for key applications
  – Insight into techniques for optimizing IA-64 solutions
• IA-64 AIG and other developer information available 5/26
  – http://developer.intel.com/design/ia64/index.htm
  – http://www.hp.com/go/ia64
Continuing to fuel IA-64 developer momentum

Supporting IA-64 Solutions
[Diagram: Industry Enabling IA-64 Solutions]
• Hardware (Processors, Chipsets, Platforms)
• Operating Systems and Infrastructure: Multiple Operating Systems (Win64, Unix, Open Source), BIOS and Drivers
• Software Development (Development tools, Porting Centers)
• Applications
• Systems Support
• Investments (IA-64 Fund, Other)
• IA-64 Application Architecture (Public Unveiling)
IA-64 application architecture an integral part of a comprehensive plan

Summary
• IA-64 represents the most significant architecture development since 80386
• IA-64 advances beyond the capabilities of traditional architectures
  – Compiler / hardware synergy, massive resources, headroom
• IA-64 provides features to benefit the high-end applications of the future
  – E-business
  – Technical computing
• Today’s architecture unveiling is an additional element of the comprehensive IA-64 industry program
IA-64 begins with Merced