AMD Microprocessor Technologies
Ben Sander, AMD Principal Member of Technical Staff
06/21/06
Motivation: PC Jargon Demystified
• “AMD Athlon™ 64 4200+* dual-core processor with 64 bit platform, Direct Connect Architecture and HyperTransport™ Technology for increased multitasking performance; improved security with Enhanced Virus Protection**; Cool'n'Quiet™ Technology to minimize heat and noise”
Talk Outline
• Motivation
• Recent innovations
  – Dual-core processors
  – Direct Connect Architecture™ and HyperTransport™
  – Power-efficient design (and Cool’n’Quiet™)
  – AMD64 Architecture
• What’s next?
  – Direct Connect Architecture™ enhancements
  – HTX “Accelerators”
  – Core enhancements
  – Virtualization and AMD-V
• Summary and Conclusion
Dual-Core AMD Opteron™ Processor Design
• Two AMD Opteron™ processor cores on a single die
  – Each with 1 MB L2 cache
• Shared Northbridge
  – Three HyperTransport™ technology links
  – Dual-channel (128-bit) DDR interface
• AMD Opteron processor designed as CMP from the start
  – 2nd port on the System Request Interface (SRI), request management, 2 APICs, clocking, microcode
• Two complete CPUs
  – Symmetric multiprocessor (SMP) programming model
  – Simpler, less restrictive programming model than ‘virtual CPU’ approach
[Figure: Existing AMD Opteron™ Processor Design, block diagram: CPU 0, CPU 1, 1 MB L2 cache, System Request Interface, crossbar switch, memory controller, HyperTransport™ links 0, 1, 2]
MPF 2004: AMD Dual-Core Processor
Integration:
• Two 64-bit CPU cores
• 2 MB L2 cache
• On-chip Northbridge & memory controller
Bandwidth:
• Dedicated 64-bit L2 buses for each core
• Dual-channel DDR (128-bit) memory bus
• 3 HT links (16-bit each × 2 GT/sec × 2)
Usability and Scalability:
• Socket compatible: platform and TDP!
• Glueless SMP up to 4 sockets
• Memory capacity & bandwidth scale with CPUs
Power Efficiency:
• PowerNow! optimized power management
• Leadership system-level power attributes
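The link figure multiplies out to the 8 GB/s per link shown on the Direct Connect diagram. A minimal sketch of that arithmetic, assuming the final factor of 2 means the two directions of the link and that GB/s is decimal (10^9 bytes); the function name `ht_link_gbps` is illustrative only:

```c
/* Peak HyperTransport link bandwidth in GB/s (decimal, 1e9 bytes).
 * width_bits: link width per direction (16 on these parts)
 * gtps:       transfer rate in gigatransfers per second
 * directions: 1 for one direction, 2 for the aggregate of both */
double ht_link_gbps(int width_bits, double gtps, int directions)
{
    double bytes_per_transfer = width_bits / 8.0;
    return bytes_per_transfer * gtps * directions;
}
```

Under these assumptions, `ht_link_gbps(16, 2.0, 2)` gives 8 GB/s for a current link (4 GB/s each way). If the 5.2 GT/sec HT-3 links planned for the next generation keep a 16-bit width, the same formula gives 20.8 GB/s.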
AMD64 Dual-Core Physical Design
• 90 nm
  – Approximately the same die size as the 130 nm single-core AMD Opteron™ processor
  – ~205 million transistors
• 68/95 watt power envelope
  – Fits into 90 nm power infrastructure
• 939/940 socket compatible
  – Fits into existing sockets
Dual-Core: Customer Value
• What is it?
  – Two processing cores on the same die
• AMD: Clean single-core to multi-core upgrade path
  – Same pinout
  – Same power envelope!
• Server customers
  – Server apps scale extremely well with increasing processor count (transaction processing, web serving)
  – Doubles compute density (more compute power from the same motherboard and in a server rack)
  – More efficient software licensing
• Consumers
  – Efficiently run multiple programs at the same time (operating system + background application; virus checker + photo-editing software)
  – Significantly improves performance of threaded applications (video editing, MP3 encoding)
Dual-Core AMD Opteron™ Processor Design
• Two AMD Opteron™ processor cores on a single die
  – Each with 1 MB L2 cache
• Shared Northbridge
  – Three HyperTransport™ technology links
  – Dual-channel (128-bit) DDR interface
• AMD Opteron processor designed as CMP from the start
  – 2nd port on the System Request Interface (SRI), request management, 2 APICs, clocking, microcode
• Two complete CPUs
  – Symmetric multiprocessor (SMP) programming model
  – Simpler, less restrictive programming model than ‘virtual CPU’ approach
• AMD Direct Connect Architecture
  – Everything connected directly to CPU
  – Reduces system architecture bottlenecks
  – Further reduces latency by directly connecting two cores on the same die
[Figure: Existing AMD Opteron™ Processor Design, block diagram: CPU 0, CPU 1, 1 MB L2 cache, System Request Interface, crossbar switch, memory controller, HyperTransport™ links 0, 1, 2]
Direct Connect: Advantages of Good Plumbing
Legacy x86 Architecture
• 20-year-old front-side bus (FSB) architecture
• CPUs, memory, I/O all share a bus
• Major bottleneck to performance
• Faster CPUs or more cores ≠ performance
AMD64’s Direct Connect Architecture
• Industry-standard technology
• Direct Connect eliminates the FSB bottleneck
• HyperTransport™ interconnect offers scalable high bandwidth and low latency
[Figure: legacy FSB system (chips in MCPs, Memory Controller Hub, XMBs, PCIe™ bridges, USB/PCI I/O hub) vs. Direct Connect system (per-processor SRQ, crossbar, memory controller, and 8 GB/s HyperTransport™ links to PCIe™ bridges and an I/O hub)]
AMD Direct Connect: Customer Value
• What is it?
  – Direct connection of CPU to the DRAM/memory
  – And CPU-to-CPU for multiprocessor systems
• Increased performance
  – Reduced memory latency
  – Reduced chip-to-chip communication latency
• Reduced power
  – Reduced chip count in system
  – Reduced external pin switching
• Scalability
  – Unlocks the potential of faster CPUs and additional cores
What’s Consuming All the Power?
Share of datacenter power consumption:
• Servers: 38%–63%
• Computer room air conditioning: 23%–54%
• Battery backup: 6%–13%
• Lighting: 1%–2%
Server power consumption impacts power throughout the datacenter
System-level Power Consumption: Present Day
Dual-core packages with legacy technology (four sockets):
• 692 watts for processors (173 W each)
• 48 watts for external memory controller (Memory Controller Hub: 14 W; XMBs: 8.5 W each)
• Total: 740 watts
Dual-Core AMD Opteron™ processors (four sockets):
• 380 watts for processors (95 W each)
• Integrated memory controllers
• Total: 380 watts
The legacy configuration draws 95% more power (740 W vs. 380 W).
Source: Mixture of publicly available data sheets and AMD internal estimates. Actual system power measurements may vary based on configuration and components used.
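The headline comparison can be checked from the per-component numbers. A small sketch, assuming four sockets per system (692 W / 173 W and 380 W / 95 W both give four) and treating the 48 W of external memory-controller logic as the only chipset difference, as the slide does; the struct and function names are illustrative:

```c
/* One system configuration, using the slide's per-component figures. */
typedef struct {
    double cpu_watts_each;  /* per processor package */
    int    sockets;
    double chipset_watts;   /* external memory-controller logic; 0 if integrated */
} system_power;

double total_watts(system_power s)
{
    return s.cpu_watts_each * s.sockets + s.chipset_watts;
}

/* Percent by which a draws more power than b. */
double percent_more(double a, double b)
{
    return (a - b) / b * 100.0;
}
```

With `{173.0, 4, 48.0}` for the legacy system and `{95.0, 4, 0.0}` for the Opteron system, the totals are 740 W and 380 W, and `percent_more` returns about 94.7, which the slide rounds to 95%.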
Reducing Power and Cooling Requirements with Processor Performance States
P-states (high to low performance):
• P0: 2600 MHz, 1.40 V, ~95 W
• P1: 2400 MHz, 1.35 V, ~90 W
• P2: 2200 MHz, 1.30 V, ~76 W
• P3: 2000 MHz, 1.25 V, ~65 W
• P4: 1800 MHz, 1.20 V, ~55 W
• P5: 1000 MHz, 1.10 V, ~32 W
[Chart: average CPU core power (W, measured at CPU) with PowerNow! enabled vs. disabled at 10,500 connections (~62% CPU utilization), 5,000 connections (~40% CPU utilization), and idle (in OS); reductions shown: -33%, -62%, -75%]
Up to 75% power savings!
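A common first-order way to see why lowering voltage along with frequency pays off is the CMOS dynamic-power relation P_dyn ≈ C·V²·f. The sketch below applies that textbook model (not AMD's measurement method) to the P-state values on this slide; the capacitance C cancels when two states of the same core are compared. Names are illustrative.

```c
/* One P-state: frequency, core voltage, and the slide's measured power. */
typedef struct { int mhz; double volts; double measured_watts; } pstate;

/* P-state table from the slide, P0 (highest performance) first. */
static const pstate PSTATES[] = {
    {2600, 1.40, 95}, {2400, 1.35, 90}, {2200, 1.30, 76},
    {2000, 1.25, 65}, {1800, 1.20, 55}, {1000, 1.10, 32},
};

/* First-order model P_dyn ~ C * V^2 * f.
   Returns P_dyn(p) / P_dyn(ref); C cancels out of the ratio. */
double dynamic_power_ratio(pstate p, pstate ref)
{
    return (p.volts * p.volts * p.mhz) / (ref.volts * ref.volts * ref.mhz);
}
```

For P5 versus P0 the model predicts about 24% of P0's dynamic power ((1.10/1.40)² × 1000/2600 ≈ 0.237), while the measured figures give 32/95 ≈ 34%. The gap is presumably because the measured watts also include leakage and other power that does not scale with V²·f.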
Power-efficient Design: Customer Value
• What is it?
  – PowerNow! technology changes frequency in response to workload; at lower frequencies, voltage is reduced as well
  – Power efficiency “designed in”: appropriate frequency targets; integration of external chipset logic (aka Direct Connect); “fine gating” and other design-for-power techniques
• Customer value
  – Server: save $$$ on server power and air conditioning
  – Desktop: quieter operation via Cool’n’Quiet™ technology
  – Notebook: longer battery life
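PowerNow! P-state changes are requested by operating-system software, commonly with a demand-based policy. The sketch below is one hypothetical such policy; the name `choose_pstate`, the target-utilization parameter, and the selection rule are assumptions, not AMD's or any OS's actual algorithm. It estimates the work currently being done as utilization × current frequency, then picks the slowest P-state that can absorb that work while staying under a target utilization.

```c
/* P-state frequencies from the PowerNow! slide, P0 first (MHz). */
static const int PSTATE_MHZ[] = {2600, 2400, 2200, 2000, 1800, 1000};
enum { NUM_PSTATES = sizeof PSTATE_MHZ / sizeof PSTATE_MHZ[0] };

/* Hypothetical demand-based policy. Returns the chosen P-state index.
 * utilization: fraction of time busy at current_mhz (0.0 to 1.0)
 * target_util: how busy the new P-state is allowed to be (e.g. 0.8) */
int choose_pstate(double utilization, int current_mhz, double target_util)
{
    double demand_mhz = utilization * current_mhz;   /* work actually done */
    for (int i = NUM_PSTATES - 1; i > 0; i--) {      /* slowest first */
        if (demand_mhz <= PSTATE_MHZ[i] * target_util)
            return i;
    }
    return 0;   /* no slower state suffices: run at P0 */
}
```

At idle this selects P5 (1000 MHz). At 40% utilization of a 2600 MHz core (about 1040 MHz of demand) with an 80% target it selects P4 (1800 MHz), and at full load it stays at P0. Lower-voltage states are reached only when demand falls, which is where the savings on the previous slide come from.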
AMD64: Evolutionary 64-bit ISA
• What is it?
  – Evolutionary extension to support “64 bits” on x86 processors
  – Now an industry standard supported by other processor vendors
• Why 64 bits?
  – Driven by apps needing large amounts of memory (CAD tools, large databases, simulations)
  – 64-bit integer arithmetic (security and encryption applications)
• Why extend x86 to 64 bits?
  – x86 is the most widely installed instruction set in the world
  – Delivers 64-bit advantages while providing full x86 compatibility
  – Doesn’t require a completely new tool chain
• User benefits from 64 bits:
  – Large-memory applications: some applications see a 10x speedup from additional memory, and the 64-bit flat programming model is far easier for software developers
  – Some performance improvement from additional registers and wider data operations
  – AMD64 backward compatibility allows migration on the customer’s timeframe
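One concrete meaning of “64-bit integer arithmetic” is instruction count. The sketch below models in portable C what a 32-bit x86 compiler must do to add two 64-bit values (ADD on the low halves, then ADC on the high halves) compared with the single 64-bit ADD available in 64-bit mode. The function names are illustrative, and compilers do this automatically; the point is the extra work per operation, which adds up in the multi-word arithmetic used by encryption code.

```c
#include <stdint.h>

/* 64-bit add as a 32-bit machine performs it: two 32-bit halves,
 * ADD on the low words, then ADC (add with carry) on the high words. */
uint64_t add64_with_32bit_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo    = a_lo + b_lo;           /* ADD: wraps modulo 2^32 */
    uint32_t carry = lo < a_lo;             /* carry out of the low word */
    uint32_t hi    = a_hi + b_hi + carry;   /* ADC */

    return ((uint64_t)hi << 32) | lo;
}

/* In 64-bit mode the same operation is one 64-bit ADD. */
uint64_t add64_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```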
Design Goals for AMD64 Technology
• Processor is fully compatible with existing x86 modes
• Straightforward extensions for 64 bits
  – Minimize architectural divergences; maintain consistency with existing architecture
  – Minimize instruction set encoding changes
  – Straightforward implementation & verification
• Double the number of integer and SSE registers
• Architectural support for 64 bits of virtual address space and 52 bits of physical address space
  – Implementations may support less
• 64-bit integer operations
• Eliminate unused/underutilized arcane x86 features within the context of 64-bit mode
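Because implementations “may support less” than 64 virtual-address bits, AMD64 requires addresses to be in canonical form: every bit above the highest implemented bit must be a copy of that bit. A minimal check, assuming the implemented width is passed in (48 on current parts); `is_canonical` is an illustrative name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if va is canonical for an implementation with va_bits (1..64)
 * virtual-address bits: bits 63 down to va_bits-1 must all be equal,
 * i.e. the address is a sign extension of its low va_bits bits. */
bool is_canonical(uint64_t va, unsigned va_bits)
{
    if (va_bits >= 64)
        return true;                                  /* nothing to check */
    uint64_t top      = va >> (va_bits - 1);          /* bits 63..va_bits-1 */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);  /* same width, all set */
    return top == 0 || top == all_ones;
}
```

With 48 implemented bits, the valid ranges are 0x0000000000000000–0x00007FFFFFFFFFFF and 0xFFFF800000000000–0xFFFFFFFFFFFFFFFF. A reference to an address in between faults, so software cannot stash data in the unused upper bits and later break on wider implementations.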
AMD64 Programmer’s Model
[Figure: AMD64 programmer’s model register diagram (RAX)]
REX Prefix Byte
• Additional registers encoded without altering existing instruction format
• Optional REX prefix specifies 64-bit operand-size override
  – Plus 3 additional register encoding bits
• REX is actually a family of 16 prefixes (0x40–0x4F)
• Average instruction length in 64-bit mode increased by 0.4 bytes
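The REX byte has the layout 0100WRXB. The high nibble 0100 identifies the prefix (hence the sixteen values 0x40–0x4F), W selects 64-bit operand size, and R, X and B are the three extra register bits that extend the ModRM.reg, SIB.index and ModRM.rm/SIB.base fields from 3 to 4 bits. A small encoder/decoder sketch; the function names are illustrative:

```c
#include <stdint.h>

/* Build a REX prefix: 0100 W R X B. Each argument is 0 or 1. */
uint8_t rex_encode(int w, int r, int x, int b)
{
    return (uint8_t)(0x40 | ((w & 1) << 3) | ((r & 1) << 2)
                          | ((x & 1) << 1) | (b & 1));
}

/* In 64-bit mode, any byte 0x40-0x4F in prefix position is a REX prefix. */
int is_rex(uint8_t byte)
{
    return (byte & 0xF0) == 0x40;
}

/* Combine a REX extension bit with a 3-bit ModRM/SIB field to get a
 * register number 0-15 (0 = RAX ... 7 = RDI, 8 = R8 ... 15 = R15). */
int reg_number(int rex_bit, int field3)
{
    return ((rex_bit & 1) << 3) | (field3 & 7);
}
```

For example, `mov rax, rbx` assembles to 48 89 D8: REX.W (0x48), opcode 0x89, and ModRM 0xD8 with reg = 3 (RBX) and rm = 0 (RAX); setting REX.R would turn reg = 3 into R11. The 0x40–0x4F byte values were available because in 64-bit mode they no longer encode the one-byte INC/DEC instructions.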
Co-processors and Accelerators
• Excellent way to get power-efficient performance boosts
  – Special-purpose, tuned solutions for common functions
  – Drop to low-power states when not in use
• Enabled by modern APIs
• Aligns with modularity imperative
  – Co-processor becomes another (optional) “IP block”
  – Micro-architecture: command delivery, synchronization, streaming
• Promising concept: many possible opportunities now and/or in the future
  – Media processing
  – JVM/CLR runtime hosting
  – NIC integration (TOE, XML, SSL, etc.)
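The “command delivery, synchronization” item can be made concrete with a pattern many device interfaces use: a single-producer, single-consumer ring in shared memory. The sketch below is hypothetical (the descriptor fields, ring size and function names are assumptions, not an HTX or AMD interface) and models both sides in C11. A real co-processor would read the ring over the interconnect and would typically also be notified through a doorbell or interrupt.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command descriptor: operation plus source/destination. */
typedef struct { uint32_t opcode; uint64_t src, dst, len; } accel_cmd;

#define RING_SIZE 8u   /* must be a power of two */

typedef struct {
    accel_cmd slots[RING_SIZE];
    _Atomic uint32_t head;   /* next slot the CPU (producer) fills */
    _Atomic uint32_t tail;   /* next slot the accelerator (consumer) reads */
} cmd_ring;

void ring_init(cmd_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* CPU side: returns false if the ring is full. */
bool ring_submit(cmd_ring *r, accel_cmd c)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->slots[head % RING_SIZE] = c;
    /* Release: the descriptor is visible before the new head value. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Accelerator side: returns false if the ring is empty. */
bool ring_fetch(cmd_ring *r, accel_cmd *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail % RING_SIZE];
    /* Release: the slot has been read before the producer may reuse it. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Head and tail are free-running counters, so `head - tail` is the number of queued commands even after the counters wrap, and the power-of-two ring size keeps `% RING_SIZE` consistent across that wrap.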
HyperTransport HTX™ Enables System-level Coprocessing Today
AMD’s Next Generation Processor Technology
• Scalable performance and balance
  – Faster HyperTransport links (up to 5.2 GT/sec)
  – Additional bandwidth enhancements
  – On-chip shared L3 cache
• Maintain performance-per-watt leadership
  – Independent NB and CPU power management
  – Independent CPU P-state and C-state controls
• Performance on diverse workloads
  – Enhanced IPC CPU core; >2X FPU performance
  – 48-bit virtual and physical address space
  – 1 GB large page support
  – Platform support for co-processors
• Compatibility
  – DDR2 memory support with migration to DDR3
  – FBDIMM Gen 1 and Gen 2 at the appropriate time
  – HT-1 backward compatibility
• Enhanced Virtualization
  – I/O virtualization
  – Nested paging support
• Enhanced RAS
  – Memory mirroring
  – Data poisoning support
  – HT retry protocol support
AMD’s Next Generation Processor Technology
Optimized for 65 nm SOI and beyond
• Native quad-core die
• Expandable shared L3 cache
• IPC-enhanced CPU cores
  – 32B instruction fetch
  – Improved branch prediction
  – Out-of-order load execution
  – Up to 4 DP FLOPS/cycle
  – Dual 128-bit SSE dataflow
  – Dual 128-bit loads per cycle
  – Improved core and Northbridge prefetchers
  – Bit manipulation extensions (LZCNT/POPCNT)
  – SSE extensions (EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)
• Enhanced Direct Connect Architecture and Northbridge
  – HT-3 links (5.2 GT/sec)
  – Enhanced crossbar
  – DDR2 with migration path to DDR3
  – FBDIMM when appropriate
  – Enhanced power management
  – Enhanced RAS
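LZCNT counts leading zero bits and POPCNT counts set bits. The portable C below spells out what each computes for a 32-bit operand; it is a reference sketch, not how the hardware does it. LZCNT is defined for a zero input (it returns the operand width, 32), whereas the older BSR instruction leaves its destination undefined for a zero source.

```c
#include <stdint.h>

/* Leading-zero count, as LZCNT computes it for a 32-bit operand. */
unsigned lzcnt32(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && !(x & mask); mask >>= 1)
        n++;
    return n;   /* 32 when x == 0, matching LZCNT */
}

/* Population count, as POPCNT computes it for a 32-bit operand. */
unsigned popcnt32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

GCC and Clang expose `__builtin_popcount` and `__builtin_clz` and can emit the single instructions when the target enables them (for example `-mpopcnt`, `-mlzcnt`). `__builtin_clz(0)` is undefined, so code must handle the zero case explicitly unless it relies on LZCNT semantics.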
Virtualization is the pooling and abstraction of resources in a way that masks the physical nature and boundaries of those resources from the resource users
Virtualization: Customer Value
• What is it?
  – Allows a single computer to efficiently run multiple guest operating systems and associated applications
  – AMD-V provides hardware acceleration for virtualization and simplifies the development of virtualization software
• Benefits:
  – Consolidation: more efficient use of compute resources; eliminate “single-application” servers; consolidate old, unsupported servers onto newer hardware
  – Migration/reliability: if a server fails, the application can easily be moved to another server
  – Allows developers to easily test multiple OS environments on a single machine
  – Upgrades can be tested on hardware before deployment
Virtualization Methods
• Software-only virtualization
  – Software acts as a translator between OS and hardware
  – No need to modify the operating system
  – Available today
  – Can be slow
• OS-enabled virtualization
  – Host OS and virtualization software tightly integrated
  – Offers improved performance
  – But requires changes to OS
• Processor-supported virtualization
  – Processor protects memory locations so that only virtualization software can access them
  – Processor provides hooks on all system-level instructions
  – Accelerated performance and better security
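The processor-supported approach can be illustrated with a toy trap-and-emulate loop: ordinary guest operations run directly, while system-level ones are intercepted and handed to the virtual machine monitor (VMM), which applies them to the guest’s virtual state instead of the real hardware. Everything in this sketch (the operation set, `vcpu` fields, and function names) is hypothetical and far simpler than AMD-V; it only shows the control flow the slide describes.

```c
#include <stdint.h>

/* Hypothetical guest operations: one ordinary, two system-level. */
typedef enum { OP_ADD, OP_WRITE_CR3, OP_HLT } op_kind;
typedef struct { op_kind kind; uint64_t operand; } guest_op;

typedef struct {
    uint64_t acc;          /* ordinary register state */
    uint64_t virtual_cr3;  /* guest's view of its page-table base */
    int halted;
    int exits;             /* number of intercepts taken */
} vcpu;

/* The "hooks on system-level instructions": which ops trap to the VMM. */
static int is_intercepted(op_kind k)
{
    return k == OP_WRITE_CR3 || k == OP_HLT;
}

/* VMM emulates the intercepted operation against virtual state only. */
static void vmm_handle_exit(vcpu *v, guest_op op)
{
    v->exits++;
    if (op.kind == OP_WRITE_CR3)
        v->virtual_cr3 = op.operand;   /* real CR3 is never touched */
    else if (op.kind == OP_HLT)
        v->halted = 1;                 /* VMM could now run another guest */
}

void run_guest(vcpu *v, const guest_op *prog, int n)
{
    for (int i = 0; i < n && !v->halted; i++) {
        if (is_intercepted(prog[i].kind))
            vmm_handle_exit(v, prog[i]);   /* intercept ("VM exit") */
        else
            v->acc += prog[i].operand;     /* runs directly */
    }
}
```

Software-only virtualization has to find and rewrite such system-level operations itself; with processor support, the hardware performs the interception, which is where both the speedup and the protection come from.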
AMD-V: Overview
• Virtualization is being used in several server scenarios today
• AMD expects that virtualization will prove valuable for PC clients too
• The x86 architecture can be modified so that virtualization is easier to accomplish, performs better, and provides more security
• AMD’s AMD-V technology is being developed for future AMD64 CPUs for servers and clients
• Key technologies include adding new instructions, supporting different methods of handling page tables, handling host and guest interrupts (including SMI/SMM), and providing DMA protection
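Page-table handling matters because a guest’s page tables hold guest-physical addresses. With nested paging (listed under Enhanced Virtualization on the next-generation processor slide), hardware translates each of those through the host’s tables as well, so a worst-case TLB miss costs a two-dimensional walk. A sketch of the count, ignoring the TLBs and page-walk caches that reduce it in practice; the function name is illustrative:

```c
/* Worst-case memory references for one TLB miss under nested paging.
 * Each guest level's table lives at a guest-physical address, so it needs
 * a full host walk (host_levels reads) plus 1 read of the guest entry;
 * the final guest-physical data address then needs one more host walk. */
unsigned nested_walk_refs(unsigned guest_levels, unsigned host_levels)
{
    return guest_levels * (host_levels + 1) + host_levels;
}
```

With four-level tables on both sides this gives 4 × 5 + 4 = 24 memory references, versus 4 for a native walk (host_levels = 0). Equivalently, the count is (g + 1)(h + 1) − 1.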
Summary and Conclusion
✓ AMD is focused on customer-centric innovation and value
  – Dual-core processors
  – Direct Connect Architecture and HyperTransport
  – Power-efficient design
  – AMD64 Architecture
  – And more!
✓ AMD is investing heavily in extending our leadership
  – Next-generation Direct Connect Architecture technology
  – Next-generation CPU technology
  – AMD-V and hardware virtualization
  – Developing a fundamental understanding of important emerging trends
Thank you!
www.amd.com/power
© 2006 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow, AMD Athlon, AMD Opteron and combinations thereof, are trademarks of Advanced Micro Devices, Inc. HyperTransport is a trademark of the HyperTransport Consortium. PCI-X, PCIe and PCI Express are trademarks of PCI-SIG. Other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.
Backup
AMD Architectural Generations (Now → Coming Soon → Future)
• AMD64 Architecture → Extensions to AMD64 → FPU Extensions to AMD64
• Dual Core Architecture → Multi-core Architecture → Throughput Architecture
• Direct Connect Architecture → Scalable SMP Architecture
• Enhanced Virus Protection → AMD-V Virtualization → Secure Execution
• HyperTransport™ v1.0, v2.0 → HyperTransport v3.0 → HyperTransport v4.0
• DDR, DDR2 → DDR3, FBDIMM → DDR4, FBD2
• AMD PowerNow!™ Technology → Partitioned PowerNow! → System Resource Mgmt
• RAS: High Reliability → Best-in-class Reliability → Mainframe-class Reliability
• On-chip Coprocessors
• Focus: System Performance → System Perf./Watt → Throughput/Watt/$$