EMERGING TRENDS OF INTEL MICROPROCESSORS Presented by: Nasser Hadjloo http://Hajloo.wordpress.com
Design Considerations: Instruction-level parallelism. Use of cache hierarchies and their management. Higher clock speeds. The Front Side Bus (FSB). Multi-threading. Power consumption and heating issues. Etc.
Intel Architectures: NetBurst, Core, Nehalem, Sandy Bridge
NetBurst Architecture
Features of NetBurst Architecture: Hyper-Threading makes a single physical processor appear as two logical processors. Each logical processor has its own set of registers and its own APIC (Advanced Programmable Interrupt Controller). This increases resource utilization and improves performance.
Rapid Execution Engine: The Arithmetic Logic Units (ALUs) run at twice the processor frequency. Basic integer operations execute in half a processor clock tick. This provides higher throughput and reduced execution latency.
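The arithmetic behind the double-clocked ALU can be sketched in a few lines. This is only an illustration of the "twice the frequency, half a clock tick" relationship stated on the slide; the function name `alu_timing` and the sample clock speeds are assumptions chosen for the example, not figures from the presentation.

```python
def alu_timing(core_clock_ghz: float) -> dict:
    """Timing of a double-clocked ALU for a given core clock in GHz."""
    if core_clock_ghz <= 0:
        raise ValueError("core clock must be positive")
    core_cycle_ns = 1.0 / core_clock_ghz  # duration of one processor clock tick
    return {
        "alu_clock_ghz": 2.0 * core_clock_ghz,         # ALUs run at twice the core frequency
        "core_cycle_ns": core_cycle_ns,
        "simple_int_latency_ns": core_cycle_ns / 2.0,  # half a clock tick per basic integer op
    }


if __name__ == "__main__":
    # Sample clock speeds are illustrative assumptions.
    for clock in (2.0, 3.0, 3.8):
        print(clock, alu_timing(clock))
```

At a 2.0 GHz core clock, for instance, the ALUs would run at 4.0 GHz and a basic integer operation would take 0.25 ns.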
NetBurst Microarchitecture
Design Considerations: A deeper pipeline (20 stages) makes each branch misprediction more costly but allows greater clock speeds and performance. Techniques such as parallel execution, buffering, and speculation hide the penalties. Instructions are executed dynamically and out of order. The performance of a particular code sequence may vary depending on the state the machine was in when that code sequence was entered.
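The trade-off between pipeline depth and clock speed can be made concrete with a simple textbook cycles-per-instruction (CPI) model. This is a rough sketch, not a description of how NetBurst actually behaves: the base CPI, branch fraction, misprediction rate, the 10-stage comparison design, and the assumption that the misprediction penalty equals the pipeline depth are all hypothetical values chosen for illustration.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """CPI including the average cost of branch mispredictions."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def ns_per_instruction(cpi, clock_ghz):
    """Average time per instruction in nanoseconds."""
    return cpi / clock_ghz


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted.
    branch_fraction, mispredict_rate = 0.20, 0.05

    # Hypothetical designs: penalty assumed equal to pipeline depth.
    designs = {
        "10-stage @ 2.0 GHz": (10, 2.0),
        "20-stage @ 3.0 GHz": (20, 3.0),
    }
    for name, (stages, clock_ghz) in designs.items():
        cpi = effective_cpi(1.0, branch_fraction, mispredict_rate, stages)
        print(name, "CPI:", round(cpi, 3),
              "ns/instr:", round(ns_per_instruction(cpi, clock_ghz), 3))
```

Under these assumed numbers the deeper pipeline has the worse CPI but still finishes each instruction sooner because of its higher clock, which is the bet the slide describes. A higher misprediction rate would narrow or reverse that advantage.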
Modifications in NetBurst: The Northwood design combined an increased cache size, a smaller 130 nm fabrication process, and Hyper-Threading technology. Prescott brought a heavily improved branch predictor, the SSE3 SIMD instructions, and Intel 64, Intel's branding for its compatible implementation of x86-64, the 64-bit version of the x86 architecture. Dual-core parts followed: first Smithfield, with two Prescott cores on a single die, and later Presler, which consists of two Cedar Mill cores on two separate dies. But this had problems...
Heading to Core: NetBurst, Core, Nehalem, Sandy Bridge
Core Microarchitecture
Design Considerations of Core: The L2 control unit (super-queue) combines the L2 controller, which handles snoop requests, with the bus control unit, which handles data and I/O requests to and from the external bus. The prefetching unit is extended to handle hardware prefetching separately for each core. The shared L2 cache in the Core 2 Duo eliminates on-chip L2-level cache coherence, so coherence is needed only between the L1 caches of the two cores. Although the Core 2 Duo benefits from on-chip access to the other core's L1 cache, its performance is limited.
Features of Core Architecture: Multiple cores and hardware virtualization. 14-stage pipeline (shorter than NetBurst's). Dual-core design with linked L1 caches and a shared L2 cache. Macrofusion: two program instructions can be executed as one micro-operation. Intelligent Power Capability: manages the run-time power consumption of the processor's execution cores. It includes advanced power gating, an ultra-fine-grained control system that turns on individual processor logic subsystems only when they are needed.
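The macrofusion idea, two program instructions executed as one micro-operation, can be illustrated with a toy counter. This is a deliberately simplified sketch: it assumes every instruction normally maps to exactly one micro-op and that only a compare (`cmp` or `test`) immediately followed by a conditional jump is fused. The real hardware rules are more restrictive, and the function names and the sample instruction stream are hypothetical.

```python
COMPARE_OPS = {"cmp", "test"}


def is_conditional_jump(mnemonic: str) -> bool:
    """True for conditional jumps such as jne or jz, but not the unconditional jmp."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def count_micro_ops(instructions, fuse=True):
    """Count micro-ops, fusing compare + conditional-jump pairs when fuse is True."""
    micro_ops = 0
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (fuse and current in COMPARE_OPS
                and following is not None and is_conditional_jump(following)):
            i += 2  # the pair is issued as a single fused micro-op
        else:
            i += 1
        micro_ops += 1
    return micro_ops


if __name__ == "__main__":
    # Hypothetical instruction stream (mnemonics only).
    stream = ["mov", "add", "cmp", "jne", "mov", "test", "jz"]
    print("without fusion:", count_micro_ops(stream, fuse=False))
    print("with fusion:   ", count_micro_ops(stream, fuse=True))
```

In this toy stream the two compare-and-branch pairs each collapse into one micro-op, so seven instructions need only five micro-op slots. That reduction in slots is where the throughput benefit comes from.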
Modifications in Core: The Allendale core, with 2 MB of L2 cache, offers a smaller die size and therefore greater yields. Merom, the first mobile version of the Core 2, puts more emphasis on low power consumption to extend notebook battery life. Kentsfield was the first Intel desktop quad-core CPU. It comprises two separate silicon dies (each equivalent to a single Core 2 Duo) on one multi-chip module. The Penryn design adds new instructions, including SSE4.1. But there was a problem...
Problem with quad core
Heading to Nehalem: NetBurst, Core, Nehalem, Sandy Bridge
Introduction: Core i7 is the new Intel CPU brand name for the business and high-end consumer markets. Core i5 processors are intended for the mainstream consumer market. Core i3 processors are intended for the entry-level consumer market.
Features of Nehalem: Integrated memory controller. QuickPath Interconnect. Advanced configuration and power states. Improvements to the pipeline (L2 branch predictor, renamed Return Stack Buffer, L2 TLB, etc.). Hyper-Threading. SSE4.2 instructions. Three-level cache hierarchy.
Core i7 History: It started with Bloomfield in 2008. The Lynnfield and Clarksfield models came in 2009. Prior to 2010, all models were quad-core. In 2010 the Arrandale (dual-core) models arrived, followed by the Gulftown (Extreme) models, which have six hyper-threaded cores.
Bloomfield: All models are named Core i7-9xx and use Socket 1366. Includes single-processor servers sold as Xeon 35xx. Replaced the Yorkfield processors. Uses a different socket from the other Core i CPUs, and from all other 45 nm CPUs. On-die memory controller (uncore clock). Uses a single QPI link instead of the FSB. Supports the SSE4.2 and SSE4.1 instruction sets.
Bloomfield: 32 KB L1 instruction cache and 32 KB L1 data cache per core. 256 KB L2 cache (combined instruction and data) per core. 8 MB L3 cache (combined instruction and data), inclusive and shared by all cores. Turbo Boost technology allows all active cores to intelligently clock themselves up in steps of 133 MHz over the design clock rate, as long as the CPU's predetermined thermal and electrical requirements are still met.
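The Turbo Boost stepping described above amounts to simple arithmetic, sketched below. The 133 MHz step comes from the slide. The function name, the base clock of 3200 MHz, the number of steps, and the boolean standing in for the thermal and electrical check are assumptions for illustration only. The real decision logic inside the CPU is not modelled here.

```python
TURBO_STEP_MHZ = 133  # step size quoted on the slide


def turbo_frequency_mhz(base_mhz: int, steps: int, within_limits: bool) -> int:
    """Clock rate after applying Turbo Boost steps, if limits allow it.

    within_limits is a stand-in for the CPU's thermal and electrical checks.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    if not within_limits:
        return base_mhz  # stay at the design clock rate
    return base_mhz + steps * TURBO_STEP_MHZ


if __name__ == "__main__":
    # Assumed example: a 3200 MHz part boosting by up to two steps.
    for steps in range(3):
        print(steps, "step(s):", turbo_frequency_mhz(3200, steps, True), "MHz")
    print("limits exceeded:", turbo_frequency_mhz(3200, 2, False), "MHz")
```

With the assumed 3200 MHz base, two steps give 3466 MHz, which is in line with the "up to 3.46 GHz" Turbo Boost figure quoted for the Core i7-965 on the SPEC 2006 slide.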
Lynnfield: Used for the Core i5. There is no QPI; instead, the processor connects directly to a southbridge through a 2.5 GT/s Direct Media Interface and to other devices through PCI Express links in its Socket 1156. Core i7 processors based on Lynnfield have Hyper-Threading, which is disabled in Lynnfield-based Core i5 processors.
Lynnfield: Sold as Core i5-7xx, Core i7-8xx, or Xeon X34xx. Replaced the Penryn-based Yorkfield processors. 45 nm. Uses Socket 1156 as opposed to Socket 1366. Integrates the Direct Media Interface and PCI Express links, which previously required a dedicated northbridge chip (called the memory controller hub or I/O hub).
Clarksfield: The mobile version of Lynnfield, available under the Core i7 Mobile brand. Quad-core, 45 nm, with integrated PCI Express and DMI links. Models: Core i7-7xxQM (6 MB), Core i7-8xxQM (8 MB), Core i7-9xxXM Extreme Edition (8 MB). Replaced Penryn-QC.
Arrandale: The second family of mobile CPUs, covering Core i7-6xx [UE, LE, E] (4 MB); Core i5-5xx [UM, M, E] (3 MB); Core i5-4xxM (3 MB); Core i3-3xxM, Celeron U3xxx (unreleased), and P4xxx (2 MB). Integrated graphics processing unit, but only two processor cores. 32 nm, dual-core. The E-series processors are embedded versions with support for PCIe bifurcation and ECC memory.
Clarkdale: Desktop version of Arrandale, 32 nm. Under the Core brand, sold only as dual-core Core i3 and Core i5, all of which support Intel's Hyper-Threading (HT). Integrated graphics as well as PCI Express and DMI links. The Clarkdale processor package contains two dies: the actual 32 nm processor with the I/O connections, and the 45 nm graphics controller with the memory interface. Successor of Wolfdale (45 nm).
Clarkdale: Used in the Intel Core, Pentium, and Celeron brands. The Core i5 versions generally have all features enabled; only the Core i5-661 lacks Intel VT-d and TXT, as does the Core i3, which also does not support Turbo Boost or the AES new instructions. The Pentium and Celeron versions do not have SMT and use a reduced amount of third-level cache.
Gulftown or Westmere-EP: The Extreme Edition version of the Core i7, with 6 cores on a 32 nm process (Core i9). Gulftown is the first six-core dual-socket processor from Intel. Hyper-Threading (for a total of 12 logical threads), 12 MB of cache, Turbo Boost, and the Intel QuickPath connection bus. Uses the Westmere microarchitecture, a 32 nm shrink of Nehalem.
Gulftown: 50% higher performance than the Bloomfield Core i7-975. Includes Core i7-9xx and Core i7-9xxX (12 MB), Xeon 36xx, and Xeon 56xx. Socket 1366.
Specification
Nehalem Architecture
Design Considerations: Hyper-Threading is reintroduced to cater to the increasing number of thread-based applications. The cores are placed on a single die to reduce latencies, and the QuickPath Interconnect serves the same purpose. Each core has its own L1 and L2 caches, and a large shared L3 cache improves performance.
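One common way to reason about why an extra shared cache level helps is the average memory access time (AMAT) model. The sketch below applies that standard textbook formula. The latencies and miss rates are hypothetical round numbers, not measurements of Nehalem or any other processor, and the function name `amat` is an assumption made for the example.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, miss_rate) tuples ordered from L1 outward.
    memory_latency: cycles to reach main memory after the last cache level misses.
    """
    cost = memory_latency
    # Work from the outermost cache level back toward L1.
    for hit_latency, miss_rate in reversed(levels):
        cost = hit_latency + miss_rate * cost
    return cost


if __name__ == "__main__":
    # Hypothetical (latency, miss rate) pairs for illustration only.
    l1, l2, l3 = (4, 0.10), (10, 0.40), (40, 0.50)
    memory = 200

    print("two-level hierarchy:  ", round(amat([l1, l2], memory), 2), "cycles")
    print("three-level hierarchy:", round(amat([l1, l2, l3], memory), 2), "cycles")
```

Under these assumed numbers, adding the L3 level lowers the average access time from about 13 cycles to about 10.6 cycles, because half of the L2 misses are caught before they reach main memory.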
Looking forward to Sandy Bridge: NetBurst, Core, Nehalem, Sandy Bridge
What can we expect: The Sandy Bridge chip will have an architecture optimized for 32-nanometer transistors. The Sandy Bridge microarchitecture is also said to focus on the connections of the processor core, such as vertical interconnects and multilevel dies. An increase in FLOPs through AVX (Advanced Vector Extensions). Haswell, the successor to Sandy Bridge, will be built at 22 nm. The tick-tock model works just fine!
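The FLOPs gain from AVX follows from vector width: a 256-bit AVX register holds twice as many single-precision values as a 128-bit SSE register. The sketch below shows that arithmetic as a peak-throughput estimate. The clock speed, core count, and the assumption of two vector operations (one add and one multiply) issued per cycle are illustrative assumptions, and the function names are hypothetical.

```python
def sp_lanes(vector_bits: int) -> int:
    """Number of 32-bit single-precision values in one vector register."""
    return vector_bits // 32


def peak_sp_gflops(vector_bits, vector_ops_per_cycle, clock_ghz, cores):
    """Theoretical peak single-precision GFLOPS under the stated assumptions."""
    return sp_lanes(vector_bits) * vector_ops_per_cycle * clock_ghz * cores


if __name__ == "__main__":
    # Assumed configuration: 4 cores at 3.0 GHz, one add plus one multiply per cycle.
    clock_ghz, cores, ops_per_cycle = 3.0, 4, 2

    sse = peak_sp_gflops(128, ops_per_cycle, clock_ghz, cores)
    avx = peak_sp_gflops(256, ops_per_cycle, clock_ghz, cores)
    print("128-bit SSE peak:", sse, "GFLOPS")
    print("256-bit AVX peak:", avx, "GFLOPS")
    print("ratio:", avx / sse)
```

Whatever clock and core count are plugged in, the peak doubles when the vector width doubles. Real code reaches that peak only if it is vectorized and not limited by memory bandwidth.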
Trends and Performance Comparison
Intel Processor Trends
Feature | NetBurst | Core | Nehalem
Cache hierarchy | Two-level | Two-level | Three-level
Second-level cache size | 256 KB–2 MB | 1 MB–12 MB | >1 MB
Third-level cache size | - | - | 8 MB
Front side bus (MHz) | 400, 533, 800 | 533, 667, 800, 1066, 1333, 1600 | QPI = 6.4 GT/s
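To compare the front side bus figures with the QPI figure, it helps to convert both to peak bandwidth. The sketch below does that conversion under stated assumptions: a 64-bit (8-byte) wide FSB with the listed figures treated as effective transfer rates in MT/s, and a QPI link carrying 2 bytes of payload per transfer in each of two directions. These widths are general background assumptions rather than values given on the slides, and the function names are hypothetical.

```python
def fsb_bandwidth_gb_s(transfers_mt_s: float, bus_width_bytes: int = 8) -> float:
    """Peak FSB bandwidth in GB/s, shared by all traffic on the bus."""
    return transfers_mt_s * bus_width_bytes / 1000.0


def qpi_bandwidth_gb_s(rate_gt_s: float, payload_bytes: int = 2,
                       directions: int = 2) -> float:
    """Peak QPI link bandwidth in GB/s, summed over the given directions."""
    return rate_gt_s * payload_bytes * directions


if __name__ == "__main__":
    for rate in (400, 533, 800, 1066, 1333, 1600):
        print("FSB", rate, "->", round(fsb_bandwidth_gb_s(rate), 2), "GB/s")
    print("QPI 6.4 GT/s, one direction: ",
          round(qpi_bandwidth_gb_s(6.4, directions=1), 2), "GB/s")
    print("QPI 6.4 GT/s, both directions:",
          round(qpi_bandwidth_gb_s(6.4), 2), "GB/s")
```

Under these assumptions a single QPI direction matches the fastest listed FSB, and the link as a whole doubles it. In addition, memory traffic no longer crosses this link at all once the memory controller is on the die.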
SPEC 2000 benchmark systems
2003: Intel Pentium 4 processor with Hyper-Threading Technology (3.0 GHz). Primary cache: 12k micro-ops I + 8 KB D, on chip. Secondary cache: 512 KB (I+D), on chip. Memory: 512 MB.
2004: Intel Pentium 4 processor 570J (3.80 GHz), and 2005: Pentium 4 processor (3.73 GHz). Primary cache: 12k micro-ops I + 16 KB D, on chip. Secondary cache: 1 MB (I+D) and 2 MB (I+D) respectively, on chip. Memory: 1 GB.
2006: Intel Core 2 Extreme processor X6800 (2.93 GHz, 1066 MHz bus). Primary cache: 32 KB I + 32 KB D per core, on chip. Secondary cache: 4 MB (I+D) per chip, on chip (shared). Memory: 2 GB.
SPEC 2006 benchmark systems
2006: Intel Core 2 Duo E6700 (2.67 GHz, 1066 MHz bus). Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 4 MB I+D on chip per chip. Memory: 2 GB.
2007: Intel Core 2 Extreme QX9650 (3.00 GHz, 1333 MHz FSB). Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 12 MB I+D on chip per chip, 6 MB shared / 2 cores. Memory: 4 GB.
2008: Intel Xeon X5270 (3.5 GHz). Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 6 MB I+D on chip per chip. Memory: 16 GB.
2009: Intel Core i7-965 Extreme Edition (Intel Turbo Boost Technology up to 3.46 GHz). Primary cache: 32 KB I + 32 KB D on chip per core. Secondary cache: 256 KB I+D on chip per core. L3 cache: 8 MB I+D on chip per chip. Memory: 12 GB.
Concluding Remarks
Our Views: The focus needs to be on more scalable and robust architectures. Implementing 3-D integration. How about a 128-bit processor? The speed-of-light problem. The end of Moore’s Law?
Question