Niagara: A 32-Way Multithreaded SPARC Processor
P. Kongetira, K. Aingaran, K. Olukotun
Sun Microsystems
Presented by Bogdan Romanescu

Goal
• Commercial server applications:
  – High thread-level parallelism (TLP)
    • Large numbers of parallel client requests
  – Low instruction-level parallelism (ILP)
    • High cache miss rates
    • Many unpredictable branches
    • Frequent load-load dependencies
• Power, cooling, and space are major concerns for data centers

Sun’s Solution
• UltraSPARC T1 processor
• “the highest-throughput and most eco-responsible processor ever created”
• Multicore
• Fine-grain multithreading within each core
• Simple pipelines
• Small L1 cache
• Shared L2
• Metric: Performance/Watt
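Why fine-grain multithreading suits workloads with high miss rates can be illustrated with a very rough analytic model. The sketch below is not taken from the paper: the function name and the busy/stall cycle counts are hypothetical, and it assumes every thread alternates between a fixed run of useful cycles and a fixed stall. The core and thread counts (8 cores, 4 threads per core) are the ones given in this presentation.

```python
def pipeline_utilization(threads: int, busy_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles in which a single-issue pipeline issues an instruction.

    Assumption: each thread does busy_cycles of useful work, then waits
    stall_cycles (for example on a cache miss). While one thread waits the
    others can issue, so utilization grows with the thread count until the
    pipeline saturates at 1.0.
    """
    per_thread = busy_cycles / (busy_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


CORES = 8             # cores sharing the L2
THREADS_PER_CORE = 4  # hardware threads per core
TOTAL_THREADS = CORES * THREADS_PER_CORE  # the "32-way" in the title

if __name__ == "__main__":
    print("hardware threads:", TOTAL_THREADS)
    # Hypothetical workload: 10 useful cycles followed by a 30-cycle stall.
    for n in (1, 2, 4):
        print(n, "thread(s):", pipeline_utilization(n, busy_cycles=10, stall_cycles=30))
```

Under these made-up numbers a single thread keeps the pipeline busy a quarter of the time, while four threads are enough to fill it. Real behavior depends on how misses overlap and on contention for the shared units.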

Architecture

SPARC pipe
• UltraSPARC II style
• Single issue, 6 stages: F, S, D, E, M, W (fetch, thread select, decode, execute, memory, write back)
• Shared units:
  – L1 $
  – TLB
  – X units
  – pipe registers
• Hazards:
  – Data
  – Structural

Integer Register File
• One register file per thread
• SPARC window: in, out, local registers
• Highly integrated cell structure to support 4 threads:
  – 8 windows of 32 locations per thread
  – 3 read ports + 2 write ports
  – Read/write: single-cycle latency
• 1 active window cell (copy of the architectural set window)

Thread Scheduling
• Thread selection based on:
  – Previous long-latency instruction in pipe
  – Instruction type
  – LRU status
• Select & Fetch coupled
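The selection criteria listed for thread scheduling can be sketched as a small simulation. This is only an illustration, not the hardware's actual logic: the `Thread` class and `select_thread` function are hypothetical names, a thread waiting on a long-latency instruction is modeled as a simple `ready` flag, and the instruction-type criterion is left out.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    ready: bool = True       # False while waiting on a long-latency instruction
    last_selected: int = -1  # cycle in which this thread was last selected


def select_thread(threads, cycle):
    """Return the id of the least recently selected ready thread, or None."""
    ready = [t for t in threads if t.ready]
    if not ready:
        return None  # every thread is stalled, so the pipeline idles
    chosen = min(ready, key=lambda t: t.last_selected)
    chosen.last_selected = cycle
    return chosen.tid


# Four hardware threads per core; thread 1 is stalled (say, on a cache miss).
threads = [Thread(tid) for tid in range(4)]
threads[1].ready = False
order = [select_thread(threads, cycle) for cycle in range(6)]

if __name__ == "__main__":
    print(order)
```

With thread 1 marked as stalled, the remaining three threads are picked in rotation, which is how an LRU policy behaves when all ready threads are equally eligible.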

Memory
• 16 KB 4-way set-associative I$ per core
• 8 KB 4-way set-associative D$ per core
• L1 policy:
  – Write-through
  – Allocate on load (LD)
  – No-allocate on store (ST)
• 3 MB 12-way set-associative shared L2 $
  – 4 x 768 KB independent banks
  – 2-cycle throughput, 8-cycle latency
  – Direct link to DRAM & JBus
  – Manages cache coherence for the 8 cores
  – CAM-based directory
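The L1 data cache policy above (write-through, allocate on load, no-allocate on store) can be made concrete with a toy model. This is a sketch under stated assumptions rather than a description of the real design: the class name is hypothetical, the 16-byte line size is not given on the slide, and LRU replacement is chosen only to keep the model simple. The 8 KB size and 4-way associativity are from the slide.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Set-associative cache: write-through, allocate on load, no-allocate on store."""

    def __init__(self, size_bytes=8 * 1024, ways=4, line_bytes=16):
        self.ways = ways
        self.line_bytes = line_bytes  # assumed line size, not stated on the slide
        self.num_sets = size_bytes // (ways * line_bytes)
        # One LRU-ordered dict of resident line tags per set (LRU is an assumption).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.l2_writes = 0  # stores forwarded to the next level

    def _locate(self, addr):
        line = addr // self.line_bytes
        return self.sets[line % self.num_sets], line // self.num_sets

    def load(self, addr):
        """Return True on a hit; on a miss, allocate the line, evicting the LRU one."""
        lines, tag = self._locate(addr)
        if tag in lines:
            lines.move_to_end(tag)
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # drop the least recently used line
        lines[tag] = True
        return False

    def store(self, addr):
        """Return True on a hit; every store is also written through to L2."""
        lines, tag = self._locate(addr)
        self.l2_writes += 1
        if tag in lines:
            lines.move_to_end(tag)
            return True
        return False  # store miss: no line is allocated


if __name__ == "__main__":
    cache = WriteThroughCache()
    print("sets:", cache.num_sets)
    print("store miss:", cache.store(0x1000))
    print("load after store miss:", cache.load(0x1000))
    print("second load:", cache.load(0x1000))
    print("writes sent to L2:", cache.l2_writes)
```

A store to a line that is not resident leaves the cache unchanged and only updates the next level, so the following load to the same address still misses. Because every store goes through to L2, the L1 never holds dirty data, which is consistent with letting the L2 manage coherence for all cores.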

Performance

| Test | Sun Fire T2000 | IBM p5-550 with 2 dual-core Power5 chips | Dell PowerEdge |
|---|---|---|---|
| SPECjbb2005 (Java server software), business operations/sec | 63,378 | 61,789 | 24,208 (SC1425 with dual single-core Xeon) |
| SPECweb2005 (Web server performance) | 14,001 | 7,881 | 4,850 (2850 with two dual-core Xeon processors) |
| NotesBench (Lotus Notes performance) | 16,061 | 14,740 | - |
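To read the benchmark figures as relative throughput, the scores can be divided pairwise. The snippet below uses only the numbers reported on this slide; the dictionary layout and the `speedups` helper are illustrative names, and the Dell entries refer to different server models in the two tests where a Dell result is given.

```python
# Scores as reported on the slide (higher is better).
results = {
    "SPECjbb2005": {"Sun Fire T2000": 63378, "IBM p5-550": 61789, "Dell PowerEdge": 24208},
    "SPECweb2005": {"Sun Fire T2000": 14001, "IBM p5-550": 7881, "Dell PowerEdge": 4850},
    "NotesBench": {"Sun Fire T2000": 16061, "IBM p5-550": 14740},  # no Dell result given
}


def speedups(scores, baseline="Sun Fire T2000"):
    """Ratio of the baseline system's score to each other system's score."""
    base = scores[baseline]
    return {name: base / value for name, value in scores.items() if name != baseline}


if __name__ == "__main__":
    for test, scores in results.items():
        for name, ratio in speedups(scores).items():
            print(f"{test}: T2000 scores {ratio:.2f}x the {name}")
```

These ratios compare raw throughput only. The slide's stated metric is performance per watt, and no power figures are given here, so the ratios do not capture that.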

“Home run”?
• Relatively slow single-thread performance
• Poor floating-point performance
• Lack of software support (Sun Fire T2000 does not support Linux or Windows)
• Price
• Competitors' counterattack
  – no place as a general-purpose computer running databases
  – small low-end market segment?
• Niagara II & the “Rock”
  – multiprocessor & enhanced single-thread support

References
• [1] P. Kongetira et al., “A 32-Way Multithreaded SPARC Processor,” IEEE Micro, vol. 25, pp. 21-29, Mar. 2005.
• [2] A. S. Leon et al., “A Power-Efficient High-Throughput 32-Thread SPARC Processor,” ISSCC 2006, Session 5, Processors.
• [3] S. Chaudhry, S. Yip, P. Caprioli, and M. Tremblay, “High Performance Throughput Computing,” IEEE Micro, vol. 25, issue 3, 2005.
• [4] http://opensparc.sunsource.net/nonav/opensparct1.html
• [5] http://www.sun.com/processors/UltraSPARC-T1/features.xml
• [6] http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp
• [7] http://news.com/Sun+begins+Sparc+phase+of+server+overhaul/2163-1010_35983365.html
• [8] http://h71028.www7.hp.com/ERC/cache/280124-0-0-0-121.html