Execution Characteristics of SPEC CPU 2000 Benchmarks: Intel C++ vs. Microsoft VC++
Swathi Tanjore Gurumani, Aleksandar Milenkovic
Electrical and Computer Engineering Department, University of Alabama in Huntsville
Department of Electrical and Computer Engineering - UAH, ACMSE ’04, AL

Outline
• Objective
• Background
• Problem Overview
• Performance Evaluation - Overview
• Experimental Setup
• Results
• Conclusion and Future Research

Problem Objective
Prove and stress the importance of designing architecture-aware compilers

Background - Application Performance
§ Advancement in processor technology
• Deep pipelining
• Multi-level cache hierarchy
• Improved branch predictors
• Out-of-order execution engine
• Advanced floating point
• Multimedia units
§ Compilers
• Optimization levels and switches
§ Compilers should keep up with processor technology

Architecture-aware Compilers
§ Compiler/hardware interaction can maximize application performance by
• Exploiting advances in processor technology
• Generating target-specific optimal code
§ Path length reduction
§ Efficient instruction selection
§ Pipeline scheduling
§ Instruction-level parallelism
§ Memory penalty minimization

Performance Evaluation
§ Systematic process of data collection and analysis to determine and evaluate any system
[Figure: Benchmarks → Compile → Exe → Performance Metrics]
§ Benchmark: a program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed.

Performance Evaluation - Previous Works
§ Study underlying architecture and characterize workloads
• Evaluation of Pentium Pro using SPEC 2000
• Evaluation of Pentium II using multimedia applications
§ Processor-centric optimization
• Xeon vs. Pentium III
• Pentium III vs. Pentium IV
§ Compilers and optimization
• Branch optimizations by different compilers

Problem Overview
§ Objective: prove and stress the importance of architecture-aware compilers
§ How?
• Compile the benchmarks using different compilers
• Use the same optimization switches
• Execute the binaries under a performance analyzer
• Analyze and compare the performance metrics collected
§ Same OS and hardware features, so differences in metrics are due only to the compiler used

Experimental Setup
[Figure: SPEC CPU 2000 → IC++ / VC++ → Exe → VTune → Performance Metrics]
Processor: Pentium IV
Operating System: Windows 2000
Optimization Level: /O2
Input: Reference set from SPEC

SPEC CPU 2000
§ Benchmarks portray real user applications and are computation intensive
§ Can measure performance of processor, memory and compiler
§ Does not stress I/O devices, networking or the OS
§ Used CINT 2000 and CFP 2000

Name                 Description
164.gzip (INT)       Data Compression written in C
176.gcc (INT)        C Programming Language Compiler
177.mesa (FP)        3-D Graphics Library written in C
181.mcf (INT)        Combinatorial Optimization written in C
186.crafty (INT)     Chess – Game Playing written in C
197.parser (INT)     Word Processing written in C
252.eon (INT)        Computer Visualization written in C++
253.perlbmk (INT)    PERL Programming Language written in C
254.gap (INT)        Group Theory, Interpreter written in C
255.vortex (INT)     Object Oriented database written in C

VTune Performance Analyzer
§ Simultaneous sampling of multiple events and real-time display using counter monitors
§ Supports time-based and event-based sampling (EBS)
• To take advantage of Pentium IV’s EBS feature
§ Has low intrusion
• Samples collected provide a closer representation of the application’s actual performance
§ Events collected
• Clockticks, instructions retired, loads retired, stores retired, branches retired, 1st level cache misses and mispredicted branches

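The raw event counts listed above are usually combined into ratios before two binaries are compared. The following is a minimal sketch, assuming the per-run totals have already been exported from the analyzer. The `EventCounts` class, its field names, and the sample numbers are hypothetical and are not part of VTune's interface. Treating "other instructions" as everything that is not a load, store, or branch is also an assumption, not something the slides state.

```python
from dataclasses import dataclass


@dataclass
class EventCounts:
    """Per-run event totals (hypothetical container, not a VTune API)."""
    clockticks: int
    instructions: int
    loads: int
    stores: int
    branches: int
    l1_misses: int
    mispredicted_branches: int

    def cpi(self) -> float:
        # Clockticks per retired instruction.
        return self.clockticks / self.instructions

    def mispredict_rate(self) -> float:
        # Fraction of retired branches that were mispredicted.
        return self.mispredicted_branches / self.branches

    def l1_misses_per_kilo_instr(self) -> float:
        # 1st level cache misses per 1000 retired instructions.
        return 1000 * self.l1_misses / self.instructions

    def other_instructions(self) -> int:
        # Assumption: "other" = retired instructions that are not
        # counted as loads, stores, or branches.
        return self.instructions - self.loads - self.stores - self.branches

    def mix(self) -> dict:
        # Instruction mix as fractions of all retired instructions.
        total = self.instructions
        return {
            "loads": self.loads / total,
            "stores": self.stores / total,
            "branches": self.branches / total,
            "other": self.other_instructions() / total,
        }


# Illustrative numbers only; these are NOT measurements from the study.
sample = EventCounts(
    clockticks=2_000, instructions=1_000, loads=300, stores=100,
    branches=200, l1_misses=30, mispredicted_branches=10,
)
print(sample.cpi(), sample.mispredict_rate(), sample.mix())
```

Comparing the mix as fractions next to the absolute counts helps separate a change in the distribution of instruction types from a change in the total number of instructions.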
Compiler Optimizations

Option    Effect
/Od       Disable optimization
/O1       Minimize size
/O2       Maximize speed

§ Both compilers were used with the /O2 option
§ Both invoke the same switches and have the same functions
§ Microsoft VC++ has special switches to target Pentium (/G5) and Pentium Pro (/G6)
§ Intel C++ compiler optimizes performance for applications running on Intel architecture-based computers
§ Performance gains from using IC++ are the result of
- profile-guided optimization
- prefetch instructions
- support for Streaming SIMD Extensions (SSE)
- data prefetching
- inter-procedural optimization

Comparison of Clock ticks
§ On average, 10% performance gain with IC++
§ Performance gain more pronounced for the 3D graphics library (177.mesa) and the computer visualization application (252.eon)

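One way to express a per-benchmark gain from clocktick totals is the fractional reduction in clockticks relative to the VC++ binary. The slides do not state how the average was computed, so the arithmetic mean used here is an assumption, and the function names and sample values are hypothetical.

```python
def performance_gain(ticks_vc: int, ticks_ic: int) -> float:
    """Fractional reduction in clockticks of the IC++ binary vs. VC++."""
    return (ticks_vc - ticks_ic) / ticks_vc


def mean_gain(ticks_by_benchmark: dict) -> float:
    """Arithmetic mean of per-benchmark gains (assumed averaging method)."""
    gains = [performance_gain(vc, ic) for vc, ic in ticks_by_benchmark.values()]
    return sum(gains) / len(gains)


# Illustrative (VC++, IC++) clocktick totals; NOT measured data.
sample_ticks = {
    "bench_a": (100, 95),
    "bench_b": (200, 170),
}
print(f"mean gain: {mean_gain(sample_ticks):.1%}")
```

A geometric mean of the clocktick ratios would be another reasonable choice; the two can differ noticeably when a few benchmarks dominate the gain.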
Comparison of Binaries

Code Size (in Bytes)
Benchmark       MSVC++       IC++
164.gzip        69,632       77,824
176.gcc         1,089,536    1,314,816
177.mesa        442,368      610,304
181.mcf         49,152       53,248
186.crafty      241,664      258,048
197.parser      118,784      131,072
252.eon         405,504      413,696
253.perlbmk     516,096      651,264
254.gap         356,352      413,696
255.vortex      417,792      454,656

§ VC++ produced smaller binaries

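The size difference can be made easier to compare across benchmarks by expressing it as a relative increase. The sketch below uses the code sizes from the table above; the helper name `size_increase` is introduced only for illustration.

```python
# Code sizes in bytes, copied from the table: (MSVC++, IC++).
CODE_SIZE = {
    "164.gzip": (69_632, 77_824),
    "176.gcc": (1_089_536, 1_314_816),
    "177.mesa": (442_368, 610_304),
    "181.mcf": (49_152, 53_248),
    "186.crafty": (241_664, 258_048),
    "197.parser": (118_784, 131_072),
    "252.eon": (405_504, 413_696),
    "253.perlbmk": (516_096, 651_264),
    "254.gap": (356_352, 413_696),
    "255.vortex": (417_792, 454_656),
}


def size_increase(msvc: int, icpp: int) -> float:
    """Relative growth of the IC++ binary over the MSVC++ binary."""
    return (icpp - msvc) / msvc


increase = {name: size_increase(*sizes) for name, sizes in CODE_SIZE.items()}
largest = max(increase, key=increase.get)
smallest = min(increase, key=increase.get)

for name, frac in increase.items():
    print(f"{name:12s} {frac:6.1%}")
print("largest increase:", largest, "smallest increase:", smallest)
```

On these figures every IC++ binary is larger than its MSVC++ counterpart, with the largest relative increase for 177.mesa and the smallest for 252.eon.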
Comparison of Instruction Count
§ 3D graphics and computer visualization applications show a much greater reduction in instruction count than the others

Comparison of Loads

Comparison of Stores

Comparison of Branches

Comparison of Other Instructions

Comparison of Cache Misses

Conclusion & Future Research
§ Execution characteristics of the CPU 2000 benchmarks were presented for VC++ and IC++
§ IC++ performed better than VC++ for all considered applications, with the gain more pronounced for graphics applications
§ Distribution of loads, stores and branches was the same; the difference was in absolute numbers
§ No difference in branch prediction and memory references
§ Use: identifying strengths and weaknesses of compilers
§ Future Directions
• Different optimization switches
• Usage of microbenchmarks for better control

Thank You! Questions and Feedback…