Comparing Intel C++ and Microsoft Visual C++ Compilers

Michael Baum, David Boyett, Holly Garrison

Agenda
• Problem Statement
• System Environment
• Programs Used for Comparison
• Matrix Processing Programs Results and Analysis
• SPEC Benchmark Results and Analysis
• Conclusion

Problem Statement
• The general purpose of our project is to verify Intel’s claim that its compiler is 10% better than the Microsoft Visual C++ compiler.
• Data will be gathered with the Intel VTune Performance Analyzer from both SPEC CPU 2000 benchmarks and simple matrix processing programs.

System Environment
• Programs were run on a single-processor system with an Intel Pentium 4 2.4 GHz processor and 512 MB of RAM.
  – Windows 2000 operating system
• Microsoft Visual .NET compiler
• Intel C++ Compiler 7.1 for Windows
• Intel VTune Performance Analyzer 7.0

Programs Used for Comparison
• SPEC CPU 2000 Benchmark
  – 164.gzip
  – 300.twolf
• Simple Matrix Processing Programs
  – Array Summation of 10,000 elements
  – Matrix Multiplication of 250 x 250 matrices

VTune Setup
• Using Intel’s VTune application, the following events were measured:
  – Instruction Count
  – Clockticks and Clockticks per Instruction
  – Loads & Stores
  – Level 1 cache misses
  – Mispredicted Calls and Branches

Matrix Processing Programs Results
Executable (*.exe)      | Mispredicted Calls | Mispredicted Branches | 1st Level Cache Misses |      Loads |    Stores | Clockticks | Instruction Count | Clockticks/Instr.
Array Sum 10000 (Intel) |              1,518 |                22,285 |                 49,890 |  1,268,145 |   844,962 | 18,995,295 |           981,030 |             19.36
Array Sum 10000 (VC++)  |              4,536 |                39,123 |                186,760 |    863,772 | 1,162,239 | 13,069,242 |         1,462,053 |              8.94
Matrix Mult 250 (Intel) |                220 |                 5,132 |                      0 |          0 |   657,324 |  9,502,532 |         1,979,090 |              4.80
Matrix Mult 250 (VC++)  |                289 |                68,354 |             18,640,249 | 31,728,270 |   657,328 | 88,513,594 |        54,242,733 |              1.63

Matrix Processing Programs Results (cont.)

Matrix Processing Analysis
• For simple matrix and array processing, the results verified Intel’s claim of a 10% better compiler.
  – With the exception of the number of Stores executed, the Intel compiler showed approximately a 50% savings in the measured operations.
• The Matrix Multiplication program showed one noteworthy result: the Intel-compiled executable recorded zero events for both 1st Level Cache Misses and Loads.
  – Verified by multiple builds and runs

SPEC Benchmark Results
Executable (*.exe) | Mispredicted Calls | Mispredicted Branches | 1st Level Cache Misses |          Loads |         Stores |      Clockticks | Instruction Count | Clockticks/Instr.
164.gzip (Intel)   |             11,725 |           871,754,172 |          2,267,577,936 | 22,054,374,342 | 11,101,416,840 | 106,412,563,515 |    76,670,596,520 |              1.39
164.gzip (VC++)    |              7,695 |           869,317,015 |          2,273,066,852 | 22,074,844,248 | 11,108,909,049 | 107,286,054,470 |    76,671,138,915 |              1.40
300.twolf (Intel)  |                346 |             4,874,982 |              7,639,211 |     77,060,025 |     32,577,657 |     484,933,215 |       210,922,988 |              2.30
300.twolf (VC++)   |                537 |             4,797,552 |              7,526,588 |     76,831,638 |     33,214,416 |     473,946,742 |       211,425,444 |              2.24

SPEC Benchmark Results

SPEC CPU 2000 Analysis
• The SPEC CPU 2000 benchmarks did not show any significant difference between the two compilers.
• The SPEC benchmarks were recompiled and data sets were collected multiple times to verify the validity of the original data.

Conclusions
• Even though our group saw significant performance improvements for our small test programs, the same gains could not be reproduced for the benchmark applications.
• These variations might be the result of differences in program complexity.

Conclusions (cont.)
• The Intel C++ Compiler showed results that were equal to, or in some cases better than, those of Microsoft Visual C++.
• While Intel’s claim of 10% better results may not hold in all cases, it is still a superior compiler.