C++11 and three compilers: performance considerations (19/08/2014, CERN)


C++11 and three compilers: performance considerations
› 19/08/2014, CERN openlab Summer Students Lightning Talks Sessions
› Supervised by Pawel Szostek
› Stephen Wang


Motivation
› "Surprisingly, C++11 feels like a new language." (Bjarne Stroustrup)
› Improvements: language usability, multithreading, and more. How about performance?
› Four questions: time measurement methods, for-loop efficiency, std::async, STL algorithms in parallel mode
› Compilers: GCC 4.9.0, ICC 15.0.0, Clang+LLVM 3.4
› Optimization levels: -O2, -O3, -Ofast


Contribution
› Write four micro-benchmarks, one for each feature. Code: https://github.com/wangyichao/CPP11_Benchmarks
› Automate the process and make the results repeatable: Python and bash scripts drive Compile -> Run -> Table (3 compilers × 3 options × containers × algorithms …)
› Use profiling tools (perf, Intel VTune, LIKWID, even the PMU directly) to check aspects such as vectorization and threading.
› Answer the basic questions.


Conclusions
› Timer overhead: two back-to-back clock reads, with the difference recorded in nanoseconds.
    auto start = std::chrono::steady_clock::now();
    auto end = std::chrono::steady_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    overhead.push_back(elapsed);
[Chart: RDTSCP vs. std::chrono]
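A minimal sketch of how the overhead vector above might be filled and summarized. The helper names measure_now_overhead and median_ns, the sample count, and the use of the median are assumptions for illustration, not the benchmark's actual code; an RDTSCP variant is omitted because it needs x86-specific intrinsics.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: collect `samples` measurements of the cost of two
// back-to-back steady_clock::now() calls, in nanoseconds.
std::vector<long long> measure_now_overhead(std::size_t samples) {
    std::vector<long long> overhead;
    overhead.reserve(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        auto start = std::chrono::steady_clock::now();
        auto end = std::chrono::steady_clock::now();
        overhead.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
    }
    return overhead;
}

// Hypothetical helper: median of the samples (upper median for even sizes).
// The median is robust against outliers caused by interrupts or context
// switches. Takes a copy because nth_element reorders the elements.
long long median_ns(std::vector<long long> values) {
    if (values.empty()) return 0;
    auto mid = values.begin() + static_cast<std::ptrdiff_t>(values.size() / 2);
    std::nth_element(values.begin(), mid, values.end());
    return *mid;
}
```

For example, median_ns(measure_now_overhead(100000)) gives a single per-call overhead figure that can be subtracted from short measurements.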


Conclusions
› std::chrono (C++11) is a reliable way to measure time, checked here by timing sleeps of known length:
    auto start = std::chrono::steady_clock::now();
    microseconds_sleep(index);
    auto end = std::chrono::steady_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
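A sketch of the sleep-based check above, assuming std::this_thread::sleep_for as a stand-in for the slides' microseconds_sleep (whose implementation is not shown). The helper names measure_sleep_ns and relative_error are hypothetical.

```cpp
#include <chrono>
#include <thread>

// Time a sleep of the requested length. sleep_for blocks for at least the
// requested duration (possibly longer, depending on the scheduler).
long long measure_sleep_ns(std::chrono::microseconds requested) {
    auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);  // stand-in for microseconds_sleep
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}

// Relative error of a measurement versus the requested sleep (0.05 == 5%).
double relative_error(long long measured_ns, std::chrono::microseconds requested) {
    const double expected_ns = static_cast<double>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(requested).count());
    return (static_cast<double>(measured_ns) - expected_ns) / expected_ns;
}
```

Sweeping the requested length and plotting relative_error is one way to see whether the clock, rather than the sleep itself, limits the accuracy.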


Conclusions
› The performance of a for-loop varies with the iteration method and the container.
› Which containers are vectorized?
[Table: range-based for loop; rows: array, std::list, std::set, std::vector; columns: GCC, ICC, Clang]
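A sketch of the kind of loops being compared, written under assumptions: the function names and the summation kernel are hypothetical, not the benchmark's code. It shows an index loop, an explicit iterator loop, and a C++11 range-based for loop over the same data.

```cpp
#include <cstddef>
#include <list>
#include <set>
#include <vector>

// Contiguous containers (arrays, std::vector) are candidates for
// auto-vectorization; node-based containers (std::list, std::set) need
// pointer chasing, which generally prevents it. A floating-point sum like
// these typically vectorizes only when reassociation is allowed (e.g. -Ofast).

// Index loop: only meaningful for random-access containers.
double sum_index(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Explicit iterator loop: works for any container.
template <typename Container>
double sum_iterator(const Container& c) {
    double s = 0.0;
    for (auto it = c.begin(); it != c.end(); ++it) s += *it;
    return s;
}

// C++11 range-based for loop: works for any container.
template <typename Container>
double sum_range_for(const Container& c) {
    double s = 0.0;
    for (const auto& x : c) s += x;
    return s;
}
```

Instantiating the templates with std::vector<double>, std::list<double> and std::set<double> and compiling at each optimization level gives one cell per compiler and container.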


Conclusions
› With std::launch::async, std::async spawns a thread when called and the computation starts immediately (libstdc++ from GCC).
    double sum = 0;
    auto handle1 = std::async(std::launch::async, fsum, v.begin()+distance);
    auto handle2 = std::async(…);
    …
    sum = handle1.get() + handle2.get() + handle3.get() + handle4.get();
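A self-contained sketch of the four-handle pattern above. The slides' fsum arguments are truncated, so fsum_range(first, last) and parallel_sum are hypothetical stand-ins that split a vector into equal chunks.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for fsum: sums the half-open range [first, last).
double fsum_range(std::vector<double>::const_iterator first,
                  std::vector<double>::const_iterator last) {
    return std::accumulate(first, last, 0.0);
}

// Split v into `parts` chunks and sum each one in its own task.
// std::launch::async requests that each task run as if on a new thread,
// started right away rather than deferred until get() is called.
double parallel_sum(const std::vector<double>& v, std::size_t parts = 4) {
    if (parts == 0) parts = 1;
    const std::size_t chunk = v.size() / parts;
    std::vector<std::future<double>> handles;
    handles.reserve(parts);
    for (std::size_t p = 0; p < parts; ++p) {
        auto first = v.cbegin() + static_cast<std::ptrdiff_t>(p * chunk);
        // The last chunk also takes the remainder when size % parts != 0.
        auto last = (p + 1 == parts) ? v.cend()
                                     : first + static_cast<std::ptrdiff_t>(chunk);
        handles.push_back(std::async(std::launch::async, fsum_range, first, last));
    }
    double sum = 0.0;
    for (auto& h : handles) sum += h.get();  // get() waits for that task
    return sum;
}
```

Passing the launch policy explicitly matters: with the default policy the implementation may defer the work until get(), which would serialize the four sums.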


Conclusions
› GNU libstdc++ parallel mode speeds up some of the STL algorithms; the speedup varies with the container.
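A portable sketch of how such a comparison could be set up; the helper names are hypothetical. With GCC's libstdc++, building the same source with -fopenmp -D_GLIBCXX_PARALLEL replaces std:: algorithm calls such as std::accumulate with their parallel-mode versions, without source changes. The expectation that the std::list run stays sequential is an assumption based on many parallel-mode algorithms only parallelizing random-access ranges.

```cpp
#include <chrono>
#include <list>
#include <numeric>
#include <vector>

// Build, e.g.:  g++ -O3 -fopenmp -D_GLIBCXX_PARALLEL bench.cpp
// Without those flags the same calls run sequentially.

// Time one callable with steady_clock, in nanoseconds.
template <typename F>
long long time_ns(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}

// Same algorithm on any container: compare a random-access std::vector
// with a node-based std::list. The sum is returned through `result`.
template <typename Container>
long long time_accumulate(const Container& c, double& result) {
    return time_ns([&] { result = std::accumulate(c.begin(), c.end(), 0.0); });
}
```

Running time_accumulate on a std::vector<double> and a std::list<double> holding the same values, with and without the parallel-mode flags, gives the per-container comparison.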

Thanks
