Intel Compilers 9.x on the Intel Core
- Slides: 41
Intel Compilers 9.x on the Intel® Core Duo™ Processor, Windows version
Intel Software College

Objectives
At the successful completion of this module, you will be able to:
• Use key compiler optimization switches
• Optimize software for the architecture
• Enhance performance with vectorization and other techniques
Intel Compilers 9.x on the Intel® Core Duo™ Processor, Windows version. Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda
• Introduction
• Compiler Switches
• Dual Core
• Vectorization

Key to optimizing: Intel® Core™ Duo
Exploiting architectural power requires sophisticated compilers that make optimal use of:
• Registers and functional units
• Dual-core/multiprocessor
• SSE instructions
• Cache architecture

C++ Compatibility with Microsoft
• Source and binary compatible with VC 2003 under /Qvc71
• Source and binary compatible with VC 2005 under /Qvc8
• Microsoft* and Intel OpenMP binaries are not compatible
  • Use the same compiler for all modules compiled with OpenMP
For more information, refer to the User's Guide.

Use Intel Compiler in Microsoft IDE: C++

Agenda
• Introduction
• Compiler Switches
  • Intel® C++ compiler
• Dual Core
• Vectorization

General Optimizations
Windows*   Linux*, Mac*   Description
/Od        -O0            Disables optimizations
/Zi        -g             Creates symbols
/O1        -O1            Optimizes for binary size: server code
/O2        -O2            Optimizes for speed (default)
/O3        -O3            Optimizes for data cache: loopy floating-point code

Multi-pass Optimization: Interprocedural Optimizations (IPO)
• ip: Enables interprocedural optimizations for single-file compilation
• ipo: Enables interprocedural optimizations across files
Windows*   Linux*, Mac*
/Qip       -ip
/Qipo      -ipo
• Can inline functions in separate files
• Enhances optimization when used in combination with other compiler features

Multi-pass Optimization - IPO Usage: Two-Step Process
Pass 1: Compiling (produces virtual object files)
    Windows*  icl -c /Qipo main.c func1.c func2.c
    Linux*    icc -c -ipo main.c func1.c func2.c
    Mac*      icc -c -ipo main.c func1.c func2.c
Pass 2: Linking (produces the executable)
    Windows*  icl /Qipo main.obj func1.obj func2.obj
    Linux*    icc -ipo main.o func1.o func2.o
    Mac*      icc -ipo main.o func1.o func2.o

Profile Guided Optimizations (PGO)
• Use execution-time feedback to guide many other compiler optimizations
• Helps I-cache, paging, branch prediction
Enabled optimizations:
• Basic block ordering
• Better register allocation
• Better decision of functions to inline
• Function ordering
• Switch-statement optimization
• Better vectorization decisions

Multi-pass Optimization - PGO: Three-Step Process
Step 1: Instrumented compilation (produces an instrumented executable)
    Mac*/Linux*  icc -prof_gen[x] prog.c
    Windows*     icl -Qprof_gen[x] prog.c
Step 2: Instrumented execution
    Run the program on a typical dataset. This produces a DYN file containing dynamic info (.dyn).
Step 3: Feedback compilation
    Mac*/Linux*  icc -prof_use prog.c
    Windows*     icl -Qprof_use prog.c
    The DYN files are merged into a DYN summary file (.dpi).
Delete old .dyn files if you do not want their info included.

Agenda
• Introduction
• Compiler Switches
• Dual Core
  • Auto Parallelization
  • OpenMP
  • Threading Diagnostics
• Vectorization

Auto-parallelization: automatic threading of loops without having to manually insert OpenMP* directives.
Windows*          Linux*, Mac*
/Qparallel        -parallel
/Qpar_report[n]   -par_report[n]
• The compiler can identify "easy" candidates for parallelization, but large applications are difficult to analyze.

OpenMP* Threading Technology
Pragma-based approach to parallelism
Usage:
• OpenMP switches: -openmp (Linux*, Mac*), /Qopenmp (Windows*)
• OpenMP reports: -openmp-report (Linux*, Mac*), /Qopenmp-report (Windows*)
    #pragma omp parallel for
    for (i=0; i<MAX; i++)
        A[i] = c*A[i] + B[i];

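As a minimal sketch, the loop above can be placed in a complete function. The name scale_add and its signature are illustrative assumptions, not taken from the slides. Each iteration touches only its own element, which is what makes the loop safe to divide among threads.

```c
/* Hypothetical helper (name and signature are assumptions):
   computes a[i] = c * a[i] + b[i] for i in [0, n). */
void scale_add(float *a, const float *b, float c, int n)
{
    int i;
    /* With an OpenMP switch enabled the iterations are shared among
       threads; without it the pragma is ignored and the loop runs
       serially, producing the same result. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = c * a[i] + b[i];
}
```

Because the pragma is only a hint to an OpenMP-enabled compiler, the same source builds and behaves identically as serial code when the switch is omitted.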
OpenMP: Workqueuing Extension Example
Intel compiler's workqueuing extension
• Creates a queue of tasks
• Works on:
  • Recursive functions
  • Linked lists, etc.
    #pragma intel omp parallel taskq shared(p)
    {
        while (p != NULL) {
            #pragma intel omp task captureprivate(p)
            do_work1(p);
            p = p->next;
        }
    }

Parallel Diagnostics
Source instrumentation for Intel Thread Checker
• Allows Thread Checker to diagnose threading correctness bugs
• To use -tcheck (/Qtcheck) you must have Intel Thread Checker installed
• See the Thread Checker documentation: http://www.intel.com/support/performancetools/sb/CS-009681.htm
Windows*   Linux*    Mac*
/Qtcheck   -tcheck   No support

Agenda
• Introduction
• Compiler Switches
• Dual Core
• Vectorization
  • SSE & Vectorization
  • Reports
  • Explanations of a few specific vectorization inhibitors

SIMD: SSE, SSE2, SSE3 Support
[Diagram: packed data types per register, labeled by instruction set (MMX*, SSE, SSE2, SSE3): 4 x floats, 2 x doubles, 16 x bytes, 8 x words, 4 x dwords, 2 x qwords, 1 x dqword]
* MMX actually used the x87 floating-point registers; SSE, SSE2, and SSE3 use the new SSE registers.

SSE3 Instructions
• FP to integer conversions: FISTTP
• Complex arithmetic: ADDSUBPD, ADDSUBPS, MOVDDUP, MOVSHDUP, MOVSLDUP
• Video encoding: LDDQU
• SIMD FP using AOS format*: HADDPD, HSUBPD, HADDPS, HSUBPS
• Thread synchronization: MONITOR, MWAIT
* Also benefits complex arithmetic and vectorization

Using SSE3 - Your Task: Convert This…
    for (i=0; i<=MAX; i++)
        c[i] = a[i] + b[i];
[Diagram: scalar addition in 128-bit registers. A[0] + B[0] = C[0] and A[1] + B[1] = C[1] each occupy a single element; the remaining elements are marked "not used".]

… Into This …
    for (i=0; i<=MAX; i++)
        c[i] = a[i] + b[i];
[Diagram: packed addition in 128-bit registers. A[0..3] + B[0..3] = C[0..3], with all four elements of each register in use.]

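The packed form in the diagram can be written by hand with SSE intrinsics, which is roughly what the vectorizer generates automatically. This is only a sketch for single-precision data: the function name add_floats_sse is an assumption, and it uses unaligned loads and stores so that no alignment requirement is placed on the arrays.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper (name is an assumption):
   c[i] = a[i] + b[i] for i in [0, n), four floats per SSE operation. */
void add_floats_sse(const float *a, const float *b, float *c, int n)
{
    int i = 0;

    /* Main loop: one 128-bit register holds four packed floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);          /* load a[i..i+3] */
        __m128 vb = _mm_loadu_ps(b + i);          /* load b[i..i+3] */
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* four adds at once */
    }

    /* Remainder loop: leftover elements are handled one at a time. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

In practice the point of the compiler switches in this module is that the plain loop can stay in the source and the compiler produces the packed code.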
Compiler Based Vectorization: Processor Specific
Use W
• Windows*: /QxW   Linux*: -xW   Mac*: does not apply
• Generate instructions and optimize for Intel® Pentium® 4 compatible processors, including MMX, SSE and SSE2.
Use P
• Windows*: /QxP, /QaxP   Linux*: -xP, -axP   Mac*: vectorization occurs by default
• Generate instructions and optimize for Intel® processors with SSE3 capability, including Core Duo. These processors support SSE3 as well as MMX, SSE and SSE2.

Compiler Based Vectorization: Automatic Processor Dispatch - ax[?]
• Single executable
  • Optimized for Intel® Core Duo processors, plus generic code that runs on all IA-32 processors
• For each target processor it uses:
  • Processor-specific instructions
  • Vectorization
• Low overhead
  • Some increase in code size

Why Loops Don't Vectorize
Independence
• Loop iterations generally must be independent
Some relevant qualifiers:
• Some dependent loops can be vectorized.
• Most function calls cannot be vectorized.
• Some conditional branches prevent vectorization.
• Loops must be countable.
• Outer loop of a nest cannot be vectorized.
• Mixed data types cannot be vectorized.

Why Didn't My Loop Vectorize?
Windows*        Linux*, Macintosh*
-Qvec_reportn   -vec_reportn
Sets the diagnostic level dumped to stdout:
• n=0: No diagnostic information
• n=1: (Default) Loops successfully vectorized
• n=2: Loops not vectorized, and the reason why not
• n=3: Adds dependency information
• n=4: Reports only non-vectorized loops
• n=5: Reports only non-vectorized loops and adds dependency info

Why Loops Don't Vectorize
• "Existence of vector dependence"
• "Nonunit stride used"
• "Mixed Data Types"
• "Unsupported Loop Structure"
• "Contains unvectorizable statement at line XX"
• There are more reasons loops don't vectorize, but we will discuss the reasons above.

"Existence of Vector Dependency"
Usually indicates a real dependency between iterations of the loop, as shown here:
    for (i = 0; i < 100; i++)
        x[i] = A * x[i + 1];

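To see why a dependency between iterations matters, the sketch below uses a variant loop that reads the element written by the previous iteration, x[i] = a * x[i - 1]. This variant and the function names are assumptions chosen for illustration, not taken from the slides. One function runs the loop in serial order; the other imitates a 4-wide vector step by reading four inputs before writing four results.

```c
#define LEN 9

/* Serial order: iteration i sees the value iteration i-1 just wrote. */
void scale_serial(float *x, float a)
{
    for (int i = 1; i < LEN; i++)
        x[i] = a * x[i - 1];
}

/* Imitation of a 4-wide vector step: all four inputs are read before
   any of the four results is written, so three of the four reads see
   stale values. */
void scale_blocked(float *x, float a)
{
    int i = 1;
    for (; i + 4 <= LEN; i += 4) {
        float t[4];
        for (int k = 0; k < 4; k++)
            t[k] = a * x[i - 1 + k];   /* read phase */
        for (int k = 0; k < 4; k++)
            x[i + k] = t[k];           /* write phase */
    }
    for (; i < LEN; i++)               /* remainder, serial */
        x[i] = a * x[i - 1];
}
```

Starting from an array of ones with a = 2, the serial version produces successive powers of two, while the blocked version does not. A vectorizer must refuse this kind of loop to preserve the serial result.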
Defining Loop Independence
Iteration Y of a loop is independent of when (or whether) iteration X occurs.
    int a[MAX], b[MAX];
    for (j=0; j<MAX; j++) {
        a[j] = b[j];
    }

"Nonunit stride used"
    for (I=0; I<=MAX; I++)
        for (J=0; J<=MAX; J++) {
            c[I][J]+=1;                 // Unit stride
            c[J][I]+=1;                 // Non-unit
            A[J*J]+=1;                  // Non-unit
            A[B[J]]+=1;                 // Non-unit
            if (A[MAX-J]==1) last1=J;   // Non-unit
        }
End result: loading the vector may take more cycles than executing the operations sequentially.

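The first two statements of the example can be isolated into a small sketch. Both functions below update every element exactly once; they differ only in which index the inner loop varies. The names and the array size DIM are assumptions for illustration.

```c
#define DIM 4

/* Unit stride: the inner index j selects adjacent elements of row i,
   so consecutive iterations touch consecutive memory locations. */
void increment_unit_stride(int c[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            c[i][j] += 1;
}

/* Non-unit stride: the inner index j selects the row, so consecutive
   iterations are DIM elements apart in memory. */
void increment_non_unit_stride(int c[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            c[j][i] += 1;
}
```

The results are identical, so when the algorithm allows it, interchanging the loops to get the unit-stride form is a common way to make the inner loop a vectorization candidate.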
"Mixed Data Types"
An example:
    int howmany_close(double *x, double *y)
    {
        int withinborder=0;
        double dist;
        for(int i=0; i<MAX; i++) {
            dist=sqrtf(x[i]*x[i] + y[i]*y[i]);
            if (dist<5) withinborder++;
        }
        return withinborder;
    }
Mixed data types are possible, but complicate things
• e.g., 2 doubles vs. 4 ints per SIMD register
Some operations with specific data types won't work

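One possible way to reduce the mismatch is sketched below under stated assumptions: the data is held as float, so four values fit in a SIMD register alongside four ints, and squared distances are compared so that no conversion through sqrtf is needed. The function name, the float inputs, and the value of MAX are illustrative choices, not part of the original example, and whether a given compiler then vectorizes the loop still has to be checked with the vectorization report.

```c
#define MAX 8   /* assumed array length for this sketch */

/* Hypothetical single-precision variant: counts the points whose
   distance from the origin is less than 5, comparing squared values. */
int howmany_close_f(const float *x, const float *y)
{
    int withinborder = 0;
    for (int i = 0; i < MAX; i++) {
        float dist2 = x[i] * x[i] + y[i] * y[i];  /* squared distance */
        if (dist2 < 25.0f)                        /* same test as dist < 5 */
            withinborder++;
    }
    return withinborder;
}
```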
"Unsupported Loop Structure"
Example:
    struct _xx { int data; int bound; };
    void doit1(int *a, struct _xx *x)
    {
        for (int i=0; i<x->bound; i++)
            a[i] = 0;
    }
An unsupported loop structure means the loop is not countable, or the compiler for whatever reason can't construct a run-time expression for the trip count.

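A common remedy, sketched here as an assumption rather than as the slides' own fix, is to copy the bound into a local variable before the loop. In the original form the store to a[i] could, as far as the compiler can tell, modify x->bound, so the trip count is not known to be fixed. With a local copy the trip count is determined on entry to the loop. The name doit1_hoisted is hypothetical.

```c
struct _xx { int data; int bound; };

/* Hypothetical variant of doit1: the bound is read once, so stores
   through a cannot change the number of iterations. */
void doit1_hoisted(int *a, struct _xx *x)
{
    int n = x->bound;   /* loop-invariant trip count */
    for (int i = 0; i < n; i++)
        a[i] = 0;
}
```

Note that this changes behavior in the unusual case where a really does overlap x->bound, so it is only appropriate when the programmer knows the two do not alias.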
"Contains unvectorizable statement"
    for (i=1; i<nx; i++) {
        B[i] = func(A[i]);
    }
[Diagram: func is applied separately to each of A[0], A[1], A[2], A[3] to produce B[0], B[1], B[2], B[3], instead of operating on a full 128-bit register at once.]

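When the called function is small, one option is to make its body visible to the compiler so the call can be inlined, leaving plain arithmetic in the loop. The sketch below assumes a stand-in body for func, since the slides do not show one; apply_func is also a hypothetical name.

```c
/* Hypothetical stand-in for func(): simple arithmetic, defined in the
   same file so the compiler can inline it at the call site. */
static inline float func(float v)
{
    return v * v + 1.0f;
}

/* Same loop as above, wrapped in a function for illustration. */
void apply_func(const float *A, float *B, int nx)
{
    for (int i = 1; i < nx; i++)
        B[i] = func(A[i]);
}
```

For functions defined in other source files, the interprocedural optimization switches described earlier are the mechanism the slides name for inlining across files.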
Reference
Web-based and classroom training
• www.intel.com/software/college
White papers and technical notes
• www.intel.com/ids
• www.intel.com/software/products
Product support resources
• www.intel.com/software/products/support

Activity 1 - raytrace2: Initial Compilation
Set up the environment and compile with both Microsoft* Visual C++ .NET (MSVC*) and the Intel® C++ Compiler (icl).

Activity 2 - raytrace2: O3 Compilation
Use the Intel compiler's High Level Optimizer (-O3) for loop-centric codes.

Activity 3 - raytrace2: IPO Compilation
Use the Intel compiler's Interprocedural Optimization (-Qipo).

Activity 4 - raytrace2: PGO Compilation
Use the Intel compiler's Profile-Guided Optimization.

Activity 5 - raytrace2: Vectorization
Use the Intel compiler's vectorization optimization (-QxP).

Activity 6 - raytrace2: Putting it all together
Use all previous optimizations in tandem (-O3, -QxP, IPO and PGO).