Increasing Loop-Level Parallelism
Intel
Copyright © 2002 Intel Corporation

Agenda
§ Introduction
§ Who Cares?
§ Definition
§ Loop Dependence and Removal
§ Dependency Identification
§ Lab
§ Summary

Introduction
§ Loops must meet certain criteria…
  § Iteration Independence
  § Memory Disambiguation
  § High Loop Count
  § Etc…

Who Cares
§ Achieving true parallelism:
  § OpenMP
  § Auto-parallelization…
§ Explicit instruction-level parallelism (ILP):
  § Streaming SIMD (MMX, SSE2, …)
  § Software pipelining on the Intel® Itanium™ processor
  § Removing dependencies for the out-of-order core
  § More instructions run in parallel on the Intel Itanium processor
§ Automatic compiler parallelization:
  § High-level optimizations

Definition
Loop Independence: Iteration Y of a loop is independent of when or whether iteration X happens

    int a[MAX];
    for (J=0; J<MAX; J++) {
        a[J] = b[J];
    }

Legend
§ OpenMP: True Parallelism
§ SIMD: Vectorization
§ SWP: Software Pipelining
§ OOO: Out-of-Order Core
§ ILP: Instruction Level Parallelism
§ Green: Benefits from concept
§ Yellow: Some benefit from concept
§ Red: No benefit from concept

Agenda
§ Definition
§ Who Cares?
§ Loop Dependence and Removal
§ Data Dependencies
§ Removing Dependencies
§ Data Ambiguity and the Compiler
§ Dependency Removal Lab
§ Summary

Flow Dependency
§ Read After Write
§ Cross-Iteration Flow Dependence: variables written then read in different iterations

    for (J=1; J<MAX; J++) {
        A[J] = A[J-1];
    }

Unrolled, the first iterations are:

    A[1] = A[0];
    A[2] = A[1];
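A minimal C sketch of why this flow dependence blocks reordering. The helper names `propagate_forward` and `propagate_reverse` are hypothetical, and `n` stands in for MAX. Running the same loop body in the opposite order produces a different array, so the iterations cannot safely run out of order or concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Original order: each iteration reads the value the previous iteration
   just wrote, so a[0] propagates through the whole array. */
void propagate_forward(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t j = 1; j < n; j++)
        a[j] = a[j - 1];
}

/* Same body, reversed order: each iteration now reads the ORIGINAL
   a[j-1], so the array is shifted right by one element instead. */
void propagate_reverse(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t j = n; j-- > 1; )
        a[j] = a[j - 1];
}
```

Starting from {1, 2, 3, 4}, the forward loop yields {1, 1, 1, 1} while the reversed loop yields {1, 1, 2, 3}.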

Anti-Dependency
§ Write After Read
§ Cross-Iteration Anti-Dependence: variables read then written in different iterations

    for (J=1; J<MAX; J++) {
        A[J] = A[J+1];
    }

Unrolled, the first iterations are:

    A[1] = A[2];
    A[2] = A[3];
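One common way to remove an anti-dependence is to read from a separate copy of the data, so that no iteration overwrites a value another iteration still needs. The sketch below assumes the caller supplies a scratch buffer `old` of the same length; the function name `shift_left_independent` is hypothetical, and the loop bound is tightened so that `a[j + 1]` stays inside an array of `n` elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift elements left by one, reading from a snapshot of the input.
   After the copy, every iteration reads only `old` and writes only `a`,
   so the iterations can run in any order. */
void shift_left_independent(int *a, int *old, size_t n)
{
    assert(a != NULL && old != NULL);
    memcpy(old, a, n * sizeof *a);      /* snapshot of the original values */
    for (size_t j = 1; j + 1 < n; j++)
        a[j] = old[j + 1];
}
```

The cost is extra memory and one copy pass, which is often acceptable when it unlocks vectorization or threading.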

Output Dependency
§ Write After Write
§ Cross-Iteration Output Dependence: variables written then written again in a different iteration

    for (J=1; J<MAX; J++) {
        A[J]   = B[J];
        A[J+1] = C[J];
    }

Unrolled, the first two iterations are:

    A[1] = B[1];
    A[2] = C[1];
    A[2] = B[2];
    A[3] = C[2];
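For this particular loop the output dependence can be removed by noticing which write survives: every `A[J+1] = C[J]` is overwritten by the next iteration's `A[J] = B[J]`, except the last one. The sketch below compares the serial loop with a rewritten form; the function names are hypothetical, `n` stands in for MAX, and `a` is assumed to hold `n + 1` elements.

```c
#include <assert.h>
#include <stddef.h>

/* Loop as on the slide: a[j+1] is written here and written again
   by the following iteration. */
void write_serial(int *a, const int *b, const int *c, size_t n)
{
    assert(a && b && c);
    for (size_t j = 1; j < n; j++) {
        a[j] = b[j];
        a[j + 1] = c[j];
    }
}

/* Rewritten form: each element of a is written exactly once, so the
   loop iterations are independent. */
void write_independent(int *a, const int *b, const int *c, size_t n)
{
    assert(a && b && c);
    for (size_t j = 1; j < n; j++)
        a[j] = b[j];
    if (n > 1)
        a[n] = c[n - 1];    /* the only value from c that survives */
}
```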

Intra-Iteration Dependency
§ Dependency within an iteration
§ Hurts ILP
§ May be automatically removed by compiler

    K = 1;
    for (J=1; J<MAX; J++) {
        A[J] = A[J] + 1;
        B[K] = A[K] + 1;
        K = K + 2;
    }

In the first iteration, the second statement reads the value the first statement just wrote:

    A[1] = A[1] + 1;
    B[1] = A[1] + 1;

Remove Dependencies
§ Best Choice
§ Requirement for true parallelism
§ Not all dependencies can be removed

Before:

    for (J=1; J<MAX; J++) {
        A[J] = A[J-1] + 1;
    }

After:

    for (J=1; J<MAX; J++) {
        A[J] = A[0] + J;
    }
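The equivalence of the two loops can be checked with a small C sketch. The function names `fill_serial` and `fill_closed_form` are hypothetical and `n` stands in for MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Dependent form: iteration j needs the result of iteration j-1. */
void fill_serial(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t j = 1; j < n; j++)
        a[j] = a[j - 1] + 1;
}

/* Independent form: every iteration depends only on a[0] and j.
   a[0] is read once up front because it is never written by the loop. */
void fill_closed_form(int *a, size_t n)
{
    assert(a != NULL);
    if (n == 0)
        return;
    const int base = a[0];
    for (size_t j = 1; j < n; j++)
        a[j] = base + (int)j;
}
```

With `a[0] = 5` and `n = 4`, both functions produce {5, 6, 7, 8}.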

Increasing ILP, without removing dependencies
§ Good: Unroll Loop
§ Make sure the compiler can't or didn't do this for you
§ Compiler should not apply common subexpression elimination
§ Also notice that if this is floating-point data, precision could be altered

Before:

    for (J=1; J<MAX; J++) {
        A[J] = A[J-1] + B[J];
    }

After:

    /* assumes an even number of iterations; otherwise the last one must be handled separately */
    for (J=1; J<MAX; J+=2) {
        A[J]   = A[J-1] + B[J];
        A[J+1] = A[J-1] + (B[J] + B[J+1]);
    }
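A sketch of the unrolled loop with a remainder step, so that it is safe for any trip count. The function names are hypothetical and `n` stands in for MAX. Integer data is used here so the two versions match exactly; with floating-point data the regrouped addition could round differently, as the slide warns.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: a running sum with a cross-iteration flow dependence. */
void running_sum_serial(int *a, const int *b, size_t n)
{
    assert(a && b);
    for (size_t j = 1; j < n; j++)
        a[j] = a[j - 1] + b[j];
}

/* Unrolled by two: both statements depend only on a[j-1], so they can
   issue in parallel. (b[j] + b[j+1]) does not depend on a at all. */
void running_sum_unrolled(int *a, const int *b, size_t n)
{
    assert(a && b);
    size_t j = 1;
    for (; j + 1 < n; j += 2) {
        a[j]     = a[j - 1] + b[j];
        a[j + 1] = a[j - 1] + (b[j] + b[j + 1]);
    }
    if (j < n)                      /* leftover iteration for odd trip counts */
        a[j] = a[j - 1] + b[j];
}
```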

Induction Variables
§ Induction variables are incremented on each trip through the loop
§ Fix by replacing increment expressions with a pure function of the loop index

Before:

    i1 = 0;
    i2 = 0;
    for (J=0; J<MAX; J++) {
        i1 = i1 + 1;
        B[i1] = …
        i2 = i2 + J;
        A[i2] = …
    }

After:

    for (J=0; J<MAX; J++) {
        B[J+1] = …
        A[(J*J + J)/2] = …
    }
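The closed forms can be checked against the incrementing version with a short C sketch. The function names and the stored value `100 + j` are hypothetical placeholders for the elided right-hand sides. After the increment in iteration `j`, `i1` equals `j + 1` and `i2` equals `0 + 1 + … + j = j(j + 1)/2`.

```c
#include <assert.h>
#include <stddef.h>

/* Incrementing form: i1 and i2 carry state from one iteration to the next. */
void scatter_serial(int *a, int *b, int n)
{
    assert(a && b && n >= 0);
    int i1 = 0, i2 = 0;
    for (int j = 0; j < n; j++) {
        i1 = i1 + 1;
        b[i1] = 100 + j;
        i2 = i2 + j;
        a[i2] = 100 + j;
    }
}

/* Closed form: each index is computed from j alone, so no state is
   carried between iterations. */
void scatter_closed_form(int *a, int *b, int n)
{
    assert(a && b && n >= 0);
    for (int j = 0; j < n; j++) {
        b[j + 1] = 100 + j;
        a[(j * j + j) / 2] = 100 + j;
    }
}
```

For `n = 5`, the `a` indices touched are 0, 1, 3, 6, 10, so `a` needs at least 11 elements and `b` at least 6.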

Reductions
§ Reductions collapse array data to scalar data via associative operations:

    for (J=0; J<MAX; J++)
        sum = sum + c[J];

§ Take advantage of associativity and compute partial sums or local maxima in private storage
§ Next, combine partial results into the shared result, taking care to synchronize access
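The partial-sum idea can be sketched in plain C. Here `NUM_PARTS` and `sum_partial` are hypothetical names, and the chunks run one after another in a single thread purely to show the structure. In a threaded version each chunk would run on its own thread, and the combine step would need synchronization if the threads wrote to the shared total directly.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PARTS 4   /* hypothetical number of workers */

long sum_partial(const int *c, size_t n)
{
    assert(c != NULL || n == 0);
    long partial[NUM_PARTS] = {0};   /* private storage, one slot per worker */

    /* Each value of p is an independent piece of work. */
    for (size_t p = 0; p < NUM_PARTS; p++) {
        size_t begin = n * p / NUM_PARTS;
        size_t end   = n * (p + 1) / NUM_PARTS;
        for (size_t j = begin; j < end; j++)
            partial[p] += c[j];
    }

    /* Combine step: done serially here, so no locking is required. */
    long sum = 0;
    for (size_t p = 0; p < NUM_PARTS; p++)
        sum += partial[p];
    return sum;
}
```

Integer addition is exactly associative, so the result matches the serial loop; floating-point sums may differ slightly because the additions are grouped differently.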

Data Ambiguity and the Compiler
§ Are the loop iterations independent?
§ The C++ compiler has no idea
§ No chance for optimization: in order to run error-free, the compiler assumes that a and b overlap

    void func(int *a, int *b)
    {
        int J;
        for (J=0; J<MAX; J++) {
            a[J] = b[J];
        }
    }
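One way to resolve this ambiguity in C99 and later is the `restrict` qualifier, which is a promise from the programmer that the pointers do not alias. The sketch below is illustrative; the name `copy_no_alias` is hypothetical, and `restrict` is a C keyword rather than standard C++ (several C++ compilers provide a similar extension such as `__restrict`). If the promise is broken by passing overlapping arrays, the behavior is undefined.

```c
#include <assert.h>
#include <stddef.h>

/* `restrict` tells the compiler that a and b never overlap, so it is
   free to reorder or vectorize the loads and stores. */
void copy_no_alias(int *restrict a, const int *restrict b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++)
        a[j] = b[j];
}
```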

Function Calls
§ Generally function calls inhibit ILP
§ Exceptions:
  § Transcendentals
  § IPO (interprocedural optimization) compiles

    for (J=0; J<MAX; J++) {
        compute(a[J], b[J]);
        a[J][1] = sin(b[J]);
    }

Function Calls with State
§ Many routines maintain state across calls:
  § Memory allocation
  § Pseudo-random number generators
  § I/O routines
  § Graphics libraries
  § Third-party libraries
§ Parallel access to such routines is unsafe unless synchronized
§ Check documentation for specific functions to determine thread safety
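As an illustration of avoiding hidden state, the sketch below shows a pseudo-random number generator whose state is passed in by the caller instead of being kept in a global. The type `rng_t` and function `rng_next` are hypothetical, and the generator is a simple linear congruential one (constants from Numerical Recipes) chosen for brevity, not quality. Giving each thread its own `rng_t` removes the need to synchronize.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* All generator state lives in this struct, owned by the caller. */
typedef struct {
    uint32_t state;
} rng_t;

/* Advance the generator and return the next value. No global or static
   data is touched, so separate rng_t objects never interfere. */
uint32_t rng_next(rng_t *r)
{
    assert(r != NULL);
    r->state = r->state * 1664525u + 1013904223u;
    return r->state;
}
```

Two generators seeded identically produce the same sequence regardless of how their calls are interleaved, which is the property a shared-state routine cannot guarantee under parallel access.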

A Simple Test
1. Reverse the loop order and rerun in serial
2. If results are unchanged, the loop is independent*

*Exception: loops with induction variables

Original:

    for (J=0; J<MAX; J++) {
        <...>
        compute(J, ...)
        <...>
    }

Reversed:

    for (J=MAX-1; J>=0; J--) {
        <...>
        compute(J, ...)
        <...>
    }
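The reversal test can be automated with a small harness, sketched here under some assumptions: the loop body is passed as a function pointer of the hypothetical type `loop_body`, the data is a single `int` array, and the loop starts at index 1 so that bodies may read `a[j - 1]`. The two sample bodies are illustrative only. The check is a heuristic in the spirit of the slide, not a proof of independence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void (*loop_body)(int *a, size_t j);

/* Run the body forward and in reverse on copies of the same input and
   report whether the results match (1) or differ (0). */
int same_when_reversed(loop_body body, const int *init, size_t n)
{
    assert(body != NULL && init != NULL && n > 0);
    int *fwd = malloc(n * sizeof *fwd);
    int *rev = malloc(n * sizeof *rev);
    int same = 0;
    if (fwd != NULL && rev != NULL) {
        memcpy(fwd, init, n * sizeof *fwd);
        memcpy(rev, init, n * sizeof *rev);
        for (size_t j = 1; j < n; j++)
            body(fwd, j);
        for (size_t j = n; j-- > 1; )
            body(rev, j);
        same = (memcmp(fwd, rev, n * sizeof *fwd) == 0);
    }
    free(fwd);
    free(rev);
    return same;
}

/* Independent body: touches only element j. */
void body_increment(int *a, size_t j) { a[j] = a[j] + 1; }

/* Dependent body: reads the element written by the previous iteration. */
void body_carry(int *a, size_t j) { a[j] = a[j - 1] + 1; }
```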

Summary
§ Loop Independence: loop iterations are independent of each other
§ Explained its importance
  § ILP and parallelism
§ Identified common causes of loop dependence
  § Flow Dependency, Anti-Dependency, Output Dependency
§ Taught some methods of fixing loop dependence
§ Reinforced concepts through lab