Intel Array Building Blocks
By: Edward Jones
Background
- Intel Ct: developed in 2007
  - Parallel programming model for multicore chips
  - Exploits Single Instruction, Multiple Data (SIMD) parallelism
- RapidMind: started in 2004
  - Provided a software product that simplifies the use of multi-core processors and graphics processing units (GPUs)
  - Intel acquired RapidMind on August 19, 2009
Intel ArBB is a C++ API
- Promotes parallel programming
- Hides the intricacies of the hardware and its vector instruction set (ISA)
- Oriented to data-intensive mathematical computations
- Built-in protection: by default, an ArBB program cannot create race conditions or deadlocks
What is it used for?
- Bioinformatics
- Visual Computing
- Engineering Design
- Signal and Image Processing
- Financial Analytics
- Science and Research
- Oil and Gas
- Enterprise
- Medical Imaging
Extend C++
- Uses standard C++ features to create new types and operators
- Constructs of ArBB:
  - Scalar types: equivalent to primitive C++ types
  - Vector types: parallel collections of scalar data
  - Operators: scalar and vector operators
  - Functions: user-defined code fragments
  - Control flow
Scalar Types

| ArBB type | Description | C++ equivalent |
|---|---|---|
| f32, f64 | 32/64-bit floating-point number | float, double |
| i8, i16, i32, i64 | 8/16/32/64-bit signed integer | char, short, int, long long |
| u8, u16, u32, u64 | 8/16/32/64-bit unsigned integer | unsigned char, unsigned short, unsigned int, unsigned long long |
| boolean | Boolean value | bool |
| usize, isize | Unsigned/signed integer large enough to store an address | size_t, ptrdiff_t |
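One way to read the table is as a set of C++ type aliases. The sketch below is only an illustration: the namespace `sketch` and its aliases are hypothetical stand-ins built from the `<cstdint>` fixed-width types, not the ArBB types themselves, and the `static_assert`s document the widths that the names imply on a typical 32- or 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical aliases mirroring the ArBB scalar type names.
// These are plain C++ typedefs, not ArBB's own types.
namespace sketch {
using f32 = float;
using f64 = double;
using i8 = std::int8_t;
using i16 = std::int16_t;
using i32 = std::int32_t;
using i64 = std::int64_t;
using u8 = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;
using u64 = std::uint64_t;
using boolean = bool;
using usize = std::size_t;     // unsigned, large enough to hold an address
using isize = std::ptrdiff_t;  // signed counterpart of usize
}  // namespace sketch

// Compile-time checks that each width matches its name.
static_assert(sizeof(sketch::f32) == 4 && sizeof(sketch::f64) == 8, "float widths");
static_assert(sizeof(sketch::i8) == 1 && sizeof(sketch::i64) == 8, "signed widths");
static_assert(sizeof(sketch::u16) == 2 && sizeof(sketch::u32) == 4, "unsigned widths");
static_assert(sizeof(sketch::usize) == sizeof(void*), "usize holds an address");
```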
Dense Containers
- Very similar to C++ vectors (std::vector)
- Size can change dynamically at run time
- Operations:
  - Element-wise scalar operations
  - Indexing
  - Reordering
  - Reductions
  - Property access
- Most operations run in parallel
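Dense Containers Example

    #include <arbb.hpp>
    using namespace arbb;

    #define SIZE 1024

    // Element-wise sum of two dense containers; ArBB runs the add in parallel.
    void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
        c = a + b;
    }

    int main(int argc, char** argv) {
        float a[SIZE];
        float b[SIZE];
        float c[SIZE];
        for (int i = 0; i < SIZE; ++i) { a[i] = float(i); b[i] = 1.0f; }

        // Bind each ArBB container to its C++ array.
        dense<f32> va; bind(va, a, SIZE);
        dense<f32> vb; bind(vb, b, SIZE);
        dense<f32> vc; bind(vc, c, SIZE);  // results land in c through vc

        // Create (or reuse) the closure for vecsum and run it.
        call(vecsum)(va, vb, vc);
        return 0;
    }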
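For comparison, the sketch below shows what the `vecsum` call computes, written in standard C++ without ArBB. The name `vecsum_sketch` is hypothetical, and it processes the elements sequentially, whereas ArBB would spread the same element-wise add across cores and SIMD lanes. With a C++17 library, passing `std::execution::par` to `std::transform` is one way to request parallel execution instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Plain C++ analogue of vecsum: c[i] = a[i] + b[i] for every i in [0, n).
void vecsum_sketch(const float* a, const float* b, float* c, std::size_t n) {
    assert(a != nullptr && b != nullptr && c != nullptr);
    // Element-wise add over the two input arrays, written into c.
    std::transform(a, a + n, b, c, std::plus<float>());
}
```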
Element-wise and Vector-scalar Operators
- All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations
- ArBB applies them to every element in parallel, which speeds up execution
- Other operators:

| Operator | Description |
|---|---|
| abs | Absolute value |
| cos | Cosine |
| sin | Sine |
| tan | Tangent |
| exp | Exponential (e^x) |
| log | Natural logarithm |
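`std::valarray` from the C++ standard library behaves similarly on a single thread: arithmetic with a scalar is applied to every element, and functions such as `abs`, `cos`, `sin`, `exp` and `log` have element-wise overloads. The two helpers below are hypothetical examples, not part of ArBB.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// Vector-scalar operators: every element is scaled by k, then shifted by offset.
std::valarray<float> scale_and_shift(const std::valarray<float>& x, float k, float offset) {
    return x * k + offset;
}

// Element-wise math functions: absolute value of each element, then its natural log.
std::valarray<float> magnitude_then_log(const std::valarray<float>& x) {
    return std::log(std::abs(x));
}
```

For example, `scale_and_shift({1, -2, 3}, 2, 1)` produces `{3, -3, 7}`.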
Collective Operators
- Perform computations where the output(s) depend on all of the inputs
- Reduction: applies an operator over an entire vector to compute a distilled value or values
  - add_reduce([1 0 2 -1 4]) yields 6
- Scan: computes reductions on all prefixes of a collection
  - add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]
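In standard C++ the same two collective operations exist as library algorithms. The sketch below uses `std::accumulate` for the reduction and `std::partial_sum` for the inclusive scan; the wrapper names are hypothetical. C++17's `std::reduce` and `std::inclusive_scan` do the same work and additionally accept execution policies for parallel runs.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Reduction: fold + over the whole vector into a single value.
int add_reduce_sketch(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// Inclusive scan: out[i] = v[0] + v[1] + ... + v[i].
std::vector<int> add_iscan_sketch(const std::vector<int>& v) {
    std::vector<int> out(v.size());
    std::partial_sum(v.begin(), v.end(), out.begin());
    return out;
}
```

For the slide's input [1 0 2 -1 4], the reduction returns 6 and the scan returns [1 1 3 2 6].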
Other Types of Operators
- Permutation operators: alter the size and order of vectors, for example
  - a = shift(b, -1, value);
  - a = rotate(b, -1);
- Facility operators: provide data-processing features

| Operator | Dimension | Description |
|---|---|---|
| cat | 1, 2, 3 | Concatenate dense containers |
| page | 3 | Retrieve a slice of a dense container |
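The slide does not say which way a negative distance moves the data, so the sketch below assumes one convention: `result[i]` is taken from `b[i + distance]`. `shift_sketch` fills positions that fall off the end with a fill value (the role of `value` above), while `rotate_sketch` wraps them around. Both helper names and the sign convention are hypothetical, not ArBB's documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed convention: result[i] = b[i + distance] if that index exists, else fill.
std::vector<int> shift_sketch(const std::vector<int>& b, long distance, int fill) {
    const long n = static_cast<long>(b.size());
    std::vector<int> result(b.size(), fill);
    for (long i = 0; i < n; ++i) {
        const long src = i + distance;
        if (src >= 0 && src < n) {
            result[static_cast<std::size_t>(i)] = b[static_cast<std::size_t>(src)];
        }
    }
    return result;
}

// Same indexing as shift_sketch, but out-of-range indices wrap around.
std::vector<int> rotate_sketch(const std::vector<int>& b, long distance) {
    const long n = static_cast<long>(b.size());
    std::vector<int> result(b.size());
    for (long i = 0; i < n; ++i) {
        const long src = ((i + distance) % n + n) % n;  // non-negative modulo
        result[static_cast<std::size_t>(i)] = b[static_cast<std::size_t>(src)];
    }
    return result;
}
```

Under this convention, shifting {1, 2, 3, 4} by -1 with fill 0 gives {0, 1, 2, 3}, and rotating it by -1 gives {4, 1, 2, 3}.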
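Differences from C++

Ordinary C++ if/for/while statements run only once, while ArBB captures a function. Control flow that must become part of the captured code uses ArBB's own keywords (_for, _if, _else, _while), each closed by a matching _end_ marker. The _for header separates its parts with commas rather than semicolons:

    _for (i32 i = 0, i <= N, i++) {
        _if (condition) {
            /* code */
        } _else {
            _while (condition) {
                /* code */
            } _end_while;
            /* code */
        } _end_if;
    } _end_for;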
Functions
- Calling ArBB functions differs from normal C++ function calls
- Form: mfc fnct = call(my_function);
- The first call creates a closure for that function; later calls reuse that closure instead of creating it again
- Allows for currying
- The map function lets the programmer execute a function for every element in a vector
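The closure-caching idea can be modelled in a few lines of standard C++. This is a hypothetical sketch: `ClosureCache` and `map_sketch` are illustrative names, not ArBB's API. The cache creates a closure object the first time a function is requested and hands back the same object afterwards, and `map_sketch` applies one scalar function to every element of a vector. A real ArBB closure would also hold JIT-compiled code, and currying is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <vector>

using ScalarFn = float (*)(float);

// Creates a closure per function on first request, then reuses it.
class ClosureCache {
public:
    const std::function<float(float)>& get(ScalarFn fn) {
        auto it = cache_.find(fn);
        if (it == cache_.end()) {
            ++created_;  // only reached the first time fn is requested
            it = cache_.emplace(fn, std::function<float(float)>(fn)).first;
        }
        return it->second;  // std::map nodes are stable, so the reference stays valid
    }
    int created() const { return created_; }

private:
    std::map<ScalarFn, std::function<float(float)>> cache_;
    int created_ = 0;
};

// map-style application: run the same scalar closure on every element.
std::vector<float> map_sketch(const std::function<float(float)>& f,
                              const std::vector<float>& v) {
    std::vector<float> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(), f);
    return out;
}
```

Requesting the same function twice from one `ClosureCache` leaves `created()` at 1, which mirrors the "created once, never again" rule on the slide.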
Dynamic Execution Engine
Array Building Blocks provides a dynamic execution engine made up of three major services:
- Threading runtime: provides a fine-grained model for data- and task-parallel threading
- Memory manager: segregates normal C++ memory from ArBB memory; provides a set of lock-free memory interfaces and acts as a garbage collector
- Just-in-time compiler/dynamic engine: constructs an intermediate representation of computations, performs optimizations, and generates code
Monte Carlo Computation of Pi
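Monte Carlo Computation of Pi: C/C++

    #include <cmath>
    #include <cstdlib>

    // NEXP: number of random samples (defined elsewhere)
    double computepi() {
        int cnt = 0;
        for (int i = 0; i < NEXP; i++) {
            // Random point in the unit square
            float x = float(rand()) / float(RAND_MAX);
            float y = float(rand()) / float(RAND_MAX);
            float dst = std::sqrt(x*x + y*y);
            // Count points that fall inside the quarter circle
            if (dst <= 1.0f) {
                cnt++;
            }
        }
        return 4.0 * ((double) cnt) / NEXP;
    }

*NEXP = O(2 p(n))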
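Monte Carlo Computation of Pi: ArBB

    void computepi(f64& pi) {
        random_generator rng;
        // One random coordinate per sample, generated as whole vectors
        dense<f32> x = rng.randomize(NEXP);
        dense<f32> y = rng.randomize(NEXP);
        // Element-wise distance and test, no explicit loop
        dense<f32> dist = sqrt(x*x + y*y);
        dense<boolean> mask = (dist <= 1.0f);
        // 1 inside the circle, 0 outside
        dense<i32> cnt = select(mask, 1, 0);
        // Reduction over the whole vector
        pi = 4.0 * add_reduce(cnt) / NEXP;
    }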
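The same computation can be written in a data-parallel style with only the standard library, which makes the step-by-step correspondence with the ArBB version easier to see. This sketch assumes `std::mt19937` with a fixed seed as the random source (ArBB's `random_generator` is not reproduced), `computepi_sketch` is a hypothetical name, and `std::transform` and `std::count` run sequentially here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

double computepi_sketch(std::size_t nexp, unsigned seed = 42) {
    assert(nexp > 0);
    std::mt19937 rng(seed);  // assumed random source, not ArBB's generator
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);

    // Step 1: whole vectors of random coordinates (like rng.randomize(NEXP)).
    std::vector<float> x(nexp), y(nexp);
    std::generate(x.begin(), x.end(), [&] { return uniform(rng); });
    std::generate(y.begin(), y.end(), [&] { return uniform(rng); });

    // Step 2: mask[i] is 1 inside the quarter circle, 0 outside (like select).
    // Comparing the squared distance with 1 is equivalent in exact arithmetic
    // and skips the sqrt.
    std::vector<int> mask(nexp);
    std::transform(x.begin(), x.end(), y.begin(), mask.begin(),
                   [](float a, float b) { return (a * a + b * b <= 1.0f) ? 1 : 0; });

    // Step 3: reduction over the mask (like add_reduce(cnt)), then scale by 4.
    const long long cnt = std::count(mask.begin(), mask.end(), 1);
    return 4.0 * static_cast<double>(cnt) / static_cast<double>(nexp);
}
```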
Evaluation of Monte Carlo

| Samples | Pi (distances <= 1) |
|---|---|
| 10 | 3.2 |
| 1,000 | 3.212 |
| 1,000 | 3.14572 |
| 10,000 | 3.141176 |
| 50,000 | 3.141698 |
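To put the estimates in context, a short derivation of the estimator's typical error, assuming N independent points drawn uniformly from the unit square. Each point lands inside the quarter circle with probability p = π/4, so cnt is binomial and:

```latex
\hat{\pi} = \frac{4\,\mathrm{cnt}}{N}, \qquad
\mathbb{E}[\hat{\pi}] = 4p = \pi, \qquad
\sigma_{\hat{\pi}} = 4\sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{\pi(4-\pi)}{N}} \approx \frac{1.64}{\sqrt{N}}
```

Each tenfold increase in N therefore shrinks the typical error by a factor of about √10 ≈ 3.2: roughly 0.52 for N = 10 and roughly 0.05 for N = 1,000.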
Intel ArBB Today
- Preview release: August 25, 2011 (1.0 beta 6)
- Project retired by Intel in October 2012
- Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks