Intel Array Building Blocks
By Edward Jones

Background
- Intel Ct
  - Developed in 2007
  - Parallel programming model for multicore chips
  - Exploits Single Instruction, Multiple Data (SIMD)
- RapidMind
  - Started in 2004
  - Provided a software product that simplifies the use of multi-core processors and graphics processing units (GPUs)
- Intel acquired RapidMind on August 19, 2009

Intel ArBB is a C++ API
- Promotes parallel programming
- Hides the intricacies of the hardware and the vector ISA
- Oriented to data-intensive mathematical computations
- Built-in protection: by default, an ArBB program cannot create race conditions or deadlocks

What is it used for?
- Bioinformatics
- Visual Computing
- Engineering Design
- Signal and Image Processing
- Financial Analytics
- Science and Research
- Oil and Gas
- Enterprise
- Medical Imaging

Extend C++
- Uses standard C++ features to create new types and operators
- Constructs of ArBB
  - Scalar types: equivalent to primitive C++ types
  - Vector types: parallel collections of scalar data
  - Operators: scalar and vector operators
  - Functions: user-defined code fragments
  - Control flow

Scalar Types

Type                 Description                                                  C++ equivalents
f32, f64             32/64-bit floating-point numbers                             float, double
i8, i16, i32, i64    8/16/32/64-bit signed integers                               char, short, int
u8, u16, u32, u64    8/16/32/64-bit unsigned integers                             unsigned char, short, int
boolean              Boolean value                                                bool
usize, isize         Unsigned/signed integers large enough to store addresses     size_t

Dense Containers
- Very similar to vectors
- Size can change dynamically at runtime
- Operations:
  - Element-wise scalar operations
  - Indexing
  - Reordering
  - Reductions
  - Property access
- Most operations run in parallel

Dense Containers Example

    #include <arbb.hpp>
    using namespace arbb;

    #define SIZE 1024

    // Element-wise sum of a and b; c is the output container.
    void vecsum(dense<f32> a, dense<f32> b, dense<f32>& c) {
        c = a + b;
    }

    int main(int argc, char** argv) {
        float a[SIZE];
        float b[SIZE];
        float c[SIZE];

        // Bind each ArBB container to its C++ array.
        dense<f32> va; bind(va, a, SIZE);
        dense<f32> vb; bind(vb, b, SIZE);
        dense<f32> vc; bind(vc, c, SIZE);

        call(vecsum)(va, vb, vc);
    }

Element-wise and Vector-scalar Operators
- All standard C++ arithmetic, bitwise, and logical operators can be used in vector computations
- Because they apply to every element, these operations can run in parallel, which reduces run time
- Other operators:

Operator    Description
abs         Absolute value
cos         Cosine
sin         Sine
tan         Tangent
exp         Exponential
log         Natural logarithm

Collective Operators
- Perform computations where the output(s) depend on all of the inputs
- Examples:
  - Reduction: applies an operator over an entire vector to compute a distilled value or values
    add_reduce([1 0 2 -1 4]) yields 6
  - Scan: computes reductions on all prefixes of a collection
    add_iscan([1 0 2 -1 4]) yields [1 (1+0) (1+0+2) (1+0+2+(-1)) (1+0+2+(-1)+4)] = [1 1 3 2 6]

Other Types of Operators
- Permutation operators
  - These operations alter the size and order of vectors
    a = shift(b, -1, value);
    a = rotate(b, -1);
- Facility operators
  - Provide data processing features

Operator    Dimension    Description
cat         1, 2, 3      Concatenate dense containers
page        3            Retrieve a slice of a dense container

Differences from C++

    _if(condition) {
        _for(i32 i = 0, i <= N, i++) {
            /* code */
        } _end_for;
        /* code */
    } _else {
        _while(condition) {
            /* code */
        } _end_while;
        /* code */
    } _end_if;

Functions
- Calling ArBB functions is different from normal function calls
  - Form: mfc fnct = call(my_function);
- Calling a function creates a closure for that function
  - The closure is created on the first call and reused on every later call
- Allows for currying
- The 'map' function lets the programmer execute a function for every element in a vector

Dynamic Execution Engine
Array Building Blocks provides a dynamic execution engine that comprises three major services:
- Threading runtime
  - Provides a fine-grained model for data-parallel and task-parallel threading
- Memory manager
  - Segregates normal C++ memory from ArBB memory
  - Set of lock-free memory interfaces, serving as a garbage collector
- Just-in-time compiler / dynamic engine
  - Constructs an intermediate representation of computations, performs optimizations, and generates code

Monte Carlo Computation of Pi

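Monte Carlo Computation of Pi: C/C++

    #include <cmath>    // sqrtf
    #include <cstdlib>  // rand, RAND_MAX

    // NEXP is the number of random samples; it is defined elsewhere.
    double computepi() {
        int cnt = 0;
        for (int i = 0; i < NEXP; i++) {
            // Draw a random point in the unit square.
            float x = float(rand()) / float(RAND_MAX);
            float y = float(rand()) / float(RAND_MAX);
            float dst = sqrtf(x*x + y*y);
            // Count the points that fall inside the quarter circle.
            if (dst <= 1.0f) {
                cnt++;
            }
        }
        return 4.0 * ((double) cnt) / NEXP;
    }

    * NEXP = O(2^p(n))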

Monte Carlo Computation of Pi: ArBB

    void computepi(f64& pi) {
        random_generator rng;
        dense<f32> x = rng.randomize(NEXP);
        dense<f32> y = rng.randomize(NEXP);
        dense<f32> dist = sqrt(x*x + y*y);
        dense<boolean> mask = (dist <= 1.0f);
        dense<i32> cnt = select(mask, 1, 0);
        pi = 4.0 * add_reduce(cnt) / NEXP;
    }

Evaluation of Monte Carlo

Samples    Pi (Distances <= 1)
10         3.2
1,000      3.212
1,000      3.14572
10,000     3.141176
50,000     3.141698
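The reasoning behind the estimate, and why it converges slowly, can be written out as follows. This is the standard binomial argument rather than something stated on the slides: N stands for NEXP, cnt is the number of points with distance <= 1, and p is the probability that a uniform point in the unit square lands inside the quarter circle.

```latex
P\left(x^2 + y^2 \le 1\right) = \frac{\pi}{4} = p
\quad\Longrightarrow\quad
\hat{\pi} = \frac{4 \cdot \mathrm{cnt}}{N},
\qquad
\sigma_{\hat{\pi}} = 4\sqrt{\frac{p\,(1 - p)}{N}} \approx \frac{1.64}{\sqrt{N}}
```

Because the standard error falls only as 1/sqrt(N), each additional correct decimal digit costs roughly 100 times more samples.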

Intel ArBB Today
- Preview release: August 25, 2011
  - 1.0 beta 6
- Project retired by Intel in October 2012
- Overshadowed by Intel Cilk Plus and Intel Threading Building Blocks

Sources
- http://www.drdobbs.com/parallel/array-building-blocks-a-flexibleparalle/227300084
- http://openlab-mu-internal.web.cern.ch/openlab-muinternal/03_Documents/4_Presentations/Slides/2010 list/02_CERN_openLab_Workshop-2010_Hans_Pabst.pdf