A Standard for Shared Memory Parallel Programming
Seyong Lee
Purdue University, School of Electrical and Computer Engineering

Overview
I. Introduction to OpenMP
II. Programming Model
III. OpenMP Directives
IV. Run-Time Library Routines
V. Environment Variables
VI. Summary

I. Introduction to OpenMP

What is OpenMP?
- An application program interface (API) for shared memory parallel programming
- A specification for a set of compiler directives, library routines, and environment variables
- Makes it easy to create multi-threaded (MT) programs in Fortran, C, and C++
- Portable / multi-platform, including Unix and Windows NT platforms
- Jointly defined and endorsed by a group of major computer hardware and software vendors

OpenMP is not…
- Not automatic parallelization
  - The user explicitly specifies parallel execution
  - The compiler does not ignore user directives, even if they are wrong
- Not just loop-level parallelism
  - Provides functionality to enable coarse-grained parallelism
- Not meant for distributed memory parallel systems
- Not necessarily implemented identically by all vendors
- Not guaranteed to make the most efficient use of shared memory

Why OpenMP?
- Parallel programming before OpenMP
  - Standard ways to program distributed memory computers existed (MPI and PVM)
  - No standard API for shared memory programming
- Several vendors had directive-based APIs for shared memory programming
  - All different and vendor proprietary
- Commercial users and high-end software vendors have a big investment in existing code
  - Not very eager to rewrite their code in a new language
- Portability was possible only through MPI
  - Library based, with good performance and scalability
  - But it sacrifices the built-in shared memory advantage of the hardware

Goals of OpenMP
- Standardization: provide a standard among a variety of shared memory architectures/platforms
- Lean and mean: establish a simple and limited set of directives for programming shared memory machines
- Ease of use:
  - Provide the capability to incrementally parallelize a serial program
  - Provide the capability to implement both coarse-grain and fine-grain parallelism
- Portability: support Fortran (77, 90, and 95), C, and C++

II. OpenMP Programming Model
- Thread-based parallelism
- Explicit parallelism
- Fork-join model
- Compiler directive based
- Nested parallelism support
- Dynamic threads

User Interface Model
- Compiler directives
  - Most of the API: control constructs, data attribute constructs
  - Extend the base language: f77, f90, C, C++
  - Example: C$OMP PARALLEL DO
- Library
  - Small set of functions to control threads and to implement unstructured locks
  - Example: call omp_set_num_threads(128)
- Environment variables
  - For end users to control run-time execution
  - Example: setenv OMP_NUM_THREADS 8

Execution Model
(figure omitted: the master thread forks a team of threads at each parallel region and joins them at its end)
III. OpenMP Directives
- Format
- Directive Scoping
- PARALLEL Region Construct
- Work-Sharing Constructs
- Synchronization Constructs
- Data Environment Constructs

OpenMP Directives Format
- Fortran directives format:
  C$OMP construct [clause [clause] …]
  !$OMP construct [clause [clause] …]
  *$OMP construct [clause [clause] …]

  Example:
  !$OMP PARALLEL DEFAULT (SHARED) PRIVATE (a, b)
  [structured block of code]
  !$OMP END PARALLEL

  cf. conditional compilation: !$ a = OMP_get_thread_num()

- C/C++ directives format:
  #pragma omp construct [clause [clause] …]

  Example:
  #pragma omp parallel default (shared) private(a, b)
  {
  [structured block of code]
  } /* all threads join master thread and terminate */

Structured Blocks
- Most OpenMP constructs apply to a structured block
  - Structured block: a block of code with one point of entry at the top and one point of exit at the bottom. The only other branches allowed are STOP statements in Fortran and exit() in C/C++.

A structured block (the only branch stays inside the region):
C$OMP PARALLEL
10  wrk(id) = garbage(id)
    res(id) = wrk(id) ** 2
    if (conv(res(id))) goto 10
C$OMP END PARALLEL
    print *, id

Not a structured block (branches into and out of the region):
C$OMP PARALLEL
10  wrk(id) = garbage(id)
30  res(id) = wrk(id) ** 2
    if (conv(res(id))) goto 20
    go to 10
C$OMP END PARALLEL
    if (not_DONE) goto 30
20  print *, id

Directive Scoping
- Static (lexical) extent: the block of code directly placed between the two directives !$OMP PARALLEL and !$OMP END PARALLEL
- Dynamic extent: the code included in the lexical extent plus all the code called from inside the lexical extent
- Orphaned directive: an OpenMP directive that appears independently from any enclosing directive; it exists outside of another directive's static extent

Example:
PROGRAM TEST
  …
!$OMP PARALLEL
  …
!$OMP DO
  DO I = …
    …
    CALL SUB1
    …
  ENDDO
!$OMP END DO
  …
  CALL SUB2
  …
!$OMP END PARALLEL      ! static extent
END

SUBROUTINE SUB1         ! orphaned directives
  …
!$OMP CRITICAL
  …
!$OMP END CRITICAL
END

SUBROUTINE SUB2
  …
!$OMP SECTIONS
  …
!$OMP END SECTIONS
  …
END

PARALLEL Region Construct
- A block of code that will be executed by multiple threads
- Properties:
  - Fork-join model
  - The number of threads won't change inside a parallel region
  - SPMD execution within the region
  - The enclosed block of code must be structured; no branching into or out of the block
- Format:
  !$OMP PARALLEL clause1 clause2 …
  …
  !$OMP END PARALLEL

PARALLEL Region Construct
- Example:
  !$OMP PARALLEL
  write (*, *) "Hello"
  !$OMP END PARALLEL

PARALLEL Region Construct
- How many threads? Determined, in order of precedence, by:
  1. Use of the omp_set_num_threads() library function
  2. Setting of the OMP_NUM_THREADS environment variable
  3. The implementation default
- Dynamic threads:
  - By default, the same number of threads is used to execute each parallel region
  - Two methods for enabling dynamic threads:
    1. Use of the omp_set_dynamic() library function
    2. Setting of the OMP_DYNAMic environment variable

Work-Sharing Constructs
- Divide the execution of the enclosed code region among the members of the team that encounter it
- Do not launch new threads
- Must be enclosed within a parallel region
- No implied barrier upon entry to a work-sharing construct
- An implied barrier at the end of a work-sharing construct
- Types of work-sharing constructs:
  - DO directive: !$OMP DO / !$OMP END DO
  - SECTIONS directive: !$OMP SECTIONS / !$OMP END SECTIONS
  - SINGLE directive: !$OMP SINGLE / !$OMP END SINGLE

Work-Sharing Constructs
- DO Directive
- Format:
  !$OMP DO clause1 clause2 …
  [do loop]
  !$OMP END DO end_clause

Work-Sharing Constructs
- How are iterations of the loop divided among threads?
  => use the SCHEDULE (type, chunk) clause
  - STATIC
  - DYNAMIC
  - GUIDED

Work-Sharing Constructs
- SECTIONS Directive
  - Non-iterative work-sharing
  - Each section is executed once by a thread
  - Potential MIMD
- Format:
  !$OMP SECTIONS clause1, clause2 …
  !$OMP SECTION
  [block 1]
  !$OMP SECTION
  [block 2]
  …
  !$OMP END SECTIONS end_clause

Work-Sharing Constructs
- SECTIONS Directive Example:
  !$OMP SECTIONS
  !$OMP SECTION
  write(*, *) "Hello"
  !$OMP SECTION
  write(*, *) "Hi"
  !$OMP SECTION
  write(*, *) "Bye"
  !$OMP END SECTIONS

Work-Sharing Constructs
- SINGLE Directive
  - Encloses code to be executed by only one thread in the team
  - Threads in the team that do not execute the SINGLE block wait at the end of the enclosed code block, unless a NOWAIT clause is specified
- Format:
  !$OMP SINGLE clause1 clause2 …
  …
  !$OMP END SINGLE end_clause

Work-Sharing Constructs
- SINGLE Directive Example:
  !$OMP SINGLE
  write(*, *) "Hello"
  !$OMP END SINGLE

Synchronization Constructs
- OpenMP has the following constructs to support synchronization:
  - MASTER Directive
  - CRITICAL Directive
  - BARRIER Directive
  - ATOMIC Directive
  - ORDERED Directive
  - FLUSH Directive

Synchronization Constructs
- MASTER Directive
  - Executed only by the master thread of the team
  - No implied barrier associated with this directive
- Format:
  !$OMP MASTER
  …
  !$OMP END MASTER

Synchronization Constructs
- CRITICAL Directive
  - Specifies a region of code that must be executed by only one thread at a time
  - The optional name enables multiple different CRITICAL regions to exist
- Format:
  !$OMP CRITICAL name
  …
  !$OMP END CRITICAL name

Synchronization Constructs
- BARRIER Directive
  - Synchronizes all threads in the team
  - When encountered, each thread waits until all the other threads have reached this point
  - Must be encountered by all threads in a team or by none at all; otherwise, deadlock
- Format:
  !$OMP BARRIER

Synchronization Constructs
- FLUSH Directive
  - Explicit synchronization point at which the implementation is required to provide a consistent view of memory
  - Thread-visible variables are written back to memory at this point
- Format:
  !$OMP FLUSH (variable1, variable2, …)
- The FLUSH directive is implied by the directives below (not implied if a NOWAIT clause is present):
  - BARRIER
  - CRITICAL and END CRITICAL
  - END DO
  - END PARALLEL
  - END SECTIONS
  - END SINGLE
  - ORDERED and END ORDERED

Data Environment Constructs
- Define how and which data variables in the serial section of the program are transferred to the parallel sections of the program (and back)
- Define which variables will be visible to all threads and which variables will be private
- Include:
  - Directive: THREADPRIVATE
  - Data scope attribute clauses: PRIVATE, SHARED, DEFAULT, FIRSTPRIVATE, LASTPRIVATE, REDUCTION, COPYIN

Data Environment Constructs
- THREADPRIVATE Directive
  - Makes global file-scope variables or common blocks local and persistent to a thread
  - Use the COPYIN clause to initialize data in THREADPRIVATE variables and common blocks
- Format:
  !$OMP THREADPRIVATE (a, b, …)
- Data scope attribute clauses
  - PRIVATE clause: !$OMP PARALLEL PRIVATE (a, b)
  - SHARED clause: !$OMP PARALLEL SHARED (c, d)

Data Environment Constructs
- Data scope attribute clauses
  - FIRSTPRIVATE clause: PRIVATE with automatic initialization
    !$OMP PARALLEL FIRSTPRIVATE (a, b)
  - LASTPRIVATE clause: PRIVATE with a copy from the last loop iteration or section to the original variable object
    !$OMP PARALLEL LASTPRIVATE (a, b)
  - DEFAULT clause: specifies a default PRIVATE, SHARED, or NONE scope for all variables in the lexical extent of any parallel region
    !$OMP PARALLEL DEFAULT (PRIVATE | SHARED | NONE)
  - COPYIN clause: assigns the same value to THREADPRIVATE variables for all threads in the team
    !$OMP PARALLEL COPYIN (a)

IV. Run-Time Library Routines
- API library calls that perform a variety of functions
  - For C/C++: include "omp.h"
  - For Fortran 95: use the "omp_lib" module
- Runtime environment routines:
  - Modify/check the number of threads:
    OMP_SET_NUM_THREADS(), OMP_GET_NUM_THREADS(), OMP_GET_MAX_THREADS(), OMP_GET_THREAD_NUM()
  - Turn nesting and dynamic mode on/off:
    OMP_SET_NESTED(), OMP_GET_NESTED(), OMP_SET_DYNAMIC(), OMP_GET_DYNAMIC()
  - Are we in a parallel region? OMP_IN_PARALLEL()
  - How many processors are in the system? OMP_GET_NUM_PROCS()

Run-Time Library Routines
- Lock routines
  - Lock: a flag which can be set or unset
  - Ownership of the lock: the thread that sets a given lock gets certain privileges
  - A lock differs from other synchronization directives: only the threads using that lock are affected by its status
  - Related functions:
    OMP_INIT_LOCK(), OMP_SET_LOCK(), OMP_UNSET_LOCK(), OMP_DESTROY_LOCK(), OMP_TEST_LOCK()

V. Environment Variables
- Control how "OMP DO SCHEDULE(RUNTIME)" loop iterations are scheduled:
  OMP_SCHEDULE "schedule, chunk_size"
- Set the default number of threads to use:
  OMP_NUM_THREADS int_literal
- Can the program use a different number of threads in each parallel region?
  OMP_DYNAMIC TRUE || FALSE
- Will nested parallel regions create new teams of threads?
  OMP_NESTED TRUE || FALSE

VI. Summary
- OpenMP is a directive-based shared memory programming model
- The OpenMP API is a general-purpose parallel programming API with emphasis on the ability to parallelize existing programs
- Scalable parallel programs can be written by using parallel regions
- Work-sharing constructs enable efficient parallelization of computationally intensive portions of a program