Parallel Processing (CS 730) Lecture 5: Shared Memory Parallel Programming with OpenMP
Parallel Processing (CS 730)
Lecture 5: Shared Memory Parallel Programming with OpenMP*
Jeremy R. Johnson
Wed. Jan. 31, 2001

*Parts of this lecture were derived from chapters 1-2 of Chandra et al.
Introduction

• Objective: To further study the shared memory model of parallel programming, and to introduce the OpenMP standard for shared memory parallel programming.
• Topics
  – Introduction to OpenMP
    • hello.c
    • hello.f
  – Loop-level parallelism
  – Shared vs. private variables
  – Synchronization (implicit and explicit)
  – Parallel regions
OpenMP

• Extension to FORTRAN and C/C++
  – Uses directives (comments in FORTRAN, pragmas in C/C++)
    • ignored without compiler support
  – Some library support required
• Shared memory model
  – parallel regions
  – loop-level parallelism
  – implicit thread model
  – communication via shared address space
  – private vs. shared variables (declaration)
  – explicit synchronization via directives (e.g. critical)
  – library routines for returning thread information (e.g. omp_get_num_threads(), omp_get_thread_num())
  – environment variables used to provide system info (e.g. OMP_NUM_THREADS)
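A minimal C sketch (not from the slides) that ties several of these pieces together: a parallel region, a private per-thread variable, a shared counter updated inside a critical directive, and the two library routines named above.

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      int count = 0;                      /* shared: one copy seen by all threads */
      #pragma omp parallel
      {
          int id = omp_get_thread_num();  /* declared inside the region: private */
          #pragma omp critical            /* explicit synchronization */
          count = count + 1;
          printf("Hello from thread %d of %d\n", id, omp_get_num_threads());
      }
      printf("count = %d\n", count);      /* equals the number of threads */
      return 0;
  }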
Benefits

• Provides incremental parallelism
• Small increase in code size
• Simpler model than message passing
• Easier to use than a thread library
• With hardware and compiler support, allows finer granularity than message passing
Further Information

• Adopted as a standard in 1997
  – Initiated by SGI
• www.openmp.org
• Chandra, Dagum, Kohr, Maydan, McDonald, Menon, "Parallel Programming in OpenMP", Morgan Kaufmann Publishers, 2001.
Shared vs. Distributed Memory

[Figure: two architecture diagrams. Shared memory: processors P0, P1, ..., Pn all connected to a single shared memory. Distributed memory: processors P0, P1, ..., Pn, each with local memory M0, M1, ..., Mn, connected by an interconnection network.]
Shared Memory Programming Model

• Shared memory programming does not require physically shared memory, so long as there is support for logically shared memory (in either hardware or software).
• With logically shared memory, the cost of a memory access may depend on the physical location of the data.
• UMA - uniform memory access
  – SMP - symmetric multiprocessor
  – typically memory connected to processors via a bus
• NUMA - non-uniform memory access
  – typically physically distributed memory connected via an interconnection network
IBM S80

• An SMP with up to 24 processors (RS64 III processors)
• http://www.rs6000.ibm.com/hardware/enterprise/s80.html

Name: goopi.coe.drexel.edu
Machine type: S80 12-way with 8 GB RAM
Specifications:
  2 x 6-way 450 MHz RS64 III processor card, 8 MB L2 cache
  2 x 4096 MB memory
  9 x 18.2 GB Ultra SCSI hot-swappable hard disk drives

Name: bagha.coe.drexel.edu
Machine type: 44P Model 270 4-way with 2 GB RAM
Specifications:
  2 x 2-way 375 MHz POWER3-II processor, 4 MB L2 cache
  4 x 512 MB SDRAM DIMMs
  2 x 9.1 GB Ultra SCSI HDD
hello.c

  #include <stdio.h>
  #include <stdlib.h>   /* atoi, exit */
  #include <omp.h>      /* OpenMP library routines */

  int main(int argc, char **argv)
  {
      int n;

      n = atoi(argv[1]);
      omp_set_num_threads(n);

      printf("Number of threads = %d\n", omp_get_num_threads());
      #pragma omp parallel
      {
          int id = omp_get_thread_num();
          if (id == 0)
              printf("Number of threads = %d\n", omp_get_num_threads());
          printf("Hello World from %d\n", id);
      }
      exit(0);
  }
hello.f

      program hello
      integer omp_get_thread_num, omp_get_num_threads
      print *, "Hello parallel world from threads"
      print *, "Num threads = ", omp_get_num_threads()
!$omp parallel
      print *, "Num threads = ", omp_get_num_threads()
      print *, omp_get_thread_num()
!$omp end parallel
      print *, "Back to the sequential world"
      end
Compiling and Executing OpenMP Programs on the IBM S80

• To compile a C program with OpenMP directives
  – cc_r -qsmp=omp hello.c -o hello
• To compile a Fortran program with OpenMP directives
  – xlf_r -qsmp=omp hello.f -o hello
• The environment variable OMP_NUM_THREADS controls the number of threads used in OpenMP parallel regions. It can be set from the C shell:
  – setenv OMP_NUM_THREADS <count>
  – where <count> is a positive integer
Parallel Loop

Serial version:

      subroutine saxpy(z, a, x, y, n)
      integer i, n
      real z(n), a, x(n), y
      do i = 1, n
        z(i) = a * x(i) + y
      end do
      return
      end

OpenMP version:

      subroutine saxpy(z, a, x, y, n)
      integer i, n
      real z(n), a, x(n), y
!$omp parallel do
      do i = 1, n
        z(i) = a * x(i) + y
      end do
      return
      end
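For comparison, a C sketch (not from the slides) of the same saxpy loop using the C/C++ pragma form of the directive; since every iteration writes a distinct z[i], the loop parallelizes directly.

  #include <omp.h>

  void saxpy(double *z, double a, const double *x, double y, int n)
  {
      int i;
      #pragma omp parallel for          /* C/C++ analogue of !$omp parallel do */
      for (i = 0; i < n; i++)
          z[i] = a * x[i] + y;
  }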
Execution Model

[Figure: fork-join execution. A single master thread runs until a parallel region is reached; threads are created implicitly, and master and slave threads execute the region; an implicit barrier synchronization at the end of the region leaves only the master thread running.]
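A small C sketch (not from the slides) that makes the fork-join behavior visible: the master runs alone, a team of threads is created at the parallel region, and the implicit barrier at the closing brace ensures all threads finish before the master continues.

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      printf("master thread before the parallel region\n");
      #pragma omp parallel              /* implicit thread creation (fork) */
      {
          printf("thread %d inside the region\n", omp_get_thread_num());
      }                                 /* implicit barrier: all threads join here */
      printf("master thread after the implicit barrier\n");
      return 0;
  }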
More Complicated Example

      real*8 x, y
      integer i, j, m, n, maxiter
      integer depth(*, *)
      integer mandel_val
      ...
      maxiter = 200
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j, i) = mandel_val(x, y, maxiter)
        end do
      end do
Parallel Loop

!$omp parallel do private(j, x, y)
      maxiter = 200
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j, i) = mandel_val(x, y, maxiter)
        end do
      end do
!$omp end parallel do
Parallel Loop

      maxiter = 200
!$omp parallel do private(j, x, y)
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j, i) = mandel_val(x, y, maxiter)
        end do
      end do
!$omp end parallel do
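A C sketch of the same parallelized loop (an illustration, not from the slides); mandel_val is assumed to have the prototype shown, and depth is passed as a C99 variable-length array. The loop index i is private automatically, while j, x, and y must be listed explicitly.

  #include <omp.h>

  int mandel_val(double x, double y, int maxiter);   /* assumed prototype */

  void compute_depth(int m, int n, int depth[m][m])
  {
      int i, j, maxiter = 200;
      double x, y;
      #pragma omp parallel for private(j, x, y)
      for (i = 0; i < m; i++)
          for (j = 0; j < m; j++) {
              x = (double)(i + 1) / m;
              y = (double)(j + 1) / n;
              depth[j][i] = mandel_val(x, y, maxiter);
          }
  }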
Explicit Synchronization

      maxiter = 200
      total_iters = 0
!$omp parallel do private(j, x, y)
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j, i) = mandel_val(x, y, maxiter)
!$omp critical
          total_iters = total_iters + depth(j, i)
!$omp end critical
        end do
      end do
!$omp end parallel do
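The same idea in C (a sketch, reusing the assumed mandel_val prototype): total_iters is shared by all threads, so the update is wrapped in a critical directive to serialize it.

  #include <omp.h>

  int mandel_val(double x, double y, int maxiter);   /* assumed prototype */

  int compute_depth_sum(int m, int n, int depth[m][m])
  {
      int i, j, maxiter = 200, total_iters = 0;
      double x, y;
      #pragma omp parallel for private(j, x, y)
      for (i = 0; i < m; i++)
          for (j = 0; j < m; j++) {
              x = (double)(i + 1) / m;
              y = (double)(j + 1) / n;
              depth[j][i] = mandel_val(x, y, maxiter);
              #pragma omp critical      /* serialize the shared update */
              total_iters += depth[j][i];
          }
      return total_iters;
  }

The critical section runs once per inner iteration, so it can become a serial bottleneck; the reduction clause on the next slide avoids this.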
Reduction Variables

      maxiter = 200
      total_iters = 0
!$omp parallel do private(j, x, y)
!$omp+ reduction(+: total_iters)
      do i = 1, m
        do j = 1, m
          x = i/real(m)
          y = j/real(n)
          depth(j, i) = mandel_val(x, y, maxiter)
          total_iters = total_iters + depth(j, i)
        end do
      end do
!$omp end parallel do
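A C sketch of the reduction version (same assumptions as above): each thread accumulates into a private copy of total_iters, and OpenMP combines the partial sums when the loop ends, without an explicit critical section.

  #include <omp.h>

  int mandel_val(double x, double y, int maxiter);   /* assumed prototype */

  int compute_depth_sum(int m, int n, int depth[m][m])
  {
      int i, j, maxiter = 200, total_iters = 0;
      double x, y;
      #pragma omp parallel for private(j, x, y) reduction(+:total_iters)
      for (i = 0; i < m; i++)
          for (j = 0; j < m; j++) {
              x = (double)(i + 1) / m;
              y = (double)(j + 1) / n;
              depth[j][i] = mandel_val(x, y, maxiter);
              total_iters += depth[j][i];   /* each thread adds to its private copy */
          }
      return total_iters;
  }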