ICSS 531 - Shared Memory Parallel Computing

MIMD - Shared Memory

[Diagram: processing elements PE1 … PEn connected through an interconnection network to a single shared memory]

MIMD - Distributed Memory

[Diagram: each processing element PE paired with its own local memory, all connected by an interconnection network]

MIMD Programs

• A MIMD program is divided into individual processes
  – Which run asynchronously in parallel
  – Must be synchronized for data exchanges
• Ideally: 1 process : 1 processor
• In reality: n processes : 1 processor
  – Which means the processes must be scheduled

Communication

• Processes need to exchange data with one another
• This can be done by
  – Utilizing shared memory
  – Passing messages between the processes
    • Which can be simulated using shared memory
• In order to exchange data, processes must be synchronized before the exchange can take place
  – This synchronization is almost automatic when using message passing

The Simple Case

• Two processes need to communicate and are running on the same processor
  – There are a large number of solutions to this problem

First Attempt

    Var turn: 1..2;
    turn := 1;

    { Process 1 }
    Loop
        While turn ≠ 1 do nothing End;
        critical section
        turn := 2;
        other instructions
    End

    { Process 2 }
    Loop
        While turn ≠ 2 do nothing End;
        critical section
        turn := 1;
        other instructions
    End

• Only one process is in the critical section at a time
• Alternating access is forced
• Restriction on parallelism

Second Attempt

    Var flag: array [1..2] of Boolean;
    flag[1] := false;
    flag[2] := false;

    { Process 1 }
    Loop
        While flag[2] do nothing End;
        flag[1] := true;
        critical section
        flag[1] := false;
        other instructions
    End

    { Process 2 }
    Loop
        While flag[1] do nothing End;
        flag[2] := true;
        critical section
        flag[2] := false;
        other instructions
    End

• What happens if the tests happen at the same time?
• Both processes can be in the critical section at the same time
• Incorrect!!

Third Attempt

    Var flag: array [1..2] of Boolean;
    flag[1] := false;
    flag[2] := false;

    { Process 1 }
    Loop
        flag[1] := true;
        While flag[2] do nothing End;
        critical section
        flag[1] := false;
        other instructions
    End

    { Process 2 }
    Loop
        flag[2] := true;
        While flag[1] do nothing End;
        critical section
        flag[2] := false;
        other instructions
    End

• Only one process will be in the critical section at a time
• But what happens if both set their flags simultaneously?
• Deadlock -- Incorrect!!

Fourth Attempt

    Var turn: 1..2;
        flag: array [1..2] of Boolean;
    turn := 1;
    flag[1] := false;
    flag[2] := false;

    { Process 1 }
    Loop
        flag[1] := true;
        turn := 2;
        While flag[2] and turn = 2 do nothing End;
        critical section
        flag[1] := false;
        other instructions
    End

    { Process 2 }
    Loop
        flag[2] := true;
        turn := 1;
        While flag[1] and turn = 1 do nothing End;
        critical section
        flag[2] := false;
        other instructions
    End

• At last, correct!!
• Similar to Dekker's Algorithm
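
This fourth attempt is essentially Peterson's algorithm. Below is a minimal Java sketch of the same idea; the class and method names are mine, not from the slides. In Java the shared variables must be declared volatile so that each thread is guaranteed to see the other's writes:

    public class Peterson {
        // All three shared variables must be volatile for visibility.
        private volatile boolean flag1 = false, flag2 = false;
        private volatile int turn = 1;

        public void enter1() {                  // process 1 wants in
            flag1 = true;
            turn = 2;                           // give priority to the other
            while ( flag2 && turn == 2 ) { }    // busy-wait
        }
        public void exit1() { flag1 = false; }

        public void enter2() {                  // process 2 wants in
            flag2 = true;
            turn = 1;
            while ( flag1 && turn == 1 ) { }
        }
        public void exit2() { flag2 = false; }
    }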

Hardware Solution

• Most systems provide a hardware solution
  – Most often, a test-and-set or compare-and-swap instruction is provided

    Var lock: Boolean;
    lock := false;

    Loop
        While test-and-set(lock) do nothing End;
            { test-and-set reads the lock and then sets it to true;
              the whole operation is guaranteed to be atomic }
        critical section
        lock := false;
        other instructions
    End
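
In Java the same pattern can be sketched with java.util.concurrent.atomic.AtomicBoolean, whose compareAndSet() plays the role of the atomic test-and-set instruction (the class name is mine):

    import java.util.concurrent.atomic.AtomicBoolean;

    public class SpinLock {
        private final AtomicBoolean lock = new AtomicBoolean( false );

        public void acquire() {
            // Atomically: if the lock is false, set it to true and return true;
            // otherwise return false, so the caller keeps spinning.
            while ( !lock.compareAndSet( false, true ) ) { /* do nothing */ }
        }

        public void release() {
            lock.set( false );
        }
    }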

Semaphores

• Introduced by Dijkstra in 1965
  – Based on the signaling mechanisms used in railroad traffic
• A semaphore is an integer value that, apart from initialization, is accessed only through two standard atomic operations:
  – Wait (originally P, for proberen)
  – Signal (originally V, for verhogen)

Wait and Signal

    Procedure wait( var s: Integer );
    Begin
        While s ≤ 0 do nothing End;
        s := s - 1;
    End

    Procedure signal( var s: Integer );
    Begin
        s := s + 1;
    End

Wait and Signal

    Type Semaphore = record
        value: Integer;
        L: List of Process;
    end;

    Procedure wait( var s: Semaphore );
    Begin
        s.value := s.value - 1;
        if s.value < 0 then begin
            putlast( s.L, this );
            block( this );
        end;
    End

    Procedure signal( var s: Semaphore );
    Begin
        s.value := s.value + 1;
        if s.value ≤ 0 then
            wakeup( removefirst( s.L ) );
    End
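
For comparison, java.util.concurrent.Semaphore provides exactly this blocking behavior: acquire() corresponds to wait/P and release() to signal/V. A minimal sketch:

    import java.util.concurrent.Semaphore;

    public class SemaphoreDemo {
        public static void main( String args[] ) throws InterruptedException {
            Semaphore s = new Semaphore( 1 );   // one permit: acts as a mutex
            s.acquire();                        // wait/P: blocks while the count is 0
            try {
                System.out.println( "in the critical section" );
            } finally {
                s.release();                    // signal/V: wakes a blocked thread
            }
        }
    }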

Monitors

• Introduced in 1974/75 by Hoare and Brinch Hansen
• Every monitor contains both the data to be protected as well as the access and synchronization mechanisms
• Only one process can be active in a monitor at a time

View of a Monitor

[Diagram: an entry queue of waiting processes outside the monitor; inside, the shared data, the operations, and the initialization code]

Conditions

• Condition variables are used in a monitor to provide additional synchronization mechanisms
  – A wait operation causes the current process to be suspended
  – A signal operation resumes exactly one suspended process

Signal

• Suppose a process P invokes signal, and there is a suspended process Q
  – Clearly Q cannot begin execution immediately
• Two possibilities
  – P waits until Q leaves the monitor
  – Q waits for P to leave

Java Threads

• A thread is a flow of control in a program
• The Java Virtual Machine allows an application to have multiple threads of execution running concurrently
• When a Java Virtual Machine starts up, there is usually a single thread (which typically calls the method named main of some designated class)
• Threads are given priorities; a high-priority thread has preference over a low-priority thread

Understanding Threads

• You must be able to answer the following questions
  – What code does a thread execute?
  – What states can a thread be in?
  – How does a thread change its state?
  – How does synchronization work?

Thread Objects

• As with everything else in Java, threads are represented as objects
• The code that a thread executes is contained in its run() method
  – There is nothing special about run(); anyone can call it
• To make a thread eligible for running, you call its start() method

Example

    public class CounterThread extends Thread {
        public void run() {
            for ( int i = 0; i < 10; i++ )
                System.out.println( "Count: " + i );
        }

        public static void main( String args[] ) {
            CounterThread ct = new CounterThread();
            ct.start();
        }
    }

Interface Runnable

• Classes that implement Runnable can also be run as separate threads
• Runnable classes have a run() method
• In this case you create a Thread, specifying the Runnable object as the constructor argument

Example

    public class DownCounter implements Runnable {
        public void run() {
            for ( int i = 10; i > 0; i-- )
                System.out.println( "Down: " + i );
        }

        public static void main( String args[] ) {
            DownCounter ct = new DownCounter();
            Thread t = new Thread( ct );
            t.start();
        }
    }

Many

    public class Many extends Thread {
        private int retry;
        private String info;

        public Many( int retry, String info ) {
            this.retry = retry;
            this.info = info;
        }

        public void run() {
            for ( int n = 0; n < retry; ++n )
                work();
            quit();
        }

        protected void work() { System.out.print( info ); }

        protected void quit() { System.out.print( '\n' ); }

        public static void main( String args[] ) {
            if ( args != null )
                for ( int n = 0; n < args.length; ++n )
                    new Many( args.length, args[n] ).start();
        }
    }

When Execution Ends

• The Java Virtual Machine continues to execute threads until either of the following occurs:
  – The exit method of class Runtime has been called
  – All threads that are not daemon threads have died, either by returning from the call to run() or by throwing an exception that propagates beyond run()
• You cannot restart a dead thread, but you can access its state and behavior
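
Since only non-daemon threads keep the virtual machine alive, a background thread can be marked as a daemon before it is started. A small sketch (the class name is mine):

    public class DaemonDemo extends Thread {
        public void run() {
            while ( true )
                System.out.println( "tick" );
        }

        public static void main( String args[] ) {
            DaemonDemo d = new DaemonDemo();
            d.setDaemon( true );   // must be called before start()
            d.start();
            // When main returns, no non-daemon threads remain,
            // so the JVM exits and the ticker dies with it.
        }
    }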

Thread Scheduling

• Threads are scheduled like processes
• Thread states
  – Running
  – Waiting, Sleeping, Suspended, Blocked
  – Ready
  – Dead
• When you invoke start(), the thread is marked ready and placed in the thread queue

Thread Priorities

• Threads can have priorities from 1 to 10 (10 is the highest; this is not Unix)
• The default priority is 5
• Priorities can be changed via setPriority() (there is also a getPriority())
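
A short sketch (the class name is mine) using the Thread priority constants:

    public class PriorityDemo extends Thread {
        public void run() {
            System.out.println( getName() + " runs at " + getPriority() );
        }

        public static void main( String args[] ) {
            PriorityDemo t = new PriorityDemo();
            System.out.println( "default: " + t.getPriority() );  // 5, inherited from main
            t.setPriority( Thread.MAX_PRIORITY );                 // 10
            t.start();
        }
    }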

Scheduling Implementations

• Scheduling is typically either:
  – non-preemptive
  – preemptive
• Most Java implementations use preemptive scheduling
  – The type of scheduler will depend on the JVM that you use
  – In a non-preemptive scheduler, a thread leaves the running state only when it is ready to do so

Other Thread Methods

• A call to the yield() method causes the currently executing thread to go to the ready state (this is done by the thread itself)
• join() waits for a thread to die
• A sleeping thread suspends itself for a specified period of time and then automatically resumes:
  – public static void sleep( long milliseconds ) throws InterruptedException
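
A minimal sketch combining sleep() and join() (the class name is mine):

    public class JoinDemo extends Thread {
        public void run() {
            try {
                Thread.sleep( 1000 );   // suspend for one second
            } catch ( InterruptedException e ) {}
            System.out.println( "worker done" );
        }

        public static void main( String args[] ) throws InterruptedException {
            JoinDemo t = new JoinDemo();
            t.start();
            t.join();                   // block until t dies
            System.out.println( "main done" );  // always printed second
        }
    }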

Timer

    import java.util.Date;

    class Timer implements Runnable {
        public void run() {
            while ( true ) {
                System.out.println( new Date() );
                try {
                    Thread.sleep( 1000 );
                } catch ( InterruptedException e ) {}
            }
        }

        public static void main( String args[] ) {
            Thread t = new Thread( new Timer() );
            t.start();
            System.out.println( "Main done" );
        }
    }

Synchronization

• Thread synchronization in Java uses monitors
• Each object has a lock that can be held by a thread
• A thread can give up a lock by:
  – leaving a block of synchronized code
  – calling wait()
• A thread executing wait() can be released by:
  – notify()
    • some waiting thread is allowed to compete for the lock
  – notifyAll()
    • all waiting threads are allowed to compete for the lock
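
A small sketch of these primitives (the class name is mine): a one-shot gate where threads wait() until some thread calls open():

    public class Gate {
        private boolean open = false;

        public synchronized void await() throws InterruptedException {
            while ( !open )   // re-test the condition after every wakeup
                wait();       // releases the lock while suspended
        }

        public synchronized void open() {
            open = true;
            notifyAll();      // all waiting threads compete for the lock
        }
    }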

Synchronized Code

• There are two ways to mark code as synchronized:
  – use the synchronized statement

        synchronized( someObject ) {
            // must obtain the lock to enter this block.
            // wait()ing threads have to reacquire the
            // lock before they are allowed to proceed.
        }

  – use the synchronized method shorthand

        public synchronized void someMethod() { … }

    which is the same as

        public void someMethod() {
            synchronized( this ) { … }
        }

Semaphores

    public class Semaphore {
        protected int n;

        public Semaphore( int n ) {
            this.n = n;
        }

        public synchronized void P() {
            while ( n <= 0 ) {
                try {
                    wait();
                } catch ( Exception e ) {
                    // still competes for the lock
                }
            }
            --n;
        }

        public synchronized void V() {
            if ( ++n > 0 )
                notify();
        }
    }
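
A hypothetical use of this class as a mutual-exclusion lock, assuming the Semaphore class above is on the classpath (the class name SemUser is mine):

    public class SemUser extends Thread {
        private static Semaphore mutex = new Semaphore( 1 );

        public void run() {
            mutex.P();        // blocks until the count is positive
            System.out.println( "in critical section: " + getName() );
            mutex.V();        // wakes one waiting thread, if any
        }

        public static void main( String args[] ) {
            new SemUser().start();
            new SemUser().start();
        }
    }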

Bounded Buffer

    public class BoundedBuffer {
        private int buf[], head = 0, tail = 0, count = 0;

        public BoundedBuffer( int size ) {
            buf = new int[ size ];
        }

        public synchronized void deposit( int value ) {
            while ( count == buf.length )
                try { wait(); } catch ( InterruptedException e ) {}
            buf[ tail ] = value;
            tail = ( tail + 1 ) % buf.length;
            count++;
            if ( count == 1 )
                notify();
        }

        public synchronized int fetch() {
            while ( count == 0 )
                try { wait(); } catch ( InterruptedException e ) {}
            int val = buf[ head ];
            head = ( head + 1 ) % buf.length;
            count--;
            if ( count == buf.length - 1 )
                notify();
            return val;
        }
    }
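
A hypothetical producer/consumer pair driving this buffer (the class name BufferDemo is mine):

    public class BufferDemo {
        public static void main( String args[] ) {
            final BoundedBuffer buf = new BoundedBuffer( 4 );

            Thread producer = new Thread() {
                public void run() {
                    for ( int i = 0; i < 10; i++ )
                        buf.deposit( i );               // blocks while the buffer is full
                }
            };

            Thread consumer = new Thread() {
                public void run() {
                    for ( int i = 0; i < 10; i++ )
                        System.out.println( buf.fetch() );  // blocks while empty
                }
            };

            producer.start();
            consumer.start();
        }
    }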

Be Careful!!

    public class Accumulator {
        private static int count = 0;

        public Accumulator() {}

        public synchronized void addTo( int value ) {
            count = count + value;
        }

        public synchronized void subFrom( int value ) {
            count = count - value;
        }

        public String toString() {       // note: not synchronized
            return String.valueOf( count );
        }
    }

Be Careful!!

    public class Problem extends Thread {
        Accumulator a;

        public Problem( Accumulator x ) {
            a = x;
        }

        public void run() {
            for ( int i = 0; i < 10; i++ )
                a.addTo( 6 );
        }

        public static void main( String args[] ) {
            Accumulator reg = new Accumulator();
            new Problem( reg ).run();   // run(), not start(): this executes in
                                        // the main thread; no new thread starts
            System.out.println( reg );
        }
    }

Synchronized Static Methods

• Java also provides synchronized static methods
• Each class has a class lock which takes the place of an object lock
• Before a synchronized static method is executed, the calling thread must first obtain the class lock
• Since there is only one class lock, at most one thread can hold the lock for the class (object locks can be held by different threads locking on different instances of the class)
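
A sketch (the class name is mine) showing that the synchronized static shorthand is equivalent to locking the Class object explicitly:

    public class Registry {
        private static int count = 0;

        public static synchronized void increment() {  // holds the Registry.class lock
            count++;
        }

        public static void incrementExplicit() {       // the same lock, taken explicitly
            synchronized ( Registry.class ) {
                count++;
            }
        }
    }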

Linda

• Gelernter proposed a universal paradigm for shared-memory access called Linda
• Linda is a toolbox which provides hidden mechanisms to protect/access shared data
  – The Linda primitives are simple and few in number
• Linda is language neutral
  – Implementations for C and FORTRAN are available

Linda Basics

• Tuple space is Linda's name for its shared data space
  – A tuple is simply a list of fields, separated by commas and enclosed in parentheses
• A tuple is accessed by specifying its contents
  – Associative memory model
  – There is no address associated with a tuple

Tuple Space

[Diagram: a sender deposits tuples into tuple space; a receiver withdraws them]

Linda Operations

• There are four basic operations:
  – out
    • Generates a data (passive) tuple
    • Each field is evaluated and put into tuple space
  – in
    • Uses a template to retrieve a tuple from tuple space
    • Once retrieved, the tuple is taken out of tuple space and is no longer available
    • If no matching tuple is found, the process will block; this provides for synchronization between processes

Linda Templates

• A template specifies the tuple to retrieve
  – Consists of a sequence of typed fields
• Two kinds of fields
  – Actuals
    • Variables, constants, or expressions that resolve to constants
  – Formals
    • Placeholders for the data to retrieve
    • Preceded by a question mark
    • Assigned the values of the corresponding fields in the matched tuple

Matching Templates

• In order for a template to match a tuple:
  – They must have the same number of fields
  – Actuals must have the same type, length, and values as those in the corresponding tuple fields
  – Formals in the template must match the type and length of the corresponding fields in the tuple
• If several tuples match the template, it is impossible to predict which will be selected
• The order of evaluation of fields within a tuple or template is undefined

Linda Operations (continued)

  – rd
    • Uses a template to copy data without removing it from tuple space
    • Once read, the tuple is still available for others
    • If no matching tuple is found, the process will block
  – eval
    • Generates a process (active) tuple
    • Control is immediately returned to the invoking program
    • Logically, each field is evaluated concurrently, by a separate process, and then placed into tuple space

Linda Implementations

• There are several Linda implementations
• Most common are
  – C-Linda
  – Fortran-Linda
• C-Linda is available on the RIT machines

Parallel Pie in Linda

• The value of π can be computed by evaluating the integral

      ∫₀¹ √(1 − x²) dx = π/4

  and multiplying the result by 4
• Simpson's rule can be used to provide an estimate of the value of the integral

Simpson's Rule

• The area is approximated by dividing the interval [a, b] into an even number n of subintervals by the points

      x_i = a + i·h,   i = 0, 1, …, n

• where

      h = (b − a) / n

Simpson's Rule

• A quadratic function is fit over each panel, to three successive points (x_i, f_i), where f_i = f(x_i)
• Which eventually leads to Simpson's rule:

      ∫ₐᵇ f(x) dx ≈ (h/3) [ f(x_0) + 4 f(x_1) + 2 f(x_2) + 4 f(x_3) + … + 2 f(x_{n−2}) + 4 f(x_{n−1}) + f(x_n) ]

Pie.java

    public class pie {
        public static void main( String args[] ) {
            System.out.println( 4 * simpsons( 0.0, 1.0, 90000 ) );
        }

        public static double f( double x ) {
            return Math.sqrt( 1 - x * x );
        }

        public static double simpsons( double a, double b, double n ) {
            double sum4 = 0.0, sum2 = 0.0;
            double h = ( b - a ) / n;
            for ( int i = 1; i < n; i += 2 )
                sum4 += f( a + i * h );
            sum4 = sum4 * 4;
            for ( int i = 2; i < n; i += 2 )
                sum2 += f( a + i * h );
            sum2 = sum2 * 2;
            return ( h / 3.0 ) * ( f( a ) + sum4 + sum2 + f( b ) );
        }
    }

Computing Pi Using Linda

• The basic computing model is referred to as a farmer/worker model
  – The farmer prepares the work and doles it out
  – Workers perform the work and return the result to the farmer
• In this program
  – The farmer will split the area to be integrated and give different portions to the workers
  – Each worker will compute its area and return the result to the farmer

Farmer

    #include "linda.h"
    #include <stdio.h>

    main() {
        double pie1, pie2, pie3;

        eval( pie1 );
        eval( pie2 );
        eval( pie3 );

        out( "work", 0.0, 0.3, 30000 );
        out( "work", 0.3, 0.6, 30000 );
        out( "work", 0.6, 1.0, 30000 );

        in( "ans", ?pie1 );   /* ? marks a formal: blocks until an answer arrives */
        in( "ans", ?pie2 );
        in( "ans", ?pie3 );

        printf( "pi is about %f\n", 4.0 * ( pie1 + pie2 + pie3 ) );
    }

Worker

    #include "linda.h"
    #include <stdio.h>

    main() {
        double a, b, n, area;

        in( "work", ?a, ?b, ?n );    /* blocks until a work tuple is available */
        area = simpsons( a, b, n );  /* simpsons() as defined earlier */
        out( "ans", area );
    }