A Concurrent Matrix Transpose Algorithm, The Implementation
Presented by Pourya Jafari

Review: Algorithm Steps
  • Pre-process inside each thread
      • Shift rows
  • Intra-process/thread communication
      • Shift columns
  • Post-process inside each thread
      • Shift rows again

Example matrix (each entry is its row index followed by its column index):
  00 01 02 03
  10 11 12 13
  20 21 22 23
  30 31 32 33

Review: Shift values?
  • Set shifts based on row index: range 0 to N-1
  • Now arrange the rows so that the column shifts get us to i
      • Pre-process shifting: i' = i - L
      • After the intra-process shift, the column index should equal the original row index i:
          i' + j = i
          i - L + j = i
          L = j
  • So we shift each column j cells up
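The effect of shifting each column j cells up can be checked numerically. The following is a minimal sketch in Python rather than the Java/JCSP of the talk; the helper names `shift_column_up` and `shift_row_right` are made up for illustration, and all shifts are assumed to be cyclic (modulo N).

```python
N = 4
# Label each cell with its original (row, column) index.
matrix = [[(i, j) for j in range(N)] for i in range(N)]

def shift_column_up(m, j, amount):
    """Cyclically shift column j of m up by `amount` cells (in place)."""
    n = len(m)
    column = [m[r][j] for r in range(n)]
    for r in range(n):
        m[r][j] = column[(r + amount) % n]

def shift_row_right(m, i, amount):
    """Cyclically shift row i of m right by `amount` cells (in place)."""
    n = len(m)
    row = list(m[i])
    for c in range(n):
        m[i][(c + amount) % n] = row[c]

# Pre-process: column j moves up by j cells.
for j in range(N):
    shift_column_up(matrix, j, j)

# Shift between columns: the row with index r moves right by r cells.
for r in range(N):
    shift_row_right(matrix, r, r)

# Check that every cell's column index now equals its original row index.
column_matches_original_row = all(
    matrix[r][c][0] == c for r in range(N) for c in range(N)
)
print(column_matches_original_row)
```

Labelling cells with their original indices, instead of using numeric data, makes it easy to see where each element ended up.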

Review: Last step?

  (1) original    (2) columns shifted    (3) rows shifted    (4) result
  00 01 02 03     00 11 22 33            00 11 22 33         00 10 20 30
  10 11 12 13     10 21 32 03            03 10 21 32         01 11 21 31
  20 21 22 23     20 31 02 13            02 13 20 31         02 12 22 32
  30 31 32 33     30 01 12 23            01 12 23 30         03 13 23 33

  • 1 → 2: Column shift, column j moves j cells up
  • 2 → 3: Row shift based on row indices
  • 3 → 4: ?
      • Change of indices so far (modulo N): (i, j) → (i - j, j) → (i - j, i - j + j) = (i - j, i) = (m, n)
      • One operation changes the row index to j: n - m = i - (i - j) = j
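The three shifts can be put together in a short sequential sketch. It is written in Python rather than the Java/JCSP of the talk, the function name `transpose_by_shifts` is hypothetical, and it ignores concurrency entirely; it only checks that the index arithmetic on this slide ends in a transpose.

```python
def transpose_by_shifts(a):
    """Transpose a square matrix using three phases of cyclic shifts.

    Returns the matrix after each phase so the intermediate states can be inspected.
    """
    n = len(a)
    # Phase 1 (pre-process): column j moves up by j, so (i, j) -> (i - j, j).
    step2 = [[a[(r + c) % n][c] for c in range(n)] for r in range(n)]
    # Phase 2 (between columns): row m moves right by m, so (m, j) -> (m, m + j).
    step3 = [[step2[r][(c - r) % n] for c in range(n)] for r in range(n)]
    # Phase 3 (post-process): inside column n, the item in row m moves to row n - m.
    step4 = [[step3[(c - r) % n][c] for c in range(n)] for r in range(n)]
    return step2, step3, step4

a = [[f"{i}{j}" for j in range(4)] for i in range(4)]
step2, step3, step4 = transpose_by_shifts(a)
for row in step4:
    print(" ".join(row))
```

On the 4×4 example, `step2` and `step3` should correspond to the intermediate matrices of the slide, and `step4` to the transposed result.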

Review: Radix
  • Using a radix representation, we can group the row shifts
  • We use radix 2 for simplicity
      • The digits are the bits of the row index
      • In round k, shift all rows whose indices have their k-th bit set

Shift for each row (N = 4):
  row   shift     k=0     k=1
  0     0      =  0    +  0
  1     1      =  1    +  0
  2     2      =  0    +  2
  3     3      =  1    +  2
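A small sketch of the radix-2 grouping, in Python rather than the Java/JCSP of the talk. The function names are illustrative, and the shift distance of 2^k in round k is the usual consequence of a binary representation rather than something spelled out on the slide.

```python
def shift_rounds(n):
    """For each round k, return (distance, rows): the rows moved in that round and how far."""
    rounds = []
    k = 0
    while (1 << k) < n:
        # Rows whose k-th bit is set take part in round k.
        rows = [r for r in range(n) if (r >> k) & 1]
        rounds.append((1 << k, rows))
        k += 1
    return rounds

def total_shift_per_row(n):
    """Add up the distances of all rounds a row takes part in."""
    totals = [0] * n
    for distance, rows in shift_rounds(n):
        for r in rows:
            totals[r] += distance
    return totals

print(shift_rounds(4))
print(total_shift_per_row(4))
```

Grouping this way needs about log2(N) communication rounds instead of one round per distinct shift value, while every row still ends up shifted by its own index.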

The concurrency picture
  • Each thread can do pre/post-processing independently
  • Processes must synchronize
      • after each phase
      • after each step of the intra-process shift
      • during intra-process communications

Communication package (1)
  • We need a means of communication that
      • facilitates synchronized communication
      • provides unbuffered communication to save memory
  • JCSP: based on the algebra of Communicating Sequential Processes (CSP)
      • has a strong theoretical background
      • is object-oriented

Communication package (2)
JCSP provides
  • One2OneChannel: a single sender sends and a single receiver receives
  • One2AnyChannel: a single sender and many receivers can communicate, but only one receiver at a time
  • Any2OneChannel: multiple senders and one receiver

Classes (1)
CProcess: column process
  • Has a PID, knows N, and has an array to store its items
  • One2OneChannel to each other process for the intra-process shift operation
  • One2AnyChannel to MProcess to receive start/resume calls
  • Any2OneChannel to MProcess to signal that this CProcess has finished the current step

Classes (2)
MProcess: master process
  • One2AnyChannel and Any2OneChannel to every CProcess
  • Synchronizes the phases and the intra-process communication by waiting for all CProcesses to finish the current phase and then resuming them for the next phase
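The master's wait-then-resume loop can be sketched with plain Python threads. This is not the JCSP implementation of the talk: `queue.Queue` is buffered, unlike a JCSP channel, and the sketch uses one resume queue per worker instead of a single shared One2AnyChannel, so that a fast worker cannot pick up a resume token meant for another one. The names `run_phases`, `worker` and `master` are made up for illustration.

```python
import queue
import threading

def run_phases(num_workers, num_phases):
    """Run num_phases phases; no worker starts a phase before all have finished the previous one."""
    done = queue.Queue()                                  # many writers, one reader
    resume = [queue.Queue() for _ in range(num_workers)]  # one queue per worker
    log = []                                              # (phase, pid) in completion order
    log_lock = threading.Lock()

    def worker(pid):
        for phase in range(num_phases):
            resume[pid].get()             # wait for the master's start/resume call
            with log_lock:
                log.append((phase, pid))  # stand-in for the real work of this phase
            done.put(pid)                 # report that this phase is finished

    def master():
        for _ in range(num_phases):
            for q in resume:
                q.put("go")               # start or resume every worker
            for _ in range(num_workers):
                done.get()                # wait until every worker has reported

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_workers)]
    threads.append(threading.Thread(target=master))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

log = run_phases(num_workers=4, num_phases=3)
print(log)
```

The order of workers inside a phase varies from run to run, but all entries of one phase appear in the log before any entry of the next.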

Classes (3)
Launcher: thread driver
  • Creates the channels
  • Creates one MProcess and the CProcesses
  • Runs them in parallel

Intra-process communication in CProcess
Might send/receive multiple items. Each CProcess:
  • Determines the indices that need to be shifted
  • Packs the items at those indices into a message
  • Sends the message to the next CProcess and receives one from the previous CProcess in the shift chain
  • Unpacks the received message
  • Assigns the items inside to the same indices determined in the first step
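The five steps can be simulated sequentially for one round. The sketch below is Python rather than the Java/JCSP of the talk and replaces real channels with a list of messages; `shift_round` is a hypothetical name, and the choice of rows (k-th bit set) and distance (2^k columns) follows the radix-2 scheme from the review slides.

```python
def shift_round(columns, k):
    """One round: items in rows whose k-th bit is set move 2**k columns to the right.

    columns[p] is the array held by column process p, one item per row.
    """
    n = len(columns)
    h = 1 << k
    # Step 1: determine the row indices that take part in this round.
    rows = [r for r in range(n) if (r >> k) & 1]
    # Step 2: every process packs the selected items into one message.
    messages = [[columns[p][r] for r in rows] for p in range(n)]
    # Steps 3 to 5: process p receives the message of process p - h, unpacks it,
    # and writes the items to the same row indices selected in step 1.
    for p in range(n):
        received = messages[(p - h) % n]
        for r, item in zip(rows, received):
            columns[p][r] = item
    return columns

n = 4
a = [[f"{i}{j}" for j in range(n)] for i in range(n)]
# State after the pre-process step: column p has been shifted up by p cells.
columns = [[a[(r + p) % n][p] for r in range(n)] for p in range(n)]
for k in range(2):  # two rounds are enough for N = 4
    shift_round(columns, k)
print(columns)
```

Because all messages are built before any of them is unpacked, no item is overwritten before it has been sent, which is the property the real processes have to enforce through their send/receive ordering.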

UML Diagram

The Intra-process Shift
  • Synchronized send and then receive
  • A cycle might form
      • All CProcesses go to the send state and wait for the next CProcess to receive
      • None of the CProcesses receives → deadlock

The Shift Cycle (1)
  • One CProcess in the cycle should receive first to break the cycle
  • But it would then lose the value it still has to send
      • Receives and buffers the incoming value
      • Sends, and then assigns the buffered value to the relevant array cell
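A minimal threaded sketch of this idea, in Python rather than the Java/JCSP of the talk. `RendezvousChannel` is a hand-written stand-in for an unbuffered one-to-one channel, not a JCSP class; the ring shifts by one position, and process 0 is arbitrarily chosen as the one that receives first. At least two processes are assumed.

```python
import threading

class RendezvousChannel:
    """Unbuffered one-to-one channel: write() returns only after read() has taken the item."""
    def __init__(self):
        self._item = None
        self._offered = threading.Semaphore(0)  # released by the writer
        self._taken = threading.Semaphore(0)    # released by the reader

    def write(self, item):
        self._item = item
        self._offered.release()
        self._taken.acquire()    # block until the reader has the item

    def read(self):
        self._offered.acquire()  # block until the writer offers an item
        item = self._item
        self._taken.release()
        return item

def ring_shift(values):
    """Each process p passes its value to process p + 1 over rendezvous channels."""
    n = len(values)
    # to_next[p] carries data from process p to process (p + 1) % n.
    to_next = [RendezvousChannel() for _ in range(n)]
    cells = list(values)

    def process(pid):
        out_channel = to_next[pid]
        in_channel = to_next[(pid - 1) % n]
        if pid == 0:
            # Cycle breaker: receive into a buffer first, so the value that
            # still has to be sent is not overwritten.
            buffered = in_channel.read()
            out_channel.write(cells[pid])
            cells[pid] = buffered
        else:
            out_channel.write(cells[pid])
            cells[pid] = in_channel.read()

    threads = [threading.Thread(target=process, args=(p,), daemon=True) for p in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    finished = not any(t.is_alive() for t in threads)
    return cells, finished

cells, finished = ring_shift(["a", "b", "c", "d"])
print(cells, finished)
```

If the `pid == 0` branch is removed, every process blocks in `write()` and `finished` comes back as `False` after the timeout, which is the deadlock described on the previous slide.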

The Shift Cycle (2)
  • Cycles happen when the interleaving value h divides N
  • We do a buffered read in all processes numbered less than h

The Shift Cycle (3)
  • Even after this, the program runs into deadlock again
  • Cycles form when gcd(h, N) is greater than 1
  • Must do a buffered read for all numbers less than or equal to gcd(h, N)
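The role of gcd(h, N) can be illustrated by listing the cycles of the map p → (p + h) mod N. The sketch below is Python rather than the Java/JCSP of the talk, the function names are made up, and it assumes PIDs are numbered from 0, which the slides do not state. Under that assumption the send chain splits into gcd(h, N) separate cycles, and processes 0 to gcd(h, N) - 1 each sit in a different one; with a different numbering the exact bound may differ.

```python
from math import gcd

def shift_cycles(n, h):
    """Return the cycles of the map p -> (p + h) % n over processes 0..n-1."""
    seen = set()
    cycles = []
    for start in range(n):
        if start in seen:
            continue
        cycle = []
        p = start
        while p not in seen:
            seen.add(p)
            cycle.append(p)
            p = (p + h) % n
        cycles.append(cycle)
    return cycles

def one_breaker_per_cycle(n, h):
    """Check that exactly one process with PID < gcd(h, n) lies in every cycle."""
    g = gcd(h, n)
    return all(sum(1 for p in cycle if p < g) == 1 for cycle in shift_cycles(n, h))

print(shift_cycles(8, 2))
print(shift_cycles(6, 4))
print(one_breaker_per_cycle(8, 4))
```

Each cycle needs at least one process that receives before it sends, and at least one that sends first, so picking exactly one receive-first process per cycle is a safe choice.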

Results