- Slides: 18
A Concurrent Matrix Transpose Algorithm: The Implementation
Presented by Pourya Jafari

Review: Algorithm Steps
- Pre-process inside each thread
  - Shift rows
- Intra-process/thread communication
  - Shift columns
- Post-process inside each thread
  - Shift rows again
Example matrix:
  00 01 02 03
  10 11 12 13
  20 21 22 23
  30 31 32 33

Review: Shift values?
- Set shifts based on row index i: range 0 to N-1
- Now arrange the rows so that the column shifts get us to i
  - Pre-process shifting: i' = i - L
  - After the intra-process shift, the column index should equal the original row index i:
      i' + j = i
      i - L + j = i
      L = j
- So we shift each column j cells up

Review: Last step?
  (1) original     (2) columns shifted up     (3) rows shifted     (4) transposed
  00 01 02 03      00 11 22 33                00 11 22 33          00 10 20 30
  10 11 12 13      10 21 32 03                03 10 21 32          01 11 21 31
  20 21 22 23      20 31 02 13                02 13 20 31          02 12 22 32
  30 31 32 33      30 01 12 23                01 12 23 30          03 13 23 33
- 1 → 2: Column j shifts j cells up
- 2 → 3: Row shift based on row indices
- 3 → 4: ?
  - Change of indices so far (all indices mod N): (i, j) → (i - j, j) → (i - j, i - j + j) = (i - j, i) = (m, n)
  - One operation changes the row index to j: n - m = i - (i - j) = j

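The three index changes above can be checked with a small sequential sketch. It is not the presented implementation: the function name `transpose_by_shifts` and the use of nested Python lists are assumptions made for illustration, and nothing runs concurrently here.

```python
def transpose_by_shifts(a):
    """Transpose a square matrix with the three shift phases (sequential sketch)."""
    n = len(a)

    # Phase 1 (pre-process): element (i, j) moves j cells up, to row (i - j) mod n.
    b = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            b[(i - j) % n][j] = a[i][j]

    # Phase 2 (communication): row m shifts right by m, so column j becomes (j + m) mod n.
    c = [[None] * n for _ in range(n)]
    for m in range(n):
        for j in range(n):
            c[m][(j + m) % n] = b[m][j]

    # Phase 3 (post-process): inside column col, row m moves to row (col - m) mod n.
    d = [[None] * n for _ in range(n)]
    for m in range(n):
        for col in range(n):
            d[(col - m) % n][col] = c[m][col]
    return d


# The 4x4 example from the slides, with each cell named by its original indices.
matrix = [[f"{i}{j}" for j in range(4)] for i in range(4)]
result = transpose_by_shifts(matrix)
```

Following one element through the phases gives (i, j), then (i - j, j), then (i - j, i), then (j, i), which is the transposed position.
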
Review: Radix
- Using a radix representation, we can group row shifts
- We use radix 2 for simplicity
  - Digits are the bit representation of the row index
  - In step k, shift all rows whose index has its k-th bit set (by 2^k)
- Shift for each row = sum of its per-step shifts:
  row   k=0   k=1   total
  0     0     0     0
  1     1     0     1
  2     0     2     2
  3     1     2     3

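A minimal sketch of the radix-2 grouping, assuming rows are plain Python lists and that "shift" means a cyclic rotation to the right. The function name `shift_rows_radix2` is hypothetical. In round k every row whose index has bit k set rotates by 2^k, so row m ends up rotated by m in total.

```python
def shift_rows_radix2(rows):
    """Rotate row m right by m cells, using one round per bit of the row index."""
    n = len(rows)
    rows = [list(row) for row in rows]  # work on a copy
    step = 1                            # step == 2**k in round k
    while step < n:
        for m in range(n):
            if m & step:                # bit k of the row index is set
                row = rows[m]
                rows[m] = row[-step:] + row[:-step]  # rotate right by step
        step *= 2
    return rows


# The 4x4 example after each column j has been shifted j cells up.
after_column_shift = [
    ["00", "11", "22", "33"],
    ["10", "21", "32", "03"],
    ["20", "31", "02", "13"],
    ["30", "01", "12", "23"],
]
after_row_shift = shift_rows_radix2(after_column_shift)
```

For N rows this needs about log2(N) rounds instead of one round per distinct shift amount.
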
The concurrency picture
- Each thread can do pre/post-processing independently
- Processes must synchronize
  - after each phase
  - after each step of the intra-process phase
  - during intra-process communications

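The synchronization points can be sketched with one thread per column. This is a stand-in, not the presented design: `threading.Barrier` replaces the master process, a shared list of mailboxes replaces the channels, and the names `concurrent_transpose` and `column_process` are assumptions.

```python
import threading


def concurrent_transpose(matrix):
    """Transpose a square matrix with one thread per column (illustrative sketch)."""
    n = len(matrix)
    columns = [[matrix[i][j] for i in range(n)] for j in range(n)]
    mailboxes = [None] * n            # mailboxes[p] holds the message addressed to thread p
    barrier = threading.Barrier(n)    # stand-in for the master's phase synchronization

    def column_process(pid):
        col = columns[pid]
        # Pre-process (independent): shift this column up by pid cells.
        col[:] = col[pid:] + col[:pid]
        barrier.wait()                # synchronize after the pre-process phase
        step = 1
        while step < n:
            rows = [r for r in range(n) if r & step]
            mailboxes[(pid + step) % n] = [col[r] for r in rows]
            barrier.wait()            # every message of this step has been delivered
            for r, item in zip(rows, mailboxes[pid]):
                col[r] = item
            barrier.wait()            # every mailbox has been read before it is reused
            step *= 2
        # Post-process (independent): row m of this column moves to row (pid - m) mod n.
        col[:] = [col[(pid - r) % n] for r in range(n)]

    threads = [threading.Thread(target=column_process, args=(p,), daemon=True)
               for p in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return [[columns[j][i] for j in range(n)] for i in range(n)]


example = [[f"{i}{j}" for j in range(4)] for i in range(4)]
transposed = concurrent_transpose(example)
```

The two barrier waits inside the loop correspond to synchronizing after each step of the communication phase. The pre- and post-processing need no coordination because each thread touches only its own column.
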
Communication package (1)
- We need a means of communication that
  - facilitates synchronized communication
  - provides unbuffered communication to save memory
- JCSP: based on the algebra of Communicating Sequential Processes (CSP)
  - Has a strong theoretical background
  - Object-oriented

Communication package (2)
JCSP provides
- One2OneChannel
  - A single sender and a single receiver
- One2AnyChannel
  - A single sender and many receivers, but only one receiver communicates at a time
- Any2OneChannel
  - Multiple senders and one receiver

Classes (1)
CProcess: Column process
- Has a PID; knows N; has an array to store its items
- One2OneChannel to each other process for the intra-process shift operation
- One2AnyChannel from MProcess to receive start/resume calls
- Any2OneChannel to MProcess to signal that this CProcess has finished the current step

Classes (2)
MProcess: Master process
- One2AnyChannel and Any2OneChannel connecting it to the CProcesses
- Synchronizes the phases and the intra-process communication by waiting for all CProcesses to finish the current phase and then resuming them for the next phase

Classes (3)
Launcher: Threads driver
- Creates the channels
- Creates one MProcess and the CProcesses
- Runs them in parallel

Intra-process communication in CProcess
Might send/receive multiple items
- Determines the indices that need to be shifted
- Packs them into a message
- Sends the message to the next CProcess and receives from the previous CProcess in the shift chain
- Unpacks the received message
- Assigns the items inside to the same indices determined in the first step

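The pack, send, receive, unpack sequence can be sketched for a single radix-2 round. The exchange is simulated sequentially: a list of outgoing messages stands in for the channels, and the helper names (`rows_to_shift`, `pack`, `unpack`, `exchange_round`) are hypothetical.

```python
def rows_to_shift(n, k):
    """Indices that move in round k: rows whose index has bit k set."""
    return [row for row in range(n) if (row >> k) & 1]


def pack(column, indices):
    """Collect the items at the given indices into one message."""
    return [column[row] for row in indices]


def unpack(column, indices, message):
    """Assign received items to the same indices that were packed."""
    for row, item in zip(indices, message):
        column[row] = item


def exchange_round(columns, k):
    """Round k: process p sends to (p + 2**k) mod n and receives from (p - 2**k) mod n."""
    n = len(columns)
    step = 1 << k
    indices = rows_to_shift(n, k)
    # Everyone packs before anyone unpacks, so no item is overwritten too early.
    outgoing = [pack(column, indices) for column in columns]
    for pid in range(n):
        unpack(columns[pid], indices, outgoing[(pid - step) % n])


# Columns of the 4x4 example after the pre-process step: columns[j][i] is row i of column j.
after_preprocess = [
    ["00", "11", "22", "33"],
    ["10", "21", "32", "03"],
    ["20", "31", "02", "13"],
    ["30", "01", "12", "23"],
]
columns = [[row[j] for row in after_preprocess] for j in range(4)]
for k in range(2):
    exchange_round(columns, k)
rows_after = [[columns[j][i] for j in range(4)] for i in range(4)]
```

Because sender and receiver compute the same index list from N and k, the message only has to carry the items, not their positions.
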
UML Diagram
The Intra-process Shift
- Synchronized send and then receive
- A cycle might form
  - All CProcesses go to the send state and wait for the next CProcess to receive
  - None of the processes receives -> Deadlock

The Shift Cycle (1)
- One CProcess in the cycle should receive first to break the cycle
- But it would lose the value it has to send
- So it
  - receives and buffers the received value
  - sends its own value, then assigns the buffered value to the relevant array cell

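A minimal sketch of breaking the cycle, assuming a ring where each process passes one value to its right neighbour. The `Rendezvous` class is a hypothetical stand-in for an unbuffered one-to-one channel, built from two Python queues rather than JCSP, and choosing PID 0 as the process that receives first is an assumption of this sketch.

```python
import queue
import threading


class Rendezvous:
    """Unbuffered channel stand-in: write() returns only after read() took the value."""

    def __init__(self):
        self._item = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def write(self, value):
        self._item.put(value)
        self._ack.get()               # block until the reader has taken the value

    def read(self):
        value = self._item.get()
        self._ack.put(None)           # release the blocked writer
        return value


def ring_shift(values):
    """Process p passes values[p] to process (p + 1) mod n; returns the shifted list."""
    n = len(values)
    result = list(values)
    if n < 2:
        return result                 # nothing to exchange
    channels = [Rendezvous() for _ in range(n)]   # channels[p]: from p to (p + 1) mod n

    def process(pid):
        out_ch = channels[pid]
        in_ch = channels[(pid - 1) % n]
        if pid == 0:
            buffered = in_ch.read()   # receive first, keeping the own value intact
            out_ch.write(result[pid]) # then send the old value
            result[pid] = buffered    # finally store the buffered value
        else:
            out_ch.write(result[pid]) # send first
            result[pid] = in_ch.read()

    threads = [threading.Thread(target=process, args=(p,), daemon=True) for p in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    if any(t.is_alive() for t in threads):
        raise RuntimeError("processes are still blocked (deadlock)")
    return result


shifted = ring_shift(["a", "b", "c", "d"])
```

If every process sent first, each `write` would wait on a neighbour that is itself stuck in `write`. The single receive-first process lets its predecessor finish sending, and the blocked sends then complete one after another around the ring.
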
The Shift Cycle (2)
- Cycles happen when the interleaving value h divides N
- We do a buffered read for all process numbers less than h

The Shift Cycle (3)
- Even after this, the program runs into deadlock again
- Cycles form when gcd(h, N) is greater than 1
- Must buffer values less than or equal to gcd(h, N)

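The link between cycles and gcd(h, N) can be explored with a small sketch that enumerates the cycles of the shift p → (p + h) mod N. The function names and the zero-based PIDs are assumptions; the slides do not say how PIDs are numbered, so the boundary used here (PIDs strictly below the gcd) belongs to this sketch only.

```python
from math import gcd


def shift_cycles(n, h):
    """Cycles formed when every process p sends to process (p + h) mod n."""
    seen = set()
    cycles = []
    for start in range(n):
        if start in seen:
            continue
        cycle = []
        pid = start
        while pid not in seen:
            seen.add(pid)
            cycle.append(pid)
            pid = (pid + h) % n
        cycles.append(cycle)
    return cycles


def receive_first(n, h):
    """One process per cycle (its smallest PID) that should do the buffered read."""
    return [cycle[0] for cycle in shift_cycles(n, h)]


cycles_4_2 = shift_cycles(4, 2)    # two cycles, matching gcd(2, 4) == 2
first_6_4 = receive_first(6, 4)    # one PID per cycle, matching gcd(4, 6) == 2
```

With zero-based PIDs there are gcd(h, N) cycles and their smallest members are 0, 1, ..., gcd(h, N) - 1, so one buffered read per cycle is enough to let every cycle make progress.
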
Results