Parallelizing Sobel Edge Detection: Message-Passing, Shared-Memory, and Streaming

Parallelizing Sobel Edge Detection: Message-Passing, Shared-Memory, and Streaming Implementations
Patrick Griffin, Levente Jakob, and James Psota

Methodology
- 3 Computational Models:
  - Message Passing
  - Shared Memory
  - Stream
- Evaluated on Raw simulator

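Sobel Edge Detection
- Gx = [kernel figure not preserved]
- Gy = [kernel figure not preserved]
- Edges are defined by the first derivative
- Find dx and dy with convolution
- Edge strength: [formula not preserved]
- We used: [formula not preserved]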
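The slide's kernel matrices and edge-strength formulas did not survive extraction, so the following is only a minimal sketch of the technique. It assumes the standard textbook Sobel kernels and assumes that edge strength is approximated as |Gx| + |Gy| (a common integer-friendly stand-in for sqrt(Gx^2 + Gy^2)). The names `GX`, `GY`, `convolve_at`, and `sobel` are hypothetical and are not taken from the authors' code.

```python
# Standard textbook Sobel kernels (an assumption: the slide's own
# matrices were not preserved, and sign conventions vary).
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[ 1,  2,  1],
      [ 0,  0,  0],
      [-1, -2, -1]]

def convolve_at(img, r, c, kernel):
    """Apply a 3x3 kernel centered on pixel (r, c), without flipping it."""
    return sum(kernel[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def sobel(img):
    """Return |Gx| + |Gy| for interior pixels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = convolve_at(img, r, c, GX)   # horizontal derivative (dx)
            gy = convolve_at(img, r, c, GY)   # vertical derivative (dy)
            out[r][c] = abs(gx) + abs(gy)     # assumed edge-strength proxy
    return out
```

A vertical step edge, for example `[[0, 0, 10, 10]] * 4`, produces a nonzero response only from the Gx term, while a constant image produces zero everywhere.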

Message Passing Implementation
[Diagram: tiles labeled Calc, Scatter + Calc, and Gather + Calc, connected to DRAM]
- Tile 8 distributes “oversized” partitions
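One plausible reading of “oversized” partitions is that each tile receives its band of rows plus the neighboring border rows that a 3x3 convolution needs. The slide does not spell this out, so the sketch below is an assumption, and the function name `oversized_partitions` and its `halo` parameter are hypothetical.

```python
def oversized_partitions(height, num_tiles, halo=1):
    """Split rows [0, height) into num_tiles bands, each padded by `halo` rows.

    Returns (start, stop, core_start, core_stop) per tile: rows
    [start, stop) are sent to the tile, while [core_start, core_stop)
    are the rows whose results that tile is responsible for.
    """
    parts = []
    base, extra = divmod(height, num_tiles)
    core_start = 0
    for t in range(num_tiles):
        # Spread any remainder rows over the first `extra` tiles.
        core_stop = core_start + base + (1 if t < extra else 0)
        start = max(0, core_start - halo)      # extra row(s) above the band
        stop = min(height, core_stop + halo)   # extra row(s) below the band
        parts.append((start, stop, core_start, core_stop))
        core_start = core_stop
    return parts
```

With 8 rows and 4 tiles, the second tile would be sent rows 1 through 4 while producing output only for rows 2 and 3, so no tile has to request border rows from a neighbor after the scatter.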

Message Passing Results
- 177 cycles / pixel
- Memory copies kill performance
  - Packaging/unpackaging messages is expensive
  - Alternative solution: send many messages, no need to package
- rMPI status
  - Seemingly correct, but currently not optimized
  - Non-blocking receives will help tremendously for some applications

Shared Memory Implementation

Shared Memory Results

Streaming Implementation
[Diagram: Red, Green, and Blue channel pipelines, each with Row Sum, Gx, and Gy stages, plus Pack, Sum, and Abs stages]

Streaming Results
- 20 cycles / pixel
- Throughput limited by Gx calculations
- Most nodes required no memory; Gx and Gy need 2 rows' worth
- Integer ops + abs(float)
- All routing done on static network 1
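The point that Gx and Gy need only two rows' worth of memory can be illustrated with a row-at-a-time generator. This is a sketch under assumptions, not the Raw implementation: it uses the standard Sobel kernels, assumes |Gx| + |Gy| as the edge strength, handles a single channel, and the name `stream_sobel` is hypothetical.

```python
from collections import deque

def stream_sobel(rows):
    """Consume image rows one at a time and yield edge-strength rows.

    Only the two most recent rows are buffered; together with the row
    currently arriving they form the 3-row window the kernels need.
    Output covers interior pixels only (first/last row and column dropped).
    """
    prev = deque(maxlen=2)  # the two rows' worth of storage
    for cur in rows:
        if len(prev) == 2:
            top, mid = prev
            out = []
            for c in range(1, len(cur) - 1):
                # Gx: right column minus left column, middle row weighted 2.
                gx = ((top[c + 1] + 2 * mid[c + 1] + cur[c + 1])
                      - (top[c - 1] + 2 * mid[c - 1] + cur[c - 1]))
                # Gy: top row minus bottom row, middle column weighted 2.
                gy = ((top[c - 1] + 2 * top[c] + top[c + 1])
                      - (cur[c - 1] + 2 * cur[c] + cur[c + 1]))
                out.append(abs(gx) + abs(gy))
            yield out
        prev.append(cur)  # the oldest buffered row is discarded automatically
```

Because the generator never holds more than two past rows, its memory use is independent of image height, which mirrors the storage claim on the slide.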

Conclusions
- Streaming 4x slower than ideal
- Overhead of MP and SM detrimental

Method            Cycles / Pixel
Ideal             79/16 ≈ 5
Streaming         20
Shared Memory     539
Message Passing   177
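The "4x slower than ideal" figure can be checked with a few lines of arithmetic. This sketch assumes the flattened table assigns 539 cycles/pixel to Shared Memory and 177 to Message Passing (the latter matches the Message Passing Results slide); the variable names are hypothetical.

```python
# Cycles per pixel as read from the conclusions table (assignment of
# 539 to Shared Memory is an assumption based on column order).
cycles_per_pixel = {
    "Streaming": 20,
    "Shared Memory": 539,
    "Message Passing": 177,
}

ideal = 79 / 16  # ideal cycles per pixel, about 4.94

# Slowdown of each model relative to the ideal figure.
slowdown = {method: cycles / ideal
            for method, cycles in cycles_per_pixel.items()}

for method, factor in slowdown.items():
    print(f"{method}: {factor:.1f}x ideal")
```

Under these assumptions, streaming lands at roughly 4x the ideal, while message passing and shared memory are tens to over a hundred times slower.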