Parallelizing Sobel Edge Detection: Message-Passing, Shared-Memory, and Streaming Implementations
Patrick Griffin, Levente Jakob, and James Psota
Methodology
- 3 computational models:
  - Message passing
  - Shared memory
  - Streaming
- Evaluated on the Raw simulator
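Sobel Edge Detection
- Edges are defined by the first derivative
- Find dx and dy with convolution:

$$
G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix},
\qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}
$$

- Edge strength: $|G| = \sqrt{G_x^2 + G_y^2}$
- We used: $|G| \approx |G_x| + |G_y|$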
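As a sequential reference point, here is a minimal C sketch of the operation described on this slide: the standard 3x3 Sobel kernels with the |Gx| + |Gy| approximation. It assumes an 8-bit grayscale image stored row-major, sets border pixels (which lack a full 3x3 neighborhood) to 0, and clamps results to 255. The function name `sobel` and these conventions are illustrative assumptions, not the authors' code.

```c
#include <stdlib.h>

/* Sobel edge strength for an 8-bit grayscale image (row-major, w x h).
 * Assumptions: |Gx| + |Gy| approximation, borders set to 0, clamp to 255. */
void sobel(const unsigned char *in, unsigned char *out, int w, int h)
{
    static const int kx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static const int ky[3][3] = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (y == 0 || x == 0 || y == h - 1 || x == w - 1) {
                out[y * w + x] = 0;           /* no full 3x3 neighborhood */
                continue;
            }
            int gx = 0, gy = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int p = in[(y + dy) * w + (x + dx)];
                    gx += kx[dy + 1][dx + 1] * p;   /* horizontal derivative */
                    gy += ky[dy + 1][dx + 1] * p;   /* vertical derivative   */
                }
            }
            int mag = abs(gx) + abs(gy);      /* cheap edge-strength estimate */
            out[y * w + x] = (unsigned char)(mag > 255 ? 255 : mag);
        }
    }
}
```

On a vertical step edge (columns 0, 0, 255, 255), both interior pixels next to the step saturate at 255, while a uniform image yields 0 everywhere.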
Message Passing Implementation
[Figure: tile layout with tiles labeled "Calc", "DRAM", "Scatter + Calc", and "Gather + Calc"]
- Tile 8 distributes “oversized” partitions
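The slide does not say why the partitions are "oversized". One likely reason, assumed here, is that each tile's strip of rows must include one extra halo row above and below, because a 3x3 kernel reads the neighboring rows. The C sketch below shows that bookkeeping for horizontal strips; the `partition_rows` helper, the `Partition` struct, and the strip layout are all hypothetical.

```c
/* Hypothetical row partition for one compute tile.
 * out_lo..out_hi: output rows this tile computes (half-open range).
 * in_lo..in_hi:   rows it must receive, i.e. the output rows plus one
 *                 halo row on each side, clamped to the image (assumption). */
typedef struct {
    int out_lo, out_hi;
    int in_lo, in_hi;
} Partition;

Partition partition_rows(int height, int ntiles, int t)
{
    Partition p;
    int base = height / ntiles;   /* rows every tile gets            */
    int extra = height % ntiles;  /* first `extra` tiles get one more */

    p.out_lo = t * base + (t < extra ? t : extra);
    p.out_hi = p.out_lo + base + (t < extra ? 1 : 0);

    /* Extend by one halo row above and below, staying inside the image. */
    p.in_lo = p.out_lo > 0 ? p.out_lo - 1 : 0;
    p.in_hi = p.out_hi < height ? p.out_hi + 1 : height;
    return p;
}
```

For a 10-row image split over 4 compute tiles, tile 1 computes output rows 3 to 5 but must receive rows 2 to 6, so every interior partition is two rows larger than the work it produces.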
Message Passing Results
- 177 cycles / pixel
- Memory copies severely hurt performance
  - Packaging/unpackaging messages is expensive
  - Alternative: send many messages instead, so nothing needs to be packaged
- rMPI status
  - Seemingly correct, but currently not optimized
  - Non-blocking receives will help tremendously for some applications
Shared Memory Implementation
Shared Memory Results
Streaming Implementation
[Figure: streaming pipeline; for each of the Red, Green, and Blue channels, a Row Sum node feeds Gx and Gy nodes, plus Pack and Sum Abs nodes]
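The slide gives only the node names. One plausible reading, assumed in the sketch below, relies on the Sobel kernels being separable: a Row Sum stage computes, within a single row, the smoothing a + 2b + c and the difference c - a over each 3-pixel window; Gx combines the differences of three consecutive rows with weights 1, 2, 1; Gy subtracts the smoothed value of the row above from that of the row below; and Sum Abs forms |Gx| + |Gy|. Under this reading only the Gx and Gy stages need rows of history. The C code runs the stages sequentially over one grayscale channel; the function names, the 3-slot ring buffer, and the zeroed borders are assumptions, and how the three color channels are combined before Pack is not stated on the slide.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed "Row Sum" stage: within one row, smoothing s = a + 2b + c and
 * difference d = c - a for each interior 3-pixel window. */
static void row_stage(const unsigned char *row, int w, int *s, int *d)
{
    for (int x = 1; x < w - 1; x++) {
        s[x] = row[x - 1] + 2 * row[x] + row[x + 1];
        d[x] = row[x + 1] - row[x - 1];
    }
}

/* Rows arrive one at a time; the current row plus 2 previous rows of
 * (s, d) are kept in a 3-slot ring buffer. Borders are left at 0. */
void sobel_streaming(const unsigned char *in, unsigned char *out, int w, int h)
{
    int *s = malloc(3 * (size_t)w * sizeof *s);
    int *d = malloc(3 * (size_t)w * sizeof *d);
    if (!s || !d) { free(s); free(d); return; }
    memset(out, 0, (size_t)w * h);

    for (int r = 0; r < h; r++) {
        int slot = r % 3;
        row_stage(in + (size_t)r * w, w, s + slot * w, d + slot * w);
        if (r < 2) continue;                 /* need 3 rows before any output */

        int *sp = s + ((r - 2) % 3) * w;     /* row r-2 (above the output row) */
        int *dp = d + ((r - 2) % 3) * w;
        int *dc = d + ((r - 1) % 3) * w;     /* row r-1 (the output row)       */
        int *sn = s + slot * w;              /* row r   (below the output row) */
        int *dn = d + slot * w;

        for (int x = 1; x < w - 1; x++) {
            int gx = dp[x] + 2 * dc[x] + dn[x];  /* Gx stage: weights 1, 2, 1 */
            int gy = sn[x] - sp[x];              /* Gy stage: below - above   */
            int mag = abs(gx) + abs(gy);         /* Sum Abs stage             */
            out[(size_t)(r - 1) * w + x] = (unsigned char)(mag > 255 ? 255 : mag);
        }
    }
    free(s);
    free(d);
}
```

On a horizontal ramp where every row is 0, 10, 20, 30, 40, each interior pixel gets |Gx| + |Gy| = 80 + 0 = 80, the same value a direct 3x3 convolution would give.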
Streaming Results
- 20 cycles / pixel
- Throughput limited by the Gx calculations
- Most nodes required no memory; Gx and Gy need 2 rows' worth
- Integer ops + abs(float)
- All routing done on static network 1
Conclusions
- Streaming is 4x slower than ideal
- Overhead of message passing (MP) and shared memory (SM) is detrimental

Method             Cycles / Pixel
Ideal              79 / 16 ≈ 5
Streaming          20
Shared Memory      539
Message Passing    177
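Taking the table's numbers at face value, the ideal figure and each method's slowdown relative to it work out as follows (the 16 presumably being the Raw chip's 16 tiles):

$$
\text{ideal} = \frac{79}{16} \approx 4.94 \ \text{cycles/pixel}
$$

$$
\frac{20}{79/16} \approx 4.05 \ (\text{Streaming}), \qquad
\frac{177}{79/16} \approx 35.8 \ (\text{Message Passing}), \qquad
\frac{539}{79/16} \approx 109 \ (\text{Shared Memory})
$$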