PARALLEL COMPUTATION FOR MATRIX MULTIPLICATION Presented By: Dima Ayash, Kelwin Payares, Tala Najem

Matrix Multiplication • Matrix operations such as matrix multiplication are used in almost all areas of scientific research. • Matrix multiplication appears in many fields, including scientific computing, pattern recognition, signal processing, and digital control, and in seemingly unrelated problems such as counting the paths through a graph (graph theory). It is a central operation in many numerical algorithms. • Many different algorithms have been designed for multiplying matrices on different types of hardware, including parallel and distributed systems, where the computational work is spread over multiple processors (perhaps over a network).

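C = A × B • For n × n matrices A and B, each element of C is $c_{i,j} = \sum_{k=0}^{n-1} a_{i,k} \, b_{k,j}$, for $0 \le i, j < n$.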

Sequential Algorithm

Sequential Algorithm • [Figure: matrices A, B, and C]
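The sequential algorithm computes each element of C as the dot product of one row of A with one column of B. The following is a minimal Python sketch of that triple loop, assuming matrices are stored as lists of rows. The function name matmul_sequential is illustrative and does not come from the slides.

```python
def matmul_sequential(a, b):
    """Multiply two matrices (lists of rows) with the classic triple loop."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):            # row of A and of C
        for j in range(cols):        # column of B and of C
            for k in range(inner):   # running index of the dot product
                c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_sequential(a, b))  # [[19, 22], [43, 50]]
```

For two n × n matrices this performs n³ multiplications and n³ additions, which is the work the parallel algorithms distribute across processors.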

Partitioning into Submatrices • The matrix is divided into s² submatrices. Each submatrix has n/s × n/s elements. Using the notation Ap,q for the submatrix in submatrix row p and submatrix column q:
    for (p = 0; p < s; p++)
      for (q = 0; q < s; q++) {
        Cp,q = 0;  /* clear elements of submatrix */
        for (r = 0; r < s; r++)  /* submatrix multiplication, added to the accumulating submatrix */
          Cp,q = Cp,q + Ap,r * Br,q;
      }
• The line Cp,q = Cp,q + Ap,r * Br,q means: multiply submatrices Ap,r and Br,q using matrix multiplication, and add the product to submatrix Cp,q using matrix addition.
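The submatrix loop above can be sketched in Python as follows. This is an illustration under stated assumptions: the matrices are square lists of rows, s divides n evenly, and the accumulation for each pair (p, q) runs over all s submatrices in a block row. The names matmul_blocked and bs (the block side length n/s) are hypothetical.

```python
def matmul_blocked(a, b, s):
    """Multiply n x n matrices by iterating over an s x s grid of submatrices."""
    n = len(a)
    if n % s != 0:
        raise ValueError("s must divide n")
    bs = n // s  # each submatrix has bs x bs elements
    c = [[0] * n for _ in range(n)]
    for p in range(s):          # submatrix row of C
        for q in range(s):      # submatrix column of C
            for r in range(s):  # C[p,q] = C[p,q] + A[p,r] * B[r,q]
                for i in range(p * bs, (p + 1) * bs):
                    for j in range(q * bs, (q + 1) * bs):
                        for k in range(r * bs, (r + 1) * bs):
                            c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_blocked(a, b, 2))  # [[19, 22], [43, 50]]
```

With s = 1 the block loops collapse and the function reduces to the plain sequential algorithm.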

Partitioning into Submatrices -cont

Cannon Algorithm • This is a memory-efficient algorithm. • Both n × n matrices A and B are partitioned among P processors.

Cannon Algorithm –Cont. 1. Initially, processor Pi,j has elements ai,j and bi,j (0 ≤ i < n, 0 ≤ j < n).

Cannon Algorithm –Cont. 2. Elements are moved from their initial position to an aligned position. The complete ith row of A is shifted i places left and the complete jth column of B is shifted j places upward.

Cannon Algorithm –Cont. 3. Each processor Pi,j multiplies its elements. 4. The ith row of A is shifted one place left, and the jth column of B is shifted one place upward.

Cannon Algorithm –Cont. 5. Each processor Pi,j multiplies the elements brought to it and adds the result to the accumulating sum. 6. Steps 4 and 5 are repeated until the final result is obtained (n-1 shifts with n rows and n columns of elements).
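Steps 1 to 6 can be simulated on a single machine to check that the shifting scheme really produces C = A × B. The sketch below assumes one matrix element per virtual processor and shifts that wrap around cyclically. It is not an MPI implementation, and the function name cannon is illustrative.

```python
def cannon(a, b):
    """Simulate Cannon's algorithm with one element per virtual processor."""
    n = len(a)
    # Step 2: row i of A moves i places left, column j of B moves j places up.
    a = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for step in range(n):
        # Steps 3 and 5: each processor multiplies the pair it currently holds
        # and adds the product to its accumulating sum.
        for i in range(n):
            for j in range(n):
                c[i][j] += a[i][j] * b[i][j]
        if step < n - 1:
            # Step 4: every row of A moves one place left,
            # every column of B moves one place up.
            a = [row[1:] + row[:1] for row in a]
            b = b[1:] + b[:1]
    return c


print(cannon([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The loop performs n multiply-accumulate rounds separated by n-1 shifts, matching the count given in step 6.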

Cannon Algorithm –Cont. • Matrix A, from its initial position: Row 0 is unchanged. Row 1 is shifted 1 place left. Row 2 is shifted 2 places left. Row 3 is shifted 3 places left. • Matrix B, from its initial position: Column 0 is unchanged. Column 1 is shifted 1 place up. Column 2 is shifted 2 places up. Column 3 is shifted 3 places up.
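The alignment of a 4 × 4 grid can be traced with symbolic labels instead of numbers. This is a small sketch that assumes the shifts wrap around cyclically; the helper names align_rows_left and align_columns_up are hypothetical.

```python
def align_rows_left(m):
    """Cyclically shift row i of m by i places to the left."""
    return [row[i:] + row[:i] for i, row in enumerate(m)]


def align_columns_up(m):
    """Cyclically shift column j of m by j places upward."""
    n = len(m)
    return [[m[(i + j) % n][j] for j in range(len(m[0]))] for i in range(n)]


n = 4
a = [[f"a{i}{j}" for j in range(n)] for i in range(n)]
b = [[f"b{i}{j}" for j in range(n)] for i in range(n)]

for row in align_rows_left(a):
    print(" ".join(row))
# a00 a01 a02 a03
# a11 a12 a13 a10
# a22 a23 a20 a21
# a33 a30 a31 a32

for row in align_columns_up(b):
    print(" ".join(row))
# b00 b11 b22 b33
# b10 b21 b32 b03
# b20 b31 b02 b13
# b30 b01 b12 b23
```

After alignment, the element of A and the element of B at each grid position share the same inner index (for example a12 and b21 at row 1, column 1), so every processor starts with a pair it can multiply directly.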

Cannon Algorithm –Cont.

Cannon Algorithm – Step 1

Cannon Algorithm – Step 2

Cannon Algorithm – Step 3

Cannon Algorithm – Step 4

Fox Algorithm

Fox Algorithm –Cont.

Fox Algorithm –Step 1 • Initially broadcast the diagonal elements of A

Fox Algorithm –Step 2 • Broadcast the next element of A along each row, shift B along its columns, and perform the multiplication.

Fox Algorithm –Step 3 • Broadcast the next element of A along each row, shift B along its columns, and perform the multiplication.

Fox Algorithm –Step 4 • Broadcast the next element of A along each row, shift B along its columns, and perform the multiplication.

Fox Algorithm –Conclusion • Shifting is over. Stop the iteration. • Conclusion: The Fox algorithm is a memory-efficient method. • Its communication overhead is higher than that of the Cannon algorithm.
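The broadcast, multiply, and shift steps can be simulated sequentially to see why they add up to the full product. The sketch below assumes one element per virtual processor, a broadcast that starts on the diagonal and moves one column to the right (cyclically) at each step, and an upward cyclic shift of B. The function name fox is illustrative, and no real message passing takes place.

```python
def fox(a, b):
    """Simulate the Fox algorithm with one element per virtual processor."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for step in range(n):
        for i in range(n):
            # Element broadcast along row i: the diagonal a[i][i] at step 0,
            # then the next element to its right (wrapping) on each later step.
            k = (i + step) % n
            pivot = a[i][k]
            for j in range(n):
                # Processor (i, j) multiplies the broadcast value by the
                # element of B it currently holds and accumulates.
                c[i][j] += pivot * b[i][j]
        # Shift every column of B one place up; after n shifts B is back
        # in its starting layout.
        b = b[1:] + b[:1]
    return c


print(fox([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Unlike Cannon's algorithm, only B is shifted here; A stays in place and is distributed by a row broadcast at every step, which is where the additional communication comes from.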

Parallel Algorithm for Dense Matrix Multiplication

Parallel Algorithm for Dense Matrix Multiplication –Cont.

Example • Matrices to be multiplied:

Example –Cont. • These matrices are divided into 4 square blocks as follows:

Example –Cont. • Matrices A and B after the initial alignment.

Example –Cont. • Local matrix multiplication

Example –Cont. • Shift A one step to the left and B one step up.

Example –Cont. • Local matrix multiplication.
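The sequence in this example (divide into square blocks, align, multiply locally, shift, multiply locally again) can be reproduced with a block version of the shifting scheme. The sketch below is a sequential simulation on a q × q grid of virtual processors. The matrices are illustrative values rather than the ones from the original slides, and the names split, block_matmul_add, and cannon_blocks are hypothetical.

```python
def split(m, q):
    """Split an n x n matrix into a q x q grid of square blocks."""
    bs = len(m) // q
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(q)] for i in range(q)]


def block_matmul_add(c, a, b):
    """Local step: c += a * b for square blocks stored as lists of rows."""
    size = len(a)
    for i in range(size):
        for j in range(size):
            for k in range(size):
                c[i][j] += a[i][k] * b[k][j]


def cannon_blocks(a, b, q):
    """Block-wise shifting scheme on a q x q grid (q must divide n)."""
    n = len(a)
    bs = n // q
    a_blk, b_blk = split(a, q), split(b, q)
    # Initial alignment: block row i of A moves i steps left,
    # block column j of B moves j steps up.
    a_blk = [[a_blk[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_blk = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c_blk = [[[[0] * bs for _ in range(bs)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                block_matmul_add(c_blk[i][j], a_blk[i][j], b_blk[i][j])
        a_blk = [row[1:] + row[:1] for row in a_blk]  # shift A one step left
        b_blk = b_blk[1:] + b_blk[:1]                 # shift B one step up
    # Reassemble the blocks of C into one n x n matrix.
    return [[c_blk[i // bs][j // bs][i % bs][j % bs] for j in range(n)]
            for i in range(n)]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
print(cannon_blocks(a, identity, 2) == a)  # True
```

With q = 2 the loop runs two local multiplications separated by one shift, which is the same sequence of slides shown in the example.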

Thank you