Systolic Array HW Enrique Montealegre CDA 4150 Computer Architecture Dr. Montagne Fall 2005

Problem
Using the linear array explained in class, create a PowerPoint presentation to show all the steps to execute two matrix-vector products simultaneously, say: y = Ax and w = Bz. The matrices must be 3 x 3 matrices and the systolic array must have only 5 processing elements.
a) How many steps are necessary to carry out the simultaneous computations?
b) Assuming that the input vector is called the carrier vector, where in the carrier vector are the elements of x stored?
c) Where in the output vector are the elements of w stored?

The matrices

Matrix Multiplication
To calculate Y we follow this formula:
y1 = a11*x1 + a12*x2 + a13*x3
y2 = a21*x1 + a22*x2 + a23*x3
y3 = a31*x1 + a32*x2 + a33*x3
To calculate W we follow this formula:
w1 = b11*z1 + b12*z2 + b13*z3
w2 = b21*z1 + b22*z2 + b23*z3
w3 = b31*z1 + b32*z2 + b33*z3
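As a reference for checking the systolic trace, the two products can be computed directly from the formulas above. This is a minimal sketch; the function name `matvec` and the sample values are illustrative assumptions, not part of the original slides.

```python
def matvec(M, v):
    """Plain matrix-vector product: result[i] = sum over j of M[i][j] * v[j]."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Sample 3 x 3 data (assumed values, chosen only for illustration).
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, 2]
B = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
z = [1, 2, 3]

y = matvec(A, x)  # y1, y2, y3
w = matvec(B, z)  # w1, w2, w3
```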

Processing Unit
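The slide's diagram is not available in this text, so the following is only a sketch of what one processing unit is assumed to do, based on the partial sums shown in the later steps: it adds one product to the passing output value and forwards the carrier value unchanged. The name `pe_step` and its parameters are hypothetical.

```python
def pe_step(y_in, x_in, coeff):
    """One processing unit for one step (assumed behavior).

    y_in  : partial sum arriving from the neighbor on one side
    x_in  : carrier element arriving from the neighbor on the other side
    coeff : matrix element fed to this unit at this step
    Returns the updated partial sum and the forwarded carrier element.
    """
    y_out = y_in + coeff * x_in
    x_out = x_in
    return y_out, x_out

# y1 builds up over three consecutive visits: a11*x1, then a12*x2, then a13*x3.
a_row = [1, 2, 3]   # assumed values for a11, a12, a13
x_vec = [4, 5, 6]   # assumed values for x1, x2, x3
y1 = 0
for a, x in zip(a_row, x_vec):
    y1, _ = pe_step(y1, x, a)
```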

Systolic array with 5 processing Units

Set Matrix A
Figure labels: a33 a23 a13 a32 a22 a12 a31 a21 a11 y1 x3 x2 x1 y2 y3

Set Matrix B
Figure labels: b33 b23 a33 b13 a23 b22 a13 b12 a22 b21 b11 a21 a12 b32 a32 b31 a11 y1 w1 y2 z3 x3 z2 x2 z1 x1 w2 y3 w3
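The figure labels suggest that the elements of x and z alternate in the input vector, and that y and w alternate in the output vector. A minimal sketch of that interleaving follows, listing elements in the order they are assumed to enter the array (x1 first); the helper name `interleave` is hypothetical.

```python
def interleave(first, second):
    """Alternate the elements of two equal-length lists: f1, s1, f2, s2, ..."""
    return [v for pair in zip(first, second) for v in pair]

# Carrier (input) vector and output vector, in order of entry into the array.
carrier = interleave(["x1", "x2", "x3"], ["z1", "z2", "z3"])
output = interleave(["y1", "y2", "y3"], ["w1", "w2", "w3"])
```

Under this assumption, x occupies positions 1, 3, 5 of the carrier vector and w occupies positions 2, 4, 6 of the output vector.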

T0
Figure labels: b33 b23 a33 b13 a23 b22 a13 b12 a22 b21 b11 a21 a12 b32 a32 b31 a11 w1 y2 y1 z3 x3 z2 x2 z1 x1 w2 y3 w3

T1
Figure labels: b33 b23 a33 b13 a23 b22 a13 b12 a22 b21 b11 a21 a12 b32 a32 b31 a11 y2 y1 z3 x3 z2 x2 z1 x1 w2 y3 w3

T2
Figure labels: b33 b23 a33 b13 a23 b22 a13 b12 a22 a12 b32 a32 b31 b21 b11 a31 a21 w2 y1 a11 z3 x3 z2 x2 z1 x1 w1 y2 y3 w3
y1 = a11*x1

T3
Figure labels: b33 b23 a33 b13 a23 b22 a13 b12 a22 b32 a32 b31 b21 a31 y3 y1 w1 a12 z3 x3 z2 x2 y2 b11 z1 a21 x1 w2 w3
y1 = a11*x1 + a12*x2
y2 = a21*x1
w1 = b11*z1

T4
Figure labels: b33 b13 b23 a33 a23 b22 b32 a32 b31 w3 y1 w1 a13 z3 x3 y2 b12 z2 w2 a22 x2 y3 b21 z1 a31 x1
y1 = a11*x1 + a12*x2 + a13*x3
w1 = b11*z1 + b12*z2
y2 = a21*x1 + a22*x2
w2 = b21*z1
y3 = a31*x1

T5
Figure labels: b33 y1 a33 b23 w1 y2 b13 z3 w2 a23 x3 b32 y3 b22 z2 w3 a32 x2 b31 z1
y1 = in output vector
w1 = b11*z1 + b12*z2 + b13*z3
y2 = a21*x1 + a22*x2 + a23*x3
w2 = b21*z1 + b22*z2
y3 = a31*x1 + a32*x2
w3 = b31*z1

T6
Figure labels: y1 w1 b33 y2 w2 y3 b23 z3 w3 a33 x3 b32 z2 x2
y1 = in output vector
w1 = in output vector
y2 = complete answer
w2 = b21*z1 + b22*z2 + b23*z3
y3 = a31*x1 + a32*x2 + a33*x3
w3 = b31*z1 + b32*z2

T7
Figure labels: y1 w1 y2 w2 y3 w3 b33 z3 x3 z2
y1 = in output vector
w1 = in output vector
y2 = in output vector
w2 = complete answer
y3 = complete answer
w3 = b31*z1 + b32*z2 + b33*z3

T8
Figure labels: y1 w1 y2 w2 y3 w3 z3 x3
y1 = in output vector
w1 = in output vector
y2 = in output vector
w2 = in output vector
y3 = complete answer
w3 = complete answer

T9
Figure labels: y1 w1 y2 w2 y3 w3 z3
y1 = in output vector
w1 = in output vector
y2 = in output vector
w2 = in output vector
y3 = in output vector
w3 = complete answer

T10
Figure labels: y1 w1 y2 w2 y3 w3
y1 = in output vector
w1 = in output vector
y2 = in output vector
w2 = in output vector
y3 = in output vector
w3 = in output vector
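The steps T0 through T10 can be replayed with a short simulation. This is a sketch under stated assumptions rather than the method used in class: the carrier vector (x1 z1 x2 z2 x3 z3) is assumed to enter at PE1 at T0 and move one unit right per step, the output slots (y1 w1 y2 w2 y3 w3) are assumed to enter at PE5 at T0 and move one unit left per step, and the matching matrix element is assumed to arrive whenever a slot and a carrier element share a unit. These assumptions reproduce the partial sums listed for T2 through T7. The name `simulate` is hypothetical.

```python
def simulate(A, x, B, z, n_pe=5):
    """Replay the 5-unit linear array for y = Ax and w = Bz (3 x 3 case)."""
    n = len(x)
    assert n_pe == 2 * n - 1, "sketch assumes 2n - 1 processing units"
    carrier = [v for pair in zip(x, z) for v in pair]  # x1 z1 x2 z2 x3 z3
    out = [0] * (2 * n)                                # y1 w1 y2 w2 y3 w3
    t = 0
    while True:
        for m in range(2 * n):
            p = (n_pe - 1) - t + m   # unit index (0-based) holding output slot m
            k = t - p                # carrier element sitting in that same unit
            if 0 <= p < n_pe and 0 <= k < 2 * n:
                # With an odd number of units, k and m always share parity,
                # so y slots only meet x elements and w slots only meet z.
                i, j = m // 2, k // 2
                M = A if m % 2 == 0 else B
                out[m] += M[i][j] * carrier[k]
        if (n_pe - 1) - t + (2 * n - 1) < 0:  # last slot (w3) has left PE1
            break
        t += 1
    return out[0::2], out[1::2], t + 1       # y, w, number of steps

# Assumed sample data, for illustration only.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[2, 0, 0], [0, 2, 0], [0, 0, 2]]
y, w, steps = simulate(A, [1, 1, 1], B, [1, 2, 3])
```

With these assumptions the loop ends at T10, which gives 11 steps counting T0.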

Answers
a) How many steps are necessary to carry out the simultaneous computations?
It took 11 steps (T0 through T10).
b) Assuming that the input vector is called the carrier vector, where in the carrier vector are the elements of x stored?
Figure labels: x3 x2 x1 (input vector)
c) Where in the output vector are the elements of w stored?
Figure labels: w1 w2 w3 (output vector)