Slides from Shaaban

Systolic Architectures
• Replace a single processor with an array of regular processing elements (PEs)
• Orchestrate data flow for high throughput with less memory access
• Different from pipelining:
  – Nonlinear array structure, multidirectional data flow, each PE may have (small) local instruction and data memory
• Different from SIMD: each PE may do something different
• Initial motivation: VLSI enables inexpensive special-purpose chips
• Represent algorithms directly by chips connected in a regular pattern

EECC 756 - Shaaban   lec # 1   Spring 2003   3-11-2003
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication
• Processors arranged in a 2-D grid
• Each processor accumulates one element of the product
• Rows of A and columns of B are fed into the array, aligned (staggered) in time

T=0
[Figure: 3 x 3 processor grid. The columns of B (b0,0 b1,0 b2,0 / b0,1 b1,1 b2,1 / b0,2 b1,2 b2,2) and the rows of A are queued at the array edges, with element 0 of each queue nearest the array. No products have been computed yet.]

Example source: http://www.cs.hmc.edu/courses/2001/spring/cs156/
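The "alignments in time" can be made concrete with a small sketch of the feed schedule. This is an illustration only: the function name `feed_schedule`, the choice that rows of A enter at the left edge and columns of B at the top edge, and the one-step delay per row and per column are assumptions chosen to match the partial sums shown in the example, not something the slides state explicitly.

```python
def feed_schedule(n=3):
    """Return {T: (a_in, b_in)} for an n x n array (hypothetical helper).

    a_in maps row i    -> k, meaning a[i][k] enters the left edge of row i at step T.
    b_in maps column j -> k, meaning b[k][j] enters the top edge of column j at step T.
    Assumption: row i and column j are each delayed by i (resp. j) steps.
    """
    schedule = {}
    # The last edge input (a[n-1][n-1] and b[n-1][n-1]) enters at T = 2n - 1.
    for t in range(1, 2 * n):
        a_in = {i: t - 1 - i for i in range(n) if 0 <= t - 1 - i < n}
        b_in = {j: t - 1 - j for j in range(n) if 0 <= t - 1 - j < n}
        schedule[t] = (a_in, b_in)
    return schedule


if __name__ == "__main__":
    for t, (a_in, b_in) in feed_schedule().items():
        a_txt = ", ".join(f"a{i},{k}" for i, k in a_in.items())
        b_txt = ", ".join(f"b{k},{j}" for j, k in b_in.items())
        print(f"T={t}: A edge <- {a_txt} | B edge <- {b_txt}")
```

Under these assumptions, a0,0 and b0,0 meet in the top-left processor at T=1, and each later row or column starts one step after its neighbour.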
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication

T=1
Partial sums by processor (row, column); processors not listed have accumulated nothing yet:
  (0,0): a0,0*b0,0
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication

T=2
Partial sums by processor (row, column); processors not listed have accumulated nothing yet:
  (0,0): a0,0*b0,0 + a0,1*b1,0
  (0,1): a0,0*b0,1
  (1,0): a1,0*b0,0
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication

T=3
Partial sums by processor (row, column); processors not listed have accumulated nothing yet:
  (0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  (0,1): a0,0*b0,1 + a0,1*b1,1
  (0,2): a0,0*b0,2
  (1,0): a1,0*b0,0 + a1,1*b1,0
  (1,1): a1,0*b0,1
  (2,0): a2,0*b0,0
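The pattern in these time steps can be summarised by a simple rule: the processor in row i, column j adds its k-th term at step T = i + j + k + 1. The sketch below encodes that rule so the partial sums in the example can be checked by hand. The names `partial_sum_terms` and `render` are hypothetical, and the rule itself is inferred from the example rather than stated on the slides.

```python
def partial_sum_terms(i, j, t, n=3):
    """Indices k of the terms a[i][k]*b[k][j] accumulated by processor (i, j) after step t.

    Inferred rule: term k is added at step i + j + k + 1, so after step t
    the processor holds terms 0 .. min(n - 1, t - 1 - i - j).
    """
    last_k = min(n - 1, t - 1 - i - j)
    return list(range(0, last_k + 1))  # empty list if the processor has not started


def render(i, j, t, n=3):
    """Format the partial sum in the notation used by the example."""
    terms = [f"a{i},{k}*b{k},{j}" for k in partial_sum_terms(i, j, t, n)]
    return " + ".join(terms)


if __name__ == "__main__":
    for i in range(3):
        for j in range(3):
            print(f"T=3 ({i},{j}): {render(i, j, 3) or '(nothing yet)'}")
```

With this rule, processor (i, j) holds its complete sum after step i + j + n, so the last processor (2,2) of a 3 x 3 array finishes at T=7.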
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication

T=4
Partial sums by processor (row, column); processors not listed have accumulated nothing yet:
  (0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  (0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  (0,2): a0,0*b0,2 + a0,1*b1,2
  (1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  (1,1): a1,0*b0,1 + a1,1*b1,1
  (1,2): a1,0*b0,2
  (2,0): a2,0*b0,0 + a2,1*b1,0
  (2,1): a2,0*b0,1
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication

T=5
Partial sums by processor (row, column):
  (0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  (0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  (0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
  (1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  (1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
  (1,2): a1,0*b0,2 + a1,1*b1,2
  (2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
  (2,1): a2,0*b0,1 + a2,1*b1,1
  (2,2): a2,0*b0,2
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication

T=6
Partial sums by processor (row, column):
  (0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  (0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  (0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
  (1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  (1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
  (1,2): a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2
  (2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
  (2,1): a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1
  (2,2): a2,0*b0,2 + a2,1*b1,2
Systolic Array Example: 3 x 3 Systolic Array Matrix Multiplication

T=7 (Done)
Every processor now holds its complete element of the product, indexed (row, column):
  (0,0): a0,0*b0,0 + a0,1*b1,0 + a0,2*b2,0
  (0,1): a0,0*b0,1 + a0,1*b1,1 + a0,2*b2,1
  (0,2): a0,0*b0,2 + a0,1*b1,2 + a0,2*b2,2
  (1,0): a1,0*b0,0 + a1,1*b1,0 + a1,2*b2,0
  (1,1): a1,0*b0,1 + a1,1*b1,1 + a1,2*b2,1
  (1,2): a1,0*b0,2 + a1,1*b1,2 + a1,2*b2,2
  (2,0): a2,0*b0,0 + a2,1*b1,0 + a2,2*b2,0
  (2,1): a2,0*b0,1 + a2,1*b1,1 + a2,2*b2,1
  (2,2): a2,0*b0,2 + a2,1*b1,2 + a2,2*b2,2
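The whole walk-through can be reproduced with a short step-by-step simulation. This is a minimal sketch under stated assumptions, not the slides' own implementation: the function name `systolic_matmul`, the register arrays, and the flow directions (A operands move right, B operands move down, one processor per step) are illustrative choices that reproduce the partial sums shown in the example. It handles square matrices of equal size only.

```python
def systolic_matmul(A, B):
    """Simulate an n x n array in which each processor accumulates one product element.

    Assumptions (not from the slides): A operands enter at the left edge and
    shift right; B operands enter at the top edge and shift down; row i and
    column j are fed with a delay of i (resp. j) steps.
    Returns (product, number_of_steps).
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]       # one accumulator per processor
    a_reg = [[None] * n for _ in range(n)]  # A operand currently held by each processor
    b_reg = [[None] * n for _ in range(n)]  # B operand currently held by each processor
    steps = 3 * n - 2                       # processor (n-1, n-1) finishes last

    for t in range(1, steps + 1):
        new_a = [[None] * n for _ in range(n)]
        new_b = [[None] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:   # left edge: feed row i of A, skewed by i steps
                    k = t - 1 - i
                    new_a[i][j] = A[i][k] if 0 <= k < n else None
                else:        # interior: take the operand from the left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:   # top edge: feed column j of B, skewed by j steps
                    k = t - 1 - j
                    new_b[i][j] = B[k][j] if 0 <= k < n else None
                else:        # interior: take the operand from the neighbour above
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Each processor multiplies the pair it holds and accumulates locally.
        for i in range(n):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, steps


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    C, steps = systolic_matmul(A, B)
    print(steps, C)
```

For the 3 x 3 case the simulation runs for 3n - 2 = 7 steps, matching the T=7 "Done" state. Each processor only ever reads from its left and upper neighbours, which is the reduced memory access the systolic design aims for.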