Targeting Multi-Core Systems in Linear Algebra Applications
Presented by Jack Dongarra, Alfredo Buttari, Jakub Kurzak
Multithreading LA operations: LU Factorization
· Identify performance bottlenecks: BLAS-2 operations cannot be efficiently parallelized
· Define the dependencies between subtasks
· Hide the execution of serial subtasks by means of lookahead
Multithreading LA operations: LU Factorization
Linear algebra operations can be represented as Directed Acyclic Graphs (DAGs), in which the nodes represent the tasks into which the operation is decomposed and the edges represent the dependencies among them.
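As a rough illustration, the DAG of a blocked right-looking LU can be written down explicitly. The sketch below assumes a factorization by block columns and uses a hypothetical task naming: ("P", k) factorizes panel k, and ("U", k, j) applies the transformations of panel k to block column j. It relies on Python's standard `graphlib` module to group the tasks into waves whose dependencies are all satisfied.

```python
from graphlib import TopologicalSorter

def lu_task_dag(nb):
    """Dependency graph of a right-looking blocked LU on nb block columns.

    ("P", k):    factorize panel (block column) k.
    ("U", k, j): apply panel k's transformations to block column j > k.
    """
    deps = {}
    for k in range(nb):
        # Panel k needs block column k updated by step k-1.
        deps[("P", k)] = {("U", k - 1, k)} if k > 0 else set()
        for j in range(k + 1, nb):
            d = {("P", k)}
            if k > 0:
                d.add(("U", k - 1, j))  # column j must already carry step k-1's update
            deps[("U", k, j)] = d
    return deps

def parallel_waves(deps):
    """Group tasks into waves: every task's dependencies lie in earlier waves."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves

for wave in parallel_waves(lu_task_dag(4)):
    print(wave)
```

In this fork-join view every wave holds either a single panel or a set of independent updates, so the serial panel factorization sits on the critical path of each step.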
Multithreading LA operations: LU Factorization
The factorization of the next panel can be overlapped with the update of the trailing submatrix (lookahead).
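A minimal serial sketch of lookahead-1 follows. It assumes no pivoting (so it is only safe for matrices such as diagonally dominant ones) and square matrices stored as Python lists; the function names are hypothetical. At step k, block column k+1 is updated first and panel k+1 is factorized immediately, before the rest of the trailing update with panel k. In a multithreaded code the panel would run concurrently with that remaining update; here the returned log only records the order of the tasks.

```python
def factor_panel(A, c0, c1):
    """Unblocked LU without pivoting of the tall panel A[c0:, c0:c1], in place."""
    n = len(A)
    for p in range(c0, c1):
        for i in range(p + 1, n):
            A[i][p] /= A[p][p]             # multiplier l_ip, stored below the diagonal
            for jj in range(p + 1, c1):    # eliminate within the panel only
                A[i][jj] -= A[i][p] * A[p][jj]

def update_block(A, c0, c1, d0, d1):
    """Apply the eliminations of panel [c0, c1) to block column [d0, d1).
    Rows c0..c1-1 receive the triangular solve, rows from c1 down the matrix update."""
    n = len(A)
    for p in range(c0, c1):
        for i in range(p + 1, n):
            for jj in range(d0, d1):
                A[i][jj] -= A[i][p] * A[p][jj]

def lu_lookahead(A, b):
    """Blocked LU with lookahead-1 and block size b; returns the task order."""
    n = len(A)
    cols = [(s, min(s + b, n)) for s in range(0, n, b)]
    log = []
    factor_panel(A, *cols[0])
    log.append("P0")
    for k in range(len(cols)):
        if k + 1 < len(cols):
            # Lookahead: bring block column k+1 up to date and factorize it right away.
            update_block(A, *cols[k], *cols[k + 1])
            log.append(f"U{k},{k + 1}")
            factor_panel(A, *cols[k + 1])
            log.append(f"P{k + 1}")
        # Remaining trailing update; a threaded code would overlap it with the panel above.
        for j in range(k + 2, len(cols)):
            update_block(A, *cols[k], *cols[j])
            log.append(f"U{k},{j}")
    return log

def lu_residual(A0, LU):
    """max |(L U - A0)_ij|, with unit lower L and upper U packed in LU."""
    n = len(A0)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            s = sum(LU[i][p] * LU[p][j] for p in range(min(i, j + 1)))
            s += LU[i][j] if i <= j else 0.0
            worst = max(worst, abs(s - A0[i][j]))
    return worst

n, b = 6, 2
A0 = [[1.0 / (i + j + 1) + (n if i == j else 0.0) for j in range(n)] for i in range(n)]
A = [row[:] for row in A0]
print(lu_lookahead(A, b))
print(lu_residual(A0, A))
```

With three block columns the log reads P0, U0,1, P1, U0,2, U1,2, P2: panel 1 is factorized before the update of block column 2 with panel 0, which is the work it can overlap.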
Multithreading LA operations: LU Factorization
Adaptive lookahead can avoid stalls thanks to dynamic scheduling of the subtasks: rather than being fixed in advance, the lookahead depth follows from which tasks are ready to run.
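The effect of dynamic scheduling can be estimated with a toy simulation. The sketch below assumes made-up task costs (a panel costs 2 time units, a block-column update 1), a DAG by block columns, and a hypothetical priority rule that always starts the ready task touching the leftmost block column, which pulls each panel forward as soon as its inputs are ready. It compares the resulting makespan with a fork-join schedule that waits for all updates of step k before starting panel k+1.

```python
import heapq

def lu_dag(nb):
    """Block-column LU DAG. Every task is (kind, step, target_column):
    ("P", k, k) factorizes panel k, ("U", k, j) updates block column j with panel k."""
    deps = {}
    for k in range(nb):
        deps[("P", k, k)] = {("U", k - 1, k)} if k > 0 else set()
        for j in range(k + 1, nb):
            prev = {("U", k - 1, j)} if k > 0 else set()
            deps[("U", k, j)] = {("P", k, k)} | prev
    return deps

def simulate_dynamic(nb, workers, cost):
    """Event-driven list scheduling of the DAG; returns the makespan."""
    deps = lu_dag(nb)
    remaining = {t: set(d) for t, d in deps.items()}
    succs = {t: [] for t in deps}
    for t, d in deps.items():
        for s in d:
            succs[s].append(t)
    # Priority: leftmost target column first, then earliest step.
    ready = [(t[2], t[1], t) for t, d in remaining.items() if not d]
    heapq.heapify(ready)
    running = []  # heap of (finish_time, task)
    time, idle = 0, workers
    while ready or running:
        while ready and idle:  # hand ready tasks to idle workers
            _, _, t = heapq.heappop(ready)
            heapq.heappush(running, (time + cost[t[0]], t))
            idle -= 1
        time, t = heapq.heappop(running)  # advance to the next completion
        idle += 1
        for s in succs[t]:
            remaining[s].discard(t)
            if not remaining[s]:
                heapq.heappush(ready, (s[2], s[1], s))
    return time

def simulate_barrier(nb, workers, cost):
    """Fork-join: panel k on one thread, then all updates of step k, then a barrier."""
    time = 0
    for k in range(nb):
        time += cost["P"]
        updates = nb - 1 - k
        time += -(-updates // workers) * cost["U"]  # ceil(updates / workers) rounds
    return time

cost = {"P": 2, "U": 1}
print(simulate_dynamic(4, 2, cost), simulate_barrier(4, 2, cost))
```

With 4 block columns and 2 threads this gives 11 time units for the dynamic schedule against 12 for the fork-join one. The gap depends entirely on the assumed costs and is not a measurement.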
Multithreading LA operations: One-Sided Transformations
Multithreading LA operations: Hessenberg Reduction
Only lookahead-1 is possible in the Hessenberg reduction.
Multithreading LA operations: Hessenberg Reduction
Two-sided transformations such as the Hessenberg reduction suffer from two main problems:
· Much stricter dependencies, which almost prevent overlapping
· An extremely high panel cost, since every column of the panel requires a matrix-vector product (a BLAS-2 operation) with the whole trailing submatrix
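The stricter dependencies can be seen in an unblocked Householder reduction to Hessenberg form. Each step applies the reflector from both sides, and the right-hand application mixes all trailing columns in every row, so even the next column to be reduced depends on the complete update of the previous step. This is a plain sketch for illustration under that assumption; it is not the blocked algorithm used by LAPACK's DGEHRD.

```python
import math

def hessenberg(A):
    """Reduce the square matrix A (list of lists) in place to upper Hessenberg form.

    Each step k applies a Householder reflector P_k from the left and from the right,
    so A <- P_k A P_k is an orthogonal similarity and the eigenvalues are preserved.
    """
    n = len(A)
    for k in range(n - 2):
        # Householder vector v for x = A[k+1:, k], chosen so that P x = alpha * e1.
        x = [A[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already has zeros below the subdiagonal
        alpha = -norm_x if x[0] >= 0 else norm_x  # sign chosen to avoid cancellation
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        m = len(v)
        # Left update P A: only rows k+1..n-1 change; columns < k are already zero there.
        for j in range(k, n):
            f = 2.0 * sum(v[i] * A[k + 1 + i][j] for i in range(m)) / vtv
            for i in range(m):
                A[k + 1 + i][j] -= f * v[i]
        # Right update (P A) P: columns k+1..n-1 change in EVERY row, including the
        # next column to be reduced, so step k+1 must wait for all of step k.
        for i in range(n):
            f = 2.0 * sum(A[i][k + 1 + j] * v[j] for j in range(m)) / vtv
            for j in range(m):
                A[i][k + 1 + j] -= f * v[j]
    return A

A = [[4.0, 1.0, 2.0, 3.0], [1.0, 3.0, 0.0, 1.0], [2.0, 0.0, 2.0, 1.0], [3.0, 1.0, 1.0, 5.0]]
for row in hessenberg(A):
    print([round(t, 6) for t in row])
```

By contrast, in LU the next panel only needs the update coming from the current panel, which is what makes deeper lookahead possible there.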
Multithreading LA operations: Hessenberg Reduction
At each step only a slight improvement can be achieved, and the cost of the panel cannot be completely hidden.
Contacts
Jack Dongarra, Alfredo Buttari, Jakub Kurzak