Automatically Tuned Linear Algebra Software (ATLAS)
R. Clint Whaley
University of Tennessee
www.netlib.org/atlas

What is ATLAS
- A package that adapts to differing architectures via AEOS techniques
  - Initially, supply BLAS
- Automated Empirical Optimization of Software (AEOS)
  - Machine searches the optimization space
  - Finds the application-apparent architecture
- AEOS requires:
  - Method of code variation
    - Parameterization
    - Multiple implementations
    - Code generation
  - Sophisticated timers
  - Robust search heuristic
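
The AEOS ingredients listed above (a parameterized kernel, a timer, and a search) can be illustrated with a small sketch. This is not ATLAS code: the function names, the candidate block sizes, and the exhaustive search are all assumptions made for illustration, and a real search heuristic would prune the space rather than time every candidate.

```python
import time


def blocked_matmul(a, b, n, nb):
    """Multiply two n x n matrices (lists of lists) using nb x nb blocks.

    `nb` is the tunable parameter: the "code variation" in this sketch.
    """
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, nb):
        for k0 in range(0, n, nb):
            for j0 in range(0, n, nb):
                for i in range(i0, min(i0 + nb, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(k0, min(k0 + nb, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + nb, n)):
                            row_c[j] += aik * row_b[j]
    return c


def time_block_size(nb, a, b, n, repeats=3):
    """Timer: best wall-clock time over `repeats` runs for one block size."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        blocked_matmul(a, b, n, nb)
        best = min(best, time.perf_counter() - start)
    return best


def search_block_size(n=48, candidates=(4, 8, 16, 24, 48)):
    """Search: time every candidate and keep the fastest (hypothetical values)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {nb: time_block_size(nb, a, b, n) for nb in candidates}
    best_nb = min(timings, key=timings.get)
    return best_nb, timings
```

Calling `search_block_size()` returns the fastest block size on the current machine together with the measured timings. The result depends on the machine, which is the point of an empirical search.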

Why ATLAS is needed
- BLAS require many man-hours per platform
  - Only done if financial incentive is there
    - Many platforms will never have an optimal version
  - Lags behind hardware
  - May not be affordable by everyone
  - Improves vendor code
- Allows for portably optimal codes
  - Obsolescence insurance
- Operations may be important, but not general enough for a standard

ATLAS Software
- Currently provided
  - Full BLAS (C & F77)
    - Level 3 BLAS
      - Generated GEMM: 1-2 hours install time per precision
      - Recursive GEMM-based L3 BLAS (Antoine Petitet)
    - Level 2 BLAS
      - GEMV & GER kernels
    - Level 1 BLAS
  - Some LAPACK
    - LU, LL^T
- Coming soon
  - pthread support
  - Open source kernels
    - SSE & 3DNOW!
    - GOTO ev5/6 BLAS
  - Performance for banded and packed
  - More LAPACK
- Coming not-so-soon
  - Sparse support
  - User customization

Algorithmic Approach for Matrix Multiply
- Only generated code is on-chip multiply
- All BLAS operations written in terms of generated on-chip multiply
- All transpose cases coerced through data copy to 1 case of on-chip multiply
  - Only 1 case generated per platform
[Figure: C (M x N) = A (M x K) * B (K x N), with blocking factor NB]
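
The idea of coercing every transpose case to a single multiply case through a data copy can be sketched as follows. The canonical layout chosen here (rows of op(A) against columns of op(B)) and the names `kernel` and `gemm` are assumptions for illustration, not the layout or interface ATLAS itself uses.

```python
def transpose(m):
    """Return a transposed copy of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def kernel(a_rows, b_cols):
    """The single multiply case: dot each row of op(A) with each column of op(B)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def gemm(a, b, trans_a=False, trans_b=False):
    """Compute op(A) * op(B), where op(X) is X or its transpose.

    The copy step puts both operands into the one layout the kernel expects,
    so the kernel never needs to know which transpose case was requested.
    """
    # Rows of op(A): copy A as-is, or copy its transpose.
    a_rows = transpose(a) if trans_a else [list(r) for r in a]
    # Columns of op(B): the columns of B^T are the rows of B.
    b_cols = [list(r) for r in b] if trans_b else transpose(b)
    return kernel(a_rows, b_cols)
```

All four combinations of `trans_a` and `trans_b` go through the same `kernel`, which is why only one multiply case has to be tuned. The cost is the extra copy, which is small relative to the multiply for large enough operands.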

Algorithmic approach for Level 3 BLAS
- Recur down to L1 cache block size
- Need kernel at bottom of recursion
  - Use gemm-based kernel for portability
[Figure: Recursive TRMM]
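
A recursive triangular multiply of the kind described above can be sketched as follows. It assumes a lower-triangular left operand, returns a new matrix rather than updating in place, and uses a hypothetical cutoff `nb` in place of the L1 cache block size. At the bottom of the recursion the small triangle is padded with zeros and handed to a plain multiply, which stands in for the gemm-based kernel.

```python
def matmul(a, b):
    """Plain dense multiply; stands in for the tuned GEMM kernel."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def rec_trmm(l, b, nb=2):
    """Return L * B for lower-triangular L, recursing on the triangle.

    Entries above the diagonal of L are never read. Requires nb >= 1.
    """
    n = len(l)
    if n <= nb:
        # Bottom of recursion: pad the small triangle with zeros and
        # reuse the dense multiply as the kernel.
        dense = [[l[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
        return matmul(dense, b)
    h = n // 2
    l11 = [row[:h] for row in l[:h]]   # top-left triangle
    l21 = [row[:h] for row in l[h:]]   # bottom-left rectangle
    l22 = [row[h:] for row in l[h:]]   # bottom-right triangle
    b1, b2 = b[:h], b[h:]
    top = rec_trmm(l11, b1, nb)        # L11 * B1
    tri = rec_trmm(l22, b2, nb)        # L22 * B2
    rect = matmul(l21, b1)             # L21 * B1 is an ordinary GEMM
    bottom = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(tri, rect)]
    return top + bottom
```

Each level of recursion turns the rectangular part of the work into a call to the dense multiply, so most of the arithmetic runs through the one tuned kernel.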

500 x 500 DGEMM Across Various Architectures

500 x 500 Double Precision RB LU factorization

500 x 500 Recursive BLAS on UltraSparc 2200