Weighted Matrix Reordering and Parallel Banded Preconditioners for Nonsymmetric Linear Systems Murat Manguoğlu*, Mehmet Koyutürk**, Ananth Grama* and Ahmed Sameh* * Purdue University ** Case Western Reserve University Support: DARPA, NSF, Intel, NCSA
A computational loop (diagram): Integration, Newton Iteration, Linear system solvers; loop labels k, k, t
Motivation • New architectures increasingly rely on parallelism • Concurrency and localization play an important role • Algorithms for such platforms must account for concurrency and memory references
Implications: General Sparse Solvers • Maximal use of dense kernels • Development of methods that optimize concurrency • A banded matrix is a natural candidate as a preconditioner
Preprocessing to Obtain the Preconditioner (BiCGStab/GMRES is used as the iterative solver) • ILUPACK: Multilevel ILU [Bollhöfer] – http://www.math.tu-berlin.de/ilupack/ • ILUT: Incomplete LU factorization from SPARSKIT [Saad] – http://www-users.cs.umn.edu/~saad/software/SPARSKIT/sparskit.html • ILUT-I: Improved ILUT [Benzi et al.] 1. reorder using HSL-MC64 to maximize the product of the diagonal entries and scale the matrix 2. apply symmetric RCM reordering 3. compute the incomplete factorization via ILUT
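The ILUT-I style baseline above can be approximated in SciPy. The following is only a rough sketch under stated assumptions, not the setup used for the reported results: the HSL-MC64 permutation and scaling step is omitted, `spilu` (SuperLU's threshold ILU) stands in for SPARSKIT's ILUT, and the function name `rcm_ilu_bicgstab` and its default parameters are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import LinearOperator, bicgstab, spilu


def rcm_ilu_bicgstab(A, b, drop_tol=1e-3, fill_factor=10.0):
    """Symmetric RCM reordering, incomplete LU, then preconditioned BiCGStab.

    Assumption: the MC64 step is skipped, so A should already have a
    reasonably strong, zero-free diagonal.
    """
    A = sp.csr_matrix(A, dtype=float)
    # RCM works on a symmetric pattern, so order the graph of |A| + |A^T|.
    pattern = (abs(A) + abs(A.T)).tocsr()
    perm = reverse_cuthill_mckee(pattern, symmetric_mode=True)
    # Apply the same permutation to rows and columns: Ap[i, j] = A[perm[i], perm[j]].
    Ap = A[perm][:, perm].tocsc()
    # Threshold-based incomplete LU (a stand-in for ILUT).
    ilu = spilu(Ap, drop_tol=drop_tol, fill_factor=fill_factor)
    M = LinearOperator(Ap.shape, matvec=ilu.solve, dtype=float)
    y, info = bicgstab(Ap, b[perm], M=M, maxiter=500)
    # Undo the permutation: y[i] holds the unknown x[perm[i]].
    x = np.empty_like(y)
    x[perm] = y
    return x, info
```

A return value of `info == 0` means BiCGStab reported convergence; any other value indicates that the iteration limit was reached or the method broke down.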
• WSO: Our proposed method 1. reorder using HSL-MC64 to make the diagonal zero-free 2. reorder |A| + |A^T| using HSL-MC73 to place larger elements closest to the main diagonal 3. extract a banded preconditioner such that 99.9% of the weight is inside the band 4. factorize the banded preconditioner
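The four WSO steps can be illustrated on a small matrix. This is a minimal sketch under several assumptions, not the authors' implementation: SciPy's `maximum_bipartite_matching` stands in for HSL-MC64, a dense eigendecomposition of the weighted Laplacian of |A| + |A^T| stands in for HSL-MC73 (so it is only practical for small, connected problems), "weight" is taken to mean the sum of absolute values of the entries, and a sparse LU of the band stands in for a banded factorization. All function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import maximum_bipartite_matching
from scipy.sparse.linalg import splu


def zero_free_diagonal(A):
    """Step 1: permute rows so every diagonal entry is structurally nonzero."""
    A = sp.csr_matrix(A)
    # row_of_col[j] is the row matched to column j (-1 if unmatched).
    row_of_col = maximum_bipartite_matching(A, perm_type="row")
    if (row_of_col < 0).any():
        raise ValueError("matrix is structurally singular")
    return A[row_of_col], row_of_col


def fiedler_order(A):
    """Step 2: order unknowns by the Fiedler vector of the graph of |A| + |A^T|."""
    W = (abs(A) + abs(A.T)).toarray()
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W  # weighted graph Laplacian
    _, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    return np.argsort(vecs[:, 1])   # eigenvector of the second-smallest eigenvalue


def band_for_weight(A, fraction=0.999):
    """Step 3: smallest half-bandwidth k whose band holds `fraction` of sum |a_ij|."""
    C = sp.coo_matrix(A)
    dist = np.abs(C.row - C.col)
    w = np.bincount(dist, weights=np.abs(C.data), minlength=A.shape[0])
    cum = np.cumsum(w)
    return int(np.searchsorted(cum, fraction * cum[-1]))


def banded_preconditioner(A, k):
    """Step 4: keep entries with |i - j| <= k and factorize them."""
    C = sp.coo_matrix(A)
    keep = np.abs(C.row - C.col) <= k
    M = sp.csc_matrix((C.data[keep], (C.row[keep], C.col[keep])), shape=A.shape)
    return splu(M)
```

Applying the permutation from `fiedler_order` symmetrically, as `A[p][:, p]`, before calling `band_for_weight` is what allows a narrow band to capture most of the weight.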
Test Problems
No.  Matrix Name  Application         n        nnz
1.   ASIC_680K    Circuit Simulation  680,000  2,638,997
2.   DC1          Circuit Simulation  116,835  766,396
3.   FINAN512     Econometrics        74,752   596,992
4.   H2O          Quantum Chemistry   67,024   2,216,736
Comparison to ILUPACK AMF/PQ preconditioners on a uniprocessor [of SGI-Altix] Method / Matrix Number 1 2 3 4 5 6 ILUPACK-AMF >600 s Conv. Best Conv. ILUPACK-PQ >600 s Conv. Best WSO Best Outer iterative solver: unrestarted GMRES ILUPACK parameters: droptol: 1e-1, bound for inv(L), inv(U): 10, elbow space: 100
Comparison to ILUT and Improved ILUT preconditioners on a uniprocessor [of Clovertown] Method / Matrix Number 1 2 3 4 Conv. Best 5 6 Fail Conv. ILUT(1e-1, n) Fail ILUTI(1e-1, n) Conv. Conv. Conv. ILUT(1e-3, n) Fail >600 s ILUTI(1e-3, n) Conv. Conv. Conv. >600 s ILUT(0, k) Fail >600 s ILUTI(0, k) Conv. >600 s WSO Best Conv. >600 s Conv. v. Best Outer iterative solver: BiCGStab
WSO: Factorization + Solve Time Scalability (speed improvement over uniprocessor timing on SGI-Altix)
Reordering and Solve Times of 3 Different Systems on a Uniprocessor
Reservoir Simulation (SPE10 benchmarks) • Problem #1: N = 2,244,000 • Problem #2: N = 2,462,265 • “Banded systems” → simple/no reordering to extract a central band as a preconditioner • Results on an SGI-Altix
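When the matrix is already nearly banded, as in the reservoir problems above, the central band can be used directly without reordering. The sketch below, with hypothetical helper names and an arbitrarily chosen half-bandwidth `k`, copies the central band into LAPACK band storage and applies it as a preconditioner for SciPy's GMRES. It is a serial illustration only; for simplicity `solve_banded` refactors the band on every application, whereas a practical code would factorize once and reuse the factors.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_banded
from scipy.sparse.linalg import LinearOperator, gmres


def central_band(A, k):
    """Copy the entries of A with |i - j| <= k into LAPACK band storage.

    The result `ab` has shape (2k + 1, n) with ab[k + i - j, j] = A[i, j].
    """
    A = sp.csr_matrix(A, dtype=float)
    n = A.shape[0]
    ab = np.zeros((2 * k + 1, n))
    for d in range(-k, k + 1):
        diag = A.diagonal(d)            # entries A[i, i + d]
        if d >= 0:
            ab[k - d, d:] = diag        # superdiagonal d occupies columns d..n-1
        else:
            ab[k - d, : n + d] = diag   # subdiagonal |d| occupies columns 0..n+d-1
    return ab


def band_preconditioned_gmres(A, b, k):
    """Solve A x = b with GMRES, preconditioned by the central band of A."""
    ab = central_band(A, k)
    M = LinearOperator(
        A.shape,
        matvec=lambda r: solve_banded((k, k), ab, r),  # applies the inverse of the band
        dtype=float,
    )
    return gmres(A, b, M=M, restart=30, maxiter=10)
```

The function returns the pair `(x, info)` produced by `gmres`, where `info == 0` signals convergence.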
Reservoir Simulation #1 (plot; data labels: 33, 21, 11, 5, 2.5, 1.4) Algebraic Multigrid time: 31.4 seconds (AMD dual core)
Reservoir Simulation #2 (plot; data labels: 58, 29, 14, 7, 3, 2, 1)
Summary and Future Work • Weighted reordering is an effective method for obtaining a banded preconditioner • Overall, the method we propose is both reliable and scalable • Spectral reordering is relatively inexpensive for extracting banded preconditioners when solving several systems with “roughly the same” coefficient matrix • Parallel weighted reordering schemes need to be developed