CASL: Consortium for Advanced Simulation of Light Water Reactors
Neutronics and 3D SN Transport
Thomas M. Evans
ORNL
HPC Users Meeting, April 6, 2011, Houston, TX

Contributors
ORNL Staff
• Greg Davidson
• Josh Jarrell
• Bob Grove
• Chris Baker
• Andrew Godfrey
• Kevin Clarno
• Douglas Peplow
• Scott Mosher
CASL
• Roger Pawloski
• Brian Adams
Students and Post Docs
• Rachel Slaybaugh (Wisconsin)
• Stuart Slattery (Wisconsin)
• Josh Hykes (North Carolina State)
• Todd Evans (North Carolina State)
• Cyrus Proctor (North Carolina State)
OLCF (NCCS) Support
• Dave Pugmire
• Sean Ahern
• Wayne Joubert
Managed by UT-Battelle for the U.S. Department of Energy
Denovo Parallel SN

Outline
• Neutronics
• Deterministic Transport
• Parallel Algorithms and Solvers
• Verification and Validation

Virtual Reactor Simulation
• Neutronics is one part of a complete reactor simulation

VERA (Virtual Environment for Reactor Applications)

Science Drivers for Neutronics
• Spatial resolution
  – To resolve the geometry
    • 10^9 to 10^12 unknowns
    • mm^3 cells in a m^3 vessel
  – Depletion makes it harder
• Energy resolution
  – To resolve resonances
    • 10^4 to 10^6 unknowns
    • Done in 0D or 1D today
• Angular resolution
  – To resolve streaming
    • 10^2 to 10^4 unknowns
  – Space-energy resolution makes it harder
[Figure: BWR and PWR cores have similar dimensions but much different compositions and features. Labeled scales: ~1-2 cm, ~10-20 cm, 3-8 m radial, 4-5 m height]

Denovo HPC Transport

Denovo Capabilities
• State of the art transport methods
  – 3D/2D, non-uniform, regular grid SN
  – Multigroup energy, anisotropic PN scattering
  – Forward/Adjoint
  – Fixed-source/k-eigenvalue
  – 6 spatial discretization algorithms
    • Linear and trilinear discontinuous FE, step characteristics, theta-weighted diamond, weighted diamond + flux-fixup
  – Parallel first-collision
    • Analytic ray-tracing (DR)
    • Monte Carlo (DR and DD)
  – Multiple quadratures
    • Level-symmetric
• Modern, Innovative, High-Performance Solvers
  – Within-group solvers
    • Krylov (GMRES, BiCGStab) and source iteration
    • DSA preconditioning (SuperLU/ML-preconditioned CG/PCG)
  – Multigroup solvers
    • Transport Two-Grid upscatter acceleration of Gauss-Seidel
    • Krylov (GMRES, BiCGStab)
      – Multigrid preconditioning in development
  – Eigenvalue solvers
    • Power iteration (with rebalance)
      – CMFD in testing phase
    • Krylov (Arnoldi)
    • RQI
[Figure: Power distribution in a BWR assembly]

Denovo Capabilities
• Parallel Algorithms
  – Koch-Baker-Alcouffe (KBA) wavefront decomposition
  – Domain-replicated (DR) and domain-decomposed first-collision solvers
  – Multilevel energy decomposition
  – Parallel I/O built on SILO/HDF5
• Advanced visualization, run-time, and development environment
  – 3 front-ends (HPC, SCALE, Python bindings)
  – Direct connection to SCALE geometry and data
  – Direct connection to MCNP input through ADVANTG
  – HDF5 output directly interfaced with VisIt
  – Built-in unit-testing and regression harness with DBC
  – Emacs-based code-development environment
  – Support for multiple external vendors
    • BLAS/LAPACK, TRILINOS (required)
    • BRLCAD, SUPERLU/METIS, SILO/HDF5 (optional)
    • MPI (toggle for parallel/serial builds)
    • SPRNG (required for MC module)
    • PAPI (optional instrumentation)
Callouts:
• > 10M CPU hours on Jaguar with 3 bugs
• 2010-11 INCITE Award: Uncertainty Quantification for Three Dimensional Reactor Assembly Simulations, 26 MCPU-HOURS
• 2010 ASCR Joule Code
• 2009-2011: 2 ORNL LDRDs

Discrete Ordinates Methods
• We solve the first-order form of the transport equation:
  – Eigenvalue form for multiplying media (fission): [equation not preserved]
  – Fixed source form: [equation not preserved]
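For reference, the two forms named on this slide are commonly written as below in textbooks. This is a sketch in assumed notation (angular flux \psi, total cross section \sigma, scattering cross section \sigma_s, fission spectrum \chi, fission production \nu\sigma_f, external source q, eigenvalue k); the symbols on the original slide may differ.

```latex
% Eigenvalue form for multiplying media (assumed textbook notation)
\hat{\Omega}\cdot\nabla\psi(\mathbf{r},\hat{\Omega},E)
  + \sigma(\mathbf{r},E)\,\psi(\mathbf{r},\hat{\Omega},E)
  = \int_0^\infty\! dE' \int_{4\pi}\! d\hat{\Omega}'\,
      \sigma_s(\mathbf{r},\hat{\Omega}'\cdot\hat{\Omega},E'\to E)\,
      \psi(\mathbf{r},\hat{\Omega}',E')
  + \frac{1}{k}\,\frac{\chi(E)}{4\pi}
      \int_0^\infty\! dE'\,\nu\sigma_f(\mathbf{r},E')
      \int_{4\pi}\! d\hat{\Omega}'\,\psi(\mathbf{r},\hat{\Omega}',E')

% Fixed-source form: the fission term is replaced by a known source q
\hat{\Omega}\cdot\nabla\psi(\mathbf{r},\hat{\Omega},E)
  + \sigma(\mathbf{r},E)\,\psi(\mathbf{r},\hat{\Omega},E)
  = \int_0^\infty\! dE' \int_{4\pi}\! d\hat{\Omega}'\,
      \sigma_s(\mathbf{r},\hat{\Omega}'\cdot\hat{\Omega},E'\to E)\,
      \psi(\mathbf{r},\hat{\Omega}',E')
  + q(\mathbf{r},\hat{\Omega},E)
```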

Discrete Ordinates Methods
• The SN method is a collocation method in angle.
  – Energy is discretized in groups.
  – Scattering is expanded in Spherical Harmonics.
  – Multiple spatial discretizations are used (DGFEM, Characteristics, Cell-Balance).
• Dimensionality of operators: [equation not preserved]

Degrees of Freedom
• Total number of unknowns in solve: [equation not preserved]
• An ideal (conservative) estimate.
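The formula on this slide was not preserved. A minimal sketch, assuming the count is the product of groups, cells, angles, and spatial unknowns per cell; under that assumption the function reproduces the totals quoted for the PWR-900 problem later in this deck. The function and constant names are hypothetical.

```python
def total_unknowns(groups, cells, angles, unknowns_per_cell=1):
    """Angular-flux unknowns, ASSUMING a simple product of the four counts."""
    return groups * cells * angles * unknowns_per_cell


# Figures quoted on the PWR-900 whole-core slide.
PWR900_CELLS = 233_858_800
PWR900_ANGLES = 168

two_group_total = total_unknowns(2, PWR900_CELLS, PWR900_ANGLES)
forty_four_group_total = total_unknowns(44, PWR900_CELLS, PWR900_ANGLES)

print(f"2 groups:  {two_group_total:,}")
print(f"44 groups: {forty_four_group_total:,}")
```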

Eigenvalue Problem
• The eigenvalue problem has the following form: [equation not preserved]
• Expressed in standard form: [equation not preserved; operator factors labeled "Energy-dependent" and "Energy-independent"]
• The traditional way to solve this problem is with Power Iteration
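Power iteration can be sketched on a small stand-in problem. The example below assumes a generalized form L phi = (1/k) F phi, with a hypothetical tridiagonal loss matrix L and a diagonal production matrix F; it is not the Denovo operator, only an illustration of the iteration itself.

```python
import numpy as np


def power_iteration(L, F, tol=1e-10, max_iters=1000):
    """Dominant k of L phi = (1/k) F phi by repeated application of L^-1 F."""
    phi = np.ones(L.shape[0])
    phi /= np.linalg.norm(phi)
    k = 0.0
    for iteration in range(1, max_iters + 1):
        # One "fixed-source solve": invert L on the current production source.
        y = np.linalg.solve(L, F @ phi)
        # phi has unit norm, so the growth factor ||y|| estimates k.
        k_new = np.linalg.norm(y)
        phi = y / k_new
        if abs(k_new - k) < tol:
            return k_new, phi, iteration
        k = k_new
    return k, phi, max_iters


# Hypothetical 20-cell toy problem (values chosen only for illustration).
n = 20
L = 2.2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F = 0.5 * np.eye(n)

k_eff, flux, iterations = power_iteration(L, F)
print(f"k = {k_eff:.6f} after {iterations} iterations")
```

The convergence rate is governed by the ratio of the two largest eigenvalues that the starting vector excites, which is why the slides turn to Arnoldi and shifted-inverse methods for harder problems.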

Advanced Eigenvalue Solvers
• We can use Krylov (Arnoldi) iteration to solve the eigenvalue problem more efficiently [equation not preserved; annotated terms: "Matrix-vector multiply and sweep" and "Multigroup fixed-source solve"]
• Shifted-inverse iteration (Rayleigh Quotient Iteration) is also being developed (using Krylov to solve the shifted multigroup problem in each eigenvalue iteration)

Solver Taxonomy
Eigenvalue Solvers
• Power iteration
• Arnoldi
• Shifted-inverse
Multigroup Solvers
• Gauss-Seidel
• Residual Krylov
• Gauss-Seidel + Krylov
Within-group Solvers
• Krylov
• Residual Krylov
• Source iteration
The innermost part of each solver is the transport sweep
“It’s turtles all the way down…”

KBA Algorithm
• Sweeping in direction of particle flow
• KBA is a direct-inversion algorithm
• Start first angle in (-1, +1, -1) octant
• Begin next angle in octant
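The wavefront ordering can be sketched as a schedule over blocks. This is a minimal illustration, assuming a P_I x P_J processor grid with B_K blocks stacked in z and one sweep step per block; the function names and the grid shape are hypothetical. A block becomes ready one stage after its upstream neighbors, so its stage is its distance from the corner where the sweep enters.

```python
from itertools import product


def kba_stage(block, shape, direction):
    """Stage (0-based) at which a block is swept for one angle.

    block     -- (i, j, k) block indices
    shape     -- (P_I, P_J, B_K) number of blocks along each axis
    direction -- sign of the angle along each axis, e.g. (-1, +1, -1)
    """
    stage = 0
    for index, count, sign in zip(block, shape, direction):
        # Distance from the face where particles enter along this axis.
        stage += index if sign > 0 else count - 1 - index
    return stage


def wavefronts(shape, direction):
    """Group all blocks by the stage at which they can be swept."""
    num_stages = sum(shape) - 2
    fronts = [[] for _ in range(num_stages)]
    for block in product(*(range(n) for n in shape)):
        fronts[kba_stage(block, shape, direction)].append(block)
    return fronts


shape = (4, 3, 5)            # hypothetical P_I, P_J, B_K
direction = (-1, +1, -1)     # the octant named on the slide
fronts = wavefronts(shape, direction)
for stage, blocks in enumerate(fronts):
    print(f"stage {stage}: {len(blocks)} blocks active")
```

The first and last stages contain a single block each, which is the pipeline fill and drain that limits parallel efficiency.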

Parallel Performance
Angular Pipelining
• Angles in ± z directions are pipelined
• Results in 2×M pipelined angles per octant
• Quadrants are ordered to reduce latency
[Figure: 6 angle pipeline (S4; M = 3)]

KBA Reality
KBA does not achieve close to the predicted maximum
• Communication latency dominates as the block size becomes small
• Using a larger block size helps achieve the predicted efficiency, but
  – Maximum achievable efficiency is lower
  – Places a fundamental limit on the number of cores that can be used for any given problem
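The slide does not state the predicted maximum. A common estimate in the KBA literature, assumed here rather than taken from the slide, treats the sweep as a pipeline: each processor has 2 M B_K tasks (pipelined angles times z-blocks) and the pipeline needs P_I + P_J - 2 stages to fill. The sketch below evaluates that estimate; it ignores communication latency entirely, which is the effect the slide says dominates for small blocks.

```python
def kba_max_efficiency(angles_per_octant, z_blocks, p_i, p_j):
    """Pipeline estimate: useful tasks / (useful tasks + fill stages).

    ASSUMED model: 2*M*B_K pipelined tasks per processor and
    P_I + P_J - 2 idle stages while the wavefront fills.
    """
    pipelined_tasks = 2 * angles_per_octant * z_blocks
    pipeline_fill = p_i + p_j - 2
    return pipelined_tasks / (pipelined_tasks + pipeline_fill)


# Hypothetical 16 x 16 processor grid with S4 (M = 3) angles per octant.
for z_blocks in (1, 5, 10, 40):
    eff = kba_max_efficiency(3, z_blocks, 16, 16)
    print(f"B_K = {z_blocks:3d}: estimated max efficiency = {eff:.3f}")
```

More z-blocks (that is, smaller blocks) raise this estimate, while larger blocks lower it; latency per message pushes the other way, which is the trade-off the slide describes.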

Efficiency vs Block Size

Overcoming Wavefront Challenge
• This behavior is systemic in any wavefront-type problem
  – Hyperbolic aspect of transport operator
• We need to exploit parallelism beyond space-angle
  – Energy
  – Time
• Amortize the inefficiency in KBA while still retaining direct inversion of the transport operator

Multilevel Energy Decomposition
The use of Krylov methods to solve the multigroup equations effectively decouples energy
– Each energy-group SN equation can be swept independently
– Efficiency is better than Gauss-Seidel

Multilevel Summary
• Energy is decomposed into sets.
• Each set contains blocks constituting the entire spatial mesh.
• The total number of domains is (number of sets) × (number of blocks).
• KBA is performed for each group in a set across all of the blocks.
  – Not required to scale beyond O(1000) cores.
• Scaling in energy across sets should be linear.
• Allows scaling to O(100K) cores and enhanced parallelism on accelerators.
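The decomposition can be sketched as a partition of group indices into sets, with every set owning a full copy of the spatial blocks. The contiguous, near-equal split below is an assumption for illustration (the slides do not say how groups are assigned), and the function names are hypothetical. The domain count uses the 10,200-block, 2-set case from the results slide.

```python
def partition_groups(num_groups, num_sets):
    """Split group indices 0..num_groups-1 into contiguous, near-equal sets."""
    base, extra = divmod(num_groups, num_sets)
    sets, start = [], 0
    for s in range(num_sets):
        size = base + (1 if s < extra else 0)
        sets.append(list(range(start, start + size)))
        start += size
    return sets


def total_domains(num_sets, num_blocks):
    """Each set holds every spatial block, so domains = sets x blocks."""
    return num_sets * num_blocks


sets_44 = partition_groups(44, 11)     # 44 groups across 11 sets
print([len(s) for s in sets_44])
print(total_domains(2, 10_200))        # 2 sets of 10,200 blocks
```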

Whole Core Reactor Problem
PWR-900 Whole Core Problem
• 2 and 44-group, homogenized fuel pins
• 2×2 spatial discretization per fuel pin
• 17×17 fuel pins per assembly
• 289 assemblies (157 fuel, 132 reflector)
  – high, med, low enrichments
• Space-angle unknowns:
  – 233,858,800 cells
  – 168 angles (1 moment)
  – 1 spatial unknown per cell
[Figure: 17×17 assembly]

Results

Solvers                                Blocks   Sets   Domains   Solver Time (min)
PI + MG GS (2-grid preconditioning)    17,424                    11.00
PI + MG Krylov                         10,200   2      20,400    3.03
Arnoldi + MG Krylov                    10,200   2      20,400    2.05

Total unknowns = 78,576,556,800
Number of groups = 2
keff tolerance = 1.0e-3
• Arnoldi performs best, but is even more efficient at tighter convergence
  • 27 vs. 127 iterations for eigenvector convergence of 0.001
• The GS solver cannot use more computational resources for a problem of this spatial size
  • Simply using more spatial partitions will not result in more efficiency
  • Problem cannot effectively use more cores to run a higher-fidelity problem in energy

Parallel Scaling and Peak Performance
17,424 cores is effectively the maximum that can be used by KBA alone
Multilevel solvers allow weak scaling beyond the KBA wavefront limit
MG Krylov solver partitioned across 11 sets
[Plot labels: 1,728,684,249,600 unknowns (44 groups); 78,576,556,800 unknowns (2 groups)]

Strong Scaling
[Plot annotations: Optimized communication gave a performance boost to the 100K-core job, number of sets = 11. At 200K cores, the multiset communication dominates, number of sets = 22]
• Communication improvements were significant at the 100K-core level (using 11 sets).
• They do not appear to scale to 200K cores. Why?
• Multiset reduction each iteration imposes a constant cost!

Scaling Limitations
• Reduction across groups each iteration imposes a “flat” cost
• Only way to reduce this cost is to increase the work per set each iteration (more angles)
• Generally the work in space will not increase because we attempt to keep the number of blocks per domain constant
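The effect of a flat reduction cost can be illustrated with a toy timing model. Everything here is an assumption for illustration: the model (time per iteration = work divided across sets, plus a constant reduction cost) and all numeric values are hypothetical and are not measurements from the runs described in these slides.

```python
def iteration_time(work, num_sets, reduction_cost):
    """Toy model: perfectly divided work plus a flat per-iteration reduction."""
    return work / num_sets + reduction_cost


def strong_scaling_efficiency(work, sets_ref, sets_new, reduction_cost):
    """Efficiency of going from sets_ref to sets_new sets in the toy model."""
    t_ref = iteration_time(work, sets_ref, reduction_cost)
    t_new = iteration_time(work, sets_new, reduction_cost)
    ideal = t_ref * sets_ref / sets_new
    return ideal / t_new


# Hypothetical work units; doubling the sets from 11 to 22.
base = strong_scaling_efficiency(110.0, 11, 22, reduction_cost=1.0)
more_angles = strong_scaling_efficiency(440.0, 11, 22, reduction_cost=1.0)
print(f"baseline work:   {base:.3f}")
print(f"4x work per set: {more_angles:.3f}")
```

In this model the flat cost takes a growing share of each iteration as sets are added, and raising the work per set (for example, more angles) is the only lever that restores efficiency, matching the argument on the slide.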

GPU Performance
• Single core (AMD Istanbul) / single GPU (Fermi C2050) comparison
• For both processors, code attains 10% of peak flop rate

                        AMD Istanbul (1 core)   NVIDIA C2050 Fermi   Speedup
Kernel compute time     171 sec                 3.2 sec              54X
PCIe-2 time (faces)     --                      1.1 sec
TOTAL                   171 sec                 4.2 sec              40X

AMA V&V Activities
• Andrew Godfrey (AMA) has performed a systematic study of Denovo on a series of problems
  – 2D/3D pins
  – 3×3 lattice
  – 17×17 full lattice
  – ¼ core
• Examined differencing schemes, quadratures, and solvers
• Of primary interest was the spatial resolution needed to obtain convergence (used Denovo python pincell-toolkit to generate meshes)
• Results compared to Monte Carlo (KENO) runs using identical data

Quarter Core Simulations
• Good results are achieved (< 40 pcm) using 4×4 radial zoning, 15.24 cm axial zoning, and QR 2/4 quadrature
  – results attained in 42 min runtime using 960 cores
• Running a 1.6 G-cell problem is feasible (190 KCPU-hours)

Japanese Events
• ORNL and CASL are working to address the Japanese Emergency
• We are developing new models of the spent fuel pool as data comes in
• Running thermohydraulics and transport calculations to

Questions

RQI Solver
• Shifting the right-hand side and taking the Rayleigh quotient accelerates eigenvalue convergence
• In each inner iteration we have the following multigroup problem to solve for the next eigenvector iterate: [equation not preserved]
• As this converges the MG problem becomes very difficult to solve (preconditioning is essential): [equation not preserved]
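The idea can be sketched with textbook Rayleigh quotient iteration on a small symmetric matrix. This is an illustration under stated assumptions, not the Denovo formulation: the matrix is a hypothetical stand-in, the shifted system is solved directly rather than with a preconditioned Krylov method, and the iteration converges to whichever eigenpair is nearest the starting guess.

```python
import numpy as np


def rayleigh_quotient_iteration(A, x0, tol=1e-12, max_iters=50):
    """Textbook RQI for a symmetric matrix A, starting from vector x0."""
    x = x0 / np.linalg.norm(x0)
    mu = x @ A @ x                      # Rayleigh quotient as the shift
    identity = np.eye(A.shape[0])
    for _ in range(max_iters):
        if np.linalg.norm(A @ x - mu * x) < tol:
            break
        try:
            # The shifted system becomes nearly singular as mu converges,
            # which is why the slide stresses preconditioning.
            y = np.linalg.solve(A - mu * identity, x)
        except np.linalg.LinAlgError:
            break                       # shift landed exactly on an eigenvalue
        x = y / np.linalg.norm(y)
        mu = x @ A @ x
    return mu, x


# Hypothetical 8x8 symmetric test matrix.
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
mu, vec = rayleigh_quotient_iteration(A, np.ones(n))
print(f"eigenvalue estimate: {mu:.12f}")
print(f"residual norm:       {np.linalg.norm(A @ vec - mu * vec):.2e}")
```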

MG Krylov Preconditioning
• Each MG Krylov iteration involves two steps
  – preconditioning: [equation not preserved]
  – matrix-vector multiply: [equation not preserved]
• At the end of the iteration we must apply the preconditioner one last time to recover [symbol not preserved]
• We use a simple 1-D multigrid preconditioner in energy:
  – 1-pass V-cycle

V-Cycle Relaxation
• We are investigating both weighted-Jacobi [equation not preserved]
• And weighted-Richardson relaxation schemes [equation not preserved]
• Energy-parallelism is largely preserved
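The two relaxation schemes can be sketched in their generic textbook forms, since the slide's own equations were not preserved. The test matrix and the weights (omega) below are hypothetical choices for illustration, not values used in Denovo.

```python
import numpy as np


def weighted_jacobi(A, b, x, omega, sweeps):
    """x <- x + omega * D^-1 (b - A x), with D the diagonal of A."""
    d_inv = 1.0 / np.diag(A)
    for _ in range(sweeps):
        x = x + omega * d_inv * (b - A @ x)
    return x


def weighted_richardson(A, b, x, omega, sweeps):
    """x <- x + omega * (b - A x)."""
    for _ in range(sweeps):
        x = x + omega * (b - A @ x)
    return x


# Hypothetical diagonally dominant test system.
n = 10
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x0 = np.zeros(n)

x_jacobi = weighted_jacobi(A, b, x0, omega=2.0 / 3.0, sweeps=50)
x_richardson = weighted_richardson(A, b, x0, omega=0.3, sweeps=50)

initial_residual = np.linalg.norm(b - A @ x0)
print(np.linalg.norm(b - A @ x_jacobi) / initial_residual)
print(np.linalg.norm(b - A @ x_richardson) / initial_residual)
```

Both updates need only a matrix-vector product and, for Jacobi, a diagonal scaling. No row waits on an updated value from another row within a sweep, which is consistent with the slide's remark that energy parallelism is largely preserved.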

Traditional SN Solution Methods
• Traditional SN solutions are divided into outer iterations over energy and inner iterations over space-angle.
• Generally, accelerated Gauss-Seidel or SOR is used for outer iterations.
• Eigenvalue forms of the equation are solved using Power Iteration
• In Denovo we are motivated to look at more advanced solvers
  – Improved robustness
  – Improved efficiency
  – Improved parallelism

Krylov Methods
• Krylov methods are more robust than stationary solvers
  – Uniformly stable (preconditioned and unpreconditioned)
• Can be implemented matrix-free
• More efficient
  – Source iteration spectral radius: [equation not preserved]
  – Gauss-Seidel spectral radius: [equation not preserved]
• There is no coupling in Krylov methods
  – Gauss-Seidel imposes coupling between rows in the matrix

Pin Cell and Lattice Results
• Summary:
  – Pin cell yields converged results with 6×6 radial mesh and QR 4/4 quadrature (32 pcm agreement with 49 groups)
  – Lattice yields converged results with 4×4 radial mesh and QR 2/4 quadrature (2 pcm agreement with 49 groups); excellent pin-power agreement (< 0.3%)
  – Both problems converge consistently to 50×50 radial mesh and QR 10/10 quadrature

Current State-of-the-Art in Reactor Neutronics
Pin cell
• 0/1-D transport
• High energy fidelity (10^2 to 10^5 unknowns)
• Approximate state and BCs
Lattice cell
• 2-D transport
• Moderate energy fidelity (7 to 10^2 groups)
• Approximate state and BCs
• Depletion with spectral corrections
• Space-energy homogenization
Core
• 3-D diffusion
• Low energy fidelity (2-4 groups)
• Homogeneous lattice cells
• Heterogeneous flux reconstruction
• Coupled physics
[Figure: General Electric ESBWR]

Verification and Validation
• We have successfully run the C5G7 (unrodded) 3D and 2D benchmarks
  – All results within ~30 pcm of published benchmark
  – Linear-discontinuous spatial differencing (although SC differencing gave similar results)
  – Clean and mixed-cell material treatments (preserving absolute fissionable fuel volumes)