Parallelizing stencil computations
Based on slides from David Culler, Jim Demmel, Bob Lucas, Horst Simon, Kathy Yelick, et al., UCB CS 267
Parallelism in the Game of Life
• The activities in this system are discrete events
• The simulation is synchronous
  • use two copies of the grid (old and new)
  • the value of each cell in the new grid depends only on 9 cells (itself plus its 8 neighbors) in the old grid (a “stencil computation”)
• Each grid cell update is independent: reordering or parallelism is OK
  • simulation proceeds in timesteps, where (logically) each cell is evaluated at every timestep
[Figure: old world and new world grids]
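The two-grid update above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the course: the function name `life_step`, the list-of-lists grid, and the wrap-around (periodic) edges are all assumptions made for the example.

```python
def life_step(old):
    """Return the next generation; `old` is a list of lists holding 0 or 1."""
    rows, cols = len(old), len(old[0])
    # Second copy of the grid: every write goes to `new`, every read to `old`.
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count live neighbors in the 3x3 stencil around (i, j).
            # Edges wrap around (an assumption made for this sketch).
            live = sum(
                old[(i + di) % rows][(j + dj) % cols]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            new[i][j] = 1 if live == 3 or (old[i][j] == 1 and live == 2) else 0
    return new


# A vertical "blinker" on a 5x5 grid flips to horizontal and back.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
```

Because each `new[i][j]` reads only from `old`, the two loops can run in any order, or be split across processors, without changing the result.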
Parallelism in Life
• Parallelism is straightforward
  • the ocean is a regular data structure
  • even decomposition across processors gives load balance
• Locality is achieved by using large patches of the world
  • boundary values from neighboring patches are needed
• Optimization: visit only occupied cells (and neighbors)
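The "visit only occupied cells (and neighbors)" optimization can be sketched by storing just the live cells in a set. The name `sparse_life_step` and the unbounded grid are assumptions made for this example; the slides do not prescribe a data structure.

```python
from collections import Counter


def sparse_life_step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # Each live cell adds one to the count of each of its 8 neighbors, so
    # only occupied cells and their neighbors are ever touched.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors, survival on 2 or 3.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


vertical_blinker = {(1, 2), (2, 2), (3, 2)}
```

The work per step is proportional to the number of live cells rather than to the grid area. The trade-off is that the load is no longer evenly spread when the live cells cluster in one patch.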
Two-dimensional block decomposition
• If each processor owns n²/p elements to update, …
• … the amount of data communicated, n/√p per neighbor, is relatively small if n >> p
• This is less than the n per neighbor needed for block column decomposition
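A short calculation makes the comparison concrete. The sketch assumes an n × n grid, a √p × √p processor grid for the 2D case (so each block has side n/√p), and a one-cell-wide boundary; the function name `halo_per_neighbor` and the layout labels are hypothetical.

```python
import math


def halo_per_neighbor(n, p, layout):
    """Grid points sent to one neighbor for an n x n grid on p processors."""
    if layout == "block2d":
        q = math.isqrt(p)
        if q * q != p or n % q != 0:
            raise ValueError("sketch assumes p is a perfect square dividing n evenly")
        # Each processor owns an (n/q) x (n/q) block; one edge goes to each neighbor.
        return n // q
    if layout == "column":
        if n % p != 0:
            raise ValueError("sketch assumes p divides n evenly")
        # Each processor owns n/p full columns; a whole column goes to each neighbor.
        return n
    raise ValueError("unknown layout: " + layout)


# Example: n = 1024, p = 16. Both layouts give each processor
# 1024 * 1024 / 16 = 65536 elements to update.
block_edge = halo_per_neighbor(1024, 16, "block2d")
column_edge = halo_per_neighbor(1024, 16, "column")
```

For the same amount of computation per processor, the 2D blocks exchange 256 points per neighbor in this example, against 1024 for block columns.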
Redundant “Ghost” Nodes in Stencil Computations
[Figure: to compute green, copy yellow and compute blue]
• Size of ghost region (and redundant computation) depends on network/memory speed vs. computation
• Can be used on unstructured meshes
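The trade between ghost-region size and redundant computation can be sketched in one dimension. Here each block copies `g` ghost cells per side once, then takes `g` timesteps with no further communication, recomputing some of its neighbors' cells along the way. The three-point sum stencil, the periodic boundary, and the names `serial_step` and `blocked_steps` are assumptions chosen to keep the example small and exact; the blocks run one after another in a single process to stand in for separate processors.

```python
def serial_step(u):
    """One step of a 3-point sum stencil on a periodic 1D array."""
    n = len(u)
    return [u[(i - 1) % n] + u[i] + u[(i + 1) % n] for i in range(n)]


def blocked_steps(u, nblocks, g):
    """Advance g steps, copying g ghost cells per side once per block."""
    n = len(u)
    if n % nblocks != 0:
        raise ValueError("sketch assumes nblocks divides len(u)")
    size = n // nblocks
    out = []
    for b in range(nblocks):
        lo = b * size
        # Copy step: the owned cells plus g ghost cells on each side.
        local = [u[(lo - g + k) % n] for k in range(size + 2 * g)]
        for _ in range(g):
            # Only cells with both neighbors present can be updated, so the
            # valid region shrinks by one cell per side on every step.
            local = [
                local[i - 1] + local[i] + local[i + 1]
                for i in range(1, len(local) - 1)
            ]
        # After g steps exactly the owned cells remain.
        out.extend(local)
    return out


u0 = list(range(12))
reference = u0
for _ in range(3):
    reference = serial_step(reference)
```

A wider ghost region means fewer, larger messages but more redundant updates of cells another block also computes, which is why the best width depends on how communication cost compares with computation cost.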
Comments on practical meshes
• Regular 1D, 2D, 3D meshes
  • Important as building blocks for more complicated meshes
• Practical meshes are often irregular
  • Composite meshes, consisting of multiple “bent” regular meshes joined at edges
  • Unstructured meshes, with arbitrary mesh points and connectivities
  • Adaptive meshes, which change resolution during solution process to put computational effort where needed
Parallelism in Regular meshes
• Computing a stencil on a regular mesh
  • need to communicate mesh points near boundary to neighboring processors
  • often done with “ghost” regions, which add memory overhead
• Surface-to-volume ratio keeps communication down, but
  • still may be problematic in practice
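The memory overhead of ghost regions follows directly from the surface-to-volume ratio. The sketch below assumes a cubic block of side `m` in `dim` dimensions with a ghost layer of the given width on every face; the function name `ghost_overhead` is hypothetical.

```python
def ghost_overhead(m, dim, width=1):
    """Extra storage for ghost cells as a fraction of the owned cells."""
    owned = m ** dim
    stored = (m + 2 * width) ** dim  # owned block plus a ghost layer on every side
    return (stored - owned) / owned


large_2d = ghost_overhead(100, 2)  # 100 x 100 block: about 4% extra
small_3d = ghost_overhead(10, 3)   # 10 x 10 x 10 block: about 73% extra
```

Large blocks keep the overhead low. Small 3D blocks, which arise when a fixed-size problem is spread over many processors, show how it can still become a problem in practice.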
Irregular mesh: NASA Airfoil in 2D
Composite Mesh from a Mechanical Structure
Converting the Mesh to a Matrix
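The slide's figure is not reproduced here. One common convention, assumed for this sketch, gives each mesh point one row and column and each mesh edge a symmetric pair of off-diagonal nonzeros, which yields the graph Laplacian of the mesh. The function name `mesh_to_laplacian` and the dense list-of-lists storage are illustrative only; a real code would use a sparse format.

```python
def mesh_to_laplacian(num_points, edges):
    """Build the graph Laplacian of a mesh given as a list of (i, j) edges."""
    A = [[0] * num_points for _ in range(num_points)]
    for i, j in edges:
        # Each edge couples its two endpoints symmetrically ...
        A[i][j] -= 1
        A[j][i] -= 1
        # ... and adds one to the degree of each endpoint on the diagonal.
        A[i][i] += 1
        A[j][j] += 1
    return A


# Two triangles (0, 1, 2) and (1, 2, 3) sharing the edge between points 1 and 2.
edges = [(0, 1), (1, 2), (2, 0), (1, 3), (2, 3)]
L = mesh_to_laplacian(4, edges)
```

The sparsity pattern of the matrix mirrors the connectivity of the mesh, so partitioning the mesh and partitioning the matrix are the same problem.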
Adaptive Mesh Refinement (AMR)
• Adaptive mesh around an explosion
• Refinement done by calculating errors
• Parallelism
  • Mostly between “patches,” dealt to processors for load balance
  • May exploit some within a patch (SMP)
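Dealing patches to processors can be sketched with a simple greedy rule: take patches from most to least expensive and give each to the processor with the least work so far. This is an illustrative heuristic, not the scheme used by any particular AMR package; the function name `deal_patches` and the per-patch cost estimates are assumptions.

```python
import heapq


def deal_patches(patch_costs, nprocs):
    """Assign patches to processors, largest cost first, least-loaded processor next."""
    # Min-heap of (current load, processor id), so the least-loaded pops first.
    heap = [(0, p) for p in range(nprocs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(nprocs)}
    # Sort by descending cost, breaking ties by patch name for determinism.
    for patch, cost in sorted(patch_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[p].append(patch)
        heapq.heappush(heap, (load + cost, p))
    return assignment


# Hypothetical cost estimates, e.g. the number of cells in each patch.
costs = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
plan = deal_patches(costs, 2)
```

The greedy deal is cheap to recompute each time the mesh is refined, but it ignores which patches are neighbors, so it balances load without regard to communication.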
Shock waves in gas dynamics using AMR (Adaptive Mesh Refinement)
[Figure panels: fluid density; adaptive mesh]
See: http://www.llnl.gov/CASC/SAMRAI/
Irregular mesh: Tapered Tube (Multigrid)
Challenges of Irregular Meshes for PDEs
• How to generate them in the first place
  • E.g. Triangle, a 2D mesh generator by Jonathan Shewchuk
  • 3D harder! E.g. QMD by Stephen Vavasis
• How to partition them
  • ParMetis, a parallel graph partitioner
• How to design iterative solvers
  • PETSc, the Portable, Extensible Toolkit for Scientific Computation
  • Prometheus, a multigrid solver for finite element problems on irregular meshes
• How to design direct solvers
  • SuperLU, parallel sparse Gaussian elimination
• These are challenges to do sequentially, more so in parallel